ABSTRACT

CONWAY, JONATHAN MICHAEL. In Vitro and In Vivo Analyses of the Role of Multi- Glycoside Hydrolases from Extremely Thermophilic Caldicellulosiruptor Species in the Degradation of Plant Biomass. (Under the direction of Dr. Robert Kelly).

Caldicellulosiruptor species are anaerobic that are the most thermophilic

(Topt = 70-75°C) microorganisms currently known to degrade lignocellulosic biomass, making them of interest for the conversion of plant biomass to industrial products. The genomes of 12 Caldicellulosiruptor species have been sequenced and genomic, transcriptomic, proteomic, and phenotypic analysis has revealed genes of interest for evaluating and manipulating the native attachment and degradation mechanisms of

Caldicellulosiruptor species. Of particular interest are the large (~160-240 kDa), multi- domain Carbohydrate Active enZymes (CAZymes) produced by Caldicellulosiruptor species that containing multiple catalytic glycoside hydrolase (GH) and carbohydrate binding module

(CBM) domains. These multi-domain enzymes represent an alternative paradigm of lignocellulosic plant biomass degradation to cellulosomal organisms, such as Clostridium thermocellum, or free enzyme producers, such as the fungus Trichoderma reesei.

The multi-domain enzymes from Caldicellulosiruptor species can be divided into two broad categories: those freely secreted from the cell and those containing surface layer homology (SLH) domains, which associate these enzymes with the bacterial surface layer (S- layer). Proteins from both categories were produced in full-length and truncated recombinant versions in Escherichia coli and Caldicellulosiruptor bescii and biochemically characterized in vitro. Analysis of the free enzymes from the glucan degradation locus (GDL), the primary genomic site of cellulases in Caldicellulosiruptor species, showed that while CelA

(Athe_1867 in C. bescii) is responsible for the primary cellulase activity, it is enhanced about two-fold by synergistic action with additional enzymes from the GDL. Analysis of truncations of S-layer localized Calkro_0111, the largest CAZyme from any

Caldicellulosiruptor species, revealed the contribution of individual non-catalytic domains to the activity of its GH16 and GH55 domains. In particular, a newly identified actin crosslinking-like (ACL) domain, adjacent to the GH16, is necessary for its activity. In further analysis, an enzyme from Caldicellulosiruptor sp. strain Wai35.B1 containing three catalytic

GH domains, the first enzyme identified to date containing three catalytic GH domains, was characterized focusing on its individual catalytic domains. Protein crystal structures for individual GH and CBM domains were determined from several of these CAZymes: GH5 and CBM28 domains from Csac_0678, GH16 and ACL domains from Calkro_0111, and

GH10 CBM3 and GH48 domains from Wai35_2053. These domains were also characterized biochemically. These domain truncations are singular folded and catalytic entities such that the Caldicellulosiruptor multi-domain GHs truly are comprised of individually folded domains strung along the single polypeptide chain.

To complement these in vitro analyses, genetic manipulations in C. bescii were performed, using newly developed genetic techniques and strains, to engineer plant biomass degradation and evaluate the role of CAZymes in vivo. S-layer associated xylanase,

Calkro_0402, was inserted into C. bescii and the resulting strain showed improved ability to degrade xylan substrates, the first example of engineering improved degradation in

Caldicellulosiruptor. The six GH enzymes of the glucan degradation locus (GDL) in C. bescii were also analyzed by constructing eight knockout strains with different deletion combinations. Solubilization of crystalline cellulose and complex biomass substrates by these strains reveal these enzymes are responsible for essentially all of the biotic substrate solubilization performed by C. bescii. And, while the relative importance of individual enzymes from the GDL is biomass dependent, the synergistic action of these enzymes underlies this bacterium’s broad appetite for plant biomass.

Taken together, the in vitro and in vivo analyses of the multi-domain glycoside hydrolases presented here advances the fundamental understanding of how these enzymes work biochemically and the roles these enzymes play in plant biomass recruitment and degradation by Caldicellulosiruptor species. This work will enable future efforts to engineer strains and biomass processing schemes for optimized plant biomass degradation looking towards the production of bio-fuels and bio-chemicals from lignocellulosic feedstocks.

© Copyright 2017 Jonathan Michael Conway

All Rights Reserved In Vitro and In Vivo Analyses of the Role of Multi-Domain Glycoside Hydrolases from Extremely Thermophilic Caldicellulosiruptor Species in the Degradation of Plant Biomass

by Jonathan Michael Conway

A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Chemical and Biomolecular Engineering

Raleigh, North Carolina

2017

APPROVED BY:

______Dr. Robert Kelly Dr. Chase Beisel Committee Chair

______Dr. Amy Grunden Dr. Lucian Lucia Biotechnology Minor Representative

DEDICATION

For my grandparents– the teacher, the pastor, the engineer, and the cook.

I see parts of each of you in myself.

I love you all very much.

ii

BIOGRAPHY

Jonathan Conway was raised in Lancaster County, Pennsylvania. He attended the

University of Notre Dame in South Bend, Indiana for his undergraduate degree in chemical engineering. During his undergraduate career, he participated in several undergraduate research experiences with various professors at Notre Dame. After his junior and senior years at Notre Dame he had the opportunity to intern in the Upstream Manufacturing Science and Technology department at MedImmune, Inc. in Frederick, Maryland. His experiences with undergraduate research at Notre Dame and industrial experience at MedImmune inspired him to pursue a graduate degree in chemical engineering. He began his graduate studies under the direction of Dr. Robert Kelly in the fall of 2011. After completion of his graduate studies in May 2017, he will pursue a career in academia doing research in biological applications of chemical engineering.

iii

ACKNOWLEDGMENTS

First and foremost, I would like to acknowledge my academic advisor, Dr. Robert

Kelly. His guidance, assistance, and leadership have been instrumental in my development and success in graduate school. His belief and trust in me and my abilities has propelled me to achieve many of my research and professional successes. I would also like to acknowledge

Dr. Sara Blumer-Schuette for her mentorship in the early days of my graduate career. So much of scientific research relies on solid fundamental techniques, and the fundamentals I rely on often were taught to me by Sara. All of the members of the Kelly lab group over my years as a member have been amazing friends and colleagues and I appreciate your encouragement, help, and friendship both inside and outside the lab. The undergraduates that have worked with me have also been an integral part of my graduate career. I appreciate all of your trust in my instruction, hard work on my research projects, enthusiasm, and friendship.

I am also grateful for the collegial environment in the chemical engineering department and larger biotechnology community at NC State University. Working alongside the faculty, staff, graduate students, and undergraduates of our department and this university has been a pleasure. Beyond the NC State community, many members of the Department of

Energy BioEnergy Science Center (BESC) have been instrumental in my scientific success.

Of this group, I would especially like to thank the members of Dr. Mike Adams’s lab at the

University of Georgia and Vlad Lunin and Markus Alahuhta at the National Renewable

Energy Laboratory.

iv

I would finally, and most importantly, like to acknowledge my family. My parents,

Michael and Christine Conway, fostered my love of learning from an early age and have supported me in all of my endeavors. Their support and guidance have helped to make me the person I am today. My sister, Caitlin, is always there for me and I appreciate her friendship and support immensely. I am sorry for constantly boring you all with my latest trials and tribulations in the lab. Despite lacking most of the prerequisite training in chemistry, biology, and/or engineering, your willingness to listen has helped me think through my scientific experiments many times. Thank you for your unwavering support and encouragement. Without it, achieving this milestone would not have been possible.

v

TABLE OF CONTENTS

LIST OF TABLES ...... xi LIST OF FIGURES ...... xii CHAPTER 1 Lignocellulosic Biomass Deconstruction by the Extremely Thermophilic Caldicellulosiruptor ...... 1 Abstract ...... 2 Introduction ...... 3 Lignocellulosic Biomass ...... 3 Processing Lignocellulose into Biofuels ...... 5 Carbohydrate Active Enzymes (CAZymes) ...... 7 Glycoside Hydrolases (GHs) ...... 9 Carbohydrate Binding Modules (CBMs) ...... 11 Polysaccharide Lyases (PLs) ...... 13 Carbohydrate Esterases (CEs)...... 15 Immunoglobulin-like Superfamily...... 16 Surface Layer Homology (SLH) Domains ...... 18 Others (Adhesin, Expansin-like, and Swollenin) ...... 18 Biomass Deconstruction by Cellulolytic Microorganisms ...... 19 Biomass Degradation by the Genus Caldicellulosiruptor ...... 22 The Genus Caldicellulosiruptor ...... 22 The Caldicellulosiruptor Pan-CAZome ...... 25 Notable Caldicellulosiruptor CAZymes ...... 26 Linker Regions in Caldicellulosiruptor CAZymes...... 28 The Caldicellulosiruptor Surface Layer ...... 30 FN3 containing Caldicellulosiruptor proteins ...... 31 Comparison to Clostridium thermocellum ...... 32 Conclusions ...... 34 Future Trends ...... 35

vi

Metagenomics - Identifying Novel CAZymes ...... 35 Caldicellulosiruptor bescii Genetic System ...... 36 Metabolic Engineering of a CBP Thermophile ...... 37 Web Resources ...... 39 Acknowledgments ...... 39 References ...... 49 CHAPTER 2 Multi-Domain, Surface Layer Associated Glycoside Hydrolases Contribute to Plant Polysaccharide Degradation by Caldicellulosiruptor species ...... 77 Abstract ...... 78 Introduction ...... 79 Materials and Methods ...... 81 Bacterial Strains, Plasmids and Reagents ...... 81 Growth Media and Culture Conditions ...... 82 Expression of Recombinant Proteins in E. coli ...... 83 Purification of Recombinant Protein and Truncation Mutants ...... 84 Optimal pH and Temperature Determination ...... 85 Analysis of Oligosaccharide Products from Enzymatic Hydrolysis ...... 86 Confocal and Epifluorescence Microscopy ...... 86 Genetic Manipulation of C. bescii ...... 88 Hydrolysis of Hemicellulose Substrates ...... 89 Analysis of Solubilization in Culture Supernatants ...... 90 Results ...... 90 S-layer Homology (SLH) Domain Proteins in Caldicellulosiruptor Species ...... 90 C. kronotskyensis Produces an S-layer and SLH Enzymes Calkro_0111 and Calkro_0402 are Localized on the C. kronotskyensis Cell Surface ...... 92 Laminarinase Calkro_0111 Displays Endo- and Exo-glucanase Activity in Separate Catalytic Domains ...... 93 Biochemical Characterization of Calkro_0402 and Relatedness To Other CBM22/GH10/CBM9 Enzymes ...... 95

vii

Transcriptomic Analysis of SLH Domain Proteins in C. bescii and C. kronotskyensis . 96 Manipulation of the C. bescii S-layer by the Insertion (Knock On) of the Gene Encoding Calkro_0402 ...... 97 C. bescii Strain RKCB103 Xylan Solubilization ...... 98 Discussion ...... 100 Acknowledgements ...... 105 References ...... 117 CHAPTER 3 Functional analysis of the Glucan Degradation Locus (GDL) in Caldicellulosiruptor bescii reveals roles of component glycoside hydrolases in plant biomass deconstruction ...... 135 Abstract ...... 136 Introduction ...... 137 Experimental Procedures ...... 140 Bacterial Strains, Plasmids and Reagents ...... 140 Media and Culture Conditions ...... 142 Cloning and Purification of β-glucosidase Athe_0458 ...... 143 Construction of Genetic Knockout Strains of C. bescii ...... 144 Genomic Sequencing of Modified C. bescii Strains ...... 146 Secretome Analysis ...... 146 Construction of Enzyme Expression Strains of C. bescii ...... 147 Protein Expression and Purification from C. bescii Strains ...... 147 Solubilization of Lignocellulosic Biomass Substrates ...... 148 Biochemical Characterization of GDL Glycoside Hydrolases ...... 149 Results ...... 151 The C. bescii Glucan Degradation Locus ...... 151 Repetitive Nature of the GDL ...... 155 In vivo Analysis of GDL Enzymes through Targeted Gene Deletions ...... 157 Cellulose and Complex Plant Biomass solubilization by GDL GH Knockout Strains 159 Switchgrass Solubilization by Mixtures of GDL Knockout Strains ...... 163

viii

Expression and Purification of Glucan Degradation Locus Cellulases ...... 164 Biochemical Characterization of Athe_1860 ...... 166 In Vitro Analysis of Glucan Degradation Locus Enzyme Cocktails ...... 167 Discussion ...... 170 Acknowledgements ...... 178 References ...... 189 CHAPTER 4 Structural and Biochemical Characterization of Glycoside Hydrolase Catalytic Domains and Carbohydrate Binding Modules from Caldicellulosiruptor Biomass Degradation Enzymes ...... 203 Abstract ...... 204 Introduction ...... 205 Experimental Procedures ...... 208 Bacterial Strains, Plasmids and Reagents ...... 208 Media and Culture Conditions ...... 209 Recombinant Protein Production in E. coli...... 210 Purification of Proteins for Biochemical Characterization ...... 211 Construction of C. bescii Enzyme Expression Strains...... 212 Protein Expression and Purification from C. bescii Strains ...... 213 Purification of Proteins for Cystallography ...... 214 Crystallization ...... 214 Data Collection and Processing ...... 215 Structure Solution and Refinement ...... 216 Biochemical Characterization of Glycoside Hydrolases ...... 216 Results ...... 217 Calkro_0111 GH16 and Fascin Domain Containing Proteins ...... 217 Calkro_0111 GH16 and Fascin Domain Structures...... 218 Biochemical Analysis of the Calkro_0111 GH16-Fascin Protein ...... 219 Wai35_2053 GH10, CBM3, and GH48 Domain Structures...... 220 Biochemical Analysis of Wai35_2053 and Constituent Domains ...... 222

ix

Csac_0678 GH5 and CBM28 Structures ...... 223 Biochemical Analysis of Csac_0678 and Athe_0594 ...... 225 Discussion ...... 227 Acknowledgements ...... 232 References ...... 242

x

LIST OF TABLES

Table 1.1 - Caldicellulosiruptor Genomes ...... 44 Table 1.2 - Surface Layer Homology domain containing Caldicellulosiruptor proteins ...... 45 Table 1.3 - Fibronectin type III-like domain containing Caldicellulosiruptor proteins ...... 47 Table 1.4 - C. thermocellum and C. kronotskyensis extracellular CAZy domain inventory comparison ...... 48 Table 2.1 - Primers used in this study ...... 116 Table 3.1 - Characterized GDL Enzymes ...... 187 Table 3.2 - Specific Activity of Athe_1860 ...... 188 Table 4.1 - Proteins identified by blastp using Calkro_0111 GH16-Fascin sequence as the query ...... 240 Table 4.2 - GH16 domain-containing proteins identified by blastp using the Calkro_0111 Fascin domain sequence as the query ...... 241

xi

LIST OF FIGURES

Figure 1.1 - Biomass structure and degradation by selected Caldicellulosiruptor enzymes . 40 Figure 1.2 - The three biomass degradation mechanisms used by cellulolytic microorganisms ...... 41 Figure 1.3 - Caldicellulosiruptor extracellular pan-CAZome ...... 42 Figure 1.4 - Caldicellulosiruptor intracellular pan-CAZome ...... 43 Figure 2.1 - Catalytic CAZyme SLH Domain Proteins from the genus Caldicellulosiruptor ...... 107 Figure 2.2 - TEM and Immunofluorescence Microscopy of C. kronotskyensis ...... 108 Figure 2.3 - Biochemical Characterization of Calkro_0111 ...... 109 Figure 2.4 - Biochemical Characterization of Calkro_0402 ...... 110 Figure 2.5 - Transcriptional response of genes encoding SLH domain proteins from C. bescii and C. kronotskyensis...... 111 Figure 2.6 - Construction of Calkro_0402 knock-in C. bescii strain RKCB103 ...... 112 Figure 2.7 - Increased xylan solubilization and xylan attachment by Calkro_0402 knock-in C. bescii strain RKCB103 ...... 114 Figure 2.8 - Time course washed oat spelt xylan solubilization ...... 115 Figure 3.1 – The Glucan Degradation Locus in C. bescii ...... 179 Figure 3.2 – Dot plot showing the repeated DNA sequence of the C. bescii GDL ...... 180 Figure 3.3 – GDL knockout strains of C. bescii ...... 181 Figure 3.4 – Biotic Solubilization of Plant Biomass Substrates by GDL knockout strains . 182 Figure 3.5 – Switchgrass solubilization by mixed cultures of GDL knockout strains...... 183 Figure 3.6 – Recombinant expression of GDL Enzymes is C. bescii ...... 184 Figure 3.7 – Enzyme cocktails mimicking C. bescii wildtype and GDL knockout strains . 185 Figure 3.8 – Enzyme cocktails demonstrating the synergy between GDL GH enzymes. ... 186 Figure 4.1 - Selected multi-domain enzymes from Caldicellulosiruptor species ...... 233 Figure 4.2 - Crystal strucutres for Calkro_0111 GH16 and Fascin-like domain ...... 234 Figure 4.3 - Biochemical data on the Calkro_0111 GH16-Fascin protein ...... 235

xii

Figure 4.4 - Crystal structures for Wai35_2053 domains ...... 236 Figure 4.5 - Biochemical data on Wai35_2053 and its domains ...... 237 Figure 4.6 - Crystal structures for Csac_0678 domains ...... 238 Figure 4.7 - Biochemical data on Csac_0678 and Athe_0594 proteins ...... 239

xiii

CHAPTER 1

Lignocellulosic Biomass Deconstruction by the Extremely Thermophilic Genus Caldicellulosiruptor

Jonathan M. Conway, Jeffrey V. Zurawski, Laura L. Lee, Sara E. Blumer-Schuette and Robert M. Kelly

Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695-7905

Published in: Thermophilic Microorganisms (Li, F. ed.), Caister Academic Press, Norfolk, UK. pp 91-120 (2015)

1

Abstract

Extremely thermophilic bacteria from the genus Caldicellulosiruptor produce a variety of large multi-domain proteins for the attachment to and degradation of lignocellulosic biomass. The currently sequenced genomes from the genus

Caldicellulosiruptor encode 144 homologous groups of Carbohydrate Active enZymes

(CAZymes) containing 128 glycoside hydrolase (GH), 58 carbohydrate binding module

(CBM), 9 polysaccharide lyase (PL), and 11 carbohydrate esterase (CE) domains.

Extracellular CAZymes from these species have a distinct multi-domain architecture, with multiple catalytic and binding domains in the same protein. This promotes synergy between catalytic domains and enhances the ability of these bacteria to degrade biomass. A number of these extracellular multi-domain proteins are also cell surface associated via surface layer homology (SLH) domains. These SLH domain containing proteins are thought to help these species attach to the biomass substrate, while catalytic domains in the same proteins degrade it. Because of their highly effective biomass degradation strategy, Caldicellulosiruptor species are of particular interest for development into consolidated bioprocessing microorganisms for the processing of lignocellulosic biomass to biofuels.

2

Introduction

Lignocellulosic Biomass

Lignocellulosic biomass is abundant, inexpensive, and renewable, making it of interest as a feedstock for liquid biofuel production (1). Lignocellulosic biomass can be obtained from a variety of sources, including agricultural wastes (corn, small grain, and sugarcane residues), industrial forestry product wastes (logging residues, tree thinning, sawdust, and paper pulp), and dedicated biofuel crops (switchgrass, miscanthus, sorghum, and poplar). Two recent studies project the total United States production capacity of lignocellulosic biomass priced less than $60 per dry ton to reach 550-600 million dry tons annually by the year 2020, without significant impact to the environment or food supply

(2,3). This is projected to increase to about 770 million dry tons of production by the year

2030, supported largely by an annual 120 dry ton production increase in dedicated biofuel crops over those ten years (3). Similar analysis applies globally to regions where significant lignocellulose reserves are available.

Lignocellulosic biomass dry weight is comprised generally of about 35-50% cellulose, 20-35% hemicellulose, and 10-30% lignin (4,5). The composition of biomass varies significantly from species to species and within different tissues of a single species (6).

Figure 1.1A shows the general makeup of the polysaccharides, cellulose and hemicellulose, in biomass. Cellulose is a homologous polymer comprised of long chains of β-1,4 linked glucose molecules. These long chains are able to pack parallel to one another into ordered arrays 3-5nm thick, called crystalline microfibrils, which assemble into the main structural element of plant cell walls (7). The packing of these crystalline regions is sufficiently

3

compact to preclude the penetration of enzymes and even small molecules, water in particular. However, irregularities in the packing of the polysaccharide chains create amorphous regions, pits, micropores, and capillaries, making the cellulose chains accessible to small molecules and, in some cases, cellulases, enzymes that break down cellulose (8).

Hemicellulose, on the other hand, is a highly variable heterologous polymer of pentoses (e.g., xylan, arabinose) and hexoses (e.g., mannose, glucose, galactose), integrated into a β-1,4- or mixed β-1,3-1,4- linked backbone of repeating disaccharide units, which are then substituted with various side chain sugars (9). Hemicellulose composition varies across different species, for example, hardwoods contain mainly xylans, while softwoods contain mainly mannans, and grasses contain primarily arabinoxylan (5). This diverse composition necessitates an array of hemicellulases to recognize and cleave these linkages to completely degrade hemicellulose. Hemicellulosic glycans, particularly xyloglucan, are thought to non- covalently tether cellulose microfibrils in the cell wall, thereby increasing wall strength (9).

The non-carbohydrate fraction of plant biomass is lignin, a hydrophobic polymer synthesized by the oxidative coupling of the monomers p-hydroxyphenyl (H-), guaiacyl (G-), and syringyl (S-) lignin (10). Lignin attaches covalently to hemicellulose, filling in gaps between hemicellulose and cellulose to increase the strength and rigidity of fibrous plant tissue, and its hydrophobicity improves water and solute conduction through plant vascular tissue (10,11). These characteristics of lignin interfere with the access of enzymes to the polysaccharides of biomass and enable lignin to adsorb enzymes, impeding their activity

(12).

4

Processing Lignocellulose into Biofuels

The main challenge in utilizing lignocellulosic biomass as an energy source is a deficiency of low-cost technologies for effectively accessing polysaccharides within the plant cell wall and breaking them down into fermentable sugars (8,13). Traditional approaches to bioprocessing of lignocellulosic biomass involve three process steps: pretreatment, enzymatic hydrolysis, and fermentation of released sugars (14). Thermal and/or chemical processing of raw biomass is done during pretreatment, with the goals of disrupting the plant cell wall and removing and separating lignin, and sometimes hemicellulose, from the substrate (15).

Cellulase and hemicellulase cocktails are then added to hydrolyze the biomass polysaccharides to mono- and oligo-saccharides, which are fermented by a microorganism to a desired product, typically ethanol. Many variations of these steps have been developed, either to keep hydrolysis and fermentation separate, or combine them in strategies called simultaneous saccharification and fermentation (SSF), where enzyme cocktails are mixed with ethanologens (e.g., yeast) to achieve biomass conversion (14,16). However, conditions within the SSF reaction must be amenable to the microorganism, as well as the enzyme mixtures, which is often far from the specific optimum conditions of either entity.

Ideally, a low-cost processing strategy would combine lignocellulose degradation and fermentation of the released sugars to fuel in a single process step performed by a single microbe, a strategy referred to as Consolidated BioProcessing (CBP) (8). Currently, no naturally occurring microorganism performs the substrate utilization and product formation steps to the extent needed for CBP-based production of liquid fuels. However, efforts are underway to develop CBP microorganisms using two separate engineering strategies:

5

improving product yield from cellulolytic microorganisms, or adding cellulases to non- cellulolytic microorganisms with high product yield to enable substrate utilization (13,17).

Both of these strategies require better understanding of the intrinsic mechanisms that cellulolytic microorganisms use to efficiently attack and break down cellulose. This will help to identify the most promising candidate microorganisms and cellulases for CBP development.

To date, many mesophilic recombinant microorganisms have been developed as potential CBP microorganisms (18,19), but none can navigate the high variability of lignocellulosic feedstocks and have the robustness necessary to function under biologically unfavorable process conditions (20). Because of their ability to degrade cellulose at high temperatures and to utilize unpretreated biomass feedstocks, a number of cellulolytic, extremely thermophilic microorganisms show promise in overcoming the process barriers to

CBP. The high temperatures at which extreme thermophiles grow optimally (i.e., 70˚C or above) reduce the energy input for process cooling, facilitate volatile product recovery, and significantly limit the risk of microbial contamination by mesophilic methanogen and sulfate- reducing soil bacteria, likely to be present in unpretreated biomass feedstocks (21). In addition, the inventory of enzymes produced by these organisms provides a rich reservoir of highly thermostable, and often novel, enzymes for biomass processing.

Cellulolytic, thermophilic bacteria are rare, with many species clustered in the genera

Caldicellulosiruptor, Clostridium, Geobacillus, Acidothermus, and Thermoanaerobacter.

However, Calidcellulosiruptor is the only known genus with species that can fully deconstruct lignocellulose at temperatures above 70°C (20). A number of non-cellulolytic

6

thermophiles that produce carbohydrate active enzymes related to biomass conversion are also of interest for developing CBP microorganisms, including Thermotoga maritima,

Pyrococcus furiosus, Sulfolobus solfataricus, and Dictyoglomus thermophilum (22,23).

Currently, genetic and metabolic engineering of cellulolytic thermophiles, now in the early stages, could enable fine-tuning the enzyme inventory and improving product yields from these organisms, looking towards CBP as the ultimate goal (24-26). Environmental sampling and metagenomic approaches are also of interest to identify as yet undiscovered microorganisms and enzymes that could play a role in an engineered CBP biomass conversion system.

Carbohydrate Active Enzymes (CAZymes)

Microorganisms that degrade biomass substrates produce a large variety of extracellular cellulases, hemicellulases, and non-catalytic binding proteins for interacting with, and liberating sugars from, insoluble biomass substrates. These proteins are designated

Carbohydrate Active enZymes (CAZymes), and have a modular architecture that may contain a single domain or multiple domains from various enzyme classes or binding motifs.

These catalytic and binding domains that comprise CAZymes are classified into families according to amino acid sequence similarity, and cataloged in a database maintained at www.cazy.org (27,28). The CAZy database is ever expanding by grouping proteins into new module families and incorporating putative CAZymes from an increasing number of sequenced genomes. The CAZy database is curated to include biochemical characterization

7

and x-ray crystallographic structural information obtained for members of the various families (28).

Of the enzyme module classes cataloged in the CAZy database, the most abundant for biomass degradation are the catalytic glycoside hydrolases (GHs), which hydrolyze glycosidic bonds between various linked sugars, and the non-catalytic carbohydrate binding modules (CBMs), which bind sugars both soluble and insoluble. The CAZy database also includes other carbohydrate active domain classifications including: polysaccharide lyases

(PLs), carbohydrate esterases (CEs), glycosyltransferases (GTs), and a newly designated

‘Auxiliary Activities’ (AAs) classification (28). PLs in the context of biomass degradation are typically involved in the degradation of pectin. CEs are involved in the removal of ester linked moieties (e.g., acetyl or methoxy groups) attached to the sugars in biomass. GTs are enzymes that form glycosidic bonds, rather than degrade them. While they are produced by cellulolytic microbes, GTs, because they are not involved in biomass deconstruction, will not be discussed here. The AAs class contains a variety of modules that do not fit into the other

CAZy classes, but assist GHs, PLs, and CEs in accessing the polysaccharides of plant cell walls. Currently classified members of the AAs are predominantly from lignolytic and cellulolytic fungi, and the role of these modules in bacteria is currently not well characterized

(29). As such, the AAs class will not be discussed further here.

Currently, there are 133 GH, 69 CBM, 23 PL, and 16 CE families listed in the CAZy database. However, the constant curation of the database has deleted or reclassified some of the families as biochemical characterization data become available. For example, the family

GH 61 was assigned to the GHs based on weak glucanase activity of one member. But as

8

more members were characterized, the protein family has been reassigned as Auxiliary

Activity family 9, rendering GH 61 obsolete. Due to these types of changes, there are currently 126 GH, 66 CBM, 22 PL, and 15 CE families with classified members. A number of proteins that are known, or predicted, to have a carbohydrate active domain, but are not closely related to existing protein sequences, remain as “non-classified sequences” or “nc”, meaning they have not been incorporated into a numbered family classification. As sequencing of novel organisms continues and new protein sequences are introduced into the database, these are grouped into new families when sufficient similarity is identified among non-classified sequences. In addition to the GH family designations, some families have been organized into 14 clans (GH-A through –N), based on tertiary protein structure similarities between GHs of different families (30). In the same way, selected CBM families are organized into 7 ‘fold’ families, based on tertiary structure similarities (31,32). In the following sections, the role of each of these CAZy classified modules will be examined in relation to biomass attachment and deconstruction. A number of additional modules and proteins, also associated with CAZymes, will also be discussed.

Glycoside Hydrolases (GHs)

The enormous diversity of saccharides in plant biomass necessitates the complementary action of a variety of GHs to dismantle them. Nearly all GHs exhibit specificity based on two structural elements of the carbohydrates they target: the configuration of the cleaved bond (α or β) and the ring size of the released sugar molecule

(30). GHs hydrolyze glycosidic bonds, in almost all cases, by acid-base chemistry using two

9

carboxylic acid moiety containing amino acids in the active site, most often glutamic acid or aspartic acid (30,33). Glycosidic bond hydrolysis proceeds by either an inverting mechanism, where the anomeric carbon is inverted by direct displacement of the leaving group with water, or a retaining mechanism where the anomeric carbon orientation is retained by double displacement through a glycosyl-enzyme intermediate (1,33). The transition state of GH-catalyzed hydrolysis reactions involve the distortion of the pyranose ring into a puckered conformation to allow nucleophilic attack and bond hydrolysis (34).

These transition states have been examined for a number of GHs through various crystallographic and computational methods (34-37).

GHs that break down polysaccharides are also classified as endohydrolases or exohydrolases, based on their ability to cleave glycosidic bonds either in the middle, or at the end of polysaccharide chains, respectively (38). In the case of cellulose hydrolysis, endo-β-

1,4-gluconases cut the amorphous regions of cellulose to generate new cellulose chain ends, from which exo-β-1,4-glucanases work to remove small oligosaccharides, mainly cellobiose, resulting in the classification of many exo-β-1,4-glucanases enzymes as cellobiohydrolases.

Exohydrolases can further be divided into two classes, based on their preference for acting at the reducing or non-reducing end of the polysaccharide chains they cleave (39). Several GHs have been identified that display both endo- and exo- modes of action, so characterization based on enzymatic mechanism and structural properties result in more specific classifications (1,38). Once soluble sugars are liberated from polysaccharides (e.g., cellodextrins from cellulose), another set of GHs, such as β-glucosidases or cellobiose phosphorylases, break these soluble carbohydrates into monomers for metabolism (8,40,41).

10

The complex structure and diversity of sugars and bonds in biomass illustrate the need for a high degree of synergy between many GHs to efficiently hydrolyze the substrate.

GHs that hydrolyze the β-1,4-glycosidic bonds found in cellulose and cellodextrins are most prevalent in families GH5, GH6, GH7, GH8, GH9, GH12, GH44, GH45, GH48, GH74, and

GH124 (8). Most xylan-degrading enzymes are concentrated in GH5, GH8, GH10, and

GH11, but bi-functional GHs have been identified in GH7, GH16, GH43, and GH62, families that are typically associated with the degradation of other polysaccharides (42). The

α-1,4-glucosidic bonds, found in polysaccharides related to starch, are typically degraded by members of GH13, GH57, GH70, and GH77 (43,44). Many other GH activities are relevant to biomass degradation, including: α-galactosidases in GH27 and GH36; β-mannanases in

GH5 and GH26; β-mannosidases in GH2; α-arabinofuranosidases in GH43, GH51, GH54, and GH62; and α-glucuronidases in GH67 (27,45-48). Representative crystal structures have now been solved for 100 of the existing 126 GH families (49).

Carbohydrate Binding Modules (CBMs)

CBMs are often components of catalytic proteins, which also contain catalytic GH,

PL, or CE domains, or components of dedicated binding proteins that facilitate the attachment of microorganisms to biomass substrates. CBM domains enhance the function of the catalytic domains to which they are linked by increasing the proximity of the catalytic unit to the substrate, targeting specific areas of the substrate, and disrupting substrate interactions to enhance degradation (31). When CBMs are appended to or removed from catalytic protein units, GHs linked to these CBMs can have increased hydrolytic efficiency

11

(50,51). This suggests that the CBM units increase the effective enzyme concentration at the substrate surface, thereby increasing efficiency. Other experimental data suggest that CBMs play a role in targeting GHs to specific areas of the substrate that differ in susceptibility to hydrolysis (52,53). Microorganisms that grow on biomass attack a diversity of sugars.

Congruency between CBM domain binding and GH domain catalytic specificity within the same protein is essential for efficient degradation by targeting particular sugars within the diverse plant cell wall (32). There is also significant evidence that CBMs play a role in disrupting or penetrating the structure of polysaccharides, enhancing the ability of the catalytic subunit to degrade the substrate (54-56). One model of cellulose microfibril disruption postulates CBM adsorption at a micro-defect in the cellulose chain packing, followed by penetration of the CBM and water molecules into the mircofibril, disrupts the hydrogen bonding between stacked cellulose chains, loosening individual chains for enzymatic degradation (55). To better understand the roles of CBMs, it is important to understand the types of sugars that they bind.

CBM binding is classified into types, based on the recognized ligands: Type A surface binding, Type B glycan chain binding, and Type C soluble sugar binding (31). Type

A CBMs have planar binding sites, which reflect the flat surface of crystalline substrates, such as cellulose or chitin (32). Type B CBMs include the most CAZy classified CBM families, and bind to glycan chains via a cleft or groove binding site (31). The cleft or grove site limits the specificity of these CBMs to polysaccharide chains, for example, binding the free cellulose chains in amorphous microfibril regions, while no binding affinity is seen to crystalline regions of cellulose (57). Similarly, the steric restrictions of the Type C CBM

12

binding site only allow binding to soluble mono-, di-, or tri-saccharides (32). Aromatic amino acids seem to play a universal role in carbohydrate binding in all CBM types, especially tryptophan and tyrosine, which stack against, in Type A CBMs, or sandwich, in

Type B and C CBMs, the pyranose ring of carbohydrate targets (31,58). In addition, site- directed mutations have shown that hydrogen bonds between CBM polar groups and sugar targets are key to CBM affinity in Type B and C CBMs, but only play a minor role in Type A

CBMs (31,59).

Polysaccharide Lyases (PLs)

PLs cleave uronic acid-containing polysaccharides and occur in a diversity of genomes from bacteria to higher eukaryotes. However, they are rarer in these genomes than

GHs, presumably because only a small portion of polysaccharides contain uronic acids (60).

The substrates for PLs can be divided generally into three groups, based on the uronic acid they contain at the PL cleavage site. Galacturonic acid containing polysaccharides including pectin and pectate; glucuronic/iduronic acid containing polysaccharides including heparin, chondroitin, and hyaluronan, among others; and mannuronate/guluronate containing polysaccharides including alginate are all degraded by PLs (61). The first of these PL degraded groups is only one that is significant in the context of biomass utilization, because, pectin and pectate are significant components in the primary cell walls of all dicots and some monocots (62). Glucuronic acid containing glucuronoxylan and glucuronoarabinoxylan are significant components of the cell walls of dicots and grasses respectively (9). However, the

13

single glucuronic acid moiety branching on the xylan backbone in these cases is not catalytically removed by PLs, but rather by GH4 or GH67 α-glucuronidases (47,48).

Pectin consists of three different polysaccharide motifs: 1) homogalacturonan (HGA), the main backbone of α-(1,4)-linked D-galacturonic acid, 2) rhamnogalacturonan I (RGI), which has a [2)-L-rhamnose-α-(1,4)-D-galacturonic acid-α-(1,] linked backbone that is highly branched from rhamnose with L-arabinose and D-galactose, and 3) rhamnogalacturonan II (RGII), which is a highly variable and complex branched region of the α-(1,4)-linked D-galacturonic acid backbone (62). Microorganisms that degrade plant biomass typically contain only PLs from the pectin/pectate lyase families PL1, PL2, PL3,

PL9, PL10, and PL22 and from the rhamnogalacturonan lyase families PL4 and PL11 (61).

Uronic acid-containing polysaccharides are cleaved by PLs via a β-elimination mechanism that generates a new reducing chain end and a hexenuronic acid moiety at the non-reducing chain end after cleavage (60). This occurs without the involvement of water, in contrast to the GH mechanism that does incorporate water to maintain an -OH group on the non-reducing chain end (63). Of the 22 PL families in the CAZy database, all but one, PL17, have a solved structure for a representative member and these structures group into eight

‘fold’ types (60). In pectate lyases with solved crystal structures, an Arg or Lys residue acts as the catalytic base (64,65). For the PL3 pectate lyase from Caldicellulosiruptor bescii, it was found that in digestion reactions of unpretreated switchgrass, where the PL3 was mixed with C. bescii cellobiohydrolase A (CelA), lower CelA loading was needed to achieve the same glucan conversion accomplished by higher loadings of CelA alone (65). This suggests that PLs play an important role in making areas of unpretreated biomass substrates more

14

readily accessible to GHs of other biomass degrading CAZymes, thereby leading to more complete substrate utilization.

Carbohydrate Esterases (CEs)

As a protection against microbial attack, many plant cell wall polysaccharides, particularly hemicelluloses and pectin, are highly substituted with a variety of side chains, including acetyl, feruloyl, and methoxy groups, linked to the sugar backbone by ester linkages (66). Esterification with phenolic acids, such as ferulic acid, can further react to form cross-linkages between adjacent cell wall polymers, which improve the mechanical strength of the plant cell wall, although less frequent than polysaccharide esterification with acetic acid (67). CEs remove these ester substitutions, allowing GHs and PLs access to the polysaccharide backbone for degradation. The synergistic relationship between CEs and

GHs or PLs has been observed frequently (68,69); for example, in corn stover degradation, the removal of acetyl substituting groups via the addition of an acetyl xylan esterase from CE family 5 improved enzymatic conversion of xylan by an endoxylanase from GH 11 (70). CE domains can also be appended to GH or PL catalytic domains, resulting in bi-functional activity of a single enzyme, as observed in the CE1-GH11 containing XynS20E enzyme from

Neocallimastix patriciarum (71).

CEs can be grouped based on the substituted moiety that they remove: cinnamoyl esterases, including feruloyl esterases, are found in family CE1, although many have not yet been assigned a CE family (CEnc); acetylesterases, including acetylxylan esterases, belong to families CE1, CE2, CE3, CE4, CE5, CE6, CE7 and CE16; and, pectin methylesterases are in

15

family CE8 (27,66,72,73). The active center of most CEs is based on a catalytic serine in the consensus sequence (Gly-x-Ser-x-Gly), which is part of a catalytic triad of Ser-His-Asp/Glu or catalytic dyad of Ser-His alone (67). In contrast, the catalytic center of pectin methylesterases center around a catalytic Asp residue stabilized by Arg (73). The reaction mechanism for CEs is similar to that of lipases involved in the breakdown of triglyceride fats, where the substrate binds to the active serine, an alcohol is released, and an acyl-enzyme intermediate remains; subsequently, attack of a nucleophile like water and resolution results in product and the free enzyme (74,75).

Immunoglobulin-like Superfamily

Two additional domain types, fibronectin type III (FN3) and bacterial immunoglobulin-like (BIg), both of which belong to the superfamily of immunoglobulin-like

(Ig-like) proteins in Pfam clan CL0159, have been identified as members of the multi-domain architecture of a number of Caldicellulosiruptor proteins (76,77). Members of this immunoglobulin-like superfamily typically only have 10-30% sequence identity, but share a similar structure of two face-to-face β-sheets with a hydrophobic core (76,78). FN3 domains are common components of eukaryotic proteins, but within bacterial species they have been identified almost exclusively in extracellular glycohydrolases (79,80). There remains some disagreement as to how to classify FN3 domains and what their role is in bacterial enzymes.

Some studies have characterized these domains as FN3 domains and “X domains” (because of their unknown function) and have tried to differentiate phylogenetic groups of X domains from FN3 domains (81). Work with the cellulosomal cellobiohydrolase CbhA from C.

16

thermocellum, containing domains from the N-terminus: CBM4-BIg-GH9-FN3-FN3-CBM3-

Dockerin, showed increased hydrolysis of sugar from acid-swollen cellulose by protein truncations containing one or both FN3 domains and “exfoliation” of cellulose chains by the

FN3 domains alone (80). Work by others to solve the crystal structures and characterize the same FN3 domains from C. thermocellum CbhA suggest that these domains play a role in improving the thermostability of the enzyme, and do not play a role in disrupting the substrate (82). Other proposed roles for FN3 domains include promoting “mechanical elasticity” of the enzyme, functioning as a CBM, and serving as a “spacer” to maintain the optimal distance between the other components of multi-domain enzymes (78,81).

Bacterial immunoglobulin-like domains appear to be involved in cell-to-cell adhesion and extracellular glycohydrolysis, but a definitive role of these domains remains elusive (83).

The structure of the BIg domain from C. thermocellum CbhA has been solved (84,85). The domain shows very close association with, and is required for, the catalytic ability of the

GH9, suggesting a role in the structural stabilization of the catalytic domain (84). Similarly, an Ig-like domain in a thermostable esterase from Thermotoga maritima was found to play a role in the multimerization of the catalytic domain into a hexameric structure for increased activity and thermostability (86). Overall, very little characterization and few structures have been solved for members of the FN3 and BIg families, but the limited work that has been done suggest they play a role in both enzyme stabilization and substrate binding, both of which are essential for robust, thermostable, biomass-degrading enzymes.

17

Surface Layer Homology (SLH) Domains

SLH domains are highly conserved, 50-60 amino acid sequences that are on the N- or

C-terminus of proteins, and attach non-covalently to the cell surface within the surface layer

(S-layer), a planar array of surface protein (87). S-layers are found in almost every taxonomic group of bacteria and archaea, and in most species are predominantly comprised of a single S-layer protein (SLP) that assembles into a protein lattice at the cell surface with oblique, square, or hexagonal symmetry (87). This structured protein array may serve many functions, including as a molecular sieve or protective coat, or an attachment site for extracellular enzymes and molecular moieties involved in cell adhesion or surface recognition (88,89). Proteins, other than the SLP, that contain SLH domains can be tethered to the cell surface within this array, while the remainder of the protein is free to serve its function (90). CAZymes containing SLH domains can mediate the attachment of microorganisms to biomass substrates, by attaching to both the cell via SLH domains and to the substrate via CBM domains; they can also contribute to biomass deconstruction at the cell surface with catalytic domains (91-93). In gram positive bacteria, such as

Caldicellulosiruptor and Clostridium species, SLH domains attach to the 15-30 nm thick peptidoglycan cell wall and its associated secondary cell wall polymers (e.g., teichoic and teichuronic acids) (88).

Others (Adhesin, Expansin-like, and Swollenin)

Expansins are proteins found in plants that participate in cell enlargement by loosening cell wall polysaccharides without breaking them, allowing plant cell walls to

18

“creep” and expand, which facilitates plant growth (94-96). Expansin-like gene sequences appear in the genomes of a small number of phylogenetically diverse bacteria and, from experimental evidence, display similar plant cell wall loosening characteristics to plant expansins (97,98). Expansin-like proteins also include a set of fungal proteins, originally identified in Trichoderma reesei, called swollenins based on their ability to swell cotton fibers without degrading them (99,100). These non-hydrolytic proteins are of particular interest as a biomass pretreatment in biofuels processes for their role in cellulose amorphogenesis, initial disruption of the crystalline microfibral structure prior to hydrolysis

(55). Work is being done to better quantify their effect on cellulose and their ability to enhance the action of CAZymes (101). Recently, Caldicellulosiruptor species have been shown to express two classes of novel adhesin proteins that are enriched in the crystalline cellulose bound fraction of cultures analyzed by proteomics analysis (91). These adhesin proteins have no homology outside the Caldicellulosiruptor genus and are located downstream of a type IV pilus locus, suggesting association with type IV pili to attach to biomass substrates. This novel group of adhesin proteins have no hydrolytic activity, but may play a role in cellulose amorphogenesis, similar to expansin-like proteins (Blumer-

Schuette et al., unpublished).

Biomass Deconstruction by Cellulolytic Microorganisms

Cellulolytic microorganisms use three different degradation mechanisms, shown in

Figure 1.2, to coordinate their cellulase and hemicellulase inventory to break down biomass: free enzymes, cellulosome multi-enzyme complexes, and large multi-domain enzymes (102).

19

Cellulolytic species use at least one, but often more than one, of these degradation mechanisms to effectively degrade lignocellulosic biomass. Aerobic cellulolytic fungi and bacteria typically produce large amounts of individual extracellular free enzymes, but nearly all cellulolytic species use free enzymes to some extent in their biomass degradation strategy

(102,103). These free enzymes almost always contain a substrate-binding CBM domain attached to a catalytic GH domain by a small flexible linker peptide (104). While free enzymes are effective at degrading crystalline cellulose, multi-enzyme cellulosome complexes or large multi-domain enzymes provide the opportunity for coordination of multiple catalytic domains for a more synergistic degradation strategy.

A number of anaerobic bacteria (e.g., Clostridium spp. and Ruminococcus spp.) and fungi (e.g., Chytridomycetes) produce large multi-enzyme complexes on the cell surface called cellulosomes, the most studied of which is from Clostridium thermocellum (105). The cellulosome from C. thermocellum displays superior cellulose degradation activity to free enzyme cocktails and degrades biomass by a mechanism distinct from free enzymes in which the cellulosome physically separates cellulose microfibrils from the larger particle to improve digestion (106). Cellulosomes are constructed around a large primary scaffoldin protein that acts as a scaffold for the enzyme units that degrade biomass. An example of a C. thermocellum cellulosome constructed around the main primary scaffoldin protein, CipA, is shown in Figure 1.2. The primary scaffoldin protein contains many cohesin subunits and typically one CBM domain, which mediates attachment of the large cellulosome complex to the substrate. Enzymes containing GH, CBM, and terminal dockerin modules anchor to cohesin domains on the primary scaffoldin by high affinity, calcium-dependent, type-I

20

cohesin-dockerin interactions (102). The primary scaffoldin attaches to an anchoring scaffoldin protein via a type-II cohesin-dockerin interaction that is divergent from the type-I primary cohesin-dockerin interaction between the scaffoldin and enzyme subunit (105). The anchoring scaffoldin contains terminal surface layer homology (SLH) domains that facilitate attachment of the entire cellulosome complex to the cell surface (105).

While the paradigm describing cellulosome assembly is modeled after the C. thermocellum version, further investigation into the cellulosomes of C. thermocellum and other species reveal a rich array of assembly motifs to increase the number of enzyme attachments or incorporate additional components to the cellulosome complex (102,105).

For example, in C. thermocellum, five cell surface attached scaffoldins, and one scaffoldin that is not cell associated, have been identified; these vary in assembly structures from attaching directly to a single enzyme subunit to attaching 7 primary scaffoldins harboring up to 63 enzymes (107). In contrast, Acetivibrio cellulolyticus produces a “three-tiered cellulosome”, comprised of 83 enzymes attached in a complex to the cell surface via three interacting scaffoldins, one of which also contains a catalytic GH9 module (105).

Cellulosomes provide a number of advantages to the microorganisms that produce them, including increased enzymatic synergism enhanced by favorable assembly and spacing of enzymatic and binding components, and proximity of the cell to the substrate for monitoring the microenvironment and uptake of released sugars for metabolism (1).

Other anaerobic bacteria, particularly members of the genus Caldicellulosiruptor, do not coordinate enzymes into large multi-protein cellulosome complexes, but rather use large multi-domain proteins, which can contain multiple catalytic and binding domains, to

21

efficiently degrade biomass (91). These multi-domain proteins are characterized by their complex architecture of multiple GH, PL, CE, CBM, and/or other accessory domains in a single peptide. Several of these multi-domain enzymes contain two distinct GH domains, creating a multi-functional protein, where it is possible for substrate intermediates to be mobile between the two GH module active sites resulting in increased synergism (108,109).

The inclusion of SLH domains at either the N- or C-terminus of some of these proteins allows for attachment of the protein within the S-layer of the cell, while CBM domains in the same protein can mediate attachment to biomass to hold the cell in close proximity to the substrate (91). These complex multi-domain proteins have been shown to be more active than the sum of the constituent domains as separate proteins, suggesting this multi-domain enzyme strategy for degrading biomass provides an advantage to the microbes that produce them (51,108,110).

Biomass Degradation by the Genus Caldicellulosiruptor

The Genus Caldicellulosiruptor

Caldicellulosiruptor species are extremely thermophilic, gram positive, anaerobic, and asporogenous bacteria that are capable of breaking down a wide range of polysaccharides (111). Currently, the closed genomes of eight Caldicellulosiruptor species, shown in Table 1.1, are available in the National Center for Biotechnology Information

(NCBI) database (www.ncbi.nlm.nih.gov). In addition, two Caldicellulosiruptor genomes,

C. acetigenus and Caldicellulosiruptor sp. F32, currently have permanent draft sequences and are listed as assemblies in the NCBI database. Other genomes that are currently

22

unrelated include those for Caldicellulosiruptor species strain Rt8.B8 and strain Wai35.B1.

These species are globally distributed through North America, Iceland, Russia, and New

Zealand, however, while the genomes of species from similar locations share highly syntenous blocks, cellulolytic capacity does not correlate with the isolation location (91).

Caldicellulosiruptor species have optimum growth temperatures between 70°C and

78°C, and certain species represent the most thermophilic, cellulolytic bacteria known (111).

One of the most extensively studied species in the genus is C. saccharolyticus, which has a large inventory of GHs in its genome that are differentially transcribed to adapt to varying substrates. These GHs metabolize a variety of monosaccharides, both pentose and hexose sugars, simultaneously without regulation by carbon catabolite repression (108,112,113).

Another well studied species, C. bescii, has been shown to grow on industrially relevant loadings of unpretreated biomass and is able to digest up to 85% of insoluble unpretreated switchgrass, while simultaneously deconstructing lignin, cellulose, and hemicellulose

(114,115). Caldicellulosiruptor species ferment the simple sugars liberated from the biomass substrate to molecular hydrogen, lactate, acetate, and a small amount of alcohols. A recent review of Caldicellulosiruptor metabolism discusses both the intermediary metabolism of these species for generating energy and reducing equivalents, as well as the metabolism for fuel production (116). These characteristics make Caldicellulosiruptor species attractive candidates for development into consolidated bioprocessing (CBP) microorganisms.

Comparison of the eight closed Caldicellulosiruptor genomes revealed that the core genome, genes contained in the genomes of all eight species, includes 1543 ORFs, but is non-cellulolytic (91). The pan genome, the collection of all genes in all species (4009

23

ORFs), was found to be open, indicating that sequencing of new species is expected to reveal novel genes not yet identified as part of the pan genome (91). Within the genus, cellulolytic capability varies greatly with C. bescii, C. kronotskyensis, C. obsidiansis, C. lactoaceticus, and C. saccharolyticus having higher cellulolytic ability than the other species with completed genomes (91). Comparison of these five highly cellulolytic species reveals three extracellular GH proteins present in all five genomes: GH9-CBM3-CBM3-CBM3-GH48

(often referred to as CelA), GH74-CBM3-CBM3-CBM3-GH48, and GH9-CBM3-CBM3-

CBM3-GH5. These three CAZymes are implicated as being important for crystalline cellulose degradation, and in particular, the presence of GH48 and CBM3 domains appear to be essential for crystalline cellulose degradation, because the weakly cellulolytic species do contain GH5 and/or GH9 enzymes, but no GH48 domains (91,111). In addition, proteomic analyses of C. bescii and C. obsidiansis identified the cellulolytic inventory of multi- functional, multi-domain enzymes as primarily containing GH5, GH9, GH10, GH43, GH44,

GH48, and GH74 domains, as well as highly conserved CBM3 domains (117). The many multi-domain enzymes, found in the genus Caldicellulosiruptor, that contain multiple catalytic and binding domains in a single protein are thought to have evolved by domain shuffling (118,119). These Caldicellulosiruptor multi-domain enzymes are often co-located in areas of the genome, including one termed the “glucan degradation locus” (GDL), which contains many of the CBM3-containing enzymes, and a separate xylanase locus (91,120).

24

The Caldicellulosiruptor Pan-CAZome

The nine Caldicellulosiruptor species genomes listed in Table 1.1 encode for over

600 individual CAZymes that can be binned into 144 groups of protein homologs in the pan-

CAZome. These 144 proteins contain 128 GH domains representing 44 families and one non-classified sequence, 58 CBM domains representing 17 families, 11 CE domains representing 7 families, and 9 PL domains representing 4 families and 2 non-classified sequences. To separate these 144 proteins into those involved in extracellular biomass processing and those involved in intracellular sugar catabolism, SignalP4.1 (input settings: gram-positive bacteria, sensitive D-cutoff=0.42, no TM regions) was used to predict signal peptides at the protein N-terminus, which is indicative of protein secretion (121). Using this method, 60 of the 144 CAZymes were identified as extracellular, and the remaining 84 were identified as intracellular. A graphical representation of the domain structure of each of the proteins found in the Caldicellulosiruptor extracellular pan-CAZome is shown in Figure 1.3.

Similarly, the intracellular pan-CAZome is shown in Figure 1.4. These figures show the hierarchical structure of the pan-CAZome, where the 7 extracellular and 26 intracellular core

CAZymes, found in all nine species, are shown at the top, with the boxes of increasing size encompassing all the proteins found in five, three, two, and one or more species, respectively. Put another way, the proteins found inside the box for three or more species, but outside the box for five or more species, have homologous proteins in three or four of the nine Caldicellulosiruptor species.

Taking a closer look at the extracellular pan-CAZome reveals the multi-domain enzyme strategy Caldicellulosiruptor species use to degrade biomass. These extracellular

25

proteins contain between 1 and 9 domains, with the average protein having 2.8 domains. In addition, 13 of the 60 extracellular proteins contain more than one catalytic domain from the

GH, PL, or CE families, which allows for bi-functional enzymes and an increased level of catalytic synergism from individual catalytic proteins. This multi-domain extracellular

CAZyme profile is in stark contrast to the CAZyme profile of the intracellular proteins, where all but two of the 84 proteins contain only one CAZy domain. This is not particularly surprising, as intracellular CAZymes are involved in the catabolism of mono- and oligo- saccharides that have already been liberated from the recalcitrant biomass substrate and transported into the cell.

Notable Caldicellulosiruptor CAZymes

A recent review (120) catalogs the biochemically characterized CAZymes from

Caldicellulosiruptor species. The role that a selected number of these proteins have in degrading a biomass substrate are shown in Figure 1.1B, including: β-1,4-glucanase (CelA)

GH9-CBM3-CBM3-CBM3-GH48 (122-124), β-1,4-glucanase and β-1,4-xylanase GH5-

CBM28-SLH (91), β-1,4-xylanase CBM22-CBM22-GH10 (108,125,126), β-1,4-mannanase

CBM27-CBM27-CBM35-GH26 (127,128), α-L-arabinofuranosidase GH43-CBM22-GH43-

CBM6 (108,119), and pectin lyase PL3 (65). Also shown are two CE proteins and one PL protein whose properties have not been biochemically characterized, but are inferred from

CAZy family classifications. These include the CE4 removing acetyl groups, a CE1-CBM13 protein disrupting the ferulic acid cross-linkage between xylan chains, and PL9-CBM3-

CBM3 degrading the α-1,4-linked D-galacturonic acid backbone of pectin.

26

One of the most noteworthy and novel enzymes produced by the five highly cellulolytic Caldicellulosiruptor species is CelA (GH9-CBM3-CBM3-CBM3-GH48). This enzyme was first characterized from C. bescii (Athe_1867) as having a temperature optimum of 85°C and pH optimum of 5.0-6.0 on Avicel, a crystalline cellulose substrate, with additional weak activity on oat spelt xylan (122). Proteomics work on the secretome of C. bescii revealed that CelA is the dominant cellulase produced by the species (117). More recent work with various truncations of CelA have shown that the GH9 domain possesses the main catalytic activity on cellulose substrates, and the GH48 domain works synergistically with the GH9 to increase sugar release (124). CelA has also been demonstrated to degrade the biomass substrates corn stover and switchgrass; while it appears to be inhibited by cellobiose, when a β-D-glucosidase was added with CelA, 100% glucan conversion from

Avicel was achieved after 7 days at 75°C (123). Additional work to elucidate the degradation mechanism of CelA showed that, unlike many cellulases which work from the ends of crystalline cellulose microfibrals, CelA burrows in the middle of the cellulose microfibrals creating cavities approximately 15-20 nm wide and 15-30 nm long (123). This burrowing mechanism promotes the formation of new cellulose fiber chain ends that could be synergistically digested by other cellulases produced by Caldicellulosiruptor species. CelA is among the largest, most thermostable, and most active enzymes yet identified for the degradation of crystalline cellulose.

27

Linker Regions in Caldicellulosiruptor CAZymes

CelA is located in the Caldicellulosiruptor “glucan degrading locus” (GDL), along with a number of other CBM3-rich CAZymes (91). Another feature of the genes in the GDL is that the linkers between the domains of these multi-domain proteins are highly Pro/Thr- rich (108). Linkers have been a noted feature in a number of CAZymes, often with high Gly,

Ser, Pro, and Thr content, and are thought to be unfolded regions of disorder to allow flexibility between the folded ordered CAZy domains (129). To probe the extracellular

Caldicellulosiruptor CAZyme for these flexible linker sequences, the extracellular CAZymes from C. kronotskyensis, the species containing the most overall CAZymes, were submitted to the SEG segmentation algorithm, an online predictor of low complexity disordered regions

(130). SEG was used to identify the low complexity regions from these proteins, and further analysis was used to evaluate these regions for high Pro/Thr content. In C. kronotskyensis, all of the CAZymes located in the GDL from Calkro_0850 (CelA) to Calkro_0864 contain highly Pro/Thr-rich linkers, and, furthermore, these were the only CAZymes in the genome to exhibit this characteristic. Another potential linker motif was identified in Calkro_0402 in the region between the SLH domains and the reminder of the protein, where the SEG predicted region of disorder is highly Gln/Thr-rich. Flexible linkers that are Asn-rich or Gly- rich, instead of the typical Pro/Thr-rich, have been identified in CAZymes from the ruminal fungus, Neocallimastix patriciarum (71,131). CAZyme linker length and flexibility has been probed by x-ray scattering, identifying a number of conformations, including those where the linker is extended, suggesting these regions are highly flexible (132,133). This characteristic of the linkers promotes synergy between the various linked domains, as each

28

domain has the flexibility of movement to perform its role, binding or enzymatic, in the larger multi-domain protein.

Another feature of these Pro/Thr-rich linkers is that that they are thought to be the sites of glycosylation of these Caldicellulosiruptor proteins. The Ser and Thr residues in linker regions of CAZymes are often O-glycosylated, which has been primarily studied in the

CAZymes of Trichoderma reesei (129,134). Native purified CelA (Athe_1867) from C. bescii appears to be larger than the predicted molecular weight by gel electrophoresis (194 kDa predicted, 230 kDa observed), suggesting that the protein in glycosylated (122,124).

Glycosylation has also been implicated for other CAZymes from the GDL: Athe_1857

(GH10-CBM3-CBM3-GH48), Athe_1865 (GH9-CBM3-CBM3-CBM3-GH5), and

Athe_1859 (GH5-CBM3-CBM3-GH44), in a study of cell-free enzymes from C. bescii

(135). In addition, glycosylation has been observed for components of the cellulosome in C. thermocellum and other cellulosomal microbes (103,136). This glycosylation may be a method to protect against proteases, a potential problem in the low complexity sequences of linker regions between domains in Caldicellulosiruptor enzymes (137). Alternatively, glycosylation may play a role in the dynamic binding to carbohydrate substrates and processive action of CAZymes, as was demonstrated by molecular dynamic simulation of

Trichoderma reesei cellobiohydrolases Cel6A (GH6-CBM1) and Cel7A (GH7-CBM1)

(129). Glycosylation was also found to be an important factor in the rigidity of the linker in

Cel45 from Humicola insolens, where a mutated linker with less glycosylation was found to be more rigid by x-ray scattering (132).

29

The Caldicellulosiruptor Surface Layer

A number of the multi-domain CAZymes from the genus Caldicellulosiruptor also contain S-layer Homology (SLH) domains to localize these proteins at the cell surface within the S-layer. This unique group of multi-domain proteins includes both catalytic proteins containing GH5, GH10, GH16, GH43, GH55, GH87, GHnc, PL9, and PL11 domains and non-catalytic proteins containing CBM and immunoglobulin-like domains (91). A list of the

Caldicellulosiruptor SLH domain containing proteins is shown in Table 1.2 and includes protein loci, amino acid length, and domain structure. These SLH multi-domain proteins are present in all nine sequenced Caldicellulosiruptor species and a number of these proteins are unique to a single species. Because these proteins are localized to the cell surface, the interface between the cell and substrate, it is believed that these SLH multi-domain proteins play a major role in the attachment to and degradation of lignocellulosic biomass by

Caldicellulosiruptor species.

As stated previously, the S-layer in most microorganisms is made up of a single S- layer protein (SLP). Proteomic and transcriptomic experiments implicate Csac_2451, from

C. saccharolyticus, and its homologs through the other Caldicellulosiruptor species, as the main SLP (91,111,138,139). Nearly all of the SLH proteins in Table 1.2 are extracellular by the SignalP4.1 prediction, supporting the idea that these proteins are localized outside the cell at the cell surface. Those that do not have predicted signal peptides are truncated at the N- terminus, unlike homologs in the same group, which likely explains the lack of a predicted signal peptide. The SLH domains of these proteins are either at the N-terminus and C- terminus, with only one group of homologs having SLH domains between two CBM

30

domains. The location of these domains at either terminus suggests that these proteins are tethered to the cell by the SLH domain containing end and then allowed to swing out away from the cell to interact with biomass with the remaining domains.

Csac_0678 (GH5-CBM28-SLH-SLH-SLH) is the only catalytic SLH protein to be conserved through all known Caldicellulosiruptor species. Csac_0678 was found to be optimally active at 75°C and pH 5.0 on a variety of substrates containing β-1,3/4-glucan, β-

1,4-xylan, and β-1,4-glucan linkages including cellulose (91). Other Caldicellulosiruptor

SLH proteins are thought to play a role in helping the cell attach and tether itself to the biomass substrate. Csac_2722 is one such protein that has been found to bind Avicel, a crystalline cellulose substrate, and has been identified in the S-layer and cell membrane fraction of C. saccharolyticus grown on switchgrass (91). The proteins grouped as ‘other’ in

Table 1.2 do not contain currently identified GH, PL, CE, or CBM domains, but a number of these proteins have been identified by proteomics as Avicel-bound (91). These proteins presumably contain unidentified CAZy domains that facilitate their attachment to biomass.

SLH proteins not involved in biomass attachment and degradation may play a role in the S- layer by helping to space the other SLH containing CAZymes, giving them adequate area to perform their enzymatic or binding roles.

FN3 containing Caldicellulosiruptor proteins

As previously discussed, little is known about the role of fibronectin type III (FN3)- like domains found in bacteria. A number of Caldicellulosiruptor proteins are predicted to contain these FN3 domains and are listed by homologous group in Table 1.3. These proteins

31

are particularly large, ranging from 750 to 3027 amino acids in length. Of these, Athe_0014 is the largest protein found in the genomes of all currently sequenced Caldicellulosiruptor species. All of these proteins are predicted to be extracellular by SignalP4.1, except two

GH3-FN3 groups, which are the smallest FN3-containing proteins. Three of these protein groups also contain SLH domains for localization at the surface of the bacterial cell.

At one time it was thought that in bacteria, FN3 domains were only found in extracellular glycoside hydrolase enzymes (79). However, four homologous groups are found in the Caldicellulosiruptor genus that contain no identified catalytic domains and, furthermore, no domains other than the predicted FN3 domains. An individual FN3 domain is only approximately 80 amino acids in length, leaving open the possibility that other, as yet unidentified, domains could be present in these proteins. Four of the homologous protein groups, found in Table 1.3, do contain catalytic glycoside hydrolase domains. For these proteins, the FN3 domain may contribute to catalytic ability, binding, thermostability, or domain spacing. However, the definitive role that these FN3 domains play in biomass degradation by the genus Caldicellulosiruptor, and more broadly in bacteria, remains unknown.

Comparison to Clostridium thermocellum

As mentioned, one of the best studied cellulolytic thermophilic microorganisms other than the Caldicellulosiruptor species is C. thermocellum, which primarily uses a cellulosomal mechanism to coordinate its inventory of CAZymes to degrade biomass. Both

Caldicellulosiruptor species and C. thermocellum are of particular interest to engineer as

32

metabolic engineering platforms for consolidated bioprocessing (CBP) of lignocellulosic biomass to biofuels. Table 1.4 compares the extracellular catalytic CAZyme domain inventory of C. thermocellum to that of Caldicellulosiruptor kronotskyensis, the

Caldicellulosiruptor species with the largest CAZyme inventory. Each catalytic domain found in multi-domain proteins with multiple catalytic domains is cataloged separately. A number of C. thermocellum strains have sequenced genomes, but most work has focused on two of these strains: C. thermocellum ATCC 27405 and C. thermocellum DSM 1313. All extracellular CAZymes from the ATCC 27405 strain are found in DSM 1313 and only 3 proteins (containing GH11, GH43, and GH18) are unique to DSM 1313. Because of these very few differences between the two strains, they were collapsed into a single CAZyme domain profile for comparison to C. kronotskyensis in Table 1.4. Again, extracellular proteins for both C. thermocellum and C. kronotskyensis were predicted by identifying signal peptides using SignalP4.1 (121).

One of the major differences between the catalytic domain inventory of these microbes shown in Table 1.4 is that C. thermocellum has almost double the catalytic

CAZymes domains of C. kronotskyensis (70 vs. 38). However the number of domain families is fairly consistent between the organisms: 17 vs. 15 GH families, 4 vs. 3 PL families, 7 vs. 2 CE families for C. thermocellum vs. C. kronotskyensis, respectively.

Interestingly, almost all (61 out of 70) of the catalytic CAZyme domains from C. thermocellum are associated with a dockerin domain that would mediate their attachment to the cellulosome. The remaining 9 catalytic CAZyme domains, which are each on separate proteins, are likely free-enzymes. Some of these C. thermocellum free enzymes have high

33

similarity to C. kronotskyensis CAZymes, particularly Cthe_0040 (GH9-CBM3-CBM3) and

Cthe_0071 (GH48-CBM3), which mirror the structure of the CBM3-rich cellulases of the

Caldicellulosiruptor glucan degradation locus (GDL), and Cthe_2809 (SLH-SLH-SLH-

CBM54-GH16-CBM4-CBM4-CBM4-CBM4), which has high similarity to Calkro_0072. In addition to the catalytic CAZy domain containing proteins, C. thermocellum contains 9 extracellular proteins that contain only CBM domains compared to 3 exclusively CBM- containing proteins in C. kronotskyensis. These proteins are presumably involved in biomass attachment.

To compare the cellulosomal and multi-domain mechanisms that C. thermocellum and C. kronotskyensis, respectively, use to degrade biomass, the number of CAZy domains

(GH, PL, CE, CBM) per extracellular protein was calculated. C. thermocellum contains 136

CAZy domains in 72 total extracellular proteins, while C. kronotskyensis contains 103 CAZy domains in 34 extracellular proteins. C. kronotskyensis has about one more CAZy domain per extracellular protein (3.0 domains/protein) than C. thermocellum (1.8 domains/protein) on average. This confirms the multi-domain strategy that C. kronotskyensis uses to degrade biomass and highlights one of the major differences between these cellulolytic thermophiles.

Conclusions

Members of the genus Caldicellulosiruptor are the most thermophilic, cellulolytic bacteria known. Caldicellulosiruptor species accomplish degradation of unpretreated lignocellulosic biomass by using a collection of unique large multi-domain CAZymes. The multi-domain architecture of these proteins promotes synergy between the various catalytic

34

and binding domains within a single protein to efficiently degrade biomass. An important subset of the CAZymes found in the genus Caldicellulosiruptor includes SLH domains to localize these proteins to the cell surface. These SLH proteins also contain CBM domains for mediating attachment of the cell to the biomass substrate, while various catalytic GH, PL, and CE domains in the same protein digest it. Additionally, a number of accessory domains, such as FN3, are a part of a number of Caldicellulosiruptor CAZymes. The definitive functions of these domains are largely unknown in the context of biomass degradation, and remain an area of ongoing investigation. The Caldicellulosiruptor multi-domain mechanism for biomass degradation is a distinct alternative to the cellulosome mechanism of other cellulolytic thermophiles, such as Clostridium thermocellum. The ability of

Caldicellulosiruptor to degrade unpretreated biomass, using a unique inventory of multi- domain enzymes, at temperatures above 70°C, makes these species leading candidates for development into CBP for the conversion of biomass to biofuels.

Future Trends

Metagenomics - Identifying Novel CAZymes

Identification of novel proteins related to biomass degradation is becoming more possible with the decreasing cost of DNA sequencing and the development of metagenomics.

This sequence-based analysis takes advantage of the great diversity in microbial communities, particularly of microorganisms that would not be culturable using standard microbiological techniques (140,141). A standard metagenomic experiment would take an environmental sample, potentially enrich the natural microbiota in a particular subset of

35

microbes by culturing under selective condition (e.g., high temperature for thermophiles, media with crystalline cellulose for cellulolytic microbes), isolate DNA from the microbial community in the sample, and sequence all of the isolated DNA for analysis (140). This technique has been used to identify novel CAZymes from the human gut microbiota (142), decaying wood (143), buffalo rumen (144), and German grassland soil samples (145). Once a CAZyme gene of interest is identified, it can be cloned for biochemical characterization, as was done in the German grassland soil study to identify a novel CBM9-GH9 cellulase optimally active at 45-50°C and two novel GH11 xylanases optimally active at 60°C (145).

Novel enzymes of particular interest can then be inserted into the genome of a CBP microorganism to improve lignocellulosic biomass degradation.

Caldicellulosiruptor bescii Genetic System

Ultimately, the development of a CBP microorganism depends on the ability to genetically manipulate a platform organism to adjust the cellulolytic enzyme inventory and metabolic pathways for optimized biomass degradation and fuel molecule production. A genetic system for Caldicellulosiruptor bescii has only recently been developed, and work to make it tractable and extend genetic tools into other Caldicellulosiruptor species is continuing (25,146-151). This genetic system in C. bescii is constructed around a uracil auxotroph ΔpyrFA (Cbes_1377) disruption mutant. The gene pyrF encodes for orotidine monophosphate decarboxylase, an enzyme essential for uracil biosynthesis. This allows for selection of uracil auxotroph mutants because uracil prototrophs with functioning PyrF will ultimately convert 5-fluoroorotic acid (5-FOA) into the toxic product fluorodeoxyuridine

36

(148). The uracil auxotroph mutant can then be complemented with a shuttle vector containing the pyrF gene under the control of the 30S ribosomal subunit (Cbes_2105) regulatory region (148). This ΔpyrFA strain was used to successfully knockout the gene

Cbes_1918 encoding lactate dehydrogenase (ldh) to metabolically engineer C. bescii to divert metabolic flux away from lactate and increase acetate and hydrogen production (25).

Metabolic Engineering of a CBP Thermophile

Even with a tractable genetic system, the genetic manipulations necessary to engineering a highly productive consolidating bioprocessing thermophile are non-trivial.

Genetic manipulations will be needed to engineer both the front-end CAZymes inventory to promote complete degradation of the biomass substrate, and the back-end metabolism to divert flux to products of interest (i.e., alcohols or other fuel molecules). This front-end engineering will likely involve knocking genes into the CBP species. As seen in the

Caldicellulosiruptor genus, a number of novel and potentially very productive biomass degrading enzymes are found only in a single species and these would need to be added together into the engineered strain. Likewise, metagenomic studies in cellulosic environments will no doubt reveal novel thermophilic CAZymes from a diversity of species that could complement the Caldicellulosiruptor pan-CAZome in an engineered strain. The two main challenges of the back-end metabolic engineering are a relative dearth of precise information about the metabolism of these cellulolytic thermophiles and an absence of a drop-in thermophilic liquid fuel production pathway. While we have a significant understanding of the metabolism of Caldicellulosiruptor species, there are still holes in our

37

knowledge that are actively being pursued. A more precise understanding of

Caldicellulosiruptor metabolism is necessary to target gene knockouts to direct metabolic flux away from byproducts to products of interest. While small amounts of alcohols are produced by Caldicellulosiruptor species, it is unclear at this early stage whether metabolism can be significantly diverted to fuels using only the Caldicellulosiruptor metabolism. In the case that it is not, another metabolic engineering strategy is to add all of the components of an alcohol production pathway from another source into the CBP host. However, currently such a thermophilic pathway does not exist at the growth temperatures of

Caldicellulosiruptor species (>70°C). Work continues to develop our understanding and manipulate both the front-end biomass degradation and back-end metabolic flux of the

Caldicellulosiruptor genus.

38

Web Resources

Carbohydrate-Active enZYmes Database www.cazy.org

National Center for Biotechnology Information (NCBI) http://www.ncbi.nlm.nih.gov/

NCBI Conserved Domains Search http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi

Pfam 27.0 Database http://pfam.sanger.ac.uk/

SignalP4.1 Server http://www.cbs.dtu.dk/services/SignalP/

SEG Prediction of Low Complexity Regions http://mendel.imp.ac.at/METHODS/seg.server.html

Acknowledgments

This work was funded in part by the BioEnergy Science Center, a U.S. Department of

Energy Bioenergy Research Center supported by the Office of Biological and Environmental

Research in the DOE Office of Science.

39

Figure 1.1 - Biomass structure and degradation by selected Caldicellulosiruptor enzymes (A) Cellulose microfibrils of β-1,4-linked D-glucose contain both crystalline (left and right) and amorphous regions (center). Various forms of hemicellulose, xylan (top right, bottom center), mannan (bottom left), and xyloglucan (top left), form a sheath around and crosslink the cellulose microfibrals. Pectin homogalacturonan (top left and bottom center) and branched rhamnogalacturonan I (bottom right) also contribute to the sheathing of the cellulose microfibrals. (B) Selected Caldicellulosiruptor enzymes are shown degrading these biomass polysaccharides.

40

Figure 1.2 - The three biomass degradation mechanisms used by cellulolytic microorganisms Free enzymes contain either a single catalytic domain or one linked to a Carbohydrate Binding Module (CBM) domain. Free enzymes are secreted and disconnected from the cell. The cellulosome structure is shown structured around Clostridium thermocellum scaffoldin CipA, but a variety of other multi-enzyme cellulosome structures exist. Enzyme subunits attach to the scaffoldin by Type I cohesin-dockerin interactions. The scaffoldin attaches to an anchoring scaffoldin by a Type II cohesin-dockerin interaction. The anchoring scaffoldin is attached at the cell surface within the cell’s surface layer (S-layer) by Surface Layer Homology (SLH) domains. Multi-domain proteins contain multiple catalytic and binding domains within the same protein. Some of these multi-domain proteins are anchored to the cell surface within the S-layer by SLH domains. Microorganisms may use one or more of these mechanisms to effectively degrade biomass.

41

Figure 1.3 - Caldicellulosiruptor extracellular pan-CAZome Core extracellular CAZymes are contained in the genomes of all nine sequenced Caldicellulosiruptor species. The additional boxes encompass extracellular CAZymes contained in the genomes of five or more, three or more, two or more, and one or more species, respectively. CAZy domain family numbers are shown for Glycoside Hydrolase (GH), Carbohydrate Binding Module (CBM), Polysaccharide Lyase (PL), and Carbohydrate Esterase (CE) domains. Surface Layer Homology (SLH) domains are also shown, but additional predicted domains are omitted. Extracellular proteins were predicted to contain a signal peptide using SignalP4.1.

42

Figure 1.4 - Caldicellulosiruptor intracellular pan-CAZome Core intracellular CAZymes are contained in the genomes of all nine sequenced Caldicellulosiruptor species. The additional boxes encompass intracellular CAZymes contained in the genomes of five or more, three or more, two or more, and one or more species, respectively. CAZy domain family numbers are shown for Glycoside Hydrolase (GH), Carbohydrate Binding Module (CBM), Polysaccharide Lyase (PL), and Carbohydrate Esterase (CE) domains. The intracellular pan-CAZome contains no Surface Layer Homology (SLH) domain CAZymes. Additional predicted domains are omitted. Intracellular proteins are classified in this category if they are not predicted to contain a signal peptide using SignalP4.1.

43

Table 1.1 - Caldicellulosiruptor Genomes

Accession Genome Species nov. Genome Species Number Size (Mb) Announcement Announcement C. acetigenus ASM42172v1 2.59 (152) N/A C. bescii CP001393 2.93 (153) (154) C. CP002219 2.77 (155) (156) hydrothermalis C. kristjanssonii CP002326 2.8 (157) (156) C. CP002330 2.84 (155) (156) kronotskyensis C. lactoaceticus CP003001 2.62 (158) (156) C. obsidiansis CP002164 2.53 (159) (160) C. owensensis CP002216 2.43 (161) (156) C. CP000679 2.97 (162) (112) saccharolyticus

44

Table 1.2 - Surface Layer Homology domain containing Caldicellulosiruptor proteins

# AA Locus Domains Athe_0594, Calkro_2036, Calhy_2064, H557DRAFT_00803, Csac_0678, Calkr_2007, 755-756 Calla_0352 GH5-CBM28⁰-SLH-SLH-SLH Calow_0459⁰, COB47_0546⁰ 567-568 Calkro_0072, Calhy_0060, COB47_0076 1732 SLH-SLH-SLH-CBM54-GH16-CBM4-CBM4-CBM6- 526 Csac_2549* CBM4-CBM4-CBM4 H557DRAFT_02137 1411 1672-1675 Calkro_0402, Calow_1924

CBM22-CBM22-CBM22-GH10-CBM9-CBM9- GH Calkr_2245^ 2159 (CE15^)-SLH-SLH Calla_0206 1593

Calkro_0111 2435 SLH-SLH-SLH-CBM54-FN3-GH16-FLD-FN3-FN3- FLD-FN3-FN3-GH55-CBM32-CBM32 Calkro_0121* 2229

Calhy_1629 1440 SLH-SLH-SLH-GH43-CBM54

Calhy_2383 2007 CBM35-GH87-FN3-SLH-SLH-SLH

Csac_2722 2593 CBM4-Big4-Coh-CBM4-Big4-GHnc-SLH-SLH-SLH

Calkro_0550, Calow_1771 1789-1790 Big4-PL11-SLH-SLH PL Calow_2109 1711 Big3-Big3-Big3-PL9-SLH-SLH-SLH Calkr_1989*† 326 CBM20-SLH-SLH-CBM20 1097

Calla_0367, Calow_0484, COB47_0564 1088 only Calla_2176 CBM27-CBM27-CBM27-SLH-SLH-SLH

COB47_0167*† 886 CBM CBM Calkr_2289 1495 SLH-SLH-SLH-CBM54 Calla_0162* 480 Athe_0438, Calkro_2198, Calhy_2227, Calkr_2224, Calla_0227, Calow_0272, 1156-1157 SLH-SLH-Big5 COB47_0390, Csac_0207, H557DRAFT_02082 Athe_1943, Calkro_0749, Calhy_0832, Calkr_0758, Calla_1578, Calow_1666, 275-277 SLH-SLH COB47_1753, Csac_1009, H557DRAFT_02335

Athe_2303, Calkro_0333, Calhy_0455, Calla_0125, Calkr_2323, COB47_2069, 1013-1026

Other Csac_2451, H557DRAFT_01166 SLH

Calow_1997 972

Athe_2341, Calkro_0210, Calhy_0185, Calkr_2370, 483-484 SLH-SLH Calla_0076, Calow_2070, COB47_2114, Csac_0591, H557DRAFT_01346

45

Table 1.2 Continued

Athe_2342, Calkro_0209, Calhy_0184, Calkr_2371, Calla_0075, Calow_2071, COB47_2115, 1003-1010 SLH Csac_0590, H557DRAFT_01347

Athe_1839, Calkro_0875, Calkr_0834, Calla_1498, 575-576 SLH-SLH-SLH Calow_1583, COB47_1651, Csac_2381, H557DRAFT_00696 Athe_2295, Calkro_0339, Calhy_0464, Calkr_2316, Calla_0133, COB47_2061, Csac_2445, 1074-1075 SLH-SLH H557DRAFT_01171 Calkr_2463, Calla_2324, Calow_0034, 1774-1779 COB47_0063 SLH-SLH-SLH-Big1-Tglut Athe_0077 1710

Calhy_0047† 1626 Athe_0608, Calkro_2018, Calhy_2046, Calow_0482, 547 SLH-SLH-SLH

COB47_0562 Calkro_0360, Calkr_2450 793-798

Other SLH-SLH-SLH Calow_1967† 683

Calkro_0197, Calkr_2380 1052 SLH-SLH

Calkro_0155, Calow_2111 549 MG2-SLH-SLH-SLH

Calkro_0090, Calhy_0073 664 SLH-SLH-SLH Athe_0012 3027 SLH-SLH-FN3-vWFA-SH3-RHSrep-RHScore Calkro_0014 2994

Calkr_2514, Calla_2376 643 SLH-SLH-SLH Calkr_0022 254 SLH-SLH-SLH Calla_2302* 144 Athe_2223 591 SLH COB47_1200*† 477

Calkr_2453 1153 SLH-Big5

H557DRAFT_02031† 128 SLH-SLH * fragment, † no predicted signal peptide, ^ Calkr_2245 contains an additional CE15 domain, ⁰ Calow_0459 and COB47_0546 have a truncated CBM28 domain

GH: Glycoside Hydrolase, PL: Polysaccharide Lyase, CE: Carbohydrate Esterase, CBM: Carbohydrate Binding Module, SLH: Surface Layer Homology (pfam00395), FN3: Fibronectin type III-like (pfam00041), Big: Bacterial Immunoglobulin (CL0159), FLD: Fascin-like domain (CL0066), Coh: Cohesin-like (CL14606), Tglut: Transglutaminase-like superfamily (pfam01841), MG2: Macroglobulin2 (pfam01835), vWFA: Von Willebrand factor type A (pfam00092), SH3: Bacterial SH3 (pfam08239), RHSrep: RHS repeat (pfam05593), RHScore: RHS associated core domain (TIGR03696)

46

Table 1.3 - Fibronectin type III-like domain containing Caldicellulosiruptor proteins Locus # AA SignalP Domains Athe_1944, Calhy_0831, Calkro_0748, COB47_1754, 1201- Calow_1667, 1207 Y FN3-FN3 Csac_1008, H557DRAFT_02334 Calla_1579, Calkr_0757 798 Athe_1945, Calhy_0830, Calkro_0747, COB47_1755, 1265 Y FN3-FN3 Calow_1668, Csac_1007, H557DRAFT_02333 Athe_0012 3027 Y SLH-SLH-FN3-vWFA-SH3-RHSrep-RHScore Calkro_0014 2994 Calkro_0111 2435 SLH-SLH-SLH-CBM54-FN3-GH16-FLD-FN3-FN3-FLD- Y Calkro_0121* 2229 FN3-FN3-GH55-CBM32-CBM32 Calkro_0153 983 Y FN3-FN3 Calkro_0113 1240 Y CBM32-CBM32-FN3-GH55-CBM32 Calhy_2383 2007 Y CBM35-GH87-FN3-SLH-SLH-SLH Calow_2113 983 Y FN3 Athe_2354, Calhy_0170, Calkr_2396, Calkro_0181, Calla_0066, 770- N GH3-FN3 COB47_2124, 771 Calow_2083, Csac_0586, H557DRAFT_01368 Csac_1102 750 N GH3-FN3 * fragment GH: Glycoside Hydrolase, CBM: Carbohydrate Binding Module, SLH: Surface Layer Homology (pfam00395), FN3: Fibronectin type III-like (pfam00041), FLD: Fascin-like domain (CL0066), vWFA: Von Willebrand factor type A (pfam00092), SH3: Bacterial SH3 (pfam08239), RHSrep: RHS repeat (pfam05593), RHScore: RHS associated core domain (TIGR03696)

47

Table 1.4 - C. thermocellum and C. kronotskyensis extracellular CAZy domain inventory comparison Extracellular C. thermocellum C. kronotskyensis CAZy Domains 1 GH3 0 8 GH5 4 1 GH8 0 15 GH9 2 4 GH10 4 2 GH11 1 0 GH13 3 2 GH16 3 3 GH18 0 1 GH23 1

3 GH26 0 2 GH30 0 GH 0 GH39 0 6 GH43 1 2 GH44 1 2 GH48 3 1 GH53 1 0 GH55 2 1 GH74 1 0 GH81 2 0 GH87 1 1 GH94 0 0 GH124 0 2 PL1 1

1 PL3 1

PL 1 PL9 0 1 PL11 2 3 CE1 0 1 CE2 0

1 CE3 0

2 CE4 3 CE 1 CE8 1 1 CE12 0 1 CEnc 0 70 Total 38

48

References

1. Maki, M., Leung, K. T., and Qin, W. (2009) The prospects of cellulase-producing

bacteria for the bioconversion of lignocellulosic biomass. Int J Biol Sci 5, 500-516

2. National Research Council. (2009) Liquid Transportation Fuels from Coal and

Biomass: Technological Status, Costs, and Environmental Impacts, The National

Academies Press, Washington, DC

3. U.S. Department of Energy. (2011) U.S. Billion-Ton Update: Biomass Supply for a

Bioenergy and Bioproducts Industry, ORNL/TM-2011/224, Oak Ridge National

Laboratory, Oak Ridge, TN

4. Sun, Y., and Cheng, J. (2002) Hydrolysis of lignocellulosic materials for ethanol

production: a review. Bioresour Technol 83, 1-11

5. Pauly, M., and Keegstra, K. (2008) Cell-wall carbohydrates and their modification as

a resource for biofuels. Plant J 54, 559-568

6. Pauly, M., and Keegstra, K. (2010) Plant cell wall polymers as precursors for

biofuels. Curr Opin Plant Biol 13, 304-311

7. Cosgrove, D. J. (2005) Growth of the plant cell wall. Nat Rev Mol Cell Biol 6, 850-

861

49

8. Lynd, L. R., Weimer, P. J., van Zyl, W. H., and Pretorius, I. S. (2002) Microbial

cellulose utilization: fundamentals and biotechnology. Microbiol Mol Biol Rev 66,

506-577

9. Scheller, H. V., and Ulvskov, P. (2010) Hemicelluloses. Annu Rev Plant Biol 61, 263-

289

10. Vanholme, R., Morreel, K., Ralph, J., and Boerjan, W. (2008) Lignin engineering.

Curr Opin Plant Biol 11, 278-285

11. Grabber, J. (2005) How Do Lignin Composition, Structure, and Cross-Linking Affect

Degradability? A Review of Cell Wall Model Studies. Crop Sci 45, 820-831

12. Ju, X., Engelhard, M., and Zhang, X. (2013) An advanced understanding of the

specific effects of xylan and surface lignin contents on enzymatic hydrolysis of

lignocellulosic biomass. Bioresour Technol 132, 137-145

13. Lynd, L. R., van Zyl, W. H., McBride, J. E., and Laser, M. (2005) Consolidated

bioprocessing of cellulosic biomass: an update. Curr Opin Biotechnol 16, 577-583

14. Margeot, A., Hahn-Hagerdal, B., Edlund, M., Slade, R., and Monot, F. (2009) New

improvements for lignocellulosic ethanol. Curr Opin Biotechnol 20, 372-380

15. Wu, X., McLaren, J., Madl, R., and Wang, D. (2010) Biofuels from Lignocellulosic

Biomass. in Sustainable Biotechnology (Singh, O. V., and Harvey, S. P. eds.),

Springer Netherlands. pp 19-41

50

16. Ishola, M. M., Jahandideh, A., Haidarian, B., Brandberg, T., and Taherzadeh, M. J.

(2013) Simultaneous saccharification, filtration and fermentation (SSFF): a novel

method for bioethanol production from lignocellulosic biomass. Bioresour Technol

133, 68-73

17. Olson, D. G., McBride, J. E., Shaw, A. J., and Lynd, L. R. (2012) Recent progress in

consolidated bioprocessing. Curr Opin Biotechnol 23, 396-405

18. Yamada, R., Hasunuma, T., and Kondo, A. (2013) Endowing non-cellulolytic

microorganisms with cellulolytic activity aiming for consolidated bioprocessing.

Biotechnol Adv

19. You, C., Zhang, X. Z., Sathitsuksanoh, N., Lynd, L. R., and Zhang, Y. H. (2012)

Enhanced Microbial Utilization of Recalcitrant Cellulose by an Ex Vivo Cellulosome-

Microbe Complex. Appl Environ Microbiol 78, 1437-1444

20. Blumer-Schuette, S. E., Kataeva, I., Westpheling, J., Adams, M. W., and Kelly, R. M.

(2008) Extremely thermophilic microorganisms for biomass conversion: status and

prospects. Curr Opin Biotechnol 19, 210-217

21. Barnard, D., Casanueva, A., Tuffin, M., and Cowan, D. (2010) Extremophiles in

biofuel synthesis. Environ Technol 31, 871-888

51

22. VanFossen, A. L., Lewis, D. L., Nichols, J. D., and Kelly, R. M. (2008)

Polysaccharide degradation and synthesis by extremely thermophilic anaerobes. Ann

N Y Acad Sci 1125, 322-337

23. Blumer-Schuette, S. E., Brown, S. D., Sander, K. B., Bayer, E. A., Kataeva, I.,

Zurawski, J. V., Conway, J. M., Adams, M. W. W., and Kelly, R. M. (2014)

Thermophilic lignocellulose deconstruction. FeMS Microbiol. Rev. 38, 393-448

24. Cai, Y., Lai, C., Li, S., Liang, Z., Zhu, M., Liang, S., and Wang, J. (2011) Disruption

of lactate dehydrogenase through homologous recombination to improve bioethanol

production in Thermoanaerobacterium aotearoense. Enzyme Microbial. Tech. 48,

155-161

25. Cha, M., Chung, D., Elkins, J. G., Guss, A. M., and Westpheling, J. (2013) Metabolic

engineering of Caldicellulosiruptor bescii yields increased hydrogen production from

lignocellulosic biomass. Biotechnol. Biofuels 6, 85

26. Tripathi, S. A., Olson, D. G., Argyros, D. A., Miller, B. B., Barrett, T. F., Murphy, D.

M., McCool, J. D., Warner, A. K., Rajgarhia, V. B., Lynd, L. R., Hogsett, D. A., and

Caiazza, N. C. (2010) Development of pyrF-based genetic system for targeted gene

deletion in Clostridium thermocellum and creation of a pta mutant. Appl Environ

Microbiol 76, 6591-6599

52

27. Cantarel, B. L., Coutinho, P. M., Rancurel, C., Bernard, T., Lombard, V., and

Henrissat, B. (2009) The Carbohydrate-Active EnZymes database (CAZy): an expert

resource for Glycogenomics. Nucleic Acids Res 37, D233-238

28. Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P. M., and Henrissat, B.

(2014) The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids

Res. 42, D490-495

29. Levasseur, A., Drula, E., Lombard, V., Coutinho, P. M., and Henrissat, B. (2013)

Expansion of the enzymatic repertoire of the CAZy database to integrate auxiliary

redox enzymes. Biotechnol Biofuels 6, 41

30. Naumoff, D. G. (2011) Hierarchical classification of glycoside hydrolases.

Biochemistry (Mosc) 76, 622-635

31. Boraston, A. B., Bolam, D. N., Gilbert, H. J., and Davies, G. J. (2004) Carbohydrate-

binding modules: fine-tuning polysaccharide recognition. Biochem J 382, 769-781

32. Guillen, D., Sanchez, S., and Rodriguez-Sanoja, R. (2010) Carbohydrate-binding

domains: multiplicity of biological roles. Appl Microbiol Biotechnol 85, 1241-1249

33. McCarter, J. D., and Withers, S. G. (1994) Mechanisms of enzymatic glycoside

hydrolysis. Curr Opin Struct Biol 4, 885-892

34. Vocadlo, D. J., and Davies, G. J. (2008) Mechanistic insights into glycosidase

chemistry. Curr Opin Chem Biol 12, 539-555

53

35. Davies, G. J., Planas, A., and Rovira, C. (2011) Conformational Analyses of the

Reaction Coordinate of Glycosidases. Accounts Chem. Res. 45, 308-316

36. Knott, B. C., Haddad Momeni, M., Crowley, M. F., Mackenzie, L. F., Götz, A. W.,

Sandgren, M., Withers, S. G., Ståhlberg, J., and Beckham, G. T. (2013) The

Mechanism of Cellulose Hydrolysis by a Two-Step, Retaining Cellobiohydrolase

Elucidated by Structural and Transition Path Sampling Studies. J Am Chem Soc 136,

321-329

37. Mayes, H. B., Broadbelt, L. J., and Beckham, G. T. (2013) How Sugars Pucker:

Electronic Structure Calculations Map the Kinetic Landscape of Five Biologically

Paramount Monosaccharides and Their Implications for Enzymatic Catalysis. J Am

Chem Soc

38. Bayer, E. A., Chanzy, H., Lamed, R., and Shoham, Y. (1998) Cellulose, cellulases

and cellulosomes. Curr Opin Struct Biol 8, 548-557

39. Barr, B. K., Hsieh, Y.-L., Ganem, B., and Wilson, D. B. (1996) Identification of Two

Functionally Different Classes of Exocellulases. Biochemistry 35, 586-592

40. Bianchetti, C. M., Elsen, N. L., Fox, B. G., and Phillips, G. N., Jr. (2011) Structure of

cellobiose phosphorylase from Clostridium thermocellum in complex with phosphate.

Acta Crystallogr Sect F Struct Biol Cryst Commun 67, 1345-1349

54

41. Hong, M.-R., Kim, Y.-S., Park, C.-S., Lee, J.-K., Kim, Y.-S., and Oh, D.-K. (2009)

Characterization of a recombinant β-glucosidase from the thermophilic bacterium

Caldicellulosiruptor saccharolyticus. J. Biosci. Bioeng. 108, 36-40

42. Pollet, A., Delcour, J. A., and Courtin, C. M. (2010) Structural determinants of the

substrate specificities of xylanases from different glycoside hydrolase families. Crit

Rev Biotechnol 30, 176-191

43. Janeček, Š., and Blesák, K. (2011) Sequence-Structural Features and Evolutionary

Relationships of Family GH57 α-Amylases and Their Putative α-Amylase-Like

Homologues. The Protein Journal 30, 429-435

44. MacGregor, E. A., Janeček, Š., and Svensson, B. (2001) Relationship of sequence and

structure to specificity in the α-amylase family of enzymes. BBA - Protein Struct M

1546, 1-20

45. Gilbert, H. J., Stålbrand, H., and Brumer, H. (2008) How the walls come crumbling

down: recent structural biochemistry of plant polysaccharide degradation. Curr Opin

Plant Biol 11, 338-348

46. Sørensen, H., Jørgensen, C., Hansen, C., Jørgensen, C., Pedersen, S., and Meyer, A.

(2006) A novel GH43 α-l-arabinofuranosidase from Humicola insolens: mode of

action and synergy with GH51 α-l-arabinofuranosidases on wheat arabinoxylan. Appl

Microbiol Biotechnol 73, 850-861

55

47. Ryabova, O., Vršanská, M., Kaneko, S., van Zyl, W. H., and Biely, P. (2009) A novel

family of hemicellulolytic α-glucuronidase. FEBS Lett 583, 1457-1462

48. Xu, F. (2011) Enzymatic Degradation of Lignocellulosic Biomass. in Biocatalysis for

Green Chemistry and Chemical Process Development, John Wiley & Sons, Inc. pp

361-390

49. Fushinobu, S., Alves, V. D., and Coutinho, P. M. (2013) Multiple rewards from a

treasure trove of novel glycoside hydrolase and polysaccharide lyase structures: new

folds, mechanistic details, and evolutionary relationships. Curr Opin Struct Biol 23,

652-659

50. Mahadevan, S. A., Wi, S. G., Lee, D.-S., and Bae, H.-J. (2008) Site-directed

mutagenesis and CBM engineering of Cel5A (Thermotoga maritima). FEMS

Microbiol. Lett. 287, 205-211

51. Park, J. I., Kent, M. S., Datta, S., Holmes, B. M., Huang, Z., Simmons, B. A., Sale, K.

L., and Sapra, R. (2011) Enzymatic hydrolysis of cellulose by the cellobiohydrolase

domain of CelB from the hyperthermophilic bacterium Caldicellulosiruptor

saccharolyticus. Bioresour Technol 102, 5988-5994

52. Boraston, A. B., Kwan, E., Chiu, P., Warren, R. A., and Kilburn, D. G. (2003)

Recognition and hydrolysis of noncrystalline cellulose. J Biol Chem 278, 6120-6127

56

53. McLean, B. W., Boraston, A. B., Brouwer, D., Sanaie, N., Fyfe, C. A., Warren, R. A.,

Kilburn, D. G., and Haynes, C. A. (2002) Carbohydrate-binding modules recognize

fine substructures of cellulose. J Biol Chem 277, 50245-50254

54. Din, N., Gilkes, N. R., Tekant, B., Miller, R. C., Warren, R. A. J., and Kilburn, D. G.

(1991) Non-hydrolytic Disruption of Cellulose Gibres by the Binding Domain of a

Bacterial Cellulase. Nat Biotech 9, 1096-1099

55. Arantes, V., and Saddler, J. N. (2010) Access to cellulose limits the efficiency of

enzymatic hydrolysis: the role of amorphogenesis. Biotechnol Biofuels 3, 4

56. Reyes-Ortiz, V., Heins, R. A., Cheng, G., Kim, E. Y., Vernon, B. C., Elandt, R. B.,

Adams, P. D., Sale, K. L., Hadi, M. Z., Simmons, B. A., Kent, M. S., and Tullman-

Ercek, D. (2013) Addition of a carbohydrate-binding module enhances cellulase

penetration into cellulose substrates. Biotechnol Biofuels 6, 93

57. Gourlay, K., Arantes, V., and Saddler, J. N. (2012) Use of substructure-specific

carbohydrate binding modules to track changes in cellulose accessibility and surface

morphology during the amorphogenesis step of enzymatic hydrolysis. Biotechnol

Biofuels 5, 51

58. Lehtio, J., Sugiyama, J., Gustavsson, M., Fransson, L., Linder, M., and Teeri, T. T.

(2003) The binding specificity and affinity determinants of family 1 and family 3

cellulose binding modules. Proc Natl Acad Sci U S A 100, 484-489

57

59. Pell, G., Williamson, M. P., Walters, C., Du, H., Gilbert, H. J., and Bolam, D. N.

(2003) Importance of hydrophobic and polar residues in ligand binding in the family

15 carbohydrate-binding module from Cellvibrio japonicus Xyn10C. Biochemistry

42, 9316-9323

60. Lombard, V., Bernard, T., Rancurel, C., Brumer, H., Coutinho, P. M., and Henrissat,

B. (2010) A hierarchical classification of polysaccharide lyases for glycogenomics.

Biochem J 432, 437-444

61. Garron, M. L., and Cygler, M. (2010) Structural and mechanistic classification of

uronic acid-containing polysaccharide lyases. Glycobiology 20, 1547-1573

62. Yadav, S., Yadav, P. K., Yadav, D., and Yadav, K. D. S. (2009) Pectin lyase: A

review. Process Biochem 44, 1-10

63. Yip, V. L. Y., and Withers, S. G. (2004) Nature's many mechanisms for the

degradation of oligosaccharides. Org Biomol Chem 2, 2707-2713

64. Yip, V. L. Y., and Withers, S. G. (2006) Breakdown of oligosaccharides by the

process of elimination. Curr Opin Chem Biol 10, 147-155

65. Alahuhta, M., Brunecky, R., Chandrayan, P., Kataeva, I., Adams, M. W., Himmel, M.

E., and Lunin, V. V. (2013) The structure and mode of action of Caldicellulosiruptor

bescii family 3 pectate lyase in biomass deconstruction. Acta. Crystallogr. D Biol.

Crystallogr. 69, 534-539

58

66. Williamson, G., Kroon, P. A., and Faulds, C. B. (1998) Hairy plant polysaccharides: a

close shave with microbial esterases. Microbiology 144 ( Pt 8), 2011-2023

67. Biely, P. (2012) Microbial carbohydrate esterases deacetylating plant

polysaccharides. Biotechnol Adv 30, 1575-1588

68. Biely, P., Mackenzie, C. R., Puls, J., and Schneider, H. (1986) Cooperativity of

Esterases and Xylanases in the Enzymatic Degradation of Acetyl Xylan. Bio-

Technology 4, 731-733

69. Faulds, C. B., and Williamson, G. (1995) Release of ferulic acid from wheat bran by

a ferulic acid esterase (FAE-III) from Aspergillus niger. Appl Microbiol Biotechnol

43, 1082-1087

70. Selig, M., Adney, W., Himmel, M., and Decker, S. (2009) The impact of cell wall

acetylation on corn stover hydrolysis by cellulolytic and xylanolytic enzymes.

Cellulose 16, 711-722

71. Pai, C.-K., Wu, Z.-Y., Chen, M.-J., Zeng, Y.-F., Chen, J.-W., Duan, C.-H., Li, M.-L.,

and Liu, J.-R. (2010) Molecular cloning and characterization of a bifunctional

xylanolytic enzyme from Neocallimastix patriciarum. Appl Microbiol Biotechnol 85,

1451-1462

72. Brink, J., and Vries, R. (2011) Fungal enzyme sets for plant polysaccharide

degradation. Appl Microbiol Biotechnol 91, 1477-1492

59

73. Jolie, R. P., Duvetter, T., Van Loey, A. M., and Hendrickx, M. E. (2010) Pectin

methylesterase and its proteinaceous inhibitor: a review. Carbohyd. Res. 345, 2583-

2595

74. Aurilia, V., Parracino, A., and D'Auria, S. (2008) Microbial carbohydrate esterases in

cold adapted environments. Gene 410, 234-240

75. Martínez-Martínez, I., Montoro-García, S., Lozada-Ramírez, J. D., Sánchez-Ferrer,

Á., and García-Carmona, F. (2007) A colorimetric assay for the determination of

acetyl xylan esterase or cephalosporin C acetyl esterase activities using 7-amino

cephalosporanic acid, cephalosporin C, or acetylated xylan as substrate. Anal.

Biochem. 369, 210-217

76. Chothia, C., and Jones, E. Y. (1997) The molecular structure of cell adhesion

molecules. Annu Rev Biochem 66, 823-862

77. Punta, M., Coggill, P. C., Eberhardt, R. Y., Mistry, J., Tate, J., Boursnell, C., Pang,

N., Forslund, K., Ceric, G., Clements, J., Heger, A., Holm, L., Sonnhammer, E. L.,

Eddy, S. R., Bateman, A., and Finn, R. D. (2012) The Pfam protein families database.

Nucleic Acids Res. 40, D290-301

78. Campbell, I. D., and Spitzfaden, C. (1994) Building proteins with fibronectin type III

modules. Structure 2, 333-337

60

79. Little, E., Bork, P., and Doolittle, R. F. (1994) Tracing the spread of fibronectin type

III domains in bacterial glycohydrolases. J Mol Evol 39, 631-643

80. Kataeva, I. A., Seidel, R. D., 3rd, Shah, A., West, L. T., Li, X. L., and Ljungdahl, L.

G. (2002) The fibronectin type 3-like repeat from the Clostridium thermocellum

cellobiohydrolase CbhA promotes hydrolysis of cellulose by modifying its surface.

Appl Environ Microbiol 68, 4292-4300

81. Devillard, E., Goodheart, D. B., Karnati, S. K., Bayer, E. A., Lamed, R., Miron, J.,

Nelson, K. E., and Morrison, M. (2004) Ruminococcus albus 8 mutants defective in

cellulose degradation are deficient in two processive endocellulases, Cel48A and

Cel9B, both of which possess a novel modular architecture. J Bacteriol 186, 136-145

82. Brunecky, R., Alahuhta, M., Bomble, Y. J., Xu, Q., Baker, J. O., Ding, S. Y.,

Himmel, M. E., and Lunin, V. V. (2012) Structure and function of the Clostridium

thermocellum cellobiohydrolase A X1-module repeat: enhancement through

stabilization of the CbhA complex. Acta Crystallogr. D Biol. Crystallogr. 68, 292-

299

83. Fraser, J. S., Yu, Z., Maxwell, K. L., and Davidson, A. R. (2006) Ig-like domains on

bacteriophages: a tale of promiscuity and deceit. J Mol Biol 359, 496-507

84. Schubot, F. D., Kataeva, I. A., Chang, J., Shah, A. K., Ljungdahl, L. G., Rose, J. P.,

and Wang, B. C. (2004) Structural basis for the exocellulase activity of the

61

cellobiohydrolase CbhA from Clostridium thermocellum. Biochemistry 43, 1163-

1170

85. Alahuhta, M., Xu, Q., Bomble, Y. J., Brunecky, R., Adney, W. S., Ding, S. Y.,

Himmel, M. E., and Lunin, V. V. (2010) The unique binding mode of cellulosomal

CBM4 from Clostridium thermocellum cellobiohydrolase A. J Mol Biol 402, 374-387

86. Levisson, M., Sun, L., Hendriks, S., Swinkels, P., Akveld, T., Bultema, J. B.,

Barendregt, A., van den Heuvel, R. H., Dijkstra, B. W., van der Oost, J., and Kengen,

S. W. (2009) Crystal structure and biochemical properties of a novel thermostable

esterase containing an immunoglobulin-like domain. J Mol Biol 385, 949-962

87. Sara, M., and Sleytr, U. B. (2000) S-Layer proteins. J. Bacteriol. 182, 859-868

88. Scott, J. R., and Barnett, T. C. (2006) Surface proteins of gram-positive bacteria and

how they get there. Annu Rev Microbiol 60, 397-423

89. Pum, D., Toca-Herrera, J. L., and Sleytr, U. B. (2013) S-layer protein self-assembly.

Int J Mol Sci 14, 2484-2501

90. Engelhardt, H., and Peters, J. (1998) Structural research on surface layers: a focus on

stability, surface layer homology domains, and surface layer-cell wall interactions. J

Struct Biol 124, 276-302

91. Blumer-Schuette, S. E., Giannone, R. J., Zurawski, J. V., Ozdemir, I., Ma, Q., Yin,

Y., Xu, Y., Kataeva, I., Poole, F. L., 2nd, Adams, M. W., Hamilton-Brehm, S. D.,

62

Elkins, J. G., Larimer, F. W., Land, M. L., Hauser, L. J., Cottingham, R. W., Hettich,

R. L., and Kelly, R. M. (2012) Caldicellulosiruptor core and pangenomes reveal

determinants for noncellulosomal thermophilic deconstruction of plant biomass. J.

Bacteriol. 194, 4015-4028

92. Kosugi, A., Murashima, K., Tamaru, Y., and Doi, R. H. (2002) Cell-surface-

anchoring role of N-terminal surface layer homology domains of Clostridium

cellulovorans EngE. J Bacteriol 184, 884-888

93. Ali, M. K., Kimura, T., Sakka, K., and Ohmiya, K. (2001) The multidomain xylanase

Xyn10B as a cellulose-binding protein in Clostridium stercorarium. FEMS

Microbiol. Lett. 198, 79-83

94. Cosgrove, D. J., Li, L. C., Cho, H.-T., Hoffmann-Benning, S., Moore, R. C., and

Blecker, D. (2002) The Growing World of Expansins. Plant Cell Physiol. 43, 1436-

1444

95. Cosgrove, D. J. (2000) Loosening of plant cell walls by expansins. Nature 407, 321-

326

96. Sampedro, J., and Cosgrove, D. J. (2005) The expansin superfamily. Genome Biol 6,

242

97. Georgelis, N., Nikolaidis, N., and Cosgrove, D. J. (2014) Biochemical analysis of

expansin-like proteins from microbes. Carbohyd. Polym. 100, 17-23

63

98. Kerff, F., Amoroso, A., Herman, R., Sauvage, E., Petrella, S., Filee, P., Charlier, P.,

Joris, B., Tabuchi, A., Nikolaidis, N., and Cosgrove, D. J. (2008) Crystal structure

and activity of Bacillus subtilis YoaJ (EXLX1), a bacterial expansin that promotes

root colonization. Proc Natl Acad Sci U S A 105, 16876-16881

99. Saloheimo, M., Paloheimo, M., Hakola, S., Pere, J., Swanson, B., Nyyssönen, E.,

Bhatia, A., Ward, M., and Penttilä, M. (2002) Swollenin, a Trichoderma reesei

protein with sequence similarity to the plant expansins, exhibits disruption activity on

cellulosic materials. Eur. J. Biochem. 269, 4202-4211

100. Kang, K., Wang, S., Lai, G., Liu, G., and Xing, M. (2013) Characterization of a novel

swollenin from Penicillium oxalicum in facilitating enzymatic saccharification of

cellulose. BMC Biotechnol 13, 42

101. Jager, G., Girfoglio, M., Dollo, F., Rinaldi, R., Bongard, H., Commandeur, U.,

Fischer, R., Spiess, A. C., and Buchs, J. (2011) How recombinant swollenin from

Kluyveromyces lactis affects cellulosic substrates and accelerates their hydrolysis.

Biotechnol Biofuels 4, 33

102. Vodovnik, M., and Marinsek-Logar, R. (2010) Cellulosomes - Promising

Supramolecular Machines of Anaerobic Cellulolytic Microorganisms. Acta Chim Slov

57, 767-774

64

103. Mazzoli, R., Lamberti, C., and Pessione, E. (2012) Engineering new metabolic

capabilities in bacteria: lessons from recombinant cellulolytic strategies. Trends

Biotechnol 30, 111-119

104. Wilson, D. B. (2011) Microbial diversity of cellulose hydrolysis. Curr Opin

Microbiol 14, 259-263

105. Bayer, E. A., Lamed, R., White, B. A., and Flint, H. J. (2008) From cellulosomes to

cellulosomics. Chem Rec 8, 364-377

106. Resch, M. G., Donohoe, B. S., Baker, J. O., Decker, S. R., Bayer, E. A., Beckham, G.

T., and Himmel, M. E. (2013) Fungal cellulases and complexed cellulosomal

enzymes exhibit synergistic mechanisms in cellulose deconstruction. Energy Environ.

Sci. 6, 1858-1867

107. Fontes, C. M., and Gilbert, H. J. (2010) Cellulosomes: highly efficient nanomachines

designed to deconstruct plant cell wall complex carbohydrates. Annu. Rev. Biochem.

79, 655-681

108. VanFossen, A. L., Ozdemir, I., Zelin, S. L., and Kelly, R. M. (2011) Glycoside

hydrolase inventory drives plant polysaccharide deconstruction by the extremely

thermophilic bacterium Caldicellulosiruptor saccharolyticus. Biotechnol. Bioeng.

108, 1559-1569

65

109. Khandeparker, R., and Numan, M. T. (2008) Bifunctional xylanases and their

potential use in biotechnology. J Ind Microbiol Biotechnol 35, 635-644

110. Su, X., Mackie, R. I., and Cann, I. K. (2012) Biochemical and mutational analyses of

a multidomain cellulase/mannanase from Caldicellulosiruptor bescii. Appl. Environ.

Microbiol. 78, 2230-2240

111. Blumer-Schuette, S. E., Lewis, D. L., and Kelly, R. M. (2010) Phylogenetic,

microbiological, and glycoside hydrolase diversities within the extremely

thermophilic, plant biomass-degrading genus Caldicellulosiruptor. Appl. Environ.

Microbiol. 76, 8084-8092

112. van de Werken, H. J., Verhaart, M. R., VanFossen, A. L., Willquist, K., Lewis, D. L.,

Nichols, J. D., Goorissen, H. P., Mongodin, E. F., Nelson, K. E., van Niel, E. W.,

Stams, A. J., Ward, D. E., de Vos, W. M., van der Oost, J., Kelly, R. M., and Kengen,

S. W. (2008) Hydrogenomics of the extremely thermophilic bacterium

Caldicellulosiruptor saccharolyticus. Appl. Environ. Microbiol. 74, 6720-6729

113. VanFossen, A. L., Verhaart, M. R., Kengen, S. M., and Kelly, R. M. (2009)

Carbohydrate utilization patterns for the extremely thermophilic bacterium

Caldicellulosiruptor saccharolyticus reveal broad growth substrate preferences. Appl.

Environ. Microbiol. 75, 7718-7724

114. Basen, M., Rhaesa, A. M., Kataeva, I., Prybol, C. J., Scott, I. M., Poole, F. L., and

Adams, M. W. W. (2014) Degradation of high loads of crystalline cellulose and of

66

unpretreated plant biomass by the thermophilic bacterium Caldicellulosiruptor bescii.

Bioresour Technol 152, 384-392

115. Kataeva, I., Foston, M. B., Yang, S.-J., Pattathil, S., Biswal, A. K., Poole Ii, F. L.,

Basen, M., Rhaesa, A. M., Thomas, T. P., Azadi, P., Olman, V., Saffold, T. D.,

Mohler, K. E., Lewis, D. L., Doeppke, C., Zeng, Y., Tschaplinski, T. J., York, W. S.,

Davis, M., Mohnen, D., Xu, Y., Ragauskas, A. J., Ding, S.-Y., Kelly, R. M., Hahn,

M. G., and Adams, M. W. W. (2013) Carbohydrate and lignin are simultaneously

solubilized from unpretreated switchgrass by microbial action at high temperature.

Energy Environ. Sci. 6, 2186-2195

116. Zurawski, J. V., Blumer-Schuette, S. E., Conway, J. M., and Kelly, R. M. (2014) The

Extremely Thermophilic Genus Caldicellulosiruptor: Physiological and Genomic

Characteristics for Complex Carbohydrate Conversion to Molecular Hydrogen. in

Microbial BioEnergy: Hydrogen Production, Advances in Photosynthesis and

Respiration (De Philippis, R., and Zannoni, D. eds.), Springer Science. pp 177-195

117. Lochner, A., Giannone, R. J., Rodriguez, M., Jr., Shah, M. B., Mielenz, J. R., Keller,

M., Antranikian, G., Graham, D. E., and Hettich, R. L. (2011) Use of label-free

quantitative proteomics to distinguish the secreted cellulolytic systems of

Caldicellulosiruptor bescii and Caldicellulosiruptor obsidiansis. Appl. Environ.

Microbiol. 77, 4042-4054

67

118. Gibbs, M. D., Reeves, R. A., Farrington, G. K., Anderson, P., Williams, D. P., and

Bergquist, P. L. (2000) Multidomain and multifunctional glycosyl hydrolases from

the extreme thermophile Caldicellulosiruptor isolate Tok7B.1. Curr. Microbiol. 40,

333-340

119. Bergquist, P. L., Gibbs, M. D., Morris, D. D., Te'o, V. S. J., Saul, D. J., and Morgan,

H. W. (1999) Molecular diversity of thermophilic cellulolytic and hemicellulolytic

bacteria. FEMS Microbiol. Ecol. 28, 99-110

120. Blumer-Schuette, S. E., Brown, S. D., Sander, K. B., Bayer, E. A., Kataeva, I.,

Zurawski, J. V., Conway, J. M., Adams, M. W. W., and Kelly, R. M. (2014)

Thermophilic lignocellulose deconstruction. FEMS Microbiology Reviews

121. Petersen, T. N., Brunak, S., von Heijne, G., and Nielsen, H. (2011) SignalP 4.0:

discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785-786

122. Zverlov, V., Mahr, S., Riedel, K., and Bronnenmeier, K. (1998) Properties and gene

structure of a bifunctional cellulolytic enzyme (CelA) from the extreme thermophile

'Anaerocellum thermophilum' with separate glycosyl hydrolase family 9 and 48

catalytic domains. Microbiology 144 ( Pt 2), 457-465

123. Brunecky, R., Alahuhta, M., Xu, Q., Donohoe, B. S., Crowley, M. F., Kataeva, I. A.,

Yang, S.-J., Resch, M. G., Adams, M. W. W., Lunin, V. V., Himmel, M. E., and

Bomble, Y. J. (2013) Revealing Nature’s Cellulase Diversity: The Digestion

Mechanism of Caldicellulosiruptor bescii CelA. Science 342, 1513-1516

68

124. Yi, Z., Su, X., Revindran, V., Mackie, R. I., and Cann, I. (2013) Molecular and

biochemical analyses of CbCel9A/Cel48A, a highly secreted multi-modular cellulase

by Caldicellulosiruptor bescii during growth on crystalline cellulose. PLoS One 8,

e84172

125. Morris, D. D., Gibbs, M. D., Ford, M., Thomas, J., and Bergquist, P. L. (1999)

Family 10 and 11 xylanase genes from Caldicellulosiruptor sp. strain Rt69B.1.

Extremophiles 3, 103-111

126. Dwivedi, P. P., Gibbs, M. D., Saul, D. J., and Bergquist, P. L. (1996) Cloning,

sequencing and overexpression in Escherichia coli of a xylanase gene, xynA from the

thermophilic bacterium Rt8B.4 genus Caldicellulosiruptor. Appl Microbiol

Biotechnol 45, 86-93

127. Gibbs, M. D., Elinder, A. U., Reeves, R. A., and Bergquist, P. L. (1996) Sequencing,

cloning and expression of a β-1,4-mannanase gene, manA, from the extremely

thermophilic anaerobic bacterium, Caldicellulosiruptor Rt8B.4. FEMS Microbiology

Letters 141, 37-43

128. Sunna, A. (2010) Modular organisation and functional analysis of dissected modular

β-mannanase CsMan26 from Caldicellulosiruptor Rt8B.4. Appl Microbiol Biotechnol

86, 189-200

129. Payne, C. M., Resch, M. G., Chen, L., Crowley, M. F., Himmel, M. E., Taylor, L. E.,

2nd, Sandgren, M., Stahlberg, J., Stals, I., Tan, Z., and Beckham, G. T. (2013)

69

Glycosylated linkers in multimodular lignocellulose-degrading enzymes dynamically

bind to cellulose. Proc. Natl. Acad. Sci. U.S.A. 110, 14646-14651

130. Wootton, J. C. (1994) Non-globular domains in protein sequences: Automated

segmentation using complexity measures. Comput. Chem. 18, 269-285

131. Denman, S., Xue, G. P., and Patel, B. (1996) Characterization of a Neocallimastix

patriciarum cellulase cDNA (celA) homologous to Trichoderma reesei

cellobiohydrolase II. Appl Environ Microbiol 62, 1889-1896

132. Receveur, V., Czjzek, M., Schülein, M., Panine, P., and Henrissat, B. (2002)

Dimension, Shape, and Conformational Flexibility of a Two Domain Fungal

Cellulase in Solution Probed by Small Angle X-ray Scattering. J. Biol. Chem. 277,

40887-40892

133. von Ossowski, I., Eaton, J. T., Czjzek, M., Perkins, S. J., Frandsen, T. P., Schulein,

M., Panine, P., Henrissat, B., and Receveur-Brechot, V. (2005) Protein disorder:

conformational distribution of the flexible linker in a chimeric double cellulase.

Biophys J 88, 2823-2832

134. Stals, I., Sandra, K., Geysens, S., Contreras, R., Van Beeumen, J., and Claeyssens, M.

(2004) Factors influencing glycosylation of Trichoderma reesei cellulases. I:

Postsecretorial changes of the O- and N-glycosylation pattern of Cel7A. Glycobiology

14, 713-724

70

135. Kanafusa-Shinkai, S., Wakayama, J. i., Tsukamoto, K., Hayashi, N., Miyazaki, Y.,

Ohmori, H., Tajima, K., and Yokoyama, H. (2013) Degradation of microcrystalline

cellulose and non-pretreated plant biomass by a cell-free extracellular

cellulase/hemicellulase system from the extreme thermophilic bacterium

Caldicellulosiruptor bescii. J. Biosci. Bioeng. 115, 64-70

136. Sabathe, F., and Soucaille, P. (2003) Characterization of the CipA scaffolding protein

and in vivo production of a minicellulosome in Clostridium acetobutylicum. J

Bacteriol 185, 1092-1096

137. Upreti, R. K., Kumar, M., and Shankar, V. (2003) Bacterial glycoproteins: functions,

biosynthesis and applications. Proteomics 3, 363-379

138. Lochner, A., Giannone, R. J., Keller, M., Antranikian, G., Graham, D. E., and

Hettich, R. L. (2011) Label-free quantitative proteomics for the extremely

thermophilic bacterium Caldicellulosiruptor obsidiansis reveal distinct abundance

patterns upon growth on cellobiose, crystalline cellulose, and switchgrass. J.

Proteome Res. 10, 5302-5314

139. Dam, P., Kataeva, I., Yang, S. J., Zhou, F., Yin, Y., Chou, W., Poole, F. L., 2nd,

Westpheling, J., Hettich, R., Giannone, R., Lewis, D. L., Kelly, R., Gilbert, H. J.,

Henrissat, B., Xu, Y., and Adams, M. W. (2011) Insights into plant biomass

conversion from the genome of the anaerobic thermophilic bacterium

Caldicellulosiruptor bescii DSM 6725. Nucleic Acids Res. 39, 3240-3254

71

140. Handelsman, J. (2004) Metagenomics: application of genomics to uncultured

microorganisms. Microbiol Mol Biol Rev 68, 669-685

141. Sleator, R. D., Shortall, C., and Hill, C. (2008) Metagenomics. Lett Appl Microbiol

47, 361-366

142. El Kaoutari, A., Armougom, F., Gordon, J. I., Raoult, D., and Henrissat, B. (2013)

The abundance and variety of carbohydrate-active enzymes in the human gut

microbiota. Nat Rev Microbiol 11, 497-504

143. Li, L. L., Taghavi, S., McCorkle, S. M., Zhang, Y. B., Blewitt, M. G., Brunecky, R.,

Adney, W. S., Himmel, M. E., Brumm, P., Drinkwater, C., Mead, D. A., Tringe, S.

G., and Lelie, D. (2011) Bioprospecting metagenomics of decaying wood: mining for

new glycoside hydrolases. Biotechnol Biofuels 4, 23

144. Duan, C. J., Xian, L., Zhao, G. C., Feng, Y., Pang, H., Bai, X. L., Tang, J. L., Ma, Q.

S., and Feng, J. X. (2009) Isolation and partial characterization of novel genes

encoding acidic cellulases from metagenomes of buffalo rumens. J Appl Microbiol

107, 245-256

145. Nacke, H., Engelhaupt, M., Brady, S., Fischer, C., Tautzt, J., and Daniel, R. (2012)

Identification and characterization of novel cellulolytic and hemicellulolytic genes

and enzymes derived from German grassland soil metagenomes. Biotechnol Lett 34,

663-675

72

146. Chung, D., Farkas, J., and Westpheling, J. (2013) Overcoming restriction as a barrier

to DNA transformation in Caldicellulosiruptor species results in efficient marker

replacement. Biotechnol. Biofuels 6, 82

147. Farkas, J., Chung, D., Cha, M., Copeland, J., Grayeski, P., and Westpheling, J. (2013)

Improved growth media and culture techniques for genetic analysis and assessment of

biomass utilization by Caldicellulosiruptor bescii. J. Ind. Microbiol. Biotechnol. 40,

41-49

148. Chung, D., Cha, M., Farkas, J., and Westpheling, J. (2013) Construction of a stable

replicating shuttle vector for Caldicellulosiruptor species: use for extending genetic

methodologies to other members of this genus. PLoS One 8, e62881

149. Chung, D., Farkas, J., and Westpheling, J. (2013) Detection of a novel active

transposable element in Caldicellulosiruptor hydrothermalis and a new search for

elements in this genus. J Ind Microbiol Biotechnol 40, 517-521

150. Chung, D., Farkas, J., Huddleston, J. R., Olivar, E., and Westpheling, J. (2012)

Methylation by a unique alpha-class N4-cytosine methyltransferase is required for

DNA transformation of Caldicellulosiruptor bescii DSM6725. PLoS One 7, e43844

151. Chung, D. H., Huddleston, J. R., Farkas, J., and Westpheling, J. (2011) Identification

and characterization of CbeI, a novel thermostable restriction enzyme from

Caldicellulosiruptor bescii DSM 6725 and a member of a new subfamily of HaeIII-

like enzymes. J Ind Microbiol Biotechnol 38, 1867-1877

73

152. Onyenwoke, R. U., Lee, Y.-J., Dabrowski, S., Ahring, B. K., and Wiegel, J. (2006)

Reclassification of Thermoanaerobium acetigenum as Caldicellulosiruptor

acetigenus comb. nov. and emendation of the genus description. Int J Syst Evol

Microbiol 56, 1391-1395

153. Yang, S. J., Kataeva, I., Wiegel, J., Yin, Y., Dam, P., Xu, Y., Westpheling, J., and

Adams, M. W. (2010) Classification of 'Anaerocellum thermophilum' strain DSM

6725 as Caldicellulosiruptor bescii sp. nov. Int J Syst Evol Microbiol 60, 2011-2015

154. Kataeva, I. A., Yang, S. J., Dam, P., Poole, F. L., 2nd, Yin, Y., Zhou, F., Chou, W.

C., Xu, Y., Goodwin, L., Sims, D. R., Detter, J. C., Hauser, L. J., Westpheling, J., and

Adams, M. W. (2009) Genome sequence of the anaerobic, thermophilic, and

cellulolytic bacterium "Anaerocellum thermophilum" DSM 6725. J. Bacteriol. 191,

3760-3761

155. Miroshnichenko, M. L., Kublanov, I. V., Kostrikina, N. A., Tourova, T. P.,

Kolganova, T. V., Birkeland, N.-K., and Bonch-Osmolovskaya, E. A. (2008)

Caldicellulosiruptor kronotskyensis sp. nov. and Caldicellulosiruptor hydrothermalis

sp. nov., two extremely thermophilic, cellulolytic, anaerobic bacteria from

Kamchatka thermal springs. Int J Syst Evol Microbiol 58, 1492-1496

156. Blumer-Schuette, S. E., Ozdemir, I., Mistry, D., Lucas, S., Lapidus, A., Cheng, J.-F.,

Goodwin, L. A., Pitluck, S., Land, M. L., Hauser, L. J., Woyke, T., Mikhailova, N.,

Pati, A., Kyrpides, N. C., Ivanova, N., Detter, J. C., Walston-Davenport, K., Han, S.,

74

Adams, M. W. W., and Kelly, R. M. (2011) Complete genome sequences for the

anaerobic, extremely thermophilic plant biomass-degrading bacteria

Caldicellulosiruptor hydrothermalis, Caldicellulosiruptor kristjanssonii,

Caldicellulosiruptor kronotskyensis, Caldicellulosiruptor owensensis, and

Caldicellulosiruptor lactoaceticus. J. Bacteriol. 193, 1483-1484

157. Bredholt, S., Sonne-Hansen, J., Nielsen, P., Mathrani, I. M., and Ahring, B. K. (1999)

Caldicellulosiruptor kristjanssonii sp. nov., a cellulolytic, extremely thermophilic,

anaerobic bacterium. Int J Syst Bacteriol 49 Pt 3, 991-996

158. Mladenovska, Z., Mathrani, I., and Ahring, B. (1995) Isolation and characterization

of Caldicellulosiruptor lactoaceticus sp. nov., an extremely thermophilic, cellulolytic,

anaerobic bacterium. Arch. Microbiol. 163, 223-230

159. Hamilton-Brehm, S. D., Mosher, J. J., Vishnivetskaya, T., Podar, M., Carroll, S.,

Allman, S., Phelps, T. J., Keller, M., and Elkins, J. G. (2010) Caldicellulosiruptor

obsidiansis sp. nov., an Anaerobic, Extremely Thermophilic, Cellulolytic Bacterium

Isolated from Obsidian Pool, Yellowstone National Park. Appl Environ Microbiol 76,

1014-1020

160. Elkins, J. G., Lochner, A., Hamilton-Brehm, S. D., Davenport, K. W., Podar, M.,

Brown, S. D., Land, M. L., Hauser, L. J., Klingeman, D. M., Raman, B., Goodwin, L.

A., Tapia, R., Meincke, L. J., Detter, J. C., Bruce, D. C., Han, C. S., Palumbo, A. V.,

Cottingham, R. W., Keller, M., and Graham, D. E. (2010) Complete genome

75

sequence of the cellulolytic thermophile Caldicellulosiruptor obsidiansis OB47T. J.

Bacteriol. 192, 6099-6100

161. Huang, C. Y., Patel, B. K., Mah, R. A., and Baresi, L. (1998) Caldicellulosiruptor

owensensis sp. nov., an anaerobic, extremely thermophilic, xylanolytic bacterium. Int

J Syst Bacteriol 48 Pt 1, 91-97

162. Rainey, F. A., Donnison, A. M., Janssen, P. H., Saul, D., Rodrigo, A., Bergquist, P.

L., Daniel, R. M., Stackebrandt, E., and Morgan, H. W. (1994) Description of

Caldicellulosiruptor saccharolyticus gen. nov., sp. nov: An obligately anaerobic,

extremely thermophilic, cellulolytic bacterium. FEMS Microbiol Lett 120, 263-266

76

CHAPTER 2

Multi-Domain, Surface Layer Associated Glycoside Hydrolases Contribute to Plant Polysaccharide Degradation by Caldicellulosiruptor species

Jonathan M. Conway, William S. Pierce, Jaycee H. Le, George W. Harper, John H. Wright, Allyson L. Tucker, Jeffrey V. Zurawski, Laura L. Lee, Sara E. Blumer-Schuette, and Robert M. Kelly

Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695

Published in: J. Biol. Chem. 291, 6732-6747 (March 2016)

77

Abstract

The genome of the extremely thermophilic bacterium, Caldicellulosiruptor kronotskyensis, encodes 19 S-layer homology (SLH) domain-containing proteins, the most in any Caldicellulosiruptor species genome sequenced to date. These SLH proteins include five glycoside hydrolases (GHs) and one polysaccharide lyase, the genes for which were transcribed at high levels during growth on plant biomass. The largest GH identified so far in this genus, Calkro_0111 (2,435 amino acids), is completely unique to C. kronotskyensis and contains SLH domains. Calkro_0111 was produced recombinantly in Escherichia coli as two pieces, containing the GH16 and GH55 domains, respectively, as well as putative binding and spacer domains. These displayed endo- and exo-glucanase activity on the β-1,3-1,6- glucan laminarin. A series of additional truncation mutants of Calkro_0111 revealed the essential architectural features required for catalytic function. Calkro_0402, another of the

SLH-domain GHs in C. kronotskyensis, when produced in E. coli, was active on a variety of xylans and β-glucans. Unlike Calkro_0111, Calkro_0402 is highly conserved in the genus

Caldicellulosiruptor and among other biomass degrading , but missing from

Caldicellulosiruptor bescii. As such, the gene encoding Calkro_0402 was inserted into the C. bescii genome, creating a mutant strain with its S-layer extensively decorated with

Calkro_0402. This strain consequently degraded xylans more extensively than wild-type C. bescii. The results here provide new insights into the architecture and role of SLH domain

GHs and demonstrate that hemicellulose degradation can be enhanced through a non-native

SLH domain GHs engineered into the genomes of Caldicellulosiruptor species.

78

Introduction

Many bacteria (1,2) and archaea (3) produce a two-dimensional, para-crystalline array of protein that covers the outside of the cell, referred to as a surface layer (S-layer). In bacteria, the majority of S-layer proteins are non-covalently associated with the bacterial cell surface via specialized domains at their amino- or carboxy-terminus, typically from one of three distinct, but analogous, domain categories: Surface Layer Homology (SLH) (4,5), Cell

Wall Binding repeat 2 (CWB2) (6), or Noncovalent Attachment Domain (NCAD) (7,8). In archaea, S-layer proteins are most often attached to the cell membrane. S-layer proteins can be anchored to the membrane via a C-terminal transmembrane helix domain (3,9), or covalently attached to cell membrane lipid via an archaeosortase, as shown in Haloferax volcanii (10,11). In most microorganisms, the S-layer is comprised entirely of one or two proteins (SLPs), which self- assemble. In addition to these SLPs, some Firmicutes produce a family of proteins that also contain SLH, CWB2, or NCAD domains, and as such, are predicted to be associated with the S-layer (12). SLPs and S-layer associated proteins can also contain domains with specialized functions allowing the S-layer to play a role in adherence and biofilm formation (13-16), biogenesis of the cell envelope (17), cell division

(18), motility (19,20), and, of interest here, plant cell wall degradation (21,22).

Many lignocellulose-degrading bacteria utilize SLH domain-containing proteins to display plant biomass degrading Carbohydrate Active enZYmes (CAZymes) (23) on the cell surface. In cellulosomal bacteria, SLH domain proteins attached to the cell surface interact with the scaffoldin protein of the multi-enzyme cellulosome complex to anchor it to the cell surface (24-26). In many non-cellulosomal, biomass-degrading bacteria, multi-domain

79

CAZymes containing SLH proteins are implicated in lignocellulose deconstruction. About

5% of the 20,710 predicted SLH domain proteins, represented in 1889 genomes from the

Integrated Microbial Genomes database (27), contain at least one identifiable CAZyme- related component: glycoside hydrolase (GH), polysaccharide lyase (PL), carbohydrate esterase (CE), or carbohydrate binding module (CBM).

Multi-domain CAZymes containing SLH domains have been characterized from

Caldanaerobius polysaccharolyticus (28,29), Bacillus sp. (30), Paenibacillus sp. (31-36),

Clostridium sp. (37-44), Thermoanaerobacterium sp. (45-51), and Caldicellulosiruptor sp.

(21,52). Across these species, the number of identifiable CAZyme domain-containing SLH proteins and total predicted SLH proteins vary significantly. Examples include: Paenibacillus sp. strain JDR-2 (23 CAZyme domain containing of 78 total SLH proteins), Clostridium thermocellum ATCC 27405 (4 of 25), Thermoanaerobacterium saccharolyticum strain

JW/SL-YS485 DSM 8691 (7 of 12), Caldicellulosiruptor bescii (1 of 12), and

Caldicellulosiruptor kronotskyensis (6 of 19).

The genus Caldicellulosiruptor is comprised of extremely thermophilic, gram positive, anaerobic bacteria that are able to attach to, and degrade, the variety of polysaccharides found in lignocellulosic biomass. Currently, the genomes of 12 members of the genus have been sequenced, representing species isolated from globally distributed terrestrial hot springs (53-57). Caldicellulosiruptor species produce many extracellular

CAZymes, including some which have SLH domains (58). The enzyme inventory produced by individual species, and thus the capacity to degrade plant biomass polysaccharides, varies significantly across the genus (59,60). C. kronotskyensis has the largest inventory of

80

CAZymes of any Caldicellulosiruptor species, with 31 CAZymes compared to 20 CAZymes in C. bescii, which is to date the most studied member of the genus (61).

During growth on complex carbohydrates, Caldicellulosiruptor species physically associate with the substrate. This attachment is mediated in part by non-catalytic cellulose binding proteins, called tāpirins, that were recently identified and characterized (62). In addition, proteins anchored to the cell surface within the S-layer via SLH domains are also believed to play a role in the attachment of Caldicellulosiruptor species to insoluble plant biomass substrates (21,63). Two such SLH domain proteins from C. saccharolyticus,

Csac_0678 and Csac_2722, were shown to bind to insoluble substrates, and could be identified in the S-layer protein cell fraction, implying a role in cell-substrate attachment

(21). However, more information is needed to understand the role of SLH domain proteins in plant biomass deconstruction, especially how their function relates to other CAZymes produced for this purpose. In this study, the localization, biochemical characteristics and physiological role of two SLH domain enzymes from C. kronotskyensis, xylanase

Calkro_0402 and laminarinase Calkro_0111, were examined from this perspective.

Materials and Methods

Bacterial Strains, Plasmids and Reagents

Cloning and expression of recombinant proteins used various E. coli strains:

NovaBlue GigaSingles™ (EMD Millipore), Rosetta™ 2(DE3) Singles™ (EMD Millipore),

NEB 10-beta electrocompetent E. coli (New England Biolabs), and Arctic Express (DE3)RIL

81

E. coli (Agilent Technologies). Axenic strains of Caldicellulosiruptor bescii and

Caldicellulosiruptor kronotskyensis were obtained from the Leibniz Institute DSMZ-German

Collection of Microorganisms and Cell Cultures. Caldicellulosiruptor bescii strain

JWCB018 (64) and non-replicating vector pDCW121 (65) were obtained from J.

Westpheling (University of Georgia, Athens, Georgia). Genes of interest were amplified by polymerase chain reaction (PCR) and cloned using the pET46 Ek/LIC vector kit (EMD

Millipore) for protein expression. All other vectors were constructed using one-step isothermal assembly of overlapping dsDNA, as described previously (66). Plasmids were isolated using Qiaprep miniprep kits (Qiagen) and plasmid sequences confirmed by sequencing (Genewiz). Oligonucleotide primer sequences used are listed in Table 1.

Carbohydrates and biomasses used included: laminarin from Laminaria digitata (Sigma

L9634), xylan from birchwood (Sigma X0502), xylan from oat spelts (Sigma X0627), and dilute acid pretreated Populus trichocarpa X Populus deltoides (National Renewable Energy

Laboratory, Golden, CO).

Growth Media and Culture Conditions

All E. coli cultures were maintained in Luria-Bertani (LB) medium (5 g/L yeast extract, 10 g/L tryptone, 10 g/L NaCl) or LB medium with 1.5% (w/v) agar plates supplemented with 50 μg/ml carbenicillin (Corning cellgro), 34 μg/ml chloramphenicol

(Sigma), 20 μg/ml gentamycin (Chem-Impex International Inc.), 50 μg/ml streptomycin

(Sigma), and/or 50 μg/ml apramycin (Fisher Scientific), as appropriate. Protein expression was performed in either Terrific Broth (TB) medium (containing 12 g/L tryptone, 24 g/L

82

yeast extract, 4 mL/L glycerol, 2.31 g/L KH2PO4, and 12.54 g/L K2HPO4) or ZYM-5052 medium (67) modified to contain 0.625% glycerol, 0.0625% glucose, and 0.25% lactose.

Caldicellulosiruptor strains were grown anaerobically at 70°C in modified DSM640 medium (DSMZ [http://www.dsmz.de]), containing 0.9 g/L NH4Cl, 0.9 g/L NaCl, 0.4 g/L

MgCl2 ● 6H2O, 0.75 g/L KH2PO4, 1.5 g/L K2HPO4, 1 g/L yeast extract, 1mL/L trace element solution SL-10, 5 g/L cellobiose, 0.5 mg/L resazurin, and 0.75 g/L L-Cysteine-HCl ● H20).

Caldicellulosiruptor genomic DNA (gDNA) for cloning and vector construction was extracted, as described previously (68). For genetic manipulations of C. bescii, strains were grown anaerobically at 70°C in low osmolality defined (LOD) or low osmolality complex

(LOC) media (69). C. bescii was plated embedded in LOD medium with 1.5% agar and grown at 70°C in anaerobic tanks with N2 headspace. For solubilization experiments and immunofluorescence microscopy, Caldicellulosiruptor strains were grown on a modified defined version of DSMZ671 medium (671d medium), described previously (70), with 5 g/L of the specific biomass substrate as the carbon source. For growth of all uracil auxotroph mutants, growth medium was supplemented with 40 mM uracil.

Expression of Recombinant Proteins in E. coli

All Calkro_0111 Truncation Mutants (TMs) and Calkro_0402 TM1 were cloned into the pET46 Ek/LIC vector maintained in NovaBlue E. coli. Sequence confirmed plasmids were transformed into Rosetta 2(DE3) E. coli for protein expression. Proteins were expressed in modified ZYM-5052 auto-induction medium at 37°C 250 rpm. Cells were harvested 18-24 hours after induction by centrifugation at 6000 x g for 10 minutes. For the

83

expression of Calkro_0402, the gene was cloned into the pET46 Ek/LIC vector without the predicted signal peptide, as determined by the SignalP 4.1 Server (71), and maintained in

NEB 10-beta E. coli. The sequence confirmed plasmid was transformed into Arctic Express

(DE3)RIL E. coli for protein expression. Expression was performed in TB media by growing cells at 30°C and 250 rpm until optical density at 600 nm (OD600) reached 1.0 followed by induction at 13°C with 1 mM IPTG (isopropyl-β-D-thiogalactopyranoside). Cells were harvested 16-20 hours later by centrifugation at 6000 x g for 10 minutes. All cell pellets were frozen at -20°C prior to purification.

Purification of Recombinant Protein and Truncation Mutants

Cell pellets were re-suspended in 5 ml of 20 mM sodium phosphate pH 7.4, 0.5 M

NaCl, 20 mM imidazole, 0.1% (v/v) Nonidet P40, and 1 mg/ml lysozyme per gram of wet cell pellet. Cells were lysed in a French Press at 16,000 psi. Cell lysates were heat treated at

65°C for 20 min (except for the full length Calkro_0402 protein product which was not heat treated) to remove heat labile proteins from E. coli. The cell lysate was then centrifuged at

25,000 x g for 20-60 minutes and the supernatant was passed through a 0.22 µm filter to prepare cell free cell extract. Proteins were purified using 1 mL or 5 mL HisTrap HP Ni

Sepharose (GE Life Sciences) immobilized metal affinity chromatography (IMAC) columns, operated according to the manufacturer instructions. For full length Calkro_0402 and

Calkro_0111 TM8, size exclusion chromatography (SEC) was used after IMAC on a HiLoad

26/600 Superdex 200 pg column (GE Life Sciences) in 50 mM sodium phosphate, 150 mM

NaCl, pH 7.2 buffer. All chromatography steps were performed on a Biologic DuoFlow

84

FPLC (BioRad). Purity of the recombinant proteins was evaluated by SDS-PAGE using

NuPAGE Novex 4-12% Bis-Tris protein gels (Life Technologies) or 4-15% Mini-PROTEAN

TGX gels (Bio-Rad) with Benchmark protein ladder (Life Technologies), and staining by

GelCode Blue Safe Protein Stain (Thermo Scientific). The concentration of purified proteins was determined using the bicinchoninic acid assay (Pierce, Thermo Fisher Scientific).

Optimal pH and Temperature Determination

The pH optimum for each recombinant enzyme was determined at 70°C in buffers at pH 2.5-10 (50 mM citrate buffer pH 2.5 and 3.5; 50 mM sodium acetate buffer pH 4.5, 5, 5.5, and 6; 50 mM sodium phosphate pH 6.5, 7, and 8; 50 mM sodium bicarbonate pH 9.2 and

10). The temperature optimum was determined at the optimal pH between 30 and 100°C.

Enzyme reactions and no enzyme/substrate controls were prepared in triplicate with 100 µL of 1% substrate and incubated in PCR multi-well plates in a thermocycler (Eppendorf).

Activity was assessed by measuring reducing sugar released using a 3,5-dinitrosalicylic acid

(DNS) assay modified from (72,73) and adapted to a 96-well microplate format (74,75). The

DNS reagent used contained 1.6% (w/v) sodium hydroxide, 30% (w/v) sodium potassium tartrate, and 0.1% (w/v) 3,5-dinitrosalicilic acid. Briefly, 30 µL of sample was mixed with 60

µL of DNS reagent and incubated in a 96-well PCR plate in a thermocycler (Eppendorf) using the following program: 95°C for 5 min, 48°C for 1 min, hold at 20°C. Then the DNS reaction was diluted with distilled water and absorbance measured at 540 nm. Glucose or xylose was used as standards for glucooligosaccharide or xylooligosaccharide release.

85

Analysis of Oligosaccharide Products from Enzymatic Hydrolysis

To analyze oligosaccharides released by the various enzymes, enzyme was added to

500 µL of 1% substrate in buffer at the enzyme’s optimum pH and incubated between 0 and

240 minutes at 70°C at 800 rpm in a mixing thermocycler (Eppendorf). Reactions without substrate and/or enzyme were also assessed. After incubation, tubes were spun at 16,000 x g for 5 minutes at 4°C and the supernatant applied to 0.5 mL 10K MWCO protein concentrators (Pierce, Thermo Fisher Scientific). The protein concentrator eluent was diluted

10-fold with distilled water and analyzed by high performance liquid chromatography

(HPLC) (Alliance e2695 separations module; Waters) with refractive index (model 2414;

Waters) and photodiode array (model 2998; Waters) detectors. Columns were either the KS-

801 (xylooligosaccharides) or KS-802 (glucooligosaccharides) (Shodex) operated with water as the mobile phase, 0.6 mL/min and 80°C. Laminarioligosaccharides (L2-L6) (Seikagaku

Corporation), individual xylooligosaccharides (X2-X6) (Megazyme), glucose (Sigma), and xylose (Acros Organics) were used as standards.

Confocal and Epifluorescence Microscopy

All centrifugation steps for preparation of labeled cells were carried out at 6,000 x g for 10 min. Cells harvested from a 50 mL culture in 671d medium were washed one time with sterile 1X Phosphate Buffered Saline (PBS) and cells were fixed in 1X PBS containing

4% formaldehyde (methanol free, Fisher) for 30 min at room temperature with gentle shaking, followed by washing three times with 50 mL of PBS. Cells were re-suspended in 3 mL antibody blocking solution containing 5% (v/v) normal goat serum (Immunoreagents),

86

1% (w/v) bovine serum albumin (protease and nuclease free, Fisher), and 0.05% (v/v)

Tween-20 (Fisher) in PBS and incubated on a Boeckle rocker for 60 min at room temperature. After this blocking step, cells were harvested by centrifugation and re- suspended in antibody blocking solution containing 100 µg/mL of primary antibody.

Polyclonal antibodies were raised against Calkro_0111 TM7 or Calkro_0402 TM1 in chickens and supplied as total IgY with pre-immune total IgY controls by GeneTel

Laboratories (Madison, Wisconsin). Cells were incubated with the primary antibody for 18 hours at 4°C on a Boeckle rocker. After this incubation, cells were washed three times with 1 mL PBS containing 1% (w/v) bovine serum albumin and 0.05% (v/v) Tween-20 and re- suspended in antibody blocking solution containing a 1:400 dilution of goat anti-chicken

DyLight488 conjugated secondary antibody (Immunoreagents) and incubated on a Boeckle rocker for 60 minutes at room temperature. Cells were harvested and washed three times with PBS, re-suspended in 100 µL of 3.6 µM DAPI in PBS and incubated at 4°C for 18 hours. Following this incubation, cells were washed three times, re-suspended with PBS, and vacuum filtered onto 0.22 µm polycarbonate hydrophilic isopore membrane filters (EDM

Millipore) and mounted in 15 µL of SlowFade Diamond anti-fade mountant (Life

Technologies). For Calkro_0111 and Calkro_0402 localization in C. kronotskyensis, slides were imaged on a Zeiss LSM 710 confocal workstation using a 63x oil immersion plan- apochromat (NA 1.4) objective (NC State University Cellular and Molecular Imaging

Facility, Raleigh, NC). Image processing was performed using Zeiss Zen Blue software.

Epifluorescence imaging and cell counting was performed using a Nikon eclipse 50i microscope with a Plan Fluor 100x (NA 1.3) oil emersion objective and Nikon DS-Fi1

87

camera. For routine cell counting, cells were stained with acridine orange, as previously described (76). For immunofluorescence images of C. bescii genetic mutant strains, cells were labeled and slides were prepared as described above for confocal imaging. Images were captured as a z-stack of images through the sample with 0.33 µm steps controlled by the

ES10ZE Focus controller (Prior Scientific) and processed into a single focused image using the Nikon Elements software extended depth of field processing methods.

Transmission electron microscopy was performed at the Laboratory for Advanced

Electron and Light Optical Methods (College of Veterinary Science, North Carolina State

University).

Genetic Manipulation of C. bescii

To prepare competent cells, a 500 mL culture of C. bescii strain JWCB018 was grown to an optical density at 680 nm of 0.06-0.07 in LOD medium supplemented with 1X

19 amino acid solution (77). The culture was cooled to room temperature and cells were harvested by centrifugation. All centrifugation steps were carried out at 6,000 x g for 10 min.

The cell pellet was washed three times with 10% sucrose and re-suspended with 10% sucrose to a total volume of 100-120 µL. Fifty µL of competent cells were mixed with 1 µg of pJMC009 plasmid DNA at room temperature and electroporated using 1 mm gap cuvettes

(USA Scientific) and a Gene Pulser II system with Pulse Controller PLUS module (Bio-Rad) operated at 2.25 kV, 600 Ω, and 25 µF. Immediately following electroporation, cells were transferred to 10 mL of pre-warmed LOC medium and incubated at 70°C for one hour. After this incubation, the culture was cooled to room temperature, and cells were harvested by

88

centrifugation, transferred to 100 mL of pre-warmed LOD medium lacking uracil, and incubated at 70°C for 2-4 days to select for first crossover transformants. This culture was passaged and then plated on LOD medium lacking uracil to select individual colonies.

Integration of the pJMC009 vector was confirmed by PCR and a successful first crossover mutant strain was plated on LOD medium supplemented with 40 mM uracil and 4 mM 5- fluoroorotic acid (5-FOA) to select for second crossover mutants. Isolated colonies were screened by PCR and successful second crossovers were identified. A successful second crossover was plated on LOD medium supplemented with uracil and individual colonies were isolated. This plating and isolation of colonies was repeated two times to obtain the final strain and ensure its purity. All PCR screening steps were performed using genomic

DNA isolated using the ZymoBead™ genomic DNA kit (Zymo Research).

Hydrolysis of Hemicellulose Substrates

Washed biomass substrates (oat spelt xylan, birchwood xylan) were prepared by washing 1 g substrate per 100 mL distilled water overnight at 70°C to remove soluble sugars, centrifuged at 6,000 x g for 10 min and dried at 70°C. Caldicellulosiruptor strains were passaged on modified 671d medium with respective biomass substrates 3 to 4 times.

Solubilization cultures were prepared in triplicate as 50 mL 671d medium cultures with 5 g/L substrate, inoculated to 1 x 106 cells/mL and grown at 70°C for 45.5 hours. Residual substrate was harvested by centrifugation at 6000 x g for 10 min and drying at 70°C until constant mass. The extent of solubilization was determined from the difference in mass between the biomass used to prepare each culture and the residual remaining after harvest.

89

For time course analysis of oat spelt xylan solubilization, cultures were prepared as above except cultures were sampled by removing 2 mL of culture and centrifuging it at 6,000 x g for 10 min. The supernatant was stored at -20°C prior to HPLC analysis.

Analysis of Solubilization in Culture Supernatants

Supernatant samples were analyzed for xylose content, as described previously (70).

Briefly, the samples were brought to 4% (w/w) sulfuric acid and autoclaved for 1 hour on the liquid cycle. Samples were cooled to room temperature and spun at 18,000 x g for 5 minutes and supernatant analyzed by HPLC as described above except using an Aminex-87H (300 mm by 7.8 mm; Bio-Rad) column with a mobile phase of 5 mM sulfuric acid at 0.6 mL/min and 60°C.

C. kronotskyensis and C. bescii RNA microarray - Transcriptomic data, which was previously acquired (70) and deposited in the NCBI Gene Expression Omnibus (GEO) database (Accession GSE68810), was re-analyzed with respect to the SLH domain containing proteins from C. bescii and C. kronotskyensis.

Results

S-layer Homology (SLH) Domain Proteins in Caldicellulosiruptor Species

The genomes of the twelve, sequenced Caldicellulosiruptor species collectively encode 34 different groups of SLH domain-containing proteins (Supplementary Table S-1).

About half of the SLH domain-containing protein groups in these genomes have no additional identifiable domains other than SLH domains, highlighting the limited

90

understanding of this group of proteins. Caldicellulosiruptor species produce between 10 (C. saccharolyticus and C. acetigenus) and 19 (C. kronotskyensis) SLH domain proteins.

Eight of the 34 SLH domain-containing protein groups contain identifiable catalytic

GH or PL CAZyme domains (Figure 2.1). Csac_0678 (GH5), which has been previously characterized (21), has homologs in all 12 sequenced Caldicellulosiruptor species. Three of the 12 Caldicellulosiruptor species (C. owensensis, C. obsidiansis, and C. sp. strain Rt8.B8) only produce a truncated homolog of Csac_0678 (see Table S-1). Homologs of Calkro_0402

(GH10) are found in seven sequenced Caldicellulosiruptor species, with 65-83% amino acid sequence identity. One ortholog of Calkro_0402 (76% identity) has been characterized from

Caldicellulosiruptor sp. Rt69.B1 (52), which does not have a fully sequenced genome.

Calkro_0402 and its Caldicellulosiruptor sp. homologs belong to a larger group of GH10 enzymes with similar domain architecture, both with and without C-terminal SLH domains, that have been characterized from several xylan-degrading microbes (28,31,33,34,36,37,39-

46,48-51,78).

The other six groups of catalytic SLH domain proteins (Figure 2.1) have not been characterized from Caldicellulosiruptor species. Calkro_0111 (GH16/GH55) is the largest of all of these at 2453 amino acids, and is also the largest CAZyme of any kind identified so far in the genus Caldicellulosiruptor. Calkro_0121 is a second truncated paralog of

Calkro_0111, which has 60% amino acid identity to Calkro_0111 and identical domain architecture, but is lacking the final CBM32 domain. The arrangement of the domains in

Calkro_0111 and Calkro_0121 is entirely unique to C. kronotskyensis, and no homologs of the full protein (Calkro_0111) can be identified in any other sequenced microorganism.

91

Calkro_0072 (GH16) has homologs in seven Caldicellulosiruptor species and is very similar to the β-1,3-glucanase Lic16A characterized from Clostridium thermocellum (38). C. hydrothermalis produces two catalytic SLH domain proteins that are unique to this species, putative xylanase Calhy_1629 (GH43) and putative α-glucanase Calhy_2383 (GH87).

Calkro_0550 (PL11) from C. kronotskyensis has one additional homolog in

Caldicellulosiruptor owensensis, and is a putative rhamnogalacturonan lyase. C. owensensis also produces one other putative pectate lyase SLH domain protein, Calow_2109 (PL9), which is unique to this species.

These catalytic SLH domain proteins encompass a range of enzymatic activities localized in the S-layer, presumably used to break down plant polysaccharide components that are in proximity of the cell surface. C. kronotskyensis produces six CAZyme SLH domain proteins, the most of any species; while C. bescii produces the fewest, only the

Csac_0678 homolog conserved in all Caldicellulosiruptor species. These SLH-localized

CAZymes accompany a host of other CAZyme domains found in extracellular free enzymes produced by Caldicellulosiruptor sp. for plant biomass degradation (58).

C. kronotskyensis Produces an S-layer and SLH Enzymes Calkro_0111 and

Calkro_0402 are Localized on the C. kronotskyensis Cell Surface

Microscopy (TEM) of C. kronotskyensis growing on dilute acid treated poplar (P. trichocarpa x P. deltoides) shows the cell membrane, peptidoglycan layer, and S-layer of C. kronotskyensis cells (Figure 2. 2A), with the cells appearing to attach to granules of the substrate mediated by appendages at the cell surface. To determine the localization of the

92

SLH CAZymes Calkro_0402 and Calkro_0111, polyclonal antibodies were raised against portions of these proteins and used for immunofluorescence microscopy in C. kronotskyensis.

Figures 2.2 B-C and D-E show confocal microscopy of C. kronotskyensis labeled with anti-

Calkro_0111 and anti-Calkro_0402 antibodies, respectively. Controls labeled with pre- immune total IgY antibodies show minimal labeling (data not shown). In each case, these enzymes are localized on the cell surface within the S-layer, as would be predicted by the presence of SLH domains. Because of the high similarity between Calkro_0111 and

Calkro_0121, the antibodies likely are labeling both of these proteins in Figure 2.2 B-C.

Laminarinase Calkro_0111 Displays Endo- and Exo-glucanase Activity in Separate

Catalytic Domains

Calkro_0111 contains fifteen predicted domains, including the two catalytic units:

GH16 and GH55 (Figure 2.3A). Because the full enzyme could not be produced recombinantly in E. coli, the enzyme was split into several truncation mutants (TMs) for characterization. Two of these TMs were used to characterize the enzymatic activity: TM1, containing the GH16 domain and covering the first half of Calkro_0111, and TM8, containing the GH55 domain and covering the second half of Calkro_0111. For both TM1 and TM8 the optimal pH was 5, while the optimal temperature was 75°C (Figure 2.3B).

Analysis of the oligosaccharides released from laminarin by these two halves of Calkro_0111 showed that the GH55 domain of TM8 generates primarily glucose and a small amount of laminaribiose from laminarin, while the GH16 domain of TM1 released laminooligosaccharides of size L3 (laminaritriose) and greater, which slowly accumulated

93

with time (Figure 2.3C). Taken together, this suggests that the GH16 domain is an endoglucanase, while the GH55 domain is an exoglucanase. Considering the full-length enzyme, coordination between the GH16 and GH55 domains is expected, such that the GH16 cuts new chain ends for the GH55 to digest, thereby liberating glucose. This synergism is not uncommon in freely secreted Caldicellulosiruptor enzymes with two catalytic domains

(76,79,80), but synergism had yet to be demonstrated in any CAZyme SLH domain protein.

This is primarily due to the fact that very few SLH domain proteins have two catalytic

CAZyme domains, further highlighting the uniqueness of Calkro_0111.

To further explore the domain arrangement and the contribution of the non-catalytic domains to activity, a range of TMs (Figure 2.3A) were produced in E. coli and analyzed.

Activity of these TMs at 70°C on laminarin was evaluated at pH 5, the enzyme optimum pH, and pH 7, the growth optimum pH for Caldicellulosiruptor (Figure 2.3D). TM3 and TM4 had the highest activity at both pHs tested, while TM1 was slightly less active. TM2, which is truncated between the GH16 and the first actin cross linking-like (ACL) domain, was significantly less active. TM5, which contains the GH16 domain alone, and TM6 and TM7, which contain the ACL and Fibronectin type-III domains (FN3) and not the GH16, all displayed little to no activity (Figure 2.3D). TM4, which contains the GH16 and first ACL domain, represents the smallest portion of the GH16 side of the enzyme that remains fully active under these conditions. This suggests that the ACL domain plays an important role in

GH16 function. The ACL domain could stabilize the GH16 domain, a role that has been shown before for accessory domains in other GH multi-domain enzymes. For example, two

FN3 domains (also previously called X1 modules) of C. thermocellum cellobiohydrolase A

94

(CbhA) increased the thermostability of the adjacent GH9 domain (81). For the Calkro_0111

GH55 domain, removal of the ACL and FN3 domains from TM8 to TM9 and then removal of the two CBM32 domains from TM9 to TM10 resulted in successive reductions in activity

(Figure 2.3D). This suggests that both the ACL-FN3-FN3 grouping and two CBM32 domains play an important role in the activity of the Calkro_0111 GH55.

Biochemical Characterization of Calkro_0402 and Relatedness To Other

CBM22/GH10/CBM9 Enzymes

Calkro_0402 belongs to a large group of homologous CBM22, GH10, and CBM9 containing proteins, from a variety of bacteria, primarily Firmicutes. An unrooted neighbor- joining phylogenetic tree constructed from the full amino acid sequences of top blastp hits to

Calkro_0402 and similar xylanases that have been previously characterized shows the relatedness of xylanase enzymes from this group (Figure 2.4A).

Calkro_0402 contains three CBM22, one catalytic GH10, two CBM9 domains, and one Cadherin-like domain, in addition to C-terminal SLH domain repeats (Figure 2.4B). Full length Calkro_0402 was produced recombinantly in E. coli (Figure 2.4C) and the optimal pH and temperature for activity on birchwood xylan were found to be pH 5.5 and 80°C, respectively (Figure 2.4D). Calkro_0402 releases primarily xylotriose and xylobiose from birchwood xylan, but also a small amount of xylose (Figure 2.4E). This type of activity is consistent with other xylanase homologs that have been previously characterized (28,31).

Activity of Calkro_0402 was also detected on oat spelt xylan, wheat arabinoxylan, pachyman

95

1,3-β-glucan, barley β-glucan, and lichenan, in addition to birchwood xylan, all at pH 5.5 and

70°C (data not shown).

Transcriptomic Analysis of SLH Domain Proteins in C. bescii and C. kronotskyensis

Transcriptomic analysis was performed on C. kronotskyensis and C. bescii, the

Caldicellulosiruptor species with the most and least catalytic CAZyme SLH domain proteins, respectively, when these species were grown on Avicel (crystalline cellulose) and the more heterogeneous switchgrass. All CAZyme SLH domain proteins from both species, except

Calkro_0111 and Calkro_0121, were up-regulated on switchgrass. This transcriptional response to switchgrass suggests these genes respond to components of switchgrass, like hemicellulose polysaccharides, not found in crystalline cellulose Avicel. The transcriptional level of these genes on switchgrass is similar to the free enzymes in the glucan degradation locus (GDL), known to be a genomic feature of cellulolytic Caldicellulosiruptor sp.

(59,61,70). Calkro_0333 and Athe_2303 are very highly transcribed under both conditions.

These proteins are homologs of Csac_2451, which was previously identified as the main SLP

(59,60). A number of other non-catalytic SLH domain proteins are transcribed at moderate to high levels, but because domains other than SLH domains are not predicted, the role of these proteins is unknown.

96

Manipulation of the C. bescii S-layer by the Insertion (Knock On) of the Gene Encoding

Calkro_0402

Because Calkro_0402 is very highly transcribed in C. kronotskyensis (Figure 2.5) and utilized by a variety of xylan degrading bacteria (Figure 2.4A), Calkro_0402 was

‘knocked-in’ to genetically tractable C. bescii to examine its potential contribution to plant biomass degradation in vivo. C. bescii is a good genetic background for testing SLH proteins in vivo, because it produces only 12 SLH domain proteins in total and does not produce a homolog of Calkro_0402. Calkro_0402 was inserted into C. bescii uracil auxotroph strain

JWCB018 using pyrF complementation and 5-FOA counter-selection. The knock-in construct contained Calkro_0402, including the native signal peptide, and the native predicted terminator sequence from C. kronotskyensis under the control of the slp promoter from the main SLP (Athe_2303) in C. bescii (Figure 2.6 A and B). As shown in Figure 2.5, the main SLP (Athe_2303) is very highly and constitutively transcribed. Thus, its promoter should direct very high levels of transcription of Calkro_0402 in the knock-in construct. The knock-in was targeted at the ΔCbeI (Athe_2438) deletion locus in C. bescii strain JWCB018, because genetic manipulation in this area of the genome was successful previously (64). The

PCR amplicon for the knock-in C. bescii strain RKCB103 compared to strain JWCB018 is shown in Figure 2.6C.

Using this Calkro_0402 knock-in C. bescii strain RKCB103, immunofluorescence microscopy was performed to investigate the expression of Calkro_0402 within the S-layer of C. bescii. For comparison, immunofluorescence microscopy was also performed on parent strain C. bescii strain JWCB018 and C. kronotskyensis, in addition to C. bescii strain

97

RKCB103 (Figure 2.6 D-F). C. bescii strain JWCB018, which does not have Calkro_0402, had minimal labeling with the anti-Calkro_0402 antibodies (Figure 2.6D), while C. bescii strain RKCB103 was extensively labeled on the cell surface (Figure 2.6F). Labeling of the native expression of Calkro_0402, in C. kronotskyensis is also shown (Figure 2.6E). Based on transcription levels of Calkro_0402 and the Athe_2303 SLP (Figure 2.5), the

Calkro_0402 transcript in C. bescii strain RKCB103 should be >10-fold higher, using the slp promoter, compared to levels observed in wild-type C. kronotskyensis. This clearly relates to an increase in Calkro_0402 protein expression in C. bescii strain RKCB103 compared to native expression in C. kronotskyensis, as seen via the immunofluorescent labeling. Also of particular note is that the native signal peptide from C. kronotskyensis on Calkro_0402 appeared to function in C. bescii to route the protein for secretion. The C. bescii produced

Calkro_0402 also localized to the surface of the C. bescii strain RKCB103 cells, and in

Figure 2.6F appears to localize primarily at the poles of the cell. The organization of SLH proteins on the cell surface is generally thought to be coordinated in bacteria that produce multiple SLH proteins, but the mechanisms by which this occurs has not been determined

(82). This is the first example of manipulation of the Caldicellulosiruptor S-layer through genetic modification.

C. bescii Strain RKCB103 Xylan Solubilization

To understand the role of Calkro_0402 in vivo, the ability of C. bescii strains

RKCB103 and JWCB018 to solubilize plant biomass in culture was examined. Figure 2.7A shows the solubilization of oat spelt and birchwood xylans. Strain RKCB103 is able to

98

solubilize significantly more washed oat spelt xylan than strain JWCB018; the xylose equivalents from the soluble xylooligosaccharides measured in the supernatant are more than double for strain RKCB103 than strain JWCB018 (Figure 2.7B). For both the washed and unwashed birch xylan substrates, strain RKCB103 performs slightly better and released more xylose equivalents to the supernatant than strain JWCB018. Other biomass substrates tested

(diluted acid pretreated Populus trichocarpa x Populus deltoides, dilute acid pretreated switchgrass, and unpretreated Panicum virgatum switchgrass) showed no significant difference in solubilization and also showed < 30 µg/mL xylose equivalents in the culture supernatant (data not shown). This suggests that on these lignocellulosic substrates C. bescii strain JWCB018 produced enough xylanase activity to remove xylan at the rate at which it was exposed from the complex polysaccharide matrix in the substrate, and cellulose solubilization or occlusion of the polysaccharides by lignins are likely limiting the overall solubilization. However, in the substrates with high xylan content, strain RKCB103 with the addition of Calkro_0402 outperformed strain JWCB018.

Immunofluorescence microscopy (Figure 2.6F) shows Calkro_0402 is localized on the surface of strain RKCB103 cells. While the enzyme is tethered to the cell surface, the GH and CBM domains of Calkro_0402 interact with the xylan substrate. This suggests that

Calkro_0402 plays a role in mediating cell attachment to xylan substrates. Cultures of strain

JWCB018 and strain RKCB103 (Figure 2.7, C and D, respectively) grown on washed birchwood xylan support this role for Calkro_0402. These cultures contained approximately the same number of cells as determined by cell counts (data not shown), but the supernatant in Figure 2.7D for strain RKCB103 with Calkro_0402 is relatively clear, as the cells appear

99

to be attached to the insoluble xylan at the bottom of the culture bottle. The strain JWCB018 cultures (Figure 2.7C) were more turbid, suggesting the cells did not as readily attached to the biomass substrate.

To understand the role of Calkro_0402 in improved xylan solubilization, a time course experiment was performed on washed oat spelt xylan. Figure 2.8A shows the planktonic cell density for strains JWCB018 and RKCB103 over the duration of the culture.

The cell densities of the two strains track each other closely. When monitoring xylose release, roughly 9 to 12 hours post-inoculation, the xylose hydrolyzed from xylooligosaccarides by strain RKCB103 was measurable above the baseline abiotic control, whereas, strain JWCB018 takes between 16 and 22 hours post-inoculation (Figure 2.8B). It is important to note that the strains were consuming xylose as they degrade it and these values represent the xylose from excess xylooligosacharides released by the enzymatic action of the cells that has not been consumed for growth. As with the closed bottle total solubilization experiment, strain RKCB103 has about double the xylose equivalents in the supernatant samples as strain JWCB018 at 45.5 hours, the point at which the cultures were harvested.

Discussion

Catalytic CAZyme-containing SLH domain proteins are produced by a variety of lignocellulose-degrading bacteria (21,22,45). The genomes of twelve Caldicellulosiruptor species encode eight different groups of these (Figure 2.1), a subset of the 34 groups of

Caldicellulosiruptor SLH domain proteins (Table S-1). These proteins are one of several

100

mechanisms by which Caldicellulosiruptor species effect plant biomass degradation

(21,59,61-63).

Calkro_0111 from C. kronotskyensis is the largest glycoside hydrolase from all of the biomass degradation enzymes in the Caldicellulosiruptor genus, free or cell-associated, and is entirely unique to C. kronotskyensis. The GH16 domain of Calkro_0111 is an endoglucanase, while the GH55 is an exoglucanase (Figure 2.3). Thus, the two GH domains likely work synergistically, as has been shown for other Caldicellulosiruptor enzymes wihout

SLH domains but with two catalytic domains (76,79,80). Various truncation mutants of

Calkro_0111 (Figure 2.3D) were examined revealing that the non-catalytic ACL domain adjacent to the GH16 domain is necessary for full activity. Furthermore, for the GH55 domain of Calkro_0111, both the ACL-FN3-FN3 domain grouping and two CBM32 domains improved the activity of the GH55 domain.

The very large multi-domain structure of Calkro_0111 is characteristic of many

Caldicellulosiruptor polysaccharide-degrading enzymes, where this architecture is thought to have arisen from domain shuffling (83,84). Specifically for Calkro_0111, the N-terminus

GH16 portion is similar to the N-terminus of Calkro_0072 (45% aa identity) and the C- terminus GH55 portion is similar to Calkro_0113 (58% aa identity). These protein segments may have combined by domain shuffling to form Calkro_0111. However, the ACL-FN3-

FN3-ACL-FN3-FN3 motif in the middle of Calkro_0111, and in homolog Calkro_0121, is completely unique to these two genes. These ACL domains belong to the same protein superfamily (CL22458 RICIN Superfamily) as Fascins, which play a role in bundling actin filaments for cell adhesion and migration in Drosophila and vertebrate species (85). Plants

101

and brown algae, such as Laminaria sp., produce actin and actin bundling proteins that are distinct from Fascins, for a variety of cellular roles (86,87). The ACL domains of

Calkro_0111 may play a role in attachment of this enzyme to actin found in algal or plant cells, similar to the role of a CBM for carbohydrate binding, to help localize the enzyme to laminarin-containing substrates.

Many GH16- and GH55-containing enzymes are active on β-1,3-glucans.

Calkro_0111 activity was detected on a model substrate, laminarin (β-1,3-1,6-glucan), albeit at modest levels such that this is not likely to be its natural substrate. In fact, β-1,3-glucans, similar to laminarin, are produced in plant reproductive and wound tissue (callose) (88,89), exopolysaccharide from bacteria (curdlan) (90), algae-like Laminaria digitata (laminarin), and lichen-like Certraria islandica (lichenan). In addition to this unique substrate preference of Calkro_0111, its genomic neighborhood is also unique, containing other extracellular biomass-degrading enzymes that have no homologs within other species in the genus. These include GH16/GH55 Calkro_0121, GH55 Calkro_0113, and GH81 Calkro_0114. This locus also encodes two ABC transporters, one of which is homologous to a putative glucooligosaccharide transporter in C. saccharolyticus (91). This entire genomic region is transcribed at relatively low levels when C. kronotskyensis is grown on Avicel or switchgrass. Taken together, this genomic locus, including Calkro_0111, likely degrades unique β-1,3-glucans from lichens, algae, or other microorganisms that may be more abundant in the natural environment of Kamchatka, Russia, from which C. kronotskyensis was isolated.

102

The large size of the gene encoding Calkro_0111 demonstrates that this unique multi- domain arrangement is beneficial to C. kronotskyensis for it to be maintained and even duplicated as Calkro_0121. More broadly in the genus Caldicellulosiruptor, the unique multi-domain architecture of enzymes, both with and without SLH domains, has likely evolved to appropriately space the CAZyme domains for increased synergy and activity.

Specifically for SLH domain proteins, domain spacing must also accommodate the tethering of the SLH domains at one end of the protein to the cell surface, while it is attaching to, and degrading, the plant biomass substrate with the other. The large size of these catalytic SLH proteins may reflect the length needed to swing out away from the cell to reach biomass substrates to which they can attach. While domain spacing and arrangement appears to be a critical factor in the activity of Calkro_0111, no pattern for domain arrangement through all of the catalytic Caldicellulosiruptor SLH proteins is readily apparent, based on amino acid sequence. Lacking structural data, which is difficult to obtain for these large multi-domain proteins, the orientation of each domain relative to another cannot be determined precisely. It does seem likely, however, that each protein has evolved its domain spacing to enable its unique combination of binding and catalytic domains to function properly while being tethered to the cell.

While Calkro_0111 and Calkro_0121 are transcribed at low levels when grown on plant biomass substrates, all of the other SLH GH and PL enzymes from C. kronotskyensis and C. bescii are up-regulated when these species are grown on switchgrass compared to

Avicel (Figure 2.5), including the xylanase Calkro_0402. This suggests an important role in plant biomass degradation. Calkro_0402 belongs to a large group of homologous

103

CBM22/GH10/CBM9 xylanases (Figure 2.4A). While Calkro_0402 releases primarily xylobiose and xylotriose from birchwood xylan (Figure 2.4E), similar to previously characterized homologs (28,31), the optimal temperature of 80°C for Calkro_0402 (Figure

2.4D) makes it one of the most thermophilic versions of this family of xylanase enzymes characterized to date. As is predicted by the presence of SLH domains, Calkro_0402 is localized to the cell surface of C. kronotskyensis (Figure 2.2 D-E), which is a common feature of many homologs of Calkro_0402, including some shown in Figure 2.4A associated to the cell by means other than SLH domains. The Calkro_0402 homolog from Thermotoga maritima (and presumably homologs in other Thermotoga sp.) is cell-associated by an amino-terminal hydrophobic peptide anchor and not SLH domains (92). Furthermore, the

Clostridium clariflavum xylanase lacks SLH domains, but has a dockerin domain on its C- terminus, which would allow this enzyme to associate via protein-protein interactions into a cellulosome multi-enzyme complex that are also typically anchored to the cell surface via

SLH domains (61). The fact that these homologs without SLH domains can remain associated with the cell implies that there is particular utility for having this type of

CBM22/GH10/CBM9 xylanase associated with the cell.

Using genetic manipulation of C. bescii to knock-in Calkro_0402, the role of this

SLH protein in vivo for substrate attachment (Figure 2.7 C,D) and degradation (Figure 2.7

A,B and Figure 2.8 A,B) could be examined. Wild-type C. bescii has previously been shown to attach to xylan substrates (63). Our observations suggest that modification of the C. bescii

S-layer to contain SLH domain xylanase Calkro_0402 likely improves this attachment

(Figure 2.7 C,D). Truncation mutants of Xyn10B from Clostridium stercorarium and

104

Xyn10A from Clostridium josui showed that the CBM9 and CBM22 domains mediate the attachment of these enzymes to xylan and other substrates (42,43). While these two xylanases are phylogenetically distant from Calkro_0402 (Figure 2.4A), the role of these

CBM22 and CBM9 domains is likely very similar in Calkro_0402 in the functional role of degrading xylan substrates. The attachment of the Calkro_0402 knock-in in strain RKCB103 is also likely a significant factor in improved solubilization of xylans (Fig 7A). Both C. bescii and C. kronotskyensis produce at least five other extracellular xylanases targeted for the secretome with GH10, GH11, or GH43 domains. If the cell is attached to xylan by

Calkro_0402, these enzymes are secreted from the cell proximate to their substrate, likely making them more effective in the degradation process as well.

Very few of the diverse catalytic CAZyme-containing SLH domain proteins have been characterized, but their role in biomass deconstruction clearly merits closer examination. There are more than 20,000 SLH domain-containing proteins predicted in over

1,800 microbial genomes. These unusual proteins on the surface of bacterial cells contain a variety of functional protein domains, localized at the interface between the cell and its environment. While a full understanding of bacterial S-layer proteins and their varied and important roles at the cell surface is not yet available, the results reported here provide new insights into the role of SLH GHs as agents of plant polysaccharide degradation.

Acknowledgements

This work was supported by the BioEnergy Science Center (BESC), a U.S.

Department of Energy Bioenergy Research Center supported by the Office of Biological and

105

Environmental Research in the DOE Office of Science. LL Lee acknowledges support from an NIH Biotechnology Traineeship (NIH T32 GM008776-11) and NSF Graduate Fellowship, and JM Conway acknowledges support from a US DoEd GAANN Fellowship

(P200A100004-12). We would also like to acknowledge the North Carolina State University

Cellular and Molecular Imaging Facility and Dr. Eva Johannes for her assistance in operating the Zeiss LSM 710 confocal microscope.

106

Figure 2.1 - Catalytic CAZyme SLH Domain Proteins from the genus Caldicellulosiruptor Domain arrangement of these multi-domain proteins is drawn to scale based on prediction from the NCBI’s conserved domain database (93) and Pfam protein families database (94). Representative genes for each group containing more than one member are shown on the left. The number of Caldicellulosiruptor species’ genomes from the 12 completely sequenced to date that contain a member from each group is shown to the right. Domain abbreviations are as follows: GH: glycoside hydrolase, CBM: carbohydrate binding module, PL: polysaccharide lyase, BIg: Bacterial Immunoglobulin-like (CL0159), SLH: surface layer homology (pfam00395), FN3: fibronectin type III (pfam00041), ACL: actin cross linking- like RICIN superfamily (CL22458), Cadherin-like (pfam12733).

107

Figure 2.2 - TEM and Immunofluorescence Microscopy of C. kronotskyensis Transmission electron microscopy of C. kronotskyensis grown on dilute acid pretreated P. trichocarpa x P. deltoides showing the cytoplasmic membrane (cm), peptidoglycan (pg), and S-layer (s) (A). Confocal microscopy of C. kronotskyensis grown on laminarin labeled with total IgY antibodies raised against Calkro_0111 TM7 (B-C). Confocal microscopy of C. kronotskyensis grown on xylooligosaccharides labeled with total IgY antibodies raised against Cakro_0402 TM1 (C-D). The cell counterstain (DAPI) is shown in magenta and secondary antibody labeling (DyLight488 goat anti-chicken) is shown in green. Scale bars are 100 nm (A), 4 µm (B), 1 µm (C), 4 µm (D), 1 µm (E).

108

Figure 2.3 - Biochemical Characterization of Calkro_0111 Domain organization for full length Calkro_0111 and truncation mutants (TMs) produced in E. coli used for characterization (A). Domain abbreviations are as follows: SLH: surface layer homology, CBM: carbohydrate binding module, GH: glycoside hydrolase, FN3: fibronectin type III (pfam00041), and ACL: actin cross linking-like RICIN superfamily (CL22458). (B) Optimum pH at 70°C and optimum temperature at the optimum pH were determined for Calkro_0111 TM1 (0.9 mg/mL incubated 5 hours) and TM8 (0.005 mg/mL incubated 30 min) on the substrate laminarin. Error bars represent the standard deviation from the mean (n=3). (C) Oligosaccharides released from laminarin by TM 1 and TM8. HPLC chromatograms for each enzyme reaction time are aligned with respect to retention time and graphed together. The location of the peaks for standards glucose (G), laminaribiose (L2), laminaritriose (L3), laminaritetraose (L4), laminaripentaose (L5), and laminarihexaose (L6) are shown. (D) Truncation mutant activity as measured by the DNS reducing sugar assay. TMs 1-5 and 7 (4.2 µM) were incubated for 5 hours, TM6 (12.3 µM) was incubated for 10 hours, and TMs 8-10 (0.2 µM) were incubated for 30 min with 1% (w/v) laminarin at 70°C and pH 5 or pH 7. Activity is calculated in µmol reducing sugar released / hour / nmol protein to compare activity for the proteins of different mass. Error bars represent the standard deviation from the mean (n=3).

109

Figure 2.4 - Biochemical Characterization of Calkro_0402 MEGA v6 (95) was used to align, using MUSCLE (96), full protein amino acid sequences and construct an unrooted neighbor-joining phylogenetic tree of close blastp homologs to Calkro_0402 and previously characterized CBM22/GH10/CBM9 containing enzymes (A). Species and protein accession number are shown for each enzyme branch. (B) Domain organization for Calkro_0402 and truncation mutant 1 (TM1), which was used as the antigen for antibody production. Domain abbreviations are as follows: SLH: surface layer homology, CBM: carbohydrate binding module, GH: glycoside hydrolase, Cadherin-like: pfam12733. (C) Purified full length Calkro_0402 (approximately 180 kDa) produced in E. coli and purified by immobilized metal and size exclusion chromatography. (D) Optimum pH at 70°C and the optimum temperature at this optimum pH were determined for recombinant Calkro_0402 (0.02 mg/mL incubated 30 min). Error bars represent the standard deviation from the mean (n=3). (E) Oligosaccharides released from birchwood xylan by Calkro_0402. HPLC chromatograms for each enzyme reaction time are aligned with respect to retention time and graphed together. The location of the peaks for standards xylose (X1), xylobiose (X2), xylotriose (X3), xylotetraose (X4), xylopentaose (X5), and xylohexaose (X6) are shown.

110

Figure 2.5 - Transcriptional response of genes encoding SLH domain proteins from C. bescii and C. kronotskyensis The log squared mean (LSM) transcriptomic level of the 19 SLH domain proteins from C. kronotskyensis and 12 SLH domain proteins from C. bescii when each species is grown on crystalline cellulose (Avicel) and switchgrass (SWG). An LSM value of 0 represents average transcript abundance (black). Genes transcribed at levels higher than average have positive LSM values (red), while genes transcribed at levels lower than average have negative LSM values (green). Differential transcription is shown as Avicel minus SWG with negative values (orange) up-regulated on SWG relative to Avicel. Positive values (blue) are up- regulated on Avicel relative to SWG. Analysis is based on whole genome oligonucleotide microarray experiments deposited in the NCBI GEO database with accession GSE68810 (70). Domain abbreviations are as follows: GH: Glycoside Hydrolase, PL: Polysaccharide Lyase, CBM: Carbohydrate Binding Module, SLH: Surface Layer Homology (pfam00395), FN3: Fibronectin type III-like (pfam00041), Big: Bacterial Immunoglobulin-like (CL0159), ACL: actin cross linking-like RICIN superfamily (CL22458), Tglut: Transglutaminase-like superfamily (pfam01841), MG2: Macroglobulin2 (pfam01835), vWFA: Von Willebrand factor type A (pfam00092), SH3: Bacterial SH3 (pfam08239), RHSrep: RHS repeat (pfam05593), RHScore: RHS associated core domain (TIGR03696). * Calkro_0121 is truncated after the first CBM32.

111

Figure 2.6 - Construction of Calkro_0402 knock-in C. bescii strain RKCB103 Calkro_0402 knock-in vector pJMC009 homologous recombination with the chromosome of parent C. bescii strain JWCB018. pJMC009 was constructed using the pDCW121 (65) vector backbone, which contains the pSC101 origin, repA, and apramycin resistance marker for maintenance in E. coli, as well as, C. bescii wildtype pyrF under the control of the Athe_2105 promoter for selection of uracil prototrophy and 5-FOA counter selection (A). Crossover regions 1kb in length 5’ and 3’ to the ΔCbeI locus (ΔAthe_2438) in strain JWCB018 were used to target Calkro_0402 for insertion via homologous recombination to that locus. The Calkro_0402 insertion construct included Calkro_0402 and an additional 81 bp downstream of the gene to include its predicted transcriptional terminator element amplified from C. kronotskyensis genomic DNA. High, constitutive expression of Calkro_0402 was driven by the 200 bp promoter element upstream of the C. bescii main SLP gene (Athe_2303). (B) The Calkro_0402 construct is inserted in the chromosome of RKCB103 after first crossover vector integration and second crossover 5-FOA counter selection to remove the vector backbone containing pyrF. (C) Using primers outside of the 5’ and 3’ crossover regions (Table 1: ΔAthe_2438 locus F and R), the ΔCbeI locus was amplified and sequenced to verify the Calkro_0402 knock-in. Strain JWCB018 has the expected 2.9 kb product, while strain RKCB103 has the expected 8.2 kb product with the 5.3 kb insertion of the SLP promoter and Calkro_0402 construct. New England Biolabs 1kb ladder was run alongside these PCR products. (D, E, F) Anti-Calkro_0402 immunofluorescence microscopy of C. bescii strain JWCB018, C. kronotskyensis, and C. bescii strain RKCB103 grown on birchwood xylan. (D) C. bescii strain JWCB018, the genetic parent strain, which does not contain Calkro_0402 shows minimal labeling with the anti-Calkro_0402 antibodies. (E) C. kronotskyensis and (F) Calkro_0402 knock-in C. bescii strain RKCB103 are both extensively labeled with antibody. (E,F) Calkro_0402 is shown localized on the cell surface in both C. kronotskyensis and C. bescii strain RKCB103, and appears primarily at the poles of the cells. All cells are labeled with chicken anti-Calkro_0402 total IgY primary antibody and DyLight 488 conjugated goat anti-chicken secondary antibody (green) with DAPI as a counterstain (blue). Scale bars are 2 µm.

112

113

Figure 2.7 - Increased xylan solubilization and xylan attachment by Calkro_0402 knock-in C. bescii strain RKCB103 Total biomass solubilization of washed oat spelt xylan, washed birchwood xylan, and unwashed birchwood xylan after 45.5 hours of growth of C. bescii strain JWCB018 or RKCB103 (A). Xylose measured in the acid hydrolyzed culture supernatant harvested at 45.5 hours (B). Error bars represent the standard deviation from the mean (n=3). (C) C. bescii strain JWCB018 cultures on washed birchwood xylan appear very turbid, and (D) C. bescii strain RKCB103 cultures appear much less turbid, while having the same cell density per mL of culture. C. bescii strain RKCB103 appears to be more readily attached to the insoluble washed birchwood xylan substrate, suggesting that the expression of Calkro_0402 in this strain is playing a role in tethering the cells to the xylan substrate.

114

Figure 2.8 - Time course washed oat spelt xylan solubilization Cell densities of C. bescii strains JWCB018 (blue) and RKCB103 (green) over time while growing on washed oat spelt xylan (A). Xylose measured in acid hydrolyzed supernatant samples for C. bescii strains JWCB018 (blue) and RKCB103 (green) and an abiotic control (brown) (B).

115

Table 2.1 - Primers used in this study Underlined portions are vector specific or overlap regions for isothermal assembly. Primer Sequence Use pET46 Calkro_0111 TM1 F GACGACGACAAGATAAATAAAGCAGGCACA cloning pET46 Calkro_0111 TM1/TM3 R GAGGAGAAGCCCGGTTAATCTGACATATCTGATTC cloning pET46 Calkro_0111 TM2 F GACGACGACAAGATGGTTGAAAAAGGATATTTGATTGG cloning pET46 Calkro_0111 TM2 R GAGGAGAAGCCCGGTTAACTTGGAGGGTTAGGATAAA cloning pET46 Calkro_0111 TM3/TM4 F GACGACGACAAGATGGGTGAATGGAAACTTGT cloning pET46 Calkro_0111 TM4 R GAGGAGAAGCCCGGTTACGGTGTAGAATAGATGATGAA cloning pET46 Calkro_0111 TM5 F GACGACGACAAGATTCCTTCCGTGGGTGAATGGAAACTTGTATGG cloning pET46 Calkro_0111 TM5 R GAGGAGAAGCCCGGTTAACCTTCTCTTTGGTATAC cloning pET46 Calkro_0111 TM6 F GACGACGACAAGATGATTTTAGGATTAGGT cloning pET46 Calkro_0111 TM6 R GAGGAGAAGCCCGGTTACAATTTCTTTTCTAAAGAT cloning pET46 Calkro_0111 TM7 F GACGACGACAAGATCTCTGGAGGCAGGTCTTA cloning pET46 Calkro_0111 TM7 R GAGGAGAAGCCCGGTTAGGTTTCTCCATTTTCATTG cloning pET46 Calkro_0111 TM8 F GACGACGACAAGATGACAGATACAGGTTTGAG cloning pET46 Calkro_0111 TM8 R GAGGAGAAGCCCGGTTACTTGCTCCCATACAC cloning pET46 Calkro_0111 TM9 F GACGACGACAAGATGTCAGAGCCTATATCT cloning pET46 Calkro_0111 TM9 R GAGGAGAAGCCCGGTTACTTGCTCCCATACACTTCA cloning pET46 Calkro_0111 TM10 F GACGACGACAAGATGTTTGGTCCAAATGTCTA cloning pET46 Calkro_0111 TM10 R GAGGAGAAGCCCGGTTAGTTGCAATATTCTGTTAC cloning pET46 Calkro_0402 F GACGACGACAAGATTCAAAGCAGCCAAACTAA cloning pET46 Calkro_0402 R GAGGAGAAGCCCGGTTACTTTTTGAGATTGTTAAGTGCTCTG cloning pET46 Calkro_0402 TM1 F GACGACGACAAGATGTCAGATAGCATAAAG cloning pET46 Calkro_0402 TM1 R GAGGAGAAGCCCGGTTATTGAGCATTTCTTGTAACTG cloning ΔAthe_2438 5' F GAAGTTAGGCTGGTGGTACCCATACACCAAAATGATATAATCTCC pJMC009 ΔAthe_2438 5' R TAAATCCTGTTTATCATGTTGGAAGGCAAATTG pJMC009 Cbescii Pslp (Athe_2303 pJMC009 AACATGATAAACAGGATTTAAAAGAGGCTATG promoter) F Cbescii Pslp (Athe_2303 pJMC009 GCTTTTTCATAACTACTCACCAAACCTC promoter) R Calkro_0402 + Calkro pJMC009 GTGAGTAGTTATGAAAAAGCGACTTATAGC Terminator F Calkro_0402 + Calkro pJMC009 CTACCTAATCGGAAGACTAAAGATTTTCTCC Terminator R ΔAthe_2438 3' F TTAGTCTTCCGATTAGGTAGGTGGTTTTAAC pJMC009 ΔAthe_2438 3' R CACTGAGCGTCAGAGAATTCGGATTTATCGCCTGAGTTG pJMC009 pDCW121 backbone F GAATTCTCTGACGCTCAG pJMC009 pDCW121 backbone R GGTACCACCAGCCTAACTTC pJMC009 ΔAthe_2438 locus F GGCAACCTTCAGTAGGGCTGC PCR ΔAthe_2438 locus R CATTTGATATCAGAATGTTCTCGTTGATT PCR

116

References

1. Sara, M., and Sleytr, U. B. (2000) S-Layer proteins. J. Bacteriol. 182, 859-868

2. Sleytr, U. B., and Beveridge, T. J. (1999) Bacterial S-layers. Trends Microbiol. 7,

253-260

3. Albers, S.-V., and Meyer, B. H. (2011) The archaeal cell envelope. Nat. Rev.

Microbiol. 9, 414-426

4. Kern, J., Wilton, R., Zhang, R., Binkowski, T. A., Joachimiak, A., and Schneewind,

O. (2011) Structure of surface layer homology (SLH) domains from Bacillus

anthracis surface array protein. J. Biol. Chem. 286, 26042-26049

5. Mesnage, S., Fontaine, T., Mignot, T., Delepierre, M., Mock, M., and Fouet, A.

(2000) Bacterial SLH domain proteins are non-covalently anchored to the cell surface

via a conserved mechanism involving wall polysaccharide pyruvylation. EMBO J. 19,

4473-4484

6. Fagan, R. P., Janoir, C., Collignon, A., Mastrantonio, P., Poxton, I. R., and

Fairweather, N. F. (2011) A proposed nomenclature for cell wall proteins of

Clostridium difficile. J. Med. Microbiol. 60, 1225-1228

7. Boot, H. J., Kolen, C. P., and Pouwels, P. H. (1995) Identification, cloning, and

nucleotide sequence of a silent S-layer protein gene of Lactobacillus acidophilus

117

ATCC 4356 which has extensive similarity with the S-layer protein gene of this

species. J. Bacteriol. 177, 7222-7230

8. Johnson, B. R., Hymes, J., Sanozky-Dawes, R., Henriksen, E. D., Barrangou, R., and

Klaenhammer, T. R. (2016) Conserved S-Layer-Associated Proteins Revealed by

Exoproteomic Survey of S-Layer-Forming Lactobacilli. Appl. Environ. Microbiol. 82,

134-145

9. Veith, A., Klingl, A., Zolghadr, B., Lauber, K., Mentele, R., Lottspeich, F., Rachel,

R., Albers, S. V., and Kletzin, A. (2009) Acidianus, Sulfolobus and Metallosphaera

surface layers: structure, composition and gene expression. Mol. Microbiol. 73, 58-72

10. Abdul Halim, M. F., Pfeiffer, F., Zou, J., Frisch, A., Haft, D., Wu, S., Tolić, N.,

Brewer, H., Payne, S. H., Paša-Tolić, L., and Pohlschroder, M. (2013) Haloferax

volcanii archaeosortase is required for motility, mating, and C-terminal processing of

the S-layer glycoprotein. Mol. Microbiol. 88, 1164-1175

11. Abdul Halim, M. F., Karch, K. R., Zhou, Y., Haft, D. H., Garcia, B. A., and

Pohlschroder, M. (2015) Permuting the PGF-CTERM signature motif blocks both

archaeosortase-dependent C-terminal cleavage and prenyl lipid attachment for the

Haloferax volcanii S-layer glycoprotein. J. Bacteriol.

12. Fagan, R. P., and Fairweather, N. F. (2014) Biogenesis and functions of bacterial S-

layers. Nat. Rev. Microbiol. 12, 211-222

118

13. Pham, T. K., Roy, S., Noirel, J., Douglas, I., Wright, P. C., and Stafford, G. P. (2010)

A quantitative proteomic analysis of biofilm adaptation by the periodontal pathogen

Tannerella forsythia. Proteomics 10, 3130-3141

14. Honma, K., Inagaki, S., Okuda, K., Kuramitsu, H. K., and Sharma, A. (2007) Role of

a Tannerella forsythia exopolysaccharide synthesis operon in biofilm development.

Microb. Pathog. 42, 156-166

15. Reynolds, C. B., Emerson, J. E., de la Riva, L., Fagan, R. P., and Fairweather, N. F.

(2011) The Clostridium difficile cell wall protein CwpV is antigenically variable

between strains, but exhibits conserved aggregation-promoting function. PLoS

Pathog. 7, e1002024

16. Sun, Z., Kong, J., Hu, S., Kong, W., Lu, W., and Liu, W. (2013) Characterization of a

S-layer protein from Lactobacillus crispatus K313 and the domains responsible for

binding to cell wall and adherence to collagen. Appl. Microbiol. Biotechnol. 97, 1941-

1952

17. de la Riva, L., Willing, S. E., Tate, E. W., and Fairweather, N. F. (2011) Roles of

cysteine proteases Cwp84 and Cwp13 in biogenesis of the cell wall of Clostridium

difficile. J. Bacteriol. 193, 3276-3285

18. Anderson, V. J., Kern, J. W., McCool, J. W., Schneewind, O., and Missiakas, D.

(2011) The SLH-domain protein BslO is a determinant of Bacillus anthracis chain

length. Mol. Microbiol. 81, 192-205

119

19. Brahamsha, B. (1996) An abundant cell-surface polypeptide is required for swimming

by the nonflagellated marine cyanobacterium Synechococcus. Proc. Natl. Acad. Sci.

U.S.A. 93, 6504-6509

20. McCarren, J., and Brahamsha, B. (2007) SwmB, a 1.12-megadalton protein that is

required for nonflagellar swimming motility in Synechococcus. J. Bacteriol. 189,

1158-1162

21. Ozdemir, I., Blumer-Schuette, S. E., and Kelly, R. M. (2012) S-layer homology

domain proteins Csac_0678 and Csac_2722 are implicated in plant polysaccharide

deconstruction by the extremely thermophilic bacterium Caldicellulosiruptor

saccharolyticus. Appl. Environ. Microbiol. 78, 768-777

22. Egelseer, E., Schocher, I., Sara, M., and Sleytr, U. B. (1995) The S-layer from

Bacillus stearothermophilus DSM 2358 functions as an adhesion site for a high-

molecular-weight amylase. J. Bacteriol. 177, 1444-1451

23. Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P. M., and Henrissat, B.

(2014) The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids

Res. 42, D490-495

24. Lemaire, M., Ohayon, H., Gounon, P., Fujino, T., and Beguin, P. (1995) OlpB, a new

outer layer protein of Clostridium thermocellum, and binding of its S-layer-like

domains to components of the cell envelope. J. Bacteriol. 177, 2451-2459

120

25. Fujino, T., Beguin, P., and Aubert, J. P. (1993) Organization of a Clostridium

thermocellum gene cluster encoding the cellulosomal scaffolding protein CipA and a

protein possibly involved in attachment of the cellulosome to the cell surface. J.

Bacteriol. 175, 1891-1899

26. Fontes, C. M., and Gilbert, H. J. (2010) Cellulosomes: highly efficient nanomachines

designed to deconstruct plant cell wall complex carbohydrates. Annu. Rev. Biochem.

79, 655-681

27. Markowitz, V. M., Chen, I. M., Palaniappan, K., Chu, K., Szeto, E., Pillay, M.,

Ratner, A., Huang, J., Woyke, T., Huntemann, M., Anderson, I., Billis, K., Varghese,

N., Mavromatis, K., Pati, A., Ivanova, N. N., and Kyrpides, N. C. (2014) IMG 4

version of the integrated microbial genomes comparative analysis system. Nucleic

Acids Res. 42, D560-567

28. Han, Y., Agarwal, V., Dodd, D., Kim, J., Bae, B., Mackie, R. I., Nair, S. K., and

Cann, I. K. (2012) Biochemical and structural insights into xylan utilization by the

thermophilic bacterium Caldanaerobius polysaccharolyticus. J. Biol. Chem. 287,

34946-34960

29. Han, Y., Dodd, D., Hespen, C. W., Ohene-Adjei, S., Schroeder, C. M., Mackie, R. I.,

and Cann, I. K. (2010) Comparative analyses of two thermophilic enzymes exhibiting

both beta-1,4 mannosidic and beta-1,4 glucosidic cleavage activities from

Caldanaerobius polysaccharolyticus. J. Bacteriol. 192, 4111-4121

121

30. Lee, S. P., Morikawa, M., Takagi, M., and Imanaka, T. (1994) Cloning of the aapT

gene and characterization of its product, alpha-amylase-pullulanase (AapT), from

thermophilic and alkaliphilic Bacillus sp. strain XAL601. Appl. Environ. Microbiol.

60, 3764-3773

31. Waeonukul, R., Pason, P., Kyu, K. L., Sakka, K., Kosugi, A., Mori, Y., and

Ratanakhanokchai, K. (2009) Cloning, sequencing, and expression of the gene

encoding a multidomain endo-beta-1,4-xylanase from Paenibacillus curdlanolyticus

B-6, and characterization of the recombinant enzyme. J. Microbiol. Biotechnol. 19,

277-285

32. Cheng, Y. M., Hong, T. Y., Liu, C. C., and Meng, M. (2009) Cloning and functional

characterization of a complex endo-beta-1,3-glucanase from Paenibacillus sp. Appl.

Microbiol. Biotechnol. 81, 1051-1061

33. St John, F. J., Rice, J. D., and Preston, J. F. (2006) Paenibacillus sp. strain JDR-2 and

XynA1: a novel system for methylglucuronoxylan utilization. Appl. Environ.

Microbiol. 72, 1496-1506

34. Ito, Y., Tomita, T., Roy, N., Nakano, A., Sugawara-Tomita, N., Watanabe, S., Okai,

N., Abe, N., and Kamio, Y. (2003) Cloning, expression, and cell surface localization

of Paenibacillus sp. strain W-61 xylanase 5, a multidomain xylanase. Appl. Environ.

Microbiol. 69, 6969-6978

122

35. Itoh, T., Sugimoto, I., Hibi, T., Suzuki, F., Matsuo, K., Fujii, Y., Taketo, A., and

Kimoto, H. (2014) Overexpression, purification, and characterization of

Paenibacillus cell surface-expressed chitinase ChiW with two catalytic domains.

Biosci. Biotechnol. Biochem. 78, 624-634

36. St John, F. J., Preston, J. F., and Pozharski, E. (2012) Novel structural features of

xylanase A1 from Paenibacillus sp JDR-2. J. Struct. Biol. 180, 303-311

37. Kim, H., Jung, K. H., and Pack, M. Y. (2000) Molecular characterization of xynX, a

gene encoding a multidomain xylanase with a thermostabilizing domain from

Clostridium thermocellum. Appl. Microbiol. Biotechnol. 54, 521-527

38. Fuchs, K.-P., Zverlov, V. V., Velikodvorskaya, G. A., Lottspeich, F., and Schwarz,

W. H. (2003) Lic16A of Clostridium thermocellum, a non-cellulosomal, highly

complex endo-β-1,3-glucanase bound to the outer cell surface. Microbiology 149,

1021-1031

39. Adelsberger, H., Hertel, C., Glawischnig, E., Zverlov, V. V., and Schwarz, W. H.

(2004) Enzyme system of Clostridium stercorarium for hydrolysis of arabinoxylan:

reconstitution of the in vivo system from recombinant enzymes. Microbiology 150,

2257-2266

40. Feng, J. X., Karita, S., Fujino, E., Fujino, T., Kimura, T., Sakka, K., and Ohmiya, K.

(2000) Cloning, sequencing, and expression of the gene encoding a cell-bound multi-

123

domain xylanase from Clostridium josui, and characterization of the translated

product. Biosci. Biotechnol. Biochem. 64, 2614-2624

41. Ali, M. K., Fukumura, M., Sakano, K., Karita, S., Kimura, T., Sakka, K., and

Ohmiya, K. (1999) Cloning, sequencing, and expression of the gene encoding the

Clostridium stercorarium xylanase C in Escherichia coli. Biosci. Biotechnol.

Biochem. 63, 1596-1604

42. Ali, E., Araki, R., Zhao, G., Sakka, M., Karita, S., Kimura, T., and Sakka, K. (2005)

Functions of family-22 carbohydrate-binding modules in Clostridium josui Xyn10A.

Biosci. Biotechnol. Biochem. 69, 2389-2394

43. Ali, M. K., Hayashi, H., Karita, S., Goto, M., Kimura, T., Sakka, K., and Ohmiya, K.

(2001) Importance of the carbohydrate-binding module of Clostridium stercorarium

Xyn10B to xylan hydrolysis. Biosci. Biotechnol. Biochem. 65, 41-47

44. Ali, M. K., Kimura, T., Sakka, K., and Ohmiya, K. (2001) The multidomain xylanase

Xyn10B as a cellulose-binding protein in Clostridium stercorarium. FEMS

Microbiol. Lett. 198, 79-83

45. Brechtel, E., Matuschek, M., Hellberg, A., Egelseer, E. M., Schmid, R., and Bahl, H.

(1999) Cell wall of Thermoanaerobacterium thermosulfurigenes EM1: isolation of its

components and attachment of the xylanase XynA. Arch. Microbiol. 171, 159-165

124

46. Liu, S. Y., Gherardini, F. C., Matuschek, M., Bahl, H., and Wiegel, J. (1996) Cloning,

sequencing, and expression of the gene encoding a large S-layer-associated

endoxylanase from Thermoanaerobacterium sp. strain JW/SL-YS 485 in Escherichia

coli. J. Bacteriol. 178, 1539-1547

47. Matuschek, M., Burchhardt, G., Sahm, K., and Bahl, H. (1994) Pullulanase of

Thermoanaerobacterium thermosulfurigenes EM1 (Clostridium thermosulfurogenes):

molecular analysis of the gene, composite structure of the enzyme, and a common

model for its attachment to the cell surface. J. Bacteriol. 176, 3295-3302

48. Matuschek, M., Sahm, K., Zibat, A., and Bahl, H. (1996) Characterization of genes

from Thermoanaerobacterium thermosulfurigenes EM1 that encode two glycosyl

hydrolases with conserved S-layer-like domains. Mol. Gen. Genet. 252, 493-496

49. Shao, W. L., Deblois, S., and Wiegel, J. (1995) A high-molecular-weight, cell-

associated xylanase isolated from exponentially growing Thermoanaerobacterium sp.

strain JW/SL-YS485. Appl. Environ. Microbiol. 61, 937-940

50. Huang, X., Li, Z., Du, C., Wang, J., and Li, S. (2015) Improved expression and

characterization of a multidomain xylanase from Thermoanaerobacterium

aotearoense SCUT27 in Bacillus subtilis. J. Agric. Food Chem. 63, 6430-6439

51. Hung, K. S., Liu, S. M., Fang, T. Y., Tzou, W. S., Lin, F. P., Sun, K. H., and Tang, S.

J. (2011) Characterization of a salt-tolerant xylanase from Thermoanaerobacterium

saccharolyticum NTOU1. Biotechnol. Lett. 33, 1441-1447

125

52. Morris, D. D., Gibbs, M. D., Ford, M., Thomas, J., and Bergquist, P. L. (1999)

Family 10 and 11 xylanase genes from Caldicellulosiruptor sp. strain Rt69B.1.

Extremophiles 3, 103-111

53. Blumer-Schuette, S. E., Ozdemir, I., Mistry, D., Lucas, S., Lapidus, A., Cheng, J.-F.,

Goodwin, L. A., Pitluck, S., Land, M. L., Hauser, L. J., Woyke, T., Mikhailova, N.,

Pati, A., Kyrpides, N. C., Ivanova, N., Detter, J. C., Walston-Davenport, K., Han, S.,

Adams, M. W. W., and Kelly, R. M. (2011) Complete genome sequences for the

anaerobic, extremely thermophilic plant biomass-degrading bacteria

Caldicellulosiruptor hydrothermalis, Caldicellulosiruptor kristjanssonii,

Caldicellulosiruptor kronotskyensis, Caldicellulosiruptor owensensis, and

Caldicellulosiruptor lactoaceticus. J. Bacteriol. 193, 1483-1484

54. Lee, L. L., Izquierdo, J. A., Blumer-Schuette, S. E., Zurawski, J. V., Conway, J. M.,

Cottingham, R. W., Huntemann, M., Copeland, A., Chen, I. M., Kyrpides, N.,

Markowitz, V., Palaniappan, K., Ivanova, N., Mikhailova, N., Ovchinnikova, G.,

Andersen, E., Pati, A., Stamatis, D., Reddy, T. B., Shapiro, N., Nordberg, H. P.,

Cantor, M. N., Hua, S. X., Woyke, T., and Kelly, R. M. (2015) Complete Genome

Sequences of Caldicellulosiruptor sp. Strain Rt8.B8, Caldicellulosiruptor sp. Strain

Wai35.B1, and "Thermoanaerobacter cellulolyticus". Genome Announc. 3

55. Elkins, J. G., Lochner, A., Hamilton-Brehm, S. D., Davenport, K. W., Podar, M.,

Brown, S. D., Land, M. L., Hauser, L. J., Klingeman, D. M., Raman, B., Goodwin, L.

A., Tapia, R., Meincke, L. J., Detter, J. C., Bruce, D. C., Han, C. S., Palumbo, A. V.,

126

Cottingham, R. W., Keller, M., and Graham, D. E. (2010) Complete genome

sequence of the cellulolytic thermophile Caldicellulosiruptor obsidiansis OB47T. J.

Bacteriol. 192, 6099-6100

56. van de Werken, H. J., Verhaart, M. R., VanFossen, A. L., Willquist, K., Lewis, D. L.,

Nichols, J. D., Goorissen, H. P., Mongodin, E. F., Nelson, K. E., van Niel, E. W.,

Stams, A. J., Ward, D. E., de Vos, W. M., van der Oost, J., Kelly, R. M., and Kengen,

S. W. (2008) Hydrogenomics of the extremely thermophilic bacterium

Caldicellulosiruptor saccharolyticus. Appl. Environ. Microbiol. 74, 6720-6729

57. Kataeva, I. A., Yang, S. J., Dam, P., Poole, F. L., 2nd, Yin, Y., Zhou, F., Chou, W.

C., Xu, Y., Goodwin, L., Sims, D. R., Detter, J. C., Hauser, L. J., Westpheling, J., and

Adams, M. W. (2009) Genome sequence of the anaerobic, thermophilic, and

cellulolytic bacterium "Anaerocellum thermophilum" DSM 6725. J. Bacteriol. 191,

3760-3761

58. Conway, J. M., Zurawski, J. V., Lee, L. L., Blumer-Schuette, S. E., and Kelly, R. M.

(2015) Lignocellulosic biomass deconstruction by the extremely thermophilic genus

Caldicellulosiruptor. in Thermophilic Microorganisms (Li, F. ed.), Caister Academic

Press, Norfolk, UK. pp 91-120

59. Blumer-Schuette, S. E., Giannone, R. J., Zurawski, J. V., Ozdemir, I., Ma, Q., Yin,

Y., Xu, Y., Kataeva, I., Poole, F. L., 2nd, Adams, M. W., Hamilton-Brehm, S. D.,

Elkins, J. G., Larimer, F. W., Land, M. L., Hauser, L. J., Cottingham, R. W., Hettich,

127

R. L., and Kelly, R. M. (2012) Caldicellulosiruptor core and pangenomes reveal

determinants for noncellulosomal thermophilic deconstruction of plant biomass. J.

Bacteriol. 194, 4015-4028

60. Blumer-Schuette, S. E., Lewis, D. L., and Kelly, R. M. (2010) Phylogenetic,

microbiological, and glycoside hydrolase diversities within the extremely

thermophilic, plant biomass-degrading genus Caldicellulosiruptor. Appl. Environ.

Microbiol. 76, 8084-8092

61. Blumer-Schuette, S. E., Brown, S. D., Sander, K. B., Bayer, E. A., Kataeva, I.,

Zurawski, J. V., Conway, J. M., Adams, M. W. W., and Kelly, R. M. (2014)

Thermophilic lignocellulose deconstruction. FeMS Microbiol. Rev. 38, 393-448

62. Blumer-Schuette, S. E., Alahuhta, M., Conway, J. M., Lee, L. L., Zurawski, J. V.,

Giannone, R. J., Hettich, R. L., Lunin, V. V., Himmel, M. E., and Kelly, R. M. (2015)

Discrete and structurally unique proteins (tāpirins) mediate attachment of extremely

thermophilic Caldicellulosiruptor species to cellulose. J. Biol. Chem. 290, 10645-

10656

63. Dam, P., Kataeva, I., Yang, S. J., Zhou, F., Yin, Y., Chou, W., Poole, F. L., 2nd,

Westpheling, J., Hettich, R., Giannone, R., Lewis, D. L., Kelly, R., Gilbert, H. J.,

Henrissat, B., Xu, Y., and Adams, M. W. (2011) Insights into plant biomass

conversion from the genome of the anaerobic thermophilic bacterium

Caldicellulosiruptor bescii DSM 6725. Nucleic Acids Res. 39, 3240-3254

128

64. Chung, D., Farkas, J., and Westpheling, J. (2013) Overcoming restriction as a barrier

to DNA transformation in Caldicellulosiruptor species results in efficient marker

replacement. Biotechnol. Biofuels 6, 82

65. Cha, M., Chung, D., Elkins, J. G., Guss, A. M., and Westpheling, J. (2013) Metabolic

engineering of Caldicellulosiruptor bescii yields increased hydrogen production from

lignocellulosic biomass. Biotechnol. Biofuels 6, 85

66. Gibson, D. G. (2011) Enzymatic assembly of overlapping DNA fragments. Methods

Enzymol. 498, 349-361

67. Studier, F. W. (2005) Protein production by auto-induction in high density shaking

cultures. Protein Expr. Purif. 41, 207-234

68. Geslin, C., Le Romancer, M., Erauso, G., Gaillard, M., Perrot, G., and Prieur, D.

(2003) PAV1, the first virus-like particle isolated from a hyperthermophilic

euryarchaeote, "Pyrococcus abyssi". J. Bacteriol. 185, 3888-3894

69. Farkas, J., Chung, D., Cha, M., Copeland, J., Grayeski, P., and Westpheling, J. (2013)

Improved growth media and culture techniques for genetic analysis and assessment of

biomass utilization by Caldicellulosiruptor bescii. J. Ind. Microbiol. Biotechnol. 40,

41-49

70. Zurawski, J. V., Conway, J. M., Lee, L. L., Simpson, H., Izquierdo, J. A., Blumer-

Schuette, S., Nookaew, I., Adams, M. W. W., and Kelly, R. M. (2015) Comparative

129

analysis of extremely thermophilic Caldicellulosiruptor species reveals common and

differentiating cellular strategies for plant biomass utilization. Appl. Environ.

Microbiol. 81, 7159-7170

71. Petersen, T. N., Brunak, S., von Heijne, G., and Nielsen, H. (2011) SignalP 4.0:

discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785-786

72. Ghose, T. K. (1987) Measurement of Cellulase Activities. Pure Appl. Chem. 59, 257-

268

73. Miller, G. L. (1959) Use of dinitrosalicylic acid reagent for determination of reducing

sugar. Anal. Chem. 31, 426-428

74. Xiao, Z., Storms, R., and Tsang, A. (2005) Microplate-based carboxymethylcellulose

assay for endoglucanase activity. Anal. Biochem. 342, 176-178

75. Xiao, Z., Storms, R., and Tsang, A. (2004) Microplate-based filter paper assay to

measure total cellulase activity. Biotechnol. Bioeng. 88, 832-837

76. VanFossen, A. L., Ozdemir, I., Zelin, S. L., and Kelly, R. M. (2011) Glycoside

hydrolase inventory drives plant polysaccharide deconstruction by the extremely

thermophilic bacterium Caldicellulosiruptor saccharolyticus. Biotechnol. Bioeng.

108, 1559-1569

77. Lipscomb, G. L., Stirrett, K., Schut, G. J., Yang, F., Jenney, F. E., Jr., Scott, R. A.,

Adams, M. W., and Westpheling, J. (2011) Natural competence in the

130

hyperthermophilic archaeon Pyrococcus furiosus facilitates genetic manipulation:

construction of markerless deletions of genes encoding the two cytoplasmic

hydrogenases. Appl. Environ. Microbiol. 77, 2232-2238

78. Zhao, Y., Meng, K., Luo, H., Huang, H., Yuan, T., Yang, P., and Yao, B. (2013)

Molecular and biochemical characterization of a new alkaline active multidomain

xylanase from alkaline wastewater sludge. World J. Microbiol. Biotechnol. 29, 327-

334

79. Yi, Z., Su, X., Revindran, V., Mackie, R. I., and Cann, I. (2013) Molecular and

biochemical analyses of CbCel9A/Cel48A, a highly secreted multi-modular cellulase

by Caldicellulosiruptor bescii during growth on crystalline cellulose. PLoS One 8,

e84172

80. Ye, L., Su, X., Schmitz, G. E., Moon, Y. H., Zhang, J., Mackie, R. I., and Cann, I. K.

O. (2012) Molecular and biochemical analyses of the GH44 Module of

CbMan5B/Cel44A, a bifunctional enzyme from the hyperthermophilic bacterium

Caldicellulosiruptor bescii. Appl. Environ. Microbiol. 78, 7048-7059

81. Brunecky, R., Alahuhta, M., Bomble, Y. J., Xu, Q., Baker, J. O., Ding, S. Y.,

Himmel, M. E., and Lunin, V. V. (2012) Structure and function of the Clostridium

thermocellum cellobiohydrolase A X1-module repeat: enhancement through

stabilization of the CbhA complex. Acta Crystallogr. D Biol. Crystallogr. 68, 292-

299

131

82. Schneewind, O., and Missiakas, D. M. (2012) Protein secretion and surface display in

Gram-positive bacteria. Philos. Trans. R. Soc. Lond., B, Biol. Sci. 367, 1123-1139

83. Bergquist, P. L., Gibbs, M. D., Morris, D. D., Te'o, V. S. J., Saul, D. J., and Morgan,

H. W. (1999) Molecular diversity of thermophilic cellulolytic and hemicellulolytic

bacteria. FEMS Microbiol. Ecol. 28, 99-110

84. Gibbs, M. D., Reeves, R. A., Farrington, G. K., Anderson, P., Williams, D. P., and

Bergquist, P. L. (2000) Multidomain and multifunctional glycosyl hydrolases from

the extreme thermophile Caldicellulosiruptor isolate Tok7B.1. Curr. Microbiol. 40,

333-340

85. Kureishy, N., Sapountzi, V., Prag, S., Anilkumar, N., and Adams, J. C. (2002)

Fascins, and their roles in cell structure and function. Bioessays 24, 350-361

86. Thomas, C., Tholl, S., Moes, D., Dieterle, M., Papuga, J., Moreau, F., and Steinmetz,

A. (2009) Actin bundling in plants. Cell Motil. Cytoskeleton 66, 940-957

87. Charrier, B., Le Bail, A., and de Reviers, B. (2012) Plant proteus: brown algal

morphological plasticity and underlying developmental mechanisms. Trends Plant

Sci. 17, 468-477

88. Verma, D., and Hong, Z. (2001) Plant callose synthase complexes. Plant Mol. Biol.

47, 693-701

132

89. Dumas, C., and Knox, R. B. (1983) Callose and determination of pistil viability and

incompatibility. Theor. Appl. Genet. 67, 1-10

90. Harada, T. (1992) The story of research into curdlan and the bacteria producing it.

Trends Glycosci. Glycotechnol. 4, 309-317

91. VanFossen, A. L., Verhaart, M. R., Kengen, S. M., and Kelly, R. M. (2009)

Carbohydrate utilization patterns for the extremely thermophilic bacterium

Caldicellulosiruptor saccharolyticus reveal broad growth substrate preferences. Appl.

Environ. Microbiol. 75, 7718-7724

92. Liebl, W., Winterhalter, C., Baumeister, W., Armbrecht, M., and Valdez, M. (2008)

Xylanase attachment to the cell wall of the hyperthermophilic bacterium Thermotoga

maritima. J. Bacteriol. 190, 1350-1358

93. Marchler-Bauer, A., Derbyshire, M. K., Gonzales, N. R., Lu, S., Chitsaz, F., Geer, L.

Y., Geer, R. C., He, J., Gwadz, M., Hurwitz, D. I., Lanczycki, C. J., Lu, F., Marchler,

G. H., Song, J. S., Thanki, N., Wang, Z., Yamashita, R. A., Zhang, D., Zheng, C., and

Bryant, S. H. (2015) CDD: NCBI's conserved domain database. Nucleic Acids Res.

43, D222-D226

94. Punta, M., Coggill, P. C., Eberhardt, R. Y., Mistry, J., Tate, J., Boursnell, C., Pang,

N., Forslund, K., Ceric, G., Clements, J., Heger, A., Holm, L., Sonnhammer, E. L.,

Eddy, S. R., Bateman, A., and Finn, R. D. (2012) The Pfam protein families database.

Nucleic Acids Res. 40, D290-301

133

95. Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013) MEGA6:

Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725-

2729

96. Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy and

high throughput. Nucleic Acids Res. 32, 1792-1797

134

CHAPTER 3

Functional analysis of the Glucan Degradation Locus (GDL) in Caldicellulosiruptor bescii reveals roles of component glycoside hydrolases in plant biomass deconstruction

To be submitted to Molecular Microbiology

135

Abstract

The Glucan Degradation Locus (GDL) in extremely thermophilic Caldicellulosiruptor species encodes multi-domain glycoside hydrolases (GHs) that are essential for deconstruction of plant biomass polysaccharides. The genomes of the most prolific lignocellulose degraders in the genus contain six identifiable GHs in the GDL, all of which have cellulose-binding modules (CBM3) and four of which have catalytic domains (GH9,

GH44, and GH48) related to crystalline cellulose hydrolysis. The genes encoding the six

GHs in the Caldicellulosiruptor bescii GDL were systematically deleted prior to examining the extent to which the resulting mutants could solubilize crystalline cellulose (Avicel) and plant biomasses switchgrass and poplar. Together three of the GHs, Athe_1867, Athe_1859, and Athe_1857, are responsible for about 90% of the ability to degrade crystalline cellulose and these enzymes act jointly to degrade crystalline cellulose in vivo. Additionally, the relative importance of the different GHs in the GDL varies with plant biomass substrates, with Athe_1867 being most important for poplar degradation and Athe_1857 being most important on switchgrass. Mixed cultures containing two different knockout strains revealed that the presence of all the GDL GHs, even when produced from different strains, degrades switchgrass equally as well as wildtype, suggesting the mix of enzymatic activities of the secretome enzymes is the essential factor for achieving the highest degradation. Full-length, glycosylated versions of Athe_1867, Athe_1859, Athe_1857, and Athe_1860 were produced recombinantly in the native host and examined in vitro separately and in cocktails for crystalline cellulose (Avicel) hydrolysis. While Athe_1867 is the most prolific cellulase, its synergistic action with the other GDL GHs improves its degradation of cellulose two-fold.

136

Further, all of the GH enzymes exhibit some degree of synergy with other GDL GHs, with the pair of Athe_1867 and Athe_1859 exhibiting increased activity 3.3-fold over the activity from their separate contributions. These results demonstrate that the GDL GH enzyme inventory in Caldicellulosiruptor species act to effectively drive maximal degradation of plant biomass substrates.

Introduction

Anaerobic, Gram-positive bacteria from the genus Caldicellulosiruptor are the most thermophilic lignocellulosic biomass degraders currently identified (1,2). These species can be isolated from globally distributed terrestrial hot springs and have optimal growth temperatures between 65-75°C. Caldicellulosiruptor species can degrade unpretreated lignocellulosic substrates, driven by a variety of large, multi-domain carbohydrate active enzymes (CAZymes), including glycoside hydrolases (GHs) (3). These large, single- polypeptide enzymes represent an alternative paradigm for plant biomass degradation compared to fungal or cellulosomal plant biomass degraders (1,4).

Across the 12 Caldicellulosiruptor species that have been isolated, genome sequenced and characterized, the ability to degrade crystalline cellulose varies (5). As the main recalcitrant carbohydrate component of the plant cell wall, efficient degradation of crystalline cellulose is essential to plant biomass conversion. Comparative genomic and phenotypic analyses of the species in the genus have identified CAZy domains that are common to the highly cellulolytic species, including carbohydrate binding module (CBM) family 3 and glycoside hydrolase (GH) families 9, 44, and 48 (6). The genes containing these

137

cellulolytic enzymes are concentrated in one genomic locus, which will be referred to as the

Glucan Degradation Locus (GDL) (5). In C. bescii, the most studied member of the genus, this locus is approximately 50 kb of the 2.9 Mb genome and contains six glycoside hydrolase enzymes (Athe_1867, 1866, 1865, 1860, 1859, and 1857), all of which contain repeated cellulose binding modules from CBM family 3 and Pro/Thr-rich linker regions between

CAZyme domains. The repeated CBM3 domains and linkers, as well as repetitive GH5 and

GH48 sequences, are characteristic of the architecture of the GDL in Caldicellulosiruptor species and are hypothesized to be the result of recombination after gene duplication events

(1,7,8). In fact, the large repetitive sequences of this locus led to errors in genome assembly of the GDL in the initial C. bescii genome sequence (9); this was recently corrected by locating the genes Athe_1859-1857 between Athe_1867 and Athe_1866 (10). Transcriptomic and proteomic analyses of highly cellulolytic Caldicellulosiruptor species show that the GDL is highly transcribed and expressed both during growth on crystalline cellulose and switchgrass (3,11,12).

The most extensively studied enzyme from Caldicellulosiruptor species is the first gene encoded in the GDL, i.e., a GH9/GH48 cellulase referred to as CelA (Athe_1867 in C. bescii). CelA is highly active on crystalline cellulose and one of the most highly expressed proteins in the extracellular proteome (12-14). The N-terminal GH9 domain of CelA functions as an endoglucanase, while the C-terminal GH48 domain is a cellobiohydrolase releasing primarily cellobiose from crystalline cellulose (15). Homologs of CelA (seven full and one partial) are found in the genomes of eight Caldicellulosiruptor species, which represent the most highly cellulolytic members of the genus. CelA displays a surface ablasion

138

mechanism common to processive cellulases. CelA also has a unique cavity digging mechanism, allowing it to burrow into cellulose microfibrals, making it one of the most effective cellulases characterized to date (14).

While CelA has been characterized most extensively, the other five enzymes from the

GDL of C. bescii, with the exception of GH74/GH48 Athe_1860, have been characterized to varying extents (16-19). All of the characterized enzymes have an optimal temperature between 85-90°C and an optimal pH between 5 and 6.5. While CelA is particularly well suited for crystalline cellulose degradation, all of the other characterized enzymes from the

GDL are also active on cellulosic substrates. This includes the GH10 domain of Athe_1857, which displays broader substrate specificity, including on β-glucans, than is typical for enzymes with GH10 domains that are typically active only on xylans (19). In addition to the

β-glucan activities encoded in enzymes of GDL, a highly repetitive GH5 domain, common to

Athe_1859, Athe_1866, and Athe_1865, is active on mannan-containing substrates. Taken together, the GH domains of the GDL from families 5, 9, 10, 44, and 48 show activity on a broad collection of polysaccharide substrates that make up complex plant biomass, including

β-glucans, mannans, and xylans. This wide range of activities would suggest that GDL enzymes play a critical role in enabling Caldicellulosiruptor spp. to scavenge growth substrates from a broad array of polysaccharides encountered in their globally diverse thermal environments.

The predicted molecular weight of CelA, inferred from amino acid sequence, is 190 kDa, although the protein has an apparent Mr of 230 kDa as measured in culture supernatants from C. bescii, the result of glycosylation that appears to be related to secretion (20).

139

Similarly, the other GDL enzymes appear at larger molecular masses than predicted by their amino acid sequence alone. The effect of glycosylation and other native post-translational modifications on these enzymes is largely unknown, as the majority of characterized GDL enzymes were produced in other hosts (Escherichia coli or Bacillus megaterium). Only CelA

(Athe_1867), which was purified natively from C. bescii, has been investigated in its native glycosylated state (13,14). Further, the size and repetitive nature of these genes makes producing recombinant full length versions of these enzymes in traditional protein expression host microorganisms difficult, leading to the characterization of only truncation mutants of the largest enzymes from the GDL.

Here, the individual and collective roles of the GHs encoded in the GDL are examined, as these relate to the deconstruction of crystalline and plant biomass by C. bescii.

This study was made possible by the recent corrections in the GDL genome sequence and improvements in C. bescii genetics (10,21) that facilitated gene deletions and overexpression of the native GHs in this bacterium.

Experimental Procedures

Bacterial Strains, Plasmids and Reagents

For vector construction, NEB 5-alpha and 10-beta E. coli (New England Biolabs) were used. Cloning in E. coli was performed in pET28b (Novagen) and protein was expressed in Rosetta 2 (DE3) E. coli (Millipore) were used for the expression of proteins in

E. coli. An axenic strain of C. bescii was obtained from the Leibniz Institute DSMZ-German

Collection of Microorganisms and Cell Cultures. C. bescii strain JWCB005 (22) and non-

140

replicating vector pDCW121 (23) and replicating shuttle vector pDCW89 (24) were obtained from Dr. Janet Westpheling (University of Georgia, Athens, Georgia). Caldicellulosiruptor genomic DNA (gDNA) for cloning, vector construction, and genome sequencing was extracted, as described previously (25). Genes of interest were amplified by polymerase chain reaction (PCR) and inserted into vectors using Gibson assembly (26), using Gibson

Assembly Mastermix (New England Biolabs). The addition of terminators was performed by placing the sequences on primers and using the Q5 site-directed mutagenesis kit (New

England Biolabs) for the insertion. Plasmids were isolated using ZymoResearch plasmid miniprep classic and ZymoPURE midiprep kits (Zymoresearch); sequences were confirmed by Sanger sequencing (Genewiz). Carbohydrates and biomass used in this study are as follows: Avicel PH-101 crystalline cellulose, Cave-in-rock switchgrass (Panicum virgatum

L.) cultivar Cave-in-Rock field grown in Monroe County, IA, obtained from the National

Renewable Energy Laboratory (NREL) was ground using a Wiley mill (Thomas Scientific) and sieved to 40/80 mesh. Poplar from four-year-old Poplar trichocarpa variants GW-9947 and GW-9762 grown in Clatskanie, OR were milled using a 20-mesh screen on a Wiley-mill

(Thomas Scientific) and were obtained as such from Dr. Gerald A. Tuskan (Oak Ridge

National Lab). The growth conditions for these natural variant poplars were described previously (27,28). Carbohydrates used for enzyme activity assays are as follows: low viscosity barley β-glucan (Megazyme), tamarind xyloglucan (Megazyme), wheat arabinoxylan (Megazyme), 1,4-β-D mannan (Megazyme), lichenan (Megazyme), locust bean gum glucomannan (Sigma Aldrich), birch wood xylan (Sigma Aldrich), oat spelt xylan

141

(Sigma Aldrich), beech wood xylan (Sigma Aldrich), ivory nut mannan (Megazyme), and carboxymethyl cellulose (CMC) (Sigma Aldrich).

Media and Culture Conditions

All E. coli cultures were maintained in either Luria-Bertani (LB) medium (5 g/L yeast extract, 10 g/L tryptone, 10 g/L NaCl) or LB medium with 1.5% (w/v) agar plates. Media was supplemented with 50 μg/mL kanamycin (Fisher Scientific), 50 μg/mL apramycin

(Fisher Scientific), and/or 34 μg/mL chloramphenicol (Sigma Aldrich), as appropriate.

Athe_0458 was expressed in E. coli using ZYM-5052 auto-induction medium, as described previously (29).

C. bescii strains were grown anaerobically with 20% CO2 / 80% N2 headspace at

70°C without agitation in the following growth media. For routine cultivation, a modified version of DSM516 medium (DSMZ [http://www.dsmz.de]) was used, containing 0.33 g/L

NH4Cl, 0.33 g/L KCl, 0.33 g/L MgCl2 ● 6H2O, 0.14 g/L CaCl2 ● 2H2O, 0.16 mM sodium tungstate, 1mL/L trace element solution SL-10, 1mL/L vitamin solution, 0.25 mg/L resazurin, 1 g/L L-Cysteine-HCl ● H20, 1 g/L sodium bicarbonate, 1mM potassium phosphate, 5 g/L glucose or cellobiose. The medium as prepared above is defined and is designated D516 medium. A complex version of this medium, designated C516 medium, is supplemented with 0.5 g/L yeast extract. To grow C. bescii for competent cell preparation,

LOD medium (30), containing either cellobiose or glucose as the carbon source, was supplemented with 1 x 19 amino acid solution (31). C. bescii was plated by embedding in

LOD medium, D516 medium, or C516 medium with 1.5% agar and grown at 65°C in

142

anaerobic conditions with 2% hydrogen/98% nitrogen. For the growth of all uracil auxotrophic mutants, the medium was supplemented with 40 mM uracil.

For solubilization experiments Caldicellulosiruptor strains were grown on a modified defined version of DSM671 medium, containing 1 g/L NH4Cl, 0.02 g/L NaCl, 0.1 g/L MgCl2

● 6H2O, 0.05 g/L CaCl2 ● 2H2O, 0.06 g/L K2HPO4, 1mL/L trace element solution SL-10,

1mL/L vitamin solution, 0.25 mg/L resazurin, 1 g/L L-Cysteine-HCl ● H20, and 2.6 g/L sodium bicarbonate. Solubilization cultures contained 5 g/L plant biomass substrate as the carbon source and were supplemented with 40 mM uracil to support growth of auxotrophic mutants. Fifty µg/mL kanamycin was added to the solubilization cultures for strains resistant to kanamycin.

The vitamin solution for all media recipes contains the following: 20 mg/L biotin, 20 mg/L folic acid, 100 mg/L Pyridoxine-HCl, 50 mg/L Thiamine-HCl ● 2H2O, 50 mg/L riboflavin, 50 mg/L nicotinic acid, 50 mg/L D-Ca-pantothenate, 50 mg/L Vitamin B12, 50 mg/L p-Aminobenzoic acid, and 50 mg/L Lipoic acid.

Cloning and Purification of β-glucosidase Athe_0458

The gene for Athe_0458 was cloned via Gibson assembly (26) using Gibson

Assembly Mastermix (New England Biolabs) into pET28b with a C-terminal histidine tag and transformed into 5-alpha E. coli. Sequence confirmed plasmid was transformed into

Rosetta 2 (DE3) E. coli and protein was expressed in a 1L ZYM-5052 auto-induction media culture with incubation at 37°C with agitation at 250 rpm. Cells were harvested after 20 hours and frozen at -20°C prior to protein purification. The frozen cell pellet was thawed and

143

re-suspended with 5 mL / g wet cell pellet in 50 mM sodium phosphate, pH 7.4, 500 mM

NaCl, 20 mM imidazole containing 1 mg / mL lysozyme (ThermoFisher Scientific). Cells were lysed using a French press at 16,000 psi cell pressure. The lysate was heat treated at

65°C for 20 minutes to remove heat labile E. coli proteins and then centrifuged at 18,000 x g for 30 minutes and 0.22 µm filtered. The cell extract was loaded on a 5 mL HisTrap HP nickel-Sepharose (GE Healthcare) column, operated according to the manufacturer’s instructions using a Biologic DuoFlow FPLC (Bio-Rad). Fractions containing Athe_0458 were combined, concentrated, and buffer exchanged using a 10,000 MWCO Vivaspin PES concentrator.

Construction of Genetic Knockout Strains of C. bescii

The following knockout vectors were constructed via Gibson assembly in the pDCW121 vector backbone (23) using 1kb flanking regions to the desired gene deletion location: pJMC021 for RKCB120, RKCB121, and RKCB123; pJMC033 for RKCB124; pJMC034 for RKCB125 and RKCB127; pJMC069 for RKCB130; and pJMC070 for

RKCB132. In all of these vectors, a 200bp upstream of the Athe_2303 S-layer protein, designated the Pslp promoter, and the CbHTK kanamycin resistance gene were placed between the 1kb flanking regions. Using this vector design the genes knocked out in each strain were replaced by this PslpCbHTK cassette to allow for direct selection of second crossovers using kanamycin, as described previously (21).

To prepare competent cells, a 500 mL culture of C. bescii strain JWCB018 was grown to an optical density at 680 nm of 0.06-0.08 in LOD medium supplemented with 1 x

144

19 amino acid solution. The culture was cooled to room temperature and then cells were harvested. Following this, the cell pellet was washed three times in 10% sucrose solution. All centrifugation steps were carried out at 6000 x g for 10 min. The final cell pellet was re- suspended with 10% sucrose to a total volume of 100-120 µL. Cells (50 µL) were mixed with 1 µg of plasmid DNA at room temperature and electroporated using 1 mm gap cuvettes

(USA Scientific) and a Gene Pulser II system with Pulse Controller PLUS module (Bio-Rad) using the following conditions: 2.0 kV, 200 Ω, and 25 µF. Immediately following electroporation, cells were transferred to 10 mL of pre-warmed complex medium and incubated at 70°C. At 15 minutes and one hour post-electroporation, 1-5 mL of the recovery culture was transferred into selective medium supplemented with 50-100 µg/mL kanamycin.

Cultures were incubated at 70°C for 1-4 days, until growth was observed. This culture was passaged on liquid selective medium once, and then plated on selective medium lacking uracil to select individual colonies. For integrating vectors, a successful first crossover mutant strain was plated on medium supplemented with 40 mM uracil, 4 mM 5-fluoroorotic acid (5-FOA), and 50 µg/mL kanamycin to select for second crossover mutants. Strains were screened by PCR and successful second crossover strains were plated on medium supplemented with uracil and kanamycin two additional times to ensure purity of the strains.

All PCR screening of modified strains were performed on genomic DNA, isolated using the

ZymoBead™ genomic DNA kit (Zymo Research).

145

Genomic Sequencing of Modified C. bescii Strains

Assembled draft genomes sequences were generated by the US Department of

Engery’s Joint Genome Institute (JGI) (32) for C. bescii DSMZ 6725 (wildtype resequencing), JWCB005, RKCB120, RKCB121, RKCB123, RKCB124, RKCB125,

RKCB127, RKCB130, and RKCB132. Sequencing was performed on the Pacific

Biosciences (PacBio) RS / RS II platform with SMRTbell libraries (33).

Secretome Analysis

C. bescii strains were grown in 300 mL D516 medium cultures containing cellobiose as the substrate at 70°C. Growth was monitored by measuring optical density at 600 nm

(OD600) and cultures were harvested during early stationary phase (1-3 x 108 cells/mL) by rapidly cooling the culture in a dry ice bath and centrifuging at 8000 x g for 10 min. The culture supernatant was reserved and stored at 4°C. Cells were re-suspended, transferred to a microcentrifuge tube and centrifuged at 21,000 x g for 1 min. The supernatant was removed and cells were stored at -80°C prior to RNA extraction.

The culture supernatant was centrifuged at 18,000 x g for 20 min to remove any residual cell debris and concentrated and buffer exchanged into 50 mM sodium acetate buffer pH 5.5 containing 10 mM calcium chloride and 100 mM sodium chloride using 50,000

MWCO 20 mL Vivaspin concentrators to a final volume of 5 mL. Total protein concentrations were determined using the BCA assay (Thermo Fisher Scientific). A Mini-

PROTEAN TXG stain free 4-15% gel (Bio-Rad) was run using these supernatant samples with equal total protein loading across all strains. Glycoprotein visualization was performed

146

by staining the gel with the Pierce Glycoprotein Staining Kit (ThermoFisher Scientific). Gel imaging was performed using the Syngene G:BOX gel imaging system.

Construction of Enzyme Expression Strains of C. bescii

Vector pJMC046 was constructed from pSBS4 through several manipulations of

Gibson assembly and site directed mutagenesis. The vector is selected in both E. coli and C. bescii using 50 µg/mL kanamycin. Purified vectors were methylated using recombinantly expressed M.CbeI methylase, as described previously (34). Following methylation, the vector was purified by extracting once with phenol:chloroform:isoamylalcohol (24:25:1) and performing ethanol precipitation. The final plasmid was re-suspended in distilled water. For protein expression strains, electroporations were carried out as described for genetic knockout strains using C. bescii DSMZ wildtype as the parent strain. After growth under selective conditions, these strains were plated once on C516 medium containing glucose and grown in the same liquid medium to obtain the final strains. To verify replicating vector strains, plasmid DNA was isolated from C. bescii using the ZymoResearch plasmid miniprep classic kit was transformed into 5α E. coli. Plasmid DNA was then isolated from colonies of

E. coli and restriction mapped to verify plasmid integrity.

Protein Expression and Purification from C. bescii Strains

Protein expression strains of C. bescii were grown overnight in 500 mL C516 medium containing glucose and 50 µg/mL kanamycin at 70°C and inoculated into a Sartorius

Stedim BioStat Cplus 20L bioreactor containing 17 L of C516 medium with glucose and 50

147

µg/mL kanamycin. The bioreactor was operated at 70°C with agitation at 200 rpm, sparging with 1 L/min 20% CO2/80% N2 gas mix; pH was controlled with NaOH addition at pH 7.00.

The reactor was operated for 22 hours, then cooled to 20-30°C and harvested into a carboy.

The fermentation broth was fractionated using the Millipore Pellicon Mini Tangential Flow

Filter using a Pellicon 2 mini 0.22 µm GVPP 0.1 m2 filter (Millipore) to separate cells from the extracellular protein fraction. This extracellular protein fraction was concentrated as the retentate fraction from the TFF system using a Pellicon 2 mini BioMax 50 kDa cutoff PES

0.1 m2 filter (Millipore) and buffer exchanged into 20 mM sodium phosphate 500 mM sodium chloride buffer pH 7.4. The fermentation broth from 17 L was concentrated to a final volume of 300-800 mL. The final material was sterile-filtered prior to loading it onto a 5 mL

HisTrap HP nickel-Sepharose (GE Healthcare) column, operated according to the manufacturer’s instructions using a Biologic DuoFlow FPLC (Bio-Rad). Protein elution fractions were concentrated and buffer exchanged using 20 mL 50,000 MWCO Vivaspin concentrators. Purity of the proteins was evaluated by SDS-PAGE using Mini-PROTEAN

TGX Stain-Free 4-15% gels (Bio-Rad) with Benchmark protein ladder (Thermo-Fisher).

Protein concentrations were determined using the Pierce BCA protein assay (Pierce, Thermo-

Fisher). Glycoprotein visualization was performed by using the Pierce Glycoprotein Staining

Kit (ThermoFisher Scientific).

Solubilization of Lignocellulosic Biomass Substrates

Prior to starting the solubilization experiment, Caldicellulosiruptor strains were passaged on modified defined DSM671 medium with each of the biomass substrates 3 times

148

to ensure growth on the biomass substrate as the sole carbon source. Fifty mL solubilization cultures were prepared in triplicate for each substrate loaded at 5 g/L and strain in 125 mL serum bottles. Caldicellulosiruptor strains were inoculated into these cultures at a density of

1 x 106 cells/mL. For mixtures of two strains in the same culture, each strain was added at a density of 5 x 105 cells/mL. After growth for 7 days at 70°C, cultures were harvested by centrifugation at 6000 x g for 10 min and washed 3 times with 70°C sterile water. The residual substrate was dried at 70°C until constant mass. The percent solubilization was determined from the difference in mass between the biomass used to prepare each culture and the residual remaining after harvest and drying.

Biochemical Characterization of GDL Glycoside Hydrolases

The activity of Athe_1860 was measured by incubating 1 mg/mL enzyme with 1% polysaccharide substrate at 70°C for 4 hours in 50 mM sodium acetate buffer pH 5.5, containing 10 mM calcium chloride and 100 mM sodium chloride. The pH optimum was determined at 70°C in buffers at pH 3.5-10 (50 mM citrate buffer pH 3.5; 50 mM sodium acetate buffer pH 4.5, 5, 5.5, and 6; 50 mM sodium phosphate pH 6.5, 7, and 8; 50 mM sodium bicarbonate pH 9.2 and 10). The temperature optimum was determined between

55°C and 100°C. Activity was measured using the 3,5-dinitrosalicylic acid (DNS) reducing sugar assay (35,36) adapted to a 96 well format (37). The DNS-reagent used in this study contained 1.6% (w/v) sodium hydroxide, 30% (w/v) sodium potassium tartrate, and 0.1%

(w/v) 3,5-dinitrosalicylic acid. Enzyme reactions were mixed with two volumes of DNS reagent and heated in a thermocycler at 95°C for 5 min, 48°C for 1 min, followed by holding

149

at 20°C. Thirty-six µL of this reaction was mixed with 160 µL distilled water in a 96-well plate and read at 540 nm. Mannose, glucose, and xylose were standards were run alongside the samples to convert absorbance readings to reducing sugar equivalents, as appropriate.

Enzyme cocktail experiments were performed in 50 mM sodium acetate buffer pH

5.5 containing 10 mM calcium chloride and 100 mM sodium chloride with 5 g/L Avicel. To approximate the C. bescii secretome, a mix containing a 25:2:10:1 mass ratio of Athe_1867 :

Athe_1859 : Athe_1857 : Athe_1860 was used based on densitometry of the protein gel for the C. bescii secretome. For a 1mL assay, 100 µg of GDL GH protein was loaded with and without 1µg β-glucosidase Athe_0458. This corresponded to 65.79 µg Athe_1867, 5.26 µg

Athe_1859, 26.32 µg Athe_1857, and 2.63 µg Athe_1860 loading as the wildtype mixture.

To approximate the knockout strains, the enzyme or enzymes knocked out of strains

RKCB120, RKCB121, RKCB124, RKCB125, RKCB130, and RKCB132 were removed from the wildtype enzyme mix and replaced with an equivalent volume of buffer. To analyze the potential contribution of overexpression of enzymes downstream of the knockout location and insertion of the Pslp CbHTK cassette, assays were also prepared with the enzyme mixes of

RKCB120, RKCB130, and RKCB132 with 1.5 x and 3 x the amount of Athe_1859 and/or

Athe_1857, the genes downstream of the deletion sites in these strains. These enzyme assays were incubated statically, to approximate the solubilization culture conditions, at 70°C for 24 hours. To further evaluate the enzyme loading, assays were also conducted with 100 µg, 10

µg, and 1 µg GDL GHs with and without 1 µg β-glucosidase incubated statically for seven days at 70°C. To evaluate the synergism of the GDL enzymes, assays were prepared with a constant 0.5 µM enzyme loading and all possible combinations of 1, 2, 3, and 4 enzymes

150

loaded in equal molar ratios, with and without 1 µg/mL β-glucosidase. This assay was incubated at 70°C with agitation at 500rpm in an Eppendorf thermomixer for 24 h. At the conclusion of all assay incubations, samples were boiled for 15 min to inactivate the enzymes. Cellobiose and glucose release was quantified using an Aminex-87H column

(300mm by 7.8 mm, Bio-Rad) operated with a 5 mM H2SO4 mobile phase at 60°C and 0.6 mL/min. Specific activity units (U) are µmol reducing sugar equivalents released per min for

DNS analyzed samples and µmol glucose equivalents released per min for HPLC analyzed samples. Solubilization is calculated as a percentage of the anhydrous glucan content of the

Avicel loaded in the assay.

Results

The C. bescii Glucan Degradation Locus

The Glucan Degradation Locus (GDL) is highly conserved in the seven most highly cellulolytic Caldicellulosiruptor species characterized to date, including C. bescii. Figure

3.1 shows the C. bescii GDL along with transcriptomic data collected for C. bescii grown on crystalline cellulose (Avicel) and switchgrass, and proteomics data on the extracellular protein fraction of C. bescii grown on six different growth substrates from simple sugars to complex plant biomass. C. bescii has six glucan-degrading enzymes in this locus, which each contain two catalytic GH domains, shown as boxes containing the GH family number in

Figure 3.1A. All of these enzymes contain either two or three cellulose binding modules from CBM family 3. While these GHs represent the core degradation ability of the GDL,

151

other proteins and enzymes encoded in the locus likely contribute to C. bescii’s plant biomass degradation ability.

Upstream of Athe_1867 (CelA) are previously characterized cellulose binding proteins, termed tāpirins, Athe_1871 and Athe_1870 (38). The location of these cellulose- binding proteins near highly cellulolytic enzymes is likely not coincidental. The tāpirins putatively play an important role in cell attachment and anchoring to cellulose, which, in turn, aid in the localization of the secretome, including the GDL enzymes in proximity to the substrate. Located downstream of the GHs in the GDL are three polysaccharide lyase encoding genes: Athe_1855-1853. Athe_1854 has been previously characterized and a crystal structure for the PL3 domain has been solved (39,40). These genes have also previously been knocked out of genetically tractable C. bescii strain JWCB005 and the resulting strain JWCB010 not surprisingly showed decreased ability to grow on pectin-based substrates (41). These three PL genes are not found in the GDLs of all the seven highly cellulolytic species. For example, the GDL of C. saccharolyticus is missing all three PL genes, while the GDL of C. obsidiansis is missing the PL9 and PL11, and in C. krontoskyensis the GDL is missing only the PL9.

In between the GH enzymes Athe_1865 and Athe_1860, are four genes that are highly conserved in all seven highly cellulolytic Caldicellulosiruptor species. Athe_1864 and

Athe_1863 belong to glycosyl transferase protein families that are involved in O- mannosylation of proteins. Protein O-mannosyltransferases (PMTs) are found in both and eukaryotes, and typically have several hydrophobic transmembrane domains

(42). Because of their location in the cytoplasmic membrane in bacteria, glycosylation is

152

thought to be coordinated with secretion (43). This was seen previously for Athe_1867 in C. bescii where extracellular protein was glycosylated, but intracellularly expressed protein lacking the signal peptide was not glycosylated (20). O-mannosylation of various proteins from Gram-positive Mycobacterium species was found on Thr residues particularly in Pro,

Thr, Ala -rich sequences (44,45). Glycosylation patterns for the GDL GHs are not known, but highly Pro/Thr- rich linker sequences are prevalent in these proteins and may be the site of glycosylation.

Athe_1862 encodes a homolog to KWG Leptospira repeat proteins, so named because of their high abundance in Leptospira interrogans proteins, although its function is unknown. Athe_1861 is annotated as a peptidyl-prolyl cis-trans isomerase (PPIase), which are involved in the interconversion of the cis and trans forms of proline during protein maturation. Several types of PPIases are found in bacteria, but Athe_1861 has some sequence similarity to PrsA PPIases, which are thought to be anchored between the cell membrane and peptidoglycan and help to re-fold proteins as they are exported (46,47). This enzyme may play a role in helping in the export and proper extracellular folding of the GDL enzymes, particularly for their Pro/Thr-rich linkers.

To better understand the transcriptional and translational patterns of the GDL during growth on cellulose and plant biomass, previously reported transcriptomic (3) and proteomic data (48) were re-examined in C. bescii (see Figure 3.1B). The GDL GH-containing genes are very highly transcribed during growth on Avicel and switchgrass (3). For C. bescii growth on switchgrass, Athe_1867, Athe_1857, Athe_1865, and Athe_1860 are among the top 15 transcripts (top 0.5%) for the 2780 total genes in the genome. One notable exception

153

to this is GH5/GH44 Athe_1859, which is transcribed at approximately 6-fold lower levels than CelA (Athe_1867). Athe_1860 is separated from the other GDL GHs by the four genes

Athe_1864-1861, which are transcribed at more modest levels. Nevertheless, Athe_1860 is transcribed at comparable levels to the other GH-containing genes. This suggests that although Athe_1860 belongs to a separate operon, there may be coordinated regulation with the other GDL GHs.

Normalized matched ion intensity measurements of the secretome proteins were also analyzed with respect to the GDL from C. bescii grown on various substrates, including soluble sugars (glucose, cellobiose, xylose), biomass polymers (crystalline cellulose Avicel and xylan), and a complex biomass (switchgrass) (48). Normalization of assembled proteins was done using peptide ion intensity values, as described previously (49). These data reflect the transcriptomics, in that the GHs of the GDL are among the most abundant proteins in the extracellular protein fraction. For example, of the 470 proteins detected on Avicel crystalline cellulose and of the 330 detected on switchgrass, all of the GH-containing enzymes of the

GDL are in the top 10% most abundant of these proteins. Based on these proteomics data, the

GH48-containing enzymes Athe_1867, Athe_1859, and Athe_1860 are more abundant by approximately 4-8 fold than their non-GH48 counterparts Athe_1859, Athe_1866, and

Athe_1865. The tāpirin proteins (Athe_1870 and Athe_1871), as well as the polysaccharide lyases (Athe_1855-1853), are also measured in all of these samples at levels about 2-6 fold lower than the lowest GDL GH.

The structure of the operons in the GDL is of interest when considering the transcriptomic and proteomic profiles of the very large GH enzyme genes (3.9-5.7 kb) and

154

the accessory genes present here. To this end, TransTermHP (50) predictions of rho- independent terminators in this locus were investigated. Predicted terminator sequences are shown in Figure 3.1C along with the genes they follow and the TransTermHP combined score, which is related to the probability of a terminator of equal or better quality arising in a random sequence taking into account the GC content of the species. The scores generated by

TransTermHP range from 0 for poor quality terminator predictions to 100 for high quality terminator predictions. In the GDL, predicted terminators with score 100 appear after

Athe_1871, Athe_1868, Athe_1867, Athe_1857, and Athe_1860. Terminators are also predicted after CAZymes Athe_1859, Athe_1865, and Athe_1854, albeit with slightly lower scores (73, 77, and 82 respectively). No terminator is predicted between Athe_1866 and

Athe_1865, or between peptidyl-prolyl cis-trans isomerase Athe_1861 and GH74/GH48 enzyme Athe_1860. These terminator predictions would suggest all of the GH enzymes in the GDL are in separate operons, except Athe_1865 and Athe_1866 that appear to be in a single operon. The locations of these predicted terminators in the GDL are in good agreement with the transcriptomic and proteomic data.

Repetitive Nature of the GDL

The GDL is characterized by its repetitive CBM3, GH5, and GH48 domains, not only at the amino acid level but also at the nucleotide level. Recently, re-sequencing of the C. bescii DSMZ 6725 genome using a PacBio system revealed that the GDL was incorrectly assembled in the originally published sequence. In the C. bescii genome, Athe_1859-1857 is, in fact, between Athe_1867 and Athe_1866. This error occurred because the GH48

155

sequences in Athe_1867, Athe_1857, and Athe_1860 are identical and the short reads of 454 sequencing technology were not able to read through these > 1 kb repeats. Even manual curation of the sequence via Sanger sequencing lead to mistakes that misassembled this locus. To visualize the repetitiveness of the nucleotide sequence of the GDL in C. bescii, a dot plot shown in Figure 3.2 was constructed using EMBOSS dottup

(http://www.bioinformatics.nl/cgi-bin/emboss/dottup) with a sequence word length of 50 bp to compare the recently revised C. bescii sequence to itself. In Figure 3.2, black pixels indicate where a 50 bp sequence at a position on the x-axis is an exact match to 50 bp at a position on the y-axis of the sequence. The diagonal line from the bottom left to top right represents the direct match of the sequence to itself, but the large number of additional lines indicate many other areas where nucleotide sequences 50 bp or larger are repeated. Many of these areas correspond to the repeated CBM3 domains. To better visualize the repeated

GH48 domains (1.8 kb) and GH5 domains (1 kb), these areas have been shaded orange and green, respectively. Of particular note is the 5.7 kb repeated region stretching from the

CBM3s and GH48 of Athe_1867 through the GH5 and CBM3 domains of Athe_1859, which is repeated in the CBM3s, GH48, GH5, and CBM3s of Athe_1857 and Athe_1866, and partially repeated in the CBM3s and GH48 of Athe_1860. This extended large repeat is the reason the genome was originally misassembled. It is important to consider these extended repeats in C. bescii to avoid issues with many standard biomolecular laboratory procedures, including PCR, sequencing, and targeted gene deletions. To this point, strain JWCB029 of C. bescii, in which CelA was deleted, showed reduced ability to grow on crystalline cellulose as well as some lignocellulosic substrates, suggesting its important role in the degradation of

156

cellulosic substrates (51). Due to the repetitive nature of the GDL and the now corrected sequence, it is not clear whether this strain is in fact a deletion of only Athe_1867 CelA or a larger deletion that included Athe_1867, Athe_1859, and Athe_1857. The reverse screening primer used in this study binds in the repeated GH5 domain highlighted in green in Figure

3.2. So, while CelA was certainly deleted in strain JWCB029 (51), it cannot be concluded that only CelA and not a larger portion encompassing multiple enzymes was deleted. This study showed a severe growth phenotype on crystalline cellulose. But without knowing precisely which genes are deleted from the GDL, it is unclear that the absence of only CelA was responsible for this phenotype.

In vivo Analysis of GDL Enzymes through Targeted Gene Deletions

To further probe the role of CelA and the other GH-containing enzymes of the GDL, a series of targeted gene deletions were constructed in C. bescii strain JWCB005, which was also recently sequenced (10). To enable clear selection of modified strains and prevent reversion to the parent strain during counter-selection on 5-FOA, a codon-optimized, highly thermostable kanamycin resistance gene (CbHTK) (21), driven by the promoter (Pslp) from S- layer protein (Athe_2303), was used to replace the genes that were deleted in these strains.

The scheme for the GDL in the knockout strains constructed using these methods are shown in Figure 3.3A. PCR screening was used as an initial verification of the strains, but the repeated regions of the GDL cause primer binding in several locations leading to several

PCR products for each reaction. As a final verification of these strains, the whole genomes were re-sequenced using PacBio sequencing to obtain long reads allowing for the assembly

157

through the repeated areas of the GDL. All of the genome sequences from these strains verify that the knockout strains are missing the genes targeted for deletion and that these are replaced by the Pslp CbHTK construct. This method of replacing the gene with a strong promoter and kanamycin resistance gene enabled genetic manipulation in this complex area of the genome. But one consequence of this method of manipulation is that genes in the same operon as the gene which was deleted could be more highly transcribed and expressed than in the wildtype or JWCB005 parent strain. As seen in Figure 3.1, the GHs in the GDL are transcribed and expressed at high levels on all substrates tested, so that their increased production should not have a significant impact on phenotype, although this issue needs further examination. Additionally, based on the terminator predictions in the GDL, all of the

GDL GHs are in separate operons except for Athe_1866 and Athe_1865. Based on these predictions, all of the PslpCbHTK insertions would be immediately followed by the native terminator for the deleted genes, except in the case of strains RKCB123 and RKCB127 where the terminator falls after the GH5 domain remaining from Athe_1865. This would suggest overexpression of genes downstream is less likely to occur, as it should be mitigated by the terminators in the GDL.

To assess the extracellular proteins produced by these strains and the expression of the enzymes of the GDL, cultures of C. bescii wildtype, parent strain JWCB005, and all of the GDL knockout strains RKCB120-RKCB132 were grown on cellobiose. The supernatant from these cultures was concentrated and separated by SDS-PAGE (see Figure 3.3B). The predicted location of the enzymes of the GDL are shown on the left, based on the genes knockout strains and proteomics performed on C. bescii extracellular proteins previously

158

(52). Athe_1867 CelA is clearly the predominant band in both the C. bescii WT and strain

JWCB005 samples, suggesting it is the most highly expressed. The enzymes deleted in the various RKCB knockout strains were not detected. For example, strains RKCB120,

RKCB130, RKCB121, RKCB123 are all lacking CelA, and the expected band corresponding to Athe_1867 is clearly absent in these lanes. Strain RKCB123, which is missing all but

Athe_1860 and the GH5 of Athe_1865, shows no large extracellular proteins besides the band for Athe_1860 at approximately 240 kDa.

Cellulose and Complex Plant Biomass solubilization by GDL GH Knockout Strains

The extent to which Caldicellulosiruptor species solubilize different complex plant biomass substrates is a key measure of the extracellular enzyme inventory of these species and their function as consolidated biomass processing microorganisms. To this end, C. bescii wildtype, genetic parent strain JWCB005, and GDL enzyme knockout strains RKCB120-132 were grown on crystalline cellulose (Avicel), switchgrass, and two lines of natural variant poplar to assess their ability to solubilize them. The percent solubilization of these substrates was determined at 70°C in static cultures with defined media, where the plant biomass substrate was the sole carbon source. These data are shown in Figure 3.4 expressed as the biotic solubilization percent for each strain, which is directly related to the secretome enzyme inventory of each strain and ability to degrade and grow on the substrate. Biotic solubilization does not include the thermal contribution to solubilization in an abiotic control, which were 2.1% for Avicel, 19.6% for switchgrass, 10.6% for poplar GW-9947, and 10.7% for poplar GW-9762. In these experiments C. bescii wildtype and parent strain JWCB005

159

performed nearly identically, showing the modifications to strain JWCB005 that make it genetically tractable do not affect its ability to degrade these substrates. C. bescii strains

RKCB120-132 have various genes from the GDL knocked out and replaced with the

PslpCbHTK cassette. It is important to note that while it could be possible that the strong promoter of this insertion might drive higher expression of the genes in the locus, particularly downstream, any potential increase in enzyme expression would be expected to increase solubilization. Thus, decreases in plant biomass solubilization by a given strain are a demonstrable effect of the deletion of the gene(s) replaced by the PslpCbHTK cassette.

Additionally, the predicted terminators after Athe_1867, Athe_1859, Athe_1857, Athe_1865, and Athe_1860 would prevent the PslpCbHTK cassette from driving higher expression of genes downstream in all of the knockout strains.

On Avicel (Figure 3.4A), strains RKCB120, RKCB130, RKCB121, and RKCB123, which respectively have knockouts of one, two, three, and five genes starting at CelA

Athe_1867 and continuing downstream, solubilize Avicel significantly less than the wildtype. Lacking CelA alone in strain RKCB120 causes only about a 33% reduction in the biotic solubilization of the wildtype. The double knockout of Athe_1867 and Athe_1859 has a similar reduction (20%) in solubilization. Strain RKCB121, which is lacking Athe_1867,

Athe_1859 and Athe_1857, has a much more severe growth phenotype with a 90% reduction in biotic solubilization, suggesting these three enzymes are the primary cellulases for C. bescii. The individual deletion of Athe_1857 in strain RKCB125 shows no difference from the wildtype and individual deletion of Athe_1859 in strain RKCB132 solubilizes cellulose as well or slightly better than the wildtype. It is interesting that the three strains lacking only

160

an individual member of these three main cellulases (Athe_1867, Athe_1859, or Athe_1857) have at worst a 33% reduction in cellulose solubilization (RKCB120). However, the combined deletion of all three genes in RKCB121 reduces cellulose degradation 90%. This would suggest the action of Athe_1867, Athe_1859, and Athe_1857 is synergistic or coordinated in some way. Additionally, strains RKCB124 and RKCB127 showed no significant reduction in biomass solubilization compared to the wildtype, suggesting the other three GDL GHs contribute little to the degradation of crystalline cellulose.

On switchgrass (Figure 3.4B), a complex herbaceous biomass substrate containing various hemicelluloses in addition to cellulose, a different set of GDL GH enzymes appear to be most important for degradation of this substrate. Here, strain RKCB120 lacking CelA, only has a 20% reduction in the total solubilization of the wildtype. Successive deletions of more genes downstream of CelA in strains RKCB130, RKCB121, and RKCB123 further reduce the switchgrass solubilization. Unlike on Avicel, strain RKCB125, which is lacking only Athe_1857, shows a significant reduction in biotic solubilization. While this

GH10/GH48 enzyme is active on glucan substrates (19), GH10s are typically xylanases, and the importance of this enzyme on a complex substrate containing xylan is not surprising.

Athe_1866 and Athe_1865 contribute modestly to switchgrass solubilization, as their deletion from RKCB121 to RKCB123 and from RKCB125 to RKCB127 results in small reductions in solubilization. Additionally, strain RKCB124, with Athe_1860 deleted, has only a very slight reduction in solubilization from the wildtype.

Two natural variant lines of Poplar trichocarpa (GW-9947 and GW-9762) with reduced lignin content were used to assess biotic solubilization of a complex hardwood

161

substrate (Figure 3.4C&D). Poplar GW-9947 contains 22.7% lignin with 1.7 S/G ratio and

GW-9762 contains 21.7% lignin with 1.5 S/G ratio. These were the lowest lignin variants, and most easily solubilized poplar in a previous solubilization study of five natural variant poplar lines by three cellulolytic Caldicellulosiruptor species, including C. bescii (53). In a different study of sugar release during enzymatic treatment, poplar GW-9947 ranked as the second best and GW-9762 ranked as sixth best for combined glucan and xylan release out of twenty two poplar variants (54). Here, the biotic solubilization of poplar GW-9762 was

11.7% for C. bescii wildtype with GW-9947 being about a third higher at 15.8%. Unlike on

Avicel and switchgrass, Athe_1867 CelA appears to play a very critical role in poplar degradation. Strain RKCB120, with only CelA deleted, has a biotic solubilization of 4.9% on poplar GW-9947, a reduction of 70% from wildtype and on GW-9762 this strain has almost all biotic solubilization eliminated (1.6% biotic solubilization, an 86% reduction). The knockouts of successively larger portions of the GDL (RKCB130, RKCB121, and

RKCB123) perform equally poorly on the more recalcitrant poplar GW-9762. On the less recalcitrant GW-9947, these additional knockouts do increase solubilization slightly from strain RKCB120. Similar to on switchgrass, strain RKCB125, lacking Athe_1857, is able to perform about half the biotic solubilization of the wildtype. The deletion of Athe_1866 and

Athe_1865 from strain RKCB125 to RKCB127 does not change poplar solubilization values, suggesting these enzymes do not play a critical role in poplar degradation. Additionally, the deletion of Athe_1860 in strain RKCB124 does not reduce poplar solubilization for either natural variant.

162

Switchgrass Solubilization by Mixtures of GDL Knockout Strains

To investigate whether the decreases in biotic solubilization are a result of a decreased secretome enzyme inventory, switchgrass solubilization cultures using a mixture of two knockout strains with differing enzyme knockouts were performed. For these experiments strains RKCB120, RKCB121, RKCB127, and RKCB130 and switchgrass were selected because these strains have biotic switchgrass solubilizations at a variety of intermediate levels (15.3%, 4.6%, 7.7%, and 11.8%, respectively) between the maximum

(18.8%) of the wildtype and minimum (0.0%) of the abiotic control. All possible combinations of two strains from this group were performed, and the biotic solubilization values for the mixed cultures are shown alongside the biotic solubilization values of the two individual strains when cultured separately in Figure 3.5. At the bottom of Figure 3.5, the six GDL GH enzymes are listed and those present in the specific strains are indicated. For the mixed cultures, the enzymes produced by one of the two strains are marked once and enzymes produced by both of the two strains in the mixtures are marked twice. Two of the combinations, 130+127 and 127+120, restore the full complement of GDL GH enzymes to the culture between the two combined strains. In the 130+127 combination, all of the enzymes are produced by one of the two strains, except Athe_1860, which is produced by both. The combination of 127+120 is identical, except Athe_1859 is also produced by both strains. In both of these mixed culture cases, restoring the full complement of GDL GH enzymes in the culture, even if they are produced by different strains, restores the biotic solubilization to levels similar to the wildtype. Similarly, the 127+121 mix, while not restoring the full GDL GH complement, restores a mixture similar to strain RKCB125 which

163

is only lacking Athe_1857. The mixture of 127+121 has a biotic solubilization of 7.3%, while strain RKCB125 individually was 9.5%. These mixtures also produce cases where the strains each produce the same set of enzymes and then one of the two strains in the mixture produces an additional one or two enzymes, as in the mixtures of 130+120, 130+121, and

120+121. For example in the mix of 130+120, strain RKCB130 duplicates the enzymes produced by RKCB120, RKCB120 additionally produces Athe_1859, and neither produces

Athe_1867. In this case the mixture solubilizes 15.9%, which is the same as the better solubilizing strain, RKCB120 alone (15.3%). A similar situation occurs for the mixtures of

130+121 and 120+121, where solubilization is close to that of the better strain alone,

RKCB130 and RKCB120, respectively. Taken together, data from these mixed cultures show that the GDL can be separated between two different strains of the same species and perform as well as if the genes are all present in the same species. This also shows that the most important factor in solubilization is the cocktail of extracellular enzymes produces in the culture, independent of the source organism producing the individual enzymes.

Expression and Purification of Glucan Degradation Locus Cellulases

Based on previous expression of proteins from a version of vector pATHE02 native to C. bescii (20) and selection of a version of this plasmid in wildtype C. bescii with the

CbHTK kanamycin resistance gene (21), a new vector pJMC046 was constructed for the expression of proteins with a C-terminal histidine tag in C. bescii using kanamycin selection to maintain the vector. A vector map for pJMC046 is shown in Figure 3.6A. This vector includes the pSC101 origin for replication in E. coli for vector construction. The CbHTK

164

gene driven by the PS30 promoter from the Athe_2105 30S ribosomal subunit is functional for selection of kanamycin resistance in both C. bescii and E. coli. This allows for the removal of the AAC apramycin resistance gene for selection of the vector in E. coli used in previous iterations of this replicating vector. The cloning site containing a C-terminal histidine tag is shown in Figure 3.6B, which is driven by the Pslp promoter from Athe_2303 and is followed by a stem loop sequence taken from after the gene Calkro_0402 in C. kronotskyensis for rho- independent termination. The genes encoding the main GDL cellulases, Athe_1867,

Athe_1859, Athe_1857, and Athe_1860, were separately inserted into this expression vector and transformed into wildtype C. bescii, maintained with kanamycin selection on rich growth medium.

Because the full gene, including the sequence coding for the native signal peptide, was cloned into pJMC046, these GDL GHs were secreted from C.bescii and found extracellularly. Large cultures (17 liters) of C. bescii strains containing the overexpression vectors for each of the four GDL GHs were grown for 22 h, following which the culture supernatant was processed using a tangential flow filtration system to remove the cells and cell debris and then concentrate and buffer exchange proteins larger than 50 kDa. The protein of interest expressed off the replicating vector was then purified from other extracellular proteins by immobilized metal affinity chromatography by utilizing the C-terminal histidine tag. The purified Athe_1867, Athe_1859, Athe_1857, and Athe_1860 produced in C. bescii are shown in Figure 3.6C. While versions of these proteins, except Athe_1860, have previously been characterized, only Athe_1867 CelA has been characterized from a version purified from a 600 L culture of C. bescii (14). By using a histidine tag, the impracticality of

165

native purification from very large culture volumes is avoided. Additionally, expression in the native host means that these versions of the four enzymes should be natively glycosylated by C. bescii, which is confirmed in Figure 3.6D.

Biochemical Characterization of Athe_1860

All of the GDL GH enzymes produced by C. bescii have been characterized in some detail, except for Athe_1860, which has not been characterized. Table 3.1 summarizes the activities of the GDL GH enzymes determined previously (13-19). Here, Athe_1860 was characterized, which at 1904 amino acids, it is the largest GH enzyme produced by C. bescii.

Proteins of this size are problematic to produce recombinantly in heterologous hosts, but this was done here in the native host. Athe_1860 was incubated overnight with a variety of polysaccharide substrates and reducing sugar release as measured by the DNS reducing sugar assay. This screening assay revealed activity on glucan substrates: barley β-glucan, lichenan,

CMC, and Avicel; xylan substrates: birchwood xylan, oat spelt xylan, beechwood xylan, wheat arabinoxylan; and mannan substrates: ivory nut mannan, 1,4 beta-D-mannan; glucomannan from locust bean; and tamarind xyloglucan. Specific activities on these substrates are shown in Table 3.2, where enzyme units are µmol reducing sugar released/min. Athe_1860 has relatively low activity on Avicel compared to the other cellulase enzymes found in the GDL, but like most other enzymes in the GDL, Athe_1860 was active on a broad range of xylans, mannans, and glucans.

166

In Vitro Analysis of Glucan Degradation Locus Enzyme Cocktails

Using versions of Athe_1867, Athe_1859, Athe_1857, and Athe_1860 recombinantly produced in C. bescii, an enzyme cocktail was constructed to mimic the enzyme mix found in the wildtype secretome. The ratio on a mass basis of 25 : 2 : 10 : 1 for Athe_1867 :

Athe_1859 : Athe_1857 : Athe_1860 was determined by densitometry on the wildtype secretome sample. This mixture is shown in Figure 3.7A alongside the C. bescii wildtype secretome sample. Using this wildtype mixture at a loading of 100 µg/mL, enzyme mixes corresponding to knockout strains RKCB120, RKCB132, RKCB130, RKCB121, RKCB124, and RKCB125 were constructed (Figure 3.7B) by removing the enzymes knocked out in these strains from the enzyme mixture. To investigate what the potential effect in vitro of increased expression of genes downstream of the PslpCbHTK cassette insertion, the enzymes downstream of the deletion locus were increased 1.5- (data not shown) and 3-fold (see mixes

120+, 120++, 130+, and 132+ in Figure 3.7B) in addition to removing the enzymes knocked out of these strains. The same concentrations of each of the enzymes in the wildtype mix were also used individually to show the activity of the individual enzymes. The activity of these enzyme cocktails were evaluated on Avicel, with and without C. bescii β-glucosidase

Athe_0458 (Figure 3.7A) loaded at 1 µg/mL, without shaking to mimic the solubilization culture conditions. The close homolog of β-glucosidase Athe_0458 in C. saccharolyticus has previously been characterized (55). Here, the goal of using the β-glucosidase from C. bescii, which does not have a signal peptide and is predicted to be intracellular, is to relieve product inhibition of the enzymes by cellobiose. This strategy was used previously by using addition of Thermotoga maritima β-glucosidase when characterizing Athe_1867 (14). Product

167

inhibition would be minimal in culture with Caldicellulosiruptor species as the bacterium, when it is growing, consumes cellobiose and other soluble sugar products of the GDL enzymes as they are produced, preventing their build up extracellularly.

The cellobiose and glucose released by the various enzyme cocktails recapitulating the wildtype and various knockout strains is shown in Figure 3.7C, without the addition of

β-glucosidase. Specific activity values for these mixtures are shown in Figure 3.7D. For the wildtype enzyme mixture, about 550 µg/mL of cellobiose and glucose were released from

Avicel in 24 hours. This corresponds to a specific activity of 0.0219 U/mg total protein. The enzyme mixes corresponding to strains 124 (lacking Athe_1860) and 125 (lacking

Athe_1857) perform almost identically to the wildtype mix, although these have slightly higher calculated specific activity (0.0227 and 0.0291 U/mg total protein, respectively) because less enzyme is added. Mix 120 is missing Athe_1867, mix 132 is missing

Athe_1859, and mix 130 is missing both. Interestingly, the removal of the relatively small amount of Athe_1859 (5% of the total enzyme mass) from the wildtype mix to 132, results in a loss of 40% of the sugar release. The 120 and 130 mixes release even less sugar. However, mix 120 has the highest specific activity of this group at 0.0150 U/mg total protein. Adding additional Athe_1859 and/or Athe_1857 to mixes 120, 132, and 130 to make mixes 120+,

120++, 130+, and 132+, shows no significant improvement in released sugar, despite the increased total enzyme. Thus specific activity decreases from the 120 to 120+, 120 to 120++,

130 to 130+, and 132 to 132+ mixes. The 121 mix, which has only a very low loading of

Athe_1860 releases no measurable sugar on the timescale of this assay.

168

Figure 3.7E shows cellobiose and glucose released with the addition of 1 µg/mL β- glucosidase to the same enzyme mixes. The trends comparing different enzyme mixes here are the same as without the β-glucosidase addition, but the β-glucosidase significantly increases the total glucose equivalents released, and thus the specific activities shown in

Figure 3.7D are increased. The soluble sugar measurements, both with and without β- glucosidase, are converted to the % glucan solubilization (Figure 3.7F), based on the 5 mg/mL loading of Avicel in the assay, assumed to have the molecular weight of anhydrous glucan (162 g/mol). The addition of β-glucosidase increases the solubilization significantly

(between about 1.5- and 2.5-fold) compared to the assays without the addition of β- glucosidase.

One additional important observation from these enzyme cocktail experiments is that they demonstrates synergism between the enzymes in the GDL. This is most evident for the wildtype mix, which releases more soluble sugar than the sum of that of the constituent enzymes (shown at the right in Figure 3.7C/E). For the assay without β-glucosidase the wildtype releases 566 µg glucose equivalents/mL while 400 µg glucose equivalents/mL would be expected to be released if the release of Athe_1867, Athe_1859, Athe_1857, and

Athe_1860 were purely additive and not synergistic. This demonstrates that some degree of synergism is present in the activity of these enzymes.

To evaluate the synergistic action of these four GDL GHs, the activities of the individual enzymes and equimolar mixtures of all combinations of 2, 3, and 4 of these enzymes were assessed on Avicel with a fixed protein concentration of 0.5 µM. Figure 3.8A shows the percentage of the mixes A through P that are made up of each enzyme. These

169

assays were performed without the addition of β-glucosidase Athe_0458 to assess the synergism of the GDL enzymes only. The glucose and cellobiose release for enzyme mixes

A-P are shown in Figure 3.8B. The degree of synergism for these mixtures was calculated as the ratio of the activity of the mixture to the sum of the activities expected for the individual constituent enzymes. This degree of synergism is shown in Figure 3.8C. For enzymes that are not synergistic, the activities of the individual components would be purely additive giving a degree of synergism of 1.0. Here, synergy is clearly evident between all combinations of these GDL enzymes, except possibly the pairwise combination of

Athe_1857 and Athe_1860 in mix J with a degree of synergism of 1.2. The pair of

Athe_1867 and Athe_1859 exhibits the highest degree of synergy of any pair of enzymes at

3.3. These assays were performed with the addition of 1µg/mL β-glucosidase (data not shown). With the β-glucosidase, total sugar release was increased, similar to the increased activity shown with the addition of β-glucosidase for the enzyme cocktails in Figure 3.7E.

Even with the addition of β-glucosidase the degree of synergism values for the enzyme mixes without β-glucosidase, shown in Figure 3.8C, are not changed.

Discussion

From a genomics perspective, the presence of the GH9, GH44, and GH48 enzymes in the genome of Caldicellulosiruptor species is the major differentiating factor between cellulolytic and non-cellulolytic phenotypes in these extremely thermophilic bacteria (5,6).

These particular GH families are all found in the GDL of Caldicellulosiruptor species. The

GDL encodes six enzymes in C. bescii containing four GH5, two GH9, one GH10, one

170

GH44, three GH48, and one GH74 domains. Genomic loci with similar sets of GH domains are also found in many other cellulolytic bacterial species including Clostridium cellulolyticum (two GH5s, GH8, five GH9s, PL11, GH48) (56,57) and Acidothermus cellulolyticus (two GH5s, GH12, GH6, GH48, and GH74) (58). However, other bacteria do not have a concentrated locus of cellulase genes, and instead these genes are scattered throughout the genome (59). Nevertheless, all cellulolytic bacteria produce complementary

GHs from the major cellulase families: GH5, GH6, GH8, GH9, GH12, GH44, GH45, GH48, and GH124 (60).

While there are many bacteria that produce cellulases, the multi-domain structure of the cellulase enzymes in the GDL is unique to Caldicellulosiruptor species. The architecture of these genes is highly similar with GH domains on the N-terminus and C-terminus and two or three cellulose binding CBM3 domains in between. This multi-domain architecture combined with the highly repetitive nature on the nucleotide sequence level (Figure 3.2) suggests this locus arose by domain shuffling (7,8). This is further supported when comparing the GDLs across Caldicellulosiruptor species (1). For example, the GDL of C. saccharolyticus is missing the repeated CBM3-GH48 GH5-CBM3 found between

Athe_1867-Athe_1859 and Athe_1857-Athe_1866. Instead C. saccharolyticus has one enzyme, which has been characterized and termed CelB (61), that is made up of the GH10 domain homologous to Athe_1857 and the C-terminal GH5 domain homologous to

Athe_1866. It is unclear whether this difference is the result of gene duplication and recombination to expand the GDL repertoire from C. saccharolyticus to C. bescii, or if the repeated domains in C. bescii were lost at some point by C. saccharolyticus. The GDLs of

171

recently sequenced Caldicellulosiruptor species (62) contain GH12 domains among the typical GH5, GH9, GH44, GH48 domains, also likely the result of domain shuffling. The repeated domain shuffling of the GDL sequence in different species represents the evolution of the cellulolytic function in Caldicellulosiruptor, and offers a suite of multi-domain enzymes with a mixture of different activities and domains for degrading plant polysaccharide substrates.

These unique multi-domain enzymes are not only encoded in the genomes of

Caldicellulosiruptor species, they are also highly transcribed and produced (Figure 3.1,

(3,12,48)) under most growth conditions. Based on proteomics data (48), the six GHs from the GDL make up approximately 20% of the secretome protein produced by C. bescii. The large proportion of the secretome that is comprised of the GDL can be seen in the protein gels shown in Figure 3.3B. Because of the large size of these enzymes, the cell would expend significant energy producing these very large proteins (1414-1904 amino acid in length) at their relatively high concentrations. In addition, these large enzymes are glycosylated (Figure 3.3C and Figure 3.6D). Because of their location as part of the GDL, glycosyl transferases Athe_1864 and Athe_1863, likely play a role in this glycosylation, which is believed to occur during secretion (20). Glycosylation of these large multi-domain enzymes may play a role in protecting their Pro/Thr rich linker regions from proteolytic cleavage (63), or aid in their binding and interaction with substrates, as was demonstrated for

Trichoderma reesei cellobiohydrolases (64). While it has been shown that the secretome of

C. bescii (65) and GDL enzymes or their domains (14,17,19) are highly effective at degrading crystalline cellulose and plant biomass substrates, here, using recently improved

172

genetic tools for C. bescii (21) and a corrected genome (10), their roles in vivo could be examined.

Various knockout strains with different combinations of GDL enzyme deletions

(Figure 3.3A) were constructed, and the relative impact of these deletions on the ability to degrade Avicel, switchgrass, and poplar reveals that the relative importance of the GDL enzymes is biomass dependent. In the case of poplar solubilization, Athe_1867 CelA appears to be the most critical, as knockout strain RKCB120 lacking only CelA performs very little biotic solubilization (Figure 3.4C/D). While on switchgrass (Figure 3.4B), the deletion of

Athe_1857 in strain RKCB125 reduces the biotic solubilization by 50%, more than any other single gene knockout. On Avicel, no one enzyme appears to be critical, as the single gene knockout strains degrade over 65% as well as the wildtype, but the combined deletion of

Athe_1867, Athe_1859, and Athe_1857 abolishes nearly 90% of the ability to degrade crystalline cellulose. Previously, an attempt was made to knockout Athe_1867 from C. bescii to demonstrate the criticality of CelA (51). However, upon closer examination of the repetitive genome and the phenotypes reported for strain JWCB029, this strain is likely a

Athe_1867, Athe_1859, Athe_1857 deletion mutant similar to strain RKCB121. This strain demonstrates that it is the combination of Athe_1867, Athe_1859, and Athe_1857 that is critical for cellulose degradation, and further, the criticality of three enzymes suggests that their action is synergistic. For the biomasses tested, the deletion of Athe_1866 and

Athe_1865 (from RKCB121 to RKCB123, or RKCB125 to RKCB127) had only a minor effects. The deletion of Athe_1860 in strain RKCB124 showed no measurable effect on

Avicel or poplar solubilization and only a slight effect for switchgrass. These observations

173

indicate these three genes are not essential for degrading these biomass substrates. The GH5s and GH74 domains encoded in these genes have significant activity on mannans, and thus a substrate not tested here, such as pine or spruce (66) that contains higher mannan levels, might require the use of these enzymes.

The solubilization experiments with the individual knockout strains show the enzymes required for degradation vary with the substrate. Using mixed cultures containing pairwise combinations of four knockout strains (Figure 3.5), solubilization was found to be dependent on the mix of enzymes produced in the culture and independent of the strain that produces them. The two mixture cultures (130+127 and 127+120) both collectively contained all six GDL enzymes; the combination in the culture achieved solubilization similar to the wild type. Because it appears that the most important factor is the extracellular enzyme mixture, strategies based on co-culture of different species producing different enzymes, or addition of industrially produced enzymes to alter the enzyme mixture in the culture, could be feasible. Not considered here, however, is the contribution of the attachment of the cells to the substrate. Caldicellulosiruptor species are known to attach to biomass substrates while they degrade them (38,67), and this could significantly enhance the action of the enzymes by releasing them in close proximity to their substrates. If this is an important factor in how these enzymes function, addition of exogenously produced enzyme may not improve solubilization significantly.

To further investigate the synergy and biomass degradation ability of the GDL enzymes, they were produced in C. bescii for analysis in vitro. By combining a replicating shuttle vector (24) and the antibiotic resistance marker for C. bescii (21) to create plasmid

174

pJMC046 (Figure 3.6A/B), expression of GDL enzymes, with histidine tags to facilitate their purification, is possible in wildtype C. bescii. This strategy offers the advantage that the proteins are glycosylated in their native host (Figure 3.6D). This expression system facilitated the characterization of Athe_1860, the largest GH enzyme produced by C. bescii, which had not been previously characterized. Athe_1860 has similar pH and temperature optima to other GDL GHs (Table 3.1), and is active on a broad variety of glucan, mannan, xylan, glucomannan, and xyloglucan substrates (Table 3.2). This is similar to a thermophilic

GH74 protein from Thermotoga maritima which is most active on barley β-glucan, but also shows activity on xyloglucan, glucomannan, and CMC (68). While Athe_1860 is active on

Avicel, its activity is very slight compared to the Athe_1867.

The cellulases Athe_1867, Athe_1859, and Athe_1857 important for cellulose degradation, as indicated by strain RKCB121, were also produced using the C. bescii expression system. Based on these four enzymes, a synthetic mixture of GDL enzymes was made to recapitulate the wildtype GDL mix with protein ratios determined by densitometry of the wildtype secretome (Figure 3.7A). The activity of this “wildtype” mixture was tested alongside mixtures lacking different enzymes mimicking the GDL enzyme knockout strains.

These results generally mirror the in vivo solubilzation data, with mixes lacking Athe_1860 or Athe_1857 (124 and 125, respectively) performing nearly identically to the wildtype just as the corresponding strains did in vivo. Mix 121 lacking the three main cellulases exhibited minimal activity. The mixes lacking CelA (120, 130, and 132), however, performed significantly worse in vitro than their corresponding knockout strains did in vivo. This could be explained by the insertion of the PslpCbHTK cassette at the CelA deletion causing

175

increased expression of other GDL GHs. However, testing increased enzyme addition of

Athe_1859 and Athe_1857, the enzymes downstream of the CelA deletion and PslpCbHTK insertion, in enzyme mixes 120+, 120++, 130+, and 132+, did not significantly increase the measured activity in vitro. In addition, increased expression of the downstream GDL GHs is unlikely due to the presence of terminators after the majority of GDL GH genes, including

CelA (Figure 3.1C). It is possible that other compensatory effects are occurring in vivo to make up for the loss of CelA, possibly from elsewhere in the genome. Alternatively, CelA at about 65% the mass loading of the synthetic wildtype mixture, while it appears to match the actual wildtype secretome by protein densitometry (Figure 3.7A), may constitute too high a proportion to mimic the in vivo conditions appropriately in vitro.

By adding β-glucosidase (Athe_0458) to the enzyme mixtures, cellobiose released by the GDL enzymes is readily converted to glucose (Figure 3.7E). This relieves end-product inhibition of the GDL enzymes, which was previously demonstrated for Athe_1867 CelA

(14). This end product inhibition also appears to be a factor for Athe_1859, Athe_1857, and

Athe_1860, as their activity is improved with the addition of the β-glucosidase (Figure

3.7D). In vivo, β-glucosidase Athe_0458 is predicted to be intracellular, but its effect in these in vitro experiments would mimic C. bescii cells consuming cellobiose from the extracellular environment to relieve this product inhibition and pulling product through these enzymatic degradation reactions.

The enzyme mixtures recapitulating the GDL enzyme knockout strains in vitro demonstrate the synergism of these four enzymes, as the wildtype mixture performs better than any of the constituent enzymes individually. To further investigate this synergism,

176

experiments were performed with all possible equimolar mixes of these GHs (Figure 3.8).

Using the glucose equivalents released, a degree of synergism was calculated for each mixture, revealing the major synergistic interactions between these GDL GHs. In particular,

Athe_1867 and Athe_1859 are the most synergistic pair of enzymes with the mixture, performing 3.3 times better than the sum of the expected activity of the individual enzymes.

The next highest degree of synergism values are also for the other two pairs with Athe_1859.

While Athe_1859 lacks the GH48 exoglucanase of Athe_1867, Athe_1857, and Athe_1860, the endoglucanase activity of its GH44 likely is the key to its synergistic value, creating more glucose chain ends for the GH48 exoglucanases to degrade.

Using these in vitro and in vivo analyses of the glucan degradation locus in C. bescii, we show, for the first time, the nuanced roles of these large multi-domain GHs in the degradation of plant biomass substrates. While CelA has been highly studied and is a prolific cellulase, its combined action with the other GHs of the GDL is key to cellulose and plant biomass degradation. We demonstrate that the essentiality of individual C. bescii GDL GHs depends on the plant biomass substrate. Using new protein expression tools, we are also able to produce these large enzymes in their native host and demonstrate their synergistic action.

Ultimately, these findings advance how we understand and plan to deploy these enzymes responsible for the cellulolytic phenotype in Calidcellulosiruptor species.

177

Acknowledgements

This work was funded in part by the BioEnergy Science Center, a U.S. Department of

Energy Bioenergy Research Center supported by the Office of Biological and Environmental

Research in the DOE Office of Science.

178

Figure 3.1 – The Glucan Degradation Locus in C. bescii Genes comprising the Glucan Degradation Locus (GDL) (A). GH domains are shown as boxes containing the GH family number. CBM domains are shown as circles containing the CBM family number. PL domains are shown as hexagons containing the PL family number. (B) Transcriptomic and proteomic data for the transcription and expression of GDL genes in C. bescii when grown on various substrates. Proteomic samples contain only the extracellular secretome of C. bescii. Values are shown as log squared mean (LSM). Gene locus tag and annotation are shown to the left. Proteomics nd = not detected. (C) Predicted terminator sequences in the GDL of C. bescii as predicted by TransTermHP (50).

179

Figure 3.2 – Dot plot showing the repeated DNA sequence of the C. bescii GDL The sequence of the C. bescii GDL was used to construct a dot plot using EMBOSS dottup (http://www.bioinformatics.nl/cgi-bin/emboss/dottup). A sequence word length of 50 bp was used to construct the plot. Sequences of 50 bp at a position on the x-axis that match a 50 bp sequence at a position on the y-axis results in a black pixel. The diagonal line from bottom left to top right is the direct match of the sequence to itself. All other markings represent repeated DNA sequences. The repeated GH48 domains of the GDL are shaded in orange. The repeated GH5 domains of the GDL are shaded in green.

180

Figure 3.3 – GDL knockout strains of C. bescii GDL layout for the eight knockout strains generated from C. bescii strain JWCB005 (A). The location of the PslpCbHTK cassette insertion at the gene deletion location is shown. (B) SDS-PAGE gel run on secretome samples from C. bescii wildtype, parent strain JWCB005, and knockout strains RKCB120-132. Lanes are loaded with equal protein mass. The expected location of the GDL enzymes is shown on the left. (C) Glycoprotein stain of the SDS-PAGE gel from (B)

181

Figure 3.4 – Biotic Solubilization of Plant Biomass Substrates by GDL knockout strains Biotic solubilization values for Avicel crystalline cellulose (A), Cave-in-Rock Switchgrass (B), Poplar natural variant GW-9947 (C), Poplar natural variant GW-9762 (D). Biotic solubilization is calculated by subtracting the solubilization in the thermal abiotic control to reflect only the biotic contribution to solubilization. The abiotic thermal solubilization values are as follows: 2.1% for Avicel, 19.6% for switchgrass, 10.6% for poplar GW-9947, and 10.7% for poplar GW-9762. Error bars represent the standard deviation (n=3).

182

Figure 3.5 – Switchgrass solubilization by mixed cultures of GDL knockout strains Biotic solubilization values on switchgrass for the GDL knockout strains (RKCB120, RKCB121, RKCB127, and RKCB130) alone and when combined in a mixed culture inoculated with an equal proportion of two strains. Three replicates of C. bescii containing the full GDL (C. bescii wildtype and C. bescii strain JWCB005) are shown at the left for comparison. The genes contained in each strain are marked in the table with a X. Genes marked with a XX for a mixed culture represents enzymes produced by each of the two strains in the mixture. Error bars represent the standard deviation (n=3).

183

Figure 3.6 – Recombinant expression of GDL Enzymes is C. bescii Vector pJMC046 combines the replicating shuttle vector for C. bescii with the CbHTK gene, allowing for slection in C. bescii (A). The native C. bescii pATHE02 plasmid backbone is marked with a blue hashed half circle, which includes the proposed origin of replication for C. bescii. The pSC101 origin is used for replication in E. coli. The CbHTK kanamycin resistance gene is driven by the 200 bp C. bescii PS30 promoter for selection of kanamycin resistance in both E. coli and C. bescii. The cloning site is marked with a dashed box. Expression of the cloned gene is driven by the Pslp promoter and the cloning site is followed by the Calkro_0402 terminator from C. kronotskyensis. (B) Sequence of the cloning site in pJMC046. The 200 bp Pslp promoter is marked in blue. The start and stop codons surrounding the histidine tag are marked with red boxes. The Calkro_0402 terminator is marked in yellow. (C) SDS-PAGE gel on the C. bescii wildtype concentrated supernatant representing the secretome and purified Athe_1867, Athe_1859, Athe_1857, and Athe_1860 produced using pJMC046 in C. bescii. (D) glycoprotein stain of the SDS-PAGE gel from (C)

184

Figure 3.7 – Enzyme cocktails mimicking C. bescii wildtype and GDL knockout strains SDS-PAGE gel on C. bescii wildtype secretome (lane 1), and synthetic C. bescii wiltype enzyme cocktail (lane 2) comprised of Athe_1867, Athe_1859, Athe_1857, and Athe_1860 in a 25:2:10:1 mass ratio (A). SDS-PAGE gel of purified recombinant β-glucosidase Athe_0458 (lane 3). (B) Mass loading of the enzyme cocktails tested mimicking the C. bescii wildtype and various GDL knockout strains. Cells shaded in green in cocktails 120+, 120++, 130+, and 132+ represent increases of enzyme loading three time that of the wildtype cocktail. (C) Cellobiose and glucose released from Avicel by the various enzyme cocktails as measured by HPLC. (D) Calculated specific activities with and without the addition of β- glucosidase Athe_0458. (E) Cellobiose and glucose released from Avicel by the various enzyme cocktails with the addition of 1µg β-glucosidase as measured by HPLC. (F) Calculated glucan solubilization based on the 5 mg/mL loading of Avicel with (green) and without (orange) the addition of 1µg β-glucosidase. All error bars represent the standard deviation (n=3).

185

Figure 3.8 – Enzyme cocktails demonstrating the synergy between GDL GH enzymes. Domain layout of the four GDL GHs and their molar ratio in enzyme cocktails A-O (A). Sample P represents a no enzyme control. A total loading of 0.5µM enzyme was used with 5 mg/mL Avicel loading. (B) Cellobiose and glucose released from Avicel by enzyme cocktails A-O and no enzyme control P as measured by HPLC. Error bars represent the standard deviation (n=3). (C) Equation for the degree of synergy and the calculated degree of synergy for the enzyme mixtures E-O.

186

Table 3.1 - Characterized GDL Enzymes

Amino Predicted Observed Temperature pH Specific Activity Gene Locus Domains Acid MW MW Optimum Substrates Reference Optimum ◊ length (kDa) (kDa) (°C)

GH9- Avicel: 23-55 CBM3- glucans (Avicel, U/µmol, barley Athe_1867 CBM3- 1759 195 230 85 5-6.5 barley beta-glucan, (13-15) beta-glucan: CBM3- CMC); xylan 100510 U/µmol GH48

glucans (PASC, CMC, Avicel, lichenan); mannans GH5- (guar gum, 1,4-β- CBM3- Avicel: 5.3 Athe_1859 1294 142 170 85 5 mannan); (17) CBM3- U/µmol glucomannan GH44 (KGM); xylan (birchwood); xyloglucan xylan (beech wood); arabinoxylan (wheat); glucans Avicel: 3.4 (barley β-glucan, U/µmol, GH10- lichenin, CMC, beechwood CBM3- PASC, Avicel, filter Athe_1857* 1478 165 200 85-90 6.5 xylan: 39000 (19) CBM3- paper); U/µmol, barley GH48 glucomannan beta-glucan (konjac flour); 3400 U/µmol arabinans (arabic gum, debranched arabinan); pectin

GH5- N-terminal GH5: CBM3- Mannans; C- Athe_1866^ CBM3- 1414 156 195 N/A 5.5-6.5 N/A (17,18) terminal GH5: CBM3- glucans (PASC), GH5

glucans (Avicel, filter paper, PASC, lichenin); mannans GH9- (locust bean gum, Avicel: 10.15 CBM3- guar gum); U/µmol, locust Athe_1865 CBM3- 1369 151 180 85-90 5.5-6.5 (16,18) glucomannan bean gum 1691 CBM3- (konjac); U/mg GH5 arabinoxylan (wheat); xylan (oat spelt) glucans (barley β- glucan, CMC, Avicel, lichenan); mannans (ivory nut, GH74- 1,4-β-mannan); CBM3- Avicel: 0.7 Athe_1860 1904 209 240 85 5.5-6 glucomannan (locust this study CBM3- U/µmol bean); xylans (birch GH48 wood, beech wood, oat spelt, arabinoxylan); xyloglucan * GH10 only was characterized - GH48 domain is 100% identical to Athe_1867 ^ Athe_1866 N-terminal GH5 is 100% identical to Athe_1859 GH5 ◊ enzyme activity U = µmol reducing sugar / min

187

Table 3.2 - Specific Activity of Athe_1860 U = µmol reducing sugar / min Error represents the standard deviation (n=4)

Substrate Linkage Specific Activity (U/µmol)

barley β-glucan β-1,3/4-Glucan 15.6 ± 2.2 tamarind xyloglucan β-1,4-Glucan/α-1,6-Xylose sidechain 8.8 ± 0.6 wheat arabinoxylan β-1,4-Xylan/α-1,2/3-Arabinose sidechain 8.5 ± 0.6 1,4-β-D-mannan β-1,4-Mannan 8.3 ± 0.5 lichenan β-1,3/4-Glucan 8.3 ± 0.3 locust bean glucomannan β-1,4-Glucan/Mannan 7.5 ± 0.5 birch wood xylan β-1,4-Xylan 6.7 ± 0.4 beech wood xylan β-1,4-Xylan 6.3 ± 0.4 oat spelt xylan β-1,4-Xylan 5.0 ± 0.3 ivory nut mannan β-1,4-Mannan 3.8 ± 0.2 CMC β-1,4-Glucan 1.5 ± 0.1 Avicel β-1,4-Glucan 0.70 ± 0.02

188

References

1. Blumer-Schuette, S. E., Brown, S. D., Sander, K. B., Bayer, E. A., Kataeva, I.,

Zurawski, J. V., Conway, J. M., Adams, M. W. W., and Kelly, R. M. (2014)

Thermophilic lignocellulose deconstruction. FeMS Microbiol. Rev. 38, 393-448

2. Yang, S.-J., Kataeva, I., Hamilton-Brehm, S. D., Engle, N. L., Tschaplinski, T. J.,

Doeppke, C., Davis, M., Westpheling, J., and Adams, M. W. W. (2009) Efficient

Degradation of Lignocellulosic Plant Biomass, without Pretreatment, by the

Thermophilic Anaerobe “Anaerocellum thermophilum” DSM 6725. Appl. Environ.

Microbiol. 75, 4762-4769

3. Zurawski, J. V., Conway, J. M., Lee, L. L., Simpson, H., Izquierdo, J. A., Blumer-

Schuette, S., Nookaew, I., Adams, M. W. W., and Kelly, R. M. (2015) Comparative

analysis of extremely thermophilic Caldicellulosiruptor species reveals common and

differentiating cellular strategies for plant biomass utilization. Appl. Environ.

Microbiol. 81, 7159-7170

4. Conway, J. M., Zurawski, J. V., Lee, L. L., Blumer-Schuette, S. E., and Kelly, R. M.

(2015) Lignocellulosic biomass deconstruction by the extremely thermophilic genus

Caldicellulosiruptor. in Thermophilic Microorganisms (Li, F. ed.), Caister Academic

Press, Norfolk, UK. pp 91-120

5. Blumer-Schuette, S. E., Giannone, R. J., Zurawski, J. V., Ozdemir, I., Ma, Q., Yin,

Y., Xu, Y., Kataeva, I., Poole, F. L., 2nd, Adams, M. W., Hamilton-Brehm, S. D.,

189

Elkins, J. G., Larimer, F. W., Land, M. L., Hauser, L. J., Cottingham, R. W., Hettich,

R. L., and Kelly, R. M. (2012) Caldicellulosiruptor core and pangenomes reveal

determinants for noncellulosomal thermophilic deconstruction of plant biomass. J.

Bacteriol. 194, 4015-4028

6. Blumer-Schuette, S. E., Lewis, D. L., and Kelly, R. M. (2010) Phylogenetic,

microbiological, and glycoside hydrolase diversities within the extremely

thermophilic, plant biomass-degrading genus Caldicellulosiruptor. Appl. Environ.

Microbiol. 76, 8084-8092

7. Bergquist, P. L., Gibbs, M. D., Morris, D. D., Te'o, V. S. J., Saul, D. J., and Morgan,

H. W. (1999) Molecular diversity of thermophilic cellulolytic and hemicellulolytic

bacteria. FEMS Microbiol. Ecol. 28, 99-110

8. Gibbs, M. D., Reeves, R. A., Farrington, G. K., Anderson, P., Williams, D. P., and

Bergquist, P. L. (2000) Multidomain and multifunctional glycosyl hydrolases from

the extreme thermophile Caldicellulosiruptor isolate Tok7B.1. Curr. Microbiol. 40,

333-340

9. Kataeva, I. A., Yang, S. J., Dam, P., Poole, F. L., 2nd, Yin, Y., Zhou, F., Chou, W.

C., Xu, Y., Goodwin, L., Sims, D. R., Detter, J. C., Hauser, L. J., Westpheling, J., and

Adams, M. W. (2009) Genome sequence of the anaerobic, thermophilic, and

cellulolytic bacterium "Anaerocellum thermophilum" DSM 6725. J. Bacteriol. 191,

3760-3761

190

10. Williams-Rhaesa, A. M., Poole, F. L., 2nd, Dinsmore, J., Lipscomb, G. L.,

Rubinstein, G. M., Scott, I. M., Conway, J. M., Lee, L. L., Khatibi, P. A., Kelly, R.

M., and Adams, M. W. W. (2017) Genome Stability in Engineered Strains of the

Extremely Thermophilic, Lignocellulose-Degrading Bacterium Caldicellulosiruptor

bescii. Appl. Environ. Microbiol. Submitted February 21, 2017

11. Lochner, A., Giannone, R. J., Keller, M., Antranikian, G., Graham, D. E., and

Hettich, R. L. (2011) Label-free quantitative proteomics for the extremely

thermophilic bacterium Caldicellulosiruptor obsidiansis reveal distinct abundance

patterns upon growth on cellobiose, crystalline cellulose, and switchgrass. J.

Proteome Res. 10, 5302-5314

12. Lochner, A., Giannone, R. J., Rodriguez, M., Jr., Shah, M. B., Mielenz, J. R., Keller,

M., Antranikian, G., Graham, D. E., and Hettich, R. L. (2011) Use of label-free

quantitative proteomics to distinguish the secreted cellulolytic systems of

Caldicellulosiruptor bescii and Caldicellulosiruptor obsidiansis. Appl. Environ.

Microbiol. 77, 4042-4054

13. Zverlov, V., Mahr, S., Riedel, K., and Bronnenmeier, K. (1998) Properties and gene

structure of a bifunctional cellulolytic enzyme (CelA) from the extreme thermophile

'Anaerocellum thermophilum' with separate glycosyl hydrolase family 9 and 48

catalytic domains. Microbiology 144 ( Pt 2), 457-465

191

14. Brunecky, R., Alahuhta, M., Xu, Q., Donohoe, B. S., Crowley, M. F., Kataeva, I. A.,

Yang, S.-J., Resch, M. G., Adams, M. W. W., Lunin, V. V., Himmel, M. E., and

Bomble, Y. J. (2013) Revealing Nature’s Cellulase Diversity: The Digestion

Mechanism of Caldicellulosiruptor bescii CelA. Science 342, 1513-1516

15. Yi, Z., Su, X., Revindran, V., Mackie, R. I., and Cann, I. (2013) Molecular and

biochemical analyses of CbCel9A/Cel48A, a highly secreted multi-modular cellulase

by Caldicellulosiruptor bescii during growth on crystalline cellulose. PLoS One 8,

e84172

16. Su, X., Mackie, R. I., and Cann, I. K. (2012) Biochemical and mutational analyses of

a multidomain cellulase/mannanase from Caldicellulosiruptor bescii. Appl. Environ.

Microbiol. 78, 2230-2240

17. Ye, L., Su, X., Schmitz, G. E., Moon, Y. H., Zhang, J., Mackie, R. I., and Cann, I. K.

O. (2012) Molecular and biochemical analyses of the GH44 Module of

CbMan5B/Cel44A, a bifunctional enzyme from the hyperthermophilic bacterium

Caldicellulosiruptor bescii. Appl. Environ. Microbiol. 78, 7048-7059

18. Wang, R., Gong, L., Xue, X., Qin, X., Ma, R., Luo, H., Zhang, Y., Yao, B., and Su,

X. (2016) Identification of the C-Terminal GH5 Domain from CbCel9B/Man5A as

the First Glycoside Hydrolase with Thermal Activation Property from a Multimodular

Bifunctional Enzyme. PLoS One 11, e0156802

192

19. Xue, X., Wang, R., Tu, T., Shi, P., Ma, R., Luo, H., Yao, B., and Su, X. (2015) The

N-Terminal GH10 Domain of a Multimodular Protein from Caldicellulosiruptor

bescii Is a Versatile Xylanase/beta-Glucanase That Can Degrade Crystalline

Cellulose. Appl Environ Microbiol 81, 3823-3833

20. Chung, D., Young, J., Bomble, Y. J., Vander Wall, T. A., Groom, J., Himmel, M. E.,

and Westpheling, J. (2015) Homologous expression of the Caldicellulosiruptor bescii

CelA reveals that the extracellular protein is glycosylated. PLoS One 10, e0119508

21. Lipscomb, G. L., Conway, J. M., Blumer-Schuette, S. E., Kelly, R. M., and Adams,

M. W. W. (2016) Highly Thermostable Kanamycin Resistance Marker Expands the

Toolkit for Genetic Manipulation of Caldicellulosiruptor bescii. Appl. Environ.

Microbiol. 82, 4421-4428

22. Chung, D., Farkas, J., and Westpheling, J. (2013) Overcoming restriction as a barrier

to DNA transformation in Caldicellulosiruptor species results in efficient marker

replacement. Biotechnol Biofuels 6, 82

23. Cha, M., Chung, D., Elkins, J. G., Guss, A. M., and Westpheling, J. (2013) Metabolic

engineering of Caldicellulosiruptor bescii yields increased hydrogen production from

lignocellulosic biomass. Biotechnol. Biofuels 6, 85

24. Chung, D., Cha, M., Farkas, J., and Westpheling, J. (2013) Construction of a stable

replicating shuttle vector for Caldicellulosiruptor species: use for extending genetic

methodologies to other members of this genus. PLoS One 8, e62881

193

25. Geslin, C., Le Romancer, M., Erauso, G., Gaillard, M., Perrot, G., and Prieur, D.

(2003) PAV1, the first virus-like particle isolated from a hyperthermophilic

euryarchaeote, "Pyrococcus abyssi". J. Bacteriol. 185, 3888-3894

26. Gibson, D. G. (2011) Enzymatic assembly of overlapping DNA fragments. Methods

Enzymol. 498, 349-361

27. Evans, L. M., Slavov, G. T., Rodgers-Melnick, E., Martin, J., Ranjan, P., Muchero,

W., Brunner, A. M., Schackwitz, W., Gunter, L., Chen, J. G., Tuskan, G. A., and

DiFazio, S. P. (2014) Population genomics of Populus trichocarpa identifies

signatures of selection and adaptive trait associations. Nat. Genet. 46, 1089-1096

28. Muchero, W., Guo, J., DiFazio, S. P., Chen, J. G., Ranjan, P., Slavov, G. T., Gunter,

L. E., Jawdy, S., Bryan, A. C., Sykes, R., Ziebell, A., Klapste, J., Porth, I., Skyba, O.,

Unda, F., El-Kassaby, Y. A., Douglas, C. J., Mansfield, S. D., Martin, J., Schackwitz,

W., Evans, L. M., Czarnecki, O., and Tuskan, G. A. (2015) High-resolution genetic

mapping of allelic variants associated with cell wall chemistry in Populus. BMC

Genomics 16, 24

29. Studier, F. W. (2005) Protein production by auto-induction in high density shaking

cultures. Protein Expr. Purif. 41, 207-234

30. Farkas, J., Chung, D., Cha, M., Copeland, J., Grayeski, P., and Westpheling, J. (2013)

Improved growth media and culture techniques for genetic analysis and assessment of

194

biomass utilization by Caldicellulosiruptor bescii. J. Ind. Microbiol. Biotechnol. 40,

41-49

31. Lipscomb, G. L., Stirrett, K., Schut, G. J., Yang, F., Jenney, F. E., Jr., Scott, R. A.,

Adams, M. W., and Westpheling, J. (2011) Natural competence in the

hyperthermophilic archaeon Pyrococcus furiosus facilitates genetic manipulation:

construction of markerless deletions of genes encoding the two cytoplasmic

hydrogenases. Appl. Environ. Microbiol. 77, 2232-2238

32. Nordberg, H., Cantor, M., Dusheyko, S., Hua, S., Poliakov, A., Shabalov, I.,

Smirnova, T., Grigoriev, I. V., and Dubchak, I. (2014) The genome portal of the

Department of Energy Joint Genome Institute: 2014 updates. Nucleic Acids Res. 42,

D26-31

33. Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., Peluso, P., Rank, D.,

Baybayan, P., Bettman, B., Bibillo, A., Bjornson, K., Chaudhuri, B., Christians, F.,

Cicero, R., Clark, S., Dalal, R., Dewinter, A., Dixon, J., Foquet, M., Gaertner, A.,

Hardenbol, P., Heiner, C., Hester, K., Holden, D., Kearns, G., Kong, X., Kuse, R.,

Lacroix, Y., Lin, S., Lundquist, P., Ma, C., Marks, P., Maxham, M., Murphy, D.,

Park, I., Pham, T., Phillips, M., Roy, J., Sebra, R., Shen, G., Sorenson, J., Tomaney,

A., Travers, K., Trulson, M., Vieceli, J., Wegener, J., Wu, D., Yang, A., Zaccarin, D.,

Zhao, P., Zhong, F., Korlach, J., and Turner, S. (2009) Real-time DNA sequencing

from single polymerase molecules. Science 323, 133-138

195

34. Chung, D., Farkas, J., Huddleston, J. R., Olivar, E., and Westpheling, J. (2012)

Methylation by a unique alpha-class N4-cytosine methyltransferase is required for

DNA transformation of Caldicellulosiruptor bescii DSM6725. PLoS One 7, e43844

35. Miller, G. L. (1959) Use of dinitrosalicylic acid reagent for determination of reducing

sugar. Anal. Chem. 31, 426-428

36. Ghose, T. K. (1987) Measurement of Cellulase Activities. Pure Appl. Chem. 59, 257-

268

37. Xiao, Z., Storms, R., and Tsang, A. (2005) Microplate-based carboxymethylcellulose

assay for endoglucanase activity. Anal. Biochem. 342, 176-178

38. Blumer-Schuette, S. E., Alahuhta, M., Conway, J. M., Lee, L. L., Zurawski, J. V.,

Giannone, R. J., Hettich, R. L., Lunin, V. V., Himmel, M. E., and Kelly, R. M. (2015)

Discrete and structurally unique proteins (tāpirins) mediate attachment of extremely

thermophilic Caldicellulosiruptor species to cellulose. J. Biol. Chem. 290, 10645-

10656

39. Alahuhta, M., Brunecky, R., Chandrayan, P., Kataeva, I., Adams, M. W., Himmel, M.

E., and Lunin, V. V. (2013) The structure and mode of action of Caldicellulosiruptor

bescii family 3 pectate lyase in biomass deconstruction. Acta. Crystallogr. D Biol.

Crystallogr. 69, 534-539

196

40. Alahuhta, M., Taylor, L. E., 2nd, Brunecky, R., Sammond, D. W., Michener, W.,

Adams, M. W., Himmel, M. E., Bomble, Y. J., and Lunin, V. (2015) The catalytic

mechanism and unique low pH optimum of Caldicellulosiruptor bescii family 3

pectate lyase. Acta. Crystallogr. D Biol. Crystallogr. 71, 1946-1954

41. Chung, D., Pattathil, S., Biswal, A. K., Hahn, M. G., Mohnen, D., and Westpheling, J.

(2014) Deletion of a gene cluster encoding pectin degrading enzymes in

Caldicellulosiruptor bescii reveals an important role for pectin in plant biomass

recalcitrance. Biotechnol. Biofuels 7

42. Lommel, M., and Strahl, S. (2009) Protein O-mannosylation: conserved from bacteria

to humans. Glycobiology 19, 816-828

43. Iwashkiw, J. A., Vozza, N. F., Kinsella, R. L., and Feldman, M. F. (2013) Pour some

sugar on it: the expanding world of bacterial protein O-linked glycosylation. Mol.

Microbiol. 89, 14-28

44. Dobos, K. M., Khoo, K. H., Swiderek, K. M., Brennan, P. J., and Belisle, J. T. (1996)

Definition of the full extent of glycosylation of the 45-kilodalton glycoprotein of

Mycobacterium tuberculosis. J Bacteriol 178, 2498-2506

45. Michell, S. L., Whelan, A. O., Wheeler, P. R., Panico, M., Easton, R. L., Etienne, A.

T., Haslam, S. M., Dell, A., Morris, H. R., Reason, A. J., Herrmann, J. L., Young, D.

B., and Hewinson, R. G. (2003) The MPB83 antigen from Mycobacterium bovis

197

contains O-linked mannose and (1-->3)-mannobiose moieties. J Biol Chem 278,

16423-16432

46. Wiemels, R. E., Cech, S. M., Meyer, N. M., Burke, C. A., Weiss, A., Parks, A. R.,

Shaw, L. N., and Carroll, R. K. (2017) An Intracellular Peptidyl-Prolyl cis/trans

Isomerase Is Required for Folding and Activity of the Staphylococcus aureus

Secreted Virulence Factor Nuclease. J Bacteriol 199

47. Veeraraghavan, S., Nall, B. T., and Fink, A. L. (1997) Effect of Prolyl Isomerase on

the Folding Reactions of Staphylococcal Nuclease. Biochemistry 36, 15134-15139

48. Poudel, S., Giannone, R. J., Basen, M., Poole, F. L., 2nd, Kelly, R. M., Adams, M. W.

W., and Hettich, R. L. (2017) Characterizing the range and specificity of substrate-

specific extracellular enzymes in Caldicellulosiruptor bescii. in Society for Industrial

Microbiology and Biotechnology Annual Meeting, Denver, CO

49. Abraham, P. E., Yin, H., Borland, A. M., Weighill, D., Lim, S. D., De Paoli, H. C.,

Engle, N., Jones, P. C., Agh, R., Weston, D. J., Wullschleger, S. D., Tschaplinski, T.,

Jacobson, D., Cushman, J. C., Hettich, R. L., Tuskan, G. A., and Yang, X. (2016)

Transcript, protein and metabolite temporal dynamics in the CAM plant Agave. Nat.

Plants 2, 16178

50. Kingsford, C. L., Ayanbule, K., and Salzberg, S. L. (2007) Rapid, accurate,

computational discovery of Rho-independent transcription terminators illuminates

their relationship to DNA uptake. Genome Biol. 8, R22

198

51. Young, J., Chung, D., Bomble, Y. J., Himmel, M. E., and Westpheling, J. (2014)

Deletion of Caldicellulosiruptor bescii CelA reveals its crucial role in the

deconstruction of lignocellulosic biomass. Biotechnol. Biofuels 7, 142

52. Yokoyama, H., Yamashita, T., Morioka, R., and Ohmori, H. (2014) Extracellular

Secretion of Noncatalytic Plant Cell Wall-Binding Proteins by the Cellulolytic

Thermophile Caldicellulosiruptor bescii. J. Bacteriol. 196, 3784-3792

53. Zurawski, J. V. (2016) Comparative analysis of extremely thermophilic

Caldicellulosiruptor species reveals common and differentiating cellular strategies for

plant biomass utilization. in Microbiological and Engineering Studies of Extremely

Thermophilic Caldicellulosiruptor Species for Lignocellulose Deconstruction and

Conversion, North Carolina State University, Robert M. Kelly, advisor. pp 40-97

54. Bhagia, S., Muchero, W., Kumar, R., Tuskan, G. A., and Wyman, C. E. (2016)

Natural genetic variability reduces recalcitrance in poplar. Biotechnol. Biofuels 9, 106

55. Hong, M.-R., Kim, Y.-S., Park, C.-S., Lee, J.-K., Kim, Y.-S., and Oh, D.-K. (2009)

Characterization of a recombinant β-glucosidase from the thermophilic bacterium

Caldicellulosiruptor saccharolyticus. J. Biosci. Bioeng. 108, 36-40

56. Belaich, J. P., Tardif, C., Belaich, A., and Gaudin, C. (1997) The cellulolytic system

of Clostridium cellulolyticum. J Biotechnol 57, 3-14

199

57. Desvaux, M. (2005) Clostridium cellulolyticum: model organism of mesophilic

cellulolytic . FEMS Microbiol Rev 29, 741-764

58. Ding, S.-Y., Adney, W. S., Vinzant, T. B., Decker, S. R., Baker, J. O., Thomas, S. R.,

and Himmel, M. E. (2003) Glycoside Hydrolase Gene Cluster of Acidothermus

cellulolyticus. in Applications of Enzymes to Lignocellulosics, American Chemical

Society. pp 332-360

59. Guglielmi, G., and Beguin, P. (1998) Cellulase and hemicellulase genes of

Clostridium thermocellum from five independent collections contain few overlaps

and are widely scattered across the chromosome. FEMS Microbiol Lett 161, 209-215

60. Bayer, E. A., Shoham, Y., and Lamed, R. (2013) Lignocellulose-Decomposing

Bacteria and Their Enzyme Systems. in The Prokaryotes: Prokaryotic Physiology

and Biochemistry (Rosenberg, E., DeLong, E. F., Lory, S., Stackebrandt, E., and

Thompson, F. eds.), Springer Berlin Heidelberg, Berlin, Heidelberg. pp 215-266

61. VanFossen, A. L., Ozdemir, I., Zelin, S. L., and Kelly, R. M. (2011) Glycoside

hydrolase inventory drives plant polysaccharide deconstruction by the extremely

thermophilic bacterium Caldicellulosiruptor saccharolyticus. Biotechnol. Bioeng.

108, 1559-1569

62. Lee, L. L., Izquierdo, J. A., Blumer-Schuette, S. E., Zurawski, J. V., Conway, J. M.,

Cottingham, R. W., Huntemann, M., Copeland, A., Chen, I. M., Kyrpides, N.,

Markowitz, V., Palaniappan, K., Ivanova, N., Mikhailova, N., Ovchinnikova, G.,

200

Andersen, E., Pati, A., Stamatis, D., Reddy, T. B., Shapiro, N., Nordberg, H. P.,

Cantor, M. N., Hua, S. X., Woyke, T., and Kelly, R. M. (2015) Complete Genome

Sequences of Caldicellulosiruptor sp. Strain Rt8.B8, Caldicellulosiruptor sp. Strain

Wai35.B1, and "Thermoanaerobacter cellulolyticus". Genome Announc. 3

63. Upreti, R. K., Kumar, M., and Shankar, V. (2003) Bacterial glycoproteins: functions,

biosynthesis and applications. Proteomics 3, 363-379

64. Payne, C. M., Resch, M. G., Chen, L., Crowley, M. F., Himmel, M. E., Taylor, L. E.,

2nd, Sandgren, M., Stahlberg, J., Stals, I., Tan, Z., and Beckham, G. T. (2013)

Glycosylated linkers in multimodular lignocellulose-degrading enzymes dynamically

bind to cellulose. Proc. Natl. Acad. Sci. U.S.A. 110, 14646-14651

65. Kanafusa-Shinkai, S., Wakayama, J. i., Tsukamoto, K., Hayashi, N., Miyazaki, Y.,

Ohmori, H., Tajima, K., and Yokoyama, H. (2013) Degradation of microcrystalline

cellulose and non-pretreated plant biomass by a cell-free extracellular

cellulase/hemicellulase system from the extreme thermophilic bacterium

Caldicellulosiruptor bescii. J. Biosci. Bioeng. 115, 64-70

66. Girio, F. M., Fonseca, C., Carvalheiro, F., Duarte, L. C., Marques, S., and Bogel-

Lukasik, R. (2010) Hemicelluloses for fuel ethanol: A review. Bioresour. Technol.

101, 4775-4800

67. Conway, J. M., Pierce, W. S., Le, J. H., Harper, G. W., Wright, J. H., Tucker, A. L.,

Zurawski, J. V., Lee, L. L., Blumer-Schuette, S. E., and Kelly, R. M. (2016)

201

Multidomain, Surface Layer-associated Glycoside Hydrolases Contribute to Plant

Polysaccharide Degradation by Caldicellulosiruptor Species. J. Biol. Chem. 291,

6732-6747

68. Chhabra, S. R., and Kelly, R. M. (2002) Biochemical characterization of Thermotoga

maritima endoglucanase Cel74 with and without a carbohydrate binding module

(CBM). FEBS Lett. 531, 375-380

202

CHAPTER 4

Structural and Biochemical Characterization of Glycoside Hydrolase Catalytic Domains and Carbohydrate Binding Modules from Caldicellulosiruptor Biomass Degradation Enzymes

203

Abstract

Caldicellulosiruptor species produce very large, multi-domain, extracellular enzymes that contain multiple catalytic glycoside hydrolase and carbohydrate binding domains within a single polypeptide. Using these multi-domain, multi-functional enzymes

Caldicellulosiruptor species are able to deconstruct plant biomass polysaccharides, including crystalline cellulose. This strategy for plant biomass degradation is distinct from bacteria that produce cellulosomes or free enzyme producing fungi. To examine the relationship between the structure and function of individual units from these proteins, the constituent domains from three Caldicellulosiruptor multi-domain enzymes were characterized biochemically in conjunction with their three-dimensional crystal structures. The protein domains investigated here include: a GH16 laminarinase (Calkro_0111), a GH5 β-glucanase (Csac_0678), and a

GH10/GH12/GH48 xylanase/β-glucanase (Wai35_2053). The Calkro_0111 GH16 domain was not active in the absence of an adjacent fascin-like domain. The structures for both of these domains were solved and significantly higher activity on the branched substrate laminarin than unbranched β-glucans suggests that the fascin domain of this protein functions to orient the catalytic GH16. A CBM28 adjacent to a GH5 in Csac_0678 (a highly conserved

SLH-domain protein among Caldicellulosiruptor species, Athe_0594 in Caldicellulosiruptor bescii) appears to play a similar role in enhancing the activity of the glycoside hydrolase domain. Additionally, the crystal structure of Csac_0678 was used to determine the catalytic acid/base (Glu189) and nucleophilic (Glu286) residues, in the GH5 of Athe_0594. The crystal structures of 3 of the 4 domains in Wai35_2053 were also determined. These domains, like those solved from Calkro_0111 and Csac_0678, are individually folded and

204

catalytic entities strung along a single polypeptide chain. Unlike some other multi-domain enzymes from Caldicellulosiruptor species, the activity of the domains of Wai35_2053 do not appear to act in a synergistic way, with each domain having its highest activity on a substrate with slightly different linkages and the overall activity of Wai35_2053 being similar to the sum of the activity of its parts. Taken together, by elucidating the relationship between the structure and function of the individual domains of the multi-domain enzymes of

Caldicellulosiruptor species, we not only understand how these domains and enzymes work on a fundamental level, but can begin to imagine leveraging this understanding to rationally design enzymes for improved synergy and substrate degradation.

Introduction

Extremely thermophilic bacteria from the genus Caldicellulosiruptor are particularly adept at degrading plant biomass polysaccharides and are the highest temperature cellulose degrading microorganisms currently identified. The large extracellular carbohydrate active enzymes (CAZymes) that perform this degradation are almost exclusively multi-domain, with multiple catalytic glycoside hydrolase (GH) and/or polysaccharide lyase (PL) domains and several carbohydrate binding module (CBM) domains in the same protein (1). The presence of multiple catalytic domains within the same protein offers the opportunity for synergy between modules with different biochemical functions, such as endo- and exo- acting, hemicellulase or cellulase. While CAZymes are produced by all plant biomass degrading microorganisms, the prevalence of multi-domain, multi-functional CAZymes in the genomes of Caldicellulosiruptor species is particularly unique (2).

205

Many of these extracellular multi-domain enzymes from Caldicellulosiruptor species have been studied biochemically such that a better understanding of how these enzymes function is emerging. Figure 4.1 shows selected extracellular enzymes that have been biochemically characterized from C. bescii, C. kronotskyensis, C. saccharolyticus, three highly cellulolytic species that are among the most highly studied species of the genus (3-6).

The extracellular enzymes of Caldicellulosiruptor species can be divided into two broad groups shown in Figure 4.1: those enzymes that are free from the cell, and those that contain surface layer homology (SLH) domains that associate proteins to the bacterial cell surface.

There are 8 groups of catalytic GH- or PL-containing S-layer associated enzymes through all of the Caldicellulosiruptor species, of which three (Csac_0678, Calkro_0111 and

Calkro_402) have been biochemically characterized. These proteins have been identified on the cell surface of their native species, attached within the S-layer by their SLH domains

(5,7). This location is believed to play a role in the attachment of Caldicellulosiruptor species to their plant biomass substrates. It may also be advantageous to the cell to degrade plant biomass and release sugars for uptake in close proximity to the cell. Calkro_0111 is the largest glycoside hydrolase-containing enzyme (2,435 amino acids) produced by any

Caldicellulosiruptor species, and is completely unique to C. kronotskyensis. It contains two catalytic domains (GH16 and GH55) as well as a number of CBM and accessory domains

(Fascin-like and Fibronectin type III) that appear to play a role in the function of the catalytic domains (5). Unlike Calkro_0111, Csac_0678 and its homologs are the smallest GH containing S-layer proteins produced by Caldicellulosiruptor species, and the GH5 domain of this protein appears to be conserved in all Caldicellulosiruptor species. The activity and

206

thermostability of the Csac_0678 GH5 appear to be enhanced by the CBM28 domain adjacent to it (8).

While S-layer enzymes are an important part of how Caldicellulosiruptor species degrade plant biomass, the majority of Caldicellulosiruptor extracellular enzymes are free from the cell. Several enzymes in the secretome have been characterized (see Figure 4.1).

Athe_1867 (CelA) is one of the most effective cellulase enzymes known (9). Athe_1867 is found in the glucan degradation locus (GDL), a locus of extracellular enzymes that contain

CBM3 domains, in all cellulolytic Caldicellulosiruptor species (4). From the GDL,

Athe_1857, Athe_1857, and Csac_1078 have all been characterized previously (6,10,11).

Downstream of the GDL in many Caldicellulosiruptor species is a locus of pectin-degrading enzymes; PL3 containing Athe_1854 has been characterized in detail from this locus (12,13).

Expressing these large Caldicellulosiruptor enzymes recombinantly to facilitate characterization has been problematic until recently, given the recent advancement of genetic tools for C. bescii (14,15). Thus, many of the enzymes characterized from

Caldicellulosiruptor were examined as truncation mutants of their constituent domains for characterization. Often the constituent domains, when taken apart, do not have the same activity as the full protein, as was the case for Csac_1078 (CelB) (6). This reveals the importance of the multi-domain structure of Caldicellulosiruptor enzymes for biomass degradation.

Biochemical characterization is important for understanding what substrates specific domains act on and which non-catalytic domains may play a role in this activity. However, this biochemical data falls short of elucidating the physical way that the domains are

207

arranged and how catalytic and binding domains function through their interactions with substrates, other domains, or other proteins. Information of this nature is best determined through the identification of protein structures. Because obtaining protein crystals is required for solving a protein structure by X-ray diffraction, the multi-domain Calidcellulosiruptor enzymes, with their heterogeneous domains and flexible disordered inter-domain linkers, are ill-suited for this technique in their native state. Thus, these proteins must be broken up into individual domains to produce crystals and obtain their structural information. Only six crystal structures from Caldicellulosiruptor CAZymes have been deposited to date in the protein data bank (www.rcsb.org) (16). These include: C. bescii Athe_1854 pectate lyase

PL3 (PDB ID: 3T9G) (13,17), C. sp. F32 β-1,3-1,4-glucanase GH5 (PDB ID: 4X0V), C. bescii Athe_0185 xylanase GH10 (PDB ID: 4L4O) (18), C. bescii Athe_1867 endoglucanase

GH9 (PDB ID: 4DOD) and cellobiohydrolase GH48 (PDB ID: 4EL8) (9), C. strain Rt8.B4 mannanase Man26 CBM27 (PDB ID: 1PMJ) (19).

Here, we present the crystal structures for seven CAZy domains from three enzymes of three different Caldicellulosiruptor species. In addition, we have characterized these domains biochemically to elucidate the relationship between the structure of the multi- domain proteins and their function.

Experimental Procedures

Bacterial Strains, Plasmids and Reagents

NEB 5-alpha E. coli (New England Biolabs) was used for the construction of vectors used in this study. Proteins for expression in E. coli were cloned in pET46 using Gibson

208

Assembly (20) using Gibson Assembly Mastermix (New England Biolabs). The sequences of some vectors were modified after cloning to add or remove a small number of codons for crystallography constructs or to introduce mutations in the active site of enzymes using the

NEB Q5 site directed mutagenesis kit (New England Biolabs). Protein expression was performed from Rosetta 2 (DE3) E. coli (Novagen). C. bescii was obtained from the Leibniz

Institute DSMZ-German Collection of Microorganisms and Cell Cultures. Replicating shuttle vector pDCW89 (15) was obtained from Dr. Janet Westpheling (University of

Georgia, Athens, Georgia) and modified by adding a cloning site and kanamycin resistance gene to produce vector pJMC046 for protein production in C. bescii. Caldicellulosiruptor genomic DNA (gDNA) for cloning was extracted, as described previously (21).

ZymoResearch plasmid miniprep classic and ZymoPURE midiprep kits (Zymoresearch) were used for isolating plasmid DNA, and sequences were confirmed by Sanger sequencing

(Genewiz). Carbohydrates used in this study are as follows: birch wood xylan (Sigma

Aldrich), oat spelt xylan (Sigma Aldrich), beech wood xylan (Sigma Aldrich), CM- pachyman (Megazyme), CM-curdlan (Megazyme), low viscosity barley β-glucan

(Megazyme), lichenan (Megazyme), laminarin (Sigma Aldrich), locust bean gum glucomannan (Sigma Aldrich), carboxymethyl cellulose (CMC) (Sigma Aldrich).

Media and Culture Conditions

E. coli cultures were grown in either liquid Luria-Bertani (LB) medium (5 g/L yeast extract, 10 g/L tryptone, 10 g/L NaCl) or on plates of LB medium with 1.5% (w/v) agar. LB media was supplemented, as appropriate, with 50 μg/mL kanamycin (Fisher Scientific), 50

209

μg/mL carbenicillin (Fisher Scientific), and/or 34 μg/mL chloramphenicol (Sigma Aldrich).

Protein expression in E. coli was performed in ZYM-5052 auto-induction medium, as described previously (22).

C. bescii strains were grown anaerobically with 20% CO2 / 80% N2 headspace at

70°C without agitation in a modified version of DSM516 medium, designated CG516.

CG516 media contains 0.33 g/L NH4Cl, 0.33 g/L KCl, 0.33 g/L MgCl2 ● 6H2O, 0.14 g/L

CaCl2 ● 2H2O, 0.16 mM sodium tungstate, 1mL/L trace element solution SL-10, 1mL/L vitamin solution, 0.25 mg/L resazurin, 1 g/L L-Cysteine-HCl ● H20, 1 g/L sodium bicarbonate, 1mM potassium phosphate, 5 g/L glucose, and 0.5 g/L yeast extract. The vitamin solution for this media contains the following: 20 mg/L biotin, 20 mg/L folic acid,

100 mg/L Pyridoxine-HCl, 50 mg/L Thiamine-HCl ● 2H2O, 50 mg/L riboflavin, 50 mg/L nicotinic acid, 50 mg/L D-Ca-pantothenate, 50 mg/L Vitamin B12, 50 mg/L p-Aminobenzoic acid, and 50 mg/L Lipoic acid. To grow C. bescii competent cells, LOD medium (23) was used containing 5 g/L glucose and 1 x 19 amino acid solution (24). C. bescii was plated by embedding in CG516 medium with 1.5% agar and grown at 65°C with 2% hydrogen/98% nitrogen headspace in anaerobic conditions. To maintain the replicating vector for protein expression in C. bescii, media was supplemented with 50 μg/mL kanamycin (Fisher

Scientific).

Recombinant Protein Production in E. coli

Proteins were expressed in 1L ZYM-5052 auto-induction media (22) and incubated at

37°C with agitation at 250 rpm. Cells were harvested after 20 hours by centrifugation at 6000

210

x g for 10 minutes. The resulting cell pellets were frozen at -20°C prior to protein purification.

Purification of Proteins for Biochemical Characterization

To re-suspend the cell pellet, 5 mL / g wet cell pellet in 50 mM sodium phosphate, pH 7.4, 500 mM NaCl, 20 mM imidazole containing 1 mg / mL lysozyme (ThermoFisher

Scientific) was used. Cells were lysed using a French press at 16,000 psi cell pressure. Heat treated of the cell lysate at 65°C for 20 minutes was used to remove heat labile E. coli proteins. The heat-treated samples were then centrifuged at 18,000 x g for 30 minutes and

0.22 µm filtered. The cell extract was purified using 1 or 5 mL HisTrap HP nickel-Sepharose

(GE Healthcare) columns, operated according to the manufacturer’s instructions. All FPLC steps were performed using a Biologic DuoFlow FPLC (Bio-Rad). Proteins purified by size exclusion were purified using a HiPrep 16/600 Sephacryl S-200 HR column (GE Healthcare) in 50 mM sodium acetate pH5.5, 150 mM NaCl, 10 mM CaCl2 buffer. Purified proteins were concentrated, and buffer exchanged into 50 mM sodium acetate pH5.5, 100 mM NaCl, 10 mM CaCl2 buffer using 10,000 or 50,000 MWCO Vivaspin PES concentrators. Protein purification was assessed using Mini-PROTEAN TXG stain free 4-15% SDS-PAGE gels

(Bio-Rad). Glycoprotein staining was performed using the Pierce Glycoprotein Staining Kit

(ThermoFisher Scientific). All gel imaging was performed using the Syngene G:BOX gel imaging system. The Pierce BCA protein assay (ThermoFisher Scientific) was used to determine protein concentrations.

211

Construction of C. bescii Enzyme Expression Strains

Vector pJMC046 for the expression of proteins in C. besci was constructed from pSBS4 (14) by adding a protein expression site containing: the 200 bp Pslp promoter from

Athe_2303 to drive expression, a C-terminal His tag coding sequence, and the terminator from Calkro_0402. Genes for Wai35_2053 and Athe_0594 GH5-CBM28 were inserted in this vector and selected in both E. coli and C. bescii using 50 µg/mL kanamycin. Vectors produced in E. coli were methylated using M.CbeI methylase, as described previously (25) prior to transformation into C. bescii.

Competent cells were prepared by growing a 500 mL culture of C. bescii to an optical density at 680 nm of 0.06-0.08. This culture was cooled to room temperature and then cells were harvested. All centrifugation steps during the preparation of competent cells are carried out at 6000 x g for 10 minutes. Following this initial harvest, the cell pellet was washed three times in 10% sucrose solution. The cell pellet after the final wash was re-suspended to a total volume of 100-120 µL with 10% sucrose. Cells (50 µL) were mixed with 1-2 µg of plasmid

DNA and transferred to 1 mm gap cuvettes (USA Scientific). Cells were electroporated using a Gene Pulser II system with Pulse Controller PLUS module (Bio-Rad) at the following conditions: 2.0 kV, 200 Ω, and 25 µF. Immediately following electroporation, cells are re- suspended in 1mL pre-warmed DG516 medium, transferred to a 10 mL pre-warmed culture, and incubated at 70°C. At 15 minutes and one hour post-electroporation, 1-5 mL was transferred into selective medium containing 50 µg/mL kanamycin. These selective media cultures were incubated at 70°C, until growth was observed, typically 2 days. After growth under selective conditions, these strains were plated once on CG516 medium containing

212

kanamycin. To verify strains contained the replicating plasmid, plasmid DNA was isolated from C. bescii using the ZymoResearch plasmid miniprep classic kit was transformed into 5α

E. coli. Plasmid DNA was then isolated from colonies of E. coli using the same kit and restriction mapped to verify the plasmid was intact from the C. bescii strain.

Protein Expression and Purification from C. bescii Strains

Protein expression strains of C. bescii were grown overnight in 500 mL CG516 medium with 50 µg/mL kanamycin at 70°C. This culture served as the inoculum for a

Sartorius Stedim BioStat Cplus 20L bioreactor containing 17 L of CG516 medium with 50

µg/mL kanamycin. The bioreactor was operated for 22 hours at 70°C with sparging of 1

L/min 20% CO2/80% N2 gas mix and agitation at 200 rpm. The pH of the bioreactor was controlled with NaOH addition at pH 7.00. To harvest the reactor, it was cooled to 20-30°C and the fermentation broth was transferred to a carboy. This broth was fractionated using the

Millipore Pellicon Mini Tangential Flow Filter using a Pellicon 2 mini 0.22 µm GVPP 0.1 m2 filter (Millipore). The permeate of this filter was retained as the extracellular protein fraction.

This extracellular protein fraction was concentrated using the TFF system and a Pellicon 2 mini BioMax 10 or 50 kDa cutoff PES 0.1 m2 filter (Millipore). Here the retentate of the filter contained the large extracellular proteins produced by C. bescii, including the protein of interest. This filtering setup was also used to buffer exchange the protein into 20 mM sodium phosphate 500 mM sodium chloride buffer pH 7.4. When the fermentation broth was concentrated to a volume of 300-800 mL, the material was sterile-filtered and loaded onto a 5 mL HisTrap HP nickel-Sepharose (GE Healthcare) column, and purified as described above.

213

Purification of Proteins for Cystallography

The frozen cell pellets were thawed at room temperature with equal volume of buffer

A (50 mM Tris pH 7.5, 100 mM NaCl and 10 mM imidazole) and lysed with lysozyme and sonication. One mg/mL lysozyme (Hampton Research, Aliso Viejo, CA), 1.0 U/mL Pierce

Universal Nuclease (Thermo Scientific, Rockford, IL) and EDTA-free protease inhibitor

(Thermo Scientific, Rockford, IL) according to manufacturer instructions were added in the lysis mixture and incubated for 30 minutes at room temperature with occasional vortexing.

Sonication was done at room temperature for two minutes using a Branson 5510 water bath sonicator (Branson Ultrasonics Corporation, Danbury, CT). Cell debris was removed by centrifugation at 15,000 x g for 15 min. The supernatant was loaded into a 5 mL HisTrap FF crude column (GE Healthcare, Piscataway, NJ) using an Akta FPLC system (GE Healthcare,

Piscataway, NJ) with buffer A. After loading and washing the unbound proteins from the column, protein samples were eluted using 100% of Buffer B (50 mM Tris pH 7.5, 100 mM

NaCl and 250 mM imidazole). Final purification was performed by size-exclusion chromatography using a HiLoad Superdex 75 (16/60) column (GE Healthcare, Piscataway,

New Jersey, USA) in buffer C (20 mM Tris pH 7.5 and 100 mM NaCl).

Crystallization

Crystals were initially obtained with sitting drop vapor diffusion using a 96-well plate with screens (Crustal Screen HT, PEG ion HT and Grid Salt HT) from Hampton Research

(Aliso Viejo, CA). Fifty µL of well solution was added to the reservoir and drops were made with 0.2 µL of well solution and 0.2 µL of protein solution using a Phoenix crystallization

214

robot (Art Robbins Instruments, Sunnyvale, CA). The crystals were grown in in the following conditions for each protein at 20°C: Csac_0678 GH5 in 1.0 M sodium malonate pH 7.0; Csac_0678 CBM28 in 0.1 M HEPES pH 7-8 and 64-74% (v/v) 2-methyl-2,4- pentanediol; Calkro_0111 GH16 in 0.1 M Sodium citrate tribasic dihydrate pH 5.6, 20%

(v/v) 2-Propanol and 20% (w/v) Polyethylene glycol 4,000; Calkro_0111 Fascin in 0.9-1.7 M ammonium sulfate and 0.1 M sodium acetate pH 4-5; Calkro_0111 GH16+Fascin in 20% w/v Polyethylene glycol 3,350 and 0.2 M Ammonium iodide; Wai35_2053 GH10 in Tris hydrochloride pH 8-9 and 1.5-2.1 M ammonium sulfate; Wai35_2053 CBM3 in 8% (v/v) tacsimate pH 4-5 and 14-25% polyethylene glycol 3350; and Wai35_2053 GH48 in 0.1 M citric acid pH 5-6, 14-25% (w/v) polyethylene glycol, 20% (v/v) 2-propanol and 50 mM tri- sodium citrate. The protein solutions used contained the following concentrations of protein in 20 mM Tris pH 7.5 and 100 mM NaCl supplemented with various ligands as noted: 18.5 mg/mL Csac_0678 GH5 supplemented with 10mM CaCl2; 20.7 mg/mL Csac_0678 CBM28;

12.4 mg/mL Calkro_0111 GH16 supplemented with 20mM cellobiose; 36 mg/mL

Calkro_0111 Fascin; 15 mg/mL Calkro_0111 GH16+Fascin supplemented with 5 mM cellobiose; 26 mg/mL Wai35_2053 GH10; 28 mg/mL Wai35_2053 CBM3; and 9 mg/mL

Wai35_2053 GH48 supplemented with 20mM cellobiose.

Data Collection and Processing

Crystals were flash frozen in a nitrogen gas stream at 100 K before home source data collection using an in-house Bruker X8 MicroStar X-Ray generator with Helios mirrors and

215

Bruker Platinum 135 CCD detector. Data were indexed and processed with the Bruker Suite of programs version 2014.9 (Bruker AXS, Madison, WI).

Structure Solution and Refinement

Intensities were converted into structure factors and 5% of the reflections were flagged for Rfree calculations using programs F2MTZ, Truncate, CAD and Unique from the

CCP4 package of programs (26). The program MOLREP (27) was used for molecular replacement using the following structures as the search model for each protein: PDB ID

1G01 for Csac_0678 GH5; PDB ID 3ACF for Csac_0678 CBM28; PDB ID 3ATG for

Calkro_0111 GH16; PDB ID 2YUG for Calkro_0111 Fascin; the previously determined

Calkro_0111 GH16 and Calkro_0111 Fascin structures for Calkro_0111 GH16+Fascin; PDB

ID 5AY7 for Wai35_2053 GH10; PDB ID 2L8A for Wai35_2053 CBM3; and PDB ID 4EL8 for Wai35_2053 GH48. Models for molecular replacement were created with ProtMod server

(http://ffas.burnham.org/protmod-cgi/protModHome.pl) using SCWRL (28) after sequence alignment against PDB using FFAS (29-31). Refinement and manual correction was performed using REFMAC5 (32) version 5.8.0155 and Coot (33) version 0.8.6.

Biochemical Characterization of Glycoside Hydrolases

Activity assays were performed by incubating a defined concentration of enzyme with a 1% (w/v) substrate solution in 100µL volume in a PCR plate thermocycler with heated lid (Eppendorf). All enzyme assays were performed in 50 mM sodium acetate buffer pH 5.5, containing 10 mM calcium chloride and 100 mM sodium chloride. Activity was measured

216

using the 3,5-dinitrosalicylic acid (DNS) reducing sugar assay (34,35), which was adapted to a 96 well format (36). The DNS-reagent used here contained 1.6% (w/v) sodium hydroxide,

30% (w/v) sodium potassium tartrate, and 0.1% (w/v) 3,5-dinitrosalicylic acid. Briefly, 25

µL of enzyme reaction was mixed with 50 µL DNS reagent and heated in a thermocycler at

95°C for 5 min, 48°C for 1 min, followed by holding at 20°C. To measure color development, 36 µL of DNS reaction was mixed with 160 µL distilled water in a 96-well plate and read at 540 nm. Glucose, and xylose were run as standards to calculate a curve to convert absorbance readings to µmol of reducing sugar.

Results

Calkro_0111 GH16 and Fascin Domain Containing Proteins

Calkro_0111 is the largest glycoside hydrolase (2,435 amino acids) produced by any sequenced Caldicellulosiruptor species, and based on searches of genome databases, it appears to be entirely unique to C. kronotskyensis. No proteins covering more than half of

Calkro_0111 result when the full sequence is used as a query for a blastp search, except for

Calkro_0121, a paralog of Calkro_0111 in C. kronotskyensis. Calkro_0111 contains two GH domains, GH16 and GH55, which act as an endoglucanase and exoglucanase, respectively, on the β-1,3/1,6-glucan laminarin (5). Like many Caldicellulosiruptor multi-functional enzymes, these endo- and exo- acting domains presumably work together, with the GH16 domain making new chain ends for the GH55 to degrade. This previous characterization work also showed that a Fascin-like domain (previously referred to as an actin crosslinking- like domain) adjacent to the GH16 was required for its activity. While there are no proteins

217

with the full multi-domain structure of Calkro_0111, several proteins consist of GH16-Facin domain pairs. Table 4.1 shows the closest blastp results when the Calkro_0111 GH16-Fascin sequence is used as a query. These proteins are almost exclusively from Paenibacillus species. Additionally, all of the results are only approximately 410 amino acid proteins comprised of only the GH16 and Fascin domains, with PaecuDRAFT_0918 being the only exception. PaecuDRAFT_0918 contains a CBM6 and CBM32 domain in addition to the

GH16 and Fascin, but this domain structure is not similar to the structure of Calkro_0111. To probe for the presence of Fascin domains in more distantly related GH16 sequences, a list of all genes containing GH16 domains (pfam00722) was obtained from the JGI Integrated

Microbial Genomes & Microbiomes database (37), and this list was searched using blastp with the Calkro_0111 Fascin domain sequence as the query. These results are shown in

Table 4.2. Here, the closest blastp matches consist only of the approximately 410 amino acids GH16 and Fascin domain proteins. Interestingly, two proteins (STAUR_1723 and

COCOR_01711) have the Fascin domain on the N-terminus of the GH16, as opposed to on the C-terminus as seen for all the other blastp matches and in Calkro_0111 itself. These results demonstrate that, while the GH16-Fascin structure is common to many GH16 enzymes, the larger multi-domain structure of Calkro_0111 remains unique.

Calkro_0111 GH16 and Fascin Domain Structures

The crystal structure of the GH16 domain from Calkro_0111 and the adjacent Fascin domain were solved individually (Figure 4.2A and Figure 4.2B/C, respectively). The GH16 structure was refined to a resolution of 1.7Å with a R/Rfree of 0.174/0.230 and the Fascin

218

structure was refined to a resolution of 1.5Å with a R/Rfree of 0.158/0.208. A structure was also determined for the full Calkro_0111 GH16-Fascin protein (Figure 4.2D), which was refined to a resolution of 2.35Å with a R/Rfree of 0.208/0.262. The GH16 structure resembles previously solved structures for GH16 endo-β-glucanases (PDB ID: 4DFS, for example), and has a cleft through which the β-glucan substrate fits to reach the active site for cleavage. A previously solved GH16 structure for an endo-β-galactosides from Clostridium perfringens also contains a smaller C-terminal domain similar to the Fascin domain of

Calkro_0111 (38). It was speculated to be involved in binding to polysaccharides, but it was not determined whether this smaller domain was required for the activity of the GH16 in this

Clostridium perfringens enzyme.

Biochemical Analysis of the Calkro_0111 GH16-Fascin Protein

The Calkro_0111 GH16-Fascin construct was produced in E. coli and purified

(Figure 4.3A). Truncation mutants of Calkro_0111 were previously characterized (5), and the only substrate where activity of the Calkro_0111 GH16 could be detected was the β-1,3-

1,6-glucan laminarin. The current GH16-Fascin construct includes 5 additional amino acids

(TVPSV) on the N-terminus of the TM4 construct characterized in this previous work. Based on the structure of the Calkro_0111 GH16-Fascin, the Fascin domain may play a role in orienting laminarin in the active site of the GH16. This could be accomplished by binding the

β-1,6-branching of laminarin to orient the β-1,3-glucan in the active site for cleavage. To examine this hypothesis, additional polysaccharides with β-1,3-glucan linkages that lack the

β-1,6-branching of laminarin were used as substrates for the Calkro_0111 GH16-Fascin

219

protein. These included: β-1,3/4-glucans lichenan and barley β-glucan, and β-1,3-glucans

CM-pachyman and CM-curdlan. The pachyman and curdlan polysaccharides used in this study are carboxymethylated to improve their solubility. Specific activity was determined on these substrates using the DNS reducing sugar assay (Figure 4.3B), and activity was detected on all of the substrates. However, the specific activity of Calkro_0111 GH16-Fascin on laminarin (0.175 µmol reducing sugar released / min / mg) was 40 times higher than the activity measured on CM-curdlan, 85 times higher than the activity measured on barley β- glucan and lichenan, and 140 times higher than the activity measured on CM-pachyman. This limited activity on β-1,3-glucans without β-1,6-branching suggests that the Calkro_0111

GH16-ACL is specifically designed for activity on laminarin-like branched substrates. It also implicates the Fascin domain as helping to orient the GH16 for improved activity on laminarin. Further, this role in improving activity of an adjacent GH domain is a classic role for CBM domains in CAZymes. This suggests the Fascin domain of Calkro_0111 may represent a new family of CBM, but further evidence of sugar binding is necessary to make this conclusion.

Wai35_2053 GH10, CBM3, and GH48 Domain Structures

Recently, the genome of Caldicellulosiruptor sp. strain Wai35.B1 (39) was sequenced, and a novel enzyme, referred to here as Wai35_2053, predicted to contain three catalytic GH domains, was identified. This is the first enzyme identified to date that has three catalytic domains within a single CAZyme protein, and also the first GH12 domain identified in a Caldicellulosiruptor species. In the genome of C. sp. Wai35.B1, Wai35_2053 (NCBI

220

accession WP_045175321) is found in the glucan degradation locus (GDL), the primary locus of multi-domain enzymes responsible for cellulose degradation in the

Caldicellulosiruptor species. Wai35_2053 has four domains: GH10-CBM3-GH12-GH48, which is similar to Athe_1857 (Figure 4.1) from the C. bescii GDL. In fact, the GH10-

CBM3 domains of Wai35_2053 are 88% identical to the corresponding domains in

Athe_1857, with most of the differences coming in the Pro/Thr rich linker between the two domains. Additionally, the GH48 domain of Wai35_2053 is 98% identical to the GH48 of

Athe_1857, which coincidentally is identical to the GH48 domain of Athe_1867. The GH12 domain of Wai35_2053 has no homology to Athe_1857 or other previously identified

Caldicellulosiruptor proteins.

The domains of Wai35_2053 were individually cloned and produced in E. coli for crystallization efforts. Crystal structures were solved for three of the four domains: GH10,

CBM3, and GH48 (Figure 4.4 A/B, C/D, and E, respectively). The GH10 structure was refined to a resolution of 1.9 Å with a R/Rfree of 0.180/0.234; the CBM3 structure was refined to a resolution of 2.0 Å with a R/Rfree of 0.180/0.245; and, the GH48 was refined to a resolution of 1.9 Å with a R/Rfree of 0.167/0.239. These three crystal structures represent very typical GH10, CBM3, and GH48 domains, respectively. The GH10 structure of

Wai35_2053 resembles that of intracellular C. bescii Athe_0185 xylanase GH10 (PDB ID:

4L4O) (18). Additionally the GH48 domain of Wai35_2053 is similar to the previously solved structure for the GH48 of Athe_1867 (PDB ID: 4EL8) (9), which has 98% sequence identity to the Wai35_2053 GH48.

221

Biochemical Analysis of Wai35_2053 and Constituent Domains

To investigate the activity of Wai35_2053, the full length protein was produced in C. bescii (Figure 4.5A) using newly developed protein expression techniques relying on a high temperature kanamycin resistance marker for C. bescii (14). This figure also shows the individual catalytic domains of Wai35_2053 produced in E. coli for characterization. C. bescii is known to glycosylate its large extracellular proteins, such as Athe_1867 CelA (40), so the glycosylation of Wai35_2053 produced in C. bescii was investigated. Staining an

SDS-PAGE gel containing Wai35_2053 for glycoprotein shows that it is glycosylated

(Figure 4.5B). Glycosylation of Wai35_2053 is also evidenced by the 185 kDa apparent molecular mass of Wai35_2053 on the SDS-PAGE gel, while its predicted size based on sequence alone is 170 kDa.

The activity of Wai35_2053 was screened in an overnight assay on several polysaccharide substrates (data not shown) and substrates where activity was detected were used to assay the activity of the full protein and the three catalytic domains in more detail.

Activity was measured using the DNS reducing sugar assay and activity is calculated on a molar basis to compare across the proteins of different molecular weight (Figure 4.5C/D).

Wai35_2053 is very highly active on xylan substrates, and this xylanase activity appears to be the result of the GH10 domain, based on the similar specific activity of the GH10 domain alone to the full protein on birch wood, oat spelt, and beech wood xylans. This activity for the GH10 domain of Wai35_2053 is similar to that of the GH10 of Athe_1857 (11), which is a close homolog. The GH12 and GH48 had detectable activity on these xylan substrates, but at a level 2 orders of magnitude less than the GH10. This slight activity on xylan has

222

previously been identified for the GH48 of Athe_1867 (9). GH12 domains are typically xyloglucanases or β-glucanases, but some do have activity on xylan substrates (41).

Significant β-1,3-glucanase activity was detected on barley β-glucan for the GH10 and GH12 domains, which each have a specific activity of 250 µmol reducing sugar released / min /

µmol. The activity of the full protein is equal to the sum of these two domains, at approximately 500 µmol reducing sugar released / min / µmol. The GH48 also shows low activity on barley β-glucan. Wai35_2053 and its domains also showed activity on β-1,3/4- glucan lichenan, β-1,4-glucan carboxymethyl cellulose, and β-1,4-mannan locust bean glucomannan (Figure 4.5D), and these activities appear to primarily reside in the GH48 domain. Also of interest is that the activity on locust bean glucomannan appears to only be related to the C-teriminal GH12 and GH48 domains of Wai35_2503, as the GH10 has no activity on this substrate. These results, with the full enzyme having similar activity to the sum of its individual catalytic domains, suggest that minimal synergy is present between the three catalytic domains of Wai35_2053.

Csac_0678 GH5 and CBM28 Structures

Csac_0678 consists of a GH5 domain, CBM28, and repeated SLH domains that anchor this protein to the cell surface within the S-layer, as described previously (7). The

GH5 of Csac_0678 is conserved in all sequenced Caldicellulosiruptor species, but certain species are lacking the CBM28 and SLH domains. The homolog of the Csac_0678 GH5 in C. sp. Rt8.B8 consists of only the single GH5 domain without the CBM28 or SLH domains. The two American isolates, C. owensensis and C. obsidiansis, have homologs that lack the

223

CBM28 domain between their GH5 and SLH domains. Here, the three-dimensional structure of the Csac_0678 GH5 (Figure 4.6 A/B) and CBM28 (Figure 4.6 C/D) were determined.

The structure of Csac_0678 GH5 was refined to a resolution of 1.5 Å with a R/R-free of

0.085/0.115. The Csac_0678 GH5 displays a typical (βα)8-TIM barrel fold (42). This structure has been deposited into the protein data bank (PDB; www.rcsb.org) with entry code

5ECU. The Csac_0678 CBM28 was refined to a resolution of 2.62Å with a R/R-free of

0.260/0.344, with further refinement in progress.

Pair-wise secondary-structure matching by the PDBefold program (43) found 68 unique structural matches for Csac_0678 GH5 from the protein data bank with at least 70% secondary structure similarity. The most similar match was alkaline cellulase K from

Bacillus sp. KSM-635 (44), with a secondary structure similarity of 83%, sequence similarity of 63%, and Cα root-mean-square deviation of 0.727 Å2, suggesting highly similar backbones between the two proteins. Further comparison with other GH5 structures revealed that the Csac_0678 GH5 and Bacillus GH5 differ from the other GH5s by having an enlarged loop near the active site (Figure 4.6 B). However, Csac_0678 GH5 does not have other extended features of Bacillus GH5 (44). The catalytic residues of Csac_0678 GH5 (Glu188 and Glu285) are in similar orientation compared to Bacillus GH5. The residues near the active sites are identical except for leucine instead of histidine at position 151. Histidine at this position is a special feature of the Bacillus GH5 (44).

The structures of only two CBM28 domains have been solved to date. One of these is from a β-glucanase, C. josui Cel5A (45), which has a domain structure very similar to

Csac_0678 with a GH5, CBM28, and SLH domains. However, unlike Csac_0678, C. josui

224

Cel5A has an additional CBM17 domain between its GH5 and CBM28 domains. The

CBM17 and CBM28 families are very similar, but have binding sites for glucan chains that are in opposite directions. The CBM28 of C. josui binds to cellooligosaccharides and is hypothesized to be involved in the orientation of glucan chains towards the active site of its

GH5 (45). The CBM28 of Csac_0678 presumably plays a similar role by binding to glucan chains along its face and orienting the polysaccharide towards the active site of the GH5.

Proteins produced containing both the GH5 and CBM28 domains together did not form crystals for analysis. This is the first time the GH5 and CBM28 domain from the same protein have both been crystallized.

Biochemical Analysis of Csac_0678 and Athe_0594

The biochemical properties and substrates of Csac_0678 were established previously

(4). Additionally, work with the C. bescii homolog (Athe_0594) has shown that the CBM28 improves the activity and thermosstability of the GH5 domain (8). Here, Csac_0678 was compared to Athe_0594 and the active site of Athe_0594 GH5 was investigated based on the predicted active site of the Csac_0678 GH5 crystal structure (Figure 4.6A).

Csac_0678 and Athe_0594 are close homologs, but have a relatively low percent identity (65%) over the whole protein because of divergence in the sequences of their C- terminal SLH domains. Characterizing Athe_0594 from C. bescii is of interest because tools for manipulating C. bescii genetically were recently developed (14,46), while these tools do not exist currently for C. saccharolyticus. An alignment of Csac_0678 and Athe_0594 are shown in Figure 4.7A. Arrows in this figure mark the domain boundaries between the GH5,

225

CBM28 and SLH domains. As seen in this figure, the GH5 domains have very similar sequences (81% sequence identity), the CBM28 domains are highly similar (68% sequence identity), and the SLH domains are the most dissimilar (40% sequence identity) part of these proteins. Because the GH5 sequences are highly similar the active site residues predicted by the Csac_0678 structure (Glu188 and Glu285) (Figure 4.6A) align to the active site of

Athe_0594 (Glu189 and Glu286), which are marked in Figure 4.7A.

To investigate their activities, truncations of Csac_0678 and Athe_0594 containing only the GH5 and CBM28 domain without SLH domains were produced in E. coli. In addition, using newly developed protein expression techniques in C. bescii, an identical truncation of Athe_0594 was produced in C. bescii. Figure 4.7B shows an SDS-PAGE gel of these purified proteins. Based on the previously identified glycosylation of extracellular proteins in C. bescii (40), including Wai35_2053, the Athe_0594 GH5-CBM28 protein produced in C. bescii was stained to detect glycoprotein. Athe_0594 GH5-CBM28 was not found to be glycosylated (data not shown). To investigate the active site of the Athe_0594

GH5, the predicted active site acid/base residue and nucleophilic residue (Glu189 and

Glu286, respectively) were mutated to Ala individually and in combination. This resulted in three mutants of Athe_0594 GH5: E189A, E286A, and E189AE286A, which were expressed alongside the native Athe_0594 GH5 in E. coli and purified (Figure 4.7C).

The activity of these proteins was compared on barley β-glucan and CMC, two substrates Csac_0678 was previously identified to be most active on (4). Csac_0678 GH5-

CBM28 and both the E. coli expressed and C. bescii expressed Athe_0594 GH5-CBM28 had nearly identical activities on these two substrates (Figure 4.7D). The activity of the

226

Athe_0594 GH5 alone is also shown in this figure for comparison. The 250-fold decrease in activity between the GH5-CBM28 and GH5 protein, demonstrates the contribution of the

CBM28 to enhancing the activity of the GH5 that was previously described (8). The activities of the Athe_0594 GH5 active site mutants were also evaluated alongside the native

Athe_0594 GH5 protein (Figure 4.E). Either mutation (E189A or E286A) individually and both mutations in combination appear to have the same effect of nearly abolishing activity of the GH5, confirming Glu189 and Glu286 as the active site residues of this GH5 domain.

Based on structural similarity to other GH5 domain Glu189 can be identified as the acid/base residue and Glu286 can be identified as the nucleophilic residue of the active site (47,48).

Thus, by using the Csac_0678 GH5 crystal structure the active site residues of the Athe_0594 were determined and confirmed.

Discussion

Protein crystal structures are a powerful tool for identifying the physical properties of proteins including: the active site residues, the tertiary structure of a protein, and its interaction with substrates, other domains, and other proteins. Here, we present the crystal structures for 7 domains from three proteins produced by Caldicellulosiruptor species. To complement the structural understanding of these proteins, biochemical data was collected for individual domains and full proteins to inform our understanding of how the structure of an individual domain leads to its function and the function of the larger multi-domain protein.

227

The Calkro_0111 GH16-Facin construct represents a portion of a very unique protein from C. kronotskyensis. While no proteins can be identified that have a similar structure to the large and complex Calkro_0111, there are several identifiable proteins that are only comprised of a similar GH16-Facin domain pair (Table 4.1 and Table 4.2) to that found in

Calkro_0111. The GH16-Fascin domain pair was the smallest identifiable truncation around the Calkro_0111 GH16 domain to have activity (5). It does not appear that the proteins in

Tables 4.1 and 4.2 have been characterized, but it would be interesting to determine if their

GH16 domains also do not have activity without the adjacent Fascin domain. Further, based on the structure of the Calkro_0111 GH16-Fascin (Figure 4.2D) and its significantly higher activity on the β-1,6-branched β-1,3-glucan laminarin compared to other unbranched β-1,3- glucans (Figure 4.3B), the Fascin domain may be a new family of CBM. By binding to laminarin, likely near the branch points of the polysaccharide, it may help to orient the substrate in the active site of the GH16 leading to its increased activity. Further analysis is necessary to substantiate the binding of the Fascin domain to laminarin, but without both the structure of these domains and their biochemical activity we would not understand the coordinated function of these domains.

In the case of Wai35_2053, the crystal structures for three domains (GH10, CBM3, and GH48) are reported (Figure 4.4). These structures represent fairly typical structures for members of these CAZyme domain families. What is interesting about these structures, however, is that they are individual pieces of a much larger multi-domain protein. Each of the catalytic domains, GH10, GH48, and GH12, are active enzymes on their own, and the structures for the GH10 and GH48 are representative of these particular GH families in

228

enzymes that are not multi-domain. Thus, the individually folded and catalytic domains of

Wai35_2053 are like individual knots on a polypeptide rope. This is representative of the multi-domain strategy used in may Caldicellulosiruptor proteins (Figure 4.1). One reason for the prevalence of multi-domain enzymes in these species could be to deploy domains that are synergistic together in a single protein so they can work together on a given substrate.

This appears to be the case for previously characterized CelB (Csac_1078), for which the full protein has higher activity than its individual domains or mixtures of domains that are not attached in the same protein (6). It does not appear from the biochemical data presented here

(Figure 4.5) that the activity of Wai35_2053 is as synergistic as other previously characterized Caldicellulosiruptor multi-domain proteins. For most substrates, the activity of the full protein is accounted for by a single domain or the sum of multiple domains. For example, the GH10 xylanase domain has comparable activity on birch wood xylan and beech wood xylan to the full protein, and while the GH12 and GH48 domains have some activity on these substrates the activity of the full length Wai35_2053, which contains all three domains, is not increased significantly above the GH10 alone. On barley β-glucan, the activity of the GH10 and GH12 domains appear to be additive to reach the activity of the full

Wai35_2053. This is what would be expected if these domains were not synergistic, because, on a molar basis, Wai35_2053 contains both a GH10 and GH12 domain in a single

Wai35_2053 molecule.

The GH12 domain of Wai35_2053 is currently in crystallization trials, but no crystals have formed as of yet. While having this domain structure would be useful for informing the full structure of the protein, having three of the four domains will allow for molecular

229

dynamic simulations to reconstruct the structure of this multi-domain protein. Modeling can also be used to determine a suitable GH12 model to stand in for the unsolved structure of the

Wai35_2053 GH12. By using a modeling approach and a reconstructed full Wai35_2053 structure, we would learn how the protein domains of Wai35_2053 are oriented relative to one another. Similar approaches have previously been utilized on Athe_1867 CelA to identify its mechanism and understand the synergistic action of its GH9 and GH48 domains

(9). This type of molecular simulation with Wai35_2053 would help to identify whether the structure of the protein would suggest any synergistic interactions between the domains. The structure of a protein with three catalytic domains, and in particular a GH12 and GH48 as adjacent domains, would also be interesting for understanding how these three domain each access the substrate while remaining attached in a single protein.

As with the structures of Calkro_0111 and Wai35_2053, the crystal structure of

Csac_0678 (Figure 4.6A) helped to inform biochemical experiments on this protein and its homolog Athe_0594. Here, the active site of Athe_0594 was predicted based on the active site determined by the structure of Csac_0678. Active site residues Glu189 (acid/base) and

Glu286 (nucleophile) of Athe_0594 were mutated to Ala individually and in combination in the Athe_0594 GH5. All three of these mutants showed a very significant loss in activity on

CMC and barley β-glucan compared to the native Athe_0594 GH5 domain (Figure 4.7E).

This confirms these residues are the catalytic acid/base and nucleophilic residues of these domains. This demonstrates the powerful ability of structural information to inform the determination of the active site of a protein. Similar analysis to this could be applied to the other structures determined here for Wai35_2053 catalytic domains or the Calkro_0111

230

GH16 to identify their active sites. Additionally, we confirm that the activities of the

Csac_0678 GH5-CBM28 and Athe_0594 GH5-CBM28 proteins are identical on the substrates CMC and β-beta glucan, and the activity of the Athe_0594 GH5 with its adjacent

CBM28 domain is at least two orders of magnitude higher than the GH5 alone. This presents a similar situation to the Calkro_0111 GH16, where an adjacent putative binding domain also enhances the activity of the catalytic GH domain. In addition, the same molecular simulations that would be useful to reconstruct a Wai35_2053 structure from its constituent domains could be used to model the full structure of Csac_0678.

The analysis of these Caldicellulosiruptor multi-domain proteins, by breaking them into their constituent parts for structural analysis, provides a wealth of information about how these enzymes function. Here, we present examples for the use of structural information to inform biochemical experiments and how biochemical data explains features seen in protein structures. The CBM28 domain of Csac_0678 plays and important part in enhancing the activity of its GH5, just as a Fascin-like domain appears to do for the Calkro_0111 GH16.

The structure of the GH16-Fascin protein informed how much higher activity towards the substrate laminarin could be explained. Additionally, the structures and biochemistry of

Wai35_2053 give us a broader picture of how the domains of multi-domain enzymes of

Caldicellulosiruptor species are arranged and function. The full length multi-domain proteins of Caldicellulosiruptor species are too large and heterogeneous to form the crystals necessary for solving their structures by X-ray diffraction, but by taking these proteins apart, we have demonstrated that the individual domains of these proteins truly are individually folded and functional. This observation about the structure of these multi-domain proteins, where

231

domains are like beads strung along a chain, could allow for the rational design of multi- domain enzymes in the future. By gaining a clearer picture of individual domain structures, the biochemical properties of the domains, and how the domains are arranged into multi- domain proteins, designing enzymes with enhanced synergistic properties may be within our reach.

Acknowledgements

This work was funded in part by the BioEnergy Science Center, a U.S. Department of

Energy Bioenergy Research Center supported by the Office of Biological and Environmental

Research in the DOE Office of Science. Protein crystallography was performed at the

National Renewable Energy Laboratory by Vladimir Lunin and Markus Alahuhta. Inci

Ozdemir performed the cloning of the Csac_0678 GH5 domain.

232

Figure 4.1 - Selected multi-domain enzymes from Caldicellulosiruptor species Proteins are grouped into two groups based on whether they are predicted to be surface- associated (by SLH domains), or freely secreted from the cell. The predicted domain arrangement of these proteins are shown to scale. Glycoside Hydrolase (GH) domains are shown as squares, Carbohydrate Binding Modules (CBM) are shown as circles, Polysaccharide Lyases (PL) are shown as diamonds. Accessory domains: Fascin-like (CL0066), Fibronectin Type III (FN3, pfam 00041), and Cadherin-like (pfam 12733) are also shown.

233

Figure 4.2 - Crystal strucutres for Calkro_0111 GH16 and Fascin-like domain (A) GH16 (B/C) Fascin-like domain (C) GH16-Fascin

234

Figure 4.3 - Biochemical data on the Calkro_0111 GH16-Fascin protein (A) Purified Calkro_0111 GH16-Fascin protein (B) Activity on β-1,3-glucan substrates. Error bars represent the standard deviation (n=4).

235

Figure 4.4 - Crystal structures for Wai35_2053 domains (A/B) GH10 (C/D) CBM3 (E) GH48

236

Figure 4.5 - Biochemical data on Wai35_2053 and its domains (A) Domain layout of Wai35_2053 and proteins produced for characterization (B) Glycoprotein staining of Wai35_2053. A positive glycoprotein control and negative non- glycosylated protein control are also shown. (C) Activity on xylans and barley β-glucan (D) Activity on glucomannan, CMC, and lichenan. Error bars represent the standard deviation (n=4).

237

Figure 4.6 - Crystal structures for Csac_0678 domains (A) Csac_0678 GH5. Active site residues Glu188 and Glu285 are shown. (B) Structural comparison of the Csac_0678 GH5 to alkaline cellulase K from Bacillus sp. KSM-635 (PDB ID: 1G01). A unique extended loop (circled) near the active site in both structures differentiates them from other GH5 domains. (C/D) Csac_0678 CBM28.

238

Figure 4.7 - Biochemical data on Csac_0678 and Athe_0594 proteins (A) Alignment of Csac_0678 and Athe_0594. The domain boundaries between different protein domains are marked. The active site residues are marked and labeled using the Athe_0594 sequence numbers (Glu189 and Glu286) (B) Csac_0678 and Athe_0594 GH5- CBM28 proteins produced in E. coli and C. bescii. (C) Athe_0594 GH5 proteins with active site mutations. (D) Activity of Csac_0678/Athe_0594 GH5-CBM28 proteins and Athe_0594 GH5 on CMC and barley β-glucan. (E) Activity of Athe_0594 GH5 with various mutations on CMC and barley β-glucan Error bars represent the standard deviation (n=4).

239

Table 4.1 - Proteins identified by blastp using Calkro_0111 GH16-Fascin sequence as the query

Amino Blastp Blastp Species Locus Tag Acid Domains Coverage Identity Length % %

SLH-SLH-SLH-CBM54- Caldicellulosiruptor FN3-GH16-Fascin-FN3-FN3- Calkro_0111 2435 - - kronotskyensis 2002 Fascin-FN3-FN3-GH55- CBM32-CBM32

SLH-SLH-SLH-CBM54- Caldicellulosiruptor FN3-GH16-Fascin-FN3-FN3- Calkro_0121 2229 97 69 kronotskyensis 2002 Fascin-FN3-FN3-GH55- CBM32 Paenibacillus PaecuDRAFT_0918 762 CBM6-CBM32-GH16-Fascin 97 62 curdlanolyticus YK9 Thermoactinomyces JG50_RS0110100 418 GH16-Fascin 97 52 daqus Paenibacillus sp. PTI45_02212 407 GH16-Fascin 97 50 TI45-13ar Paenibacillus sp. HMPREF1207_05298 410 GH16-Fascin 98 49 HGH0039 Paenibacillus PCH01S_RS00565 423 GH16-Fascin 98 49 chitinolyticus Paenibacillus WH45_RS10505 410 GH16-Fascin 95 49 wulumuqiensis

240

Table 4.2 - GH16 domain-containing proteins identified by blastp using the Calkro_0111 Fascin domain sequence as the query

Amino Blastp Blastp Species Locus Tag Acid Domains Coverage Identity Length % %

SLH-SLH-SLH-CBM54- Caldicellulosiruptor FN3-GH16-Fascin-FN3- Calkro_0111 2435 - - kronotskyensis 2002 FN3-Fascin-FN3-FN3- GH55-CBM32-CBM32

SLH-SLH-SLH-CBM54- Caldicellulosiruptor FN3-GH16-Fascin-FN3- Calkro_0121 2229 96 63 kronotskyensis 2002 FN3-Fascin-FN3-FN3- GH55-CBM32 Paenibacillus sp. BD3526 Ga0100886 410 GH16-Fascin 97 38 Stigmatella aurantiaca STAUR_1723 445 Fascin-GH16 96 40 DW4/3-1 Coraliomargarita Caka_0989 396 GH16-Fascin 93 33 akajimensis DSM 45221 Archangium gephyra DSM Ga0081691_113251 422 GH16-Fascin 89 41 2261 Corallococcus coralloides COCOR_01711 438 Fascin-GH16 94 36 DSM 2259 Pedobacter saltans DSM Pedsa_2631 431 GH16-Fascin 89 37 12145 Coraliomargarita Caka0071 417 GH16-Fascin 95 34 akajimensis DSM 45221 Chitinophaga pinensis DSM Cpin_5109 409 GH16-Fascin 76 36 2588 Niastella koreensis GR20-10, Niako_0669 406 GH16-Fascin 71 39 DSM 17620 Mucilaginibacter Ga0123631 422 GH16-Fascin 79 33 PAMC26640

241

References

1. Conway, J. M., Zurawski, J. V., Lee, L. L., Blumer-Schuette, S. E., and Kelly, R. M.

(2015) Lignocellulosic biomass deconstruction by the extremely thermophilic genus

Caldicellulosiruptor. in Thermophilic Microorganisms (Li, F. ed.), Caister Academic

Press, Norfolk, UK. pp 91-120

2. Bayer, E. A., Shoham, Y., and Lamed, R. (2013) Lignocellulose-Decomposing

Bacteria and Their Enzyme Systems. in The Prokaryotes: Prokaryotic Physiology

and Biochemistry (Rosenberg, E., DeLong, E. F., Lory, S., Stackebrandt, E., and

Thompson, F. eds.), Springer Berlin Heidelberg, Berlin, Heidelberg. pp 215-266

3. Zurawski, J. V., Conway, J. M., Lee, L. L., Simpson, H., Izquierdo, J. A., Blumer-

Schuette, S., Nookaew, I., Adams, M. W. W., and Kelly, R. M. (2015) Comparative

analysis of extremely thermophilic Caldicellulosiruptor species reveals common and

differentiating cellular strategies for plant biomass utilization. Appl. Environ.

Microbiol. 81, 7159-7170

4. Blumer-Schuette, S. E., Giannone, R. J., Zurawski, J. V., Ozdemir, I., Ma, Q., Yin,

Y., Xu, Y., Kataeva, I., Poole, F. L., 2nd, Adams, M. W., Hamilton-Brehm, S. D.,

Elkins, J. G., Larimer, F. W., Land, M. L., Hauser, L. J., Cottingham, R. W., Hettich,

R. L., and Kelly, R. M. (2012) Caldicellulosiruptor core and pangenomes reveal

determinants for noncellulosomal thermophilic deconstruction of plant biomass. J.

Bacteriol. 194, 4015-4028

5. Conway, J. M., Pierce, W. S., Le, J. H., Harper, G. W., Wright, J. H., Tucker, A. L.,

Zurawski, J. V., Lee, L. L., Blumer-Schuette, S. E., and Kelly, R. M. (2016)

242

Multidomain, Surface Layer-associated Glycoside Hydrolases Contribute to Plant

Polysaccharide Degradation by Caldicellulosiruptor Species. J. Biol. Chem. 291,

6732-6747

6. VanFossen, A. L., Ozdemir, I., Zelin, S. L., and Kelly, R. M. (2011) Glycoside

hydrolase inventory drives plant polysaccharide deconstruction by the extremely

thermophilic bacterium Caldicellulosiruptor saccharolyticus. Biotechnol. Bioeng.

108, 1559-1569

7. Ozdemir, I., Blumer-Schuette, S. E., and Kelly, R. M. (2012) S-layer homology

domain proteins Csac_0678 and Csac_2722 are implicated in plant polysaccharide

deconstruction by the extremely thermophilic bacterium Caldicellulosiruptor

saccharolyticus. Appl. Environ. Microbiol. 78, 768-777

8. Velikodvorskaya, G. A., Chekanovskaya, L. A., Lunina, N. A., Sergienko, O. V.,

Lunin, V. G., Dvortsov, I. A., and Zverlov, V. V. (2013) Family 28 carbohydrate-

binding module of the thermostable endo-1,4-beta-glucanase CelD from

Caldicellulosiruptor bescii maximizes enzyme activity and irreversibly binds to

amorphous cellulose. Mol Biol (Mosk) 47, 581-586

9. Brunecky, R., Alahuhta, M., Xu, Q., Donohoe, B. S., Crowley, M. F., Kataeva, I. A.,

Yang, S.-J., Resch, M. G., Adams, M. W. W., Lunin, V. V., Himmel, M. E., and

Bomble, Y. J. (2013) Revealing Nature’s Cellulase Diversity: The Digestion

Mechanism of Caldicellulosiruptor bescii CelA. Science 342, 1513-1516

10. Ye, L., Su, X., Schmitz, G. E., Moon, Y. H., Zhang, J., Mackie, R. I., and Cann, I. K.

O. (2012) Molecular and biochemical analyses of the GH44 Module of

243

CbMan5B/Cel44A, a bifunctional enzyme from the hyperthermophilic bacterium

Caldicellulosiruptor bescii. Appl. Environ. Microbiol. 78, 7048-7059

11. Xue, X., Wang, R., Tu, T., Shi, P., Ma, R., Luo, H., Yao, B., and Su, X. (2015) The

N-Terminal GH10 Domain of a Multimodular Protein from Caldicellulosiruptor

bescii Is a Versatile Xylanase/beta-Glucanase That Can Degrade Crystalline

Cellulose. Appl Environ Microbiol 81, 3823-3833

12. Alahuhta, M., Xu, Q., Bomble, Y. J., Brunecky, R., Adney, W. S., Ding, S. Y.,

Himmel, M. E., and Lunin, V. V. (2010) The unique binding mode of cellulosomal

CBM4 from Clostridium thermocellum cellobiohydrolase A. J Mol Biol 402, 374-387

13. Alahuhta, M., Brunecky, R., Chandrayan, P., Kataeva, I., Adams, M. W., Himmel, M.

E., and Lunin, V. V. (2013) The structure and mode of action of Caldicellulosiruptor

bescii family 3 pectate lyase in biomass deconstruction. Acta. Crystallogr. D Biol.

Crystallogr. 69, 534-539

14. Lipscomb, G. L., Conway, J. M., Blumer-Schuette, S. E., Kelly, R. M., and Adams,

M. W. W. (2016) Highly Thermostable Kanamycin Resistance Marker Expands the

Toolkit for Genetic Manipulation of Caldicellulosiruptor bescii. Appl. Environ.

Microbiol. 82, 4421-4428

15. Chung, D., Cha, M., Farkas, J., and Westpheling, J. (2013) Construction of a stable

replicating shuttle vector for Caldicellulosiruptor species: use for extending genetic

methodologies to other members of this genus. PLoS One 8, e62881

244

16. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H.,

Shindyalov, I. N., and Bourne, P. E. (2000) The Protein Data Bank. Nucleic Acids

Res. 28, 235-242

17. Alahuhta, M., Taylor, L. E., 2nd, Brunecky, R., Sammond, D. W., Michener, W.,

Adams, M. W., Himmel, M. E., Bomble, Y. J., and Lunin, V. (2015) The catalytic

mechanism and unique low pH optimum of Caldicellulosiruptor bescii family 3

pectate lyase. Acta. Crystallogr. D Biol. Crystallogr. 71, 1946-1954

18. Zhang, Y., An, J., Yang, G., Zhang, X., Xie, Y., Chen, L., and Feng, Y. (2016)

Structure features of GH10 xylanase from Caldicellulosiruptor bescii: implication for

its thermophilic adaption and substrate binding preference. Acta Biochim Biophys Sin

(Shanghai) 48, 948-957

19. Roske, Y., Sunna, A., Pfeil, W., and Heinemann, U. (2004) High-resolution crystal

structures of Caldicellulosiruptor strain Rt8B.4 carbohydrate-binding module

CBM27-1 and its complex with mannohexaose. J. Mol. Biol. 340, 543-554

20. Gibson, D. G. (2011) Enzymatic assembly of overlapping DNA fragments. Methods

Enzymol. 498, 349-361

21. Geslin, C., Le Romancer, M., Erauso, G., Gaillard, M., Perrot, G., and Prieur, D.

(2003) PAV1, the first virus-like particle isolated from a hyperthermophilic

euryarchaeote, "Pyrococcus abyssi". J. Bacteriol. 185, 3888-3894

22. Studier, F. W. (2005) Protein production by auto-induction in high density shaking

cultures. Protein Expr. Purif. 41, 207-234

245

23. Farkas, J., Chung, D., Cha, M., Copeland, J., Grayeski, P., and Westpheling, J. (2013)

Improved growth media and culture techniques for genetic analysis and assessment of

biomass utilization by Caldicellulosiruptor bescii. J. Ind. Microbiol. Biotechnol. 40,

41-49

24. Lipscomb, G. L., Stirrett, K., Schut, G. J., Yang, F., Jenney, F. E., Jr., Scott, R. A.,

Adams, M. W., and Westpheling, J. (2011) Natural competence in the

hyperthermophilic archaeon Pyrococcus furiosus facilitates genetic manipulation:

construction of markerless deletions of genes encoding the two cytoplasmic

hydrogenases. Appl. Environ. Microbiol. 77, 2232-2238

25. Chung, D., Farkas, J., Huddleston, J. R., Olivar, E., and Westpheling, J. (2012)

Methylation by a unique alpha-class N4-cytosine methyltransferase is required for

DNA transformation of Caldicellulosiruptor bescii DSM6725. PLoS One 7, e43844

26. Winn, M. D., Ballard, C. C., Cowtan, K. D., Dodson, E. J., Emsley, P., Evans, P. R.,

Keegan, R. M., Krissinel, E. B., Leslie, A. G., McCoy, A., McNicholas, S. J.,

Murshudov, G. N., Pannu, N. S., Potterton, E. A., Powell, H. R., Read, R. J., Vagin,

A., and Wilson, K. S. (2011) Overview of the CCP4 suite and current developments.

Acta Crystallogr D Biol Crystallogr 67, 235-242

27. Vagin, A., and Teplyakov, A. (2010) Molecular replacement with MOLREP. Acta

Crystallogr D Biol Crystallogr 66, 22-25

28. Canutescu, A. A., Shelenkov, A. A., and Dunbrack, R. L., Jr. (2003) A graph-theory

algorithm for rapid protein side-chain prediction. Protein Sci 12, 2001-2014

246

29. Jaroszewski, L., Rychlewski, L., Li, Z., Li, W., and Godzik, A. (2005) FFAS03: a

server for profile--profile sequence alignments. Nucleic Acids Res 33, W284-288

30. Rychlewski, L., Jaroszewski, L., Li, W., and Godzik, A. (2000) Comparison of

sequence profiles. Strategies for structural predictions using sequence information.

Protein Sci 9, 232-241

31. Jaroszewski, L., Rychlewski, L., and Godzik, A. (2000) Improving the quality of

twilight-zone alignments. Protein Sci 9, 1487-1496

32. Murshudov, G. N., Skubak, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls,

R. A., Winn, M. D., Long, F., and Vagin, A. A. (2011) REFMAC5 for the refinement

of macromolecular crystal structures. Acta Crystallogr D Biol Crystallogr 67, 355-

367

33. Emsley, P., Lohkamp, B., Scott, W. G., and Cowtan, K. (2010) Features and

development of Coot. Acta Crystallogr D Biol Crystallogr 66, 486-501

34. Miller, G. L. (1959) Use of dinitrosalicylic acid reagent for determination of reducing

sugar. Anal. Chem. 31, 426-428

35. Ghose, T. K. (1987) Measurement of Cellulase Activities. Pure Appl. Chem. 59, 257-

268

36. Xiao, Z., Storms, R., and Tsang, A. (2005) Microplate-based carboxymethylcellulose

assay for endoglucanase activity. Anal. Biochem. 342, 176-178

37. Markowitz, V. M., Chen, I. M., Palaniappan, K., Chu, K., Szeto, E., Pillay, M.,

Ratner, A., Huang, J., Woyke, T., Huntemann, M., Anderson, I., Billis, K., Varghese,

N., Mavromatis, K., Pati, A., Ivanova, N. N., and Kyrpides, N. C. (2014) IMG 4

247

version of the integrated microbial genomes comparative analysis system. Nucleic

Acids Res. 42, D560-567

38. Tempel, W., Liu, Z. J., Horanyi, P. S., Deng, L., Lee, D., Newton, M. G., Rose, J. P.,

Ashida, H., Li, S. C., Li, Y. T., and Wang, B. C. (2005) Three-dimensional structure

of GlcNAcalpha1-4Gal releasing endo-beta-galactosidase from Clostridium

perfringens. Proteins: Struct., Funct., Bioinf. 59, 141-144

39. Lee, L. L., Izquierdo, J. A., Blumer-Schuette, S. E., Zurawski, J. V., Conway, J. M.,

Cottingham, R. W., Huntemann, M., Copeland, A., Chen, I. M., Kyrpides, N.,

Markowitz, V., Palaniappan, K., Ivanova, N., Mikhailova, N., Ovchinnikova, G.,

Andersen, E., Pati, A., Stamatis, D., Reddy, T. B., Shapiro, N., Nordberg, H. P.,

Cantor, M. N., Hua, S. X., Woyke, T., and Kelly, R. M. (2015) Complete Genome

Sequences of Caldicellulosiruptor sp. Strain Rt8.B8, Caldicellulosiruptor sp. Strain

Wai35.B1, and "Thermoanaerobacter cellulolyticus". Genome Announc. 3

40. Chung, D., Young, J., Bomble, Y. J., Vander Wall, T. A., Groom, J., Himmel, M. E.,

and Westpheling, J. (2015) Homologous expression of the Caldicellulosiruptor bescii

CelA reveals that the extracellular protein is glycosylated. PLoS One 10, e0119508

41. Habrylo, O., Song, X., Forster, A., Jeltsch, J. M., and Phalip, V. (2012)

Characterization of the four GH12 Endoxylanases from the plant pathogen Fusarium

graminearum. J Microbiol Biotechnol 22, 1118-1126

42. Wierenga, R. K. (2001) The TIM-barrel fold: a versatile framework for efficient

enzymes. FEBS Lett 492, 193-198

248

43. Krissinel, E., and Henrick, K. (2004) Secondary-structure matching (SSM), a new

tool for fast protein structure alignment in three dimensions. Acta Crystallographica

Section D-Biological Crystallography 60, 2256-2268

44. Shirai, T., Ishida, H., Noda, J., Yamane, T., Ozaki, K., Hakamada, Y., and Ito, S.

(2001) Crystal structure of alkaline cellulase K: insight into the alkaline adaptation of

an industrial enzyme. J Mol Biol 310, 1079-1087

45. Tsukimoto, K., Takada, R., Araki, Y., Suzuki, K., Karita, S., Wakagi, T., Shoun, H.,

Watanabe, T., and Fushinobu, S. (2010) Recognition of cellooligosaccharides by a

family 28 carbohydrate-binding module. FEBS Lett 584, 1205-1211

46. Chung, D., Farkas, J., and Westpheling, J. (2013) Overcoming restriction as a barrier

to DNA transformation in Caldicellulosiruptor species results in efficient marker

replacement. Biotechnol. Biofuels 6, 82

47. Davies, G. J., Mackenzie, L., Varrot, A., Dauter, M., Brzozowski, A. M., Schulein,

M., and Withers, S. G. (1998) Snapshots along an enzymatic reaction coordinate:

analysis of a retaining beta-glycoside hydrolase. Biochemistry 37, 11707-11713

48. Shirai, T., Ishida, H., Noda, J., Yamane, T., Ozaki, K., Hakamada, Y., and Ito, S.

(2001) Crystal structure of alkaline cellulase K: Insight into the alkaline adaptation of

an industrial enzyme. J Mol Biol 310, 1079-1087

249