The Pennsylvania State University

The Graduate School

Intercollege Graduate Degree Programs

MULTITROPHIC INTERACTIONS CONTRIBUTE TO WOOD DIGESTION AND

NUTRITIONAL ECOLOGY IN LARVAL ANOPLOPHORA GLABRIPENNIS (ASIAN

LONGHORNED BEETLE)

A Dissertation in

Genetics

by

Erin D. Scully

© 2013 Erin D. Scully

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

December 2013

The dissertation of Erin D. Scully was reviewed and approved* by the following:

Dr. Kelli Hoover Professor of Entomology Dissertation Co-Advisor

Dr. John Carlson Professor of Ecosystem and Management Dissertation Co-Advisor

Dr. Dawn Luthe Professor of Plant Stress Biology Dissertation Chair

Dr. Scott Geib Research Entomologist, USDA-ARS-PBARC Special Committee Member

Dr. David Geiser Professor of Plant Pathology

Dr. Robert Paulson Professor of Veterinary and Biomedical Sciences Head of the Genetics Program

*Signatures are on file in the Graduate School

ABSTRACT

The Asian longhorned beetle (Anoplophora glabripennis) is an exotic, wood-boring pest that was first detected in the northeastern United States in 1996. A. glabripennis has a broad host range and can feed and grow in the heartwood of about 25 different deciduous tree species in the US. Its ability to thrive in a broad range of healthy host trees is particularly impressive considering that most other wood-boring insects with broad host ranges feed on stressed or dying trees whose woody, intractable components have been pre-digested by wood-degrading bacteria and fungi. In contrast, A. glabripennis harbors a diverse gut microbial community hypothesized to provide key enzymes for lignocellulose digestion and nutrient acquisition that help this insect overcome the challenges of survival in healthy host trees. To investigate the contribution of gut microbes to digestive physiology, we surveyed the inherent digestive capabilities of A. glabripennis through midgut transcriptome sequencing, inventoried the taxonomic identity and metabolic potential of gut microbial affiliates through metagenome and metatranscriptome profiling, and analyzed the wood-degrading potential of a filamentous fungal taxon (Fusarium solani) consistently found in association with the A. glabripennis midgut using multidimensional protein identification techniques. While A. glabripennis endogenously produces a diverse array of cellulases belonging to three glycoside hydrolase families, xylanases, detoxification enzymes, and proteins involved in nitrogen and nutrient scavenging, its suite of digestive and nutrient scavenging abilities is greatly expanded through its affiliation with gut microbes. Metagenome and metatranscriptome profiling revealed that the gut community has the capacity to produce cellulases, xylanases, and enzymes involved in nitrogen fixation and recycling, vitamin and sterol synthesis, and lignin degradation. Moreover, secretome analysis of the beetle gut isolate of F. solani cultivated on wood chips revealed the production of several laccases, peroxidases, and accessory enzymes previously implicated in lignin depolymerization, suggesting that this isolate could have lignin-degrading capabilities in the beetle gut. Both insect- and microbial-derived digestive enzymes represent novel targets that could be exploited for control of this invasive beetle. Overall, this research lays the foundation for unraveling the complex interactions between cerambycids and their gut microbes that contribute to survival in an environment devoid of easily accessible nutrients, allowing us to mine for novel biochemical pathways that could be exploited to enhance industrial cellulosic ethanol production and to develop novel targets for pest management.

TABLE OF CONTENTS

List of Figures ...... ix

List of Tables ...... xii

Acknowledgements ...... xiv

Chapter 1 Introduction ...... 1

Background ...... 3
Nutritional Composition of Woody Tissue ...... 3
Symbiosis in Xylophagous Insects ...... 4
Endogenous Enzymes of Xylophagous Insects and their Roles in Nutrient Acquisition ...... 5
Host Plant Resistance: Impacts on Insects and their Symbionts ...... 7
Life History of (ALB) Anoplophora glabripennis ...... 9
Microbes Associated with the Midgut of Anoplophora glabripennis ...... 10
Experiments ...... 12
Chapter 2. Comparative Metagenomic Profiling Reveals Lignocellulose Degrading System in Microbial Community Associated with a Wood-Feeding Beetle ...... 13
Chapter 3. Proteomic Analysis of Fusarium solani Isolated from the Asian Longhorned Beetle, Anoplophora glabripennis ...... 14
Chapter 4. Midgut Transcriptome Profiling of Anoplophora glabripennis, a Lignocellulose-Degrading, Wood-Boring Cerambycid ...... 15
Chapter 5. Metatranscriptome Analysis and Community Profiling of Microbes Associated with the Asian Longhorned Beetle (Anoplophora glabripennis) Midgut: Insights into Insect-Microbe Interactions and Nutritional Ecology ...... 16
Supplemental Chapter: Phylogenetic Analysis of Fusarium solani Associated with the Asian Longhorned Beetle, Anoplophora glabripennis ...... 17
Literature Cited ...... 17

Chapter 2 Comparative Metagenomic Profiling Reveals Lignocellulose Degrading System in Microbial Community Associated with a Wood-Feeding Beetle ...... 29

Abstract ...... 29
Introduction ...... 30
Methods ...... 34
Preparation of Insect Cell Free DNA for Community Profiling and Shotgun Sequencing ...... 34
454 Amplicon Pyrosequencing to Taxonomically Identify Microbes Associated with the A. glabripennis Midgut ...... 35
Phylogenetic Binning and Functional Analysis of A. glabripennis Midgut Microbiota Using Shotgun 454 Pyrosequencing ...... 37
Comparisons to Other Herbivore-Related Metagenomes ...... 39
Results and Discussion ...... 41
Taxonomic Classification of OTUs and Shotgun Reads ...... 41
Identification of Cellulose-, Hemicellulose- and Aromatic Compound-Degrading Bacterial Taxa ...... 42
Identification of Fungal Community ...... 43
Functional Profiling of Reads Generated through 454 Shotgun Sequencing ...... 45
Comparison of Functional Domains from Other Herbivore Associated Microbial Communities ...... 45
Candidate Genes for Lignin Degrading Enzymes ...... 48
Candidate Genes for Cellulases and Carbohydrases ...... 51
Candidate Genes for Xylose Utilization and Fermentation ...... 52
Candidate Genes for Pectin Degrading Enzymes ...... 54
Candidate Genes for Nutrient Acquisition and Synthesis ...... 54
Candidate Genes from Fusarium ...... 59
Candidate Genes from Leuconostoc ...... 60
Conclusion ...... 61
Acknowledgements ...... 62
Literature Cited ...... 64

Chapter 3 Proteomic Analysis of Fusarium solani isolated from the Asian longhorned beetle, Anoplophora glabripennis ...... 91

Abstract ...... 91
Introduction ...... 92
Materials and Methods ...... 97
Source of Larval A. glabripennis Associated F. solani Culture ...... 97
Solid Wood Fungal Cultures and Fungal Extraction ...... 98
MudPIT Analysis ...... 99
In vitro Cellulase and Xylanase Activities ...... 101
Zymogram Analysis ...... 102
In vitro Lignin Peroxidase, Manganese Peroxidase, and Laccase Activities ...... 103
Non-denaturing PAGE and Heme Staining ...... 104
Results ...... 105
MudPIT Analysis ...... 105
Verification of Enzyme Activity Through In vitro Lignocellulase Assays ...... 108
Verification of Enzyme Presence and Activity Through PAGE Gel Analysis ...... 109
Discussion ...... 110
Acknowledgements ...... 118
Literature Cited ...... 119

Chapter 4 Midgut Transcriptome Profiling of Anoplophora glabripennis, a Lignocellulose-Degrading, Wood-Boring Cerambycid ...... 141

Abstract ...... 141
Introduction ...... 142
Materials and Methods ...... 145
454 Transcriptome Analysis of A. glabripennis Larvae Feeding on a Suitable Host ...... 145
Comparison to Other Insect Gut Transcriptome Libraries to Identify Groups of ESTs Associated with Feeding in Wood ...... 147
Multivariate Transcriptome Library Comparisons ...... 148
Phylogenetic Analysis ...... 149
Identification of Highly Expressed Genes in the A. glabripennis Midgut ...... 150
Results and Discussion ...... 152
454- and Illumina-Based Transcriptome Sequencing ...... 152
Glycoside and Plant Cell Wall Digesting Enzymes ...... 156
Glycoside Hydrolases and Plant Cell Wall Digesting Enzymes ...... 161
Transcripts Predicted to Encode Enzymes Involved in Nitrogen Acquisition ...... 164
Transcripts Involved in Facilitating Interactions with Gut Microbes ...... 166
Identification of Highly Expressed Genes ...... 167
Profile Comparisons ...... 168
Multivariate Transcriptome Comparisons of Gene Ontology Annotations ...... 171
Conclusions ...... 173
Acknowledgements ...... 175
Literature Cited ...... 176

Chapter 5 Metatranscriptome Analysis and Community Profiling of Microbes Associated with the Asian Longhorned Beetle (Anoplophora glabripennis) Midgut: Insights into Insect-Microbe Interactions and Nutritional Ecology ...... 213

Abstract ...... 213
Introduction ...... 214
Methods ...... 218
Characterization of Fungal and Bacterial Midgut Communities Using 16S and ITS Amplicon Sequencing ...... 218
Operational Taxonomic Unit (OTU)-Based Analysis of 16S and ITS Amplicons ...... 220
Gut pH Analysis ...... 222
Metatranscriptome Sequencing ...... 223
Results and Discussion ...... 228
Bacterial Community Structure ...... 228
Identification of Fungal Community ...... 232
Gut pH Profile ...... 235
Assembly Metrics ...... 235
Detection of Microbial Small Ribosomal Subunits (16S/18S) and Large Ribosomal Subunits (23S/28S) in Whole Gut and Gut Contents Assemblies ...... 236
Bacterial 16S and Fungal ITS OTUs Detected in Metatranscriptome Data ...... 238
Annotation Metrics ...... 240
GO Enrichment and KEGG Pathway Analysis ...... 244
Transcripts with Predicted Involvement in Xylose Utilization ...... 251
Transcripts Predicted to Originate from Fusarium ...... 252
Transcripts Derived from Yeasts ...... 257
Lactic Acid Bacterial Transcripts ...... 260
Conclusions ...... 264
Acknowledgements ...... 267
Literature Cited ...... 269

Chapter 6 Conclusions and Future Directions ...... 309

Literature Cited ...... 322

Appendix A Phylogenetic Analysis of Fusarium solani Associated with the Asian Longhorned Beetle, Anoplophora glabripennis ...... 326

Abstract ...... 326
Introduction ...... 327
Materials and Methods ...... 331
Rearing Colony Derived A. glabripennis on Different Host Trees ...... 331
Collection of Larval A. glabripennis and Fungal Cultures from Introduced Wild Populations ...... 332
Anoplophora glabripennis Larval Gut Dissection and DNA Extraction ...... 333
Culture Independent Fungal Community Analysis ...... 334
Aerobic Culturing of Gut Fungus on Restrictive Media ...... 336
Multi-Locus Sequencing from Cultured Fungal DNA Extraction ...... 337
Sequence Editing, Alignment and Operational Taxonomic Unit (OTU) Analysis of TEF1-α ...... 338
Single- and Multi-locus Phylogenetic Analysis ...... 339
Results ...... 341
Culture Independent Fungal Community Analysis ...... 341
Culture-Dependent Fungal Analysis ...... 342
Discussion ...... 344
Conclusions ...... 346
Acknowledgements ...... 347
Literature Cited ...... 348

Appendix B Supplemental Tables and Figures ...... 363

LIST OF FIGURES

Figure 2-1. Rarefaction curve of 16S rRNA amplicons ...... 83

Figure 2-2. Maximum likelihood analysis of representative sequences from operational taxonomic unit analysis (OTU) of bacterial 16S rRNA amplicons...... 84

Figure 2-3. Rarefaction, richness, and diversity analyses of 18S amplicon data

Figure 2-4. Distribution of SEED assignments generated by MG-RAST ...... 86

Figure 2-5. Hierarchical cluster analysis based on Pfam annotations of herbivore related metagenomes ...... 87

Figure 2-6. Principal components analysis (PCA) of Pfam domains from herbivore- related metagenomes ...... 88

Figure 2-7. Distribution of glycoside hydrolase families found in the A. glabripennis gut metagenome ...... 89

Figure 2-8. Xylose utilization pathway present in the A. glabripennis gut community ...... 90

Figure 3-1. Enrichment of GO Molecular Function terms in proteomic analysis...... 139

Figure 3-2. Heme staining and zymogram analysis of A. glabripennis derived F. solani solid wood culture extract ...... 140

Figure 4-1. Histogram of isotig lengths generated from 454 FLX reads...... 199

Figure 4-2. Histogram of transcript lengths generated from 454 FLX and Illumina paired end reads using Trinity ...... 200

Figure 4-3. Relative abundance of the 25 most abundant Pfam domain assignments in the A. glabripennis midgut transcriptome assembly ...... 201

Figure 4-4. KOG assignments for midgut unigenes ...... 202

Figure 4-5. Distribution of glycoside hydrolase families found in the A. glabripennis midgut transcriptome...... 203

Figure 4-6. Proposed mechanisms for direct utilization of ammonia detected in the A. glabripennis midgut ...... 204

Figure 4-7. Structure of hemocyanin enzymes detected in the A. glabripennis midgut transcriptome ...... 205

Figure 4-8. Multivariate comparison of glycoside hydrolase families detected in the gut transcriptomes of herbivorous insects ...... 206

Figure 4-9. Phylogenetic analysis of GH 5 cellulases detected in the Coleoptera...... 207

Figure 4-10. Phylogenetic analysis of GH 48 cellulases detected in the Coleoptera ...... 209

Figure 4-11. Phylogenetic analysis of GH 45 cellulases detected in the Coleoptera ...... 210

Figure 4-12. Two-way cluster analysis of level four gene ontology terms from herbivorous insect gut transcriptomes ...... 211

Figure 5-1. Enlarged midgut associated with A. glabripennis and filamentous microbes observed in association with the midgut epithelial cells ...... 292

Figure 5-2a. Rarefaction analysis of 16S bacterial communities sampled from four 3rd instar A. glabripennis larvae feeding on sugar maple ...... 293

Figure 5-2b. Rarefaction analysis of 16S bacterial communities sampled from four 3rd instar A. glabripennis larvae feeding on sugar maple with singleton OTUs removed ..... 293

Figure 5-3. Relative abundances of bacterial classes detected through 16S community analysis of four 3rd instar A. glabripennis larvae...... 295

Figure 5-4a. Venn diagram at distance 0.03 of OTUs detected through 16S sequencing of four gut bacterial communities sampled from third instar A. glabripennis larvae ...... 296

Figure 5-4b. Venn diagram at distance 0.03 of OTUs detected through 16S sequencing of four gut bacterial communities sampled from third instar A. glabripennis larvae with singleton OTUs removed ...... 296

Figure 5-5. Hierarchical cluster analysis of bacterial families detected through 16S amplicon sequencing of four A. glabripennis larval midguts ...... 297

Figure 5-6a. Rarefaction analysis of ITS fungal amplicons sequenced from four 3rd instar larval A. glabripennis midguts ...... 298

Figure 5-6b. Rarefaction analysis of ITS fungal amplicons sequenced from four 3rd instar larval A. glabripennis midguts with singleton OTUs removed ...... 298

Figure 5-7. Abundance of fungal orders detected in ITS amplicon data sampled from communities associated with four 3rd instar larval A. glabripennis midguts ...... 300

Figure 5-8a. Venn diagram illustrating overlap in ITS OTUs at a distance of 0.03 among four larval A. glabripennis midgut communities ...... 301

Figure 5-8b. Venn diagram illustrating overlap in ITS OTUs at a distance of 0.03 among four larval A. glabripennis midgut communities with singletons removed ...... 301

Figure 5-9. Maximum likelihood analysis of fungal ITS sequences taxonomically assigned to Fusarium solani detected in A. glabripennis larval midguts ...... 303

Figure 5-10. Bacterial and fungal classes detected in the midgut and midgut contents libraries...... 304

Figure 5-11. Relative abundances of bacterial clusters of orthologous genes found in the midgut and midgut contents libraries ...... 305

Figure 5-12. Relative abundances of fungal clusters of orthologous genes (KOGs) in the midgut and midgut contents libraries ...... 306

Figure 5-13. Pathways for pyruvate utilization detected in the A. glabripennis midgut and in the gut communities ...... 307

Figure 5-14. Partial pathways for aromatic amino acid biosynthesis and biosynthesis of other essential amino acids detected in the gut community ...... 308

Figure 5-15. Putative pathways for xylose utilization based on BLASTX annotation of transcripts sampled from the midgut microbial community of larval A. glabripennis .... 309

LIST OF TABLES

Table 2-1. Summary of Newbler metagenome assembly metrics ...... 78

Table 2-2. Species richness and diversity calculations for bacterial OTUs detected in the A. glabripennis gut ...... 79

Table 2-3. Species richness and diversity calculations for fungal OTUs detected in the A. glabripennis gut ...... 80

Table 2-4. Summary of metagenome annotations ...... 81

Table 2-5. The most highly abundant glycoside hydrolase families detected in the gene tag annotations and their associated KEGG classifications ...... 82

Table 3-1. MudPIT summary data ...... 129

Table 3-2. Most abundant InterPro IDs identified in MudPIT analysis ...... 130

Table 3-3. Glycoside hydrolase families detected in MudPIT analysis ...... 131

Table 3-4. Other plant cell wall degrading proteins from MudPIT analysis...... 133

Table 3-5. Proteins associated with lignin from MudPIT analysis...... 134

Table 3-6. Proteinases and nitrogen-recycling proteins identified from MudPIT analysis ..... 136

Table 3-7. Verification of lignocellulolytic activity of A. glabripennis derived F. solani solid wood extract cultures through in vitro assays ...... 138

Table 4-1. 454 and Sanger EST gut transcriptome libraries from herbivorous insects ...... 190

Table 4-2. Transcriptome assembly and annotation metrics from herbivorous insects included in glycoside hydrolase and Pfam comparisons ...... 191

Table 4-3. Assembly metrics for A. glabripennis midgut transcriptome assembly generated using 454 reads only ...... 192

Table 4-4. Annotation statistics for A. glabripennis midgut transcriptome ...... 193

Table 4-5. Assembly metrics for A. glabripennis midgut transcriptome Illumina-454 co- assembly ...... 194

Table 4-6. Annotation statistics for A. glabripennis midgut transcriptome Illumina-454 hybrid assembly ...... 195

Table 4-7. Cytochrome P450 annotations from A. glabripennis midgut transcriptome ...... 196

Table 5-1. Ecological indices for 16S bacterial communities sampled from midguts from 4 individual A. glabripennis larvae feeding on sugar maple ...... 282

Table 5-2. Taxonomic classifications of 16S bacterial OTUs detected in all A. glabripennis larval guts sampled ...... 283

Table 5-3. Ecological indices for ITS fungal communities sampled from the guts of individual third instar A. glabripennis larvae feeding on the heartwood of sugar maple ...... 284

Table 5-4. Trinity assembly metrics for gut contents and whole gut transcriptome libraries...... 285

Table 5-5. Taxonomic classification of microbial rRNAs detected in the gut contents and whole gut transcriptome libraries ...... 286

Table 5-6. Taxonomic identity of 16S OTUs supported by expression data ...... 288

Table 5-7. Annotation statistics for microbial transcripts detected in the gut contents and whole gut libraries ...... 289

Table 5-8. Number of unique KO terms found in KEGG pathways associated with carbon metabolism, nitrogen acquisition and amino acid synthesis, nutrient acquisition, and detoxification...... 290

ACKNOWLEDGEMENTS

I would like to extend my utmost gratitude to my family, friends, advisors, committee members, co-workers, and fellow graduate students for their endless support, guidance, and patience. A special thank you to my boyfriend, Andy, who not only helped me format my dissertation when I finally decided I was fed up with Microsoft Office templates, but has also been a source of constant encouragement in my journey through graduate school. Of course, I must thank my wonderful cats, Rivers and Annie, who always made my life awesome, encouraged me to take plenty of naps, and made extensive contributions to my writing and data analysis. Another special thank you to my dad Dennis (“D”), my mom Leslie, my sister Sarah, my brother Kevin, and the rest of the Scully/Heubel/Clark clan for supporting my decision to obtain my Ph.D. and for encouraging me to pursue my dreams.

Next, I would like to thank each and every member of my dissertation committee (Kelli, John, Scott, Dawn, and David) for advising me through data analysis and the publication processes, encouraging me to apply new and innovative approaches to test hypotheses and address research questions, and providing me with the opportunities to develop new laboratory and data analysis skills. I would also like to extend a special thank you to my co-advisors, Kelli and John, for always having faith in my abilities and for providing me with encouragement and support when I doubted my abilities. I am also very grateful to Scott Geib, who mentored me through much of the bioinformatics and statistical analyses that were used in my experiments, encouraged me to develop myself as a multidisciplinary scientist, and provided me with access to the computational resources that were necessary to complete many of my research chapters.

Thank you to our collaborator, Ming Tien, for providing valuable insights into manuscript preparation and for assistance with all things .

Next, I would like to extend a heartfelt thank you to the Hoover lab. In particular, I would like to thank our undergraduate research assistants, Karen Bingham, Katie Mulfinger, and Fran McCullough, our lab technicians, Liz McCarthy and David Long, and our former postdocs, Drs. Jim McNeil and Maya Nehme, who provided laboratory and insect rearing support to make this project possible. I would also like to thank Dr. Cristina Rosa for her enlightening discussions on molecular biology techniques, data analysis and interpretation, the finer points of publication, and career opportunities in the sciences. I would also like to thank all Genetics and Entomology students for making grad school really awesome and for providing very wonderful stress-relieving activities. I am not sure how I would have made it through grad school without our weekly trips to Zeno’s for Quizzo and our random downtown excursions/adventures in LSB! Additionally, I would like to especially thank the former and current chairs of the Genetics program, Dr. Richard Ordway and Dr. Robert Paulson, for all of their academic and career guidance and for providing financial support for travel opportunities to attend conferences and to work with off-site collaborators on computational data analysis.

Finally, I would like to thank the Alphawood Foundation, USDA-NIFA, the USDA-AFRI Microbial Genomics Training Program, the Genetics program, the Department of Entomology, the Huck Institutes of the Life Sciences, the College of Agricultural Sciences, and the Department of Energy-Joint Genome Institute for providing financial support for this research.

Chapter 1

Introduction

Current experts estimate that up to 99% of microbes have yet to be cultured in vitro [1]; despite the fastidious nature of many microbes, the uncultured fractions of microbial communities may harbor unique repertoires of metabolically and physiologically distinct genes that have yet to be characterized. Indeed, many discoveries of novel genes and biochemical processes have arisen through culture-independent functional screens and metagenomic shotgun sequencing studies. For example, the discoveries of novel bacteriorhodopsin photosynthetic genes in marine bacteria [2], metabolism of phosphorus by microbial communities in sludge bioreactors [3, 4], and detoxification of heavy metals by microbes thriving in acid mine drainage sites [5] have all arisen through metagenomic approaches. Some of the most intriguing microbial communities are those associated with the guts of vertebrates and invertebrates, as they may harbor unique metabolic capabilities that contribute to host physiology.

Many herbivorous insects maintain intricate relationships with bacterial, archaeal, fungal, and protistal microbes that often generate enzymes that augment or enhance endogenous insect host metabolic processes or produce enzymes that complement host metabolic pathways or supplant metabolic deficiencies [6]. Phytophagous insects typically harbor two types of symbionts: primary symbionts, which tend to be obligate, intracellular, and vertically transmitted from mother to offspring, and secondary symbionts, which tend to be facultative and extracellular, are often harbored in the gut or in specialized external structures called mycangia, and can be either vertically transmitted from mother to offspring or horizontally transferred throughout the population [7]. Despite these differences, both primary and secondary symbionts have the potential to make integral contributions to host digestive and metabolic processes and influence host biology [8, 9]. In wood-feeding insects, a combination of primary and secondary symbionts can be found in association with the digestive system, where they can facilitate digestion of lignin, cellulose, hemicellulose, and other wood polysaccharides, ferment wood sugars into compounds that can be directly used for energy and fatty acid production, fix atmospheric nitrogen, recycle nitrogenous waste products, synthesize and pheromones, and synthesize vitamins and other nutrients that are absent or present in small quantities in woody tissue [10-12]. While communities associated with wood-feeding insects have the potential to harbor enzymes that could be exploited to enhance the industrial production of cellulosic ethanol [13, 14], surprisingly few studies on the microbial communities of wood-feeding insects have been performed, with the exception of termites [10, 15] and bark beetles [16]. As a result, little is known about how collaborations between microbes and insects that colonize and feed in living trees contribute to the digestion of woody tissues. Many of these xylophagous insects are notorious pests of deciduous trees and attack tree species commonly planted as biofeedstock for biofuel production (e.g. Populus and Salix) [17-20]. Understanding these interactions could lead to novel targets for biocontrol of these wood-feeding pests and discovery of efficient lignocellulosic and hemicellulosic enzymes that could be exploited for production of biofuels.

Insects belonging to the family Cerambycidae attack a broad range of deciduous host trees, have pivotal roles in nutrient cycling, and pose substantial threats to urban and forest trees. The Asian longhorned beetle (Anoplophora glabripennis; ALB) is an exotic, wood-boring insect that was first detected in the United States in 1996. It has a relatively broad host range compared to other cerambycids and is often found colonizing healthy host trees. Microbes found in association with the ALB gut may augment the beetle’s digestive capabilities and compensate for its deficiencies, allowing the beetle to thrive on a nutrient-deficient substrate by enabling digestion of recalcitrant compounds, including lignin, cellulose, and hemicellulose; detoxification of tree secondary metabolites and phenolic defensive compounds; and de novo synthesis of nutritional compounds deficient or absent from the feeding substrate, including vitamins, fatty acids, sterols, amino acids, and ammonia (from atmospheric nitrogen) [6]. The consortium of microbes associated with the gut likely acts in tandem with the insect, enabling this beetle to thrive in a broad range of deciduous host trees.

Background

Nutritional Composition of Woody Tissue

Carbohydrates in wood are present in the form of complex polysaccharides, including cellulose, xylan, and pectin, which are inherently difficult to digest and require a complex of enzymes for efficient degradation [21]. Cellulose alone requires a suite of three enzyme classes for complete conversion to glucose: endocellulases, exocellulases (cellobiohydrolases), and β-glucosidases [22]. While insects are capable of producing their own endocellulases and β-glucosidases, no known multicellular organism harbors an intrinsic ability to produce exocellulases; instead, these organisms rely on microbial symbionts for this component of the cellulase complex [23]. While the most abundant cell wall polysaccharide in woody tissue from hardwood tree species is cellulose, a significant amount of hemicellulose is also localized in the heartwood. Unlike cellulose, which is a linear polymer of glucose, hemicellulose is more heterogeneous in nature and is composed of a mixture of heteropolymers. A variety of five- and six-carbon sugars, including xylose, arabinose, glucose, galactose, mannose, and rhamnose, make up these hemicellulose heteropolymers, forming xylan, glucuronoxylan, arabinoxylan, glucomannan, and xyloglucan polysaccharides. Hardwood hemicellulose is dominated by xylan, a branched polysaccharide predominantly composed of xylose; although branching decreases crystallinity, allowing hydrolytic enzymes to gain access to polysaccharide termini, xylose sugars are difficult to process and ferment. Therefore, insects generally cannot use xylose sugars liberated from xylan as a source of energy without the aid of microbes. Plant polysaccharides in hardwood trees are further protected from hydrolytic enzymes by a recalcitrant lignin barrier, a biopolymer containing over 12 types of chemical bonds formed through radical oxidative polymerization [24]. Due to the random, heterogeneous nature of these crosslinkages, this macromolecule can only be efficiently catabolized through radical oxidative depolymerization, a process that has been conclusively documented and well characterized only for enzymes produced by a small number of wood-degrading fungi [25, 26]. Nitrogen is extremely limiting in this environment [27, 28], and protein sources originating from plant cell wall proteins are intricately cross-linked with recalcitrant plant cell wall polysaccharides and biopolymers [29], while other dietary components, including fatty acids, sterols, and vitamins, are absent altogether [6]. Lastly, toxic secondary metabolites produced by the host tree in response to herbivory or toxins that have accumulated in the heartwood must be detoxified or sequestered in order for the insect to continue to feed in this environment [30].

Symbiosis in Xylophagous Insects

Different wood-boring insects have different mechanisms for overcoming nutritional challenges associated with feeding in woody tissue. While some insects, including mountain pine beetles and ambrosia beetles [31, 32], carry fungi in external mycangia that are inoculated into hosts to synthesize sterols, detoxify plant allelochemicals, and degrade lignocellulose [11, 33, 34], others rely on ingested lignocellulases and other enzymes produced by wood-rot fungi that have previously colonized host trees [35]. Yet others overcome these challenges by preferentially targeting stressed or dying host trees, whose woody, intractable components have been pre-digested by wood-degrading bacteria [36] and fungi, and/or acquire other nutrients deficient from these host trees by directly feeding on the microbial inoculum [37]. Others harbor microbes in salivary glands [34] or in the midgut [11] that help detoxify plant secondary metabolites, digest woody tissue, concentrate nitrogen, and synthesize vitamins, pheromones, and other nutrients. Despite these different strategies, one commonality is that the ability to thrive in woody tissues involves some kind of interaction between the insect and microbial affiliates, with both parties making integral contributions to digestion and nutrient acquisition [38]. In general, stenophagous insects that target a relatively narrow range of host trees tend to have simple and static microbial communities in comparison to insects with a broad range of host trees, whose communities can be extremely complex and dynamic. In fact, the diversity of microbes associated with ALB is much greater than that associated with another cerambycid (Saperda vestita; linden borer), a wood borer with a very restricted host range, suggesting that the consortium of microbes associated with the ALB gut may contribute to its expansive host range [39]. In addition, the ALB gut community displays some degree of plasticity, as the types of microbes found in association with the gut vary depending on host tree species/population; this plasticity may contribute to ALB’s ability to colonize and thrive in a variety of host tree species [40].

Endogenous Enzymes of Xylophagous Insects and their Roles in Nutrient Acquisition

Twenty years ago, many scientists surmised that wood-feeding and other herbivorous insects relied exclusively on microbes for the production of cellulases, pectinases, and xylanases for efficient digestion of plant cell wall materials [41] and that microbes associated with insects were solely responsible for neutralizing the majority of the challenges associated with feeding on living plants [6]. However, experiments with gnotobiotic insects [42, 43] and a combination of high-throughput sequencing approaches [44, 45] and molecular cloning [34, 45-50] led to the discovery of insect-derived plant cell wall polysaccharide degrading enzymes, including endogenous cellulases, clearly demonstrating that insects can have crucial roles in the breakdown of plant cell wall material. In addition, insects are prolific producers of cytochrome P450s [51], which are integrally involved in xenobiotic oxidoreductive processes that ultimately lead to oxidative destruction of toxic compounds, including plant-derived allelochemicals and pesticides [52, 53]. Insects have also evolved sophisticated abilities to evade host plant defenses [54], including circumvention of several of the most widespread lines of anti-herbivore defense: digestive proteinase inhibitors [54], cyanates and cyanoamino acids [55], and jasmonic acid induced defensive pathways [56, 57]. Insect genomes also often encode extensive suites of enzymes involved in detoxification. Furthermore, physiological conditions in the midgut make strong contributions to digestibility and to neutralizing the impacts of host tree defensive chemicals. For example, alkaline conditions in the midgut are hypothesized not only to inhibit crosslinking between tannins and insect proteins, which would otherwise inactivate critical insect digestive proteins, but also to help solubilize fractions of the lignin biopolymer, separating the biopolymer from more digestible carbohydrate resources and indirectly enhancing digestibility of cell wall polysaccharides embedded and cross-linked in the lignin matrix.

Mastication also breaks the lignin polymer into smaller, more easily solubilized pieces and releases polysaccharide termini from the lignin matrix, where they can be easily accessed by hydrolytic enzymes. Although previous studies have demonstrated the presence of endogenous endoglucanases and β-glucosidases produced by wood-boring cerambycids and have shown that some regions of the gut are highly alkaline, no large-scale studies have been performed to characterize the endogenous digestive and physiological capacities of any cerambycid.

Host Plant Resistance: Impacts on Insects and their Symbionts

Co-evolution between insects and their host plants has yielded a variety of plant chemical defenses designed to mitigate herbivory. These chemical defenses can have direct impacts by negatively influencing life history traits, including fecundity, survival, and development, or indirect impacts by attracting predators and parasitoids or by influencing behavior, causing the insect to avoid the plant entirely [58]. Many of these chemical defenses are regulated by either the jasmonic acid (JA) or salicylic acid (SA) pathways [59]. Historically, it was theorized that chewing insects induced JA-mediated defensive pathways, sucking/piercing insects induced either JA- or SA-mediated pathways, and microbes induced SA-mediated pathways [60]; however, recent evidence suggests that there are exceptions to this generalization and that some insects are capable of inducing JA/SA cross-talk [56]. While induction of these pathways was originally attributed to mechanical damage and release of reactive oxygen species from cell membranes, recent evidence also suggests that these pathways can be activated by effectors present in oral secretions, also known as herbivore-associated molecular patterns (HAMPs) [61]. While interactions with HAMPs allow plants to recognize specific herbivores, the response to HAMPs is universal and results in the activation of JA-mediated defenses [62].

The downstream effects of JA-pathway activation often include the release or activation of toxins and other defensive chemicals. Defensive chemicals found in association with plants can have immediate impacts on their herbivore targets through disruption of membranes, inhibition of nutrient transport, inhibition of signal transduction pathways, interference with metabolism and other physiological processes, and disruption of hormonal signaling. Common types of chemical defensive compounds include saponins, which destroy membranes; toxins capable of blocking voltage-gated potassium channels, which inhibit membrane repolarization after an action potential; non-protein amino acids, including compounds that mimic γ-aminobutyric acid (GABA) and interfere with neurotransmission; cyanogenic glycosides; glucosinolates, which spontaneously rearrange into isothiocyanates and nitriles when tissue is damaged; phytoecdysteroids, which mimic insect hormones and interfere with molting processes; terpenoids; and phenolic glycosides [58]. In addition, many plants can directly impair insect digestive proteins through the production of digestive enzyme inhibitors. Prominent targets of plant defensive chemicals include proteinases, which function in both intracellular signaling and digestive processes [63]. In short, the impacts of plant defensive compounds on phytophagous insects are as diverse as the types of defensive chemicals produced by plants, yet many theorize that constitutive defenses more fully explain resistance than induced defenses since they are associated with a lower fitness cost and can be efficiently mobilized and activated [64]. However, the role of induced defenses can be quite profound in some systems [65].

In tandem, insects have evolved numerous counter-strategies to combat plant defensive chemicals. For example, insect genomes house an arsenal of detoxification enzymes, many of which are capable of destroying or inactivating some of the toxins produced by host trees [66, 67]. In addition, insects often produce many different types of digestive enzymes to mitigate the impacts of digestive enzyme inhibitors produced by host plants. For example, many insects respond to the presence of proteinase inhibitors by either overexpressing proteinases [68] or by producing a different type of proteinase whose integrity and activity are not directly influenced by the inhibitor [69]. Recent evidence also indicates that microbes associated with insects can directly manipulate plant defenses by downregulating herbivore-defensive pathway genes, allowing insects to persist and feed on the host plant [70] (Chung et al., 2013, in press); can provide an alternate source of digestive enzymes in the event that an insect’s endogenous digestive enzymes are inhibited by plant defensive compounds [71]; and can directly detoxify plant defensive compounds [72].

Despite the important contributions that microbial symbionts make to host digestive processes, little is known about the potential impacts of host plant resistance on symbiotic microbes in phytophagous insects. However, microbes that make contributions to digestive physiology and nutrient acquisition could become targets for plant defenses, which could act by interfering with complementary pathways in insects and their gut microbes that are essential for digestive, metabolic, and physiological processes or by disrupting the molecular cross-talk and signaling pathways that promote these interactions. Furthermore, little is known about what drives resistance of host trees to wood-boring pests. Although it has been previously hypothesized that phenolic glycosides (PGs) confer resistance to wood-borers in Pyrus spp. [73] and to beetles in Salix spp. [74], direct evidence linking PGs to negative impacts on wood-borers has yet to be established. Wood-borers are especially challenging to control, so identifying traits associated with resistance is key to developing more effective approaches to control these insects. With the increasing prevalence of pesticide resistance and the lack of natural predators available for biocontrol of some of the most destructive invasive species, targeting microbial symbionts that perform essential services could provide alternative and potent mechanisms for control.

Life History of Anoplophora glabripennis (ALB)

Beetles in the subfamily Lamiinae are distinct from other wood-boring insects and beetles in that many members of this subfamily preferentially target healthy host trees and have relatively broad host ranges [39]. For example, the Asian longhorned beetle (Anoplophora glabripennis; ALB) is an exotic, wood-boring insect that was introduced to the northeastern and midwestern United States from China and was first detected in the US in 1996 [36, 75, 76]. This insect poses significant threats to urban streetscapes, forest ecosystems, and poplar biofuel plantations and has the potential to destroy up to 35% of the urban tree canopy [77]. Since its introduction, ALB has been documented to complete development in approximately 25 deciduous tree species in the United States, although Acer spp. are the predominant hosts due, in part, to their abundance in the northeastern and midwestern states where ALB has been detected [76, 78]. Full development, from larva to adult, can last anywhere from one to two years depending on the climate, with longer life cycles associated with cooler climates [79, 80]. Females lay their eggs beneath the bark in the cambium; newly hatched first instar larvae feed almost exclusively in the phloem beneath the bark before tunneling to and feeding in the phloem/immature xylem as second instars and eventually moving into the mature xylem or heartwood as third and fourth instars. Ultimately, pupation occurs deep in the center of the tree after five to eight instars, and mature adults tunnel their way out of the tree, where they feed on twigs and foliage [75]. These insects spend the majority of their lives as larvae, feeding, growing, and developing in a nutrient-deficient environment in the heartwood of healthy host trees. ALB’s ability to thrive in a broad range of healthy host trees is particularly impressive considering that most other wood-boring insects with a similarly broad range feed on stressed or dying trees whose woody, intractable components have been pre-digested by wood-degrading bacteria and fungi. In contrast, ALB harbors a diverse gut microbial community, hypothesized to provide key enzymes for lignocellulose digestion and nutrient acquisition that help ALB overcome challenges of survival in healthy host trees. The larger questions under investigation in our lab are why ALB is able to exploit a broad range of apparently healthy host trees and how larvae overcome nutritional challenges associated with feeding in the nutritionally deficient heartwood.

Microbes Associated with the Midgut of Anoplophora glabripennis

We hypothesize that microbes associated with the ALB gut contribute to digestive and physiological processes in the gut, including lignin degradation, extraction of glucose from plant cell wall polysaccharides, nitrogen and nutrient acquisition, and detoxification, promoting ALB’s success in colonizing and persisting in healthy host trees. Through previous culture-independent approaches, a number of microbial taxa were detected in association with the ALB gut, including members of the following bacterial families: Brevibacteriaceae, Dermabacteriaceae, Enterobacteriaceae, Enterococcaceae, Lactobacillaceae, Microbacteriaceae, Promicromonosporaceae, Rhizobiaceae, Streptomycetaceae, Sphingobacteriaceae, Sphingomonadaceae, and Xanthomonadaceae, and the following fungal taxa: Sordariomycetes and Saccharomycetaceae [40]. In fact, the diversity of microbes associated with ALB is much greater than that associated with another cerambycid, the linden borer (Saperda vestita), a wood borer with a very restricted host range, suggesting that the consortium of microbes associated with the ALB gut may contribute to its expansive host range [39]. In addition, the gut community displays some degree of plasticity, as the types of microbes found in association with the gut vary depending on host tree species; this plasticity may contribute to ALB’s ability to colonize and thrive in a variety of host tree species [40].

Previous research in our lab also conclusively demonstrated that all major reactions associated with large-scale lignin depolymerization occur in the larval ALB gut, including side chain oxidation, demethylation, and ring hydroxylation. The most dominant modification to lignin detected in this analysis was side chain oxidation, a reaction that is often associated with white rot fungal lignin degradation and is not known to be catalyzed by bacterial- or animal-derived enzymes [14, 24]. While white rot fungal isolates have not been detected in association with the ALB gut through either culture-dependent or culture-independent techniques, ALB consistently harbors a filamentous fungal isolate belonging to the Fusarium solani species complex [81]. Members of this complex are thought to be efficient lignin degraders, and some members are capable of producing enzymes that can catalyze the types of lignin degrading reactions observed in the ALB gut [82-85]. Additionally, many Fusarium isolates are producers of laccases and lignin peroxidases, which are lignin degrading enzymes predominantly produced by white rot fungal isolates [86]; thus, this fungal isolate may play a key role in lignin degradation in the ALB gut. Furthermore, cellulose and hemicellulose (xylan) digestion has been detected in association with larvae reared in many host tree species, including sugar maple (Acer saccharum), silver maple (Acer saccharinum), and pin oak (Quercus palustris), and a number of carboxymethylcellulose degrading microbes were successfully cultured from the gut, belonging to the genera Enterobacter, Ochrobactrum, and Pseudomonas and to an unidentified member of the family Rhizobiaceae [40, 87]. Moreover, disruption of the gut microbiota induced by feeding on a cellulose-based artificial diet containing bacteriostatic and fungistatic agents causes a reduction in cellulase enzyme complex activity (endoglucanases, exoglucanases, and β-glucosidases), suggesting that microbes in the gut may, in part, contribute to the production of enzymes involved in cellulose and hemicellulose digestion [40].

Evidence from our previous research indicates that the consortium of microbes living in the gut works in tandem with the insect to overcome challenges associated with living in wood. The primary purpose of this thesis is to make a first attempt at teasing out the molecular-level interactions among ALB, the gut microbial community, and the host tree that contribute to digestive processes, nutrient acquisition, and detoxification and that influence whether or not the beetle can successfully colonize and feed on a host tree species.

Experiments

1. Identify the types of microbes associated with the ALB gut and inventory the metabolic function of the beetle gut microbiota through targeted sequencing of phylogenetic marker genes and shotgun metagenome sequencing, respectively.

2. Determine the wood-degrading potential of an ALB-associated Fusarium solani isolate using shotgun proteomics and in vitro enzyme assays.

3. Assess ALB’s inherent metabolic and physiological capabilities through transcriptome sequencing of insect-derived midgut mRNAs from larvae feeding in heartwood.

4. Survey the metabolically active members of ALB’s gut microbiota through metatranscriptome sequencing.

5. Conduct an intensive phylogenetic survey among Fusarium solani isolates collected from different ALB populations and from insects feeding in different tree species to determine if the same strain is consistently harbored within and between populations.

Chapter 2. Comparative Metagenomic Profiling Reveals Lignocellulose Degrading System in Microbial Community Associated with a Wood-Feeding Beetle

Some of the most intriguing microbial communities are those associated with the guts of vertebrates and invertebrates, as they may harbor unique metabolic capabilities that contribute to host physiology. Microbes found in association with the ALB gut may augment the beetle’s digestive capabilities and compensate for its deficiencies, allowing the beetle to thrive on a nutrient-deficient substrate by enabling digestion of recalcitrant compounds, including lignin, cellulose, and hemicellulose; detoxification of tree secondary metabolites and phenolic defensive compounds; and de novo synthesis of nutritional compounds deficient or absent from the feeding substrate, including vitamins, fatty acids, sterols, amino acids, and ammonia (from atmospheric nitrogen) [6]. The consortium of microbes associated with the gut likely acts in tandem with the insect, enabling this beetle to thrive in a broad range of deciduous host trees. The goal of this chapter is to survey the types of microbes found in association with the ALB gut and to ascertain the metabolic potential of the gut community by determining the types of genes found in this community that may contribute to digestive physiology.

Our hypotheses are that the ALB gut microbiome will contain fungal and bacterial taxa and enzymes (genes) that can degrade lignin, extract sugar monomers from plant cell wall polysaccharides (particularly enzymes that catalyze exo-glucanase type reactions), degrade xylan, degrade toxic plant secondary metabolites, fix atmospheric nitrogen and recycle nitrogenous compounds (nitrogenase and , respectively), and synthesize nutritional components missing from wood, including vitamins, sterols, amino acids, and fatty acids.

Chapter 3. Proteomic Analysis of Fusarium solani Isolated from the Asian Longhorned Beetle, Anoplophora glabripennis.

The most common and well-studied fungal symbionts of beetles are the approximately 25 families of yeasts belonging to the phylum Ascomycota [88]; much effort has been devoted to studying the distribution, phylogeny, and enzymatic activities associated with beetle-related yeasts. Many beetles harbor these yeast symbionts in mycetomes [89], which are specialized cellular structures that house endosymbionts. Many of these endosymbionts are capable of producing cell wall degrading proteins, fermenting xylose and other monomeric sugars associated with cellulose and hemicellulose, concentrating amino acids and proteins, and synthesizing sterols and other compounds that are ultimately incorporated into insect pheromones and hormones [89-91].

Despite the considerable information available on the distribution and potential metabolic capabilities of beetle-derived yeasts, much less is known about the function of non-yeast fungi associated with beetles. The most well-studied association is the relationship between the ambrosia beetle and its Fusarium solani affiliate, which is externally housed in mycangia and produces sterols, fatty acids, and cell wall degrading proteins [31, 33]. We consistently detect F. solani in association with ALB; however, unlike ambrosia beetle fungi, this fungus appears to be associated with the gut, where it may contribute to digestive and physiological processes internally. The goal of this chapter is to assess this isolate’s capabilities to extract nutrients from woody tissue and produce cell wall and lignin degrading enzymes using shotgun proteomics and in vitro enzyme assays. We hypothesize that this fungal isolate will produce enzymes capable of degrading lignin (e.g., lignin peroxidases and laccases), degrading plant cell wall materials, breaking down protein cross-linked in plant cell wall materials, and recycling protein waste.

Chapter 4. Midgut Transcriptome Profiling of Anoplophora glabripennis, a Lignocellulose-Degrading, Wood-Boring Cerambycid

Insects can make paramount contributions to overcoming challenges associated with feeding in plants, including production of endogenous cell wall degrading enzymes, a diverse array of enzymes that could be pivotal in detoxifying host plant defensive compounds, and a suite of nutrient scavenging and recycling enzymes that could allow ALB to persist on a nutritionally deficient food substrate. Thus, in order to understand the contributions of gut microbes to digestive physiology, we must first understand the inherent digestive and physiological capabilities of ALB. Whole genome sequencing would provide the most comprehensive insight into the metabolic capability of ALB, but this can be quite an undertaking in terms of cost and data analysis, as it can take years to achieve a quality draft assembly. Since we are primarily interested in genes expressed during feeding in wood, a feasible alternative is shotgun sequencing of expressed midgut genes [92] in larvae feeding in sugar maple, a preferred host of ALB. My overall hypothesis is that ALB can produce cell wall degrading enzymes, including cellulases and β-glucosidases; enzymes involved in detoxification of host tree allelochemicals, including cytochrome P450s and glutathione S-transferases; and enzymes involved in protein acquisition and nitrogen recycling, such as proteinases and glutamine synthetases, respectively.

Chapter 5. Metatranscriptome Analysis and Community Profiling of Microbes Associated with the Asian Longhorned Beetle (Anoplophora glabripennis) Midgut: Insights into Insect-Microbe Interactions and Nutritional Ecology

The results of metagenome profiling of the microbiota affiliated with the ALB midgut and shotgun proteomics of Fusarium solani grown on wood chips suggest that these microbes have the potential to make integral contributions to the digestion of woody tissue and nutrient extraction in the gut. However, one common criticism of metagenome-based analyses of host-associated symbionts is that transient, inactive microbes that are acquired during feeding and simply pass through the gut without contributing to digestive processes may be sampled for sequencing and included in annotations. A suitable compromise is to focus on transcriptionally active microbes in the gut using metatranscriptomics approaches [15, 93-95], which provide a snapshot of the types of microbial genes expressed in the gut. The goal of this chapter was to inventory the transcriptionally active microbial taxa associated with the ALB midgut and to functionally categorize microbial-derived genes expressed in the gut to gain more focused insights into how microbes associated with the gut may be contributing to woody tissue digestion and nutrient acquisition. We hypothesize that microbial genes associated with lignin degradation; 5-carbon wood-sugar utilization (e.g., xylose and arabinose); cellulose, hemicellulose, and pectin digestion; and vitamin, sterol, and nitrogen acquisition are transcriptionally active in the ALB midgut.

Supplemental Chapter: Phylogenetic Analysis of Fusarium solani Associated with the Asian Longhorned Beetle, Anoplophora glabripennis

While we have consistently detected Fusarium solani associated with the midgut of ALB larvae reared in our quarantine facility, it is unknown if this fungal isolate is prevalent in wild populations or if its persistence in colony-reared larvae is simply an artifact of captivity. In addition, it is unknown whether a single strain of F. solani is harbored in the midgut or whether several strains co-exist. The goal of this chapter was to assess the distribution of ALB F. solani in individuals collected from our colony and from field populations in Long Island, NY and Worcester, MA using operational taxonomic unit (OTU) based approaches and multilocus phylogenetics. We hypothesize that Fusarium solani will be harbored by beetles reared in a variety of host tree species and will be present in insects collected from different field populations (Worcester, MA and Brooklyn, NY). We also hypothesize that different populations may harbor different strains of this fungal species and that molecular phylotyping of F. solani strains harbored in the gut may eventually be useful for tracing the geographic origin of introduced populations of ALB.

Literature Cited

1. Eisen JA: Environmental shotgun sequencing: Its potential and challenges for

studying the hidden world of microbes. Plos Biology 2007, 5(3):384-388.

2. Beja O, Aravind L, Koonin EV, Suzuki MT, Hadd A, Nguyen LP, Jovanovich S, Gates

CM, Feldman RA, Spudich JL et al: Bacterial rhodopsin: evidence for a new type of

phototrophy in the sea. Science 2000, 289(5486):1902-1906.

3. Nielsen AT, Liu WT, Filipe C, Grady L, Jr., Molin S, Stahl DA: Identification of a

novel group of bacteria in sludge from a deteriorated biological phosphorus removal

reactor. Applied and environmental microbiology 1999, 65(3):1251-1258.

4. Schramm A, Santegoeds CM, Nielsen HK, Ploug H, Wagner M, Pribyl M, Wanner J,

Amann R, de Beer D: On the occurrence of anoxic microniches, denitrification, and

sulfate reduction in aerated activated sludge. Applied and environmental microbiology

1999, 65(9):4189-4196.

5. Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, Richardson PM, Solovyev

VV, Rubin EM, Rokhsar DS, Banfield JF: Community structure and metabolism

through reconstruction of microbial genomes from the environment. Nature 2004,

428(6978):37-43.

6. Dillon RJ, Dillon VM: The gut bacteria of insects: nonpathogenic interactions.

Annual Review of Entomology 2004, 49:71-92.

7. Dale C, Moran NA: Molecular interactions between bacterial symbionts and their

hosts. Cell 2006, 126(3):453-465.

8. McLean AHC, van Asch M, Ferrari J, Godfray HCJ: Effects of bacterial secondary

symbionts on host plant use in pea aphids. Proceedings of the Royal Society B:

Biological Sciences 2011, 278(1706):760-766.

9. Oliver KM, Russell JA, Moran NA, Hunter MS: Facultative bacterial symbionts in

aphids confer resistance to parasitic wasps. Proceedings of the National Academy of

Sciences 2003, 100(4):1803-1807.

10. Warnecke F, Luginbuhl P, Ivanova N, Ghassemian M, Richardson TH, Stege JT,

Cayouette M, McHardy AC, Djordjevic G, Aboushadi N et al: Metagenomic and

functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature

2007, 450(7169):560-565.

11. Douglas AE: The microbial dimension in insect nutritional ecology. Functional

Ecology 2009, 23(1):38-47.

12. Kane MD: Microbial fermentation in insect guts. In: Gastrointestinal microbiology.

Springer; 1997: 231-265.

13. Scharf ME, Tartar A: Termite digestomes as sources for novel lignocellulases. Biofuels

Bioproducts & Biorefining-Biofpr 2008, 2(6):540-552.

14. Geib SM, Filley TR, Hatcher PG, Hoover K, Carlson JE, Jimenez-Gasco Mdel M,

Nakagawa-Izumi A, Sleighter RL, Tien M: Lignin degradation in wood-feeding

insects. Proceedings of the National Academy of Sciences of the United States of America

2008, 105(35):12932-12937.

15. Tartar A, Wheeler MM, Zhou XG, Coy MR, Boucias DG, Scharf ME: Parallel

metatranscriptome analyses of host and symbiont gene expression in the gut of the

termite Reticulitermes flavipes. Biotechnol Biofuels 2009, 2.

16. Adams AS, Aylward FO, Adams SM, Erbilgin N, Aukema BH, Currie CR, Suen G,

Raffa KF: Mountain pine beetles colonizing historical and naïve host trees are

associated with a bacterial community highly enriched in genes contributing to

terpene metabolism. Applied and environmental microbiology 2013, 79(11):3468-3475.

17. Xueyan Y, Jiaxi Z, Fugui W, Min C: A study on the feeding habits of the larvae of two

species of longicorn (Anoplophora) to different tree species. Journal of Northwest

Forestry College 1995, 2.

18. Haack RA, Herard F, Sun JH, Turgeon JJ: Managing invasive populations of Asian

longhorned beetle and citrus longhorned beetle: A Worldwide Perspective. In:

Annual Review of Entomology. Palo Alto: Annual Reviews; 2010, 55: 521-546.

19. Haack RA: Exotic bark- and wood-boring Coleoptera in the United States: recent

establishments and interceptions. Canadian Journal of Forest Research 2006,

36(2):269-288.

20. Hogg E, Brandt JP, Kochtubajda B: Growth and dieback of aspen forests in

northwestern Alberta, Canada, in relation to climate and insects. Canadian Journal

of Forest Research 2002, 32(5):823-832.

21. Petterson RC: The chemical composition of wood. In: The Chemistry of Solid Wood. The

American Chemical Society; 1984: 57.

22. O'Sullivan AC: Cellulose: the structure slowly unravels. Cellulose 1997, 4(3):173-207.

23. Wilson D, Irwin D: Genetics and properties of cellulases recent progress in

bioconversion of lignocellulosics. In Advances in Biochemical

Engineering/Biotechnology. Edited by Tsao G, Brainard A, Bungay H, Cao N, Cen P,

Chen Z, Du J, Foody B, Gong C, Hall P et al, vol. 65: Springer Berlin / Heidelberg; 1999:

1-21.

24. Kirk TK, Farrell RL: Enzymatic combustion - the microbial-degradation of lignin.

Annual Review of Microbiology 1987, 41:465-505.

25. Tien M, Kirk TK: Lignin-degrading enzyme from Phanerochaete-chrysosporium -

purification, characterization, and catalytic properties of a unique H2O2-requiring

oxygenase. Proceedings of the National Academy of Sciences of the United States of

America-Biological Sciences 1984, 81(8):2280-2284.

26. Eriksson KEL, Blanchette RA, Ander P: Microbial and enzymatic degradation of

wood and wood components. Berlin: Springer-Verlag; 1990.

27. Dass SB, Dosoretz CG, Reddy CA, Grethlein HE: Extracellular proteases produced by

the wood-degrading fungus Phanerochaete chrysosporium under ligninolytic and

non-ligninolytic conditions. Archives of microbiology 1995, 163(4):254-258.

28. Mattson WJ: Herbivory in relation to plant nitrogen content. Annu Rev Ecol Syst

1980, 11:119-161.

29. Keller B, Templeton MD, Lamb CJ: Specific localization of a plant-cell wall glycine-

rich protein in protoxylem cells of the vascular system. Proceedings of the National

Academy of Sciences of the United States of America 1989, 86(5):1529-1533.

30. Werren JH: Symbionts provide pesticide detoxification. Proceedings of the National

Academy of Sciences of the United States of America 2012, 109(22):8364-8365.

31. Morales-Ramos JA, Rojas MG, Sittertz-Bhatkar H, Saldana G: Symbiotic relationship

between Hypothenemus hampei (Coleoptera : Scolytidae) and Fusarium solani

(Moniliales : Tuberculariaceae). Ann Entomol Soc Am 2000, 93(3):541-547.

32. Adams AS, Six DL, Adams SM, Holben WE: In vitro interactions between yeasts and

bacteria and the fungal symbionts of the mountain pine beetle (Dendroctonus

ponderosae). Microbial ecology 2008, 56(3):460-466.

33. Dowd PF: Insect fungal symbionts - a promising source of detoxifying enzymes.

Journal of Industrial Microbiology 1992, 9(3-4):149-161.

34. Watanabe H, Tokuda G: Cellulolytic systems in insects. Annu Rev Entomol 2009, 55:23.

35. Kukor JJ, Cowan DP, Martin MM: The role of ingested fungal enzymes in cellulose

digestion in the larvae of cerambycid beetles. Physiological Zoology 1988, 61(4):364-

371.

36. Hanks LM: Influence of the larval host plant on reproductive strategies of

cerambycid beetles. Annual Review of Entomology 1999, 44:483-505.

37. Boddy L, Jones TH: Chapter 9 Interactions between basidiomycota and

invertebrates. In: British Mycological Society Symposia Series. Edited by Lynne Boddy

JCF, Pieter van W, vol. 28: Academic Press; 2008: 155-179.

38. Kawaguchi M: The evolution of symbiotic systems. Cellular and Molecular Life

Sciences 2011, 68(8):1283-1284.

39. Schloss PD, Delalibera I, Handelsman J, Raffa KF: Bacteria associated with the guts of

two wood-boring beetles: Anoplophora glabripennis and Saperda vestita

(Cerambycidae). Environ Entomol 2006, 35(3):625-629.

40. Geib SM, Jimenez-Gasco Mdel M, Carlson JE, Tien M, Hoover K: Effect of host tree

species on cellulase activity and bacterial community composition in the gut of

larval Asian longhorned beetle. Environ Entomol 2009, 38(3):686-699.

41. Breznak JA, Brune A: Role of microorganisms in the digestion of lignocellulose by

termites. Annual Review of Entomology 1994, 39:453-487.

42. Cleveland LR: The physiological and symbiotic relationships between the intestinal

protozoa of termites and their host, with special reference to Reticulitermes flavipes

Kollar. Biol Bull-Us 1924, 46(4):178-201.

43. Willis JD, Oppert C, Jurat-Fuentes JL: Methods for discovery and characterization of

cellulolytic enzymes from insects. Insect Science 2010, 17(3):184-198.

44. Pauchet Y, Wilkinson P, Chauhan R, Ffrench-Constant RH: Diversity of Beetle Genes

Encoding Novel Plant Cell Wall Degrading Enzymes. PloS one 2010, 5(12):e15635.

45. Pauchet Y, Wilkinson P, van Munster M, Augustin S, Pauron D, Ffrench-Constant RH:

Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela

tremulae reveals new gene families in Coleoptera. Insect Biochemistry and Molecular

Biology 2009, 39(5-6):403-413.

46. Watanabe H, Noda H, Tokuda G, Lo N: A cellulase gene of termite origin. Nature

1998, 394(6691):330-331.

47. Sugimura M, Watanabe H, Lo N, Saito H: Purification, characterization, cDNA

cloning and nucleotide sequencing of a cellulase from the yellow-spotted longicorn

beetle, Psacothea hilaris. Eur J Biochem 2003, 270(16):3455-3460.

48. Lo N, Watanabe H, Sugimura M: Evidence for the presence of a cellulase gene in the

last common ancestor of bilaterian animals. P Roy Soc Lond B Bio 2003, 270:S69-S72.

49. Lee SJ, Kim SR, Yoon HJ, Kim I, Lee KS, Je YH, Lee SM, Seo SJ, Sohn HD, Jin BR:

cDNA cloning, expression, and enzymatic activity of a cellulase from the mulberry

longicorn beetle, Apriona germari. Comp Biochem Phys B 2004, 139(1):107-116.

50. Calderon-Cortes N, Watanabe H, Cano-Camacho H, Zavala-Paramo G, Quesada M:

cDNA cloning, homology modelling and evolutionary insights into novel endogenous

cellulases of the borer beetle Oncideres albomarginata chamela (Cerambycidae).

Insect Molecular Biology 2010, 19(3):323-336.

51. Scott JG, Liu N, Wen Z: Insect cytochromes P450: diversity, insecticide resistance

and tolerance to plant toxins. Comparative Biochemistry and Physiology Part C:

Pharmacology, Toxicology and Endocrinology 1998, 121(1–3):147-155.

52. Li X, Berenbaum MR, Schuler MA: Plant allelochemicals differentially regulate

Helicoverpa zea cytochrome P450 genes. Insect Molecular Biology 2002, 11(4):343-

351.

53. Scott JG: Cytochromes P450 and insecticide resistance. Insect Biochemistry and

Molecular Biology 1999, 29(9):757-777.

54. Jongsma MA, Bolter C: The adaptation of insects to plant protease inhibitors. Journal

of Insect Physiology 1997, 43(10):885-895.

55. Levin DA: The chemical defenses of plants to pathogens and herbivores. Annu Rev

Ecol Syst 1976, 7:121-159.

56. Zarate SI, Kempema LA, Walling LL: Silverleaf whitefly induces salicylic acid

defenses and suppresses effectual jasmonic acid defenses. Plant Physiol 2007,

143(2):866-875.

57. Walling LL: Avoiding effective defenses: strategies employed by phloem-feeding

insects. Plant Physiol 2008, 146(3):859-866.

58. Mithöfer A, Boland W: Plant defense against herbivores: chemical aspects. Annual

Review of Plant Biology 2012, 63:431-450.

59. Kunkel BN, Brooks DM: Cross talk between signaling pathways in pathogen defense.

Current opinion in plant biology 2002, 5(4):325-331.

60. Walling LL: The myriad plant responses to herbivores. Journal of Plant Growth

Regulation 2000, 19(2):195-216.

61. Felton GW, Tumlinson JH: Plant–insect dialogs: complex interactions at the plant–

insect interface. Current opinion in plant biology 2008, 11(4):457-463.

62. Hogenhout SA, Bos JI: Effector proteins that modulate plant–insect interactions.

Current opinion in plant biology 2011, 14(4):422-428.

63. Ryan CA: Protease Inhibitors in plants: genes for improving defenses against insects

and pathogens. Annual Review of Phytopathology 1990, 28(1):425-449.

64. Wittstock U, Gershenzon J: Constitutive plant toxins and their role in defense against

herbivores and pathogens. Current opinion in plant biology 2002, 5(4):300-307.

65. Fowler SV, Lawton JH: Rapidly induced defenses and talking trees: the devil's

advocate position. Am Nat 1985:181-195.

66. Li X, Schuler MA, Berenbaum MR: Jasmonate and salicylate induce expression of

herbivore cytochrome P450 genes. Nature 2002, 419(6908):712-715.

67. Feyereisen R: Insect P450 enzymes. Annual Review of Entomology 1999, 44(1):507-533.

68. De Leo F, Bonadé-Bottino MA, Ceci LR, Gallerani R, Jouanin L: Opposite effects on

Spodoptera littoralis larvae of high expression level of a trypsin proteinase inhibitor

in transgenic plants. Plant Physiol 1998, 118(3):997-1004.

69. Jongsma MA, Bakker PL, Peters J, Bosch D, Stiekema WJ: Adaptation of Spodoptera

exigua larvae to plant proteinase inhibitors by induction of gut proteinase activity

insensitive to inhibition. Proceedings of the National Academy of Sciences 1995,

92(17):8041-8045.

70. Barr KL, Hearne LB, Briesacher S, Clark TL, Davis GE: Microbial symbionts in insects

influence down-regulation of defense genes in maize. PloS one 2010, 5(6): e11339.

71. Chu C-C, Spencer JL, Curzi MJ, Zavala JA, Seufferheld MJ: Gut bacteria

facilitate adaptation to crop rotation in the western corn rootworm. Proceedings of

the National Academy of Sciences 2013, 110(29):11917-11922.

72. Dowd PF: In situ production of hydrolytic detoxifying enzymes by symbiotic yeasts

in the cigarette beetle (Coleoptera: Anobiidae). Journal of Economic Entomology

1989, 82(2):396-400.

73. Erasto P, Bojase-Moleta G, Majinda RR: Antimicrobial and antioxidant flavonoids

from the root wood of Bolusanthus speciosus. Phytochemistry 2004, 65(7):875-880.

74. Tahvanainen J, Julkunen-Tiitto R, Kettunen J: Phenolic glycosides govern the food

selection pattern of willow feeding leaf beetles. Oecologia 1985, 67(1):52-56.

75. Lingafelter SW, Hoebeke ER: Revision of Anoplophora (Coleoptera:

Cerambycidae). Washington, DC: Entomological Society of Washington 2002:236 p.

76. Haack RA, Law KR, Mastro VC, Ossenbruggen HS, Raimo BJ: New York's battle with

the Asian long-horned beetle. J Forest 1997, 95(12):11-15.

77. Nowak DJ, Pasek JE, Sequeira RA, Crane DE, Mastro VC: Potential effect of

Anoplophora glabripennis (Coleoptera : Cerambycidae) on urban trees in the

United States. Journal of Economic Entomology 2001, 94(1):116-122.

78. Hu JF, Angeli S, Schuetz S, Luo YQ, Hajek AE: Ecology and management of exotic

and endemic Asian longhorned beetle Anoplophora glabripennis. Agricultural and

Forest Entomology 2009, 11(4):359-375.

79. Keena MA: Effects of temperature on Anoplophora glabripennis (Coleoptera :

Cerambycidae) adult survival, reproduction, and egg hatch. Environ Entomol 2006,

35(4):912-921.

80. Keena MA, Moore PM: Effects of temperature on Anoplophora glabripennis

(Coleoptera: Cerambycidae) larvae and pupae. Environ Entomol 2010, 39(4):1323-

1335.

81. Geib SM, Scully ED, Jimenez-Gasco MdM, Carlson JE, Tien M, Hoover K:

Phylogenetic analysis of Fusarium solani associated with the Asian longhorned

beetle, Anoplophora glabripennis. Insects 2012, 3(1):141-160.

82. Lozovaya VV, Lygin AV, Zernova OV, Li S, Widholm JM, Hartman GL: Lignin

degradation by Fusarium solani f. sp glycines. Plant Disease 2006, 90(1):77-82.

83. Rodriguez A, Perestelo F, Carnicero A, Regalado V, Perez R, de la Fuente G, Falcon

MA: Degradation of natural lignins and lignocellulosic substrates by soil-inhabiting

fungi imperfecti. Fems Microbiol Ecol 1996, 21(3):213-219.

84. Sutherland JB, Pometto III AL, Crawford DL: Lignocellulose degradation by Fusarium

species. Canadian Journal of Botany 1983, 61(4):1194-1198.

85. Falcon MA, Rodriguez A, Carnicero A, Regalado V, Perestelo F, Milstein O, Delafuente

G: Isolation of microorganisms with lignin transformation potential from soil of

Tenerife Island. Soil Biology & Biochemistry 1995, 27(2):121-126.

86. Coleman JJ, Rounsley SD, Rodriguez-Carres M, Kuo A, Wasmann CC, Grimwood J,

Schmutz J, Taga M, White GJ, Zhou SG et al: The Genome of Nectria haematococca:

Contribution of Supernumerary Chromosomes to Gene Expansion. Plos Genetics

2009, 5(8): e1000618.

87. Geib SM, Tien M, Hoover K: Identification of proteins involved in lignocellulose

degradation using in gel zymogram analysis combined with mass spectroscopy-

based peptide analysis of gut proteins from larval Asian longhorned beetles,

Anoplophora glabripennis. Insect Science 2010, 17(3):253-264.

88. Jones KG, Dowd PF, Blackwell M: Polyphyletic origins of yeast-like endocytobionts

from anobiid and cerambycid beetles. Mycological research 1999, 103:542-546.

89. Zhang N, Suh SO, Blackwell M: Microorganisms in the gut of beetles: evidence from

molecular cloning. Journal of invertebrate pathology 2003, 84(3):226-233.

90. Suh SO, Marshall CJ, McHugh JV, Blackwell M: Wood ingestion by passalid beetles in

the presence of xylose-fermenting gut yeasts. Molecular Ecology 2003, 12(11):3137-

3145.

91. Suh SO, Noda H, Blackwell M: Insect symbiosis: Derivation of yeast-like

endosymbionts within an entomopathogenic filamentous lineage. Molecular biology

and evolution 2001, 18(6):995-1000.

92. Ellegren H: Sequencing goes 454 and takes large-scale genomics into the wild.

Molecular Ecology 2008, 17(7):1629-1631.

93. Xiong X, Frank DN, Robertson CE, Hung SS, Markle J, Canty AJ, McCoy KD,

Macpherson AJ, Poussier P, Danska JS et al: Generation and analysis of a mouse

intestinal metatranscriptome through Illumina based RNA-sequencing. PloS one

2012, 7(4):e36009.

94. Qi M, Wang P, O'Toole N, Barboza PS, Ungerfeld E, Leigh MB, Selinger LB, Butler G,

Tsang A, McAllister TA et al: Snapshot of the eukaryotic gene expression in

muskoxen rumen—a metatranscriptomic approach. PloS one 2011, 6(5):e20521.

95. Xie L, Zhang L, Zhong Y, Liu N, Long Y, Wang S, Zhou X, Zhou Z, Huang Y, Wang Q:

Profiling the metatranscriptome of the protistan community in Coptotermes

formosanus with emphasis on the lignocellulolytic system. Genomics 2012, 99(4):246-

255.

96. Gayatri Priya N, Ojha A, Kajla MK, Raj A, Rajagopal R: Host plant induced variation

in gut bacteria of Helicoverpa armigera. PloS one 2012, 7(1):e30768.

97. Hemming JDC, Lindroth RL: Effects of phenolic glycosides and protein on gypsy

moth (Lepidoptera: Lymantriidae) and forest tent caterpillar (Lepidoptera:

Lasiocampidae) Performance and Detoxication Activities. Environ Entomol 2000,

29(6):1108-1115.

98. Després L, David J-P, Gallet C: The evolutionary ecology of insect resistance to plant

chemicals. Trends in Ecology & Evolution 2007, 22(6):298-307.

99. Boeckler GA, Gershenzon J, Unsicker SB: Phenolic glycosides of the Salicaceae and

their role as anti-herbivore defenses. Phytochemistry 2011, 72(13):1497-1509.


Chapter 2

Comparative Metagenomic Profiling Reveals Lignocellulose Degrading System in Microbial Community Associated with a Wood-Feeding Beetle

Abstract

The Asian longhorned beetle (Anoplophora glabripennis) is an invasive, wood-boring pest that thrives in the heartwood of deciduous tree species. A major impediment faced by A. glabripennis as it feeds on woody tissue is lignin, a highly recalcitrant biopolymer that reduces access to sugars and other nutrients locked in cellulose and hemicellulose. We previously demonstrated that lignin, cellulose, and hemicellulose are actively deconstructed in the beetle gut and that the gut harbors an assemblage of microbes hypothesized to make significant contributions to these processes. While lignin degrading mechanisms have been well characterized in pure cultures of white rot basidiomycetes, little is known about such processes in microbial communities associated with wood-feeding insects. The goals of this study were to develop a taxonomic and functional profile of a gut community derived from an invasive population of larval A. glabripennis collected from infested host trees and to identify genes that could be relevant for the digestion of woody tissue and nutrient acquisition. To accomplish these goals, we taxonomically and functionally characterized the A. glabripennis midgut microbiota through amplicon and shotgun metagenome sequencing and conducted a large-scale comparison with the metagenomes from a variety of other herbivore-associated communities. This analysis distinguished the A. glabripennis larval gut metagenome from the gut communities of other herbivores, including previously sequenced termite hindgut metagenomes. Genes encoding enzymes were identified in the A. glabripennis gut metagenome that could have key roles in woody tissue digestion including candidate lignin degrading genes (laccases, dye-decolorizing peroxidases, novel peroxidases and β-etherases), 36 families of glycoside hydrolases (such as cellulases and xylanases), and genes that could facilitate nutrient recovery, essential nutrient synthesis, and detoxification. This community could serve as a reservoir of novel enzymes to enhance industrial cellulosic biofuels production or targets for novel control methods for this invasive and highly destructive insect.

Introduction

Cellulose and hemicellulose represent some of the most abundant, renewable carbohydrate resources on the planet, comprising the largest natural source of fermentable sugars, which could be utilized for ethanolic biofuel production [1]. Despite the abundance of these polysaccharides, a major impediment to accessing fermentable sugars from these carbohydrates for large-scale industrial ethanol production is the presence of lignin [2], a stereotypically irregular, aromatic biopolymer comprised of phenylpropanoid aryl alcohol subunits and articulated by over 12 types of chemical bonds [3]. Highly resilient β-aryl ether and carbon-carbon bonds constitute the majority of the linkages in hardwood lignin, which are resistant to hydrolysis and difficult to disrupt. However, wood-feeding insects, in collaboration with their gut microbial communities, have the capacity to produce enzymes that facilitate the degradation of lignocellulosic material [4, 5]. Accordingly, these microbial communities constitute unique ecosystems that may serve as reservoirs of novel proteins and enzymes that could be exploited to enhance the efficiency of industrial biomass pre-treatment processes, decoupling lignin from wood polysaccharides and facilitating access to fermentable sugars in cellulose and hemicellulose. Of recent interest is the gut community of Anoplophora glabripennis [Order Coleoptera; Family Cerambycidae], an invasive, xylophagous beetle that colonizes and feeds in a broad range of apparently healthy tree species, including several genera commonly planted as short rotation biofeedstocks (e.g., Populus and Salix) [6, 7]. A large community of microbes capable of producing cellulolytic and hemicellulolytic enzymes in the A. glabripennis midgut was previously described [8, 9]. Analysis of A. glabripennis frass also revealed the presence of lignin degradation products [8], suggesting that its gut microbial community or the insect itself also harbors lignin degrading genes. The most dominant modification to lignin detected in A. glabripennis was propyl side chain oxidation, a reaction associated with white rot fungal lignin degradation that is not known to be catalyzed by bacterial- or animal-derived enzymes [10].

White rot fungal isolates have not been previously detected in association with A. glabripennis using either culture-dependent or culture-independent approaches [9, 11-13], suggesting that the lignin-degrading capacity of this system is distinct from well-characterized, pure-culture canonical fungal systems. Therefore, the assemblage of microbes associated with the A. glabripennis midgut represents an excellent candidate for mining novel lignocellulose degrading enzymes for biofuel applications.

Many members of the family Cerambycidae, including A. glabripennis, produce their own endogenous cellulases (endoglucanases and β-glucosidases) and other plant cell wall degrading enzymes [9, 14-16]. However, interaction with microbes has been observed to enhance cellulase activities and is hypothesized to enhance glucose release from cellulose in the guts of several beetle species, including A. glabripennis [17]. For example, disruption of the gut microbiota induced by feeding on a cellulose-based artificial diet containing bacteriostatic and fungistatic agents results in a tangible reduction in cellulase complex activity (endoglucanases, exoglucanases, and β-glucosidases) in the A. glabripennis midgut [9]. In addition, insects and other herbivores are generally not capable of producing a full arsenal of O-acetylglucuronoxylan-degrading enzymes and they are also generally unable to utilize pentose sugars present in xylan (e.g., D-xylose) without the aid of xylose-degrading microbes [18]. Although animal-derived enzymes have been hypothesized to be involved in lignin degradation [19] and an endogenous termite laccase can chemically modify lignin alkali and degrade lignin phenolics in vitro [20], microbes living in the guts of wood-feeding insects also have the capacity to produce enzymes that contribute to or enhance endogenous ligninase activities supplied by host enzymes [21, 22].

Therefore, herbivorous animals, and specifically wood-feeding insects, likely benefit from enzymes produced by microbes to facilitate the digestion of woody tissue.

Wood-feeding insects exploit a variety of strategies to liberate carbohydrates from recalcitrant plant tissues and most wood-feeding insects maintain obligate associations with microbes. Associations of microbes with wood-feeding insects occur through cultivation of wood-degrading fungi [23], direct ingestion of fungal or bacterial enzymes [17], preferential feeding on compromised (stressed/decaying) trees whose structural polysaccharides have been previously disrupted by environmental wood-degrading microbes [24], or endosymbiosis with wood-degrading microbes [25]. These microbial affiliates are thought to make important contributions to lignocellulose digestion in a phylogenetically diverse array of insects, including several beetle species where microbial fermentation products have been detected in the gut [26].

Despite the associations between wood-feeding insects and microbes, the fate of lignin and the lignin degrading abilities of the microbial communities associated with many wood-feeding insects (with the exception of termites) [27] are largely uncharacterized; furthermore, no lignin degrading genes or proteins outside the white rot basidiomycetes have been annotated in metagenomes sampled from any wood-feeding insect microbial communities to date.

Wood-boring cerambycids harbor large communities of microbes, but little is known about their metabolic potential, other than the role of yeast-like gut symbionts in the digestion of hemicellulose and fermentation of xylose, which has been extensively studied [28]. Community profiling of wood-feeding cerambycid guts has revealed a striking degree of diversity in terms of community richness. In general, stenophagous insects with restricted host ranges tend to have less complex and more static gut communities than polyphagous insects, which have broad host ranges and tend to have more diverse and plastic communities. This diversity and plasticity is hypothesized to allow these insects to colonize and thrive in a broader range of host trees [11].

Microbial community profiling of A. glabripennis larvae feeding in a variety of host tree species demonstrated that the composition of the community was plastic and varied by host tree species [9]. However, the composition of the A. glabripennis midgut bacterial community was distinct from the wood bacterial community sampled from unforaged sections of the tree [12]. Also, members of the Fusarium solani species complex 6 (group FSSC-6) have been consistently detected in the midguts of A. glabripennis larvae collected from multiple geographic locations and multiple host tree species, as well as larvae feeding on sterilized artificial diet [13]. These findings suggest that not all of the microbes detected in the gut are acquired directly from the host tree.

The primary goals of this study were to provide a functional and taxonomic profile of the larval midgut microbial community of an invasive A. glabripennis population feeding on a preferred host (silver maple; Acer saccharinum) through next generation sequencing of small ribosomal subunit (SSU) amplicons and total DNA collected from the A. glabripennis midgut microbiota. Through this analysis, we compiled a suite of candidate genes found in the A. glabripennis microbial community whose annotations are consistent with lignin-, cellulose-, and hemicellulose-degrading capabilities and other genes that may have roles in nutrient synthesis and detoxification. These microbial genes are hypothesized to make key contributions to the ability of this insect to attack and develop in a broad range of healthy host trees [29, 30]. We used a large-scale comparative metagenomic approach that included metagenomes derived from herbivore communities, ranging from grass-feeding ruminants to insects that thrive on highly complex woody substrates, to demonstrate that the A. glabripennis midgut metagenome was distinct from other host-associated metagenomes and could thus provide valuable insights into the interactions between wood-feeding beetles and their microbial affiliates that contribute to the digestion of woody tissue.

Methods

Preparation of Insect Cell Free DNA for Community Profiling and Shotgun Sequencing

Five fourth instar A. glabripennis larvae actively feeding in the heartwood of a preferred host tree (Acer saccharinum; silver maple) were collected from a field site located in Worcester, MA and were transported under permit conditions to a USDA-approved quarantine facility at The Pennsylvania State University for dissection and processing. The sample collection was conducted at a field site that was part of a United States Department of Agriculture eradication effort. Permission from the United States Department of Agriculture and from local authorities was obtained under the general permit (P526P-12-02646). Insects were sterilized twice in 70% ethanol to remove surface-contaminating microbes and residual ethanol was removed with a single rinse in sterile milliQ water. Insects were dissected and guts were removed under sterile conditions.

For this experiment, we chose to focus exclusively on microbes associated with the midgut contents since the midgut is the most prominent region in the guts of cerambycids. To enrich the sample for microbial cells and exclude insect tissue, the insect-derived peritrophic matrix (PM) that surrounds and protects the food bolus was separated from the midgut contents and DNA was extracted from microbes adhering to the food. DNA was extracted using the Fast DNA Spin Kit for Soil (MP Biomedicals, Santa Ana, CA), which was chosen for its ability to lyse cell walls from a variety of microbes and to remove plant polysaccharides and secondary metabolites that can co-extract with DNA and interfere with downstream processes. DNA was quantified using a NanoDrop 1000 spectrophotometer (Thermo Scientific, Waltham, MA) and approximately 1 µg of DNA was used for 16S/18S amplicon and shotgun (total DNA) 454 library construction (Roche, Branford, CT).

454 Amplicon Pyrosequencing to Taxonomically Identify Microbes Associated with the A. glabripennis Midgut

To identify the bacterial and fungal taxa found in association with the A. glabripennis midgut and to confirm that this sample was successfully enriched for microbial DNA prior to shotgun sequencing, a 16S/18S amplicon library encompassing the V6-V8 hypervariable regions was constructed using a set of primers (926F and 1392R) designed to co-amplify both 16S bacterial rDNA and 18S fungal, insect, and plant rDNA [31]. The amplicon library was constructed following the Department of Energy-Joint Genome Institute’s Standard Operating Procedure. In brief, 20 ng of genomic DNA were added to a PCR cocktail containing 6 µL 5X PCR buffer, 2 µL GC melt solution (Clontech, Mountain View, CA), 0.4 µL Taq Polymerase (Advantage 2 Polymerase, Clontech, Mountain View, CA), 0.4 µL 10 mM dNTPs (Fermentas, Pittsburgh, PA), 1 µL 25 nM forward primer (926F: 5’-CCTATCGGGTGTGTGCCTTGGCAGTCTCAGAAACTYAAAKGAATTGACGG-3’) and 1 µL 25 nM reverse primer (1392R: 5’-CCATCTCATCCCTGCGTGTCTCCGACTCAGCTACTACGGGCGGTGTGTGC-3’). GC melt solution and Advantage 2 Polymerase were used to improve amplification efficiency of templates with high GC content. Primers were constructed using the standard 454 Titanium adaptor sequence (underlined) and a five base-pair barcode incorporated into the reverse primer (bold). PCR thermal cycling conditions included an initial denaturation for three minutes at 95°C followed by 25 cycles of 95°C for 30s, 50°C for 45s, and 72°C for 90s and a final extension at 68°C for 10 minutes.

Product quality was assessed by agarose gel electrophoresis and the final product was purified using SPRI beads and quantified using the Quant-iT dsDNA Assay on a Qubit fluorometer (Life Technologies, Carlsbad, CA). Approximately 7,000 reads were sequenced using 454 Titanium chemistry (Roche, Branford, CT). High quality reads greater than 250 bp in length were clustered into operational taxonomic units (OTUs) at 97% similarity and rarefaction curves and richness estimates were computed using the program mothur (version 1.2.22) [32]. Putative chimeras were identified using UCHIME [33] and were omitted from the analyses. Sequences for representative OTUs were compared to the non-redundant nucleotide database using BLASTN (BLAST-2.2.23) [34] with an e-value threshold of 0.00001 to determine whether the OTU was of bacterial, fungal, insect, or plant origin. Bacterial reads were classified using Ribosomal Database Project (RDP) Classifier [35], with an 80% confidence threshold for taxonomic classifications; sequences classified as mitochondrial or chloroplast in origin were omitted from the analysis. Fungal reads were classified by comparison to the non-redundant nucleotide database using BLASTN (BLAST-2.2.23) with an e-value threshold of 0.00001 followed by MEGAN classification [36] of the top ten BLAST alignments using the least common ancestor algorithm. Alignments to unidentified or uncultured fungi were removed from BLAST results prior to MEGAN classification. Plant- and insect-derived OTUs were excluded from the analysis. Representative sequences of each bacterial OTU were aligned with ClustalW and were trimmed to 250 bp in length for phylogenetic reconstruction using Garli (version 2.0) [37]. TIM1 + I + G was chosen as the optimal evolutionary model by jModelTest [38] and 500 bootstrap replicates were compiled to generate a consensus tree. High quality 454 amplicon reads are deposited in the NCBI Sequence Read Archive (SRA) under the accession number SRR767751.
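As a rough illustration of what a rarefaction curve represents, the following Python sketch computes the expected number of OTUs observed in a random subsample of n reads using the standard analytical rarefaction formula. It is not the mothur procedure used in this study, which may differ in detail, and the OTU read counts in the example are hypothetical values chosen only for demonstration.

```python
from math import comb


def expected_otus(otu_counts, n):
    """Expected number of OTUs seen in a random subsample of n reads.

    otu_counts: reads assigned to each OTU (hypothetical values below).
    Analytical rarefaction: E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)),
    where N is the total read count and N_i the count for OTU i.
    """
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total read count")
    denom = comb(total, n)
    # comb() returns 0 when n exceeds N - N_i, so that OTU is certain to be seen
    return sum(1 - comb(total - c, n) / denom for c in otu_counts)


def rarefaction_curve(otu_counts, step=1):
    """Return (subsample size, expected OTU richness) pairs."""
    total = sum(otu_counts)
    sizes = list(range(0, total + 1, step))
    if sizes[-1] != total:
        sizes.append(total)
    return [(n, expected_otus(otu_counts, n)) for n in sizes]


# Hypothetical community: seven OTUs, 100 reads in total (not study data)
example_counts = [50, 30, 10, 5, 3, 1, 1]
curve = rarefaction_curve(example_counts, step=25)
for n, richness in curve:
    print(f"{n:>4} reads -> {richness:.2f} expected OTUs")
```

A curve that flattens as n approaches the full read count suggests that additional sequencing would recover few new OTUs at the chosen similarity threshold.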

Phylogenetic Binning and Functional Analysis of A. glabripennis Midgut Microbiota Using Shotgun 454 Pyrosequencing

454 shotgun libraries were constructed using a modified version of the 454 standard library protocol. In brief, 500 ng of DNA were sheared using a sonicator (Covaris, Woburn, MA) and fragments ranging from 500 to 800 bp were size selected using AMPure beads. DNA fragments were end-polished, purified, and ligated to 454 Titanium adapters. A fill-in reaction was performed and the ssDNA template was isolated, purified, and prepared for emulsion PCR (emPCR). Additional cycles were added to the emPCR protocol to linearly amplify 454 adapter-ligated DNA from low yield DNA extractions. A previous study comparing metagenome libraries prepared with additional emPCR cycles to libraries prepared with standard numbers of emPCR cycles revealed no substantial amplification biases in libraries prepared with extra emPCR cycles (unpublished data). Based on this study, we suspect that no major biases were introduced using this approach. A total of 1.25 million shotgun reads (382 Mb) were sequenced at the DOE-Joint Genome Institute using 454 Titanium chemistry (Roche, Branford, CT). Raw reads are deposited in the NCBI Sequence Read Archive under the accession number SRR767751.

Initially, reads were assembled using Newbler (Roche, Branford, CT), but the midgut community was diverse, containing 166 bacterial OTUs and 7 fungal OTUs, and the sequencing depth per OTU was too low to generate a high quality assembly. Consequently, the N50 contig length was low (< 1000 bp) and coverage across contigs was not uniform. There was also a significant possibility of generating chimeric contigs consisting of reads from more than one bacterial taxon (Table 2-1) [39]. We felt the slight improvement in contig sequence length versus raw read length was outweighed by these assembly issues; therefore, rather than using assembled contigs, high quality shotgun reads were treated as individual gene tags, which were used for annotations (with the exception of comparisons to other metagenome communities and candidate lignin degrading gene comparisons, in which assembled contigs were used to maintain consistency with the other datasets). For annotation and analysis of the unassembled reads, low quality reads with mean quality scores below 20, reads containing repetitive regions, and reads less than 150 bp in length were excluded from the dataset. Tags originating from non-coding RNAs, including tRNAs and rRNAs, were detected with tRNA-Scan [40] and HMMer using HMM profiles for prokaryotic, eukaryotic, and archaeal small subunit and large subunit rRNAs [41, 42]. While tRNAs were filtered out of the dataset and were not utilized in downstream functional analyses, small subunit (16S and 18S) rRNAs detected were taxonomically classified by alignment to the SILVA SSU database [43] to detect additional bacterial and fungal taxa that may not have been detected with 454 amplicon analysis due to primer inefficiencies or biases. After filtering and removing non-coding RNAs, 1.06 million reads, ranging in length from 150 to 1050 bp, remained (mean read length: 350 bp).
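The length and quality criteria described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the pipeline used in this study: reads are represented as hypothetical (identifier, sequence, per-base quality list) tuples, the function names are invented for the example, and the screen for repetitive regions is omitted because the text does not specify how it was performed.

```python
def mean_quality(quals):
    """Arithmetic mean of per-base quality scores (0.0 for an empty read)."""
    return sum(quals) / len(quals) if quals else 0.0


def passes_filter(seq, quals, min_len=150, min_mean_q=20):
    """Keep reads of at least min_len bases with mean quality of at least min_mean_q.

    The thresholds follow the text (reads shorter than 150 bp or with mean
    quality below 20 are excluded); the repeat screen is NOT implemented here.
    """
    return len(seq) >= min_len and mean_quality(quals) >= min_mean_q


def filter_reads(reads, min_len=150, min_mean_q=20):
    """reads: iterable of (read_id, sequence, quality_list) tuples (assumed layout)."""
    return [r for r in reads if passes_filter(r[1], r[2], min_len, min_mean_q)]


# Hypothetical reads for demonstration only
demo_reads = [
    ("read_ok", "A" * 200, [30] * 200),        # long enough, good quality
    ("read_short", "A" * 100, [30] * 100),     # fails the length threshold
    ("read_lowq", "A" * 200, [10] * 200),      # fails the quality threshold
    ("read_boundary", "A" * 150, [20] * 150),  # exactly at both thresholds
]
kept = filter_reads(demo_reads)
print([read_id for read_id, _, _ in kept])
```

Reads that sit exactly on a threshold are retained here, which matches the wording "below 20" and "less than 150 bp" for exclusion.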

454 library adapters and low quality ends were trimmed from the remaining reads. Individual reads were annotated by BLASTX comparisons to the non-redundant (NR) protein database [34] using an e-value cutoff of 0.00001 and were taxonomically classified using MEGAN (MEtaGenome ANalyzer) [36] least common ancestor classification based on the top 10 BLAST alignments for each read. Reads predicted to originate from bacterial or fungal taxa were also uploaded to the MG-RAST server [44] for gene prediction and assignment to SEED subsystems. Reads were also functionally categorized via an RPS-BLAST comparison [45] to the Clusters of Orthologous Groups (COG) database [46]. Reads were also assigned to Gene Ontology (GO) terms [47] and classified to KEGG enzyme classes [48] using Blast2GO [49]; furthermore, reconstruction of metabolic pathways was conducted using MinPath (Minimal set of Pathways) parsimony analysis [50] of KEGG Orthology (KO) assignments. BLAST results were corroborated by 6-frame translation followed by functional domain analysis using HmmSearch [41] to scan for Pfam-A derived HMMs [51]. CAZyme (Carbohydrate active enzyme) [52] carbohydrase family classifications are based on Pfam domain assignments.
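The idea behind least common ancestor classification of a read from its top BLAST alignments can be sketched as follows. This is a simplified Python illustration, not MEGAN's implementation, which applies additional score and support filters: each hit is assumed to carry a bit score and a root-to-tip lineage, the function names are hypothetical, and the example lineages are invented for demonstration.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared root-to-tip prefix of the given lineages."""
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):  # compare the lineages rank by rank
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def classify_read(hits, top_n=10):
    """Assign a read to the deepest taxon shared by its top_n best hits.

    hits: list of (bit_score, lineage) tuples, where lineage is a list of
    taxon names ordered from root to tip (an assumed input layout).
    """
    best = sorted(hits, key=lambda h: h[0], reverse=True)[:top_n]
    return lowest_common_ancestor([lineage for _, lineage in best])


# Hypothetical hits for one read (illustrative lineages, not study data)
demo_hits = [
    (250.0, ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacteriaceae"]),
    (240.0, ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonadaceae"]),
    (180.0, ["Bacteria", "Proteobacteria", "Alphaproteobacteria"]),
]
print(classify_read(demo_hits))           # shared lineage across all three hits
print(classify_read(demo_hits, top_n=2))  # deeper assignment from the two best hits
```

Reads whose best alignments span distantly related taxa are therefore placed conservatively at a high taxonomic rank, while reads with consistent hits receive a more specific assignment.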

Comparisons to Other Herbivore-Related Metagenomes

Pfam domains from the A. glabripennis metagenome assembly (contigs and unassembled singleton reads) were compared to domains from assembled (contigs and unassembled singletons) metagenome data sampled from communities associated with herbivores feeding on a diversity of plants that varied in carbohydrate and lignin composition. Pfam functional domains were chosen for comparative analysis because they are relatively short in length, which increases the likelihood that they will be correctly identified in single sequence reads. Therefore, detection and subsequent annotation of these domains are less likely to be influenced by assembly contiguity, which varied among the metagenome libraries. Annotated Pfam domains were obtained from the JGI IMG/M database for microbial communities associated with 1) herbivores that feed on a variety of plant tissues: panda, reindeer, honey bee, attine ant fungal garden, and wallaby; 2) insects that feed only on phloem and/or xylem tissue: Dendroctonus frontalis galleries and guts, Dendroctonus ponderosae galleries and guts, Xyleborus affinis galleries and guts (larval and adult); and 3) insects that feed only in woody tissue: Amitermes wheeleri hindgut, Nasutitermes sp. hindgut, Sirex noctilio fungal gallery, and a community affiliated with

Trichonympha protist symbionts of termites collected from Los Padres National Forest, CA. The

Pfam compositions of these communities were compared to the Pfam composition of the

Anoplophora glabripennis midgut community. For each community, data were normalized by total number of Pfam domains detected, weighted by contig depth when assembly information was available, and a compositional dissimilarity matrix was constructed based on Euclidean distance. For unassembled singleton reads, a contig depth of one was assumed. Samples were subjected to cluster analysis using Ward’s method. Further, the standardized data were also analyzed using unconstrained Principal Components Analysis to plot samples in multidimensional space. PCA ordination was selected because the data were determined to be linear by detrended correspondence analysis (DCA) (Beta diversity <4). Partially constrained redundancy analysis (RDA), removing effects of library size, did not significantly change the ordination, indicating that differences in library sizes do not significantly influence the ordination.

All multivariate comparisons and ordinations were performed using the R statistical package with

‘vegan’ and ‘cluster’ libraries.
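The normalization, Euclidean dissimilarity, and Ward clustering steps described above can be sketched as follows. The analysis in this study was performed in R with 'vegan' and 'cluster'; the Python below is only an illustrative stand-in. The community names and domain counts are invented, the counts are assumed to be already weighted by contig depth, and the function names are hypothetical.

```python
# Illustrative sketch: normalize Pfam counts per community, then cluster the
# communities with Ward's minimum-variance criterion (which operates on
# squared Euclidean distances between cluster centroids).
from itertools import combinations
from math import sqrt


def normalize(counts):
    """Convert depth-weighted Pfam counts to proportions of the community total."""
    total = sum(counts.values())
    return {pfam: n / total for pfam, n in counts.items()}


def euclidean(a, b):
    """Euclidean distance between two profiles; absent domains count as zero."""
    keys = set(a) | set(b)
    return sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def ward_clusters(profiles, n_clusters):
    """Merge clusters until n_clusters remain, minimizing the Ward cost at each step."""
    keys = sorted({k for p in profiles.values() for k in p})
    # Each cluster is (member names, centroid vector).
    clusters = [([name], [p.get(k, 0.0) for k in keys]) for name, p in profiles.items()]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            (mi, ci), (mj, cj) = clusters[i], clusters[j]
            ni, nj = len(mi), len(mj)
            d2 = sum((x - y) ** 2 for x, y in zip(ci, cj))
            cost = ni * nj / (ni + nj) * d2  # increase in within-cluster variance
            if best is None or cost < best[0]:
                best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        ni, nj = len(mi), len(mj)
        centroid = [(ni * x + nj * y) / (ni + nj) for x, y in zip(ci, cj)]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((mi + mj, centroid))
    return sorted(sorted(members) for members, _ in clusters)


# Invented toy counts for four hypothetical communities.
raw = {
    "gut_A": {"ABC_tran": 80, "Glyco_hydro_1": 15, "Cu-oxidase": 5},
    "gut_B": {"ABC_tran": 150, "Glyco_hydro_1": 40, "Cu-oxidase": 10},
    "gallery_A": {"ABC_tran": 30, "Glyco_hydro_1": 20, "Cu-oxidase": 50},
    "gallery_B": {"ABC_tran": 70, "Glyco_hydro_1": 30, "Cu-oxidase": 100},
}
profiles = {name: normalize(counts) for name, counts in raw.items()}
print(ward_clusters(profiles, 2))
```

Normalizing before computing distances keeps library size from dominating the dissimilarity, which is the same concern addressed in the study by the partially constrained RDA.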

Results and Discussion

Taxonomic Classification of OTUs and Shotgun Reads

Approximately 6.7% of the total shotgun reads were classified to class Hexapoda while approximately 0.2% of the total shotgun reads were classified as plant, indicating that the metagenome library was composed predominantly of microbial DNA. Amplicon sequencing identified seven distinct fungal OTUs and 166 bacterial OTUs using a 97% similarity threshold in mothur, while only a single insect OTU (2% of the total amplicons) and a single plant OTU

(0.53% of the total amplicons) were detected. Overall, fungal reads outnumbered bacterial reads, which could be attributed to a higher relative abundance of fungal taxa in the midgut or to preferential amplification of fungal amplicons with the 926F/1392R primers used in this study, as this dominance is not reflected in the shotgun sequencing data.

OTU taxonomic classification with RDP classifier detected the presence of 166 OTUs in seven bacterial phyla in the midgut community including Actinobacteria (30 OTUs),

Bacteroidetes (29 OTUs), Chlamydiae (1 OTU), Firmicutes (14 OTUs), Proteobacteria (80

OTUs), candidate phylum TM7 (3 OTUs), and Verrucomicrobia (5 OTUs), while four OTUs could not be conclusively assigned to any previously-characterized bacterial phyla. Rarefaction analysis and Chao richness estimates predict the presence of over 350 bacterial OTUs (95% confidence interval range: 266-517 OTUs), demonstrating that deeper sampling of amplicon data may result in the detection of additional less abundant bacterial taxa (Figure 2-1 and Table 2-2).

The most taxonomically-diverse phylum in terms of OTU richness was Proteobacteria, containing

80 distinct OTUs assigned to 22 different families. At the class level, 15 different bacterial classes were identified and the midgut community was dominated by six taxonomic classes

(Figure 2-2 and Table 2-3). Overall, the single most-prevalent OTU, which comprised over 21% of the bacterial amplicons, was a member of the family Leuconostocaceae that could not be classified to genus level by RDP. Comparison of this OTU to 16S sequences curated in the RDP database revealed that it had highest nucleotide sequence similarity to bacteria in the genus

Leuconostoc. Other predominant OTUs were assigned to the family Enterobacteriaceae (8.4% bacterial amplicons), the family Microbacteriaceae (8.3% bacterial amplicons), and to the higher phylum Actinobacteria (9.3% bacterial amplicons). Many OTUs could not be definitively assigned to low taxonomic levels, suggesting that the A. glabripennis midgut microbiota may serve as a reservoir for novel microbes. With the exception of the higher overall abundance of fungal 18S OTUs relative to bacterial 16S OTUs, the results of OTU abundance and classification were corroborated by phylogenetic binning of shotgun reads, which is less impacted by amplification biases relative to PCR-based approaches (Figure S1).
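The Chao richness estimate reported above for the bacterial OTUs extrapolates from the number of rare OTUs in the sample. A minimal sketch of the calculation is given below; it assumes the bias-corrected form of the Chao1 estimator, since the variant used is not stated here, and the read counts are invented for illustration.

```python
# Sketch of the bias-corrected Chao1 richness estimator:
#   S_chao1 = S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))
# where F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
# Assumption: the bias-corrected variant; the function name is hypothetical.
from collections import Counter


def chao1(otu_counts):
    """Estimate total OTU richness from a list of per-OTU read counts."""
    observed = sum(1 for n in otu_counts if n > 0)
    frequency = Counter(otu_counts)
    singletons, doubletons = frequency[1], frequency[2]
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Invented counts: eight observed OTUs, four singletons, two doubletons.
toy_counts = [10, 5, 2, 2, 1, 1, 1, 1]
print(chao1(toy_counts))
```

A large number of singletons relative to doubletons raises the estimate above the observed richness, which is why the predicted number of bacterial OTUs exceeds the 166 detected.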

Identification of Cellulose-, Hemicellulose- and Aromatic Compound- Degrading Bacterial

Taxa

Several genera of bacteria were detected in the A. glabripennis midgut community that have been previously implicated in the degradation of lignocellulose, hemicellulose, and aromatic hydrocarbons, including the following lignocellulose degrading bacteria previously isolated from the A. glabripennis midgut on carboxymethylcellulose-containing media or detected previously through 16S analyses: Brachybacterium, Bradyrhizobium, Corynebacterium, Rhizobium, Pseudomonas, Sphingomonas, and Xanthomonas [9, 12]. Furthermore, the midgut community sampled for this study strongly resembles the taxonomic compositions of larval gut communities previously sampled from insects feeding in Acer saccharinum in a separate population (Brooklyn, NY) [9] and from beetles collected in China [11], suggesting a consistent relationship between these microbial taxa and A. glabripennis. Of significance is that, unlike the termite and other herbivore-associated gut communities, the microbiota associated with the A. glabripennis midgut is dominated by aerobes and facultative anaerobes with very few obligate anaerobic taxa. To date, all characterized large-scale lignin degrading reactions require oxygen and have only been demonstrated in aerobic environments [53], such as the A. glabripennis midgut [11].

Identification of Fungal Community

Fungi are frequently encountered in guts of wood feeding insects [54], including A. glabripennis [13]; however, in contrast to the bacterial community, the fungal community is considerably less diverse, containing approximately 7 distinct OTUs. Rarefaction analysis and richness estimates predict 18 fungal OTUs (95% confidence interval: 8-31 OTUs) (Figure 2-3).

Compared to the 16S region in bacteria, 18S regions in fungi display considerably less sequence heterogeneity [55], even among distant relatives, and fungal diversity in the A. glabripennis midgut may therefore be underestimated. All fungal taxa detected belonged to the phylum Ascomycota, confirming a low abundance or complete absence of white-rot basidiomycetes in the midgut microbiota. All of the fungal taxa detected were yeasts assigned to the family Saccharomycetaceae. However, most could not be conclusively classified to genus level with MEGAN, but had highest-scoring BLAST alignments to the genera Issatchenkia (3

OTUs; 58% total fungal amplicons) and Saccharomyces (1 OTU; 36% total fungal amplicons).

The three other fungal OTUs were present as singletons and had highest-scoring BLAST alignments to the fungal genera Geotrichum, Pichia, and an unclassified member of the family

Archaeosporaceae. Many of these genera are phylogenetically close relatives to yeasts isolated from the guts of other wood-feeding cerambycid beetles [56], which are often capable of processing hemicellulose and fermenting xylose into ethanol, but are not known to degrade lignin or cellulose. Many wood- and plant-feeding insects, such as leaf-cutter ants [57], wood wasps

[58], bark beetles [59] and some termite species [60] maintain obligate external associations with non-yeast filamentous basidiomycete and ascomycete fungi and directly inoculate fungal isolates into their food sources, where they facilitate pre-digestion of lignocellulose and serve other nutrient-provisioning roles. These strategies substantially reduce the carbohydrate complexity and lignin content of the food substrate prior to ingestion by the insect. In contrast, A. glabripennis constitutively harbors a filamentous ascomycete belonging to the Fusarium solani species complex within its midgut [13]. Multilocus phylogenetic analysis of this isolate collected from several geographic populations revealed that the isolates harbored in the beetle gut are distinct from other previously characterized members of the F. solani species complex.

Moreover, this fungus could be detected in colony-reared insects feeding on sterile diet [13], suggesting that this fungus is intricately associated with the gut. Though F. solani was not detected in the 18S fungal amplicon data, F. solani has been cultivated previously from A. glabripennis beetle guts collected at this field site [13] and reads taxonomically classified to the genus Fusarium were detected in the shotgun reads. This low abundance of F. solani reads in the shotgun libraries is likely due to excluding the peritrophic matrix from the sample as F. solani is likely associated with the gut wall tissue. Members of the Fusarium species complex are metabolically versatile and often harbor lignin peroxidase and other ligninase homologs [61], which suggests that this fungus may contribute to lignin degradation in the A. glabripennis midgut [62].

Functional Profiling of Reads Generated through 454 Shotgun Sequencing

Approximately 65% of the high quality 454 reads generated had BLASTX matches to proteins from the non-redundant protein database at an e-value of 0.00001 or lower. Of these reads, approximately 79% had best alignment scores to annotated proteins, while the remaining

21% had highest scoring BLAST alignments to hypothetical or uncharacterized proteins. Overall, the most abundant BLAST and Pfam domain assignments associated with the midgut microbial community belonged to ABC transporters, major facilitator transporters, alcohol dehydrogenases, and aldehyde dehydrogenases. Functional categorization of shotgun reads by both COG and

SEED assignments predicted that the majority of the reads originated from pathways involved in the metabolism of carbohydrates and amino acids (Figure 2-4). Annotation statistics are summarized in Table 2-4 and annotations are publicly available through MG-RAST at http://metagenomics.anl.gov/ under the identification number 4453653.3 and JGI IMG/M at http://img.jgi.doe.gov/m/ under project ID Gm00068.

Comparison of Functional Domains from Other Herbivore Associated Microbial

Communities

Hierarchical agglomerative cluster analysis based on Pfam abundances from herbivore-associated metagenomes did not appear to group the microbial communities based on the taxonomic relatedness of their herbivore hosts (Figure 2-5). Although many of the beetle gut communities and fungal gallery communities are derived from closely related beetles and cluster together, several notable exceptions suggest that factors other than taxonomic relatedness contribute to the hierarchical clustering pattern observed. For example, although A. glabripennis (Order Coleoptera) and S. noctilio (Order Hymenoptera) belong to two different insect orders, their microbial communities can be found in the same group in the cluster analysis, suggesting that they share similarities in microbial metabolic capabilities. Additionally, the two hymenopterans included in this comparison (honey bee and Sirex) fall into two distant clusters.

However, a clear division between gut communities and fungal gallery communities is apparent, with the exception of the ant fungal garden, which clustered with the herbivore gut communities and was previously hypothesized to function as an external rumen [63]. The A. glabripennis midgut community is also an exception as it clustered with the fungal gallery communities.

Interestingly, many of the fungal gallery communities that cluster with the A. glabripennis metagenome are hypothesized to have lignin degrading capabilities, which is in contrast to the ant fungal garden community. While cellulose and hemicellulose were preferentially degraded in the fungal gardens, lignin remained relatively unscathed and was ultimately discarded by the insects

[63]. The same pattern of cell wall digestion has also been observed in the rumens of many grass-feeding herbivores [64].

Although fungal communities cultivated by bark beetles [65] are primed to synthesize nutrients and detoxify plant secondary metabolites [66], penetration of the lignin barrier enhances access to cellulose and hemicellulose present in both phloem and xylem tissues where bark beetles feed. Although the fate of lignin in the majority of these systems is unclear, lignin degradation and aromatic compound metabolism have been demonstrated in a Fusarium solani fungal gallery strain associated with xylem-feeding ambrosia beetles (e.g., Xyleborus) [67].

Thus, the fungal gallery communities associated with these phloem and xylem feeding beetles have the potential to harbor lignin degrading genes capable of degrading woody tissue. The final cluster in our analysis contains microbial communities associated with insects feeding on heartwood and includes the A. glabripennis midgut and Sirex wood wasp fungal gallery

communities. Notably, these wood-feeding communities are relatively distant from those associated with the other herbivore guts or the other fungal gallery communities included in this comparison, suggesting that these communities may harbor genes that encode enzymes optimized for breaking down complex and recalcitrant woody tissue. Like A. glabripennis, the Sirex fungal gallery community is also capable of disrupting lignin polymers and the community contains a lignin degrading white rot fungus belonging to the genus Amylostereum, which produces manganese peroxidases and laccases [68].

The groupings detected through hierarchical cluster analysis are also supported by

Principal Components Ordination (Figure 2-6). The X-axis separates the majority of the gut communities from the gallery communities with the notable exception of the A. glabripennis midgut, which is clearly distinct from the other gut metagenomes and was placed in close proximity to the Sirex fungal gallery microbiome. The Y-axis separates fungal gallery communities associated with phloem-feeding herbivores from those associated with wood-feeding herbivores that bore deep into the heartwood. Although both Sirex and A. glabripennis insects feed in similar regions of their host trees, Sirex has a limited host range relative to A. glabripennis and feeds exclusively on the genus Pinus [69]. In contrast, A. glabripennis has a much broader host range and feeds in the heartwood of over 25 deciduous tree species in the United States

(http://www.aphis.usda.gov/plant_health/plant_pest_info/asian_lhb/downloads/hostlist.pdf) and

47 tree species in its native range [30]. These differences in lifestyle are also reflected in the

PCA ordination. Although the A. glabripennis midgut community is most similar to the Sirex fungal gallery community, the distance between these two metagenomes is still substantial and could be partially driven by differences in host range breadth and environment (e.g., gut vs. gallery).

Candidate Genes for Lignin Degrading Enzymes

Genes encoding enzymes that have been previously implicated in lignin degradation were identified in the microbiomes affiliated with both the midgut of A. glabripennis and the fungal gallery communities, and may be partially responsible for driving the grouping of these communities in the hierarchical analysis (Table B1). This is in contrast to the results of a recent comparative metagenomic study that concluded host-associated communities lacked the metabolic potential to degrade lignin [86], and may indicate that the A. glabripennis midgut community represents an exception. A number of bacterial and fungal reads with copper oxidase

(Cu oxidase) Pfam domains were detected in the A. glabripennis midgut, which could have laccase-type activity in vivo [70]. While many of these reads had corresponding BLAST assignments to laccases, multicopper oxidases, and polyphenol oxidases, a large number of the annotations were to hypothetical proteins and could represent novel and previously uncharacterized laccase-type enzymes. While laccases do not endogenously have a high enough redox potential to cleave major linkages in polymeric lignin [71], their activity can be enhanced in the presence of natural redox mediators [72] and they are capable of disrupting β-aryl ether bonds under these conditions. β-aryl ethers represent the dominant linkage in hardwood lignin and, as a consequence, disruption of these linkages represents a critical step in lignin degradation [73].

A number of other extracellular peroxidases that are often highly expressed by lignin degrading microbes during periods of active lignin degradation were also detected. These include iron-dependent peroxidases, thiol peroxidases, and a number of other uncharacterized peroxidases. The potential participation of these peroxidases in large-scale lignin degradation is also supported by the detection of a number of peroxide-generating enzymes containing predicted

leader sequences for extracellular targeting. These included aryl alcohol oxidases, FAD oxidases, glyoxal oxidases, GMC oxidoreductases, and pyranose oxidases.

Bacterial dye-decolorizing peroxidases, also known as dyp-type peroxidases, were detected in association with the A. glabripennis midgut microbiota and microbial communities associated with other wood-feeding insects, and have previously been shown to cleave β-aryl ether linkages in both syringyl and guaiacyl lignin in a hydrogen peroxide dependent manner

[74]. While there is some evidence that manganese may act as a diffusible redox mediator in some bacterial dyp-type peroxidases [74], not all β-aryl ether cleaving peroxidases have identifiable manganese binding sites; thus, manganese may enhance the activity of only a subset of these peroxidases [75]. Furthermore, reads for another set of β-aryl ether degrading enzymes were also discovered, which have been shown to catalyze the cleavage of these bonds in a glutathione-dependent manner. These enzymes were classified as β-etherases or glutathione-S-transferases [76]. In order to cleave β-aryl ether linkages, these enzymes first require oxidation of the Cα primary alcohol by aryl alcohol dehydrogenase (or Cα dehydrogenase) to generate a ketone group. The presence of a ketone group immediately adjacent to the ether linkage increases the polarity of the ether bond, allowing the ether bond to be easily cleaved by β-etherase, using glutathione as a hydrogen donor [77]. However, these GST (β-etherase) functional domains were not exclusively present in candidate lignin degrading genes [78] and are also associated with genes involved in detoxification (i.e., glutathione S-transferases) [79]. Therefore, only a subset of the GST domain proteins reported in this analysis are lignin degrading candidates. The role of dyp-type peroxidases and β-etherases in polymeric lignin degradation has yet to be clarified.

While some bacteria harboring these genes can cleave β-aryl ether linkages in dimeric lignin model compounds and Kraft and wheat straw lignin, their ability to catalyze degradation of an intact biopolymer from woody plants is unknown [80].

Of significance is that the majority of the lignin degrading genes present in the A. glabripennis midgut community are either absent or present in very low abundances in the communities associated with herbivore guts, including panda, reindeer, honey bee, wallaby, and termites. This finding suggests that these herbivore communities may have alternate genes and mechanisms that could have lignin degrading roles in vivo or that some of these gut-associated communities lack lignin degrading capabilities altogether. In contrast, these lignin degrading candidates were highly abundant in the communities associated with wood-feeding insects, including the Sirex fungal gallery and A. glabripennis midgut. Consistent with their hypothesized role in the pre-digestion of lignocellulose for phloem-feeding insects, many lignin-degrading candidates were also found in high abundances in the fungal galleries of phloem feeding bark beetles. Although small subsets of these lignin degrading genes were also detected in guts of phloem feeding insects, these genes are likely environmentally derived and were acquired by feeding on the fungal gallery inoculum, or they may also be encoded by microbes housed in the gut. Notably, peroxidases and extracellular hydrogen-peroxide generating enzymes were overrepresented in the A. glabripennis midgut community relative to other communities included in this analysis, suggesting that this community may have alternative pathways for degrading core lignin.

In addition to the high abundances of putative laccases, dyp-type peroxidases, and hydrogen peroxide generating enzymes (FAD oxidases and GMC oxidoreductases) in the fungal gallery communities and the A. glabripennis midgut community, another class of putative lignin degrading enzymes (aldo-keto reductases: AKRs) was well represented in the termite gut communities, the tammar wallaby gut community, a subset of the fungal gallery communities (e.g.

Xyleborus, DP Fungal Alberta (hybrid), DP Fungal Alberta, and DF Fungal Mississippi), and the

A. glabripennis midgut community. An endogenous termite AKR capable of degrading lignin

phenolics and enhancing sugar release from pine sawdust was recently characterized [81] and subsets of microbial AKRs can act as Cα dehydrogenases, which can work in conjunction with β-etherases to cleave β-aryl ethers [77]. Microbial AKRs are well represented in the termite gut communities and have the potential to collaborate with host-derived AKRs to enhance ligninase activity in the gut. Interestingly, microbial AKRs are overrepresented in the A. glabripennis gut community relative to most other communities included in the comparison and have the potential to make contributions to digestion of lignin in this system. Taken together, we hypothesize that the A. glabripennis midgut metagenome has a lignin degrading capacity distinct from the termites and other herbivore associated communities that could be prospected for biotechnology purposes.

This possibility is supported by the fact that biochemical modifications to lignin detected in the gut of a lower termite (Zootermopsis angusticollis) were different than the lignin modifications detected in the A. glabripennis gut [8].

Candidate Genes for Cellulases and Carbohydrases

Although many of the reads with predicted involvement in carbohydrate digestion are involved in core metabolic pathways, such as glycolysis, many also were annotated by BLAST as accessory enzymes that can digest cellulose and other plant cell wall carbohydrates. For example, reads were classified into 36 different glycoside hydrolase (GH) families based on a combination of Pfam domain and KEGG enzyme class assignments (Figure 2-7, Table B2). The most abundant CAZyme (Carbohydrate Active Enzyme) families detected were represented by families

GH 1 and GH 3 and their associated KEGG EC assignments are presented in Table 5. The majority of these GH 1 and 3 enzymes were predicted to encode β-glucosidases.

Many of these GH families could have key roles in processing cellulose, hemicellulose, and other plant polysaccharides in the A. glabripennis midgut. Of particular interest are cellulases (endoglucanases, exoglucanases, and β-glucosidases) that could augment the activities of cellulases inherently produced by A. glabripennis, enhancing the release of glucose from this highly insoluble and indigestible polysaccharide. Microbial cellulases detected in the A. glabripennis midgut metagenome were classified to seven different GH families, including GH 1,

GH 3, GH 5, GH 6, GH 9, GH 45, and GH 61 and their corresponding KEGG E.C. assignments suggest the presence of all enzymes necessary to liberate glucose from cellulose. We hypothesize that these microbial derived cellulases can collaborate with host enzymes to enhance cellulase activity in the midgut of A. glabripennis. Alternatively, the overabundance of microbial-derived

β-glucosidases may also allow microbes associated with the gut to exploit cellulose degradation products released by endogenous beetle cellulases secreted into the gut; however, the interactions among the beetle and its gut microbes are likely diverse, intricate, and dynamic, and an explanation of why these β-glucosidases are overrepresented in this community cannot be fully determined without further investigation. Additionally, reads were detected with highest BLAST scores to components of cellulosomes and other proteins with carbohydrate binding motifs, which facilitate binding to the cellulose substrate and allow hydrolytic enzymes to act processively and efficiently to release cellobiose and other cello-oligomers.

Candidate Genes for Xylose Utilization and Fermentation

GH families involved in processing hemicellulose were also detected; in general, the structure of hemicellulose is significantly more heterogeneous in comparison to cellulose and is comprised of a matrix of polysaccharides including xylan, glucuronoxylan, arabinoxylan,

glucomannan, and xyloglucan. The heterogeneity both in terms of subunit and linkage composition signifies that degrading this prominent group of cell wall polysaccharides requires a greater diversity of enzymes, although xylan and xyloglucans are the dominant hemicellulose polysaccharides in woody plants [82]. Not surprisingly, a number of GH families involved in breaking α- and β-linkages in xylan and xyloglucans were detected in the metagenome, including

GH families 5, 8, 10, 11, 26, 39, and 43.

Sugar monomers liberated from xylan can be efficiently metabolized by the midgut microbiota (Figure 2-8). Of particular importance is the ability to process xylose and arabinose, as mechanisms for insect utilization of plant-derived pentose sugars have not been reported [28] and these sugars are inherently difficult to ferment on an industrial scale. Enzymes from both bacterial and fungal pathways that convert D-xylose into D-xylulose-5-phosphate [83] are well represented in the shotgun data. D-xylulose-5-phosphate can be processed via the pentose phosphate pathway to produce glyceraldehyde-3-phosphate and fructose-6-phosphate, which can enter the glycolysis pathway [84]. Ultimately, pyruvate produced through glycolysis can be converted to acetaldehyde by pyruvate decarboxylase [85] and then to ethanol by alcohol dehydrogenase. Alternatively, acetaldehyde can be oxidized by acetaldehyde dehydrogenase [86] to acetate, which can be used as a building block for fatty acid production. Although arabinose is a minor constituent of hemicellulose in woody plants, it can be converted to D-xylulose-5-phosphate by L-arabinose isomerase and L-ribulokinase, after which it can be further processed by the pentose phosphate and glycolysis pathways to generate fermentable products [87]. All enzymes required to convert xylose and arabinose to ethanol (or acetate) are present in the A. glabripennis midgut community. Thus, this community could serve as a reservoir for novel enzymes that could be exploited to enhance industrial xylose fermentation.

Candidate Genes for Pectin Degrading Enzymes

Liberation of sugar monomers from both cellulose and hemicellulose is greatly enhanced when bonds crosslinking these compounds to pectin and lignin are disrupted, releasing polysaccharide termini and promoting easy access by processive hydrolytic enzymes. Pectin is a polysaccharide comprised primarily of α-galacturonic acid residues and it is often esterified to hemicellulosic and cellulosic polysaccharides in heartwood [88]. Degradation of pectin catalyzed by GH 28 polygalacturonases, pectin , pectin , and pectin acetylases and the disruption of ester linkages between pectin and other structural polysaccharides by , esterases, and acetyl xylan esterases produced by members of the A. glabripennis midgut community could indirectly facilitate cellulose and hemicellulose digestion by exposing polysaccharide termini to hydrolytic enzymes. Galacturonic acid residues released from this polysaccharide can be used as an energy source by the gut microbial community or A. glabripennis as microbial pathways involved in processing galactose and galacturonic acid were detected and pathways involved in galactose utilization have been previously described in beetles

[89].

Candidate Genes for Nutrient Acquisition and Synthesis

Nutrients are extremely scarce in the heartwood where the later instars of A. glabripennis feed. For example, nitrogen is limiting in woody biomass [90] and nitrogen sources originating from plant cell wall proteins are intricately cross-linked with recalcitrant plant cell wall polysaccharides and biopolymers [91], while other dietary components, including fatty acids, sterols, and vitamins are present in extremely low abundances or are absent altogether [25].

Besides the abilities of cerambycid beetles to produce endogenous cellulases and detoxification enzymes [14, 16, 92], little is known about their endogenous digestive and metabolic capabilities.

Despite this, transcriptome profiling of other Coleopterans revealed that beetles have impressive endogenous digestive and metabolic capabilities and produce diverse arrays of cell-wall degrading enzymes [93] and detoxification enzymes [94, 95]. However, several pathways leading to the synthesis of sterols [96], aromatic amino acids, and branched chain amino acids are blocked at multiple steps [97], and these nutrients must be acquired either from the food source or through interactions with gut microbes. Because these nutrients are scarce in woody tissue, it is hypothesized that microbes associated with wood-feeding beetles can synthesize essential nutrients, facilitate nutrient recovery from woody tissue, and augment endogenous detoxification enzyme activities [25, 98-100].

Candidate Genes for Nitrogen Acquisition

The C:N ratio in the heartwood of hardwood trees can be as high as 1000:1, although plant cell wall proteins cross-linked in the cell wall matrix may serve as a reservoir of protein sources for organisms that live in this habitat. However, there is much debate about whether or not the protein concentrations in woody tissues are high enough to obtain a sufficient amount of nitrogen for de novo synthesis of nucleotides and amino acids. Therefore, it is generally hypothesized that insects and microbes colonizing the heartwood have mechanisms in place to acquire and utilize atmospheric nitrogen or have efficient pathways to recycle nitrogenous waste products [90]. Several bacterial nitrogen fixing genes were identified whose products convert atmospheric nitrogen to ammonia, which could then be assimilated and used by the beetle and other members of the midgut community. Consistent with this, ammonium transporters and glutamine synthetases, which actively transport ammonia into the cell and subsequently convert ammonia and glutamate

into glutamine, are also highly represented in the A. glabripennis midgut community. In addition, ammonia (a major byproduct of amino acid deamination reactions) [101], urea (a major waste product of amino acid degradation produced by bacteria) and uric acid (a major nitrogenous waste product produced by insects) [102] represent suitable sources of nitrogen that can be recovered and recycled through urease, uricase, and allantoin degradation pathways encoded by the midgut community. Overall, reads assigned to recycling pathways were far more abundant than reads assigned to nitrogen fixing pathways; therefore, we hypothesize that nitrogen recycling might make important contributions to the nitrogen economy in the larval A. glabripennis midgut community. Alternatively, nitrogen fixation pathways may also be prominent in the A. glabripennis community, but these bacteria may be more associated with other regions of the gut where oxygen levels are lower (e.g., hindgut), which were not sampled for this study.

Furthermore, a wide array of proteinases with broad substrate abilities is associated with the gut community. This array of enzymes has the capacity to degrade plant proteins released from the plant cell wall matrix during active lignocellulose degradation and scavenge nitrogen from xenobiotic substrates, including cyanide, alkaloids [103], and non-protein amino acids (i.e., cyanoamino acids) [104]. Finally, the gut community possesses full or partial pathways for the synthesis of 23 amino acids, including full pathways for the biosynthesis of aromatic amino acids.

Candidate Genes for Sterol, Vitamin, and Fatty Acid Synthesis

Other nutrients notably missing or present in low abundances in woody tissue include sterols, vitamins, fatty acids, and inorganic ions [25]. Unlike other animals, insects cannot synthesize cholesterol as this pathway is blocked at several steps; thus, they must acquire sterols that can be converted to cholesterol from their feeding substrate [105]. Many wood-feeding insects (e.g., ambrosia beetles) convert ergosterols produced by cultivated fungal symbionts into cholesterol [106], while others actively convert a variety of phytosterols produced by plants into cholesterol [107]. The F. solani isolate as well as yeasts harbored in the A. glabripennis gut have the capacity to contribute to the synthesis of cholesterol and, accordingly, a number of ergosterol synthesis genes (e.g., C-22 sterol desaturase, cytochrome P450s, and lanosterol 14-alpha demethylase) assigned to phylum Ascomycota were detected. Vitamins and other nutrients missing from woody tissue can be produced or efficiently assimilated by the A. glabripennis gut community. A combination of acetate, produced via conversion of sugar monomers liberated from woody polysaccharides, and coenzyme A, synthesized by microbial constituents, could be used to synthesize acetyl-CoA, which is the essential building block for fatty acid synthesis [108].

Furthermore, pathways for synthesizing biotin (vitamin B7), coenzyme A, folate (vitamin B9), lipoic acid, pyridoxine (vitamin B6), riboflavin (vitamin B2), thiamine (vitamin B1), and ubiquinone (coenzyme Q10) are well represented in the gut community.

Candidate Genes for Detoxification

Woody plants produce an array of secondary metabolites and digestive enzyme inhibitors in an attempt to restrict insect herbivory and colonization by pathogenic microbes. These compounds often accumulate in the heartwood of the plant [109]. While many insects endogenously produce impressive arrays of detoxification enzymes or have mechanisms to sequester plant toxins, many beetle species directly benefit from detoxification enzymes produced by microbes [110, 111]. For example, microbial communities associated with bark beetles feeding in phloem tissue, which serves as a conduit for toxic defensive chemicals, are highly enriched for detoxification genes [112]. The A. glabripennis midgut microbial community also encodes genes that can mitigate host plant defenses. A number of bacterial and fungal reads had highest scoring BLAST alignments to host plant inducible cytochrome P450s, which are known to promiscuously degrade xenobiotic substrates in an oxidoreductive manner [113]. Reads corresponding to enzymes involved in glutathione-mediated detoxification, including glutathione peroxidases, glutathione-S-transferases, and glutathione reductases, were detected in the gut metagenome. The broad substrate specificities of these quintessential detoxification enzymes allow them to act on a wide range of toxic metabolites produced by many species of host trees. Additionally, most plants produce salicylic acid as a defense mediator against pathogens, which induces the production of defensive compounds. Salicylic acid and its regulated pathways also have indirect roles in anti-herbivory defenses since they can negatively impact symbiotic microbes associated with herbivores. However, the gut community is capable of producing a number of isochorismatase family proteins hypothesized to disrupt the salicylic acid pathway, which uses isochorismate as a key intermediate [114]. A number of salicylate hydratases were also found in the A. glabripennis gut metagenome that could directly destroy salicylic acid to prevent induction of plant defensive pathways.

Metabolism of lignin also releases highly toxic metabolites, which can cause irreversible damage to the peritrophic matrix, digestive enzymes, and gut-associated microbes. While the cytochrome P450 enzymes mentioned previously could aid in the detoxification of these metabolites, other xenobiotic degrading enzymes were detected that could be involved in these processes, including glutathione S-transferases, glutathione S-peroxidases, epoxide hydrolases, aldo-keto reductases, and alcohol dehydrogenases. Further, several enzymes hypothesized to directly break down small metabolites released from large-scale lignin degradation were detected in the A. glabripennis metagenome, including lignostilbene-α-β-dioxygenases, 1,2 and 3,4 aromatic ring dioxygenases, biphenyl 2,3 dioxygenases, and ligX, ligZ, ligY, ligW, and ligW1, which have been observed to coordinate the degradation of ferulic acid and other small molecules released from lignin degradation [115]. A number of enzymes that could function as antioxidants were also detected, which may prevent oxidative damage to the midgut or the microbiota from the ingestion of toxic dietary compounds (e.g., tannins) or from oxidative degradation of lignin.

Finally, one of the most common defense mechanisms employed by plants to reduce herbivory is to produce digestive proteinase inhibitors that restrict an organism’s ability to break down proteins and assimilate nitrogen [116]. These proteinase inhibitors typically show high specificity and target a single family of proteinases; however, many insects have evolved a mechanism to overcome these plant defenses by producing a different type of peptidase whose activity and integrity are not impacted by these plant inhibitors [117]. The A. glabripennis microbial gut community has the genetic capacity to produce an assortment of digestive proteinase classes hypothesized to serve as alternative sources of proteinase activity in the event that host plant proteinase inhibitors disrupt the endogenous proteinase families produced by A. glabripennis. Reduction of cysteine proteinase activity in antibiotic-treated western corn rootworm (Coleoptera: Diabrotica virgifera virgifera) has been previously reported [118], demonstrating a clear role for microbially derived proteinases in insect digestive physiology.

Candidate Genes from Fusarium

Filamentous fungi belonging to the Fusarium species complex have been observed in association with beetles collected from all US populations and from several species of host trees. Mass spectroscopy-based protein identification techniques and in vitro enzyme assays of an F. solani strain associated with the A. glabripennis gut and cultivated on wood chips demonstrated that this isolate is capable of producing several extracellular laccase enzymes, indicating that it has lignin degrading potential. Furthermore, this isolate expressed 28 families of glycoside hydrolases, many of which had predicted cellulase and xylanase activities [62]. In addition to these previously reported findings, genes classified to the genera Fusarium/Nectria that were detected in this analysis included flavin-containing amine oxidoreductases (ammonium generation), glutathione-dependent formaldehyde-activating enzyme (methane metabolism), several sugar transporters, and several short chain dehydrogenases, which can participate in many biochemical processes including sterol synthesis, metabolism of sugar alcohols, and metabolism of fermentation products. Whole genome sequencing is currently underway to compile a complete genetic inventory of this unique fungal strain and will provide more comprehensive insight into its role in the A. glabripennis midgut.

Candidate Genes from Leuconostoc

Although sequencing coverage was not deep enough to generate draft genomes of any individual OTU in the A. glabripennis gut community, roughly 22,000 high quality reads (7.8 Mb) classified to genus Leuconostoc were detected in the A. glabripennis gut metagenome. Bacteria from the genus Leuconostoc and other lactic acid bacteria have been previously identified in the guts of A. glabripennis larvae collected from other populations [9] and several other species of coleopterans (e.g., Agrilus planipennis and beetles in the family Carabidae) [119]. Many genes taxonomically classified to this genus had highest scoring BLAST alignments to xylose fermentation pathways, pathways for utilization of pentose wood sugars, nitrogen recycling enzymes, nutrient synthesizing enzymes, and enzymes with detoxification abilities. A large number of cellobiose phosphorylases and glycoside hydrolase family 1 β-glucosidases were identified, which could be involved in degrading cellobiose disaccharides released from cellulose chains. In addition, a number of genes predicted to encode xylose transporters and xylose fermentation pathways were detected. Further, genes for the uptake and fermentation of other pentose sugars present in hemicellulose, including ribose and arabinose, were detected. Genes annotated as aromatic acid dioxygenases and aryl alcohol dehydrogenases, which could catalyze the degradation of aromatic subunits released from the lignin biopolymer or serve as helper enzymes for β-aryl ether cleavage catalyzed by dyp-type peroxidases, were also identified.

Additionally, pathways involved in nutrient synthesis were detected, including pathways for the synthesis of branched chain amino acids, aromatic amino acids, sterols, and vitamins, as well as enzymes that could function as antioxidants or in detoxification (e.g., cyanide hydratases). Given these metabolic capacities for pentose sugar fermentation, nutrient synthesis, and detoxification, complete genome assembly for the Leuconostoc strains found in association with the A. glabripennis midgut and more in-depth studies to characterize the interactions between Leuconostoc species and A. glabripennis would be of value to pursue in future research.
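As a rough check on the scale of the Leuconostoc dataset reported in this section (roughly 22,000 reads totaling 7.8 Mb), the mean read length can be estimated by simple division. The sketch below assumes that 1 Mb equals 1,000,000 bases and treats both reported figures as approximate; the function and variable names are illustrative and not part of the original analysis.

```python
def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Return the average number of bases per read."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return total_bases / n_reads


# Approximate values reported in the text (assumption: 1 Mb = 1e6 bases).
N_READS = 22_000
TOTAL_BASES = 7.8e6

estimate = mean_read_length(TOTAL_BASES, N_READS)
print(round(estimate))  # about 355 bases per read
```

Because both inputs are rounded in the text, the result should be read as an order-of-magnitude estimate rather than a measured read length.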

Conclusion

This study represents the first large scale functional metagenomic analysis of the midgut microbial community of a cerambycid beetle with documented lignin degrading capabilities [8]. A taxonomically diverse assemblage of bacteria and fungi is associated with the midgut of A. glabripennis, and this study has shown that this community harbors the enzymatic capacity for extensive contributions to the digestion of woody tissue. Of relevance are i) a microbial community dominated by bacterial and fungal aerobes and facultative anaerobes, indicating an appropriate aerobic environment in the midgut for microbial enzymes involved in oxygen-dependent lignin degradative processes, ii) the similarity of the A. glabripennis midgut microbiota to the Sirex fungal gallery community and its distinction from other herbivore gut communities, including the termite hindgut communities, iii) detection of genes encoding secreted oxidative enzymes hypothesized to have roles in cleaving β-aryl ether linkages in lignin, iv) detection of extracellular H2O2-generating enzymes, and v) detection of a number of genera with predicted lignocellulolytic and hemicellulolytic capabilities.

The midgut community of A. glabripennis has the metabolic potential to produce enzymes to help this wood-boring insect overcome major nutritional challenges associated with feeding in woody tissue and we hypothesize that interactions between the beetle and its gut microbes drive this insect’s ability to colonize and thrive in a broad range of healthy host trees. This wood-degrading system should also have great potential for the development of novel lignocellulose degrading enzymes for applications by the biofuels industry. This study provides the first glimpse into the metabolic potential of the gut community associated with a cerambycid beetle and lays the foundations for future hypothesis-based research, including more in-depth biochemical studies, comparative metagenomics, metatranscriptomics, and pathway modeling to assess potential metabolic cross-talk between this beetle and its gut microbes.

Acknowledgements

Amplicon and metagenomic shotgun sequencing were performed at the Department of Energy-Joint Genome Institute. We thank Susannah Tringe, Kerrie Barry, Tijana Glavina del Rio, and Mansi Chovitia at JGI for assistance with metagenomic library preparation, sequencing, and annotation. Annotation of shotgun reads was performed using computing resources available at the USDA-ARS Pacific Basin Agriculture Research Center (Moana cluster; Hilo, HI), the Hawaii Open Supercomputing Center at the University of Hawaii (Jaws cluster; Maui, HI), and the Research Computing and Cyberinfrastructure Group at The Pennsylvania State University (LionX clusters; University Park, PA). We thank Al Sawyer’s group at USDA-APHIS in Otis, MA, the Massachusetts Department of Conservation and Recreation, and Maya Nehme for assistance collecting insects. Funding for this project was provided by USDA-NRI-CSREES grant 2008-35504-04464, USDA-NRI-CSREES grant 2009-35302-05286, the Alphawood Foundation, a Seed Grant to KH from the Pennsylvania State University College of Agricultural Sciences, and a USDA-AFRI Microbial Functional Genomics Training grant 2010-65110-20488 to EDS. Work conducted by the U.S. Department of Energy Joint Genome Institute is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Literature Cited

1. Jorgensen H, Vibe-Pedersen J, Larsen J, Felby C: Liquefaction of lignocellulose at

high-solids concentrations. Biotechnology and bioengineering 2007, 96(5):862-870.

2. Alvira P, Tomás-Pejó E, Ballesteros M, Negro MJ: Pretreatment technologies for an

efficient bioethanol production process based on enzymatic hydrolysis: a review.

Bioresource Technology 2010, 101(13):4851-4861.

3. Campbell MM, Sederoff RR: Variation in lignin content and composition

(mechanisms of control and implications for the genetic improvement of plants).

Plant Physiol 1996, 110(1):3-13.

4. Watanabe H, Tokuda G: Cellulolytic systems in insects. Annu Rev Entomol 2009, 55:23.

5. Huang SW, Zhang HY, Marshall S, Jackson TA: The scarab gut: A potential

bioreactor for bio‐fuel production. Insect Science 2010, 17(3):175-183.

6. Haack RA, Law KR, Mastro VC, Ossenbruggen HS, Raimo BJ: New York's battle with

the Asian long-horned beetle. J Forest 1997, 95(12):11-15.

7. Haack RA, Herard F, Sun JH, Turgeon JJ: Managing invasive populations of Asian

longhorned beetle and citrus longhorned beetle: A worldwide perspective. In: Annual

Review of Entomology. vol. 55. Palo Alto: Annual Reviews; 2010: 521-546.

8. Geib SM, Filley TR, Hatcher PG, Hoover K, Carlson JE, Jimenez-Gasco Mdel M,

Nakagawa-Izumi A, Sleighter RL, Tien M: Lignin degradation in wood-feeding

insects. Proceedings of the National Academy of Sciences of the United States of America

2008, 105(35):12932-12937.

9. Geib SM, Jimenez-Gasco Mdel M, Carlson JE, Tien M, Hoover K: Effect of host tree

species on cellulase activity and bacterial community composition in the gut of

larval Asian longhorned beetle. Environ Entomol 2009, 38(3):686-699.

10. Kirk TK, Farrell RL: Enzymatic combustion - the microbial-degradation of lignin.

Annual Review of Microbiology 1987, 41:465-505.

11. Schloss PD, Delalibera I, Handelsman J, Raffa KF: Bacteria associated with the guts of

two wood-boring beetles: Anoplophora glabripennis and Saperda vestita

(Cerambycidae). Environ Entomol 2006, 35(3):625-629.

12. Geib SM, Jimenez-Gasco Mdel M, Carlson JE, Tien M, Jabbour R, Hoover K: Microbial

community profiling to investigate transmission of bacteria between life stages of

the wood-boring beetle, Anoplophora glabripennis. Microbial ecology 2009,

58(1):199-211.

13. Geib SM, Scully ED, Jimenez-Gasco MdM, Carlson JE, Tien M, Hoover K:

Phylogenetic analysis of Fusarium solani associated with the Asian longhorned

beetle, Anoplophora glabripennis. Insects 2012, 3(1):141-160.

14. Lee SJ, Kim SR, Yoon HJ, Kim I, Lee KS, Je YH, Lee SM, Seo SJ, Sohn HD, Jin BR:

cDNA cloning, expression, and enzymatic activity of a cellulase from the mulberry

longicorn beetle, Apriona germari. Comp Biochem Phys B 2004, 139(1):107-116.

15. Sugimura M, Watanabe H, Lo N, Saito H: Purification, characterization, cDNA

cloning and nucleotide sequencing of a cellulase from the yellow-spotted longicorn

beetle, Psacothea hilaris. Eur J Biochem 2003, 270(16):3455-3460.

16. Geib SM, Tien M, Hoover K: Identification of proteins involved in lignocellulose

degradation using in gel zymogram analysis combined with mass spectroscopy-

based peptide analysis of gut proteins from larval Asian longhorned beetles,

Anoplophora glabripennis. Insect Science 2010, 17(3):253-264.

17. Kukor JJ, Cowan DP, Martin MM: The role of ingested fungal enzymes in cellulose

digestion in the larvae of cerambycid beetles. Physiological Zoology 1988, 61(4):364-

371.

18. Brennan Y, Callen WN, Christoffersen L, Dupree P, Goubet F, Healey S, Hernandez M,

Keller M, Li K, Palackal N et al: Unusual microbial xylanases from insect guts.

Applied and environmental microbiology 2004, 70(6):3609-3617.

19. King AJ, Cragg SM, Li Y, Dymond J, Guille MJ, Bowles DJ, Bruce NC, Graham IA,

McQueen-Mason SJ: Molecular insight into lignocellulose digestion by a marine

isopod in the absence of gut microbes. Proceedings of the National Academy of

Sciences of the United States of America 2010, 107(12):5345-5350.

20. Coy MR, Salem TZ, Denton JS, Kovaleva ES, Liu Z, Barber DS, Campbell JH, Davis

DC, Buchman GW, Boucias DG et al: Phenol-oxidizing laccases from the termite gut.

Insect Biochemistry and Molecular Biology 2010, 40(10):723-732.

21. Breznak JA, Brune A: Role of microorganisms in the digestion of lignocellulose by

termites. Annual Review of Entomology 1994, 39:453-487.

22. Mathew GM, Mathew DC, Lo S-C, Alexios GM, Yang J-C, Sashikumar JM, Shaikh TM,

Huang C-C: Synergistic collaboration of gut symbionts in Odontotermes formosanus

for lignocellulosic degradation and Bio-hydrogen production. Bioresource

Technology 2012.

23. Francke-Grosmann H: Ectosymbiosis in wood-inhabiting insects. Symbiosis 1967,

2:141-205.

24. Jonsell M, Nordlander G, Jonsson M: Colonization patterns of insects breeding in

wood-decaying fungi. Journal of Insect Conservation 1999, 3(2):145-161.

25. Dillon RJ, Dillon VM: The gut bacteria of insects: nonpathogenic interactions.

Annual Review of Entomology 2004, 49:71-92.

26. Lemke T, Stingl U, Egert M, Friedrich MW, Brune A: Physicochemical conditions and

microbial activities in the highly alkaline gut of the humus-feeding larva of

Pachnoda ephippiata (Coleoptera: Scarabaeidae). Applied and environmental

microbiology 2003, 69(11):6650-6658.

27. Warnecke F, Luginbuhl P, Ivanova N, Ghassemian M, Richardson TH, Stege JT,

Cayouette M, McHardy AC, Djordjevic G, Aboushadi N et al: Metagenomic and

functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature

2007, 450(7169):560-565.

28. Suh SO, Marshall CJ, McHugh JV, Blackwell M: Wood ingestion by passalid beetles in

the presence of xylose-fermenting gut yeasts. Molecular Ecology 2003, 12(11):3137-

3145.

29. Xueyan Y, Jiaxi Z, Fugui W, Min C: A study on the feeding habits of the larvae of two

species of longicorn (Anoplophora) to different tree species. Journal of Northwest

Forestry College, 2.

30. MacLeod A, Evans H, Baker R: An analysis of pest risk from an Asian longhorn

beetle (Anoplophora glabripennis) to hardwood trees in the European community.

Crop Protection 2002, 21(8):635-645.

31. Engelbrektson A, Kunin V, Wrighton KC, Zvenigorodsky N, Chen F, Ochman H,

Hugenholtz P: Experimental factors affecting PCR-based estimates of microbial

species richness and evenness. The ISME journal 2010, 4(5):642-647.

32. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA,

Oakley BB, Parks DH, Robinson CJ et al: Introducing mothur: open-source, platform-

independent, community-supported software for describing and comparing

microbial communities. Applied and environmental microbiology 2009, 75(23):7537-

7541.

33. Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R: UCHIME improves sensitivity

and speed of chimera detection. Bioinformatics 2011, 27(16):2194-2200.

34. Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ:

Gapped BLAST and PSI-BLAST: a new generation of protein database search

programs. Nucleic Acids Research 1997, 25(17):3389-3402.

35. Wang Q, Garrity GM, Tiedje JM, Cole JR: Naive Bayesian classifier for rapid

assignment of rRNA sequences into the new bacterial taxonomy. Applied and

environmental microbiology 2007, 73(16):5261-5267.

36. Huson DH, Auch AF, Qi J, Schuster SC: MEGAN analysis of metagenomic data.

Genome research 2007, 17(3):377-386.

37. Zwickl D: Genetic algorithm approaches for the phylogenetic analysis of large

biological sequence datasets under the maximum likelihood criterion. PhD

dissertation available at http://www.nescent.org/wg_garli (Univ of Texas, Austin) 2006.

38. Darriba D, Taboada GL, Doallo R, Posada D: jModelTest 2: more models, new

heuristics and parallel computing. Nat Methods 2012, 9(8):772.

39. Kunin V, Copeland A, Lapidus A, Mavromatis K, Hugenholtz P: A bioinformatician's

guide to metagenomics. Microbiol Mol Biol Rev 2008, 72(4):557-578.

40. Lowe TM, Eddy SR: tRNAscan-SE: A program for improved detection of transfer

RNA genes in genomic sequence. Nucleic Acids Research 1997, 25(5):0955-0964.

41. Eddy SR: HMMER: Profile hidden Markov models for biological sequence analysis.

Bioinformatics 1998, 14:755-763.

42. Huang Y, Gilna P, Li W: Identification of ribosomal RNA genes in metagenomic

fragments. Bioinformatics 2009, 25(10):1338-1340.

43. Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glöckner FO: SILVA: a

comprehensive online resource for quality checked and aligned ribosomal RNA

sequence data compatible with ARB. Nucleic Acids Research 2007, 35(21):7188-7196.

44. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez

A, Stevens R, Wilke A et al: The metagenomics RAST server - a public resource for

the automatic phylogenetic and functional analysis of metagenomes. BMC

Bioinformatics 2008, 9:386.

45. Marchler-Bauer A, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH,

Geer LY, Geer RC, Gonzales NR, Gwadz M et al: CDD: specific functional annotation

with the Conserved Domain Database. Nucleic Acids Research 2009, 37:D205-D210.

46. Tatusov R, Fedorova N, Jackson J, Jacobs A, Kiryutin B, Koonin E, Krylov D,

Mazumder R, Mekhedov S, Nikolskaya A et al: The COG database: an updated

version includes eukaryotes. BMC Bioinformatics 2003, 4(1):41.

47. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski

K, Dwight SS, Eppig JT et al: Gene Ontology: tool for the unification of biology. Nat

Genet 2000, 25(1):25-29.

48. Kanehisa M: The KEGG database. In: ‘In Silico’ Simulation of Biological Processes.

John Wiley & Sons, Ltd; 2008: 91-103.

49. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M: Blast2GO: a

universal tool for annotation, visualization and analysis in functional genomics

research. Bioinformatics, 21(18):3674-3676.

50. Ye Y, Doak TG: A parsimony approach to biological pathway

reconstruction/inference for genomes and metagenomes. PLoS computational biology

2009, 5(8):e1000465.

51. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths‐Jones S, Khanna A,

Marshall M, Moxon S, Sonnhammer ELL et al: The Pfam protein families database.

Nucleic Acids Research 2004, 32(suppl 1):D138-D141.

52. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B: The

Carbohydrate-Active EnZymes database (CAZy): an expert resource for

Glycogenomics. Nucleic Acids Research 2009, 37:D233-D238.

53. Robinson T, McMullan G, Marchant R, Nigam P: Remediation of dyes in textile

effluent: a critical review on current treatment technologies with a proposed

alternative. Bioresource Technology 2001, 77(3):247-255.

54. Engel P, Moran NA: The gut microbiota of insects–diversity in structure and

function. FEMS Microbiology Reviews 2013.

55. Schoch CL, Seifert KA, Huhndorf S, Robert V, Spouge JL, Levesque CA, Chen W,

Fungal Barcoding Consortium: Nuclear ribosomal

internal transcribed spacer (ITS) region as a universal DNA barcode marker for

Fungi. Proceedings of the National Academy of Sciences of the United States of America

2012, 109(16):6241-6246.

56. Suh SO, McHugh JV, Pollock DD, Blackwell M: The beetle gut: a hyperdiverse source

of novel yeasts. Mycological research 2005, 109(Pt 3):261-265.

57. Bass M, Cherrett J: Fungal hyphae as a source of nutrients for the leaf‐cutting ant

Atta sexdens. Physiological Entomology 1995, 20(1):1-6.

58. Talbot P: The Sirex-Amylostereum-Pinus association. Annual Review of

Phytopathology 1977, 15(1):41-54.

59. Paine T, Raffa K, Harrington T: Interactions among scolytid bark beetles, their

associated fungi, and live host conifers. Annual Review of Entomology 1997, 42(1):179-

206.

60. Logan JWM, Cowie RH, Wood T: Termite (Isoptera) control in agriculture and

forestry by non-chemical methods: a review. Bulletin of entomological research 1990,

80(3):309-330.

61. Sutherland JB, Pometto III AL, Crawford DL: Lignocellulose degradation by Fusarium

species. Canadian Journal of Botany 1983, 61(4):1194-1198.

62. Scully ED, Hoover K, Carlson J, Tien M, Geib SM: Proteomic analysis of Fusarium

solani isolated from the Asian longhorned beetle, Anoplophora glabripennis. PloS

one 2012, 7(4):e32990.

63. Suen G, Scott JJ, Aylward FO, Adams SM, Tringe SG, Pinto-Tomás AA, Foster CE,

Pauly M, Weimer PJ, Barry KW: An insect herbivore microbiome with high plant

biomass-degrading capacity. PLoS genetics 2010, 6(9):e1001129.

64. Waldo D, Smith L, Cox E: Model of cellulose disappearance from the rumen. Journal

of Dairy Science 1972, 55(1):125-129.

65. Mueller UG, Gerardo NM, Aanen DK, Six DL, Schultz TR: The evolution of

agriculture in insects. Annual review of ecology, evolution, and systematics 2005:563-

595.

66. Bridges JR: Mycangial fungi of Dendroctonus frontalis (Coleoptera: Scolytidae) and

their relationship to beetle population trends. Environ Entomol 1983, 12(3):858-861.

67. Norris DM: Degradation of 14C-labeled lignins and 14C-labeled aromatic acids by

Fusarium solani. Applied and environmental microbiology 1980, 40(2):376-380.

68. Bordeaux JM: Characterization of growth conditions for production of a laccase-like

phenoloxidase by Amylostereum areolatum, a fungal pathogen of pines and other

conifers. 2008.

69. Carnegie AJ, Matsuki M, Haugen DA, Hurley BP, Ahumada R, Klasmer P, Sun J, Iede

ET: Predicting the potential distribution of Sirex noctilio (Hymenoptera: Siricidae),

a significant exotic pest of Pinus plantations. Annals of Forest Science 2006,

63(2):119-128.

70. Ouzounis C, Sander C: A structure-derived sequence pattern for the detection of type

I copper binding domains in distantly related proteins. FEBS letters 1991, 279(1):73-

78.

71. Xu F, Shin W, Brown SH, Wahleithner JA, Sundaram UM, Solomon EI: A study of a

series of recombinant fungal laccases and bilirubin oxidase that exhibit significant

differences in redox potential, substrate specificity, and stability. Biochimica et

Biophysica Acta (BBA)-Protein Structure and Molecular Enzymology 1996, 1292(2):303-

311.

72. Eggert C, Temp U, Dean JF, Eriksson KE: A fungal metabolite mediates degradation

of non-phenolic lignin structures and synthetic lignin by laccase. FEBS letters 1996,

391(1-2):144-148.

73. Hatfield R, Vermerris W: Lignin formation in plants. The dilemma of linkage

specificity. Plant Physiol 2001, 126(4):1351-1357.

74. Ahmad M, Roberts JN, Hardiman EM, Singh R, Eltis LD, Bugg TD: Identification of

DypB from Rhodococcus jostii RHA1 as a lignin peroxidase. Biochemistry 2011,

50(23):5096-5107.

75. Sugano Y, Sasaki K, Shoda M: cDNA cloning and genetic analysis of a novel

decolorizing enzyme, peroxidase gene dyp from Geotrichum candidum. Journal of

bioscience and bioengineering 1999, 87(4):411-417.

76. Vuilleumier S: Bacterial glutathione S-transferases: what are they good for? Journal

of bacteriology 1997, 179(5):1431.

77. Masai E, Kubota S, Katayama Y, Kawai S, Yamasaki M, Morohoshi N:

Characterization of the C alpha-dehydrogenase gene involved in the cleavage of

beta-aryl ether by Pseudomonas paucimobilis. Bioscience, biotechnology, and

biochemistry 1993, 57(10):1655.

78. Masai E, Katayama Y, Kubota S, Kawai S, Yamasaki M, Morohoshi N: A bacterial

enzyme degrading the model lignin compound beta-etherase is a member of the

glutathione-S-transferase superfamily. FEBS letters 1993, 323(1-2):135-140.

79. Armstrong RN: Structure, catalytic mechanism, and evolution of the glutathione

transferases. Chemical Research in Toxicology 1997, 10(1):2-18.

80. Taylor CR, Hardiman EM, Ahmad M, Sainsbury PD, Norris PR, Bugg TD: Isolation of

bacterial strains able to metabolize lignin from screening of environmental samples.

Journal of applied microbiology 2012, 113(3):521-530.

81. Sethi A, Slack JM, Kovaleva ES, Buchman GW, Scharf ME: Lignin-associated

metagene expression in a lignocellulose-digesting termite. Insect Biochemistry and

Molecular Biology 2013, 43(1):91-101.

82. Timell TE: Wood Hemicelluloses. I. Advances in carbohydrate chemistry 1964, 19:247-

302.

83. Kuyper M, Harhangi HR, Stave AK, Winkler AA, Jetten MS, de Laat WT, den Ridder JJ,

Op den Camp HJ, van Dijken JP, Pronk JT: High-level functional expression of a

fungal xylose isomerase: the key to efficient ethanolic fermentation of xylose by

Saccharomyces cerevisiae? FEMS yeast research 2003, 4(1):69-78.

84. Walfridsson M, Bao X, Anderlund M, Lilius G, Bulow L, Hahn-Hagerdal B: Ethanolic

fermentation of xylose with Saccharomyces cerevisiae harboring the Thermus

thermophilus xylA gene, which expresses an active xylose (glucose) isomerase.

Applied and environmental microbiology 1996, 62(12):4648-4651.

85. Bolotin A, Wincker P, Mauger S, Jaillon O, Malarme K, Weissenbach J, Ehrlich SD,

Sorokin A: The complete genome sequence of the lactic acid bacterium Lactococcus

lactis ssp. lactis IL1403. Genome research 2001, 11(5):731-753.

86. Ohta K, Beall DS, Mejia JP, Shanmugam KT, Ingram LO: Genetic improvement of

Escherichia coli for ethanol production: chromosomal integration of Zymomonas

mobilis genes encoding pyruvate decarboxylase and alcohol dehydrogenase II.

Applied and environmental microbiology 1991, 57(4):893-900.

87. Becker G: Physiological influences on wood-destroying insects of wood compounds

and substances produced by microorganisms. Wood Science and Technology 1971,

5(3):236-246.

88. Kohn R, Luknár O: Intermolecular calcium ion binding on polyuronate-

polygalacturonate and polyguluronate. Collect Czech Chem Commun 1977, 42:731-744.

89. Consortium TS: The genome of the model beetle and pest Tribolium castaneum.

Nature 2008, 452(7190):949-955.

90. Mattson WJ: Herbivory in relation to plant nitrogen content. Annu Rev Ecol Syst

1980, 11:119-161.

91. Keller B, Templeton MD, Lamb CJ: Specific localization of a plant-cell wall glycine-

rich protein in protoxylem cells of the vascular system. Proceedings of the National

Academy of Sciences of the United States of America 1989, 86(5):1529-1533.

92. Calderon-Cortes N, Watanabe H, Cano-Camacho H, Zavala-Paramo G, Quesada M:

cDNA cloning, homology modelling and evolutionary insights into novel endogenous

cellulases of the borer beetle Oncideres albomarginata chamela (Cerambycidae).

Insect Molecular Biology 2010, 19(3):323-336.

93. Pauchet Y, Wilkinson P, Chauhan R, Ffrench-Constant RH: Diversity of beetle genes

encoding novel plant cell wall degrading enzymes. PloS one 2010, 5(12):e15635.

94. Aw T, Schlauch K, Keeling C, Young S, Bearfield J, Blomquist G, Tittiger C:

Functional genomics of mountain pine beetle (Dendroctonus ponderosae) midguts

and fat bodies. BMC Genomics 2010, 11(1):215.

95. Keeling CI, Yuen MM, Liao NY, Docking TR, Chan SK, Taylor GA, Palmquist DL,

Jackman SD, Nguyen A, Li M: Draft genome of the mountain pine beetle,

Dendroctonus ponderosae Hopkins, a major forest pest. Genome Biology 2013,

14(3):R27.

96. Douglas AE: The microbial dimension in insect nutritional ecology. Functional

Ecology 2009, 23(1):38-47.

97. Fraenkel G, Printy GE: The amino acid requirements of the confused flour beetle,

Tribolium confusum, Duval. The Biological Bulletin 1954, 106(2):149-157.

98. Beaver R, Wilding N, Collins N, Hammond P, Webber J: Insect-fungus relationships in

the bark and ambrosia beetles. In: Insect-fungus interactions 14th Symposium of the

Royal Entomological Society of London in collaboration with the British Mycological

Society: 1989: Academic Press; 1989: 121-143.

99. Morales-Jimenez J, Zuniga G, Villa-Tanaca L, Hernandez-Rodriguez C: Bacterial

community and nitrogen fixation in the red turpentine beetle, Dendroctonus valens

LeConte (Coleoptera: Curculionidae: Scolytinae). Microbial ecology 2009, 58(4):879-

891.

100. Grunwald S, Pilhofer M, Holl W: Microbial associations in gut systems of wood- and

bark-inhabiting longhorned beetles [Coleoptera: Cerambycidae]. Systematic and

Applied Microbiology 2010, 33(1):25-34.

101. Brodbeck BV, Andersen PC, Mizell RF: Differential utilization of nutrients during

development by the xylophagous leafhopper, Homalodisca coagulata. Entomologia

experimentalis et applicata 1995, 75(3):279-289.

102. Breznak JA: Intestinal microbiota of termites and other xylophagous insects. Annual

Reviews in Microbiology 1982, 36(1):323-343.

103. Baitsch D, Sandu C, Brandsch R, Igloi GL: Gene cluster on pAO1 of Arthrobacter

nicotinovorans involved in degradation of the plant alkaloid nicotine: cloning,

purification, and characterization of 2, 6-dihydroxypyridine 3-hydroxylase. Journal

of bacteriology 2001, 183(18):5262-5267.

104. Raybuck SA: Microbes and microbial enzymes for cyanide degradation.

Biodegradation 1992, 3(1):3-18.

105. Clark A, Bloch K: The absence of sterol synthesis in insects. J Biol Chem 1959,

234(10):2578-2582.

106. Six DL, Stone WD, de Beer ZW, Woolfolk SW: Ambrosiella beaveri, sp nov.,

associated with an exotic ambrosia beetle, Xylosandrus mutilatus (Coleoptera:

Curculionidae, Scolytinae), in Mississippi, USA. Antonie Van Leeuwenhoek

International Journal of General and Molecular Microbiology 2009, 96(1):17-29.

107. Robbins W, Kaplanis J, Svoboda J, Thompson M: Steroid metabolism in insects.

Annual Review of Entomology 1971, 16(1):53-72.

108. Louloudes SJ, Kaplanis J, Robbins W, Monroe R: Lipogenesis from C14-acetate by the

American cockroach. Ann Entomol Soc Am 1961, 54(1):99-103.

109. Taylor AM, Gartner BL, Morrell JJ: Heartwood formation and natural durability—a

review. Wood and Fiber Science 2002, 34(4):587-611.

110. Genta FA, Dillon RJ, Terra WR, Ferreira C: Potential role for gut microbiota in cell

wall digestion and glucoside detoxification in Tenebrio molitor larvae. Journal of

Insect Physiology 2006, 52(6):593-601.

111. Dowd PF: Insect fungal symbionts - a promising source of detoxifying enzymes.

Journal of Industrial Microbiology 1992, 9(3-4):149-161.

112. Adams AS, Aylward FO, Adams SM, Erbilgin N, Aukema BH, Currie CR, Suen G,

Raffa KF: Mountain pine beetles colonizing historical and naïve host trees are

associated with a bacterial community highly enriched in genes contributing to

terpene metabolism. Applied and environmental microbiology 2013, 79(11):3468-3475.

113. Schuler MA: The role of cytochrome P450 monooxygenases in plant-insect

interactions. Plant Physiol 1996, 112(4):1411.

114. Daayf F, El Hadrami A, El-Bebany AF, Henriquez MA, Yao Z, Derksen H, El-Hadrami

I, Adam LR: Phenolic compounds in plant defense and pathogen counter-defense

mechanisms. Recent Advances in Polyphenol Research 2012, 3:191.

115. Bugg TD, Ahmad M, Hardiman EM, Singh R: The emerging role for bacteria in lignin

degradation and bio-product formation. Current opinion in biotechnology 2011,

22(3):394-400.

116. Ryan CA: Protease inhibitors in plants: genes for improving defenses against insects

and pathogens. Annual Review of Phytopathology 1990, 28(1):425-449.

117. Jongsma MA, Bakker PL, Peters J, Bosch D, Stiekema WJ: Adaptation of Spodoptera

exigua larvae to plant proteinase inhibitors by induction of gut proteinase activity

insensitive to inhibition. Proceedings of the National Academy of Sciences 1995,

92(17):8041-8045.

118. Chu C-C, Spencer JL, Curzi MJ, Zavala JA, Seufferheld MJ: Gut bacteria facilitate

adaptation to crop rotation in the western corn rootworm. Proceedings of the

National Academy of Sciences 2013.

119. Lehman RM, Lundgren JG, Petzke LM: Bacterial communities associated with the

digestive tract of the predatory ground beetle, Poecilus chalcites, and their

modification by laboratory rearing and antibiotic treatment. Microbial ecology 2009,

57(2):349-358.


Table 2-1. Summary of Newbler metagenome assembly metrics.

Number of 454 Shotgun Reads Produced 1,258,810

Number of Contigs 25,838

Number of Singleton Reads 585,749

Minimum Contig Length (bp) 200

Maximum Contig Length (bp) 30,393

N20 (bp) 2,081

N50 (bp) 938

N80 (bp) 555

Total Number of Assembled (bp) 22,220,287

Total Number of Unassembled (bp) 179,346,064
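As an illustration of how the Nxx metrics in Table 2-1 are conventionally defined, the following minimal Python sketch computes N20, N50, and N80 from a list of contig lengths. The contig lengths are hypothetical and the function name `nxx` is introduced here for illustration only; the assembler's own calculation may differ in detail. The last lines use the two totals reported in the table and assume that assembled and unassembled bases together account for the full data set.

```python
def nxx(lengths, percent):
    """Return the Nxx value: the length L of the contig at which the
    running sum of contig lengths (longest first) first reaches
    `percent` percent of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length
    raise ValueError("contig list is empty")

# Hypothetical contig lengths in bp (not taken from the assembly).
contigs = [200, 300, 500, 900, 1000, 2100]
n20 = nxx(contigs, 20)
n50 = nxx(contigs, 50)
n80 = nxx(contigs, 80)

# Totals from Table 2-1; treating their sum as the full data set is an assumption.
assembled_bp = 22_220_287
unassembled_bp = 179_346_064
assembled_fraction = assembled_bp / (assembled_bp + unassembled_bp)
```

By construction N20 >= N50 >= N80, which matches the ordering of the values reported in the table.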


Table 2-2. Species richness and diversity calculations for bacterial OTUs detected in the A.

glabripennis gut.

Number of OTUs Observed 166
Chao Richness 354
95% CI Chao 266-518
Ace Richness 437
95% CI Ace 370-526
Jackknife Richness 657
95% CI Jackknife 434-870
Simpson Diversity (1-D) 0.919
95% CI Simpson Diversity (1-D) 0.912-0.925

Table 2-3. Species richness and diversity calculations for fungal OTUs detected in the A.

glabripennis gut.

Number of OTUs Observed 7
Chao Richness 11
95% CI Chao 8-31
Ace Richness 18
95% CI Ace 9-101
Jackknife Richness 12
95% CI Jackknife 6-18
Simpson Diversity (1-D) 0.51
95% CI Simpson Diversity (1-D) 0.49-0.52

Table 2-4. Summary of metagenome annotations.

Number of High Quality Shotgun 454 Reads 1,067,718

Number of rRNAs 6,397

Number of tRNAs 2,596

Number of reads with BLASTX alignments to annotated proteins in non-redundant protein database (e-value = 0.00001) 541,761

Number of reads with BLASTX alignments to hypothetical proteins in non-redundant protein database (e-value = 0.00001) 144,965

Number of reads with COG (Clusters of Orthologous Genes) assignments 357,999

Number of reads with Seed assignments 255,091

Number of reads with GO (Gene Ontology) assignments 361,412

Number of reads with KEGG assignments 173,359

Number of reads with Pfam domains 420,285

Number of reads with BLASTX alignments and Pfam domains 409,594

Number of reads with Pfam domains only (no BLASTX alignments) 10,691

Table 2-5. The most highly abundant glycoside hydrolase families detected in gene tag annotations and their associated KEGG classifications.

GH Family | KEGG ECs | Reactions
1, 3 | β-glucosidase (EC 3.2.1.21) | Hydrolyzes β-1,4 linkages in glucose-containing disaccharides
1 | β-galactosidase (EC 3.2.1.23) | Hydrolyzes β-galactosidic bond between galactose and its organic functional group
1 | β-mannosidase (EC 3.2.1.25) | Hydrolyzes terminal, non-reducing mannose residues from β-D linked mannosides
1 | β-glucuronidase (EC 3.2.1.31) | Hydrolyzes β-D glucuronic acid residues from non-reducing end of glycosaminoglycans
1 | Exo-β-1,4-glucanase (EC 3.2.1.74) | Releases cello-oligomers from exposed polysaccharide termini in cellulose
1 | 6-phospho-β-galactosidase (EC 3.2.1.85) | Hydrolyzes β-galactosidic bond between a 6-phospho-β-D-galactose and its organic functional group
1 | 6-phospho-β-glucosidase (EC 3.2.1.86) | Hydrolyzes β-1,4 linkages in glucose substituted disaccharides containing phosphorylated glucoside residues
1 | Strictosidine amygdalin β-glucosidase (EC 3.2.1.117) | Liberates D-glucose from strictosidine
1 | Thioglucosidase (EC 3.2.1.147) | Hydrolyzes linkage between thiol and glucosinolate
1 | β-primeverosidase (EC 3.2.1.149) | Hydrolyzes linkage between 6-O-(beta-D-xylopyranosyl)-beta-D-glucopyranoside and its organic functional group
3 | Xylan 1,4-β-xylosidase (EC 3.2.1.37) | Hydrolyzes linkage between β-linked xylose residues in β-1,4 xylan
3 | β-N-acetylhexosaminidase (EC 3.2.1.52) | Liberates hexose from gangliosides
3 | Glucan 1,3-β-glucosidase | Cleaves β-1,3 linkages in β glucans
3 | Endo-β-1,4-glucanase | Cleaves internal bonds in crystalline cellulose to liberate polysaccharide termini
3 | Exo-1,3-1,4-glucanase | Releases cello-oligomers from β-1,3 or β-1,4 linked glucose oligosaccharides and polysaccharides
3 | α-L-arabinofuranosidase | Hydrolyzes α-1,3 linkages in arabinose-containing oligosaccharides and polysaccharides


Figure 2-1. Approximately 166 bacterial OTUs were detected through amplicon sequencing.

Various community richness estimators consistently predicted the presence of over 300 OTUs in association with the A. glabripennis gut and, in agreement with this observation, the rarefaction curve failed to reach saturation. This indicates that additional OTUs would likely be detected with additional amplicon sequencing.
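To make the richness and diversity estimates concrete, the sketch below computes a Chao1 richness estimate and a Simpson (1-D) diversity value from per-OTU read counts. The counts are hypothetical, and the bias-corrected form of Chao1 and the simple 1 - sum(p_i^2) form of Simpson diversity are assumptions chosen for illustration; the software used for the analyses here may apply different variants.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1: S_obs + F1*(F1-1) / (2*(F2+1)),
    where F1 and F2 are the numbers of singleton and doubleton OTUs."""
    s_obs = sum(1 for c in otu_counts if c > 0)
    f1 = sum(1 for c in otu_counts if c == 1)
    f2 = sum(1 for c in otu_counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

def simpson_diversity(otu_counts):
    """Simpson diversity expressed as 1 - D, with D = sum(p_i ** 2)."""
    total = sum(otu_counts)
    return 1 - sum((c / total) ** 2 for c in otu_counts)

# Hypothetical read counts per OTU (not data from this study).
counts = [10, 5, 2, 2, 1, 1, 1, 1]
estimated_richness = chao1(counts)        # 8 observed OTUs, 4 singletons, 2 doubletons
diversity = simpson_diversity(counts)
```

An estimate well above the observed OTU count, as in the figure, arises when many OTUs are represented by only one or two reads.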


Figure 2-2. Maximum likelihood analysis of representative sequences from operational taxonomic unit (OTU) analysis of bacterial 16S rRNA amplicons. Representative sequences from each bacterial OTU were aligned with MEGA 4.0 and phylogenetic analysis was performed using GARLI 2.0 (500 bootstrap pseudoreplicates and TIM1+I+G evolutionary model). Nodes were collapsed and labeled by taxonomic class. Number of OTUs and percentage of amplicons assigned to each class are labeled. OTUs that could not be assigned to class level by RDP were omitted from the analysis.


Figure 2-3. Rarefaction, richness, and diversity analyses of 18S amplicon data. Seven fungal OTUs were detected through amplicon sequencing. While rarefaction begins to approach saturation, richness estimates predict the presence of at least 11 fungal OTUs, indicating that additional sampling may be necessary. This scenario is likely since additional 18S rRNAs from fungal taxa not detected in the 18S amplicons were detected in the shotgun reads (e.g., Fusarium spp.).
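The idea behind a rarefaction curve can be sketched as follows: for a subsample of n reads drawn without replacement, the expected number of OTUs observed is the sum over OTUs of the probability that at least one of that OTU's reads is drawn. The read counts below are hypothetical and the function name is introduced for illustration; this is the standard analytical formula, not necessarily the procedure used to draw the figure.

```python
from math import comb

def rarefied_richness(otu_counts, n):
    """Expected number of OTUs in a random subsample of n reads:
    sum_i [1 - C(N - N_i, n) / C(N, n)], with N the total read count."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total read count")
    denom = comb(total, n)
    # comb(total - c, n) is 0 when fewer than n reads remain, so that OTU is certain to appear.
    return sum(1 - comb(total - c, n) / denom for c in otu_counts)

# Hypothetical counts for seven OTUs with one dominant taxon.
counts = [50, 30, 10, 5, 3, 1, 1]
curve = [(n, rarefied_richness(counts, n)) for n in (1, 10, 50, 100)]
```

A curve that is still rising at the full sampling depth indicates that further sequencing would be expected to recover additional OTUs.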


Figure 2-4. Distribution of SEED assignments generated by MG-RAST. Reads assigned to 28 SEED subsystems were detected in the A. glabripennis larval midgut metagenome. The most dominant subsystems found in association with this microbial community included clustering-based subsystems, carbohydrate metabolism, and amino acid and derivatives metabolism.


Figure 2-5. Hierarchical cluster analysis based on Pfam annotations of herbivore-related metagenomes. Agglomerative hierarchical cluster analysis based on a compositional Euclidean distance matrix was conducted using Pfam annotations from various herbivore-related metagenomes. Three distinct clusters representing different herbivore biome types are highlighted and labeled. These include herbivore gut communities, fungal gallery communities associated with phloem/xylem feeding insects, and communities associated with insects feeding in heartwood.
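A compositional Euclidean distance matrix of the kind described in this caption can be sketched in a few lines: Pfam counts are converted to proportions within each sample, and pairwise Euclidean distances are computed between the resulting profiles. The sample names and counts are hypothetical, the three Pfam accessions are used only as example keys, and the last step shows just the first merge an agglomerative method would make (the closest pair), not a full clustering.

```python
from itertools import combinations
from math import sqrt

def to_composition(counts):
    """Convert raw Pfam domain counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {pfam: n / total for pfam, n in counts.items()}

def euclidean(a, b):
    """Euclidean distance between two profiles; absent domains count as 0."""
    keys = set(a) | set(b)
    return sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

# Hypothetical samples and counts (illustration only).
samples = {
    "gut_a":   {"PF00150": 30, "PF00232": 50, "PF00933": 20},
    "gut_b":   {"PF00150": 25, "PF00232": 55, "PF00933": 20},
    "gallery": {"PF00150": 70, "PF00232": 10, "PF00933": 20},
}
compositions = {name: to_composition(c) for name, c in samples.items()}
distances = {
    (a, b): euclidean(compositions[a], compositions[b])
    for a, b in combinations(sorted(compositions), 2)
}
# The pair with the smallest distance would be joined first.
closest_pair = min(distances, key=distances.get)
```

Normalizing to proportions before computing distances keeps differences in sequencing depth between metagenomes from dominating the comparison.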


Figure 2-6. Principal components analysis (PCA) of Pfam domains from herbivore-related metagenomes. Principal components analysis was conducted to plot samples in multidimensional space. Groupings detected in agglomerative cluster analysis are preserved (Mantel test, p < 0.0001) and are color-coded by groups identified in the dendrogram. Monte Carlo Permutation Procedure (n = 1000 iterations): p < 0.0001 for PCA 1 and PCA 2. Abbreviations: DP: Dendroctonus ponderosae, DF: Dendroctonus frontalis, FL: Florida, CR: Costa Rica.


Figure 2-7. Distribution of glycoside hydrolase families found in the A. glabripennis gut metagenome. Reads assigned to 36 glycoside hydrolase families were detected in the gut microbiome. The most dominant families were GH 1 and 3, while GH families 11, 45, 46, 61, and 71 were present in very low abundances.


Figure 2-8. Xylose utilization pathway present in the A. glabripennis gut community. Xylose released from hemicellulose can be converted into D-xylulose-5-phosphate and eventually into acetaldehyde. Acetaldehyde can be either converted into ethanol by alcohol dehydrogenase or into acetate by acetaldehyde dehydrogenase. These reactions are likely catalyzed by lactic acid bacteria or yeasts associated with the A. glabripennis gut.


Chapter 3

Proteomic Analysis of Fusarium solani isolated from the Asian longhorned beetle, Anoplophora glabripennis

Abstract

Wood is a highly intractable food source, yet many insects successfully colonize and thrive in this challenging niche. Overcoming the lignin barrier of wood is a key challenge in nutrient acquisition, but full depolymerization of intact lignin polymers has only been conclusively demonstrated in fungi and is not known to occur by enzymes produced by insects or bacteria. Previous research validated that lignocellulose and hemicellulose degradation occur within the gut of the wood boring insect, Anoplophora glabripennis (Asian longhorned beetle), and that a fungal species, Fusarium solani (ATCC MYA 4552), is consistently associated with the larval stage. While the nature of this relationship is unresolved, we sought to assess this fungal isolate’s ability to degrade lignocellulose and cell wall polysaccharides and to extract nutrients from woody tissue. This gut-derived fungal isolate was inoculated onto a wood-based substrate and shotgun proteomics using Multidimensional Protein Identification Technology

(MudPIT) was employed to identify 400 expressed proteins. Through this approach, we detected proteins responsible for plant cell wall polysaccharide degradation, including proteins belonging to 28 glycosyl hydrolase families and several , esterases, , pectate lyases, and polysaccharide deacetylases. Proteinases with broad substrate specificities and were observed, indicating that this isolate has the capability to digest plant cell wall proteins and can recycle nitrogenous waste under periods of nutrient limitation. Additionally, several laccases, peroxidases, and enzymes involved in extracellular hydrogen peroxide production previously

implicated in lignin depolymerization were detected. In vitro biochemical assays were conducted to corroborate MudPIT results and confirmed that cellulases, glycosyl hydrolases, xylanases, laccases, and Mn-independent peroxidases were active in culture; however, lignin- and Mn-dependent peroxidase activities were not detected. While little is known about the role of filamentous fungi and their associations with insects, these findings suggest that this isolate has the endogenous potential to degrade lignocellulose and extract nutrients from woody tissue.

Introduction

Most beetles in the family Cerambycidae develop deep in woody tissues, where access to sugar monomers present in plant cell wall polysaccharides is impeded by a recalcitrant lignin barrier and where other essential nutritional resources, including proteins, lipids, sterols, and vitamins, are deficient or absent altogether [1]. Many cerambycids that thrive in this suboptimal environment overcome these barriers by preferentially targeting weakened or stressed trees, whose woody, intractable components have been pre-digested by wood-degrading fungi, which degrade lignocellulose into easily digestible mono- and disaccharides and synthesize other essential dietary components [2]. Alternatively, other cerambycids harbor external wood-degrading fungi, which are physically inoculated into host trees; the fungi colonize the larval tunnels, digest carbon polymers, and serve nutrient provisioning roles for the insect, or fungal enzymes are ingested by the insect to aid in lignocellulose digestion in the gut [3]. Unlike other cerambycids, the Asian longhorned beetle (Anoplophora glabripennis, Coleoptera: Cerambycidae), an exotic insect native to China first detected in the U.S. in the early 1990s, attacks both weakened [4] and healthy [5] deciduous trees in the absence of external wood-degrading fungi. This beetle also enjoys a broad host range, which includes over 21 deciduous tree species [6, 7].

A. glabripennis larvae face a number of challenges acquiring nutrients as they grow and develop deep in the sapwood (and heartwood in some tree species), where the lignin:nitrogen ratio is high. Woody tissue is primarily composed of three polymeric materials: cellulose, hemicellulose, and lignin. Cellulose is a linear polymer of glucose linked by β-1,4 glycosidic bonds, accounting for approximately 45% of wood by weight. Its linear structure and extensive hydrogen bonding increase the crystallinity of the macromolecule and decrease its accessibility to hydrolytic enzymes [8]. Hemicellulose accounts for approximately 15 to 35% of wood by weight depending on tree species and is also bound by β-1,4 linkages. Unlike cellulose, hemicellulose has much greater structural heterogeneity and is primarily composed of xylose chains forming a xylan structure in secondary cell walls of hardwood tree species. Other, less abundant monomers typically found in hemicellulose include galactose, rhamnose, arabinose, mannose, and their acid derivatives [9]. Lignin is an amorphous structural aromatic polymer, which is often esterified to uronic residues present in hemicellulose and cross-linked to cellulose through ether and glycosidic linkages, protecting these polysaccharides from hydrolytic enzymes [10]. Furthermore, some evidence indicates that cell wall proteins present in xylem elements are cross-linked with lignin and other cell wall polysaccharides, protecting them from proteolysis and suggesting that lignocellulose and hemicellulose degradation may be required for protein acquisition [11]. Phenylpropanoid units, including coumaryl, coniferyl, and sinapyl alcohol, are the precursors of lignin. Oxidation of these phenols yields free radicals that undergo radical coupling to form a polymer comprising over 12 types of chemical linkages and dominated by C-C and ether linkages, which are invulnerable to hydrolysis and can only be broken through radical oxidative depolymerization [12]. Its lack of stereoregularity and periodicity and its condensed, insoluble properties make lignin resistant to most forms of enzymatic attack [13].

Mechanisms of lignin degradation have been described for white rot basidiomycete fungi. These fungal species utilize secreted heme peroxidases with high redox potentials, including lignin-, Mn-, and versatile peroxidases, to depolymerize the lignin molecule through oxidation of non-phenolic C-C and ether linkages using hydrogen peroxide as an oxidant [14]. Laccases, or multicopper phenol oxidases, are also produced by some lignin-degrading basidiomycetes; these enzymes solely oxidize the phenolic hydroxyl groups that comprise less than 10% of the total linkages in an intact lignin biopolymer. Although these enzymes can indirectly degrade more recalcitrant linkages in the presence of synthetic redox mediators, such as the diammonium salt of 2,2'-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS), natural varieties of these mediators have yet to be discovered; thus, the full biochemical role of laccases in natural lignin depolymerization processes remains obscure [15]. In brown rot basidiomycetes, hydroxyl radicals generated through Fenton reactions initiated by iron III reductase, catalyzed by heme domains in cellobiose dehydrogenases, are responsible for lignin modification [10]. Although these reactive hydroxyl radicals induce only partial lignin depolymerization, many speculate that Fenton reactions occur in tandem with radical oxidative processes to expedite complete lignin depolymerization since Fenton-type lignin metabolites have been detected in association with white-rot mediated lignin degradation [16]. Outside the basidiomycetes, the process of lignin degradation by other types of fungi is largely unknown; while some bacteria can degrade aromatic monolignols, dilignols, and other phenolic linkages found in lignin, only partial lignin depolymerization and modification have been documented in bacteria. Thus, bacterial lignin degradation is not well understood and remains highly speculative [17-21].

In previous studies, we demonstrated that lignin, cellulose, and hemicellulose degradation occur in the guts of larval A. glabripennis [22-24]. Although many insects, including cerambycids, can produce endogenous cellulases and other cell wall degrading enzymes [25-31] as well as enzymes capable of oxidizing phenolic linkages present in lignin [12, 25-30, 32-34], there is no conclusive evidence that insects endogenously produce enzymes capable of oxidizing the predominant C-C or ether linkages, whose cleavage is required for full lignin depolymerization and which are definitively broken in the A. glabripennis gut [22]. However, A. glabripennis harbors gut microbiota that likely contribute to digestive physiology and help this insect overcome barriers associated with extracting nutrients from woody tissue, including radical oxidation of non-hydrolyzable bonds in lignin. Previous 16S rRNA-based metagenomic analyses of the gut bacterial communities of larvae reared in several different suitable host tree species revealed a community dominated by Actinobacteria and Proteobacteria that appears to display a tremendous degree of plasticity at lower taxonomic ranks. For example, striking shifts in alpha diversity and taxonomic composition were observed in insects reared in different host tree species with little impact on larval fitness or ability to degrade cellulose and xylan [23]. In contrast, the fungal community was relatively static and was consistently dominated by a single fungal taxon regardless of collection site or host tree species, demonstrating a stable relationship between this fungal isolate and A. glabripennis. Multilocus phylogenetic analysis encompassing the ITS, EF, and LSU loci convincingly places this isolate in the Fusarium solani species complex [35].

Despite their status as notorious plant and animal pathogens, Fusarium spp. are occasionally found in non-pathogenic associations with beetles [36]. One of the most well studied examples is the relationship between ambrosia beetles (Coleoptera, Curculionidae) and Fusarium solani. Ambrosia beetles harbor fungi in special external structures called mycangia, inoculating the fungus into the tree to digest lignocellulose and synthesize other nutrients, including ergosterols required for pupation [37]. However, in many cases, the nature of the relationship between beetles and their respective Fusarium affiliates is unclear; although the presence of the fungus appears to directly improve insect growth and fecundity, its precise contributions to insect physiology and biochemistry are largely unknown [36]. While we are just beginning to understand the relationship between the Asian longhorned beetle and its gut-associated F. solani isolate, it is known that A. glabripennis larval galleries are typically visually free of fungal inoculum, indicating that if this isolate is contributing to lignocellulose digestion or other physiological processes in vivo, these processes occur within the gut rather than in the external environment [35].

Furthermore, it is known that F. solani isolates comprise a metabolically versatile species complex that can colonize many diverse niches and persist in extreme environments. Owing to this versatility, these fungi can often extract glucose from exotic carbon sources, including pyrene and benzopyrene [38, 39], and often produce an impressive arsenal of xenobiotic degrading enzymes capable of degrading many common aromatic hydrocarbon pollutants, including chlorobenzenes, polychlorinated biphenyls, and phenanthrenes [40]. One of the hallmark characteristics of lignin degrading enzymes is their lack of substrate specificity [41], leading many to speculate that the enzymes involved in xenobiotic oxidoreductive processes may also oxidize recalcitrant bonds present in lignin. Early wood block surveys conducted with freshwater Nectria isolates (anamorph: Fusarium) were initially discouraging as these isolates did not induce the substantial weight loss characteristic of soft rot processes, despite prolific production of cell wall polysaccharide degrading enzymes and phenol oxidases [42, 43]. However, studies on other F. solani isolates convincingly demonstrated their abilities to oxidize aromatic rings and side chains of synthetic lignin compounds, efficiently metabolize both Kraft and Klason lignin, and thrive on lignocellulose-based substrates as sole sources of carbon [44-47]. Notably, maximal evolution of 14CO2 from radiolabelled lignin rings and side chains occurred substantially earlier in F. solani isolates than in white rot basidiomycetes, and some Fusarium isolates degraded both lignin and polysaccharides simultaneously [46], clearly indicating a strong lignin-degrading propensity and suggesting that these fungi may harbor highly efficient lignin degrading enzymes that could be used to enhance industrial cellulosic ethanol production. Additionally, the recent genome sequence of Nectria haematococca (anamorph: F. solani; mating population VI (MPVI)) includes a putative lignin peroxidase ortholog (protein ID 4582) [48]; to our knowledge, its activity has not been verified in vitro, but this protein possesses all of the functional domains required for lignin peroxidase activity.

Due to its strong association with A. glabripennis and its potential metabolic versatility and lignocellulose degrading properties, the goal of this study was to survey and characterize the lignocellulolytic, cell wall polysaccharide degrading, and other nutrient extracting capacities of this F. solani isolate using de novo peptide sequencing and in vitro biochemical assays of extracellular proteins produced by the fungus grown on a wood-based substrate.

Materials and Methods

Source of Larval A. glabripennis Associated F. solani Culture

Fungus cultures were obtained from A. glabripennis larvae maintained at Penn State University, Department of Entomology, University Park, PA, USA. Adult A. glabripennis maintained in a quarantined greenhouse were allowed to oviposit into potted sugar maple (Acer saccharum) trees, which are highly preferred hosts of these beetles [7]. Subsequently, eggs were allowed to hatch and larvae were permitted to mature in the trees’ woody tissues. After a period of 90 days, healthy larvae feeding on inner wood were collected and fungi were cultured from larval guts as described previously [22] to create single spore cultures. This isolate is currently curated at the American Type Culture Collection under the accession number MYA 4552.

Solid Wood Substrate Fungal Cultures and Fungal Enzyme Extraction

Several agar plugs from the culture of F. solani (MYA 4552) described above were used to inoculate a solid wood substrate in polypropylene growth bags (Unicorn, Commerce, TX, USA) containing 250 g red oak wood chips, 30 g millet, 15 g wheat bran, and 400 ml distilled water at 30°C [49]. Previous studies in Phanerochaete chrysosporium revealed that white rot fungi require easily metabolizable carbohydrates to induce production of peroxidases involved in lignin degradation, as this process typically occurs during periods of secondary metabolism and under conditions of extreme nitrogen limitation [10]. Although there is limited evidence that certain Fusarium solani strains can thrive on lignin as a sole source of carbon and do not require easily metabolizable carbohydrates, including cellulose, glucose, or soluble starches, to induce lignin degrading enzymes [47], other strains do require these components and will not colonize lignocellulose-based substrates in their absence. Thus, general growth conditions required for induction of lignocellulolytic genes in Fusarium are poorly understood and abilities to colonize lignocellulose substrates vary tremendously within this species complex [50]. To promote initial substrate colonization regardless of inherent metabolic potential, the medium was augmented with millet and the culture was grown for an extended time period to expend tractable carbohydrate and protein resources present in the growth medium and to induce physiological processes characteristic of secondary metabolism.

Approximately one month after inoculation, total fungal enzymes were extracted from the entire culture as previously described [49] by mixing bag contents with one volume of 0.5 M NaCl and incubating for 2 h at 4°C with constant stirring. The mixture was then squeezed through cheesecloth and centrifuged at 15,000×g for 30 min at 4°C to remove cellular debris and wood and to preferentially purify proteins secreted into the extracellular environment. Ammonium sulfate was added to the filtrate over a 30 min period until the solution reached 100% saturation to precipitate proteins and to preserve proteins in their native conformation. The suspension was incubated overnight at 4°C with stirring and the preparation was centrifuged at 15,000×g for 30 min at 4°C. The protein pellets were dissolved in 50 ml of water and trace amounts of ammonium sulfate were removed by repeated concentration (Amicon, 10-kDa cutoff) and re-suspension in 50 ml of water. The final extractions were partitioned into 1 ml aliquots and stored at –80°C.

MudPIT Analysis

One mg of total F. solani protein extract was digested with trypsin in solution [51]. Briefly, the protein sample was lyophilized and resuspended in 100 µl of 6 M urea, 100 mM Tris buffer (pH 7.8). Denatured proteins were reduced by adding 5 µl of 200 mM dithiothreitol (DTT) in 100 mM Tris (pH 7.8) and incubating for 1 h at room temperature. The protein sample was subsequently alkylated by adding 20 µl of 200 mM iodoacetamide in 100 mM Tris (pH 7.8) and incubating for another hour at room temperature. Residual iodoacetamide was then consumed by adding another 20 µl of 200 mM DTT in 100 mM Tris (pH 7.8) followed by a 1 h incubation at room temperature. The sample was diluted with water to a working volume of 0.9 ml; then, 20 µg of trypsin (Trypsin Gold, Promega Corporation, Madison, WI) in 0.1 ml of water was added to the sample, bringing the final reaction volume to 1 ml. The protein sample was then completely digested overnight at 37°C and the reaction was halted the following day by lowering the pH to < 6.0 with acetic acid. Residual salts were removed from the sample through repeated concentration in a SpeedVac and resuspension in water.
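The quantities in this digestion protocol can be checked with a short calculation. The sketch below uses only the volumes and concentrations stated above; the variable names are introduced for illustration, and the water volume needed to reach 0.9 ml assumes that reagent volumes are simply additive, which is an approximation.

```python
# Reagent volumes (µl) and concentrations as stated in the protocol.
UREA_UL, UREA_M = 100, 6.0       # 6 M urea resuspension buffer
DTT_REDUCE_UL = 5                # first DTT addition (reduction)
IAA_UL = 20                      # iodoacetamide addition (alkylation)
DTT_QUENCH_UL = 20               # second DTT addition (consumes residual iodoacetamide)
STOCK_MM = 200                   # DTT and iodoacetamide stock concentration (mM)
FINAL_UL = 1000                  # final digestion volume (1 ml)
PROTEIN_UG, TRYPSIN_UG = 1000, 20

def nmol(volume_ul, conc_mm):
    """Amount of solute in nmol (µl multiplied by mM gives nmol)."""
    return volume_ul * conc_mm

dtt_total_nmol = nmol(DTT_REDUCE_UL, STOCK_MM) + nmol(DTT_QUENCH_UL, STOCK_MM)
iaa_total_nmol = nmol(IAA_UL, STOCK_MM)
final_urea_m = UREA_M * UREA_UL / FINAL_UL        # urea after dilution to 1 ml
substrate_per_enzyme = PROTEIN_UG / TRYPSIN_UG    # protein:trypsin ratio (w/w)

# Assumption: volumes are additive.
pre_dilution_ul = UREA_UL + DTT_REDUCE_UL + IAA_UL + DTT_QUENCH_UL
water_to_900_ul = 900 - pre_dilution_ul
```

Under these assumptions, total DTT (5 µmol) exceeds iodoacetamide (4 µmol), urea is diluted tenfold to 0.6 M in the final reaction, and trypsin is present at a 1:50 (w/w) ratio to protein.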

Multidimensional Protein Identification Technology (MudPIT) analysis was performed at the Penn State Hershey Medical Center Mass Spectrometry Core Research Facility. Two-dimensional liquid chromatography was performed to highly fractionate the sample. In brief, a strong cation exchange (SCX) column was used to separate the protein extract into 15 fractions. These fractions were subsequently separated on a C-18 column and each sub-fraction was directly spotted onto a MALDI target plate (370 spots/fraction). A total of 5500 MALDI spots were prepared in this manner. Next, tandem MS was performed for each spot on an ABI 4800 MALDI-TOF-TOF (Applied Biosystems, Foster City, CA, US) to determine de novo peptide sequences. These fragments were mapped to the Nectria haematococca/F. solani Mating Population VI (MPVI) reference protein set (http://genome.jgi-psf.org/Necha2/Necha2.home.html) [48] using ProteinPilot 3.0 Software’s Paragon Algorithm (Applied Biosystems, Foster City, CA, US). This genome was chosen because of its phylogenetic proximity to the A. glabripennis-derived F. solani isolate [35] and because it was the only suitable reference genome within the F. solani species complex that was publicly available at the time of study. In this analysis, the reference proteome is digested in silico, and de novo peptide sequences determined by tandem MS are mapped back to annotated peptides for identification and scored based on mapping quality and coverage. Mapping quality is determined by sequence similarity at the amino acid level and the number of unique mapping locations in the reference proteome; thus, peptides with high amino acid similarity to their predicted reference proteins and peptides that map uniquely to a single protein in the reference proteome are given a higher score. Coverage is determined by the number of unique peptides that map to a single reference protein, and proteins covered by multiple peptides are scored more favorably. These parameters are integrated into the ‘unused’ score, which is used to infer the confidence of a protein match. In this analysis, an unused score of 1.3 represents a significant protein match at 95% confidence; significant protein matches were annotated using the Nectria haematococca/F. solani Mating Population VI (MPVI) 2.0 genome database (abbreviated Necha 2.0).
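The correspondence between an unused score of 1.3 and 95% confidence can be illustrated with a short sketch. It assumes the convention commonly described for ProteinPilot, in which the score equals −log10(1 − confidence); the function names are illustrative only and are not part of the ProteinPilot software.

```python
import math


def confidence_from_unused_score(score: float) -> float:
    """Fractional confidence implied by a score, assuming score = -log10(1 - confidence)."""
    return 1.0 - 10.0 ** (-score)


def unused_score_from_confidence(confidence: float) -> float:
    """Score implied by a fractional confidence (0 <= confidence < 1), same assumed convention."""
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must be in the interval [0, 1)")
    return -math.log10(1.0 - confidence)


if __name__ == "__main__":
    # The 1.3 threshold is the one used in this analysis; 2.0 is shown for comparison.
    for score in (1.3, 2.0):
        print(f"score {score:.1f} -> confidence {confidence_from_unused_score(score):.4f}")
```

Under this assumed convention, each additional unit of score corresponds to a ten-fold reduction in the probability that the match is incorrect.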

KEGG, gene ontology (GO), and InterPro (IPR) annotations present in the genome database were applied to proteins detected by MudPIT analysis. In addition, the full N. haematococca MPVI amino acid sequence for each protein identified was analyzed for the presence of signal peptides with both the neural network and hidden Markov model (HMM) methods of the SignalP 3.0 web server [52]. For neural network analysis, the mean S score was used to determine the presence of a signal peptide, and HMM predictions were based on the Cmax score. To infer the function of proteins detected in the secretome that were annotated as ‘hypothetical’ in the N. haematococca database, the peptide sequences were extracted from the reference genome and compared to the non-redundant protein database using blastp (blast+ 2.2.26). From these annotations, we summarized the protein classes present, with a focus on enzymes involved in plant cell wall digestion and protein acquisition. Additionally, to confirm that proteins detected in the MudPIT data were enzymatically active in culture, we performed in vitro cellulase, xylanase, lignin peroxidase, and manganese-dependent and -independent peroxidase assays.

In vitro Cellulase and Xylanase Activities

In vitro cellulase, glycoside hydrolase, and xylanase activities of fungal enzyme extracts were confirmed by measuring release of reducing sugars from cellulose- or xylan-based substrates using the dinitrosalicylic acid (DNS) assay [53, 54]. Protein concentration was measured using Bradford chemistry [55, 56] with BSA as a protein standard (0–20 µg); samples were diluted to a working concentration of approximately 60 µg/ml in sodium citrate buffer (50 mM, pH 5.5).

To assay for the ability to degrade cellulose, we quantified release of reducing sugars from cellulose-based substrates, including microcrystalline cellulose (Avicel) and carboxymethyl cellulose (CMC). To quantify glycoside hydrolase activity directed at β-1,4 linkages present in D-glycopyranosyl-containing compounds, salicin, a β-1,4 conjugated phenolic glycoside, was utilized. For CMC and salicin DNS assays, 500 µl of a 2% substrate solution (in 50 mM sodium citrate buffer, pH 5.5) was combined with 30 µg (500 µl of 60 µg/ml) of fungal extract; however, due to substrate insolubility, a 1% solution of microcrystalline cellulose was substituted. For xylanase activity, 500 µl of a 1% xylan solution (in 50 mM sodium citrate buffer, pH 5.5; xylan from birch wood, Sigma Aldrich Corporation) was combined with 30 µg (500 µl of 60 µg/ml dilution) of fungal extract. Three technical replicates were performed, and non-enzyme controls were run to detect background release of reducing sugars from cellulose- and xylan-based substrates. For all assays, 100 µl of the reaction mixture was removed at time 0 and read at 540 nm to allow for subtraction of background reducing sugars. Reactions were incubated at 37°C for 120 min, and 100 µl aliquots were removed from each reaction after 60 and 120 min to record release of reducing sugars over time. For each aliquot, 100 µl of DNS reagent was added to halt enzyme activity [54], and samples were boiled for 8 min in a water bath. Then, 150 µl aliquots of the DNS reaction were read at 540 nm on a SpectraMax™ microplate reader (Molecular Devices Corp.) and compared with a glucose standard curve (20 to 1000 µg) to quantify the concentration of reducing sugars at each time point.
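The conversion from A540 readings to reducing sugar can be sketched as a linear standard curve followed by subtraction of the time 0 reading. The sketch below is illustrative only: the standard absorbances and sample readings are hypothetical values invented for the example, and the use of an ordinary least-squares line is an assumption rather than the documented procedure.

```python
def fit_standard_curve(glucose_ug, a540):
    """Least-squares line A540 = slope * glucose_ug + intercept."""
    n = len(glucose_ug)
    if n < 2 or n != len(a540):
        raise ValueError("need at least two paired standards")
    mean_x = sum(glucose_ug) / n
    mean_y = sum(a540) / n
    sxx = sum((x - mean_x) ** 2 for x in glucose_ug)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(glucose_ug, a540))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def glucose_equivalents_ug(a540_sample, a540_time0, slope):
    """Reducing sugar released since time 0; the intercept cancels in the difference."""
    return (a540_sample - a540_time0) / slope


# Hypothetical standards spanning the 20 to 1000 µg range used for the curve.
STANDARD_UG = [20, 100, 250, 500, 1000]
STANDARD_A540 = [0.07, 0.15, 0.30, 0.55, 1.05]  # invented readings, not data

if __name__ == "__main__":
    slope, intercept = fit_standard_curve(STANDARD_UG, STANDARD_A540)
    released = glucose_equivalents_ug(a540_sample=0.45, a540_time0=0.15, slope=slope)
    print(f"slope={slope:.5f} A540/µg, intercept={intercept:.3f}, released={released:.1f} µg")
```

Readings from the non-enzyme controls could be passed through the same function and subtracted to remove background release from the substrate itself.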

Zymogram Analysis

SDS-PAGE [57] was performed with modifications to independently verify cellulase or xylanase activity through zymogram techniques [58-60]. Twelve percent acrylamide separation gels were prepared containing either 0.1% carboxymethyl cellulose or 0.1% xylan from birch wood. Fungal enzyme extracts prepared as described above were loaded onto each gel in duplicate (20 µg protein/lane) with a pre-stained protein standard (SeeBlue Plus, Invitrogen, Carlsbad, CA), so that the gel could be cut vertically in half after electrophoresis to produce two identical acrylamide gels. The first half of the gel was stained with colloidal blue to visualize proteins as a reference and imaged using a densitometer (GS-800, Bio-Rad, Hercules, CA). The second half was used for zymogram analysis. Zymogram gels were rinsed in sodium citrate buffer (50 mM, pH 5.5) containing 1% Triton X-100 for 1 h at room temperature to remove SDS [58]. This was followed by a 1.5 h incubation in sodium citrate buffer (50 mM, pH 5.5) to allow for enzyme activity against the substrates. At this point, gels were stained with 0.1% Congo red for 30 min and destained in 1 M NaCl to reveal zones of clearing indicative of degradation of polysaccharide substrates. Gels were imaged under ultraviolet light to highlight zones of clearing and aligned with colloidal blue stained gels using the pre-stained protein standard as a reference.

In vitro Lignin Peroxidase, Manganese Peroxidase, and Laccase Activities

Ligninolytic activity of the fungal extract was further assessed through in vitro approaches. Lignin peroxidase activity was assayed by the oxidation of veratryl alcohol to veratraldehyde, measured as an increase in A310 [61]. Approximately 50 µg of protein was added to 1 ml of a solution containing 25 mM sodium tartrate (pH 3.0) and 20 mM veratryl alcohol. The reaction was initiated by addition of H2O2 to a concentration of 2 mM, and the absorbance at 310 nm was recorded. Mn-dependent and -independent peroxidase activities were measured by the oxidation of 2,6-dimethoxyphenol, measured as an increase in A470 [62]. Approximately 50 µg of protein extract was added to a solution containing 20 mM 2,6-dimethoxyphenol and 0.5 M sodium tartrate (pH 4.5), either with or without 20 mM manganese sulfate (for Mn-dependent and -independent activity, respectively). Total reaction volume was 1 ml, and the peroxidase reaction was initiated by addition of H2O2 to obtain a final concentration of 2 mM. Absorbance at 470 nm was recorded. Laccase activity was verified using 2,6-dimethoxyphenol as a substrate [63]. Approximately 50 µg of protein extract was added to a solution containing 20 mM 2,6-dimethoxyphenol and 0.5 M sodium tartrate (pH 4.5) to a total volume of 1 ml. Absorbance was then read at 470 nm. All assays were performed in triplicate; the mean change in absorbance relative to a no-enzyme control over five minutes was recorded and used as a measure of enzyme activity. One unit of activity was defined as the amount of enzyme that oxidizes 1 µmol of substrate per minute.
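As an illustration of how a change in absorbance might be converted into the units defined above, the following sketch applies the Beer-Lambert law. The extinction coefficient, path length, and enzyme volume are assumptions supplied by the user; the value 9300 M⁻¹ cm⁻¹ is a commonly cited coefficient for veratraldehyde at 310 nm and is not stated in this chapter, and the example readings are hypothetical.

```python
def activity_units_per_ml(delta_abs_per_min, epsilon_per_M_cm,
                          reaction_volume_ml, enzyme_volume_ml, path_cm=1.0):
    """Enzyme activity in U/ml, where 1 U oxidizes 1 µmol of substrate per minute.

    delta_abs_per_min : absorbance change per minute, already corrected
                        against the no-enzyme control
    epsilon_per_M_cm  : molar extinction coefficient of the product (assumed input)
    """
    if epsilon_per_M_cm <= 0 or path_cm <= 0 or enzyme_volume_ml <= 0:
        raise ValueError("epsilon, path length, and enzyme volume must be positive")
    # Beer-Lambert: concentration change (mol/L per min) = dA / (epsilon * path)
    molar_rate = delta_abs_per_min / (epsilon_per_M_cm * path_cm)
    # mol/L per min -> µmol per min in the reaction volume
    umol_per_min = molar_rate * (reaction_volume_ml / 1000.0) * 1e6
    return umol_per_min / enzyme_volume_ml


if __name__ == "__main__":
    # Hypothetical reading: 0.093 absorbance units per minute at 310 nm in a
    # 1 ml reaction containing 0.1 ml of extract (both values are invented).
    units = activity_units_per_ml(0.093, 9300.0, reaction_volume_ml=1.0,
                                  enzyme_volume_ml=0.1)
    print(f"{units:.3f} U/ml")
```

The same function could be applied to the 470 nm assays once an extinction coefficient for the 2,6-dimethoxyphenol oxidation product is chosen.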

Non-denaturing PAGE and Heme Staining

All known peroxidases with deconstructive activity directed at intact lignin polymers are extracellular enzymes containing a heme prosthetic group [10, 60, 63]. In an attempt to detect extracellular proteins containing this vital prosthetic group, heme-stained gels were prepared. In brief, non-denaturing polyacrylamide gel electrophoresis (native PAGE) was run following the methods of Laemmli [57] with a gel containing 12% acrylamide and lacking SDS in the gel and running buffer. To further preserve the functional integrity of the proteins, the samples were not boiled prior to loading on the gel. Fungal enzyme extracts were loaded onto the gel in duplicate (20 µg protein/lane) with protein standards so that the gel could be cut vertically in half after electrophoresis to produce two identical acrylamide gels. The first half of the gel was stained with colloidal blue to visualize proteins as a reference and imaged with a densitometer (GS-800, Bio-Rad, Hercules, CA). The second half of the gel was stained to identify proteins containing a heme prosthetic group. In brief, the second half of the gel was incubated in a Tris-MeOH solution (20 mM Tris-HCl, pH 7.3, 50% MeOH) for 30 min, followed by a 45-min incubation in heme stain solution (25 mM acetate buffer, pH 5.3, 0.25% benzidine HCl, 25% MeOH, and 0.75% H2O2). The gel was then rinsed in 25% MeOH and stored in 0.1 M Tris-HCl, pH 7.3. Heme-containing proteins stained dark brown and were matched with colloidal blue stained proteins on the reference gel. Although other heme-containing proteins that do not degrade lignin (e.g., cytochromes) will also stain using this approach, these proteins are primarily intracellular and are found in association with mitochondria.

Results

MudPIT Analysis

A total of 8400 spectra were collected from tandem MS and, of these, 4740 spectra were utilized to identify 3638 unique peptides at 95% confidence (Table 3-1). These 3638 peptides reliably mapped to 398 distinct proteins in the N. haematococca reference genome (Table 3-1). By using the full amino acid sequence from the N. haematococca genome, 175 proteins were found to have a signal peptide using the SignalP 3.0 neural network model and 183 using the HMM model (Supplemental Information). Over half of the proteins (224 for neural network and 203 for HMM) did not contain a detectable signal peptide, probably because of incidental lysis of cell membranes during protein extraction or errors in computational signal peptide prediction.

Through functional annotation, InterPro information was applied to 309 of these proteins, GO IDs to 275 proteins, and KEGG annotations to 128 proteins. Full InterPro, GO, and KEGG annotations are provided in Supplemental Information. Furthermore, 20 InterPro IDs had 5 or more proteins classified to them (Table 3-2). Many of these abundant InterPro IDs are related to cellulose/carbohydrate binding and degradation, protein/peptide degradation, and general metabolic functions. Focusing more closely on proteins that hydrolyze sugars, we analyzed glycoside hydrolase (GH) families using the InterPro IDs and blast searches of the reference proteins against the NCBI non-redundant database (Table 3-3).

In total, 48 different proteins classified into 28 GH families were represented in the protein extract, encompassing ~8% of all proteins identified (Table 3-4). The most abundant GH family was GH 3, which includes a broad range of enzyme classes, including β-glucosidase (EC 3.2.1.21), xylan 1,4-β-xylosidase (EC 3.2.1.37), β-N-acetylhexosaminidase (EC 3.2.1.52), and α-L-arabinofuranosidase (EC 3.2.1.55). Specifically, β-glucosidase (EC 3.2.1.21) was identified in our sample from this GH family through cross-annotation with the KEGG database (Supplemental Information). In addition, α-amylase (EC 3.2.1.1; GH 13), glucoamylase (EC 3.2.1.3; GH 15), licheninase (EC 3.2.1.73; GH 16), glucan 1,3-β-glucosidase (EC 3.2.1.58; GH 17), chitinase (EC 3.2.1.14; GH 28), β-N-acetylhexosaminidase (EC 3.2.1.52; GH 20), α-glucosidase (EC 3.2.1.20; GH 37), β-galactosidase (EC 3.2.1.23; GH 35), and α,α-trehalase (EC 3.2.1.28; GH 37) were identified in the proteome. A full description of potential enzymes in other GH families can be found at www.cazy.org [64]. We were unable to annotate other GH families with additional functionality, as EC numbers were not assigned to KEGG annotations for those proteins in the reference genome. In addition to GH annotated proteins, other plant cell wall degrading proteins that target additional components of woody tissue, such as esterases, pectinases, and carbohydrate binding proteins, were identified (Table 3-4).

Several proteins that may have the ability to degrade components of lignin or disrupt the linkage between lignin and other components of lignocellulose were identified (Table 3-5).

These include laccases, , radical copper oxidases, oxidoreductases, and superoxide dismutases that are often associated with lignin degradation. Also, a candidate cellobiose

dehydrogenase was detected, which can generate hydroxyl radicals. In addition, enzymes that target aromatic compounds for degradation were found, including an and biphenol reductase (Table 3-5).

In addition, 47 proteins relevant to protein digestion and nitrogen scavenging were detected in our protein extract; these were classified to 24 different IPR protein domains (Table 3-6). The most abundant domains involved in protease activity were IPR000209 (peptidase S8 and S53, subtilisin, kexin, sedolisin), IPR007484 (peptidase M28), IPR000834 (peptidase M14), IPR001461 (peptidase A1), and IPR001563 (peptidase S10, serine carboxypeptidase). Through cross-annotation with the KEGG database, we classified proteins containing these domains to enzyme classes: proteins classified as IPR000209 were assigned to ECs 3.4.21.- (serine endopeptidases), 3.4.21.48 (cerevisin), and 3.4.14.9 (tripeptidyl-peptidase I); proteins classified as IPR007484 were assigned to ECs 3.4.11.- (aminopeptidases) and 3.4.11.10 (bacterial leucyl aminopeptidase); proteins classified as IPR000834 were assigned to ECs 3.4.17.- (metallocarboxypeptidases), 3.4.17.15 (carboxypeptidase A2), and 3.4.17.2 (carboxypeptidase B); proteins classified as IPR001461 were assigned to ECs 3.4.23.1 (pepsin), 3.4.23.24 (Candida pepsin), and 3.4.23.- (aspartic endopeptidases); and proteins classified as IPR001563 were assigned to EC 3.4.16.6 (carboxypeptidase D). Two proteins relevant to nitrogen recycling and nitrogen scavenging were also detected: IPR003778 (urea amidolyase related) and IPR004304 (acetamidase/formamidase), respectively. Although IPR003778 could not be annotated with KEGG, IPR004304 was classified as EC 3.5.1.49 (formamidase).

Gene ontology (GO) annotation can be used to classify proteins by general function, and overabundance of certain categories in expression datasets, including transcriptomes and proteomes, can indicate the potential importance of these groups of genes to ongoing metabolic processes. In the proteome data obtained from an A. glabripennis-derived F. solani isolate growing on solid wood substrate, the most highly abundant categories from level 3 of the

Molecular Function category were hydrolase activity, nucleotide binding, and activity (red line, Figure 3-1), accounting for 34.2%, 9.9%, and 8.7% of proteins in our secretome, respectively. However, it is likely that these GO categories may simply be overrepresented in the reference genome and the overabundance of these categories in our proteome may simply be an artifact of genome enrichment. To correct for this and to identify

GO categories of proteins that are truly overrepresented under these growth conditions, we compared the relative abundances of GO categories in our proteome against their relative abundances in the genome. Through this comparison, many GO categories are enriched in our proteome relative to the reference genome, including carbohydrate binding, hydrolase activity, peroxidase activity, and protein binding (bar graph, Figure 3-1).
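The comparison of relative abundances described above can be sketched as a simple ratio of fractions. The category names and counts below are hypothetical placeholders, not values from this study, and the log2 ratio is included only as one convenient way to display such a comparison.

```python
import math


def relative_abundance(count, total):
    """Fraction of annotated proteins assigned to a GO category."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


def fold_enrichment(proteome_count, proteome_total, genome_count, genome_total):
    """Ratio of a category's relative abundance in the proteome to that in the genome."""
    genome_fraction = relative_abundance(genome_count, genome_total)
    if genome_fraction == 0:
        raise ValueError("category is absent from the reference genome")
    return relative_abundance(proteome_count, proteome_total) / genome_fraction


# Hypothetical counts: category -> (proteins in proteome, proteins in genome).
EXAMPLE_COUNTS = {"category_A": (10, 50), "category_B": (5, 100)}
PROTEOME_TOTAL = 100   # hypothetical number of GO-annotated proteins detected
GENOME_TOTAL = 1000    # hypothetical number of GO-annotated proteins in the genome

if __name__ == "__main__":
    for name, (in_proteome, in_genome) in EXAMPLE_COUNTS.items():
        ratio = fold_enrichment(in_proteome, PROTEOME_TOTAL, in_genome, GENOME_TOTAL)
        print(f"{name}: fold enrichment {ratio:.2f} (log2 {math.log2(ratio):+.2f})")
```

A ratio above 1 indicates a category that is more prevalent in the proteome than expected from its share of the genome; a formal statistical test would be needed to judge significance.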

Verification of Enzyme Activity Through In vitro Lignocellulase Assays

The fungal enzyme extract prepared from a solid wood culture of A. glabripennis-derived F. solani was surveyed for activities characteristic of enzymes that can degrade lignin, cellulose, and xylan (Table 3-7). β-glucosidase activity, measured by the release of reducing sugars from salicin, was 212.8 U/ml; CMCase activity was 13.1 U/ml; and cellulase activity, measured by release of reducing sugars from microcrystalline cellulose (Avicel), was 14.7 U/ml (Table 3-7). Xylanase activity, measured by release of reducing sugars from birch wood xylan, was 70.5 U/ml. Although the sample did not exhibit lignin peroxidase activity as measured by oxidation of veratryl alcohol, it exhibited low levels of Mn-dependent peroxidase activity (0.021 U/ml), Mn-independent peroxidase activity (0.47 U/ml), and laccase activity (0.42 U/ml) (Table 3-7).

Verification of Enzyme Presence and Activity Through PAGE Gel Analysis

In addition to verification of activity in in vitro assays, PAGE analysis was performed to visualize active enzymes. A single heme-containing protein of approximately 70 kDa was detected on a native heme-stained gel; however, a corresponding protein band was not detected on the reference colloidal blue stained gel (Figure 3-2). A possible explanation is that this protein may not be abundant or concentrated enough to be adequately visualized with the colloidal blue stain. Through zymogram analysis, carboxymethyl cellulase activity was detected, and six major zones of clearing ranging from 20 to 55 kDa were identified (Figure 3-2). Comparison to the reference colloidal blue stained SDS-PAGE gel revealed that several similarly sized protein bands were present, including a major band at 55 kDa and several less intense bands around and below 28 kDa. Relatively high xylanase activity was also detected through birch wood xylan zymogram analysis; many zones of clearing were observed, ranging in size from 20 kDa to 50 kDa (Figure 3-2). Very large and broad zones of clearing between 25 and 45 kDa obstructed the ability to define specific protein bands with activity towards xylan, but distinct bands can be seen above and below this region on the zymogram gel. Again, these regions of clearing can be matched to corresponding protein bands on the reference colloidal blue stained gel. This analysis corroborates the data from the in vitro assays and also demonstrates that many of these activities arise from a diversity of proteins, which is also reflected in the proteomic dataset.

Discussion

Over 30 years of research has been devoted to resolving the mechanisms and enzymology of lignin biodegradation, yet only three fungal enzymes have been discovered that conclusively depolymerize lignin, fully converting it to carbon dioxide and water. Reasons commonly cited for the lack of progress in this field include a poorly resolved biochemical structure of lignin, unreliable in vitro assays to confirm lignin degrading activities, and uncharacterized growth and physiological conditions required for induction of lignin degrading enzymes in non-white rot fungal isolates. While advances in next-generation sequencing and bioinformatics may allow us to detect lignin-, manganese-, and versatile-peroxidase orthologs in the genomes of newly sequenced organisms, validating and confirming that these orthologs actually catalyze lignin depolymerization is not a trivial task. Despite these difficulties, lignin degradation has recently been documented in the guts of two evolutionarily distant insect species: Anoplophora glabripennis and Zootermopsis angusticollis (Pacific dampwood termite) [12, 17, 20, 22, 24, 31, 32, 33, 34]. Since this initial discovery, much effort has been devoted to dissecting lignin degradation mechanisms and mining for key enzymes linked to these processes in both termites and A. glabripennis; however, the fundamental enzymes that catalyze these reactions remain elusive. Despite this ambiguity, there is evidence that all major reactions associated with large scale lignin biodegradation occur in both insects [22], although the predominant oxidative reactions differ. For example, side chain oxidation was the dominant reaction in the A. glabripennis gut, while ring hydroxylation was the dominant reaction in Zootermopsis, indicating that different enzymes may catalyze lignin degradation in these systems [22]. Despite these differences, one resounding commonality can be noted: these processes both occur in the absence of white rot basidiomycete fungi, leading to the possibility that novel lignin degrading enzymes are harbored in the insect genomes (not likely) or within the gut communities (more likely). While the gut bacterial communities of many termites and A. glabripennis are both dominated by actinomycete bacteria [23, 65], including taxa known to efficiently metabolize aromatic compounds, the ability of these actinomycetes to completely depolymerize the lignin macromolecule remains questionable [17-21]. In addition, little is known about the composition of fungal communities in the guts of non-fungus cultivating termites; in contrast, A. glabripennis larvae consistently harbor a filamentous ascomycete belonging to the Fusarium solani species complex [35], a metabolically diverse group of fungi with prolific lignin and aromatic polymer degrading capabilities [40, 46]. Here, we investigate the ability of this A. glabripennis-derived F. solani isolate to colonize and thrive on a solid wood substrate, degrade intractable woody polymers, including cellulose, xylan, and lignin, and extract other essential nutrients from this environment during periods of secondary metabolism.

In white rot basidiomycetes, expression of lignin degrading genes occurs solely during periods of secondary metabolism and nutrient limitation [10]; however, the conditions required for induction of lignin degrading enzymes are not well characterized in Fusarium spp., and their ability to colonize lignocellulose-based substrates varies tremendously [50]. Our two goals, to ensure substrate colonization regardless of inherent metabolic potential and to induce growth conditions characteristic of secondary metabolism, were both achieved. Proteins detected in MudPIT data associated with secondary metabolism include cupin-, germin-, and patatin-containing proteins, which facilitate nutrient storage under nutrient limiting conditions [66] (Supplemental

Information). Additionally, proteins and enzymes typically induced during periods of stress and nutrient deprivation were detected including cell wall spore proteins, cytochrome p450 extracellular , gamma glutamyl transpeptidase, heat shock proteins, mucin-like glycoproteins, and woronin body proteins [66-68] (Supplemental Information), likely indicating that this fungus was not persisting under nutrient rich, stress free conditions in culture.

Furthermore, proteins associated with host plant interactions were detected in the MudPIT data, including several candidate toxin-producing proteins and one cerato-platanin necrosis-inducing enzyme (Supplemental Information) [69], indicating that the fungal isolate was not simply persisting on millet and bran but was actively colonizing wood chips present in the medium.

Likewise, cellulase, glycoside hydrolase (directed at β-1,4 linkages), and xylanase activities were conclusively detected in vitro through zymogram analysis and reducing sugar assays (Figure 3-2 and Table 3-7), demonstrating that this isolate was actively digesting polysaccharides present in the woody substrate rather than simply extracting glucose from the soluble starches present in millet. In tandem, GH families responsible for hydrolyzing β-1,4 linkages present in both cellulose and xylan were identified through MudPIT analysis, including complete enzyme complexes for full conversion of both polysaccharides to their constituent sugar monomers. Cellulose alone requires at least three distinct enzymes for efficient glucose liberation: endoglucanases, exoglucanases, and β-glucosidases. Endoglucanases cleave amorphous sites in cellulose at random, decreasing its crystallinity, increasing its solubility, and rapidly exposing reducing and non-reducing ends to other hydrolytic enzymes. Subsequently, exoglucanases act on these reducing and non-reducing termini to release cellobiose and other cello-oligomers, which can be efficiently converted to glucose by β-glucosidases [70]. Endoglucanases (EC 3.2.1.4) are categorized into several GH families, including GH 5, 6, 7, 8, 9, 12, 44, 45, 48, 51, and 61. Exoglucanases, or cellobiohydrolases, are classified into two distinct KEGG ECs depending on whether they target reducing or non-reducing ends and whether they have inverting or retaining activities: EC 3.2.1.176 (GH 7 and 48; reducing, retaining) and EC 3.2.1.91 (GH 5, 6, and 9; non-reducing, inverting). β-glucosidases (EC 3.2.1.21) are distributed among GH families 1, 3, 9, 30, and 116 [71]. GHs with potential relevance to cellulose digestion detected through MudPIT analysis include GHs belonging to families 1, 3, 5, 6, 7, 45, and 61. Many of these glycoside hydrolases were not well-annotated in the reference genome with GO or KEGG terms [48], so the precise reactions catalyzed by these GHs cannot be directly inferred; an individual GH family can harbor enzymes with very diverse catalytic and substrate specificities (Table 3-3). However, the distribution of GHs in conjunction with release of reducing sugars from microcrystalline cellulose, carboxymethyl cellulose, and salicin suggests that this fungal isolate possesses the full suite of cellulolytic enzymes.
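The family-level reasoning in this paragraph can be summarized with a small set comparison. The sketch below uses only the GH family numbers listed in the text; the dictionary and function names are illustrative, and membership in a family is treated as a candidate assignment only, since an individual family can contain enzymes with diverse specificities.

```python
# GH families detected by MudPIT with potential relevance to cellulose digestion.
DETECTED_FAMILIES = frozenset({1, 3, 5, 6, 7, 45, 61})

# GH families associated with each cellulolytic enzyme class, as listed in the text.
CELLULASE_CLASSES = {
    "endoglucanase (EC 3.2.1.4)": {5, 6, 7, 8, 9, 12, 44, 45, 48, 51, 61},
    "exoglucanase, reducing end (EC 3.2.1.176)": {7, 48},
    "exoglucanase, non-reducing end (EC 3.2.1.91)": {5, 6, 9},
    "beta-glucosidase (EC 3.2.1.21)": {1, 3, 9, 30, 116},
}


def candidate_families(detected, classes):
    """For each enzyme class, list the detected GH families that could supply it."""
    return {name: sorted(detected & families) for name, families in classes.items()}


def all_classes_represented(detected, classes):
    """True if every enzyme class has at least one detected candidate family."""
    return all(detected & families for families in classes.values())


if __name__ == "__main__":
    for name, families in candidate_families(DETECTED_FAMILIES, CELLULASE_CLASSES).items():
        print(f"{name}: GH {families}")
    print("all classes represented:",
          all_classes_represented(DETECTED_FAMILIES, CELLULASE_CLASSES))
```

Every enzyme class has at least one detected candidate family, which is consistent with, but does not by itself prove, the presence of a complete cellulolytic system.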

On the other hand, hemicellulose is a much more heterogeneous polymer containing many monomeric subunits and a greater diversity of chemical linkages and, thus, requires a combination of glycoside hydrolases and esterases for efficient conversion to sugar monomers [9]. For example, O-acetylglucuronoxylan is the predominant hemicellulose in hardwood trees [72] and requires endo-, exo-, and β-xylosidases (EC 3.2.1.37: GH 3, 30, 39, 43, 52, 54, 116, and 120) [71] to hydrolyze β-1,4 linkages; α-uronidases (EC 3.2.1.139) to liberate glucuronic, mannuronic, and galacturonic acids; and ferulic acid esterase (EC 3.1.1.73) and (EC 3.2.1.72) [34, 73] to hydrolyze phenolic ester bonds that cross-link hemicellulose to lignin. Of the eight GH families with documented involvement in xylan degradation, three were detected in our proteome that could account for the xylanase activity we observed in vitro in reducing sugar assays and zymogram analyses (Table 3-7 and Figure 3-2).

These include GH families 3, 43, and a candidate GH 39 (Table 3-3). Many of these had predicted signal peptides, but no specific GO or EC annotations were present in the reference genome that could be utilized to determine substrate specificities. In addition, a number of ester hydrolyzing enzymes, including carboxylesterases, esterases, and a /, were also detected through shotgun proteomics (Table 3-4). Surprisingly, a number of proteins with activity directed at pectin and cutin polymers, which are not highly abundant in woody tissues, were also detected; however, pectin and cutin are highly pertinent to wood decay processes because these compounds are often found in the central location of pit membranes, ray cell walls, and middle lamellae of wood cell walls [74] (Table 3-4). These polysaccharides are

often broken down during wood rot processes, and their decomposition is often coupled to metal cation acquisition because pectin serves as a calcium reservoir in woody tissue [75].

Proteins are also occasionally found impregnating xylem elements and plant cell walls in woody tissue, where they provide vital nitrogen sources that could be assimilated by wood degrading bacteria and fungi. Only a few types of proteins are co-localized to cell walls in wood, and they are often covalently cross-linked to cellulose, hemicellulose, and lignin in the cell wall matrix [11]. Incidentally, extracellular proteinases are often highly expressed in white rot basidiomycetes during periods of active lignin metabolism [16]. As production of lignin degrading peroxidases is induced by nitrogen limiting conditions in white rot basidiomycetes, many have hypothesized that these fungi may degrade lignin in order to access proteins cross-linked to lignin, though this has not been directly tested [73, 75, 76]. In concert, we detected many extracellular proteinases in our own secretome, among them enzymes with broad substrate specificities that could serve to scavenge nitrogen from woody tissue, including aspartic peptidases, carboxypeptidases, metallopeptidases, and serine peptidases (Table 3-6). An alternative explanation for the abundance of proteinases in the secretome is that they may have been actively hydrolyzing proteins present in the millet, although these elements should have been depleted from the medium at the time of harvesting. In addition, fungi that thrive under these nitrogen limiting conditions for extended periods of time may also efficiently recycle nitrogenous waste products into amino acids and nucleotides. Two putative nitrogen recycling proteins were also detected, formamidase and urease, which convert nitrogenous waste into ammonia that can subsequently be re-integrated into amino acids or nucleotides (Table 3-6) [77].

Protein, cellulose, and hemicellulose in wood chips are all protected from hydrolytic enzymes by lignin, a structural biopolymer dominated by recalcitrant linkages that can only be broken through radical oxidative depolymerization catalyzed by lignin-, manganese-, and versatile-peroxidases [10]. In addition, highly reactive hydroxyl radicals produced from Fenton reactions may help to expedite complete lignin depolymerization in some white rot fungi [50, 78]. While these enzymes all catalyze lignin depolymerization using slightly different mechanisms, they all require extracellular peroxide, which is generated by a variety of enzymes including FAD oxidases, copper radical oxidases, glyoxal oxidases, and GMC oxidoreductases [10, 16, 78, 79]. Some lignin degrading fungi also utilize laccases [80] to catalyze oxidative cleavage of phenoxy linkages, which may degrade small lignin metabolites released by larger scale depolymerization processes or augment large scale oxidative depolymerization by lignin-, manganese-, or versatile-peroxidases. Although the catalytic potential of laccases can be expanded to oxidize more recalcitrant linkages in the presence of synthetic mediators [15], no natural varieties of these redox mediators have been conclusively identified, though some speculate that small secreted proteins expressed during active lignin metabolism may fill this niche [16].

Despite the presence of a lignin peroxidase ortholog in our reference genome, no bona fide lignin peroxidases or manganese peroxidases were detected through in vitro biochemical assays or de novo peptide sequencing ; however, one unidentifiable heme protein was detected

(Figure 3-2). Whether or not this protein functions as an extracellular lignin peroxidase is unknown. Additionally, many extracellular enzymes typically observed in the secretomes of other lignin-degrading fungi were detected, including several secreted laccases whose activity was verified in vitro (Tables 3-6 and 3-7) and two intracellular polyphenol oxidases (tyrosinases).

Both phenol oxidases can oxidize similar substrates, but only laccases can oxidize syringaldazine, while tyrosinases are more instrumental in degrading tannic and gallic acids [43, 81, 82]. In tandem, several extracellular peroxide-generating enzymes, including FAD oxidases, GMC oxidoreductases, a putative copper radical oxidase, and superoxide dismutases, and one hydroxyl radical-generating enzyme (a candidate cellobiose dehydrogenase) were also observed (Table 3-5), suggesting that processes requiring peroxide and hydroxyl radicals were occurring in the extracellular environment. In addition, several germin proteins were detected, which can also function as oxalate oxidases (Supplemental Info). Oxalate is often produced during periods of active lignin metabolism; it can be directly converted to hydrogen peroxide by oxalate oxidase and often enhances the oxidative activity of manganese peroxidase in vitro [83]. Several enzymes that degrade small aromatic compounds were also identified, including a candidate reductase with activity directed at biphenyl compounds and a candidate esterase with activity directed at aromatic compounds [84, 85], which could hydrolyze bonds in small lignin metabolites produced from larger-scale biodegradation processes (Table 3-5). Finally, several small secreted proteins and hypothetical proteins were also detected in the culture supernatant; whether or not these proteins could be relevant to lignin degradation is unknown (Table 3-5 and Supplemental Information).

Although no lignin peroxidase activity was detected in vitro, this does not necessarily indicate that this isolate lacks lignin-degrading enzymes or lignin-degrading capabilities. In fact, veratryl alcohol oxidation is often not detected during active lignin degradation in extracellular extracts prepared from white rot fungi growing on woody substrates, even when lignin peroxidase isozymes are detected in 2D gels or by de novo peptide sequencing. Furthermore, extensive glycosylation or other post-translational modifications can interfere with trypsin digestion or alter protein masses, resulting in inefficient digestion or errors in predicted amino acid sequences, which can result in protein non-detection [73, 76]. It is also possible that the lignin peroxidase ortholog detected in the Nectria haematococca reference genome is not present in our isolate. An alternate explanation is that laccases are more involved in natural lignin degradation in this system than previously recognized. Lending support to this speculation are the observation that lignin-degrading capacity is strongly reduced in some laccase-deficient Pycnoporus cinnabarinus and Sporotrichum pulverulentum mutants and the hypothesis that small secreted proteins produced during active lignin metabolism may serve as natural redox mediators for laccase, enhancing its oxidative potential [16, 86, 87].

In addition, Scharf and colleagues [34] recently discovered an endogenous termite laccase whose phenol oxidase activity is enhanced in the presence of hydrogen peroxide, a characteristic more typical of lignin peroxidases. While this could indicate that this laccase has an inherently higher redox potential and can catalyze oxidation of more recalcitrant linkages than previously characterized laccases, this observation could also be an artifact of His-labeling, and future studies are needed to validate its redox potential and determine its catalytic capabilities.

In conclusion, we have demonstrated that this F. solani isolate has a demonstrable ability to degrade proteins, cellulose, hemicellulose, and other carbohydrate polymers present in woody tissue and that it expresses many enzymes that are often up-regulated during periods of lignin metabolism in other lignin-degrading fungi, including enzymes involved in extracellular peroxide generation, laccases, and polyphenol oxidases. While we have not definitively documented full lignin depolymerization by this A. glabripennis-derived F. solani strain, these results indicate that this isolate may have lignin-degrading potential. To better assess the true metabolic potential of this fungal strain, whole genome sequencing is currently in progress to produce a more suitable reference genome for future transcriptomics and proteomics studies and to more conclusively assay lignin-degrading capabilities. Furthermore, follow-up metatranscriptomic and metaproteomic approaches will be employed to determine if fungal transcripts and enzymes are actively expressed in the gut and to assess the potential contribution of this fungus to lignin-degrading activities in larval A. glabripennis.

Acknowledgements

We thank A. and B. Stanley at the Penn State Hershey Medical Center Mass Spectrometry Core Research Facility for mass spectrometry analysis and Cristina Rosa for her comments and suggestions for this manuscript. We also thank Isabel Ramos and Karen Pongrance for assistance with ALB rearing. Funding for this project was provided by USDA-NRI-CSREES grant 2008-35504-04464, USDA-NRI-CSREES grant 2009-35302-05286, the Alphawood Foundation, Chicago, Illinois, and a Seed Grant to Dr. Hoover from the Pennsylvania State University College of Agricultural Sciences.

Literature Cited

1. Dillon RJ, Dillon VM: The gut bacteria of insects: nonpathogenic interactions. Annual Review of Entomology 2004, 49:71-92.

2. Kukor JJ, Cowan DP, Martin MM: The role of ingested fungal enzymes in cellulose digestion in the larvae of cerambycid beetles. Physiological Zoology 1988, 61(4):364-371.

3. Dowd PF: Insect fungal symbionts - a promising source of detoxifying enzymes. Journal of Industrial Microbiology 1992, 9(3-4):149-161.

4. Hanks LM: Influence of the larval host plant on reproductive strategies of cerambycid beetles. Annual Review of Entomology 1999, 44:483-505.

5. Lingafelter SW, Hoebeke ER: Revision of Anoplophora (Coleoptera: Cerambycidae). Washington, DC: Entomological Society of Washington; 2002:236 p.

6. Nowak DJ, Pasek JE, Sequeira RA, Crane DE, Mastro VC: Potential effect of Anoplophora glabripennis (Coleoptera: Cerambycidae) on urban trees in the United States. Journal of Economic Entomology 2001, 94(1):116-122.

7. Hu JF, Angeli S, Schuetz S, Luo YQ, Hajek AE: Ecology and management of exotic and endemic Asian longhorned beetle Anoplophora glabripennis. Agricultural and Forest Entomology 2009, 11(4):359-375.

8. O'Sullivan AC: Cellulose: the structure slowly unravels. Cellulose 1997, 4(3):173-207.

9. Pettersen RC: The chemical composition of wood. In: The Chemistry of Solid Wood. vol. 207: American Chemical Society; 1984: 57-126.

10. Kirk TK, Farrell RL: Enzymatic combustion - the microbial degradation of lignin. Annual Review of Microbiology 1987, 41:465-505.

11. Keller B, Templeton MD, Lamb CJ: Specific localization of a plant-cell wall glycine-rich protein in protoxylem cells of the vascular system. Proceedings of the National Academy of Sciences of the United States of America 1989, 86(5):1529-1533.

12. Ke J, Singh D, Chen SL: Aromatic compound degradation by the wood-feeding termite Coptotermes formosanus (Shiraki). International Biodeterioration & Biodegradation 2011, 65(6):744-756.

13. Martinez D, Larrondo LF, Putnam N, Gelpke MDS, Huang K, Chapman J, Helfenbein KG, Ramaiya P, Detter JC, Larimer F et al: Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78. Nat Biotechnol 2004, 22(7):899-899.

14. Camarero S, Sarkar S, Ruiz-Duenas FJ, Martinez MJ, Martinez AT: Description of a versatile peroxidase involved in the natural degradation of lignin that has both manganese peroxidase and lignin peroxidase substrate interaction sites. The Journal of Biological Chemistry 1999, 274:6.

15. Morozova OV, Shumakovich GP, Shleev SV, Yaropolov YaI: Laccase-mediator systems and their applications: A review. Applied Biochemistry and Microbiology 2007, 43(5):12.

16. Regalado V, Perestelo F, Rodriguez A, Carnicero A, Sosa FJ, De la Fuente G, Falcon MA: Activated oxygen species and two extracellular enzymes: laccase and aryl-alcohol oxidase, novel for the lignin-degrading fungus Fusarium proliferatum. Applied Microbiology and Biotechnology 1999, 51(3):388-390.

17. Harazono K, Yamashita N, Shinzato N, Watanabe Y, Fukatsu T, Kurane R: Isolation and characterization of aromatics-degrading microorganisms from the gut of the lower termite Coptotermes formosanus. Biosci, Biotechnol, Biochem 2003, 67(4):889-892.

18. Bugg TDH, Ahmad M, Hardiman EM, Singh R: The emerging role for bacteria in lignin degradation and bio-product formation. Curr Opin Biotechnol 2011, 22(3):394-400.

19. Kajikawa H, Kudo H, Kondo T, Jodai K, Honda Y, Kuwahara M, Watanabe T: Degradation of benzyl ether bonds of lignin by ruminal microbes. FEMS Microbiol Lett 2000, 187(1):15-20.

20. Kato K, Kozaki S, Sakuranaga M: Degradation of lignin compounds by bacteria from termite guts. Biotechnology Letters 1998, 20(5):459-462.

21. Shary S, Ralph SA, Hammel KE: New insights into the ligninolytic capability of a wood decay ascomycete. Appl Environ Microbiol 2007, 73(20):6691-6694.

22. Geib SM, Filley TR, Hatcher PG, Hoover K, Carlson JE, Jimenez-Gasco Mdel M, Nakagawa-Izumi A, Sleighter RL, Tien M: Lignin degradation in wood-feeding insects. Proceedings of the National Academy of Sciences of the United States of America 2008, 105(35):12932-12937.

23. Geib SM, Jimenez-Gasco Mdel M, Carlson JE, Tien M, Hoover K: Effect of host tree species on cellulase activity and bacterial community composition in the gut of larval Asian longhorned beetle. Environ Entomol 2009, 38(3):686-699.

24. Geib SM, Tien M, Hoover K: Identification of proteins involved in lignocellulose degradation using in gel zymogram analysis combined with mass spectroscopy-based peptide analysis of gut proteins from larval Asian longhorned beetles, Anoplophora glabripennis. Insect Science 2010, 17(3):253-264.

25. Pauchet Y, Wilkinson P, Chauhan R, Ffrench-Constant RH: Diversity of beetle genes encoding novel plant cell wall degrading enzymes. PloS one 2010, 5(12):e15635.

26. Pauchet Y, Wilkinson P, van Munster M, Augustin S, Pauron D, Ffrench-Constant RH: Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela tremulae reveals new gene families in Coleoptera. Insect Biochemistry and Molecular Biology 2009, 39(5-6):403-413.

27. Watanabe H, Tokuda G: Cellulolytic systems in insects. Annu Rev Entomol 2009, 55:23.

28. Willis JD, Oppert C, Jurat-Fuentes JL: Methods for discovery and characterization of cellulolytic enzymes from insects. Insect Science 2010, 17(3):184-198.

29. Davison A, Blaxter M: Ancient origin of glycosyl hydrolase family 9 cellulase genes. Molecular biology and evolution 2005, 22(5):1273-1284.

30. Calderon-Cortes N, Watanabe H, Cano-Camacho H, Zavala-Paramo G, Quesada M: cDNA cloning, homology modelling and evolutionary insights into novel endogenous cellulases of the borer beetle Oncideres albomarginata chamela (Cerambycidae). Insect Molecular Biology 2010, 19(3):323-336.

31. Zhang DH, Lax AR, Bland JM, Allen AB: Characterization of a new endogenous endo-beta-1,4-glucanase of Formosan subterranean termite (Coptotermes formosanus). Insect Biochemistry and Molecular Biology 2011, 41(4):211-218.

32. Sun JZ, Scharf ME: Exploring and integrating cellulolytic systems of insects to advance biofuel technology PREFACE. Insect Science 2010, 17(3):163-165.

33. Coy MR, Salem TZ, Denton JS, Kovaleva ES, Liu Z, Barber DS, Campbell JH, Davis DC, Buchman GW, Boucias DG et al: Phenol-oxidizing laccases from the termite gut. Insect Biochemistry and Molecular Biology 2010, 40(10):723-732.

34. Scharf ME, Tartar A: Termite digestomes as sources for novel lignocellulases. Biofuels Bioproducts & Biorefining-Biofpr 2008, 2(6):540-552.

35. Geib SM, Scully ED, Jimenez-Gasco MdM, Carlson JE, Tien M, Hoover K: Phylogenetic analysis of Fusarium solani associated with the Asian longhorned beetle, Anoplophora glabripennis. Insects 2012, 3(1):141-160.

36. Teetorbarsch GH, Roberts DW: Entomogenous Fusarium species. Mycopathologia 1983, 84(1):3-16.

37. Morales-Ramos JA, Rojas MG, Sittertz-Bhatkar H, Saldana G: Symbiotic relationship between Hypothenemus hampei (Coleoptera: Scolytidae) and Fusarium solani (Moniliales: Tuberculariaceae). Ann Entomol Soc Am 2000, 93(3):541-547.

38. Romero MC, Salvioli ML, Cazau MC, Arambarri AM: Pyrene degradation by yeasts and filamentous fungi. Environmental Pollution 2002, 117(1):159-163.

39. Veignie E, Rafin C, Woisel P, Cazier F: Preliminary evidence of the role of hydrogen peroxide in the degradation of benzo[a]pyrene by a non-white rot fungus Fusarium solani. Environmental Pollution 2004, 129(1):1-4.

40. Colombo JC, Cabello M, Arambarri AM: Biodegradation of aliphatic and aromatic hydrocarbons by natural soil microflora and pure cultures of imperfect and lignolitic fungi. Environmental Pollution 1996, 94(3):355-362.

41. Tien M, Kirk TK: Lignin-degrading enzyme from Phanerochaete-chrysosporium - purification, characterization, and catalytic properties of a unique H2O2-requiring oxygenase. Proceedings of the National Academy of Sciences of the United States of America-Biological Sciences 1984, 81(8):2280-2284.

42. Zaremaivan H, Shearer CA: Extracellular enzyme-production and cell-wall degradation by fresh-water lignicolous fungi. Mycologia 1988, 80(3):365-375.

43. Worrall JJ, Anagnost SE, Zabel RA: Comparison of wood decay among diverse lignicolous fungi. Mycologia 1997, 89(2):199-219.

44. Lozovaya V, Lygin A, Zernova O, Li S, Widholm J: Lignin degradation by Fusarium solani f. sp. glycines. Plant Dis 2006, 90:77-82.

45. Sutherland JB, Pometto AL, Crawford DL: Lignocellulose degradation by Fusarium species. Canadian Journal of Botany-Revue Canadienne De Botanique 1983, 61(4):1194-1198.

46. Rodriguez A, Perestelo F, Carnicero A, Regalado V, Perez R, DelaFuente G, Falcon MA: Degradation of natural lignins and lignocellulosic substrates by soil-inhabiting fungi imperfecti. Fems Microbiol Ecol 1996, 21(3):213-219.

47. Falcon MA, Rodriguez A, Carnicero A, Regalado V, Perestelo F, Milstein O, Delafuente G: Isolation of microorganisms with lignin transformation potential from soil of Tenerife Island. Soil Biology & Biochemistry 1995, 27(2):121-126.

48. Coleman JJ, Rounsley SD, Rodriguez-Carres M, Kuo A, Wasmann CC, Grimwood J, Schmutz J, Taga M, White GJ, Zhou S et al: The genome of Nectria haematococca: contribution of supernumerary chromosomes to gene expansion. PLoS Genet 2009, 5(8):e1000618.

49. Varela E, Mester T, Tien M: Culture conditions affecting biodegradation components of the brown-rot fungus Gloeophyllum trabeum. Arch Microbiol 2003, 180(4):251-256.

50. Crawford DL, Crawford RL: Microbial-degradation of lignin. Enzyme and Microbial Technology 1980, 2(1):11-22.

51. Kinter M, Sherman NE: Protein sequencing and identification using tandem mass spectrometry. New York: John Wiley; 2000.

52. Bendtsen JD, Nielsen H, von Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 2004, 340(4):783-795.

53. Bernfeld P: Amylases, alpha and beta. In: Methods Enzymol. Edited by Colowick SP, Kaplan NO, vol. 1. New York: Academic Press; 1955: 149-150.

54. Miller GL: Use of dinitrosalicylic acid reagent for determination of reducing sugar. Analytical Chemistry 1959, 31(3):426-428.

55. Bollag DM, Rozycki MD, Edelstein SJ: Protein methods, 2nd edn. New York: Wiley-Liss; 1996.

56. Bradford MM: A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem 1976, 72:248-254.

57. Laemmli UK: Cleavage of structural proteins during assembly of head of bacteriophage-T4. Nature 1970, 227(5259):680-&.

58. Her S, Lee HS, Choi SJ, Choi SW, Choi HJ, Yoon SS, Oh DH: Cloning and sequencing of beta-1,4-endoglucanase gene (celA) from Pseudomonas sp YD-15. Lett Appl Microbiol 1999, 29(6):389-395.

59. Schwarz WH, Bronnenmeier K, Grabnitz F, Staudenbauer WL: Activity staining of cellulases in polyacrylamide gels containing mixed linkage beta-glucans. Anal Biochem 1987, 164(1):72-77.

60. Chavez R, Schachter K, Navarro C, Peirano A, Aguirre C, Bull P, Eyzaguirre J: Differences in expression of two endoxylanase genes (xynA and xynB) from Penicillium purpurogenum. Gene 2002, 293(1-2):161-168.

61. Tien M, Kirk TK: Lignin-degrading enzyme from Phanerochaete chrysosporium - purification, characterization, and catalytic properties of a unique H2O2-requiring oxygenase. P Natl Acad Sci-Biol 1984, 81(8):2280-2284.

62. Dejong E, Field JA, Debont JAM: Evidence for a new extracellular peroxidase - manganese-inhibited peroxidase from the white-rot fungus Bjerkandera Sp Bos-55. FEBS Lett 1992, 299(1):107-110.

63. Rehman AU, Thurston CF: Purification of laccase-I from Armillaria mellea. J Gen Microbiol 1992, 138:1251-1257.

64. Henrissat B: A classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem J 1991, 280 (Pt 2):309-316.

65. Pasti MB, Pometto AL, Nuti MP, Crawford DL: Lignin-solubilizing ability of actinomycetes isolated from termite (Termitidae) gut. Applied and environmental microbiology 1990, 56(7):2213-2218.

66. Dunwell JM, Purvis A, Khuri S: Cupins: the most functionally diverse protein superfamily? Phytochemistry 2004, 65(1):7-17.

67. Ubiyvovk VM, Blazhenko OV, Gigot D, Penninckx M, Sibirny AA: Role of gamma-glutamyltranspeptidase in detoxification of xenobiotics in the yeasts Hansenula polymorpha and Saccharomyces cerevisiae. Cell Biology International 2006, 30(8):665-671.

68. Soundararajan S, Jedd G, Li XL, Ramos-Pamplona M, Chua NH, Naqvi NI: Woronin body function in Magnaporthe grisea is essential for efficient pathogenesis and for survival during nitrogen starvation stress. Plant Cell 2004, 16(6):1564-1574.

69. Pazzagli L, Cappugi G, Manao G, Camici G, Santini A, Scala A: Purification, characterization, and amino acid sequence of cerato-platanin, a new phytotoxic protein from Ceratocystis fimbriata f. sp platani. J Biol Chem 1999, 274(35):24959-24964.

70. Holtzapple M, Cognata M, Shu Y, Hendrickson C: Inhibition of Trichoderma-reesei cellulase by sugars and solvents. Biotechnology and bioengineering 1990, 36(3):275-287.

71. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B: The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Research 2009, 37:D233-D238.

72. Pinto PC, Evtuguin DV, Neto CP: Structure of hardwood glucuronoxylans: modifications and impact on pulp retention during wood kraft pulping. Carbohydrate Polymers 2005, 60(4):489-497.

73. Sato S, Liu F, Koc H, Tien M: Expression analysis of extracellular proteins from Phanerochaete chrysosporium grown on different liquid and solid substrates. Microbiology-Sgm 2007, 153:3023-3033.

74. Ferris MJ, Muyzer G, Ward DM: Denaturing gradient gel electrophoresis profiles of 16S rRNA-defined populations inhabiting a hot spring microbial mat community. Appl Environ Microbiol 1996, 62(2):340-346.

75. Sato S, Feltus FA, Iyer P, Tien M: The first genome-level transcriptome of the wood-degrading fungus Phanerochaete chrysosporium grown on red oak. Current Genetics 2009, 55(3):273-286.

76. Abbas A, Koc H, Liu F, Tien M: Fungal degradation of wood: initial proteomic analysis of extracellular proteins of Phanerochaete chrysosporium grown on oak substrate. Current Genetics 2005, 47(1):49-56.

77. Strope PK, Nickerson KW, Harris SD, Moriyama EN: Molecular evolution of urea amidolyase and urea carboxylase in fungi. BMC Evolutionary Biology 2011, 11.

78. Kersten P, Cullen D: Extracellular oxidative systems of the lignin-degrading Basidiomycete Phanerochaete chrysosporium. Fungal Genet Biol 2007, 44(2):77-87.

79. Hammel KE: Fungal degradation of lignin. Driven by Nature: Plant Litter Quality and Decomposition 1997:33-45.

80. Youn HD, Hah YC, Kang SO: Role of laccase in lignin degradation by white-rot fungi. Fems Microbiology Letters 1995, 132(3):183-188.

81. Hemingway RW, Laks PE: Plant polyphenolics, vol. 59. Houghton, MI: Plenum Press; 1992.

82. Bending GD, Read DJ: Lignin and soluble phenolic degradation by ectomycorrhizal and ericoid mycorrhizal fungi. Mycological research 1997, 101:1348-1354.

83. Kuan IC, Tien M: Stimulation of Mn-peroxidase activity - a possible role for oxalate in lignin biodegradation. Proceedings of the National Academy of Sciences of the United States of America 1993, 90(4):1242-1246.

84. Ohta Y, Maeda M, Kudo T: Pseudomonas putida CE2010 can degrade biphenyl by a mosaic pathway encoded by the tod operon and cmtE, which are identical to those of P-putida F1 except for a single base difference in the operator-promoter region of the cmt operon. Microbiology-Uk 2001, 147:31-41.

85. Magnuson TS, Crawford DL: Comparison of extracellular peroxidase- and esterase-deficient mutants of Streptomyces-viridosporus T7a. Applied and environmental microbiology 1992, 58(3):1070-1072.

86. Ander P, Eriksson KE: Importance of phenol oxidase activity in lignin degradation by white-rot fungus Sporotrichum-pulverulentum. Archives of microbiology 1976, 109(1-2):1-8.

87. Bermek H, Li KC, Eriksson KEL: Laccase-less mutants of the white-rot fungus Pycnoporus cinnabarinus cannot delignify kraft pulp. Journal of Biotechnology 1998, 66(2-3):117-124.


Table 3-1. MudPIT summary data.

Unused (% Conf) Cutoff | Proteins Detected | Distinct Peptides | Spectra Identified | % Total Spectra Used
>2.0 (99) | 264 | 3219 | 4279 | 51
>1.3 (95) | 398 | 3638 | 4740 | 56.4

Table 3-2. Most abundant InterPro IDs identified in MudPIT analysis.

InterPro ID | # Proteins with annotation | InterPro Description
IPR000379 | 16 | Esterase/lipase/thioesterase
IPR001138 | 14 | Fungal transcriptional regulatory protein, N-terminal
IPR000254 | 8 | Cellulose-binding region, fungal
IPR001764 | 8 | Glycoside hydrolase, family 3, N-terminal
IPR000209 | 8 | Peptidase S8 and S53, subtilisin, kexin, sedolisin
IPR007219 | 7 | Fungal specific transcription factor
IPR002772 | 7 | Glycoside hydrolase, family 3, C-terminal
IPR003439 | 6 | ABC transporter
IPR001410 | 6 | DEAD/DEAH box helicase
IPR001650 | 6 | Helicase, C-terminal
IPR007484 | 6 | Peptidase M28
IPR003137 | 6 | Protease-associated PA
IPR003593 | 5 | AAA ATPase
IPR001757 | 5 | ATPase, E1-E2 type
IPR006045 | 5 | Cupin
IPR008250 | 5 | E1-E2 ATPase-associated region
IPR005834 | 5 | Haloacid dehalogenase-like hydrolase
IPR000719 | 5 | Protein kinase
IPR010259 | 5 | Proteinase inhibitor I9, subtilisin propeptide
IPR002290 | 5 | Serine/threonine protein kinase

Table 3-3. Glycoside hydrolase families detected in MudPIT analysis.

Glycoside Hydrolase Family | Number of Proteins | Protein ID | Interpro ID (or description if known) | KEGG EC Number (if known) | Secreted?
Candidate GH | 2 | 78, 80 | Blast hit to endoglucanase |  | Y
Candidate GH 39 | 1 | 156 | Blast hit to beta glucosidase |  | Y
Candidate GH 39 | 1 | 170 | IPR000293, IPR002860 |  | Y
Candidate GH5 | 1 | 34 | Blast homology |  | Y
Candidate GH7 | 1 | 44 | GH 7 superfamily domain | 3.2.1.58: Glucan 1,3-beta-glucosidase | N
Candidate GH9 | 1 | 109 | Blast hit to GH9 |  | Y
Candidate GH 55 | 1 | 5 | Pectin 3 domain |  | Y
Candidate retaining beta glucosidase | 1 | 54 | JGI User annotation |  | Y
BNR repeat | 2 | 35, 23 | IPR002860 |  | Y
Fungal cellulose binding | 1 |  | IPR000254 |  |
Glucoside hydrolase - starch binding | 1 | 120 | IPR002044 |  | N
1 | 1 | 27 | IPR001360 |  | Y
3 | 8 | 2, 56, 57, 58, 95, 149, 189, 254 | IPR001764, IPR002772 | 3.2.1.21: Beta glucosidase | 5Y 3N
5 | 3 | 26, 51, 209 | IPR000254, IPR001547, IPR001764, IPR002772 |  | Y
6 | 1 | 20 | IPR001524 |  | Y
7 | 4 | 1, 96, 115, 170 | IPR001722, IPR000254 |  | Y
10 | 1 | 21 | IPR001000 |  | Y
11 | 1 | 145 | IPR001137 |  | Y
13 | 2 | 75, 318 | IPR006046, IPR004193 | 3.2.1.1: Alpha amylase; 2.4.1.18: 1,4-alpha-glucan branching enzyme | 1Y 1N
15 | 1 | 6 | IPR000165 | 3.2.1.3: Glucan 1,4-alpha-glucosidase | Y
16 | 2 | 16, 87 | IPR000757 | 3.2.1.73: Lichenase | Y
17 | 1 | 237 | IPR000490 | 3.2.1.58: Glucan 1,3-beta-glucosidase | N
20 | 1 | 11 | IPR001540 | 3.2.1.52: Beta-N-acetylhexosaminidase | Y
24 | 1 | 50 | IPR002196 |  | Y
28 | 1 | 112 | IPR001002, IPR001223, IPR011583 | 3.2.1.14: Chitinase | N
31 | 1 | 4 | IPR000322 | 3.2.1.20: Alpha glucosidase | Y
32 | 1 | 98 | IPR001362 |  | Y
35 | 1 | 323 | IPR001944 | 3.2.1.23: Beta galactosidase | Y
37 | 1 | 25 | IPR001661 | 3.2.1.28: Alpha trehalase | Y
43 | 1 | 184 | IPR006710 |  | Y
45 | 1 | 52 | IPR000254, IPR000334 |  | Y
61 | 1 | 163 | IPR005103 |  | N

Table 3-4. Other plant cell wall degrading proteins from MudPIT analysis.

Protein Name | Number of Proteins | Protein ID | Interpro ID (or description if known) | KEGG EC Number (if known) | Secreted?
Alpha-L-arabinofuranosidase | 2 | 9, 79 | IPR010720, IPR007934 |  | Y
Candidate beta-N-acetylhexosaminidase | 1 | 168 | IPR002022 |  | Y
Candidate | 1 | 292 | IPR001087 |  | Y
Candidate ester hydrolase | 1 | 402 | Blast homology |  | N
Candidate pectin lyase | 1 | 122 | Blast homology |  | Y
Carbohydrate binding protein | 1 | 393 | IPR002889 |  | Y
 |  | 30, 77, 269 | IPR000379, IPR000408, IPR002018 | 3.1.1.1: Carboxylesterase | 2Y 1N
Chitin binding protein | 1 | 287 | IPR001002, IPR002889 |  | Y
Chitin deacetylase | 1 | 99 | IPR009939 |  | Y
 | 2 | 73, 278 | IPR000379, IPR000675, IPR011150 |  | Y
Esterase | 3 | 72, 108, 136 | IPR000379, IPR007312, IPR008262 |  | 2Y 1N
Galactose epimerase | 1 | 22 | IPR008183 | 5.1.3.3: Aldose 1-epimerase | N
Lipolytic enzyme | 2 | 153, 331 | IPR001087 |  | Y
Pectate lyase | 1 | 116 | IPR004898 |  | Y
Polysaccharide deacetylase | 1 | 134 | IPR002509 |  | Y
Tannase and feruloyl esterase | 1 | 159 | IPR011118 |  | N

Table 3-5. Proteins associated with lignin metabolism from MudPIT analysis.

Protein name | Number of Proteins | Protein ID | Interpro ID (or description if known) | KEGG EC Number (if known) | Secreted?
Acid phosphatase | 4 | 18, 68, 135, 267 | IPR000120, IPR000560, IPR002828, IPR003778, IPR003833, IPR004843 | 3.1.3.2: Acid phosphatase | Y
Alkaline phosphatase | 1 | 97 | IPR001952 | 3.1.3.1: Alkaline phosphatase | Y
Candidate 2-hydroxy-6-oxo-6-phenylhexa-2,4-dienoate hydrolase | 1 | 201 | IPR000073, IPR000379, IPR003089, IPR008262 |  | Y
Candidate cellobiose dehydrogenase | 1 | 199 | CBM and DOMO domain containing protein |  | Y
Candidate copper radical oxidase | 1 | 138 | IPR002889 + Blast homology |  | Y
Candidate esterase directed at aromatic compounds | 1 | 199 | IPR000379 |  | N
Candidate FAD oxidoreductase | 1 | 404 | KOG2852: possible oxidoreductase (Best blast hit: FAD oxidoreductase) |  | N
Carboxymuconolactone decarboxylase | 1 | 381 | IPR003779 |  | N
Catalase | 2 | 40, 114 | IPR002226, IPR002818, IPR010582 | 1.11.1.6: Catalase | 1Y 1N
FAD oxidoreductase | 5 | 128, 165, 211, 219, 247 | IPR006094 |  | 4Y 1N
Glyoxylase dioxygenase | 1 | 398 | IPR002110, IPR011588 |  | N
GMC oxidoreductase | 2 | 85, 124 | IPR000172, IPR007867 | 1.1.99.1: Choline dehydrogenase | Y
Laccase | 4 | 33, 133, 154, 261 | IPR001117, IPR002355, IPR006162 | (3) 1.10.3.2: Laccase; (1) 1.10.3.3: L-ascorbate oxidase | 2Y 2N
Nickel superoxide dismutase | 1 | 106 | IPR006162 | 1.15.1.1: Nickel superoxide dismutase | Y
Small secreted protein | 3 | 94, 144, 290 | Blast homology |  | 1Y 2N
Superoxide dismutase | 2 | 47, 248 | IPR001189, IPR001424 | 1.15.1.1: Superoxide dismutase | N
Tyrosinase | 2 | 37, 285 | IPR002227 |  | 1Y 1N

Table 3-6. Proteinases and nitrogen-recycling proteins identified from MudPIT analysis.

Protein name | Number of Proteins | Protein ID | Interpro ID (or description if known) | KEGG EC Number (if known) | Secreted?
Amidase | 1 | 29 | IPR000120 |  | Y
 | 1 | 307 | IPR006680 |  | Y
Aminopeptidase | 4 | 83, 88, 118, 256 | IPR003137, IPR007484 | 3.4.11.-: Aminopeptidase | Y
Aspartic endopeptidase | 2 | 102, 104 | IPR000250, IPR001461, IPR001969 | 3.4.23.-: Aspartic endopeptidase | Y
Bacterial leucyl peptidase | 1 | 63 | IPR007484 | 3.4.11.10: Bacterial leucyl aminopeptidase | Y
Candidapepsin | 1 | 24 | IPR001461 | 3.4.23.24: Candidapepsin | Y
Candidate formylmethionine deformylase | 1 | 322 | Blast homology |  | N
Carboxypeptidase A | 1 | 224 | IPR000834 |  | Y
Carboxypeptidase A2 | 1 | 64 | IPR001412 | 3.4.17.15: Carboxypeptidase A2 | Y
Carboxypeptidase B | 1 | 266 | IPR000834 | 3.4.17.2: Carboxypeptidase B | Y
Carboxypeptidase C | 2 | 152 | IPR000379, IPR001563 | 3.4.16.5: Carboxypeptidase C | Y
Cerevisin | 2 | 43, 131 | IPR000209, IPR003137 | 3.4.21.48: Cerevisin | Y
Cysteine peptidase | 1 | 178 | IPR000169 |  | N
Di- and tri-peptidyl peptidase | 1 | 117 | IPR000379, IPR001375, IPR002088 | 3.4.14.-: Dipeptidyl peptidase and tripeptidyl peptidase | N
Formamidase | 1 | 212 | IPR002469, IPR004304 | 3.5.1.49: Formamidase | N
Fungalysin | 1 | 55 | IPR001842, IPR006025, IPR011096 |  | Y
Glutamate carboxypeptidase II | 1 | 90 | IPR003137, IPR007365, IPR007484 | 3.4.17.21: Glutamate carboxypeptidase II | N
Metallocarboxypeptidase | 1 | 62 | IPR000834 | 3.4.17.-: Metallocarboxypeptidase | Y
Metalloendopeptidase | 2 | 250, 260 | IPR001384, IPR001567, IPR006025 |  | 1Y 1N
Pepsin A | 2 | 14, 380 | IPR001461, IPR001969 | 3.4.23.1: Pepsin A | Y
Peptidase | 1 | 13 | IPR000379, IPR008758 | 3.4.-.-: Acting on peptide bonds (peptide hydrolases) | Y
Peptidase C2 | 1 | 171 | IPR001300 |  | N
Serine carboxypeptidase D | 2 | 10 | IPR001563 | 3.4.16.6: Carboxypeptidase D | Y
Serine endopeptidase | 4 | 17, 49, 277, 396 | IPR000209, IPR010259 | 3.4.21.-: Serine endopeptidase | Y
Tri-peptidyl peptidase | 2 | 82, 221 | IPR000209 | 3.4.14.9: Tripeptidyl-peptidase I | 1Y 1N
Trypsin | 1 | 46 | IPR001314, IPR008256 | 3.4.21.4: Trypsin | Y
Urease | 1 | 164 | IPR000089, IPR000120, IPR003778, IPR003833 |  | N

Table 3-7. Verification of lignocellulolytic activity of A. glabripennis-derived F. solani solid wood culture extracts through in vitro assays.

Enzyme | Activity (U/ml)* | Protein Conc. (mg/ml extract) | Specific Activity (U/mg protein)*
Cellulose/xylan
Beta-glucosidase | 12.77 | 0.06 | 212.8
CMCase | 0.79 | 0.06 | 13.1
Cellulase (from Avicel) | 0.88 | 0.06 | 14.7
Xylanase | 4.23 | 0.06 | 70.5
Lignin
Lignin Peroxidase | 0 | 0.05 | 0
Mn-dependent Peroxidase | 0.021 | 0.05 | 0.42
Mn-independent Peroxidase | 0.47 | 0.05 | 9.4
Laccase | 0.42 | 0.05 | 8.4

* 1 unit of activity = amount of enzyme that releases 1 µmol of reducing sugar per minute for reducing sugar assays and amount of enzyme that oxidizes 1 µmol substrate per min for peroxidase assays.
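The specific activities reported in Table 3-7 follow from dividing each volumetric activity (U/ml) by the protein concentration of the extract (mg/ml). As a minimal sketch of that arithmetic, the Python snippet below recomputes two of the tabulated values. The function name `specific_activity` and the rounding to one decimal place are illustrative assumptions and are not part of the original analysis.

```python
def specific_activity(activity_u_per_ml: float, protein_mg_per_ml: float) -> float:
    """Return specific activity (U/mg protein).

    activity_u_per_ml: volumetric enzyme activity of the extract (U/ml)
    protein_mg_per_ml: protein concentration of the extract (mg/ml)
    """
    if protein_mg_per_ml <= 0:
        raise ValueError("protein concentration must be positive")
    return activity_u_per_ml / protein_mg_per_ml


# Input values copied from Table 3-7.
beta_glucosidase = specific_activity(12.77, 0.06)
laccase = specific_activity(0.42, 0.05)

print(round(beta_glucosidase, 1))  # 212.8
print(round(laccase, 1))           # 8.4
```

Normalizing by protein concentration allows extracts with different protein yields (here 0.06 and 0.05 mg/ml) to be compared on the same scale.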


Figure 3-1. Enrichment of GO Molecular Function terms in proteomic analysis. Bar graph represents the ratio of % composition of term in proteomic data vs. % composition in the genome annotation. Values over 1 (dotted line) are overrepresented in the proteomic data. Red line illustrates the relative abundance of the GO term in the proteomic data.
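The ratio plotted in Figure 3-1 can be sketched as follows, assuming that the percent composition of a GO term is the number of annotated proteins carrying that term divided by the total number of annotated proteins in the same set. The counts below are hypothetical placeholders rather than values from this study, and the function names are illustrative.

```python
def percent_composition(term_count: int, total_count: int) -> float:
    """Percentage of annotated proteins in a set that carry a given GO term."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * term_count / total_count


def enrichment_ratio(term_in_proteome: int, proteome_total: int,
                     term_in_genome: int, genome_total: int) -> float:
    """Ratio of % composition in the proteomic data to % composition in the genome annotation."""
    genome_pct = percent_composition(term_in_genome, genome_total)
    if genome_pct == 0:
        raise ValueError("term is absent from the genome annotation")
    return percent_composition(term_in_proteome, proteome_total) / genome_pct


# Hypothetical counts: 10 of 100 detected proteins vs. 50 of 1000 annotated genes.
ratio = enrichment_ratio(10, 100, 50, 1000)
print(ratio)  # 2.0, i.e. above the dotted line at 1 and therefore overrepresented
```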


Figure 3-2. Heme staining and zymogram analysis of A. glabripennis-derived F. solani solid wood culture extract. Twenty µg of fungal extract were loaded into each lane. Protein standard is on the left, with band sizes labeled (kDa). Lane A is a colloidal blue stained lane, and lane B is the corresponding heme stain, CMC zymogram, or xylan zymogram lane. For the heme stain, a single band is present at approximately 70 kDa. For the CMC zymogram, six major zones of clearing are present at approximately 55, 32, 30, 27, 23 and 20 kDa and can be matched to protein bands on the colloidal blue stained lane. For the birch wood xylan zymogram, many zones of clearing are present on the gel, with major spots between 30 and 50 kDa. These correspond to a broad range of bands on the colloidal blue stained lane.


Chapter 4

Midgut Transcriptome Profiling of Anoplophora glabripennis, a Lignocellulose-Degrading, Wood-Boring Cerambycid

Abstract

Wood-feeding insects often work in collaboration with microbial symbionts to degrade lignin biopolymers and release glucose and other fermentable sugars from recalcitrant plant cell wall carbohydrates, including cellulose and hemicellulose. Here, we present the midgut transcriptome of larval Anoplophora glabripennis, a wood-boring beetle with documented lignin-, cellulose-, and hemicellulose- degrading capabilities, which provides valuable insights into how this insect overcomes challenges associated with feeding in woody tissue. Transcripts from putative protein coding regions of over 9,000 insect-derived genes were identified in the A. glabripennis midgut transcriptome using a combination of 454 shotgun and Illumina paired-end reads. The most highly-expressed genes predicted to encode digestive-related enzymes were trypsins, carboxylesterases, β-glucosidases, and cytochrome P450s. Furthermore, 180 unigenes predicted to encode glycoside hydrolases (GHs) were identified and included several GH 5, 45, and 48 cellulases, GH 1 xylanases, and GH 1 β-glucosidases. In addition, transcripts predicted to encode enzymes involved in detoxification were detected, including a substantial number of unigenes classified as cytochrome P450s (CYP6B) and carboxylesterases, which are hypothesized to play pivotal roles in detoxifying host tree defensive chemicals and could make important contributions to A. glabripennis’ expansive host range. While a large diversity of insect-derived transcripts predicted to encode digestive and detoxification enzymes were

detected, few transcripts predicted to encode enzymes required for lignin degradation or synthesis of essential nutrients were identified, suggesting that collaboration with microbial enzymes may be required for survival in woody tissue. A. glabripennis produces a number of enzymes with putative roles in cell wall digestion, detoxification, and nutrient extraction, which likely contribute to its ability to thrive in a broad range of host trees. This system is quite different from the previously characterized termite fermentation system and provides new opportunities to discover enzymes that could be exploited for cellulosic ethanol biofuel production or the development of novel methods to control wood-boring pests.

Introduction

Class Hexapoda represents one of the most ancient and diverse evolutionary groups on the planet [1], containing organisms capable of occupying many recalcitrant niches and persisting under intense environmental conditions including extreme temperatures, periods of desiccation, and exposure to toxins [2]. Many of its members are capable of thriving on suboptimal, nutritionally-deficient substrates [3], including wood-boring beetles belonging to family Cerambycidae that feed exclusively on woody tissue. Specifically, beetles in the genus Anoplophora are distinct from other wood-boring insects because many of its members preferentially target healthy host trees and have relatively broad host ranges [4]. For example, the Asian longhorned beetle (Anoplophora glabripennis) was introduced from China into the United States, Canada, and several countries in Europe and has been documented to complete development in approximately 47 deciduous tree species worldwide, including several genera commonly planted as feedstock (e.g., Salix and Populus) [5]. Acer spp. (maples) are the predominant hosts in the introduced range [6, 7]. This beetle poses a significant threat to urban streetscapes, has the potential to destroy up to 35% of the urban tree canopy in its introduced range, and has already caused millions of dollars in damage to urban landscapes. Wood-borers, like A. glabripennis, are especially challenging to control in both their natural and invasive ranges because the larvae spend 1-2 years living deep inside their host trees [8]. Natural enemies are rare and treatment of host trees with systemic insecticides is costly and has variable efficacy against A. glabripennis larvae [9, 10]. The most effective method for eradication is destruction of infested and nearby host trees and implementation of strict quarantine measures to contain the infestation. Therefore, understanding the digestive physiology of this cerambycid at the genetic level is paramount to devising novel control strategies.

Because A. glabripennis spends the majority of its lifecycle in the larval stage and feeds primarily in the heartwood of a broad range of healthy deciduous trees, it must overcome challenges of digesting intractable woody tissue in order to acquire sufficient nutrients to complete development [7, 11]. Glucose is a predominant wood sugar, but it is present in the form of complex polysaccharides, including cellulose, hemicellulose, callose, and pectin, which are inherently difficult to digest and require a complex of hydrolytic enzymes for efficient degradation and liberation of sugar monomers [12]. Extensive hydrogen bonding coupled with linear configurations increases the crystallinity of these cell wall polysaccharides and decreases their permeability, further hindering the activity of hydrolytic enzymes. Plant cell wall polysaccharides are further protected from hydrolytic enzymes by lignin, a biopolymer containing over 12 types of chemical bonds that is extensively cross-linked to both cellulose and hemicellulose [13], shielding them from digestion. Due to the random, heterogeneous nature of these cross-linkages and the high resilience of carbon-carbon and β-aryl ether linkages that dominate this macromolecule, lignin polymers can only be efficiently degraded through oxidative depolymerization, a process that has only been conclusively documented to be catalyzed by enzymes produced by a small number of wood degrading fungi [14]. Nitrogen is also extremely

limited in woody tissues [15] and plant cell wall proteins are intricately cross-linked with lignin and cellulose, making them difficult to access [16]. Other essential nutrients, including fatty acids, sterols, and vitamins, are present in low concentrations or are completely absent [17]. Lastly, wood-feeding insects must overcome, through detoxification or sequestration processes, plant secondary metabolites that often accumulate to high concentrations in the heartwood [18].

Many wood-feeding beetles cultivate extracellular symbiotic fungi, which are carried in mycangia or other specialized structures on their bodies, to facilitate digestion of woody tissue and nutrient acquisition [19]. For example, bark beetles utilize a mass attack strategy, in which a mycangial fungus is directly inoculated into a host tree during oviposition to facilitate pre-digestion of woody tissue and mitigation of host tree defenses. An alternative strategy is to preferentially colonize stressed trees [20] whose woody components have already been pre-digested by wood-rotting microbes. However, A. glabripennis is distinct from many other wood-feeding beetles in the sense that a single larva can successfully develop in a healthy tree without requiring mass attack and the majority of the challenging reactions, including digestion of lignocellulose and hemicellulose and detoxification of plant metabolites, can occur within the gut itself [21-23]. While the midgut community associated with A. glabripennis has the metabolic potential to overcome many of the challenges associated with feeding in woody tissue, including degradation of lignin, cellulose, and hemicellulose and acquisition of nitrogen and other essential nutrients (Scully et al., in press), the contributions of insect-derived digestive and nutrient acquiring enzymes cannot be ignored since insects themselves can produce a diverse array of digestive enzymes, including cellulases, hemicellulases, pectinases, and enzymes that enhance lignin degradation [24-26]. Insects have also evolved other sophisticated abilities to evade host plant defenses and often possess extensive suites of enzymes involved in detoxification of plant metabolites and phytohormones, digestive proteinase inhibitors [27], and cyanates and cyanoamino acids [28], as well as enzymes capable of disrupting jasmonic acid signaling pathways [29]. Furthermore, insects produce many cytochrome P450s [30], which are integrally involved in xenobiotic metabolic processes that ultimately lead to oxidative destruction of toxic compounds, including plant derived secondary metabolites and pesticides [31].

The primary goals of this study were to survey the endogenous digestive and physiological capabilities of larval A. glabripennis through shotgun sequencing of midgut derived messenger RNA and to identify insect-derived genes that are highly expressed in the midgut while larvae are actively feeding in wood. The A. glabripennis midgut transcriptome library was also compared to all publicly available transcriptome libraries sampled from other plant feeding insects to identify core groups of genes that are associated with digestive processes that could facilitate nutrient recovery from woody tissue regardless of insect taxa. This study represents an important addendum to the growing database of genomic and transcriptomic resources available for coleopterans and fills an important gap, representing the first transcriptome sampled from a wood-feeding cerambycid and the first comprehensive analysis of endogenous genes associated with wood-feeding in insects. These findings offer unique opportunities to bioprospect for enzymes that could be exploited for cellulosic biofuel production or other industrial processes, and to develop novel control methods for this destructive wood-boring pest and other wood-feeding insects.

Materials and Methods

454 Transcriptome Analysis of A. glabripennis Larvae Feeding on a Suitable Host

Five pairs of adult A. glabripennis were allowed to mate and oviposit eggs in potted sugar maple (Acer saccharum) trees in a USDA-approved insect quarantine facility at The Pennsylvania State University (University Park, PA). In brief, sugar maple trees were planted in 25-gal nursery containers filled with Fafard 52 pine bark medium (Fafard, Agawam, MA) and were grown at an outdoor nursery until they were 3-4 years old. Several weeks before use in experiments, trees were moved into the quarantine greenhouse to allow for acclimation to greenhouse conditions. Three trees were placed in a walk-in insect cage (~3 m high, 3 m long, and 2 m wide) and five mating pairs of A. glabripennis adults were placed in the cage and allowed to mate and lay eggs. After a period of three months, third instar larvae actively feeding in the heartwood of these trees were dissected and midguts were removed and flash frozen in liquid nitrogen. Five midguts were pooled and total RNA was extracted using the RNeasy RNA extraction kit (Qiagen, Gaithersburg, MD) followed by enrichment for mRNA using the PolyA Purist kit (Ambion, Austin, TX). The quality and quantity of the enriched mRNA were assessed using the RNA Nano Assay (Agilent, Santa Clara, CA) and the NanoDrop 1000 spectrophotometer (Thermo-Scientific, Waltham, MA). Approximately 10 µg of enriched RNA were used for double-stranded cDNA library construction using the Stratagene Just cDNA Synthesis kit (Agilent, Santa Rosa, CA). The sequencing library was prepared using 454 GS FLX library adapters (Roche, Branford, CT) and approximately 232,824 shotgun reads (49.1 Mb) were sequenced using 454 FLX chemistry (Roche, Branford, CT). Reads are publicly available in NCBI’s Sequence Read Archive (SRA) under accession number [SRX265389] and are associated with Bioproject [PRJNA196436]. Raw reads were trimmed to remove residual sequencing adapters and low quality ends; trimmed reads were quality filtered and assembled using Newbler (Roche, Branford, CT) to produce approximately 2,081 contigs and 1,678 isotigs (i.e., transcripts), while 27,000 singleton reads were not incorporated into the assembly. Short singleton reads were discarded and, to increase the amount of information present in the transcriptome dataset, high quality singleton reads (average quality value >30) exceeding 150 nt in length were concatenated to the assembly and the pooled dataset was utilized in downstream transcriptome comparisons. To reduce noise from sequencing errors or real nucleotide polymorphisms caused by allelic differences from pooling multiple individuals for sequencing, high quality isotigs and singletons were clustered using CD-HIT-EST with a sequence similarity threshold of 0.97 prior to functional annotation to generate a set of unique isotigs and reads, which were analogous to unigenes. These unigenes were screened for noncoding RNAs using tRNAscan (tRNAs) [32] and HMMER [33] (rRNAs) using HMM profiles for archaeal, bacterial, and eukaryotic small subunit (SSU), large subunit (LSU), and 5.8/8s ribosomal RNAs [34].
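The singleton filter described above (mean quality value >30 and length exceeding 150 nt) can be illustrated with a minimal Python sketch. It assumes that each read is available as a sequence string with a list of per-base Phred scores; the function name and the toy reads are hypothetical and are not taken from the original pipeline.

```python
from statistics import mean

def keep_singleton(seq, quals, min_len=150, min_mean_q=30):
    """Return True if a singleton exceeds both the length and mean-quality cutoffs."""
    return len(seq) > min_len and mean(quals) > min_mean_q

# Hypothetical toy reads: name -> (sequence, per-base Phred scores)
reads = {
    "read_a": ("A" * 200, [35] * 200),  # long and high quality: kept
    "read_b": ("C" * 120, [38] * 120),  # too short: discarded
    "read_c": ("G" * 180, [22] * 180),  # low mean quality: discarded
}

kept = sorted(name for name, (seq, q) in reads.items() if keep_singleton(seq, q))
print(kept)
```

Both cutoffs are applied as strict inequalities, following the wording of the text; a read of exactly 150 nt would be discarded under this reading.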

The remaining isotigs and reads were annotated by comparisons to the non-redundant protein database using the BLASTX algorithm (BLAST version 2.2.26) [35] with an e-value threshold of 1e-5. Microbial- and plant-derived isotigs and singletons were identified using MEGAN (MEtaGenome ANalyzer) [36] based on the lowest common ancestor of the five highest-scoring BLASTX alignments and were removed from the dataset since this study focused solely on the beetle’s contribution to wood digestion. Unigenes were assigned to Gene Ontology terms using Blast2GO [37], while unigenes involved in carbohydrate metabolism were detected and classified into glycoside hydrolase (GH) families using HmmSearch [38] to scan for Pfam A derived HMMs [39]. Gene ontology assignments and GH and Pfam annotations were used in downstream comparisons to gut derived transcriptome libraries from other herbivorous insects.
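The common-ancestor assignment used to separate insect-derived sequences from microbial and plant sequences can be sketched as follows. This is a simplified illustration and not MEGAN's implementation: it assumes each BLASTX hit carries an e-value, a bit score, and a root-to-leaf lineage, and the hit records, lineages, and function names are hypothetical.

```python
def top_hits(hits, max_evalue=1e-5, n=5):
    """Keep hits passing the e-value cutoff, then the n best by bit score."""
    passing = [h for h in hits if h["evalue"] <= max_evalue]
    return sorted(passing, key=lambda h: h["bitscore"], reverse=True)[:n]

def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by every root-to-leaf lineage."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared[-1] if shared else "root"

# Hypothetical hits for one unigene (simplified lineages)
hits = [
    {"evalue": 1e-40, "bitscore": 310.0,
     "lineage": ["Eukaryota", "Arthropoda", "Hexapoda", "Coleoptera"]},
    {"evalue": 1e-35, "bitscore": 290.0,
     "lineage": ["Eukaryota", "Arthropoda", "Hexapoda", "Diptera"]},
    {"evalue": 1e-2, "bitscore": 40.0,
     "lineage": ["Bacteria", "Proteobacteria"]},  # fails the 1e-5 cutoff
]

best = top_hits(hits)
taxon = lowest_common_ancestor([h["lineage"] for h in best])
print(taxon)  # Hexapoda
```

Under this scheme, a unigene whose best hits span insects and bacteria would be assigned to a shallow node (or to the root) and would therefore not be retained as insect-derived.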

Comparison to Other Insect Gut Transcriptome Libraries to Identify Groups of ESTs Associated with Feeding in Wood

EST and transcriptome libraries from other plant- and wood-feeding insects were analyzed for similarities and differences to the A. glabripennis midgut transcriptome library in an attempt to identify groups of insect-derived transcripts encoding digestive enzymes that were associated with feeding in wood. Publicly available insect gut transcriptomes from insects feeding on plant materials, including wood, phloem, leaves, stored plant materials (starches), and pollen housed in NCBI’s SRA (454 pyrosequencing) or EST database (Sanger sequencing) were downloaded (Table 4-1). Midgut 454 libraries currently available in the Sequence Read Archive (SRA) include honey bee (Apis mellifera), emerald ash borer (Agrilus planipennis) [40], green dock beetle (Gastrophysa viridula) [41], poplar leaf beetle (Chrysomela tremulae) [26], rice weevil (Sitophilus oryzae), Colorado potato beetle (Leptinotarsa decemlineata), and tobacco hornworm (Manduca sexta) [42]. Sanger-derived EST libraries available in the EST database include corn planthopper (Peregrinus maidis) [43], European corn borer (Ostrinia nubilalis) [44], mountain pine beetle (Dendroctonus ponderosae) [45], and termites (Coptotermes formosanus and Reticulitermes flavipes) [46, 47].

The libraries were assembled and annotated using the same annotation procedure described for the A. glabripennis 454-based assembly with a particular emphasis on Pfam domains [39], Gene Ontology terms, and carbohydrate-active enzyme (CAZyme) family classifications [48], which were utilized in comparisons to the A. glabripennis 454-based assembly. Due to differences in sequencing depths and normalization and library preparation procedures, assembly metrics varied among libraries (Table 4-2). As this may introduce sampling biases in downstream comparisons, contigs and high quality reads were normalized in silico using CD-HIT-EST [49] to remove redundant reads and contigs and generate a set of unigenes.

Multivariate Transcriptome Library Comparisons

GH family assignments detected in the A. glabripennis 454-based transcriptome assembly were compared to GH family assignments from transcriptomes and EST libraries sampled from the guts of herbivorous insects feeding on a diversity of plants that varied in carbohydrate composition. This was done to identify potential correlations between carbohydrases associated with insects that feed in similar niches. Data were normalized by the total number of GH domains detected in each library and a compositional dissimilarity matrix was constructed based on Euclidean distance. The standardized data were further analyzed using unconstrained Principal Components Analysis to plot samples in multidimensional space using the R statistical package with the ‘vegan’ library. PCA ordination was selected because the data were determined to be linear by detrended correspondence analysis (DCA) (Beta diversity <4).
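The normalization and distance steps that precede the ordination can be illustrated with a minimal Python sketch. It assumes that GH-family counts are stored per library as dictionaries; the library names and counts are hypothetical, and the PCA itself, which was run in R with ‘vegan’, is not reproduced here.

```python
import math

def to_proportions(counts):
    """Scale raw GH-family counts so that each library sums to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no GH domains")
    return {family: n / total for family, n in counts.items()}

def euclidean(p, q):
    """Euclidean distance between two profiles; absent families count as 0."""
    families = set(p) | set(q)
    return math.sqrt(sum((p.get(f, 0.0) - q.get(f, 0.0)) ** 2 for f in families))

# Hypothetical GH-family counts for three gut libraries
gh_counts = {
    "lib_wood": {"GH1": 30, "GH5": 6, "GH45": 2, "GH48": 2},
    "lib_leaf": {"GH1": 20, "GH28": 20},
    "lib_pollen": {"GH1": 10, "GH13": 30},
}

profiles = {name: to_proportions(c) for name, c in gh_counts.items()}
names = sorted(profiles)
distances = {(a, b): euclidean(profiles[a], profiles[b]) for a in names for b in names}

for a in names:
    print(a, [round(distances[(a, b)], 3) for b in names])
```

Dividing by the total number of GH domains in each library keeps a deeply sequenced library from appearing distant from a shallow one merely because its raw counts are larger.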

To identify functional similarities between insects with similar feeding habitats, a multivariate comparison of level four Gene Ontology (GO) terms identified in the gut transcriptomes of herbivorous insects was performed. To reduce sampling bias due to differences in library sizes and assembly metrics, a custom Python script was used to subsample (with replacement) level four GO assignments from 675 reads and isotigs from each library. Data were log transformed and centered, and a compositional dissimilarity matrix of transcriptome libraries was constructed based on Spearman correlation coefficients. Two-way clusters were generated with Ward’s method using the R statistical package and the ‘vegan’, ‘cluster’, ‘gplots’, and ‘Biobase’ libraries.
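The subsampling and correlation steps can be sketched in Python as shown below. This is not the custom script used in the study: the term labels, library contents, random seed, and function names are hypothetical, the log transformation and centering are omitted, and taking one minus the Spearman coefficient as the dissimilarity is an assumption made for illustration.

```python
import math
import random
from collections import Counter

def subsample_terms(assignments, n=675, seed=0):
    """Draw n GO assignments with replacement and tally them per term."""
    rng = random.Random(seed)
    return Counter(rng.choices(assignments, k=n))

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    if var == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / math.sqrt(var)

# Hypothetical GO assignments for two libraries of unequal size
lib_a = ["term_1"] * 500 + ["term_2"] * 300 + ["term_3"] * 100 + ["term_4"] * 50
lib_b = ["term_1"] * 60 + ["term_2"] * 900 + ["term_3"] * 400 + ["term_4"] * 2000

counts_a = subsample_terms(lib_a, seed=1)
counts_b = subsample_terms(lib_b, seed=2)
terms = sorted(set(counts_a) | set(counts_b))
rho = spearman([counts_a[t] for t in terms], [counts_b[t] for t in terms])
print("dissimilarity:", 1 - rho)
```

Drawing the same number of assignments from every library puts libraries of very different sizes on a common footing before their GO profiles are compared.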

Phylogenetic Analysis

Multiple amino acid sequence alignments were generated using ClustalW [50] and alignments were manually trimmed and edited using MEGA 5 [51]. ProtTest [52] was used to predict optimal evolutionary models for maximum likelihood analysis using the Akaike Information Criterion (AIC) [53]. Unrooted phylogenetic trees were constructed using Garli (version 2.0) [54]; evolution was simulated for 500,000 generations or until likelihood scores reached convergence and non-parametric bootstrap analysis was conducted to generate support for branching topology (n=500 bootstrap pseudoreplicates). Fully resolved bootstrap consensus trees were compiled using SumTrees version 3.3.1 [55] and branch lengths less than 1e-8 were collapsed.
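Model selection by AIC can be illustrated with a short Python sketch. The criterion is computed as 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood, and the model with the lowest score is preferred. The log-likelihoods and parameter counts below are hypothetical values chosen for illustration and are not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits: model name -> (maximized log-likelihood, free parameters)
candidates = {
    "JTT": (-10250.4, 37),
    "WAG": (-10198.7, 37),
    "LG+G": (-10102.3, 38),
}

scores = {model: aic(lnl, k) for model, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # LG+G
```

In this toy comparison the extra parameter in the third model is outweighed by its higher likelihood, so it receives the lowest AIC.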

Identification of Highly Expressed Genes in the A. glabripennis Midgut

To generate more full length transcripts, enhance transcript discovery, and identify highly expressed genes in the A. glabripennis midgut, short paired-end reads were incorporated into the assembly. Third instar midguts were dissected and total RNA extracted as described above. Insect-derived ribosomal RNA was depleted from the sample using MicrobEnrich (Ambion, Austin, TX), replacing the MicrobEnrich capture oligo mix with custom oligos that were complementary to insect 18S and 28S rRNAs (oligo sequences obtained from Ambion, Austin, TX; oligos purchased from Integrated DNA Technologies, Coralville, IA) (Supplemental Table and Supplemental Methods), while MicrobExpress (Ambion, Austin, TX) was used to deplete the sample of bacterial derived 16S and 23S rRNAs. The quality and quantity of the enriched mRNA were assessed using the RNA Nano Assay (Agilent, Santa Clara, CA) and the NanoDrop 1000 spectrophotometer (Thermo-Scientific, Waltham, MA). The library was prepared using the TruSeq RNA Library Prep Kit (Illumina, San Diego, CA), omitting the polyA enrichment step, and the library was enriched for 175 nt fragments so that paired end reads overlapped by 30 nt. In total, 130 million 2 x 101 paired end reads (36 Gb) were generated using the Illumina HiSeq 2000 platform. To improve overall transcriptome assembly metrics and ultimately improve the ability to detect and annotate expressed genes, 454 and Illumina reads were co-assembled with Trinity. In brief, 10 million 2 x 101 Illumina paired end reads (175 nt fragments) were simulated from 454 isotigs and singletons generated by Newbler using wgsim [56]. To reduce the coverage of highly expressed genes and improve the ability to assemble unigenes and transcript isoforms originating from lowly expressed genes, k-mers (k=25) from Illumina and simulated PE reads were normalized to 30X coverage using digital normalization. Normalized reads were assembled with Trinity (version r2012-10-05) [57] and TransDecoder was used to predict putative protein coding regions using Markov models trained using the top 500 longest ORFs detected in the A. glabripennis transcriptome dataset. Coding regions were annotated through comparisons to the non-redundant protein database using BLASTP with an e-value threshold of 1e-5. Unigenes with BLASTP alignments were classified into Gene Ontology (GO) and KEGG terms using Blast2GO [37] and HmmSearch [38] was utilized to search for Pfam A derived HMMs [39], which were used for functional annotations and GH family assignments. Unigenes were also assigned to KOG categories (clusters of orthologous genes for eukaryotes) using RPS-BLAST [58]. Illumina reads were mapped to the hybrid assembly using Bowtie [59], expression levels were calculated using RSEM [60], and FPKM (fragments per kilobase of exon per million mapped reads) values were used to normalize read counts [61]. Unigenes and transcript isoforms with fewer than five mapped reads were flagged as spurious and were removed from the final assembly. Since co-assembly should improve the ability to assemble full-length transcripts, SignalP was used to detect unigenes and transcript isoforms with discernible signal peptides [62] that could encode digestive proteins secreted into the midgut lumen. Raw Illumina reads are available in the NCBI SRA database under the accession number [SRX265394] and associated with Bioproject PRJNA196436. Assembled insect-derived transcripts containing predicted coding regions generated from co-assembly of 454 and Illumina paired end reads are publicly available in NCBI’s Transcriptome Shotgun Assembly database under the accession number [GALX00000000].
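The expression normalization and the low-support filter described above can be sketched in Python. The sketch uses the textbook FPKM definition with full transcript length; RSEM's own estimates are model-based, so its values will differ from this simple calculation. The transcript names, lengths, fragment counts, and library size are hypothetical.

```python
def fpkm(fragments, length_nt, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_nt * total_mapped_fragments)

# Hypothetical mapping summary: transcript -> length (nt) and mapped fragments
transcripts = {
    "tx_a": {"length": 2000, "fragments": 4000},
    "tx_b": {"length": 500, "fragments": 3},   # fewer than five: flagged as spurious
    "tx_c": {"length": 1000, "fragments": 500},
}
TOTAL_MAPPED = 1_000_000  # assumed library size for the toy example

retained = {
    name: fpkm(t["fragments"], t["length"], TOTAL_MAPPED)
    for name, t in transcripts.items()
    if t["fragments"] >= 5
}
print(retained)
```

Dividing by transcript length as well as library size allows a long transcript and a short transcript with the same number of mapped fragments to be distinguished in terms of relative abundance.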

Results and Discussion

454- and Illumina-Based Transcriptome Sequencing

To develop a comprehensive profile of the endogenous digestive and physiological capabilities of A. glabripennis, mRNA was collected from the midguts of third instar larvae feeding in the heartwood of a preferred host (Acer saccharum) and was sequenced using both Roche 454 pyrosequencing and Illumina technologies. In total, 232,824 shotgun sequence reads were produced on the Roche 454 FLX platform in two separate runs. A total of 173,778 reads (35.7 Mb), ranging in length from 26 to 557 nt (average read length: 205 nt), were generated on a half plate and 59,046 reads (13.5 Mb), ranging from 39 to 407 nt (average read length: 228 nt), were generated on a quarter plate. These runs correspond to E4GEBH102.sff and E5TY7PB02.sff from SRA [SRX265389], respectively. Reads from both runs were pooled and were quality filtered and assembled together. Approximately 210,000 (42 Mb) of the total 454 FLX reads passed quality filtering and were utilized in the assembly. To enhance sequencing depth and acquire a more complete inventory of the endogenous digestive and metabolic capabilities of A. glabripennis, 130 million paired end Illumina reads (36 Gb) with a library insert size of 175 nucleotides (nt) were generated on a single lane using the Illumina HiSeq 2000 (SRX265394). After quality filtering and adapter removal, over 128 million paired end reads (34 Gb) remained and were utilized in downstream processing and analyses. Digital k-mer normalization reduced the number of Illumina paired end reads to 2,090,296, which were ultimately used for co-assembly with the 454 FLX reads.
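The large reduction in read number follows from how digital normalization works: a read is retained only if the median abundance of its k-mers, counted over the reads retained so far, is below the coverage cutoff. The Python sketch below illustrates the idea under simplifying assumptions. It treats reads as single-end strings and uses an exact in-memory counter, so it is not the software used in this study, and the toy reads are hypothetical.

```python
from collections import Counter
from statistics import median

def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def digital_normalize(reads, k=25, cutoff=30):
    """Keep a read only while its median k-mer abundance is below the cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        kms = kmers(read, k)
        if not kms:
            continue  # read shorter than k
        if median(counts[km] for km in kms) < cutoff:
            kept.append(read)
            counts.update(kms)
    return kept

# Hypothetical reads: 100 copies from a highly expressed gene and 1 rare read
abundant = "A" * 10 + "C" * 10 + "G" * 10 + "T" * 10
rare = "T" * 10 + "G" * 10 + "C" * 10 + "A" * 10
kept = digital_normalize([abundant] * 100 + [rare])
print(len(kept))  # 31: thirty abundant copies plus the rare read
```

Reads from highly expressed genes are discarded once their k-mers reach the target coverage, whereas reads from lowly expressed genes are all retained, which is the behavior the normalization step relies on.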

Assembly and Annotation Statistics

454 Assembly and Annotation Statistics for Comparative Transcriptomics

To facilitate comparisons to transcriptome libraries prepared from the guts of other herbivorous insects, which were derived solely from 454 reads, the 454 reads were first assembled and analyzed without the Illumina reads. Of the 232,824 shotgun reads generated through 454 pyrosequencing (49.2 Mb), approximately 191,000 reads assembled into 2,081 contigs (1.26 Mb), ranging in length from 200 nt to 5,701 nt with an N50 contig length of 907 nt (Figure 4-1). Assembled contigs that shared common reads were placed into isogroups. These contigs are often broken at branch points between exon boundaries in multiple transcript isoforms from the same unigene. Contig branch structures within each isogroup were then traversed to create 1,658 isotigs (1.4 Mb), which represent unique assembled transcripts or transcript fragments. The N50 isotig length was 1,076 nt and isotigs were grouped into 1,475 isogroups, each representing a gene locus or unigene. Of these isogroups, 1,360 comprised a single transcript isoform and the average number of isotigs within an isogroup was 1.1. The maximum number of isotigs classified to the same isogroup was 11. For downstream comparative analyses, isogroups were treated as unigenes and isotigs associated with the same isogroup were treated as transcript isoforms. Roughly 27,000 reads (4.0 Mb) were singletons and were not included in the assembly. Of the singletons, approximately 19,000 reads (3.7 Mb) were flagged as high quality and, to increase the amount of information present in the transcriptome dataset, these singleton reads were concatenated to the assembly and the pooled dataset was utilized in downstream transcriptome comparisons. Assembly metrics from the 454-based assembly are presented in Table 4-3. After clustering the isotigs and high quality singletons with CD-HIT-EST using a sequence similarity threshold of 0.97 to group transcripts that likely represented allelic variants of the same gene, the total number of isotigs and singletons was reduced to around 18,000. Seventy-eight of these isotigs and reads were classified as ribosomal RNAs, while none were classified as tRNAs. Roughly 10,000 isotigs and singletons had BLASTX alignments to protein sequences housed in the non-redundant protein database at an e-value threshold of 1e-5 or lower. Of the isotigs and singletons that had BLASTX alignments, 9,130 were classified to class Hexapoda (91%). Annotation statistics for this assembly are summarized in Table 4-4.
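The N50 values reported for contigs and isotigs are length-weighted medians: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together contain at least half of the assembled bases. A minimal Python sketch of the calculation follows; the function name and the toy contig lengths are hypothetical.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no sequence lengths given")

# Hypothetical contig lengths (nt); total = 3,000 nt
toy_contigs = [200, 300, 500, 900, 1100]
print(n50(toy_contigs))  # 900
```

In the toy example the two longest contigs (1,100 and 900 nt) together hold 2,000 of 3,000 nt, so the N50 is 900 nt even though most contigs are shorter.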

Hybrid Illumina/454 Transcriptome Assembly

Co-assembly with Illumina paired-end sequences using Trinity substantially improved the assembly metrics, resulting in the assembly of many more full length transcripts. For this reason, discussion of the digestive and metabolic capabilities of A. glabripennis is focused mainly on genes and transcripts detected in the co-assembly and the 454-only assembly is used strictly for comparisons to other herbivorous insect gut transcriptomes. The final 454/Illumina co-assembly contained 42,085 transcripts (31 Mb) ranging in length from 200 to 32,701 nt with an N50 transcript length of 945 nt (Figure 4-2). Approximately 14,600 transcripts had predicted protein coding regions and, of these, over 10,000 transcripts contained full length open reading frames (ORFs) with discernible start and stop codons. These transcripts were classified to 35,948 unigenes, bringing the average number of transcript isoforms per locus to 1.2. The highest number of isoforms detected for an individual gene/locus was 26 and transcripts assigned to this unigene were predicted to encode tropomyosin. Full assembly and annotation metrics for the 454-Illumina hybrid assembly are presented in Table 4-5. Of the unigenes predicted to contain full length or partial ORFs, 13,892 (99%) had BLASTP alignments at an e-value threshold of 1e-5 or lower, while 341 unigenes were predicted to encode rRNAs and 70 transcripts were predicted to encode tRNAs. Approximately 9,900 (72%) of the unigenes that had BLASTP alignments were classified to class Hexapoda. Annotation metrics are presented in Table 4-6. To assess the potential completeness and quality of the larval midgut transcriptome assembly, several KEGG metabolic pathways known to be conserved, functional, and complete in insects were examined to determine if all genes associated with these pathways were represented in the assembly. Full pathways for glycolysis and gluconeogenesis, pyruvate metabolism, pyrimidine metabolism, purine metabolism, the citric acid cycle, and phosphatidylinositol signaling systems were successfully constructed from protein coding transcripts in the assembly.

Overall, the most abundant Pfam assignments detected in transcripts generated from the Illumina/454 co-assembly were primarily structural domains, including WD 40, ankyrin, spectrin, and I-set, and domains associated with regulatory proteins, including reverse transcriptase, protein kinases, and zinc finger domain proteins. The most dominant unigenes predicted to encode enzymes that were detected in this assembly were annotated as trypsins, DDE superfamily, carboxylesterases, cytochrome P450s, and glycoside hydrolase family one (Figure 4-3). The majority of the unigenes detected in the midgut were assigned to the general functional prediction KOG category, indicating that many of the unigenes detected in the midgut have not been definitively assigned to metabolic pathways and suggesting that they may be involved in novel or uncharacterized processes. Other highly abundant KOG categories included signal transduction and carbohydrate transport and metabolism (Figure 4-4). KOG assignments of unigenes with putative signal peptides that could be involved in digestive processes were also conducted (Supplemental Results and Figure 4-S1).

Glycoside Hydrolases and Plant Cell Wall Digesting Enzymes

Transcripts Predicted to Encode Hemicellulases

Over 180 different unigenes assigned to 18 GH families were identified, many of which have annotations consistent with involvement in plant cell wall degradation in the A. glabripennis midgut (Figure 4-5). Of particular interest are enzymes capable of degrading cellulose and hemicellulose, which are the two most predominant polysaccharides found in hardwoods. Few insect enzymes involved in large-scale degradation of xylan (the dominant form of hemicellulose found in most deciduous trees) [63] have been expressed and biochemically characterized in vitro. Through in-gel zymograms infused with birch xylan and MALDI-TOF-based peptide sequencing, it was previously demonstrated that A. glabripennis was capable of producing at least one enzyme with hydrolytic activity directed at birch xylan, suggesting that the beetle has the endogenous capacity to degrade this hardwood polysaccharide [23]. Eight transcript isoforms of this GH 1 xylanase were detected in the transcriptome assembly, indicating that xylan degrading transcripts in A. glabripennis may be more numerous than previously reported. The identification of these transcripts is significant and redefines the role of insects in processing xylan as it has generally been presumed that xylanases are only produced by microbial symbionts [64]. It is possible that other GH transcripts detected in the A. glabripennis midgut may also encode xylanases or β-xylosidases. For example, GH family 30 is predominately comprised of β-xylosidases [48] and over 10 unigenes with GH 30 functional domains were detected in the A. glabripennis midgut transcriptome. However, the ability to predict polysaccharide substrates and catalytic potentials of these enzymes was impeded by the lack of specific annotations in the databases because very few of the highest scoring BLASTP alignments have corresponding KEGG E.C. annotations. More refined annotations would require

Despite A. glabripennis’ endogenous ability to degrade long-chain xylan into shorter oligosaccharides, no insect-derived transcripts capable of releasing xylose monomers (β-xylosidases) from xylo-oligomers or converting xylose to ATP or acetyl-CoA (volatile fatty acids) were detected. Endogenous xylose utilization capabilities have not been described in cerambycids [65] and it is generally hypothesized that these beetles depend on yeasts or other microbes in the gut to supply these enzymes [66]. Previous metagenomic profiling of the A. glabripennis midgut microbiota revealed that yeasts and lactic acid bacteria associated with the gut have the metabolic potential to ferment five-carbon sugars, converting them to ethanol and other compounds that could be used directly by A. glabripennis for energy and fatty acid production (Scully et al., in press). Furthermore, the presence of a large number of A. glabripennis-derived transcripts predicted to encode alcohol and aldehyde dehydrogenases could suggest a role in processing ethanol, acetate, and other metabolites generated through xylose fermentation by microbes colonizing the gut. A. glabripennis also possesses full fatty acid biosynthetic pathways capable of incorporating acetate, acetyl-CoA, and microbial fermentation products into fatty acids.

Other minor polysaccharides present in heartwood hemicellulose include glucuronoxylan, arabinoxylan, glucomannan, and xyloglucan, which are composed of mannose, galactose, rhamnose, arabinose, glucuronic acid, and galacturonic acid monomers [67]. Despite the fact that these polysaccharides make up a relatively minor component of plant cell walls in the heartwood of deciduous trees, many transcripts predicted to encode enzymes that release mannose and galactose residues from polysaccharides were detected in the A. glabripennis midgut. For example, 16 unigenes predicted to encode GH 35 exo-β- and β-galactosidases, 12 unigenes predicted to encode GH 38 α-mannosidases and mannosyl-oligosaccharide α-1,3-1,6 mannosidases, and 3 unigenes predicted to encode GH 47 α-mannosidases were detected and could be utilized to liberate mannose and galactose from the hemicellulose matrix. Other transcripts predicted to encode enzymes responsible for processing minor polysaccharides present in hemicellulose included β-mannosidase, β-thioglucosidase, and β-fucosidase [48]. Further, A. glabripennis actively expressed transcripts involved in processing and utilizing mannose and galactose sugars via glycolysis, suggesting that these sugars can be directly utilized for energy production.

Transcripts Predicted to Encode Cellulases and Callases

Like many other wood-feeding insects, A. glabripennis also produces a number of transcripts predicted to encode cellulases. One of the most striking discoveries in the midgut transcriptome was the presence of six GH 5 cellulase unigenes, which all had highest scoring BLASTP alignments to GH 5 endo-β-1,4-glucanases previously detected in the guts of other wood-feeding cerambycid beetles. Recombinant protein expression assays revealed that cellulases associated with other cerambycids (e.g., Apriona germari, Oncideres albomarginata chamela, and Psacothea hilaris) catalyzed the release of cello-oligomers from crystalline cellulose [24, 25, 68]. The six GH 5 unigenes were not flagged as transcript isoforms by Trinity, suggesting that genes encoding cellulases are represented in multiple copies in the A. glabripennis genome. The purpose of this redundancy is unknown, but several other coleopterans harbor multiple copies of cellulases belonging to the same GH family [41, 69]. These enzymes could function under different physiological conditions, which is consistent with the contrasting pH and oxygen gradients that can be found in different regions of cerambycid midguts [70]. Alternatively, these unigenes could encode enzymes with slightly different catalytic capabilities that act on different cellulose macromolecule substructures [71], target soluble or insoluble fractions of cellulose [72], or process cello-oligomers into cellobiose (exoglucanase activity). In addition to GH 5 cellulases, A. glabripennis also expresses endo-β-1,4-glucanases classified to GH families 45 and 48. In addition to endoglucanases, A. glabripennis also produced a large number of β-glucosidases, which hydrolyze cellobiose to release glucose. The majority of these were classified to GH family 1, which was the most abundant GH family detected in the A. glabripennis midgut transcriptome. The overabundance of β-glucosidases relative to cellulases is common in many wood-feeding insects and wood-degrading microorganisms and is hypothesized to serve as a mechanism to indirectly enhance cellulase activity. These β-glucosidases can often act quickly and efficiently to release glucose from cellobiose, reducing the impact of end product inhibition on cellulase activity [73].

In addition to transcripts encoding enzymes predicted to disrupt major hardwood polysaccharides, several transcripts involved in degrading minor polysaccharides were detected. For example, callose is a linear polysaccharide composed of β-1,3 and β-1,6 linked glucose. Although callose is normally associated with the fleshy and metabolically active regions of plants, such as leaves and stems, it is also sporadically deposited in cell walls of secondary growth [74] and represents a suitable store of glucose that could be liberated and assimilated by A. glabripennis. Several β-1,3 and β-1,6 glucanases detected in the midgut transcriptome could be involved in liberating glucose from this polysaccharide.

Transcripts Predicted to Encode Enzymes that Contribute to Lignin and Phenylpropanoid Degradation

While lignin is highly abundant in the heartwood of deciduous trees where the A. glabripennis larvae were collected for this study, no transcripts predicted to encode enzymes that are capable of yielding the types of lignin degradation products previously observed in A. glabripennis frass [21] were detected. A single laccase unigene with a signal peptide for extracellular targeting was detected in addition to several extracellular copper oxidase domain proteins, peroxidases, aldo-keto reductases, and alcohol dehydrogenases. Laccases are involved in lignin degradation in some white rot fungal taxa [75], and an endogenous termite laccase capable of degrading lignin alkali and lignin phenolics was recently characterized [76]. However, despite their reported ability to degrade lignin phenolics, many laccases require extracellular redox mediators to disrupt the non-phenolic β-aryl ether and C-C linkages that dominate hardwood lignins and yield the types of degradation products observed in A. glabripennis frass [77]. While pathways for synthesis of these redox mediators have been identified in some white rot fungi, insects are unlikely to have the endogenous ability to synthesize them since all characterized laccase redox mediators contain aromatic rings, which insects cannot inherently synthesize [78]. Based on these observations, we hypothesize that lignin degrading activities in the gut should be directly enhanced through interactions with microbial enzymes capable of synthesizing aromatic redox mediators or liberating aromatic compounds from lignin. Lignin metabolites released from the biopolymer can also be used as laccase mediators. In addition to laccases, 26 unigenes predicted to encode aldo-keto reductases were detected in the A. glabripennis transcriptome. In a recent study, expression levels of termite-produced aldo-keto reductases were correlated with feeding on wood and a recombinant aldo-keto reductase expressed in conjunction with other termite-derived cellulases enhanced sugar release from pine sawdust [79], suggesting a role in enhancing lignocellulose digestion. Additionally, aldo-keto reductases have been shown to enhance xylose metabolism [80], degrade xenobiotics and carbohydrates [81], and function as aryl alcohol dehydrogenases to facilitate the degradation of β-aryl ethers in lignin [82], and they are induced by exposure to phenolics and aromatic compounds in bacteria and yeasts [83]. The abundance of these aldo-keto reductases in the midgut suggests that they could work in collaboration with other insect and microbial enzymes to facilitate penetration of lignin.

Other enzymes encoded by the A. glabripennis transcriptome capable of disrupting bonds that cross-link hemicellulose to lignin [84] included esterases, which liberate polysaccharide termini from the cell wall matrix, exposing them to hydrolytic enzymes and enhancing sugar release from this group of polysaccharides. Additionally, 16 unigenes predicted to encode alcohol dehydrogenases were detected in the midgut transcriptome; although these enzymes have not been shown to break linkages in polymeric lignin, they are hypothesized to enhance lignin oxidation in the guts of termites [85] and they could serve similar roles in the A. glabripennis midgut. Finally, a number of extracellular peroxidases were also detected. Although the roles of insect-derived peroxidases in digestion and physiology are numerous and diverse [86], direct roles for insect peroxidases in lignin degradation have not been explored.

Lignin degradation releases phenylpropanoids (e.g. coumaryl and cinnamyl alcohol), which are often toxic; however, A. glabripennis produces enzymes capable of degrading phenylpropanoid subunits, including epoxide hydrolases, which are often involved in polycyclic aromatic compound metabolism [87]. Other transcripts predicted to encode detoxification enzymes and antioxidants that could make contributions to degradation or inactivation of toxic lignin metabolites include alcohol dehydrogenases, aldehyde dehydrogenases, cytochrome P450s, glutathione S-transferases, catalases, carboxylesterases, enzymes involved in aromatic compound degradation, and glucuronosyl transferases. Additionally, aldo-keto reductases are capable of degrading phenolic compounds, including tannins and phenylpropanoids released from lignin degradation, and could be primed for detoxification roles.

Transcripts Predicted to Encode Detoxification Enzymes

A. glabripennis eggs hatch directly beneath the bark of hardwood trees and first and second instars feed on primary phloem and xylem [6], which serve as diffuse transport systems for toxic tree defensive compounds [88], before tunneling into the heartwood as later instars. Though heartwood is not as metabolically active as the primary phloem and xylem, it accumulates potentially toxic secondary metabolites, including alkaloids, tannins, hydroxycinnamic acids, and phenolic glycosides, defending the plant from herbivory and protecting structural polysaccharides and biopolymers from biotic assaults [89]. Given that A. glabripennis completes development in over 47 different tree species [8] and that it feeds in the phloem and xylem before eventually making its way into the heartwood, this insect must have mechanisms to detoxify or sequester the breadth of defensive plant secondary metabolites it encounters throughout its life cycle.

The gut represents the first line of defense against ingested host plant allelochemicals, pesticides, and other toxins, and many transcripts predicted to encode detoxification enzymes were detected. For example, 50 cytochrome P450-like unigenes were detected in the A. glabripennis midgut transcriptome. These enzymes have versatile oxidoreductive properties, are highly involved in degrading lipophilic toxins [90, 91], and have been shown to confer resistance to pesticides [30] as well as small aromatic toxins that can accumulate to high concentrations in the heartwood of trees (e.g. alkaloids). The majority of the cytochrome P450 unigenes detected in the A. glabripennis midgut transcriptome had highest scoring BLASTP alignments to cytochrome P450s identified in the Tribolium castaneum genome [92, 93], although the percent similarity at the amino acid level ranged from 34% to 66%, reflecting a relatively large degree of divergence from previously annotated cytochrome P450s (Table 4-7).

Cytochrome P450 unigenes detected in the A. glabripennis midgut were putatively assigned to clans and families based on the annotations of the highest scoring BLASTP alignments [92]. To conform to cytochrome P450 classification conventions, only alignments sharing ≥ 40% amino acid identity were used to annotate these unigenes. While a handful of unigenes predicted to encode mitochondrial cytochrome P450s were identified, non-mitochondrial clans 3 and 4 were more highly represented in comparison. Non-mitochondrial cytochrome P450s were classified to five families, which were dominated by CYP6 (27 unigenes), but also included CYP4 (8 unigenes), CYP9 (4 unigenes), CYP345 (3 unigenes), and CYP347 (1 unigene). CYP6 family genes are often present in multiple copies, occur in clusters in insect genomes [94], have pivotal roles in the detoxification of host plant defensive chemicals such as xanthotoxins, gossypol, and chlorogenic acid (a key intermediate in lignin biosynthesis) [31], and are often induced in herbivorous insects during periods of feeding.
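The family assignment rule described above (annotate a unigene from its best BLASTP hit only when the alignment shares at least 40% amino acid identity) can be illustrated with a minimal sketch. The unigene identifiers, subject names, and identity values below are hypothetical placeholders rather than data from this study, and the helper functions `cyp_family` and `assign_families` are illustrative names, not part of any annotation pipeline used here.

```python
import re
from collections import Counter

# Identity cutoff stated in the text for family-level annotation.
MIN_FAMILY_IDENTITY = 40.0


def cyp_family(subject_name):
    """Extract the family part of a CYP name, e.g. 'CYP345A1' -> 'CYP345'."""
    match = re.match(r"(CYP\d+)", subject_name.upper())
    return match.group(1) if match else None


def assign_families(best_hits, min_identity=MIN_FAMILY_IDENTITY):
    """Map each unigene to a CYP family, or None if its best hit is too divergent."""
    assignments = {}
    for unigene, subject, identity in best_hits:
        if identity >= min_identity:
            assignments[unigene] = cyp_family(subject)
        else:
            assignments[unigene] = None  # left unclassified at the family level
    return assignments


# Hypothetical best-hit records: (unigene id, subject CYP name, % identity).
example_hits = [
    ("unigene_a", "CYP6BQ1", 52.3),
    ("unigene_b", "CYP4Q2", 41.0),
    ("unigene_c", "CYP9Z4", 34.0),   # below the 40% cutoff
    ("unigene_d", "CYP6BK7", 66.0),
]

assignments = assign_families(example_hits)
family_counts = Counter(f for f in assignments.values() if f is not None)
print(assignments)
print(family_counts)
```

In this sketch, a unigene whose best hit falls below the cutoff is reported as unclassified rather than being forced into the family of a distant homolog.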

Carboxylesterases were the most abundant class of enzymes with putative involvement in detoxification processes detected in the midgut; 65 individual unigenes were predicted to encode these proteins. Although secreted carboxylesterases are generally involved in pheromone metabolism [95], intracellular carboxylesterases are often implicated in pesticide and allelochemical metabolism and tolerance [96]. Over 30 of the carboxylesterase unigenes detected in the A. glabripennis midgut lacked secretory peptides and may be primed to serve detoxification roles. For example, carboxylesterases are hypothesized to mediate resistance to phenolic glycosides in Papilio canadensis [97], and are often found in high levels in the midgut [98]. Notably, trees in the family Salicaceae, which includes many of A. glabripennis’ preferred hosts (e.g. Populus spp.), are notorious producers of phenolic glycosides (e.g. salicin and tremulacin) [99] and the abundance of carboxylesterases may promote colonization and survival in these hosts.

Transcripts predicted to encode enzymes involved in conjugative deactivation of xenobiotic compounds were also detected, including 30 and 21 unigenes predicted to encode UDP-glucuronosyl transferases and glutathione S-transferases, respectively [100]. These transferases can bind to xenobiotic compounds containing a diversity of functional groups, including oxygen, nitrogen, sulfur or carboxyl groups, enhancing their solubility and allowing them to be excreted or stored in the fat body for eventual elimination [101]. They have been previously shown to detoxify cyanates and cinnamaldehydes [102], which can be found in high concentrations in heartwood. Further, they can also conjugate and eliminate aromatic compounds, including tannins and toxic aromatic compounds stored in the heartwood or released from lignin degradation [103].

Transcripts Predicted to Encode Enzymes Involved in Nitrogen Acquisition

Although nitrogen is scarce in the woody tissue of host trees, A. glabripennis larvae, like other insects, have high demands for nitrogen during growth and development. Despite this scarcity, microbes associated with the midgut have the metabolic capacity to synthesize all 23 amino acids, which could be assimilated and stored by the insect (Scully et al., in press) in the form of arylphorin and hexameric storage proteins encoded by A. glabripennis [104]. The insect also possesses endogenous pathways for amino acid synthesis that may be complemented or augmented by microbial pathways, including complete pathways for the synthesis of alanine, aspartic acid, asparagine, proline, cysteine, glycine, and serine and for the synthesis of tyrosine from phenylalanine. In addition, nearly complete pathways for the synthesis of arginine, glutamic acid, and selenocysteine were detected, but argininosuccinate lyase, glutamate formiminotransferase, and selenocysteine synthase transcripts were absent. These pathways may appear incomplete because transcripts encoding these enzymes are simply not expressed in the midgut, because they were expressed at low levels and were not detected at the sequencing depth obtained, or because the missing steps are complemented by microbial enzymes that catalyze these reactions.
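The distinction drawn above between complete and nearly complete pathways amounts to comparing the set of enzymes a pathway requires against the set detected in the transcriptome. The sketch below illustrates that comparison only. The pathway definitions are simplified placeholders (the generic step names are invented and are not curated pathway definitions), and the rule that a single missing step counts as "nearly complete" is an assumption made for the example.

```python
def missing_steps(required_enzymes, detected_enzymes):
    """Return the required enzymes that have no detected transcript."""
    return sorted(set(required_enzymes) - set(detected_enzymes))


def classify_pathway(required_enzymes, detected_enzymes, max_gaps_for_near=1):
    """Label a pathway by how many required steps lack a detected transcript.

    The cutoff of one missing step for 'nearly complete' is an assumption.
    """
    gaps = missing_steps(required_enzymes, detected_enzymes)
    if not gaps:
        return "complete", gaps
    if len(gaps) <= max_gaps_for_near:
        return "nearly complete", gaps
    return "incomplete", gaps


# Placeholder pathway definitions; 'step_*' names are invented for illustration.
pathways = {
    "arginine": ["step_1", "step_2", "argininosuccinate lyase"],
    "serine": ["step_3", "step_4"],
}

# Placeholder set of enzymes with at least one detected transcript.
detected = {"step_1", "step_2", "step_3", "step_4"}

results = {
    name: classify_pathway(required, detected)
    for name, required in pathways.items()
}
for name, (status, gaps) in results.items():
    print(name, status, gaps)
```

A missing step flagged this way does not distinguish between a gene that is absent, one that is not expressed in the midgut, and one that was missed at the sequencing depth obtained.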

The phloem tissue where early instars feed is rich in amino acids relative to the heartwood [105] where older larvae grow and develop. Therefore, recycling waste products of amino acid and nucleotide deamination reactions back into functional amino acids, nucleotides, and other nitrogen-containing compounds may be important to the nitrogen economy in A. glabripennis. A number of transcripts with highest scoring BLASTP alignments to enzymes predicted to catalyze deamination reactions and liberate ammonia from a variety of nitrogen-containing compounds were detected in the midgut transcriptome. These included adenine deaminases, cytosine deaminases, , , and chitin-degrading enzymes.

Ammonia liberated from nitrogen-containing compounds could be directly converted to glutamine by glutamine synthetase and aspartic acid by aspartate ammonia and indirectly incorporated into the synthesis of purines [106], glutamate, alanine, asparagine, and proline by enzymes encoded by the insect (Figure 4-6). Furthermore, ammonia or amino acids constructed from recycled ammonia could be shuttled to microbes housed in the midgut to synthesize nonessential or essential amino acids, augmenting or complementing A. glabripennis’ physiological capabilities. Despite these potential contributions to nitrogen economy, the mechanisms of essential amino acid synthesis and recycling are not clear, as the abundance of essential amino acids in woody tissue varies depending on tree species but is significantly lower than the abundance of nonessential amino acids [15]. As with other insects studied, no pathways for the synthesis and metabolism of essential amino acids (e.g., branched chain and aromatic amino acids) were detected in the midgut transcriptome, although these could be expressed elsewhere. However, full pathways for the synthesis of nine essential amino acids were detected in the A. glabripennis midgut metagenome, including histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine, and could serve as key sources of essential amino acids in this insect (Scully et al., in press). Furthermore, the gut community also contained numerous uricase and urease genes, which could be involved in recycling nitrogenous waste products produced by A. glabripennis or its gut microbes (Scully et al., in press). Although the waste products from A. glabripennis have not been biochemically characterized, all enzymes associated with the urea cycle were detected in the A. glabripennis midgut, including several transcripts encoding enzymes that catalyze the conversion of arginine to urea, suggesting that urea could be produced in this insect. This urea pathway is also functional in the guts of several other insects [107, 108].

Transcripts predicted to encode several types of digestive proteinases, including serine and cysteine proteinases, were detected, which aid in protein acquisition from glycoproteins cross-linked in the cell wall matrix [16] or from microbes housed in the midgut. Serine proteinase unigenes were far more numerous than cysteine proteinase unigenes, and cystatins (cysteine proteinase inhibitors) were co-expressed and likely repress cysteine proteinase activities [109] in the midgut. These results are consistent with a previous study that reported high serine protease activities and the absence of cysteine and aspartic peptidase activities in the A. glabripennis midgut [110]. However, the expression of both cysteine and aspartic proteinases in the midgut suggests that this insect still has the genetic capacity to produce these proteinases under certain conditions and these genes may be maintained in the genome as a strategy to combat digestive proteinase inhibitors produced by host plants. Analysis of transcripts involved in converting compounds in woody tissue into fatty acids and sterols was also conducted (Supplemental Results).

Transcripts Involved in Facilitating Interactions with Gut Microbes

Although there is debate about how microbes associated with cerambycid guts contribute to digestive physiology [111], a number of transcripts with putative involvement in mediating interactions with microbes were detected in the A. glabripennis midgut transcriptome. Several of these transcripts are likely involved in maintaining host-microbe homeostasis, including transcripts predicted to encode both antifungal and antibacterial proteins; dual oxidases [112]; mucin, which forms a barrier that protects the midgut from microbial invasion [113]; MPA2 allergen proteins, which have antimicrobial properties and are often upregulated during periods of stress [114]; and several encapsulation proteins involved in activating innate immune pathways. Seven unigenes predicted to encode hemocyanins were detected (Figure 4-7). While they primarily function as oxygen carriers in crustaceans, hemocyanins are rare, but not completely absent, in insect genomes and their physiological functions are not well characterized [115]. Despite their functional obscurity in insects, they can function as pro-phenoloxidases under certain circumstances, activating innate immune pathways and mediating insect-microbe interactions in the midgut [116]. Hemocyanins have also been hypothesized to serve roles in the degradation of lignin since transcripts encoding hemocyanins are highly expressed in a symbiont-free, wood-feeding marine isopod, but no direct involvement in this process has been demonstrated [117]. Additionally, signal peptides were not observed in any of the hemocyanin unigenes detected in A. glabripennis which, if not an assembly artifact, would preclude their involvement in extracellular digestive processes (e.g. lignin degradation).

Identification of Highly Expressed Genes

Transcripts originating from genomic and mitochondrial ribosomal RNAs were omitted from this analysis and the unigenes with the top 50 FPKM values (fragments per kilobase of exon per million mapped reads) containing predicted coding regions were identified as highly expressed (Supplemental Table 4-2). Many of the highly expressed genes identified in the midgut have predicted involvement in stress and immune modulation and included unigenes predicted to encode several MPA2 allergen domain proteins, five carboxylesterases, two cathepsins, two encapsulation related proteins, two mucin proteins, a lipase, a cytochrome P450, a thaumatin domain protein, and a lectin domain protein. While carboxylesterases and cytochrome P450s have key involvements in detoxification and sterol acquisition, cathepsins, encapsulation proteins, mucin proteins, lipases, thaumatin domain proteins, and lectins are hypothesized to play fundamental roles in mediating host-microbe interactions. Not surprisingly, GH 48 and GH 5 cellulases and GH 1 β-glucosidases were also highly expressed in the midgut, reflecting the nutritional importance of cellulose to A. glabripennis. GH 31 and GH 35 β-galactosidases were also highly expressed, suggesting that galactan polymers present in hardwood hemicellulose or on the cell surface of microbes are also crucial sources of sugar for this insect. Chitin deacetylase unigenes were highly abundant; these enzymes can liberate acetate from insect or fungal chitin, which can be recycled for energy or fatty acid production [118]. Several different types of digestive proteinases were also highly expressed and included M16 peptidases, M14 carboxypeptidases, serine proteinases, and cysteine proteinases, which likely serve key roles in nitrogen extraction from plant or microbial cell wall proteins. In addition to these digestive proteins, several unigenes predicted to encode hypothetical proteins with unknown functions were abundant, suggesting that A. glabripennis encodes suites of novel proteins that could be relevant for digestive physiology and development.
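The selection of highly expressed unigenes described above can be sketched as follows. The FPKM formula used is the standard definition (fragments mapped to a transcript, scaled by transcript length in kilobases and by millions of mapped fragments); the record layout, identifiers, and counts are hypothetical placeholders, and the filtering flags `is_rrna` and `has_cds` are illustrative names for the two exclusion criteria stated in the text.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def top_expressed(records, total_mapped_fragments, n=50):
    """Rank non-rRNA unigenes with a predicted coding region by FPKM."""
    scored = [
        (rec["id"], fpkm(rec["fragments"], rec["length_bp"], total_mapped_fragments))
        for rec in records
        if not rec["is_rrna"] and rec["has_cds"]
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical unigene records for illustration only.
records = [
    {"id": "u1", "fragments": 500, "length_bp": 1000, "is_rrna": False, "has_cds": True},
    {"id": "u2", "fragments": 2000, "length_bp": 2000, "is_rrna": False, "has_cds": True},
    {"id": "u3", "fragments": 90000, "length_bp": 1500, "is_rrna": True, "has_cds": False},
    {"id": "u4", "fragments": 100, "length_bp": 500, "is_rrna": False, "has_cds": False},
    {"id": "u5", "fragments": 300, "length_bp": 3000, "is_rrna": False, "has_cds": True},
]

top_two = top_expressed(records, total_mapped_fragments=1_000_000, n=2)
print(top_two)
```

Because FPKM normalizes by length, a long unigene with many mapped fragments can rank below a shorter unigene with fewer fragments.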

Glycoside Hydrolase Profile Comparisons

Through comparisons of transcriptome libraries sampled from a variety of herbivorous insects, no major trends were detected with regard to GH profiles and feeding habitats. Euclidean distances between insects that fed on similar substrates were large in many cases and reflected strong differences in GH compositions. Thus, the gut transcriptome libraries did not show any significant clustering trends by food source (Figure 4-8). For example, Agrilus planipennis and Dendroctonus ponderosae both feed in phloem and were found in separate planes of the PCA ordination, indicating that there were large differences in GH family composition between these two insects. Likewise, A. glabripennis, Coptotermes formosanus, and Reticulitermes flavipes all feed in wood and were also found in opposite quadrants in the PCA ordination. Although these insects are all capable of producing endogenous cellulases, A. glabripennis produces different types of cellulases than the wood- and phloem-feeding insects compared in this study. For example, the two termite species included in this analysis predominantly produce GH 9 cellulases, while A. glabripennis produces GH 5, GH 45, and GH 48 cellulase transcripts and Dendroctonus ponderosae produces GH 45 and GH 48 cellulase transcripts.
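To make the distance comparison concrete, the sketch below computes Euclidean distances between GH family count profiles. The library names and counts are toy placeholders chosen only to mimic the contrast described above (one profile dominated by GH 5, GH 45, and GH 48, another by GH 9); they are not values from the libraries analyzed here. The sketch also assumes raw, unnormalized counts, which may differ from the preprocessing applied before the PCA ordination.

```python
import math
from itertools import combinations


def euclidean_distance(profile_a, profile_b):
    """Distance between two GH profiles; absent families count as zero."""
    families = set(profile_a) | set(profile_b)
    return math.sqrt(
        sum((profile_a.get(f, 0) - profile_b.get(f, 0)) ** 2 for f in families)
    )


def pairwise_distances(profiles):
    """Return {(name_a, name_b): distance} for every pair of libraries."""
    return {
        (a, b): euclidean_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Toy GH family counts (unigenes per family); hypothetical values.
profiles = {
    "wood_feeder_beetle": {"GH5": 6, "GH45": 1, "GH48": 1},
    "wood_feeder_termite": {"GH9": 4},
    "phloem_feeder_beetle": {"GH45": 2, "GH48": 1},
}

distances = pairwise_distances(profiles)
for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

In this toy example the two wood feeders are farther apart than the two beetles, which mirrors the pattern of weak clustering by food source reported above.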

Despite the lack of clustering by feeding niche, there appeared to be some clustering by phylogenetic relatedness. For example, most cerambycid and chrysomelid beetles were positioned along the positive X axis and, like A. glabripennis, some of these insects produce transcripts predicted to encode GH 5, 45, and 48 cellulases, although they feed on very different parts of their host plants. Furthermore, it is interesting to note that GH 5 cellulases have not yet been found in any insect outside the order Coleoptera, but the number of GH 5 cellulase unigenes detected in insect species from this order varied tremendously. While GH 5 transcripts were not detected in association with many coleopterans, the chrysomelids Gastrophysa viridula and Callosobruchus maculatus encode one and four GH 5 unigenes, respectively. Phylogenetic analysis of translated protein sequences revealed that, although chrysomelid GH 5 cellulases and cerambycid (A. glabripennis, Anoplophora chinensis, Apriona germari, Psacothea hilaris, and Oncideres albomarginata chamela) GH 5 cellulases share a common ancestor, chrysomelid cellulases have rapidly diverged from cerambycid cellulases (Figure 4-9). In contrast, GH 5 cellulases within the Cerambycidae seem to have multiplied and diversified through gene conversion or gene duplication events and are possibly more adapted to digesting highly insoluble cellulose associated with woody plants.

Supporting the hypothesis that the PCA ordination was primarily driven by phylogenetic relatedness, family-specific trends in abundances of GH families were observed within the Coleoptera. In contrast to GH 5 cellulases, which seem to have multiplied in some cerambycid beetles, GH 45 and GH 48 cellulases were expressed as single copy genes in A. glabripennis (Figures 4-10 and 4-11). However, members of these GH families have multiplied and diversified in the chrysomelid and curculionid lineages, suggesting that coleopterans have undergone lineage-specific adaptations to overcome challenges associated with different feeding regimes. For example, the results of the GH 48 maximum likelihood analysis (Figure 4-10) suggest that GH 48 enzymes were likely encoded in the genome of the last common ancestor of coleopterans and that they underwent family-specific adaptations. This scenario is supported because GH 48 proteins from each insect family formed their own supported clusters in the maximum likelihood tree. In particular, genes encoding GH 48 enzymes were likely duplicated in the Chrysomelidae. All members of this family encode at least two GH 48 proteins and the branching topology suggests that the second GH 48 gene originated directly from the first.

Likewise, GH 45 genes have also duplicated and proliferated throughout the chrysomelid and curculionid lineages, but the dynamics driving the evolution of this GH family seem to be more complex in comparison to the GH 48 family (Figure 4-11). In some species, GH 45 genes have rapidly propagated and diversified (e.g. Leptinotarsa decemlineata), while in other cases, the insect expressed only a single copy of this gene (e.g. G. viridula and C. tremulae).

The hypothesis that A. glabripennis benefits from microbial enzymes to facilitate nutrient acquisition is supported through the above comparisons. For example, transcripts predicted to encode GH family 18, 20, and 30 , β-hexosaminidases, and are strongly associated with A. glabripennis. Although these chitin-degrading genes may be important for remodeling the gut peritrophic matrix, which is predominantly composed of chitin, they could also play key roles in modulating interactions with fungal taxa associated with the midgut, including yeasts and Fusarium solani, a soft rot fungal symbiont of A. glabripennis [119, 120]. We hypothesize that the predominance of chitinases allows A. glabripennis to derive a portion of its carbohydrate and nitrogen resources from fungal chitinous cell walls, which are composed of polymers of N-acetylglucosamine. Non-entomopathogenic fungi associated with wood-feeding insects have been previously hypothesized to concentrate and/or recycle nitrogen [121] and thus we also hypothesize that the Ascomycota fungal strains found in association with the A. glabripennis midgut serve these same purposes.

Multivariate Transcriptome Comparisons of Gene Ontology Annotations

Like the GH analysis, multivariate comparisons of level four GO categories in midgut transcriptome libraries sampled from a variety of herbivorous insects revealed no significant clustering of transcriptome libraries by feeding habitat. However, phylogenetic relatedness alone did not explain the observed pattern of clustering achieved for the transcriptome comparisons (Figure 4-12). In addition, subsets of different GO categories were enriched in each transcriptome library included in the comparison, while many of the GO categories were present in approximately the same abundances in each library. Together, these findings suggest that most insects possess similar repertoires of gene families and that these genes have adapted in lineage-specific ways optimal for overcoming digestive and nutritional challenges associated with specific feeding habitats and ecological niches. For example, although most insects produce a similar number of GH unigenes (directed at O-glycosyl linkages), the GH family-level comparison suggested that each insect produced its own unique GH profile. Other GO categories that are present in similar abundances in all insects included in this comparison include 4-α-glucanotransferases, heme binding and transporting proteins, and regulatory genes (GTPases, MAP kinases, etc.).

Despite the lack of clustering by food source or phylogenetic relatedness, several trends were detected that distinguish A. glabripennis from the rest of the insects included in this comparison that could be pivotal to its ability to digest lignocellulose and other wood polysaccharides and extract nutrients from a broad range of deciduous host trees. For example, in the midgut of A. glabripennis, more unigenes and transcript isoforms were produced with predicted monooxygenase and oxidoreductase activities relative to other insects included in this comparison, which could be relevant to its ability to detoxify allelochemicals from its broad range of host plants. Further examination of Pfam domain abundances in each library revealed that unigenes and transcript isoforms predicted to encode carboxylesterases and cytochrome P450s were more abundant in the A. glabripennis midgut than many of the other insect libraries sampled. A. glabripennis also has the broadest host range of any insect included in this comparison, suggesting that it needs to encode a broader arsenal of detoxification enzymes relative to other insects included in this comparison. Several unigenes predicted to encode digestive peptidases, (forming C-N bonds), and protein transporters were also overrepresented relative to other insect transcriptome libraries, which could be relevant for digesting and assimilating proteins produced by microbes associated with the midgut or from plant cell walls [16]. These digestive peptidases are also overrepresented in G. viridula, C. formosanus, and M. sexta transcriptome libraries. Unigenes associated with hydrolase activity (acting on acid anhydrides) were also highly abundant in A. glabripennis, many of which were predicted to encode ATPases and other nucleosidases. DNA binding proteins, RNA binding proteins, nucleotide binding proteins, and transferases involved in transferring phosphorus-containing groups were also highly associated with A. glabripennis. The high abundance of unigenes for these nucleotide binding proteins and transferases is likely associated with the high numbers of unigenes predicted to encode reverse transcriptases, transposases, and integrases that were detected in the midgut. Finally, unigenes predicted to encode proteins with substrate-specific and active transmembrane transporter activities, including major facilitator family (MFS) transporters, were also highly abundant in the A. glabripennis midgut. MFS transporters are a diverse group of carriers involved in the absorption of small solutes, including sugar, aromatic amino acids, and other small compounds, which may be involved in assimilation and utilization of small microbial metabolites and/or small metabolites released from the degradation of woody tissue. Taken together, the differences in GH family and level four GO compositions among insects with similar feeding regimes suggest that the ability to degrade polysaccharides found in woody tissue evolved through lineage-specific adaptations rather than through convergent evolutionary processes.

Conclusions

The A. glabripennis midgut transcriptome provides the first comprehensive insight into the endogenous digestive capabilities of wood-boring cerambycid larvae as they feed in a highly lignified and nutritionally deficient environment. Comparative transcriptome analysis clearly distinguished the A. glabripennis midgut transcriptome from the gut transcriptome libraries of other herbivorous insects that have been sequenced previously, and the distinguishing features may contribute to its long life cycle and its ability to feed and develop in a highly lignified food source. Our results highlighted gene categories that were enriched in the A. glabripennis midgut transcriptome and are hypothesized to make key contributions to this insect’s lifestyle, including its ability to colonize a broad range of living host trees. For example, unigenes predicted to encode monooxygenases, carboxylesterases, heat shock proteins, and other detoxification enzymes were highly abundant in the midgut relative to unigenes in transcriptomes of many other herbivorous insects included in this comparison. Furthermore, A. glabripennis expressed its own unique profile of GH unigenes for liberating sugars from woody tissue.

Our results also highlight deficiencies in endogenous digestive and metabolic pathways that could be compensated for by microbes associated with the midgut. Although A. glabripennis possesses the machinery to digest plant cell wall polysaccharides, including cellulases, β-glucosidases, and hemicellulases, it notably lacks enzymes to process xylose, arabinose, and other pentose sugars present in the hemicelluloses of its deciduous host trees. However, previous metagenomic profiling revealed that the midgut community is capable of producing enzymes that fill this niche; these enzymes are hypothesized to facilitate the beetle’s use of these sugar substrates and to enhance its ability to acquire acetate from pentose wood sugars for energy and fatty acid production (Scully et al., in press). Furthermore, although A. glabripennis can produce enzymes that can facilitate local degradation of lignin and enzymes that could serve pivotal roles in nitrogen recycling, it lacks the ability to synthesize aromatic and branched-chain amino acids and to manufacture cholesterol and vitamins. In addition, cellulase activities in the A. glabripennis midgut are enhanced in the presence of a taxonomically diverse microbial community [22]. For these reasons, we hypothesize that enzymes derived from midgut-associated microbes complement the expression of insect genes and likely serve vital roles in the digestive physiology of A. glabripennis. This study also substantially expands the genetic resources available for coleopterans by providing transcriptome data for an insect that feeds in the heartwood of healthy host trees. The results presented here may serve as a source for bioprospecting of novel enzymes to enhance industrial biofuel production and for the pursuit of novel targets for controlling this serious pest.

Acknowledgements

454 FLX sequencing was performed in the laboratory of Dr. Stephan Schuster at The Pennsylvania State University, and Illumina HiSeq 2000 sequencing was performed at the University of Delaware Biotechnology Institute. BLAST and Pfam searches and phylogenetic analyses were performed using computing resources available at the Hawaii Open Supercomputing Center at the University of Hawaii (Jaws cluster; Maui, HI), the Research Computing and Cyberinfrastructure Group at The Pennsylvania State University (LionX clusters; University Park, PA), and USDA-ARS PBARC (Moana cluster; Hilo, HI). We thank David Long, Josh Osborne, Karen Bingham, Jess Shilladay, Isabel Ramos, and Karen Pongrance for assistance with insect rearing, Scott DiLoretto and Lynn Tomsho for assistance with 454 library preparation, and Bruce Kingham for assistance with Illumina library construction and sequencing. Funding for this project was provided by USDA-NRI-CSREES grant 2008-35504-04464, USDA-NRI-CSREES grant 2009-35302-05286, the Alphawood Foundation, Chicago, Illinois, a Seed Grant to Dr. Hoover from the Pennsylvania State University College of Agricultural Sciences, and a USDA-AFRI Microbial Functional Genomics Training grant 2010-65110-20488 to EDS.

Literature Cited

1. Grimaldi DA: 400 million years on six legs: on the origin and early evolution of

Hexapoda. Arthropod Struct Dev 2010, 39(2-3):191-203.

2. Schowalter TD: Insect responses to major landscape-level disturbance. Annu Rev

Entomol 2012, 57:1-20.

3. Friend WG: Nutritional requirements of phytophagous insects. Annual Review of

Entomology 1958, 3:57-74.

4. Schloss PD, Delalibera I, Handelsman J, Raffa KF: Bacteria associated with the guts of

two wood-boring beetles: Anoplophora glabripennis and Saperda vestita

(Cerambycidae). Environ Entomol 2006, 35(3):625-629.

5. Xueyan Y, Jiaxi Z, Fugui W, Min C: A study on the feeding habits of the larvae of two

species of longicorn (Anoplophora) to different tree species. Journal of Northwest

Forestry College 1995, 2.

6. Hu JF, Angeli S, Schuetz S, Luo YQ, Hajek AE: Ecology and management of exotic

and endemic Asian longhorned beetle Anoplophora glabripennis. Agricultural and

Forest Entomology 2009, 11(4):359-375.

7. Haack RA, Law KR, Mastro VC, Ossenbruggen HS, Raimo BJ: New York's battle with

the Asian long-horned beetle. J Forest 1997, 95(12):11-15.

8. Haack RA, Herard F, Sun JH, Turgeon JJ: Managing invasive populations of Asian

longhorned beetle and citrus longhorned beetle: a worldwide perspective. In: Annual

Review of Entomology. vol. 55. Palo Alto: Annual Reviews; 2010: 521-546.

9. Ugine TA, Gardescu S, Lewis PA, Hajek AE: Efficacy of imidacloprid, trunk-injected

into Acer platanoides, for control of adult Asian longhorned beetles (Coleoptera:

Cerambycidae). Journal of Economic Entomology 2012, 105(6):2015-2028.

10. Ugine TA, Gardescu S, Hajek AE: The effect of exposure to imidacloprid on Asian

longhorned beetle (Coleoptera: Cerambycidae) survival and reproduction. Journal

of Economic Entomology 2011, 104(6):1942-1949.

11. Lingafelter SW, Hoebeke ER: Revision of Anoplophora (Coleoptera:

Cerambycidae). Washington, DC: Entomological Society of Washington 2002:236 p.

12. Petterson RC: The chemical composition of wood. In The Chemistry of Solid Wood.

1984, The American Chemical Society.

13. Kirk TK, Farrell RL: Enzymatic combustion - the microbial-degradation of lignin.

Annual Review of Microbiology 1987, 41:465-505.

14. Tien M, Kirk TK: Lignin-degrading enzyme from Phanerochaete chrysosporium -

purification, characterization, and catalytic properties of a unique H2O2-requiring

oxygenase. Proceedings of the National Academy of Sciences of the United States of

America-Biological Sciences 1984, 81(8):2280-2284.

15. Mattson WJ: Herbivory in relation to plant nitrogen-content. Annu Rev Ecol Syst

1980, 11:119-161.

16. Keller B, Templeton MD, Lamb CJ: Specific localization of a plant-cell wall glycine-

rich protein in protoxylem cells of the vascular system. Proceedings of the National

Academy of Sciences of the United States of America 1989, 86(5):1529-1533.

17. Dillon RJ, Dillon VM: The gut bacteria of insects: nonpathogenic interactions.

Annual Review of Entomology 2004, 49:71-92.

18. Werren JH: Symbionts provide pesticide detoxification. Proceedings of the National

Academy of Sciences of the United States of America 2012, 109(22):8364-8365.

19. Beaver R, Wilding N, Collins N, Hammond P, Webber J: Insect-fungus relationships in

the bark and ambrosia beetles. In: Insect-fungus interactions 14th Symposium of the

Royal Entomological Society of London in collaboration with the British Mycological

Society: 1989: Academic Press; 1989: 121-143.

20. Raffa KF: Genetic engineering of trees to enhance resistance to insects. BioScience

1989, 39(8):524-534.

21. Geib SM, Filley TR, Hatcher PG, Hoover K, Carlson JE, Jimenez-Gasco Mdel M,

Nakagawa-Izumi A, Sleighter RL, Tien M: Lignin degradation in wood-feeding

insects. Proceedings of the National Academy of Sciences of the United States of America

2008, 105(35):12932-12937.

22. Geib SM, Jimenez-Gasco Mdel M, Carlson JE, Tien M, Hoover K: Effect of host tree

species on cellulase activity and bacterial community composition in the gut of

larval Asian longhorned beetle. Environ Entomol 2009, 38(3):686-699.

23. Geib SM, Tien M, Hoover K: Identification of proteins involved in lignocellulose

degradation using in gel zymogram analysis combined with mass spectroscopy-

based peptide analysis of gut proteins from larval Asian longhorned beetles,

Anoplophora glabripennis. Insect Science 2010, 17(3):253-264.

24. Sugimura M, Watanabe H, Lo N, Saito H: Purification, characterization, cDNA

cloning and nucleotide sequencing of a cellulase from the yellow-spotted longicorn

beetle, Psacothea hilaris. Eur J Biochem 2003, 270(16):3455-3460.

25. Calderon-Cortes N, Watanabe H, Cano-Camacho H, Zavala-Paramo G, Quesada M:

cDNA cloning, homology modelling and evolutionary insights into novel endogenous

cellulases of the borer beetle Oncideres albomarginata chamela (Cerambycidae).

Insect Molecular Biology 2010, 19(3):323-336.

26. Pauchet Y, Wilkinson P, van Munster M, Augustin S, Pauron D, Ffrench-Constant RH:

Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela

tremulae reveals new gene families in Coleoptera. Insect Biochemistry and Molecular

Biology 2009, 39(5-6):403-413.

27. Jongsma MA, Bolter C: The adaptation of insects to plant protease inhibitors. Journal

of Insect Physiology 1997, 43(10):885-895.

28. Levin DA: The chemical defenses of plants to pathogens and herbivores. Annu Rev

Ecol Syst 1976, 7:121-159.

29. Walling LL: Avoiding effective defenses: strategies employed by phloem-feeding

insects. Plant Physiol 2008, 146(3):859-866.

30. Scott JG, Liu N, Wen Z: Insect cytochromes P450: diversity, insecticide resistance

and tolerance to plant toxins. Comparative Biochemistry and Physiology Part C:

Pharmacology, Toxicology and Endocrinology 1998, 121(1–3):147-155.

31. Li X, Berenbaum MR, Schuler MA: Plant allelochemicals differentially regulate

Helicoverpa zea cytochrome P450 genes. Insect Molecular Biology 2002, 11(4):343-

351.

32. Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer

RNA genes in genomic sequence. Nucleic Acids Research 1997, 25(5):955-964.

33. Eddy SR: HMMER: Profile hidden Markov models for biological sequence analysis.

Bioinformatics 1998, 14(9):755-763.

34. Huang Y, Gilna P, Li W: Identification of ribosomal RNA genes in metagenomic

fragments. Bioinformatics 2009, 25(10):1338-1340.

35. Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ:

Gapped BLAST and PSI-BLAST: a new generation of protein database search

programs. Nucleic Acids Research 1997, 25(17):3389-3402.

36. Huson DH, Auch AF, Qi J, Schuster SC: MEGAN analysis of metagenomic data.

Genome research 2007, 17(3):377-386.

37. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M: Blast2GO: a

universal tool for annotation, visualization and analysis in functional genomics

research. Bioinformatics 2005, 21(18):3674-3676.

38. Sun Y, Buhler J: Designing patterns for profile HMM search. Bioinformatics 2007,

23(2):e36-e43.

39. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths‐Jones S, Khanna A,

Marshall M, Moxon S, Sonnhammer ELL et al: The Pfam protein families database.

Nucleic Acids Research 2004, 32(suppl 1):D138-D141.

40. Mittapalli O, Bai X, Mamidala P, Rajarapu SP, Bonello P, Herms DA: Tissue-specific

transcriptomics of the exotic invasive insect pest emerald ash borer (Agrilus

planipennis). PloS one 2010, 5(10):e13708.

41. Pauchet Y, Wilkinson P, Chauhan R, Ffrench-Constant RH: Diversity of beetle genes

encoding novel plant cell wall degrading enzymes. PloS one 2010, 5(12):e15635.

42. Pauchet Y, Wilkinson P, Vogel H, Nelson DR, Reynolds SE, Heckel DG, Ffrench-

Constant RH: Pyrosequencing the Manduca sexta larval midgut transcriptome:

messages for digestion, detoxification and defence. Insect Molecular Biology 2010,

19(1):61-75.

43. Whitfield AE, Rotenberg D, Aritua V, Hogenhout SA: Analysis of expressed sequence

tags from Maize mosaic rhabdovirus-infected gut tissues of Peregrinus maidis

reveals the presence of key components of insect innate immunity. Insect Molecular

Biology 2011, 20(2):225-242.

44. Khajuria C, Zhu Y, Chen M-S, Buschman L, Higgins R, Yao J, Crespo A, Siegfried B,

Muthukrishnan S, Zhu K: Expressed sequence tags from larval gut of the European

corn borer (Ostrinia nubilalis): exploring candidate genes potentially involved in

Bacillus thuringiensis toxicity and resistance. BMC Genomics 2009, 10(1):286.

45. Aw T, Schlauch K, Keeling C, Young S, Bearfield J, Blomquist G, Tittiger C:

Functional genomics of mountain pine beetle (Dendroctonus ponderosae) midguts

and fat bodies. BMC Genomics 2010, 11(1):215.

46. Tartar A, Wheeler MM, Zhou XG, Coy MR, Boucias DG, Scharf ME: Parallel

metatranscriptome analyses of host and symbiont gene expression in the gut of the

termite Reticulitermes flavipes. Biotechnol Biofuels 2009, 2.

47. Xie L, Zhang L, Zhong Y, Liu N, Long Y, Wang S, Zhou X, Zhou Z, Huang Y, Wang Q:

Profiling the metatranscriptome of the protistan community in Coptotermes

formosanus with emphasis on the lignocellulolytic system. Genomics 2012, 99(4):246-

255.

48. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B: The

Carbohydrate-Active EnZymes database (CAZy): an expert resource for

Glycogenomics. Nucleic Acids Research 2009, 37:D233-D238.

49. Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of

protein or nucleotide sequences. Bioinformatics 2006, 22(13):1658-1659.

50. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H,

Valentin F, Wallace IM, Wilm A, Lopez R et al: Clustal W and Clustal X version 2.0.

Bioinformatics 2007, 23(21):2947-2948.

51. Kumar S, Nei M, Dudley J, Tamura K: MEGA: a biologist-centric software for

evolutionary analysis of DNA and protein sequences. Briefings in bioinformatics

2008, 9(4):299-306.

52. Abascal F, Zardoya R, Posada D: ProtTest: selection of best-fit models of protein

evolution. Bioinformatics 2005, 21(9):2104-2105.

53. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large

phylogenies by maximum likelihood. Systematic biology 2003, 52(5):696-704.

54. Zwickl D: Genetic algorithm approaches for the phylogenetic analysis of large

biological sequence datasets under the maximum likelihood criterion. PhD

dissertation available at http://www.nescent.org/wg_garli (Univ of Texas, Austin) 2006.

55. Sukumaran J, Holder MT: DendroPy: a Python library for phylogenetic computing.

Bioinformatics 2010, 26(12):1569-1571.

56. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G,

Durbin R: The sequence alignment/map format and SAMtools. Bioinformatics 2009,

25(16):2078-2079.

57. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan

L, Raychowdhury R, Zeng QD et al: Full-length transcriptome assembly from RNA-

Seq data without a reference genome. Nat Biotechnol 2011, 29(7):644-U130.

58. Marchler-Bauer A, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH,

Geer LY, Geer RC, Gonzales NR, Gwadz M et al: CDD: specific functional annotation

with the Conserved Domain Database. Nucleic Acids Research 2009, 37:D205-D210.

59. Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient

alignment of short DNA sequences to the human genome. Genome Biol 2009,

10(3):R25.

60. Li B, Dewey C: RSEM: accurate transcript quantification from RNA-Seq data with

or without a reference genome. BMC Bioinformatics 2011, 12(1):323.

61. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying

mammalian transcriptomes by RNA-Seq. Nat Meth 2008, 5(7):621-628.

62. Petersen TN, Brunak S, von Heijne G, Nielsen H: SignalP 4.0: discriminating signal

peptides from transmembrane regions. Nat Methods 2011, 8(10):785-786.

63. Scrivener AM, Watanabe H, Noda H: Diet and carbohydrate digestion in the yellow-

spotted longicorn beetle Psacothea hilaris. Journal of Insect Physiology 1997,

43(11):1039-1052.

64. Brennan Y, Callen WN, Christoffersen L, Dupree P, Goubet F, Healey S, Hernandez M,

Keller M, Li K, Palackal N et al: Unusual microbial xylanases from insect guts.

Applied and environmental microbiology 2004, 70(6):3609-3617.

65. Suh SO, Marshall CJ, McHugh JV, Blackwell M: Wood ingestion by passalid beetles in

the presence of xylose-fermenting gut yeasts. Molecular Ecology 2003, 12(11):3137-

3145.

66. Suh SO, Noda H, Blackwell M: Insect symbiosis: derivation of yeast-like

endosymbionts within an entomopathogenic filamentous lineage. Molecular biology

and evolution 2001, 18(6):995-1000.

67. Scheller HV, Ulvskov P: Hemicelluloses. Annual Review of Plant Biology 2010,

61(1):263-289.

68. Lee SJ, Kim SR, Yoon HJ, Kim I, Lee KS, Je YH, Lee SM, Seo SJ, Sohn HD, Jin BR:

cDNA cloning, expression, and enzymatic activity of a cellulase from the mulberry

longicorn beetle, Apriona germari. Comp Biochem Phys B 2004, 139(1):107-116.

69. Keeling CI, Yuen MM, Liao NY, Docking TR, Chan SK, Taylor GA, Palmquist DL,

Jackman SD, Nguyen A, Li M: Draft genome of the mountain pine beetle,

Dendroctonus ponderosae Hopkins, a major forest pest. Genome Biology 2013,

14(3):R27.

70. Johnson KS, Rabosky D: Phylogenetic distribution of cysteine proteinases in beetles:

evidence for an evolutionary shift to an alkaline digestive strategy in Cerambycidae.

Comp Biochem Physiol B Biochem Mol Biol 2000, 126(4):609-619.

71. Brás JL, Cartmell A, Carvalho ALM, Verzé G, Bayer EA, Vazana Y, Correia MA, Prates

JA, Ratnaparkhe S, Boraston AB: Structural insights into a unique cellulase fold and

mechanism of cellulose hydrolysis. Proceedings of the National Academy of Sciences

2011, 108(13):5237-5242.

72. Irwin DC, Spezio M, Walker LP, Wilson DB: Activity studies of eight purified

cellulases: specificity, synergism, and binding domain effects. Biotechnology and

bioengineering 1993, 42(8):1002-1013.

73. Andrić P, Meyer AS, Jensen PA, Dam-Johansen K: Reactor design for minimizing

product inhibition during enzymatic lignocellulose hydrolysis: I. significance and

mechanism of cellobiose and glucose inhibition on cellulolytic enzymes.

Biotechnology Advances 2010, 28(3):308-324.

74. Spicer R: Senescence in secondary xylem: heartwood formation as an active

developmental program. Vascular transport in plants 2005:457-475.

75. Eggert C, Temp U, Eriksson KEL: Laccase is essential for lignin degradation by the

white-rot fungus Pycnoporus cinnabarinus. FEBS letters 1997, 407(1):89-92.

76. Coy MR, Salem TZ, Denton JS, Kovaleva ES, Liu Z, Barber DS, Campbell JH, Davis

DC, Buchman GW, Boucias DG et al: Phenol-oxidizing laccases from the termite gut.

Insect Biochemistry and Molecular Biology 2010, 40(10):723-732.

77. Eggert C, Temp U, Dean JF, Eriksson KE: A fungal metabolite mediates degradation

of non-phenolic lignin structures and synthetic lignin by laccase. FEBS letters 1996,

391(1-2):144-148.

78. Brunet P: The metabolism of the aromatic amino acids concerned in the cross-

linking of insect cuticle. Insect Biochemistry 1980, 10(5):467-500.

79. Sethi A, Slack JM, Kovaleva ES, Buchman GW, Scharf ME: Lignin-associated

metagene expression in a lignocellulose-digesting termite. Insect Biochemistry and

Molecular Biology 2013, 43(1):91-101.

80. Kavanagh KL, Klimacek M, Nidetzky B, Wilson DK: The structure of apo and holo

forms of xylose reductase, a dimeric aldo-keto reductase from Candida tenuis.

Biochemistry 2002, 41(28):8785-8795.

81. Ford G, Ellis EM: Three aldo–keto reductases of the yeast Saccharomyces cerevisiae.

Chemico-biological interactions 2001, 130–132(0):685-698.

82. Muheim A, Walder R, Sanglard D, Reiser J, Schoemaker HE, Leisola MS: Purification

and properties of an aryl‐alcohol dehydrogenase from the white‐rot fungus

Phanerochaete chrysosporium. Eur J Biochem 1991, 195(2):369-375.

83. Tam LT, Eymann C, Albrecht D, Sietmann R, Schauer F, Hecker M, Antelmann H:

Differential gene expression in response to phenol and catechol reveals different

metabolic activities for the degradation of aromatic compounds in Bacillus subtilis.

Environmental Microbiology 2006, 8(8):1408-1427.

84. Jeffries TW: Biodegradation of lignin-carbohydrate complexes. Biodegradation 1990,

1(2-3):163-176.

85. Butler J, Buckerfield J: Digestion of lignin by termites. Soil Biology and Biochemistry

1979, 11(5):507-513.

86. Dowd PF, Lagrimini LM: The role of peroxidase in host insect defenses. Advances in

Insect Control 1997:221.

87. Fretland AJ, Omiecinski CJ: Epoxide hydrolases: biochemistry and molecular

biology. Chemico-biological interactions 2000, 129(1-2):41-59.

88. Dunn JP, Potter DA, Kimmerer TW: Carbohydrate reserves, radial growth, and

mechanisms of resistance of oak trees to phloem-boring insects. Oecologia 1990,

83(4):458-468.

89. Taylor AM, Gartner BL, Morrell JJ: Heartwood formation and natural durability—a

review. Wood and Fiber Science 2002, 34(4):587-611.

90. Lee K, Berenbaum MR: Action of antioxidant enzymes and cytochrome P‐450

monooxygenases in the cabbage looper in response to plant phototoxins. Arch Insect

Biochem 1989, 10(2):151-162.

91. Bernhardt R: Cytochromes P450 as versatile biocatalysts. Journal of Biotechnology

2006, 124(1):128-145.

92. Zhu F, Moural TW, Shah K, Palli SR: Integrated analysis of cytochrome P450 gene

superfamily in the red flour beetle, Tribolium castaneum. BMC Genomics 2013,

14(1):174.

93. Tribolium Genome Sequencing Consortium: The genome of the model beetle and pest Tribolium castaneum.

Nature 2008, 452(7190):949-955.

94. Scott JG, Wen Z: Cytochromes P450 of insects: the tip of the iceberg. Pest

Management Science 2001, 57(10):958-967.

95. Ishida Y, Leal WS: Rapid inactivation of a moth pheromone. Proceedings of the

National Academy of Sciences of the United States of America 2005, 102(39):14075-

14079.

96. Oakeshott J, Claudianos C, Newcomb R, Russell R: Biochemical genetics and genomics

of insect esterases. 2005.

97. Lindroth RL: Host plant alteration of detoxication activity in Papilio glaucus glaucus.

Entomologia experimentalis et applicata 1989, 50(1):29-35.

98. Teese MG, Campbell PM, Scott C, Gordon KH, Southon A, Hovan D, Robin C, Russell

RJ, Oakeshott JG: Gene identification and proteomic analysis of the esterases of the

cotton bollworm, Helicoverpa armigera. Insect Biochem Mol Biol 2010, 40(1):1-16.

99. Boeckler GA, Gershenzon J, Unsicker SB: Phenolic glycosides of the Salicaceae and

their role as anti-herbivore defenses. Phytochemistry 2011, 72(13):1497-1509.

100. Kostaropoulos I, Papadopoulos AI, Metaxakis A, Boukouvala E, Papadopoulou-

Mourkidou E: Glutathione S–transferase in the defence against pyrethroids in

insects. Insect Biochemistry and Molecular Biology 2001, 31(4):313-319.

101. Enayati AA, Ranson H, Hemingway J: Insect glutathione transferases and insecticide

resistance. Insect Molecular Biology 2005, 14(1):3-8.

102. Wadleigh RW, Yu SJ: Metabolism of an organothiocyanate allelochemical by

glutathione transferase in three lepidopterous insects. Journal of Economic

Entomology 1988, 81(3):776-780.

103. Lee K: Glutathione S-transferase activities in phytophagous insects: induction and

inhibition by plant phototoxins and phenols. Insect Biochemistry 1991, 21(4):353-361.

104. Telfer WH, Kunkel JG: The function and evolution of insect storage hexamers.

Annual Review of Entomology 1991, 36(1):205-228.

105. Cowling EB, Merrill W: Nitrogen in wood and its role in wood deterioration.

Canadian Journal of Botany 1966, 44(11):1539-1554.

106. Sasaki T, Ishikawa H: Production of essential amino acids from glutamate by

mycetocyte symbionts of the pea aphid, Acyrthosiphon pisum. Journal of Insect

Physiology 1995, 41(1):41-46.

107. Isoe J, Scaraffia PY: Urea synthesis and excretion in Aedes aegypti mosquitoes are

regulated by a unique cross-talk mechanism. PloS one 2013, 8(6):e65393.

108. Nagaoka S, Takata Y, Kato K: Identification of two arginases generated by

alternative splicing in the silkworm, Bombyx mori. Arch Insect Biochem 2011,

76(2):97-113.

109. Ryan CA: Protease inhibitors in plants: genes for improving defenses against insects

and pathogens. Annual Review of Phytopathology 1990, 28(1):425-449.

110. Bian X, Shaw BD, Han Y, Christeller JT: Midgut proteinase activities in larvae of

Anoplophora glabripennis (Coleoptera: Cerambycidae) and their interaction with

proteinase inhibitors. Arch Insect Biochem 1996, 31(1):23-37.

111. Suh SO, McHugh JV, Pollock DD, Blackwell M: The beetle gut: a hyperdiverse source

of novel yeasts. Mycological research 2005, 109(Pt 3):261-265.

112. Ha E-M, Oh C-T, Bae YS, Lee W-J: A direct role for dual oxidase in Drosophila gut

immunity. Science 2005, 310(5749):847-850.

113. Korayem AM, Fabbri M, Takahashi K, Scherfer C, Lindgren M, Schmidt O, Ueda R,

Dushay MS, Theopold U: A Drosophila salivary gland mucin is also expressed in

immune tissues: evidence for a function in coagulation and the entrapment of

bacteria. Insect Biochemistry and Molecular Biology 2004, 34(12):1297-1304.

114. Johannessen BR, Skov LK, Kastrup JS, Kristensen O, Bolwig C, Larsen JN, Spangfort

M, Lund K, Gajhede M: Structure of the house dust mite allergen Der f 2:

implications for function and molecular basis of IgE cross-reactivity. FEBS letters

2005, 579(5):1208-1212.

115. Pick C, Schneuer M, Burmester T: The occurrence of hemocyanin in Hexapoda. FEBS

Journal 2009, 276(7):1930-1941.

116. Kawabata T, Yasuhara Y, Ochiai M, Matsuura S, Ashida M: Molecular cloning of

insect pro-phenol oxidase: a copper-containing protein homologous to arthropod

hemocyanin. Proceedings of the National Academy of Sciences 1995, 92(17):7774-7778.

117. King AJ, Cragg SM, Li Y, Dymond J, Guille MJ, Bowles DJ, Bruce NC, Graham IA,

McQueen-Mason SJ: Molecular insight into lignocellulose digestion by a marine

isopod in the absence of gut microbes. Proceedings of the National Academy of

Sciences of the United States of America 2010, 107(12):5345-5350.

118. Dixit R, Arakane Y, Specht CA, Richard C, Kramer KJ, Beeman RW, Muthukrishnan S:

Domain organization and phylogenetic analysis of proteins from the chitin

deacetylase gene family of Tribolium castaneum and three other species of insects.

Insect Biochemistry and Molecular Biology 2008, 38(4):440-451.

119. Scully ED, Hoover K, Carlson J, Tien M, Geib SM: Proteomic analysis of Fusarium

solani isolated from the Asian longhorned beetle, Anoplophora glabripennis. PloS

one 2012, 7(4):e32990.

120. Geib SM, Scully ED, Jimenez-Gasco MdM, Carlson JE, Tien M, Hoover K:

Phylogenetic analysis of Fusarium solani associated with the Asian longhorned

beetle, Anoplophora glabripennis. Insects 2012, 3(1):141-160.

121. Martin MMC: The evolution of insect-fungus associations: from contact to stable

symbiosis. American zoologist 1992, 32(4):593-605.

Table 4-1. 454 and Sanger EST gut transcriptome libraries from herbivorous insects.

Order        Family         Genus           Food            Number of reads  NCBI Accession
Blattodea    Rhinotermidae  Coptotermes     Wood            142,738          345139168-345281906
Blattodea    Rhinotermidae  Reticulitermes  Wood            63,680           197217790-197281650
Coleoptera   Buprestidae    Agrilus         Phloem          126,185          SRX018276
Coleoptera   Cerambycidae   Anoplophora     Wood            277,000          N/A
Coleoptera   Chrysomelidae  Callosobruchus  Stored product  909,444          SRR037004
Coleoptera   Chrysomelidae  Chrysomela      Leaf            277,780          SRR037007
Coleoptera   Chrysomelidae  Gastrophysa     Grasses         1,234,472        SRR037001
Coleoptera   Chrysomelidae  Leptinotarsa    Leaf            839,061          SRR037005
Coleoptera   Curculionidae  Dendroctonus    Phloem          24,880           254003271-254028151
Coleoptera   Curculionidae  Sitophilus      Stored product  926,752          SRR037006
Diptera      Cecidomyiidae  Mayetiola       Sap             118,107          259155064-259273171
Hemiptera    Delphacidae    Peregrinus      Sap             120,595          282603884-282720983
Hymenoptera  Apidae         Apis            Pollen          622,279          SRX025528
Lepidoptera  Crambidae      Ostrinia        Leaf            24,103           254003208-254027311
Lepidoptera  Sphingidae     Manduca         Leaf            432,961          SRR017588

Table 4-2. Transcriptome assembly and annotation metrics from herbivorous insects included in glycoside hydrolase and Pfam comparisons.

All values are numbers of reads and isotigs. Columns: Unique = unique reads and isotigs; Insect = insect reads and isotigs; ncRNA = classified as ncRNA; GO = with Gene Ontology assignments; KEGG = with KEGG assignments; Pfam = with Pfam domains.

Genus           Unique  Insect  ncRNA  GO*    KEGG*  Pfam*
Coptotermes     33,662  5,728   91     2,187  592    2,691
Reticulitermes  8,616   2,112   312    1,141  352    1,665
Agrilus         29,773  8,614   1,119  4,069  1,230  4,715
Anoplophora     20,587  9,109   78     4,331  1,630  4,591
Chrysomela      48,675  14,606  48     6,004  1,925  7,888
Gastrophysa     45,675  14,783  66     5,720  1,711  10,532
Leptinotarsa    81,335  20,037  57     4,385  1,669  5,764
Callosobruchus  68,336  20,880  84     7,047  2,045  13,703
Dendroctonus    4,328   2,876   130    1,499  423    2,240
Sitophilus      56,592  17,846  96     6,278  1,869  11,796
Mayetiola       3,895   2,225   536    1,409  441    1,834
Peregrinus      19,297  8,740   55     3,957  1,014  6,483
Apis            42,370  17,008  179    5,602  1,882  7,110
Ostrinia        2,713   1,670   84     2,394  219    1,419
Manduca         82,488  13,638  179    6,210  1,957  7,439


Table 4-3. Assembly metrics for A. glabripennis midgut transcriptome assembly generated using 454 reads only.

Metric           Contig  Isotig
Min length (nt)  200     204
N80 length (nt)  510     592
N50 length (nt)  907     1076
N20 length (nt)  1577    1753
Max length (nt)  5701    5701
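The N80, N50, and N20 lengths reported in Table 4-3 can be computed directly from a list of contig or isotig lengths. The following is a minimal sketch, assuming the conventional definition in which the Nxx length is the length of the shortest sequence in the smallest set of longest sequences that together contain at least xx% of the assembled bases. The function name `nxx_length` and the toy lengths are hypothetical illustrations and are not taken from the Newbler output.

```python
def nxx_length(lengths, percent):
    """Return the Nxx length for a collection of sequence lengths.

    Sequences are visited from longest to shortest; the Nxx length is the
    length of the sequence at which the running total of bases first reaches
    `percent` percent of all assembled bases (percent=50 gives the N50).
    """
    if not lengths:
        raise ValueError("no sequence lengths supplied")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")

    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length


if __name__ == "__main__":
    # Hypothetical toy assembly (not real data): five sequences, 2,000 nt total.
    toy_lengths = [200, 300, 400, 500, 600]
    for pct in (80, 50, 20):
        print(f"N{pct} length (nt): {nxx_length(toy_lengths, pct)}")
```

Under this definition the N20 length is always at least as large as the N50 length, which in turn is at least as large as the N80 length, which is the ordering seen in Table 4-3.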

Table 4-4. Annotation statistics for A. glabripennis midgut transcriptome 454 assembly.

Number of Unique Reads and Isotigs                    20,587
Number of Reads and Isotigs with BLASTX Hits          10,030
Number of Insect Reads and Isotigs                    9,109
Number of Bacterial Reads and Isotigs                 45
Number of Fungal Reads and Isotigs                    30
Number of rRNAs                                       78
Number of tRNAs                                       0
Number of Insect Reads/Contigs with GO Assignments    4,331
Number of Insect Reads/Contigs with KEGG Assignments  1,633
Number of Insect Reads/Contigs with Pfam Assignments  7,202

Table 4-5. Assembly metrics for A. glabripennis midgut transcriptome Illumina-454 co-assembly.

Number of transcripts       42,085
Min transcript length (nt)  200
N80 transcript length (nt)  439
N50 transcript length (nt)  945
N20 transcript length (nt)  2,407
Max transcript length (nt)  32,701

Table 4-6. Annotation statistics for A. glabripennis midgut transcriptome Illumina-454 hybrid assembly.

Number of Unigenes                               42,085
Number of Unigenes with BLASTP Hits              13,892
Number of Insect Unigenes                        9,959
Number of rRNAs                                  341
Number of tRNAs                                  70
Number of Insect Unigenes with GO Assignments    5,066
Number of Insect Unigenes with KEGG Assignments  1,685
Number of Insect Unigenes with Pfam Assignments  7,688

Table 4-7. Cytochrome P450 annotations from A. glabripennis midgut transcriptome.

Columns: protein ID; length (amino acids); accession number of the highest scoring BLAST alignment; organism; % amino acid identity; clan; putative CYP450 family assignment; full or partial CDS.

Protein ID  Length (aa)  Top BLAST hit  Organism         % identity  Clan  CYP450 family  CDS
m.43        203          EFA02818       T. castaneum     55%         3     CYP6B          5’ Partial
m.576       221          EFA07581       T. castaneum     54%         Mito  CYP12H         5’ Partial
m.2790      273          AAZ94271       L. decemlineata  53%         3     CYP6B          3’ Partial
m.2949      135          EFA12856       T. castaneum     53%         3     CYP345         Internal Partial
m.4110      203          EFA05693       T. castaneum     54%         3     CYP6B          3’ Partial
m.4694      288          EEZ97722       T. castaneum     62%         Mito  CYP314         Internal Partial
m.4920      134          EFA04616       T. castaneum     60%         4     CYP4B          Internal Partial
m.5185      243          EFA02818       T. castaneum     44%         3     CYP6B          3’ Partial
m.5186      327          ADH29767       T. castaneum     43%         3     CYP6B          Internal Partial
m.5187      430          ADH29767       T. castaneum     44%         3     CYP6B          5’ Partial
m.5554      462          EFA10756       T. castaneum     59%         4     CYP4Q          Complete
m.5960      494          NP_001034529   T. castaneum     55%         4     CYP4Q          Complete
m.5973      415          XP_001809620   T. castaneum     41%         4     CYP347         Complete
m.6105      269          EFA05693       T. castaneum     52%         3     CYP6B          3’ Partial
m.6106      194          EFA12631       T. castaneum     67%         3     CYP6B          5’ Partial
m.6149      509          EFA12857       T. castaneum     44%         3     CYP345         5’ Partial
m.6332      450          XP_972348      T. castaneum     55%         3     CYP9           Complete
m.6566      194          EFA02818       T. castaneum     47%         3     CYP6B          Complete
m.6961      509          EFA12636       T. castaneum     58%         3     CYP6B          Complete
m.7294      504          EFA12856       T. castaneum     54%         3     CYP345         Complete
m.7332      283          EFA12632       T. castaneum     55%         3     CYP6B          Internal Partial
m.7528      433          EFA04535       T. castaneum     48%         4     CYP4B          Internal Partial
m.7574      441          ADH29761       T. castaneum     53%         3     CYP6B          Complete
m.7944      199          AAP94193       T. castaneum     58%         4     CYP4Q          5’ Partial
m.7945      226          AAF70178       T. castaneum     53%         4     CYP4Q          Complete
m.8049      512          EFA02821       T. castaneum     54%         3     CYP6B          Complete
m.8168      513          EFA02819       T. castaneum     53%         3     CYP6B          Complete
m.8218      502          EFA01323       T. castaneum     62%         4     CYP4B          Complete
m.8553      320          EFA09242       T. castaneum     52%         3     CYP9Z          Complete
m.8554      369          EFA09242       T. castaneum     52%         3     CYP9Z          Complete
m.8555      529          EFA09242       T. castaneum     52%         3     CYP9Z          Complete
m.9305      514          AAZ94272       L. decemlineata  58%         3     CYP6B          5’ Partial
m.9306      308          AAZ94272       L. decemlineata  58%         3     CYP6B          3’ Partial
m.9307      224          AAZ94272       L. decemlineata  57%         3     CYP6B          3’ Partial
m.9308      446          AAZ94272       L. decemlineata  60%         3     CYP6B          Complete
m.9560      513          AAZ94272       L. decemlineata  60%         3     CYP6B          Complete
m.9606      332          EFA10753       T. castaneum     48%         4     CYP4Q          3’ Partial
m.9607      497          EFA10753       T. castaneum     55%         4     CYP4Z          Complete
m.9803      370          XP_9696331     T. castaneum     54%         3     CYP6B          Complete
m.9916      481          EFA12632       T. castaneum     54%         3     CYP6B          5’ Partial
m.9917      339          EFA12626       T. castaneum     60%         3     CYP6B          Complete
m.9918      360          AAZ94272       L. decemlineata  59%         3     CYP6B          Complete
m.9919      480          EFA12634       T. castaneum     55%         3     CYP6B          5’ Partial
m.10025     360          EFA07581       T. castaneum     54%         Mito  CYP12H         Complete
m.10234     506          EFA12628       T. castaneum     52%         3     CYP6B          Complete
m.10236     390          EFA12628       T. castaneum     53%         3     CYP6B          5’ Partial
m.10796     118          EFA07581       T. castaneum     53%         Mito  CYP12H         5’ Partial
m.12025     147          AAZ94272       L. decemlineata  56%         3     CYP6B          3’ Partial
m.12358     270          XP_970699      T. castaneum     34%         N/A   N/A            5’ Partial
m.13759     245          EFA12627       T. castaneum     54%         3     CYP6B          Internal Partial

Figure 4-1. Histogram of isotig lengths generated from 454 FLX reads. A total of 232,824 shotgun reads were assembled with Newbler into 1,658 isotigs (1.4 Mb), which represent assembled transcripts. The N50 isotig length was 1,076 nt.

Figure 4-2. Histogram of transcript lengths generated from 454 FLX and Illumina paired-end reads using Trinity. Co-assembly of 454 shotgun reads and Illumina paired-end sequences using Trinity yielded 42,085 transcripts (31 Mb) ranging in length from 200 to 32,701 nt with an N50 transcript length of 945 nt. This represented a substantial improvement over the 454 assembly and generated more full-length transcripts in comparison.

Figure 4-3. Relative abundance of the 25 most abundant Pfam domain assignments in the A. glabripennis midgut transcriptome assembly. Unigenes generated from the 454/Illumina assembly were scanned against Pfam HMMs. Over 7,500 unigenes contained Pfam domains.

Figure 4-4. KOG assignments for midgut unigenes. Unigenes detected in the A. glabripennis midgut transcriptome were assigned to 24 different eukaryotic clusters of orthologous gene (KOG) categories. Overall, the majority of the unigenes were assigned to the signal transduction mechanisms and the general functional prediction only KOG categories.

Figure 4-5. Distribution of glycoside hydrolase families found in the A. glabripennis midgut transcriptome. Approximately 180 unique unigenes assigned to 14 different glycoside hydrolase families were detected. The most dominant families were GH 1 and 18, while GH 15, 16, 45, and 48 were each represented by a single unigene.

Figure 4-6. Proposed mechanisms for direct utilization of ammonia detected in the A. glabripennis midgut. Several methods for re-incorporating ammonia produced through nucleotide or amino acid deamination reactions into amino acids and nucleosides were reconstructed based on the presence of unigenes in the A. glabripennis midgut transcriptome. Ammonia can be directly integrated into glutamine by glutamine synthetase and into asparagine by aspartate ammonia ligase. Glutamine can subsequently be converted into glutamate by microbial-derived glutamate synthase detected in the A. glabripennis midgut metagenome. Glutamate could subsequently be converted into either proline, by a series of reactions catalyzed by γ-glutamyl kinase, dehydrogenase, aminotransferase, and pyrroline-5-carboxylate reductase, or aspartate, via a transaminase reaction. Furthermore, aspartate can be converted into inosine, which can be directly used for synthesis of purine nucleotides.

Figure 4-7. Structure of hemocyanin enzymes detected in the A. glabripennis midgut transcriptome. Seven unigenes predicted to encode hemocyanins were detected. Unlike many insect hemocyanins, the hemocyanins produced by A. glabripennis had intact and presumably functional copper-binding domains. These unigenes were predicted to be involved in mediating interactions with gut microbes.

Figure 4-8. Multivariate comparison of glycoside hydrolase families detected in the gut transcriptomes of herbivorous insects. Principal components analysis was conducted to plot the abundance of each glycoside hydrolase family detected in the gut transcriptomes of wood-feeding insects. PCA values were plotted in sample space and variables are displayed as vectors on the PCA biplot. PCA axis 1 and PCA axis 2 explain 26.2% and 18.1% of the variation in the data, respectively. Monte Carlo Permutation Procedure (n=1000 iterations): p<0.0001 for PCA 1 and PCA 2. No major clustering of insects feeding on similar food sources was detected.

Figure 4-9. Phylogenetic analysis of GH 5 cellulases detected in the Coleoptera. Unrooted maximum likelihood tree for insect and nematode derived GH 5 family proteins was constructed with Garli (version 2.0) using the WAG [122] + I + F + G evolutionary model (n=500 bootstrap replicates). Fully resolved bootstrap consensus trees were compiled using SumTrees (version 3.3.1) and bootstrap values > 50 are displayed. Nodes are annotated with NCBI protein database accession numbers. Scale bar represents branch lengths.

Figure 4-10. Phylogenetic analysis of GH 48 cellulases detected in the Coleoptera. Unrooted maximum likelihood tree for insect and bacterial derived GH 48 family proteins was constructed with Garli (version 2.0) using the LG [123] + I + F + G evolutionary model (n=500 bootstrap replicates). Fully resolved bootstrap consensus trees were compiled using SumTrees (version 3.3.1) and bootstrap values > 50 are displayed. Nodes are annotated with NCBI protein database accession numbers. Scale bar represents branch lengths.

Figure 4-11. Phylogenetic analysis of GH 45 cellulases detected in the Coleoptera. Unrooted maximum likelihood tree for Coleopteran-derived GH 45 family proteins was constructed with Garli (version 2.0) using the WAG [122] + G evolutionary model (n=500 bootstrap replicates). Fully resolved bootstrap consensus trees were compiled using SumTrees (version 3.3.1) and bootstrap values > 50 are displayed. Nodes are annotated with NCBI protein database accession numbers. Scale bar represents branch lengths.

Figure 4-12. Two-way cluster analysis of level four gene ontology terms from herbivorous insect gut transcriptomes. Two-way cluster analysis of level four gene ontology terms was performed to identify potential correlations between abundances of level four GO terms and feeding niche. Clustering did not appear to be driven by food source or phylogenetic relatedness. However, the A. glabripennis midgut transcriptome is notably distinct from the other gut transcriptomes included in this comparison.

Chapter 5

Metatranscriptome Analysis and Community Profiling of Microbes Associated with the Asian Longhorned Beetle (Anoplophora glabripennis) Midgut: Insights into Insect-Microbe Interactions and Nutritional Ecology

Abstract

The gut microbial communities associated with xylophagous beetles are taxonomically rich and predominantly comprised of taxa that are poised to help the insect survive in woody tissues devoid of essential nutrients and high in plant defensive compounds. However, the precise contributions of these gut microbiota to digestive physiology and nutritional ecology remain uncharacterized in many lineages, and efforts to define them are largely hampered by the sizeable number of facultative symbionts and the dynamic nature of community composition. In this study, we profiled the bacterial and fungal communities of the larval Anoplophora glabripennis midgut, a community hypothesized to work in tandem with the beetle to digest lignocellulose and acquire essential nutrients while feeding in the nutrient-deficient heartwood of host trees. We documented operational taxonomic units (OTUs) that were shared among multiple beetles collected from a single population as well as rare OTUs that were present in low abundances and/or not shared among multiple larval midguts. We found that a subset of both shared and rare OTUs were transcriptionally active in the midgut, suggesting that abundance or persistence in the midgut is not necessarily correlated with metabolic activity. Furthermore, transcripts derived from shared and rare OTUs encoded pathways for xylose and pentose sugar utilization and the synthesis of essential nutrients, including amino acids, sterols, and fatty acids, suggesting that they are capable of making contributions to digestive physiology in this insect. These findings, combined with the recently published metagenome of the A. glabripennis midgut and beetle gut transcriptome, will permit a deeper understanding of the physiological interactions between gut symbionts and a cerambycid that develops in the heartwood of a broad range of apparently healthy deciduous trees.

Introduction

Interactions with microbes through symbiosis are pervasive phenomena in insects, which comprise one of the most ecologically versatile lineages on the planet [1]. These interactions drive colonization of novel niches, promote survival under harsh environmental conditions, provide sources of exogenous genetic material, and stimulate rapid niche expansion and adaptation [2-4]. Insects have formed intricate relationships with a phylogenetically diverse array of microbes, including bacteria from over ten phyla, archaea, fungi, and protists [5-7]. The taxonomy, composition, and diversity of insect-associated communities can vary tremendously, even in communities sampled from the same insect species [8]. Despite these variations, the relative importance of symbionts to nutrition and digestive physiology has been repeatedly demonstrated in many phytophagous insect lineages and includes manipulation of host plant physiology to modulate expression of defense-related genes, direct detoxification of ingested plant defensive compounds, synthesis of essential nutrients, nitrogen fixation and recycling, tolerance to extreme environmental conditions, digestion of plant cell wall components, and protection from pathogenic microbes [1, 9-11]. Taken together, microbes have made undeniable contributions to the success and robustness of this group of organisms.

Microbial communities associated with insects range in complexity from relatively simple and static intracellular bacterial communities [12, 13] to highly complex and dynamic communities associated with wood-feeding beetles, including members of the family Cerambycidae [7]. In these diverse systems, which are often dominated by facultative symbionts and have high degrees of plasticity in terms of community richness and composition [14], it is difficult to define the contributions of microbes that directly enhance fitness of the host. Metagenomic approaches using next generation sequencing technologies can generate data from complex microbial communities at an unprecedented scale, promoting the exploration of microbial communities of greater complexity and diversity [15] and providing unique insights into the metabolic potentials of these diverse and plastic communities. Advances in these culture-independent methods have successfully elucidated the taxonomic composition and metabolic potentials of a variety of gut microbial communities associated with phytophagous insects [6, 16-18], facilitated the discovery of new microbial taxa that have yet to be cultivated on artificial media [19], and expedited the discovery of new genes and alleles that could be exploited for industrial and pharmaceutical purposes [20]. However, one common pitfall of metagenomic approaches is that DNA from dead, transient, or metabolically inactive microbes can be sequenced and included in gene annotations, making it difficult to ascertain which taxa provide direct fitness benefits to the insect host and which taxa are inconsequential [21]. Metatranscriptome and/or metaproteome analyses can alleviate these challenges, in part, by focusing analyses on microbial taxa and genes that are actually expressed in a particular community and that may make strong candidates for contributing to nutritional ecology.

Anoplophora glabripennis is a destructive wood-boring cerambycid with a very broad host range, thriving in over 47 deciduous tree species in its native (Asia) and introduced ranges (Europe and North America) [22, 23]. This species develops in the sapwood and heartwood of apparently healthy trees and harbors a complex, diverse, and plastic gut microbiota [17, 24, 25] hypothesized to make contributions to the digestion of woody tissue and nutritional ecology. This beetle has caused millions of dollars in damage in both its introduced and native ranges [22], and understanding the interactions between A. glabripennis and its midgut microbiota could lead to improved control practices for this insect and other wood-boring cerambycids. Although the larval midgut transcriptome of A. glabripennis is inundated with endogenously produced transcripts predicted to encode cellulases, xylanases, detoxification enzymes, and enzymes that could enhance lignin degradation, gaps in the beetle digestome that could be filled by enzymes encoded in the metagenome were also detected, suggesting a role for collaboration with microbes, particularly with regard to lignin metabolism and synthesis of essential nutrients that are absent or present in low abundances in woody tissue [26].

Metagenomic analysis of microbes associated with the midgut luminal contents of larval A. glabripennis revealed a diverse microbial community predicted to contain over 300 bacterial and 18 fungal operational taxonomic units (OTUs). Many of the annotated genes in the metagenome were derived from taxa previously detected in this beetle [24, 27, 28], suggesting that a subset of these microbes are consistently associated with A. glabripennis larvae and that enzymes produced by these taxa may play integral roles in digestive physiology. Specifically, this analysis detected genes with predicted roles in producing full suites of plant cell wall degrading enzymes, fixing and recycling nitrogen, synthesizing nutrients, detoxifying plant secondary metabolites, and degrading many dominant linkages in hardwood lignin. Thus, the A. glabripennis midgut microbiota has the potential to fully liberate glucose from cellulose, convert xylose and other five-carbon sugars released from hemicellulose into compounds that can be directly used by the insect, and synthesize many of the essential nutrients that are deficient in woody tissue [17].

The purpose of the current study is to identify bacterial and fungal taxa consistently present in the larval A. glabripennis midgut through 16S and ITS amplicon sequencing of multiple insects feeding in a preferred host (Acer saccharum) and to survey the metabolic potential of transcriptionally active microbes associated with the midgut through metatranscriptome profiling. This will provide an assessment of the transcriptional activity of both shared and rare OTUs associated with the A. glabripennis midgut, and also provide a snapshot of the microbial gene expression profile in larvae feeding in the heartwood of a preferred host, allowing us to define the relative contributions of these OTUs to digestive and physiological processes. An additional purpose of this study is to gain a more comprehensive inventory of the taxonomy and metabolic potential of microbes associated with this beetle. One shortfall of the previous metagenomic survey of the A. glabripennis midgut microbiota was that only microbes associated with the midgut contents were sampled for sequencing [17], so microbes associated with the midgut epithelial cells could have been missed. However, at the time of that study, this was the only feasible approach to sequence midgut-associated microbes without high levels of host DNA contamination. In this study, deep sequencing of tissue collected from whole guts will allow us to survey the metabolic potential of microbes that may be intricately associated with the midgut tissue.

Methods

Characterization of Fungal and Bacterial Midgut Communities Using 16S and ITS Amplicon Sequencing

To better characterize the bacterial and fungal communities associated with the A. glabripennis midgut as it fed on a preferred host in its introduced range, 16S bacterial rRNA and fungal ITS amplicon libraries were constructed. Five pairs of adult A. glabripennis were allowed to mate and oviposit in potted sugar maple trees (A. saccharum) in a USDA-approved insect quarantine facility at The Pennsylvania State University (University Park, PA). In brief, the trees were planted in 25-gallon nursery containers filled with Fafard 52 pine bark medium (Fafard, Agawam, MA) and were grown at an outdoor nursery until they were 3-4 years old. Several weeks before use in experiments, the potted trees were moved into the quarantine greenhouse to allow for acclimation to greenhouse conditions. Three trees were placed in a walk-in insect cage (~3 m high, 3 m long, and 2 m wide) and five mating pairs of A. glabripennis adults were placed in the cage and allowed to mate and lay eggs. Sixty days after the eggs hatched (indicated by the first appearance of frass), four third instar larvae feeding in the heartwood were removed from the trees.

Larvae were surface sterilized in two washes of 70% ethanol followed by a single wash in sterile distilled water to remove residual ethanol. Midguts were dissected from four A. glabripennis larvae and DNA was extracted using the Power Soil DNA Isolation Kit (MoBio, Carlsbad, CA). DNA integrity and concentration were verified using the NanoDrop spectrophotometer (Thermo-Fisher, Pittsburgh, PA) and the Quant-It dsDNA assay (Invitrogen, Grand Island, NY), which was analyzed on a Qubit fluorimeter. DNA collected from each midgut was used to construct partial 16S bacterial amplicon libraries, ranging from position 27F to 907R, and full ITS amplicon libraries, ranging from ITS5 to ITS4. 454 multiplex identifiers (MIDs) and 454 Titanium library adapters were directly incorporated into the primer sequences as described previously [17]. In brief, 100 ng of DNA were added to a PCR reaction containing 1.0 µL 10X Buffer Mix (Roche, Branford, CT), 2 mM dNTPs (Roche, Branford, CT), 0.5 U Taq polymerase (Roche, Branford, CT), 5 µM forward primer (27F: 5’-AGAGTTTGATCMTGGCTCAG-3’) and 5 µM reverse primer (907R: 5’-CCCCGTCAATTCMTTTGAGTTT-3’) [30]. PCR cycling conditions were as follows: initial denaturation for 3 minutes at 94 °C, 30 cycles of 94 °C for 15 seconds, 55 °C for 45 seconds, and 72 °C for 1 minute, and a final extension at 72 °C for 8 minutes. The PCR cocktail and PCR cycling conditions for the ITS amplicon library were identical to those used to generate the bacterial library except that 5 µM forward primer (ITS5: 5’-GGAAGTAAAAGTCGTAACAAGG-3’) and 5 µM reverse primer (ITS4: 5’-TCCTCCGCTTATTGATATGC-3’) [31] were substituted and 35 PCR cycles were used. PCR products were evaluated using agarose gel electrophoresis and bands corresponding to the sizes of the desired products were eluted from the gel using the Agarose Gel Extraction Kit (Roche, Branford, CT). For the 16S amplicon libraries, products of approximately 900 bp in size were eluted from the gel. Since the length of the ITS region varies in different fungal taxa, PCR products ranging in size from 500 bp to 1800 bp were eluted from the gel and were used for library preparation. Libraries were quantified using the Qubit dsDNA assay (Invitrogen, Carlsbad, CA), samples were multiplexed, and library titers were calculated using quantitative PCR against a library standard (Kapa Systems, Woburn, MA). Approximately 5,000 reads were sequenced from each 16S bacterial library and 1,000 reads were sequenced from each ITS fungal library using 454 Titanium XLR chemistry. Raw reads from each 16S and ITS library are deposited in SRA under the accession numbers SRX367813 and SRX369139, respectively. These reads are associated with BioProjects PRJNA222386 and PRJNA222384, respectively.
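The 27F and 907R primers above each contain the IUPAC ambiguity code M (A or C), so each is synthesized as a mixture of two sequences, whereas ITS5 and ITS4 contain no ambiguity codes. The following minimal sketch lists the concrete sequences represented by each primer. The function name and the dictionary layout are illustrative assumptions; only the primer sequences come from the text.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences as given in the Methods (5' to 3').
PRIMERS = {
    "27F": "AGAGTTTGATCMTGGCTCAG",
    "907R": "CCCCGTCAATTCMTTTGAGTTT",
    "ITS5": "GGAAGTAAAAGTCGTAACAAGG",
    "ITS4": "TCCTCCGCTTATTGATATGC",
}

for name, seq in PRIMERS.items():
    variants = expand_primer(seq)
    print(name, len(variants), variants)
```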

Operational Taxonomic Unit (OTU)-Based Analysis of 16S and ITS Amplicons

16S amplicons were assigned to operational taxonomic units (OTUs) using the program mothur (version 1.27.0). Pyrosequencing flowgrams were denoised to reduce the impact of 454 homopolymer errors on OTU classification [32] and low quality amplicons with ≥80% of the bases containing a quality score of less than 25 were removed from the dataset. Chimeras were detected and removed with the program UCHIME [33], high quality reads greater than 700 bp in length were clustered into OTUs at 97% similarity, and rarefaction curves, richness estimates, and other indices of ecological diversity were computed using mothur [34]. Bacterial OTUs were taxonomically classified using the Ribosomal Database Project (RDP) Classifier [35], with an 80% confidence threshold for taxonomic classifications; sequences classified as mitochondrial or chloroplast in origin were omitted from the analysis. Before calculating richness and diversity metrics, the same number of amplicons was randomly subsampled from each library to ensure that differences in library sizes were not responsible for driving the similarities and differences documented between the communities. Datasets were also analyzed with singleton OTUs (defined as OTUs containing only single reads among all communities included in this analysis) removed. Although singleton OTUs can originate from highly transcriptionally or metabolically active OTUs [36], they can also represent PCR artifacts (e.g., chimeras) or contamination [37]. Thus, it is important to treat singleton OTUs with special consideration by confirming transcriptional or translational activity in the community before excluding them completely from the analysis.
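The two normalization steps described above, subsampling every library to the same depth and removing OTUs represented by a single read across all communities, can be sketched in a few lines. This is an illustration of the logic only, not the mothur implementation used in the study; the function names, library names, and counts are hypothetical.

```python
import random
from collections import Counter


def subsample(otu_counts, depth, seed=0):
    """Randomly draw `depth` reads without replacement from one library,
    given a mapping of OTU name -> read count."""
    reads = [otu for otu, n in sorted(otu_counts.items()) for _ in range(n)]
    if depth > len(reads):
        raise ValueError("requested depth exceeds library size")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return Counter(rng.sample(reads, depth))


def drop_singletons(libraries):
    """Remove OTUs whose total read count across ALL libraries is one."""
    totals = Counter()
    for counts in libraries.values():
        totals.update(counts)
    return {
        name: {otu: n for otu, n in counts.items() if totals[otu] > 1}
        for name, counts in libraries.items()
    }


# Hypothetical OTU tables for two larval midgut libraries.
libraries = {
    "larva1": {"OTU1": 50, "OTU2": 9, "OTU3": 1},
    "larva2": {"OTU1": 30, "OTU2": 1, "OTU4": 1},
}

# Subsample every library to the size of the smallest one.
depth = min(sum(counts.values()) for counts in libraries.values())
rarefied = {name: subsample(counts, depth) for name, counts in libraries.items()}
no_singletons = drop_singletons(libraries)
print(depth, rarefied, no_singletons)
```

Note that OTU2 is retained in the second library even though it has a single read there, because the singleton definition applies to the read total across all communities.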

For analysis of ITS amplicons, flowgrams were denoised and low quality reads were discarded as described above. High quality reads ranging from 450 bp to 850 bp in length were clustered into OTUs at 97% sequence similarity. However, the ITS region is highly prone to indel events, making it difficult to accurately align this region across distantly related taxa [38] and leading to artificial inflations in richness when using alignment-based methods for OTU classification [39]. Therefore, instead of implementing mothur’s alignment-based approach for OTU assignment, this clustering was performed using the program CD-HIT-EST [40]. The program UCHIME [33] was used to detect putative chimeras using the more highly abundant OTUs as a reference since templates for chimera detection are not currently available for the ITS region. Taxonomic classification of ITS OTUs was conducted by first comparing representative sequences from each OTU to NCBI’s non-redundant nucleotide database using BLASTN [41] to identify any ITS amplicons derived from plants or insects. After excluding insect- and plant-derived amplicons, fungal OTUs were taxonomically classified using the UNITE database [42] at a 90% confidence level and diversity and richness indices were computed using mothur. A higher confidence threshold was used for fungal classification because fungal taxa are underrepresented in the databases and it is essential to use higher stringency to ensure accuracy of predicted classifications. As in the 16S analysis, the same number of reads was subsampled from each library before computing and comparing diversity indices, and analyses were also conducted with singleton OTUs removed.
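The read-level filters used for both amplicon types (discarding a read when at least 80% of its bases have a quality score below 25, then applying a length window) can be expressed as a single predicate. The sketch below assumes that per-base quality scores are already available as a list of integers; the function name and parameters are illustrative and do not correspond to mothur options.

```python
def keep_amplicon(quals, min_len, max_len=None, min_q=25, max_low_fraction=0.80):
    """Return True if a read passes the length window and quality filter.

    quals            -- per-base quality scores for the read
    min_len, max_len -- inclusive length bounds (max_len=None means no upper bound)
    A read is rejected when the fraction of bases with quality < min_q
    is greater than or equal to max_low_fraction.
    """
    n = len(quals)
    if n == 0 or n < min_len:
        return False
    if max_len is not None and n > max_len:
        return False
    low = sum(q < min_q for q in quals)
    return low / n < max_low_fraction


# ITS amplicons: 450-850 bp window (hypothetical quality profiles).
print(keep_amplicon([30] * 500, min_len=450, max_len=850))              # passes
print(keep_amplicon([10] * 400 + [30] * 100, min_len=450, max_len=850))  # 80% low quality
# 16S amplicons: reads longer than 700 bp, no upper bound.
print(keep_amplicon([35] * 720, min_len=701))
```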

Gut pH Analysis

The pH of insect guts contributes significantly to digestion and metabolism, sometimes serving to prevent cross-linking of phenolics [43] to digestive enzymes and midgut structural proteins and solubilizing carbohydrates and lignin metabolites to make them more digestible [44]. While the pH optima of digestive serine proteinases associated with the A. glabripennis midgut have been previously documented as alkaline (>10) [45], the gut pH can differ markedly in different regions [46, 47], serving different digestive functions and providing different microhabitats for microbes that colonize the gut. To document the pH in different regions of the gut, five 3rd instars were dissected and the gut was divided into the foregut, midgut, and hindgut regions. Because the midgut region is enlarged in A. glabripennis and in most cerambycids (Figure 5-1), the pH was documented in multiple regions in the midgut. In brief, the midgut was sectioned into five pieces of approximately 5 mm in length. Each section was homogenized and the pH was determined using the InLab Ultra Micro pH electrode (Mettler Toledo, Columbus, OH).

Metatranscriptome Sequencing

For this study, RNA was sampled from two gut locations. First, total RNA was sampled from the midgut contents to identify genes from transcriptionally active microbes associated with the food bolus and to expand upon a previous metagenomic analysis in which microbes associated with the midgut contents were targeted for sequencing [17]. Second, total RNA was sampled from the intact midgut to capture RNAs expressed by microbes in this region. This is of significant interest because microbes are often observed in the A. glabripennis midgut epithelium (Figure 5-1) and are poised to directly interact with host cells. Illumina short read sequencing was chosen for this study because it alleviates many challenges associated with analyzing RNA collected from host-associated microbial communities. For example, it is difficult to isolate high quality RNA from the midgut luminal contents because the lumen is inundated with harsh digestive enzymes and displays pH and salinity extremes [48]; however, even degraded RNA can be successfully sequenced using Illumina, eliminating the need for the longer, high quality mRNAs required for 454 pyrosequencing. Separating host cells from microbial cells for targeted sequencing of microbial RNAs is also challenging [49]; however, the throughput of Illumina sequencing alleviates this, in part, by allowing samples to be sequenced to sufficient depths to detect microbial reads, even in samples dominated by host cells.

Shotgun Sequencing of mRNA from Midgut Contents Using Illumina GAIIx

Insects were reared and larvae were dissected as described above [24]. Insect midguts were dissected under sterile conditions and the peritrophic matrix was removed to enrich the sample for microbial cells associated with the food bolus. Midgut contents were flash frozen in liquid nitrogen and total RNA was immediately extracted using the Fast RNA Spin Kit for Soil (MP Biomedicals, Solon, OH). A post-RNA extraction clean-up was performed using the RNA Clean and Concentrator (Zymo, Irvine, CA) to remove residual salts and phenolics that co-precipitated with RNA. The sample was treated with DNase I (Zymo, Irvine, CA) to digest DNA that may have co-extracted with RNA. Sample integrity was verified using the RNA Pico Assay (Life Technologies, Carlsbad, CA) and NanoDrop (Thermo-Fisher, Waltham, MA), while the sample concentration was determined with the Quant-It RNA Assay (Life Technologies, Carlsbad, CA). Removal of DNA was confirmed using the Quant-It High-Sensitivity DNA Assay (Life Technologies, Carlsbad, CA). Insect- and microbial-derived rRNAs were depleted from the sample as described previously [26].

Overall, the total RNA recovered from the midgut contents was of poor quality (RNA integrity score of 5, on a scale ranging from 1 to 10) and the amount of recovered RNA was low due to the harsh conditions within the midgut lumen. To obtain sufficient RNA for sequencing, 20 ng of enriched mRNA were amplified using Ovation RNA-Seq (NuGen, San Carlos, CA) to produce 2 µg of double-stranded cDNA. The library was sheared using Covaris (Woburn, MA), enriched for 175 nt fragments, and prepared using TruSeq Genomic DNA library adapters (Illumina, San Diego, CA). Approximately 40 million 130 x 130 nt paired end reads (10.6 Gb) were sequenced using the Illumina GAIIx platform. An overlapping paired end library was constructed to improve the reliability of the assembly by extending the raw Illumina read lengths from 130 nt to 175 nt, reducing the likelihood of cross-assembling reads from orthologous genes from different microbial taxa [15]. Raw Illumina reads are deposited in NCBI’s Sequence Read Archive under the accession number SRX352195.

Shotgun Sequencing of mRNA Collected from Midgut Tissue using Illumina HiSeq

Five 3rd instar A. glabripennis midguts were collected and dissected, total RNA was extracted, and ribosomal RNA was depleted from the sample as described above. The quality and concentration of the RNA recovered from the intact midgut were high relative to the gut contents library, so amplification prior to library construction was not necessary. Approximately 200 ng of enriched RNA were used for library preparation with the TruSeq RNA Library Prep Kit (Illumina, San Diego, CA), omitting the polyA enrichment step to enhance recovery of non-polyA bacterial mRNA. The library was enriched for 175 nt fragments so that paired end reads overlapped by 30 nt. Approximately 130 million 101 x 101 nt paired end reads (36 Gb) were generated using the Illumina HiSeq 2000 platform. Raw reads are deposited in NCBI’s Sequence Read Archive under the accession number SRX265389.

Assembly of Shotgun Metatranscriptome Data Using Trinity Assembler

Due to differences in library preparation and read lengths, the two libraries were quality filtered and assembled separately. Low quality reads, or reads with ≥ 20% of the bases possessing quality scores less than 20, were filtered from the dataset using FastX Toolkit and residual library adapters were removed using Cutadapt [50]. Remaining read pairs and orphans were assembled with the Trinity de novo assembler [51] in paired-end mode. To reduce the coverage of highly expressed genes and improve the ability to assemble transcripts and transcript isoforms originating from lowly expressed genes, k-mers (k=25) from quality filtered Illumina paired end reads were reduced to ≤ 30X coverage using digital normalization. Normalized reads were assembled with Trinity (version r2012-10-05) [51]. Trinity was selected for assembling data from reads originating from multiple bacterial and fungal taxa due to its ability to discriminate and assemble gene isoforms and splice variants in eukaryotic data. We expect that Trinity’s ability to distinguish between isoforms from the same gene will prevent cross-assembling reads originating from orthologous genes from different microbial taxa.
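The idea behind digital normalization can be illustrated with a short streaming sketch: a read is retained only if the median abundance of its k-mers, among the reads retained so far, is still below the coverage cutoff. This is a simplified illustration of the general median k-mer approach with k=25 and a 30X cutoff, not the normalization script that was actually run; it ignores read pairing and reverse complements, and the function name and toy reads are hypothetical.

```python
from collections import Counter
from statistics import median


def digital_normalize(reads, k=25, cutoff=30):
    """Keep a read only while the median count of its k-mers (among reads
    already kept) is below `cutoff`; return the list of kept reads."""
    counts = Counter()
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read shorter than k: nothing to estimate coverage from
        if median(counts[kmer] for kmer in kmers) < cutoff:
            kept.append(read)
            counts.update(kmers)  # only retained reads contribute to coverage
    return kept


# Toy data: one highly expressed transcript fragment sequenced 50 times,
# plus a single read from a lowly expressed transcript.
abundant = "ACGTTGCAAGCTTAGGCTAACCGTATCGGATCCATGCAAT"
rare = "TTGACCATGGTACCGATTACGGCATTAGCCGGATATCGCA"
kept = digital_normalize([abundant] * 50 + [rare])
print(kept.count(abundant), kept.count(rare))
```

In this toy case the abundant read is capped at the cutoff while the rare read is retained, which is the behavior that helps lowly expressed transcripts assemble.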

Annotation of Metatranscriptome Genes and Isoforms

Transcripts assembled by Trinity from both the midgut and midgut contents libraries were used in downstream annotations and analyses. Because transcript isoforms could represent transcripts derived from orthologous genes from closely related bacterial species or strains, they were not collapsed to the gene level and annotations presented here focus primarily on transcripts. SSU (small subunit) and LSU (large subunit) rRNAs were detected and removed with HMMer [52] using profiles for prokaryotic, eukaryotic, and archaeal SSU, LSU, and 5.8S/8S rRNAs [53]. Bacterial, fungal, insect, and tree SSU and LSU rRNAs detected in the transcriptome were taxonomically classified by comparison to the Silva database, a manually curated collection of high-quality, full length SSU and LSU ribosomal RNAs from all domains of life [54]. Assembled rRNAs were also compared to the 16S rRNAs detected through amplicon sequencing via BLASTN to confirm transcriptional activity of shared OTUs and also determine whether any rare OTUs were transcriptionally active in the midgut.

Due to potential homopolymer errors that can exist in the OTU sequences derived from the 454 amplicons, rRNA sequences from the transcriptome data were considered to have a significant match if they were ≥ 95% identical at the nucleotide level to an OTU detected through 454 amplicon sequencing. Fungal ITS transcripts were also detected using the program ITSx [55], which detects fungal ITS1, 5.8S rRNA, and ITS2 genes using hidden Markov models. These fungal ITS transcripts were compared to the set of fungal ITS OTUs detected in the midgut using BLASTN to determine which OTUs were transcriptionally active in the midgut. ITS sequences from the transcriptome data were likewise considered to have a significant match if they were ≥ 95% identical at the nucleotide level to an OTU detected through 454 amplicon sequencing.
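The ≥ 95% identity criterion can be sketched as a filter over BLASTN results. The sketch below assumes the alignments were exported in the standard 12-column tabular format (query, subject, percent identity, ..., bit score); the output format actually used in this study is not stated, and the function name and sequence identifiers are hypothetical.

```python
def significant_matches(blast_lines, min_identity=95.0):
    """Return the best-scoring hit per query among hits with identity >= min_identity.

    Each line is assumed to follow the 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, bitscore = float(fields[2]), float(fields[11])
        if identity < min_identity:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    return best


# Hypothetical alignments of assembled rRNA transcripts against amplicon OTUs.
example = [
    "rRNA_1\tOTU_12\t98.50\t250\t3\t1\t1\t250\t10\t259\t1e-120\t430",
    "rRNA_1\tOTU_40\t95.10\t250\t12\t0\t1\t250\t10\t259\t1e-100\t380",
    "rRNA_2\tOTU_07\t91.00\t240\t21\t1\t1\t240\t5\t244\t1e-80\t300",
]
matches = significant_matches(example)
```

In this example only rRNA_1 is retained, paired with OTU_12, since the single hit for rRNA_2 falls below the identity threshold.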

The remainder of the isoforms were annotated by a BLASTX comparison [41] to the NCBI non-redundant protein database and taxonomically classified using MEGAN metagenomic analyzer [56] to identify transcripts that were microbial in origin.

Microbial transcripts were functionally grouped into Gene Ontology terms [57] and mapped onto KEGG pathways [58] using the Trinotate pipeline and the KAAS server [59], respectively. TopGO [60] was used to identify gene ontology categories enriched in the midgut and midgut contents libraries relative to the previously assembled A. glabripennis midgut transcriptome library [26] using a Fisher’s exact test in ‘classic’ mode. Transcripts predicted to originate from prokaryotic and eukaryotic microbes were assigned to Clusters of Orthologous Genes (COGs) and KOGs [61] using RPS-BLAST [62]. BLASTX results were corroborated and glycoside hydrolase family assignments were computed [63] by scanning for Pfam A domains [64] using HmmSearch [65].
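For readers unfamiliar with the enrichment test, the one-sided Fisher’s exact test applied to a single GO term can be written directly from the hypergeometric distribution. The sketch below is a generic illustration and not the TopGO implementation; the function name, argument names, and counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(term_in_test, test_size, term_in_universe, universe_size):
    """One-sided Fisher's exact test p-value for over-representation of one term.

    Computes P(X >= term_in_test) where X follows a hypergeometric distribution:
    test_size transcripts are drawn from universe_size transcripts, of which
    term_in_universe carry the annotation.
    """
    upper = min(test_size, term_in_universe)
    favourable = sum(
        comb(term_in_universe, i) * comb(universe_size - term_in_universe, test_size - i)
        for i in range(term_in_test, upper + 1)
    )
    return favourable / comb(universe_size, test_size)


# Hypothetical counts: 5 of 5 test transcripts carry a term found in 5 of 10 overall.
p_value = fisher_enrichment_p(5, 5, 5, 10)
```

In the "classic" setting each GO term is tested independently in this way, without accounting for the parent-child structure of the ontology.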

Results and Discussion

Bacterial Community Structure

Species richness varied slightly among individual larval midgut communities included in this study and ranged from 82 to 198 OTUs using a 97% sequence similarity threshold. Combined richness for all bacterial midgut communities sampled was 317 OTUs. Measures of community complexity and diversity also varied slightly among individuals. For example, the Simpson index, representing the probability that two individuals sampled at random originated from the same OTU, ranged from 0.04 to 0.20 (Table 5-1). Rarefaction curves computed for each individual midgut community failed to reach saturation, suggesting that the full richness of the individual A. glabripennis midgut communities was not sampled for sequencing (Figure 5-2a). Community diversity measures consistently predicted the presence of over 140 OTUs in each midgut community, indicating that additional sampling could lead to the detection of additional bacterial OTUs. This hypothesis is supported by additional sequencing of Sample 1, which led to the detection of approximately 50 additional OTUs, and the previous analysis of a 16S rDNA library constructed from beetles collected at a field site in Worcester, MA (USA), which predicted the presence of over 300 OTUs [17]. Removal of singleton OTUs reduced richness and diversity estimates slightly, diminishing the range of observed OTUs to 68-143 (Table 5-1 and Figure 5-2b) and reducing the total observed richness to 156 OTUs. Although the removal of singleton OTUs did not have significant impacts on Simpson’s diversity index or Shannon-Wiener index, it did impact richness estimates, reducing the range of Chao richness indices from 141-451 to 91-201 for individual larval midgut communities. These findings indicate that including singleton OTUs in post hoc analyses can cause significant inflations in richness estimates and should be interpreted with caution, particularly when performing comparisons across multiple communities.
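The sensitivity of richness estimators to singleton OTUs can be illustrated with a short sketch. The formulas below are the commonly used finite-sample Simpson index and the bias-corrected Chao1 estimator; whether these exact forms match the calculations used for Table 5-1 is an assumption, and the OTU counts are hypothetical.

```python
def simpson_index(counts):
    """Probability that two reads drawn without replacement belong to the same OTU."""
    total = sum(counts)
    if total < 2:
        raise ValueError("at least two reads are required")
    return sum(c * (c - 1) for c in counts) / (total * (total - 1))


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-OTU read counts."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical community: five OTUs, two of which are singletons.
with_singletons = [5, 3, 1, 1, 2]
without_singletons = [c for c in with_singletons if c > 1]

richness_with = chao1(with_singletons)        # singletons raise the estimate
richness_without = chao1(without_singletons)  # estimate equals observed richness
```

Because Chao1 extrapolates unseen richness from the number of singletons, removing them collapses the estimate toward the observed OTU count, while the Simpson index, which is driven by abundant OTUs, changes comparatively little.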

The bacterial communities of A. glabripennis midguts were composed predominantly of members of the following classes: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Clostridia, Gammaproteobacteria, and Sphingobacteria (Figure 5-3). Most midgut communities contained approximately equivalent relative abundances of OTUs classified to Bacilli and Betaproteobacteria, while the relative abundances of Actinobacteria, Alphaproteobacteria, Gammaproteobacteria, and Sphingobacteria fluctuated slightly among individual beetles. However, the communities were generally dominated by OTUs assigned to the class Gammaproteobacteria. Of these, 22 OTUs were detected in association with all beetle gut communities sampled and were classified to several different families (Table 5-2). Two OTUs assigned to the family Enterobacteriaceae were highly abundant in all insect guts sampled for sequencing; although they could not be definitively classified to genus level using RDP classifier, they had highest scoring BLASTN alignments to bacteria belonging to the genus Enterobacter in the RDP database. Interestingly, unclassified members of the family Enterobacteriaceae were consistently associated with the egg, oviposition site, and A. glabripennis larvae in a previous study and were flagged as candidates for vertical transmission, possibly indicating an intimate association with the larval stage of this insect [29].

While the communities varied slightly among individual beetles, reflecting the dynamic nature of the gut community, rare and singleton OTUs accounted for over 100 of the OTUs that were detected in this study. Excluding these from the analysis increased the percentage of OTUs that were shared among at least two samples from 28.7% to 51.3% (Figures 5-4a and 5-4b), indicating that although the bacterial community is diverse, there is a smaller set of core microbial OTUs shared among individual beetles. These OTUs could originate from microbial taxa that make pivotal contributions to nutritional ecology in the A. glabripennis midgut. Hierarchical clustering analysis illustrated several bacterial OTUs whose abundances were correlated (Figure 5-5), suggesting a potential associative relationship between shared OTUs and signaling potential interactions that could contribute to digestive and physiological processes in the gut. For example, the abundances of two OTUs classified to the family Enterobacteriaceae were correlated, as were the abundances of an OTU classified to the genus Sphingobium and an OTU classified to the family Xanthomonadaceae. In addition, the heat map illustrates that the relative abundance of many of the shared OTUs varies across midgut communities.
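The effect of excluding singleton OTUs on the proportion of shared OTUs can be sketched as follows. The OTU table, the OTU names, and the definition of a singleton as an OTU with a single read across all samples are illustrative assumptions, not data or definitions taken from this study.

```python
def percent_shared(otu_table, min_samples=2, drop_singletons=False):
    """Percentage of OTUs detected in at least min_samples samples.

    otu_table maps an OTU name to its list of read counts, one per sample.
    """
    kept = {
        otu: counts
        for otu, counts in otu_table.items()
        if not (drop_singletons and sum(counts) <= 1)
    }
    if not kept:
        return 0.0
    shared = sum(
        1
        for counts in kept.values()
        if sum(1 for c in counts if c > 0) >= min_samples
    )
    return 100.0 * shared / len(kept)


# Hypothetical table: three midgut samples, two OTUs that are singletons.
table = {
    "otu_a": [4, 2, 0],
    "otu_b": [1, 0, 0],
    "otu_c": [0, 3, 5],
    "otu_d": [0, 0, 1],
}
all_otus = percent_shared(table)
core_only = percent_shared(table, drop_singletons=True)
```

Dropping singletons shrinks the denominator without removing any shared OTU, so the shared percentage can only rise, which is the same direction of change reported above.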

The taxonomic composition of midgut communities sampled from beetles in this study is relatively distinct from the midgut community derived from beetles collected in Worcester, MA, which was dominated by Bacilli and Actinobacteria [17]. Although direct OTU comparisons cannot be performed because a different region of the 16S rRNA locus was targeted for sequencing in the referenced study, several genera were detected in both communities, including Aeromicrobium, Aurantimonas, Agrococcus, Brachybacterium, Brevibacterium, Chryseobacterium, Curtobacterium, Enterococcus, Flavobacterium, Leucobacter, Microbacterium, Olivibacter, Pseudomonas, Pseudonocardia, Sphingobacterium, Staphylococcus, Stenotrophomonas, and Streptococcus. Furthermore, OTUs assigned to the genus Pseudomonas were highly abundant in both populations. Despite these differences, the midgut communities sampled for this study more strongly resembled the midgut communities of beetles collected in Brooklyn, NY because communities sampled from this population were also dominated by OTUs assigned to the genus Enterobacter and OTUs assigned to the class Gammaproteobacteria [24]. This was not unexpected because the Worcester, MA and Brooklyn, NY populations were introduced separately and originated from different beetle populations in Asia [66]. Since our colony was derived primarily from the Brooklyn, NY population, the resemblance of communities is expected. Additionally, fourth instar midguts were sampled from the Worcester, MA population while third instar midguts were used for the current study; thus, the observed differences could also be attributed to natural shifts in the composition of the gut microbiota throughout the beetle’s life cycle as has been shown for other wood-feeding insects [8]. This insect feeds in different regions of its host tree in different life stages and the relative abundances of OTUs in the midgut could fluctuate as the larvae shift from the phloem to the heartwood, although this hypothesis has not been tested.

Identification of Fungal Community

Despite the fact that amplicons derived from Acer spp. comprised over 85% of the ITS reads, we obtained a good representation of the fungal community associated with the A. glabripennis midgut. Relative to the bacterial communities associated with A. glabripennis larval midguts, the fungal ITS communities sampled were considerably less diverse and observed richness values ranged from 11 to 28 OTUs, which is in agreement with the fungal community richness detected previously through 18S amplicon sequencing [17]. Total richness among all midgut communities was 44 OTUs (Table 5-3). Rarefaction curves reached saturation and computed values for various community richness estimators were similar to the number of observed OTUs in each community, indicating sufficient sampling of the fungal community (Figure 5-6a). The Boneh estimator, which predicts the detection of additional OTUs with subsequent sampling, predicted the detection of only 1-10 subsequent OTUs, supporting the hypothesis that the sequencing depth achieved for the fungal ITS communities was adequate to detect the majority of the fungal OTUs associated with the A. glabripennis larval midgut. Additional sequencing of samples 1 and 3 led to the detection of five and ten additional OTUs, respectively. Simpson’s diversity index values for individual communities ranged from 0.10 to 0.33, reflecting slight fluctuations in the community structure at the OTU level. Removal of singleton OTUs greatly impacted rarefaction analysis (Figure 5-6b) and lowered richness and diversity estimates (Table 5-3), indicating that the majority of the fungal OTUs detected in this study were rare and could represent low abundance contaminants. Removal of these OTUs reduced total observed richness to 19 OTUs. Additionally, 21 of the 25 OTUs that were unique to a single midgut community were singleton OTUs, which may represent sequencing errors or PCR artifacts.

Fungal OTUs were exclusively assigned to the phylum Ascomycota. Furthermore, communities were consistently dominated by ITS amplicons classified to the order Hypocreales (Figure 5-7) and contained a greater percentage of OTUs consistently detected in all midgut communities sampled for sequencing relative to the bacterial communities (Figures 5-8a and 5-8b). Seven OTUs, including six assigned to the genus Fusarium and a single OTU assigned to the genus Pichia, were consistently associated with all larval midguts sampled for sequencing. Fungi belonging to the Fusarium solani species complex (FSSC) have been consistently detected in the midgut of several populations of A. glabripennis sampled for sequencing and we hypothesize that their persistence signifies their importance to digestive and physiological processes in the midgut. Furthermore, one of the OTUs detected in this analysis was 98% similar at the nucleotide sequence level to a Fusarium solani OTU detected previously from beetles sampled from our A. glabripennis colony [28], indicating persistent maintenance of these OTUs over multiple generations and also suggesting that there is likely a mechanism of vertical transmission. Maximum likelihood-based phylogenetic analysis illustrates that this OTU is more closely related to an isolate detected in our PSU colony population than Fusarium-derived OTUs detected in the Worcester, MA population (Figure 5-9) [28]. While the F. solani sequence detected in this analysis has a relatively long branch length due to other sequences assigned to this OTU, this can be attributed to 454-sequencing errors in homopolymer regions. Six OTUs with highest scoring BLASTN alignments to Fusarium oxysporum were also detected through this analysis. F. oxysporum OTUs have been detected in the A. glabripennis midgut previously (unpublished data). Because the ITS region is not phylogenetically informative within the F. oxysporum species complex [67], phylogenetic trees could not be computed.

Yeasts assigned to the genera Candida, Pichia, and Saccharomyces have been previously cultivated from the A. glabripennis midgut from beetles derived from our beetle colony and from insects collected from the Worcester population (unpublished data), suggesting an intimate association among yeasts and larval A. glabripennis.

Although 18S fungal amplicon libraries constructed from midguts of larvae collected from Worcester, MA indicated that the fungal community was dominated by yeasts in the genus Issatchenkia, this locus is not as taxonomically informative as the ITS region and even taxonomically distant relatives can have similar sequences at the 18S rRNA locus [68]. Therefore, the taxonomic assignments inferred from 18S rDNA are not as finely resolved as those obtained from analysis of the ITS region and the taxonomic classifications obtained from this analysis are likely more accurate.

Gut pH Profile

The pH of the gut varied from slightly acidic to alkaline depending on the region sampled. Although the pH of the foregut and anterior midgut were consistently 6.4-6.8 and 6.7-6.8, respectively, the pH of the medial and posterior midgut and the hindgut were considerably more alkaline at 7.9-8.1, 8.9-9.5 and 7.5, respectively. These findings signal that different digestive processes or enzyme isoforms likely occur in different regions in the gut, each with different pH optima. This also suggests that there are gut microhabitats suitable for housing different types of microbes. For example, the lactic acid bacterial and fungal taxa found in the gut are likely to be housed in the anterior midgut since the pH in this region is lower, creating a more favorable environment for survival and maintenance of these microbial taxa. Despite our ability to document pH in relatively small regions of the gut, smaller microhabitats in the gut diverticulum with steep pH gradients could also exist.

Assembly Metrics

Despite the low quality of the total RNA isolated from the midgut contents, the quality of the library was relatively high as over 65% of the Illumina paired end reads passed quality filtering. Another significant challenge of handling metatranscriptome data is the efficient depletion of rRNA from highly degraded, environmental RNA samples using subtractive hybridization approaches [69]. Despite these challenges, only 25% of the reads originated from bacterial, insect, fungal, or plant rRNAs, indicating that rRNA depletion was relatively successful. Not surprisingly, the quality of the whole gut library was exceptional as over 95% of the Illumina paired end reads passed quality filtering. Using Trinity, 161,177 transcript isoforms from 97,506 genes ranging in length from 200 nt to 31,393 nt (N50 contig length: 684 nt) were assembled from the gut contents library while 61,812 transcript isoforms from 45,418 genes ranging in length from 200 nt to 26,118 nt (N50 contig length: 592 nt) were assembled from the whole gut library (Table 5-4). The lower assembly metrics in the gut contents library are likely due to the lower quality of the total RNA collected.
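For reference, the N50 contig length reported above is the length of the shortest contig in the smallest set of longest contigs that together contain at least half of the assembled bases. A minimal sketch of the calculation, using a hypothetical function name and hypothetical contig lengths, follows.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L contain at least half of all bases."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


# Hypothetical assembly: the two 8 nt contigs already hold half of the 32 bases.
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Because N50 is weighted by length, a small number of long transcripts can raise it substantially, so it is best read alongside the transcript counts and length ranges given in Table 5-4.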

Detection of Microbial Small Ribosomal Subunits (16S/18S) and Large Ribosomal Subunits (23S/28S) in Whole Gut and Gut Contents Assemblies:

In the gut contents library, approximately 260 transcripts were classified as SSU (small ribosomal subunit) and LSU (large ribosomal subunit) rRNAs (Table 5-5). However, because subtractive hybridization may have depleted rRNAs from the sample in a biased manner depending on their nucleotide sequence similarities to the oligonucleotide probes, direct comparisons between the relative abundance of these rRNAs and the 16S rDNA detected through amplicon sequencing cannot be performed, yet their presence can be used to infer transcriptional activity of OTUs detected through shotgun sequencing. Several of these rRNAs were insect and plant in origin. The presence of plant rRNA is not unexpected since the larvae used for this study were feeding on live tree tissue; however, these rRNAs covered significantly lower percentages of their highest scoring BLASTN alignments, indicating that these rRNAs are likely of low quality and are possibly degraded in the midgut.

Consistent with their association with A. glabripennis midguts collected from our PSU colony and other A. glabripennis populations, rRNAs taxonomically classified to the genera Fusarium and Pichia were also detected previously in the transcriptome assembly compiled from gut contents [28], indicating that these microbes are transcriptionally active in the midgut. Sixteen rRNAs assigned to the order Hypocreales were detected in the gut contents assembly and the majority of these had highest scoring alignments to fungi belonging to the genus Fusarium. These included several mitochondrial rRNAs, 28S rRNAs, and 12S rRNAs. Three mitochondrial rRNAs were taxonomically classified to the genus Pichia. Other fungal taxa detected in the gut contents library included rRNAs classified to the families Debaryomycetaceae and Saccharomycetaceae and the higher order Saccharomycetales. Over 100 rRNAs originating from bacteria were also detected in the gut contents library, many of which correspond to taxa detected through 16S rDNA amplicon sequencing. Bacterial families detected through both 16S amplicon sequencing and shotgun transcriptome sequencing of midgut contents included Comamonadaceae, Corynebacteriaceae, Enterobacteriaceae, Microbacteriaceae, Rhizobiaceae, Sphingobacteriaceae, and the higher order Pseudomonadales. Two rRNAs that share 98% nucleotide sequence similarities with the two highly abundant Enterobacteriaceae OTUs associated with all larval guts included in the 16S rDNA amplicon analysis were also detected. rRNAs assigned to the genus Wolbachia were also detected through shotgun transcriptome sequencing, but not through amplicon sequencing. Some bacterial 16S rRNA genes have mismatches to universal bacterial 16S primers and are often missed by amplicon sequencing [70], so Wolbachia 16S rRNAs may not have been amplified with the primer set chosen for this study. However, PCR amplification with Wolbachia-specific (wsp) primers failed to yield products (data not shown), indicating that the PSU colony is not likely infected with Wolbachia. Furthermore, the rRNAs classified to this genus were relatively short in length (78 nt to 120 nt) and taxonomic classification of these short transcripts based on BLAST alignments alone is unreliable.

Approximately 267 rRNAs were detected in the whole gut library. Their taxonomic composition was generally similar to the taxonomic composition of rRNAs detected in the gut contents library. One notable difference between the whole gut and gut contents libraries was that the most dominant rRNA detected in the whole gut library was classified to the genus Pediococcus via comparison to the SILVA database. This rRNA was expressed at approximately the same level as insect 18S rRNA by RSEM analysis, potentially indicating high levels of transcriptional activity in the A. glabripennis gut. Other rRNAs that were found exclusively in association with the whole gut library included rRNAs classified to the phyla Deinococcus-Thermus and Basidiomycota.

Bacterial 16S and Fungal ITS OTUs Detected in Metatranscriptome Data

The expression of several shared and rare OTUs was confirmed in the midgut through BLASTN analysis. For example, rRNAs with highest scoring BLASTN alignments to eight shared OTUs were detected and included OTUs classified to the genera Novosphingobium, Propionibacterium, Pseudomonas, and Staphylococcus; the families Burkholderiaceae and Enterobacteriaceae (2 OTUs); and the order Actinomycetales. The majority of the rRNAs detected were from OTUs detected in association with only one or two midgut communities and eight rRNAs originated from rare/singleton OTUs (Table 5-6). Interestingly, the majority of the 16S rRNA transcripts constructed from the Illumina reads did not have significant BLASTN matches to OTUs documented through community profiling. In some cases, the transcripts originated from a different region of the 16S rRNA locus than the region targeted for amplicon sequencing and would not be expected to align with the OTUs detected in the amplicon analysis; however, in many cases, the transcripts did overlap the region targeted for amplicon analysis, but had low sequence identities to the 16S OTUs. This signifies that the A. glabripennis midgut community is dynamic and that the insects used for metatranscriptome sequencing likely harbored some different bacterial OTUs than the insects used for amplicon sequencing. This hypothesis is supported by the variation documented in the 16S communities associated with multiple midguts.

Not surprisingly, fungal ITS sequences were not particularly numerous in the metatranscriptome data since the ITS region is generally excised from mature rRNAs. Nonetheless, they can be present in pre-ribosomal RNAs, which could be found in lower abundances in the metatranscriptome data. Five complete ITS transcripts were identified in both the whole gut and gut contents. Two of these ITS transcripts had 100% nucleotide identity to OTUs detected through community profiling analysis. These OTUs were both classified to the genus Pichia and both were present as singleton OTUs. While transcriptional activities of other fungal OTUs could not be confirmed using this analysis, analysis of SSUs and LSUs detected in the metatranscriptome data confirmed that many of the shared and rare taxa detected through ITS community profiling were transcriptionally active (e.g. Fusarium, Pichia, Candida, and Saccharomyces).

Annotation Metrics

Of the assembled transcripts, 7,952 transcripts (2.7 Mb) and 3,167 transcripts (1.42 Mb) were identified as microbial from the gut contents and whole gut libraries, respectively, and had highest scoring BLASTX alignments to protein coding genes at an e-value of 0.00001 or lower (Table 5-7). Transcripts in the midgut library were assigned to six bacterial and eight fungal orders while transcripts in the midgut contents library were assigned to 14 different bacterial orders and five fungal orders (Figure 5-10). In both libraries, Saccharomycetales was the dominant fungal order while the bacterial orders Enterobacteriales and Lactobacillales were dominant in the midgut contents and midgut libraries, respectively. COG analysis of bacterial transcripts containing putative protein coding regions from the whole gut and gut contents libraries indicated that both libraries were dominated by genes assigned to the following functional categories (Figure 5-11): general functional prediction only; inorganic ion transport and metabolism; translation, ribosomal structure and biogenesis; and amino acid transport and metabolism. In contrast, the fungal communities were dominated by the following KOG categories (Figure 5-12): signal transduction mechanisms; general functional prediction only; translation, ribosomal structure and biogenesis; and energy production and conversion. Although slight differences in the relative abundances of various COG and KOG categories could be noted between the gut contents and whole gut libraries, the relative abundances of each category were very similar between the two libraries. Because no major differences were observed in the abundances of COG/KOG categories in the midgut and midgut contents libraries, annotations from both libraries were combined for downstream analyses.

At the Pfam level, domains for sugar transporters, glycoside hydrolase family 28 polygalacturonidases, hydrolases, and β-1,3 glucan synthesis were highly abundant in the fungal transcripts, indicating that the fungal community is capable of assimilating sugar residues liberated from host tree polysaccharides and degrading plant polysaccharides occasionally found impregnating woody tissue (e.g., pectin). The preponderance of domains involved in the synthesis of β-1,3 glucans is also significant in that it confirms that the fungal taxa detected in the midgut are actively growing and developing in the midgut, suggesting metabolic activity. It also can explain the overabundance of insect-derived transcripts predicted to encode enzymes involved in the decomposition of β-1,3 and β-1,6 glucans detected previously in the A. glabripennis midgut [26] relative to the transcriptome libraries sampled from the guts of other phytophagous insects, suggesting that the insect could acquire nutrients by breaking down fungal-derived β-1,3-glucans.

Furthermore, domains predicted to encode NAD-specific glutamate dehydrogenases, taurine dioxygenases, glutamine synthetases and amino acid permeases were also highly abundant in the fungal transcripts, suggesting that fungi associated with the gut are capable of making contributions to nitrogen economy in the midgut. While glutamate dehydrogenase can convert glutamate to α-ketoglutarate and the ammonia liberated by this reaction is usually converted to urea for excretion, it can also catalyze the reverse reaction under ammonia-rich conditions, recycling α-ketoglutarate waste products produced by A. glabripennis or its gut microbes back into glutamate, which can be used for protein synthesis or production of other α-ketoamino acids and reducing the need to acquire protein directly from the nitrogen-deficient environment. Glutamine synthetases can also have integral roles in the nitrogen economy by incorporating ammonia liberated through deamination of amino acids and nucleic acids into glutamine instead of converting it to waste products for excretion. Taurine dioxygenase may also contribute to the nitrogen economy in the gut because it can actively break down the organic acid taurine, which is one of the most prominent free organic acids in both hemolymph and insect tissues, acting as a potent antioxidant and functioning in numerous physiological processes including synaptic transmission, calcium homeostasis, regulation of intracellular ions, and membrane stability. Taurine dioxygenases can catalyze the conversion of taurine to sulfite, aminoacetaldehyde, carbon dioxide, and succinate, allowing it to be utilized as a source of nitrogen and sulfur.

In the bacterial community, domains for ABC transporters, hydrolases, major facilitator superfamily transporters, phosphotransferase systems, and family 32 glycoside hydrolases were highly abundant, suggesting that bacteria are also capable of assimilating and metabolizing sugars released from polysaccharides present in woody tissue. Of particular interest are the transcripts predicted to encode components of phosphotransferase systems as many of these have predicted abilities to internalize cellobiose, xylose, mannose, galactose, and other sugars present in woody tissue. While A. glabripennis can endogenously metabolize and use many of the hexose sugar substrates for energy production, members of the bacterial community may metabolize or ferment these substrates, converting them into compounds (e.g. pyruvate, acetate, and oxaloacetate) that can be used in other biosynthetic processes, including amino acid, fatty acid, and sterol synthesis. The majority of the glycoside hydrolase family 32 transcripts were predicted to encode , which can metabolize a wide variety of sugars encountered in the beetle gut, including sucrose, maltose, and perhaps trehalose, a principal component of insect hemolymph. Additionally, like the fungal community, the bacterial community was also rich in transcripts encoding glucan biosynthesis domains, indicating that some of the bacteria found in the midgut are actively growing and potentially providing alternate sources of carbon for A. glabripennis and other members of the gut community. Aldo-keto reductase domains were also highly abundant in the bacterial transcripts. Moreover, these were highly abundant in both the A. glabripennis midgut transcriptome [26] and midgut metagenome [17]; they were hypothesized to serve key roles in lignin degradation as they can enhance the cleavage of β-aryl ether linkages in the presence of other lignin degrading enzymes. Aside from lignin degradation, aldo-keto reductases are also integrally involved in detoxification, steroid biosynthesis, metabolism of monosaccharides, and a variety of other diverse oxidoreductive processes so their roles in the midgut could be multifunctional. Like the fungal community, the bacterial community contained domains with predicted involvement in the recycling of nitrogenous waste products. For example, xanthine uracil permease domains with the potential to assimilate xanthine and other products of purine metabolism and phosphoribosyltransferase domains, which participate in the purine salvage pathway, were highly abundant, suggesting a role in recycling waste products of purine nucleotides. Finally, serine dehydrogenase proteinases were highly abundant, which often function as stress inhibitors in bacteria [26].

GO Enrichment and KEGG Pathway Analysis

To further demonstrate how the gut community can augment or complement endogenous digestive and physiological processes catalyzed by beetle-derived enzymes, GO enrichment analysis was conducted on microbial derived transcripts relative to the previously assembled A. glabripennis midgut transcriptome library using TopGO. The top 100 terms with p-values less than 0.05 were reported (Benjamini–Hochberg False Discovery Rate=0.05). In the biological process GO category, the microbial community was enriched for GO terms involved in carbohydrate digestion and sugar assimilation. It was also enriched for biological process terms involved in aromatic compound biosynthetic processes and categories containing genes involved in the positive regulation of nitrogen compound metabolic processes, suggesting the community may play roles in the synthesis of essential aromatic amino acids and nitrogen economy in the midgut.

Terms involved in mediating host-microbe interactions, including interspecies interactions (GO:0044419), host interactions (GO:0051651) and symbiosis (GO:0044403), were also enriched in the microbial community, implying that the community is interacting with the insect host (or the tree) in vivo. For example, several transcripts with predicted involvement in insect-interactions were noted and included bacterial transcripts with predicted chitin binding proteins that may function to allow these bacteria to bind to the peritrophic matrix or midgut epithelium. Transcripts predicted to encode various adhesion proteins and proteins involved in the formation of biofilms were also detected, supporting the hypothesis that these microbes are actively colonizing the midgut. Further, transcripts predicted to encode hemolysins, enzymes involved in sialic acid metabolism, which is a predominant component of animal tissue, and enzymes involved in N-acetylglucosamine metabolism, suggest that these gut associated bacteria are capable of obtaining nutrients directly from A. glabripennis. An additional finding was the presence of several bacterial transcripts predicted to encode histone deacetylase proteins. While their role in multicellular organisms is to deacetylate histones, thereby modifying chromatin structure and regulating gene expression, their roles in bacteria are uncharacterized since bacteria lack histones. However, modifications to host histones have been observed in response to microbial colonization, and may function to mediate host gene expression to modulate immune responses [71, 72]. While the function of histone deacetylases is beginning to be understood in pathogenic bacteria, the function of these enzymes in symbioses is uncharacterized. In the molecular function category, GO terms associated with polygalacturonidase and phosphotransferase activities were enriched in the midgut community, emphasizing that degradation of pectin and assimilation and subsequent metabolism of sugars released from the degradation of polysaccharides found in woody tissue may be processes associated with gut microbes.

GO terms associated with acetyl coA carboxylase activity were also enriched. This enzyme catalyzes the conversion of acetyl coA to malonyl coA, a valuable substrate for fatty acid biosynthesis that is not found in high abundance in woody tissue. Thus, microbes could provide the substrates for fatty acid biosynthesis to A. glabripennis. UDP-glucuronosyltransferases were enriched in the gut community and can play pivotal roles in the metabolism of steroids or the conjugative deactivation of toxic compounds, such as the secondary metabolites that often accumulate to high concentrations in heartwood. Heme oxidase, peroxidase, and oxidoreductive activity terms were also enriched in the midgut community. While the roles of enzymes in these GO categories in digestive physiology are diverse, many genes assigned to these terms have roles in lignin degradation in white rot fungi, and microbial genes in these GO categories could be involved in similar processes in the A. glabripennis midgut.

In contrast, the A. glabripennis midgut transcriptome library was highly enriched for peptidases relative to both the gut community and transcriptome libraries prepared from the guts of other phytophagous insects [26]. The majority of these peptidases were classified as digestive serine proteinases, signaling a potential role in nitrogen acquisition in the midgut. While the source of nitrogen used by A. glabripennis as it feeds in the high C:N heartwood is uncertain, it could use these proteinases to scavenge amino acids from proteins produced by nitrogen fixing or recycling microbes harbored in the gut. Amino acid ligases were also highly abundant in the A. glabripennis transcriptome, suggesting that amino acids liberated from microbial or plant cell wall proteins by serine peptidases can be quickly incorporated into proteins synthesized by the beetle. Furthermore, terms for carboxylesterases, hydrolases acting on ester linkages, and monooxygenases were enriched in the insect gut relative to both the gut microbiota and other insect gut transcriptome libraries, suggesting that A. glabripennis produces its own diverse suite of detoxification enzymes, any of which could be involved in the detoxification of tree metabolites. Finally, α-glucoside transporters were enriched in the beetle transcriptome. Although hardwood polysaccharides are dominated by β-1,4-linked sugar residues, α-glucosides could originate from sugars produced by members of the gut community.

These differences in the metabolic potentials of A. glabripennis and its gut microbiota were further elucidated through the inspection of pathway maps constructed from KEGG Orthology terms (Table 5-8). Several pathways were more abundant in the microbial community relative to A. glabripennis, including pathways for carbon metabolism, pyruvate metabolism, amino acid metabolism, nitrogen metabolism, lysine biosynthesis, valine/leucine/isoleucine biosynthesis, and phenylalanine/tryptophan/tyrosine biosynthesis. In contrast, pathways for lysine degradation, valine/leucine/isoleucine degradation, glutathione metabolism, fatty acid biosynthesis, and detoxification were more abundant in the A. glabripennis midgut transcriptome. One of the most striking findings in the midgut microbiome was the expanded ability to convert sugars to pyruvate and to convert this substrate into a variety of products that could be used by the beetle or the gut community in various biosynthetic processes (Figure 5-13). For example, while the beetle can only convert glucose and fructose to pyruvate, the community can convert a variety of other sugar substrates, including pentose sugars found in hemicellulose from deciduous trees, into pyruvate. Both the community and the beetle can directly shuttle pyruvate into the citric acid cycle, where it can be used for energy production and for the synthesis of precursors to amino acids; however, the microbes and A. glabripennis are capable of converting this substrate into different compounds. For example, pyruvate is used as a substrate for alanine synthesis by both A. glabripennis and the gut community; however, the community is also able to convert this substrate into essential branched chain amino acids (valine/leucine/isoleucine) and homocitrate, which is a key substrate for lysine biosynthesis. Furthermore, while both the community and A. glabripennis can convert pyruvate to oxaloacetate, the beetle can convert this compound to glutamate and aspartate. In turn, the community can convert aspartate into asparagine and glutamate into arginine, while the beetle can convert glutamine into proline. In addition, pyruvate can be used to synthesize acetyl coA and malonyl coA in the microbial community, which are key intermediates in the biosynthesis of fatty acids and are scarce in woody tissue. Pyruvate can also be used in fermentation processes to produce energy under anaerobic conditions. The beetle has the capacity to interconvert ethanol products to acetaldehyde and acetate, as transcripts predicted to encode alcohol dehydrogenase and aldehyde dehydrogenase were detected in high abundances in the midgut transcriptome library, suggesting that the beetle can also directly use the products of microbial fermentation in energy production and fatty acid biosynthesis. The beetle is also more enriched in genes involved in fatty acid biosynthesis, elongation, and degradation relative to the community, providing an excellent opportunity for nutrient exchange.

The gut microbial community possesses a transcriptionally active pentose phosphate pathway capable of converting five-carbon sugars (e.g. xylose and arabinose) into substrates that can be fermented (e.g. pyruvate) (Figure 5-14). Products of the pentose phosphate pathway can be used in the biosynthesis of aromatic amino acids via the shikimate pathway. Full pathways for phenylalanine biosynthesis and partial pathways for histidine biosynthesis were detected in the microbial-derived transcripts. Although no pathways involved in the synthesis of tryptophan were detected in either the gut microbial community or the larval midgut transcriptome, A. glabripennis produces transcripts predicted to encode enzymes necessary to convert phenylalanine to tyrosine and encodes full pathways for the decomposition of branched-chain amino acids and lysine, providing a direct example of how pathways encoded by gut microbes can complement insect-derived pathways and vice versa. Further, the community is capable of breaking down and using both nitrogen and sulfur derived from taurine for the production of amino acids, sterols, and other essential nutrients.

Potential pathway complementarity was also noted with regard to carbohydrate digestion. For example, the A. glabripennis midgut transcriptome contained six transcripts predicted to encode cellulases and eight transcript isoforms predicted to encode glycoside hydrolase family 1 xylanases, while only a single cellulase and a single xylanase were detected in association with the gut community. However, the gut community contained numerous transcripts predicted to encode β-glucosidases, β-xylosidases, and enzymes required for xylose utilization, suggesting that the community can assimilate and ferment these wood sugars into other products that can be used by the insect for biosynthetic purposes. Complementarity was also noted with regard to methionine biosynthesis and salvage. Methionine is an essential amino acid for most insects, while cysteine is not; yet, the low abundance of sulfur in woody tissue may make cysteine a conditionally essential amino acid in this system. While the gut microbiota produced transcripts involved in the biosynthesis of methionine, the beetle itself produced transcripts predicted to encode enzymes with involvement in methionine salvage and recycling. Furthermore, the gut microbiota produced more transcripts with predicted roles in cysteine biosynthesis than the beetle itself, suggesting that the community can also contribute to the biosynthesis of several canonical non-essential amino acids. Recycled sulfur obtained from taurine (mentioned above) could also be used in other biosynthetic pathways, such as the synthesis of steroids and terpenoids.

Despite the potential complementarity noted above, a moderate degree of functional redundancy was noted, defined as genes that are encoded by both the beetle and the gut microbiota. This phenomenon has been noted in the communities of other phytophagous insects and may serve as a source of alternate digestive enzymes in the event that beetle-derived enzymes are damaged or disrupted by host plant defensive chemicals [73]. In addition, redundancy could directly expedite endogenous digestive processes already occurring in the gut, improving the efficiency of woody tissue digestion and facilitating nutrient acquisition. For example, both the beetle and the microbial community produce enzymes capable of incorporating ammonia into glutamine via glutamine synthetase. While both the beetle and its microbiota can contribute to the nitrogen economy in the gut, the microbial community has an expanded capacity to convert glutamate to arginine while the beetle can convert it to proline, serving as important pathways to synthesize non-essential amino acids from recycled ammonia waste. In addition, both the beetle and the community produce digestive proteinases. While the beetle produces a larger arsenal of serine and cysteine digestive peptidases, the community also produces both serine and cysteine digestive peptidases, which could be used for protein acquisition in the event that plant-produced proteinase inhibitors disrupt A. glabripennis’ endogenous proteinase activities. In addition, both the beetle and its gut microbes produced high numbers of β-glucosidases, which could explain the higher cellulase enzyme complex activities in beetles with more diverse microbial communities. These β-glucosidases could help overcome end product inhibition [74], indirectly enhancing activities of endoglucanases and exoglucanases produced by A. glabripennis.
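The distinction drawn in this section between functional redundancy (functions encoded by both the beetle and the gut microbiota) and complementarity (functions encoded by only one partner) amounts to a set comparison of annotation labels. The sketch below is illustrative only: the function name partition_functions is hypothetical, and the short label sets are toy examples chosen from functions discussed in the text rather than the actual annotation tables.

```python
def partition_functions(beetle, community):
    """Split functional labels into shared and partner-specific sets."""
    beetle, community = set(beetle), set(community)
    return {
        "redundant": beetle & community,        # encoded by both partners
        "beetle_only": beetle - community,      # potential host contribution
        "community_only": community - beetle,   # potential microbial contribution
    }


# Toy label sets (illustrative, not the real annotation data).
beetle_functions = {"beta-glucosidase", "serine peptidase", "methionine salvage"}
community_functions = {
    "beta-glucosidase",
    "serine peptidase",
    "xylose isomerase",
    "methionine biosynthesis",
}

result = partition_functions(beetle_functions, community_functions)
for category in ("redundant", "beetle_only", "community_only"):
    print(category, sorted(result[category]))
```

In practice the labels would be KEGG Orthology terms or Pfam domains assigned to each transcript set, and abundance differences, not just presence or absence, would also be considered.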

Transcripts with Predicted Involvement in Xylose Utilization

Insect-derived xylanases, which are hypothesized to break down the xylan chains that predominantly comprise hemicellulose found in deciduous trees, have been previously documented in the A. glabripennis midgut through both transcriptome and zymogram analyses [26, 75]. However, pathways for xylose metabolism and utilization have not been previously described in insects, and it is generally hypothesized that microbes housed in the gut are responsible for metabolizing these sugars derived from woody tissues [76, 77]. Xylose residues liberated from hemicellulose represent a significant source of sugars that could be assimilated and used by A. glabripennis for energy production or to synthesize carbon compounds used in fatty acid and amino acid biosynthetic pathways. Many have hypothesized that xylose-fermenting yeasts commonly found in the guts of cerambycid beetles could play roles in processing xylose [77, 78]; however, a direct role in xylose metabolism in the gut has yet to be demonstrated.

For the first time, two transcriptionally active microbial pathways associated with the gut of a cerambycid beetle were documented that catalyze the conversion of xylose into products that can be directly utilized by insects, including A. glabripennis (Figure 5-15). Yeast-derived transcripts were identified that were predicted to encode xylose transporters capable of assimilating xylose from the environment and several enzymes associated with the oxo-reductase pathway, which converts xylose into D-xylulose-5-phosphate, a key intermediate of the pentose phosphate pathway. Transcripts originating from the xylose isomerase pathway, which converts xylose directly into D-xylulose, were derived from lactic acid bacteria (e.g., Pediococcus).

These processes are highly relevant for both bioprospecting and control of invasive cerambycids. Pentose sugars, such as xylose, are endogenously difficult to ferment, yet they present a significant store of renewable sugars that could be used for the industrial production of ethanol [79]. Pathways encoded by microbes associated with the A. glabripennis midgut and other cerambycid beetles could be exploited for industrial ethanol production. Furthermore, xylose sugars are toxic to some insects [80] and these pathways could serve critical roles in both xylose utilization and detoxification in the midgut that could be disrupted for control. However, whether the metabolism of xylose by gut microbes is required for survival in woody tissue or provides any fitness benefits to the host requires further evaluation.

Transcripts Predicted to Originate from Fusarium

Since OTUs taxonomically classified to the genus Fusarium are consistently found in the A. glabripennis larval midgut [28], transcripts derived from this species complex are of particular interest. These annotations mirror and build upon a previous proteomic analysis of proteins secreted by F. solani isolate MYA 4225 isolated from larval A. glabripennis and cultivated on wood chips, in which numerous putative cell wall degrading proteins, detoxification enzymes, and laccases with putative involvement in lignin degradation were identified [27]. Through transcriptome analysis, over 200 transcripts assigned to the genera Fusarium or Nectria were detected in the gut contents library. Many of these transcripts were classified as glycoside hydrolases (GH) based on Pfam domain annotations. One of the most striking findings was the presence of over 30 transcripts predicted to originate from GH 28 polygalacturonidases, which have predicted involvement in degradation of pectin. While pectin is not prevalent in woody tissue, it is periodically deposited in secondary growth and represents stores of galactose sugars and essential minerals (e.g., calcium) [81] that could be assimilated by A. glabripennis or F. solani and used in essential metabolic processes. Incidentally, GH 28 polygalacturonidases were also highly abundant in the secretome of F. solani cultivated on wood chips [27]. The high expression of polygalacturonidase transcripts in the A. glabripennis midgut coupled with the high abundance of polygalacturonidase enzymes detected in the secretome suggests these enzymes may be integrally linked to woody tissue degradation. Functional genomic approaches should be utilized to determine the function of these transcripts, confirming their role in pectin degradation or determining if these genes have evolved to produce enzymes with activity directed towards some of the more dominant polysaccharides in woody tissue. Furthermore, a single transcript predicted to encode a GH 5 cellulase was detected. While its ability to catalyze endo- or exo-type glucanase reactions could not be predicted based on annotations alone, it could work in tandem with A. glabripennis-derived cellulases to expedite cellulose digestion in the midgut [24, 26]. Other transcripts with predicted involvement in sugar digestion include those annotated as , mannitol dehydrogenases, mannosyl transferases, glycosyl transferases, and carbohydrate MFS transporters.

It was previously hypothesized that F. solani could play a key role in lignin degradation in the midgut since members of the F. solani species complex are known to harbor lignin peroxidases and laccases in their genomes [14, 82] and several secreted laccases were detected through MudPIT analysis. Although no bona fide ligninase transcripts were detected in this analysis, Fusarium-derived transcripts with predicted involvement in metabolizing aromatic compounds were detected that could play accessory roles in processing the lignin biopolymer. For example, we detected several transcripts in the midgut predicted to encode phenolic acid decarboxylases, aromatic ring hydroxylases, cytochrome P450s, and alcohol and aldehyde dehydrogenases, which can enhance oxidation of inter-phenylpropanoid linkages in lignin in the presence of other lignin degrading enzymes [83]. Transcripts with predicted involvement in heme production and hydrogen peroxide degradation were also detected and included coproporphyrinogen III oxidases and catalases, respectively. These enzymes are often co-expressed with lignin degrading enzymes and serve to quickly degrade heme and hydrogen peroxide to prevent irreversible oxidative damage to tissue. In addition, although some F. solani strains harbor lignin peroxidase orthologs in their genomes, which could confer ligninolytic activities, novel, uncharacterized genes that encode lignin degrading enzymes could also be present in ALB-associated F. solani as well as other F. solani strains. Transcripts predicted to encode detoxification and stress related proteins were detected and included arylamine N-acetyltransferases, heat shock proteins, cytochrome P450s, an arsenate efflux pump, sulfite reductases, and velvet superfamily regulatory proteins, which are often involved in secondary metabolic processes.

Nonentomopathogenic fungi associated with insects are hypothesized to serve key roles in nutrient provisioning, including (but not limited to) sterol synthesis, nitrogen concentration, essential amino acid production, and vitamin synthesis [84]. Several Fusarium-derived transcripts were detected that could synthesize essential nutrients deficient in the woody tissue where larval A. glabripennis feeds. For example, transcripts predicted to encode enzymes involved in amino acid salvage and recycling of nitrogenous waste products were detected and included nitrate transporters, cysteine transporters, and xanthine/uracil family permeases, which can assimilate uric acid, xanthine, cysteine and other nitrogenous waste products produced by the beetle or other members of the microbial community for recycling purposes. Additionally, transcripts predicted to encode asparaginase/ were detected. While ammonia liberated from deamination reactions is primarily reassimilated in the form of glutamine, nitrogen from glutamine can be transferred directly to asparagine via asparagine synthase [85] encoded by other members of the gut microbiota and eventually into aspartate via , representing a key mechanism to integrate recycled nitrogen into amino acids besides glutamine. Several transcripts encoding proteins involved in the synthesis of sterols, which insects cannot endogenously synthesize but require for pheromone production and cell membrane synthesis [86], were detected. These included transcripts with highest scoring BLASTX alignments to 3-keto sterol reductases; , which are involved in many metabolic pathways, but are often linked to sterol synthesis in fungi; sterol sensing proteins of SREBP cleavage activation, which activate sterol synthesis; cholesterol transporter proteins; and di-trans-poly-cis-decaprenylcistransferases, which produce key intermediates in the biosynthesis of terpenoids.

Previously, F. solani has not been observed to colonize the galleries created by A. glabripennis as it feeds and tunnels into the heartwood; therefore, we hypothesize that this fungal symbiont is associated with the midgut. This is in contrast to the fungal symbionts of other phytophagous beetles, which are directly inoculated into woody tissue to predigest lignin, cellulose, and other wood carbohydrates [87, 88]. Several transcripts associated with essential metabolic processes support the hypothesis that F. solani is metabolically active in the midgut and is not simply being consumed by the beetle. These included transcripts with highest scoring BLASTX alignments to β-Ig-H3/fasciclin, which promotes adherence to the midgut or colonization of ingested woody tissue, components of signal transduction pathways, cell cycle proteins, and transcriptional and translational machinery.

Few transcripts derived from F. solani were detected in the library derived from midgut tissue. Although this could suggest that F. solani actively colonizes the food bolus in the larval A. glabripennis midgut, the percentage of microbial transcripts detected in the whole midgut library was low and, as a consequence, the sequencing depth obtained may not be sufficient to detect these transcripts in the whole gut library. Fluorescent in situ hybridization studies are underway to determine the exact locations of this fungal isolate in the A. glabripennis midgut, and whole genome sequencing is being pursued to provide a complete inventory of genes that could contribute to the digestion of woody tissue and the synthesis of nutrients. We expect that this genome will serve as an important investigational tool to more fully understand this fungal isolate’s precise contributions to beetle digestion and physiology.

Transcripts Derived from Yeasts

The association of yeasts with cerambycid beetle larvae has been extensively studied in a variety of species. Results from these studies indicate that yeast communities are plastic [7], acquired independently in different cerambycid lineages [89], and that intracellular yeasts harbored in mycetomes can be periodically replaced throughout the life cycle of the insects [7]. Despite these variations, previous studies have demonstrated the importance of ingested fungal enzymes in plant cell wall degradation in several cerambycid species [90] and a reduced growth rate and fecundity in aposymbiotic larvae, demonstrating clear benefits to the insect hosts [91], though their precise contributions to nutrient acquisition and physiological processes remain obscure. Yeasts have been consistently cultivated from all A. glabripennis populations surveyed to date (unpublished data) and these taxa have been previously associated with the midguts of a variety of other cerambycids [92]. While their exact contributions to digestive physiology in the guts of their cerambycid hosts are unknown, surveys of pure culture yeasts have determined that they are capable of metabolizing and fermenting a variety of woody sugar substrates, including xylose, mannose, and glucose [77], as well as phenols, including phenylpropanoid compounds that comprise the lignin polymer [93], suggesting that they could make contributions to these processes in the guts of cerambycid beetles.

Transcripts taxonomically classified to over 11 different genera of yeasts were detected in the A. glabripennis midgut transcriptome library, including Candida, Cyberlindnera, Debaryomyces, Eremothecium, Kluyveromyces, Lachancea, Meyerozyma, Pichia, Saccharomyces, Yarrowia, and Zygosaccharomyces.

Aside from the ability to process and ferment xylose sugars, transcripts annotated as galactose kinases, GH 3 chitinases, GH 17 and 81 1,3-beta-endoglucanases, GH 20 α,α-trehalose-phosphate synthases, GH 22 Dol-P-Man α-mannosyl transferases, GH 73 α-1,3-glucanases, GH 76 α-1,6-mannanases, and GT 15 α-1,2-mannosyltransferases were detected, which could be involved in processing mannose-, galactose-, and β-1,3 glucan-containing carbohydrates present in woody tissue. Transcripts predicted to encode enzymes with roles in fermentation were also detected, indicating that yeast-driven fermentation of wood sugars may be actively occurring in anaerobic regions in the A. glabripennis midgut. Transcripts associated with anaerobic, hypoxic environments were detected, indicating that although the midgut is dominated by aerobic environments, microenvironments where the oxygen content is lower may also exist, providing ideal environments for fermentation reactions. Additionally, although yeasts are not known to catalyze large-scale depolymerization of hardwood lignin, several transcripts annotated as multicopper oxidases and laccases were detected, which have been documented to expedite oxidation of linkages in lignin in the presence of other lignin degrading enzymes [94]. Several transcripts annotated as Cα dehydrogenases were discovered, which can help oxidize some of the most prominent linkages in lignin (e.g. β aryl ethers), making them more susceptible to cleavage [95]. Thus, while these laccases, multicopper oxidases, and Cα dehydrogenases may not have primary roles in lignin polymer degradation, they can potentially serve accessory roles in this process. Furthermore, several transcripts predicted to encode antioxidants were detected, which can shield the midgut epithelium from free radicals generated through lignin degradation or other physiological processes in the gut. These included 1-cys-peroxiredoxin, D-erythroascorbic acid, catalase, superoxide dismutase, and thiospecific antioxidants. Glutathione- and cytochrome P450 mediated detoxification pathways derived from yeasts were also expressed.

Yeasts associated with the beetle midgut have the potential to produce enzymes with pivotal roles in nitrogen cycling. In addition to the glutamate dehydrogenase and taurine dioxygenases, which were derived predominantly from yeasts, several transcripts annotated as putative uricases and ureases were detected, suggesting that yeasts associated with the midgut can decompose and recycle nitrogenous waste products produced by either A. glabripennis or members of the midgut community, allowing nitrogen to be reincorporated into functional nucleotides, amino acids, and other nitrogen-containing compounds. Other transcripts with putative involvement in nitrogen recycling included amino acid and ammonia transporters, nitrogen permease, enzymes associated with methionine salvage, and copper amine oxidase, which is capable of liberating ammonia from xenobiotic amines and participates in metabolism of aromatic compounds and glycine/serine/threonine biosynthesis. Further, autophagy proteins and vacuolar efflux proteins were detected, which are capable of recycling nutrients during periods of stress and releasing free amino acids from proteins for recycling purposes. Many enzymes with predicted roles in amino acid deamination reactions were detected and included enzymes capable of deaminating glutamate, leucine, valine, and glycine, liberating ammonia from these substrates. In addition, transcripts predicted to encode four of the five enzymes involved in the urea cycle were detected and included carbamoyl phosphate synthetase, argininosuccinate synthase, argininosuccinate lyase, and arginase. Though not part of the canonical urea cycle, a transcript predicted to encode creatinase, which converts creatine to sarcosine and urea, was also present. This suggests that yeasts associated with the midgut can participate in the synthesis of urea and other nitrogenous waste products, providing substrates for the uric acid and urea degrading microbes associated with the gut to break down and reassimilate nitrogen.

We detected transcripts involved in the synthesis of methionine, branched chain amino acids, aromatic amino acids, and α-ketoglutarate, suggesting that yeasts could also play roles in essential amino acid biosynthesis. Several enzymes with predicted involvement in ergosterol biosynthesis were detected and included sulfate transporter, oxysterol binding proteins, cytochrome P450s, polyprenyl synthetase, sterol 24C-methyltransferase, steroid (desulfation of steroids), and 3-beta hydroxysteroid dehydrogenase/isomerase, which catalyzes the isomerization of steroid precursors into 4-ene-ketosteroids necessary for the formation of steroid hormones, and several ergosterol biosynthesis ERG4/ERG24 family proteins. It has been hypothesized that insects can utilize ergosterols produced by fungal symbionts for production of pheromones and cholesterol and, in some cases, biochemical evidence supports the utilization of fungal ergosterols [96]. Finally, fungi associated with the gut produced transcripts predicted to encode enzymes involved in the biosynthesis of several vitamins, including riboflavin, thiamine, and thiazole, which are all deficient in woody tissue.

Lactic Acid Bacterial Transcripts

By far, the most abundant microbial transcripts detected in the A. glabripennis midgut were taxonomically classified to the genus Pediococcus. Although OTUs predicted to originate from bacteria assigned to this genus were not highly abundant in the 16S amplicon analysis, many Pediococcus-derived transcripts were detected in the midgut, suggesting that even rare OTUs can be metabolically active in the midgut. This taxon has been previously detected in association with the A. glabripennis midgut [24] and lactic acid bacteria taxonomically classified to the genus Leuconostoc were previously detected through shotgun and 16S rDNA amplicon metagenomic analyses of beetles collected from another A. glabripennis population [17].

Although we detected pathways for carbohydrate digestion, including pathways for the assimilation and utilization of β-1,4 linked di- and oligo-saccharides (e.g. cellobiose derived from cellulose and β-1,4-linked xylose, arabinose, galactan, and rhamnose oligomers derived from hardwood hemicellulose), N-acetylglucosamines (e.g., chitin and other aminosugars), and α-1,3 and α-1,6 linked mannose oligomers, few pathways predicted to degrade cellulose, hemicellulose, and other hardwood polysaccharides originated from lactic acid bacteria. This suggests that these bacteria metabolize and utilize the products of larger scale degradative processes catalyzed by either the beetle or other members of the gut community. Additionally, the overabundance of β-glucosidases and cellobiose phosphotransferase systems associated with this genus and with the gut community in general can partially explain the enhanced cellulase complex activity in the presence of diverse gut microbial communities previously observed in the A. glabripennis midgut. Aldose epimerases were also detected in association with this taxon, which catalyze the inter-conversion of α-D-glucose and β-D-glucose. This may explain the overabundance of α-glucoside transporters in the A. glabripennis midgut transcriptome relative to the microbial community since the beetle may acquire some of its carbon resources from α-D-glucose polysaccharides synthesized by various members of the midgut community. As mentioned previously, metabolism and fermentation of pentose sugars, including xylose and arabinose, was associated with this taxon via the pentose phosphate pathway. Aldo-keto reductases and aryl alcohol dehydrogenases were also detected in association with this taxon, which can work in tandem to degrade major linkages in lignin and aid in the decomposition of lignin metabolites released from larger scale degradative processes [97]. Other transcripts annotated as enzymes with putative involvement in lignin degradation include multicopper oxidases, which can often function as laccases [98], catalase, and YaaA proteins and thiol peroxidases, which reduce hydrogen peroxide concentrations to prevent toxicity [99].

Numerous pathways for essential nutrient synthesis were derived from lactic acid bacteria, including transcripts predicted to encode enzymes involved in lipid and sterol synthesis via the mevalonate pathway; however, these enzymes were neither as abundant nor as numerous as the fungal transcripts with predicted involvement in the synthesis of ergosterols. Many transcripts with predicted roles in the biosynthesis of vitamins were detected, including transcripts predicted to encode enzymes involved in folate, coenzyme A, and thiamine biosynthesis. In addition to enzymes with predicted roles in the biosynthesis of branched chain amino acids, lysine, asparagine, arginine, aspartate, and aromatic amino acids, numerous transcripts predicted to encode amino acid and polyamine transporters and permeases were detected; these could potentially serve as exporters to transfer essential and non-essential amino acids to A. glabripennis and other members of the gut community. This taxon also has predicted involvement in nitrogen recycling: numerous proteins were detected that are predicted to be involved in nucleoside uptake and salvage, methionine salvage, and the assimilation and salvage of nitrogenous waste products (such as arginine and ornithine). Several deamination pathways were detected, indicating that this taxon can also liberate ammonia from several amino acid and nucleotide substrates, which can subsequently serve as a source of nitrogen for recycling.

While few complete metabolic pathways predicted to originate from bacteria belonging to the genus Pediococcus could be constructed, due to low sequencing coverage of microbial transcripts and/or the fact that some of these transcripts may not have been expressed at the time of RNA isolation, these results provide important clues to their potential contributions to metabolic processes in the gut. Previous analysis of metagenomic DNA predicted to originate from bacteria assigned to the genus Leuconostoc, detected in A. glabripennis midguts collected from the Worcester, MA population, indicated that these lactic acid bacteria have a metabolic potential similar to that suggested by the Pediococcus-derived transcripts detected in the current study [17]. Pathways for xylose utilization, essential amino acid synthesis (e.g., branched chain and aromatic amino acids), vitamin synthesis, and ammonia recycling and utilization, as well as genes predicted to encode detoxification enzymes, β-glucosidases, and β-xylosidases, were all taxonomically classified to the genus Leuconostoc [17]. At present, it is unknown whether lactic acid bacteria represent true symbionts performing essential digestive and physiological processes in the A. glabripennis midgut or whether these bacteria are simply exploiting degradation products released from cellulose, hemicellulose, and other wood carbohydrates. Regardless of the relationship between Pediococcus and A. glabripennis, conversion of xylose sugars released by insect-derived xylanase to inert compounds that can be used for energy production and in the biosynthesis of amino acids and fatty acids is likely of benefit to A. glabripennis. The detection of other transcripts involved in the synthesis of essential amino acids, steroids, and other nutrients that are generally deficient in woody tissue suggests that these bacteria could be producing other beneficial nutrients.

Conclusions

Wood-feeding coleopterans represent some of the most devastating pests worldwide, causing billions of dollars in damage to urban landscapes and threatening tree species of high ecological and economic importance [22, 100, 101]. While many studies have sought to characterize the microbial communities associated with these insects, both to understand how they are able to thrive in woody tissue and to develop targets for biocontrol, the communities studied were diverse, dynamic, and comprised of large numbers of facultative microbes [7, 8, 102], fueling the debate about whether these facultative microbes enhance fitness or are required for survival. Despite the dynamic nature and plasticity of cerambycid gut communities and the previously documented plasticity of the A. glabripennis gut microbiota observed in beetles feeding in different host trees [24], this analysis detected several shared bacterial and fungal OTUs in third instars feeding in sugar maple that could be primed to make important contributions to digestive and physiological processes in the A. glabripennis midgut. Compared to communities sampled from other beetle guts, the microbial community of A. glabripennis was less rich and diverse [103]. However, in many of these studies, samples collected from multiple insects were pooled prior to sequencing. With little overlap at the OTU level among individual beetle communities collected in the same host tree in our study, the inflated diversity metrics of some gut communities could be caused by pooling multiple beetles prior to sequencing. In these cases, shared OTUs and transcriptional activities of shared or rare OTUs were not documented and many of the OTUs could have originated from environmental microbes that are not making contributions to digestive physiology.
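The pooling argument above can be illustrated with a small sketch. The OTU labels and counts below are hypothetical toy data chosen for illustration, not values from this study, and the helper names `pooled_richness` and `shared_otus` are likewise assumptions. The sketch only shows that, when individual guts share few OTUs, the richness of a pooled sample can greatly exceed the richness of any single insect.

```python
from typing import Iterable, List, Set


def pooled_richness(samples: Iterable[Set[str]]) -> int:
    """Number of distinct OTUs when all samples are combined before analysis."""
    return len(set().union(*samples))


def shared_otus(samples: List[Set[str]]) -> Set[str]:
    """OTUs present in every individual sample."""
    return set.intersection(*samples)


# Hypothetical data: 4 beetles, 2 OTUs shared by all, 8 OTUs unique to each beetle.
core = {"otu_core_1", "otu_core_2"}
beetles = [core | {f"beetle{i}_otu{j}" for j in range(8)} for i in range(1, 5)]

per_beetle = [len(b) for b in beetles]
print("per-beetle richness:", per_beetle)
print("pooled richness:", pooled_richness(beetles))
print("OTUs shared by all beetles:", sorted(shared_otus(beetles)))
```

In this toy case each insect carries 10 OTUs, yet the pooled sample contains 34, and only the two shared OTUs would be recognizable as consistent associates if the insects were sequenced individually.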

Incidentally, rRNAs and mRNAs predicted to originate from many of the shared and rare OTUs were also detected in association with the A. glabripennis midgut in this study, indicating that abundance is not necessarily indicative of transcriptional activity and that both common and rare taxa express genes involved in the synthesis of essential nutrients and digestive processes in the midgut. Interestingly, some of the same taxa shared in all life stages of emerald ash borer [8] were detected in all A. glabripennis midguts sampled for sequencing. Through multiple comparisons of microbial communities associated with wood-feeding beetles (including biological replicates), it may be possible to uncover unexpected similarities in microbial community composition that could be important for feeding in wood, both within the cerambycid lineage and within the order Coleoptera. Discoveries such as these can lead to novel control practices for some of the most destructive wood-feeding pests by targeting constitutive symbionts whose function is essential for digestion and nutrient acquisition.

While many transcripts originated from shared OTUs, transcripts predicted to originate from rare OTUs were also detected and many were predicted to encode enzymes involved in essential nutrient biosynthesis and xylose metabolism. For example, transcripts derived from the rare OTU Pediococcus were numerous and had predicted involvement in key digestive processes including xylose utilization, fermentation, essential amino acid biosynthesis, and vitamin synthesis. Therefore, a subset of these rare OTUs is poised to make key contributions to nutritional ecology in the midgut. For this reason, we propose that the ‘rare biosphere’ should not be completely ignored but rather analyzed separately, both to assess the validity of its OTUs (to ensure they do not represent sequencing artifacts or contaminants) and to assess their metabolic activity.

Though transcriptional activity of many microbial OTUs associated with the A. glabripennis midgut has been confirmed by this study, further investigation is needed to delineate the relationship between A. glabripennis and these microbes and to determine whether they represent mutualists that exchange nutrients with this insect, or commensals or parasites exploiting degradation products released from cellulose and hemicellulose by insect-derived enzymes. Although horizontal gene transfer from bacteria and/or fungi into insect genomes has been demonstrated in some beetle lineages [104], the transcripts presented in this analysis are probably not embedded in the A. glabripennis genome, since rRNAs and transcripts predicted to encode core metabolic pathways were detected in this analysis, supporting the hypothesis that these transcripts are derived from microbes. In addition, further investigations are necessary to compile a more complete inventory of microbial transcripts expressed in the gut. For example, laser-microdissection followed by deep RNA sequencing can allow us to finely document the transcriptional activity of microbes that adhere to the midgut and of the midgut cells that lie in close proximity to these microbes, allowing us to more accurately predict and model nutrient exchanges and pathway complementarity between A. glabripennis and symbionts at the microbe-insect interface.

Finally, investigations to determine the origins of these microbes are of paramount importance. While several taxa consistently found in association with the midgut of these beetles are candidates for vertical transmission (e.g., Fusarium solani, Enterobacteriaceae), the variations in the communities documented in this study and the shifts in community composition observed when the insect feeds on different tree species [24] suggest that a subset of these microbes might be acquired from the phyllosphere. The correlation between the microbial community composition of food sources and that of midguts has been previously documented in other insects [102] and also in A. glabripennis. For example, while ARISA analysis of oviposition and non-oviposition wood revealed bacterial OTUs that were exclusively associated with the egg, oviposition pit, and larvae and were not found elsewhere on the wood, bacterial OTUs found on non-oviposition wood were also found in the oviposition pit and in association with the larval midgut. Thus, we propose that this insect harbors both constitutive and facultative microbes in its midgut, both of which are capable of contributing to nutritional ecology in this system.

Acknowledgements

Illumina GAIIx and HiSeq 2000 sequencing were performed at the University of Delaware Biotechnology Institute. Trinity assembly and BLAST and Pfam searches were performed using computing resources at the Hawaii Open Supercomputing Center at the University of Hawaii (Jaws cluster; Maui, HI), the Research Computing and Cyberinfrastructure Group at The Pennsylvania State University (LionX clusters; University Park, PA), and USDA-ARS PBARC (Moana cluster; Hilo, HI). We thank David Long, Katie Mulfinger, and Karen Bingham for assistance with insect rearing and Bruce Kingham for assistance with Illumina library construction and sequencing. Funding for this project was provided by USDA-NRI-CSREES grant 2008-35504-04464, USDA-NRI-CSREES grant 2009-35302-05286, the Alphawood Foundation, Chicago, Illinois, a Seed Grant to KH from the Pennsylvania State University College of Agricultural Sciences, and a USDA-AFRI Microbial Functional Genomics Training grant 2010-65110-20488 to EDS.


Literature Cited

1. Dillon RJ, Dillon VM: The gut bacteria of insects: nonpathogenic interactions.

Annual Review of Entomology 2004, 49:71-92.

2. Janson EM, Stireman JO, Singer MS, Abbot P: Phytophagous insect-microbe

mutualisms and adaptive evolutionary diversification. Evolution 2008, 62(5):997-

1012.

3. Lee Jr RE, Lee MR, Strong-Gunderson JM: Insect cold-hardiness and ice nucleating

active microorganisms including their potential use for biological control. Journal of

Insect Physiology 1993, 39(1):1-12.

4. Feldhaar H: Bacterial symbionts as mediators of ecologically important traits of

insect hosts. Ecological Entomology 2011, 36(5):533-543.

5. Beaver R, Wilding N: Insect-fungus relationships in the bark and ambrosia beetles.

Insect-fungus interactions 1989:121-143.

6. Warnecke F, Luginbuhl P, Ivanova N, Ghassemian M, Richardson TH, Stege JT,

Cayouette M, McHardy AC, Djordjevic G, Aboushadi N et al: Metagenomic and

functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature

2007, 450(7169):560-565.

7. Grunwald S, Pilhofer M, Holl W: Microbial associations in gut systems of wood- and

bark-inhabiting longhorned beetles [Coleoptera: Cerambycidae]. Systematic and

Applied Microbiology 2010, 33(1):25-34.

8. Vasanthakumar A, Handelsman J, Schloss PD, Bauer LS, Raffa KF: Gut microbiota of

an invasive subcortical beetle, Agrilus planipennis Fairmaire, across various life

stages. Environ Entomol 2008, 37(5):1344-1353.

9. Chung SH, Rosa C, Scully ED, Peiffer M, Tooker JF, Hoover K, Luthe DS, Felton GW:

Herbivore exploits orally secreted bacteria to suppress plant defenses. Proceedings

of the National Academy of Sciences 2013:201308867.

10. Currie CR, Scott JA, Summerbell RC, Malloch D: Fungus-growing ants use antibiotic-

producing bacteria to control garden parasites. Nature 1999, 398(6729):701-704.

11. Genta FA, Dillon RJ, Terra WR, Ferreira C: Potential role for gut microbiota in cell

wall digestion and glucoside detoxification in Tenebrio molitor larvae. Journal of

Insect Physiology 2006, 52(6):593-601.

12. Engel P, Martinson VG, Moran NA: Functional diversity within the simple gut

microbiota of the honey bee. Proceedings of the National Academy of Sciences 2012.

13. Delalibera Jr I, Vasanthakumar A, Burwitz BJ, Schloss PD, Klepzig KD, Handelsman J,

Raffa KF: Composition of the bacterial community in the gut of the pine engraver,

Ips pini (Say)(Coleoptera) colonizing red pine. Symbiosis (Rehovot) 2007, 43(2):97-

104.

14. Coleman JJ, Rounsley SD, Rodriguez-Carres M, Kuo A, Wasmann CC, Grimwood J,

Schmutz J, Taga M, White GJ, Zhou SG et al: The genome of Nectria haematococca:

contribution of supernumerary chromosomes to gene expansion. Plos Genetics 2009,

5(8): e1000618.

15. Rodrigue S, Materna AC, Timberlake SC, Blackburn MC, Malmstrom RR, Alm EJ,

Chisholm SW: Unlocking short read sequencing for metagenomics. PloS one 2010,

5(7):e11840.

16. Adams AS, Aylward FO, Adams SM, Erbilgin N, Aukema BH, Currie CR, Suen G,

Raffa KF: Mountain pine beetles colonizing historical and naïve host trees are

associated with a bacterial community highly enriched in genes contributing to

terpene metabolism. Applied and environmental microbiology 2013, 79(11):3468-3475.

17. Scully ED, Geib SM, Hoover K, Tien M, Tringe SG, Barry KW, del Rio TG, Chovatia

M, Herr JR, Carlson JE: Metagenomic profiling reveals lignocellulose degrading

system in a microbial community associated with a wood-feeding beetle. PloS one

2013, 8(9):e73827.

18. Shi W, Xie S, Chen X, Sun S, Zhou X, Liu L, Gao P, Kyrpides NC, No E-G, Yuan JS:

Comparative genomic analysis of the endosymbionts of herbivorous insects reveals

eco-environmental adaptations: biotechnology applications. PLoS Genetics 2013,

9(1):e1003131.

19. Albertsen M, Hugenholtz P, Skarshewski A, Nielsen KL, Tyson GW, Nielsen PH:

Genome sequences of rare, uncultured bacteria obtained by differential coverage

binning of multiple metagenomes. Nat Biotechnol 2013, 31(6):533-538.

20. Hess M, Sczyrba A, Egan R, Kim TW, Chokhawala H, Schroth G, Luo S, Clark DS,

Chen F, Zhang T et al: Metagenomic discovery of biomass-degrading genes and

genomes from cow rumen. Science 2011, 331(6016):463-467.

21. Helbling DE, Ackermann M, Fenner K, Kohler H-PE, Johnson DR: The activity level of

a microbial community function can be predicted from its metatranscriptome. The

ISME journal 2012, 6(4):902-904.

22. Haack RA, Herard F, Sun JH, Turgeon JJ: Managing invasive populations of Asian

longhorned beetle and citrus longhorned beetle: a worldwide perspective. Annual

Review of Entomology 2010, 55:521-546.

23. Hu JF, Angeli S, Schuetz S, Luo YQ, Hajek AE: Ecology and management of exotic

and endemic Asian longhorned beetle Anoplophora glabripennis. Agricultural and

Forest Entomology 2009, 11(4):359-375.

24. Geib SM, Jimenez-Gasco Mdel M, Carlson JE, Tien M, Hoover K: Effect of host tree

species on cellulase activity and bacterial community composition in the gut of

larval Asian longhorned beetle. Environ Entomol 2009, 38(3):686-699.

25. Schloss PD, Delalibera I, Handelsman J, Raffa KF: Bacteria associated with the guts of

two wood-boring beetles: Anoplophora glabripennis and Saperda vestita

(Cerambycidae). Environ Entomol 2006, 35(3):625-629.

26. Scully ED, Hoover K, Carlson JE, Tien M, Geib SM: Midgut transcriptome

profiling of Anoplophora glabripennis, a lignocellulose degrading cerambycid

beetle. BMC Genomics 2013.

27. Scully ED, Hoover K, Carlson J, Tien M, Geib SM: Proteomic analysis of Fusarium

solani isolated from the Asian longhorned beetle, Anoplophora glabripennis. PloS

one 2012, 7(4):e32990.

28. Geib SM, Scully ED, Jimenez-Gasco MdM, Carlson JE, Tien M, Hoover K:

Phylogenetic analysis of Fusarium solani associated with the Asian longhorned

beetle, Anoplophora glabripennis. Insects 2012, 3(1):141-160.

29. Geib SM, Jimenez-Gasco Mdel M, Carlson JE, Tien M, Jabbour R, Hoover K: Microbial

community profiling to investigate transmission of bacteria between life stages of

the wood-boring beetle, Anoplophora glabripennis. Microbial ecology 2009,

58(1):199-211.

30. Weisburg WG, Barns SM, Pelletier DA, Lane DJ: 16S ribosomal DNA amplification

for phylogenetic study. Journal of bacteriology 1991, 173(2):697-703.

31. Lord NS, Kaplan CW, Shank P, Kitts CL, Elrod SL: Assessment of fungal diversity

using terminal restriction fragment (TRF) pattern analysis: comparison of 18S and

ITS ribosomal regions. Fems Microbiol Ecol 2002, 42(3):327-337.

32. Quince C, Lanzén A, Curtis TP, Davenport RJ, Hall N, Head IM, Read LF, Sloan WT:

Accurate determination of microbial diversity from 454 pyrosequencing data. Nat

Methods 2009, 6(9):639-641.

33. Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R: UCHIME improves sensitivity

and speed of chimera detection. Bioinformatics 2011, 27(16):2194-2200.

34. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA,

Oakley BB, Parks DH, Robinson CJ et al: Introducing mothur: open-source, platform-

independent, community-supported software for describing and comparing

microbial communities. Applied and environmental microbiology 2009, 75(23):7537-

7541.

35. Wang Q, Garrity GM, Tiedje JM, Cole JR: Naive Bayesian classifier for rapid

assignment of rRNA sequences into the new bacterial taxonomy. Applied and

environmental microbiology 2007, 73(16):5261-5267.

36. Campbell BJ, Yu L, Heidelberg JF, Kirchman DL: Activity of abundant and rare

bacteria in a coastal ocean. Proceedings of the National Academy of Sciences 2011,

108(31):12776-12781.

37. Reeder J, Knight R: The 'rare biosphere': a reality check. Nat Methods 2009, 6(9):636-637.

38. Ryberg M, Kristiansson E, Sjökvist E, Nilsson RH: An outlook on the fungal internal

transcribed spacer sequences in GenBank and the introduction of a web-based tool

for the exploration of fungal diversity. New Phytologist 2009, 181(2):471-477.

39. Nilsson RH, Kristiansson E, Ryberg M, Hallenberg N, Larsson KH: Intraspecific ITS

variability in the kingdom fungi as expressed in the international sequence

databases and its implications for molecular species identification. Evolutionary

bioinformatics online 2008, 4:193-201.

40. Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of

protein or nucleotide sequences. Bioinformatics 2006, 22(13):1658-1659.

41. Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ:

Gapped BLAST and PSI-BLAST: a new generation of protein database search

programs. Nucleic Acids Research 1997, 25(17):3389-3402.

42. Kõljalg U, Larsson KH, Abarenkov K, Nilsson RH, Alexander IJ, Eberhardt U, Erland S,

Høiland K, Kjøller R, Larsson E: UNITE: a database providing web‐based methods

for the molecular identification of ectomycorrhizal fungi. New Phytologist 2005,

166(3):1063-1068.

43. Martin MM, Martin JS: Surfactants: their role in preventing the precipitation of

proteins by tannins in insect guts. Oecologia 1984, 61(3):342-345.

44. Brune A, Kühl M: pH profiles of the extremely alkaline hindguts of soil-feeding

termites (Isoptera: Termitidae) determined with microelectrodes. Journal of Insect

Physiology 1996, 42(11):1121-1127.

45. Bian X, Shaw BD, Han Y, Christeller JT: Midgut proteinase activities in larvae of

Anoplophora glabripennis (Coleoptera: Cerambycidae) and their interaction with

proteinase inhibitors. Arch Insect Biochem 1996, 31(1):23-37.

46. Johnson KS, Felton GW: Potential influence of midgut pH and redox potential on

protein utilization in insect herbivores. Arch Insect Biochem 1996, 32(1):85-105.

47. Grayson JM: Digestive tract pH of six species of Coleoptera. Ann Entomol Soc Am

1958, 51(4):403-405.

48. Hurt RA, Qiu X, Wu L, Roh Y, Palumbo A, Tiedje J, Zhou J: Simultaneous recovery of

RNA and DNA from soils and sediments. Applied and environmental microbiology

2001, 67(10):4495-4503.

49. Stepanauskas R: Single cell genomics: an individual look at microbes. Current opinion

in microbiology 2012, 15(5):613-620.

50. Martin M: Cutadapt removes adapter sequences from high-throughput sequencing

reads. EMBnet.journal 2011, 17(1):10-12.

51. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan

L, Raychowdhury R, Zeng QD et al: Full-length transcriptome assembly from RNA-

Seq data without a reference genome. Nat Biotechnol 2011, 29(7):644-U130.

52. Eddy SR: HMMER: Profile hidden Markov models for biological sequence analysis.

Bioinformatics 1998, 14:755-763.

53. Huang Y, Gilna P, Li W: Identification of ribosomal RNA genes in metagenomic

fragments. Bioinformatics 2009, 25(10):1338-1340.

54. Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glöckner FO: SILVA: a

comprehensive online resource for quality checked and aligned ribosomal RNA

sequence data compatible with ARB. Nucleic Acids Research 2007, 35(21):7188-7196.

55. Bengtsson-Palme J, Ryberg M, Hartmann M, Branco S, Wang Z, Godhe A, De Wit P,

Sánchez-García M, Ebersberger I, de Sousa F et al: Improved software detection and

extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other

eukaryotes for analysis of environmental sequencing data. Methods in Ecology and

Evolution 2013, 4(10):914-919.

56. Huson DH, Auch AF, Qi J, Schuster SC: MEGAN analysis of metagenomic data.

Genome research 2007, 17(3):377-386.

57. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski

K, Dwight SS, Eppig JT et al: Gene Ontology: tool for the unification of biology. Nat

Genet 2000, 25(1):25-29.

58. Kanehisa M: The KEGG database. In: ‘In Silico’ Simulation of Biological Processes.

John Wiley & Sons, Ltd; 2008: 91-103.

59. Ye Y, Doak TG: A parsimony approach to biological pathway

reconstruction/inference for genomes and metagenomes. PLoS computational biology

2009, 5(8):e1000465.

60. Alexa A, Rahnenführer J, Lengauer T: Improved scoring of functional groups from

gene expression data by decorrelating GO graph structure. Bioinformatics 2006,

22(13):1600-1607.

61. Tatusov R, Fedorova N, Jackson J, Jacobs A, Kiryutin B, Koonin E, Krylov D,

Mazumder R, Mekhedov S, Nikolskaya A et al: The COG database: an updated

version includes eukaryotes. BMC Bioinformatics 2003, 4(1):41.

62. Marchler-Bauer A, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH,

Geer LY, Geer RC, Gonzales NR, Gwadz M et al: CDD: specific functional annotation

with the conserved domain database. Nucleic Acids Research 2009, 37(suppl 1):D205-

D210.

63. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B: The

Carbohydrate-Active EnZymes database (CAZy): an expert resource for

Glycogenomics. Nucleic Acids Research 2009, 37:D233-D238.

64. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths‐Jones S, Khanna A,

Marshall M, Moxon S, Sonnhammer ELL et al: The Pfam protein families database.

Nucleic Acids Research 2004, 32(suppl 1):D138-D141.

65. Sun Y, Buhler J: Designing patterns for profile HMM search. Bioinformatics 2007,

23(2):e36-e43.

66. Carter M, Smith M, Harrison R: Genetic analyses of the Asian longhorned beetle

(Coleoptera, Cerambycidae, Anoplophora glabripennis), in North America, Europe

and Asia. Biological invasions 2010, 12(5):1165-1182.

67. Geiser DM, del Mar Jiménez-Gasco M, Kang S, Makalowska I, Veeraraghavan N, Ward

TJ, Zhang N, Kuldau GA, O’Donnell K: FUSARIUM-ID v. 1.0: A DNA sequence

database for identifying Fusarium. In: Molecular Diversity and PCR-detection of

Toxigenic Fusarium Species and Ochratoxigenic Fungi. Springer; 2004: 473-479.

68. Schoch CL, Seifert KA, Huhndorf S, Robert V, Spouge JL, Levesque CA, Chen W,

Fungal Barcoding Consortium: Nuclear ribosomal

internal transcribed spacer (ITS) region as a universal DNA barcode marker for

Fungi. Proceedings of the National Academy of Sciences of the United States of America

2012, 109(16):6241-6246.

69. Gilbert JA, Field D, Huang Y, Edwards R, Li W, Gilna P, Joint I: Detection of large

numbers of novel sequences in the metatranscriptomes of complex marine microbial

communities. PloS one 2008, 3(8):e3042.

70. Sipos R, Székely AJ, Palatinszky M, Revesz S, Marialigeti K, Nikolausz M: Effect of

primer mismatch, annealing temperature and PCR cycle number on 16S rRNA

gene‐targetting bacterial community analysis. Fems Microbiol Ecol 2007, 60(2):341-

350.

71. Eskandarian HA, Impens F, Nahori M-A, Soubigou G, Coppée J-Y, Cossart P, Hamon

MA: A role for SIRT2-dependent histone H3K18 deacetylation in bacterial

infection. Science 2013, 341(6145).

72. Kåhrström CT: Bacterial pathogenesis: Legionella makes its mark on histones.

Nature Reviews Microbiology 2013, 11(6):359-359.

73. Chu C-C, Spencer JL, Curzi MJ, Zavala JA, Seufferheld MJ: Gut bacteria facilitate

adaptation to crop rotation in the western corn rootworm. Proceedings of the

National Academy of Sciences 2013.

74. Sternberg D, Vijayakumar P, Reese E: β-Glucosidase: microbial production and effect

on enzymatic hydrolysis of cellulose. Canadian Journal of Microbiology 1977,

23(2):139-147.

75. Geib SM, Tien M, Hoover K: Identification of proteins involved in lignocellulose

degradation using in gel zymogram analysis combined with mass spectroscopy-

based peptide analysis of gut proteins from larval Asian longhorned beetles,

Anoplophora glabripennis. Insect Science 2010, 17(3):253-264.

76. Bignell D: An experimental study of cellulose and hemicellulose degradation in the

alimentary canal of the American cockroach. Canadian Journal of Zoology 1977,

55(3):579-589.

77. Suh SO, Marshall CJ, McHugh JV, Blackwell M: Wood ingestion by passalid beetles in

the presence of xylose-fermenting gut yeasts. Molecular Ecology 2003, 12(11):3137-

3145.

78. Urbina H, Blackwell M: Multilocus phylogenetic study of the Scheffersomyces yeast

clade and characterization of the N-terminal region of xylose reductase gene. PloS

one 2012, 7(6):e39128.

79. Wright JD: Ethanol from biomass by enzymatic hydrolysis. Chem Eng Prog;(United

States) 1988, 84(8).

80. Hu JS, Gelman DB, Salvucci ME, Chen YP, Blackburn MB: Insecticidal activity of

some reducing sugars against the sweet potato whitefly, Bemisia tabaci, Biotype B.

Journal of Insect Science 2010, 10.

81. Kohn RaL, O.: Intermolecular calcium ion binding on

polyuronate-polygalacturonate and polyguluronate. Collect Czech Chem Commun 1977, 42:731-744.

82. Regalado V, Perestelo F, Rodriguez A, Carnicero A, Sosa FJ, De la Fuente G, Falcon

MA: Activated oxygen species and two extracellular enzymes: laccase and aryl-

alcohol oxidase, novel for the lignin-degrading fungus Fusarium proliferatum.

Applied Microbiology and Biotechnology 1999, 51(3):388-390.

83. Zimmermann W: Degradation of lignin by bacteria. Journal of Biotechnology 1990,

13(2):119-130.

84. Martin MMC: The evolution of insect-fungus associations: from contact to stable

symbiosis. American zoologist 1992, 32(4):593-605.

85. Joy K: Ammonia, glutamine, and asparagine: a carbon-nitrogen interface. Canadian

Journal of Botany 1988, 66(10):2103-2109.

86. Clark A, Bloch K: The absence of sterol synthesis in insects. J Biol Chem 1959,

234(10):2578-2582.

87. Morales-Ramos JA, Rojas MG, Sittertz-Bhatkar H, Saldana G: Symbiotic relationship

between Hypothenemus hampei (Coleoptera : Scolytidae) and Fusarium solani

(Moniliales : Tuberculariaceae). Ann Entomol Soc Am 2000, 93(3):541-547.

88. Adams AS, Six DL, Adams SM, Holben WE: In vitro interactions between yeasts and

bacteria and the fungal symbionts of the mountain pine beetle (Dendroctonus

ponderosae). Microbial ecology 2008, 56(3):460-466.

89. Jones KG, Dowd PF, Blackwell M: Polyphyletic origins of yeast-like endocytobionts

from anobiid and cerambycid beetles. Mycological research 1999, 103:542-546.

90. Kukor JJ, Cowan DP, Martin MM: The role of ingested fungal enzymes in cellulose

digestion in the larvae of cerambycid beetles. Physiological Zoology 1988, 61(4):364-

371.

91. Baker J, Lum P: Development of aposymbiosis in larvae of Sitophilus oryzae

(Coleoptera: Curculionidae) by dietary treatment with antibiotics. Journal of Stored

Products Research 1973, 9(4):241-245.

92. Suh SO, McHugh JV, Pollock DD, Blackwell M: The beetle gut: a hyperdiverse source

of novel yeasts. Mycological research 2005, 109(Pt 3):261-265.

93. Bergauer P, Fonteyne P-A, Nolard N, Schinner F, Margesin R: Biodegradation of

phenol and phenol-related compounds by psychrophilic and cold-tolerant alpine

yeasts. Chemosphere 2005, 59(7):909-918.

94. Johannes C, Majcherczyk A: Natural mediators in the oxidation of polycyclic

aromatic hydrocarbons by laccase mediator systems. Applied and environmental

microbiology 2000, 66(2):524-528.

95. Masai E, Kubota S, Katayama Y, Kawai S, Yamasaki M, Morohoshi N:

Characterization of the C alpha-dehydrogenase gene involved in the cleavage of

beta-aryl ether by Pseudomonas paucimobilis. Bioscience, biotechnology, and

biochemistry 1993, 57(10):1655.

96. Nasir H, Noda H: Yeast‐like symbiotes as a sterol source in anobiid beetles

(Coleoptera, Anobiidae): possible metabolic pathways from fungal sterols to 7‐

dehydrocholesterol. Arch Insect Biochem 2003, 52(4):175-182.

97. Coy MR, Salem TZ, Denton JS, Kovaleva ES, Liu Z, Barber DS, Campbell JH, Davis

DC, Buchman GW, Boucias DG et al: Phenol-oxidizing laccases from the termite gut.

Insect Biochemistry and Molecular Biology 2010, 40(10):723-732.

98. Hoegger PJ, Kilaru S, James TY, Thacker JR, Kües U: Phylogenetic comparison and

classification of laccase and related multicopper oxidase protein sequences. FEBS

Journal 2006, 273(10):2308-2326.

99. Liu Y, Bauer SC, Imlay JA: The YaaA protein of the Escherichia coli OxyR regulon

lessens hydrogen peroxide toxicity by diminishing the amount of intracellular

unincorporated iron. Journal of bacteriology 2011, 193(9):2186-2196.

100. Poland TM, McCullough DG: Emerald ash borer: invasion of the urban forest and

the threat to North America's ash resource. J Forest 2006, 104(3):118-124.

101. Kurz WA, Dymond C, Stinson G, Rampley G, Neilson E, Carroll A, Ebata T, Safranyik

L: Mountain pine beetle and forest carbon feedback to climate change. Nature 2008,

452(7190):987-990.

102. Colman DR, Toolson EC, Takacs-Vesbach CD: Do diet and taxonomy influence insect

gut bacterial communities? Molecular Ecology 2012, 21(20):5124-5137.

103. Reid NM, Addison SL, Macdonald LJ, Lloyd-Jones G: Biodiversity of active and

inactive bacteria in the gut flora of wood-feeding huhu beetle larvae (Prionoplus

reticularis). Applied and environmental microbiology 2011, 77(19):7000-7006.

104. Keeling CI, Yuen MM, Liao NY, Docking TR, Chan SK, Taylor GA, Palmquist DL,

Jackman SD, Nguyen A, Li M: Draft genome of the mountain pine beetle,

Dendroctonus ponderosae Hopkins, a major forest pest. Genome Biology 2013,

14(3):R27.


Table 5-1. Ecological indices for 16S bacterial communities sampled from midguts from 4

individual A. glabripennis larvae feeding on sugar maple. Indices were also calculated with

singleton OTUs removed.

Sample ID  Number of OTUs  Chao  Chao 95% CI  Ace  Ace 95% CI  Shannon  Shannon 95% CI  Simpson  Simpson 95% CI
1          198             451   342-641      659  565-776     3.69     3.62-3.74       0.043    0.041-0.046
2          91              156   121-230      361  199-315     2.40     2.29-2.50       0.200    0.185-0.217
3          111             246   178-385      200  295-451     2.90     2.81-2.98       0.100    0.093-0.107
4          82              141   107-218      142  84-287      2.80     2.70-2.99       0.115    0.106-0.125
Singleton OTUs Removed
1          143             201   171-261      240  209-286     3.57     3.51-3.62       0.045    0.043-0.048
2          78              111   92-156       155  127-200     2.33     2.23-2.44       0.205    0.190-0.221
3          90              149   116-221      192  157-244     2.82     2.73-2.90       0.103    0.093-0.110
4          68              91    77-130       96   80-131      2.72     2.63-2.81       0.118    0.109-0.128
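As an illustration of how indices of the kind reported in Table 5-1 can be derived from per-OTU read counts, a minimal Python sketch follows. It is not the pipeline used in this work: the function names and example counts are hypothetical, Chao is assumed to be the bias-corrected Chao1 estimator, Shannon is assumed to use the natural logarithm, and Simpson is assumed to be the probability that two reads drawn without replacement belong to the same OTU (so lower values indicate higher diversity). The Ace estimator and the confidence intervals are omitted.

```python
import math


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed OTUs."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson(counts):
    """Simpson dominance: chance that two reads drawn without
    replacement come from the same OTU (assumed definition)."""
    n = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed variant)."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def drop_singletons(counts):
    """Remove OTUs represented by a single read."""
    return [c for c in counts if c > 1]


# Hypothetical read counts for six OTUs in one sample.
example = [10, 5, 2, 1, 1, 1]
print(chao1(example), shannon(example), simpson(example))
print(chao1(drop_singletons(example)))
```

Because Chao1 depends directly on the number of singleton OTUs, removing singletons lowers the richness estimate far more than it changes Shannon or Simpson, which is the pattern seen in the lower half of the table.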


Table 5-2. Taxonomic classifications of 16S bacterial OTUs detected in all A. glabripennis larval guts sampled.

Phylum          Class                Family                Number of OTUs
Actinobacteria  Actinobacteria       Propionibacteriaceae  1
                                     Propionibacteriaceae  1
Bacteroidetes   Sphingobacteria      Chitinophagaceae      1
                                     Prevotellaceae        1
                                     Sphingobacteriaceae   1
Firmicutes      Bacilli              Staphylococcaceae     1
                                     Streptococcaceae      3
Proteobacteria  Alphaproteobacteria  Brucellaceae          1
                                     Methylocystaceae      1
                                     Sphingomonadaceae     1
                Betaproteobacteria   Burkholderiaceae      1
                                     Comamonadaceae        1
                Gammaproteobacteria  Enterobacteriaceae    3
                                     Pasteurellaceae       1
                                     Pseudomonadaceae      1
                                     Xanthomonadaceae      2


Table 5-3. Ecological indices for ITS fungal communities sampled from the guts of individual third instar A. glabripennis larvae feeding on the heartwood of sugar maple. Indices were also computed with singleton OTUs removed.

Sample     Number of OTUs  Chao  Chao 95% CI  Ace  Ace 95% CI  Shannon  Shannon 95% CI  Simpson  Simpson 95% CI
1          20              23    20-37        25   21-45       1.74     1.70-1.79       0.22     0.21-0.23
2          20              20    20-26        22   20-33       2.52     2.44-2.60       0.10     0.09-0.11
3          28              43    32-92        83   57-134      1.89     1.85-1.94       0.22     0.21-0.23
4          15              21    16-53        22   17-39       1.67     1.52-1.82       0.33     0.28-0.39
Singleton OTUs Removed
1          11              11    11-11        13   11-26       1.69     1.65-1.73       0.22     0.21-0.23
2          16              16    16-16        16   16-16       2.46     2.39-2.53       0.10     0.09-0.11
3          16              17    16-30        22   17-39       1.84     1.80-1.88       0.22     0.21-0.24
4          12              12    12-12        12   12-18       1.62     1.48-1.76       0.33     0.28-0.39


Table 5-4. Trinity assembly metrics for gut contents and whole gut transcriptome libraries.

Library                  Number of    Minimum      N80          N50          N20          Maximum      Total Length of
                         Transcripts  Transcript   Transcript   Transcript   Transcript   Transcript   Assembled
                                      Length (nt)  Length (nt)  Length (nt)  Length (nt)  Length (nt)  Transcripts
Gut Contents             161,117      200          323          684          1,945        31,383       90.09 Mb
Gut Contents: Microbial  7,952        200          245          334          546          5,049        2.69 Mb
Gut Contents: Bacterial  3,084        200          247          363          653          5,049        1.12 Mb
Gut Contents: Fungal     4,868        200          242          316          484          2,259        1.58 Mb
Whole Gut                61,812       200          272          592          1,937        26,118       30.6 Mb
Whole Gut: Microbial     3,167        200          266          492          1,271        14,152       1.42 Mb
Whole Gut: Bacterial     2,154        200          294          548          1,445        14,152       1.05 Mb
Whole Gut: Fungal        1,013        200          234          353          799          3,643        0.36 Mb
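The N80, N50, and N20 columns in Table 5-4 are length-weighted summary statistics. The sketch below shows one common way to compute such an Nx value from a list of transcript lengths; it is an illustration under stated assumptions, not the script used to produce the table. The function name `nx` and the example lengths are hypothetical, and the definition assumed here is the length L at which transcripts of length L or longer contain at least the given fraction of all assembled bases.

```python
def nx(lengths, fraction):
    """Return the Nx statistic for a collection of transcript lengths.

    Transcripts are visited from longest to shortest; the function returns
    the length at which the running total first reaches `fraction` of the
    total assembled length (fraction=0.5 gives N50).
    """
    if not lengths:
        raise ValueError("at least one transcript length is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    threshold = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical assembly with five transcripts (lengths in nt).
lengths = [200, 300, 400, 500, 600]
print(nx(lengths, 0.2), nx(lengths, 0.5), nx(lengths, 0.8))
```

Under this definition N20 is always at least as large as N50, which in turn is at least as large as N80, matching the ordering of the columns in the table.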


Table 5-5. Taxonomic classification of microbial rRNAs detected in the gut contents and whole gut transcriptome libraries.

Kingdom   Phylum               Class                Order               Number of Unique Transcripts  Type           Library
Bacteria  Actinobacteria       Actinobacteria       Actinomycetales     105                           16S, 23S       Gut Contents, Whole Gut
          Bacteroidetes        N/A                  N/A                 4                             23S            Gut Contents
                               Sphingobacteria      Sphingobacteriales  11                            16S, 23S       Gut Contents, Whole Gut
          Deinococcus-Thermus  Deinococci           Deinococcales       5                             23S            Whole Gut
          Firmicutes           Bacilli              Bacillales          15                            16S, 23S       Gut Contents, Whole Gut
                                                    Lactobacillales     14                            16S, 23S       Whole Gut
                                                    N/A                 7                             16S, 23S       Gut Contents
          Proteobacteria       Alphaproteobacteria  Rhizobiales         12                            16S, 23S       Gut Contents
                                                    Rickettsiales       6                             23S            Gut Contents
                                                    Sphingomonadales    16                            16S, 23S       Gut Contents, Whole Gut
                                                    N/A                 3                             23S            Whole Gut
                               Betaproteobacteria   Burkholderiales     24                            16S, 23S       Gut Contents, Whole Gut
                                                    N/A                 4                             23S            Whole Gut
                               Gammaproteobacteria  Enterobacteriales   29                            16S, 23S       Gut Contents, Whole Gut
                                                    Pseudomonadales     19                            16S, 23S       Gut Contents, Whole Gut
                                                    N/A                 4                             23S            Whole Gut
                               N/A                  N/A                 8                             23S            Gut Contents, Whole Gut
          N/A                  N/A                  N/A                 7                             16S, 23S       Whole Gut
Fungi     Ascomycota           Saccharomycetes      Saccharomycetales   48                            18S, 28S       Gut Contents
                               Sordariomycetes      Hypocreales         24                            8S, 18S, 28S   Gut Contents
                                                    N/A                 6                             28S            Gut Contents, Whole Gut
          Basidiomycota        N/A                  N/A                 7                                            Whole Gut
          N/A                  N/A                  N/A                 9                             8S, 18S, 28S   Gut Contents


Table 5-6: Taxonomic identity of 16S OTUs supported by expression data.

Taxonomic Classification  Percent     OTU ID  Number of Transcribed  Rare or    Shared?
to Lowest Possible Rank   Nucleotide          rRNAs with BLASTN Hit  Abundant?
                          Identity
Acinetobacter             97          554     1                      Abundant   No
Actinomyces               100         1546    1                      Rare       No
Actinomycetales           95          346     2                      Abundant   No
Actinomycetales           96          1459    2                      Abundant   Yes
Actinomycetales           96          161     1                      Rare       No
Bacteria                  96          1573    1                      Rare       No
Burkholderiaceae          97          775     1                      Abundant   Yes
Caryophanon               100         1622    1                      Rare       No
Cellovibrio               98          1485    2                      Rare       No
Curtobacterium            97          56      1                      Abundant   No
Enterobacteriaceae        99          21      2                      Abundant   No
Enterobacteriaceae        97          1331    3                      Rare       No
Enterobacteriaceae        96          1594    4                      Abundant   No
Enterobacteriaceae        98          1460    1                      Abundant   Yes
Enterobacteriaceae        95          1464    1                      Abundant   Yes
Novosphingobium           96          19      1                      Abundant   Yes
Pasteurellaceae           95          1548    1                      Rare       No
Pediococcus               95          1403    1                      Rare       No
Propionibacterium         99          35      1                      Abundant   Yes
Pseudomonadaceae          100         1558    1                      Rare       No
Pseudomonadaceae          97          1616    1                      Rare       No
Pseudomonas               97          783     1                      Abundant   Yes
Sphingobacterium          98          1064    1                      Abundant   No
Staphylococcus            99          849     1                      Abundant   Shared
Streptococcus             98          147     1                      Abundant   No


Table 5-7. Annotation statistics for microbial transcripts detected in the gut contents and whole gut libraries.

                                                      Gut Contents  Whole Gut
Number of rRNAs                                       182           237
Number of Transcripts with BLASTX Alignments          7,952         3,167
Number of Transcripts with Gene Ontology Assignments  1,083         1,225
Number of Transcripts with KEGG Assignments           705           1,103
Number of Transcripts with Pfam Assignments           1,686         2,766
Number of Transcripts with KOG/COG Assignments        1,554         2,328


Table 5-8. Number of unique KO terms found in KEGG pathways associated with carbon metabolism, nitrogen acquisition and amino acid synthesis, nutrient acquisition, and detoxification.

Pathway                                               ALB  Microbe

Carbon Metabolism Pathways
2-Oxocarboxylic acid metabolism                       7    18
ABC transporters                                      8    36
Amino sugar and nucleotide sugar metabolism           22   30
Butanoate metabolism                                  11   12
C5-Branched dibasic acid metabolism                   0    4
Carbon Metabolism                                     36   67
Citrate cycle (TCA cycle)                             14   22
Fructose and mannose metabolism                       11   19
Galactose metabolism                                  11   16
Glycolysis / Gluconeogenesis                          22   28
Pentose and glucuronate interconversions              12   13
Pentose phosphate pathway                             7    18
Phosphotransferase system (PTS)                       1    13
Propanoate metabolism                                 13   15
Pyruvate metabolism                                   13   31
Starch and sucrose metabolism                         17   23

Nitrogen Acquisition and Amino Acid Synthesis
Alanine, aspartate and glutamate metabolism           20   21
Arginine and proline metabolism                       20   19
beta-Alanine metabolism                               14   6
Biosynthesis of amino acids                           32   67
Cyanoamino acid metabolism                            2    4
Cysteine and methionine metabolism                    15   17
D-Alanine metabolism                                  0    3
D-Glutamine and D-glutamate metabolism                1    2
Glutathione metabolism                                14   10
Glycine, serine and threonine metabolism              16   17
Histidine metabolism                                  6    3
Lysine biosynthesis                                   3    8
Lysine degradation                                    13   5
Nitrogen metabolism                                   4    9
Phenylalanine metabolism                              7    6
Phenylalanine, tyrosine and tryptophan biosynthesis   3    9
Purine metabolism                                     54   61
Pyrimidine metabolism                                 42   49
Selenocompound metabolism                             4    6
Taurine and hypotaurine metabolism                    3    6
Tryptophan metabolism                                 8    8
Tyrosine metabolism                                   10   10
Valine, leucine and isoleucine biosynthesis           1    6
Valine, leucine and isoleucine degradation            22   7

Nutrient Acquisition
Biosynthesis of unsaturated fatty acids               4    5
Biotin metabolism                                     3    4
Fatty acid biosynthesis                               5    9
Fatty acid degradation                                19   8
Fatty acid elongation                                 7    1
Folate biosynthesis                                   5    4
Insect hormone biosynthesis                           3    0
Lipoic acid metabolism                                2    1
Nicotinate and nicotinamide metabolism                8    6
One carbon pool by folate                             5    12
Pantothenate and CoA biosynthesis                     7    7
Riboflavin metabolism                                 5    6
Steroid biosynthesis                                  4    7
Steroid hormone biosynthesis                          4    0
Sulfur metabolism                                     4    7
Terpenoid backbone biosynthesis                       9    11
Thiamine metabolism                                   3    9
Ubiquinone and other terpenoid-quinone biosynthesis   3    3
Vitamin B6 metabolism                                 4    3

Detoxification
Aminobenzoate degradation                             6    3
Atrazine degradation                                  0    2
Benzoate degradation                                  3    4
Bisphenol degradation                                 1    0
Caprolactam degradation                               4    2
Chloroalkane and chloroalkene degradation             4    3
Dioxin degradation                                    0    2
Drug metabolism - cytochrome P450                     7    3
Drug metabolism - other enzymes                       15   5
Ethylbenzene degradation                              0    2
Metabolism of xenobiotics by cytochrome P450          8    3
Naphthalene degradation                               3    4
Nitrotoluene degradation                              1    0
Polycyclic aromatic hydrocarbon degradation           2    2
Styrene degradation                                   1    0
Toluene degradation                                   0    4
Xylene degradation                                    0    2


Figure 5-1. Enlarged midgut associated with A. glabripennis and filamentous microbes observed in association with the midgut epithelial cells (inset).


Figure 5-2a (top). Rarefaction analysis of 16S bacterial communities sampled from four 3rd instar A. glabripennis larvae feeding on sugar maple. Curves fail to reach saturation, indicating that the community may harbor additional OTUs not sampled for sequencing. This hypothesis is supported by the 20 additional OTUs detected by further sequencing of Sample 1 (dotted line). Figure 5-2b (bottom). Rarefaction analysis of 16S bacterial communities sampled from four 3rd instar A. glabripennis larvae feeding on sugar maple with singleton OTUs removed. Curves fail to reach saturation even with singleton OTUs removed. Additional sequencing of community 1 (dotted line) led to the detection of 20 additional OTUs.
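To make the idea behind a rarefaction curve concrete, the following Python sketch computes the expected number of OTUs observed when a sample is subsampled to a smaller number of reads. It is offered only as an illustration: it uses the analytical (hypergeometric) expectation rather than repeated random resampling, which may differ from the procedure used to draw the figures, and the function name and example counts are hypothetical.

```python
from math import comb


def expected_otus(counts, depth):
    """Expected number of OTUs seen in a random subsample of `depth` reads.

    For each OTU with c reads out of n total, the probability of missing it
    entirely in a subsample drawn without replacement is
    C(n - c, depth) / C(n, depth); the expectation sums the complements.
    """
    n = sum(counts)
    if not 0 <= depth <= n:
        raise ValueError("depth must be between 0 and the total read count")
    total_draws = comb(n, depth)
    return sum(1 - comb(n - c, depth) / total_draws for c in counts if c > 0)


# Hypothetical read counts for six OTUs; the curve is the list of
# (depth, expected OTUs) pairs across all subsampling depths.
example = [10, 5, 2, 1, 1, 1]
curve = [(d, expected_otus(example, d)) for d in range(0, sum(example) + 1, 5)]
for depth, otus in curve:
    print(depth, round(otus, 2))
```

A curve that is still rising at the full sequencing depth, as described for the bacterial communities above, suggests that further sequencing would continue to detect OTUs not yet observed.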


Figure 5-3. Relative abundances of bacterial classes detected through 16S community analysis of four 3rd instar A. glabripennis larvae.


Figure 5-4a (top). Venn diagram at distance 0.03 of OTUs detected through 16S sequencing of four gut bacterial communities sampled from third instar A. glabripennis larvae. Twenty-two shared OTUs were detected through this analysis. Figure 5-4b (bottom). Venn diagram at distance 0.03 of OTUs detected through 16S sequencing of four gut bacterial communities sampled from third instar A. glabripennis larvae with singleton OTUs removed. Removal of singleton OTUs substantially reduced the number of bacterial OTUs that were unique to one sample.


Figure 5-5. Hierarchical cluster analysis of bacterial families detected through 16S amplicon sequencing of four A. glabripennis larval midguts.


Figure 5-6a (top). Rarefaction analysis of ITS fungal amplicons sequenced from four 3rd instar larval A. glabripennis midguts. Rarefaction curves appear to reach saturation, indicating sufficient sampling to detect the majority of the fungal community diversity, although additional sampling of samples 1 and 3 detected 6 and 12 additional OTUs, respectively.


Figure 5-6b (bottom). Rarefaction analysis of ITS fungal amplicons sequenced from four 3rd instar larval A. glabripennis midguts with singleton OTUs removed. Communities were pooled prior to computation of rarefaction curves. Rarefaction curves saturate with removal of singleton OTUs, indicating that the majority of the fungal community has been sampled for sequencing and that additional sequencing would likely detect only rare OTUs.


Figure 5-7. Abundance of fungal orders detected in ITS amplicon data sampled from communities associated with four 3rd instar larval A. glabripennis midguts. Fungal reads were exclusively classified to phylum Ascomycota. At the order level, the communities were dominated by Hypocreales, Saccharomycetales, and Wallemiales.


Figure 5-8a (top). Venn diagram illustrating overlap in ITS OTUs at a distance of 0.03 among four larval A. glabripennis midgut communities. Although a large percentage of OTUs were unique to a single community, these largely represented singleton OTUs. Figure 5-8b (bottom). Venn diagram illustrating overlap in ITS OTUs at a distance of 0.03 among four larval A. glabripennis midgut communities with singletons removed. Removal of singleton OTUs demonstrates that a large percentage of fungal OTUs were shared among three or more insects sampled for sequencing.


Figure 5-9. Maximum likelihood analysis of fungal ITS sequences taxonomically assigned to Fusarium solani detected in A. glabripennis larval midguts. These sequences were more closely related to fungal ITS sequences obtained from F. solani cultivated from the PSU colony than to those from beetles collected at a field site in Worcester, MA. The star represents the F. solani-derived OTU detected in the current study, and the scale bar represents the number of substitutions per site.


Figure 5-10. Bacterial and fungal classes detected in the midgut (top) and midgut contents (bottom) libraries.


Figure 5-11. Relative abundances of bacterial clusters of orthologous genes found in the midgut and midgut contents libraries.


Figure 5-12. Relative abundances of fungal clusters of orthologous genes (KOGs) in the midgut and midgut contents libraries.


Figure 5-13. Pathways for pyruvate utilization detected in the A. glabripennis midgut and in the gut communities. The gut community has an expanded capacity to synthesize pyruvate from pentose sugars found in hemicellulose and converts pyruvate to essential branched-chain amino acids and homocitrate, an essential component of the lysine biosynthetic pathway.


Figure 5-14. Partial pathways for aromatic amino acid biosynthesis and biosynthesis of other essential amino acids detected in the gut community. In some cases, pathways encoded by A. glabripennis are complemented by transcripts derived from the midgut microbiota and vice versa. For example, both the beetle and gut microbes encoded transcripts that could be used in the biosynthesis of lysine. While the microbial community synthesized the essential amino acid threonine, the beetle was capable of converting this to glycine and serine, and both microbes and A. glabripennis encoded pathways involved in the biosynthesis of the nonessential amino acid serine.


Figure 5-15. Putative pathways for xylose utilization based on BLASTX annotation of transcripts sampled from the midgut microbial community of larval A. glabripennis. Xylose can be shuttled into the pentose phosphate pathway by two different pathways detected in the metatranscriptome: the oxoreductive pathway and the isomerase pathway. Fructose-1,6-phosphate produced from the pentose phosphate pathway can be converted into ethanol, acetate, or acetyl coA, which could be used for the synthesis of ATP or fatty acids. Furthermore, acetyl coA could ultimately be shuttled into the citric acid cycle. Since intermediates of the TCA cycle are frequently used to synthesize the backbones of α-ketoglutarate family amino acids, carbon from xylose could eventually be integrated into amino acids. Transcripts originating from the oxoreductive pathway were classified as yeasts, while transcripts originating from the isomerase pathway were classified as lactic acid bacteria. ADH: alcohol dehydrogenase; PDC: pyruvate dehydrogenase complex; ACS: acetyl coA synthetase.


Chapter 6 Conclusions and Future Directions

Wood-feeding cerambycid beetles pose significant threats to forest ecosystems, commercial nurseries, and biofeedstock plantations worldwide [1, 2], yet very little is known about their biology in their native and invasive ranges that could be exploited to effectively control these destructive pests. Natural enemies of invasive cerambycids are rare, and pesticides are often costly and ineffective at controlling these pests once they have become established [3]. Thus, new and innovative approaches must be undertaken to prevent irreversible destruction of both urban and forest landscapes, ecosystem disturbances, and large-scale losses of tree species that are economically valuable for their hardwood, fruit, and renewable carbon. Despite these threats, little is known about the molecular-level interactions among these insects, their gut microbes, and their host trees that could be exploited for novel control mechanisms.

The purpose of this work was to characterize the interactions between a wood-boring cerambycid, its gut microbes, and its host trees that enable this insect to colonize and thrive in a broad range of seemingly healthy deciduous host trees, digest prominent components of woody tissue, including recalcitrant polysaccharides (e.g., cellulose, hemicellulose, pectin, and callose) and the lignin biopolymer, extract essential nutrients from woody tissue, and synthesize nutrients that are deficient in heartwood. The model organism for this research was Anoplophora glabripennis (Asian longhorned beetle), a destructive wood-boring cerambycid that was first detected in the United States in 1996. Despite its relatively recent introduction and its limited geographic range, this insect has caused millions of dollars in damage to urban landscapes [2], and eradication efforts have focused on removal and destruction of all infested trees (and sometimes nearby uninfested trees that could serve as suitable hosts) within a quarantine zone. These efforts are protracted, costly, and have failed in some cases.

A. glabripennis, in collaboration with its gut microbes, is able to liberate sugar from cellulose and hemicellulose, degrade the lignin polymer that surrounds and protects these recalcitrant polysaccharides, and acquire essential nutrients required for survival. These interactions not only represent putative targets for control, but also opportunities to prospect for enzymes that could be used to enhance the production of cellulosic ethanol on an industrial scale.

To characterize the interactions between A. glabripennis and its gut community that enhance survival in woody tissue, we sought to investigate the digestive and metabolic capacities of A. glabripennis and its gut microbial community using next generation sequencing approaches.

Through transcriptional profiling of the A. glabripennis midgut, we demonstrated that the insect itself has a rich digestive capacity primed for degrading woody polysaccharides and neutralizing potent toxic phytochemicals that could be ingested while feeding in heartwood. A. glabripennis also produces a number of transcripts with predicted roles as helper enzymes that could facilitate the degradation of lignin in the midgut. While no enzymes capable of catalyzing the degradation of the lignin biopolymer were discovered, transcripts predicted to encode enzymes capable of catalyzing local degradation of lignin or enhancing the oxidation of dominant linkages in lignin in the presence of other enzymes were identified. These transcripts were annotated as aldo-keto reductases, peroxidases, alcohol dehydrogenases, and laccases.

Large-scale comparisons of gut transcriptome libraries across different phytophagous insects revealed that A. glabripennis produces its own unique profile of cell wall degrading enzymes and uses a completely different set of glycoside hydrolases to degrade woody tissue relative to other wood-feeding insects. These comparisons also showed that the A. glabripennis transcriptome is enriched for transcripts predicted to encode peptidases, chitinases, carboxylesterases, and oxidoreductases. While the carboxylesterases and oxidoreductases likely function as detoxification enzymes and the chitinases may function to liberate carbon and nitrogen from fungal cell walls, it is unknown why this insect produces so many different digestive peptidases as it feeds in nitrogen-deficient woody tissue. We hypothesize that these peptidases function in nitrogen acquisition from proteins cross-linked in the plant cell wall matrix or from nitrogen-fixing and nitrogen-recycling microbes associated with the gut. Notably, the A. glabripennis transcriptome was also distinct from the previously sequenced termite transcriptomes, supporting the hypothesis that these insects have different strategies for overcoming many of the challenges associated with feeding in wood. This finding is not unexpected given that wood-feeding evolved independently in many evolutionary lineages, but it is significant in that it presents a new paradigm for insect-microbe interactions that facilitate the degradation of woody tissue, providing new opportunities to mine for novel lignocellulose degrading genes that could be exploited on an industrial scale to enhance cellulosic ethanol production. Despite the versatile metabolic capacities of A. glabripennis, we were also able to identify gaps in its digestome that could be filled by enzymes produced by microbes colonizing the gut. These gaps included xylose and pentose sugar utilization, large-scale lignin depolymerization, synthesis of essential amino acids, sterols, and vitamins, and nitrogen acquisition.

Metagenomic profiling revealed a rich assemblage of microbes associated with the A. glabripennis midgut, which comprised over 160 bacterial OTUs and 7 fungal OTUs. Taxa previously detected in A. glabripennis larvae collected in China were also detected, suggesting that a subset of these microbes might be common among A. glabripennis larvae and that they could make valuable contributions to digestive and physiological processes in the gut. The most dominant OTUs detected in the midgut were classified as lactic acid bacteria belonging to the genus Leuconostoc. Furthermore, shotgun reads taxonomically classified to the genus Fusarium were also detected in the metagenome library. Fungi belonging to the F. solani species complex have been consistently isolated from A. glabripennis larvae feeding in a variety of host tree species and collected from several different field sites, suggesting that these fungi are intricately associated with the midgut. While pathways for synthesis of essential nutrients and amino acids, nitrogen fixation, nitrogen recycling, and detoxification were detected, one of the most striking findings in this dataset was the detection of reads predicted to encode enzymes capable of oxidizing major linkages in lignin, suggesting that the microbial community works in tandem with beetle-derived helper enzymes to facilitate degradation of lignin in the midgut. Furthermore, comparison of the A. glabripennis metagenome annotations to annotations compiled from microbial communities associated with other herbivores revealed that the A. glabripennis midgut metagenome is distinct from other gut-associated communities and is more similar to fungal gallery communities associated with other wood-feeding insects such as Sirex. Many of these fungal gallery communities have documented lignin degrading capacities; comparisons between the A. glabripennis midgut community and the fungal gallery communities revealed that lignin degrading candidate genes identified in the A. glabripennis midgut community were also found in high abundances in fungal gallery communities, but were absent or present in low abundances in herbivore gut communities (few of which are known to degrade lignin), supporting their proposed role in lignin degradation. Interestingly, the lignin degrading candidates associated with the A. glabripennis midgut metagenome and the fungal gallery communities were absent or present in relatively low abundances in four termite hindgut communities, supporting the hypothesis that the mechanisms of lignin degradation in the A. glabripennis midgut are distinct from the mechanisms used by termites.

While targeted amplicon sequencing and shotgun metagenome sequencing provided valuable insight into the metabolic potential of the community as a whole, few F. solani-derived reads were detected in the shotgun metagenome data, and the potential contributions of this fungus to A. glabripennis digestive physiology could not be predicted. However, MudPIT-based secretome analysis of an A. glabripennis-derived F. solani isolate cultivated on wood chips allowed us to survey this isolate’s cellulose, hemicellulose, and lignin degrading capabilities in culture and provided us with useful insight into its potential contributions to digestive processes in the A. glabripennis midgut. Using this approach, we were able to annotate over 400 secreted proteins, including proteins belonging to 28 glycoside hydrolase families capable of disrupting dominant hardwood polysaccharides. Several secreted laccases, peroxidases, and enzymes associated with lignin degradation were also detected, supporting our hypothesis that this isolate can make pivotal contributions to lignin degrading processes in the midgut. It has been previously hypothesized that fungi associated with the midguts of cerambycid beetles can concentrate nitrogen obtained from the environment or through interactions with nitrogen fixing/recycling microbes present in the gut and provision it to their insect hosts. Proteinases with broad substrate specificities and ureases were observed in the culture medium, indicating that this isolate has the capability to digest plant cell wall proteins and can recycle nitrogenous waste under periods of nutrient limitation, supporting a potential nitrogen provisioning role for these microbes.

While DNA sequencing and MudPIT profiling elucidated the metabolic potential of the A. glabripennis midgut community and a midgut-associated F. solani isolate, it remained unclear whether any of the annotated genes are actively expressed in the midgut. To focus our survey on transcriptionally active microbes, we performed deep sequencing of RNA isolated from both the midgut and the midgut contents and assembled over 10,000 microbial protein-coding transcripts. Many of the mRNAs detected through this analysis originated from microbial taxa that have been previously detected in association with the midgut and were found in association with multiple individuals collected from the same population. The most dominant taxon detected in this analysis was Pediococcus, a lactic acid bacterial genus with proven abilities to ferment xylose, arabinose, and other sugar substrates commonly found in plant materials. This is of significant interest because the most dominant taxon detected through analysis of the midgut metagenome was also a lactic acid bacterium (Leuconostoc) that had a similar set of candidate genes as Pediococcus, which included β-glucosidases, β-xylosidases, branched and aromatic amino acid biosynthetic pathways, several detoxification genes, and chitin binding genes suitable for anchoring to the chitin-derived peritrophic matrix. The consistent association of lactic acid bacteria with A. glabripennis may represent a biologically relevant symbiosis and warrants further investigation to determine the precise fitness effects and contributions to digestive physiology in A. glabripennis.

One of the most striking findings of the metatranscriptome analysis was the discovery of two separate pathways for the metabolism of xylose, a predominant wood sugar that is found in hemicellulose in deciduous tree species. While A. glabripennis produces transcripts predicted to target hardwood xylans, no endogenous enzymes involved in the utilization or metabolism of xylose, which could serve as a useful substrate for energy, fatty acid, and amino acid production, were detected. However, yeasts and lactic acid bacteria associated with the community can convert this pentose sugar (and other pentose sugars found in hemicellulose) into a variety of other products that could be used in various biosynthetic pathways. This finding is significant in that xylose-fermenting yeasts have been consistently isolated from the guts of cerambycid beetles collected all over the world, yet it was previously unknown whether these pathways were actually expressed in their insect hosts [4]. Numerous pathways with predicted involvement in synthesizing essential nutrients that are deficient in woody tissue were also detected in association with these yeasts and included pathways for the synthesis of fungal ergosterols, vitamins, substrates that could be used in fatty acid synthesis, and essential amino acids.

In some cases, pathway complementarity between A. glabripennis and its gut microbes that could be essential for survival in woody tissue was noted. For example, the community was capable of synthesizing several essential nutrients, including methionine, folate, riboflavin, and vitamin B6, suggesting a microbial source for many of these essential nutrients. However, in many cases, the beetle was capable of recycling these nutrients, suggesting that although the microbial community provides these essential nutrients, the beetle may have key roles in nutrient conservation as it feeds in nutrient-deficient woody tissue. There were also many potential examples of beetle capacities to use microbial products as substrates for biosynthesis. For example, the beetle transcriptome was rich in alcohol and aldehyde dehydrogenases, which could convert microbial fermentation products into compounds that can be used for glycolysis and amino acid biosynthesis (e.g., pyruvate) and fatty acid biosynthesis (e.g., acetyl coA). This phenomenon was also noted with regard to amino acid biosynthesis. While the community possessed a complete methionine biosynthetic pathway as well as mechanisms to scavenge sulfur and nitrogen from taurine, the beetle was capable of converting this amino acid into the non-essential amino acid cysteine.

Over 200 transcripts derived from Fusarium solani were detected in the metatranscriptome, including a single transcript predicted to encode a GH family 5 cellulase and over 30 transcripts predicted to encode GH family 28 polygalacturonases, indicating a potential role in cell wall decomposition in the midgut. Interestingly, GH 5 cellulases and GH 28 polygalacturonases were also detected through MudPIT analysis of F. solani cultured on a solid wood medium, suggesting that these enzymes might be integrally involved in the breakdown of woody tissue. Based on the low abundance of pectin in woody tissue, it is possible that these genes have neo-functionalized in this isolate to break down more predominant polysaccharides found in woody tissue. Expression of these genes should be pursued to confirm their function as pectinases or demonstrate activity towards additional substrates (e.g., cellulose or hemicellulose). Although we were able to confirm expression of microbial genes involved in the synthesis of essential nutrients and the degradation of pentose sugars commonly found in hardwood xylans, more targeted approaches could lead to the detection of additional genes expressed by the microbiota that could serve pivotal roles in nutritional ecology in the A. glabripennis midgut. For example, filamentous microbes are consistently visualized in association with the midgut cells in beetle larvae feeding in preferred hosts. Targeted microdissection followed by deep RNA sequencing of both the microbial and insect cells in close proximity to these interfaces could provide more insights into how A. glabripennis interacts with its gut microbes and how pathways encoded by members of the gut microbiota augment and complement pathways expressed by the beetle.

While these microbes are primed to have pivotal roles in the nutritional ecology in the A. glabripennis midgut, the source of these microbes is unclear. Although transcriptional activities of many OTUs detected in multiple insects and/or consistently found in association with A. glabripennis larvae in the PSU colony were confirmed through transcriptome analysis, high transcriptional activity was also documented from many of the rarer OTUs found in lower abundances in the midgut. A subset of these microbes represent candidates for transovarial vertical transmission from mother to offspring, such as the Fusarium solani OTUs that are consistently found in the midgut [5]. Several bacterial OTUs were associated exclusively with the oviposition pit, egg, and larvae and also represented excellent candidates for vertical transmission [6]. However, the variation in community composition in insects with the same ancestry (derived from Penn State colony) and the high transcriptional activities of several rarer

OTUs suggest that not all microbes associated with the community are vertically transmitted.

Instead, these microbes may be acquired from the phyllosphere. Adult female A. glabripennis feed on twigs and leaves of the host tree prior to oviposition and may carry microbes associated with the surface of the host tree in their digestive tracts, which could be deposited at the egg site during oviposition and acquired by first instar larvae as they feed in the oviposition pit after hatching. This hypothesis is supported by the observations that bacterial OTUs found in non-oviposition sites were also found in association with the oviposition pit and that similar types of fungi can be found in the oviposition site and non-oviposition sites on the same host tree (unpublished data).

However, the concentration of fungal CFUs/mL is much higher in the oviposition pit compared to non-oviposition sites, suggesting that perhaps the mother deposits these microbes during oviposition. Although many of the transcriptionally active OTUs detected in the A. glabripennis midgut could have environmental origins, this does not undermine their potential importance to digestive physiology and nutrient acquisition in the A. glabripennis midgut, which was demonstrated by the active transcription of pathways involved in the biosynthesis of key essential nutrients and utilization of pentose sugars found in association with hardwood hemicellulose.

Now that we have compiled a comprehensive genetic inventory of the metabolic and digestive potentials of both the beetle and its gut microbiota and we have demonstrated that essential nutrient biosynthetic pathways encoded by gut microbes are actively expressed in the A. glabripennis midgut suggesting a pivotal role in nutritional ecology, we are currently exploring the feasibility of disrupting interactions between this insect and its gut microbes as a novel form of biocontrol. To develop these targets for control, we are studying the impacts of feeding in a resistant poplar host on both the beetle and its gut microbiota to identify natural effects of resistance that could be used to engineer and/or breed resistant cultivars of poplar capable of disrupting key digestive genes encoded by the insect and/or its microbial symbionts. For these experiments, we are using two different poplar tree species, one of which coevolved with the beetle in China and displays a considerable degree of resistance to both adults and larvae

(Populus tomentosa; Chinese white poplar). For comparative purposes, we also reared beetles in P. nigra (Lombardy poplar), which has been identified as a suitable host for invasive A. glabripennis populations in the US and Europe. Through comparisons of levels of compounds involved in constitutive defenses, we were able to identify three phenolic glycosides that have accumulated to high abundances in the heartwood of P. tomentosa relative to P. nigra. These compounds included salicin, tremulacin, and tremuloidin. While many phenolic glycosides, including salicin, have previously characterized antimicrobial properties [7, 8] and could have profound impacts on the gut microbiota, tremulacin and tremuloidin have been previously shown to disrupt growth and development of Lymantria dispar (gypsy moth) at high levels and are capable of alkylating and cross-linking crucial digestive and structural proteins [9, 10]. The presence of phenolic glycosides with known impacts on both microbes and insects signifies that feeding in P. tomentosa could have negative impacts on both the beetle and its midgut microbiota and represents a good system to identify microbes and genes that could be knocked down or targeted for control.

To assess impacts of feeding in both trees on the gut microbial community composition, targeted amplicon sequencing of bacterial 16S and fungal ITS regions and RNA-Seq based differential expression analysis of insect-derived genes are currently being explored. While feeding in P. tomentosa did not significantly reduce the community richness or diversity relative to insects feeding in P. nigra, differences in the 16S community structure were noted that could be driving resistance in this system. For example, while the 16S communities of insects feeding in P. nigra were dominated by Alphaproteobacteria and Gammaproteobacteria, which were also highly abundant in insects feeding in sugar maple in both the PSU colony and insects collected from a field population in New York [11], bacterial communities of insects feeding in P. tomentosa were dominated by Sphingobacteria. The differences in community compositions were most striking at the OTU level as visualized through multivariate NMDS analysis. Several OTUs were strongly associated with the bacterial communities collected from insects feeding in P. nigra, but were absent from insects feeding in P. tomentosa. Supervised clustering analyses using random forests and linear discriminant analysis are currently underway to determine which OTUs are most strongly associated with P. nigra, allowing us to more finely document which OTUs were lost during feeding in P. tomentosa and could be suitable targets for control. In contrast, no major differences were noted in the compositions of the fungal ITS communities at high taxonomic levels as all communities were generally dominated by yeasts assigned to the order Saccharomycetales. Differences in fungal ITS community compositions at the OTU level were also very subtle as no major differences between compositions of the communities were revealed by multivariate NMDS analysis. Furthermore, the abundances of various yeast OTUs and Fusarium OTUs were not impacted by feeding in P. tomentosa, suggesting that resistance acts primarily on the bacterial community in this host tree.
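As an illustration of the community metrics discussed above, the following minimal Python sketch computes OTU richness, Shannon diversity, and Bray-Curtis dissimilarity (a dissimilarity commonly supplied to NMDS ordinations) from OTU count vectors. The specific indices and distance measure used in this study are not stated here, so these choices, the function names, and the example counts are assumptions made purely for illustration.

```python
import math


def richness(counts):
    """Number of OTUs observed at least once in a sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples scored on the same OTUs."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must be scored on the same set of OTUs")
    numerator = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    denominator = sum(sample_a) + sum(sample_b)
    return numerator / denominator if denominator else 0.0


# Hypothetical OTU counts (not data from this study); positions are OTUs.
p_nigra = [40, 30, 20, 10, 0]
p_tomentosa = [0, 0, 20, 10, 70]

print(richness(p_nigra), richness(p_tomentosa))
print(round(shannon_index(p_nigra), 3), round(shannon_index(p_tomentosa), 3))
print(bray_curtis(p_nigra, p_tomentosa))
```

In this toy example the two samples differ little in richness but are strongly dissimilar in composition, which mirrors the pattern described for the 16S communities: similar richness and diversity, but different community structure.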

RNA-Seq based differential expression analysis revealed several genes that were up- and down-regulated while feeding in P. tomentosa that could be associated with resistance. Genes that were downregulated in beetles feeding in P. tomentosa were mostly structural genes, including peritrophic matrix proteins, ankyrin repeat proteins, cuticular proteins, and collagen-associated proteins, indicating that feeding in a resistant host could disrupt the structural integrity of the peritrophic matrix. Interestingly, these proteins are also common targets for gene-silencing and gene-knockdown for control using RNAi [12], suggesting that host tree defenses can have potent negative impacts on insect herbivores by disrupting critical structural proteins required for survival [13]. Several genes with predicted involvement in insect-microbe interactions were also downregulated, including gram negative and gram positive recognition proteins, which may be a consequence of shifts in community composition caused by feeding in P. tomentosa. Several detoxification and digestive genes in insects feeding in P. tomentosa were up-regulated relative to P. nigra. These included several cytochrome P450s that could be induced by phenolic glycosides and other classes of toxins that have accumulated in the heartwood of P. tomentosa. Interestingly, several digestive enzymes were up-regulated in larvae feeding in P. tomentosa relative to P. nigra, including several α-mannosidases, β-glucosidases, and esterases, which could be important for cell wall degradation and nutrient acquisition in the midgut. Many plants are known to produce enzyme inhibitors that inactivate or destroy key digestive enzymes (e.g., digestive proteinase inhibitors) [14]. The overexpression of these genes could be an attempt by A. glabripennis to compensate for reduced digestive enzyme activities in the midgut. For this reason, digestive enzyme activity assays as well as mass spectrometry based proteomics will be performed to assess the impacts of feeding in P. tomentosa on the proteome and on enzyme activities in the midgut to develop additional candidates for resistance.
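To make the notion of up- and down-regulation concrete, the sketch below normalizes raw read counts to counts per million (CPM) and reports a log2 fold change for each gene between two hosts. This is only a simplified illustration: it omits replication, dispersion modelling, and significance testing, which dedicated differential expression software provides, and the gene names, counts, and pseudocount are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(treatment, control, pseudocount=1.0):
    """log2(treatment CPM / control CPM) for genes present in both libraries.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    t_cpm = counts_per_million(treatment)
    c_cpm = counts_per_million(control)
    return {
        gene: math.log2((t_cpm[gene] + pseudocount) / (c_cpm[gene] + pseudocount))
        for gene in t_cpm
        if gene in c_cpm
    }


# Hypothetical read counts (not data from this study).
p_nigra_counts = {"structural_gene": 300, "detox_gene": 100}
p_tomentosa_counts = {"structural_gene": 100, "detox_gene": 300}

lfc = log2_fold_change(p_tomentosa_counts, p_nigra_counts, pseudocount=0.0)
for gene, value in lfc.items():
    direction = "up" if value > 0 else "down"
    print(f"{gene}: log2FC = {value:.2f} ({direction} in P. tomentosa)")
```

A positive value indicates higher relative expression in the first library passed to the function, and a negative value indicates lower relative expression.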

Once definitive targets for control have been identified, their negative impact on insect fitness can be confirmed through RNAi knockdown. Genes whose knockdown has deleterious impacts on beetle larvae can be exploited for control by transforming poplar with dsRNAs for host-mediated knock-down of essential insect genes. In addition, phenolic glycoside biosynthetic pathways can be manipulated to allow phenolic glycosides to overaccumulate in woody tissue, allowing us to engineer poplars with enhanced resistance. This work has the potential to yield systems for controlling A. glabripennis and other prominent wood-boring beetles that pose threats to forests worldwide.

This study represents the first comprehensive -omics study of a cerambycid beetle and its gut microbial community. Prior to this study, little was known about the endogenous metabolic capabilities of cerambycids, aside from their ability to produce cellulases [15-17], and studies conducted on the gut communities of cerambycids were almost exclusively focused on yeast-like symbionts [18]. While these data provide valuable insights into the metabolic potential of both A. glabripennis and its gut community, we expect that whole genome sequencing of A. glabripennis sponsored by the i5k initiative and whole genome sequencing of the F. solani isolate found in association with the midgut will provide crucial investigational tools for further understanding the metabolic capacities of the interactions between this insect and its diverse microbial community and will lead to the discovery of additional targets for biocontrol.

The genome of F. solani is of particular interest to the bioenergy community since many Fusarium spp. are notorious degraders of lignin and often harbor lignin peroxidase orthologs in their genomes [19-21]. They are inherently metabolically versatile and their metabolic potentials can be greatly expanded through acquisition of supernumerary chromosomes [22]. Characterization of the F. solani genome and its supernumerary chromosomes could lead to a greater understanding of how Fusarium spp. are able to degrade lignin and may also allow us to propose mechanisms for lignin degradation in the A. glabripennis midgut. Disrupting the lignin degrading reactions catalyzed by F. solani through targeted gene silencing or elimination of the F. solani isolates associated with the midgut could be used as a method of control. The genome of A. glabripennis is also of considerable interest as it will build upon the inventory of digestive genes that were detected in the midgut. Like F. solani, insects are also metabolically versatile and have evolved the capacities to thrive under extreme environmental conditions. Functional genomics could lead to the discovery of novel metabolic pathways that contribute to the digestion of woody tissue and could be further exploited by the biofuels industry.


Literature Cited

1. Nowak DJ, Pasek JE, Sequeira RA, Crane DE, Mastro VC: Potential effect of Anoplophora glabripennis (Coleoptera: Cerambycidae) on urban trees in the United States. Journal of Economic Entomology 2001, 94(1):116-122.

2. Haack RA, Herard F, Sun JH, Turgeon JJ: Managing invasive populations of Asian longhorned beetle and citrus longhorned beetle: a worldwide perspective. In: Annual Review of Entomology. vol. 55. Palo Alto: Annual Reviews; 2010: 521-546.

3. Ugine TA, Gardescu S, Lewis PA, Hajek AE: Efficacy of imidacloprid, trunk-injected into Acer platanoides, for control of adult Asian longhorned beetles (Coleoptera: Cerambycidae). Journal of Economic Entomology 2012, 105(6):2015-2028.

4. Suh SO, Marshall CJ, McHugh JV, Blackwell M: Wood ingestion by passalid beetles in the presence of xylose-fermenting gut yeasts. Molecular Ecology 2003, 12(11):3137-3145.

5. Geib SM, Scully ED, Jimenez-Gasco MdM, Carlson JE, Tien M, Hoover K: Phylogenetic analysis of Fusarium solani associated with the Asian longhorned beetle, Anoplophora glabripennis. Insects 2012, 3(1):141-160.

6. Geib SM, Jimenez-Gasco Mdel M, Carlson JE, Tien M, Jabbour R, Hoover K: Microbial community profiling to investigate transmission of bacteria between life stages of the wood-boring beetle, Anoplophora glabripennis. Microbial ecology 2009, 58(1):199-211.

7. Jensen P, Jenkins K, Porter D, Fenical W: Evidence that a new antibiotic flavone glycoside chemically defends the sea grass Thalassia testudinum against zoosporic fungi. Applied and environmental microbiology 1998, 64(4):1490-1496.

8. Arif T, Bhosale J, Kumar N, Mandal T, Bendre R, Lavekar G, Dabur R: Natural products–antifungal agents derived from plants. Journal of Asian natural products research 2009, 11(7):621-638.

9. Lindroth RL, Hemming JD: Responses of the gypsy moth (Lepidoptera: Lymantriidae) to tremulacin, an aspen phenolic glycoside. Environ Entomol 1990, 19(4):842-847.

10. Felton G, Donato K, Del Vecchio R, Duffey S: Activation of plant foliar oxidases by insect feeding reduces nutritive quality of foliage for noctuid herbivores. J Chem Ecol 1989, 15(12):2667-2694.

11. Geib SM, Jimenez-Gasco Mdel M, Carlson JE, Tien M, Hoover K: Effect of host tree species on cellulase activity and bacterial community composition in the gut of larval Asian longhorned beetle. Environ Entomol 2009, 38(3):686-699.

12. Arakane Y, Specht CA, Kramer KJ, Muthukrishnan S, Beeman RW: Chitin synthases are required for survival, fecundity and egg hatch in the red flour beetle, Tribolium castaneum. Insect Biochemistry and Molecular Biology 2008, 38(10):959-962.

13. Fescemyer HW, Sandoya GV, Gill TA, Ozkan S, Marden JH, Luthe DS: Maize toxin degrades peritrophic matrix proteins and stimulates compensatory transcriptome responses in fall armyworm midgut. Insect Biochemistry and Molecular Biology 2013, 43(3):280-291.

14. Jongsma MA, Bakker PL, Peters J, Bosch D, Stiekema WJ: Adaptation of Spodoptera exigua larvae to plant proteinase inhibitors by induction of gut proteinase activity insensitive to inhibition. Proceedings of the National Academy of Sciences 1995, 92(17):8041-8045.

15. Calderon-Cortes N, Watanabe H, Cano-Camacho H, Zavala-Paramo G, Quesada M: cDNA cloning, homology modelling and evolutionary insights into novel endogenous cellulases of the borer beetle Oncideres albomarginata chamela (Cerambycidae). Insect Molecular Biology 2010, 19(3):323-336.

16. Lee SJ, Kim SR, Yoon HJ, Kim I, Lee KS, Je YH, Lee SM, Seo SJ, Sohn HD, Jin BR: cDNA cloning, expression, and enzymatic activity of a cellulase from the mulberry longicorn beetle, Apriona germari. Comp Biochem Phys B 2004, 139(1):107-116.

17. Chang C-J, Wu CP, Lu S-C, Chao A-L, Ho T-HD, Yu S-M, Chao Y-C: A novel exo-cellulase from white spotted longhorn beetle (Anoplophora malasiaca). Insect Biochemistry and Molecular Biology 2012.

18. Suh SO, McHugh JV, Pollock DD, Blackwell M: The beetle gut: a hyperdiverse source of novel yeasts. Mycological research 2005, 109(Pt 3):261-265.

19. Lozovaya VV, Lygin AV, Zernova OV, Li S, Widholm JM, Hartman GL: Lignin degradation by Fusarium solani f. sp glycines. Plant Disease 2006, 90(1):77-82.

20. Katayama T, Nakatsubo F, Higuchi T: Degradation of arylglycerol-beta-aryl ethers, lignin substructure models, by Fusarium solani. Archives of microbiology 1981, 130(3):198-203.

21. Monkemann H, Holker U, GolubnitchayaLabudova O, LichtenbergFrate H, Hofer M: Molecular evidence of a lignin peroxidase H8 homologue in Fusarium oxysporum. Folia microbiologica 1996, 41(5):445-448.

22. Coleman JJ, Rounsley SD, Rodriguez-Carres M, Kuo A, Wasmann CC, Grimwood J, Schmutz J, Taga M, White GJ, Zhou SG et al: The genome of Nectria haematococca: contribution of supernumerary chromosomes to gene expansion. Plos Genetics 2009, 5(8):e1000618.


Appendix A

Phylogenetic Analysis of Fusarium solani Associated with the Asian Longhorned Beetle, Anoplophora glabripennis

Geib SM, Scully ED, Jimenez-Gasco MdM, Carlson JE, Tien M, Hoover K: Phylogenetic Analysis of Fusarium solani Associated with the Asian Longhorned Beetle, Anoplophora glabripennis. Insects 2012, 3(1):141-160.

Abstract

Culture-independent analysis of the gut of a wood-boring insect, Anoplophora glabripennis (Coleoptera: Cerambycidae), revealed a consistent association between members of the fungal Fusarium solani species complex and the larval stage of both colony-derived and wild A. glabripennis populations. Using the translation elongation factor 1-alpha region for culture-independent phylogenetic and operational taxonomic unit (OTU)-based analyses, only two OTUs were detected, suggesting that genetic variance at this locus was low among A. glabripennis-associated isolates. To better survey the genetic variation of F. solani associated with A. glabripennis and establish its phylogenetic relationship with other members of the F. solani species complex, single spore isolates were created from different populations and multi-locus phylogenetic analysis was performed using a combination of the translation elongation factor 1-alpha, internal transcribed spacer, and large subunit rDNA regions. These analyses revealed that colony-derived larvae reared in three different tree species or on artificial diet, as well as larvae from wild populations collected from three additional tree species in New York City and from a single tree species in Worcester, MA, consistently harbored F. solani within their guts. While there is some genetic variation in the F. solani carried between populations, within-population variation is low. We speculate that F. solani is able to fill a broad niche in the A. glabripennis gut, providing it with fungal lignocellulases to allow the larvae to grow and develop on woody tissue. However, it is likely that many F. solani genotypes could potentially fill this niche, so the relationship may not be limited to a single member of the F. solani species complex.

While little is known about the role of filamentous fungi and their symbiotic associations with insects, this report suggests that larval A. glabripennis has developed an intimate relationship with F. solani that is not limited by geographic location or host tree.

Introduction

The Asian longhorned beetle (Anoplophora glabripennis) is an invasive, wood-boring insect with a relatively broad host range that now includes over 21 deciduous tree species [1]. A. glabripennis was first detected in the United States in 1996 and, since its arrival, has caused millions of dollars in damage to urban streetscapes in several northeastern and midwestern states.

It also poses a threat to the maple syrup industry and forest ecosystems [2, 3]. Most cerambycids are constrained to feeding in stressed, dying, or dead trees and are reported to digest cellulose by ingesting enzymes produced by wood-degrading fungi that colonize infected wood [4, 5]. In contrast, other Lamiinae, including larval A. glabripennis, feed and grow in the inner wood of a variety of healthy hardwood tree species where woody, intractable components, including lignin and cellulose, have not been pre-digested by wood-degrading fungi and, instead, are internally digested [6-9].

Although many cerambycids have the endogenous capacity to produce endoglucanases and glycoside hydrolases capable of disrupting random β-1,4 linkages in cellulose chains and hydrolyzing β-1,4 linkages in cellobiose disaccharides, respectively [10-12], insects do not produce exoglucanases, which processively cleave cellobiose and other cello-oligosaccharides from reducing and non-reducing ends of cellulose polymers [13]. These exoglucanases are essential to efficiently liberate glucose from carbohydrate polymers; while they are produced by a number of cellulose-degrading bacteria and fungi, no insect- or animal-derived exo-type glucanases have been conclusively identified [10, 12]. In addition, cellulose, hemicellulose, and small amounts of protein are often cross-linked to lignin in hardwood tree species. Thus, circumventing the lignin barrier is paramount for accessing hexose and pentose sugars present in cellulose and hemicellulose polymers and for protein acquisition. Although lignin degradation has been well-documented in the A. glabripennis gut [14], no known insect-derived enzymes are capable of catalyzing the oxidative reactions responsible for large-scale lignin depolymerization in this system and, in fact, only fungi are known to produce peroxidases capable of catalyzing the types of lignin-directed reactions observed in larval A. glabripennis. Furthermore, A. glabripennis larvae face a number of other obstacles as they feed in the sapwood and heartwood of healthy host trees, including low protein/nitrogen availability, low abundance of essential dietary nutrients (e.g. sterols, fatty acids, and vitamins), and toxic secondary metabolites produced by the host tree [15].

Thus, it is likely that microbial species are important in aiding in wood degradation and nutrient acquisition within A. glabripennis. In the wood-feeding beetle species examined to date that harbored bacteria, a broader diversity of gut microbes was associated with broader tree host range. Raffa and colleagues [16] found a broad diversity of bacteria in the gut of A. glabripennis larvae from willow trees in China, while the linden borer (Saperda vestita), a cerambycid with a more restricted host range, contained only a small subset of these same bacteria [17]. Follow-up studies examining the bacterial community composition from colony derived and wild A. glabripennis populations collected from several suitable host trees revealed that the community was consistently dominated by proteobacteria and actinobacteria, including many taxa capable of degrading small aromatic lignin metabolites, cellulose, and xylan, but displayed a considerable degree of plasticity at lower taxonomic levels. In addition, larvae fed on a cellulose-based artificial diet containing bacteriostatic agents displayed a substantially lower community diversity coupled with reduced endo-, exo-, and β-glucosidase activities in comparison to tree-fed larvae, suggesting that bacteria may be contributing to cellulase enzyme production in the gut. In concert, a number of carboxymethylcellulose degrading bacteria were successfully cultured from insects feeding on suitable host trees, including bacteria belonging to the families Brevibacteriaceae, Burkholderiaceae, Micrococcaceae, Staphylococcaceae, and Streptomycetaceae [18]. Several of these bacterial taxa are found in association with larval A. glabripennis throughout its life cycle, including bacteria from the families Bacillaceae and Xanthomonadaceae, suggesting that they may be vertically transmitted during oviposition [19].

In several beetle species studied to date, a number of fungal species have been discovered in association with the insect’s gut, including yeast endosymbionts [20-23]. The most thoroughly- characterized are yeasts in the basidiocarp-dwelling beetles (mushroom-eating beetles), in which about 300 species of yeasts have been discovered in 25 families [21, 24]. Some of these yeasts produce xylanases and other symbiotic yeasts are speculated to play roles in nitrogen metabolism, detoxification, and nutrient synthesis [20, 21, 25-30]. Some cerambycids harbor intracellular yeasts in mycetocytes, specialized cells in the gastric caecae located at the junction between the foregut and midgut [25]. These yeasts are maternally transmitted by inoculation of the egg surface; newly hatched larvae ingest the egg membrane, acquiring yeasts in the process [25].

Thus far, all yeasts isolated from cerambycids are in the genus Candida [31, 32]. In addition to yeast-like endosymbionts, some beetles are occasionally found in non-pathogenic interactions with members of the Fusarium solani species complex, a group of metabolically diverse fungi dominated by notorious plant and animal pathogens [33]. The most well-studied relationship to date is the relationship between F. solani and ambrosia beetles: in these symbioses, ambrosia beetles harbor fungal spores in mycangia, inoculating the fungus into woody tissue, where it synthesizes sterols required for pheromone production and aids in cellulose digestion [34]. While Fusarium moniliforme and Fusarium roseum var. graminearum found in association with Tribolium confusum often have profound, positive impacts on fecundity and fitness, the nature of the relationship is poorly understood and their precise contributions to nutrient acquisition, digestive processes, and insect physiology have yet to be characterized [33]. A considerable amount of research has been devoted to characterizing the fungal communities of other cerambycids, yet very little is known about the fungal communities of the Lamiinae, including A. glabripennis. Interestingly, the Lamiinae were initially thought to lack both fungal endosymbionts and mycetocytes until culture-based molecular techniques recently detected the presence of several different ascomycete yeast species found in association with beetle guts (Lamiinae) collected from neotropical regions [32]. In previous studies in our lab, F. solani was detected in association with larvae collected from wild populations in New York City [14], but the consistency of this relationship, and the persistence of this relationship in other beetle populations, was not assessed. The goal of this research was to identify and phylogenetically characterize fungal isolates collected from the guts of larval A. glabripennis from different geographic locations and host tree species.

Materials and Methods

Rearing Colony Derived A. glabripennis on Different Host Trees

Larvae were reared in nursery lines of one of three tree species: sugar maple (Acer saccharum), pin oak (Quercus palustris), and callery pear (Pyrus calleryana), as described previously [18]. Trees were planted in 20-gallon nursery containers filled with Fafard 52 pine bark medium (Fafard Inc., Agawam, MA, USA), and grown at an outdoor pot-in-pot nursery at the Pennsylvania State University, University Park campus until they were 4–5 years old. Several weeks prior to use in experiments, trees were moved into a quarantine greenhouse to allow for acclimation to greenhouse conditions. Experiments had to be conducted under quarantine conditions because A. glabripennis is a regulated quarantine pest. Three trees of each species were placed in large (ca. 3 m high, 3 m long, 2 m wide) walk-in insect cages, each cage containing only one tree species, and maintained as described previously [8]. Adult A. glabripennis were obtained from a quarantine research colony of mixed ancestry [35]. The research colony is maintained on a cellulose-based artificial diet [36] using Norway maple (Acer platanoides) bolts for adult feeding and oviposition. For this experiment, three mated pairs of adults per tree species (n=9 pairs) were maturation fed on twigs from either sugar maple or pin oak for 3-5 days. After maturation, mating pairs were released into a cage containing potted trees of the same tree species from which they maturation fed, and were allowed to oviposit into these trees for 2 weeks. At this point, adults were removed from the cages. The trees were then held in the greenhouse for 90 days to permit larval establishment. After 90 days, each tree was dissected and living larvae were collected for gut community analysis.

Callery pear was not used for maturation feeding or oviposition because A. glabripennis do not survive or produce eggs when fed on twigs of this tree species and eggs laid into callery pear do not survive [35]. Because A. glabripennis are reluctant to feed on, and do not grow well in callery pear [35], a portion of the larvae extracted from sugar maple were then manually inserted into callery pear and allowed to feed for 2 weeks as described previously [37] to examine the effect of this resistant host on the gut community. Callery pear trees were dissected 2 weeks after larval insertion and apparently healthy, feeding larvae were collected for gut community analysis. Larvae that fed on a cellulose-based artificial diet [36] were also collected for gut community analysis from the quarantine lab colony, of approximately the same age as the tree-reared larvae. While the artificial diet does not contain any specific antibiotics, it does include components such as sodium propionate, sorbic acid, and p-hydroxybenzoic acid methyl ester, which have fungal and microbial inhibitory properties [38]. Samples from these greenhouse trees and quarantine colony will be referred to as “PSU colony-derived” from this point on.

Collection of Larval A. glabripennis and Fungal Cultures from Introduced Wild Populations

To compare fungal community composition of PSU colony-derived larvae to field-derived insects, A. glabripennis were collected from field populations located in Brooklyn, New York, United States (referred to as NYC-derived), and Worcester, Massachusetts, United States (referred to as MA-derived) in conjunction with eradication efforts by USDA-APHIS-PPQ. Infested trees were located based on the presence of exit holes and dieback. Four trees were cut for this study from New York City, including two silver maples (Acer saccharinum), one sycamore maple (Acer pseudoplatanus), and one horse-chestnut (Aesculus hippocastanum). From Worcester, Massachusetts, infested material was collected from two silver maple trees.

Trees were cut into segments on site and transferred to the lab where they were dissected to remove larvae. Larvae were immediately frozen after removal from tree segments and were stored at −80 °C until use.

Anoplophora glabripennis Larval Gut Dissection and DNA Extraction

Larval dissections were performed using sterile dissection tools in a laminar flow hood to maintain sterility. Larvae removed from PSU colony-derived trees were immediately chilled and dissected within 1 hour of removal from trees. NYC- and MA-derived larvae were kept frozen until immediately before dissection, and were then maintained on ice. Larvae were surface sterilized in 70% ethanol for 1 min and rinsed in sterile water before dissection. Whole guts were dissected by cutting the cuticle open laterally, ligating the gut at the anterior midgut and posterior hindgut, and carefully transferring the entire gut into a sterile microcentrifuge tube. For the PSU colony-derived and NYC-derived samples, guts from 10 larvae feeding on a single tree were pooled into a single microcentrifuge tube for DNA extraction. Then, total DNA was extracted using the FastDNA® SPIN for Soil Kit (MP Biomedicals), using the FastPrep® Instrument for tissue homogenization, following the manufacturer's protocol. This kit was used due to the complexity of the A. glabripennis gut contents (containing wood, bacteria, fungi) to ensure complete DNA extraction from all organisms and removal of wood polysaccharides and secondary metabolites that can co-extract with DNA. A control DNA extraction was also performed using the sterile water rinsate to confirm external sterilization of larvae. DNA concentration was determined by absorbance at 260 nm and extracts were stored at −20 °C until use. A subset of gut tissue from PSU colony-derived and NYC-derived insects and all MA-derived insects were used for fungal culturing.
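The conversion from absorbance at 260 nm to DNA concentration can be sketched as follows. The sketch assumes the standard approximation that an A260 of 1.0 over a 1 cm path corresponds to roughly 50 ng/μL of double-stranded DNA; the dilution factor, example readings, and function names are hypothetical and are not taken from the protocol above.

```python
# Standard approximation: A260 of 1.0 (1 cm path) ~ 50 ng/uL double-stranded DNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate dsDNA concentration (ng/uL) from absorbance at 260 nm."""
    if a260 < 0 or dilution_factor <= 0 or path_length_cm <= 0:
        raise ValueError("absorbance must be >= 0; dilution and path length > 0")
    return (a260 / path_length_cm) * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def volume_for_mass(target_ng, concentration_ng_per_ul):
    """Volume of extract (uL) that contains the requested mass of DNA."""
    if concentration_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / concentration_ng_per_ul


# Hypothetical reading: A260 = 0.2 measured on a 1:10 dilution of the extract.
conc = dsdna_concentration(0.2, dilution_factor=10)
print(f"{conc:.1f} ng/uL; {volume_for_mass(20, conc):.2f} uL supplies 20 ng")
```

Because wood-derived phenolics and polysaccharides can also absorb in the UV range, a value obtained this way is best treated as an estimate.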

Culture Independent Fungal Community Analysis

Initially, the internal transcribed spacer region (ITS1-ITS4), a commonly used genomic region for fungal identification, was amplified and sequenced from total gut DNA extractions from each treatment [39]. After preliminary experiments from which only Fusarium strains were identified, the translation elongation factor-1 alpha (TEF1-α) region was used instead for culture independent analysis, due to the potential presence of non-orthologous copies of the ITS region in some Fusarium that can lead to incorrect identification [40, 41]. Eight culture independent libraries were created, one for each of the following DNA samples: PSU colony-derived sugar maple, PSU colony-derived pin oak, PSU colony-derived callery pear, PSU colony-derived artificial diet, NYC-derived silver maple tree 1, NYC-derived silver maple tree 2, NYC-derived horse-chestnut, and NYC-derived sycamore maple. Fungal-specific primers for translation elongation factor 1 (ef1, 5’-ATG GGT AAG GA(A/G) GAC AAG AC-3’) and translation elongation factor 2 (ef2, 5’-GGA (G/A)GT ACC AGT (G/C)AT CAT GTT-3’) were used to amplify the TEF1-α region; these primers can amplify this region from a broad range of filamentous ascomycetes [42, 43]. PCR reactions were performed in 25 μL volumes with the following components: 5 μL of 5× GoTaq green reaction buffer, 0.5 μL GoTaq DNA polymerase (1.25 U, Promega, Madison, WI), 1 μL 10 M dNTP mix, 2 μL of 10 μM forward primer (ef1), 2 μL of 10 μM reverse primer (ef2), and 20 ng of template DNA. PCR conditions were 95 °C denaturation for 3 min, 25 cycles of 95 °C for 30 sec, 53 °C for 45 sec, 72 °C for 1:30 min, with a final extension at 72 °C for 5 min. PCR reactions were also performed on control DNA to ensure that there was no contaminating DNA during extraction. Agarose gel electrophoresis verified amplification of target DNA and confirmed that no fungal products were amplified from sterile rinsate controls. For each gut pool, 2 μL of the PCR product was ligated into the pCR® 2.1 TOPO vector (Invitrogen, Carlsbad, CA) following the manufacturer’s protocol. The vector was then transformed into chemically competent E. coli cells (TOP10, Invitrogen, Carlsbad, CA) by heat-shock and clone libraries were created.
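The ef1 and ef2 primers are written with degenerate positions such as (A/G), so each primer stands for a small pool of concrete oligonucleotides. The following is a minimal Python sketch of how that notation could be expanded. The function name `expand_degenerate` and the constants `EF1` and `EF2` are illustrative and not part of the original protocol; the primer strings are copied from the text above.

```python
import itertools
import re

# Primer sequences as written in the text (5' to 3'), with (X/Y) marking a degenerate site.
EF1 = "ATG GGT AAG GA(A/G) GAC AAG AC"
EF2 = "GGA (G/A)GT ACC AGT (G/C)AT CAT GTT"


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a primer written with (X/Y) alternatives."""
    compact = primer.replace(" ", "")
    # Split into fixed stretches and parenthesised alternatives, keeping the alternatives.
    parts = re.split(r"(\([ACGT/]+\))", compact)
    choices = []
    for part in parts:
        if part.startswith("("):
            choices.append(part.strip("()").split("/"))
        elif part:
            choices.append([part])
    # Cartesian product over the choices at each position block.
    return ["".join(combo) for combo in itertools.product(*choices)]


for name, primer in (("ef1", EF1), ("ef2", EF2)):
    variants = expand_degenerate(primer)
    print(name, len(variants), variants)
```

Under this reading, ef1 has one degenerate site and ef2 has two, so the pools contain two and four oligonucleotides, respectively.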

Insert DNA from clones was amplified from the M13 priming site of the vector using direct PCR. Twenty-five μL PCR reactions were set up in 96-well format with the following components: 5 μL of 5× GoTaq green reaction buffer, 0.5 μL GoTaq DNA polymerase (1.25 U, Promega, Madison, WI), 1 μL 10 M dNTP mix, 2 μL of 10 μM forward primer (M13 Universal), and 2 μL of 10 μM reverse primer (M13 Rev). For each of the 8 clone libraries, 48 colonies were picked from the clone library using a sterile pipette tip and immersed into the PCR mix to add bacterial cells to the PCR reaction. The PCR program included an initial 95 °C denaturation for 10 min to rupture bacterial cells, followed by 30 cycles of 95 °C for 30 sec, 55 °C for 1:00 min, 72 °C for 1:30 min, with a final extension at 72 °C for 5 min. Amplification was confirmed by gel electrophoresis; for successful amplifications, 4 μL of the PCR product was purified for sequencing by adding 0.8 μL of ExoSAP-IT (USB Corporation, Cleveland, OH) and incubating the sample at 37 °C for 15 min, followed by 80 °C for 30 min. Two μL of this reaction were then used to sequence from the forward direction from the M13 Universal priming site and 2 μL for the reverse from the M13 Rev site using BigDye chemistry at the Penn State Genomics Core Facility.

Aerobic Culturing of Gut Fungus on Restrictive Media

A subset of the PSU-colony derived larvae from sugar maple trees, a NYC-derived larva from sycamore maple, and all MA-derived larvae were used to aerobically culture fungi present in the gut. This enabled the establishment of individual cultured isolates, and thus, a more in-depth phylogenetic analysis encompassing multiple loci could be performed, overcoming a major limitation of culture-independent approaches. Larval dissections were performed in a laminar flow hood. For PSU-colony derived, NYC-derived, and MA-derived insects, five guts from insects from a given host tree were pooled into a single microcentrifuge tube containing 500 μL of sterile PBS solution (0.01 M, 0.138 M NaCl, 0.0027 M KCl, pH 7.4). For the PSU-colony, three different sugar maple trees produced enough larvae to create three independent cultures. For MA-derived larvae, two silver maple trees produced sufficient larvae to produce cultures. For NYC-derived larvae, only a single sycamore maple produced a sufficient number of larvae for subculturing. A control tube containing only PBS was also used for inoculation to ensure there was no contamination during dissection and plating.

Tissues were homogenized using a disposable micropestle and vortexed at a medium speed for 30 seconds. Serial dilutions of each pooled homogenate were performed in PBS (1:10, 1:100, 1:1000, and 1:10000). One hundred μL of each dilution were plated in triplicate onto amended CMC agar plates [17] (5 g carboxymethyl cellulose, 10 g tryptic soy broth, 0.03 g malt extract, and 12 g agar for 1 L, pH 7.0) treated with tetracycline and incubated at 28 °C for 2–4 days. Single spores from several representative fungal colonies from each pool of guts were subcultured on tetracycline amended potato dextrose agar (BD, Franklin Lakes, New Jersey, USA) to generate monoconidial cultures. The monoconidial cultures were subcultured onto potato dextrose “reverse agar” (potato dextrose broth with 30% BASF pluronic polyol F-127), and the fungal cultures were easily collected from this medium by refrigerating the plates at 4 °C to re-liquefy the medium, and then spinning down the mycelia at 4 °C at 12,000 × g for 10 min in a 50 mL centrifuge tube. At this point, the supernatant was removed and the mycelia were washed with TE buffer to remove remaining medium.

DNA extraction was performed by grinding the collected mycelia under liquid nitrogen, followed by extraction in 5 volumes of extraction buffer (50 mM Tris-HCl pH 8.0, 50 mM EDTA, 3% SDS, 0.1 mg/mL proteinase K, and 1% β-mercaptoethanol) at 65 °C for 1 hour. Phenol:chloroform extraction was then performed with two rounds of extraction in phenol:chloroform:isoamyl alcohol (25:24:1), followed by 1 round of chloroform:isoamyl alcohol (24:1). DNA was precipitated from the upper phase by addition of 0.5 volume 7 M ammonium acetate and 2 volumes of 95% ethanol and incubated for 1 hour at −20 °C. DNA was pelleted and resuspended in TE buffer and quantified by measuring absorbance at 260 nm. The number of cultures analyzed for each sample is displayed in Table A1-1.

Multi-Locus Sequencing from Cultured Fungal DNA Extraction

PCR was performed on each single-spore isolate DNA extract to amplify the TEF1-α region as described above [44]. In addition, PCR was performed on the ITS and NL regions of the rRNA [39]. The ITS region was amplified using PCR primers ITS5 (5'-GGAAGTAAAAGTCGTAACAAGG-3') and ITS4 (5'-TCCTCCGCTTATTGATATGC-3'), which together span the end of the 18S rDNA locus, the entire ITS1-5.8S rDNA-ITS2 region, and the beginning portion of the 28S rDNA locus. The LSU 28S rDNA region was amplified with primers NL1 (5'-GCATATCAATAAGCGGAGGA-3') and NL4 (5'-GGTCCGTGTTTCAAGACGG-3'). PCR reactions were purified for sequencing as described above and sequenced in both directions using 2 μL of 1 μM forward or reverse primer to prime the sequencing reaction. All sequencing was performed using BigDye chemistry at the Penn State Genomics Core Facility.

Sequence Editing, Alignment and Operational Taxonomic Unit (OTU) Analysis of TEF1-α

Alignment of forward and reverse TEF1-α sequences for each sample from both cultured and culture independent samples was performed using Sequencher 4.8 (Gene Codes Corporation, Ann Arbor, MI). After alignment, the cloning vector sequence was removed and a single consensus sequence for each clone was created by manually discriminating conflicting base calls between the forward and reverse reads. All edited sequences obtained from culture-independent analyses and cultured isolates were submitted to GenBank under accession numbers JQ025721–JQ025987 and JN983020–JN983041, respectively. Using the TEF1-α consensus sequences from all clone libraries and all cultured samples, a ClustalW alignment was generated using MEGA4 [45] and then manually edited to improve accuracy of alignment. This dataset was then used to perform operational taxonomic unit (OTU) analysis using the mothur software package [46]. First, a Jukes-Cantor distance matrix was produced using the DNAdist program in the Phylip package using default parameters. This distance matrix was then subjected to OTU cluster analysis with mothur, using the furthest neighbor algorithm and a cutoff of 0.97 [46]. This analysis binned all sequences into two OTUs. BLASTn comparison of a representative sequence from each OTU to the nt and FusariumID databases confirmed that both OTUs contained sequences within the F. solani species complex [42].
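The two computational steps in this paragraph, a Jukes-Cantor distance matrix followed by furthest neighbor (complete linkage) clustering, can be illustrated with a minimal Python sketch. This is not the DNAdist or mothur implementation. The function names and toy sequences are hypothetical, and the sketch assumes that the reported cutoff of 0.97 denotes 97% similarity, i.e. a distance threshold of 0.03. The Jukes-Cantor distance is d = −(3/4) ln(1 − 4p/3), where p is the proportion of differing sites.

```python
import math
from itertools import combinations


def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned sequences, ignoring gaps/ambiguities."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # the correction is undefined at or beyond saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def furthest_neighbor(seqs: dict[str, str], cutoff: float) -> list[set[str]]:
    """Complete-linkage clustering: merge while the largest between-cluster distance <= cutoff."""
    dist = {
        frozenset((a, b)): jukes_cantor(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    clusters = [{name} for name in seqs]

    def linkage(c1: set[str], c2: set[str]) -> float:
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > cutoff:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i remains valid
    return clusters


# Hypothetical toy alignment; the 0.03 threshold is the assumed reading of the 0.97 cutoff.
toy = {
    "clone_1": "ACGTACGTACGTACGTACGT",
    "clone_2": "ACGTACGTACGTACGTACGT",
    "clone_3": "TGCATGCATGCATGCATGCA",
}
print(furthest_neighbor(toy, cutoff=0.03))
```

Complete linkage is conservative: a sequence joins an OTU only if it lies within the threshold of every member already in that OTU.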

Single- and Multi-locus Phylogenetic Analysis

For phylogenetic reconstruction and placement of cultures within the F. solani species complex, we utilized the dataset of O’Donnell [44], which reconstructed the phylogeny of the F. solani complex using the TEF1-α, ITS, and LSU regions and included a broad range of F. solani isolates from a variety of geographically distinct locations, mating populations, and subspecies. The nexus file used in this analysis was downloaded from TreeBASE [47]. ITS, TEF1-α, and LSU sequences obtained from A. glabripennis-derived cultures were concatenated together and manually aligned to the O'Donnell dataset. This alignment was partitioned into individual loci by position using PAUP Beta 4.0 (Windows) and this partitioned dataset was used for all single locus and multilocus phylogenetic reconstructions [48]. ITS and LSU sequences obtained from cultured isolates were submitted to GenBank under the accession numbers JN982998–JN983019 and JN982967–JN982997, respectively.

Using the sequences obtained from culture-independent analysis of the TEF1-α locus, a single representative sequence was randomly selected from each OTU for inclusion into the TEF1-α phylogenetic analysis (OTU 1: F01_HEREF (from NYC-derived horse chestnut) and OTU 2: B11DEF_34 (from PSU colony-derived artificial diet)). These OTU sequences were manually aligned to the TEF1-α locus (bases 585 to 1290 of the multilocus alignment) and the TEF1-α alignment was extracted and imported into jModelTest (version 0.1.0) to select best fit models of nucleotide substitution optimized for maximum likelihood tree topology using the Akaike Information Criterion (AIC) [49, 50]. Maximum likelihood trees were estimated using the TPM2uf + G evolutionary model with GARLI 2.0 (Genetic Algorithm for Rapid Likelihood Inference) [51]. Evolution was simulated for 500,000 generations or until likelihood scores reached convergence; nonparametric bootstrap analysis was conducted to generate support for branching topology (n = 500 pseudoreplicates). Bootstrap consensus trees were rooted with Fusarium staphyleae and compiled using SumTrees version 3.3.1 [52]. Nodes with branch lengths <1E-08 were collapsed and bootstrap values for nodes with support >50 are reported.
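Model selection by AIC can be summarized as AIC = 2k − 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood; the model with the lowest AIC is preferred. The Python sketch below illustrates the comparison only. It does not reproduce jModelTest, and the model names, log-likelihoods, and parameter counts are hypothetical values chosen for illustration, not results from this study.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits: dict[str, tuple[float, int]]) -> str:
    """Return the name of the candidate model with the lowest AIC."""
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical (log-likelihood, free substitution-model parameters) per candidate model.
hypothetical_fits = {
    "JC": (-2450.0, 0),
    "HKY+G": (-2390.0, 5),
    "GTR+I+G": (-2388.5, 10),
}

for name, (lnl, k) in hypothetical_fits.items():
    print(f"{name}: AIC = {aic(lnl, k):.1f}")
print("selected:", best_model(hypothetical_fits))
```

In this toy comparison the most parameter-rich model has the highest likelihood but is not selected, because its small gain in fit does not offset the penalty for five additional parameters.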

Phylogenetic analysis using ITS and LSU loci was also conducted using the O’Donnell dataset and ITS and LSU sequences obtained from cultured A. glabripennis-derived F. solani isolates; however, since ITS and LSU sequences were not available from culture-independent analysis, representative sequences from OTUs could not be included. Maximum likelihood trees were simulated using TIM + I + G for ITS and TIMef + I for LSU with GARLI [49–51]. Evolutionary simulation, nonparametric bootstrap analysis, and compilation of bootstrap consensus trees were performed in the same manner as described above.

Single locus phylogenetic reconstructions were of limited use in resolving the relationship of A. glabripennis-derived isolates to the Fusarium solani species complex due to a large number of unresolved nodes and nodes with low bootstrap values, particularly using the ITS and LSU loci (data not shown). In an attempt to improve bootstrap support values and resolve multifurcating nodes, a multilocus approach was used to construct maximum likelihood phylogenetic trees. Maximum likelihood trees were computed using GARLI [51]. Each locus was treated as a separate partition and allowed to evolve independently according to its optimal model of nucleotide substitution determined by jModelTest (ITS: TIM + I + G; TEF1-α: TPM2uf + G; LSU: TIMef + I) [49, 50]. Evolution was simulated for 500,000 generations or until likelihood scores reached convergence and nonparametric bootstrap analysis was performed to generate support for branching patterns (n = 500 pseudoreplicates). Bootstrap consensus trees were rooted with F. staphyleae and were compiled as described above.
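The resampling step that underlies a nonparametric bootstrap in general can be sketched in a few lines of Python: alignment columns are drawn with replacement to build each pseudoreplicate, and a tree would then be inferred from every pseudoreplicate. The sketch covers only the resampling and makes no claim about how GARLI handles partitions internally. The function name, toy alignment, and seed are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one pseudoreplicate by sampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    # The same sampled column indices are applied to every taxon, keeping sites intact.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


# Hypothetical toy alignment; 500 pseudoreplicates mirrors the number used in the text.
toy_alignment = {
    "isolate_A": "ACGTTGCAAC",
    "isolate_B": "ACGTTGCTAC",
    "isolate_C": "ACCTTGCTAG",
}
rng = random.Random(42)  # fixed seed so the resampling is repeatable
pseudoreplicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(500)]
print(len(pseudoreplicates), "pseudoreplicates of", len(toy_alignment["isolate_A"]), "sites each")
```

Bootstrap support for a node is then the percentage of pseudoreplicate trees in which that node appears.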

Results

Culture Independent Fungal Community Analysis

In total, 277 TEF1-α clones associated with A. glabripennis were sequenced and analyzed. From this total, 153 were derived from insects from the Penn State research colony, with 39 clones from insects reared on sugar maple, 33 clones from insects reared on pin oak, 46 clones from insects reared on callery pear, and 35 clones from insects reared on cellulose-based artificial diet (Table A1-2). The remaining 124 clones were from insects from field populations in New York City, with 30 clones from insects collected from silver maple tree 1, 32 clones from insects collected from silver maple tree 2, 29 clones from insects collected from a horse-chestnut tree, and 33 clones from a sycamore maple tree. Based on OTU analysis, using the furthest neighbor algorithm, all TEF1-α sequences derived from culture-independent methods were categorized into two OTUs; specifically, 268 TEF1-α sequences were placed into OTU1, while only nine TEF1-α sequences were placed into OTU2. In addition, while OTU2 was found in association with PSU-derived larvae fed on pin oak or artificial diet, it was not found in association with any A. glabripennis larvae derived from the NYC population. In contrast, OTU1 was highly abundant in both PSU- and NYC-derived populations and was detected in larvae sampled from all host tree species.
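The clone tallies reported in this paragraph can be cross-checked with a short Python sketch. The per-library counts and OTU counts are taken directly from the text above; the variable names are illustrative.

```python
# Clone counts per library, as reported in the paragraph above.
clone_counts = {
    "PSU": {"sugar maple": 39, "pin oak": 33, "callery pear": 46, "artificial diet": 35},
    "NYC": {
        "silver maple tree 1": 30,
        "silver maple tree 2": 32,
        "horse-chestnut": 29,
        "sycamore maple": 33,
    },
}
# OTU assignments of the culture-independent sequences, as reported above.
otu_counts = {"OTU1": 268, "OTU2": 9}

population_totals = {pop: sum(libs.values()) for pop, libs in clone_counts.items()}
total_clones = sum(population_totals.values())
otu_fractions = {otu: n / sum(otu_counts.values()) for otu, n in otu_counts.items()}

print(population_totals, total_clones)
for otu, fraction in otu_fractions.items():
    print(f"{otu}: {fraction:.1%}")
```

The per-library counts sum to the stated population totals, and the two OTU counts together account for all sequenced clones.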

Culture-Dependent Fungal Analysis

A total of 22 monoconidial fungal cultures was obtained and analyzed across three loci. Based on analysis of the TEF1-α locus sequences, all of the cultured isolates were classified into the same two OTUs that were described in the culture-independent analysis (Table A1-3); specifically, nine PSU-derived cultures and one NYC-derived culture were categorized into OTU1, while eight PSU-derived and six MA-derived cultures were categorized into OTU2. In NYC- and MA-derived larvae, only representatives of OTU1 and OTU2, respectively, were cultured; however, although OTU2 had not been detected previously in PSU-derived larvae reared on sugar maple, cultures from both OTUs were isolated from PSU-larvae. The abundance of OTU1 and OTU2 among Fusarium cultures generated from PSU larvae was roughly equivalent. Phylogenetic analysis of both culture independent and cultured strains using the TEF1-α locus produced results congruent with the OTU-based analysis (Figure A1-1). Broadly speaking, all A. glabripennis-derived sequences were resolved into two distinct clades, which corresponded well with the OTU classifications. Specifically, clade 1 contains the same nine cultured representatives that were categorized as OTU1, including a single isolate from NYC-derived larvae, nine PSU-derived isolates, and a representative culture-independent sequence from OTU1. Of particular interest is the low degree of sequence variation among isolates that fall into this clade, which is indicated by short branch lengths and unresolved nodes. While most sequences designated as OTU1 in the OTU-based cluster analysis form a monophyletic group distinct from the O'Donnell isolates, a single outlier (PSU Tree 5 Culture 1) that was categorized as OTU1 was present. Although no O'Donnell sequences were present in clade 1 (not including the outlier “PSU Tree 5 Culture 1”), the nearest group contains F. solani isolates from mating populations (listed as MP in figures based on O’Donnell analysis) III, VI, and VII (ff. spp. mori, pisi and robiniae, respectively).

Although all sequences from OTU2 formed a clade distinct from OTU1, there is a markedly higher degree of sequence heterogeneity within this group relative to OTU1 and this group can be further divided into two sub-clades (clades 2a and 2b). Clade 2a contains only isolates derived from PSU sugar maple-reared larvae and has a relatively high degree of within-group sequence heterogeneity, while clade 2b contains only isolates derived from MA populations and has a low degree of within-group sequence heterogeneity, indicated by unresolved nodes. Both clade 2a and 2b fall in a well-supported and distinct clade with the nearest O’Donnell isolate from F. solani f. sp. cucurbitae mating population V. Single locus analysis using the ITS and LSU rRNA regions from the cultured fungal isolates showed branch topologies similar to that of the TEF1-α derived tree but with much lower bootstrap values and many unresolved nodes (data not shown). These results are similar to results from these loci in the original analysis performed by O’Donnell [44].

Multiple locus analysis encompassing the TEF1-α, ITS, and LSU regions from A. glabripennis-derived cultures and O'Donnell isolates produced a tree topology similar to that of the TEF1-α single locus analysis, improving bootstrap support on many nodes (Figure A1-2). One major change that can be noted is that multiple locus analysis now places the outlier sequence (PSU Tree 1 Culture 5) as a neighbor to OTU2 rather than clustering with OTU1. Otherwise, clade 1, which contains all isolates classified as OTU1, and clades 2a and 2b, which contain all OTU2-classified isolates, can still be clearly identified. Of particular interest is that the low sequence variability noted within the OTU1 group in the TEF1-α-based phylogenetic analysis still persists even when sequence data from two additional loci are integrated, suggesting low genetic diversity among OTU1-derived F. solani isolates. In addition, clade 2b (containing only MA-derived isolates) is still relatively invariant at the nucleotide level in comparison to clade 2a (containing PSU-derived isolates).

Discussion

In Anoplophora glabripennis larvae, collected from 11 trees of six different species at three different geographic locations, we consistently found F. solani associated with the insect gut. While the same fungal genotype was not detected in all larval populations analyzed, there was a strong correlation between the F. solani strain detected and geographic location. Cultured isolates could be categorized into three clades (Figure A1-2). Clade 1 contained cultured isolates from PSU- or NYC-populations and culture-independent OTU1. OTU1 was by far the most abundant OTU detected through culture-independent approaches, representing 266/277 (96%) of the clones sequenced and it was detected in every gut sample collected from either NYC-derived or PSU-derived insects (Table A1-2).

The isolates within OTU1 and clade 1 show very little sequence divergence, suggesting they may be transmitted between generations since isolates from these groups were found not only in colony-derived artificial diet reared insects, but also in their progeny, which were reared in trees in our quarantine facility. Also, the PSU-colony was originally derived from field A. glabripennis populations in New York City, where a high abundance of this OTU was also found.

It is difficult to determine if OTU1 has a consistent evolutionary relationship with A. glabripennis since we were not able to recover it from Worcester, MA-derived insects. Due to difficulty in obtaining samples, we could not perform culture-independent analysis from this geographic location and were limited to performing analysis on cultured isolates. Although OTU2 was significantly less abundant, representing only 4% of clones, it is readily culturable and was often cultured from samples in which OTU2 had not been detected previously in our culture-independent analyses. This suggests that OTU2 may be less abundant in the gut than OTU1, but more amenable to culturing since it represented approximately 40% of our cultured isolates. This could also explain why OTU1 was not detected in MA-derived samples (Tables A1-2 and A1-3). Despite this, OTU1 has low sequence variability and is highly prevalent in our OTU analysis, suggesting a strong relationship with A. glabripennis.

In addition to the highly prevalent F. solani OTU1 strain, we also detected a lower prevalence of a second clade of F. solani sequences (OTU2, Figures A1-1 and A1-2). Unlike OTU1, the sequences that comprised this OTU were heterogeneous, forming two distinct A. glabripennis-derived clades (clade 2a and clade 2b), and were closely related to F. solani isolates from mating population V. Clade 2a contains F. solani derived from PSU colony-derived culture independent samples from pin oak trees and artificial diet, as well as cultured samples from sugar maple trees. Clade 2b contains all of the sequences from MA-derived insects, and there is low sequence diversity among these samples. The isolates that were most closely related to OTU2 are plant pathogens from mating population V. A similar case is observed for OTU1, as the closest relatives are also plant pathogens (mating populations III, VI and VII). We can speculate that the ancestral sources of F. solani found in A. glabripennis were plant-associated fungal populations, either pathogens or endophytes, that may have been encountered by chance by insects feeding on trees. This could enable these fungi to become associated with the gut either in tandem with OTU1 (as seen in PSU colony-derived insects), or could potentially replace OTU1, which may have occurred in MA-derived insects. The Massachusetts infestation of A. glabripennis is thought to be a separate introduction to the U.S. from the New York City populations [1]. This could also explain why MA-derived F. solani cultures were distinct from NYC- and PSU colony-derived cultures.

Regardless of the evolutionary origin of the F. solani harbored in the gut of A. glabripennis, the consistent association of F. solani and sequence homogeneity of F. solani isolates within geographic populations of larval A. glabripennis suggests that F. solani may provide some critical metabolic or biochemical requirement in the gut for larval development. F. solani strains are known to have the metabolic ability to degrade cellulose and hemicelluloses, and have some “soft-rot” type lignin degradation abilities [53-57]. Interestingly, members of the Fusarium solani species complex are metabolically versatile and can often colonize and thrive on lignocellulose-based substrates, including Kraft and Klason lignin, and produce impressive arrays of laccases, cellulases, xylanases, and enzymes involved in xenobiotic detoxification, suggesting that members of this species complex may harbor efficient lignin-degrading enzymes with potential to help insects overcome many of the challenges associated with feeding in wood [55, 58-60]. Other potential biochemical capabilities previously detected in Fusarium isolates that may benefit insect hosts include sterol synthesis, detoxification of plant-derived allelochemicals and secondary metabolites, and nitrogen scavenging from unconventional substrates, including cyanide and formamide [61-63]. The consistent presence of F. solani in the insect gut suggests that A. glabripennis may be harboring this fungus in order to benefit from its metabolic capabilities. In order to assess the true metabolic and lignin degrading potential of A. glabripennis F. solani affiliates, whole genome sequencing of the OTU 1 isolate is currently underway and in vitro analyses utilizing model lignin compounds will be conducted to survey for potential lignin degrading enzymes. Furthermore, we are currently investigating the role of F. solani in the gut of A. glabripennis through metagenomic, transcriptomic and proteomic studies to better understand potential contributions to digestion and gut physiology in A. glabripennis.

Conclusions

Many species in the F. solani species complex are notorious plant pathogens, while others are common environmental fungi that are often detected in many diverse habitats [44, 64]. The question resulting from this experiment is: Are the Fusarium isolates we detected in A. glabripennis just environmental contaminants picked up by the beetle during feeding, or are these isolates endosymbionts? It is possible that some of the fungal strains found in a subset of our samples are just environmentally derived or transient fungi that are not associated with this insect. Despite this, a large clade of F. solani sequences was detected from all culture independent samples, as well as cultured samples from PSU colony-derived and NYC-derived insect gut fungal samples, and members of this clade appear to have an intimate association with A. glabripennis. Not only did this clade make up the vast majority of the sequences obtained from the clone libraries, but this clade was very distinct from the other F. solani strains in our tree or within any previously described mating populations [44]. Also, these strains are not a by-product of colony rearing or geographic distribution, as they were found in both our research colony and several collection sites within New York City. To further test this association, analysis of larval guts from Asia, the native range of this insect, would be most informative. In addition, an understanding of the method of transmission of this fungus, and the maintenance of the fungus within the insect gut, will be important to confirming the symbiotic nature of the relationship.

Acknowledgements

We thank Al Sawyer’s group at USDA-APHIS and Ken Gooch at the Massachusetts Department of Conservation and Recreation for assistance collecting beetles from New York City, NY and Worcester, MA and the Penn State Genomics Core Facility for sequencing services. Funding for this project was provided by USDA-NRI-CSREES 2008-35504-04464, USDA-NRI-CSREES 2009-35302-05286, and the Alphawood Foundation, Chicago, IL.


Literature Cited

1. Whitney AN, Keena MA: Effects of host wood moisture on the lifecycle development

of the Asian longhorned beetle. In: 21st US Department of Agriculture interagency

research forum on invasive species: 2010; Annapolis, MD: U.S. Department of

Agriculture, Forest Service, Northern Research Station; 2010: 169.

2. Moser WK, Barnard EL, Billings RF, Crocker SJ, Dix ME, Gray AN, Ice GG, Kim MS,

Reid R, Rodman SU et al: Impacts of Nonnative Invasive Species on US Forests and

Recommendations for Policy and Management. J Forest 2009, 107(6):320-327.

3. Haack R: New York's battle with the Asian long-horned beetle. J Forest 1997,

95(12):11-15.

4. Kukor JJ, Martin MM: Cellulose digestion in Monochamus marmorator Kby

(Coleoptera, Cerambycidae) - Role of acquired fungal enzymes. J Chem Ecol 1986,

12(5):1057-1070.

5. Kukor J, Cowan D, Martin M: The role of ingested fungal enzymes in cellulose

digestion in the larvae of cerambycid beetles. Physiol Zool 1988, 61(4):364-371.

6. Morewood WD, Hoover K, Neiner PR, Sellmer JC: Complete development of

Anoplophora glabripennis (Coleoptera: Cerambycidae) in northern red oak trees.

Can Entomol 2005, 137:376-379.

7. Morewood WD, Neiner PR, McNeil JR, Sellmer JC, Hoover K: Oviposition preference

and larval performance of Anoplophora glabripennis (Coleoptera: Cerambycidae) in

four eastern North American hardwood tree species. Environ Entomol 2003, 32:1028-1034.

8. Morewood WD, Neiner PR, Sellmer JC, Hoover K: Behavior of adult Anoplophora

glabripennis on different tree species under greenhouse conditions. J Insect Behav

2004, 17(2):215-226.

9. Sellmer JC, Morewood WD, Neiner P, Hoover K: Evaluating Asian Longhorned Beetle

Adult Preference and Larval Performance Among Commonly Planted Landscape

Trees. In: 13th Metropolitan Tree Improvement Alliance and Landscape Plant

Development Center Conference: June 16-19 2004; Lisle, IL; 2004.

10. Lee SJ, Kim SR, Yoon HJ, Kim I, Lee KS, Je YH, Lee SM, Seo SJ, Sohn HD, Jin BR:

cDNA cloning, expression, and enzymatic activity of a cellulase from the mulberry

longicorn beetle, Apriona germari. Comp Biochem Physiol, B: Comp Biochem 2004,

139(1):107-116.

11. Lee SJ, Lee KS, Kim SR, Gui ZZ, Kim YS, Yoon HJ, Kim I, Kang PD, Sohn HD, Jin

BR: A novel cellulase gene from the mulberry longicorn beetle, Apriona germari:

Gene structure, expression, and enzymatic activity. Comp Biochem Physiol, B: Comp

Biochem 2005, 140(4):551-560.

12. Sugimura M, Watanabe H, Lo N, Saito H: Purification, characterization, cDNA

cloning and nucleotide sequencing of a cellulase from the yellow-spotted longicorn

beetle, Psacothea hilaris. Eur J Biochem 2003, 270(16):3455-3460.

13. Holtzapple M, Cognata M, Shu Y, Hendrickson C: Inhibition of Trichoderma reesei

cellulase by sugars and solvents. Biotechnol Bioeng 1990, 36(3):275-287.

14. Geib SM, Filley TR, Hatcher PG, Hoover K, Carlson JE, Jimenez-Gasco Mdel M,

Nakagawa-Izumi A, Sleighter RL, Tien M: Lignin degradation in wood-feeding

insects. Proceedings of the National Academy of Sciences of the United States of America

2008, 105(35):12932-12937.

15. Dillon RJ, Dillon VM: The gut bacteria of insects: Nonpathogenic interactions.

Annual Review of Entomology 2004, 49:71-92.

16. Schloss PD, Delalibera I, Handelsman J, Raffa KF: Bacteria associated with the guts of

two wood-boring beetles: Anoplophora glabripennis and Saperda vestita

(Cerambycidae). Environ Entomol 2006, 35(3):625-629.

17. Delalibera I, Handelsman J, Raffa KF: Contrasts in cellulolytic activities of gut

microorganisms between the wood borer, Saperda vestita (Coleoptera :

Cerambycidae), and the bark beetles, Ips pini and Dendroctonus frontalis

(Coleoptera : Curculionidae). Environ Entomol 2005, 34(3):541-547.

18. Geib SM, Jimenez-Gasco MdM, Carlson JE, Tien M, Hoover K: Effect of Host Tree

Species on Cellulase Activity and Bacterial Community Composition in the Gut of

Larval Asian Longhorned Beetle. Environ Entomol 2009, 38:686-699.

19. Geib SM, Jimenez-Gasco Mdel M, Carlson JE, Tien M, Jabbour R, Hoover K: Microbial

community profiling to investigate transmission of bacteria between life stages of

the wood-boring beetle, Anoplophora glabripennis. Microbial ecology 2009,

58(1):199-211.

20. Suh SO, Marshall CJ, McHugh JV, Blackwell M: Wood ingestion by passalid beetles in

the presence of xylose-fermenting gut yeasts. Molecular Ecology 2003, 12(11):3137-

3145.

21. Suh SO, McHugh JV, Pollock DD, Blackwell M: The beetle gut: a hyperdiverse source

of novel yeasts. Mycol Res 2005, 109:261-265.

22. Suh SO, Nguyen NH, Blackwell M: Yeasts isolated from plant-associated beetles and

other insects: seven novel Candida species near Candida albicans. FEMS Yeast

Research 2008, 8(1):88-102.

23. Suh S-O, Blackwell M: The beetle gut as a habitat for new species of yeasts. In: Insect

Fungal Associations: Ecology and Evolution. Edited by Vega F, Blackwell M. New

York: Oxford University; 2005.

24. Zhang N, Suh SO, Blackwell M: Microorganisms in the gut of beetles: evidence from

molecular cloning. Journal of invertebrate pathology 2003, 84(3):226-233.

25. Jones K, Dowd P, Blackwell M: Polyphyletic origins of yeast-like endocytobionts

from anobiid and cerambycid beetles. Mycol Res 1999, 103:542-546.

26. Suh S-O, Blackwell M: Three new beetle-associated yeast species in the Pichia

guilliermondii clade. FEMS Yeast Research 2004, 5:87-95.

27. Chararas C, Eberhard R, Courtois JE, Petek F: Purification of three cellulases from the

xylophageous larvae of Ergates faber (Coleoptera: Cerambycidae). Insect

Biochemistry 1983, 13(2):213-218.

28. Dowd P: Symbiont-mediated detoxification in insect herbivores. In: Microbial

Mediation of Plant-Herbivore Interactions. Edited by Barbosa P, Krischik V, Jones C.

New York: John Wiley & Sons; 1991: 411-440.

29. Jones CG: Microorganisms as mediators of plant resource exploitation by insect

herbivores. In: A New ecology : novel approaches to interactive systems. Edited by Price

PW, Slobodchikoff CN, Gaud WS, Northern Arizona U. New York: Wiley; 1984: 53-99.

30. Vega FE, Dowd PF: The role of yeasts as insect endosymbionts. In: Insect-fungal

associations: Ecology and evolution. Edited by Vega FE, Blackwell M. New York:

Oxford University Press; 2005: 211–243.

31. Nardon P, Grenier A: Endosymbiosis in Coleoptera: biological, biochemical, and

genetic aspects. In: Insect Endocytobiosis: Morphology, Physiology, Genetics,

Evolution. Edited by Schwemmler W, Gassner G. Boca Raton: CRC Press; 1989: 175-

216.

32. Berkov A, Feinstein J, Small J, Nkamany M: Yeasts isolated from neotropical wood-

boring beetles in SE Peru. Biotropica 2007, 39(4):530-538.

33. Teetor-Barsch GH, Roberts DW: Entomogenous Fusarium Species. Mycopathologia

1983, 84(1):3-16.

34. Morales-Ramos, Rojas M, Sittertz-Bhatkar H, Saldana G: Symbiotic relationship

between Hypothenemus hampei (Coleoptera: Scolytidae) and Fusarium solani

(Moniliales: Tuberculariaceae). Ann Entomol Soc Am 2000, 93:541-547.

35. Morewood WD, Hoover K, Neiner PR, McNeil JR, Sellmer JC: Host tree resistance

against the polyphagous wood-boring beetle Anoplophora glabripennis. Entomol Exp

Appl 2004, 110:79-86.

36. Dubois T, Hajek AE, Smith S: Methods for rearing the Asian longhorned beetle

(Coleoptera: Cerambycidae) on artificial diet. Ann Entomol Soc Am 2002, 95(2):223-

230.

37. Ludwig SW, Lazarus L, McCullough DG, Hoover K, Montero S, Sellmer JC: Methods

to evaluate host tree suitability to the Asian longhorned beetle, Anoplophora

glabripennis. J Environ Hort 2002, 20:175-180.

38. Keena M: Pourable artificial diet for rearing Anoplophora glabripennis (Coleoptera:

Cerambycidae) and methods to optimize larval survival and synchronize

development. Ann Entomol Soc Am 2005, 98(4):536-547.

39. White TJ, Bruns T, Lee S, Taylor J: Amplification and direct sequencing of fungal

ribosomal RNA genes for phylogenetics. In: PCR protocols: a guide to methods and

applications. Edited by Innis MA, Gelfand DH, Sninsky JJ, White TJ. San Diego:

Academic Press; 1990: 315–322.

40. O'Donnell K, Cigelnik E: Two divergent intragenomic rDNA ITS2 types within a

monophyletic lineage of the fungus Fusarium are nonorthologous. Mol Phylogenet

Evol 1997, 7:103-117.

41. O'Donnell K, Cigelnik E, Nirenberg HL: Molecular systematics and phylogeography

of the Gibberella fujikuroi species complex. Mycologia 1998, 90:465-493.

42. Geiser DM, Jimenez-Gasco MD, Kang SC, Makalowska I, Veeraraghavan N, Ward TJ,

Zhang N, Kuldau GA, O'Donnell K: FUSARIUM-ID v. 1.0: A DNA sequence database

for identifying Fusarium. European Journal of Plant Pathology 2004, 110(5-6):473-

479.

43. O'Donnell K, Kistler HC, Cigelnik E, Ploetz RC: Multiple evolutionary origins of the

fungus causing Panama disease of banana: Concordant evidence from nuclear and

mitochondrial gene genealogies. Proc Natl Acad Sci U S A 1998, 95(5):2044-2049.

44. O'Donnell K: Molecular phylogeny of the Nectria haematococca-Fusarium solani

species complex. Mycologia 2000, 92(5):919-938.

45. Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular evolutionary genetics

analysis (MEGA) software version 4.0. Mol Biol Evol 2007, 24(8):1596-1599.

46. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA,

Oakley BB, Parks DH, Robinson CJ et al: Introducing mothur: Open-Source,

Platform-Independent, Community-Supported Software for Describing and

Comparing Microbial Communities. Appl Environ Microbiol 2009, 75(23):7537-7541.

47. Morell V: TreeBASE: The Roots of Phylogeny. Science 1996, 273(5275):569.

48. Swofford D: PAUP*: Phylogenetic analysis using parsimony, version 4.0b10. 2003.

49. Posada D: jModelTest: Phylogenetic Model Averaging. Molecular Biology and

Evolution 2008, 25(7):1253-1256.

50. Guindon S, Gascuel O: A Simple, Fast, and Accurate Algorithm to

Estimate Large Phylogenies by Maximum Likelihood. Systematic Biology 2003,

52(5):696-704.

51. Zwickl D: Genetic algorithm approaches for the phylogenetic analysis of large

biological sequence datasets under the maximum likelihood criterion. PhD

dissertation available at http://www.nescent.org/wg_garli (Univ of Texas, Austin) 2006.

52. Sukumaran J, Holder MT: DendroPy: a Python library for phylogenetic computing.

Bioinformatics 2010, 26(12):1569-1571.

53. Cheilas T, Stoupis T, Christakopoulos P, Katapodis P, Mamma D, Hatzinikolaou DG,

Kekos D, Macris BJ: Hemicellulolytic activity of Fusarium oxysporum grown on

sugar beet pulp: Production of extracellular arabinanase. Process Biochemistry 2000,

35(6):557-561.

54. Coleman JJ, Rounsley SD, Rodriguez-Carres M, Kuo A, Wasmann CC, Grimwood J,

Schmutz J, Taga M, White GJ, Zhou S et al: The genome of Nectria haematococca:

contribution of supernumerary chromosomes to gene expansion. PLoS Genet 2009,

5(8):e1000618.

55. Lozovaya V, Lygin A, Zernova O, Li S, Widholm J: Lignin degradation by Fusarium

solani f. sp. glycines. Plant Dis 2006, 90:77-82.

56. Manka M: Cellulolytic activity of three Fusarium culmorum (W.G. Sm.) Sacc.

isolates pathogenic towards wheat seedlings. J Phytopathol 1988, 122:113-117.

57. Panagiotou G, Kekos D, Macris BJ, Christakopoulos P: Production of cellulolytic and

xylanolytic enzymes by Fusarium oxysporum grown on corn stover in solid state

fermentation. Industrial Crops and Products 2003, 18(1):37-45.

58. Sutherland JB, Pometto AL, Crawford DL: Lignocellulose Degradation by Fusarium

Species. Canadian Journal of Botany-Revue Canadienne De Botanique 1983,

61(4):1194-1198.

59. Rodriguez A, Perestelo F, Carnicero A, Regalado V, Perez R, DelaFuente G, Falcon MA:

Degradation of natural lignins and lignocellulosic substrates by soil-inhabiting fungi

imperfecti. FEMS Microbiol Ecol 1996, 21(3):213-219.

60. Falcon MA, Rodriguez A, Carnicero A, Regalado V, Perestelo F, Milstein O, Delafuente

G: Isolation of Microorganisms with Lignin Transformation Potential from Soil of

Tenerife Island. Soil Biology & Biochemistry 1995, 27(2):121-126.

61. Teetor-Barsch GH, Roberts DW: Entomogenous Fusarium

species. Mycopathologia 1983, 84(1):3-16.

62. Walter S, Nicholson P, Doohan FM: Action and reaction of host and pathogen during

Fusarium head blight disease. New Phytologist 2010, 185(1):54-66.

63. Dumestre A, Chone T, Portal JM, Gerard M, Berthelin J: Cyanide degradation under

alkaline conditions by a strain of Fusarium solani isolated from contaminated soils.

Appl Environ Microbiol 1997, 63(7):2729-2734.

64. Zhang N, O'Donnell K, Sutton DA, Nalim FA, Summerbell RC, Padhye AA, Geiser DM:

Members of the Fusarium solani Species Complex That Cause Infections in Both

Humans and Plants Are Common in the Environment. J Clin Microbiol 2006,

44(6):2186-2190.

Table A1-1. Number of monoconidial gut-derived fungal cultures from each host tree.

                                 Tree Replicate Number
              Host tree          1     2     3     Total
PSU-derived   Sugar Maple        7     6     3     16
NYC-derived   Sycamore Maple     1     -     -     1
MA-derived    Silver Maple       4     1     -     5
Total                            12    7     3     22

Table A1-2. Distribution of culture independent TEF1-α clones from PSU colony- and NYC-derived larvae into Operational Taxonomic Units (OTUs). All 277 sequences clustered into two distinct OTUs. OTU1 was the most abundant OTU as it was detected in all NYC- and all PSU-derived samples. OTU2 was not detected in any of the NYC samples, but was detected in a subset of the PSU samples.

Culture                                 OTU Identification Number
Independent   Host tree                 1       2       Total
PSU-derived   Sugar Maple               39      -       39
              Pin Oak                   27      6       33
              Callery Pear              46      -       46
              Artificial Diet           32      3       35
NYC-derived   Silver Maple (tree 1)     30      -       30
              Silver Maple (tree 2)     32      -       32
              Sycamore Maple            33      -       33
              Horse-Chestnut            29      -       29
Total                                   268     9       277


Table A1-3. Placement of cultured strains into Operational Taxonomic Units using TEF1-α locus. All PSU-, NYC-, and MA-derived cultured isolates were classified into the same two OTUs that were detected in the culture-independent analysis.

                               OTU Identification Number
Cultured   Host tree           1      2      Total
PSU        Sugar Maple         9      8      17
NYC        Horse chestnut      1      0      1
MA         Silver Maple        0      6      6
Total                          10     14     24


Figure A1-1. Phylogenetic analysis of ALB-derived and O’Donnell Fusarium solani isolates using the translation elongation factor 1-alpha locus. A rooted maximum likelihood tree was generated using TEF1-α sequences from all cultured representatives and a single representative from each OTU detected through culture-independent methods (n=500 bootstrap replicates). These sequences were placed within the F. solani species complex, utilizing the dataset of O’Donnell [44]. Mating populations (abbreviated MP) are listed based on the O’Donnell dataset. Nodes with bootstrap support values > 50 are reported; ML distance scale bar = 10 changes. ALB-derived isolates formed three strongly supported clusters, denoted as OTU1, OTU2a, and OTU2b. A single NYC isolate and all culture-independent sequences generated from this population were grouped into OTU1, while all MA-derived isolates were classified into OTU2b. PSU-derived isolates can be observed in both OTU1 and OTU2a.


Figure A1-2. Multilocus phylogenetic analysis of ALB-derived and O’Donnell Fusarium solani isolates using the internal transcribed spacer region, the translation elongation factor 1-alpha locus, and the large ribosomal subunit locus. A rooted maximum likelihood tree was generated using a partitioned evolutionary model with sequences from three loci from all isolates cultured from ALB larvae (n=500 bootstrap replicates). ALB-derived sequences were placed within the F. solani species complex, utilizing the dataset of O’Donnell [44]. Mating populations (abbreviated MP) are listed based on the O’Donnell dataset. Nodes with bootstrap support values > 50 are reported; ML distance scale bar = 10 changes. ALB-derived isolates formed three strongly supported monophyletic clades, denoted as OTU1, OTU2a, and OTU2b. A single NYC isolate can be found in OTU1, while all MA-derived isolates were classified into OTU2b. PSU-derived isolates can be observed in both OTU1 and OTU2a.


Appendix B

Supplemental Tables and Figures


Table B1. Estimated copy number of lignin degrading candidate genes in herbivore-associated microbial communities.

Pfam Description                   Cu-oxidase 1  Cu-oxidase 2  Cu-oxidase_3  Cu-oxidase_4  Dyp_perox  GST_C  GST_N  Aldo_ket_red  Peroxidase  GMC_oxred_C  GMC_oxred_N  FAD-ox_C

Termite
  Amitermes hindgut                0             0             0             36            1          0      0      478           1           0            1            23
  Nasutitermes hindgut Costa Rica  0             0             0             4             0          0      0      113           0           0            1            17
  Nasutitermes hindgut Florida     0             4             5             4             0          0      0      215           0           0            1            16
  Trichonympha Protist             0             0             0             6             0          0      0      12            0           1            0            14

Herbivore Guts
  Tamar Wallaby                    0             0             0             3             0          0      0      294           0           0            0            16
  Panda                            2             1             2             0             1          0      0      6             0           1            1            1
  Honey Bee                        2             16            15            27            11         32     2      183           0           11           8            53
  Reindeer                         0             0             0             6             0          0      0      49            0           0            0            4
  Ant fungal garden                27            33            43            31            61         104    131    352           47          62           56           0

Primary Phloem/Xylem Fungal Galleries
  DF Fungal Mississippi            75            181           185           192           238        1078   1258   1537          78          937          858          203
  DP Fungal Alberta                72            127           129           125           158        671    591    804           51          311          307          112
  DP Fungal British Columbia       60            60            69            93            92         355    296    451           25          179          198          79
  DP Fungal Alberta (hybrid)       78            86            96            107           160        712    791    1013          59          475          486          223
  Xyleborus fungal gallery         72            87            85            103           158        489    497    1184          93          297          383          265

Phloem/xylem feeding insect guts
  Xyleborus adult gut              12            22            11            26            22         92     127    191           12          69           79           23
  Xyleborus larval gut             11            29            26            40            47         184    201    286           11          103          110          57
  DP Larval Gut Alberta            9             44            47            89            99         462    594    557           8           249          253          134
  DF Larval Gut Mississippi        14            25            25            24            41         219    172    304           11          66           87           100

Heartwood-feeding communities
  A. glabripennis whole gut        33            60            42            52            131        181    133    1623          162         193          314          299
  Sirex fungal gallery             67            88            92            69            99         317    334    357           12          226          275          232


Table B2. KEGG and taxonomic assignments of glycoside hydrolases detected in the A. glabripennis gut metagenome.

Each entry lists the GH Family, the Number of Reads, the KEGG ECs, and the Class Level Assignments.

GH Family 1. Number of Reads: 556
KEGG ECs: β-glucosidase (EC 3.2.1.21); β-galactosidase (EC 3.2.1.23); β-mannosidase (EC 3.2.1.25); β-glucuronidase (EC 3.2.1.31); Exo-β-1,4-glucanase (EC 3.2.1.74); 6-phospho-β-galactosidase (EC 3.2.1.85); 6-phospho-β-glucosidase (EC 3.2.1.86); Strictosidine amygdalin β-glucosidase (EC 3.2.1.117); Thioglucosidase (EC 3.2.1.147); β-primeverosidase (EC 3.2.1.149)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Betaproteobacteria, Clostridia, Gammaproteobacteria, Hexapoda, Saccharomycetes, Verrucomicrobia

GH Family 2. Number of Reads: 337
KEGG ECs: β-galactosidase (EC 3.2.1.23); β-mannosidase (EC 3.2.1.25); β-glucuronidase (EC 3.2.1.31); Mannosylglycoprotein
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Gammaproteobacteria, Hexapoda, Lentisphaeria

GH Family 3. Number of Reads: 687
KEGG ECs: β-glucosidase (EC 3.2.1.21); Xylan 1,4-β-xylosidase (EC 3.2.1.37); β-N-acetylhexosaminidase (EC 3.2.1.52); Glucan 1,3-β-glucosidase (EC 3.2.1.58); Endo-β-1,4-glucanase (EC 3.2.1.74); Exo-1,3-1,4-glucanase (EC 3.2.1.-); α-L-arabinofuranosidase (EC 3.2.1.55)
Class Level Assignments: Acidobacteria, Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Betaproteobacteria, Clostridia, Gammaproteobacteria, Lentisphaeria, Coleoptera, Saccharomycetes, Thermobaculum, Verrucomicrobia

GH Family 5. Number of Reads: 77
KEGG ECs: β-mannosidase (EC 3.2.1.25); Endo-β-1,4-glucanase (EC 3.2.1.4); Glucan β-1,3-glucosidase (EC 3.2.1.58); Licheninase (EC 3.2.1.73); Glucan endo-1,6-β-glucosidase (EC 3.2.1.75); Mannan endo-β-1,4-mannosidase (EC 3.2.1.78); Endo-β-1,4-xylanase (EC 3.2.1.8); Exo-β-1,4-cellobiosidase (EC 3.2.1.91); β-1,3-mannanase (EC 3.2.1.-); Mannan transglycosylase (EC 2.4.1.-); β- (EC 3.2.1.45)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Clostridia, Gammaproteobacteria, Hexapoda, Lentisphaeria, Verrucomicrobia

GH Family 6. Number of Reads: 22
KEGG ECs: Endo-β-1,4-glucanase (EC 3.2.1.4); Exo-β-1,4-cellobiosidase (EC 3.2.1.91)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Chloroflexi, Clostridia, Gammaproteobacteria, Hexapoda, Saccharomycetes

GH Family 8. Number of Reads: 57
KEGG ECs: Endo-β-1,4-glucanase (EC 3.2.1.4); Licheninase (EC 3.2.1.73); Endo-1,4-β-xylanase (EC 3.2.1.8)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacilli, Gammaproteobacteria

GH Family 9. Number of Reads: 10
KEGG ECs: Endo-β-1,4-glucanase (EC 3.2.1.4); Exo-β-1,4-cellobiosidase (EC 3.2.1.91); β-glucosidase (EC 3.2.1.21)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Gammaproteobacteria, Deltaproteobacteria

GH Family 10. Number of Reads: 62
KEGG ECs: Endo-1,4-β-xylanase (EC 3.2.1.8); Endo-1,3-β-xylanase (EC 3.2.1.32)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacteroidetes, Gammaproteobacteria, Verrucomicrobia

GH Family 11. Number of Reads: 3
KEGG ECs: Endo-1,4-β-xylanase (EC 3.2.1.8)
Class Level Assignments: Actinobacteria

GH Family 14. Number of Reads: 15
KEGG ECs: β-amylase (EC 3.2.1.2)
Class Level Assignments: Actinobacteria, Bacteroidetes, Verrucomicrobia

GH Family 15. Number of Reads: 92
KEGG ECs: Glucoamylase (EC 3.2.1.3); Glucodextranase (EC 3.2.1.70); α,α-trehalase (EC 3.2.1.28)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacteroidetes, Gammaproteobacteria, Hexapoda, Saccharomycetes

GH Family 16. Number of Reads: 60
KEGG ECs: Endo-1,3-β-glucanase (EC 3.2.1.39); Endo-1,3(4)-β-glucanase (EC 3.2.1.6); Licheninase (EC 3.2.1.73); Endo-β-1,3-galactanase (EC 3.2.1.-)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacteroidetes, Gammaproteobacteria, Verrucomicrobia

GH Family 17. Number of Reads: 10
KEGG ECs: Endo-1,3-β-glucosidase (EC 3.2.1.39); Glucan 1,3-β-glucosidase (EC 3.2.1.58); Licheninase (EC 3.2.1.73); β-1,3-glucanosyltransglycosylase (EC 2.4.1.-)
Class Level Assignments: Saccharomycetes

GH Family 18. Number of Reads: 75
KEGG ECs: Chitinase (EC 3.2.1.14)
Class Level Assignments: Actinobacteria, Bacteroidetes, Clostridia, Gammaproteobacteria, Hexapoda, Saccharomycetes, Verrucomicrobia

GH Family 20. Number of Reads: 192
KEGG ECs: β-hexosaminidase (EC 3.2.1.52); β-1,6-N-acetylglucosaminidase (EC 3.2.1.-); β-6-SO3-N-acetylglucosaminidase (EC 3.2.1.-)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacteroidetes, Betaproteobacteria, Gammaproteobacteria, Hexapoda, Lentisphaeria

GH Family 25. Number of Reads: 152
KEGG ECs: (EC 3.2.1.17)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Gammaproteobacteria

GH Family 26. Number of Reads: 21
KEGG ECs: β-mannanase (EC 3.2.1.78)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Gammaproteobacteria, Saccharomycetes

GH Family 28. Number of Reads: 26
KEGG ECs: Polygalacturonase (EC 3.2.1.15); Exo-polygalacturonase (EC 3.2.1.67); Endo-xylogalacturonan hydrolase (EC 3.2.1.-)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Gammaproteobacteria, Saccharomycetes

GH Family 31. Number of Reads: 273
KEGG ECs: α-glucosidase (EC 3.2.1.20); α-1,3-glucosidase (EC 3.2.1.84); -isomaltase (EC 3.2.1.48); Isomaltosyltransferase (EC 2.4.1.-)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Clostridia, Gammaproteobacteria, Hexapoda, Lentisphaeria, Saccharomycetes, Verrucomicrobia

GH Family 32. Number of Reads: 367
KEGG ECs: (EC 3.2.1.26); Endo-levanase (EC 3.2.1.65); Sucrose:fructan 6-fructosyltransferase (EC 2.4.1.10); Levan fructosyltransferase (EC 2.4.1.-)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Betaproteobacteria, Gammaproteobacteria, Saccharomycetes

GH Family 35. Number of Reads: 65
KEGG ECs: β-galactosidase (EC 3.2.1.23); Exo-β-1,4-galactanase (EC 3.2.1.-)
Class Level Assignments: Actinobacteria, Bacteroidetes, Hexapoda, Verrucomicrobia

GH Family 38. Number of Reads: 119
KEGG ECs: α-mannosidase (EC 3.2.1.24); Mannosyl-oligosaccharide-α-1,3-mannosidase (EC 3.2.1.-)
Class Level Assignments: Acidobacteria, Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Clostridia, Lentisphaeria, Hexapoda, Saccharomycetes, Thermobaculum

GH Family 39. Number of Reads: 39
KEGG ECs: β-xylosidase (EC 3.2.1.37)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Lentisphaeria, Hexapoda, Verrucomicrobia

GH Family 43. Number of Reads: 210
KEGG ECs: β-xylosidase (EC 3.2.1.37); β-1,3-xylosidase (EC 3.2.1.-); α-L-arabinofuranosidase (EC 3.2.1.55); Arabinanase (EC 3.2.1.99); Endo-1,4-β-xylanase (EC 3.2.1.8)
Class Level Assignments: Actinobacteria, Bacilli, Bacteroidetes, Clostridia, Gammaproteobacteria

GH Family 45. Number of Reads: 1
KEGG ECs: Endo-β-1,4-glucanase (EC 3.2.1.4)
Class Level Assignments: N/A*

GH Family 46. Number of Reads: 1
KEGG ECs: No EC evidence
Class Level Assignments: Bacilli

GH Family 47. Number of Reads: 20
KEGG ECs: α-mannosidase (EC 3.2.1.113)
Class Level Assignments: Hexapoda, Saccharomycetes

GH Family 51. Number of Reads: 84
KEGG ECs: Dextranase (EC 3.2.1.11)
Class Level Assignments: Actinobacteria

GH Family 53. Number of Reads: 37
KEGG ECs: Endo-β-1,4-galactanase (EC 3.2.1.89)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacteroidetes, candidate division TM7, Gammaproteobacteria

GH Family 57. Number of Reads: 16
KEGG ECs: α-amylase (EC 3.2.1.1); 4-α-glucanotransferase (EC 2.4.1.25); α-galactosidase (EC 3.2.1.22); amylopullulanase (EC 3.2.1.41)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacteroidetes, candidate division TM7, Gammaproteobacteria

GH Family 61. Number of Reads: 1
KEGG ECs: Copper-dependent polysaccharide monooxygenases
Class Level Assignments: N/A*

GH Family 65. Number of Reads: 189
KEGG ECs: α,α-trehalase (EC 3.2.1.28); Maltose phosphorylase (EC 2.4.1.8); Kojibiose phosphorylase (EC 2.4.1.230); Trehalose-6-phosphate phosphorylase (EC 2.4.1.-); Nigerose phosphorylase (EC 2.4.1.-)
Class Level Assignments: Actinobacteria, Bacilli, Bacteroidetes, Clostridia, Saccharomycetes

GH Family 67. Number of Reads: 10
KEGG ECs: α-glucuronidase (EC 3.2.1.139)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacteroidetes, Gammaproteobacteria

GH Family 70. Number of Reads: 180
KEGG ECs: Dextransucrase (EC 2.4.1.5); Reuteransucrase (EC 2.4.1.-); α-4,6-Glucanotransferase (EC 2.4.1.-)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes

GH Family 71. Number of Reads: 1
KEGG ECs: No EC evidence
Class Level Assignments: Actinobacteria

GH Family 76. Number of Reads: 48
KEGG ECs: α-1,6-mannanase (EC 3.2.1.101)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacteroidetes, Saccharomycetes

GH Family 81. Number of Reads: 48
KEGG ECs: Endo-β-1,3-glucanase (EC 3.2.1.39)
Class Level Assignments: Actinobacteria, Bacteroidetes, Saccharomycetes

GH Family 85. Number of Reads: 4
KEGG ECs: No EC evidence
Class Level Assignments: Actinobacteria

GH Family 88. Number of Reads: 66
KEGG ECs: D-4,5-unsaturated-β-glucuronyl hydrolase (EC 3.2.1.-)
Class Level Assignments: Actinobacteria, Alphaproteobacteria, Bacilli, Bacteroidetes, Gammaproteobacteria, Hexapoda, Verrucomicrobia

GH Family 92. Number of Reads: 161
KEGG ECs: Mannosyl-oligosaccharide-α-1,2-mannosidase (EC 3.2.1.113); Mannosyl-oligosaccharide-α-1,3-mannosidase (EC 3.2.1.-); Mannosyl-oligosaccharide-α-1,6-mannosidase (EC 3.2.1.-); α-mannosidase (EC 3.2.1.24); α-1,2-mannosidase (EC 3.2.1.-); α-1,3-mannosidase (EC 3.2.1.-); α-1,4-mannosidase (EC 3.2.1.-)
Class Level Assignments: Actinobacteria, Bacilli, Bacteroidetes, Gammaproteobacteria

VITA

Erin D. Scully

EDUCATION

Doctor of Philosophy in Genetics. The Pennsylvania State University, University Park, PA. 2009-2013. Advisors: Dr. Kelli Hoover and Dr. John Carlson

Master of Science in Biology. Indiana University of Pennsylvania, Indiana, PA. 2005-2007.

Bachelor of Science in Biology (Summa cum laude; University Honors Program Graduate). Edinboro University of Pennsylvania, Edinboro, PA. 2000-2005.

RESEARCH EXPERIENCE

Graduate Research Assistant, Department of Entomology. The Pennsylvania State University. 2009-2013.

Senior Research Technologist. Department of Entomology, The Pennsylvania State University. 2007-2009.

Graduate Research Assistant. Department of Biology, Indiana University of PA. 2005-2007.

GRANTS AND FELLOWSHIPS

USDA-AFRI Microbial Genomics Training Grant. January 2012-January 2014.

SCIENTIFIC PUBLICATIONS

Scully, ED, Hoover, K, Carlson, JE, Tien, M, and Geib, SM. Midgut Transcriptome Profiling of Anoplophora glabripennis, a Lignocellulose Degrading, Wood-Boring Cerambycid (in review, BMC Genomics).

Scully, ED, Geib, SM, Hoover, K, Tien, M, Tringe, S, Barry, K, Herr, JR, and Carlson, JE, 2013. Comparative Metagenomic Profiling Reveals Lignocellulose Degrading System in Microbial Community Associated with Wood-Feeding Beetle. PLoS One 8 (9): e73827

Chung, SH, Rosa, C, Scully, ED, Peiffer, M, Hoover, K, Luthe, DS, and Felton, GW, 2013. Herbivore Exploits Oral Bacteria to Suppress Plant Defenses. PNAS 110 (39): 15728-15733.

Scully, ED, Hoover, K, Carlson, J, Tien, M, and Geib, SM. 2012. Proteomic Analysis of Fusarium solani Isolated from the Asian Longhorned Beetle, Anoplophora glabripennis. PLoS One 7(4): e32990.

Geib, SM, Scully, ED, Jimenez-Gasco MM, Carlson, JE, Tien, M, and Hoover, K. 2012. Phylogenetic Analysis of Fusarium solani Associated with the Asian Longhorned Beetle, Anoplophora glabripennis. Insects 3: 141-160.