Genome and transcriptome analyses of the mountain pine beetle-fungal symbiont Grosmannia clavigera, a lodgepole pine pathogen

Scott DiGuistiniᵃ, Ye Wangᵃ, Nancy Y. Liaoᵇ, Greg Taylorᵇ, Philippe Tanguayᶜ, Nicolas Feauᵈ, Bernard Henrissatᵉ, Simon K. Chanᵇ, Uljana Hesse-Orceᵃ, Sepideh Massoumi Alamoutiᵃ, Clement K. M. Tsuiᶠ, Roderick T. Dockingᵇ, Anthony Levasseurᵍ, Sajeet Haridasᵃ, Gordon Robertsonᵇ, Inanc Birolᵇ, Robert A. Holtᵇ, Marco A. Marraᵇ, Richard C. Hamelinᶜ, Martin Hirstᵇ, Steven J. M. Jonesᵇ, Jörg Bohlmannᶠ,ʰ,¹, and Colette Breuilᵃ,¹

ᵃDepartment of Wood Science, ᶠDepartment of Forest Science, University of British Columbia, Vancouver, BC, Canada V6T 1Z4; ᵇBritish Columbia Cancer Agency Genome Sciences Centre, Vancouver, BC, Canada V5Z 4E6; ᶜNatural Resources Canada, Ste-Foy, QC, Canada G1V 4C7; ᵈUnité Mixte de Recherche 1202, Institut National de la Recherche Agronomique-Université Bordeaux I, Biodiversité, Gènes et Communautés, Institut National de la Recherche Agronomique Bordeaux-Aquitaine, 33612 Cestas Cedex, France; ᵉArchitecture et Fonction des Macromolécules Biologiques, Unité Mixte de Recherche-6098, Centre National de la Recherche Scientifique, Universités Aix-Marseille I & II, 13288 Marseille cedex 9, France; ᵍBiotechnologie des Champignons Filamenteux, Unité Mixte de Recherche-1161, Institut National de la Recherche, Universités de Provence et de la Méditerranée, 13288 Marseille cedex 09, France; and ʰMichael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada V6T 1Z3

Edited by Rodney B. Croteau, Washington State University, Pullman, WA, and approved December 27, 2010 (received for review August 2, 2010)

In western North America, the current outbreak of the mountain pine beetle (MPB) and its microbial associates has destroyed wide areas of lodgepole pine forest, including more than 16 million hectares in British Columbia. Grosmannia clavigera (Gc), a critical component of the outbreak, is a symbiont of the MPB and a pathogen of pine trees. To better understand the interactions between Gc, MPB, and lodgepole pine hosts, we sequenced the ∼30-Mb Gc genome and assembled it into 18 supercontigs. We predict 8,314 protein-coding genes, and support the gene models with proteome, expressed sequence tag, and RNA-seq data. We establish that Gc is heterothallic, and report evidence for repeat-induced point mutation. We report insights, from genome and transcriptome analyses, into how Gc tolerates conifer-defense chemicals, including oleoresin terpenoids, as it colonizes a host tree. RNA-seq data indicate that terpenoids induce a substantial antimicrobial stress in Gc, and suggest that the fungus may detoxify these chemicals by using them as a carbon source. Terpenoid treatment strongly activated a ∼100-kb region of the Gc genome that contains a set of genes that may be important for detoxification of these host-defense chemicals. This work is a major step toward understanding the biological interactions within the tripartite MPB/fungus/forest system.

next generation sequencing | monoterpene | carbohydrate active enzymes | ABC transporter | forest genomics

Author contributions: S.D., S.J.M.J., J.B., and C.B. designed research; S.D., Y.W., N.Y.L., G.T., P.T., S.K.C., U.H.-O., S.M.A., and R.T.D. performed research; R.A.H., M.A.M., M.H., R.C.H., and S.J.M.J. contributed new reagents/analytic tools; S.D., A.L., B.H., N.F., U.H.-O., C.K.M.T., S.H., and I.B. analyzed data; and S.D., G.R., S.J.M.J., J.B., and C.B. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

Data deposition: The sequences reported in this paper have been deposited in NCBI GenBank as assembly and annotations accession ACXQ00000000.

¹To whom correspondence may be addressed. E-mail: [email protected] or [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1011289108/-/DCSupplemental.

2504–2509 | PNAS | February 8, 2011 | vol. 108 | no. 6 | www.pnas.org/cgi/doi/10.1073/pnas.1011289108

Bark beetles and their fungal associates have inhabited conifer hosts since the Mesozoic era (1), and are the most economically and ecologically significant forest pests in the northern hemisphere. The current outbreak of the mountain pine beetle (MPB, Dendroctonus ponderosae) in western North America is the largest since the early 1900s. This beetle has killed an estimated 630 million cubic meters (∼16.3 M hectares) of lodgepole pine (Pinus contorta subsp. latifolia Engelm.) forest in British Columbia (www.for.gov.bc.ca/hfp/mountain_pine_beetle/). The MPB epidemic has bypassed the natural geographic barrier of the Rocky Mountains and has the potential to spread eastward into the vast Canadian boreal pine forest. Climate change is thought to be a contributing factor to the current MPB epidemic, and the devastation of large areas of pine forest is anticipated to have major consequences that include disturbing the global balance of atmospheric carbon emission and sequestration (2).

Among the MPB-associated microbiota (3), the ascomycete Grosmannia clavigera (Gc) is a critical component of this large-scale epidemic (Fig. 1). This pathogenic fungus can kill lodgepole pine without the beetle when inoculated at a high density; however, the mechanisms by which the fungus kills trees are not fully characterized (4). The association between bark beetles and vectored fungi is symbiotic. The fungi benefit because beetles carry them through the tree bark into a new host’s nutrient-rich tissues. The benefits to the beetle and its progeny are less clear, but the fungi may make nutrients available and may detoxify host-defense metabolites (5–7). Although both fungi and bark beetles must overcome physical and chemical host defenses to become established in conifers, their relative contributions to this process are poorly defined. Toxic phenolics and oleoresin terpenoids are key chemical defense components in conifers (8, 9). In lodgepole pine, phenolics are stored in specialized polyphenolic parenchyma cells in the inner bark (phloem), and oleoresin monoterpenoids and diterpene resin acids are formed and accumulate in resin ducts of the phloem and sapwood. When Gc is manually inoculated below the bark of seedlings or mature trees, as a single fungal inoculum point, it induces the formation of a phloem lesion (i.e., a dark necrotic zone of tissue) that contains high concentrations of tree oleoresins and phenolics, suggesting that the host prevents further fungal colonization. At higher inoculation densities, with inocula in multiple locations, the fungus will also invade the sapwood adjacent to the lesions and block water transport to the crown of the tree (10).

Gc is specifically associated with the MPB, which colonizes only pine species, suggesting that both the vector and its fungal associates may have evolved specific metabolic pathways for overcoming pine defenses. Although the virulence of Gc varies between isolates (11), little systematic characterization has been performed on the genetic variation in Gc populations and on the relation of such variation to the differences in virulence between isolates. Identifying biochemical mechanisms by which Gc overcomes conifer defenses is a key part of understanding interactions between this fungal pathogen and its host pine. To address this knowledge gap, we first generated a draft genome sequence for Gc, primarily using next-generation sequencing data (12). Here, we report the finished 29.8-Mb Gc genome sequence, 8,314 annotated protein coding sequences, initial annotation of protein coding-sequence polymorphisms, proteins secreted in response to growth on wood, changes in the fungal transcriptome induced by exposure to lodgepole pine phloem extract (LPPE) or oleoresin terpenoids, and genes and pathways involved in the modification, transport, and metabolism of conifer defense components. These resources and results provide a solid foundation to clarify the interaction of Gc with host-tree defenses.

Fig. 1. Life cycle and infection process of MPB and associated microorganisms. (A) MPBs disperse during early summer; both sexes of MPB carry blue stain fungi. Beetles bore through bark, make their galleries in the phloem, and deposit eggs along the gallery walls. During this process they introduce Gc and other associated microorganisms into the host tree. Fungi, yeast, and bacteria begin colonizing host tree tissues. Wood-staining Grosmannia and Ophiostoma fungi penetrate the xylem. The larvae feed on phloem creating galleries at right angles to the main galleries, completing their development after the fourth instar. Larvae pupate within the excavated chambers, and pupae transform into adults during early summer. During feeding, the larvae and beetles accumulate microorganisms in their guts, on their exoskeletons and in specialized maxillary structures known as mycangia. This process ensures that symbiotic microorganisms are transmitted to the next host. Gc colonizes the sapwood rapidly and produces a blue/black melanin pigment. Fungal growth blocks water and nutrient flow in the sapwood and phloem contributing to tree mortality. (B) Representative phenotypes of Gc. Light micrographs of asexual stage characterized with mononematous (i) and synnematous (ii) conidiophores reproducing conidia. (iii) Light micrograph of sexual structure characterized by a spherical ascocarp oozing ascospores. (iv) Stereomicrograph of conidiophores that grow inside the MPB gallery and ascocarps (Inset) on the inner bark of lodgepole pine. (C) Phylogenetic tree showing the positioning of Gc within the .

Results

Genome Sequence and Protein Coding Annotations in G. clavigera. Building on the previously published draft Gc genome sequence (12), we manually finished the genome assembly of Gc [kw1407; National Center for Biotechnology Information (NCBI), Genome PID: 39837] yielding 18 supercontigs with a total length of 29.8 Mb (Table 1 and SI Appendix). Telomeric sequences suggested that the supercontigs belonged to seven chromosomes. We achieved 64× sequence coverage across 90% of the finished genome sequence (SI Appendix, Figs. S1 and S2). We validated the assembly by aligning to it 99.4% of 7,169 unique expressed sequence tag (EST) sequences (method described in ref. 12). We assembled the mitochondrial genome into a single ∼90-kb circularized sequence (SI Appendix, Fig. S3).

Before predicting gene models, we masked the assembled genome sequence for repetitive elements identified using similarity to repeat databases (repbase v.20090120) and de novo repeat detection using RepeatScout (13). In total, 10.4% of the finished genome was found to be composed of repeats or low complexity sequences. Evidence for repeat-induced point mutation (RIP) was identified using RIPCAL (14), and was found almost exclusively within transposable elements. After excluding mitochondrial DNA, we predicted 8,314 protein-coding gene models, accounting for 46% of the total genome length (SI Appendix, Dataset S1). The predicted gene models were supported and validated with EST, RNA-seq, and peptide sequences (Table 1). We annotated the translated set of sequences using public sequence databases and assigned initial functional descriptions for ∼75% of the total predicted protein collection (SI Appendix, Dataset S1).

Table 1. Genome characteristics of G. clavigera

Characteristic                     Value
Genome size                        29.8 Mb
Estimated chromosome number        7
Supercontig number                 18
Contig N50                         1.2 Mb
GC% genome                         53.4%
GC% transcript                     60.5%
Protein coding gene number         8,314
  RNA-seq evidence                 7,640
  EST evidence                     5,500
  Peptide evidence                 214
Gene density                       1/3,517 bp
Mean transcript length             1,641 bp
Mean intergenic distance           1,466 bp
Percent multi-exonic genes         77.2%
Mean number of introns/gene        1.86
Median intron length               70 bp

RNA-seq Validation of Gene Models and Identification of Protein Coding Sequence Variations. To identify SNPs in the protein-coding regions of the Gc genome and to provide additional gene model support, we assessed the genome using RNA-seq read data from a collection of seven additional Gc strains (SI Appendix, Table S1) (42 different culture-treatment combinations). For this purpose we generated cDNA from polyA+ purified total RNA and sequenced it using a paired-end read approach on the Illumina Genome Analyzer platform. We predicted 17,236 SNPs from the tag-to-genome alignments (FPR = 0.0045, FNR = 0.16) (SI Appendix, Dataset S2), of which 12,160 occurred within ∼14.5 Mb of protein-coding gene-model sequence covering 92% of the total transcriptome length (∼15.77 Mb). Only a small number of variants were located in predicted intron regions (741; 6%). These 741 SNPs could have resulted from incompletely spliced transcripts, alternatively spliced transcripts, or inappropriately predicted intron-exon boundaries. We found an SNP density of one variant per 1,189 bp across the predicted genes and an average minor allele frequency of 25.1%. Transitions were favored over transversions by a ratio of 3:1, and amino acid sequence variations were predicted for 5,689 of the predicted SNPs.

Identification of G. clavigera Gene Orthologs. To further validate the Gc gene models and to assess gene family variations in Gc, relative to other fungi, we used orthoMCL (15) to identify gene orthologs. We clustered ∼186.4 K predicted protein sequences from 17 fungal taxa and identified 6,780 ortho-groups (groups of putative orthologous genes) that contained at least one member from Gc. Of these, 1,940 contained a representative from all taxa and 692 possessed a strict single-copy orthologous relationship (i.e., clusters contained exactly one member per species). Phylogenetic analysis was performed with these 692 genes to confirm the phylogenetic position of Gc within the class Sordariomycetes (Fig. 1C and SI Appendix). We identified the mating-type (MAT) gene, suggesting that the sequenced strain belongs to the MAT-1-2 idiomorph. The high-mobility group domain of the Gc MAT protein was similar to those in MAT loci of other filamentous ascomycetes. We detected the MAT-1-1 idiomorph α-domain in other Gc isolates, but not in the sequenced strain. This result indicates that Gc is heterothallic.

Using CAFE (16), we identified Gc gene family expansions for methyltransferases, major facilitator superfamily transporters, and serine-peptidases, whereas gene family contractions occurred for Na+/Ca2+-transporting ATPases, glycoside hydrolases (GHs), zinc-type alcohol dehydrogenases, and cytochrome P450s (CYP450s). The largest Gc gene family expansion was for O-methyltransferases, for which we identified 199 methyltransferase-like sequences (PFAM: PF08241-2). Using a phylogenetic analysis including a subset from the other fungal taxa, we observed a clade containing seven Gc O-methyltransferase sequences that showed significant support for branch-specific differences in synonymous vs. nonsynonymous substitution rates using a likelihood ratio test (P < 0.001), indicating that these methyltransferases may be under positive selection (SI Appendix).

Identification and Annotation of Genes and Proteins for Inhabiting Host Pine. Like other sap-staining fungi that colonize conifers, Gc is unable to degrade the structural components of wood, such as lignin and cellulose (17). To identify genes that may be used by Gc to grow in the host sapwood, we isolated proteins secreted by Gc during mycelial growth on a simplified substrate, pine sawdust-supplemented agar medium. Peptide sequencing supported 214 of the Gc gene models described above (SI Appendix, Dataset S3), for which we identified enriched Gene Ontology (GO) terms. Ninety percent of these annotated genes (162 genes) belonged to metabolic processes, with the greatest enrichment occurring within “carbohydrate metabolism” (GO:0005975) and “proteolysis” (GO:0006508). We used SignalP (18) to show that the deduced protein sequences were enriched in signal peptides for secretion; we predicted such peptides in 106 (50%) of the 214 genes but in only 538 (7%) of the genome-wide set of 8,314 gene models. The predicted secretome is small relative to secretomes predicted for phylogenetically similar species (SI Appendix). We noted that this reduction occurred for all protein lengths but may be biased toward smaller protein lengths. Only a small number of secreted protein families were expanded in Gc (SI Appendix, Table S2).

We identified 231 carbohydrate-active enzymes in the Gc genome, using the CAZy classification system (19) (SI Appendix, Dataset S4). This number is smaller than previously reported for Neurospora crassa (277) or Magnaporthe grisea (378). The Gc genome contained 139 GHs, 17 of which were detected by peptide sequencing (described above and in SI Appendix, Table S3). These GHs included enzymes that may be involved in maintaining cell wall plasticity during growth and morphogenesis and in acquiring carbohydrates. We found that Gc secretes relatively more plant cell wall degrading enzymes that could be involved in pectin degradation of the cell wall or the tracheid bordered pit membranes, allowing this fungus to colonize the sapwood (20). However, GHs involved in degrading host lignocellulose structures were notably absent from both the proteome and genome data collections (e.g., GH6 cellulase). Gc has only two carbohydrate-binding modules assigned to family 1: one was attached to a GH12 plant cell wall digesting enzyme and the other was attached to a chitinase. Carbohydrate esterases were also sparse, in particular families 5 and 1. Whereas M. grisea and N. crassa, respectively, have 10 and 7 carbohydrate esterases from family 1, and 15 and 3 carbohydrate esterases from family 5, Gc has only one of each. Similarly, Gc appears to have only a single type B feruloyl esterase, which has no secretion signal peptide and is likely not secreted. Without a secreted feruloyl esterase, Gc cannot hydrolyze the diferulate cross-links in plant cell walls. Peptide sequencing and signal peptide analysis indicated that the carbohydrate esterase 5 and carbohydrate esterase 8 enzymes were secreted during growth on the sawdust-agar medium.

We used the MEROPS database (merops.sanger.ac.uk) to identify 287 putative peptidases in Gc. Twenty of these peptidases, belonging to the A1, S8, S28, and S53 families, were also identified in the peptide-sequencing data. The top five ranked by peptide-spectra abundance are reported in SI Appendix, Table S3. We identified a lineage-specific gene expansion within the peptidase family S53 (10 genes). S53 enzymes were among the most abundant peptidases secreted during growth on the sawdust-agar medium. In addition, we identified an extracellular lipase that may be involved in using pine triglycerides, a major carbon source for this fungus. Triglycerides account for ∼2% to 2.5% dry weight of lodgepole pine stems (21).

Identification and Annotation of Genes for Detoxifying Host Defense Metabolites. In a pine host, Gc grows in an environment with high concentrations of terpenoid and phenolic defense metabolites. Growth of Gc on malt extract agar was reduced in the presence of LPPE. In the presence of terpenoids, Gc grew with an initial lag phase (24 h) followed by growth at nearly the same rate as untreated controls (SI Appendix, Fig. S4). In contrast, N. crassa growth was reduced when challenged with the LPPE treatment and completely inhibited by the terpene treatment (SI Appendix, Fig. S5). These results highlight Gc’s tolerance for terpenoids and possibly other conifer defense compounds.

To identify genes associated with mechanisms used by Gc to overcome host chemical defenses, we used Illumina expression profiling (RNA-seq) on mycelia samples collected at 12 and 36 h after LPPE or terpenoid treatments (SI Appendix, Dataset S1). In total, 4,690 gene models showed at least twofold increase in transcript abundance in at least one of the treatments and time points sampled relative to matched untreated controls (SI Appendix, Fig. S6). P values for differential expression were highly significant for hundreds of genes (SI Appendix, Fig. S7). We plotted expression levels for genes induced by the LPPE and terpenoid treatments in 50-kb windows and noted regions with high transcriptional activity [coexpression clusters (ECs)] (see below, SI Appendix, Fig. S8, and http://bfgweb.bcgsc.ca/homepage.html).

Response of G. clavigera to LPPE. GOMiner (22) analysis of the 12-h LPPE gene-expression data identified enrichment of transcripts for several biological processes including carbohydrate metabolism (P < 0.001), alcohol metabolism (P < 0.001), glycolysis (P < 0.001), external encapsulating structure organization (P < 0.001), cellular protein metabolic processes (P < 0.001), and cellular aromatic compound metabolic processes (P < 0.001). GHs can also contribute to fungal detoxification of sugar-conjugated antimicrobial compounds, such as phenolic glycosides and saponins (23, 24). Inspection of the GH gene expression activity 12 h following LPPE treatment indicated up-regulation of GHs targeting the plant cell wall (families: GH51, GH78, GH61, GH53, GH43) and up-regulation of an α-trehalase (GH37). Genes encoding proteins from families GH3, GH5, and GH39 were also induced. Although the substrate specificity of these GHs is unknown, the GH3 and GH39 proteins are likely intracellular, as they do not possess extracellular signal peptide sequences, whereas the GH5 has a secretion signal and no GPI anchor.

Gene-expression data suggest that LPPE treatment induced an oxidative stress response in Gc with induction of Mn/Fe- and Cu/Zn-superoxide dismutases, peroxidases, and a thioredoxin and thioredoxin reductase. Up-regulation of the eight subunits of the T-complex polypeptide involved in actin and tubulin folding, as well as the induction of the actin and tubulin genes themselves, may suggest an LPPE-induced reorganization of the Gc cytoskeleton. In addition, up-regulation of 19 Gc genes encoding proteasome and proteasome regulatory subunits may indicate induced protein turnover in response to LPPE treatment. After 36 h, many of the genes that were induced by LPPE treatment at 12 h were no longer induced. Among the genes up-regulated at 36 h, we found no evidence that transcripts for particular biological processes were enriched. However, at 36 h after exposure to LPPE, many of the highly expressed (P < 0.01) genes belonged to gene families with known roles in detoxification (e.g., oxidoreductases and CYP450s) (SI Appendix, Table S4). We also observed a large number of significantly induced transcription factors (SI Appendix, Table S4). We anticipated finding genes involved in β-ketoadipate metabolism because this pathway is commonly used for the aerobic detoxification of aromatic compounds in microorganisms. However, neither the Gc ortholog to N. crassa 3-carboxy-cis,cis-muconate cyclase (25) or to the Aspergillus nidulans phenylacetate catabolic gene cluster (26), nor genes identified in the TCA cycle, were strongly induced in response to the LPPE treatment. As in A. nidulans, we noted that the genes of the phenylacetate catabolic pathway were clustered, although the cluster is expanded in Gc and included the phenylacetate 2-hydroxylase (GCSC_179: 1.126–1.135 Mb).

We investigated the genome regions surrounding the putative detoxification genes within the ECs (SI Appendix, Fig. S8). Our current expression data validated the two ECs identified by digital profiling of ESTs following LPPE treatment (27). The additional gene annotation and expression data reported here allowed us to extend one of these clusters consisting of six loci by four loci to 10 gene models (GCSC_140; 1.13–1.15 Mb; Cluster I) (SI Appendix, Fig. S9 and Table S5). The region with the highest average expression levels over a 50-kb window in the LPPE-treated data (GCSC_173; 1.84–1.90 Mb; Cluster II) (SI Appendix, Table S4) contained 12 genes, all of which responded strongly to the LPPE treatment.

Response of G. clavigera to Terpenoid Treatment. In the 12-h mycelial cultures treated with terpenoids we observed two overlapping GO clusters within the biological process hierarchy. The first cluster included genes annotated as mRNA processing (P < 0.001) and ribosome biogenesis (P < 0.001), and the second cluster included genes annotated as amino acid biosynthesis (P < 0.001). We observed the induction of genes encoding DNA repair, recombination, stability, and replication proteins, such as helix-destabilizing proteins, topoisomerases, ss-DNA binding protein, DNA repair nucleases, mismatch repair proteins, DNA ligases, a DNA glycosylase, and DNA polymerases. In addition, we observed the induction of genes encoding histones H2A, H2B, and H4, but alternate variants for H2A and H4 and histones H3 and H1 were strongly repressed. Strong induction of a putative H4 arginine methyltransferase, H3-K79 methyltransferase, ubiquitin conjugase, and SIR2-like deacetylase may implicate chromatin remodeling in the process of changing gene expression.

A marked change in gene expression was apparent at the 36-h time point following treatment with terpenoids compared with the 12-h time point. We observed among the differentially expressed genes two overlapping clusters of GO-terms within the biological process hierarchy. The clusters encompass lipid metabolic processes (P < 0.001) and alcohol metabolism (P < 0.001). The molecular function hierarchy contained several small clusters falling primarily within catalytic activity (P < 0.001) and microtubule based processes (P < 0.001). Within these classifications, noteworthy members were oxidoreductase activity (P < 0.001), aldehyde dehydrogenase activity (P < 0.001), and electron carrier activity (P < 0.001). Encompassed within the microtubule-based processes were “cytoskeleton organization and biogenesis” (P < 0.001), cytoskeleton based intracellular transport (P < 0.001), cellular localization (P < 0.001), and transport (P < 0.001). Our initial Kyoto Encyclopedia of Genes and Genomes annotations supported the GO analysis, indicating induction of the fatty acid and glyoxylate pathways. We examined the β-oxidation capacity of Gc and found that the FOX2 multifunctional β-oxidation enzyme (GLEAN_6203) was induced; however, the mitochondrial short-chain enoyl CoA hydratase (GLEAN_647) was more strongly induced. As well, at both 12 and 36 h we observed strong induction for carnitine acyl transferase and for carnitine acetyl transferase, indicating that β-oxidation in the mitochondria may be favored over the peroxisome. We were not able to identify a peroxisomal acyl-CoA oxidase and we observed no increase in expression for peroxisomal catalases, indicating that this fungus likely uses a non-H2O2-forming pathway for peroxisomal β-oxidation, which is consistent with its phylogenetic position (28).

We identified a 100-kb EC on supercontig GCSC_108 (0.9–1.1 Mb) (Fig. 2). Within an ∼85-kb core section of this genome region, 35 gene models were predicted, 18 were induced in response to the terpene treatment, 4 were repressed, and 12 were unchanged (SI Appendix, Table S6). The most strongly induced genes in this region were a flavoprotein monooxygenase (FMO), an FMO-like monooxygenase containing a lipocalin signature, and a short-chain dehydrogenase/reductase enzyme. In addition to these oxidoreductases were enzymes such as an epoxide hydrolase, alcohol dehydrogenase, and aldehyde dehydrogenase, which may be important for activating terpenoids or their intermediates for β-oxidation. Given the induction of genes in the β-oxidation pathway and genes clustered within the genome that may be involved in activating terpenoids for β-oxidation, we tested the ability of Gc to grow on the terpenoid blend as a sole carbon source. Consistent with the results of gene-expression profiling, Gc was able to grow on the terpenoid blend as a sole carbon source (SI Appendix, Fig. S10). Finally, we have begun exploring the contribution of the most strongly induced pleiotropic drug resistance transporter, GLEAN_8030, to Gc’s terpenoid tolerance. Deleting this gene using our recently developed split-marker Agrobacterium-mediated transformation system (29) prevented mycelial growth of the fungus on the terpene-supplemented media (SI Appendix, Fig. S11).
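The windowed expression scan described in the Results (expression levels plotted in 50-kb windows to locate coexpression clusters) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes fixed, non-overlapping windows and a single per-gene value such as a log2 fold change, and the function names, supercontig labels, and toy values are all hypothetical.

```python
from collections import defaultdict


def windowed_mean(genes, window_bp=50_000):
    """Average a per-gene value within fixed, non-overlapping windows.

    genes: iterable of (supercontig, gene_midpoint_bp, value) tuples.
    Returns {(supercontig, window_start_bp): mean value in that window}.
    Assumption: windows are non-overlapping; the paper does not say
    whether fixed or sliding windows were used.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for contig, midpoint, value in genes:
        start = (midpoint // window_bp) * window_bp  # left edge of the window
        sums[(contig, start)] += value
        counts[(contig, start)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def top_window(genes, window_bp=50_000):
    """Return ((supercontig, window_start_bp), mean) for the highest window."""
    means = windowed_mean(genes, window_bp)
    return max(means.items(), key=lambda item: item[1])


# Hypothetical toy data: (supercontig, gene midpoint in bp, log2 fold change).
toy_genes = [
    ("sc_A", 10_000, 0.5),
    ("sc_A", 40_000, 1.5),
    ("sc_A", 60_000, 4.0),
    ("sc_A", 90_000, 6.0),
    ("sc_B", 5_000, 2.0),
]

if __name__ == "__main__":
    print(top_window(toy_genes))
```

On real data, the highest-scoring windows would be candidates for closer inspection of the kind reported for GCSC_108, GCSC_140, and GCSC_173.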

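The signal-peptide enrichment reported in the Results (106 of the 214 peptide-supported genes versus 538 of all 8,314 gene models) can be checked with a short calculation. The sketch below computes the fold enrichment and, as an assumption, an exact one-sided hypergeometric tail probability; the paper does not state which statistical test, if any, was applied to these counts, and the function names are hypothetical.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Ratio of the hit fraction in the subset (k/n) to that genome-wide (K/N)."""
    return (k / n) / (K / N)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the feature (here, a predicted signal peptide)."""
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    return favourable / total  # exact integer arithmetic until the final division


if __name__ == "__main__":
    # Counts taken from the text: 106 of 214 secreted-protein genes and
    # 538 of 8,314 gene models have a predicted signal peptide.
    print(fold_enrichment(106, 214, 538, 8314))
    print(hypergeom_upper_tail(106, 214, 538, 8314))
```

The exact-integer formulation avoids floating-point underflow in the intermediate binomial terms, which matters here because the tail probability is extremely small.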
DiGuistini et al. PNAS | February 8, 2011 | vol. 108 | no. 6 | 2507

Fig. 2. Gene expression cluster induced following terpenoid treatment. RNA-seq profiling reveals a cluster of coexpressed genes on supercontig GCSC_108. For complete details and the results of genome-wide mapping data, see SI Appendix, Fig. S6. From Top to Bottom: transposons detected using de novo and reference based methods represented by black bars along supercontig. Expression analysis results derived from comparison of control vs. treatment for the 36-h terpenoid-treated samples averaged in 50-kb windows across GCSC_108. (Enlargement) Log-transformed coverage data for the peak expression region indicates agreement between predicted gene models and RNA-seq data.

Discussion

We developed fundamental genomic and molecular resources for functional characterization of a bark beetle-symbiotic fungus and tree pathogen. Sequencing and assembly of the Gc genome involved next generation sequencing data and traditional finishing strategies. This approach resulted in a high-quality genome sequence. The genome size, repeat content, and gene collection are similar to other saprophytic and pathogenic fungi in the class Sordariomycetes. We identified evidence for the fungus-specific genome defense mechanism RIP. In N. crassa RIP occurs before or during meiosis, causing C•G to T•A mutations within duplicated sequences (30). This result indicates that Gc has sexual potential despite the Gc sexual cycle being rarely reported in field data and not yet achieved under laboratory conditions.

It is unknown if Gc relies on a limited number of modifying enzymes with broad substrate specificities, or a large number of enzymes with narrow substrate specificities for the detoxification of host defense metabolites, such as terpenoids and phenolics. The expansion of the Gc O-methyltransferase gene family may represent an opportunity for future work to test the range of substrate specificities of these enzymes and their possible role in detoxification of host metabolites. Some O-methyltransferases have known functions in plant phenolic metabolism (31) and in fungal and animal phenolic detoxification (32, 33). Noticeably, Gc had only 54 CYP450s and no obvious expansion of any CYP450 gene subfamilies, which is surprising given the potential of CYP450s to contribute to the transformation of host defense chemicals. Transcriptome sequencing identified a number of Gc CYP450s inducible by terpenoid or LPPE that warrant further investigation. These CYP450s will be explored in future work across a larger collection of different Gc strains.

Following a MPB attack or Gc inoculation, the concentration of host-defense chemicals, in particular terpenoids, increases (34). That Gc can overcome terpenoid defenses when N. crassa cannot suggests that this capability could be a critical pathogenicity factor in the MPB-Gc symbiosis. Responses to the LPPE and terpenoid treatments were substantially different; Gc growth was reduced by LPPE and delayed by terpenoids, and only 41 genes were induced by both treatments at 12 h. This set of 41 genes was enriched in general and chemical stress responders, including a putative DNA glycosylase and cytidine deaminase, suggesting that changes in DNA methylation or RNA/DNA editing may be important in early chemical stress responses.

The LPPE extract is a complex mixture of methanol-water soluble compounds (SI Appendix, Fig. S12), which contains defensive phenolic chemicals, sugars, and possibly other metabolites. Importantly, the LPPE captures a complexity of compounds encountered by the fungal propagules when deposited into the tree phloem, and this experimental treatment thus complements the more defined terpenoid treatment. When Gc was treated with LPPE the mycelia became pink, which may indicate that oxidized phenolic derivatives, such as quinones and free radicals, were generated. Consistent with this observation, genes involved in sugar utilization and response to oxidative stresses were strongly induced. The activation of genes and gene clusters that may be involved in the detoxification or degradation of host antimicrobial compounds suggests that Gc needs to detoxify its environment. The induction of transcription factors may reflect the regulatory coordination required to process the diverse collection of antimicrobial compounds in this phloem extract. In fungi, catabolic enzymes that degrade aromatic compounds can be encoded in gene clusters (26), and although we did not find substantial induction of genes known to be involved in aromatic degradation in Gc, LPPE treatment induced gene clusters. This finding suggests that Gc may use unique metabolic pathways to detoxify the host-specific pine-defense chemicals.

When Gc was treated with terpenoids we observed a lag phase in growth and indications that terpenoid treatment induced transcriptome reprogramming mediated by chromatin remodeling. We observed a gene cluster that spanned ∼100 kb and that responded strongly to terpenoid treatment; functional annotation indicates that this cluster may contribute to early enzymatic steps in terpenoid metabolism. Coexpression clusters that span large genomic regions have been reported for higher eukaryotes (35), but not for fungi. Clusters may reflect selection for coordinated gene expression and reliable gene transmission (35, 36). For Gc, the origin and maintenance of such a cluster may reflect detoxification pathway optimization that minimizes accumulation of toxic metabolite intermediates. In support of this, in the prokaryote Burkholderia xenovorans nearly half of the 93 genes induced by the diterpene dehydroabietic acid occur in an ∼80-kb region, and the genes in this region participate directly in dehydroabietic acid metabolism (37). At 36 h following terpenoid treatment, Gc genes involved in fatty acid metabolism were induced. This finding may indicate that subsequent steps involved in terpenoid detoxification occur through the β-oxidation pathway; consistent with this, we confirmed that Gc is able to use terpenoids as a sole carbon source. Fungi that are able to use hydrophobic substrates, such as long-chain alkanes via the β-oxidation pathway, have been described (38), and fungi cultured in the presence of small amounts of terpenoids often generate nonspecific oxidized derivatives; however, the ability to grow on alkanes, such as antimicrobial monoterpenoids, as a sole carbon source is unusual. The gene-expression data described here provide an opportunity for exploring the genes and mechanisms involved in this process.

By applying the genomic and molecular resources developed in this work we have begun to clarify the specialized mechanisms that Gc has developed, which allow it to tolerate terpenoids and grow in its pine host, an evolutionary adaptation that is an important factor in the interaction between host tree, the fungal pathogen, and its beetle vector.
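The Fig. 2 caption describes expression results averaged in 50-kb windows across supercontig GCSC_108. The following is a minimal sketch of one way such windowed averaging could be done; it is not the authors' script. It assumes fixed, non-overlapping windows and per-gene values given as (position, value) pairs. The function name `windowed_mean` and the toy data are hypothetical.

```python
from collections import defaultdict

WINDOW = 50_000  # 50-kb window size, taken from the Fig. 2 caption


def windowed_mean(values, window=WINDOW):
    """Average (position_bp, value) pairs in fixed, non-overlapping windows.

    Assumption: each value is something like a log-ratio of treatment vs.
    control expression for a gene located at position_bp on one supercontig.
    Returns {window_start_bp: mean_value}; windows without data are omitted.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, val in values:
        start = (pos // window) * window  # left edge of the window containing pos
        sums[start] += val
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Made-up illustrative values, not data from the study
genes = [(10_000, 0.5), (40_000, 1.5), (60_000, 4.0), (120_000, -1.0)]
profile = windowed_mean(genes)
```

A run of adjacent windows with elevated means in such a profile would correspond to the kind of coexpressed region shown in the figure; whether the original analysis used overlapping windows or a different summary statistic is not stated here.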

2508 | www.pnas.org/cgi/doi/10.1073/pnas.1011289108 DiGuistini et al.

Materials and Methods

Detailed materials and methods, including references, are described in the SI Appendix.

Strains. Gc strain kw1407 (NCBI ID: 655863) is deposited in the University of Alberta Mycological Herbarium as 11150, along with the additional isolates used in this study (11151–11156).

Genome Sequence Finishing, ESTs, and Genome Annotations. Genome sequence finishing was performed on the draft assembly described previously, with additional data: one lane of Illumina Genome Analyzer (GAii) from a 3-kb long insert library and 1,299 finishing reactions performed for filling gaps (SI Appendix). Telomeric repeats were identified using the sequence TTAGGG as a search sequence. ESTs were reported earlier (27). Gene models are a composite of ab initio and homology-based predictions generated using GLEAN (SI Appendix). Putative gene function assignments were generated from searches of the NCBI NR and Swissprot databases using BLAST and combined with PFAM domain assignments. GO annotations were assigned using Blast2GO. Predicted protein localizations were determined using SignalP, TMHMM, and WolfPsort.

Peptide Sequencing. To obtain extracellular proteins for Gc, the fungus was grown on sawdust-agar plates overlaid with cellophane. After 3 d of growth, mycelia and cellophane were transferred to acetate buffer, centrifuged, and filtered (SI Appendix). The protein solution was concentrated and separated by 1D SDS/PAGE (SI Appendix). In-gel protein digests were performed for 16 bands cut from the 1D gel (SI Appendix). Peptide analysis was performed by tandem mass spectrometry (SI Appendix). Bioinformatic analysis was performed with custom scripts (SI Appendix and available upon request).

RNA-seq, Variant Detection, and Expression Analysis. RNA-seq data were generated with an Illumina GAii from poly(A+) mRNA (SI Appendix). Sequence clusters were generated on an Illumina cluster station. Lanes were sequenced to 36 cycles. Postrun analysis was performed with the Illumina GA pipeline (v.1.0). Paired-end (PE) reads were aligned to the reference genome sequence using CLCbio's Genomics workbench (http://www.clcbio.com/; CLCbio, DK); SNP prediction was also performed within this software package with additional postprediction filtering (SI Appendix). Culture conditions for mycelia generated for expression analysis, terpene, and LPPE treatment preparations are described in the SI Appendix. Treatments for transcriptome analysis were carried out using a TLC sprayer applying the treatment directly to culture surfaces with filtered nitrogen gas as the carrier.

ACKNOWLEDGMENTS. The authors thank the Functional Genomics Group of the British Columbia Cancer Agency Genome Sciences Centre for expert technical assistance, and all of the excellent undergraduate students who have worked in the C.B. laboratory on this project. This work was funded by grants from the Natural Sciences and Engineering Research Council of Canada (to J.B. and C.B.), the British Columbia Ministry of Forests (to S.J.J., J.B., and C.B.), the Canadian Forest Service Genomics program (to R.C.H.), and funds from Genome Canada, Genome British Columbia and Genome Alberta (to J.B., C.B., R.C.H., and S.J.J.) in support of the Tria project (www.thetriaproject.ca). S.J.J. and M.A.M. are Michael Smith Distinguished Scholars. Salary support for J.B. came in part from a Natural Sciences and Engineering Research Council of Canada Steacie Fellowship and the University of British Columbia Distinguished Scholars Program. This is National Center for Biotechnology Information Genome Project 39847.
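The Materials and Methods state that telomeric repeats were identified using TTAGGG as a search sequence. As an illustration only, the sketch below shows one simple way to search a sequence for tandem copies of that motif on either strand. The minimum copy number, the function name `find_telomeric_repeats`, and the toy sequence are assumptions made for this example and are not taken from the study.

```python
import re


def find_telomeric_repeats(seq, motif="TTAGGG", min_copies=3):
    """Return sorted (start, end, strand) tuples for tandem runs of the motif.

    A run must contain at least `min_copies` consecutive copies (an assumed
    threshold). The reverse complement of the motif is searched as well, since
    the repeat reads as CCCTAA at the opposite end of an assembled sequence.
    Coordinates are 0-based, end-exclusive.
    """
    complement = str.maketrans("ACGT", "TGCA")
    rc_motif = motif.translate(complement)[::-1]  # TTAGGG -> CCCTAA
    seq = seq.upper()
    hits = []
    for strand, m in (("+", motif), ("-", rc_motif)):
        pattern = re.compile(r"(?:%s){%d,}" % (re.escape(m), min_copies))
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.end(), strand))
    return sorted(hits)


# Toy sequence: CCCTAA x4 at the left end, TTAGGG x3 at the right end
toy = "CCCTAA" * 4 + "ACGTACGTAC" + "TTAGGG" * 3
hits = find_telomeric_repeats(toy)
```

In practice one would typically restrict attention to hits near the ends of supercontigs, since internal matches of a 6-bp motif can occur by chance.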

1. Seybold S, Bohlmann J, Raffa K (2000) Biosynthesis of coniferophagous bark beetle pheromones and conifer isoprenoids: Evolutionary perspective and synthesis. Can Entomol 132:697–753.
2. Kurz WA, et al. (2008) Mountain pine beetle and forest carbon feedback to climate change. Nature 452:987–990.
3. Lee S, Kim J-J, Breuil C (2006) Diversity of fungi associated with the mountain pine beetle Dendroctonus ponderosae and infested lodgepole pines in British Columbia. Fungal Divers 22:91–105.
4. Lee S, Kim J-J, Breuil C (2006) Pathogenicity of Leptographium longiclavatum associated with Dendroctonus ponderosae to Pinus contorta. Can J Res 36:2864–2872.
5. Ayres M, Wilkens R, Ruel J, Lombardero M (2000) Nitrogen budgets of phloem-feeding bark beetles with and without symbiotic fungi. Ecology 8:2198–2210.
6. Bleiker K, Six D (2007) Dietary benefits of fungal associates to an eruptive herbivore: Potential implications of multiple associates on host population dynamics. Environ Entomol 36:1384–1396.
7. Lieutier F, Yart A, Salle A (2009) Stimulation of tree defences by Ophiostomatoid fungi can explain attack success of bark beetles on conifers. Ann For Sci 66:801–823.
8. Franceschi VR, Krokene P, Christiansen E, Krekling T (2005) Anatomical and chemical defences of conifer bark against bark beetles and other pests. New Phytol 167:353–375.
9. Keeling C, Bohlmann J (2006) Genes, enzymes and chemicals of terpenoid diversity in the constitutive and induced defence of conifers against insects and pathogens. New Phytol 170:657–675.
10. Lee S, Kim JJ, Breuil C (2005) Leptographium longiclavatum sp. nov., a new species associated with the mountain pine beetle, Dendroctonus ponderosae. Mycol Res 109:1162–1170.
11. Plattner A, Kim JJ, DiGuistini S, Breuil C (2008) Variation in pathogenicity of a mountain pine beetle-associated blue-stain fungus, Grosmannia clavigera, on young lodgepole pine in British Columbia. Can J Plant Pathol 30(3):457–466.
12. DiGuistini S, et al. (2009) De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data. Genome Biol 10(9):R94.
13. Price AL, Jones NC, Pevzner PA (2005) De novo identification of repeat families in large genomes. Bioinformatics 21:i351–i358.
14. Hane JK, Oliver RP (2008) RIPCAL: A tool for alignment-based analysis of repeat-induced point mutations in fungal genomic sequences. BMC Bioinformatics 9:478.
15. Li L, Stoeckert C, Jr., Roos D (2003) OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178–2189.
16. De Bie T, Cristianini N, Demuth JP, Hahn MW (2006) CAFE: A computational tool for the study of gene family evolution. Bioinformatics 22:1269–1271.
17. Zabel RA, Morell JJ (1992) Wood stains and discolorations. Wood Microbiology: Decay and its Prevention, eds Zabel RA, Morell JJ (Academic Press, San Diego, CA), pp 326–343.
18. Emanuelsson O, Brunak S, von Heijne G, Nielsen H (2007) Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2:953–971.
19. Cantarel BL, et al. (2009) The Carbohydrate-Active EnZymes database (CAZy): An expert resource for Glycogenomics. Nucleic Acids Res 37:D233–D238.
20. Lieutier F, Berryman A (1988) Preliminary histological investigations of the defence reactions of 3 pines to Ceratocystis-clavigera and two chemical elicitors. Can J Res 18:1243–1247.
21. Gao Y, Chen T, Breuil C (1995) Identification and quantification of nonvolatile lipophilic substances in fresh sapwood and heartwood of lodgepole pine (Pinus-contorta Dougl). Holzforschung 49(1):20–28.
22. Zeeberg BR, et al. (2003) GoMiner: A resource for biological interpretation of genomic and proteomic data. Genome Biol 4(4):R28.
23. Bouarab K, Melton R, Peart J, Baulcombe D, Osbourn A (2002) A saponin-detoxifying enzyme mediates suppression of plant defences. Nature 418:889–892.
24. Zheng Z, Shetty K (2000) Solid-state bioconversion of phenolics from cranberry pomace and role of Lentinus edodes β-Glucosidase. J Agric Food Chem 48:895–900.
25. Kajander T, et al. (2002) The structure of Neurospora crassa 3-carboxy-cis,cis-muconate lactonizing enzyme, a β-propeller cycloisomerase. Structure 10:483–492.
26. Fernández-Cañón JM, Peñalva MA (1995) Fungal metabolic model for human type I hereditary tyrosinaemia. Proc Natl Acad Sci USA 92:9132–9136.
27. Hesse-Orce U, et al. (2010) Gene discovery for the bark beetle-vectored fungal tree pathogen Grosmannia clavigera. BMC Genomics 11:536.
28. Shen Y-Q, Burger G (2009) Plasticity of a key metabolic pathway in fungi. Funct Integr Genomics 9(2):145–151.
29. Wang Y, DiGuistini S, Wang T-CT, Bohlmann J, Breuil C (2010) Agrobacterium-meditated gene disruption using split-marker in Grosmannia clavigera, a mountain pine beetle associated pathogen. Curr Genet 56:297–307.
30. Galagan JE, Selker EU (2004) RIP: The evolutionary cost of genome defence. Trends Genet 20:417–423.
31. Preisig CL, Matthews DE, VanEtten HD (1989) Purification and characterization of S-adenosyl-L-methionine: 6-a-hydroxymaackiain 3-O-methyltransferase from Pisum sativum. Plant Physiol 91:559–566.
32. Männistö PT, Kaakkola S (1999) Catechol-O-methyltransferase (COMT): Biochemistry, molecular biology, pharmacology, and clinical efficacy of the new selective COMT inhibitors. Pharmacol Rev 51:593–628.
33. Feltrer R, Álvarez-Rodríguez ML, Barreiro C, Godio RP, Coque J-JR (2010) Characterization of a novel 2,4,6-trichlorophenol-inducible gene encoding chlorophenol O-methyltransferase from Trichoderma longibrachiatum responsible for the formation of chloroanisoles and detoxification of chlorophenols. Fungal Genet Biol 47:458–467.
34. Raffa K, Berryman A (1983) Physiological-aspects of lodgepole pine wound responses to a fungal symbiont of the mountain pine-beetle Dendroctonus ponderosae (Coleoptera Scolytidae). Can Entomol 115:723–734.
35. Hurst LD, Pál C, Lercher MJ (2004) The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet 5:299–310.
36. Walton JD (2000) Horizontal gene transfer and the evolution of secondary metabolite gene clusters in fungi: An hypothesis. Fungal Genet Biol 30(3):167–171.
37. Smith DJ, Park J, Tiedje JM, Mohn WW (2007) A large gene cluster in Burkholderia xenovorans encoding abietane diterpenoid catabolism. J Bacteriol 189:6195–6204.
38. Thevenieau F, et al. (2007) Characterization of Yarrowia lipolytica mutants affected in hydrophobic substrate utilization. Fungal Genet Biol 44:531–542.
