Schlegel et al. BMC Genomics (2016) 17:1015 DOI 10.1186/s12864-016-3369-8

RESEARCH ARTICLE Open Access

Globally distributed root endophyte Phialocephala subalpina links pathogenic and saprophytic lifestyles

Markus Schlegel1†, Martin Münsterkötter2†, Ulrich Güldener2,3, Rémy Bruggmann4, Angelo Duò1, Matthieu Hainaut5, Bernard Henrissat5, Christian M. K. Sieber2,6, Dirk Hoffmeister7 and Christoph R. Grünig1,8*

Abstract

Background: Whereas an increasing number of pathogenic and mutualistic ascomycetes were sequenced in the past decade, species showing a seemingly neutral association, such as root endophytes, received less attention. In the present study, the genome of Phialocephala subalpina, the most frequent species of the Phialocephala fortinii s.l. – Acephala applanata species complex, was sequenced to gain insight into the genome structure and gene inventory of these widespread root endophytes.

Results: The genome of P. subalpina was sequenced using Roche/454 GS FLX technology and a whole genome shotgun strategy. The assembly resulted in 205 scaffolds and a genome size of 69.7 Mb. The expanded genome size in P. subalpina was not due to the proliferation of transposable elements or other repeats, as is the case in other ascomycetous genomes. Instead, the P. subalpina genome revealed an expanded gene inventory that includes 20,173 gene models. Comparative genome analysis of P. subalpina with 13 ascomycetes shows that P. subalpina uses a versatile gene inventory including genes specific for pathogens and saprophytes. Moreover, the gene inventory for carbohydrate active enzymes (CAZymes) was expanded, including genes involved in the degradation of biopolymers such as pectin, hemicellulose, cellulose and lignin.

Conclusions: The analysis of a globally distributed root endophyte allowed detailed insights into the gene inventory and genome organization of a yet largely neglected group of organisms. We showed that the ubiquitous root endophyte P. subalpina has a broad gene inventory that links pathogenic and saprophytic lifestyles.

Keywords: Comparative genomics, Lifestyle, Root endophyte, Species complex, Parasitism-mutualism continuum

* Correspondence: [email protected]
†Equal contributors
1 Institute of Integrative Biology (IBZ), Forest Pathology and Dendrology, ETH Zürich, 8092 Zürich, Switzerland
8 Microsynth AG, Schützenstrasse 15, 9436 Balgach, Switzerland
Full list of author information is available at the end of the article

© The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Plant roots are confronted with the colonization of symbiotic fungal species ranging from pathogens to mutualists [1]. While research has largely been focused on the symbiotic and pathogenic interactions, seemingly neutral associations with plant roots by endophytes received less attention [2, 3]. Endophytic fungi colonize functional root tissues but disease symptoms do not develop at all or at least not for prolonged periods of time [4]. Despite their prevalence in many ecosystems, little is known about the nature of interaction with their hosts [5, 6]. It is assumed that they behave along the mutualism-antagonism continuum depending on host conditions and environment, as shown for some mycorrhizal fungi [7–9].

Species belonging to the helotialean Phialocephala fortinii s.l. – Acephala applanata species complex (PAC) are the dominant root endophytes in woody plant species [5]. PAC shows a global distribution as PAC species colonize roots from arctic to subtropical plant species throughout the northern hemisphere [10–12]. Recently, a study demonstrated the presence of PAC in the southern hemisphere [13]. In contrast to ectomycorrhizal species (EcM), which are usually confined to primary, non-lignified roots, PAC can be found in all parts of the root system, predominantly on fine roots < 3 mm where up to 80% of randomly sampled roots were colonized [5]. In addition, PAC species are among the first colonizers of tree seedlings in natural forest ecosystems, infecting them within a few weeks after germination [14]. PAC is composed of more than 15 cryptic species [10], eight of which were formally described [15, 16]. Among the strains sampled from a single study site, PAC species form communities of up to 10 sympatrically occurring species. In contrast to agricultural ecosystems, PAC communities remain stable for several years [17] although minor long-term changes in the frequency of species can be observed [18]. Most PAC communities are dominated by few species and additional species occur at low frequencies [5] as observed in many other community structures of biological systems [19]. Species diversity and community structure do not correlate with the tree community, geographical location, soil properties, management practices, or with climate, precipitation and temperature [10]. Host specificity of PAC species was not observed [5] except for A. applanata, which was almost exclusively isolated from species belonging to the Pinaceae but rarely from ericaceous roots of the ground vegetation [14].

Despite the fact that PAC species are highly successful colonizers of plant roots and widely distributed in natural ecosystems, their ecological role is still poorly understood. They were described as beneficial, neutral or pathogenic for different hosts, growing conditions and fungal strains [5, 20, 21]. New results comparing the effect of PAC species and strains on Picea abies indicate that the outcome of the interactions is mainly driven by the fungal genotype and follows the antagonism-mutualism continuum. Whereas some of the PAC/P. subalpina strains were shown to be pathogenic and killed most of the seedlings, others were benign [22]. Nevertheless, infection by PAC is costly for the plant since an increase in plant growth was never observed [22]. The outcome of PAC-host interactions depends on the ability of PAC strains to invade and colonize host root tissues. This is evident from the health status of Norway spruce seedlings, which negatively correlates with the fungal biomass in roots [22, 23]. However, negative effects of harmful PAC strains are reduced by the co-colonization of ectomycorrhizal fungi and other PAC strains [24].

The dynamics of PAC-host colonization were rarely studied [25–27], and data on intraspecific variation of colonization dynamics for different PAC species are missing completely. In general, PAC strains form hyphopodia- or appressoria-like structures to enter root hairs or epidermal cells (Fig. 1a, b). After entering the root, PAC strains grow inter- and intracellularly and colonize the cortex but rarely invade the vascular cylinder (Fig. 1c, d). Intercellular

Fig. 1 Key features of the colonization of roots by PAC species (panels a–d; scale bars 10 µm and 2 µm). Key steps in the colonization of roots by PAC species (V. Queloz, unpublished). Figure 1a, b. Surface colonization and appressoria/hyphopodia formation of P. subalpina on Cistus incanus roots. Figure 1c. Cross-section of Pinus strobus root colonized by PAC stained using borax, methylene blue and toluidine blue and counterstained with basic fuchsine. The fungus completely colonizes the cortical tissue up to the endodermis. Accumulation of phenolic compounds in the vascular cylinder is evident by the presence of dark granular structures. Figure 1d. Example of intracellular colonization of P. subalpina in C. incanus cortex (arrow)

labyrinthine fungal tissue resembling the Hartig net in ectomycorrhizal fungi as well as mantle-like structures were occasionally observed for PAC [27–29].

Host defense reactions such as cell-wall appositions or lignituber formation were rarely observed [27]. Intracellular hyphae traverse host cells by narrow penetration hyphae without apparent lysis of the plant cell wall while the host cell cytoplasm disintegrates after colonization by PAC hyphae (Fig. 1d). At the ultrastructural level, hyphae are not surrounded by host-derived perifungal membranes, which are regarded as a hallmark of biotrophic fungal associations [27]. Finally, cortical cells of the plant are often associated with thick-walled, heavily melanized fungal cells, i.e. microsclerotia, which were shown to accumulate reserve substances [5, 25].

The availability of genomic sequences provides information on the gene inventory of species and identifies characteristic genomic structures and gene sets associated with different lifestyles [30–33]. Although the number of sequenced fungal genomes increases rapidly, genome sequencing of ascomycetes was mostly restricted to pathogenic, saprophytic or well-known mutualistic species. In contrast, only a few endophytes were sequenced, and these were restricted to endophytes in the Clavicipitaceae [34, 35]. Clavicipitalean endophytes are obligate biotrophs, colonize their hosts systemically and follow a very close symbiotic lifestyle with their hosts but roots are not colonized [36]. This sets them apart from non-clavicipitalean endophytes isolated from all parts of the plants at high frequencies. An exception is the genome of Phialocephala scopiformis, a foliar endophyte, for which the draft genome was recently published albeit with no analysis [37]. In the present study, the genome of P. subalpina, the most frequent species of the PAC, was sequenced and compared to the genomes of 13 ascomycetous species with different lifestyles to gain first insights into the genome structure and gene inventory of a non-clavicipitalean endophyte.

Results

Phialocephala subalpina holds a large and feature-rich genome

The genome of P. subalpina strain UAMH 11012 was sequenced at 25x coverage using the Roche/454 GS FLX technology. Sequence data were assembled into 204 scaffolds (excluding rDNA repeat and mtDNA) with a genome size of 69.7 Mb and an average GC content of 45.9% (Table 1). The complete rRNA repeat (part of this assembly) and the mtDNA (http://www.ebi.ac.uk/ena/data/view/JN031566) [38] were assembled manually and validated by Sanger sequencing. Data sets are accessible at http://pedant.helmholtz-muenchen.de/genomes.jsp?category=fungal. The genome and annotation data were submitted to the European Nucleotide Archive (ENA, http://www.ebi.ac.uk/ena/data/view/FJOG01000001-FJOG01000205). Mapping of read data against the assembled genome did not reveal any significant deviations from the average 25x coverage except for the mtDNA (2,629x) and rDNA repeat (1,422x), indicating that no repetitive sequences were collapsed in short scaffolds, which would have led to an underestimation of the genome size.

Table 1 Genome statistics for Phialocephala subalpina
Genome size [Mb] | 69.69
Scaffolds | 205
Scaffolds N50 [kb] | 1,449
N50 number of scaffolds | 16
GC (%) | 45.9
Predicted protein-coding genes | 20,173
Average exon length [bp] | 443.9
Average intron length [bp] | 80.2
Average number of introns per gene | 2.7
tRNAs | 91
TEs content | 5.70%
Other repeats(a) | 8.10%
(a) tandem repeats, SSR, and low-complexity DNA

The annotation pipeline and manual validation resulted in 20,173 gene models. In addition, 91 tRNA genes coding for all amino acids and 20 5S rRNA loci were identified. None of the expected single-copy core orthologous genes found in eukaryotes (248 and 246 genes, see material and methods) was missing in the protein predictions for P. subalpina, indicating that the core gene inventory had been completely covered. This was supported by EST data as 28,045 out of 28,092 assembled transcripts (99.83%) mapped to the assembly with high coverage (Additional file 1). 73.6% of the 20,173 proteins showed an identity >30% with known proteins in the Similarity Matrix of Proteins database (SIMAP) [39]. The remaining 5,313 proteins with less than 30% identity included 4,233 species-specific P. subalpina proteins. No significant length difference was observed between the high identity and the low identity genes (Additional file 2A). Moreover, mapping of the 4,233 proteins against the 454 EST dataset and RNA-Seq data showed that 2,833 of these gene models were covered by ≥50% of the ORF length with EST/RNA-Seq data (Additional file 2B). A classification scheme of the gene models based on different analyses is given in Fig. 2.

Classification of repetitive elements

Repeats that could be classified as transposable elements (TEs) accounted for 5.7% of the genome sequence (Table 2). In contrast, repeats identified in the RepeatScout analysis that could not be assigned to any TE family accounted for 1.8% of the genome. TEs were dominated by Class I elements of the Copia and Gypsy families, accounting for 55% of all TEs.

In contrast, class II TEs were less abundant and were dominated by Tc1 Mariner and Helitron elements. A large fraction of the repeat consensus sequences of the RepeatScout analysis could not be assigned to any TE family. TEs of all classes/families were evenly scattered throughout the genome of P. subalpina and no evidence for TE-rich islands was observed (Fig. 3). Besides the TEs, the genome of P. subalpina also included low-complexity sequences, tandem as well as simple sequence repeats (total 8.1% of the genome).

Table 2 Classification and frequency of the most important transposable elements in the genome of P. subalpina
TE class | TE family | In % of TEs | Cumulative length in the genome [in kb] | Proportion of the genome
Class I | Gypsy | 28.9% | 1,452 | 2.1%
Class I | Copia | 26.1% | 1,308 | 1.9%
Class I | non LTR | 4.6% | 229 | 0.3%
Class II | Tc1 Mariner | 8.2% | 411 | 0.6%
Class II | Helitron | 3.9% | 194 | 0.3%
Class II | MITE | 1.2% | 59 | 0.1%
Class II | hAT | 0.6% | 32 | 0.0%
Class II | Mutator | 0.4% | 18 | 0.0%
Not classified | not classified | 26.2% | 1,284 | 1.8%

Fig. 2 Classification of gene models of P. subalpina. Venn diagram showing a classification of gene models based on four different characteristics. Putative orthologous gene models were determined by QuartetS analysis including 13 ascomycetous species (see Table 6), putative paralogous gene models in P. subalpina were analyzed using the Uclust algorithm, low identity gene models showing <30% identity in SIMAP searches and gene models including ≥1 InterPro accessions. Five hundred eighty-four single-copy and high-identity genes not including InterPro accessions were not covered by one of the four characteristics

Presence of RNAi pathway and evidence for RIP mechanisms

Homologs of DICER, ARGONAUTE and RNA-dependent RNA polymerase genes were found in multiple copies in the genome of P. subalpina, and comparison with other ascomycetes showed that the copy number of each of the three genes was higher than in the other ascomycetes examined (Table 3). Similarly, several copies of cytosine methyltransferase genes of the Dnmt1 family were present in the genome of P. subalpina (Table 3). The cytosine methyltransferases encoded in P. subalpina were classified based on both the presence of InterPro domains and the arrangement of the domains in comparison with Neurospora crassa, Ascobolus immersus, Sclerotinia sclerotiorum and Botrytis cinerea homologues. Whereas gene models PAC_15881 and PAC_01402 included only the C-5 cytosine methyltransferase domain (IPR001525) and showed homologies with the RiD gene of N. crassa, the other two genes (PAC_07461, PAC_02147) encoded the C-5 cytosine methyltransferase domain as well as BAH domains (IPR001025). Gene PAC_07461 has a high similarity with N. crassa Dim2 whereas gene PAC_02147 has a high similarity with Masc2 of A. immersus.

A clear difference in the dinucleotide frequencies was observed in repeat versus genomic control regions and the difference in dinucleotide frequencies was more pronounced in P. subalpina than in S. sclerotiorum (Fig. 4). TpA dinucleotides were significantly overrepresented in repeats whereas CpA/TpG were underrepresented, suggesting a dominant mode of RIP for CpA to TpA mutations. In addition, repeat regions were generally rich in AT content as ApA, ApT and TpT were also more frequent in repeats than in non-repeat genomic regions (Fig. 3).

The genome indicates horizontal gene transfer (HGT) events from bacteria co-occurring in the same habitat

Twenty-one out of 163 genes, originally with a non-fungal best SIMAP hit, showed a skewed taxonomic distribution or higher bit scores with non-fungal taxa in BLAST searches against the NCBI non-redundant protein (nr) database (Table 4). Phylogenetic analysis for these 21 genes showed that they likely result from HGT as the gene trees significantly deviate from the expected species tree (for three examples see Fig. 5b-d; see also http://purl.org/phylo/treebase/phylows/study/TB2:S20196). In contrast, the RPB2 gene, which was used as a control in the analysis, showed the expected ascomycetous phylogeny (Fig. 5a). In the majority of HGTs, the protein sequences taken from the phylogenetically closest non-fungal species were derived from soil-borne bacteria (e.g. Collimonas arenae or Brevibacillus laterosporus) or from bacteria living in the rhizosphere (e.g. Frankia sp.), which therefore share the same habitat as P. subalpina. In 12 of the phylogenies, one or few of the closest relatives were of fungal origin while most of the

remaining species were of prokaryotic origin. In several cases, P. subalpina clusters together with O. maius and/or P. scopiformis (Table 4 & Fig. 5d).

Fig. 3 Overview of the gene content (gray), repeat content and the average GC content (green line) in selected scaffolds (scaffold00001, scaffold00002 and scaffold00003; x-axis: Position (kb), y-axis: Frequency; legend: Class I, Class II, unknown, gene content, GC content). The vertical lines (bars) represent the fraction of bases covered by genes and repeats within consecutive 1 kb windows

Table 3 Presence of RNAi and RIP core proteins in different fungal genomes
Mechanism | Gene | Ps | Om | Ssc | Bc | Bg | Nc | Ca | Sc | Sp | Median
RNAi | Argo | 4 | 2 | 2 | 3 | 2(3) | 2 | 1 | 0 | 1 | 2
RNAi | Dicer | 4 | 2 | 2 | 2 | 2 | 2 | 0 | 0 | 1 | 2
RNAi | RdRP | 5 | 3 | 3 | 3 | 1 | 3 | 0 | 0 | 1 | 3
RIP | Dnmt1 family | 4 | 3 | 3 | 2(3) | 0 | 2 | 0 | 0 | 0 | 1
Ps Phialocephala subalpina, Om Oidiodendron maius, Ssc Sclerotinia sclerotiorum, Bc Botrytis cinerea, Bg Blumeria graminis, Nc Neurospora crassa, Ca Candida albicans, Sc Saccharomyces cerevisiae, Sp Schizosaccharomyces pombe

Key secondary metabolite genes

The genome of P. subalpina encodes a large number of genes coding for putative secondary metabolite (SM) key enzymes (Table 5). 75.8% of the SM key genes in P. subalpina clustered with putative orthologous genes in the other 13 species without any obvious dominance of the closely related Leotiomycete species; in fact, P. subalpina shared the most orthologous clusters with Aspergillus flavus. Numerous SM key genes are clustered in putative secondary metabolite loci including genes for acyl- and methyltransferases, oxidoreductases, cytochrome P450 monooxygenases, or transcription factors (Table 5 & Additional file 3).

Among the eight genes for non-reducing Type I PKSs, two gene clusters were identified in P. subalpina that are probably involved in the melanin synthesis pathway. Whereas PAC_11435 was placed in the same QuartetS cluster as G. lozoyensis PKS1, PAC_07895 showed the highest similarity with the gene of A. alternata encoding alm and the gene of A. fumigatus encoding Alb1/PksP. In addition, PAC_07895 was placed in a second QuartetS cluster. Adjacent to both PKS genes, a putative hydroxynaphthalene reductase gene was found (PAC_11432, PAC_07896). However, the two putative scytalone dehydratases (PAC_18365, PAC_19872) in the P. subalpina genome were not located within either cluster. The presence of putative orthologous genes for the two PKSs in the 13 ascomycetous species included in the comparative analysis (Table 6) showed that only melanized species were included in these two clusters. Further, genes of other Leotiomycete species, such as S. sclerotiorum, B. cinerea and M. brunnea, were, like P. subalpina, represented in both clusters. Two NRPS genes were identified that likely encode siderophore synthetases (PAC_05248 and PAC_13158). Besides key enzymes in secondary metabolism, the genome of P. subalpina also encodes a fucose-specific lectin (PAC_07514) with similarities to the AAL protein of Aleuria aurantia that was shown to protect the fungus against predators and parasites [40].

Comparative genome analysis proves different lifestyles and enlarged gene families in P. subalpina

To explore the unexpectedly large gene set, the proteome was compared to 13 ascomycetous proteomes (Table 6) using orthologous cluster analysis, InterPro motif occurrence and a review of Carbohydrate-active enzymes (CAZymes). A total of 174,555 predicted proteins were grouped into 20,555 clusters of corresponding putative orthologous genes. Proteins of P. subalpina were present in 12,932 clusters (62.9%), significantly more than for any other investigated species. 1,408 clusters included proteins of all 13 species and

P. subalpina covering core functions in the primary metabolism, energy supply and cell cycle. 163 QuartetS clusters were mostly restricted to pathogens, whereas 61 clusters were predominantly found in saprophytic species. Principal component analysis based on these two datasets showed that P. subalpina was either placed within the pathogens or close to the saprophytes (Fig. 6a, b). Moreover, P. subalpina also shared the highest number of QuartetS clusters with the two mycorrhizal species (Table 7, Fig. 6c). The most frequent FunCat annotations for the pathogen- and saprophyte-related clusters showed that the secondary and C-metabolism was most often included but also several putative orthologous proteins classified as virulence and disease factors were recognized (Table 8).

Fig. 4 Change in dinucleotide frequencies in repeat regions. Change in the dinucleotide frequencies observed in the repeat regions of P. subalpina (blue) and S. sclerotiorum (orange) compared to genomic control regions (y-axis: log2(repeats/control), from −1.0 to 1.5; x-axis: the 16 dinucleotides aa to tt)

A total of 6,556 InterPro accessions were annotated among the 14 species included in the analysis. A plateau of approx. 5,000–5,500 distinct InterPro accessions per

Table 4 Genes of P. subalpina likely acquired by horizontal gene transfer from non-fungal species
Protein | Putative function | SignalP | Remarks | Order | Closest species | Habitat
PAC_13705 | fatty acid desaturase | | | n.a. | Sphaerosoma arcticus | n.a.
PAC_15031 | metallo-β-lactamase | | | Bacillales | Brevibacillus laterosporus | soil/water/insects
PAC_18575 | | | | Acidobacteriales | Acidobacteria bacterium KBS 146 | soil
PAC_13085 | β-lactamase | | | Caulobacteriales | Phenylobacterium sp. | plant root associated
PAC_16385 | terpenoid cyclase | | | Rhizobiales | Chelativorans sp. | soil/rhizosphere
PAC_19778 | glyoxalase | | gene fragment | Burkholderiales | Collimonas arenae | soil
PAC_12936 | glycoside hydrolase | | with P. scopiformis | Burkholderiales | Paraburkholderia sp. | plant-associated bacteria
PAC_20157 | hydrolase | secreted protein | with O. maius | Sphingobacteriales | Pedobacter heparinus | soil
PAC_01946 | adenine deaminase | | with P. scopiformis | Actinomycetales | Frankia sp. EuI1c | soil/plant symbiont
PAC_03408 | kynurenine formamidase | | | Actinomycetales | Frankia sp. EUN1f | soil/plant symbiont
PAC_03301 | | secreted protein | with O. maius and P. scopiformis | Actinomycetales | Streptomyces sp. | mostly soil
PAC_17397 | methyltransferase | | with O. maius | Actinomycetales | Streptomyces sp. | mostly soil
PAC_19296 | | | | Actinomycetales | Streptomyces sp. | mostly soil
PAC_14321 | | | with O. maius | Burkholderiales | Curvibacter lanceolatus | n.a.
PAC_07909 | quinoprotein amine dehydrogenase | secreted protein | with O. maius | Burkholderiales | Paraburkholderia udeis | soil/rhizosphere
PAC_06755 | carotenoid oxygenase | | with O. maius | Actinomycetales | Streptomyces sp. | mostly soil
PAC_02359 | | | | Actinomycetales | Mycobacterium avium | water/soil
PAC_18364 | peptidase | | with P. scopiformis | Rhizobiales | Bradyrhizobium sp. | soil/plant symbiont
PAC_05940 | oleate hydratase | | with P. scopiformis | Spirochaetales | Leptospira sp. | n.a.
PAC_15362 | lipase | | with P. scopiformis | Actinomycetales | various species | n.a.
PAC_11525 | | | with P. scopiformis | Ktedonobacterales | Ktedonobacter racemifer | soil
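One of the screening criteria described for the HGT candidates in Table 4 is a higher bit score with non-fungal than with fungal taxa in BLAST searches. The sketch below illustrates that single criterion only; the input format, the function name `hgt_candidates` and the toy hits are assumptions made for illustration, and the authors' actual filtering (including the taxonomic-distribution check and the phylogenetic validation) is not reproduced here.

```python
from collections import defaultdict


def hgt_candidates(hits):
    """Flag genes whose best non-fungal bit score exceeds their best fungal one.

    `hits` is an iterable of (gene_id, is_fungal, bit_score) tuples, for
    example parsed from tabular BLAST output joined with taxonomy information.
    """
    best = defaultdict(lambda: {"fungal": 0.0, "non_fungal": 0.0})
    for gene_id, is_fungal, bit_score in hits:
        key = "fungal" if is_fungal else "non_fungal"
        best[gene_id][key] = max(best[gene_id][key], bit_score)
    return sorted(
        gene for gene, score in best.items()
        if score["non_fungal"] > score["fungal"]
    )


# Hypothetical hits; gene names and scores are invented for illustration.
toy_hits = [
    ("geneA", True, 120.0),
    ("geneA", False, 310.5),   # bacterial hit scores higher: candidate
    ("geneB", True, 450.0),
    ("geneB", False, 90.0),    # fungal hit scores higher: not a candidate
    ("geneC", False, 75.0),    # only non-fungal hits: candidate
]
print(hgt_candidates(toy_hits))
```

A screen of this kind only yields candidates; as in the study, gene trees compared against the expected species tree are needed to support an HGT origin.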

Fig. 5 Phylogenetic analysis of genes likely acquired by horizontal gene transfer. Figure 5a. Phylogenetic tree of a conserved housekeeping gene (RPB2) of PAC and related fungi. Figure 5b-d. Phylogenetic trees of four P. subalpina genes likely acquired by HGT. The P. subalpina protein sequences cluster with bacterial proteins. Some have close (but well separated) relatives from other Ascomycetes (5c, d). Colour indications: blue = P. subalpina, black = non-fungal species, green = fungal species
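The repeat-versus-control dinucleotide comparison reported in the Results (Fig. 4, plotted as log2(repeats/control)) can be illustrated with a short sketch. The pseudocount, the function names and the toy sequences below are assumptions for illustration; the normalization actually used by the authors is not stated in this section.

```python
import math
from collections import Counter
from itertools import product

DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_frequencies(sequences, pseudocount=1.0):
    """Return relative frequencies of the 16 dinucleotides (overlapping windows).

    A pseudocount (an assumption of this sketch) avoids division by zero for
    dinucleotides that are absent from short inputs.
    """
    valid = set(DINUCLEOTIDES)
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            pair = seq[i:i + 2]
            if pair in valid:  # skips windows containing N or other symbols
                counts[pair] += 1
    total = sum(counts.values()) + pseudocount * len(DINUCLEOTIDES)
    return {d: (counts[d] + pseudocount) / total for d in DINUCLEOTIDES}


def log2_ratio(repeat_seqs, control_seqs):
    """Return log2(frequency in repeats / frequency in control) per dinucleotide."""
    rep = dinucleotide_frequencies(repeat_seqs)
    ctl = dinucleotide_frequencies(control_seqs)
    return {d: math.log2(rep[d] / ctl[d]) for d in DINUCLEOTIDES}


# Toy sequences (hypothetical): a TpA-rich "repeat" and a CpA-rich "control".
ratios = log2_ratio(["TATATATA"], ["CACACACA"])
print(ratios["TA"], ratios["CA"])
```

Positive values indicate dinucleotides enriched in repeats (as reported for TpA), negative values indicate depletion (as reported for CpA/TpG).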

Table 5 Key secondary metabolite genes found in the genome of P. subalpina
Class | Type | Gene code | SM cluster(a) | Length (aa) | Domain structure(b) | Remarks
NRPS-like | L-alpha-aminoadipate reductase | PAC_02750 | | 1167 | A-T-R | Lys2, L-lysine biosynthesis
NRPS-like | adenylating reductase | PAC_02944 | 8 | 1049 | A-T-R |
NRPS-like | adenylating reductase | PAC_01324 | 3 | 1047 | A-T-R |
NRPS-like | adenylating reductase | PAC_02391 | 6 | 1023 | A-T-R |
NRPS-like | adenylating reductase | PAC_03877 | 10 | 1117 | A-T-R |
NRPS-like | adenylating reductase | PAC_04959 | 12 | 1038 | A-T-R |
NRPS-like | adenylating reductase | PAC_05733 | | 1008 | A-T-R |
NRPS-like | adenylating reductase | PAC_07003 | | 1054 | A-T-R |
NRPS-like | adenylating reductase | PAC_12811 | | 1054 | A-T-R |
NRPS-like | adenylating reductase | PAC_17490 | | 1065 | A-T-R |
NRPS-like | adenylating reductase | PAC_19780 | | 1050 | A-T-R |
NRPS-like | adenylating reductase | PAC_02202 | 5 | 1227 | A-T-R-Kinase |
NRPS-like | adenylating reductase | PAC_08842 | | 1345 | A-T-R-KR |
NRPS-like | furanone synthetase | PAC_08910 | 17 | 1013 | A-T-TE |
NRPS-like | quinone synthetase | PAC_14584 | | 979 | A-T-TE |
NRPS-like | unknown | PAC_19640 | | 1394 | A-T-DUF |
NRPS-like | unknown | PAC_12359 | | 1655 | A-T-DUF |
NRPS | | PAC_05248 | 13 | 4676 | A-T-C-A-T-C-A-T-C-T-C-T-C | Siderophore biosynthesis (Ferrichrome-like)
NRPS | | PAC_13158 | | 4838 | A-T-C-A-T-C-T-C-A-T-C-T-C-T-C | Siderophore biosynthesis (Ferrichrome-like)
NRPS | | PAC_15746 | | 1639 | A-T-C-T-C |
NRPS | | PAC_20134 | | 4692 | C-A-T-C-A-T-C-A-T-C-A-T-C-T |
NRPS | | PAC_16560 | | 565 | A-T-C |
NRPS | | PAC_16746 | | 1207 | A-T-C |
NRPS | | PAC_19282 | | 1220 | A-T-C |
PKS-NRPS | | PAC_02326 | | 3974 | KS-AT-DH-MT-KR-T-C-A-T-R |
PKS-NRPS | | PAC_08246 | 16 | 3965 | KS-AT-DH-MT-KR-T-C-A-T-R |
Type I PKS | non-reducing PKS | PAC_01338 | 4 | 2542 | SAT-KS-AT-PT-T-MT-TE | Citrinin biosynthesis-like
Type I PKS | non-reducing PKS | PAC_02435 | 7 | 2498 | SAT-KS-AT-PT-T-MT-TE | Citrinin biosynthesis-like
Type I PKS | non-reducing PKS | PAC_03302 | | 2223 | KS-AT-DH-T-TE |
Type I PKS | non-reducing PKS | PAC_03589 | 9 | 1670 | KS-AT-PT-T-TE |
Type I PKS | non-reducing PKS | PAC_07895 | 15 | 2171 | SAT-KS-AT-PT-T-T-TE | Likely melanin biosynthesis
Type I PKS | non-reducing PKS | PAC_08751 | | 2099 | SAT-KS-AT-PT-T-T-TE |
Type I PKS | non-reducing PKS | PAC_10081 | | 2125 | SAT-KS-AT-PT-T-T-TE |
Type I PKS | non-reducing PKS | PAC_11435 | | 2169 | SAT-KS-AT-PT-T-T-TE | Likely melanin biosynthesis
Type I PKS | reducing PKS | PAC_00199 | 1 | 2694 | KS-AT-DH-MT-ER-KR-T | Alternapyrone biosynthesis-like
Type I PKS | reducing PKS | PAC_00310 | 2 | 2283 | KS-AT-DH-ER-KR-T |
Type I PKS | reducing PKS | PAC_01646 | | 1687 | AT-DH-MT-ER-KR-T |
Type I PKS | reducing PKS | PAC_04883 | 11 | 2239 | KS-AT-DH-ER-KR-T |
Type I PKS | reducing PKS | PAC_06141 | | 2258 | KS-AT-DH-ER-KR-T |
Type I PKS | reducing PKS | PAC_10762 | | 3173 | KS-AT-DH-MT-ER-KR-T-Acyltransferase |
Type I PKS | reducing PKS | PAC_11350 | | 3203 | KS-AT-DH-MT-ER-KR-T-Acyltransferase |
Type I PKS | reducing PKS | PAC_14253 | | 2590 | KS-AT-DH-MT-ER-KR-T |
Type I PKS | reducing PKS | PAC_14645 | | 2274 | KS-AT-DH-ER-KR-T |
Type I PKS | reducing PKS | PAC_16276 | | 3140 | KS-AT-DH-MT-ER-KR-T-Acyltransferase |
Type I PKS | reducing PKS | PAC_17799 | | 2411 | KS-AT-DH-ER-KR-T | Lovastatin-diketide synthase-like
Type I PKS | reducing PKS | PAC_19082 | | 2581 | KS-AT-DH-MT-ER-KR-T |
Type I PKS | reducing PKS | PAC_19990 | | 2970 | KS-AT-DH-MT-KR-T-C | Lovastatin-nonaketide synthase-like
Type I PKS | reducing PKS | PAC_10712 | | 1430 | KS-AT-TH-KR-T | 6-Methylsalicylate synthase-like
PKS | Type III PKS | PAC_02116 | | 479 | IPR012328/IPR001099 |
DMATS | | PAC_15749 | | 538 | IPR017795 |
Terpene | | PAC_05884 | | 483 | |
Terpene | | PAC_13844 | | 593 | IPR017825/IPR017825/IPR002060 | Squalene synthase-like
Terpene | | PAC_15298 | | 557 | |
Terpene | | PAC_01018 | | 338 | IPR008949 |
Terpene | | PAC_04028 | | 331 | IPR008949 |
Terpene | | PAC_11164 | | 466 | IPR008949 |
Terpene | | PAC_12198 | | 392 | IPR008949 |
Terpene | | PAC_16221 | | 336 | IPR008949 |
(a) for details on the putative SM gene clusters see Additional file 3
(b) Abbreviations for domains: A adenylation, AT acyltransferase, C condensation, DH dehydratase, DUF terminal NRPS domain of unknown function, ER enoyl reductase, KR ketoreductase, KS ketosynthase, MT methyltransferase, PT product template, R reduction, SAT starter unit-ACP-transacylase, T thiolation (=acyl- or aryl- or peptidylcarrier protein), TE thioesterase, TH thiohydrolase

species was observed when plotting the number of distinct InterPro accessions per species against the total number of annotated InterPro accessions per species (Additional file 4). P. subalpina is represented by 5,232 distinct InterPro accessions, and the highest number was observed in the two saprophyte species T. reesei (5,529) and A. flavus (5,303). In contrast, mycorrhizal species and the obligate biotrophic pathogen B. graminis included a significantly lower number of InterPro accessions. 963 (14.6%) InterPro accessions were significantly overrepresented in the P. subalpina genome compared to the average count for the other 13 ascomycetous species and 386 (5.9%) InterPro accessions were encoded >3x in the P. subalpina genome. Mapping of the overrepresented InterPro accessions against the gene ontology (GO) showed that catabolic/metabolic processes, transporters, and InterPro accessions involved in binding events were most frequent (Table 9). 411 (43%) of the InterPro accessions were not linked with GO annotations; among others, P. subalpina included 534 gene models with the fungal heterokaryon incompatibility domain (IPR010730), 549 gene models with the major facilitator superfamily domain (IPR020846) and 328 gene models with ankyrin repeats (IPR020683). Moreover, several classes of CAZymes contained InterPro accessions without GO annotation such as IPR017853 (glycoside hydrolases) as well as IPR011050 (Pectin lyase fold/virulence factor) and IPR012334 (Pectin lyase fold). Only 53 InterPro accessions were significantly underrepresented in P. subalpina. The underrepresented InterPro accessions frequently showed uneven distributions in the 13 ascomycetous genomes, and some of the accessions (IPR013762, IPR000477, IPR001584) are most probably related to transposable elements. Similarly to PCA based on QuartetS analysis, PCA based on InterPro accessions overrepresented in pathogenic and saprophytic species placed P. subalpina either in the pathogenic cluster or the saprophytic cluster (Fig. 7a, b).

Enrichment of GO terms for the overrepresented InterPro accessions in P. subalpina compared to average number of InterPro accessions found in 13 ascomycetous genomes. Of the 963 overrepresented InterPro accessions 57% mapped to one or several GO terms and the table summarizes the GO terms with the highest numbers of distinct InterPro accessions.

Eight hundred eighty-one gene models in P. subalpina were classified in 138 different CAZyme families resulting in 998 CAZyme modules, which are significantly more than observed in the other ascomycetous species. The second most frequently found CAZyme modules (883 modules) were encoded by F.

Table 6 Species included in the comparative genomic analysis

Species | Code | Lifestyle | Class | Order | Family | Reference
Botrytis cinerea | Bc | Pathogen (necrotroph) | | | Sclerotiniaceae | [55]
Sclerotinia sclerotiorum | Ssc | Pathogen (necrotroph) | Leotiomycetes | Helotiales | Sclerotiniaceae | [55]
Blumeria graminis | Bg | Pathogen (obligate biotroph) | Leotiomycetes | Erysiphales | Erysiphaceae | [124]
Marssonina brunnea | Mb | Pathogen (hemi-biotroph) | Leotiomycetes | Helotiales | | [125]
Fusarium oxysporum f. sp. lycopersici | Fo | Pathogen (hemi-biotroph) | Sordariomycetes | Hypocreales | Nectriaceae | [56]
Aspergillus flavus | Af | Saprophyte (soil & rhizosphere) | Eurotiomycetes | Eurotiales | Trichocomaceae | [126]
Trichoderma reesei | Tr | Saprophyte (soil)(a) | Sordariomycetes | Hypocreales | Hypocreaceae | [127]
Chaetomium globosum | Chg | Saprophyte (soil/plant debris) | Sordariomycetes | Sordariales | Chaetomiaceae | [128]
Penicillium chrysogenum | Pc | Saprophyte (soil) | Eurotiomycetes | Eurotiales | Aspergillaceae | [129]
Glarea lozoyensis | Gl | Saprophyte (soil)(b) | Leotiomycetes | Helotiales | Helotiaceae | [130]
Oidiodendron maius | Om | Saprophyte (peat bog)/ericoid mycorrhizal | Leotiomycetes | Leotiomycetes incertae sedis | Myxotrichaceae | [32]
Tuber melanosporum | Tm | Ectomycorrhizae | | | Tuberaceae | [66]
Cenococcum geophilum | Ceg | Ectomycorrhizae | Dothideomycetes | Pleosporomycetidae incertae sedis | Gloniaceae | [131]
Phialocephala subalpina | Ps | Root endophyte | Leotiomycetes | Helotiales | Dermateaceae | this paper

(a) In contrast to other Trichoderma species, T. reesei does not show mycoparasitism
(b) ITS sequences from environmental samples often derived from soil/rhizosphere

Fig. 6 Characterization of P. subalpina based on gene clusters enriched for pathogens or saprophytes. Principal component analysis (PCA) based on the presence/absence of species in putative orthologous gene clusters derived from QuartetS analysis. Figure 6a. PCA based on 61 gene clusters enriched for saprophytic species. Figure 6b. PCA based on 163 clusters enriched in pathogens. Figure 6c. Venn diagram showing the distribution of orthologous gene clusters for the two ectomycorrhizal species Tuber melanosporum and Cenococcum geophilum, the saprophyte/ericoid mycorrhizal species Oidiodendron maius as well as P. subalpina. Abbreviations of species are given in Table 2. Color code: green: mycorrhizal species, red: plant pathogens, purple: soil saprotrophs and blue: P. subalpina
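The kind of PCA described in this caption can be illustrated with a small sketch. The matrix below is a toy presence/absence table (rows are species codes, columns are hypothetical gene clusters) and not the QuartetS data of the study. The power-iteration routine is an assumption about how a first principal component could be obtained with the Python standard library only; the section does not state which software the authors used for the PCA.

```python
import math

def first_principal_component(rows, iterations=500):
    """Return (axis, scores, explained_fraction) for PC1 via power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Covariance matrix of the columns (one column per gene cluster).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    axis = [float(j + 1) for j in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        axis = [v / norm for v in nxt]
    # Rayleigh quotient gives the eigenvalue belonging to the converged axis.
    eigenvalue = sum(axis[i] * sum(cov[i][j] * axis[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    scores = [sum(c[j] * axis[j] for j in range(p)) for c in centered]
    return axis, scores, eigenvalue / total_variance

# Toy data (hypothetical): 1 = species present in the gene cluster, 0 = absent.
species = ["Bc", "Ssc", "Af", "Tr", "Ps"]
presence = [
    [1, 1, 1, 0, 0, 0],  # Bc: pathogen-like pattern
    [1, 1, 0, 0, 0, 0],  # Ssc: pathogen-like pattern
    [0, 0, 0, 1, 1, 1],  # Af: saprophyte-like pattern
    [0, 0, 0, 1, 1, 0],  # Tr: saprophyte-like pattern
    [1, 1, 1, 1, 1, 1],  # Ps: present in every cluster
]
axis, scores, explained = first_principal_component(presence)
pc1 = dict(zip(species, scores))
for code in species:
    print(f"{code}: PC1 = {pc1[code]:+.2f}")
print(f"PC1 explains {100 * explained:.0f}% of the variance")
```

In this toy example the pathogen-like and saprophyte-like rows fall on opposite sides of the first axis, while the row that is present in all clusters sits between them.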

Table 7 Number of putative orthologous gene clusters shared by the two mycorrhizal species with the other species included in the analysis

Species | Total proteins | Total clusters | Bc | Ssc | Bg | Mb | Fo | Af | Tr | Chg | Pc | Gl | Om | Tm | Ceg | Ps
Tm | 7,496 | 5,345 | 4,540 | 4,646 | 4,039 | 4,593 | 4,636 | 4,501 | 4,456 | 4,352 | 4,505 | 3,073 | 3,460 | - | 4,755 | 4,905
Ceg | 14,561 | 8,645 | 5,797 | 5,795 | 4,492 | 5,556 | 6,038 | 5,795 | 5,576 | 5,439 | 5,704 | 3,733 | 4,939 | 4,755 | - | 7,069

For abbreviation of species see Table 2

oxysporum. The genome of P. subalpina was especially rich in genes coding for glycoside hydrolases (471), glycosyltransferases (150) and redox enzymes (auxiliary activities enzymes, 157). Principal component analysis based on the frequency of CAZyme modules involved in plant cell wall degradation (PCWDEs) such as cellulose, hemicellulose, pectin, cutin, and enzymes likely acting on different substrates separated P. subalpina from all genomes (Fig. 8). Especially modules involved in pectin breakdown were encoded in high copy numbers in the genomes of both P. subalpina and F. oxysporum, but P. subalpina also included a higher number of modules for cellulose and hemicellulose degradation (Additional file 5). The ectomycorrhizal species T. melanosporum and C. geophilum as well as the obligate biotrophic species B. graminis were separated due to the small number of PCWDEs (Fig. 8).

Discussion

In the present study we sequenced the genome of the root endophyte Phialocephala subalpina belonging to the Phialocephala fortinii s.l. – Acephala applanata species complex, one of the most prevalent species in forest ecosystems. By comparative genomic analysis with the gene inventory of 13 other ascomycetous species we show that P. subalpina links pathogenic and saprophytic lifestyles.

Genome expansion due to a large set of distinct genes in gene families

With a genome size of approximately 69.7 Mb, the P. subalpina genome is significantly larger than the average genome size of previously sequenced ascomycetous species [41]. Genome expansions can be caused by various events including (i) genome duplications, (ii) invasion of autonomous elements such as TEs and expansion of repetitive sequences such as microsatellites and tandem repeats into the genome, (iii) the number and length of introns and/or (iv) the expansion of the gene inventory [42, 43]. No evidence for large segmental duplications was observed in the P. subalpina genome by genome-wide alignments using CoCoNUT [44], and the frequency of repeats in general and TEs in particular was small, considering the genome size (Additional file 6). Proliferation of TEs is counteracted by three processes which act at different stages of TE proliferation. Repeat induced point mutations (RIP) act on the DNA level by introducing C to T transitions and reversing CpG to TpA dinucleotides in repeated regions [45, 46]. MIP (Methylation induced premeiotically) de-novo methylates repeated DNA sequences [47] and RNA interference (RNAi) suppresses TE proliferation either by heterochromatin assembly or small interfering RNAs, which silence TE transcripts [48–50]. Indirect evidence that a RIP mechanism is or was active in P. subalpina stems from the skewed dinucleotide distribution in repeat

Table 8 Results of FunCat enrichment analysis for the orthologous gene clusters including P. subalpina that were mainly restricted to pathogenic and saprophytic species

FunCat ID(a) | Description | Pathogen-related proteins | Saprophyte-related proteins | Total proteins in QuartetS clusters | Total proteins in Ps genome
01.05 | C-compound and carbohydrate metabolism | 12 | 6 | 18 | 2473
01.20 | secondary metabolism | 9 | 7 | 16 | 2652
01.06 | lipid, fatty acid and isoprenoid metabolism | 7 | 4 | 11 | 1387
20.03 | transport facilities | 7 | 3 | 10 | 1216
11.02.03.04 | transcriptional control | 7 | 2 | 9 | 923
20.01.03 | C-compound and carbohydrate transport | 5 | 4 | 9 | 504
16.01 | protein binding | 6 | 2 | 8 | 1904
16.03.01 | DNA binding | 6 | 2 | 8 | 577
20.09.18.07 | non-vesicular cellular import | 5 | 3 | 8 | 428
32.05.05 | virulence, disease factors | 5 | - | 5 | 350
01.06.06.11 | tetracyclic and pentacyclic triterpenes (cholesterin, steroids and hopanoids) metabolism | - | 2 | 2 | 173

(a) See http://mips.helmholtz-muenchen.de/funcatDB/
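The idea behind such an enrichment table can be sketched with a one-sided hypergeometric test. This is an illustration only: the section does not state which statistic the FunCat enrichment used, so the choice of test is an assumption. The category counts (18 proteins in the selected clusters, 2,473 in the genome) come from Table 8 and the genome size of 20,173 gene models from the text, whereas `SUBSET_SIZE` is a made-up placeholder for the total number of proteins in the selected clusters, which is not given here.

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement."""
    upper = min(successes, draws)
    log_denominator = log_comb(population, draws)
    return sum(
        exp(log_comb(successes, i)
            + log_comb(population - successes, draws - i)
            - log_denominator)
        for i in range(k, upper + 1)
    )

GENOME_SIZE = 20173      # gene models in P. subalpina (from the text)
CATEGORY_IN_GENOME = 2473  # FunCat 01.05 proteins in the genome (Table 8)
CATEGORY_IN_SUBSET = 18    # FunCat 01.05 proteins in the clusters (Table 8)
SUBSET_SIZE = 120          # hypothetical placeholder, NOT from the paper

p_value = hypergeom_upper_tail(
    CATEGORY_IN_SUBSET, GENOME_SIZE, CATEGORY_IN_GENOME, SUBSET_SIZE
)
print(f"P(X >= {CATEGORY_IN_SUBSET}) = {p_value:.3f}")
```

Because the subset size is invented, the printed probability has no biological meaning; the sketch only shows how the two count columns of the table would enter such a test.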

Table 9 Enrichment of GO terms for the overrepresented InterPro accessions encoded in P. subalpina (each InterPro accession was only considered once per gene model)

GO ID | GO description | Number of InterPro accessions | Total proteins in Ps genome
NA | no GO annotation | 411 | 6426
GO:0055114 | oxidation-reduction process | 103 | 1138
GO:0016491 | oxidoreductase activity | 54 | 1002
GO:0005975 | carbohydrate metabolic process | 43 | 368
GO:0008152 | metabolic process | 38 | 808
GO:0003824 | catalytic activity | 35 | 733
GO:0004553 | hydrolase activity, hydrolyzing O-glycosyl compounds | 30 | 235
GO:0016020 | membrane | 28 | 459
GO:0016021 | integral component of membrane | 26 | 912
GO:0005515 | protein binding | 24 | 1065
GO:0005524 | ATP binding | 22 | 501
GO:0006508 | proteolysis | 22 | 209
GO:0055085 | transmembrane transport | 19 | 913
GO:0008270 | zinc ion binding | 17 | 882
GO:0016787 | hydrolase activity | 14 | 208
GO:0050660 | flavin adenine dinucleotide binding | 14 | 184
GO:0020037 | heme binding | 13 | 280
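As a quick consistency check on these numbers, the shares quoted in the text (57% of the 963 overrepresented InterPro accessions mapped to GO terms, 43% without GO annotation) can be recomputed from the first table row. The snippet is a sketch for the reader; the variable names are illustrative and only a few table rows are copied in.

```python
OVERREPRESENTED = 963   # overrepresented InterPro accessions (from the text)
WITHOUT_GO = 411        # "no GO annotation" row of Table 9

with_go = OVERREPRESENTED - WITHOUT_GO
share_without = 100 * WITHOUT_GO / OVERREPRESENTED
share_with = 100 * with_go / OVERREPRESENTED
print(f"without GO: {WITHOUT_GO} ({share_without:.1f}%)")
print(f"with GO:    {with_go} ({share_with:.1f}%)")

# A few rows of Table 9: GO term -> number of distinct InterPro accessions.
go_rows = {
    "GO:0055114 oxidation-reduction process": 103,
    "GO:0016491 oxidoreductase activity": 54,
    "GO:0005975 carbohydrate metabolic process": 43,
    "GO:0020037 heme binding": 13,
}
# Rank the GO terms by their number of InterPro accessions, highest first.
ranked = sorted(go_rows.items(), key=lambda item: item[1], reverse=True)
for term, n_accessions in ranked:
    print(f"{n_accessions:4d}  {term}")
```

The 411 accessions without GO annotation correspond to about 42.7% of 963, which rounds to the 43% given in the text, leaving 552 accessions (about 57%) with at least one GO term.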

regions. RIP is active during the sexual cycle [51] but no sexual stage is known in P. subalpina. However, several lines of evidence suggest that sexual reproduction regularly occurs as (i) most PAC populations showed no gametic disequilibrium, (ii) mating types do not deviate from a 1:1 ratio in PAC populations and (iii) strong purifying selection was recorded in the mating type loci [52]. Moreover, teleomorphs are known for phylogenetically closely related species such as Phaeomollisia piceae or Mollisia spp. [53]. Therefore, it seems likely that field studies have overlooked the teleomorph of PAC species so far [52].

Common to the RIP/MIP process is that cytosine DNA methyltransferases of the Dnmt1 family play a key role [45, 54]. Homologs of cytosine DNA methyltransferases were present in P. subalpina including two gene models closely related to N. crassa RiD and one protein related to N. crassa Dim2. A fourth gene model found in P. subalpina (PAC_02147) was closest related to MASC2 of Ascobolus immersus which was placed in a cluster exclusively with basidiomycete species in the study of Amselem et al. [54]. However, additional analysis showed that also other Leotiomycete species included MASC2 homologs (B. cinerea

Fig. 7 Characterization of P. subalpina based on overrepresented InterPro accessions for pathogens or saprophytes. Principal component analysis (PCA) based on the abundance of InterPro accessions either overrepresented in pathogenic or saprophytic species. Figure 7a. PCA based on 51 InterPro accessions >2x overrepresented in saprophytic species. Figure 7b. PCA based on 75 InterPro accessions >2x overrepresented in pathogenic species. Abbreviations of species are given in Table 2. Color code: green: mycorrhizal species, red: plant pathogens, purple: soil saprotrophs and blue: P. subalpina
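The ">2x overrepresented" selection mentioned in this caption can be sketched as a simple fold-change filter. The exact definition used in the study is not given in this section, so the formula below (mean count in a lifestyle group divided by the mean count in the remaining species, with a pseudocount to avoid division by zero) is an assumption, and the accession names and counts are invented toy values.

```python
from statistics import mean

def overrepresented(counts, group, threshold=2.0, pseudocount=1.0):
    """Return {accession: fold} for accessions whose fold exceeds `threshold`.

    fold = (mean count inside `group` + pseudocount)
           / (mean count in all other species + pseudocount)
    """
    selected = {}
    for accession, per_species in counts.items():
        inside = mean(per_species[s] for s in group)
        outside = mean(v for s, v in per_species.items() if s not in group)
        fold = (inside + pseudocount) / (outside + pseudocount)
        if fold > threshold:
            selected[accession] = round(fold, 2)
    return selected

# Hypothetical counts of gene models per InterPro accession and species.
counts = {
    "IPR_A": {"Bc": 12, "Ssc": 10, "Af": 3, "Tr": 2},
    "IPR_B": {"Bc": 4, "Ssc": 5, "Af": 5, "Tr": 4},
    "IPR_C": {"Bc": 0, "Ssc": 1, "Af": 9, "Tr": 7},
}
pathogens = {"Bc", "Ssc"}
saprophytes = {"Af", "Tr"}

pathogen_hits = overrepresented(counts, pathogens)
saprophyte_hits = overrepresented(counts, saprophytes)
print("pathogen-enriched:", pathogen_hits)
print("saprophyte-enriched:", saprophyte_hits)
```

The abundances of the accessions selected in this way for each lifestyle group would then form the input matrix of a PCA such as the one shown in the figure.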

Fig. 8 PCA based on CAZyme modules involved in plant cell wall degradation (PCWDEs). Principal component analysis based on the abundance of CAZyme modules involved in cellulose, hemicellulose, pectin, and cutin breakdown. The sum of the number of CAZyme modules per compound was used for the analysis. Abbreviations of species are given in Table 2. Color code: green: mycorrhizal species, red: plant pathogens, purple: soil saprotrophs and blue: P. subalpina

(CCD54489), Sclerotinia borealis (ESZ90943) and M. brunnea (XP_007291083)) adding some additional notable exceptions of ascomycetous proteins in this cluster. Beside the indirect evidence of RIP/MIP, P. subalpina also included RNAi key enzymes such as Dicer, Argonaute and RdRP in multiple copies. Based on these findings we hypothesize that P. subalpina has a well-developed arsenal of defense mechanisms in place to counteract the proliferation of TEs, which may explain the comparatively low frequencies of TEs.

No significant differences in intron length and the average intergenic distance were observed. However, the gene inventory was one of the largest among ascomycetes with 20,173 annotated gene models resulting in 2,500 to >10,000 more gene models than in other ascomycetes. In the light of the broad host range and the broad geographical distribution of species belonging to the P. fortinii s.l. – A. applanata species complex [1, 3, 5, 10] an enlarged gene repertoire could be expected. However, even compared to other fungal species with broad host ranges such as S. sclerotiorum, B. cinerea and F. oxysporum the number of gene models is large [55, 56]. The high numbers of gene models may be the result of the annotation of a large fraction of very short gene models [55] or gene models including TE fragments. In P. subalpina, no significant deviation in protein length distributions for low identity genes was observed and a large fraction of the low identity genes were covered by ESTs. Moreover, particular attention was paid to mask TEs. Therefore, the expansions in the gene inventory of P. subalpina do not result from annotation artefacts.

Several factors contributed to the high number of gene models found in P. subalpina. First of all, P. subalpina showed a slightly higher fraction of putative paralogous genes than observed on average for the other 13 ascomycetous species. F. oxysporum showed the highest fraction of putative paralogous genes and this observation is due to large segmental genome duplication [56]. Secondly, a significant number of gene models in P. subalpina were species-specific as they showed no significant hits in SIMAP analysis. The high fraction of species-specific genes may be the result of the lifestyle of P. subalpina as well as the missing genome data of closely related species in the Loramyces – Vibrissea clade [53, 57]. Indeed, if the supposedly species-specific genes of P. subalpina were blasted against the recently announced genome of the closely related P. scopiformis [37], 28% of the gene models showed significant blast hits (>1.0 E-14). Thirdly, significantly more InterPro accessions were annotated in P. subalpina including 13,074 gene models than in any other species used for comparison. The high number of annotated InterPro accessions in the gene inventory did, however, not result in an expanded functional catalogue as the number of distinct InterPro accessions per species reached a plateau at approx. 5,200–5,500 distinct accessions indicating that "more of the same" is present in P. subalpina.

Our analyses show that the gene inventory of P. subalpina was also expanded to some extent by HGT from non-fungal organisms. A systematic analysis of the acquired prokaryotic genes by 60 fungal genomes by Marcet-Houben & Gabaldón [58] showed that species in the exhibited a high incidence of inter-domain HGT including between 4 and 63 proteins per species. The 21 HGT events found in P. subalpina likely represent a lower limit of events and closer inspection of uncertain candidates will likely reveal additional HGT events. Moreover, HGT events from other fungal species were shown to be another significant source for HGT [59, 60]. The phylogenetically closest protein sequences often originated from species in the Burkholderiales or Actinomycetales that colonize soil and/or roots, i.e., they share their habitat with P. subalpina, which renders an HGT event likely.

In almost half of the cases, the HGT event was unique or occurred recently. In contrast, also older HGT events were detected that were characterized by the presence of multiple fungal species within a clade of non-fungal species [60]. Interestingly, O. maius and P. scopiformis shared several of the 21 studied HGT events. It is possible that the HGT event occurred in a common ancestor as both fungi are related to P. subalpina (Fig. 6a). Alternatively, it might be possible that these genes were introduced twice independently for O. maius as O. maius can also be found in similar habitats as P. subalpina. It is associated with roots of ericaceous shrubs and involved in the decomposition of sphagnum peat [61].

Genes transferred by HGT can drive important evolutionary innovations as shown for plant pathogens [60, 62–64]. This was also observed in P. subalpina as several proteins were hydratases or peptidases. Notably, two β-lactamases were included that are involved in the detoxification of β-lactam antibiotics [65] and may result in a competitive advantage against other microbes.

The chameleonic genome of P. subalpina

The ecological role of members of the P. fortinii s.l. – A. applanata species complex is still poorly understood. They were described as beneficial, neutral or pathogenic for different hosts, growing conditions and fungal strains [5, 20] and even a role in mycorrhizal associations was hypothesized [21]. In order to shed light on the lifestyle of P. subalpina we compared the genome against 13 genomes of other ascomycetous species with different lifestyles. All our analyses showed that the gene inventory of P. subalpina supports multiple lifestyles. First, P. subalpina shared more putative orthologous genes with the two ectomycorrhizal species included in the analysis than any other species indicating some affinities with the two species. Nevertheless, an important difference is evident. EcM fungi as well as obligate biotrophic pathogens are generally characterized by a reduced gene inventory especially for plant cell wall degrading enzymes (PCWDEs) [32]. However, P. subalpina encodes a high number of PCWDEs. Moreover, several EcM fungi and obligate biotrophic pathogens show genome expansions due to TE proliferation [30, 66, 67] which was absent in P. subalpina.

In contrast to EcM fungi, saprophytes and necrotrophic/hemibiotrophic pathogens are well endowed with enzymes involved in the degradation of plant material [32]. The main difference between these two groups is that the pathogenic species must have a specific gene inventory allowing them to invade hosts and overcome plant immune response, i.e., pathogen-associated molecular patterns (PAMP) or damage-associated molecular patterns (DAMP) [68, 69]. Whereas necrotrophic pathogens kill the host tissue by secreting effectors like toxins and/or proteins, hemibiotrophic pathogens grow intracellularly and form specialized structures such as haustoria for nutrient uptake [70]. Recent comparative genome analysis showed that there are only few genes associated with plant pathogens that are absent in non-pathogens [33, 71]. However, domains overrepresented in necrotrophic/hemibiotrophic pathogenic species compared to saprophytes were identified [71]. When our dataset was enriched for overrepresented InterPro accessions in pathogens only few of the accessions were in accordance with Soanes et al. [71]. However, a closer look showed that accessions listed by Soanes et al. [71] were also overrepresented in our dataset, yet with a factor <2x. Irrespective of which InterPro accession list was used for analysis, P. subalpina was placed close to the pathogenic species (Fig. 7b and Additional file 7), indicating the robustness of the analysis. The list included genes with protease/peptidase domains, cutinases, pectinases, genes including a necrosis inducing domain or genes with chitin-binding modules. Many of these gene classes were shown to play a pivotal role in plant-pathogen interactions [68, 70]. For example, PAMP-induced host chitinases can either be overcome by the action of secreted proteases [72] or the secretion of LysM effectors that may be coupled with a chitin-binding module [73, 74]. The finding that P. subalpina includes the repertoire of pathogenic species fits well with the recent host-fungus interaction studies showing that PAC strains behave along the antagonism-mutualism continuum and are localized rather towards antagonistic interactions as the colonization results in reduced biomass accumulation of the host [22]. However, strong strain-specific differences were observed in the outcome of the interaction, with some strains killing the majority of the seedlings and others only marginally affecting the host [22], and future studies are needed to understand the underlying mechanisms. Despite the pathogenic gene repertoire observed in P. subalpina, host defense mechanisms such as lignituber formation and cell wall appositions are rarely observed during the invasion of PAC strains indicating that PAC can manipulate/suppress the plant immune response [25]. Although the precise mechanisms by which P. subalpina suppresses host induced defense mechanisms are unknown, effectors such as small secreted proteins (SSPs) predicted in the genome of P. subalpina may be candidates as they were shown to function as effectors in plant-fungal interactions [70, 75], and genome-wide differential gene expression studies during host colonization will help to identify possible effectors.

Beside the pathogen-related gene repertoire, our analysis shows that P. subalpina has also affinities with saprophyte-specific genes indicating that the species includes the signature of both lifestyles in the genome. A very similar positioning in the analysis was observed for F. oxysporum. Indeed, F. oxysporum not only includes the >70 pathogenic variants but non-pathogenic strains of F. oxysporum were also isolated as endophytes from asymptomatic roots [76–78].

Both species share a high number of β-lactamase/β-lactamase-related genes with saprophytes involved in the detoxification of β-lactam antibiotics. β-lactamases are well known fungal defense effector proteins [79], and detoxifying β-lactam antibiotics helps to persist against antagonists. Besides the β-lactamase genes, those coding for sulfatases and different hydrolases were also enriched in saprophytic species including P. subalpina and F. oxysporum. In contrast to most pathogenic species, P. subalpina showed also an enlarged repertoire of PCWDEs involved in cellulose and hemicellulose breakdown as well as proteins with auxiliary activities supposed to be involved in lignin

breakdown. Indeed, a strain of P. fortinii s.l. was shown to cause soft rot in autoclaved wood of beech and conifer species indicating that members of the PAC can degrade lignin, cellulose and hemicellulose [1].

Why does P. subalpina stand out of the crowd?

The answer of two fundamental questions is still pending: (i) why are PAC species so amazingly successful in colonizing their hosts and (ii) does the host also benefit from the colonization by PAC. At least the second question may be answered by the genome sequence of P. subalpina.

Although often hypothesized, endophytic fungi of forest trees were rarely shown to be mutualistic for their hosts [4]. However, the closely related needle endophyte P. scopiformis was shown to produce rugulosin, a potent secondary metabolite against herbivory [80, 81]. Similarly, interaction studies using pathogens (Phytophthora plurivora and Elongisporangium undulatum), P. subalpina strains and P. abies seedlings showed that some of the P. subalpina strains effectively reduced mortality and disease intensity caused by the pathogens [82]. In addition, secondary metabolites were identified in PAC species that inhibited Phytophthora spp. [82], and the genome of P. subalpina encodes a high number of secondary metabolite key enzymes. Some of these are compatible with known pathways and products, such as melanin or ferrichrome-like siderophores. For example, enzymes PAC_05248 and PAC_13158 resemble SidC-like (=type II) and NPS1-like (=type IV) synthetases, i.e., two distinct types of ferrichrome-like siderophore-producing enzymes [83]. Also, two non-reducing type I PKSs of P. subalpina flanked by hydroxynaphthalene reductases likely involved in the melanin synthesis were recognized. Whereas gene PAC_1135 was placed in the same putative orthologous gene cluster as PKS1 of G. lozoyensis, gene PAC_07895 was placed in a second orthologous gene cluster and showed high similarities with the alm gene of A. alternata/the PksP/Alb1 gene of A. fumigatus. All three proteins were experimentally shown to be involved in the melanin production [84–86]. Several Leotiomycete species included genes in both clusters (i.e. B. cinerea, S. sclerotiorum, M. brunnea), whereas melanized non-helotialean species and G. lozoyensis were included in one of the clusters (i.e. C. geophilum, O. maius and C. globosum). Moreover, no gene models of non-melanized species were included in one of these two clusters. To the best of our knowledge this is the first report for the duplication of the melanin pathway in ascomycetous fungi, although the redundancy of important secondary metabolite genes was reported previously [87, 88]. Still, the products of many of the secondary metabolite key enzymes and clusters remain unresolved. However, they may represent the chemical language of P. subalpina to interact via small molecules with host plants and other microorganisms in the rhizosphere.

Although the biomass and turnover rates of fine roots in forest ecosystems largely depend upon tree species, age of forest stands and climate, estimates indicate that as much as 30% of the net primary production is used for fine root production [89]. In an Abies alba stand for example >6.0 t/ha fine roots biomass with a cumulative length of >20,000 km/ha was estimated [90] showing the importance of fine roots as carbon source. Given the fact that species of the P. fortinii s.l. – A. applanata species complex dominate many of the endophytic assemblages in fine roots of temperate and boreal forests [10], PAC species have access to a very substantial carbon source. P. subalpina can already colonize the future carbon source by entering healthy fine roots and may then switch to the saprophytic lifestyle as soon as the roots die off. Indeed, the signature of both lifestyles was observed in the genome although studies about the importance of PAC species in the fine root turnover are missing. Moreover, the fungus can escape the highly competitive soil community by colonizing the roots [91].

Conclusions

The analysis of a globally distributed root endophyte allowed for detailed insights in the gene inventory and genome organization of a yet largely neglected group of organisms. Our analysis showed that the genome of P. subalpina has a versatile genome including genes for both a pathogenic and saprophytic lifestyle but showed also some affinities with ectomycorrhizal species. The degree of pathogenicity among strains of P. subalpina is high, as observed in F. oxysporum. In F. oxysporum pathogenicity is driven by mobile pathogenicity chromosomes [56]. Re-sequencing of multiple strains of P. subalpina will help identify the molecular basis of its pathogenicity. In addition, one central question will be to understand the evolutionary trajectory of PAC, i.e. whether PAC will become more pathogenic in future, which would have a severe effect on forest ecosystem health, or whether the PAC-host interaction gets less antagonistic.

Methods

Selection of Phialocephala subalpina strain and DNA isolation

Strain UAMH 11012 (UAMH Centre for Global Microfungal Biodiversity, Toronto, Canada) was used for genome sequencing. The strain was originally isolated as single hyphal tip culture from P. abies fine roots in an undisturbed forest in Switzerland [14] and was classified using multiple classes of molecular markers [16, 38, 92]. The strain was grown in malt extract broth (50 ml 2% (w/v)) for 10 days at 20 °C under constant shaking.

Mycelium was harvested by filtration, lyophilized, and total DNA was isolated using a CTAB-based protocol [93]. Strain identity was verified using microsatellite analysis [94] before the genome sequencing.

Sequencing, assembly and gap closing

A whole genome shotgun strategy (WGS) on the Roche/454 GS FLX (454) was used and sequencing was performed at the Functional Genomic Centre of University Zürich/ETH Zürich (FGCZ). In total, 1.3 million shotgun reads as well as 2.3 million reads from one 3 kb paired-end library, 6.2 million reads of three 8 kb paired-end libraries and 309,909 reads of a 20 kb library were included in the assembly. The assembly was performed using newbler 2.5 with a minimum overlap of 50 bases and 98% sequence similarity. We noticed that newbler tends to open gaps due to the high number of paired-end reads in the assembly. Therefore, reads mapping to both sides of the gaps were identified and gaps were closed after manual validation.

454 sequencing of a normalized EST library

The same strain was used to generate a normalized EST library. The fungus was pre-cultivated in 50 ml of 2% malt broth (20 g l⁻¹ malt extract; Difco) for 14 days at 20 °C under constant shaking. Then, the mycelium was homogenized with a blender for 30 s and 5 ml of the homogenized mycelium was transferred to new 50 ml of 2% malt broth (20 g l⁻¹ malt extract; Hefe Schweiz). After 48 h the actively growing mycelium was harvested and immediately frozen in liquid nitrogen. Total RNA was isolated from approx. 75 mg fresh mycelium using the RNeasy plant mini kit (Qiagen, Hombrechtikon, Switzerland). Full-length cDNA was synthesized using the MINT kit (Evrogen, Moscow, Russia) with a degenerated poly-T primer (5′-AAGCAGTGGTATCAACGCAGAGTAC(T)4G(T)9C(T)10VN-3′) during the first strand cDNA synthesis [95] and polTM1 (5′-AAGCAGTGGTATCAACGCAGAGTACTTTTGTCTTTTGTTCTGTTTCTTTTVN-3′) for the generation of dsDNA. The cDNA was normalized using the TRIMMER kit (Evrogen) and the library was sequenced on the 454. Resulting reads were filtered for chimeras and then a whole transcriptome assembly was performed in newbler 2.3 with a minimum overlap of 50 bases and 98% sequence similarity.

Repeat library construction

A repeat library was constructed based on the final assembly of the genome (see Additional file 8). In brief, putative repeat sequences were derived from RepeatScout analysis [96]. Low-complexity sequences and microsatellites were removed using the default filtering options in RepeatScout. In addition, sequences <50 bases and with <10 hits on the genome were excluded from further analysis. The remaining sequences were clustered using blastclust (-S 90 -L 0.9 -b F -p F) and only one sequence per cluster was used for further analysis. Blastx was used to exclude any sequences in the library not belonging to TEs (i.e. HET domain containing proteins or ubiquitin-like proteins). This draft library was then mapped against the genome using RepeatMasker and consensus sequences of complete TEs were derived. Finally, sequences were classified according to the systematic of Wicker et al. [97]. The manually curated TE library was used for genome masking before annotation.

Annotation of the P. subalpina genome

The annotation strategy is presented in Additional file 9. In brief, a reference dataset of 1,089 gene models/proteins covered by full-length EST sequences was established and used as training dataset for Augustus [98]. Ab-initio gene prediction was performed on the masked genome using GeneMark-ES [99], Augustus [100] and FGENESH (Neurospora and Ustilago matrices). GenomeThreader [101] was used to calculate spliced alignments for P. subalpina ESTs and protein data from related fungal species. The program was run based on a P. subalpina specific splicing model trained using the 454 EST dataset with the software BSSM4GSQ [102]. For the training of the splicing model, ESTs were chosen that showed a high coverage >98%, a sequence similarity of 100% and had only one hit on the genome, resulting in >4,000 intron/exon junctions.

A total of 28,092 assembled ESTs as well as 27,819 non-assembled but well matching 454 singleton reads were mapped. Protein sequences of Rhynchosporium secalis, S. sclerotiorum, B. cinerea, F. graminearum, N. crassa, Saccharomyces cerevisiae were mapped using GenomeThreader. Finally, Jigsaw [103] was trained based on the 1,089 high confident gene models and used to calculate the best gene model using all the available evidence (predictors, ESTs, and trans-alignments). Subsequently, all gene models were manually curated in Apollo [104] and functional annotation was performed in PEDANT [105].

Estimating the completeness of the P. subalpina genome and classification of gene models

The completeness of the P. subalpina genome and annotation was assessed by mapping two separate highly conserved core gene sets including 248 and 246 proteins respectively [106, 107]. In addition, the fraction of successfully and non-mapped ESTs was analyzed.

Gene models were classified based on the identity against the best hit in the Similarity Matrix of Proteins database (SIMAP) [39]. Proteins with identities ≥30% were considered as confident gene predictions. The

quality of the low identity genes (proteins with <30% identity with a protein sequence in SIMAP) was assessed by mapping the protein sequences against the assembled 454 EST dataset using GenomeThreader and analyzing the coverage of the gene model by ESTs. In addition, RNA-Seq data that became available after genome sequencing and annotation [108] (http://www.ebi.ac.uk/ena/data/view/PRJEB12610) were mapped against the genome using tophat (http://ccb.jhu.edu/software/tophat/index.shtml), discarding reads with bad mapping quality. Read coverage was calculated using coverageBed (http://bedtools.readthedocs.org/en/latest/content/tools/coverage.html) for each exon within the coding sequence. The coverage graph was generated in R using the ggplot2 package [109].

Presence of RNAi pathway and analysis for the presence of RIP mechanism
A reference dataset of key proteins in the RNAi pathway of Neurospora crassa (ARGONAUTE, DICER, and RNA-dependent RNA polymerases (RdRP)) was used to search for similar genes in P. subalpina as described in Laurie et al. [110]. In addition, the presence of repeat-induced point mutations (RIP) in the genome of P. subalpina was analyzed. As RIP results in C-to-T transitions at repetitive loci [45, 111], we analyzed dinucleotide abundances of all predicted interspersed repeats and of non-repetitive control sequences using RIPCAL [112]. Overrepresented dinucleotides were identified by determining the fold change of the dinucleotide abundance between repeats and controls. S. sclerotiorum was included as a reference in the RIP analysis. Gene, repeat, and GC content were calculated using a sliding window analysis (window size: 1,000 bp, step size: 1,000 bp) and plotted for each scaffold. In addition, the genome of P. subalpina was mined for cytosine DNA methyltransferase genes of the Dnmt1 family involved in RIP and classified as described in Amselem et al. [54].

Analysis of horizontal gene transfer from non-fungal species
A total of 163 proteins showed the best hit with non-fungal taxa (124 with bacteria, 20 with plants and 17 with metazoa) when mapped against the SIMAP database in PEDANT. The possibility of HGT for these genes was evaluated. In a first step, a BLAST search against the nr database (query coverage ≥40%; identities ≥20%; best 1,000 hits) was done and the taxonomic distribution and similarities of the hits were analyzed in R. Genes were selected as HGT candidates if (i) they showed a biased taxonomic distribution of the hits (<15% fungal hits) and/or (ii) the fungal hits showed smaller bit scores in BLAST searches than non-fungal hits. Candidate genes were further analyzed using a phylogenetic approach. The full protein datasets of the ≤1000 best blastp hits against the nr database for each candidate protein were downloaded. In addition, each candidate gene was also blasted against the genome data of phylogenetically closely related fungal genomes (G. lozoyensis, B. cinerea, M. brunnea, S. sclerotiorum). Protein sequences for each candidate gene were clustered with USEARCH v8.0.1517 (http://www.drive5.com/usearch/) [113] using the -cluster_fast option at an identity threshold of 0.95 (pre-sorted by length). If the resulting number of clusters exceeded 40, the threshold was sequentially reduced by 0.05 until cluster numbers were ≤ 40, or the threshold reached 0.5. This procedure was only applied to non-fungal sequences. Fungal BLAST hits were clustered at 0.95 if the number of clusters was ≤ 40. If not, the threshold was adjusted as described. The representative sequence of each cluster was used for phylogenetic analysis. Protein sequences including the P. subalpina sequence were aligned using MAFFT [114] (E-INS-i method) and the alignment trimmed with trimAl (http://trimal.cgenomics.org/trimal) [115] using the -strict setting. Maximum likelihood phylogenetic trees were inferred using FastTree 2.1 (http://meta.microbesonline.org/fasttree/) [116] with default settings. All trees were deposited in TreeBase (http://purl.org/phylo/treebase/phylows/study/TB2:S20196). The DNA-dependent RNA polymerase II (RPB2) sequence that is routinely used in phylogenetic studies was used as control. Due to its high conservation, only proteins with > 75% identity were included and clustering was done at a 90% threshold.

Secondary metabolism
Key proteins involved in the secondary metabolism of P. subalpina were searched by using conserved InterPro motifs of polyketide synthases (PKSs), non-ribosomal peptide synthetases (NRPSs) and related enzymes such as terpene synthases (TPC), and dimethyl allyl tryptophan synthases (DMATSs) followed by manual inspection of the genes to verify the domain arrangement in the case of PKSs and NRPSs. In addition, putative secondary metabolite clusters were identified by searching for genes encoding tailoring enzymes (e.g., acyl-, and methyltransferases, oxidoreductases, cytochrome P450 enzymes) up- and downstream of genes for key enzymes and searching for and promotors [117].
Gene clusters involved in melanin synthesis were predicted using (i) protein sequences of PKS genes experimentally shown to be involved in melanin synthesis such as G. lozoyensis PKS1 (AAN59953.1; [85]), Alternaria alternata alm (BAK64048.1; [84, 118]) and Alb1/PksP of Aspergillus fumigatus (XP_756095.1; [86]), (ii) additional genes involved in the melanin synthesis pathway such as scytalone dehydratases (BC1G_144888) and hydroxynaphthalene reductases (BC1G_04230) [55] and (iii) by comparing orthologous gene clusters including candidate PKS derived from QuartetS analysis for the 14

ascomycetous species with the presence of melanization in the respective species. In addition, the NRPS genes putatively involved in siderophore synthesis were annotated.

Comparative genome analysis
Comparative genome analyses were performed against a selection of published genomes of species showing different lifestyles (i.e. saprophytes, bio- and necrotrophic parasites and mycorrhizal species, Table 1). Special emphasis was put on selecting ascomycetous genomes and, wherever possible, genomes that are closely related to P. subalpina. All genomes used for comparative analysis were functionally annotated in PEDANT.
Putative orthologous gene clusters were analyzed using QuartetS [119] and the total number of clusters in which a specific species was present as well as pairwise shared clusters was recorded. Orthologous gene clusters enriched in pathogenic species were searched (four out of the five pathogenic species show entries for the respective cluster and ≤1 species of the six saprotroph species and ≤1 species of the two mycorrhizal species are present in the clusters respectively). Gene clusters enriched in saprophytes were searched using the same strategy. Both matrices were then subjected to principal component analysis using the vegan package in R [120] to analyze the position of P. subalpina compared to the other species. Moreover, FunCat terms [121] enriched in these lifestyle-enriched clusters were determined by mapping P. subalpina geneIDs against the FunCat annotations and selecting the ten most frequent FunCat terms (any geneID/FunCat category was only considered once). In addition, putative orthologous gene clusters shared among the two mycorrhizal species T. melanosporum and C. geophilum with O. maius and P. subalpina were analyzed separately, as the limited number of mycorrhizal species did not allow lifestyle-enriched clusters to be properly defined.
In a second step, the non-redundant portion of InterPro accessions per gene model, i.e. counting each InterPro accession per gene model only once, was analyzed. Besides some general statistics such as the number of distinct InterPro accessions within a species or different lifestyles or the total number of InterPro accessions, the number of significantly over- or underrepresented InterPro accessions in P. subalpina was determined by comparing their abundance in P. subalpina against the average observed in the 13 genomes using a Z-test [122]. Significantly over- and underrepresented InterPro accessions were then mapped against the gene ontology annotations (http://www.ebi.ac.uk/interpro/download.html) and enriched for GO terms. Moreover, InterPro accessions >2x overrepresented in pathogenic species and saprophytic species were mined and the resulting matrices were subjected to principal component analysis as described above.
In a third step, carbohydrate-active enzymes were annotated using the CAZyme expert annotation pipeline [123] and the numbers of enzymes in the diverse CAZyme families were compared, with special emphasis on the CAZyme families likely involved in cellulose (GH6, GH7, GH45), hemicellulose (GH10, GH11, GH26, GH31, GH67, GH115, GH134), pectin (GH28, GH53, GH78, GH79, GH88, GH105, GH106, GH127, PL1, PL3, PL4, PL9, PL11, CE8, CE12), and cutin layer breakdown (CE5). Moreover, a fourth category of CAZyme families was included that likely act on different substrates among those mentioned above (GH12, GH30, GH43, GH5, GH51, GH54, GH62, GH74, GH93). The total number of CAZyme modules per substrate class and species was calculated and subjected to principal component analysis as described above. In addition, proteins with auxiliary activities (AAs) that are hypothesized to be involved in the degradation of lignin were analyzed.

Additional files
Additional file 1: Mapping statistics for the assembled 454 ESTs. (PDF 197 kb)
Additional file 2: Validation of low identity gene models. (PDF 190 kb)
Additional file 3: Putative secondary metabolite clusters identified in the P. subalpina genome (XLSX 12 kb)
Additional file 4: General statistics on the presence of InterPro accessions in the analyzed genomes. (XLSX 10 kb)
Additional file 5: CAZyme modules related to PCWD used for principal component analysis. (XLSX 12 kb)
Additional file 6: General genome statistics for the species included in the present study. (XLSX 15 kb)
Additional file 7: PCA analysis based on InterPro accessions. Placement of the 13 ascomycete species and P. subalpina in PCA based on InterPro accessions described in Soanes et al. 2008 to be enriched in pathogens. (PDF 359 kb)
Additional file 8: Strategy to construct the manually-curated repeat library. (PDF 114 kb)
Additional file 9: Overview of the annotation strategy applied. (PDF 120 kb)

Abbreviations
CAZyme: Carbohydrate-active enzymes; HGT: Horizontal gene transfer; PAC: Phialocephala fortinii s.l. – Acephala applanata species complex; PCWDE: Plant cell wall degrading enzymes; TE: Transposable elements

Acknowledgments
We thank Marzanna Küenzler and Weihong Qi (FGCZ, Zürich) for running the 454/FLX sequencing and Thomas Wicker for help in the classification and construction of the TE library. We also thank Andrin Gross and Ottmar Holdenrieder (Institute of Integrative Biology (IBZ), ETH Zurich) for helpful comments on a previous version of the manuscript.

Funding
This study was partially funded by Vontobel Stiftung, Zürich to CRG. The funding agency had no influence on the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
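
Illustrative note on the clustering step of the horizontal gene transfer analysis. The threshold relaxation described in the Methods (start at an identity of 0.95, lower it by 0.05 while more than 40 clusters remain, stop at 0.5) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the script used in the study: `choose_identity_threshold` and the `count_clusters` callback are hypothetical names, and the callback merely stands in for one USEARCH -cluster_fast run at the given identity.

```python
from typing import Callable


def choose_identity_threshold(
    count_clusters: Callable[[float], int],
    start: float = 0.95,
    step: float = 0.05,
    floor: float = 0.50,
    max_clusters: int = 40,
) -> float:
    """Relax the identity threshold until few enough clusters remain.

    count_clusters is a hypothetical callback that clusters the sequences
    at the given identity and returns the number of clusters obtained.
    """
    # Work in integer hundredths so repeated subtraction cannot drift.
    current = round(start * 100)
    step_i = round(step * 100)
    floor_i = round(floor * 100)
    # Lower the threshold while there are too many clusters and the
    # floor has not been reached yet.
    while count_clusters(current / 100) > max_clusters and current > floor_i:
        current = max(current - step_i, floor_i)
    return current / 100


# Toy stand-in for a clustering run: many clusters above 0.8 identity.
def toy_counts(threshold: float) -> int:
    return 100 if threshold > 0.8 else 30


chosen = choose_identity_threshold(toy_counts)
```

With the toy callback the loop stops at 0.8, the first threshold that yields 40 clusters or fewer. If the cluster count never drops to 40, the function returns the floor of 0.5, matching the stopping rule given in the text.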

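
Illustrative note on the lifestyle-enriched clusters. The selection rule given under "Comparative genome analysis" for pathogen-enriched orthologous gene clusters (entries from four of the five pathogenic species, at most one of the six saprotroph species and at most one of the two mycorrhizal species) amounts to a set filter. The sketch below rests on several assumptions: the function name and the species labels are placeholders, "four out of five" is read as "at least four", and the input is assumed to be a mapping from cluster ID to the set of species present, which is not necessarily the format QuartetS produces.

```python
from typing import Dict, List, Sequence, Set


def lifestyle_enriched(
    clusters: Dict[str, Set[str]],
    focal: Set[str],
    other_groups: Sequence[Set[str]],
    min_focal: int = 4,
    max_per_other_group: int = 1,
) -> List[str]:
    """Return the IDs of clusters enriched in the focal lifestyle group."""
    selected = []
    for cluster_id, species in clusters.items():
        # Enough species of the focal lifestyle must be present.
        if len(species & focal) < min_focal:
            continue
        # Every other lifestyle group may contribute only a few species.
        if all(len(species & group) <= max_per_other_group
               for group in other_groups):
            selected.append(cluster_id)
    return selected


# Placeholder species labels (assumption); the real species are in Table 1.
pathogens = {"P1", "P2", "P3", "P4", "P5"}
saprotrophs = {"S1", "S2", "S3", "S4", "S5", "S6"}
mycorrhizal = {"M1", "M2"}

example = {
    "c1": {"P1", "P2", "P3", "P4", "S1"},        # four pathogens, one saprotroph
    "c2": {"P1", "P2", "P3", "P4", "S1", "S2"},  # two saprotrophs
    "c3": {"P1", "P2", "P3", "M1"},              # only three pathogens
}
enriched = lifestyle_enriched(example, pathogens, [saprotrophs, mycorrhizal])
```

Only cluster "c1" passes the filter in this toy input. The text states that saprophyte-enriched clusters were searched with the same strategy but gives no separate thresholds, so the defaults here should not be carried over to that case without checking.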
Availability of data and materials
The datasets supporting the conclusions of this article are available from the following repositories.
European Nucleotide Archive:
http://www.ebi.ac.uk/ena/data/view/JN031566
http://www.ebi.ac.uk/ena/data/view/FJOG01000001-FJOG01000205
Pedant Database:
http://pedant.helmholtz-muenchen.de/genomes.jsp?category=fungal.
CAZYme:
http://www.cazy.org/
TreeBase:
http://purl.org/phylo/treebase/phylows/study/TB2:S20196
In addition, datasets further supporting the conclusions of this article are included within the article and its additional files.

Authors' contributions
AD and CRG designed the research project. AD and CRG performed all experiments and collected sequence data. MS, MM, UG, RB, MH, BH, CMKS, DH, CRG implemented analytical tools and performed analysis. UG, BH, CMKS, DH, CRG wrote the manuscript. All authors have read and approved the final manuscript.

Authors' information
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Consent for publication
Not applicable.

Ethics approval and consent to participate
Not applicable.

Author details
1 Institute of Integrative Biology (IBZ), Forest Pathology and Dendrology, ETH Zürich, 8092 Zürich, Switzerland. 2 Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany. 3 Department of Genome-oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85354 Freising, Germany. 4 Interfaculty Bioinformatics Unit and Swiss Institute of Bioinformatics, University of Berne, Baltzerstrasse 6, 3012 Bern, Switzerland. 5 Architecture et Fonction des Macromolécules Biologiques (AFMB), UMR 7257 CNRS, Université Aix-Marseille, 163 Avenue de Luminy, 13288 Marseille, France. 6 DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA. 7 Friedrich-Schiller-Universität, Pharmazeutische Mikrobiologie, Winzerlaer Strasse 2, 07745 Jena, Germany. 8 Microsynth AG, Schützenstrasse 15, 9436 Balgach, Switzerland.

Received: 26 July 2016 Accepted: 2 December 2016

References
1. Sieber TN. Fungal root endophytes. In: Waisel Y, Eshel A, Kafkafi U, editors. Plant Roots: The hidden half. 3rd ed. New York and Basel: Marcel Dekker; 2002. p. 887–917.
2. Schulz BJE, Boyle CJC, Sieber TN. Microbial Root Endophytes. Berlin: Springer; 2006.
3. Addy HD, Piercey MM, Currah RS. Microfungal endophytes in roots. Can J Bot. 2005;83(1):1–13.
4. Sieber TN. Endophytic fungi in forest trees: are they mutualists? Fungal Biol Rev. 2007;21(2–3):75–89.
5. Grünig CR, Queloz V, Sieber TN, Holdenrieder O. Dark septate endophytes (DSE) of the Phialocephala fortinii s.l. – Acephala applanata species complex in tree roots – classification, population biology and ecology. Botany. 2008;86(12):1355–69.
6. Rodriguez RJ, White JF, Arnold AE, Redman RS. Fungal endophytes: diversity and functional roles. New Phytol. 2009;182(2):314–30.
7. Francis R, Read DJ. Mutualism and antagonism in the mycorrhizal symbiosis, with special reference to impacts on plant community structure. Can J Bot. 1995;73(S1):1301–9.
8. Johnson NC, Graham JH, Smith FA. Functioning of the mycorrhizal associations along the mutualism-parasitism continuum. New Phytol. 1997;135(4):575–85.
9. Schulz B, Boyle C. The endophytic continuum. Mycol Res. 2005;109(6):661–86.
10. Queloz V, Sieber TN, Holdenrieder O, McDonald BA, Grünig CR. No biogeographical pattern for a root-associated fungal species complex. Glob Ecol Biogeogr. 2011;20(1):160–9.
11. Piercey MM, Graham SW, Currah RS. Patterns of genetic variation in Phialocephala fortinii across a broad latitudinal transect in Canada. Mycol Res. 2004;108(8):955–64.
12. Zhang C, Yin L, Dai S. Diversity of root-associated fungal endophytes in Rhododendron fortunei in subtropical forests of China. Mycorrhiza. 2009;19(6):417–23.
13. Bruzone MC, Fontenla SB, Vohník M. Is the prominent ericoid mycorrhizal fungus Rhizoscyphus ericae absent in the Southern Hemisphere's Ericaceae? A case study on the diversity of root mycobionts in Gaultheria spp. from northwest Patagonia, Argentina. Mycorrhiza. 2015;25(1):25–40.
14. Grünig CR, Duò A, Sieber TN. Population genetic analysis of Phialocephala fortinii s.l. and Acephala applanata in two undisturbed forests in Switzerland and evidence for new cryptic species. Fungal Genet Biol. 2006;43(6):410–21.
15. Grünig CR, Sieber TN. Molecular and phenotypic description of the widespread root symbiont Acephala applanata gen. et sp. nov., formerly known as dark septate endophyte Type 1. Mycologia. 2005;97(3):628–40.
16. Grünig CR, Duò A, Sieber TN, Holdenrieder O. Assignment of species rank to six reproductively isolated cryptic species of the Phialocephala fortinii s.l.-Acephala applanata species complex. Mycologia. 2008;100(1):47–67.
17. Queloz V, Grünig CR, Sieber TN, Holdenrieder O. Monitoring the spatial and temporal dynamics of a community of the tree-root endophyte Phialocephala fortinii s.l. New Phytol. 2005;168(3):651–60.
18. Stroheker S, Queloz V, Sieber TN. Spatial and temporal dynamics in the Phialocephala fortinii s.l. – Acephala applanata species complex (PAC). Plant Soil. 2016;407(1):231–41.
19. McGill BJ, Etienne RS, Gray JS, Alonso D, Anderson MJ, Benecha HK, Dornelas M, Enquist BJ, Green JL, He FL, et al. Species abundance distributions: moving beyond single prediction theories to integration within an ecological framework. Ecol Lett. 2007;10(10):995–1015.
20. Zimmerman E, Peterson RL. Effect of a dark septate fungal endophyte on seed germination and protocorm development in a terrestrial orchid. Symbiosis. 2007;43(1):45–52.
21. Jumpponen A, Mattson KG, Trappe JM. Mycorrhizal functioning of Phialocephala fortinii with Pinus contorta on glacier forefront soil: Interactions with soil nitrogen and organic matter. Mycorrhiza. 1998;7(5):261–5.
22. Tellenbach C, Grünig CR, Sieber TN. Negative effects on survival and performance of Norway spruce seedlings colonized by fungal root endophytes are primarily isolate-dependent. Environ Microbiol. 2011;13(9):2508–17.
23. Tellenbach C, Grünig CR, Sieber TN. Suitability of quantitative real-time PCR to estimate the biomass of fungal root endophytes. Appl Environ Microbiol. 2010;76(17):5764–72.
24. Reininger V, Sieber TN. Mycorrhiza reduces adverse effects of dark septate endophytes (DSE) on growth of conifers. PLoS One. 2012;7(8), e42865.
25. Yu T, Nassuth A, Peterson RL. Characterization of the interaction between the dark septate fungus Phialocephala fortinii and Asparagus officinalis roots. Can J Microbiol. 2001;47(8):741–53.
26. Currah RS, Tsuneda A, Murakami S. Morphology and ecology of Phialocephala fortinii in roots of Rhododendron brachycarpum. Can J Bot. 1993;71(12):1639–44.
27. Peterson RL, Wagg C, Pautler M. Associations between microfungal endophytes and roots: do structural features indicate function? Botany. 2008;86(5):445–56.
28. O'Dell TE, Massicotte HB, Trappe JM. Root colonization of Lupinus latifolius Agardh. and Pinus contorta Dougl. by Phialocephala fortinii Wang & Wilcox. New Phytol. 1993;124(1):93–100.
29. Fernando AA, Currah RS. A comparative study of the effects of the root endophytes Leptodontidium orchidicola and Phialocephala fortinii (Fungi Imperfecti) on the growth of some subalpine plants in culture. Can J Bot. 1996;74(7):1071–8.
30. Wicker T, Oberhaensli S, Parlange F, Buchmann JP, Shatalina M, Roffler S, Ben-David R, Dolezel J, Simkova H, Schulze-Lefert P, et al. The wheat powdery mildew genome shows the unique evolution of an obligate biotroph. Nat Genet. 2013;45(9):1092–6.

31. Floudas D, Binder M, Riley R, Barry K, Blanchette RA, Henrissat B, Martínez AT, Otillar R, Spatafora JW, Yadav JS, et al. The Paleozoic Origin of Enzymatic Lignin Decomposition Reconstructed from 31 Fungal Genomes. Science. 2012;336(6089):1715–9.
32. Kohler A, Kuo A, Nagy LG, Morin E, Barry KW, Buscot F, Canback B, Choi C, Cichocki N, Clum A, et al. Convergent losses of decay mechanisms and rapid turnover of symbiosis genes in mycorrhizal mutualists. Nat Genet. 2015;47(4):410–5.
33. Ohm RA, Feau N, Henrissat B, Schoch CL, Horwitz BA, Barry KW, Condon BJ, Copeland AC, Dhillon B, Glaser F, et al. Diverse lifestyles and strategies of plant pathogenesis encoded in the genomes of eighteen Dothideomycetes fungi. PLoS Pathog. 2012;8(12), e1003037.
34. Schardl CL, Young CA, Hesse U, Amyotte SG, Andreeva K, Calie PJ, Fleetwood DJ, Haws DC, Moore N, Oeser B, et al. Plant-Symbiotic Fungi as Chemical Engineers: Multi-Genome Analysis of the Clavicipitaceae Reveals Dynamics of Alkaloid Loci. PLoS Genet. 2013;9(2):e1003323.
35. Schardl CL, Young CA, Moore N, Krom N, Dupont P-Y, Pan J, Florea S, Webb JS, Jaromczyk J, Jaromczyk JW, et al. Chapter Ten - Genomes of Plant-Associated Clavicipitaceae. In: Francis MM, editor. Advances in Botanical Research. Volume 70. London: Academic Press; 2014. p. 291–327.
36. Schardl CL, Leuchtmann A, Spiering MJ. Symbiosis of grasses with seedborne fungal endophytes. Annu Rev Plant Biol. 2004;55(1):315–40.
37. Walker AK, Frasz SL, Seifert KA, Miller JD, Mondo SJ, LaButti K, Lipzen A, Dockter RB, Kennedy MC, Grigoriev IV, et al. Full genome of Phialocephala scopiformis DAOMC 229536, a fungal endophyte of spruce producing the potent anti-insectan compound rugulosin. Genome Announc. 2016;4(1):e01768–01715.
38. Duo A, Bruggmann R, Zoller S, Bernt M, Grunig CR. Mitochondrial genome evolution in species belonging to the Phialocephala fortinii s.l. - Acephala applanata species complex. BMC Genomics. 2012;13:17.
39. Arnold R, Goldenberg F, Mewes H-W, Rattei T. SIMAP – the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage. Nucleic Acids Res. 2014;42(D1):D279–84.
40. Bleuler-Martinez S, Butschi A, Garbani M, Wälti MA, Wohlschlager T, Potthoff E, Sabotic J, Pohleven J, Lüthy P, Hengartner MO, et al. A lectin-mediated resistance of higher fungi against predators and parasites. Mol Ecol. 2011;20(14):3056–70.
41. Mohanta TK, Bae H. The diversity of fungal genome. Biol Proced Online. 2015;17:8.
42. Petrov DA. Evolution of genome size: new approaches to an old problem. Trends Genet. 2001;17(1):23–8.
43. Kelkar YD, Ochman H. Causes and consequences of genome expansion in fungi. Genome Biol Evol. 2012;4(1):13–23.
44. Abouelhoda M, Kurtz S, Ohlebusch E. CoCoNUT: an efficient system for the comparison and analysis of genomes. BMC Bioinformatics. 2008;9(1):476–93.
45. Galagan JE, Selker EU. RIP: the evolutionary cost of genome defense. Trends Genet. 2004;20(9):417–23.
46. Cambareri EB, Jensen BC, Schabtach E, Selker EU. Repeat-induced G-C to A-T mutations in Neurospora. Science. 1989;244:1571–5.
47. Goyon C, Faugeron G. Targeted transformation of Ascobolus immersus and de novo methylation of the resulting duplicated DNA sequences. Mol Cell Biol. 1989;9(7):2818–27.
48. Obbard DJ, Gordon KHJ, Buck AH, Jiggins FM. The evolution of RNAi as a defence against viruses and transposable elements. Philos Trans R Soc B. 2009;364(1513):99–115.
49. Yamanaka S, Mehta S, Reyes-Turcu FE, Zhuang F, Fuchs RT, Rong Y, Robb GB, Grewal SIS. RNAi triggered by specialized machinery silences developmental genes and retrotransposons. Nature. 2013;493(7433):557–60.
50. Slotkin RK, Martienssen R. Transposable elements and the epigenetic regulation of the genome. Nat Rev Genet. 2007;8(4):272–85.
51. Kasbekar DP. What have we learned by doing transformations in Neurospora tetrasperma? In: van den Berg MA, Maruthachalam K, editors. Genetic Transformation Systems in Fungi, vol. 2. Cham and Heidelberg: Springer; 2014. p. 47–52.
52. Zaffarano PL, Queloz V, Duo A, Grunig CR. Sex in the PAC: A hidden affair in dark septate endophytes? BMC Evol Biol. 2011;11:282.
53. Grünig CR, Queloz V, Duò A, Sieber TN. Phylogeny of Phaeomollisia piceae gen. sp. nov.: a dark-septate conifer-needle endophyte and its relationships to Phialocephala and Acephala. Mycol Res. 2009;113(2):207–21.
54. Amselem J, Lebrun M-H, Quesneville H. Whole genome comparative analysis of transposable elements provides new insight into mechanisms of their inactivation in fungal genomes. BMC Genomics. 2015;16(1):141.
55. Amselem J, Cuomo CA, van Kan JA, Viaud M, Benito EP. Genomic analysis of the necrotrophic fungal pathogens Sclerotinia sclerotiorum and Botrytis cinerea. PLoS Genet. 2011;7(8), e1002230.
56. Ma LJ, van der Does HC, Borkovich KA, Coleman JJ, Daboussi MJ, Di Pietro A, Dufresne M, Freitag M, Grabherr M, Henrissat B, et al. Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium. Nature. 2010;464(7287):367–73.
57. Wang Z, Johnston PR, Takamatsu S, Spatafora JW, Hibbett DS. Toward a phylogenetic classification of the Leotiomycetes based on rDNA data. Mycologia. 2006;98(6):1065–75.
58. Marcet-Houben M, Gabaldón T. Acquisition of prokaryotic genes by fungal genomes. Trends Genet. 2010;26(1):5–8.
59. Mallet LV, Becq J, Deschavanne P. Whole genome evaluation of horizontal transfers in the pathogenic fungus Aspergillus fumigatus. BMC Genomics. 2010;11(1):1–13.
60. Richards TA, Leonard G, Soanes DM, Talbot NJ. Gene transfer into the fungi. Fungal Biol Rev. 2011;25(2):98–110.
61. Rice A, Currah R. Oidiodendron maius: saprobe in Sphagnum peat, mutualist in ericaceous roots? In: Schulz BE, Boyle CC, Sieber T, editors. Microbial Root Endophytes, vol. 9. Berlin and Heidelberg: Springer; 2006. p. 227–46.
62. Richards TA, Dacks JB, Jenkinson JM, Thornton CR, Talbot NJ. Evolution of filamentous plant pathogens: gene exchange across eukaryotic kingdoms. Curr Biol. 2006;16(18):1857–64.
63. Soanes D, Richards TA. Horizontal gene transfer in eukaryotic plant pathogens. Annu Rev Phytopathol. 2014;52(1):583–614.
64. Gardiner DM, McDonald MC, Covarelli L, Solomon PS, Rusu AG, Marshall M, Kazan K, Chakraborty S, McDonald BA, Manners JM. Comparative pathogenomics reveals horizontally acquired novel virulence genes in fungi infecting cereal hosts. PLoS Pathog. 2012;8(9), e1002952.
65. Palzkill T. Metallo-β-lactamase structure and function. Ann N Y Acad Sci. 2013;1277:91–104.
66. Martin F, Kohler A, Murat C, Balestrini R, Coutinho PM, Jaillon O, Montanini B, Morin E, Noel B, Percudani R, et al. Perigord black truffle genome uncovers evolutionary origins and mechanisms of symbiosis. Nature. 2010;464(7291):1033–8.
67. Martin F, Aerts A, Ahren D, Brun A, Danchin EGJ, Duchaussoy F, Gibon J, Kohler A, Lindquist E, Pereda V, et al. The genome of Laccaria bicolor provides insights into mycorrhizal symbiosis. Nature. 2008;452(7183):88–92.
68. Mengiste T. Plant immunity to necrotrophs. Annu Rev Phytopathol. 2012;50(1):267–94.
69. Bellincampi D, Cervone F, Lionetti V. Plant cell wall dynamics and wall-related susceptibility in plant–pathogen interactions. Front Plant Sci. 2014;5:228.
70. Presti LL, Lanver D, Schweizer G, Tanaka S, Liang L, Tollot M, Zuccaro A, Reissmann S, Kahmann R. Fungal effectors and plant susceptibility. Annu Rev Plant Biol. 2015;66(1):513–45.
71. Soanes DM, Alam I, Cornell M, Wong HM, Hedeler C, Paton NW, Rattray M, Hubbard SJ, Oliver SG, Talbot NJ. Comparative genome analysis of filamentous fungi reveals gene family expansions associated with fungal pathogenesis. PLoS One. 2008;3(6), e2300.
72. Jashni MK, Dols IHM, Iida Y, Boeren S, Beenen HG, Mehrabi R, Collemare J, de Wit PJGM. Synergistic action of a metalloprotease and a serine protease from Fusarium oxysporum f. sp. lycopersici cleaves chitin-binding tomato chitinases, reduces their antifungal activity, and enhances fungal virulence. Mol Plant-Microbe Interact. 2015;28(9):996–1008.
73. Kombrink A, Thomma BPHJ. LysM effectors: secreted proteins supporting fungal life. PLoS Pathog. 2013;9(12), e1003769.
74. de Jonge R, Thomma BPHJ. Fungal LysM effectors: extinguishers of host immunity? Trends Microbiol. 2009;17(4):151–7.
75. Hacquard S, Joly DL, Lin Y-C, Tisserant E, Feau N, Delaruelle C, Legué V, Kohler A, Tanguay P, Petre B, et al. A comprehensive analysis of genes encoding small secreted proteins identifies candidate effectors in Melampsora larici-populina (Poplar Leaf Rust). Mol Plant-Microbe Interact. 2011;25(3):279–93.
76. Sieber TN, Grunig CR. Fungal root endophytes. In: Eshel A, Beeckman T, editors. Plant Roots: The hidden half. 4th ed. Boca Raton, London, New York: CRC Press; 2013. p. 1–49.
77. Gordon TR, Martyn RD. The evolutionary biology of Fusarium oxysporum. Annu Rev Phytopathol. 1997;35(1):111–28.
78. Gordon TR, Okamoto D, Jacobson DJ. Colonization of muskmelon and nonsusceptible crops by Fusarium oxysporum f. sp. melonis and other species of Fusarium. Phytopathology. 1989;79:1095–100.

79. Kunzler M. Hitting the sweet spot: glycans as targets of fungal defense effector proteins. Molecules. 2015;20(5):8144–67.
80. Sumarah MW, Puniani E, Blackwell BA, Miller JD. Characterization of polyketide metabolites from foliar endophytes of Picea glauca. J Nat Prod. 2008;71(8):1393–8.
81. Miller JD, Sumarah MW, Adams GW. Effect of a rugulosin-producing endophyte in Picea glauca on Choristoneura fumiferana. J Chem Ecol. 2008;34(3):362–8.
82. Tellenbach C, Sumarah MW, Grunig CR, Miller JD. Inhibition of Phytophthora species by secondary metabolites produced by the dark septate endophyte Phialocephala europaea. Fungal Ecol. 2013;6(1):12–8.
83. Bushley KE, Ripoll DR, Turgeon BG. Module evolution and substrate specificity of fungal nonribosomal peptide synthetases involved in siderophore biosynthesis. BMC Evol Biol. 2008;8(1):1–24.
84. Kimura N, Tsuge T. Gene cluster involved in melanin biosynthesis of the filamentous fungus Alternaria alternata. J Bacteriol. 1993;175(14):4427–35.
85. Zhang A, Lu P, Dahl-Roshak AM, Paress PS, Kennedy S, Tkacz JS, An Z. Efficient disruption of a polyketide synthase gene (pks1) required for melanin synthesis through Agrobacterium-mediated transformation of Glarea lozoyensis. Mol Gen Genomics. 2003;268(5):645–55.
86. Langfelder K, Jahn B, Gehringer H, Schmidt A, Wanner G, Brakhage AA. Identification of a polyketide synthase gene (pksP) of Aspergillus fumigatus involved in conidial pigment biosynthesis and virulence. Med Microbiol Immunol. 1998;187(2):79–89.
87. Wick J, Heine D, Lackner G, Misiek M, Tauber J, Jagusch H, Hertweck C, Hoffmeister D. A fivefold parallelized biosynthetic process secures chlorination of Armillaria mellea (honey mushroom) toxins. Appl Environ Microbiol. 2015;82(4):1196–204.
88. Braesel J, Götze S, Shah F, Heine D, Tauber J, Hertweck C, Tunlid A, Stallforth P, Hoffmeister D. Three redundant synthetases secure redox-active pigment production in the Basidiomycete Paxillus involutus. Chem Biol. 2015;22(10):1325–34.
89. Brunner I, Godbold LD. Tree roots in a changing world. J For Res. 2007;12(2):78–82.
90. Brunner I, Ruf M, Lüscher P, Sperisen C. Molecular markers reveal extensive intraspecific below-ground overlap of silver fir fine roots. Mol Ecol. 2004;13(11):3595–600.
91. Hibbing ME, Fuqua C, Parsek MR, Peterson SB. Bacterial competition: surviving and thriving in the microbial jungle. Nat Rev Micro. 2010;8(1):15–25.
92. Queloz V, Duo A, Sieber TN, Grünig CR. Microsatellite size homoplasies and null alleles do not affect species diagnosis and population genetic analysis in a fungal species complex. Mol Ecol Ressour. 2010;10:348–67.
93. Grünig CR, Linde CC, Sieber TN, Rogers SO. Development of single-copy RFLP markers for population genetic studies of Phialocephala fortinii and closely related taxa. Mycol Res. 2003;107(1):1332–41.
94. Queloz V, Duo A, Grünig CR. Isolation and characterization of microsatellite markers for the tree-root endophytes Phialocephala subalpina and Phialocephala fortinii s.s. Mol Ecol Ressour. 2008;8(6):1322–5.
95. Beldade P, Rudd S, Gruber J, Long A. A wing expressed sequence tag resource for Bicyclus anynana butterflies, an evo-devo model. BMC Genomics. 2006;7(1):130.
96. Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21 suppl 1:i351–8.
97. Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, et al. A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 2007;8(12):973–82.
98. Stanke M, Tzvetkova A, Morgenstern B. AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome. Genome Biol. 2006;7 Suppl 1:S11.
99. Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 2008;18(12):1979–90.
100. Stanke M, Schöffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7(1):1–11.
101. Gremme G, Brendel V, Sparks ME, Kurtz S. Engineering a software tool for gene structure prediction in higher organisms. Inf Softw Technol. 2005;47(15):965–78.
102. Sparks ME, Brendel V. Incorporation of splice site probability models for non-canonical introns improves gene structure prediction in plants. Bioinformatics. 2005;21(Suppl_3):iii20–30.
103. Allen JE, Salzberg SL. JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics. 2005;21(18):3596–603.
104. Lee E, Harris N, Gibson M, Chetty R, Lewis S. Apollo: a community resource for genome annotation editing. Bioinformatics. 2009;25(14):1836–7.
105. Walter MC, Rattei T, Arnold R, Güldener U, Münsterkötter M, Nenova K, Kastenmüller G, Tischler P, Wölling A, Volz A, et al. PEDANT covers all complete RefSeq genomes. Nucleic Acids Res. 2009;37 suppl 1:D408–11.
106. Parra G, Bradnam K, Ning Z, Keane T, Korf I. Assessing the gene space in draft genomes. Nucleic Acids Res. 2009;37(1):289–97.
107. Aguileta G, Marthey S, Chiapello H, Lebrun M-H, Rodolphe F, Fournier E, Gendrault-Jacquemard A, Giraud T. Assessing the performance of single-copy genes for recovering robust phylogenies. Syst Biol. 2008;57(4):613–27.
108. Reininger V, Schlegel M. Analysis of the Phialocephala subalpina transcriptome during colonization of its host plant Picea abies. PLoS One. 2016;11(3), e0150591.
109. R Development Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2006.
110. Laurie JD, Ali S, Linning R, Mannhaupt G, Wong P, Güldener U, Münsterkötter M, Moore R, Kahmann R, Bakkeren G, et al. Genome comparison of barley and maize smut fungi reveals targeted loss of RNA silencing components and species-specific presence of transposable elements. Plant Cell. 2012;24(5):1733–45.
111. Horns F, Petit E, Yockteng R, Hood ME. Patterns of repeat-induced point mutation in transposable elements of basidiomycete fungi. Genome Biol Evol. 2012;4(3):240–7.
112. Hane JK, Oliver RP. RIPCAL: a tool for alignment-based analysis of repeat-induced point mutations in fungal genomic sequences. BMC Bioinformatics. 2008;9:478.
113. Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26(19):2460–1.
114. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30(14):3059–66.
115. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25(15):1972–3.
116. Price MN, Dehal PS, Arkin AP. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLoS One. 2010;5(3), e9490.
117. Sieber CMK, Lee W, Wong P, Münsterkötter M, Mewes H-W, Schmeitzl C, Varga E, Berthiller F, Adam G, Güldener U. The Fusarium graminearum genome reveals more secondary metabolite gene clusters and hints of horizontal gene transfer. PLoS One. 2014;9(10), e110311.
118. Kheder AA, Akagi Y, Akamatsu H, Yanaga K, Maekawa N, Otani H, Tsuge T, Kodama M. Functional analysis of the melanin biosynthesis genes ALM1 and BRM2-1 in the tomato pathotype of Alternaria alternata. J Gen Plant Pathol. 2011;78(1):30–8.
119. Yu C, Zavaljevski N, Desai V, Reifman J. QuartetS: a fast and accurate algorithm for large-scale orthology detection. Nucleic Acids Res. 2011;39(13):e88.
120. Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O'Hara RB, Simpson GL, Solymos P, Stevens MHH, Wagner H. vegan: Community Ecology Package. R package version 2.3-1. The Comprehensive R Archive Network. 2014. https://CRAN.R-project.org/package=vegan.
121. Ruepp A, Zollner A, Maier D, Albermann K, Hani J, Mokrejs M, Tetko I, Güldener U, Mannhaupt G, Münsterkötter M, et al. The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res. 2004;32(18):5539–45.
122. Stahel WA. Statistische Datenanalyse. 4th ed. Braunschweig: Vieweg; 2002.
123. Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 2014;42(Database issue):D490–5.
124. Spanu PD, Abbott JC, Amselem J, Burgis TA, Soanes DM, Stüber K, van Themaat EV L, Brown JKM, Butcher SA, Gurr SJ, et al. Genome expansion and gene loss in powdery mildew fungi reveal tradeoffs in extreme parasitism. Science. 2010;330(6010):1543–6.
125. Zhu S, Cao Y-Z, Jiang C, Tan B-Y, Wang Z, Feng S, Zhang L, Su X-H, Brejova B, Vinar T, et al. Sequencing the genome of Marssonina brunnea reveals fungus-poplar co-evolution. BMC Genomics. 2012;13(1):1–10.
126. Nierman WC, Yu J, Fedorova-Abrams ND, Losada L, Cleveland TE, Bhatnagar D, Bennett JW, Dean R, Payne GA. Genome sequence of Aspergillus flavus NRRL 3357, a strain that causes aflatoxin contamination of food and feed. Genome Announc. 2015;3(2):e00168–00115.
127. Martinez D, Berka RM, Henrissat B, Saloheimo M, Arvas M, Baker SE, Chapman J, Chertkov O, Coutinho PM, Cullen D, et al. Genome sequencing

and analysis of the biomass-degrading fungus Trichoderma reesei (syn. Hypocrea jecorina). Nat Biotechnol. 2008;26(5):553–60. 128. Cuomo CA, Guldener U, Xu JR, Trail F, Turgeon BG. The Fusarium graminearum genome reveals a link between localized polymorphism and pathogen specialization. Science. 2007;317(5843):1400–2. 129. van den Berg MA, Albang R, Albermann K, Badger JH, Daran J-M, Driessen AJ M, Garcia-Estrada C, Fedorova ND, Harris DM, Heijne WHM, et al. Genome sequencing and analysis of the filamentous fungus Penicillium chrysogenum. Nat Biotechnol. 2008;26(10):1161–8. 130. Youssar L, Grüning BA, Erxleben A, Günther S, Hüttel W. Genome sequence of the fungus Glarea lozoyensis: the first genome sequence of a species from the Helotiaceae family. Eukaryotic Cell. 2012;11(2):250. 131. Peter M, Kohler A, Ohm RA, Kuo A, Krützmann J, Morin E, Arend M, Barry KW, Binder M, Choi C, et al. Ectomycorrhizal ecology is imprinted in the genome of the dominant symbiotic fungus Cenococcum geophilum. Nat Commun. 2016;7:12662.
