bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Article-Discoveries

2 Specialized plant biochemistry drives gene clustering in fungi

3 Emile Gluck-Thaler1, Jason C. Slot1*

4 1 Department of Plant Pathology, The Ohio State University, Columbus, Ohio,

5 United States of America

6 *Corresponding author

7 E-mail: [email protected]

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

1 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

23 Abstract

24 The fitness and evolution of both prokaryotes and eukaryotes are affected

25 by the organization of their genomes. In particular, the physical clustering of

26 functionally related genes can facilitate coordinated gene expression and can

27 prevent the breakup of co-adapted alleles in recombining populations. While

28 clustering may thus result from selection for phenotype optimization and

29 persistence, the extent to which eukaryotic gene organization in particular is

30 driven by specific environmental selection pressures has rarely been

31 systematically explored. Here, we investigated the genetic architecture of fungal

32 genes involved in the degradation of phenylpropanoids, a class of plant-produced

33 secondary metabolites that mediate many ecological interactions between plants

34 and fungi. Using a novel gene cluster detection method, we identified over one

35 thousand gene clusters, as well as many conserved combinations of clusters, in

36 a phylogenetically and ecologically diverse set of fungal genomes. We

37 demonstrate that congruence in gene organization over small spatial scales in

38 fungal genomes is often associated with similarities in ecological lifestyle.

39 Additionally, we find that while clusters are often structured as independent

40 modules with little overlap in content, certain gene families merge multiple

41 modules in a common network, suggesting they are important components of

42 phenylpropanoid degradation strategies. Together, our results suggest that

43 phenylpropanoids have repeatedly selected for gene clustering in fungi, and

2 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

44 highlight the interplay between gene organization and ecological evolution in this

45 ancient eukaryotic lineage.

46

47 Introduction

48 Genome organization is intimately linked to the trajectory of organismal

49 evolution. The impacts of genome organization on prokaryotic evolution in

50 particular have been extensively studied (Baquero 2004), and increasingly, the

51 organization of genes in eukaryotic genomes is also recognized to affect

52 organismal fitness and evolution. For example, the spatial clustering of

53 functionally related genes may enable the compartmentalization and optimization

54 of phenotypes through coordinated gene expression (Hurst et al. 2002; Al-

55 Shahrour et al. 2010; McGary et al. 2013). Similarly, the formation of loci

56 composed of co-adapted alleles can facilitate the inheritance of locally adapted

57 ecotypes within recombining populations over short time periods (Yeaman 2013;

58 Holliday et al. 2016). Rather than resulting from non-adaptive processes, such as

59 genetic hitchhiking, the persistence of such organizational patterns in eukaryotic

60 genomes suggests they may instead result from natural selection (Hurst et al.

61 2002; Lynch 2007). However, the extent to which eukaryotic genome

62 organization is driven by specific environmental selection pressures, especially

63 over macroevolutionary timescales, is not clear.

64 Organized genome structure is particularly apparent in fungi, a lineage of

65 eukaryotic microorganisms that perform critical ecosystem services and biomass

3 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

66 transformation, and also impact plant and animal health (Peay et al. 2016).

67 Fungal genomes are more or less replete with metabolic gene clusters (MGCs)

68 composed of genes encoding enzymes, transporters and regulators that

69 participate in specialized metabolic processes such as nutrient acquisition,

70 competition and defense (Wisecaver et al. 2014). Although MGCs are far more

71 rare in eukaryotes compared with bacteria, fungal MGCs exhibit similarly sparse

72 and disjunct phylogenetic distributions among distantly related species with

73 overlapping niches (Greene et al. 2014; Dhillon et al. 2015; Glenn et al. 2016).

74 This ecological pattern of distribution suggests that conserved combinations of

75 genes may be signatures of ecological selection in fungal genomes.

76 Fungal MGCs encoding the production of specialized, or secondary,

77 metabolites (SMs) have been studied extensively (Hoffmeister and Keller 2007)

78 and more recently, several reports suggest that adaptations to degrade plant

79 SMs are also encoded in MGCs (Shanmugam et al. 2010; Greene et al. 2014;

80 Wang et al. 2014; Kettle et al. 2015; Glenn et al. 2016). Plant SMs mediate

81 important biotic and abiotic interactions, including the exclusion of fungal

82 pathogens, the establishment of mutualisms, and the rates of nutrient cycling

83 long after the plant has died. The largest group of plant SMs are the

84 phenylpropanoids, which not only contribute to constitutive and inducible

85 chemical defenses, but are also the main barriers to wood decay in terrestrial

86 ecosystems (Floudas et al. 2012), and costly inhibitors of lignocellulose biofuel

87 production (Jönsson and Martín 2016). As the primary colonizers of plant

4 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

88 material in natural and artificial environments, fungi are frequently in contact with

89 phenylpropanoids, and must often mitigate their inhibitory effects in order to

90 grow. Common fungal adaptations to phenylpropanoid toxicity include

91 detoxification through sequestration, excretion and degradation (Mäkelä et al.

92 2015). Despite the characterization of many phenylpropanoid degradation

93 pathways in fungi, the genomic bases of these pathways are largely unknown

94 (Mäkelä et al. 2015), precluding the use of currently available algorithms to

95 investigate whether or not these metabolic processes are encoded in MGCs

96 (Wisecaver et al. 2014; Weber et al. 2015).

97 Here, we developed a novel algorithm based on empirically derived

98 models of fungal genome evolution in order to systematically identify clusters of

99 genes putatively involved in phenylpropanoid degradation, enabling us to test the

100 hypothesis that selection pressures from plant SMs impact genome organization

101 across disparate fungal lineages. Using a database of 556 fungal genomes

102 representing 481 species, we detected 1168 MGCs and many conserved

103 combinations of MGCs that putatively degrade a broad array of

104 phenylpropanoids. We then tested for associations between MGCs and various

105 fungal ecological lifestyles, and found that the presence of certain MGCs was

106 enriched in plant pathotrophs and saprotrophs. While many clusters appear to

107 have evolved independently, we identified several gene families that are

108 commonly associated with diverse MGCs, suggesting they play important roles in

109 phenylpropanoid catabolism. These results suggest that phenylpropanoids are

5 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

110 drivers of gene organization in plant-associated fungi, and that MGCs may in turn

111 determine patterns of fungal community assembly on both living and decaying

112 plant tissues.

113

114 Results

115

116 Diverse candidate gene clusters are associated with phenylpropanoid

117 degradation

118

119 Using 27 different “anchor” gene families involved in phenylpropanoid

120 degradation as separate queries (Methods; Supplementary Table 1), we

121 searched 556 genomes from 481 fungal species for clusters associated with

122 each anchor gene family (Methods; Supplementary Figure 1, Supplementary

123 Table 2). After removing those clusters containing genes known to exclusively

124 participate in fungal secondary metabolite biosynthesis (Supplementary Table 3;

125 Supplementary Table 4), as well as gene clusters with fewer than 4 genes, we

126 found evidence of unexpected clustering in 13 anchor gene families, which we

127 defined as separate cluster classes. We identified a total of 1168 clusters

128 distributed across 363 fungal genomes (Figure 1, Supplementary Figure 2,

129 Supplementary Table 5). 31 of these clusters belonged to multiple clusters

130 classes (i.e., contained multiple anchor genes) (Supplementary Table 3). When

131 accounting for multiple occurrences of homologous clusters (as defined by

6 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

132 cluster model; Methods) among individuals of the same species, this total figure

133 was reduced to 962 clusters distributed across 309 fungal species. The number

134 of cluster models per cluster class varies from 1 to 8, and clusters assigned to

135 different models generally do not overlap in terms of content (Supplementary

136 Table 6). Additionally, clusters typically contained very few intervening genes

137 (Supplementary Figure 3). Clusters are disproportionately distributed across

138 fungal lineages: for example, comprises 50.3% of the species in

139 the analysis yet contains 89.4% of all clusters, while Agaricomycetes represent

140 22.9% of species, but have only 4.7% of clusters (Supplementary Table 7).

141 However, homologous clusters assigned to the same model are typically found

142 distributed across different taxonomic classes (Supplementary Table 7).

143 In order to benchmark our approach, we searched for previously

144 characterized clusters containing quinate 5-dehydrogenase (QDH) and stilbene

145 dioxygenase (SDO) homologs. All 8 previously predicted clusters from the SDO

146 cluster class (Greene et al. 2014) and all 31 described clusters from the QDH

147 class (Hane et al. 2007; Shalaby et al. 2012) were recovered. Additionally, while

148 this study was being prepared, a single cluster from the pterocarpan hydroxylase

149 (PAH) class was identified elsewhere (Pigné et al. 2017). To our knowledge, the

150 remaining 1128 clusters identified in this analysis are described here for the first

151 time. We expect some false negative error in our detection but not significant

152 false positive to arise due to variation in the quality of genome assemblies and

153 annotations. Some false positive error in our detection may, however, arise from

7 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

154 the fact that beyond the function of the anchor gene, we did not require any

155 genes to be demonstrably involved in phenylpropanoid degradation because the

156 loci involved in such pathways are for the most part unknown (Mäkelä et al.

157 2015). Although we removed clusters with genes known to exclusively participate

158 in biosynthetic secondary metabolism (Supplementary Table 4), it is still possible

159 that some of the remaining clusters participate in biosynthetic reactions, as it is

160 often difficult to definitively distinguish between biosynthetic and catabolic

161 processes.

162

163 Clustered homolog groups, including those distributed across cluster classes, are

164 enriched for primary and secondary metabolic functions

165

166 Despite the lack of functional constraints on cluster composition, clustered

167 homolog groups were, enriched for 7 KOG processes primarily related to either

168 primary or secondary metabolism, including “Energy production and conversion”

169 and “Secondary metabolite biosynthesis, transport and catabolism”, the two

170 largest and most significantly enriched categories (Supplementary Table 8). The

171 KOG process “Transcription” was the only non-metabolic process enriched in

172 clustered homolog groups. Homolog groups present in more than one cluster

173 class are also enriched for multiple functional categories related to primary and

174 secondary metabolism (Supplementary Table 9). Conserved domains from the

175 Pfam database that are present among shared homolog groups include transport

8 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

176 and transcription-related domains, cytochrome P450 domains, and domains

177 related to functional group modification (e.g., dehydrogenase, transferase,

178 amidase and decarboxylase; Supplementary Table 10).

179

180 Phenylpropanoid degradation gene clusters are enriched in Pezizomycotina

181 species with plant-associated lifestyles

182

183 We limited our exploratory enrichment analyses to the Pezizomycotina, as they

184 contained the vast majority of identified clusters. Clusters from 5 cluster classes

185 are enriched in plant pathotrophs, including those containing the anchor genes

186 naringenin 3-dioxygenase (NAD) and epicatechin laccase (ECL) that are know to

187 participate in flavonoid degradation (Figure 1). An additional 3 cluster classes are

188 enriched in other plant-associated lifestyles, such as plant saprotrophs and

189 endophytes. The benzoate 4-monooxygenase (BPH) cluster class is enriched in

190 soil saprotrophs. No cluster classes are enriched in plant symbiotrophs or animal

191 pathotrophs. Similar patterns of lifestyle-dependent enrichment are observed

192 when comparing Pezizomycotina species with specific cluster models to those

193 without any cluster models (Supplementary Figure 4). Notably, at least one model

194 in each of 12 cluster classes is enriched in a plant-associated lifestyle. Aromatic

195 ring dioxygenase (ARD) models are most often enriched in plant-associated

196 lifestyles: model 1 is enriched in plant symbiotrophs and plant pathotrophs;

197 model 2 is enriched in plant pathotrophs, and model 5 is enriched in endophytes.

9 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

198 Although ecological enrichments highlight predominant trends, clusters across all

199 anchor genes are typically associated with diverse lifestyles, in part because any

200 given fungal species can be associated with multiple ecological lifestyles.

201 We observed that 207 fungal species possessed more than one cluster

202 model. These fungi were assigned to multi cluster model profiles (MCMPs) based

203 on the combinations of cluster models found in their genomes (Methods). 139

204 fungal species were distributed across 16 distinct MCMPs with 5 or more species

205 (Figure 2). Fungi from the same MCMP tend to be closely related; however, 13

206 MCMPs contain fungi from different taxonomic orders, and 6 of these contain

207 fungi from different taxonomic classes. We also limited this exploratory

208 enrichment analysis to Pezizomycotina species and found that 4 multi-cluster

209 profiles are enriched in species with plant-associated lifestyles, 1 is enriched in

210 animal pathotrophs and endophytes, and 1 is enriched in soil saprotrophs (Figure

211 2).

212

213 Homolog groups are found in modules of co-clustered genes

214

215 We visualized structural associations among all clustered homolog groups as a

216 network where nodes represent homolog groups, and edges represent the co-

217 occurrence of homolog groups in a cluster (Figure 3). A number of homolog

218 groups are found in clusters belonging to different cluster classes (39 in total;

219 Supplementary Table 10), which resulted in a common network for all cluster

10 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

220 classes. Shared homolog groups are typically among the most highly connected

221 nodes in this network (i.e., they co-occur frequently with a diverse set of homolog

222 groups; Supplementary Table 10). The modularity of the network is significantly

223 greater than that observed in networks of similar size with randomly distributed

224 edges (Supplementary Table 11). This high degree of modularity reflects a large

225 number of homolog groups that are unique to each cluster class (238 in total).

226 Some homolog groups, while not shared across cluster class, are highly

227 connected because they are present in different cluster models (e.g., see

228 homolog group 1001 in Figure 4).

229

230 Results spotlight (could be displayed as a Box, combining results and

231 discussion): Pterocarpan hydroxylase clusters may encode uncharacterized

232 strategies for flavonoid degradation

233

234 The production of toxic SMs by plants is an ancient and effective strategy for

235 deterring fungal growth. Isoflavonoids, for example, are a critical component of

236 legume (Fabaceae) defenses against fungi. In response, some legume

237 pathogens have evolved degradative metabolic pathways to detoxify

238 isoflavonoids, thus facilitating host colonization (Enkerli et al. 1998). Pterocarpan

239 hydroxylase (PAH) is involved in the degradation of pterocarpan and medicarpin

240 isoflavonoids, and can contribute to virulence on susceptible plant hosts (Enkerli

241 et al. 1998). We found 133 clusters in the PAH class, distributed across 94 fungal

11 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

242 species with diverse ecological lifestyles (Figure 4). Homology among these

243 clusters is best described by 4 models containing many of the canonical

244 functions found in fungal MGCs. Models 1, 2 and 4 all contain cytochrome

245 P450s, a large family of enzymes involved in detoxification mechanisms in fungi

246 and other organisms (Lah et al. 2011), while models 2 and 3 contain at least one

247 transporter and model 4 contains a transcription factor. Model 3 contains a

248 homolog of isoflavone reductase, an enzyme used in isoflavonoid biosynthesis in

249 plants, but may also be involved in isoflavonoid detoxification pathways (Höhl et

250 al. 1989). The widespread distribution of PAH clusters suggests they are involved

251 in the degradation of flavonoids found outside the Fabaceae, and the enrichment

252 of model 1-type clusters in plant saprotrophs suggests that flavonoids that persist

253 in plant tissues may impact fitness in saprophytic environments. However, the

254 genetics underlying isoflavonoid biosynthesis and degradation, and flavonoid

255 metabolism in general, are not well understood for fungi, despite the ecological

256 importance of this chemical family.

257 A PAH homolog in Alternaria brassisicola found in a cluster assigned to

258 model 1 was recently shown to contribute to the accumulation of DHN melanin in

259 cell walls, possibly through catalyzing melanin polymerization or the cross-linking

260 of melanin to fungal cell walls (Pigné et al. 2017). Melanins are a class of

261 complex phenolic polymers that can serve to quench oxidizing radicals, and play

262 a role in fungal pathogenicity and survival in harsh environments (Bayry et al.

263 2014). Intriguingly, plant flavonoids can be polymerized in planta and in vitro into

12 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

264 melanin by fungal laccases (Fowler et al. 2011) and possibly by

265 monooxygenases similar to PAH (Desentis-Mendoza et al. 2006). As the

266 repurposing of host metabolites as substrates for the production of SMs is not

267 unprecedented in fungi (Schmaler-Ripcke et al. 2009), we suggest that this PAH

268 cluster in particular is a good candidate for exploring hypotheses of cross talk

269 between specialized metabolic pathways.

270

271 Discussion

272

273 Candidate phenlypropanoid degradation MGCs found at over 1000 loci with

274 unexpectedly conserved synteny

275

276 Fungi are the most ecologically significant decomposers of plant biomass, and

277 represent some of the world’s most devastating plant pathogens (Peay et al.

278 2016). The presence of catabolic genes targeting plant SMs in fungal genomes

279 has often been linked to species ecology and the ability to colonize different

280 types of plant tissues (Floudas et al. 2012). Much less is known about how these

281 genes are organized in fungal genomes, and how their organization impacts or

282 may be affected by fungal evolution and ecology. The genomes of many fungi

283 tend to undergo extensive and frequent rearrangements (Hane et al. 2011;

284 Plissonneau et al. 2016). By contrast, MGCs we detected are often

285 simultaneously present in lineages that diverged hundreds of millions of years

13 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

286 ago (Floudas et al. 2012) (Figure 1). As combinations of genes that persist over

287 long periods of evolutionary time are likely to be maintained by natural selection

288 (Hurst et al. 2002; Baquero 2004; Muto et al. 2013) and often encode proteins

289 that are functionally related (Szklarczyk et al. 2014; Zhao et al. 2014), we

290 hypothesize that the conserved combinations of phenylpropanoid degrading

291 genes detected here encode adaptive metabolic phenotypes and are signatures

292 of selection on gene organization.

293 The most well studied phenotype conferred by clustering is the improved

294 coordination of gene transcription (Hurst et al. 2002). Clustering facilitates gene

295 transcription by allowing co-regulation through local chromatin modifications

296 (Shwab et al. 2007), the sharing of promoters (Davila Lopez et al. 2010), and the

297 avoidance of topological constraints on DNA encountered during transcription

298 (Tsochatzidou et al. 2016). In addition to synchronising cellular responses,

299 coordinated gene transcription may also decrease the accumulation of toxic

300 intermediate metabolites, such as those produced during phenylpropanoid

301 degradation (Greene et al. 2014; Mäkelä et al. 2015), by optimizing enzyme

302 stoichiometry and improving metabolic flux (McGary et al. 2013). Gene clustering

303 may additionally be selected for because it increases the capacity for

304 recombining populations to evolve. When genes contributing to the same trait are

305 clustered, selection for that trait is more efficient because of decreased selective

306 interference between genes at that locus (Pepper 2003). Clustering can also

307 prevent the breakup of co-adapted alleles in the face of gene flow (Yeaman

14 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

308 2013), which may enable local metabolic adaptation of fungal populations

309 distributed across cryptic niches.

310 Similarly, clustering increases the probability of propagating adaptive

311 combinations of genes through both vertical (Pepper 2003) and horizontal

312 (Lawrence and Roth 1996; Baquero 2004) transfer. Many of the clusters

313 observed in this study have discontinuous phylogenetic distributions that are

314 consistent with evolutionary scenarios such as vertical inheritance coupled with

315 extensive loss, convergence, duplication, or horizontal gene transfer (HGT). HGT

316 of MGCs is predicted to be associated with rapid adaptive changes to

317 phenotypes, including increased capacity for host colonization (Dhillon et al.

318 2015; Glenn et al. 2016) and nutrient acquisition (Slot and Hibbett 2007). For

319 example, MGCs in bacteria are frequently dispersed by HGT among species

320 inhabiting the same environment or host (Baquero 2004). HGT is positively

321 influenced by relatedness in bacteria (Andam and Gogarten 2011) and fungi

322 (Wisecaver et al. 2014; Gluck-Thaler and Slot 2015), as well as by shared

323 ecological niche (Smillie et al. 2011). Given that the distributions we observed

324 here may be at least partly HGT-driven (Greene et al. 2014), the role of HGT in

325 the dispersal of putative phenylpropanoid degrading MGCs must be explicitly

326 tested with phylogenetic methods (Gluck-Thaler and Slot 2015).

327 We detected the vast majority of clusters in sub-phylum Pezizomycotina.

328 Among fungi, the Pezizomycotina are known to possess highly clustered

329 genomes, while phylum Basidiomycota and sub-phylum Saccharomycotina have

15 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

330 much fewer (Wisecaver et al. 2014). Such differences may be due modes of

331 chromosomal evolution unique to the Pezizomycotina that involve extensive

332 rearrangements that could facilitate cluster formation (Hane et al. 2011). The

333 large population sizes attributed to many Pezizomycotina lineages could then

334 serve to increase selection efficiency for clusters (Lynch and Walsh 2007) such

335 that they would be maintained. Fungi in the Pezizomycotina are also thought to

336 experience more HGT than those in other phyla (Wisecaver et al. 2014), which

337 may facilitate cluster dispersal.

338

339 Gene cluster distributions suggest ecological adaptation

340

341 While certain clusters have distributions suggestive of lineage-specific bias

342 (Supplementary Table 7), these conserved combinations of genes were

343 ultimately detected by their unexpected phylogenetic distributions that conflicted

344 with lineage-specific phylogenetic signal, indicating that phylogeny is not the sole

345 determinant of cluster distributions. Indeed, gene organization is often an

346 important determinant of fitness and may be differentially selected across

347 ecological niches (Kirkpatrick and Barton 2006; Yeaman 2013). Our analysis

348 suggests that the presence of putative phenylpropanoid degrading MGCs in the

349 Pezizomycotina is associated with ecological lifestyle (Figure 1), although this

350 does not of course preclude other traits, such as host preference, that might also

351 affect cluster distributions. Intriguingly, two of the five cluster classes enriched in

16 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

352 plant pathotrophs (NAD and ECL) are predicted to target flavonoids, which are

353 important components of plant defense against fungal pathogens. Such clusters

354 may contribute to pathogen fitness, as recent reports suggest that the

355 degradation of host defense compounds contributes quantitatively to plant

356 pathogen virulence and reproduction (Hammerbacher et al. 2013; Kettle et al.

357 2015). Other cluster classes enriched in plant pathotrophs, including QDH, ARD

358 and phenol 2-monooxygenase (PMO), are predicted to target less derived

359 molecules, and may participate in the downstream degradation of more complex

360 phenylpropanoids, such as lignin (Jönsson and Martín 2016). Catabolic clusters

361 may also confer fitness benefits to saprotrophs living in plant tissues or soil by

362 enabling the degradation of phenylpropanoids that inhibit fungal colonization of

363 organic matter (Floudas et al. 2012). Indeed, many of the cluster classes

364 enriched in saprotrophs, including ferulic acid esterase (CAE) and BPH are

365 predicted to target phenylpropanoids commonly found in decaying plant material

366 and soils (Mäkelä et al. 2015). We also found that individual cluster models are

367 enriched in plant-associated lifestyles, giving insight into the specific patterns of

368 gene organization favoured by natural selection (Supplementary Figure 4).

369 Clusters from different models were rarely enriched in more than one ecological

370 lifestyle, possibly reflecting differential selection pressures imposed by

371 phenylpropanoids encountered in different ecological contexts.

372

17 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

373 Cluster model co-occurrences may reflect simultaneous selection by multiple

374 plant metabolites

375

376 The modularity of the structural network of clustered homolog groups suggests

377 that different cluster classes carry out semi-independent functions (Wagner

378 1996), and may reflect differences in phytochemicals processed by each cluster

379 class (Figure 3). Minimally, the structuring of clusters as independent modules

380 with little overlap in content indicates that cluster classes have largely evolved

381 independently of one another, and that selection for gene organization is likely to

382 have occurred multiple times.

383 We sorted different fungal species into multi-cluster model profiles

384 (MCMPs) based on the combinations of clusters found in their genomes (Figure

385 2), raising the intriguing possibility that conserved cluster repertoires may be

386 important for determining ecological phenotypes, similar to how combinations of

387 pathogenicity islands in bacteria determine host range (Bouyioukos et al. 2016).

388 While certain MCMPs were only present in specific fungal classes, others were

389 present in fungi from different taxonomic classes and orders, and certain MCMPs

390 are enriched in Pezizomycotina species with specific plant-associated lifestyles

391 (Figure 2). Plants typically do not produce a single chemical in isolation; rather,

392 defense metabolites are released as mixtures, either with metabolites from

393 different pathways, or with metabolic precursors from the same pathway, or both

394 (Gershenzon et al. 2012). MCMPs may reflect the compositions of SM mixtures

18 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

395 encountered during the colonization of specific hosts, or possibly temporal

396 differences in defense compound pressure (Hammerbacher et al. 2013), and

397 may underlie assembly patterns of fungal communities on different hosts and

398 substrates. A detailed investigation of the full complement of degradative MGCs

399 from fungi colonizing the same plant host would be ideal for testing this

400 hypothesis.

401

402 Do gene clusters bridge plant and fungal metabolisms?

403

404 In order to infer which clustered homolog groups may play outsized roles in

405 fungal strategies for phenylpropanoid degradation, we ranked each group by the

406 number of non-redundant co-occurrences they have with other clustered homolog

407 groups, and identified homolog groups present in different cluster classes (Figure

408 3; Supplementary Table 10). We predict that the homolog groups with the

409 greatest number of co-occurrences (i.e., the most highly connected, Figure 3), as

410 well as those found associated with the greatest number of cluster classes, have

411 been repeatedly recruited to diverse catabolic processes and may encode key

412 strategies for phenylpropanoid degradation in fungi. Notably, some of the shared

413 homolog groups, like those with members encoding cytochrome P450 enzymes

414 (see groups ALL1024, ALL1032, ALL1115 in Supplementary Table 10) and multi-

415 drug transporters (ALL1122, ALL1127), are known to be associated with

19 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

416 evolutionary adaptability and often underlie fungal detoxification strategies of

417 diverse chemicals (Lah et al. 2011; Mäkelä et al. 2015).

418 Fungi often exclusively acquire carbon from plant-derived substrates.

419 Intriguingly, the cellular processes associated with shared homolog groups also

420 suggest selection to integrate phenylpropanoid degradation products into central

421 fungal metabolism. Other shared homolog groups have members normally

422 associated with core fungal metabolism, for example, carbohydrate transporters

423 (MFS monocarboxylate transporter: ALL1065; Sugar (and other) transporter:

424 ALL1071, ALL1103), enzymes that produce substrates for aromatic amino acid

425 biosynthesis (3-dehydroshikimate dehydratase: ALL1002), and enzymes that

426 participate in aromatic amino acid degradation (fumarylacetoacetate hydrolase:

427 ALL1055). Metabolic processes, such as those often encoded in clusters (Muto

428 et al. 2013; Greene et al. 2014), are predicted to evolve through the recruitment

429 and subsequent specialization of duplicated enzyme-encoding genes from

430 existing metabolic pathways (Jensen 1976). Shared homolog groups thus offer

431 evidence that genes involved in SM degradation as well as those involved in core

432 fungal metabolism have been recruited from existing pathways to MGCs, as has

433 been suggested elsewhere (Greene et al. 2014). Combinations of genes in

434 MGCs may ultimately serve to connect diversifying chemical environments to a

435 stable metabolic network, acting as a bridge between specialized plant

436 metabolism and core fungal metabolism, and enabling fungal growth under what

437 would otherwise be challenging conditions.

20 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

438 The prevalence of putative phenylpropanoid-degrading MGCs in plant-

439 associated fungi suggests that specialized plant metabolism is a strong source of

440 selection on fungal gene organization. Based on their distribution, we

441 hypothesize that the MGCs detected here encode selectable phenotypes that

442 promote the colonization of plant substrates by fungi. Further functional

443 characterization of these loci will serve to accelerate discovery of metabolic

444 processes that are of great interest not only for lignocellulosic biofuel production,

445 but also for our understanding of the evolutionary dynamics between gene

446 organization and ecological adaptation.

447

448 Materials and methods

449

450 Data acquisition, annotation and software specifications

451

452 Publically available data from 556 assembled fungal genomes and predicted

453 proteomes were downloaded from various sources (Supplementary Table 2).

454 While data from all genomes was used to generate the presented results, specific

455 results from the 209 unpublished genomes have been redacted, and assigned

456 generic names based on taxonomic class. For each gene with alternative

457 transcripts in the set of genomes, the predicted amino acid sequence associated

458 with the longest of such transcripts was used for cluster detection, and the rest

459 were discarded. Ecological metadata of all 556 genomes were compiled from

21 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

460 various sources (Supplementary Table 2), including the community-curated

461 FUNGuild database (last accessed 09/09/16) (Nguyen et al. 2016) and the U.S.

462 National Collection database (last accessed April 1st, 2017) (Farr and

463 Rossman 2017). The naming of ecological lifestyles follow the framework

464 established by FUNGuild (Nguyen et al. 2016).

465 Amino acid sequence searches with cutoffs of 30% identity, 50 bitscore

466 and where the length of the target sequence was 50-150% of the query

467 sequence, were performed with using USEARCH v8.0.1517’s UBLAST algorithm

468 (additional parameters: evalue cutoff = 1e-5, accel = 0.8) (Edgar 2010) or with

469 BLASTp v2.2.25+ (additional parameters: evalue cutoff = 1e-4) (Altschul et al.

470 1990). Homology was determined using OrthoMCL v2 with an inflation value of

471 1.5 (Fischer et al. 2011). All amino acid sequences from clustered homolog

472 groups were assigned to orthologous groups from the fuNOG database (last

473 accessed 03/18/16) (Huerta-Cepas et al. 2015) using HMMER3 (Eddy 2011).

474 Clustered sequences were additionally annotated by PFAM and GO term using

475 InterProScan 5 v5.20-59.0 (Jones et al. 2014). Predicted functional annotations

476 and KOG processes for each clustered homolog group were based on the most

477 frequent fuNOG annotation assigned to proteins within that group.

478 All phylogenetic trees were visualized using ETE v3 (Huerta-Cepas et al. 2016)

479 and all other graphs were visualized using the ggplot2 package in R (Wickham

480 2016). All data used to generate the presented figures can be found in

481 Supplementary Table 12.

22 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

482

483 Construction of the microsynteny tree

484

485 The evolution of microsynteny (gene content conservation over small genomic

486 distances) does not necessarily recapitulate phylogenetic relationships

487 determined by models of sequence evolution. In order to assess the

488 unexpectedness of cluster distributions within a phylogenetic framework based

489 on microsynteny conservation, we used an approach similar to Snel et al. (1999)

490 (Snel et al. 1999) to construct a species tree based on microsyntenic distance, or

491 pairwise comparisons of gene content conservation (Supplementary Figure 1).

492 Distances based on microsynteny are necessarily bounded by some maximum

493 value that is often rapidly reached as microsynteny degrades, and thus the ability

494 to resolve phylogenetic relationships decreases as evolutionary distance

495 increases (Rogozin et al. 2004). Nevertheless, the species relationships that we

496 recovered in general correspond well with those obtained using conventional

497 phylogenetic methods (Schoch et al. 2009; Spatafora et al. 2016).

498 To construct the microsynteny tree, up to 10 genes upstream and

499 downstream of all homologs from a randomly selected gene family were retrieved

500 (designated “gene neighborhood”), then had their amino acid sequences

501 compared against one another using UBLAST, and subsequently were sorted

502 into homolog groups using OrthoMCL. For each pairwise neighborhood

503 comparison, the total number of shared orthologs was divided by the size of the

23 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

504 smallest neighborhood, subtracted from 1, and designated as pairwise syntenic

505 distance. Pairwise syntenic distances thus range from 0 (all homolog groups are

506 shared within a neighborhood) to ~0.95 (only one gene, the randomly selected

507 query, is shared). 1000 neighborhoods were randomly sampled in this way. A

508 neighbor-joining tree was constructed from the distance matrix of the median

509 pairwise syntenic distance for each pairwise genome comparison using the ape

510 package in R (Paradis et al. 2004). The original dataset was sampled with

511 replacement to obtain 100 bootstrap pseudo-replicate trees that were then used

512 to calculate bootstrap support on the original tree using the –z switch in RAxML

513 v8.2.0 (Stamatakis 2014). Nodes on the original tree receiving less than 70%

514 bootstrap support were collapsed. The final microsynteny tree was used to

515 calculate distances covered by cluster distributions, as well as individual homolog

516 groups.

517

518 Sampling null models of gene cluster evolution

519

520 Empirically derived null models describing the background levels of “gene

521 cluster” distributions were developed for each of the 12 largest taxonomic classes

522 and each cluster size ranging from 4-24 genes (252 distributions in total). Briefly,

523 for a given null distribution, a group of neighboring genes (i.e., query cluster) of

524 size X was chosen at random from a randomly selected genome from taxonomic

525 class Y. Hits to each gene in the query cluster were recovered from the proteome

24 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

526 database using BLASTp. Genomes containing homologous clusters (i.e.,

527 containing hits to all genes in the query cluster, where no more than 6 intervening

528 genes separated any hit from another) were then retrieved. To determine the

529 phylogenetic distance associated with the query cluster distribution, the total

530 branch length distance on the microsynteny tree connecting all genomes with

531 homologous clusters was calculated and stored. The above sampling process

532 was repeated 500 times for each null distribution.

533

534 Locating Clusters through Unexpected Synteny

535

536 We have identified previously undetected gene clusters putatively involved in

537 phenylpropanoid degradation using a method that posits that fungal genomes are

538 spatially compartmentalized by function, similarly to those of bacteria (Snel et al.

539 2000; Rogozin et al. 2002; Rogozin et al. 2004; Szklarczyk et al. 2014; Zhao et

540 al. 2014; Zaidi and Zhang 2016). Our method also draws upon the observation

541 that microsynteny is often poorly conserved in fungal genomes (Hane et al.

542 2011), which has previously enabled biosynthetic MGC detection through the

543 identification of conserved microsynteny among individuals of the same species

544 (Takeda et al. 2014). Briefly, genes in regions surrounding “anchor genes” of

545 interest were assigned to homolog groups. Clusters were then detected by

546 comparing the phylogenetic distributions of homolog group combinations within

547 these regions to distributions expected under null models of microsynteny

25 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

548 evolution. More detailed steps are described below and in Supplementary Figure

549 1.

550 All homologs of an anchor gene query sequence of interest were retrieved

551 using BLASTp. Twenty genes upstream and downstream of all anchor gene

552 homologs (designated “anchor gene neighborhood”) were retrieved, compared

553 against one another using UBLAST and sorted into homolog groups using

554 OrthoMCL. Individual homolog groups whose distribution on the microsynteny

555 species tree had a maximum pairwise distance below 0.95 were discarded.

556 Within the set of all anchor gene neighborhoods, the set of unique combinations

557 of 4 or more homolog groups that included the anchor gene (“cluster motifs”) was

558 determined. The genomes in which each motif occurred were identified, with the

559 condition that genes belonging to homolog groups in the motif never be

560 separated by more than 6 intervening genes. For each genome in which a given

561 motif occurred, the probability of observing the motif in that genome, given the

562 size of the motif and the taxonomic class of the genome, was empirically

563 estimated by determining the proportion of samples in the appropriate null

564 distribution that cover a total distance on the microsynteny tree greater than or

565 equal to the distance associated with the given motif, divided by the total number

566 of samples in the null distribution. For example, to estimate the probability of

567 observing a motif with 8 homolog groups in a Dothideomycete genome, we would

568 compare the total distance associated with the observed motif to the null

569 distribution of distances associated with size 8 clusters sampled from

26 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

570 Dothideomycetes. For all such test, the test statistic is distance on the

571 microsynteny tree, and the null hypothesis is that the phylogenetic distribution of

572 a given cluster motif containing an anchor gene is consistent with background

573 rates of microsynteny evolution. The null hypothesis is rejected for motifs with an

574 estimated probability below the significance threshold of 0.05 in at least one

575 genome, and all genes assigned to such motifs are designated clusters. Clusters

576 with proteins that had fuNOG annotations or PFAM domains or GO term

577 annotations associated with proteins known to exclusively participate in fungal

578 secondary metabolite biosynthesis were excluded from further analysis

579 (Supplementary Table 4). All annotations of clustered proteins were also

580 manually inspected for evidence of exclusive participation in biosynthetic

581 metabolism, and excluded if necessary.

582

583 Cluster models and multi cluster model profiles (MCMPs)

584

585 Homology among clusters from the same cluster class (i.e., containing the same

586 anchor gene) was determined by assessing similarities in homolog group

587 content. Briefly, a matrix detailing the presence or absence of homolog groups in

588 each cluster was used to calculate Bray-Curtis dissimilarity indices for all pairwise

589 comparisons of clusters. Pairwise comparisons were then grouped using

590 complete linkage clustering, and any clusters separated by under 0.6 distance

591 units were assigned to the same cluster model. This cutoff was empirically

27 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

592 determined after manual examination of the content of clusters assigned to the

593 same model under various distance cutoffs. Homolog groups present in ≥75% of

594 clusters assigned to a given model were then used to summarize that model. The

595 above approach was also used to group fungal species with similar combinations

596 of clusters into MCMPs by using a matrix detailing the presence or absence of

597 homologous clusters (as determined by cluster model) among all species with

598 two or more clusters. MCMPs observed in more than 5 species were used for

599 enrichment analyses. All above analyses were performed using the vegan

600 package in R (Oksanen et al. 2016).

601

602 Network analyses

603

604 Amino acid sequences from all clustered homolog groups across the 13 cluster

605 classes were combined into one set and then sorted into new homolog groups

606 using OrthoMCL. Homolog groups that contained sequences from multiple

607 cluster classes were designated as “shared”. Pairwise co-occurrences of all

608 homolog groups in all clusters were determined, and visualized as a network with

609 Cytoscape v.3.4.0 (Shannon et al. 2003). The network layout was determined

610 solely by the AllegroLayout plugin with the Allegro Spring-Electric algorithm.

611 Analyses of network modularity were performed with the spectral partitioning

612 algorithm (Newman 2006), as implemented in MODULAR (Marquitti et al. 2014),

28 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

613 and the probability of network modularity was estimated by randomly sampling

614 the network 1000 times, twice.

615

616 Enrichment tests

617

618 A one tailed Fisher’s exact test, as implemented in the Text-NSP Perl module,

619 was used to conduct all tests of enrichment (see Supplementary Figure 1 for

620 examples of contingency table and formulae). Unless noted otherwise, gene

621 cluster, cluster model or MCMP presence/absence was recorded at the species

622 level, and counts of species were used to fill in contingency tables. Odds ratios

623 were calculated to estimate the magnitude of the enrichment effect (Szumilas

624 2010). Contingency tables with a zero in least one cell had all cells incremented

625 by 0.5 to avoid division by zero when calculating the odds ratio (Bradburn et al.

626 2007). In general, the odds ratio can be interpreted as the odds that fungi have a

627 particular lifestyle, given they have a particular cluster-based feature (either

628 cluster presence (Figure 1b), cluster model presence (Supplementary Figure 4)

629 or MCMP assignment (Figure 2c). The precision of each odds ratio, except for

630 those associated with contingency tables with a zero in at least one cell, was

631 estimated by calculating the 95% confidence interval (Szumilas 2010). The

632 general form of the null hypothesis is that particular features are not enriched in

633 fungi with a particular ecological lifestyle. We rejected the null hypothesis at α =

634 0.05. As the Pezizomycotina possessed 89.4% of all clusters detected, only data

29 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

635 from Pezizomycotina genomes were used in enrichments tests presented in

636 Figure 1b, Supplementary Figure 4, and Figure 2c. Given the exploratory nature

637 of this study, we did not correct for multiple testing in order to identify trends to

638 follow up in later confirmatory studies.

639

640 Code availability

641

642 Unless noted otherwise, all analyses were performed using a series of custom

643 Perl scripts (Supplementary Figure 1). All scripts used to generate the presented

644 results are available upon request. Fasta files of all recovered cluster models, as

645 well as a script to recover cluster models identified in this analysis from a user-

646 supplied genome of interest, are available for download at

647 https://github.com/egluckthaler/cluster_retrieve

648

649 Acknowledgements and funding

650

651 This work was supported by funds from The Ohio Agricultural Research and

652 Development Center at The Ohio State University (EGT, JCS), The National

653 Science Foundation (DEB-1638999, JCS) and the Fonds de recherche du

654 Québec-Nature et technologies (EGT). All computational work was conducted on

655 the Ohio State Supercomputer. We thank [complete author list to be determined]

656 for providing access to unpublished genome data produced by the U.S.

30 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

657 Department of Energy Joint Genome Institute, a DOE Office of Science User

658 Facility, supported by the Office of Science of the U.S. Department of Energy

659 under Contract No. DE-AC02-05CH11231.

660

661

662 Author contributions

663

664 EGT and JCS conceptualized the work and wrote the manuscript. EGT

665 developed and performed all analyses.

666

667 Competing financial interests

668

669 The authors have declared that no competing interests exist.

670

671 References

672 Al-Shahrour F, Minguez P, Marqués-Bonet T, Gazave E, Navarro A, Dopazo J.

673 2010. Selection upon genome architecture: conservation of functional

674 neighborhoods with changing genes. PLoS Comput Biol 6(10):e1000953.

675 Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local

676 alignment search tool. Journal of molecular biology 215:403-410.

677 Andam CP, Gogarten JP. 2011. Biased gene transfer in microbial evolution. Nat

678 Rev Micro 9:543-555.

31 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

679 Baquero F. 2004. From pieces to patterns: evolutionary engineering in bacterial

680 pathogens. Nat Rev Microbiol 2:510-518.

681 Bayry J, Beaussart A, Dufrêne YF, Sharma M, Bansal K, Kniemeyer O,

682 Aimanianda V, Brakhage AA, Kaveri SV, Kwon-Chung KJ. 2014. Surface

683 structure characterization of Aspergillus fumigatus conidia mutated in the melanin

684 synthesis pathway and their human cellular immune response. Infection and

685 immunity 82:3141-3153.

686 Bouyioukos C, Reverchon S, Képès F. 2016. From multiple pathogenicity islands

687 to a unique organized pathogenicity archipelago. Sci Rep 6.

688 Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A. 2007. Much ado about

689 nothing: a comparison of the performance of meta‐analytical methods with rare

690 events. Statistics in medicine 26:53-77.

691 Davila Lopez M, Martinez Guerra JJ, Samuelsson T. 2010. Analysis of gene

692 order conservation in eukaryotes identifies transcriptionally and functionally

693 linked genes. PLoS ONE 5:e10654.

694 Desentis-Mendoza RM, Hernández-Sánchez H, Moreno A, Rojas del C E, Chel-

695 Guerrero L, Tamariz J, Jaramillo-Flores ME. 2006. Enzymatic polymerization of

696 phenolic compounds using laccase and tyrosinase from Ustilago maydis.

697 Biomacromolecules 7:1845-1854.

698 Dhillon B, Feau N, Aerts AL, Beauseigle S, Bernier L, Copeland A, Foster A, Gill

699 N, Henrissat B, Herath P. 2015. Horizontal gene transfer and gene dosage drives

32 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

700 adaptation to wood colonization in a tree pathogen. Proceedings of the National

701 Academy of Sciences 112:3451-3456.

702 Eddy SR. 2011. Accelerated profile HMM searches. PLoS Comput Biol

703 7:e1002195.

704 Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST.

705 Bioinformatics 26:2460-2461.

706 Enkerli J, Bhatt G, Covert SF. 1998. Maackiain detoxification contributes to the

707 virulence of Nectria haematococca MP VI on chickpea. Molecular Plant-Microbe

708 Interactions 11:317-326.

709 Fungal Databases [Internet]. 2017. U.S. National Fungus Collections, ARS,

710 USDA. cited April 1, 2017.

711 Fischer S, Brunk BP, Chen F, Gao X, Harb OS, Iodice JB, Shanmugam D, Roos

712 DS, Stoeckert CJ. 2011. Using OrthoMCL to assign proteins to OrthoMCL‐DB

713 groups or to cluster proteomes into new Ortholog groups. Current protocols in

714 bioinformatics:6.12. 11-16.12. 19.

715 Floudas D, Binder M, Riley R, Barry K, Blanchette RA, Henrissat B, Martínez AT,

716 Otillar R, Spatafora JW, Yadav JS. 2012. The Paleozoic origin of enzymatic lignin

717 decomposition reconstructed from 31 fungal genomes. Science 336:1715-1719.

718 Fowler ZL, Baron CM, Panepinto JC, Koffas MA. 2011. Melanization of

719 flavonoids by fungal and bacterial laccases. Yeast 28:181-188.

720 Gershenzon J, Fontana A, Burow M, Wittstock U, Degenhardt J. 2012. Mixtures

721 of plant secondary metabolites: metabolic origins and ecological benefits. In:

33 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

722 Iason GR, Dicke M, Hartley SE, editors. The ecology of plant secondary

723 metabolites: from genes to global processes. Cambridge (UK): Cambridge

724 University Press. p. 56-77.

725 Glenn AE, Davis CB, Gao M, Gold SE, Mitchell TR, Proctor RH, Stewart JE,

726 Snook ME. 2016. Two Horizontally Transferred Xenobiotic Resistance Gene

727 Clusters Associated with Detoxification of Benzoxazolinones by Fusarium

728 Species. PLoS ONE 11:e0147486.

729 Gluck-Thaler E, Slot JC. 2015. Dimensions of Horizontal Gene Transfer in

730 Eukaryotic Microbial Pathogens. PLoS Pathog 11:e1005156.

731 Greene GH, McGary KL, Rokas A, Slot JC. 2014. Ecology drives the distribution

732 of specialized tyrosine metabolism modules in fungi. Genome Biology and

733 Evolution 6:121-132.

734 Hammerbacher A, Schmidt A, Wadke N, Wright LP, Schneider B, Bohlmann J,

735 Brand WA, Fenning TM, Gershenzon J, Paetz C. 2013. A common fungal

736 associate of the spruce bark beetle metabolizes the stilbene defenses of Norway

737 spruce. Plant Physiology 162:1324-1336.

738 Hane JK, Lowe RG, Solomon PS, Tan K-C, Schoch CL, Spatafora JW, Crous

739 PW, Kodira C, Birren BW, Galagan JE. 2007. Dothideomycete–plant interactions

740 illuminated by genome sequencing and EST analysis of the wheat pathogen

741 Stagonospora nodorum. The Plant Cell 19:3347-3368.

34 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

742 Hane JK, Rouxel T, Howlett BJ, Kema GH, Goodwin SB, Oliver RP. 2011. A

743 novel mode of chromosomal evolution peculiar to filamentous Ascomycete fungi.

744 Genome Biol 12:R45.

745 Hoffmeister D, Keller NP. 2007. Natural products of filamentous fungi: enzymes,

746 genes, and their regulation. Natural product reports 24:393-416.

747 Höhl B, Arnemann M, Schwenen L, Stöckl D, Bringmann G, Jansen J, Barz W.

748 1989. Degradation of the Pterocarpan Phytoalexin (—)-Maackiain by Ascochyta

749 rabiei. Zeitschrift für Naturforschung C 44:771-776.

750 Holliday JA, Zhou L, Bawa R, Zhang M, Oubida RW. 2016. Evidence for

751 extensive parallelism but divergent genomic architecture of adaptation along

752 altitudinal and latitudinal gradients in Populus trichocarpa. New Phytologist

753 209:1240-1251.

754 Huerta-Cepas J, Serra F, Bork P. 2016. ETE 3: reconstruction, analysis, and

755 visualization of phylogenomic data. Mol Biol Evol 33:1635-1638.

756 Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, Rattei

757 T, Mende DR, Sunagawa S, Kuhn M. 2015. eggNOG 4.5: a hierarchical orthology

758 framework with improved functional annotations for eukaryotic, prokaryotic and

759 viral sequences. Nucleic acids research 44(D1):D286-D293.

760 Hurst LD, Williams EJ, Pal C. 2002. Natural selection promotes the conservation

761 of linkage of co-expressed genes. Trends Genet 18:604-606.

762 Jensen RA. 1976. Enzyme recruitment in evolution of new function. Annu Rev

763 Microbiol 30:409-425.

35 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

764 Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen

765 J, Mitchell A, Nuka G. 2014. InterProScan 5: genome-scale protein function

766 classification. Bioinformatics 30:1236-1240.

767 Jönsson LJ, Martín C. 2016. Pretreatment of lignocellulose: formation of

768 inhibitory by-products and strategies for minimizing their effects. Bioresource

769 technology 199:103-112.

770 Kettle AJ, Batley J, Benfield AH, Manners JM, Kazan K, Gardiner DM. 2015.

771 Degradation of the benzoxazolinone class of phytoalexins is important for

772 virulence of Fusarium pseudograminearum towards wheat. Molecular plant

773 pathology 16:946-962.

774 Kirkpatrick M, Barton N. 2006. Chromosome inversions, local adaptation and

775 speciation. Genetics 173:419-434.

776 Lah L, Podobnik B, Novak M, Korošec B, Berne S, Vogelsang M, Kraševec N,

777 Zupanec N, Stojan J, Bohlmann J. 2011. The versatility of the fungal cytochrome

778 P450 monooxygenase system is instrumental in xenobiotic detoxification. Mol

779 Microbiol 81:1374-1389.

780 Lawrence JG, Roth JR. 1996. Selfish operons: horizontal transfer may drive the

781 evolution of gene clusters. Genetics 143:1843-1860.

782 Lynch M. 2007. The origins of genome architecture. Sunderland (MA): Sinauer

783 Associates.

784 Mäkelä MR, Marinović M, Nousiainen P, Liwanag AJ, Benoit I, Sipilä J, Hatakka

785 A, de Vries RP, Hildén KS. 2015. Chapter Two-Aromatic Metabolism of

36 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

786 Filamentous Fungi in Relation to the Presence of Aromatic Compounds in Plant

787 Biomass. Advances in applied microbiology 91:63-137.

788 Marquitti FMD, Guimarães PR, Pires MM, Bittencourt LF. 2014. MODULAR:

789 software for the autonomous computation of modularity in large network sets.

790 Ecography 37:221-224.

791 McGary KL, Slot JC, Rokas A. 2013. Physical linkage of metabolic genes in fungi

792 is an adaptation against the accumulation of toxic intermediate compounds.

793 Proceedings of the National Academy of Sciences 110:11481-11486.

794 Muto A, Kotera M, Tokimatsu T, Nakagawa Z, Goto S, Kanehisa M. 2013.

795 Modular Architecture of Metabolic Pathways Revealed by Conserved Sequences

796 of Reactions. Journal of Chemical Information and Modeling 53:613-622.

797 Newman ME. 2006. Finding community structure in networks using the

798 eigenvectors of matrices. Physical review E 74:036104.

799 Nguyen NH, Song Z, Bates ST, Branco S, Tedersoo L, Menke J, Schilling JS,

800 Kennedy PG. 2016. FUNGuild: an open annotation tool for parsing fungal

801 community datasets by ecological guild. Fungal Ecology 20:241-248.

802 Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara B, Simpson

803 GL, Solymos P, Stevens MHH, Wagner H. 2016. vegan: Community Ecology

804 Package. Version 2.3-5.

805 Paradis E, Claude J, Strimmer K. 2004. APE: analyses of phylogenetics and

806 evolution in R language. Bioinformatics 20:289-290.

37 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

807 Peay KG, Kennedy PG, Talbot JM. 2016. Dimensions of biodiversity in the Earth

808 mycobiome. Nature Reviews Microbiology 14:434-447.

809 Pepper JW. 2003. The evolution of evolvability in genetic linkage patterns.

810 Biosystems 69:115-126.

811 Pigné S, Zykwinska A, Janod E, Cuenot S, Kerkoud M, Raulo R, Bataillé-

812 Simoneau N, Marchi M, Kwasiborski A, N’Guyen G, et al. 2017. A flavoprotein

813 supports cell wall properties in the necrotrophic fungus Alternaria brassicicola.

814 Fungal Biology and Biotechnology 4:1.

815 Plissonneau C, Stürchler A, Croll D. 2016. The Evolution of Orphan Regions in

816 Genomes of a Fungal Pathogen of Wheat. mBio 7:e01231-01216.

817 Rogozin IB, Makarova KS, Murvai J, Czabarka E, Wolf YI, Tatusov RL, Szekely

818 LA, Koonin EV. 2002. Connected gene neighborhoods in prokaryotic genomes.

819 Nucleic acids research 30:2212-2223.

820 Rogozin IB, Makarova KS, Wolf YI, Koonin EV. 2004. Computational approaches

821 for the analysis of gene neighbourhoods in prokaryotic genomes. Briefings in

822 bioinformatics 5:131-149.

823 Schmaler-Ripcke J, Sugareva V, Gebhardt P, Winkler R, Kniemeyer O,

824 Heinekamp T, Brakhage AA. 2009. Production of pyomelanin, a second type of

825 melanin, via the tyrosine degradation pathway in Aspergillus fumigatus. Appl

826 Environ Microbiol 75:493-503.

827 Schoch CL, Sung G-H, López-Giráldez F, Townsend JP, Miadlikowska J,

828 Hofstetter V, Robbertse B, Matheny PB, Kauff F, Wang Z. 2009. The

38 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

829 tree of life: a phylum-wide phylogeny clarifies the origin and evolution of

830 fundamental reproductive and ecological traits. Systematic biology 58:224-239.

831 Shalaby S, Horwitz BA, Larkov O. 2012. Structure–Activity Relationships

832 Delineate How the Maize Pathogen Cochliobolus heterostrophus Uses Aromatic

833 Compounds as Signals and Metabolites. Molecular Plant-Microbe Interactions

834 25:931-940.

835 Shanmugam V, Ronen M, Shalaby S, Larkov O, Rachamim Y, Hadar R, Rose

836 MS, Carmeli S, Horwitz BA, Lev S. 2010. The fungal pathogen Cochliobolus

837 heterostrophus responds to maize phenolics: novel small molecule signals in a

838 plant‐fungal interaction. Cellular microbiology 12:1421-1434.

839 Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N,

840 Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated

841 models of biomolecular interaction networks. Genome Res 13:2498-2504.

842 Shwab EK, Bok JW, Tribus M, Galehr J, Graessle S, Keller NP. 2007. Histone

843 deacetylase activity regulates chemical diversity in Aspergillus. Eukaryotic cell

844 6:1656-1664.

845 Slot JC, Hibbett DS. 2007. Horizontal transfer of a nitrate assimilation gene

846 cluster and ecological transitions in fungi: a phylogenetic study. PLoS ONE

847 2:e1097.

848 Smillie CS, Smith MB, Friedman J, Cordero OX, David LA, Alm EJ. 2011.

849 Ecology drives a global network of gene exchange connecting the human

850 microbiome. Nature 480:241-244.

39 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

851 Snel B, Bork P, Huynen MA. 1999. Genome phylogeny based on gene content.

852 Nat Genet 21:108-110.

853 Snel B, Lehmann G, Bork P, Huynen MA. 2000. STRING: a web-server to

854 retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic

855 acids research 28:3442-3444.

856 Spatafora JW, Chang Y, Benny GL, Lazarus K, Smith ME, Berbee ML, Bonito G,

857 Corradi N, Grigoriev I, Gryganskyi A. 2016. A phylum-level phylogenetic

858 classification of zygomycete fungi based on genome-scale data. Mycologia

859 108:1028-1046.

860 Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-

861 analysis of large phylogenies. Bioinformatics 30:1312-1313.

862 Stukenbrock EH, Bataillon T, Dutheil JY, Hansen TT, Li R, Zala M, McDonald BA,

863 Wang J, Schierup MH. 2011. The making of a new pathogen: insights from

864 comparative population genomics of the domesticated wheat pathogen

865 Mycosphaerella graminicola and its wild sister species. Genome Res 21:2157-

866 2166.

867 Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J,

868 Simonovic M, Roth A, Santos A, Tsafou KP. 2014. STRING v10: protein–protein

869 interaction networks, integrated over the tree of life. Nucleic acids research

870 43(D1):D447-D452.

871 Szumilas M. 2010. Explaining odds ratios. Journal of the Canadian Academy of

872 Child and Adolescent Psychiatry 19:227.

40 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

873 Takeda I, Umemura M, Koike H, Asai K, Machida M. 2014. Motif-independent

874 prediction of a secondary metabolism gene cluster using comparative genomics:

875 application to sequenced genomes of Aspergillus and ten other filamentous

876 fungal species. DNA research 21(4):447-457

877 Tsochatzidou M, Malliarou M, Papanikolaou N, Roca J, Nikolaou C. 2017.

878 Genome urbanization: clusters of topologically co-regulated genes delineate

879 functional compartments in the genome of Saccharomyces cerevisiae. Nucleic

880 acids research 45(10):5818-5828.

881 Wagner GP. 1996. Homologues, natural kinds and the evolution of modularity.

882 American Zoologist 36:36-43.

883 Wang Y, Lim L, Madilao L, Lah L, Bohlmann J, Breuil C. 2014. Gene discovery

884 for enzymes involved in limonene modification or utilization by the mountain pine

885 beetle-associated pathogen clavigera. Appl Environ Microbiol

886 80:4566-4576.

887 Weber T, Blin K, Duddela S, Krug D, Kim HU, Bruccoleri R, Lee SY, Fischbach

888 MA, Müller R, Wohlleben W. 2015. antiSMASH 3.0—a comprehensive resource

889 for the genome mining of biosynthetic gene clusters. Nucleic acids research

890 43:W237-W243.

891 Wickham H. 2016. ggplot2: elegant graphics for data analysis: Springer.

892 Wisecaver JH, Slot JC, Rokas A. 2014. The Evolution of Fungal Metabolic

893 Pathways. PLoS Genet 10:e1004816.

41 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

894 Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of

895 locally adaptive loci. Proceedings of the National Academy of Sciences

896 110:E1743-E1751.

897 Zaidi SSA, Zhang X. 2016. Computational operon prediction in whole-genomes

898 and metagenomes. Briefings in Functional Genomics. elw034.

899 Zhao S, Sakai A, Zhang X, Vetting MW, Kumar R, Hillerich B, San Francisco B,

900 Solbiati J, Steves A, Brown S. 2014. Prediction and characterization of enzymatic

901 activities guided by sequence similarity and genome neighborhood networks.

902 Elife 3:e03275.

903

904 Figure legends

905

906 Figure 1: Associations between gene cluster distributions and fungal lifestyle. a)

907 A phylogeny of 556 fungi (representing 481 species) based on pairwise

908 microsyntenic similarity is shown to the left, annotated by taxonomic class. All

909 taxonomic classes have been abbreviated by removing “-mycetes” suffixes. A

910 matrix indicating the ecological lifestyle (s) associated with each fungus is shown

911 to the immediate right of the phylogeny, followed by a heatmap indicating the

912 number of putative phenylpropanoid-degrading gene clusters in each genome,

913 for each cluster class. Numbers in brackets following the taxonomic class and

914 ecological lifestyle headers correspond to the number of genomes and species

915 within those categories. Numbers in brackets following cluster class headers

42 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

916 indicate the total number of clusters assigned to that class, the number of

917 species with at least 1 cluster, and the number of clusters that overlap with

918 clusters from another cluster class. b) Odds ratios representing the strength of

919 the association between cluster presence and fungal ecological lifestyle are

920 shown for each of 13 gene cluster classes and 6 lifestyles, using data at the

921 species level from the Pezizomycotina. Dotted red lines indicate an odds ratio of

922 1. Dark grey bars indicate enrichment below a significance level of 0.05, while

923 black outlines indicate enrichment below a significance level of 0.01. Error bars

924 indicate the 95% confidence interval (CI) for each odds ratio measurement. CIs

925 of 0 are not shown. The color-coding of ecological lifestyle is consistent across

926 the entire figure.

927

928 Figure 2: Combinations of putative phenylpropanoid-degrading gene clusters in

929 fungal genomes. a) A matrix describes the presence/absence of various

930 homologous clusters (as determined by cluster model; rows) in the genomes of

931 139 fungal species (columns). Species are grouped into 16 multi-cluster model

932 profiles (MCMPs) based on similarities in the combinations of clusters found in

933 their genomes. b) A bar chart depicts the number of fungal species from each

934 MCMP per taxonomic class. c) Odds ratios representing the strength of the

935 association between MCMP and fungal ecological lifestyle are shown, using data

936 at the species level from the Pezizomycotina. Dotted black lines indicate an odds

937 ratio of 1. Dark grey bars indicate enrichment below a significance level of 0.05,

43 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

938 while black outlines indicate enrichment below a significance level of 0.01. Error

939 bars indicate the 95% confidence interval (CI) for each odds ratio measurement.

940 CIs of 0 are not shown. Enrichment data is not shown for MCMP 12, as fewer

941 than 5 fungi from the Pezizomycotina are assigned to this MCMP.

942

943 Figure 3: Co-occurrence network of homolog groups in putative phenylpropanoid-

944 degrading gene clusters. Each node represents a homolog group found in a

945 putative phenylpropanoid-degrading gene cluster. All nodes are color-coded by

946 the cluster class in which they are found, except for those homolog groups found

947 associated with multiple cluster classes (i.e., shared), colored pink and shaped

948 as squares. Nodes representing anchor gene families used to retrieve the

949 clusters are shaped as diamonds, and those anchor gene families that are

950 shared across multiple cluster classes are additionally colored pink. Edges

951 symbolize the co-occurrence of two homolog groups in the same gene cluster,

952 while edge width is proportional to the frequency of that occurrence. Node size is

953 proportional to the number of connections emanating from that node. The

954 proximity of nodes to one another is proportional to the number of shared

955 connections. The annotations, followed by the code names and number of unique

956 connections in parentheses, of nodes with the greatest number of connections

957 (i.e., associated with the greatest diversity of homolog groups) are indicated in

958 the top right-hand corner of the network.

959

44 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

960 Figure 4: The distribution of pterocarpan hydroxylase (PAH) clusters in fungi. a) A

961 phylogeny of the Pezizomycotina based on pairwise microsyntenic distance is

962 shown to the left, annotated by taxonomic class. The presence/absence of

963 clusters assigned to the 4 PAH cluster models is indicated in the matrix to the

964 right of the phylogeny, color-coded by cluster model. b) A simplified schematic of

965 one of several reactions catalyzed by PAH. c) Homolog groups present in ≥75%

966 of clusters assigned to a given cluster model are depicted as boxes color-coded

967 by cluster model, while PAH homologs are indicated in yellow. The four-digit

968 code and predicted annotation are indicated for each homolog group. D) The

969 depicted network follows the conventions specified in Figure 2. Homolog groups

970 present in ≥75% of clusters assigned to a given cluster model are color-coded by

971 cluster model and inscribed with their code, while others are colored gray. Nodes

972 representing homolog groups present in clusters from other cluster classes are

973 drawn as squares.

974

975

45 A) Pezizomycotina Agarico. Sacch.2 Eurotio. Sordario. Leotio. Dothideo. Sacch.1 Fungal lifestyle Number ofclusters (122/110) (50/42) (73/67) (95/70) (37/21) (89/84) (3/3) 1 2 bioRxiv preprint 3 4

Fungal lifestyle

Animal pathotroph (85/70) Soil saprotroph (110/75)

Plant saprotroph (258/220) doi: not certifiedbypeerreview)istheauthor/funder.Allrightsreserved.Noreuseallowedwithoutpermission. Plant pathotroph (146/115)

Plant symbiotroph (41/40) https://doi.org/10.1101/184242

Endophyte (65/46)

QDH (247/148/22)

ARD (210/156/1)

NAD (174/151/0)

PAH (145/133/0)

PMO (134/124/6) Gene clusterclass

CCH (78/71/30)

BPH (62/54/1)

SAH (40/35/0)

SDO (26/21/0)

ECL (23/23/0) ;

VAO (15/13/2) this versionpostedSeptember4,2017. CAE (12/12/0)

FAD (2/2/0)

Odds ratio B) 10 12 14 10 12 14 10 12 14 10 12 14 10 12 14 10 12 14 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 Gene cluster class QDH

ARD Enrichment significance(Pezizomycotina) * *

NAD P P >0.05 PAH PMO

CCH

BPH SAH ≤ 00 P 0.05 SDO The copyrightholderforthispreprint(whichwas + *upper 95% CIoutofrange odds ratio outof range ECL VAO CAE *

FAD ≤

0.01 + * *

saprotroph pathotroph saprotroph pathotroph symbiotroph

Soil Animal Plant Plant Plant Endophyte Fungal lifestyle Fungal bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

A) Multi-cluster model profile (MCMP) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 2 3 QDH 4 5 6 1 2 3 4 ARD 5 6 7 8 1 2 3 4 NAD 5 6 7 8 1 2 PAH 3 4 1 2 3 PMO 4 5 6 1 Gene cluster model 2 3 CCH 4 5 6 1 2 BPH 3 4 1 2 3 SAH 4 5 6 1 ECL 2 1 SDO 1 VAO 2 3 1 CAE 2 FAD 1 Fungal species gene cluster presence

*upper 95% CI out of range pathotroph 20 20 * Animal 15 16 B) C) 12 10 8

5 4 Dothideo. 0 0 saprotroph 20 20 * *

15 16 Soil 12 10 8 Leotio. 5 4 Number of species 0 0 saprotroph Fungal lifestyle 20 20

15 16 Plant

10 12 8 5

Sordario. 4 0 0 20 pathotroph 20 *** 15 16 Plant

10 12

Odds ratio 8 5 Eurotio. 4

Taxonomic class Taxonomic 0 0 20 symbiotroph 20 ** 15

16 Plant

10 12 8 5

4 Saccharo. 0 0

20 Endophyte 20 **** 15 16

10 12

Other 8 5 4 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16 Multi-cluster model profile (MCMP) Multi-cluster model profile (MCMP) Enrichment significance Number of species in MCMP (Pezizomycotina only) P > 0.05 P ≤ 0.05 P ≤ 0.01 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Most higly connected homolog groups 1-Aldehyde dehydrogenase (ALL1050; 104) 2-Quinate permease (ALL1001; 83) 3-Cytochrome P450 (ALL1024; 82) 4-Catechol dioxygenase (ALL1022; 71) 5-Salicylate hydroxylase (ALL1011; 68)

1 3 2 4

5 Homolog group type Unique Shared Unique Shared anchor anchor Gene cluster class QDH BPH NAD SAH ARD ECL PMO SDO PAH VAO CCH CAE FAD bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Cluster presence O O O O H A) Model 1 Model 2 Model 3 Model 4 B) H PAH H

O H H H O O O O

O O (-)-maackiain 1α-hydroxy-maackiain

C) Cytochrome NmrA-like Unknown Unknown P450 transcriptional Model 1 PAH 1005 1007 regulator 1011 1014 Dothideomycetes

Sugar+other Cytochrome Dehydrogenase P450 transporter Model 2 PAH 1001 1002 1003

Amino acid MFS Isoflavone Monoamine Unknown permease transporter reductase oxidase Model 3 PAH 1010 1030 1155 1289 1296

Leotiomycetes Cytochrome Transcription 15-hydroxy Unknown P450 factor prostaglandin Model 4 PAH 1001 10791129 dehydrogenase 1298 D)

1011 1005 1007 1014

1003

Sordariomycetes 1002

1001

PAH 1010 1298 1129 1079 1030 1155

1296

1289

Homolog group type

Eurotiomycetes Model 1 Model 2 Model 3 Model 4 Present in < 75% of clusters Anchor Shared with another cluster class assigned to model