bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
1 Article-Discoveries
2 Specialized plant biochemistry drives gene clustering in fungi
3 Emile Gluck-Thaler1, Jason C. Slot1*
4 1 Department of Plant Pathology, The Ohio State University, Columbus, Ohio,
5 United States of America
6 *Corresponding author
7 E-mail: [email protected]
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
1 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
23 Abstract
24 The fitness and evolution of both prokaryotes and eukaryotes are affected
25 by the organization of their genomes. In particular, the physical clustering of
26 functionally related genes can facilitate coordinated gene expression and can
27 prevent the breakup of co-adapted alleles in recombining populations. While
28 clustering may thus result from selection for phenotype optimization and
29 persistence, the extent to which eukaryotic gene organization in particular is
30 driven by specific environmental selection pressures has rarely been
31 systematically explored. Here, we investigated the genetic architecture of fungal
32 genes involved in the degradation of phenylpropanoids, a class of plant-produced
33 secondary metabolites that mediate many ecological interactions between plants
34 and fungi. Using a novel gene cluster detection method, we identified over one
35 thousand gene clusters, as well as many conserved combinations of clusters, in
36 a phylogenetically and ecologically diverse set of fungal genomes. We
37 demonstrate that congruence in gene organization over small spatial scales in
38 fungal genomes is often associated with similarities in ecological lifestyle.
39 Additionally, we find that while clusters are often structured as independent
40 modules with little overlap in content, certain gene families merge multiple
41 modules in a common network, suggesting they are important components of
42 phenylpropanoid degradation strategies. Together, our results suggest that
43 phenylpropanoids have repeatedly selected for gene clustering in fungi, and
2 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
44 highlight the interplay between gene organization and ecological evolution in this
45 ancient eukaryotic lineage.
46
47 Introduction
48 Genome organization is intimately linked to the trajectory of organismal
49 evolution. The impacts of genome organization on prokaryotic evolution in
50 particular have been extensively studied (Baquero 2004), and increasingly, the
51 organization of genes in eukaryotic genomes is also recognized to affect
52 organismal fitness and evolution. For example, the spatial clustering of
53 functionally related genes may enable the compartmentalization and optimization
54 of phenotypes through coordinated gene expression (Hurst et al. 2002; Al-
55 Shahrour et al. 2010; McGary et al. 2013). Similarly, the formation of loci
56 composed of co-adapted alleles can facilitate the inheritance of locally adapted
57 ecotypes within recombining populations over short time periods (Yeaman 2013;
58 Holliday et al. 2016). Rather than resulting from non-adaptive processes, such as
59 genetic hitchhiking, the persistence of such organizational patterns in eukaryotic
60 genomes suggests they may instead result from natural selection (Hurst et al.
61 2002; Lynch 2007). However, the extent to which eukaryotic genome
62 organization is driven by specific environmental selection pressures, especially
63 over macroevolutionary timescales, is not clear.
64 Organized genome structure is particularly apparent in fungi, a lineage of
65 eukaryotic microorganisms that perform critical ecosystem services and biomass
3 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
66 transformation, and also impact plant and animal health (Peay et al. 2016).
67 Fungal genomes are more or less replete with metabolic gene clusters (MGCs)
68 composed of genes encoding enzymes, transporters and regulators that
69 participate in specialized metabolic processes such as nutrient acquisition,
70 competition and defense (Wisecaver et al. 2014). Although MGCs are far more
71 rare in eukaryotes compared with bacteria, fungal MGCs exhibit similarly sparse
72 and disjunct phylogenetic distributions among distantly related species with
73 overlapping niches (Greene et al. 2014; Dhillon et al. 2015; Glenn et al. 2016).
74 This ecological pattern of distribution suggests that conserved combinations of
75 genes may be signatures of ecological selection in fungal genomes.
76 Fungal MGCs encoding the production of specialized, or secondary,
77 metabolites (SMs) have been studied extensively (Hoffmeister and Keller 2007)
78 and more recently, several reports suggest that adaptations to degrade plant
79 SMs are also encoded in MGCs (Shanmugam et al. 2010; Greene et al. 2014;
80 Wang et al. 2014; Kettle et al. 2015; Glenn et al. 2016). Plant SMs mediate
81 important biotic and abiotic interactions, including the exclusion of fungal
82 pathogens, the establishment of mutualisms, and the rates of nutrient cycling
83 long after the plant has died. The largest group of plant SMs are the
84 phenylpropanoids, which not only contribute to constitutive and inducible
85 chemical defenses, but are also the main barriers to wood decay in terrestrial
86 ecosystems (Floudas et al. 2012), and costly inhibitors of lignocellulose biofuel
87 production (Jönsson and Martín 2016). As the primary colonizers of plant
4 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
88 material in natural and artificial environments, fungi are frequently in contact with
89 phenylpropanoids, and must often mitigate their inhibitory effects in order to
90 grow. Common fungal adaptations to phenylpropanoid toxicity include
91 detoxification through sequestration, excretion and degradation (Mäkelä et al.
92 2015). Despite the characterization of many phenylpropanoid degradation
93 pathways in fungi, the genomic bases of these pathways are largely unknown
94 (Mäkelä et al. 2015), precluding the use of currently available algorithms to
95 investigate whether or not these metabolic processes are encoded in MGCs
96 (Wisecaver et al. 2014; Weber et al. 2015).
97 Here, we developed a novel algorithm based on empirically derived
98 models of fungal genome evolution in order to systematically identify clusters of
99 genes putatively involved in phenylpropanoid degradation, enabling us to test the
100 hypothesis that selection pressures from plant SMs impact genome organization
101 across disparate fungal lineages. Using a database of 556 fungal genomes
102 representing 481 species, we detected 1168 MGCs and many conserved
103 combinations of MGCs that putatively degrade a broad array of
104 phenylpropanoids. We then tested for associations between MGCs and various
105 fungal ecological lifestyles, and found that the presence of certain MGCs was
106 enriched in plant pathotrophs and saprotrophs. While many clusters appear to
107 have evolved independently, we identified several gene families that are
108 commonly associated with diverse MGCs, suggesting they play important roles in
109 phenylpropanoid catabolism. These results suggest that phenylpropanoids are
5 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
110 drivers of gene organization in plant-associated fungi, and that MGCs may in turn
111 determine patterns of fungal community assembly on both living and decaying
112 plant tissues.
113
114 Results
115
116 Diverse candidate gene clusters are associated with phenylpropanoid
117 degradation
118
119 Using 27 different “anchor” gene families involved in phenylpropanoid
120 degradation as separate queries (Methods; Supplementary Table 1), we
121 searched 556 genomes from 481 fungal species for clusters associated with
122 each anchor gene family (Methods; Supplementary Figure 1, Supplementary
123 Table 2). After removing those clusters containing genes known to exclusively
124 participate in fungal secondary metabolite biosynthesis (Supplementary Table 3;
125 Supplementary Table 4), as well as gene clusters with fewer than 4 genes, we
126 found evidence of unexpected clustering in 13 anchor gene families, which we
127 defined as separate cluster classes. We identified a total of 1168 clusters
128 distributed across 363 fungal genomes (Figure 1, Supplementary Figure 2,
129 Supplementary Table 5). 31 of these clusters belonged to multiple clusters
130 classes (i.e., contained multiple anchor genes) (Supplementary Table 3). When
131 accounting for multiple occurrences of homologous clusters (as defined by
6 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
132 cluster model; Methods) among individuals of the same species, this total figure
133 was reduced to 962 clusters distributed across 309 fungal species. The number
134 of cluster models per cluster class varies from 1 to 8, and clusters assigned to
135 different models generally do not overlap in terms of content (Supplementary
136 Table 6). Additionally, clusters typically contained very few intervening genes
137 (Supplementary Figure 3). Clusters are disproportionately distributed across
138 fungal lineages: for example, Pezizomycotina comprises 50.3% of the species in
139 the analysis yet contains 89.4% of all clusters, while Agaricomycetes represent
140 22.9% of species, but have only 4.7% of clusters (Supplementary Table 7).
141 However, homologous clusters assigned to the same model are typically found
142 distributed across different taxonomic classes (Supplementary Table 7).
143 In order to benchmark our approach, we searched for previously
144 characterized clusters containing quinate 5-dehydrogenase (QDH) and stilbene
145 dioxygenase (SDO) homologs. All 8 previously predicted clusters from the SDO
146 cluster class (Greene et al. 2014) and all 31 described clusters from the QDH
147 class (Hane et al. 2007; Shalaby et al. 2012) were recovered. Additionally, while
148 this study was being prepared, a single cluster from the pterocarpan hydroxylase
149 (PAH) class was identified elsewhere (Pigné et al. 2017). To our knowledge, the
150 remaining 1128 clusters identified in this analysis are described here for the first
151 time. We expect some false negative error in our detection but not significant
152 false positive to arise due to variation in the quality of genome assemblies and
153 annotations. Some false positive error in our detection may, however, arise from
7 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
154 the fact that beyond the function of the anchor gene, we did not require any
155 genes to be demonstrably involved in phenylpropanoid degradation because the
156 loci involved in such pathways are for the most part unknown (Mäkelä et al.
157 2015). Although we removed clusters with genes known to exclusively participate
158 in biosynthetic secondary metabolism (Supplementary Table 4), it is still possible
159 that some of the remaining clusters participate in biosynthetic reactions, as it is
160 often difficult to definitively distinguish between biosynthetic and catabolic
161 processes.
162
163 Clustered homolog groups, including those distributed across cluster classes, are
164 enriched for primary and secondary metabolic functions
165
166 Despite the lack of functional constraints on cluster composition, clustered
167 homolog groups were, enriched for 7 KOG processes primarily related to either
168 primary or secondary metabolism, including “Energy production and conversion”
169 and “Secondary metabolite biosynthesis, transport and catabolism”, the two
170 largest and most significantly enriched categories (Supplementary Table 8). The
171 KOG process “Transcription” was the only non-metabolic process enriched in
172 clustered homolog groups. Homolog groups present in more than one cluster
173 class are also enriched for multiple functional categories related to primary and
174 secondary metabolism (Supplementary Table 9). Conserved domains from the
175 Pfam database that are present among shared homolog groups include transport
8 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
176 and transcription-related domains, cytochrome P450 domains, and domains
177 related to functional group modification (e.g., dehydrogenase, transferase,
178 amidase and decarboxylase; Supplementary Table 10).
179
180 Phenylpropanoid degradation gene clusters are enriched in Pezizomycotina
181 species with plant-associated lifestyles
182
183 We limited our exploratory enrichment analyses to the Pezizomycotina, as they
184 contained the vast majority of identified clusters. Clusters from 5 cluster classes
185 are enriched in plant pathotrophs, including those containing the anchor genes
186 naringenin 3-dioxygenase (NAD) and epicatechin laccase (ECL) that are know to
187 participate in flavonoid degradation (Figure 1). An additional 3 cluster classes are
188 enriched in other plant-associated lifestyles, such as plant saprotrophs and
189 endophytes. The benzoate 4-monooxygenase (BPH) cluster class is enriched in
190 soil saprotrophs. No cluster classes are enriched in plant symbiotrophs or animal
191 pathotrophs. Similar patterns of lifestyle-dependent enrichment are observed
192 when comparing Pezizomycotina species with specific cluster models to those
193 without any cluster models (Supplementary Figure 4). Notably, at least one model
194 in each of 12 cluster classes is enriched in a plant-associated lifestyle. Aromatic
195 ring dioxygenase (ARD) models are most often enriched in plant-associated
196 lifestyles: model 1 is enriched in plant symbiotrophs and plant pathotrophs;
197 model 2 is enriched in plant pathotrophs, and model 5 is enriched in endophytes.
9 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
198 Although ecological enrichments highlight predominant trends, clusters across all
199 anchor genes are typically associated with diverse lifestyles, in part because any
200 given fungal species can be associated with multiple ecological lifestyles.
201 We observed that 207 fungal species possessed more than one cluster
202 model. These fungi were assigned to multi cluster model profiles (MCMPs) based
203 on the combinations of cluster models found in their genomes (Methods). 139
204 fungal species were distributed across 16 distinct MCMPs with 5 or more species
205 (Figure 2). Fungi from the same MCMP tend to be closely related; however, 13
206 MCMPs contain fungi from different taxonomic orders, and 6 of these contain
207 fungi from different taxonomic classes. We also limited this exploratory
208 enrichment analysis to Pezizomycotina species and found that 4 multi-cluster
209 profiles are enriched in species with plant-associated lifestyles, 1 is enriched in
210 animal pathotrophs and endophytes, and 1 is enriched in soil saprotrophs (Figure
211 2).
212
213 Homolog groups are found in modules of co-clustered genes
214
215 We visualized structural associations among all clustered homolog groups as a
216 network where nodes represent homolog groups, and edges represent the co-
217 occurrence of homolog groups in a cluster (Figure 3). A number of homolog
218 groups are found in clusters belonging to different cluster classes (39 in total;
219 Supplementary Table 10), which resulted in a common network for all cluster
10 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
220 classes. Shared homolog groups are typically among the most highly connected
221 nodes in this network (i.e., they co-occur frequently with a diverse set of homolog
222 groups; Supplementary Table 10). The modularity of the network is significantly
223 greater than that observed in networks of similar size with randomly distributed
224 edges (Supplementary Table 11). This high degree of modularity reflects a large
225 number of homolog groups that are unique to each cluster class (238 in total).
226 Some homolog groups, while not shared across cluster class, are highly
227 connected because they are present in different cluster models (e.g., see
228 homolog group 1001 in Figure 4).
229
230 Results spotlight (could be displayed as a Box, combining results and
231 discussion): Pterocarpan hydroxylase clusters may encode uncharacterized
232 strategies for flavonoid degradation
233
234 The production of toxic SMs by plants is an ancient and effective strategy for
235 deterring fungal growth. Isoflavonoids, for example, are a critical component of
236 legume (Fabaceae) defenses against fungi. In response, some legume
237 pathogens have evolved degradative metabolic pathways to detoxify
238 isoflavonoids, thus facilitating host colonization (Enkerli et al. 1998). Pterocarpan
239 hydroxylase (PAH) is involved in the degradation of pterocarpan and medicarpin
240 isoflavonoids, and can contribute to virulence on susceptible plant hosts (Enkerli
241 et al. 1998). We found 133 clusters in the PAH class, distributed across 94 fungal
11 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
242 species with diverse ecological lifestyles (Figure 4). Homology among these
243 clusters is best described by 4 models containing many of the canonical
244 functions found in fungal MGCs. Models 1, 2 and 4 all contain cytochrome
245 P450s, a large family of enzymes involved in detoxification mechanisms in fungi
246 and other organisms (Lah et al. 2011), while models 2 and 3 contain at least one
247 transporter and model 4 contains a transcription factor. Model 3 contains a
248 homolog of isoflavone reductase, an enzyme used in isoflavonoid biosynthesis in
249 plants, but may also be involved in isoflavonoid detoxification pathways (Höhl et
250 al. 1989). The widespread distribution of PAH clusters suggests they are involved
251 in the degradation of flavonoids found outside the Fabaceae, and the enrichment
252 of model 1-type clusters in plant saprotrophs suggests that flavonoids that persist
253 in plant tissues may impact fitness in saprophytic environments. However, the
254 genetics underlying isoflavonoid biosynthesis and degradation, and flavonoid
255 metabolism in general, are not well understood for fungi, despite the ecological
256 importance of this chemical family.
257 A PAH homolog in Alternaria brassisicola found in a cluster assigned to
258 model 1 was recently shown to contribute to the accumulation of DHN melanin in
259 cell walls, possibly through catalyzing melanin polymerization or the cross-linking
260 of melanin to fungal cell walls (Pigné et al. 2017). Melanins are a class of
261 complex phenolic polymers that can serve to quench oxidizing radicals, and play
262 a role in fungal pathogenicity and survival in harsh environments (Bayry et al.
263 2014). Intriguingly, plant flavonoids can be polymerized in planta and in vitro into
12 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
264 melanin by fungal laccases (Fowler et al. 2011) and possibly by
265 monooxygenases similar to PAH (Desentis-Mendoza et al. 2006). As the
266 repurposing of host metabolites as substrates for the production of SMs is not
267 unprecedented in fungi (Schmaler-Ripcke et al. 2009), we suggest that this PAH
268 cluster in particular is a good candidate for exploring hypotheses of cross talk
269 between specialized metabolic pathways.
270
271 Discussion
272
273 Candidate phenlypropanoid degradation MGCs found at over 1000 loci with
274 unexpectedly conserved synteny
275
276 Fungi are the most ecologically significant decomposers of plant biomass, and
277 represent some of the world’s most devastating plant pathogens (Peay et al.
278 2016). The presence of catabolic genes targeting plant SMs in fungal genomes
279 has often been linked to species ecology and the ability to colonize different
280 types of plant tissues (Floudas et al. 2012). Much less is known about how these
281 genes are organized in fungal genomes, and how their organization impacts or
282 may be affected by fungal evolution and ecology. The genomes of many fungi
283 tend to undergo extensive and frequent rearrangements (Hane et al. 2011;
284 Plissonneau et al. 2016). By contrast, MGCs we detected are often
285 simultaneously present in lineages that diverged hundreds of millions of years
13 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
286 ago (Floudas et al. 2012) (Figure 1). As combinations of genes that persist over
287 long periods of evolutionary time are likely to be maintained by natural selection
288 (Hurst et al. 2002; Baquero 2004; Muto et al. 2013) and often encode proteins
289 that are functionally related (Szklarczyk et al. 2014; Zhao et al. 2014), we
290 hypothesize that the conserved combinations of phenylpropanoid degrading
291 genes detected here encode adaptive metabolic phenotypes and are signatures
292 of selection on gene organization.
293 The most well studied phenotype conferred by clustering is the improved
294 coordination of gene transcription (Hurst et al. 2002). Clustering facilitates gene
295 transcription by allowing co-regulation through local chromatin modifications
296 (Shwab et al. 2007), the sharing of promoters (Davila Lopez et al. 2010), and the
297 avoidance of topological constraints on DNA encountered during transcription
298 (Tsochatzidou et al. 2016). In addition to synchronising cellular responses,
299 coordinated gene transcription may also decrease the accumulation of toxic
300 intermediate metabolites, such as those produced during phenylpropanoid
301 degradation (Greene et al. 2014; Mäkelä et al. 2015), by optimizing enzyme
302 stoichiometry and improving metabolic flux (McGary et al. 2013). Gene clustering
303 may additionally be selected for because it increases the capacity for
304 recombining populations to evolve. When genes contributing to the same trait are
305 clustered, selection for that trait is more efficient because of decreased selective
306 interference between genes at that locus (Pepper 2003). Clustering can also
307 prevent the breakup of co-adapted alleles in the face of gene flow (Yeaman
14 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
308 2013), which may enable local metabolic adaptation of fungal populations
309 distributed across cryptic niches.
310 Similarly, clustering increases the probability of propagating adaptive
311 combinations of genes through both vertical (Pepper 2003) and horizontal
312 (Lawrence and Roth 1996; Baquero 2004) transfer. Many of the clusters
313 observed in this study have discontinuous phylogenetic distributions that are
314 consistent with evolutionary scenarios such as vertical inheritance coupled with
315 extensive loss, convergence, duplication, or horizontal gene transfer (HGT). HGT
316 of MGCs is predicted to be associated with rapid adaptive changes to
317 phenotypes, including increased capacity for host colonization (Dhillon et al.
318 2015; Glenn et al. 2016) and nutrient acquisition (Slot and Hibbett 2007). For
319 example, MGCs in bacteria are frequently dispersed by HGT among species
320 inhabiting the same environment or host (Baquero 2004). HGT is positively
321 influenced by relatedness in bacteria (Andam and Gogarten 2011) and fungi
322 (Wisecaver et al. 2014; Gluck-Thaler and Slot 2015), as well as by shared
323 ecological niche (Smillie et al. 2011). Given that the distributions we observed
324 here may be at least partly HGT-driven (Greene et al. 2014), the role of HGT in
325 the dispersal of putative phenylpropanoid degrading MGCs must be explicitly
326 tested with phylogenetic methods (Gluck-Thaler and Slot 2015).
327 We detected the vast majority of clusters in sub-phylum Pezizomycotina.
328 Among fungi, the Pezizomycotina are known to possess highly clustered
329 genomes, while phylum Basidiomycota and sub-phylum Saccharomycotina have
15 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
330 much fewer (Wisecaver et al. 2014). Such differences may be due modes of
331 chromosomal evolution unique to the Pezizomycotina that involve extensive
332 rearrangements that could facilitate cluster formation (Hane et al. 2011). The
333 large population sizes attributed to many Pezizomycotina lineages could then
334 serve to increase selection efficiency for clusters (Lynch and Walsh 2007) such
335 that they would be maintained. Fungi in the Pezizomycotina are also thought to
336 experience more HGT than those in other phyla (Wisecaver et al. 2014), which
337 may facilitate cluster dispersal.
338
339 Gene cluster distributions suggest ecological adaptation
340
341 While certain clusters have distributions suggestive of lineage-specific bias
342 (Supplementary Table 7), these conserved combinations of genes were
343 ultimately detected by their unexpected phylogenetic distributions that conflicted
344 with lineage-specific phylogenetic signal, indicating that phylogeny is not the sole
345 determinant of cluster distributions. Indeed, gene organization is often an
346 important determinant of fitness and may be differentially selected across
347 ecological niches (Kirkpatrick and Barton 2006; Yeaman 2013). Our analysis
348 suggests that the presence of putative phenylpropanoid degrading MGCs in the
349 Pezizomycotina is associated with ecological lifestyle (Figure 1), although this
350 does not of course preclude other traits, such as host preference, that might also
351 affect cluster distributions. Intriguingly, two of the five cluster classes enriched in
16 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
352 plant pathotrophs (NAD and ECL) are predicted to target flavonoids, which are
353 important components of plant defense against fungal pathogens. Such clusters
354 may contribute to pathogen fitness, as recent reports suggest that the
355 degradation of host defense compounds contributes quantitatively to plant
356 pathogen virulence and reproduction (Hammerbacher et al. 2013; Kettle et al.
357 2015). Other cluster classes enriched in plant pathotrophs, including QDH, ARD
358 and phenol 2-monooxygenase (PMO), are predicted to target less derived
359 molecules, and may participate in the downstream degradation of more complex
360 phenylpropanoids, such as lignin (Jönsson and Martín 2016). Catabolic clusters
361 may also confer fitness benefits to saprotrophs living in plant tissues or soil by
362 enabling the degradation of phenylpropanoids that inhibit fungal colonization of
363 organic matter (Floudas et al. 2012). Indeed, many of the cluster classes
364 enriched in saprotrophs, including ferulic acid esterase (CAE) and BPH are
365 predicted to target phenylpropanoids commonly found in decaying plant material
366 and soils (Mäkelä et al. 2015). We also found that individual cluster models are
367 enriched in plant-associated lifestyles, giving insight into the specific patterns of
368 gene organization favoured by natural selection (Supplementary Figure 4).
369 Clusters from different models were rarely enriched in more than one ecological
370 lifestyle, possibly reflecting differential selection pressures imposed by
371 phenylpropanoids encountered in different ecological contexts.
372
17 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
373 Cluster model co-occurrences may reflect simultaneous selection by multiple
374 plant metabolites
375
376 The modularity of the structural network of clustered homolog groups suggests
377 that different cluster classes carry out semi-independent functions (Wagner
378 1996), and may reflect differences in phytochemicals processed by each cluster
379 class (Figure 3). Minimally, the structuring of clusters as independent modules
380 with little overlap in content indicates that cluster classes have largely evolved
381 independently of one another, and that selection for gene organization is likely to
382 have occurred multiple times.
383 We sorted different fungal species into multi-cluster model profiles
384 (MCMPs) based on the combinations of clusters found in their genomes (Figure
385 2), raising the intriguing possibility that conserved cluster repertoires may be
386 important for determining ecological phenotypes, similar to how combinations of
387 pathogenicity islands in bacteria determine host range (Bouyioukos et al. 2016).
388 While certain MCMPs were only present in specific fungal classes, others were
389 present in fungi from different taxonomic classes and orders, and certain MCMPs
390 are enriched in Pezizomycotina species with specific plant-associated lifestyles
391 (Figure 2). Plants typically do not produce a single chemical in isolation; rather,
392 defense metabolites are released as mixtures, either with metabolites from
393 different pathways, or with metabolic precursors from the same pathway, or both
394 (Gershenzon et al. 2012). MCMPs may reflect the compositions of SM mixtures
18 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
395 encountered during the colonization of specific hosts, or possibly temporal
396 differences in defense compound pressure (Hammerbacher et al. 2013), and
397 may underlie assembly patterns of fungal communities on different hosts and
398 substrates. A detailed investigation of the full complement of degradative MGCs
399 from fungi colonizing the same plant host would be ideal for testing this
400 hypothesis.
401
402 Do gene clusters bridge plant and fungal metabolisms?
403
404 In order to infer which clustered homolog groups may play outsized roles in
405 fungal strategies for phenylpropanoid degradation, we ranked each group by the
406 number of non-redundant co-occurrences they have with other clustered homolog
407 groups, and identified homolog groups present in different cluster classes (Figure
408 3; Supplementary Table 10). We predict that the homolog groups with the
409 greatest number of co-occurrences (i.e., the most highly connected, Figure 3), as
410 well as those found associated with the greatest number of cluster classes, have
411 been repeatedly recruited to diverse catabolic processes and may encode key
412 strategies for phenylpropanoid degradation in fungi. Notably, some of the shared
413 homolog groups, like those with members encoding cytochrome P450 enzymes
414 (see groups ALL1024, ALL1032, ALL1115 in Supplementary Table 10) and multi-
415 drug transporters (ALL1122, ALL1127), are known to be associated with
19 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
416 evolutionary adaptability and often underlie fungal detoxification strategies of
417 diverse chemicals (Lah et al. 2011; Mäkelä et al. 2015).
418 Fungi often exclusively acquire carbon from plant-derived substrates.
419 Intriguingly, the cellular processes associated with shared homolog groups also
420 suggest selection to integrate phenylpropanoid degradation products into central
421 fungal metabolism. Other shared homolog groups have members normally
422 associated with core fungal metabolism, for example, carbohydrate transporters
423 (MFS monocarboxylate transporter: ALL1065; Sugar (and other) transporter:
424 ALL1071, ALL1103), enzymes that produce substrates for aromatic amino acid
425 biosynthesis (3-dehydroshikimate dehydratase: ALL1002), and enzymes that
426 participate in aromatic amino acid degradation (fumarylacetoacetate hydrolase:
427 ALL1055). Metabolic processes, such as those often encoded in clusters (Muto
428 et al. 2013; Greene et al. 2014), are predicted to evolve through the recruitment
429 and subsequent specialization of duplicated enzyme-encoding genes from
430 existing metabolic pathways (Jensen 1976). Shared homolog groups thus offer
431 evidence that genes involved in SM degradation as well as those involved in core
432 fungal metabolism have been recruited from existing pathways to MGCs, as has
433 been suggested elsewhere (Greene et al. 2014). Combinations of genes in
434 MGCs may ultimately serve to connect diversifying chemical environments to a
435 stable metabolic network, acting as a bridge between specialized plant
436 metabolism and core fungal metabolism, and enabling fungal growth under what
437 would otherwise be challenging conditions.
20 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
438 The prevalence of putative phenylpropanoid-degrading MGCs in plant-
439 associated fungi suggests that specialized plant metabolism is a strong source of
440 selection on fungal gene organization. Based on their distribution, we
441 hypothesize that the MGCs detected here encode selectable phenotypes that
442 promote the colonization of plant substrates by fungi. Further functional
443 characterization of these loci will serve to accelerate discovery of metabolic
444 processes that are of great interest not only for lignocellulosic biofuel production,
445 but also for our understanding of the evolutionary dynamics between gene
446 organization and ecological adaptation.
447
448 Materials and methods
449
450 Data acquisition, annotation and software specifications
451
452 Publically available data from 556 assembled fungal genomes and predicted
453 proteomes were downloaded from various sources (Supplementary Table 2).
454 While data from all genomes was used to generate the presented results, specific
455 results from the 209 unpublished genomes have been redacted, and assigned
456 generic names based on taxonomic class. For each gene with alternative
457 transcripts in the set of genomes, the predicted amino acid sequence associated
458 with the longest of such transcripts was used for cluster detection, and the rest
459 were discarded. Ecological metadata of all 556 genomes were compiled from
21 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
460 various sources (Supplementary Table 2), including the community-curated
461 FUNGuild database (last accessed 09/09/16) (Nguyen et al. 2016) and the U.S.
462 National Fungus Collection database (last accessed April 1st, 2017) (Farr and
463 Rossman 2017). The naming of ecological lifestyles follow the framework
464 established by FUNGuild (Nguyen et al. 2016).
465 Amino acid sequence searches with cutoffs of 30% identity, 50 bitscore
466 and where the length of the target sequence was 50-150% of the query
467 sequence, were performed with using USEARCH v8.0.1517’s UBLAST algorithm
468 (additional parameters: evalue cutoff = 1e-5, accel = 0.8) (Edgar 2010) or with
469 BLASTp v2.2.25+ (additional parameters: evalue cutoff = 1e-4) (Altschul et al.
470 1990). Homology was determined using OrthoMCL v2 with an inflation value of
471 1.5 (Fischer et al. 2011). All amino acid sequences from clustered homolog
472 groups were assigned to orthologous groups from the fuNOG database (last
473 accessed 03/18/16) (Huerta-Cepas et al. 2015) using HMMER3 (Eddy 2011).
474 Clustered sequences were additionally annotated by PFAM and GO term using
475 InterProScan 5 v5.20-59.0 (Jones et al. 2014). Predicted functional annotations
476 and KOG processes for each clustered homolog group were based on the most
477 frequent fuNOG annotation assigned to proteins within that group.
478 All phylogenetic trees were visualized using ETE v3 (Huerta-Cepas et al. 2016)
479 and all other graphs were visualized using the ggplot2 package in R (Wickham
480 2016). All data used to generate the presented figures can be found in
481 Supplementary Table 12.
22 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
482
483 Construction of the microsynteny tree
484
485 The evolution of microsynteny (gene content conservation over small genomic
486 distances) does not necessarily recapitulate phylogenetic relationships
487 determined by models of sequence evolution. In order to assess the
488 unexpectedness of cluster distributions within a phylogenetic framework based
489 on microsynteny conservation, we used an approach similar to Snel et al. (1999)
490 (Snel et al. 1999) to construct a species tree based on microsyntenic distance, or
491 pairwise comparisons of gene content conservation (Supplementary Figure 1).
492 Distances based on microsynteny are necessarily bounded by some maximum
493 value that is often rapidly reached as microsynteny degrades, and thus the ability
494 to resolve phylogenetic relationships decreases as evolutionary distance
495 increases (Rogozin et al. 2004). Nevertheless, the species relationships that we
496 recovered in general correspond well with those obtained using conventional
497 phylogenetic methods (Schoch et al. 2009; Spatafora et al. 2016).
498 To construct the microsynteny tree, up to 10 genes upstream and
499 downstream of all homologs from a randomly selected gene family were retrieved
500 (designated “gene neighborhood”), then had their amino acid sequences
501 compared against one another using UBLAST, and subsequently were sorted
502 into homolog groups using OrthoMCL. For each pairwise neighborhood
503 comparison, the total number of shared orthologs was divided by the size of the
23 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
504 smallest neighborhood, subtracted from 1, and designated as pairwise syntenic
505 distance. Pairwise syntenic distances thus range from 0 (all homolog groups are
506 shared within a neighborhood) to ~0.95 (only one gene, the randomly selected
507 query, is shared). 1000 neighborhoods were randomly sampled in this way. A
508 neighbor-joining tree was constructed from the distance matrix of the median
509 pairwise syntenic distance for each pairwise genome comparison using the ape
510 package in R (Paradis et al. 2004). The original dataset was sampled with
511 replacement to obtain 100 bootstrap pseudo-replicate trees that were then used
512 to calculate bootstrap support on the original tree using the –z switch in RAxML
513 v8.2.0 (Stamatakis 2014). Nodes on the original tree receiving less than 70%
514 bootstrap support were collapsed. The final microsynteny tree was used to
515 calculate distances covered by cluster distributions, as well as individual homolog
516 groups.
517
518 Sampling null models of gene cluster evolution
519
520 Empirically derived null models describing the background levels of “gene
521 cluster” distributions were developed for each of the 12 largest taxonomic classes
522 and each cluster size ranging from 4-24 genes (252 distributions in total). Briefly,
523 for a given null distribution, a group of neighboring genes (i.e., query cluster) of
524 size X was chosen at random from a randomly selected genome from taxonomic
525 class Y. Hits to each gene in the query cluster were recovered from the proteome
24 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
526 database using BLASTp. Genomes containing homologous clusters (i.e.,
527 containing hits to all genes in the query cluster, where no more than 6 intervening
528 genes separated any hit from another) were then retrieved. To determine the
529 phylogenetic distance associated with the query cluster distribution, the total
530 branch length distance on the microsynteny tree connecting all genomes with
531 homologous clusters was calculated and stored. The above sampling process
532 was repeated 500 times for each null distribution.
533
534 Locating Clusters through Unexpected Synteny
535
536 We have identified previously undetected gene clusters putatively involved in
537 phenylpropanoid degradation using a method that posits that fungal genomes are
538 spatially compartmentalized by function, similarly to those of bacteria (Snel et al.
539 2000; Rogozin et al. 2002; Rogozin et al. 2004; Szklarczyk et al. 2014; Zhao et
540 al. 2014; Zaidi and Zhang 2016). Our method also draws upon the observation
541 that microsynteny is often poorly conserved in fungal genomes (Hane et al.
542 2011), which has previously enabled biosynthetic MGC detection through the
543 identification of conserved microsynteny among individuals of the same species
544 (Takeda et al. 2014). Briefly, genes in regions surrounding “anchor genes” of
545 interest were assigned to homolog groups. Clusters were then detected by
546 comparing the phylogenetic distributions of homolog group combinations within
547 these regions to distributions expected under null models of microsynteny
25 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
548 evolution. More detailed steps are described below and in Supplementary Figure
549 1.
550 All homologs of an anchor gene query sequence of interest were retrieved
551 using BLASTp. Twenty genes upstream and downstream of all anchor gene
552 homologs (designated “anchor gene neighborhood”) were retrieved, compared
553 against one another using UBLAST and sorted into homolog groups using
554 OrthoMCL. Individual homolog groups whose distribution on the microsynteny
555 species tree had a maximum pairwise distance below 0.95 were discarded.
556 Within the set of all anchor gene neighborhoods, the set of unique combinations
557 of 4 or more homolog groups that included the anchor gene (“cluster motifs”) was
558 determined. The genomes in which each motif occurred were identified, with the
559 condition that genes belonging to homolog groups in the motif never be
560 separated by more than 6 intervening genes. For each genome in which a given
561 motif occurred, the probability of observing the motif in that genome, given the
562 size of the motif and the taxonomic class of the genome, was empirically
563 estimated by determining the proportion of samples in the appropriate null
564 distribution that cover a total distance on the microsynteny tree greater than or
565 equal to the distance associated with the given motif, divided by the total number
566 of samples in the null distribution. For example, to estimate the probability of
567 observing a motif with 8 homolog groups in a Dothideomycete genome, we would
568 compare the total distance associated with the observed motif to the null
569 distribution of distances associated with size 8 clusters sampled from
26 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
570 Dothideomycetes. For all such test, the test statistic is distance on the
571 microsynteny tree, and the null hypothesis is that the phylogenetic distribution of
572 a given cluster motif containing an anchor gene is consistent with background
573 rates of microsynteny evolution. The null hypothesis is rejected for motifs with an
574 estimated probability below the significance threshold of 0.05 in at least one
575 genome, and all genes assigned to such motifs are designated clusters. Clusters
576 with proteins that had fuNOG annotations or PFAM domains or GO term
577 annotations associated with proteins known to exclusively participate in fungal
578 secondary metabolite biosynthesis were excluded from further analysis
579 (Supplementary Table 4). All annotations of clustered proteins were also
580 manually inspected for evidence of exclusive participation in biosynthetic
581 metabolism, and excluded if necessary.
582
583 Cluster models and multi cluster model profiles (MCMPs)
584
585 Homology among clusters from the same cluster class (i.e., containing the same
586 anchor gene) was determined by assessing similarities in homolog group
587 content. Briefly, a matrix detailing the presence or absence of homolog groups in
588 each cluster was used to calculate Bray-Curtis dissimilarity indices for all pairwise
589 comparisons of clusters. Pairwise comparisons were then grouped using
590 complete linkage clustering, and any clusters separated by under 0.6 distance
591 units were assigned to the same cluster model. This cutoff was empirically
27 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
592 determined after manual examination of the content of clusters assigned to the
593 same model under various distance cutoffs. Homolog groups present in ≥75% of
594 clusters assigned to a given model were then used to summarize that model. The
595 above approach was also used to group fungal species with similar combinations
596 of clusters into MCMPs by using a matrix detailing the presence or absence of
597 homologous clusters (as determined by cluster model) among all species with
598 two or more clusters. MCMPs observed in more than 5 species were used for
599 enrichment analyses. All above analyses were performed using the vegan
600 package in R (Oksanen et al. 2016).
601
602 Network analyses
603
604 Amino acid sequences from all clustered homolog groups across the 13 cluster
605 classes were combined into one set and then sorted into new homolog groups
606 using OrthoMCL. Homolog groups that contained sequences from multiple
607 cluster classes were designated as “shared”. Pairwise co-occurrences of all
608 homolog groups in all clusters were determined, and visualized as a network with
609 Cytoscape v.3.4.0 (Shannon et al. 2003). The network layout was determined
610 solely by the AllegroLayout plugin with the Allegro Spring-Electric algorithm.
611 Analyses of network modularity were performed with the spectral partitioning
612 algorithm (Newman 2006), as implemented in MODULAR (Marquitti et al. 2014),
28 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
613 and the probability of network modularity was estimated by randomly sampling
614 the network 1000 times, twice.
615
616 Enrichment tests
617
618 A one tailed Fisher’s exact test, as implemented in the Text-NSP Perl module,
619 was used to conduct all tests of enrichment (see Supplementary Figure 1 for
620 examples of contingency table and formulae). Unless noted otherwise, gene
621 cluster, cluster model or MCMP presence/absence was recorded at the species
622 level, and counts of species were used to fill in contingency tables. Odds ratios
623 were calculated to estimate the magnitude of the enrichment effect (Szumilas
624 2010). Contingency tables with a zero in least one cell had all cells incremented
625 by 0.5 to avoid division by zero when calculating the odds ratio (Bradburn et al.
626 2007). In general, the odds ratio can be interpreted as the odds that fungi have a
627 particular lifestyle, given they have a particular cluster-based feature (either
628 cluster presence (Figure 1b), cluster model presence (Supplementary Figure 4)
629 or MCMP assignment (Figure 2c). The precision of each odds ratio, except for
630 those associated with contingency tables with a zero in at least one cell, was
631 estimated by calculating the 95% confidence interval (Szumilas 2010). The
632 general form of the null hypothesis is that particular features are not enriched in
633 fungi with a particular ecological lifestyle. We rejected the null hypothesis at α =
634 0.05. As the Pezizomycotina possessed 89.4% of all clusters detected, only data
29 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
635 from Pezizomycotina genomes were used in enrichments tests presented in
636 Figure 1b, Supplementary Figure 4, and Figure 2c. Given the exploratory nature
637 of this study, we did not correct for multiple testing in order to identify trends to
638 follow up in later confirmatory studies.
639
640 Code availability
641
642 Unless noted otherwise, all analyses were performed using a series of custom
643 Perl scripts (Supplementary Figure 1). All scripts used to generate the presented
644 results are available upon request. Fasta files of all recovered cluster models, as
645 well as a script to recover cluster models identified in this analysis from a user-
646 supplied genome of interest, are available for download at
647 https://github.com/egluckthaler/cluster_retrieve
648
649 Acknowledgements and funding
650
651 This work was supported by funds from The Ohio Agricultural Research and
652 Development Center at The Ohio State University (EGT, JCS), The National
653 Science Foundation (DEB-1638999, JCS) and the Fonds de recherche du
654 Québec-Nature et technologies (EGT). All computational work was conducted on
655 the Ohio State Supercomputer. We thank [complete author list to be determined]
656 for providing access to unpublished genome data produced by the U.S.
30 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
657 Department of Energy Joint Genome Institute, a DOE Office of Science User
658 Facility, supported by the Office of Science of the U.S. Department of Energy
659 under Contract No. DE-AC02-05CH11231.
660
661
662 Author contributions
663
664 EGT and JCS conceptualized the work and wrote the manuscript. EGT
665 developed and performed all analyses.
666
667 Competing financial interests
668
669 The authors have declared that no competing interests exist.
670
671 References
672 Al-Shahrour F, Minguez P, Marqués-Bonet T, Gazave E, Navarro A, Dopazo J.
673 2010. Selection upon genome architecture: conservation of functional
674 neighborhoods with changing genes. PLoS Comput Biol 6(10):e1000953.
675 Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local
676 alignment search tool. Journal of molecular biology 215:403-410.
677 Andam CP, Gogarten JP. 2011. Biased gene transfer in microbial evolution. Nat
678 Rev Micro 9:543-555.
31 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
679 Baquero F. 2004. From pieces to patterns: evolutionary engineering in bacterial
680 pathogens. Nat Rev Microbiol 2:510-518.
681 Bayry J, Beaussart A, Dufrêne YF, Sharma M, Bansal K, Kniemeyer O,
682 Aimanianda V, Brakhage AA, Kaveri SV, Kwon-Chung KJ. 2014. Surface
683 structure characterization of Aspergillus fumigatus conidia mutated in the melanin
684 synthesis pathway and their human cellular immune response. Infection and
685 immunity 82:3141-3153.
686 Bouyioukos C, Reverchon S, Képès F. 2016. From multiple pathogenicity islands
687 to a unique organized pathogenicity archipelago. Sci Rep 6.
688 Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A. 2007. Much ado about
689 nothing: a comparison of the performance of meta‐analytical methods with rare
690 events. Statistics in medicine 26:53-77.
691 Davila Lopez M, Martinez Guerra JJ, Samuelsson T. 2010. Analysis of gene
692 order conservation in eukaryotes identifies transcriptionally and functionally
693 linked genes. PLoS ONE 5:e10654.
694 Desentis-Mendoza RM, Hernández-Sánchez H, Moreno A, Rojas del C E, Chel-
695 Guerrero L, Tamariz J, Jaramillo-Flores ME. 2006. Enzymatic polymerization of
696 phenolic compounds using laccase and tyrosinase from Ustilago maydis.
697 Biomacromolecules 7:1845-1854.
698 Dhillon B, Feau N, Aerts AL, Beauseigle S, Bernier L, Copeland A, Foster A, Gill
699 N, Henrissat B, Herath P. 2015. Horizontal gene transfer and gene dosage drives
32 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
700 adaptation to wood colonization in a tree pathogen. Proceedings of the National
701 Academy of Sciences 112:3451-3456.
702 Eddy SR. 2011. Accelerated profile HMM searches. PLoS Comput Biol
703 7:e1002195.
704 Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST.
705 Bioinformatics 26:2460-2461.
706 Enkerli J, Bhatt G, Covert SF. 1998. Maackiain detoxification contributes to the
707 virulence of Nectria haematococca MP VI on chickpea. Molecular Plant-Microbe
708 Interactions 11:317-326.
709 Fungal Databases [Internet]. 2017. U.S. National Fungus Collections, ARS,
710 USDA. cited April 1, 2017.
711 Fischer S, Brunk BP, Chen F, Gao X, Harb OS, Iodice JB, Shanmugam D, Roos
712 DS, Stoeckert CJ. 2011. Using OrthoMCL to assign proteins to OrthoMCL‐DB
713 groups or to cluster proteomes into new Ortholog groups. Current protocols in
714 bioinformatics:6.12. 11-16.12. 19.
715 Floudas D, Binder M, Riley R, Barry K, Blanchette RA, Henrissat B, Martínez AT,
716 Otillar R, Spatafora JW, Yadav JS. 2012. The Paleozoic origin of enzymatic lignin
717 decomposition reconstructed from 31 fungal genomes. Science 336:1715-1719.
718 Fowler ZL, Baron CM, Panepinto JC, Koffas MA. 2011. Melanization of
719 flavonoids by fungal and bacterial laccases. Yeast 28:181-188.
720 Gershenzon J, Fontana A, Burow M, Wittstock U, Degenhardt J. 2012. Mixtures
721 of plant secondary metabolites: metabolic origins and ecological benefits. In:
33 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
722 Iason GR, Dicke M, Hartley SE, editors. The ecology of plant secondary
723 metabolites: from genes to global processes. Cambridge (UK): Cambridge
724 University Press. p. 56-77.
725 Glenn AE, Davis CB, Gao M, Gold SE, Mitchell TR, Proctor RH, Stewart JE,
726 Snook ME. 2016. Two Horizontally Transferred Xenobiotic Resistance Gene
727 Clusters Associated with Detoxification of Benzoxazolinones by Fusarium
728 Species. PLoS ONE 11:e0147486.
729 Gluck-Thaler E, Slot JC. 2015. Dimensions of Horizontal Gene Transfer in
730 Eukaryotic Microbial Pathogens. PLoS Pathog 11:e1005156.
731 Greene GH, McGary KL, Rokas A, Slot JC. 2014. Ecology drives the distribution
732 of specialized tyrosine metabolism modules in fungi. Genome Biology and
733 Evolution 6:121-132.
734 Hammerbacher A, Schmidt A, Wadke N, Wright LP, Schneider B, Bohlmann J,
735 Brand WA, Fenning TM, Gershenzon J, Paetz C. 2013. A common fungal
736 associate of the spruce bark beetle metabolizes the stilbene defenses of Norway
737 spruce. Plant Physiology 162:1324-1336.
738 Hane JK, Lowe RG, Solomon PS, Tan K-C, Schoch CL, Spatafora JW, Crous
739 PW, Kodira C, Birren BW, Galagan JE. 2007. Dothideomycete–plant interactions
740 illuminated by genome sequencing and EST analysis of the wheat pathogen
741 Stagonospora nodorum. The Plant Cell 19:3347-3368.
34 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
742 Hane JK, Rouxel T, Howlett BJ, Kema GH, Goodwin SB, Oliver RP. 2011. A
743 novel mode of chromosomal evolution peculiar to filamentous Ascomycete fungi.
744 Genome Biol 12:R45.
745 Hoffmeister D, Keller NP. 2007. Natural products of filamentous fungi: enzymes,
746 genes, and their regulation. Natural product reports 24:393-416.
747 Höhl B, Arnemann M, Schwenen L, Stöckl D, Bringmann G, Jansen J, Barz W.
748 1989. Degradation of the Pterocarpan Phytoalexin (—)-Maackiain by Ascochyta
749 rabiei. Zeitschrift für Naturforschung C 44:771-776.
750 Holliday JA, Zhou L, Bawa R, Zhang M, Oubida RW. 2016. Evidence for
751 extensive parallelism but divergent genomic architecture of adaptation along
752 altitudinal and latitudinal gradients in Populus trichocarpa. New Phytologist
753 209:1240-1251.
754 Huerta-Cepas J, Serra F, Bork P. 2016. ETE 3: reconstruction, analysis, and
755 visualization of phylogenomic data. Mol Biol Evol 33:1635-1638.
756 Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, Rattei
757 T, Mende DR, Sunagawa S, Kuhn M. 2015. eggNOG 4.5: a hierarchical orthology
758 framework with improved functional annotations for eukaryotic, prokaryotic and
759 viral sequences. Nucleic acids research 44(D1):D286-D293.
760 Hurst LD, Williams EJ, Pal C. 2002. Natural selection promotes the conservation
761 of linkage of co-expressed genes. Trends Genet 18:604-606.
762 Jensen RA. 1976. Enzyme recruitment in evolution of new function. Annu Rev
763 Microbiol 30:409-425.
35 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
764 Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen
765 J, Mitchell A, Nuka G. 2014. InterProScan 5: genome-scale protein function
766 classification. Bioinformatics 30:1236-1240.
767 Jönsson LJ, Martín C. 2016. Pretreatment of lignocellulose: formation of
768 inhibitory by-products and strategies for minimizing their effects. Bioresource
769 technology 199:103-112.
770 Kettle AJ, Batley J, Benfield AH, Manners JM, Kazan K, Gardiner DM. 2015.
771 Degradation of the benzoxazolinone class of phytoalexins is important for
772 virulence of Fusarium pseudograminearum towards wheat. Molecular plant
773 pathology 16:946-962.
774 Kirkpatrick M, Barton N. 2006. Chromosome inversions, local adaptation and
775 speciation. Genetics 173:419-434.
776 Lah L, Podobnik B, Novak M, Korošec B, Berne S, Vogelsang M, Kraševec N,
777 Zupanec N, Stojan J, Bohlmann J. 2011. The versatility of the fungal cytochrome
778 P450 monooxygenase system is instrumental in xenobiotic detoxification. Mol
779 Microbiol 81:1374-1389.
780 Lawrence JG, Roth JR. 1996. Selfish operons: horizontal transfer may drive the
781 evolution of gene clusters. Genetics 143:1843-1860.
782 Lynch M. 2007. The origins of genome architecture. Sunderland (MA): Sinauer
783 Associates.
784 Mäkelä MR, Marinović M, Nousiainen P, Liwanag AJ, Benoit I, Sipilä J, Hatakka
785 A, de Vries RP, Hildén KS. 2015. Chapter Two-Aromatic Metabolism of
36 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
786 Filamentous Fungi in Relation to the Presence of Aromatic Compounds in Plant
787 Biomass. Advances in applied microbiology 91:63-137.
788 Marquitti FMD, Guimarães PR, Pires MM, Bittencourt LF. 2014. MODULAR:
789 software for the autonomous computation of modularity in large network sets.
790 Ecography 37:221-224.
791 McGary KL, Slot JC, Rokas A. 2013. Physical linkage of metabolic genes in fungi
792 is an adaptation against the accumulation of toxic intermediate compounds.
793 Proceedings of the National Academy of Sciences 110:11481-11486.
794 Muto A, Kotera M, Tokimatsu T, Nakagawa Z, Goto S, Kanehisa M. 2013.
795 Modular Architecture of Metabolic Pathways Revealed by Conserved Sequences
796 of Reactions. Journal of Chemical Information and Modeling 53:613-622.
797 Newman ME. 2006. Finding community structure in networks using the
798 eigenvectors of matrices. Physical review E 74:036104.
799 Nguyen NH, Song Z, Bates ST, Branco S, Tedersoo L, Menke J, Schilling JS,
800 Kennedy PG. 2016. FUNGuild: an open annotation tool for parsing fungal
801 community datasets by ecological guild. Fungal Ecology 20:241-248.
802 Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara B, Simpson
803 GL, Solymos P, Stevens MHH, Wagner H. 2016. vegan: Community Ecology
804 Package. Version 2.3-5.
805 Paradis E, Claude J, Strimmer K. 2004. APE: analyses of phylogenetics and
806 evolution in R language. Bioinformatics 20:289-290.
37 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
807 Peay KG, Kennedy PG, Talbot JM. 2016. Dimensions of biodiversity in the Earth
808 mycobiome. Nature Reviews Microbiology 14:434-447.
809 Pepper JW. 2003. The evolution of evolvability in genetic linkage patterns.
810 Biosystems 69:115-126.
811 Pigné S, Zykwinska A, Janod E, Cuenot S, Kerkoud M, Raulo R, Bataillé-
812 Simoneau N, Marchi M, Kwasiborski A, N’Guyen G, et al. 2017. A flavoprotein
813 supports cell wall properties in the necrotrophic fungus Alternaria brassicicola.
814 Fungal Biology and Biotechnology 4:1.
815 Plissonneau C, Stürchler A, Croll D. 2016. The Evolution of Orphan Regions in
816 Genomes of a Fungal Pathogen of Wheat. mBio 7:e01231-01216.
817 Rogozin IB, Makarova KS, Murvai J, Czabarka E, Wolf YI, Tatusov RL, Szekely
818 LA, Koonin EV. 2002. Connected gene neighborhoods in prokaryotic genomes.
819 Nucleic acids research 30:2212-2223.
820 Rogozin IB, Makarova KS, Wolf YI, Koonin EV. 2004. Computational approaches
821 for the analysis of gene neighbourhoods in prokaryotic genomes. Briefings in
822 bioinformatics 5:131-149.
823 Schmaler-Ripcke J, Sugareva V, Gebhardt P, Winkler R, Kniemeyer O,
824 Heinekamp T, Brakhage AA. 2009. Production of pyomelanin, a second type of
825 melanin, via the tyrosine degradation pathway in Aspergillus fumigatus. Appl
826 Environ Microbiol 75:493-503.
827 Schoch CL, Sung G-H, López-Giráldez F, Townsend JP, Miadlikowska J,
828 Hofstetter V, Robbertse B, Matheny PB, Kauff F, Wang Z. 2009. The Ascomycota
38 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
829 tree of life: a phylum-wide phylogeny clarifies the origin and evolution of
830 fundamental reproductive and ecological traits. Systematic biology 58:224-239.
831 Shalaby S, Horwitz BA, Larkov O. 2012. Structure–Activity Relationships
832 Delineate How the Maize Pathogen Cochliobolus heterostrophus Uses Aromatic
833 Compounds as Signals and Metabolites. Molecular Plant-Microbe Interactions
834 25:931-940.
835 Shanmugam V, Ronen M, Shalaby S, Larkov O, Rachamim Y, Hadar R, Rose
836 MS, Carmeli S, Horwitz BA, Lev S. 2010. The fungal pathogen Cochliobolus
837 heterostrophus responds to maize phenolics: novel small molecule signals in a
838 plant‐fungal interaction. Cellular microbiology 12:1421-1434.
839 Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N,
840 Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated
841 models of biomolecular interaction networks. Genome Res 13:2498-2504.
842 Shwab EK, Bok JW, Tribus M, Galehr J, Graessle S, Keller NP. 2007. Histone
843 deacetylase activity regulates chemical diversity in Aspergillus. Eukaryotic cell
844 6:1656-1664.
845 Slot JC, Hibbett DS. 2007. Horizontal transfer of a nitrate assimilation gene
846 cluster and ecological transitions in fungi: a phylogenetic study. PLoS ONE
847 2:e1097.
848 Smillie CS, Smith MB, Friedman J, Cordero OX, David LA, Alm EJ. 2011.
849 Ecology drives a global network of gene exchange connecting the human
850 microbiome. Nature 480:241-244.
39 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
851 Snel B, Bork P, Huynen MA. 1999. Genome phylogeny based on gene content.
852 Nat Genet 21:108-110.
853 Snel B, Lehmann G, Bork P, Huynen MA. 2000. STRING: a web-server to
854 retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic
855 acids research 28:3442-3444.
856 Spatafora JW, Chang Y, Benny GL, Lazarus K, Smith ME, Berbee ML, Bonito G,
857 Corradi N, Grigoriev I, Gryganskyi A. 2016. A phylum-level phylogenetic
858 classification of zygomycete fungi based on genome-scale data. Mycologia
859 108:1028-1046.
860 Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-
861 analysis of large phylogenies. Bioinformatics 30:1312-1313.
862 Stukenbrock EH, Bataillon T, Dutheil JY, Hansen TT, Li R, Zala M, McDonald BA,
863 Wang J, Schierup MH. 2011. The making of a new pathogen: insights from
864 comparative population genomics of the domesticated wheat pathogen
865 Mycosphaerella graminicola and its wild sister species. Genome Res 21:2157-
866 2166.
867 Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J,
868 Simonovic M, Roth A, Santos A, Tsafou KP. 2014. STRING v10: protein–protein
869 interaction networks, integrated over the tree of life. Nucleic acids research
870 43(D1):D447-D452.
871 Szumilas M. 2010. Explaining odds ratios. Journal of the Canadian Academy of
872 Child and Adolescent Psychiatry 19:227.
40 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
873 Takeda I, Umemura M, Koike H, Asai K, Machida M. 2014. Motif-independent
874 prediction of a secondary metabolism gene cluster using comparative genomics:
875 application to sequenced genomes of Aspergillus and ten other filamentous
876 fungal species. DNA research 21(4):447-457
877 Tsochatzidou M, Malliarou M, Papanikolaou N, Roca J, Nikolaou C. 2017.
878 Genome urbanization: clusters of topologically co-regulated genes delineate
879 functional compartments in the genome of Saccharomyces cerevisiae. Nucleic
880 acids research 45(10):5818-5828.
881 Wagner GP. 1996. Homologues, natural kinds and the evolution of modularity.
882 American Zoologist 36:36-43.
883 Wang Y, Lim L, Madilao L, Lah L, Bohlmann J, Breuil C. 2014. Gene discovery
884 for enzymes involved in limonene modification or utilization by the mountain pine
885 beetle-associated pathogen Grosmannia clavigera. Appl Environ Microbiol
886 80:4566-4576.
887 Weber T, Blin K, Duddela S, Krug D, Kim HU, Bruccoleri R, Lee SY, Fischbach
888 MA, Müller R, Wohlleben W. 2015. antiSMASH 3.0—a comprehensive resource
889 for the genome mining of biosynthetic gene clusters. Nucleic acids research
890 43:W237-W243.
891 Wickham H. 2016. ggplot2: elegant graphics for data analysis: Springer.
892 Wisecaver JH, Slot JC, Rokas A. 2014. The Evolution of Fungal Metabolic
893 Pathways. PLoS Genet 10:e1004816.
41 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
894 Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of
895 locally adaptive loci. Proceedings of the National Academy of Sciences
896 110:E1743-E1751.
897 Zaidi SSA, Zhang X. 2016. Computational operon prediction in whole-genomes
898 and metagenomes. Briefings in Functional Genomics. elw034.
899 Zhao S, Sakai A, Zhang X, Vetting MW, Kumar R, Hillerich B, San Francisco B,
900 Solbiati J, Steves A, Brown S. 2014. Prediction and characterization of enzymatic
901 activities guided by sequence similarity and genome neighborhood networks.
902 Elife 3:e03275.
903
904 Figure legends
905
906 Figure 1: Associations between gene cluster distributions and fungal lifestyle. a)
907 A phylogeny of 556 fungi (representing 481 species) based on pairwise
908 microsyntenic similarity is shown to the left, annotated by taxonomic class. All
909 taxonomic classes have been abbreviated by removing “-mycetes” suffixes. A
910 matrix indicating the ecological lifestyle (s) associated with each fungus is shown
911 to the immediate right of the phylogeny, followed by a heatmap indicating the
912 number of putative phenylpropanoid-degrading gene clusters in each genome,
913 for each cluster class. Numbers in brackets following the taxonomic class and
914 ecological lifestyle headers correspond to the number of genomes and species
915 within those categories. Numbers in brackets following cluster class headers
42 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
916 indicate the total number of clusters assigned to that class, the number of
917 species with at least 1 cluster, and the number of clusters that overlap with
918 clusters from another cluster class. b) Odds ratios representing the strength of
919 the association between cluster presence and fungal ecological lifestyle are
920 shown for each of 13 gene cluster classes and 6 lifestyles, using data at the
921 species level from the Pezizomycotina. Dotted red lines indicate an odds ratio of
922 1. Dark grey bars indicate enrichment below a significance level of 0.05, while
923 black outlines indicate enrichment below a significance level of 0.01. Error bars
924 indicate the 95% confidence interval (CI) for each odds ratio measurement. CIs
925 of 0 are not shown. The color-coding of ecological lifestyle is consistent across
926 the entire figure.
927
928 Figure 2: Combinations of putative phenylpropanoid-degrading gene clusters in
929 fungal genomes. a) A matrix describes the presence/absence of various
930 homologous clusters (as determined by cluster model; rows) in the genomes of
931 139 fungal species (columns). Species are grouped into 16 multi-cluster model
932 profiles (MCMPs) based on similarities in the combinations of clusters found in
933 their genomes. b) A bar chart depicts the number of fungal species from each
934 MCMP per taxonomic class. c) Odds ratios representing the strength of the
935 association between MCMP and fungal ecological lifestyle are shown, using data
936 at the species level from the Pezizomycotina. Dotted black lines indicate an odds
937 ratio of 1. Dark grey bars indicate enrichment below a significance level of 0.05,
43 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
938 while black outlines indicate enrichment below a significance level of 0.01. Error
939 bars indicate the 95% confidence interval (CI) for each odds ratio measurement.
940 CIs of 0 are not shown. Enrichment data is not shown for MCMP 12, as fewer
941 than 5 fungi from the Pezizomycotina are assigned to this MCMP.
942
943 Figure 3: Co-occurrence network of homolog groups in putative phenylpropanoid-
944 degrading gene clusters. Each node represents a homolog group found in a
945 putative phenylpropanoid-degrading gene cluster. All nodes are color-coded by
946 the cluster class in which they are found, except for those homolog groups found
947 associated with multiple cluster classes (i.e., shared), colored pink and shaped
948 as squares. Nodes representing anchor gene families used to retrieve the
949 clusters are shaped as diamonds, and those anchor gene families that are
950 shared across multiple cluster classes are additionally colored pink. Edges
951 symbolize the co-occurrence of two homolog groups in the same gene cluster,
952 while edge width is proportional to the frequency of that occurrence. Node size is
953 proportional to the number of connections emanating from that node. The
954 proximity of nodes to one another is proportional to the number of shared
955 connections. The annotations, followed by the code names and number of unique
956 connections in parentheses, of nodes with the greatest number of connections
957 (i.e., associated with the greatest diversity of homolog groups) are indicated in
958 the top right-hand corner of the network.
959
44 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
960 Figure 4: The distribution of pterocarpan hydroxylase (PAH) clusters in fungi. a) A
961 phylogeny of the Pezizomycotina based on pairwise microsyntenic distance is
962 shown to the left, annotated by taxonomic class. The presence/absence of
963 clusters assigned to the 4 PAH cluster models is indicated in the matrix to the
964 right of the phylogeny, color-coded by cluster model. b) A simplified schematic of
965 one of several reactions catalyzed by PAH. c) Homolog groups present in ≥75%
966 of clusters assigned to a given cluster model are depicted as boxes color-coded
967 by cluster model, while PAH homologs are indicated in yellow. The four-digit
968 code and predicted annotation are indicated for each homolog group. D) The
969 depicted network follows the conventions specified in Figure 2. Homolog groups
970 present in ≥75% of clusters assigned to a given cluster model are color-coded by
971 cluster model and inscribed with their code, while others are colored gray. Nodes
972 representing homolog groups present in clusters from other cluster classes are
973 drawn as squares.
974
975
45 A) Pezizomycotina Agarico. Sacch.2 Eurotio. Sordario. Leotio. Dothideo. Sacch.1 Fungal lifestyle Number ofclusters (122/110) (50/42) (73/67) (95/70) (37/21) (89/84) (3/3) 1 2 bioRxiv preprint 3 4
Fungal lifestyle
Animal pathotroph (85/70) Soil saprotroph (110/75)
Plant saprotroph (258/220) doi: not certifiedbypeerreview)istheauthor/funder.Allrightsreserved.Noreuseallowedwithoutpermission. Plant pathotroph (146/115)
Plant symbiotroph (41/40) https://doi.org/10.1101/184242
Endophyte (65/46)
QDH (247/148/22)
ARD (210/156/1)
NAD (174/151/0)
PAH (145/133/0)
PMO (134/124/6) Gene clusterclass
CCH (78/71/30)
BPH (62/54/1)
SAH (40/35/0)
SDO (26/21/0)
ECL (23/23/0) ;
VAO (15/13/2) this versionpostedSeptember4,2017. CAE (12/12/0)
FAD (2/2/0)
Odds ratio B) 10 12 14 10 12 14 10 12 14 10 12 14 10 12 14 10 12 14 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 Gene cluster class QDH
ARD Enrichment significance(Pezizomycotina) * *
NAD P P >0.05 PAH PMO
CCH
BPH SAH ≤ 00 P 0.05 SDO The copyrightholderforthispreprint(whichwas + *upper 95% CIoutofrange odds ratio outof range ECL VAO CAE *
FAD ≤
0.01 + * *
saprotroph pathotroph saprotroph pathotroph symbiotroph
Soil Animal Plant Plant Plant Endophyte Fungal lifestyle Fungal bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
A) Multi-cluster model profile (MCMP) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 2 3 QDH 4 5 6 1 2 3 4 ARD 5 6 7 8 1 2 3 4 NAD 5 6 7 8 1 2 PAH 3 4 1 2 3 PMO 4 5 6 1 Gene cluster model 2 3 CCH 4 5 6 1 2 BPH 3 4 1 2 3 SAH 4 5 6 1 ECL 2 1 SDO 1 VAO 2 3 1 CAE 2 FAD 1 Fungal species gene cluster presence
*upper 95% CI out of range pathotroph 20 20 * Animal 15 16 B) C) 12 10 8
5 4 Dothideo. 0 0 saprotroph 20 20 * *
15 16 Soil 12 10 8 Leotio. 5 4 Number of species 0 0 saprotroph Fungal lifestyle 20 20
15 16 Plant
10 12 8 5
Sordario. 4 0 0 20 pathotroph 20 *** 15 16 Plant
10 12
Odds ratio 8 5 Eurotio. 4
Taxonomic class Taxonomic 0 0 20 symbiotroph 20 ** 15
16 Plant
10 12 8 5
4 Saccharo. 0 0
20 Endophyte 20 **** 15 16
10 12
Other 8 5 4 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16 Multi-cluster model profile (MCMP) Multi-cluster model profile (MCMP) Enrichment significance Number of species in MCMP (Pezizomycotina only) P > 0.05 P ≤ 0.05 P ≤ 0.01 bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Most higly connected homolog groups 1-Aldehyde dehydrogenase (ALL1050; 104) 2-Quinate permease (ALL1001; 83) 3-Cytochrome P450 (ALL1024; 82) 4-Catechol dioxygenase (ALL1022; 71) 5-Salicylate hydroxylase (ALL1011; 68)
1 3 2 4
5 Homolog group type Unique Shared Unique Shared anchor anchor Gene cluster class QDH BPH NAD SAH ARD ECL PMO SDO PAH VAO CCH CAE FAD bioRxiv preprint doi: https://doi.org/10.1101/184242; this version posted September 4, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Cluster presence O O O O H A) Model 1 Model 2 Model 3 Model 4 B) H PAH H
O H H H O O O O
O O (-)-maackiain 1α-hydroxy-maackiain
C) Cytochrome NmrA-like Unknown Unknown P450 transcriptional Model 1 PAH 1005 1007 regulator 1011 1014 Dothideomycetes
Sugar+other Cytochrome Dehydrogenase P450 transporter Model 2 PAH 1001 1002 1003
Amino acid MFS Isoflavone Monoamine Unknown permease transporter reductase oxidase Model 3 PAH 1010 1030 1155 1289 1296
Leotiomycetes Cytochrome Transcription 15-hydroxy Unknown P450 factor prostaglandin Model 4 PAH 1001 10791129 dehydrogenase 1298 D)
1011 1005 1007 1014
1003
Sordariomycetes 1002
1001
PAH 1010 1298 1129 1079 1030 1155
1296
1289
Homolog group type
Eurotiomycetes Model 1 Model 2 Model 3 Model 4 Present in < 75% of clusters Anchor Shared with another cluster class assigned to model