Computational, Evolutionary and Functional Genetic Characterization of Fungal Gene Clusters Adapted to Degrade Plant Defense Chemicals

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

in the Graduate School of The Ohio State University

By

Emile Gluck-Thaler, M.Sc.

Graduate Program in Plant Pathology

The Ohio State University

2019

Dissertation committee:

Jason C. Slot, Adviser

Ana P. Alonso

Pierluigi Bonello

Laura S. Kubatko


Copyright by Emile Gluck-Thaler

2019


Abstract

Fungal interactions with plants pose both significant risks and benefits to global economies and ecosystems. As pathogens, fungi consume crops at our expense, and as mutualists and decayers, they maintain the health of fields, forests and soils. A key trait underlying these varied lifestyles is the ability to degrade toxic chemicals produced by plants to defend themselves from fungal attack. However, little is known about the genetic bases of these degradative (i.e., catabolic) mechanisms, or the evolutionary processes that give rise to adaptive catabolism, which has resulted in a fundamental gap in our understanding of how fungi adapt to their plant hosts. One promising approach to address these gaps in our knowledge is the study of metabolic gene clusters (MGCs), which are groups of neighboring genes that encode enzymatic, transporter and regulatory proteins participating in the same or related metabolic pathway. The self-contained nature of MGCs facilitates the discovery of genes encoding adaptive pathways, as well as investigations into the mechanisms shaping their evolution. Yet the extent to which catabolic genes form MGCs is unknown, largely due to a lack of tools suitable for their identification. The primary research objectives of this dissertation are thus twofold: to first develop computational tools for the identification of MGCs encoding the degradation of plant defense chemicals, and to then characterize the MGCs identified by these tools using phylogenetic and functional genetic analyses in order to elucidate the evolutionary processes driving fungal catabolic adaptation to plant tissues.

In Chapter 1, I synthesize what is currently known about catabolic MGCs and their contributions to fungal ecological adaptation, with a focus on the evolutionary forces driving their assembly, maintenance and dispersal in fungal populations. In Chapter 2, I review the impact of one of these forces, horizontal gene transfer, on the evolution of eukaryotic microbial pathogens, including fungi. In Chapter 3, by developing a novel computer program that models how gene order is typically conserved in fungal genomes, I show that genes involved in the degradation of plant defenses form unexpectedly conserved clusters with biased ecological distributions, and identify gene families that are repeatedly incorporated into different types of clusters, which suggests they encode critical enzymatic functions for plant defense degradation. In Chapter 4, I characterize the function and evolution of three MGCs identified in Chapter 3 that contain a gene involved in the degradation of plant defense compounds, stilbene cleavage oxygenase (SCO). Using heterologous gene expression and liquid chromatography to conduct in vitro enzymatic assays, we found that SCOs located in different MGCs have similar substrate specificities, indicating that there has been little evolution at the level of enzymatic function. In contrast, phylogenetic analyses suggest multiple independent origins of SCO’s association with different cluster types, consistent with recurrent selection for specific gene combinations that are generated through combinatorial evolution. Together, the work presented in this dissertation demonstrates that catabolic MGCs are more prevalent than previously thought, and suggests that combinatorial evolution, especially as it manifests at the level of genome organization, is an important force shaping fungal adaptation in plant-associated niches.


For my family, with love and gratitude


Acknowledgements

The disease triangle is a foundational concept in plant pathology. It recognizes that the manifestation of a disease is contingent on three factors: the pathogen, the host and the environment. Similar ecological theory applies to a graduate education, which is only possible when the student, the supervisor and the surrounding environment align just so. With that in mind, I have much to be grateful for.

First, I need to thank my mentor and adviser, Jason. Simply put, my education and development as a scientist would not have been possible without his consistent support, honesty and respect. Jason’s tremendous passion for teaching and learning is an inspiration, and sets a standard that I will always strive towards.

Although it is all too easy to feel isolated while in graduate school, I am fortunate to have found in my department a welcoming and sustaining environment, and for that, I am indebted to my friends and colleagues for their daily presence and support.

Beyond the day-to-day, I am grateful for my friends Anna, Forest, Harry, Katie, Owen, Tash and Tori who, despite the distance, have always been there when I needed them.

For Diana, whose love kept me going.

For Da-Young, whose support I can always count on.

For Horacio and Justin, who always have time.

For Bethany, whose wisdom keeps me sane.

For my mother Yaël, whose courage gives me the confidence to forge my own path.

For my father Sam, whose unconditional love sustains my confidence.

For my brother Aaron, who is always right there with me.

My work and education were made possible by the generous support of the Fonds de Recherche du Québec and The Ohio State University, for which I am grateful.


Vita

June 2011 ...... Diploma of College Studies in Music, Marianopolis College, Canada

June 2011 ...... Diploma of College Studies in Health Sciences, Marianopolis College, Canada

June 2014 ...... B.Sc. Honors in Life Sciences, McGill University, Canada

May 2017 ...... M.Sc. in Plant Pathology, The Ohio State University, U.S.A

Publications

Gluck-Thaler E, Vijayakumar V, Slot JC (2018). Fungal adaptation to plant defences through convergent assembly of metabolic modules. Molecular Ecology 27: 5120-5136.

Gluck-Thaler E, Slot JC (2018). Specialized plant biochemistry drives gene clustering in fungi. The ISME Journal 12: 1694-1705.

Reynolds HT, Vijayakumar V, Gluck-Thaler E, Korotkin HB, Matheny PB, Slot JC (2018). Horizontal gene cluster transfer increased hallucinogenic mushroom diversity. Evolution Letters 2: 88-101.

Samsatly J, Chamoun R, Gluck-Thaler E, Jabaji S (2016). Genes of the de novo and salvage biosynthesis pathways of vitamin B6 are regulated under oxidative stress in the plant pathogen Rhizoctonia solani. Frontiers in Microbiology 6: 1429.

Gluck-Thaler E, Slot JC (2015). Dimensions of horizontal gene transfer in eukaryotic microbial pathogens. PLoS Pathogens 11: e1005156.

Fields of Study

Major Field: Plant Pathology

Specialization: Bioinformatics and Evolutionary Genomics

Table of Contents

Abstract
Vita
List of Figures
List of Tables

Chapter 1: The Architecture of Resistance: Fungal Gene Clusters Adapted to Degrade Plant Defense Chemicals
Fungal ecology in the context of catabolism
Catabolic clusters in fungi
Cluster evolution: towards a predictive framework
MGC evolution: does one size fit all?
MGCs: windows into the future of fungal ecology
References

Chapter 2: Dimensions of Horizontal Gene Transfer in Eukaryotic Microbial Pathogens
Introduction
Does HGT really contribute to eukaryotic microbial pathogen genomes?
Does genetic network complexity influence HGT?
Do eukaryotic microbial pathogens become “who they meet”?
Do human activities impact HGT in eukaryotic microbial pathogens?
References

Chapter 3: Specialized Plant Biochemistry Drives Gene Clustering in Fungi
Abstract
Introduction
Materials and Methods
Results
Discussion
Acknowledgements
References

Chapter 4: Fungal Adaptation to Plant Defenses Through Convergent Assembly of Metabolic Modules
Abstract
Materials and Methods
Results
Discussion
Acknowledgements
References

Bibliography
Appendix A: Chapter 2 Supplementary Materials
Appendix B: Chapter 3 Supplementary Materials
Appendix C: Chapter 4 Supplementary Materials

List of Figures

Figure 2.1. The interacting dimensions of horizontal gene transfer in eukaryotic microbial pathogens
Figure 3.1. Associations between gene cluster distributions and fungal lifestyle
Figure 3.2. Combinations of candidate phenylpropanoid-degrading gene clusters in fungal genomes
Figure 3.3. Co-occurrence network of homolog groups in candidate phenylpropanoid-degrading gene clusters
Figure 3.4. The distribution of pterocarpan hydroxylase (PAH) clusters in fungi
Figure 4.1. The distributions of three types of stilbene cleavage oxygenase (sco) gene clusters in fungi
Figure 4.2. Biochemical characterization of four fungal stilbene cleavage oxygenase (SCO) enzymes encoded in alternate cluster types
Figure 4.3. Transitions between and gains of distinct cluster types across the stilbene cleavage oxygenase (sco) phylogeny
Figure 4.4. Synteny comparisons within three lineages of stilbene cleavage oxygenase (sco) loci inferred to have experienced transition or fusion events between different cluster types
Figure B.1. Schematics of computational pipelines and enrichment tests
Figure B.2. The distribution of candidate phenylpropanoid-degrading gene clusters in fungi
Figure B.3. Distributions of gene window size for all gene cluster sizes
Figure B.4. Associations between cluster model presence and fungal lifestyle
Figure B.5. The distribution of candidate phenylpropanoid-degrading gene cluster models in fungi
Figure C.1. External standard curve of butylated hydroxyanisole (BHA) used for the quantification of cleavage products resulting from stilbene cleavage oxygenase activity
Figure C.2. Maximum likelihood tree of bacterial, plant and fungal amino acid sequences from the carotenoid cleavage dioxygenase and stilbene cleavage oxidase families
Figure C.3. Maximum likelihood tree of fungal amino acid sequences from the stilbene cleavage oxidase family with outgroup of characterized stilbene cleaving bacterial enzymes
Figure C.4. Maximum likelihood tree of 466 monophyletic fungal amino acid sequences from the stilbene cleavage oxidase (SCO) clade examined in this study
Figure C.5. 50% majority rule consensus Bayesian tree of 466 monophyletic fungal amino acid sequences from the stilbene cleavage oxidase (sco) clade examined in this study
Figure C.6. Putative replacement of vanillyl alcohol oxidase (vao) within a model 3 stilbene cleavage oxygenase (sco) cluster
Figure C.7. Maximum likelihood tree of bacterial and fungal amino acid sequences from the 2,3-dihydroxybenzoate decarboxylase (dhbd) family
Figure C.8. Maximum likelihood tree of the monophyletic clade containing all 2,3-dihydroxybenzoate decarboxylase (dhbd) genes found clustered with stilbene cleavage oxygenase, based on amino acid sequences
Figure C.9. Maximum likelihood tree of bacterial and fungal amino acid sequences from the gentisate 1,2 dioxygenase (gdo) family
Figure C.10. Maximum likelihood tree of the monophyletic clade containing all gentisate 1,2 dioxygenase (gdo) genes found clustered with stilbene cleavage oxygenase, based on amino acid sequences
Figure C.11. Maximum likelihood tree of bacterial and fungal amino acid sequences from the vanillyl alcohol oxidase (vao) family
Figure C.12. Maximum likelihood tree of the monophyletic clade containing all vanillyl alcohol oxidase (vao) found clustered with stilbene cleavage oxygenase, based on amino acid sequences
Figure C.13. A rooted, constrained maximum likelihood phylogeny based on the amino acid sequences of the second largest subunit of RNA polymerase II (rpb2) representing relationships among all 288 genomes examined in this study and 9 outgroup fungal genomes
Figure C.14. Phylogenetic diversity measurements of key clustered and housekeeping gene families
Figure C.15. Rate coefficients for transitions between clustered states sampled during BayesTraits v3 reverse jump MCMC analysis
Figure C.16. Hypothetical catabolic pathways encoded in three distinct types of stilbene cleavage oxidase (sco) clusters

List of Tables

Table A.1. Well supported reports of HGT in Eukaryotic Microbial Pathogens
Table B.1. Information concerning anchor gene family queries
Table B.2. Genome metadata
Table B.3. Additional information concerning clusters retrieved with each anchor gene family
Table B.4. Key words associated with secondary metabolite biosynthesis used to remove genes predicted to participate exclusively in biosynthetic metabolism
Table B.5. All candidate gene clusters detected in study
Table B.6. Homolog groups present ≥75% of clusters assigned to a given model
Table B.7. Total number of clusters per cluster model in species from each taxonomic class
Table B.8. Enrichment of KOG processes in clustered proteins (one tailed Fisher's exact test)
Table B.9. Enrichment of KOG processes in proteins that are part of homolog groups present in multiple cluster classes (i.e., shared homolog groups)
Table B.10. Information concerning homolog groups present in multiple cluster classes (i.e., shared homolog groups)
Table B.11. Output from MODULAR analysis with spectral partitioning
Table C.1. Genome metadata
Table C.2. AU Test between the optimal ML tree and the ML tree whose topology is constrained to match the Bayesian phylogeny
Table C.3. AU Tests for the independence of transitions in sco's clustered state
Table C.4. Primers generated and used in this study
Table C.5. Descriptions and coordinates of sco clusters and singletons
Table C.6. Motif alignments of characterized bacterial and fungal scos
Table C.7. Presence of conserved amino acid motifs in 466 fungal sco sequences
Table C.8. UPLC analysis and determination of cleavage product concentrations
Table C.9. Substrates of sco and related proteins
Table C.10. Node statistics of maximum likelihood trees of sco, dhbd, gdo and vao
Table C.11. Inferred gains and transitions of sco's clustered state
Table C.12. Transposable element sequences detected in sco cluster regions


Chapter 1: The Architecture of Resistance: Fungal Gene Clusters Adapted to Degrade Plant Defense Chemicals

Fungal ecology in the context of catabolism

The metabolic basis of plant-fungal interactions

Fungal interactions with plants have shaped Earth’s ecosystems since the early Neoproterozoic era (Berbee et al., 2015), and continue to have significant ecological and economic impacts on the functioning of terrestrial ecosystems. For example, fungal saprotrophs that degrade dead plant tissues drive global carbon cycles (Floudas et al., 2012), while pathogens that colonize living tissues pose significant risks to global food and natural resource security (Savary et al., 2019). A fundamental understanding of the determinants of plant-fungal interactions is thus of critical importance for predicting and preserving ecosystem function, especially in the era of climate change.

Whereas morphological features often determine the outcomes of interactions between plants and animals, plant-fungal interactions are largely circumscribed by metabolic traits. Plants produce myriad specialized or secondary metabolites (SMs) both constitutively (phytoanticipins) and inducibly (phytoalexins) to resist fungal pathogens and signal to mutualistic partners either through direct modes of action (VanEtten et al., 2001) or through regulation of broader defense responses (Piasecka et al., 2015), and these SMs often persist in tissues even after the plant has died. Plant-associated fungi from across the ecological spectrum, ranging from mutualists to saprotrophs to pathogens, have correspondingly evolved diverse arrays of degradative metabolic (i.e., catabolic) pathways to contend with the chemical complexity of plant tissues (Mäkelä et al., 2014). Indeed, large suites of genes involved in nutrient assimilation and defense detoxification are among the most highly expressed sequences in fungi upon exposure to plant chemicals or tissues (Lah et al., 2013; Paolinelli-Alfonso et al., 2016; Seifbarghi et al., 2017). Catabolic pathways enable assimilation of nutrients essential for growth and/or the transformation of toxic plant SMs into more benign products, and consequently contribute both quantitatively (VanEtten et al., 1980; Kliebenstein et al., 2005; Pareja-Jaime et al., 2008; Ökmen et al., 2013; Srivastava et al., 2013; Kettle et al., 2015; Wadke et al., 2016; Zhao et al., 2018) and qualitatively to fungal growth on plant tissues (Schäfer et al., 1989; Bowyer et al., 1995). The detoxification of plant SMs through such “counter-defense” pathways not only lifts constraints on fungal growth and contributes to overall fitness, but also may produce molecules that interfere with broader plant defense responses (Bouarab et al., 2002). Yet despite the reliance of fungi on degradative metabolism to gain access to plant tissues, little is known about the genetics underlying fungal catabolic pathways, especially counter-defense pathways, precluding a systematic accounting of the evolutionary processes driving fungal adaptation to plants.

Genome architecture and ecological adaptation

One promising approach for identifying genes underlying catabolic pathways is to examine patterns in how genes are organized in genomes (i.e., genome architecture). In plant, animal, bacterial and fungal genomes, genes with related functions are often positioned closer together than expected by chance, albeit to different degrees. For example, genomic islands of divergence, which consist of chromosomal regions that are strongly differentiated between populations, often contain genes contributing to fitness in a specific environment. Genomic islands of divergence are implicated in adaptive responses to local environmental conditions in organisms as diverse as fungi (Ellison et al., 2011), plants (Holliday et al., 2016) and fish (Larson et al., 2017), suggesting they are key attributes of evolution. A more concentrated architectural pattern is the supergene, which consists of two or more tightly linked loci contributing to the same adaptive phenotype. Supergenes encode a wide range of traits in diverse taxa, including survival traits in flies (Durmaz et al., 2018), wing coloration in butterflies (Joron et al., 2011), ecologically adaptive skeletal features in fish (Miller et al., 2014) and mutualism determinants in nitrogen-fixing bacteria (Porter et al., 2018). While supergenes may occasionally be considered genomic islands of divergence, the former implies some level of functional association between the constituent genes, while the latter often contain genes that are not directly related by function but instead are implicated in the same general response to some environmental condition. Genomic islands of divergence may additionally describe nucleotide-based divergence between populations that accumulates in both coding and non-coding regions. Genomic islands of divergence and supergenes may initially evolve as a consequence of ecological selection, but then may facilitate the rapid acquisition of complex adaptive traits in new lineages through introgression or horizontal gene transfer, greatly accelerating the rate of ecological adaptation (Nandasena et al., 2006; Bay and Ruegg 2017). Identifying adaptive genomic islands of divergence and supergenes can greatly accelerate the discovery of genes underlying complex traits of interest, and enable the development of hypotheses concerning the mode and tempo of ecological evolution.

Metabolic gene clusters: adaptation made manifest in genome organization

The best-studied class of supergene is the metabolic gene cluster (MGC). MGCs are self-contained genetic modules consisting of neighboring genes encoding enzymes, transporters and regulators participating in the same or related metabolic pathway (Slot 2017). MGCs typically consist of non-homologous genes, and are thus distinct from tandemly arrayed gene clusters consisting of paralogs that arise through DNA duplication (Graham 1995). The physical proximity and physiologically related functions of genes in MGCs suggest their clustering is a product of natural selection, and not of neutral demographic processes (e.g., genetic bottlenecks and drift), or genetic hitchhiking (Yeaman et al., 2016). Whereas bacterial genomes contain many MGCs, and plant genomes contain few, fungal genomes are in the clustering “goldilocks zone” in that they possess an intermediate number of MGCs (Wisecaver et al., 2014). MGCs encode either biosynthetic or catabolic pathways, and are most often associated with specialized or secondary metabolic (SM) pathways contributing to fitness in specific environments (Rokas et al., 2018). Biosynthetic MGCs in fungi have been intensively researched for decades, advancing our understanding of the intertwined processes driving the evolution of fungal ecology, chemical diversification and genome architecture (for comprehensive reviews focusing on biosynthetic MGCs, see Keller (2018), Nützmann et al. (2018) and Rokas et al. (2018)). Much less emphasis has been placed on characterizing catabolic MGCs in fungi, largely due to a lack of available tools suited for their identification. Yet the reliance of fungi on catabolic strategies to carve out plant-associated niches suggests that catabolic MGCs likely encode pathways of significant ecological consequence. For example, several recent reports have identified MGCs encoding pathways to degrade plant defense compounds, which may enable fungi to resist plant metabolites that limit fungal growth (Plumridge et al., 2010; Wang et al., 2014; Elmore et al., 2015; Gérecová et al., 2015; Kettle et al., 2015; Gluck-Thaler et al., 2018). The development of tools for the identification and characterization of counter-defense MGCs thus presents new opportunities to discover loci encoding traits contributing to fungal adaptation in plant-associated niches.

Here, I examine the currently known diversity of catabolic MGCs in fungi, with a focus on those that degrade plant defense compounds. I synthesize literature from the fields of evolutionary genomics, theoretical biology and chemical ecology in order to identify key criteria for evaluating and predicting MGC evolution. I discuss the implications of clustering for metabolic pathway evolution, as well as key differences between catabolic and biosynthetic MGCs. Finally, I suggest several future research directions for closing gaps in our knowledge of the causes and consequences of catabolic gene clustering for the evolution of fungal ecology.

Catabolic clusters in fungi

Catabolic MGCs encode diverse pathways across metabolic space. Although their substrates vary, MGCs can be roughly categorized as either “nutrient-assimilating” or “detoxifying” based on the primary physiological role of their pathways (Slot 2017). This definition focuses solely on the function of the pathway that is selected for, and not the type of selection pressure that may result in the clustering of genes in the pathway. Furthermore, while detoxification pathways may additionally provide nutrients for cellular growth, we consider them distinct from nutrient-assimilating clusters, as selection to resist a toxin is likely the main driver behind the pathways’ evolution, while nutrient assimilation is secondary, or simply a consequence of integrating the pathway into existing metabolic networks.

Nutrient-assimilating MGCs

MGCs involved in nutrient acquisition often target molecules containing growth-limiting nutrients with variable environmental distributions, such as nitrogen. For example, the DAL cluster contains 6 genes for assimilating allantoin, a nitrogen-containing intermediate of purine catabolism that accumulates in plants experiencing abiotic or biotic stresses (Montalbini 1991; Wang et al., 2015). Another cluster targeting nitrogen-rich molecules is the 3-gene HANT cluster that encodes a pathway for high-affinity nitrate assimilation (Johnstone et al., 1990). Many nutrient-acquiring MGCs targeting plant-derived carbohydrates have also been identified, and are most prevalent in yeast, whose ecological niche is often determined by their ability to consume substrate-specific sugars. To date, gene clusters encoding functions related to the assimilation of maltose (Charron et al., 1986), L-rhamnose (Watanabe et al., 2008), galactose (Douglas and Hawthorne 1966), and D-galacturonic acid (Martens-Uzunova and Schaap 2008) have been functionally characterized, while many others have been predicted in silico (Jeffries and Van Vleet 2009). MGCs with potential to use both externally and internally derived metabolites as substrates include those encoding pathways for amino acid degradation. MGCs targeting L-proline (Hull et al., 1989) and leucine, as well as 4-hydroxyphenylpyruvate, a degradation product of tyrosine and phenylalanine (Fernández-Cañón and Peñalva 1995; Boyce et al., 2015) have all been characterized and shown to encode enzymes catalyzing successive pathway steps. Fungi are able to synthesize all of these amino acids, suggesting the encoded pathways may be involved in maintaining internal homeostasis, in addition to processing these nutrients when imported from outside the cell.

Joint assimilatory and detoxifying MGCs

Strictly detoxifying MGCs are rarely reported compared with MGCs that simultaneously detoxify molecules and enable their assimilation. This may be because detoxification can consist of a single step, while multi-step pathways are required for complete assimilation. However, complete assimilation may ultimately provide a larger fitness benefit than strict detoxification, as it allows access to new nutrient sources. Joint assimilatory and detoxifying MGCs identified to date encode pathways enabling the transformation of both rare and common plant defense compounds. MGCs targeting widely produced defense chemicals include the quinic acid utilization (QUT) cluster (Giles et al., 1973), which consists of 3 genes encoding enzymes catalyzing successive steps for the degradation of quinic acid, 1 enzyme-encoding gene with an unknown function, 1 gene encoding a quinate transporter and 2 genes encoding a transcriptional activator and a repressor. This cluster is present in the vast majority of sequenced filamentous Ascomycete genomes (Gluck-Thaler and Slot 2018), possibly reflecting the extensive production and accumulation of quinic acid in plant tissues (up to 10% dry leaf weight) where it may inhibit the growth of fungi unable to degrade it (Bentley and Haslam 1990; Tadych et al., 2015). Quinic acid also accumulates in certain plant tissues during interactions with pathogenic fungi that possess and express the QUT cluster (Soanes et al., 2012), which is likely due to the pathogen-mediated diversion of precursors for the biosynthesis of other defense compounds (Parker et al., 2009). The product of the QUT pathway, protocatechuic acid, is itself a substrate for the 3-oxoadipate (β-ketoadipate) pathway whose catechol and catechuate branches are both occasionally found partially encoded in MGCs (Gérecová et al., 2015), and represent two conserved fungal strategies for channeling aromatic ring-containing molecules into central carbon metabolism (Martins et al., 2015). The 3-oxoadipate pathway is predicted to contribute to fitness in both saprotrophic and pathotrophic niches, given that aromatic compounds are frequently deployed as anti-fungal compounds and are often by-products of lignin degradation (Mäkelä et al., 2014). Key genes within the 3-oxoadipate pathway are required for pathogenicity in some pathogenic fungi (Michielse et al., 2012) and have also been horizontally transferred and retained between plant pathogenic fungi and plant pathogenic oomycetes (Richards et al., 2006), suggesting that clustered or not, this pathway represents an important counter-defense strategy facilitating growth on plant tissues.

A number of MGCs target less widely distributed phytochemicals. The pine pathogen and bark beetle symbiont Grosmannia clavigera possesses a cluster of 4 co-regulated genes, some of which are essential for in vitro growth on the terpene limonene, a major constituent of chemical defense mixtures produced by pines in response to fungal and beetle attack (Wang et al., 2014). In various species of the Fusarium genus of grass endophytes/pathogens, two characterized genes encoding enzymes that catalyze successive steps in the degradation of grass-specific benzoxazolinone defense chemicals are embedded within separate clusters of either 15 or 8 neighboring genes, all of which simultaneously increase in expression in the presence of benzoxazolinones in F. verticillioides (Glenn et al., 2016). In F. pseudograminearum, homologs of these two genes are found clustered at the same locus and, along with 7 of their neighboring genes, also exhibit coordinated increases in transcript expression in the presence of benzoxazolinones (Kettle et al., 2015). The ability to degrade host-produced benzoxazolinone compounds appears dispensable for pathogenicity on maize seedlings in F. verticillioides (Glenn et al., 2002), but nevertheless may facilitate colonization of maize tissues by F. verticillioides and other species within this genus during endophytic stages of their lifecycles (Saunders and Kohn 2009), and furthermore contributes quantitatively to virulence on wheat in F. pseudograminearum (Kettle et al., 2015). It remains to be verified if and how the genes inferred to cluster with the two characterized loci contribute to benzoxazolinone degradation, but independent horizontal transfers of each candidate cluster from Fusarium into distantly related grass-associated fungi suggest they contribute in some way to fitness in grass-associated niches (Glenn et al., 2016).

Phylogenetic analyses also support another horizontal MGC transfer between grass-associated fungi involving a 7-gene cluster that contains a gene encoding stilbene cleavage oxygenase (SCO), an enzyme that catalyzes the degradation of bi-phenolic stilbene defense compounds into monophenolic aldehydes (Brefort et al., 2011; Avalos Cordero et al., 2013; Greene et al., 2014; Gluck-Thaler and Slot 2018). This cluster also contains a transporter-encoding gene, a transcription factor-encoding gene, and 4 additional enzyme-encoding genes that are predicted to channel the products of SCO directly to the tricarboxylic acid cycle through a gentisate intermediate (Greene et al., 2014). sco associates with two additional types of conserved clusters predicted to be involved in monophenolic transformation, and together, the three types of sco clusters have convergently assembled multiple times throughout the evolution of the fungal kingdom (Gluck-Thaler et al., 2018), suggesting recurrent selection for specific counter-defense pathways.

Challenges facing cluster identification

A major challenge facing the high-throughput identification of bona fide counter-defense MGCs is that genes underlying these pathways are rarely known, precluding searches for clustering using existing software. Inspired by cluster-detection algorithms based on patterns of gene co-occurrences (Snel et al., 2000), we recently developed a phylogenetic “guilt-by-association” approach that identifies groups of genes that associate with a focal gene of interest, thus enabling the exploration of cluster space from a limited starting position (Gluck-Thaler and Slot 2018). By modeling how gene order is typically conserved in fungal genomes, we found that 56 distinct groups of genes have unexpectedly maintained close physical linkage with characterized genes known to target molecules belonging to the largest class of plant defense compounds, the phenylpropanoids. Such context-based tools that do not rely heavily on a priori assumptions concerning gene function are increasingly recognized as useful for uncovering structural diversity at biosynthetic MGC loci as well (Navarro-Muñoz et al., 2018).
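As a rough illustration of the co-occurrence idea behind such context-based searches, the following Python sketch counts the number of genomes in which each homolog group falls within a fixed window of a focal gene. It is a minimal sketch under stated assumptions and not the method of Gluck-Thaler and Slot (2018): the toy genomes, the homolog-group labels, the window size and the function name `neighbor_counts` are all hypothetical, and no null model of gene-order conservation is included.

```python
from collections import Counter


def neighbor_counts(genomes, focal, window=3):
    """Count, per homolog group, the number of genomes in which that group
    lies within `window` genes of any copy of the focal group.

    genomes: dict mapping genome name -> list of contigs, each contig being
             an ordered list of homolog-group labels (hypothetical input).
    """
    counts = Counter()
    for contigs in genomes.values():
        seen = set()  # groups found near the focal gene in this genome
        for contig in contigs:
            for i, group in enumerate(contig):
                if group != focal:
                    continue
                lo, hi = max(0, i - window), i + window + 1
                seen.update(g for g in contig[lo:hi] if g != focal)
        counts.update(seen)  # each genome contributes at most once per group
    return counts


# Toy data, invented for illustration only.
toy_genomes = {
    "genome_A": [["h1", "sco", "tf", "trans", "h2", "h3", "h4", "h5"]],
    "genome_B": [["h6", "tf", "sco", "trans"], ["h2", "h7"]],
    "genome_C": [["h2", "h8", "h9", "h10", "h11", "sco", "trans"]],
}

counts = neighbor_counts(toy_genomes, focal="sco", window=2)
for group, n in counts.most_common():
    print(group, n)
```

Groups that recur near the focal gene in more genomes than expected from background gene-order conservation would be the candidates of interest; that expectation is deliberately left unmodeled here.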

Cluster evolution: towards a predictive framework

Building upon some of the earliest observations that genes are linearly organized along chromosomes (Sturtevant 1913), J.B.S. Haldane and R.A. Fisher were the first to develop a statistical framework for modeling associations between loci that enabled more accurate predictions of how phenotypes are inherited (Haldane 1919; Fisher 1922).

Although the concepts of genetic linkage and linkage disequilibrium [i.e., non-random associations between alleles (Lewontin and Kojima 1960)] remain instrumental for mapping phenotypes to genotypes, the realization that linked genes might have anything in common besides their physical proximity emerged gradually with developments in DNA sequencing technologies. Beginning in the 1960s with Jacob and Monod’s pioneering work on bacterial operons (Jacob and Monod 1961), investigations into the higher-level ordering of genes in genomes revealed that linkage between genes can itself result from selection. The best examples of this are supergenes such as MGCs, where related functional associations between constituent enzymatic, transporter and regulatory components suggest these loci are the products of natural selection and not neutral processes (Rokas et al., 2018). Yet in contrast with their prevalence in bacteria, MGCs are a relatively rare phenomenon in fungi considering the number of metabolic enzymes they possess, with conservative estimates putting the proportion of metabolic genes that are clustered below 5% (Wisecaver et al., 2014), suggesting conditions must be “just right” for their formation. Furthermore, biosynthetic clusters are likely more prevalent than catabolic MGCs in fungal genomes (Gluck-Thaler and Slot 2018), although comprehensive comparisons have not been conducted. Three interacting factors are likely to affect where, how and when catabolic MGCs, especially those involved in counter-defense, evolve: the selective environment, the contributions of clustering to organismal fitness, and the rates of evolutionary events enabling their assembly. Increased recognition of the conditions under which MGCs emerge will foster a greater understanding of their contributions to fungal fitness and ecology.

The selective environment

Plant SM production is highly contextual, varying significantly according to phenology, ontogeny, tissue type, climate and the surrounding biotic environment (Barton and Koricheva 2010; Kim et al., 2011). Fungi colonizing plants are thus expected to encounter strong spatial and temporal gradients in selection imposed by plant SMs that manifest across landscapes and even within individuals (Andrew et al., 2007; Holeski et al., 2012). While both constitutive and inducible defense compounds vary spatially and temporally, temporal variation exists as a function of some time-sensitive process, e.g., either developmental stage or season, whereas spatial variation exists as a function of physical distance. Given several key differences in the type of selection imposed by either spatial or temporal gradients, we predict that the evolution of counter-defense MGCs is more likely when fungal populations encounter strong spatial variation in the distribution of a given defense metabolite.

First, spatial and temporal gradients are predicted to select for alternative evolutionary responses. Spatial heterogeneity in the absence of strong temporal variation favors the evolution of specialized genotypes (Lynch and Gabriel 1987), and commonly manifests as local adaptation among populations, where individuals display higher fitness in their “home” environment compared to fitness in an “away” environment. For example, fungal pathogens tend to display higher fitness on plant hosts from their environment of origin as opposed to hosts collected from far away (Susi and Laine 2013). Conversely, temporal variation alone is predicted to select for more broadly adapted genotypes (Lynch and Gabriel 1987; Buckling et al., 2007), possibly as a consequence of an overall reduction in the efficiency of natural selection that stymies specialization (Cvijović et al., 2015). Simulations modeling the evolution of antibiotic resistance in bacteria, which is analogous to the evolution of counter-defense responses in plant-associated fungi, suggest that short temporal fluctuations in antibiotic doses are less likely to result in the evolution of specialized resistant genotypes (Baker, Ferrari et al. 2018). In cases where environments fluctuate unpredictably over time, selection for generalized genotypes may manifest as bet-hedging, where genotypes that generate random phenotypic variation are favored in the long term (Beaumont et al., 2009).
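The long-term advantage of bet-hedging can be illustrated with a standard geometric-mean fitness calculation. The Python sketch below is illustrative only: the fitness values, the two environments, their frequencies and the function name `geometric_mean_fitness` are assumptions chosen for the example and are not parameters drawn from the cited studies.

```python
import math


def geometric_mean_fitness(fitness_by_env, env_sequence):
    """Long-run per-generation growth factor of a lineage whose fitness in
    each generation depends on the environment encountered."""
    log_sum = sum(math.log(fitness_by_env[env]) for env in env_sequence)
    return math.exp(log_sum / len(env_sequence))


# Hypothetical per-generation fitness (offspring per individual).
specialist = {"toxin": 0.1, "no_toxin": 3.0}  # thrives only without toxin

# A bet-hedger producing a 50/50 mix of resistant and susceptible offspring.
# Assumed: resistant offspring have fitness 1.2 in both environments, and
# susceptible offspring perform like the specialist.
bet_hedger = {
    "toxin": 0.5 * 1.2 + 0.5 * 0.1,
    "no_toxin": 0.5 * 1.2 + 0.5 * 3.0,
}

# Only the frequency of each environment matters for the geometric mean;
# here the toxin is present in half of all generations.
envs = ["toxin", "no_toxin"] * 50

g_specialist = geometric_mean_fitness(specialist, envs)
g_bet_hedger = geometric_mean_fitness(bet_hedger, envs)
print(f"specialist: {g_specialist:.3f}, bet-hedger: {g_bet_hedger:.3f}")
```

With these assumed values the specialist has the higher arithmetic mean fitness, yet its geometric mean falls below 1 (long-run decline) while the bet-hedger's stays above 1, which is the sense in which variation-generating genotypes are favored in the long term.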

However, the effects of temporal variation on the evolution of specialized phenotypes crucially depend on whether there is a fitness cost to resistance (Baker, Ferrari et al. 2018), whether compensatory mutations emerge to mediate the cost of resistance (Melnyk et al., 2017) and on the interaction between the strength of selection and the rate of fluctuations, with specialized genotypes more likely to dominate under low selection intensity, low fluctuation or high selection intensity, high fluctuation (Hall et al., 2012).

Second, temporal and spatial selection gradients likely favor different adaptive traits. Bacteria exposed to constant fluctuations in antibiotic doses occasionally evolve tolerance phenotypes where growth rates match fluctuations in selection pressure, instead of evolving biochemical resistance mechanisms (Fridman et al., 2014). Certain fungal pathogens deploy a similar strategy by entering a ‘quiescent’ phase after initial colonization of healthy plant tissues, only resuming active growth when tissues senesce and the concentration of plant defense compounds declines, in addition to other biochemical changes (Prusky et al., 2013). As opposed to favoring phenological traits, spatial gradients may instead favor the evolution of physiological traits. For example, bacterial populations exposed to spatially increasing concentrations of antibiotics evolve resistant biochemical phenotypes through the gradual, stepwise accumulation of adaptive mutations (Baym et al., 2016).

Contributions to fitness

Given that MGCs are likely products of selection, what exactly is being selected?

Beyond the direct phenotypes encoded by gene clusters, clustering itself is expected to contribute to organismal fitness in three distinct ways that are related to the physiological and evolutionary consequences of genetic linkage. Selection for these contributions to fitness, which may occur simultaneously, would then result in selection of clustered loci.

First, clustering is predicted to contribute to phenotype optimization by facilitating the co-expression of neighboring genes (Hurst et al., 2004; Price et al., 2005).

In fungi, this phenomenon is at least partly related to the dynamics of chromatin remodeling. The expression of many MGCs is tightly regulated by epigenetic markings such as histone modifications (Pfannenstiel and Keller 2019). While co-expression is certainly possible for genes that are located far apart through trans-regulation, tighter regulation of their expression may enable fine tuning of enzyme concentrations within the cell and subsequently, the optimization of metabolic pathway flux. Homologs of physically linked fungal metabolic genes are more likely to show tissue-specific co-expression in primates compared with homologs of uncoupled metabolic genes, indicating that control over pathway flux through physical linkage in fungi may be analogous to the sophisticated trans-regulatory networks operating in other eukaryotes (Eidem et al., 2015). Benefits accrued by co-expression may be particularly relevant if a pathway produces toxic intermediates, which would select for phenotypes better able to minimize their accumulation. Although the structural configurations of extant fungal MGCs suggest that clustering is more likely to occur between enzyme pairs that produce and then consume toxic intermediate compounds (McGary et al., 2013), an explicit test of this hypothesis using the GAL cluster in yeast found that while clustering increased co-expression between pathway genes, there were no fitness consequences to having an unclustered pathway over the short term in vitro (Lang and Botstein 2011). Additional studies using different catabolic MGCs and monitored over longer periods of time are needed to refine our understanding of the evolutionary impact of co-expression.

Clustering is also expected to contribute to phenotype selectability, or the efficiency at which a given phenotype is selected, by increasing a genome’s modularity.

In contrast to “beanbag thinking” views of evolution that focus on the effects of single genes in isolation from the rest of the genome (Mayr 1963), it is now understood that the location of a particular allele within a genome has evolutionary consequences for its establishment within a population. This phenomenon, referred to as the Hill-Robertson effect, occurs when selection of one locus interferes with the selection of adjacent or linked loci (Hill and Robertson 1966). The most extreme example of this is a selective sweep, where selection at a single locus carries linked alleles, whether neutral, maladaptive or sub-optimal, to fixation, as long as the selective benefit conferred by the adaptive locus exceeds the cumulative negative contributions of linked loci. Linkage of genes contributing to the same trait decreases the risk of selective interference on that trait, which then contributes to the overall modularity of an organism’s genotype-phenotype map, allowing changes to one phenotype without changing another. The evolution of modular genetic architectures is thus expected to be favored when organisms experience a combination of stabilizing and directional selection for multiple traits (Wagner 1996; Le Nagard et al., 2011), as is likely the case for fungal populations exposed to the chemical complexity of plant tissues.
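The condition under which a selective sweep carries linked alleles to fixation can be stated compactly. The expression below is a sketch that assumes fitness effects combine additively across a non-recombining haplotype; this assumption and the notation are introduced here for illustration and are not taken from the cited work. Here $s_A > 0$ denotes the selective benefit of the adaptive allele and $s_i \le 0$ the effects of the $n$ linked neutral or deleterious alleles.

```latex
s_{\mathrm{hap}} \;=\; s_A + \sum_{i=1}^{n} s_i \;>\; 0
\quad\Longleftrightarrow\quad
s_A \;>\; \sum_{i=1}^{n} \lvert s_i \rvert
```

In words, the haplotype as a whole is favored only while the benefit of the adaptive allele outweighs the summed costs of everything linked to it.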

Although it does not necessarily contribute to traditional Darwinian fitness measured in terms of survival and reproduction, modularity increases individual fitness in the sense that it increases an individual’s probability of transferring adaptive alleles to its offspring in the face of conflicting selection pressures, which overall improves a population’s capacity to respond to selection, i.e., its evolvability (Pepper 2003).

Furthermore, while discussed above as a component of phenotype optimization, increased regulatory coordination between enzymes in a pathway may also increase the selectability of that pathway, as all genes in a pathway must be co-expressed and their products present in the cell in order for the pathway to be visible to natural selection. The implications of phenotype selectability are especially relevant for metabolic evolution, as complex pathways are often built by combining multiple conserved combinations, or modules, of existing reactions (Yamada et al., 2006; Samal et al., 2011; Medema et al., 2014). Modularity is a general property of metabolic networks, and it increases concomitantly with an organism’s ability to survive in a wider range of environments (Samal et al., 2011). Clustering thus not only contributes to the selectability of a given pathway, but also to the selectability of particular reaction modules within that pathway that can then be more easily integrated into another metabolic or genomic context, thus promoting metabolic evolution (Medema et al., 2014).

Finally, clustering is also expected to contribute to phenotype persistence, resulting from decreased recombination between linked loci. This serves to increase the evolutionary stability of particular gene combinations encoding adaptive phenotypes (Moreno-Hagelsieb and Jokic 2012). While preserving combinations of co-adapted alleles may provide a fitness advantage if those genes interact epistatically, if they contribute to the same phenotype, and/or if their products interact physically (Dandekar et al., 1998), it is predicted to be especially beneficial when locally adapting populations experience gene flow from maladapted allopatric populations (Haldane 1957). Increased recombination between loci is typically favored when populations are evolving towards a single optimum in the absence of migration (Martin et al., 2006), but in the presence of maladaptive gene flow, there is strong selection against the acquisition of maladaptive alleles, resulting in selection for genotypes that are better able to preserve locally adapted phenotypes (Yeaman and Whitlock 2011).

The best examples of such genotypes are those consisting of inversions, which experience little to no recombination and often occur in regions containing genes contributing to locally adaptive phenotypes. Inversions have profoundly impacted the evolution of many species by enabling the persistence of locally adaptive traits such as tolerance to low water availability in the malaria mosquito Anopheles gambiae (Fouet et al., 2012), wing coloration patterns in butterflies (Nishikawa et al., 2015) and crypsis phenotypes in stick insects (Lindtke et al., 2017), and have even been shown to evolve independently in different populations adapting to the same environment (Jones et al., 2012).

The evolution of linkage between locally adaptive alleles in general is dependent on migration rate, where distance between loci is expected to gradually decrease as migration rate increases to some critical value determined by the relative magnitudes of selection and drift (Yeaman and Whitlock 2011). As migration rates continue to increase, differences in genomic architectures are minimized to the point where linkage between locally adaptive alleles does not occur (Yeaman and Whitlock 2011). Contrary to the predictions of Baas-Becking’s hypothesis that “everything is everywhere, but the environment selects” (Baas-Becking 1934), which is often invoked to explain patterns of microbial biogeography and presumes global panmixia among populations, certain fungal lineages are now recognized as dispersal limited (Peay et al., 2010). Fungal migration rates across a landscape are thus expected to vary, and likely exist in some populations at intermediate levels optimal for cluster formation. However, the effects of migration on the evolution of linkage have only been examined in the context of genomic islands of divergence, so it remains to be tested whether or not similar demographic parameters also favor the evolution of MGCs in fungi.

The concept of phenotype persistence may not only be relevant for organismal fitness, as the fitness of a gene in a multi-gene pathway crucially depends on the presence of its functional neighbors. In their absence, the pathway cannot function, so any single gene would provide a smaller fitness benefit to the organism, ultimately decreasing its own fitness. The hypothesis of “selfish clustering” thus predicts the evolution of more concentrated architectures through time as individual gene fitness increases if co-inherited with a functional neighbor, even if its overall position in the genome has no effect on organismal fitness (Lawrence and Roth 1996; Walton 2000). Although initially proposed to contribute to gene fitness only under a scenario where horizontal transfer is rampant, fungal genomes are likely dynamic enough that selfish clustering may also be beneficial under scenarios of vertical inheritance. Many fungi have accessory chromosomes that are meiotically unstable (Möller et al., 2018), and can sometimes be transferred between individuals of the same species (Ma et al., 2010), which would lead to selection dynamics similar to an inheritance scenario dominated by horizontal gene transfer. Furthermore, if the risk of recombination with maladapted migrants is high, an individual allele’s fitness is likely to increase with its probability of co-inheritance with co-adapted functional partners. However, distinguishing between selection for phenotype persistence due to selfish clustering or due to some benefit to the organism is difficult in practice because gene fitness and organismal fitness often align. Explicit tests would require examining rates of clustering in a pathway that was selectively neutral at the whole organism level in the face of either significant gene flow or where horizontal transfer dominated.

A population’s ability to respond to selection for any of the above three phenotypes ultimately depends on the effective population size, or the number of individuals participating in the exchange of genetic material within a given population.

We see a general trend across the tree of life where lineages with large effective population sizes such as bacteria have more highly clustered genomes compared with lineages with much smaller populations like animals. Rates of clustering also vary phylogenetically within the fungi, where Ascomycetes are predicted to harbor many more clusters than mushroom-forming Basidiomycetes (Gluck-Thaler and Slot 2018). While many fungal lineages are predicted to have intermediate effective population sizes, population sizes are rarely assessed and comparisons among fungi are rare, so it remains an open question as to whether or not differences in population sizes can explain differences in how populations respond to selection for phenotypes conferred by clustering.

The evolutionary routes to clustering

Although genomic islands of divergence and supergenes may differ in the length of the region over which they occur, similar evolutionary mechanisms may underlie their assembly because there are only a limited number of routes to establishing and/or maintaining genetic linkage (Yeaman 2013), and clarification of these mechanisms will be useful for a more nuanced understanding of MGC formation.

The first mechanism assumes existing linkage between locally adaptive loci in a given population. If this population then comes into contact with an allopatric population (e.g., through the removal of some physical barrier), differentiation in and around the regions containing the locally adaptive alleles is expected to decline more slowly compared with differentiation at neutrally evolving sites in other parts of the genome, resulting in the formation of islands of divergence through an ‘erosion’-like process (Nosil et al., 2009; Yeaman et al., 2016). The second mechanism, which manifests at the onset of local adaptation, occurs when strong divergent selection at a particular locus recruits either existing or de novo adaptive mutations that are located in close physical proximity through a “divergence hitchhiking” effect. Analogous to a selective sweep, this results in the increased establishment probability of alleles linked to the strongly selected locus, even in the case where other unlinked alleles make greater contributions to fitness (Yeaman and Whitlock 2011). The third mechanism, “competition among genetic architectures”, follows and overlaps with the second, describing a process occurring over longer periods of time where new mutations in alleles located closer to existing islands of divergence are more likely to persist in a population compared with new mutations that occur farther away, even if the mutations make similar contributions to fitness (Yeaman and Whitlock 2011). Finally, the removal of intervening sequences through relaxed selection or genomic rearrangements that bring existing adaptive alleles into closer physical proximity are expected to gradually replace more diffuse arrangements of the same alleles through a process termed “competition among genomic architectures” (Yeaman 2013).

The functional relationships between genes in MGCs, as well as several outstanding examples of cluster assembly, suggest their formation occurs primarily through competition among genomic architectures as opposed to the other mechanisms detailed above that would require a priori organization of functionally related genes at the same locus. The constant shuffling of fungal genomes is likely more than sufficient to generate new structural configurations of genes that can serve as the substrate for selection. Genomes of both sexually and asexually reproducing fungi undergo frequent rearrangements mediated by both homologous and non-homologous recombination, as well as inversions (Hane et al., 2011; Grandaubert et al., 2014; Croll et al., 2015; Faino et al., 2016; Shi-Kunne et al., 2018). Homologous recombination hotspots evolve quickly and vary across certain fungal genomes, ranging from little to no recombination in accessory chromosomes to specific regions on core chromosomes that experience frequent recombination (Croll et al., 2015; Stukenbrock and Dutheil 2018). Rates of non-homologous recombination, including inter-chromosomal recombination, also vary across the genome, and are particularly elevated in sub-telomeric regions, likely facilitated by frequent occurrence of transposable elements and conserved repetitive sequences within these regions (Palmer and Keller 2010; Dallery et al., 2017).

Increasing rates of genomic rearrangements are correlated with decreasing distances between locally adaptive alleles (Yeaman and Whitlock 2011), and indeed, gene clusters often preferentially accumulate in dynamic chromosomal regions such as sub-telomeres that experience high rates of non-homologous and homologous recombination (Perrin et al., 2007; Croll et al., 2015; Druzhinina et al., 2016). Sub-telomeric regions often contain strain-specific genes and gene clusters with predicted contributions to adaptation in specific environments (Wu et al., 2009). For example, nearly half of all clustered genes whose expression increased during an infection of mouse lung cells were located in sub-telomeric regions of the mammalian pathogen Aspergillus fumigatus (McDonagh et al., 2008). Rearrangements occurring between non-homologous genomic areas (Zhao et al., 2014) as well as within and between clusters themselves (Proctor et al., 2009), have indirectly been inferred to lead to growth of MGCs through the relocation of genes from one area of the genome to another, as in the case of the origins of the allantoin degradation cluster in yeast (Wong and Wolfe 2005). Here, 6 of the 8 genes that were previously scattered across the genome were rapidly re-organized into a single locus located in a sub-telomeric region (Wong and Wolfe 2005). Gene relocations are also implicated in the multiple independent assemblies of stilbene counter-defense clusters (Gluck-Thaler et al., 2018). Although the exact mechanisms through which rearrangements occur are not well known, transposable elements are frequently adjacent to MGC loci and may facilitate non-homologous recombination between disparate genomic regions leading to cluster growth (Dallery et al., 2017; Lind et al., 2017; Gluck-Thaler et al., 2018). Rearrangements are not only associated with the growth of fungal MGCs, but have also been implicated in the assembly and rearrangement of biosynthetic MGCs in plants (Field et al., 2011), of homeobox clusters in animals (Chan et al., 2015) and multiple MGCs encoding diverse metabolic pathways in bacteria (Cimermancic et al., 2014).

The recently proposed conveyor hypothesis predicts that cluster formation occurs through progressive rearrangements and intervening sequence deletions that gradually bring together adaptive loci in dynamic regions, such as sub-telomeres. If selection for the phenotype conferred by clustering is strong enough, the cluster will eventually migrate to a more stable region of the genome (Slot 2017). Sampling fungal populations to greater depths will enable more precise tracking of clusters as they travel across genomes (Lind et al., 2017).

MGC evolution: does one size fit all?

Biosynthetic fungal MGCs have been intensively studied for the better part of three decades, and justifiably so given the utility of their natural products for human, plant and animal health (Keller 2018). Comparative studies of biosynthetic MGCs both within and across fungal populations have enriched our understanding of how genetic diversity at these loci promotes and reflects fungal adaptation (Lind et al., 2017). Yet the extent to which these insights also apply to catabolic MGCs is not known, because comparative studies are lacking among catabolic MGCs and between these loci and their biosynthetic counterparts. However, several key differences between specialized biosynthetic or degradative metabolisms predict that MGCs encoding such pathways are on distinct evolutionary trajectories.

Specialized biosynthetic and catabolic pathways are likely subject to different constraints imposed by existing metabolic networks within the cell. Specialized biosynthetic metabolism is outward facing, in the sense that the products of these pathways need not integrate into existing metabolic networks. In contrast, catabolism is inward facing, where products must ultimately channel into a set of core nutrient assimilation pathways with specific substrate requirements. For example, while plants produce a seemingly unlimited variety of differentially substituted aromatic molecules, only a limited set of aromatic ring structures are cleaved by fungi (Mäkelä et al., 2014). Pathways for aromatic molecule assimilation by fungi thus typically consist of functional group-modifying reactions that transform diversely substituted rings into a select number of structures, like catechol or protocatechuate, which in turn serve as substrates for downstream ring-cleavage pathways (Mäkelä et al., 2014). The evolution of core biosynthetic pathways may more closely resemble that of degradative pathways, as their products often feed into existing reactions. Indeed, associations between core biosynthetic genes tend to fluctuate less over time than associations between catabolic genes, suggesting they may be even more constrained (Snel and Huynen 2004). Similarly, the evolution of strictly detoxifying reactions may resemble that of specialized biosynthetic metabolism, in that their products need only be less toxic than the original substrate and not necessarily serve as a substrate for an existing pathway in order to confer a fitness benefit.

The constraint to produce a molecule that serves as a substrate for an existing pathway likely promotes alternative evolutionary dynamics. Firn and Jones’ screening hypothesis predicts that if the probability of synthesizing a bioactive molecule is low, then mechanisms to generate diverse chemicals will be selected over time (Firn and Jones 2003). Lineages with the ability to produce a greater diversity of chemicals will be more likely to evolve adaptive molecules, and will correspondingly have higher fitness compared with lineages less able to generate diversity. Although never explicitly tested for biosynthetic fungal metabolism, this hypothesis finds strong support in plant-insect interactions, where increasing diversity of plant defense chemicals is generally associated with reduced insect herbivory (Richards et al., 2015). Selection for screening mechanisms may similarly apply to strict detoxification strategies, as the probability of evolving a reaction that reduces the toxicity of a given molecule is likely low. Many organisms maintain large repertoires of enzymes and transporters with broad substrate specificities, such as cytochrome P450s (Copley 2015). Promiscuous enzymes are frequently implicated in the detoxification of antibiotics, xenobiotics and plant defense compounds in bacteria, insects and fungi (Copley 2015; Lah et al., 2013; Rane et al., 2019). It follows that the ability to generate and maintain diverse detoxification strategies may be selected as a mechanism that increases fitness in antagonistic chemical environments. But whereas mechanisms to generate diverse specialized chemicals or detoxification reactions may be adaptive under certain conditions, the constraint of producing substrates for a defined set of core catabolic pathways predicts that screening mechanisms would not be favored during the evolution of assimilatory pathways. Instead, these constraints are predicted to impose strong directional selection to produce specific, and not novel, products.

Compounded with the fact that assimilatory pathways typically consist of multiple reactions while detoxification often only involves single reactions, constraints on the products of assimilatory pathways suggest that counter-defense MGCs are not on the frontlines of chemical arms races between plants and fungi but rather encode keystone biochemical traits that evolve under sustained selection by a plant defense metabolite or metabolite class. However, once formed, the counter-defense cluster and its encoded pathway could accelerate adaptation in other lineages through horizontal transfer or introgression (Greene et al., 2014). Although the disjunct phylogenetic distributions of counter-defense clusters suggest the processes driving their dispersal over macroevolutionary timescales resemble those affecting biosynthetic MGC distributions (Gluck-Thaler and Slot 2018), the drivers of within-population diversity of counter-defense MGCs have never been examined. Comparative analyses of rates of change, both in terms of gene function and at the level of gene combinations, will decipher the mechanisms of counter-defense MGC evolution and its consequences for plant-fungal interactions.

MGCs: windows into the future of fungal ecology

The implications of counter-defense clustering for the study of fungal ecology are profound. Just as morphological features in plants and animals help us understand how these lineages adapt to their environments, MGCs in fungal genomes enable researchers to rapidly generate and test hypotheses of ecological evolution. Developments in cluster prediction, such as tools for predicting clustering in three-dimensional space (Schulz et al., 2018), coupled with high-throughput screening methods (Carere et al., 2016) will help fuel comparative analyses necessary for exploring this new frontier in plant-fungal interactions. Identification of pathways will also enable synthetic assembly of degradative pathways for biotechnological applications, and will be useful for creating biochemical roadmaps for removing undesirable inhibitors of ligno-cellulose fermentations (Ramos et al., 2011). Several outstanding questions remain: how do particular clusters contribute to fitness on a given host? Is there evidence of selection for combinations of counter-defense MGCs, similar to simultaneous selection of multiple pathogenicity islands in bacteria (Bouyioukos et al., 2016)? What precise role does migration play in the assembly of MGC loci? Are there geographic hotspots of MGC evolution, and how frequently do MGCs migrate outside of these hotspots? What processes drive counter-defense MGC diversity at the population level, and what are the consequences of this diversity? In-depth sampling along spatial and ecological gradients is the first step required to identify selective and demographic forces shaping cluster formation.

Our knowledge of chemical defense and counter-defense between plants and fungi lags far behind our understanding of how these interactions manifest between plants and insect herbivores (Ehrlich and Raven 1964; Richards et al., 2015). Decades of research have demonstrated that chemical interactions between plants and insects have cascading consequences for generating and maintaining biodiversity across earth’s ecosystems (Richards et al., 2015), and provide valuable explanatory power for describing the distribution of species across landscapes and the evolution of pathogenicity (Becerra 2015). The consequences of similar chemical interactions between plants and fungi remain to be shown, yet hold much promise for increasing our understanding of the diversification of life on earth. Investigations into the evolution of fungal ecology have repeatedly contributed to the development of new theory that advances our understanding of the processes shaping species interactions and community assembly (Bruns 2019). Developing high-throughput approaches for identifying and characterizing counter-defense loci will refine our understanding of plant-fungal co-evolution, and will serve as a foundation for investigating the consequences of these interactions for biodiversity writ large.

References

Andrew RL, Peakall R, Wallis IR, Foley WJ (2007). Spatial distribution of defense chemicals and markers and the maintenance of chemical variation. Ecology 88: 716-728.

Avalos Cordero FJ, Díaz Sánchez V, Estrada Alejandro F, Limón Mirón MdC, Al-Babili S (2013). The oxygenase CAO-1 of Neurospora crassa is a cleavage enzyme. Eukaryotic Cell 12: 1305-1314.

Baas-Becking LGM (1934). Geobiologie; of inleiding tot de milieukunde. WP Van Stockum & Zoon NV: The Hague, the Netherlands.

Barton KE, Koricheva J (2010). The ontogeny of plant defense and herbivory: characterizing general patterns using meta-analysis. The American Naturalist 175: 481-493.

Bay RA, Ruegg K (2017). Genomic islands of divergence or opportunities for introgression? Proceedings of the Royal Society B: Biological Sciences 284.

Baym M, Lieberman TD, Kelsic ED, Chait R, Gross R, Yelin I et al (2016). Spatiotemporal microbial evolution on antibiotic landscapes. Science 353: 1147-1151.

Beaumont HJ, Gallie J, Kost C, Ferguson GC, Rainey PB (2009). Experimental evolution of bet hedging. Nature 462: 90.

Becerra JX (1997). Insects on plants: macroevolutionary chemical trends in host use. Science 276: 253-256.

Becerra JX (2007). The impact of herbivore–plant coevolution on plant community structure. Proceedings of the National Academy of Sciences 104: 7483-7488.

Becerra JX (2015). On the factors that promote the diversity of herbivorous insects and plants in tropical forests. Proceedings of the National Academy of Sciences 112: 6098-6103.

Bentley R, Haslam E (1990). The shikimate pathway—a metabolic tree with many branches. Critical reviews in biochemistry and molecular biology 25: 307-384.

Berbee ML, Wang S, Chang Y, Sekimoto S, Clum A, Aerts AL et al (2015). Phylogenomic Analyses Indicate that Early Fungi Evolved Digesting Cell Walls of Algal Ancestors of Land Plants. Genome Biology and Evolution 7: 1590-1601.

Bouarab K, Melton R, Peart J, Baulcombe D, Osbourn A (2002). A saponin-detoxifying enzyme mediates suppression of plant defences. Nature 418: 889-892.

Bouyioukos C, Reverchon S, Kepes F (2016). From multiple pathogenicity islands to a unique organized pathogenicity archipelago. Scientific reports 6: 27978.

Bowyer P, Clarke B, Lunness P, Daniels M, Osbourn A (1995). Host range of a plant pathogenic fungus determined by a saponin detoxifying enzyme. Science 267: 371-374.

Boyce KJ, McLauchlan A, Schreider L, Andrianopoulos A (2015). Intracellular Growth Is Dependent on Tyrosine Catabolism in the Dimorphic Fungal Pathogen Penicillium marneffei. PLoS Pathogens 11: 1-30.

Bradshaw RE, Slot JC, Moore GG, Chettri P, de Wit PJGM, Ehrlich KC et al (2013). Fragmentation of an aflatoxin-like gene cluster in a forest pathogen. New Phytologist 198: 525-535.

Brefort T, Scherzinger D, Limón MC, Estrada AF, Trautmann D, Mengel C et al (2011). Cleavage of resveratrol in fungi: characterization of the enzyme Rco1 from Ustilago maydis. Fungal genetics and biology 48: 132-143.

Bruns TD (2019). The developing relationship between the study of fungal communities and community ecology theory. Fungal Ecology. doi:10.1016/j.funeco.2018.12.009

Buckling A, Brockhurst MA, Travisano M, Rainey PB (2007). Experimental adaptation to high and low quality environments under different scales of temporal variation. Journal of evolutionary biology 20: 296-300.

Campbell MA, Staats M, van Kan JAL, Rokas A, Slot JC (2013). Repeated loss of an anciently horizontally transferred gene cluster in Botrytis. Mycologia 105: 1126-1134.

Carere J, Colgrave ML, Stiller J, Liu C, Manners JM, Kazan K et al (2016). Enzyme-driven metabolomic screening: a proof-of-principle method for discovery of plant defence compounds targeted by pathogens. New Phytologist 212: 770-779.

Chan C, Jayasekera S, Kao B, Páramo M, Von Grotthuss M, Ranz JM (2015). Remodelling of a homeobox gene cluster by multiple independent gene reunions in Drosophila. Nature communications 6: 6509.

Charron MJ, Dubin RA, Michels CA (1986). Structural and functional analysis of the MAL1 locus of Saccharomyces cerevisiae. Molecular and cellular biology 6: 3891-3899.

Cimermancic P, Medema MH, Claesen J, Kurita K, Brown LCW, Mavrommatis K et al (2014). Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell 158: 412-421.

Copley SD (2015). An evolutionary biochemist's perspective on promiscuity. Trends in biochemical sciences 40: 72-78.

Croll D, Lendenmann MH, Stewart E, McDonald BA (2015). The impact of recombination hotspots on genome evolution of a fungal plant pathogen. Genetics 201: 1213-1228.

Cvijović I, Good BH, Jerison ER, Desai MM (2015). Fate of a mutation in a fluctuating environment. Proceedings of the National Academy of Sciences 112: E5021-E5028.

Dallery J-F, Lapalu N, Zampounis A, Pigné S, Luyten I, Amselem J et al (2017). Gapless genome assembly of Colletotrichum higginsianum reveals chromosome structure and association of transposable elements with secondary metabolite gene clusters. BMC Genomics 18.

Dandekar T, Snel B, Huynen M, Bork P (1998). Conservation of gene order: a fingerprint of proteins that physically interact. Trends in biochemical sciences 23: 324-328.

de las Heras A, Chavarría M, de Lorenzo V (2011). Association of dnt genes of Burkholderia sp. DNT with the substrate-blind regulator DntR draws the evolutionary itinerary of 2,4-dinitrotoluene biodegradation. Molecular microbiology 82: 287-299.

Del Carratore F, Zych K, Cummings M, Takano E, Medema MH, Breitling R (2019). Computational identification of co-evolving multi-gene modules in microbial biosynthetic gene clusters. Communications Biology 2: 83.

Douglas H, Hawthorne D (1966). Regulation of genes controlling synthesis of the galactose pathway enzymes in yeast. Genetics 54: 911.

Druzhinina IS, Kopchinskiy AG, Kubicek EM, Kubicek CP (2016). A complete annotation of the chromosomes of the cellulase producer Trichoderma reesei provides insights in gene clusters, their expression and reveals genes required for fitness. Biotechnology for biofuels 9: 75.

Durmaz E, Benson C, Kapun M, Schmidt P, Flatt T (2018). An inversion supergene in Drosophila underpins latitudinal clines in survival traits. Journal of Evolutionary Biology 31: 1354-1364.

Ehrlich PR, Raven PH (1964). Butterflies and plants: a study in coevolution. Evolution 18: 586-608.

Eidem HR, McGary KL, Rokas A (2015). Shared selective pressures on fungal and human metabolic pathways lead to divergent yet analogous genetic responses. Molecular biology and evolution 32: 1449-1455.

Ellison CE, Hall C, Kowbel D, Welch J, Brem RB, Glass N et al (2011). Population genomics and local adaptation in wild isolates of a model microbial eukaryote. Proceedings of the National Academy of Sciences 108: 2831-2836.

Elmore MH, McGary KL, Wisecaver JH, Slot JC, Geiser DM, Sink S et al (2015). Clustering of two genes putatively involved in cyanate detoxification evolved recently and independently in multiple fungal lineages. Genome biology and evolution 7: 789-800.

Faino L, Seidl MF, Shi-Kunne X, Pauper M, van den Berg GC, Wittenberg AH et al (2016). Transposons passively and actively contribute to evolution of the two-speed genome of a fungal pathogen. Genome research 26: 1091-1100.

Fernández-Cañón JM, Peñalva MA (1995). Molecular characterization of a gene encoding a homogentisate dioxygenase from Aspergillus nidulans and identification of its human and plant homologues. Journal of Biological Chemistry 270: 21199-21205.

Field B, Fiston-Lavier AS, Kemen A, Geisler K, Quesneville H, Osbourn AE (2011). Formation of plant metabolic gene clusters within dynamic chromosomal regions. Proceedings of the National Academy of Sciences 108: 16116-16121.

Firn RD, Jones CG (2003). Natural products–a simple model to explain chemical diversity. Natural product reports 20: 382-391.

Fisher RA (1922). The systematic location of genes by means of crossover observations. The American Naturalist 56: 406-411.

Floudas D, Binder M, Riley R, Barry K, Blanchette RA, Henrissat B et al (2012). The Paleozoic origin of enzymatic lignin decomposition reconstructed from 31 fungal genomes. Science 336: 1715-1719.

Fouet C, Gray E, Besansky NJ, Costantini C (2012). Adaptation to aridity in the malaria mosquito Anopheles gambiae: chromosomal inversion polymorphism and body size influence resistance to desiccation. PloS one 7: e34841.

Fridman O, Goldberg A, Ronin I, Shoresh N, Balaban NQ (2014). Optimization of lag time underlies antibiotic tolerance in evolved bacterial populations. Nature 513: 418.

Gérecová G, Neboháčová M, Zeman I, Pryszcz LP, Tomáška Ľ, Gabaldón T et al (2015). Metabolic gene clusters encoding the enzymes of two branches of the 3-oxoadipate pathway in the pathogenic yeast Candida albicans. FEMS yeast research 15: fov006.

Gerton JL, DeRisi J, Shroff R, Lichten M, Brown PO, Petes TD (2000). Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences 97: 11383-11390.

Giles NH, Case ME, Jacobson JW (1973). Genetic regulation of quinate-shikimate catabolism in Neurospora crassa. Molecular cytogenetics. Springer: Boston, MA. pp 309-314.

Glenn A, Gold S, Bacon C (2002). Fdb1 and Fdb2, Fusarium verticillioides loci necessary for detoxification of preformed antimicrobials from corn. Molecular plant-microbe interactions 15: 91-101.

Glenn AE, Davis CB, Gao M, Gold SE, Mitchell TR, Proctor RH et al (2016). Two Horizontally Transferred Xenobiotic Resistance Gene Clusters Associated with Detoxification of Benzoxazolinones by Fusarium Species. PLoS ONE 11: e0147486.

Gluck-Thaler E, Slot JC (2018). Specialized plant biochemistry drives gene clustering in fungi. The ISME Journal 12: 1694-1705.

Gluck-Thaler E, Vijayakumar V, Slot JC (2018). Fungal adaptation to plant defences through convergent assembly of metabolic modules. Molecular ecology 27: 5120-5136.

Graham GJ (1995). Tandem genes and clustered genes. Journal of theoretical biology 175: 71-87.

Grandaubert J, Lowe RG, Soyer JL, Schoch CL, Van de Wouw AP, Fudal I et al (2014). Transposable element-assisted evolution and adaptation to host plant within the Leptosphaeria maculans-Leptosphaeria biglobosa species complex of fungal pathogens. BMC genomics 15: 891.

Greene GH, McGary KL, Rokas A, Slot JC (2014). Ecology drives the distribution of specialized tyrosine metabolism modules in Fungi. Genome Biology and Evolution 6: 121-132.

Haldane J (1919). The combination of linkage values and the calculation of distances between the loci of linked factors. Journal of Genetics 8: 299-309.

Haldane JBS (1957). The cost of natural selection. Journal of Genetics 55: 511.

Hall AR, Miller AD, Leggett HC, Roxburgh SH, Buckling A, Shea K (2012). Diversity–disturbance relationships: frequency and intensity interact. Biology letters 23: 768-771.

Hane JK, Rouxel T, Howlett BJ, Kema GH, Goodwin SB, Oliver RP (2011). A novel mode of chromosomal evolution peculiar to filamentous Ascomycete fungi. Genome biology 12: R45.

Hill WG, Robertson A (1966). The effect of linkage on limits to artificial selection. Genetics Research 8: 269-294.

Hittinger CT, Rokas A, Carroll SB (2004). Parallel inactivation of multiple GAL pathway genes and ecological diversification in yeasts. Proceedings of the National Academy of Sciences 101: 14144-14149.

Holeski LM, Hillstrom ML, Whitham TG, Lindroth RL (2012). Relative importance of genetic, ontogenetic, induction, and seasonal variation in producing a multivariate defense phenotype in a foundation tree species. Oecologia 170: 695-707.

Holliday JA, Zhou L, Bawa R, Zhang M, Oubida RW (2016). Evidence for extensive parallelism but divergent genomic architecture of adaptation along altitudinal and latitudinal gradients in Populus trichocarpa. New Phytologist 209: 1240-1251.

Hull E, Green P, Arst Jr H, Scazzocchio C (1989). Cloning and physical characterization of the l-proline catabolism gene cluster of Aspergillus nidulans. Molecular microbiology 3: 553-559.

Hurst LD, Pál C, Lercher MJ (2004). The evolutionary dynamics of eukaryotic gene order. Nature Reviews Genetics 5: 299-310.

Jacob F, Monod J (1961). Genetic regulatory mechanisms in the synthesis of proteins. Journal of molecular biology 3: 318-356.

Jeffries TW, Van Vleet JRH (2009). Pichia stipitis genomics, transcriptomics, and gene clusters. FEMS yeast research 9: 793-807.

Johnstone I, McCabe P, Greaves P, Gurr S, Cole G, Brow M et al (1990). Isolation and characterisation of the crnA-niiA-niaD gene cluster for nitrate assimilation in Aspergillus nidulans. Gene 90: 181-192.

Jones FC, Grabherr MG, Chan YF, Russell P, Mauceli E, Johnson J et al (2012). The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484: 55-61.

Joron M, Frezal L, Jones RT, Chamberlain NL, Lee SF, Haag CR et al (2011). Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry. Nature 477: 203-206.

Kaznadzey A, Shelyakin P, Gelfand MS (2017). Sugar Lego: Gene composition of bacterial carbohydrate metabolism genomic loci. Biology Direct 12.

Keller NP (2018). Fungal secondary metabolism: regulation, function and drug discovery. Nature Reviews Microbiology 17: 167-180.

Kettle AJ, Batley J, Benfield AH, Manners JM, Kazan K, Gardiner DM (2015). Degradation of the benzoxazolinone class of phytoalexins is important for virulence of Fusarium pseudograminearum towards wheat. Molecular plant pathology 16: 946-962.

Kim S-G, Yon F, Gaquerel E, Gulati J, Baldwin IT (2011). Tissue specific diurnal rhythms of metabolites and their regulation during herbivore attack in a native tobacco, Nicotiana attenuata. PLoS ONE 6: e26214.

Kliebenstein DJ, Rowe HC, Denby KJ (2005). Secondary metabolites influence Arabidopsis/Botrytis interactions: Variation in host production and pathogen sensitivity. Plant Journal 44: 25-36.

Lah L, Haridas S, Bohlmann J, Breuil C (2013). The cytochromes P450 of Grosmannia clavigera: Genome organization, phylogeny, and expression in response to pine host chemicals. Fungal Genetics and Biology 50: 72-81.

Lang GI, Botstein D (2011). A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PLoS ONE 6: e25290.

Larson WA, Limborg MT, McKinney GJ, Schindler DE, Seeb JE, Seeb LW (2017). Genomic islands of divergence linked to ecotypic variation in sockeye salmon. Molecular Ecology 26: 554-570.

Lawrence JG, Roth JR (1996). Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics 143: 1843-1860.

Le Nagard H, Chao L, Tenaillon O (2011). The emergence of complexity and restricted pleiotropy in adapting networks. BMC evolutionary biology 11.

Lewontin R, Kojima Ki (1960). The evolutionary dynamics of complex polymorphisms. Evolution 14: 458-472.

Lind AL, Wisecaver JH, Lameiras C, Wiemann P, Palmer JM, Keller NP et al (2017). Drivers of genetic diversity in secondary metabolic gene clusters within a fungal species. PLoS Biology 15: e2003583.

Lindtke D, Lucek K, Soria-Carrasco V, Villoutreix R, Farkas TE, Riesch R et al (2017). Long-term balancing selection on chromosomal variants associated with crypsis in a stick insect. Molecular Ecology 26: 6189-6205.

Liu TT, Xu Y, Liu H, Luo S, Yin YJ, Liu SJ et al (2011). Functional characterization of a gene cluster involved in gentisate catabolism in Rhodococcus sp. strain NCIMB 12038. Applied Microbiology and Biotechnology 90: 671-678.

Lowe TM, Ailloud F, Allen C (2015). Hydroxycinnamic acid degradation, a broadly conserved trait, protects Ralstonia solanacearum from chemical plant defenses and contributes to root colonization and virulence. Molecular Plant-Microbe Interactions 28: 286-297.

Lynch M, Gabriel W (1987). Environmental tolerance. The American Naturalist 129: 283-303.

Ma L-J, van der Does HC, Borkovich KA, Coleman JJ, Daboussi M-J, Di Pietro A et al (2010). Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium. Nature 464: 367-373.

Mäkelä MR, Donofrio N, De Vries RP (2014). Plant biomass degradation by fungi. Fungal Genetics and Biology 72: 2-9.

Martens-Uzunova ES, Schaap PJ (2008). An evolutionary conserved d-galacturonic acid metabolic pathway operates across filamentous fungi capable of pectin degradation. Fungal Genetics and Biology 45: 1449-1457.

Martin G, Otto SP, Lenormand T (2006). Selection for recombination in structured populations. Genetics 172: 593-609.

Martins TM, Hartmann DO, Planchon S, Martins I, Renaut J, Pereira CS (2015). The old 3-oxoadipate pathway revisited: new insights in the catabolism of aromatics in the saprophytic fungus Aspergillus nidulans. Fungal Genetics and Biology 74: 32-44.

Mayr E (1963). Animal species and evolution. Harvard University Press: Cambridge, MA.

McDonagh A, Fedorova ND, Crabtree J, Yu Y, Kim S, Chen D et al (2008). Sub-telomere directed gene expression during initiation of invasive aspergillosis. PLoS Pathogens 4: e1000154.

McGary KL, Slot JC, Rokas A (2013). Physical linkage of metabolic genes in fungi is an adaptation against the accumulation of toxic intermediate compounds. Proceedings of the National Academy of Sciences 110: 11481-11486.

Medema MH, Cimermancic P, Sali A, Takano E, Fischbach MA (2014). A Systematic Computational Analysis of Biosynthetic Gene Cluster Evolution: Lessons for Engineering Biosynthesis. PLoS Computational Biology 10: e1004016.

Melnyk AH, McCloskey N, Hinz AJ, Dettman J, Kassen R (2017). Evolution of cost-free resistance under fluctuating drug selection in Pseudomonas aeruginosa. mSphere 2: e00158-00117.

Michielse CB, Reijnen L, Olivain C, Alabouvette C, Rep M (2012). Degradation of aromatic compounds through the β-ketoadipate pathway is required for pathogenicity of the tomato wilt pathogen Fusarium oxysporum f. sp. lycopersici. Molecular plant pathology 13: 1089-1100.

Miller CT, Glazer AM, Summers BR, Blackman BK, Norman AR, Shapiro MD et al (2014). Modular skeletal evolution in sticklebacks is controlled by additive and clustered quantitative trait loci. Genetics 197: 405-420.

Möller M, Habig M, Freitag M, Stukenbrock EH (2018). Extraordinary Genome Instability and Widespread Chromosome Rearrangements During Vegetative Growth. Genetics 210: 517-529.

Montalbini P (1991). Effect of rust infection on levels of uricase, allantoinase and ureides in susceptible and hypersensitive bean leaves. Physiological and molecular plant pathology 39: 173-188.

Moreno-Hagelsieb G, Jokic P (2012). The evolutionary dynamics of functional modules and the extraordinary plasticity of regulons: the Escherichia coli perspective. Nucleic acids research 40: 7104-7112.

Nandasena KG, O'Hara GW, Tiwari RP, Howieson JG (2006). Rapid in situ evolution of nodulating strains for Biserrula pelecinus L. through lateral transfer of a symbiosis island from the original mesorhizobial inoculant. Applied and environmental microbiology 72: 7365-7367.

Navarro-Muñoz J, Selem-Mojica N, Mullowney M, Kautsar S, Tryon J, Parkinson E et al (2018). A computational framework for systematic exploration of biosynthetic diversity from large-scale genomic data. bioRxiv: 445270.

Nishikawa H, Iijima T, Kajitani R, Yamaguchi J, Ando T, Suzuki Y et al (2015). A genetic mechanism for female-limited Batesian mimicry in Papilio butterfly. Nature Genetics 47: 405-409.

Nosil P, Funk DJ, Ortiz-Barrientos D (2009). Divergent selection and heterogeneous genomic divergence. Molecular ecology 18: 375-402.

Nützmann HW, Scazzocchio C, Osbourn A (2018). Metabolic Gene Clusters in Eukaryotes. Annual Review of Genetics 52: 159-183.

Ökmen B, Etalo DW, Joosten MH, Bouwmeester HJ, de Vos RC, Collemare J et al (2013). Detoxification of α-tomatine by Cladosporium fulvum is required for full virulence on tomato. New Phytologist 198: 1203-1214.

Pál C, Papp B, Lercher MJ (2005). Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nature Genetics 37: 1372-1375.

Palmer JM, Keller NP (2010). Secondary metabolism in fungi: does chromosomal location matter? Current Opinion in Microbiology 13: 431-436.

Paolinelli-Alfonso M, Villalobos-Escobedo JM, Rolshausen P, Herrera-Estrella A, Galindo-Sánchez C, López-Hernández JF et al (2016). Global transcriptional analysis suggests Lasiodiplodia theobromae pathogenicity factors involved in modulation of grapevine defensive response. BMC Genomics 17.

Pareja-Jaime Y, Roncero MIG, Ruiz-Roldán MC (2008). Tomatinase from Fusarium oxysporum f. sp. lycopersici is required for full virulence on tomato plants. Molecular plant-microbe interactions 21: 728-736.

Parker D, Beckmann M, Zubair H, Enot DP, Caracuel-Rios Z, Overy DP et al (2009). Metabolomic analysis reveals a common pattern of metabolic re-programming during invasion of three host plant species by Magnaporthe grisea. The Plant Journal 59: 723-737.

Peay KG, Garbelotto M, Bruns TD (2010). Evidence of dispersal limitation in soil microorganisms: isolation reduces species richness on mycorrhizal tree islands. Ecology 91: 3631-3640.

Pepper JW (2003). The evolution of evolvability in genetic linkage patterns. Biosystems 69: 115-126.

Perrin RM, Fedorova ND, Bok JW, Cramer RA, Jr., Wortman JR, Kim HS et al (2007). Transcriptional Regulation of Chemical Diversity in Aspergillus fumigatus by LaeA. PLoS Pathogens 3: e50.

Pfannenstiel BT, Keller NP (2019). On top of biosynthetic gene clusters: How epigenetic machinery influences secondary metabolism in fungi. Biotechnology advances. doi: 10.1016/j.biotechadv.2019.02.001

Piasecka A, Jedrzejczak-Rey N, Bednarek P (2015). Secondary metabolites in plant innate immunity: conserved function of divergent chemicals. New Phytologist 206: 948-964.

Plumridge A, Melin P, Stratford M, Novodvorska M, Shunburne L, Dyer PS et al (2010). The decarboxylation of the weak-acid preservative, sorbic acid, is encoded by linked genes in Aspergillus spp. Fungal Genetics and Biology 47: 683-692.

Porter SS, Faber-Hammond J, Montoya AP, Friesen ML, Sackos C (2019). Dynamic genomic architecture of mutualistic cooperation in a wild population of Mesorhizobium. The ISME journal 13: 301-315.

Price MN, Huang KH, Arkin AP, Alm EJ (2005). Operon formation is driven by co-regulation and not by horizontal gene transfer. Genome research 15: 809-819.

Proctor RH, McCormick SP, Alexander NJ, Desjardins AE (2009). Evidence that a secondary metabolic biosynthetic gene cluster has grown by gene relocation during evolution of the filamentous fungus Fusarium. Molecular Microbiology 74: 1128-1142.

Prusky D, Alkan N, Mengiste T, Fluhr R (2013). Quiescent and necrotrophic lifestyle choice during postharvest disease development. Annual Review of Phytopathology 51: 155-176.

Ramos J-L, Marqués S, van Dillewijn P, Espinosa-Urgel M, Segura A, Duque E et al (2011). Laboratory research aimed at closing the gaps in microbial bioremediation. Trends in biotechnology 29: 641-647.

Rane RV, Ghodke AB, Hoffmann AA, Edwards OR, Walsh TK, Oakeshott JG (2019). Detoxifying enzyme complements and host use phenotypes in 160 insect species. Current Opinion in Insect Science 31: 131-138.

Richards LA, Dyer LA, Forister ML, Smilanich AM, Dodson CD, Leonard MD et al (2015). Phytochemical diversity drives plant–insect community diversity. Proceedings of the National Academy of Sciences 112: 10973-10978.

Richards TA, Dacks JB, Jenkinson JM, Thornton CR, Talbot NJ (2006). Evolution of filamentous plant pathogens: gene exchange across eukaryotic kingdoms. Current Biology 16: 1857-1864.

Rokas A, Wisecaver JH, Lind AL (2018). The birth, evolution and death of metabolic gene clusters in fungi. Nature Reviews Microbiology 16: 731-744.

Samal A, Wagner A, Martin OC (2011). Environmental versatility promotes modularity in genome-scale metabolic networks. BMC Systems Biology 5.

Saunders M, Kohn LM (2009). Evidence for alteration of fungal endophyte community assembly by host defense compounds. New Phytologist 182: 229-238.

Savary S, Willocquet L, Pethybridge SJ, Esker P, McRoberts N, Nelson A (2019). The global burden of pathogens and pests on major food crops. Nature Ecology & Evolution 3: 430-439.

Schäfer W, Straney D, Ciuffetti L, Van Etten H, Yoder O (1989). One enzyme makes a fungal pathogen, but not a saprophyte, virulent on a new host plant. Science 246: 247-249.

Schulz T, Stoye J, Doerr D (2018). GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data. BMC genomics 19: 308.

Seifbarghi S, Borhan MH, Wei Y, Coutu C, Robinson SJ, Hegedus DD (2017). Changes in the Sclerotinia sclerotiorum transcriptome during infection of Brassica napus. BMC genomics 18: 266.

Shi-Kunne X, Faino L, van den Berg GC, Thomma BP, Seidl MF (2018). Evolution within the fungal genus Verticillium is characterized by chromosomal rearrangement and gene loss. Environmental microbiology 20: 1362-1373.

Slot JC (2017). Fungal gene cluster diversity and evolution. Advances in genetics. Elsevier. pp 141-178.

Snel B, Lehmann G, Bork P, Huynen MA (2000). STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic acids research 28: 3442-3444.

Snel B, Huynen MA (2004). Quantifying modularity in the evolution of biomolecular systems. Genome Research 14: 391-397.

Soanes DM, Chakrabarti A, Paszkiewicz KH, Dawe AL, Talbot NJ (2012). Genome-wide Transcriptional Profiling of Appressorium Development by the Rice Blast Fungus Magnaporthe oryzae. PLoS Pathogens 8: e1002514.

Srivastava A, Cho IK, Cho Y (2013). The Bdtf1 gene in Alternaria brassicicola is important in detoxifying brassinin and maintaining virulence on Brassica species. Molecular Plant-Microbe Interactions 26: 1429-1440.

Stukenbrock EH, Dutheil JY (2018). Fine-scale recombination maps of fungal plant pathogens reveal dynamic recombination landscapes and intragenic hotspots. Genetics 208: 1209-1229.

Sturtevant AH (1913). The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association. Journal of experimental zoology 14: 43-59.

Susi H, Laine AL (2013). Pathogen life-history trade-offs revealed in allopatry. Evolution 67: 3362-3370.

Tadych M, Vorsa N, Wang Y, Bergen MS, Johnson-Cicalese J, Polashock JJ et al (2015). Interactions between cranberries and fungi: the proposed function of organic acids in virulence suppression of fruit rot fungi. Frontiers in microbiology 6: 835.

Van der Meer JR, De Vos WM, Harayama S, Zehnder AJB (1992). Molecular mechanisms of genetic adaptation to xenobiotic compounds. Microbiological Reviews 56: 677-694.

VanEtten H, Matthews P, Tegtmeier K, Dietert M, Stein J (1980). The association of pisatin tolerance and demethylation with virulence on pea in Nectria haematococca. Physiological Plant Pathology 16: 257-268.

VanEtten H, Temporini E, Wasmann C (2001). Phytoalexin (and phytoanticipin) tolerance as a virulence trait: why is it not required by all pathogens? Physiological and Molecular Plant Pathology 59: 83-93.

Wadke N, Kandasamy D, Vogel H, Lah L, Wingfield BD, Paetz C et al (2016). The Bark-Beetle-Associated Fungus, Endoconidiophora polonica, Utilizes the Phenolic Defense Compounds of Its Host as a Carbon Source. Plant Physiology 171: 914-931.

Wagner A (2009). Evolutionary constraints permeate large metabolic networks. BMC evolutionary biology 9: 231.

Wagner GP (1996). Homologues, natural kinds and the evolution of modularity. American Zoologist 36: 36-43.

Walton JD (2000). Horizontal gene transfer and the evolution of secondary metabolite gene clusters in fungi: an hypothesis. Fungal genetics and biology 30: 167-171.

Wang W-S, Zhao X-Q, Li M, Huang L-Y, Xu J-L, Zhang F et al (2015). Complex molecular mechanisms underlying seedling salt tolerance in rice revealed by comparative transcriptome and metabolomic profiling. Journal of experimental botany 67: 405-419.

Wang Y, Lim L, Madilao L, Lah L, Bohlmann J, Breuil C (2014). Gene discovery for enzymes involved in limonene modification or utilization by the mountain pine beetle-associated pathogen Grosmannia clavigera. Applied and environmental microbiology 80: 4566-4576.

Watanabe S, Saimura M, Makino K (2008). Eukaryotic and bacterial gene clusters related to an alternative pathway of non-phosphorylated L-rhamnose metabolism. Journal of Biological Chemistry 283: 20372-20382.

Wisecaver JH, Slot JC, Rokas A (2014). The Evolution of Fungal Metabolic Pathways. PLoS Genetics 10: e1004816.

Wong S, Wolfe KH (2005). Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nature genetics 37: 777-782.

Wu C, Kim Y-S, Smith KM, Li W, Hood HM, Staben C et al (2009). Characterization of chromosome ends in the filamentous fungus Neurospora crassa. Genetics 181: 1129-1145.

Yamada T, Kanehisa M, Goto S (2006). Extraction of phylogenetic network modules from the metabolic network. BMC Bioinformatics 7.

Yeaman S, Whitlock MC (2011). The genetic architecture of adaptation under migration–selection balance. Evolution 65: 1897-1911.

Yeaman S (2013). Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proceedings of the National Academy of Sciences 110: E1743-E1751.

Yeaman S, Aeschbacher S, Bürger R (2016). The evolution of genomic islands by increased establishment probability of linked alleles. Molecular ecology 25: 2542-2558.

Zhang H, Rokas A, Slot JC (2012). Two different secondary metabolism gene clusters occupied the same ancestral locus in fungal dermatophytes of the Arthrodermataceae. PLoS ONE 7.

Zhao C, Waalwijk C, de Wit PJGM, Tang D, van der Lee T (2014). Relocation of genes generates non-conserved chromosomal segments in Fusarium graminearum that show distinct and co-regulated gene expression patterns. BMC Genomics 15: 191.

Zhao T, Kandasamy D, Krokene P, Chen J, Gershenzon J, Hammerbacher A (2018). Fungal associates of the tree-killing bark beetle, Ips typographus, vary in virulence, ability to degrade conifer phenolics and influence bark beetle tunneling behavior. Fungal Ecology 38: 71-79.

Chapter 2: Dimensions of Horizontal Gene Transfer in Eukaryotic Microbial Pathogens

Published as:

Gluck-Thaler E, Slot JC (2015). Dimensions of horizontal gene transfer in eukaryotic microbial pathogens. PLoS pathogens 11: e1005156.

Introduction

Comparative genomic studies of microorganisms have disrupted the paradigm of vertical inheritance with modification. First in bacteria, and more recently in microscopic and even multicellular eukaryotes, horizontal gene transfer (HGT) has been implicated in genomic and ecological evolution. HGT is the exchange of genetic material between organisms that occurs independently of meiotic and mitotic recombination between mating or hybridizing individuals. HGT occurs through viral- and plasmid-mediated transfer, and through transformation by environmental DNA, via known or as yet unknown mechanisms (Thomas and Nielsen 2005). The existence of environmental gene pools and pan-genomes is supported by decades of functional and phylogenetic studies in bacteria that have highlighted the exchange and proliferation of virulence factors and antibiotic resistance mechanisms (Baquero 2004; Barlow 2009; Smillie et al., 2011). Presently, accumulating reports of HGT in eukaryotes raise similar questions of how exposure to such gene pools has impacted the evolution of eukaryotic microbes, and whether or not human activities influence HGT dynamics. Here, we describe the evidence supporting HGT in eukaryotic microbial pathogens from divergent lineages that impact human, animal and plant health (Table A.1). We consider three interacting dimensions affecting the prevalence of HGT (genetic network structure, selectable functions, and opportunity for contact) in order to better understand how HGT manifests in this important group of organisms.

Does HGT really contribute to eukaryotic microbial pathogen genomes?

Reports of HGT among eukaryotic microbial pathogens have accumulated in recent years, largely driven by comparative genomic analyses showing an unexpected distribution and phylogenetic placement of gene sequences. However, these analyses require appropriate sampling and methodology and should be interpreted with caution.

Objections to the veracity and extent of HGT include the absence of a reproducible transfer mechanism in some lineages, and the plausibility of alternative explanations for the distributions and phylogenies of HGT candidate gene sequences (Kroken et al., 2003). The alternative interpretations of unexpected gene distributions and phylogenies hinge on different assumptions about the parsimony of a small number of gene transfer events versus a large number of the more widely accepted processes of gene duplication and loss (Kroken et al., 2003). Proponents of HGT hypotheses emphasize the importance of robust phylogenetic analyses coupled with multiple additional lines of evidence to support claims (Zhaxybayeva and Doolittle 2011; Soanes and Richards 2014). The most convincing cases have thus variously combined model-based phylogenetic reconstructions to compare gene trees and species phylogenies and tree topology tests, with additional support from genome structure, sequence identity, codon usage, GC content, and evidence of benefits to the recipient (Table A.1). Analyses relying on but a few of these methods, especially supporting methods in isolation, are rarely sufficient to strongly support HGT, and can result in false positive identification of HGT (Zhaxybayeva and Doolittle 2011). Unsampled genetic diversity at the population and species levels, which can impact reconstructions of gene distribution and inheritance, may also lead to false positive identification of HGT, underscoring the importance of robust taxon sampling (Heath et al., 2008). For example, in a 2011 study, genes encoding the greatly expanded crinkler protein family in the amphibian pathogen Batrachochytrium dendrobatidis had a distribution and phylogeny consistent with HGT from plant pathogenic oomycetes; however, the recent detection of crinkler homologs in two additional fungal lineages opens the possibility that the current distribution is compatible with vertical inheritance and widespread gene loss (Sun et al., 2011; James et al., 2013; Lin et al., 2014). Similar sampling biases can also lead to the underestimation of HGT in lineages of eukaryotic microbial pathogens due to recent HGTs that are not fixed in populations and transferred genes that are prone to subsequent loss under changing selective pressures. This is illustrated by the rapid, differential degeneration of a horizontally acquired gene cluster among members of the necrotrophic fungal genus Botrytis (Campbell et al., 2013). More thorough sampling efforts currently underway may reduce erroneous inferences about HGT, help resolve the timing and direction of HGT events, and provide better estimates of their ecological contexts (Grigoriev et al., 2013).

Does genetic network complexity influence HGT?

The complexity hypothesis posits that genes that are more modular in nature (i.e., residing at the periphery of gene connectivity networks) are more likely to be successfully transferred because they are less disruptive to host networks and require establishment of fewer connections (Figure 2.1) (Jain et al., 1999; Baquero 2004). This hypothesis is supported in bacteria, where low network connectivity is found to enhance a gene’s “transferability”, which is largely independent of its specific biological function (Pál et al., 2005; Cohen et al., 2010). In eukaryotic microbial pathogens, genes encoding virulence factors may provide a fitness advantage to the recipient without extensive integration into genetic networks. Examples include genes encoding secreted effector proteins and specialized metabolic genes, especially those located in complete multigene clusters, which encode mechanisms for regulation, compartmentalization, secretion of products, and stoichiometric control of toxic intermediates (Friesen et al., 2006; Patron et al., 2007; Khaldi and Wolfe 2011; Klosterman et al., 2011; Richards et al., 2011; Tsaousis et al., 2012; Greene et al., 2014; Elmore et al., 2015). Biased rates of gene transfer and loss of function found in systematic investigations of pathogen genomes also support the complexity hypothesis. Notably, in the human pathogen Trichomonas vaginalis, only 2% of 152 horizontally transferred genes were “informational” (more often involved in highly conserved, connected cellular processes like transcription), compared to 65% that encoded metabolic enzymes, which are “operational” genes associated with less conserved processes (Carlton et al., 2007; Strese et al., 2014).

However, a survey of metabolic enzymes in select pathogens found that the degree of network connectivity of horizontally transferred genes was not different from the connectivity of vertically inherited genes (Whitaker et al., 2009). The authors note that this could be because the horizontally transferred genes integrated into existing networks during the extensive time period since they were acquired (Wisecaver et al., 2014).

A notable consequence of limited gene connectivity associated with horizontally transferred genes is the relaxation of network-imposed selection pressures, giving rise to sequence divergence, thus increasing the capacity for adaptive evolution. This effect has been reported in plant pathogenic Pyrenophora spp. where ten horizontally transferred genes present significantly more diversifying sequence change compared to corresponding homologs in donor species (Sun et al., 2013).

What biological functions favor successful HGT in eukaryotic microbial pathogens?

Environmental selection in pathogen niches may favor the acquisition of certain gene functions. To date, there have been few formal investigations of general trends in functions of genes transferred to eukaryotic microbial pathogens (but see Richards and Talbot (2013), Soanes and Richards (2014), and Savory et al. (2015) for discussion of trends in fungi and oomycetes), and few functional confirmations of the roles horizontally transferred genes play in virulence (Whitaker et al., 2009; Wisecaver et al., 2014). Individual reports suggest three functional categories that are often horizontally transferred to divergent pathogen lineages: secreted molecules, membrane modifications, and metabolism specialized to host interactions and environments (Figure 2.1). Secreted molecule genes that have been transferred include those for degradative enzymes, such as the twenty-two different plant polysaccharide depolymerization enzymes that were transferred from phytopathogenic fungi to Oomycetes (Richards et al., 2011). Genes for the production of toxic metabolites that disrupt normal cellular function are also reported transferred, including the fumonisin mycotoxin gene cluster between phytopathogenic Fusarium and Aspergillus fungi (Khaldi and Wolfe 2011). Membrane modifications identified in HGT reports may directly mediate cellular contact between hosts and pathogens, or mask pathogen membranes from host defense responses (Zhao et al., 2014). For example, a septin trans-membrane protein acquired by the microsporidian Ordospora colligata may facilitate the -mediated infection of its hosts (Pombert et al., 2015).

Figure 2.1: The Interacting Dimensions of Horizontal Gene Transfer in Eukaryotic Microbial Pathogens. The probability of horizontal gene transfer (HGT) and retention of a gene in recipient organisms is proposed to be under three main interacting influences or dimensions: (1) the genetic network structure, defined as the sum of functional connections between the gene and all other genes within a genome; (2) the selectability of the phenotype conferred by its function in a host environment; and (3) opportunity for contact, i.e. the rate and intimacy of meetings between donor DNA and recipient organisms throughout their lifecycles. In this depiction of a general model, the probability of HGT increases with increasing dot intensity and size. Some pathogen-specific parameters influencing each dimension are listed above each circle’s perimeter. While HGT and the subsequent maintenance of genes in recipient genomes might be possible under the influence of only one or two dimensions, it is predicted to have the highest probability where all dimensions interact.

Finally, the environment that is the host selects for the ability to metabolize and/or utilize host defenses and resources, or other sources of fitness in or on the host. For example, osmoregulatory genes acquired by phytopathogenic fungi may facilitate cellular osmotic balance in vascular fluids, and an anaerobic sulfur mobilization gene may increase survival of Blastocystis (a Stramenopile suspected to be a pathogen) in anaerobic gut environments (Klosterman et al., 2011; Tsaousis et al., 2012). Some genes or gene clusters may be considered “repeat offenders”, having been transferred multiple times, possibly due to advantages conferred in specific pathogenic ecologies. For example, the complex distribution of epipolythiodioxopiperazine toxin gene clusters in ascomycete fungi suggests at least three independent transfers between divergent lineages (Patron et al., 2007).

The extent to which de novo gene gain promotes rapid pathogen emergence is largely unknown. One apparently contemporary transfer of a secreted toxin-encoding gene required for complete virulence on wheat, from Stagonospora nodorum to Pyrenophora tritici-repentis, has been reported. The role of HGT in pathogen emergence is additionally supported by functional studies of transferred genes. For example, deletions of two horizontally acquired genes from the grass pathogen, Fusarium pseudograminearum, and of a horizontally acquired osmoregulatory gene from vascular wilt fungi resulted in reduced virulence (Klosterman et al., 2011; Gardiner et al., 2012). Conversely, gain of virulence was documented in a fungal endophyte transformed with a membrane modification gene that the related entomopathogenic fungus, Metarhizium robertsii, may have ancestrally acquired from insects (Zhao et al., 2014).

The fitness benefits conferred by horizontally acquired genes may range from none to highly beneficial, and may not directly relate to pathogenesis. Some pathogens exhibit complex lifecycles that alternate between pathogenic and non-pathogenic states, and alternatively the gain of adaptive genes with no pathogenic function could facilitate attenuation of pathogenicity or transition to free-living status under selection from host density. For example, it was speculated that the transfer of a nitrate assimilation gene cluster to the mycoparasitic Trichoderma fungi may promote a transition to the nitrogen-limited wood-decay niche, and the transfer of a sugar utilization gene cluster to Schizosaccharomyces yeast could be part of an ecological transition from pathogen to fermentor (Slot and Hibbett 2007; Slot and Rokas 2010).

Do eukaryotic microbial pathogens become “who they meet”?

The frequency of physical contact between donors and recipients should be considered a driving force behind the likelihood of HGT events, and is a function of an organism’s ecology. Bacteria isolated from the same human body site, for example, exchange genes more frequently, and the genes they exchange are more frequently associated with niche-specific functions (Smillie et al., 2011). Three groups of ecologically adjacent organisms are often shown to be involved in horizontal gene exchange with eukaryotic microbial pathogens: co-infecting pathogens, non-pathogens symbiotically associated with the host, and the hosts themselves. Examples of transfers between potentially co-infecting pathogens include a host-specific toxin gene between two fungal wheat pathogens and a plant defense compound degradation cluster between fungal grass pathogens (Friesen et al., 2006; Greene et al., 2014). Non-pathogenic gut commensal bacteria are thought to have contributed diverse metabolic genes to Trichomonas vaginalis and Blastocystis genomes, including those involved in carbohydrate and amino acid metabolism (Carlton et al., 2007; Tsaousis et al., 2012; Strese et al., 2014). Remarkably, there are cases of host-gene acquisition by insect- and plant-pathogenic fungi. These include the acquisition of a sterol binding protein by the entomopathogenic fungus Metarhizium robertsii that enables it to incorporate host-derived cholesterol into its cell membrane during , and the acquisition of a purine salvage pathway gene by obligate intracellular Microsporidian pathogens, among others (Selman et al., 2011; Jaramillo et al., 2013; Zhao et al., 2014; Pombert et al., 2015). Considering the range of genetic exchange between ecological associates outside of predator-prey relationships, we suggest that the “you are what you eat” hypothesis proposed as a mechanism of HGT in phagotrophic eukaryotes may be rebranded “you are who you meet” for eukaryotic microbial pathogens (Doolittle 1998). We propose that the frequency of meetings between pathogens and specific classes of organisms may be influenced by virulence, localization in host, and host range (Figure 2.1). Less virulent pathogens may have sustained encounters with genomes of co-occurring pathogens, non-pathogenic symbionts and the host because of their low impact on host mortality. In contrast, increasingly virulent pathogens may disproportionately encounter genomes from the greater environment as increased virulence correlates with increased survival time outside of hosts (Walther and Ewald 2004). Obligate intracellular pathogens might be more frequently exposed to host genomes, while extracellular pathogens may often encounter genomes of other host-associated organisms. Similarly, generalists encounter a greater diversity of host genomes compared to specialists, and facultative pathogens may encounter more genes from the greater environment. Factors that contribute to a net increase in exposure to foreign DNA may favor acquisition of novel adaptive functions or drive an HGT ratchet by replacing pathogen genes with foreign genes (as proposed by Doolittle (1998)). Furthermore, the gradually converging ecologies resulting from successive “meetings” may promote further transfers of ecology-specific genes such that increasing ecological proximity results in acceleration of gene acquisition and vice versa.

This could explain in part the repeated transfers of phytopathogenic genes from fungi to Oomycetes, as well as the repeated acquisition of genes by the plant pathogenic Fusarium lineage from other plant pathogenic fungi, including the Verticillium, Aspergillus and Colletotrichum genera (Khaldi and Wolfe 2011; Klosterman et al., 2011; Richards et al., 2011; Gardiner et al., 2012; Savory et al., 2015). It remains to be investigated whether specific pathogen lineages, virulence levels, host localizations, or specializations are more prone to horizontal gene exchange, but we note that lineage-specific biases in the rates of HGT were recently shown in a large comparative analysis of fungi (Wisecaver et al., 2014).

Do human activities impact HGT in eukaryotic microbial pathogens?

Human activities that accelerate environmental changes may impose selection pressures and precipitate dispersal events, which can influence the likelihood of HGT among eukaryotic microbial pathogens. Strong selection pressures exerted by decreased host diversity and intensive management practices in agro-ecosystems, including antimicrobials and other chemical control agents, may be expected to increase the prevalence of horizontally transferred genes, similar to the horizontal proliferation of antibiotic resistance genes in bacteria due to modern overuse (Barlow 2009).

Homogeneous host environments increase the density of host-specific pathogens and non-pathogens and the opportunities for them to interact, and at the same time relax density-dependent selection against virulent pathogens. Other human activities, such as global trade and travel, influence both the frequency and diversity of the close physical encounters required for horizontal gene flow, which can lead to emergence and evolution of pathogens in the near-term (Friesen et al., 2006). Furthermore, the migration of host ranges associated with climate and land-use changes provides new opportunities for encounters between pathogens established in previously isolated environments (Parmesan and Yohe 2003). The extent of human impact on HGT in eukaryotic microbial pathogens is not known, but recent HGT discoveries in these organisms argue for careful consideration of pathogen emergence by HGT as a consequence of ecosystem management.

References

Baquero F (2004). From pieces to patterns: evolutionary engineering in bacterial pathogens. Nature Reviews Microbiology 2: 510.

Barlow M (2009). What antimicrobial resistance has taught us about horizontal gene transfer. Horizontal Gene Transfer. Springer. pp 397-411.

Campbell MA, Staats M, van Kan JA, Rokas A, Slot JC (2013). Repeated loss of an anciently horizontally transferred gene cluster in Botrytis. Mycologia 105: 1126-1134.

Carlton JM, Hirt RP, Silva JC, Delcher AL, Schatz M, Zhao Q et al (2007). Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science 315: 207-212.

Cohen O, Gophna U, Pupko T (2010). The complexity hypothesis revisited: connectivity rather than function constitutes a barrier to horizontal gene transfer. Molecular biology and evolution 28: 1481-1489.

Doolittle WF (1998). You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends in Genetics 14: 307-311.

Elmore MH, McGary KL, Wisecaver JH, Slot JC, Geiser DM, Sink S et al (2015). Clustering of two genes putatively involved in cyanate detoxification evolved recently and independently in multiple fungal lineages. Genome biology and evolution 7: 789-800.

Friesen TL, Stukenbrock EH, Liu Z, Meinhardt S, Ling H, Faris JD et al (2006). Emergence of a new disease as a result of interspecific virulence gene transfer. Nature genetics 38: 953.

Gardiner DM, McDonald MC, Covarelli L, Solomon PS, Rusu AG, Marshall M et al (2012). Comparative pathogenomics reveals horizontally acquired novel virulence genes in fungi infecting cereal hosts. PLoS pathogens 8: e1002952.

Greene GH, McGary KL, Rokas A, Slot JC (2014). Ecology drives the distribution of specialized tyrosine metabolism modules in fungi. Genome biology and evolution 6: 121-132.

Grigoriev IV, Nikitin R, Haridas S, Kuo A, Ohm R, Otillar R et al (2013). MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic acids research 42: D699-D704.

Heath TA, Hedtke SM, Hillis DM (2008). Taxon sampling and the accuracy of phylogenetic analyses. Journal of Systematics and Evolution 46: 239-257.

Jain R, Rivera MC, Lake JA (1999). Horizontal gene transfer among genomes: the complexity hypothesis. Proceedings of the National Academy of Sciences 96: 3801-3806.

James TY, Pelin A, Bonen L, Ahrendt S, Sain D, Corradi N et al (2013). Shared signatures of parasitism and phylogenomics unite Cryptomycota and Microsporidia. Current Biology 23: 1548-1553.

Jaramillo VDA, Vargas WA, Sukno SA, Thon MR (2013). Horizontal transfer of a subtilisin gene from plants into an ancestor of the plant pathogenic fungal genus Colletotrichum. PLoS One 8: e59078.

Khaldi N, Wolfe KH (2011). Evolutionary origins of the fumonisin secondary metabolite gene cluster in Fusarium verticillioides and Aspergillus niger. International journal of evolutionary biology.

Klosterman SJ, Subbarao KV, Kang S, Veronese P, Gold SE, Thomma BP et al (2011). Comparative genomics yields insights into niche adaptation of plant vascular wilt pathogens. PLoS pathogens 7: e1002137.

Kroken S, Glass NL, Taylor JW, Yoder O, Turgeon BG (2003). Phylogenomic analysis of type I polyketide synthase genes in pathogenic and saprobic ascomycetes. Proceedings of the National Academy of Sciences 100: 15670-15675.

Lin K, Limpens E, Zhang Z, Ivanov S, Saunders DG, Mu D et al (2014). Single nucleus genome sequencing reveals high similarity among nuclei of an endomycorrhizal fungus. PLoS genetics 10: e1004078.

Pál C, Papp B, Lercher MJ (2005). Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nature genetics 37: 1372-1375.

Parmesan C, Yohe G (2003). A globally coherent fingerprint of climate change impacts across natural systems. Nature 421: 37.

Patron NJ, Waller RF, Cozijnsen AJ, Straney DC, Gardiner DM, Nierman WC et al (2007). Origin and distribution of epipolythiodioxopiperazine (ETP) gene clusters in filamentous ascomycetes. BMC Evolutionary Biology 7: 174.

Pombert J-F, Haag KL, Beidas S, Ebert D, Keeling PJ (2015). The Ordospora colligata genome: Evolution of extreme reduction in microsporidia and host-to-parasite horizontal gene transfer. MBio 6: e02400-02414.

Richards TA, Soanes DM, Jones MD, Vasieva O, Leonard G, Paszkiewicz K et al (2011). Horizontal gene transfer facilitated the evolution of plant parasitic mechanisms in the oomycetes. Proceedings of the National Academy of Sciences 108: 15258-15263.

Richards TA, Talbot NJ (2013). Horizontal gene transfer in osmotrophs: playing with public goods. Nature Reviews Microbiology 11: 720.

Savory F, Leonard G, Richards TA (2015). The role of horizontal gene transfer in the evolution of the oomycetes. PLoS pathogens 11: e1004805.

Selman M, Pombert J-F, Solter L, Farinelli L, Weiss LM, Keeling P et al (2011). Acquisition of an animal gene by microsporidian intracellular parasites. Current Biology 21: R576-R577.

Slot JC, Hibbett DS (2007). Horizontal transfer of a nitrate assimilation gene cluster and ecological transitions in fungi: a phylogenetic study. PloS one 2: e1097.

Slot JC, Rokas A (2010). Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proceedings of the National Academy of Sciences 107: 10136-10141.

Smillie CS, Smith MB, Friedman J, Cordero OX, David LA, Alm EJ (2011). Ecology drives a global network of gene exchange connecting the human microbiome. Nature 480: 241-244.

Soanes D, Richards TA (2014). Horizontal gene transfer in eukaryotic plant pathogens. Annual Review of Phytopathology 52: 583-614.

Strese Å, Backlund A, Alsmark C (2014). A recently transferred cluster of bacterial genes in Trichomonas vaginalis-lateral gene transfer and the fate of acquired genes. BMC evolutionary biology 14: 119.

Sun B-F, Xiao J-H, He S, Liu L, Murphy RW, Huang D-W (2013). Multiple interkingdom horizontal gene transfers in Pyrenophora and closely related species and their contributions to phytopathogenic lifestyles. PLoS One 8: e60029.

Sun G, Yang Z, Kosch T, Summers K, Huang J (2011). Evidence for acquisition of virulence effectors in pathogenic chytrids. BMC evolutionary biology 11: 195.

Thomas CM, Nielsen KM (2005). Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nature reviews microbiology 3: 711.

Tsaousis AD, de Choudens SO, Gentekaki E, Long S, Gaston D, Stechmann A et al (2012). Evolution of Fe/S cluster biogenesis in the anaerobic parasite Blastocystis. Proceedings of the National Academy of Sciences 109: 10426-10431.

Walther BA, Ewald PW (2004). Pathogen survival in the external environment and the evolution of virulence. Biological Reviews 79: 849-869.

Whitaker JW, McConkey GA, Westhead DR (2009). The transferome of metabolic genes explored: analysis of the horizontal transfer of enzyme encoding genes in unicellular eukaryotes. Genome biology 10: R36.

Wisecaver JH, Slot JC, Rokas A (2014). The evolution of fungal metabolic pathways. PLoS genetics 10: e1004816.

Zhao H, Xu C, Lu H-L, Chen X, Leger RJS, Fang W (2014). Host-to-pathogen gene transfer facilitated infection of insects by a pathogenic fungus. PLoS pathogens 10: e1004009.

Zhaxybayeva O, Doolittle WF (2011). Lateral gene transfer. Current Biology 21: R242-R246.

Chapter 3: Specialized Plant Biochemistry Drives Gene Clustering in Fungi

Published as: Gluck-Thaler E, Slot JC (2018). Specialized plant biochemistry drives gene clustering in fungi. The ISME Journal 12: 1694-1705.

EG-T and JCS conceptualized the work and wrote the manuscript. EG-T developed and performed all analyses.

Abstract

The fitness and evolution of prokaryotes and eukaryotes are affected by the organization of their genomes. In particular, the physical clustering of genes can coordinate gene expression and can prevent the breakup of co-adapted alleles. While clustering may thus result from selection for phenotype optimization and persistence, the impact of environmental selection pressures on eukaryotic genome organization has rarely been systematically explored. Here, we investigated the organization of fungal genes involved in the degradation of phenylpropanoids, a class of plant-produced secondary metabolites that mediate many ecological interactions between plants and fungi. Using a novel gene cluster detection method, we identified 1110 gene clusters and many conserved combinations of clusters in a diverse set of fungi. We demonstrate that congruence in genome organization over small spatial scales is often associated with similarities in ecological lifestyle.

Additionally, we find that while clusters are often structured as independent modules with little overlap in content, certain gene families merge multiple modules into a common network, suggesting they are important components of phenylpropanoid degradation strategies. Together, our results suggest that phenylpropanoids have repeatedly selected for gene clustering in fungi, and highlight the interplay between genome organization and ecological evolution in this ancient eukaryotic lineage.

Introduction

Genome architecture is intimately linked to the trajectory of organismal evolution.

The impacts of linear gene organization on prokaryotic evolution in particular have been extensively studied (Baquero 2004), and such organization is increasingly recognized as affecting eukaryotic fitness and evolution as well. For example, spatial clustering of functionally related genes may enable optimization of phenotypes through coordinated gene expression (Hurst et al., 2002; Al-Shahrour et al., 2010; McGary et al., 2013). Similarly, loci composed of co-adapted alleles can facilitate inheritance of locally adapted ecotypes within populations (Yeaman 2013; Holliday et al., 2016). Rather than resulting from non-adaptive processes, the persistence of such organizational patterns in eukaryotic genomes suggests that they are targets of selection (Hurst et al., 2002; Lynch 2007). However, the extent to which specific environmental selection pressures drive eukaryotic genome organization, especially over macroevolutionary timescales, is unclear.

Organized genome structure is particularly apparent in fungi, a lineage of eukaryotic microorganisms whose activities impact biomass transformation and plant and animal health (Peay et al., 2016). Fungal genomes contain many metabolic gene clusters (MGCs) composed of genes encoding enzymes, transporters and regulators that participate in specialized metabolic processes such as nutrient acquisition, competition and defense (Wisecaver et al., 2014). Although MGCs are far rarer in fungi than in bacteria, fungal MGCs exhibit similarly sparse distributions among distantly related species with overlapping niches (Greene et al., 2014; Dhillon et al., 2015; Glenn et al., 2016). This ecological pattern of distribution suggests that conserved combinations of genes may be signatures of ecological selection in fungal genomes.

Fungal MGCs encoding specialized or secondary metabolite (SM) production have been studied extensively (Hoffmeister and Keller 2007) and more recently, several reports suggest that MGCs also encode the degradation of plant SMs (Shanmugam et al., 2010; Greene et al., 2014; Wang et al., 2014; Kettle et al., 2015; Glenn et al., 2016). Plant SMs mediate important biotic and abiotic interactions, including the exclusion of fungal pathogens and the rates of nutrient cycling long after the plant has died. The largest group of plant SMs are phenylpropanoids, which not only contribute to chemical defenses, but are also the main barriers to wood decay (Floudas et al., 2012), and costly inhibitors of lignocellulose biofuel production (Jönsson and Martín 2016). As the primary colonizers of plant material, fungi are frequently in contact with phenylpropanoids, and must mitigate their inhibitory effects through sequestration, excretion and degradation in order to grow (Mäkelä et al., 2015).

Despite the characterization of many phenylpropanoid degradation pathways in fungi, the genomic bases of these pathways are largely unknown (Mäkelä et al., 2015), precluding the use of currently available algorithms to investigate whether or not these metabolic processes are encoded in MGCs (Wisecaver et al., 2014; Weber et al., 2015).

Here, we developed a novel algorithm based on empirically derived models of fungal genome evolution to test the hypothesis that selection pressures from plant SMs impact genome organization across disparate fungal lineages. Using a database of 529 fungal genomes, we systematically detected 1110 candidate MGCs and many conserved combinations of MGCs that putatively degrade a broad array of phenylpropanoids. We tested for associations between MGCs and various fungal ecological lifestyles, and found that the presence of certain MGCs was enriched in plant pathotrophs, saprotrophs, symbiotrophs and endophytes. While many clusters appear to have evolved independently, we identified several gene families that are commonly associated with diverse MGCs, suggesting they play important roles in phenylpropanoid catabolism. Overall, our results suggest that phenylpropanoids are drivers of genome organization in plant-associated fungi, and that MGCs in turn determine patterns of fungal community assembly on both living and decaying plant tissues.

Materials and Methods

Data acquisition, annotation and software specifications

Publically available data from 529 assembled fungal genomes and predicted proteomes were retrieved from various sources (Table B.2). Ecological metadata were compiled from various sources (Table B.2), including the community-curated FUNGuild database (last accessed September 9th, 2016) (Nguyen et al., 2016) and the U.S. National

Fungus Collection database (last accessed April 1st, 2017) (Farr and Rossman 2017).

Amino acid sequence searches with cutoffs of 30% identity, 50 bitscore and where the length of the target sequence was 50-150% of the query sequence, were performed with

USEARCH v8.0.1517’s UBLAST algorithm (additional parameters: evalue cutoff = 1e-5, accel = 0.8) (Edgar 2010) or with BLASTp v2.2.25+ (additional parameters: evalue cutoff =

1e-4) (Altschul et al., 1990). Homolog group membership was determined using OrthoMCL v2 with an inflation value of 1.5 (Li et al., 2003). All amino acid sequences from clustered homolog groups were assigned to orthologous groups from the fuNOG database (last accessed

03/18/16) (Huerta-Cepas et al., 2015) using HMMER3 (Eddy 2011). Predicted functional annotations and KOG processes for each clustered homolog group were based on the most frequent fuNOG annotation assigned to proteins within that group. Clustered sequences were additionally annotated by PFAM and GO term using InterProScan 5 v5.20-59.0 for screening purposes only (Jones et al., 2014).
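As an illustration only, the search cutoffs described above could be applied to tabular hit output as in the following Python sketch. The `Hit` record and its field names are hypothetical, and treating each cutoff as inclusive is an assumption rather than a documented detail of the original pipeline.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One pairwise protein hit; field names are illustrative only."""
    query_len: int       # query length in amino acids
    target_len: int      # target length in amino acids
    pct_identity: float  # percent identity of the alignment
    bitscore: float


def passes_cutoffs(hit: Hit,
                   min_identity: float = 30.0,
                   min_bitscore: float = 50.0,
                   min_len_ratio: float = 0.5,
                   max_len_ratio: float = 1.5) -> bool:
    """Return True if the hit meets the identity, bitscore and length criteria."""
    len_ratio = hit.target_len / hit.query_len
    return (hit.pct_identity >= min_identity
            and hit.bitscore >= min_bitscore
            and min_len_ratio <= len_ratio <= max_len_ratio)


# Toy hits (invented values, not data from this study).
hits = [
    Hit(query_len=400, target_len=380, pct_identity=45.2, bitscore=210.0),
    Hit(query_len=400, target_len=150, pct_identity=62.0, bitscore=95.0),   # target too short
    Hit(query_len=400, target_len=410, pct_identity=27.5, bitscore=120.0),  # identity too low
]
kept = [h for h in hits if passes_cutoffs(h)]
```

Only the first toy hit satisfies all three criteria.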

All phylogenetic trees were visualized using ETE v3 (Huerta-Cepas et al., 2016) and all other graphs were visualized using the ggplot2 package in R (Wickham 2016).

Construction of the microsynteny tree

The evolution of microsynteny (gene content conservation over small genomic distances) does not necessarily recapitulate phylogenetic relationships determined by models of sequence evolution. In order to assess the unexpectedness of cluster distributions within a phylogenetic framework based on microsynteny conservation, we used an approach similar to Snel et al. (1999) to construct a species tree based on microsyntenic distance, or pairwise comparisons of gene content conservation (Figure B.1).

To construct the microsynteny tree, we retrieved 10 genes upstream and downstream of all homologs from a randomly selected gene family (designated “gene neighborhood”).

Variation in genome assembly quality occasionally resulted in gene neighborhoods smaller than the maximum neighborhood size of 21, but we required a minimum neighborhood size of 10. All genes within the retrieved neighborhoods were compared using UBLAST, then sorted into homolog groups using OrthoMCL. For each pairwise neighborhood comparison, syntenic distance was defined as 1 - (the number of shared orthologs / the smallest

neighborhood size). Pairwise syntenic distances thus range from 0 (all homolog groups are shared) to ~0.95 (only the query gene is shared). A total of 1000 neighborhoods were randomly sampled in this way. A neighbor-joining tree was constructed from the distance matrix of median pairwise syntenic distances using the ape package in R (Paradis et al., 2004). The original dataset was sampled with replacement to obtain 100 trees that were then used to calculate bootstrap support on the original tree using RAxML v8.2.0 (Stamatakis

2014), and nodes receiving less than 70% bootstrap support were collapsed. The final microsynteny tree was used to calculate distances covered by cluster distributions, as well as individual homolog groups.
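The neighborhood retrieval and the syntenic distance defined above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `gene_neighborhood` and `syntenic_distance` are hypothetical helper names, each gene is represented by the identifier of its homolog group, and shared orthologs are counted as homolog groups present in both neighborhoods.

```python
def gene_neighborhood(genes, index, flank=10, min_size=10):
    """Return up to `flank` genes on each side of genes[index], or None if too small.

    `genes` is the ordered list of gene IDs on one contig (hypothetical input).
    """
    start = max(0, index - flank)
    neighborhood = genes[start:index + flank + 1]
    return neighborhood if len(neighborhood) >= min_size else None


def syntenic_distance(groups_a, groups_b):
    """1 - (shared homolog groups / size of the smaller neighborhood).

    Each argument lists the homolog group assigned to every gene in one neighborhood.
    """
    shared = len(set(groups_a) & set(groups_b))
    smallest = min(len(groups_a), len(groups_b))
    return 1.0 - shared / smallest


# Two full-size (21-gene) toy neighborhoods that share only the query gene's group.
neigh_a = ["query"] + [f"a{i}" for i in range(20)]
neigh_b = ["query"] + [f"b{i}" for i in range(20)]
d_max = syntenic_distance(neigh_a, neigh_b)  # 1 - 1/21
```

With two full-size neighborhoods that share only the query gene, the distance is 1 - 1/21, which matches the upper bound of ~0.95 given in the text.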

Sampling null models of gene cluster evolution

Empirically derived null models describing the background levels of gene cluster distributions were developed for each of the 12 largest taxonomic classes and each cluster size ranging from 4-24 genes (252 distributions in total). Briefly, for a given null distribution, a group of neighboring genes (i.e., query cluster) of size X was chosen at random from a randomly selected genome from taxonomic class Y. Hits to each gene in the query cluster were recovered using BLASTp. Genomes containing homologous clusters (i.e., containing hits to all genes in the query cluster, where no more than 6 intervening genes separated any hit from another) were then retrieved. To determine the phylogenetic distance associated with the query cluster distribution, the total non-overlapping branch length distance on the microsynteny tree connecting all genomes with homologous clusters was calculated. The above sampling process was repeated 500 times for each null distribution.
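The distance statistic described above, the total non-overlapping branch length connecting a set of genomes on the microsynteny tree, can be illustrated with a short Python sketch. The tree encoding (a dictionary mapping each node to its parent and branch length) and the function name are assumptions made for this example. Although a neighbor-joining tree is unrooted, any arbitrary rooting gives the same total for this calculation.

```python
from collections import Counter


def spanning_branch_length(parent, targets):
    """Total length of the branches connecting all `targets` on a rooted tree.

    `parent` maps each non-root node to (parent_node, branch_length).
    Each branch is counted once, however many target-to-target paths use it.
    """
    targets = set(targets)
    below = Counter()  # number of target leaves found beneath each branch
    for leaf in targets:
        node = leaf
        while node in parent:
            below[node] += 1
            node = parent[node][0]
    # A branch connects targets only if targets lie on both sides of it.
    return sum(parent[node][1] for node, n in below.items() if 0 < n < len(targets))


# Toy tree (invented): root -> A (0.1); root -> X (0.2); X -> B (0.3); X -> C (0.4)
toy_tree = {
    "A": ("root", 0.1),
    "X": ("root", 0.2),
    "B": ("X", 0.3),
    "C": ("X", 0.4),
}
dist_bc = spanning_branch_length(toy_tree, ["B", "C"])        # branches B and C
dist_abc = spanning_branch_length(toy_tree, ["A", "B", "C"])  # all four branches
```

Repeating such a calculation for many randomly chosen query clusters of a given size and taxonomic class would yield one null distribution of distances.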


Locating Clusters through Unexpected Synteny

All homologs of an anchor gene query sequence of interest were retrieved using

BLASTp. Twenty genes upstream and downstream of all anchor gene homologs (designated

“anchor gene neighborhood”) were retrieved and sorted into homolog groups using

OrthoMCL after a UBLAST step. Individual homolog groups whose distribution on the microsynteny species tree had a maximum pairwise distance below 0.95 were discarded.

Within the set of all anchor gene neighborhoods, the set of unique combinations of 4 or more homolog groups that included the anchor gene (“cluster motifs”) was determined. The genomes in which each motif occurred were identified, with the condition that genes belonging to homolog groups in the motif never be separated by more than 6 intervening genes. For each genome in which a given motif occurred, the probability of observing the motif in that genome, given the size of the motif and the taxonomic class of the genome, was empirically estimated as the number of samples in the appropriate null distribution that cover a total distance on the microsynteny tree greater than or equal to the distance associated with the given motif, divided by the total number of samples in the null distribution. For example, to estimate the probability of observing a motif with 8 homolog groups in a Dothideomycete genome, we would compare the total distance associated with the observed motif to the null distribution of distances associated with size 8 clusters sampled from Dothideomycete genomes. For all such tests, the test statistic is distance on the microsynteny tree, and the null hypothesis is that the phylogenetic distribution of a given cluster motif is consistent with background rates of microsynteny evolution. The null hypothesis is rejected

for motifs with an estimated probability below 0.05 in at least one genome, and all genes assigned to such motifs are designated clusters. Clusters with proteins that had fuNOG annotations or PFAM domains or GO term annotations associated with proteins known to exclusively participate in fungal secondary metabolite biosynthesis were excluded from further analysis (Table B.4). All annotations of clustered proteins were also manually inspected for evidence of exclusive participation in biosynthetic metabolism, and excluded if necessary.
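Two steps of the test described above lend themselves to a short Python sketch: checking that motif members are co-localized, and estimating the empirical probability from a null distribution. Both function names are hypothetical. Applying the 6-gene limit to consecutive motif members in gene order is one reading of the rule and should be treated as an assumption.

```python
def motif_is_colocalized(positions, max_intervening=6):
    """True if consecutive motif genes are never separated by more than `max_intervening` genes.

    `positions` are the gene-order indices of motif members on one contig.
    """
    ordered = sorted(positions)
    return all(b - a - 1 <= max_intervening for a, b in zip(ordered, ordered[1:]))


def empirical_probability(observed_distance, null_distances):
    """Fraction of null samples whose tree distance is >= the observed distance."""
    as_extreme = sum(1 for d in null_distances if d >= observed_distance)
    return as_extreme / len(null_distances)


# Toy values (invented): a motif at four gene positions and a five-sample null distribution.
ok = motif_is_colocalized([10, 12, 19, 20])                   # largest gap is 6 genes
p = empirical_probability(4.5, [1.0, 2.0, 3.0, 4.0, 5.0])     # one of five samples is >= 4.5
reject_null = p < 0.05
```

In this toy case the motif is co-localized, but the estimated probability (0.2) would not lead to rejection of the null hypothesis at the 0.05 threshold.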

Cluster models and multi-cluster model profiles (MCMPs)

Homology among clusters from the same cluster class (i.e., containing the same anchor gene) was determined by assessing similarities in homolog group content. Briefly, a matrix detailing the presence or absence of homolog groups in each cluster was used to calculate Bray-Curtis dissimilarity indices for all pairwise comparisons of clusters. Pairwise comparisons were then grouped using complete linkage clustering, and any clusters separated by under 0.6 distance units were assigned to the same cluster model. This cutoff was empirically determined after manual examination of the content of clusters assigned to the same model under various distance cutoffs. Homolog groups present in ≥75% of clusters assigned to a given model were then used to summarize that model. The above approach was also used to group fungal species with similar combinations of clusters into MCMPs by using a matrix detailing the presence or absence of homologous clusters (as determined by cluster model) among all species with two or more clusters. MCMPs observed in more than 5

species were used for enrichment analyses. All above analyses were performed using the vegan package in R (Oksanen et al., 2016).
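The grouping procedure described above can be illustrated with a self-contained Python sketch. This is not the vegan-based R analysis itself: the function names are hypothetical, the toy clusters are invented, and merging only while groups are strictly under the 0.6 cutoff is an assumption. For presence/absence data, the Bray-Curtis dissimilarity reduces to 1 - 2|A ∩ B| / (|A| + |B|).

```python
from collections import Counter


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two sets of homolog groups (presence/absence)."""
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def complete_linkage_groups(items, cutoff=0.6):
    """Agglomerative complete-linkage clustering, stopped once the closest groups are >= cutoff apart.

    `items` maps a cluster ID to its set of homolog groups.
    """
    groups = [[name] for name in items]

    def linkage(g1, g2):
        # Complete linkage: distance between groups is that of their farthest members.
        return max(bray_curtis(items[x], items[y]) for x in g1 for y in g2)

    while len(groups) > 1:
        dist, i, j = min(
            (linkage(groups[i], groups[j]), i, j)
            for i in range(len(groups)) for j in range(i + 1, len(groups))
        )
        if dist >= cutoff:
            break
        merged = groups.pop(j)  # j > i, so index i is unaffected
        groups[i].extend(merged)
    return groups


def summarize_model(member_sets, min_fraction=0.75):
    """Homolog groups present in at least `min_fraction` of the clusters in a model."""
    counts = Counter(hg for s in member_sets for hg in s)
    return {hg for hg, n in counts.items() if n / len(member_sets) >= min_fraction}


# Toy clusters (invented homolog group names).
clusters = {
    "c1": {"anchor", "hgA", "hgB", "hgC"},
    "c2": {"anchor", "hgA", "hgB", "hgD"},
    "c3": {"anchor", "hgX", "hgY", "hgZ"},
    "c4": {"anchor", "hgX", "hgY", "hgZ", "hgW"},
}
models = complete_linkage_groups(clusters)
summary = summarize_model([clusters[c] for c in models[0]])
```

Here the four toy clusters fall into two models, and the first model is summarized by the three homolog groups found in both of its members.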

Network analyses

Amino acid sequences from all clustered homolog groups across 13 cluster classes were combined into one set and then sorted into new homolog groups using UBLAST and

OrthoMCL. Homolog groups that contained sequences from multiple cluster classes were designated as “shared”. Pairwise co-occurrences of all homolog groups in all clusters were determined, and visualized as a network with Cytoscape v.3.4.0 (Shannon et al., 2003). The network layout was determined solely by the AllegroLayout plugin with the Allegro Spring-

Electric algorithm. Analyses of network modularity were performed with the spectral partitioning algorithm (Newman 2006), as implemented in MODULAR (Marquitti et al.,

2014), and the probability of network modularity was estimated by randomly sampling the network 1000 times, twice.
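The construction of the co-occurrence network described above can be sketched in Python as follows. The sketch covers only edge counting and node connectivity, not the Cytoscape layout or the modularity analysis, and the function names and toy clusters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(clusters):
    """Count how often each pair of homolog groups is found in the same cluster."""
    edges = Counter()
    for groups in clusters:
        # Sorting gives every pair a single canonical orientation.
        for pair in combinations(sorted(set(groups)), 2):
            edges[pair] += 1
    return edges


def node_degrees(edges):
    """Number of distinct partners of each homolog group."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Toy clusters (invented homolog group names).
toy_clusters = [
    ["anchor1", "hg1", "hg2"],
    ["anchor1", "hg1"],
    ["anchor2", "hg2"],
]
edges = cooccurrence_edges(toy_clusters)
degrees = node_degrees(edges)
```

Edge counts correspond to the edge widths of such a network, and node degrees correspond to the number of unique connections per homolog group.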

Enrichment tests

A one-tailed Fisher’s exact test, as implemented in the Text-NSP Perl module, was used to conduct all tests of enrichment using data from the Pezizomycotina (Figure B.1).

Unless noted otherwise, cluster features (either cluster presence, cluster model presence or

MCMP assignment) were recorded at the species level, and counts of species were used to fill in contingency tables. Species assigned to multiple ecological lifestyles were considered

to belong to each of those lifestyles separately for the purposes of conducting enrichment tests. Contingency tables with a zero in at least one cell had all cells incremented by 0.5 to avoid division by zero when calculating the odds ratio (Bradburn et al., 2007). In general, the odds ratio can be interpreted as the odds that fungi have a particular lifestyle, given they have a particular cluster-based feature. The precision of each odds ratio, except for those associated with contingency tables with a zero in at least one cell, was estimated by calculating the 95% confidence interval (Szumilas 2010). We rejected the null hypothesis that particular features are not enriched in fungi with a particular ecological lifestyle at α =

0.05. Given the exploratory nature of this study, we did not correct for multiple testing in order to identify trends in cluster distributions to follow up in later confirmatory studies.
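The enrichment statistics described above can be illustrated with a minimal Python sketch that uses only the standard library. It is not the Text-NSP implementation. The function names and cell layout are assumptions: `a` is the number of species with both the feature and the lifestyle, `b` those with the feature only, `c` those with the lifestyle only, and `d` those with neither. The confidence interval uses the standard log odds ratio formula, exp(ln(OR) ± 1.96 × sqrt(1/a + 1/b + 1/c + 1/d)).

```python
import math


def fisher_right_tail(a, b, c, d):
    """One-tailed Fisher's exact test: P(at least `a` co-occurrences) with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, col1)
    upper = min(row1, col1)
    return sum(math.comb(row1, k) * math.comb(n - row1, col1 - k)
               for k in range(a, upper + 1)) / total


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and 95% CI; tables containing a zero get 0.5 added to every cell and no CI."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
        return (a * d) / (b * c), None
    odds = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds, (math.exp(math.log(odds) - z * se), math.exp(math.log(odds) + z * se))


# Toy contingency table (invented counts).
p_value = fisher_right_tail(3, 1, 1, 3)          # 17/70
odds, ci = odds_ratio_with_ci(3, 1, 1, 3)        # odds ratio of 9
odds_zero, ci_zero = odds_ratio_with_ci(5, 0, 2, 4)  # zero cell: corrected odds ratio, no CI
```

A species assigned to several lifestyles would contribute to a separate contingency table for each of those lifestyles.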

Results

Diverse candidate gene clusters are associated with phenylpropanoid degradation

Using 27 different “anchor” gene families involved in phenylpropanoid degradation as separate queries (Methods; Table B.1), we searched 529 genomes from 454 fungal species for clusters using a novel gene cluster detection algorithm (Methods; Figure B.1, Table B.2).

We found evidence of unexpected clustering in 13 anchor gene families, which we defined as separate cluster classes: Aromatic ring-opening dioxygenase (ARD), Benzoate 4-monooxygenase (BPH), Catechol dioxygenase (CCH), Epicatechin laccase (ECL), Ferulic acid decarboxylase (FAD), Ferulic acid esterase 7 (CAE), Naringenin 3-dioxygenase (NAD), Phenol 2-monooxygenase (PMO), Pterocarpan hydroxylase (PAH), Quinate 5-dehydrogenase (QDH), Salicylate hydroxylase (SAH), Stilbene dioxygenase (SDO) and

Vanillyl alcohol oxidase (VAO). We identified a total of 1110 clusters distributed across 341 fungal genomes, or 938 clusters distributed across 287 fungal species (Figure 3.1, Figure B.2,

Table B.5). Only 31 clusters contained multiple different anchor genes (Table B.3).

By grouping clusters together based on similarities in their gene content, we identified 56 distinct types of clusters, or cluster models (Figure B.5). The number of cluster models per cluster class varies from 1 to 8, and clusters assigned to different models generally do not overlap in terms of content (Table B.6). Additionally, the gene windows over which clusters spanned typically contained very few intervening genes not part of the cluster itself (Figure

B.3). Clusters are disproportionately distributed across fungal lineages: the Pezizomycotina comprise 49.6% of species, yet contain 90.3% of all clusters, while represent

26.4% of species, but only 4.3% of clusters (Table B.7). However, homologous clusters assigned to the same model are typically found distributed across different taxonomic classes

(Figure B.5, Table B.7).

In order to benchmark our approach, we searched for previously characterized clusters containing QDH and SDO homologs. All 8 previously predicted clusters from the

SDO cluster class (Greene et al., 2014) and all 31 described clusters from the QDH class

(Hane et al., 2007; Shalaby et al., 2012) were recovered. Additionally, while this study was being prepared, a single cluster from the PAH class was identified by Pigné et al. (2017). To our knowledge, the remaining 1070 clusters are described here for the first time. We expect variation in the quality of genome assemblies and annotations to cause some false negative error, but no significant false positive error, in our detection. Some false positive error in

[Figure 3.1. Panel a: microsynteny phylogeny annotated by taxonomic class, fungal lifestyle matrix, and heatmap of the number of clusters per gene cluster class. Panel b: odds ratio by gene cluster class (QDH, ARD, NAD, PAH, PMO, CCH, BPH, SAH, SDO, ECL, VAO, CAE, FAD) for each fungal lifestyle (animal pathotroph, soil saprotroph, plant saprotroph, plant pathotroph, plant symbiotroph, endophyte), with enrichment significance (Pezizomycotina) shown as P > 0.05, P ≤ 0.05 or P ≤ 0.01. * upper 95% CI out of range; + odds ratio out of range.]

Figure 3.1: Associations between gene cluster distributions and fungal lifestyle. a) A phylogeny of 529 fungi (representing 454 species) based on pairwise microsyntenic similarity is shown to the left, annotated by taxonomic class. All taxonomic classes have been abbreviated by removing “-mycetes” suffixes. A matrix indicating the ecological lifestyle(s) associated with each fungus is shown to the immediate right of the phylogeny, followed by a heatmap indicating the number of candidate phenylpropanoid-degrading gene clusters in each genome, for each cluster class. Numbers in brackets following the taxonomic class and ecological lifestyle headers correspond to the number of genomes and species within those categories. Numbers in brackets following cluster class headers indicate the total number of clusters assigned to that class, the number of species with at least 1 cluster, and the number of clusters that overlap with clusters from another cluster class. b) Odds ratios representing the strength of the association between cluster presence and fungal ecological lifestyle are shown for each of 13 gene cluster classes and 6 lifestyles, using data at the species level from the Pezizomycotina. Dotted red lines indicate an odds ratio of 1. Dark grey bars indicate enrichment below a significance level of 0.05, while black outlines indicate enrichment below a significance level of 0.01. Error bars indicate the 95% confidence interval (CI) for each odds ratio measurement. CIs of 0 are not shown. The color-coding of ecological lifestyle is consistent across the entire figure.

our detection may, however, arise from the fact that beyond the anchor gene, no genes were required to be demonstrably involved in phenylpropanoid degradation because the loci involved in such pathways are for the most part unknown (Mäkelä et al., 2015).

Although we removed clusters with genes known to exclusively participate in biosynthetic secondary metabolism (Table B.4), it is still possible that some remaining clusters participate in biosynthetic reactions.

Clustered homolog groups, including those distributed across cluster classes, are enriched for primary and secondary metabolic functions

Despite the lack of functional constraints on cluster composition, clustered homolog groups were enriched for 7 KOG processes primarily related to primary or secondary metabolism, including “Energy production and conversion” and “Secondary metabolite biosynthesis, transport and catabolism”, the two largest and most significantly enriched categories (Table B.8). The KOG process “Transcription” was the only non-metabolic process enriched in clustered homolog groups. Homolog groups present in more than one cluster class are also enriched for multiple functional categories related to primary and secondary metabolism (Table B.9). Conserved Pfam domains present among shared homolog groups include transport and transcription-related domains, cytochrome P450 domains, and domains related to functional group modification (e.g., dehydrogenase, transferase and decarboxylase; Table B.10).

Phenylpropanoid degradation gene clusters are enriched in Pezizomycotina species with plant-associated lifestyles

We limited our exploratory enrichment analyses to Pezizomycotina species, as they contained the vast majority of identified clusters. Four cluster classes are enriched in plant pathotrophs, including those containing ECL, which is known to participate in flavonoid degradation (Figure 3.1). Six cluster classes are enriched in species with other plant-associated lifestyles, such as plant saprotrophs and endophytes. The BPH and VAO cluster classes are enriched in soil saprotrophs. No cluster classes are enriched in animal pathotrophs. Similar patterns of lifestyle-dependent enrichment are observed at the cluster model level (Figure

B.4). Notably, at least one model in each of 12 cluster classes is enriched in fungal species with a plant-associated lifestyle, and different models from the same cluster class rarely are enriched in fungi with the same ecology. Although ecological enrichments highlight predominant trends, clusters across all anchor genes are typically associated with diverse lifestyles, in part because any given fungal species can be associated with multiple ecological lifestyles.

A total of 191 fungal species possessed multiple clusters of different models. We assigned these fungi to multi-cluster model profiles (MCMPs) based on the combinations of cluster models found in their genomes (Methods). In total, 133 fungal species were distributed across 16 distinct

MCMPs containing 5 or more species (Figure 3.2). Fungi from the same MCMP tend to be closely related; however, 13 MCMPs contain fungi from different taxonomic orders, and 7 of these contain fungi from different taxonomic classes. When limiting an exploratory enrichment analysis to Pezizomycotina species, we found that 2 MCMPs are enriched in

[Figure 3.2. Panel a: presence of gene cluster models (rows, grouped by cluster class: QDH, ARD, NAD, PAH, PMO, CCH, BPH, SAH, SDO, ECL, VAO, CAE, FAD) in fungal species (columns, grouped into multi-cluster model profiles 1 to 16). Panel b: number of species in each MCMP by taxonomic class (Dothideo., Leotio., Sordario., Eurotio., Saccharo., Other). Panel c: odds ratio by MCMP for each fungal lifestyle (animal pathotroph, soil saprotroph, plant saprotroph, plant pathotroph, plant symbiotroph, endophyte), with enrichment significance (Pezizomycotina) shown as P > 0.05, P ≤ 0.05 or P ≤ 0.01. * upper 95% CI out of range.]

Figure 3.2: Combinations of candidate phenylpropanoid-degrading gene clusters in fungal genomes. a) A matrix describes the presence/absence of various homologous clusters (as determined by cluster model; rows) in the genomes of 133 fungal species (columns). Species are grouped into 16 multi-cluster model profiles (MCMPs) based on similarities in the combinations of clusters found in their genomes. b) A bar chart depicts the number of fungal species from each MCMP per taxonomic class. c) Odds ratios representing the strength of the association between MCMP and fungal ecological lifestyle are shown, using data at the species level from the Pezizomycotina. Dotted black lines indicate an odds ratio of 1. Dark grey bars indicate enrichment below a significance level of 0.05, while black outlines indicate enrichment below a significance level of 0.01. Error bars indicate the 95% confidence interval (CI) for each odds ratio measurement. CIs of 0 are not shown. Enrichment data is not shown for MCMP 10, as fewer than 5 fungi from the Pezizomycotina are assigned to this MCMP.

species with plant-associated lifestyles, 1 is enriched in animal pathotrophs and endophytes, and 1 is enriched in soil saprotrophs (Figure 3.2).

Homolog groups are found in modules of co-clustered genes

We visualized structural associations among all clustered homolog groups as a network where nodes represent homolog groups, and edges represent co-occurrence of homolog groups in a cluster (Figure 3.3). 39 homolog groups are found in clusters belonging to different classes (Table B.10), which resulted in a common network for all cluster classes.

These “shared” homolog groups are typically highly connected (i.e., they co-occur with a diverse set of homolog groups; Table B.10). The modularity of the network is significantly greater than that observed in networks of similar size with randomly distributed edges (Table

B.11). This high degree of modularity reflects a large number of homolog groups that are unique to each cluster class (236 in total). Some homolog groups, while not shared between cluster classes, are highly connected because they are present in different cluster models

(e.g., homolog group 1001 in Figure 3.4).

Results spotlight: Candidate pterocarpan hydroxylase clusters may encode uncharacterized strategies for flavonoid degradation

The production of toxic SMs by plants is an ancient and effective strategy for deterring fungal growth. Isoflavonoids, for example, are a critical component of legume

(Fabaceae) defenses against fungi. In response, some legume pathogens have evolved degradative metabolic pathways to detoxify isoflavonoids, thus facilitating host colonization


Figure 3.3: Co-occurrence network of homolog groups in candidate phenylpropanoid-degrading gene clusters. Each node represents a homolog group found in a candidate phenylpropanoid- degrading gene cluster. All nodes are color-coded by the cluster class in which they are found, except for those homolog groups found associated with multiple cluster classes (i.e., “shared”), colored pink and shaped as squares. Nodes representing anchor gene families used to retrieve the clusters are shaped as diamonds, and those anchor gene families that are shared across multiple cluster classes are additionally colored pink. Edges symbolize the co-occurrence of two homolog groups in the same gene cluster, while edge width is proportional to the frequency of that occurrence. Node size is proportional to the number of connections emanating from that node. The proximity of nodes to one another is proportional to the number of shared connections. The annotations, followed by the code names and number of unique connections in parentheses, of nodes with the greatest number of connections (i.e., associated with the greatest diversity of homolog groups) are indicated in the top right-hand corner of the network.

[Figure 3.4. Panel a: cluster presence for Models 1 to 4 across a phylogeny including Dothideomycetes, Leotiomycetes and Eurotiomycetes. Panel b: reaction schematic showing (-)-maackiain, PAH and 1α-hydroxy-maackiain. Panel c: homolog groups per model. Model 1: PAH, 1005, 1007, 1011, 1014 (annotations: cytochrome P450, NmrA-like transcriptional regulator, unknown, unknown). Model 2: PAH, 1001, 1002, 1003 (annotations: sugar+other transporter, cytochrome P450, dehydrogenase). Model 3: PAH, 1010, 1030, 1155, 1289, 1296 (annotations: amino acid permease, MFS transporter, isoflavone reductase, monoamine oxidase, unknown). Model 4: PAH, 1001, 1079, 1129, 1298 (annotations: cytochrome P450, transcription factor, 15-hydroxy prostaglandin dehydrogenase, unknown). Panel d: co-occurrence network of these homolog groups. Legend, homolog group type: Model 1, Model 2, Model 3, Model 4, present in < 75% of clusters assigned to model, anchor, shared with another cluster class.]

Figure 3.4: The distribution of pterocarpan hydroxylase (PAH) clusters in fungi. a) A phylogeny of the Pezizomycotina based on pairwise microsyntenic distance is shown to the left, annotated by taxonomic class. The presence/absence of clusters assigned to the 4 PAH cluster models is indicated in the matrix to the right of the phylogeny, color-coded by cluster model. b) A simplified schematic of one of several reactions catalyzed by PAH. c) Homolog groups present in ≥75% of clusters assigned to a given cluster model are depicted as boxes color-coded by cluster model, while PAH homologs are indicated in yellow. The four-digit code and predicted annotation are indicated for each homolog group. d) The depicted network follows the conventions specified in Figure 3.3. Homolog groups present in ≥75% of clusters assigned to a given cluster model are color-coded by cluster model and inscribed with their code, while others are colored gray. Nodes representing homolog groups present in clusters from other cluster classes are drawn as squares.

(Enkerli et al., 1998). Pterocarpan hydroxylase (PAH) is involved in the degradation of pterocarpan and medicarpin isoflavonoids, and can contribute to virulence on susceptible plant hosts (Enkerli et al., 1998). We found 133 clusters in the PAH class, distributed across

94 fungal species with diverse ecological lifestyles (Figure 3.4). Homology among these clusters is best described by 4 models containing many of the canonical functions found in fungal MGCs. Models 1, 2 and 4 all contain cytochrome P450s, a large family of enzymes involved in detoxification processes (Lah et al., 2011), while models 2 and 3 contain at least one transporter and model 4 contains a transcription factor. Model 3 contains a homolog of isoflavone reductase, an enzyme participating in isoflavonoid biosynthesis in plants and possibly isoflavonoid detoxification in fungi (Höhl et al., 1989). The widespread distribution of PAH clusters suggests they are involved in the degradation of flavonoids or flavonoid-like molecules found outside the Fabaceae. However, the genetics underlying isoflavonoid biosynthesis and degradation, and flavonoid metabolism in general, are not well understood for fungi.

A PAH homolog in Alternaria brassicicola found in a model 1 cluster was recently shown to contribute to DHN melanin accumulation in fungal cell walls, possibly through polymerization or cross-linking (Pigné et al., 2017). Melanins are a class of phenolic polymers that quench oxidizing radicals, and contribute to pathogenicity and survival in harsh environments (Bayry et al., 2014). Intriguingly, plant flavonoids can be polymerized in planta and in vitro into melanin by fungal laccases (Fowler et al., 2011) and possibly by monooxygenases similar to PAH (Desentis-Mendoza et al., 2006). As the repurposing of host metabolites for the production of SMs is not unprecedented in fungi (Schmaler-Ripcke et al.,

2009), we suggest that this PAH cluster is an excellent candidate for exploring hypotheses of metabolic cross talk.

Discussion

Candidate phenylpropanoid degradation MGCs are found at over 1000 loci with unexpectedly conserved synteny

Fungi are the most ecologically significant decomposers of plant biomass, and represent some of the world’s most devastating plant pathogens (Peay et al., 2016). The presence of genes targeting plant SMs in fungal genomes has often been linked to species ecology and the ability to colonize different types of plant tissues (Floudas et al., 2012).

Much less is known about how these genes are organized in fungal genomes, and how their organization impacts or may be affected by fungal ecology and evolution. Many fungal genomes tend to undergo extensive and frequent rearrangements (Hane et al., 2011;

Plissonneau et al., 2016). By contrast, MGCs we detected are often simultaneously present in lineages that diverged hundreds of millions of years ago (Floudas et al., 2012). As combinations of genes that persist over long periods of time are likely to be maintained by natural selection (Hurst et al., 2002; Baquero 2004; Muto et al., 2013) and often encode proteins that are functionally related (Szklarczyk et al., 2014; Zhao et al., 2014), we hypothesize that the conserved combinations of phenylpropanoid degrading genes detected here encode adaptive metabolic phenotypes, and are signatures of selection on genome organization.

The best-studied phenotype conferred by clustering is improved coordination of gene transcription (Hurst et al., 2002). Clustering facilitates transcription by allowing co-regulation through local chromatin modifications (Shwab et al., 2007), promoter sharing

(Davila Lopez et al., 2010), and avoidance of topological constraints on DNA encountered during transcription (Tsochatzidou et al., 2017). In addition to synchronizing cellular responses, coordinated gene transcription may also decrease the accumulation of toxic intermediate metabolites, such as those produced during phenylpropanoid degradation

(Greene et al., 2014; Mäkelä et al., 2015), by optimizing enzyme stoichiometry and improving metabolic flux (McGary et al., 2013). Gene clustering may additionally be selected because it increases the capacity for recombining populations to evolve. When genes contributing to the same trait are clustered, selection for that trait is more efficient because of decreased selective interference between genes at that locus (Pepper 2003). Clustering can also prevent the breakup of co-adapted alleles in the face of gene flow (Yeaman 2013), possibly enabling local adaptation within cryptic niches.

Similarly, clustering increases the probability of propagating adaptive combinations of genes through both vertical (Pepper 2003) and horizontal transfer (Lawrence and Roth

1996; Baquero 2004). Many of the clusters observed in this study have discontinuous phylogenetic distributions consistent with evolutionary scenarios such as vertical inheritance coupled with extensive loss, convergence, duplication, or horizontal gene transfer (HGT).

HGT of MGCs is predicted to be associated with rapid adaptive changes to phenotypes, including increased capacity for host colonization (Dhillon et al., 2015; Glenn et al., 2016) and nutrient acquisition (Slot and Hibbett 2007). For example, MGCs in bacteria are

frequently dispersed by HGT among species inhabiting the same environment or host

(Baquero 2004). HGT is positively influenced by relatedness in bacteria (Andam and

Gogarten 2011) and fungi (Wisecaver et al., 2014; Gluck-Thaler and Slot 2015), as well as by shared ecological niche (Smillie et al., 2011). Given that the distributions we observed here may be at least partly HGT-driven (Greene et al., 2014), the role of HGT in the dispersal of putative phenylpropanoid degrading MGCs must be explicitly tested with phylogenetic methods in follow-up studies (Gluck-Thaler and Slot 2015).

Gene cluster distributions suggest ecological adaptation

We detected the majority of clusters in the Pezizomycotina, which are known to possess highly clustered genomes compared to the (Wisecaver et al., 2014).

Such differences may be due to modes of chromosomal evolution unique to the Pezizomycotina involving extensive rearrangements that could facilitate cluster formation (Hane et al., 2011;

Hartmann et al., 2017). The large population sizes attributed to many Pezizomycotina lineages could then serve to increase selection efficiency for clusters (Lynch and Walsh

2007) such that they would be maintained. Nevertheless, while certain clusters have distributions suggestive of lineage-specific bias (Table B.7), these conserved combinations of genes were ultimately detected because their distributions conflicted with lineage-specific phylogenetic signal, indicating that phylogeny is not the sole distribution determinant.

Indeed, genome organization is an important component of fitness and may be differentially selected across ecological niches (Kirkpatrick and Barton 2006; Yeaman 2013). Fungal MGCs, especially those enriched in plant pathotrophs, may contribute to pathogen fitness, as recent reports suggest that degradation of host defense compounds contributes quantitatively to plant pathogen virulence and reproduction (Hammerbacher et al., 2013; Kettle et al., 2015). MGCs may also confer fitness benefits to saprotrophs by enabling the degradation of phenylpropanoids that inhibit fungal colonization of organic matter (Floudas et al., 2012).

Our exploratory analysis indeed suggests that the presence of certain candidate MGCs is associated with ecological lifestyle (Figure 3.1), and that different cluster models may be specialized for phenylpropanoids encountered in different environments, given that models from the same cluster class are rarely enriched in fungi with the same ecology (Figure B.4).

Cluster model co-occurrences may reflect simultaneous selection by multiple plant metabolites

The modularity of the structural network of clustered homolog groups suggests that different cluster classes carry out semi-independent functions (Wagner 1996), and may reflect differences in the phytochemicals they target (Figure 3.3). Minimally, the structuring of clusters as independent modules with little overlap in content indicates that cluster classes have largely evolved independently of one another, and that selection for genome organization is likely to have occurred multiple times.

Recurrent patterns of cluster combinations among fungi from different taxonomic classes (Figure 3.2) raise the intriguing possibility that conserved cluster repertoires may be important for determining ecological phenotypes, similar to how combinations of pathogenicity islands in bacteria determine host range (Bouyioukos et al., 2016). Plants typically do not produce a single chemical in isolation; rather, defense metabolites are released as complex mixtures (Gershenzon et al., 2012). Some multi-cluster model profiles (MCMPs) may thus reflect the compositions of SM mixtures encountered during the colonization of specific hosts, or additionally, temporal differences in defense compound pressure (Hammerbacher et al., 2013). A detailed investigation of degradative MGCs from fungi colonizing the same host would be ideal for testing this hypothesis.

Do gene clusters bridge plant and fungal metabolisms?

The observation that some homolog groups are clustered with diverse sets of genes, often from different cluster classes (Figure 3.3), implies they have been repeatedly recruited to different catabolic processes and may encode key strategies for phenylpropanoid degradation in fungi. Notably, some of the shared homolog groups, like those with members encoding cytochrome P450 enzymes (see groups ALL1024, ALL1032, ALL1115 in Table B.10) and multi-drug transporters (ALL1122, ALL1127), are known to be associated with evolutionary adaptability and underlie many fungal detoxification strategies (Lah et al., 2011; Mäkelä et al., 2015). Intriguingly, cellular processes associated with other shared homolog groups suggest selection to integrate phenylpropanoid degradation products into central fungal metabolism, for example, carbohydrate transport (MFS monocarboxylate transporter: ALL1065; sugar transporter: ALL1071, ALL1103), and aromatic amino acid biosynthesis (3-dehydroshikimate dehydratase: ALL1002) and degradation (fumarylacetoacetate hydrolase: ALL1055). Combinations of both evolvable and core genes in MGCs may ultimately serve to connect diversifying chemical environments to stable metabolic networks, acting as bridges between specialized plant metabolism and central fungal metabolism.

The prevalence of candidate phenylpropanoid-degrading MGCs in plant-associated fungi suggests that specialized plant metabolism is a strong source of selection on fungal genome organization. Based on their distribution, we hypothesize that the MGCs detected here encode selectable phenotypes promoting the colonization of plant substrates by fungi. Further functional characterization of these loci will serve to accelerate discovery of metabolic processes of great interest not only for lignocellulosic biofuel production, but also for our understanding of the evolutionary dynamics between genome organization and ecological adaptation.

Acknowledgements

This work was supported by funds from The Ohio Agricultural Research and Development Center at The Ohio State University (EGT, JCS), The National Science Foundation (DEB-1638999, JCS) and the Fonds de recherche du Québec-Nature et technologies (EGT). All computational work was conducted on the Ohio State Supercomputer. We are grateful to Adrian Tsang, Alexey Grum-Grzhimaylo, Amy Jo Powell, Andrii Gryganskyi, Angel T. Martinez, Burt H. Bluhm, Christian Kubicek, Colleen Hansel, Daniele Armaleo, Daniel L. Linder, David Ezra, David S. Hibbett, Don Natvig, Francis Michel Martin, Francisco Javier Ruiz Dueñas, Francois Lutzoni, Gabor M. Kovacs, Gerald Bills, Gillian Turgeon, Gregory Bonito, Igor Grigoriev, Irina S. Druzhinina, John W. Taylor, Jon Karl Magnuson, José María Barrasa, Joseph Spatafora, Kathryn Bushley, Kerry O'Donnell, Laurie Connell, Marie-Noëlle Rosso, M. Catherine Aime, Patrik Inderbitzin, Paul Dyer, Pedro Crous, Robin Ohm, Scott E. Baker, Stephen B. Goodwin, Tom Bruns, and Trey Sato for providing access to the unpublished data produced by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and the 1000 Fungal Genomes Project.

References

Al-Shahrour F, Minguez P, Marqués-Bonet T, Gazave E, Navarro A, Dopazo J (2010). Selection upon genome architecture: conservation of functional neighborhoods with changing genes. PLoS Computational Biology 6: e1000953.

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). Basic local alignment search tool. Journal of Molecular Biology 215: 403-410.

Andam CP, Gogarten JP (2011). Biased gene transfer in microbial evolution. Nature Reviews Microbiology 9: 543-555.

Baquero F (2004). From pieces to patterns: evolutionary engineering in bacterial pathogens. Nature Reviews Microbiology 2: 510-518.

Bayry J, Beaussart A, Dufrêne YF, Sharma M, Bansal K, Kniemeyer O et al (2014). Surface structure characterization of Aspergillus fumigatus conidia mutated in the melanin synthesis pathway and their human cellular immune response. Infection and Immunity 82: 3141-3153.

Bouyioukos C, Reverchon S, Képès F (2016). From multiple pathogenicity islands to a unique organized pathogenicity archipelago. Scientific Reports 6.

Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A (2007). Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events. Statistics in Medicine 26: 53-77.

Davila Lopez M, Martinez Guerra JJ, Samuelsson T (2010). Analysis of gene order conservation in eukaryotes identifies transcriptionally and functionally linked genes. PLoS One 5: e10654.

Desentis-Mendoza RM, Hernández-Sánchez H, Moreno A, Rojas del C E, Chel-Guerrero L, Tamariz J et al (2006). Enzymatic polymerization of phenolic compounds using laccase and tyrosinase from Ustilago maydis. Biomacromolecules 7: 1845-1854.

Dhillon B, Feau N, Aerts AL, Beauseigle S, Bernier L, Copeland A et al (2015). Horizontal gene transfer and gene dosage drives adaptation to wood colonization in a tree pathogen. Proceedings of the National Academy of Sciences 112: 3451-3456.

Eddy SR (2011). Accelerated profile HMM searches. PLoS Computational Biology 7: e1002195.

Edgar RC (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460-2461.

Enkerli J, Bhatt G, Covert SF (1998). Maackiain detoxification contributes to the virulence of Nectria haematococca MP VI on chickpea. Molecular Plant Microbe Interactions 11: 317-326.

Farr DF, Rossman AY (2017). Fungal Databases. U.S. National Fungus Collections, ARS, USDA: https://nt.ars-grin.gov/fungaldatabases/.

Floudas D, Binder M, Riley R, Barry K, Blanchette RA, Henrissat B et al (2012). The Paleozoic origin of enzymatic lignin decomposition reconstructed from 31 fungal genomes. Science 336: 1715-1719.

Fowler ZL, Baron CM, Panepinto JC, Koffas MA (2011). Melanization of flavonoids by fungal and bacterial laccases. Yeast 28: 181-188.

Gershenzon J, Fontana A, Burow M, Wittstock U, Degenhardt J (2012). Mixtures of plant secondary metabolites: metabolic origins and ecological benefits. In: Iason GR, Dicke M, Hartley SE (ed). The ecology of plant secondary metabolites: from genes to global processes. Cambridge University Press: Cambridge, MA. pp 56-77.

Glenn AE, Davis CB, Gao M, Gold SE, Mitchell TR, Proctor RH et al (2016). Two Horizontally Transferred Xenobiotic Resistance Gene Clusters Associated with Detoxification of Benzoxazolinones by Fusarium Species. PLoS One 11: e0147486.

Gluck-Thaler E, Slot JC (2015). Dimensions of Horizontal Gene Transfer in Eukaryotic Microbial Pathogens. PLoS Pathogens 11: e1005156.

Greene GH, McGary KL, Rokas A, Slot JC (2014). Ecology drives the distribution of specialized tyrosine metabolism modules in fungi. Genome Biology and Evolution 6: 121-132.

Hammerbacher A, Schmidt A, Wadke N, Wright LP, Schneider B, Bohlmann J et al (2013). A common fungal associate of the spruce bark beetle metabolizes the stilbene defenses of Norway spruce. Plant Physiology 162: 1324-1336.

Hane JK, Lowe RG, Solomon PS, Tan K-C, Schoch CL, Spatafora JW et al (2007). Dothideomycete–plant interactions illuminated by genome sequencing and EST analysis of the wheat pathogen Stagonospora nodorum. Plant Cell 19: 3347-3368.

Hane JK, Rouxel T, Howlett BJ, Kema GH, Goodwin SB, Oliver RP (2011). A novel mode of chromosomal evolution peculiar to filamentous Ascomycete fungi. Genome Biology 12: R45.

Hartmann FE, Sánchez-Vallet A, McDonald BA, Croll D (2017). A fungal wheat pathogen evolved host specialization by extensive chromosomal rearrangements. The ISME Journal 11: 1189-1204.

Hoffmeister D, Keller NP (2007). Natural products of filamentous fungi: enzymes, genes, and their regulation. Natural Product Reports 24: 393-416.

Höhl B, Arnemann M, Schwenen L, Stöckl D, Bringmann G, Jansen J et al (1989). Degradation of the Pterocarpan Phytoalexin (—)-Maackiain by Ascochyta rabiei. Zeitschrift für Naturforschung C 44: 771-776.

Holliday JA, Zhou L, Bawa R, Zhang M, Oubida RW (2016). Evidence for extensive parallelism but divergent genomic architecture of adaptation along altitudinal and latitudinal gradients in Populus trichocarpa. New Phytologist 209: 1240-1251.

Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC et al (2015). eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Research 44: D286-D293.

Huerta-Cepas J, Serra F, Bork P (2016). ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Molecular Biology and Evolution 33: 1635-1638.

Hurst LD, Williams EJ, Pal C (2002). Natural selection promotes the conservation of linkage of co-expressed genes. Trends in Genetics 18: 604-606.

Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C et al (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics 30: 1236-1240.

Jönsson LJ, Martín C (2016). Pretreatment of lignocellulose: formation of inhibitory by-products and strategies for minimizing their effects. Bioresource Technology 199: 103-112.

Kettle AJ, Batley J, Benfield AH, Manners JM, Kazan K, Gardiner DM (2015). Degradation of the benzoxazolinone class of phytoalexins is important for virulence of Fusarium pseudograminearum towards wheat. Molecular Plant Pathology 16: 946-962.

Kirkpatrick M, Barton N (2006). Chromosome inversions, local adaptation and speciation. Genetics 173: 419-434.

Lah L, Podobnik B, Novak M, Korošec B, Berne S, Vogelsang M et al (2011). The versatility of the fungal cytochrome P450 monooxygenase system is instrumental in xenobiotic detoxification. Molecular Microbiology 81: 1374-1389.

Lawrence JG, Roth JR (1996). Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics 143: 1843-1860.

Li L, Stoeckert CJ, Roos DS (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Research 13: 2178-2189.

Lynch M (2007). The origins of genome architecture. Sinauer Associates: Sunderland, MA.

Mäkelä MR, Marinović M, Nousiainen P, Liwanag AJ, Benoit I, Sipilä J et al (2015). Aromatic Metabolism of Filamentous Fungi in Relation to the Presence of Aromatic Compounds in Plant Biomass. Advances in Applied Microbiology 91: 63-137.

Marquitti FMD, Guimarães PR, Pires MM, Bittencourt LF (2014). MODULAR: software for the autonomous computation of modularity in large network sets. Ecography 37: 221-224.

McGary KL, Slot JC, Rokas A (2013). Physical linkage of metabolic genes in fungi is an adaptation against the accumulation of toxic intermediate compounds. Proceedings of the National Academy of Sciences 110: 11481-11486.

Muto A, Kotera M, Tokimatsu T, Nakagawa Z, Goto S, Kanehisa M (2013). Modular Architecture of Metabolic Pathways Revealed by Conserved Sequences of Reactions. Journal of Chemical Information and Modeling 53: 613-622.

Newman ME (2006). Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74: 036104.

Nguyen NH, Song Z, Bates ST, Branco S, Tedersoo L, Menke J et al (2016). FUNGuild: an open annotation tool for parsing fungal community datasets by ecological guild. Fungal Ecology 20: 241-248.

Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara B et al (2016). vegan: Community Ecology Package. R package version 2.3-5.

Paradis E, Claude J, Strimmer K (2004). APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289-290.

Peay KG, Kennedy PG, Talbot JM (2016). Dimensions of biodiversity in the Earth mycobiome. Nature Reviews Microbiology 14: 434-447.

Pepper JW (2003). The evolution of evolvability in genetic linkage patterns. Biosystems 69: 115-126.

Pigné S, Zykwinska A, Janod E, Cuenot S, Kerkoud M, Raulo R et al (2017). A flavoprotein supports cell wall properties in the necrotrophic fungus Alternaria brassicicola. Fungal Biology and Biotechnology 4: 1.

Plissonneau C, Stürchler A, Croll D (2016). The Evolution of Orphan Regions in Genomes of a Fungal Pathogen of Wheat. mBio 7: e01231-01216.

Schmaler-Ripcke J, Sugareva V, Gebhardt P, Winkler R, Kniemeyer O, Heinekamp T et al (2009). Production of pyomelanin, a second type of melanin, via the tyrosine degradation pathway in Aspergillus fumigatus. Applied and Environmental Microbiology 75: 493-503.

Shalaby S, Horwitz BA, Larkov O (2012). Structure–Activity Relationships Delineate How the Maize Pathogen Cochliobolus heterostrophus Uses Aromatic Compounds as Signals and Metabolites. Molecular Plant Microbe Interactions 25: 931-940.

Shanmugam V, Ronen M, Shalaby S, Larkov O, Rachamim Y, Hadar R et al (2010). The fungal pathogen Cochliobolus heterostrophus responds to maize phenolics: novel small molecule signals in a plant-fungal interaction. Cellular Microbiology 12: 1421-1434.

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D et al (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research 13: 2498-2504.

Shwab EK, Bok JW, Tribus M, Galehr J, Graessle S, Keller NP (2007). Histone deacetylase activity regulates chemical diversity in Aspergillus. Eukaryotic Cell 6: 1656-1664.

Slot JC, Hibbett DS (2007). Horizontal transfer of a nitrate assimilation gene cluster and ecological transitions in fungi: a phylogenetic study. PLoS ONE 2: e1097.

Smillie CS, Smith MB, Friedman J, Cordero OX, David LA, Alm EJ (2011). Ecology drives a global network of gene exchange connecting the human microbiome. Nature 480: 241-244.

Snel B, Bork P, Huynen MA (1999). Genome phylogeny based on gene content. Nature Genetics 21: 108-110.

Stamatakis A (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312-1313.

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J et al (2014). STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Research 43: D447-D452.

Szumilas M (2010). Explaining odds ratios. Journal of the Canadian Academy of Child and Adolescent Psychiatry 19: 227.

Tsochatzidou M, Malliarou M, Papanikolaou N, Roca J, Nikolaou C (2017). Genome urbanization: clusters of topologically co-regulated genes delineate functional compartments in the genome of Saccharomyces cerevisiae. Nucleic Acids Research 45: 5818-5828.

Wagner GP (1996). Homologues, natural kinds and the evolution of modularity. American Zoologist 36: 36-43.

Wang Y, Lim L, Madilao L, Lah L, Bohlmann J, Breuil C (2014). Gene discovery for enzymes involved in limonene modification or utilization by the mountain pine beetle-associated pathogen Grosmannia clavigera. Applied and Environmental Microbiology 80: 4566-4576.

Weber T, Blin K, Duddela S, Krug D, Kim HU, Bruccoleri R et al (2015). antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Research 43: W237-W243.

Wickham H (2016). ggplot2: elegant graphics for data analysis. Springer.

Wisecaver JH, Slot JC, Rokas A (2014). The Evolution of Fungal Metabolic Pathways. PLoS Genetics 10: e1004816.

Yeaman S (2013). Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proceedings of the National Academy of Sciences 110: E1743-E1751.

Zhao S, Sakai A, Zhang X, Vetting MW, Kumar R, Hillerich B et al (2014). Prediction and characterization of enzymatic activities guided by sequence similarity and genome neighborhood networks. eLife 3: e03275.

Chapter 4: Fungal Adaptation to Plant Defenses Through Convergent Assembly of Metabolic Modules

Published as: Gluck-Thaler E, Vijayakumar V, Slot JC (2018). Fungal adaptation to plant defences through convergent assembly of metabolic modules. Molecular Ecology 27: 5120-5136.

JCS and EG-T conceived the study. JCS, EG-T, and VV designed the study. EG-T and VV collected materials and data. EG-T and VV analyzed the data. JCS, EG-T and VV wrote the manuscript. EG-T and VV contributed equally to this work.

Abstract

The ongoing diversification of plant defense compounds exerts dynamic selection pressures on the microorganisms that colonize plant tissues. Evolutionary processes that generate resistance towards these compounds increase microbial fitness by giving access to plant resources and increasing pathogen virulence. These processes entail sequence-based mechanisms that result in adaptive gene functions, and combinatorial mechanisms that result in novel syntheses of existing gene functions. However, the priority of, and interactions among, these processes in adaptive resistance remain poorly understood.

Using a combination of molecular genetic and computational approaches, we investigated the contributions of sequence-based and combinatorial processes to the evolution of fungal metabolic gene clusters encoding stilbene cleavage oxygenases (SCOs), which catalyze the degradation of biphenolic plant defense compounds known as stilbenes into monophenolic molecules. We present phylogenetic evidence of convergent assembly among three distinct types of SCO gene clusters containing alternate combinations of phenolic catabolism genes. Multiple evolutionary transitions between different cluster types suggest recurrent selection for distinct gene assemblages. By comparison, we found that the substrate specificities of heterologously expressed SCO enzymes encoded in different cluster types were all limited to stilbenes and related molecules with a 4’-OH group, and differed modestly in substrate range and activity under the experimental conditions.

Together, this work suggests a primary role for genome structural rearrangement, and the importance of enzyme modularity, in promoting fungal metabolic adaptation to plant defense chemistry.

Introduction

Metabolic pathways for the synthesis and degradation of secondary/specialized metabolites (SMs) govern critical interactions between organisms. For example, plants produce numerous SMs that defend against fungal attack and decay, while fungi escape the inhibitory effects of these molecules by neutralizing or breaking them down (Michielse et al., 2012; Hammerbacher et al., 2013; Kettle et al., 2015). The co-diversification of such biosynthetic and degradative pathways may be a driving force behind global biodiversity and ecosystem function (Bailey et al., 2009; Futuyma and Agrawal 2009; Speed et al., 2015), affecting the spatial and temporal assembly of complex communities (Züst et al., 2012; Richards et al., 2015).

Many biosynthetic and degradative SM pathways in bacteria, plants and fungi are encoded in metabolic gene clusters (MGCs), which consist of neighboring sets of genes encoding enzymes, regulators and transporters functioning in the same metabolic processes (Nützmann et al., 2016; Adamek et al., 2017; Slot 2017). MGCs are more frequently affected by gene duplication and horizontal gene transfer (HGT) compared with unclustered genes, and are hotspots for metabolic diversification through both combinatorial and sequence-based mechanisms (Wisecaver et al., 2014; Lind et al., 2017). Combinatorial-based diversification of biosynthetic and degradative pathways typically results from processes such as HGT and/or the rewiring of regulatory circuits that assemble existing enzymatic reactions into qualitatively new pathways (Wagner 2011). Sequence-based diversification results from point mutations, insertions, deletions and recombination within enzyme-encoding genes that lead to specialized or novel enzymatic functions (Copley 2009). Changes to the coding sequences of individual genes within MGCs can contribute quantitatively (Lendenmann et al., 2014) and qualitatively (Proctor et al., 2008) to biochemical phenotypes, while changes to the combinations of genes within MGCs are often associated with the addition or deletion of particular enzymatic reactions from a given pathway (Yabe and Nakajima 2004; Ehrlich et al., 2004; Proctor et al., 2009). Although combinatorial and sequence-based processes are not mutually exclusive, their effects are rarely examined in parallel, precluding an assessment of their relative contributions to the diversification of adaptive metabolic pathways encoded in MGCs.

Among eukaryotes, fungi possess some of the most highly clustered genomes (Slot 2017). Fungal MGCs that synthesize SMs are well studied, and increasingly, fungal mechanisms to degrade/assimilate plant SMs are also found encoded in clusters (Kettle et al., 2015; Glenn et al., 2016; Gluck-Thaler and Slot 2018). For example, a fungal MGC encoding a putative pathway for the degradation of plant-produced stilbene molecules was recently reported (Greene et al., 2014). Stilbenes are a structurally diverse group of ethene-linked biphenolic SMs produced by distantly related plant families and some bacteria (Chong et al., 2009). Stilbene biosynthesis has convergently evolved at least four times across the plant kingdom (Tropf et al., 1994; Han et al., 2014), and numerous plant lineages produce distinct suites of stilbene derivatives (Chong et al., 2009). Stilbenes are important components of constitutive and inducible plant defenses, often with broad-spectrum antifungal activity (Celimene et al., 1999; 2001; Jeandet et al., 2010), and some form lignin monomers that contribute to plant cell wall integrity (Del Río et al., 2017). The ability to degrade or neutralize stilbenes in some fungal pathogens can be directly associated with their ability to colonize host tissue (Hammerbacher et al., 2013; Wadke et al., 2016).

One stilbene catabolism mechanism entails the enzymatic cleavage of the 1,2-diphenylethylene stilbene backbone into two phenolic aldehydes by stilbene cleavage oxygenases (SCO; EC: 1.13.11.43) derived from the carotenoid cleavage dioxygenase family (Harrison and Bugg 2014). Five fungal SCOs from Ustilago maydis (UmRco1; XP_761231), Aspergillus fumigatus (AfRco1; XP_746307), Chaetomium globosum (CgRco1; XP_001219451), Botryotinia fuckeliana (BfRco1; XP_001548426) and Neurospora crassa (Cao-1; XP_961764) have been functionally characterized to date and were reported to share identical specificities towards multiple substrates (Brefort et al., 2011; Díaz-Sánchez et al., 2013). However, how the SCO-containing catabolic pathways adapt to diverse stilbene structures remains an open question.

Here, we investigated the evolution of fungal SCO function and the composition of associated MGCs (sco clusters). We first identified three sco clusters with distinct gene combinations. Given that alternative types of MGCs typically encode different metabolic pathways, we predicted that SCOs encoded in different sco cluster types would either have 1) specificity to alternative substrates, suggesting enzyme functional divergence plays a role in the formation of new cluster types, or 2) conserved specificity, suggesting cluster/pathway remodeling can operate independently of sequence-based evolution. We analyzed the substrate specificities of four heterologously expressed SCOs from different cluster types and unclustered genomic regions, and found evidence for broad conservation of substrate specificity in both unclustered and clustered SCOs, with only modest variation in the substrate specificity of SCOs found in alternate cluster types. Conversely, we found evidence for multiple independent origins of and repeated transitions between different types of sco clusters, consistent with selection acting upon recurrent combinations of enzyme-encoding genes. These results suggest that the reorganization of sco clusters may underpin adaptive shifts in stilbene degradation pathways, while changes in enzymatic function may play a secondary role.

Materials and Methods

sco cluster retrieval and annotation

sco clusters were identified in publicly available genomes of 550 fungi (Table C.1) as previously described (Gluck-Thaler and Slot 2018). Briefly, all hits to a previously functionally characterized SCO amino acid query (accession: EAA32528.1) were retrieved from predicted proteomes using BLASTp from the BLAST suite v.2.2.25+ (Altschul et al., 1990) with the following cutoffs: ≥30% identity, bitscore ≥50, e-value ≤1e-5, and target sequence length 50-150% of the query length. Additional unannotated sequences were retrieved by tBLASTn searches of genome assemblies. All genes located within 20 genes upstream and downstream of all sco hits (“neighborhood”) were assigned to homolog groups using OrthoMCL v2 (Fischer et al., 2011) with an inflation value of 1.5, and homolog groups with limited phylogenetic distributions were discarded. Clusters were then detected by comparing the phylogenetic distributions of homolog group combinations within neighborhoods to distributions expected under null models of microsynteny evolution. A matrix detailing the presence or absence of homolog groups in each cluster was used to calculate Raup-Crick dissimilarity indices for all pairwise comparisons of clusters using the vegan v.2.4-4 package in R (Oksanen et al., 2016). The matrix of dissimilarity indices was then subjected to complete linkage clustering, and any clusters separated by less than the empirically determined distance cutoff of 0.05 units were assigned to the same cluster model. As implemented, Raup-Crick dissimilarity values correspond to the probability that the content of two given clusters is non-identical.
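The grouping step described above can be illustrated with a minimal Python sketch. The original analysis used the vegan package in R; Python is used here only for illustration. The sketch assumes a precomputed table of pairwise dissimilarities (it does not compute Raup-Crick indices) and applies complete linkage, merging groups only while the largest pairwise dissimilarity between their members stays below the 0.05 cutoff. The function name, input layout and example values are hypothetical.

```python
from itertools import combinations


def complete_linkage_groups(names, dissim, cutoff=0.05):
    """Group items by agglomerative complete linkage, stopping at `cutoff`.

    names:  list of cluster identifiers (hypothetical labels)
    dissim: dict mapping frozenset({a, b}) -> pairwise dissimilarity
    """
    groups = [[name] for name in names]

    def linkage(g1, g2):
        # Complete linkage: the distance between two groups is the
        # largest pairwise dissimilarity between their members.
        return max(dissim[frozenset((a, b))] for a in g1 for b in g2)

    while len(groups) > 1:
        # Find the closest pair of groups under complete linkage.
        (i, j), dist = min(
            (((i, j), linkage(groups[i], groups[j]))
             for i, j in combinations(range(len(groups)), 2)),
            key=lambda pair: pair[1],
        )
        if dist >= cutoff:
            break  # no remaining pair is closer than the cutoff
        groups[i] = groups[i] + groups[j]
        del groups[j]  # j > i, so index i is unaffected
    return [sorted(group) for group in groups]


# Hypothetical dissimilarities among four detected clusters.
example = {
    frozenset(("c1", "c2")): 0.01,
    frozenset(("c1", "c3")): 0.02,
    frozenset(("c2", "c3")): 0.04,
    frozenset(("c1", "c4")): 0.90,
    frozenset(("c2", "c4")): 0.80,
    frozenset(("c3", "c4")): 0.95,
}
models = complete_linkage_groups(["c1", "c2", "c3", "c4"], example)
```

Complete linkage is stricter than single linkage: two clusters that each resemble a third are still kept apart if they are dissimilar to each other, which avoids chaining loosely related clusters into one model.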

Summary descriptions of each cluster model consisted of those homolog groups found in ≥75% of clusters assigned to that model. A representative sequence from each summary homolog group was used to query genome assemblies whose retrieved clusters lacked that homolog group using tBLASTn. Hits occurring within 6 genes upstream or downstream of an sco sequence, or of a sequence that is part of an sco cluster, and whose lengths were between 50-150% of the query were manually assembled and considered clustered with that sco. All detected gene clusters were re-classified as belonging to a given model if they were missing at most 1 homolog group from that model’s summary. Clusters not meeting these criteria but with at least 1 homolog group from the model’s summary were designated as ‘partial’; clusters meeting the criteria for multiple models were designated as ‘hybrid’; clusters meeting the criteria for full membership in one model and partial membership in another were designated as ‘partial-hybrid’. sco homologs not part of any cluster were designated ‘singletons’.
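The summary and classification rules above can be written out as a short Python sketch. It is a literal reading of the stated thresholds, not the authors' implementation. It assumes that each cluster is represented as a set of homolog group identifiers that excludes the sco anchor gene itself, and it introduces one hypothetical label, "unassigned", for a cluster that shares no homolog group with any model summary. All function and variable names are illustrative.

```python
def summarize_model(member_clusters, min_fraction=0.75):
    """Return homolog groups found in >= min_fraction of a model's clusters."""
    counts = {}
    for content in member_clusters:
        for group in set(content):
            counts[group] = counts.get(group, 0) + 1
    threshold = min_fraction * len(member_clusters)
    return {group for group, n in counts.items() if n >= threshold}


def classify_cluster(content, summaries, max_missing=1):
    """Label one cluster against every model summary.

    content:   set of homolog groups found near an sco gene
               (assumed to exclude the sco anchor itself)
    summaries: dict of model name -> set of summary homolog groups
    """
    content = set(content)
    if not content:
        return "singleton"  # sco homolog with no clustered neighbors
    full = partial = 0
    for summary in summaries.values():
        if len(summary - content) <= max_missing:
            full += 1       # missing at most one summary homolog group
        elif summary & content:
            partial += 1    # shares at least one summary homolog group
    if full > 1:
        return "hybrid"
    if full == 1:
        return "partial-hybrid" if partial else "full"
    # "unassigned" is a hypothetical label, not a category from the text
    return "partial" if partial else "unassigned"


# Hypothetical model summaries built from made-up homolog group names.
summaries = {"model1": {"A", "B", "C", "D"}, "model2": {"E", "F", "G"}}
label = classify_cluster({"A", "B", "C", "D", "E"}, summaries)
```

In this example the cluster carries every model1 group and one of three model2 groups, so it is labeled as a partial-hybrid.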

Annotations of all clustered proteins were computed using eggnog-mapper (Huerta-Cepas et al., 2017) based on fungal-specific fuNOG orthology data (Huerta-Cepas et al., 2016b), and consensus annotations for homolog groups were derived by selecting the most frequent annotation among all members of the group. Conservation of synteny between select clustered regions was assessed with pairwise tBLASTx searches and visualized with EasyFig v2.1, with minimum hit length of 100bp and maximum e-value of 1E-4 (Sullivan et al., 2011).
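The consensus-annotation rule described above amounts to a majority vote within each homolog group. The following Python sketch illustrates it under two assumptions that the text does not specify: missing annotations are ignored, and ties are broken alphabetically. The function name and example annotations are hypothetical.

```python
from collections import Counter


def consensus_annotation(annotations):
    """Return the most frequent annotation among members of a homolog group.

    Empty or missing annotations are skipped (assumption); ties are
    broken alphabetically (assumption) so the result is deterministic.
    """
    counts = Counter(a for a in annotations if a)
    if not counts:
        return None
    top = max(counts.values())
    return min(a for a, n in counts.items() if n == top)


# Hypothetical per-protein annotations for one homolog group.
group_annotation = consensus_annotation(
    ["cytochrome P450", "cytochrome P450", "MFS transporter", None]
)
```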

Phylogenetic analysis and visualization

Preliminary phylogenetic analyses of key clustered genes (sco, gdo, vao and dhbd) and conserved housekeeping genes (rpb1, rpb2, pol1, hsp60 and ef1a) were conducted as follows: all hits to various amino acid queries were retrieved using BLASTp with an e-value cutoff of 1e-5 (Altschul et al., 1990). Genomes without a hit to a given query were manually searched with tBLASTn, and the resulting hits were compiled with the rest. All sequences were aligned with mafft v7.221 using the --auto strategy (Katoh and Standley 2013) and the resulting sequence alignment was trimmed with TrimAl v1.4 using the -automated1 strategy (Capella-Gutierrez et al., 2009). A preliminary phylogenetic analysis was then performed with FASTTREE v.2.1.10 (Price et al., 2010), where trees were mid-point rooted prior to subsequent pruning.

For select clustered genes, a representative sequence was chosen from each major clade in the preliminary midpoint-rooted FASTTREE and used to retrieve the top 10 bacterial hits from the NCBI nr database (last accessed 03/01/2018). These bacterial sequences were combined with the original set of fungal sequences and submitted for preliminary phylogenetic analysis as above. The resulting trees were rooted based on the placement of bacterial sequences. All sequences descending from the last common ancestor of sequences in sco clusters were then extracted, and submitted to preliminary phylogenetic analysis as above. The best fitting model of protein evolution for phylogenetic reconstruction was determined according to the AICc using prottest v3.4 (Darriba et al., 2011). Trimmed alignments were submitted to RAxML v8.2.9 for maximum likelihood (ML) phylogenetic analysis where 100 rapid bootstraps were used to map support values on the best-scoring ML tree (Stamatakis 2014). For housekeeping genes, the clade of single copy sequences containing the query on the preliminary FASTTREE was extracted and submitted to ML analysis, as above. The majority rule consensus tree of the five housekeeping gene trees was computed in RAxML, and its topology was used to constrain another ML analysis of the trimmed rpb2 alignment in RAxML with 100 rapid bootstraps.

To quantify the phylogenetic and taxonomic diversity of each key clustered gene phylogeny, we calculated, for each internal node, the percentage of total branch length on the rpb2 species tree covered by the genomes descending from that node, as well as the number of taxonomic classes represented among those genomes. Differences in the distributions of either the number of taxonomic classes, or the percentage of total distance on the rpb2 tree, covered by descendants from each node on each of the 5 gene trees were determined by first conducting a Kruskal-Wallis test, followed by a post-hoc Dunn test for multiple comparisons. P-values were adjusted using the Benjamini-Hochberg procedure, and the null hypothesis of no difference between the mean ranks of two particular distributions was rejected at alpha = 0.05. All statistical analyses were implemented in the stats package of R (R Core Team 2017).
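For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following is a minimal pure-Python sketch of the procedure. It is not the R code used in this study, and the raw p-values are hypothetical numbers chosen only to illustrate the calculation.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values from pairwise post-hoc comparisons.
    raw = [0.001, 0.02, 0.03, 0.20]
    adjusted = benjamini_hochberg(raw)
    for p, q in zip(raw, adjusted):
        print(f"raw={p:.3f} adjusted={q:.3f} reject={q <= 0.05}")
```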

The trimmed sco alignments were additionally used for Bayesian inference with MrBayes v3.2.6 (Ronquist et al., 2012). Two independent Metropolis-coupled Markov Chain Monte Carlo analyses (options: 4 chains, temp=0.03, starttree=random, prset aamodelpr=mixed, lset rates=invgamma Ngammacat=4) were run until the average standard deviation of split frequencies fell below 0.05, which occurred after 10 million generations. Trees were sampled every 100 generations, and the first 35% of trees were discarded as burn-in. The 50% majority rule consensus tree was obtained using the sumt function.

All phylogenetic trees were visualized using ETE v3 (Huerta-Cepas et al., 2016a). Tanglegrams were drawn using the cophyloplot function of the ape package in R (Paradis et al., 2004).

Evolutionary transition analyses

All ancestral state reconstruction analyses were conducted by submitting the 50% credible set of sco trees from the MrBayes analysis (minus burn-in; see above) and a trait matrix indicating each sco sequence’s clustered state (either model 1, model 2, model 3, or unclustered; hybrid clusters were assigned to multiple states) to a BayesTraits v3 multistate MCMC analysis (options: reverse jump with a gamma prior for the rate coefficients whose mean and variance were both seeded from uniform hyper-priors ranging from 0-20; 1.2 billion total iterations; 20% discarded as burn-in; sample every 192,000 iterations) (Pagel and Meade 2006). Because our model consists of 12 parameters corresponding to transition rates between each possible pair of clustered states, we used reverse jump MCMC to reduce the number of rate parameters to be estimated and thereby the complexity of our model. Reverse jump MCMC integrates all results over model space through an MCMC search, weighting models with different numbers of estimated parameters by their probability, resulting in a posterior distribution of simplified models that best explain the data. Values for the prior and hyper-prior were chosen by examining trace plots of the posterior distributions of various preliminary analyses run with multiple alternative values, and selecting the values that resulted in distributions that did not appear to be prematurely truncated. Variation in the quality of genome assemblies was accounted for by treating unclustered sco singletons located within 6 gene models of a contig or scaffold end as belonging to all clustered states with equal probability. Transition rates were estimated over the entire tree. Transitions between clustered states and gains in clustered state were inferred when the median probability of the ancestral clustered state at a given node increased by ≥75% compared to the median probability of that state at the most recent parental node with ≥0.95 posterior probability support. Transition and gain events that may have occurred over multiple nodes but are supported at a particular node by extant cluster distributions are referred to as having “ambiguous” placement. Due to ambiguity in the ancestral clustered state at or near the root of the tree, we refrained from inferring gains or transitions in clustered states at up to 3 nodes away from the root.
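The bookkeeping behind the analysis above can be made concrete with a short sketch. It counts the transition rate parameters among the four clustered states, works out how many posterior samples the stated chain settings retain, and applies the ≥75% decision rule to two sets of posterior probabilities. Two points are assumptions made for illustration: the 75% increase is treated as an absolute difference in median probability (0.75), and the probability samples are invented numbers. Selecting the most recent well-supported parental node is left to the caller.

```python
from statistics import median

STATES = ["model 1", "model 2", "model 3", "unclustered"]

# One rate per ordered pair of distinct states: 4 * 3 = 12.
n_rate_parameters = len(STATES) * (len(STATES) - 1)

# Chain settings reported in the text; integer arithmetic avoids rounding error.
total_iterations = 1_200_000_000
burnin_percent = 20
sample_every = 192_000
post_burnin_iterations = total_iterations * (100 - burnin_percent) // 100
retained_samples = post_burnin_iterations // sample_every


def state_gained(node_samples, parent_samples, threshold=0.75):
    """True if the median probability of a state rises by at least `threshold`
    (absolute difference; an assumption) from the parental node to the focal node."""
    return median(node_samples) - median(parent_samples) >= threshold


if __name__ == "__main__":
    print(n_rate_parameters, retained_samples)
    # Hypothetical posterior probabilities of one state at a node and its parent.
    node = [0.90, 0.95, 0.92]
    parent = [0.05, 0.10, 0.08]
    print(state_gained(node, parent))
```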

Testing hypotheses of cluster gains and transitions

In order to obtain an appropriate ML sco phylogeny for hypothesis testing, we submitted the trimmed alignment of sco sequences for ML analysis, as above, with the added constraint that the topology exactly match that of sco’s Bayesian consensus tree (Table C.2). It was necessary that the topology of the ML tree to be used for testing alternate hypotheses match that of the Bayesian consensus because the latter was used to reconstruct ancestral states. In order to then assess support for the independence of cluster transitions and origins within a ML framework, we retrieved the per-site log likelihoods of the Bayesian topology-constrained and hypothesis-constrained ML sco phylogenies with RAxML (option: -f G) and submitted them to CONSEL v0.2 for analysis with the approximately unbiased (AU) test (Shimodaira and Hasegawa 2001; Shimodaira 2002). The AU test calculates the probability that a given topology has the largest observed likelihood among the set of supplied phylogenies. The null hypothesis that a given phylogeny had the largest observed likelihood was weakly rejected at α ≤ 0.10 and strongly rejected at α ≤ 0.01. To obtain constrained ML trees, the trimmed alignment of sco sequences was submitted to RAxML (options: -f d, -g) along with a constraint tree specifying the monophyly of particular sets of sequences (see Table C.3 for all constraint criteria). These trees were non-comprehensive (i.e., they only contained sequences explicitly mentioned in the hypothesis), and besides the bifurcating node specifying the monophyletic constraint, consisted only of multifurcating nodes. The placement of taxa not present in the constraint tree and multifurcations were resolved using a ML search. Each constrained ML phylogeny was manually verified to adhere to the specified constraint.
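The two rejection thresholds used for the AU test can be written as a small helper. This is only a sketch of the decision rule stated above; the function name and the example p-values are hypothetical, and the p-values themselves would come from CONSEL.

```python
def au_verdict(p_value, weak=0.10, strong=0.01):
    """Classify an AU test p-value using the thresholds stated in the text."""
    if p_value <= strong:
        return "strongly rejected"
    if p_value <= weak:
        return "weakly rejected"
    return "not rejected"


if __name__ == "__main__":
    # Hypothetical AU p-values for three constrained topologies.
    for p in (0.004, 0.06, 0.35):
        print(p, au_verdict(p))
```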

Molecular cloning and protein expression

An oligo-dT-primed cDNA library constructed using the SuperScript® III First-Strand Synthesis System for RT-PCR (#18080-051, Invitrogen Life Technologies, Waltham, MA, USA) was used as the template for obtaining the coding sequences of four fungal SCOs (Dssco from Diplodia sapinea (MH350427); Mosco from Magnaporthe oryzae (MH350428); Pasco from Podospora anserina (MH350429); Prsco from Penicillium roqueforti (MH350430)) by PCR amplification using Platinum Taq DNA polymerase (#11304-011, Invitrogen, Waltham, MA, USA). Specially designed oligonucleotide primer pairs for enzyme-free cloning into the pETite C-His Kan Vector (#49002-1, Expresso T7 Cloning Kit, Lucigen Corp., Middleton, WI, USA) for C-terminal His tag protein expression and cell lysate preparations were used. The oligonucleotide primer pairs used in the present study are listed in Table C.4. The resulting recombinant plasmids were transformed into the HI-Control 10G host strain and, following sequence verification, freshly transformed into HI-Control BL21 (DE3) chemically competent cells for expression and cell lysate preparation, respectively, following the manufacturer’s instructions. Sequence-verified transformant(s) were grown in 5 mL of Luria-Bertani (LB) liquid medium containing 50 µg/mL kanamycin for 16 h at 37 °C with 225 rpm shaking. Cultures were transferred to 100 mL of fresh LB liquid medium in a 500 mL flask and incubated at 37 °C until cultures reached an optical density (OD) at 600 nm of 0.4–0.6. Protein expression was induced with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG, #I3301, Teknova, Hollister, CA, USA) for 4 h at 22–25 °C. Later, cells were pelleted, lysed using 1 mg/mL lysozyme (#3L2510, Teknova, Hollister, CA, USA) and incubated on ice for 30 min. Soluble proteins were collected after centrifugation of cell lysates at 10,000 x g for 30 min at 4 °C.

In vitro enzymatic assays

All reagents and analytical grade substrates (resveratrol, pterostilbene, piceatannol, piceid, isorhapontigenin, isoeugenol, 3,5-dihydroxybenzaldehyde, 3,5-dimethoxybenzoic acid, 3,4-dihydroxybenzaldehyde and 4-hydroxybenzaldehyde) used for enzymatic assays were purchased from Sigma-Aldrich (St. Louis, MO, USA), while pinosylvin (CAS # 22139-77-1) was purchased from Sequoia Research Products (Pangbourne, UK). Briefly, assays with respective cell lysates (≤ 20 µg) were carried out in 100 µL incubation buffer containing 1 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP, #C4706), 0.3 mM FeSO4 (#215422), 100 mM HEPES (#H3375) and 1 mg/mL catalase (#C1345). Control incubations were performed by preparing cell lysates of empty HI-Control BL21 (DE3) chemically competent cells. The cell lysate (c. 15-20 µL) was mixed with the incubation buffer and allowed to stand for 5 min to equilibrate. Substrates were dissolved in DMSO (#D128, Fisher Scientific, Columbus, OH, USA) and added to a final concentration of 100 µM. Reactions were incubated at 27 °C for 30 min in the dark and extracted with 2 x 100 µL of ethyl acetate (#E195, Fisher Scientific, Columbus, OH, USA) containing 1.0 mg/mL butylated hydroxyanisole as an internal standard (BHA, #B1253). Finally, the supernatants were combined and dried in an Eppendorf vacufuge concentrator (#07-748-13, Fisher Scientific, Waltham, MA, USA) for 25 min. All samples were re-suspended in 100 µL of pure HPLC grade methanol (#A456, Fisher Scientific, Columbus, OH, USA), vortexed, and placed in a water bath sonicator for 10 minutes before being analyzed by UPLC.

UPLC separation and analyses

Products of the in vitro enzymatic reactions were analyzed by UPLC, and relatively quantified by using BHA (Figure C.1). For UPLC measurements, the column used was a Cortecs UPLC C18 (2.1 x 100 mm, 1.6 µm, #186007095, Waters Corp., Milford, MA, USA) with a column temperature of 40 °C and the sample tray held at 10 °C. The mobile phase consisted of 0.1% acetic acid (A, aqueous) and acetonitrile (B) with a flow rate of 0.3 mL/min throughout. The gradient used was as follows: 0–0.25 min of 10% B, 0.25–2 min of 10–27% B, 2–4 min of 27–50% B, 4–8 min of 50–90% B, 8.01–10 min of 90% B, and 10.01–12 min of 10% B again. Area (µV*sec) for each peak was recorded and cleavage product concentration was calculated using the external standard equation Cx = Ax / CFave, where Cx is the concentration of the analyte, Ax is the area of the analyte and CFave is the average calibration factor. CFave was calculated using the formula CF = Area of the analyte / Conc. of the internal standard. Percent substrate consumption and relative enzymatic activity were calculated based on the area (µV*sec) of the substrate peaks in the chromatogram before (100 µM) and after enzymatic reactions.
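The quantification arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not the processing script used in this study: each calibration factor is taken as a peak area divided by the known concentration of the corresponding standard, CFave is their mean, and all peak areas are invented numbers.

```python
def calibration_factor(peak_area, known_conc):
    """CF = peak area / known concentration (interpretation assumed here)."""
    return peak_area / known_conc


def concentration(analyte_area, cf_ave):
    """External standard equation: Cx = Ax / CFave."""
    return analyte_area / cf_ave


def percent_consumed(area_before, area_after):
    """Percent substrate consumption from substrate peak areas before and after reaction."""
    return 100.0 * (area_before - area_after) / area_before


if __name__ == "__main__":
    # Hypothetical standard injections: (peak area in µV*sec, concentration in µM).
    standards = [(50_000, 25), (100_000, 50), (200_000, 100)]
    factors = [calibration_factor(area, conc) for area, conc in standards]
    cf_ave = sum(factors) / len(factors)
    print(concentration(80_000, cf_ave))       # analyte concentration in µM
    print(percent_consumed(200_000, 40_000))   # percent of substrate consumed
```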

Motif analysis of protein sequences

Previously identified substrate binding motifs (McAndrew et al., 2016) were retrieved by aligning two previously characterized bacterial SCOs with all 9 characterized fungal SCOs (5 described in previous studies, 4 described here), and analyzing regions of interest using command-line MEME v.4.12.0 (Bailey and Elkan 1994). Identified motifs were retrieved from the complete set of scos using FIMO in the MEME suite.

Transposable element search

Nucleotide sequences of all sco singleton and cluster regions (extending 10,000 bp upstream and downstream from either the unclustered gene, or from the genes at the ends of the cluster) were extracted from genome assemblies. RepeatModeler from the RepeatMasker package (Smit et al. 2015) was used to de novo identify repetitive sequences in these regions, which were screened against the protein family (PFAM) database (last accessed 09/28/2018) using NCBI's conserved domain search tool (Marchler-Bauer and Bryant 2004). All sequences that did not align to a known transposon-associated PFAM were removed. The filtered repeat sequences were then combined with RepBase's fungal-specific repeat library (downloaded 09/28/2018 from www.girinst.org/repbase/) and used as a custom library for searches of the original extracted regions using RepeatMasker. The resulting hits were manually inspected, and any hits to non-transposon sequences were removed.
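Extracting a locus with 10,000 bp of flanking sequence requires clamping the window to the ends of the contig or scaffold. The helper below is a minimal sketch of that coordinate arithmetic; the function name, the 1-based inclusive coordinate convention and the example coordinates are assumptions made for illustration.

```python
def flanked_region(start, end, contig_length, flank=10_000):
    """Return (start, end) of a locus extended by `flank` bp on each side,
    clamped to the contig. Coordinates are 1-based and inclusive (assumed)."""
    if not 1 <= start <= end <= contig_length:
        raise ValueError("locus must lie within the contig")
    return max(1, start - flank), min(contig_length, end + flank)


if __name__ == "__main__":
    # Hypothetical cluster in the middle of a large scaffold.
    print(flanked_region(50_000, 62_000, 1_000_000))
    # Hypothetical singleton near both ends of a short contig.
    print(flanked_region(4_000, 9_000, 15_000))
```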

Results

sco homologs are alternately located in three distinct gene clusters

Using a previously described cluster detection method (Gluck-Thaler and Slot 2018), we identified unexpectedly retained synteny in the gene neighborhoods surrounding 466 monophyletic sco homologs retrieved from 550 fungi, and found that sco variably forms clusters with three distinct groups of genes, which we refer to as gene cluster models (Methods; Figure 4.1, Figures C.2-5, Table C.5). Model 1 contains four genes: sco, a 2,3-dihydroxybenzoate decarboxylase (dhbd; EC# 4.1.1.46), an aldehyde dehydrogenase (EC# 1.2.1.10), and a gene of unknown function that contains a SnoaL-like domain (PF13577) and is part of the NTF2-like superfamily (SSF54427). Model 2, which was previously identified by Greene et al. (2014), contains five genes: sco, a gentisate dioxygenase (gdo; EC# 1.13.11.4), a salicylate hydroxylase (EC# 1.14.13.1), a fumarylpyruvate hydrolase (EC# 3.7.2.5), and a Zn(2)-Cys(6) transcription factor (ctfG). Model 3 consists of two genes: sco and a vanillyl alcohol oxidase (vao; EC# 1.1.3.38). A total of 80 model 1 clusters, 35 model 2 clusters, 55 model 3 clusters and 1 hybrid model 2+3 cluster were found distributed among 120 Pezizomycotina genomes. Of these, 4 model 1 clusters, 1 model 2 cluster, and 29 model 3 clusters contained at least one gene from other cluster models (but not enough to be considered true hybrids). Thirty-one genomes harbored more than one type of cluster model. No genome was found to contain more than one copy of either model 1 or model 2 clusters; however, 8 genomes contain multiple model 3 clusters. A total of 52, 28, and 0 sco homologs were part of partial model 1, model 2 and model 3 clusters, respectively (defined as having <75% of the genes in the cluster model), while 203 sco homologs were not identified as clustered according to our criteria (see Methods). We found two additional model 2 clusters in the carT clade of fungal carotenoid cleavage dioxygenases when we expanded our search for cluster models outside of the monophyletic sco clade (Figure C.2).

[Figure 4.1 graphic. Panel b labels: Model 1, 80 clusters (sco, aldehyde dehydrogenase, unknown, dhbd); Model 2, 37 clusters (sco, salicylate hydroxylase, fumarylpyruvate hydrolase, gdo, transcription factor); Model 3, 55 clusters (sco, vao). Panel c trees: rpb2, sco, dhbd, gdo, vao.]

Figure 4.1: The distributions of three types of stilbene cleavage oxygenase (sco) gene clusters in fungi. a) A rooted, constrained maximum likelihood phylogeny based on the amino acid sequences of the second largest subunit of RNA polymerase II (rpb2; Methods) representing relationships among all 288 Pezizomycotina genomes examined in this study and 9 outgroup fungal genomes. Four major Pezizomycotina classes are indicated to the left. This phylogeny contains all 212 genomes that have at least 1 sco homolog (of which 203 are from the Pezizomycotina) from a database of 550 genomes. Matrix indicates presence/absence of unclustered sco genes and three cluster models in individual genomes, color-coded by gene/cluster model type. b) Representative example of each sco cluster model with genes color-coded according to model type (except sco, which is colored red): model 1: Ophioceras dolichostomum (Sordariomycetes), Ophdo1_1340.90_c1; model 2: Magnaporthe oryzae (Sordariomycetes), Maggr1_117659_c2; model 3: Podospora anserina (Sordariomycetes), Podan2_2307_c1 (Table C.5). Predicted functions are indicated, and key genes in each model are labeled (dhbd=2,3-dihydroxybenzoate decarboxylase; gdo=gentisate 1,2 dioxygenase; vao=vanillyl alcohol oxidase). c) Rooted maximum likelihood phylogenies of rpb2, sco, dhbd, gdo, and vao amino acid sequences with associated taxa color-coded by Pezizomycotina class. All color codes are consistent within and across figures.

Each cluster model is defined by a single key enzyme-encoding gene with homologs known to function in monophenolic catabolism. dhbd (model 1) is in the KEGG orthology group K14333, which also contains a characterized enzyme from Aspergillus oryzae (accession: P80346.1) that catalyzes the decarboxylation of 2,3-dihydroxybenzoate to catechol. Fungal dhbd sequences additionally share a conserved amidohydrolase domain (PF04909) with Sphingobium sp. SYK-6 ligW2 (accession: WP_014075111.1), an enzyme that participates in the degradation of lignin-derived biphenyl compounds through the non-oxidative decarboxylation of 5-carboxyvanillate (Peng et al., 2005). gdo (model 2) is in the KEGG orthology group K00450, contains a conserved cupin fold (PF07883.10), and is related to bacterial gdos that participate in gentisate degradation by catalyzing the ring cleavage of gentisate to 3-maleylpyruvate (Werwath et al., 1998). vao (model 3) contains FAD-binding (PF01565.22) and FAD-linked oxidase (PF02913.18) domains, and includes the only functionally characterized fungal vao from Penicillium simplicissimum, vaoX (accession: P56216). vaoX catalyzes the conversion of vanillyl alcohol to vanillin, but is also active on 4-hydroxybenzyl alcohols, 4-hydroxybenzylamines, 4-(methoxy-methyl)phenols, and 4-allylphenols (Fraaije et al., 1995).

SCOs encoded in different clusters have conserved functions and substrate binding motifs

Fungal SCOs share a number of secondary structural and functional residue features with the characterized bacterial NOV1 (McAndrew et al., 2016). Fungal SCOs contain the conserved protein domain RPE65 (pfam03055), which defines members of the carotenoid cleavage dioxygenase family in plants and bacteria, and SCOs in bacteria and fungi. We identified previously characterized residues associated with tetradentate iron coordination in the SCO active site and with the essentiality of 4’-OH groups for stilbene cleavage by bacterial SCO (Figure 4.2, Table C.6) (Harrison and Bugg 2014; McAndrew et al., 2016). Four histidine residues and three glutamate residues in SCO contribute to the 4-His + 3-Glu (second shell glutamate) dual-sphere metal binding catalytic center involved in canonical non-heme iron (II) dependent CCD-type enzymes (Harrison and Bugg 2014). These residues are conserved in all nine functionally characterized SCO sequences, and amino acid motifs (a-h by position in primary sequence; Figure 4.2c) indicate additional imperfect conservation in surrounding residues. Overall, 58.6% of all retrieved fungal SCOs were conserved at all of these residues, corresponding to 64.0% of unclustered SCOs, 97.5% of SCOs in model 1 clusters, 82.9% of SCOs in model 2 clusters and 43.6% of SCOs in model 3 clusters (Table C.7). The motifs in the conserved 4-His + 3-Glu dual-sphere metal binding catalytic center are (F/M)(T/C)AHPKX (motif_c), X(M/F)(I/M)HD(C/F)(A/G) (motif_d), (F/M)XXH(T/V)X (motif_e) and X(G/Q)XHGX(W/F) (motif_h) for the four His residues, and AXKEXXX (motif_b), XXXEFX(R/Q) (motif_f) and XX(Q/G)E(P/C)XX (motif_g) for the three Glu residues (Table C.6).

Figure 4.2, panels a and b (structural substitutions and % substrate depletion in 30 minutes):

Substrate         R5    R3    R′3/5  R′4  DsSCO  MoSCO  PrSCO  PaSCO
Pterostilbene     OCH3  OCH3  -      OH   35     100    100    100
Piceatannol       OH    OH    OH     OH   0      98     100    98
Resveratrol       OH    OH    -      OH   10     80     80     95
Piceid            OH    Oglc  -      OH   0      49     18     86
Isoeugenol        N/A   N/A   OCH3   OH   0      0      8      30
Isorhapontigenin  OH    OH    OCH3   OH   0      0      0      0
Pinosylvin        OH    OH    -      -    0      0      0      0

Enzyme sources and cluster contexts: DsSCO, D. sapinea (singleton); MoSCO, M. oryzae (model 2); PrSCO, P. roqueforti (model 3); PaSCO, P. anserina (model 3).

Figure 4.2, panel c (conserved binding motifs from characterized SCOs; positions relative to NOV1): motif_a, 98-106; motif_b, 131-138; motif_c, 161-172; motif_d, 215-222; motif_e, 280-287; motif_f, 350-362; motif_g, 415-421; motif_h, 473-483.

Figure 4.2: Biochemical characterization of four fungal stilbene cleavage oxygenase (SCO) enzymes encoded in alternate cluster types. a) Chemical structures of all seven molecules used for enzyme characterization analyses. b) Percent depletion of the seven examined molecules after a 30-minute incubation with each of four heterologously expressed fungal SCO enzymes encoded in different cluster models: DsSCO from Diplodia sapinea (accession: MH350427; not part of any identified cluster models); MoSCO from Magnaporthe oryzae Guy11 (accession: MH350428; model 2); PrSCO from Penicillium roqueforti FM164 (accession: MH350430; model 3); PaSCO from Podospora anserina S mat+ (accession: MH350429; model 3). c) Eight conserved substrate-binding motifs from two characterized bacterial SCO sequences and all nine characterized fungal SCO sequences (four from this study, five from previous studies; Tables C.6 and C.9). Residues of interest (indicated positions relative to NOV1 from Novosphingobium aromaticivorans, accession YP_496081.1) discussed in the main text are indicated with an asterisk (*), and include the YRN and KE motifs involved in the recognition and deprotonation of 4’-OH groups on stilbene molecules, the EF motif involved in hydrogen bonding with the 3/5 hydroxyl group of resveratrol molecules, and the 4-Histidine (H) + 3-Glutamic acid (E) dual-sphere metal binding catalytic center found in non-heme iron (II) dependent carotenoid cleavage dioxygenases.

Similarly, Tyr and Lys residues essential for recognition and deprotonation of the 4’-OH group of stilbenes in NOV1 are fully conserved in the nine functionally characterized proteins (Figure 4.2c). The Tyr residue is part of a YRN motif (motif_a) fully conserved in all characterized proteins, and in approximately 84% of all retrieved fungal SCOs, corresponding to 74.8% of unclustered SCOs, 97.5% of SCOs in model 1 clusters, 88.6% of SCOs in model 2 clusters, and 94.6% of SCOs in model 3 clusters. The Lys residue is part of a KE motif (motif_b) fully conserved in characterized proteins, and in approximately 80.3% of all retrieved fungal SCOs, including 66.2% of unclustered SCOs, 100% of SCOs in model 1 clusters, 88.6% of SCOs in model 2 clusters, and 92.7% of SCOs in model 3 clusters. In addition, a Glu residue previously shown in NOV1 to hydrogen bond with the 3/5 hydroxyl groups of resveratrol (McAndrew et al. 2016) is fully conserved in an EF motif (Figure 4.2c; motif_f) in the characterized proteins, and present in approximately 82.2% of all fungal homologs, consisting of 76.6% of unclustered SCOs, 100% of SCOs in model 1 clusters, 100% of SCOs in model 2 clusters, and 90.9% of SCOs in model 3 clusters (Table C.7). Finally, the Ser residue (S283) in NOV1, speculated to be involved in dioxygenase activity on isoeugenol, a non-stilbene possessing a 4’-OH group, was not found in fungal SCOs that cleave isoeugenol (Table C.6), and the current dataset is too limited to identify additional residues associated with substrate diversity.
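The motif notation used in this section, for example (F/M)(T/C)AHPKX, lists alternative residues in parentheses and uses X for any residue. The sketch below shows one way to turn that notation into a regular expression for scanning protein sequences. It only illustrates the notation and is not a substitute for the MEME/FIMO searches used in this study; the example sequences are invented.

```python
import re


def motif_to_regex(motif):
    """Convert notation such as '(F/M)(T/C)AHPKX' into a regex string.

    '(A/B)' becomes the character class '[AB]' and 'X' becomes '.', any residue.
    """
    pattern = re.sub(
        r"\(([A-Z](?:/[A-Z])*)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )
    return pattern.replace("X", ".")


def find_motif(motif, sequence):
    """Return the first matching substring of `sequence`, or None."""
    match = re.search(motif_to_regex(motif), sequence)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(motif_to_regex("(F/M)(T/C)AHPKX"))
    # Invented peptide fragments used only to exercise the patterns.
    print(find_motif("(F/M)(T/C)AHPKX", "GGMTAHPKLAA"))
    print(find_motif("AXKEXXX", "QQAVKEGHTQQ"))
```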

Four SCOs encoded in multiple genomic contexts, in four different fungal species, which are identical at all conserved binding motifs (Table C.6), were profiled for substrate preferences (Figure 4.2, Figure C.5, Table C.8): (1) DsSCO (unclustered) from the pine necrotroph Diplodia sapinea; (2) MoSCO (model 2 cluster) from the grass hemibiotroph Magnaporthe oryzae; (3) PrSCO (model 3 cluster) from the cheese-associated saprotroph Penicillium roqueforti; (4) PaSCO (model 3 cluster) from the dung saprotroph Podospora anserina. No SCOs from model 1 were included, as the above 4 SCOs were selected and characterized before the development of our most recent cluster detection algorithm that led to the discovery of the model 1 cluster type. However, the previously characterized SCO from C. globosum is encoded in a partial model 1 cluster (Figure C.5). We selected the above four enzymes from model fungi in order to compare the activity and specificity of SCOs from different cluster models and states, as well as from the same model in evolutionarily divergent species. SCOs from two model 3 clusters were selected because of preliminary evidence of HGT of this cluster model. All scos were expressed under standard growth conditions, except for Dssco, which was only expressed when induced with 0.2% resveratrol or ground Austrian pine needle agar medium (Luchi et al., 2007).

Substrate cleavage specificities of heterologously expressed proteins were similar among the four enzymes in a panel consisting of the stilbenes resveratrol, pterostilbene, piceatannol, piceid, pinosylvin, and isorhapontigenin, and the monophenolic isoeugenol, with some notable exceptions (Figure 4.2 and Table C.8). Pterostilbene is the preferred substrate of all four enzymes, generally followed by piceatannol, and then resveratrol. None of these enzymes was active on isorhapontigenin, which differs from piceatannol by methylation of the R’3/5-OH, or pinosylvin, which lacks a 4’-OH group. All of these results are consistent with cleavage of more limited panels of stilbenes by the other five characterized fungal SCOs (Brefort et al., 2011; Díaz-Sánchez et al., 2013), and similar to that of the extensive panel of stilbenes and related molecules tested on bacterial NOV1 (McAndrew et al., 2016) (Table C.9).

We identified a number of differences among these SCO enzymes in terms of substrate diversity and level of activity under the experimental conditions (Figure 4.2). Of the four tested enzymes, PaSCO had the highest activity and broadest range in cleavage profiles, effectively cleaving four out of six stilbenoid compounds and isoeugenol. PrSCO had the same range of substrates as PaSCO, but was not as effective at cleaving the non-stilbene isoeugenol or piceid, a glycosylated derivative of resveratrol whose cleavage has not been previously reported. MoSCO had similar affinities for stilbenoid compounds to PrSCO but did not cleave isoeugenol. P. anserina, P. roqueforti, and M. oryzae SCOs all had high cleavage activity on pterostilbene, piceatannol and resveratrol, but compared to the other enzymes, PaSCO was much more active on piceid (Table C.8). DsSCO was somewhat active on pterostilbene and resveratrol, but on no other compounds tested. Interestingly, in the very closely related but less virulent pine pathogen D. scrobiculata, sco is a pseudogene. The greatest differences in activity among the different enzymes were observed for isoeugenol and piceid.
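The comparison above can be checked against the depletion values reported in Figure 4.2b. The sketch below transcribes those values and ranks substrates by the spread in depletion among the three cluster-encoded enzymes. Excluding the weakly active DsSCO from the spread, and using the max-minus-min range as the measure of difference, are assumptions made for this illustration.

```python
# Percent substrate depletion after 30 minutes, transcribed from Figure 4.2b.
DEPLETION = {
    "pterostilbene":    {"DsSCO": 35, "MoSCO": 100, "PrSCO": 100, "PaSCO": 100},
    "piceatannol":      {"DsSCO": 0,  "MoSCO": 98,  "PrSCO": 100, "PaSCO": 98},
    "resveratrol":      {"DsSCO": 10, "MoSCO": 80,  "PrSCO": 80,  "PaSCO": 95},
    "piceid":           {"DsSCO": 0,  "MoSCO": 49,  "PrSCO": 18,  "PaSCO": 86},
    "isoeugenol":       {"DsSCO": 0,  "MoSCO": 0,   "PrSCO": 8,   "PaSCO": 30},
    "isorhapontigenin": {"DsSCO": 0,  "MoSCO": 0,   "PrSCO": 0,   "PaSCO": 0},
    "pinosylvin":       {"DsSCO": 0,  "MoSCO": 0,   "PrSCO": 0,   "PaSCO": 0},
}


def depletion_range(substrate, enzymes):
    """Largest minus smallest percent depletion of a substrate among `enzymes`."""
    values = [DEPLETION[substrate][enzyme] for enzyme in enzymes]
    return max(values) - min(values)


def cleaved_substrates(enzyme):
    """Substrates showing any depletion with the given enzyme."""
    return [s for s, row in DEPLETION.items() if row[enzyme] > 0]


if __name__ == "__main__":
    cluster_encoded = ["MoSCO", "PrSCO", "PaSCO"]
    ranked = sorted(DEPLETION, key=lambda s: depletion_range(s, cluster_encoded),
                    reverse=True)
    print(ranked[:2])
    print(cleaved_substrates("DsSCO"))
```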


The evolution of sco cluster genes suggests complex evolutionary histories, and co-diversification within clusters.

In a maximum likelihood phylogeny of bacterial, plant and fungal carotenoid cleavage dioxygenases and SCOs, all fungal SCOs are monophyletic with previously characterized bacterial SCOs, to the exclusion of bacterial, plant and fungal carotenoid cleavage dioxygenases (EC# 1.13.11.59) involved in carotenoid metabolism (Figure C.2). Carotenoid cleavage dioxygenases fall into two distinct clades (carX and carT), each containing groups of fungal sequences that are monophyletic with earlier diverging bacterial and plant sequences. Maximum likelihood phylogenies of sco and key genes from each of the sco cluster models (dhbd from model 1: Figures C.7-8; gdo from model 2: Figures C.9-10; vao from model 3: Figures C.10-12) in fungi portrayed complex patterns of inheritance compared to the conserved housekeeping gene rpb2, which closely tracks accepted species relationships (Figure 4.1c, Figure C.13). In the sco and vao gene phylogenies, sequences belonging to fungi from different taxonomic classes are frequently found as sister taxa, while the phylogenies of gdo and dhbd appear to conflict somewhat less with vertical inheritance among species. These observations are supported by differences among the distributions of two phylogenetic diversity metrics across the various gene trees: the gene trees of sco, gdo, and vao, but not dhbd, differed significantly from that of the housekeeping gene rpb2 in terms of the number of taxonomic classes represented in the descendants of each node, and sco and vao, but neither gdo nor dhbd, differed significantly from rpb2 in terms of the total distance on the rpb2 tree covered by descendants of each node (Table C.10, Figure C.14).

We compared clades of key genes from each of the three cluster models with the sco topology to investigate the patterns of clustered gene co-diversification (Figure 4.3). Large clades of clustered dhbd (model 1) and gdo (model 2) sequences co-diversify with their neighboring sco sequences, suggesting that clustered sequences in the respective gene families have been co-inherited since they became clustered early on. Although clades of clustered vao sequences also co-diversify with their co-clustered sco sequences, distantly related sco sequences tend to cluster with closely related vao sequences, suggesting multiple and more recent independent origins of their clustering. Congruence between clustered clades on different trees is independent of whether the genes’ topologies are consistent with species trees. Clustered homologs of gdo and vao are often closely related to unclustered homologs.

sco underwent multiple transitions to and between cluster models.

Ancestral clustered states of sco (i.e. models 1-3 or unclustered) were estimated within a Bayesian framework, enabling the inference of transitions between and gains of distinct clustered states while integrating over the posterior distribution of sco topologies (Figure 4.3, Figure C.5). Transition rates between clustered states sampled during reverse jump MCMC analysis roughly fall into two categories: those with a median rate coefficient of ~2 (model 1 to unclustered, model 2 to unclustered, model 3 to unclustered, model 2 to model 3) and those with a median rate coefficient of ~0 (unclustered to model 1, unclustered to model 2, unclustered to model 3, model 1 to model 2, model 2 to model 1, model 1 to model 3, model 3 to model 1, model 3 to model 2) (Figure C.15).

Figure 4.3: Transitions between and gains of distinct cluster types across the stilbene cleavage oxygenase (sco) phylogeny. a) A tanglegram of the 50% majority rule consensus Bayesian sco phylogeny and maximum likelihood phylogenies of 2,3-dihydroxybenzoate decarboxylase (dhbd; key gene model 1), gentisate 1,2 dioxygenase (gdo; key gene model 2) and vanillyl alcohol oxidase (vao; key gene model 3) amino acid sequences. Connecting lines indicate genes within the same sco gene cluster, color-coded by cluster model. Sequences descending from nodes with inferred transition events between, or gain and fusion events of, clustered states are outlined with grey squares (labeled by event ID) and have bold connecting lines. b) Descriptions of cluster transition, gain and fusion events inferred through ancestral state reconstruction (details provided in Figure C.5 and Table C.11). Transition and gain events with ambiguous ancestral states but supported by extant cluster distributions are prefixed with an ‘A’.

We identified 6 distinct events (T1-6) and 2 ambiguously placed events (Methods; AT1-2) in which sco homologs are inferred to have transitioned between one cluster model and another (Figure 4.3, Figure 4.4, Table C.11). Transitions from model 2 to model 3 occurred most frequently (T1-T5, AT2), followed by transitions from model 1 to model 3 (T6) and transitions from model 3 to model 2 (AT1). Interestingly, we also inferred the replacement of a vao in a model 3-containing taxon with a more divergent vao homolog (Figure C.6).

We identified one special case of a transition in the form of a cluster fusion (F1) consisting of a single locus containing a sco sequence associated with all accessory model 2 and 3 genes in Neofusicoccum parvum (Dothideomycetes) (Figure 4.4). Model 3 is inferred to be the ancestral clustered state of this clade having arisen through a gain in model 3 clustered state (AG2). The N. parvum vao homolog (Neopa1_tbn3; detected through tBLASTn) is most closely related to Sordariomycete sequences, while two closely related species, Macrophomina phaseolina and Botryosphaeria dothidea, contain model 2 clusters lacking sco sequences. Coupled with the observation that the N. parvum sco homolog (Neopa1_1337) is found in a region of the phylogeny with poor resolution and high taxonomic diversity, these lines of evidence are consistent with the fusion of a

[Figure 4.4 graphic: panels a-c show sco subtrees (tips labeled with species names and protein IDs; events AG2, F1, T2 and T6 marked at nodes) alongside gene cluster synteny schematics. Legend: ancestral state probability (0 to 1) for no model, model 1, model 2 and model 3; cluster type: unclustered, model 1, model 2, model 3; genes: sco, model 1 genes, model 2 genes, model 3 genes.]

Figure 4.4: Synteny comparisons within three lineages of stilbene cleavage oxygenase (sco) loci inferred to have experienced transition or fusion events between different cluster types. All depicted trees on the left were extracted from the 50% majority rule consensus Bayesian sco amino acid phylogeny, and can be viewed in their original context in Figure C.5. Tree tips are labeled with binomial species names and protein IDs, and are color coded by the cluster model to which they belong. Transition, gain and fusion event IDs are labeled above the nodes at which they are inferred to have occurred (Figure 4.3, Table C.11). Histograms depicting the median probability of reconstructed ancestral clustered states (ranging from 0 to 1) are drawn above their respective branches. Dotted gray lines artificially extend branches that would otherwise be too short to accommodate histograms. Support values of ≥0.95 posterior probability are drawn beneath their respective branches, while support values <0.95 are not shown. Gene schematics on the right are color coded by the sco cluster model to which they belong (if any), except for sco, which is always colored red. Shaded lines drawn between gene schematics indicate percent identity, as determined by pairwise tBLASTx comparisons. Note that genes from different families that are part of the same cluster model are shaded with the same color. a) Inferred fusion between model 2 and model 3 clusters in Neofusicoccum parvum (Dothideomycetes). Note that the close relative Botryosphaeria dothidea (Dothideomycetes) has an unclustered sco and a model 2 cluster lacking sco located on different contigs. b) Inferred sco transition from model 2 to model 3 cluster type. c) Inferred sco transition from model 1 to model 3 partial hybrid cluster type.

vertically inherited model 2 cluster (without sco) with ambiguously inherited sco and vao genes.

We also identified 2 distinct events (G1-2) and 3 ambiguously placed events (AG1-3) where sco is inferred to have transitioned from an unclustered state to either a model 3 cluster (G2, AG1-3) or model 2 cluster (G1). Although not included in our ancestral state reconstruction analysis involving the monophyletic sco clade, at least one other independent gain of a model 2 clustered state appears to have occurred in the carT clade of carotenoid cleavage dioxygenases involving two sequences (Aspve1_89005 and Lorju1_471378) that are very distantly related to the sco clade (Figure C.2).

With regard to the SCO enzymes we characterized here, we infer that PaSCO and PrSCO are in a small clade recently descended from T4 (transition from model 2 to model 3), MoSCO is in a clade with an ancient model 2 clustered state, and DsSCO is in a very small clade that recently lost model 3 to become unclustered.

After confirming that the likelihood of the Bayesian topology-constrained sco ML tree did not differ significantly from the true optimal ML sco phylogeny (p = 0.364, Table C.2), we sought to validate inferred transition events within a maximum likelihood framework using constrained topologies corresponding to hypotheses regarding the independence of transition events (Figure 4.3, Figure C.5, Table C.3 and Table C.11). Node labels refer to internal nodes on the sco phylogeny (Figure C.5). We failed to reject the null hypothesis of no transition from model 3 to model 2 among the descendants of node_387_0.863 (test 4; p = 0.142), indicating ambiguous support for event AT1 as a true transition event. We failed to reject the null hypothesis that events T3 and T4 have independent origins (test 6; p = 0.195), suggesting that the model 3 clusters descending from these events may have arisen through a single transition event. We also failed to reject the null hypothesis of no independent transitions from model 2 to model 3 at events AT2 and T5 (test 9; p = 0.133), suggesting that model 3 clusters descending from node_547_0.621 may have arisen through a single transition event. We weakly rejected the null hypothesis of no independent transitions among the descendants of node_491_0.663 (test 5; affects events T2 and T3; p = 0.071), of node_859_0.992 (test 12; affects event T6; p = 0.074), and of node_861_0.992 (test 11; affects events AG3, T6; p = 0.079). We strongly rejected the null hypothesis of no independent transitions among the descendants of node_522_0.786 (test 7; affects events T2, T3, T4; p = 0.003), of node_523_1.00 (test 8; affects events T1, T2, T3, T4; p = 3.00E-05), and of node_558_0.968 (test 10; affects events T1, T2, T3, T4, T5, AT2; p = 8.00E-54). We also strongly rejected the null hypothesis of no independent transitions to model 2 clusters from either model 1 or model 3 across the entire sco phylogeny (test 2; p = 1.00E-5), and the null hypothesis of no independent transitions to model 3 clusters from either model 1 or model 2 across the entire sco phylogeny (test 3; p = 8.00E-8). Together, these constraint analyses support a scenario where the sco gene family has experienced at least 4 transition events and 1 ambiguous transition event during the course of its evolutionary history.
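The grouping of constraint tests described above can be tabulated with a short script. The following is a minimal sketch rather than the analysis actually used: the test numbers, affected events and p-values are copied from the text, but the cutoffs (0.05 for "strongly rejected", 0.10 for "weakly rejected") and all function and variable names are assumptions chosen so that the grouping matches the wording of the paragraph.

```python
# Constraint tests as reported in the text: test number -> (affected events, p-value).
REPORTED_TESTS = {
    2: ("transitions to model 2, entire phylogeny", 1.00e-5),
    3: ("transitions to model 3, entire phylogeny", 8.00e-8),
    4: ("AT1", 0.142),
    5: ("T2, T3", 0.071),
    6: ("T3, T4", 0.195),
    7: ("T2, T3, T4", 0.003),
    8: ("T1, T2, T3, T4", 3.00e-05),
    9: ("AT2, T5", 0.133),
    10: ("T1, T2, T3, T4, T5, AT2", 8.00e-54),
    11: ("AG3, T6", 0.079),
    12: ("T6", 0.074),
}

# Assumed cutoffs (not stated in the text); chosen to reproduce its wording.
STRONG_ALPHA = 0.05
WEAK_ALPHA = 0.10


def classify(p_value, strong=STRONG_ALPHA, weak=WEAK_ALPHA):
    """Map a p-value to the outcome label used in the text."""
    if p_value < strong:
        return "strongly rejected"
    if p_value < weak:
        return "weakly rejected"
    return "failed to reject"


def summarize(tests=REPORTED_TESTS):
    """Group test numbers by outcome, in ascending test order."""
    groups = {
        "strongly rejected": [],
        "weakly rejected": [],
        "failed to reject": [],
    }
    for test_id in sorted(tests):
        _events, p_value = tests[test_id]
        groups[classify(p_value)].append(test_id)
    return groups


if __name__ == "__main__":
    for outcome, test_ids in summarize().items():
        print(f"{outcome}: tests {test_ids}")
```

Under these assumed cutoffs, tests 4, 6 and 9 fall in the "failed to reject" group, which corresponds to the events (AT1, T3/T4, AT2/T5) whose independence the text treats as ambiguous.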

Singleton and clustered sco homologs are associated with transposable elements

We found evidence of transposable elements (TEs) in genomic regions associated with 33.7% of all retrieved sco homologs, corresponding to 66 sco singletons (in 51 fungi) and 91 sco clusters (in 78 fungi) (Table C.12). Notably, scos from 16 of the cluster regions with TEs descend from three of the eight nodes where we inferred transition events (T1, T4, T6), and from three of the five nodes where we inferred gain events (AG1, AG2, AG3). In most cases, with the exception of AG1 and AG3, not every single sco sequence descending from the node where we inferred a T/AG event was located near detectable TEs. We also found TEs associated with a model 2 cluster that was previously inferred to have been horizontally transferred between fungi from the distantly related Sordariomycetes and Dothideomycetes (Greene et al., 2014).
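The counts in the preceding paragraph can be cross-checked with simple arithmetic. The sketch below is illustrative only: the input numbers are taken from the text, whereas the implied total number of sco homologs is a back-calculation from the rounded 33.7% figure and is not a value reported here; all names are hypothetical.

```python
# Values reported in the text.
TE_SINGLETONS = 66         # sco singletons with nearby TEs
TE_CLUSTERS = 91           # sco clusters with nearby TEs
TE_FRACTION = 0.337        # fraction of all retrieved sco homologs near TEs
TRANSITION_NODES = 8       # T1-6 plus AT1-2
GAIN_NODES = 5             # G1-2 plus AG1-3
TE_TRANSITION_NODES = 3    # T1, T4, T6
TE_GAIN_NODES = 3          # AG1, AG2, AG3


def te_summary():
    """Return (TE-associated loci, implied total homologs, share of event nodes with TEs)."""
    te_loci = TE_SINGLETONS + TE_CLUSTERS
    # Back-calculated from a rounded percentage, so only approximate.
    implied_total = round(te_loci / TE_FRACTION)
    event_share = (TE_TRANSITION_NODES + TE_GAIN_NODES) / (TRANSITION_NODES + GAIN_NODES)
    return te_loci, implied_total, event_share


if __name__ == "__main__":
    loci, total, share = te_summary()
    print(f"TE-associated sco loci: {loci}")
    print(f"Implied total sco homologs (approximate): {total}")
    print(f"Event nodes with TE-associated descendants: {share:.1%}")
```

Six of the thirteen inferred transition and gain nodes have TE-associated descendants, which is the "nearly half" figure referred to in the Discussion.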

Discussion

Fungi are consummate decomposers, and their catabolic activities often underlie their saprotrophic, mutualistic and pathogenic interactions with plants. Many plants produce SMs that inhibit fungal growth, and correspondingly, the ability of fungi to degrade plant SMs like stilbenes can affect how they colonize plant tissues (Michielse et al., 2012; Hammerbacher et al., 2013; Kettle et al., 2015). Studies on the diversification of biosynthetic SM pathways in plants have received considerable attention in recent years (Wisecaver et al., 2017; Moghe et al., 2017; Leong and Last 2017), yet little is known about the processes underpinning the co-evolution of cognate degradative pathways in fungi. One fungal strategy to degrade plant-produced stilbenes involves the deployment of SCO, a family of fungal enzymes that degrade stilbene compounds (Brefort et al., 2011; Díaz-Sánchez et al., 2013). Although the pathways in which SCO participates are yet to be described, we report here that sco homologs are alternatively organized into three distinct types of MGCs, leading us to ask how different evolutionary processes may have affected the evolution of stilbene catabolism in fungi. In general, metabolic diversification can proceed through combinatorial evolution, where new combinations of existing enzymes coalesce into new pathways, and through sequence-based evolution, where changes to gene coding sequences result in new enzymatic activities (Wagner 2011). Our biochemical and phylogenetic investigation suggests that combinatorial evolution plays a primary role in promoting the diversification of stilbene degradation pathways in fungi, while sco’s enzymatic function is more constrained.

Convergent assembly of sco clusters suggests selection for specific gene combinations

The generation of new phenotypes through combining existing genotypes is a key driver of metabolic diversity (Takiguchi et al., 1989; Wagner 2011). MGCs participating in the biosynthesis and degradation of SMs are increasingly recognized as hotspots for combinatorial evolution (Copley 2009; Lind et al., 2017; Gluck-Thaler and Slot 2018). These loci experience frequent gene pseudogenization, duplication, insertion, and deletion events that differentiate gene content even amongst closely related species (Reynolds et al., 2017; Lind et al., 2017). Several MGCs are inferred to have arisen through the relocation of genes within the same genome (Proctor et al., 2009), while others have arisen through relocation and fusion with existing MGCs (Lind et al., 2017), similar to the cluster fusion we observed here in N. parvum (Figure 4.4). Many MGCs are located in recombination hotspots, such as subtelomeres (Wu et al., 2009; Croll et al., 2015) and TE-rich regions (Fleetwood et al., 2007; Lind et al., 2017). That a third of all sco homologs, including those descending from nearly half of the inferred transition and gain events, are associated with TEs (Table C.12) is consistent with a role for TEs in generating combinatorial diversity at these loci, and suggests a plausible mechanism mediating cluster gains and transitions.

In this study, we provide evidence that combinatorial evolution has impacted the evolutionary trajectory of stilbene catabolism. We observed multiple independent instances in which sco genes shifted genomic context either from an unclustered state to a distinct cluster type or between cluster types, suggesting convergent selection for specific gene combinations (Figure 4.3, Figure 4.4, Table C.11). Convergence is indicative of strong directional selection for optimal phenotypes in similar environments (Conant and Wagner 2003; Bittleston et al., 2016). Several other examples of convergent evolution of particular combinations of clustered genes have previously been reported, notably galactose utilization clusters in divergent yeast lineages (Slot and Rokas 2010), cyanate detoxification clusters in the Pezizomycotina (Elmore et al., 2015), nicotinate degradation clusters in fungi and bacteria (Amon et al., 2017), and triterpene biosynthesis clusters in plants (Field and Osbourn 2008). While our analysis was limited to examining cluster assembly from the perspective of sco, the organizational histories of genes associated with sco are also marked by convergence. For example, the sah, fph, and gdo genes present in model 2 clusters have convergently clustered without sco in bacteria and fungi (Liu et al., 2011).

One outstanding question regarding convergent cluster assembly is why clustering is selected at all. Although selection for coordinated expression is often proposed as a driving force behind clustering, support for this hypothesis is mixed: an explicit test using galactose utilization clusters in yeast suggested that co-expression is a consequence but not necessarily a cause of clustering (Lang and Botstein 2011), while homologs of clustered fungal metabolic genes are more likely to be co-expressed in primates, suggesting a causal relationship (Eidem et al., 2015). While transcription factors in biosynthetic clusters are often involved in coordinating the expression of neighboring genes (Keller 2015), the relationship between transcription factors located in sco clusters and neighboring gene co-expression remains unexplored. A search of publicly available transcriptomic datasets revealed that the majority of genes in a model 1 cluster in the pine saprotroph Ophiostoma piceae are significantly upregulated when the fungus is grown on lodgepole pine sawdust media compared to unamended control (Haridas et al., 2013), and that all genes in the pine pathogen Grosmannia clavigera’s model 2 cluster are expressed on unamended media and in the presence of host phloem, and are all significantly downregulated in response to terpene exposure (DiGuistini et al., 2011).

Whether or not coordinated expression is selectable may ultimately depend on the fitness consequence of sub-optimal regulation, which is likely more severe if pathway intermediates are toxic. Clustered gene pairs are more likely to encode enzymes that are linked by a toxic intermediate (McGary et al., 2013), and some of the predicted intermediary metabolites of stilbene degradation pathways, such as the monophenolic aldehydes produced by SCO, are expected to be toxic. Regardless of the effects on co-expression, the tight linkage of genes within MGCs may itself be driven by selection for co-adapted alleles and genes in the face of gene flow and recombination (Yeaman 2013), and the increased selectability of higher order metabolic phenotypes (Pepper 2003).

The concordant phylogenies of genes within sco clusters that are themselves sparsely distributed among distantly related species suggest that some sco clusters were horizontally transferred (Reynolds et al., 2018) (Figure 4.1, Figure 4.3). However, the complex patterns of inheritance we observed in sco, vao and gdo gene families preclude the testing of specific HGT hypotheses (Figure 4.1). An alternative explanation for this pattern is the long-term retention of clusters that formed in ancient ancestors, which either were not detected in species with only a single sampled genome (because they are dispersed in the pan-genome), or were lost in most species. In either scenario, the distribution of gene clusters is a signature of selection on these specific genetic architectures. The most parsimonious explanation is that at least some horizontal sco cluster transfer has occurred given the large numbers of duplications and losses required to explain vertical inheritance (Szöllősi et al., 2015). HGT is an important source of genetic novelty for combinatorial evolution in bacteria, and is likely to impact fungal evolution as well (Soanes and Richards 2014). HGT can accelerate adaptive processes not only by giving organisms access to existing adaptive genotypes, but also by increasing combinatorial potential (Baquero 2004; Schonknecht et al., 2013). Clustered genes also tend to experience significantly more HGT compared with unclustered loci in fungi (Wisecaver et al., 2014), and HGT has been implicated in the wholesale transfer of many other fungal biosynthetic and catabolic MGCs enabling the colonization of new hosts and substrates (Slot 2017).

Conserved specificities of alternately clustered scos suggest modularity of function

It has previously been demonstrated that distantly related SCOs accommodate cleavage of similar stilbene molecules (Brefort et al., 2011; Díaz-Sánchez et al., 2013; McAndrew et al., 2016) (Table C.9). Similarly, four distantly related sco homologs presented here exhibit broadly conserved activity on five of the six stilbene molecules tested (Figure 4.2, Table C.8). Previous studies on stilbene degrading enzymes suggested that stilbene cleavage is mediated by a small number of conserved amino acid residues (Figure 4.2, Table C.6), and that the 4′-OH aromatic substitution is essential for full enzyme activity (Harrison and Bugg 2014; McAndrew et al., 2016). Here we show that the Tyr and Lys residues that mediate hydrogen bonding with the 4′-OH are moderately conserved (>66%) across unclustered SCOs and highly conserved (>88%) across clustered SCOs, with a few exceptions. Further, the 4-H + 3-E residues previously shown to form the tetradentate iron coordination active site of the greater CCO family (Sui et al., 2015; 2016) are also moderately conserved (64%) across unclustered SCOs and highly conserved (>82%) across SCOs in model 1 and 2 clusters. However, only 43.6% of SCOs found in model 3 clusters have these conserved residues, which merits further investigation with a larger set of these enzymes. Nevertheless, the generally high level of structural and functional conservation suggests that sco has evolved as an interchangeable metabolic module, and is consistent with the hypothesis of fungal stilbene metabolism adapting primarily through gene combination.

However, although SCOs cleave a range of 4′-OH substituted stilbenes, we show here that the breadth of substrate can vary. We observed that two closely related SCOs from P. anserina (model 3) and P. roqueforti (presumed model 3) cleave the phenylpropene isoeugenol, in addition to the conserved substrates shared with other characterized homologs. Furthermore, SCOs vary sharply in the efficiency with which they cleave piceid, a glycosylated form of resveratrol. Broader surveys of SCO homologs may reveal additional amino acid residues associated with these variations in activity and specificity. Preservation of core functions coupled with the expansions of substrate range can minimize the change required to explore new regions of a phenotypic landscape (O'Maille et al., 2008; Ferrada and Wagner 2010). The recent transitions to model 3 clusters (T1-T6, AT2) might have followed relaxation of selection, which resulted in a broad-specificity SCO that was more suited to its novel role (Copley 2009) (Figure 4.3, Figure 4.4); by contrast, MoSCO, which has a narrower substrate range, is anciently associated with model 2. The activity of PaSCO and PrSCO on isoeugenol is also interesting because the model 3 key gene, vao, is involved in the production of 4-hydroxycinnamyl alcohols from 4-allylphenols (e.g. eugenol and chavicol), and could indicate that model 3 clusters can degrade other classes of antifungal metabolites or lignin monomers (Fraaije et al., 1995).

Combinations of genes within clusters may facilitate the specialization of metabolic pathways

What could be the functional and ecological significance of the three different types of sco clusters? In addition to the inhibitory effects of diverse stilbenes against fungi colonizing both living and decaying plant tissues (Celimene et al., 1999; 2001; Jeandet et al., 2010), some stilbenes (e.g., piceatannol and resveratrol) were recently shown to cross-couple with monolignols during lignification of the cell wall (del Río et al., 2017), from which they may be expected to be released during lignin degradation processes typically carried out by saprotrophic fungi. Consequently, as particular fungal lineages frequently shift between different hosts and nutritional modes, the benefit of acquiring and optimizing mechanisms to degrade different types of stilbenes may wax and wane as populations disperse across heterogeneous environments. Transitions between a limited set of cluster types provide preliminary evidence for a constrained number of adaptive stilbene catabolism pathways, which may each represent a fitness optimum in a separate local metabolic landscape.

By decreasing recombination, MGCs facilitate the co-adaptation between constituent genes, which may facilitate specialization on specific substrates, and possibly the formation of protein complexes that enable functional compartmentalization or metabolic channeling (Castellana et al., 2014). Given the evidence of evolutionary transitions between cluster types and the broad conservation of SCO activity on differently substituted stilbenes, we speculate that the specialized function of any given cluster may ultimately depend on the types of cleavage products produced by SCO’s activity on host/substrate specific stilbenes. Predicted functions encoded in each sco cluster suggest accessory genes enable degradation of alternatively substituted monophenolic cleavage products by alternative, conserved ring-cleavage pathways (Figure C.16). Model 2 cluster gene functions (gdo, fph) suggest a plausible pathway for transformation to citric acid cycle intermediates by gentisic acid cleavage (Greene et al., 2014). Model 1 and model 3 cluster gene functions (e.g. dhbd and vao, respectively) suggest ring cleavage of downstream products by the β-ketoadipate pathway, which also leads to the production of citric acid cycle intermediates (Mäkelä et al., 2015).

Specialization of sco clusters to particular cleavage products may circumvent fitness defects arising from cross-talk with other (especially core) metabolic pathways (McGary et al., 2013; Keller 2015). The distribution of clusters in large pan-genomes, dispersed among large populations and through horizontal transfer, is expected to increase the strength of ecological selection that results in this specialization and optimization (Lynch 2006). At the same time, these large gene pools would provide broad access to combinatorial partners that are selected under similar environmental conditions (Baquero 2004).

While the precise metabolic adaptations conferred by SCO clusters are still under investigation, complicated by the existence of multiple mechanisms of stilbene defense (Hammerbacher et al., 2013), we argue that MGCs containing sco are positive evidence of metabolic adaptation to stilbenes. The origin and persistence of clustering by various mechanisms is a signature of natural selection on the encoded functions (Slot 2017). It is possible that different sco clusters do not represent adaptation to diverse metabolites, but instead represent convergent mechanisms for degradation of the same compounds in, for example, different intracellular or extracellular environments. Altogether, our results are consistent with a process where combinatorial evolution through genomic rearrangements facilitates fungal adaptation to plant defense chemistry, and highlight the additional importance of enzymes with modular functions. Future work to characterize these stilbene catabolic pathways in full, and to assess their contributions to fitness in particular environments, will help to refine our understanding of the processes contributing to metabolic adaptation and specialization.

Acknowledgements

We thank the staff of the Molecular and Cellular Imaging Center at The Ohio State University. Special thanks to Mike Kelly (MCIC, Columbus) for expert advice and help with UPLC measurements. We thank Marysabel Mendez Acevedo for her help in cloning the Podospora anserina SCO. We thank Thomas K. Mitchell, Enrico Bonello, Ana Paula Alonso, and Laura Kubatko for their kind support in providing materials and helpful comments for parts of this work. We thank Cathie Aime, Daniele Armaleo, Jose Maria Barrasa, Stephen B. Goodwin, Gerald Bills, Burt H. Bluhm, Gregory Bonito, Sarah Branco, Tom Bruns, Kathryn Bushley, Laurie Connell, Pedro Crous, Gunther Doehlemann, Paul Dyer, David Ezra, Dave Greenshields, Igor Grigoriev, Alexey Grum-Grzhimaylo, Andrii Gryganskyi, Colleen Hansel, David S. Hibbett, Patrik Inderbitzin, Bjorn D. Lindahl, Daniel L. Lindner, Francois Lutzoni, Jon Karl Magnuson, Francis Michel Martin, Don Natvig, Kerry O'Donnell, Robin Ohm, Amy Jo Powell, Marie-Noëlle Rosso, Trey Sato, Steven W. Singer, Joseph Spatafora, John W. Taylor, Adrian Tsang, Gillian Turgeon, Rytas Vilgalys, and Kenneth H. Wolfe for providing access to genomic data prior to publication; these sequence data were produced in collaboration with the user community by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Computational work was performed using the resources of the Ohio Supercomputer Center. This work was supported by the Fonds de Recherche du Quebec-Nature et Technologies (EGT), the NSF (DEB-1638999) and an OARDC SEEDS Interdisciplinary Team Research Competition Grant.

References

Adamek M, Spohn M, Stegmann E, Ziemert N (2017). Mining bacterial genomes for secondary metabolite gene clusters. Antibiotics. Springer. pp 23-47.

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). Basic local alignment search tool. Journal of molecular biology 215: 403-410.

Ámon J, Fernández-Martín R, Bokor E, Cultrone A, Kelly JM, Flipphi M et al (2017). A eukaryotic nicotinate-inducible gene cluster: convergent evolution in fungi and bacteria. Open biology 7: 170199.

Bailey JK, Schweitzer JA, Ubeda F, Koricheva J, LeRoy CJ, Madritch MD et al (2009). From genes to ecosystems: a synthesis of the effects of plant genetic factors across levels of organization. Philosophical Transactions of the Royal Society B: Biological Sciences 364: 1607-1616.

Bailey TL, Elkan C (1994). Fitting a mixture model by expectation maximization to discover motifs in bipolymers. Proceedings of the 2nd International Conference on Intelligent Systems for Molecular Biology 2: 28-36.

Baquero F (2004). From pieces to patterns: evolutionary engineering in bacterial pathogens. Nature Reviews Microbiology 2: 510.

Bittleston LS, Pierce NE, Ellison AM, Pringle A (2016). Convergence in multispecies interactions. Trends in ecology & evolution 31: 269-280.

Brefort T, Scherzinger D, Limón MC, Estrada AF, Trautmann D, Mengel C et al (2011). Cleavage of resveratrol in fungi: characterization of the enzyme Rco1 from Ustilago maydis. Fungal genetics and biology 48: 132-143.

Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25: 1972-1973.

Castellana M, Wilson MZ, Xu Y, Joshi P, Cristea IM, Rabinowitz JD et al (2014). Enzyme clustering accelerates processing of intermediates through metabolic channeling. Nature biotechnology 32: 1011.

Celimene CC, Micales JA, Ferge L, Young RA (1999). Efficacy of pinosylvins against white-rot and brown-rot fungi. Holzforschung 53: 491-497.

Celimene CC, Smith DR, Young RA, Stanosz GR (2001). In vitro inhibition of Sphaeropsis sapinea by natural stilbenes. Phytochemistry 56: 161-165.

Chong J, Poutaraud A, Hugueney P (2009). Metabolism and roles of stilbenes in plants. Plant science 177: 143-155.

Conant GC, Wagner A (2003). Convergent evolution of gene circuits. Nature genetics 34: 264.

Copley SD (2009). Evolution of efficient pathways for degradation of anthropogenic chemicals. Nature chemical biology 5: 559.

Croll D, Lendenmann MH, Stewart E, McDonald BA (2015). The impact of recombination hotspots on genome evolution of a fungal plant pathogen. Genetics 201: 1213-1228.

Darriba D, Taboada GL, Doallo R, Posada D (2011). ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27: 1164-1165.

del Río JC, Rencoret J, Gutiérrez A, Kim H, Ralph J (2017). Hydroxystilbenes are monomers in palm fruit endocarp lignins. Plant physiology 174: 2072-2082.

Díaz-Sánchez V, Estrada AF, Limón MC, Al-Babili S, Avalos J (2013). The oxygenase CAO-1 of Neurospora crassa is a resveratrol cleavage enzyme. Eukaryotic cell 12: 1305-1314.

DiGuistini S, Wang Y, Liao NY, Taylor G, Tanguay P, Feau N et al (2011). Genome and transcriptome analyses of the mountain pine beetle-fungal symbiont Grosmannia clavigera, a lodgepole pine pathogen. Proceedings of the National Academy of Sciences 108: 2504-2509.


Ehrlich KC, Chang P-K, Yu J, Cotty PJ (2004). Aflatoxin biosynthesis cluster gene cypA is required for G aflatoxin formation. Applied and Environmental Microbiology 70: 6518-6524.

Eidem HR, McGary KL, Rokas A (2015). Shared selective pressures on fungal and human metabolic pathways lead to divergent yet analogous genetic responses. Molecular biology and evolution 32: 1449-1455.

Elmore MH, McGary KL, Wisecaver JH, Slot JC, Geiser DM, Sink S et al (2015). Clustering of two genes putatively involved in cyanate detoxification evolved recently and independently in multiple fungal lineages. Genome biology and evolution 7: 789-800.

Ferrada E, Wagner A (2010). Evolutionary innovations and the organization of protein functions in genotype space. PLoS ONE 5: e14172.

Field B, Osbourn AE (2008). Metabolic diversification—independent assembly of operon-like gene clusters in different plants. Science 320: 543-547.

Fischer S, Brunk BP, Chen F, Gao X, Harb OS, Iodice JB et al (2011). Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups. Current protocols in bioinformatics 35: 6.12.1-6.12.19.

Fleetwood DJ, Scott B, Lane GA, Tanaka A, Johnson RD (2007). A complex ergovaline gene cluster in Epichloë endophytes of grasses. Applied and Environmental Microbiology 73: 2571-2579.

Fraaije MW, Veeger C, Van Berkel WJ (1995). Substrate Specificity of Flavin-Dependent Vanillyl-Alcohol Oxidase from Penicillium Simplicissimum: Evidence for the Production of 4-Hydroxycinnamyl Alcohols from 4-Allylphenols. European journal of biochemistry 234: 271-277.

Futuyma DJ, Agrawal AA (2009). Macroevolution and the biological diversity of plants and herbivores. Proceedings of the National Academy of Sciences 106: 18054-18061.

Glenn AE, Davis CB, Gao M, Gold SE, Mitchell TR, Proctor RH et al (2016). Two horizontally transferred xenobiotic resistance gene clusters associated with detoxification of benzoxazolinones by Fusarium species. PLoS ONE 11: e0147486.

Gluck-Thaler E, Slot JC (2018). Specialized plant biochemistry drives gene clustering in fungi. The ISME journal 12: 1694-1705.

Greene GH, McGary KL, Rokas A, Slot JC (2014). Ecology drives the distribution of specialized tyrosine metabolism modules in fungi. Genome biology and evolution 6: 121-132.

Hammerbacher A, Schmidt A, Wadke N, Wright LP, Schneider B, Bohlmann J et al (2013). A common fungal associate of the spruce bark beetle metabolizes the stilbene defenses of Norway spruce. Plant physiology 162: 1324-1336.

Han Y, Zhao W, Wang Z, Zhu J, Liu Q (2014). Molecular evolution and sequence divergence of plant chalcone synthase and chalcone synthase-Like genes. Genetica 142: 215-225.

Haridas S, Wang Y, Lim L, Alamouti SM, Jackman S, Docking R et al (2013). The genome and transcriptome of the pine saprophyte Ophiostoma piceae, and a comparison with the bark beetle-associated pine pathogen Grosmannia clavigera. BMC genomics 14: 373.

Harrison PJ, Bugg TD (2014). Enzymology of the carotenoid cleavage dioxygenases: reaction mechanisms, inhibition and biochemical roles. Archives of biochemistry and biophysics 544: 105-111.

Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC et al (2015). eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic acids research 44: D286-D293.

Huerta-Cepas J, Serra F, Bork P (2016). ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Molecular biology and evolution 33: 1635-1638.

Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, von Mering C et al (2017). Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Molecular biology and evolution 34: 2115-2122.

Jeandet P, Delaunois B, Conreux A, Donnez D, Nuzzo V, Cordelier S et al (2010). Biosynthesis, metabolism, molecular engineering, and biological functions of stilbene phytoalexins in plants. Biofactors 36: 331-341.

Katoh K, Standley DM (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular biology and evolution 30: 772-780.

Keller NP (2015). Translating biosynthetic gene clusters into fungal armor and weaponry. Nature chemical biology 11: 671.

Kettle AJ, Batley J, Benfield AH, Manners JM, Kazan K, Gardiner DM (2015). Degradation of the benzoxazolinone class of phytoalexins is important for virulence of Fusarium pseudograminearum towards wheat. Molecular plant pathology 16: 946-962.

Lang GI, Botstein D (2011). A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PloS one 6: e25290.

Lendenmann MH, Croll D, Stewart EL, McDonald BA (2014). Quantitative trait locus mapping of melanization in the plant pathogenic fungus Zymoseptoria tritici. G3: Genes, Genomes, Genetics 4: 2519-2533.

Leong BJ, Last RL (2017). Promiscuity, impersonation and accommodation: evolution of plant specialized metabolism. Current opinion in structural biology 47: 105-112.

Lind AL, Wisecaver JH, Lameiras C, Wiemann P, Palmer JM, Keller NP et al (2017). Drivers of genetic diversity in secondary metabolic gene clusters within a fungal species. PLoS biology 15: e2003583.

Liu TT, Xu Y, Liu H, Luo S, Yin YJ, Liu SJ et al (2011). Functional characterization of a gene cluster involved in gentisate catabolism in Rhodococcus sp. strain NCIMB 12038. Applied microbiology and biotechnology 90: 671-678.

Luchi N, Capretti P, Bonello P (2007). Production of Diplodia scrobiculata and Diplodia pinea pycnidia on ground Austrian pine needle agar medium. Phytopathologia Mediterranea 46: 230-235.

Lynch M (2006). Streamlining and simplification of microbial genome architecture. Annual Reviews Microbiology 60: 327-349.

Mäkelä MR, Marinović M, Nousiainen P, Liwanag AJ, Benoit I, Sipilä J et al (2015). Aromatic Metabolism of Filamentous Fungi in Relation to the Presence of Aromatic Compounds in Plant Biomass. Advances in Applied Microbiology 91: 63-137.

Marchler-Bauer A, Bryant SH (2004). CD-Search: protein domain annotations on the fly. Nucleic acids research 32: W327-W331.

McAndrew RP, Sathitsuksanoh N, Mbughuni MM, Heins RA, Pereira JH, George A et al (2016). Structure and mechanism of NOV1, a resveratrol-cleaving dioxygenase. Proceedings of the National Academy of Sciences 113: 14324-14329.

McGary KL, Slot JC, Rokas A (2013). Physical linkage of metabolic genes in fungi is an adaptation against the accumulation of toxic intermediate compounds. Proceedings of the National Academy of Sciences 110: 11481-11486.

Michielse CB, Reijnen L, Olivain C, Alabouvette C, Rep M (2012). Degradation of aromatic compounds through the β-ketoadipate pathway is required for pathogenicity of the tomato wilt pathogen Fusarium oxysporum f. sp. lycopersici. Molecular plant pathology 13: 1089-1100.

Moghe GD, Leong BJ, Hurney SM, Jones AD, Last RL (2017). Evolutionary routes to biochemical innovation revealed by integrative analysis of a plant-defense related specialized metabolic pathway. Elife 6: e28468.

Nützmann HW, Huang A, Osbourn A (2016). Plant metabolic clusters–from genetics to genomics. New Phytologist 211: 771-789.

O'Maille PE, Malone A, Dellas N, Hess Jr BA, Smentek L, Sheehan I et al (2008). Quantitative exploration of the catalytic landscape separating divergent plant sesquiterpene synthases. Nature chemical biology 4: 617.

Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara B et al (2016). vegan: Community Ecology Package. R package version 2.3-5.

Pagel M, Meade A (2006). Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. The American Naturalist 167: 808-825.

Paradis E, Claude J, Strimmer K (2004). APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289-290.

Peng X, Masai E, Kasai D, Miyauchi K, Katayama Y, Fukuda M (2005). A second 5-carboxyvanillate decarboxylase gene, ligW2, is important for lignin-related biphenyl catabolism in Sphingomonas paucimobilis SYK-6. Applied and Environmental Microbiology 71: 5014-5021.

Pepper JW (2003). The evolution of evolvability in genetic linkage patterns. Biosystems 69: 115-126.

Price MN, Dehal PS, Arkin AP (2010). FastTree 2–approximately maximum-likelihood trees for large alignments. PloS ONE 5: e9490.

Proctor RH, Busman M, Seo J-A, Lee YW, Plattner RD (2008). A fumonisin biosynthetic gene cluster in Fusarium oxysporum strain O-1890 and the genetic basis for B versus C fumonisin production. Fungal Genetics and Biology 45: 1016-1026.

Proctor RH, McCormick SP, Alexander NJ, Desjardins AE (2009). Evidence that a secondary metabolic biosynthetic gene cluster has grown by gene relocation during evolution of the filamentous fungus Fusarium. Molecular microbiology 74: 1128-1142.

Raguso RA, Agrawal AA, Douglas AE, Jander G, Kessler A, Poveda K et al (2015). The raison d'être of chemical ecology. Ecology 96: 617-630.

Reynolds HT, Slot JC, Divon HH, Lysøe E, Proctor RH, Brown DW (2017). Differential retention of gene functions in a secondary metabolite cluster. Molecular biology and evolution 34: 2002-2015.

Reynolds HT, Vijayakumar V, Gluck-Thaler E, Korotkin HB, Matheny PB, Slot JC (2018). Horizontal gene cluster transfer increased hallucinogenic mushroom diversity. Evolution Letters 2: 88-101.

Richards LA, Dyer LA, Forister ML, Smilanich AM, Dodson CD, Leonard MD et al (2015). Phytochemical diversity drives plant–insect community diversity. Proceedings of the National Academy of Sciences 112: 10973-10978.

Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S et al (2012). MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic biology 61: 539-542.

Schönknecht G, Chen W-H, Ternes CM, Barbier GG, Shrestha RP, Stanke M et al (2013). Gene transfer from bacteria and archaea facilitated evolution of an extremophilic eukaryote. Science 339: 1207-1210.

Shimodaira H, Hasegawa M (2001). CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17: 1246-1247.

Shimodaira H (2002). An approximately unbiased test of phylogenetic tree selection. Systematic biology 51: 492-508.

Slot JC, Rokas A (2010). Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proceedings of the National Academy of Sciences 107: 10136-10141.

Slot JC (2017). Fungal gene cluster diversity and evolution. Advances in genetics. Elsevier. pp 141-178.

Smit A, Hubley R, Green P (2015). RepeatMasker Open-4.0. 2013–2015.

Soanes D, Richards TA (2014). Horizontal gene transfer in eukaryotic plant pathogens. Annual Review of Phytopathology 52: 583-614.

Speed MP, Fenton A, Jones MG, Ruxton GD, Brockhurst MA (2015). Coevolution can explain defensive secondary metabolite diversity in plants. New Phytologist 208: 1251-1263.

Stamatakis A (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312-1313.

Sui X, Golczak M, Zhang J, Kleinberg KA, von Lintig J, Palczewski K et al (2015). Utilization of dioxygen by carotenoid cleavage oxygenases. Journal of Biological Chemistry 290: 30212-30223.

Sui X, Zhang J, Golczak M, Palczewski K, Kiser PD (2016). Key residues for catalytic function and metal coordination in a carotenoid cleavage dioxygenase. Journal of Biological Chemistry 291: 19401-19412.

Sullivan MJ, Petty NK, Beatson SA (2011). Easyfig: a genome comparison visualizer. Bioinformatics 27: 1009-1010.

Szöllősi GJ, Davín AA, Tannier E, Daubin V, Boussau B (2015). Genome-scale phylogenetic analysis finds extensive gene transfer among fungi. Philosophical Transactions of the Royal Society B: Biological Sciences 370: 20140335.

Takiguchi M, Matsubasa T, Amaya Y, Mori M (1989). Evolutionary aspects of urea cycle enzyme genes. Bioessays 10: 163-166.

R Core Team (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

Tropf S, Lanz T, Rensing S, Schröder J, Schröder G (1994). Evidence that stilbene synthases have developed from chalcone synthases several times in the course of evolution. Journal of Molecular Evolution 38: 610-618.

Wadke N, Kandasamy D, Vogel H, Lah L, Wingfield BD, Paetz C et al (2016). The Bark-Beetle-Associated Fungus, Endoconidiophora polonica, Utilizes the Phenolic Defense Compounds of Its Host as a Carbon Source. Plant Physiology 171: 914-931.

Wagner A (2011). The origins of evolutionary innovations: a theory of transformative change in living systems. Oxford University Press: Oxford, England.

Werwath J, Arfmann H-A, Pieper DH, Timmis KN, Wittich R-M (1998). Biochemical and genetic characterization of a gentisate 1,2-dioxygenase from Sphingomonas sp. strain RW5. Journal of bacteriology 180: 4171-4176.

Wisecaver JH, Slot JC, Rokas A (2014). The evolution of fungal metabolic pathways. PLoS genetics 10: e1004816.

Wisecaver JH, Borowsky AT, Tzin V, Jander G, Kliebenstein DJ, Rokas A (2017). A global coexpression network approach for connecting genes to specialized metabolic pathways in plants. The Plant Cell 29: 944-959.

Wu C, Kim Y-S, Smith KM, Li W, Hood HM, Staben C et al (2009). Characterization of chromosome ends in the filamentous fungus Neurospora crassa. Genetics 181: 1129-1145.

Yabe K, Nakajima H (2004). Enzyme reactions and genes in aflatoxin biosynthesis. Applied microbiology and biotechnology 64: 745-755.

Yeaman S (2013). Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proceedings of the National Academy of Sciences 110: E1743-E1751.

Züst T, Heichinger C, Grossniklaus U, Harrington R, Kliebenstein DJ, Turnbull LA (2012). Natural enemies drive geographic variation in plant defenses. Science 338: 116-119.

Bibliography

Adamek M, Spohn M, Stegmann E, Ziemert N (2017). Mining bacterial genomes for secondary metabolite gene clusters. Antibiotics. Springer. pp 23-47.

Al-Shahrour F, Minguez P, Marqués-Bonet T, Gazave E, Navarro A, Dopazo J (2010). Selection upon genome architecture: conservation of functional neighborhoods with changing genes. PLoS Computational Biology 6: e1000953.

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). Basic local alignment search tool. Journal of molecular biology 215: 403-410.

Ámon J, Fernández-Martín R, Bokor E, Cultrone A, Kelly JM, Flipphi M et al (2017). A eukaryotic nicotinate-inducible gene cluster: convergent evolution in fungi and bacteria. Open biology 7: 170199.

Andam CP, Gogarten JP (2011). Biased gene transfer in microbial evolution. Nature Reviews Microbiology 9: 543-555.

Andrew RL, Peakall R, Wallis IR, Foley WJ (2007). Spatial distribution of defense chemicals and markers and the maintenance of chemical variation. Ecology 88: 716-728.

Avalos Cordero FJ, Díaz Sánchez V, Estrada Alejandro F, Limón Mirón MdC, Al-Babili S (2013). The oxygenase CAO-1 of Neurospora crassa is a resveratrol cleavage enzyme. Eukaryotic Cell 12: 1305-1314.

Baas-Becking LGM (1934). Geobiologie; of inleiding tot de milieukunde. WP Van Stockum & Zoon NV: The Hague, the Netherlands.

Bailey JK, Schweitzer JA, Ubeda F, Koricheva J, LeRoy CJ, Madritch MD et al (2009). From genes to ecosystems: a synthesis of the effects of plant genetic factors across levels of organization. Philosophical Transactions of the Royal Society B: Biological Sciences 364: 1607-1616.

Bailey TL, Elkan C (1994). Fitting a mixture model by expectation maximization to discover motifs in bipolymers. Proceedings of the 2nd International Conference on Intelligent Systems for Molecular Biology 2: 28-36.

Baquero F (2004). From pieces to patterns: evolutionary engineering in bacterial pathogens. Nature Reviews Microbiology 2: 510-518.

Barlow M (2009). What antimicrobial resistance has taught us about horizontal gene transfer. Horizontal Gene Transfer. Springer. pp 397-411.

Barton KE, Koricheva J (2010). The ontogeny of plant defense and herbivory: characterizing general patterns using meta-analysis. The American Naturalist 175: 481-493.

Bay RA, Ruegg K (2017). Genomic islands of divergence or opportunities for introgression? Proceedings of the Royal Society B: Biological Sciences 284.

Baym M, Lieberman TD, Kelsic ED, Chait R, Gross R, Yelin I et al (2016). Spatiotemporal microbial evolution on antibiotic landscapes. Science 353: 1147-1151.

Bayry J, Beaussart A, Dufrêne YF, Sharma M, Bansal K, Kniemeyer O et al (2014). Surface structure characterization of Aspergillus fumigatus conidia mutated in the melanin synthesis pathway and their human cellular immune response. Infection and Immunity 82: 3141-3153.

Beaumont HJ, Gallie J, Kost C, Ferguson GC, Rainey PB (2009). Experimental evolution of bet hedging. Nature 462: 90.

Becerra JX (1997). Insects on plants: macroevolutionary chemical trends in host use. Science 276: 253-256.

Becerra JX (2007). The impact of herbivore–plant coevolution on plant community structure. Proceedings of the National Academy of Sciences 104: 7483-7488.

Becerra JX (2015). On the factors that promote the diversity of herbivorous insects and plants in tropical forests. Proceedings of the National Academy of Sciences 112: 6098-6103.

Bentley R, Haslam E (1990). The shikimate pathway—a metabolic tree with many branches. Critical reviews in biochemistry and molecular biology 25: 307-384.

Berbee ML, Wang S, Chang Y, Sekimoto S, Clum A, Aerts AL et al (2015). Phylogenomic Analyses Indicate that Early Fungi Evolved Digesting Cell Walls of Algal Ancestors of Land Plants. Genome Biology and Evolution 7: 1590-1601.

Bittleston LS, Pierce NE, Ellison AM, Pringle A (2016). Convergence in multispecies interactions. Trends in ecology & evolution 31: 269-280.

Bouarab K, Melton R, Peart J, Baulcombe D, Osbourn A (2002). A saponin-detoxifying enzyme mediates suppression of plant defences. Nature 418: 889-892.

Bouyioukos C, Reverchon S, Kepes F (2016). From multiple pathogenicity islands to a unique organized pathogenicity archipelago. Scientific reports 6: 27978.

Bowyer P, Clarke B, Lunness P, Daniels M, Osbourn A (1995). Host range of a plant pathogenic fungus determined by a saponin detoxifying enzyme. Science 267: 371-374.

Boyce KJ, McLauchlan A, Schreider L, Andrianopoulos A (2015). Intracellular Growth Is Dependent on Tyrosine Catabolism in the Dimorphic Fungal Pathogen Penicillium marneffei. PLoS Pathogens 11: 1-30.

Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A (2007). Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events. Statistics in Medicine 26: 53-77.

Bradshaw RE, Slot JC, Moore GG, Chettri P, de Wit PJGM, Ehrlich KC et al (2013). Fragmentation of an aflatoxin-like gene cluster in a forest pathogen. New Phytologist 198: 525-535.

Brefort T, Scherzinger D, Limón MC, Estrada AF, Trautmann D, Mengel C et al (2011). Cleavage of resveratrol in fungi: characterization of the enzyme Rco1 from Ustilago maydis. Fungal genetics and biology 48: 132-143.

Bruns TD (2019). The developing relationship between the study of fungal communities and community ecology theory. Fungal Ecology. doi:10.1016/j.funeco.2018.12.009

Buckling A, Brockhurst MA, Travisano M, Rainey PB (2007). Experimental adaptation to high and low quality environments under different scales of temporal variation. Journal of evolutionary biology 20: 296-300.

Campbell MA, Staats M, van Kan JA, Rokas A, Slot JC (2013). Repeated loss of an anciently horizontally transferred gene cluster in Botrytis. Mycologia 105: 1126-1134.

Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25: 1972-1973.

Carere J, Colgrave ML, Stiller J, Liu C, Manners JM, Kazan K et al (2016). Enzyme-driven metabolomic screening: a proof-of-principle method for discovery of plant defence compounds targeted by pathogens. New Phytologist 212: 770-779.

Carlton JM, Hirt RP, Silva JC, Delcher AL, Schatz M, Zhao Q et al (2007). Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science 315: 207-212.

Castellana M, Wilson MZ, Xu Y, Joshi P, Cristea IM, Rabinowitz JD et al (2014). Enzyme clustering accelerates processing of intermediates through metabolic channeling. Nature biotechnology 32: 1011.

Celimene CC, Micales JA, Ferge L, Young RA (1999). Efficacy of pinosylvins against white-rot and brown-rot fungi. Holzforschung 53: 491-497.

Celimene CC, Smith DR, Young RA, Stanosz GR (2001). In vitro inhibition of Sphaeropsis sapinea by natural stilbenes. Phytochemistry 56: 161-165.

Chan C, Jayasekera S, Kao B, Páramo M, Von Grotthuss M, Ranz JM (2015). Remodelling of a homeobox gene cluster by multiple independent gene reunions in Drosophila. Nature communications 6: 6509.

Charron MJ, Dubin RA, Michels CA (1986). Structural and functional analysis of the MAL1 locus of Saccharomyces cerevisiae. Molecular and cellular biology 6: 3891-3899.

Chong J, Poutaraud A, Hugueney P (2009). Metabolism and roles of stilbenes in plants. Plant science 177: 143-155.

Cimermancic P, Medema MH, Claesen J, Kurita K, Brown LCW, Mavrommatis K et al (2014). Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell 158: 412-421.

Cohen O, Gophna U, Pupko T (2010). The complexity hypothesis revisited: connectivity rather than function constitutes a barrier to horizontal gene transfer. Molecular biology and evolution 28: 1481-1489.

Conant GC, Wagner A (2003). Convergent evolution of gene circuits. Nature genetics 34: 264.

Copley SD (2009). Evolution of efficient pathways for degradation of anthropogenic chemicals. Nature chemical biology 5: 559.

Copley SD (2015). An evolutionary biochemist's perspective on promiscuity. Trends in biochemical sciences 40: 72-78.

Croll D, Lendenmann MH, Stewart E, McDonald BA (2015). The impact of recombination hotspots on genome evolution of a fungal plant pathogen. Genetics 201: 1213-1228.

Cvijović I, Good BH, Jerison ER, Desai MM (2015). Fate of a mutation in a fluctuating environment. Proceedings of the National Academy of Sciences 112: E5021-E5028.

Dallery J-F, Lapalu N, Zampounis A, Pigné S, Luyten I, Amselem J et al (2017). Gapless genome assembly of Colletotrichum higginsianum reveals chromosome structure and association of transposable elements with secondary metabolite gene clusters. BMC Genomics 18.

Dandekar T, Snel B, Huynen M, Bork P (1998). Conservation of gene order: a fingerprint of proteins that physically interact. Trends in biochemical sciences 23: 324-328.

Darriba D, Taboada GL, Doallo R, Posada D (2011). ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27: 1164-1165.

Davila Lopez M, Martinez Guerra JJ, Samuelsson T (2010). Analysis of gene order conservation in eukaryotes identifies transcriptionally and functionally linked genes. PLoS One 5: e10654.

de las Heras A, Chavarría M, de Lorenzo V (2011). Association of dnt genes of Burkholderia sp. DNT with the substrate-blind regulator DntR draws the evolutionary itinerary of 2,4-dinitrotoluene biodegradation. Molecular microbiology 82: 287-299.

Del Carratore F, Zych K, Cummings M, Takano E, Medema MH, Breitling R (2019). Computational identification of co-evolving multi-gene modules in microbial biosynthetic gene clusters. Communications Biology 2: 83.

del Río JC, Rencoret J, Gutiérrez A, Kim H, Ralph J (2017). Hydroxystilbenes are monomers in palm fruit endocarp lignins. Plant physiology 174: 2072-2082.

Desentis-Mendoza RM, Hernández-Sánchez H, Moreno A, Rojas del C E, Chel-Guerrero L, Tamariz J et al (2006). Enzymatic polymerization of phenolic compounds using laccase and tyrosinase from Ustilago maydis. Biomacromolecules 7: 1845-1854.

Dhillon B, Feau N, Aerts AL, Beauseigle S, Bernier L, Copeland A et al (2015). Horizontal gene transfer and gene dosage drives adaptation to wood colonization in a tree pathogen. Proceedings of the National Academy of Sciences 112: 3451-3456.

Díaz-Sánchez V, Estrada AF, Limón MC, Al-Babili S, Avalos J (2013). The oxygenase CAO-1 of Neurospora crassa is a resveratrol cleavage enzyme. Eukaryotic cell 12: 1305-1314.

DiGuistini S, Wang Y, Liao NY, Taylor G, Tanguay P, Feau N et al (2011). Genome and transcriptome analyses of the mountain pine beetle-fungal symbiont Grosmannia clavigera, a lodgepole pine pathogen. Proceedings of the National Academy of Sciences 108: 2504-2509.

Doolittle WF (1998). You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends in Genetics 14: 307-311.

Douglas H, Hawthorne D (1966). Regulation of genes controlling synthesis of the galactose pathway enzymes in yeast. Genetics 54: 911.

Druzhinina IS, Kopchinskiy AG, Kubicek EM, Kubicek CP (2016). A complete annotation of the chromosomes of the cellulase producer Trichoderma reesei provides insights in gene clusters, their expression and reveals genes required for fitness. Biotechnology for biofuels 9: 75.

Durmaz E, Benson C, Kapun M, Schmidt P, Flatt T (2018). An inversion supergene in Drosophila underpins latitudinal clines in survival traits. Journal of Evolutionary Biology 31: 1354-1364.

Eddy SR (2011). Accelerated profile HMM searches. PLoS Comput Biol 7: e1002195.

Edgar RC (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460-2461.

Ehrlich KC, Chang P-K, Yu J, Cotty PJ (2004). Aflatoxin biosynthesis cluster gene cypA is required for G aflatoxin formation. Applied and Environmental Microbiology 70: 6518-6524.

Ehrlich PR, Raven PH (1964). Butterflies and plants: a study in coevolution. Evolution 18: 586-608.

Eidem HR, McGary KL, Rokas A (2015). Shared selective pressures on fungal and human metabolic pathways lead to divergent yet analogous genetic responses. Molecular biology and evolution 32: 1449-1455.

Ellison CE, Hall C, Kowbel D, Welch J, Brem RB, Glass N et al (2011). Population genomics and local adaptation in wild isolates of a model microbial eukaryote. Proceedings of the National Academy of Sciences 108: 2831-2836.

Elmore MH, McGary KL, Wisecaver JH, Slot JC, Geiser DM, Sink S et al (2015). Clustering of two genes putatively involved in cyanate detoxification evolved recently and independently in multiple fungal lineages. Genome biology and evolution 7: 789-800.

Enkerli J, Bhatt G, Covert SF (1998). Maackiain detoxification contributes to the virulence of Nectria haematococca MP VI on chickpea. Molecular Plant Microbe Interactions 11: 317-326.

Faino L, Seidl MF, Shi-Kunne X, Pauper M, van den Berg GC, Wittenberg AH et al (2016). Transposons passively and actively contribute to evolution of the two-speed genome of a fungal pathogen. Genome research 26: 1091-1100.

Farr DF, Rossman AY (2017). Fungal Databases. U.S. National Fungus Collections, ARS, USDA: https://nt.ars-grin.gov/fungaldatabases/.

Fernández-Cañón JM, Peñalva MA (1995). Molecular characterization of a gene encoding a homogentisate dioxygenase from Aspergillus nidulans and identification of its human and plant homologues. Journal of Biological Chemistry 270: 21199-21205.

Ferrada E, Wagner A (2010). Evolutionary innovations and the organization of protein functions in genotype space. PLoS ONE 5: e14172.

Field B, Fiston-Lavier AS, Kemen A, Geisler K, Quesneville H, Osbourn AE (2011). Formation of plant metabolic gene clusters within dynamic chromosomal regions. Proceedings of the National Academy of Sciences 108: 16116-16121.

Field B, Osbourn AE (2008). Metabolic diversification—independent assembly of operon-like gene clusters in different plants. Science 320: 543-547.

Firn RD, Jones CG (2003). Natural products–a simple model to explain chemical diversity. Natural product reports 20: 382-391.

Fischer S, Brunk BP, Chen F, Gao X, Harb OS, Iodice JB et al (2011). Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups. Current protocols in bioinformatics 35: 6.12.1-6.12.19.

Fisher RA (1922). The systematic location of genes by means of crossover observations. The American Naturalist 56: 406-411.

Fleetwood DJ, Scott B, Lane GA, Tanaka A, Johnson RD (2007). A complex ergovaline gene cluster in Epichloë endophytes of grasses. Applied and Environmental Microbiology 73: 2571-2579.

Floudas D, Binder M, Riley R, Barry K, Blanchette RA, Henrissat B et al (2012). The Paleozoic origin of enzymatic lignin decomposition reconstructed from 31 fungal genomes. Science 336: 1715-1719.

Fouet C, Gray E, Besansky NJ, Costantini C (2012). Adaptation to aridity in the malaria mosquito Anopheles gambiae: chromosomal inversion polymorphism and body size influence resistance to desiccation. PloS one 7: e34841.

Fowler ZL, Baron CM, Panepinto JC, Koffas MA (2011). Melanization of flavonoids by fungal and bacterial laccases. Yeast 28: 181-188.

Fraaije MW, Veeger C, Van Berkel WJ (1995). Substrate Specificity of Flavin-Dependent Vanillyl-Alcohol Oxidase from Penicillium Simplicissimum: Evidence for the Production of 4-Hydroxycinnamyl Alcohols from 4-Allylphenols. European journal of biochemistry 234: 271-277.

Fridman O, Goldberg A, Ronin I, Shoresh N, Balaban NQ (2014). Optimization of lag time underlies antibiotic tolerance in evolved bacterial populations. Nature 513: 418.

Friesen TL, Stukenbrock EH, Liu Z, Meinhardt S, Ling H, Faris JD et al (2006). Emergence of a new disease as a result of interspecific virulence gene transfer. Nature genetics 38: 953.

Futuyma DJ, Agrawal AA (2009). Macroevolution and the biological diversity of plants and herbivores. Proceedings of the National Academy of Sciences 106: 18054-18061.

Gardiner DM, McDonald MC, Covarelli L, Solomon PS, Rusu AG, Marshall M et al (2012). Comparative pathogenomics reveals horizontally acquired novel virulence genes in fungi infecting cereal hosts. PLoS pathogens 8: e1002952.

Gérecová G, Neboháčová M, Zeman I, Pryszcz LP, Tomáška Ľ, Gabaldón T et al (2015). Metabolic gene clusters encoding the enzymes of two branches of the 3-oxoadipate pathway in the pathogenic yeast Candida albicans. FEMS yeast research 15: fov006.

Gershenzon J, Fontana A, Burow M, Wittstock U, Degenhardt J (2012). Mixtures of plant secondary metabolites: metabolic origins and ecological benefits. In: Iason GR, Dicke M, Hartley SE (ed). The ecology of plant secondary metabolites: from genes to global processes. Cambridge University Press: Cambridge, MA. pp 56-77.

Gerton JL, DeRisi J, Shroff R, Lichten M, Brown PO, Petes TD (2000). Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences 97: 11383-11390.

Giles NH, Case ME, Jacobson JW (1973). Genetic regulation of quinate-shikimate catabolism in Neurospora crassa. Molecular cytogenetics. Springer: Boston, MA. pp 309-314.

Glenn A, Gold S, Bacon C (2002). Fdb1 and Fdb2, Fusarium verticillioides loci necessary for detoxification of preformed antimicrobials from corn. Molecular plant-microbe interactions 15: 91-101.

Glenn AE, Davis CB, Gao M, Gold SE, Mitchell TR, Proctor RH et al (2016). Two horizontally transferred xenobiotic resistance gene clusters associated with detoxification of benzoxazolinones by Fusarium species. PloS ONE 11: e0147486.

Gluck-Thaler E, Slot JC (2015). Dimensions of Horizontal Gene Transfer in Eukaryotic Microbial Pathogens. PLoS Pathogens 11: e1005156.

Gluck-Thaler E, Slot JC (2018). Specialized plant biochemistry drives gene clustering in fungi. The ISME Journal 12: 1694-1705.

Gluck-Thaler E, Vijayakumar V, Slot JC (2018). Fungal adaptation to plant defences through convergent assembly of metabolic modules. Molecular ecology 27: 5120-5136.

Graham GJ (1995). Tandem genes and clustered genes. Journal of theoretical biology 175: 71-87.

Grandaubert J, Lowe RG, Soyer JL, Schoch CL, Van de Wouw AP, Fudal I et al (2014). Transposable element-assisted evolution and adaptation to host plant within the Leptosphaeria maculans-Leptosphaeria biglobosa species complex of fungal pathogens. BMC genomics 15: 891.

Greene GH, McGary KL, Rokas A, Slot JC (2014). Ecology drives the distribution of specialized tyrosine metabolism modules in fungi. Genome biology and evolution 6: 121-132.

Grigoriev IV, Nikitin R, Haridas S, Kuo A, Ohm R, Otillar R et al (2013). MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic acids research 42: D699-D704.

Haldane J (1919). The combination of linkage values and the calculation of distances between the loci of linked factors. Journal of Genetics 8: 299-309.

Haldane JBS (1957). The cost of natural selection. Journal of Genetics 55: 511.

Hall AR, Miller AD, Leggett HC, Roxburgh SH, Buckling A, Shea K (2012). Diversity–disturbance relationships: frequency and intensity interact. Biology letters 23: 768-771.

Hammerbacher A, Schmidt A, Wadke N, Wright LP, Schneider B, Bohlmann J et al (2013). A common fungal associate of the spruce bark beetle metabolizes the stilbene defenses of Norway spruce. Plant physiology 162: 1324-1336.

Han Y, Zhao W, Wang Z, Zhu J, Liu Q (2014). Molecular evolution and sequence divergence of plant chalcone synthase and chalcone synthase-Like genes. Genetica 142: 215-225.

Hane JK, Lowe RG, Solomon PS, Tan K-C, Schoch CL, Spatafora JW et al (2007). Dothideomycete–plant interactions illuminated by genome sequencing and EST analysis of the wheat pathogen Stagonospora nodorum. Plant Cell 19: 3347-3368.

Hane JK, Rouxel T, Howlett BJ, Kema GH, Goodwin SB, Oliver RP (2011). A novel mode of chromosomal evolution peculiar to filamentous Ascomycete fungi. Genome Biology 12: R45.

Haridas S, Wang Y, Lim L, Alamouti SM, Jackman S, Docking R et al (2013). The genome and transcriptome of the pine saprophyte Ophiostoma piceae, and a comparison with the bark beetle-associated pine pathogen Grosmannia clavigera. BMC genomics 14: 373.

Harrison PJ, Bugg TD (2014). Enzymology of the carotenoid cleavage dioxygenases: reaction mechanisms, inhibition and biochemical roles. Archives of biochemistry and biophysics 544: 105-111.

Hartmann FE, Sánchez-Vallet A, McDonald BA, Croll D (2017). A fungal wheat pathogen evolved host specialization by extensive chromosomal rearrangements. The ISME Journal 11: 1189-1204.

Heath TA, Hedtke SM, Hillis DM (2008). Taxon sampling and the accuracy of phylogenetic analyses. Journal of Systematics and Evolution 46: 239-257.

Hill WG, Robertson A (1966). The effect of linkage on limits to artificial selection. Genetics Research 8: 269-294.

Hittinger CT, Rokas A, Carroll SB (2004). Parallel inactivation of multiple GAL pathway genes and ecological diversification in yeasts. Proceedings of the National Academy of Sciences 101: 14144-14149.

Hoffmeister D, Keller NP (2007). Natural products of filamentous fungi: enzymes, genes, and their regulation. Natural Products Report 24: 393-416.

Höhl B, Arnemann M, Schwenen L, Stöckl D, Bringmann G, Jansen J et al (1989). Degradation of the Pterocarpan Phytoalexin (—)-Maackiain by Ascochyta rabiei. Zeitschrift für Naturforschung C 44: 771-776.

Holeski LM, Hillstrom ML, Whitham TG, Lindroth RL (2012). Relative importance of genetic, ontogenetic, induction, and seasonal variation in producing a multivariate defense phenotype in a foundation tree species. Oecologia 170: 695-707.

Holliday JA, Zhou L, Bawa R, Zhang M, Oubida RW (2016). Evidence for extensive parallelism but divergent genomic architecture of adaptation along altitudinal and latitudinal gradients in Populus trichocarpa. New Phytologist 209: 1240-1251.

Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, von Mering C et al (2017). Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Molecular biology and evolution 34: 2115-2122.

Huerta-Cepas J, Serra F, Bork P (2016). ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Molecular biology and evolution 33: 1635-1638.

Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC et al (2015). eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic acids research 44: D286-D293.

Hull E, Green P, Arst Jr H, Scazzocchio C (1989). Cloning and physical characterization of the l-proline catabolism gene cluster of Aspergillus nidulans. Molecular microbiology 3: 553-559.

Hurst LD, Pál C, Lercher MJ (2004). The evolutionary dynamics of eukaryotic gene order. Nature Reviews Genetics 5: 299-310.

Hurst LD, Williams EJ, Pál C (2002). Natural selection promotes the conservation of linkage of co-expressed genes. Trends in Genetics 18: 604-606.

Jacob F, Monod J (1961). Genetic regulatory mechanisms in the synthesis of proteins. Journal of molecular biology 3: 318-356.

Jain R, Rivera MC, Lake JA (1999). Horizontal gene transfer among genomes: the complexity hypothesis. Proceedings of the National Academy of Sciences 96: 3801-3806.

James TY, Pelin A, Bonen L, Ahrendt S, Sain D, Corradi N et al (2013). Shared signatures of parasitism and phylogenomics unite Cryptomycota and microsporidia. Current Biology 23: 1548-1553.

Jaramillo VDA, Vargas WA, Sukno SA, Thon MR (2013). Horizontal transfer of a subtilisin gene from plants into an ancestor of the plant pathogenic fungal genus Colletotrichum. PLoS One 8: e59078.

Jeandet P, Delaunois B, Conreux A, Donnez D, Nuzzo V, Cordelier S et al (2010). Biosynthesis, metabolism, molecular engineering, and biological functions of stilbene phytoalexins in plants. Biofactors 36: 331-341.

Jeffries TW, Van Vleet JRH (2009). Pichia stipitis genomics, transcriptomics, and gene clusters. FEMS yeast research 9: 793-807.

Johnstone I, McCabe P, Greaves P, Gurr S, Cole G, Brow M et al (1990). Isolation and characterisation of the crnA-niiA-niaD gene cluster for nitrate assimilation in Aspergillus nidulans. Gene 90: 181-192.

Jones FC, Grabherr MG, Chan YF, Russell P, Mauceli E, Johnson J et al (2012). The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484: 55-61.

Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C et al (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics 30: 1236-1240.

Jönsson LJ, Martín C (2016). Pretreatment of lignocellulose: formation of inhibitory by-products and strategies for minimizing their effects. Bioresource Technology 199: 103-112.

Joron M, Frezal L, Jones RT, Chamberlain NL, Lee SF, Haag CR et al (2011). Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry. Nature 477: 203-206.

Katoh K, Standley DM (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular biology and evolution 30: 772-780.

Kaznadzey A, Shelyakin P, Gelfand MS (2017). Sugar Lego: Gene composition of bacterial carbohydrate metabolism genomic loci. Biology Direct 12.

Keller NP (2015). Translating biosynthetic gene clusters into fungal armor and weaponry. Nature chemical biology 11: 671.

Keller NP (2018). Fungal secondary metabolism: regulation, function and drug discovery. Nature Reviews Microbiology 17: 167-180.

Kettle AJ, Batley J, Benfield AH, Manners JM, Kazan K, Gardiner DM (2015). Degradation of the benzoxazolinone class of phytoalexins is important for virulence of Fusarium pseudograminearum towards wheat. Molecular Plant Pathology 16: 946-962.

Khaldi N, Wolfe KH (2011). Evolutionary origins of the fumonisin secondary metabolite gene cluster in Fusarium verticillioides and Aspergillus niger. International journal of evolutionary biology.

Kim S-G, Yon F, Gaquerel E, Gulati J, Baldwin IT (2011). Tissue specific diurnal rhythms of metabolites and their regulation during herbivore attack in a native tobacco, Nicotiana attenuata. PLoS ONE 6: e26214.

Kirkpatrick M, Barton N (2006). Chromosome inversions, local adaptation and speciation. Genetics 173: 419-434.

Kliebenstein DJ, Rowe HC, Denby KJ (2005). Secondary metabolites influence Arabidopsis/Botrytis interactions: Variation in host production and pathogen sensitivity. Plant Journal 44: 25-36.

Klosterman SJ, Subbarao KV, Kang S, Veronese P, Gold SE, Thomma BP et al (2011). Comparative genomics yields insights into niche adaptation of plant vascular wilt pathogens. PLoS pathogens 7: e1002137.

Kroken S, Glass NL, Taylor JW, Yoder O, Turgeon BG (2003). Phylogenomic analysis of type I polyketide synthase genes in pathogenic and saprobic ascomycetes. Proceedings of the National Academy of Sciences 100: 15670-15675.

Lah L, Haridas S, Bohlmann J, Breuil C (2013). The cytochromes P450 of Grosmannia clavigera: Genome organization, phylogeny, and expression in response to pine host chemicals. Fungal Genetics and Biology 50: 72-81.

Lah L, Podobnik B, Novak M, Korošec B, Berne S, Vogelsang M et al (2011). The versatility of the fungal cytochrome P450 monooxygenase system is instrumental in xenobiotic detoxification. Molecular Microbiology 81: 1374-1389.

Lang GI, Botstein D (2011). A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PloS one 6: e25290.

Larson WA, Limborg MT, McKinney GJ, Schindler DE, Seeb JE, Seeb LW (2017). Genomic islands of divergence linked to ecotypic variation in sockeye salmon. Molecular Ecology 26: 554-570.

Lawrence JG, Roth JR (1996). Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics 143: 1843-1860.

Le Nagard H, Chao L, Tenaillon O (2011). The emergence of complexity and restricted pleiotropy in adapting networks. BMC evolutionary biology 11.

Lendenmann MH, Croll D, Stewart EL, McDonald BA (2014). Quantitative trait locus mapping of melanization in the plant pathogenic fungus Zymoseptoria tritici. G3: Genes, Genomes, Genetics 4: 2519-2533.

Leong BJ, Last RL (2017). Promiscuity, impersonation and accommodation: evolution of plant specialized metabolism. Current opinion in structural biology 47: 105-112.

Lewontin R, Kojima Ki (1960). The evolutionary dynamics of complex polymorphisms. Evolution 14: 458-472.

Li L, Stoeckert CJ, Roos DS (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Research 13: 2178-2189.

Lin K, Limpens E, Zhang Z, Ivanov S, Saunders DG, Mu D et al (2014). Single nucleus genome sequencing reveals high similarity among nuclei of an endomycorrhizal fungus. PLoS genetics 10: e1004078.

Lind AL, Wisecaver JH, Lameiras C, Wiemann P, Palmer JM, Keller NP et al (2017). Drivers of genetic diversity in secondary metabolic gene clusters within a fungal species. PLoS biology 15: e2003583.

Lindtke D, Lucek K, Soria-Carrasco V, Villoutreix R, Farkas TE, Riesch R et al (2017). Long-term balancing selection on chromosomal variants associated with crypsis in a stick insect. Molecular Ecology 26: 6189-6205.

Liu TT, Xu Y, Liu H, Luo S, Yin YJ, Liu SJ et al (2011). Functional characterization of a gene cluster involved in gentisate catabolism in Rhodococcus sp. strain NCIMB 12038. Applied Microbiology and Biotechnology 90: 671-678.

Lowe TM, Ailloud F, Allen C (2015). Hydroxycinnamic acid degradation, a broadly conserved trait, protects Ralstonia solanacearum from chemical plant defenses and contributes to root colonization and virulence. Molecular Plant-Microbe Interactions 28: 286-297.

Luchi N, Capretti P, Bonello P (2007). Production of Diplodia scrobiculata and Diplodia pinea pycnidia on ground Austrian pine needle agar medium. Phytopathologia Mediterranea 46: 230-235.

Lynch M (2006). Streamlining and simplification of microbial genome architecture. Annual Review of Microbiology 60: 327-349.

Lynch M (2007). The origins of genome architecture. Sinauer Associates: Sunderland, MA.

Lynch M, Gabriel W (1987). Environmental tolerance. The American Naturalist 129: 283-303.

Ma L-J, van der Does HC, Borkovich KA, Coleman JJ, Daboussi M-J, Di Pietro A et al (2010). Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium. Nature 464: 367-373.

Mäkelä MR, Donofrio N, De Vries RP (2014). Plant biomass degradation by fungi. Fungal Genetics and Biology 72: 2-9.

Mäkelä MR, Marinović M, Nousiainen P, Liwanag AJ, Benoit I, Sipilä J et al (2015). Aromatic Metabolism of Filamentous Fungi in Relation to the Presence of Aromatic Compounds in Plant Biomass. Advances in Applied Microbiology 91: 63-137.

Marchler-Bauer A, Bryant SH (2004). CD-Search: protein domain annotations on the fly. Nucleic acids research 32: W327-W331.

Marquitti FMD, Guimarães PR, Pires MM, Bittencourt LF (2014). MODULAR: software for the autonomous computation of modularity in large network sets. Ecography 37: 221-224.

Martens-Uzunova ES, Schaap PJ (2008). An evolutionary conserved d-galacturonic acid metabolic pathway operates across filamentous fungi capable of pectin degradation. Fungal Genetics and Biology 45: 1449-1457.

Martin G, Otto SP, Lenormand T (2006). Selection for recombination in structured populations. Genetics 172: 593-609.

Martins TM, Hartmann DO, Planchon S, Martins I, Renaut J, Pereira CS (2015). The old 3-oxoadipate pathway revisited: new insights in the catabolism of aromatics in the saprophytic fungus Aspergillus nidulans. Fungal Genetics and Biology 74: 32-44.

Mayr E (1963). Animal species and evolution. Harvard University Press: Cambridge, MA.

McAndrew RP, Sathitsuksanoh N, Mbughuni MM, Heins RA, Pereira JH, George A et al (2016). Structure and mechanism of NOV1, a resveratrol-cleaving dioxygenase. Proceedings of the National Academy of Sciences 113: 14324-14329.

McDonagh A, Fedorova ND, Crabtree J, Yu Y, Kim S, Chen D et al (2008). Sub-telomere directed gene expression during initiation of invasive aspergillosis. PLoS Pathogens 4: e1000154.

McGary KL, Slot JC, Rokas A (2013). Physical linkage of metabolic genes in fungi is an adaptation against the accumulation of toxic intermediate compounds. Proceedings of the National Academy of Sciences 110: 11481-11486.

Medema MH, Cimermancic P, Sali A, Takano E, Fischbach MA (2014). A Systematic Computational Analysis of Biosynthetic Gene Cluster Evolution: Lessons for Engineering Biosynthesis. PLoS Computational Biology 10: e1004016.

Melnyk AH, McCloskey N, Hinz AJ, Dettman J, Kassen R (2017). Evolution of cost-free resistance under fluctuating drug selection in Pseudomonas aeruginosa. mSphere 2: e00158-00117.

Michielse CB, Reijnen L, Olivain C, Alabouvette C, Rep M (2012). Degradation of aromatic compounds through the β-ketoadipate pathway is required for pathogenicity of the tomato wilt pathogen Fusarium oxysporum f. sp. lycopersici. Molecular plant pathology 13: 1089-1100.

Miller CT, Glazer AM, Summers BR, Blackman BK, Norman AR, Shapiro MD et al (2014). Modular skeletal evolution in sticklebacks is controlled by additive and clustered quantitative trait loci. Genetics 197: 405-420.

Moghe GD, Leong BJ, Hurney SM, Jones AD, Last RL (2017). Evolutionary routes to biochemical innovation revealed by integrative analysis of a plant-defense related specialized metabolic pathway. Elife 6: e28468.

Möller M, Habig M, Freitag M, Stukenbrock EH (2018). Extraordinary Genome Instability and Widespread Chromosome Rearrangements During Vegetative Growth. Genetics 210: 517-529.

Montalbini P (1991). Effect of rust infection on levels of uricase, allantoinase and ureides in susceptible and hypersensitive bean leaves. Physiological and molecular plant pathology 39: 173-188.

Moreno-Hagelsieb G, Jokic P (2012). The evolutionary dynamics of functional modules and the extraordinary plasticity of regulons: the Escherichia coli perspective. Nucleic acids research 40: 7104-7112.

Muto A, Kotera M, Tokimatsu T, Nakagawa Z, Goto S, Kanehisa M (2013). Modular Architecture of Metabolic Pathways Revealed by Conserved Sequences of Reactions. Journal of chemical information and modeling 53: 613-622.

Nandasena KG, O'Hara GW, Tiwari RP, Howieson JG (2006). Rapid in situ evolution of nodulating strains for Biserrula pelecinus L. through lateral transfer of a symbiosis island from the original mesorhizobial inoculant. Applied and environmental microbiology 72: 7365-7367.

Navarro-Muñoz J, Selem-Mojica N, Mullowney M, Kautsar S, Tryon J, Parkinson E et al (2018). A computational framework for systematic exploration of biosynthetic diversity from large-scale genomic data. bioRxiv: 445270.

Newman ME (2006). Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74: 036104.

Nguyen NH, Song Z, Bates ST, Branco S, Tedersoo L, Menke J et al (2016). FUNGuild: an open annotation tool for parsing fungal community datasets by ecological guild. Fungal Ecology 20: 241-248.

Nishikawa H, Iijima T, Kajitani R, Yamaguchi J, Ando T, Suzuki Y et al (2015). A genetic mechanism for female-limited Batesian mimicry in Papilio butterfly. Nature Genetics 47: 405-409.

Nosil P, Funk DJ, Ortiz-Barrientos D (2009). Divergent selection and heterogeneous genomic divergence. Molecular ecology 18: 375-402.

Nützmann HW, Scazzocchio C, Osbourn A (2018). Metabolic Gene Clusters in Eukaryotes. Annual Review of Genetics 52: 159-183.

Nützmann HW, Huang A, Osbourn A (2016). Plant metabolic clusters–from genetics to genomics. New Phytologist 211: 771-789.

O'Maille PE, Malone A, Dellas N, Hess Jr BA, Smentek L, Sheehan I et al (2008). Quantitative exploration of the catalytic landscape separating divergent plant sesquiterpene synthases. Nature chemical biology 4: 617.

Ökmen B, Etalo DW, Joosten MH, Bouwmeester HJ, de Vos RC, Collemare J et al (2013). Detoxification of α-tomatine by Cladosporium fulvum is required for full virulence on tomato. New Phytologist 198: 1203-1214.

Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara B et al (2016). vegan: Community Ecology Package. R package version 2.3-5.

Pagel M, Meade A (2006). Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. The American Naturalist 167: 808-825.

Pál C, Papp B, Lercher MJ (2005). Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nature Genetics 37: 1372-1375.

Palmer JM, Keller NP (2010). Secondary metabolism in fungi: does chromosomal location matter? Current Opinion in Microbiology 13: 431-436.

Paolinelli-Alfonso M, Villalobos-Escobedo JM, Rolshausen P, Herrera-Estrella A, Galindo-Sánchez C, López-Hernández JF et al (2016). Global transcriptional analysis suggests Lasiodiplodia theobromae pathogenicity factors involved in modulation of grapevine defensive response. BMC Genomics 17.

Paradis E, Claude J, Strimmer K (2004). APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289-290.

Pareja-Jaime Y, Roncero MIG, Ruiz-Roldán MC (2008). Tomatinase from Fusarium oxysporum f. sp. lycopersici is required for full virulence on tomato plants. Molecular plant-microbe interactions 21: 728-736.

Parker D, Beckmann M, Zubair H, Enot DP, Caracuel-Rios Z, Overy DP et al (2009). Metabolomic analysis reveals a common pattern of metabolic re-programming during invasion of three host plant species by Magnaporthe grisea. The Plant Journal 59: 723-737.

Parmesan C, Yohe G (2003). A globally coherent fingerprint of climate change impacts across natural systems. Nature 421: 37.

Patron NJ, Waller RF, Cozijnsen AJ, Straney DC, Gardiner DM, Nierman WC et al (2007). Origin and distribution of epipolythiodioxopiperazine (ETP) gene clusters in filamentous ascomycetes. BMC Evolutionary Biology 7: 174.

Peay KG, Garbelotto M, Bruns TD (2010). Evidence of dispersal limitation in soil microorganisms: isolation reduces species richness on mycorrhizal tree islands. Ecology 91: 3631-3640.

Peay KG, Kennedy PG, Talbot JM (2016). Dimensions of biodiversity in the Earth mycobiome. Nature Reviews Microbiology 14: 434-447.

Peng X, Masai E, Kasai D, Miyauchi K, Katayama Y, Fukuda M (2005). A second 5-carboxyvanillate decarboxylase gene, ligW2, is important for lignin-related biphenyl catabolism in Sphingomonas paucimobilis SYK-6. Applied and Environmental Microbiology 71: 5014-5021.

Pepper JW (2003). The evolution of evolvability in genetic linkage patterns. Biosystems 69: 115-126.

Perrin RM, Fedorova ND, Bok JW, Cramer RA, Jr., Wortman JR, Kim HS et al (2007). Transcriptional Regulation of Chemical Diversity in Aspergillus fumigatus by LaeA. PLoS Pathogens 3: e50.

Pfannenstiel BT, Keller NP (2019). On top of biosynthetic gene clusters: How epigenetic machinery influences secondary metabolism in fungi. Biotechnology advances. doi: 10.1016/j.biotechadv.2019.02.001

Piasecka A, Jedrzejczak-Rey N, Bednarek P (2015). Secondary metabolites in plant innate immunity: conserved function of divergent chemicals. New Phytologist 206: 948-964.

Pigné S, Zykwinska A, Janod E, Cuenot S, Kerkoud M, Raulo R et al (2017). A flavoprotein supports cell wall properties in the necrotrophic fungus Alternaria brassicicola. Fungal Biology and Biotechnology 4: 1.

Plissonneau C, Stürchler A, Croll D (2016). The Evolution of Orphan Regions in Genomes of a Fungal Pathogen of Wheat. mBio 7: e01231-01216.

Plumridge A, Melin P, Stratford M, Novodvorska M, Shunburne L, Dyer PS et al (2010). The decarboxylation of the weak-acid preservative, sorbic acid, is encoded by linked genes in Aspergillus spp. Fungal Genetics and Biology 47: 683-692.

Pombert J-F, Haag KL, Beidas S, Ebert D, Keeling PJ (2015). The Ordospora colligata genome: Evolution of extreme reduction in microsporidia and host-to-parasite horizontal gene transfer. MBio 6: e02400-02414.

Porter SS, Faber-Hammond J, Montoya AP, Friesen ML, Sackos C (2019). Dynamic genomic architecture of mutualistic cooperation in a wild population of Mesorhizobium. The ISME journal 13: 301-315.

Price MN, Dehal PS, Arkin AP (2010). FastTree 2–approximately maximum-likelihood trees for large alignments. PloS ONE 5: e9490.

Price MN, Huang KH, Arkin AP, Alm EJ (2005). Operon formation is driven by co-regulation and not by horizontal gene transfer. Genome research 15: 809-819.

Proctor RH, Busman M, Seo J-A, Lee YW, Plattner RD (2008). A fumonisin biosynthetic gene cluster in Fusarium oxysporum strain O-1890 and the genetic basis for B versus C fumonisin production. Fungal Genetics and Biology 45: 1016-1026.

Proctor RH, McCormick SP, Alexander NJ, Desjardins AE (2009). Evidence that a secondary metabolic biosynthetic gene cluster has grown by gene relocation during evolution of the filamentous fungus Fusarium. Molecular microbiology 74: 1128-1142.

Prusky D, Alkan N, Mengiste T, Fluhr R (2013). Quiescent and necrotrophic lifestyle choice during postharvest disease development. Annual Review of Phytopathology 51: 155-176.

Raguso RA, Agrawal AA, Douglas AE, Jander G, Kessler A, Poveda K et al (2015). The raison d'être of chemical ecology. Ecology 96: 617-630.

Ramos J-L, Marqués S, van Dillewijn P, Espinosa-Urgel M, Segura A, Duque E et al (2011). Laboratory research aimed at closing the gaps in microbial bioremediation. Trends in biotechnology 29: 641-647.

Rane RV, Ghodke AB, Hoffmann AA, Edwards OR, Walsh TK, Oakeshott JG (2019). Detoxifying enzyme complements and host use phenotypes in 160 insect species. Current Opinion in Insect Science 31: 131-138.

Reynolds HT, Slot JC, Divon HH, Lysøe E, Proctor RH, Brown DW (2017). Differential retention of gene functions in a secondary metabolite cluster. Molecular biology and evolution 34: 2002-2015.

Reynolds HT, Vijayakumar V, Gluck-Thaler E, Korotkin HB, Matheny PB, Slot JC (2018). Horizontal gene cluster transfer increased hallucinogenic mushroom diversity. Evolution Letters 2: 88-101.

Richards LA, Dyer LA, Forister ML, Smilanich AM, Dodson CD, Leonard MD et al (2015). Phytochemical diversity drives plant–insect community diversity. Proceedings of the National Academy of Sciences 112: 10973-10978.

Richards TA, Dacks JB, Jenkinson JM, Thornton CR, Talbot NJ (2006). Evolution of filamentous plant pathogens: gene exchange across eukaryotic kingdoms. Current Biology 16: 1857-1864.

Richards TA, Soanes DM, Jones MD, Vasieva O, Leonard G, Paszkiewicz K et al (2011). Horizontal gene transfer facilitated the evolution of plant parasitic mechanisms in the oomycetes. Proceedings of the National Academy of Sciences 108: 15258-15263.

Richards TA, Talbot NJ (2013). Horizontal gene transfer in osmotrophs: playing with public goods. Nature Reviews Microbiology 11: 720.

Rokas A, Wisecaver JH, Lind AL (2018). The birth, evolution and death of metabolic gene clusters in fungi. Nature Reviews Microbiology 16: 731-744.

Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S et al (2012). MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic biology 61: 539-542.

Samal A, Wagner A, Martin OC (2011). Environmental versatility promotes modularity in genome-scale metabolic networks. BMC Systems Biology 5.

Saunders M, Kohn LM (2009). Evidence for alteration of fungal endophyte community assembly by host defense compounds. New Phytologist 182: 229-238.

Savary S, Willocquet L, Pethybridge SJ, Esker P, McRoberts N, Nelson A (2019). The global burden of pathogens and pests on major food crops. Nature Ecology & Evolution 3: 430-439.

Savory F, Leonard G, Richards TA (2015). The role of horizontal gene transfer in the evolution of the oomycetes. PLoS pathogens 11: e1004805.

Schäfer W, Straney D, Ciuffetti L, Van Etten H, Yoder O (1989). One enzyme makes a fungal pathogen, but not a saprophyte, virulent on a new host plant. Science 246: 247-249.

Schmaler-Ripcke J, Sugareva V, Gebhardt P, Winkler R, Kniemeyer O, Heinekamp T et al (2009). Production of pyomelanin, a second type of melanin, via the tyrosine degradation pathway in Aspergillus fumigatus. Applied and Environmental Microbiology 75: 493-503.

Schönknecht G, Chen W-H, Ternes CM, Barbier GG, Shrestha RP, Stanke M et al (2013). Gene transfer from bacteria and archaea facilitated evolution of an extremophilic eukaryote. Science 339: 1207-1210.

Schulz T, Stoye J, Doerr D (2018). GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data. BMC genomics 19: 308.

Seifbarghi S, Borhan MH, Wei Y, Coutu C, Robinson SJ, Hegedus DD (2017). Changes in the Sclerotinia sclerotiorum transcriptome during infection of Brassica napus. BMC genomics 18: 266.

Selman M, Pombert J-F, Solter L, Farinelli L, Weiss LM, Keeling P et al (2011). Acquisition of an animal gene by microsporidian intracellular parasites. Current Biology 21: R576-R577.

Shalaby S, Horwitz BA, Larkov O (2012). Structure–Activity Relationships Delineate How the Maize Pathogen Cochliobolus heterostrophus Uses Aromatic Compounds as Signals and Metabolites. Molecular Plant Microbe Interactions 25: 931-940.

Shanmugam V, Ronen M, Shalaby S, Larkov O, Rachamim Y, Hadar R et al (2010). The fungal pathogen Cochliobolus heterostrophus responds to maize phenolics: novel small molecule signals in a plant-fungal interaction. Cellular Microbiology 12: 1421-1434.

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D et al (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research 13: 2498-2504.

Shi-Kunne X, Faino L, van den Berg GC, Thomma BP, Seidl MF (2018). Evolution within the fungal genus Verticillium is characterized by chromosomal rearrangement and gene loss. Environmental microbiology 20: 1362-1373.

Shimodaira H (2002). An approximately unbiased test of phylogenetic tree selection. Systematic biology 51: 492-508.

Shimodaira H, Hasegawa M (2001). CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17: 1246-1247.

Shwab EK, Bok JW, Tribus M, Galehr J, Graessle S, Keller NP (2007). Histone deacetylase activity regulates chemical diversity in Aspergillus. Eukaryotic Cell 6: 1656-1664.

Slot JC (2017). Fungal gene cluster diversity and evolution. Advances in genetics. Elsevier. pp 141-178.

Slot JC, Hibbett DS (2007). Horizontal transfer of a nitrate assimilation gene cluster and ecological transitions in fungi: a phylogenetic study. PLoS ONE 2: e1097.

Slot JC, Rokas A (2010). Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proceedings of the National Academy of Sciences 107: 10136-10141.

Smillie CS, Smith MB, Friedman J, Cordero OX, David LA, Alm EJ (2011). Ecology drives a global network of gene exchange connecting the human microbiome. Nature 480: 241-244.

Smit A, Hubley R, Green P (2015). RepeatMasker Open-4.0. 2013–2015.

Snel B, Bork P, Huynen MA (1999). Genome phylogeny based on gene content. Nature Genetics 21: 108-110.

Snel B, Huynen MA (2004). Quantifying modularity in the evolution of biomolecular systems. Genome Research 14: 391-397.

Snel B, Lehmann G, Bork P, Huynen MA (2000). STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic acids research 28: 3442-3444.

Soanes D, Richards TA (2014). Horizontal gene transfer in eukaryotic plant pathogens. Annual Review of Phytopathology 52: 583-614.

Soanes DM, Chakrabarti A, Paszkiewicz KH, Dawe AL, Talbot NJ (2012). Genome-wide Transcriptional Profiling of Appressorium Development by the Rice Blast Fungus Magnaporthe oryzae. PLoS Pathogens 8: e1002514.

Speed MP, Fenton A, Jones MG, Ruxton GD, Brockhurst MA (2015). Coevolution can explain defensive secondary metabolite diversity in plants. New Phytologist 208: 1251-1263.

Srivastava A, Cho IK, Cho Y (2013). The Bdtf1 gene in Alternaria brassicicola is important in detoxifying brassinin and maintaining virulence on Brassica species. Molecular Plant-Microbe Interactions 26: 1429-1440.

Stamatakis A (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312-1313.

Strese Å, Backlund A, Alsmark C (2014). A recently transferred cluster of bacterial genes in Trichomonas vaginalis-lateral gene transfer and the fate of acquired genes. BMC evolutionary biology 14: 119.

Stukenbrock EH, Dutheil JY (2018). Fine-scale recombination maps of fungal plant pathogens reveal dynamic recombination landscapes and intragenic hotspots. Genetics 208: 1209-1229.

Sturtevant AH (1913). The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association. Journal of experimental zoology 14: 43-59.

Sui X, Golczak M, Zhang J, Kleinberg KA, von Lintig J, Palczewski K et al (2015). Utilization of dioxygen by carotenoid cleavage oxygenases. Journal of Biological Chemistry 290: 30212-30223.

Sui X, Zhang J, Golczak M, Palczewski K, Kiser PD (2016). Key residues for catalytic function and metal coordination in a carotenoid cleavage dioxygenase. Journal of Biological Chemistry 291: 19401-19412.

Sullivan MJ, Petty NK, Beatson SA (2011). Easyfig: a genome comparison visualizer. Bioinformatics 27: 1009-1010.

Sun B-F, Xiao J-H, He S, Liu L, Murphy RW, Huang D-W (2013). Multiple interkingdom horizontal gene transfers in Pyrenophora and closely related species and their contributions to phytopathogenic lifestyles. PLoS One 8: e60029.

Sun G, Yang Z, Kosch T, Summers K, Huang J (2011). Evidence for acquisition of virulence effectors in pathogenic chytrids. BMC evolutionary biology 11: 195.

Susi H, Laine AL (2013). Pathogen life-history trade-offs revealed in allopatry. Evolution 67: 3362-3370.

Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J et al (2014). STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Research 43: D447-D452.

Szöllősi GJ, Davín AA, Tannier E, Daubin V, Boussau B (2015). Genome-scale phylogenetic analysis finds extensive gene transfer among fungi. Philosophical Transactions of the Royal Society B: Biological Sciences 370: 20140335.

Szumilas M (2010). Explaining odds ratios. Journal of the Canadian Academy of Child and Adolescent Psychiatry 19: 227.

Tadych M, Vorsa N, Wang Y, Bergen MS, Johnson-Cicalese J, Polashock JJ et al (2015). Interactions between cranberries and fungi: the proposed function of organic acids in virulence suppression of fruit rot fungi. Frontiers in microbiology 6: 835.

Takiguchi M, Matsubasa T, Amaya Y, Mori M (1989). Evolutionary aspects of urea cycle enzyme genes. Bioessays 10: 163-166.

Team RC (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

Thomas CM, Nielsen KM (2005). Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nature reviews microbiology 3: 711.

Tropf S, Lanz T, Rensing S, Schröder J, Schröder G (1994). Evidence that stilbene synthases have developed from chalcone synthases several times in the course of evolution. Journal of Molecular Evolution 38: 610-618.

Tsaousis AD, de Choudens SO, Gentekaki E, Long S, Gaston D, Stechmann A et al (2012). Evolution of Fe/S cluster biogenesis in the anaerobic parasite Blastocystis. Proceedings of the National Academy of Sciences 109: 10426-10431.

Tsochatzidou M, Malliarou M, Papanikolaou N, Roca J, Nikolaou C (2017). Genome urbanization: clusters of topologically co-regulated genes delineate functional compartments in the genome of Saccharomyces cerevisiae. Nucleic Acids Research 45: 5818-5828.

Van der Meer JR, De Vos WM, Harayama S, Zehnder AJB (1992). Molecular mechanisms of genetic adaptation to xenobiotic compounds. Microbiological Reviews 56: 677-694.

VanEtten H, Matthews P, Tegtmeier K, Dietert M, Stein J (1980). The association of pisatin tolerance and demethylation with virulence on pea in Nectria haematococca. Physiological Plant Pathology 16: 257-268.

VanEtten H, Temporini E, Wasmann C (2001). Phytoalexin (and phytoanticipin) tolerance as a virulence trait: why is it not required by all pathogens? Physiological and Molecular Plant Pathology 59: 83-93.

Wadke N, Kandasamy D, Vogel H, Lah L, Wingfield BD, Paetz C et al (2016). The Bark-Beetle-Associated Fungus, Endoconidiophora polonica, Utilizes the Phenolic Defense Compounds of Its Host as a Carbon Source. Plant Physiology 171: 914-931.

Wagner A (2009). Evolutionary constraints permeate large metabolic networks. BMC evolutionary biology 9: 231.

Wagner A (2011). The origins of evolutionary innovations: a theory of transformative change in living systems. Oxford University Press: Oxford, England.

Wagner GP (1996). Homologues, natural kinds and the evolution of modularity. American Zoologist 36: 36-43.

Walther BA, Ewald PW (2004). Pathogen survival in the external environment and the evolution of virulence. Biological Reviews 79: 849-869.

Walton JD (2000). Horizontal gene transfer and the evolution of secondary metabolite gene clusters in fungi: an hypothesis. Fungal genetics and biology 30: 167-171.

Wang W-S, Zhao X-Q, Li M, Huang L-Y, Xu J-L, Zhang F et al (2015). Complex molecular mechanisms underlying seedling salt tolerance in rice revealed by comparative transcriptome and metabolomic profiling. Journal of experimental botany 67: 405-419.

Wang Y, Lim L, Madilao L, Lah L, Bohlmann J, Breuil C (2014). Gene discovery for enzymes involved in limonene modification or utilization by the mountain pine beetle-associated pathogen Grosmannia clavigera. Applied and environmental microbiology 80: 4566-4576.

Watanabe S, Saimura M, Makino K (2008). Eukaryotic and bacterial gene clusters related to an alternative pathway of non-phosphorylated L-rhamnose metabolism. Journal of Biological Chemistry 283: 20372-20382.

Weber T, Blin K, Duddela S, Krug D, Kim HU, Bruccoleri R et al (2015). antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Research 43: W237-W243.

Werwath J, Arfmann H-A, Pieper DH, Timmis KN, Wittich R-M (1998). Biochemical and genetic characterization of a gentisate 1, 2-dioxygenase from Sphingomonas sp. strain RW5. Journal of bacteriology 180: 4171-4176.

Whitaker JW, McConkey GA, Westhead DR (2009). The transferome of metabolic genes explored: analysis of the horizontal transfer of enzyme encoding genes in unicellular eukaryotes. Genome biology 10: R36.

Wickham H (2016). ggplot2: elegant graphics for data analysis. Springer.

Wisecaver JH, Borowsky AT, Tzin V, Jander G, Kliebenstein DJ, Rokas A (2017). A global coexpression network approach for connecting genes to specialized metabolic pathways in plants. The Plant Cell 29: 944-959.

Wisecaver JH, Slot JC, Rokas A (2014). The evolution of fungal metabolic pathways. PLoS genetics 10: e1004816.

Wong S, Wolfe KH (2005). Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nature genetics 37: 777-782.

Wu C, Kim Y-S, Smith KM, Li W, Hood HM, Staben C et al (2009). Characterization of chromosome ends in the filamentous fungus Neurospora crassa. Genetics 181: 1129-1145.

Yabe K, Nakajima H (2004). Enzyme reactions and genes in aflatoxin biosynthesis. Applied microbiology and biotechnology 64: 745-755.

Yamada T, Kanehisa M, Goto S (2006). Extraction of phylogenetic network modules from the metabolic network. BMC Bioinformatics 7.

Yeaman S (2013). Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proceedings of the National Academy of Sciences 110: E1743-E1751.

Yeaman S, Aeschbacher S, Bürger R (2016). The evolution of genomic islands by increased establishment probability of linked alleles. Molecular ecology 25: 2542-2558.

Yeaman S, Whitlock MC (2011). The genetic architecture of adaptation under migration–selection balance. Evolution 65: 1897-1911.

Zhang H, Rokas A, Slot JC (2012). Two different secondary metabolism gene clusters occupied the same ancestral locus in fungal dermatophytes of the Arthrodermataceae. PLoS ONE 7.

Zhao C, Waalwijk C, de Wit PJGM, Tang D, van der Lee T (2014). Relocation of genes generates non-conserved chromosomal segments in Fusarium graminearum that show distinct and co-regulated gene expression patterns. BMC Genomics 15: 191.

Zhao H, Xu C, Lu H-L, Chen X, Leger RJS, Fang W (2014). Host-to-pathogen gene transfer facilitated infection of insects by a pathogenic fungus. PLoS pathogens 10: e1004009.

Zhao S, Sakai A, Zhang X, Vetting MW, Kumar R, Hillerich B et al (2014). Prediction and characterization of enzymatic activities guided by sequence similarity and genome neighborhood networks. Elife 3: e03275.

Zhao T, Kandasamy D, Krokene P, Chen J, Gershenzon J, Hammerbacher A (2018). Fungal associates of the tree-killing bark beetle, Ips typographus, vary in virulence, ability to degrade conifer phenolics and influence bark beetle tunneling behavior. Fungal Ecology 38: 71-79.

Zhaxybayeva O, Doolittle WF (2011). Lateral gene transfer. Current Biology 21: R242-R246.

Züst T, Heichinger C, Grossniklaus U, Harrington R, Kliebenstein DJ, Turnbull LA (2012). Natural enemies drive geographic variation in plant defenses. Science 338: 116-119.

Appendix A: Chapter 2 Supplementary Materials

Table A.1: Well-supported reports of HGT in Eukaryotic Microbial Pathogens. This table details 21 references with at least one report of HGT among eukaryotic microbial pathogens. Recipient lineage, donor lineage, detection methods, putative contact opportunity, and information on gene functions are listed. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Appendix B: Chapter 3 Supplementary Materials

a) Cluster Detection Pipeline

1. Retrieve all homologs of the anchor gene of interest (output: anchor gene homologs).
2. Retrieve 20 genes upstream and downstream of anchor gene homologs (output: anchor gene neighborhood).
3. Sort neighborhood genes into ortholog groups; discard groups with a maximum distance <0.95 on the microsynteny tree (input: microsynteny tree).
4. Compute unique combinations of ortholog groups found in the same neighborhood (output: cluster motifs).
5. Retain motifs that cover a distance on the synteny tree with a ≤5% probability of being observed in at least one genome, given the size of motif, and the appropriate null distribution (input: null models; output: clusters).
6. Discard clusters that contain genes predicted to exclusively participate in fungal secondary metabolite biosynthesis.
7. More than 0 clusters detected? If no, declare the anchor gene family to be unclustered. If yes, declare the cluster class based on the anchor gene family, then group clusters according to similarities in their homolog group composition (output: cluster models).
8. Repeat for each anchor gene family.
9. Group species according to similarity in the combinations of clusters found in their genomes (output: multi-cluster model profiles).

Figure B.1: Schematics of computational pipelines and enrichment tests. a) Description of the main cluster detection algorithm. b) Description of microsynteny tree pipeline. c) Description of null model pipeline. d) Generalized representation of the contingency tables used for all enrichment analyses.
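The neighborhood retrieval step of the cluster detection pipeline (20 genes upstream and downstream of each anchor gene homolog) can be illustrated with a minimal sketch. It assumes that genes on a contig are available as an ordered list; the function name `gene_neighborhood`, the gene identifiers, and the clipping behavior at contig ends are illustrative assumptions, not details taken from the pipeline itself.

```python
from typing import List, Sequence


def gene_neighborhood(contig_genes: Sequence[str], anchor_index: int,
                      flank: int = 20) -> List[str]:
    """Return the anchor gene plus up to `flank` genes on each side.

    Hypothetical helper: the window is simply clipped when the anchor
    sits close to either end of the contig.
    """
    if not 0 <= anchor_index < len(contig_genes):
        raise IndexError("anchor_index is outside the contig")
    start = max(0, anchor_index - flank)                     # clip at contig start
    end = min(len(contig_genes), anchor_index + flank + 1)   # clip at contig end
    return list(contig_genes[start:end])


# Made-up contig of 100 genes in chromosomal order.
contig = [f"gene{i}" for i in range(100)]
window = gene_neighborhood(contig, anchor_index=50)      # full 41-gene window
edge_window = gene_neighborhood(contig, anchor_index=5)  # clipped on the left
```

With a flank of 20, an anchor in the middle of a contig yields a 41-gene window, whereas an anchor near a contig end yields a shorter one.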

b) Microsynteny Tree Pipeline

1. Retrieve the best hit to a randomly sampled query gene from all genomes.
2. Were >56 best hits retrieved? If no, return to step 1. If yes, continue.
3. Retrieve 10 genes upstream and downstream of all best hits to query gene.
4. Sort neighborhood genes into ortholog groups.
5. Compute pairwise microsyntenic distances between all genomes with query genes.
6. Repeat 1000x.
7. Compute median microsyntenic distance for all pairwise combinations of genomes; randomly sample microsyntenic distance data 100 times with replacement (i.e., bootstrap).
8. Build neighbor-joining tree using the median microsyntenic distance matrix; map bootstrap support to all nodes; collapse nodes with ≤70% support.

Output: microsynteny tree.
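The median and bootstrap steps of the microsynteny tree pipeline can be sketched as follows. The sketch takes the pairwise microsyntenic distances as given (their definition is not restated here) and assumes, for illustration only, that bootstrapping resamples query genes with replacement; the names `median_distances` and `bootstrap_replicates` and the toy data are hypothetical.

```python
import random
from statistics import median
from typing import Dict, FrozenSet, List

Pair = FrozenSet[str]            # an unordered pair of genome names
DistanceSet = Dict[Pair, float]  # distances derived from one query gene


def median_distances(per_query: List[DistanceSet]) -> DistanceSet:
    """Median distance for every genome pair seen in at least one query."""
    pooled: Dict[Pair, List[float]] = {}
    for distances in per_query:
        for pair, value in distances.items():
            pooled.setdefault(pair, []).append(value)
    return {pair: median(values) for pair, values in pooled.items()}


def bootstrap_replicates(per_query: List[DistanceSet], n_replicates: int = 100,
                         seed: int = 0) -> List[DistanceSet]:
    """Resample query genes with replacement (assumed resampling unit)."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        resampled = [rng.choice(per_query) for _ in per_query]
        replicates.append(median_distances(resampled))
    return replicates


# Toy input: three query genes, three genomes (A, B, C).
ab, ac = frozenset({"A", "B"}), frozenset({"A", "C"})
queries = [{ab: 0.25, ac: 0.75}, {ab: 0.5, ac: 0.25}, {ab: 1.0}]
medians = median_distances(queries)
replicates = bootstrap_replicates(queries)
```

Each replicate matrix could then be used to build a replicate tree, from which support values for the nodes of the median-distance tree would be tallied.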

c) Null Model Pipeline

1. Randomly sample a query gene from a randomly selected genome of taxonomic class Y.
2. Extend gene window of size (X-1). Upstream? If no: downstream? If no again, return to step 1. If yes to either, continue.
3. Identify all genomes where homologs of genes in query window are clustered.
4. Calculate distance on microsynteny tree covered by set of genomes with clusters.
5. Repeat 500x; repeat for each window size in the 4-24 range; repeat for each of 12 taxonomic classes.

Output: null model for window of size X in taxonomic class Y.
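One way a null model of this kind could be used to apply the ≤5% threshold from the cluster detection pipeline is sketched below. It assumes that the null model is a list of tree distances from the random windows and that the probability of interest is the upper tail (null distances at least as large as the observed one); both the tail direction and the function names are assumptions made for illustration.

```python
from typing import Sequence


def empirical_upper_tail(observed: float, null_distances: Sequence[float]) -> float:
    """Fraction of null draws with a tree distance >= the observed distance."""
    if not null_distances:
        raise ValueError("null distribution is empty")
    hits = sum(1 for d in null_distances if d >= observed)
    return hits / len(null_distances)


def retain_motif(observed: float, null_distances: Sequence[float],
                 alpha: float = 0.05) -> bool:
    """Keep a motif whose observed distance is improbable under the null."""
    return empirical_upper_tail(observed, null_distances) <= alpha


# Made-up null distribution of 100 distances (arbitrary units).
null_model = [float(i) for i in range(100)]
p_far = empirical_upper_tail(97.0, null_model)   # 3 of 100 draws are >= 97
p_mid = empirical_upper_tail(50.0, null_model)   # 50 of 100 draws are >= 50
```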

d) Generalized contingency table for one-tailed Fisher’s exact tests of enrichment

| | Number of (genomes/species) associated WITH ecological lifestyle | Number of (genomes/species) NOT associated with ecological lifestyle |
| --- | --- | --- |
| Number of (genomes/species) WITH (cluster/cluster model/multi-cluster model profile) | n11 | n12 |
| Number of (genomes/species) WITHOUT (cluster/cluster model/multi-cluster model profile) | n21 | n22 |
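For a 2x2 table laid out as above, the one-tailed (enrichment) Fisher's exact test is the upper tail of a hypergeometric distribution. The following is a minimal standard-library sketch rather than the code used for the analyses; the function name is hypothetical and the example counts are invented.

```python
from math import comb


def fisher_exact_greater(n11: int, n12: int, n21: int, n22: int) -> float:
    """One-tailed Fisher's exact test: P(X >= n11) with all margins fixed."""
    row1 = n11 + n12                 # units WITH the cluster
    col1 = n11 + n21                 # units WITH the lifestyle
    total = n11 + n12 + n21 + n22
    upper = min(row1, col1)          # largest value n11 could take
    # Sum hypergeometric probabilities for tables at least as extreme.
    tail = sum(comb(col1, k) * comb(total - col1, row1 - k)
               for k in range(n11, upper + 1))
    return tail / comb(total, row1)


# Invented counts: 3 of 4 genomes with the cluster share the lifestyle,
# versus 1 of 4 genomes without the cluster.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

Because Python integers have arbitrary precision, the same function can be applied to tables with large margins without overflow in the binomial coefficients.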

Figure B.2: The distribution of candidate phenylpropanoid-degrading gene clusters in fungi. The data shown in this figure are identical to those presented in Figure 1, except that the full species names of published fungal genomes are displayed on the microsynteny tree, along with their taxonomic order and class. Branch length distances (in black) and bootstrap support values (in red) are additionally displayed above their corresponding branches. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Figure B.3: Distributions of gene window size for all gene cluster sizes. The distributions of gene window size (i.e., the number of genes in the genomic region over which the gene cluster spans) for each gene cluster size in each cluster class are depicted as Tukey-style box-and-whisker plots, where the 25th, 50th, and 75th percentiles are represented by the lower, middle and upper hinges, respectively. Lower whiskers extend to the 25th percentile - 1.5 * IQR, and upper whiskers extend to the 75th percentile + 1.5 * IQR, where IQR is the interquartile range (defined as the distance between the 75th and 25th percentiles). Outliers (as determined by Tukey’s method) are represented as black points that lie beyond the whisker boundaries. Not all gene cluster sizes are present in all cluster classes. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.
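The box-and-whisker quantities described in this caption can be computed as in the sketch below. Two choices are assumptions rather than details from the caption: quartiles use the "inclusive" interpolation of Python's `statistics.quantiles` (plotting software may interpolate differently), and whiskers are drawn to the most extreme data points that still lie within the 1.5 * IQR fences, a common plotting convention.

```python
from statistics import quantiles
from typing import Dict, Sequence


def tukey_summary(values: Sequence[float]) -> Dict[str, object]:
    """Hinges, whisker ends and outliers for a Tukey-style box plot."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "hinges": (q1, q2, q3),
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),   # extreme points within fences
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Invented gene window sizes with one extreme value.
summary = tukey_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```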

Figure B.4: Associations between cluster model presence and fungal lifestyle. Odds ratios representing the strength of the association between cluster model presence and ecological lifestyle are shown for all cluster models in all cluster classes, for each of 6 fungal ecological lifestyles, using data from Pezizomycotina species. Dark grey bars indicate enrichment below a significance level of 0.05, while black outlines indicate enrichment below a significance level of 0.01. Error bars indicate the 95% confidence interval (CI) for each odds ratio measurement. CIs of 0 are not shown. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.
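As an illustration of how an odds ratio and its 95% CI relate to a 2x2 contingency table, the sketch below computes the sample odds ratio with a Wald interval on the log scale. This estimator is an assumption chosen for simplicity; the figure may be based on a different estimator (for example, a conditional maximum likelihood estimate), and the counts shown are invented.

```python
from math import exp, log, sqrt
from typing import Tuple


def odds_ratio_wald(n11: int, n12: int, n21: int, n22: int,
                    z: float = 1.959964) -> Tuple[float, float, float]:
    """Sample odds ratio with a Wald confidence interval (z=1.96 for 95%)."""
    if min(n11, n12, n21, n22) == 0:
        raise ValueError("Wald interval is undefined when a cell is zero")
    estimate = (n11 * n22) / (n12 * n21)
    # Standard error of the log odds ratio.
    se = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    lower = exp(log(estimate) - z * se)
    upper = exp(log(estimate) + z * se)
    return estimate, lower, upper


# Invented table: 10 of 15 species with the cluster model have the lifestyle,
# versus 5 of 15 species without it.
estimate, lower, upper = odds_ratio_wald(10, 5, 5, 10)
```

An interval that spans 1, as in this invented example, would indicate that the association is not distinguishable from no association at that confidence level.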

Figure B.5: The distribution of candidate phenylpropanoid-degrading gene cluster models in fungi. Displayed on the left is the microsynteny species tree containing all 529 genomes used in this study. Taxonomic order and class names are indicated below species names. Branch length distances (in black) and bootstrap support values (in red) are displayed above their corresponding branches. To the right are heatmaps displaying the ecological lifestyles associated with each genome (color coded by lifestyle) and the presence/absence of all 56 unique cluster types (cluster models) detected in each genome. The 6 largest taxonomic classes are additionally indicated to the right of the heatmaps. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

| Anchor gene name | Code | Clusters detected? | Annotation quality | Substrate type | Sequence source |
| --- | --- | --- | --- | --- | --- |
| Quinate 5-dehydrogenase | QDH | Yes | Evidence at transcriptional response level | Free phenolic | https://www.ncbi.nlm.nih.gov/protein/XP_014080163.1 |
| Naringenin 3-dioxygenase | NAD | Yes | Evidence at transcriptional response level | Free phenolic | https://www.ncbi.nlm.nih.gov/protein/XP_003854249.1 |
| Aromatic ring-opening dioxygenase | ARD | Yes | Evidence at transcriptional response level | Free phenolic | https://www.ncbi.nlm.nih.gov/protein/XP_003719196.1 |
| Phenol 2-monooxygenase | PMO | Yes | Evidence at transcriptional response level | Free phenolic | https://www.ncbi.nlm.nih.gov/protein/EGP84432.1 |
| Pterocarpan hydroxylase | PAH | Yes | Experimental evidence at protein level | Free phenolic | http://www.ncbi.nlm.nih.gov/protein/AAC49410.1 |
| Catechol dioxygenase | CCH | Yes | Evidence at transcriptional response level | Free phenolic | https://www.ncbi.nlm.nih.gov/protein/EMD97186.1 |
| Benzoate 4-monooxygenase | BPH | Yes | Experimental evidence at protein level | Free phenolic | http://www.uniprot.org/uniprot/P17549 |
| Salicylate hydroxylase | SAH | Yes | Experimental evidence at protein level | Free phenolic | http://www.ncbi.nlm.nih.gov/protein/724090703 |
| Epicatechin laccase | ECL | Yes | Experimental evidence at protein level | Free phenolic | https://www.ncbi.nlm.nih.gov/protein/EQB50152.1 |
| Stilbene dioxygenase | SDO | Yes | Experimental evidence at protein level | Free phenolic | http://www.ncbi.nlm.nih.gov/protein/EAA32528.1 |
| Vanillyl alcohol oxidase | VAO | Yes | Experimental evidence at protein level | Free phenolic | http://www.uniprot.org/uniprot/P56216 |
| Ferulic acid esterase 7 | CAE | Yes | Predicted protein | Phenolic polymer | http://www.uniprot.org/uniprot/A2QIA0 |
| Ferulic acid decarboxylase | FAD | Yes | Experimental evidence at protein level | Free phenolic | http://www.uniprot.org/uniprot/Q03034 |
| Ferulic acid esterase 1 | cae1 | No | Experimental evidence at protein level | Phenolic polymer | http://www.uniprot.org/uniprot/Q8WZI8 |
| Ferulic acid esterase 2 | cae2 | No | Experimental evidence at protein level | Phenolic polymer | http://www.uniprot.org/uniprot/Q9Y871 |
| Ferulic acid esterase 3 | cae3 | No | Experimental evidence at protein level | Phenolic polymer | http://www.uniprot.org/uniprot/Q9HE18 |
| Ferulic acid esterase 4 | cae4 | No | Inferred from homology | Phenolic polymer | http://www.uniprot.org/uniprot/A2QZI3 |
| Ferulic acid esterase 5 | cae5 | No | Experimental evidence at protein level | Phenolic polymer | http://www.uniprot.org/uniprot/Q9P979 |
| Ferulic acid esterase 6 | cae6 | No | Predicted protein | Phenolic polymer | http://www.uniprot.org/uniprot/Q0CI40 |
| Tannase 1 | tan1 | No | Experimental evidence at protein level | Free phenolic | https://www.ncbi.nlm.nih.gov/protein/218059668 |
| Tannase 2 | tan2 | No | Experimental evidence at protein level | Free phenolic | http://www.uniprot.org/uniprot/P78581 |
| Versatile peroxidase | vpl1 | No | Experimental evidence at protein level | Phenolic polymer | http://www.uniprot.org/uniprot/Q9UR19 |
| Quercetin dioxygenase | qdo | No | Experimental evidence at protein level | Free phenolic | https://www.ncbi.nlm.nih.gov/protein/157139497 |
| Manganese peroxidase | mnp1 | No | Experimental evidence at protein level | Phenolic polymer | http://www.uniprot.org/uniprot/Q02567 |
| Lignin peroxidase | lpo | No | Experimental evidence at protein level | Phenolic polymer | http://www.uniprot.org/uniprot/P06181 |
| Rutinosidase | rbg | No | Experimental evidence at protein level | Free phenolic | http://www.uniprot.org/uniprot/A2QQ32 |
| Isoflavone reductase | ifr1 | No | Evidence at transcriptional response level | Free phenolic | https://www.ncbi.nlm.nih.gov/protein/EGP83788.1 |

Table B.1: Information concerning anchor gene family queries.

Table B.2: Genome metadata. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

| Anchor gene family | Clusters that also contain an anchor from another cluster class | Clusters with genes predicted to participate exclusively in fungal secondary metabolism biosynthesis that were removed from analysis |
| --- | --- | --- |
| Quinate 5-dehydrogenase | 22 | 75 |
| Naringenin 3-dioxygenase | 0 | 0 |
| Aromatic ring-opening dioxygenase | 1 | 40 |
| Phenol 2-monooxygenase | 6 | 84 |
| Pterocarpan hydroxylase | 0 | 68 |
| Catechol dioxygenase | 30 | 72 |
| Benzoate 4-monooxygenase | 1 | 0 |
| Salicylate hydroxylase | 0 | 134 |
| Epicatechin laccase | 0 | 26 |
| Stilbene dioxygenase | 0 | 69 |
| Vanillyl alcohol oxidase | 2 | 30 |
| Ferulic acid esterase | 0 | 0 |
| Ferulic acid decarboxylase | 0 | 27 |

Table B.3: Additional information concerning clusters retrieved with each anchor gene family.

fuNOG and GO annotation key words: pks nrps polyketide fatty acid synthase nonribosomal peptide peptide synthetase condensation domain trichothecene 3-o-acetyltransferase citrinin aflatoxin norsolorinic ketoacyl snoal acyl carrier protein enoyl-\(acyl phosphopantetheine indole-diterpene tryptophan dimethylallyltransferase dimethylallyl tryptophan synthase antibiotic terpene

PFAM terms: PF14765, PF16073, PF08493, PF02801, PF00109, PF16197, PF13577, PF13561, PF00550, PF00668, PF00550, PF00733, PF07366

Table B.4: Key words associated with secondary metabolite biosynthesis used to remove genes predicted to participate exclusively in biosynthetic metabolism.
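A key word filter of this kind could be sketched as follows. Only a subset of the listed key words and PFAM terms is used, substring matching is an assumption (the matching rule is not specified here), and the sketch flags any matching gene without attempting to judge whether it participates exclusively in biosynthesis. Gene annotations and cluster names are invented.

```python
from typing import Dict, Iterable, List, Tuple

# Subset of the key words and PFAM terms in Table B.4 (illustrative only).
SM_KEYWORDS = ("pks", "nrps", "citrinin", "aflatoxin", "antibiotic", "terpene")
SM_PFAMS = {"PF00109", "PF00550", "PF00668"}

Gene = Tuple[str, List[str]]  # (free-text annotation, PFAM accessions)


def is_sm_gene(annotation: str, pfams: Iterable[str]) -> bool:
    """Flag a gene whose annotation or domains match the key word lists."""
    text = annotation.lower()
    # Assumed rule: case-insensitive substring match, which can over-match.
    return any(word in text for word in SM_KEYWORDS) or bool(SM_PFAMS & set(pfams))


def drop_sm_clusters(clusters: Dict[str, List[Gene]]) -> Dict[str, List[Gene]]:
    """Keep only clusters in which no gene is flagged."""
    return {name: genes for name, genes in clusters.items()
            if not any(is_sm_gene(a, p) for a, p in genes)}


# Invented clusters for demonstration.
candidates = {
    "cluster_1": [("quinate dehydrogenase", []), ("sugar transporter", [])],
    "cluster_2": [("putative PKS", []), ("hydrolase", [])],
    "cluster_3": [("hypothetical protein", ["PF00550"])],
}
kept = drop_sm_clusters(candidates)
```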

Table B.5: All candidate gene clusters detected in study. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Table B.6: Homolog groups present in ≥75% of clusters assigned to a given model. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Table B.7: Total number of clusters per cluster model in species from each taxonomic class. Multiple instances of the same cluster in fungi belonging to the same species were not counted. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

| KOG process | Total number of fuNOG groups in database | Total number of fuNOG groups assigned to clustered proteins | Total number of fuNOG groups predicted to participate in process | Total number of fuNOG groups predicted to participate in process and assigned to clustered proteins | p-value |
| --- | --- | --- | --- | --- | --- |
| Secondary metabolites biosynthesis, transport and catabolism | 39446 | 380 | 1488 | 57 | 0 |
| Energy production and conversion | 39446 | 380 | 888 | 44 | 4.15001E-13 |
| Carbohydrate transport and metabolism | 39446 | 380 | 2033 | 65 | 2.63E-12 |
| Coenzyme transport and metabolism | 39446 | 380 | 343 | 20 | 1.98652E-10 |
| Amino acid transport and metabolism | 39446 | 380 | 929 | 31 | 2.81E-09 |
| Lipid transport and metabolism | 39446 | 380 | 863 | 20 | 0.00031546 |
| Transcription | 39446 | 380 | 1188 | 20 | 0.01200192 |
| RNA processing and modification | 39446 | 380 | 560 | 9 | 0.09439707 |
| Inorganic ion transport and metabolism | 39446 | 380 | 451 | 6 | 0.2696099 |
| Nuclear structure | 39446 | 380 | 63 | 1 | 0.4568283 |
| Defense mechanisms | 39446 | 380 | 169 | 2 | 0.4854798 |
| Translation, ribosomal structure and biogenesis | 39446 | 380 | 638 | 6 | 0.5798240 |
| Cell wall membrane envelope biogenesis | 39446 | 380 | 362 | 3 | 6.79E-01 |
| Nucleotide transport and metabolism | 39446 | 380 | 265 | 2 | 0.7259077 |
| Intracellular trafficking, secretion, and vesicular transport | 39446 | 380 | 1065 | 7 | 0.8898333 |
| Chromatin structure and dynamics | 39446 | 380 | 292 | 1 | 9.41E-01 |
| Replication, recombination and repair | 39446 | 380 | 668 | 3 | 0.9568002 |
| Signal transduction mechanisms | 39446 | 380 | 1123 | 6 | 0.9609165 |
| Cell cycle control, cell division, chromosome partitioning | 39446 | 380 | 355 | 1 | 9.68E-01 |
| Posttranslational modification, protein turnover, chaperones | 39446 | 380 | 1602 | 8 | 0.9876903 |
| Cell motility | 39446 | 380 | 5 | 0 | 1.00E+00 |
| Cytoskeleton | 39446 | 380 | 290 | 0 | 1 |
| Extracellular structures | 39446 | 380 | 24 | 0 | 1 |
| Function unknown | 39446 | 380 | 23782 | 68 | 1 |

Table B.8: Enrichment of KOG processes in clustered proteins (one-tailed Fisher's exact test).

| KOG process | Total number of fuNOG groups in database | Total number of fuNOG groups assigned to proteins in shared homolog groups | Total number of fuNOG groups predicted to participate in process | Total number of fuNOG groups predicted to participate in process and assigned to proteins in shared homolog group | p-value |
| --- | --- | --- | --- | --- | --- |
| Carbohydrate transport and metabolism | 39446 | 132 | 2033 | 31 | 3.56E-13 |
| Secondary metabolites biosynthesis, transport and catabolism | 39446 | 132 | 1488 | 26 | 3.41E-12 |
| Energy production and conversion | 39446 | 132 | 888 | 20 | 1.94E-11 |
| Coenzyme transport and metabolism | 39446 | 132 | 343 | 11 | 2.49E-08 |
| Amino acid transport and metabolism | 39446 | 132 | 929 | 9 | 0.00415203 |
| Lipid transport and metabolism | 39446 | 132 | 863 | 7 | 0.02641278 |
| Transcription | 39446 | 132 | 1188 | 4 | 0.56497257 |
| Translation, ribosomal structure and biogenesis | 39446 | 132 | 638 | 2 | 0.632107987 |
| Intracellular trafficking, secretion, and vesicular transport | 39446 | 132 | 1065 | 3 | 0.69511919 |
| Signal transduction mechanisms | 39446 | 132 | 1123 | 3 | 0.72890205 |
| RNA processing and modification | 39446 | 132 | 560 | 1 | 0.84901080 |
| Cell cycle control, cell division, chromosome partitioning | 39446 | 132 | 355 | 0 | 1 |
| Cell wall membrane envelope biogenesis | 39446 | 132 | 362 | 0 | 1 |
| Chromatin structure and dynamics | 39446 | 132 | 292 | 0 | 1 |
| Cell motility | 39446 | 132 | 5 | 0 | 1 |
| Cytoskeleton | 39446 | 132 | 290 | 0 | 1 |
| Defense mechanisms | 39446 | 132 | 169 | 0 | 1 |
| Extracellular structures | 39446 | 132 | 24 | 0 | 1 |
| Function unknown | 39446 | 132 | 23782 | 15 | 1 |
| Inorganic ion transport and metabolism | 39446 | 132 | 451 | 0 | 1 |
| Nuclear structure | 39446 | 132 | 63 | 0 | 1 |
| Nucleotide transport and metabolism | 39446 | 132 | 265 | 0 | 1 |
| Posttranslational modification, protein turnover, chaperones | 39446 | 132 | 1602 | 0 | 1 |
| Replication, recombination and repair | 39446 | 132 | 668 | 0 | 1 |

Table B.9: Enrichment of KOG processes in proteins that are part of homolog groups present in multiple cluster classes (i.e., shared homolog groups).

Table B.10: Information concerning homolog groups present in multiple cluster classes (i.e., shared homolog groups). This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

| File | N.modules | Modularity | P.Null1 | P.Null2 |
| --- | --- | --- | --- | --- |
| co-occurrence_network | 6 | 0.651784 | 0 | 0 |

Table B.11: Output from MODULAR analysis with spectral partitioning.

Appendix C: Chapter 4 Supplementary Materials

[Plot: external standard curve. x-axis: Conc. of butylated hydroxyanisole (ng/µL), 0 to 2500; y-axis: Area (µV*sec), 0.E+00 to 7.E+06; linear fit: y = 2376.1x - 204691, R² = 0.99803.]

Figure C.1: External standard curve of butylated hydroxyanisole (BHA) used for the quantification of cleavage products resulting from stilbene cleavage oxygenase activity.
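The linear fit reported with this standard curve (y = 2376.1x - 204691, with y as peak area in µV*sec and x as BHA concentration in ng/µL) can be inverted to estimate a concentration from a measured peak area. The sketch below shows only that arithmetic; the function names are hypothetical, and it makes no claim about how cleavage products were expressed relative to BHA in the actual analysis.

```python
SLOPE = 2376.1         # peak area (µV*sec) per ng/µL, from the fitted line
INTERCEPT = -204691.0  # peak area (µV*sec) at zero concentration, from the fit


def area_from_concentration(conc_ng_per_ul: float) -> float:
    """Predicted peak area for a given BHA concentration."""
    return SLOPE * conc_ng_per_ul + INTERCEPT


def concentration_from_area(area: float) -> float:
    """Invert the fitted line to estimate concentration from peak area."""
    return (area - INTERCEPT) / SLOPE


# Round trip for a concentration inside the plotted range.
predicted_area = area_from_concentration(1000.0)
recovered = concentration_from_area(predicted_area)
```

Because the fitted intercept is negative, a peak area of zero maps to roughly 86 ng/µL, so estimates near the low end of the curve should be read with caution.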

Figure C.2: Maximum likelihood tree of bacterial, plant and fungal amino acid sequences from the carotenoid cleavage dioxygenase and stilbene cleavage oxidase families. Bootstrap support values (out of 100) are indicated above each branch. Schematics of gene clusters associated with each sequence are drawn to scale to the right of the tree, where only homolog groups that are part of the three different cluster models identified in Figure 1 (main text) are shown and color-coded. The name of each cluster is indicated on top of each schematic, and corresponds to the information listed in Table C.5. The background of each cluster schematic is colored according to the model to which it was assigned (model 1: orange; model 2: green; model 3: blue; hybrid model 2+3: purple). Sequences with stilbene cleavage activity that have been functionally characterized in previous studies have a light brown background (bacterial origins and accessions: Sphingomonas paucimobilis: AAC60447.2 and AAB35856.2; Novosphingobium aromaticivorans: YP_496081.1 and YP_498079.1; fungal origins, protein ids and accessions: Ustilago maydis: UmRCO, Ustma1_5084, XP_761231; Aspergillus fumigatus: AfRCO, AspfuA11631_109054, XP_746307; Chaetomium globosum: CgRCO, Chagl1_10054, XP_001219451; Botrytis cinerea: BcRCO, Botci1_1568, XP_001548426; Neurospora crassa: Cao-1, Neucr2_6499, XP_961764.1). Sequences with stilbene cleavage activity that have been functionally characterized in this study have a gold background (fungal origins, protein ids and accessions: Podospora anserina: PaSCO, Podan2_2307, MH350429; Magnaporthe oryzae: MoSCO, MH350428; Diplodia sapinea: DsSCO, MH350427; Penicillium roqueforti: PrSCO, PenroFM164_584405910, MH350430). Sequences with apocarotenoid cleavage activity that have been functionally characterized in previous studies have a pink background (bacterial origins and accessions: Synechocystis sp. strain PCC 6803: BAA18428.1; Nostoc sp. PCC7120: BAB75983.1; plant origins and accessions: Arabidopsis thaliana: NP_193652.1 and NP_195007.2). Sequences with carotenoid cleavage activity that have been functionally characterized in previous studies have an orange-red background (fungal origins, protein ids/enzyme name and accessions: Neurospora crassa: Cao-2, XP_001727958.1; Fusarium fujikuroi: Fusfu1_14330, CarX, CAH70723.1 and CarT, CAL90971.1; plant origins and accessions: Zea mays: AAB62181.2 and ABF85668.1; Malus domestica: ABY47995.1; Arabidopsis thaliana: NP_191911.1 and NP_001318427.1). Resolution is retained for on-screen examination of sequence accession numbers when magnified. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Figure C.3: Maximum likelihood tree of fungal amino acid sequences from the stilbene cleavage oxidase family with outgroup of characterized stilbene cleaving bacterial enzymes. Tree annotations follow those described in Figure C.2. Resolution is retained for on-screen examination of sequence accession numbers when magnified.

Figure C.4: Maximum likelihood tree of 466 monophyletic fungal amino acid sequences from the stilbene cleavage oxidase (SCO) clade examined in this study. Tree annotations follow those described in Figure C.2. Resolution is retained for on-screen examination of sequence accession numbers when magnified. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Figure C.5: 50% majority rule consensus Bayesian tree of 466 monophyletic fungal amino acid sequences from the stilbene cleavage oxidase (sco) clade examined in this study. Unique node identifiers are written above each branch, and terminate in a numerical suffix indicating posterior probability (ranging from 0-1). Tree tips are labeled with binomial species names, taxonomic classes and protein IDs, and are color coded by the cluster model to which they belong. Histograms depicting the median probability of reconstructed ancestral clustered states (ranging from 0 to 1; 4 states: unclustered (red), model 1 (orange), model 2 (green), model 3 (blue)) are drawn above their respective branches. Dotted gray lines artificially extend branches that would otherwise be too short to accommodate histograms. Nodes with inferred transition, gain and fusion events (listed in Table S11) are outlined in squares color coded by cluster model. All other tree annotations follow those described in Figure C.2. Resolution is retained for on-screen examination of sequence accession numbers when magnified. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Figure C.6: Putative replacement of vanillyl alcohol oxidase (vao) within a model 3 stilbene cleavage oxygenase (sco) cluster. The tree on the left was extracted from the 50% majority rule consensus Bayesian sco protein phylogeny (Figure C.5). In this tree, unique node identifiers that terminate in a numerical suffix indicating posterior probability (ranging from 0-1) are written above each branch with ≥0.95 posterior probability support. Histograms depicting the median probability of reconstructed ancestral clustered states (ranging from 0 to 1) are drawn above their respective branches. Dotted gray lines artificially extend branches that would otherwise be too short to accommodate histograms. The tree on the right was extracted from the vao maximum likelihood phylogeny (Figure C.12). Bootstrap support values ≥70 (out of 100) are drawn above their respective branches. In both trees, tips are labeled with binomial species names, taxonomic classes and protein IDs, and are color coded by the cluster model to which they belong. Solid black lines between tree tips indicate sequences found in the same cluster. Red lines indicate sequences whose clustered association is inferred to have arisen through a gene replacement. Note that the placement of Aspgl1_76738 (model 3) in the 50% majority rule consensus tree conflicts with its placement in the maximum likelihood tree (Figure C.4), and furthermore is likely monophyletic with the vao sequences in model 3 clusters descending from node_510_1.00 (see test 6 in Table C.3). Resolution is retained for on-screen examination of sequence accession numbers when magnified. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Figure C.7: Maximum likelihood tree of bacterial and fungal amino acid sequences from the 2,3-dihydroxybenzoate decarboxylase (dhbd) family. Bootstrap support values (out of 100) are indicated above each branch. Sequences found in model 1 stilbene cleavage oxidase (sco) clusters are colored orange, and the protein id of the co-clustered SCO is indicated to their right in red. Sequences with DHBD activity that have been functionally characterized in previous studies have a light brown background (bacterial origin and accession: Sphingomonas paucimobilis: WP_014075111.1; fungal origin and accession: Aspergillus niger: P80346.1). Resolution is retained for on-screen examination of sequence accession numbers when magnified. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Figure C.8: Maximum likelihood tree of the monophyletic clade containing all 2,3-dihydroxybenzoate decarboxylase (dhbd) genes found clustered with stilbene cleavage oxygenase, based on amino acid sequences. Tree annotations follow those described in Figure C.7. Resolution is retained for on-screen examination of sequence accession numbers when magnified. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Figure C.9: Maximum likelihood tree of bacterial and fungal amino acid sequences from the gentisate 1,2 dioxygenase (gdo) family. Bootstrap support values (out of 100) are indicated above each branch. Sequences found in model 2 stilbene cleavage oxidase (sco) clusters are colored green, and the protein id of the co-clustered sco is indicated to their right in red. Sequences with GDO activity that have been functionally characterized in previous studies have a light brown background (bacterial origin and accession: Rhodococcus sp. NCIMB 12038: ADT78164.1). Resolution is retained for on-screen examination of sequence accession numbers when magnified. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Figure C.10: Maximum likelihood tree of the monophyletic clade containing all gentisate 1,2 dioxygenase (gdo) genes found clustered with stilbene cleavage oxygenase, based on amino acid sequences. Tree annotations follow those described in Figure C.9. Resolution is retained for on-screen examination of sequence accession numbers when magnified. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Figure C.11: Maximum likelihood tree of bacterial and fungal amino acid sequences from the vanillyl alcohol oxidase (vao) family. Bootstrap support values (out of 100) are indicated above each branch. Sequences found in model 3 stilbene cleavage oxidase (sco) clusters are colored blue, and the protein id of the co-clustered sco is indicated to their right in red. Sequences with VAO activity that have been functionally characterized in previous studies have a light brown background (fungal origin and accession: Penicillium simplicissimum: P56216.1). Resolution is retained for on-screen examination of sequence accession numbers when magnified. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Figure C.12: Maximum likelihood tree of the monophyletic clade containing all vanillyl alcohol oxidase (vao) genes found clustered with stilbene cleavage oxygenase, based on amino acid sequences. Tree annotations follow those described in Figure C.11. Resolution is retained for on-screen examination of sequence accession numbers when magnified. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Figure C.13: A rooted, constrained maximum likelihood phylogeny based on the amino acid sequences of the second largest subunit of RNA polymerase II (rpb2) representing relationships among all 288 Pezizomycotina genomes examined in this study and 9 outgroup fungal genomes. This phylogeny contains all 212 genomes that have at least 1 sco homolog (of which 203 are from the Pezizomycotina) from a database of 549 genomes. Tips are labeled with genome codes, binomial species names and taxonomic class. Bootstrap support values (out of 100) are indicated above each branch. Schematics of gene clusters associated with each taxon are drawn to scale to the right of the tree, where only homolog groups that are part of the three different cluster models identified in Figure 4.1 (Chapter 4 main text) are shown and color-coded. The name of each cluster is indicated on top of each schematic, and corresponds to the information listed in Table C.5. The background of each cluster schematic is colored according to the model to which it was assigned (model 1: orange; model 2: green; model 3: blue; hybrid model 2+3: purple). Resolution is retained for on-screen examination of sequence accession numbers when magnified. This figure’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Figure C.14 plot. Panel A: y-axis, number of taxonomic classes represented among descendants of each node on gene tree (ticks at 2.5, 5.0, 7.5). Panel B: y-axis, percentage of total branch length distance of rpb2 tree covered by descendants of each node on gene tree (0.00 to 1.00). x-axis in both panels, gene family (internal nodes): rpb2 (n=296), sco (n=465), dhbd (n=90), gdo (n=156), vao (n=172). Significance letters above the box plots, in the order extracted: panel A: b, d, ab, ac, cd; panel B: a, b, ab, ab, b.

Figure C.14: Phylogenetic diversity measurements of key clustered and housekeeping gene families. Box-and-whisker plots display distributions of taxonomic diversity (A) or Faith's phylogenetic diversity (B) associated with phylogenetic trees of key gene families: second largest subunit of RNA polymerase II (rpb2; housekeeping gene); stilbene cleavage oxidase (sco; found in all models); dihydroxybenzoate decarboxylase (dhbd; model 1); gentisate 1,2-dioxygenase (gdo; model 2); and vanillyl alcohol oxidase (vao; model 3). The 25th, 50th and 75th percentiles are represented by the lower, middle and upper hinges of the box plots; dots represent outliers as determined by Tukey's method; whiskers extend to ±1.5 × the interquartile range. Taxonomic diversity is defined as the number of taxonomic classes represented in the descendants of an internal node on a given gene tree, and Faith's phylogenetic diversity is defined as the sum of the branch lengths corresponding to the minimum spanning path between all descendants of an internal node on a given gene tree, as measured on the rpb2 housekeeping tree that closely tracks accepted species relationships. Different lower case letters above each boxplot indicate significant differences (α = 0.05) among distributions, as determined by Dunn's test for multiple comparisons, with p-values subsequently adjusted by the Benjamini-Hochberg procedure.
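The two diversity measures defined in the caption can be illustrated with a minimal sketch. It is not the code used in this study: the function names, the parent-pointer tree representation, and the three-tip toy tree with its class labels are all assumptions made for illustration. The spanning length is computed as the sum of branches that separate at least one selected tip from at least one other, which corresponds to the minimum spanning path between the selected tips.

```python
from collections import Counter


def spanning_length(parent, branch_length, tips):
    """Sum of branch lengths on the minimal subtree connecting `tips`.

    parent maps each node to its parent (the root maps to None);
    branch_length maps each non-root node to the length of the branch above it.
    """
    tips = set(tips)
    tips_below = Counter()
    for tip in tips:
        node = tip
        while parent[node] is not None:
            tips_below[node] += 1  # the branch above `node` leads to this tip
            node = parent[node]
    # A branch with every selected tip below it lies above their common
    # ancestor, so it is not part of the path that connects them.
    return sum(branch_length[n] for n, k in tips_below.items() if k < len(tips))


def covered_fraction(parent, branch_length, tips):
    """Spanning length as a fraction of the total length of the reference tree."""
    return spanning_length(parent, branch_length, tips) / sum(branch_length.values())


def taxonomic_diversity(tip_class, tips):
    """Number of distinct taxonomic classes among the selected tips."""
    return len({tip_class[t] for t in tips})


# Hypothetical toy reference tree: ((A:1.0, B:1.5)X:0.5, C:2.0)root
PARENT = {"root": None, "X": "root", "C": "root", "A": "X", "B": "X"}
LENGTH = {"X": 0.5, "C": 2.0, "A": 1.0, "B": 1.5}
CLASS = {"A": "Sordariomycetes", "B": "Sordariomycetes", "C": "Eurotiomycetes"}

# Descendants of a hypothetical gene-tree node, mapped onto the reference tree.
print(spanning_length(PARENT, LENGTH, ["A", "C"]))
print(covered_fraction(PARENT, LENGTH, ["A", "C"]))
print(taxonomic_diversity(CLASS, ["A", "C"]))
```

In this sketch a single tip has a spanning length of zero, because no path is needed to connect it to itself.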

Figure C.15 plot. Title: Transition rate coefficients sampled through reverse-jump MCMC. y-axis: sampled transition rate coefficient (ticks at 0, 2, 4, 6). x-axis: transition rate description, covering twelve transitions: unclustered to model 1; model 1 to unclustered; unclustered to model 2; model 2 to unclustered; unclustered to model 3; model 3 to unclustered; model 1 to model 2; model 2 to model 1; model 1 to model 3; model 3 to model 1; model 2 to model 3; model 3 to model 2.

Figure C.15: Rate coefficients for transitions between clustered states sampled during BayesTraits v3 reverse jump MCMC analysis. Distributions of sampled rate coefficients for each of the twelve possible clustered state transitions are summarized using box-and-whisker plots, in which the top, middle and bottom horizontal lines indicate the 75th, 50th and 25th percentiles, respectively, the vertical lines extend to 1.5 times the interquartile range (IQR), and outliers that fall outside 1.5 IQR are indicated by points.
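The box-and-whisker summary described in the caption can be sketched as follows. This is an illustrative example only: the function name and the sample values are hypothetical, and quantile conventions differ between plotting tools, so the quartiles computed here may differ slightly from those drawn in the figure.

```python
import statistics


def box_summary(values):
    """Quartiles, 1.5 x IQR fences, and the points that fall outside them."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Hypothetical sampled rate coefficients for one transition type.
SAMPLED = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(box_summary(SAMPLED))
```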

Figure C.16 diagram. Labels recovered from the figure:
Shared substrate: stilbene (general structure with substituents R3, R4, R5, R′3, R′4, R′5).
Model 1 (80 clusters). Genes: sco, aldehyde dehydrogenase (adh), 2,3-dihydroxybenzoate decarboxylase (dhbd), unknown. Compounds: monophenolic aldehyde, 2,3-dihydroxybenzoate, catechol, succinyl-CoA, acetyl-CoA. Downstream pathway: β-ketoadipate pathway.
Model 2 (37 clusters). Genes: sco, salicylate hydroxylase (sah), gentisate 1,2-dioxygenase (gdo), fumarylpyruvate hydrolase (fph), transcription factor. Compounds: monophenolic aldehyde, gentisate, 3-maleylpyruvic acid, fumarate, pyruvate. Downstream pathway: TCA cycle.
Model 3 (55 clusters). Genes: sco, vanillyl alcohol oxidase (vao). Compounds: isoeugenol, vanillin, vanillyl alcohol, vanillic acid, protocatechuic acid, p-creosol, methoxyhydroquinone, hydroxyquinol, maleylacetate, β-ketoadipate, β-ketoadipyl-CoA, succinyl-CoA, acetyl-CoA. Downstream pathway: β-ketoadipate pathway. Several reactions are labeled "vao?" or "?".

Figure C.16: Hypothetical catabolic pathways encoded in three distinct types of stilbene cleavage oxidase (sco) clusters. The names of metabolic genes within the three cluster types are abbreviated, and their respective acronyms are annotated above the proposed reactions carried out by their protein products. Reactions with no corresponding clustered gene are not annotated. Annotations ending in a ‘?’ indicate reactions that have never been observed and are purely theoretical. TCA cycle = tricarboxylic acid cycle.

Table C.1: Genome metadata. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Table C.2: AU Test between the optimal ML tree and the ML tree whose topology is constrained to match the Bayesian phylogeny. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Table C.3: AU Tests for the independence of transitions in sco's clustered state. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Primer_pair_id | Description | Direction | Sequence (5'-3')
Pinea-lsd_C-His | Specific for full-length sco from Diplodia sapinea (accession: MH350427), with pETite C-His Kan vector overhangs | Forward | GAAGGAGATATACATATGGCCGGACACTACCTGAAGC
 | | Reverse | GTGATGGTGGTGATGATGAACAACAAGCCCCAACTCCTGCC
Podan_lsd_esp | Specific for full-length sco from Podospora anserina (accession: MH350429), with pETite C-His Kan vector overhangs | Forward | GAAGGAGATATACATATGACCTCTTCCACCTCGCCTCAG
 | | Reverse | GTGATGGTGGTGATGATGTGGTTTATCATTGTCTACCCA
FM164_C-His | Specific for full-length sco from Penicillium roqueforti (accession: MH350430), with pETite C-His Kan vector overhangs | Forward | GAAGGAGATATACATATGTCCGTTCTTTCTCAAAAGCCC
 | | Reverse | GTGATGGTGGTGATGATGCAGATCCTCGCCCTCAACCCA
Mo_lsd_esp | Specific for full-length sco from Magnaporthe oryzae (accession: MH350428), with pETite C-His Kan vector overhangs | Forward | GAAGGAGATATACATATGGCTAGCCATTTCAAGCCTCCG
 | | Reverse | GTGATGGTGGTGATGATGCCCTTTGATATCCCCGGC

Table C.4: Primers generated and used in this study.
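Table C.4 states that every primer carries pETite C-His Kan vector overhangs. As a minimal sketch, and not part of the original methods, the 5' sequence shared by all primers of one direction can be recovered as their longest common prefix. The sequences below are copied from Table C.4 with line-wrapped cells joined; the variable and function names are hypothetical, and a common prefix could in principle extend past the true overhang if the gene-specific regions happened to start with the same base.

```python
import os.path

# Primer sequences (5'-3') from Table C.4, wrapped cells joined.
FORWARD = [
    "GAAGGAGATATACATATGGCCGGACACTACCTGAAGC",
    "GAAGGAGATATACATATGACCTCTTCCACCTCGCCTCAG",
    "GAAGGAGATATACATATGTCCGTTCTTTCTCAAAAGCCC",
    "GAAGGAGATATACATATGGCTAGCCATTTCAAGCCTCCG",
]
REVERSE = [
    "GTGATGGTGGTGATGATGAACAACAAGCCCCAACTCCTGCC",
    "GTGATGGTGGTGATGATGTGGTTTATCATTGTCTACCCA",
    "GTGATGGTGGTGATGATGCAGATCCTCGCCCTCAACCCA",
    "GTGATGGTGGTGATGATGCCCTTTGATATCCCCGGC",
]

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def shared_prefix(primers):
    """Longest 5' sequence common to all primers (character-wise prefix)."""
    return os.path.commonprefix(primers)


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


forward_shared = shared_prefix(FORWARD)
reverse_shared = shared_prefix(REVERSE)
print(forward_shared, len(forward_shared))
print(reverse_shared, len(reverse_shared))
print(reverse_complement(reverse_shared))
```

The reverse complement of the shared reverse prefix consists only of CAT and CAC codons, both of which encode histidine, which is consistent with a C-terminal His tag.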

Table C.5: Descriptions and coordinates of sco clusters and singletons. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Table C.6: Motif alignments of characterized bacterial and fungal scos. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Table C.7: Presence of conserved amino acid motifs in 466 fungal sco sequences. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Columns DsapSCO to PansSCO give the concentration (µg) of detectable cleavage products (CPs).

Stilbene substrate | Other name | Cleavage product (CP) | DsapSCO | MorySCO | ProqSCO | PansSCO
Pterostilbene | 3,5-Dimethoxy-4'-hydroxy-E-stilbene | 3,5-Dimethoxybenzaldehyde | n.d. | 0.03 | 0.03 | 0.03
 | | 4'-Hydroxybenzaldehyde | 0.241 | 1.15 | 1.30 | 1.14
Resveratrol | 3,5,4′-Trihydroxy-trans-stilbene | 3,5-Dihydroxybenzaldehyde | n.d. | 0.03 | 0.04 | 0.08
 | | 4'-Hydroxybenzaldehyde | 0.337 | 0.84 | 0.87 | 0.74
Piceatannol | 3',4',3,5-Tetrahydroxy-trans-stilbene | 3,5-Dihydroxybenzaldehyde | n.d. | 0.10 | 0.16 | 0.11
 | | 3',4'-Dihydroxybenzaldehyde | n.d. | 0.29 | 0.35 | 0.21
Piceid | 3,5,4'-Trihydroxystilbene-3-O-β-D-glucopyranoside | 5-Hydroxy-3-O-β-D-glucopyranoside | n.d. | n.d. | n.d. | n.d.
 | | 4'-Hydroxybenzaldehyde | n.d. | 0.88 | 0.18 | 0.97
Isoeugenol | 4-Hydroxy-3-methoxy-1-propenylbenzene (propenyl-substituted Guaiacol) | 4-Hydroxy-3-methoxybenzaldehyde (Vanillin) | n.d. | n.d. | 0.07 | 0.85

n.d., not determined

Table C.8: UPLC analysis and determination of cleavage product concentrations.

Organism | Prot. | Accession No. | Stilbene Substrate(s) | Comment | Reference
Novosphingobium aromaticivorans | NOV1 | YP_496081 | Resveratrol, Piceatannol, Isoeugenol | Cleaves stilbene compounds with a 4′-OH group (deprotonated by Tyr101 and Lys135); stilbene compounds lacking the 4′-OH or having 4′-OCH3 are not deprotonated, hence no activity. | McAndrew et al. 2016
Novosphingobium aromaticivorans | NOV2 | YP_498079 | Resveratrol, Piceatannol | Cleaves the interphenyl α-β double bond of stilbenes with a 4′-OH group to their corresponding aldehyde products. | Marasco and Schmidt-Dannert 2008
Sphingomonas paucimobilis | SPA1 | AAC60447 | 4,4'-dihydroxy-3,3'-dimethoxystilbene | Similar to lignostilbene α,β-dioxygenase (LSD) isozyme class I that cleaves the central double bond of stilbenes. | Kamoda and Saburi 1993
Ustilago maydis; Aspergillus fumigatus; Chaetomium globosum; Botryotinia fuckeliana | RCO1 | XP_761231; XP_746307; XP_001219451; XP_001548426 | Resveratrol, Piceatannol | The first eukaryotic enzyme identified, involved in the degradation of stilbenes. Predicted to represent a subfamily of fungal enzymes related to the bacterial LSD I–III and Nov 1–2 enzymes, which cleave lignostilbene and resveratrol, respectively. | Brefort et al. 2011
Neurospora crassa | CAO1 | XP_961764 | Resveratrol, Piceatannol | Cleaves the interphenyl α-β double bond, but does not convert five other similar stilbenes, indicating the requirement for a minimal number of unmodified hydroxyl groups on the stilbene. | Diaz-Sanchez et al. 2013
Diplodia sapinea; Magnaporthe oryzae Guy-11; Penicillium roqueforti FM164; Podospora anserina S mat + 78-1 | SCO | MH350427; MH350428; MH350430; MH350429 | Pterostilbene, Resveratrol, Piceatannol, Piceid, Isoeugenol | Cleaves all tested stilbene compounds with a 4′-OH group (5/6), except for Isorhapontigenin and a stilbene compound lacking the 4′-OH group, pinosylvin. | This study

Table C.9: Substrates of sco and related proteins.

Table C.10: Node statistics of maximum likelihood trees of sco, dhbd, gdo and vao. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Table C.11: Inferred gains and transitions of sco's clustered state. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.

Table C.12: Transposable element sequences detected in sco cluster regions. This table’s dimensions are too large for this document; you can find it in the supplementary files uploaded with this document on OhioLINK.
