
University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange

Doctoral Dissertations Graduate School

5-2016

Computational Identification of Terpene Synthase Genes and Their Evolutionary Analysis

Qidong Jia University of Tennessee - Knoxville, [email protected]

Follow this and additional works at: https://trace.tennessee.edu/utk_graddiss

Part of the Bioinformatics Commons, Computational Biology Commons, Evolution Commons, Genomics Commons, Biology Commons, and the Systems Biology Commons

Recommended Citation

Jia, Qidong, "Computational Identification of Terpene Synthase Genes and Their Evolutionary Analysis." PhD diss., University of Tennessee, 2016. https://trace.tennessee.edu/utk_graddiss/3654

This Dissertation is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected].

To the Graduate Council:

I am submitting herewith a dissertation written by Qidong Jia entitled "Computational Identification of Terpene Synthase Genes and Their Evolutionary Analysis." I have examined the final electronic copy of this dissertation for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, with a major in Life Sciences.

Feng Chen, Major Professor

We have read this dissertation and recommend its acceptance:

Brian C. O’Meara, Gerald A. Tuskan, Xiaohan Yang

Accepted for the Council:

Carolyn R. Hodges

Vice Provost and Dean of the Graduate School

(Original signatures are on file with official student records.)

Computational Identification of Terpene Synthase Genes and Their Evolutionary Analysis

A Dissertation Presented for the Doctor of Philosophy Degree The University of Tennessee, Knoxville

Qidong Jia

May 2016

© by Qidong Jia, 2016

All Rights Reserved.

dedicated to my beloved family

Acknowledgements

First, I would like to thank my advisor, Dr. Feng Chen, for his consistent support, guidance and confidence in me during my dissertation work. He is a person with great passion, and I sincerely appreciate his help in developing my scientific attitude.

I also sincerely appreciate Dr. Albrecht Von Arnim for his tremendous mentorship, encouragement and support. I would also like to express my sincere gratitude to my statistics advisor, Dr. Robert Mee, for his detailed guidance and advice.

Second, I’m very grateful to my committee members, Dr. Brian C. O’Meara, Dr. Gerald A. Tuskan and Dr. Xiaohan Yang, for their time, guidance and expertise throughout my dissertation.

Finally, and most importantly, my utmost gratitude goes to my parents, parents-in-law and my wife. None of this would have been possible without their unconditional support, sacrifice, love and faith in me.

Abstract

Terpenoids, the largest and most structurally and functionally diverse class of natural compounds on earth, are mostly synthesized by plants and are known to be involved in various plant-environment interactions. Some terpenoids are classified as primary metabolites essential for plant growth and development. Terpene synthases (TPSs), the key enzymes for terpenoid biosynthesis, are the major determinant of the tremendous diversity of terpenoid carbon skeletons. The TPS genes represent a mid-size family of about 30-100 functional genes in almost all major sequenced plant genomes. TPSs are also found in fungi and bacteria, but microbial TPS genes share low levels of sequence similarity with their plant counterparts and show different patterns of gene structure. Although a common-ancestor theory has been suggested and supported by studies from model organisms, the evolution of plant terpene synthase genes and the evolutionary relationships among terpene synthase genes in plants, bacteria and fungi are still unclear. The recent discovery of microbial-type TPS genes in Selaginella moellendorffii makes the picture even more confusing. The goal of this dissertation project is to study the mechanisms that govern the dynamic evolution of terpene synthase genes using comparative genomics methods. Here, we carried out a large-scale screen to identify terpene synthases in plants (transcriptomes for over 1000 plant species sequenced by the OneKP project and selected genomes from non-seed plants), fungi and bacteria (sequenced genomes in JGI). Several important discoveries were made by analyzing the data: (1) the microbial-type TPS genes are widely and specifically distributed in non-seed land plants; (2) HGT of TPS genes from bacteria to fungi is identified; (3) a new subfamily x is identified, and new insights into the subfamily classification of TPSs are reported by including TPSs identified from a large number of non-seed plant species; (4) the distribution and genomic organization of four types of TPSs in fungi are characterized. These findings will help us better understand the evolution of plant secondary metabolism, especially for basal land plants.

Table of Contents

1 Introduction and Literature Review
1.1 Terpenoids - Natural Functions and Industrial Uses
1.2 Biosynthesis of Terpenoids
1.3 Terpene Synthases
1.4 Origin and Evolution of Terpene Synthase Genes
1.5 Objectives
1.6 Bibliography
1.7 Appendix

2 Evolution of Typical Plant Terpene Synthase Genes in Non-Seed Plants
2.1 Abstract
2.2 Introduction
2.3 Results and Discussion
2.3.1 Identification of Terpene Synthase Genes from Genomes of Hornwort, Moss, Liverwort and Two Ferns
2.3.2 Identification of Terpene Synthases from Non-Seed Plants Transcriptomes
2.3.3 Search for Terpene Synthase Genes in Algae
2.3.4 Phylogenetic Analysis of TPSs from Non-Seed Plants with TPSs from Selected Seed Plants
2.4 Conclusions
2.5 Materials and Methods
2.5.1 Data Retrieval, Management and Classification
2.5.2 Identification of Typical Plant Terpene Synthases from Transcriptomes of 324 Non-Seed Plants
2.5.3 Assembly of Anthoceros punctatus Genome and Identification of Terpene Synthases
2.5.4 Identification of Terpene Synthases from Marchantia polymorpha and Sphagnum fallax
2.5.5 Identification of Terpene Synthases from Salvinia cucullata and Azolla superorganism
2.5.6 Identification of Terpene Synthases from Lygodium japonicum
2.5.7 Search for Terpene Synthase Genes in Twelve Algal Genomes and Transcriptomes
2.5.8 TPSs on JBrowse
2.5.9 Phylogenetic Analyses of Terpene Synthases
2.6 Bibliography
2.7 Appendix

3 Microbial Type Terpene Synthase Genes Occur Widely and Specifically in Non-seed Land Plants
3.1 Abstract
3.2 Introduction
3.3 Results and Discussion
3.3.1 Terpene Synthase Genes of Microbial Type are Highly Enriched in the Transcriptomes of Non-Seed Land Plants
3.3.2 The Majority of MTPSL Genes Identified from Plant Transcriptomes Forms Four Groups Clustered with Either Fungal or Bacterial Terpene Synthases
3.3.3 The Majority of MTPSL Genes Identified from Plant Transcriptomes are Plant Genes
3.3.4 Evolutionary Implications of Non-Seed Plant-Specific MTPSLs
3.3.5 Biochemical Function of Selected MTPSLs: Diversity of Activities
3.4 Conclusions
3.5 Materials and Methods
3.5.1 Identification of Terpene Synthases of Microbial Type from Transcriptomes and Sequenced Genomes
3.5.2 Assembly of Hornwort Anthoceros punctatus Genome and Identification of MTPSL Genes
3.5.3 Phylogenetic Analyses of Terpene Synthases
3.5.4 Plant Material, Genomic DNA Isolation and PCR
3.6 Acknowledgements
3.7 Bibliography
3.8 Appendix

4 Horizontal Gene Transfer of Terpene Synthase Genes from Bacteria to Fungi
4.1 Abstract
4.2 Introduction
4.3 Results
4.3.1 Analysis of Bacterial and Fungal Terpene Synthase Genes Suggests Possible HGT Events from Bacteria to Fungi
4.3.2 The Apparent Orthologs of BTPSL were Identified in a Group of Entomopathogenic Fungi
4.3.3 Collinearity for the Genome Region Containing the BTPSL and the Identification of Neighbor Genes
4.3.4 Experimental Verification of BTPSL from M. robertsii as a Fungal Gene
4.3.5 The Presence of Typical Fungal TPS Genes in Relevant Fungal Species
4.3.6 Phylogenetic Analysis of BTPSLs, Related Bacterial TPSs and Other TPSs
4.3.7 The BTPSL Gene of HGT Origin in M. robertsii Encodes a Functional Enzyme
4.3.8 Molecular Evolutionary Analysis of BTPSL Genes of HGT Origin in ...
4.4 Discussion
4.5 Materials and Methods
4.5.1 Data Sources and Analysis
4.5.2 Multiple Sequence Alignments and Phylogenetic Inference
4.5.3 Multiple Genome Alignment
4.5.4 Analysis of Selective Pressure
4.5.5 Fungal Culture
4.5.6 MAA_08668 Gene Cloning and Verification of Its Position
4.5.7 Biochemical Characterization of MAA_08668
4.6 Acknowledgements
4.7 Bibliography
4.8 Appendix

5 Identification, Characterization, and Evolution of Fungal Terpene Synthases
5.1 Abstract
5.2 Introduction
5.3 Results
5.3.1 Distribution of TPSs in Fungal Genomes
5.3.2 Intron-Exon Structure
5.3.3 Motif Analysis
5.4 Discussion
5.5 Methods
5.5.1 Genome Mining of Fungal Terpene Synthases
5.5.2 Multiple Sequence Alignments and Phylogenetic Inference
5.6 Bibliography
5.7 Appendix

6 Summary and Conclusions

Appendix
A List of species
A.1 List of OneKP Species
A.2 List of 519 Fungi Species

Vita

List of Tables

2.1 Total number of TPS genes identified in six non-seed plant genomes
2.2 Typical plant terpene synthase genes in Salvinia cucullata
2.3 Typical plant terpene synthase genes in Azolla superorganism
2.4 Assembly statistics for Anthoceros punctatus
2.5 Typical plant terpene synthase genes in Anthoceros punctatus
2.6 Typical plant terpene synthase genes in Marchantia polymorpha
2.7 Assembly statistics for Sphagnum fallax
2.8 Typical plant terpene synthase genes in Sphagnum fallax
2.9 Number of TPS genes from non-seed plant transcriptomes
2.10 List of transcriptomes/genomes
3.1 List of screened plant species and the number of MTPSLs in each species. All these 1103 transcriptomes were from OneKP (www.onekp.com). MTPSLs are found in 146 species.
3.2 Summary statistics of MTPSLs in 9 plant lineages
3.3 22 MTPSL genes outside of the four groups (high similarity to microbial TPSs)
3.4 MTPSL genes from the hornwort Anthoceros punctatus
3.5 Typical plant terpene synthase genes in Sphagnum fallax
3.6 A list of sequenced plants searched for MTPSL genes
4.1 Presence/Absence of BTPSLs for each entomopathogenic fungus examined in this study
4.2 List of typical fungal TPSs for each entomopathogenic fungus
4.3 Presence/Absence of BTPSLs for each entomopathogenic fungus examined in this study
4.4 Presence/Absence of BTPSLs for each entomopathogenic fungus examined in this study
5.1 Summary statistics of exon numbers of four TPS types
5.2 Summary statistics of mRNA lengths of four TPS types
A.1 List of screened transcriptomes and the number of PTPS and MTPS in each sample. All these 1175 transcriptomes were from OneKP (www.onekp.com). The unique four-letter codes are the transcriptome identifiers assigned by 1KP.
A.2 List of 519 screened fungi genomes and the number of PTPS and MTPS in each species. These data were downloaded from the fungal genomics portal (http://jgi.doe.gov/fungi).

List of Figures

1.1 Two compartmentalized pathways of isoprenoid biosynthesis. Dashed arrows indicate more than one step.
1.2 The mevalonic acid (MVA) pathway and the methylerythritol phosphate (MEP) pathway in plants (Vranova et al., 2013)
1.3 A schematic diagram of terpene synthases with annotated sequence features (Keeling and Bohlmann, 2006)
2.1 Distribution of protein lengths for typical plant terpene synthases in non-seed plants. The green dashed line indicates the median length, while the red line indicates the average protein length.
2.2 Phylogenetic relationships of terpene synthases from non-seed plants and selected TPSs from seed plants. Weblogos show the conserved motifs found in each subfamily.
2.3 Alignment of the conserved motifs found in 339 terpene synthases
3.1 Distribution of terpene synthase genes of microbial type (MTPSL) identified from the transcriptomes of 1103 plant species
3.2 Phylogeny of terpene synthases of microbial type (MTPSL) identified from OneKP
3.3 Validation of representative MTPSL genes to be plant genes
3.4 Motif analysis of MTPSLs
3.5 Biochemical activity of selected MTPSL genes
3.6 Alignment of one MTPSL of putative contamination with one fungal TPS
3.7 Phylogenetic analysis of MTPSL genes identified from the Anthoceros punctatus genome with those identified from transcriptomes
4.1 Maximum Likelihood phylogenetic tree of 341 Terpene Synthase C-domain sequences from bacteria (Blue) and fungi (Magenta)
4.2 Amino acid sequence alignment of 8 BTPSL TPSs, and representative TPSs from bacteria and fungi
4.3 Alignment of the orthologous genomic regions containing the BTPSL genes from seven fungi species
4.4 Confirmation of the integration of MAA_08668, a fungal TPS gene of putative horizontal gene transfer origin, into the Metarhizium robertsii genome
4.5 Maximum Likelihood phylogenetic tree and exon-intron structure of both types of TPS genes
4.6 Biochemical activity of terpene synthase MAA_08668 of putative horizontal gene transfer origin in Metarhizium robertsii
4.7 A species tree redrawn based on previous studies showing the evolutionary relationship of several entomopathogenic fungi
5.1 Phylogenetic tree of major fungal lineages and corresponding number of terpene synthases in each of the four types
5.2 Phylogeny of all four types of fungal terpene synthases
5.3 Maximum likelihood phylogeny of 1753 α type terpene synthases
5.4 Maximum likelihood phylogeny of 545 Trichodiene synthases
5.5 Phylogenetic relationship among 7 species containing bifunctional CPS/KSs
5.6 Maximum likelihood phylogeny of 230 bifunctional CPS/KSs
5.7 Multiple sequence alignment of 11 Basidiomycota CPS/KSs and 3 Ascomycota CPS/KSs
5.8 Maximum likelihood phylogeny of 114 αα1 type terpene synthases
5.9 Multiple sequence alignment of 2 Basidiomycota and 3 αα1 type terpene synthases
5.10 Species distribution in the four types of fungal terpene synthases
5.11 Comparisons of the terpene synthase gene length and exon number
5.12 Sequence logos of conserved terpene synthase family motifs observed in each type of fungal terpene synthases
5.13 Venn diagram of identified CPS/KSs by known CPS/KSs from three species

Chapter 1

Introduction and Literature Review

1.1 Terpenoids - Natural Functions and Industrial Uses

Terpenoids, also known as isoprenoids, constitute the largest and most diverse class of natural products, comprising over 50,000 different structures. They are derived from terpenes, most of which are pure hydrocarbons, by chemical modification such as the addition of oxygen or rearrangement of the carbon skeleton. The term “terpene” is derived from the word “turpentine”, a fluid distilled from resin produced mainly by pine trees. This fluid is a variable mixture of terpenes. All classes of terpenoids are biosynthesized from two fundamental C5 isoprene building blocks: isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP). These are condensed by prenyltransferases in a typical head-to-tail manner to form various prenyl diphosphates that serve as the precursors for terpenoid biosynthesis. According to the number of isoprene building blocks they contain, terpenoids can be classified as hemiterpenoids (C5), monoterpenoids (C10), sesquiterpenoids (C15), diterpenoids (C20), triterpenoids (C30), tetraterpenoids (C40) or polyterpenoids (C5n) (Tholl and Lee, 2011).
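The size-based naming scheme above can be written as a simple lookup. The following is a minimal sketch and not part of the original analysis: the function name `classify_terpenoid` and the cutoff used for "polyterpenoid" (more than eight isoprene units) are assumptions made purely for illustration.

```python
# Terpenoid classes named in the text, keyed by the number of C5 isoprene units.
TERPENOID_CLASSES = {
    1: "hemiterpenoid",    # C5
    2: "monoterpenoid",    # C10
    3: "sesquiterpenoid",  # C15
    4: "diterpenoid",      # C20
    6: "triterpenoid",     # C30
    8: "tetraterpenoid",   # C40
}


def classify_terpenoid(carbons):
    """Map the carbon count of an unmodified terpene skeleton to its class name.

    Returns None for sizes the text does not name (for example C25).
    The "more than eight units" cutoff for polyterpenoids is an
    illustrative assumption, not a definition taken from the source.
    """
    if carbons <= 0 or carbons % 5 != 0:
        raise ValueError("expected a positive multiple of 5 carbons")
    units = carbons // 5
    if units > 8:
        return "polyterpenoid"
    return TERPENOID_CLASSES.get(units)


if __name__ == "__main__":
    for c in (5, 10, 15, 20, 30, 40, 500):
        print(f"C{c}: {classify_terpenoid(c)}")
```

The lookup only covers unmodified skeletons; terpenoids that have gained or lost carbons through later modification would need to be classified by their biosynthetic precursor instead.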

Although terpenoids are most abundant and most extensively studied in plants, they are ubiquitous and involved in a wide range of biological functions in all living organisms. Many of them are produced as primary metabolites, functioning as pigments in photosynthesis (carotenoids, chlorophylls and plastoquinones), plant hormones (gibberellic acid, abscisic acid and cytokinins), electron transport carriers (ubiquinone and plastoquinone), carbohydrate carriers (bactoprenol and dolichols), membrane stabilizers (sterols), etc. (Tholl, 2006). However, the most diverse and abundant isoprenoids are found in plants as secondary metabolites that play vital roles in direct and indirect plant defenses against herbivores and pathogens, as well as in communication with the environment. Among these, volatile terpenoids released by different plant tissues, such as flowers and other organs, are particularly important (Christianson, 2008; Pichersky and Lewinsohn, 2011; Pichersky et al., 2006). Besides their importance in plant physiology and ecology, many terpenoids are economically important as fragrances, pigments, polymers, fibers, natural insecticides and natural rubber. Terpenoids also have important applications in the pharmaceutical industry; two of the most valuable are the antineoplastic agent TAXOL® (paclitaxel) and the antimalarial drug artemisinin, which is derived from Artemisia annua, a plant used in traditional Chinese medicine for more than 2000 years.

1.2 Biosynthesis of Terpenoids

The universal isoprene building blocks of all isoprenoids, IPP and DMAPP, can be synthesized via two distinct pathways: the mevalonic acid (MVA) pathway and the methylerythritol phosphate (MEP) pathway (Figure 1.1). For many years, the classical MVA pathway was widely accepted as the only route in all organisms. However, several discordant results reported in certain bacteria suggested the existence of another pathway. These led to the discovery of the MEP pathway by Rohmer and colleagues in 1993 (Rohmer et al., 1993). Although there are some exceptions, it is now well established that the MVA pathway is present in eukaryotes and archaea, where it operates in the cytosol; in contrast, the MEP pathway is used by eubacteria, green algae and the plastids of plants but is absent in archaea, fungi and animals (Vranova et al., 2013). In plants, the two pathways coexist but differ in their localization (Roberts, 2007). The MVA pathway provides the precursors for the biosynthesis of sesquiterpenes and triterpenes, while monoterpenes, diterpenes, carotenoids, chlorophylls and gibberellins are generated via the MEP pathway. Although interactions between the two pathways have been reported in plants, very little is known about the exchange mechanism by which they are finely coordinated (Hemmerlin et al., 2003).

The MVA pathway uses seven enzymes to transform acetyl-CoA into IPP and DMAPP. In the initial reactions of the MVA pathway (Figure 1.2), three molecules of acetyl-CoA are condensed successively to produce 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) by the enzymes acetoacetyl-CoA thiolase (AACT) and HMG-CoA synthase (HMGS). HMG-CoA is then converted to mevalonic acid (MVA) in two successive reduction steps by HMG-CoA reductase (HMGR), a key rate-limiting enzyme in the MVA pathway. The subsequent conversion of MVA into mevalonate-5-diphosphate (MVADP) involves two phosphorylation steps, catalyzed by mevalonate kinase (MK or MVK) and phosphomevalonate kinase (PMK or PMVK). MVADP is then decarboxylated by mevalonate-5-pyrophosphate decarboxylase (MPDC or MVD) to form IPP, which is finally converted to its isomer, DMAPP, in a reaction catalyzed by IPP/DMAPP isomerase (IDI).

The MEP pathway (Figure 1.2), also named the 1-deoxy-D-xylulose-5-phosphate (DXP) pathway, starts with the formation of DXP by condensation of glyceraldehyde-3-phosphate (GAP) with (hydroxyethyl)thiamine diphosphate derived from pyruvate. This reaction is catalyzed by 1-deoxy-D-xylulose 5-phosphate synthase (DXS). In the second step, DXP is rearranged and reduced to generate 2-C-methylerythritol 4-phosphate (MEP) by DXP reductoisomerase (DXR) in the presence of NADPH. The next step is the conversion of MEP into 4-diphosphocytidyl-2-C-methylerythritol (CDP-ME) in a CTP-dependent reaction mediated by the enzyme 4-diphosphocytidyl-2-C-methyl-D-erythritol synthase (MCT or IspD). Subsequently, CDP-ME is phosphorylated to yield 4-diphosphocytidyl-2-C-methyl-D-erythritol 2-phosphate (CDP-MEP), a reaction catalyzed by 4-(cytidine 5’-diphospho)-2-C-methyl-D-erythritol kinase (CMK). The following two steps are the sequential conversion of CDP-MEP into (E)-4-hydroxy-3-methyl-but-2-enyl pyrophosphate (HMBPP) under the catalysis of 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MDS) and 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase (HDS). In the final step, the enzyme HMBPP reductase (HDR) converts HMBPP into IPP and DMAPP in an approximate ratio of 5:1.
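The seven enzymatic steps of the MEP pathway described above form a linear chain, which can be captured in a small data structure. This is a minimal sketch for orientation only; the names `MEP_PATHWAY`, `is_linear` and `enzymes_between` are hypothetical, and the intermediate label "MEcPP" (2-C-methyl-D-erythritol 2,4-cyclodiphosphate) is not named in the text but is inferred here from the name of the MDS enzyme.

```python
# Each step is (substrate, enzyme abbreviation, product), in the order given in the text.
# "MEcPP" is an assumed label for the MDS product; the text does not name it.
MEP_PATHWAY = [
    ("GAP + pyruvate", "DXS", "DXP"),
    ("DXP", "DXR", "MEP"),
    ("MEP", "MCT", "CDP-ME"),
    ("CDP-ME", "CMK", "CDP-MEP"),
    ("CDP-MEP", "MDS", "MEcPP"),
    ("MEcPP", "HDS", "HMBPP"),
    ("HMBPP", "HDR", "IPP + DMAPP"),
]


def is_linear(pathway):
    """Check that every product is the substrate of the following step."""
    return all(
        pathway[i][2] == pathway[i + 1][0] for i in range(len(pathway) - 1)
    )


def enzymes_between(pathway, start, end):
    """List the enzymes needed to convert metabolite `start` into `end`."""
    substrates = [step[0] for step in pathway]
    products = [step[2] for step in pathway]
    first = substrates.index(start)  # raises ValueError for unknown metabolites
    last = products.index(end)
    if last < first:
        raise ValueError(f"{end} lies upstream of {start}")
    return [step[1] for step in pathway[first:last + 1]]


if __name__ == "__main__":
    print(is_linear(MEP_PATHWAY))
    print(enzymes_between(MEP_PATHWAY, "DXP", "HMBPP"))
```

A list of tuples is enough here because the pathway has no branches; a graph structure would only be needed to model cross-talk with the MVA pathway.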

In the last stage of terpenoid biosynthesis, prenyltransferases catalyze the head-to-tail condensation of DMAPP with one, two, or three units of IPP to form linear prenyl diphosphate precursors: geranyl diphosphate (GPP, C10, the precursor of monoterpenoids), farnesyl diphosphate (FPP, C15, sesquiterpenoids) and geranylgeranyl diphosphate (GGPP, C20, diterpenoids). Next, these direct precursors are converted by terpene synthases into diverse terpenoid compounds, which can be further modified by hydroxylation, methylation, glycosylation, acylation or peroxidation.

1.3 Terpene Synthases

As described above, all terpenoids are produced from allylic diphosphates by the action of terpene synthases (TPSs), which constitute a superfamily and contribute to the tremendous diversity of terpenoid carbon skeletons. They are present in multiple domains of life, but their protein sequences show a relatively low level of conservation even within the same domain (Bohlmann et al., 1997). However, the elucidated protein structures of several TPSs (Hyatt et al., 2007; Köksal et al., 2011b, 2010; McAndrew et al., 2011; Zhou et al., 2012) revealed a high similarity in their tertiary structures, which consist of an N-terminal domain and a catalytically active C-terminal domain, implying a similarity in their catalytic mechanisms. Insights obtained from structural studies helped identify two distinct chemical strategies employed for initiating cyclization reactions, based on which TPSs can be divided into two major classes. Typically, class I TPSs have a functional C-terminal domain (also referred to as the α-domain or class I fold) and catalyze cyclization initiated by ionization of the substrate diphosphate group. The α-domain adopts an α-helical protein fold and contains two metal-binding motifs, the highly conserved DDXXD motif and the less well conserved NSE/DTE motif, located on opposing helices near the entrance of the active site. Together they bind a trinuclear magnesium cluster that triggers the ionization of the isoprenoid substrate and initiates the cyclization reaction. In contrast, class II TPSs possess a functional N-terminal domain (or β-domain), which together with a third “insertion” γ-domain forms the class II fold. Enzymes in this class contain a conserved DXDD motif, which is located in the β-domain and is responsible for the protonation-initiated cyclization. The γ-domain carries a highly acidic EDXXD-like Mg2+/diphosphate binding motif that also contributes to the activity of class II TPSs (Cao et al., 2010).
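The two class-defining motifs named above, DDXXD (class I) and DXDD (class II), are short enough to be located with regular expressions. The sketch below is illustrative only and is not the screening method used in this dissertation: the function names and the toy sequences are invented, the NSE/DTE motif is left out because its consensus is not spelled out in the text, and the presence of a motif alone does not establish that a domain is catalytically active.

```python
import re

# "X" in the motif names means any amino acid, written "." in a regular expression.
MOTIFS = {
    "DDXXD": r"DD..D",  # class I motif, located in the alpha-domain
    "DXDD": r"D.DD",    # class II motif, located in the beta-domain
}


def find_motif(sequence, name):
    """Return 0-based start positions of all matches, including overlapping ones."""
    pattern = re.compile(f"(?=({MOTIFS[name]}))")  # lookahead permits overlaps
    return [m.start() for m in pattern.finditer(sequence.upper())]


def motif_classes(sequence):
    """Report which class-defining motifs occur in a protein sequence."""
    found = set()
    if find_motif(sequence, "DDXXD"):
        found.add("class I")
    if find_motif(sequence, "DXDD"):
        found.add("class II")
    return found


if __name__ == "__main__":
    toy_class_i = "MKTLLDDAYDGGA"  # invented sequence carrying DDAYD
    toy_class_ii = "MSTDIDDTAMA"   # invented sequence carrying DIDD
    print(find_motif(toy_class_i, "DDXXD"), motif_classes(toy_class_i))
    print(find_motif(toy_class_ii, "DXDD"), motif_classes(toy_class_ii))
```

Because both patterns are short and aspartate-rich, chance matches are expected in real proteins, so a scan like this is best treated as a first filter before domain-level or profile-based searches.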

Typically, microbial class I and class II TPSs are monofunctional and contain only the α-domain or the βγ-domains, respectively. Most plant monoterpene and sesquiterpene synthases contain both an α-domain and a β-domain, but only the α-domain is active; the β-domain is rendered inactive by the loss of the conserved DXDD motif. The available protein crystal structures suggest that most plant diterpene synthases and some sesquiterpene synthases possess all three domains (γ-β-α). However, usually only one domain is functional. For example, sesquiterpene synthases retain only class I activity, as seen in α-bisabolene synthase (AgBIS) from Abies grandis (McAndrew et al., 2011). In contrast, diterpene synthases use either one of the two active sites, as in the class I TPS taxadiene synthase from the Pacific yew (Köksal et al., 2011b) and the class II TPS ent-copalyl diphosphate synthase (CPS) from Arabidopsis thaliana (Köksal et al., 2011a). There are exceptions, however. Some diterpene synthases with both class I and class II activities have been reported, such as abietadiene synthase (AgAS) (Zhou et al., 2012) from Abies grandis, ent-kaurene synthase (FCPS/KS) (Kawaide et al., 1997) from the fungus Phaeosphaeria sp. L487 and ent-kaurene synthase (PpCPS/KS) (Hayashi et al., 2006) from the moss Physcomitrella patens.

Terpene synthases are approximately 550–860 amino acids long, with molecular masses of 50–100 kDa (Figure 1.3), and are differentiated by their combinations of domains and motifs. In general, sesquiterpene synthases are 550–600 amino acids long, 50–70 amino acids shorter than monoterpene synthases, which contain N-terminal plastid-targeting peptides. Diterpene synthases are even longer than monoterpene synthases due to the additional insertion sequence (γ-domain) near their N-terminus. Many terpene synthases also carry a highly conserved RR(X)8W motif downstream of the N-terminal transit peptide. Previous studies have shown that the activity of monoterpene synthases does not require the sequence upstream of this motif, and thus it is considered a landmark separating the transit peptide from the mature protein sequence (Bohlmann et al., 1998; Keeling and Bohlmann, 2006).
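The RR(X)8W landmark described above can be located with a one-line pattern: two arginines, any eight residues, then a tryptophan. The following sketch is hypothetical and only illustrates the idea; the function name `split_at_rrx8w` and the toy sequence are invented, and treating the motif start as the boundary between transit peptide and mature protein is a simplification, not an experimentally determined cleavage site.

```python
import re

# RR(X)8W: two arginines, any eight residues, then a tryptophan.
RRX8W = re.compile(r"RR.{8}W")


def split_at_rrx8w(sequence):
    """Split a protein sequence at the first RR(X)8W motif.

    Returns (putative_transit_peptide, putative_mature_protein), or None
    when the motif is absent. Using the motif start as the boundary is an
    assumption made for illustration.
    """
    match = RRX8W.search(sequence.upper())
    if match is None:
        return None
    return sequence[:match.start()], sequence[match.start():]


if __name__ == "__main__":
    # Invented sequence: 10-residue prefix, the motif, then a short tail.
    toy = "MALLSTSLQS" + "RRSANYQPNLW" + "DDEFIQSL"
    print(split_at_rrx8w(toy))
```

Returning None rather than raising keeps the function usable in a bulk screen, where sequences lacking the motif (for example many sesquiterpene synthases without a transit peptide) are expected.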

1.4 Origin and Evolution of Terpene Synthase Genes

Although TPSs represent a mid-size gene family, their numbers are small compared with the great number of terpenoid compounds; this discrepancy is mainly explained by the functional plasticity of most TPSs. Many terpene synthases produce more than one compound, and very few amino acid substitutions can change their product profile. For plants to survive in a rapidly changing environment, TPS genes must evolve quickly to generate the required terpenoid profile. It is widely accepted that TPS genes have undergone extensive lineage-specific gene duplication, followed by sub-functionalization or even neo-functionalization to perform different roles. This divergent evolutionary process is indicated by the fact that a significant number of TPS genes are clustered in tandem arrays. On the other hand, instances of convergent evolution can also be observed. For example, TPS23, a sesquiterpene synthase that produces (E)-β-caryophyllene from farnesyl diphosphate (FPP) in maize (Zea mays), is more similar to other maize TPSs than to (E)-β-caryophyllene synthases from dicot species (Köllner et al., 2008). Also, the low level of sequence similarity among remotely related species and the distinct patterns of genomic organization of plant and microbial TPSs are consistent with convergent evolution.

However, where did plant TPS genes come from, and do plant and microbial TPS genes share a common ancestor? Previous studies proposed that the ancestral plant terpene synthase gene, resembling a diterpene synthase associated with gibberellin biosynthesis, emerged prior to the divergence of angiosperms and gymnosperms (Trapp and Croteau, 2001). Later, the bifunctional diterpene synthase (PpCPS/KS) identified in the moss Physcomitrella patens was hypothesized to resemble the common ancestor (Hayashi et al., 2006; Keeling et al., 2010). Interestingly, in the bacterium Bradyrhizobium japonicum, the ent-copalyl diphosphate synthase (BjCPS) and ent-kaurene synthase (BjKS) share some similarity with the β-domain (CPS activity) and α-domain (KS activity) of plant and fungal TPSs, respectively. This, together with the fact that plant terpene synthase genes are longer and that the two types of diterpene synthases (CPS and KS) are of roughly equal length, suggested a common ancestral diterpene synthase gene shared by plants, fungi, and bacteria (Morrone et al., 2009). Analyses of the structures of diterpene synthases supported this hypothesis and indicated that all modern plant TPS genes evolved from a three-domain (γ-β-α) diterpene synthase gene, which arose by fusion of ancient bacterial α-domain and βγ-domain terpene synthase genes (Cao et al., 2010). Recently, a new class of plant terpene synthase genes, the microbial-type class I terpene synthase genes containing only an α-domain, was discovered in the spikemoss Selaginella moellendorffii (Li et al., 2012). This class of TPS genes is more similar to microbial TPS genes than to other plant TPS genes and was probably integrated into plant genomes from microbes via horizontal gene transfer, indicating an evolutionary path different from descent from the three-domain diterpene synthase gene.

1.5 Objectives

As more genome and transcriptome sequences from non-model species become available, particularly for non-seed plants, it is possible to obtain more TPS genes and examine them across a broader range of taxa. Such an analysis holds promise for constructing a comprehensive evolutionary history of this gene family. Broadly, we are interested in tracing the origin and evolutionary history of terpene synthase genes, and then in confirming or refining the proposed evolutionary paths. The overall project is designed to address broad questions about the evolution of terpene synthase genes.

1.6 Bibliography

Bohlmann, J., Meyer-Gauen, G., and Croteau, R. (1998). Plant terpenoid synthases: molecular biology and phylogenetic analysis. Proceedings of the National Academy of Sciences of the United States of America, 95(8):4126–33.

Bohlmann, J., Steele, C. L., and Croteau, R. (1997). Monoterpene synthases from grand fir (Abies grandis): cDNA isolation, characterization, and functional expression of myrcene synthase, (-)-(4S)-limonene synthase, and (-)-(1S,5S)-pinene synthase. Journal of Biological Chemistry, 272(35):21784–21792.

Cao, R., Zhang, Y., Mann, F. M., Huang, C., Mukkamala, D., Hudock, M. P., Mead, M. E., Prisic, S., Wang, K., Lin, F. Y., Chang, T. K., Peters, R. J., and Oldfield, E. (2010). Diterpene cyclases and the nature of the isoprene fold. Proteins, 78(11):2417–32.

Christianson, D. W. (2008). Unearthing the roots of the terpenome. Current Opinion in Chemical Biology, 12(2):141–50.

Hayashi, K. i., Kawaide, H., Notomi, M., Sakigi, Y., Matsuo, A., and Nozaki, H. (2006). Identification and functional analysis of bifunctional ent-kaurene synthase from the moss Physcomitrella patens. FEBS Letters, 580(26):6175–6181.

Hemmerlin, A., Hoeffler, J. F., Meyer, O., Tritsch, D., Kagan, I. A., Grosdemange-Billiard, C., Rohmer, M., and Bach, T. J. (2003). Cross-talk between the cytosolic mevalonate and the plastidial methylerythritol phosphate pathways in tobacco Bright Yellow-2 cells. Journal of Biological Chemistry, 278(29):26666–26676.

Hyatt, D. C., Youn, B., Zhao, Y., Santhamma, B., Coates, R. M., Croteau, R. B., and Kang, C. (2007). Structure of limonene synthase, a simple model for terpenoid cyclase catalysis. Proceedings of the National Academy of Sciences of the United States of America, 104(13):5360–5.

Kawaide, H., Imai, R., Sassa, T., and Kamiya, Y. (1997). Ent-kaurene synthase from the fungus Phaeosphaeria sp. L487. cDNA isolation, characterization, and bacterial expression of a bifunctional diterpene cyclase in fungal gibberellin biosynthesis. Journal of Biological Chemistry, 272(35):21706–12.

Keeling, C. I. and Bohlmann, J. (2006). Genes, enzymes and chemicals of terpenoid diversity in the constitutive and induced defence of conifers against insects and pathogens. New Phytologist, 170(4):657–75.

Keeling, C. I., Dullat, H. K., Yuen, M., Ralph, S. G., Jancsik, S., and Bohlmann, J. (2010). Identification and functional characterization of monofunctional ent-copalyl diphosphate and ent-kaurene synthases in white spruce reveal different patterns for diterpene synthase evolution for primary and secondary metabolism in gymnosperms. Plant Physiology, 152(3):1197–1208.

Köksal, M., Hu, H., Coates, R. M., Peters, R. J., and Christianson, D. W. (2011a). Structure and mechanism of the diterpene cyclase ent-copalyl diphosphate synthase. Nature Chemical Biology, 7(7):431–433.

Köksal, M., Jin, Y., Coates, R. M., Croteau, R., and Christianson, D. W. (2011b). Taxadiene synthase structure and evolution of modular architecture in terpene biosynthesis. Nature, 469(7328):116–122.

Köksal, M., Zimmer, I., Schnitzler, J. P., and Christianson, D. W. (2010). Structure of isoprene synthase illuminates the chemical mechanism of teragram atmospheric carbon emission. Journal of Molecular Biology, 402(2):363–373.

Köllner, T. G., Held, M., Lenk, C., Hiltpold, I., Turlings, T. C. J., Gershenzon, J., and Degenhardt, J. (2008). A maize (E)-β-caryophyllene synthase implicated in indirect defense responses against herbivores is not expressed in most American maize varieties. Plant Cell, 20(2):482–494.

Li, G., Köllner, T. G., Yin, Y., Jiang, Y., Chen, H., Xu, Y., Gershenzon, J., Pichersky, E., and Chen, F. (2012). Nonseed plant Selaginella moellendorffii [corrected] has both seed plant and microbial types of terpene synthases. Proceedings of the National Academy of Sciences of the United States of America, 109(36):14711–5.

McAndrew, R. P., Peralta-Yahya, P. P., DeGiovanni, A., Pereira, J. H., Hadi, M. Z., Keasling, J. D., and Adams, P. D. (2011). Structure of a three-domain sesquiterpene synthase: a prospective target for advanced biofuels production. Structure, 19(12):1876–84.

Morrone, D., Chambers, J., Lowry, L., Kim, G., Anterola, A., Bender, K., and Peters, R. J. (2009). Gibberellin biosynthesis in bacteria: Separate ent-copalyl diphosphate and ent-kaurene synthases in Bradyrhizobium japonicum. FEBS Letters, 583(2):475–480.

Pichersky, E. and Lewinsohn, E. (2011). Convergent evolution in plant specialized metabolism. Annual Review of Plant Biology, 62:549–66.

Pichersky, E., Noel, J. P., and Dudareva, N. (2006). Biosynthesis of plant volatiles: nature’s diversity and ingenuity. Science, 311(5762):808–11.

Roberts, S. C. (2007). Production and engineering of terpenoids in plant cell culture. Nature Chemical Biology, 3(7):387–95.

Rohmer, M., Knani, M., Simonin, P., Sutter, B., and Sahm, H. (1993). Isoprenoid biosynthesis in bacteria: a novel pathway for the early steps leading to isopentenyl diphosphate. Biochemical Journal, 295(Pt 2):517–24.

Tholl, D. (2006). Terpene synthases and the regulation, diversity and biological roles of terpene metabolism. Current Opinion in Plant Biology, 9(3):297–304.

Tholl, D. and Lee, S. (2011). Terpene specialized metabolism in Arabidopsis thaliana. Arabidopsis Book, 9:e0143.

Trapp, S. C. and Croteau, R. B. (2001). Genomic organization of plant terpene synthases and molecular evolutionary implications. Genetics, 158(2):811–32.

Vranova, E., Coman, D., and Gruissem, W. (2013). Network analysis of the MVA and MEP pathways for isoprenoid synthesis. Annual Review of Plant Biology, 64:665–700.

Zhou, K., Gao, Y., Hoy, J. A., Mann, F. M., Honzatko, R. B., and Peters, R. J. (2012). Insights into diterpene cyclization from structure of bifunctional abietadiene synthase from Abies grandis. Journal of Biological Chemistry, 287(9):6840–6850.

1.7 Appendix

Figure 1.1: Two compartmentalized pathways of isoprenoid biosynthesis. Dashed arrows indicate more than one step.

Figure 1.2: The mevalonic acid (MVA) pathway and the methylerythritol phosphate (MEP) pathway in plants (Vranova et al., 2013).

Figure 1.3: A schematic diagram of terpene synthases with annotated sequence features (Keeling and Bohlmann, 2006).

Chapter 2

Evolution of Typical Plant Terpene Synthase Genes in Non-Seed Plants

2.1 Abstract

The terpene synthases (TPSs) are widely encoded in plants and are the primary enzymes responsible for producing diverse natural compounds. Previous studies, which typically focus on sequenced plant species, classify this mid-size gene family into seven subfamilies: a, b, c, d, g, e/f and h. However, TPSs from non-seed plant species are poorly represented in this classification. In this study, by adding more TPSs identified from large-scale non-seed plant transcriptomes and several sequenced non-seed plant genomes, we were able to increase the resolution of TPS subfamily classification. The new phylogenetic analysis recognized the seven previously known subfamilies and a new subfamily x. The tree topologies are dominated by taxonomic relatedness, and the placements of some TPSs from Physcomitrella patens (PpCPS/KS) and Selaginella moellendorffii are changed. Subfamilies x and h are specific to non-seed plants. The new subfamily x contains TPSs from all major lineages of non-seed plants except lycophytes, while subfamily h has no TPSs from hornworts. Although TPSs in subfamilies x and h share similar motif composition, TPSs in subfamily h are more closely related to some sequences in subfamily d, which is specific to gymnosperms. This suggests that sequences in subfamily h are more similar to the ancestral three-domain gene in plants.

2.2 Introduction

Terpenoids, the largest and most structurally and functionally diverse class of natural compounds on earth, are mostly synthesized by plants and are involved in various plant-environment interactions. Some terpenoids are classified as primary metabolites essential for plant growth and development. Terpene synthases (TPSs), the key enzymes for terpenoid biosynthesis, are the major determinant of the tremendous diversity of terpenoid carbon skeletons. The TPS genes represent a mid-size family of 30–100 functional genes in almost all major sequenced plant genomes (Chen et al., 2011).

Terpene synthases can be further divided into seven subfamilies: a, b, c, d, g, e/f and h (Bohlmann et al., 1998; Chen et al., 2011; Dudareva et al., 2003; Martin and Bohlmann, 2004). The a, b and g subfamilies are angiosperm-specific, while the d subfamily is gymnosperm-specific. Subfamily c contains copalyl diphosphate synthases (CPSs) and includes the single terpene synthase gene in the moss Physcomitrella patens, genes from angiosperms and gymnosperms, and genes from Selaginella moellendorffii. Subfamily e/f contains ent-kaurene synthases (KSs) from angiosperms and gymnosperms. The new subfamily h is specific to Selaginella moellendorffii, and members of this subfamily contain both DxDD and DDxxD motifs, which are signatures of bifunctional diterpene synthases.

However, most of our current knowledge of terpene synthase genes is based on studies in model plant species. Among the 50 plants with sequenced genomes, the vast majority (94%) are angiosperms, with only one species from each of three other clades: gymnosperms, bryophytes and lycophytes (Michael and Jackson, 2013). The inferred subfamilies are based on the phylogenetic analysis of terpene synthases from seven sequenced plant genomes and selected sequences from gymnosperms. Among these seven plant species, Selaginella moellendorffii is the only non-seed plant analyzed, and some members of its terpene synthase family form the new subfamily h. Given the high diversity of terpenoids, taxonomically specialized TPS genes should exist in some plant species or families, particularly in non-seed plants. Therefore, to gain a comprehensive understanding of the evolutionary history of this gene family, studies covering a broader range of plant species are indispensable, as also suggested in previous studies (Trapp and Croteau, 2001).

In recent years, advances in high-throughput sequencing techniques have allowed us to discover genes and profile their expression patterns more quickly and cheaply. Many studies (Bleeker et al., 2011; Drew et al., 2013; Hall et al., 2013; Han et al., 2013; Keeling et al., 2011) have used transcriptome mining to identify terpene synthase genes in non-model species. For example, the PhytoMetaSyn project, which aims to sequence 75 non-model plants that produce high-value metabolites such as polyketides, terpenes and alkaloids, has already identified a list of new biosynthetic genes involved in these metabolic pathways (Xiao et al., 2013). Another project is the One Thousand Plant Transcriptomes Project (OneKP), which generates transcriptomes for over 1000 plant species that span almost all major plant clades (from green algae to flowering plants), aiming to increase the taxonomic diversity of available genes in GenBank. In addition, genomes of more non-seed plants are emerging, such as those of a moss (Sphagnum fallax), a liverwort (Marchantia polymorpha), a hornwort (Anthoceros punctatus) and two ferns (Salvinia cucullata and Azolla superorganism). Analysis of terpene synthases from these genomic datasets of non-seed plants will provide valuable insights into the evolutionary history of terpene synthases.

2.3 Results and Discussion

2.3.1 Identification of Terpene Synthase Genes from Genomes of Hornwort, Moss, Liverwort and Two Ferns

Six non-seed plant species (Table 2.1) with available genomes were screened for typical plant terpene synthase genes. These six species span all five lineages of non-seed plants. The two fern genomes, Salvinia cucullata and Azolla superorganism, contain 7 (Table 2.2) and 1 (Table 2.3) terpene synthase genes, respectively. Five of the seven terpene synthase genes in Salvinia cucullata have DxDD or DDxxD motifs, while the other two, ScuTPS5 and ScuTPS7, have neither of these functional motifs. The longest one, ScuTPS1, has the DxDx and DDxxD motifs. All seven genes lack the NDSE motif. The only gene in the Azolla superorganism genome encodes a protein of 894 amino acids with the DxDD and DDxxD motifs. The genome assembly of the hornwort Anthoceros punctatus is incomplete and contains more than 15,000 contigs (Table 2.4). Twenty-two terpene synthases were identified, with protein lengths ranging from 186 to 880 amino acids (Table 2.5). Fourteen of them contain at least one functional motif. ApuTPS5 and ApuTPS16 have both the DxDD and DDxxD motifs. The genome of the liverwort Marchantia polymorpha contains seven terpene synthases (Table 2.6), all of which are around 800 amino acids long. The genome of the moss Sphagnum fallax has four terpene synthases, all of which have both the DxDD and DDxxD motifs except the shortest one (Tables 2.7 and 2.8). For Selaginella moellendorffii, 18 terpene synthases were identified in our previous study, and three of them are putative bifunctional terpene synthases.

2.3.2 Identification of Terpene Synthases from Non-Seed Plant Transcriptomes

By mining the transcriptomes of 324 non-seed plant species, we found 253 archetypical plant terpene synthases (at least 150 amino acids long) from 88 species (Table 2.9). The median and average lengths of these 253 terpene synthases are 370 and 433 amino acids, respectively (Figure 2.1). They are present in all major clades of non-seed plants: monilophytes, lycophytes, hornworts, and liverworts. However, they are completely absent from two groups of green algae: Chlorophyta and Charophyta. It is also clear from Table 2.9 that all lycophytes and the majority of liverworts screened encode terpene synthase genes, while only 27% (17 out of 90) of monilophytes contain terpene synthase genes.

2.3.3 Search for Terpene Synthase Genes in Green Algae

To confirm the complete absence of terpene synthase genes in green algae, we further analyzed additional genomic datasets obtained from other sources. In total, 8 algal genomes (1 from Charophyta and 7 from Chlorophyta) and 4 transcriptomes (3 from Charophyta and 1 from Chlorophyta) were screened for the presence of terpene synthase genes (Table 2.10). The BLASTP searches did not find any significant hits to known terpene synthase genes. This result, at least for now, further supports the absence of terpene synthase genes in green algae and is consistent with the findings of an earlier study (Sasso et al., 2012).

2.3.4 Phylogenetic Analysis of TPSs from Non-Seed Plants with TPSs from Selected Seed Plants

Previous studies, mainly in seed plants, classified terpene synthases into seven subfamilies (TPS-a to TPS-h) by means of phylogenetic analysis. In this study, we extended this analysis by adding terpene synthases identified from large-scale non-seed plant transcriptomes and genomes. In total, 463 sequences were collected for the phylogenetic analyses: 253 plant terpene synthases identified from 88 OneKP species; 60 from 7 non-seed plant genomes (7 in Salvinia cucullata, 1 in Azolla superorganism, 18 in Selaginella moellendorffii, 22 in Anthoceros punctatus, 4 in Sphagnum fallax, 7 in Marchantia polymorpha and 1 in Physcomitrella patens); 1 bifunctional TPS from the transcriptome of Lygodium japonicum; and 149 selected TPSs from seed plants (Arabidopsis thaliana, Oryza sativa, Populus trichocarpa, Sorghum and gymnosperms). An initial multiple sequence alignment of the 463 terpene synthase protein sequences was used to detect the conserved functional DxDD and DDxxD motifs in each sequence. Sequences containing at least one of the two motifs were selected.
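
The motif screen described above can be illustrated with a minimal Python sketch. It assumes that the DxDD and DDxxD motifs are treated as simple patterns in which x matches any residue, and that sequences are scanned after alignment gaps are removed rather than by alignment column. The function names and toy sequences are hypothetical and are not part of the original analysis.

```python
import re

# Motif patterns as regular expressions; "x" in the text means any residue.
MOTIFS = {
    "DxDD": re.compile(r"D.DD"),
    "DDxxD": re.compile(r"DD..D"),
}

def find_motifs(protein):
    """Return the names of the motifs present in a protein sequence."""
    seq = protein.upper().replace("-", "")  # drop alignment gaps if present
    return {name for name, pattern in MOTIFS.items() if pattern.search(seq)}

def select_with_motif(sequences):
    """Keep only sequences carrying at least one of the two motifs."""
    return {sid: seq for sid, seq in sequences.items() if find_motifs(seq)}

# Toy sequences (invented, not real TPS proteins).
toy = {
    "seq_a": "MKLADIDDTAMAF",   # carries DxDD only
    "seq_b": "MKLVDDIYDVYGT",   # carries DDxxD only
    "seq_c": "MKLAGGSTAMAF",    # carries neither motif
}
selected = select_with_motif(toy)
print(sorted(selected))
```

Scanning ungapped sequences is a simplification: a pattern can match outside the conserved alignment position, so in practice the matches would still need to be checked against the alignment.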

The aforementioned curation and criteria resulted in a final dataset of 339 terpene synthases (200 from non-seed plants and 139 from seed plants). The phylogenetic analysis based on the alignment of these 339 terpene synthases enabled us to classify them into eight subfamilies: the seven currently recognized subfamilies (a, b, c, d, e/f, g and h) and one newly identified subfamily x (Figure 2.2 and Figure 2.3). The sequence motifs observed in each subfamily and in each terpene synthase are displayed in Figure 2.2 and Figure 2.3, respectively. As shown in Figure 2.2, the TPS-a, b, g and d subfamilies are specific to seed plants. The RR(x)8W motif, which is essential for the isomerization of substrates (Williams et al., 1998), is conserved in both the b and d subfamilies. Variations of it are found in subfamily a, and it is completely absent from subfamily g. Within these four subfamilies, the SAYDTAWVA motif, which is shared by both CPS and KS proteins (Rodriguez et al., 2012; Sakamoto et al., 2004; Smith et al., 1998), is only observed in subfamily d, especially in sequences having both DxDD and DDxxD motifs. All four subfamilies have conserved RW and RWD motifs around the DDxxD motif. The functions of both motifs are unknown. Another motif, whose function is known and which is shared by all four subfamilies, is the RxR motif, which plays a role in directing the diphosphate anion away from the reactive carbocation after ionization (Kollner et al., 2004; Starks et al., 1997; Tholl et al., 2005). In subfamily g, this motif is changed to RxQ. The typical motifs, DDxxD and NSE/DTE, are conserved among all four subfamilies.

Subfamily h, which is specific to non-seed plants and previously included only sequences from Selaginella moellendorffii, now has members from all five lineages of non-seed plants except the hornworts. The RR(x)8W motif is completely absent, and all other motifs investigated are observed in almost all members of this subfamily, the exceptions being the ten members from liverworts. They lack both the DDxxD and NSE/DTE motifs, but the RxR and RWD motifs are intact. One possible explanation is that the liverwort sequences lost these motifs during evolution, retain only the DxDD motif and function as class II terpene synthases. This contrasts with the general feature of this subfamily, namely that sequences have both functional motifs and are putative bifunctional terpene synthases.

Subfamily e/f contains the class I terpene synthases with the DDxxD signature. All members possess the conserved SAYDTAWVA motif and a modified RxR motif, but lack the RR(x)8W and DxDD motifs. Members are from both seed and non-seed plants. Interestingly, no sequences from monilophytes or mosses are found in this subfamily.

Another subfamily that includes sequences from both seed and non-seed plants is subfamily c, which contains CPS proteins from all four angiosperm species, gymnosperms and lycophytes. Members of this subfamily have motif features similar to those shared by the ten liverwort members of subfamily h. They contain the DxDD motif, but the NSE/DTE motif is missing. They also share an almost completely conserved SAYDTAWVA motif.

Subfamily x, the largest subfamily with 109 members, is specific to non-seed plants. Its members come from all five lineages of non-seed plants except the lycophytes and cluster according to the lineages to which they belong. The single terpene synthase in the genome of Physcomitrella patens, PpCPS/KS, belongs to subfamily c according to previous studies. However, it is now placed in subfamily x and clusters with sequences from other mosses. Its new placement is strongly supported by the obvious differences in motif features between subfamilies c and x. Members of subfamily x contain all conserved motifs except the RR(x)8W motif and are putative bifunctional terpene synthases, while members of subfamily c have only the DxDD functional motif and are bona fide CPSs.

2.4 Conclusions

In this chapter, typical plant terpene synthases in non-seed plant transcriptomes and genomes were identified and characterized. A phylogenetic analysis of these newly identified terpene synthases from non-seed plants and selected terpene synthases from seed plants recognized the seven previously known subfamilies and a new subfamily x. Members of different subfamilies usually have different motif features, and sequences from the same plant lineage tend to cluster together. Subfamilies a, b, g and d are specific to seed plants, while subfamilies h and x are specific to non-seed plants. Subfamilies e/f and c include members from both seed plants and non-seed plants.

Although the results obtained here are largely consistent with previous studies, they still provide new findings and insights into the evolution of the terpene synthase gene family, for example, the discovery of the new subfamily x. These results can be attributed to the increased number of terpene synthases identified from non-seed plants. In previous studies, sequences from non-seed plants were limited to two sequenced genomes: Physcomitrella patens and Selaginella moellendorffii. When more sequences from non-seed plants were added, the placements of some sequences found in these two genomes changed. For example, PpCPS/KS from Physcomitrella patens is now in subfamily x, and some Selaginella moellendorffii genes that previously clustered with gymnosperm KS proteins now cluster with sequences from liverworts and lycophytes (Figure 2.3).

A previous study suggested that modern plant terpene synthases originated from the bifunctional diterpene synthase PpCPS/KS in Physcomitrella patens (Hayashi et al., 2006). However, in my results, it falls in the largest subfamily, x, and appears to be a very common gene in non-seed plants. It is very likely that members of subfamily x are among the oldest existing terpene synthases in land plants, but not the actual ancestors of modern plant terpene synthases, or at least not the immediate ancestors. In another study (Trapp and Croteau, 2001), an extant relative of a conifer diterpene synthase was suggested to be the ancestor. It seems that sequences in subfamily h are more closely related to the ancestral three-domain gene in plants, in view of their closer sequence similarity to the gymnosperm terpene synthases and their almost identical motif features to members of subfamily x. Also, some members of subfamily h, such as the ten liverwort sequences, function as monofunctional enzymes even though they still have the three-domain structure: the loss of the DDxxD and NSE/DTE motifs makes them CPS proteins. The motif features of gymnosperm diterpene synthases are also similar to those in subfamily h. These seven diterpene synthases (Figure 2.3) are the only group of sequences containing the SAYDTAWVA motif in seed plants.

2.5 Materials and Methods

2.5.1 Data Retrieval, Management and Taxonomy Classification

Transcriptomes assembled by SOAPdenovo-Trans were downloaded from the repository at Westgrid (http://onekp.westgrid.ca/1kp-data). Each sample typically has about 10k scaffolds longer than 1 kb. For species having multiple samples from different tissues or developmental stages, only one sample (the merged one) was kept for further analysis, resulting in a total of 1175 representative species (Appendix A.1), 324 of which are non-seed plants. For each of these species, both assembled sequences and predicted protein sequences were pre-processed to make the datasets more uniform and to meet the requirements of different tools, for example by shortening sequence names or assigning a unique and easy-to-distinguish sequence ID. All original information and the newly assigned sequence information were integrated into an SQLite database for future reference. The flat sequence files were then loaded into a local searchable BLAST database.
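
As an illustration of this bookkeeping step, the following sketch shows one way the mapping between original and newly assigned sequence names could be stored in SQLite. The table layout, the naming scheme and the sample code ABCD are assumptions made for the example; the actual schema used in this work is not described here.

```python
import sqlite3

def build_id_map(conn, sample_code, original_names):
    """Assign short unique ids to sequences and record the mapping in SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seq_ids ("
        "new_id TEXT PRIMARY KEY, sample TEXT, original_name TEXT)"
    )
    rows = []
    for i, name in enumerate(original_names, start=1):
        new_id = f"{sample_code}_{i:06d}"  # hypothetical naming scheme
        rows.append((new_id, sample_code, name))
    conn.executemany("INSERT INTO seq_ids VALUES (?, ?, ?)", rows)
    conn.commit()
    return [row[0] for row in rows]

# An in-memory database keeps the example self-contained;
# a file path would be used to keep the mapping for later reference.
conn = sqlite3.connect(":memory:")
names = [
    "scaffold-ABCD-2000001-Species_name",  # invented original names
    "scaffold-ABCD-2000002-Species_name",
]
ids = build_id_map(conn, "ABCD", names)

# Look up the original name behind a shortened id.
original = conn.execute(
    "SELECT original_name FROM seq_ids WHERE new_id = ?", (ids[0],)
).fetchone()[0]
print(ids[0], "->", original)
```

Keeping the mapping in a database rather than in the FASTA headers lets the short IDs satisfy tools with strict name limits while the full original annotation remains recoverable.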

Although the OneKP initiative provides basic taxonomic assignments for each species, not all of them are indexed by the NCBI Taxonomy database (http://www.ncbi.nlm.nih.gov/taxonomy) because of mismatched names between the two sources. After these mismatches were manually corrected, each species was assigned a unique taxonomic identifier, and the species tree, which displays the taxonomic relationships among these species, was generated using the taxident function in the NCBI C++ Toolkit.

2.5.2 Identification of Typical Plant Terpene Synthases from Transcriptomes of 324 Non-Seed Plants

For all the assembled contigs, the longest regions without stop codons were annotated and translated using the getorf program from the EMBOSS package (Rice et al., 2000) with a minimum length of 50 amino acids. The resulting peptides were searched against the Pfam-A database locally using HMMER 3.0 hmmsearch (Finn et al., 2011) with an E-value cutoff of 1e-5. Only sequences with best hits to one of the following three HMM profiles were considered putative terpene synthases: Terpene_synth_C (PF03936), Terpene synthase N-terminal domain (PF01397) and TRI5 (PF06330). For sequences from the same species that have 100% identity, only the longest one was kept as the representative sequence to reduce redundancy. All the putative TPS sequences were subjected to a BLASTP (Altschul et al., 1990) search against NCBI’s non-redundant database using default parameters.
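
A minimal sketch of the ORF extraction idea is given below. It is not the getorf implementation: it assumes the standard genetic code, scans all six reading frames, and returns only the single longest stop-free region of a contig if it reaches the minimum length. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def longest_open_region(dna, min_len=50):
    """Return the longest stop-free peptide over all six frames, or None."""
    dna = dna.upper()
    strands = (dna, dna.translate(COMPLEMENT)[::-1])  # forward, reverse complement
    best = ""
    for strand in strands:
        for frame in range(3):
            # Stop codons translate to '*', so splitting yields stop-free regions.
            for peptide in translate(strand[frame:]).split("*"):
                if len(peptide) > len(best):
                    best = peptide
    return best if len(best) >= min_len else None

contig = "ATG" + "GCT" * 60 + "TAA"  # toy contig, not real data
peptide = longest_open_region(contig)
print(len(peptide) if peptide else "no region of at least 50 residues")
```

Note that a region between stop codons need not begin with a methionine, which suits fragmentary transcriptome contigs where the true start codon may be missing.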

2.5.3 Assembly of Hornwort Anthoceros punctatus Genome and Identification of Terpene Synthases

The Illumina paired-end whole-genome sequencing data (accession number: SRR1278954) were retrieved from NCBI’s Sequence Read Archive (SRA) database. The reads were assembled using SPAdes 3.1.1 (Bankevich et al., 2012; Nurk et al., 2013), and the resulting contigs and singletons were further assembled by CAP3 (Huang and Madan, 1999). The final CAP3 assembly contains 34448 sequences (16272 contigs and 18176 singletons), of which 15596 sequences have a minimum length of 500 bp. The assembly statistics based on these 15596 sequences, generated by the QUality ASsessment Tool (QUAST) (Gurevich et al., 2013), are given in Table 2.4.
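
Summary statistics of this kind can be computed directly from the sequence lengths. The sketch below applies the 500 bp length filter and reports a few common statistics, including N50. It is assumed here, not confirmed by the text, that N50 is among the values reported, and QUAST's own definitions may differ in detail. The toy lengths are invented.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input

def assembly_summary(lengths, min_len=500):
    """Summarize an assembly after discarding sequences shorter than min_len."""
    kept = [n for n in lengths if n >= min_len]
    return {
        "count": len(kept),
        "total_bp": sum(kept),
        "longest": max(kept, default=0),
        "n50": n50(kept),
    }

toy_lengths = [300, 500, 800, 1200, 2500, 4000]  # invented contig lengths in bp
summary = assembly_summary(toy_lengths)
print(summary)
```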

The final assembly was then searched for terpene synthases using homology-based methods and ab initio gene predictors. TBLASTN searches were performed with an E-value cutoff of 1e-30, using as queries the known typical plant terpene synthases from Arabidopsis thaliana and Oryza sativa and the typical plant terpene synthases identified from 324 transcriptomes of non-seed plant species. We also ran SNAP (Korf, 2004), trained for Arabidopsis thaliana, and Softberry (Solovyev et al., 2006), with parameters specific to Physcomitrella patens, on the assembly. The protein sequences of the predicted genes were subsequently subjected to a HMMER search against three HMM profiles (PF03936, PF01397 and PF06330).
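The HMMER screening can be sketched in Python as a filter over the tabular output of hmmsearch (the `--tblout` format, in which the sequence name is the first field, the profile accession the fourth, and the full-sequence E-value and bit score the fifth and sixth). This is only an illustration under stated assumptions: the function names are hypothetical, the "best hit" is taken to be the profile with the highest full-sequence bit score, and the 1e-5 default is the cutoff quoted for the Pfam-A search.

```python
TPS_PROFILES = {"PF03936", "PF01397", "PF06330"}


def best_hits(tblout_lines, max_evalue=1e-5):
    """Map each sequence to its best-scoring profile hit.

    Returns {sequence_id: (pfam_accession, bit_score, e_value)}.
    """
    best = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        seq_id = fields[0]
        pfam_acc = fields[3].split(".")[0]  # drop the version suffix
        evalue = float(fields[4])
        score = float(fields[5])
        if evalue > max_evalue:
            continue
        if seq_id not in best or score > best[seq_id][1]:
            best[seq_id] = (pfam_acc, score, evalue)
    return best


def putative_tps(tblout_lines, max_evalue=1e-5):
    """Return sorted IDs of sequences whose best hit is a TPS-related profile."""
    hits = best_hits(tblout_lines, max_evalue)
    return sorted(s for s, (acc, _, _) in hits.items() if acc in TPS_PROFILES)
```

A sequence that hits a terpene synthase profile only weakly, but some other Pfam family more strongly, is excluded by this criterion. When the search is restricted to the three profiles alone, as in this section, the filter reduces to the E-value threshold.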

2.5.4 Identification of Terpene Synthases from Marchantia polymorpha and Sphagnum fallax

To identify typical plant TPSs, the annotated protein sequences of the Marchantia polymorpha genome (version 3.1) and Sphagnum fallax (version 0.5, downloaded from JGI Phytozome, https://phytozome.jgi.doe.gov/pz/portal.html) were subjected to HMMER searches against three HMM profiles (PF03936, PF01397, PF06330). Each TPS gene was manually checked, and the exon-intron information was obtained from the genome annotation.

2.5.5 Identification of Terpene Synthases from Salvinia cucullata and Azolla superorganism

The genome and transcriptome assemblies for both species were shared by Fay-Wei Li.

The annotated protein sequences from both species were searched using HMMER. One

TPS was found in Azolla superorganism and none was found in Salvinia cucullata. To ensure that no terpene synthases were missed, tBLASTn was used to search the genome and transcriptome assemblies, with terpene synthases identified from non-seed plants and from Lygodium japonicum (GI: 747018826) and Physcomitrella patens (GI: 146325986) as queries. The tBLASTn searches confirmed the single typical plant terpene synthase in Azolla superorganism and found 7 terpene synthases in Salvinia cucullata, of which 3 were identified from its transcriptome assembly and the other 4 from its genome assembly.

2.5.6 Identification of Terpene Synthases from Lygodium japonicum

The Lygodium japonicum transcriptome data were downloaded from http://bioinf.mind.meiji.ac.jp/kanikusa/download.php and searched using HMMER to identify typical plant terpene synthases. Two short sequences, isotig20305_F_T_0.4 (131 amino acids long) and isotig20305_N_P_0.4 (114 amino acids long), were identified. Both sequences are part of the known ent-kaurene synthase (GI: 747018826) in Lygodium japonicum.

2.5.7 Search for Terpene Synthase Genes in Twelve Algal Genomes and Transcriptomes

Typical plant TPS genes from Arabidopsis thaliana, Oryza sativa and Physcomitrella patens were used as BLAST queries to search the genomic datasets of the following twelve algal species: Klebsormidium flaccidum, Nitella mirabilis, Coleochaete orbicularis, Mesostigma viride, Chlamydomonas reinhardtii v5.5, Ostreococcus tauri, Ostreococcus lucimarinus v2.0, Micromonas pusilla CCMP1545 v3.0, Micromonas sp. RCC299 v3.0, Volvox carteri v2.1, Coccomyxa subellipsoidea C-169 v2.0 and Spirogyra pratensis (Table 2.10). The BLAST searches were performed at http://marchantia.info/blast/blast.html and JGI Phytozome.

2.5.8 TPSs on JBrowse

For Anthoceros punctatus, Sphagnum fallax and Salvinia cucullata, all TBLASTN results and the protein-coding genes predicted by SNAP and Softberry are displayed on a local JBrowse instance. The transcriptome data from non-seed plant species were mapped to the corresponding draft genomes by BLASTN, and the results are also displayed on JBrowse.

2.5.9 Phylogenetic Analyses of Terpene Synthases

Sequences were aligned using MAFFT (einsi) with 1000 iterations of improvement.

ProtTest (Darriba et al., 2011) was used to select the most appropriate model of protein evolution for the alignment under the Akaike Information Criterion. For the maximum likelihood analyses, we used RAxML (Stamatakis, 2014) with 1000 bootstrap replicates under the best substitution model (JTT+G+F).
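The Akaike Information Criterion used for model selection is AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood; the model with the lowest AIC is preferred. The Python sketch below shows the calculation only. The function names are hypothetical, and any model names and likelihood values passed to it would be placeholders rather than ProtTest output.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model_by_aic(fits):
    """Pick the model with the lowest AIC.

    `fits` maps a model name to (log_likelihood, n_params).
    Returns (best_name, {name: AIC difference from the best model}).
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores, key=scores.get)
    deltas = {name: score - scores[best] for name, score in scores.items()}
    return best, deltas
```

The AIC differences returned alongside the best model indicate how much support the competing models lose relative to it.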

2.6 Bibliography

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–10.

Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., Lesin, V. M., Nikolenko, S. I., Pham, S., Prjibelski, A. D., Pyshkin, A. V., Sirotkin, A. V., Vyahhi, N., Tesler, G., Alekseyev, M. A., and Pevzner, P. A. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology, 19(5):455–77.

Bleeker, P. M., Spyropoulou, E. A., Diergaarde, P. J., Volpin, H., De Both, M. T., Zerbe, P., Bohlmann, J., Falara, V., Matsuba, Y., Pichersky, E., Haring, M. A., and Schuurink, R. C. (2011). RNA-seq discovery, functional characterization, and comparison of sesquiterpene synthases from Solanum lycopersicum and Solanum habrochaites trichomes. Plant Molecular Biology, 77(4-5):323–36.

Bohlmann, J., Meyer-Gauen, G., and Croteau, R. (1998). Plant terpenoid synthases: molecular biology and phylogenetic analysis. Proceedings of the National Academy of Sciences of the United States of America, 95(8):4126–33.

Chen, F., Tholl, D., Bohlmann, J., and Pichersky, E. (2011). The family of terpene synthases in plants: a mid-size family of genes for specialized metabolism that is highly diversified throughout the kingdom. Plant Journal, 66(1):212–29.

Darriba, D., Taboada, G. L., Doallo, R., and Posada, D. (2011). ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics, 27(8):1164–5.

Drew, D. P., Dueholm, B., Weitzel, C., Zhang, Y., Sensen, C. W., and Simonsen, H. T. (2013). Transcriptome analysis of Thapsia laciniata Rouy provides insights into terpenoid biosynthesis and diversity in Apiaceae. International Journal of Molecular Sciences, 14(5):9080–98.

Dudareva, N., Martin, D., Kish, C. M., Kolosova, N., Gorenstein, N., Fäldt, J., Miller, B., and Bohlmann, J. (2003). (E)-β-ocimene and myrcene synthase genes of floral scent biosynthesis in snapdragon: Function and expression of three terpene synthase genes of a new terpene synthase subfamily. Plant Cell, 15(5):1227–1241.

Finn, R. D., Clements, J., and Eddy, S. R. (2011). HMMER web server: interactive sequence similarity searching. Nucleic Acids Research, 39(Web Server issue):W29–37.

Gurevich, A., Saveliev, V., Vyahhi, N., and Tesler, G. (2013). QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29(8):1072–5.

Hall, D. E., Yuen, M. M., Jancsik, S., Quesada, A. L., Dullat, H. K., Li, M., Henderson, H., Arango-Velez, A., Liao, N. Y., Docking, R. T., Chan, S. K., Cooke, J. E., Breuil, C., Jones, S. J., Keeling, C. I., and Bohlmann, J. (2013). Transcriptome resources and functional characterization of monoterpene synthases for two host species of the mountain pine beetle, lodgepole pine (Pinus contorta) and jack pine (Pinus banksiana). BMC Plant Biology, 13:80.

Han, X. J., Wang, Y. D., Chen, Y. C., Lin, L. Y., and Wu, Q. K. (2013). Transcriptome sequencing and expression analysis of terpenoid biosynthesis genes in Litsea cubeba. PLOS ONE, 8(10):e76890.

Hayashi, K. i., Kawaide, H., Notomi, M., Sakigi, Y., Matsuo, A., and Nozaki, H. (2006). Identification and functional analysis of bifunctional ent-kaurene synthase from the moss Physcomitrella patens. FEBS Letters, 580(26):6175–6181.

Huang, X. and Madan, A. (1999). CAP3: a DNA sequence assembly program. Genome Research, 9(9):868–77.

Keeling, C. I., Weisshaar, S., Ralph, S. G., Jancsik, S., Hamberger, B., Dullat, H. K., and Bohlmann, J. (2011). Transcriptome mining, functional characterization, and phylogeny of a large terpene synthase gene family in spruce (Picea spp.). BMC Plant Biology, 11:43.

Kollner, T. G., Schnee, C., Gershenzon, J., and Degenhardt, J. (2004). The variability of sesquiterpenes emitted from two Zea mays cultivars is controlled by allelic variation of two terpene synthase genes encoding stereoselective multiple product enzymes. Plant Cell, 16(5):1115–31.

Korf, I. (2004). Gene finding in novel genomes. BMC Bioinformatics, 5:59.

Martin, D. M. and Bohlmann, J. (2004). Identification of Vitis vinifera (-)-α-terpineol synthase by in silico screening of full-length cDNA ESTs and functional characterization of recombinant terpene synthase. Phytochemistry, 65(9):1223–1229.

Michael, T. P. and Jackson, S. (2013). The first 50 plant genomes. Plant Gen., 6(2):–.

Nurk, S., Bankevich, A., Antipov, D., Gurevich, A. A., Korobeynikov, A., Lapidus, A., Prjibelski, A. D., Pyshkin, A., Sirotkin, A., Sirotkin, Y., Stepanauskas, R., Clingenpeel, S. R., Woyke, T., McLean, J. S., Lasken, R., Tesler, G., Alekseyev, M. A., and Pevzner, P. A. (2013). Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. Journal of Computational Biology, 20(10):714–37.

Rice, P., Longden, I., and Bleasby, A. (2000). EMBOSS: the European Molecular Biology Open Software Suite. Trends in Genetics, 16(6):276–7.

Rodriguez, M. V., Mendiondo, G. M., Cantoro, R., Auge, G. A., Luna, V., Masciarelli, O., and Benech-Arnold, R. L. (2012). Expression of seed dormancy in grain sorghum lines with contrasting pre-harvest sprouting behavior involves differential regulation of gibberellin metabolism genes. Plant and Cell Physiology, 53(1):64–80.

Sakamoto, T., Miura, K., Itoh, H., Tatsumi, T., Ueguchi-Tanaka, M., Ishiyama, K., Kobayashi, M., Agrawal, G. K., Takeda, S., Abe, K., Miyao, A., Hirochika, H., Kitano, H., Ashikari, M., and Matsuoka, M. (2004). An overview of gibberellin metabolism enzyme genes and their related mutants in rice. Plant Physiology, 134(4):1642–53.

Sasso, S., Pohnert, G., Lohr, M., Mittag, M., and Hertweck, C. (2012). Microalgae in the postgenomic era: a blooming reservoir for new natural products. FEMS Microbiology Reviews, 36(4):761–85.

Smith, M. W., Yamaguchi, S., Ait-Ali, T., and Kamiya, Y. (1998). The first step of gibberellin biosynthesis in pumpkin is catalyzed by at least two copalyl diphosphate synthases encoded by differentially regulated genes. Plant Physiology, 118(4):1411–9.

Solovyev, V., Kosarev, P., Seledsov, I., and Vorobyev, D. (2006). Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biology, 7 Suppl 1:S10 1–12.

Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9):1312–3.

Starks, C. M., Back, K., Chappell, J., and Noel, J. P. (1997). Structural basis for cyclic terpene biosynthesis by tobacco 5-epi-aristolochene synthase. Science, 277(5333):1815–20.

Tholl, D., Chen, F., Petri, J., Gershenzon, J., and Pichersky, E. (2005). Two sesquiterpene synthases are responsible for the complex mixture of sesquiterpenes emitted from Arabidopsis flowers. Plant Journal, 42(5):757–71.

Trapp, S. C. and Croteau, R. B. (2001). Genomic organization of plant terpene synthases and molecular evolutionary implications. Genetics, 158(2):811–32.

Williams, D. C., McGarvey, D. J., Katahira, E. J., and Croteau, R. (1998). Truncation of limonene synthase preprotein provides a fully active ’pseudomature’ form of this monoterpene cyclase and reveals the function of the amino-terminal arginine pair. Biochemistry, 37(35):12213–20.

Xiao, M., Zhang, Y., Chen, X., Lee, E. J., Barber, C. J., Chakrabarty, R., Desgagne-Penix, I., Haslam, T. M., Kim, Y. B., Liu, E., MacNevin, G., Masada-Atsumi, S., Reed, D. W., Stout, J. M., Zerbe, P., Zhang, Y., Bohlmann, J., Covello, P. S., De Luca, V., Page, J. E., Ro, D. K., Martin, V. J., Facchini, P. J., and Sensen, C. W. (2013). Transcriptome analysis based on next-generation sequencing of non-model plants producing specialized metabolites of biotechnological interest. Journal of Biotechnology, 166(3):122–34.

2.7 Appendix

Table 2.1: Total number of TPS genes identified in six non-seed plant genomes.

Lineage      Species                      TPS Count
Ferns        Salvinia cucullata           7
             Azolla superorganism         1
Lycophytes   Selaginella moellendorffii   18
Hornworts    Anthoceros punctatus         22
Mosses       Sphagnum fallax              4
Liverworts   Marchantia polymorpha        7

Table 2.2: Typical plant terpene synthase genes in Salvinia cucullata.

ID        Contig / Gene_loc      Strand   Start     End       Length
ScuTPS1   XLOC_018097|g.99102    +        119       2299      727
ScuTPS2   XLOC_008367|g.46493    +        3         620       206
ScuTPS3   XLOC_008367|g.46489    +        81        1586      502
ScuTPS4   scf7180000009411       +        140260    147974    447
ScuTPS5   scf7180000010110       -        1257296   1259664   260
ScuTPS6   scf7180000010110       -        1251971   1256640   270
ScuTPS7   scf7180000010334       -        568868    581876    611

Table 2.3: Typical plant terpene synthase genes in Azolla superorganism.

ID        Contig             Strand   Start     End       Length
AsuTPS1   scf7180000009599   -        1241006   1248428   894

Table 2.4: Assembly statistics for Anthoceros punctatus.

Statistics without reference   scaffolds.fasta.cap3
# contigs                      15596
# contigs (>= 0 bp)            34448
# contigs (>= 1000 bp)         13144
Largest contig                 94143
Total length                   94470290
Total length (>= 0 bp)         97989643
Total length (>= 1000 bp)      92634135
N50                            12462
N75                            5614
L50                            2103
L75                            4939
GC (%)                         49.52
Mismatches
# N's                          89463
# N's per 100 kbp              94.7

Table 2.5: Typical plant terpene synthase genes in Anthoceros punctatus.

ID         Contig                                        Strand   Start   End     Length
ApuTPS1    Contig1003                                    -        5384    6147    186
ApuTPS2    Contig11875                                   -        33079   36795   780
ApuTPS3    Contig1639                                    -        116     5200    880
ApuTPS4    Contig1919                                    +        1570    2527    225
ApuTPS5    Contig2366                                    -        4054    7915    841
ApuTPS6    Contig3848                                    +        585     1607    238
ApuTPS7    Contig4529                                    -        79      1572    497
ApuTPS8    Contig5467                                    -        219     1859    343
ApuTPS9    Contig5884                                    +        49      2024    379
ApuTPS10   Contig654                                     +        419     3724    676
ApuTPS11   Contig68                                      +        12275   14825   655
ApuTPS12   Contig799                                     +        4488    7388    422
ApuTPS13   Contig9716                                    +        118     5299    880
ApuTPS14   NODE_10797_length_1803_cov_7.30435_ID_21593   +        223     1267    159
ApuTPS15   NODE_11022_length_1747_cov_7.20567_ID_22043   -        240     1708    192
ApuTPS16   NODE_2742_length_9352_cov_7.29709_ID_5483     -        3261    7292    796
ApuTPS17   NODE_4282_length_6121_cov_7.00363_ID_8563     +        106     2462    615
ApuTPS18   NODE_4542_length_5765_cov_6.30368_ID_9083     +        30      2098    381
ApuTPS19   NODE_50_length_49024_cov_6.82293_ID_99        +        15844   19514   868
ApuTPS20   NODE_6403_length_3849_cov_17.8529_ID_12805    -        333     3138    600
ApuTPS21   NODE_717_length_21934_cov_18.8625_ID_1433     -        1739    5575    858
ApuTPS22   NODE_8602_length_2556_cov_7.03039_ID_17203    +        1289    2499    403

Table 2.6: Typical plant terpene synthase genes in Marchantia polymorpha.

Scaffold       ID                  Length   DXDD   DDXXD   NSE/DTE   Subfamily
scaffold_15    Mapoly0015s0008.1   882      +      -       -         c
scaffold_46    Mapoly0046s0122.1   879      +      -       -         h
scaffold_50    Mapoly0050s0107.1   791      +      +       +         c
scaffold_97    Mapoly0097s0049.1   858      -      +       +         e/f
scaffold_167   Mapoly0167s0025.1   795      -      +       +         e/f
scaffold_207   Mapoly0207s0001.1   892      +      +       +         c
scaffold_221   Mapoly0221s0001.1   779      +      +       +         c

Table 2.7: Assembly statistics for Sphagnum fallax.

Statistics without reference   sphag_f_v0.1
# contigs                      1929
# contigs (>= 0 bp)            1929
# contigs (>= 1000 bp)         1432
Largest contig                 9831565
Total length                   397359651
Total length (>= 0 bp)         397359651
Total length (>= 1000 bp)      396998157
N50                            1820331
N75                            926644
L50                            58
L75                            134
GC (%)                         36.44
Mismatches
# N's                          1372291
# N's per 100 kbp              345.35

Table 2.8: Typical plant terpene synthase genes in Sphagnum fallax.

Id                   Contig      Start    End      Strand   Protein_Length
Sphfalx0014s0037.1   super_14    437280   444282   +        858
Sphfalx0136s0053.1   super_136   830109   836630   -        720
Sphfalx0136s0055.1   super_136   857468   868275   +        894
Sphfalx0790s0001.1   super_790   29       3476     -        379

Table 2.9: Number of TPS genes from non-seed plant transcriptomes.

Lineage        TPS Count   Species with TPSs   Total Species
Monilophytes   47          19                  70
Lycophytes     54          21                  21
Hornworts      23          3                   7
Mosses         40          22                  41
Liverworts     89          23                  27
Charophyta     0           0                   47
Chlorophyta    0           0                   111
Total          253         88                  324

Table 2.10: List of charophyta transcriptomes/genomes.

Phylum        Species                               Data_type
Charophyta    Klebsormidium flaccidum               Genome
              Nitella mirabilis                     Transcriptome
              Coleochaete orbicularis               Transcriptome
              Mesostigma viride                     Transcriptome
Chlorophyta   Chlamydomonas reinhardtii v5.5        Genome
              Ostreococcus tauri                    Genome
              Ostreococcus lucimarinus v2.0         Genome
              Micromonas pusilla CCMP1545 v3.0      Genome
              Micromonas sp. RCC299 v3.0            Genome
              Volvox carteri v2.1                   Genome
              Coccomyxa subellipsoidea C-169 v2.0   Genome
              Spirogyra pratensis                   Transcriptome

[Figure 2.1 image; annotated value: 432.69370]

Figure 2.1: Distribution of protein lengths for typical plant terpene synthases in non-seed plants. The green dashed line indicates the median length, while the red line indicates the average protein length.

[Figure 2.2 image: phylogenetic tree with clades labeled a, b, c, d, e, f, g, h and x; legend: Seed plants, Hornworts, Monilophytes, Mosses, Lycophytes, Liverworts; scale: 4.0]

Figure 2.2: Phylogenetic relationships of terpene synthases from non-seed plants and selected TPSs from seed plants. Weblogos show the conserved motifs found in each subfamily.

Figure 2.3: Alignment of the conserved motifs found in 339 terpene synthases. Each motif is separated by three consecutive dots (...).

54 AT1G66020.1AT1G66020.1 RKFKKLPTSEW...------...DLCT...KW...RDN...DDTFD...RWA...NDITGFEDD AT4G20200.1AT4G20200.1 RKFKKLPTSEW...------...DLYT...KW...RDR...DDTFD...RWA...NDLMGYEDD AT4G20230.1AT4G20230.1 RKFKKLPLSEW...------...DLYT...KW...KDR...DDTFD...RWA...NDIAGFEDD AT3G29190.1AT3G29190.1 NMFKELPTSEW...------...DLYT...KW...RDR...DDTLD...RWA...NDMKGYKED AT4G20210.1AT4G20210.1 RKFQKFPPSEW...------...NLYT...KW...KYV...DDTCD...RWA...NDIAGFEDD AT3G29110.1AT3G29110.1 RPISKLPPSKW...------...DLYT...KW...RPV...DDTCD...RWA...NDLAGFEDD AT1G48800.1AT1G48800.1 RKFEKLGPSEW...------...DLYT...KW...RDR...DDTCD...RWD...NDLFGYKDD AT2G23230.1AT2G23230.1 RTFNKFPRSDW...------...DLST...KW...RDR...DDTCD...RWD...NDMGGFKDD AT4G15870.1AT4G15870.1 RKFKRLPLSKW...------...DLYT...KW...RDR...DDTCD...RWD...NDIAGFEGD AT4G13280.1AT4G13280.1 LAFTKLSHCQW...------...DLST...KW...RER...DDACD...RWS...DDITDFDSD AT4G13300.1AT4G13300.1 LAFTKLSHSQW...------...DLST...KW...RER...DDACD...RWN...DDITDFESD AT5G44630.1AT5G44630.1 TNFTKLPSSQW...------...KLHT...MW...RER...DDACD...RWD...DDIADFEED AT5G48110.1AT5G48110.1 RPLFQFPPSLL...------...DMYS...KW...RVR...DDLYD...RWE...DDVATYKDE AT3G14540.1AT3G14540.1 RPSTYFSPSLW...------...DLET...KW...RDR...DDTYD...RLN...NDVGTYETE AT3G14520.1AT3G14520.1 RPSTYFSPSLW...------...DLET...KW...RDR...DDTYD...RLN...NDVGTYETE TPS-a-1 AT1G31950.1AT1G31950.1 LPTPHFSPSLW...------...DLET...KW...RDR...DDTYD...SIE...NDVGTYETE AT3G29410.1AT3G29410.1 RPLTYFSPSYW...------...DLET...RW...RER...DDTCD...RWD...NDIVTFEQE AT1G33750.1AT1G33750.1 RPLPHSAPDLW...------...DLET...KW...RDR...DDTCD...RWD...NDIAGFEEE AT5G23960.1AT5G23960.1 RPLADFPANIW...------...DLYT...RW...RHR...DDMYD...EWL...DDISSYEFE POPTR_0019s06220.1POPTR_0019s06220.1 TPVTPAVPYLW...------...DLYT...RW...RDR...DDTCD...RCC...NDIVGHEDE POPTR_0019s06190.1POPTR_0019s06190.1 ------...------...DLYT...RW...RDR...DDTYD...RCC...NDIAGHEDE POPTR_0019s03350.1POPTR_0019s03350.1 ------...------...DLYT...RW...RDR...DDTYD...RCC...NDIVGHEDE 
POPTR_0121s00250.1POPTR_0121s00250.1 RPEANFPPSLW...------...DLST...RW...RDR...DDTYD...RCS...NDIVGHEDE POPTR_0019s01320.1POPTR_0019s01320.1 RPEANFPPSLW...------...DLYT...RW...RDR...DDTYD...RCS...NDIVGHEDE POPTR_0408s00200.1POPTR_0408s00200.1 RRKADFPPSLW...------...DLYT...--...RDR...DDTYD...RCS...NDLASHEDE POPTR_0007s07360.1POPTR_0007s07360.1 RPLADFPPTEW...------...DLHT...SW...RDR...DDTYD...RCN...NDIVTHEFE POPTR_0007s07410.1POPTR_0007s07410.1 RPLADFPPTEW...------...DLHT...SW...RDR...DDTYD...RCN...NDIVTHEFE POPTR_0015s09710.1POPTR_0015s09710.1 RRTANYHPSIW...------...DLYN...RW...RDR...DDIYD...RWD...DDVKSHKFE POPTR_0015s05270.1POPTR_0015s05270.1 RRTANYHPSIW...------...DLYN...RW...RDR...DDIYD...RWD...DDVRSHKFE POPTR_0001s44080.1POPTR_0001s44080.1 RRSASFPPSIW...------...DLYA...KW...RDR...DDIYD...RWE...DDIVSHKFE POPTR_0011s14600.1POPTR_0011s14600.1 RRSANFHPSIW...------...DLQM...MW...RDR...DDVYD...RWD...DDIVSHKFE POPTR_0005s09830.1POPTR_0005s09830.1 RRTAEFHPSVW...------...DLFT...RW...RDR...DDIYD...RWD...DDITSHEFE Os04g27190.1Os04g27190.1 KEVSSFEPSVW...------...SLHE...QW...RDR...DDTYD...RWD...NDIAAFKSG Os04g27340.1Os04g27340.1 KEVSSFEPSVW...------...SLHE...QW...RDR...DDTYD...RWD...NDIAAFKSG Os04g27540.1Os04g27540.1 ------...------...SLHE...EW...SDR...DDTYD...RWD...NDIAAFKHG Os04g27790.1Os04g27790.1 K-TTNYEPSVW...------...SLHE...EW...RDR...DDTYD...RWN...NDIASFERG Os04g27400.1Os04g27400.1 ------...------...SLHD...EW...RDR...DDTYD...RWD...NDIAAFKQG Os04g26960.1Os04g26960.1 ------...------...NLHE...RW...RDR...DDTYD...GWD...NDISGFKLG Os04g27070.1Os04g27070.1 ------...------...KLHE...RW...RDR...DDTYD...GWD...NDIAGFKLG sorghum01g015070.1sorghum01g015070.1 --VRTVDPSVW...------...TLHE...QW...RDR...DDTFD...RWD...NDIASFNSG sorghum06g031270.1sorghum06g031270.1 -----MRSNL-...------...CLHD...RY...ANI...DDTYD...RWE...NDLASFKHG sorghum01g035460.1sorghum01g035460.1 ERASGFRPTMF...------...CLHD...RW...RDR...DDTYD...RWE...NDLASFKRG sorghum07g020980.1sorghum07g020980.1 
--APVFHPTVW...------...SLHE...QW...RDR...DDTYD...RWD...DDMSAFKNG Os03g22634.1Os03g22634.1 ----MDDPSRW...------...SLHD...QW...RDR...DDTYD...SWD...DDMAAFKNG Os01g42610.1Os01g42610.1 TAAPAWPTAMW...------...NLHE...-W...RDR...DDTYD...RWD...DDLAVSQNG Os07g11790.1Os07g11790.1 -MAPAFHPAIF...------...DLYE...QW...RER...DDTYD...RWD...DDLAASHSG Os03g24690.1Os03g24690.1 -----LGRQKW...------...QLYL...LW...RDC...DDIYD...RWD...NDIASYKTG Os03g24680.1Os03g24680.1 --AAAVEENSG...------...DLHV...LW...RDR...DDTYD...RWD...------TPS-a-2 Os03g24640.1Os03g24640.1 -----LSSD--...------...DLYT...LW...RDR...DDTYD...RWD...NDIASHRVG sorghum01g034700.1sorghum01g034700.1 ------...------...GLHT...LW...RDR...DDTYD...RWD...NDIASYKKG sorghum05g019210.1sorghum05g019210.1 ------...------...ELHI...LW...LDR...DDTFD...RWD...----DMKGR sorghum07g025700.1sorghum07g025700.1 ------...------...DLHV...LW...RNR...DDTYD...R--...NDISSYYKP sorghum07g004480.1sorghum07g004480.1 QKLTTYHPSLW...------...DLNL...LW...RDR...DDIMD...RWE...NDIASTKRE sorghum07g004485.1sorghum07g004485.1 -----FHPSLW...------...DLNL...LW...RDR...DDIMD...RCE...NDIASTKRE sorghum07g004470.1sorghum07g004470.1 AKAPTFHPSLW...------...DLNL...AW...RDR...DDIID...RCS...NDITST--- Os08g07080.1Os08g07080.1 ------...------...DLHL...GW...RDR...DDIFD...MCN...NDIASTKRE Os01g23530.1Os01g23530.1 RKSTNFHPSLW...------...DLNL...LW...RDR...DDIFD...RWD...NDIVSNKRE sorghum07g005130.1sorghum07g005130.1 DVHPKFHSSLW...------...DLPT...LW...RDR...DDLYD...RWD...NDIMSYKRE sorghum07g003080.1sorghum07g003080.1 DCRRQYAPSVW...------...DLYV...MW...RDR...DDIYD...RWD...NDIMSYERE Os08g04500.1Os08g04500.1 DLHFDHHPNVW...------...DLHT...MW...RDR...DDIYD...RWD...NDIASHERE Os03g31430.1Os03g31430.1 ------...------...DFHA...EW...RHR...DDTFD...RWD...DDIVAHAFE sorghum01g032610.1sorghum01g032610.1 RHAVDYRPSVW...------...SFHE...EW...RHR...DDTFD...RWD...DDIVAHEFE sorghum05g006470.1sorghum05g006470.1 RPIRTLEESKW...------...DLFA...LW...RKR...DDIID...SWN...DDVAGHENE 
POPTR_0001s31580.1POPTR_0001s31580.1 ------...------...DLYG...RW...RDR...DDAYD...SWD...DDLGTSSDE POPTR_0092s00200.1POPTR_0092s00200.1 RRSGNYPTPFW...------...DLYA...RW...RDR...DDAYD...SWD...DDLGTSSDE POPTR_0001s31570.1POPTR_0001s31570.1 RRSGNYPTPFW...------...DLYA...RW...RDR...DDAYD...SWD...DDLGTSSDE POPTR_0001s31550.1POPTR_0001s31550.1 RRSGNYPTPFW...------...DLYG...RW...RDR...DDAYD...SWD...DDLGTSSDE POPTR_0019s03980.1POPTR_0019s03980.1 RRSGNYQTSMW...------...DLYA...RW...RDR...DDIYD...RWD...DDLGTSSDE POPTR_0019s01270.1POPTR_0019s01270.1 RRSANYQTSMW...------...DLYA...RW...RDR...DDVYD...RWD...DDLGTSSDE POPTR_0019s01340.1POPTR_0019s01340.1 RRSANYQTSIW...------...DLYA...RW...RDR...DDVYD...RWD...DDLGTSTDE AT3G25830.1AT3G25830.1 RRSGNYQPSPW...------...DLHA...RW...RDR...DDIYD...NWD...NDLGTSPTE AT3G25820.1AT3G25820.1 RRSGNYQPSPW...------...DLHA...RW...RDR...DDIYD...NWD...NDLGTSPTE AT3G25810.1AT3G25810.1 RRSGNYQPSSW...------...DLHA...CW...RDR...DDIYD...NWD...NDLATSPDE TPS-b AT2G24210.1AT2G24210.1 RRSANYQPSRW...------...DLHA...SW...RDR...DDIYD...NWD...NDLVTSPDE AT4G16740.1AT4G16740.1 RRSANYQPSLW...------...DLYA...SW...RDR...DDIYD...KWD...NDLATSSEE AT4G16730.1AT4G16730.1 ------...------...DLHA...SW...RDR...DDIYD...KWD...NDLATSTEE POPTR_0017s06920.1POPTR_0017s06920.1 RRSANYEPNSW...------...SLHA...RW...RDR...DDIYD...RWD...NDLASASAE POPTR_0007s02920.1POPTR_0007s02920.1 RRLANYQPTTW...------...SLRA...RW...RDR...DDVYD...RWD...NDLATSSAE sorghum06g002820.1sorghum06g002820.1 RRSANYRPTSW...------...NIAF...WW...RDR...DDVYD...RWD...NDSATHMDE sorghum01g039090.1sorghum01g039090.1 RRSANYQPNSW...------...DVRL...RW...RDR...DDIYD...RWG...NDSATHSEE POPTR_0011s03440.1POPTR_0011s03440.1 RRSANYKPNIW...------...NLYV...RW...RDR...DDVYD...RWD...NDLGTSVAE POPTR_0007s02810.1POPTR_0007s02810.1 RRSADYHPSVW...------...DLHT...RW...RDR...DDIYD...RWD...DDLGTYTAE sorghum04g001780.1sorghum04g001780.1 HRGDSIRPS--...------...DLFD...RW...RDQ...DDIFD...VWN...DDMGSAKDE sorghum04g001800.1sorghum04g001800.1 
HRGDFIRP---...------...DLFD...RW...RDQ...DDIFD...VWN...DDMGSAKDE sorghum04g001810.1sorghum04g001810.1 HRGQFIRP---...------...DLFD...RW...RDQ...DDIFD...MWN...DDMGSAEDE Os02g02930.1Os02g02930.1 RRAAHVRPSIY...------...DLLD...RW...RDQ...DDIFD...MWD...DDLGSAKDE TPS-g POPTR_0004s02990.1POPTR_0004s02990.1 ------NF--...------...DVCG...NW...RDQ...DDIFD...EWD...DDLGTAKDE POPTR_0004s02970.1POPTR_0004s02970.1 ------NF--...------...DVFG...NW...RDQ...DDIFD...EWD...DDLGTAKDE AT1G61680.1AT1G61680.1 AKRSILRNV--...------...DLHE...KW...RSQ...DDIFD...RWD...DDLGSAKDE

Figure 2.3: Continued.

55 GYMN_PsTPSPinGYMN_PsTPSPin RRMGDFHSNLW...------...DLNS...RW...RHR...DDMYD...RWD...GDTRCYKAD GYMN_PtTPSPin2GYMN_PtTPSPin2 RRRGDFHSNLW...------...DLNS...RW...RHR...DDMYD...RWD...GDTRCYQAD GYMN_AgTPSPin2GYMN_AgTPSPin2 RRMGDFHSNLW...------...DLNS...RW...RHR...DDMYD...RWD...GDTRCYKAD GYMN_PtTPSTerGYMN_PtTPSTer RRTGGYLSNLW...------...NLNS...RW...RHR...DDMYD...RWD...GDTRCYQAD GYMN_AgTPSCamGYMN_AgTPSCam RRVGNYHSNLW...------...DLNS...RW...RHR...DDMYD...RWD...GDTRCYKAD GYMN_PaTPS-LimGYMN_PaTPS-Lim RRRGNYHSNLW...------...DLNS...RW...RHR...DDIYD...RWD...GDTRCYKAD GYMN_AgTPSPheGYMN_AgTPSPhe RRIGDYHSNLW...------...DVNS...RW...RHR...DDIYD...RWN...GDTRCYKAD GYMN_AgTPSMyrGYMN_AgTPSMyr RRIGDYHSNIW...------...DLNS...RW...RHR...DDIYD...RWN...GDTRCYKAD GYMN_PtTPSFarGYMN_PtTPSFar RRVGDYHPNLW...------...DLNS...RW...RHR...DDIYD...RWD...GDICGYQAE GYMN_PaTPSFarGYMN_PaTPSFar RRVGDYHSNLW...------...DLNS...RW...RHR...DDIYD...RWD...GDICGYQAE GYMN_PaTPSLisGYMN_PaTPSLis RRIADHHPNLW...------...DLNT...RW...RHR...DDIYD...RWH...GDIHSYQAE GYMN_PtTPSPin1GYMN_PtTPSPin1 RRIAGHHSNLW...------...DLNS...RW...RHR...DDIYD...RWD...GDTQCYKAD TPS-d GYMN_AgTPSTerGYMN_AgTPSTer RRIVEFHSNLW...------...DLNS...RW...RHR...DDIYD...RWD...NDTRCYKAD GYMN_AgTPSLimGYMN_AgTPSLim RRIADHHPNLW...------...DLNS...RW...RHR...DDMYD...RWD...GDTRCYKAD GYMN_AgTPSLim2GYMN_AgTPSLim2 RRIADHHPNLW...------...DLNS...RW...RHR...DDIYD...RWD...GDTRCYKAD GYMN_AgTPSSelGYMN_AgTPSSel RRTGNHHGNVW...------...DLNA...RW...RKR...DDLYD...RWD...DDLKDFEDE GYMN_AgTPSHumGYMN_AgTPSHum STTSNRHGNMW...------...DLNA...RW...RKC...DDLYD...RWD...DDARDFQAE GYMN_AgTPSBisGYMN_AgTPSBis R-TANPHPNVW...SAYDTAWVA...DLET...SW...RER...DDMYD...RWD...DDTKTYKAE GYMN_PaTPSBisGYMN_PaTPSBis R-TANPHPNIW...SAYDTAWVA...DLET...SW...RER...DDMYD...RWD...DDTKTYKAE GYMN_TbTPSTaGYMN_TbTPSTa RLSANYHGDLW...SAYDTAWVA...DLNT...RW...RHR...DDMAD...RWD...NDTKTYQAE GYMN_PaTPSLASGYMN_PaTPSLAS R---EFPPGFW...SAYDTAWVA...DIDD...RW...RER...DDLYD...KWD...NDTKTYEAE GYMN_PaTPSIsoGYMN_PaTPSIso 
R---EFPPGFW...SAYDTAWVA...DIDD...RW...RER...DDLYD...RWD...NDTKTYEAE GYMN_AgTPSAbiGYMN_AgTPSAbi R---EFPPGFW...SAYDTAWVA...DIDD...RW...RER...DDLYD...RWD...NDTKTYQAE GYMN_GbTPSLevGYMN_GbTPSLev RLNADYHPAVW...SAYDTAWVA...DVDD...SW...RQR...DDLYD...RWD...NDTKTYQAE LYC_YHZW_PTPS2LYC_YHZW_PTPS2 ------...------...----...RW...RQR...DDLFD...RWD...NDTKTFEVK LYC_YHZW_PTPS1LYC_YHZW_PTPS1 ------...------...----...RW...RQR...DDLFD...RWD...NDTKTFEAE LYC_GKAG_PTPS2LYC_GKAG_PTPS2 EKPSYYHQTLW...SAYDTAWVA...DVDD...RW...RQR...DDLFD...RWD...NDTKTFEAE LYC_GAON_PTPS1LYC_GAON_PTPS1 EKPSYYHQTLW...SAYDTAWVA...DIDD...RW...RQR...DDLFD...RWD...NDTKTFEAE LYC_CBAE_PTPS1LYC_CBAE_PTPS1 EKPSYYHQTLW...SSYDTAWVA...DIDD...RW...RQR...DDLFD...RWD...NDTKTFEAE LYC_ZZEI_PTPS3LYC_ZZEI_PTPS3 EKPSYYHQTLW...SAYDTAWVA...DVDD...RW...RQR...DDLFD...RWD...NDTKTFEAE LYC_GAON_PTPS2LYC_GAON_PTPS2 ------...------...DVDD...RW...RQR...DDLFD...RWD...------LYC_ENQF_PTPS1LYC_ENQF_PTPS1 ------...------...DVDD...RW...RQR...DDLFD...RWD...NDTKTFEAE LYC_PQTO_PTPS1LYC_PQTO_PTPS1 ------...------...DVDD...--...---...-----...---...------LYC_XNXF_PTPS1LYC_XNXF_PTPS1 TKPSFYHPTLW...SAYDTAWVA...DVDD...RW...RQR...DDLFD...RWD...XXXXXXXXX LYC_PQTO_PTPS2LYC_PQTO_PTPS2 ------...------...----...RW...RQR...DDLFD...RWD...NDTKTFEAE LYC_WAFT_PTPS4LYC_WAFT_PTPS4 ------...------...DLDD...--...---...-----...---...------LYC_WAFT_PTPS1LYC_WAFT_PTPS1 ------...------...----...RW...RQR...DDLFD...RWD...NDTKTFEAE LYC_WAFT_PTPS3LYC_WAFT_PTPS3 ------...------...----...SW...RQR...DDLFD...RWD...NDTKTFEAE LYC_ZZEI_PTPS4LYC_ZZEI_PTPS4 KPPSYYHPTLW...SAYDTAWVA...DIDD...RW...RQR...DDLFD...RWD...NDTKTFEAE LYC_ZZEI_PTPS6LYC_ZZEI_PTPS6 ------...------...DVDD...RW...RQR...DDLFD...RWN...------LYC_ZZEI_PTPS8LYC_ZZEI_PTPS8 ------...------...DVDD...RW...RQR...DDLFD...RWN...------LYC_ZZEI_PTPS5LYC_ZZEI_PTPS5 ------...------...DIDD...RW...RQR...DDLFD...RWN...------LYC_ZZEI_PTPS7LYC_ZZEI_PTPS7 ------...------...----...--...---...DDLFD...RWD...NDTKTFKAE 
LYC_YHZW_PTPS7LYC_YHZW_PTPS7 ------...------...----...--...---...DDLFD...RWD...NDTKTFEAE MON_VIBO_PTPS2MON_VIBO_PTPS2 KL--PSPDRLY...SAYDTAWVS...DIDD...RW...RKR...DDYMD...RWD...NDIQTYEVE TPS-h MON_UOMY_PTPS12MON_UOMY_PTPS12 KL--PSPDRFY...SAYDTAWVS...DIDD...RW...RQK...-----...---...------MON_UOMY_PTPS13MON_UOMY_PTPS13 TTEAYFNPNIW...SVYDTAWVA...DIDD...RW...GQG...NDLVD...RWD...KGSQTCKV- MON_NHCM_PTPS4MON_NHCM_PTPS4 R---YYHPNLW...SAYDTAWVA...ILPP...RW...RQR...DDLAD...RWD...NDSQTFKAE MON_UOMY_PTPS8MON_UOMY_PTPS8 R------...AAAPGPYIS...DLGD...RW...RQR...DDLFD...RWD...NDSKSFQAE SmTPS12SmTPS12 KEKQQPYQGIL...STYDTAWVA...DIED...KW...RFK...DDILD...SWD...NDIQTYKIE SmTPS15SmTPS15 ------...STYDTAWVA...DIED...KW...RFK...DDFLD...SWD...NDIQTYKIE LYC_ZZOL_PTPS2LYC_ZZOL_PTPS2 KEKQQPYQGIL...STYDTAWVA...DIED...KW...RFK...DDILD...SWD...NDIQTYKVE SmTPS11SmTPS11 KEKQQPYQGIL...STYDTAWVA...DIED...EW...RFK...DDLLD...RWD...NDIQTYKIE SmTPS16SmTPS16 KKKQQPYQGIL...SAYDTAWVA...DIDD...NW...RLK...DDLLD...RWD...NDMKTYKIE SmTPS1SmTPS1 KKKQLPYQGIL...SAYDTAWVA...DIDD...NW...RLT...DDLMD...RWD...------IE SmTPS2SmTPS2 R-KQQPYQGIL...SAYDTAWVA...DIDD...K-...---...-----...-WD...ND------LYC_ZFGK_PTPS3LYC_ZFGK_PTPS3 ------...------...----...--...---...DDLAD...SWD...NDIGTFEDE SmTPS17SmTPS17 ------...SAYDTAWVG...DIDD...RW...GDK...DDLAD...SWD...NDIGTYQFE SmTPS18SmTPS18 ------...SPYDTAWVA...ELDC...RW...DHL...DDLAD...RWD...NDLATYKVE MAR_YBQN_PTPS4MAR_YBQN_PTPS4 ------...------...DLDD...RW...RER...RQLFV...RWD...QSIAYKRD- MAR_IRBN_PTPS5MAR_IRBN_PTPS5 ------...------...DLDD...RW...RER...RQLFV...RWD...QSISYIKD- MAR_NWQC_PTPS1MAR_NWQC_PTPS1 ------...------...DLDD...RC...RER...HKLFV...TWD...------MAR_WZYK_PTPS1MAR_WZYK_PTPS1 ------...------...DLDD...--...---...-----...---...------MAR_BNCU_PTPS1MAR_BNCU_PTPS1 SSRSVSVAEAA...SAYDTAWVA...DLDD...TW...RER...YQLFN...EWN...QGMG-FKTP MAR_KRUQ_PTPS9MAR_KRUQ_PTPS9 ------...SAYDSAWVA...DLDD...RW...RER...YQLFV...RWD...QGIA---KK MAR_NRWZ_PTPS2MAR_NRWZ_PTPS2 
SYARIYLP--A...SAYDTAWVA...DLDD...HW...RER...YQLFN...RWD...QRIAAVKNK MAR_NRWZ_PTPS1MAR_NRWZ_PTPS1 SITAILLPPAA...SAYDTAWVA...DLDD...HW...RER...YQLFN...RWD...QRIAAVKNK MAR_HERT_PTPS2MAR_HERT_PTPS2 ------...------...DLDD...--...---...-----...---...------Mapoly0046s0122.1Mapoly0046s0122.1 RIFAILLPSVP...SAYDTAFCA...DLDD...RW...RER...HQLFN...RWD...QGFADYKET BRY_LNSF_PTPS3BRY_LNSF_PTPS3 ------...------...DIDD...GW...RHR...DDLFD...RWD...NDIATYKFE BRY_LNSF_PTPS1BRY_LNSF_PTPS1 EASSVMSDEVY...SAYDTAWVA...DIDD...GW...RHR...-----...---...------BRY_TMAJ_PTPS1BRY_TMAJ_PTPS1 ------...------...----...GW...RHR...DDLFD...RWD...------

Figure 2.3: Continued.

[Figure 2.3, continued: sequence alignment panel showing conserved regions of the terpene synthases analyzed; subfamily labels visible in this panel: TPS-e/f and TPS-c.]

Figure 2.3: Continued.

[Figure 2.3, continued: sequence alignment panel showing conserved regions of the terpene synthases analyzed; subfamily label visible in this panel: TPS-x.]

Figure 2.3: Continued.

[Figure 2.3, continued: sequence alignment panel showing conserved regions of the terpene synthases analyzed; subfamily label visible in this panel: TPS-x.]

Figure 2.3: Continued.

Chapter 3

Microbial Type Terpene Synthase Genes Occur Widely and Specifically in Non-seed Land Plants

Author Contributions: Qidong Jia conceptualized and conducted all the bioinformatics analyses, and co-wrote the manuscript.

3.1 Abstract

Microbial terpene synthase-like genes (MTPSLs) are a novel type of plant terpene synthase genes that have previously been identified only in the lycophyte Selaginella moellendorffii. MTPSLs are more closely related to bacterial and fungal terpene synthases than to typical plant terpene synthases. The goal of this study is to investigate the distribution, evolution and biochemical functions of plant MTPSL genes. By analyzing the transcriptomes of 1103 plant species ranging from green algae to flowering plants, putative MTPSL genes were identified predominantly from non-seed land plants, including bryophytes, lycophytes and ferns. Based on phylogenetic analyses and the confirmation of representative members in sequenced plant genomes, the vast majority of these genes (97%, all from non-seed plants) were assigned with confidence to be plant genes rather than microbial contamination. The general absence of MTPSL genes in seed plants was further supported by the negative result of searching for MTPSL genes in the sequenced genomes of a wide range of seed plants. Thus, it was concluded that MTPSL genes occur widely and specifically in non-seed plants. MTPSLs from non-seed plants are divergent, forming four major groups. Two of the four groups are more closely related to bacterial terpene synthases, while the other two groups are more closely related to fungal terpene synthases. Two of the four groups contain the canonical aspartate-rich ‘DDxxD’ motif. The third group has a ‘DDxxxD’ motif, and the fourth group has only the first two aspartates (DD) conserved in this motif. Representative members from each of the four distinct groups with varying aspartate-rich motifs were characterized for terpene synthase activities. These MTPSLs displayed diverse catalytic functions as monoterpene synthases and sesquiterpene synthases.

3.2 Introduction

Terpenoids are the largest class of secondary metabolites made by plants (Connolly and Hill, 1991). Particularly abundant are monoterpenes (C10), sesquiterpenes (C15) and diterpenes (C20). Seed plants (angiosperms and gymnosperms) are well known for their rich production of terpenoids (Tholl, 2006). Among non-seed plants, only liverworts are known to be rich producers of terpenoids (Asakawa, 2008). Terpenoids have diverse biological and ecological functions (Gershenzon and Dudareva, 2007). Some of these functions are specific to certain plant lineages. For instance, volatile terpenoids in floral scent are involved in attracting pollinators. Many of the functions of terpenoids, especially in plant defenses against biotic and abiotic stresses, seem to be universal across plant lineages (Mithofer and Boland, 2012). Understanding the biosynthesis of terpenoids in all plant lineages, particularly in basal land plants, is therefore key to understanding the roles of terpenoids in the adaptation of plants to land.

Terpene synthases (TPSs) are pivotal enzymes for terpenoid biosynthesis. From the perspective of both primary protein sequence and structure, terpene synthases form a superfamily. TPS genes have been found in plants, fungi and bacteria (Chen et al., 2011). While all TPSs share certain structural similarities (Köksal et al., 2011), typical plant TPSs and microbial TPSs from fungi and bacteria have very limited sequence similarity and are therefore only distantly related evolutionarily (Li et al., 2012). Much progress has been made in our understanding of the function and evolution of typical plant TPSs, which are conserved in land plants (Chen et al., 2011). Typical plant TPSs form subfamilies. Monoterpene synthases and sesquiterpene synthases have been proposed to have evolved independently in gymnosperms and angiosperms. In the non-seed plants with sequenced genomes, both monoterpene synthases and sesquiterpene synthases appear to be absent (Degenhardt et al., 2009), and their typical plant TPSs are only of the diterpene synthase type. Therefore, the molecular basis underlying the biosynthesis of the monoterpenes and sesquiterpenes identified from non-seed plants remains elusive.

Recently, an important discovery in the field of plant terpene biosynthesis was the identification of microbial type terpene synthase genes in plants. Such genes have been named microbial terpene synthase-like genes (MTPSLs) (Li et al., 2012). Unlike typical plant TPSs, which contain either three domains (α-β-γ) or two domains (α-β) (Köksal et al., 2011), MTPSLs contain only the α domain. Phylogenetic analysis showed that MTPSLs from Selaginella moellendorffii are more closely related to microbial TPSs, especially fungal TPSs, than to typical plant TPSs (Li et al., 2012). So far, MTPSLs have been identified only in the lycophyte S. moellendorffii, and not in the moss Physcomitrella patens or in 15 species of flowering plants with sequenced genomes (Li et al., 2012). This finding raises many intriguing questions about the origin, evolution and function of this new type of terpene synthase gene in the plant kingdom. The goal of this study is to determine the distribution of MTPSLs in the plant kingdom, to infer their evolution and to determine their biochemical activities.

3.3 Results and Discussion

3.3.1 Terpene Synthase Genes of Microbial Type are Highly Enriched in the Transcriptomes of Non-Seed Land Plants

To determine the distribution of MTPSL genes in the plant kingdom, we took a transcriptomic approach. The transcriptomes of 1103 species of green plants (Table 3.1), comprising 779 species of seed plants, 166 species of non-seed land plants, 47 species of Charophyta and 111 species of Chlorophyta, generated by the OneKP initiative (https://sites.google.com/a/ualberta.ca/onekp/), were searched for microbial type terpene synthase genes using a HMMER-based method as previously described (Li et al., 2012). A total of 712 MTPSL genes were identified from the transcriptomes of 146 species. Strikingly, the vast majority of these genes (706 of 712, or 99.2%) were found in the transcriptomes of non-seed land plants (Figure 3.1).
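The tallying step that follows such a search can be sketched as a short post-processing script. This is only an illustration under stated assumptions, not the pipeline used in this study: it assumes that `hmmsearch` was run with its `--tblout` option, that each sequence identifier begins with a species code followed by an underscore, and that an E-value cutoff of 1e-5 is appropriate. The cutoff, the identifier convention, the profile name `tps_profile` and the sample lines are all hypothetical.

```python
from collections import Counter

# Hypothetical excerpt of an `hmmsearch --tblout` file. Fields are
# whitespace-separated: field 1 is the target name and field 5 is the
# full-sequence E-value. Lines starting with '#' are comments.
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias  ...
AAAA_0001 - tps_profile - 1.2e-40 140.1 0.1 3.0e-40 138.8 0.1 1.0 1 0 0 1 1 1 1 -
AAAA_0002 - tps_profile - 4.0e-02  12.3 0.0 5.0e-02  12.0 0.0 1.0 1 0 0 1 1 1 0 -
BBBB_0001 - tps_profile - 7.5e-22  80.4 0.3 9.1e-22  80.1 0.3 1.0 1 0 0 1 1 1 1 -
"""


def count_hits_per_species(tblout_text, max_evalue=1e-5):
    """Count hits passing the (assumed) E-value cutoff, grouped by species code."""
    counts = Counter()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            # Assumed convention: the species code precedes the first underscore.
            species_code = target.split("_", 1)[0]
            counts[species_code] += 1
    return counts


hits = count_hits_per_species(SAMPLE_TBLOUT)
print(dict(hits))
```

Grouping by species code is what turns a flat list of hits into per-species gene counts of the kind summarized in Figure 3.1; a real analysis would also need to handle redundant transcripts and fragments, which this sketch ignores.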

Bryophytes consist of three groups, hornworts, mosses and liverworts, which are represented by 7, 41 and 26 species in the OneKP dataset, respectively. The numbers of species whose transcriptomes contain MTPSL genes are 3, 30 and 24 for hornworts, mosses and liverworts, respectively. Among the 22 species of lycophytes, 21 were found to contain MTPSL genes in their transcriptomes. The monilophyte (fern) group contains 70 species, of which 65 were found to contain MTPSL genes in their transcriptomes. The median numbers of MTPSL genes per species transcriptome for hornworts, mosses, liverworts, lycophytes and ferns are 0, 1, 8, 3 and 4.5, respectively (Figure 3.1 and Table 3.2). Among all species, the fern Cystopteris utahensis (a tetraploid) was found to contain the most MTPSL genes, with 20 members (Table 3.1).
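The species counts quoted above can be turned into per-lineage fractions with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the dictionary layout and variable names are illustrative choices.

```python
# (species sampled, species with at least one MTPSL gene), as quoted in the text
lineages = {
    "hornworts": (7, 3),
    "mosses": (41, 30),
    "liverworts": (26, 24),
    "lycophytes": (22, 21),
    "monilophytes": (70, 65),
}

total_sampled = sum(sampled for sampled, _ in lineages.values())
total_with_mtpsl = sum(positive for _, positive in lineages.values())

for name, (sampled, positive) in lineages.items():
    share = 100 * positive / sampled
    print(f"{name:13s} {positive:2d}/{sampled:2d} = {share:.1f}%")

# The sampled species sum to the 166 non-seed land plants in the dataset.
print(total_sampled, total_with_mtpsl)
```

The sum of sampled species reproduces the 166 non-seed land plant species in the dataset, and the per-lineage shares make the contrast between hornworts and the other lineages explicit.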

Apart from the genes found in non-seed land plants, MTPSL genes were detected at extremely low frequency in the transcriptomes of seed plants and Charophyta. Among the 779 species of seed plants, only two species, Phytolacca bogotensis and Opuntia sp., were found to contain MTPSL genes in their transcriptomes, with one and four members, respectively. Among the 47 species of Charophyta, only one species, Micrasterias fimbriata, contained an MTPSL gene (one member) in its transcriptome. No MTPSL genes were found in the transcriptomes of the 111 species of Chlorophyta.

3.3.2 The Majority of MTPSL Genes Identified from Plant Transcriptomes Forms Four Groups Clustered with Either Fungal or Bacterial Terpene Synthases

Phylogenetic analysis was performed for the 712 MTPSL genes identified from plant transcriptomes together with 48 known MTPSL genes from S. moellendorffii and selected terpene synthase genes from bacteria and fungi. The resulting phylogenetic tree showed that the distribution of MTPSL genes in non-seed plants is lineage-specific and that the majority of them (690 out of 712) clustered into four major groups (Figure 3.2).

Group I, the second largest MTPSL gene group, contains about 86% of the MTPSL genes (152 out of 177) from 23 species of liverworts, 34% of the MTPSL genes (27 out of 79) from 10 species of mosses and 28% of the MTPSL genes (23 out of 83) from 9 species of lycophytes. Group II is composed of MTPSL genes mainly from mosses (about 66% of the moss MTPSL genes, from 24 species) and hornworts (50% of the hornwort MTPSL genes, from all 3 species in which MTPSL genes were found). One MTPSL gene in this group was found in a liverwort species, Scapania nemorosa; other MTPSL genes from this species are present in groups I and III. Group III, the smallest group, contains 4 MTPSL genes from 3 species of hornworts and 14 MTPSL genes from 7 species of liverworts. These 18 genes clustered with the fungal trichodiene synthase (TRI5) genes. Group IV contains almost all of the MTPSL genes found in 65 species of ferns (monilophytes; 352 out of 353) and about 70% of the MTPSL genes in lycophytes (58 out of 83). The known MTPSL genes from S. moellendorffii are closely related to MTPSL genes from the transcriptomes of other lycophyte species.

There are 22 MTPSL genes that fall outside the four major groups and are dispersed within the fungal or bacterial clades (Figure 3.2 and Table 3.3). They are categorized as group V. The MTPSL genes found in two seed plant species, Opuntia sp. (4 members) and Phytolacca bogotensis (1 member), and in one green alga, Micrasterias fimbriata (1 member), belong to this group.
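As a worked check of the group composition described in this section, the quoted gene counts can be converted to percentages and the grouped and ungrouped genes summed. The sketch uses only figures stated in the text; the data structure and names are illustrative.

```python
# (genes assigned to the group, total MTPSL genes in the lineage), from the text
shares = {
    ("group I", "liverworts"): (152, 177),
    ("group I", "mosses"): (27, 79),
    ("group I", "lycophytes"): (23, 83),
    ("group IV", "ferns"): (352, 353),
    ("group IV", "lycophytes"): (58, 83),
}

# Percentages rounded to the nearest whole number.
percent = {key: round(100 * part / whole) for key, (part, whole) in shares.items()}
for (group, lineage), value in percent.items():
    print(f"{group:8s} {lineage:10s} {value}%")

genes_in_major_groups = 690  # groups I to IV
genes_in_group_v = 22        # genes outside the four major groups
print(genes_in_major_groups + genes_in_group_v)  # total MTPSL genes analyzed
```

The rounded values agree with the approximate percentages quoted for groups I and IV, and the 690 grouped genes plus the 22 group V genes account for all 712 MTPSL genes.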

3.3.3 The Majority of MTPSL Genes Identified from Plant Transcriptomes are Plant Genes

Putative MTPSL genes identified in the plant transcriptomes could have one of two origins: plants or microbes. Some of these genes may be of microbial origin because all plants are known to be associated with endophytic fungi and bacteria (Faeth and Fagan, 2002; Gaiero et al., 2013; Rosenblueth and Martinez-Romero, 2006; Ryan et al., 2008). Some of the plant transcriptomes may unavoidably contain transcripts from microbes that were associated with the plants. The 22 MTPSL genes in group V (Table 3.3) have a high probability of being derived from plant-associated microbes. The similarities of these 22 MTPSL genes to microbial terpene synthase genes are extremely high. For instance, IRBN_MTPSL8, one of the 22 MTPSL genes, identified from the transcriptome of the liverwort S. nemorosa, is 91% identical to a terpene synthase from the fungus Serendipita vermifera (Figure 3.6). S. vermifera is a member of the ubiquitous fungal order Sebacinales, which has the greatest diversity of mycorrhizal types (Weiss et al., 2004). The association of liverworts with Sebacinales as endophytes has been documented (Weiss et al., 2011). Therefore, IRBN_MTPSL8 is most likely a fungal terpene synthase gene derived from an S. nemorosa-associated endophyte. At this time, it is impossible to rule out the possibility that some of the 22 MTPSL genes are plant genes obtained from microbes very recently through horizontal gene transfer; testing this, however, will require extensive investigation. Consequently, these 22 MTPSL genes are not considered further in this study. Three lines of evidence were used to judge that the vast majority, if not all, of the remaining 690 MTPSL genes in the four major clades are plant genes.
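A percent identity figure such as the one quoted above is computed from a pairwise alignment. The following is a minimal sketch of one common definition, assuming that identity is counted over aligned columns in which neither sequence has a gap; other definitions use a different denominator, and the text does not state which one was used. The two short sequences are made-up toy strings, not data from this study.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free aligned columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # assumed convention: columns containing a gap are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 8 gap-free columns, 6 of them identical.
toy_a = "DDYFD-RWD"
toy_b = "DDFFDERWN"
print(percent_identity(toy_a, toy_b))
```

Because the result depends on how gaps and alignment length are treated, identity values are only comparable when computed under the same convention.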

The first line of evidence is the clade pattern of these genes in the phylogenetic analysis. Within each of the four groups, the MTPSL genes were identified from multiple plant species belonging to multiple plant lineages (Figure 3.2). This pattern supports the view that the MTPSLs in each group share a common evolutionary origin. The plant species associated with each clade are highly divergent, have diverse geographic distributions, and their plant materials were acquired in diverse ways (from axenic culture to field-collected samples). Plant-associated microbes are highly diverse and some exhibit species specificity (Opelt et al., 2007). Terpene synthase genes from different lineages of bacteria can be highly diverged (Yamada et al., 2015). If the genes in a single clade were from microbes, this would imply that the microbes from all of these diverse plants are closely related, which seems very unlikely. Therefore, we can rule out the possibility that the plants in the same group are associated with closely related endophytes that contributed the MTPSL transcripts to the plant transcriptomes. Within each group, the MTPSLs from the same lineage are more closely related to one another. For instance, in group I, the MTPSLs from liverworts, mosses and lycophytes form three distinct subclades (Figure 3.2). The MTPSLs from closely related species are also most closely related. For instance, the analyzed mosses include three species from the same genus, Sphagnum: S. lescurii, S. palustre and S. recurvatum. In group I, the MTPSLs from these three species clustered together (Figure 3.2).

The second line of evidence came from the analysis of putative MTPSL genes in axenic culture. The liverwort S. nemorosa was selected for this purpose. A total of 8 putative MTPSL genes were identified in the transcriptome of S. nemorosa (Table 3.1). They belong to group I (5 genes), group II (1 gene) and group III (1 gene). The eighth gene belongs to group V. An axenic culture of S. nemorosa was initiated by germinating isolated spores and is therefore free of contaminating endophytic microbes. We extracted genomic DNA from S. nemorosa and performed PCR to amplify DNA fragments of the eight putative MTPSL genes. Six of the eight MTPSL genes (five from group I and one from group III) were amplified and confirmed by sequencing. However, amplification of the members from groups II and V failed. The member from group V has been suggested to be contamination from endophytic fungi (Figure 3.6). Analysis of the OneKP transcriptomes has suggested that the S. nemorosa sample was contaminated with plant material from an unknown source (https://pods.iplantcollaborative.org/wiki/display/iptol/Sample+source+and+purity). The putative MTPSL from S. nemorosa in group II is the only MTPSL from liverworts assigned to this group. Together with the contamination analysis, our experimental data suggest that this single gene is derived from contaminating plant material. Overall, this experimental study confirmed that the MTPSLs from S. nemorosa in groups I and III are plant genes.

The third line of evidence came from the confirmation that representative MTPSL genes from each of the four groups, together with their neighboring genes, are bona fide plant genes. The genes in group IV are clustered with the SmMTPSL genes from S. moellendorffii, which have been determined to be plant genes (Li et al., 2012). Therefore, no additional analysis was needed for this group. To obtain similar evidence for groups II and III, we analyzed the hornwort Anthoceros punctatus, whose draft genome has recently been sequenced (Li et al., 2014). By assembling the raw genome sequence and then identifying MTPSL genes using the same HMMER approach, 16 MTPSL genes were identified, of which 11 were considered full-length (Table 3.4). Phylogenetic analysis showed that three of the 11 genes belong to group II and the remaining eight to group III (Figure 3.7). ApMTPSL1 from group II and ApMTPSL2 from group III were selected as representatives for the following study. In the assembled genome of A. punctatus, ApMTPSL1 neighbors a cytochrome P450 gene, whose closest homolog is from the plant Oryza brachyantha (Figure 3.3A). Similarly, ApMTPSL2 neighbors a leucine-rich receptor-like kinase gene, whose closest homolog is from the plant Theobroma cacao (Figure 3.3B). We extracted genomic DNA from A. punctatus grown in axenic culture and performed PCR to amplify the coding sequence of each ApMTPSL gene together with its respective neighboring gene. The amplified DNA fragments were cloned and fully sequenced. It was confirmed that both ApMTPSL1 (Figure 3.3A) and ApMTPSL2 (Figure 3.3B) reside in the A. punctatus genome and that each neighbors a plant gene. Similar evidence for group I was obtained with the moss Sphagnum fallax, whose genome sequence is available (JGI site). The sequenced S. fallax genome contains 21 MTPSL genes, all of which belong to group I (Table 3.5). Representative MTPSL genes from S. fallax were confirmed to reside in its genome and to neighbor a copia-like retrotransposable element, whose closest homolog is from the plant Arabidopsis thaliana (Figure 3.3C). The confirmation that representative MTPSL genes in each group reside in plant genomes supports the conclusion that their apparent orthologs in each group are plant genes.

3.3.4 Evolutionary Implications of Non-Seed Plant-Specific MTPSLs

The confirmation that the vast majority of MTPSL genes identified from the OneKP transcriptomes are plant genes indicates that MTPSL genes occur widely in non-seed plants. The MTPSL genes in group I appear to be the most conserved. This group contains MTPSL genes from bryophytes (liverworts and mosses) and lycophytes (Figure 3.2), which implies the presence of MTPSL genes in the common ancestor of land plants. It is interesting to note that MTPSL genes are generally absent from the transcriptomes of green algae (Figure 3.1). To provide further evidence about the presence or absence of MTPSL genes in green algae, MTPSL genes were searched for in the sequenced genomes of six species of Chlorophyta and one species of Charophyta (Table 3.6). No MTPSL genes were detected in these sequenced green algae. It will be interesting to determine whether MTPSL genes originated in the algal ancestor of land plants or in the common ancestor of land plants. The absence of MTPSL genes from the transcriptomes of a wide range of Charophyta and from the genome of Klebsormidium flaccidum (Hori et al., 2014) does not support the evolution of MTPSL genes in the algal ancestor of land plants. If so, the evolution of MTPSL genes may be associated with the transition of plants from freshwater to land. The pioneer land plants faced a harsh environment and had to overcome many biotic and abiotic stresses. Many products of terpene synthases are volatile hydrocarbons. It is possible that such compounds are not particularly useful in an aquatic environment but became important in a terrestrial ecosystem.

MTPSL genes from non-seed plants also differ in their relatedness to bacterial and fungal terpene synthases. Groups III and IV are most closely related to fungal terpene synthases, while group I is most closely related to bacterial terpene synthases. Group II is most closely related to bacterial terpene synthases, which, however, reside in the fungal clade. These clade patterns suggest a complex evolutionary history of microbial type terpene synthases in these kingdoms of life. While it is possible that microbial type terpene synthase genes are ancestral universal genes that evolved differently, the absence of terpene synthase genes in lineages of life other than bacteria, fungi and plants implies that this distribution pattern may be a result of horizontal gene transfer (HGT). It is premature to make strong claims about the donors and recipients of such HGT events. If the terpene synthase genes in bacteria and fungi are presumed to be ancestral to MTPSLs, then MTPSL genes can be explained by multiple HGT events from bacteria and fungi. Unfortunately, our understanding of terpene synthase genes in bacteria and fungi is very limited (Quin et al., 2014; Schmidt-Dannert, 2015; Yamada et al., 2015). We have just shown the first case of HGT of terpene synthase genes from bacteria to fungi, which will undoubtedly confound our inference of the evolution of microbial type terpene synthase genes.

Equally interesting is the absence of MTPSL genes in seed plants (Figure 3.1). The absence of MTPSL genes from transcriptomes could be due to a lack of expression. To gain additional evidence about the presence or absence of MTPSL genes in seed plants, we analyzed the genome sequences of 48 species of seed plants (Table 3.6), but no MTPSL genes were identified. As mentioned previously, land plants contain typical plant terpene synthases, which catalyze biochemical reactions similar to those of microbial type terpene synthases for the production of terpenoids but are only distantly related to MTPSLs (Li et al., 2012). Typical plant terpene synthases differ distinctly between seed and non-seed plants. In the non-seed plants studied so far, the typical plant terpene synthases are diterpene synthases. In contrast, typical terpene synthases from seed plants include monoterpene synthases and sesquiterpene synthases in addition to diterpene synthases (Chen et al., 2011). Several MTPSL genes from S. moellendorffii have been demonstrated to function as monoterpene synthases and sesquiterpene synthases (Li et al., 2012). We hypothesize that most MTPSL genes in non-seed plants function as monoterpene synthases and sesquiterpene synthases. After HGT, the transferred gene has to be established and maintained in the population. Since seed plants evolved their own plant-type TPSs that produce terpenes for various functions, the pressure to maintain MTPSLs arising from current HGT events in higher plants is probably not high enough (such a gene would be just one more TPS in an existing family of about 10-30 members). This would be one of the reasons that there are no MTPSLs in seed plants.

3.3.5 Biochemical Function of Selected MTPSLs: Diversity of Activities

The presence of MTPSL genes only in basal land plants poses an intriguing question about their biological functions. In seed plants, TPSs are responsible for the production of a diversity of terpenoids important for plant-environment interactions, especially as defenses. Some basal land plants, such as liverworts (Nagashima et al., 2001) and mosses (Saritas et al., 2001), also produce a vast diversity of terpenoids. However, essentially nothing is known about how such terpenoids are synthesized in basal land plants or about their biological functions. Based on this knowledge, we hypothesize that MTPSLs in basal land plants make diverse terpenoids for plant defense against various biotic and abiotic stresses. As a first step, in this study we chose to characterize selected MTPSL genes biochemically.

Terpene synthases contain two highly conserved motifs, the ‘DDxxD’ motif and the ‘NSE’ motif, which are involved in substrate binding (Hyatt et al., 2007; Köksal et al., 2011; Starks et al., 1997). MTPSLs in all four groups contain the highly conserved ‘NSE’ motif in their C-terminal region (Figure 3.4). In contrast, the four groups differ in the aspartate-rich motif (Figure 3.4). The conserved ‘DDxxD’ motif is present in group I and group IV. Group II displays a conserved ‘DDxxxD’ motif, while group III has no clear ‘DDxxD’ motif. Although the ‘DDxxD’ motif is prevalent in group IV, many members of this group display a ‘DDDDD’ or ‘DxDDD’ motif instead.
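
The aspartate-rich motif variants described above can be located in a protein sequence with a simple pattern scan. The following is a minimal sketch, not the procedure used in this study: the function name `find_motifs` is hypothetical, and "x" is assumed to mean any residue, so a ‘DDDDD’ stretch is also reported as a ‘DDxxD’ match.

```python
import re

# Aspartate-rich motif variants named in the text; "x" is treated as any residue
# (an assumption of this sketch). The lookahead lets overlapping matches be reported.
MOTIFS = {
    "DDxxD": re.compile(r"(?=(DD..D))"),
    "DDxxxD": re.compile(r"(?=(DD...D))"),
}


def find_motifs(protein):
    """Return {motif name: [(1-based start, matched residues), ...]} for one sequence."""
    seq = protein.upper()
    return {
        name: [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Toy peptide, not a real MTPSL sequence.
    print(find_motifs("MKDDLVDAA"))
```

Reporting positions as well as matches makes it easy to check whether a hit falls where the aspartate-rich motif is expected in an alignment such as the one in Figure 3.4.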

To determine whether the four groups of MTPSLs with varying aspartate-rich motifs have terpene synthase activity, representative members of each group were chosen and synthesized. The synthesized cDNAs were cloned into a protein expression vector. Recombinant proteins were produced in E. coli and tested for terpene synthase activity. Two substrates were tested: farnesyl diphosphate and geranyl diphosphate, which are the substrates of sesquiterpene synthases and monoterpene synthases, respectively.

3.4 Conclusions

In this study, MTPSL genes were systematically mined from large-scale transcriptomes. Out of 779 seed plant species, only 5 MTPSL genes were found in 2 species, while 706 MTPSL genes were found in 143 non-seed land plant species. Thus, beyond the genome of Selaginella moellendorffii, MTPSL genes are widely and specifically distributed in non-seed land plants. Based on phylogenetic analysis and experimental validation of selected MTPSL genes, the vast majority are believed to be genuine plant genes, although a small percentage most likely derive from contamination. MTPSL genes form four lineage-specific groups that exhibit diverse structural features, which indicates multiple evolutionary origins of plant MTPSL genes. Biochemical studies of selected genes showed that MTPSLs function as monoterpene synthases and sesquiterpene synthases. However, the biological functions of MTPSL genes are largely unknown. This study will pave the way for characterizing the secondary metabolites produced by MTPSLs and for investigating the biological functions of MTPSL genes in non-seed land plants. This study also provokes interest in inferring the evolutionary origins of MTPSL genes.

3.5 Materials and Methods

3.5.1 Identification of Terpene Synthases of Microbial Type from Transcriptomes and Sequenced Genomes

The One Thousand Plants Project (OneKP; http://www.onekp.com) has sequenced transcriptomes for over 1000 non-model plant species spanning almost all major plant clades (from green algae to flowering plants). All transcriptomes were pre-assembled with the SOAPdenovo-Trans assembler (Xie et al., 2014). Transcriptomes of 1103 representative species (Table 3.1) were analyzed in this study. For all assembled contigs, the longest regions without stop codons were annotated and translated using the getorf program from the EMBOSS package (Rice et al., 2000) with a minimum length of 150 amino acids. The resulting peptides were searched locally against the Pfam-A database using HMMER 3.0 hmmsearch (Finn et al., 2011) with an E-value cutoff of 1e-5. Only sequences whose best hit was one of the following four HMM profiles were considered putative terpene synthases: Terpene_synth_C (PF03936), terpene synthase N-terminal domain (PF01397), TRI5 (PF06330) and SmMTPSLs (a profile built from 48 microbial type TPSs identified from Selaginella moellendorffii). For sequences from the same species with 100% identity, only the longest was kept as the representative sequence to reduce redundancy.
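
The best-hit filter and the redundancy step described above can be sketched as follows. This is an illustration under stated assumptions rather than the scripts used in this study: the function names and the `(seq_id, profile, evalue)` input layout are hypothetical, and "100% identity" is interpreted here as one peptide being exactly contained in a longer one.

```python
# The four profiles named in the text.
TARGET_PROFILES = {"PF03936", "PF01397", "PF06330", "SmMTPSLs"}


def putative_tps(hits, max_evalue=1e-5):
    """Return ids of sequences whose best-scoring hit is one of the target profiles.

    hits: iterable of (seq_id, profile, evalue) tuples (hypothetical layout).
    """
    best = {}
    for seq_id, profile, evalue in hits:
        if evalue > max_evalue:
            continue  # hit does not pass the E-value cutoff
        if seq_id not in best or evalue < best[seq_id][1]:
            best[seq_id] = (profile, evalue)
    return {seq_id for seq_id, (profile, _) in best.items() if profile in TARGET_PROFILES}


def drop_redundant(seqs):
    """Keep only the longest representative among identical peptides of one species.

    seqs: {seq_id: peptide}. A peptide is dropped when it is fully contained in
    an already kept, longer (or equal) peptide (assumed meaning of 100% identity).
    """
    kept = {}
    for seq_id, pep in sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        if not any(pep in other for other in kept.values()):
            kept[seq_id] = pep
    return kept
```

Requiring that the best hit, and not just any hit, belongs to the four profiles excludes sequences that match a terpene synthase profile only weakly while matching an unrelated Pfam family more strongly.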

All putative TPS sequences were subjected to a BLASTP search against the NCBI non-redundant database using default parameters. The “microbial TPS-like protein” annotation was based on the taxonomy of the top 10 hits for each sequence, weighted heavily toward the taxonomy of the best BLASTP hit. Since S. moellendorffii is the only species reported to have both plant and microbial types of terpene synthases, an additional criterion was applied to sequences whose best hits came from this species: the type of terpene synthase of each hit, or the type to which it is most closely related, was taken into account.
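
One way to express a top-10 vote that leans heavily on the best hit is a weighted tally. The text does not give the weights, so the value of `best_hit_weight` below is purely hypothetical, as are the function name and the input format (taxon labels ordered from best to worst hit).

```python
from collections import defaultdict


def classify_by_hits(hit_taxa, top_n=10, best_hit_weight=5.0):
    """Return the taxon label with the highest weighted vote among the top hits.

    hit_taxa: taxon labels of BLASTP hits, best hit first.
    best_hit_weight: hypothetical weight for the best hit; other hits count 1.0.
    Ties are resolved in favor of the label that appears first in the hit list.
    """
    scores = defaultdict(float)
    for rank, taxon in enumerate(hit_taxa[:top_n]):
        scores[taxon] += best_hit_weight if rank == 0 else 1.0
    if not scores:
        return None  # no hits at all
    return max(scores.items(), key=lambda kv: kv[1])[0]
```

With this weighting the best hit decides the label unless a clear majority of the remaining hits points to a different taxon.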

3.5.2 Assembly of Hornwort Anthoceros punctatus Genome and Identification of MTPSL Genes

For Anthoceros punctatus, the Illumina paired-end whole-genome sequencing data (accession number: SRR1278954) were retrieved from the NCBI Sequence Read Archive (SRA). The reads were assembled using SPAdes 3.1.1 (Bankevich et al., 2012), and the resulting contigs and singletons were further assembled with CAP3 (Huang and Madan, 1999). The final CAP3 assembly contains 34,448 sequences (16,272 contigs and 18,176 singletons) with a total length of 97 Mb, of which 15,596 sequences are at least 500 bp long. The N50 contig length based on these 15,596 sequences is 12,462 bp. The assembly was searched for terpene synthases using homology-based methods and ab initio gene prediction. A TBLASTN search was performed with an E-value cutoff of 1e-30 using the 716 MTPSL genes identified from the OneKP transcriptomes as queries. We also ran SNAP (Korf, 2004), trained for Arabidopsis thaliana, on the assembly. The protein sequences of the predicted genes were subsequently subjected to a HMMER search against four HMM profiles (PF03936, PF01397, PF06330 and SmMTPSLs, the last built from 48 microbial type TPSs from spikemoss).
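
For readers unfamiliar with the N50 statistic quoted above, the sketch below computes it under the common definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function names are hypothetical, and the 500 bp filter mirrors the minimum length stated in the text.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")


def n50_min_length(lengths, min_length=500):
    """N50 computed only over sequences of at least min_length bases."""
    return n50([n for n in lengths if n >= min_length])
```

Because short singletons are excluded before the calculation, the filtered N50 describes the contiguity of the sequences that are actually long enough to carry a gene and its neighbor.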

3.5.3 Phylogenetic Analyses of Terpene Synthases

Bacterial and fungal terpene synthases were obtained from Pfam (version 27.0). To reduce ambiguities in the sequence alignment, only the terpene synthase C-terminal domains were included. Sequences were aligned using MAFFT (L-INS-i) (Katoh et al., 2002) with 1000 iterations of improvement. ProtTest (Darriba et al., 2011) was used to select the most appropriate model of protein evolution for the alignment under the Akaike Information Criterion. For the maximum likelihood analyses, we used RAxML (Stamatakis, 2014) with 1000 bootstrap replicates under the best substitution model (LG+G+F) selected by ProtTest.
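
Model selection under the Akaike Information Criterion compares AIC = 2k - 2 ln L across candidate models, where k is the number of free parameters and ln L is the maximized log-likelihood; the model with the lowest AIC is preferred. The sketch below illustrates only this comparison. It is not ProtTest's implementation, and the function names and input layout are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Return the name of the model with the lowest AIC.

    fits: {model name: (maximized log-likelihood, number of free parameters)}.
    """
    return min(fits, key=lambda name: aic(*fits[name]))
```

The 2k term penalizes extra parameters, so a richer model is preferred only when its gain in log-likelihood outweighs the number of parameters it adds.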

3.5.4 Plant Material, Genomic DNA Isolation and PCR

Anthoceros punctatus was cultured in Knop medium containing 0.025 g/L K2HPO4, 0.025 g/L KH2PO4, 0.025 g/L KCl, 0.025 g/L MgSO4•7H2O, 0.1 g/L Ca(NO3)2•4H2O, 0.037 g/L FeSO4•7H2O and 7 g agar; the pH was adjusted to 6.5 before autoclaving. Sphagnum fallax was cultured in BCD medium (Ashton and Cove, 1977). A. punctatus and S. fallax were grown at 22 °C under a 16 h light/8 h dark photoperiod.

Genomic DNA from A. punctatus and S. fallax was isolated following the protocol of the VIOGENE plant genomic DNA isolation kit. A fragment of genomic DNA covering part of an MTPSL gene and its neighboring gene was amplified and subcloned into the T-Easy vector. The amplification was carried out using Phusion high-fidelity DNA polymerase with the following protocol: 2 min at 98 °C, followed by 35 cycles of 10 s at 98 °C, 15 s at the annealing temperature (Tm) and 45 s at 72 °C, and a final elongation step of 7 min at 72 °C. The cloned DNA fragment was fully sequenced.

3.6 Acknowledgements

This project was supported by a University of Tennessee AgResearch Innovation Grant (to F.C.). The 1000 Plants (1KP) initiative, led by GKSW, is funded by the Alberta Ministry of Innovation and Advanced Education, Alberta Innovates Technology Futures (AITF), Innovates Centres of Research Excellence (iCORE), Musea Ventures, BGI-Shenzhen and China National Genebank (CNGB). We also acknowledge the contributions of Dr. James Leebens-Mack’s lab in obtaining material for RNA-Seq and developing the infrastructure for distribution of the 1KP data.

3.7 Bibliography

Asakawa, Y. (2008). Liverworts-potential source of medicinal compounds. Current Pharmaceutical Design, 14(29):3067–88.

Ashton, N. W. and Cove, D. J. (1977). The isolation and preliminary characterisation of auxotrophic and analogue resistant mutants of the moss, Physcomitrella patens. Molecular and General Genetics MGG, 154(1):87–95.

Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., Lesin, V. M., Nikolenko, S. I., Pham, S., Prjibelski, A. D., Pyshkin, A. V., Sirotkin, A. V., Vyahhi, N., Tesler, G., Alekseyev, M. A., and Pevzner, P. A. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology, 19(5):455–77.

Chen, F., Tholl, D., Bohlmann, J., and Pichersky, E. (2011). The family of terpene synthases in plants: a mid-size family of genes for specialized metabolism that is highly diversified throughout the kingdom. Plant Journal, 66(1):212–29.

Connolly, J. D. and Hill, R. A. (1991). Dictionary of Terpenoids. Chapman and Hall/CRC.

Darriba, D., Taboada, G. L., Doallo, R., and Posada, D. (2011). ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics, 27(8):1164–5.

Degenhardt, J., Kollner, T. G., and Gershenzon, J. (2009). Monoterpene and sesquiterpene synthases and the origin of terpene skeletal diversity in plants. Phytochemistry, 70(15-16):1621–37.

Faeth, S. H. and Fagan, W. F. (2002). Fungal endophytes: common host plant symbionts but uncommon mutualists. Integrative and Comparative Biology, 42(2):360–8.

Finn, R. D., Clements, J., and Eddy, S. R. (2011). HMMER web server: interactive sequence similarity searching. Nucleic Acids Research, 39(Web Server issue):W29–37.

Gaiero, J. R., McCall, C. A., Thompson, K. A., Day, N. J., Best, A. S., and Dunfield, K. E. (2013). Inside the root microbiome: bacterial root endophytes and plant growth promotion. American Journal of Botany, 100(9):1738–50.

Gershenzon, J. and Dudareva, N. (2007). The function of terpene natural products in the natural world. Nature Chemical Biology, 3(7):408–14.

Hori, K., Maruyama, F., Fujisawa, T., Togashi, T., Yamamoto, N., Seo, M., Sato, S., Yamada, T., Mori, H., Tajima, N., Moriyama, T., Ikeuchi, M., Watanabe, M., Wada, H., Kobayashi, K., Saito, M., Masuda, T., Sasaki-Sekimoto, Y., Mashiguchi, K., Awai, K., Shimojima, M., Masuda, S., Iwai, M., Nobusawa, T., Narise, T., Kondo, S., Saito, H., Sato, R., Murakawa, M., Ihara, Y., Oshima-Yamada, Y., Ohtaka, K., Satoh, M., Sonobe, K., Ishii, M., Ohtani, R., Kanamori-Sato, M., Honoki, R., Miyazaki, D., Mochizuki, H., Umetsu, J., Higashi, K., Shibata, D., Kamiya, Y., Sato, N., Nakamura, Y., Tabata, S., Ida, S., Kurokawa, K., and Ohta, H. (2014). Klebsormidium flaccidum genome reveals primary factors for plant terrestrial adaptation. Nature Communications, 5:3978.

Huang, X. and Madan, A. (1999). CAP3: a DNA sequence assembly program. Genome Research, 9(9):868–77.

Hyatt, D. C., Youn, B., Zhao, Y., Santhamma, B., Coates, R. M., Croteau, R. B., and Kang, C. (2007). Structure of limonene synthase, a simple model for terpenoid cyclase catalysis. Proceedings of the National Academy of Sciences of the United States of America, 104(13):5360–5.

Katoh, K., Misawa, K., Kuma, K., and Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14):3059–66.

Korf, I. (2004). Gene finding in novel genomes. BMC Bioinformatics, 5:59.

Köksal, M., Jin, Y., Coates, R. M., Croteau, R., and Christianson, D. W. (2011). Taxadiene synthase structure and evolution of modular architecture in terpene biosynthesis. Nature, 469(7328):116–20.

Li, F. W., Villarreal, J. C., Kelly, S., Rothfels, C. J., Melkonian, M., Frangedakis, E., Ruhsam, M., Sigel, E. M., Der, J. P., Pittermann, J., Burge, D. O., Pokorny, L., Larsson, A., Chen, T., Weststrand, S., Thomas, P., Carpenter, E., Zhang, Y., Tian, Z., Chen, L., Yan, Z., Zhu, Y., Sun, X., Wang, J., Stevenson, D. W., Crandall-Stotler, B. J., Shaw, A. J., Deyholos, M. K., Soltis, D. E., Graham, S. W., Windham, M. D., Langdale, J. A., Wong, G. K., Mathews, S., and Pryer, K. M. (2014). Horizontal transfer of an adaptive chimeric photoreceptor from bryophytes to ferns. Proceedings of the National Academy of Sciences of the United States of America, 111(18):6672–7.

Li, G., Kollner, T. G., Yin, Y., Jiang, Y., Chen, H., Xu, Y., Gershenzon, J., Pichersky, E., and Chen, F. (2012). Nonseed plant Selaginella moellendorffii [corrected] has both seed plant and microbial types of terpene synthases. Proceedings of the National Academy of Sciences of the United States of America, 109(36):14711–5.

Mithofer, A. and Boland, W. (2012). Plant defense against herbivores: chemical aspects. Annual Review of Plant Biology, 63:431–50.

Nagashima, F., Suzuki, M., Takaoka, S., and Asakawa, Y. (2001). Sesqui- and diterpenoids from the Japanese liverwort Jungermannia infusca. Journal of Natural Products, 64(10):1309–17.

Opelt, K., Berg, C., Schonmann, S., Eberl, L., and Berg, G. (2007). High specificity but contrasting biodiversity of Sphagnum-associated bacterial and plant communities in bog ecosystems independent of the geographical region. ISME Journal, 1(6):502–16.

Qiu, Y. L., Li, L., Wang, B., Chen, Z., Knoop, V., Groth-Malonek, M., Dombrovska, O., Lee, J., Kent, L., Rest, J., Estabrook, G. F., Hendry, T. A., Taylor, D. W., Testa, C. M., Ambros, M., Crandall-Stotler, B., Duff, R. J., Stech, M., Frey, W., Quandt, D., and Davis, C. C. (2006). The deepest divergences in land plants inferred from phylogenomic evidence. Proceedings of the National Academy of Sciences of the United States of America, 103(42):15511–6.

Quin, M. B., Flynn, C. M., and Schmidt-Dannert, C. (2014). Traversing the fungal terpenome. Natural Product Reports, 31(10):1449–73.

Rice, P., Longden, I., and Bleasby, A. (2000). EMBOSS: the European Molecular Biology Open Software Suite. Trends in Genetics, 16(6):276–7.

Rosenblueth, M. and Martinez-Romero, E. (2006). Bacterial endophytes and their interactions with hosts. Molecular Plant-Microbe Interactions, 19(8):827–37.

Ryan, R. P., Germaine, K., Franks, A., Ryan, D. J., and Dowling, D. N. (2008). Bacterial endophytes: recent developments and applications. FEMS Microbiology Letters, 278(1):1–9.

Saritas, Y., Sonwa, M. M., Iznaguen, H., Konig, W. A., Muhle, H., and Mues, R. (2001). Volatile constituents in mosses (Musci). Phytochemistry, 57(3):443–57.

Schmidt-Dannert, C. (2015). Biosynthesis of terpenoid natural products in fungi. Advances in Biochemical Engineering/Biotechnology, 148:19–61.

Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9):1312–3.

Starks, C. M., Back, K., Chappell, J., and Noel, J. P. (1997). Structural basis for cyclic terpene biosynthesis by tobacco 5-epi-aristolochene synthase. Science, 277(5333):1815–1820.

Tholl, D. (2006). Terpene synthases and the regulation, diversity and biological roles of terpene metabolism. Current Opinion in Plant Biology, 9(3):297–304.

Weiss, M., Selosse, M. A., Rexer, K. H., Urban, A., and Oberwinkler, F. (2004). Sebacinales: a hitherto overlooked cosm of heterobasidiomycetes with a broad mycorrhizal potential. Mycological Research, 108(Pt 9):1003–10.

Weiss, M., Sykorova, Z., Garnica, S., Riess, K., Martos, F., Krause, C., Oberwinkler, F., Bauer, R., and Redecker, D. (2011). Sebacinales everywhere: previously overlooked ubiquitous fungal endophytes. PLOS ONE, 6(2):e16793.

Wickett, N. J., Mirarab, S., Nguyen, N., Warnow, T., Carpenter, E., Matasci, N., Ayyampalayam, S., Barker, M. S., Burleigh, J. G., Gitzendanner, M. A., Ruhfel, B. R., Wafula, E., Der, J. P., Graham, S. W., Mathews, S., Melkonian, M., Soltis, D. E., Soltis, P. S., Miles, N. W., Rothfels, C. J., Pokorny, L., Shaw, A. J., DeGironimo, L., Stevenson, D. W., Surek, B., Villarreal, J. C., Roure, B., Philippe, H., dePamphilis, C. W., Chen, T., Deyholos, M. K., Baucom, R. S., Kutchan, T. M., Augustin, M. M., Wang, J., Zhang, Y., Tian, Z., Yan, Z., Wu, X., Sun, X., Wong, G. K., and Leebens-Mack, J. (2014). Phylotranscriptomic analysis of the origin and early diversification of land plants. Proceedings of the National Academy of Sciences of the United States of America, 111(45):E4859–68.

Xie, Y., Wu, G., Tang, J., Luo, R., Patterson, J., Liu, S., Huang, W., He, G., Gu, S., Li, S., Zhou, X., Lam, T. W., Li, Y., Xu, X., Wong, G. K., and Wang, J. (2014). SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics, 30(12):1660–6.

Yamada, Y., Kuzuyama, T., Komatsu, M., Shin-Ya, K., Omura, S., Cane, D. E., and Ikeda, H. (2015). Terpene synthases are widely distributed in bacteria. Proceedings of the National Academy of Sciences of the United States of America, 112(3):857–62.

3.8 Appendix

Table 3.1: List of screened plant species and the number of MTPSLs in each species. All these 1103 transcriptomes were from OneKP (www.onekp.com). MTPSLs are found from 146 species.


Monilophytes 353 | Lycophytes 77 | Charophyta 1 | Angiosperms (Continued) | Angiosperms (Continued)
Adiantum aleuticum 5 | Dendrolycopodium obscurum 2 | Bambusina borreri 0 | Allium commutatum 0 | Daphne geraldii 0
Adiantum tenerum 8 | Diphasiastrum digitatum 0 | Chaetosphaeridium globosum 0 | Allium sativum 0 | Daphniphyllum macropodum 0
Anemia tomentosa 12 | Huperzia lucidula 1 | Chara vulgaris 0 | Alnus serrulata 0 | Datura metel 0
Angiopteris evecta 5 | Huperzia myrisinites 2 | atmophyticus 0 | Aloe vera 0 | Delosperma echinatum 0
Argyrochosma nivea 10 | Huperzia selago 7 | Closterium lunula 0 | Alternanthera brasileana 0 | Dendropemon caribaeus 0
Asplenium nidus 6 | Huperzia squarrosa 2 | Coleochaete irregularis 0 | Alternanthera caracasana 0 | Deschampsia cespitosa 0
Asplenium platyneuron 7 | Isoetes sp. 6 | Coleochaete scutata 0 | Alternanthera sessilis 0 | Desmanthus illinoensis 0
Athyrium filix femina 2 | Isoetes sp. 7 | Cosmarium broomei 0 | Alternanthera tenella 0 | Deutzia scabra 0
Azolla cf. caroliniana 5 | Isoetes tegetiformans 5 | Cosmarium granatum 0 | Amaranthus cruentus 0 | Dianthus sp. 0
Blechnum spicant 15 | Lycopodiella apressa 1 | Cosmarium ochthodes 0 | Amaranthus palmeri 0 | Dichroa febrifuga 0
Botrypus virginianus 6 | Lycopodium annotinum 1 | Cosmarium subtumidum 0 | Amaranthus retroflexus 0 | Digitalis purpurea 0
Cheilanthes arizonica 7 | Lycopodium deuterodensum 4 | Cosmarium tinctum 0 | Amaryllis belladonna 0 | Dillenia indica 0
Cibotium glaucum 5 | Phylloglossum drummondii 4 | Cosmocladium cf. constrictum 0 | Amborella trichopoda 0 | Dioscorea villosa 0
Crepidomanes venosum 0 | Pseudolycopodiella caroliniana 3 | Cylindrocystis brebissonii 0 | Amelanchier canadensis 0 | Diospyros malabarica 0
Cryptogramma acrostichoides 3 | Selaginella acanthonota 3 | Cylindrocystis cushleckae 0 | Ancistrocladus tectorius 0 | Dipsacus sativum 0
Culcita macrocarpa 2 | Selaginella apoda 2 | Cylindrocystis sp. 0 | Anemone hupenhensis 0 | Disporopsis pernyi 0
Cyathea spinulosa 1 | Selaginella kraussiana 8 | Desmidium aptogonum 0 | Anemone pulsatilla 0 | Dombeya burgessiae 0
Cystopteris fragilis 8 | Selaginella lepidophylla 10 | Entransia fimbriat 0 | Angelica archangelica 0 | Draba aizoides 0
Cystopteris protrusa 5 | Selaginella selaginoides 2 | Euastrum affine 0 | Anisacanthus quadrifidas 0 | Draba hispida 0
Cystopteris reevesiana 1 | Selaginella stauntoniana 2 | Gonatozygon kinahanii 0 | Annona muricata 0 | Draba magellanica 0
Cystopteris utahensis 20 | Selaginella wallacei 6 | Interfilum paradoxum 0 | Anthemis tinctoria 0 | Draba oligosperma 0
Danaea sp. 7 | Selaginella willdenowii 5 | Klebsormidium subtile 0 | Anthirrhinum majus 0 | Draba ossetica 0
Davallia fejeensis 8 |  | Mesostigma viride 0 | Anticharis glandulosa 0 | Draba sachalinensis 0
Dennstaedtia 3 | Chlorophyta 0 | Mesotaenium braunii 0 | Antirrhinum braun 0 | Drakea elastica 0
Deparia lobato 8 | Acrosiphonia sp. 0 | Mesotaenium caldariorum 0 | Aphanopetalum resinosum 0 | Drimia altissima 0
Didymochlaena truncatula 2 | Ankistrodesmus sp. 0 | Mesotaenium endlicherianum 0 | Apios americana 0 | Drimys winteri 0
Diplazium wichurae 6 | Aphanochaete repens 0 | Mesotaenium kramstei 0 | Apocynum androsaemifolium 0 | Dryas octopetala 0
Dipteris conjugata 5 | Asteromonas gracilis 0 | Micrasterias fimbriata 1 | Arabis alpina 0 | Drypetes deplanchei 0
Equisetum diffusum 2 | Bathycoccus prasinos 0 | Mougeotia sp. 0 | Ardisia humilis 0 | Edgeworthia papyrifera 0
Equisetum hymale 0 | Blastophysa cf. rhizopus 0 | Netrium digitus 0 | Ardisia revoluta 0 | Ehretia acuminata 0
Gymnocarpium dryopteris 6 | Bolbocoleon piliferum 0 | Nucleotaenium eifelense 0 | Argemone mexicana 0 | Elaeagnus pungens 0
Hemionitis arifolia 3 | Botryococcus braunii 0 | Onychonema laeve 0 | Aristida stricta 0 | Elaeocarpus sylvestris 0
Homalosorus pycnocarpos 6 | Botryococcus sudeticus 0 | Penium exiguum 0 | Aristolochia elegans 0 | Eleusine coracana 0
Hymenophyllum bivalve 4 | Botryococcus terribilis 0 | Penium margaritaceum 0 | Aruncus dioicus 0 | Epifagus virginiana 0
Hymenophyllum cupressiforme 1 | Brachiomonas submarina 0 | Phymatodocis nordstedtiana 0 | Ascarina rubricaulis 0 | Epilobium sp. 0
Leucostegia immersa 8 | Bryopsis plumosa 0 | Planotaenium ohtanii 0 | Asclepia curassavica 0 | Erigeron speciosus 0
Lindsaea linearis 7 | Carteria crucifera 0 | Pleurotaenium trabecul 0 | Asclepias syriaca 0 | Eriospermum lancifolia 0
Lindsaea microphylla 2 | Carteria obtusa 0 | Roya obtusa 0 | Asparagus densiflorus 0 | Erythroxylum coca 0
Lygodium japonicum 5 | Cephaleuros virescens 0 | Spirogyra sp. 0 | Aster tataricus 0 | Escallonia rubra 0
Marattia sp. 3 | Chaetopeltis orbicularis 0 | minuta 0 | Astilbe chinensis 0 | Escallonia sp. cv. Newport 0
Myriopteris eatonii 11 | Chlamydomonas bilatus 0 | Spirotaenia sp. 0 | Astragalus membranaceus 0 | Eschscholzia californica 0
Nephrolepis exaltata 13 | Chlamydomonas cribrum 0 | Staurastrum sebaldi 0 | Astragalus propinquus 0 | Eucalyptus leucoxylon 0
Notholaena montieliae 4 | Chlamydomonas moewusii 0 | Staurodesmus convergens 0 | Atriplex hortensis 0 | Eucommia ulmoides 0
Onoclea sensibilis 1 | Chlamydomonas noctigama 0 | Staurodesmus omearii 0 | Atriplex prostrata 0 | Euphorbia pekinensis 0
Ophioglossum vulgatum 7 | Chlamydomonas sp. 0 | Xanthidium antilopaeum 0 | Atriplex rosea 0 | Eupomatia bennettii 0
Osmunda javanica 2 | Chlorella minutissima 0 | Zygnema sp. 0 | Atropa belladonna 0 | Euptelea pleiosperma 0
Osmunda regalis 0 | Chloromonas oogama 0 | Zygnemopsis sp. 0 | Aucuba japonica 0 | Exacum affine 0
Osmunda sp. 4 | Chloromonas perforata 0 |  | Austrobaileya scandens 0 | Exocarpos cupressiformis 0
Osmundastrum cinnamomeum 0 | Chloromonas reticulata 0 | Gymnosperms 0 | Avena fatua 0 | Fagus sylvatica 0
Pilularia globulifera 1 | Chloromonas rosa 0 | Abies lasiocarpa 0 | Azadirachta indica 0 | Ficus religiosa 0
Pityrogramma trifoliata 7 | Chloromonas subdivisa 0 | Acmopyle pancheri 0 | Bacopa caroliniana 0 | Flaveria angustifolia 0
Plagiogyria japonica 2 | Chloromonas tughillensi 0 | Agathis robusta 0 | Balanophora fungosa 0 | Flaveria bidentis 0
Pleopeltis polypodioides 9 | Chlorosarcinopsis halophila 0 | Amentotaxus argotaenia 0 | Basella alba 0 | Flaveria brownii 0
Polypodium amorphum 9 | Cladophora glomerata 0 | Araucaria rulei 0 | Batis maritima 0 | Flaveria cronquestii 0
Polypodium glycyrrhiza 7 | coccoid prasinophyte 0 | Arucaria sp. 0 | Bauhinia tomentosa 0 | Flaveria kochiana 0
Polypodium hesperium 4 | Coccomyxa pringsheimii 0 | cupressoides 0 | Begonia sp. 0 | Flaveria palmeri 0
Polypodium plectolens 3 | Codium fragile 0 | chilensis 0 | Berberidopsis beckleri 0 | Flaveria pringlei 0
Polystichum acrostichoides 5 | Cylindrocapsa geminella 0 | Austrotaxus spicata 0 | Bergenia sp. 0 | Flaveria pubescens 0
Psilotum nudum 2 | Cymbomonas sp. 0 | gracilis 0 | Beta maritima 0 | Flaveria sonorensis 0
Pteris ensigormis 2 | Dolichomastix tenuilepi 0 | Callitris macleayana 0 | Betula pendula 0 | Flaveria trinervia 0
Pteris vittata 10 | Dunaliella primolecta 0 | decurrens 0 | Bischofia javanica 0 | Flaveria vaginata 0
Sceptridium dissectum 4 | Dunaliella salina 0 | Cathaya agryrophylla 0 | Bituminaria bituminosa 0 | Forestiera segregata 0
Sticherus lobatus 0 | Dunaliella tertiolecta 0 | Cedrus libani 0 | Bixa orellana 0 | Fouqueria macdougalli 0
Thelypteris acuminata 3 | Entocladia endozoica 0 | Cephalotaxus harringtonia 0 | Blutoparon vermiculare 0 | Francoa appendiculata 0
Thyrsopteris elegans 3 | Eremosphaera viridi 0 | lawsoniana 0 | Boehmeria nivea 0 | Frankenia laevis 0
Tmesipteris parva 4 | Eudorina elegans 0 | japonica 0 | Boerhavia cf. spiderwort 0 | Freycinetia multiflora 0
Vittaria appalachiana 1 | Fritschiella tuberosa 0 | lanceolata 0 | Boerhavia dominnii 0 | Galax urceolata 0
Vittaria lineata 3 | Geminella sp. 0 | dupreziana 0 | Borya sphaerocephala 0 | Galium boreale 0
Woodsia ilvensis 3 | Golenkinia longispicula 0 | Cycas micholitzii 0 | Boswellia sacra 0 | Galphimia gracilis 0
Woodsia scopulina 10 | Gonium pectorale 0 | Dacrycarpus compactus 0 | Bougainvillea spectabilis 0 | Garcinia livingstonei 0
 | Haematococcus pluviali 0 | Dacrydium balansae 0 | Boykinia jamesii 0 | Garcinia oblongiflolia 0
Liverworts 183 | Haematococcus pluvialis 0 | Dioon edule 0 | Brassica nigra 0 | Gelsemium sempervirens 0
Barbilophozia barbata 11 | Hafniomonas reticulata 0 | archeri 0 | Brocchinia reducta 0 | Gentiana acaulis 0
Bazzania trilobata 0 | Halochlorococcum marinum 0 | Encephalartos barteri 0 | Brodiaea sierrae 0 | Geranium carolinianum 0
Blasia sp. 1 | Helicodictyon planctonicum 0 | Ephedra sinica 0 | Brugmansia sanguinea 0 | Geranium maculatum 0
Calypogeia fissa 4 | Heterochlamydomonas inaequalis 0 | Falcatifolium taxoides 0 | Buddleja lindleyana 0 | Geum quellyon 0
Conocephalum conicum 9 | Hormidiella sp. 0 | hodginsii 0 | Buddleja sp. 0 | Gleditsia sinensis 0
Frullania 8 | Ignatius tetrasporus 0 | Ginkgo biloba 0 | Bursera simaruba 0 | Gleditsia triacanthos 0
Lejeuneaceae sp. 14 | Leptosira obovata 0 | pensilis 0 | Buxus sempervirens 0 | Gloriosa superba 0
Lunularia cruciata 9 | Lobochlamys segnis 0 | Gnetum montanum 0 | Byblis gigantea 0 | Glycine soja 0
Marchantia emarginata 4 | Lobomonas rostrata 0 | Halocarpus bidwillii 0 | Caiophora chuquitensis 0 | Glycyrrhiza glabra 0
Marchantia paleacea 8 | Mantoniella squamata 0 | Juniperus scopulorum 0 | Calceolaria pinifolia 0 | Glycyrrhiza lepidota 0
Marchantia polymorpha 7 | Microspora cf. tumidula 0 | Keteleeria evelyniana 0 | Calycanthus floridus 0 | Gomortega keule 0
Metzgeria crassipilis 1 | Microthamnion kuetzigianum 0 | Lagarostrobos franklinii 0 | Camptotheca acuminata 0 | Gompholobium polymorphum 0
Monoclea gottschei 0 | Monomastix opisthostigma 0 | Larix speciosa 0 | Canella winterana 0 | Goodyera pubescens 0
Odontoschisma prostratum 10 | Nannochloris atomus 0 | Manoao colensoi 0 | Canna sp. 0 | Grevillea robusta 0
Pallavicinia lyellii 4 | Neochloris oleoabundans 0 | glyptostroboides 0 | Cannabis sativa 0 | Greyia sutherlandii 0
Pellia cf. Epiphylla 9 | Neochloris sp. 0 | 0 | Capnoides sempervirens 0 | Griselinia littoralis 0
Pellia neesiana 16 | Neochlorosarcina sp. 0 | Microcachrys tetragona 0 | Carthamus lanatus 0 | Griselinia racemosa 0
Plagiochila asplenioides 9 | Nephroselmis olivace 0 | Microstrobos fitzgeraldii 0 | Carya glabra 0 | Gunnera manicata 0
Porella navicularis 8 | Nephroselmis pyriformis 0 | Nageia nagi 0 | Cassytha filiformis 0 | Gylcine soja 0
Porella pinnata 8 | Ochlochaete sp. 0 | Neocallitropsis pancheri 0 | Castanea crenata 0 | Gymnocladus dioicus 0
Ptilidium pulcherrimum 11 | Oedogonium cardiacu 0 | Nothotsuga longibracteata 0 | Castanea pumila 0 | Gyrocarpus americanus 0
Radula lindenbergia 8 | Oedogonium foveolatum 0 | papuana 0 | Casuarina glauca 0 | Gyrostemon ramulosus 0
Riccia berychiana 4 | Oltmannsiellopsis viridis 0 | Parasitaxus usta 0 | Catharanthus roseus 0 | Haemaria discolor 0
Scapania nemorosa 8 | Oogamochlamys gigantea 0 | Phyllocladus hypohyllus 0 | Cavendishia cuatrecasasii 0 | Hakea drupaceae 0
Schistochila sp. 3 | Pandorina morum 0 | Picea engelmanii 0 | Celsia arcturus 0 | Hakea prostrata 0
Sphaerocarpos texanus 3 | Parachlorella kessleri 0 | uviferum 0 | Celtis occidentalis 0 | Hamamelis virginiana 0
 | Pediastrum duplex 0 | Pinus jeffreyi 0 | Centella asiatica 0 | 0
Mosses 79 | Pedinomonas minor 0 | Pinus parviflora 0 | Cephalotus follicularis 0 | Helenium autumnale 0
Andreaea rupestris 0 | Pedinomonas tuberculata 0 | Pinus ponderosa 0 | Ceratocapnos vesicaria 0 | Heliconia sp. 0
Anomodon attenuatus 1 | Persursaria percursa 0 | Pinus radiata 0 | Ceratophyllum demersum 0 | Heliotropium calcicola 0
Anomodon rostratus 4 | Phacotus lenticularis 0 | orientalis 0 | Cercidiphyllum japonicum 0 | Heliotropium convolvulaceum 0
Atrichum angustatum 1 | Picocystis salinarum 0 | Podocarpus coriaceus 0 | Cercis canadensis 0 | Heliotropium filiforme 0
Aulacomnium heterostichum 1 | Pirula salina 0 | Podocarpus rubens 0 | Cercocarpus ledifolius 0 | Heliotropium greggii 0
Bryum argenteum 0 | Planophila laetevirens 0 | Prumnopitys andina 0 | Chamaseyce mesebyranthemum 0 | Heliotropium karwinsky 0
Buxbaumia aphylla 1 | Planophila terrestris 0 | Pseudolarix amabilis 0 | Chelidonium majus 0 | Heliotropium mendocinum 0
Calliergon cordifolium 0 | Pleurastrum insigne 0 | Pseudotaxus chienii 0 | Chenopodium quinoa 0 | Heliotropium racemosum 0
Ceratodon purpureus 0 | Prasinococcus capsulatus 0 | Pseudotsuga menziesii 0 | Chionanthus retusus 0 | Heliotropium sp. 0
cf. Physcomicromitrium sp. 0 | Prasinoderma coloniale 0 | Retrophyllum minus 0 | Chlorogalum pomeridianum 0 | Heliotropium tenellum 0
Climacium dendroides 1 | Prasiola crispa 0 | Saxegothaea conspicua 0 | Chondropetalum tectorum 0 | Heliotropium texanum 0
Dicranum scoparium 3 | Prototheca wickerhamii 0 | Sciadopitys verticillata 0 | Chrysobalanus icaco 0 | Helonias bullata 0
Diphyscium foliosum 3 | Pseudoscourfieldia marina 0 | sempervirens 0 | Cicerbita plumieri 0 | Helwingia japonica 0
Encalypta streptocarpa 0 | Pteromonas angulosa 0 | giganteum 0 | Cimicifuga racemosa 0 | Hemerocallis sp. 0
Fontinalis antipyretica 0 | Pteromonas sp. 0 | Stangeria eriopus 0 | Cinnamomum camphora 0 | Hemerocallis spp. 0
Funaria 1 | Pycnococcus provasolii 0 | Sundacarpus amarus 0 | Cissus quadrandularis 0 | Heracleum lanatum 0
Hedwigia ciliata 0 | Pyramimonas parkeae 0 | cryptomerioides 0 | Cistus inflatus 0 | Hesperaloe parviflora 0
Hypnum subimponens 5 | Scenedesmus dimorphus 0 | distichum 0 | Citrus x paradisi 0 | Heteropyxis natalensis 0
Leucobryum albidum 2 | Scherffelia dubia 0 | Taxus baccata 0 | Cladrastis lutea 0 | Heuchera sanguinea 0
Leucobryum glaucum 3 | Scourfieldia sp. 0 | Taxus cuspidata 0 | Cleome gynandra 0 | Hibbertia grossulariifolia 0
Leucodon sciuroides 2 | Spermatozopsis exsultans 0 | sp. 0 | Cleome violaceae 0 | Hibiscus cannabinus 0
Neckera douglasii 2 | Spermatozopsis similis 0 | plicata 0 | Cleome viscosa 0 | Hilleria latifolia 0
Niphotrichum elongatum 1 | Stephanosphaera pluvialis 0 | dolabrata 0 | Cocculus laurifolius 0 | Hoheria angustifolia 0
Orthotrichum lyellii 0 | Stichococcus bacillaris 0 | Torreya nucifera 0 | Cochlearea officinalis 0 | Holarrhena pubescens 0
Philonotis fontana 5 | Stigeoclonium helveticum 0 | Torreya taxifolia 0 | Cocos nucifera 0 | Houttuynia cordata 0
Plagiomnium insigne 2 | Tetraselmis chui 0 | Tsuga heterophylla 0 | Codariocalyx motorius 0 | Humulus lupulus 0
Polytrichum commune 1 | Tetraselmis cordiformis 0 | Welwitschia mirabilis 0 | Colchicum autumnale 0 | Hydrangea quercifolia 0
Pseudotaxiphyllum elegans 3 | Tetraselmis striata 0 | cedarbergensis 0 | Conopholis americana 0 | Hydrastis canadensis 0
Racomitrium varium 4 | Trebouxia arboricola 0 | Wollemia nobilis 0 | Convolvulus arvensis 0 | Hydrocotyle umbellata 0
Rhynchostegium serrulatum 4 | Trentepohlia annulata 0 |  | Conzya canadensis 0 | Hypecoum procumbens 0
Rhytidiadelphus loreus 1 | Unidentified species CCMP 1205 0 | Angiosperms 5 | Copaifera officianalis 0 | Hypericum perforatum 0
Schwetschkeopsis fabronia 3 | Uronema belka 0 | Acacia argyrophylla 0 | Coriaria nepalensis 0 | Idiospermum australiense 0
Scouleria aquatica 0 | Vitreochlamys sp. 0 | Acacia pycnantha 0 | Cornus floridana 0 | Ilex paraguariensis 0
Sphagnum lescurii 5 | Volvox aureus 0 | Acer negundo 0 | Corokia cotoneaster 0 | Ilex sp.
0 Sphagnum palustre 7 Volvox globator 0 Acorus americanus 0 Corydalis linstowiana 0 Ilex vomitoria 0 Sphagnum recurvatum 3 Actinidia chinensis 0 Cotoneaster transcaucasicus 0 Illicium floridanum 0 Syntrichia princeps 1 Aerva lanata 0 Crassula perforata 0 Illicium parviflorum 0 Takakia lepidozioides 2 Aerva persica 0 Crossopetalum rhacoma 0 Impatiens balsamifera 0 Tetraphis pellucida 0 Aesculus pavia 0 Croton tiglium 0 Inula helenium 0 Thuidium delicatulum 5 Aextoxicon punctatum 0 Cunonia capensis 0 Ipomoea coccinea 0 Timmia austriaca 2 Agapanthus africanus 0 Curculigo sp. 0 Ipomoea hederacea 0 Agastache rugosa 0 Curcuma olena 0 Ipomoea indica 0 Hornworts 14 Agave tequilana 0 Curtisia dentata 0 Ipomoea lindheimeri 0 Anthoceros formosae 0 Ailanthus altissima 0 Cuscuta pentagonia 0 Ipomoea lobata 0 Leiosporoceros dussii 0 Ajuga reptans 0 Cyanastrum cordifolium 0 Ipomoea nil 0 Megaceros tosanus 0 Akania lucens 0 Cyanella orchidofromis 0 Ipomoea pubescens 0 Megaceros vincentianus 2 Akebia trifoliata 0 Cymbopogon nardus 0 Ipomoea purpurea 0 Nothoceros aenigmaticus 0 Alangium chinense 0 Cyperus papyrus 0 Ipomoea quamoclit 0 Paraphymatoceros hallii 3 Allamanda cathartica 0 Cypselea humifusum 0 Itea virginica 0 Phaeoceros carolinianus 9 Allionia incarnata 0 Cyrilla racemiflora 0 Jacquinia sp. 0 Allionia spp. 0 Daenikera sp. 0 Johnsonia pubescens 0

Table 3.1: Continued.

Angiosperms (Continued) Angiosperms (Continued) Angiosperms (Continued) Joinvillea ascendens 0 Phacelia campanularia 0 Tiarella polyphylla 0 Juglans nigra 0 Phelline lucida 0 Tragopogon castellanus 0 Juncus inflexus 0 Phellodendron amurense 0 Tragopogon dubius 0 Kadsura heteroclite 0 Philadelphus inodorus 0 Tragopogon porrifolius 0 Kalanchoe crenato diagremontiana 0 Phlox drummondii 0 Tragopogon pratensis 0 Kaliphora madagascariensis 0 Phlox sp. 0 Traubia modesta 0 Kerria japonica 0 Pholisma arenarium 0 Trianthemum portulacastrum 0 Kigelia africana 0 Phoradendron serotinum 0 Triglochin maritimum 0 Kirkia wilmsii 0 Phormium tenax 0 Triodia aff. bynoei 0 Kochia scoparia 0 Phycella aff. cyrtanthoides 0 Trochodendron araliodes 0 Koeberlina spinosa 0 Phyla dulcis 0 Tropaeolum peregrinum 0 Krameria lanceolata 0 Phyllanthus sp. 0 Trubulus eichlerianus 0 Lactuca graminifolia 0 Physena madagascariensis 0 Typha angustifolia 0 Lagerstroemia indica 0 Physocarpus opulifolius 0 Typha latifolia 0 Lantana camara 0 Phytolacca americana 0 Typhonium blumei 0 Larrea tridentata 0 Phytolacca bogotensis 1 Ulmus alata 0 Lathyrus sativus 0 Pilostyles thunbergii 0 Uncarina grandidieri 0 Laurelia sempervirens 0 Pinguicula agnata 0 Uniola paniculata 0 Lavandula angustifolia 0 Pinguicula caudata 0 Urginea maritima 0 Ledum palustre 0 Piper auritum 0 Urtica dioica 0 Lennoa madreporoides 0 Pistia stratioides 0 Utricularia sp. 0 Leontopodium alpinum 0 Pittosporum resiniferum 0 Uvaria microcarpa 0 Leonurus japonicus 0 Pittosporum sahnianum 0 Valeriana officianalis 0 Lepidosperma gibsonii 0 Plantago coronopis 0 Vanilla planifolia 0 Licania michauxii 0 Plantago maritima 0 Verbascum sp. 
0 Ligustrum sinense 0 Plantago virginica 0 Verbena hastata 0 Lilium sargentiae 0 Platanthera clavellata 0 Viburnum odoratissimum 0 Limnanthes douglassii 0 Platanus occidentalis 0 Viola canadensis 0 Limonium spectabile 0 Platycodon grandiflorus 0 Viola tricolor 0 Lindenbergia philippensis 0 Platyspermation crassifolium 0 Vitex agnus castus 0 Lindera benzoin 0 Plumbago auriculata 0 Wikstroemia indica 0 Linum bienne 0 Podophyllum peltatum 0 Wisteria floribunda 0 Linum flavum 0 Pogostemon sp. 0 Wrightia natalensis 0 Linum grandiflorum 0 Polansia trachysperma 0 Xanthicercis zambesiaca 0 Linum hirsutum 0 Poliomintha bustamanta 0 Xanthuium strumarium 0 Linum leonii 0 Polycarpaea repens 0 Xeronema callistemon 0 Linum lewisii 0 Polygala lutea 0 Xerophyllum asphodeloides 0 Linum macraei 0 Polygonum convolvulus 0 Xerophyta villosa 0 Linum perenne 0 Polypremum procumbens 0 Ximenia americana 0 Linum strictum 0 fruticosa 0 Yucca brevifolia 0 Linum tenuifolium 0 Portulaca amilis 0 Yucca filamentosa 0 Linum usitatissimum 0 Portulaca cryptopetala 0 Zaleya pentandra 0 Liquidambar styraciflua 0 Portulaca grandiflora 0 Zephyranthes treatiae 0 Litchi chinensis 0 Portulaca mauii 0 Zingiber officinale 0 Lobelia siphilitica 0 Portulaca molokaiensis 0 Ziziphus jujuba 0 Lomandra longifolia 0 Portulaca oleracea 0 Lonicera japonica 0 Portulaca pilosa 0 Lophophora williamsii 0 Portulaca suffruticosa 0 Loropetalum chinense 0 Portulaca umbraticola 0 Ludovia sp. 0 Posidonia australis 0 Lupinus angustifolius 0 Prunella vulgaris 0 Lupinus polyphyllus 0 Prunus prostrata 0 Lycium barbarum 0 Psychotria douarrei 0 Lycium sp. 0 Psychotria ipecacuanha 0 Lycopersicon cheesmanii 0 Psychotria marginata 0 Maesa lanceolata 0 Punica granatum 0 Magnolia grandiflora 0 Pycnanthemum tenuifolium 0 Maianthemum canadense 0 Pyrenacantha malvifolia 0 Maianthemum sp. 
0 Quassia amara 0 Malesherbia fasiculata 0 Quercus shumardii 0 Malus baccata 0 Quillaja saponaria 0 Manihot grahamii 0 Rauvolfia tetraphyla 0 Manilkara zapota 0 Rehmannia glutinosa 0 Mansoa alliacea 0 Reseda odorata 0 Mapania palustris 0 Rhamnus caroliniana 0 Maranta leuconeura 0 Rhamnus japonica 0 Marrubium vulgare 0 Rhizophora mangle 0 Masdevallia yuangensis 0 Rhodiola rosea 0 Matricaria matricariodes 0 Rhododendron scopulorum 0 Medinilla magnifica 0 Rhodophiala pratensis 0 Melaleuca quinquenervia 0 Rhus radicans 0 Melia azedarach 0 Ribes aff. giraldii 0 Meliosma cuneifolia 0 Ricinus communis 0 Melissa officinalis 0 Roridula gorgonias 0 Menyanthes trifoliata 0 Rosa palustris 0 Mertensia paniculata 0 Rosmarinus officinalis 0 Michelia maudiae 0 Ruellia brittoniana 0 Micromeria fruticosa 0 Ruscus sp. 0 Microstegium vimineum 0 Sabal bermudana 0 Microtea debilis 0 Sagittaria latifolia 0 Mirabilis jalapa 0 Saintpaulia ionantha 0 Mitella pentandra 0 Salix acutifolia 0 Mollugo cerviana 0 Salix dasyclados 0 Mollugo nudicaulis 0 Salix eriocephala 0 Mollugo pentaphylla 0 Salix fargesii 0 Mollugo verticillata 0 Salix purpurea 0 Monotropa uniflora 0 Salix sachalinensis 0 Morinda citrifolia 0 Salix viminalis 0 Moringa oleifera 0 Salvadora sp. 0 Morus nigra 0 Salvia spp. 0 Mumea americana 0 Sambucus canadensis 0 Muntingia calabura 0 Sanchezia sp. 0 Mydocarpus sp. 0 Sanguinaria canadensis 0 Myrica cerifera 0 Sanguisorba minor 0 Myriophyllum aquaticum 0 Sansevieria trifasciata 0 Myristica fragrans 0 Santalum acuminatum 0 Nandina domestica 0 Saponaria officianalis 0 Narcissus viridiflorus 0 Sarcandra glabra 0 Nelumbo nucifera 0 Sarcobatus vermiculatus 0 Nelumbo sp. 0 Sarcodes sanguinea 0 Nepenthes alata 0 Saruma henryi 0 Nepeta cataria 0 Sassafras albidum 0 Neurachne alopecuroidea 0 Saururus cernuus 0 Neurachne annularis 0 Saxifraga stolonifera 0 Neurachne lanigera 0 Scaevola sp. 0 Neurachne minor 0 Schiedea membranacea 0 Neurachne munroi 0 Schizolaena sp. 
0 Neurachne tenuifolia 0 Schlegelia parasitica 0 Nicotiana sylvestris 0 Schlegelia parasitica B 0 Nolina atopocarpa 0 Schlegelia violacea 0 Nolina bigelorii 0 Scutellaria montana 0 Nothofagus obliqua 0 Senecio rowleyanus 0 Nuphar advena 0 Senna hebecarpa 0 Nypa fruticans 0 Serenoa repens 0 Nyssa ogeche 0 Sessuvium portulacastrum 0 Ochna mossambicensis 0 Sessuvium ventricosum 0 Ochna serrulata 0 Sideroxylon reclinatum 0 Oenothera affinis 0 Silene latifolia 0 Oenothera berlandieri 0 Silybum marianum 0 Oenothera biennis 0 Simmondsia chinensis 0 Oenothera clelandii 0 Sinapis alba 0 Oenothera elata 0 Sinningia tuberosa 0 Oenothera elata hookeri 0 Sinojackia xylocarpa 0 Oenothera filiformis 0 Sisyrinchium angustifolium 0 Oenothera gaura 0 Smilax bona nox 0 Oenothera grandiflora 0 Solanum dulcamara 0 Oenothera grandis 0 Solanum lasiophyllum 0 Oenothera laciniata 0 Solanum ptychanthum 0 Oenothera longituba 0 Solanum sisymbriifolium 0 Oenothera nana 0 Solanum xanthocarpum 0 Oenothera picensis 0 Solenostemon scutellarioides 0 Oenothera rhombipetala 0 Solidago canadensis 0 Oenothera rosea 0 Sorbus koehneana 0 Oenothera serrulata 0 Souroubea exauriculata 0 Oenothera speciosa 0 Spergularia media 0 Oenothera suffulta suffulta 0 Stachyurus praecox 0 Oenothera villaricae 0 Stackhousia spathulata 0 Olea europaea 0 Staphylea trifolia 0 Oncidium sphacelatum 0 Stemona tuberosa 0 Oncotheca balansae 0 Strelitzia reginae 0 Opuntia sp. 4 Strobilanthes dyerianus 0 Orchidantha maxillaroides 0 Strychnos spinosa 0 Oresitrophe rupifraga 0 Stylidium adnatum 0 Orobanche fasciculata 0 Symphoricarpos sp. 0 Oxalis sp. 0 Symplocus sp. 0 Oxera neriifolia 0 Synsepalum dulcificum 0 Oxera pulchella 0 Syzygium macranthum 0 Paeonia lactiflora 0 Syzygium paniculatum 0 Panicum miliaceum A 0 Tabebuia umbellate 0 Papaver bracteatum 0 Talbotia elegans 0 Papaver rhoeas 0 Talinum sp. 
0 Papaver setigerum 0 Tamarix chinensis 0 Papaver somniferum 0 Tanacetum parthenium 0 Paraneurachne muelleri 0 Tapiscia sinensis 0 Passiflora caerulea 0 Tellima breviflora 0 Passiflora edulis 0 Terminalia neotaliala 0 Paulownia fargesii 0 Ternstroemia gymnanthera 0 Peganum harmala 0 Tetrastigma obtectum 0 Peliosanthese minor 0 Tetrastigma voinierianum 0 Peltoboykinia watanabei 0 Tetrazygia bicolor 0 Pennantia corymbosa 0 Teucrium chamaedrys 0 Peperomia fraseri 0 Thalictrum thalictroides 0 Pereskia aculeata 0 Thladiantha villosula 0 Persea borbonia 0 Thymus vulgaris 0 Petiveria alliacea 0 Thyridolepis mitchelliana 0 Peumus boldus 0 Thyridolepis multiculmis 0

Table 3.2: Summary statistics of MTPSLs in 9 plant lineages.

Lineage        Species Count  MTPSL Count  Mean  Median  St. Dev.  Min  Max
Angiosperms    699            5            0.01  0       0.16      0    4
Gymnosperms    80             0            0     0       0         0    0
Monilophytes   70             353          5.04  4.50    3.81      0    20
Lycophytes     22             83           3.77  3       2.65      0    10
Hornworts      7              14           2     0       3.32      0    9
Mosses         41             79           1.93  1       1.82      0    7
Liverworts     26             177          6.81  8       4.15      0    16
Charophyta     47             1            0.02  0       0.15      0    1
Chlorophyta    111            0            0     0       0         0    0
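As a worked check of how the columns of Table 3.2 relate to the per-species counts in Table 3.1, the hornwort row can be recomputed from the seven hornwort entries listed there. This is a minimal sketch, not the author's script: the function name `summarize` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption that happens to reproduce the tabulated value.

```python
import statistics

# Per-species MTPSL counts for the seven hornwort transcriptomes,
# copied from the Hornworts block of Table 3.1.
hornwort_counts = {
    "Anthoceros formosae": 0,
    "Leiosporoceros dussii": 0,
    "Megaceros tosanus": 0,
    "Megaceros vincentianus": 2,
    "Nothoceros aenigmaticus": 0,
    "Paraphymatoceros hallii": 3,
    "Phaeoceros carolinianus": 9,
}


def summarize(counts):
    """Return the Table 3.2 style summary for one lineage (hypothetical helper)."""
    values = list(counts.values())
    return {
        "species": len(values),
        "total": sum(values),
        "mean": round(statistics.mean(values), 2),
        "median": statistics.median(values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "st_dev": round(statistics.stdev(values), 2),
        "min": min(values),
        "max": max(values),
    }


summary = summarize(hornwort_counts)
print(summary)
```

The same function could be applied to any other lineage once its per-species counts are collected in a dictionary.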

Table 3.3: 22 MTPSL genes outside of the four groups (high similarity to microbial TPSs).

Group  MTPSL_id          NR_top_hit                        evalue     bit_score  Scientific_name                    Kingdom
V      ANT_WCZB_MFTPS7   gi|751680917|gb|KIM31075.1|       0          1019       Serendipita vermifera MAFF 305830  Fungi
V      ANT_WCZB_MFTPS8   gi|353240956|emb|CCA72799.1|      0          981        Piriformospora indica DSM 11827    Fungi
V      ANT_WCZB_MFTPS9   gi|353240956|emb|CCA72799.1|      7.00E-130  391        Piriformospora indica DSM 11827    Fungi
V      CHO_MCHJ_MFTPS1   gi|913451420|ref|WP_050430829.1|  0.001      50.8       Chondromyces crocatus              Bacteria
V      EUD_MRKX_MFTPS1   gi|588255974|ref|XP_006957163.1|  0          558        Wallemia mellicola CBS 633.66      Fungi
V      EUD_QAIR_MFTPS1   gi|751178290|gb|KIL64254.1|       5.00E-177  508        Amanita muscaria Koide BX008       Fungi
V      EUD_QAIR_MFTPS2   gi|751175026|gb|KIL61028.1|       0          620        Amanita muscaria Koide BX008       Fungi
V      EUD_QAIR_MFTPS3   gi|751174784|gb|KIL60790.1|       7.00E-143  427        Amanita muscaria Koide BX008       Fungi
V      EUD_QAIR_MFTPS5   gi|751181225|gb|KIL67171.1|       0          691        Amanita muscaria Koide BX008       Fungi
V      LYC_JKAA_MFTPS1   gi|927407765|ref|XP_013949969.1|  0          541        Trichoderma virens Gv29-8          Fungi
V      LYC_ZYCD_MFTPS1   gi|238496645|ref|XP_002379558.1|  1.00E-130  390        Aspergillus flavus NRRL3357        Fungi
V      MAR_AEXY_MFTPS1   gi|389636521|ref|XP_003715910.1|  6.00E-178  513        Magnaporthe oryzae 70-15           Fungi
V      MAR_IRBN_MFTPS6   gi|751680917|gb|KIM31075.1|       0          934        Serendipita vermifera MAFF 305830  Fungi
V      MAR_JHFI_MFTPS20  gi|751680917|gb|KIM31075.1|       6.00E-101  313        Serendipita vermifera MAFF 305830  Fungi
V      MAR_LGOW_MFTPS3   gi|629725325|ref|XP_007822988.1|  7.00E-56   196        Metarhizium robertsii              Fungi
V      MAR_NWQC_MFTPS8   gi|629725325|ref|XP_007822988.1|  6.00E-36   144        Metarhizium robertsii              Fungi
V      MAR_OFTV_MFTPS5   gi|751680917|gb|KIM31075.1|       0          931        Serendipita vermifera MAFF 305830  Fungi
V      MAR_RTMU_MFTPS4   gi|667838359|ref|XP_007783348.1|  3.00E-128  386        Coniosporium apollinis CBS 100218  Fungi
V      MAR_WJLO_MFTPS3   gi|549052256|emb|CCX30236.1|      4.00E-68   231        Pyronema omphalodes CBS 100304     Fungi
V      MAR_WJLO_MFTPS4   gi|549052256|emb|CCX30236.1|      1.00E-71   240        Pyronema omphalodes CBS 100304     Fungi
V      MAR_YBQN_MFTPS7   gi|648165817|gb|KDR79494.1|       0          596        Galerina marginata CBS 339.88      Fungi
V      MON_QIAD_MFTPS2   gi|629662947|ref|XP_007805277.1|  2.00E-77   251        Endocarpon pusillum Z07020         Fungi

Table 3.4: MTPSL genes from the hornwort Anthoceros punctatus.

Gene        Length (bp)  Location                                               Introns  Protein size (aa)  Group
ApMTPSL1    1227         NODE_4150_length_6347_cov_5.87285_ID_8299:1863..3089   0        408                II
ApMTPSL2    1293         Contig14756:5881..7173                                 0        430                III
ApMTPSL3    1311         Contig1451:3817..5127                                  0        436                II
ApMTPSL4    1206         Contig5434:17076..18281                                0        401                III
ApMTPSL5    1242         Contig969:424..1665                                    0        413                III
ApMTPSL6    1266         NODE_4646_length_5616_cov_12.4068_ID_9291:3111..4376   0        421                III
ApMTPSL7    1284         NODE_2278_length_10879_cov_6.65983_ID_4555:7120..8403  0        427                III
ApMTPSL8    927          Contig1090:645..1571                                   0        308                III
ApMTPSL9    1026         Contig4580:3013..4038                                  0        341                III
ApMTPSL10   1149         Contig8980:4454..5685                                  1        382                II
ApMTPSL11   822          NODE_112_length_40181_cov_9.86101_ID_223:37545..38965  3        273                III
ApMTPSL12   873          NODE_2080_length_11745_cov_6.2947_ID_4159:3313..4244   1        290                III
ApMTPSL13*               Contig13088:2558..5572                                                             III
ApMTPSL14   1026         Contig969:7202..8227                                   0        341                III
ApMTPSL15   297          Contig8303:306..602                                    0        98
ApMTPSL16   330          NODE_9289_length_2299_cov_6.55749_ID_18577:52..381     0        109
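The columns of Table 3.4 are related by simple arithmetic, which the sketch below checks for four of the rows. It rests on assumptions that the source does not state: that "Length (bp)" is the coding-sequence length including the stop codon, and that the location coordinates are 1-based and inclusive. Under those assumptions the genomic span minus the coding length gives the total intron length, and the coding length divided by three, minus one for the stop codon, gives the protein size. The helper name `check_row` is hypothetical.

```python
# (gene, coding length in bp, start, end, introns, protein size in aa),
# copied from four rows of Table 3.4.
rows = [
    ("ApMTPSL1", 1227, 1863, 3089, 0, 408),
    ("ApMTPSL2", 1293, 5881, 7173, 0, 430),
    ("ApMTPSL10", 1149, 4454, 5685, 1, 382),
    ("ApMTPSL11", 822, 37545, 38965, 3, 273),
]


def check_row(row):
    """Derive span and intron length, and test the protein-size relation."""
    gene, cds_len, start, end, introns, protein = row
    span = end - start + 1          # assumption: 1-based inclusive coordinates
    intron_bp = span - cds_len      # zero for intronless genes
    # Assumption: the coding length includes the stop codon.
    protein_ok = cds_len % 3 == 0 and cds_len // 3 - 1 == protein
    return {"gene": gene, "span": span, "intron_bp": intron_bp,
            "introns": introns, "protein_ok": protein_ok}


results = [check_row(r) for r in rows]
for r in results:
    print(r)
```

For the intronless genes the span equals the coding length, while the genes annotated with introns show a positive difference.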

Table 3.5: Typical plant terpene synthase genes in Sphagnum fallax.

Gene                Length  Group  NR_top_hit                        evalue     bit_score  Scientific_name                 Kingdom
Sphfalx0057s0049.1  490     I      gi|288346547|gb|EFC80872.1|       2.00E-26   122        Frankia sp. EUN1f               Bacteria
Sphfalx0363s0003.1  478     I      gi|664165251|ref|WP_030699689.1|  2.00E-27   122        Streptomyces griseus            Bacteria
Sphfalx0008s0258.1  482     I      gi|827012632|ref|WP_047174997.1|  3.00E-25   119        Streptomyces sp. MNU77          Bacteria
Sphfalx0141s0067.1  478     I      gi|664165251|ref|WP_030699689.1|  7.00E-26   117        Streptomyces griseus            Bacteria
Sphfalx0021s0008.1  482     I      gi|491462970|ref|WP_005320742.1|  2.00E-25   116        Streptomyces pristinaespiralis  Bacteria
Sphfalx0097s0020.1  473     I      gi|664054675|ref|WP_030593937.1|  9.00E-24   115        Streptomyces anulatus           Bacteria
Sphfalx0455s0001.1  228     I      gi|664054675|ref|WP_030593937.1|  5.00E-24   110        Streptomyces anulatus           Bacteria
Sphfalx0420s0001.1  488     I      gi|972355343|ref|WP_059010379.1|  3.00E-22   110        Streptomyces specialis          Bacteria
Sphfalx0043s0071.1  483     I      gi|491462970|ref|WP_005320742.1|  1.00E-22   108        Streptomyces pristinaespiralis  Bacteria
Sphfalx0199s0005.1  456     I      gi|740130670|ref|WP_037978644.1|  5.00E-21   106        Streptomyces sp. TAA486         Bacteria
Sphfalx0045s0015.1  458     I      gi|740130670|ref|WP_037978644.1|  2.00E-19   101        Streptomyces sp. TAA486         Bacteria
Sphfalx0442s0005.1  456     I      gi|740130670|ref|WP_037978644.1|  2.00E-18   98.6       Streptomyces sp. TAA486         Bacteria
Sphfalx0197s0010.1  456     I      gi|926397772|ref|WP_053725884.1|  2.00E-18   98.6       Streptomyces sp. WM6378         Bacteria
Sphfalx0197s0012.1  454     I      gi|927089120|ref|WP_053788880.1|  9.00E-19   97.1       Streptomyces sp. XY332          Bacteria
Sphfalx0012s0019.1  457     I      gi|902778075|ref|WP_049649135.1|  9.00E-19   96.7       Kitasatospora sp. MY 5-36       Bacteria
Sphfalx0280s0009.1  379     I      gi|664165251|ref|WP_030699689.1|  1.00E-13   81.6       Streptomyces griseus            Bacteria
Sphfalx0280s0009.2  378     I      gi|664165251|ref|WP_030699689.1|  4.00E-13   79.7       Streptomyces griseus            Bacteria
Sphfalx0308s0010.1  164     I      gi|644654594|ref|WP_025348809.1|  3.00E-12   73.2       Nocardia nova                   Bacteria
Sphfalx0128s0030.1  87      I      gi|644654594|ref|WP_025348809.1|  1.00E-09   63.2       Nocardia nova                   Bacteria
Sphfalx0043s0070.1  124     I      gi|702805978|ref|WP_033270350.1|  0.003      46.2       Streptomyces lydicus            Bacteria
Sphfalx0128s0029.1  107     I      gi|652906113|ref|WP_027160120.1|  0.21       40         Methylobacter luteus            Bacteria

Table 3.6: A list of sequenced plants searched for MTPSL genes.

Species                           Data version
Amaranthus hypochondriacus        v1.0
Amborella trichopoda              v1.0
Ananas comosus                    v3
Aquilegia coerulea                v1.1
Aquilegia coerulea                v3.1
Arabidopsis halleri               v1.1
Arabidopsis lyrata                v1.0
Arabidopsis thaliana              TAIR10
Boechera stricta                  v1.2
Brachypodium distachyon           v3.1
Brachypodium stacei               v1.1
Brassica rapa FPsc                v1.3
Capsella grandiflora              v1.1
Capsella rubella                  v1.0
Carica papaya                     ASGPBv0.4
Chlamydomonas reinhardtii         v5.5
Citrus clementina                 v1.0
Citrus sinensis                   v1.1
Coccomyxa subellipsoidea C-169    v2.0
Cucumis sativus                   v1.0
Eucalyptus grandis                v2.0
Eutrema salsugineum               v1.0
Fragaria vesca                    v1.1
Glycine max                       Wm82.a2.v1
Gossypium raimondii               v2.1
Kalanchoe marnieriana             v1.1
Klebsormidium flaccidum           v1.0
Linum usitatissimum               v1.0
Malus domestica                   v1.0
Manihot esculenta                 v6.1
Medicago truncatula               Mt4.0v1
Micromonas pusilla CCMP1545       v3.0
Micromonas sp. RCC299             v3.0
Mimulus guttatus                  v2.0
Musa acuminata                    v1
Oryza sativa                      v7_JGI
Ostreococcus lucimarinus          v2.0
Panicum hallii                    v2.0
Panicum virgatum                  v1.1
Phaseolus vulgaris                v1.0
Physcomitrella patens             v3.3
Populus trichocarpa               v3.0
Prunus persica                    v2.1
Ricinus communis                  v0.1
Salix purpurea                    v1.0
Selaginella moellendorffii        v1.0
Setaria italica                   v2.2
Setaria viridis                   v1.1
Solanum lycopersicum              iTAG2.3
Solanum tuberosum                 v3.4
Sorghum bicolor                   v3.1
Sphagnum fallax                   v0.5
Spirodela polyrhiza               v2
Theobroma cacao                   v1.1
Triticum aestivum                 v2.2
Vitis vinifera                    Genoscope.12X
Volvox carteri                    v2.1
Zea mays                          6a
Zostera marina                    v2.2

[Figure 3.1 tree and boxplot labels. Seed plants: Angiosperms (2/699), Gymnosperms (0/80). Non-seed plants: Monilophytes (65/70), Lycophytes (21/22), Hornworts (3/7), Mosses (30/41), Liverworts (24/26). Green algae: Charophyta (1/47), Chlorophyta (0/111). Horizontal axis: Number of MTPSL.]

Figure 3.1: Distribution of terpene synthase genes of microbial type (MTPSL) identified from the transcriptomes of 1103 plant species. The phylogeny of green plants was adapted from Qiu et al. (2006) and Wickett et al. (2014). The numbers in parentheses represent the number of transcriptomes containing putative MTPSL (in red) and the total number of transcriptomes analyzed in each lineage (in black). Each boxplot represents the number of MTPSL found in individual species of each plant lineage. The solid black lines denote the median number of MTPSL per species. Whiskers extend to 1.5 times the interquartile range of the data. Points outside the range of the whiskers are outliers.
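The outlier rule described in the caption can be illustrated with the seven hornwort counts from Table 3.1. This is a sketch under assumptions: it uses the common Tukey convention (fences at 1.5 times the interquartile range beyond the quartiles) and the "inclusive" quartile method of Python's `statistics.quantiles`; the plotting software and quantile method actually used for Figure 3.1 are not stated in the source. The function name `tukey_outliers` is hypothetical.

```python
import statistics

# Hornwort per-species MTPSL counts, copied from Table 3.1.
counts = [0, 0, 0, 2, 0, 3, 9]


def tukey_outliers(values, k=1.5):
    """Return (q1, q3, lower fence, upper fence, outliers) for a list of numbers."""
    # Assumption: "inclusive" quartiles (linear interpolation between data points).
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = sorted(v for v in values if v < lower or v > upper)
    return q1, q3, lower, upper, outliers


q1, q3, lower, upper, outliers = tukey_outliers(counts)
print(q1, q3, lower, upper, outliers)
```

With these counts, only the species with nine MTPSLs falls outside the upper fence and would be drawn as an individual point.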

[Figure 3.2 tree labels. Groups: I, II, III, IV. Color legend: Monilophytes, Mosses, Fungi, Lycophytes, Liverworts, Bacteria, Hornworts, Selaginella moellendorffii, Seed plants and algae. Scale bar: 0.6.]

Figure 3.2: Phylogeny of terpene synthases of microbial type (MTPSL) identified from OneKP, together with known MTPSL genes from S. moellendorffii, bacterial terpene synthases and fungal terpene synthases. Terpene synthases are color-coded based on their source. The majority of MTPSL genes are clustered into four major groups (I to IV).

Figure 3.3: Validation of representative MTPSL genes as plant genes. The figures show the schematic genomic organization of representative MTPSL genes with their neighboring genes. The genomic region spanning each MTPSL gene, its neighboring gene and the intergenic region was amplified using PCR and confirmed by sequencing. A. ApMTPSL1 from the hornwort Anthoceros punctatus. B. ApMTPSL2 from the hornwort A. punctatus. C. SfMTPSL1 from the moss Sphagnum fallax. In each schematic figure, the neighboring gene was annotated as a functional protein. The species containing the protein with the highest homology to the respective neighboring gene is indicated. The group number indicates the group in Figure 3.2 to which the respective MTPSL belongs.

[Figure 3.4 panel labels: I, II, III, IV.]

Figure 3.4: Motif analysis of MTPSLs. ‘DDxxD’ and ‘NSE’ are two highly conserved motifs found in terpene synthases. Sequence motif logos made using weblogo 3.0, showing the conserved motifs found in each group of terpene synthase genes of microbial type.

[Figure 3.5 chromatogram panels: Lw-IRBN-TPS2, Lw-IRBN-TPS4, Moss-GOWD-TPS2, Moss-QKQO-TPS3, Fern-GSXD-TPS3, Fern-UJTT-TPS4, Fern-YJJY-TPS1, Moss-VBMM-TPS3. Vertical axis: Relative abundance (TIC x 1,000,000 ions). Horizontal axis: Retention time (min). The labels "unidentified" and "no activity" also appear in the panels. Numbered peaks: 1, bicycloelemene; 2, bicyclogermacrene; 3, α-isocomene; 4, β-elemene; 5, (E)-β-caryophyllene; 6, (E)-β-farnesene; 7, nerolidol; 8, dactylol; 9, γ-curcumene; 10, α-zingiberene; 11, β-bisabolene; 12, β-curcumene; 13, sesquiphellandrene; 14, (E)-α-bisabolene; 15, (Z,E)-α-farnesene; 16, (E,E)-α-farnesene; 17, protoillud-6-ene; 18, (Z)-γ-bisabolene; 19, (E)-γ-bisabolene; 20, β-bisabolol; 21, α-bisabolol.]

Figure 3.5: Biochemical activity of selected MTPSL genes.

IRBN_MTPSL8  : MA--SPATIRLPDILSAMDRFELRTHPDEREVTRASNEWFNSYNMMPAPIFEKFVKCDFGLMTAM :  63
gi|751680917 : MASPSPATIRLPDILSAMDKFELRTHPDEREVTRASNEWFNSYNMMPAPIFEKFVKCEFGLMTAM :  65
               MA  SPATIRLPDILSAMD4FELRTHPDEREVTRASNEWFNSYNMMPAPIFEKFVKC FGLMTAM

IRBN_MTPSL8  : SYPDTDATRLRITADYMSILFAYDDLMDLPSSDLMHDRIASSKAAKIMMQVLTHPHKFKPVPGLP : 128
gi|751680917 : SYPDTDATRLRITADYMSILFAYDDLMDLPSSDLMHDRIASSKAAKIMMQVLTHPHKFKPVPGLP : 130
               SYPDTDATRLRITADYMSILFAYDDLMDLPSSDLMHDRIASSKAAKIMMQVLTHPHKFKPVPGLP

IRBN_MTPSL8  : VATAFHDFWTRFCATSTPSMQKRFTETTYEYVMAVKNQVGNRASSVCPSIEEYVSLRRDTSAIKV : 193
gi|751680917 : VATAFHDFWTRFCATSTKSMQKRFTETTYEYVMAVKNQVGNRQSSVCPSIEEYVSLRRDTSAIKV : 195
               VATAFHDFWTRFCATST SMQKRFTETTYEYVMAVKNQVGNR SSVCPSIEEYVSLRRDTSAIKV

IRBN_MTPSL8  : TYACIEYCLNIDCPDEAFYHPSLAALQEAGNDILSWANDVYSFDNEQCSGDCHNLIAVVAINKNI : 258
gi|751680917 : TYACIEYCLNIDVPDEAFYHPSLAALQEAGNDILSWANDVYSFDNEQCSGDCHNLIAVVAINKNI : 260
               TYACIEYCLNID PDEAFYHPSLAALQEAGNDILSWANDVYSFDNEQCSGDCHNLIAVVAINKNI

IRBN_MTPSL8  : TVQAAMEYAMGMIDSAINRFFEECSNVPSFGPDVDPKVQAYIKGVELYLSGSVFWHLESERYFGP : 323
gi|751680917 : TVQAAMEYAMGMIDSAIARFFEECANVPSFGPDVDPKVQAYIKGVELYLSGSVYWHLESERYFGP : 325
               TVQAAMEYAMGMIDSAI RFFEEC NVPSFGPDVDPKVQAYIKGVELYLSGSV5WHLESERYFGP

IRBN_MTPSL8  : RVKHVKDTLMVELRPLDEGAKPAFDLIYKLPSNLTSNVLAAVSNRTPTP-PAPVEAAP-AAPSPP : 386
gi|751680917 : RVKHVKDTLMVELRPLDEGAKPAFNLIYKLPSNLTSNVLAAVTPTTKTPEPVPVAAAPTVAPSPP : 390
               RVKHVKDTLMVELRPLDEGAKPAF1LIYKLPSNLTSNVLAAV3  T TP P PV AAP  APSPP

IRBN_MTPSL8  : PRTTRGTPT------PAHHAPEIHAPVPISPFNPNFPTVSPTSVPPPSYEHQRAFAQYMAAQLDE : 445
gi|751680917 : PRGSSNSSTGTVRASPVQH--EIHAPTPISPFNPNFPTSNPNMMPPPSYEQQRVFAQFMAAQLED : 453
               PR 3  3 T      P  H  EIHAP PISPFNPNFPT  P  6PPPSYE QR FAQ5MAAQL

IRBN_MTPSL8  : KMRAEQYYNQAPQYYSAPQSPYQDQQQ----KLRQNSLMEVLLSRPTSELTNILVIASVLMASSP : 506
gi|751680917 : KMRAEQQW-QVPQYYSAPQSPYQPQQQQQLTKARQNSLMEAILNRPTSELTNILVIASVLMASSP : 517
               KMRAEQ 5 Q PQYYSAPQSPYQ QQQ    K RQNSLME 6L RPTSELTNILVIASVLMASSP

IRBN_MTPSL8  : LALVPFVPLLVLLLFPEAPAVLLS : 530
gi|751680917 : LALIPFVPLLVLLLYPEAPAVLLA : 541
               LAL6PFVPLLVLLL5PEAPAVLL

Figure 3.6: Alignment of one MTPSL of putative contaminant origin with one fungal TPS.
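How similar the two sequences in Figure 3.6 are can be quantified as percent identity over aligned, gap-free columns. The sketch below does this for the first alignment block only, with the two rows copied from the figure. The function name `identity` is hypothetical, and ignoring gap columns is one of several conventions for the denominator; the source does not say which one, if any, was used.

```python
# First alignment block of Figure 3.6; '-' marks a gap.
irbn = "MA--SPATIRLPDILSAMDRFELRTHPDEREVTRASNEWFNSYNMMPAPIFEKFVKCDFGLMTAM"
fungal = "MASPSPATIRLPDILSAMDKFELRTHPDEREVTRASNEWFNSYNMMPAPIFEKFVKCEFGLMTAM"


def identity(a, b):
    """Count identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # convention assumed here: skip gap columns
        compared += 1
        if x == y:
            matches += 1
    return matches, compared


matches, compared = identity(irbn, fungal)
print(f"{matches}/{compared} identical ({100 * matches / compared:.1f}%)")
```

Applying the same function to the concatenation of all nine blocks would give the identity over the full alignment.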

[Figure 3.7 tree labels. Groups: II, III. Scale bar: 0.4. Tip labels: ANT_WCZB_MBTPS3, ANT_WCZB_MBTPS1, ANT_WCZB_MBTPS2, ANT_FAJB_MFTPS1, ANT_RXRQ_MFTPS2, ANT_RXRQ_MBTPS1, ANT_WCZB_MFTPS7, ANT_WCZB_MBTPS4, ApMTPSL1, ANT_TCBC_MFTPS2, ANT_RXRQ_MFTPS3, ApMTPSL3, contig8980, ANT_RXRQ_MFTPS4, ANT_WCZB_MFTPS8, ANT_WCZB_MFTPS9, ApMTPSL6, ApMTPSL7, ApMTPSL8, node2080, ApMTPSL5, ApMTPSL13, ApMTPSL14, ApMTPSL2, ApMTPSL4, ApMTPSL9, ANT_FAJB_MFTPS3, ANT_FAJB_MFTPS2, node112, ANT_WCZB_MFTPS6, ANT_WCZB_MFTPS5, ANT_TCBC_MFTPS4.]

Figure 3.7: Phylogenetic tree of MTPSL genes identified from the Anthoceros punctatus genome together with those identified from hornwort transcriptomes.

Chapter 4

Horizontal Gene Transfer of Terpene Synthase Genes from Bacteria to Fungi

Author Contributions: Qidong Jia conducted the phylogenetic analyses, identified the HGT events, performed all comparative genomics and bioinformatics analyses, and wrote the manuscript.

4.1 Abstract

Terpenoids constitute the largest class of secondary metabolites. The vast diversity of terpenoids is partly achieved through the continued creation of novel terpene synthase (TPS) genes, which encode key enzymes for terpenoid biosynthesis, through gene duplication followed by functional divergence. In contrast, little is known about the contribution of horizontal gene transfer (HGT) of TPS genes to the diversity of terpenoids. The goal of this study was to investigate HGT of TPS genes from bacteria to fungi. By phylogenetic analysis of TPSs from bacteria and fungi, several fungal TPSs were found to be nested within bacterial TPSs, implying HGT from bacteria to fungi. These TPSs were renamed BTPSL (bacterial TPS-like). We then focused our study on a group of entomopathogenic fungi with sequenced genomes. Among the eleven fungal species analyzed, eight BTPSL genes were found in seven species, the majority of which are Metarhizium species. In most fungal species containing BTPSL, collinearity could be identified for the BTPSL and its neighboring genes. In addition to BTPSL genes, each of the fungal species was found to contain typical fungal TPS genes, suggesting that the terpenoids produced in each fungus are determined by both BTPSL and typical fungal TPSs. We also performed biochemical studies on one of the identified BTPSLs (MAA_08668) and showed that it has sesquiterpene synthase activity. Molecular evolutionary analysis of BTPSL genes implied purifying selection, suggesting that the novel chemistry brought about by the acquisition of BTPSL genes via HGT may have important and conserved functions for the recipient fungi.

4.2 Introduction

Metabolites can be divided into two groups: primary metabolites and secondary metabolites. While primary metabolites are generally conserved and essential for cellular functions, secondary metabolites tend to be lineage- or even species-specific and are important for ecological functions. Collectively, secondary metabolites are highly diverse, amounting to more than 200,000 compounds identified so far (Hartmann, 2007). The species-specificity and diversity of secondary metabolites are believed to be the result of natural selection. Therefore, understanding how secondary metabolite biosynthesis evolves is an important avenue toward understanding the adaptation of different living organisms to their unique niches. At the molecular level, duplication followed by functional divergence of genes of secondary metabolism is an important mechanism for achieving diversity and specificity (Kliebenstein et al., 2001; Ober, 2005). The impact of this mechanism is manifested by the presence of secondary metabolism gene families of varied sizes. Another mechanism for acquiring genetic novelty for secondary metabolism is horizontal gene transfer (HGT), which is defined as the transfer of genes between organisms in a manner other than traditional reproduction (Gogarten and Townsend, 2005; Keeling and Palmer, 2008). In fungi, a few HGT events for genes of secondary metabolism transferred from bacteria have been documented. For example, Schmitt and Lumbsch reported a clade of fungal type 1 polyketide synthase (PKS) genes, termed 6-MSAS-type PKS, that likely originated from Actinobacteria (Schmitt and Lumbsch, 2009). The presence of bacterial type hybrid non-ribosomal peptide synthase/PKS proteins in several fungal species was also reported to be the result of an HGT event from bacteria to fungi (Lawrence et al., 2011). Since biosynthetic genes for secondary metabolism are often organized into clusters and co-expressed under particular conditions, these metabolic gene clusters can also be transferred between fungal species (Campbell et al., 2012; Khaldi and Wolfe, 2011; Proctor et al., 2013).

Terpenoids constitute the largest class of secondary metabolites. Despite their vast diversity, all terpenoids are synthesized from the same five-carbon precursor, isopentenyl diphosphate (IPP). Depending on the number of IPP units, terpenoids can be further categorized as monoterpenes (C10), sesquiterpenes (C15), diterpenes (C20), triterpenes (C30), tetraterpenes (C40) and polyterpenes. The biosynthesis of monoterpenes, sesquiterpenes and diterpenes from their respective substrates, geranyl diphosphate, farnesyl diphosphate and geranylgeranyl diphosphate, is catalyzed by the same type of enzymes, termed terpene synthases (TPSs). TPSs have been identified in bacteria, fungi and plants (Chen et al., 2011; Keller et al., 2005; Yamada et al., 2015). Despite some structural homology, bacterial and fungal TPSs are only distantly related to typical plant TPSs, judging by protein sequence similarity. Structural studies showed that TPSs are modular in nature, consisting of one or more of three modular structures called the α-domain, β-domain and γ-domain (Köksal et al., 2011). Typical plant TPSs have either an α-β or an α-β-γ structure. The majority of TPSs from bacteria and fungi are formed by the α-domain only. Interestingly, a recent study showed that microbial type terpene synthase genes are also present in plants (Li et al., 2012). The sporadic distribution of microbial type TPSs in the three domains of life suggests that the early evolution of microbial type TPS genes may have involved HGT. Despite the enormous diversity of terpenoids and the implication of HGT of TPS genes in achieving such diversity, to the best of our knowledge, a clear case of HGT of TPS genes has not been reported.

In principle, HGT of TPS genes can rapidly result in novel chemistry in the recipient organism. There are two biochemical pathways for the production of the terpenoid precursor IPP, the mevalonic acid pathway and the methylerythritol 4-phosphate pathway, and all living organisms contain one or both of these pathways (Vranova et al., 2013). Consequently, the substrates for monoterpenes, sesquiterpenes and diterpenes are produced by essentially all living organisms (they are also needed for the production of primary terpene metabolites). Therefore, a TPS gene acquired through HGT can be functional in the recipient immediately after acquisition, because its substrate is readily available. One important feature of many TPSs is that they often produce multiple products from one substrate (Degenhardt et al., 2009). Therefore, acquisition of a single TPS may lead to the production of a suite of novel terpene metabolites. Terpenoids have many ecological functions, ranging from defense to beneficial communications (Chen et al., 2011; Tholl, 2006). If the acquisition of a TPS gene, and consequently the ability to produce a suite of terpenoids, provides a fitness benefit for the recipient, such horizontally acquired TPS genes would likely be retained and fixed in the recipient. Therefore, HGT of TPS genes has most likely made an important contribution to the vast diversity of terpenoids observed in plants, bacteria, and fungi. Recent studies have shown that many fungi acquired genes from bacteria through HGT (Bushley and Turgeon, 2010; Kroken et al., 2003; Lawrence et al., 2011; Schmitt and Lumbsch, 2009). To test our hypothesis concerning TPS genes, the first objective of this study was to detect events of HGT of TPS genes from bacteria to fungi. Our second objective was to analyze the TPS genes so identified in a specific biological context to understand their biochemical function and evolution.

4.3 Results

4.3.1 Analysis of Bacterial and Fungal Terpene Synthase Genes Suggests Possible HGT Events from Bacteria to Fungi

The list of sequenced bacterial and fungal species is growing rapidly, as evidenced by the growing number of sequencing projects listed in the Joint Genome Institute’s Genomes Online Database (GOLD, https://gold.jgi.doe.gov). However, because some of the sequenced genomes are not well annotated and the amount of data involved would be very large, we restricted our initial analysis to well-defined TPS genes to explore the evolutionary relationships of TPSs in fungi and bacteria. We therefore chose the TPSs included in the Pfam database (http://pfam.xfam.org, version 27.0).

The ‘gold standard’ for HGT identification is phylogenetic analysis that identifies incongruent relationships. Because of the low degree of overall sequence identity between bacterial and fungal TPSs, the C-terminal domain of each TPS, which is a more conserved region, was extracted and used for the construction of phylogenetic trees. An unrooted maximum likelihood phylogenetic tree of 341 TPS C-domain protein sequences is shown in Figure 4.1. According to the tree topology, the majority of bacterial and fungal TPSs clustered into two separate groups. However, some sequences did not cluster within their own kingdom, although most of these placements were poorly supported by their bootstrap values and were therefore excluded from further analysis. Among the exceptions, there were two clades in which fungal TPSs were embedded within bacterial TPSs with high bootstrap values (100%, Figure 4.1). This nested relationship implies that these genes were transferred horizontally from bacteria to fungi. One clade contained a fungal TPS from Arthrobotrys oligospora (a nematode-trapping fungus) and a bacterial TPS from Sphingobacterium sp. strain 21. The other clade included four TPSs: two fungal TPSs, MAA_08668 and MAC_05714, from the two Metarhizium species M. robertsii and M. acridum, respectively, and two bacterial TPSs from Granulicella mallensis and Kitasatospora setae. The overall sequence similarity among these four domain sequences was more than 50%. The sequence identity between MAA_08668 and MAC_05714 is 77%. In this study, the fungal TPSs nested within bacterial TPSs were designated BTPSL (bacterial TPS-like). We chose to conduct a detailed follow-up study of the BTPSL genes from the putative HGT event initially identified in Metarhizium.
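The identity figures quoted above depend on how identity is counted. The following is a minimal sketch of one common convention, assuming two rows taken from a multiple sequence alignment and assuming that columns with a gap in either sequence are excluded from the denominator. The text does not state which convention was used for the 77% figure, and the function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped,
    so the denominator is the number of gap-free aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment rows, not real TPS sequences.
    print(percent_identity("MKT-LV", "MRTALV"))
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence length) give different values for the same pair, which is worth keeping in mind when comparing identity thresholds across tools.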

4.3.2 The Apparent Orthologs of BTPSL were Identified in a Group of Entomopathogenic Fungi

Because the TPS dataset used in our initial analysis was restricted to the Pfam database, we determined the distribution of BTPSL genes in the fungal kingdom by searching NCBI’s non-redundant protein sequence database (NR) with blastp, using MAA_08668 and MAC_05714 as separate queries. Owing to the high sequence similarity between MAA_08668 and MAC_05714, essentially the same results were obtained. There was a clear drop in sequence identity from 55% to 29%, so sequences showing at least 50% sequence identity to the query sequences were chosen for further analysis. Using this criterion, in addition to M. robertsii (MAA) and M. acridum (MAC), BTPSLs were identified in M. majus (MAJ), M. guizhouense (MGU), M. brunneum (MBR), M. anisopliae (MAN) and Ophiocordyceps sinensis (OCS) (Table 4.1). Interestingly, all of these are entomopathogenic fungi, and O. sinensis is closely related to Metarhizium. Metarhizium album (MAM) is another species in the genus Metarhizium that has been fully sequenced (Hu et al., 2014). We specifically searched its genome, but no homolog of the BTPSL genes was detected. In MBR, two distinct paralogs (MBR_10393 and MBR_09977) with 75% global amino acid sequence identity were present. The phylogeny of the group of entomopathogenic fungi including Metarhizium and O. sinensis has been well resolved (Hu et al., 2013). Beauveria bassiana and Cordyceps militaris are in a lineage sister to the common ancestor of Metarhizium and O. sinensis (Figure 4.7). Because their genomes have been fully sequenced, we specifically searched the genome sequences of B. bassiana and C. militaris, but BTPSL homologs were detected in neither species.

From the same blastp search of NCBI’s NR database, seven TPSs were identified from bacterial species (Table 4.3): five from Burkholderia species (Betaproteobacteria), one from Acidobacteria and one from Actinobacteria.

These eight BTPSLs from fungi range in size from 310 to 355 amino acids. They show higher sequence similarity to bacterial TPSs than to typical fungal TPSs at the amino acid level. All BTPSLs contain a highly conserved aspartate-rich “DDxxxD” motif located approximately 90 aa from their N-terminus and an “NDxxSxxxE” motif at the C-terminus (Figure 4.2).
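The two motifs named above can be located with simple regular expressions. The sketch below is illustrative only: it assumes that "x" stands for any single residue, reports 1-based start positions, and uses a made-up toy peptide rather than a real BTPSL sequence. The function and constant names are hypothetical.

```python
import re

# "x" in the motif notation is treated as any single residue (an assumption).
ASP_RICH_MOTIF = re.compile(r"DD.{3}D")      # DDxxxD
NSE_MOTIF = re.compile(r"ND.{2}S.{3}E")      # NDxxSxxxE


def find_motifs(protein: str) -> dict:
    """Return 1-based start positions of each motif in a protein sequence."""
    seq = protein.upper()
    return {
        "DDxxxD": [m.start() + 1 for m in ASP_RICH_MOTIF.finditer(seq)],
        "NDxxSxxxE": [m.start() + 1 for m in NSE_MOTIF.finditer(seq)],
    }


if __name__ == "__main__":
    toy_peptide = "MADDAKLDGGGGNDLLSYEKEK"  # toy sequence, not a real TPS
    print(find_motifs(toy_peptide))
```

Note that `finditer` reports non-overlapping matches only. A sequence carrying the shorter "DDxxD" spacing, such as `MADDAKDGG`, is not reported by the `DDxxxD` pattern, which makes the two variants easy to tell apart.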

4.3.3 Collinearity for the Genome Region Containing the BTPSL and the Identification of Neighbor Genes

To rule out the possibility of bacterial contamination, we aligned the corresponding scaffolds containing BTPSLs from seven entomopathogenic fungi using Mauve, which draws each collinear set of matching regions as a contiguously colored local collinear block (LCB). The alignment shows that the regions surrounding the BTPSLs were collinear among all the Metarhizium species, although some assemblies were of low quality (Figure 4.3). For example, in MGU, scaffold_187 (AZNH01000187.1) is only 24845 bp in size and showed high sequence similarity to the others, but the downstream sequence expected from comparison with the other species (scaffold_81, AZNH01000081.1) was not joined to it. All BTPSLs were located in a single LCB with two exceptions, MBR_09977 and OCS_03958.

Neighboring genes in the recipient genomes are displayed below the blocks as rectangles (Figure 4.3). As expected, the gene order in the region surrounding BTPSL was found to be highly conserved across all members of the genus Metarhizium. In each genome, the BTPSL gene’s nearest upstream gene was a short hypothetical gene encoding a protein of 148 aa. The next gene upstream encodes a C6 zinc finger protein (458 aa), a type of zinc finger protein found exclusively in fungal species, particularly in ascomycete fungi. The nearest downstream neighbor of the BTPSL was also a C6 zinc finger gene, of the Zn(2)-Cys(6) type (Todd and Andrianopoulos, 1997). Downstream of this zinc finger gene was a member of the major/yellow royal jelly family of proteins, which are present in all insects studied to date, in bacteria and in fungal species of the phyla Ascomycota and Basidiomycota, but absent in non-insect Metazoa. In insects, a bacterial origin of this gene family by horizontal gene transfer has been suggested (Ferguson et al., 2011). Members of this protein family can be found in most of the Metarhizium species, but not in MAC, MAM and OCS. In MAC, the gene downstream of the BTPSL encodes a secreted aspartic proteinase, instead of the zinc finger and major/yellow royal jelly proteins. Further downstream, ADAM and bZIP transcription factor genes could be identified. ADAMs are zinc-dependent proteases that are mainly present in animal species from protozoa to mammals (Edwards et al., 2008; Huovila et al., 2005; Pollheimer et al., 2014). They have also been found in fungi but are absent in plants (Lavens et al., 2005). The bZIP transcription factors are found in all eukaryotes and play an important role in controlling gene activity (Amoutzias et al., 2007). For TPS MBR_09977, the nearest upstream gene was identified as encoding a zinc finger protein of the C2H2 type, which is almost exclusively found in eukaryotes (Iuchi, 2001). Taken together, although major/yellow royal jelly proteins are also present in bacteria, all other flanking genes on either side of the BTPSL are non-bacterial. It is interesting to note that there were three types of transcription factors in this region, and transcription of the BTPSL gene might be regulated by some or all of them.

4.3.4 Experimental Verification of BTPSL from M. robertsii as a Fungal Gene

It is always a concern that a putative HGT is actually due to DNA contamination during genome sequencing. To verify that these BTPSL genes truly originated from fungi and are not the result of bacterial DNA contamination, M. robertsii was selected as a model species for experimental study. First, genomic DNA was isolated and used as the template for amplification of the coding sequence of MAA_08668. Genomic DNA sequences covering MAA_08668, its upstream and downstream genes, and the intergenic regions were then amplified using PCR. The amplified DNA was fully sequenced, and its position was confirmed in the published genomic sequence (Figure 4.4).

4.3.5 The Presence of Typical Fungal TPS Genes in Relevant Fungal Species

With an increasing number of fungal species being sequenced, TPS genes have been found to be widely distributed in the fungal kingdom, especially in the phyla Ascomycota and Basidiomycota (Quin et al., 2014). We asked whether the group of entomopathogenic fungi examined in this study contains other, i.e. typical fungal, TPS genes. To answer this question, we searched their genome sequences for the presence of putative TPS genes. The number of fungal TPSs found in these genomes ranged from 1 to 6 (Table 4.2). Most of them were around 330 aa in size, which is similar to the size of TPSs in fungi, with several exceptions (Figure 4.5 and Table 4.2).

4.3.6 Phylogenetic Analysis of BTPSLs, Related Bacterial TPSs and Other TPSs

Phylogenetic analysis was performed to understand the evolutionary relatedness of the BTPSLs, the related bacterial TPSs listed in Table 4.3 and the typical fungal TPSs identified from entomopathogenic fungi listed in Table 4.2. It is evident that the BTPSLs from fungi are monophyletic. Their sister clade consists of the TPSs identified from bacteria listed in Table 4.3. When the intron-exon structures of all the TPS genes used in the phylogenetic analysis were compared, it was notable that, as is typical of bacterial genes, none of the 8 BTPSL genes in entomopathogenic fungi had introns except the one in Ophiocordyceps sinensis (OCS_03958), which contained a single 56 bp intron (Figure 4.5). Since the phylogenetic analysis indicated a prokaryotic origin, the intron in OCS_03958 was probably acquired after the horizontal transfer from bacteria. Most typical fungal TPS genes contain 2 introns, but some of the fungal TPSs, in particular those possessing the fungal trichodiene synthase (TRI5: PF06330) domain, had no intron.

4.3.7 The BTPSL Gene of HGT Origin in M. robertsii Encodes a Functional Enzyme

MAA_08668 was heterologously expressed in E. coli and the recombinant protein was tested for TPS enzyme activity. In vitro assays revealed that MAA_08668 was able to convert the common TPS substrate (E,E)-FPP into a mixture of 20 sesquiterpenes, dominated by the two alcohols epi-α-cadinol and nerolidol (Figure 4.6). γ-Cadinene, germacrene D, δ-cadinene, β-elemene, (E)-β-caryophyllene, α-cadinene, epi-bicyclosesquiphellandrene, cadina-1,4-diene and 10 unidentified sesquiterpenes were detected as minor products. Besides (E,E)-FPP, MAA_08668 also accepted (Z,E)-FPP, producing β-bisabolene, α-longipinene, α-ylangene and at least 25 further, so far unidentified sesquiterpenes (Figure 4.6). Fed with (Z,Z)-FPP, the enzyme produced only trace amounts of an unidentified sesquiterpene. GPP and GGPP, the precursors for monoterpenes and diterpenes, respectively, were not accepted as substrates.

4.3.8 Molecular Evolutionary Analysis of BTPSL Genes of HGT Origin in Metarhizium

The selective forces that have shaped the evolution of the BTPSL genes in fungal recipients were inferred by examining the ratio of non-synonymous to synonymous nucleotide substitution rates (ω = dN/dS) using codeml of PAML (Yang, 2007). When the one-ratio model, which assumes the same ω ratio for all branches, was applied to our data, we obtained a log likelihood of -6363.8 with an estimate of ω = 0.16093 (Table 4.4), indicating that the major selective force acting on all BTPSL genes and bacterial TPS genes in the tree (Figure 4.5) was purifying selection. We then applied the free-ratio model, which assumes a different ω ratio for each branch, to test whether rate heterogeneity exists among lineages. This model gave a log likelihood of -6309.81 (Table 4.4). The likelihood ratio test (LRT) showed that the free-ratio model fit the data significantly better than the one-ratio model (2ΔlnL = 107.99, p < 0.001 with df = 26). This indicates that ω ratios differed significantly among lineages. Therefore, we used PAML’s branch model to test whether significant shifts in selective pressure followed HGT. Two different ω ratios were assigned: ω1 for the BTPSL genes in fungal recipients (foreground branches) and ω0 for all bacterial genes in donor groups (background branches). The log likelihood under this model was -6348.66 with ω0 = 0.18 and ω1 = 0.00075 (Table 4.4). The likelihood ratio test comparing this model to the one-ratio model was significant (2ΔlnL = 30.28, p < 0.001 with df = 1) (Table 4.4). This significantly lower ω ratio relative to that in donor groups indicates more stringent purifying selection acting on the genes in fungal recipients, clearly demonstrating the adaptive significance of these HGT-acquired genes.
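The likelihood ratio tests above can be reproduced from the reported log likelihoods. The sketch below is a minimal illustration, assuming the test statistic 2ΔlnL is compared against a chi-square distribution with the stated degrees of freedom. It uses only the standard library, with closed-form chi-square tail probabilities for integer degrees of freedom; the function names are hypothetical and this is not PAML's own code.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series over odd double factorials.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2*delta lnL, p-value) for nested models."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, df)


if __name__ == "__main__":
    # Log likelihoods as reported in the text (Table 4.4).
    print(likelihood_ratio_test(-6363.8, -6309.81, 26))  # free-ratio vs one-ratio
    print(likelihood_ratio_test(-6363.8, -6348.66, 1))   # two-ratio vs one-ratio
```

With the rounded log likelihoods quoted in the text, the free-ratio statistic evaluates to about 107.98 rather than 107.99; the small difference presumably reflects rounding of the reported values. Both p-values are far below conventional significance thresholds.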

4.4 Discussion

Our comprehensive phylogenetic analysis of bacterial and fungal terpene synthases showed that a few TPS genes found in several entomopathogenic fungi, designated bacterial terpene synthase-like (BTPSL), were embedded within the bacterial TPS genes. This nested relationship indicates a bacterial origin of the fungal BTPSLs (Figures 4.1 and 4.5). To the best of our knowledge, this is the first time HGT has been reported in the TPS gene family, although HGT is not uncommon in other classes of secondary metabolite genes (Bushley and Turgeon, 2010; Kroken et al., 2003; Lawrence et al., 2011; Schmitt and Lumbsch, 2009). Several independent lines of evidence suggest that fungal BTPSLs were acquired by HGT from bacteria. First, the regions surrounding the BTPSL genes in Metarhizium species are highly conserved and all flanking genes in this conserved syntenic region appear to be fungal sequences. If the BTPSL genes were from a bacterial contaminant, their integration into this region would be very unlikely. Second, all these genes except the one found in OCS have no introns, which is consistent with HGT from bacteria. Third, in contrast to the typical “DDxxD” motif shared by almost all eukaryotic TPSs, all these BTPSLs possess a “DDxxxD” motif (Figure 4.2), which is another strong indicator of HGT from bacteria. Fourth, the presence of at least two transcription factors in the region surrounding the BTPSL is consistent with the observation that fungal secondary metabolite gene clusters often possess one or more transcription factors (Hoffmeister and Keller, 2007; Osbourn, 2010; Yin and Keller, 2011), indicating that the gene cluster identified in this work is very likely the result of natural selection over time following HGT rather than a random process. Taken together, these results provide circumstantial evidence for HGT of TPS genes from bacteria to fungi. With this first case, we speculate that HGT of TPS genes from bacteria to fungi may not be an extremely rare event. As shown in Figure 4.1, one TPS gene from A. oligospora, a predacious fungus of nematodes, most likely originated from an HGT event from a bacterial donor. Comprehensive phylogenetic analysis of TPS genes from an up-to-date list of sequenced bacteria and fungi will provide a better assessment of the occurrence of HGT of TPS genes from bacteria to fungi.

The presence of BTPSL genes in six of the seven Metarhizium species and in O. sinensis, and their absence in B. bassiana and C. militaris, implies that the HGT event occurred in the immediate common ancestor of Metarhizium species and O. sinensis, after their divergence from the common ancestor of B. bassiana and C. militaris. In this scenario, the absence of a BTPSL gene in MAM would be explained by lineage-specific gene loss. Metarhizium species can be classified into three groups based on their host ranges: specialist species (MAM and MAC), transitional species (MAJ and MGU) and generalist species (MAA, MBR and MAN). Previous studies have shown that the specialists MAM and MAC have fewer gene clusters involved in secondary metabolite biosynthesis than other Metarhizium species or plant-associated fungi (Hu et al., 2014). It is tempting to speculate that the loss of the BTPSL gene in MAM is neutral or even adaptive. On the other hand, it is interesting to observe that the number of typical fungal terpene synthase genes varied little among the entomopathogenic fungal species studied, except in the two generalist species OCS (6) and BBA (5) (Table 4.2). Further functional studies of all these TPS genes in each of these entomopathogenic fungi will help explain how these genes contribute to their success in living in more diverse environments and in invading varying ranges of insect hosts.

The actual donor bacterial species is hard to determine because of insufficient taxon sampling. Bacterial sequences that shared more than 50% sequence identity with the BTPSL genes identified in fungi are from three different bacterial groups: Betaproteobacteria, Acidobacteria and Actinobacteria, all of which are possible sources of HGT. Most of them are from the genus Burkholderia of the Betaproteobacteria, which contains a wide variety of Gram-negative species that occupy a wide range of ecological niches, such as soil, water, the plant rhizosphere and fungi. Many Burkholderia species contain so-called genomic islands (GIs) that are the main contributors to the great genomic plasticity within this genus (Holden et al., 2004; Kim et al., 2005; Tuanyok et al., 2008). These GIs can perform a broad range of functions and are more likely to arise from horizontal gene transfer between different bacteria. Burkholderia genomes are open, as evidenced by the previous finding that only about 3% of protein-coding gene families are conserved across all Burkholderia species (Ussery et al., 2009). Previous studies have also shown that Metarhizium species, particularly nonspecialist species, are prone to accept exogenous DNA (Hu et al., 2014). Therefore it is possible that the BTPSL genes originated from a Burkholderia species. The mechanism of the HGT is not clear, but one precondition must be met for HGT to occur: close contact between donor and recipient. Given the diverse ecological niches that Burkholderia occupy (in particular in or on fungal mycelia), contact between donor and recipient is guaranteed and routes for HGT into fungi are more likely to be established.

Compared to vertical inheritance, the acquisition of new genes by HGT is random and risky, so the gain of beneficial genes is not guaranteed. The majority of horizontally transferred genes are lost from the recipient genomes very quickly, except those that increase the fitness of the recipient organism (Hao and Golding, 2006; Novozhilov et al., 2005). In Metarhizium species, the fact that these BTPSL genes are arranged in clusters along with other genes such as transcription factors (a typical characteristic of fungal secondary metabolism) indicates that they are under strong selection and that their expression is probably fine-tuned by these transcription factors. Previous studies have shown that in Metarhizium species transcription factors are under strong positive selection, particularly the Zn(2)-Cys(6) type and C2H2 type (Hu et al., 2014), in contrast to the BTPSL genes, which are under strong purifying selection. The changes in transcription factors and the presence of BTPSL genes in Metarhizium species except the basal species MAM could, in part, explain the expanded host range. Within Metarhizium species, there are other differences in gene content and transcriptional regulation that contribute to the differences in pathogenicity and host range, such as different numbers of proteases, cytochrome P450 enzymes, polyketide synthases and dehydrogenases (Gao et al., 2011; Hu et al., 2014). Although MAA_08668 showed TPS enzyme activity and catalyzed the formation of multiple sesquiterpenes in vitro, its in vivo function and the identity of the secondary metabolites produced by this horizontally transferred gene remain unknown. Further experimental studies, such as mutational analysis, are needed to confirm the requirement of this type of TPS gene for pathogenicity and to determine the active molecules that directly cause disease in insects.

4.5 Materials and Methods

4.5.1 Data Sources and Analysis

Bacterial and fungal terpene synthases (TPSs) were obtained from Pfam (version 27.0). Since the majority of TPSs from these two kingdoms contain only the C-terminal domain of the TPS protein, only sequences included in Pfam entry PF03936 were retrieved. The original dataset contained 287 bacterial TPSs in 140 species and 202 fungal TPSs in 69 species. In order to remove spurious sequences, the downloaded sequences were subjected to a HMMER (Finn et al., 2011) search against the Pfam-A database. Only sequences with the PF03936 domain as the best-matched domain and active in the UniProt database were kept in our analysis. This resulted in a final dataset of 275 bacterial TPSs in 133 species and 181 fungal TPSs in 64 species.

TPSs in 7 Metarhizium species, Ophiocordyceps sinensis, Cordyceps militaris, Beauveria bassiana ARSEF 2860 and Beauveria bassiana D1-5 were mined from available protein sequences in the NCBI protein database using HMMER at an e-value of 1e-2. For detailed information about the species and the number of TPSs in each species, see Tables 4.1 and 4.2.

Sequences similar to the BTPSLs in Metarhizium species were retrieved from NCBI nr database based on blastp searches with the BTPSLs from M. robertsii (MAA_08668) and M. acridum (MAC_05714) as the queries. The blastp hits with at least 50% sequence identity to the query were chosen.
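The identity cutoff described above can be applied programmatically. The sketch below assumes the blastp results were saved in the standard 12-column tabular format (query ID, subject ID, percent identity, and so on); the text does not say which output format was actually used, and the function name and the example subject IDs are hypothetical.

```python
def filter_hits_by_identity(tabular_text: str, min_identity: float = 50.0):
    """Keep BLAST tabular hits whose percent identity is at least min_identity.

    Assumes tab-separated columns with query ID, subject ID and percent
    identity in the first three fields. Lines starting with '#' are skipped.
    """
    kept = []
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, pident = fields[0], fields[1], float(fields[2])
        if pident >= min_identity:
            kept.append((query, subject, pident))
    return kept


if __name__ == "__main__":
    # Made-up rows for illustration only; subject IDs are hypothetical.
    rows = [
        ["MAA_08668", "hypothetical_hit_A", "55.0", "320", "140", "2", "1", "320", "5", "324", "1e-110", "330"],
        ["MAA_08668", "hypothetical_hit_B", "29.0", "300", "200", "6", "10", "310", "12", "305", "2e-20", "95"],
    ]
    text = "\n".join("\t".join(r) for r in rows)
    for hit in filter_hits_by_identity(text):
        print(hit)
```

A fixed 50% cutoff is reasonable here because the text reports a clear gap in identity between 55% and 29%; with a less clear-cut distribution, the threshold choice would need more justification.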

4.5.2 Multiple Sequence Alignments and Phylogenetic Inference

TPSs obtained from Pfam were first clustered at 80% sequence identity in each kingdom (bacteria and fungi) using CD-HIT (Li and Godzik, 2006) to eliminate highly similar sequences within each kingdom. After clustering, the numbers of TPSs for bacteria and fungi were reduced to 165 and 138, respectively. The corresponding C-terminal domain sequences from bacterial and fungal TPSs were retrieved based on the coordinates predicted by HMMER. All multiple sequence alignments were made using MAFFT (v7.130b) (Katoh and Standley, 2013) in a highly accurate setting (L-INS-i) with 1000 iterations of improvement. The appropriate amino acid substitution model was determined using ProtTest version 3.4 (Darriba et al., 2011) for each alignment according to the Akaike information criterion (AIC) and Bayesian information criterion (BIC). The improved general amino acid substitution matrix with empirical base frequencies along with a gamma distribution (LG+G+F) was obtained as the most appropriate model for all protein datasets. Maximum likelihood analyses were performed using RAxML version 8.1.11 (Stamatakis, 2014) with 1000 bootstrap replicates under the best substitution model for each dataset via the online CIPRES Science Gateway portal (Miller et al., 2010).

The maximum-likelihood tree shown in Figure 4.5 was inferred from the codon alignment of TPS genes identified from several entomopathogenic fungi and TPS genes in bacteria that show high similarity to BTPSLs. The codon alignment was generated using PAL2NAL (Suyama et al., 2006) from the MAFFT protein alignment (L-INS-i method with 1000 iterations of improvement) and the corresponding nucleotide sequences. The maximum-likelihood analysis was performed using PhyML 3.1 (Guindon et al., 2010) with the GTR+I+G nucleotide substitution model chosen by jModelTest 2 (Darriba et al., 2012) based on the AIC and BIC criteria. The robustness of the phylogenetic tree was estimated by bootstrapping with 1000 replicates.

4.5.3 Multiple Genome Alignment

For comparisons of sequences at the genome level, homologous contig sequences from each of seven species were aligned with sequence of MAA as the reference using the progressive alignment algorithm of the MAUVE Multiple Genome Aligner (version 2.4.0) at default settings (Darling et al., 2004).

4.5.4 Analysis of Selective Pressure

Selection analyses were performed using the codeml program implemented in PAML 4.8 (Yang, 2007). To detect differences in the selective pressure acting on the TPS genes in fungal recipients (foreground branches) and on the genes in bacterial donor groups (background branches), we used PAML’s branch models, which allow ω to vary among lineages. The likelihood ratio test (LRT) was carried out to determine whether the foreground branches are under significantly different selection pressure from the background branches by comparing the one-ratio model against the two-ratio model.

4.5.5 Fungal Culture

The Metarhizium robertsii (MAA) isolate ARSEF 23 was ordered from the USDA ARS Collection of Entomopathogenic Fungal Cultures (ARSEF), Ithaca, New York. The lyophilized isolate was resuspended in 500 µl sterile distilled water and the suspension was cultured on Difco potato dextrose agar (PDA) medium at 28°C for one week.

4.5.6 MAA_08668 Gene Cloning and Verification of Its Position

Mycelia of M. robertsii were used to isolate genomic DNA. The genomic DNA was extracted using the Plant Genomic DNA Extraction Miniprep System (http://www.viogene.com), according to the protocol recommended by the manufacturer. The MAA_08668 gene was amplified from genomic DNA using the forward primer 5’-ATGGAAAAACAAAGACTGAAAGCTC-3’ and the reverse primer 5’-CTAGACCAAGCTGCTCGTTGACTCAG-3’, and the resulting PCR product was cloned into the pEXP5-CT/TOPO vector (www.lifetechnologies.com), according to the protocol provided by the manufacturer. The MAA_08668 gene cloned into pEXP5-CT/TOPO was confirmed by sequencing. The position of MAA_08668 in the published Metarhizium genomic sequence was verified by PCR. A total of 10591 bp of DNA sequence covering MAA_08668 and its neighbor genes was amplified using four pairs of primers: 11039F 5’-ATGCCGCTGGCAGACTTGCGGTAC-3’, 6768R1 5’-GAGGTGAGACACCAGCGTTATG-3’, 6768F2 5’-CTACGTAGCCATTAGCGAGAGGGAC-3’, 8868R1 5’-GACTCAAGTACTGGATACGAGGTTAC-3’, 8868F2 5’-GTCAGTCACCAGCAGATTATGCTC-3’, 6869R1 5’-CTAATTGTCGACACTACTCTGGCCAG-3’, 6869F2 5’-CAACTTCATGTGCACTACTATAGCAAG-3’, 8869R 5’-TACACAAGGCCGGCAAGAACAGTG-3’. The PCR products were subcloned into the pGEM T-EASY vector and fully sequenced.

4.5.7 Biochemical Characterization of MAA_08668

The E. coli BL21 codon plus strain (http://www.lifetechnologies.com), transformed with the plasmid containing MAA_08668, was used for protein expression. The BL21 culture was grown in liquid LB at 37°C until the culture reached an OD600 of 0.6. Protein expression was induced by addition of 1 M isopropylthio-β-galactoside (IPTG) to a final concentration of 1 mM. After 20 hours of incubation at 18°C, cells were harvested by centrifugation at 6000 g, resuspended in protein extraction buffer (50 mM Mopso (pH 7.0), 5 mM MgCl2, 5 mM sodium ascorbate, 5 mM dithiothreitol, 0.5 mM PMSF and 10% (v/v) glycerol) and disrupted by a 4 × 30 sec treatment with a sonicator (Microson XL 2000; Misonix, Farmingdale, New York). Cell debris was removed by centrifugation at 14,000 rpm (20 min, 4°C) and the supernatant was desalted by passage through a PD-10 Desalting Column (http://www.gelifesciences.com) into assay buffer (10 mM Mopso, pH 7.0, 1 mM dithiothreitol, 10% (v/v) glycerol).

To determine the catalytic activity of MAA_08668, enzyme assays were conducted in a Teflon-sealed, screw-capped 1 ml GC glass vial containing 40 µl of the bacterial extract and 60 µl assay buffer containing 10 µM substrate [(E,E)-FPP, (Z,E)-FPP, (Z,Z)-FPP and GPP, respectively], 10 mM MgCl2, 0.2 mM NaWO4 and 0.1 mM NaF. A SPME (solid phase microextraction) fiber consisting of 100 µm polydimethylsiloxane (http://www.sigmaaldrich.com) was placed into the headspace of the vial for 45 min of incubation at 30°C to adsorb the TPS reaction products. Product analysis was conducted using an Agilent 6890 Series gas chromatograph (GC) coupled to an Agilent 5973 quadrupole mass selective detector (interface temp, 250°C; quadrupole temp, 150°C; source temp, 230°C; electron energy, 70 eV). The GC was operated with a DB-5MS column (Agilent, Santa Clara, USA, 30 m x 0.25 mm x 0.25 µm). The sample (SPME) was directly injected without split at an initial oven temperature of 80°C. The temperature was held for 3 min, then increased to 240°C with a gradient of 7°C min⁻¹, and further increased to 300°C with a gradient of 60°C min⁻¹ and a hold of 2 min. To determine potential diterpene synthase activity, assays were set up as described above, containing 50 µM (E,E,E)-GGPP as substrate, and were overlaid with 100 µl hexane. After incubation for 60 min at 30°C, the hexane phase was collected and analyzed using GC-MS.
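As a quick arithmetic check, the total run time implied by the GC oven program described above can be computed from its hold and ramp steps. This is a sketch only: the step encoding and the function name are hypothetical, and it assumes the ramps are linear and that no additional equilibration time is included.

```python
def oven_program_minutes(initial_temp_c: float, steps) -> float:
    """Total duration in minutes of a GC oven temperature program.

    steps is a list of ("hold", minutes) or ("ramp", target_temp_c, rate_c_per_min).
    Ramps are assumed to be linear.
    """
    temp = initial_temp_c
    total = 0.0
    for step in steps:
        if step[0] == "hold":
            total += step[1]
        elif step[0] == "ramp":
            _, target, rate = step
            total += abs(target - temp) / rate
            temp = target
        else:
            raise ValueError(f"unknown step type: {step[0]}")
    return total


# Program as described in the text: 80°C for 3 min, 7°C/min to 240°C,
# 60°C/min to 300°C, then a 2 min hold.
PROGRAM = [("hold", 3.0), ("ramp", 240.0, 7.0), ("ramp", 300.0, 60.0), ("hold", 2.0)]

if __name__ == "__main__":
    print(round(oven_program_minutes(80.0, PROGRAM), 2))
```

The first ramp dominates the run: (240 - 80) / 7 is roughly 22.9 min, so the whole program takes a little under 29 min per injection.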

4.6 Acknowledgements

We thank the USDA ARS Collection of Entomopathogenic Fungal Cultures (ARSEF), Ithaca, New York, for providing us with the Metarhizium robertsii (MAA) isolate ARSEF 23.

4.7 Bibliography

Amoutzias, G. D., Veron, A. S., Weiner, J., 3rd, Robinson-Rechavi, M., Bornberg-Bauer, E.,

Oliver, S. G., and Robertson, D. L. (2007). One billion years of bzip transcription factor

evolution: conservation and change in dimerization and dna-binding site specificity.

Molecular Biology and Evolution, 24(3):827–35. 113

Bushley, K. E. and Turgeon, B. G. (2010). Phylogenomics reveals subfamilies of

fungal nonribosomal peptide synthetases and their evolutionary relationships. BMC

Evolutionary Biology, 10:26. 108, 118

Campbell, M. A., Rokas, A., and Slot, J. C. (2012). Horizontal transfer and death of a fungal

secondary metabolic gene cluster. Genome Biol Evol, 4(3):289–93. 107

Chen, F., Tholl, D., Bohlmann, J., and Pichersky, E. (2011). The family of terpene synthases

in plants: a mid-size family of genes for specialized metabolism that is highly diversified

throughout the kingdom. Plant Journal, 66(1):212–29. 107, 108

Darling, A. C., Mau, B., Blattner, F. R., and Perna, N. T. (2004). Mauve: multiple alignment

of conserved genomic sequence with rearrangements. Genome Research, 14(7):1394–

403. 124

Darriba, D., Taboada, G. L., Doallo, R., and Posada, D. (2011). Prottest 3: fast selection of

best-fit models of protein evolution. Bioinformatics, 27(8):1164–5. 123

Darriba, D., Taboada, G. L., Doallo, R., and Posada, D. (2012). jmodeltest 2: more models,

new heuristics and parallel computing. Nature Methods, 9(8):772. 123

Degenhardt, J., Kollner, T. G., and Gershenzon, J. (2009). Monoterpene and sesquiterpene

synthases and the origin of terpene skeletal diversity in plants. Phytochemistry, 70(15-

16):1621–37. 108

Edwards, D. R., Handsley, M. M., and Pennington, C. J. (2008). The adam

metalloproteinases. Molecular Aspects of Medicine, 29(5):258–89. 113

Ferguson, L. C., Green, J., Surridge, A., and Jiggins, C. D. (2011). Evolution of the insect

yellow gene family. Molecular Biology and Evolution, 28(1):257–72. 113

Finn, R. D., Clements, J., and Eddy, S. R. (2011). Hmmer web server: interactive sequence

similarity searching. Nucleic Acids Research, 39(Web Server issue):W29–37. 122

Gao, Q., Jin, K., Ying, S. H., Zhang, Y., Xiao, G., Shang, Y., Duan, Z., Hu, X., Xie, X. Q.,

Zhou, G., Peng, G., Luo, Z., Huang, W., Wang, B., Fang, W., Wang, S., Zhong, Y., Ma,

L. J., St Leger, R. J., Zhao, G. P., Pei, Y., Feng, M. G., Xia, Y., and Wang, C. (2011).

Genome sequencing and comparative transcriptomics of the model entomopathogenic

fungi Metarhizium anisopliae and M. acridum. PLoS Genetics, 7(1):e1001264. 121

Gogarten, J. P. and Townsend, J. P. (2005). Horizontal gene transfer, genome innovation

and evolution. Nature Reviews. Microbiology, 3(9):679–87. 106

Guindon, S., Dufayard, J. F., Lefort, V., Anisimova, M., Hordijk, W., and Gascuel, O. (2010).

New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing

the performance of phyml 3.0. Systematic Biology, 59(3):307–321. 123

Hao, W. and Golding, G. B. (2006). The fate of laterally transferred genes: life in the fast

lane to adaptation or death. Genome Research, 16(5):636–43. 120

Hartmann, T. (2007). From waste products to ecochemicals: Fifty years research of plant

secondary metabolism. Phytochemistry, 68(22-24):2831–2846. 106

Hoffmeister, D. and Keller, N. P. (2007). Natural products of filamentous fungi: enzymes,

genes, and their regulation. Natural Product Reports, 24(2):393–416. 118

Holden, M. T., Titball, R. W., Peacock, S. J., Cerdeno-Tarraga, A. M., Atkins, T., Crossman,

L. C., Pitt, T., Churcher, C., Mungall, K., Bentley, S. D., Sebaihia, M., Thomson, N. R.,

Bason, N., Beacham, I. R., Brooks, K., Brown, K. A., Brown, N. F., Challis, G. L.,

Cherevach, I., Chillingworth, T., Cronin, A., Crossett, B., Davis, P., DeShazer, D.,

Feltwell, T., Fraser, A., Hance, Z., Hauser, H., Holroyd, S., Jagels, K., Keith, K. E.,

Maddison, M., Moule, S., Price, C., Quail, M. A., Rabbinowitsch, E., Rutherford, K.,

Sanders, M., Simmonds, M., Songsivilai, S., Stevens, K., Tumapa, S., Vesaratchavest, M.,

Whitehead, S., Yeats, C., Barrell, B. G., Oyston, P. C., and Parkhill, J. (2004). Genomic

plasticity of the causative agent of melioidosis, burkholderia pseudomallei. Proceedings

of the National academy of Sciences of the United States of America, 101(39):14240–5. 120

Hu, X., Xiao, G., Zheng, P., Shang, Y., Su, Y., Zhang, X., Liu, X., Zhan, S., St Leger, R. J., and

Wang, C. (2014). Trajectory and genomic determinants of fungal-pathogen speciation

and host adaptation. Proceedings of the National academy of Sciences of the United States

of America, 111(47):16796–801. 111, 119, 120, 121

Hu, X., Zhang, Y., Xiao, G., Zheng, P., Xia, Y., Zhang, X., St Leger, R. J., Liu, X., and Wang,

C. (2013). Genome survey uncovers the secrets of sex and lifestyle in caterpillar fungus.

Chinese Science Bulletin, 58(23):2846–2854. 111

Huovila, A. P., Turner, A. J., Pelto-Huikko, M., Karkkainen, I., and Ortiz, R. M. (2005).

Shedding light on adam metalloproteinases. Trends in Biochemical Sciences, 30(7):413–

22. 113

Iuchi, S. (2001). Three classes of c2h2 zinc finger proteins. Cellular and Molecular Life

Sciences, 58(4):625–35. 113

Katoh, K. and Standley, D. M. (2013). Mafft multiple sequence alignment software version

7: improvements in performance and usability. Molecular Biology and Evolution,

30(4):772–80. 123

Keeling, P. J. and Palmer, J. D. (2008). Horizontal gene transfer in eukaryotic evolution.

Nature Reviews. Genetics, 9(8):605–18. 106

Keller, N. P., Turner, G., and Bennett, J. W. (2005). Fungal secondary metabolism - from

biochemistry to genomics. Nature Reviews. Microbiology, 3(12):937–47. 107

Khaldi, N. and Wolfe, K. H. (2011). Evolutionary origins of the fumonisin secondary

metabolite gene cluster in fusarium verticillioides and aspergillus niger. Int J Evol Biol,

2011:423821. 107

Kim, H. S., Schell, M. A., Yu, Y., Ulrich, R. L., Sarria, S. H., Nierman, W. C., and DeShazer,

D. (2005). Bacterial genome adaptation to niches: divergence of the potential virulence

genes in three burkholderia species of different survival strategies. BMC Genomics,

6:174. 120

Kliebenstein, D. J., Lambrix, V. M., Reichelt, M., Gershenzon, J., and Mitchell-Olds, T. (2001). Gene duplication in the diversification of secondary metabolism:

tandem 2-oxoglutarate-dependent dioxygenases control glucosinolate biosynthesis in

arabidopsis. Plant Cell, 13(3):681–93. 106

Kroken, S., Glass, N. L., Taylor, J. W., Yoder, O. C., and Turgeon, B. G. (2003). Phylogenomic

analysis of type i polyketide synthase genes in pathogenic and saprobic ascomycetes.

Proceedings of the National academy of Sciences of the United States of America,

100(26):15670–5. 108, 118

Köksal, M., Jin, Y., Coates, R. M., Croteau, R., and Christianson, D. W. (2011). Taxadiene

synthase structure and evolution of modular architecture in terpene biosynthesis.

Nature, 469(7328):116–122. 107

Lavens, S. E., Rovira-Graells, N., Birch, M., and Tuckwell, D. (2005). Adams are present

in fungi: identification of two novel adam genes in aspergillus fumigatus. FEMS

Microbiology Letters, 248(1):23–30. 113

Lawrence, D. P., Kroken, S., Pryor, B. M., and Arnold, A. E. (2011). Interkingdom gene

transfer of a hybrid nps/pks from bacteria to filamentous ascomycota. PLOS ONE,

6(11):e28231. 107, 108, 118

Li, G., Kollner, T. G., Yin, Y., Jiang, Y., Chen, H., Xu, Y., Gershenzon, J., Pichersky, E.,

and Chen, F. (2012). Nonseed plant selaginella moellendorffii [corrected] has both seed

plant and microbial types of terpene synthases. Proceedings of the National academy of

Sciences of the United States of America, 109(36):14711–5. 107

Li, W. and Godzik, A. (2006). Cd-hit: a fast program for clustering and comparing large

sets of protein or nucleotide sequences. Bioinformatics, 22(13):1658–9. 122

Miller, M. A., Pfeiffer, W., and Schwartz, T. (2010). Creating the cipres science gateway for

inference of large phylogenetic trees. In Gateway Computing Environments Workshop

(GCE), 2010, pages 1–8. 123

Novozhilov, A. S., Karev, G. P., and Koonin, E. V. (2005). Mathematical modeling

of evolution of horizontally transferred genes. Molecular Biology and Evolution,

22(8):1721–32. 120

Ober, D. (2005). Seeing double: gene duplication and diversification in plant secondary

metabolism. Trends in Plant Science, 10(9):444–9. 106

Osbourn, A. (2010). Secondary metabolic gene clusters: evolutionary toolkits for chemical

innovation. Trends in Genetics, 26(10):449–57. 118

Pollheimer, J., Fock, V., and Knöfler, M. (2014). Review: the adam metalloproteinases -

novel regulators of trophoblast invasion? Placenta, 35 Suppl:S57–63. 113

Proctor, R. H., Van Hove, F., Susca, A., Stea, G., Busman, M., van der Lee, T., Waalwijk, C.,

Moretti, A., and Ward, T. J. (2013). Birth, death and horizontal transfer of the fumonisin

biosynthetic gene cluster during the evolutionary diversification of fusarium. Molecular

Microbiology, 90(2):290–306. 107

Quin, M. B., Flynn, C. M., and Schmidt-Dannert, C. (2014). Traversing the fungal

terpenome. Natural Product Reports, 31(10):1449–73. 114

Schmitt, I. and Lumbsch, H. T. (2009). Ancient horizontal gene transfer from bacteria

enhances biosynthetic capabilities of fungi. PLOS ONE, 4(2):e4437. 106, 108, 118

Stamatakis, A. (2014). Raxml version 8: a tool for phylogenetic analysis and post-analysis

of large phylogenies. Bioinformatics, 30(9):1312–3. 123

Suyama, M., Torrents, D., and Bork, P. (2006). Pal2nal: robust conversion of protein

sequence alignments into the corresponding codon alignments. Nucleic Acids Research,

34(Web Server issue):W609–12. 123

Tholl, D. (2006). Terpene synthases and the regulation, diversity and biological roles of

terpene metabolism. Current Opinion in Plant Biology, 9(3):297–304. 108

Todd, R. B. and Andrianopoulos, A. (1997). Evolution of a fungal regulatory gene family:

the zn(ii)2cys6 binuclear cluster dna binding motif. Fungal Genetics and Biology,

21(3):388–405. 113

Tuanyok, A., Leadem, B. R., Auerbach, R. K., Beckstrom-Sternberg, S. M., Beckstrom-

Sternberg, J. S., Mayo, M., Wuthiekanun, V., Brettin, T. S., Nierman, W. C., Peacock,

S. J., Currie, B. J., Wagner, D. M., and Keim, P. (2008). Genomic islands from five strains

of burkholderia pseudomallei. BMC Genomics, 9:566. 120

Ussery, D. W., Kiil, K., Lagesen, K., Sicheritz-Ponten, T., Bohlin, J., and Wassenaar, T. M.

(2009). The genus burkholderia: analysis of 56 genomic sequences. Genome Dyn, 6:140–

57. 120

Vranova, E., Coman, D., and Gruissem, W. (2013). Network analysis of the mva and mep

pathways for isoprenoid synthesis. Annual Review of Plant Biology, 64:665–700. 108

Yamada, Y., Kuzuyama, T., Komatsu, M., Shin-Ya, K., Omura, S., Cane, D. E., and Ikeda, H.

(2015). Terpene synthases are widely distributed in bacteria. Proceedings of the National

academy of Sciences of the United States of America, 112(3):857–62. 107

Yang, Z. (2007). Paml 4: phylogenetic analysis by maximum likelihood. Molecular Biology

and Evolution, 24(8):1586–91. 116, 124

Yin, W. and Keller, N. P. (2011). Transcriptional regulatory elements in fungal secondary

metabolism. J Microbiol, 49(3):329–39. 118

4.8 Appendix

Table 4.1: Presence/absence of BTPSLs in each entomopathogenic fungus examined in this study.

Organism                            BTPSLs       Length (aa)
Metarhizium album (MAM)             ND(a)
Metarhizium acridum (MAC)           MAC_05714    349
Metarhizium majus (MAJ)             MAJ_08936    355
Metarhizium guizhouense (MGU)       MGU_11447    348
Metarhizium brunneum (MBR)          MBR_10393    332
                                    MBR_09977    342
Metarhizium anisopliae (MAN)        MAN_10655    355
Metarhizium robertsii (MAA)         MAA_08668    355
Ophiocordyceps sinensis (OCS)       OCS_03958    310
Cordyceps militaris (CCM)           ND
Beauveria bassiana (BBA)            ND
Beauveria bassiana D1-5 (B15)       ND

(a) ND, not detected.

Table 4.2: List of typical fungal TPSs for each entomopathogenic fungus.

Organism                            Typical fungal TPSs    Length (aa)
Beauveria bassiana D1-5 (B15)       B15_g10715             320
                                    B15_g11707             410
                                    B15_g9145              241
                                    B15_g8322              308
                                    B15_g8139              805
Beauveria bassiana (BBA)            BBA_00396              193
                                    BBA_03632              320
                                    BBA_08696              347
                                    BBA_08783              805
                                    BBA_08897              308
Cordyceps militaris (CCM)           CCM_03050              228
Metarhizium acridum (MAC)           MAC_09823              215
                                    MAC_06344              194
Metarhizium anisopliae (MAN)        MAN_07035              331
                                    MAN_02977              327
Metarhizium album (MAM)             MAM_08406              1009
                                    MAM_06660              321
Metarhizium brunneum (MBR)          MBR_03882              331
                                    MBR_00969              326
Metarhizium guizhouense (MGU)       MGU_07289              331
                                    MGU_07703              326
                                    MGU_08368              256
                                    MGU_10775              472
Metarhizium majus (MAJ)             MAJ_05691              331
                                    MAJ_07611              248
                                    MAJ_10664              459
                                    MAJ_11006              181
Metarhizium robertsii (MAA)         MAA_06799              331
                                    MAA_07176              326
                                    MAA_06544              320
                                    MAA_06581              328
Ophiocordyceps sinensis (OCS)       OCS_00486              389
                                    OCS_04474              424
                                    OCS_00133              281
                                    OCS_01341              440
                                    OCS_00754              387
                                    OCS_06263              360

Table 4.3: Bacterial TPS genes similar to the BTPSL genes in fungi.

Gene name    Organism                        Strain    Taxonomy ID    Length (aa)
B_MSh2       Burkholderia sp.                MSh2      1506588        340
B_MSh1       Burkholderia sp.                MSh1      1506587        340
B_AU4i       Burkholderia sp.                AU4i      1335308        347
B_ce         Burkholderia cepacia complex    NA(a)     87882          342
B_ceJBK9     Burkholderia cepacia            JBK9      1395570        311
G_ma         Granulicella mallensis          NA        940614         333
K_se         Kitasatospora setae             NA        2066           320

(a) NA, not available.

Table 4.4: Parameter estimates and likelihood ratio tests (LRT) for branch models of ω in BTPSLs and bacterial TPSs.

Model                  Estimates of parameters               np    lnL             2ΔlnL (LRT)
One-ratio              ω(a) = 0.16093                        29    -6363.800274
Free-ratio                                                   55    -6309.807230    Free-ratio vs. One-ratio: 107.986088 (df=26; p=5.824e-12)
Two-ratios (BTPSLs)    ω0(b) = 0.18221; ω1(c) = 0.00075      30    -6348.661135    Two-ratios vs. One-ratio: 30.278278 (df=1; p=3.743e-08)

(a) ω = nonsynonymous/synonymous rate ratio (dN/dS).
(b) ω0 = ω for bacterial TPSs branch.
(c) ω1 = ω for BTPSLs branch.
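The likelihood ratio tests in Table 4.4 can be rechecked from the lnL and np columns alone. The sketch below is not the code used in the study; it recomputes 2ΔlnL, the degrees of freedom and the chi-square p-value with the Python standard library only. The helper names are illustrative, and the closed-form tail probability covers only the cases needed here (df = 1 or even df).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution.

    Closed forms keep this standard-library only:
    df == 1 uses erfc; even df uses a finite Poisson-type sum.
    """
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch handles only df == 1 or even df")


def lrt(lnl_null, lnl_alt, np_null, np_alt):
    """Return (2*delta lnL, degrees of freedom, p-value) for nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    df = np_alt - np_null
    return stat, df, chi2_sf(stat, df)


# lnL and np values copied from Table 4.4.
one_ratio = (-6363.800274, 29)
alternatives = {
    "free-ratio": (-6309.807230, 55),
    "two-ratios": (-6348.661135, 30),
}
for name, (lnl, n_params) in alternatives.items():
    stat, df, p = lrt(one_ratio[0], lnl, one_ratio[1], n_params)
    print(f"{name} vs. one-ratio: 2ΔlnL = {stat:.6f}, df = {df}, p = {p:.3e}")
```

Both alternative models are compared against the one-ratio model, so the degrees of freedom are simply the differences in the number of free parameters (np).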

[Figure 4.1 image not reproduced. Legend: Bacteria, Fungi. Tip labels recovered from enlarged cluster 1: MAA_08668, MAC_05714, G8NTZ9, E4N7E5; from enlarged cluster 2: G1X7C4, F4C2X7. Scale bar: 3.0.]

Figure 4.1: Maximum Likelihood phylogenetic tree of 341 Terpene Synthase C-domain sequences from bacteria (Blue) and fungi (Magenta) collected in the Pfam database. Branches with putative horizontal gene transfer events are labeled in green. Enlarged views of the two clusters are depicted on the right. Scale bars indicate the numbers of substitutions per site.


* 20 * 40 * 60 * 80 MAN_10655 : ME------KQRLKAQLSSLRVPLFSVPWPEQCSNKAEVVEARMMKWADEHNLLVTDEYRNRVIRTRYGL : 63 MAA_08668 : ME------KQRLKAQLSSLRVPLFSVPWPGQCSNKAEVVEARMMKWADEHNLLVTDEYRNRVIRTRYGL : 63 MGU_11447 : ME------KQRLKAQLSSLRVPLFSVPWPGQCSNKAEVIEARMMKWADEHNLLVTDEYRNRVIRTRYGL : 63 MBR_10393 : ME------KQRLTAQLSSLRVPLFSVPWPGQCSNKAEVIEARMMKWADEHNLLVTDEYRNRVIRTRYGL : 63 MAJ_08936 : ME------KQRLKAQLSSLRVPLFSVPWPGQCSNKAEVIEARMMKWADEHNLLVTDEYRNRVIRTRYGL : 63 MAC_05714 : ME------KQHLKSQIAALRVPTFNIAWPERRSPKADVVEERMMKWADHHKLLVNGEYRDRVIRTRYGL : 63 MBR_09977 : ME------KQILEPQLSALRLPAFNVPWPGARSPHAEVIEARMIEWADHYDLLVNDEHRSRVIRARYGW : 63 OCS_03958 : MALRQFPPILLEPFEMAMKHHVLEAQLASIHVPLFHFPWPGACSPAVDETEARMMDAPATPG------YAW : 65 B_MSh1 : MK------TDALKIELAALRIPVFDVPWAGACSPHAQRIETRMLEWADDHGLLVNDMYRKRVMRTRYGW : 63 CC1G_10050 : MP------SP-AGALPKSFILPDLVN------DCPFPLRVNPLCDEVGRLSEQWFLRHA-NYSPPRAVAFMALKAGE : 63

* 100 * 120 * 140 * 160 MAN_10655 : LAARCYPNAGDELLQVIADYLVWFFLIDDLFVDCVEVATDETIRNLT--AMVDVLDLNVSGSPPVFGELAWLDVCQRLRR : 141 MAA_08668 : LAARCYPNAGEELLQVIADYLVWFFLTDDLFVDRVEVATDETIRNLT--AMVDVLDLNVAGSPPVFGELAWLDVCQRLRR : 141 MGU_11447 : LAARCYPNAGEVLLQAIADYLVWFFLADDLFVDRVEVATDETIRNLT--AMVDVLDLNVAGSPPVFGELAWLDVCQRLRR : 141 MBR_10393 : LAARCYPNAGEELLQAIADCLVWFFLADDLFVDRVEVATDETIRNLT--AMVDVLDLNVAGSPPVFGELAWLDVCQRLRR : 141 MAJ_08936 : LAARCYPNAGEVLLQAIADYLVWFFLADDLFVDRVEVATDETIRNLT--AMVDVLDLNVAGSPPVFGELAWLDVCQRLRR : 141 MAC_05714 : LAARCYPNAGEELLQAIADYFVWFFLADDLFVDRVEVVTDETIRNLT--AMIDVLDRNVAREEPVFGELAWLDVCQRLRD : 141 MBR_09977 : LAARCYPNAAKELLQVIADYFVWFFLADDLFVDRVETVSGDTLRNLT--AMIDVLDFNSAGLEPVWGELAWLDVCRRLRS : 141 OCS_03958 : LAARCYPNAEPELLQTIADYFVWFFLADDLLVDRVETVTPDTLQNLT--AMIDALDFGSARPEPAYGETAFLDVCHRLRR : 143 B_MSh1 : LAARCYPNADPALLQVIADYFVWYFLTDDLFIDRVETLGPGTLTHLV--AIVDVLDYDQTARQPVYGERAWLDVCRRLRA : 141 CC1G_10050 : LTAACYPDADAFHLRVSDDFMNFLFNADD-WLDDFDIEDTYGLANCTVRALRDPVNFITDKRAGLMTKSYF----SRFLK : 138

* 180 * 200 * 220 * 240 MAN_10655 : LLQVEAFERFAQGMRLWATTAALQILNHLRPTSVSIREYQTIRRHTSGMNPCTSLADAANKGSVQACEFYNADVQTLVRQ : 221 MAA_08668 : LLQVEAFERFAQGMRLWATTAALQILNHLRPTSVGIQEYQTIRRHTSGMNPCTSLADAANKGSVQACEFYDADVQTLVRQ : 221 MGU_11447 : LLQAETFERFAQGMRLWATTAALQILNHLRPTSVGIREYQTIRRHTSGMNPCASLADAANKGSVQACEFYDADVQTLVRQ : 221 MBR_10393 : LLQAEAFERFAQGMRLWATTAALQILNHLRPTSVGMREYQTIRRHTSGMNPCTSLADAANKGSVQACEFYDADVQTLVRQ : 221 MAJ_08936 : LLQAEAFERFAQGMRLWATTAALQILNHLRPTPVGIREYQTIRRHTSGLNPCTSLADAANKGSVQACEFYDADVQTLVRQ : 221 MAC_05714 : LLQPEAFQRFAQGMRLWATTAALQILNHQRPKSVGIREYEAIRRHTSGLNPCTSLADAANKGSVQAHEFYQPDVQKLVLQ : 221 MBR_09977 : LLQAEPFERFAQGMRLWATTAGLQILNHIRPKSVGIREYQTIRRHTSGMNPCTALSDAANNGSVKPYEFYQPDVQALVRR : 221 OCS_03958 : LLGSEPFERFAQGMRLWGTAAGLQILNHMQPKSIGLASYKTIRRHTSGMNPCMALADAANGGSIEPSEFHRPDVQTLCRY : 223 B_MSh1 : RLSAEHFARFAQGMRLWATTAGLQILNHIHAESVGIPQYETIRRHTSGMNPCLALADTANCGAVPPDTFHRPDVQELCRH : 221 CC1G_10050 : TAGPRCTERFIQTLALYFESVVTQKQARNNGTLPDLESYITIRRNNSGCKPCYALIEFCAGIDLPDEVINHPIIQSLEDA : 218

* 260 * 280 * 300 * 320 MAN_10655 : TNNIVCWANDIQSLRIEIHQPGQFRNMVTIYAQQGQSLQDAVETAATRVNKEIASFCELANAFTARD--ISHELHGFIDG : 299 MAA_08668 : TNNIVSWANDIQSLRREIHQPGQFRNMVTICAQQGQSIQDSVETTATRVNKEIAGFCGLADAVTARP--ISDELHGLIDG : 299 MGU_11447 : TNNIVCWANDIQSLRIEIHQPGQFRNIVTIYAQQGQSLQDAVETTATRVNKEIAGFCELADAVTARP--ISDELHGLIDG : 299 MBR_10393 : TNNIVCWANDIQSLRIEIHQPGQFRNMVTIYAQQGQSLQDAVETTATRVNKEIASFCELADAVTARP--ISDELHGLIDG : 299 MAJ_08936 : TNNIVCWANDIQSLRIEIHQPGQFRNMVTIYAQQGQSLQDAVETTATRVNKEIAGFCELADAVTARP--ISDELHGLIDG : 299 MAC_05714 : TNNIVCWANDIQSLGMEIQQPGQFRNMVTIYIQQGQSLSEAVSTTTARVNNELSDFCKLADIVTAPS--ISDELRVYVDG : 299 MBR_09977 : ANNIVCWANDIQSLGVEIRQPGQFRNMVVIYAEQGGSLQNSVETTAARVDAEISSFLELADAVTARA---NVTLRGLVDG : 298 OCS_03958 : ANNVVCWSNDIQSLSVEIRQPGQFINMVIDRAAEGHSLQESIAYTASRVKSEMGRFVELCKTMTPGA---SDGLCRMMDG : 300 B_MSh1 : ANHIVCWSNDIQSLGIEARQPGQFRNMVLIRRLEGHTLQEGVDYTAARVRDEIGEFVRCADALSQHA---DTRVRGLVDG : 298 CC1G_10050 : SNDLIAWSNDIFSFNREQSRHDSFNMVSIVMHQKGFALQEAVNFVGELCKKAMERFQADKRNLPSWGPEIDGEVAMYVDG : 298

* 340 * 360 * MAN_10655 : LKYWIRGYLDWVVHDTLRYADQFIESDADDRRFSAPDLSLLNNSCSSVTESTRSLV : 355 MAA_08668 : LKYWIRGYLDWVAHDTMRYADHFIESDADDRRFSAPDLSLLNKSCSSVTESTSSLV : 355 MGU_11447 : LKYWIRGYLDWVVHDTMRYADQFIESDADDRRFSAPDLSLLKKKLLVCD------: 348 MBR_10393 : LKYWIRGYLDWVVHDTLRYADQFIESDADDRRF------: 332 MAJ_08936 : LEYWIRGYLDWVVHDTMRYADQFIESDADDRRFSAPDLSLLKKNCSSVTESTSSLV : 355 MAC_05714 : LKYWIRGYMDWVVHDTERYADKFIASDADDRCVSTLNPSLLNRSSSSATE------: 349 MBR_09977 : LKYWIRGYLDWVEHDTLRYVDKFAAVDADDRFLSTPQVA----SRHSV------: 342 OCS_03958 : LRYWMRGYQD------: 310 B_MSh1 : CRYWIRGYLDWVARDTQRYAAAY-ADDADDRGLIAPSGSVARD------: 340 CC1G_10050 : LQNWIVGSLNWSIDGTERYFGKDGPGIKKHRKVKLFPKRPLKTPAVRVLA------: 348

Figure 4.2: Amino acid sequence alignment of 8 BTPSL TPSs, and representative TPSs from bacteria and fungi. 8 BTPSL TPSs: MAN_10655 from Metarhizium anisopliae, MAA_08668 from Metarhizium robertsii, MGU_11447 from Metarhizium guizhouense, MBR_10393 and MBR_09977 from Metarhizium brunneum, MAJ_08936 from Metarhizium majus, MAC_05714 from Metarhizium acridum, OCS_03958 from Ophiocordyceps sinensis; Representative TPSs from bacteria and fungi: B_MSh1 from bacteria, Burkholderia sp MSh1; CC1G_10050 from fungi, Coprinopsis cinerea okayama7#130. The positions of DDxx(x)D and NDxxSxxxE motifs are indicated by black bars.

Figure 4.3: Alignment of the orthologous genomic regions containing the BTPSL genes from seven fungal species. Each horizontal profile indicates one genomic region, with the height indicating the degree of sequence conservation. The white regions correspond to unaligned sequences. The regions that share the same color are locally collinear blocks. Below each profile are gene models; orthologous genes are colored and connected by shaded boxes in the same color. Red vertical lines delineate the ends of contigs. Three-letter codes correspond to species: MAA, Metarhizium robertsii; MAN, Metarhizium anisopliae; MBR, Metarhizium brunneum; MAJ, Metarhizium majus; MAC, Metarhizium acridum; MGU, Metarhizium guizhouense; OCS, Ophiocordyceps sinensis. The scaffold names with length information are displayed under species codes.

[Figure 4.4 image not reproduced. Gene map showing MAA_11039, MAA_08668 and MAA_08669, with coordinate labels 125056, 126203, 128227, 129294, 135646 and 134623 and a length label of 10591 bp.]

Figure 4.4: Confirmation of MAA_08668, a fungal TPS gene of putative horizontal gene transfer origin, integration into Metarhizium robertsii genome. The genomic region spanning MAA_08668, its neighboring genes (MAA_11039: Zn(2)-Cys(6) zinc finger domain protein; MAA_08669: hypothetical protein) and the intergenic region were amplified using PCR and confirmed by sequencing.

[Figure 4.5 image not reproduced. Tree with three labeled groups: BTPSLs, Bacterial TPSs and Fungal TPSs; tip labels correspond to the genes listed in Tables 4.1, 4.2 and 4.3. Scale bar: 0.4.]

Figure 4.5: Maximum Likelihood phylogenetic tree and exon-intron structure of both types of TPS genes from entomopathogenic fungi and those bacterial TPS genes similar to the BTPSL genes in fungi. The phylogenetic tree was constructed using PhyML with the GTR + I + G substitution model based on the codon alignment generated from the corresponding protein alignment. Bootstrap values greater than 70% are shown beside branches. The BTPSL genes (Table 4.1), typical fungal TPS genes (Table 4.2) and bacterial TPS genes (Table 4.3) are colored in cyan, magenta and orange, respectively. Exons and introns are represented by boxes with arrowheads and lines. Scale bars indicate the numbers of substitutions per site. Gene models are drawn to scale.

[Figure 4.6 image not reproduced. Two GC-MS chromatogram panels: (E,E)-FPP (peaks 1 to 9) and (Z,E)-FPP (peaks 10 to 12). x-axis: retention time (min), 13 to 16; y-axis: relative abundance (TIC × 100,000 ions).]

Figure 4.6: Biochemical activity of terpene synthase MAA_08668 of putative horizontal gene transfer origin in Metarhizium robertsii. The gene was expressed in Escherichia coli and recombinant protein was incubated with the potential substrates (E,E)-farnesyl diphosphate (FPP) and (Z,E)-FPP. Enzyme products were analyzed using GC-MS. The total ion current (TIC) chromatograms are shown. 1, β-elemene*; 2, (E)-β-caryophyllene*; 3, epi-bicyclosesquiphellandrene; 4, cadina-1,4-diene; 5, γ-cadinene; 6, δ-cadinene; 7, α-cadinene; 8, nerolidol*; 9, epi-α-cadinol; 10, α-longipinene; 11, α-ylangene; 12, β-bisabolene*. Compounds marked with * were identified using authentic standards. All other compounds were tentatively identified using commercial mass spectra libraries. Unlabeled peaks represent unidentified sesquiterpenes.

[Figure 4.7 image not reproduced. Tip labels, top to bottom: Metarhizium robertsii (MAA), Metarhizium anisopliae (MAN), Metarhizium brunneum (MBR), Metarhizium guizhouense (MGU), Metarhizium majus (MAJ), Metarhizium acridum (MAC), Metarhizium album (MAM), Ophiocordyceps sinensis (OCS), Cordyceps militaris (CCM), Beauveria bassiana (BBA), Fusarium graminearum.]

Figure 4.7: A species tree, redrawn from previous studies, showing the evolutionary relationships of several entomopathogenic fungi. Fusarium graminearum is included as an outgroup.

Chapter 5

Identification, Characterization, and Evolution of Fungal Terpene Synthases

5.1 Abstract

Terpenes, an important class of fungal secondary metabolites, are the largest and most structurally diverse class of natural compounds. Fungi are known for producing a wide range of bioactive terpene products that play important roles in fungal survival and development. Many of them are also used by humans as medicines, such as antibiotics and anticancer drugs. Terpene synthases (TPSs) are the key enzymes that produce these terpenes. Recent advances in sequencing technologies have made a large number of fungal genomes available, which provides a great opportunity to systematically investigate the distribution of fungal terpene synthase genes. In this chapter, four types of terpene synthase genes were identified from 519 fungal genomes. All of them are from higher fungi (Ascomycota and Basidiomycota). The two groups of fungi have evolved different sets of terpene synthases. The α and TRI5 types of terpene synthases are mainly found in species of Basidiomycota, while the bifunctional CPS/KS and the chimeric diterpene synthases are present in Ascomycota. The results also suggest possible horizontal gene transfer (HGT) between Ascomycota and Basidiomycota.

5.2 Introduction

During evolution, fungi have successfully adapted to an extraordinarily wide range of environments, from soil and the human body to deep-sea sediments and even underwater volcanoes. Survival in such a wide diversity of habitats requires various strategies for obtaining nutrients. Alongside strategies such as producing almost indestructible spores and, in predatory fungi, developing trapping structures (Yang et al., 2007), the ability to produce a vast diversity of secondary metabolites in response to complex, extreme and changing environments also ensures their survival and reproduction. Fungi use these natural products as chemical signals for communication (Brakhage and Schroeckh, 2011; Netzker et al., 2015) and to protect themselves from predators. Many of these biologically active compounds are also used by humans as beneficial drugs, including antibiotics (Demain and Fang, 2000), plant growth hormones (gibberellins) (Bomke and Tudzynski, 2009) and anticancer drugs (Balde et al., 2010; Evidente et al., 2014).

Terpenoids are one class of secondary metabolites widely produced by fungi (Keller et al., 2005). They are derived from five-carbon isoprene units and are the most diverse class of natural compounds on earth. This huge diversity is partially achieved by the terpene synthases, key enzymes involved in the synthesis of terpenoids. Based on different substrate activation mechanisms, two classes of terpene synthases are currently recognized. Class I terpene synthases, which have the α-domain, catalyze a cyclization that depends on ionization of the substrate diphosphate group and contain two conserved motifs: DDxxD and NSE. Class II terpene synthases contain a functional β-domain and catalyze a protonation-dependent cyclization. The catalytic site of this class is a DxDD motif. Based on the types of terpenoids they produce, terpene synthases can be classified as isoprene synthases, monoterpene synthases, sesquiterpene synthases and diterpene synthases (Chen et al., 2011). In fungi, monoterpene synthases are rare, and the first one, 1,8-cineole synthase, was recently reported from Hypoxylon sp. (Shaw et al., 2015). Sesquiterpene synthases, such as the trichodiene synthases, convert farnesyl pyrophosphate into different compounds and have the Class I terpene synthase signatures. Diterpene synthases, which use geranylgeranyl pyrophosphate as substrate, have both an α-domain and a β-domain. In contrast to the majority of plant diterpene synthases, their fungal counterparts are believed to be bifunctional. The first fungal diterpene synthase was identified from the fungus Phaeosphaeria sp. L487 (Kawaide et al., 1997) and produces ent-kaurene, a precursor of gibberellin. A special type of diterpene synthase, the chimeric fungal terpene synthases that can catalyze both chain elongation and cyclization reactions, also exists in fungi (Matsuda et al., 2015; Qin et al., 2016; Toyomasu et al., 2009; Ye et al., 2015). Sequences of this type have both terpene synthase domains and prenyltransferase domains.
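The sequence motifs named above can be located with simple pattern matching. The following sketch is illustrative only and is not the identification pipeline used in this chapter. The patterns are written literally from the motif forms given in the text and in the Figure 4.2 caption (DDxx(x)D, NDxxSxxxE, DxDD); published consensus forms of these motifs are somewhat looser. The two test fragments are copied from the MAA_08668 row of the Figure 4.2 alignment.

```python
import re

# Motif patterns written from the forms given in the text; "x" = any residue.
MOTIFS = {
    "DDxx(x)D": re.compile(r"DD.{2,3}D"),   # class I aspartate-rich motif
    "NDxxSxxxE": re.compile(r"ND..S...E"),  # class I NSE motif
    "DxDD": re.compile(r"D.DD"),            # class II motif
}


def find_motifs(seq):
    """Return {motif name: [(0-based start, matched text), ...]}."""
    return {
        name: [(m.start(), m.group()) for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


# Fragments of MAA_08668 copied from the Figure 4.2 alignment.
fragment_a = "VWFFLTDDLFVDRVEV"
fragment_b = "WANDIQSLRREIHQ"

print(find_motifs(fragment_a)["DDxx(x)D"])
print(find_motifs(fragment_b)["NDxxSxxxE"])
```

A scan like this only flags candidate motif positions; whether a hit is the functional motif still depends on its position in the alignment and the domain context.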

Fungal terpene synthases are often found within biosynthetic gene clusters. For example, the Tri5 gene cluster found in Fusarium species has 12 genes, including the terpene synthase gene, cytochrome P450 monooxygenase genes and regulatory genes (Kimura et al., 2007; Proctor et al., 2009). The GA biosynthetic cluster in Gibberella fujikuroi contains a GGPP synthase gene, a bifunctional diterpene synthase gene and four cytochrome P450 monooxygenase genes (Tudzynski and Hölter, 1998). These gene clusters are key sources of novel high-value compounds (Cacho et al., 2014; Keller, 2015; Weber et al., 2015).

Although fungi are a rich source of terpenoids, our knowledge of fungal terpene synthases is very limited compared with that of plant terpene synthases. Identification and characterization of terpene synthase genes and biosynthetic gene clusters will therefore pave the way to a deeper understanding of fungal terpene synthases. The JGI 1000 Fungal Genomes project provides a valuable data set for studying fungal terpene synthases. Currently, more than 500 fungal genome sequences are publicly available through MycoCosm (http://genome.jgi.doe.gov/programs/fungi). In this chapter, four types of fungal terpene synthases are identified from 519 fungal genomes. Their distributions, sequence features and evolutionary relationships are investigated.

5.3 Results

5.3.1 Distribution of TPSs in Fungal Genomes

By mining 519 fungal genomes from the Joint Genome Institute (JGI), we identified 1753 class I terpene synthases (Figure 5.1 and Appendix A.2), 545 trichodiene synthases, 230 bifunctional ent-copalyl diphosphate synthase/ent-kaurene synthase (CPS/KS) enzymes (all sequences shorter than 300 amino acids were eliminated) and 114 chimeric diterpene synthases (sequences with both Terpene_synth_C and polyprenyl_synt domains). All four types of terpene synthases are found exclusively in Ascomycota and Basidiomycota (Figure 5.1). No terpene synthase of any type was found in the basal lineages of fungi, such as Glomeromycota, Mucoromycotina and Zoopagomycotina. The terpene synthases identified in this study cluster into four groups in the phylogenetic tree (Figure 5.2) according to type, suggesting significant sequence divergence among the four types of terpene synthases.
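Two of the selection rules stated above are simple enough to sketch in code: the domain co-occurrence rule for chimeric diterpene synthases and the 300 amino acid length cutoff. The sketch below is a hypothetical illustration, not the pipeline used in this study. The function names and the example records are invented for illustration; only the two Pfam domain names and the cutoff come from the text.

```python
# Domain names taken from the text.
CHIMERIC_DOMAINS = {"Terpene_synth_C", "polyprenyl_synt"}


def is_chimeric(domains):
    """True if a sequence carries both domains named in the text."""
    return CHIMERIC_DOMAINS <= set(domains)


def passes_length_filter(length_aa, min_length=300):
    """Keep sequences of at least min_length amino acids."""
    return length_aa >= min_length


# Hypothetical records: (sequence id, length in aa, Pfam domain hits).
records = [
    ("seq1", 720, ["polyprenyl_synt", "Terpene_synth_C"]),
    ("seq2", 340, ["Terpene_synth_C"]),
    ("seq3", 250, ["Terpene_synth_C"]),
]

chimeric_ids = [rid for rid, _, doms in records if is_chimeric(doms)]
long_enough_ids = [rid for rid, length, _ in records if passes_length_filter(length)]
print("chimeric:", chimeric_ids)
print("length >= 300 aa:", long_enough_ids)
```

In practice the domain hits would come from a profile search against Pfam; here they are supplied directly so the two rules can be read in isolation.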

About 70% of the class I terpene synthases (1220 out of 1753) are present in Agaricomycotina, one of the three major subphyla of the fungal division Basidiomycota and the one that includes the mushroom-forming fungi, but they are completely absent from Ustilaginomycotina, a group of non-mushroom-forming fungi in the Basidiomycota, most of which are plant parasites. In the third subphylum, Pucciniomycotina, only 7 terpene synthases are found, in two species: Atractiellales sp. 95 and Cronartium quercuum f. sp. fusiforme G11. The remaining 30% are found in the largest subphylum of Ascomycota, Pezizomycotina, which includes almost all filamentous fungi. Terpene synthases appear to be present in all eight studied classes of this subphylum and to be evenly distributed across species. However, no terpene synthases are found in 40 species of Saccharomycotina and 7 species of Taphrinomycotina. Species of Saccharomycotina and Taphrinomycotina are yeasts and yeast-like fungi in the Ascomycota that do not form fruiting bodies during the sexual phase of their life cycle, while species of Pezizomycotina are characterized by their fruiting bodies. The presence or absence of terpene synthases in species of Ascomycota thus appears to correlate with morphological characters, such as whether the species forms fruiting bodies. It should also be pointed out that the class I terpene synthases found in Ascomycota and Basidiomycota cluster into two major groups according to the fungal phylogeny (Figure 5.3), although a small number of genes are scattered across the two groups.

Similar to the α type of terpene synthases, more than half of the trichodiene synthases (359 out of 545) are identified in species of Agaricomycotina. This type is also completely absent in Ustilaginomycotina, although 16 genomes are available. Two of the 22 Pucciniomycotina species examined (Tritirachium sp. CBS 265.96 and Septobasidium sp. PNB30-8B) have trichodiene synthase genes, with 18 and 9 genes, respectively. The remaining genes are present in the following five classes of Ascomycota: Eurotiomycetes, Sordariomycetes, Dothideomycetes, Pezizomycetes and Leotiomycetes. Although no gene was identified in the other three classes, it is not yet known whether this type is truly absent from them, given the limited number of available genomes. Compared to the α type, the phylogenetic relationship of trichodiene synthases is less clear (Figure 5.4). To a certain extent, the phylogenetic analysis indicates a lineage-specific distribution of this type of gene (Ascomycota-specific and Basidiomycota-specific). However, more genes than for the α type show higher sequence similarity to genes outside their own lineage. For example, more than half of the trichodiene synthases identified in species of Sordariomycetes cluster together with those in Agaricomycotina. In addition, all 18 trichodiene synthase genes found in Tritirachium sp. CBS 265.96 reside on the branch of Ascomycota species. These incongruities indicate possible horizontal transfer events among species in these two phyla of Dikarya.

The bifunctional CPS/KSs are mainly found in three classes of the Ascomycota division: Eurotiomycetes, Dothideomycetes and Sordariomycetes. Although Agaricomycotina has the largest number of genomes available (136 out of 519), only 11 CPS/KSs are found, in seven species. All seven species belong to the class Agaricomycetes but reside in three different orders: Agaricales, Boletales and Corticiales (Figure 5.5). These 11 CPS/KSs form a single group and cluster with members from the Ascomycota division with strong support in the phylogenetic tree (Figure 5.6). The multiple sequence alignment also shows high sequence similarity among these sequences and conservation of the motifs (Figure 5.7). This points to possible horizontal gene transfer from species of Ascomycota to species of Basidiomycota. Transfer in the opposite direction is also possible. If that is the case, the CPS/KS gene has been lost in the vast majority of Basidiomycota species during evolution.

The αα1 type of diterpene synthases contains the smallest number of genes of the four types. Its distribution is very similar to that of the CPS/KSs. Out of the 136 available Agaricomycotina genomes, only two sequences are found, both in a single species, Gymnopus luxurians, which also has the largest number of CPS/KSs among the seven Basidiomycota species containing CPS/KSs. These two sequences are very similar to three members from the class Dothideomycetes (Figure 5.8 and Figure 5.9).

The distribution of these four types of terpene synthases among species was also investigated within each fungal lineage (Figure 5.10). Species containing TRI5-type terpene synthases are likely to have the α type as well (100% in Agaricomycotina, 83% in Dothideomycetes, 92% in Sordariomycetes, 84% in Eurotiomycetes, 50% in Pezizomycetes and 100% in Leotiomycetes). The chances of having both the CPS/KS and TRI5 types are 1%, 17%, 21% and 53% in Agaricomycotina, Dothideomycetes, Sordariomycetes and Eurotiomycetes, respectively. The chances of having both the CPS/KS and αα1 types are 40%, 28% and 53% in Dothideomycetes, Sordariomycetes and Eurotiomycetes, respectively. The class Eurotiomycetes has the largest number of species (19) that have all four types of terpene synthases; the corresponding numbers are 1, 3 and 8 in Agaricomycotina, Dothideomycetes and Sordariomycetes, respectively.
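Percentages of this kind can be derived from a per-species table of gene counts. The following is a minimal sketch in Python, assuming one possible reading of the first statistic (the share of TRI5-containing species that also carry the α type). The function name `cooccurrence`, the species labels and all counts are hypothetical and are not taken from the analysis described here.

```python
def cooccurrence(counts, given, other):
    """Percentage of species having type `given` that also have type `other`.

    `counts` maps species name -> {type name: gene count}.
    Returns None when no species has the `given` type.
    """
    with_given = [c for c in counts.values() if c.get(given, 0) > 0]
    if not with_given:
        return None
    both = sum(1 for c in with_given if c.get(other, 0) > 0)
    return 100.0 * both / len(with_given)


# Hypothetical toy data for one lineage (not real counts).
toy_counts = {
    "species_A": {"alpha": 12, "TRI5": 3},
    "species_B": {"alpha": 4, "TRI5": 0, "CPS/KS": 1},
    "species_C": {"TRI5": 2},
    "species_D": {"alpha": 7, "TRI5": 1, "CPS/KS": 2, "alpha-alpha1": 1},
}

# Share of TRI5-containing species that also have the alpha type.
print(cooccurrence(toy_counts, "TRI5", "alpha"))
```

Running the same function once per lineage, with the species of that lineage as input, would give one percentage per lineage.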

5.3.2 Intron-Exon Structure

The mRNA sequence lengths and exon numbers of the four types of terpene synthases were investigated (Figure 5.11, Table 5.1 and Table 5.2). A pairwise Wilcoxon rank sum test with Bonferroni correction was used to test the difference in mRNA length between any two types of terpene synthases and showed that all four groups are significantly different from each other (p < 2e-16). The median mRNA lengths of the four types are 1041 (α), 987 (TRI5), 2808 (CPS/KS) and 2142 (αα1), respectively.

The number of exons shows no unique pattern in any of the four types and ranges from a single exon to more than 10. The same test was used to test the difference in exon number between any two types and revealed that all pairs, except TRI5 versus CPS/KS (p = 1) and α versus αα1 (p = 0.055), are significantly different at the 0.01 significance level or better. The median exon numbers of the four types are 5 (α), 4 (TRI5), 4 (CPS/KS) and 5.5 (αα1), respectively. As shown in Figure 5.11A, the number of exons is not correlated with the length of the gene (Pearson correlation coefficient r = 0.02). However, genes of the CPS/KS type tend to have long exons, as indicated by their having the longest average gene length together with a low exon number, while genes of the α type tend to have short exons.
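A reported value of exactly p = 1 is what a Bonferroni adjustment produces when a raw p-value multiplied by the number of comparisons exceeds 1 and is capped. The sketch below illustrates only the adjustment step in Python. The raw p-values are hypothetical placeholders, and the rank sum test itself is not implemented here.

```python
from itertools import combinations


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each raw p-value by the number of
    comparisons and cap the result at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


types = ["alpha", "TRI5", "CPS/KS", "alpha-alpha1"]
pairs = list(combinations(types, 2))  # four types give 6 pairwise comparisons

# Hypothetical raw p-values, for illustration only.
raw = {
    ("alpha", "TRI5"): 1e-8,
    ("alpha", "CPS/KS"): 2e-5,
    ("alpha", "alpha-alpha1"): 0.009,
    ("TRI5", "CPS/KS"): 0.40,
    ("TRI5", "alpha-alpha1"): 1e-4,
    ("CPS/KS", "alpha-alpha1"): 3e-6,
}

adjusted = bonferroni(raw)
for pair, p in adjusted.items():
    print(pair, p)
```

With six comparisons, any raw p-value above about 0.167 is reported as 1 after adjustment.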

5.3.3 Motif Analysis

Potential functional and structural motifs that are important for substrate binding and catalysis were checked for each of the four types of terpene synthases. The α type contains two aspartate-rich motifs, D(D,E)xxD and NDxxS(Y,F)xxE (Figure 5.12), which are signatures of class I terpene synthases. The two motifs are believed to coordinate a trinuclear metal cluster that triggers the Mg2+-dependent ionization of the isoprenoid diphosphate group. Terpene synthases of this type are putative sesquiterpene synthases.

The trichodiene synthases here lack the typical aspartate-rich motif DDxxD, although a DD motif is evident. A small number of sequences may possess the full motif, but most of them do not. The additional metal cofactor binding motif, NDxxSFYKE, is observed in this type.

The CPS/KS type sequences are diterpene synthases and have both α and β domains. Four conserved motifs, D(D,E)xxE, DxDD, DExxE and NDxGSxxRDxxE, are observed. The DxDD motif is a signature of class II terpene synthases and is responsible for catalyzing the protonation-dependent cyclization reaction (Prisic et al., 2007). The DExxE motif is similar to the DDxxD motif but is more commonly observed in fungi (Fischer et al., 2015). In contrast to most plant CPS/KS sequences, which have only one active domain (CPS-type or KS-type), the fungal CPS/KSs are believed to be bifunctional enzymes that synthesize ent-kaurene, a precursor of gibberellins (GAs), from geranylgeranyl diphosphate (GGDP) using a single enzyme (Schmidt-Dannert, 2015).

As expected, the chimera diterpene synthases (αα1 type) contain two DDxxD motifs, one from the terpene synthase domain and the other from the prenyltransferase domain. Both DDxxD motifs have KS activities and are expected to be involved in binding the substrate diphosphate group via Mg2+. The third motif, NDxxxxxxE, is also observed in the terpene synthase domain. All of these chimera diterpene synthases have their terpene synthase domain at the N terminus and the prenyltransferase domain at the C terminus, except for AlAl1_EUR_RTHQ_7399, in which the two domains largely overlap. This conserved domain architecture indicates a common origin and conserved functions for these unusual multifunctional enzymes.
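Motifs written in this notation can be located in a protein sequence with regular expressions. The sketch below is only an illustration in Python. It assumes that "x" stands for any residue and that "(D,E)" or "(Y,F)" is a choice between residues. The dictionary `MOTIFS`, the function `find_motifs` and the toy peptide are hypothetical and are not part of the analysis described above.

```python
import re

# Motif notation -> regular expression. "x" is read as any residue and a
# parenthesised pair as a choice; these translations are assumptions.
MOTIFS = {
    "D(D,E)xxD": re.compile(r"D[DE]..D"),
    "DxDD": re.compile(r"D.DD"),
    "DExxE": re.compile(r"DE..E"),
    "NDxxS(Y,F)xxE": re.compile(r"ND..S[YF]..E"),
}


def find_motifs(sequence):
    """Return {motif name: [0-based start positions]} for motifs present."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # Wrapping the pattern in a lookahead reports overlapping matches too.
        lookahead = f"(?={pattern.pattern})"
        starts = [m.start() for m in re.finditer(lookahead, sequence)]
        if starts:
            hits[name] = starts
    return hits


# Made-up peptide, not a real terpene synthase sequence.
toy = "MKLDDAVDGGSNDLMSFKKEAADIDDTT"
print(find_motifs(toy))
```

Applied to an aligned or unaligned set of sequences, the returned positions show which sequences carry each motif and roughly where in the protein it lies.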

5.4 Discussion

Like plants, fungi produce a wide range of diverse terpenoids that help them adapt and survive in different ecological niches. Current research on the fungal terpenome is mainly limited to the phylum Ascomycota (Quin et al., 2014). The results obtained by mining such a wide range of fungal species, in particular species in the phylum Basidiomycota, greatly expand our knowledge and understanding of the fungal terpenome. Our results show that the phyla Ascomycota and Basidiomycota have evolved different sets of terpene synthases and that both phyla use the α and TRI5 types as their predominant terpene synthases. In species of Basidiomycota, the α and TRI5 types of terpene synthases are commonly encoded in the genome, while the two types of diterpene synthases are almost completely lacking. The α and TRI5 types are putative sesquiterpene synthases and are expected to produce various sesquiterpenoids from farnesyl diphosphate (FPP). In contrast, species of Ascomycota encode almost all the diterpene synthases identified in this study, although they also encode about 30% of the α and TRI5 type terpene synthases. It seems that species of Ascomycota produce a wider range of terpene natural products than species of Basidiomycota.

The seeming lack of diterpene synthases in Basidiomycota does not mean that they do not exist. This phylum contains over 30000 estimated species (Morrow and Fraser, 2009), of which only 174 were investigated in this study, and the majority of those were not selected for the discovery of terpene synthases (Schmidt-Dannert, 2015). It is therefore possible that more species in this phylum contain diterpene synthases. It is also possible that our current search method is not powerful enough to discover these genes and is limited by our current knowledge of fungal diterpene synthases. It is worth mentioning that a HMMER search using known terpene synthase domains from Pfam does not pick up any fungal CPS/KSs because, unlike their plant counterparts, they lack these domains. For this reason, we used four known CPS/KSs as BLAST queries (one from a species of Ascomycota, the other three from plant species). About 75% of the CPS/KSs found in this study were recovered by all four query sequences, while the remaining 25% were found exclusively by the fungal sequence (Figure 5.13). In view of the low sequence similarity shared by terpene synthases and the fact that no CPS/KSs from Basidiomycota were used as queries, Basidiomycota species may well have their own CPS/KSs that show little sequence similarity to those in species of Ascomycota. Another possibility is that Basidiomycota acquired CPS/KSs from Ascomycota via horizontal gene transfer, as also proposed in a recent study (Fischer et al., 2015).

Currently, five αα1 type terpene synthases have been characterized. Three of them (Chiba et al., 2013; Qin et al., 2016; Ye et al., 2015) are from Eurotiomycetes fungi and the other two are from a species of Sordariomycetes (Toyomasu et al., 2008, 2007). This is consistent with our result that the vast majority of αα1 type sequences are found in Sordariomycetes, Eurotiomycetes and Dothideomycetes. The close relationship between α type and αα1 type sequences in the phylogenetic tree suggests that the αα1 type is the result of a gene fusion. Terpene synthases might also fuse with other genes, such as cytochrome P450 monooxygenase genes; sequences with this architecture were observed in this study. More types of terpene synthases therefore probably remain to be found.

We did not find any terpene synthases in species of early diverging fungal lineages. Some short sequences were found, but at the current stage they were removed from further analysis because of their low quality. We therefore cannot rule out the possibility that species belonging to this group contain terpene synthases.

5.5 Methods

5.5.1 Genome Mining of Fungal Terpene Synthases

519 fungal genomes were obtained from JGI (http://genome.jgi.doe.gov/programs/fungi/index.jsf, date last accessed October 2, 2015). Putative terpene synthases were identified by searching all predicted amino acid sequences with HMMER against the Pfam database (http://pfam.xfam.org, version 28.0). Sequences with the TRI5 domain (PF06330) as the best matching domain were classified as trichodiene synthases, while sequences with the Terpene_synth_C (PF03936) domain but without the polyprenyl_synt (PF00348) domain were classified as class I terpene synthases. Sequences with both the Terpene_synth_C and polyprenyl_synt domains were named chimera diterpene synthases.
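The classification rules above can be written as a small decision function. The following Python sketch is an illustration only. It assumes the Pfam hits for each protein have already been parsed into a set of domain names plus the name of the best-scoring domain. The function name `classify` and the order in which the rules are checked are assumptions, since the text does not state how overlapping cases were resolved.

```python
def classify(domains, best_domain):
    """Assign a terpene synthase type from Pfam domain hits.

    `domains` is the set of Pfam domain names found on one protein and
    `best_domain` is the name of its best-scoring domain.
    Returns a type label, or None if no rule applies.
    """
    # Rule 1: TRI5 (PF06330) as the best matching domain.
    if best_domain == "TRI5":
        return "trichodiene synthase"
    # Rules 2 and 3: Terpene_synth_C (PF03936) with or without
    # polyprenyl_synt (PF00348).
    if "Terpene_synth_C" in domains:
        if "polyprenyl_synt" in domains:
            return "chimera diterpene synthase"
        return "class I terpene synthase"
    return None


print(classify({"Terpene_synth_C", "polyprenyl_synt"}, "polyprenyl_synt"))
```

Proteins that return None here carry none of the three domain signatures; the CPS/KS type is recovered separately by the similarity search described next.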

In order to recover sequences with both ent-copalyl diphosphate synthase and ent-kaurene synthase (CPS/KS) activities, selected known CPS/KS sequences (SmCPSKS from Sphaceloma manihoticola, GI: 197724597 (Bömke et al., 2008); PpCPSKS from Physcomitrella patens, GI: 146325986 (Hayashi et al., 2006); AtGA1 and AtGA2 from Arabidopsis thaliana, GI: 15235504 and 15235504 (Mayer et al., 1999), respectively) were used as blastp queries to identify CPS/KS homologs among the available JGI fungal proteins (e-value cutoff of 10^-5).
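An e-value cutoff of this kind can be applied to BLAST results after the search. The Python sketch below assumes the hits were written in the standard 12-column tabular format (`-outfmt 6`), which is an assumption, since the output format is not stated in the text. The function name `filter_hits`, the subject identifiers and all numbers in the toy input are hypothetical.

```python
import csv
import io


def filter_hits(handle, max_evalue=1e-5):
    """Collect subject IDs per query from BLAST tabular output
    (-outfmt 6: 12 tab-separated columns, e-value in column 11)."""
    hits = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= max_evalue:
            hits.setdefault(query, set()).add(subject)
    return hits


# Made-up rows, for illustration only.
toy = io.StringIO(
    "SmCPSKS\tprot_1\t45.2\t900\t400\t20\t1\t900\t5\t910\t1e-120\t420\n"
    "SmCPSKS\tprot_2\t25.0\t120\t80\t5\t10\t130\t40\t160\t0.003\t35.1\n"
    "PpCPSKS\tprot_1\t30.1\t850\t500\t30\t1\t850\t10\t880\t2e-40\t160\n"
)

hits = filter_hits(toy)
# Subjects recovered by every query sequence.
shared = set.intersection(*hits.values())
print(hits, shared)
```

Intersecting the per-query sets, as in the last step, is one way to separate candidates recovered by every query from those recovered by a single query only.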

5.5.2 Multiple Sequence Alignments and Phylogenetic Inference

All multiple sequence alignments were made using MAFFT in a highly accurate setting (L-INS-i) with 1000 iterations of improvement. Maximum-likelihood phylogenetic trees were inferred with FastTree (version 2.1.7) using a more accurate setting (-spr 4 -mlacc 2 -slownni). All trees were rendered using FigTree (version 1.4.2).
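For readers who want to reproduce a comparable run, the settings above map onto command lines roughly as follows. This Python sketch only builds and prints the commands. The file names are hypothetical, and any further options that may have been used (for example a substitution model for FastTree) are not given in the text and are therefore left out.

```python
import shlex


def mafft_command(fasta_path):
    """argv for MAFFT L-INS-i (--localpair) with up to 1000 refinement cycles."""
    return ["mafft", "--localpair", "--maxiterate", "1000", fasta_path]


def fasttree_command(alignment_path):
    """argv for FastTree with the search options quoted in the text."""
    return ["FastTree", "-spr", "4", "-mlacc", "2", "-slownni", alignment_path]


# Both programs write their result to standard output, so the shell
# redirections shown here are part of the illustration.
print(shlex.join(mafft_command("tps.fasta")), "> tps.aln")
print(shlex.join(fasttree_command("tps.aln")), "> tps.tree")
```

The argument lists can be passed to `subprocess.run` with `stdout` redirected to a file if the programs are installed.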

5.6 Bibliography

Balde, E. S., Andolfi, A., Bruyere, C., Cimmino, A., Lamoral-Theys, D., Vurro, M., Damme, M. V., Altomare, C., Mathieu, V., Kiss, R., and Evidente, A. (2010). Investigations of fungal secondary metabolites with potential anticancer activity. Journal of Natural Products, 73(5):969–71.

Bömke, C. and Tudzynski, B. (2009). Diversity, regulation, and evolution of the gibberellin biosynthetic pathway in fungi compared to plants and bacteria. Phytochemistry, 70(15-16):1876–93.

Brakhage, A. A. and Schroeckh, V. (2011). Fungal secondary metabolites - strategies to activate silent gene clusters. Fungal Genetics and Biology, 48(1):15–22.

Bömke, C., Rojas, M. C., Gong, F., Hedden, P., and Tudzynski, B. (2008). Isolation and characterization of the gibberellin biosynthetic gene cluster in Sphaceloma manihoticola. Applied and Environmental Microbiology, 74(17):5325–5339.

Cacho, R. A., Tang, Y., and Chooi, Y. H. (2014). Next-generation sequencing approach for connecting secondary metabolites to biosynthetic gene clusters in fungi. Frontiers in Microbiology, 5:774.

Chen, F., Tholl, D., Bohlmann, J., and Pichersky, E. (2011). The family of terpene synthases in plants: a mid-size family of genes for specialized metabolism that is highly diversified throughout the kingdom. Plant Journal, 66(1):212–29.

Chiba, R., Minami, A., Gomi, K., and Oikawa, H. (2013). Identification of ophiobolin F synthase by a genome mining approach: A sesterterpene synthase from Aspergillus clavatus. Organic Letters, 15(3):594–597.

Demain, A. L. and Fang, A. (2000). The natural functions of secondary metabolites. Advances in Biochemical Engineering/Biotechnology, 69:1–39.

Evidente, A., Kornienko, A., Cimmino, A., Andolfi, A., Lefranc, F., Mathieu, V., and Kiss, R. (2014). Fungal metabolites with anticancer activity. Natural Product Reports, 31(5):617–27.

Fischer, M. J., Rustenholz, C., Leh-Louis, V., and Perriere, G. (2015). Molecular and functional evolution of the fungal diterpene synthase genes. BMC Microbiology, 15:221.

Hayashi, K.-i., Kawaide, H., Notomi, M., Sakigi, Y., Matsuo, A., and Nozaki, H. (2006). Identification and functional analysis of bifunctional ent-kaurene synthase from the moss Physcomitrella patens. FEBS Letters, 580(26):6175–6181.

Hibbett, D. S., Binder, M., Bischoff, J. F., Blackwell, M., Cannon, P. F., Eriksson, O. E., Huhndorf, S., James, T., Kirk, P. M., Lucking, R., Thorsten Lumbsch, H., Lutzoni, F., Matheny, P. B., McLaughlin, D. J., Powell, M. J., Redhead, S., Schoch, C. L., Spatafora, J. W., Stalpers, J. A., Vilgalys, R., Aime, M. C., Aptroot, A., Bauer, R., Begerow, D., Benny, G. L., Castlebury, L. A., Crous, P. W., Dai, Y. C., Gams, W., Geiser, D. M., Griffith, G. W., Gueidan, C., Hawksworth, D. L., Hestmark, G., Hosaka, K., Humber, R. A., Hyde, K. D., Ironside, J. E., Koljalg, U., Kurtzman, C. P., Larsson, K. H., Lichtwardt, R., Longcore, J., Miadlikowska, J., Miller, A., Moncalvo, J. M., Mozley-Standridge, S., Oberwinkler, F., Parmasto, E., Reeb, V., Rogers, J. D., Roux, C., Ryvarden, L., Sampaio, J. P., Schussler, A., Sugiyama, J., Thorn, R. G., Tibell, L., Untereiner, W. A., Walker, C., Wang, Z., Weir, A., Weiss, M., White, M. M., Winka, K., Yao, Y. J., and Zhang, N. (2007). A higher-level phylogenetic classification of the fungi. Mycological Research, 111(Pt 5):509–47.

Kawaide, H., Imai, R., Sassa, T., and Kamiya, Y. (1997). ent-kaurene synthase from the fungus Phaeosphaeria sp. L487: cDNA isolation, characterization, and bacterial expression of a bifunctional diterpene cyclase in fungal gibberellin biosynthesis. Journal of Biological Chemistry, 272(35):21706–21712.

Keller, N. P. (2015). Translating biosynthetic gene clusters into fungal armor and weaponry. Nature Chemical Biology, 11(9):671–7.

Keller, N. P., Turner, G., and Bennett, J. W. (2005). Fungal secondary metabolism - from biochemistry to genomics. Nature Reviews. Microbiology, 3(12):937–947.

Kimura, M., Tokai, T., Takahashi-Ando, N., Ohsato, S., and Fujimura, M. (2007). Molecular and genetic studies of Fusarium trichothecene biosynthesis: Pathways, genes, and evolution. Bioscience, Biotechnology and Biochemistry, 71(9):2105–2123.

Matsuda, Y., Mitsuhashi, T., Quan, Z., and Abe, I. (2015). Molecular basis for stellatic acid biosynthesis: A genome mining approach for discovery of sesterterpene synthases. Organic Letters, 17(18):4644–7.

Mayer, K., Schuller, C., Wambutt, R., Murphy, G., Volckaert, G., Pohl, T., Dusterhoft, A., Stiekema, W., Entian, K. D., Terryn, N., Harris, B., Ansorge, W., Brandt, P., Grivell, L., Rieger, M., Weichselgartner, M., de Simone, V., Obermaier, B., Mache, R., Muller, M., Kreis, M., Delseny, M., Puigdomenech, P., Watson, M., Schmidtheini, T., Reichert, B., Portatelle, D., Perez-Alonso, M., Boutry, M., Bancroft, I., Vos, P., Hoheisel, J., Zimmermann, W., Wedler, H., Ridley, P., Langham, S. A., McCullagh, B., Bilham, L., Robben, J., Van der Schueren, J., Grymonprez, B., Chuang, Y. J., Vandenbussche, F., Braeken, M., Weltjens, I., Voet, M., Bastiaens, I., Aert, R., Defoor, E., Weitzenegger, T., Bothe, G., Ramsperger, U., Hilbert, H., Braun, M., Holzer, E., Brandt, A., Peters, S., van Staveren, M., Dirske, W., Mooijman, P., Klein Lankhorst, R., Rose, M., Hauf, J., Kotter, P., Berneiser, S., Hempel, S., Feldpausch, M., Lamberth, S., Van den Daele, H., De Keyser, A., Buysshaert, C., Gielen, J., Villarroel, R., De Clercq, R., Van Montagu, M., Rogers, J., Cronin, A., Quail, M., Bray-Allen, S., Clark, L., Doggett, J., Hall, S., Kay, M., Lennard, N., McLay, K., Mayes, R., Pettett, A., Rajandream, M. A., Lyne, M., Benes, V., Rechmann, S., Borkova, D., Blocker, H., Scharfe, M., Grimm, M., Lohnert, T. H., Dose, S., de Haan, M., Maarse, A., Schafer, M., et al. (1999). Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana. Nature, 402(6763):769–77.

Morrow, C. A. and Fraser, J. A. (2009). Sexual reproduction and dimorphism in the pathogenic basidiomycetes. FEMS Yeast Research, 9(2):161–77.

Netzker, T., Fischer, J., Weber, J., Mattern, D. J., Konig, C. C., Valiante, V., Schroeckh, V., and Brakhage, A. A. (2015). Microbial communication leading to the activation of silent fungal secondary metabolite gene clusters. Frontiers in Microbiology, 6:299.

Prisic, S., Xu, J., Coates, R. M., and Peters, R. J. (2007). Probing the role of the DXDD motif in class II diterpene cyclases. Chembiochem, 8(8):869–74.

Proctor, R. H., McCormick, S. P., Alexander, N. J., and Desjardins, A. E. (2009). Evidence that a secondary metabolic biosynthetic gene cluster has grown by gene relocation during evolution of the filamentous fungus Fusarium. Molecular Microbiology, 74(5):1128–1142.

Qin, B., Matsuda, Y., Mori, T., Okada, M., Quan, Z., Mitsuhashi, T., Wakimoto, T., and Abe, I. (2016). An unusual chimeric diterpene synthase from Emericella variecolor and its functional conversion into a sesterterpene synthase by domain swapping. Angewandte Chemie. International Edition in English, 55(5):1658–61.

Quin, M. B., Flynn, C. M., and Schmidt-Dannert, C. (2014). Traversing the fungal terpenome. Natural Product Reports, 31(10):1449–73.

Schmidt-Dannert, C. (2015). Biosynthesis of terpenoid natural products in fungi. Advances in Biochemical Engineering/Biotechnology, 148:19–61.

Shaw, J. J., Berbasova, T., Sasaki, T., Jefferson-George, K., Spakowicz, D. J., Dunican, B. F., Portero, C. E., Narvaez-Trujillo, A., and Strobel, S. A. (2015). Identification of a fungal 1,8-cineole synthase from Hypoxylon sp. with specificity determinants in common with the plant synthases. Journal of Biological Chemistry, 290(13):8511–26.

Toyomasu, T., Kaneko, A., Tokiwano, T., Kanno, Y., Kanno, Y., Niida, R., Miura, S., Nishioka, T., Ikeda, C., Mitsuhashi, W., Dairi, T., Kawano, T., Oikawa, H., Kato, N., and Sassa, T. (2009). Biosynthetic gene-based secondary metabolite screening: A new diterpene, methyl phomopsenonate, from the fungus Phomopsis amygdali. Journal of Organic Chemistry, 74(4):1541–1548.

Toyomasu, T., Niida, R., Kenmoku, H., Kanno, Y., Miura, S., Nakano, C., Shiono, Y., Mitsuhashi, W., Toshima, H., Oikawa, H., Hoshino, T., Dairi, T., Kato, N., and Sassa, T. (2008). Identification of diterpene biosynthetic gene clusters and functional analysis of labdane-related diterpene cyclases in Phomopsis amygdali. Bioscience, Biotechnology and Biochemistry, 72(4):1038–1047.

Toyomasu, T., Tsukahara, M., Kaneko, A., Niida, R., Mitsuhashi, W., Dairi, T., Kato, N., and Sassa, T. (2007). Fusicoccins are biosynthesized by an unusual chimera diterpene synthase in fungi. Proceedings of the National Academy of Sciences of the United States of America, 104(9):3084–3088.

Tudzynski, B. and Hölter, K. (1998). Gibberellin biosynthetic pathway in Gibberella fujikuroi: Evidence for a gene cluster. Fungal Genetics and Biology, 25(3):157–170.

Weber, T., Blin, K., Duddela, S., Krug, D., Kim, H. U., Bruccoleri, R., Lee, S. Y., Fischbach, M. A., Muller, R., Wohlleben, W., Breitling, R., Takano, E., and Medema, M. H. (2015). antiSMASH 3.0 - a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Research, 43(W1):W237–43.

Yang, Y., Yang, E., An, Z., and Liu, X. (2007). Evolution of nematode-trapping cells of predatory fungi of the Orbiliaceae based on evidence from rRNA-encoding DNA and multiprotein sequences. Proceedings of the National Academy of Sciences of the United States of America, 104(20):8379–84.

Ye, Y., Minami, A., Mandi, A., Liu, C., Taniguchi, T., Kuzuyama, T., Monde, K., Gomi, K., and Oikawa, H. (2015). Genome mining for sesterterpenes using bifunctional terpene synthases reveals a unified intermediate of di/sesterterpenes. Journal of the American Chemical Society, 137(36):11846–53.

5.7 Appendix

Table 5.1: Summary statistics of exon numbers of four TPS types.

Type     Count   Mean   Median   St. Dev.   Min   Max
α        1753    5.08   5        2.12       1     14
TRI5     545     3.90   4        1.89       1     10
CPS/KS   230     4.04   4        1.70       1     13
αα1      114     6.05   5.5      2.90       1     11

Table 5.2: Summary statistics of mRNA lengths of four TPS types.

Type     Count   Mean      Median   St. Dev.   Min    Max
α        1753    1071.03   1041     132.78     903    3117
TRI5     545     1009.87   987      106.01     900    2499
CPS/KS   230     2581.29   2808     544.81     948    4269
αα1      114     2112.45   2142     148.11     1605   2418

Lineage                    α            TRI5       CPS/KS    αα1       Total species

Dikarya, Ascomycota, Pezizomycotina
  Sordariomycetes          242 (65)     54 (38)    57 (32)   28 (19)   80
  Leotiomycetes            27 (14)      3 (3)      9 (4)     2 (1)     18
  Xylonomycetes            2 (1)        0 (0)      0 (0)     2 (2)     3
  Lecanoromycetes          4 (2)        0 (0)      2 (2)     1 (1)     2
  Eurotiomycetes           113 (46)     73 (38)    60 (43)   41 (27)   59
  Dothideomycetes          130 (65)     19 (18)    88 (45)   37 (24)   93
  Orbiliomycetes           3 (1)        0 (0)      0 (0)     1 (1)     2
  Pezizomycetes            5 (3)        10 (4)     3 (2)     0 (0)     9
Dikarya, Ascomycota (other subphyla)
  Saccharomycotina         -            -          -         -         40
  Taphrinomycotina         -            -          -         -         7
Dikarya, Basidiomycota
  Ustilaginomycotina       -            -          -         -         16
  Pucciniomycotina         7 (2)        27 (2)     -         -         22
  Agaricomycotina          1220 (126)   359 (94)   11 (7)    2 (1)     136
Zygomycota
  Kickxellomycotina        -            -          -         -         3
  Zoopagomycotina          -            -          -         -         0
  Mucoromycotina           -            -          -         -         12
  Entomophthoromycotina    -            -          -         -         2
Other lineages
  Cryptomycota             -            -          -         -         1
  Blastocladiomycota       -            -          -         -         1
  Neocallimastigomycota    -            -          -         -         2
  Glomeromycota            -            -          -         -         1
  Microsporidia            -            -          -         -         8
  Chytridiomycota          -            -          -         -         2

Total                      1753         545        230       114       519

Figure 5.1: Phylogenetic tree of major fungal lineages and the corresponding number of terpene synthases in each of the four types. The number of species is shown in parentheses. The depicted tree is based approximately on Hibbett et al. (2007). For the full terpene synthase gene distribution for each of the 519 fungal species analyzed, see Appendix A.2.

[Figure 5.2 image. Legend: α (1753), TRI5 (545), CPS/KS (230), αα1 (114). Tip labels and branch support values of the enlarged αα1 subtree are not reproduced here.]

Figure 5.2: Phylogeny of all four types of fungal terpene synthases. Left: The evolutionary relationships of the four types of fungal terpene synthases identified from fungal genomes. The phylogenetic tree was constructed with FastTree based on a multiple sequence alignment generated with MAFFT L-INS-i. Sequences are color-coded by type (α, blue; TRI5, orange; CPS/KS, purple; αα1, magenta). Right: The enlarged subtree shows the αα1 type of terpene synthases.

[Figure 5.3 legend. Basidiomycota: Agaricomycotina (1220), Pucciniomycotina (7). Ascomycota: Sordariomycetes (242), Eurotiomycetes (113), Dothideomycetes (130), Leotiomycetes (27), Pezizomycetes (5), Lecanoromycetes (4), Orbiliomycetes (3), Xylonomycetes (2).]

Figure 5.3: Maximum likelihood phylogeny of 1753 α type terpene synthases.

[Phylogenetic tree of trichodiene synthases (TRI5); tip labels are not reproduced here. Legend: Pucciniomycotina (27), Agaricomycotina (359), Pezizomycetes (10), Eurotiomycetes (73), Dothideomycetes (19), Leotiomycetes (3), Sordariomycetes (54).]
TRI5_AGA_DGYS_5473 TRI5_SOR_RCHQ_11347 TRI5_LEO_XMIZ_1273 TRI5_AGA_FYAN_10828 TRI5_EUR_ZOON_59 TRI5_AGA_FYAN_10826 TRI5_AGA_FYAN_4804 TRI5_SOR_WMCS_3808 TRI5_DOT_FSCX_11205 TRI5_AGA_FYAN_2298 TRI5_EUR_UHXJ_7921 TRI5_EUR_UOFT_9840 TRI5_AGA_DGYS_12637 TRI5_DOT_KMFJ_10129 TRI5_AGA_FYAN_4805 TRI5_EUR_OVDR_11235 TRI5_EUR_YUBL_12492 TRI5_AGA_DGYS_11902 TRI5_EUR_YUBL_6211 TRI5_AGA_FYAN_11085 TRI5_EUR_QHTQ_8018 TRI5_AGA_FYAN_4808 TRI5_EUR_BPJM_1709 TRI5_AGA_FYAN_4468 TRI5_EUR_WBMS_8183 TRI5_EUR_CEEB_2307 TRI5_AGA_YBWF_9480 TRI5_EUR_YUBL_11382 TRI5_EUR_KNVJ_1527 TRI5_AGA_SAKG_12077 TRI5_EUR_OXNM_9130 TRI5_AGA_TDSN_8895TRI5_AGA_ZCJM_104 TRI5_EUR_TDJI_7064 TRI5_EUR_YUBL_12478 TRI5_EUR_UHXJ_9563 TRI5_AGA_YBWF_3663 TRI5_EUR_UOFT_1206 TRI5_EUR_QOCQ_2669 TRI5_AGA_YBWF_3769 TRI5_EUR_ZOON_4562 TRI5_EUR_BSIJ_7386 TRI5_AGA_YBWF_3758 TRI5_EUR_LJBX_7861 TRI5_SOR_NNCN_13026 TRI5_PEZ_FQQU_11909 TRI5_AGA_ZENN_14176 TRI5_PEZ_FQQU_9197 TRI5_AGA_PMOD_6359 TRI5_PEZ_QBBJ_742 TRI5_AGA_ZCJM_4806 TRI5_PEZ_FYCQ_10151 TRI5_PEZ_FYCQ_4844 TRI5_PEZ_FYCQ_4237 TRI5_PEZ_FQQU_7361 TRI5_PEZ_QBBJ_9632 TRI5_AGA_ZCJM_10845 TRI5_PEZ_AHRI_4216 TRI5_PEZ_AHRI_611 TRI5_SOR_QBEX_6235 TRI5_AGA_ZCJM_13088 TRI5_SOR_YHYP_3697 TRI5_SOR_XVHZ_3557 TRI5_SOR_GYZV_3632 TRI5_EUR_ODBE_1738 TRI5_AGA_ZCJM_11931 TRI5_EUR_QHTQ_9457 TRI5_SOR_ANQY_8699 TRI5_SOR_XFDJ_10062 TRI5_SOR_GDRU_8280 TRI5_SOR_IOUK_6351 TRI5_SOR_XFDJ_9193 TRI5_SOR_CLIK_16672 TRI5_AGA_YBWF_1505 TRI5_AGA_QHHM_7302 TRI5_SOR_CLIK_4653

TRI5_AGA_ZENN_14177TRI5_AGA_PMOD_6358 TRI5_AGA_DLNR_3485 TRI5_AGA_DLNR_3484 TRI5_AGA_TDSN_1905 TRI5_AGA_SAKG_8475 TRI5_AGA_TDSN_1908 TRI5_AGA_SAKG_8479 TRI5_AGA_DLNR_4354 TRI5_AGA_FSVD_9640

TRI5_AGA_PKON_5766 TRI5_AGA_ZENN_13851TRI5_AGA_FSVD_9618 TRI5_AGA_YBWF_1498 TRI5_AGA_DLNR_3486

TRI5_AGA_ZCJM_13087 TRI5_AGA_ZCJM_13082TRI5_AGA_TDSN_1937 TRI5_AGA_SAKG_5075 TRI5_AGA_TDSN_1947 TRI5_AGA_SAKG_8436

TRI5_AGA_PMOD_8298 TRI5_AGA_FSVD_441 TRI5_AGA_PMOD_6398 TRI5_AGA_PMOD_3368TRI5_AGA_ZENN_7622 TRI5_AGA_PKON_4964

TRI5_AGA_PKON_7936 TRI5_AGA_ZENN_7624 TRI5_AGA_PKON_5153 TRI5_AGA_ZENN_5256 TRI5_AGA_FSVD_4698 TRI5_AGA_ZENN_14181 TRI5_AGA_PKON_8226 TRI5_AGA_QQOC_6532

TRI5_AGA_QHHM_7783

0.4

Figure 5.4: Maximum likelihood phylogeny of 545 Trichodiene synthases.

[Tree image. Leaves: Dendrothele bispora CBS 962.96; Punctularia strigosozonata HHB-11173 SS5; Gymnopus androsaceus JB14; Gymnopus luxurians FD-317 M1; Serpula lacrymans var. shastensis SHA21-2; Serpula lacrymans var. lacrymans S7.3; Serpula lacrymans var. lacrymans S7.9. Clade labels: Corticiales, Agaricales, Boletales. Scale bar: 2.0.]

Figure 5.5: Phylogenetic relationship among 7 Basidiomycota species containing bifunctional CPS/KSs.

[Tree image: 230 leaf labels of the form CPSKS_<taxon code>_<species code>_<gene number>, with branch support values. Legend (number of sequences): Agaricomycotina (11), Pezizomycetes (3), Eurotiomycetes (60), Dothideomycetes (88), Lecanoromycetes (2), Leotiomycetes (9), Sordariomycetes (57). Scale bar: 0.3.]

Figure 5.6: Maximum likelihood phylogeny of 230 bifunctional CPS/KSs.

[Alignment image: 14 sequences in blocks, ruler marked to column 1040. Residue-level rows omitted. Sequences, with the residue count shown at the end of each sequence's final row:
CPSKS_PEZ_FQQU_10418 : 922
CPSKS_PEZ_AHRI_6131 : 948
CPSKS_PEZ_AHRI_9707 : 933
CPSKS_AGA_BSDJ_5166 : 938
CPSKS_AGA_BSDJ_8949 : 595
CPSKS_AGA_KMDG_10032 : 574
CPSKS_AGA_NNLT_27262 : 577
CPSKS_AGA_OVWY_7324 : 960
CPSKS_AGA_OVWY_7434 : 958
CPSKS_AGA_RHJD_14286 : 920
CPSKS_AGA_RHJD_14288 : 921
CPSKS_AGA_RHJD_8212 : 921
CPSKS_AGA_SSIY_2492 : 927
CPSKS_AGA_XJPI_2953 : 927]

Figure 5.7: Multiple sequence alignment of 11 Basidiomycota CPS/KSs and 3 Ascomycota CPS/KSs.

[Tree image: 114 leaf labels of the form AlAl1_<taxon code>_<species code>_<gene number>, with branch support values. Legend (number of sequences): Agaricomycotina (2), Orbiliomycetes (1), Eurotiomycetes (41), Dothideomycetes (37), Lecanoromycetes (1), Leotiomycetes (2), Sordariomycetes (28), Xylonomycetes (2). Scale bar: 0.3.]

Figure 5.8: Maximum likelihood phylogeny of 114 αα1 type terpene synthases.


* 20 * 40 * 60 * 80 *
AlAl1_AGA_RHJD_16069 : MELLYSTIVDPSEYDSGGLCDGIDLRKSNFTWLEDRGIIRAQEDWKKYISPFQEFRGTLGPEYSFLSVLLPECLPERLEVLGYANEFAFL : 90
AlAl1_AGA_RHJD_16091 : MEFQYSTVIDPSTYDTEGLCDGIDFRRNNFTWLEDRGAIRAQADWTKYVSPATGHRGVLGPQYSLLSSAIPECSPERLEVISYALEFGFL : 90
AlAl1_DOT_GEAK_12341 : MIYQFSTIVDPKTYDNEGLSKGIDLRKNNFTHFEDRGAIRAQHDWARYVAPIKQFKGTLGHDFSFMTVCVPECIPERLEIISYANEFAFL : 90
AlAl1_DOT_DGLV_16613 : MKYQFSTIVDPATYDNEGLSNGIDLRKNNFTHFEDRGAIRAQQDWAKHVAPIKQFKGTLGHDYSFMTVCVPECIPIRLEIISYANEFAFM : 90
AlAl1_DOT_PINU_12649 : MRYQFSTIVDLAAYDNEGLSDGIDLRKNNFTHLEDRGAIRAQQDWAKHVAPIKQFKGTLGHDYSFMTVCVPECIPTRLEIISYANEFAFM : 90
M q5ST66Dp YD eGL GIDlR4nNFT EDRGaIRAQ DW 4 6 P f4GtLG 5Sf63v 6PEC P RLE66sYAnEFaF6

100 * 120 * 140 * 160 * 180
AlAl1_AGA_RHJD_16069 : YDNFIEFVDKEQSTIEYDQIGQAFLEGARTGKIFTQDTDAETKRAGKRKMQSQMVLEMLAIDREGAIAIMKSWVGFAEAAS---NHKEFA : 177
AlAl1_AGA_RHJD_16091 : LDDVINATDQEQGTIESNDMMQAFLEGVQTGKI--TKNDAQTKREGKRKIQSQLLLEMFSIDRERAIAFVKAWAEFAEVGSGRQHHENFA : 178
AlAl1_DOT_GEAK_12341 : YDDATELETEDNMNSENDKMMQGFLTDALSSRP-PKDLDS----SGKMRILTQMVSEMMAIDKKCAVVTMRAWSEFLRVGSSRQHGTIFT : 175
AlAl1_DOT_DGLV_16613 : YDDDTELDNENNTSAENDKLIGIFLAGTQGWSS-LQDQSS----SGKTRILKQLFSEMVEIDKECAIVTMKAWAEFLRVGSSRQHGTVFT : 175
AlAl1_DOT_PINU_12649 : YDDDTELDTEDDTSAENDELMGIFLAGPESRSP-SQDQSS----SGKTRILKQLFSEMVAVDKECAIRAMKAWAEFLRDGSSRQHSTAFI : 175
yD1 e E 1 6 FL g d GK 46 Q6 EM 6D4e A6 64aW eF gS rqh F

* 200 * 220 * 240 * 260 *
AlAl1_AGA_RHJD_16069 : TLDEYLPFRLINAGAMVWLQFILFGMNLKIPENEKDKCYKLVQPALFVLALQNDLCSWEKEYIAAKNCDQVHIINALWVLMREYNTDVPG : 267
AlAl1_AGA_RHJD_16091 : TLEEYLPYRIVDAGHALWYTFITFGMGLNIPQWEREKCDELTQSATAALVLQNDLFSWEQEYATAQSNHQSHVTNALRVLMREHNIGIQE : 268
AlAl1_DOT_GEAK_12341 : RLEDYLPYRVKDVGEMFWYGVVTFGMALHIPDHEMEACHRLMEPAWVAVGLANDVFSWPKERDANQKSGRSHIINAVWILMQEKGLSEEQ : 265
AlAl1_DOT_DGLV_16613 : RLEDYLPYRIKDVGEMFWFGVVTFGMALHIPDHEMDACHKLMEPAWIAVGLANDVFSWPKERDASQRLGRTHVVNAVWVVMQEHGFSQKQ : 265
AlAl1_DOT_PINU_12649 : RLEDYLPYRIKDVGEMFWFGVVTFGMALHIPDHEMDACHKLMEPAWIAVGLANDVFSWPKERDASQRLGRTHVVNAIWVVMQEHGLSQEQ : 265
Le YLP5R6 1 G m W 6tFGM L IP E C L 2pA a6 L ND6fSW kE a q H6 NA6w66M E

280 * 300 * 320 * 340 * 360
AlAl1_AGA_RHJD_16069 : AQEICRNLIKKYISEYVQVVEDAKQDESFSADARKFVEASKYSIAGNAVWSTTCPRYQPGVSFNERQLEWMRNGVPNK-----PGPSFEP : 352
AlAl1_AGA_RHJD_16091 : AQQMCRKLIKQHVSDYIQIVENVKHNESLSADLRKYIEAMQYTISGNIAWSMNCPRYHPQASLNETQLEWMHSGVPDKLTFSLPSPPASP : 358
AlAl1_DOT_GEAK_12341 : AGQYCRELAAQYVARYVENVQRVKDDESLSADLRTYIEAMQYSISGNVIWSKSCPRYNPGQHFNQTQVDWMLNGVPDPTGFDS---SSSS : 352
AlAl1_DOT_DGLV_16613 : ADQYCRELAAQYVSQYVDSIRNIKNEESISPDLRTYVEAMQYSISGNVIWSKFCPRYNPEKRFNQTQLDWMQNELPSSVELDRASNTSSS : 355
AlAl1_DOT_PINU_12649 : ANQYCRKLAAQYVTQYLDNIRKIKNDESISLDLRTYVEAMQYSISGNVTWSKLCPRYNPEKCFNQTQLDWMHNGLPRPIELDSASDTSSS : 355
A 2 CR L qy6 Y6 6 K ES S DlR 56EAmqY3IsGN WS CPRY P fN2tQ6 WM ng6P s

* 380 * 400 * 420 * 440 *
AlAl1_AGA_RHJD_16069 : -CAGAEKSRFSLNGAST------EEP------DSNPAPEEEKS-LSAIGQFVVEAPYQYISSLPSKGVRDRFIDAVNQWLKV : 420
AlAl1_AGA_RHJD_16091 : EVLGSLSPSWSVHSDSSRSSTPPLEEPITATKNLLMNLELPSPPPPSE------IVEAPYQYIASLPSKGIRDKFIDAVNQWLKV : 437
AlAl1_DOT_GEAK_12341 : YDSTSTNGSPKPESETTVESRS--QEGGTDDISGIMSSLLDCSLPPLSHEVRLSS----TFGAPWEYIDSLPSKGARDMFLDGINHWLDV : 436
AlAl1_DOT_DGLV_16613 : FLSTSTHGSPVSGSQATIESK----EGWTADSSGIVSLLLNCSLPPLSHKV------ISAPWIYVDSLPSKGTRDMFLDALNHWLRV : 432
AlAl1_DOT_PINU_12649 : FLSTSTHSSPVSGSQTTIEYK----DGWTADSSEIMSLFLNSSLPPLSHKV------ISAPVIYVDSLPSKGARDMFLDALNHWLQV : 432
s s s 3 e t l pP AP Y6 SLPSKG RD F6Da6N WL V

460 * 480 * 500 * 520 * 540
AlAl1_AGA_RHJD_16069 : PHNVVKQIKAAINRLHHASLLLDDFQDSSPLRRGKPAAHTMFGAPQTINSAGYCIIKAIEQIQALGNAQIVTNKL------LSLYKGQ : 502
AlAl1_AGA_RHJD_16091 : PENIVEQIKALTNRLHQASLLLDDFEDSSPLRRGKPAAHTIFGAPQAINSAGYCIVKAIGELQALGASQIITSKLILTSSDKILSLFKGQ : 527
AlAl1_DOT_GEAK_12341 : GRETSSQVKKVVRMLHNASLMFDDVQDGSPLRRSKPATHRVFGIAQTINSASFLVNESIKETRRFAGDRGVDIVL-----EQLTSLFVGQ : 521
AlAl1_DOT_DGLV_16613 : DGQRASQVKMAIRMLHNASLMLDDVQDGSPLRRSKPSAHRVFGVAQTTNSAAFLVNESIKLIRELAGDQGVAAVL-----EKLTSLFVGQ : 517
AlAl1_DOT_PINU_12649 : DEQKVSQVKMAIRMLHNASLMLDDIQDGSRLRRGKSSAHRVFGVAQTTNSAAFLVNESIKQIRELAGDQGVAAVL-----EKLTSLFVGQ : 517
Q6K LH ASL6lDD 2D SpLRR Kp aH 6FG Qt NSA 5 6 I l q 6 L SL5 GQ

* 560 * 580 * 600 * 620 *
AlAl1_AGA_RHJD_16069 : ALDLHWTYNGIWPTPAEYIQMIDCKTGAQFDLVIELMLAHSDASIKPD--LSKLTTLLGRYFQIADDYKNLVSADYAKQKGFCEDLDEGK : 590
AlAl1_AGA_RHJD_16091 : ALDLHWTYNGICPTPAEYIQMIDCKTGAQFDLVVDMMLAHSNASVKPD--LKKLTTLLGRYFQIADDYKNLVSADYRKQKGFCEDLDEGK : 615
AlAl1_DOT_GEAK_12341 : AQDLHSSRNLSCPSLTEYIQTIDQKTGALFILAAKLMCLFSTTDKATERSLLRFCLLLGRFFQIRDDFQNITSHEYTKQKGFCEDLDCGT : 611
AlAl1_DOT_DGLV_16613 : AQDLHSSRNLSRPSLTEYIQTIDQKTSALFELASRLMCLCSTATVVPNRSLSRFCILLGRFFQIRDDYQNLTSPEYTKQKGFCDDLDSGT : 607
AlAl1_DOT_PINU_12649 : AQDLHSSRNLSCPSLTEYIQTIDQKTSALFELAWRLMYLCSMANVVPDSSLSRFCILLGRFFQIRDDYKNLTSPEYTKQKGFCDDLNSGT : 607
A DLH 3 N P3 EYIQ ID KT A F L 6M S a p L 4 LLGR5FQI DD5 N6 S Y KQKGFC DL1 G

640 * 660 * 680 * 700 * 720
AlAl1_AGA_RHJD_16069 : YSLPLIHLMQSQPDNLQLRNILSTRRNEGRMMYEHKLLVLKYLKEAKSLEYTYSILADLHARIRQQIDELEEVLGESNTELRLLWELLRV : 680
AlAl1_AGA_RHJD_16091 : YSLPLIHLLQSHPENLQLRNILSTRRAEGKMMYEQKVLVLEYLREAESLEYTHSVLEGLHAKIGQQIDNIEESFGETSIELRVLWELLRV : 705
AlAl1_DOT_GEAK_12341 : YTIPLIYTIAQEPHNILLQNLLSTRLADGALDDAQKSLILEQMELKETNKYLKKILSLLHNELMAELQFLSDLFASENLYIKLMLLKLGV : 701
AlAl1_DOT_DGLV_16613 : YTLPLVYAISQQSENLLLQNLLSTRLAEGTLDDAQKRLALDQMQLVKTDEFLQKILASLYDELRAELQCISSSFASENPQMELMLVMLKL : 697
AlAl1_DOT_PINU_12649 : YTLPLLYAISQQSENLLLQNLLSTRLAEGTLDDAQKCLALDQMQLVKTDKFLRKILAVLYDELSAELQYISSSFASKNPQMERMLAMLKL : 697
Y36PL6 6 N6 L N6LSTR aeG 6 qK L L 6 3 5 6L L 6 26 6 f n 6 6 L 6

Figure 5.9: Multiple sequence alignment of 2 Basidiomycota and 3 Ascomycota αα1 type terpene synthases.

[Figure: ten four-set Venn diagrams (sets: α, TRI5, CPS/KS, αα1), one per fungal lineage: Agaricomycotina, Pucciniomycotina, Dothideomycetes, Sordariomycetes, Eurotiomycetes, Leotiomycetes, Pezizomycetes, Orbiliomycetes, Lecanoromycetes and Xylonomycetes.]

Figure 5.10: Species distribution among the four types of fungal terpene synthases. Each Venn diagram represents a fungal lineage and shows the number of species that are unique to, or shared among, the four terpene synthase types.
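The region counts of such a four-set Venn diagram can be derived from one set of species per terpene synthase type. The following is a minimal sketch only: the function name, the type labels and the toy species identifiers are hypothetical and are not taken from the data of this study.

```python
from collections import Counter

# Hypothetical labels for the four terpene synthase types.
TYPES = ("alpha", "TRI5", "CPS/KS", "alphaalpha1")


def venn_region_counts(species_by_type):
    """Count species in each exclusive region of a Venn diagram.

    species_by_type maps a type label to the set of species in which at
    least one gene of that type was found. Each species is counted once,
    in the region given by the exact combination of types it carries.
    """
    all_species = set().union(*species_by_type.values())
    regions = Counter()
    for species in all_species:
        combo = tuple(t for t in TYPES if species in species_by_type.get(t, set()))
        regions[combo] += 1
    return dict(regions)


# Toy input (assumed values, for illustration only).
example = {
    "alpha": {"sp1", "sp2", "sp3"},
    "TRI5": {"sp2", "sp3"},
    "CPS/KS": {"sp3"},
    "alphaalpha1": set(),
}
counts = venn_region_counts(example)
for combo, n in sorted(counts.items()):
    print(" & ".join(combo), n)
```

Keying each region by the exact combination of types keeps the regions mutually exclusive, so the counts sum to the number of species carrying at least one terpene synthase type.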

[Figure: (A) scatter plot of exon number against length for the α, TRI5, CPS/KS and αα1 types; (B) exon number by type; (C) length by type.]

Figure 5.11: Comparisons of terpene synthase gene length and exon number. (A) A scatter plot of exon number (y-axis) against mRNA length (x-axis) for α (red points), TRI5 (forestgreen points), CPS/KS (darkturquoise points) and αα1 (darkorchid1 points). The overlaid density distributions of exon number and terpene synthase gene length for each type are shown along the axes. (B) Boxplots showing the distribution of exon counts across the four different types of terpene synthase genes. (C) Boxplots showing the length of the four types of terpene synthase genes. (The horizontal line within each boxplot shows the median and the diamond inside each boxplot shows the mean. Outliers are shown as black dots.)

[Figure: sequence logo panels for the α, TRI5, CPS/KS and αα1 types.]

Figure 5.12: Sequence logos of conserved terpene synthase family motifs observed in each type of fungal terpene synthases. Bits indicate the conservation of amino acids at specific positions and the height of letters within each stack reflects the relative frequency of the corresponding amino acid. The logos were created on the WebLogo 3 server (http://weblogo.threeplusone.com/create.cgi).
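The relation between bits and letter heights in a sequence logo can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: it uses the standard definition of information content for a 20-letter amino acid alphabet, ignores gap characters, and omits the small-sample correction that logo tools may apply. The function name and the toy columns are hypothetical and are not taken from the alignments of this study.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Return (information content in bits, per-residue letter heights).

    column is one alignment column given as a string of residues.
    Information content is log2(alphabet_size) minus the Shannon entropy
    of the observed residue frequencies; each letter height is the
    residue frequency multiplied by the information content.
    """
    residues = [c for c in column if c not in "-."]  # skip gap characters
    if not residues:
        return 0.0, {}
    n = len(residues)
    freqs = {aa: k / n for aa, k in Counter(residues).items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    info = math.log2(alphabet_size) - entropy
    heights = {aa: p * info for aa, p in freqs.items()}
    return info, heights


# Toy columns (assumed values): fully conserved versus evenly split.
conserved_info, conserved_heights = column_information("DDDD")
split_info, split_heights = column_information("DDEE")
print(round(conserved_info, 2), round(split_info, 2))
```

A fully conserved column reaches the maximum of log2(20) bits, whereas a column split evenly between two residues loses exactly one bit, and each of its two letters receives half of the remaining stack height.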

[Figure: Venn diagram; legible set labels include PpCPSKS_188, SmCPSKS_256 and At_196.]

Figure 5.13: Venn diagram of CPS/KSs identified by known CPS/KSs from three species. SmCPSKS from Sphaceloma manihoticola; PpCPSKS from Physcomitrella patens; AtGA1 and AtGA2 from Arabidopsis thaliana.

Chapter 6

Summary and Conclusions

Recent advances in sequencing technology have greatly increased the breadth, depth and accessibility of genomic data, and have thus propelled functional and comparative genomics studies at large scales. The research presented in this dissertation applies modern computational techniques to the study of terpene synthases in plants, fungi and bacteria.

One aspect of this research is an attempt to trace the evolutionary history of this gene family by phylogenetic analysis, which benefits greatly from analyzing terpene synthases in a wide range of species.

In Chapter 1, an overview of the current knowledge of terpene synthases and their evolution was given, and three specific aims were outlined: (1) identify and characterize the archetypical plant terpene synthases and microbial-type terpene synthases in the plant kingdom via comparative transcriptomic analysis of 1000 plants; (2) identify the terpene synthases of fungal and bacterial species; (3) infer the evolutionary history of this gene family by comparative analysis of terpene synthases found in plants, fungi and bacteria.

In Chapter 2, typical plant terpene synthases were identified and characterized from more than 1000 plant transcriptomes and six genomes of non-seed plants. The phylogenetic analysis based on sequences from non-seed plants and selected seed plants provides new insights into the evolution of this gene family. A new subfamily x was discovered and new hypotheses about the ancestor of modern plant terpene synthases were proposed.

In Chapter 3, by mining the transcriptomes of 1103 plant species ranging from green algae to flowering plants, 712 microbial type terpene synthases, a new type that was first found in the genome of Selaginella moellendorffii in 2012, were identified from 146 plant species, almost all of which are non-seed land plants. They are dominant only in the major clades of non-seed land plants: Monilophytes, Lycophytes, Hornworts, Mosses and Liverworts. A phylogenetic tree built on these 712 microbial type terpene synthases and selected sequences from bacteria and fungi enables us to classify them into four major groups, and the biochemical analysis reveals that they function as monoterpene synthases and sesquiterpene synthases.

In Chapter 4, two putative horizontal gene transfer (HGT) events from bacteria to fungi in the terpene synthase gene family were identified. To the best of our knowledge, this is the first case of HGT of terpene synthase genes. These HGT-acquired genes were restricted to a group of entomopathogenic fungi, and they were under stringent purifying selection in the fungal recipients.

In Chapter 5, the distribution and genomic organization of four types of terpene synthase genes in 519 fungal genomes were characterized. The results show that the two main groups of fungi, Ascomycota and Basidiomycota, have evolved different sets of terpene synthases. Compared to the α and TRI5 types of terpene synthases, the two types of diterpene synthases, the bifunctional CPS/KS and the chimeric diterpene synthases, were much less common in both groups. Also, almost all the diterpene synthases were encoded in species of Ascomycota.

Appendix

Appendix A

List of species

A.1 List of OneKP Species

Table A.1: List of screened transcriptomes and the number of PTPS and MTPS in each sample. All these 1175 transcriptomes were from OneKP (www.onekp.com). The unique four letter codes are the transcriptome identifiers assigned by 1KP.

Code Species P M
Anthocerotophyta (ANT)
RXRQ Phaeoceros carolinianus 17 4
WCZB Phaeoceros carolinianus 2 9
FAJB Paraphymatoceros hallii 2 3
TCBC Megaceros vincentianus 2 2
DXOU Nothoceros aenigmaticus 0 0
FANS Leiosporoceros dussii 0 0
IQJU Anthoceros formosae 0 0
UCRN Megaceros tosanus 0 0
Basal Magnoliophyta (BMA)
VZCI Illicium floridanum 20 0
FZJL Austrobaileya scandens 9 0
URDJ Amborella trichopoda 5 0
ROAP Illicium parviflorum 4 0
NWMY Kadsura heteroclite 2 0
WTKZ Nuphar advena 2 0
NPND Ceratophyllum demersum 0 0
Bryophyta (BRY)
HVBQ Tetraphis pellucida 4 0
LNSF Hypnum subimponens 3 5
ORKS Philonotis fontana 3 5
QKQO Pseudotaxiphyllum elegans 3 3
BGXB Plagiomnium insigne 3 2
WNGH Aulacomnium heterostichum 3 1
KEFD Encalypta streptocarpa 3 0
RCBT Sphagnum palustre 2 7
NGTD Dicranum scoparium 2 3
TMAJ Neckera douglasii 2 2
JADL Rhynchostegium serrulatum 1 4
RDOO Racomitrium varium 1 4
VBMM Anomodon rostratus 1 4
IGUH Schwetschkeopsis fabronia 1 3
RGKI Leucobryum glaucum 1 3
UHLI Sphagnum recurvatum 1 3
HRWG Buxbaumia aphylla 1 1
QMWB Anomodon attenuatus 1 1
WSPM Rhytidiadelphus loreus 1 1
XWHK Funaria 1 1
CMEQ Orthotrichum lyellii 1 0
WOGB Andreaea rupestris 1 0
EEMJ Thuidium delicatulum 0 5
GOWD Sphagnum lescurii 0 5
AWOI Diphyscium foliosum 0 3
SKQD Takakia lepidozioides 0 2
VMXJ Leucobryum albidum 0 2
ZACW Leucodon sciuroides 0 2
ZQRI Timmia austriaca 0 2
ABCD Niphotrichum elongatum 0 1
GRKU Syntrichia princeps 0 1
MIRS Climacium dendroides 0 1
SZYG Polytrichum commune 0 1
ZTHV Atrichum angustatum 0 1
BPSG Scouleria aquatica 0 0
DHWX Fontinalis antipyretica 0 0
FFPD Ceratodon purpureus 0 0
JMXW Bryum argenteum 0 0
TAVP Calliergon cordifolium 0 0
YEPO cf. Physcomicromitrium sp. 0 0
YWNF Hedwigia ciliata 0 0
Chloranthales (CHA)
OSHQ Sarcandra glabra 13 0
WZFE Ascarina rubricaulis 8 0
Chlorophyta (CHO)
FPCO Interfilum paradoxum 2 0
MCHJ Micrasterias fimbriata 0 1
ACRY Pteromonas sp. 0 0
AEKF Penium margaritaceum 0 0
AJAU Helicodictyon planctonicum 0 0
AJUW Chloromonas rosa 0 0
AKCR Parachlorella kessleri 0 0
ALZF Halochlorococcum marinum 0 0
AYPS Unidentified species CCMP 1205 0 0
AZZW Chlorokybus atmophyticus 0 0
BAZF Chaetopeltis orbicularis 0 0
BCYF Chlamydomonas cribrum 0 0
BFIK Entransia fimbriat 0 0
BHBK Cosmarium tinctum 0 0
BILC Prototheca wickerhamii 0 0
BTFM Monomastix opisthostigma 0 0
BZSH Golenkinia longispicula 0 0
CBNG Planophila laetevirens 0 0
CQQP Ochlochaete sp. 0 0
DFDS Desmidium aptogonum 0 0
DRFX Closterium lunula 0 0
DRGY Chaetosphaeridium globosum 0 0
DUMA Tetraselmis cordiformis 0 0
DVYE Oedogonium cardiacu 0 0
DZPJ Cylindrocapsa geminella 0 0
EATP Microthamnion kuetzigianum 0 0
EEJO Neochloris oleoabundans 0 0
EGNB Scourfieldia sp. 0 0
ENAU Spermatozopsis similis 0 0
ETGN Botryococcus braunii 0 0
FFGR Netrium digitus 0 0
FMRU Zygnema sp. 0 0
FMVB Scherffelia dubia 0 0
FOYQ Microspora cf. tumidula 0 0
FQLP Klebsormidium subtile 0 0
FXHG Hafniomonas reticulata 0 0
GBGT Xanthidium antilopaeum 0 0
GFUR Chloromonas subdivisa 0 0
GGWH Onychonema laeve 0 0
GJIY Neochloris sp. 0 0
GUBD Brachiomonas submarina 0 0
GXBM Coccomyxa pringsheimii 0 0
GYBH Codium fragile 0 0
GYRP Euastrum affine 0 0
HAOX Spirogyra sp. 0 0
HHXJ Tetraselmis striata 0 0
HIDG Cosmarium broomei 0 0
HJVM Cosmarium ochthodes 0 0
HKZW Mesotaenium caldariorum 0 0
HVNO Tetraselmis chui 0 0
HYHN Prasinoderma coloniale 0 0


Code Species P M Code Species P M IHOI Chloromonas oogama 0 0 RPQV Phymatodocis nordstedtiana 0 0 IJMT Aphanochaete repens 0 0 RPRU Staurodesmus omearii 0 0 IRYH Heterochlamydomonas inaequalis 0 0 RQFE Cosmocladium cf. constrictum 0 0 ISGT Hormidiella sp. 0 0 RRSV Pedinomonas minor 0 0 ISHC Staurastrum sebaldi 0 0 RUIF Carteria obtusa 0 0 ISIM Nephroselmis pyriformis 0 0 RYJX Pandorina morum 0 0 ISPU Volvox globator 0 0 SDPC Oedogonium foveolatum 0 0 JIWJ Acrosiphonia sp. 0 0 SNOX Planotaenium ohtanii 0 0 JKKI Lobomonas rostrata 0 0 SRGS Chlamydomonas bilatus 0 0 JMTE Pseudoscoureldia marina 0 0 TGNL Picocystis salinarum 0 0 JMUI Stigeoclonium helveticum 0 0 TNAW Pyramimonas parkeae 0 0 JOJQ Cylindrocystis cushleckae 0 0 TPHT Spirotaenia sp. 0 0 JRDV Pediastrum duplex 0 0 USIX Neochlorosarcina sp. 0 0 JRGZ Chlamydomonas moewusii 0 0 UTRE Chloromonas tughillensi 0 0 JTIG Bryopsis plumosa 0 0 VALZ Chlamydomonas noctigama 0 0 KADG Ignatius tetrasporus 0 0 VAZE Cylindrocystis sp. 0 0 KEYW Gonatozygon kinahanii 0 0 VBLH Cladophora glomerata 0 0 KFEB Haematococcus pluvialis 0 0 VFIV Fritschiella tuberosa 0 0 KMNX Nucleotaenium eifelense 0 0 VHIJ Blastophysa cf. rhizopus 0 0 KSFK Chlorosarcinopsis halophila 0 0 VIAU Carteria crucifera 0 0 KUJU Gonium pectorale 0 0 VJDZ Botryococcus sudeticus 0 0 KYIO Mesostigma viride 0 0 VQBJ Coleochaete scutata 0 0 LETF Planophila terrestris 0 0 WCLV Prasiola crispa 0 0 LNIL Pteromonas angulosa 0 0 WCQU Staurodesmus convergens 0 0 LSHT Bolbocoleon piliferum 0 0 WDCW Mesotaenium endlicherianum 0 0 MCPK Bathycoccus prasinos 0 0 WDGV Cosmarium subtumidum 0 0 MFYC Nannochloris atomus 0 0 WDWX Dunaliella primolecta 0 0 MFZO Zygnemopsis sp. 0 0 WSJO Mesotaenium braunii 0 0 MMKU Nephroselmis olivace 0 0 WXRI Stichococcus bacillaris 0 0 MNCB Eremosphaera viridi 0 0 XDLL Oogamochlamys gigantea 0 0 MNNM Cosmarium granatum 0 0 XIVI Cymbomonas sp. 
0 0 MOYY Pleurotaenium trabecul 0 0 XJGM coccoid prasinophyte 0 0 MWAN Chlorella minutissima 0 0 XMCL Prasinococcus capsulatus 0 0 MWXT Chara vulgaris 0 0 XOAL Dolichomastix tenuilepi 0 0 MXDS Spermatozopsis exsultans 0 0 XOZZ Chlamydomonas sp. 0 0 MXEZ Pycnococcus provasolii 0 0 XRTZ Roya obtusa 0 0 NATT Trentepohlia annulata 0 0 YDCQ Cephaleuros virescens 0 0 NBYP Mesotaenium kramstei 0 0 YLBK Cylindrocystis brebissonii 0 0 NDPQ Dunaliella salina 0 0 YSQT Penium exiguum 0 0 NKXU Trebouxia arboricola 0 0 ZDIZ Dunaliella tertiolecta 0 0 NNHQ Spirotaenia minuta 0 0 ZFXU Asteromonas gracilis 0 0 NQYP Pirula salina 0 0 ZIVZ Phacotus lenticularis 0 0 NSTT Oltmannsiellopsis viridis 0 0 ZLBP Chloromonas reticulata 0 0 OAEZ Persursaria percursa 0 0 ZLQE Stephanosphaera pluvialis 0 0 ODXI Haematococcus pluviali 0 0 ZNUM Leptosira obovata 0 0 OFUE Lobochlamys segnis 0 0 ZRMT Mougeotia sp. 0 0 OQON Entocladia endozoica 0 0 Chromista (CHR) OTQG Ankistrodesmus sp. 0 0 APTP Ishige okamurai 0 0 PFUD Geminella sp. 0 0 ASZK Punctaria latifolia 0 0 POIR Volvox aureus 0 0 BAJW Isochrysis sp. 0 0 PRIQ Pleurastrum insigne 0 0 BAKF Cryptomonas curvata 0 0 PUAN Pedinomonas tuberculata 0 0 BOGT Mallomonas sp. 0 0 PZIF Scenedesmus dimorphus 0 0 DBYD Synura petersenii 0 0 QPDY Coleochaete irregularis 0 0 EBWI Ochromonas sp. 0 0 QRTH Chloromonas perforata 0 0 FIDQ Undaria pinnatida 0 0 QWFV Bambusina borreri 0 0 FIKG Sargassum henslowianum 0 0 QWRA Vitreochlamys sp. 0 0 FOMH Sargassum integerrimum 0 0 QXSZ Mantoniella squamata 0 0 FSQE Desmarestia viridis 0 0 QYXY Botryococcus terribilis 0 0 HFIK Sargassum vachellianum 0 0 RAWF Uronema belka 0 0 IAYV Rhodomonas sp. 0 0 RNAT Eudorina elegans 0 0 IRZA Proteomonas sulcata 0 0


Code Species P M Code Species P M JCXF Scytosihon lomentaria 0 0 CGDN Tetraclinis sp. 20 0 JGGD Sargassum muticum 0 0 EGLZ Prumnopitys andina 20 0 JQFK Nannochloropsis oculata 0 0 HQOM Torreya nucifera 20 0 LDRY Hizikia fusifrome 0 0 IFLI Callitris gracilis 20 0 LIRF Dictyopteris undulata 0 0 RSCE Wollemia nobilis 20 0 LXRN Prymnesium parvum 0 0 JUWL Keteleeria evelyniana 19 0 MJMQ Hemiselmis virescens 0 0 MFTM Pinus jereyi 19 0 QLMZ Colpomenia sinuosa 0 0 UEVI Fokienia hodginsii 19 0 RAPY Kjellmaniella crassifolia 0 0 UUJS Nageia nagi 19 0 ROZZ Chroomonas sp. 0 0 XTZO Araucaria rulei 19 0 RWXW Sargassum horneri 0 0 IZGN Dacrydium balansae 18 0 SRSQ Laminaria japonica 0 0 KLGF Sundacarpus amarus 18 0 ULXR Scytosiphon dotyo 0 0 SCEB Podocarpus coriaceus 17 0 VJED Pavlova lutheri 0 0 OVIJ Papuacedrus papuana 16 0 VKVG Synura sp. 0 0 VFYZ Thuja plicata 16 0 VRGZ Petalonia fascia 0 0 QFAE Sequoiadendron giganteum 15 0 VYER Sargassum hemiphyllum 0 0 ZQWM Lagarostrobos franklinii 15 0 YRMA Sargassum thunbergii 0 0 AQFM Pseudolarix amabilis 14 0 Coniferophyta (CON) NPRL Cathaya agryrophylla 14 0 GMHZ Cryptomeria japonica 46 0 YFZK Sciadopitys verticillata 14 0 BUWV Platycladus orientalis 40 0 YLPM Pseudotaxus chienii 14 0 NKIN Thujopsis dolabrata 40 0 AREG Nothotsuga longibracteata 13 0 YYPE Austrocedrus chilensis 40 0 BTTS Austrotaxus spicata 12 0 QNGJ Cupressus dupreziana 39 0 ACWS Arucaria sp. 
11 0 AIGO Chamaecyparis lawsoniana 37 0 JZVE Parasitaxus usta 11 0 XIRK Athrotaxis cupressoides 34 0 WVWN Larix speciosa 9 0 CDFR Manoao colensoi 33 0 WWSS Taxus baccata 9 0 QSNJ Taiwania cryptomerioides 33 0 BBDD Microstrobos tzgeraldii 6 0 GKCZ Diselma archeri 32 0 ZYAX Taxus cuspidata 5 0 FRPM Calocedrus decurrens 31 0 HBGV Sequoia sempervirens 0 0 ZQVF Cunninghamia lanceolata 31 0 Cycadales (CYC) NRXL Metasequoia glyptostroboides 30 0 XZUY Cycas micholitzii 7 0 OWFC Halocarpus bidwillii 30 0 WLIC Dioon edule 4 0 ETCJ Pilgerodendron uviferum 29 0 KAWQ Stangeria eriopus 3 0 IIOL Pinus parviora 29 0 GNQG Encephalartos barteri 2 0 IOVS Pseudotsuga menziesii 29 0 Eudicotyledons (EUD) XQSG Microbiota decussata 29 0 MHYG Conzya canadensis 43 0 EFMS Torreya taxifolia 28 0 LRRR Teucrium chamaedrys 39 0 GAMH Tsuga heterophylla 28 0 GUMF Anthemis tinctoria 36 0 GGEA Cedrus libani 28 0 BMSE Senecio rowleyanus 33 0 AUDE Widdringtonia cedarbergensis 27 0 XQRV Ipomoea purpurea 32 0 FMWZ Dacrycarpus compactus 27 0 BAHE Solenostemon scutellarioides 31 0 PLYX Falcatifolium taxoides 27 0 EQDA Salvia spp. 31 0 DZQM Pinus radiata 26 0 DESP Erigeron speciosus 30 0 NVGZ Cephalotaxus harringtonia 26 0 BEKN Papaver rhoeas 28 0 QCGM Saxegothaea conspicua 26 0 CPOC Convolvulus arvensis 28 0 XMGP Juniperus scopulorum 26 0 ERIA Heuchera sanguinea 26 0 JDQB Neocallitropsis pancheri 25 0 FCCA Boswellia sacra 26 0 VGSX Retrophyllum minus 25 0 OLXF Hibiscus cannabinus 26 0 VSRH Abies lasiocarpa 25 0 PVGM Oncotheca balansae 26 0 HILW Acmopyle pancheri 23 0 UBLN Xanthuium strumarium 26 0 IAJW Amentotaxus argotaenia 23 0 WOHL Schlegelia parasitica 26 0 JBND Pinus ponderosa 23 0 YGCX Geranium maculatum 26 0 AWQB Picea engelmanii 22 0 CAQZ Symphoricarpos sp. 
25 0 JRNA Phyllocladus hypohyllus 22 0 DSUV Asclepia curassavica 25 0 MHGD Microcachrys tetragona 22 0 OKEF Hibbertia grossulariifolia 25 0 MIXZ Agathis robusta 22 0 TQOO Loropetalum chinense 25 0 XLGK Podocarpus rubens 22 0 WOBD Limonium spectabile 25 0 FHST Taxodium distichum 21 0 EYKJ Silybum marianum 24 0 OXGJ Glyptostrobus pensilis 21 0 FFFY Valeriana ocianalis 24 0 RMMV Callitris macleayana 21 0 LKKX Talinum sp. 24 0


Code Species P M Code Species P M NBMW Sanchezia sp. 24 0 QZZU Pyrenacantha malvifolia 18 0 NHAG Ipomoea nil 24 0 RHAU Chamaseyce mesebyranthemum 18 0 QCOU Papaver setigerum 24 0 THHD Elaeocarpus sylvestris 18 0 TEZA Solidago canadensis 24 0 YADI Asclepias syriaca 18 0 UHBY Oresitrophe rupifraga 24 0 YKQR Hamamelis virginiana 18 0 BNDE Hypericum perforatum 23 0 ZUHO Anemone hupenhensis 18 0 CWLL Schlegelia parasitica B 23 0 AQZD Flaveria pringlei 17 0 JETM Bauhinia tomentosa 23 0 CLRW Bacopa caroliniana 17 0 JYMN Flaveria brownii 23 0 EDBB Polyscias fruticosa 17 0 SMMC Chenopodium quinoa 23 0 FDMM Rosmarinus ocinalis 17 0 TXMP Linum strictum 23 0 GCFE Verbena hastata 17 0 VKJD Maesa lanceolata 23 0 JBGU Amaranthus palmeri 17 0 AJFN Mydocarpus sp. 22 0 JEXA Impatiens balsamifera 17 0 AYMT Eucalyptus leucoxylon 22 0 JTRM Dipsacus sativum 17 0 CYVA Cimicifuga racemosa 22 0 LLQV Portulaca cryptopetala 17 0 EMBR Ipomoea pubescens 22 0 LYPZ Flaveria pubescens 17 0 ERWT Ipomoea coccinea 22 0 OAGK Matricaria matricariodes 17 0 GETL Pogostemon sp. 22 0 OQBM Ipomoea indica 17 0 GRFT Buddleja sp. 22 0 OSMU Lycium sp. 17 0 JEPE Flaveria bidentis 22 0 RTTY Salvadora sp. 17 0 TQKZ Angelica archangelica 22 0 TLCA Oenothera speciosa 17 0 ZBPY Alternanthera brasileana 22 0 UTQR Tabebuia umbellate 17 0 DMLT Vitex agnus castus 21 0 XVRU Heliotropium calcicola 17 0 HTIP Paeonia lactiora 21 0 YGAT Phyllanthus sp. 
17 0 LQJY Solanum xanthocarpum 21 0 ZJRC Sambucus canadensis 17 0 PUCW Agastache rugosa 21 0 BFJL Cornus oridana 16 0 WHNV Micromeria fruticosa 21 0 BHYC Linum lewisii 16 0 YKZB Plantago maritima 21 0 BZDF Oenothera biennis 16 0 BJSW Cannabis sativa 20 0 COBX Polypremum procumbens 16 0 CWYJ Heracleum lanatum 20 0 INSP Myrica cerifera 16 0 DUQG Tanacetum parthenium 20 0 JSZD Bursera simaruba 16 0 EAAA Marrubium vulgare 20 0 LPGY Viola tricolor 16 0 EDEQ Wrightia natalensis 20 0 PCGJ Anisacanthus quadridas 16 0 GHLP Solanum dulcamara 20 0 QSKP Polansia trachysperma 16 0 IANR Rosa palustris 20 0 QSLH Ipomoea hederacea 16 0 QJXB Wikstroemia indica 20 0 RKLL Copaifera ocianalis 16 0 RQNK Papaver somniferum 20 0 RQUG Hakea prostrata 16 0 SNNC Leonurus japonicus 20 0 WBXY Oenothera elata 16 0 VXOD Linum tenuifolium 20 0 XPBC Galphimia gracilis 16 0 WQUF Nepenthes alata 20 0 ZBTA Boerhavia cf. spiderwort 16 0 ZCUA Flaveria trinervia 20 0 ALUC Ipomoea quamoclit 15 0 ARYD Oenothera laciniata 19 0 BLWH Portulaca mauii 15 0 EDXZ Schlegelia violacea 19 0 EILE Rhamnus japonica 15 0 HLJG Viburnum odoratissimum 19 0 EJCM Lindenbergia philippensis 15 0 KFZY Tragopogon castellanus 19 0 FUMQ Nepeta cataria 15 0 MQIV Phyla dulcis 19 0 FYUH Lavandula angustifolia 15 0 OHAE Polygala lutea 19 0 KPTE Bixa orellana 15 0 SCAO Mollugo nudicaulis 19 0 LWCK Lycium barbarum 15 0 SUVN Hedera helix 19 0 OEKO Heliotropium liforme 15 0 VKGP Geranium carolinianum 19 0 OHKC Alternanthera caracasana 15 0 YFQX Apocynum androsaemifolium 19 0 PDQH Aerva lanata 15 0 ZINQ Oenothera grandiora 19 0 PMTB Oenothera gaura 15 0 BIDT Flaveria cronquestii 18 0 QUTB Aextoxicon punctatum 15 0 BVOF Linum perenne 18 0 SALZ Pittosporum resiniferum 15 0 CKDK Ochna serrulata 18 0 WAXR Litchi chinensis 15 0 GGJD Strychnos spinosa 18 0 WEQK Centella asiatica 15 0 IDGE Heliotropium racemosum 18 0 WXVX Ledum palustre 15 0 IZNU Ipomoea lobata 18 0 AEPI Linum leonii 14 0 NJLF Viola canadensis 18 0 CFRN Carthamus lanatus 14 0 
PJSX Acacia pycnantha 18 0 DOVJ Leontopodium alpinum 14 0 PXYR Euphorbia pekinensis 18 0 FUPX Tragopogon pratensis 14 0


Code Species P M Code Species P M HNCF Linum hirsutum 14 0 RFSD Lactuca graminifolia 12 0 IMZV Oenothera grandis 14 0 RTNA Oxera pulchella 12 0 JNKW Cicerbita plumieri 14 0 SUAK Codariocalyx motorius 12 0 KWGC Crassula perforata 14 0 TJLC Nothofagus obliqua 12 0 MXFG Pinguicula agnata 14 0 TJQY Kerria japonica 12 0 MZLD Ligustrum sinense 14 0 UCNM Ajuga reptans 12 0 NEBM Syzygium macranthum 14 0 UEEN Forestiera segregata 12 0 NHUA Castanea crenata 14 0 UJGI Oenothera rosea 12 0 QKMG Phoradendron serotinum 14 0 UZWG Castanea pumila 12 0 TOKV Aphanopetalum resinosum 14 0 VNMY Bischoa javanica 12 0 TORX Olea europaea 14 0 WFBF Podophyllum peltatum 12 0 UDUT Larrea tridentata 14 0 WKCY Urtica dioica 12 0 UPZX Cleome viscosa 14 0 WQRD Galium boreale 12 0 VLNB Gompholobium polymorphum 14 0 XXYA Verbascum sp. 12 0 WEAC Strobilanthes dyerianus 14 0 YZGX Cyrilla racemiora 12 0 WRPP Synsepalum dulcicum 14 0 ZSSR Xanthicercis zambesiaca 12 0 XRCX Aster tataricus 14 0 MRKX Phytolacca bogotensis 11 1 YNUE Punica granatum 14 0 AIOU Brugmansia sanguinea 11 0 YRHD Antirrhinum braun 14 0 ATYL Scutellaria montana 11 0 AXPJ Linum avum 13 0 AXNH Oenothera liformis 11 0 BOLZ Atropa belladonna 13 0 CCID Akebia trifoliata 11 0 EZGR Portulaca oleracea 13 0 DLJZ Solanum ptychanthum 11 0 FCBJ Saxifraga stolonifera 13 0 EVOD Eschscholzia californica 11 0 GDZS Byblis gigantea 13 0 FXGI Stylidium adnatum 11 0 HJMP Astragalus membranaceus 13 0 GBVZ Thalictrum thalictroides 11 0 IHPC Platycodon grandiorus 13 0 HAEU Berberidopsis beckleri 11 0 IRAF Argemone mexicana 13 0 HMFE Allionia incarnata 11 0 IYDF Thymus vulgaris 13 0 IWIS Portulaca pilosa 11 0 JCMU Pinguicula caudata 13 0 JGAB Mirabilis jalapa 11 0 KTAR Chionanthus retusus 13 0 JHCN Oxalis sp. 
11 0 MKZR Nicotiana sylvestris 13 0 JKNQ Oenothera suulta suulta 11 0 NGRR Ternstroemia gymnanthera 13 0 KPUM Exacum ane 11 0 NTEO Pittosporum sahnianum 13 0 LDEL Portulaca amilis 11 0 NVSO Flaveria palmeri 13 0 MDJK Heliotropium texanum 11 0 OINM Hydrocotyle umbellata 13 0 MYVH Linum grandiorum 11 0 PTBJ Plantago virginica 13 0 MYZV Tropaeolum peregrinum 11 0 TVSH Bituminaria bituminosa 13 0 NAUM Ipomoea lindheimeri 11 0 UAXP Gyrostemon ramulosus 13 0 ODDO Ardisia humilis 11 0 XGFU Exocarpos cupressiformis 13 0 OODC Linum bienne 11 0 XVJB Morus nigra 13 0 PEZP Glycyrrhiza glabra 11 0 ZCDJ Acacia argyrophylla 13 0 POZS Linum usitatissimum 11 0 AFQQ Inula helenium 12 0 RJNQ Lagerstroemia indica 11 0 AUIP Phelline lucida 12 0 RSPO Santalum acuminatum 11 0 AYIY Ruellia brittoniana 12 0 SBZH Pennantia corymbosa 11 0 CBJR Atriplex rosea 12 0 SIZE Passiora caerulea 11 0 CLMX Escallonia rubra 12 0 SVVG Fagus sylvatica 11 0 CLNU Escallonia sp. cv. Newport 12 0 SWGX Tetrazygia bicolor 11 0 COAQ Malesherbia fasiculata 12 0 TPEM Platyspermation crassifolium 11 0 CTYH Basella alba 12 0 UPOG Anemone pulsatilla 11 0 DDRL Tragopogon dubius 12 0 XRLM Buddleja lindleyana 11 0 DIHD Heliotropium tenellum 12 0 YHFG Nandina domestica 11 0 FWBF Alangium chinense 12 0 YRBQ Flaveria angustifolia 11 0 HENI Quercus shumardii 12 0 AAXJ Atriplex prostrata 10 0 JPDJ Symplocus sp. 12 0 AHRN Cuscuta pentagonia 10 0 KCPT Linum macraei 12 0 BJKT Delosperma echinatum 10 0 KJAA Mollugo pentaphylla 12 0 CIAC Bergenia sp. 10 0 LAPO Draba oligosperma 12 0 DAYQ Mitella pentandra 10 0 LWDA Alnus serrulata 12 0 DCCI Calceolaria pinifolia 10 0 NUZN Cercidiphyllum japonicum 12 0 DXQW Juglans nigra 10 0 PCNH Psychotria marginata 12 0 FGDU Syzygium paniculatum 10 0 QAUE Actinidia chinensis 12 0 FYTP Daphniphyllum macropodum 10 0


Code Species P M Code Species P M GNPX Oxera neriifolia 10 0 HUQC Scaevola sp. 8 0 GSZA Lonicera japonica 10 0 IWMW Buxus sempervirens 8 0 GVCB Oenothera anis 10 0 IZLO Lobelia siphilitica 8 0 HBUQ Licania michauxii 10 0 JGYZ Holarrhena pubescens 8 0 JAFJ Bougainvillea spectabilis 10 0 KNMB Lathyrus sativus 8 0 KYAD Celtis occidentalis 10 0 LRTN Monotropa uniora 8 0 MGVU Allamanda cathartica 10 0 NNGU Coriaria nepalensis 8 0 MYMP Astragalus propinquus 10 0 ONLQ Atriplex hortensis 8 0 OMYK Trianthemum portulacastrum 10 0 OOVX Boykinia jamesii 8 0 SSDU Papaver bracteatum 10 0 OQHZ Quillaja saponaria 8 0 SXCE Physocarpus opulifolius 10 0 OTAN Deutzia scabra 8 0 TAGM Melissa ocinalis 10 0 QEHE Rauvola tetraphyla 8 0 UHJR Citrus x paradisi 10 0 QICX Ailanthus altissima 8 0 UQCB Portulaca molokaiensis 10 0 QURC Dichroa febrifuga 8 0 VGVI Dendropemon caribaeus 10 0 QXWF Flaveria kochiana 8 0 VMNH Sinapis alba 10 0 RUUB Physena madagascariensis 8 0 VYDM Orobanche fasciculata 10 0 RVGH Drypetes deplanchei 8 0 XSSD Amaranthus cruentus 10 0 SIBR Celsia arcturus 8 0 AALA Meliosma cuneifolia 9 0 SIIK Hakea drupaceae 8 0 ASMV Ilex vomitoria 9 0 SJAN Oenothera serrulata 8 0 AWJM Edgeworthia papyrifera 9 0 SLOI Tiarella polyphylla 8 0 BEFC Manilkara zapota 9 0 UWFU Itea virginica 8 0 DFYF Ilex sp. 9 0 XMBA Poliomintha bustamanta 8 0 DKFZ Mertensia paniculata 9 0 XNLP Manihot grahamii 8 0 INQX Salix acutifolia 9 0 ZBVT Chrysobalanus icaco 8 0 JTQQ Glycyrrhiza lepidota 9 0 ZETY Hydrangea quercifolia 8 0 KDCH Portulaca umbraticola 9 0 ZSGF Psychotria ipecacuanha 8 0 KEGA Glycine soja 9 0 ZTLR Rhizophora mangle 8 0 LVNW Cocculus laurifolius 9 0 ABEH Heliotropium greggii 7 0 OBTI Peganum harmala 9 0 BGZG Cissus quadrandularis 7 0 OUER Heliotropium convolvulaceum 9 0 BSEY Daenikera sp. 
7 0 PIYM Galax urceolata 9 0 CTSS Tellima breviora 7 0 PKMO Cistus inatus 9 0 CVDF Simmondsia chinensis 7 0 PPPZ Rhododendron scopulorum 9 0 DTNC Sinningia tuberosa 7 0 PUDI Daphne geraldii 9 0 EDIT Sessuvium ventricosum 7 0 PZRT Nelumbo nucifera 9 0 EJBY Anticharis glandulosa 7 0 QNOC Sanguisorba minor 9 0 EMAL Ehretia acuminata 7 0 QTJY Euptelea pleiosperma 9 0 FEDW Epilobium sp. 7 0 RKFX Cercis canadensis 9 0 FZQN Silene latifolia 7 0 RPPC Erythroxylum coca 9 0 GNRI Digitalis purpurea 7 0 TKEK Mansoa alliacea 9 0 HDSY Aerva persica 7 0 UYED Flaveria sonorensis 9 0 HGSM Gelsemium sempervirens 7 0 VFFP Acer negundo 9 0 HPNZ Oenothera longituba 7 0 VUSY Nyssa ogeche 9 0 IUSR Myriophyllum aquaticum 7 0 WGET Kochia scoparia 9 0 KGJF Tragopogon porrifolius 7 0 WWQZ Medinilla magnica 9 0 NMGG Hypecoum procumbens 7 0 XMVD Chelidonium majus 9 0 PSHB Lantana camara 7 0 ZHMB Krameria lanceolata 9 0 QIKZ Griselinia racemosa 7 0 AQGE Humulus lupulus 8 0 STKY Balanophora fungosa 7 0 DAAD Ardisia revoluta 8 0 SWPE Reseda odorata 7 0 DUNJ Helenium autumnale 8 0 SYHW Ribes a. giraldii 7 0 DZLN Oenothera picensis 8 0 TIUZ Cunonia capensis 7 0 ECTD Gentiana acaulis 8 0 UVDC Azadirachta indica 7 0 EHNF Dillenia indica 8 0 VVPY Croton tiglium 7 0 EZZT Passiora edulis 8 0 XHKT Sanguinaria canadensis 7 0 FAMO Conopholis americana 8 0 XISJ Anthirrhinum majus 7 0 FNEN Phlox sp. 8 0 XMQO Gunnera manicata 7 0 GIPR Aucuba japonica 8 0 YSRZ Fouqueria macdougalli 7 0 GIWN Sarcobatus vermiculatus 8 0 YUOM Rhus radicans 7 0 GLVK Salix eriocephala 8 0 AVJK Cavendishia cuatrecasasii 6 0 HKMQ Oenothera villaricae 8 0 BCAA Kirkia wilmsii 6 0

Table A.1: Continued

Code Species P M Code Species P M BERS Zaleya pentandra 6 0 UMUL Paulownia fargesii 5 0 BKQU Phytolacca americana 6 0 VQFW Platanus occidentalis 5 0 BLVL Sorbus koehneana 6 0 XKPS Ximenia americana 5 0 CSUV Cochlearea ocinalis 6 0 YHLF Oenothera rhombipetala 5 0 CWZU Betula pendula 6 0 ZSAB Hoheria angustifolia 5 0 DRIL Kalanchoe crenato diagremontiana 6 0 BWRK Alternanthera sessilis 4 0 EGOS Allionia spp. 6 0 CKKR Astilbe chinensis 4 0 EQYT Oenothera berlandieri 6 0 CPLT Portulaca grandiora 4 0 FONV Greyia sutherlandii 6 0 CUTE Blutoparon vermiculare 4 0 FROP Epifagus virginiana 6 0 CZPV Moringa oleifera 4 0 FWCQ Garcinia oblongiolia 6 0 GEHT Gleditsia triacanthos 4 0 GJNX Cypselea humifusum 6 0 GRRW Grevillea robusta 4 0 HDWF Francoa appendiculata 6 0 IHCQ Crossopetalum rhacoma 4 0 HXCD Flaveria vaginata 6 0 IXVJ Menyanthes trifoliata 4 0 HYZL Akania lucens 6 0 JNVS Datura metel 4 0 IDAU Oenothera elata hookeri 6 0 JWEY Heliotropium sp. 4 0 KVAY Trubulus eichlerianus 6 0 KBRW Oenothera clelandii 4 0 MFEA Stackhousia spathulata 6 0 KTWL Kaliphora madagascariensis 4 0 MZOB Heliotropium mendocinum 6 0 LNER Casuarina glauca 4 0 NCVK Prunus prostrata 6 0 LVUS Cleome violaceae 4 0 OLES Schiedea membranacea 6 0 MVSE Griselinia littoralis 4 0 OSIP Garcinia livingstonei 6 0 PTFA Jacquinia sp. 4 0 PHCE Prunella vulgaris 6 0 PTLU Staphylea trifolia 4 0 QACK Helwingia japonica 6 0 RJIM Melia azedarach 4 0 QIEH Cotoneaster transcaucasicus 6 0 RWKR Saintpaulia ionantha 4 0 RXEN Polycarpaea repens 6 0 SHEZ Dianthus sp. 4 0 SKNL Saponaria ocianalis 6 0 TVCU Ochna mossambicensis 4 0 TTRG Lupinus angustifolius 6 0 VCIN Malus baccata 4 0 TZWR Arabis alpina 6 0 WVMY Phlox drummondii 4 0 UOYN Catharanthus roseus 6 0 XFFT Cercocarpus ledifolius 4 0 VHZV Gleditsia sinensis 6 0 YKFU Peltoboykinia watanabei 4 0 VTLJ Caiophora chuquitensis 6 0 YQIJ Phacelia campanularia 4 0 WMLW Amaranthus retroexus 6 0 ZJUL Rhodiola rosea 4 0 WMUK Schizolaena sp. 
6 0 CJGZ Oenothera nana 3 0 WVEF Rhamnus caroliniana 6 0 CQMG Ulmus alata 3 0 WYIG Dombeya burgessiae 6 0 DLAI Solanum lasiophyllum 3 0 XAYK Kigelia africana 6 0 EYRD Alternanthera tenella 3 0 YNFJ Microtea debilis 6 0 FVXD Beta maritima 3 0 ZHEE Ziziphus jujuba 6 0 HBHB Aesculus pavia 3 0 ZRIN Uncarina grandidieri 6 0 HUSX Roridula gorgonias 3 0 ACFP Boehmeria nivea 5 0 IPWB Brassica nigra 3 0 ATFX Muntingia calabura 5 0 LFOG Salix purpurea 3 0 AZBL Petiveria alliacea 5 0 NFXV Mumea americana 3 0 BNTL Souroubea exauriculata 5 0 NMDZ Solanum sisymbriifolium 3 0 BXBF Draba sachalinensis 5 0 NXOH Apios americana 3 0 CPKP Lophophora williamsii 5 0 NXTS Mollugo verticillata 3 0 DHAW Geum quellyon 5 0 OXYP Sideroxylon reclinatum 3 0 EAVM Amelanchier canadensis 5 0 PAZJ Ricinus communis 3 0 FYSJ Polygonum convolvulus 5 0 RZTJ Salix fargesii 3 0 GCYL Portulaca suruticosa 5 0 SERM Sarcodes sanguinea 3 0 IEPQ Salix dasyclados 5 0 SZPD Tetrastigma voinierianum 3 0 KKDQ Salix viminalis 5 0 UDHA Ceratocapnos vesicaria 3 0 NIGS Heliotropium karwinsky 5 0 UGJI Lycopersicon cheesmanii 3 0 OCTM Begonia sp. 5 0 ULGV Morinda citrifolia 3 0 OWAS Rehmannia glutinosa 5 0 UVQL Draba magellanica 3 0 RBYC Elaeagnus pungens 5 0 VDKG Cleome gynandra 3 0 SFKQ Hilleria latifolia 5 0 VGHH Hydrastis canadensis 3 0 SQCF Dryas octopetala 5 0 VYGG Stachyurus praecox 3 0 SWOH Trochodendron araliodes 5 0 WWKL Tapiscia sinensis 3 0 SXML Ilex paraguariensis 5 0 AJJE Terminalia neotaliala 2 0 TJES Spergularia media 5 0 AXBO Sinojackia xylocarpa 2 0

Table A.1: Continued

Code Species P M Code Species P M BTZX Philadelphus inodorus 2 0 PQED Gloeochaete wittrockiana 0 0 CMFF Lupinus polyphyllus 2 0 WLZP Cyanophora paradoxa 0 0 DZTK Batis maritima 2 0 Gnetales (GNE) EDHN Ficus religiosa 2 0 TOXE Welwitschia mirabilis 5 0 HANM Pholisma arenarium 2 0 GTHK Gnetum montanum 4 0 HZTS Sessuvium portulacastrum 2 0 VDAO Ephedra sinica 2 0 JLOV Pereskia aculeata 2 0 Liliopsida (LIL) KVFU Diospyros malabarica 2 0 EFCZ Eleusine coracana 37 0 NJJO Pilostyles thunbergii 2 0 LEMW Curcuma olena 22 0 QZXQ Gymnocladus dioicus 2 0 WTDE Johnsonia pubescens 18 0 SLYR Cladrastis lutea 2 0 ZENX Neurachne alopecuroidea 18 0 SZUO Eucommia ulmoides 2 0 TRRQ Narcissus viridiorus 17 0 TDTF Salix sachalinensis 2 0 XZME Drakea elastica 17 0 VJPU Boerhavia dominnii 2 0 YPIC Microstegium vimineum 17 0 YZVJ Cephalotus follicularis 2 0 MTII Acorus americanus 15 0 ZGQD Corydalis linstowiana 2 0 RCUX Maianthemum sp. 15 0 ZPKK Aruncus dioicus 2 0 SOHV Panicum miliaceum A 15 0 AUGV Capnoides sempervirens 1 0 WCOR Thyridolepis multiculmis 15 0 CRNC Limnanthes douglassii 1 0 RQZP Nolina bigelorii 14 0 FAKD Nelumbo sp. 1 0 SILJ Talbotia elegans 14 0 HABV Draba aizoides 1 0 LTZF Sisyrinchium angustifolium 13 0 IKFD Quassia amara 1 0 MUMD Lomandra longifolia 13 0 JOIS Koeberlina spinosa 1 0 PRFO Agapanthus africanus 13 0 KZED Senna hebecarpa 1 0 CMCY Hesperaloe parviora 12 0 LQUX Gylcine soja 1 0 DGXS Freycinetia multiora 12 0 RNBN Mollugo cerviana 1 0 QNPH Colchicum autumnale 12 0 SMUR Lennoa madreporoides 1 0 VQYB Neurachne lanigera 12 0 QAIR Opuntia sp. 0 4 WBIB Lepidosperma gibsonii 12 0 AKRH Gylcine soja 0 0 XHHU Heliconia sp. 12 0 AXAF Thladiantha villosula 0 0 XUAB Paraneurachne muelleri 12 0 BBBA Tetrastigma obtectum 0 0 BLAJ Hemerocallis spp. 11 0 CGGO Plumbago auriculata 0 0 CNTZ Oncidium sphacelatum 11 0 DCVZ Plantago coronopis 0 0 JHUL Hemerocallis sp. 
11 0 DNQA Psychotria douarrei 0 0 LELS Haemaria discolor 11 0 DYFF Pycnanthemum tenuifolium 0 0 MTHW Platanthera clavellata 11 0 FPLR Gylcine soja 0 0 QOXT Xerophyta villosa 11 0 GIOY Corokia cotoneaster 0 0 TZNS Canna sp. 11 0 GTSV Draba hispida 0 0 UZXL Disporopsis pernyi 11 0 HRUR Utricularia sp. 0 0 YBML Yucca brevifolia 11 0 HSXO Ancistrocladus tectorius 0 0 DPFW Zephyranthes treatiae 10 0 HTDC Tamarix chinensis 0 0 KOFB Urginea maritima 10 0 JLLY Melaleuca quinquenervia 0 0 LSKK Orchidantha maxillaroides 10 0 LJQF Draba ossetica 0 0 NNOK Neurachne tenuifolia 10 0 LMVB Liquidambar styraciua 0 0 YXNR Triodia a. bynoei 10 0 NIJU Heteropyxis natalensis 0 0 BPKH Neurachne annularis 9 0 PFSA Gylcine soja 0 0 HOKG Nolina atopocarpa 9 0 PGKL Phellodendron amurense 0 0 HXJE Serenoa repens 9 0 RIDD Gylcine soja 0 0 ICNN Yucca lamentosa 9 0 RMWJ Wisteria oribunda 0 0 YJUG Curculigo sp. 9 0 TJMB Gylcine soja 0 0 BDJQ Zingiber ocinale 8 0 UZNH Curtisia dentata 0 0 BXAY Neurachne minor 8 0 VWIP Carya glabra 0 0 DMIN Phycella a. cyrtanthoides 8 0 XONJ Camptotheca acuminata 0 0 HATH Aristida stricta 8 0 XOOE Desmanthus illinoensis 0 0 IXEM Brodiaea sierrae 8 0 Euglenida (EUG) JDTY Rhodophiala pratensis 8 0 UNBZ Euglena sp. 0 0 MFIN Pistia stratioides 8 0 (GIN) ROEI Cymbopogon nardus 8 0 SGTW Ginkgo biloba 0 0 UOEL Strelitzia reginae 8 0 Glaucophyta (GLA) XBKS Thyridolepis mitchelliana 8 0 JKHA Cyanoptyche gloeocystis 0 0 AFLV Xerophyllum asphodeloides 7 0 POOW Glaucocystis cf. nostochinearum 0 0 JNUB Maranta leuconeura 7 0

Table A.1: Continued

Code Species P M Code Species P M JSAG Masdevallia yuangensis 7 0 KUXM Selaginella selaginoides 2 2 JVBR Aloe vera 7 0 LGDQ Selaginella apoda 2 2 KYNE Cyanella orchidofromis 7 0 ABIJ Selaginella lepidophylla 1 10 MWYQ Smilax bona nox 7 0 KJYC Selaginella willdenowii 1 5 OOSO Helonias bullata 7 0 CBAE Huperzia myrisinites 1 2 THDM Vanilla planifolia 7 0 XNXF Dendrolycopodium obscurum 1 2 VVVV Ludovia sp. 7 0 ENQF Lycopodium annotinum 1 1 XFJG Maianthemum canadense 7 0 GKAG Huperzia lucidula 1 1 YMES Typhonium blumei 7 0 ULKT Lycopodiella apressa 1 1 ZMGN Neurachne munroi 7 0 Magnoliidae (MAG) BSTR Chondropetalum tectorum 6 0 WBOD Magnolia grandiora 25 0 COCP Triglochin maritimum 6 0 XQWC Michelia maudiae 21 0 GZSF Uniola paniculata 6 0 XSZI Peperomia fraseri 20 0 IADP Deschampsia cespitosa 6 0 FALI Calycanthus oridus 19 0 KBXS Allium commutatum 6 0 OBPL Myristica fragrans 19 0 MVRF Sansevieria trifasciata 6 0 WAIL Laurelia sempervirens 17 0 OCWZ Dioscorea villosa 6 0 MAQO Gomortega keule 16 0 SVTS Drimia altissima 6 0 MUNP Piper auritum 16 0 XPAF Mapania palustris 6 0 ABSS Sassafras albidum 15 0 FGRF Asparagus densiorus 5 0 PSJT Uvaria microcarpa 14 0 KXSK Agave tequilana 5 0 WKSU Drimys winteri 12 0 LDME Amaryllis belladonna 5 0 PAWA Aristolochia elegans 11 0 LSJW Ruscus sp. 
5 0 SJEV Canella winterana 11 0 TCYS Peliosanthese minor 5 0 WPHN Idiospermum australiense 10 0 HWUP Sabal bermudana 4 0 YZRI Annona muricata 9 0 PLBZ Chlorogalum pomeridianum 4 0 VYLQ Cassytha liformis 8 0 SART Xeronema callistemon 4 0 BCGB Cinnamomum camphora 7 0 BRUD Typha latifolia 3 0 BSVG Gyrocarpus americanus 7 0 BYQM Posidonia australis 3 0 WIGA Persea borbonia 7 0 CIEA Juncus inexus 3 0 CSSK Houttuynia cordata 6 0 EMJJ Borya sphaerocephala 3 0 DHPO Eupomatia bennettii 6 0 FCEL Phormium tenax 3 0 QDVW Saruma henryi 5 0 ONBE Eriospermum lancifolia 3 0 WPYJ Frankenia laevis 4 0 VTUS Goodyera pubescens 3 0 KRJP Peumus boldus 3 0 BYPY Brocchinia reducta 2 0 OPDF Saururus cernuus 1 0 GDKK Gloriosa superba 2 0 KAYP Lindera benzoin 0 0 NSPR Nypa fruticans 2 0 (MAR) PPQR Typha angustifolia 2 0 KRUQ Porella navicularis 14 8 PWSG Cyperus papyrus 2 0 LFVP Marchantia paleacea 11 8 RDYY Cyanastrum cordifolium 2 0 JPYU Marchantia polymorpha 7 7 VBHQ Stemona tuberosa 2 0 RTMU Calypogeia ssa 7 4 WXNT Joinvillea ascendens 2 0 HPXA Ptilidium pulcherrimum 6 11 DWZT Sagittaria latifolia 1 0 TXVB Lunularia cruciata 5 9 GJPF Allium sativum 1 0 YBQN Odontoschisma prostratum 4 10 THEW Lilium sargentiae 1 0 IRBN Scapania nemorosa 4 8 ZKPF Traubia modesta 1 0 UUHD Porella pinnata 4 8 QGLJ Cocos nucifera 0 0 JHFI Pellia neesiana 3 16 RMVB Avena fatua 0 0 LGOW Schistochila sp. 3 3 Lycopodiophyta (LYC) CHJJ Lejeuneaceae sp. 2 14 ZZEI Phylloglossum drummondii 8 4 ILBQ Conocephalum conicum 2 9 ZFGK Selaginella kraussiana 5 8 PIUF Pellia cf. Epiphylla 2 9 PYHZ Isoetes sp. 
5 7 TGKW Frullania 2 8 ZZOL Selaginella stauntoniana 4 2 WJLO Riccia berychiana 2 4 YHZW Huperzia selago 3 7 YFGP Pallavicinia lyellii 2 4 PKOX Isoetes tegetiformans 3 5 HERT Sphaerocarpos texanus 2 3 UPMJ Pseudolycopodiella caroliniana 3 3 NRWZ Metzgeria crassipilis 2 1 ZYCD Selaginella acanthonota 3 3 WZYK Bazzania trilobata 2 0 WAFT Diphasiastrum digitatum 3 0 NWQC Plagiochila asplenioides 1 9 JKAA Selaginella wallacei 2 6 BNCU Radula lindenbergia 1 8 PQTO Lycopodium deuterodensum 2 4 FITN Treubia lacunose 1 6 GAON Huperzia squarrosa 2 2 OFTV Barbilophozia barbata 0 11

Table A.1: Continued

Code Species P M Code Species P M TFYI Marchantia emarginata 0 4 PNZO Culcita macrocarpa 0 2 AEXY Blasia sp. 0 1 QVMR Psilotum nudum 0 2 TFDQ Monoclea gottschei 0 0 RFRB Didymochlaena truncatula 0 2 Moniliformopses (MON) URCP Athyrium lix femina 0 2 UOMY Osmunda sp. 8 4 UWOD Plagiogyria japonica 0 2 CQPW Anemia tomentosa 6 12 GANB Cyathea spinulosa 0 1 MEKP Dipteris conjugata 5 5 HTFH Onoclea sensibilis 0 1 NDUV Vittaria appalachiana 4 1 RICC Cystopteris reevesiana 0 1 UJTT Pityrogramma trifoliata 3 7 TRPJ Hymenophyllum cupressiforme 0 1 NHCM Angiopteris evecta 3 5 BIVQ Osmundastrum cinnamomeum 0 0 VIBO Osmunda javanica 3 2 JVSZ Equisetum hymale 0 0 GSXD Myriopteris eatonii 2 11 TWFZ Crepidomanes venosum 0 0 ALVQ Tmesipteris parva 2 4 YKSS Osmunda regalis 0 0 XDVM Sticherus lobatus 2 0 Prorocentraceae (PRO) POPJ Pteris vittata 1 10 TZJQ Prorocentrum micans 0 0 YLJA Polypodium amorphum 1 9 Rhodophyta (RHO) XWDM Cystopteris fragilis 1 8 IKWM Gracilaria lemaneiformi 1 0 FQGQ Polystichum acrostichoides 1 5 FTRP Gracilaria chouae 0 1 WCLG Adiantum aleuticum 1 5 OBUY Porphyridium cruentum 0 1 QIAD Hymenophyllum bivalve 1 4 PVGP Porphyridium purpureum 0 1 ZXJO Hemionitis arifolia 1 3 BWVJ Betaphycus gelatinae 0 0 YIXP Lindsaea microphylla 1 2 CKXF Gymnogongrus ftabelliformis 0 0 KIIX Pilularia globulifera 1 1 IEHF Dumontia simplex 0 0 HNDZ Cystopteris utahensis 0 20 IHJY Kappaphycus alvarezii 0 0 RWYZ Blechnum spicant 0 15 IKIZ Grateloupia livida 0 0 NWWI Nephrolepis exaltata 0 13 JEBK Eucheuma denticulatum 0 0 XDDT Argyrochosma nivea 0 10 JJZR Rhodochaete parvula 0 0 YJJY Woodsia scopulina 0 10 LJPN Gracilaria blodgettii 0 0 UJWU Pleopeltis polypodioides 0 9 LLXJ Chroodactylon ornatu 0 0 BMJR Adiantum tenerum 0 8 PWKQ Prionitis divaricata 0 0 FCHS Deparia lobato 0 8 PYDB Sinotubimorpha guangdongensis 0 0 OQWW Davallia fejeensis 0 8 RSOF Glaucosphaera vacuolata 0 0 WGTU Leucostegia immersa 0 8 RTLC Rhodella violacea 0 0 CJNT Polypodium glycyrrhiza 0 7 SBLT Gloeopeltis 
furcata 0 0 DCDT Cheilanthes arizonica 0 7 UGPM Chondrus crispus 0 0 DJSE Ophioglossum vulgatum 0 7 URSB Grateloupia turuturu 0 0 JBLI Danaea sp. 0 7 UYFR Symphyocladia latiuscula 0 0 KJZG Asplenium platyneuron 0 7 VNAL Gracilaria asiatica 0 0 NOKI Lindsaea linearis 0 7 VZWX Ceramium kondoi 0 0 BEGM Botrypus virginianus 0 6 WEJN Mazzaella japonica 0 0 HEGQ Gymnocarpium dryopteris 0 6 XAXW Polysiphonia japonica 0 0 OCZL Homalosorus pycnocarpos 0 6 YSBD Heterosiphonia pulchra 0 0 PSKY Asplenium nidus 0 6 ZJOJ Grateloupia licina 0 0 UFJN Diplazium wichurae 0 6 ZULJ Porphyra yezoensis 0 0 CVEG Azolla cf. caroliniana 0 5 ORJE Cibotium glaucum 0 5 PBUU Lygodium japonicum 0 5 YOWV Cystopteris protrusa 0 5 EEAQ Sceptridium dissectum 0 4 YCKE Notholaena montieliae 0 4 ZRAV Polypodium hesperium 0 4 EWXK Thyrsopteris elegans 0 3 MROH Thelypteris acuminata 0 3 MTGC Dennstaedtia 0 3 SKYV Vittaria lineata 0 3 UXCS Marattia sp. 0 3 WQML Cryptogramma acrostichoides 0 3 YQEC Woodsia ilvensis 0 3 ZQYU Polypodium plectolens 0 3 CAPN Equisetum diusum 0 2 FLTD Pteris ensigormis 0 2

A.2 List of 519 Fungi Species

Table A.2: List of 519 screened fungal genomes and the number of PTPS and MTPS in each species. These data were downloaded from the fungal genomics portal (http://jgi.doe.gov/fungi).

Code JGI_id Species Clade α TRI5 α-α1 CPS/KS Total KMDG Denbi1 Dendrothele bispora CBS 962.96 v1.0 Agaricomycotina 33 5 0 1 39 QOEU Sphst1 Sphaerobolus stellatus v1.0 Agaricomycotina 28 1 0 0 29 FTUW Armga1 Armillaria gallica 21-2 v1.0 Agaricomycotina 20 7 0 0 27 RHJD Gymlu1 Gymnopus luxurians v1.0 Agaricomycotina 13 7 2 3 25 DECH Mar1 Marasmius ardii PR-910 v1.0 Agaricomycotina 17 8 0 0 25 PQBN Suihi1 Suillus hirtellus EM16 v1.0 Agaricomycotina 21 3 0 0 24 DCEF Leisp1 Leiotrametes sp BRFM 1775 v1.0 Agaricomycotina 17 6 0 0 23 CHIK Galma1 Galerina marginata v1.0 Agaricomycotina 23 0 0 0 23 RSVJ Corgl3 Cortinarius glaucopus AT 2004 276 v2.0 Agaricomycotina 17 5 0 0 22 MKIJ Clapy1 Clavicorona pyxidata HHB10654 v1.0 Agaricomycotina 21 0 0 0 21 PFFQ Gansp1 Ganoderma sp. 10597 SS1 v1.0 Agaricomycotina 13 7 0 0 20 ZCJM Laesu1 Laetiporus sulphureus var. sulphureus v1.0 Agaricomycotina 13 7 0 0 20 LKKY Exigl1 Exidia glandulosa v1.0 Agaricomycotina 19 1 0 0 20 PMTX Rhivi1 Rhizopogon vinicolor AM-OR11-026 v1.0 Agaricomycotina 11 8 0 0 19 OAGC Cerun2 Cerrena unicolor v1.1 Agaricomycotina 13 6 0 0 19 PMOD Antsi1 Antrodia sinuosa v1.0 Agaricomycotina 14 5 0 0 19 QHHM Mutel1 Mutinus elegans ME.BST v1.0 Agaricomycotina 16 3 0 0 19 NDTR Leumo1 Leucogyrophana mollusca KUC20120723A-06 v1.0 Agaricomycotina 17 2 0 0 19 EHFU Suigr1 Suillus granulatus EM37 v1.0 Agaricomycotina 17 2 0 0 19 FYAN Cersu1 Ceriporiopsis (Gelatoporia) subvermispora B Agaricomycotina 8 10 0 0 18 DGYS Obbri1 Obba rivulosa 3A-2 v1.0 Agaricomycotina 10 8 0 0 18 OTUO Xerba1 Xerocomus badius 84.06 v1.0 Agaricomycotina 12 6 0 0 18 URDR Gymch1 Gymnopilus chrysopellus PR-1187 v1.0 Agaricomycotina 13 5 0 0 18 NMWS Conpu1 Coniophora puteana v1.0 Agaricomycotina 16 2 0 0 18 QDCF Pycsa1 Pycnoporus sanguineus BRFM 1264 v1.0 Agaricomycotina 10 7 0 0 17 GBII Trave1 Trametes versicolor v1.0 Agaricomycotina 11 6 0 0 17 ZDTE Amamu1 Amanita muscaria Koide v1.0 Agaricomycotina 12 5 0 0 17 DOXV Guyne1 Guyanagaster necrorhiza MCA 
3950 v1.0 Agaricomycotina 12 5 0 0 17 FJVL Lenti7_1 Lentinus tigrinus ALCF2SS1-7 v1.0 Agaricomycotina 13 4 0 0 17 QMZJ Plicr1 Plicaturopsis crispa v1.0 Agaricomycotina 5 11 0 0 16 ZQJS Pycco1662_1 Pycnoporus coccineus CIRM1662 Agaricomycotina 9 7 0 0 16 SAKG Fompi3 Fomitopsis pinicola FP-58527 SS1 v3.0 Agaricomycotina 11 5 0 0 16 NWCG Ricme1 Peniophora a. cinerea v1.0 Agaricomycotina 11 5 0 0 16 FSVD PosplRSB12_1 Postia placenta MAD-698-R-SB12 v1.0 Agaricomycotina 12 4 0 0 16 XPNY Dicsq1 Dichomitus squalens v1.0 Agaricomycotina 12 4 0 0 16 JWAZ Fomme1 Fomitiporia mediterranea v1.0 Agaricomycotina 16 0 0 0 16 PXUU Sisni1 niveocremeum HHB9708 ss-1 1.0 Agaricomycotina 16 0 0 0 16 HMXP Sissu1 Sistotremastrum suecicum v1.0 Agaricomycotina 16 0 0 0 16 CKWM Triab1_1 Trichaptum abietinum v1.0 Agaricomycotina 16 0 0 0 16 PELB Artel1 Artolenzites elegans CIRM1663 v1.0 Agaricomycotina 8 7 0 0 15 OCCJ Armme1_1 Armillaria mellea Agaricomycotina 9 6 0 0 15 TDSN Daequ1 Daedalea quercina v1.0 Agaricomycotina 10 5 0 0 15 USCC PleosPC15_2 Pleurotus ostreatus PC15 v2.0 Agaricomycotina 10 5 0 0 15 JGOT Traci1 Trametes cingulata BRFM 1805 v1.0 Agaricomycotina 10 5 0 0 15 MBVZ Hetan2 Heterobasidion annosum v2.0 Agaricomycotina 11 4 0 0 15 OCJC Lopni1 Peniophora sp. CONTA v1.0 Agaricomycotina 11 4 0 0 15 JEWI Suibr1 Suillus brevipes v1.0 Agaricomycotina 11 4 0 0 15 FHKP Lenti6_1 Lentinus tigrinus ALCF2SS1-6 v1.0 Agaricomycotina 12 3 0 0 15 YVFV Sisbr1 Lentinus tigrinus v1.0 Agaricomycotina 12 3 0 0 15 BCXG Schpa1 Schizopora paradoxa KUC8140 v1.0 Agaricomycotina 15 0 0 0 15 BSDJ Punst1 Punctularia strigosozonata v1.0 Agaricomycotina 8 4 0 2 14 YBWF Wolco1 Wolporia cocos MD-104 SS10 v1.0 Agaricomycotina 8 6 0 0 14 PKON Pospl1 Postia placenta MAD 698-R v1.0 Agaricomycotina 9 5 0 0 14 TVNE PleosPC9_1 Pleurotus ostreatus PC9 v1.0 Agaricomycotina 9 5 0 0 14 QZVB Pycco1 Pycnoporus coccineus BRFM 310 v1.0 Agaricomycotina 9 5 0 0 14

Table A.2: Continued

Code JGI_id Species Clade α TRI5 α-α1 CPS/KS Total MRRS Panru1 Panus rudis PR-1116 ss-1 v1.0 Agaricomycotina 11 3 0 0 14 HEVO Polbr1 Polyporus brumalis BRFM 1820 v1.0 Agaricomycotina 11 3 0 0 14 ZQBK Glotr1_1 Gloeophyllum trabeum v1.0 Agaricomycotina 12 2 0 0 14 PCAN Stehi1 Stereum hirsutum FP-91666 SS1 v1.0 Agaricomycotina 13 1 0 0 14 MWLD Aurvu1 Auriscalpium vulgare FP105234-Sp v1.0 Agaricomycotina 14 0 0 0 14 SDNQ Aurde3_1 Auricularia subglabra v2.0 Agaricomycotina 9 4 0 0 13 DLNR Fibra1 Fibroporia radiculosa TFFH 294 Agaricomycotina 9 4 0 0 13 SUPT Boled1 Boletus edulis v1.0 Agaricomycotina 12 1 0 0 13 DBVV Neole1 Neolentinus lepideus v1.0 Agaricomycotina 12 1 0 0 13 QQOC Phlbr1 Phlebia brevispora HHB-7030 SS6 v1.0 Agaricomycotina 7 5 0 0 12 PFYJ Tulca1 Tulasnella calospora AL13/4D v1.0 Agaricomycotina 8 4 0 0 12 TDAC Hypsu1 Hypholoma sublateritium v1.0 Agaricomycotina 9 3 0 0 12 XPDP Tralj1 Trametes ljubarskyi CIRM1659 v1.0 Agaricomycotina 9 3 0 0 12 RIYR Polar1 Polyporus arcularius v1.0 Agaricomycotina 10 2 0 0 12 DKBC Jaaar1 Jaapia argillacea v1.0 Agaricomycotina 12 0 0 0 12 HZVO Ramac1 Ramaria rubella (R. 
acris) UT-36052-T v1.0 Agaricomycotina 7 4 0 0 11 XMTQ Elmca1 Aporpium caryae L-13461 Agaricomycotina 8 3 0 0 11 RJGY Suilu1 Suillus luteus UH-Slu-Lm8-n1 v1.0 Agaricomycotina 8 3 0 0 11 RFWF Cylto1 Cylindrobasidium torrendii v1.0 Agaricomycotina 9 2 0 0 11 NNLT Gyman1 Gymnopus androsaceus JB14 v1.0 Agaricomycotina 9 1 0 1 11 STXZ Hydpi2 Hydnomerulius pinastri v2.0 Agaricomycotina 10 1 0 0 11 WLYJ Macfu1 Macrolepiota fuliginosa v1.0 Agaricomycotina 10 1 0 0 11 RYSH Rhisa1 Rhizopogon salebrosus TDB-379 v1.0 Agaricomycotina 10 1 0 0 11 NCZF Onnsc1 Onnia scaura P-53A v1.0 Agaricomycotina 11 0 0 0 11 SUQO Paxam1 Paxillus ammoniavirescens Pou09.2 v1.0 Agaricomycotina 4 6 0 0 10 QTDW Phchr2 Phanerochaete chrysosporium RP-78 v2.2 Agaricomycotina 4 6 0 0 10 SUXE Trace1 Trametopsis cervina CIRM-BRFM 1824 v1.0 Agaricomycotina 6 4 0 0 10 OVWY Serla_varsha1 Serpula lacrymans var shastensis SHA21-2 v1.0 Agaricomycotina 7 1 0 2 10 AFVL Volvo1 Volvariella volvacea V23 Agaricomycotina 7 3 0 0 10 ZNEB Pycci1 Pycnoporus cinnabarinus BRFM 137 Agaricomycotina 8 2 0 0 10 XJPI SerlaS7_3_2 Serpula lacrymans S7.3 v2.0 Agaricomycotina 9 0 0 1 10 SSIY SerlaS7_9_2 Serpula lacrymans S7.9 v2.0 Agaricomycotina 9 0 0 1 10 RAHU Phaca1 Phanerochaete carnosa HHB-10118-Sp v1.0 Agaricomycotina 4 5 0 0 9 JWFA Paxin1 Paxillus involutus ATCC 200175 v1.0 Agaricomycotina 5 4 0 0 9 CNHF Bjead1_1 Bjerkandera adusta v1.0 Agaricomycotina 6 3 0 0 9 YLAB Ompol1 Omphalotus olearius Agaricomycotina 7 2 0 0 9 PTDW Lacam1 Laccaria amethystina LaAM-08-1 v1.0 Agaricomycotina 7 2 0 0 9 SAAH Lacbi2 Laccaria bicolor v2.0 Agaricomycotina 7 2 0 0 9 LNRR Thega1 Thelephora ganbajun P2 v1.0 Agaricomycotina 9 0 0 0 9 RRJW Agabi_varbur_1 Agaricus bisporus var. burnettii JB137-S8 Agaricomycotina 6 2 0 0 8 QVXG Fibsp1 Fibulorhizoctonia sp. 
CBS 109695 v1.0 Agaricomycotina 4 3 0 0 7 XWAP Paxru1 Paxillus rubicundulus Ve08.2h10 v1.0 Agaricomycotina 4 3 0 0 7 UZKW Panst_KUC8834_1_1 Panellus stipticus KUC8834 v1.1 Agaricomycotina 5 2 0 0 7 WOTK Agabi_varbisH97_2 Agaricus bisporus var bisporus (H97) v2.0 Agaricomycotina 6 1 0 0 7 CQIM Copci1 Coprinopsis cinerea Agaricomycotina 6 1 0 0 7 ETNA Copci_AmutBmut1 Coprinopsis cinerea AmutBmut pab1-1 v1.0 Agaricomycotina 6 1 0 0 7 OABX Phlgi1 Phlebiopsis gigantea v1.0 Agaricomycotina 6 1 0 0 7 RDKM Trima3 Tricholoma matsutake 945 v3.0 Agaricomycotina 6 1 0 0 7 NUVL Hebcy2 Hebeloma cylindrosporum h7 v2.0 Agaricomycotina 7 0 0 0 7 YDNI Pilcr1 Piloderma croceum F 1598 v1.0 Agaricomycotina 7 0 0 0 7 ONZG Pismi1 Pisolithus microcarpus 441 v1.0 Agaricomycotina 7 0 0 0 7 DJWQ Gyrli1 Gyrodon lividus BX v1.0 Agaricomycotina 5 1 0 0 6 ARTW Botbo1 Botryobasidium botryosum v1.0 Agaricomycotina 6 0 0 0 6 ZKED Pisti1 Pisolithus tinctorius Marx 270 v1.0 Agaricomycotina 6 0 0 0 6 CWSV Rhiso1 Rhizoctonia solani AG-1 IB Agaricomycotina 6 0 0 0 6 PDMQ Sclci1 Scleroderma citrinum Foug A v1.0 Agaricomycotina 6 0 0 0 6 FVKT Amath1 Amanita thiersii Skay4041 v1.0 Agaricomycotina 4 1 0 0 5 OMBC Fishe1 Fistulina hepatica v1.0 Agaricomycotina 4 0 0 0 4 YXNI Leugo1_1 Leucoagaricus gongylophorus Agaricomycotina 2 1 0 0 3 KXQT Calco1 Calocera cornea v1.0 Agaricomycotina 3 0 0 0 3 AEOE Calvi1 Calocera viscosa v1.0 Agaricomycotina 3 0 0 0 3

Table A.2: Continued

Code JGI_id Species Clade α TRI5 α-α1 CPS/KS Total KNQF Dacsp1 Dacryopinax sp. DJM 731 SSP1 v1.0 Agaricomycotina 3 0 0 0 3 FDSM Schco3 Schizophyllum commune H4-8 v3.0 Agaricomycotina 3 0 0 0 3 WSPI Schco_LoeD_1 Schizophyllum commune Loenen D v1.0 Agaricomycotina 3 0 0 0 3 XIIL Schco_TatD_1 Schizophyllum commune Tattone D v1.0 Agaricomycotina 3 0 0 0 3 PYHT Pirin1 Piriformospora indica DSM 11827 from MPI Agaricomycotina 1 1 0 0 2 JYHG CerAGI Ceratobasidium sp. (anastomosis group I, AG-I) v1.0 Agaricomycotina 2 0 0 0 2 BFSU ClaPMI390 Clavulina sp. PMI_390 v1.0 Agaricomycotina 2 0 0 0 2 EPBK Monpe1_1 Moniliophthora perniciosa FA553 Agaricomycotina 1 0 0 0 1 UAVJ Sebve1 Sebacina vermifera MAFF 305830 v1.0 Agaricomycotina 1 0 0 0 1 TEXE Walse1 Wallemia sebi v1.0 Agaricomycotina 1 0 0 0 1 YDCR Basun1 Basidioascus undulatus Agaricomycotina 0 0 0 0 0 IBEU Cryne_JEC21_1 Cryptococcus neoformans var neoformans JEC21 Agaricomycotina 0 0 0 0 0 FWMR Cryne_H99_1 Cryptococcus neoformans var. grubii H99 Agaricomycotina 0 0 0 0 0 BBWI Cryvi1 Cryptococcus vishniacii v1.0 Agaricomycotina 0 0 0 0 0 MSAX Diocr1 Dioszegia cryoxerica v1.0 Agaricomycotina 0 0 0 0 0 HMCI Hydru2 Hydnum rufescens UP504 v2.0 Agaricomycotina 0 0 0 0 0 GZLM Treme1 Tremella mesenterica Fries v1.0 Agaricomycotina 0 0 0 0 0 GHXA Trich1 Trichosporon chiarellii MYA-4694 v1.0 Agaricomycotina 0 0 0 0 0 KWWA Triol1 Trichosporon oleaginosus IBC0246 v1.0 Agaricomycotina 0 0 0 0 0 CYOI Walic1 Wallemia ichthyophaga EXF-994 Agaricomycotina 0 0 0 0 0 BXWZ Catan1 Catenaria anguillulae PL171 v1.0 Blastocladiomycota 0 0 0 0 0 SKYA Batde5 Batrachochytrium dendrobatidis JAM81 v1.0 Chytridiomycota 0 0 0 0 0 QRGY Ganpr1 Gonapodya prolifera v1.0 Chytridiomycota 0 0 0 0 0 LQKN Rozal1_1 Rozella allomycis CSF55 Cryptomycota 0 0 0 0 0 DGLV Corca1 Corynespora cassiicola CCP v1.0 Dothideomycetes 2 1 1 5 9 PINU Cocvi1 Cochliobolus victoriae FI3 v1.0 Dothideomycetes 3 1 3 2 9 DKTZ CocheC4_1 Cochliobolus heterostrophus C4 v1.0 Dothideomycetes 
4 0 4 1 9 OCCK CocheC5_3 Cochliobolus heterostrophus C5 v2.0 Dothideomycetes 4 0 4 1 9 AXMY Bysci1 Byssothecium circinans CBS 675.92 v1.0 Dothideomycetes 2 1 0 5 8 TNXD Cocca1 Cochliobolus carbonum 26-R-13 v1.0 Dothideomycetes 3 0 2 3 8 JMZC CocheC5_1 Cochliobolus heterostrophus C5 Dothideomycetes 3 0 4 1 8 XJJN Amnli1 Amniculicola lignicola CBS 123094 v1.0 Dothideomycetes 2 0 0 5 7 HIBM Melpu1 Melanomma pulvis-pyrius v1.0 Dothideomycetes 2 0 0 5 7 NWTK Cocsa1 Cochliobolus sativus ND90Pr v1.0 Dothideomycetes 3 1 2 1 7 XAAG Psehy1 Pseudovirgaria hyperparasitica CBS 121739 v1.0 Dothideomycetes 4 0 1 2 7 FKOG Tryvi1 Trypethelium eluteriae v1.0 Dothideomycetes 5 0 0 2 7 ZFEH Botdo1_1 Botryosphaeria dothidea Dothideomycetes 6 0 1 0 7 HCWQ Linin1 Lindgomyces ingoldianus ATCC 200398 v1.0 Dothideomycetes 2 0 0 4 6 AEZQ Aaoar1 Aaosphaeria arxii CBS 175.79 v1.0 Dothideomycetes 3 0 0 3 6 UMGD Rhyru1_1 Rhytidhysteron rufulum Dothideomycetes 3 0 0 3 6 SYTJ Neopa1 Neofusicoccum parvum UCRNP2 Dothideomycetes 4 0 1 1 6 FXQO Ophdi1 Ophiobolus disseminans CBS 113818 v1.0 Dothideomycetes 5 0 0 1 6 CGEC Zoprh1 Zopa rhizophila v1.0 Dothideomycetes 0 1 0 4 5 ERHT Lopma1 Lophiostoma macrostomum v1.0 Dothideomycetes 2 0 1 2 5 RDUF Cocmi1 Cochliobolus miyabeanus ATCC 44560 v1.0 Dothideomycetes 2 0 1 2 5 LQLG Karrh1 Karstenula rhodostoma CBS 690.94 v1.0 Dothideomycetes 2 0 0 3 5 FSCX Aplpr1 Aplosporella prunicola CBS 121.167 v1.0 Dothideomycetes 3 2 0 0 5 XLMY Len1 Lentithecium uviatile v1.0 Dothideomycetes 1 1 0 2 4 XMUQ Trepe1 Trematosphaeria pertusa CBS 122368 v1.0 Dothideomycetes 1 1 0 2 4 USYT Glost2 Glonium stellatum CBS 207.34 v1.0 Dothideomycetes 1 0 1 2 4 EGUL Plesi1 Pleomassaria siparia v1.0 Dothideomycetes 1 0 1 2 4 RKGA Dotsy1 Dothidotthia symphoricarpi v1.0 Dothideomycetes 2 1 0 1 4 TOST Altbr1 Alternaria brassicicola Dothideomycetes 3 0 1 0 4 CWHF Stasp1 Stagonospora sp. 
SRC1lsM3a v1.0 Dothideomycetes 3 1 0 0 4 ZLLL Cenge3 Cenococcum geophilum 1.58 v2.0 Dothideomycetes 0 0 1 2 3 XTCE Polfu1 Polyplosphaeria fusca CBS 125425 v1.0 Dothideomycetes 0 0 0 3 3 QGZN Pyrtt1 Pyrenophora teres f. teres Dothideomycetes 1 0 1 1 3 JOFG Bauco1 Baudoinia compniacensis UAMH 10762 (4089826) v1.0 Dothideomycetes 1 1 1 0 3 YXFC Bimnz1 Bimuria novae-zelandiae CBS 107.79 v1.0 Dothideomycetes 1 0 1 1 3 JVUN Stano2 Stagonospora nodorum SN15 v2.0 Dothideomycetes 1 1 0 1 3 MORC Coclu2 Cochliobolus lunatus m118 v2.0 Dothideomycetes 2 0 1 0 3

Table A.2: Continued

Code JGI_id Species Clade α TRI5 α-α1 CPS/KS Total
JSQJ Macph1 Macrophomina phaseolina MS6 Dothideomycetes 2 0 1 0 3
WIWL Altal1 Alternaria alternata SRC1lrK2f v1.0 Dothideomycetes 3 0 0 0 3
KROL Sepmu1 Septoria musiva SO2202 v1.0 Dothideomycetes 3 0 0 0 3
CYWZ Seppo1 Septoria populicola v1.0 Dothideomycetes 3 0 0 0 3
OWZC Delco1 Delitschia confertaspora ATCC 74209 v1.0 Dothideomycetes 0 0 1 1 2
GEAK Mycfi2 Mycosphaerella fijiensis v2.0 Dothideomycetes 0 0 1 1 2
TQZK Setho1 Setomelanomma holmii CBS 110217 v1.0 Dothideomycetes 0 1 0 1 2
SCZI Spofi1 Sporormia fimetaria v1.0 Dothideomycetes 0 0 0 2 2
FARK Venpi1 Venturia pirina Dothideomycetes 0 1 0 1 2
KMFJ Aurpu_var_mel1 Aureobasidium pullulans var. melanogenum CBS 110374 Dothideomycetes 1 1 0 0 2
KCAO Clael1 Clathrospora elynae CBS 161.51 v1.0 Dothideomycetes 1 0 0 1 2
RHMD Decga1 Decorospora gaudefroyi v1.0 Dothideomycetes 1 0 0 1 2
MHQQ Disac1 Dissoconium aciculare v1.0 Dothideomycetes 1 1 0 0 2
VWIU Leppa1 Lepidopterella palustris v1.0 Dothideomycetes 1 0 0 1 2
VRZF Maseb1 Massarina eburnea CBS 473.64 v1.0 Dothideomycetes 1 0 0 1 2
EBZG Mycgr3 Mycosphaerella graminicola v2.0 Dothideomycetes 1 0 0 1 2
TIPI Settu1 Setosphaeria turcica Et28A v1.0 Dothideomycetes 1 1 0 0 2
CZYK Settur1 Setosphaeria turcica NY001 v1.0 Dothideomycetes 1 1 0 0 2
UFYD Ternu1 Teratosphaeria nubilosa CBS 116005 v1.0 Dothideomycetes 1 0 0 1 2
WTQZ Hyspu1_1 Hysterium pulicare Dothideomycetes 2 0 0 0 2
CIVI Lepmu1 Leptosphaeria maculans Dothideomycetes 2 0 0 0 2
XDMT Lopmy1 Lophium mytilinum CBS 269.34 v1.0 Dothideomycetes 2 0 0 0 2

Vita

Qidong Jia was born in Zhengzhou, the provincial capital of Henan Province. He obtained his Bachelor’s degree in plant protection and his Master’s degree in bioinformatics at Nanjing Agricultural University, China. He was then accepted to the Graduate School of Genome Science and Technology, a multidisciplinary joint program between the University of Tennessee, Knoxville and Oak Ridge National Laboratory, to further pursue his research in the field of bioinformatics and computational biology. He also earned a Master’s degree in Statistics and a Minor in Computational Science at the University of Tennessee.
