Structure and Evolution of Linalool Synthase

Leland Cseke, Natalia Dudareva,1 and Eran Pichersky Department of Biology, University of Michigan

Plant terpene synthases constitute a group of evolutionarily related . Within this group, however, enzymes that employ two different catalytic mechanisms, and their associated unique domains, are known. We investigated the structure of the gene encoding linalool synthase (LIS), an that uses geranyl pyrophosphate as a and catalyzes the formation of linalool, an acyclic monoterpene found in the ¯oral scents of many plants. Although LIS employs one catalytic mechanism (exempli®ed by limonene synthase [LMS]), it has sequence motifs indicative of both LMS-type synthases and the terpene synthases employing a different mechanism (exempli®ed by copalyl diphosphate synthase [CPS]). Here, we report that LIS genes analyzed from several species encode proteins that have overall 40%±96% identity to each other and have 11 introns in identical positions. Only the region encoding roughly the last half of the LIS gene (exons 9±12) has a gene structure similar to that of the LMS-type genes. On the other hand, in the ®rst part of the LIS gene (exons 1±8), LIS gene structure is essentially identical to that found in the ®rst half of the gene encoding CPS. In addition, the level of similarity in the coding information of this region between the LIS and CPS genes is also signi®cant, whereas the second half of the LIS protein is most similar to LMS-type synthases. Thus, LIS appears to be a composite gene which might have evolved from a recombination event between two different types of terpene synthases. The combined evolutionary mechanisms of duplication followed by divergence and/or ``domain swapping'' may explain the extraordinarily large diversity of proteins found in the plant terpene synthase family.

Introduction Terpenes are among the most widespread and alyzes the formation of limonene, a monoterpenoid de- chemically diverse groups of compounds produced by fense compound. So far, LMS (a protein with approxi- plants. They act in a large number of roles, e.g., hor- mately 600 amino acids) is one of the few terpene syn- mones, mediators of polysaccharide synthesis, photo- thases whose gene has been characterized for diverse synthetic pigments, electron carriers, and membrane species. The deduced amino acid sequences of LMSs components, among other essential functions within the from various angiosperms are 40%±65% identical to plant (Chappell 1995; McGarvey and Croteau 1995). each other, but share only ϳ28% amino acid sequence They also mediate interactions between plants and their identity with LMS from the gymnosperm Abies grandis environment: some serve as toxic defense compounds (Bohlmann, Steel, and Croteau 1997). Signi®cant levels to ward off herbivores or fungal attacks (Johnson and of interspeci®c sequence identity are also seen with co- Croteau 1987; Lewinsohn, Gijzen, and Croteau 1992; palyl diphosphate synthase (CPS, formerly ent-kaurene Nebeker, Hodges, and Blance 1993), while others are synthase A). CPS (a protein with approximately 800 res- volatile compounds released from the plants to attract idues), together with the similarly sized ent-kaurene syn- pollinators (Dobson 1993; Knudsen and Tollsten 1993). thase (KS, formerly ent-kaurene synthase B), is respon- Terpenoid compounds (also called isoprenoids) are sible for the production of ent-kaurene (the precursor of made from the condensation of one molecule of dime- gibberellin-type plant hormones) from GGPP (Yama- thylallyl pyrophosphate (DMAPP) and one or more mol- guchi et al. 1996). The deduced amino acid sequences ecules of isopentyl pyrophosphate (IPP), to give geranyl of CPS from A. thaliana, tomato, pea, pumpkin, and diphosphate (GPP), farnesyl diphosphate (FPP), or ger- corn species are only 45%±50% identical (Sun and Ka- anylgeranyl diphosphate (GGPP) (Gershenzon and Cro- miya 1994, 1997; Bensen et al. 1995). teau 1993). Practically all other terpenes are derived Certain sequence motifs shared by all plant terpene from these three precursors. The enzymes that use GPP, synthases suggest that they share a common evolution- FPP, or GGPP as substrates are termed terpene synthases ary origin (McGarvey and Croteau 1995; Bohlmann, or, when the formed is a cyclic compound, ter- Meyer-Gauen, and Croteau 1998). Interestingly, the lev- pene cyclases. The mechanisms of action and of the evo- el of sequence identity between ``same'' synthases (i.e., lution of these enzymes are of particular interest for an enzymes with the same substrate and same product) in understanding of the many plant processes in which ter- different species can be lower than that between two penes are involved. different synthases. This is clearly shown in the phylo- Research on terpenes synthesis has focused on the genetic analysis of the terpene synthase cyclases, such as limonene synthase (LMS), which cat- carried out by Croteau and co-workers (Bohlmann, Meyer-Gauen, and Croteau 1998), in which it was found 1 Present address: Horticulture Department, Purdue University. that the same terpene synthases (e.g., LMSs) from dif- ferent plant species belonged in two different subfami- Key words: domain swapping, evolution, ¯oral scent, gene struc- ture, isoprenoids, plants, terpene synthases, secondary metabolism. lies, and each of these subfamilies also contained en- Address for correspondence and reprints: Eran Pichersky, De- zymes whose substrates and products were quite differ- partment of Biology, University of Michigan, Ann Arbor, Michigan ent. 48109-1048. E-mail: [email protected]. Gene structure also points to the common ancestry Mol. Biol. Evol. 15(11):1491±1498. 1998 of many terpene synthases. LMS, 5EAS (5-epi-aristo- ᭧ 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038 lochene synthase), VES (vetispiradiene synthase), and

1491 1492 Cseke et al.

CAS (casbene synthase) have introns in identical posi- cies (Kaiser 1991; Knudsen, Tollsten, and Bergstrom tions (Facchini and Chappell 1992; Mau and West 1994; 1993). A cDNA encoding linalool synthase (LIS1, for- Back and Chappell 1995; Yuba et al. 1996). However, merly LIS) was previously isolated and characterized CPS gene structure differs substantially from that of oth- from a C. breweri ¯oral cDNA library (Dudareva et al. er terpene cyclases (Sun and Kamiya 1994; Bensen et 1996). As part of our investigation to understand how al. 1995). It is noteworthy that the deduced amino acid the LIS gene promoter in C. breweri is differentially sequences of CPSs share only 20%±31% identity with regulated in several ¯ower parts during ¯oral develop- other terpene synthases, including KS (ϳ26%±31% with ment, we isolated the LIS gene from two linalool-scent- KS, whose complete gene structure is not currently ed Onagraceae species, C. breweri and Oenothera ari- known) (Sun and Kamiya 1994; Yamaguchi et al. 1996, zonica, and from a nonscented one, Clarkia concinna, 1998). Their substantial sequence divergence from other which has nonetheless been shown to express its LIS terpene cyclases has resulted in the placement of the gene at a low level in the stigma (Pichersky et al. 1994; CPS proteins in a separate subfamily in the classi®cation Dudareva et al. 1996). In comparing the structures of scheme proposed by Bohlmann, Steele, and Croteau these LIS genes (and that of Arabidopsis thaliana, which (1997) and Bohlmann, Meyer-Gauen, and Croteau was independently obtained) with the structures of other (1998). terpene synthase genes, we discovered that the structure Although they are all evolutionarily related, mech- of LIS indicates that it is a composite gene. The results anistically the plant terpene synthases seem to fall into of these comparisons, and their implications for the evo- two main catergories. One class, exempli®ed by LMS, lution of the terpene gene family, are presented here. employs a mechanism involving the ionization of the substrate allylic diphosphate to initiate the reaction (Vo- Materials and Methods gel et al. 1996). A second class, exempli®ed by CPS, Construction of Partial Genomic Libraries from C. employs a mechanism involving substrate double-bond breweri, C. concinna, and O. arizonica protonation (Vogel et al. 1996). An important feature of the ®rst class of terpene synthases is the DDXXD motif DNAs were digested with EcoRI or EcoRV and found in the C-terminal halves of these proteins and separated on 0.7% agarose gels. Fragments in the ap- shown to be involved in metal binding and ca- propriate range (as previously determined by Southern talysis (Ashby and Edwards 1990; Tarshis et al. 1994; blot analysis with an LIS1 probe) were cut out and elut- Starks et al. 1997; Lesburg et al. 1997). CPS does not ed from the gels. In the case of EcoRV-generated frag- contain such a motif, but instead has an aspartate-rich ments, EcoRI/NotI adaptors (Pharmacia Biotech) were motif in its N-terminal half that is also believed to be ligated to the blunt ends of these fragments. Partial ge- involved in metal cofactor binding (Sun and Kamiya nomic libraries were then constructed and screened us- 1994). Thus, the classi®cation based on levels of amino ing the Lambda ZAP II/EcoRI/CIAP Cloning Kit (Stra- acid sequence similarities among various terpene syn- tagene) according to manufacturer's instructions. Li- thases, which places CPS in a separate subfamily, is braries were screened with LIS1 as a probe under high- consistent with its distinct mechanism. stringency conditions (65ЊC hybridization temperature) However, some terpene synthases, such as abieta- for the C. breweri library and under low-stringency con- diene synthase (ABS) and linalool synthase (LIS), are ditions (58ЊC hybridization temperature) for O. arizon- not easily classi®ed based on amino acid sequence com- ica library, with all other conditions being identical to parisons alone. This is so because they have higher se- the ones described in Dudareva et al. (1996). quence similarity to CPS in their N-terminal parts but Isolation of C. breweri LIS2 cDNA display higher similarity to LMS-type synthases in their A C. breweri ¯oral cDNA library (Dudareva et al. C-terminal parts. Therefore, it has previously been sug- 1996) was screened with a probe derived from LIS1(a gested that ABS and LIS (both proteins have similar fragment containing the ®rst 2.2 kb of the coding region) sizes of 868±886 residues but are placed in different using low-stringency conditions (58ЊC hybridization subfamilies based on their amino acid sequences) are of temperature). Screening and characterization were done composite origin (Dudareva et al. 1996; Vogel et al. as previously described (Dudareva et al. 1996). 1996). It is important to note, however, that assessment of the evolutionary relatedness of enzymes with differ- GenBank Searches and Sequence Analysis ent activities based on amino acid sequence comparisons The A. thaliana LIS gene (accession number alone may be misleading, because selective pressures AC002294) was found in a BLAST search of GenBank may cause different rates of evolution in corresponding using the LIS1 protein sequence as the query. Introns parts of the coding regions in the various gene lineages, within the Arabidopsis LIS gene sequence, as well as depending on the function of each region in the different the LIS genomic sequences of C. breweri, C. concinna, enzymes. and O. arizonica, were identi®ed using the C. breweri We have been studying the biosynthesis of linalool, LIS1 cDNA sequence for comparison. an acyclic monoterpene alcohol found in the ¯oral scents of Clarkia breweri (Pichersky et al. 1994; Pich- Sequence Comparisons ersky, Lewinsohn, and Croteau 1995; Raguso and Pich- Sequence alignments were performed using the ersky 1995) and Oenothera spp. (all in the family On- ClustalW1.7 program (http;//dot.imagen.bcm.tmc:9331/ agraceae), as well as in those of many other plant spe- multi-align.html). Linalool Synthase Evolution 1493

Northern Blots of C. breweri LIS1 and LIS2 mRNAs Total RNA samples from vegetative and ¯oral or- gans of C. breweri were prepared using the Qiagen RNeasy௢ mRNA extraction kit and run on vertical aga- rose gels under the same conditions described by Du- dareva et al. (1996). Northern blots were probed with LIS1- and LIS2-speci®c probes under high-stringency conditions, and results were quanti®ed and normalized to equal amounts of 18S RNA as previously described (Dudareva et al. 1996). The LIS1-speci®c probe was de- rived from the 5Ј region of the LIS1 cDNA (the ®rst 650 nt of the coding region), and had ϳ70% identity with the corresponding region in LIS2. The LIS2-speci®c probe was derived from the 5Ј region of the LIS2 cDNA (the ®rst 500 nt of the coding region) and was also ϳ70% identical with the corresponding region in LIS1. Test blots with cloned DNAs indicated that the probes did not cross-hybridize under high-stringency condi- FIG. 1.ÐComparison of LIS1 (black columns) and LIS2 (stippled tions. columns) mRNA levels in various C. breweri ¯oral organs and in vegetative (leaf) tissue as determined by northern blot analysis. The Antibody Production and Western Blots relative abundance of transcripts is set on an arbitrary scale for com- parison purposes. Results are averages of two experiments. An LIS1 cDNA fragment (codons 362±653) and an LIS2 cDNA fragment (codons 394±789) were expressed in Escherichia coli using the pET-T7 (Studier et al. 1990) and pTRC99A (Pharmacia) expression systems, sequence with high levels of similarity to the C. breweri respectively. The resulting polypeptides were puri®ed on LIS sequences in the Arabidopsis thaliana genome pro- SDS-PAGE and used to produce antibodies in rabbits ject databank (accession numbers AC002294 and (Cocalico Inc., Reamston, Pa.). Western blots with anti- AF067605). LIS1 and anti-LIS2 antibodies were performed as de- scribed by Dudareva et al. (1996) using a 1:4,000 di- Comparison of LIS1 and LIS2 Expression in C. lution for both sera. breweri The tissue-speci®c, developmentally regulated ex- Results pression of C. breweri LIS1 was reported earlier (Du- Isolation of Plant LIS Sequences dareva et al. 1996). Examination of the expression of LIS2 by northern blot analysis using a gene-speci®c The isolation of a C. breweri LIS1 cDNA (acces- sion number U58314) was previously reported (Dudar- probe indicates that its expression pattern is quite dif- eva et al. 1996). It was derived from the LIS1 gene, ferent from and, overall, much lower than that of LIS1 which was shown to be expressed in ¯oral tissues that (®g. 1). As previously reported (Dudareva et al. 1996), emit linalool (Dudareva et al. 1996). For this study, we in freshly opened ¯owers the highest levels of LIS1 isolated a genomic DNA fragment that corresponds to mRNA transcripts are found in the stigma, followed by the ®rst half of the coding region of LIS1 as well as 1.5 the petal, style, stamen, and sepal tissues, and no tran- kb of the 5Ј region upstream of the coding region (ac- scripts are seen in vegetative tissues. LIS2, on the other cession number AF067601). Using the LIS1 cDNA as a hand, has mRNA levels comparable to those of LIS1 probe in low-stringency hybridizations, we isolated an only in petals, with much lower levels detectable in the additional cDNA, LIS2 (accession number AF067603), stigma, styles, sepals, and stamens. As with LIS1, no encoding a protein distinct from LIS1. We also isolated LIS2 transcripts are found in vegetative tissues. a complete LIS genomic sequence from C. concinna (ac- To test for the presence of LIS2 protein, antibodies cession number AF067602), the proposed progenitor of against the LIS2 protein that reacted with E. coli±ex- C. breweri (Lewis 1962; MacSwain, Raven, and Thorp pressed LIS2 but not with LIS1 were produced (data not 1973; Raguso and Pichersky 1995), using the C. breweri shown). However, the anti-LIS2 antibodies did not de- LIS1 cDNA as a probe. Clarkia concinna ¯owers are tect protein in any ¯oral (or vegetative) tissue, whereas known to express LIS at very low levels only in stigma the anti-LIS1 antibodies did detect LIS protein (Dudar- tissue and to emit 1,000-fold less linalool than C. brew- eva et al. 1996 and additional data not shown). These eri ¯owers (Pichersky et al. 1994; Dudareva et al. 1996). results suggest that LIS2 mRNA is produced in ¯oral Another relative of C. breweri, O. arizonica, produces tissues but that its translation results in very low levels copious amounts of linalool, and a partial LIS genomic of protein or perhaps in a dysfunctional enzyme that is clone from O. arizonica (roughly corresponding to the rapidly degraded, consistent with our previous results in second half of the gene) was isolated and characterized which only the LIS1 protein could be isolated and iden- (accession number AF067604), again using the LIS1 ti®ed from C. breweri ¯owers (Pichersky, Lewinsohn, cDNA as a probe. Finally, we have identi®ed a genomic and Croteau 1995; Dudareva et al. 1996). 1494 Cseke et al.

FIG. 2.ÐComparison of promoter sequences from C. breweri and C. concinna LIS genes immediately upstream of the initiating ATG codon (asterisks). A dot indicates a gap introduced to maximize identity (black boxes). The arrow indicates the position of the ®rst nucleotide of the LIS1 transcript. Underlined areas indicate possible CAATT and TATA boxes. The ®rst nucleotide of the transcript is numbered ϩ1.

Comparison of the Promoter Sequences of C. breweri Structures of LIS Genes from Different Species LIS1 and C. concinna LIS The C. concinna LIS gene contains 11 introns. The The transcriptional start site for C. breweri LIS1 A. thaliana gene tentatively identi®ed as LIS also con- was previously located 53 nt upstream of the transla- tains 11 introns in identical positions within the gene tional start site (Dudareva et al. 1996). A possible TATA (illustrated in ®g. 3). In addition, the ®rst six introns (I± box for this gene is located 25 nt upstream of the tran- VI) of the partial C. breweri LIS genomic clone are in scriptional start site, and a possible CAATT box is lo- positions identical to those in the full-length LIS geno- cated 147 nt upstream of the translational start site (®g. mic clones of C. concinna and A. thaliana, and the same 2). When the upstream sequences of the C. breweri and is true for the last ®ve introns (VII±XI) of the partial O. C. concinna LIS genes are aligned, comparison indicates arizonica LIS genomic clone (®g. 3). Although some that they have high sequence identity with each other variation in the sizes of the corresponding introns in LIS (ϳ93% DNA identity in the region illustrated in ®g. 2). genes from different species is evident (®g. 3), introns Compared with the C. breweri LIS1 promoter sequence, that are small (Ͻ250 nt) in one species tend to be small the C. concinna CAATT box has an insertion of four in other species as well. The exception is intron XI, adenine nucleotides, creating the sequence which is large (675 nt) in O. arizonica. It is noteworthy CAAAAAATT (®g. 2), and several other insertions oc- that the ®rst intron is quite large (440±912 nucleotides) cur within this C. concinna promoter region, two of in all LIS genes analyzed. them immediately upstream of the putative TATA box. It is possible that these differences in the LIS promoters Comparison of LIS Sequences from the two related species are responsible for the vast- The LIS protein sequences from C. breweri (LIS1 ly different expression characteristics of this gene in the and LIS2), C. concinna, O. arizonica, and A. thaliana two species, but this remains to be veri®ed experimen- were aligned, and the percentage of identity was cal- tally. culated for the region encompassing residues 476±795

FIG. 3.ÐStructures and extents of characterized LIS genes. Dark areas with Arabic numerals indicate exons, light areas with Roman numerals indicate introns, and shaded areas indicate promoter regions and 3Ј nontranslated regions. Sizes of exons and introns are drawn to scale. Linalool Synthase Evolution 1495

FIG. 4.ÐComparison of the gene structures of the two full-length LIS genes (from C. concinna and A. thaliana) with genes encoding monoterpene, sesquiterpene, and diterpene cyclases, and with CPS. Terpene cyclase genes are LMS from P. frutescens (Yuba et al. 1996; EMBL accession number AB005744), 5EAS from N. tabacum (Facchini and Chappell 1992; accession number L04680), and CAS from R. communis (Mau and West 1994; accession number L32134). The CPS gene is from A. thaliana (Sun and Kamiya 1994; accession numbers U11034 and AC004044). Vertical lines indicate positions of introns. Numbers are the numbers of codons per exon. The double-headed arrow labeled ``A'' indicates the part of 5EAS known to encode the terpene synthase domain and whose structure is conserved between LIS and LMS-type synthase genes. The double-headed arrow labeled ``B'' indicates the region whose structure is conserved between LIS and CPS. Dark-shaded areas indicate regions of high identity (as measured by comparisons of protein-coding information) among LIS, LMS, 5EAS, and CAS. Light-shaded areas indicate regions of high identity between LIS and CPS. Asterisks indicate regions with the highest conservation among LIS protein sequences.

(C. breweri LIS1 numbering). This region corresponds thase) indicates that residues 36±230, encoded by exon to the C-terminal terpene synthase region of 5EAS and 2 and most of exon 3, form a structure related to gly- also coincides with the region that has the highest se- cosylhydrolases that appears not to be involved in ter- quence similarity to other terpene synthases such as pene biosynthesis. Residues 231±550, encoded by the LMS and CAS. (It should be noted that C. breweri LIS2 end of exon 3 and exons 4±7, constitute the terpene encodes a shorter protein [815 residues] than does C. synthase portion of the protein involved in initial rec- breweri LIS1 [870 residues], and the difference in length ognition and the binding of the FPP substrate and Mg2ϩ is due to earlier termination of the LIS2 protein). In this cofactor in the centered around the DDXXD region, the C. concinna LIS protein is 97.5% identical motif. Identical regions are proposed to be involved in to LIS1 from C. breweri. Other LIS protein sequences catalysis in other LMS-type synthases (region A in ®g. share identity to each other in the 67%±72% range, with 4). the exception of A. thaliana LIS, which is only ϳ45% The LIS protein has signi®cant sequence identity identical to the other LIS sequences. When compared (34%) with 5EAS in the regions encoded by the end of over the entire length of the polypeptides, the A. thali- exon 3 and all of exon 4 in the 5EAS gene (correspond- ana LIS is ϳ40% identical to the Onagraceae LIS se- ing to most of exon 8 and all of exon 9 in LIS) (dark- quences, and the C. breweri LIS1 and C. concinna LIS shaded areas in ®g. 4), but not elsewhere. Homology proteins are 96% identical. The protein segments encod- modeling (SwissModel) predicts that this region of the ed by exons 3, 7, 9, and 11 have retained the highest LIS protein has a three-dimensional structure similar to levels of identity among these proteins, with the seg- that of the corresponding region of 5EAS (data not ment encoded by exon 9 (which includes the important shown). In contrast, LIS shows the highest identity to motif DDXXD) being the most conserved (®g. 4). CPS in the ®rst part of the protein (ϳ28% compared Comparison of LIS Sequences with Those of Other with 17%±22% for other terpene synthases) (region B Terpene Synthases in ®g. 4). CPS regions encoded by exons 4, 6, 7, and The three-dimensional structure of a single plant 10 of CPS retain the highest levels (29%±38%) of iden- terpene synthase polypeptide, the one encoded by the tity with the LIS sequences encoded by the correspond- tobacco 5EAS gene, has been determined so far (Starks ing regions (most of exon 1 as well as exons 3, 4, and et al. 1997). The analysis of the three-dimensional struc- 7) of LIS (light-shaded areas in ®g. 4). The very begin- ture of the 550-residue 5EAS protein (an LMS-type syn- ning of the CPS gene encodes a cleavable transit peptide 1496 Cseke et al. that directs its protein to the plastid (Sun and Kamiya LMS-type catalytic mechanism contain the metal-cofac- 1994; Aach, Bose, and Graeve 1995), but the LIS pro- tor-binding domain and most of the residues involved in tein appears to have no cleavable N-terminus of more substrate (GPP, FPP, or GGPP) binding in the C-terminal than eight amino acids (Dudareva et al. 1996). However, part of the protein. (However, 5EAS also has a small N- the sequence of the N-terminal 60 amino acids of LIS terminal region [residues 1±35] involved in substrate has the characteristics of a plastid-targeting sequence binding [Starks et al. 1997]). CPS, which utilizes a dif- (Keegstra, Olsen, and Theg 1989), being rich in hy- ferent mechanism, has a substantially divergent se- droxylated and basic residues. quence with the metal-cofactor- present in the N-terminal half of the protein (but, since the three- Comparison of the Gene Structures of LIS with Those dimensional structure of CPS has not been determined, of Other Terpene Synthases the location of other residues essential for catalysis is It has previously been reported that intron positions not yet known). In LIS, which appears to employ a sim- in LMS-type terpene cyclase genes are very similar pli®ed LMS-type mechanism (Pichersky, Lewinsohn, (Back and Chappell 1995). For example, LMS from Per- and Croteau 1995), the metal-cofactor-binding motif illa frutescens, 5EAS from Nicotinana tabacum, and DDXXD is present in the second part of the protein, and CAS from Ricinus cummunis (encoding a monoterpene, the extensive sequence identity between LIS and other a sesquiterpene, and a diterpene cyclase, respectively) terpene cyclases centered around this motif suggests that have introns in identical positions, although LMS and the active site of LIS is present in this part of the protein. CAS have longer ®rst exons which are thought to en- We previously reported that the LIS protein shares code, in part, a transit peptide that directs these proteins sequence similarity with LMS-type synthases in its C- to the plastids (Facchini and Chappell 1992; Mau and terminal half, and with CPS in its N-terminal half (Du- West 1994; Yuba et al. 1996). However, it is noteworthy dareva et al. 1996). A similar observation was made for that the LIS gene conforms to this gene structure only ABS (Vogel et al. 1996), suggesting that both of these in the region of the LMS-type genes encoding the part proteins are of composite origin. In fact, the sequences of the protein that corresponds to the structure of the of a few other terpene synthases also show similarity to terpene synthase (region A in ®g. 4). In this region, en- both LMS-type synthases and CPS in the respective C- coded by exons 9±12 in LIS, the LIS introns occur in and N-termini, for example, taxadiene synthase (TAS) the same positions as in the LMS-type terpene cyclase (Wildung and Croteau 1996) and KS (Yamaguchi et al. genes. No such correspondence in gene structure be- 1996, 1998). Interestingly, all of these proteins are of tween LIS and 5EAS, LMS, or CAS is seen in the ®rst similar size to LIS. Although ABS is a bifunctional en- halves of these genes, which in the LMS-type genes en- zyme which employs both a CPS-type mechanism and codes the glycosylhydrolase-like domain. Instead, in this an LMS-type mechanism (Vogel et al. 1996), it is not region (LIS exons 1±8), exon sizes and intron positions clear why the LIS protein (or TAS and KS) should dis- in LIS are essentially identical to those found in the play similarities to CPS. Moreover, LIS and KS fall into region of CPS encompassed by exons 4±11. one subfamily and TAS and ABS fall into another in the plant terpene synthase classi®cation scheme of Cro- Discussion teau and co-workers (Bohlmann, Meyer-Gauen, and Variation Among LIS Protein Sequences Croteau 1998). Prior to the present report, none of the The levels of variation between LIS protein se- structures of the genes encoding these proteins had been quences in species within the same family but not within reported. Thus, the segmental similarity in the coding the same genus (65%±69% in the Clarkia vs. O. ari- sequences between them and the CPS- and LMS-type zonica comparisons) and between LIS protein sequences synthases was dif®cult to interpret. For example, their in angiosperm species not in the same family (40%± similarity to CPS in the N-terminus could have been due 45% in the Onagraceae vs. A. thaliana comparisons) are to preferential sequence conservation of a region that in the same range found for other terpene synthases. may have been present in all ancestral terpene synthases These observations and the similarity in gene structure but whose function is conserved only in these enzymes lend support to the identi®cation of C. breweri LIS2 and (and not in the other terpene synthases), rather than in- the O. arizonica and the A. thaliana genes whose se- dicating a more recent common origin. quences are presented here as encoding LIS. However, We now show that the sequence similarity of LIS it should be noted that since amino acid sequence iden- and CPS in the N-terminal part of the protein is most tity of up to 80% by itself does not guarantee the same likely due to a relatively recent common ancestry of LIS function in the plant terpene synthase family (Bohl- (for the ®rst half of the gene) with CPS, as evidenced mann, Steele, and Croteau 1997), only a functional by the similarity in structure of the genes encoding study can establish the correct products of the reactions them. Speci®cally, introns 1±8 in LIS occur in positions catalyzed by the enzymes encoded by these genes. virtually identical to introns 4±11 in CPS, respectively delineating seven pairs of exons of nearly identical sizes Comparisons of Linalool Synthase with Other Terpene (within each pair). However, the rest of the LIS gene Synthases, and the Origin of the LIS Gene shows no structural similarity to CPS. Instead, in the From sequence comparisons and examination of second half of the LIS gene (the region encoded by ex- the three-dimensional structure of 5EAS, it was deter- ons 9±12), introns 8±11 occur in the same positions as mined that the terpene synthases which employ the introns 3±6 in LMS, 5EAS, and CAS. This observation Linalool Synthase Evolution 1497

suggests that LIS is a composite gene resulting from a evolved independently, each must have evolved from a discrete recombination event between the ®rst half of a member of the terpene synthase family, and, more spe- CPS-like gene and the second half of an LMS-type gene. ci®cally, from within the clade of the LMS-type terpene The alternative hypothesis, that LIS gene structure rep- synthases. The universal presence of a number of related resents the ancestral condition in the terpene synthase enzymes catalyzing the formation of related terpene family, is unlikely because the LIS protein occupies a compounds which are essential for plant viability (gib- nonbasal position in the phylogenetic tree of this family berellins, brassinosteroids, etc.) may provide a pool (Bohlmann, Meyer-Gauen, and Croteau 1998) and also from which new enzymes, catalyzing the formation of because CPS encodes an enzyme in the pathway leading ``specialized'' terpenes, can evolve, sometimes more to the production of the gibberellin-type plant hormones than once. This may explain the presence of the same which are essential for all higher plants. Thus, CPS is specialized terpene in taxonomically diverse species of likely to have preceded LIS. plants when closely related species in each group do not make such a compound. The evidence presented in this Repeated Evolution paper shows that in addition to duplication and diver- The most recent phylogenetic analysis of the ter- gence, additional terpene synthases may be created by pene synthase family of proteins, based on comparisons a process of recombination between different terpene of amino acid sequences only (Bohlmann, Meyer- synthase genes, or ``domain swapping'' (Gilbert 1980; Gauen, and Croteau 1998), placed all the LMS-type ter- Doolittle 1995). These two processesÐduplication fol- pene synthases, with the possible exception of LIS and lowed by divergence and duplications followed by do- KS, in a monophyletic clade. Although intron positions main swappingÐwould increase in frequency as the in most of the genes encoding terpene synthases are not number of related terpene synthase genes in the genome yet known, in the few cases in which gene structure was increases and could explain the extraordinarily large determined in the LMS-type terpene synthase genes, the number of diverse terpene synthases found in present- intron positions were the same (Back and Chappell day plants. 1995). These observations suggest that all of these en- zymes originated from a single ancestral gene which has Acknowledgments undergone successive cycles of duplications and diver- gence. This research was funded by grant IBN-9417582 However, within this group, the phylogenetic rela- from the National Science Foundation to E.P. We thank tionships are not clear. For example, although a priori it Dr. R. A. Raguso for collecting the O. arizonica plants would seem most likely that speci®c enzymes such as and Dr. Peter Kaufman for the use of his facilities for LMS have evolved only once in plants, evidence has growing the O. arizonica plants. been presented that the protein sequence of LMS in a gymnosperm species, grand ®r (Abies grandis), is more LITERATURE CITED similar to the sequences of other terpene synthases in that same species (64% identical to grand ®r pinene cy- AACH, H., G. BOSE, and J. E. GRAEVE. 1995. ent-Kaurene bio- synthesis in a cell-free system from wheat (Triticum aesti- clase and myrcene synthase) than it is to LMS protein vum L.) seedlings and the localization of ent-kaurene syn- sequences from angiosperms (e.g., 28% identical to thetase in plastids of three species. Planta 197:333±342. LMSs from mint) (Yuba et al. 1996; Bohlmann, Steele, ASHBY, M. N., and P. A. EDWARDS. 1990. Elucidation of the and Croteau 1997). Moreover, grand ®r LMS, as well as de®ciency in two yeast coenzyme Q mutants: characteriza- other terpene synthases from this gymnosperm species, tion of the structural gene encoding hexaprenyl pyrophos- require Kϩ for optimal activity, while the corresponding phate synthetase. J. Biol. Chem. 265:13157±13164. enzymes from angiosperms do not (Savage, Hatch, and BACK, K., and J. CHAPPELL. 1995. Cloning and bacterial ex- Croteau 1994). Bohlmann, Steele, and Croteau (1997) pression of a sesquiterpene cyclase from Hyoscyamus mu- initially suggested that perhaps overall lower rates of ticus and its molecular comparison to related terpene cy- protein sequence evolution in gymnosperms explain the clases. J. Biol. Chem. 270:7375±7381. BENSEN, R. J., G. S JOHAL,V.C.CRANE,J.T.TOSSBERG,P. lower divergence among different gymnosperm terpene S. SCHNABLE,R.B.MEELEY, and S. P. BRIGGS. 1995. Clon- synthase sequences. However, their more comprehensive ing and characterization of the maize An1 gene. Plant Cell phylogenetic treatment of the plant terpene synthase 7:75±84. family clearly indicated that the LMS enzymes in gym- BOHLMANN, J., G. MEYER-GAUEN, and R. CROTEAU. 1998. nosperms and angiosperms do not form a monophyletic Plant terpenoid synthases: molecular and phylogenetic anal- clade distinct from all other LMS-type terpene synthases ysis. Proc. Natl. Acad. Sci. USA 95:4126±4133. (Bohlmann, Meyer-Gauen, and Croteau 1998), suggest- BOHLMANN, J., C. L. STEELE, and R. CROTEAU. 1997. Mono- ing that speci®c LMS enzymatic activities evolved in terpene synthases from grand ®r (Abies grandis). J. Biol. plants more than once. Chem. 272:21784±21792. Such repeated evolution could be considered ``con- CHAPPELL, J. 1995. The biochemistry and molecular biology of isoprenoid metabolism. Plant Physiol. 107:1±6. vergent evolution'' if LMS genes from gymnosperms DOBSON, H. E. M. 1993. Floral volatiles in insect biology. Pp. and angiosperms evolved from unrelated ancestral 47±81 in E. BERNAYS, ed. Insect-plant interactions. Vol. 5. genes. However, the sequence similarities that do exist CRC Press, Boca Raton, Fla. between LMS proteins from gymnosperms and those DOOLITTLE, R. F. 1995. The multiplicity of domains in proteins. from angiosperms suggest that although these enzymes Annu. Rev. Biochem. 64:287±314. 1498 Cseke et al.

DUDAREVA, N., L. CSEKE,V.M.BLANC, and E. PICHERSKY. PICHERSKY, E., E. LEWINSOHN, and R. CROTEAU. 1995. Puri- 1996. Evolution of ¯oral scent in Clarkia: novel patterns of ®cation and characterization of S-linalool synthase, an en- S-linalool synthase gene expression in the C. breweri ¯ow- zyme involved in the production of ¯oral scent in Clarkia er. Plant Cell 8:1137±1148. breweri. Arch. Biochem. Biophys. 316:803±807. FACCHINI, P. J., and J. CHAPPELL. 1992. Gene family for an PICHERSKY, E., R. A. RAGUSO,E.LEWINSOHN, and R. CRO- elicitor-induced sesquiterpene cyclase in tobacco. Proc. TEAU. 1994. Floral scent production in Clarkia (Onagra- Natl. Acad. Sci. USA 89:11088±11092. ceae). I. Localization and developmental modulation of GERSHENZON, J., and R. CROTEAU. 1993. Terpenoid biosynthe- monoterpene emission and linalool synthase activity. Plant sis: the basic pathway and formation of monoterpenes, ses- Physiol. 106:1533±1540. quiterpenes, and diterpenes. Pp. 340±388 in T. S. MOORE, RAGUSO, R. A., and E. PICHERSKY. 1995. Floral volatiles from ed. Lipid metabolism in plants. CRC Press, Boca Raton, Clarkia breweri and C. concinna (Onagraceae): recent evo- Fla. lution of ¯oral scent and moth pollination. Plant Syst. Evol. GILBERT, W. 1980. Why genes in pieces? Nature 271:501. 194:55±67. JOHNSON, M., and R. CROTEAU. 1987. Biochemistry of conifer SAVAGE, T., M. W. HATCH, and R. CROTEAU. 1994. Monoter- resistance to bark beetles and their fungal symbionts. Pp. pene synthases of Pinus cotorta and related con®fers. A new 76±91 in G. FULLER and W. D. NES, eds. Ecology and me- class of terpenoid cyclase. J. Biol. Chem. 269:4012±4020. tabolism of plant lipids. ACS Symposium Series 325. STARKS, C. M., K. BACK,J.CHAPPELL, and J. P. NOEL. 1997. American Chemical Society, Washington, D.C. Structural basis for cyclic terpene biosynthesis by tobacco KAISER, R. 1991. Trapping, investigation, and reconstitution of 5-epi-aristolochene synthase. Science 277:1815±1820. ¯oral scents. Pp. 213±248 in P. M ULLER and D. LAMPAR- STUDIER, F. W., A. H. ROSENBERG,J.J.DUNN, and J. W. DU- SKY, eds. Perfume: art, science, and technology. Elsevier, BENDORFF. 1990. Use of T7 RNA polymerase to direct ex- New York. pression of cloned genes. Methods Enzymol. 185:60±89. KEEGSTRA, K., L. D. OLSEN, and S. M. THEG. 1989. Chloro- SUN,T.-P., and Y. KAMIYA. 1994. The Arabidopsis GA1 locus plastic precursors and their transport across the envelope encodes the cyclase ent-kaurene synthase A of gibberellin membranes. Annu. Rev. Plant Physiol. Plant Mol. Biol. 40: biosynthesis. Plant Cell 6:1509±1518. 471±501. . 1997. Regulation and cellular localization of ent-kau- KNUDSEN, J. T., and L. TOLLSTEN. 1993. Trends in ¯oral scent rene synthesis. Physiol. Plant. 101:701±708. chemistry in pollination syndromes: ¯oral scent composi- TARSHIS, L. C., M. YAN,C.D.POULTER, and J. C. SACCHET- tion in moth-pollinated taxa. Bot. J. Linn. Soc. 113:263± TINI. 1994. Crystal structure of recombinant farnesyl di- 284. phosphate synthase at 2.6-A resolution. Biochemistry 33: KNUDSEN, J. T., L. TOLLSTEN, and G. BERGSTROM. 1993. Floral 10871±10877. scentsÐa check-list of volatile compounds isolated by VOGEL, B. S., M. R. WILDUNG,G.VOGEL, and R. CROTEAU. head-space techniques. Phytochemistry 33:253±280. 1996. Abietadiene synthase from grand ®r (Abies gran- LESBURG, C. A., G. ZHAI,D.ECANE, and D. W. CHRISTIAN- dis)ÐcDNA isolation, characterization, and bacterial ex- SON. 1997. Crystal structure of pentalenene synthase: mech- pression of a bifunctional diterpene cyclase involved in res- anistic insights on terpenoid cyclization reactions in biolo- in acid biosynthesis. J. Biol. Chem. 271:23262±23268. gy. Science 277:1820±1824. WILDUNG, M. R., and R. CROTEAU. 1996. A cDNA clone for LEWINSOHN, E., M. GIJZEN, and R. CROTEAU. 1992. Wound- taxadiene synthase, the diterpene cyclase that catalyzes the inducible pinene cyclase from grand ®r: puri®cation, char- committed step of taxol biosynthesis. J. Biol. Chem. 271: acterization and renaturation after SDS-PAGE. Arch. Bio- 9201±9204. chem. Biophys. 293:167±173. YAMAGUCHI, S., T. SAITO,H.ABE,H.YAMANE,N.MUROFU- LEWIS, H. 1962. Catastrophic selection as a factor in specia- SHI, and Y. KAMIYA. 1996. Molecular cloning and charac- tion. Evolution 16:257±271. terization of a cDNA encoding the gibberellin biosynthetic MCGARVEY, D. J., and R. CROTEAU. 1995. Terpenoid metab- enzyme ent-kaurene synthase B from pumpkin (Cucurbita olism. Plant Cell 7:1015±1026. maxima L.). Plant J. 10:203±213. MACSWAIN, J., P. RAVEN, and R. THORP. 1973. Comparative YAMAGUCHI, S., T.-P.SUN,H.KAWAIDE, and Y. KAMIYA. 1998. behavior of bees and Onagraceae. IV. Clarkia bees of the The GA2 locus of Arabidopsis thaliana encodes ent-kaurene western United States. Univ. Calif. Publ. Entomol. 70:1±80. synthase of gibberellin biosynthesis. Plant Physiol. 116: MAU, C. J. D., and C. A. WEST. 1994. Cloning of casbene 1271±1278. synthase cDNA: evidence for conserved structural features YUBA, A., K. YAZAKI,M.TABATA,G.HONDA, and R. CRO- among terpenoid cyclases in plants. Proc. Natl. Acad. Sci. TEAU. 1996. cDNA cloning, characterization, and functional USA 91:8497±8501. expression of 4S-(-)-limonene synthase from Perilla. Arch. NEBEKER, T. E., J. D. HODGES, and C. A. BLANCE. 1993. Host Biochem. Biophys. 332:280±287. reponse to bark beetle and pathogen colonization. Pp. 158± 173 in T. D. SCHOWALTER and G. M. FILIP, eds. Beetle± JULIAN P. A DAMS, reviewing editor pathogen interactions in conifer forests. Academic Press, London. Accepted July 27, 1998