Copyright  1998 by the Genetics Society of America

Molecular Evolution of Genes Controlling Petal and Stamen Development: Duplication and Divergence Within the APETALA3 and PISTILLATA MADS-Box Gene Lineages

Elena M. Kramer,* Robert L. Dorit† and Vivian F. Irish* *Departments of Molecular, Cellular and Developmental Biology and †Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06520 Manuscript received December 24, 1997 Accepted for publication March 2, 1998

ABSTRACT The specification of floral organ identity in the higher dicots depends on the function of a limited set of homeotic genes, many of them members of the MADS-box gene family. Two such genes, APETALA3 (AP3) and PISTILLATA (PI), are required for petal and stamen identity in Arabidopsis; their orthologs in Antirrhinum exhibit similar functions. To understand how changes in these genes may have influenced the morphological evolution of petals and stamens, we have cloned twenty-six homologs of the AP3 and PI genes from two higher eudicot and eleven lower eudicot and magnolid dicot . The sequences of these genes reveal the presence of characteristic PI- and AP3-specific motifs. While the PI-specific motif is found in all of the PI genes characterized to date, the lower eudicot and magnolid dicot AP3 homologs contain distinctly different motifs from those seen in the higher . An analysis of all the available AP3 and PI sequences uncovers multiple duplication events within each of the two gene lineages. A major duplication event in the AP3 lineage coincides with the base of the higher eudicot radiation and may reflect the evolution of a petal-specific AP3 function in the higher eudicot lineage.

LOWERS are a defining characteristic of the angio- angiosperm lineages (Figure 1). Phylogenetic analyses F sperms. The typical hermaphroditic angiosperm of the angiosperms based on a large rbcL data set have flower contains both sterile and reproductive organs. identified two major monophyletic clades (Chase et al. These organs are generally organized into whorls, with 1993; Crane et al. 1995; Qiu et al. 1993). Both of these a particular organ type arising from a single node on groups, the eudicots and the monocots, are rooted the axis of a determinate floral meristem. The flowers of within an unresolved basal grade of magnolid dicots. the model species Arabidopsis thaliana display the typical The eudicot clade can be further subdivided into the higher eudicot floral organization. The first and second lower eudicots, comprising the Ranunculidae, basal Ha- whorls of the flower contain the sterile and petals, mamelididae and basal Rosidae, and the higher eudi- respectively. The third whorl contains the stamens, the cots, made up of the bulk of the flowering , includ- male reproductive structures which produce pollen. ing the majority of the model species used for genetic The female reproductive structures, the carpels, arise analysis (Drinnan et al. 1994). Stamenally-derived pet- in the fourth whorl and contain the ovules. Numerous als, called andropetals, have evolved many times within variations on this basic floral architecture exist within the lower eudicots and at least once at the base of the the angiosperms. These differences in organ number, higher eudicot clade and the monocot clade (Takhta- structure and phyllotaxy are critical morphological char- jan 1991). A second type of petals, bracteopetals, are acters in the study of angiosperm systematics. derived from sepals or other sterile subtending organs. The evolution of angiosperm floral diversity has been The diverse magnolid dicots include species which have a subject of considerable study. While the stamens and been characterized as possessing bracteopetals, as well carpels are thought to have each evolved only once, it as a number of species which are considered to have is widely accepted that the sterile organs have evolved andropetals (Takhtajan 1991). The designations of many times within the angiosperms, although the details petals as being andropetallous or bracteopetallous have Cronquist Takh- of these events are unresolved ( 1988; been based primarily on morphological characters. tajan Drinnan Endress 1991; et al. 1994; 1994). Much The isolation of floral homeotic mutants in Arabi- of the controversy has centered on the number and dopsis and Antirrhinum has provided an inroad into the the nature of petal derivation events within the various dissection of the genetic mechanisms underlying floral diversity and the derivation of floral organs (Komaki et al. 1988; Bowman et al. 1989; Carpenter and Coen Corresponding author: Vivian Irish, Department ofMolecular, Cellular and Developmental Biology, Yale University, P.O. Box 208104, New 1990). Genetic analysis of these mutants has led to Haven, CT 06520-8104. E-mail: [email protected] a model where three classes of genes, known as the

Genetics 149: 765–783 ( June, 1998) 766 E. M. Kramer, R. L. Dorit and V. F. Irish

Figure 1.—Simplified phylogeny showing the relationships between major angiosperm clades (Chase et al. 1993; Crane et al. 1995; Friis and Endress 1990; Qiu et al. 1993). Lin- eages which have been proposed to represent independent derivations of andropetals and bracteopetals are indicated (Dahlgren et al. 1984; Drinnan et al. 1994; Takhtajan 1991).

A, B, and C classes, interact in a combinatorial manner size and sequence within the family and have yet to be to specify particular organ identities (Bowman et al. assigned any specific function. 1991; Coen and Meyerowitz 1991; Meyerowitz et al. While many of the currently characterized MADS do- 1991). Mutations in the B group genes display transfor- main proteins appear to function as homodimers, the mations of the petals into sepals and the stamens into AP3 and PI gene products are thought to act together carpels, indicating that the corresponding wild-type as a heterodimeric transcription factor. This conclusion gene products are required for specifying petal and is supported by the finding that both the AP3 and PI stamen identity (Jack et al. 1992; Goto and Meyero- gene products are required for DNA binding and nu- witz 1994). Like many of the organ-identity genes, the clear localization (McGonigle et al. 1996; Riechmann Arabidopsis B group genes APETALA3 (AP3) and PIS- et al. 1996a; Hill et al. 1998). Furthermore, the AP3 TILLATA (PI) encode products which contain a MADS and PI proteins have been shown to bind to each other domain, a highly conserved region of approximately 57 in immunoprecipitation experiments (Goto and Mey- amino acids which has been shown to play a role in erowitz 1994; Riechmann et al. 1996a). The mainte- DNA binding and protein dimerization (Norman et al. nance of AP3 and PI expression depends upon the pres- 1988; Pollock and Treisman 1991). The repre- ence of both gene products, suggesting that the AP3/ sentatives of the MADS-box family are further distin- PI heterodimer promotes the transcription of AP3 and guished by the presence of a 70-amino acid region called PI, perhaps directly (Jack et al. 1992; Goto and Meyer- the K domain (Figure 2; Ma et al. 1991; Davies and owitz 1994). Interestingly, phylogenetic analysis of the Schwarz-Sommer 1994). This portion of the protein is entire plant MADS box gene family has shown that the predicted to form two to three amphipathic helices which AP3 and PI lineages are the products of a duplication may facilitate protein-protein interactions (Pnueli et al. event which makes them more closely related to each 1991). Separating the MADS and K domains is a short other than to any of the other MADS-box genes (Doyle intervening region (I) of approximately thirty amino 1994; Purugganan et al. 1995; Purugganan 1997; acids which, along with the K domain, has been shown Theissen et al. 1996). to play a role in dimerization specificity (Krizek and The conservation of B group functions across the an- Meyerowitz 1996; Reichmann et al. 1996a). The C- giosperms is being addressed through studies of AP3 terminal portions of the proteins vary considerably in and PI homologs in several higher eudicot species. In general, the expression patterns of these genes are all quite similar to those seen in Arabidopsis (Theissen and Saedler 1995; Irish and Kramer 1998). The mutant phenotypes, in those species where they have been ana- lyzed, are generally consistent with a conserved role for Figure AP3 and PI in promoting the establishment of petal and 2.—Domains of the plant MADS-box gene as de- Sommer Trobner fined by sequence conservation and protein structure. The stamen identity ( et al. 1990; et al. MADS-box displays the highest level of conservation within 1992; Angenent et al. 1993; van der Krol and Chua the family and has been shown to play a role in DNA binding 1993). Furthermore, the AP3 ortholog from Antirrhi- and protein dimerization. The I and K regions also appear to num, DEFICIENS (DEF), is able to largely replace endog- play roles in dimerization but show lower levels of sequence Irish Yamamoto conservation. The C-terminal region is the most divergent in enous Arabidopsis AP3 function ( and sequence but does contain some highly conserved lineage- 1995; Samach et al. 1997). Similarly, both DEF and the specific motifs. Antirrhinum PI ortholog, GLOBOSA (GLO), have been Evolution of AP3 and PI Lineages 767 shown to promote stamen and petal identity when ec- cleotide 5Ј-CCGGATCCTCTAGAGCGGCCGC(T)17 from 500 topically expressed in Nicotiana (Davies et al. 1996). ng of poly-A RNA. This poly-T primer was used with a second primer with the sequence 5Ј-GGGGTACCAA(C/T)(A/C)GI Taken together, these results support the conclusion CA(A/G)GTIACITA(T/C)TCIAAG(A/C)GI(A/C)G-3Ј in a that AP3 and PI orthologs are responsible for determin- polymerase chain reaction (PCR) to amplify MADS-box-con- ing petal and stamen identity within the higher eudicots. taining cDNAs (primary PCR reaction). PCR analysis was per- Since the petals of the higher eudicots are thought formed in 100 ␮l of PCR buffer (10 mm Tris pH 8.3, 50 mm m Ј to be homologous, it is not surprising that all the higher KCl, 1.5 m MgCl2) containing 50 pmol and 20 pmol of 5 and 3Ј primer, respectively, 200 ␮mol of each deoxyribonucleotide eudicots studied to date appear to have a conserved triphosphate, and 2.5 units of AmpliTaq Gold polymerase petal developmental pathway. In order to examine how (Perkin Elmer, Foster City, CA). Amplification began with a independent petal derivation events may be reflected Taq-activation step of 12 min at 95ЊC, followed by 10 cycles in the petal developmental program, it is necessary to of 20 sec denaturing at 95Њ, 30 sec annealing at 38Њ and 1 min examine species whose petals are not homologous to extension at 72Њ. The program was completed by 30 cycles of 20 sec denaturing at 95Њ, 30 sec annealing at 42Њ and an those of the higher eudicots. Accordingly, we have extension time of 1 min at 72Њ. Amplified products were ana- cloned AP3 and PI homologs from eleven lower eudicot lyzed on a 1% agarose gel, revealing one or more distinct and magnolid dicot species in an effort to understand fragments of Ն0.6 kb. The reactions were directly cloned using the evolution both of these gene lineages and of the the TA and TOPO-TA Cloning kits (Invitrogen, Carlsbad, CA). pathways of petal specification. We present a phyloge- Clones were analyzed based on size and all fragments over 0.6 kb were sequenced using fluorescent sequencing methods netic analysis of the B group genes which indicates that by the Keck Foundation Biotechnology Resource Laboratory the path of B group gene evolution is more complex at Yale University. than previously thought. Our analysis suggests that there In nudicaule, Dicentra eximia, bulbosus and are, in fact, two paralogous AP3 lineages in the higher terminalis,3Јprimers targeted to AP3-orPI-specific eudicots: one represented by the well-studied AP3 or- C-terminal motifs were used in conjunction with the degener- ate 5Ј primer to specifically amplify AP3 and PI homologs tholog group and the other containing the tomato AP3 from the primary PCR reaction (same conditions as above). paralog TM6 and several related genes. The data suggest The PI-specific primer has the sequence 5Ј-TGIA(A/G)(A/ that these two lineages are the result of a gene duplica- G)TTIGGITGIA(A/T)(T/G)GGITG and the AP3-specific tion event which occurred after the divergence of the primer has the sequence 5Ј-CIAGICGIAG(A/G)TC(A/G)T. in the lower eudicots but before the diversifi- For these species, clones from both the primary PCR reaction as well as the AP3- and PI-specific secondary amplifications cation of the higher eudicots. Sequence analysis reveals were analyzed. The complete 3Ј sequence of the clones gener- that the AP3-like genes of the lower eudicots and mag- ated with the AP3- or PI-specific primers was obtained using nolid dicots are actually more similar to the members 3Ј RACE (3Ј RACE primer sequences available upon request). of the TM6 lineage than they are to the higher eudicot The complete cDNA sequences of PnAP3-2 and PnPI-1 were AP3 lineage. Although we have also identified duplica- obtained using the 5Ј RACE System for Rapid Amplification tion events in the PI lineage, none appear to date back of cDNA Ends, Version 2.0 (Gibco BRL, Gaithersburg, MD), in conjunction with several different oligonucleotides (se- to the base of the higher eudicots. In addition, the quences available upon request). PI homologs we have isolated display a greater overall For the cloning of LeAP3 from Lycopersicon esculentum,a conservation of sequence than do the AP3 lineage mem- AP3- specific primer with the sequence 5Ј-A(A/ bers. Based on these observations, we present a model G)IGC(A/G)AAIGTIGTIAT(A/G)TC was designed from the for the evolution of the B group gene lineage and discuss C-terminal consensus D(I/L)TTFAL. This 3Ј primer was used in conjunction with the degenerate 5Ј MADS-box primer to how duplication and divergence in this gene lineage specifically amplify LeAP3. The complete 3Ј sequence was ob- may have influenced the evolution of petals. tained using 3Ј RACE (primer sequence available upon re- quest). Phylogenetic analysis: The sequences of all of the published MATERIALS AND METHODS B group representatives were obtained from GenBank (see Tables 2 and 3 for accession numbers). Protein sequences Species sampled and sources of plant material: The species were aligned using CLUSTALW and refined by hand, taking included in this analysis are given in Tables 1–3, along with both nucleotide and amino acid sequences into consideration family membership, general collection information and Gen- (see appendices 1 and 2). The alignments were further modi- Bank accession numbers for the gene sequences. Throughout fied (for phylogenetic analysis) by encoding gaps as single the text we have followed the taxonomic designations of Cron- characters in a supplemental data matrix. In this approach, quist (1981) for dicots (the only exception to this being the gaps are treated as single events, thus preventing their over- designation of the Ranunculidae as an independent subclass) weighting (based on gap length) in the subsequent phyloge- and Dahlgren et al. (1984) for monocots. The choice of taxa netic analyses. was influenced by both phylogenetic position and specimen We generated parsimony trees based on the aligned protein availability. sequences using the PAUP 4.0* package (Phylogenetic Analysis Cloning and analysis: For each species, total RNA was pre- Using Parsimony, Version 61, used by permission of the au- pared using Trizol (GIBCO BRL, Gaithersburg, MD) from thor, Swofford 1993). Parsimony trees were found using the whole flower buds collected across a range of developmental heuristic search algorithm, generating 1000 replicate runs stages. Poly-A mRNA was extracted from total RNA using Mag- using random stepwise addition of sequences. Multiple equal- netight Oligo (dT) particles (Novagen, Madison, WI). Single- length parsimony trees were collapsed into 50% majority rule stranded cDNA was synthesized by priming with the oligonu- consensus trees. Bootstrap values for all resolved nodes in 768 E. M. Kramer, R. L. Dorit and V. F. Irish

TABLE 1 Collection information

Sample Voucher or source Clones/speciesa Lycopersicon esculentum cv. Celebrity Cultivated, Kramer 108 5 (5) Syringa vulgaris Kramer 106 11 (11) Pachysandra terminalis Kramer 105 22 (9) Papaver californicum Kramer 100 4 (4) Papaver nudicaule Cultivated, Kramer 101 56 (36) Dicentra eximia Kramer 102 30 (20) Delphinium ajacis Cultivated, Kramer 104 5 (5) Caltha palustris Kramer 103 16 (16) Ranunculus bulbosus Kramer 107 31 (22) Michelia figo Yale Marsh Gardens 10 (10) Liriodendron tulipifera Yale Marsh Gardens 6 (6) Peperomia hirta Yale Marsh Gardens 12 (12) Piper magnificum Yale Marsh Gardens 8 (8) a First value listed under the Clones/species heading indicates the total number of isolates characterized by restriction analysis and sequencing; the value in parentheses is the number of sequenced clones for each species (does not include 5Ј or 3Ј RACE fragments; see materials and methods). the consensus tree were derived from the partition functions This region of the MADS domain is largely invariant obtained from 1000 replicate bootstrapping runs. These runs across the predicted products of the plant MADS-box were similarly produced using a heuristic search via random stepwise addition and under the TBR (tree-bisection-recon- genes examined to date and includes three amino acids nection) algorithm for branch swapping. that are invariant across all members of the MADS-box Distance matrices were derived from the PI and AP3 data- family (Doyle 1994; Shore and Sharrocks 1995). The sets under a mean character difference criterion. Amino acid tyrosine residue within this domain is diagnostic for the substitutions were weighted in accordance with the BLO- B group genes. The use of this B-group specific 5Ј primer SUM substitution matrix. Trees were subsequently generated via the Neighbor-Joining algorithm (NJ) as implemented in in conjunction with the completely nonspecific poly-T PAUP. The resulting trees represent the “minimum evolution” 3Ј primer allowed for the amplification of AP3 and PI networks connecting the sequences. Bootstrap values for re- orthologs with little bias against divergent paralogs. solved nodes are derived from 1000 replicate runs, again using Since the predicted length of the eudicot B group mem- the NJ algorithm. bers ranges from 180 to 230 amino acids, we examined For both the parsimony and distance analyses, the AP3 trees were rooted using six PI sequences (GLO, PI, DaPI, LtPI, PhPI all clones longer than approximately 600 bp. and OsMADS2); conversely, six AP3 sequences (AP3, DEFA, We used Syringa vulgaris (lilac), a higher eudicot, as SLM3, MfAP3, PhAP3 and CpAP3) were used to root the PI a positive control to test the primary PCR reaction and trees. These choices for outgroups were based, first of all, on cloning strategy. Both AP3 and PI orthologs were iso- the fact that AP3 and PI are known to be paralogous lineages lated from Syringa and their sequences showed very and are, therefore, each other’s natural outgroup. Secondly, while the AP3 and PI orthologs can be reasonably aligned high similarity to that of the other Asterid B group to each other throughout their entire length, unambiguous representatives. For each lower eudicot and magnolid alignments cannot be generated between the B group se- dicot species, we analyzed 10 to 36 clones (Table 1). It quences and the remaining members of the MADS-box gene should be noted that in several species, an exhaustive family. survey has not been undertaken (Ͻ10 clones analyzed, Similar analyses were conducted using nucleotide data sets aligned in accordance with the protein alignments (data not Table 1). In the case of Lycopersicon, this is due to the shown, see results). fact that we were only interested in establishing the sequence of one gene (see below). Analyses of the Pa- paveraceae and were initiated with Pa- RESULTS paver californicum and Delphinium ajacis but subsequent Cloning of B-group genes from lower eudicots and analyses were carried out with more convenient repre- magnolid dicots: Our strategy for cloning B group gene sentative species (P. nudicaule, R. bulbosus and Caltha members from various lower eudicot and magnolid di- palustris). cot species depends on the presence of a highly con- A small number of the sequenced clones obtained served sequence in the putative DNA-binding ␣-helix of from the primary PCR reactions proved not to be MADS- the MADS-box domain (Shore and Sharrocks 1995). box gene representatives (5/171 total clones sequen- We designed a degenerate primer based on the deca- ced) and two were found to be members of the AGA- peptide sequence NRQVTYSKR using the sequence of MOUS-like family (one from P. nudicaule and one from the published higher eudicot AP3 and PI orthologs. Delphinium ajacis). Conceptual translation of all of the Evolution of AP3 and PI Lineages 769

TABLE 2 PISTILLATA-like genes

Subclass Family Species Gene Ref. Acc. no. Asteridae Scrophulariaceae Antirrhinum majus GLO (Trobner et al. 1992) S28062 Solanaceae Petunia hybrida FBP1 (Angenent et al. 1992) M91190 PMADS2 (Kush et al. 1993) X69947 Nicotiana tabacum NTGLO (Hansen et al. 1993) X67959 Oleaceae Syringa vulgaris SvPI This study AF052861 Dilleniidae Brassicaceae Arabidopsis thaliana PI (Goto and Meyerowitz 1994) D30807 Caryophyllidae Caryophyllaceae Silene latifolia SLM2 (Hardenack et al. 1994) X80489 Ranunculidae Papaver nudicaule PnPI-1 This study AF052855 PnPI-2 This study AF052856 Fumariaceae Dicentra eximia DePI This study AF052857 Ranunculaceae Caltha palustris CpPI This study AF052858 Ranunculus bulbosus RbPI-1 This study AF052859 RbPI-2 This study AF052860 Delphinium ajacis DaPI This study AF052862 Magnoliidae Magnoliaceae Michelia figo MfPI a This study AF052863 Liriodendron tulipifera LtPI This study AF052864 Piperaceae Peperomia hirta PhPI This study AF052865 Piper magnificum PmPI-1 This study AF052866 PmPI-2 This study AF052867 Commelinidae Poaceae Oryza sativa OsMADS2 (Chung et al. 1995) L37526 OsMADS4 (Chung et al. 1995) L37527 a MfPI was only represented by a partial sequence at the time of submission and was not included in the phylogenetic analysis. other cDNAs showed that they encode B group-type will refer to as the PI motif, has a core consensus se- MADS-box gene products (164/171 total clones se- quence of MPFxFRVQPxQPNLQE. Four of the posi- quenced). We assigned the novel genes to the AP3 or tions are completely invariant and the remainder show PI classes based on the overall sequence similarity to very strong conservation of chemical characteristics. the known AP3 and PI representatives and on the pres- Overall, the PI motif appears to be a strongly hydropho- ence of specific diagnostic sites in the MADS, K and C bic domain, although several charged and polar amino domains. AP3 and PI-like proteins can be distinguished acids are present within the region. The motif bears no from one another at MADS-box residues 29, 35 and strong similarity to any known structural elements and 47. Within the K box, PI homologs possess a highly a BLAST search for similar sequences in GenBank yields conserved sequence KHExL (appendix 1, residues 88 only PI homologs. to 92). The comparable sequence in the K box of the There are only two instances of marked divergence AP3 homologs is (H/Q)YExM (appendix 2, residues 85 in sequence or structure among the newly isolated PI to 89). We also found that the C-terminal portions of homologs. One is found in the predicted products of the predicted proteins contained diagnostic motifs for the D. eximia (Fumariaceae) and P. nudicaule (Papavera- each lineage (see below). Each unique cDNA was named ceae) genes, DePI and PnPI-1, both of which contain an using the first letter of the genus and species from which approximately 20 amino acid insertion upstream of the it was isolated followed by either AP3 or PI, depending PI motif as well as a 10 to 12 amino acid addition at the on its sequence similarity. C-terminal end of the protein (Figure 3A). These novel Sequence and phylogenetic analysis of PI homologs: regions are characterized by stretches of A/C-rich repet- We have identified a total of twelve new PI-like genes itive DNA sequence. At the nucleotide level, the two which have been cloned from nine species (Table 2). cDNAs clearly align with one another through these In most of the species surveyed, only one PI-like gene novel regions, sharing 58% identity in the 100 bp up- was identified, while in P. nudicaule, R. bulbosus and Piper stream of the PI motif and 70% identity in the down- magnificum, two distinct PI-like genes were found. The stream region. It is likely that the event(s) that produced sequences of the predicted products of the new PI clones these insertions occurred before the last common ances- align well with the previously studied higher eudicot tor of the Papaveraceae and Fumariaceae, which are and monocot representatives. Particularly striking is an sister families within the Papaverales. The second exam- approximately twenty amino acid region at the C-termi- ple of major sequence deviation within the PI homologs nal end of the predicted proteins that displays extremely is seen in the P. nudicaule gene PnPI-2. The PnPI-2 cDNA high conservation (Figure 3A). This domain, which we encodes a truncated protein of only 164 amino acids, 770 E. M. Kramer, R. L. Dorit and V. F. Irish

Figure 3.—Alignment of C-ter- minal regions of the predicted protein sequences analyzed in this study. The names of genes cloned in this study are highlighted in bold. The sequences of the un- published genes AsAP3 and SIL- KY-1 are not shown. See Tables 1 and 2 for information on these genes. (A) Predicted PI protein se- quences. The region which we have designated as the PI motif is boxed and the consensus is shown below. Residues which show chem- ical conservation with the consen- sus are highlighted in bold. (B) Predicted AP3 protein sequences. Genes are grouped according to lineage as indicated by phyloge- netic analysis. The region which bears similarity to the PI motif is boxed and defined as PI Motif-De- rived. Residues which show chemi- cal conservation with the PI motif core consensus are highlighted in bold. The C-terminal euAP3 and paleoAP3 motifs are also indicated with boxes. Residues in each re- gion which show chemical conser- vation with the motif consensus se- quences defined in the text are highlighted in bold.

as opposed to the usual length of 180 to 210 amino based tree shown in Figure 4B. In both trees, all of the acids. Alignment of the PnPI-1 and PnPI-2 nucleotide higher eudicot PI sequences are grouped into a single sequences reveals that the similarity between the PnPI- clade. The distance and parsimony trees do differ in 1 and PnPI-2 transcripts is quite high (80% identity) up the placement of the PnPI-1/PnPI-2 paralog pair relative to the point of the stop codon in PnPI-2, after which to the other Ranunculid PI orthologs. One unexpected the similarity declines considerably. result is the position of the P. magnificum gene, PmPI-2, Parsimony and distance-based phylogenies of the PI as sister to the higher eudicot clade in the parsimony sequences were produced by analysis of the complete analysis. This is not the expected position of a represen- PI protein data set using several AP3-like sequences as tative of a basal magnolid dicot. The other Piperaceae the outgroup. The parsimony analysis resulted in 8 trees representatives, PmPI-1 and PhPI, are located at the base of equivalent length, from which a 50% majority rule of the tree. In diverging from its paralog, PmPI-2 may consensus tree with a consistency index of 0.972 was have independently acquired sequence motifs charac- derived, shown in Figure 4A. The topology of the parsi- teristic of the higher eudicot PI representatives. In the mony tree is similar in many ways to that of the distance- distance analysis, PmPI-2 is placed at the base of the Evolution of AP3 and PI Lineages 771

Figure 4.—Phylogenetic trees of the complete PI amino acid data set. The numbers next to the nodes give bootstrap values from 1000 replicates. (A) Parsimony analysis. The tree length is 1526 steps. (B) Distance analysis. tree, closer to the other Piperaceae representatives. We total of nine lower eudicot and magnolid dicot species note that the low bootstrap support seen for certain (Table 3). P. terminalis and P. nudicaule were the only nodes reflects the restricted number of characters on species that we found to have two AP3 -like genes. The which the node is based, rather than on the existence Ranunculid and magnolid dicot AP3 representatives of alternative, better supported topologies. Analysis of show many differences when compared to the higher a nucleotide PI data set yielded trees which displayed a eudicot AP3 orthologs. Foremost among these are the significant number of unresolved polytomies due largely considerable differences in the sequences of the pre- to the effects of saturation (data not shown). Where dicted C-termini. The higher eudicot AP3 gene products resolution was obtained using the nucleotide sequences, display a highly conserved C-terminal motif with the the structure of the consensus tree did not differ from consensus sequence D(L/I)TTFALLE (euAP3 motif, that found using the protein sequences. Figure 3B). The majority of the AP3 clones from the Sequence and phylogenetic analysis of the AP3 homo- Ranunculidae and the magnolid dicots have a com- logs: The higher eudicot L. esculentum (tomato) has pletely different predicted C-terminal sequence with the been previously found to contain an AP3 paralog, TM6, consensus YGxHDLRLA (paleoAP3 motif, Figure 3B). which is considered to be orthologous to the The predicted product of an AP3 -like gene from Zea tuberosum gene PD2 (Pnueli et al. 1991; Garcia-Mar- mays, SILKY-1, also displays the paleoAP3 motif with oto et al. 1993). Solanum (potato) contains another only one amino acid difference (C. Padilla and R. gene, STDEF, which appears to be orthologous to the Schmidt, personal communication). The paleoAP3 AP3 lineage members of the other and higher motif is slightly divergent, but recognizable, in PtAP3- eudicots. Lycopersicon was examined in an attempt to 1 and PtAP3-2 from P. terminalis. Interestingly, the pa- find an AP3 -like gene which would be more similar leoAP3 motif aligns very well with the C-terminal end to the other higher eudicot AP3 orthologs than the of the predicted TM6 and PD2 proteins (Figure 3B). previously described TM6. We recovered several clones Similarity to the paleoAP3 motif is also seen in the of a cDNA, LeAP3 whose predicted product displays 93% product of a putative B group-related gene which has amino acid identity to that of STDEF from Solanum been isolated from the Pteridophyte (fern) Ceratopteris (Garcia-Maroto et al. 1993). This level of similarity is (Munster et al. 1997; Figure 3B). The product of this comparable to that seen between the TM6 and PD2 gene fern gene, CRM3, has relatively low similarity to the products from Lycopersicon and Solanum, respectively. AP3 gene products through most of its length, but the This result confirms the presence of two separate AP3- predicted C-terminal end aligns well with the paleoAP3 related genes in both tomato and potato. motif. This observation is remarkable considering the We have also cloned eleven AP3-like genes from a factthat the Pteridophytes diverged from the land plants 772 E. M. Kramer, R. L. Dorit and V. F. Irish

TABLE 3 APETALA3-like genes

Subclass Family Species Gene Ref. Acc. no. Asteridae Scrophulariaceae Antirrhinum majus DEF (Schwartz-Sommer et al. 1992) S12378 Solanaceae Lycopersicon esculentum cv. LeAP3 This study AF052868 Celebrity Lycopersicon esculentum TM6 (Pnueli et al. 1991) X60759 cv. Tiny Tim Petunia hybrida PMADS1 (van der Krol et al. 1993) X69946 Solanum tuberosum STDEF (Garcia-Maroto et al. 1993) X67511 PD2 (Garcia-Maroto et al. 1993) None Nicotiana tabacum NTDEF (Davies et al. 1996) X96428 Asteraceae Argyroxiphium sandwicense AsAP3 Unpublished resultsb Oleaceae Syringa vulgaris ScAP3 This study AF052869 Dilleniidae Brassicaceae Arabidopsis thaliana AP3 (Jack et al. 1992) A42095 Brassica oleraceae BobAP3 (Carr and Irish 1997) BOU67456 var. boytris Brassica oleraceae Boi1AP3 (Carr and Irish 1997) BOU67453 var. italica Boi2AP3 (Carr and Irish 1997) BOU67455 Caryophyllidae Caryophyllaceae Silene latifolia SLM3 (Hardenack et al. 1994) X80490 Dianthus caryophyllus CMB2 Unpublished resultsc L40405 Polygonaceae Rumex acetosa RAD1 (Ainsworth et al. 1995) X8913 RAD2 (Ainsworth et al. 1995) X89108 Rosidae Fabaceae Medicago sativa NMH7 (Heard and Dunn 1995) L41727 Buxaceae Pachysandra terminalis PtAP3-1 This study AF052870 PtAP3-2 This study AF052871 Ranunculidae Papaveraceae Papaver californicum PcAP3 This study AF052872 Papaver nudicaule PnAP3-1 This study AF052873 PnAP3-2 This study AF052874 Fumariaceae Dicentra eximia DeAP3 This study AF052875 Ranunculaceae Caltha palustris CpAP3 This study AF052854 Ranunculus bulbosus RbAP3 This study AF052876 Magnoliidae Magnolicaceae Michelia figo MfAP3 This study AF052877 Liriodendron tulipifera LtAP3a This study AF052878 Piperaceae Peperomia hirta PhAP3 This study AF052879 Commelinidae Poaceae Zea mays SILKY-1 Unpublished resultsd a LtAP3 was only represented by a partial sequence at the time of submission and was included in the phylogenetic analysis. b M. Barrier and M. Purugganan, personal commmunication. c S. C. Baudinette and K. W. Savin, unpublilshed results. d C. Padilla and R. Schmidt, personal communication. roughly 400 million years ago (mya) (Stewart and The PI motif is particularly divergent in RAD1 and RAD2 Rothwell 1993). The Papaver clones PnAP3-1 and from Rumex, while in CMB2,anAP3 paralog from Dian- PcAP3 are unique among the Ranunculid AP3 genes in thus, a truncation has eliminated most of the PI motif- that they do not show high conservation of the paleoAP3 derived sequence. In the Michelia and Peperomia AP3 motif. representatives, the PI motif sequence is unrecogniz- One other motif that we recognized in the new AP3 able. gene products is a region with similarity to the PI motif. Figure 5, A and B, show the results of parsimony and The Ranunculid AP3 representatives have a region up- distance analysis of the complete AP3 protein data set. stream of the paleoAP3 motif with the consensus se- The parsimony analysis produced 27 trees of equivalent quence FxFRLQPSQPNLH. This is very similar to the length, from which we generated a 50% majority rule core of the PI motif consensus sequence (Figure 3B). consensus tree with a consistency index of 0.89. The When all of the AP3-like gene products are aligned, the topmost clade in both the parsimony and distance trees PI motif-derived sequence can be observed in almost contains a grouping of the higher eudicot Brassicaceae, all of the AP3 -like proteins. While this sequence has Asterid and Caryophyllid AP3 orthologs (Figure 5A). been very highly conserved in the PI lineage, in the AP3 The members of this clade will be referred to as the lineage it has diverged to differing degrees. TM6 and euAP3 lineage. The second main clade in both trees is PD2 show a much greater conservation of the PI motif made up of several other higher eudicot AP3-like genes: than do the other higher eudicot AP3 gene products. TM6, PD2, AsAP3 and CMB2. The orthology of PD2 and Evolution of AP3 and PI Lineages 773

Figure 5.—Phylogenetic trees of the complete AP3 data set. The numbers next to the nodes give bootstrap values from 1000 replicates. (A) Parsimony analysis. The tree length is 1781 steps. The distribution of diagnostic sequence characters is mapped onto the tree. (B) Distance analysis.

TM6 has been suggested previously (Garcia-Maroto If the character of each representative’s predicted et al. 1993). Likewise, a recent analysis of the relation- C-terminal motif is mapped on Figure 5A, we see that ships of the MADS-box genes also found an association the paleoAP3 motif is present throughout the basal pa- between AsAP3 and TM6 (Purugganan 1997). CMB2 raphyly and along the TM6 lineage. We will refer to the is an AP3-like gene that was isolated from Dianthus caryo- lower dicot and magnolid dicot genes which exhibit the phyllus. The inclusion of CMB2 in the TM6 lineage is paleoAP3 motif as the paleoAP3 lineage. The euAP3 supported byseveral positions which are synapomorphic motif is synapomorphic for the clade defining what we for this clade, including 69Met, 118Gly, 147Lys and have called the euAP3 lineage. Diagnostic amino acid 211Val. Our analyses define all of these genes as descen- character states from throughout the protein sequence dants of a lineage that we will refer to as the TM6 lineage. can be mapped onto the tree as well. Although the In the parsimony analysis, the remaining AP3-like C-termini of the predicted PtAP3-1 and PtAP3-2 proteins genes of the lower eudicots and magnolid dicots consti- resemble the paleoAP3 motif, in other positions the tute a paraphyletic group at the base of the tree. The sequence is more similar to the euAP3 genes. These two very similar Pachysandra genes are paired together euAP3-like positions include 54Leu, 55His, 72Leu, 148- and are the most closely related to the higher eudicot Asn and 150Ile. The euAP3 clade is also defined by clades. In contrast, the distance analysis defines a clade several synapomorphic residues, aside from the C-termi- composed of PtAP3-1, PtAP3-2 and the Ranunculid and nal euAP3 motif. These changes include 2Gly Ala, Magnoliaceae sequences. This result highlights the simi- 7Glu Gln, 28Ile Leu, 115Asp Ser/Cys, 144His→ Lys → → → → larities between the Pachysandra and other lower eudi- and 226Phe Leu. Interestingly, the primitive states of → cot sequences. In both trees, PnAP3-1 and PcAP3 from positions 2 and 7 in the MADS box are identical to the P. nudicaule and P. californicum, respectively, are paired character state in the PI lineage members. together with very high certainty, suggesting that they define a paralogous lineage. Similar to the PI analysis, DISCUSSION the low bootstrap support found for several nodes re- flects the limited number of characters on which the Morphological evolution proceeds by changes in de- node is based. All resolved nodes shown in Figure 5A velopmental pathways due to alterations in the structure were present in all equally parsimonious trees. As with and regulation of particular developmental genes. The PI, a nucleotide AP3 data set was analyzed, but the re- organ identity genes of the A, B, and C classes are logical sulting trees were highly unresolved (data not shown). starting points for a study of changing floral morpholo- Where resolution was obtained using the nucleotide gies. The B group genes, in particular, appear to be a sequences, the structure did not differ from that found likely site of developmental flexibility in floral evolution. using the protein sequences. While A and C group members might be expected to 774 E. M. Kramer, R. L. Dorit and V. F. Irish show very high levels of constraint due to their pleiotro- pic roles in floral meristem identity and determinacy, as well as in organ identity, the functions of the B group genes appear to be limited to establishing organ iden- tity. Moreover, the fact that the petals and stamens dis- play enormous morphological plasticity within the an- giosperms may be directly reflected as changes in the B group genes. These predictions are supported by sub- stitution rate analysis, which shows that the AP3/PI group is evolving 20–40% faster than are all of the other plant MADS-box genes (Purugganan 1997). Further- more, although the B group genes appear to have a conserved function in determining petal identity in the higher eudicots, these petals are not thought to be homologous with those of the lower eudicots, magno- lid dicots and monocots. The independent derivation events that gave rise to the different petals of the angio- sperms may be reflected in the diversified structure and function of the B group genes. In this study we present the isolation of twenty-six new B group genes from a total of thirteen species distributed throughout the angiosperms. One of the most striking findings of the analysis of these genes is Figure 6.—Model of B Group MADS-box gene evolution. the high frequency of gene duplications within both The hypothesized appearance and divergence of various diag- the AP3 and PI lineages. We have identified several key nostic sequences are indicated on a simplified lineage tree. ancestral duplications, in addition to what appear to be more recent duplication events. Pinpointing the exact time at which these duplications have occurred is diffi- gous to the paleoAP3 motif, then we must add posses- cult, however. The functional redundancy that results sion of the paleoAP3 motif to the list of characteristics from having multiple gene copies can serve to release of the B group ancestor. The fact that the predicted constraint on the sequence of the paralogs. Thus, al- product of CRM3 does not possess the PI motif could though a high sequence similarity between paralogs reflect an origin of this domain that postdates the split would seem to suggest a relatively recent duplication of the ferns from the land plants (Figure 6). event, low sequence similarity could reflect either an All of the species which we have thoroughly surveyed ancient duplication or the rapid divergence of a paralo- contain both an AP3 and a PI representative, suggesting gous lineage. If we can identify orthologs of both dupli- that the duplication event which produced these two cation products in more than one species, then it is lineages predates the diversification of the angiosperms. clear that the duplication event occurred before the last Initially after the duplication event, both of the products common ancestor of the species in question. would have exhibited the PI motif, the characteristic The B group gene ancestor and the AP3/PI duplica- MADS-box residues and, possibly, the paleoAP3 motif. tion event: The AP3 and PI genes have previously been In the PI lineage, the PI motif and the MADS-box char- shown to be members of closely paralogous gene lin- acters appear to have been highly conserved throughout eages (Doyle 1994; Purugganan et al. 1995; Purugga- while the paleoAP3 motif was lost, possibly by a single nan 1997; Theissen et al. 1996). We report further truncation event. In contrast, along the AP3 lineage, the evidence for the paralogy of the AP3 and PI lineages, PI motif diversifies dramatically, along with additional most notably the presence of the PI motif-derived se- changes throughout the protein (Figure 6). quence in the AP3 homologs. Based on the common The PI lineage: The high degree of conservation of sequence characteristics of the AP3 and PI lineages, we the PI motif throughout the members of the PI lineage can reconstruct some of the characters of the B group suggests that it has a critical function. The requirement ancestral gene lineage. The basal representatives ofboth of the C terminus for in vivo function has been demon- lineages contain the PI motif as well as several diagnostic strated for the products of several plant MADS-box amino acids in the MADS domain (2Gly, 7Glu). This genes, including AP3 (Krizek and Meyerowitz 1996; would suggest, based on parsimony, that the most recent Riechmann and Meyerowitz 1997). In vitro studies B group ancestor also displayed these traits. The pre- have shown, however, that the C regions of the AP3 and dicted product of the CRM3 gene from Pteridophyte PI proteins are not required for heterodimerization or Ceratopteris may reveal another aspect of the B group the binding of the complex to DNA (Krizek and Mey- ancestor. If the CRM3 C-terminal domain is homolo- erowitz 1996; Riechmann et al. 1996b). Furthermore, Evolution of AP3 and PI Lineages 775 none of the PI gene mutants isolated in Arabidopsis or Antirrhinum has lesions in this region (Goto and Meyerowitz 1994; Jack et al. 1992; Sommer et al. 1990; Trobner et al. 1992). For these reasons, it is difficult to speculate on the functions of the PI motif, aside from suggesting that it is involved in protein interactions which are important for the overall function of the PI orthologs. Many other duplication events have followed the cre- ation of the separate AP3 and PI lineages. In the PI lineage, five different paralog pairs have been identi- fied, in Petunia hybrida, R. bulbosus, P. nudicaule, P. mag- nificum and Oryza sativa. The phylogenetic analysis sug- gests that these are the products of independent duplication events which occurred after the last com- mon ancestor of any of the species included in this analysis. In two of these species, P. hybrida and O. sativa, there is some evidence for functional divergence of the paralogs (Angenent et al. 1993; Chung et al. 1995). Although polyploidy is a common cause for the pres- ence of paralogs within plant genomes, none of these species are known polyploids (Bennett and Smith 1976, 1991; Bennett and Leitch 1995). Duplications such as these, as well as insertion events like those seen in the PI orthologs of P. nudicaule and D. eximia, may be useful as synapomorphies to define taxonomic groups. Figure 7.—Model showing correlation of B Group gene The AP3 lineage: We have defined two distinct AP3 duplication events and land plant evolution. The divisions of the simplified gene phylogeny indicate which B group lineage lineages which appear to be present throughout the representatives have been identified in the extant plant groups higher eudicots. These are clearly the products of an listed on the left. ancient duplication event that occurred before the di- versification of the higher eudicots. We cannot rule out the possibility that this duplication event occurred very role of these Papaver genes is unknown. Evidence for early in the angiosperms and we have simply not de- a functionally divergent AP3 paralog is seen in the Med- tected the members of the euAP3 lineage in the lower icago sativa gene, NMH7 (Heard and Dunn 1995). The eudicots or magnolid dicots. The sequence and phylo- sequence of NMH7 is highly diverged from the sequence genetic analyses do not support an extremely early of the other euAP3 lineage members, and may reflect duplication, however. Most notably, the Pachysandra the fact that the function of this paralog has also di- PtAP3-1 and PtAP3-2 genes appear to represent a mosaic verged considerably, being involved in mediating root of paleoAP3 characters and characters of the euAP3 and nodulation (Heard and Dunn 1995). Our analysis, which TM6 lineages of the higher eudicots. Although the is based on amplifying floral cDNAs, would not have C-terminal regions of the predicted products of the detected any paralogs such as NMH7. Pachysandra genes bear more similarity to the paleoAP3 Implications of the AP3 duplication event and subse- motif, other positions in the protein sequences ally them quent divergence: While the PI lineage appears to be with the higher eudicot genes. Based on these observa- highly conserved, the AP3 lineage has experienced sig- tions, we propose that the duplication which produced nificant diversification, some of which clearly correlates the euAP3 and TM6 lineages from a paleoAP3 ancestor with duplication events. If the fern CRM3 gene is, in occurred after the last common ancestor of the Buxa- fact, a representative of the B group ancestral lineage ceae and the higher eudicots but before the diversifica- which possesses a paleoAP3 motif, then this small region tion of the major higher eudicot subclasses (Figure 7). has been conserved, to some degree, from the Pterido- We see the results of additional taxon-specific duplica- phytes, dating back some 400 mya, up through the lower tion events in the AP3 lineages of Brassica oleracea, Rumex Rosids, which first appeared approximately 80 mya acetosa, and P. terminalis. The AP3 paralog pairs of these (Drinnan et al. 1994; Stewart and Rothwell 1993). species all show relatively high levels of sequence similar- This highly conserved motif was lost, however, in the ity. We have also identified several more divergent para- euAP3 lineage where we see fixation of a new conserved logous AP3 lineages. One such lineage is defined by C-terminal motif. In addition, we see changes through- PcAP3 and PnAP3-1 from P. californicum and P. nudicaule, out the predicted products of the euAP3 lineage genes, respectively. Other than being florally expressed, the including the MADS-box domain. The euAP3 lineage 776 E. M. Kramer, R. L. Dorit and V. F. Irish would, therefore, appear to have undergone consider- Version 61 of PAUP 4.0*. This work was supported by grants from able sequence divergence in regions that were highly the National Science Foundation (IBN 9630916) and United States Department of Agriculture (97 01286) to V.F.I., a National Science conserved, some for at least 300 million years. We as- Foundation graduate fellowship to E.M.K. and the Yale Science Devel- sume that this reflects a shift in the functional repertoire opment Fund to R.L.D. of the euAP3 lineage members relative to the ancestral paleoAP3 representatives. There are many possible ways in which the functions of the euAP3, paleoAP3 and TM6 lineage members may LITERATURE CITED differ. The euAP3 lineage members function primarily Ainsworth, C., S. Crossley, V. Buchanan-Wollaston, M. Thanga- in the establishment of petal and stamen identity. The velu and J. Parker, 1995 Male and female flowers of the dioe- cious plant sorrel show different patterns of MADS box gene role played by euAP3 representatives in stamen identity expression. Plant Cell 7: 1583–1598. is likely to be the ancestral function of these genes since Angenent, G. C., M. Busscher, J. Franken, J. N. M. Mol and A. J. stamens are thought to have evolved only once, before van Tunen, 1992 Differential expression of two MADS box the diversification of the angiosperms. In contrast, the genes in wild-type and mutant petunia flowers. Plant Cell 4: 983– 993. function of euAP3 lineage members in petal identity Angenent, G. C., J. Franken, M. Bussher, L. Colombo and A. J. may be more recently acquired, reflecting a de novo van Tunen, 1993 Petal and stamen formation in petunia is evolution of petals at the base of the higher eudicot regulated by the homeotic gene fbp1. Plant J. 4: 101–112. Bennett, M. D., and I. J. Leitch, 1995 Nuclear DNA amounts in radiation. Similarly, it seems likely that the paleoAP3 angiosperms. Ann. Bot. 76: 113–176. lineage members also function in stamen identity, but Bennett, M. D., and J. B. Smith, 1976 Nuclear DNA amounts in each time that petals were derived, whether from sta- angiosperms. Philos. Trans. R. Soc. Lond. B Biol. Sci. 274: 227– 274. mens or from bracts, the paleoAP3 ortholog(s) may Bennett, M. D., and J. B. Smith, 1991 Nuclear DNA amounts in have been recruited to new roles. In the Ranunculidae, angiosperms. Philos. Trans. R. Soc. Lond. B Biol. Sci. 334: 309– 345. for example, andropetals are thought to have been de- Bowman, J. L., D. R. Smyth E. M. Meyerowitz, Drinnan and 1989 Genes rived many times, even within single families ( directing flower development in Arabidopsis. Plant Cell 1: 37–52. et al. 1994). Each of these events may have resulted in Bowman, J. L., D. R. Smyth and E. M. Meyerowitz, 1991 Genetic changes of the functions of the B group genes. The interactions among floral homeotic genes of Arabidopsis. Develop- ment 112: 1–20. functions of the TM6 lineage members in the higher Carpenter, R., and E. S. Coen, 1990 Floral homeotic mutations eudicots are not well understood. Expression patterns produced by transposon-mutagenesis in Antirrhinum majus. Genes have only been characterized for TM6 itself, which is Dev. 4: 1483–1493. Carr, S. M., and V. F. Irish, 1997 Floral homeotic gene expression highly expressed in the developing petal, stamen and defines developmental arrest stages in Brassica oleracea L. vars. carpel, but a role in the formation of these organs has botrytis and italica. Planta 201: 179–188. not been established (Pnueli et al. 1991). Our analysis Chase, M. W., D. E. Soltis, R. G. Olmstead, D. Morgan, D. H. Les et al., 1993 Phylogenetics of seed plants: an analysis of nucleotide reveals that the TM6 lineage members still retain se- sequences from the plastid gene rbcL. Ann. Miss. Bot. Gard. 80: quence similarity to the ancestral paleoAP3 lineage, but 528–580. the individual representatives appear to have under- Chung, Y.-Y., S.-R. Kim, H.-G. Kang, Y.-S. Noh, M. Park et al., 1995 Characterization of two rice MADS box genes homologous to gone significant diversification since the euAP3/TM6 GLOBOSA. Plant Sci. 109: 45–56. duplication event. Coen, E. S., and E. M. Meyerowitz, 1991 The war of the whorls: These results reveal a complex pattern of gene dupli- genetic interactions controlling flower development. Nature 353: 31–37. cation and divergence in the AP3 and PI lineages. While Crane, P. R., E. M. Friis and K. R. Pedersen, 1995 The origin and it appears that the euAP3 motif is found in all of the early diversification of angiosperms. Nature 374: 27–33. higher eudicot AP3 homologs that are known to play a Cronquist, A., 1981 An Integrated System of Classification of Flowering Plants. Columbia University Press, New York. role in determining petal and stamen identity, it is the Cronquist, A., 1988 The Evolution and Classification of Flowering paleoAP3 motif that defines the AP3 homologs of the Plants. The New York Botanical Garden, New York. lower eudicots and magnolid dicots. To understand Dahlgren, R. M. T., H. T. Clifford and P. F. Yeo, 1984 The Families of the Monocotyledons. Springer Verlag, New York. what changes in function may underlie the sequence Davies, B., and Z. Schwarz-Sommer, 1994 Control of floral organ divergence observed in the euAP3 lineage relative to identity by homeotic MADS-box transcription factors. Results the ancestral paleoAP3 lineage, investigation of the Probl. Cell Differ. 20: 235–258. Davies, B., A. Di Rosa, T. Eneva, H. Saedler H. Sommer, function of the paleoAP3 lineage members in the lower and 1996 Alteration of tobacco floral organ identity by expression eudicots and magnolid dicots will be necessary. These of combinations of Antirrhinum MADS-box genes. Plant J. 10: and other comparative developmental approaches are crit- 663–677. Doyle, J. J., ical to defining the relationships between gene duplica- 1994 Evolution of a plant homeotic multigene family: toward connecting molecular systematics and molecular develop- tion, gene expression and the morphological diversifica- mental genetics. Syst. Biol. 43: 307–328. tion that is the hallmark of the angiosperm radiation. Drinnan, A. N., P. R. Crane and S. B. Hoot, 1994 Patterns of floral evolution in the early diversification of non-magnoliid dicotyle- Irish The authors thank the members of the lab for helpful discus- dons (eudicots), pp. 93–122 in Early Evolution of Flowers, edited C. Padilla, R. Schmidt, M. Barrier sions. We are very grateful to by P. K. Endress and E. M. Friis. Springer-Verlag, New York. and M. Purugganan for providing unpublished sequence informa- Endress, P. K., 1994 Floral structure and evolution of primitive tion. We also thank David Swofford for kindly allowing us to use angiosperms: recent advances. Plant Syst. Evol. 192: 79–97. Evolution of AP3 and PI Lineages 777

Friis, E. M., and P. K. Endress, 1990 Origin and evolution of angio- tures and homology with homeotic genes from Antirrhinum and sperm flowers. Adv. Bot. Res. 17: 99–162. Arabidopsis. Plant J. 1: 255–266. Garcia-Maroto, F., F. Salamini and W. Rohde, 1993 Molecular Pollock, R., and R. Treisman, 1991 Human SRF-related proteins: cloning and expression patterns of the Deficiens homologous gene DNA-binding properties and potential regulatory targets. Genes St-Deficiens from Solanum tuberosum. Plant J. 4: 771–780. Dev. 5: 2327–2341. Goto, K., and E. M. Meyerowitz, 1994 Function and regulation Purugganan, M. D., 1997 The MADS-box floral homeotic gene of the Arabidopsis floral homeotic gene PISTILLATA. Genes lineages predate the origin of seed plants: phylogenetic and mo- Dev. 8: 1548–1560. lecular clock estimates. J. Mol. Evol. 45: 392–396. Hansen, G., J. Estruch, H. Sommer and A. Spena, 1993 NTGLO : Purugganan, M. D., S. D. Rounsley, R. J. Schmidt and M. F. Yanof- a tobacco homologue of the GLOBOSA floral homeotic gene of sky, 1995 Molecular evolution of flower development: diversi- Antirrhinum majus : cDNA sequence and expression pattern. Mol. fication of the plant MADS-box regulatory gene family. Genetics Gen. Genet. 239: 310–312. 140: 345–356. Qiu, Y.-L., M. W. Chase, D. H. Les C. R. Parks, Hardenack, S., D. Ye, H. Saedler and S. Grant, 1994 Comparison and 1993 Molecular of MADS box gene expression in developing male and female phylogenetics of the Magnoliidae: cladistic analyses of nucleotide flowers of the dioecious plant white campion. Plant Cell 6: 1775– sequences of the plastid gene rbcL. Ann. Miss. Bot. Gard. 80: 587–606. 1787. Riechmann, J. L., E. M. Meyerowitz, Heard, J., and K. Dunn, 1995 Symbiotic induction of a MADS-box and 1997 MADS domain proteins in plant development. J. Biol. Chem. 378: 1079–1101. gene during development of alfalfa root nodules. Proc. Natl. Riechmann, J. L., B. A. Krizek E. M. Meyerowitz, Acad. Sci. USA 92: 5273–5277. and 1996a Di- merization specificity of Arabidopsis MADS domain homeotic Hill, T., C. D. Day, S. C. Zondlo, A. G. Thackeray and V. F. Irish, proteins APETALA1, APETALA3, PISTILLATA, and AGAMOUS. 1998 Discrete spatial and temporal cis-acting elements regulate Proc. Natl. Acad. Sci. USA 93: 4793–4798. transcription of the Arabidopsis floral homeotic gene APETALA3. Riechmann, J. L., M. Wang and E. M. Meyerowitz, 1996b DNA- Development 125: 1711–1721. binding properties of Arabidopsis MADS domain homeotic pro- Irish, V. F., and E. M. Kramer, 1998 Genetic and molecular analysis teins APETALA1, APETALA3, PISTILLATA and AGAMOUS. Nu- of angiosperm flower development. Adv. Bot. Res. 28: 197–230. Irish, V. F., Y. T. Yamamoto, cleic Acids Res. 24: 3134–3141. and 1995 Conservation of floral ho- Samach, A., S. E. Kohalmi, P. Motte, R. Datla and G. W. Haughn, meotic gene function between Arabidopsis thaliana and Antirhinum 1997 Divergence of function and regulation of class B floral majus. Plant Cell 7: 1635–1644. Jack, T., L. L. Brockman E. M. Meyerowitz, organ identity genes. Plant Cell 9: 559–570. and 1992 The home- Schwarz-Sommer, Z., I. Hue, P. Huijser, P. J. Flor, R. Hansen et otic gene APETALA3 of Arabidopsis thaliana encodes a MADS box al., 1992 Characterization of the Antirrhinum floral homeotic and is expressed in petals and stamens. Cell 68: 683–697. MADS-box gene deficiens : evidence for DNA binding and autoreg- Komaki, M., K. Okada, E. Nishino Y. Shimura, and 1988 Isolation ulation of its persistent expression throughout flower develop- and characterization of novel mutants of Arabidopsis thaliana de- ment. EMBO J. 11: 251–263. fective in flower development. Development 104: 195–203. Shore, P., and A. D. Sharrocks, 1995 The MADS-box family of Krizek, B. A., and E. M. Meyerowitz, 1996 Mapping the protein transcription factors. Eur. J. Biochem. 229: 1–13. regions responsible for the functional specificities of the Arabi- Sommer, H., J.-P. Beltra´n, P. Huijser, H. Pape, W.-E. Lo¨ nnig et al., dopsis MADS domain organ-identity proteins. Proc. Natl. Acad. 1990 Deficiens, a homeotic gene involved in the control of Sci. USA 93: 4063–4070. flower morphogenesis in Antirrhinum majus : the protein shows Kush, A., A. Brunelle, D. Shevell and N.-H. Chua, 1993 The homology to transcription factors. EMBO J. 9: 605–613. cDNA sequence of two MADS box proteins in petunia. Plant Stewart, W. N., and G. W. Rothwell, 1993 Paleobotany and the Physiol. 102: 1051–1052. Evolution of Plants, Ed. 2, Cambridge University Press, Cambridge. Ma, H., M. F. Yanofsky and E. M. Meyerowitz, 1991 AGL1-AGL6, Swofford, D., 1993 Phylogenetic Analysis using Parsimony. Illinois Nat- an Arabidopsis gene family with similarity to floral homeotic and ural History Survey, Champaign, IL. Takhtajan, A., transcription factor genes. Genes Dev. 5: 484–495. 1991 Evolutionary Trends in Flowering Plants. Colum- McGonigle, B., K. Bouhidel and V. F. Irish, 1996 Nuclear localiza- bia University Press, New York. Theissen, G., H. Saedler, tion of the Arabidopsis APETALA3 and PISTILLATA homeotic and 1995 MADS-box genes in plant on- gene products depends on their simultaneous expression. Genes togeny and phylogeny: Haekel’s ‘biogenetic law’ revisited. Curr. Dev. 10: 1812–1821. Opin. Gen. Dev. 5: 628–639. Meyerowitz, E. M., J. L. Bowman, L. L. Brockman, G. N. Drews, Theissen, G., J. T. Kim and H. Saedler, 1996 Classification and T. Jack et al., 1991 A genetic and molecular model for floral phylogeny of the MADS-box multigene family suggest defined development in Arabidopsis thaliana. Development 1: 157–167. roles of MADS-box gene subfamilies in the morphological evolu- tion of eukaryotes. J. Mol. Evol. 43: 484–516. Munster, T., J. Pahnke, A. Di Rosa, J. T. Kim, W. Martin et al., Trobner, W., L. Ramirez, P. Motte, I. Hue, P. Huijser et al., 1992 1997 Floral homeotic genes were recruited from homologous Globosa—a homeotic gene which interacts with deficiens in the MADS-box genes preexisting in the common ancestor of ferns control of antirrhinum floral organogenesis. EMBO J. 11: 4693– and seed plants. Proc. Natl. Acad. Sci. USA 94: 2415–2420. Norman, A., M. Runswick, R. Pollock R. Treisman, 4704. and 1988 Iso- van der Krol, A. R., and N.-H. Chua, 1993 Flower development lation and properties of cDNA clones encoding SRF, a transcrip- in petunia. Plant Cell 5: 1195–1203. tion factor that binds to the c-fos serum response element. Cell van der Krol, A. R., A. Brunelle, S. Tsuchimoto and N. H. Chua, 55: 989–1003. 1993 Functional analysis of petunia floral homeotic mads box Pnueli, L., M. Abu-Abeid, D. Zamir, W. Nacken, Z. Schwarz-Som- gene PMADS1. Genes Dev. 7: 1214–1228. mer et al., 1991 The MADS box gene family in tomato: temporal expression during floral development, conserved secondary struc- Communicating editor: E. Meyerowitz 778 E. M. Kramer, R. L. Dorit and V. F. Irish

APPENDIX 1 Alignment of all PI lineage members with selected AP3 outgroup representitives

(Continued) Evolution of AP3 and PI Lineages 779

APPENDIX 1 (Continued)

(Continued) 780 E. M. Kramer, R. L. Dorit and V. F. Irish

APPENDIX 1 (Continued)

Incomplete sequence of MfPI is not shown (see Table 2). Amino acids 25–291 were used in the phylogenetic analysis. Evolution of AP3 and PI Lineages 781 APPENDIX 2 Alignment of all AP3 lineage members with selected PI outgroup representatives

(Continued) 782 E. M. Kramer, R. L. Dorit and V. F. Irish

APPENDIX 2 (Continued)

(Continued) Evolution of AP3 and PI Lineages 783

APPENDIX 2 (Continued)

Incomplete sequence of LtAP3 and sequences of unpublished genes AsAP3 and SILKY-1 are not shown (see Table 3 for references). Amino acids 25–283 were used in the phylogenetic analysis.