Molecular Evolution of FLORICAULA/LEAFY Orthologs in the Andropogoneae (Poaceae)

Kirsten Bomblies¹ and John F. Doebley
Department of Genetics, University of Wisconsin-Madison

Members of the grass family (Poaceae) exhibit a broad range of inflorescence structures and other morphologies, making the grasses an interesting model system for studying the evolution of development. Here we present an analysis of the molecular evolution of FLORICAULA/LEAFY-like genes, which are important developmental regulatory loci known to affect inflorescence development in a wide range of flowering species. We have focused on sequences from the Andropogoneae, a tribe within the grass family that includes maize (Zea mays ssp. mays) and sorghum (Sorghum bicolor). The FLORICAULA/LEAFY gene phylogeny we generated largely agrees with previously published phylogenies for the Andropogoneae using other nuclear genes but is unique in that it includes both members of one of the many duplicate gene sets present in maize. The placement of these sequences in the phylogeny suggests that the duplication of the maize FLORICAULA/LEAFY orthologs, zfl1 and zfl2, is a consequence of a proposed tetraploidy event that occurred in the common ancestor of Zea and a closely related genus, Tripsacum. Our data are consistent with the hypothesis that the transcribed regions of the FLORICAULA/LEAFY-like genes in the Andropogoneae are functionally constrained at both nonsynonymous and synonymous sites and show no evidence of directional selection. We also examined conservation of short noncoding sequences in the first intron, which may play a role in gene regulation. Finally, we investigated the genetic diversity of one of the two maize FLORICAULA/LEAFY orthologs, zfl2, in maize and its wild ancestor, teosinte (Z. mays ssp. parviglumis), and found no evidence for selection pressure resulting from maize domestication within the zfl2-coding region.

Downloaded from https://academic.oup.com/mbe/article/22/4/1082/1083392 by guest on 01 October 2021

Introduction

Flower-bearing reproductive structures or inflorescences of angiosperms (flowering plants) vary dramatically in form and complexity. The grass family (Poaceae), which contains numerous closely related species with diverse inflorescence phenotypes, is particularly striking in this regard (Kellogg 2000), providing a useful model for the evolution of reproductive morphology in plants. To begin understanding the genetics underlying the evolution of morphological structures, it is important to investigate the molecular evolution of regulatory genes involved in the development of the phenotypes in question.

The genetic basis of the complex morphological changes that accompany the transition to reproductive development has been extensively studied in flowering plants, particularly in dicot species. One of the key regulatory genes in inflorescence and flower development is the Antirrhinum majus FLORICAULA gene (FLO; Coen et al. 1990) and its Arabidopsis thaliana ortholog, LEAFY (LFY; Weigel et al. 1992). FLO and LFY gene products are involved in promoting the reproductive transition, as well as in controlling the identity and patterning of flowers and their constituent organs (Coen et al. 1990; Weigel et al. 1992). Studies in additional species suggest that the role of FLO/LFY orthologs in reproductive development is largely conserved in diverse angiosperms, including maize (Hofer et al. 1997; Souer et al. 1998; Molinero-Rosales et al. 1999; Ahearn et al. 2001; Bomblies et al. 2003).

In some species the FLO/LFY genes appear to have evolved novel functions in addition to their normal roles in reproductive development. These include roles in shoot apical meristem development in tobacco (Ahearn et al. 2001), leaf compounding in pea and tomato (Souer et al. 1998; Molinero-Rosales et al. 1999), and a potential role in inflorescence branching in rice (Kyozuka et al. 1998). Furthermore, expression changes of FLO/LFY-like genes have been implicated in the evolution of inflorescence architecture in Brassicaceae species (Shu et al. 2000; Yoon and Baum 2004), while in maize we have previously proposed one of two duplicate FLO/LFY orthologs, zfl2, as a candidate gene for a quantitative trait locus (QTL) contributing to inflorescence structure differences between maize and its wild progenitor, teosinte (Zea mays ssp. parviglumis; hereafter parviglumis; Bomblies et al. 2003; Doebley 2004). The potential roles of FLO/LFY genes in inflorescence structure evolution, along with the finding that these genes appear to be involved in inflorescence branching in the grasses rice and maize (Kyozuka et al. 1998; Bomblies et al. 2003), make them attractive candidates for a role in mediating the evolution of inflorescence structure differences in the Poaceae.

To begin addressing whether FLO/LFY orthologs may play a role in grass morphological evolution, we undertook a study of the molecular evolution of FLO/LFY-like genes in the Andropogoneae, a morphologically diverse tribe of grasses that includes maize and sorghum (Kellogg 2000). We generated a phylogeny for Andropogoneae FLO/LFY-like genes and studied their molecular evolution. We also examined nucleotide diversity at the zfl2 locus in maize and parviglumis to address whether this gene has been selected for inflorescence architecture differences during the domestication of maize. Taken together, our results suggest that the FLO/LFY-like genes in the Andropogoneae are evolving with selective constraint for amino acid conservation. Relative-rate tests on zfl1- and zfl2-like sequences in maize and its close relatives suggest that in most species neither of the paralogs shows strong evidence for relaxed constraint following duplication.

¹ Present address: Department of Molecular Biology, Max Planck Institute for Developmental Biology, Spemannstrasse 37-39, Tübingen, Germany.
Key words: FLORICAULA/LEAFY, Andropogoneae, zfl1, zfl2, maize, domestication.
E-mail: [email protected].
Mol. Biol. Evol. 22(4):1082–1094. 2005 doi:10.1093/molbev/msi095
Advance Access publication February 2, 2005

Molecular Biology and Evolution vol. 22 no. 4 © Society for Molecular Biology and Evolution 2005; all rights reserved.

Table 1
Species Used for Sequencing of FLO/LFY Orthologs

Sample                      Paralog(a) Origin         Source   Collection  GenBank Accession Number
Panicoideae, Andropogoneae
Apluda mutica                                         USDA     PI 271556   AY789607
[...] odorata                          India          USDA     PI 301632   AY789616
Capillipedium parviflorum              India          USDA     PI 301782   AY789618
Chionachne koenigii                                   RES      97-18       AY789614
Chrysopogon fulvus                                    USDA     PI 199241   AY789611
Cleistachne sorghoides                                ICRISAT  IS 14346    AY789619
Coelorachis aurita                     Paraguay       USDA     PI 404628   AY789606
Coelorachis selloana                                  USDA     PI 309987   AY789608
[...] aquatica                                        JFW      2-88        AY789609
Coix lacryma-jobi                      Brazil         USDA     PI 320865   AY789624
Cymbopogon distans                     India          USDA     PI 271552   AY789610
Cymbopogon flexuosus                                  USDA     PI 209700   AY789617
Elionurus muticus                                     EAK      JS 5865     AY789605
Elionurus tripsacoides                 Texas          JD       646         AY789604
Heteropogon contortus                  South          USDA     PI 364892   AY789612
Hyparrhenia hirta                      Ethiopia       USDA     PI 196827   AY789615
[...] afrum                            South Africa   USDA     PI 364923   AY789620
[...] digitatus                                       USDA     PI 206746   AY789613
Saccharum officinarum                                 LL       UMGH        AY789622
Sorghum bicolor                                       ICRISAT  IS 12711    AY789623
Sorghum versicolor                     Ethiopia       USDA     PI 260273   AY789621
Tripsacum andersonii        zfl1                      DT       68-68       AY789595
                            zfl2                                           AY789600
                            zfl2(m)                                        AY789599
[...]                       zfl1       United States  DT       68-23-5     AY789596
Tripsacum floridanum        zfl1       United States           68-23-5     AY789590
                            zfl2                                           AY789601
Tripsacum latifolium        zfl1                      DT       79-20       AY789592
Tripsacum zopilotense       zfl1       Mexico                  79-74       AY789591
                            zfl2                                           AY789602
Zea diploperennis           zfl2                      RG       1120        AY789598
Zea luxurians               zfl1                      HI       G-5         AY789594
                            zfl2                                           AY789597
Zea mays                    zfl1       United States  JK       W22         AY789593
                            zfl2                                           AY789603
Panicoideae, Arundinelleae
Arundinella hirta                                     USDA     PI 246756   AY789625

NOTE.—DT = David Timothy; EAK = Elizabeth A. Kellogg; HI = Hugh Iltis; JD = John Doebley; JFW = Jonathan F. Wendel; JK = Jerry Kermicle; ICRISAT = International Crops Research Institute for Semi-Arid Tropics; LL = Lewis Lukens; RES = Russell E. Spangler; RG = Rafael Guzman; UMGH = University of Minnesota Greenhouse; USDA = United States Department of Agriculture.
(a) Paralog = sequence similarity to maize zfl1 or zfl2 for tetraploid clade (Zea and Tripsacum); zfl2(m) = maize-like zfl2 allele from T. andersonii.

Finally, though we have previously presented zfl2 as a candidate gene for a maize domestication QTL based on its roles in development, there is no significant evidence for a selective sweep having acted on the zfl2-transcribed region during domestication.

Materials and Methods

Sample Material and Sequencing

We amplified FLO/LFY-like gene sequences by polymerase chain reaction (PCR) from 29 Andropogoneae species in 18 genera and one out-group (Arundinella hirta; table 1) using primers matching a conserved region of the FLO/LFY genes from rice and maize: 5′CCAACGACGCCTTCTCGG3′ and 5′GGCACTGCTCGTACAGATGG3′. Amplicons range in size from 840 to 1,040 bp and include exon1, intron1, and most of exon2. PCR products were cloned into pCR 2.1 TOPO vector (Invitrogen, Carlsbad, Calif.) and sequenced at the University of Wisconsin-Madison Biotechnology Center using BigDye chemistry (Applied Biosystems, Foster City, Calif.) with M13F, M13R, and internal primers. Inserts were sequenced to at least 2× coverage from multiple clones to correct for Taq error. We manually edited sequences using Sequencher 4.1 (GeneCodes Corporation, Ann Arbor, Mich.), generated alignments using ClustalW (Thompson, Higgins, and Gibson 1994), and manually edited alignments in Se-Al v.1.0a1 (Rambaut 1996). Andropogoneae sequences are available in GenBank (accession numbers AY789590–AY789625; see Supplementary Material).

We sampled zfl2 diversity in maize from a geographically diverse collection of 16 maize landraces as previously described (Tenaillon et al. 2001). We sampled parviglumis zfl2 alleles from partially inbred plants fixed for given zfl2 alleles. We amplified three overlapping zfl2 PCR products as follows: (1) approximately 400 bp 5′ to the start codon to exon2 (primers: 5′AGCCTCGCCGTGTCTTCT3′ and 5′CCCGTGGACTTGCGAGAC3′), (2) exon2 to exon3 (primers: 5′AACGGGCTTGACTA3′ and 5′TTGGGCTTGTTGATGTAG3′), and (3) exon3 to 165 bp 3′ of the termination codon (primers: 5′TCCGGTACGCCAAGA3′ and 5′GACGTCCCCATTCTAAAT3′) and combined these into single sequences. We purified PCR products (Qiaquick PCR cleanup kit, Qiagen, Valencia, Calif.) and directly sequenced PCR products 1 and 3 to minimize errors introduced by PCR amplification. We were able to directly sequence product 2, which was amplified with primers that amplify both zfl1 and zfl2, with zfl2-specific internal primers for 11 of the 13 parviglumis samples. For the remaining two parviglumis samples and all 16 maize samples, products were TA cloned prior to sequencing. We sequenced multiple clones of each allele to correct for Taq errors. Sequences were edited and aligned as described for Andropogoneae sequences above. Zea zfl2 sequences and alignments are available in GenBank (Z. mays ssp. mays accession numbers: AY789020–AY789035; Z. mays ssp. parviglumis accession numbers: AY789036–AY789048).

Phylogenetic Analysis

We generated maximum parsimony phylogenetic trees by heuristic searches using PAUP 4.0b10 (Swofford 2003) starting from a random tree with simple stepwise addition, tree bisection reconnection branch swapping, and ACCTRAN (accelerated transformation) character optimization to estimate branch lengths. Alignment gaps were treated as missing data, and branches with zero length were collapsed. Bootstrap analysis (Felsenstein 1985) was carried out with 1,000 replicates using the same search options.

We used Modeltest 3.6 (Posada and Crandall 1998) to test which model of molecular evolution is most appropriate for our data set using the Akaike information criterion as implemented in Modeltest, which allows comparison of nonnested models and establishes a 95% confidence interval of appropriate models. We generated a phylogeny by Bayesian inference using MrBayes v.3.0b4 (Huelsenbeck and Ronquist 2001) with the GTR-invariant-Γ model (Tavaré 1986; Yang 1997), as recommended as most appropriate for our data by Modeltest. We used uniform prior probabilities and four rate categories to approximate the Γ distribution. We ran the Markov chain Monte Carlo analysis for 1,000,000 generations starting from a random tree. The first 25,000 generations were dropped as chain burn-in, and subsequently every 100th generation was sampled to generate a set of 9,750 trees from which the 50% majority rule consensus tree was calculated in MrBayes.

We generated alternative tree topologies differing in the placement of the genera Elionurus and Coelorachis with respect to zfl1 and zfl2 in Treeview PPC (Page 1996). We tested for significant differences between these alternate hypotheses and the Bayesian 50% majority rule tree using the Shimodaira-Hasegawa (SH) test (Shimodaira and Hasegawa 1999) implemented in PAUP. We performed the SH test with settings corresponding to the GTR-invariant-Γ model as was used to generate the phylogeny in MrBayes. Nucleotide frequencies, transition/transversion ratio, the number of invariant sites, and the Γ shape parameter for nucleotide substitution rate variation were estimated from the data using maximum likelihood (ML) in PAUP. These were entered as fixed values for the SH test, for which we ran 1,000 bootstrap replicates with full optimization (all free parameters estimated by ML in each replicate) to set the 95% confidence interval for the test statistic.

Sequence Analysis

We visualized conservation of Andropogoneae FLO/LFY-like sequences by comparing maize zfl1 and zfl2 with the Sorghum bicolor sequence using VISTA (Bray, Dubchak, and Pachter 2003) with 8-bp windows to maximize detection of short conserved noncoding sequences in intron1. We scanned intron sequences from maize zfl1 and zfl2, Elionurus muticus, and S. bicolor for known transcription factor–binding sites using the PLACE (Higo et al. 1999) and TESS (Schug and Overton 1997) databases. To estimate codon bias, we calculated the effective number of codons (ENC; Wright 1990) and synonymous third position GC content (GC3s) in DnaSP v.3.99 (Rozas et al. 2003).

We used dN/dS (ω) ratios to examine whether Andropogoneae FLO/LFY-like sequences are evolving under purifying constraint for amino acid sequences (ω < 1) or positive selection for amino acid changes (ω > 1). ω was calculated for each sequence and each codon within the sequences using a ML approach in PAML v.3.14 (Yang 1997), which employs the method of Goldman and Yang (1994) to take into account codon and transition-transversion biases.

In order to test for variation of evolutionary rates among sequences or at specific codons within the sequences, we calculated likelihood scores for the Bayesian phylogeny (shown in fig. 1) in PAML using the following models: (1) "model 0," in which one average ω value is estimated from the data and applied to all branches in the phylogeny and all codons in each sequence; (2) "model 1," which similarly applies one ω value to all branches but places each codon within each sequence into one of two categories (ω = 0 [purifying selection] and ω = 1 [neutral]); (3) "model 2," which is similar to model 1 but allows codons in a third ω category (ω > 1 [positive or directional selection]); (4) "model β," which applies one ω value to all branches but places codons within each sequence into 1 of 10 ω value categories and fits a β distribution where the estimated parameters (p and q) define the shape of the distribution of ω values between zero and one; (5) "model β + ω," which is similar to model β but allows an additional category for ω > 1; and (6) "model F," in which a separate ω value is calculated for each sequence in the phylogeny but all codons within the sequence are assigned that ω value. These models have been previously described and used in sequence analyses (Goldman and Yang 1994; Nielsen and Yang 1998; Yang 1998; Yang et al. 2000).

In order to test which models of molecular evolution best fit our FLO/LFY ortholog data, we compared the likelihood scores obtained for the Bayesian phylogeny under the above models and calculated the likelihood ratio test statistic (2Δ ln L) for nested models (where H0 is a more specific case of Ha). Statistical significance was assessed under a χ² distribution, where the number of degrees of freedom is the difference in the number of free parameters calculated under each model (Felsenstein 1981; Goldman 1993). Using this test, we compared likelihoods obtained under models 0 and 2, 0 and F, 1 and 2, and β and β + ω.

We used Tajima's relative-rate test (Tajima 1993), as implemented in MEGA2.1 (Kumar et al. 2001), to test for differences in evolutionary rates among zfl1-zfl2 duplicates.

FIG. 1.—Bayesian phylogeny for FLO/LFY-like sequences from the Andropogoneae. Tree shown is the 50% majority rule tree obtained from a 1,000,000 generation Markov chain Monte Carlo analysis with the GTR-invariant-Γ model. The Bayesian posterior probability of each clade is indicated above the branch, and the parsimony bootstrap values obtained for the same clades with 1,000 replicates are given in parentheses.
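The sampling arithmetic behind this consensus tree can be checked with a few lines of Python. The sketch below uses the settings given in Materials and Methods (1,000,000 generations, the first 25,000 discarded as burn-in, every 100th generation sampled) and then shows, on a toy set of trees, how clade frequencies above 50% would be retained in a majority rule summary. The function names, the clade representation (sets of taxon labels), and the toy trees are assumptions made for illustration; MrBayes performs this step internally.

```python
from collections import Counter


def retained_samples(generations, burn_in, sample_every):
    """Number of post-burn-in samples kept from an MCMC run."""
    return (generations - burn_in) // sample_every


def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: frequency} for clades found in more than `threshold` of trees.

    Each tree is represented as a set of clades, and each clade as a
    frozenset of taxon labels (a simplification for illustration).
    """
    counts = Counter(clade for tree in trees for clade in tree)
    n_trees = len(trees)
    return {
        clade: count / n_trees
        for clade, count in counts.items()
        if count / n_trees > threshold
    }


n_trees = retained_samples(1_000_000, 25_000, 100)
print(n_trees)

# Toy posterior sample of three trees over four hypothetical tip labels.
tree_a = {frozenset({"zfl1", "zfl2"}), frozenset({"zfl1", "zfl2", "Elionurus"})}
tree_b = {frozenset({"zfl2", "Elionurus"}), frozenset({"zfl1", "zfl2", "Elionurus"})}
support = majority_rule_clades([tree_a, tree_a, tree_b])
for clade, frequency in sorted(support.items(), key=lambda item: -item[1]):
    print(sorted(clade), round(frequency, 2))
```

The frequency attached to each retained clade plays the role of the posterior probability shown above the branches in the figure.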

We used S. bicolor as an out-group to test for rate variation between the duplicate sequences obtained from the anciently tetraploid Zea and Tripsacum species and used RFL (GenBank accession number AB005620) as an out-group to test for rate variation between the full-length maize zfl1- and zfl2-coding regions.

We calculated nucleotide diversity (π), linkage disequilibrium (LD; as r²), the minimum number of recombination events (Rm), and the number of segregating sites (S) in maize and parviglumis zfl2 sequences using DnaSP. For LD analysis, we included coded insertion-deletion sites, except those due to microsatellites.

Selection Tests and Neutrality Statistics for Z. mays zfl2 Sequences

We performed HKA selection tests (Hudson, Kreitman, and Aguadé 1987) in DnaSP for the exon1-intron1-exon2 region of the maize zfl2 sequences using Tripsacum dactyloides as the out-group and for the intron2 region using Zea diploperennis as the out-group. For neutral control loci, we used previously published data sets for adh1 (Tenaillon et al. 2001; Tiffin and Gaut 2001), adh2 (Goloubinoff, Pääbo, and Wilson 1993), te1 (White and Doebley 1999), and bz2, an1, and csu1138 (Tenaillon et al. 2001). We calculated Fay and Wu's H statistic for genetic hitchhiking (Fay and Wu 2000) at http://crimp.lbl.gov/htest.html for the zfl2 exon1-intron1-exon2 region using Tripsacum floridanum as the out-group and for intron2 using Z. diploperennis as the out-group.

We used DnaSP to estimate the population recombination parameter (C = 4Nc) for the zfl2-transcribed region and intron2 sequences in maize and parviglumis by coalescent simulations. These employed the observed minimum number of recombination events (Rmobs) and the number of segregating sites (S) as estimated from the data in DnaSP. We performed simulations of 1,000 realizations each, with different input values for C, in ProSeq v2.7 (Filatov 2002) to determine at which value for C the simulated Rm (Rmsim) exceeded Rmobs in fewer than 5% of the realizations (i.e., the value of C for which P[Rmsim ≤ Rmobs] = 0.95). This method of estimating C was previously described (Wall 1999). We used the estimated values of C in subsequent coalescent simulations (1,000 realizations each) to determine appropriate 95% confidence intervals for Tajima's D (Tajima 1989) and Wall's Q (Wall 1999). This method of estimating appropriate intervals that take recombination into account was previously described (Simonsen, Churchill, and Aquadro 1995; Wall 1999). We calculated the values of Tajima's D and Wall's Q for the sequence

FIG. 2.—VISTA plot showing sequence conservation between exon1 and exon2 and intron1 of the Sorghum bicolor FLO/LFY-like sequence and zfl1 and zfl2 sequences. Arrowheads between the graphs indicate conserved putative transcription factor–binding site locations in intron1. Sequence regions are labeled below the graph.

samples using DnaSP (for Tajima's D) and ProSeq (for Wall's Q) for zfl2 sequences from maize and parviglumis.

Results

Andropogoneae Phylogeny

We generated phylogenies for tribe Andropogoneae FLO/LFY-like sequences using maximum parsimony and Bayesian approaches. The 50% majority rule consensus tree from the Bayesian analysis is shown (fig. 1) with branch support indicating the posterior probability of each clade obtained under the GTR-invariant-Γ model (see Materials and Methods). Parsimony analysis produced similar tree topology (not shown), though bootstrap support for most nodes was lower than the Bayesian posterior probabilities (fig. 1).

Our Andropogoneae phylogeny for the FLO/LFY-like sequences agrees in several key aspects with previously published phylogenies for other genes (Spangler et al. 1999; Lukens and Doebley 2001; Mathews et al. 2002): (1) the "core Andropogoneae" form a monophyletic clade, (2) Sorghum FLO/LFY-like sequences show a close relationship with sequences from Cleistachne sorghoides and Saccharum officinarum, and (3) Sorghum-Cleistachne-Saccharum sequences group with the core Andropogoneae (fig. 1).

Sequences from the duplicate maize FLO/LFY genes, zfl1 and zfl2, fall into two separate and well-supported clades within the Tripsacum-Zea clade (fig. 1). This suggests that the duplication of these genes occurred prior to the divergence of Zea and Tripsacum but after the divergence of this clade from the remaining Andropogoneae. This corresponds with the proposed timing of a tetraploidy event estimated to have occurred approximately 11 MYA in an ancestor of the Zea-Tripsacum lineage (Gaut and Doebley 1997). Within the Zea-Tripsacum clade, we obtained single zfl1- and/or zfl2-like alleles from each species except Tripsacum andersonii, from which we obtained Zea-like alleles of zfl1 and zfl2, as well as a Tripsacum-like zfl2 allele from a single sample. This result is in agreement with a previous study that showed that T. andersonii is a polyploid hybrid species carrying Zea-like and Tripsacum-like genomes (Talbert et al. 1990).

Our phylogeny also agrees with several previously published Andropogoneae phylogenies (Spangler et al. 1999; Lukens and Doebley 2001; Mathews et al. 2002) in the placement of samples from the genera Coelorachis and Elionurus as close relatives of Zea and Tripsacum. Interestingly, sequences from these genera group more closely with zfl2 than with zfl1 (78% posterior probability; 60% parsimony bootstrap support). This branch topology is expected if Coelorachis and Elionurus are more closely related to a diploid progenitor that contributed a zfl2-like gene to the ancestral tetraploid Zea-Tripsacum ancestor than they are to the zfl1-contributing progenitor. To test this hypothesis against alternative possibilities, we generated alternate trees in which (1) Elionurus and Coelorachis are placed on a trifurcation with the zfl1 and zfl2 clades (H1) or (2) Elionurus and Coelorachis are grouped with zfl1 instead of zfl2 (H2). The relative consistency of these topologies with the data was tested using the SH test (Shimodaira and Hasegawa 1999). The SH test was not significant when comparing the original tree topology, which places Elionurus and Coelorachis with zfl2 (H0), with either H1 (P = 0.211) or H2 (P = 0.208), though H0 was labeled as the "best" tree in terms of likelihood scores (−ln L = 7,254 vs. −ln L = 7,256 for H1 and H2). These results suggest that the closer relationship of Elionurus and Coelorachis to zfl2 implied by the phylogenies is not robust, and thus the branch topology for these genera with respect to the duplicated maize zfl genes remains uncertain.

Andropogoneae Nucleotide Sequence Conservation

We used VISTA plots to visualize sequence conservation between the S. bicolor FLO/LFY-like sequence and zfl1 and zfl2. These plots reveal two highly conserved regions: one directly downstream to the start codon and a second spanning a series of leucine repeats (fig. 2). An acidic-basic domain in exon2 (~1,060–1,220 bp) shows greater amino acid sequence variability, but many charged positions are more highly conserved than adjacent uncharged residues

Table 2
Conservation of Putative Binding Sites in Intron1 of Representative Andropogoneae FLO/LFY-like Sequences

Species                       Opaque-2–like(a,@)  sph/RY(@)  RY/legumin 2(@)  C/EBP beta-like(a)  c-myb–like(a)
Consensus                     GATGAYRTGR          CATGCATG   CATGCAYC         CTTGTTCAAT          TTCAATCA
Zea mays (zfl1)               ✓                   ✓          ✓                CTTGTTCGAT          TTCGATCA
Zea luxurians (zfl1)          ✓                   ✓          ✓                CTTGTTCGAT          TTCGATCA
Tripsacum floridanum (zfl1)   ✓                   CATGCGGT   ✓                CTTGTTCAAC          TTCAACCA
Zea mays (zfl2)               —                   ✓          CATGCTTC         ✓                   ✓
Zea luxurians (zfl2)          GA_GATGTAG          CATGCACG   CATGCTTC         CTTGTTCAGT          TTCAGTCA
Tripsacum floridanum (zfl2)   ✓                   ✓          CGTGCGTC         ✓                   ✓
Elionurus tripsacoides        GATGACGCGG          ✓          ✓                CTTGTTCACT          TTCACTCA
Coelorachis aurita            ✓                   ✓          TATGCATC         ✓                   ✓
Apluda mutica                 ✓                   ✓          ✓                ✓                   ✓
Chionachne koenigii           —                   ✓          ✓                ✓                   TTCAAT(G)CA
Hyparrhenia hirta             ✓                   ✓          ✓                ✓                   ✓
Cymbopogon flexuosus          —                   ✓          ✓                ✓                   ✓
Sorghum bicolor               GGTGATGTGG          ✓          ✓                ✓                   ✓
Sorghum versicolor            ✓                   ✓          ✓                GTTGTTCAAT          ✓
Coix lacryma-jobi             ✓                   ✓          ✓                CTTGTTCGAT          TGCAATCA
Arundinella hirta             —                   ✓          CATGCATG         —                   —
Oryza sativa (RFL)            —                   ✓*         CATGCATG         —                   —

NOTE.—✓ = sequence matches consensus sequence; — = entire site absent from sequence; _ = base absent from sequence; (N) = single base insertion; N = single-nucleotide difference from consensus. Base ambiguities: R = A or G; Y = C or T; M = C or G. @ = identified within Andropogoneae by PLACE; a = identified within Andropogoneae by TESS; * = identified by Prasad et al. (2003) in RFL.
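The consensus motifs in table 2 use base-ambiguity codes (R and Y), so matching them against a sequence requires expanding each code into the set of bases it allows. The following minimal sketch shows one way to do this with regular expressions and to count how many positions of a variant site depart from a consensus. The function names and the toy intron string are hypothetical, and only the codes needed for these motifs are included; it illustrates the idea and does not reproduce the PLACE or TESS searches.

```python
import re

# Bases allowed by each code used in the table 2 consensus motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def motif_regex(motif):
    """Compile a consensus motif into a regex that also finds overlapping hits."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    # A lookahead keeps the match zero-width so overlapping sites are reported.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_sites(sequence, motif):
    """Return 0-based start positions of every occurrence of the motif."""
    return [match.start() for match in motif_regex(motif).finditer(sequence.upper())]


def count_mismatches(site, motif):
    """Count positions at which a same-length site departs from the consensus."""
    if len(site) != len(motif):
        raise ValueError("site and motif must have the same length")
    return sum(base not in IUPAC[code] for base, code in zip(site.upper(), motif.upper()))


# Toy intron fragment (hypothetical) containing one RY repeat and one Opaque-2-like site.
toy_intron = "TTGACATGCATGAAGATGACGTGGTT"
print(find_sites(toy_intron, "CATGCATG"))
print(find_sites(toy_intron, "GATGAYRTGR"))

# Variant sites listed in table 2, compared with their consensus.
print(count_mismatches("GGTGATGTGG", "GATGAYRTGR"))  # Sorghum bicolor
print(count_mismatches("CATGCACG", "CATGCATG"))      # Zea luxurians (zfl2)
```

Exact matching is deliberately strict here; a search meant to recover diverged sites like those in the table would instead accept hits below some mismatch threshold.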

(data not shown). Overall charge in this region appears largely conserved among the Andropogoneae sequences sampled: the 1,060- to 1,160-bp region contains a range of 11–16 basic (positively charged) amino acids (arginine and lysine residues), with an average of 13.9 basic residues per sequence; the 1,160- to 1,220-bp region contains from 4 to 9 acidic (negatively charged) amino acid residues (aspartic acid and glutamic acid), with an average of 8 acidic residues per sequence. For both regions, T. floridanum and Tripsacum zopilotense zfl1 sequences deviate from the remaining sequences in that they have the lowest number of charged amino acids (11 basic, 4 acidic). A proline-rich domain is found in the 5′ region in all of the Andropogoneae FLO/LFY-like sequences but is highly variable at the primary sequence level.

Intron1 sequences are highly variable, but several short stretches of sequence within the intron are conserved across multiple Andropogoneae species (fig. 2). Sequence conservation within this intron is of particular interest in light of a recent finding that both introns of the rice FLO/LFY ortholog, RFL, harbor regulatory elements sufficient to drive reporter expression in the wild-type RFL pattern (Prasad, Kushalappa, and Vijayraghavan 2003). It is not currently known whether other grass FLO/LFY orthologs are regulated by sequence elements located in introns, but binding motif searches in our sequences revealed that several of the conserved regions contain sequence motifs with similarity to known transcription factor–binding sites (table 2). Particularly striking is the near-perfect conservation of a RY repeat sequence initially identified as a candidate regulatory site in RFL (Prasad, Kushalappa, and Vijayraghavan 2003). Four of our 36 Andropogoneae sequences carry point mutations in the otherwise perfectly conserved RY repeat, altering the sequence from the consensus CATGCATG to CATGCACG in zfl2-like sequences isolated from Zea luxurians and Z. diploperennis, and to CATGCGGT in zfl1-like sequences isolated from T. zopilotense and T. floridanum. For three of these species (Z. luxurians, T. zopilotense, and T. floridanum), we also sequenced the duplicate locus from the same plant and found that in all three cases, the paralog contains a wild-type RY element (table 2), suggesting that within the tetraploid species in our sample, it may be sufficient for only one of the paralogs to retain this sequence.

We identified three additional conserved intron1 sequences with putative transcription factor–binding sites (fig. 2). These include an Opaque-2–like site, which is moderately conserved throughout the tribe (21 of 36 sequences), a second largely conserved RY repeat (24 sequences), and an overlapping pair of sites similar to binding sites for animal EBP-β (24 sequences) and c-myb proteins (23 sequences). These three putative sites appear not to be essential for gene regulation as they are independently lost or mutated several times in the phylogeny (table 2). Further analyses will be required to determine whether any of these sites play a role in regulation of the FLO/LFY-like genes and whether alterations of these sites may affect gene expression level or pattern.

Codon Bias in FLO/LFY-like Genes

The maize FLO/LFY orthologs zfl1 and zfl2 are GC rich, suggesting codon bias. Thus, to examine codon bias in these and other Andropogoneae FLO/LFY-like genes, we calculated the ENC (Wright 1990), which is independent of sequence length and amino acid composition. ENC can range from 20 (maximum bias; one codon used per amino acid) to 61 (no bias; all codons used equally). For the FLO/LFY-like genes in the Andropogoneae, ENC varies from 31.1 in S. bicolor to 37.5 in Z. luxurians zfl2, with an average ENC across the Andropogoneae of 33.6. These values suggest that strong codon bias is conserved throughout the tribe, and this places these genes among the more strongly biased genes reported to date in maize and other grasses (Fennoy and Bailey-Serres 1993; Zhang, Kosakovsky Pond, and Gaut 2001). Nevertheless, 8 of the 293 codon

positions in our alignment are conserved for "unpreferred" or "rare" codons (A or T ending) in all 36 sequences, while an additional 18 rare codons are highly conserved (in >25 sequences). These apparently nonrandom patterns of codon usage suggest that synonymous sites in these genes are also subject to selection pressure throughout the Andropogoneae.

To estimate when codon bias arose in the FLO/LFY orthologs, we examined codon bias in full-length FLO/LFY-like cDNAs available in GenBank using ENC values and GC3s (table 3). We found that for dicot FLO/LFY orthologs ENC values are generally high (suggesting low bias) and GC3s values are correspondingly low (with the exception of Eucalyptus globulus; see table 3), while the grass orthologs (from rice, Lolium, and maize) have low ENC and high GC3s values suggestive of strong codon bias. Interestingly, ENC is also high and GC3s low in FLO/LFY orthologs reported from other monocots, including several orchids and a rush (Juncus effusus), though the latter is closely related to the grass family (Bremer 2000). This result extends a previous finding that high overall genomic GC content observed in grasses is not prevalent in other monocots (Salinas et al. 1988; Montero et al. 1990). The fern (Ceratopteris) and pine (Pinus) orthologs have ENC and GC3s values similar to the unbiased dicot species (table 3), suggesting that low bias may be the ancestral state for these genes. Furthermore, the high bias in grasses and eucalyptus suggests that codon bias in these genes may have arisen multiple times.

Table 3
Codon Bias in Various Full-Length FLO/LFY Orthologs

Species                         GenBank Accession Number  ENC   GC3s
Grasses (Poaceae)
Zea mays (zfl1)                 AY179881                  34.6  0.92
Zea mays (zfl2)                 AY179882                  36.5  0.90
Oryza sativa                    AB005620                  38.4  0.86
Lolium temulentum               AF321273                  35.4  0.88
Other monocots
Juncus effusus                  AF160481                  56.2  0.39
Orchis mascula                  AB088439                  50.3  0.68
Serapias lingua                 AB088466                  52.7  0.67
Dactylorhiza roma               AB088469                  45.1  0.71
Dicots
Antirrhinum majus               M55525                    55.9  0.52
Arabidopsis thaliana            M91208                    56.5  0.49
Nicotiana tabacum (NFL1)        U16172                    58.1  0.50
Petunia × hybrida               AF030171                  50.5  0.39
Vitis vinifera                  AF450278                  52.8  0.61
Eucalyptus globulus (ELF1)      AF034806                  37.5  0.82
Nonseed plants
Pinus radiata (NLY)             U76757                    54.9  0.40
Pinus radiata (PRFLL)           AF109149                  51.8  0.42
Ceratopteris richardii          AB049974                  53.2  0.36

NOTE.—ENC = effective number of codons (Wright 1990); GC3s = synonymous third position GC content.

Evidence for Purifying Selection Acting on Andropogoneae FLO/LFY-like Genes

To examine whether FLO/LFY-like genes in the Andropogoneae are experiencing selective constraint on amino acid sequence, we examined the ratio of nonsynonymous to synonymous substitution rates (dN/dS = ω). Because these sequences exhibit a strong codon bias (see above), we calculated ω using a ML-based approach that takes bias into consideration (Goldman and Yang 1994). Pairwise comparisons of sequences within and between clades (as defined in fig. 1) yield a narrow range of ω values from a low of 0.03 within the Sorghum-Saccharum-Cleistachne clade to a high of 0.07 within the zfl1 clade. The average pairwise ω values among species (averaged over all possible pairwise comparisons for each species, outside the same clade) range narrowly from a low of 0.03 for Z. mays zfl2 to a high of 0.09 for E. muticus.

To test whether specific Andropogoneae FLO/LFY-like sequences have different ω values (which might suggest differing evolutionary constraints in different species), we calculated the likelihood of the phylogeny under a model allowing different ω values on each branch of the tree (model F in tables 4 and 5) and a model constraining all branches to a single ω value (model 0 in tables 4 and 5). Model F has a significantly less negative likelihood score relative to model 0 (table 5), but branch-specific ω values under model F are all lower than one (ranging from 0.0001 to 0.37). This suggests that despite variation in ω values among species, a model primarily involving purifying selection and constraint on amino acid sequence best explains the evolution of Andropogoneae FLO/LFY-like sequences.

None of the Andropogoneae FLO/LFY-like sequences have overall ω values greater than one. However, average ω values for entire sequences may not reflect selection acting

Table 4
Likelihoods and Estimated Parameters Under Different Models of Codon Evolution Applied to Andropogoneae FLO/LFY Phylogeny

Model    −ln L     κ^a (Ts/Tv)   ω^b    Additional Parameters Calculated
0        4348.11   2.84          0.08
1        4464.87   1.98          0.39   p(ω = 0) = 0.61 (p(ω = 1) = 0.39)
2        4462.23   1.96          0.41   p(ω = 0) = 0.61, (p(ω = 1) = 0.39), p(ω > 1) = 0.005
β        4288.86   2.80          0.10   β (p = 0.308, q = 2.771)
β + ω    4283.30   2.80          0.10   β (p = 0.40, q = 4.57), p(ω = 0) = 0.98, (p(ω > 1) = 0.02)
F        4314.27   2.83          0.08   (range of branch ω values: 0.0001–0.37)

NOTE.—p(ω = i) gives the proportion of codons within the sequences placed into category i, where i = 0, 1, or >1. See Materials and Methods for model assumptions.
a κ = transition/transversion ratio.
b ω = dN/dS value calculated under each model and applied to all branches in the phylogeny.
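The likelihood ratio statistics reported in table 5 can be recomputed from the −ln L values in table 4. The following is a minimal sketch, not the authors' analysis pipeline: the function names `chi2_sf` and `lrt`, the dictionary `NEG_LNL`, and the ASCII model labels are illustrative assumptions, and the chi-square tail probability is computed with a standard closed-form series so that only the Python standard library is needed. Degrees of freedom are taken from table 5.

```python
import math

# -ln L values copied from table 4 (model labels are illustrative ASCII names).
NEG_LNL = {"0": 4348.11, "1": 4464.87, "2": 4462.23,
           "beta": 4288.86, "beta+omega": 4283.30, "F": 4314.27}

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2 * phi * total

def lrt(null, alt, df):
    """Return (2 * delta lnL, P value); P is None if the alternative fits worse."""
    stat = 2 * (NEG_LNL[null] - NEG_LNL[alt])
    p = chi2_sf(stat, df) if stat > 0 else None
    return stat, p

if __name__ == "__main__":
    for null, alt, df in [("1", "2", 2), ("0", "2", 2),
                          ("0", "F", 35), ("beta", "beta+omega", 2)]:
        stat, p = lrt(null, alt, df)
        print(null, "vs", alt, "2dlnL =", round(stat, 2), "df =", df, "P =", p)
```

For the model 0 versus model 2 comparison the signed statistic is negative because model 2 has the lower likelihood in table 4; table 5 reports its magnitude, and this sketch returns no P value in that case.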

on only a subset of codons. Thus, we tested whether any individual codons within the Andropogoneae FLO/LFY-like sequences show evidence of positive selection for amino acid changes (ω values > 1) using a ML approach which employs several models that assign individual codons within the sequences into classes with different ω values (Yang et al. 2000). Positive or directional selection acting on specific codons can be detected by comparing likelihood scores under models that allow a class of sites with ω values greater than one to scores obtained under nested models that constrain ω values between zero and one. A significantly higher likelihood score under models that allow individual codons to have ω values greater than one would suggest that a subset of sites exists which may be experiencing directional selection (Yang et al. 2000).

Analyses performed using all of the models we tested for the Andropogoneae FLO/LFY orthologs assign the majority of codons in each sequence ω values close to zero. Model β, which estimates the distribution of ω values according to a β distribution (0 < ω < 1), yields shape parameter estimates (p and q) describing an L-shaped distribution with maximum density near zero (table 4). While model β + ω has a significantly better likelihood score than model β (table 5), only 2% of the codons are placed in the ω > 1 category. Model 2 places 0.5% of the codons in the ω > 1 category (table 4) and has a significantly lower likelihood score than model 0, which does not allow any sites with ω > 1 (table 5). Of the codons placed in the ω > 1 class under models 2 and β + ω, only one codon is statistically significant, but this codon lies in the variable proline-rich region and is absent from many of the sequences. Thus, this result is inconclusive, and overall there is no convincing evidence that individual codons in the exon1-exon2 region of the Andropogoneae FLO/LFY sequences are experiencing positive selection for amino acid changes.

Table 5
Hypothesis Testing Using Likelihood Ratio Tests for the Models in Table 4

H0 (null)   Ha (alternate)   2Δ ln L^a   df^b   P value (χ²)     Favored Model
1           2                5.28        2      0.1 > P > 0.05
0           2                228.24      2      P < 0.005        0
0           F                67.68       35     P < 0.005        F
β           β + ω            11.12       2      P < 0.005        β + ω

a The likelihood ratio statistic.
b The degrees of freedom is the difference in the number of parameters calculated under the models being compared.

zfl1 and zfl2 as Duplicate Genes

Because gene duplication may subsequently allow relaxed selective constraint on one or both paralogs (Force et al. 1999; Lynch and Conery 2000), we examined the evolution of the zfl1-zfl2 duplicates in the Zea-Tripsacum clade in more detail. Pairwise ω values are close to zero for the Zea and Tripsacum zfl1 and zfl2 sequences, suggesting that purifying selection is the dominant force acting on both paralogs (table 6). We tested whether the duplication of zfl-like genes resulted in a change in nucleotide substitution rate in either paralog by comparing the duplicate sequences from five species (T. floridanum, T. andersonii, T. zopilotense, Z. mays, and Z. luxurians) with S. bicolor using Tajima's relative-rate test (Tajima 1993). The test is statistically significant for only one species (T. floridanum; table 6), suggesting that the zfl1 and zfl2 clades are evolving at similar rates. The significant relative-rate test for T. floridanum suggests that the zfl1 gene from this species has accumulated an excess of unique mutations when compared with T. floridanum zfl2, implying a relaxation of evolutionary constraint. In support of this, we identified several potentially deleterious mutations in T. floridanum zfl1: (1) a leucine to valine mutation of a central leucine repeat residue that is highly conserved among angiosperm FLO/LFY orthologs examined (data not shown), (2) a second leucine to valine mutation within the leucine repeat region, (3) a mutation in the highly conserved RY repeat sequence in intron1 (table 2), and (4) at least two small deletions and three base changes that result in a lower number of charged residues in the acidic and basic regions than observed in other Andropogoneae sequences (see above).

Molecular Evolution of zfl2 in Maize and parviglumis

Relative Nucleotide Diversity of Maize and parviglumis zfl2 Alleles

We previously proposed zfl2 as a candidate for a major-effect maize domestication QTL that affects inflorescence architecture (Bomblies et al. 2003). To determine whether evidence of selection pressure is detectable at the zfl2 locus, we sequenced all three exons, the two intervening introns, and approximately 400 bp 5′ of the translation start and 165 bp 3′ of the termination codon from 16 maize samples and 13 parviglumis samples. All zfl2 sequences obtained contain full-length open reading frames. In two of these sequences, we identified insertions in noncoding regions that show similarity to small transposable elements. One of these has a strong match in GenBank; intron2 of one parviglumis sample contains an mPIF miniature transposable element

Table 6
Tajima's Relative-Rate Tests for zfl1-zfl2 Duplicate Loci

Species                 Pairwise ω (dN/dS)   Out-group         Aligned Sites   Unique in zfl1   Unique in zfl2   P value
Tripsacum andersonii    0.023                Sorghum bicolor   768             13               9                0.394
Tripsacum floridanum    0.071                Sorghum bicolor   728             31               16               0.029*
Tripsacum zopilotense   0.061                Sorghum bicolor   720             26               16               0.123
Zea luxurians           0.047                Sorghum bicolor   775             19               23               0.537
Zea mays                0.024                Sorghum bicolor   777             25               20               0.456
Zea mays                                     Oryza sativa      916             16               26               0.123

* Significant at the P < 0.05 level.
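The P values in table 6 are consistent with the usual one-degree-of-freedom form of Tajima's relative-rate test, in which the counts of sites unique to each lineage (m1, m2) are compared through χ² = (m1 − m2)² / (m1 + m2). The sketch below applies that formula to the counts in table 6; it assumes this is the variant of the test that was used, and the names `tajima_relative_rate` and `ROWS` are illustrative.

```python
import math

def tajima_relative_rate(m1, m2):
    """Chi-square statistic (1 df) and P value for two unique-site counts."""
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # For 1 df, the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# (species, out-group, unique in zfl1, unique in zfl2, P value listed in table 6)
ROWS = [
    ("Tripsacum andersonii", "Sorghum bicolor", 13, 9, 0.394),
    ("Tripsacum floridanum", "Sorghum bicolor", 31, 16, 0.029),
    ("Tripsacum zopilotense", "Sorghum bicolor", 26, 16, 0.123),
    ("Zea luxurians", "Sorghum bicolor", 19, 23, 0.537),
    ("Zea mays", "Sorghum bicolor", 25, 20, 0.456),
    ("Zea mays", "Oryza sativa", 16, 26, 0.123),
]

if __name__ == "__main__":
    for species, outgroup, m1, m2, listed in ROWS:
        chi2, p = tajima_relative_rate(m1, m2)
        print(f"{species} vs {outgroup}: chi2 = {chi2:.3f}, "
              f"P = {p:.3f} (table 6: {listed})")
```

Because the statistic depends only on the two counts, equal counts give χ² = 0 and P = 1, and only the T. floridanum pair (31 versus 16) falls below the 0.05 threshold.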

(MITE) (AF416310). A 334-bp insertion in intron2 in one of our maize samples (AY789025) has no GenBank matches but has a 19-bp terminal inverted repeat with one mismatch, suggesting that it may be a previously uncharacterized MITE. It is not known whether these insertions affect gene function. However, we also identified a 313-bp Hbr22 MITE (AF203729) 327 bp 5′ of the start codon of zfl2 from inbred maize line W22 (not included in the diversity study), for which previous results suggest that the allele is functional despite the insertion (Bomblies et al. 2003).

We calculated nucleotide diversity (π) for the maize and parviglumis zfl2 sequences, as well as for each region within the gene (table 7). Overall nucleotide diversity for maize zfl2 sequences compared with parviglumis sequences is approximately 62% (table 7), which is similar to a loss of genetic diversity previously attributed to the maize domestication bottleneck (Eyre-Walker et al. 1998; White and Doebley 1999). We observed a drop in relative nucleotide diversity to 21% in the first intron (table 7). While such a drop in relative diversity can be indicative of selection, the most common parviglumis haplotype in this region (observed in 5 of the 13 sequences) is identical to the predominant maize haplotype (observed in 10 of the 16 maize samples). Furthermore, while only two segregating single-nucleotide polymorphisms are observed in the maize zfl2 intron1 sequences, insertion-deletion variation defines at least three different haplotypes in the maize sample, two of which are also found in the parviglumis sample (see alignment 2 in Supplementary Material). These results suggest that the observed drop in relative diversity in maize compared with parviglumis in intron1 is unlikely to result from a selective sweep having acted on this region during maize domestication.

Table 7
Nucleotide Diversity of zfl2 from Maize and parviglumis

Region      Sites   S     1,000 × πm   1,000 × πp   πm/πp
All         2,685   175   7.7          12.6         0.62
5′ Region   264     14    10.8         11.9         0.96
Exon1       414     11    2.2          7.0          0.32
Intron1     125     16    6.9          33.2         0.21
Exon2       404     22    7.3          5.8          1.25
Intron2     1,108   104   11.0         17.7         0.62
Exon3/3′    370     8     2.7          4.3          0.62

NOTE.—π = average nucleotide diversity per base pair in maize (πm) and parviglumis (πp).

Tests for Selection on the zfl2-Transcribed Region

To test further whether zfl2 has experienced selection pressure, we used the HKA test (Hudson, Kreitman, and Aguadé 1987), which asks whether the ratio of nucleotide diversity within a population to divergence from an out-group differs significantly from the same ratio for neutral control loci. No part of the zfl2-transcribed region was significant for this test, suggesting that this region has evolved neutrally in maize (table 9). We also applied the H statistic (Fay and Wu 2000) to test for linkage to a selected site and found that H was also not significant for the zfl2 regions tested (unpublished data).

Table 9
HKA Test Results for zfl2 in Maize

Region     Out-group               Sites   Sm   k       Control   P value
5′-Exon2   Tripsacum dactyloides   544     5    25.3    adh1      0.16
                                                        an1       0.13
                                                        bz2       0.10
                                                        csu1138   0.20
Intron2    Zea diploperennis       1179    45   26.06   adh1      0.66
                                                        adh2      0.64
                                                        te1       0.52

NOTE.—Sites = number of aligned nucleotides; Sm = segregating sites in maize calculated from the data; θH = H statistic for genetic hitchhiking; k = average number of nucleotide differences with out-group; control = "neutral" control locus used; P value = HKA test P value.

We applied Tajima's D (Tajima 1989) and Wall's Q (a statistic that has greater power in the presence of recombination; Wall 1999) to detect nonneutral patterns of evolution. We estimated confidence intervals for Tajima's D and Wall's Q by coalescent simulation with recombination (see Materials and Methods). The observed Tajima's D value is outside the expected 95% confidence interval for the entire parviglumis zfl2-transcribed region, and both Tajima's D and Wall's Q were outside the confidence interval for intron2 in parviglumis (table 8). In maize, however, Tajima's D and Wall's Q are not significant, and therefore the null hypothesis of neutral evolution cannot be rejected for maize.

Table 8
Neutrality Statistics for zfl2

Region        Sites   S     Rm obs   4Nc    D        CI (D)       Q       CI (Q)
All (m)       2,685   76    7        14.4   −0.22    −1.27–1.42   0.34    0.10–0.40
All (p)       2,685   146   20       78     −1.07*   −1.02–1.33   0.23    0.14–0.36
Intron2 (m)   1,376   46    2        1.8    −0.13    −1.68–2.26   0.30    0.07–0.55
Intron2 (p)   1,376   89    9        24.5   −1.24*   −0.90–1.27   0.41*   0.11–0.39

NOTE.—m = maize; p = parviglumis; sites = aligned sites excluding gaps; S = number of segregating sites calculated from the data; Rm obs = minimum number of recombination events calculated from the data; 4Nc = population recombination parameter estimated by coalescent simulations; D = Tajima's D; Q = Wall's Q; CI = 95% confidence intervals calculated using coalescent simulations.
* Value outside CI.

Elevated LD is expected for genes under selection. Therefore, we estimated the extent of LD in the zfl2 region by calculating r² values as a function of distance between polymorphic sites for both maize and parviglumis. The r² values were averaged for 50-bp between-site distance categories and plotted to estimate the extent of LD (fig. 3). In parviglumis, r² values decay to ~0.1 for sites greater than ~300 bp apart, while in maize, r² values decay to 0.1 only for sites more than ~900 bp apart (fig. 3). The level of LD in maize zfl2 relative to parviglumis is similar to that observed at several neutrally evolving maize genes, and LD in maize zfl2 decays more rapidly than is observed for several selected genes (Remington et al. 2001), suggesting that the observed differences are attributable to the domestication bottleneck alone.

Discussion

In this study, we examine the molecular evolution of FLO/LFY orthologs in the grass tribe Andropogoneae. Strong evolutionary constraint on protein sequence is

suggested by very low nonsynonymous to synonymous substitution rate ratios (ω) observed throughout the tribe. This result is not surprising as FLO/LFY genes are essential for fertility and other important aspects of reproductive development in a wide range of flowering plant species (Coen et al. 1990; Weigel et al. 1992; Hofer et al. 1997; Souer et al. 1998; Molinero-Rosales et al. 1999; Ahearn et al. 2001; Bomblies et al. 2003). No individual codons within the exon1-exon2 region tested showed evidence for directional selection.

FIG. 3.—Average linkage disequilibrium, calculated as r², for segregating sites in 50-bp increments of increasing distance between polymorphic sites in maize and teosinte. The x axis shows distance between polymorphic sites, and the y axis shows the average r² value for each distance category.

In addition to selective constraints at nonsynonymous sites, our data suggest that evolutionary constraints also exist for synonymous and some noncoding sites in these sequences. Constraint on synonymous sites within the coding region is suggested by the strong codon bias we observed in Andropogoneae FLO/LFY-like sequences. Strong codon bias has been previously observed for numerous other nuclear genes in grasses, suggesting that this pressure operates at the genomic level and is probably not specific to any unique properties of the FLO/LFY genes (Salinas et al. 1988; Montero et al. 1990; Fennoy and Bailey-Serres 1993; Zhang, Kosakovsky Pond, and Gaut 2001). However, while the majority of codons in the Andropogoneae FLO/LFY sequences have a G or C in synonymous third positions, several codons are conserved for rare (A or U ending) codons. Conservation of rare codons suggests that selection at some sites acts against the prevailing codon bias to maintain these synonymous sites. Rare codon use at specific positions has been implicated in basic aspects of messenger RNA function such as splicing (Schaal and Maniatis 1999) and translational pausing to allow protein domain folding (Purvis et al. 1987).

Recently, the rice FLO/LFY ortholog, RFL, was shown to harbor functional regulatory elements within both of its introns (Prasad, Kushalappa, and Vijayraghavan 2003). Because conservation of noncoding sequences is a useful criterion for identifying putative regulatory elements (Koch et al. 2001; Levy, Hannenhalli, and Workman 2001; Kaplinsky et al. 2002; Hong et al. 2003), we examined the first intron from the Andropogoneae FLO/LFY sequences for conservation of sequences that might indicate the presence of important regulatory elements. We found multiple regions of moderate to high sequence conservation, several of which contain sequences with similarity to known transcription factor-binding sites. The most highly conserved sequence is a RY repeat sequence previously proposed as a putative regulatory site in rice (Prasad, Kushalappa, and Vijayraghavan 2003). RY sequences are known to be binding sites for B3 transcription factors and are important for spatial regulation of target genes (Baumlein et al. 1992; Reidt et al. 2000). The near-perfect conservation of this eight-nucleotide sequence in the Andropogoneae and rice strongly suggests that this site is important for gene regulation in these species, and thus it would be extremely interesting to determine which, if any, protein binds to this site.

Interestingly, four species within the tetraploid Zea-Tripsacum clade harbor point mutations in the RY repeat sequence. For three of these individuals, we also isolated the paralog and found in each case that the duplicate has retained a wild-type RY sequence. While this sample size is not large enough to draw firm conclusions, this observation suggests that constraint on the RY repeat sequence may be relaxed when a duplicate locus is present but that maintenance of at least one paralog with a wild-type RY repeat is functionally important. In contrast to the highly conserved RY repeat sites, other potential transcription factor-binding sites in intron1 show greater variability. This suggests either that the similarity of the sequences to binding sites is incidental or that there is potential plasticity in gene regulation that may allow evolution of novel expression patterns or levels. To address this, it would be important to determine whether there are differences in FLO/LFY ortholog expression between these species and whether these differences correlate with sequence differences in putative regulatory elements.

One of our primary interests in generating a phylogeny for FLO/LFY-like genes from the Andropogoneae was to examine the relationship of the duplicate maize FLO/LFY orthologs, zfl1 and zfl2, with sequences from related species to better understand the origin of the paralogs. The phylogeny presented in this study is unique among previously published Andropogoneae gene phylogenies (Spangler et al. 1999; Lukens and Doebley 2001; Mathews et al. 2002) in that it includes both members of a maize duplicate gene set. Sequences from the zfl1 and zfl2 duplicate loci form two clades within the Zea-Tripsacum group, supporting our previous hypothesis (Bomblies et al. 2003) that the zfl genes were duplicated in the tetraploidy event preceding the Zea-Tripsacum divergence (Gaut and Doebley 1997). The relationships of duplicated Zea-Tripsacum genes with orthologous sequences from other Andropogoneae are of interest because the progenitor(s) of the tetraploidy event are as yet unknown. If tetraploidy results from genome doubling within a single species (autotetraploidy), duplicate genes should be more closely related to one another than either is to sequences from related diploids. If tetraploidy results from hybridization of two species (allotetraploidy), then the duplicate sequences should be more closely related to orthologs from species related to the diploid ancestors than the paralogs are to one another. In the FLO/LFY-like gene phylogenies we obtained, the genera Elionurus and Coelorachis are supported as close relatives of Zea and Tripsacum and show a closer relationship to zfl2 than to zfl1. However, because statistical tests fail to reject alternate hypotheses regarding the relationship of Elionurus and Coelorachis to zfl1 and zfl2, the phylogenetic relationships of the duplicates remain unclear, and we cannot conclude from these data whether the tetraploid ancestor of the Zea-Tripsacum clade arose by auto- or allotetraploidy. Detailed phylogenetic analyses of additional duplicate genes in the Zea-Tripsacum clade with orthologous sequences from closely related genera could shed light on this question in the future.

Duplicate genes are of broad interest because redundancy may release paralogs from evolutionary constraint and thus provide "raw material" for evolution (Ohno 1970). Theoretical studies suggest that relaxed constraint on duplicate genes may result in several fates, including loss of function of one paralog, evolution of novel function(s), or long-term maintenance of both paralogs (Force et al. 1999; Lynch and Conery 2000). We have previously shown that zfl1 and zfl2 are largely redundant in maize (Bomblies et al. 2003), but we do not know whether this is true for other Zea and Tripsacum species. ω ratios for the five Zea and Tripsacum species (including maize) from which we isolated both paralogs are close to zero for both genes, suggesting that both the zfl1 and zfl2 protein sequences are evolutionarily constrained throughout the clade. Relative-rate tests confirm that in most of the duplicate pairs, the paralogs from a given species are evolving at similar rates. However, the duplicate pair from T. floridanum shows a significant relative-rate test. We argue that this is not merely an artifact of performing multiple tests because T. floridanum zfl1 carries several potentially deleterious mutations, including two leucine to valine changes within a highly conserved series of leucine repeats. While leucine to valine is often considered a "conservative" change due to the structural similarity and neutral charge of these amino acids, in several cases leucine to valine mutations in the context of leucine repeats have been shown to destabilize coiled-coil structures in proteins (Zhu et al. 1993) or abolish protein-protein interactions necessary for transcription factor function (Wang et al. 2001). However, because the functional consequences of these and the other mutations observed in T. floridanum zfl1 are not known, we cannot conclude whether this gene is deteriorating into a pseudogene or acquiring a novel function. Overall, our results for the zfl1 and zfl2 paralogs add to the expanding literature demonstrating that gene duplicates frequently show evidence of purifying selection pressure for amino acid conservation acting on both duplicates and long-term maintenance of paralogs (Van de Peer et al. 2001; Conant and Wagner 2003; Hileman and Baum 2003).

We have previously proposed zfl2 as a candidate gene for a major-effect maize domestication QTL for inflorescence architecture (Bomblies et al. 2003). Thus, we asked whether the maize zfl2 gene shows evidence of selection during domestication by examining zfl2 sequence diversity in maize and its wild ancestor parviglumis. We found that the drop in relative nucleotide diversity and elevated LD observed in maize zfl2 relative to parviglumis are similar to values previously reported for neutrally evolving loci and can thus be explained by the domestication bottleneck effect alone (Eyre-Walker et al. 1998; White and Doebley 1999). However, it is important to point out that in maize, recombination often occurs preferentially in or near coding regions (Fu, Zheng, and Dooner 2002). Thus, if selection has acted on nearby regulatory regions, its molecular signature may not be detected in the coding region. Such a result has been observed for the teosinte branched1 gene in maize, which shows strong evidence of selection in upstream regulatory regions, but nearly neutral nucleotide diversity patterns in the coding region (Wang et al. 1999; Clark et al. 2004). Thus, concluding whether or not selection has acted on zfl2 during maize domestication will require further study of surrounding genomic regions.

Supplementary Material

1. Alignment file for Andropogoneae sequences.
2. Alignment file for Zea zfl2 sequences.

Acknowledgments

We would like to thank Richard M. Clark, Levi J. Yant and two anonymous reviewers for valuable comments on the manuscript, Qiong Zhao for helpful advice on computer analyses, and E. Kellogg for Chionachne koenigii DNA. This work was supported by NIH grant GM-58816 to J.D. K.B. was supported by a Howard Hughes Medical Institute predoctoral fellowship.

Literature Cited

Ahearn, K. P., H. A. Johnson, D. Weigel, and D. R. Wagner. 2001. NFL1, a Nicotiana tabacum LEAFY-like gene, controls meristem initiation and floral structure. Plant Cell Physiol. 42:1130–1139.
Baumlein, H., I. Nagy, R. Villarroel, D. Inze, and U. Wobus. 1992. Cis-analysis of a seed protein gene promoter: the conservative RY repeat CATGCATG within the legumin box is essential for tissue-specific expression of a legumin gene. Plant J. 2:233–239.
Bomblies, K., R. L. Wang, B. A. Ambrose, R. J. Schmidt, R. B. Meeley, and J. Doebley. 2003. Duplicate FLORICAULA/LEAFY homologs zfl1 and zfl2 control inflorescence architecture and flower patterning in maize. Development 130:2385–2395.
Bray, N., I. Dubchak, and L. Pachter. 2003. AVID: a global alignment program. Genome Res. 13:97–102.
Bremer, K. 2000. Early Cretaceous lineages of monocot flowering plants. Proc. Natl. Acad. Sci. USA 97:4707–4711.
Clark, R. M., E. Linton, J. Messing, and J. F. Doebley. 2004. Pattern of diversity in the genomic region near the maize domestication gene tb1. Proc. Natl. Acad. Sci. USA 101:700–707.
Coen, E. S., J. M. Romero, S. Doyle, R. Elliott, G. Murphy, and R. Carpenter. 1990. floricaula: a homeotic gene required for flower development in Antirrhinum majus. Cell 63:1311–1322.
Conant, G. C., and A. Wagner. 2003. Asymmetric sequence divergence of duplicate genes. Genome Res. 13:2052–2058.
Doebley, J. 2004. The genetics of maize evolution. Annu. Rev. Genet. 38:37–59.
Eyre-Walker, A., R. L. Gaut, H. Hilton, D. L. Feldman, and B. S. Gaut. 1998. Investigation of the bottleneck leading to the domestication of maize. Proc. Natl. Acad. Sci. USA 95:4441–4446.
Fay, J. C., and C. I. Wu. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368–376.

Felsenstein, J. 1985. Confidence limits on phylogenies: an Levy, S., S. Hannenhalli, and C. Workman. 2001. Enrichment of approach using the bootstrap. Evolution 39:783–791. regulatory signals in conserved non-coding genomic sequence. Fennoy, S. L., and J. Bailey-Serres. 1993. Synonymous codon Bioinformatics 17:871–877. usage in Zea mays L. nuclear genes is varied by levels of C Lukens, L., and J. F. Doebley. 2001. Molecular evolution of the and G-ending codons. Nucleic Acids Res. 21:5294–5300. teosinte branched 1 gene among maize and related grasses. Filatov, D. A. 2002. ProSeq: a software for preparation and evolu- Mol. Biol. Evol. 18:627–638. tionary analysis of DNA sequence data sets. Mol. Ecol. Notes Lynch, M., and J. S. Conery. 2000. The evolutionary fate and con- 2:621–624. sequences of duplicate genes. Science 290:1151–1155. Force, A., M. Lynch, F. B. Pickett, A. Amores, Y. L. Yan, and Mathews, S., R. E. Spangler, R. J. Mason-Gamer, and E. A. Kel- J. Postlethwait. 1999. Preservation of duplicate genes by logg. 2002. Phylogeny of Andropogoneae inferred from Phy- complementary, degenerative mutations. Genetics 151: tochrome B, GBSSI, and ndhf. Int. J. Plant Sci. 163:441–450. 1531–1545. Molinero-Rosales, N., M. Jamilena, S. Zurita, P. Gomez, J. Capel, Fu, H., Z. Zheng, and H. K. Dooner. 2002. Recombination rates and R. Lozano. 1999. FALSIFLORA, the tomato orthologue of between adjacent genic and retrotransposon regions in maize FLORICAULA and LEAFY, controls flowering time and floral

vary by 2 orders of magnitude. Proc. Natl. Acad. Sci. USA 99:1082–1087.
meristem identity. Plant J. 20:685–693.
Gaut, B. S., and J. F. Doebley. 1997. DNA sequence evidence for the segmental allotetraploid origin of maize. Proc. Natl. Acad. Sci. USA 94:6809–6814.
Goldman, N. 1993. Statistical tests of models of DNA substitution. J. Mol. Evol. 36:182–198.
Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725–736.
Goloubinoff, P., S. Pääbo, and A. C. Wilson. 1993. Evolution of maize inferred from sequence diversity of an Adh2 gene segment from archaeological specimens. Proc. Natl. Acad. Sci. USA 90:1997–2001.
Higo, K., Y. Ugawa, M. Iwamoto, and T. Korenaga. 1999. Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res. 27:297–300.
Hileman, L. C., and D. A. Baum. 2003. Why do paralogs persist? Molecular evolution of CYCLOIDEA and related floral symmetry genes in Antirrhineae (Veronicaceae). Mol. Biol. Evol. 20:591–600.
Hofer, J., L. Turner, R. Hellens, M. Ambrose, P. Matthews, A. Michael, and N. Ellis. 1997. UNIFOLIATA regulates leaf and flower morphogenesis in pea. Curr. Biol. 7:581–587.
Hong, R. L., L. Hamaguchi, M. A. Busch, and D. Weigel. 2003. Regulatory elements of the floral homeotic gene AGAMOUS identified by phylogenetic footprinting and shadowing. Plant Cell 15:1296–1309.
Hudson, R. R. M., M. Kreitman, and M. Aguadé. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.
Huelsenbeck, J. P., and F. Ronquist. 2001. MRBAYES: Bayesian inference of phylogeny. Bioinformatics 17:754–755.
Kaplinsky, N. J., D. M. Braun, J. Penterman, S. A. Goff, and M. Freeling. 2002. Utility and distribution of conserved noncoding sequences in the grasses. Proc. Natl. Acad. Sci. USA 99:6147–6151.
Kellogg, E. A. 2000. Molecular and morphological evolution in Andropogoneae. Pp. 149–158 in S. W. L. Jacobs and J. E. Everett, eds. Grasses: systematics and evolution. Commonwealth Scientific and Industrial Research Organization (CSIRO), Collingwood, Victoria, .
Koch, M. A., B. Weisshaar, J. Kroymann, B. Haubold, and T. Mitchell-Olds. 2001. Comparative genomics and regulatory evolution: conservation and function of the Chs and Apetala3 promoters. Mol. Biol. Evol. 18:1882–1891.
Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244–1245.
Kyozuka, J., S. Konishi, K. Nemoto, T. Izawa, and K. Shimamoto. 1998. Down-regulation of RFL, the FLO/LFY homolog of rice, accompanied with panicle branch initiation. Proc. Natl. Acad. Sci. USA 95:1979–1982.
Montero, L. M., J. Salinas, G. Matassi, and G. Bernardi. 1990. Gene distribution and isochore organization in the nuclear genome of plants. Nucleic Acids Res. 18:1859–1867.
Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929–936.
Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Heidelberg, Germany.
Page, R. D. M. 1996. TREEVIEW: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12:357–358.
Posada, D., and K. A. Crandall. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817–818.
Prasad, K., K. Kushalappa, and U. Vijayraghavan. 2003. Mechanism underlying regulated expression of RFL, a conserved transcription factor, in the developing rice inflorescence. Mech. Dev. 120:491–502.
Purvis, I. J., A. J. Bettany, T. C. Santiago, J. R. Coggins, K. Duncan, R. Eason, and A. J. Brown. 1987. The efficiency of folding of some proteins is increased by controlled rates of translation in vivo. A hypothesis. J. Mol. Biol. 193:413–417.
Rambaut, A. 1996. Se-Al: sequence alignment editor. (http://evolve.zoo.ox.ac.uk/).
Reidt, W., T. Wohlfarth, M. Ellerstrom, A. Czihal, A. Tewes, I. Ezcurra, L. Rask, and H. Baumlein. 2000. Gene regulation during late embryogenesis: the RY motif of maturation-specific gene promoters is a direct target of the FUS3 gene product. Plant J. 21:401–408.
Remington, D. L., J. M. Thornsberry, Y. Matsuoka, L. M. Wilson, S. R. Whitt, J. Doebley, S. Kresovich, M. M. Goodman, and E. S. Buckler IV. 2001. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc. Natl. Acad. Sci. USA 98:11479–11484.
Rozas, J., J. C. Sanchez-DelBarrio, X. Messeguer, and R. Rozas. 2003. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496–2497.
Salinas, J., G. Matassi, L. M. Montero, and G. Bernardi. 1988. Compositional compartmentalization and compositional patterns in the nuclear genomes of plants. Nucleic Acids Res. 16:4269–4285.
Schaal, T. D., and T. Maniatis. 1999. Selection and characterization of pre-mRNA splicing enhancers: identification of novel SR protein-specific enhancer sequences. Mol. Cell. Biol. 19:1705–1719.
Schug, J., and G. C. Overton. 1997. TESS: transcription element search software on the WWW. Technical report CBIL-TR-1997-1001-v0.0. Computational Biology and Informatics Laboratory, School of Medicine, University of Pennsylvania, Philadelphia, Pa.
Shimodaira, H., and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114–1116.

Shu, G., W. Amaral, L. C. Hileman, and D. A. Baum. 2000. LEAFY and the evolution of rosette flowering in violet cress (Jonopsidium acaule, Brassicaceae). Am. J. Bot. 87:634–641.
Simonsen, K. L., G. A. Churchill, and C. A. Aquadro. 1995. Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141:413–429.
Souer, E., A. van der Krol, D. Kloos, C. Spelt, M. Bliek, J. Mol, and R. Koes. 1998. Genetic control of branching pattern and floral identity during Petunia inflorescence development. Development 125:733–742.
Spangler, R., B. Zaitchik, E. Russo, and E. Kellogg. 1999. Andropogoneae evolution and generic limits in Sorghum (Poaceae) using ndhf sequences. Syst. Bot. 24:267–281.
Swofford, D. L. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
Tajima, F. 1993. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics 135:599–607.
Talbert, L. E., J. F. Doebley, S. R. Larson, and V. L. Chandler. 1990. Tripsacum andersonii is a natural hybrid involving Zea and Tripsacum: molecular evidence. Am. J. Bot. 77:722–726.
Tavaré, S. 1986. Some probabilistic and statistical problems on the analysis of DNA sequences. Lect. Math. Life Sci. 17:57–86.
Tenaillon, M. I., M. C. Sawkins, A. D. Long, R. L. Gaut, J. F. Doebley, and B. S. Gaut. 2001. Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc. Natl. Acad. Sci. USA 98:9161–9166.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Tiffin, P., and B. S. Gaut. 2001. Sequence diversity in the tetraploid and the closely related diploid Z. diploperennis: insights from four nuclear loci. Genetics 158:401–412.
Van de Peer, Y., J. S. Taylor, I. Braasch, and A. Meyer. 2001. The ghost of selection past: rates of evolution and functional divergence of anciently duplicated genes. J. Mol. Evol. 53:436–446.
Wall, J. D. 1999. Recombination and the power of statistical tests of neutrality. Genet. Res. 74:65–79.
Wang, R. L., A. Stec, J. Hey, L. Lukens, and J. Doebley. 1999. The limits of selection during maize domestication. Nature 398:236–239.
Wang, Y., W. Devereux, T. M. Stewart, and R. A. Casero Jr. 2001. Characterization of the interaction between the transcription factors human polyamine modulated factor (PMF-1) and NF-E2-related factor 2 (Nrf-2) in the transcriptional regulation of the spermidine/spermine N1-acetyltransferase (SSAT) gene. Biochem. J. 355:45–49.
Weigel, D., J. Alvarez, D. R. Smyth, M. F. Yanofsky, and E. M. Meyerowitz. 1992. LEAFY controls floral meristem identity in Arabidopsis. Cell 69:843–859.
White, S. E., and J. F. Doebley. 1999. The molecular evolution of terminal ear1, a regulatory gene in the genus Zea. Genetics 153:1455–1462.
Wright, F. 1990. The 'effective number of codons' used in a gene. Gene 87:23–29.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555–556.
Yang, Z. 1998. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306–314.
Yang, Z., R. Nielsen, N. Goldman, and A. M. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431–449.
Yoon, H. S., and D. Baum. 2004. Transgenic study of parallelism in plant morphological evolution. Proc. Natl. Acad. Sci. USA 101:6524–6529.
Zhang, L., S. Kosakovsky Pond, and B. S. Gaut. 2001. A survey of the molecular evolutionary dynamics of twenty-five multigene families from four grass taxa. J. Mol. Evol. 52:144–156.
Zhu, B. Y., N. E. Zhou, C. M. Kay, and R. S. Hodges. 1993. Packing and hydrophobicity effects on protein folding and stability: effects of beta-branched amino acids, valine and isoleucine, on the formation and stability of two-stranded alpha-helical coiled coils/leucine zippers. Protein Sci. 2:383–394.

Neelima Sinha, Associate Editor

Accepted January 25, 2005