<<

Molecular Phylogenetics and Evolution 81 (2014) 159–173

Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution

journal homepage: www.elsevier.com/locate/ympev

Origin of a novel regulatory module by duplication and degeneration of an ancient transcription factor

Sandra K. Floyd a, Joseph G. Ryan a, Stephanie J. Conway b, Eric Brenner c, Kellie P. Burris d, Jason N. Burris d, Tao Chen e, Patrick P. Edger f, Sean W. Graham g, James H. Leebens-Mack h, J. Chris Pires f, Carl J. Rothfels i, ⇑ Erin M. Sigel j, Dennis W. Stevenson k, C. Neal Stewart Jr. d, Gane Ka-Shu Wong l,m,n, John L. Bowman a,o, a School of Biological Sciences, Monash University, Melbourne, Victoria 3800, Australia b School of , University of Melbourne, Melbourne, Victoria 3010, Australia c Department of Biology, New York University, 1009 Silver Center, 100 Washington Square East, New York, NY 10003-6688, USA d Department of Plant Science, University of Tennessee, Knoxville, TN 37919, USA e Fairy Lake Botanical Garden, Shenzhen & CAS, Guangdong 518004, China f Division of Biological Sciences, University of Missouri, Columbia, MO 65211, USA g Department of Botany, University of British Columbia, UBC Bot Garden, Vancouver, BC V6T 1Z4, Canada h Univ Georgia, Dept Plant Biol, Miller Plant Sci, Athens, GA 30602, USA i Department of Zoology, University of British Columbia, #4200-6270 University Blvd., Vancouver, BC V6T 1Z4, Canada j Department of Biology, Duke University, Box 90338, Durham, NC 27707, USA k New York Bot Garden, Bronx, NY 10458, USA l Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada m Department of Medicine, University of Alberta, Edmonton, AB T6G 2E1, Canada n BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China o University of California, Davis, Section of Plant Biology, One Shields Ave., Davis, CA 95616, USA article info abstract

Article history: It is commonly believed that gene duplications provide the raw material for morphological evolution. Both Received 4 April 2014 the number of genes and size of gene families have increased during the diversification of land . Sev- Revised 2 June 2014 eral small proteins that regulate transcription factors have recently been identified in plants, including the Accepted 2 June 2014 LITTLE ZIPPER (ZPR) proteins. ZPRs are post-translational negative regulators, via heterodimerization, of Available online 27 September 2014 class III Homeodomain Leucine Zipper (C3HDZ) proteins that play a key role in directing plant form and growth. We show that ZPR genes originated as a duplication of a C3HDZ transcription factor paralog in Keywords: the common ancestor of euphyllophytes ( and seed plants). The ZPRs evolved by degenerative muta- Gene family evolution tions resulting in loss all of the C3HDZ functional domains, except the leucine zipper that modulates Gene duplication Homeodomain Leucine Zipper dimerization. ZPRs represent a novel regulatory module of the C3HDZ network unique to the euphyllo- LITTLE ZIPPER phyte lineage, and their origin correlates to a period of rapid morphological changes and increased com- Evolution of development plexity in land plants. The origin of the ZPRs illustrates the significance of gene duplications in creating developmental complexity during land plant evolution that likely led to morphological evolution. Ó 2014 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

1. Introduction prior to the evolutionary radiation of the land plants (embryo- phytes) (Banks et al., 2011; Delaux et al., 2012; Floyd and The increasing abundance of genomic and transcriptomic Bowman, 2006, 2007; Rensing et al., 2008; Tanabe et al., 2005; resources has revealed that a basic genetic toolkit was in place Zalewski et al., 2013). Comparative genomics and phylogenetic analyses of gene families has also shown that there were net gains in the number of gene families at key nodes of the land plant phy- Abbreviations: C3HDZ, class III Homeodomain Leucine Zipper; HD, homeodo- logeny, including the ancestor and later in the main; LZ, leucine zipper; SAM, shoot apical meristem; SRP, small regulatory euphyllophyte lineage (monilophytes and seed plants), as well as protein; TF, transcription factor; ZPR, LITTLE ZIPPER. ⇑ Corresponding author at: School of Biological Sciences, Monash University, evidence of gene or genome duplications in numerous lineages, Melbourne, Victoria 3800, Australia. including , , euphyllophytes, seed plants, and E-mail address: [email protected] (J.L. Bowman). http://dx.doi.org/10.1016/j.ympev.2014.06.017 1055-7903/Ó 2014 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). 160 S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173 angiosperms (Banks et al., 2011; Floyd et al., 2006; Jiao et al., 2011; (Magnani and Hake, 2008; Staudt and Wenkel, 2011; Wenkel Rensing et al., 2008; Zalewski et al., 2013). It has long been recog- et al., 2007). However, this may reflect a discovery bias as most nized that gene duplications provide the raw material for biologi- developmental research focuses on angiosperms. Recently, Hu cal evolution (Freeling and Thomas, 2006; Jiao et al., 2011; Ohno, et al. (2008) identified MINI ZINC FINGER (MIF) sequences in gym- 1970; Zhang, 2003). It is likely that gene and genome duplications nosperm taxa, indicating that this class of plant SRPs predates the were the ultimate source for novel gene families as well as increas- origin of flowering plants. The Aux/IAA genes that competitively ing gene family size in land plant genomes. Therefore, in order to regulate AUXIN RESPONSE FACTOR (ARF) transcription factors understand plant morphological evolution, it is important to be involved in auxin response are ancient, as are the ARFs themselves. able to reconstruct the evolution of complexity in gene families Both ARFs and their Aux/IAA regulators have been identified in the and make associations between gene duplications, the appearance genomes of the , Physcomitrella and the , , of novel gene function, and increasing developmental and struc- indicating that they may be present in all land plants (Banks et al., tural complexity (Aburomia et al., 2003; Floyd and Bowman, 2007). 2011). Extensive searches for other SRP homologs in non-flowering Although the fate of most duplicate genes may be degeneration plants have not yet been undertaken and have the potential to and loss, the fact remains that numerous paralogs have accumu- reveal a more ancient origin for many SRPs than is currently lated and been retained for hundreds of millions of years following hypothesized. gene and genome duplications during the diversification of land To resolve the phylogenetic relationships of SRPs and their tar- plants (Prince and Pickett, 2002). This is particularly true of plant gets, and therefore infer the timing of the duplication(s) that gave transcription factor genes (Banks et al., 2011; Blanc and Wolfe, rise to novel regulatory modules, it is essential to have a well- 2004; Edger and Pires, 2009; Floyd and Bowman, 2007; Freeling resolved phylogenetic history of both the SRPs and their targets. and Thomas, 2006; Seoighe and Gehring, 2004). Very few plant developmental gene families have been broadly Transcription factors (TFs) are proteins that must interact with studied from a phylogenetic perspective. One notable exception is other molecules to function (Amoutzias et al., 2008; Riechmann the class III Homeodomain Leucine Zipper (C3HDZ) gene family. et al., 2000. Many transcription factors regulate gene expression C3HDZs are essential for patterning and differentiation in the Ara- during developmental processes. TFs typically interact with the bidopsis shoot, including establishment of the embryonic meristem, DNA of their target genes, but frequently have multiple functional patterning and polarity of vascular tissues, and establishment of domains for interaction with other proteins (Amoutzias et al., leaf polarity (Emery et al., 2003; McConnell et al., 2001; Otsuga 2008). Bridgham et al. (2008) demonstrated that for developmen- et al., 2001; Prigge et al., 2005; Talbert et al., 1995). Expression data tal genes that encode proteins with modular subfunctions (eg. dis- for C3HDZs in and the lycophyte Selaginella suggest tinct DNA binding and dimerization domains), a simple nucleotide that a role in the SAM and vascular patterning may be quite ancient substitution can alter the protein of one paralog so that dimeriza- (Floyd and Bowman, 2006; Floyd et al., 2006). tion may occur, but DNA or other ligand binding is prevented due C3HDZ genes have been identified in all land plant lineages and to loss of critical domains. The result of such mutations is that the charophycean algae. Phylogenetic analyses of Floyd et al. (2006) mutated paralog becomes an antagonistic, negative, post-transla- and Prigge and Clark (2006) both suggested that C3HDZ gene tional regulator of any non-altered paralogs. duplications occurred in a common ancestor of monilophytes Many plant transcription factors involved in developmental (ferns, whiskferns, and horsetails) and seed plants. Comparison of regulation (bHLH, b-ZIP, HD-ZIP, LEAFY, MADS-Box, WRKY) func- the two independent analyses also suggests that sampling of tion as dimers, with dimerization critical for DNA binding and gymnosperms and monilophytes has not yet revealed the full mediated by an independently functioning (Amoutzias complement of C3HDZs in those taxa. Full resolution of C3HDZ et al., 2008; Chi et al., 2013; Siriwardana and Lamb, 2012). Since phylogeny in euphyllophytes requires the addition and analysis dimerization is essential for their function, duplicate paralogs of of missing homologs. most plant transcription factor genes have the potential to become C3HDZs in Arabidopsis are post-translationally regulated by LIT- (by degenerate mutation) novel regulatory modules for existing TLE ZIPPER (ZPR) proteins (Kim et al., 2008; Staudt and Wenkel, developmental pathways, with consequences for fine-tuning 2011; Wenkel et al., 2007). ZPR genes were first identified as direct developmental control both spatially and temporally. The extent targets of the Arabidopsis C3HDZ protein REVOLUTA (REV) (Wenkel to which this has happened and the possible implications for land et al., 2007). Each of the four ZPR genes in Arabidopsis (ZPR1-4) plant evolution remain an intriguing area for investigation. encodes a short protein (67–105 amino acids) with a single leu- Recently, a variety of small regulatory proteins (SRPs) have been cine-zipper (LZ) domain that is most similar to the LZ domains of identified in plants. These have been referred to as ‘‘small interfer- C3HDZ proteins. All four ZPR genes are known to be upregulated ing peptides (siPEPS)’’ (Seo et al., 2011) or ‘‘microProteins’’ (Staudt by three of the five Arabidopsis C3HDZ proteins, REV, PHB, and and Wenkel, 2011). SRPs competitively inhibit transcription fac- PHB, (unknown for ATHB8 and ATHB15/CNA) (Brandt et al., tors, either competing for binding sites or creating non-functional 2013; Kim et al., 2008; Wenkel et al., 2007). All four ZPR proteins dimers and thus post-translationally dampen the effects of the dimerize with all five C3HDZ proteins via the LZ domains, forming expressed target TF proteins (Hong et al., 2011; Seo et al., 2011; heterodimers that are unable to bind to DNA (Brandt et al., 2013; Staudt and Wenkel, 2011). The importance of plant SRPs in flower- Kim et al., 2008; Wenkel et al., 2007). Overexpression of ZPR genes ing plant development is only just beginning to be appreciated (Seo causes phenotypic defects that mimic C3HDZ loss-of-function phe- et al., 2011; Staudt and Wenkel, 2011), and the origin and history notypes (Wenkel et al., 2007), and can partially rescue the phb-1d of SRPs remains largely unexplored. dominant gain-of-function phenotype (Kim et al., 2008). Analysis The ability of SRPs to interact or compete with transcription fac- of ZPR loss-of-function mutants indicates that ZPR function is tors through shared domains suggests that the SRPs may be evolu- required for SAM structure and function. zpr3-2 zpr4-2 double tionary related to their targets and may represent degenerate mutants exhibited a variety of defects including abnormal SAM paralogs. The origin of such post-translational competitive inhibi- structure, disruption of normal phyllotaxis, production of extra tors would represent a mechanism for the origin of novel regula- cotyledons and leaves, as well as ectopic axillary meristems (Kim tory modules imposed on the ancestral developmental tool kit, et al., 2008). Together these data suggest that normal SAM and may have had a significant role in morphological change maintenance and function (including phyllotaxis) in Arabidopsis during evolution. Thus far, most SRPs are known only requires ZPR proteins to maintain the balance of C3HDZ activity from flowering plants, suggesting a relatively recent origin (Brandt et al., 2013; Kim et al., 2008). S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173 161

The ZPR genes were described as a new gene family restricted to PmC3HDZ4. Complete GbC3HDZ4 and PmC3HDZ4 coding sequences flowering plants (Staudt and Wenkel, 2011; Wenkel et al., 2007). were then cloned and sequenced. The fourth potential euphyllo- However, the similarity of the leucine zipper (LZ) amino acid phyte lineage of Prigge and Clark (2006) was represented by a sin- sequences of ZPR and C3HDZ proteins suggests a shared ancestry. gle partial EST from Pinus taeda (PtaHDZ34) and a fifth clade As previously discussed, C3HDZ genes are known to have an ancient included a single P. taeda EST 30 fragment (PtaHDZ35). A specific origin. Intriguingly, the first C3HDZ gene (Crhb1) identified from the reverse primer based on the end of the PtaHDZ35 coding region, Ceratopteris by Aso et al. (1999) encodes a truncated C3HDZ used with forward degenerate primers to amplify an ortholog for protein with only the HD and LZ domains (Floyd and Bowman, the fifth gene from G. biloba (GbC3HDZ5). 50RACE 2006). The phylogenetic position of Crhb1 is uncertain, as it was was used to identify the 50 end of the coding sequence, which was resolved in different positions in previous published C3HDZ phylo- then cloned and sequenced. GbC3HDZ5 nucleotide sequence was genetic analyses (Floyd et al., 2006, #146; Prigge and Clark, 2006, used as a query in a BLAST search of transcript databases #338). It is possible that Crhb1 is a paralog of the ZPR genes. In order in the Sequence Read Archive (SRA) at GenBank, GenBANK ESTs, to determine whether ZPR genes represent degenerate C3HDZ and 1KP project transcript database (http://www.onekp.com/). genes, we utilized newly available public genomic and transcrip- The resulting sequences were assembled as contigs in Sequencher tomic resources to fully assess the phylogenetic distribution of 4.10.1 (GeneCodes). A single, complete transcript could not be ZPR genes in land plants and to reanalyze the C3HDZ phylogeny, assembled for any single species. An isolated contig from Pseudots- with an emphasis on more complete sampling of monilophytes uga menziesii aligned near what should be the 50 end of a C3HDZ and gymnosperms and the inclusion of ZPR gene sequences. Our coding sequence. A specific primer was designed based on that goal was to determine if ZPR genes resolve within the C3HDZ gene sequence and used with the PtaHDZ35 coding reverse primer to tree and, if so, to map the duplication event that gave rise to ZPR amplify a single, mostly complete coding sequence from P. menziesii genes in order to infer how the origin of a novel regulatory module (PMC3HDZ5). may have influenced plant evolution. 2.2.2. Monilophytes Gymnosperm and fern C3HDZ sequences were used as queries 2. Methods in BLAST searches of the 1KP project transcript databases (http:// www.onekp.com/) for monilophytes. Our search of leptosporan- 2.1. Search for ZPR orthologs in public databases giate ferns was focused on identifying orthologs of PtaHDZ34 (Prigge and Clark, 2006). In addition, three Arabidopsis ZPR gene coding sequences were obtained from The sequences (CriHDZ31, CriHDZ32, Mm_DQ657207) identified by Arabidopsis Information Resource (TAIR) database (http:// Prigge and Clark (2006) were obtained from GenBank. www.Arabidopsis.org/). Previously identified Oryza sativa ZPR sequences were obtained from The Rice Genome Annotation 2.2.3. Lycophytes Project (http://rice.plantbiology.msu.edu/). Using Arabidopsis ZPR Shoot apices were obtained from Huperzia lucidula cultivated in sequences as queries, we performed PsiBLAST similarity searches the University of California, Davis conservatory. RNA was extracted first in the GenBank EST database, focusing the search on gymno- and cDNA synthesized as described in Floyd et al. (2006). Degener- sperms and then monilophytes. We identified a single partial EST ate primers were used to amplify fragments of multiple C3HDZ from biloba and one from Cycas rumphii that were likely cDNA sequences, followed by 50 and 30 RACE to identify cDNA ends. ZPR homologs. We then used the C. rumphii sequence to perform a BLAST search for an ortholog in Zamia furfuracea in the GenBank 2.2.4. Mosses Sequence Read Archives (SRA). Individual SRA reads were assem- Degenerate PCR was performed using methods and primers bled into a complete ZPR coding sequence using Sequencher described in Floyd et al. (2006) on cDNA from Sphagnum sp. col- 4.10.1 (Gene Codes). The and Ginkgo ZPR sequences were lected from the conservatory at the University of California, Davis used as queries for BLAST similarity searches in the 1KP Project and Dawsonia superba collected from wild populations near Mount (http://www.onekp.com/) and Ancestral Angiosperm Genome Dona Buang, Victoria, Australia. Multiple clones were screened and Project (http://ancangio.uga.edu/) EST assemblies. sequenced revealing what appeared to be several C3HDZ genes. The C3HDZ fragments from Sphagnum were used to perform BLAST 2.2. Class III HD-Zip sequences- expanding the sampling of searches of the 1KP project transcript databases (http://www.one- gymnosperms, monilophytes, lycophytes, and mosses kp.com/). Sphagnum C3HDZ contigs were assembled in Sequencher 4.10, combining sequence from our clones and 1KP scaffolds. Three All C3HDZ sequences used in our previous analysis of C3HDZ partial C3HDZs were identified in Dawsonia.50 and 30 RACE was phylogeny (Floyd et al., 2006) formed the core of the C3HDZ used to identify cDNA ends. Complete (or nearly so) cDNA character matrix for this analysis. sequences were then cloned and sequenced. All C3HDZ and ZPR cDNA nucleotide coding sequences cloned 2.2.1. Gymnosperms for this study as well as those downloaded from the 1KP BLAST site We added Pinus taeda C3HDZ sequences identified by Prigge and are provided in Dataset S1. C3HDZ sequence source information Clark (2006) (obtained from GenBank) (Table S1). Degenerate PCR and accession numbers are provided in Table S1. was performed using methods and primers described in Floyd et al. (2006) on a cDNA library prepared from Cycas rumphii (created 2.3. Sequence comparison and phylogenetic analysis at the New York Botanic Garden). Screening of clones revealed four different C3HDZ cDNA sequences (CruC3HDZ1-4). 50 and 30 RACE Pairwise sequence similarities for the HD and LZ domains were was performed to identify the ends of each cDNA sequence. Com- calculated for the C3HDZ REV sequence, the complete euphyllo- plete coding sequences of all four Cycas C3HDZ genes were cloned phyte clade III C3HDZ sequences and ZPR amino acid sequences and sequenced. A reverse primer based on CruC3HDZ3 was used for which we had complete coding sequences using the EMBOSS with a forward degenerate primer to amplify partial orthologs from Needle global pairwise alignment tool (http://www.ebi.ac.uk/ Ginkgo biloba (GbC3HDZ4) and Pseudotsuga menziesii (PmC3HDZ4). Tools/psa/emboss_needle/), which implements the Needleman– 50 RACE and 30RACE identified the ends of GbC3HDZ4 and Wunsch alignment algorithm (Needleman and Wunsch, 1970). 162 S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173

Sequences included in the comparison are indicated in Tables S1, sputter coated with gold. The specimens were imaged using scan- S2. ning electron microscopy. Nucleotide sequences were imported into Se-Al (Rambaut, 1996), translated, and aligned manually. Previous phylogenetic analyses of C3HDZ and other gene families (Floyd and Bowman, 3. Results 2007; Floyd et al., 2006; Prigge and Clark, 2006) revealed that amino acid alignments are better for reconstructing deep phyloge- 3.1. New ZPR sequences from flowering plants netic relationships of land plant TF sequences (than nucleotide sequences) and that Maximum Likelihood and Bayesian methods Searches of public databases revealed ZPR-related sequences result in gene trees with similar topology and support. We used from diverse angiosperm species. Arabidopsis ZPR genes fall into Bayesian analyses of amino acid alignments for the expanded phy- two groups, one whose members encode several amino acids logenetic analyses presented here. We excluded ambiguously N-terminal of the LZ (ZPR1,2) and one that does not (ZPR3,4) aligned sequence to produce an alignment of 793 amino acid char- (Wenkel et al., 2007). This appears to be true for other angiosperms acters. Phylogenetic analyses were performed using Mr. Bayes 3.2.1 as well. Inspection of the aligned ZPR amino acid sequences from (Huelsenbeck and Ronquist, 2001), run on multiple parallel proces- diverse angiosperms reveals that most angiosperm species exam- sors on the Monash University Sun Grid High Speed Computer Clus- ined have at least one ZPR gene encoding a protein with N-terminal ter. The first analysis included only C3HDZ sequences. The second sequence and at least one that does not (Figs. 1, S1). The N-termi- included both C3HDZ and ZPR sequences. Both analyses were run nal domains of angiosperm ZPR1,2 proteins are not easily aligned for 2,000,000 generations, which was sufficient for convergence of to each other or to a C3HDZ homeodomain (HD), which is the the two simultaneous runs of each analysis. Convergence was domain N-terminal to the leucine zipper (LZ) in C3HDZ proteins assessed by visual inspection of the plot of the log likelihood scores (Figs. 1, S1). Most ZPR3,4 group proteins, which lack N-terminal of the two runs and the convergence diagnostic ‘‘Potential Scale sequence, begin with a start codon in what would be the leucine Reduction Factor’’ (Gelman and Rubin, 1992) calculated by MrBa- position (d) of the first heptad of the LZ domain (Fig. 1). yes. Angiosperm ZPR sequences lacking N-terminal domains were Angiosperm ZPR genes also encode a region of variable length at constrained as a monophyletic group. To allow for the burn-in the C-terminal end of the LZ. These C-terminal regions are also dif- phase, the first 50% of the total number of saved trees were dis- ficult to align, although within each of the two groups of angio- carded in both analyses. Character matrix and command files used sperm ZPRs there is a unique motif that is shared by many, but to run the Bayesian phylogenetic analyses are provided in Datasets not all, of the sequences of that group (Fig. S1). S2, S3. Individual saved trees were viewed and filtered to compare topologies in PAUP 4.0 (Swofford, 2003). 3.2. ZPR sequences in non-flowering plants 2.4. Ginkgo ZPR expression vector construction 3.2.1. Gymnosperms Searches of publicly available transcript databases revealed RNA extraction and cDNA synthesis from Ginkgo biloba shoot ZPR-like sequences from multiple gymnosperm species. We found apices was done as previously described (Floyd et al., 2006). Two no evidence of more than one ZPR transcript from any single gym- reverse primers were designed based on the partial GbZPR EST nosperm species. For the purposes of phylogenetic analysis, we and used to perform 50RACE using the SMART RACE kit (Clontech). included ZPR sequences from seven gymnosperms, including The 50 end of the transcript was amplified, TA cloned (Invitrogen), , Ginkgo and (Table S2). All gymnosperm ZPR genes and sequenced. The GbZPR coding sequence was amplified and encode a region N-terminal of the LZ domain, which is similar to, TOPO cloned directly into the pENTRD-TOPO vector (Invitrogen). and aligns with, the HD of C3HDZ protein sequences (Figs. 1, S1, The GbZPR coding sequence was then recombined into a binary S2). The gymnosperm ZPR sequences also encode a short C-termi- vector based on pMLBART (Gleave, 1992) with two tandem copies nal domain that is conserved in all gymnosperms, but not shared of the cauliflower mosaic virus 35S promoter (35S) upstream of the with ZPR genes outside of gymnosperms (Fig. S1, Table S2). Gateway attR1-attR2 site, using Gateway LR Clonase II (Invitrogen). The resulting construct (35S:GbZPR-BART) put the GbZPR coding sequence under control of the constitutive 35S promoter. The 3.2.2. Monilophytes coding sequence for GbZPR is available in Dataset S2. From BLAST similarity searches we retrieved short transcripts with similarity to the LZ domain as well as upstream HD-encoding 2.5. Transformation of Arabidopsis and selection of transformants regions of C3HDZs from both eusporangiate and leptosporangiate monilophytes (Figs. 1, S1, Table S2). As with the gymnosperms, there The binary vector containing the expression construct was no evidence of more then one ZPR-like transcript for any one (35S:GbZPR-BART) was transformed into Agrobacterium tumefac- species. For phylogenetic analysis we included ZPR-like sequences iens (strain GV3001) by electroporation and then introduced into from eusporangiate taxa diffusum (Edi_2008727), E. wild-type Arabidopsis thaliana Landsberg erecta (Ler) using stan- hymale (Ehy_2010667), and Sceptridium dissectum (Sdi_2001319) dard protocols. Seeds were collected and sown. Seedlings were and from the leptosporangiate ferns Adiantum aleuticum sprayed with BASTA to select for transformants. Seeds were col- (Aal_2002621), Argyrochosma nivea (Ani_2073218), Asplenium platy- lected from 10 T1 lines and sown. Young T2 plants were examined neuron (Apl_2055780), and Gaga arizonica (Gar_2015071)(Table S2). for evidence of the effects of the transgene, comparing to the All of the monilophyte ZPR-like sequences encode a C-terminal results of Wenkel et al. (2007), who performed similar experiments region. The C-terminal amino acid sequences of the leptosporan- with overexpressing Arabidopsis ZPR genes. giate fern ZPR proteins are similar to each other but not with those of the eusporangiate ZPR proteins (Edi_2008727, Ehy_2010667). 2.6. Microscopy

Young 35S:GbZPR seedlings and leaves were fixed in FAA 3.2.3. Other taxa overnight and then dehydrated through an ethanol series to 100% No ZPR-related sequences were discovered in the publicly avail- ethanol. The specimens were dried in a critical point drier and able databases for lycophyte or taxa. S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173 163

Fig. 1. Alignment of selected LITTLE ZIPPER (ZPR) and class III HD-Zip (C3HDZ) amino acid sequences. Arabidopsis C3HDZ sequences ATHB8 and REV are at the top of the alignment. Amino acids highlighted in bold red are predicted to interact with DNA bases and amino acids enclosed in a box are predicted to interact with DNA phosphates (according to Sessa et al., 1998). Red amino acids in the monilophyte ZPR sequences indicate conservation of base-binding residues and the boxes extending into monilophyte and gymnosperm ZPR sequences indicate conservation of phosphate-contacting amino acids. Within the angiosperm ZPR1,2 sequences, amino acids highlighted in red could be conserved, but the alignment is uncertain. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3.3. Comparison of ZPR sequences from flowering and non-flowering As mentioned above, monilophyte, gymnosperm and some plants angiosperm ZPR genes encode N-terminal regions (relative to the LZ). The N-terminal domains of monilophyte and gymnosperm The ZPR sequences from monilophytes and gymnosperms share ZPR sequences share sequence similarity with the C3HDZ HD a LZ that is clearly alignable to the LZ domains of C3HDZ proteins sequences and are readily aligned, whereas the N-terminal and angiosperm ZPR LZ domains (Figs. 1, S1). Monilophyte ZPR LZ domains of angiosperm ZPRs are neither readily aligned to each domains have 59% similarity to C3HDZ LZs and 58% similarity to other or to the N-terminal domains of monilophyte and gymno- gymnosperm ZPR LZ domains, but only 52% similarity with angio- sperm ZPR sequences (Figs. 1, S1). Monilophyte ZPR N-terminal sperm ZPR LZ domains (52%) (Table 1A). Gymnosperm ZPR LZ domains have 47% similarity to C3HDZ HDs, gymnosperm ZPR N- domains are more similar to angiosperm ZPR LZ domains (69%) than terminal domains have 39% similarity to C3HDZ HDs and angio- to C3HDZ LZ domains (61%) or monilophyte ZPR LZ domains (58%). sperm ZPR1,2 N-terminal domains are only 23% similar to C3HDZ

Table 1 Average pairwise similarities among C4HDZ and ZPR genes.

C3 Mon euphyll III Gym euphyll III Mon ZPR Gym ZPR Angio ZPR A. Average pairwise similarities (%) for the leucine-zipper domains of C3HDZ and ZPR amino acid sequences C3 84 Mon euphyll III 80 87 Gym euphyll III 81 80 99 Mon ZPR 59 60 60 87 Gym ZPR 61 63 61 58 87 Angio ZPR 55 56 54 52 69 77 C3HDZ Mon ZPR Gym ZPR Angio ZPR B. Average pairwise similarities (%) for the homeodomains of C3HDZ and n-terminal domains of ZPR amino acid sequences C3HDZ 76 Mon ZPR 49 64 Gym ZPR 38 36 56 Angio ZPR 24 17 19 26 164 S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173

Fig. 2. C3HDZ phylogeny, 50% majority-rule consensus phylogram from Bayesian analysis of aligned C3HDZ amino acid sequences, rooted with the outgroup algal sequence CcC3HDZ1. Character matrix and parameters available in Dataset S2. Numbers at branches indicate posterior probability values. Taxa are color coded according to major land plant classification. Three clades comprised of euphyllophyte sequences (euphyllophyte I, II, III) and five monophyletic clades of seed plant sequences (a–e) are indicated. For full names of taxa and sequence information see Table S2. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173 165

Fig. 3. Phylogeny and evolution of C3HDZ and LITTLE ZIPPER sequences. (A) 50% majority-rule consensus phylogram from Bayesian analysis of aligned C3HDZ and ZPR amino acid sequences, rooted with the outgroup algal sequence CcC3HDZ1. Character matrix and parameters available in Dataset S3. Only the euphyllophyte clade is shown here; the complete tree is available as Fig. S2. Numbers at branches indicate posterior probability values. Taxa are color coded according to major land plant classification. Three clades comprised of euphyllophyte C3HDZ sequences (I, II, III) are indicated. ZPR sequences resolve within euphyllophyte clade III. Asterisks highlight the Arabidopsis ZPR genes. For full names of taxa and sequence information see Tables S1, S2. (B) Cladogram depicting a consensus of land plant phylogeny (Kelch et al., 2004; Pryer et al., 2001; Qiu et al., 2006, 2007). Red dots indicate C3HDZ duplication events with the numbers showing how many duplications are inferred in that lineage. The red dot with the black outline indicates C3HDZ losses in an ancestor of angiosperms. The white circles outlined with red indicate ZPR duplication events. Three C3HDZ duplications were inferred in the euphyllophyte lineage prior to divergence of monilophytes and seed plants, one paralog was the ancestral ZPR. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) 166 S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173

Fig. 4. Aberrant morphology of Arabidopsis plants constitutively expressing the Ginkgo biloba ZPR gene. (A) Wild-type seedling, 14 days after sowing. (B) A severely affected 35S:GbZPR plant (18 days after sowing) with small, elliptical cotyledons and arrested SAM. (C) 35S:GbZPR plant (39 days after sowing) in which original SAM terminated in leaf-like organ and a new SAM formed. (D) SEM of plant similar to (B), with no SAM. (E) Abaxial surface of a wild-type Arabidopsis cotyledon, 14 days after sowing, with clusters of small cells and numerous stomata interspersed with larger cells. (F) Wild-type cotyledon adaxial surface with larger, more uniform cells and few stomata. (G) 35S:GbZPR cotyledon abaxial surface, similar to wild-type abaxial surface shown in (E). (H) 35S:GbZPR cotyledon adaxial surface, similar to the 35S:GbZPR abaxial cotyledon (G) and the wild-type abaxial cotyledon (E). (I) 35S:GbZPR plant 25 days after sowing exhibiting an epinastic phenotype; the leaves becoming progressively more epinastic; leaves numbered 1–6 from oldest to youngest. Oldest leaves 1–3 are round and mostly flat, younger leaves 4–6 are distinctly narrower and downward curved. (J) Plant similar to (I) with later leaves strongly epinastic; 39 days after sowing. Scale bars for A–C, I, J = 1 mm; D = 200 lm; E–H = 100 lm. Abbreviations: c = cotyledon; SAM = shoot apical meristem; wt = wild type. S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173 167

HDs (Table 1B). Angiosperm ZPR3,4 sequences lack coding euphyllophyte III clade. Within the euphyllophyte III clade the sequence N-terminal to the LZ. gymnosperm sequences are the monophyletic sister group to a Despite being conserved enough for alignment, the residues poorly supported (89%) clade in which monilophyte C3HDZs and responsible for DNA binding in C3HDZ proteins are not conserved the ZPR clade form an unresolved polytomy (Figs. 3, S2). Branches in ZPR N-terminal domains. In monilophyte ZPR sequences, only leading to the ZPR clade, the monilophyte ZPR clade and the seed one of the five residues predicted to interact with DNA bases plant ZPR clade are all long (Fig. 3). (Sessa et al., 1998) is conserved and two of the four residues pre- Within the ZPR clade, the monilophyte ZPR sequences are dicted to interact with DNA phosphates (Sessa et al., 1998) are con- resolved as a highly supported (100%), monophyletic sister group served. In gymnosperms only two of the phosphate-contacting to the seed plant sequences (gymnosperms plus angiosperms) residues are mostly conserved, and in angiosperms it is not clear and the gymnosperm sequences are resolved as a sister clade to if any of the C3HDZ DNA-binding residues are conserved, although the monophyletic angiosperm sequences (100%), although the there are numerous arginine (R) residues that may represent part of gymnosperm clade was not well supported (79%). The angiosperm the ancestral DNA-binding domain. All ZPR sequences include a clade is further resolved into two major clades, the ZPR1,2 clade region C-terminal of the LZ and these sequences are variable in and the ZPR3,4 clade. The ZPR 3,4 clade was constrained based length and not conserved across major taxonomic groups. However, on shared absence of N-terminal sequence. Within ZPR3,4 clade, there are motifs that are conserved within taxonomic lineages. the single Amborella sequence (Atr_c73493) is resolved as sister to remaining sequences which are largely unresolved. The remain- 3.4. Phylogenetic analysis of C3HDZ sequences ing sequences were identified as a monophyletic ZPR1,2 clade with 94% support. Within the ZPR1,2 clade, there is a basal polytomy of The consensus tree from the Baysian phylogenetic analysis of the Amborella Atr_CK765457 sequence, a clade of eudicot C3HDZ amino acid sequences was rooted with the outgroup charo- sequences, and a poorly-supported (59%) clade including basal phycean algal sequence (CcC3HDZ1)(Fig. 2) In the resulting toplo- angiosperm and monocot sequences. gy, the bryophyte sequences diverge from the three basal nodes, outside of a well-supported (98%) clade including all vascular plant sequences. The branching order of the bryophyte grade is not con- 3.6. Ginkgo ZPR over-expression in Arabidopsis sistent with currently accepted hypotheses of land plant phylog- eny, however the sister position of the liverwort MpC3HDZ1 as The ZPR coding sequence from the gymnosperm Ginkgo biloba sister to vascular plants is not highly supported (82%). Within (GbZPR) was constitutively expressed, using the cauliflower mosaic the vascular plant clade, all of the lycophyte sequences are virus 35S (35S) regulatory sequence (35S:GbZPR), in Arabidopsis. resolved as a highly-supported (100%), monophyletic sister-group Plants with a range of abnormal phenotypes from severe to moder- to a clade including all of the euphyllophyte sequences, which is ate were observed in two of 10 segregating T2 35S:GbZPR lines also highly supported (100%). Within the lycophyte clade, Selagi- (Fig. 4). The remaining eight lines exhibited only moderately nella sequences are all resolved in one clade sister to all of the affected leaves. All of the plants with abnormal development were Huperzia sequences. Within the euphyllophyte clade, there are also slow to germinate and slower-growing than wild-type seed- three highly-supported (100%) euphyllophyte subclades that lings. 35S:GbZPR plants exhibiting the most severe phenotype pro- include both monilophyte and seed plant sequences (I, II, and III duced two small, eliptical cotyledons and no post-germination in Fig. 2). SAM activity (Fig. 4B and D). We also observed plants with two Euphyllophyte I includes a highly supported (100%) clade of small cotyledons and two arrested leaf primordia. These severely sequences from eusporangiate and leptosporangiate monilophytes affected plants died about 8–16 days after germination. In one sister to a large clade of seed plant sequences. The Ceratopteris plant the SAM terminated in a single leaf-like organ and a new Crhb1 gene, which encodes a truncated C3HDZ protein, is resolved SAM had initiated in a lateral position (Fig. 4C). In wild-type seed- within the monilophyte subclade of euphyllophyte clade I. The lings the abaxial cotyledon surface exhibits clusters of small cells seed plant clade within euphyllophyte clade I is further resolved and stomata interspersed with larger cells for an overall irregularly into three highly-supported (100%) subclades, ‘‘a’’, ‘‘b’’, and ‘‘c’’ textured surface (Fig. 4E), whereas the adaxial surface consists of (Fig. 2). Subclade ‘‘a’’ includes only gymnosperm sequences. relatively larger and more uniform cells, few stomata, and a Subclade ‘‘b’’ is resolved into a subclade of gymnosperm sequences smoother surface (Fig. 4F). Cotyledons of the more severely sister to a subclade of angiosperm sequences, including Arabidopsis affected 35S:GbZPR plants (as shown in Fig. 4B and C) have abaxial ATHB8 and ATHB15/CNA (Fig. 2). Subclade ‘‘c’’ is resolved into (Fig. 4G) and adaxial (Fig. 4H) surfaces similar in appearance to asubclade of gymnosperm sequences sister to a subclade of angio- each other that resemble the abaxial surface of wild-type cotyle- sperm sequences, including Arabidopsis REV, PHB and PHV (Fig. 2). dons (Fig. 4E). Several seedlings in the weaker 35S:GbZPR-10 line Euphyllophyte II includes a subclade of monilophyte sequences sis- produced relatively normal cotyledons, followed by rosette leaves ter to a subclade of gymnosperm sequences (d) (Fig. 2). Euphyllo- that were progressively more epinastic (downward curved) (Fig. 4I phyte III is also further resolved into a clade of monilophyte and J). sequences sister to a clade of gymnosperm sequences (e) (Fig. 2). Thus, there are five distinct clades that include seed plant sequences, however only two of these (subclades ‘‘b’’ and ‘‘c’’) also 4. Discussion include angiosperm sequences. 4.1. Evolution of the C3HDZ gene family 3.5. Phylogenetic analysis of C3HDZ and ZPR sequences Phylogenetic analysis of C3HDZ amino acid sequences (Fig. 2) When rooted with the outgroup algal sequence (CcC3HDZ1), the produced a result consistent with previous analyses (Floyd and consensus tree topology for the analysis including C3HDZ and ZPR Bowman, 2006; Prigge and Clark, 2006). When rooted with the sequences (Figs. 3, S2) is nearly identical to the C3HDZ analysis outgroup algal sequence (CcC3HDZ1), the basic topology is gener- described above (Fig. 2). Euphyllophyte clades I, II, and III are all ally consistent with currently supported scenarios of land plant highly supported (100%) as in the C3HDZ-only tree. The ZPR phylogeny (Kelch et al., 2004; Pryer et al., 2001; Qiu et al., 2006, sequences resolve as a highly supported (100%) clade within the 2007) in that diverge from the basal nodes, vascular 168 S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173 plant sequences are monophyletic, and lycophytes and euphyllo- monilophyte ZPR sequences are more similar to euphyllophyte phytes are sister groups within the vascular plant clade (Fig. 2). C3HDZ LZs (60%) than to gymnosperm (58%) or angiosperm Within the euphyllophyte clade, resolution is also consistent (52%) ZPR LZs (Table 1). Similarly, the N-terminal domains of with previous analyses of C3HDZ phylogeny. Euphyllophyte clade monilophyte ZPRs are more similar to the HDs of C3HDZs than to I was identified in both Floyd et al. (2006) and Prigge and Clark N-terminal domains of gymnosperm or angiosperm ZPRs. This (2006). The C3HDZ gene tree of Floyd et al. (2006) also resolved sequence similarity may be attracting the monilophyte C3HDZ an additional euphyllophyte clade with two monilophyte sequences to the ZPR clade. sequences, from (PnC3HDZ3) and Ceratopteris (CrC3HDZ1). In addition, inspection of the amino acid matrix (Dataset S3) The gene tree of Prigge and Clark (2006) resolved a second reveals few residues or motifs that are potential synapomorphies euphyllophyte clade with several fern sequences and a single gym- uniting the euphyllophyte III C3HDZ sequences as a clade. Thus nosperm sequence (PtaHDZ35) as well as a possible third clade rep- the weak phylogenetic signal for the euphyllophyte III C3HDZ clade resented by a single gymnosperm sequence (PtaHDZ34). For this may be overwhelmed by shared similarities (either ancestral or analysis we identified orthologs for PtaHDZ34 and PtaHDZ35 from homoplasious) in the N-terminal and LZ regions between monilo- other gymnosperms and monilophytes, which clarifies the resolu- phyte ZPR and C3HDZ sequences. tion of the euphyllophyte C3HDZs into three well-supported clades However, alignment of amino acids in the variable linker region (I, II, III in Figs. 2 and 3) including sequences from eusporangiate between the LZ and the START domain (LR in Fig. 6) reveals a motif and leptosporangiate monilophytes as well as seed plants. The that is unique to the euphyllophyte III C3HDZ sequences (Fig. 5). C3HDZ topology suggests that the euphyllophyte common ances- The linker region is partly excluded from the matrix for phyloge- tor had a single C3HDZ gene, which was then triplicated, either netic analysis because it is impossible to reliably align linker by gene or genome duplications, prior to the divergence of monil- sequences from C3HDZs across major plant lineages. However, ophytes and seed plants. The three ancient paralogs have persisted C3HDZ linker region sequences from species within lineages (eg in these lineages. mosses, lycophytes, euphyllophyte I, II and III) are readily aligned There are five distinct subclades including seed plant C3HDZ and reveal motifs unique to each clade (Fig. 5). The euphyllophyte sequences (Fig. 2a–e), however only two of these (subclades ‘‘b’’ III C3HDZ sequences share unique sequence structure in the linker and ‘‘c’’) also include angiosperm sequences. The lack of angio- region, which can be considered a protein structural synapomor- sperm C3HDZ sequences in three of the five seed plant clades phy. It is likely that the euphyllophyte III C3HDZs are the mono- implies that three ancient C3HDZ paralogs were lost in an ancestor phyletic sister lineage to the ZPR clade, as resolved in some runs of extant flowering plants after the divergence of angiosperm and of the Bayesian analyses. We repeated the Bayesian analysis con- gymnosperm ancestors (Fig. 2). All of the flowering plant straining the euphyllophyte III C3HDZs as a clade (without defining sequences are derived from later duplications of the paralog that any internal topology). In the resulting consensus tree, euphyllo- gave rise to euphyllophyte clade I (Floyd et al., 2006). phyte clade III is identified with 100% support and includes a (con- strained) C3HDZ clade sister to the ZPR clade which is also highly 4.2. Origin and evolution of the ZPR gene family supported (100%) (Figs. 7, S3). Within the constrained C3HDZ clade, highly supported monophyletic monilophyte and gymno- Our analysis including both C3HDZ and ZPR amino acid sperm clades are resolved as in the C3HDZ-only analysis (Fig. 2), sequences resulted in a tree, when rooted with the outgroup algal and all sequences are resolved in a topology consistent with land sequence (CcC3HDZ1), with the same basic topology and support as plant phylogeny. The ZPR clade sister to the euphyllophyte the C3HDZ tree (Fig. 2), but with ZPR sequences resolved as a well- C3HDZ clade III is the simplest hypothesis that explains the current supported (100%) clade within euphyllophyte clade III (Fig. 3A). data. This analysis was unable to resolve relationship of the ZPR clade Regardless of the precise resolution within the euphyllophyte III to the euphyllophyte and gymnosperm C3HDZ sequences clade, there is strong support for the relationship of the ZPR clade (Figs. 3A; S2). The lack of resolution of the euphyllophyte III clade to the euphyllophyte clade III C3HDZ sequences. These analyses C3HDZ sequences in the consensus tree is the result of numerous indicate that the truncated Crhb1 gene from the fern Ceratopteris, alternate topologies among the post-burnin saved trees. The alter- which we had hypothesized as a possible paralog to the ZPR genes, nate resolutions of the euphyllophyte III clade included topologies is in fact distantly related (in euphyllophyte clade I; Figs. 2 and 3). in which individual lineages/sequences of C3HDZ sequences (gym- None of the orthologs of Crhb1 encode truncated C3HDZ proteins nosperm clade, monilophyte clade, Aev_C3HDZ5, Sdi_2003479) nor did we identify any similarly truncated genes from any other were sister to the ZPR clade as well as the monophyletic euphyllo- monilophyte or gymnosperm. Thus the evidence suggests that phyte III C3HDZ sequences (as in Fig 2) sister to the ZPR clade. the Crhb1 gene represents a unique origin of a truncated C3HDZ Pairwise sequence comparisons of the LZ domains indicated that paralog in Ceratopteris.

Fig. 5. Amino acid alignment of the linker region of euphyllophyte clade I, II, and III sequences. Boxes indicate unique sequence motifs of the clade III sequences. The dashed box indicates sequence included in the Bayesian data matrix, the solid box indicates motifs excluded from the Bayesian data matrix. ? signifies missing data. S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173 169

Fig. 6. Hypothesis stages in the evolution of ZPR genes by degeneration of an ancient C3HDZ duplicate gene. Cartoon depicts the protein structure of C3HDZ protein (top) with five functional domains (HD, LZ, START, SAD, PAS). As mutations accumulated through time, functional domains were lost. In the ZPR3,4 clade of angiosperm ZPRs only the LZ remains. Red vertical line represents the stop codon. LR = linker region. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Within the ZPR clade, the monilophyte sequences are resolved was ancestral to the ZPR genes (step 2 in Fig. 7) leading to loss of as a highly supported sister clade to a highly supported (100%) C-terminal functional domains (step 3 in Fig. 7). Monilophyte seed plant ZPR clade. The gymnosperm ZPRs are identified as a and gymnosperm ZPRs encode sequence N-terminal to the LZ, poorly supported (79%) sister group to a highly-supported (100%) which is similar to and aligns with the HD of C3HDZ genes whereas the angiosperm ZPR clade. This basic topology is consistent with flowering plant ZPR genes have no clear sequence similarity with land plant phylogeny. Poor support for the gymnosperm ZPRs as the C3HDZ HD (Table 1; Figs. 1, S1). This suggests that the HD a monophyletic group is due to alternate topologies in which the was not immediately lost in the ancestral ZPR gene. Rather, follow- conifer ZPR sequences (Pam_2013367, Pm_2053043, Pta_BF17217) ing the divergence of the monilophyte and seed plant ancestors, are resolved as a sister clade to the angiosperm ZPRs. However, sequence divergence occurred in the HD-encoding region indepen- as mentioned previously, the gymnosperm ZPRs share a synapo- dently (steps 4a and 4b in Fig. 7). In the monilophyte lineage, lim- morphic motif in the C-terminal domain (Fig. S1) that unites them ited sequence divergence of the HD occurred and the resulting as a clade (that was not included in the phylogenetic analysis). sequence was well conserved through subsequent diversification Thus we can reject hypotheses in which conifer ZPRs are sister to of eusporangiate and leptosporangiate taxa and retains 49% angiosperm ZPRs. similarity with C3HDZ HDs. In gymnosperm ZPR genes, the HD is Angiosperm ZPR sequences are resolved as two subclades each further diverged from an ancestral HD (only 38% average similar- of which include sequences from , monocots and ity), implying that the ZPR of the seed plant ancestor encoded at . This indicates a duplication of the ZPR gene occurred in a least a partly degenerate HD. The lack of clear sequence similarity common ancestor of extant angiosperms. Angiosperm ZPR to the C3HDZ HD in the angiosperm ZPR genes (Fig. S1) indicates sequences are not well resolved within the two clades. Angiosperm that additional sequence divergence occurred in the ancestral ZPRs are short proteins, represented by as few as 44 residues in the ZPR gene of flowering plants following the divergence of angio- matrix (Figs. 1, S1). The lack of resolution within angiosperm ZPR sperm and gymnosperm ancestors. In addition, a duplication clades is likely due to the paucity of sequence data and homoplasy. occurred in the common ancestor of flowering plants, resulting The fact that gymnosperm and monilophyte ZPR genes encode in two ancient angiosperm ZPR genes (step 5 in Fig. 7). The sequence that can be aligned to the C3HDZ HD (in addition to C3HDZ paralog of the ZPRs was also lost in the angiosperm ances- the LZ) is evidence of a relationship between ZPR and C3HDZ tor (Fig. 7). Since all ZPR genes in the angiosperm ZPR3,4 clade lack genes. Phylogenetic analysis of amino acid sequences supports N-terminal sequence we can infer that the HD remnant was lost in the ZPR sequences as nested within C3HDZ sequences, with a one of the two ancient angiosperm paralogs prior to the divergence single point of origin. The ZPR genes are resolved within euphyllo- of extant angiosperms (step 6 in Fig. 7). This required the origin of phyte clade III and include both monilophyte and seed plant a new start codon in position ‘‘d’’ of heptad 1 and loss of the origi- sequences. This implies that another duplication of one of three nal start codon. Inspection of the aligned C3HDZ and ZPR ancient euphyllophyte C3HDZ paralogs occurred giving rise to a sequences reveals that in C3HDZ sequences and monilophyte ZPR fourth paralog prior to the divergence of euphyllophyte lineages. sequences, a leucine (L) is encoded for position ‘‘d’’ in heptad 1. The ZPRs should therefore be considered as C3HDZ euphyllophyte However, this position is occupied by either a methionine (M) or clade IV (Figs. 3A and 7). an isoleucine (I) in gymnosperm ZPR proteins and mostly M in angiosperm taxa (Figs. 1, S1). It is likely that the substitution of 4.3. ZPR sequence evolution M for L in position d of heptad 1 occurred in the ZPR of the ances- tral seed plant. Loss of the upstream start codons (and the HD) did All ZPR genes (monilophyte, gymnosperm and angiosperm) not occur until after the ZPR duplication in the common ancestor of encode short proteins (49–120 amino acids) and have no known extant angiosperms, and then only in one of the ancient paralogs functional domains other than the LZ (Figs. 1, S1). Thus the deriva- that was ancestral to the ZPR3,4 clade. tion of a ZPR gene from an ancestral C3HDZ gene involved muta- tions both C- and N- terminal to the LZ domain. In the simplest 4.4. Functional characterization of a gymnosperm ZPR scenario, following a duplication of one of three C3HDZ paralogs in an ancestral euphyllophyte (step 1 in Fig. 7), a premature stop Constitutive expression of the ZPR gene from the gymnosperm codon C-terminal to the LZ evolved in the C3HDZ paralog that Ginkgo biloba in Arabidopsis produced plants with slow 170 S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173

Fig. 7. Bayesian phylogram of land plant C3HDZ and ZPR amino acid sequences with additional constraint. The character matrix and parameters were identical to those for Figs. 3 and S2 (Dataset S3) except that monophyly was constrained for C3HDZ euphyllophyte clade III sequences based on unique protein motifs as discussed in the text. Numbers at branches indicate posterior probability values. Taxa are color coded according to major land plant classification. For simplicity, only the euphyllophyte clade is shown (complete tree available as Fig. S3) and all clades except euphyllophyte clade III are drawn in cartoon form. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173 171 germination, slow growth, and meristem arrest (Fig. 4A–D). The 4.6. ZPR genes and the evolution of euphyllophytes adaxial surface of cotyledons of 35S:GbZPR plants resembled abax- ial surfaces of both 35S:GbZPR and wild-type plants, indicating a The earliest euphyllophytes were much simpler in structure loss of adaxial identity and abaxialization of the adaxial surface than their modern descendants (Banks, 1980; Corvez et al., 2012; (Fig. 4E–H). In addition, some plants produced epinastic leaves Doran, 1980; Galtier, 2010; Hoffman and Tomescu, 2013; Kenrick (Fig. 4F–I). These phenotypic defects have been observed in and Crane, 1997; Trant and Gensel, 1985). However, paleobotanical Arabidopsis plants constitutively expressing ZPR1 and ZPR3 and evidence shows that the euphyllophyte lineage underwent dra- they are consistent with a reduction in C3HDZ function (Kim matic evolutionary changes in morphology prior to the divergence et al., 2008; Wenkel et al., 2007). Wenkel et al. (2007) reported that of the monilophyte and seed plant ancestors. These changes constitutive expression of ZPR3 in Arabidopsis produced more included the origin of determinate lateral branching systems (that severe defects than constitutive expression of ZPR1. The more ultimately were modified to become megaphyllous leaves) pro- severe defects included termination of the meristem immediately duced in a regular, spiral phyllotaxis and concomitant increases in after cotyledon formation in a single, radialized organ and the pro- the complexity of shoot vasculature and increase in stature duction of trumpet-shaped leaves. Neither of these more severe (Banks, 1980; Corvez et al., 2012; Doran, 1980; Galtier, 2010; effects was observed in 35S:ZPR1 plants by Wenkel et al. (2007) Hoffman and Tomescu, 2013; Kenrick and Crane, 1997; Trant and nor in 35S:GbZPR plants in this study, although we did observe Gensel, 1985). Morphological innovations such as lateral branch one case of apparent meristem termination in a leaf-like organ systems were the result of changes to the ancestral developmental after leaf initiation (Fig. 4D). Constitutive expression of the gymno- programs regulating the function of the shoot apical meristem sperm GbZPR in Arabidopsis produced the same developmental (SAM) and vascular patterning. C3HDZ genes were likely to have defects as that of ZPR1 in Arabidopsis, suggesting that both ZPR1 been part of the developmental program for the sporophyte SAM and GbZPR can suppress C3HDZ function by dimerization, but to and vasculature in the ancestor of all vascular plants (Floyd and a lesser degree than ZPR3. While overexpression in a heterologous Bowman, 2006, 2007). Duplications of the ancestral C3HDZ gene system cannot inform on the developmental roles that ZPRs play in provided additional paralogs, which may have allowed for func- Ginkgo, our results suggest that they are biochemically conserved tional diversification of C3HDZs in the earliest euphyllophytes. across deep phylogenetic distances in the seed plants, suggesting One of those early euphyllophyte paralogs underwent degenerative conservation of their negative regulatory roles. mutations to become the ancestral ZPR gene, representing a novel developmental module for regulating C3HDZ activity. Without fur- 4.5. Origin of small regulatory proteins in plants ther functional data from gymnosperms and monilophytes, the pre- cise role of ZPR genes in euphyllophyte evolution will remain SRPs have been identified that regulate diverse transcription speculative. However, given the demonstrated role of ZPR genes factors (TFs) in plants including ARF, bHLH, C3HDZ, MYB, and in regulating SAM function in Arabidopsis, it is quite likely that they KNX (Seo et al., 2011; Staudt and Wenkel, 2011) and many more were an important developmental innovation that contributed to have been predicted, some of which may target additional TFs the increasing morphological complexity of euphyllophyte plants. (MADS, NAC, WRKY, ZF, others) (Seo et al., 2011). Only one group of known SRPs (the Aux/IAAs) was likely part of the ancestral land 5. Summary and conclusions plant toolkit. Most SRPs, including the ZPRs, were previously thought to be restricted to flowering plants (Magnani and Hake, The LITTLE ZIPPER genes were evolutionarily derived, through 2008; Staudt and Wenkel, 2011; Wenkel et al., 2007). We have degenerative mutations, from the C3HDZ transcription factors that shown that the ZPR ‘‘family’’ of SRPs originated long before the ori- they regulate. Their origin can be traced to a gene duplication event gin of flowering plants. Our evidence suggests that a C3HDZ gene in the ancestor of the euphyllophytes. ZPRs represent a novel trans- duplication occurred in an ancestral euphyllophyte and one paral- regulatory module that increased the complexity of an ancient og degenerated to become the ancestral ZPR. developmental network involving the C3HDZ transcription factors. Phylogenetic analyses of multiple plant gene families indicate It is likely that the increased number of C3HDZ transcription factors that gene duplications occurred in the euphyllophyte ancestor and the addition of the ZPR regulatory module played a role in the not only in the C3HDZ family, but in several other important devel- developmental evolution of the euphyllophytes, perhaps in allow- opmental gene families (Zalewski et al., 2013; SK Floyd, unpub- ing a rapid, dynamic flux of C3HDZ expression in the apex. ZPR lished data). This could be evidence of whole or partial genome genes may therefore have been critical for the earliest stages of evo- duplications in the last common ancestor of ferns and seed plants. lution of megaphyllous leaves in the euphyllophyte lineage. It is therefore possible that gene duplications early in the euphyllo- The origin of the ZPRs provides a concrete, empirical example of phyte lineage, followed by degenerative mutations, gave rise to how a duplicate gene led to a novel regulatory mechanism with other SRPs, adding developmental complexity to the euphyllo- consequences for morphological evolution in plants. phyte developmental program. Thorough sampling of the increas- ZPRs are only one of several families of SRPs that act as compet- ingly broad genomic and transcriptomic resources for plants and itive inhibitors of plant transcription factors. Much work remains rigorous phylogenetic analyses including sequences of SRPs and to be done to understand exactly how the ZPRs, and other small their target TFs will reveal if SRPs other than the ZPRs have a com- regulatory proteins, have influenced plant development and evolu- mon origin in the ancestor of euphyllophyte plants or evolved at tion. However, it is clear that gene and genome duplications at key different times during land plant diversification. nodes in plant evolution have provided not only more genes, but It is of interest that SRPs share both functional and evolutionary novel regulatory modules, that increased the complexity of the similarities with plant miRNAs, which act post-trancriptionally to land plant developmental toolkit. negatively regulate target transcripts and are also thought be evo- lutionarily derived from their targets following gene duplication (Allen et al., 2004). In both cases evolution of a negative regulator Acknowledgments from a paralog can result in the establishment of a negative feed- back loop, allowing more sophisticated regulation of gene activity We thank Philip Chan for assistance with use of the Monash and perhaps facilitating co-option of genetic modules for new University Sun Grid High Speed Computer Cluster for Bayesian functions. phylogenetic analysis. The Ancestral Angiosperm Genome Project 172 S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173 provided valuable sequence data used in our phylogenetic analy- Floyd, S.K., Zalewski, C.S., Bowman, J.L., 2006. Evolution of Class III Homeodomain ses. This work was supported in part by the National Science Foun- leucine zipper genes in streptophytes. Genetics 173, 373–388. Freeling, M., Thomas, B.C., 2006. Gene-balanced duplications, like tetraploidy, dation Grant IOB0515435 to JLB and SKF, the Australian Research provide predictable drive to increase morphological complexity. Genome Res. Council Grant FT100100763 to SKF; and the Australian Research 16, 805–814. Council Grant FF0561326 to JLB. The 1000 Plants (1KP) initiative, Galtier, J., 2010. The origins and early evolution of the megaphyllous leaf. Int. J. Plant Sci. 171, 641–661. led by GKSW, is funded by the Alberta Ministry of Enterprise and Gelman, A., Rubin, D.B., 1992. Inference from iterative simulation using multiple Advanced Education, Alberta Innovates Technology Futures (AITF) sequences. Stat. Sci. 7, 457–511. Innovates Centre of Research Excellence (iCORE), Musea Ventures, Gleave, A.P., 1992. A versatile binary vector system with a T-DNA organizational- structure conducive to efficient integration of cloned DNA into the plant and BGI-Shenzhen. genome. Plant Mol. Biol. 20, 1203–1207. Hoffman, L.A., Tomescu, A.M.F., 2013. An early origin of secondary growth: Appendix A. Supplementary material Franhueberia Gerriennei Gen. Et Sp Nov from the Lower Devonian of Gaspe (Quebec, Canada). Am. J. Bot. 100, 754–763. Hong, S.Y., Kim, O.K., Kim, S.G., Yang, M.S., Park, C.M., 2011. Nuclear import and DNA Supplementary data associated with this article can be found, in binding of the ZHD5 transcription factor is modulated by a competitive peptide the online version, at http://dx.doi.org/10.1016/j.ympev.2014.06. inhibitor in Arabidopsis. J. Biol. Chem. 286, 1659–1668. Hu, W., dePamphilis, C.W., Ma, H., 2008. Phylogenetic analysis of the plant-specific 017. Zinc Finger-Homeobox and Mini Zinc Finger gene families. J. Integr. Plant Biol. 50, 1031–1045. Huelsenbeck, J., Ronquist, F., 2001. MrBayes: Bayesian inference of phylogeny. Bioinformatics 17, 754–755. References Jiao, Y.N., Wickett, N.J., Ayyampalayam, S., Chanderbali, A.S., Landherr, L., Ralph, P.E., Tomsho, L.P., Hu, Y., Liang, H.Y., Soltis, P.S., Soltis, D.E., Clifton, S.W., Schlarbaum, S.E., Schuster, S.C., Ma, H., Leebens-Mack, J., dePamphilis, C.W., 2011. Ancestral Aburomia, R., Khaner, O., Sidow, A., 2003. Functional evolution in the ancestral polyploidy in seed plants and angiosperms. Nature 473, U97–U113. lineage of vertebrates or when genomic complexity was wagging its Kelch, D.G., Mishler, B.D., 2004. Inferring phylogeny using genomic characters: a morphological tail. J. Struct. Funct. Genomics 3, 45–52. case study using land plant plastomes. In: Goffinet, B., Hollowell, V.C., Magill, Allen, E., Xie, Z.X., Gustafson, A.M., Sung, G.H., Spatafora, J.W., Carrington, J.C., 2004. R.E. (Eds.), Molecular Systematics of Bryophytes: Progress, Problems, and Evolution of microRNA genes by inverted duplication of target gene sequences Perspectives. Missouri Botanical Garden Press, St. Louis. in Arabidopsis thaliana. Nat. Genet. 36, 1282–1290. Kenrick, P., Crane, P.R., 1997. The Origin and Early Evolution of Land Plants: A Amoutzias, G.D., Robertson, D.L., de Peer, Y.V., Oliver, S.G., 2008. Choose your Cladistic Study. Smithsonian Institution Press, Washington, DC. partners: dimerization in eukaryotic transcription factors. Trends Biochem. Sci. Kim, Y.S., Kim, S.G., Lee, M., Lee, I., Park, H.Y., Seo, P.J., Jung, J.H., Kwon, E.J., Suh, S.W., 33, 220–229. Paek, K.H., Park, C.M., 2008. HD-ZIP III activity is modulated by competitive Aso, K., Kato, M., Banks, J.A., Hasebe, M., 1999. Characterization of homeodomain- inhibitors via a feedback loop in Arabidopsis shoot apical meristem leucine zipper genes in the fern Ceratopteris richardii and the evolution of the development. Plant Cell 20, 920–933. homeodomain-leucine zipper gene family in vascular plants. Mol. Biol. Evol. 16, Magnani, E., Hake, S., 2008. KNOX lost the OX: the Arabidopsis KNATM gene defines 544–552. a novel class of KNOX transcriptional regulators missing the homeodomain. Banks, H.P., 1980. The role of in the evolution of vascular plants. Rev. Plant Cell 20, 875–887. Palaeobot. Palynol. 29, 165–176. McConnell, J., Emery, J., Eshed, Y., Bao, N., Bowman, J., Barton, M.K., 2001. Role of Banks, J.A., Nishiyama, T., Hasebe, M., Bowman, J.L., Gribskov, M., dePamphilis, C., PHABULOSA and PHAVOLUTA in determining radial patterning in shoots. Albert, V.A., Aono, N., Aoyama, T., Ambrose, B.A., Ashton, N.W., Axtell, M.J., Nature 411, 709–713. Barker, E., Barker, M.S., Bennetzen, J.L., Bonawitz, N.D., Chapple, C., Cheng, C.Y., Needleman, S.B., Wunsch, C.D., 1970. A general method applicable to search for Correa, L.G.G., Dacre, M., DeBarry, J., Dreyer, I., Elias, M., Engstrom, E.M., Estelle, similarities in amino acid sequence of 2 proteins. J. Mol. Biol. 48, 443. M., Feng, L., Finet, C., Floyd, S.K., Frommer, W.B., Fujita, T., Gramzow, L., Ohno, S., 1970. Evolution by Gene Duplication. Springer-Verlag, New York. Gutensohn, M., Harholt, J., Hattori, M., Heyl, A., Hirai, T., Hiwatashi, Y., Ishikawa, Otsuga, D., DeGuzman, B., Prigge, M.J., Drews, G.N., Clark, S.E., 2001. REVOLUTA M., Iwata, M., Karol, K.G., Koehler, B., Kolukisaoglu, U., Kubo, M., Kurata, T., regulates meristem initiation at lateral positions. Plant J. 25, 223–236. Lalonde, S., Li, K.J., Li, Y., Litt, A., Lyons, E., Manning, G., Maruyama, T., Michael, Prigge, M.J., Clark, S.E., 2006. Evolution of the class III HD-Zip gene family in land T.P., Mikami, K., Miyazaki, S., Morinaga, S., Murata, T., Mueller-Roeber, B., plants. Evol. Dev. 8, 350–361. Nelson, D.R., Obara, M., Oguri, Y., Olmstead, R.G., Onodera, N., Petersen, B.L., Pils, Prigge, M.J., Otsuga, D., Alonso, J.M., Ecker, J.R., Drews, G.N., Clark, S.E., 2005. Class III B., Prigge, M., Rensing, S.A., Riano-Pachon, D.M., Roberts, A.W., Sato, Y., Scheller, homeodomain-leucine zipper gene family members have overlapping, H.V., Schulz, B., Schulz, C., Shakirov, E.V., Shibagaki, N., Shinohara, N., Shippen, antagonistic, and distinct roles in Arabidopsis development. Plant Cell 17, 61–76. D.E., Sorensen, I., Sotooka, R., Sugimoto, N., Sugita, M., Sumikawa, N., Tanurdzic, Prince, V.E., Pickett, F.B., 2002. Splitting pairs: the diverging fates of duplicated M., Theissen, G., Ulvskov, P., Wakazuki, S., Weng, J.K., Willats, W.W.G.T., Wipf, genes. Nat. Rev. Genet. 3, 827–837. D., Wolf, P.G., Yang, L.X., Zimmer, A.D., Zhu, Q.H., Mitros, T., Hellsten, U., Loque, Pryer, K.M., Schneider, H., Smith, A.R., Cranfill, R., Wolf, P.G., Hunt, J.S., Sipes, S.D., D., Otillar, R., Salamov, A., Schmutz, J., Shapiro, H., Lindquist, E., Lucas, S., 2001. Horsetails and ferns are a monophyletic group and the closest living Rokhsar, D., Grigoriev, I.V., 2011. The Selaginella genome identifies genetic relatives to seed plants. Nature 409, 618–622. changes associated with the evolution of vascular plants. Science 332, 960–963. Qiu, Y.L., Li, L.B., Wang, B., Chen, Z.D., Knoop, V., Groth-Malonek, M., Dombrovska, O., Blanc, G., Wolfe, K.H., 2004. Functional divergence of duplicated genes formed by Lee, J., Kent, L., Rest, J., Estabrook, G.F., Hendry, T.A., Taylor, D.W., Testa, C.M., polyploidy during Arabidopsis evolution. Plant Cell 16, 1679–1691. Ambros, M., Crandall-Stotler, B., Duff, R.J., Stech, M., Frey, W., Quandt, D., Davis, Brandt, R., Xie, Y.K., Musielak, T., Graeff, M., Stierhof, Y.D., Huang, H., Liu, C.M., C.C., 2006. The deepest divergences in land plants inferred from phylogenomic Wenkel, S., 2013. Control of stem cell homeostasis via interlocking microRNA evidence. Proc. Natl. Acad. Sci. USA 103, 15511–15516. and microProtein feedback loops. Mech. Dev. 130, 25–33. Qiu, Y.L., Li, L.B., Wang, B., Chen, Z.D., Dombrovska, O., Lee, J., Kent, L., Li, R.Q., Jobson, Bridgham, J.T., Brown, J.E., Rodriguez-Mari, A., Catchen, J.M., Thornton, J.W., 2008. R.W., Hendry, T.A., Taylor, D.W., Testa, C.M., Ambros, M., 2007. A nonflowering Evolution of a new function by degenerative mutation in cephalochordate land plant phylogeny inferred from nucleotide sequences of seven chloroplast, steroid receptors. PLoS Genet. 4. mitochondrial, and nuclear genes. Int. J. Plant Sci. 168, 691–708. Chi, Y.J., Yang, Y., Zhou, Y., Zhou, J., Fan, B.F., Yu, J.Q., Chen, Z.X., 2013. Protein– Rambaut, A., 1996. Se-Al: Sequence Alignment Editor. . protein interactions in the regulation of WRKY transcription factors. Mol. Plant Rensing, S.A., Lang, D., Zimmer, A.D., Terry, A., Salamov, A., Shapiro, H., Nishiyama, 6, 287–300. T., Perroud, P.F., Lindquist, E.A., Kamisugi, Y., Tanahashi, T., Sakakibara, K., Fujita, Corvez, A., Barriel, V., Dubuisson, J.Y., 2012. Diversity and evolution of the T., Oishi, K., Shin-I, T., Kuroki, Y., Toyoda, A., Suzuki, Y., Hashimoto, S., megaphyll in Euphyllophytes: phylogenetic hypotheses and the problem of Yamaguchi, K., Sugano, S., Kohara, Y., Fujiyama, A., Anterola, A., Aoki, S., foliar organ definition. C.R. Palevol 11, 403–418. Ashton, N., Barbazuk, W.B., Barker, E., Bennetzen, J.L., Blankenship, R., Cho, S.H., Delaux, P.M., Xie, X.N., Timme, R.E., Puech-Pages, V., Dunand, C., Lecompte, E., Dutcher, S.K., Estelle, M., Fawcett, J.A., Gundlach, H., Hanada, K., Heyl, A., Hicks, Delwiche, C.F., Yoneyama, K., Becard, G., Sejalon-Delmas, N., 2012. Origin of K.A., Hughes, J., Lohr, M., Mayer, K., Melkozernov, A., Murata, T., Nelson, D.R., strigolactones in the green lineage. New Phytol. 195, 857–871. Pils, B., Prigge, M., Reiss, B., Renner, T., Rombauts, S., Rushton, P.J., Sanderfoot, A., Doran, J.B., 1980. A new species of Psilophyton from the Lower Devonian of Northern Schween, G., Shiu, S.H., Stueber, K., Theodoulou, F.L., Tu, H., Van de Peer, Y., New-Brunswick, Canada. Can. J. Bot. – Rev. Can. Bot. 58, 2241–2262. Verrier, P.J., Waters, E., Wood, A., Yang, L.X., Cove, D., Cuming, A.C., Hasebe, M., Edger, P.P., Pires, J.C., 2009. Gene and genome duplications: the impact of dosage- Lucas, S., Mishler, B.D., Reski, R., Grigoriev, I.V., Quatrano, R.S., Boore, J.L., 2008. sensitivity on the fate of nuclear genes. Chromosome Res. 17, 699–717. The Physcomitrella genome reveals evolutionary insights into the conquest of Emery, J.F., Floyd, S.K., Alvarez, J., Eshed, Y., Hawker, N.P., Izhaki, A., Baum, S.F., land by plants. Science 319, 64–69. Bowman, J.L., 2003. Radial patterning of Arabidopsis shoots by class III HD-Zip Riechmann, J.L., Heard, J., Martin, G., Reuber, L., Jiang, C.Z., Keddie, J., Adam, L., and KANADI genes. Curr. Biol. 13, 1768–1774. Pineda, O., Ratcliffe, O.J., Samaha, R.R., Creelman, R., Pilgrim, M., Broun, P., Floyd, S.K., Bowman, J.L., 2006. Distinct developmental mechanisms reflect the Zhang, J.Z., Ghandehari, D., Sherman, B.K., Yu, C.L., 2000. Arabidopsis independent origins of leaves in vascular plants. Curr. Biol. 16, 1911–1917. transcription factors: genome-wide comparative analysis among . Floyd, S.K., Bowman, J.L., 2007. The ancestral developmental tool kit of land plants. Science 290, 2105–2110. Int. J. Plant Sci. 168, 1–35. S.K. Floyd et al. / Molecular Phylogenetics and Evolution 81 (2014) 159–173 173

Seo, P.J., Hong, S.Y., Kim, S.G., Park, C.M., 2011. Competitive inhibition of transcription in the leaves and stems of Arabidopsis thaliana. Development 121, 2723– factors by small interfering peptides. Trends Plant Sci. 16, 541–549. 2735. Seoighe, C., Gehring, C., 2004. Genome duplication led to highly selective expansion Tanabe, Y., Hasebe, M., Sekimoto, H., Nishiyama, T., Kitani, M., Henschel, K., of the Arabidopsis thaliana proteome. Trends Genet. 20, 461–464. Munster, T., Theissen, G., Nozaki, H., Ito, M., 2005. Characterization of MADS- Sessa, G., Steindler, C., Morelli, G., Ruberti, I., 1998. The Arabidopsis Athb-8,-9 and - box genes in charophycean and its implication for the evolution of 14 genes are members of a small gene family coding for highly related HD-ZIP MADS-box genes. Proc. Natl. Acad. Sci. USA 102, 2436–2441. proteins. Plant Mol. Biol. 38, 609–622. Trant, C.A., Gensel, P.G., 1985. Branching in Psilophyton – a new species from the Siriwardana, N.S., Lamb, R.S., 2012. A conserved domain in the N-terminus is Lower Devonian of New-Brunswick, Canada. Am. J. Bot. 72, 1256–1273. important for LEAFY dimerization and function in Arabidopsis thaliana. Plant J. Wenkel, S., Emery, J., Hou, B.H., Evans, M.M.S., Barton, M.K., 2007. A feedback 71, 736–749. regulatory module formed by LITTLE ZIPPER and HD-ZIPIII genes. Plant Cell 19, Staudt, A.C., Wenkel, S., 2011. Regulation of protein function by ‘microProteins’. 3379–3390. EMBO Rep. 12, 35–42. Zalewski, C.S., Floyd, S.K., Furumizu, C., Sakakibara, K., Stevenson, D.W., Bowman, Swofford, D.L., 2003. PAUP⁄. Phylogenetic Analysis Using Parsimony (⁄ and Other J.L., 2013. Phylogeny of Class IV HD-Zip genes and the evolution of epidermal Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts. complexity in land plants. Mol. Biol. Evol. 30, 2347–2365. Talbert, P.B., Adler, H.T., Parks, D.W., Comai, L., 1995. The REVOLUTA gene is Zhang, J.Z., 2003. Evolution by gene duplication: an update. Trends Ecol. Evol. 18, necessary for apical meristem development and for limiting cell divisions 292–298.