Gene 493 (2012) 113–123


Gene

journal homepage: www.elsevier.com/locate/gene

Analysis of the formation of flower shapes in wild species and cultivars of tree peony using the MADS-box subfamily gene☆

Qingyan Shu a,⁎, Liangsheng Wang a, Jie Wu a,b, Hui Du a,b, Zheng'an Liu a, Hongxu Ren a,⁎⁎, Jingjing Zhang a,b

a Beijing Botanical Garden, Institute of Botany, Chinese Academy of Sciences, 20 Nanxin Cun, Xiangshan, Haidian District, Beijing 100093, PR China
b Graduate School of the Chinese Academy of Sciences, Beijing 100049, PR China

Article history:
Accepted 3 November 2011
Available online 21 November 2011
Edited by Meghan Jendrysik

Keywords: Tree peony; Stamen petaloid; Cis-acting regulatory element; Sliding window analysis; Flower shape

Abstract

Tree peony (Paeonia suffruticosa) cultivars have a unique character compared with wild species: stamen petalody increases the number of petal whorls and generates different flower forms, which are among the most important traits for cultivar classification. To investigate how petaloid stamens are formed, we obtained the coding sequence (666 bp) and the genomic DNA sequence of the PsTM6 gene (a member of the B subfamily of the MADS-box gene family) from 23 tree peony samples. The genomic DNA sequence consisted of six exons and five introns. Analysis of cis-acting regulatory elements in the third and fourth introns indicated that they were highly conserved in all samples. Partial putative amino acid sequences were analyzed, and the results suggested that functional differentiation of PsTM6 paralogs apparently affected stamen petalody and flower shape formation through amino acid substitutions that differ in polarity and electric charge. Sliding window analysis indicated that different regions of PsTM6 were subjected to different selection forces, especially in the K domain. This is the first attempt to investigate the genetic control of stamen petalody based on the PsTM6 sequence. It will provide a basis for understanding the evolution of PsTM6 and its function in determining stamen morphology in tree peony.

Crown Copyright © 2011 Published by Elsevier B.V. All rights reserved.

Abbreviations: MADS, acronym formed from MCM1 (budding yeast Saccharomyces cerevisiae), AGAMOUS (thale cress Arabidopsis thaliana), DEFICIENS (snapdragon Antirrhinum majus) and SRF (human Homo sapiens); ORF, open reading frame; cDNA, DNA complementary to RNA; RT-PCR, reverse transcription-polymerase chain reaction; RACE, rapid amplification of cDNA ends; UTR, untranslated region(s); dNTP, deoxyribonucleoside triphosphate.
☆ Flower shape analysis of tree peony by MADS-box subfamily gene.
⁎ Correspondence to: Q. Shu, Tel.: +86 10 62836655; fax: +86 10 62590348, +86 10 62836010.
⁎⁎ Correspondence to: H. Ren, Tel: +86 10 62836010; fax: +86 10 62836010.
E-mail addresses: [email protected] (Q. Shu), [email protected] (H. Ren).

1. Introduction

The genus Paeonia is divided into three sections: Moutan, Onaepia and Paeonia. The tree peony (Paeonia suffruticosa), in section Moutan, is the most primitive, followed by Onaepia, while section Paeonia is more recent and more evolved (Hong et al., 1998). It is deduced that the woody sect. Moutan was the first to be derived from the ancestor of Paeonia. According to the morphology of the floral disk, sect. Moutan can be divided into two subsections, namely subsect. Vaginatae (leathery flower disk) and subsect. Delavayanae (fleshy flower disk). The content of paeonol and its analogs in subsect. Vaginatae was lower than that in subsect. Delavayanae, which was taken to mean that the former is more evolved than the latter (Guo et al., 2008). Most plants of sect. Moutan are diploid subshrubs with numerous flowers of five carpels. Sect. Moutan contains 9 wild species, all of them native to China. The number of stamens varies widely among cultivars and species; some or all stamens can develop heteromorphically into petals, which has given rise to many kinds of cultivars. Stamen petalody produces a large variety of flower shapes such as lotus, crown, chrysanthemum, rose, globular, and crown-proliferation (ESM_Fig. 1), and flower shape is one of the most important bases for the classification of cultivars. In tree peony, the flower of the wild species has only 1–3 whorls of petals (the single form), whereas the cultivars have various numbers of petal whorls because of stamen petalody. Until now, there has been no attempt to uncover the underlying mechanism by exploring the genes responsible for stamen or petal development.

Great progress has been made in flower development genetics in model plants such as Arabidopsis, Antirrhinum and Petunia (Theissen et al., 2000). According to the ABCDE model of flower development, five classes of transcription factors are responsible for the identities of the four whorls of floral organs in a combinatorial manner. A- and E-class genes are responsible for the specification of sepals, A+B+E for petals, B+C+E for stamens, and C+E for carpels (Theissen et al., 2000; Melzer and Theißen, 2009). Among the five classes, the B-class genes, which determine petal and stamen identity, deserve particular mention. All class B genes known to date belong to the DEFICIENS (DEF)/GLOBOSA (GLO) subfamily, which has been studied extensively (Zahn et al., 2005). These studies indicated that two major gene duplication events took place within this subfamily; one predated the extant angiosperms and produced the DEF/AP3 (paleoAP3) and GLO/PI lineages.
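The combinatorial rule of the ABCDE model described above can be written down as a small lookup table. The sketch below is only an illustration of the four combinations listed in the text; the names `ORGAN_BY_CLASSES`, `organ_identity` and `organs_requiring` are hypothetical and do not come from the paper or from any software used in it.

```python
# Organ identity under the ABCDE model, as summarized in the text.
# Keys: sets of active gene classes; values: the floral organ they specify.
ORGAN_BY_CLASSES = {
    frozenset("AE"): "sepal",
    frozenset("ABE"): "petal",
    frozenset("BCE"): "stamen",
    frozenset("CE"): "carpel",
}


def organ_identity(active_classes):
    """Return the organ for a combination of classes, or None if the text lists none."""
    return ORGAN_BY_CLASSES.get(frozenset(active_classes))


def organs_requiring(gene_class):
    """List, alphabetically, the organs whose combination includes gene_class."""
    return sorted(
        organ for classes, organ in ORGAN_BY_CLASSES.items() if gene_class in classes
    )
```

Calling `organs_requiring("B")` returns the petal and the stamen, the two organs whose identity depends on B-class genes, which is why a B-class gene is a natural candidate when studying stamen petalody.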

0378-1119/$ – see front matter. Crown Copyright © 2011 Published by Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2011.11.008

Fig. 1. Sequence analysis of PsTM6. The MADS, K and C domains are underlined with double, single and wavy lines, respectively, as defined by Ma et al. (1991); the I domain lies between the MADS and K domains. The PI-motif-derived and paleoAP3 motifs are marked. The K1 (positions 87–108), K2 (positions 121–135) and K3 (positions 142–168) sub-domains of the K domain are shaded, and the conserved amino acid residues at different positions are in bold. The filled inverted triangles indicate the positions of introns. The asterisk indicates the stop codon.
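The sub-domain coordinates given in the caption can be used to look up which helix a residue falls in. The following is a minimal sketch, assuming that the positions are 1-based and inclusive and taking the numbers exactly as printed in the caption; the names `K_SUBDOMAINS`, `k_subdomain` and `subdomain_length` are hypothetical.

```python
# K sub-domain coordinates as printed in the Fig. 1 caption
# (assumed here to be 1-based, inclusive amino acid positions).
K_SUBDOMAINS = {
    "K1": (87, 108),
    "K2": (121, 135),
    "K3": (142, 168),
}


def k_subdomain(position):
    """Return the sub-domain containing a residue position.

    Returns None for inter-helical positions or positions outside the helices.
    """
    for name, (start, end) in K_SUBDOMAINS.items():
        if start <= position <= end:
            return name
    return None


def subdomain_length(name):
    """Number of residues spanned by a sub-domain (inclusive coordinates)."""
    start, end = K_SUBDOMAINS[name]
    return end - start + 1
```

For example, `k_subdomain(131)` places a residue at position 131 in K2, while `k_subdomain(110)` returns `None` because position 110 lies between the shaded helices.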

It is thought that a further duplication of paleoAP3 gave rise to two paralogous lineages, TM6 and euAP3, before the diversification of the major higher eudicot subclasses (Kramer et al., 1998). The first TM6 member to be discovered was tomato MADS box gene 6 (Pnueli et al., 1991). Two further TM6 members have been functionally characterized in petunia and tomato, and it has been suggested that after the euAP3/TM6 duplication, euAP3 genes acquired a role in petal development while TM6 came to control stamen development (Kramer et al., 2006). AP3/DEF belongs to the MIKC-type MADS-domain proteins, which consist of four domains: a MADS domain (M), an intervening region (I), a keratin-like domain (K) and a C-terminus (C). The keratin-like domain, located between the I and C domains, is involved in mediating specific protein/protein interactions (Zahn et al., 2005).

In this study, the coding sequence and genomic DNA sequence of the TM6 gene in tree peony were obtained and analyzed. Partial nucleotide and putative amino acid sequences were compared among 9 wild species and 14 cultivars to characterize functional differentiation, which helps in understanding the genetic control of flower shape formation. Comparison of the evolutionary rates of paralogous PsTM6 coding sequences might give some insight into functional evolution. Based on the exon sequences, a Neighbor-Joining tree of the wild species was constructed to illustrate the relationships among them. This is the first attempt to use a B-class MADS-box gene to uncover their phylogenetic relationships. It will provide basic knowledge of the mechanism of stamen petalody and the formation of flower shape in tree peony.

2. Materials and methods

2.1. Materials

Nine wild species of Paeonia sect. Moutan, namely P. jishanensis, P. ludlowii, P. ostii, P. decomposita, P. qiui, P. rockii, P. delavayi, P. potaninii, and P. lutea, and 14 cultivars with different flower shapes were collected from the Beijing Botanical Garden, Institute of Botany, the Chinese Academy of Sciences, and the Yuzhong Peace Peony Garden, Gansu province, China (Table 1). In cultivars with single flowers the stamens developed normally and completely, but in other cultivars the stamens showed heteromorphosis into petals and produced different flower shapes: either part of the stamens became petaloid (‘Taiyō’, ESM_Fig. 1B), or all stamens became petaloid and even disappeared (‘Hong lou cang jiao’, ESM_Fig. 1E). Based on the degree of stamen petalody, flower shape was classified into 7 types: single form (‘Feng dan’, ESM_Fig. 1A; ‘Dian dan er hao’), lotus form (‘Shu hua zi’), chrysanthemum form (‘He ping hong’, ‘Huai nian’, ‘Qing luo’ and ‘Taiyō’), rose form (‘Luo yang hong’, ESM_Fig. 1C), crown form (‘Yao huang’, ESM_Fig. 1D; ‘Lan tian yu’, ‘Lan xian nv’), globular form (‘Hong lou cang jiao’) and crown proliferation form (‘Hong hua lou shuang’) (Wang, 1997). All wild species have the single flower form; some of them, namely P. ludlowii (ESM_Fig. 1F), P. decomposita (ESM_Fig. 1G), P. delavayi (ESM_Fig. 1H) and P. rockii (ESM_Fig. 1I), are illustrated.

Table 1
Tree peony (Paeonia) samples examined in this study.

Common and Latin names    Abbreviation    Flower type            Stamen petaloidy    Subsection
‘He ping hong’            HPH             Chrysanthemum          Partial             -
‘Hong lou cang jiao’      HLCJ            Globular               Partial             -
‘Huai nian’               HN              Chrysanthemum          Partial             -
‘Lan xian nv’             LXN             Crown                  Complete            -
‘Yao huang’               YH              Crown                  Complete            -
‘Shu hua zi’              SHZ             Lotus                  Partial             -
‘Qing luo’                QL              Chrysanthemum          Partial             -
‘Lan tian yu’             LTY             Crown                  Complete            -
‘Luo yang hong’           LYH             Rose                   Partial             -
‘Dian dan er hao’         DDRH            Single                 No                  -
‘Dian dan san hao’        DDSH            Single                 No                  -
‘Hong hua lou shuang’     HHLS            Crown-proliferation    Complete            -
‘Taiyō’                   TaiyO           Chrysanthemum          Partial             -
‘Feng dan’                FD              Single                 No                  -
P. jishanensis            -               Single                 No                  Vaginatae
P. ludlowii               -               Single                 No                  Delavayanae
P. ostii                  -               Single                 No                  Vaginatae
P. decomposita            -               Single                 No                  Vaginatae
P. qiui                   -               Single                 No                  Delavayanae
P. delavayi               -               Single                 No                  Delavayanae
P. potaninii              -               Single                 No                  Delavayanae
P. rockii                 -               Single                 No                  Vaginatae
P. lutea                  -               Single                 No                  Delavayanae

2.2. Open reading frame (ORF) cloning

Total RNA was isolated following the instructions of the RNAtrip kit (Beijing Applygene Co. Ltd, China). The primers used for reverse transcription (RT) and PCR were as follows: PTA (for RT): 5′-CCGGATCCTCTAGAGCGGCCGC(T)17-3′; AP: 5′-CCGGATCCTCTAGAGCGGCCGC-3′. The forward primer used for 3′ RACE (rapid amplification of cDNA ends) was 5′-MTCAAAACCCAGACTGRTACCTACA-3′. The 5′ end sequence was then obtained using the primers M1 sense, 5′-ATGGSTMGWGGRAAGATYGAGAT-3′, and Anti, 5′-KSTGTAGGTAYCAGTCTGGGTTTTG-3′.

PCR was performed in a final volume of 20 μl containing 2 μl of 10× PCR reaction buffer (Mg2+ Plus), 2.5 μl of dNTPs (dATP, dTTP, dCTP and dGTP, 2.5 mM each), 2 μl of forward primer (5 μM), 2 μl of reverse primer (5 μM), 1 μl of template DNA (90 ng/μl), 10.3 μl of MQW and 0.2 μl of LA Taq DNA polymerase (Takara Biotechnology (Dalian) Co., LTD.). The reaction consisted of 1 cycle of 5 min at 94 °C for denaturation, followed by 35 cycles of 1 min at 94 °C, 1 min at 55 °C and 1 min at 72 °C, and then 1 cycle of 10 min at 72 °C for final extension.

2.3. Genomic DNA isolation, PCR reaction and sequencing

A modified version of the cetyltrimethyl ammonium bromide (CTAB) method was used to extract genomic DNA (Han et al., 2008). The primers for PCR amplification were as follows: forward primer: 5′-ATGGSTMGWGGRAAGATYGAGAT-3′; reverse primers: R1: 5′-AGCATAAAAGAAGAAGAGGGTAAAT-3′ (genomic DNA sequence for ‘Huai nian’); R2: 5′-KSTGTAGGTAYCAGTCTGGGTTTTG-3′ (partial genomic DNA sequences for 23 samples). The PCR reaction mixture (20 μl) was composed of 90 ng genomic DNA, 200 μM dNTPs, 2.5 mM MgCl2, 0.3 μM primers, 10× buffer, and 1 U Taq DNA polymerase (Transgen Biotech Ltd Co., Beijing). The amplification was carried out in an Eppendorf Mastercycler Gradient (Type 5331, Eppendorf AG, Hamburg, Germany) using the following program: denaturing at 94 °C for 5 min; 40 cycles of denaturing at 94 °C for 30 s, annealing at 55 °C for 30 s, and elongation at 72 °C for 3 min; followed by final elongation at 72 °C for 7 min. The target fragments were cloned into the pEASY-T3 vector (Transgen Biotech Ltd Co., Beijing) by TA cloning. Sequencing was conducted in both directions (T7, SP6).

2.4. Gene structure and cis-acting regulatory elements analysis

Gene structure, including the position and number of introns and exons, was analyzed by comparing the mRNA sequence and the genomic DNA sequence of PsTM6. The cis-regulatory elements of the third and fourth introns were also analyzed among the 23 samples using a publicly available database, the database of Plant Cis-acting Regulatory DNA Elements (PLACE, http://www.dna.affrc.go.jp/PLACE/signalscan.html) (Higo et al., 1999).

2.5. Simple sequence repeats analysis

Simple sequence repeats were analyzed using SSR Hunter (Li and Wan, 2005). The parameters for the minimal number of repeats and types were set to 5. The flanking sequence of the SSRs, about 300 bp in length, was used for further analysis.

2.6. Putative amino acid analysis

The partial coding sequences were edited from the genomic DNA sequences, and the putative amino acid sequences were obtained using DNAman 2.0. All putative amino acid sequences from all samples were then aligned with Clustal X version 1.81 (Thompson et al. 1997) and used for comparisons, such as amino acid substitutions involving changes of charge, hydrophobicity or size, and for phylogenetic construction.

2.7. GC content, Effective Number of Codons (ENC), nonsynonymous nucleotide substitution/synonymous substitution (Ka/Ks) and sliding window analysis

The partial exons of the 23 samples were aligned with Clustal X version 1.81 (Thompson et al. 1997) and Bioedit Version 7.0.5 (Hall, 1999), applying the default parameters for gap opening and gap extension. GC content, ENC, Ka/Ks and sliding window analyses were carried out using DnaSP V5.0 (Librado and Rozas, 2009). For the Ka/Ks analysis, sequences from wild species were used as the inter-group and those from cultivars as the intra-group. The sliding window analysis was carried out with a window length of 30 bp and a step size of 10 bp.

2.8. Phylogenetic analysis

Sequences, including genomic DNA or coding sequences and putative amino acid sequences, were aligned using Clustal X version 1.81 and refined manually. The sequence file was then formatted using Bioedit Version 7.0.5 by filling gaps as single characters in the data matrix (Hall, 1999). In the analysis, gaps were treated as single events, thus preventing their overweighting (based on gap length) in the subsequent phylogenetic analysis. The Neighbor-Joining method was used for phylogenetic tree construction (MEGA 4.1) with the following options: complete deletion (gaps/missing data), maximum composite likelihood (model), and substitutions including transitions and transversions. A bootstrap test with 1000 replicates followed tree construction (Tamura et al., 2007). Bootstrap values below 50 are not indicated in the tree.

3. Results and discussion

3.2. The structure of PsTM6 genomic sequence

There are 5 introns in total in the genomic DNA sequence of PsTM6 (Fig. 1). The intron-exon boundaries are bordered by the 5′-GT and AG-3′ consensus sequences. As in other plants, the AT content was significantly higher in the introns than in the exons. PsTM6 has one intron fewer than the corresponding gene in most angiosperms, which is characterized by six introns (Janssens et al., 2007). The first intron, about 84 bp in length and lying between the sequences encoding the MADS and I domains, was located at the fifth amino acid of the I domain. The second, 81 bp in length, interrupted the sequence encoding the second-last amino acid of the I domain. In Arabidopsis, the most favorable intron length is between 80 and 90 bp (Hebsgaard et al., 1996). Goodall and Filipowicz (1990) hypothesized a minimum intron length of 70–73 nt in dicots, whereas Filipowicz et al. (1994) postulated a minimum of only 64 nt. The third intron, 94 bp long, separated the sequences encoding amino acids E, I, K and Q, R, M, G within the K domain. The fourth intron, the longest at about 827 bp, lay between the sequences encoding amino acids R, E, R, K and Y, H, K, L within the K domain. The fifth intron was about 74 bp long and located at the boundary of the K and C domains. It can be assumed that the natural mutational bias towards deletions has been countered by strong selection in order to preserve the minimum intron length that is necessary for intron splicing (Comeron, 1999). HmAP3 (Hydrangea macrophylla) has 5 introns in the sequence encoding the K domain (Geuten et al., 2006), whereas PsTM6 has 3 introns in total inside the sequence for the K domain; the position and length of the introns in PsTM6 also differed from those of HmAP3. Introns 4 and 5 in the K domain of AP3/DEF contain useful phylogenetic information, especially in IMPDEF1 and IMPDEF2, with a higher percentage of variable sites similar to the chloroplast atpB-rbcL spacer (Janssens et al., 2007). Whether these introns are important for gene expression remains to be characterized.
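The comparison between the reported intron lengths and the minimum lengths cited above can be checked with a few lines of code. This is only a sketch: the lengths are the values stated in the text (several of them approximate), the thresholds are the cited 64 nt and the upper end of the 70–73 nt range, and the names `INTRON_LENGTHS_BP`, `introns_at_least` and `introns_in_range` are hypothetical.

```python
# Intron lengths (bp) of PsTM6 as reported in the text; several are approximate.
INTRON_LENGTHS_BP = {1: 84, 2: 81, 3: 94, 4: 827, 5: 74}

# Minimum intron lengths (nt) cited in the text.
MIN_LENGTH_GOODALL_FILIPOWICZ_1990 = 73  # upper end of the 70-73 nt range
MIN_LENGTH_FILIPOWICZ_1994 = 64


def introns_at_least(min_length):
    """Return the numbers of the introns whose length is >= min_length."""
    return sorted(n for n, length in INTRON_LENGTHS_BP.items() if length >= min_length)


def introns_in_range(low, high):
    """Return the numbers of the introns whose length lies in [low, high]."""
    return sorted(n for n, length in INTRON_LENGTHS_BP.items() if low <= length <= high)
```

With these values, all five introns meet even the stricter 73 nt minimum, and only the first two fall within the 80–90 bp range described as most favorable in Arabidopsis.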

3.1. ORF feature of PsTM6 (TM6 of P. suffruticosa)

The cDNA of PsTM6 was isolated. Sequence analysis showed that the open reading frame (ORF) of PsTM6 was 666 bp long and encoded a protein with a putative length of 222 amino acids, comprising 57 amino acids of the MADS domain, 29 amino acids of the I (intervening) domain, 83 amino acids of the K (keratin-like) domain and 53 amino acids of the C (C-terminal) domain (Fig. 1). The representative glycine residue (G-110, shown in bold in Fig. 1) supported its assignment to the MIKC-type proteins. According to previous research, TM6 has a conserved PI-motif-derived motif (FxFRLQPSQPNLH) and a paleoAP3 motif (YGxHDLRLA) (Kramer et al., 1998); PsTM6 likewise contained the two conserved motifs, as YAFRLHSSHH (PI-motif-derived) and YMMCTSLES (paleoAP3 motif). It has been postulated that the K domain folds into 3 amphipathic α-helices, referred to as K1, K2 and K3, separated by inter-helical regions, each conferring different dimerization specificities. In the K domain of PsTM6, the first 20 amino acids (positions 87 to 108) form the amphipathic α-helix K1, with characteristic residues M-87, L-91, N-98, L-101, R-102, I-105 and R-108, which is thought to be required for PI/AP3 interaction (Yang and Jack, 2004). The second conserved helix, K2 (positions 121–135), has conserved residues at L-121, L-124, S-131 and I-135 and is thought to be critical for the strength of the protein-protein interactions PI/SEP3 (or PI/SEP1) and AP3/PI (Yang and Jack, 2004). Positions 109–120 comprise the K1–K2 inter-helical region, with G-110, L-113 and L-116; this region, together with the K2 sub-domain, contributes to the AP3/PI interaction. The K2–K3 inter-helical region (positions 136–142) contains several charged amino acids. The K3 sub-domain consists of 26 amino acids (positions 143–168) with conserved residues such as T-147, Y-150 and L-164.

3.3. SSRs and cis-acting regulatory analysis for the third intron and its divergence among all samples

The K domain is important for the function of the protein as a transcription factor, but its coding sequence is interrupted by two introns (the third and fourth), which may suggest the importance of these introns. There was one simple sequence repeat (SSR) in the third intron, with the repeat unit (CT)n. This intron was submitted to the PLACE database for cis-acting regulatory element analysis. In total there were 23 cis-acting regulatory elements (ESM_Table 3), including CTRMCAMV35S with the motif TCTCTCTCT, which has also been found in the promoter of the DEFICIENS (DEF) gene of the B-class MADS-box genes (De Bodt et al., 2006). A growing body of evidence indicates that non-coding microsatellites in plants are involved in gene regulation. In Arabidopsis, CT/GA and CTT/GAA repeats were found to be abundant in the 5′-flanks, suggesting that they can potentially function as factors regulating gene expression (De Bodt et al., 2006). The (CT)n unit includes TCTCtCT sequences similar to the TCCC motif known as the conserved DNA module array AtpCD-CMA, which is involved in light responsiveness (Morgante et al., 2002). (CT)n might act as an enhancer, because the motif TCTCTCTCT is found in a 60-nt region downstream of the transcription start site of the CaMV 35S RNA, which can enhance gene translation in plant protoplasts (Arguello-Astorga and Herrera-Estrella, 1996). As the complementary sequence to (CT)n, (GA)n serves as a regulatory element with functions similar to those of the GAG motif (AGAGAGa), which is involved in light regulation (Arguello-Astorga and Herrera-Estrella, 1996). The (CT)n SSR in the third intron may therefore be one of the important cis-acting regulatory elements responsible for PsTM6 gene regulation.

Microsatellites generally evolve rapidly, but about 10% of noncoding CT/GA and CTT/GAA repeats are ancient and highly conserved in occurrence, which may be explained by functional constraint, so that many homologous genes have corresponding microsatellite sequences in their regulatory regions (Pauli et al., 2004). Most microsatellites of the CT/GA and CTT/GAA types seem to have originated through recent mutations under positive selection (Morgante et al., 2002). The reasons for positive selection on some repeat occurrences are still unknown, but such repeats may provide opportunities for rapid adaptive changes in these regulatory regions or play specific roles in gene regulation (Zhang et al. 2006).

Although several studies have shown that introns of low-copy nuclear genes tend to diverge at a high rate for both nucleotide and indel substitutions (Sang, 2002), this did not seem to be the case for introns 3 and 4 in PsTM6. The length of the third intron remained rather constant across samples, ranging from 89 bp to 113 bp (Fig. 2). The difference in length was mostly due to different numbers of repeats in the (CT)n tt (CT)n region. In ‘Feng dan’, ‘Luo yang hong’, ‘Taiyō’, ‘Yao huang’ and P. ostii, only the first (CT)n repeat was present; the deletion of thymine (T) and the loss of the second (CT)n made their third intron shorter than that of the other samples (ESM_File 4). At the left boundary of the third intron, fifteen nucleotides (the left flank) were the same in all samples. The sixteenth nucleotide was cytosine (C) in most samples, except that in ‘Huai nian’, ‘Yao huang’ and P. rockii, C was substituted by thymine (T). Within the (CT)n, one T was substituted by A in ‘Luo yang hong’ and ‘Yao huang’, which interrupted the repeat at this site. The right boundary was conserved in the same way: the same 48 nucleotides made up the right flank in all species and cultivars. Between the two boundaries, the sequences varied among samples (ESM_File 4). In view of the importance of selective splicing, the conserved sequences at the boundaries might play an important role. Some recent studies have focused on the role of simple sequence repeats (SSRs) in cells; studies on the mechanism of diseases caused by excessive expansions of triplet repeats in intronic or untranslated regions (UTRs) have given new insight into the functional roles of microsatellites in transcription or translation (Morgante et al., 2002). The pyrimidine-rich microsatellite (CT)n found in the third intron has also been found in rice (Oryza sativa ssp. japonica cv. Nipponbare) and Arabidopsis thaliana (Mount et al., 1992). It is known that increasing the length of (CT)n in the promoter region leads to activation of the promoter, while substitution of (CT)n with purines decreases the activity. Whether different numbers of (CT) repeats regulate PsTM6 gene transcription and translation, and how they affect stamen petalody, remains to be characterized.

Fig. 2. The length variation of the third intron among 23 tree peony samples. The Y-axis indicates the length (bp) of the intron; the X-axis indicates samples with different intron lengths.

Table 2
Cis-acting regulatory element analysis on the third intron of PsTM6 of tree peony using the PLACE database.

Factor or site name     Loc. (Str.)    Signal sequence    Site #
ARFAT                   26 (+)         TGTCTC             S000270
ARR1AT                  2 (−)          NGATT              S000454
ARR1AT                  75 (−)         NGATT              S000454
CTRMCAMV35S             16 (+)         TCTCTCTCT          S000460
CTRMCAMV35S             18 (+)         TCTCTCTCT          S000460
DOFCOREZM               33 (−)         AAAG               S000265
DOFCOREZM               41 (−)         AAAG               S000265
EBOXBNNAPA              5 (+)          CANNTG             S000144
EBOXBNNAPA              5 (−)          CANNTG             S000144
GT1CONSENSUS            34 (−)         GRWAAW             S000198
GT1CONSENSUS            35 (−)         GRWAAW             S000198
GT1GMSCAM4              34 (−)         GAAAAA             S000453
MYB1LEPR                63 (+)         GTTAGTT            S000443
MYBCORE                 46 (+)         CNGTTR             S000176
MYCCONSENSUSAT          5 (+)          CANNTG             S000407
MYCCONSENSUSAT          5 (−)          CANNTG             S000407
NODCON2GM               31 (+)         CTCTT              S000462
OSE2ROOTNODULE          31 (+)         CTCTT              S000468
POLASIG2                70 (+)         AATTAAA            S000081
RAV1AAT                 47 (−)         CAACA              S000314
SEBFCONSSTPR10A         25 (+)         YTGTCWC            S000391
SURECOREATSULTR11       27 (−)         GAGAC              S000499

Apart from the length difference, there were in total 6 to 7 motifs in the third intron, including CTRMCAMV35S, EBOXBNNAPA, MYCCONSENSUSAT, NODCON2GM, OSE2ROOTNODULE, POLASIG2 and PYRIMIDINEBOXHVEPB1 (Table 2). For the CTRMCAMV35S motif, the number of repeats fell into three classes among the wild species, namely (CT)8 (P. jishanensis, P. decomposita and P. lutea), (CT)9 (P. ludlowii and P. qiui) and (CT)10 (P. potaninii, P. delavayi and P. rockii). It has been suggested that (TC)n may enhance gene expression (Pauli et al., 2004), so different numbers of this motif may contribute to the expression level of the PsTM6 gene, which is important for petal and stamen identity. P. ostii, having no T insertion after (CT)n, lacked the motif PYRIMIDINEBOXHVEPB1 (TTTTTTCC), as did the cultivars ‘Yao huang’, ‘Feng dan’, ‘Hong lou cang jiao’, ‘Hong hua lou shuang’, ‘Luo yang hong’ and ‘Taiyō’. Because of the short intron in these cultivars and P. ostii, they had one fewer NODCON2GM and OSE2ROOTNODULE motif than the other samples. Whether the T6 insertion is an ancient character or arose during later adaptation, and what function its deletion or addition has, needs further study.

3.4. Feature for the fourth intron and its divergence among 23 samples

The fourth intron was subjected to cis-acting regulatory element analysis. The result indicated that there were many cis-acting regulatory motifs within the fourth intron. WUSATAG elements were found; in Arabidopsis, the WUSCHEL homeodomain proteins have been shown to bind to WUSATAG motifs and regulate the formation and maintenance of the shoot and root apical meristems, and also to regulate the specific expression of the AGAMOUS gene in petals and stamens (Kamiya et al., 2003). Another important motif, the CCAAT box (CCAAT), was identified; it is essential for the specific expression of AGAMOUS in the mature carpel and stamen, and lack of this box prevents or decreases its expression (Kamiya et al., 2003). Other prominent motifs recognized included ARR1AT (NGATT), CAATBOX1 (CAAT), TATABOX (TATAAAT, TTATTT), TAAAGSTKST1 (TAAAG), WBOXHVISO1 (TGACT), WRKY71OS (TGAC), ROOTMOTIFTAPOX1 (ATATT), SEF1MOTIF (ATATTTAWW), SEF4MOTIFGM7S (RTTTTTR), POLLEN1LELAT52 (AGAAA), MYCCONSENSUSAT (CANNTG) and CACTFTPPCA1 (YACT). Previously identified putative stress-responsive cis-acting regulatory elements were also present in the fourth intron, including the ABA-responsive EBOXBNNAPA and SEF4MOTIFGM7S, the light-regulated IBOXCORE and INRNTPSADB, the dehydration-responsive element MYBCORE, the cold-responsive LTRECOREATCOR15 and the defense-responsive WBOXATNPR1 (ESM_Table1). SEF1MOTIF, previously characterized as being involved in young tissue development, was found; the intron also contained the pollen-specific cis-acting regulatory element POLLEN1LELAT52 (AGAAA), an important regulatory motif controlling anther and pollen development (Filichkin et al., 2004). MYBST1 is the binding site motif for a MYB-related protein acting as a transcriptional activator (Baranowskij et al., 1994). Among these cis-acting regulatory elements, the most abundant in the fourth intron was CACTFTPPCA1, followed by CAATBOX1, ARR1AT, GATABOX, DOFCOREZM, EBOXBNNAPA and MYCCONSENSUSAT. Less frequent elements were ANAERO1CONSENSUS, CBFHV, DRE2COREZMRAB17, EECCRCAH1, GT1CORE, LTREATLTI78, LTRECOREATCOR15, MYB2CONSENSUSAT, PYRIMIDINEBOXOSRAMY1A and XYLAT. Most of these cis-acting regulatory elements are present in plant promoter sequences, where they are responsible for the regulation of gene expression. The exact functions of these elements still need to be characterized.

The fourth intron was 344 bp in length, which appeared to be a conserved character in most wild species and cultivars, although introns are known to accumulate more mutations than exons. In ‘Shu hua zi’ and ‘He ping hong’, the fourth intron was 350 bp long. In ‘Huai nian’ and P. rockii, the fourth intron was 693 bp and 658 bp long, respectively. During the last decade, several studies have tried to explain the importance of intron length divergence (Janssens et al., 2007). Mount et al. (1992) emphasized the necessity of multiple compensatory mutations to convert a short intron into a long one. It has been demonstrated that short introns are mostly favored in highly expressed genes, e.g., genes encoding ribosomal proteins (Castillo-Davis et al., 2002), because the costs of transcription and of other molecular processes such as splicing can then be reduced. In addition, intron lengthening may increase secondary structures within the pre-mRNA sequence, which generate alternative splicing sites and may affect the original splicing at the intron-exon boundaries (Janssens et al., 2007). Possibly, the enlargement of intron 4 in PsTM6 resulted in an increase of alternative splicing sites within a specific region.

The conserved character of intron 4 could also be seen in the cis-acting regulatory elements; in total 27 motifs were found (ESM_Table2). The number and sites of the motifs in ‘Shu hua zi’ were similar to those in most samples with a 344 bp intron. In ‘He ping hong’, most motifs were conserved relative to most samples, except that it had one more SITEIIATCYTC motif (TGGGCY) and one additional SORLIP2AT motif (GGGCC) (ESM_Table2). Among most samples with the 344 bp intron, there was little difference in the 27 conserved motifs. However, among ‘Hong lou cang jiao’, ‘Hong hua lou shuang’, ‘Luo yang hong’ and ‘Dian dan san hao’, the first two each gained one motif, namely TATAPVTRNALEU (TTTATATA); the motif WRKY71OS (TGAC) was lacking in ‘Hong lou cang jiao’ and ‘Dian dan san hao’; one more motif was found in ‘Hong hua lou shuang’; and an additional SURECOREATSULTR11 motif (GAGAC) was found in ‘Luo yang hong’ besides the 27 conserved motifs.

The cis-acting regulatory elements of the fourth intron were notably conserved, for example CIACADIANLELHC (CAANNNNATC), DOFCOREZM (AAAG) and NTBBF1ARROLB (ACTTTA), which have been found in Apetala3 promoters (Koch et al., 2001). This may suggest a conserved function in the expression of this gene; a similar pattern was confirmed in genes encoding interleukins, in which conserved noncoding sequences were reliable guides to regulatory elements. In phylogenetic footprinting, regions that are conserved between orthologous regulatory sequences are thought to be functionally important (De Bodt et al., 2006), since these non-coding sequences have survived selection for a longer period of time.

A growing number of studies indicate that SNPs inside introns are important for gene expression and function (Ingvarsson et al., 2006). Although the fourth intron had a conserved length, there were some nucleotide substitutions such as T/C, C/G, T/A and A/T, which may provide a good source of single nucleotide polymorphisms (SNPs) in future.

3.5. GC content and ENC (effective number of codon) analysis

The average GC content in wild species and cultivars was 40%, based on a coding region 453 bp long (ESM_Table4). The total GC content was divided into three parts, corresponding to the first, second and third codon positions; the average GC content at the third codon position in wild species and cultivars was 47.4% and 47.5%, respectively (ESM_Table4). It has been confirmed that the GC content of the third codon position (GC3) is closely related to ENC. The ENC value normally ranges from 20 to 61; the lower the ENC value, the stronger the codon usage bias and the higher the expression of the gene (Ingvarsson et al., 2006). The ENC value in wild species and cultivars was 58.2 and 58.3, respectively. It ranged from 55.0 (P. rockii, P. jishanensis, ‘Yao huang’ and ‘Lan tian yu’) to 61.0 (P. decomposita, ‘Qing luo’ and ‘Shu hua zi’) (ESM_Table4). The codon bias was weak for the analyzed region, which indicated that synonymous mutation was less affected. The PsTM6 gene encodes a transcription factor of the MADS-box family that is responsible for petal and stamen identity, so its expression is strictly controlled, as deduced from its ENC value.

3.6. Ka/Ks analysis

Four hundred and fifty-three nucleotides of the ORF were analyzed using the 9 wild species as the inter-group and the 14 cultivars as the intra-group. The numbers of synonymous and nonsynonymous sites were 94 and 359, respectively; the synonymous nucleotide diversity (Pi(s)) was 0.0321, the nonsynonymous nucleotide diversity (Pi(a)) was 0.0125, and the total Pi(a)/Pi(s) ratio was 0.383. The synonymous nucleotide divergence (Ks) and nonsynonymous nucleotide divergence (Ka) were 0.0315 and 0.013, respectively (ESM_File 2). Sliding window analysis indicated that different regions of PsTM6 were affected by different selection forces: the Ka/Ks ratio was more than 1.0 in the regions 291–430 bp, 301–350 bp, 311–360 bp, 331–380 bp, 351–400 bp and 361–410 bp, based on a window length of 50 bp and a step size of 10 bp (Fig. 3). This indicated that in this region the excess accumulation of nonsynonymous substitutions is an important force for adaptive evolution of the PsTM6 gene, enhancing the adaptation of individuals carrying useful nonsynonymous substitutions. Useful nonsynonymous substitutions spread within populations, which indicated that this region of the gene was affected by positive selection; accordingly, the putative amino acids of part of the K domain were also influenced by positive selection. The accumulation of nonsynonymous substitutions in the K domain may explain the functional diversification of PsTM6, which is important for dimerization between MADS-box proteins, especially in the sub-domain with three strings

Fig. 3. Sliding window analysis of the ratio of nonsynonymous to synonymous nucleotide substitution rates (Ka/Ks), carried out using DnaSP V5.0 with a window length of 30 bp and a step size of 10 bp. Sequences of PsTM6 (454 bp in length) from 9 wild species were used as the inter-group, and those from cultivars as the intra-group. The green curve indicates the Ka/Ks ratio; the red curve indicates the ratio of DNA polymorphism at nonsynonymous sites to that at synonymous sites (Pi(a)/Pi(s)).
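The windowing scheme behind such a sliding window analysis can be illustrated with a short sketch. The code below is not the DnaSP procedure and does not compute Ka/Ks; as a simplifying assumption it computes plain nucleotide diversity (the mean proportion of differing sites over all sequence pairs) in each window. The function names `window_starts`, `pairwise_diversity` and `sliding_pi` are hypothetical, and the sequences are assumed to be aligned and of equal length.

```python
from itertools import combinations


def window_starts(length, window, step):
    """Return the 0-based start index of every full window."""
    return range(0, length - window + 1, step)


def pairwise_diversity(seqs):
    """Mean proportion of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    n_sites = len(seqs[0])
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * n_sites)


def sliding_pi(seqs, window=50, step=10):
    """Return (start, end, pi) per window, with 1-based inclusive coordinates."""
    length = len(seqs[0])
    result = []
    for start in window_starts(length, window, step):
        chunk = [s[start:start + window] for s in seqs]
        result.append((start + 1, start + window, pairwise_diversity(chunk)))
    return result
```

With a 453 bp alignment, a window of 50 bp and a step of 10 bp, this scheme yields windows 1–50, 11–60 and so on, which matches the form of the window coordinates reported in the text (e.g., 301–350 bp).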

of heptad repeats (abcdefg)n. Previous studies indicate that the first region (30 amino acids) of the K domain in class A genes (FBP29, PFG) and E genes (FBP4 and FBP23; FBP5 and FBP23) might show functional differentiation according to sliding window analysis (Nam et al., 2005), which is similar to our results in this study. The I and K domains are important for homo- and hetero-dimerization of MADS-box proteins, whereas the C domain is involved in transcriptional activation, which suggests that all three domains are possibly involved in functional differentiation (Nam et al., 2005). However, in this study the Ka/Ks ratio was constant and low in the I domain, and because the sequence coding for the C domain was lacking, it was hard to judge its rate. In contrast, in the regions 140–170 bp and 220–280 bp the Ka/Ks ratio was almost zero, which demonstrates the functional importance of these two regions and the intensification of functional constraints on them. Pi, namely π, represents nucleotide diversity, and Pi(a):Pi(s) is equal to nonsynonymous divergence:synonymous divergence. Pi(a):Pi(s) < 0, Pi(a):Pi(s) = 0 and Pi(a):Pi(s) > 0 mean that the gene is affected by negative selection, neutral selection and positive selection, respectively. Over the whole region analyzed (Fig. 3), the Pi(a):Pi(s) ratio was higher than zero in the region 1–170 bp, which encodes the MADS-box domain. This suggests that this region of PsTM6, which is essential for DNA-binding ability, might be influenced by positive selection; the accumulation of positive selection may contribute to its function as a transcription factor controlling its downstream genes for floral organ identity. Unexpectedly, from 180 to 450 bp the Pi(a):Pi(s) ratio was 0, which indicates that this region was subjected to neutral selection. In conclusion, within the coding sequence of 454 bp, different parts were affected by different selection forces, which may contribute to functional diversification in controlling petal and stamen identity and therefore to the formation of various flower shapes in tree peony.

Based on the sequence of P. jishanensis, the Ka and Ks values between two samples were also analyzed site by site using K-estimator (Comeron, 1999), with a window size of 30 bp and a step size of 9 bp (Fig. 4). Among the nine wild species, the Ka/Ks value was highest in P. ostii; in contrast, the cultivar ‘Feng dan’ had the lowest Ka/Ks value. Since P. ostii is the ancestor of ‘Feng dan’, some synonymous substitutions of P. ostii may have been fixed and stably transferred to its offspring, which may contribute to its adaptation and endurance under artificial selection.

Fig. 4. Comparison of the Ka/Ks ratio between two samples, calculated site by site using K-estimator (Comeron, 1999) with a window size of 30 bp and a step size of 9 bp. Ka is the number of nonsynonymous substitutions per nonsynonymous site; Ks is the number of synonymous substitutions per synonymous site. The coding sequence from P. jishanensis was used as the reference sequence. The series with triangles indicates wild species; 1–8 represent P. ludlowii, P. lutea, P. qiui, P. decomposita, P. potaninii, P. ostii, P. delavayi, and P. rockii. The series with diamonds indicates cultivars; 1–14 are as follows: ‘Huai nian’, ‘Dian dan er hao’, ‘Dian dan san hao’, ‘Feng dan’, ‘He ping hong’, ‘Hong hua lou shuang’, ‘Hong lou cang jiao’, ‘Lan tian yu’, ‘Lan xian nv’, ‘Luo yang hong’, ‘Qing luo’, ‘Shu hua zi’, ‘Taiyō’, and ‘Yao huang’.

3.7. Analysis on part of putative amino acids of PsTM6 in all samples

The coding sequence was obtained by excluding introns from the genomic DNA sequence and was translated into 151 putative amino acids, including all amino acids of the MADS, I and K domains. The coding sequence was highly conserved, since exons are generally considered to be under strong purifying selection; the substitution rate within exons is therefore restricted in order to avoid deleterious mutations. All putative amino acids were aligned (Fig. 5), which illustrated low variation among all samples. There were in total 24 amino acid substitutions (Fig. 5). The MADS domain consisted of fifty-seven putative amino acids, among which 13 sites showed amino acid substitutions. At the second site, alanine (A-2) and glycine (G-2) occurred with equal frequency; the former is non-polar, whereas the latter is polar. G occupied the second site in all wild species from subsect. Delavayanae. At the third site, most samples had the positively charged and polar amino acid arginine (R-3), except that it was substituted by the neutral serine (S) in ‘Hong hua lou shuang’, P. rockii, P. delavayi and P. potaninii. At the fifth and seventh sites, the neutral asparagine (N-5) and the negatively charged aspartic acid (D-7) replaced the positively charged hydrophilic lysine (K-5) and glutamic acid (E-7), respectively, in P. lutea. N-9 in ‘Huai nian’ and R-10 in ‘He ping hong’ differed from K-9 and K-10 in the other samples, which had positively charged amino acids at both sites; the substitution at site 10 did not change

Fig. 5. The coding sequences (453 bp in length) from 23 tree peony samples were translated into 151 putative amino acids and aligned. Amino acid substitutions were analyzed and are marked with arrows. The numbers above the sequence give the amino acid positions.
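The translation and site-by-site comparison summarized in Fig. 5 can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the study: the standard genetic code is assumed, the helper names `translate`, `variable_sites` and `charge_changed` are hypothetical, and the charge classes are simplified (K and R positive, D and E negative, all other residues, including histidine, treated as neutral).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Simplified charge classes (assumption): residues not listed are neutral.
CHARGE = {"K": "+", "R": "+", "D": "-", "E": "-"}


def translate(cds):
    """Translate a coding sequence; unknown codons become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


def variable_sites(proteins):
    """Map each 1-based variable position to {sample name: residue}."""
    length = min(len(p) for p in proteins.values())
    sites = {}
    for pos in range(length):
        residues = {name: seq[pos] for name, seq in proteins.items()}
        if len(set(residues.values())) > 1:
            sites[pos + 1] = residues
    return sites


def charge_changed(a, b):
    """True if two residues fall into different simplified charge classes."""
    return CHARGE.get(a, "0") != CHARGE.get(b, "0")
```

Under these simplified classes, a replacement such as R by S counts as a charge change, whereas K by R or I by V does not, which is the kind of distinction drawn in the text for sites 3, 10 and 37.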

the amino acid charge. The substitution of N-13 by the acidic amino acid D-13 in P. ludlowii distinguished it from most samples. One exception existed at site 15, where the uncharged polar threonine (T-15) was substituted by the non-polar isoleucine (I-15) in P. lutea and P. decomposita. Site 25 was unique in P. lutea, where it was occupied by leucine (L) instead of R-25 as in the others. T-28 was found in ‘Hong lou cang jiao’ and ‘Hong hua lou shuang’ and was replaced by I-28 in the other samples. L-29 in ‘He ping hong’ and G-34 in ‘Qing luo’ were unique, as the other samples had phenylalanine (F-29) and E-34, respectively. At site 37, the hydrophobic isoleucine (I) was the most common residue, but it was replaced by valine (V) in P. rockii, ‘Huai nian’ and P. jishanensis, without changing amino acid properties. The uncharged hydrophilic T-51 was the major residue in the MADS domain, except for the positively charged hydrophilic K-51 in P. potaninii. Compared with the MADS domain, the I domain showed 2 variable sites out of 29. A neutral, non-polar amino acid occupied site 58: in the majority of samples it was isoleucine (I), but in ‘Huai nian’ it was valine (V). K-66 occurred in most samples but was substituted by R-66 in ‘Feng dan’, ‘Taiyō’, ‘Hong lou cang jiao’, ‘Hong hua lou shuang’ and P. ostii.

Comparison of the PsTM6 amino acid composition with that of Arabidopsis thaliana showed that many consensus amino acids appeared in the K domain, such as L-101, R-102, I-105, R-108, G-110, L-121, S-131, I-135, T-147 and Y-150 (tyrosine) (Yang et al. 2003); it also demonstrated that the majority of the critical amino acids were hydrophobic. Eight of 64 amino acids were variable. At sites 89 and 92, N-89 in P. qiui and G-92 in ‘Dian dan er hao’ were unique, in contrast to D-89 and E-92 in most samples. G-110 was conserved in all samples, which indicated the importance of this site. Three different amino acids occurred at site 111: the most common was glutamic acid (E), which was replaced by the non-chiral, polar G in P. lutea and P. decomposita and by the non-polar alanine (A) in P. rockii and ‘Huai nian’. At position 112, glycine (G) was specific to ‘Luo yang hong’ and ‘Yao huang’; in the other samples, the neutral and non-polar amino acid (G) was replaced by an acidic and polar one (D). At site 126, the basic amino acid lysine (K) was specific to P. ostii, ‘Feng dan’, ‘Hong hua lou shuang’, ‘Hong lou cang jiao’ and ‘Taiyō’; the other samples had the hydrophilic glutamine (Q). A-133, found in most samples, was replaced by D-133 in ‘He ping hong’. The most prominent feature was at site 148: the residue was aspartic acid (D), which is acidic and polar, but it was replaced by glycine (G), which is neutral and non-polar, in P. jishanensis, ‘Hong hua lou shuang’, ‘Lan xian nv’, ‘Luo yang hong’, ‘Taiyō’ and ‘Yao huang’. Threonine (T-151) occupied this site in all samples except ‘Lan xian nv’, which had serine (S-151).

The alignment of the 151 amino acids revealed conserved and synapomorphic amino acids such as E7, K9, I28, L45, K66, K100 and H141, which were similar to those of VvTM6 (Poupin et al., 2007). Some amino acids were unique to particular species or cultivars. First, among wild species, G-2 occurred in P. ludlowii, P. lutea, P. decomposita, P. potaninii and P. delavayi (Fig. 5), which distinguished them from all species of subsect. Vaginatae. These wild species are distributed in Southwest China and are closely related to each other; this conserved amino acid may provide further information on their relationship. S-3 was specific to P. rockii, P. delavayi and P. potaninii. N-5 and D-7 were unique to P. lutea. The other specific sites were I-15 in P. decomposita and P. lutea, L-25 in P. lutea, K-51 in P. potaninii, N-89 in P. qiui, G-111 in P. lutea and P. decomposita, and G-148 in P. jishanensis. Second, among cultivars, sites such as T-27 in ‘Hong hua lou shuang’ and ‘Hong lou cang jiao’, G-34 in ‘Qing luo’, V-58 in ‘Huai nian’, D-113 in ‘He ping hong’ and S-152 in ‘Lan xian nv’ were specific. Among these variable sites, some appeared to be highly conserved. ‘Feng dan’ has been reported to be the offspring of P. ostii, and our data showed that it retained the same amino acid at K-126; likewise ‘Huai nian’, which carries genetic information from P. rockii, retained V-37.

Regarding the diversity of flower shapes, cultivars have evolved from the single type to lotus, chrysanthemum, rose, crown, globular, crown-proliferation and other types (typical characteristics are shown in ESM_Fig. 1), whereas in wild species the flower shape is simple, i.e. the single flower shape. The relationship between amino acid composition and flower shape may give insight into this diversity. ‘Feng dan’ and ‘Dian dan er hao’, which have the single flower shape and come from the ‘Jiang Nan’ and ‘Xi Nan’ cultivar groups, respectively, shared the same amino acid at most variable sites, except G-92 or E-92 and Q-126 or K-126. For ‘Huai nian’ and ‘He ping hong’, which have the lotus flower shape and both come from the ‘Xi Bei’ cultivar group, the consensus sites accounted for 80%, apart from unique sites such as N-9, V-37 and V-58 in ‘Huai nian’ and L-29 and D-133 in ‘He ping hong’. For ‘Qing luo’ and ‘Taiyō’, which have the chrysanthemum flower shape and come from the ‘Jiang Nan’ and ‘Japan’ cultivar groups, respectively, the differing sites were G-34 or E-34, K-126 or Q-126, and G-148 or D-148. Whether the consensus amino acid sites contribute to flower shape still needs further study. Compared with the other cultivars with different flower shapes, ‘Yao huang’, with a crown flower shape, and ‘Hong hua lou shuang’, with crown-proliferation flowers, showed unique amino acids such as T-27, G-111 and G-148, which may be associated with their special flower shapes. Overall, the putative amino acids examined were highly conserved. The variable amino acid substitutions that change amino acid charge and hydrophobicity may be useful for functional differentiation in the dimerization of MADS-box family proteins. Studying protein–protein interactions involving the above variable amino acids will help to understand their functional changes and importance within this family.

MADS-box proteins play key roles as regulators in vegetative and reproductive development. The majority of well-characterized plant MADS-box proteins contain two conserved domains, the DNA-binding domain (MADS) and the K domain. The former possesses both DNA-binding and dimerization functions. The latter is predicted to form three amphipathic α-helices (K1, K2 and K3), which are important for heterodimerization between MADS-box proteins for floral organ identity. Between the MADS and K domains there is an intervening region (I domain) of variable length, which is necessary for dimerization and functional specificity. Looking at the putative amino acids as a whole, at the variable sites some substitutions did not change the residue property, as at site 37, whereas others involved completely different properties, such as a neutral amino acid (S-3) versus a basic one (R-3), or a neutral and non-polar residue (A-2) replaced by a hydrophilic one (G-2). Whether the variable sites contribute to secondary structure, to the strength and stability of dimerization, or to interaction with other family members still needs to be characterized. Other questions also remain to be answered, such as how changes at these amino acid sites affect expression pattern and function, and thereby control floral organ identity and flower shape.

3.8. Phylogenetic analysis

AP3/DEF introns have been shown to be applicable for phylogenetic analysis at a relatively low taxonomic level (Janssens et al., 2007). NJ trees (Fig. 6, ESM_Fig. 2) were constructed based on the genomic DNA sequence (ESM_File3), part of the coding sequences (ESM_File 1) and the putative amino acids (Fig. 5). An NJ tree was constructed from the genomic DNA sequence (1447 bp), including 4 introns and 5 exons, using 23 samples (Fig. 6A). ‘Taiyō’ and P. jishanensis were rooted in the tree. ‘Taiyō’, ‘Feng dan’ and P. ostii clustered in the tree; they are known to be very closely related to each other (Zhao et al. 2008). ‘Taiyō’ is from the Japan cultivar group, whose ancestor originated from P. ostii, and P. ostii is also one of the parents of ‘Feng dan’. ‘Hong lou cang jiao’ and ‘Hong hua lou shuang’ clustered together with a high bootstrap value (82). As far as we know, ‘Huai nian’ is the offspring of P. rockii, based on the blotches at the base of the petals; in the NJ tree they clustered together with a bootstrap value of 99, which indicated their close relationship. Although P. decomposita and P. lutea clustered together in the NJ tree, the relationship between them is disputed. Past research showed that P. decomposita is closely related to P. lutea; the flower disk of P. decomposita is leathery, whereas that of P. lutea is fleshy. Because of its original distribution, P. decomposita is regarded as an ancient wild species. Whether the two have a close phylogenetic relationship requires further research. ‘Yao huang’ and ‘Luo yang hong’ are from the Zhongyuan cultivar group, and their close relationship made them cluster together. Over the whole NJ tree, many samples clustered separately in the branches. The genomic DNA sequences were highly conserved among these samples and provided few polymorphic sites, which made phylogenetic construction difficult.

To explore the phylogenetic relationship among wild species, an NJ tree was constructed (Fig. 6B) using the genomic DNA sequence. As discussed, P. decomposita and P. lutea, or P. delavayi and P. potaninii, clustered together with P. ludlowii. P. jishanensis and P. rockii clustered together and were rooted in the tree; they are also regarded as ancient wild species, especially P. rockii, which has the unique character of blotches at the base of the petals. Except P. decomposita and P. lutea, all wild species belonging to subsect. Delavayanae clustered together, which demonstrated their close relationship (Zhao et al. 2008).

The introns were deleted from the genomic DNA sequences, and all exons were used for NJ tree construction using all samples or wild species only (Fig. 6C and D). P. decomposita and P. lutea, or P. delavayi and P. potaninii, again clustered together. ‘Huai nian’ and P. rockii showed a close relationship. ‘Feng dan’, P. ostii, ‘Taiyō’, ‘Hong lou cang jiao’ and ‘Hong hua lou shuang’ clustered into one branch. Unexpectedly, ‘Hong lou cang jiao’ showed a phylogenetic relationship to P. ostii. Since it is from the Xibei cultivar group, which is the offspring of P. rockii, it may be deduced that it also carries genetic information from P. ostii, which is one of the most widely used parents in cultivar breeding. As expected, because the exons provided few polymorphic sites, most samples clustered as single branches and were difficult to discriminate from each other. Based on the exon sequences, the NJ tree for wild species was constructed. In this tree, P. qiui was in the middle and separated the two subsections. In the upper part of the tree, P. jishanensis showed a close relationship with P. rockii; they formed one branch and then clustered with P. ostii. In contrast, in the lower part of the tree, P. decomposita and P. lutea showed a close relationship and clustered into one branch, while P. potaninii and P. delavayi formed one branch and clustered with P. ludlowii. The flower disk of P. decomposita is half leathery, so it may be an intermediate type among the wild species; the clustering result demonstrated its close relationship to the wild species of subsect. Delavayanae. This is the first evidence to shed light on the relationship between P. decomposita and P. lutea.

The exons were translated into putative amino acids and used for NJ construction (ESM_Figs. 2A and B); the clustering results were similar to those obtained using the exons of the 23 samples. One difference was that P. jishanensis formed a single branch in this tree; another difference lay in the branch of ‘Feng dan’, P. ostii, ‘Taiyō’, ‘Hong lou cang jiao’ and ‘Hong hua lou shuang’, which clustered together. In the NJ tree of wild species, the bootstrap value was slightly lower for each branch node. P. qiui and P. ostii, P. potaninii and P. delavayi, and P. rockii and P. jishanensis clustered together and, with P. ludlowii, formed one large branch, while P. decomposita and P. lutea formed a separate branch.

Based on the above results, for phylogenetic construction on the 23 samples, similar results were obtained using genomic DNA sequences and


Fig. 6. The Neighbor-Joining method was used for phylogenetic construction (MEGA 4.1, Tamura et al., 2007) with the following options: complete deletion (Gaps/Missing data), Maximum composite likelihood (model), and substitutions including transitions and transversions. A bootstrap test with 1000 replicates followed construction. Bootstrap support is given along the branches; bootstrap values below 50 are not shown in the tree. NJ trees based on the genomic DNA sequence of PsTM6 from 23 tree peony samples (A) and 9 wild species (B), and on the coding sequence from 23 tree peony samples (C) and 9 wild species (D). Short names of cultivars are given in Table 1.
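A Neighbor-Joining tree is built from a matrix of pairwise distances, and the "complete deletion" option removes every alignment column that contains a gap or missing character before distances are computed. The sketch below illustrates only that preprocessing step and the distance matrix. It is not MEGA's implementation: as a deliberate simplification it uses the uncorrected p-distance (proportion of differing sites) in place of the Maximum composite likelihood model named in the caption, and the function names and the set of characters treated as missing (`-`, `?`, `N`) are assumptions.

```python
def complete_deletion(seqs, missing="-?N"):
    """Drop every alignment column in which any sequence has a missing character."""
    keep = [
        i for i in range(len(seqs[0]))
        if all(s[i] not in missing for s in seqs)
    ]
    return ["".join(s[i] for i in keep) for s in seqs]


def p_distance_matrix(seqs):
    """Symmetric matrix of uncorrected p-distances after complete deletion."""
    cleaned = complete_deletion([s.upper() for s in seqs])
    n_sites = len(cleaned[0])
    if n_sites == 0:
        raise ValueError("no alignment columns left after complete deletion")
    n = len(cleaned)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diffs = sum(a != b for a, b in zip(cleaned[i], cleaned[j]))
            matrix[i][j] = matrix[j][i] = diffs / n_sites
    return matrix
```

When sequences are highly conserved, as reported here for PsTM6, most entries of such a matrix are close to zero, which is consistent with the difficulty in resolving branches noted in the text.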

exons. For wild species, compared with traditional taxonomy and the progress of past research, exon sequences are a good tool for phylogenetic analysis despite their conserved character.

Introns 4 and 5 within the K domain of AP3/DEF duplicates were amplified and sequenced for 59 Impatiens species and used for phylogeny reconstruction; the results indicated that introns 4 and 5 in AP3/DEF-like genes are a valuable source of characters for phylogenetic studies at the intrageneric level (Janssens et al., 2007). However, because PsTM6 is conserved in wild species and cultivars, it showed limited usefulness for phylogenetic construction, which suggests that these data combined with data from chloroplast genes and some low-copy nuclear genes may be a good source for tree peony phylogenetic study and for confirming its systematic position.

4. Conclusions

Because flower forms typically differ between wild species and cultivars, and based on the flower developmental model, the coding sequence of PsTM6, which belongs to the euAP3 lineage, was obtained in tree peony, and the characters of its MADS, I, K and C domains were analyzed. Genomic DNA sequence analysis indicated that there were in total 3 introns in the K domain. Genomic DNA sequences including 4 introns and 4 exons were obtained from 9 wild species and 14 cultivars. The results indicated that not only the exons but also the introns were highly conserved among the 23 samples; there was a simple sequence repeat (CT)n inside intron 3, which was highly variable among all samples. Cis-acting regulatory element analysis suggested that introns 3 and 4 contain many annotated regulatory elements, which might play an important role in PsTM6 expression and affect stamen petalody. Analysis of partial putative amino acids revealed that, owing to amino acid substitutions that differ in polarity and electronic charge, the functional differentiation of PsTM6 paralogs may affect stamen petalody and flower shape formation. Sliding window analysis indicated that the different regions of PsTM6 were subjected to different selection forces, especially in the K domain. This study provides knowledge on the functional diversification of the PsTM6 gene and lays a basis for understanding its molecular mechanism in the formation of flower shape.

Acknowledgment

This work was supported by the Pilot Research Program of Institute of Botany, the Chinese Academy of Sciences (2005–2008), National Natural Science Foundation of China (grant no. 30800760), and The Scientific Research Foundation for Returned Overseas Chinese Scholars from Ministry of Education of China ((2010) 1561).

Appendix A. Supplementary data

Supplementary data to this article can be found online at doi:10.1016/j.gene.2011.11.008.

References

Arguello-Astorga, G.R., Herrera-Estrella, L.R., 1996. Ancestral multipartite units in light-responsive plant promoters have structural features correlating with specific phototransduction pathways. Plant Physiol. 112, 1151–1166.
Baranowskij, N., Frohberg, C., Part, S., Willimitzer, L., 1994. A novel DNA binding protein with homology to myb containing only one repeat can function as a transcriptional activator. EMBO J. 13, 5383–5392.
Castillo-Davis, C.I., et al., 2002. Selection for short introns in highly expressed genes. Nat. Genet. 31, 415–418.
Comeron, J.M., 1999. K-estimator: calculation of the number of nucleotide substitutions per site and the confidence intervals. Bioinformatics 15, 763–764.
De Bodt, S., Theissen, G., Van de Peer, V., 2006. Promoter analysis of MADS-Box genes in through phylogenetic footprinting. Mol. Biol. Evol. 23, 1293–1303.
Filichkin, S.A., et al., 2004. A novel endo-beta-mannanase gene in tomato LeMAN5 is associated with anther and pollen development. Plant Physiol. 134, 1080–1087.
Filipowicz, W., Gniadkowski, M., Klahre, U., Liu, H., 1994. Pre-mRNA splicing in plants. In: Lamond, A.I. (Ed.), Pre-mRNA Processing. R.G. Landes Company, Austin, pp. 65–77.
Geuten, K., et al., 2006. Petaloidy and petal identity MADS-box genes in the balsaminoid genera Impatiens and Marcgravis. Plant J. 47, 501–518.
Goodall, G.J., Filipowicz, W., 1990. The minimum functional length of pre-mRNA introns in monocots and dicots. Plant Mol. Biol. 14, 727–733.
Guo, B.L., Hong, D.Y., Xiao, P.G., 2008. Further research on chemotaxonomy of paeonol and analogs in Paeonis (Ranunculaceae). J. Syst. Evol. 46, 724–729.
Hall, T.A., 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41, 95–98.
Han, X.Y., et al., 2008. Molecular characterization of Tree Peony germplasms using sequence-related amplified polymorphism markers. Biochem. Genet. 46, 162–179.
Hebsgaard, S.M., et al., 1996. Splice site prediction in Arabidopsis thaliana DNA by combining local and global sequence information. Nucleic Acids Res. 24, 3439–3452.
Higo, K., Ugawa, Y., Iwamoto, M., Korenaga, T., 1999. Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res. 27, 297–300.
Hong, D.Y., Pan, K.Y., Yu, H., 1998. Taxonomy of complex. Ann. Mo. Bot. Gard. 85, 554–564.
Ingvarsson, P.K., et al., 2006. Clinal variation in phyB2, a candidate gene for day-length-induced growth cessation and bud set, across a latitudinal gradient in European Aspen (Populus tremula). Genetics 172, 1845–1853.
Janssens, S., et al., 2007. Phylogenetic utility of the AP3/DEF K-domain and its molecular evolution in Impatiens (Balsaminaceae). Mol. Phylogenet. Evol. 43, 225–239.
Kamiya, N., et al., 2003. Isolation and characterization of a rice WUSCHEL-type homeobox gene that is specifically expressed in the central cells of a quiescent centre in the root apical meristem. Plant J. 35, 429–441.
Koch, M.A., et al., 2001. Comparative genomics and regulatory evolution: conservation and function of the Chs and Apetala3 promoter. Mol. Biol. Evol. 18, 1882–1891.
Kramer, E.M., Dorit, R.L., Irish, V.F., 1998. Molecular evolution of genes controlling petal and stamen development: duplication and divergence within the APETALA3 and PISTILLATA MADS-box gene lineages. Genetics 149, 765–783.
Kramer, E.M., Su, H.J., Wu, C.C., Hu, J.M., 2006. A simplified explanation for the frame shift mutation that created a novel C-terminal motif in the APETALA3 gene lineage. BMC Evol. Biol. 6, 30–47.
Li, Q., Wan, J.G., 2005. SSRHunter: development of a local searching software for SSR sites. Hereditas (Beijing) 27, 808–810 (In Chinese with English abstract).
Librado, P., Rozas, J., 2009. DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25, 1451–1452.
Melzer, R., Theißen, G., 2009. Reconstitution of ‘floral quartets’ in vitro involving class B and class E floral homeotic proteins. Nucleic Acids Res. 37, 2723–2736.
Morgante, M., Hanafey, M., Powell, W., 2002. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat. Genet. 30, 194–200.
Mount, S.M., et al., 1992. Splicing signals in Drosophila: intron size, information content, and consensus sequences. Nucleic Acids Res. 20, 4255–4262.
Nam, J.M., Kaufmann, K., Theißen, G., Nei, M., 2005. A simple method for predicting the functional differentiation of duplicate genes and its application to MIKC-type MADS-box genes. Nucleic Acids Res. 33 (1), e12.
Pauli, S., et al., 2004. The cauliflower mosaic virus 35S promoter extends into the transcribed region. J. Virol. 78, 12120–12128.
Pnueli, L., et al., 1991. The MADS box gene family in tomato: temporal expression during floral development, conserved secondary structures and homology with homeotic genes from Antirrhinum and Arabidopsis. Plant J. 1, 255–266.
Poupin, M.J., et al., 2007. Isolation of the three grape sub-lineages of B-class MADS-box TM6, PISTILLATA and APETALA3 genes which are differentially expressed during flower and fruit development. Gene 404, 10–24.
Sang, T., 2002. Utility of low-copy nuclear gene sequences in plant phylogenetics. Crit. Rev. Biochem. Mol. Biol. 27, 121–147.
Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596–1599.
Theissen, G., et al., 2000. A short history of MADS-box genes in plants. Plant Mol. Biol. 42, 115–149.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucl. Acids Res. 25, 4876–4882.
Wang, L.Y., 1997. Pictorial record of Chinese tree peony varieties. China Forestry Publishing House, Beijing, pp. 104–106 (In Chinese).
Yang, Y., Jack, T., 2004. Defining subdomains of the K domain important for protein-protein interactions of plant MADS proteins. Plant Mol. Biol. 55, 45–49.
Yang, Y., Fanning, L., Jack, T., 2003. The K domain mediates heterodimerization of the Arabidopsis floral organ identity proteins APETALA3 and PISTILLATA. Plant J. 33, 47–59.
Zahn, L.M., et al., 2005. To B or not to B a flower: the role of DEFICIENS and GLOBOSA orthologs in the evolution of the angiosperms. J. Hered. 96, 225–240.
Zhang, L.D., Zuo, K.J., Zhang, F., Cao, Y.F., Wang, J., Zhang, Y.D., Sun, X.F., Tang, K.X., 2006. Conservation of noncoding microsatellites in plants: implication for gene regulation. BMC Genomics 7, 323–337.
Zhao, X., Zhou, Z.Q., Lin, Q.B., Pan, K.Y., Li, M.Y., 2008. Phylogenetic analysis of Paeonia sect. Moutan (Paeoniaceae) based on multiple DNA fragments and morphological data. J. Syst. Evol. 46, 563–572.