YMPEV 5323 No. of Pages 6, Model 5G 23 October 2015

Molecular Phylogenetics and Evolution xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution

journal homepage: www.elsevier.com/locate/ympev

Short Communication

Evolutionary history of PEPC genes in green plants: Implications for the evolution of CAM in orchids

Hua Deng a,b,1, Liang-Sheng Zhang c,1, Guo-Qiang Zhang c, Bao-Qiang Zheng a, Zhong-Jian Liu c,d,e,⇑, Yan Wang a,⇑

a State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
b Research Institute of Forestry Policy and Information, Chinese Academy of Forestry, Beijing 100091, China
c Shenzhen Key Laboratory for Orchid Conservation and Utilization, The National Orchid Conservation Center of China and The Orchid Conservation and Research Center of Shenzhen, Shenzhen, China
d The Center for Biotechnology and BioMedicine, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
e College of Forestry, South China Agricultural University, Guangzhou, China

Article info

Article history:
Received 14 November 2014
Revised 7 October 2015
Accepted 8 October 2015
Available online xxxx

Keywords:
Crassulacean acid metabolism (CAM)
Phosphoenolpyruvate carboxylase (PEPC)
Phylogeny
RNA-Seq sequences

Abstract

Phosphoenolpyruvate carboxylase (PEPC) is the key enzyme in CAM and C4 photosynthesis. A detailed phylogenetic analysis of the PEPC family was performed using sequences from 60 available published genomes, the Phalaenopsis equestris genome and RNA-Seq of 15 additional orchid species. The PEPC family consists of three distinct subfamilies, PPC-1, PPC-2, and PPC-3, all of which share a recent common ancestor in chlorophyte algae. The eudicot PPC-1 lineage separated into two clades due to whole genome duplication (WGD). Similarly, the monocot PPC-1 lineage also divided into PPC-1M1 and PPC-1M2 through an ancient duplication event. The monocot CAM- or C4-related PEPC originated from the clade PPC-1M1. WGD may not be the major driver for the performance of CAM function by PEPC, although it increased the number of copies of the PEPC gene. CAM may have evolved early in monocots, as the CAM-related PEPC of orchids originated from the monocot ancient duplication, and the earliest CAM-related PEPC may have evolved immediately after the diversification of monocots, with CAM developing prior to C4. Our results represent the most complete evolutionary history of PEPC genes in green plants to date and particularly elucidate the origin of PEPC in orchids.

© 2015 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Abbreviations: CAM, crassulacean acid metabolism; ML, maximum likelihood; NJ, neighbor-joining; PEPC, phosphoenolpyruvate carboxylase; WGD, whole genome duplication.
This paper was edited by the Associate Editor Elizabeth Zimmer.
⇑ Corresponding authors at: Shenzhen Key Laboratory for Orchid Conservation and Utilization, The National Orchid Conservation Center of China and The Orchid Conservation and Research Center of Shenzhen, Shenzhen, China (Z.-J. Liu).
E-mail addresses: [email protected] (Z.-J. Liu), [email protected] (Y. Wang).
1 These authors contributed equally to this work.

http://dx.doi.org/10.1016/j.ympev.2015.10.007
1055-7903/© 2015 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Please cite this article in press as: Deng, H., et al. Evolutionary history of PEPC genes in green plants: Implications for the evolution of CAM in orchids. Mol. Phylogenet. Evol. (2015), http://dx.doi.org/10.1016/j.ympev.2015.10.007

1. Introduction

C4 photosynthesis has evolved independently more than 62 times, including 7500 species in 19 families or 3% of flowering plant species (Sage et al., 2011), whereas CAM has arisen multiple times in 35 plant families and is present in 30,000 species, comprising 6% of plant species from the Lycophyta, Pterophyta, Gnetophyta, and Anthophyta divisions (Keeley and Rundel, 2003; Silvera et al., 2010a). In the Orchidaceae, 50% of species were anticipated to be CAM plants (Smith and Winter, 1996).

Phosphoenolpyruvate carboxylase (PEPC; EC 4.1.1.31) plays a key role in the carbon metabolism of C4 and crassulacean acid metabolism (CAM) plants (Masumoto et al., 2010) and markedly improves photosynthetic efficiency and water use efficiency (Driever and Kromdijk, 2013). PEPC is widely present in all photosynthetic organisms (Izui et al., 2004). In addition to its photosynthetic function, housekeeping isoforms of PEPC also play essential metabolic roles in non-photosynthetic functions (Fan et al., 2013; O'Leary et al., 2011). Consistent with its diverse roles and origin, plant PEPC can be divided into two types: plant-type PEPC (PPC-1) and bacterial-type PEPC (PPC-2). Although the PEPC involved in the CAM pathway has shown high sequence identity to its counterpart in C4 photosynthesis, these genes are completely different (Christin et al., 2014).

Understanding the origin and function of PEPC has both fundamental and bioengineering importance, as it may help in identifying certain gene lineages suitable for the evolution of C4 and CAM plants and in detecting specific amino acids essential for enzymatic characteristics of photosynthetic PEPC and therefore in applying the C4 and CAM pathways in crops and fuel plants. The evolutionary history of the PEPC gene family, however, remains poorly understood in green plants, especially in CAM plants such as orchids. In this work, a detailed phylogenetic analysis of the PEPC family was performed, using Phalaenopsis equestris genome sequences (Cai et al., 2015) and the RNA-Seq sequences of 15 other orchid species, in combination with available genome sequences, to outline the evolutionary history of PEPC and to identify suitable lineages for the evolution of C4 and CAM plants. Our comparative analyses help to elucidate the history of photosynthetic PEPC and, in particular, shed light on the origin of PEPC in orchids.

2. Materials and methods

2.1. RNA extraction and transcriptome sequencing

Total RNA was extracted from Dendrobium catenatum and Phalaenopsis equestris tissue samples from the National Orchid Conservation Center of China (NOCC) using the Sigma Spectrum™ Plant Total RNA Kit. To identify the number of PEPC genes present in D. catenatum, different tissues were sampled at different times, including leaves (one sampled at dawn, 6:30 a.m., and another at dusk, 6:30 p.m.), stem, root (one sample of just the green tips without velamen, another of the root with velamen), blossom-bud, and lip (modified petal). The P. equestris samples were taken from leaf, stem, root, and flower. Except for the leaves of D. catenatum, all other samples were collected in daytime. The transcriptome library construction and sequencing were performed at BGI and followed the protocols of Peng et al. (2012b).

2.2. Data sources

The predicted proteins of Arabidopsis thaliana were retrieved from the TAIR10 Genome Release (http://www.arabidopsis.org). The representative plant genomes and predicted proteome sequences were obtained from available public databases (Table S1). The PEPC genes and predicted proteins of P. equestris were retrieved from its complete genome (Zhong-Jian Liu et al., unpublished data). The transcriptomes of P. equestris and D. catenatum from different tissues and sampling times were sequenced and assembled in this work. We also screened PEPC genes from other public transcriptome databases. Eight orchids (Cymbidium sinense, Hemipilia forrestii, Habenaria delavayi, Paphiopedilum armeniacum, Cypripedium singchii, Neuwiedia malipoensis, Apostasia shenzhenica, and Galeola faberi) had been sequenced before (Tsai et al., 2013) and were re-assembled using Trinity in this work. RNA-Seq reads of P. equestris and D. catenatum were mapped to their assembled genomes using Bowtie (Langmead and Salzberg, 2012). The gene expression levels were measured by RPKM (reads per kilobase of mRNA length per million of mapped reads). The transcriptome sequences of another six orchids (Masdevallia yuangensis, Oncidium sphacelatum, Haemaria discolor, Goodyera pubescens, Platanthera clavellata, Vanilla planifolia) were retrieved from the 1KP project (http://www.onekp.com/). When two or more sequences had an identical overlap region in the same locus, the longest protein sequence was selected. The gene expression data of Agave deserti and A. tequilana were retrieved from their transcriptomes (Gross et al., 2013).

2.3. Homolog identification and multiple sequence alignment

Proteins with a PEPC domain were identified by the hidden Markov model-based HMMER program (3.0) (Finn et al., 2011). After local searches were performed in the proteome datasets with the PEPC domain (PF00311) (Finn et al., 2014), the resulting sequences were manually adjusted in multiple sequence alignments to delete obvious errors. Multiple sequence alignment was performed in MUSCLE using the default parameters (Edgar, 2004).

2.4. Phylogenetic reconstruction and synteny analysis

The phylogenetic trees of PEPC were reconstructed using the neighbor-joining (NJ) and maximum likelihood (ML) methods. Using the 'pairwise deletion' option and the 'Poisson correction' model, we constructed NJ trees with MEGA, with a bootstrap test of 1000 replicates. ML trees were constructed using FastTree (http://www.microbesonline.org/fasttree) with the approximate likelihood ratio test (aLRT) method. Synteny was detected using the Plant Genome Duplication Database (Tang et al., 2008).

3. Results and discussion

3.1. The evolutionary history of PEPC genes in green plants

We constructed a phylogenetic tree of PEPC family members using genomic sequences of representative species (Fig. S1). The tree shows that the PEPC gene family can be divided into three lineages: PPC-1 (plant-type PEPC), PPC-2 (bacterial-type PEPC) and PPC-3. To comprehensively understand the origin of PEPC, we added the recently sequenced genome of Klebsormidium flaccidum to the dataset. As the terrestrial alga closest to land plants, the charophytic alga Klebsormidium is important for finding the origins of the CO2-concentrating mechanism and PEPC in aquatic algae and land plants. Our phylogenetic tree indicates that the PEPC of Klebsormidium fills the major phylogenetic gap between PPC-1 and PPC-2, as it is located between Chlorophyta and land plants. We also found another class of PEPC in Klebsormidium and retrieved its homologs from NCBI GenBank (http://www.ncbi.nlm.nih.gov/). Interestingly, these sequences exhibit high identity with bacterial sequences, in addition to including a moss gene. PPC-3, which is formed by K. flaccidum and moss (Fig. S1), probably came from horizontal gene transfer from bacteria (Peng et al., 2012a) or from sequencing contamination.

PPC-2 is believed to be derived from ancient inter-kingdom horizontal gene transfer from bacteria (Peng et al., 2012a) and is a relatively conserved bacterial-type PEPC (O'Leary et al., 2011) with lower copy numbers than PPC-1 (Fig. S3). The prasinophycean alga, which nested at the base of PPC-1, was not observed in PPC-2. One possible explanation is that the horizontal gene transfer occurred after the algae diverged; alternatively, PEPC genes of PPC-2 may have been lost in these algae through evolution. Most species within PPC-2 have only one copy, and PPC-2 is conserved compared to PPC-1, which generally has two to six copies in most species.

PPC-1 is a typical plant-type enzyme (O'Leary et al., 2011; Park et al., 2012) with more members having diversified after the divergence of eudicots and monocots (Fig. S3). Within PPC-1, PEPC sequences from the green algae and moss are basal to the eudicot and monocot lineages and coincide largely with a previous tree (Silvera et al., 2014). Eudicots apparently separate into two clades. Grape has two copies and underwent a gene duplication event. Meanwhile, several successive duplications occurred in the monocot lineage such that banana, rice, and sorghum each have five members, and the latter two plants cluster together as a sister group, forming four monophyletic groups (Fig. S1). These results show that one or more gene duplication events occurred in the diversification of eudicots and monocots within PPC-1. Detailed analyses are presented below, with separate discussions for eudicots and monocots.


We further conducted motif analysis on these sequences. The results agreed with the phylogenetic tree and also support the view that the PEPC family is composed of three clades. The proteins encoded by the PPC-1 genes are relatively conserved and have many conserved motifs, such as motifs 2, 3, and 15. PPC-2 lacks motif 15 and has an insertion of approximately 150 aa between motif 3 and motif 12. Both PPC-1 and PPC-2 have specific versions of motif 11 at the end of the C-terminus. PPC-3 has an insertion of approximately 400 aa in the N-terminus and a different motif pattern compared to PPC-1 and PPC-2. In short, based on protein sequence analyses, these three clades have particular motif patterns. The characteristics of the protein sequences were congruent with the phylogenetic analysis, supporting PPC-1, PPC-2 and PPC-3 as three clades in green plants.

3.2. Phylogenetic analysis of eudicots in PPC-1

To further investigate the details of the evolution of the PPC-1 lineage, we conducted phylogenetic analysis of eudicots in the PPC-1 lineage alone. The PPC-1 of eudicots was split into three clades, forming PPC-1E0, PPC-1E1 and PPC-1E2, based on ML and NJ methods (Figs. 1 and S4). As the basal eudicot species, Lotus and Aquilegia each have two members in the PPC-1E0 clade, due to lineage-specific duplication. Each of the two clades, PPC-1E1 and PPC-1E2, has one grape gene, and other eudicot species also separate into PPC-1E1 and PPC-1E2, indicating that the duplication came from an ancient duplication in the early eudicots. We also determined that the duplicate pair was located on two pairs of duplicated segmental blocks (Fig. S5), indicating that the duplication may have resulted from the core eudicot whole-genome duplications (WGD).

In PPC-1E1, both Brassicaceae and Fabaceae have two clades, implying two duplication events, whereas Solanaceae have three clades due to WGDs (Figs. 1 and S4), showing that at least two duplications occurred and that one copy was lost. In PPC-1E2, Fabaceae have one duplication, and other species typically have only one copy. Syntenic analyses indicate that these duplications have remained from a recent WGD (Fig. S5). Lineage-specific duplication occurred in Lotus and Aquilegia, which supports the hypothesis that the core eudicot WGD occurred after the divergence of Lotus and Aquilegia. WGD occurred repeatedly in various eudicots, resulting in multiple copies of PPC-1. After several rounds of WGD, PPC-1E1 has more members than PPC-1E2. Compared to the tree for PPC-1E2, the PPC-1E1 tree has longer branches, which means that it evolved more rapidly, along with functional divergence. Moreover, the topology of the NJ tree partly conflicts with the ML tree, indicating diversity in some eudicots (Figs. 1 and S4). For instance, three clades of Solanaceae connect with each other quite differently in the NJ and the ML trees, likely due to selective pressure. Mimulus guttatus (Scrophulariaceae) clusters together with Solanaceae as the sister group in the NJ tree, which is consistent with the phylogenetic relationship, but this is not the case in the ML tree.

3.3. Phylogenetic analysis of monocots in PPC-1

The evolutionary relationship of monocots in PPC-1 has not been well explained to date. In the phylogenetic tree of PPC-1

Fig. 1. Neighbor-joining (NJ) tree showing the evolution of PPC-1 genes in eudicots and monocots. Detailed species information is provided in Table S1. (a) The eudicot PPC-1 gene tree used the representative species and can be divided into three clades: PPC-1E1, PPC-1E2 and PPC-1E0. (b) The monocot PPC-1 gene tree mainly used Poaceae and can be divided into two clades: PPC-1M1 and PPC-1M2. Plant species abbreviations: AT, Arabidopsis thaliana; Araly, Arabidopsis lyrata; Carubv, Capsella rubella; Tp, Thellungiella parvula; Bra, Brassica rapa; Thhalv, Thellungiella halophila; PGSC, Solanum tuberosum; Solyc, Solanum lycopersicum; CA, Capsicum annuum; Migut, Mimulus guttatus; Glyma, Glycine max; Phvul, Phaseolus vulgaris; C.cajan, Cajanus cajan; Medtr, Medicago truncatula; Vv, Vitis vinifera; POPTR, Populus trichocarpa; NNU, Nelumbo nucifera; Acov, Aquilegia caerulea.


constructed by Silvera et al., the monocot lineage is paraphyletic because some orchid PEPC sequences are nested within the eudicot PPC-1E1 lineage, and Poaceae has two clades: one embedded within PPC-1M2 and another located at the base, as the sister to all other monocots of PPC-1 (Silvera et al., 2014). In some cases, monocots have many lineages mixed with eudicots in PPC-1 (Gehrig et al., 2001; O'Leary et al., 2011). The previous results are controversial and have not been explicitly elucidated to date. We conducted phylogenetic analyses of monocots, using the PPC-1 lineage alone, to clarify its phylogenetic relationship.

Our NJ and ML phylogenetic trees (see Figs. 1 and S4) indicate that monocots divide into two lineages, namely PPC-1M1 and PPC-1M2, which came from an ancient duplication. The evolutionary history of monocots is more complex than that of eudicots, although they both have two lineages. After the diversification of early monocots, several successive duplication and loss events occurred in PPC-1M1 and PPC-1M2, leading to varying numbers of copies between species. PPC-1M1 subsequently separated into two clades, one consisting of orchids and agaves and the other consisting of the Poaceae (Fig. 2). PPC-1M2 shows a similar pattern (orchids and agaves have only one copy), and this copy is relatively conserved. Meanwhile, PPC-1M2 is relatively conserved compared to PPC-1M1, as shown by its shorter branches, despite having almost the same number of copies as PPC-1M1.

Aside from the two orchid lineages located separately in PPC-1M1 and PPC-1M2, Silvera et al. described another orchid lineage nested in PPC-1E1. This lineage is, however, not consistent with the phylogeny of species. The phylogenetic tree in this study supports the idea that orchid PEPC genes separated into Orchid-1M1 and Orchid-1M2, embedded in PPC-1M1 and PPC-1M2, respectively.

In PPC-1M1, the relationship of different clades remains unclear because they are not completely identical in the NJ and ML trees. A previous study showed that the rice PPC4 gene is not clustered with any other rice PEPC members (Masumoto et al., 2010). In this study, the rice PPC4 evolved more rapidly, with a longer branch than C4-type PPC genes, in both the NJ tree and the ML tree. The PPC4 gene and C4-type PPC genes form a sister group in the NJ tree, while in the ML tree, PPC4 is located at the base, as a single clade. Our phylogenetic trees show that the rice PPC4 is located in PPC-1M1 and should be classified into PPC-1M1. In PPC-1M2, banana mixes with palm in the NJ tree, while they separate as two single clades in the ML tree, although both are located in the basal portion of PPC-1M2.
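The NJ and ML trees are compared throughout this section by asking which groupings they share. One simple way to make such a comparison explicit, offered here as an illustrative sketch rather than the procedure used in this study, is to list the clades (the set of leaves under each internal node) of two rooted trees and report those found in only one of them. The nested-tuple tree encoding, the function names, and the four-leaf toy topologies with generic labels A to D are all hypothetical.

```python
def clades(tree):
    """Return the clades of a rooted tree written as nested tuples.

    A leaf is a string; an internal node is a tuple of child subtrees.
    Each clade is a frozenset holding the leaf names below one internal node.
    """
    found = set()

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(collect(child) for child in node))
        found.add(members)
        return members

    collect(tree)
    return found


def compare_topologies(tree_1, tree_2):
    """Split the clades of two trees into shared, only-in-first, only-in-second."""
    clades_1, clades_2 = clades(tree_1), clades(tree_2)
    return clades_1 & clades_2, clades_1 - clades_2, clades_2 - clades_1


# Toy topologies (hypothetical): C is sister to (A, B) in the first tree,
# whereas D takes that position in the second tree.
first_tree = ((("A", "B"), "C"), "D")
second_tree = ((("A", "B"), "D"), "C")

shared, only_first, only_second = compare_topologies(first_tree, second_tree)
print(sorted(sorted(clade) for clade in only_first))   # [['A', 'B', 'C']]
print(sorted(sorted(clade) for clade in only_second))  # [['A', 'B', 'D']]
```

Clades that appear in only one list mark the parts of the topology on which the two methods disagree, which is the kind of conflict described above for rice PPC4 and for banana and palm.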

Fig. 2. Phylogenetic tree of monocot PPC-1, focusing on orchids. The left panel shows the monocot PPC-1 with the NJ method, the upper right shows the Orchid-1M1, the middle right shows the Poaceae-1M1, and the bottom right shows the Orchid-1M2. Plant species abbreviations (not including the species listed above): tequilana, Agave tequilana; deserti, Agave deserti; GSMUA, Musa acuminata; PDK, Elaeis guineensis; PH, Phyllostachys edulis; Bradi, Brachypodium distachyon; MLOC, Hordeum vulgare; LOC_Os, Oryza sativa; SiPROV, Setaria italica; GRMZM, Zea mays; Sobic, Sorghum bicolor; Pavir, Panicum virgatum; EMT, Aegilops tauschii; MECR, Mesembryanthemum crystallinum; Dca, Dendrobium catenatum; PEQU, Phalaenopsis equestris; Cym, Cymbidium sinense; Hem, Hemipilia forrestii; Ha, Habenaria delavayi; Pa, Paphiopedilum armeniacum; Ch, Cypripedium singchii; Ne, Neuwiedia malipoensis; Ap, Apostasia shenzhenica; Ga, Galeola faberi.
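Expression levels in this study are reported as RPKM, defined in Section 2.2 as reads per kilobase of mRNA length per million mapped reads. The short sketch below spells out that definition as arithmetic; the function name, argument names, and the example counts are assumptions chosen for illustration and are not values from the transcriptomes analysed here.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count / (transcript_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1_000_000_000 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2000 bp transcript
# in a library of 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Because the value is scaled by both transcript length and library size, RPKM allows a rough comparison of one gene across samples of different sequencing depth, which is how expression values from different tissues and species are set side by side in the following section.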


3.4. CAM- and C4-related PEPC nest in PPC-1M1

PEPC expression data support the placement of the monocot CAM-related PEPC in PPC-1M1. The CAM plants Agave deserti (Winter et al., 2014) and Dendrobium catenatum (our unpublished data) can switch between C3 and CAM photosynthetic modes based on water availability. One PEPC gene in PPC-1M1 is preferentially expressed in CAM plants and not in other non-CAM plants, although these data came from different tissues or developmental stages (Fig. S6). One of the PEPC genes is highly expressed in CAM agaves (Agd2a and Agt2) and in facultative CAM D. catenatum (Dca1b). We assume that these genes may be the CAM-related PEPC with the function of fixing carbon dioxide because of their high expression in CAM plants, especially in the major photosynthetic tissues, the leaves.

Phalaenopsis equestris, a well-known strong CAM orchid (Borland et al., 2014; Pollet et al., 2011), has a PEPC gene (Peq1) that clustered together with the CAM-related PEPC of Dendrobium along the longest branch (Fig. 2). We thus conclude that Peq1 is also a CAM-related PEPC. Nevertheless, we did not observe evidence of high expression of PEPC in the transcriptome of Phalaenopsis leaves (Fig. S6). This may be because it was sampled in the daytime rather than at night, when PEPC fixes carbon dioxide. Further analysis is needed after sampling at different times to determine the gene expression rhythm.

The C4-type PEPC of maize is expressed at very high levels in leaves (Masumoto et al., 2010), and this C4 PEPC also nested in PPC-1M1 in this study. CAM-related PEPC did not expand in CAM plants, whereas C4-type PEPC expanded markedly in C4 plants. Given that CAM-related PEPC clustered with non-CAM-related PEPC of related species and that the topological structure of PPC-1M1 was congruent with the phylogenetic relationship, we speculate that CAM-related PEPC evolved independently in different species and several times in PPC-1M1.

3.5. An ancient duplication in monocots and the origin of PEPC in CAM

With each clade nesting in PPC-1M1 and PPC-1M2 (Fig. 2), the orchids' PEPC lineages largely coincide with the topological structure of monocot PPC-1 in the NJ and ML trees, indicating that the duplication event of orchid PEPC came from the ancient monocot duplication, which may be due to an ancient WGD in monocots (Jiao et al., 2014). We thus conclude that the ancient WGD in monocots occurred at least before the orchids developed.

The orchid Vanilla planifolia belongs to the Vanilloideae, which is the basal subfamily following Apostasiaceae (Górniak et al., 2010), and it is an obligate CAM plant (Gehrig et al., 1998). Although most CAM orchids belong to the Epidendroideae (Silvera et al., 2010b, 2009) and are located at the top of the phylogenetic tree of Orchidaceae (Górniak et al., 2010), the fact that Vanilla is an obligate CAM plant implies that CAM in orchids originated very early. Using fixation of CO2 by PEPC as the marker of CAM evolution, we assume that CAM may have evolved early in monocots, as CAM-related PEPC in orchids originated from the monocot ancient duplication, and that the earliest CAM plant may have evolved just after the diversification of monocots, with CAM developing prior to C4.

In Oncidiinae species, the number of PEPC members increases gradually along with the type of photosynthesis, ranging from C3 to weak CAM to strong CAM. The C3 species have two to three PEPC members, C3/CAM intermediate orchids have three to four members, and strong CAM species have four to six members (Silvera et al., 2014). The number of PEPC members may merit further investigation, as it is based on PCR sampling, and some copies could therefore be missed. To verify a reliable number of PEPC members, we retrieved the PEPC genes and predicted proteins of P. equestris from its complete genome (Cai et al., 2015) and those of D. catenatum from its transcriptomes, sampled from different tissues and at different times. We found that the C3/CAM intermediate Dendrobium has four copies of PEPC, whereas the strong CAM Phalaenopsis has three copies. The numbers of PEPC members do not exhibit the tendency described in the Oncidiinae. However, Orchidaceae is famous for its diversity; we merely obtained a glimpse of PEPC evolution, and further comparisons of genomes and transcriptomes sampled at various times are needed. Even so, this study suggests that the number of PEPC members may be irrelevant to the type of photosynthesis.

Furthermore, Phalaenopsis has only one copy in the PPC-1M1 lineage where CAM-related PEPC nested. Certain other species possess one or two copies, such as agaves, Dendrobium and Vanilla. These copies originated from the ancient genome duplication after the early diversification of monocots. Therefore, we propose that WGD led to the increase of PEPC gene copies but that it may not be the major driver for the performance of CAM function, the fixation of carbon dioxide, by PEPC.

4. Conclusions

We present the first report of another basal clade, PPC-3, which is composed of K. flaccidum and moss. PPC-2 is conserved compared to PPC-1 because most species have a single copy in the former and two to six copies in the latter. Eudicots and monocots in PPC-1 each shared their own common ancestors. For eudicots, the ancient and recent WGD events caused varying numbers of copies in PPC-1 and subsequent separation into PPC-1E1 and PPC-1E2. Especially in the lineage of PPC-1E1, two or three rounds of duplications occurred in the Brassicaceae, Fabaceae, and Solanaceae. Similarly, the monocot PPC-1 lineage is also divided into two clades (PPC-1M1 and PPC-1M2) because of an ancient duplication event. Nested in the PPC-1M1 lineage together with C4-type PEPC, several special PEPC genes may have led to CAM-related function, as their expression increased dramatically in the transcriptomes of CAM plants such as agaves and orchids, indicating that the CAM pathway may be primarily regulated at the transcriptional level. In monocots, the CAM- or C4-related PEPC may originate from the lineage of PPC-1M1. The WGD led to the increase in the number of PEPC gene copies, but it may not be the major driver for the performance of CAM function by PEPC. CAM may have evolved early in monocots, as CAM-related PEPC in orchids originated from the monocot ancient duplication, and the earliest CAM may have evolved just after the diversification of monocots, with CAM appearing prior to C4.

Acknowledgments

This work was supported by the Basic Research Fund of Richard Ivey Foundation (Grant No. RIF2014-05) and Development Special Fund of Biological Industry of Shenzhen Municipality (Grant No. JC201005310690A). We thank Ying-Qiu Tian and Jie Huang for help with lab work and Chao Bian for sharing bioinformatic resources and discussion.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ympev.2015.10.007.



Please cite this article in press as: Deng, H., et al. Evolutionary history of PEPC genes in green plants: Implications for the evolution of CAM in orchids. Mol. Phylogenet. Evol. (2015), http://dx.doi.org/10.1016/j.ympev.2015.10.007