Dissecting Molecular Evolution in the Highly Diverse Plant Clade Caryophyllales Using Transcriptome Sequencing
MBE Advance Access published April 2, 2015
Downloaded from http://mbe.oxfordjournals.org/ at University of Alberta on December 14, 2015

Article, Discoveries

Ya Yang*,1, Michael J. Moore2, Samuel F. Brockington3, Douglas E. Soltis4,5,6, Gane Ka-Shu Wong7,8,9, Eric J. Carpenter7, Yong Zhang9, Li Chen9, Zhixiang Yan9, Yinlong Xie9, Rowan F. Sage10, Sarah Covshoff11, Julian M. Hibberd11, Matthew N. Nelson12, and Stephen A. Smith*,1

1 Department of Ecology & Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, MI 48109-1048, USA
2 Department of Biology, Oberlin College, Science Center K111, 119 Woodland St., Oberlin, Ohio 44074-1097 USA
3 Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom
4 Department of Biology, University of Florida, Gainesville, FL 32611-8525, USA
5 Florida Museum of Natural History, University of Florida, Gainesville, FL 32611-7800, USA
6 Genetics Institute, University of Florida, Gainesville, FL 32610, USA
7 Department of Biological Sciences, University of Alberta, Edmonton AB, T6G 2E9, Canada
8 Department of Medicine, University of Alberta, Edmonton AB, T6G 2E1, Canada
9 BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
10 Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcocks Street, Toronto, Ontario M5S 3B2, Canada
11 Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, UK
12 School of Plant Biology, The University of Western Australia, 35 Stirling Highway, Crawley, Western Australia 6009, Australia

* Corresponding authors. Email: [email protected] or [email protected]

© The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Many phylogenomic studies based on transcriptomes have been limited to “single-copy” genes due to methodological challenges in homology and orthology inferences. Only a relatively small number of studies have explored analyses beyond reconstructing species relationships. We sampled 69 transcriptomes in the hyperdiverse plant clade Caryophyllales and 27 outgroups from annotated genomes across eudicots. Using a combined similarity- and phylogenetic tree-based approach, we recovered 10,960 homolog groups, where each was represented by at least eight ingroup taxa. By decomposing these homolog trees, and taking gene duplications into account, we obtained 17,273 ortholog groups, where each was represented by at least ten ingroup taxa. We reconstructed the species phylogeny using a 1,122-gene data set with a gene occupancy of 92.1%. From the homolog trees, we found that both synonymous and nonsynonymous substitution rates in herbaceous lineages are up to three times as fast as in their woody relatives. This is the first time such a pattern has been shown across thousands of nuclear genes with dense taxon sampling. We also pinpointed regions of the Caryophyllales tree that were characterized by relatively high frequencies of gene duplication, including three previously unrecognized whole genome duplications. By further combining information from homolog tree topology and synonymous distance between paralog pairs, phylogenetic locations for 13 putative genome duplication events were identified.
Genes that experienced the greatest gene family expansion were concentrated among those involved in signal transduction and oxidoreduction, including a cytochrome P450 gene that encodes a key enzyme in the betalain synthesis pathway. Our study demonstrates a new approach for functional phylogenomic analysis in non-model species that is based on homolog groups in addition to inferred ortholog groups.

Keywords: Caryophyllales; substitution rate heterogeneity; paleopolyploidy; RNA-seq

Introduction

Transcriptome sequencing, or RNA-seq, has shown huge potential for understanding the genetic and genomic bases of diversification in non-model systems (for example, Barker et al. 2008; Dunn et al. 2008; Lee et al. 2011; Wickett et al. 2011; Delaux et al. 2014; Li et al. 2014; Misof et al. 2014; Sveinsson et al. 2014; Wickett et al. 2014; Cannon et al. 2015; Hollister et al. 2015). Previous phylogenomic studies based on transcriptomes have been limited by availability of data sets that include both high numbers of genes and dense taxon sampling. Methodological issues in homology and orthology inference, especially in accommodating the frequent genome duplications in plants, have resulted in the discarding of a large proportion of genes from previous phylogenomic studies (Yang and Smith 2014). These limitations, together with the dynamic nature of gene expression, gene duplication and loss, lineage-specific heterogeneity in substitution rates, and gene tree topology discordance, have resulted in sparse matrices among nuclear genes in prior analyses, leading researchers to reduce data sets to a small number of genes for analysis.
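
The sparseness of a gene-by-taxon matrix is often summarized as gene occupancy, the quantity the abstract reports as 92.1% for the 1,122-gene data set. The following is a minimal sketch, assuming occupancy is defined as the fraction of taxon-by-gene cells that contain a sequence; the function name, the dictionary layout, and the toy taxa are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Dict, Set


def gene_occupancy(genes: Dict[str, Set[str]], taxa: Set[str]) -> float:
    """Return the fraction of (gene, taxon) cells that are filled.

    `genes` maps a gene name to the set of taxa that have a sequence for it.
    `taxa` is the full set of taxa in the matrix.
    Assumption: occupancy = filled cells / (number of genes * number of taxa).
    """
    if not genes or not taxa:
        raise ValueError("need at least one gene and one taxon")
    # Count only taxa that belong to the matrix, ignoring any extras.
    filled = sum(len(present & taxa) for present in genes.values())
    return filled / (len(genes) * len(taxa))


# Toy matrix (hypothetical): 3 genes by 4 taxa, 9 of 12 cells filled.
toy_taxa = {"taxonA", "taxonB", "taxonC", "taxonD"}
toy_genes = {
    "gene1": {"taxonA", "taxonB", "taxonC", "taxonD"},
    "gene2": {"taxonA", "taxonB"},
    "gene3": {"taxonA", "taxonC", "taxonD"},
}
toy_occupancy = gene_occupancy(toy_genes, toy_taxa)
print(f"gene occupancy: {toy_occupancy:.1%}")
```

Under this definition, dropping poorly sampled genes raises occupancy at the cost of gene number, which is the trade-off the preceding paragraph describes.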
These challenges have limited many RNA-seq phylogenomic studies to inferring a species tree, and only a limited number of studies have explored transcriptome-wide functional analyses beyond one-to-one orthologs or genes involved in a particular functional category (Barker et al. 2008; Lee et al. 2011). We use recently developed tree-based homology inference methods (Yang and Smith 2014) that overcome many of the previous analytical limitations to illustrate the rich content of transcriptome data sets. We demonstrate our approach in the plant clade Caryophyllales by applying these methods to a data set of 69 transcriptomes and 27 genomes.

With an estimated 11,510 species in 34 families (APG III; Bremer et al. 2009), the Caryophyllales represent approximately 6% of angiosperm species diversity, and are estimated to have a crown age of ca. 67–121 Ma (Bell et al. 2010; Moore et al. 2010). Species of the Caryophyllales are found across all continents and in all terrestrial ecosystems and exhibit extreme life history diversity, ranging from tropical trees to temperate annual herbs, and from long-lived desert succulents such as cacti to a diverse array of carnivorous plants [e.g., sundews (Drosera) and Old World pitcher plants (Nepenthes)]. The group also contains several independent origins of C4 and CAM photosynthesis and exhibits repeated adaptation to warm, dry and high salinity environments (Sage et al. 2011; Edwards and Ogburn 2012; Kadereit et al. 2012). The extraordinary diversity in growth forms and ecological adaptations makes Caryophyllales an ideal group for investigating gene and genome evolution and heterogeneity in molecular substitution rate.

We apply a tree-based homology and orthology inference approach (Yang and Smith 2014) to examine phylogenetics, molecular substitution rate, and gene and genome duplications in the Caryophyllales.
We use the inferred ortholog groups to reconstruct the phylogenetic relationships among major clades of Caryophyllales and to evaluate the heterogeneity in phylogenetic signal among ortholog groups. We then use homolog trees to explore patterns of substitution rate heterogeneity among lineages within the Caryophyllales. Previous studies have linked variation in among-lineage substitution rates to a wide range of factors, such as temperature, UV radiation, water limitation, woody vs. herbaceous habit, parasitism, speciation rate, generation time, metabolic rate and rates of mitosis (Barraclough and Savolainen 2001; Davies et al. 2004; Kay et al. 2006; Wright et al. 2006; Smith and Donoghue 2008; Goldie et al. 2010; Gaut et al. 2011; Buschiazzo et al. 2012; Bromham et al. 2013; Lanfear et al. 2013). However, these studies have been based on either a small number of genes and (typically) the use of introns, or on the estimation of absolute rates between a few distantly related species pairs (Yue et al. 2010; Buschiazzo et al. 2012). Studies utilizing nuclear gene sequences have also suffered from sparse taxon sampling, where large phylogenetic distances separate the ingroup from a small number of distant outgroups. In this study, we leverage our high gene content and more thorough taxon sampling to test whether substitution rate differences between woody and herbaceous lineages are uniform across thousands of genes throughout the Caryophyllales. We also map gene duplication events inferred from homolog trees onto the species tree, revealing a number of previously unknown putative genome duplication events. Finally, we discuss the functional categories of genes that experienced high levels of gene family expansion within the Caryophyllales.

Results and Discussion

Tree-based homology and orthology inferences

Our data set comprises 69 Caryophyllales transcriptomes and 27 eudicot genomes.
We took advantage of the availability of fully annotated genomes in eudicots that provide high quality outgroups for rooting the Caryophyllales (Goodstein et al. 2012). The homology inference first used similarity