This Is the Peer Reviewed Version of the Following Article
Total Page:16
File Type:pdf, Size:1020Kb
This is the peer reviewed version of the following article: Schmickl, R. , Liston, A. , Zeisek, V. , Oberlander, K. , Weitemier, K. , Straub, S. C., Cronn, R. C., Dreyer, L. L. and Suda, J. (2016), Phylogenetic marker development for target enrichment from transcriptome and genome skim data: the pipeline and its application in southern African Oxalis (Oxalidaceae). Molecular Ecology Resources, 16: 1124-1135. which has been published in final form at https://doi.org/10.1111/1755-0998.12487. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. 1 Phylogenetic marker development for target enrichment from transcriptome 2 and genome skim data: the pipeline and its application in southern African 3 Oxalis (Oxalidaceae) 4 5 Roswitha Schmickl1,*, Aaron Liston2, Vojtěch Zeisek1,3, Kenneth Oberlander1,4, Kevin 6 Weitemier2, Shannon C.K. Straub5, Richard C. Cronn6, Léanne L. Dreyer7, Jan 7 Suda1,3 9 1 Institute of Botany, The Czech Academy of Sciences, Zámek 1, 252 43 10 Průhonice, Czech Republic 11 2 Department of Botany and Plant Pathology, Oregon State University, 2082 12 Cordley Hall, Corvallis, OR 97331, USA 13 3 Department of Botany, Faculty of Science, Charles University in Prague, 14 Benátská 2, 128 01 Prague, Czech Republic 15 4 Department of Conservation Ecology and Entomology, Stellenbosch 16 University, Private Bag X1, Matieland, 7602, South Africa 17 5 Department of Biology, Hobart and William Smith Colleges, 213 Eaton Hall, 18 Geneva, NY 14456, USA 19 6 USDA Forest Service, Pacific Northwest Research Station, 3200 SW 20 Jefferson Way, Corvallis, OR 97331, USA 21 7 Department of Botany and Zoology, Stellenbosch University, Private Bag X1, 22 Matieland, 7602, South Africa 23 24 *Correspondence: Roswitha Schmickl; Institute of Botany, The Czech Academy of 25 Sciences, Zámek 1, 252 43 Průhonice, Czech Republic; Fax: +420 221 951 645; E- 26 mail: [email protected] 27 Keywords: cytonuclear discordance, genome skimming, low-copy nuclear genes, 28 Oxalis, species tree, target enrichment 1 29 30 Running head: Low-copy nuclear genes for Hyb-Seq 31 32 Abstract 33 Phylogenetics benefits from using a large number of putatively independent nuclear 34 loci and their combination with other sources of information, such as the plastid and 35 mitochondrial genomes. To facilitate the selection of orthologous low-copy nuclear 36 (LCN) loci for phylogenetics in non-model organisms, we created an automated and 37 interactive script to select hundreds of LCN loci by a comparison between 38 transcriptome and genome skim data. We used our script to obtain LCN genes for 39 southern African Oxalis (Oxalidaceae), a speciose plant lineage in the Greater Cape 40 Floristic Region. This resulted in 1,164 LCN genes greater than 600 bp. Using target 41 enrichment combined with genome skimming (Hyb-Seq) we obtained on average 42 1,141 LCN loci, nearly the whole plastid genome and the nrDNA cistron from 23 43 southern African Oxalis species. Despite a wide range of gene trees, the phylogeny 44 based on the LCN genes was very robust, as retrieved through various gene and 45 species tree reconstruction methods as well as concatenation. Cytonuclear 46 discordance was strong. This indicates that organellar phylogenies alone are unlikely 47 to represent the species tree and stresses the utility of Hyb-Seq in phylogenetics. 48 49 Introduction 50 High-throughput sequencing (HTS) has the potential to greatly increase the amount 51 of phylogenetically informative signal in molecular datasets (Parks et al., 2009, 2012) 52 and overcome difficulties in phylogenetic reconstructions, such as polytomies and 53 low support values, that are often the result of using only a small fraction of the 54 genome. However, HTS also “opens the era of real incongruence” (Jeffroy et al., 55 2006), and even massive amounts of sequence data do not always result in strongly 56 resolved phylogenies (Pyron, 2015). When HTS was introduced to plant 2 57 phylogenetics, sequencing of the plastid genome was its first focus (e.g., Parks et al., 58 2009, Givnish et al., 2010). Later approaches of genome skimming, the sequencing 59 of the high-copy fractions of the nuclear, plastid and mitochondrial genome (Straub et 60 al., 2012), resulted in the assembly of the rDNA cistron and nearly the complete 61 plastid and mitochondrial genome. Currently, target enrichment (sequence capture) 62 of hundreds of loci is becoming increasingly popular in phylogenetics. In animal 63 phylogenomics non-exonic or partly exonic ultraconserved elements and their quite 64 variable flanking regions are often utilized (e.g., Faircloth et al., 2012, Hedtke et al., 65 2013, Smith et al., 2013). For plant phylogenetics, low-copy nuclear (LCN) genes are 66 targeted (Mandel et al., 2014, Weitemier et al., 2014, Grover et al., 2015, Heyduk et 67 al., 2015, Stephens et al., 2015a, b, Mandel et al., 2015, Nicholls et al., 2015) due to 68 the paucity of ultraconserved nuclear sequences (Reneker et al., 2012). 69 Target sequencing strategies for plant nuclear genomes are largely lineage- 70 specific, requiring the de novo design of target enrichment probes. Chamala et al. 71 (2015) recently introduced a pipeline for phylogenetic marker development in 72 angiosperms using transcriptomes, and they obtained several hundred putative LCN 73 genes that can be utilized at three phylogenetic levels (genus, family, order); 74 however empirical evidence for the phylogenetic utility of these loci was not 75 demonstrated. Alternative phylogenetic marker developments, also utilizing 76 transcriptomes (Rothfels et al., 2013, Pillon et al., 2014, Tonnabel et al., 2014), 77 resulted in a much smaller number (up to 20) of mainly LCN loci, but these loci were 78 evaluated with PCR in the empirical datasets, not target enrichment. In recently 79 published phylogenies based on target enrichment of several hundred LCN genes, 80 these loci were selected from transcriptomes, gene expression studies, the literature, 81 or a combination of these sources (Mandel et al., 2014, Grover et al., 2015, Heyduk 82 et al., 2015, Stephens et al., 2015a, b, Mandel et al., 2015, Nicholls et al., 2015). 83 Weitemier et al. (2014) designed LCN probes for target enrichment based on a 84 combination of transcriptome and genome data and demonstrated their phylogenetic 3 85 utility in Asclepias L.. The limitation of this probe design pipeline is that (draft) 86 genomes are still infrequent, especially for non-model species, and are costly to 87 generate. This limitation also applies to the approach of de Sousa et al. (2014), who 88 selected 50 LCN loci from a genomic source and amplified them using target 89 enrichment. Except for Chamala et al. (2015), who offer a user-friendly but 90 empirically untested probe design pipeline, and Weitemier et al. (2014), whose Hyb- 91 Seq pipeline is designed for more advanced users, no automated probe design 92 pipeline for LCN genes is currently available. 93 In this study we developed a novel probe design pipeline for targeting 94 orthologous LCN loci for phylogenetic reconstruction by using genome skim and 95 transcriptome data. In particular, genome skim data of one accession of the studied 96 plant group were combined with a congeneric transcriptome from the 1000 Plants 97 (1KP) initiative (http://www.onekp.com/). We implemented our software workflow in 98 the user-friendly, automated and interactive BASH script Sondovač, which allows a 99 straightforward design of LCN probes also for users with limited bioinformatics skills. 100 The utility of this approach is demonstrated by the design of probes for southern 101 African Oxalis, and over 1,000 candidate LCN loci were obtained. Use of the probes 102 for targeted sequencing of these loci in 23 southern African Oxalis species resulted in 103 sufficient sequencing depth of the LCN loci, as well as the plastid and high-copy 104 nuclear genome (e.g., nuclear ribosomal DNA (nrDNA) cistron). Considering their 105 different evolutionary rates and modes of inheritance (Small et al., 2004), the 106 combination of all three datasets can substantially contribute to understanding 107 speciation from a phylogenetic perspective. The observed, strong cytonuclear 108 discordance (nuclear tree topology deviating from organellar tree topology) suggests 109 that organellar phylogenies alone do not resemble the species tree. 110 111 Materials and methods 112 Taxonomic focus 4 113 Oxalis L. (ca. 500 species) is common in the flora of the New World and a major 114 component of the Greater Cape Floristic Region (GCFR) (Born et al., 2006). 115 Southern African taxa comprise ca. 46% of the genus (ca. 230 species) and 116 represent the seventh-largest genus and the largest geophytic genus in the GCFR 117 (Proches et al., 2006); they bear bulbs with above-ground parts emanating from 118 seasonal stems. There is some evidence of rapid, possibly adaptive radiation of 119 southern African Oxalis, as the base of the clade is poorly resolved (Oberlander et 120 al., 2011). Results of dating the southern African crown Oxalis radiation varied 121 between an age of 9.9 and 32.2 mil. years (Oberlander et al., 2014). 122 Oberlander et al. (2011) published a phylogeny of the southern African 123 species, based on a combined dataset of plastid trnL intron, trnL-trnF intergenic 124 spacer, and trnS-trnG, as well as the internal transcribed spacers of nuclear 125 ribosomal DNA (ITS), and found that these species are monophyletic. However, the 126 phylogeny lacked good resolution and high support values for many clades, and 127 strong cytonuclear discordance was observed. For Hyb-Seq we selected 23 Oxalis 128 species from the core southern African Oxalis clade (clade four of Oberlander et al., 129 2011). Twelve of those species were from the “O. hirta and relatives clade” (clade 11 130 of Oberlander et al., 2011; hereafter the Hirta clade), which showed strong 131 cytonuclear discordance.