This is the peer reviewed version of the following article:
Schmickl, R. , Liston, A. , Zeisek, V. , Oberlander, K. , Weitemier, K. , Straub, S. C., Cronn, R. C., Dreyer, L. L. and Suda, J. (2016), Phylogenetic marker development for target enrichment from transcriptome and genome skim data: the pipeline and its application in southern African Oxalis (Oxalidaceae).
Molecular Ecology Resources, 16: 1124-1135. which has been published in final form at
https://doi.org/10.1111/1755-0998.12487. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. 1 Phylogenetic marker development for target enrichment from transcriptome
2 and genome skim data: the pipeline and its application in southern African
3 Oxalis (Oxalidaceae) 4
5 Roswitha Schmickl1,*, Aaron Liston2, Vojtěch Zeisek1,3, Kenneth Oberlander1,4, Kevin
6 Weitemier2, Shannon C.K. Straub5, Richard C. Cronn6, Léanne L. Dreyer7, Jan
7 Suda1,3
9 1 Institute of Botany, The Czech Academy of Sciences, Zámek 1, 252 43
10 Průhonice, Czech Republic
11 2 Department of Botany and Plant Pathology, Oregon State University, 2082
12 Cordley Hall, Corvallis, OR 97331, USA
13 3 Department of Botany, Faculty of Science, Charles University in Prague,
14 Benátská 2, 128 01 Prague, Czech Republic
15 4 Department of Conservation Ecology and Entomology, Stellenbosch
16 University, Private Bag X1, Matieland, 7602, South Africa
17 5 Department of Biology, Hobart and William Smith Colleges, 213 Eaton Hall,
18 Geneva, NY 14456, USA
19 6 USDA Forest Service, Pacific Northwest Research Station, 3200 SW
20 Jefferson Way, Corvallis, OR 97331, USA
21 7 Department of Botany and Zoology, Stellenbosch University, Private Bag X1,
22 Matieland, 7602, South Africa 23
24 *Correspondence: Roswitha Schmickl; Institute of Botany, The Czech Academy of
25 Sciences, Zámek 1, 252 43 Průhonice, Czech Republic; Fax: +420 221 951 645; E-
26 mail: [email protected]
27 Keywords: cytonuclear discordance, genome skimming, low-copy nuclear genes,
28 Oxalis, species tree, target enrichment
1 29
30 Running head: Low-copy nuclear genes for Hyb-Seq 31
32 Abstract
33 Phylogenetics benefits from using a large number of putatively independent nuclear
34 loci and their combination with other sources of information, such as the plastid and
35 mitochondrial genomes. To facilitate the selection of orthologous low-copy nuclear
36 (LCN) loci for phylogenetics in non-model organisms, we created an automated and
37 interactive script to select hundreds of LCN loci by a comparison between
38 transcriptome and genome skim data. We used our script to obtain LCN genes for
39 southern African Oxalis (Oxalidaceae), a speciose plant lineage in the Greater Cape
40 Floristic Region. This resulted in 1,164 LCN genes greater than 600 bp. Using target
41 enrichment combined with genome skimming (Hyb-Seq) we obtained on average
42 1,141 LCN loci, nearly the whole plastid genome and the nrDNA cistron from 23
43 southern African Oxalis species. Despite a wide range of gene trees, the phylogeny
44 based on the LCN genes was very robust, as retrieved through various gene and
45 species tree reconstruction methods as well as concatenation. Cytonuclear
46 discordance was strong. This indicates that organellar phylogenies alone are unlikely
47 to represent the species tree and stresses the utility of Hyb-Seq in phylogenetics. 48
49 Introduction
50 High-throughput sequencing (HTS) has the potential to greatly increase the amount
51 of phylogenetically informative signal in molecular datasets (Parks et al., 2009, 2012)
52 and overcome difficulties in phylogenetic reconstructions, such as polytomies and
53 low support values, that are often the result of using only a small fraction of the
54 genome. However, HTS also “opens the era of real incongruence” (Jeffroy et al.,
55 2006), and even massive amounts of sequence data do not always result in strongly
56 resolved phylogenies (Pyron, 2015). When HTS was introduced to plant
2 57 phylogenetics, sequencing of the plastid genome was its first focus (e.g., Parks et al.,
58 2009, Givnish et al., 2010). Later approaches of genome skimming, the sequencing
59 of the high-copy fractions of the nuclear, plastid and mitochondrial genome (Straub et
60 al., 2012), resulted in the assembly of the rDNA cistron and nearly the complete
61 plastid and mitochondrial genome. Currently, target enrichment (sequence capture)
62 of hundreds of loci is becoming increasingly popular in phylogenetics. In animal
63 phylogenomics non-exonic or partly exonic ultraconserved elements and their quite
64 variable flanking regions are often utilized (e.g., Faircloth et al., 2012, Hedtke et al.,
65 2013, Smith et al., 2013). For plant phylogenetics, low-copy nuclear (LCN) genes are
66 targeted (Mandel et al., 2014, Weitemier et al., 2014, Grover et al., 2015, Heyduk et
67 al., 2015, Stephens et al., 2015a, b, Mandel et al., 2015, Nicholls et al., 2015) due to
68 the paucity of ultraconserved nuclear sequences (Reneker et al., 2012).
69 Target sequencing strategies for plant nuclear genomes are largely lineage-
70 specific, requiring the de novo design of target enrichment probes. Chamala et al.
71 (2015) recently introduced a pipeline for phylogenetic marker development in
72 angiosperms using transcriptomes, and they obtained several hundred putative LCN
73 genes that can be utilized at three phylogenetic levels (genus, family, order);
74 however empirical evidence for the phylogenetic utility of these loci was not
75 demonstrated. Alternative phylogenetic marker developments, also utilizing
76 transcriptomes (Rothfels et al., 2013, Pillon et al., 2014, Tonnabel et al., 2014),
77 resulted in a much smaller number (up to 20) of mainly LCN loci, but these loci were
78 evaluated with PCR in the empirical datasets, not target enrichment. In recently
79 published phylogenies based on target enrichment of several hundred LCN genes,
80 these loci were selected from transcriptomes, gene expression studies, the literature,
81 or a combination of these sources (Mandel et al., 2014, Grover et al., 2015, Heyduk
82 et al., 2015, Stephens et al., 2015a, b, Mandel et al., 2015, Nicholls et al., 2015).
83 Weitemier et al. (2014) designed LCN probes for target enrichment based on a
84 combination of transcriptome and genome data and demonstrated their phylogenetic
3 85 utility in Asclepias L.. The limitation of this probe design pipeline is that (draft)
86 genomes are still infrequent, especially for non-model species, and are costly to
87 generate. This limitation also applies to the approach of de Sousa et al. (2014), who
88 selected 50 LCN loci from a genomic source and amplified them using target
89 enrichment. Except for Chamala et al. (2015), who offer a user-friendly but
90 empirically untested probe design pipeline, and Weitemier et al. (2014), whose Hyb-
91 Seq pipeline is designed for more advanced users, no automated probe design
92 pipeline for LCN genes is currently available.
93 In this study we developed a novel probe design pipeline for targeting
94 orthologous LCN loci for phylogenetic reconstruction by using genome skim and
95 transcriptome data. In particular, genome skim data of one accession of the studied
96 plant group were combined with a congeneric transcriptome from the 1000 Plants
97 (1KP) initiative (http://www.onekp.com/). We implemented our software workflow in
98 the user-friendly, automated and interactive BASH script Sondovač, which allows a
99 straightforward design of LCN probes also for users with limited bioinformatics skills.
100 The utility of this approach is demonstrated by the design of probes for southern
101 African Oxalis, and over 1,000 candidate LCN loci were obtained. Use of the probes
102 for targeted sequencing of these loci in 23 southern African Oxalis species resulted in
103 sufficient sequencing depth of the LCN loci, as well as the plastid and high-copy
104 nuclear genome (e.g., nuclear ribosomal DNA (nrDNA) cistron). Considering their
105 different evolutionary rates and modes of inheritance (Small et al., 2004), the
106 combination of all three datasets can substantially contribute to understanding
107 speciation from a phylogenetic perspective. The observed, strong cytonuclear
108 discordance (nuclear tree topology deviating from organellar tree topology) suggests
109 that organellar phylogenies alone do not resemble the species tree. 110
111 Materials and methods
112 Taxonomic focus
4 113 Oxalis L. (ca. 500 species) is common in the flora of the New World and a major
114 component of the Greater Cape Floristic Region (GCFR) (Born et al., 2006).
115 Southern African taxa comprise ca. 46% of the genus (ca. 230 species) and
116 represent the seventh-largest genus and the largest geophytic genus in the GCFR
117 (Proches et al., 2006); they bear bulbs with above-ground parts emanating from
118 seasonal stems. There is some evidence of rapid, possibly adaptive radiation of
119 southern African Oxalis, as the base of the clade is poorly resolved (Oberlander et
120 al., 2011). Results of dating the southern African crown Oxalis radiation varied
121 between an age of 9.9 and 32.2 mil. years (Oberlander et al., 2014).
122 Oberlander et al. (2011) published a phylogeny of the southern African
123 species, based on a combined dataset of plastid trnL intron, trnL-trnF intergenic
124 spacer, and trnS-trnG, as well as the internal transcribed spacers of nuclear
125 ribosomal DNA (ITS), and found that these species are monophyletic. However, the
126 phylogeny lacked good resolution and high support values for many clades, and
127 strong cytonuclear discordance was observed. For Hyb-Seq we selected 23 Oxalis
128 species from the core southern African Oxalis clade (clade four of Oberlander et al.,
129 2011). Twelve of those species were from the “O. hirta and relatives clade” (clade 11
130 of Oberlander et al., 2011; hereafter the Hirta clade), which showed strong
131 cytonuclear discordance. Oxalis hirta L. and Oxalis obtusa Jacq. had two accessions
132 each, as they are among the most morphologically variable taxa within the core
133 southern African Oxalis clade. Approximately one third of the accessions were
134 polyploid. Silica-dried leaf material was used in all cases. Sampling information is
135 available in Table S1 (Supporting information). 136
137 Target enrichment probe design
138 The transcriptome draft assembly of the cosmopolitan weed Oxalis corniculata L.
139 from the 1KP initiative (accession JHCN) and genome skim raw data of an accession
140 of southern African O. obtusa (accession J12; see “Illumina library preparation and
5 141 Hyb-Seq”) were combined to get hundreds of orthologous LCN loci. Enrichment of
142 multi-copy loci was minimized by using unique transcripts only, which were obtained
143 by comparing all transcripts and removing those sharing ≥90% sequence similarity
144 using BLAT v.32x1 (Kent, 2002). Before matching the O. obtusa genome skim data
145 against those unique transcripts, reads of plastid and mitochondrial origin were
146 removed with Bowtie 2 (Langmead and Salzberg, 2012), SAMtools (Li et al., 2009)
147 and bam2fastq (http://gsl.hudsonalpha.org/information/software/bam2fastq) utilizing
148 Ricinus communis L. GenBank accessions NC_016736 and NC_015141 as
149 references. Paired-end reads were subsequently combined with FLASH (Magoč and
150 Salzberg, 2011). These processed reads were matched against the unique O.
151 corniculata transcripts sharing ≥85% sequence similarity with BLAT. Transcripts with
152 >1000 BLAT hits, indicating repetitive elements, and O. obtusa BLAT hits containing
153 masked nucleotides were removed before de novo assembly of the O. obtusa BLAT
154 hits to larger contigs with Geneious v.6.1.7 (Kearse et al., 2012), using the medium
155 sensitivity / fast setting. After assembly, only those contigs that comprised exons
156 ≥120 bp and had a total locus length ≥600 bp were retained. To ensure that probes
157 did not target multiple similar loci, any probe sequences sharing ≥90% sequence
158 similarity were removed using cd-hit-est v.4.5.4 (Li and Godzik, 2006), followed by a
159 second filtering step for contigs containing exons ≥120 bp and totaling loci length
160 ≥600 bp. To ensure that plastid sequences were absent from the probes, the probe
161 sequences were matched against the Ricinus plastome reference sharing ≥90%
162 sequence similarity with BLAT, and the hits were removed from the probe set.
163 Repetitive elements were then masked with RepeatMasker
164 (http://www.repeatmasker.org/). Ambiguous sites in the probe sequences, which
165 were generated during assembly of the genome skim reads, were randomly replaced
166 by one of the relevant nucleotides and stretches of up to 5N replaced by T. Tiling
167 density 2☓ was used.
6 168 The workflow described above up to the removal of remaining plastid
169 sequences is summarized in Figure 1. It was implemented in an automated and
170 interactive BASH script named Sondovač, which is deposited in GitHub
171 (https://github.com/V-Z/sondovac/wiki/) and licensed under open-source license GPL
172 v.3 allowing further modifications. The script runs on major Linux distributions and
173 Mac OS X; it runs on a standard desktop computer equipped with modern CPU like
174 Intel i5 or i7. Strong bioinformatics skills and access to high-performance computer
175 clusters are not required. 176
177 Illumina library preparation and Hyb-Seq
178 Genomic DNA was extracted according to the CTAB protocol (Doyle and Doyle,
179 1987). The genome skim data of O. obtusa accession J12 was obtained with 250 bp
180 paired-end reads from a partial lane of an Illumina MiSeq (San Diego, California,
181 USA) performed by StarSEQ (Mainz, Germany). For the other 23 species of Oxalis,
182 used for Hyb-Seq, 200 ng to 1 µg extracted DNA was sheared with a Covaris
183 (Woburn, Massachusetts, USA) S220 sonicator using the program for fragmentation
184 to 1,000 bp for 45 sec (200 cycles, 4°C). Library preparation followed the NEBNext
185 Ultra DNA Library Prep (New England Biolabs, Ipswich, Massachusetts, USA)
186 protocol for Illumina with a few modifications: (1) Size-selection (~600–650 bp) was
187 performed on a 1% agarose gel and (2) two additional cleanup steps were
188 implemented, one after adapter-ligation with the QIAquick PCR Purification Kit
189 (Qiagen, Venlo, Netherlands), and a second after gel extraction with the QIAquick
190 Gel Extraction Kit (Qiagen). (3) Enriched PCR products were cleaned up with the
191 QIAquick PCR Purification Kit and subsequently with Agencourt AMPure XP beads
192 (Beckman Coulter Genomics, Danvers, Massachusetts, USA). Amplification of
193 ligated, size-selected fragments was performed with 10 cycles of PCR, using
194 NEBNext Multiplex Oligos for Illumina Index Primers Set 1 and 2 (New England
195 Biolabs). Libraries were subsequently pooled in approximately equimolar ratios in a
7 196 24-plex reaction. Solution hybridization with MyBaits biotinylated RNA baits
197 (MYcroarray, Ann Arbor, Michigan, USA), which were synthesized from our custom-
198 designed probes, and enrichment followed the MYbaits manual v.1.3.8 with
199 approximately 200 ng of input DNA (9 ng per accession) and 12 cycles of PCR
200 enrichment. Target-enriched libraries were sequenced on an Illumina MiSeq at
201 Oregon State University (v.3 chemistry) to obtain 150 bp paired-end reads. All DNA
202 concentration measurements were performed with the Qubit 2.0 fluorometer
203 (Invitrogen / Life Technologies, Carlsbad, California, USA). 204
205 Data analysis pipelines for quality filtering, assembly, alignment, and quality
206 assessment of the Hyb-Seq data
207 Adapter sequences and low quality reads were removed with Trimmomatic v.0.30
208 (Bolger et al., 2014). In case of quality 209 discarded. The remaining part of the read was trimmed, if average quality in a 5 bp 210 window was 211 FASTX-Toolkit (Gordon and Hannon, 2010) was used to remove duplicate reads. 212 Reference-guided assembly of the targeted loci was performed with Alignreads v. 213 2.25 (Straub et al., 2011), using the probe sequences separated by a string of 200 214 Ns each as reference and the parameter settings of Weitemier et al. (2014). Steps up 215 to the final alignments of the LCN loci were performed following Weitemier et al. 216 (2014). Although the LCN loci were created by randomly concatenating the 217 respective exons, they will be called genes. All analyses based on the LCN genes 218 were conducted on 727 loci that contained sequence information for at least part of 219 these genes for all accessions; the remaining 437 genes out of the total 1,164 220 targeted had completely missing sequences for certain accessions. 221 The plastid genome and the nrDNA cistron (18S-ITS1-5.8S-ITS2-26S) were 222 also assembled with Alignreads, utilizing Oxalis reference sequences that were built 223 according to Straub et al. (2011, 2012). Genome skim reads of O. obtusa were 8 224 quality trimmed, using the same settings as for the Hyb-Seq data, and duplicate 225 reads were removed. Alignreads was run with the following masking parameters, 226 utilizing the plastid genome sequence of Ricinus communis (GenBank accession 227 JF937588) as reference: Any base with sequencing depth <5 was masked, and 228 single nucleotide polymorphisms (SNPs) were only called in O. obtusa if 80% of 229 reads supported that SNP with sequencing depth ≥25 at that site. Read type 454, 230 accounting for longer read length, and linear setting were chosen in YASRA, the 231 assembler within Alignreads, and medium percentage sequence identity between 232 genome skim data and reference sequence was chosen, as this resulted in the 233 highest quality assembly (data not shown). Plastome regions present in the Ricinus 234 reference, but absent in O. obtusa, were masked. Hyb-Seq reads were then 235 assembled with guidance of the draft Oxalis plastome reference with the same 236 settings used for read assembly of the targeted nuclear loci. The resulting contigs of 237 all accessions were aligned with Mulan (Ovcharenko et al., 2005), utilizing the draft 238 Oxalis plastome reference sequence. Positions in consensus sequences were 239 masked in the alignment editor MEGA v.6 (Tamura et al., 2013) in case of (1) regions 240 with many SNPs compared to the reference due to wrongly assembled indels or 241 SNPs between overlapping contig ends, or (2) insertions not found in other Oxalis 242 accessions present at contig ends. Visual inspection of the assemblies was 243 performed with Tablet v.1.14.04.10 (Milne et al., 2013). 244 The nrDNA cistron (without the external transcribed spacer due to repetitive 245 elements) was assembled with Alignreads based on an Oxalis reference that was 246 built from a 774 bp partial sequence of O. obtusa (GenBank accession EU436922). 247 The same masking parameters as for building the Oxalis draft plastome reference 248 were chosen. Hyb-Seq reads were assembled to this reference with Alignreads 249 without masking parameters. In cases of more than one contig per accession and 250 divergent SNPs between them, such sites were masked. The consensus sequences 251 were aligned using MAFFT v.6864b (Katoh and Toh, 2008) with default settings. 9 252 Data completeness of all final alignments was calculated considering missing 253 data per base pair, and in case of the LCN gene matrix also considering missing 254 number of targeted exons and missing number of targeted genes, which were 255 regarded missing if all targeted exons failed enrichment. Sequence similarity 256 between probes and assembled reads of each Oxalis accession was estimated using 257 BLAT, based on a minimum 85% sequence similarity. 258 259 Plastome annotation 260 Annotation of the draft plastid genome was performed with DOGMA (Wyman et al., 261 2004) on the Oxalis accession with the longest and one of the most complete 262 plastome sequence (Oxalis hirsuta Sond.). The annotated plastome was visualized 263 with GenomeVx (Conant and Wolfe, 2008). 264 265 Phylogenetic network analysis 266 Putative conflicting signals within the LCN gene, plastome and nrDNA cistron 267 datasets were visualized with NeighborNet (Bryant and Moulton, 2004) implemented 268 in SplitsTree v.4.13.1 (Huson and Bryant, 2006) using concatenated sequence 269 alignments, and networks were constructed based on uncorrected pairwise matrices. 270 Bootstrapping was performed with 100 replicates. 271 272 Phylogenetic tree reconstructions and gene-gene/-species tree comparisons 273 Maximum likelihood (ML) and Bayesian Markov chain Monte Carlo (MCMC) / 274 Bayesian inference (BI) methods were used for phylogenetic reconstruction of gene 275 trees. ML gene trees were run with RAxML v.7.3.0 (Stamatakis, 2006) with the GTR 276 + Г nucleotide substitution model and rapid bootstrap with 100 replicates each. 277 Bayesian MCMC analysis was performed with MrBayes v.3.2 (Ronquist and 278 Huelsenbeck, 2003), utilizing sampling across substitution models and Г correction. 279 Two simultaneous runs with four chains each were performed for 5 mil. generations, 10 280 in each run 1,001 trees were sampled, of which the first 25% were discarded as 281 burn-in. In order to diagnose convergence between the individual Bayesian MCMC 282 runs in such a large dataset, three measures of convergence were chosen: (1) 283 average effective sample size (avgESS) >100, (2) potential scale reduction factor 284 (PSRF) ~1.0, and (3) average standard deviation of split frequencies (ASDSF) <0.05. 285 Phylogenetic hypotheses based on the targeted LCN genes were inferred in 286 three different ways, which are intensely debated (e.g., von Haeseler, 2012, 287 DeGiorgio et al., 2014, Liu et al., 2015, Tonini et al., 2015): (1) species tree 288 reconstruction under the multispecies coalescent model utilizing three different 289 programs: (i) ASTRAL (Mirarab et al., 2014), which finds the species tree that agrees 290 with the largest number of quartet trees induced by the set of gene trees, (ii) MP-EST 291 (Liu et al., 2010), which uses maximum pseudo-likelihood for the estimation of 292 species trees, and (iii) STAR (Liu et al., 2009), which uses average ranks of gene 293 coalescence times to build species trees; bootstrapping of the STAR species tree 294 was performed according to the multilocus bootstrap method of Seo (2008), (2) a 295 supertree approach using matrix representation parsimony (MRP) (Baum, 1992, 296 Ragan, 1992), and (3) a supermatrix approach using concatenation of the dataset. 297 The first two methods were performed both on the set of MrBayes maximum 298 posterior probability trees and RAxML best trees. The concatenated dataset was run 299 with RAxML, applying the GTR + Г nucleotide substitution model and rapid bootstrap 300 with 100 replicates. 301 Topological differences between gene trees were assessed by calculating the 302 Robinson-Foulds (RF) distance (Robinson and Foulds, 1981) with the R function 303 RF.dist of the phangorn package (Schliep, 2011) and visualized as principal 304 coordinate (PCoA) plot. Topological differences between the gene trees and the 305 STAR species tree were calculated as modified RF distance (RFD) using STRAW 306 (Shaw et al., 2013) and visualized as a histogram. 11 307 Phylogenetic reconstruction based on the draft plastid genome and nrDNA 308 cistron was performed using MrBayes on partitioned datasets. In the case of the 309 plastome data the alignment was partitioned into protein-, tRNA-, and rRNA-coding 310 as well as intron/spacer (i.e., non-coding) regions. The nrDNA alignment was 311 partitioned into 18S, ITS1, 5.8S, ITS2 and 26S. All partitions were sampled across 312 substitution models, and an initial partition-specific among-site rate variation 313 correction, estimated by PartitionFinder v.1.1.1 (Lanfear et al., 2012), was employed, 314 which was updated based on output from initial MrBayes runs. All parameters were 315 unlinked across partitions, and each partition had its own, independent evolutionary 316 rate. Two simultaneous runs with four chains each were performed for 100 mil. 317 generations, in each run 1,001 trees were sampled, of which the first 25% were 318 discarded as burn-in. For plastome analysis, the complete protein-coding plastid 319 complement (Moore et al., 2010) and inverted repeat of Oxalis latifolia Kunth 320 (GenBank accession HQ664602) was used as outgroup, whereas O. corniculata 321 served as outgroup both in the LCN and nrDNA cistron dataset. However, as O. 322 corniculata and O. latifolia are phylogenetically very distantly related, and the branch 323 leading to O. corniculata was very long in all analyses, these outgroups were 324 trimmed from the final trees and trees were rooted using Oxalis imbricata Eckl. & 325 Zeyh. in the comparison of phylogenetic hypotheses based on all three datasets. 326 327 Results 328 Probe design, sequencing, quality trimming, assembly, alignment and quality 329 assessment 330 Design of Oxalis LCN probes resulted in 4,926 exons ≥120 bp length and 1,164 331 genes (Table S2, Supporting information) of total 1,127,209 bp. Gene length ranged 332 from 600 to 4,125 bp with a mean of 968 bp. Adapter and quality-filtering resulted in 333 an average loss of 7% of the reads (Table S3, Supporting information). In O. 334 palmifrons T.M. Salter 22% of the reads were dropped, indicating its poor initial DNA 12 335 quality; DNA of this accession was nearly degraded. After duplicate read removal on 336 average 47% of quality-filtered reads remained (Table S3, Supporting information). 337 This was a high number of duplicate reads, likely the result of PCR duplicates from 338 too many PCR cycles during genomic library preparation and enrichment. Of the 339 quality-filtered reads after duplicate removal on average 59% mapped to the LCN 340 genes (on-target), 27% to the draft plastome and 3% to the nrDNA cistron reference 341 (Table S3, Supporting information). The mean number of targeted loci with complete 342 or partial sequence information was 4,124 for exons (from total of 4,926) and 1,141 343 for genes (from total of 1,164) (Table S2, Supporting information). Sequences 344 diverged by 3–4% between reads and LCN probes (Table S2, Supporting 345 information), and sequence divergence was 11% between probes and the O. 346 corniculata transcriptome used to define exon boundaries. Completeness of the 347 datasets were as follows: 90% for the LCN exons, 96% for the plastome, and 96% for 348 the nrDNA cistron (Table S2, Supporting information). These mean values are an 349 underestimate due to two accessions with exceptionally low read numbers (O. hirta 350 J62, O. palmifrons). Mean sequencing depth for the datasets was 16 (LCN exons), 351 173 (plastome) and 290 (nrDNA cistron) (Table S2, Supporting information). 352 353 Plastome annotation 354 The plastid genome was annotated to enable partitioning of the alignment for 355 phylogenetic reconstruction, and the annotation can be summarized as follows (Fig. 356 S1, Supporting information): 79 protein-coding genes were found, including three 357 genes that were absent in closely related Ricinus (infA, ycf15, ycf68). Two genes 358 (rps16, rpl32) were missing in Oxalis compared to Ricinus. All 30 tRNA-coding genes 359 were present, as were all four rRNA-coding genes. 360 361 Phylogenetic reconstructions 362 1) Phylogenetic networks 13 363 In the phylogenetic network based on the concatenated LCN gene matrix (Fig. S2A, 364 Supporting information) splits were numerous but short, and in the plastome network 365 (Fig. S2B, Supporting information) splits were largely absent, thereby justifying the 366 use of phylogenetic tree reconstruction methods for those two datasets. In contrast, 367 the nrDNA cistron network contained numerous splits (Fig. S2C, Supporting 368 information). 369 2) Robustness of phylogenetic hypothesis based on the LCN genes 370 Although all avgESS estimates were satisfactory, a small minority of PSRF and 371 ASDSF values (<20) strongly differed from expectation (Fig. S3, Supporting 372 information). Removing these loci from species tree reconstruction idiosyncratically 373 affected the uncertain nodes discussed in the following (data not shown), and the loci 374 were kept. Species tree topology was relatively robust to both the method of species 375 and gene tree reconstruction, as the major clades were obtained and relationships 376 within clades were nearly identical in all cases (Fig. S4, Supporting information): (1) 377 Topology within the Hirta clade was identical between the different methods except 378 for the weakly supported position of Oxalis primuloides R. Knuth in the concatenated 379 tree with 45% bootstrap support (BS). (2) In all cases Oxalis amblyosepala Schltr. 380 and Oxalis polyphylla Jacq. formed a clade, which was in sister relationship to the 381 Hirta clade. (3) Oxalis hirsuta, Oxalis inconspicua T.M. Salter, O. obtusa, and Oxalis 382 pulchella Jacq. formed a clade, and within-clade relationships were identical with all 383 methods except for the MRP supertree based on BI of gene trees. Clustering of O. 384 inconspicua and the outgroup O. corniculata was apparently due to long-branch 385 attraction: These two species exhibited the longest branches in the tree, which is 386 likely to result in clustering together, independent on the relationships of the 387 underlying sequences (Felsenstein, 1978). (4) Phylogenetic placement of O. 388 imbricata, Oxalis orthopoda T.M. Salter, Oxalis truncatula Jacq., O. palmifrons and 389 Oxalis smithiana Eckl. & Zeyh. partly deviated between the different methods. In all 14 390 cases the latter two were in a successive sister relationship to the clade comprising 391 the Hirta clade and O. amblyosepala and O. polyphylla. 392 The number of incongruent nodes between species trees based on BI of gene 393 trees versus ML gene trees was two to three, and in a comparison of all utilized 394 species tree methods with concatenation there were six incongruent nodes (Fig. S4, 395 Supporting information). All nodes of the STAR tree had 100% BS. The 396 concatenated dataset showed weak support for all those nodes, which were 397 discussed above in (1) and (4) as ambiguous in phylogenetic placement; otherwise 398 the BS values were also 100%. 399 3) Incongruences between gene-gene and gene-species trees 400 The extensive and largely homogeneous cluster of RF distances between gene trees 401 shown in the PCoA demonstrated the great variability of gene tree topologies (Fig. 402 S5, Supporting information). RF distances between the gene trees and the STAR 403 species tree, displayed in a histogram (Fig. S6, Supporting information), was on 404 average RFD = 34, which is a relatively high value, considering that the maximum RF 405 value for this number of accessions is RFD = 46. 406 4) Comparison between phylogenetic trees based on the LCN genes, the plastome, 407 and the nrDNA cistron 408 The STAR species tree reconstruction method was chosen to compare datasets 409 (Figure 2). The plastome and nrDNA cistron trees resulted in the same major clades 410 as the LCN gene dataset: the Hirta clade, O. amblyosepala and O. polyphylla as 411 sister, and O. hirsuta, O. inconspicua, O. obtusa, and O. pulchella as a clade. 412 However, relationships within the Hirta clade were incongruently resolved, especially 413 in the plastome tree (Figure 2A). In the nrDNA cistron tree (Figure 2B) there were 414 numerous polytomies and partly low posterior probability (PP), which is possibly the 415 result of a lack of parsimony informative sites in the ribosomal RNAs combined with a 416 high rate of evolution in the spacers, hindering direct comparison. Similar to the 417 comparison of reconstruction methods of trees based on the LCN loci, phylogenetic 15 418 placement of O. imbricata, O. orthopoda, O. truncatula, O. palmifrons, and O. 419 smithiana deviated between the trees based on the three datasets; the differences 420 were slightly stronger, as O. palmifrons and O. smithiana were not in sister 421 relationship to the clade comprising the Hirta clade and O. amblyosepala and O. 422 polyphylla in the plastome and nrDNA cistron trees. All nodes of the plastome tree 423 were supported with 1 PP except for the Oxalis ciliaris Jacq. / O. hirta and Oxalis 424 tenella Jacq. / Oxalis callosa R. Knuth – O. primuloides splits (Figure 2A). The 18 425 incongruent nodes between the species tree based on the LCN loci and the plastome 426 tree revealed the strong cytonuclear discordance, which is supported by the high 427 support values in both the plastome tree and species tree based on the LCN genes 428 (Figure 2A). 429 430 Discussion 431 The value of our target enrichment probe design for plant phylogenetics and 432 its application in Hyb-Seq 433 Our LCN probe design pipeline, implemented in the BASH script Sondovač, 434 generates LCN loci from a combination of transcriptome and genome skim data. 435 Many transcriptomes can be taken from the already existing HTS resource 1KP 436 initiative, in which approximately 70% of APG III families are represented (APG III, 437 2009). Depending on the phylogenetic level of the group under study and the number 438 of LCN genes users want to obtain, a transcriptome of a more or less closer relative 439 of the study group must be utilized. Users need to provide paired-end genome skim 440 data of one of the taxa of their study group, as this accounts for the specificity of the 441 probe design. The reference sequences of the plastid and possibly mitochondrial 442 genome, which are used in the pipeline to remove organellar reads from the genome 443 skim read pool, are also part of existing HTS resources 444 (http://www.ncbi.nlm.nih.gov/genome/organelle/); plastome reference sequences 16 445 from taxa up to the same order of the studied plant group are suitable (Straub et al., 446 2012). 447 By running Sondovač for southern African Oxalis, over 1,000 orthologous 448 LCN genes of adequate length were obtained. Target enrichment of these loci 449 resulted in a nearly complete data matrix. Efficiency of target enrichment of the LCN 450 loci was similar to that reported for the Asclepias dataset by Weitemier et al. (2014) if 451 considering only their accessions with external barcodes; there were approximately 452 60% on-target, quality-filtered reads after duplicate removal, although sequence 453 divergence to the LCN probes was larger for Oxalis (average 4% compared to 1.5% 454 for Asclepias). The average sequencing depth 16☓ of LCN exons should facilitate 455 unambiguous SNP calling in orthologous genes, but also enable the identification of 456 paralogous genes in polyploid accessions, which will be an essential step towards 457 using target enrichment on polyploids. Plastome assembly and annotation suggest 458 that it is possible to obtain (nearly) the whole plastid genome with Hyb-Seq. Only a 459 few quickly evolving regions, such as the full-length copy of ycf1, were partial in this 460 analysis. 461 462 Towards a refined phylogeny of southern African Oxalis 463 Although the selected 23 Oxalis species comprised only a small subset of species 464 from the southern African Oxalis clade, preliminary phylogenetic conclusions and 465 evolutionary implications can be drawn. Major phylogenetic relationships based on 466 the LCN loci did not strongly differ from the published phylogeny of Oberlander et al. 467 (2011), but node resolution and support improved dramatically. 468 Topologies of the trees based on the LCN loci were relatively robust to the 469 methods by which they were obtained (multispecies coalescent, MRP supertree, 470 concatenation) and also to the methods through which the gene trees were estimated 471 (BI and ML). Only few highly supported, conflicting relationships were found between 17 472 the concatenated tree and the coalescent trees, indicating that the coalescent model 473 likely reduces to the concatenation model in this case (Liu et al., 2009, 2015). 474 The phylogenetic tree topology based on the LCN genes showed widespread 475 and strong incongruences with the plastome tree topology. Cytonuclear discordance 476 is considered as evidence for either incomplete lineage sorting (ILS) of the 477 chloroplast or chloroplast introgression (chloroplast capture). Studies usually 478 interpreted this discordance in terms of hybridization (e.g., Helianthus L.: Dorado et 479 al., 1992, Mitella L.: Okuyama et al., 2005, Ficus L.: Renoult et al., 2009, 480 Senecioneae Cass.: Pelser et al., 2010), although the presence of ILS was rarely 481 statistically tested. In addition to cytonuclear discordance, both gene-gene and gene- 482 species trees showed quite strong topological incongruences in Oxalis. There was an 483 immense variety of gene tree topologies in general, and gene tree topologies strongly 484 differed from the species tree topology. The underlying reasons for this topological 485 variation need to be found by testing for ILS, hybridization, paralogy and a 486 combination of those under coalescent models (e.g., Rasmussen and Kellis, 2012, 487 Yu et al., 2014). Given the very short branch lengths at the base of the southern 488 African Oxalis clade (Oberlander et al., 2011) as well as generally large population 489 sizes of most species, strong ILS effects are perhaps to be expected, particularly at 490 the base of the putative southern African radiation. The extent of hybridization in 491 Oxalis is poorly known. The only confirmed hybrid is South American Oxalis tuberosa 492 Molina (Emshwiller and Doyle, 2002). Both heterostyly and pollinator specificity may 493 reduce the potential for hybridization: In Oxalis tristyly is predominant and distyly rare 494 (Weller et al., 2007, Gardner et al., 2011), meaning that three (in distyly two) floral 495 morphs with reciprocal placement of stigma and anthers persist in a population. 496 Heterostyly promotes precise pollen transfer, which could thus promote reproductive 497 isolation and prevent hybridization, if the position of sexual organs is not similar 498 between the mating individuals (Keller et al., 2012). Dense taxon sampling in the 499 southern African lineage is required to address the extent of hybridization. Finally, 18 500 gene duplication and loss could well be a major cause of the observed 501 incongruences, especially in the polyploid accessions, but testing that requires the 502 identification of paralogous genes. A bioinformatic pipeline for paralogue 503 identification of LCN genes from target enrichment data is currently under 504 development by AL. 505 The observed, strong cytonuclear discordance suggests that organellar 506 phylogenies alone are unlikely to represent the species tree (Davis et al., 2014), and 507 the strong incongruences between gene-gene and gene-species trees imply that a 508 large number of LCN loci is needed to robustly resolve phylogenies in the presence 509 of ILS and hybridization, especially of putatively radiating lineages such as southern 510 African Oxalis. 511 512 Acknowledgements 513 The authors thank Gane Ka-Shu Wong (University of Alberta) and Douglas E. Soltis 514 (University of Florida) for access to the 1KP transcriptome data, Tomáš Fér (Charles 515 University in Prague) for access to the genome skim data, Lenka Flašková (Charles 516 University in Prague) for DNA extraction, Filip Pardy (Central European Institute of 517 Technology, Brno) for help with sonication, Mark Dasenko (Oregon State University 518 Center for Genome Research and Biocomputing / CGRB) for Illumina sequencing 519 support, Sanjuro Jodgeo (Oregon State University) for data analysis support, and 520 Daisie Huang, Jennifer Mandel and an anonymous reviewer for constructive 521 comments. Data were analysed on the CGRB computer cluster, on MetaCentrum 522 under the program LM2010005, and on CERIT-SC under the program Centre CERIT 523 Scientific Cloud. MetaCentrum and CERIT-SC are part of the Operational Program 524 Research and Development for Innovations, Reg. no. CZ.1.05/3.2.00/08.0144, Czech 525 Republic. This work, including three stays in the Liston Laboratory, was financially 526 supported to R.S. by project Reg. no. CZ.1.07/2.3.00/30.0048 of the European Social 527 Fund in the Czech Republic through the Operational Program Education for 19 528 Competitiveness. Additional funding came from the long-term research development 529 projects RVO 67985939 (The Czech Academy of Sciences) and institutional 530 resources of the Ministry of Education, Youth and Sports of the Czech Republic for 531 the support of science and research. 532 533 Author contributions 534 A.L., R.S., and J.S. designed the study, R.S. and A.L. developed the probe design 535 pipeline, V.Z. converted this pipeline into the BASH script Sondovač, J.S., K.O., and 536 L.D. provided samples, R.S. conducted laboratory work, K.W., S.C.K.S., and R.C.C. 537 advised on data collection and computational analyses, R.S., V.Z., and K.O. 538 analysed the data, and R.S. wrote the initial draft of the manuscript. All authors 539 contributed to the final version of the manuscript. 540 541 References 542 The Angiosperm Phylogeny Group (2009) An update of the Angiosperm Phylogeny 543 Group classification for the orders and families of flowering plants: APG III. Botanical 544 Journal of the Linnean Society, 161, 105–121. 545 Baum BR (1992) Combining trees as a way of combining data sets for phylogenetic 546 inference, and the desirability of combining gene trees. Taxon, 41, 3–10. 547 Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina 548 sequence data. Bioinformatics, 30, 2114–2120. 549 Born J, Linder HP, Desmet P (2006) The Greater Cape Floristic Region. Journal of 550 Biogeography, 34, 147–162. 551 Bryant D, Moulton V (2004) Neighbor-Net: an agglomerative method for the 552 construction of phylogenetic networks. Molecular Biology and Evolution, 21, 255– 553 265. 20 554 Chamala S, García N, Godden GT et al. (2015) MarkerMiner 1.0: a new application 555 for phylogenetic marker development using angiosperm transcriptomes. Applications 556 in Plant Sciences, 3, 1400115. 557 Conant GC, Wolfe KH (2008) GenomeVx: simple web-based creation of editable 558 circular chromosome maps. Bioinformatics, 24, 861–862. 559 Davis CC, Xi Z, Mathews S (2014) Plastid phylogenomics and green plant 560 phylogeny: almost full circle but not quite there. BMC Biology, 12, 11. 561 DeGiorgio M, Syring J, Eckert AJ et al. (2014) An empirical evaluation of two-stage 562 species tree inference strategies using a multilocus dataset from North American 563 pines. BMC Evolutionary Biology, 14, 67. 564 De Sousa F, Bertrand YJK, Nylinder S et al. (2014) Phylogenetic properties of 50 565 nuclear loci in Medicago (Leguminosae) generated using multiplexed sequence 566 capture and next-generation sequencing. PLoS ONE, 9, e109704. 567 Dorado O, Rieseberg LH, Arias DM (1992) Chloroplast DNA introgression in 568 southern California sunflowers. Evolution, 46, 566–572. 569 Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of 570 fresh leaf tissue. Phytochemical Bulletin, 19, 11–15. 571 Emshwiller E, Doyle JJ (2002) Origins of domestication and polyploidy in oca (Oxalis 572 tuberosa; Oxalidaceae). 2. Chloroplast-expressed glutamine synthetase data. 573 American Journal of Botany, 89, 1042–1056. 574 Faircloth BC, McCormack JE, Crawford NG et al. (2012) Ultraconserved elements 575 anchor thousands of genetic markers for target enrichment spanning multiple 576 evolutionary timescales. Systematic Biology, 61, 717–726. 577 Felsenstein J (1978) Cases in which parsimony or compatibility methods will be 578 positively misleading. Systematic Zoology, 27, 401–410. 579 Givnish TJ, Ames M, McNeal JR et al. (2010) Assembling the tree of the 580 Monocotyledons: plastome sequence phylogeny and evolution of Poales. Annals of 581 the Missouri Botanical Garden, 97, 584–616. 21 582 Gordon A, Hannon GJ (2010) FASTX-Toolkit. FASTQ/A short-reads pre-processing 583 tools. http://hannonlab.cshl.edu/fastx_toolkit/ [accessed 29 May 2015]. 584 Grover CE, Gallagher JP, Jareczek JJ et al. (2015) Re-evaluating the phylogeny of 585 allopolyploid Gossypium L.. Molecular Phylogenetics and Evolution, 92, 45–52. 586 Hedtke SM, Morgan MJ, Cannatella DC, Hillis DM (2013) Targeted enrichment: 587 maximizing orthologous gene comparisons across deep evolutionary time. PLoS 588 ONE, 8, e67908. 589 Heyduk K, Trapnell DW, Barrett CF, Leebens-Mack J (2015) Phylogenomic analyses 590 of species relationships in the genus Sabal (Arecaceae) using targeted sequence 591 capture. Biological Journal of the Linnean Society, early view. 592 Huson DH, Bryant D (2006) Application of phylogenetic networks in evolutionary 593 studies. Molecular Biology and Evolution, 23, 254–267. 594 Jeffroy O, Brinkmann H, Delsuc F, Philippe H (2006) Phylogenomics: the beginning 595 of incongruence. Trends in Genetics, 22, 225–231. 596 Katoh K, Toh H (2008) Recent developments in the MAFFT multiple sequence 597 alignment program. Briefings in Bioinformatics, 9, 286–298. 598 Kearse M, Moir R, Wilson A et al. (2012) Geneious Basic: an integrated and 599 extendable desktop software platform for the organization and analysis of sequence 600 data. Bioinformatics, 28, 1647–1649. 601 Keller B, de Vos JM, Conti E (2012) Decrease of sexual organ reciprocity between 602 heterostylous primrose species, with possible functional and evolutionary 603 implications. Annals of Botany, 110, 1233–1244. 604 Kent WJ (2002) BLAT – the BLAST-like alignment tool. Genome Research, 12, 656– 605 664. 606 Lanfear R, Calcott B, Ho SY, Guindon S (2012) PartitionFinder: combined selection 607 of partitioning schemes and substitution models for phylogenetic analyses. Molecular 608 Biology and Evolution, 29, 1695–1701. 22 609 Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nature 610 Methods, 9, 357–359. 611 Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets 612 of protein or nucleotide sequences. Bioinformatics, 22, 1658–1659. 613 Li H, Handsaker B, Wysoker A et al. (2009) The Sequence Alignment/Map format 614 and SAMtools. Bioinformatics, 25, 2078–2079. 615 Liu L, Xi Z, Wu S, Davis C, Edwards SV (2015) Estimating phylogenetic trees from 616 genome-scale data. ArXiv:1501.03578. 617 Liu L, Yu L, Edwards SV (2010) A maximum pseudo-likelihood approach for 618 estimating species trees under the coalescent model. BMC Evolutionary Biology, 10, 619 302. 620 Liu L, Yu L, Kubatko L et al. (2009) Coalescent methods for estimating phylogenetic 621 trees. Molecular Phylogenetics and Evolution, 53, 320–328. 622 Liu L, Yu L, Pearl DK, Edwards SV (2009) Estimating species phylogenies using 623 coalescence times among sequences. Systematic Biology, 58, 468–477. 624 Magoč T, Salzberg SL (2011) FLASH: fast length adjustment of short reads to 625 improve genome assemblies. Bioinformatics, 27, 2957–2963. 626 Mandel JR, Dikow RB, Funk VA (2015) Using phylogenomics to resolve mega- 627 families: an example from Compositae. Journal of Systematics and Evolution, 53, 628 391–402. 629 Mandel JR, Dikow RB, Funk VA et al. (2014) A target enrichment method for 630 gathering phylogenetic information from hundreds of loci: an example from the 631 Compositae. Applications in Plant Sciences, 2, 1300085. 632 Milne I, Stephen G, Bayer M et al. (2013) Using Tablet for visual exploration of 633 second-generation sequencing data. Briefings in Bioinformatics, 14, 193–202. 634 Mirarab S, Reaz R, Bayzid MdS et al. (2014) ASTRAL: genome-scale coalescent- 635 based species tree estimation. Bioinformatics, 30, i541–i548. 23 636 Moore MJ, Soltis PS, Bell CD et al. (2010) Phylogenetic analysis of 83 plastid genes 637 further resolves the early diversification of eudicots. Proceedings of the National 638 Academy of Sciences of the United States of America, 107, 4623–4628. 639 Nicholls JA, Pennington RT, Koenen EJM et al. (2015) Using targeted enrichment of 640 nuclear genes to increase phylogenetic resolution in the neotropical rain forest genus 641 Inga (Leguminosae: Mimosoideae). Frontiers in Plant Science, 642 doi:10.3389/fpls.2015.00710. 643 Oberlander KC, Dreyer LL, Bellstedt DU (2011) Molecular phylogenetics and origins 644 of southern African Oxalis. Taxon, 60, 1667–1677. 645 Oberlander KC, Roets F, Dreyer LL (2014) Pre-Pleistocene origin of an endangered 646 habitat: links between vernal pools and aquatic Oxalis in the Greater Cape Floristic 647 Region of South Africa. Journal of Biogeography, 41, 1572–1582. 648 Okuyama Y, Fujii N, Wakabayashi M et al. (2005) Nonuniform concerted evolution 649 and chloroplast capture: heterogeneity of observed introgression patterns in three 650 molecular data partition phylogenies of Asian Mitella (Saxifragaceae). Molecular 651 Biology and Evolution, 22, 285–296. 652 Ovcharenko I, Loots GG, Giardine BM et al. (2005) Mulan: multiple-sequence local 653 alignment and visualization for studying function and evolution. Genome Research, 654 15, 184–194. 655 Parks M, Cronn R, Liston A (2009) Increasing phylogenetic resolution at low 656 taxonomic levels using massively parallel sequencing of chloroplast genomes. BMC 657 Biology, 7, 84. 658 Parks M, Cronn R, Liston A (2012) Separating the wheat from the chaff: mitigating 659 the effects of noise in a plastome phylogenomic dataset from Pinus L. (Pinaceae). 660 BMC Evolutionary Biology, 12, 100. 661 Pelser PB, Kennedy AH, Tepe EJ et al. (2010) Patterns and causes of incongruence 662 between plastid and nuclear Senecioneae (Asteraceae) phylogenies. American 663 Journal of Botany, 97, 856–873. 24 664 Pillon Y, Johansen J, Sakishima T et al. (2014) Primers for low-copy nuclear genes in 665 Metrosideros and cross-amplification in Myrtaceae. Applications in Plant Sciences, 2, 666 1400049. 667 Proches S, Cowling RM, Goldblatt P et al. (2006) An overview of the Cape 668 geophytes. Biological Journal of the Linnean Society, 87, 27–43. 669 Pyron RA (2015) Post-molecular systematics and the future of phylogenetics. Trends 670 in Ecology and Evolution, 30, 384–389. 671 Ragan MA (1992) Phylogenetic inference based on matrix representation of trees. 672 Molecular Phylogenetics and Evolution, 1, 53–58. 673 Rasmussen MD, Kellis M (2012) Unified modeling of gene duplication, loss, and 674 coalescence using a locus tree. Genome Research, 22, 755–765. 675 Reneker J, Lyons E, Conant GC et al. (2012) Long identical multispecies elements in 676 plant and animal genomes. Proceedings of the National Academy of Sciences of the 677 United States of America, 109, E1183–E1191. 678 Renoult JP, Kjellberg F, Grout C et al. (2009) Cyto-nuclear discordance in the 679 phylogeny of Ficus section Galoglychia and host shifts in plant-pollinator 680 associations. BMC Evolutionary Biology, 9, 248. 681 Robinson DF, Foulds LR (1981) Comparison of phylogenetic trees. Mathematical 682 Biosciences, 53, 131–147. 683 Ronquist F, Huelsenbeck J (2003) MrBayes 3: Bayesian phylogenetic inference 684 under mixed models. Bioinformatics, 19, 1572–1574. 685 Rothfels CJ, Larsson A, Li F-W et al. (2013) Transcriptome-mining for single-copy 686 nuclear markers in ferns. PLoS ONE, 8, e76957. 687 Schliep KP (2011) phangorn: phylogenetic analysis in R. Bioinformatics, 27, 592– 688 593. 689 Seo TK (2008) Calculating bootstrap probabilities of phylogeny using multilocus 690 sequence data. Molecular Biology and Evolution, 25, 960–971. 25 691 Shaw TI, Ruan Z, Glenn TC, Liu L (2013) STRAW: Species TRee Analysis Web 692 server. Nucleic Acids Research, 41, W238–W241. 693 Small RL, Cronn RC, Wendel JF (2004) Use of nuclear genes for phylogeny 694 reconstruction in plants. Australian Systematic Botany, 17, 145–170. 695 Smith BT, Harvey MG, Faircloth BC et al. (2013) Target capture and massively 696 parallel sequencing of ultraconserved elements for comparative studies at shallow 697 evolutionary time scales. Systematic Biology, 0, 1–13. 698 Stamatakis A (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic 699 analyses with thousands of taxa and mixed models. Bioinformatics, 22, 2688–2690. 700 Stephens JD, Rogers WL, Heyduk K et al. (2015a) Resolving phylogenetic 701 relationships of the recently radiated carnivorous plant genus Sarracenia using target 702 enrichment. Molecular Phylogenetics and Evolution, 85, 76–87. 703 Stephens JD, Rogers WL, Mason CM et al. (2015b) Species tree estimation of 704 diploid Helianthus (Asteraceae) using target enrichment. American Journal of Botany, 705 102, 910–920. 706 Straub SCK, Fishbein M, Livshultz T et al. (2011) Building a model: developing 707 genomic resources for common milkweed (Asclepias syriaca) with low coverage 708 genome sequencing. BMC Genomics, 12, 211. 709 Straub SC, Parks M, Weitemier K et al. (2012) Navigating the tip of the genomic 710 iceberg: next-generation sequencing for plant systematics. American Journal of 711 Botany, 99, 349–364. 712 Tamura K, Stecher G, Peterson D et al. (2013) MEGA6: molecular evolutionary 713 genetics analysis version 6.0. Molecular Biology and Evolution, 30, 2725–2729. 714 Tonini J, Moore A, Stern D et al. (2015) Concatenation and species tree methods 715 exhibit statistically indistinguishable accuracy under a range of simulated conditions. 716 PLoS Currents, 7, ecurrents.tol.34260cc27551a527b124ec5f6334b6be. 717 Tonnabel J, Olivieri I, Mignot A et al. (2014) Developing nuclear DNA phylogenetic 718 markers in the angiosperm genus Leucadendron (Proteaceae): a next-generation 26 719 sequencing transcriptomic approach. Molecular Phylogenetics and Evolution, 70, 37– 720 46. 721 von Haeseler A (2012) Do we still need supertrees? BMC Biology, 10, 13. 722 Weitemier K, Straub SCK, Cronn RC et al. (2014) Hyb-Seq: combining target 723 enrichment and genome skimming for plant phylogenomics. Applications in Plant 724 Sciences, 2, 1400042. 725 Wyman SK, Jansen RK, Boore JL (2004) Automatic annotation of organellar 726 genomes with DOGMA. Bioinformatics, 20, 3252–3255. 727 Yu Y, Dong J, Liu KJ, Nakhleh L (2014) Maximum likelihood inference of reticulate 728 evolutionary histories. Proceedings of the National Academy of Sciences of the 729 United States of America, 111, 16448–16453. 730 731 Data accessibility 732 Sondovač is deposited in GitHub (https://github.com/V-Z/sondovac/wiki/) together 733 with the input Oxalis data (genome skim paired-end reads, plastid and mitochondrial 734 reference). There we provided a link to the transcriptome data, which were obtained 735 in the framework of the 1KP initiative. Raw reads, assemblies, alignments, 736 phylogenetic trees, the Geneious de novo assembly output and the final probe 737 sequences are available under Dryad doi:10.5061/dryad.dn08t. 738 27 739 Figures and tables 740 741 Figure 1 742 Workflow of the probe design script Sondovač. An overview on the main steps of 743 Hyb-Seq are given in the top part of the figure; probe design is the first one. Each 744 step of Sondovač is numbered and illustrated by three boxes each: Software is 745 highlighted in dark gray, a summary of each step is given in medium gray, and 746 input/output of each step is depicted in light gray. Optional removal of reads of 747 mitochondrial origin from the genome skim data is marked by decoloration of the text. 748 The required input files of Sondovač are highlighted in bold. The direction of the 749 workflow is indicated by arrows. 750 751 Figure 2 752 Comparison of phylogenetic hypotheses based on 727 LCN genes, the plastid 753 genome and the nrDNA cistron of southern African Oxalis. (A) Comparison between 754 the STAR species tree, based on maximum likelihood gene trees, and the plastome 755 tree. (B) Comparison between the STAR species tree and the nrDNA cistron tree. 756 Dashed lines connect each accession to its placement in the contrasting tree. Values 757 close to the nodes are bootstrap support (* = 100%) in the STAR species tree or 758 posterior probability (* = 1) in the plastome and nrDNA cistron tree. The Hirta clade 759 (Oberlander et al., 2011) is colored in light gray (respectively green in the on-line 760 color figure). Nodes that are incongruent between the contrasting phylogenetic trees 761 of the different datasets are marked with arrows. 28 Hybridization between Bait synthesis, based Probe design baits and genomic Sequencing Data analysis on probe sequences libraries 2 3 4 Bowtie 2, SAMtools, Bowtie 2, SAMtools, FLASH bam2fastq bam2fastq Removal of reads of Removal of reads of Combination of plastid origin mitochondrial origin paired-end reads Paired-end genome skim Combined genome skim Paired-end genome Paired-end genome skim reads without plastid and reads without plastid (and skim reads reads without plastid reads mitochondrial reads mitochondrial) reads Mitochondriome Plastome reference 1 reference BLAT, UNIX commands Removal of transcripts sharing ≥90% sequence similarity Transcripts Unique transcripts 8 7 6 5 UNIX commands Geneious UNIX commands BLAT Retention of those contigs De novo assembly of the Filtering of BLAT output: Matching of the unique that comprise exons transcript or genome skim (1) Choice of transcript or transcripts and the filtered, ≥ bait length (in the Oxalis BLAT hits [depending on genome skim sequences combined genome skim case 120 bp) and have a the selection in (1) in the for further processing reads sharing ≥85% certain total locus length previous step] to larger (2) Removal of transcripts sequence similarity (in the Oxalis case contigs with >1,000 BLAT hits ≥600 bp) (3) Removal of transcript or genome skim BLAT hits [depending on the selection in (1)] containing masked nucleotides De novo assembled Quality-filtered sequences Sequences of matching Length-filtered probe sequences of filtered of matching transcripts transcripts and genome sequences BLAT hits and genome skim reads skim reads 9 10 11 cd-hit-est UNIX commands BLAT, UNIX commands Removal of probe Retention of those contigs Removal of probe sequences sharing ≥90% that comprise exons sequences sharing ≥90% sequence similarity ≥ bait length (in the Oxalis sequence similarity with case 120 bp) and have a the plastome reference certain total locus length (in the Oxalis case ≥600 bp) Similarity-filtered probe Similarity-, length-filtered Final probe sequences sequences probe sequences Plastome reference A LCN genes Plastome B LCN genes nrDNA cistron 1 Supporting information 2 3 Table S1 4 Voucher information of southern African Oxalis accessions used in this study. 5 Herbarium specimens are either deposited in the Herbarium Collections at Charles 6 University in Prague (PRC) or the Stellenbosch University Herbarium (STEU). Ploidy 7 level estimates are unpublished data of JS. 8 Species Herbarium/ Locality Latitude/ Collector/ Ploidy Voucher no. Longitude Date Oxalis amblyosepala Schltr. PRC Vanhrynsdorp, Western Cape, S31° 46' 52.5'' J. Suda & R. Sudová 2x J11-969 South Africa E18° 46' 02.7'' 24.08.2011 Oxalis blastorrhiza T.M. Salter PRC Vanhrynsdorp, Western Cape, S31° 36' 07.9'' J. Suda & R. Sudová 2x J557 South Africa E18° 43' 47.9'' 31.05.2010 Oxalis callosa R. Knuth PRC Calvinia, Northern Cape, South S31° 58' 22.4'' J. Suda & R. Sudová 6x J11-726 Africa E20° 08' 50.3'' 19.08.2011 Oxalis ciliaris Jacq. PRC Swellendam, Western Cape, S34° 01' 28.3'' J. Suda & P. Trávníček 4x J233 South Africa E20° 25' 40.3'' 01.09.2009 Oxalis creaseyii T.M. Salter PRC Garies, Northern Cape, South S30° 28' 09.9'' J. Suda & R. Sudová 2x J11-961 Africa E18° 08' 08.2'' 24.08.2011 Oxalis exserta T.M. Salter PRC Springbok, Northern Cape, S29° 26' 14.4'' J. Suda & R. Sudová 8x J616 South Africa E17° 57' 10.1'' 08.06.2010 Oxalis gracilis Jacq. PRC Vanhrynsdorp, Western Cape, S31° 43' 04.4'' J. Suda & R. Sudová 2x J558 South Africa E18° 46' 07.6'' 31.05.2010 Oxalis helicoides T.M. Salter PRC Springbok, Northern Cape, S29° 46' 22.9'' J. Suda & P. Trávníček 2x J319 South Africa E17° 49' 15.5'' 06.09.2009 Oxalis hirsuta Sond. PRC Williston, Northern Cape, South S31° 22' 59.6'' J. Suda & R. Sudová 2x J12-20 Africa E20° 35' 24.6'' 14.04.2012 Oxalis hirta L. PRC Yzerfontein, Western Cape, S33° 19' 42.1'' J. Suda & R. Sudová 4x J57 South Africa E18° 11' 15.9'' 29.08.2007 Oxalis hirta L. PRC Yzerfontein, Western Cape, S33° 08' 10.2'' J. Suda & R. Sudová 4x J62 South Africa E17° 59' 54.5'' 30.08.2007 Oxalis imbricata Eckl. & Zeyh. STEU Worcester, Western Cape, S33°53.624' K. Oberlander 2x MO472 South Africa E19°26.284' 05.05.2004 Oxalis inconspicua T.M. Salter PRC Garies, Northern Cape, South S30° 32' 03.9'' J. Suda & R. Sudová 2x J595 Africa E18° 08' 14.9'' 02.06.2010 Oxalis kamiesbergensis T.M. PRC Kamieskroon, Northern Cape, S30° 10' 43.6'' J. Suda & R. Sudová 2x Salter J153 South Africa E18° 00' 45.0'' 14.09.2007 Oxalis obtusa Jacq. PRC Kommetjie, Western Cape, S34° 08' 00.5'' J. Suda & R. Sudová 2x J12 South Africa E18° 20' 13.6'' 26.08.2007 Oxalis obtusa Jacq. PRC Calvinia, Northern Cape, South S31° 19' 51.1'' J. Suda & R. Sudová 2x J99 Africa E19° 52' 09.4'' 02.09.2007 Oxalis orthopoda T.M. Salter STEU Mossel Bay, Western Cape, S34°10.723' K. Oberlander 4x MO351 South Africa E21°56.650' 24.04.2003 Oxalis palmifrons T.M. Salter STEU NA NA Ex Hort./ 2x MO403 NA Oxalis polyphylla Jacq. PRC De Hoop, Western Cape, South S34° 25' 51.7'' J. Suda & J. Rauchová 2x J11-44 Africa E20° 25' 06.5'' 21.05.2011 Oxalis primuloides R. Knuth PRC Calvinia, Northern Cape, South S31° 21' 03.1'' J. Suda & R. Sudová 2x J136 Africa E19° 31' 29.0'' 12.09.2007 1 Oxalis pulchella Jacq. PRC Steinkopf, Northern Cape, S29° 13' 33.2'' J. Suda & R. Sudová 2x J11-875 South Africa E17° 36' 29.2'' 22.08.2011 Oxalis reclinata Jacq. PRC Bitterfontein, Western Cape, S30° 58' 34.4'' J. Suda & P. Trávníček 2x J343 South Africa E18° 14' 03.6'' 08.09.2009 Oxalis smithiana Eckl. & Zeyh. PRC Cradock, Eastern Cape, South S32° 15' 46.4'' P. Trávníček & M. Kubešová 4x J11-511 Africa E25° 25' 21.9'' 05.06.2011 Oxalis tenella Jacq. PRC Clanwilliam, Western Cape, S32° 10' 05.3'' J. Suda & R. Sudová 2x J131 South Africa E18° 52' 15.2'' 11.09.2007 Oxalis truncatula Jacq. PRC Greyton, Western Cape, South S34° 04' 11.8'' J. Suda & P. Trávníček 4x J230 Africa E19° 37' 58.0'' 31.08.2009 9 10 Table S2 11 Success of Hyb-Seq in terms of number of reads after duplicate read removal, 12 number of assembled LCN exons and genes, sequence divergence between the 13 LCN matrix and the LCN probes, and completion and sequencing depth of all three 14 datasets of southern African Oxalis. The table is complemented by the genome skim 15 accession O. obtusa J12 and the transcriptome accession O. corniculata JHCN, 16 which did not form part of the calculation of the means. 17 Species Accession # quality-filtered # LCN exons # LCN genes Mean Completion of Completion of Completion of Mean Mean Mean number reads after assembled assembled sequence LCN matrix plastome nrDNA cistron sequencing sequencing sequencing duplicate from total 4,926 from total 1,164 divergence depth of LCN depth of depth of nrDNA removal from LCN loci plastome cistron probes Oxalis J11-969 497,402 4,246 (86%) 1,161 (100%) 4% 94% 98% 100% 13 166 209 amblyosepala Oxalis J557 715,360 4,339 (88%) 1,160 (100%) 4% 95% 99% 100% 16 220 427 blastorrhiza Oxalis callosa J11-726 788,393 4,490 (91%) 1,160 (100%) 4% 97% 98% 100% 17 139 405 Oxalis ciliaris J233 542,186 4,345 (88%) 1,160 (100%) 4% 95% 99% 100% 13 99 223 Oxalis creaseyii J11-961 703,091 4,454 (90%) 1,162 (100%) 4% 96% 99% 99% 15 155 293 Oxalis exserta J616 496,193 4,170 (85%) 1,160 (100%) 4% 92% 97% 100% 12 103 225 Oxalis gracilis J558 749,774 4,432 (90%) 1,162 (100%) 4% 96% 94% 100% 17 118 418 Oxalis helicoides J319 798,432 4,450 (90%) 1,162 (100%) 4% 97% 99% 100% 16 184 379 Oxalis hirsuta J12-20 933,290 4,514 (92%) 1,163 (100%) 3% 98% 99% 100% 21 393 306 Oxalis hirta J57 472,230 4,137 (84%) 1,162 (100%) 4% 91% 98% 93% 11 111 273 Oxalis hirta J62 156,655 2,611 (53%) 1,093 (94%) 4% 63% 79% 98% 8 55 69 Oxalis imbricata MO472 3,280,494 4,520 (92%) 1,163 (100%) 4% 99% 98% 83% 45 361 1,583 Oxalis J595 655,265 4,408 (89%) 1,161 (100%) 4% 95% 99% 98% 16 217 172 inconspicua Oxalis J153 401,417 3,913 (79%) 1,158 (99%) 4% 88% 99% 79% 12 138 228 kamiesbergensis Oxalis obtusa J99 534,889 4,287 (87%) 1,163 (100%) 3% 95% 99% 81% 14 166 233 Oxalis orthopoda MO351 939,291 4,516 (92%) 1,163 (100%) 4% 98% 96% 100% 21 168 660 Oxalis palmifrons MO403 168,543 1,253 (25%) 753 (65%) 4% 22% 64% 100% 16 83 58 Oxalis polyphylla J11-44 599,495 4,298 (87%) 1,159 (100%) 4% 94% 99% 100% 16 234 190 Oxalis J136 500,786 4,093 (83%) 1,161 (100%) 4% 91% 99% 100% 13 153 317 primuloides Oxalis pulchella J11-875 549,374 4,225 (86%) 1,162 (100%) 4% 93% 99% 100% 15 226 248 Oxalis reclinata J343 375,386 4,003 (81%) 1,158 (99%) 4% 90% 99% 100% 11 111 158 Oxalis smithiana J11-511 1,074,130 4,451 (90%) 1,162 (100%) 3% 97% 96% 100% 26 299 981 Oxalis tenella J131 511,049 4,343 (88%) 1,160 (100%) 4% 94% 99% 100% 14 137 246 Oxalis truncatula J230 627,899 4,488 (91%) 1,164 (100%) 3% 98% 98% 79% 15 126 231 Mean 711,293 4,124 (84%) 1,141 (98%) 4% 90% 96% 96% 16 173 290 Oxalis obtusa J12 7,938,349 3,972 (81%) 1,160 (100%) 1% 88% 99% 100% 11 825 4,490 Oxalis corniculata JHCN NA NA 1,163 (100%) 11% 93% NA NA NA NA NA 18 19 Table S3 2 20 Success of Hyb-Seq in terms of number of raw reads after removal of PhiX reads, 21 reads after quality-filtering, reads after duplicate read removal, number and 22 proportion of reads mapped to the LCN probes (on-target), the plastome reference 23 and the nrDNA cistron reference in southern African Oxalis. In cases where the sum 24 of on-target, plastid and nrDNA reads after assembly exceeded the initial, total 25 number of quality-filtered reads after duplicate removal, which was obviously due to 26 read mapping across datasets, the three read sources were marked with an asterisk. 27 The table is complemented by the genome skim accession O. obtusa J12, which did 28 not form part of the calculation of the means. The effect of target enrichment is 29 evident from a comparison of the proportion of reads mapped on-target between the 30 genome skim accession J12 and the Hyb-Seq accession J99 of O. obtusa: Instead of 31 on average 2% genome skim reads 59% Hyb-Seq reads mapped on-target. 32 Species Accession # raw reads # quality-filtered # quality-filtered # on-target reads # plastid reads # nrDNA reads number reads reads after (as % of quality- (as % of quality- (as % of quality- duplicate removal filtered reads filtered reads filtered reads (as % of quality- after duplicate after duplicate after duplicate filtered reads) removal) removal) removal) Oxalis J11-969 1,220,088 1,101,853 497,402 (45%) 313,410 (63%) 162,330 (33%) 11,079 (2%) amblyosepala Oxalis J557 1,895,744 1,724,917 715,360 (41%) 402,111 (56%) 222,377 (31%) 26,254 (4%) blastorrhiza Oxalis callosa J11-726 1,451,514 1,311,491 788,393 (60%) 472,837 (60%) 142,150 (18%) 20,045 (3%) Oxalis ciliaris J233 1,012,810 931,389 542,186 (58%) 317,293 (59%) 96,490 (18%) 10,788 (2%) Oxalis creaseyii J11-961 1,543,348 1,423,598 703,091 (49%) 424,600 (60%) 154,369 (22%) 16,048 (2%) Oxalis exserta J616 935,348 841,451 496,193 (59%) 268,508 (54%) 97,016 (20%) 13,602 (3%) Oxalis gracilis J558 1,296,462 1,188,885 749,774 (63%) 447,138 (60%) 109,747 (15%) 28,477 (4%) Oxalis helicoides J319 1,838,158 1,704,468 798,432 (47%) 458,664 (57%) 186,043 (23%) 19,722 (2%) Oxalis hirsuta J12-20 3,207,132 2,876,738 933,290 (32%) 625,456 (67%)* 407,215 (44%)* 20,964 (2%)* Oxalis hirta J57 1,039,590 942,496 472,230 (50%) 236,745 (50%) 111,585 (24%) 12,304 (3%) Oxalis hirta J62 276,908 246,727 156,655 (63%) 77,330 (49%) 39,072 (25%) 3,420 (2%) Oxalis imbricata MO472 6,867,908 6,243,085 3,280,494 (53%) 1,775,687 (54%) 372,088 (11%) 69,811 (2%) Oxalis J595 1,773,838 1,635,614 655,265 (40%) 419,477 (64%)* 224,056 (34%)* 18,540 (3%)* inconspicua Oxalis J153 1,029,982 927,395 401,417 (43%) 232,487 (58%) 134,519 (34%) 9,216 (2%) kamiesbergensis Oxalis obtusa J99 1,378,380 1,260,241 534,889 (42%) 331,024 (62%) 162,427 (30%) 9,398 (2%) Oxalis orthopoda MO351 1,849,216 1,711,627 939,291 (55%) 620,990 (66%) 167,454 (18%) 87,142 (9%) Oxalis palmifrons MO403 507,374 395,164 168,543 (43%) 66,170 (39%) 64,900 (39%) 3,728 (2%) Oxalis polyphylla J11-44 1,806,672 1,631,230 599,495 (37%) 383,590 (64%)* 235,690 (39%)* 11,903 (2%)* Oxalis J136 1,197,900 1,100,614 500,786 (46%) 264,869 (53%) 147,514 (29%) 19,336 (4%) primuloides Oxalis pulchella J11-875 1,719,890 1,540,537 549,374 (36%) 339,571 (62%)* 237,816 (43%)* 17,299 (3%)* Oxalis reclinata J343 826,650 742,405 375,386 (51%) 224,180 (60%) 109,011 (29%) 7,297 (2%) 3 Oxalis smithiana J11-511 2,730,800 2,505,934 1,074,130 (43%) 629,066 (59%) 283,942 (26%) 61,765 (6%) Oxalis tenella J131 1,207,402 1,114,970 511,049 (46%) 320,370 (63%) 134,438 (26%) 13,894 (3%) Oxalis truncatula J230 1,262,226 1,158,263 627,899 (54%) 419,502 (67%) 122,845 (20%) 8,458 (1%) Mean 1,630,473 1,510,879 711,293 (47%) 419,628 (59%) 171,879 (27%) 21,687 (3%) Oxalis obtusa J12 9,236,186 8,525,040 7,938,349 (93%) 186,831 (2%) 1,113,811 (14%) 264,900 (3%) 33 34 Fig. S1 35 Map of the draft plastid genome of Oxalis hirsuta. The thin black lines of the inner 36 circle indicate the location of the large single copy (LSC) and small single copy (SSC) 37 regions, the thick black lines those of the inverted repeats (IR). Transcription is 38 clockwise for genes on the outside of the circle and counterclockwise for genes on 39 the inside of the circle. Genes with multiple exons are marked with an asterisk. Quasi 40 full-length (lacking start and stop codon) and partial genes are marked with an arrow. 41 42 Fig. S2 43 Splits network of the LCN gene matrix, the plastome and the nrDNA cistron datasets. 44 (A) 727 concatenated LCN loci (least squares fit = 99.5%), (B) plastid genome (least 45 squares fit = 98.0%), (C) nrDNA cistron (least squares fit = 96.7%). For better 46 visualization only bootstrap support values above 90% are shown. 47 48 Fig. S3 49 Convergence measures of Bayesian MCMC runs. (A) Average effective sample size 50 (avgESS) and (B) potential scale reduction factor (PSRF). Parameters are TL: total 51 tree length; r(A⟨–⟩C)–r(G⟨–T⟩ ): reversible substitution rates; k_revmat: substitution 52 rates of the GTR model; pi(A)–pi(T): stationary state frequencies; alpha: shape of 53 gamma distribution of rate variation across sites. Convergence is indicated by 54 avgESS >100 and PSRF ~1.0. (C) Average standard deviation of split frequencies 55 (ASDSF), shown for each of the 727 LCN genes. Convergence is indicated by 56 ASDSF <0.05. 57 58 Fig. S4 59 Comparison of phylogenetic trees based on 727 LCN genes. Species trees were 60 reconstructed under the assumption of the multispecies coalescent model 61 implemented in ASTRAL (A), MP-EST (B), and STAR (C), as MRP supertree (D) and 62 concatenated supermatrix (E). Except for concatenation both Bayesian inference (BI) 63 and maximum likelihood (ML) were used for building the species tree. Dashed lines 64 connect each accession to its placement in the contrasting tree. Values close to the 65 nodes are bootstrap support (* = 100%). The Hirta clade (Oberlander et al., 2011) is 66 colored in green. Nodes which are incongruent between BI of gene trees and ML 67 gene trees within the same species tree method are marked with a blue arrow. 68 Nodes which are incongruent between the different species tree methods and 69 concatenation are marked with orange arrows on the concatenated tree. 70 71 Fig. S5 72 Principal coordinate analysis of Robinson-Foulds (RF) distances between gene trees. 73 (A) RF distances between gene trees obtained with Bayesian inference (BI), (B) RF 74 distances between maximum likelihood (ML) gene trees. Eigenvalues were highest 75 for the first three axes, and the first two are displayed. Each dot represents a gene 76 tree. Kernel density is indicated with blue lines. 77 78 Fig. S6 4 79 Robinson-Foulds (RF) distances between gene trees and the species tree. (A) RF 80 distances between gene trees obtained with Bayesian inference (BI) and the STAR 81 species tree, (B) RF distances between maximum likelihood (ML) gene trees and the 82 STAR species tree. 5 trnS trnL trnF trnL* * -GGA -UCC -UAA -GAA -UAA trnM trnG psbZ -GGU psbC psbD -CAU trnT psaA ycf3 ycf3 ycf3 rps4 trnT -GCA ndhJ petN ndhK psaB rbcL -UGU rps14 trnC ndhC * * -CAU ⌂ trnV * -UGA trnV accD atpE atpB -UAC trnS -UAC trnfM -UUC -GUA -GUC psaI psbM trnE ycf4 trnY trnD cemA * * rpoB petA * rpoC1 * rpoC1 petL petG psbJ psaJ psbF LSC rpoC2 rpl33 psbL ⌂ rps18 trnW psbE trnP -CCA rps2 -UGG atpI rpl20 rps12 atpH clpP 5' end * atpF clpP * ⌂ ⌂ Oxalis hirsuta draft plastome *atpF clpP * -UCU psbB ⌂ atpA trnR * trnG-GCC psbT 153,678 bp psbH psbN psbI trnS-GCU psbK ⌂ petB trnQ-UUG petD ⌂ rpoA trnK-UUU rps11 * rpl36 ⌂ infA ⌂ matK rps8 rpl14 *trnK-UUU rpl16 psbA trnH ⌂ -GUG rps3 rps19 rpl22 rpl2 rps19 * rpl2* rpl23 rpl2 * trnI * rpl2 -CAU rpl23 trnI-CAU IRb IRa ⌂ ycf2 SSC ycf2 -CAA ⌂ ⌂ * trnL trnL ycf68 -CAA ndhB * trnV ndhB 3' end ndhB -GAC ndhB rps7 *trnI * *trnA*trnI rrn16 rps7 rps12 *trnA -GAU -GAU rps12 * ycf15 -UGC -UGC trnR ycf15 rrn23 3' end rrn4.5 ⌂ ⌂ -ACG rrn5 -GUU rps15 ycf1 ndhH ndhA ndhG ndhA ndhE -GAC psaC * rrn16 ndhI * trnN trnV -GAU ndhF -GAU trnI -UGC ndhD trnN * trnI -UGC rrn23 * trnA -GUU trnA rrn5 * rrn4.5 -ACG ycf68 * ycf1' ⌂ ccsA trnR ⌂ -UAG trnL ATP synthase Ribosomal RNA Other function NADH dehydrogenase RNA polymerase Intron Photosystem protein Rubisco subunit Unknown function Ribosomal protein subunit Transfer RNA O. tenella J131 0.0010 0.0010 O. hirta J57 0.01 O. hirta J62 O. callosa J11-726 95 O. primuloides J136 93 O. creaseyii J11-961 O. callosa J11-726 O. kamiesbergensis J153 O. ciliaris J233 100 100 100 O. helicoides J319 O. blastorrhiza J557 O. kamiesbergensis J153 100 O. amblyosepala J11-969 100 100 100 O. tenella J131 O. gracilis J558 100 O. exserta J616 100 O. hirta J57 100 100 100 O. creaseyii J11-961 O. helicoides J319 100 O. primuloides J136 100 O. polyphylla J11-44 100 O. gracilis J558 100 100 O. ciliaris J233 O. reclinata J343 O. tenella J131 100 96 100 100 100 O. blastorrhiza J557 100 98 99 O. gracilis J558 100 100 100 100 100 O. hirta J62 96 O. amblyosepala J11-969 100 100 100 92 O. exserta J616 100 O. reclinata J343 100 O. helicoides J319 100 99 O. amblyosepala J11-969 100 97 100 98 100 100 100 96 100 100 O. smithiana J11-511 O. polyphylla J11-44 O. primuloides J136 O. polyphylla J11-44 99 100 100 100 100 100 100 O. blastorrhiza J557 O. smithiana J11-511 100 100 100 O. reclinata J343 100 100 97 100 100 100 O. callosa J11-726 O. ciliaris J233 1 O. imbricata MO472 97 100 100 O. exserta J616 O. hirta J57 100 100 O. palmifrons MO403 94 O. palmifrons MO403 91 100 95 100 O. hirta J62 93 100 O. truncatula J230 97 100 98 100 O. truncatula J230 100 O. orthopoda MO351 100 100 O. creaseyii J11-961 100 O. orthopoda MO351 100 100 100 100 100 100 O. orthopoda MO351 O. kamiesbergensis J153 100 96 93 97 O. hirsuta J12-20 100 100 O. imbricata MO472 100 O. hirsuta J12-20 100 O. smithiana J11-511 100 O. hirsuta J12-20 100 100 100 100 O. palmifrons MO403 100 100 O. inconspicua J595 LCN genes 100 Plastome nrDNA cistron O. pulchella J11-875 100 100 100 O. obtusa J12 O. pulchella J11-875 O. obtusa J99 100 O. pulchella J11-875 O. truncatula J230 O. obtusa J12 O. obtusa J99 O. obtusa J99 O. obtusa J12 O. inconspicua J595 A O. inconspicua J595 B C O. imbricata MO472 TL TL pi(A) pi(C) pi(G) pi(T) alpha pi(A) pi(C) pi(G) pi(T) alpha A r(A‹-›C) r(A‹-›G) r(A‹-›T) r(C‹-›G) r(C‹-›T) r(G‹-›T) B r(A‹-›C) r(A‹-›G) r(A‹-›T) r(C‹-›G) r(C‹-›T) r(G‹-›T) k_revmat k_revmat C A ASTRAL species tree B MP-EST species tree BI of gene trees ML gene trees BI of gene trees ML gene trees C STAR species tree D MRP supertree BI of gene trees ML gene trees BI of gene trees ML gene trees E Concatenation ML d = 5 d = 5 Eigenvalues Eigenvalues A BI of gene trees B ML gene trees BI of gene trees ML gene trees Numbergene-species of comparisons tree Numbergene-species of comparisons tree 0 20 40 60 80 0 20 40 60 80 100 15 20 25 30 35 40 45 15 20 25 30 35 40 45 A RF distance B RF distance