bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
1 ORIGINAL ARTICLE
2 DNA barcoding British Euphrasia reveals deeply divergent polyploids but lack of
3 species-level resolution
4 Xumei Wang1, Galina Gussarova2,3,4, Markus Ruhsam5, Natasha de Vere6, Chris Metherell7,
5 Peter M. Hollingsworth5, Alex D. Twyford8*
6 1 School of Pharmacy, Xi'an Jiaotong University, 76 Yanta West Road, Xi'an, 710061,
7 P.R.China
8 2 Tromsø University Museum, UiT The Arctic University of Norway, PO box 6050 Langnes,
9 NO-9037, Tromsø, Norway
10 3 CEES-Centre for Ecological and Evolutionary Synthesis, Dept. of Biosciences, University
11 of Oslo, PO BOX 1066, Blindern, NO-0316, Norway
12 4 Dept. of Botany, Faculty of Biology, St Petersburg State University, Universitetskaya nab.
13 7/9, 199034 St Petersburg, Russia
14 5 Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh, EH3 5LR, UK
15 6 National Botanic Garden of Wales, Llanarthne, Carmarthenshire, SA32 8HG, UK and
16 Institute of Biological, Environmental & Rural Sciences (IBERS), Aberystwyth University,
17 Penglais, Aberystwyth, Ceredigion, SY23 3DA, UK
18 7 57 Walton Rd, Bristol BS11 9TA
19 8 University of Edinburgh, Institute of Evolutionary Biology, West Mains Road, Edinburgh,
20 EH3 9JT, UK
21
22 Running title: Complex patterns of diversity in Euphrasia
23 For correspondence: [email protected]
24
1
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
25 26 ABSTRACT
27 Background and aims DNA barcoding is emerging as a useful tool not only for species
28 identification but for studying evolutionary and ecological processes. Although plant DNA
29 barcodes do not always provide species-level resolution, the generation of large DNA
30 barcode datasets can provide insights into the mechanisms underlying the generation of
31 species diversity. Here, we use DNA barcoding to study evolutionary processes in
32 taxonomically complex British Euphrasia, a group with multiple ploidy levels, frequent self-
33 fertilization, young species divergence and widespread hybridisation.
34 Methods We sequenced the core plant barcoding loci, supplemented with additional nuclear
35 and plastid loci, in representatives of all 19 British Euphrasia species. We analyse these data
36 in a population genetic and phylogenetic framework. We then date the divergence of
37 haplotypes in a global Euphrasia dataset using a time-calibrated Bayesian approach
38 implemented in BEAST.
39 Key results No Euphrasia species has a consistent diagnostic haplotype. Instead, haplotypes
40 are either widespread across species, or are population specific. Nuclear genetic variation is
41 strongly partitioned by ploidy levels, with diploid and tetraploid British Euphrasia possessing
42 deeply divergent ITS haplotypes (DXY = 5.1%), with haplotype divergence corresponding to
43 the late Miocene. In contrast, plastid data show no clear division by ploidy, and instead reveal
44 weakly supported geographic patterns.
45 Conclusions Using standard DNA barcoding loci for species identification in Euphrasia will
46 be unsuccessful. However, these loci provide key insights into the maintenance of genetic
47 variation, with divergence of diploids and tetraploids suggesting that ploidy differences act as
48 a barrier to gene exchange in British Euphrasia, with rampant hybridisation within ploidy
49 levels. The scarcity of shared diploid-tetraploid ITS haplotypes supports the polyploids being 2
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
50 allotetraploid in origin. Overall, these results show that even when lacking species-level
51 resolution, DNA barcoding can reveal insightful evolutionary patterns in taxonomically
52 complex genera.
53 54 Keywords: DNA barcoding, Euphrasia, Orobanchaceae, polyploidy, taxonomic complexity,
55 British flora, phylogeny
56
3
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
57 58 INTRODUCTION
59
60 DNA barcoding is a valuable tool for discriminating among species, and these data often give
61 insights into identity that are overlooked based on morphology alone (Hebert and Gregory,
62 2005). Plant DNA barcoding has been used for species discovery, reconstructing historical
63 vegetation types from frozen sediments, surveying environmental variation, and many other
64 research topics (reviewed in Hollingsworth et al., 2016). Despite the extensive uptake of
65 DNA barcoding, there are numerous reports of taxon groups where DNA barcodes do not
66 provide exact plant species identification, and where DNA barcode sequences are shared
67 among related species (Percy et al., 2014, Yan et al., 2015a, Yan et al., 2015b, Spooner,
68 2009). Identifying the causes of why DNA barcoding ‘fails’ is essential to help guide the
69 development of future DNA barcoding systems. If species discrimination is usually limited
70 by information content of the core barcoding loci, then improvements can be made by
71 extending the DNA barcode to harness variation across the whole plastid genome. In contrast,
72 if limits to species discrimination are caused by hybridisation, incomplete lineage sorting, and
73 polyploidy, then we may see no improvement with more plastid data, and future barcoding
74 systems will need to target the nuclear genome, as well as use new analytical methods to cope
75 with many independent loci (Coissac et al., 2016, Hollingsworth et al., 2016). As such, more
76 genetic studies of complex taxa groups are required in order to identify the underlying
77 evolutionary processes that can cause DNA barcoding to fail. In addition, the generation of
78 large data sets of DNA sequence data from multiple individuals of multiple species in such
79 groups also provide datasets that can shed light onto evolutionary relationships and
80 evolutionary divergence, without a need for the barcode markers to track species boundaries.
81
4
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
82 Postglacial species radiations of taxonomically complex groups in Northern Europe are a case
83 where we may not expect a clear cut-off between intraspecific variation and interspecific
84 divergence and thus DNA barcoding may provide limited discriminatory power. Despite this,
85 DNA barcoding may still be valuable if used to identify evolutionary and ecological
86 processes that result in shared sequence variation. For example, many postglacial taxa are
87 characterised by a combination of: (1) recent postglacial speciation, (2) extensive
88 hybridisation, (3) frequent self-fertilization, (4) divergence involving polyploidy. Our
89 expectation is that factors 1 + 2 will cause DNA barcode sequences to be shared among
90 geographically proximate taxa, while 3 will cause barcodes to be population rather than
91 species-specific (Hollingsworth et al., 2011). Factor 4, polyploidy, will manifest as shared
92 haplotypes between recent polyploids and their parental progenitors, or deep haplotype
93 divergence in older polyploid groups, where ploidy act as a reproductive isolating barrier and
94 allows congeneric taxa to accumulate genetic differences. Many of these interacting factors
95 are common across taxonomically complex postglacial groups (Ennos, 2005), which include
96 the Arabidopsis arenosa complex (Schmickl et al., 2012), Cerastium (Brysting et al., 2007),
97 Epipactis (Squirrell et al., 2002) and Galium (Kolář et al., 2015). As such, the application of
98 DNA barcoding to such groups will not only reveal the prevalence of haplotype sharing and
99 the potential for species discrimination, but improve our understanding of processes
100 underlying shared variation.
101
102 One example of a taxonomically challenging group showing postglacial divergence is British
103 Euphrasia species. This group of 19 taxa are renowned for their difficult species
104 identification, and at present only a handful of experts can identify these species in the field.
105 Morphological species identification is difficult due to their small stature, combined with
106 species being defined by a complex suite of overlapping characters (Yeo, 1978). They are
5
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
107 also generalist hemiparasites and thus phenotypes are plastic and depend upon host quality
108 (Svensson and Carlsson, 2004). DNA barcode-based identification could partly resolve these
109 identification issues, and lead to a greater understanding of species diversity and distributions
110 in this under-recorded group. This is particularly important as a number of Euphrasia species
111 are critically rare and of conservation concern, while others are ecological specialists that are
112 useful indicators of habitat type (French et al., 2008). More generally, DNA barcoding data
113 could reveal the processes structuring genetic diversity and those that are responsible for
114 recent speciation.
115
116 Previous broad-scale surveys of Euphrasia using AFLP and microsatellites have shown a
117 significant proportion of genetic variation is partitioned by ploidy groups and by species,
118 despite extensive hybridisation (French et al., 2008). Here, we follow-on from previous
119 population genetic studies by testing the utility of DNA barcoding across British Euphrasia.
120 Our first aim is to assess whether DNA barcoding is informative for species recognition in
121 this young postglacial group. Our second aim is to understand the evolutionary factors that
122 may explain patterns of shared DNA sequences. We first sequence British populations for the
123 standard DNA barcoding loci, and analyse the distribution of haplotypes across populations
124 and species. To better understand the historical context of haplotype sharing we sequence
125 additional loci and use dated phylogenetic analyses to infer divergence ages. Overall, these
126 results are used to understand the efficacy of genetic tools for studying species-level variation
127 in a taxonomically complex group.
128
129 MATERIALS AND METHODS
130
131 Specimen sampling
6
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
132
133 The 19 currently recognised British Euphrasia species are all annuals, selfers or mixed-
134 mating small herbaceous plants, which occur in a range of habitats including coastal turf,
135 chalk downland, mountain ridges and heather moorland. The species can be divided into two
136 groups, glabrous or short-eglandular hairy tetraploids (15 species, Fig 1A), or long glandular
137 hairy diploids (4 species, Fig. 1B). Our sampling included representatives of all British
138 species, with samples collected from across widespread populations (Fig. 1c, Table S1,
139 [Supplementary Information]). Samples were collected in South West England and Wales
140 to allow us to include mixed populations of diploids and tetraploids, early generation diploid
141 x tetraploid hybrids, and two diploid hybrid species hypothesised to be derived from diploid x
142 tetraploid crosses (E. vigursii, parentage: E. rostkoviana x E. micrantha), E. rivularis
143 (parentage: E. anglica x E. micrantha; Yeo, 1956). Samples from Scotland allowed us to
144 sample complex tetraploid taxa and tetraploid hybrids, plus scarcer Scottish diploids. Our
145 sampling scheme investigated range-wide variation by targeting many taxa and populations,
146 but without intrapopulation sampling. This is because prior work has shown low
147 intrapopulation diversity, with populations frequently fixed for single haplotypes (French et
148 al., 2008). All samples were identified by Euphrasia experts Alan Silverside or Chris
149 Metherell.
150
151 For our population-level DNA barcoding study, we analysed a total of 133 individuals, with
152 106 samples representing 19 species, as well as 27 samples from 14 putative hybrids. We
153 sequenced samples for the core plant DNA barcoding loci, matK and rbcL (Hollingsworth et
154 al., 2009), as well as ITS2 (China Plant BOL Group, 2011), and supplemented these loci with
155 rpl32-trnLUAG, which has been informative in prior population studies of Euphrasia (Stone,
156 2013).
7
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
157
158 For our broader molecular phylogenetic and molecular dating analysis, we expanded the
159 sampling in the molecular phylogeny of the genus by Gussarova et al. (2008), to include a
160 detailed sample of British taxa. The previous analysis included 40 taxa for the nuclear
161 ribosomal internal transcribed spacer (ITS), and 50 taxa for plastid DNA (Gussarova et al.,
162 2008). We sequenced samples to match the previous data matrix, which included: the trnL
163 intron (Taberlet et al., 1991), intergenic spacers atpB-rbcL (Hodges and Arnold, 1994) and
164 trnL-trnF (Taberlet et al., 1991), and ITS (White et al. 1990).
165
166 DNA extraction, PCR amplification, and sequencing
167
168 DNA was extracted from silica dried tissue using the DNeasy Plant Mini kit (Qiagen, Hilden,
169 Germany) following the manufacturer's protocol, but with an extended incubation of 1 hour
170 at 65 oC. These DNA samples were added to existing DNA extractions from 68 individuals
171 from French et al. (2008).
172
173 PCRs were done in two laboratories following separate protocols. We applied the following
174 conditions for most taxa. We performed PCRs in 10 μL reactions, with DNA amplification
175 and PCR conditions for each primer given in Table S2 [Supplementary Information]. We
176 visualised PCR products on a 1% agarose gel, with 5 μL of PCR product purified for
177 sequencing with ExoSAP-IT (USB Corporation, Cleveland, OH, USA) using standard
178 protocols. Sequencing was performed in 10 μL reactions containing 1.5 μL 5 × BigDye
179 buffer (Life Technologies, Carlsbad, CA, USA), 0.88 μL BigDye enhancing buffer BD × 64
180 (MCLAB, San Francisco, CA, USA), 0.125 μL BigDye v3.1 (Life Technologies), 0.32 μM
181 primer and 1 μM of purified PCR product. We sequenced PCR products on the ABI 3730
8
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
182 DNA Analyser (Applied Biosystems, Foster City, CA, USA) at Edinburgh Genomics. A
183 subset of sequences were generated as part of the effort to DNA barcode the UK Flora, and
184 followed a different set of protocols, detailed in de Vere et al. (2012). We assembled, edited
185 and aligned sequences using Geneious v. 8 (Biomatters, Auckland, New Zealand), with
186 manual editing. Indels were coded as unordered binary characters and appended to the
187 matrices. We used gap coding as implemented in Gapcoder (Young and Healy, 2003), with
188 indels treated as point mutations and equally weighted with other mutations.
189
190 Population genetic analyses of British samples
191 We examined patterns of sequence variation using a range of population genetic methods. We
192 investigated the amount of sequence diversity in these recently diverging species using
193 descriptive statistics, and then tested the cohesiveness of taxa using analysis of molecular
194 variance (AMOVA) and related methods. Analyses were performed separately on ITS2 and a
195 concatenated matrix of plastid data. For plastid data, haplotypes were determined from
196 nucleotide substitutions and indels of the aligned sequences. Basic population genetic
197 statistics were performed in Arlequin Version 3.0 (Excoffier and Lischer, 2010), and this
198 included the number of haplotypes, as well as hierarchical AMOVA in groups according to:
199 (1) ploidy levels (diploid vs tetraploid); (2) geographic regions (Wales, England, Scotland);
200 (3) species. AMOVAs were performed on all taxa, and repeated for ploidy levels and
201 geographic regions on a dataset only including confirmed species (i.e. excluding hybrids).
202 Sequence diversity and divergence statistics were estimated with DnaSP (Librado and Rozas,
203 2009), which included: average nucleotide diversity across taxa (Pi), Watterson’s theta (per
204 site), Tajima’s D and divergence between ploidy levels (DXY).
205
9
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
206 Genetic divergence among sampling localities were explored with Spatial Analysis of
207 Molecular Variance (SAMOVA; Dupanloup et al., 2002), implemented in SPADS v.1.0
208 (Dellicour and Mardulyn, 2014). SAMOVA maximises the proportion of genetic variance
209 due to differences among populations (FCT) for a given number of genetic clusters (K-value).
210 We considered the best grouping to have the highest FCT value after 100 repetitions. This
211 analysis investigated interspecific differentiation, thus only used species samples, excluding
212 hybrids.
213
214 The relationships between haplotypes was inferred by constructing median-joining networks
215 (MJM; Bandelt et al., 1999) with the program NETWORK v.4.6.1.1 (available at
216 http://www.fluxus-engineering.com/), treating gaps as single evolutionary events.
217
218 Phylogenetic analyses and molecular dating
219 We used Bayesian phylogenetic analyses in MrBayes v. 3.1.2 (Huelesenbeck & Ronquist
220 2001) to infer species relationships and broad-scale patterns of colonisation. Our analyses
221 used a sequence matrix that included our newly sampled British taxa in addition to previous
222 global Euphrasia samples from Gussarova et al. (2008). We selected the best fitting model of
223 nucleotide substitution using the Akaike Information Criterion (AIC) with an empirical
224 correction for small sample sizes implemented in MrAIC (Nylander, 2004). Using GTR +G
225 as the best model for the plastid dataset and SYM + G for the ITS dataset we ran two sets of
226 four Markov Chain Monte Carlo (MCMC) runs for 5,000,000 generations. Indels were
227 included as a separate partition with a restriction site (binary) model. We sampled every
228 1000th generation and used a burnin of 2,500,000, and default priors. We confirmed chain
229 convergence by observing the average standard deviation of split frequencies and by plotting
230 parameter values in Tracer v. 1.6 (Rambaut & Drummond 2013).
10
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
231
232 We used BEAST v. 1.8.2 (Drummond et al. 2012) to estimate the divergence ages of major
233 Euphrasia lineages occurring in Britain. This analysis used our British samples in
234 conjunction with the full global dataset of Euphrasia (Gussarova et al., 2008). Our analysis
235 gives crude dates due to the lack of available fossils for calibration, but these estimates allow
236 us to compare between very recent (postglacial) divergence, and much older divergence
237 events. We only analysed ITS sequences, due to the lack of support obtained for the plastid
238 phylogeny (see results). The analysis included two partitions: one containing all ITS
239 sequences (638 bp) and the other containing 38 gap-coded indels. We applied the same
240 substitution models as those used in the MrBayes analyses. For the binary characters, we used
241 a stochastic Dollo model. Separate analyses were run testing for the strict vs the uncorrelated
242 lognormal (UCLN) molecular clock. The branching was modelled using the Yule tree prior,
243 which assumes a constant speciation rate per lineage. The root age was set with a normal
244 prior of mean = 2.65x107 and SD = 5x105, according to the results obtained based on a
245 calibrated phylogeny of Euphrasia by Gussarova et al. (2008). This prior analysis used
246 geological dating of volcanic islands to set priors as maximum ages for colonization of
247 endemic Euphrasia species. Substitution rate priors were based on Key et al. (2006), ranging
248 from 0.38 x 10-9 to 8.34 x 10-9 substitutions/site/year. Data analyses were run for 50,000,000
249 generations, logging every 50,000th generation. We compared the fit of the clock models
250 using AICM criterion implemented in Tracer. The AICM values obtained were very similar
251 between the two models: 8723.681+/-0.685 vs. 8882.911+/-1.582. The strict clock was
252 chosen as it is a simpler model and with lower AICM. These phylogenetic results were
253 compared with those from an empty alignment using the same values for priors.
254
255 RESULTS
11
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
256
257 ITS haplotype diversity in British populations
258
259 The final ITS2 alignment contained 130 individuals representative of all British taxa, and was
260 380 bp in length. Only two samples (of E. scottica) presented double peaks, and were
261 excluded from analysis. Overall diversity across taxa was modest, with a nucleotide diversity
262 (Pi) of 2.3%, and theta (per site) of 0.01781. There were 33 nucleotide substitutions and one
263 indel, from which we called 23 haplotypes. All sequences have been deposited in GenBank
264 (Accession numbers given on acceptance).
265
266 The ITS haplotypes revealed strong partitioning by ploidy. Of the 23 haplotypes, three (H1,
267 H20 and H21) were restricted to diploids, and 19 to tetraploids, with only one haplotype (H2)
268 shared across ploidy levels (Table 1). Haplotype H2 was not only shared across ploidy levels
269 but was also the most widespread haplotype, found in 67 samples across 34 populations. This
270 included geographically distinct species such as the Scottish endemic E. marshallii and the
271 predominantly English and Welsh E. anglica, and ecologically contrasting taxa such as the
272 dry heathland specialist E. micrantha and the (currently unpublished) obligate coastal “E.
273 fharaidensis”. Overall, 86% of taxa had one of six widespread haplotypes. There were also a
274 large number of rare haplotypes, with over two thirds restricted to a single population (17
275 haplotypes: H4, H7-H10, H12-H21 and H23; Table 1). The remaining haplotypes found in
276 multiple populations (H1, H2, H3, H5, H6, H11) showed no clear pattern of geography, with
277 three found in all geographic regions (England, Scotland, Wales) and the remaining three
278 shared between two geographic regions. Similarly, patterns of haplotype sharing do not
279 follow species boundaries. Of the eight species with multiple populations (excluding
280 hybrids), none of them had diagnostic ITS haplotypes. Despite haplotype sharing across taxa,
12
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
281 there was no evidence for this being due to non-neutral processes, as the value of Tajima’s D
282 (-0.17) was not significantly different from zero.
283
284 The putative hybrid species, E. vigursii and E. rivularis, possessed ITS haplotype H1, which
285 is common to other diploid taxa, or population specific haplotypes (H20, H21), but no
286 tetraploid haplotypes. The two sampled diploid-tetraploid hybrids (E. arctica x rostkoviana,
287 E. tetraquetra x vigursii) possessed the full range of haplotypes: haplotype H2, which is
288 common across ploidy levels, tetraploid specific haplotype H3, and diploid specific haplotype
289 H1. Most (9/12) tetraploid hybrid populations had haplotypes shared with their putative
290 parents, while the other populations had unique haplotypes.
291
292 The highest FCT values in the SAMOVA were when K = 2 (Table S3 [Supplementary
293 Information]), and this corresponded to the diploid-tetraploid divide described above. At K
294 = 3, SAMOVA distinguished clusters corresponding to the two ploidy groups, and a third
295 group of hybrid species derived from inter-ploidy level mating. AMOVA also supported the
296 strong division by ploidy, with 88.2% of variation attributed to ploidy differences (Table 2; P
297 < 0.001). A high proportion of variation was also partitioned by taxa (63.2%), and regions
298 (25%), though this may be inflated by limited sampling within species.
299
300 Summary statistics and haplotype networks revealed substantial divergence between diploid
301 and tetraploid haplotypes. An average of 18.3 site differences were found between diploids
302 and tetraploids, with divergence measured as DXY = 0.051 (5.1%). The haplotype network
303 revealed clusters corresponding to diploid and tetraploid haplotypes, separated by many
304 mutations (Figure 2). The diploid cluster centres round haplotype H1, found in 15 individuals
305 from 5 diploid species and one diploid-tetraploid hybrid. The only haplotype from this part of
13
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
306 the network present in tetraploids is haplotype H18, found in a single sample of E. ostenfeldii.
307 Within the tetraploid cluster, widespread haplotype H2 is at the centre, surrounded by other
308 widespread haplotypes (H3, 8 populations, 12 samples; H6, 7 populations, 11 samples), and
309 singleton haplotypes.
310
311 ITS phylogeny and molecular dating
312
313 Our broad-scale global Euphrasia phylogenetic analyses performed using MrBayes gave
314 meaningful clusters of species, though the tree topology was generally poorly supported with
315 many polytomies (Fig. 3). Haplotypes occurring in Britain were predominantly found in two
316 main clusters: a tetraploid clade of Holarctic taxa from Sect. Euphrasia (posterior probability
317 support, pp = 1.00, Clade A, Fig. 3), and a well-supported geographically restricted Palearctic
318 diploid lineage (pp = 1.0, Clade B, Fig. 3). The tetraploid clade included a mix of British and
319 European taxa, and is sister to a mixed clade of alpine diploid species and tetraploid E.
320 minima (Clade IVc, Fig. 3). The diploid clade includes British diploids E. anglica, E.
321 rivularis, E. rostkoviana, E. vigursii and European relatives (diploids or taxa without
322 chromosome counts). The only non-diploid in the clade is one individual of tetraploid British
323 E. ostenfeldii, which appears to be correctly identified and thus may have captured the diploid
324 ITS haplotype through historical hybridisation. Overall, terminal branches of the tree are
325 short, indicative of limited variation between related haplotypes. The only exception was the
326 long branch of E. disperma from New Zealand, a result seen in previous Bayesian analyses
327 (cf. Gussarova et al., 2008, fig 2) but not in Parsimony analyses, where it clusters together
328 with the other southern hemisphere species on a shorter branch (Gussarova et al., 2008).
329
14
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
330 Molecular dating with BEAST yielded a similar tree topology to MrBayes (results not
331 shown), with many nodes having a low posterior support values. The clade of tetraploids and
332 alpine relatives had an estimated crown age of 2.2 Ma (95% HPD = 2.0 - 2.4 Ma, Fig. 3,
333 Node 1). The median crown age of the major group of diploids (also including Nearctic
334 endemics E. oakesii and E. randii) was estimated as 1.0 Ma (95% HPD = 0.7 – 1.4 Ma; pp =
335 1.00, Fig. 3, Node 2). Due to low support of internal branches, the age of the most recent
336 common ancestor of the diploid and tetraploid lineages could not be inferred directly. The
337 nearest dated node gaining support is the broader clade of Euphrasia, which includes the
338 diploid and tetraploid groups, in addition to two additional clades that include divergent
339 species of unknown ploidy such as E. insignis (Japan), which has a crown age of 8.0 Ma,
340 95% HPD = 6.5 – 9.8 Ma (Fig. 3, Node 3).
341
342 Plastid haplotype diversity
343
344 Initial sequencing of rbcL in 48 samples revealed no polymorphism, and so no further
345 sequencing was performed for this region and it was excluded from further analyses.
346 The final matK alignment was 844 bp with one indel, and the rpl32-trnL region was 630 bp
347 with nine indels. The final concatenated plastid alignment was 1474 bp for 130 samples, with
348 2.7% segregating sites (40 sites) across 38 haplotypes. Nucleotide diversity was exceptionally
349 low with Pi = 0.3%, and theta (per site) was similarly low at 0.00381. Tajima’s D was not
350 significantly different from zero (-0.27).
351
352 Similar to ITS, most plastid haplotypes were individual or population specific (63%, 24/38
353 haplotypes found in one population only), with only 4 haplotypes being widespread (H4: 15
354 populations, 24 samples; H5: 13 populations, 18 samples; H2: 11 populations, 16 samples;
15
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
355 H1, 10 populations, 14 samples, Table S4 [Supplementary Information]). However, unlike
356 ITS, plastid haplotypes revealed complex patterns unrelated to ploidy. Most widespread
357 haplotypes were shared across ploidy levels. An AMOVA found a moderate degree of
358 genetic diversity was partitioned by ploidy (18.7%) and species (26.5%), with these values
359 being reduced when hybrids were included (Table 2 [Supplementary Information]).
360 Despite geography explaining little of the variation across the total dataset (4.9%), or being
361 evident in the SAMOVA (Table S5), localised haplotype sharing was apparent in Scottish
362 tetraploids. For example, haplotype H7 is shared across Scottish populations of E. arctica
363 (and its hybrids), E. foulaensis and E. micrantha (and its hybrids), while H10 is also shared
364 across three species in Scotland.
365
366 Relationship among plastid haplotypes
367
368 The final concatenated plastid alignment was 1692 bp in length, for a total of 82 Euphrasia
369 samples, including those from Gussarova et al. (2008). This alignment included the trnL
370 intron (517 bp, 73 variable sites), trnL-trnF (420 bp, 85 variable sites) and atpB-rbcL (754
371 bp, 89 variable sites). The plastid tree (Fig. 4) successfully recovered the geographic clades
372 reported in Gussarova et al. (2008). All diploid and tetraploid British samples possessed
373 plastid haplotypes from the broad Palearctic taxa clade, which also includes E. borneensis
374 (Borneo) and E. fedtschenkoana (Tian Shan). This clade received moderate support in our
375 analysis (pp = 0.85). While informative of broad-scale relationships, most terminal branches
376 were extremely short, and gave no information on interspecific relationships.
377
378 DISCUSSION
379
16
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
380 We have investigated the utility of DNA barcoding data for the study of taxonomically
381 complex Euphrasia in Britain. We find that Euphrasia species do not possess diagnostic ITS
382 and plastid sequence profiles, and instead haplotypes are either widespread across taxa, or are
383 individual or population specific. While DNA barcoding is of limited value as a molecular
384 tool for identifying Euphrasia species, we show these data to be informative of the
385 evolutionary processes underlying the generation and maintenance of diversity. Most notably,
386 deep divergence of ITS haplotypes between ploidy groups (Pi = 5.1%, >2 Ma divergence),
387 support British Euphrasia being assembled from diverse pre-glacial continental taxa, and
388 with ploidy differences acting as an important reproductive barrier partitioning genetic
389 variation. Overall our results shed light on the maintenance of genetic variation in one of the
390 most renowned taxonomically challenging plant groups.
391
392 DNA barcoding in taxonomically complex genera
393
394 Taxonomically complex Euphrasia have many characteristics that would make a DNA-based
395 identification system desirable. In particular, molecular identification tools could be used to
396 confirm species identities and subsequently revise the distribution of these under-recorded
397 taxa. More generally, genetic data could be used to investigate which British species are
398 genetically cohesive ‘good’ taxa. Our data show, however, that barcode sequences do not
399 correlate with species boundaries as defined by morphology. Instead, haplotypes are often
400 individual or population specific (for both ITS and plastid DNA), and where widespread are
401 either shared across species within a ploidy level (for ITS) or across ploidy levels and species
402 (plastid DNA). Patterns of haplotype sharing are often surprising, including between
403 geographically disparate populations 100+ km apart, and between ecologically specialised
404 taxa that seldom co-occur in the wild.
17
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
405
406 One possibility is that the species are not discrete genetic entities, and that the current
407 taxonomy reflects a blend of discrete lineages, polytopic taxa, and morphotypes determined
408 by a small number of genes. Alternatively, the species may represent meaningful biological
409 entities, but with boundaries permeable to gene flow. In either case, the lack of species-
410 specific or morphotype-specific barcodes may be affected by numerous evolutionary
411 processes. Low sequence diversity associated with recent diversification is the first important
412 factor. While elevated plastid diversity is a hallmark of some parasitic taxa, this is not the
413 case for facultative hemiparasites like Euphrasia, which generally show similar patterns of
414 mutation to autotrophic taxa (Wicke et al., 2016). Despite observing low plastid diversity
415 (<0.5%) across taxa, we still detected 38 plastid haplotypes. This is sufficient variation for
416 population genetic analysis, but lack of nucleotide diversity could explain the poor
417 performance of our phylogenetic analyses. A second factor that reduces the ability of DNA
418 data to distinguish species is selective sweeps, which can act on any genomic region
419 including the plastid (Twyford, 2014, Muir and Filatov, 2007). While our data collection was
420 not designed to measure the strength of selection, values for Tajima’s D were not
421 significantly different from zero thus do not point to a strong selective regime.
422
423 In contrast to the factors above, it seems that self-fertilization, hybridity and incomplete
424 lineage sorting are dominant factors shaping haplotype distributions across British Euphrasia
425 species and populations. We observed many haplotypes that are restricted to individual
426 samples or populations, consistent with French et al. (2008), who used extensive
427 intrapopulation sampling to show most Euphrasia populations are fixed for a single plastid
428 haplotype. This pattern of local fixation of haplotypes, and scarcity of widespread haplotypes,
429 is likely due to many Euphrasia species being (at least partly) self-fertilising (Fis >0.4,
18
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
430 French et al., 2004), and localised gravity-mediated seed dispersal. Where widespread
431 haplotypes are present, these are shared amongst taxa and geographic areas. In these cases,
432 dispersal of haplotypes has clearly occurred and their sharing among species may be due to
433 hybridisation and incomplete lineage sorting. The large number of reported hybrids and
434 hybrid species based on morphology (Preston and Pearman, 2015, Stace et al., 2015), as well
435 as the prevalence of hybridisation in genetic data (Stone, 2013, Liebst, 2008), point to
436 hybridisation being a key factor shaping genetic diversity in Euphrasia. Future genomic
437 surveys will estimate the proportion of loci introgressing across species barrier in models that
438 explicitly account for incomplete lineage sorting (Twyford and Ennos, 2012).
439
440 The sharing of DNA barcodes among Euphrasia species parallels a number of other studies
441 where DNA barcoding has failed to provide species-level information. A notable example is
442 willows, where hybridisation and selective sweeps have caused a single haplotype to spread
443 across highly divergent taxa and between geographic regions (Percy et al., 2014). Poor
444 species discrimination from DNA barcoding is also seen in the rapid radiation of Chinese
445 Primula (Yan et al., 2015a) and Rhododendron (Yan et al., 2015b), which is likely a product
446 of hybridisation and recent species divergence. In each of these groups, future DNA
447 barcoding systems that target large quantities of nuclear sequence variation may provide
448 resolution (Coissac et al., 2016, Hollingsworth et al., 2016). These data have the joint benefit
449 of providing many nucleotide characters from unlinked loci, while also moving away from
450 genomic regions that have atypical inheritance and patterns of evolution (i.e. plastids).
451 Analyses of many nuclear genes (or entire genomes) would be particularly valuable for
452 Euphrasia, where it may be possible to identify adaptive variants maintained in the face of
453 hybridisation. These adaptive genes may underlie differences between species or ecotypes
19
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
454 (Twyford and Friedman, 2015), and these loci could then potentially be used for future
455 species identification.
456
457 Polyploidy and the maintenance of genetic variation
458
459 The deep divergence of ITS haplotypes between diploid and tetraploid Euphrasia suggest
460 strong reproductive barriers between ploidy levels, a result consistent with previous
461 population surveys with AFLPs (French et al., 2008). The lack of support of internal nodes in
462 our phylogeny make the divergence between ploidy groups difficult to date, but it must
463 predate the origin of the tetraploid clade (2.2 Ma) and the diploid clade (1 Ma), with a date
464 likely to be closer to 8 Ma (similar to a previous global Euphrasia phylogeny, Gussarova et
465 al., 2008). Our divergence age estimate suggests that genetic diversity in British Euphrasia
466 long pre-dates recent glacial divergence and the origin of young British endemic taxa. As
467 such, British Euphrasia diversity has been assembled from a diverse pool of genetic diversity
468 from European and Amphi-Atlantic taxa. The presence of distinct ITS haplotypes in each
469 ploidy group may also be a consequence of concerted evolution of this multi-copy region
470 (Sang et al., 1995), rather than absolute reproductive isolation. Further work will be required
471 to test the extent of gene flow across ploidy levels, which is now emerging as a common
472 feature of other plant groups (Pinheiro et al., 2010, Slotte et al., 2008).
473
474 The divergence between diploid and tetraploid ITS sequences adds further weight to the
475 British tetraploids not being young autopolyploids, and instead being allopolyploids. While it
476 is difficult to decipher the parentage of British tetraploids from our data, our phylogenetic
477 analysis place these British taxa in a clade composed exclusively of tetraploids (Clade IVd,
478 Gusarova et al., 2008), and thus these species may have originated elsewhere before dispersal
20
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
479 to the UK. Alternatively, plastid haplotypes shared between diploids and tetraploids could
480 point to a British diploid parent to the tetraploids, with this parent not contributing an ITS
481 haplotype. An allotetraploid origin is further supported by the high number of tetraploid-
482 specific AFLP bands (French et al., 2008), fixed microsatellite heterozygosity indicative of
483 disomic inheritance (Stone, 2013), and tetraploid genome assemblies of double the size of
484 diploids (Twyford and Ness, Unpublished data). Future genomic and cytogenetic studies will
485 clarify the origin of British tetraploids.
486
487 Conclusions
488
489 This study highlights how DNA barcoding data may fail to distinguish between species in
490 taxonomically complex groups such as Euphrasia. No species in our study possessed a
491 consistent diagnostic haplotype. Widespread haplotype sharing among species, in conjunction
492 with high levels of intraspecific variation, make Euphrasia a particular challenge for DNA
493 barcoding. However, our results are able to help us understand the maintenance of diversity,
494 and in particular allow us to comment on the origins of British tetraploid species.
495
496 ACKNOWLEDGEMENTS
497
498 Thanks to Richard Ennos for his assistance with this research. This study formed part of an
499 international exchange programme for XW, funded by a Chinese Scholarship Council (CSC).
500 The Royal Botanic Garden Edinburgh (RBGE) is supported by the Scottish Government’s
501 Rural and Environmental Science and Analytical Services Division. Research by ADT is
502 supported by NERC Fellowship NE/L011336/1.
21
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
503
504 LITERATURE CITED
505
506 Bandelt H-J, Forster P, Röhl A. 1999. Median-joining networks for inferring intraspecific
507 phylogenies. Molecular Biology and Evolution, 16: 37-48.
508 Brysting AK, Oxelman B, Huber KT, Moulton V, Brochmann C, Renner S. 2007.
509 Untangling complex histories of genome mergings in high polyploids. Systematic
510 Biology, 56: 467-476.
511 Coissac E, Hollingsworth PM, Lavergne S, Taberlet P. 2016. From barcodes to genomes:
512 extending the concept of DNA barcoding. Molecular Ecology, 25: 1423-1428.
513 de Vere N, Rich TC, Ford CR, Trinder SA, Long C, Moore CW, Satterthwaite D,
514 Davies H, Allainguillaume J, Ronca S. 2012. DNA barcoding the native flowering
515 plants and conifers of Wales. PLoS One, 7: e37945.
516 Dellicour S, Mardulyn P. 2014. SPADS 1.0: a toolbox to perform spatial analyses on DNA
517 sequence data sets. Molecular Ecology Resources, 14: 647-651.
518 Dupanloup I, Schneider S, Excoffier L. 2002. A simulated annealing approach to define the
519 genetic structure of populations. Molecular Ecology, 11: 2571-2581.
520 Drummond AJ, Suchard MA, Xie D & Rambaut A. 2012. Bayesian phylogenetics with
521 BEAUti and the BEAST 1.7 Molecular Biology And Evolution 29: 1969-1973.
522 Ennos RA, French GC, Hollingsworth PM. 2005. Conserving taxonomic complexity.
523 Trends in Ecology & Evolution, 20: 164-168.
524 Excoffier L, Lischer HE. 2010. Arlequin suite ver 3.5: a new series of programs to perform
525 population genetics analyses under Linux and Windows. Molecular Ecology
526 resources, 10: 564-567.
22
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
527 French G, Hollingsworth P, Silverside A, Ennos R. 2008. Genetics, taxonomy and the
528 conservation of British Euphrasia. Conservation Genetics, 9: 1547-1562.
529 French GC, Ennos RA, Silverside AJ, Hollingsworth PM. 2004. The relationship between
530 flower size, inbreeding coefficient and inferred selfing rate in British Euphrasia
531 species. Heredity, 94: 44-51.
532 China Plant BOL Group. 2011. Comparative analysis of a large dataset indicates that
533 internal transcribed spacer (ITS) should be incorporated into the core barcode for seed
534 plants. Proceedings of the National Academy of Sciences, 108: 19641-19646.
535 Gussarova G, Popp M, Vitek E, Brochmann C. 2008. Molecular phylogeny and
536 biogeography of the bipolar Euphrasia (Orobanchaceae): Recent radiations in an old
537 genus. Molecular Phylogenetics and Evolution, 48: 444-460.
538 Hebert PD, Gregory TR. 2005. The promise of DNA barcoding for taxonomy. Systematic
539 Biology, 54: 852-859.
540 Hodges SA, Arnold ML. 1994. Columbines: a geographically widespread species flock.
541 Proceedings of the National Academy of Sciences, 91: 5129-5132.
542 Hollingsworth P, De-Zhu L, Van der Bank M, Twyford A. 2016. Telling plant species
543 apart with DNA: from barcodes to genomes. Philosophical Transactions of the Royal
544 Society B.
545 Hollingsworth PM, Forrest LL, Spouge JL, Hajibabaei M, Ratnasingham S, van der
546 Bank M, Chase MW, Cowan RS, Erickson DL. et al. 2009. A DNA barcode for
547 land plants. Proceedings of the National Academy of Sciences, 106: 12794-12797.
548 Hollingsworth PM, Graham SW, Little DP. 2011. Choosing and using a plant DNA
549 barcode. PLoS ONE, 6: e19254.
550 Kolář F, Píšová S, Záveská E, Fér T, Weiser M, Ehrendorfer F, Suda J. 2015. The origin
551 of unique diversity in deglaciated areas: traces of Pleistocene processes in
23
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
552 north‐European endemics from the Galium pusillum polyploid complex (Rubiaceae).
553 Molecular Ecology, 24: 1311-1334.
554 Librado P, Rozas J. 2009. DnaSP v5: a software for comprehensive analysis of DNA
555 polymorphism data. Bioinformatics, 25: 1451-1452.
556 Liebst B. 2008. Do they really hybridize? A field study in artificially established mixed
557 populations of Euphrasia minima and E. salisburgensis (Orobanchaceae) in the Swiss
558 Alps. Plant Systematics and Evolution, 273: 179-189.
559 McNeal JR, Bennett JR, Wolfe AD, Mathews S. 2013. Phylogeny and origins of
560 holoparasitism in Orobanchaceae. American Journal of Botany, 100: 971-983.
561 Muir G, Filatov D. 2007. A selective sweep in the chloroplast DNA of dioecious Silene
562 (Section Elisanthe). Genetics, 177: 1239-1247.
563 Percy DM, Argus George W, Cronk Quentin C, Fazekas Aron J, Kesanakurti Prasad R,
564 Burgess Kevin S, Husband Brian C, Newmaster Steven G, Barrett Spencer CH,
565 Graham Sean W. 2014. Understanding the spectacular failure of DNA barcoding in
566 willows (Salix): Does this result from a trans-specific selective sweep? Molecular
567 Ecology, 23: 4737-4756.
568 Pinheiro F, De Barros F, Palma‐Silva C, Meyer D, Fay MF, Suzuki RM, Lexer C,
569 Cozzolino S. 2010. Hybridization and introgression across different ploidy levels in
570 the Neotropical orchids Epidendrum fulgens and E. puniceoluteum (Orchidaceae).
571 Molecular Ecology, 19: 3981-3994.
572 Preston CD, Pearman DA. 2015. Plant hybrids in the wild: evidence from biological
573 recording. Biological Journal of the Linnean Society, 115: 555-572.
574 Sang T, Crawford DJ, Stuessy TF. 1995. Documentation of reticulate evolution in peonies
575 (Paeonia) using internal transcribed spacer sequences of nuclear ribosomal DNA:
24
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
576 implications for biogeography and concerted evolution. Proceedings of the National
577 Academy of Sciences, 92: 6813-6817.
578 Schmickl R, Paule J, Klein J, Marhold K, Koch MA. 2012. The evolutionary history of the
579 Arabidopsis arenosa complex: diverse tetraploids mask the western carpathian center
580 of species and genetic diversity. PLOS ONE, 7: e42691.
581 Slotte T, Huang H, Lascoux M, Ceplitis A. 2008. Polyploid speciation did not confer
582 instant reproductive isolation in Capsella (Brassicaceae). Molecular Biology and
583 Evolution, 25: 1472-1481.
584 Spooner DM. 2009. DNA barcoding will frequently fail in complicated groups: An example
585 in wild potatoes. American Journal of Botany, 96: 1177-1189.
586 Squirrell J, Hollingsworth PM, Bateman RM, Tebbitt MC, Hollingsworth ML. 2002.
587 Taxonomic complexity and breeding system transitions: conservation genetics of the
588 Epipactis leptochila complex (Orchidaceae). Molecular Ecology, 11: 1957-1964.
589 Stace CA, Preston CD, Pearman DA. 2015. Hybrid flora of the British Isles: Botanical
590 Society of Britain and Ireland.
591 Stone H. 2013. Evolution and conservation of tetraploid Euphrasia L. in Britain.
592 Unpublished Thesis, The University of Edinburgh.
593 Svensson BM, Carlsson BÅ. 2004. Significance of time of attachment, host type, and
594 neighbouring hemiparasites in determining fitness in two endangered grassland
595 hemiparasites. Annales Botanici Fennici, 41: 63-75.
596 Taberlet P, Gielly L, Pautou G, Bouvet J. 1991. Universal primers for amplification of
597 three non-coding regions of chloroplast DNA. Plant Molecular Biology, 17: 1105-
598 1109.
599 Twyford A, Ennos R. 2012. Next-generation hybridization and introgression. Heredity, 108:
600 179-189.
25
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
601 Twyford AD. 2014. Testing evolutionary hypotheses for DNA barcoding failure in willows.
602 Molecular Ecology, 23: 4674-4676.
603 Twyford AD, Friedman J. 2015. Adaptive divergence in the monkey flower Mimulus
604 guttatus is maintained by a chromosomal inversion. Evolution, 69: 1476-1486.
605 Wicke S, Müller KF, Quandt D, Bellot S, Schneeweiss GM. 2016. Mechanistic model of
606 evolutionary rate variation en route to a nonphotosynthetic lifestyle in plants.
607 Proceedings of the National Academy of Sciences: 201607576.
608 Yan H-F, Liu Y-J, Xie X-F, Zhang C-Y, Hu C-M, Hao G, Ge X-J. 2015a. DNA barcoding
609 evaluation and its taxonomic implications in the species-rich genus Primula L. in
610 China. PLOS ONE, 10: e0122903.
611 Yan L-J, Liu J, Möller M, Zhang L, Zhang X-M, Li D-Z, Gao L-M. 2015b. DNA
612 barcoding of Rhododendron (Ericaceae), the largest Chinese plant genus in
613 biodiversity hotspots of the Himalaya–Hengduan Mountains. Molecular Ecology
614 Resources, 15: 932-944.
615 Yeo P. 1956. Hybridization between diploid and tetraploid species of Euphrasia. Watsonia,
616 3: 253-269.
617 Yeo PF. 1978. A taxonomic revision of Euphrasia in Europe. Botanical Journal of the
618 Linnean Society, 77: 223-334.
619 Young ND, Healy J. 2003. GapCoder automates the use of indel characters in phylogenetic
620 analysis. BMC Bioinformatics, 4: 6.
621
622
623
26
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
624 625 Figure legends
626
627 Figure 1. Euphrasia samples used in this study. (A) Tetraploid British Euphrasia (here E.
628 arctica) have glabrous leaves sometimes with sparse short eglandular hairs or bristles. (B)
629 Diploid British Euphrasia have long glandular hairs. (C) Collection sites of Euphrasia DNA
630 samples. Diploids are shown in red, tetraploids in blue. Orange boxes correspond to the three
631 broad sampling areas.
632
633 Figure 2. Median-joining network of ITS haplotype relationships in British Euphrasia.
634 Numbers correspond to the ITS haplotypes in Table 1. Haplotypes are coloured by ploidy,
635 with diploids in red and tetraploids in blue. Hypothetical (unsampled) haplotypes are
636 represented by filled black circles.
637
638 Figure 3. Majority rule consensus phylogeny of Euphrasia inferred from the ITS region using
639 MrBayes. Posterior probabilities greater than 0.85 are indicated. Individuals are coloured by
640 ploidy and geography: British diploids (red), British tetraploid (blue), other geographic areas
641 (black). Green circled nodes have divergence dates estimated with BEAST, as given in the
642 text. Clade A and Clade B correspond to the main study groups, with additional clades
643 corresponding to Gussarova et al. (2008) also marked: II northern tetraploids; III Taiwan; IVa
644 South American/Tasmanian; IVb complex (S. American, N. Zealand, Japan); IVc Alpine
645 European.
646
27
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
647 Figure 4. Majority rule consensus phylogeny of Euphrasia inferred from a concatenation of
648 plastid trnL intron, trnL-trnF and atpB-rbcL using MrBayes. Posterior probabilities greater
649 than 0.85 are indicated, and British diploids (red) and British tetraploid (blue) are coloured.
650
651 Table 1. The distribution of ITS2 haplotypes between species and geographic regions in
652 British Euphrasia. Haplotype numbers correspond to the haplotype network Figure 2.
653 Population specific haplotypes are aggregated under one column. n = number of samples.
654
Population- Species n Widespread haplotypes specific Region haplotypes
H1 H2 H3 H5 H6 H11
E. anglica SW 4 4 0
E. anglica W 3 2 1 0
E. arctica S 3 1 2 0
E. arctica SW 2 2 0
E. arctica W 2 1 1 0
E. arctica x confusa S 1 1
E. arctica x foulaensis S 1 1 0
E. arctica x micrantha S 3 1 1 1 0
E. arctica x nemorosa S 2 1 1 0
E. arctica x rostkoviana S 3 1 2 0
E. cambrica W 3 1 1 1
E. campbelliae S 3 3 0
E. confusa S 4 4 0
E. confusa SW 3 1 1 1
E. confusa W 3 2 1
E. confusa x micrantha S 2 1 1
E. “fharaidensis” S 2 1 1 0
E. foulaensis S 6 2 2 2 0
28
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
E. foulaensis x marshllii S 2 1 1 0
E. foulaensis x nemorosa S 1 1 0
E. foulaensis x ostenfeldii S 1 1 0
E. frigida S 6 5 1 0
E. heslop-harrisonii S 6 5 1
E. marshllii S 3 3 0
E. marshallii x micrantha S 2 2 0
E. micrantha S 5 4 1
E. micrantha SW 3 2 1
E. micrantha W 3 2 1
E. micrantha x nemorosa SW 1 1 0
E. micrantha x scottica W 4 4 0
E. nemorosa S 3 3 0
E. nemorosa SW 2 1 1
E. nemorosa W 3 2 1
E. nemorosa x tetraquetra SW 1 1 0
E. ostenfeldii S 5 3 2
E. ostenfeldii W 1 1 0
E. pseudokerneri W 3 2 1 0
E. rivularis W 3 2 1
E. rostkoviana W 3 2 1
E. rotundifolia S 1 1 0
E. scottica S 3 3 0
E. scottica W 3 3
E. tetraquetra SW 3 3 0
E. tetraquetra W 3 1 2 0
E. tetraquetra x vigursii SW 2 1 1 0
E. vigursii SW 4 4 0
Total 130 15 67 12 4 11 3 18 655 656 657 29
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
658 659 Table 2. Hierarchical analysis of molecular variance (AMOVA) of British Euphrasia 660 populations. Analyses performed between (A) species, (B) 3 geographic locations (Wales, 661 South-West England, Scotland, (C) diploids and tetraploids. Number in parentheses are the 662 results only including species (excluding hybrids). d.f. = Degrees of freedom. **P < 0.001;*P 663 < 0.05. 664
ITS Plastid DNA Source of variation d.f. % Total variance d.f. % Total variance (A) Taxa Between taxa 33 (19) 63.17** 33 (19) 15.87** (26.48**) (65.62**) Within taxa 96 (77) 36.83 (34.38) 96 (68) 84.13 (73.52) (B) Location Between regions 2 (2) 25.52** 2 (2) 5.40** (4.91*) (25.92**) Within regions 127 (94) 74.48 (74.08) 127 (85) 94.60 (95.09) (C) Ploidy Between ploidy groups 1 (1) 88.24** 1 (1) 11.05* (18.76*) (84.39**) Within Diploid and 123 (95) 11.76 (15.61) 123 (86) 88.95 (81.24) tetraploid 665
30
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
666 667 Supplementary Table S1. Voucher information and population details of Euphrasia samples
668 used in the study. Columns headed cpDNA and ITS 2 refer to the number of individuals
669 sequenced for these regions.
670 Supplementary Table S2. PCR conditions and primer sequences for regions sequenced in this
671 study.
672 Supplementary Table S3. Spatial analysis of molecular variation (SAMOVA) of ITS
673 sequence data across British Euphrasia populations.
674 Supplementary Table S4. Plastid haplotype frequencies across British Euphrasia species. The
675 column cpDNA indicates the number of individuals sampled.
676 Supplementary Table S5. Spatial analysis of molecular variation (SAMOVA) of plastid
677 sequence data across British Euphrasia populations.
678
679
680
681
682
31
bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/164681; this version posted July 17, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.