Page 1 of 117 Journal of Biogeography

1 2 3 4 5 We thank the reviewers for their helpful comments. Our responses to each 6 7 comment are included below in blue. We apologise for the length of time it 8 9 has taken us to resubmit our revision. To properly address reviewer concerns 10 11 required additional analyses that turned out to be quite time consuming. 12 Additionally, and more recently, an intensive campaign of fieldwork by the first 13 14 author in Brazil became a more immediate and immovable priority. We 15 16 appreciate the time taken by the reviewers to assess our manuscript, and we 17 18 believe that they have helped us to provide a more robust and balanced 19 manuscript. 20 21 For Peer Review 22 23 24 25 ------26 REVIEWER COMMENTS TO AUTHOR 27 28 29 30 Referee: 1 31 32 33 Comments to the Author 34 35 This manuscript by Faria and colleagues focuses on two genera of 36 37 Collembola to test for signatures of Pleistocene persistence in glaciated vs. 38 39 unglaciated areas of Great Britain. I believe that this is a compelling and well- 40 written manuscript, which will be of broad interest to the readers of the Journal 41 42 of Biogeography. However, I would like to express a few concerns and some 43 44 suggestions that may help the authors to improve it further. 45 46 47 48 49 - OTU delineation: OTU delineation is a very important aspect of this 50 51 study as all subsequent analyses are based on the delimited OTUs. Given 52 that it is such a critical part of the analyses, I would expect a more thorough 53 54 approach, rather than relying simply on a 4% pairwise p-distance threshold. It 55 56 is not clear why the authors did not assess different threshold values and/or 57 58 they did not apply any phylogeny-based method (e.g., PTP or GMYC 59 analyses for single-locus species delimitation). At least I would appreciate a 60 Journal of Biogeography Page 2 of 117

1 2 3 4 better justification and discussion on this issue, so that it does not sound like 5 an arbitrary decision. 6 7 8 9 The reviewer’s concern is pertinent, particularly because the details about our 10 11 approach for OTU delineation were not included in the manuscript. Our 12 approach was based on prior data available from several studies that have 13 14 investigated the temporal-spatial aspects and biological significance of cryptic 15 16 molecular lineage diversity in the genus , one of the two genera 17 18 investigated in the present study. The more relevant of these for the present 19 study is Cicconardi et al. (2013) who defined a molecular lineage within a 20 21 morphological definedFor species Peer of Collembola Review as “a group of individuals, or an 22 23 individual, that are reciprocally defined as a monophyletic (group of 24 25 individuals) or as a divergent lineage (single individuals) by unlinked loci.” 26 They used Hardy-Weinberg equilibrium (HWE) and linkage-disequilibrium 27 28 (LD) tests among lineages in sympatry to evaluate molecular lineages within 29 30 morphospecies. As a result, they found 19 molecular lineages occurring 31 32 sympatrically, with genetic isolation in sympatry congruent with assignment to 33 biological species. We then based our OTU threshold on these prior data and 34 35 the thorough approach behind with explicit reference to the biological species 36 37 concept. We used sequences from these 19 molecular lineages of 38 39 Cicconnardi et al. collected in sympatry to calculate their pairwise distances 40 and used the minimum p-distance found between them (0.04) to define our 41 42 96% similarity threshold. This is now summarized in lines 180-190. 43 44 45 46 Initially, we did not apply any phylogeny-based method due to the nature of 47 Collembola data, which mostly present deep divergent sequences that may 48 49 have experienced high levels of saturation. This is not very appropriate for 50 51 Bayesian or ML methods. Thus, we used a NJ method to build the tree and 52 used, as explained above, prior data regarding species definition available for 53 54 Collembola to define the 4% distance threshold. That is to say, our interest 55 56 was to identify clusters, not higher-level relationships among clusters. 57 58 59 60 Page 3 of 117 Journal of Biogeography

1 2 3 4 We are aware that NJ uses a genetic distance-clustering algorithm without an 5 optimality criterion. Thus, we have taken on board the reviewer’s suggestion 6 7 to have a more thorough approach to define OTUs, and explored Bayesian 8 9 Inference (BI) and Maximum Likelihood (ML) to build trees to then explore a 10 11 range of OTU delimitation methods. Due to limitations and caveats inherent to 12 delimitation methods, we applied three different approaches to then assess 13 14 congruence of inference across approaches (explained in lines 155-190). 15 16 17 18 Main limitations: 19 GMYC – performance is affected by: ratio of population sizes to species 20 21 divergence times, varyingFor population Peer sizes,Review number of species involved and 22 23 number of sampling singletons (tends to over split species) 24 25 bPTP and mPTP – branch lengths proportional to the amount of genetic 26 change rather than to time. Outperforms GMYC when interspecific distances 27 28 are small. 29 30 31 32 First, to infer an ultrametric tree, which is a requirement for the GMYC 33 method, we ran BEAST analyses using all sequences but no outgroups, since 34 35 they tend to compress the coalescent events towards the tips of the trees, 36 37 making difficult for GMYC to distinguish closely related species. We tested a 38 39 combination of different parameters (tree and clock models) and priors (for the 40 ucdl.mean) but BEAST analyses were somewhat problematic (there were 41 42 convergence issues with poor ESS values). These problems are likely to be 43 44 due to the type of data (species level data with some variation within species), 45 46 which cannot be fully resolved by a Yule tree prior or by a Coalescent tree 47 prior. As we were not confident on the resulting BI trees (which actually 48 49 resulted in very different GMYC results), we explored ML analyses. 50 51 52 53 We inferred a gene tree using maximum likelihood in RAXML (Stamatakis et 54 55 al., 2008). After rooting the tree and removing the outgroup sequence, we 56 57 used the inferred tree for species delimitation by bPTP and mPTP methods 58 59 (Zhang et al., 2013; Kapli et al., 2017). However, bPTP failed to run the raxml 60 Journal of Biogeography Page 4 of 117

1 2 3 4 tree, possibly because the branch lengths of ML trees seemed to be set to the 5 minimum. This could be potentially a problem caused by the large amount of 6 7 exactly identical sequences across both datasets. To take this into account, 8 9 we collapsed our sequences into unique haplotypes, and re-run ML analysis 10 11 and bPTP analysis with this shorter dataset. However, bPTP also failed to run 12 and we focused on mPTP, an improved model over bPTP that accounts for 13 14 different levels of intraspecific genetic diversity. It also implements a more 15 16 accurate heuristic algorithm compared to bPTP. Both and 17 18 Lepidocyrtus unique haplotype datasets were successfully run in the 19 webserver. Using the multicoalescent rate, mPTP delimited 12 Entomobrya 20 21 and 17 LepidocyrtusFor OTUs Peerfor ML trees. Review 22 23 24 25 Finally, we re-ran Beast analyses using the unique haplotype dataset for both 26 genera. Optimal convergence and ESS values were achieved and resulting BI 27 28 ultrametric trees were used for both GMYC and mPTP analyses. GMYC was 29 30 run in R using the ‘splits’ package (Ezard et al., 2009). Number of GMYC ML 31 32 entities were 22 for Entomobrya and 18 for Lepidocyrtus. Finally, mPTP for 33 the BI tree defined 13 OTUs for Entomobrya, and 22 OTUs for Lepidocyrtus. 34 35 Details of similarities and differences among OTUs defined by the different 36 37 methods are in Appendix 2: Table S2.2. 38 39 40 In general, our previous NJ tree inference was similar to the BI haplotype 41 42 trees and only slightly different from ML haplotype trees (see Appendix 43 44 2:Table S2.1). OTU delimitation methods slightly differ in number of OTUs 45 46 defined, thus we have taken a conservative approach counting OTUs as 47 those that were recovered in the majority of approaches employed and 48 49 avoiding intra splits (Appendix 2: Table S2.2). Final number of OTUs is 12 for 50 51 Entomobrya and 18 for Lepidocyrtus. 52 53 54 55 56 - NJ trees: On a related note, I do not find very appealing the fact that 57 58 the only trees presented in the manuscript are distance-based neighbour- 59 joining trees, which are however treated as if they were true phylogenetic 60 Page 5 of 117 Journal of Biogeography

1 2 3 4 trees (e.g., they are used to establish monophyletic sequence clusters - lines 5 271, 361 etc). To establish monophyly, I think it would be preferable to include 6 7 a phylogenetic tree (e.g., a Maximum Likelihood analysis using RAxML which 8 9 is particularly suitable for large datasets). Or you could just present the 10 11 BEAST trees (if they included the full dataset, see next point below), which 12 are currently not presented in the manuscript or supplementary material. 13 14 15 16 17 18 As explained above, we have substituted the NJ trees with BEAST unique 19 haplotype trees (Figs. 2 & 3) and included all haplotypes ML trees in the 20 21 supplementary materialFor (Appendix Peer 2: Figs.Review S2.1 & S2.2). 22 23 24 25 BEAST trees used for dating analysis did not include the full dataset, only the 26 geographic localized monophyletic OTUs and they are now included in the 27 28 supplementary material (Appendix 4: Figs. S4.1 to S4.7) 29 30 31 32 33 - Dating analyses: The methodology followed for the dating analyses in 34 35 BEAST (lines 180-185) is not very clearly explained. For example, it is not 36 37 clear which datasets were used for these analyses. If I judge from the 38 39 explanations in the Results section (lines 333-343) it appears as if the 40 analyses were applied separately on specific sequence clusters (as the 41 42 models of molecular evolution are reported for each OTU or sequence cluster 43 44 separately). However, if this is the case, I do not understand why would you 45 46 opt for a Yule speciation prior for the analyses (lines 183-184), given that you 47 are basically analysing intraspecific datasets. Please clarify and justify which 48 49 datasets were used for these analyses. It would be also particularly helpful to 50 51 include the BEAST trees in the figures of the manuscript or the supplementary 52 material. 53 54 55 56 Yes, dating analyses were applied separately for the geographically localized 57 58 monophyletic clusters and this has been clarified in the text (lines 221-222). 59 60 Journal of Biogeography Page 6 of 117

1 2 3 4 We agree with the reviewer’s point that the Yule speciation prior was 5 inappropriate because we were in fact analysing single OTUs with coalescent 6 7 variation. Thus, dating analyses have been repeated with the Coalescent tree 8 9 prior (lines 224-225). Estimated ages are older than the ones previously found 10 11 using the Yule tree prior, thus, our conclusions have not been changed (lines 12 406-408, 413-414 and Table 5). 13 14 15 16 17 18 - Evolutionary rate estimation: I think you should also clarify and justify 19 better the strategy used for estimating the applied evolutionary rate. For 20 21 example, it is not veryFor clear: Peer A) Why do Review you need to use a COII external rate 22 23 to calibrate the COI rate? (instead of using directly a COI external rate). It 24 25 would be good to explain that from the beginning of this section, otherwise it is 26 quite confusing. B) How does your estimate for the COI rate compares with 27 28 the rates cited in the literature you mention? It would be good to explain that 29 30 too. C) In lines 165-166, why do you mention the broad range of values found 31 32 in the literature for the COII gene, but then you do not really use this range for 33 anything, not even for the discussion. Also, the literature you cite in lines 166- 34 35 167 does not seem to focus on the COII gene, but they are more general 36 37 about mitochondrial clock rates. D) Is it justified to calibrate your evolutionary 38 39 rate based on old divergence times (35Ma here, line 176) and then use it for 40 estimating very recent events? 41 42 43 44 We have included a more thorough explanation about the strategy used. See 45 46 lines 196-228. 47 48 49 Answering each point raised: 50 51 52 A) Currently, there is no COI rate available for Collembola. However, a COII 53 54 mean rate has been estimated for Lepidocyrtus in an extensive comparative 55 56 rate analysis using COII sequences found across Pancrustacean major 57 58 lineages (Cicconardi et al., 2010). We make use of this rate for calibrating the 59 60 Page 7 of 117 Journal of Biogeography

1 2 3 4 rate of the COI region, as it is a rate specifically estimated for one of the 5 genera that we are working on. 6 7 8 9 B) We have only mentioned that we are using a fast and more conservative 10 11 rate of evolution. We have not compared our COI rate because rates cited in 12 the text were proposed to insect groups (e.g. beetles, butterflies, drosophila). 13 14 15 16 C) We mentioned the broad range of rates already proposed for the 17 18 mtDNA to give a general idea of how this rate can vary across groups and we 19 did not discuss how our COI rate fits in this range because it comes from a 20 21 different class (Collembola).For Peer References Review used in these lines are indeed more 22 23 general about the mtDNA molecular clock and we have changed them to 24 25 more appropriated ones. 26 27 28 D) Indeed, rates of mitochondrial evolution have been suggested to be time 29 30 dependent in a variety of taxa, with a trend of declining rates over time for 31 32 insect mitochondrial DNA, but there is in fact little supporting evidence, with 33 time calibrated DNA analyses recently revealed to suffer from inherent bias 34 35 (see https://doi.org/10.1111/mec.13451). By using a rate obtained from an old 36 37 estimate (35 Ma), we found a rate that is faster than those universal rates 38 39 (0.006-0.0248%) discussed above, including the COII rate estimated for 40 Collembola. Thus, we are more likely to be underestimating than 41 42 overestimating the ages of divergence we are interested in. Then, if using this 43 44 faster than expected rate, we obtained ages that predate glaciation periods, 45 46 we can be confident that this conservative approach provides reliable relative 47 time frames for lineage persistence over long periods of time. 48 49 50 51 - Structuring of genetic variation: An important aspect for the 52 interpretation of the observed patterns seems to be the “non-random 53 54 geographical structuring of genetic variation” (e.g., lines 270, 294-296, 359- 55 56 360, 363). However, it is not clear how the authors evaluate the existence of a 57 58 non-random structuring. There is a description of the observed patterns and 59 maps are used for presentation which is nice, but I do not see any formal 60 Journal of Biogeography Page 8 of 117

1 2 3 4 analysis or statistical test. It would be helpful to include some kind of test to 5 assess the structuring of genetic variation, it could just be some simple 6 7 analysis (e.g., AMOVA- Analysis of Molecular Variance) 8 9 10 11 Yes, this is a good point, as we have not quantitatively evaluated the 12 existence of non-random distribution of monophyletic clusters. This has been 13 14 only visually inspected because testing for significance is not simple. AMOVA 15 16 would not be the way to go because it only tells about the existence of 17 18 population differentiation and the pattern we are really interested in it is local 19 monophyly. 20 21 For Peer Review 22 23 To address the reviewer’s concern, we have applied Ripley’s function (K) to 24 25 our data (Dixon, 2002). This method determines spatial distribution of point 26 patterns, i.e. OTUs, to check whether they appear to be dispersed, clustered 27 28 or randomly distributed. For inferential purposes, the estimate of K is 29 30 compared to the true value of K for a completely random (Poisson) point 31 32 process. Deviations between the empirical and theoretical K curves may 33 suggest spatial clustering or spatial regularity. 34 35 36 37 K function was tested using the package ‘spatstat’ in R (Baddeley & Turner, 38 39 2005). Three corrections (Kiso – isotropic, Ktrans – translation correction, Kbord – 40 border method) were applied to observed data while estimating K function and 41 42 these were compared to what it would be expected if spatial distributions were 43 44 completely random (Kpois – completely random Poisson point process). For 45 46 both Entomobrya and Lepidocyrtus, empirical K functions for overall OTUs 47 deviate from the theoretical expected value. 48 49 50 51 These plots used overall distribution of all OTUs, we then checked patterns 52 for individual OTUs of interest (localized and disjunctly distributed clusters) 53 54 using the mark function and then running Kdot (i-to-any) function. This is the 55 56 analogue to K function but for individual marked OTUs (multitype data). To 57 58 run Kdot, an i mark value (character string identifying the points in X from 59 which distances are measured) was defined as each OTU of interest and 60 Page 9 of 117 Journal of Biogeography

1 2 3 4 analyses run separately [i.e. analyses ran between localities of occurrence of 5 a defined monophyletic cluster (i) versus localities where all other OTUs were 6 7 sampled (to-any)]. 8 9 10 11 Plots for geographically localised monophyletic OTUs within Entomobrya and 12 Lepidocyrtus show significant deviation from random distribution consistent 13 14 with geographic clustering. Kdot function analysis was added to the text (lines 15 16 236-243 and 370-373) and we present plots for OTUs of interest (localized 17 18 and disjunctly distributed) in the supplementary material (Appendix S3: Figs. 19 S3.5 & S3.6). 20 21 For Peer Review 22 23 - Limited dispersal capability: Both in the introduction and in the 24 25 discussion the authors support a very low dispersal capability for these 26 genera of Collembola (lines 98-99 and 376-380), which is an important 27 28 argument to discard a scenario of long-distance dispersal after ice-sheet 29 30 retreat (lines 373-376). How is this compatible with the fact that there are 31 32 almost identical haplotypes distributed in Canada, South Africa or Australia 33 (lines 242-243 and 259)? Please explain this in the discussion. 34 35 36 37 This can be reconciled as follows (added to the text in lines 470-482): 38 39 40 Great Britain sequences that are genetically identical or nearly identical to 41 42 sequences sampled in remote continents are most plausibly explained as 43 44 human introductions. Passive dispersal of Collembola through human activity 45 46 has been implicated in the sharing of lineages across distant continents, 47 e.g. Cerathophysella denticulata lineage L3 found in Canada, South Africa, 48 49 Australia and New Zealand (Porco et al., 2012), and such introductions are 50 51 considered to have mainly been accidental (e.g. in soil used as ship ballast). 52 Mitochondrial metagenomic data has also revealed island faunas to be highly 53 54 represented by introduced Collembola (Cicconardi et al. 2017). Our data set 55 56 presents similar examples, such as Entomobrya OTU 6, with sequences 57 58 100% similar sampled in Canada, and South Africa and sequences 100% 59 identical to Lepidocyrtus OTU 9 sampled in Australia and Tasmania. Genetic 60 Journal of Biogeography Page 10 of 117

1 2 3 4 identity, or near identity across these vast geographic distances is consistent 5 with passive transport through human activities 6 7 8 9 10 11 - Sampling effort: I think it would be helpful to provide some more details 12 about the sampling (lines 119-121). The reader may wonder how did you 13 14 ensure equivalent sampling in glaciated vs. non-glaciated areas and at each 15 16 sampling site. 17 18 19 We added the following paragraph to Methods, lines 119-126 20 21 For Peer Review 22 23 Samples were collected from 98 sites across Britain between 2011 and 2012. 24 25 The aim was an approximately equal sampling effort across the glaciated and 26 unglaciated areas of England and Wales, collecting particularly from the 27 28 geographical extremes of the mainland (Fig. 1, Appendix 1: Table S1.1). At 29 30 each location, surface-active were collected by vacuum collection 31 32 (two sets of 30 second collections, transferred directly to EtOH), and duplicate 33 soil / litter samples were collected with minimal disturbance into plastic bags, 34 35 for return to the laboratory and extraction in a Tullgren funnel system followed 36 37 by manual removal of the relevant genera. 38 39 40 41 Some typos and small corrections/clarifications (not an exhaustive list): 42 43 - Line 180: “This estimated COI rate of was” remove “of” 44 45 46 47 Changed. 48 49 50 - Line 136 vs. Line 232: The difference in the fragment lengths is due to 51 52 trimming the ends of the sequences? Please clarify 53 54 55 Yes, primers ColFol-for and ColFol-rev amplify a 658bp barcode region, 56 57 however, when sequencing our DNA samples, the final ends of the 58 59 sequences did not present clear peaks, so they have been trimmed. 60 Page 11 of 117 Journal of Biogeography

1 2 3 4 5 6 7 - Line 402: change “these result suggests” to “these results suggest” 8 9 Changed. 10 11 12 - Line 446: change “bases” to “based” 13 14 Changed. 15 16 17 18 Referee: 2 19 20 21 Comments to the AuthorFor Peer Review 22 23 The present study evaluates the predictions of the “southern richness, 24 25 northern poverty” hypothesis in 2 genera of a group of terrestrial invertebrates 26 with restricted dispersal abilities. The assessment of genetic diversity is done 27 28 with the amplification of COI and morphospecies assignment based in 29 30 matching the genetic data with a reference panel of identified specimens. 31 32 Regardless the morphospecies assignation specimens are clustered in 33 reciprocally monophyletic groups (OTUs) based on genetic distance with 34 35 reference to a previous study of sympatric species from Panamá. The main 36 37 result is the evidence of persistence of Collembola during Pleistocene glacial 38 39 cycles in the northern areas of the British Isles. By showing pre-LGM OTUs 40 clades, the authors provide attractive evidence of potential geothermal glacial 41 42 refugia on areas previously covered by the ice. This idea is supported in 43 44 recent evidence from Antartica. 45 46 The strengths of the study are: (1) ask a general question on an 47 understudied group; (2) an extensive sampling in terms of geography and 48 49 number of specimens; and (3) provide an alternative hypothesis (geothermal 50 51 glacial refugia) to explain the presence of old lineages in glaciated areas. 52 Points to be improved on the manuscript: (1) refer explicitly to situations that 53 54 deviate from the general trend (i.e. (i) OTU 1 of Lepidocyrtus been more 55 56 diverse in glaciated areas, or (ii) no explanation to the potential barriers that 57 58 create disjunct distributions on Lepidocyrtus (see Fig 5a-b)); (2) presentation 59 of phylogenies (see note on Fig. 3); (3) discussion of alternative mechanisms 60 Journal of Biogeography Page 12 of 117

1 2 3 4 to explain genetic patterns observed (for example: (i) differential expansion 5 speeds for both genera (Entomobrya faster than Lepidocurtys), or higher 6 7 genetic diversity in the south for Lepidocyrtus due to historic man mediated 8 9 introductions from other parts of the world); (4) justify some technical 10 11 decisions on the data analysis (program to select models of molecular 12 evolution, or application of OTUs specific models of molecular evolution in the 13 14 estimation of MRCA); and (5) lack of discussion in terms of the implications 15 16 for the tested hypothesis of the presence of 2 intra-OTU clades (NIVG1 and 17 18 NVG2) on Entomobrya. Please, also refer to the Specific comments for more 19 detailed suggestions. 20 21 I would recommendFor this article Peer as: “Accepted Review with minor revisions”. 22 23 24 25 Specific comments: 26 27 28 Line 175: Why did you choose those models? Why they are different on each 29 30 gene? 31 32 33 Choosing the most appropriate model of DNA sequence evolution for a DNA 34 sequence dataset is a crucial step for inferring trees and estimating times of 35 36 divergence. We determined the most appropriate model for each dataset 37 38 using maximum likelihood approach in MEGA software. 39 40 41 Substitution models are different on each gene because sequences obtained 42 43 differ in many aspects (e.g. base frequencies, GC content, bias in transitions 44 45 and tranversions, etc.), which affect the rate of change from one nucleotide to 46 47 another. Models take these differences in account by incorporating different 48 rate parameters, thus, different genes are likely to result in different 49 50 substitution models. 51 52 53 54 Line 223: Are they “morphospecies” or properly identified species? The fact 55 that they are assigned to scientific names makes it look like they were 56 57 properly assigned to described species. Please, make this clear. 58 59 60 Page 13 of 117 Journal of Biogeography

1 2 3 4 The original aim of the work was to establish the extent to which 5 morphospecies and genetic species agreed in these two genera of 6 7 Collembola. It was important to minimise damage to the animals’ bodies prior 8 9 to DNA extraction, so standard additional checks such as squashing 10 11 mouthparts had to be avoided. Inevitably this means that any names 12 generated prior to sequencing must be morphospecies. Whether the referee 13 14 regards these as “properly identified species” or not is almost a semantic 15 16 question. 17 18 19 To expand on the process: the animals were allocated a species name based 20 21 on the result of keyingFor them Peer out using primarilyReview one source, Hopkin (2007). 22 23 For Entomobrya this key relies almost exclusively on colour pattern, and is not 24 25 fully adequate as in conflates E. nivalis and E. intermedia (Steve Hopkin was 26 not convinced that E. intermedia was a good species while writing this key), 27 28 so once an Entomobrya was keyed to E. nivalis its dorsal pattern (especially 29 30 dark markings on abd 4-6) was further checked against South (1961). 31 32 Lepidocyrtus were keyed out using the same key, but in this genus the key 33 uses morphology as well as colour patterns. The shape of T2 and the 34 35 distribution of scales on legs and antennae are especially important, although 36 37 the scales are an unsatisfactory feature since they are hard to see in 38 39 uncleared specimens, dislodge easily on handling, and can wash off the body 40 to end up apparently associated with the legs. 41 42 43 44 45 46 47 Line228: Same comment as with Entomobrya. 48 49 50 51 Please, see answer above. 52 53 54 Line 243: How do you discard the presence of introduced haplotypes? That is 55 56 suggested by the geographic origin of the sequenced haplotypes. 57 58 59 60 Journal of Biogeography Page 14 of 117

1 2 3 4 Rather than discarding, we have acknowledged the presence of introduced 5 haplotypes and provided a better discussion regarding human introductions, 6 7 as explained above in answer to referee 1 (see lines 470-482) 8 9 10 11 Line 334: The Methods section does not indicate the program used to select 12 this. Please, add. 13 14 15 16 Methods did indicate it, now in lines 222-223 (MEGA) 17 18 19 Line 340: The fact that 2 different models of molecular evolution were 20 21 selected for 2 differentFor OTUs Peer implies that Review a different time-calibrated phylogeny 22 23 was used to estimate the age of the OTU on each case, is it correct? If yes, 24 25 what models were used for the other OTUs? 26 27 28 We have only estimated the age of OTUs that are geographically localized to 29 30 a restrict area and this was done for each OTU separately. As it has been 31 32 explained in lines 413-414, the K2 model was selected for both OTUs 3 and 8, 33 while T92 model was selected for three OTUs: 7, 12 and 18 (all models 34 35 selected using MEGA with each dataset separately). 36 37 38 39 If my understanding is correct, I would say that the use of different models of 40 molecular evolution to estimate time-calibrated phylogenies within the same 41 42 group is an unusual practice. I am afraid that the use of different models might 43 44 affect the possibility to compare the age estimations. I would use the same 45 46 model for the whole phylogeny and then compare the estimated ages. 47 However, I do see the benefit of using clade-specific models, as there could 48 49 be variability between clades on the rate of evolution. 50 51 I think this section would benefit by an explicit explanation of the rational of 52 this decision and if there is another paper where this approach is used, please 53 54 include it as a reference. 55 56 57 58 We have not estimated ages for the whole phylogenetic tree. Our aim was not 59 to provide a time-calibrated phylogeny for the group but rather provide a 60 Page 15 of 117 Journal of Biogeography

1 2 3 4 relative temporal framework for coalescent origins of geographically localized 5 monophyletic clusters as they indicate endemic genetic variation evolving in 6 7 situ and check whether they predate the LGM. We are not aware of another 8 9 paper using a similar approach but we have expanded our explanation in lines 10 11 196-228. 12 13 14 15 16 Line 359: How do you interpret that only one OTU (OTU 4) has a non-random 17 18 pattern? Why is it different? 19 This is only mentioned in the discussion and there is no further evaluation of 20 21 its relevance for theFor main argument. Peer Review 22 23 24 25 First, we mapped the geographic distribution of each sequence, and then we 26 evaluated how OTUs or variation within OTUs (e.g. monophyletic clusters) are 27 28 distributed across sampling sites (widespread, restricted to a few closely 29 30 located sites or disjunctly distributed). When we closely examine the amount 31 32 of sequence variation within each OTU and the geographic distribution of this 33 variation, we can see that only OTU 4 has sequence variation restricted to a 34 35 few proximate sites, i.e., they are geographically localized to a restricted area. 36 37 While the other Entomobrya OTUs present distinct patterns, with sequences 38 39 being more widely or disjunctly distributed. 40 41 42 43 44 Line 404: The presence of more than one morphospecies on the same 45 46 mitochondrial lineage in Lepidocyrtus could be caused by introgression 47 events. Have you considered this alternative? How this would affect your 48 49 conclusions? 50 51 52 One of the controversies of DNA barcoding is that it can generate false 53 54 negatives, i.e., identical barcodes in two different species, and this could be 55 56 explained by mtDNA introgression, hybridization, and incomplete lineage 57 58 sorting causing incongruences between gene trees and species trees (Funk & 59 Omland, 2003). To be able to evaluate introgression, we would need a 60 Journal of Biogeography Page 16 of 117

1 2 3 4 combination of mitochondrial and nuclear genes to assess these 5 incongruences but we have only sequenced the COI gene. However, no 6 7 mtDNA introgression and rarely hybridization have been documented in 8 9 Collembola (Burkhardt & Filser, 2005). For example, field studies have shown 10 11 a low probability of natural hybridasation between individuals derived from 12 sympatric population which has been related to specific habitat preferences of 13 14 these population resulting in spatial isolation, thus preventing hybridization 15 16 (Skarżyński, 2004) 17 18 19 We believe a taxonomic issue is more likely to explain this pattern. 20 21 Collembola studiesFor are proving Peer morphological Review species to be largely 22 23 polyphyletic and constituted by several monophyletic clusters geographically 24 25 distributed in restricted areas (e.g. Garrick et al., 2007, 2008, Cicconardi et 26 al., 2010), and many studies are challenging the current taxonomic system 27 28 which lacks enough morphological characters to reliable identify species and 29 30 species boundaries (e.g. Cicconardi et al., 2013). Morphological stasis due to 31 32 a possible buffering effect for the soil environment (selection does not act in 33 visual cues) has also been used to explain some discrepancies between 34 35 morphological and molecular data, such as a high level of cryptic species. 36 37 38 39 Furthermore, molecular studies indicate stable lineages evolving 40 independently with no sharing between close located sites and long 41 42 divergence time as main features for Collembola diversity (e.g. Cicconardi et 43 44 al., 2010, 2017). Finally, if introgression more than taxonomic issues were to 45 46 explain our data, it would not change our conclusions regarding number of 47 OTUs and time of origins for endemic variation. It would rather tell us that 48 49 some of our OTUs are a process of mixing between two putative species. 50 51 52 Line 413: What if the most diverse areas, are more diverse because they 53 54 have more introduced haplotypes? The south of the country has very 55 56 important port cities. 57 58 59 This is not likely to be the case because introduced haplotypes occur in both 60 Page 17 of 117 Journal of Biogeography

1 2 3 4 North and South of the country. 5 6 7 8 9 Line 415: Entomobrya presents the similar diversity in glaciated and 10 11 unglaciated areas, while Lepidocyrtus is more diverse in unglaciated areas. 12 What if this pattern is explained by differential re-colonization speeds between 13 14 genera? 15 16 17 18 Been Entomobrya faster in recolonization and Lepidocyrtus slower. Then, 19 given enough time, the diversity of Lepidocyrtus will be also the same 20 21 between glaciated andFor unglaciated Peer areas. Review In other words, if differential re- 22 23 colonization speed is taken into account, this data (both genera distributions) 24 25 could still support the “northern poverty” hypothesis. Please, refer to this 26 possibility. 27 28 29 30 This possibility has been acknowledged in the text (lines 506-508). 31 32 33 Table 1: 34 35 - Why range is not included? It is present on Table 2 36 37 38 39 It has been included to Table 1. 40 41 42 - On the morphospecies with “-“ on the last column mean that there is 43 44 “no match”. If yes, write that in order to have the same nomenclature as in 45 46 Table 2. 47 48 49 This has been changed. 50 51 52 53 54 Table 2, Column OTU 1: It seems quite interesting that OTU 1 is more diverse 55 56 on the glaciated area. It is true that corresponds to only one of 18 lineages, 57 58 but it is one of the most widespread. I would suggest to, at least, referring to 59 this situation explicitly in the discussion. Does it have an effect on the final 60 Journal of Biogeography Page 18 of 117

1 2 3 4 interpretation of the data or it is just an isolated situation? 5 6 7 Table 2 refers to numbers of localities where OTU1 has been sampled in each 8 9 area (glaciated x unglaciated). There is no indication of haplotype diversity 10 11 between these areas. This is actually referred to in Table 4. Number of 12 haplotypes are greater in glaciated than unglaciated areas for both OTU 1 and 13 14 15, however after rarefying the data and testing diversity differences using 15 16 ECOSIM, as explained in methods, we found intra-OTU richness to be 17 18 equivalent in both areas for both OTUs. 19 20 21 Figure 3: For Peer Review 22 23 - Lineages 6; 9.1; 15.1; 15.3 and 16, could also be cartooned to save 24 25 space. 26 27 28 This figure has been changed. 29 30 31 32 - Two different clades are lab 33 34 35 This comment is incomplete. 36 37 38 39 Referee: 3 40 41 42 Comments to the Author 43 44 The paper by Faria et al. has a number of positive attributes. For example, it 45 46 broadly examines genetic diversity (i.e., at the genus, species and population- 47 level) in organisms for which relatively little information about spatial 48 49 structuring is available, and presents an interesting (and perhaps somewhat 50 51 counter-intuitive) finding of pre-LGM persistence in glaciated areas, 52 suggesting an important role cryptic northern for micro-refugia. Given this, I 53 54 think that paper has potential to make a strong contribution to understanding 55 56 of biogeographic patterns and processes in northern latitudes. 57 58 59 My main reservations relate to (1) the rather qualitative / descriptive nature of 60 Page 19 of 117 Journal of Biogeography

1 2 3 4 the inferences, (2) the use of potentially confusing / inconsistent terminology 5 relating to units of biological organization, as well as (3) missed opportunities 6 7 for extracting more information out of the data that were collected. These are 8 9 expanded upon below. 10 11 12 1. Examination of intra-OTU spatial structure seems to have been done by- 13 14 eye, yet conclusions based on this a re phrased quite strongly. For 15 16 Entomobrya spp. OTU 4, I think that there is sufficient sympatry between the 17 18 two subgroups that the questions about the extent of substructure could be 19 raised on the basis of differing personal opinion; for Lepidocyrtus spp. OTUs, 20 21 readers cannot evaluateFor subgroup Peer spatial Review structure for themselves, and so is 22 23 an issue with transparency. In the latter case, the delineation of intra-OTU 24 25 subgroups also not presented, and I don't recall seeing any explicit criteria laid 26 out for identifying such subgroups a priori. For example, for Entomobrya spp. 27 28 OTU 4, only a subset of the haplotypes in this this OTU are assigned to a 29 30 subgroup, and basis of deciding which haplotypes are included or not needs 31 32 clearer rationale. I would add that, if elements of the estimated phylogenetic 33 tree topology and node support values are used for this (cf. simply using % 34 35 sequence divergence thresholds), then maximum likelihood and/or Bayesian 36 37 inference would be preferable to the current NJ trees. 38 39 40 41 42 We have re-analysed our data through both Bayesian and Maximum 43 44 Likelihood inferences and these trees were used as input to OTU delimitation 45 46 analysis (GMYC and mPTP). Please, see the detailed explanation given in 47 response to a similar issue raised by referee 1. Within Entomobrya OTU4, a 48 49 widespread OTU, the subset of haplotypes indicated as in situ endemic 50 51 variation was defined by node support (BI posterior =1) and inspection of its 52 distribution, e.g. monophyletic clusters occur only in closely located 53 54 geographic sites as opposed to the wider distribution of other sequences 55 56 belonging to this OTU. For Lepidocyrtus, none of the widespread OTUs 57 58 presented this signature, but five OTUs (defined as explained above) were 59 restrictedly distributed across sampling sites, indicating their localized 60 Journal of Biogeography Page 20 of 117

1 2 3 4 occurrence. Thus, we are not referring to subgroups but rather to entire OTUs 5 within Lepidocyrtus data and reader can verify support for their monophyly 6 7 and their localized distribution by looking at Figure 3 and maps presented in 8 9 Figure 5. 10 11 12 13 14 2. Throughout, there is awkward interchangeable use of the following: (a) 15 16 intra-OTU = within species = among population; (b) OTU = species = lineage. 17 18 I recognize that OTUs are defined as groups of haplotype with < 4% 19 sequence divergence among them, that these are tentatively treated as 20 21 species. However, Forsome confusion Peer arises Review when also using the term species to 22 23 refer to morphologically identified named species, and in some cases (owing 24 25 to a departure from the standard number system used for most OTUs) 26 "lineage" is no longer synonymous with OTU. All of this is further complicated 27 28 by use of the term "communities", which in some cases is clearly misapplied 29 30 (i.e., when dealing with the intra-OTU level, it would effectively be referring to 31 32 communities of haplotypes), and in other cases is questionable (e.g., when 33 dealing with multiple OTUs, it implicitly suggests that the communities are 34 35 comprised only of Collembola from up to two genera). 36 37 38 39 We have revised the use of these terms throughout the text. We noticed that 40 the confusion mainly appeared in sections related to species diversity analysis 41 42 in which we erroneously used the term species to designate OTUs and 43 44 referred to communities of these species in glaciated and unglaciated areas 45 46 without explaining the term. The word community is used to refer to the 47 collection of sequences of each genus found in each area. It is a statistical 48 49 term used in the analysis more than an ecological term. We have added an 50 51 explanation of what we mean by community in lines 262-263 and Tables 3 & 4 52 captions. Apart from that, we have used OTU or intra-OTU to talk about 53 54 taxonomic units defined by molecular data and avoided the use of lineage or 55 56 species terminologies. Also, morphologically identified named species have 57 58 been always referred to as morphospecies. 59 60 Page 21 of 117 Journal of Biogeography

1 2 3 4 5 3. Both measures of richness (i.e., rarefied number of different OTUs at the 6 7 species-level, and rarefied number of haplotypes at the intraspecific-level) 8 9 treat all countable entities as being equally different from one another. 10 11 However, the information embedded in the DNA sequence data that could be 12 used to consider phylogenetic diversity at both levels. Also, I think that more 13 14 could be done with the finding that Entomobrya comprised of 13 OTUs, but 15 16 assigned to only 6 morphospecies, and Lepidocyrtus was comprised of 22 17 18 OTUs, but assigned to only 8 morphospecies (two unnamed). Does this 19 suggest that the 4% sequence divergence cut-off for COI does capture 20 21 biologically relevantFor partitions Peer (i.e., it over-splits Review groups), or alternatively, that 22 23 that morphological characters are insufficient (i.e., lead to under-splitting)? If 24 25 one of these explanations is favored, on what basis is that other discounted? 26 If the favored explanation necessarily the same for both genera, and why / 27 28 why not? 29 30 31 32 Measuring phylogenetic diversity (PD) gives, among other important things, 33 information about community structuring patterns (e.g. whether species in a 34 35 community present phylogenetic clustering or overdispersion) which goes 36 37 beyond our main focus. Taking referee’s suggestion, we have calculated 38 39 phylogenetic diversity (Faith's PD) for communities of Entomobrya and 40 Lepidocyrtus from glaciated and unglaciated areas. Then compared, through 41 42 standardized effect size (SES), observed Faith’s PD patterns to those 43 44 expected under various null models of community assembly and phylogeny 45 46 structure (Swenson, 2014). SES was run with iterations = 1000 and runs= 999 47 using ‘picante’ package in R (Kembel et al., 2010). For datasets using overall 48 49 OTUs, Entomobrya PD in both glaciated and unglaciated communities were 50 51 significantly lower than what it would be expected in null communities (Table 52 1). For Lepidocyrtus, PD in glaciated community does not differ from the 53 54 expected in null communities, while PD in unglaciated community is 55 56 significantly lower than expected in null communities (Table 1). These 57 58 patterns of lower PD indicate that closely related species occur more together 59 than expected by chance, i.e. phylogenetic clustering, which suggests niche 60 Journal of Biogeography Page 22 of 117

1 2 3 4 conservatism. Although we have understood how PD varies in our dataset, we 5 rather not include this to our main text because it opens a new topic and 6 7 discussion that is not essential for the patterns we are describing, plus it 8 9 would compromise word limit. Additionally, we are uncomfortable 10 11 extrapolating phylogenetic relationships from our single marker, and as 12 mentioned above saturation issues compromise deeper phylogenetic 13 14 inference. 15 16 17 18 Regarding correspondence between morphology and molecular analysis, as 19 discussed above, we have re-done our delimitation analysis and results are 20 21 similar to the 4% cut-offFor used Peer previously. Review The discrepancy found between 22 23 morphospecies and OTUs is a common feature in studies applying molecular 24 25 methods to survey biological diversity. For details and potential explanations, 26 please refer to our response to similar question raised by referee 2. An 27 28 insufficient morphological taxonomic system is likely to explain mismatches 29 30 for both genera and this had been briefly discussed in lines 494-497. 31 32 33 34 35 Table 1 - Faith’s Phylogenetic diversity (PD) measures for Entomobrya and 36 Lepidocyrtus datasets from glaciated and unglaciated communities. Observed 37 38 patterns were compared to those expected under null models of community 39 assembly and phylogeny structure. ComGL = community in glaciated areas, 40 41 comUnG = community in unglaciated areas, n.tx = number of taxa in community, 42 43 pd.obs = observed Faith’s PD in community, null = mean PD in null communities, sd 44 = standard deviation of PD in null communities, rank = rank of observed PD vs. null 45 46 communities, obs.p = P-value of observed PD vs. null communities 47 48 49 Genus n.tx pd.obs null sd rank obs.p 50 51 52 ComGL 9 1,46 1,97 0,24 12,00 0,012 53 54 Entomobrya 55 ComUnG 12 1,92 2,31 0,21 53,50 0,054 56 57 58 ComGL 11 10,52 13,35 1,96 82,00 0,082 59 60 Page 23 of 117 Journal of Biogeography

1 2 3 4 ComUnG Lepidocyrtus 17 15,56 18,91 1,22 56,00 0,056 5 6 7 8 9 10 11 Overall, I think that the paper has good potential, but would benefit from a 12 13 revision. Below I provide comments on some more minor aspects that could 14 15 be addressed relatively easily. 16 17 18 19 20 ABSTRACT 21 For Peer Review 22 23 L34. insert word: "... [genetic] signatures..." 24 25 26 27 L40. what would the signatures of persistence (cf. recent recolonization) look 28 29 like, given the focus on OTUs within genera? 30 31 32 Endemism and disjunct distribution have been used as basic arguments to 33 34 support the glacial survival hypothesis (e.g. Brochmann et al, 2003), thus they 35 36 are considered signatures of persistence. Populations surviving glacial cycles 37 are expected to accumulate genetic variation, and they should contain related 38 39 and locally endemic alleles, distinct from surrounding regions (Hewitt, 1999). 40 41 As explained throughout the manuscript, signatures of persistence were found 42 43 in monophyletic clusters of sequences occurring in geographically localized 44 sampling sites, i.e., endemic variation that diversified in situ. For Entomobrya, 45 46 this was found within a widespread OTU, while for Lepidocyrtus, entire OTUs 47 48 presented this pattern. Also, clear patterns of OTUs disjunctly distributed were 49 50 found within both genera. Remaining OTUs that are widespread but present 51 no sign of localized endemic variation or are geographically localized but 52 53 present no sequence variation are more likely to be a result of recent 54 55 recolonization. 56 57 58 59 60 Journal of Biogeography Page 24 of 117

1 2 3 4 L44-45. in what way is "geographically disjunct genetic variation" different 5 from "geographically localized diversification of lineages"? 6 7 8 9 These patterns are quite distinct in the way genetic variation is spatially 10 11 distributed. In the first pattern, related sequences are distributed considerably 12 apart from each other in the geographical space (e.g. an OTU found in East 13 14 Anglia and South West). While in the second pattern, the range of distribution 15 16 of genetic variation is rather geographically restricted to a few closely located 17 18 sites. 19 20 21 For Peer Review 22 23 L46, what is the basis for suggesting that pre-LGM diversification occurred in 24 25 situ? 26 27 28 As explained above, populations surviving glacial cycles are expected to 29 30 accumulate genetic variation, and they should contain related and locally 31 32 endemic alleles distinct from surrounding regions (Hewitt, 1999). Our finding 33 of monophyletic clusters of sequences that are geographically restricted to a 34 35 few closely located sites whose coalescent origin predates LGM, indicate 36 37 endemic variation evolving in situ. Following a comprehensive approach to 38 39 estimate dates for the onset of these in situ diversification events, we found 40 ages for these events that were older than the LGM, indicating population 41 42 persistence that precedes LGM. 43 44 45 46 47 INTRODUCTION 48 49 50 51 L57. replace word: "significantly" often implies a statistical test has been 52 performed. 53 54 55 56 Changed to “considerably”. 57 58 59 L60. expand abbreviation (Ma.) on first mention. Also, lower bound (0.01 Ma) 60 Page 25 of 117 Journal of Biogeography

1 2 3 4 marks the beginning of the Holocene, which is generally thought to be been 5 free of glaciation events... should this lower bound instead be 0.02Ma (i.e., at 6 7 the height of the LGM)? 8 9 10 11 The lower bound for LGM is 11,7Ka (e.g. Tzdesaki et al. 2013), thus, we have 12 changed it to 0.012 Ma. 13 14 15 16 L63. Delete word: [geographical] - seems redundant, given reference to 17 18 latitudinal range. 19 20 21 Changed to geographicFor position Peer Review 22 23 24 25 L64. Consider replacing "leaving a" with "resulting in a present-day" 26 27 28 Replaced. 29 30 31 32 L68. The opening of this new paragraph refers back to "this view" given in the 33 previous one. I think the new paragraph should stand alone, by re-stating / 34 35 clarifying the major elements of the scenario given previously (e.g., "local 36 37 extirpation followed by recolonization from southern refugia, coupled with loss 38 39 of intraspecific diversity owing to founder effects" or similar). 40 41 42 Due to manuscript word limit, we rather do not re-state the scenario explained 43 44 in the previous paragraph. We believe it is sufficient. 45 46 47 48 49 L76. "Environmental stability" seems vague. Does this really just mean 50 51 climatic stability / thermal buffering... or is there more to it? Either way, I think 52 this should be rephrased. 53 54 55 56 57 58 Changes in water properties such as temperature and chemistry are much 59 slower in subterranean than in surface waters. Therefore, groundwaters are 60 Journal of Biogeography Page 26 of 117

1 2 3 4 buffered from extreme changes in temperature and rapid hydrological 5 variations, resulting in a more stable environment. 6 7 8 9 We have added a short explanation to line 78. 10 11 12 13 14 L79 & 82. Replace word: [presents / presented] 15 16 17 18 ‘Presents’ changed to ‘shows’ 19 20 21 L82-84. Here and aboveFor the Peer authors seem Review to be suggesting that if haplotypes 22 23 are presently endemic to an area, then they must have arisen there / evolved 24 25 in situ. While I agree that this may be likely and is certainly a parsimonious 26 explanation, it does not preclude a scenario in which a distinct lineage 27 28 evolved elsewhere, colonized a new area, and subsequently went extinct in 29 30 the original location. I think that underlying assumptions should probably be 31 32 rationalized for the reader. 33 34 35 Yes, we agree and have suggested this when discussing patterns for 36 37 disjunctly distributed OTUs. These native OTUs probably presented a broader 38 39 distribution in Great Britain but went extinct in most of their ranges in 40 response to climate change (lines 458-464). We have now explicitly raised 41 42 this scenario for localised clades explaining they could also have once been 43 44 widespread (i.e. same and related alleles found elsewhere), but went extinct 45 46 in most of their range in response to climate change, surviving in a few ice- 47 free localities (lines 464-468) 48 49 50 51 52 L87. expand abbreviation (ka) on first mention. 53 54 55 56 Changed. 57 58 59 60 Page 27 of 117 Journal of Biogeography

1 2 3 4 5 L91-92. rephrase, to past tense. 6 7 8 9 We do not think it is necessary. 10 11 12 L104, insert words: "...test for [the following] signatures..." 13 14 15 16 Changed. 17 18 19 L105, What aspect(s) of "intra-OTU genetic richness" are considered (e.g., 20 21 based on entire haplotypesFor Peervs. just on Reviewsegregating sites, or some other)? 22 23 24 25 We have considered haplotypes within OTUs and this is explained in Methods 26 section. 27 28 29 30 L108, replace "origins" with "times" (or similar). 31 32 33 We do not think it is necessary. 34 35 36 37 L109, Is long-distance dispersal considered an implausible explanation a 38 39 priori (e.g., given the statements about dispersal limitation of Collembola, 40 given earlier in the Intro), or is this still considered to be biologically possible 41 42 and therefore is somehow going to be tested against alternative explanations 43 44 for disjunct distributions? 45 46 47 Yes, it is considered unlikely to be due to dispersal limitation, as explained in 48 49 Intro and Discussion, lines 450-464. 50 51 52 L110. expand abbreviations (MtDNA, COI) on first mention. 53 54 55 56 Changed. 57 58 59 METHODS 60 Journal of Biogeography Page 28 of 117

1 2 3 4 5 L118, Here it says N=722 samples of Entomobrya spp. and N=428 of 6 7 Lepidocyrtus spp., but Table S1 lists N=724 and N=391, respectively. 8 9 10 11 This has been verified and corrected. Table S1 has now N=722 Entomobrya 12 and N=428 Lepidocyrtus. 13 14 15 16 17 18 L120-121, Here it says N=98 sites, but Table S1 has site ID#'s to go up to 19 102. If might be worth adding to the Table S1 caption that site ID#'s are not 20 21 consecutive numbers?For Peer Review 22 23 24 25 We have added to table header: “Sites 46, 70, 71 and 82 are absent from the 26 list as no specimens were collected at these siles” 27 28 29 30 L124-125, rephrase "posterior identification checks" 31 32 33 We think it is OK. 34 35 36 37 L130-131, Of the "100 µL volume of proteinase K + Buffer ATL", how much of 38 39 each component / mixed in what ratio? 40 41 42 43 44 90 ul of Buffer ATL + 10 ul of proteinase K (ratio 9:1 which is the same ratio 45 46 found in the original protocol). 47 48 49 L138, expand "PCR" on first mention (earlier) 50 51 52 This is expanded in line 144 and we do not see the need to expand it before. 53 54 55 56 L150-152, This needs to be shifted down, since the basic of unit of analysis is 57 58 an OTU, yet the information on how they were delineated comes later in the 59 60 Page 29 of 117 Journal of Biogeography

1 2 3 4 paragraph. 5 6 7 This has been completely revised. 8 9 10 11 L152-154, It is not immediately clear why a NJ phylogenetic analysis was 12 necessary, given that all that's needed to define OTUs is pairwise p-distances. 13 14 If the tree is needed for some other purpose, then that should be stated. 15 16 17 18 This has been completely revised. 19 20 21 For Peer Review 22 23 L155-158, What is the relevance here of choices relating to the OTU 24 25 delineation threshold that was previously made by Cicconardi et al. 2013? Is 26 there something biologically meaningful about this 96% similarity cut-off (e.g., 27 28 corresponds with a natural division between intra- vs. inter-specific variation 29 30 seen for named species), or this used just for consistency across studies? 31 32 Either way, rationale should be given, including why the use of different genes 33 (COI here vs. COII in Cicconardi et al. 2013) would not be problematic for this 34 35 purpose (there are indications in the following paragraph that COI mutation 36 37 rate is quite a bit faster, almost 2x, than that of COII). 38 39 40 We have extensively discussed these points in previous answers to Referee 41 42 1. Please, check responses to topics “OTU delineation” and “Evolutionary rate 43 44 estimation” in Referee 1. 45 46 47 L158-161, this last sentence indicates what was done (i.e., a BOLD 48 49 comparison), but does not clearly explain the purpose of doing it. 50 51 52 We explained that it was done “for evidence of identical or nearly identical 53 54 sequences sampled outside the Great Britain”. 55 56 57 58 59 60 Journal of Biogeography Page 30 of 117

1 2 3 4 L165 & 167, I presume these rates are per lineage (cf. between a pair of 5 diverging lineages), but it would good to explicitly state that this is the case. 6 7 8 9 These are general rates proposed for different groups within . 10 11 12 L192, Following mapping, how were non-random patterns identified? 13 14 15 16 Patterns were identified visually. Please, refer to our answer to similar 17 18 question to Referee number 1. 19 20 21 L197 & 201, there isFor some redundancyPeer Review here re: the partitioning into glaciated 22 23 vs. unglaciated groupings. 24 25 26 In these lines, we have explained how sampling sites were classified into 27 28 glaciated and unglaciated areas to test for richness differences between these 29 30 areas. We did not see the redundancy indicated. 31 32 33 L208-210, meaning unclear: "...differences in sampling and abundances..." 34 35 and "...as a threshold the abundance of the smaller...". Is "abundance" the 36 37 essentially just the sample size (i.e., number of sequenced individuals), or 38 39 something other? 40 41 42 Yes, abundance essentially means sample size = number of sequences per 43 44 sampling sites 45 46 47 L212-213, An a priori hypothesis is given for OTU-level richness but no 48 49 hypothesis is given from intra-OTU richness, and so this seems incomplete. 50 51 52 This has been changed to include intra-OTU diversity. 53 54 55 56 L214-215, the reference to "communities" here (i.e., smaller, or rarefied) is 57 58 confusing. I think it just intended to refer to glaciated vs. unglaciated 59 groupings, but it doesn't give the same clarity (i.e., does not convey which of 60 Page 31 of 117 Journal of Biogeography

1 2 3 4 this two had the small sample size). Also, the idea that a community (in the 5 ecological sense) would be comprised only of members of two Collembola 6 7 genera seems to be a narrow view. 8 9 10 11 Yes, we just intend to refer to the groupings found in each area. The term 12 community comes from the software used. We have added an explanation of 13 14 what we mean by community in lines 262-263 and Tables 3 and 4. 15 16 17 18 19 RESULTS 20 21 For Peer Review 22 23 L221-231, some persistent issues with correspondence between sample sizes 24 25 reported in the Methods, here in the Results, and in Table S1. Perhaps a 26 clearer distinction needs to made between number of specimens collected 27 28 and assessed on the basis of morphology vs. number of collected specimens 29 30 use for molecular analyses. 31 32 33 Table S1 (now Table S1.1 in Appendix S1) has been amended and it now 34 35 includes all specimens collected and assessed on basis of morphology. All 36 37 specimens were sequenced, however, some of them failed to amplify or 38 39 produce sequences of good quality traces (as explained in lines 285-286). 40 This is why numbers differ. 41 42 43 44 45 46 L235, reword: "...clustering sequences [into] OTUs..." 47 48 49 We think [down to] is OK. 50 51 52 L241 (& 258): What is the importance of OTUs that closely matched 53 54 sequences in BOLD? It is not clear to me why these are being highlighted. 55 56 57 58 59 60 Journal of Biogeography Page 32 of 117

1 2 3 4 Identical or near identical BOLD matches are highly indicative that the 5 sequences belong to the same species, and their geographic origin can be 6 7 informative about introduction (Cicconardi et al., 2017). 8 9 10 11 L252, typo [OUT] 12 13 14 Changed. 15 16 17 18 L267, replace "well" with "broadly" (or similar). 19 20 21 Replaced. For Peer Review 22 23 24 25 L269, delete word [more] 26 27 28 Deleted. 29 30 31 32 33 L269-275, The conclusion of non-random geographic structuring / clear 34 35 structuring of genetic variation within OTU 4 is subjective, and perhaps 36 37 questionable. From looking at Fig. 4, the two subgroups (NivG1 and NivG2) 38 39 are broadly sympatric at 3 sampling sites and allopatric at 4 sites, so the 40 presence / absence of substructure could be seen as somewhat equivocal. 41 42 43 44 Rather than subjectively, we based our conclusion on the high bootstrap 45 46 support value found for these groups in the NJ analysis, which has been 47 corroborated by the high posterior value found in the Bayesian inference and 48 49 inspection of their geographic distribution. Both suggested the existence of 50 51 these two monophyletic clusters within OTU4. The fact that they are broadly 52 sympatric does not change conclusions. 53 54 55 56 57 58 59 60 Page 33 of 117 Journal of Biogeography

1 2 3 4 Also, from Figure 2, inclusion of haplotype C019 in the subgroup NIVG1 is 5 odd, and is inconsistent with the reported node support value (bs=67%) for 6 7 the monophyly of this group. 8 9 10 11 This was an error when shading the subgroup, C019 does not belong to 12 NIVG1 13 14 15 16 17 18 L278, rephrase "presented spread/widespread" 19 20 21 We believe it doesn’tFor need toPeer be rephrased. Review 22 23 24 25 L280, replace "unique" with "different" (or similar). 26 27 28 Changed to single. 29 30 31 32 L281-282, comment: the labelling structure of OTUs is confusing. For 33 Entomobrya (and most Lepidocyrtus) spp., each different OTU (also 34 35 sometimes referred to as "lineage") is identified by a different integer. 36 37 However, in a few cases for Lepidocyrtus spp., different OTUs are identified 38 39 by a decimal place (e.g., OTU 7.1 and OTU 7.2), and this also means that 40 "lineage" is no longer synonymous with "OTU" (e.g., lineage 7 is a higher-level 41 42 group). I think that the inconsistency introduced by the latter should be 43 44 avoided. Also, the decision to pool different OTUs together (e.g., OTU 7.1 + 45 46 OTU 7.2 = OTU 7) causes the earlier definition of what an OTU is (i.e., the 4% 47 rule) to now be inaccurate. It is also unclear if this treatment was applied prior 48 49 to conducting richness analyses, or afterwards. I think it would be better to 50 51 avoid this reclassification altogether. 52 53 54 This has been completely revised after phylogenetic inferences. 55 56 57 58 L287, replace "disjunctive" with "disjunct" 59 60 Journal of Biogeography Page 34 of 117

1 2 3 4 Both terms are correct. 5 6 7 L295-297, as above, that conclusion that there is geographic structuring of 8 9 genetic variation is subjective (e.g., no test was performed). In this case, there 10 11 is also no opportunity for the reader to see for themself what the authors are 12 refer to, given that subgroups within OTUs 3, 7, 8, 12 and 18 are not identified 13 14 on Figure 3, and their spatial distributions are not mapped in Figure 5. 15 16 17 18 There are no subgroups within these OTUs, they are themselves whole OTUs 19 that show geographic structuring of genetic variation and their distributions 20 21 are indeed indicatedFor in Figure Peer 3 and mapped Review in Figure 5. 22 23 24 25 L311, "time" (rather than "times") 26 27 28 Times is correct because it does not refer to chronology but to the number of 29 30 times a random sample is drawn from a community 31 32 33 L315, replace word: "communities" (within-OTU was previously described as 34 35 equivalent to within-species). 36 37 38 39 The word “communities” has been clarified, as discussed above, and we have 40 refrained from using species terminology. 41 42 43 44 L325, "lower" (rather than "smaller") 45 46 47 We rather stick to smaller because it clearly denotes something is reduced. 48 49 This fits better to the reduced diversity found than the word lower, which 50 51 relates more to a position, e.g. subordinate, inferior, worse. 52 53 54 L327 & 328, the level of organization here is among individuals / populations 55 56 within species (not within communities). 57 58 59 60 Page 35 of 117 Journal of Biogeography

1 2 3 4 As explained above, the use of the term community has been explained in line 5 262-263. 6 7 8 9 L333, the best model applies to a set of sequences, not to groups 10 11 12 Changed to “for sequences in the two Entomobrya monophyletic groups” 13 14 15 16 L334-337, as noted above, inclusion of haplotype C019 in the subgroup 17 18 NIVG1 is odd. Also, something that is not mentioned here or apparent from 19 Figure 4 is which of the two subgroups (i.e., NivG1 or NivG2) has a 20 21 geographic distributionFor spanning Peer the glaciated Review and unglaciated areas. 22 23 24 25 CO19 has been incorrectly included in NIG1 and this has been amended in 26 Figure 2. Their geographic distribution has been mentioned in lines 341-342 27 28 and we added how they span through glaciated and unglaciated areas in lines 29 30 342-343. 31 32 33 DISCUSSION 34 35 36 37 L350, "confined" (rather than "constrained") 38 39 40 Changed to “restricted”. 41 42 43 44 L356, "characteristics" (rather than "arguments") 45 46 47 We are talking about arguments to support a hypothesis, thus, we believe the 48 49 word is appropriate. 50 51 52 L359-364, From the shading of these subgroups on Figure 2, it appears that 53 54 NivG1 includes 11 (not 10) individuals, and NivG2 includes 7 (not 6) 55 56 individuals. 57 58 59 60 Journal of Biogeography Page 36 of 117

1 2 3 4 We re-built the tree and redraw subgroups making sure the right individuals 5 have been shaded. 6 7 8 9 L373, "long-distance" (rather than "long-term") 10 11 12 Changed. 13 14 15 16 L381, missing word: "...origin in [the] Great Basin..." 17 18 19 This article is not needed before the name of the country (Great Britain). 20 21 For Peer Review 22 23 L385-385, The meaning of "leading front" are "pre-empt space" are both 24 25 somewhat unclear. 26 27 28 The ideas conveyed by both terms are commonly discussed in the literature 29 30 concerning genetic consequences of Pleistocene glaciations. 31 32 “Leading front” (~ leading edge) is a common term used to explain patterns of 33 colonization after ice sheets retreat [(e.g. “dispersal at the leading edge would 34 35 likely be by long distance dispersants that set up colonies far ahead of the 36 37 main population” in Hewitt (1999)]. “Pre-empt space” also relates to the 38 39 dynamics of post-glacial recolonization where empty spaces resultant from ice 40 sheet advance, are now filled by pioneering dispersants [( e.g. “these 41 42 pioneers could expand rapidly to fill the area before significant numbers of 43 44 other dispersants arrived” in Hewitt (1999)]. 45 46 47 L394, Until here, it was not apparent that morphospecies diagnosis is based 48 49 on color pattern characters. I think it is important to mention in the Methods 50 51 (around L123), since this would probably affect readers' expectations about 52 the extent to which such diagnosis would match DNA-based OTUs. 53 54 55 56 It has been added to Methods, lines 129-130. 57 58 59 L395, this should refer to Figure 2 (not Figure 1) 60 Page 37 of 117 Journal of Biogeography

1 2 3 4 5 Changed. 6 7 8 9 L339, replace "occupied" with "contained" 10 11 12 Changed to “featured in”. 13 14 15 16 L402-403, this information on key taxonomic features should be mentioned in 17 18 the Methods. 19 20 21 Done. For Peer Review 22 23 24 25 L409, There is no reason to expect that whole genera would respond to 26 environmental change as single cohesive unit. 27 28 29 30 We agree, and that is not what we are suggesting. We merely point of there is 31 32 a difference between genera. 33 34 35 L410 (and elsewhere), replace "presented" with another word. 36 37 38 39 We believe this word is OK. 40 41 42 L413-415, First it is stated that the Lepidocyrtus OTU richness data do 43 44 support the "above" hypothesis (the only one mentioned by name was the 45 46 northern poverty hypothesis), but then it is state that the geographic 47 distribution of Lepidocyrtus species richness adds further support for the long- 48 49 term persistence. The switch between OTU and species is also a source of 50 51 confusion, since the previous paragraph talks about species in the strict sense 52 of morphospecies... it is not clear if that carries over to this paragraph. 53 54 55 56 The word species has been changed to OTU throughout the text as explained 57 58 before. 59 60 Journal of Biogeography Page 38 of 117

1 2 3 4 5 Sentence mentioning long-term persistence has been deleted to avoid 6 7 confusion and this same discussion is presented in the following paragraph. 8 9 10 11 12 L419, "population persistence of [species]" seems a bit awkward. 13 14 15 16 Word order changed. 17 18 19 L420, delete word: [some] 20 21 For Peer Review 22 23 We prefer to keep this word because it denotes that this is a relative rather 24 25 than absolute age estimative. 26 27 28 L425-426, But note that the lower bound of the HPD surrounding the point 29 30 estimate of 53 Kya does not predate the LGM. 31 32 33 As explained in a previous response to referee 1, this analysis has been 34 35 revised by changing the tree prior used. Ages have changed and lower 36 37 bounds of HPD do predate LGM for both NVG1 and NIVG2. 38 39 40 TABLES 41 42 43 44 Table 1: I think that L661 of the caption should be "number of [haplotypes] 45 46 sampled from single localities", and that L662 would be clearer if it read as 47 "observed [intra-OTU] p-distances". As noted above, I am not really sure what 48 49 value the BOLD comparison adds. Also, there is no mention of what the 50 51 asterisks in the "morphospecies" column indicate. 52 53 54 These are in fact number of sequences and not number of haplotypes 55 56 sampled from single localities. 57 58 59 60 Page 39 of 117 Journal of Biogeography

1 2 3 4 The observed [intra-OTU] p-distances are reported as mean p-distances and 5 we also think it is good to present maximum p-distances as it gives a more 6 7 complete idea of their variation. 8 9 10 11 As explained in the caption, BOLD information refers to the localities from 12 where identical or near identical sequences (>99% similarity) were sampled 13 14 which gives an idea of their restricted (no match to the database) or worldwide 15 16 distribution (e.g. Canada, Australia, etc.) 17 18 We did mention what asterisks indicate just below the table but superscript 19 alphabetic letters have substituted them. 20 21 For Peer Review 22 23 24 25 Table 2: Unclear why the structure of this table differs from Table 1 by not 26 listing "number of [haplotypes] sampled from single localities". In the BOLD 27 28 column, what do "no match", "not provided" and "<99% match" mean? None 29 30 of these designations appear in Table 1. 31 32 33 Tables have now similar structure. “No match” means that when querying a 34 35 sequence into Bold database, no hit was found, i.e., there is nothing like this 36 37 sequence already published in the database. “Not provided” means, although, 38 39 a similar sequence was found in the database, the geographic information of 40 the place where this sequence was sampled from is missing. “<99% match” 41 42 means that, although, similar sequences were found, they are less than 99% 43 44 identical which preclude us to consider it for taxonomic or geographic 45 46 assignments. 47 48 49 Table 3: The word "community" on L694 does not apply broadly across this 50 51 table and should be replaced. Also, it is unclear why many of the mean and 52 95% CI values are not given (instead reported as a dash). 53 54 55 56 We have added a sentence to the Table caption to explain what we mean by 57 58 community. Mean and CI can only be given for the community that is being 59 60 Journal of Biogeography Page 40 of 117

1 2 3 4 rarefied for each comparison, this is why a dash sign appears in the other 5 community. 6 7 8 9 Table 4: see comments on Table 3. 10 11 12 Changed. 13 14 15 16 Table 5: it is unclear why the corresponding intra-OTU MRCA & CI's for 17 18 Entomobrya spp. are not included here. 19 20 21 This result is presentedFor in thePeer text, lines Review 409-11, because there were only 2 22 23 monophyletic clusters for Entomobrya, not enough to build a table. Table 5 24 25 only deals with Lepidocyrtus results. 26 27 28 Table S2: The purpose of summarizing data from a different gene (COII), from 29 30 a different location (Panama data), from a different study (add citation) should 31 32 be given here. Also abbreviated table headings need to be expanded / 33 defined in the caption. 34 35 36 37 Information added. 38 39 40 FIGURES 41 42 43 44 Figure 1. An inset showing the broader geographic context would be a useful 45 46 addition. 47 48 49 Our study focuses on the Great Britain geographic context, and differs from 50 51 many other studies that look at a wider European context. Thus, we believe 52 that the figure would not benefit from widening its view and we rather keep it 53 54 to GB to avoid any confusion. 55 56 57 58 Figure 2. The inclusion of haplotype C019 in the subgroup NIVG1 within 59 lineage 4 (= OTU 4) is odd. If poorly supported monophyly is not seen to be 60 Page 41 of 117 Journal of Biogeography

1 2 3 4 problematic when delimiting subgroups, then other sequences (i.e., nivalis 5 L;20,23 etc) should also be included in NIVG1. Alternatively, if at least 6 7 moderately supported monophyly is a requirement, then haplotype C019 8 9 should be exclude from subgroup NIVG1 (and its geographic distribution 10 11 excluded from the ellipse in Fig. 4). 12 13 14 This was an error when shading the group. C019 is not part of NIVG1. This 15 16 figure has been completely revised. 17 18 19 Figure 3. As noted above, the shading of OTUs 3, 7, 8, 12 and 18 (i.e., those 20 21 in which there is reportedFor to Peer be geographic Review structuring of genetic variation) is 22 23 not all that meaningful if subgroups within these OTUs are not identified and 24 25 named, and also represented on Figure 5 26 27 28 There are no subgroups within these OTUs. The entire OTU itself is a 29 30 monophyletic cluster. This differs from Entomobrya data in which we found 31 32 monophyletic clusters within a widespread OTU (number 4). No monophyletic 33 clusters were found within Lepidocyrtus widespread lineages. This difference 34 35 is explained in the text, lines 361-364. 36 37 38 39 Figure 4. The caption / figure does not clarify which ellipse represents which 40 subgroup in panel B. 41 42 43 44 Caption changed accordingly. 45 46 47 Figure 5: see comment on Figure 3. 48 49 50 51 Please, see response to comment on Figure 3. 52 53 54 55 References 56 57 58 Baddeley, A. & Turner, R. (2005) spatstat: An R Package for Analyzing 59 60 Spatial Point Patterns. Journal of Statistical Software, 12, 1–42. Journal of Biogeography Page 42 of 117

1 2 3 4 Brochmann, C., Gabrielsen, T.M., Nordal, I., Landvik, J.Y., & Elven, R. (2003) 5 Glacial survival or tabula rasa? The history of North Atlantic biota 6 7 revisited. Taxon, 52, 417–450. 8 9 10 Burkhardt, U. & Filser, J. (2005) Molecular evidence for a fourth species within 11 12 the Isotoma viridis group (Insecta, Collembola). Zoologica Scripta, 34, 13 177–185. 14 15 16 Cicconardi, F., Borges, P.A.V., Strasberg, D., Oromí, P., López, H., Pérez- 17 18 Delgado, A.J., Casquet, J., Caujapé-Castells, J., Fernández-Palacios, 19 20 J.M., Thébaud, C., & Emerson, B.C. (2017) MtDNA metagenomics 21 reveals large-scaleFor invasionPeer of belowgroundReview arthropod communities by 22 23 introduced species. Molecular Ecology, 26, 3104-3115. 24 25 26 Cicconardi, F., Fanciulli, P.P., & Emerson, B.C. (2013) Collembola, the 27 28 biological species concept and the underestimation of global species 29 richness. Molecular Ecology, 22, 5382–5396. 30 31 32 Cicconardi, F., Nardi, F., Emerson, B.C., Frati, F., & Fanciulli, P.P. (2010) 33 34 Deep phylogeographic divisions and long-term persistence of forest 35 36 invertebrates (Hexapoda: Collembola) in the North-Western 37 Mediterranean basin. Molecular Ecology, 19, 386–400. 38 39 40 Dixon, P.M. (2002) Ripley’s K function. Encyclopedia of environmetrics (ed. by 41 42 A.H. El-Shaarawi and W.W. Piegorsch), Wiley, Chichester New York. 43 44 45 Ezard, T., Fujisawa, T., & Barraclough, T.G. (2009) SPLITS: SPecies’ LImits 46 by Threshold Statistics: R package version 1.0-18/r45. 47 48 49 Funk, D.J. & Omland, K.E. (2003) Species-level paraphyly and polyphyly: 50 51 Frequency, causes, and consequences, with insights from 52 53 mitochondrial DNA. Annual Review of Ecology Evolution and 54 Systematics, 34, 397–423. 55 56 57 Garrick, R.C., Rowell, D.M., Simmons, C.S., Hillis, D.M., & Sunnucks, P. 58 59 (2008) Fine-scale phylogeographic congruence despite demographic 60 Page 43 of 117 Journal of Biogeography

1 2 3 4 incongruence in two low-mobility saproxylic . Evolution, 62, 5 1103–1118. 6 7 8 Garrick, R.C., Sands, C.J., Rowell, D.M., Hillis, D.M., & Sunnucks, P. (2007) 9 10 Catchments catch all: long-term population history of a giant 11 12 from the southeast Australian highlands — a multigene approach. 13 Molecular Ecology, 16, 1865–1882. 14 15 16 Hewitt, G.M. (1999) Post-glacial re-colonization of European biota. Biological 17 18 Journal of the Linnean Society, 68, 87–112. 19 20 21 Hopkin, S.P. (2007)For A Key toPeer the Collembola Review (springtails) of Britain and 22 Ireland. Field Studies Council (FSC) AIDGAP, Shrewsbury, UK. 23 24 25 Kapli, P., Lutteropp, S., Zhang, J., Kobert, K., Pavlidis, P., Stamatakis, A., & 26 27 Flouri, T. (2017) Multi-rate Poisson Tree Processes for single-locus 28 29 species delimitation under Maximum Likelihood and Markov Chain 30 Monte Carlo. Bioinformatics, 33, 1630–1638. 31 32 33 Kembel, S.W., Cowan, P.D., Helmus, M.R., Cornwell, W.K., Morlon, H., 34 35 Ackerly, D.D., Blomberg, S.P., & Webb, C.O. (2010) Picante: R tools 36 37 for integrating phylogenies and ecology. Bioinformatics, 26, 1463– 38 1464. 39 40 41 Porco, D., Bedos, A., Greenslade, P., Janion, C., Skarżyński, D., Stevens, 42 43 M.I., Jansen van Vuuren, B., & Deharveng, L. (2012) Challenging 44 45 species delimitation in Collembola: cryptic diversity among common 46 springtails unveiled by DNA barcoding. Invertebrate Systematics, 26, 47 48 470–477. 49 50 51 Skarzyński, D. (2004). Taxonomic status of Schaefferia emucronata Absolon, 52 53 1900 and Schaefferia willemi (Bonet. 1930) (Collembola, 54 Hypogastruridae) in the light of hybridisation studies. Deutsche 55 56 Entomologische Zeitschrift, 51, 81-86. 57 58 59 60 Journal of Biogeography Page 44 of 117

1 2 3 4 South, A. (1961) The of the British species of Entomobrya 5 (Collembola). Transactions of the Royal Entomological Society of 6 7 London, 113, 387–416. 8 9 10 Stamatakis, A., Hoover, P., & Rougemont, J. (2008) A rapid bootstrap 11 12 algorithm for the RAxML Web servers. Systematic Biology, 57, 758– 13 771. 14 15 16 Swenson, N. (2014) Functional and Phylogenetic Ecology in R. Springer- 17 18 Verlag, New York. 19 20 21 Tzedakis, P.C., Emerson,For B.C., Peer & Hewitt, Review G.M. (2013) Cryptic or mystic? 22 Glacial tree refugia in northern Europe. Trends in Ecology & Evolution, 23 24 28, 696–704. 25 26 27 Zhang, J., Kapli, P., Pavlidis, P., & Stamatakis, A. (2013) A general species 28 29 delimitation method with applications to phylogenetic placements. 30 Bioinformatics, 29, 2869–2876. 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 45 of 117 Journal of Biogeography

1 2 3 1 Evidence for the Pleistocene persistence of Collembola in Great Britain 4 5 6 2 7 8 3 Christiana M. A. Faria1*, Peter Shaw2, Brent C. Emerson3 9 10 4 11 12 5 1Centre for Ecology, Evolution and Conservation, School of Biological Sciences, 13 14 6 University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, Great 15 7 Britain 16 8 2Centre for Research in Ecology, Whitelands College, Holybourne Ave., London SW15 17 9 4JD, Great Britain 18 10 3Island Ecology and Evolution Research Group, Instituto de Productos Naturales y 19 11 Agrobiología (IPNA-CSIC), c/Astrofísico Francisco Sánchez 3, La Laguna, 20 12 Tenerife, Canary Islands, 38206, Spain 21 For Peer Review 22 13 *Current address: Laboratory of Ecology and Evolution of Insects, Department of 23 14 Biology, Centre for Sciences, Building 906, Federal University of Ceará, 60455- 24 15 970, Fortaleza, CE, Brazil 25 16 26 27 17 28 29 30 18 Corresponding author: Christiana M. A. Faria, Address: Laboratory of Ecology and 31 32 19 Evolution of Insects, Department of Biology, Centre for Sciences, Building 906, Federal 33 34 20 University of Ceará, 60455-970, Fortaleza, CE, Brazil, Fax number: +558533669806, 35 36 37 21 Email address: [email protected] 38 39 22 40 41 23 Running title: Pleistocene persistence of Collembola in Great Britain 42 43 24 44 45 46 25 47 48 26 Word counts (abstract, main text and references): 7, 510 49 50 27 51 52 53 28 54 55 29 56 57 30 58 59 1 60 Journal of Biogeography Page 46 of 117

1 2 3 31 ABSTRACT 4 5 6 32 7 8 33 Aim Using two genera of Collembola, Lepidocyrtus and Entomobrya, we test for 9 10 34 genetic signatures of Pleistocene persistence of soil arthropods in Great Britain. 11 12 35 Location Great Britain. 13 14 15 36 Methods A region of the mitochondrial COI gene was sequenced for 1150 Collembola 16 17 37 specimens from the genera Lepidocyrtus and Entomobrya across Great Britain. 18 19 38 Individuals were clustered into OTUs, and both OTU richness and geographic patterns 20 21 For Peer Review 22 39 of genetic variation within OTUs were compared between glaciated and unglaciated 23 24 40 areas to identify signatures of OTU persistence through Pleistocene glacial events. 25 26 41 Results Our analyses identified 12 Entomobrya and 18 Lepidocyrtus OTUs in Great 27 28 Britain. Lepidocyrtus OTU richness was significantly lower in glaciated than 29 42 30 31 43 unglaciated areas, whereas there was no difference for Entomobrya OTU richness. 32 33 44 However, both genera presented clear patterns of geographically disjunct genetic 34 35 45 variation and geographically localized diversification of OTUs. Estimated dates for the 36 37 38 46 onset of in situ diversification events indicate population persistence that precede the 39 40 47 last glacial maximum. 41 42 48 Main conclusions. Patterns of genetic diversity within Collembola OTUs in Great 43 44 45 49 Britain add to a growing body of evidence that elements of the invertebrate fauna have 46 47 50 persisted in situ through Pleistocene glacial cycles. Genetic signatures of population 48 49 51 persistence in more northern glaciated areas of Great Britain support a hypothesis of 50 51 geothermal glacial refugia that call for further investigation with other soil mesofaunal 52 52 53 54 53 taxa. 55 56 54 Key-words: Collembola, mesofauna, Pleistocene, refugia, soil, western Europe 57 58 59 2 60 Page 47 of 117 Journal of Biogeography

1 2 3 55 INTRODUCTION 4 5 6 56 7 8 57 Climate oscillations have considerably affected the distributions of many species 9 10 58 globally, resulting in the north-south structuring of genetic diversity within many 11 12 59 species as a result of past range contraction and expansion (e.g. Hewitt, 2004). In the 13 14 15 60 northern hemisphere, during the Pleistocene glaciations (from 2.59-0.012 Million years 16 17 61 ago, Ma), as high latitudes were covered by ice and permafrost, northern populations 18 19 62 either went extinct or their distributions retreated southward, surviving in refugia 20 21 For Peer Review 22 63 (Tzedakis et al., 2013). Due to their geographic position and latitudinal range, large 23 24 64 areas of the British Isles were repeatedly glaciated throughout this period (Chiverrell & 25 26 65 Thomas, 2010), resulting in a present-day biota of reduced diversity, limited endemism, 27 28 and typically recent origin from continental Europe (Yalden, 1982; Searle, 2008; 29 66 30 31 67 Montgomery et al., 2014). 32 33 68 34 35 69 While clearly an appropriate model for many species, the generality of this view has 36 37 38 70 been challenged by evidence that several species persisted through glaciations in some 39 40 71 areas of Ireland (Teacher et al., 2009), the English Channel (Coyer et al., 2003; Provan 41 42 72 et al., 2005; Hoarau et al., 2007) and more recently in the Great Britain (McInerney et 43 44 45 73 al., 2014). Evidence has recently been presented for an ancient groundwater fauna 46 47 74 endemic to Britain and Ireland, of which two amphipod species from the genus 48 49 75 Niphargus were estimated to have persisted for at least 19.5 million years (McInerney et 50 51 al., 2014). The survival of groundwater dwelling organisms was probably facilitated by 52 76 53 54 77 the environmental stability of hypogean environments compared to the surface water 55 56 78 (i.e. water temperature and chemistry buffering) (Gibert et al., 1994; McInerney et al., 57 58 59 3 60 Journal of Biogeography Page 48 of 117

1 2 3 79 2014). With regard to the terrestrial environment, the only convincing evidence for 4 5 6 80 Pleistocene persistence comes from the frog, Rana temporaria, which shows several 7 8 81 related mitochondrial DNA cytochrome b haplotypes that are endemic to Ireland 9 10 82 (Teacher et al. 2009). 11 12 83 13 14 15 84 Signatures of population persistence, such as that presented by Rana temporaria, can 16 17 85 provide compelling evidence for Pleistocene persistence in the form of in situ patterns 18 19 86 of allelic divergence and/or gradients of haplotype richness between unglaciated and 20 21 For Peer Review 22 87 glaciated areas. However, these signatures may be obscured for more dispersive species 23 24 88 by subsequent gene flow. During the Last Glacial Maximum (LGM), the British ice 25 26 89 sheet achieved its maximum extent at different times (between 27 and 15 thousand 27 28 years ago, ka) in different sectors (Clark et al., 2012, see Fig. 15), but left ice-free 29 90 30 31 91 habitats, particularly in the south, that could have acted as refugia. Under this model, the 32 33 92 persistence of southern British refugial populations would facilitate local accumulation 34 35 93 of diversity and subsequent “southern richness northern poverty” of genetic variation as 36 37 38 94 temperatures rise, ice sheets retract, and populations expand northwards, as proposed for 39 40 95 pan-European species (Hewitt, 2000). 41 42 96 43 44 45 97 Sedentary soil invertebrates like Collembola have been shown to survive harsh 46 47 98 conditions, such as Pleistocene climate change, and retain genetic signatures in the form 48 49 99 of geographically structured genetic variation (e.g. Stevens et al., 2007, Garrick et al., 50 51 2008, Cicconardi et al., 2010). Their wingless morphology limits dispersal ability (and 52 100 53 54 101 hence gene flow), while their high local abundance (10,000 m-2 upwards, Hopkin, 1997) 55 56 102 increases survival probability relative to other less abundant arthropod groups (Emerson 57 58 59 4 60 Page 49 of 117 Journal of Biogeography

1 2 3 103 et al., 2011). 4 5 6 104 7 8 105 Here we sample two widespread Collembola genera across Great Britain, Entomobrya 9 10 106 and Lepidocyrtus, to test for the following signatures of population persistence prior to 11 12 107 the last glaciation: (i) lower OTU richness and lower intra-OTU genetic richness in 13 14 15 108 northern glaciated areas compared to southern unglaciated areas; (ii) local monophyly 16 17 109 (e.g. geographically restricted monophyletic groups of sequences); (iii) coalescent 18 19 110 origins for endemic genetic variation that predates the LGM; (iv) disjunct distributions 20 21 For Peer Review 22 111 of OTUs that cannot be explained by long distance dispersal. We evaluate evidence for 23 24 112 these expectations by sampling the mitochondrial DNA COI gene for 722 specimens of 25 26 113 Entomobrya and 428 specimens of Lepidocyrtus sampled from across Great Britain. 27 28 29 114 30 31 115 MATERIAL AND METHODS 32 33 116 34 35 117 Sampling and laboratory work 36 37 38 118 39 40 119 Samples were collected from 98 sites across Britain between 2011 and 2012, aiming for 41 42 120 approximately equal sampling effort across the glaciated and unglaciated areas of 43 44 45 121 England and Wales, collecting particularly from the geographical extremes of the region 46 47 122 (Fig. 1, see Supporting Information Appendix S1: Table S1.1 for details). At each 48 49 123 location, surface-active animals were collected by vacuum collection (two sets of 30 50 51 second collections, transferred directly to EtOH), and duplicate soil / litter samples were 52 124 53 54 125 collected with minimal disturbance into plastic bags for subsequent extraction with a 55 56 126 Tullgren funnel system followed by manual removal of the relevant genera. Upon 57 58 59 5 60 Journal of Biogeography Page 50 of 117

1 2 3 127 sorting, samples were individually placed in 96-well PCR plates and stored into 4 5 o 6 128 absolute ethanol at 4 C prior to identification by PS. Identification to morphospecies 7 8 129 followed Hopkin (2007) and body scales and colour-pattern were mainly used as key 9 10 130 characters for diagnosis. At least 1 photo per morphospecies per site was taken to help 11 12 131 posterior identification checks. 13 14 15 132 16 17 133 DNA was isolated using the DNeasy 96 well Blood and Tissue Extraction Kit 18 19 134 (QIAGEN, West Sussex, Great Britain), following manufacturer’s instructions with the 20 21 For Peer Review 22 135 following modifications to preserve the exoskeleton of each animal. After removing 23 24 136 ethanol from the PCR collection plates, digestion was performed by adding a 100 µL 25 26 137 volume of proteinase K + Buffer ATL working solution (1:9 ratio). Samples were 27 28 incubated for just one hour at 56oC to avoid skeleton lysis. After incubation, lysates 29 138 30 31 139 were transferred to the DNeasy 96 collection tubes and the plates with remaining 32 33 140 exoskeletons were refilled with absolute ethanol and stored as vouchers at 4oC. 34 35 141 36 37 38 142 DNA extracts were amplified for the 658 bp barcode region with the primers ColFol-for 39 40 143 (modified from primer LCO1490) and ColFol-rev (modified from primer HCO2198) 41 42 144 (Ramirez-González et al., 2013). Polymerase chain reactions (PCR) contained NH4 43 44 45 145 buffer (1x), 3.0 mM MgCl2, 2.5 mM of each dNTP, 0.4 µM of each primer and 0.5 U of 46 47 146 Taq polymerase (Bioline) in 25 µL final volume and used the following thermal profile: 48 49 147 95oC for 2 min, 40 cycles of 95oC for 1 min; 52oC for 45 s, 72oC for 1 min; and finally 50 51 72oC for 5 min. PCR products were cleaned with ExoSap, then normalized to a 52 148 53 54 149 concentration of 5ng/µL and sent to Eurofins for sequencing with the reverse primer. 55 56 150 57 58 59 6 60 Page 51 of 117 Journal of Biogeography

1 2 3 151 Sequence alignment and molecular OTU delineation 4 5 6 152 7 8 153 MtDNA COI sequences were processed with manual assessment of ambiguous base 9 10 154 calls with GENEIOUS PRO v.5.6.4 (Kearse et al., 2012) and then aligned using MAFFT 11 12 155 v.6.814 (Katoh et al., 2002). In order to define molecular OTUs, we explored Bayesian 13 14 15 156 Inference (BI) and Maximum likelihood (ML) approaches to build trees and different 16 17 157 delimitation methods (GMYC, mPTP and a clustering threshold of 96% similarity) to 18 19 158 account for their inherent limitations (Luo et al., 2018). First, all sequences for each 20 21 For Peer Review 22 159 genus were used to infer ML trees using maximum likelihood in RAXML (Stamatakis et 23 24 160 al., 2008) with a GTR+G substitution model (4 categories). We used the webserver 25 26 161 (https://raxml-ng.vital-it.ch/#/) and ran full analyses with automatic bootstrap replicates. 27 28 After removing the outgroup sequence, inferred ML trees were then used for OTU 29 162 30 31 163 delimitation by mPTP analysis (Kapli et al., 2017) using the webserver (https://mptp.h- 32 33 164 its.org/#/tree) with the multi rate poisson tree processes method. Second, we obtained 34 35 165 ultrametric BI trees running BEAST v.1.8.4 (Drummond et al., 2012) using all sequences 36 37 38 166 for each genus, a GTR+G (4 categories) model, and a combination of Tree and Clock 39 40 167 priors using the Cipres Science gateway (https://www.phylo.org/portal2). However, 41 42 168 BEAST output estimates were problematic due to convergence issues, particularly for 43 44 45 169 clock.rate. This is likely due to the nature of the dataset, which has both speciation and 46 47 170 coalescent signals that cannot be fully resolved with either a Yule or Coalescent tree 48 49 171 prior. Thus we refrained from using BI trees representing all haplotypes for GMYC 50 51 52 172 approach (Generalized Mixed Yule Coalescent, Fujisawa & Barraclough, 2013). As an 53 54 173 alternative, we used DNAcollapser (Villesen, 2007) to reduce dataset to unique 55 56 174 haplotypes and re-ran BEAST with GTR+G (4 categories), Yule tree prior and clock 57 58 59 7 60 Journal of Biogeography Page 52 of 117

1 2 3 175 prior fixed to 1. Two independent runs of 5x107 Markov Chain Monte Carlo (MCMC) 4 5 6 176 generations were sampled every 1000 generations. Runs were well converged and ESS 7 8 177 values were >200. BI ultrametric trees were then used as input files for GMYC in R 9 10 178 using ‘splits’ package (Ezard et al., 2009). ML trees were also inferred with the unique 11 12 179 haplotypes dataset and mPTP analysis repeated for both ML haplotype and BI haplotype 13 14 15 180 trees. Third, we used a 96% similarity threshold for clustering of Entomobrya and 16 17 181 Lepidocyrtus sequences into OTUs. This threshold is based on prior data available from 18 19 182 several studies that have extensively investigated the temporal-spatial aspects and 20 21 For Peer Review 22 183 biological significance of cryptic molecular lineage diversity in the genus Lepidocyrtus, 23 24 184 one of the two genera investigated in the present study (Cicconardi et al., 2010, 2013). 25 26 185 Following phylogenetic reconstruction, GMYC analysis, Hardy-Weinberg equilibrium 27 28 and linkage-disequilibrium tests, 19 molecular lineages occurring sympatrically were 29 186 30 31 187 found to be congruent with biological species (Cicconardi et al. 2013). We then used 32 33 188 sequences from these 19 molecular lineages to calculate their pairwise distances and 34 35 189 selected the minimum p-distance found between them (0.04) to define our clustering 36 37 38 190 threshold (96%) (Appendix S1: Table S1.2). Once OTUs were defined, one sequence 39 40 191 per OTU was submitted to the BOLD Identification System for evidence of identical or 41 42 192 nearly identical sequences sampled outside the Great Britain. 43 44 45 193 46 47 194 Evolutionary rate estimation and dating analysis 48 49 195 50 51 General mutation rates ranging from 0.006 - 0.0248% substitutions/site/Ma (million 52 196 53 54 197 years ago) have been proposed for the mtDNA genome of insects, mostly for the COI 55 56 198 gene (e.g. DeSalle et al. 1987, Brower 1994, Wares 2001, Ho & Lo 2010, and 57 58 59 8 60 Page 53 of 117 Journal of Biogeography

1 2 3 199 references therein). Following an extensive comparative rate analysis using COII 4 5 6 200 sequences from across Pancrustacean major lineages, a conservative (fast) COII mean 7 8 201 rate of 0.0245 substitutions/site/Ma has been estimated for the genus Lepidocyrtus 9 10 202 (Cicconardi et al. 2010). However, currently there is no COI external rate available for 11 12 13 203 Collembola. Thus, we applied this fast COII evolutionary rate to a set of corresponding 14 15 16 204 sequences to estimate a COI rate as follows. First, the root height of eight Lepidocyrtus 17 18 205 COII sequences sampled sympatrically in Panama (Cicconardi et al., 2013) was 19 20 206 obtained running BEAST using the COII rate of Cicconardi et al. (2010). This root height 21 For Peer Review 22 23 207 was then applied to a corresponding COI sequence dataset for the same specimens 24 25 208 (Emerson et al., 2011) to estimate the rate of substitution of the COI gene. Tamura-Nei 26 27 209 plus Gamma (TN93+G) and Tamura 3-parameter plus Gamma (T92+G) were selected 28 29 30 210 as the best model for the COI and COII data respectively in MEGA v.6.06 (Tamura et al., 31 32 211 2013). The BEAST estimated root age of the eight Panamian Lepidocyrtus COII 33 34 212 sequences was 35 Ma (HPD=12.8-60 Ma). When this root age was applied to the 35 36 37 213 corresponding COI sequences, an estimated substitution rate of 0.0421 38 39 214 substitutions/site/Ma for the COI gene was obtained with BEAST. As the COII rate 40 41 215 estimate of Cicconardi et al. (2010) is considered an overestimate of the true rate, 42 43 216 yielding probable underestimates of divergence times (Cicconardi et al. 2010), 44 45 46 217 divergence times derived from our COI rate are also likely to be underestimates. 47 48 218 49 50 219 The COI rate was then used in BEAST to estimate the time of the most recent common 51 52 53 220 ancestor (MRCA) for nodes of interest within the Entomobrya and Lepidocyrtus BI 54 55 221 trees, i.e. dating analysis applied separately only for the geographic localized 56 57 222 monophyletic clusters. The most appropriate substitution model for each dataset was 58 59 9 60 Journal of Biogeography Page 54 of 117

1 2 3 223 estimated using MEGA. An uncorrelated lognormal relaxed clock (with clock.rate priors 4 5 6 224 set to ucdl.mean = 0.042, ucdl.std = 0.11) and a coalescent tree prior (intraspecific 7 8 225 dataset) were applied to two independent runs of 107 Markov Chain Monte Carlo 9 10 226 (MCMC) generations, sampled every 1000 generations. Convergence among the 11 12 227 independent runs, and expected sampling sizes (ESS) for the posterior distribution and 13 14 15 228 parameters were evaluated using TRACER v.1.6 (Rambaut et al., 2014). 16 17 229 18 19 230 Geographical distribution and OTU diversity analysis 20 21 For Peer Review 22 231 23 24 232 Haplotype distributions were mapped using QGIS (QGIS Development Team, 2014), to 25 26 233 evaluate non-random patterns of (i) geographically localised monophyletic clades of 27 28 mtDNA sequences, consistent with population persistence over an evolutionary 29 234 30 31 235 (mutational) timescale, and (ii) geographically disjunct distributions of OTUs, 32 33 236 consistent with range fragmentation. While these distributions were visually inspected, 34 35 237 we also used Ripley’s K function (Kdot function) to test whether these clades are 36 37 38 238 dispersed, clustered or randomly distributed throughout the study area (Dixon, 2002). 39 40 239 Kdot was tested for each OTU of interest (marked point pattern) in R using package 41 42 240 ‘spatstat’ (Baddeley & Turner, 2005). Three corrections (Kiso – isotropic, Ktrans – 43 44 45 241 translation correction, Kbord – border method) were applied to observed data while 46 47 242 estimating K function and these were compared to what it would be expected if spatial 48 49 243 distributions were random (Kpois – completely random Poisson point process). 50 51 52 244 53 54 245 To test for diversity richness differences between glaciated and unglaciated areas, we 55 56 246 used permutation analyses in ECOSIM v.7.72 (Gotelli & Entsminger, 2000) to analyse 57 58 59 10 60 Page 55 of 117 Journal of Biogeography

1 2 3 247 matrices of (i) OTU richness within sampling sites and (ii) intra-OTU haplotype 4 5 6 248 richness within sampling sites. Sampling sites were classified into two biogeographical 7 8 249 areas: glaciated and unglaciated, defined according to the maximum limits of the 9 10 250 British-Irish ice sheet (BIIS) during the last Pleistocene glacial period (Clark et al., 11 12 251 2012). This represents the most up-to-date reconstruction of the extent and times of 13 14 15 252 BIIS, derived from evidence such as moraines, lateral meltwater channels, and 16 17 253 subglacial melt. Although it reached its maximum extent at different times in different 18 19 254 sectors (see Fig. 15 in Clark et al., 2012), its overall extent, independent of the time, was 20 21 For Peer Review 22 255 used to classify each locality as either glaciated or unglaciated (Fig. 1). We used the 23 24 256 rarefaction algorithm implemented in ECOSIM to control for differences in sampling and 25 26 257 abundances when testing for OTU and intra-OTU richness differences, using as a 27 28 threshold the abundance of the smaller community being compared. The random 29 258 30 31 259 number seed was set to 10 and simulations were run for 1000 iterations. Richness 32 33 260 curves and their 95% confidence intervals were generated and hypotheses of equivalent 34 35 261 OTU richness and intra-OTU haplotype richness for glaciated and unglaciated 36 37 38 262 communities (i.e. collection of Entomobrya haplotypes and Lepidocyrtus haplotypes 39 40 263 found in each area) was evaluated by assessing whether the observed richness of the 41 42 264 smaller community fell within the 95% confidence interval of the simulations for the 43 44 45 265 rarefied community. 46 47 266 48 49 267 RESULTS 50 51 52 268 53 54 269 Overview of Entomobrya and Lepidocyrtus of Great Britain 55 56 270 57 58 59 11 60 Journal of Biogeography Page 56 of 117

1 2 3 271 We collected a total of 1150 individuals, 722 from the genus Entomobrya and 428 from 4 5

6 272 the genus Lepidocyrtus. Entomobrya specimens were sampled across 90 sampling sites 7 8 273 (Appendix S1), and were assigned to 6 morphospecies (Entomobrya albocincta, E. 9 10 274 intermedia, E. marginata, E. multifasciata, E. nicoleti and E. nivalis). Two Katianna sp. 11 12 275 sequences were sampled as outgroups. Five morphospecies were sampled from many 13 14 15 276 locations while the scarcer E. marginata was found in only a few sites (Table 1). 16 17 277 Lepidocyrtus specimens were sampled from 68 sampling sites (Appendix S1: Table 18 19 278 S1.1), and were assigned to 4 common morphospecies (Lepidocyrtus curvicollis, L. 20 21 For Peer Review 22 279 cyaneus, L. lanuginosus, L. lignorum), two unknown morphospecies (Lepidocyrtus sp. 1 23 24 280 cf. orange, Lepidocyrtus sp. 2) plus one specimen of each of L. ruber and L. violaceus 25 26 281 (Table 2). 27 28 29 282 30 31 283 MtDNA COI sequences were obtained from 659 Entomobrya and 391 Lepidocyrtus 32 33 284 specimens, with alignment lengths of 561 and 556 bp respectively, and with 173 and 92 34 35 285 unique haplotypes respectively. The remaining specimens of each genus either failed to 36 37 38 286 amplify or produce sequences of good quality. Inferred ML trees using all haplotype 39 40 287 datasets for both genera are presented in Appendix S2. Using the multi coalescent rate, 41 42 288 mPTP delimited 12 Entomobrya OTUs and 17 Lepidocyrtus OTUs. Inferred ultrametric 43 44 45 289 BI trees using the unique haplotypes dataset for both genera are presented in Figures 2 46 47 290 and 3. A comparison of monophyletic clusters recovered by each tree inference methods 48 49 291 and their respective branch support is provided in Appendix S2. GMYC analysis using 50 51 BI ultrametric trees defined 22 ML entities for Entomobrya and 18 ML entities for 52 292 53 54 293 Lepidocyrtus. mPTP using BI haplotype trees delimited 13 Entomobrya and 22 55 56 294 Lepidocyrtus OTUs while for ML haplotype trees, mPTP delimited 12 Entomobrya and 57 58 59 12 60 Page 57 of 117 Journal of Biogeography

1 2 3 295 18 Lepidocyrtus OTUs. Finally, using the 96% similarity threshold for clustering 4 5 6 296 sequences down to OTUs (see methods for details), a total of 13 Entomobrya OTUs and 7 8 297 22 Lepidocyrtus OTUs were identified. OTU numbers and splits defined by each 9 10 298 method used are presented in Appendix S2. 11 12 299 13 14 15 300 Due to the slightly different number of OTUs defined by different methods, we have 16 17 301 taken a conservative approach of considering OTUs those that are recovered in the 18 19 302 majority of approaches employed and avoiding intra splits (Appendix S2: Table S2.2). 20 21 For Peer Review 22 303 Thus, final numbers of OTUs are 12 for Entomobrya and 18 for Lepidocyrtus. 23 24 304 25 26 305 Of the 12 Entomobrya OTUs, two were singletons (Fig. 2). Overall mean divergence 27 28 among OTUs was 16% (Table 1). Sequences assigned to the morphospecies E. 29 306 30 31 307 albocincta corresponded to a single OTU, while the other five morphospecies were 32 33 308 composed of two or three OTUs (Table 1, Fig. 2). The number of haplotypes per OTU 34 35 309 ranged from 1 to 44 (Table 1). The geographic distribution of the nine OTUs with >99% 36 37 38 310 similarity to BOLD sequences revealed the presence of related sequences mainly in 39 40 311 Canada and France, but also in Moldova, Norway and South Africa (Table 1). 41 42 312 43 44 45 313 For Lepidocyrtus, five out of the 18 OTUs were singletons (Fig 3). Overall mean 46 47 314 divergence among OTUs was 14% (Table 2). Sequences assigned to the morphospecies 48 49 315 L. cyaneus and L. lanuginosus corresponded to nine and six OTUs respectively, while 50 51 those assigned to L. lignorum and L. curvicollis corresponded to three and two OTUs 52 316 53 54 317 respectively (Table 2, Fig. 3). Seven out of the 18 OTUs are comprised of individuals 55 56 318 representing more than one morphospecies, but are mainly represented by one (e.g. 57 58 59 13 60 Journal of Biogeography Page 58 of 117

1 2 3 319 most individuals of OTU 3 are L. lignorum), apart from OTU 1 which is the most 4 5 6 320 abundant (n=189 specimens) and widespread (n=41 sites) OTU with very low sequence 7 8 321 variation (only 9 haplotypes), equally represented by two morphospecies, L. 9 10 322 lanuginosus and L. lignorum (n=83 and n=82 respectively), plus a few specimens 11 12 323 assigned to L. curvicollis, L. cyaneus and L. ruber. The number of haplotypes per OTU 13 14 15 324 ranged from 1 to 14 (Table 2). The 13 OTUs with >99% similarity to BOLD sequences 16 17 325 revealed the presence of related sequences mainly in Canada and France, but also in 18 19 326 Germany, Italy, Poland, USA, Australia and Tasmania (Table 2). 20 21 For Peer Review 22 327 23 24 328 Geographical distributions 25 26 329 27 28 Entomobrya OTUs 1, 4, 11 and 12 are the most abundant (n individuals > 90) and 29 330 30 31 331 widespread (n sampling sites > 30) with ranges extending across the geographic extent 32 33 332 of sampling sites (Fig. 4 & Appendix S3: Fig. S3.1). OTUs 5 and 6 have medium 34 35 333 abundances (n individuals >50) and are also relatively broadly distributed (Appendix 36 37 38 334 S3: Fig. S3.2). The remaining OTUs are represented by a very small number of 39 40 335 individuals and occur in a single or very few sampling sites (Appendix S3: Fig. S3.3). 41 42 336 Among the six abundant OTUs, only OTU 4 presented a pattern of non-random 43 44 45 337 geographic structuring of genetic variation (Fig. 2). Within this OTU, clear structuring 46 47 338 of genetic variation was found in two monophyletic clusters of DNA sequences, NivG1 48 49 339 and NivG2, comprised of ten individuals (7 haplotypes) and 6 individuals (3 50 51 haplotypes), respectively (posterior probability: NivG1 = 1; NivG2 =1) (Figs. 2 & 4). 52 340 53 54 341 Both clades are found in only a few geographically proximate sites in the North, 55 56 342 Northwest and Yorkshire regions. NIVG1 spans both glaciated and unglaciated areas, 57 58 59 14 60 Page 59 of 117 Journal of Biogeography

1 2 3 343 while NIVG2 occurs within glaciated areas only. The following disjunctive distributions 4 5 6 344 were found: North East, North West, Wales and East Anglia (OTU2), North East and 7 8 345 South West (OTU3), East Anglia and Wales (OTU10) (Appendix S3: Fig. S3.3). 9 10 346 11 12 347 Among Lepidocyrtus OTUs, three main spatial distribution patterns were identified: 13 14 15 348 OTUs either presented broad ranges (n=3 OTUs), disjunct distributions (n=3), or were 16 17 349 geographically restricted to a few closely located sites (n=7) (Table 2). The remaining 18 19 350 five OTUs were found in single sampling sites, frequently being represented by a single 20 21 For Peer Review 22 351 individual (Fig. 3). The ranges of the widespread OTUs 1 and 15 extended across the 23 24 352 geographic extent of sampling sites, while OTU 9 was not sampled from more northerly 25 26 353 areas, being largely distributed from West Yorkshire down to eastern, western and 27 28 southern regions (Appendix S3: Fig. S3.4). The following disjunctive distributions were 29 354 30 31 355 found: North, Wales and South (OTU 11), East Anglia and South West (OTU 17), 32 33 356 North, Humberside, South, South East and South West (OTU 13) (Fig. 5a,b). Within 34 35 357 OTUs with geographically localised ranges, OTU 3 and 16 are only found in 36 37 38 358 unglaciated sites, while the remaining OTUs (6, 7, 8, 12 and 18) are found in both 39 40 359 glaciated and unglaciated sites (Fig. 5c,d). Geographically widespread OTUs were also 41 42 360 the most abundant (n > 40 individuals), while localised or geographically disjunct OTUs 43 44 45 361 presented mid to low abundances (Table 2). Unlike Entomobrya, no pattern of 46 47 362 geographic structuring of genetic variation was found within widespread OTUs. 48 49 363 However, geographical structuring of genetic variation was found in five out of the 50 51 seven geographically localised OTUs (OTUs 3, 7, 8, 12 and 18). These OTUs 52 364 53 54 365 (comprised of 7 to 17 individuals) presented some level of haplotype variation (from 55 56 366 four to eight haplotypes per OTU), which were found only in a few closely located sites 57 58 59 15 60 Journal of Biogeography Page 60 of 117

1 2 3 367 (from two to five sites per OTU, Table 2). OTUs 6 and 16 were also geographically 4 5 6 368 restricted, but did not present sequence variation. 7 8 369 9 10 370 K function plots for each geographically localised and disjunctly distributed OTU, for 11 12 371 both Entomobrya and Lepidocyrtus, revealed deviations from random distribution of 13 14 15 372 genetic variation, with empirical values with corrections greater than that expected for a 16 17 373 random distribution (Appendix S3: Figs. S3.5 & S3.6). 18 19 374 20 21 For Peer Review 22 375 OTU diversity analysis 23 24 376 25 26 377 For Entomobrya, a total of 43 sampling sites were located in areas covered by the ice 27 28 sheet, while 56 were located in areas that have not been covered by ice during 29 378 30 31 379 glaciations. The investigation of geographical variation in Entomobrya genetic diversity 32 33 380 revealed that overall OTU richness is equivalent in glaciated and unglaciated areas 34 35 381 (Table 3). After rarefying the unglaciated community data down to the abundance level 36 37 38 382 of the glaciated community, it was found that for 1000 random samples of 264 39 40 383 individuals, there was an average of 11 OTUs with a variance of 0.9 (Table 3). The 41 42 384 confidence interval indicates that 95% of the times that a random sample of 264 43 44 45 385 individuals is drawn from the unglaciated community, it is expected to find between 9- 46 47 386 12 different OTUs. The observed diversity of the glaciated community (n=9 OTUs) is 48 49 387 inside this confidence interval, indicating these communities are equivalent in diversity. 50 51 Looking within OTUs, observed haplotype richness of the smaller communities fell 52 388 53 54 389 within the simulated 95% confidence interval for all but one OTU (OTU 11 haplotype 55 56 390 richness was greater in glaciated than unglaciated areas) (Table 3). 57 58 59 16 60 Page 61 of 117 Journal of Biogeography

1 2 3 391 4 5 6 392 For Lepidocyrtus, a total of 34 sites were sampled within each of the glaciated and 7 8 393 unglaciated areas of the Great Britain (Fig. 1). Contrary to Entomobrya data, the 9 10 394 analysis revealed that Lepidocyrtus OTU richness is significantly different between 11 12 395 glaciated and unglaciated communities with greater richness found in southern areas 13 14 15 396 (Table 4). The observed diversity of the glaciated community (n=11 OTUs) is outside 16 17 397 the confidence interval (15-17 OTUs) and is below the simulated average, indicating 18 19 398 that OTU richness is smaller than expected in the glaciated areas. Looking within the 20 21 For Peer Review 22 399 three widespread OTUs (1, 9 and 15), observed haplotype richness of the less abundant 23 24 400 community fell within the simulated 95% confidence interval for all OTUs, indicating 25 26 401 that haplotype richness does not differ between glaciated and unglaciated communities 27 28 for these OTUs (Table 4). 29 402 30 31 403 32 33 404 MtDNA COI rate estimation and long-term persistence analysis 34 35 405 36 37 38 406 The best model of molecular evolution for sequences in the two Entomobrya 39 40 407 monophyletic groups detected within E. nivalis OTU 4 (NivG1 and NivG2) was T92. 41 42 408 Adopting the closest model in BEAST (TN93) and the estimated COI substitution rate 43 44 45 409 (0.0421 substitutions/site/myr), the age of the MRCA for NivG1 is 77,100 years ago 46 47 410 (HPD= 21,700-142,300) and 42,700 years ago for NivG2 (HPD=38,350-94,700) 48 49 411 (Appendix S4). 50 51 52 412 53 54 413 The best model for Lepidocyrtus OTUs 3 and 8 was Kimura 2-parameter (K2), while for 55 56 414 OTUs 7, 12 and 18 it was T92. Adopting the closest models in BEAST (HKY- 57 58 59 17 60 Journal of Biogeography Page 62 of 117

1 2 3 415 Hasegawa-Kishino-Yano for K2 and TN93 for T92), the mean age of the MRCA for 4 5 6 416 these OTUs ranged from 160,000 years ago (HPD= 77,200-256,200) to 533,900 years 7 8 417 ago (HPD=355,600-720,700) (Table 5, Figs. S11 to S15). These deep divergence times 9 10 418 are more likely to be underestimates than overestimates (i.e. a fast mtDNA COI 11 12 419 substitution rate was used), and they substantially predate the LGM. 13 14 15 420 16 17 421 DISCUSSION 18 19 422 20 21 For Peer Review 22 423 Glacial events greatly altered distributional ranges of European species as ice or 23 24 424 permafrost conditions pushed organisms southward or gradually restricted them to 25 26 425 various refuges (e.g. Taberlet et al., 1998; Hewitt, 2004). As a consequence, high levels 27 28 of endemic genetic variation are typically found in refugial areas (evidence of long-term 29 426 30 31 427 in situ evolution) (e.g. Provan & Bennett, 2008; Tzedakis et al., 2013), and disjunct 32 33 428 modern distributions are observed in some taxa (e.g Stehlik et al., 2002; Willis & van 34 35 429 Andel, 2004; Hedderson & Nowell, 2006; Schmitt et al., 2006). These two basic 36 37 38 430 arguments – endemism and disjunction, have been used to support the glacial survival 39 40 431 hypothesis (e.g Brochmann et al., 2003) and they were investigated within the 41 42 432 Entomobrya and Lepidocyrtus genera in order to detect OTUs with signatures of long- 43 44 45 433 term persistence. 46 47 434 48 49 435 Within the broadly distributed Entomobrya OTU 4, a pattern of non-random geographic 50 51 structuring of genetic variation was observed (Fig. 2) in two monophyletic clusters of 52 436 53 54 437 DNA sequences. NivG1 and NivG2, comprised of ten individuals (7 haplotypes) and 6 55 56 438 individuals (3 haplotypes), respectively, presented clear structuring of genetic variation 57 58 59 18 60 Page 63 of 117 Journal of Biogeography

1 2 3 439 (Figs. 2 & 4), occurring in only a few geographically proximate sites in the North, 4 5 6 440 Northwest and Yorkshire regions. Genetic signatures in Lepidocyrtus presented stronger 7 8 441 support for the glacial survival hypothesis. Five mtDNA COI Lepidocyrtus OTUs 9 10 442 contained geographically restricted sequence variation (Figs. 3 & 5c,d) and these OTUs 11 12 443 represented 29 haplotypes distributed across 18 sampling sites (Fig. 5c,d). These sites 13 14 15 444 are mostly in unglaciated terrains, but include some glaciated areas (largely located near 16 17 445 the border of the ice sheet, except for OTU 7 for which sequence variation extends well 18 19 446 into the northern glaciated area). 20 21 For Peer Review 22 447 23 24 448 Furthermore, three Entomobrya and three Lepidocyrtus OTUs presented disjunct 25 26 449 geographical distributions mainly between North-South and East-West regions (Fig. 27 28 5a,b & Appendix S3: Fig. S3.3). While this disjunctive pattern could be explained by 29 450 30 31 451 long distance dispersal after ice sheet retreat, as noted for other taxa previously studied 32 33 452 in different contexts (e.g. Brochmann et al., 2003; Beatty & Provan, 2013; Tzedakis et 34 35 453 al., 2013), it seems unlikely for Entomobrya and Lepidocyrtus (and Collembola in 36 37 38 454 general) due to their limited dispersal capability. This low dispersive ability has been 39 40 455 well demonstrated by the strong geographic structuring across very short distances 41 42 456 found for Lepidocyrtus lineages in previous molecular studies in the Mediterranean 43 44 45 457 Basin and Panama (Cicconardi et al., 2010, 2013), and other genera elsewhere (e.g. 46 47 458 Garrick et al., 2007, 2008; McGaughran et al., 2008). A more likely scenario is that 48 49 459 these represent native OTUs whose origin in Great Britain predates the last glaciation, 50 51 for which previously broader distributions were reduced by glacial onset, with 52 460 53 54 461 persistence in a limited number of ice-free areas in otherwise unfavourable habitat. 55 56 462 After glaciation, as the ice receded, deglaciated regions recolonised by a few leading 57 58 59 19 60 Journal of Biogeography Page 64 of 117

1 2 3 463 front OTUs (Hewitt 1999) would effectively pre-empt space, reducing the opportunity 4 5 6 464 for other surviving OTUs to re-expand their populations. A similar scenario could also 7 8 465 explain the existence of endemic variation. Localised clades could have once been 9 10 466 widespread (i.e. same and related alleles found elsewhere), but went extinct in most of 11 12 467 their range in response to climate change. After ice retreat, they remained isolated or 13 14 15 468 colonized a few closely located sites. 16 17 469 18 19 470 Great Britain sequences that are genetically identical or nearly identical to sequences 20 21 For Peer Review 22 471 sampled in remote continents are most plausibly explained as human introductions. 23 24 472 Passive dispersal of Collembola through human activity has been implicated in the 25 26 473 sharing of lineages across distant continents, e.g. Cerathophysella denticulata lineage 27 28 L3 found in Canada, South Africa, Australia and New Zealand (Porco et al., 2012), and 29 474 30 31 475 such introductions are considered to have mainly been accidental (e.g. in soil used as 32 33 476 ship ballast). Mitochondrial metagenomic data has also revealed island faunas to be 34 35 477 highly represented by introduced Collembola (Cicconardi et al. 2017). Our data set 36 37 38 478 presents similar examples, such as Entomobrya OTU 6, with sequences 100% similar 39 40 479 sampled in Canada, and South Africa and sequences 100% identical 41 42 480 to Lepidocyrtus OTU 9 sampled in Australia and Tasmania. Genetic identity, or near 43 44 45 481 identity across these vast geographic distances is consistent with passive transport 46 47 482 through human activities. 48 49 483 50 51 OTU diversity analysis 52 484 53 54 485 55 56 57 58 59 20 60 Page 65 of 117 Journal of Biogeography

1 2 3 486 Our results illustrate the extent to which reliance on morphospecies can misrepresent 4 5 6 487 diversity. Within Entomobrya, colour-pattern morphospecies were generally 7 8 488 monophyletic but concealing two or more mtDNA OTUs (Fig. 2). Katz et al. (2015) 9 10 489 also found colour pattern species of Entomobrya were generally (but not universally) 11 12 490 supported by mtDNA in the USA. The correspondence between morphospecies and 13 14 15 491 OTU was less clear for Lepidocyrtus. In particular, animals assigned to the 16 17 492 morphospecies Lepidocyrtus cyaneus featured in seven OTUs, and the dominant OTU 18 19 493 contained a mix of morphospecies (mainly L. lignorum and L. lanuginosus). 20 21 For Peer Review 22 494 Unexpectedly high cryptic species diversity has been noted previously in this genus 23 24 495 (Ciccornadi et al., 2010, Cicconardi et al. 2013), and together these results suggest that 25 26 496 key taxonomic features (mainly distribution of scales, ground colour) may be unreliable 27 28 indicators of species boundaries. 29 497 30 31 498 32 33 499 If soil-dwelling mesofauna survived within southern regions that remained ice free, they 34 35 500 would be expected to exhibit higher species richness and higher genetic diversity within 36 37 38 501 species, compared to subsequently recolonised glaciated areas. Our analysis of genetic 39 40 502 diversity revealed different responses between the genera. Entomobrya presented 41 42 503 equivalent OTU richness between glaciated and unglaciated areas, providing no support 43 44 45 504 for the northern poverty hypothesis. In contrast, Lepidocyrtus OTU richness did differ 46 47 505 between glaciated and unglaciated communities, with fewer OTUs in the glaciated 48 49 506 areas, supporting the above hypothesis. Entomobrya and Lepidocyrtus may have 50 51 different rates of dispersal, which could potentially explain their distinct re-colonization 52 507 53 54 508 patterns. 55 56 509 57 58 59 21 60 Journal of Biogeography Page 66 of 117

1 2 3 510 Long-term persistence of Entomobrya and Lepidocyrtus in the Great Britain 4 5 6 511 7 8 512 Our data reveal evidence for population persistence of Entomobrya OTUs within Great 9 10 513 Britain, estimated to extend back at least some 77,000 years. Six geographically 11 12 514 proximate sampling sites are represented exclusively by 9 haplotypes that comprise two 13 14 15 515 monophyletic groups, not sampled within any other site. These six sites (Fig. 4) span 16 17 516 both glaciated and unglaciated terrains of Great Britain, with most sequence variation 18 19 517 concentrated within the sampling sites extending well into the glaciated northern areas. 20 21 For Peer Review 22 518 The estimated age of 77,000 years for the onset of genetic differentiation within the 23 24 519 oldest of these two clades indicates persistence through the last Pleistocene glaciation. 25 26 520 27 28 Similarly, with divergences dating back at least to the Ionian stage of the Pleistocene 29 521 30 31 522 (~500-160 Kya), Lepidocyrtus OTU differentiation is inferred to predate the LGM. The 32 33 523 implication is that the present distribution of the five geographically structured OTUs 34 35 524 has been influenced by relatively old paleogeographic events. These OTUs survived 36 37 38 525 through extreme climatic and ecological changes that characterized Great Britain and 39 40 526 have remained isolated from neighbouring OTUs. 41 42 527 43 44 45 528 The Collembola genera Lepidocyrtus and Entomobrya add to the growing evidence for 46 47 529 an ancient invertebrate fauna in Great Britain (Nieberding et al., 2005; McInerney et al., 48 49 530 2014, see above). Our results are consistent with other studies demonstrating the 50 51 persistence of Collembola lineages through extreme climatic disturbances, such as 52 531 53 54 532 glaciations in the Antarctic (Stevens et al., 2007), the Messinian Salinity Crisis in the 55 56 533 Mediterranean basin (Cicconardi et al., 2010) and periglacial forest contraction in 57 58 59 22 60 Page 67 of 117 Journal of Biogeography

1 2 3 534 southeastern Australia (Garrick et al., 2007). Together, these studies demonstrate the 4 5 6 535 marked resilience of the Collembola fauna to historical climate cycles, and their 7 8 536 suitability to investigate its consequences. 9 10 537 11 12 538 The unglaciated southern regions of the British Isles have been previously suggested to 13 14 15 539 act as northern cryptic refugia based upon evidence from fossil remains (Vincent, 1990), 16 17 540 fish (Bernatchez, 2001; Hänfling et al., 2002), and groundwater amphipods (McInerney 18 19 541 et al., 2014). Results presented here extend inference beyond the long-term persistence 20 21 For Peer Review 22 542 for Collembola OTUs in unglaciated areas. Patterns of genetic variation within 23 24 543 Collembola also reveal evidence for population persistence within regions inferred to 25 26 544 have been glaciated, suggesting the existence of small ice-free northern areas that 27 28 allowed survival. One possible mechanism for the existence of northern ice-free areas in 29 545 30 31 546 Great Britain comes from recent work suggesting they may result from geothermal 32 33 547 activity, i.e. they are located close to geothermal heated ground and water (e.g. heated 34 35 548 ground and ponds, steam fields). These areas would have provided thermal gradients 36 37 38 549 with less extreme temperatures (between very high temperatures in the core of 39 40 550 geothermal sites, and very low temperatures below the ice-sheets), thus, possibly 41 42 551 creating a buffer zone where life could survive – the “geothermal glacial refugia” 43 44 45 552 hypothesis (Convey & Lewis Smith, 2006). Such geothermal activity usually persists 46 47 553 over extended time scales that could encompass glacial cycles, providing refugia for life 48 49 554 (Fraser et al., 2014). Using Antarctica as a natural model system and a comprehensive 50 51 terrestrial database associated with landscape, climate and geothermal data, Fraser et al. 52 555 53 54 556 (2014) provided strong support for the role of geothermal heated terrain in structuring 55 56 557 broad-scale contemporary patterns in Antarctic diversity by providing glacial refugia. 57 58 59 23 60 Journal of Biogeography Page 68 of 117

1 2 3 558 They further suggest that for many cases where the existence of small refugia within the 4 5 6 559 glaciated regions of northern Europe and northern North America has been inferred 7 8 560 from phylogeographic data, but their causes and precise locations have not been 9 10 561 identified, there is a great possibility that they may have been geothermal. A 11 12 562 consideration of the geothermal map of Great Britain (Busby et al., 2011) suggests that 13 14 15 563 some sampling sites where Entomobrya and Lepidocyrtus OTUs present signatures of in 16 17 564 situ diversification coincide with areas around geothermal terrains. For example, the 18 19 565 oldest E. nivalis OTUs (NivG1 and NivG2) were sampled from sampling sites 76, 77, 20 21 For Peer Review 22 566 78 and 79, which broadly coincide with terrains that receive heat flow from the 23 24 567 Northern England Granites. Sites for L. cyaneus OTUs 7 (loc: 10, 101) and 12 (loc: 24, 25 26 568 28) coincide with areas near thermal springs (e.g. Taffs Well) and close to heat flow 27 28 from the Cheshire Basin. While these data suggest that long-term persistence of 29 569 30 31 570 Collembola in glaciated areas of the Great Britain may have been facilitated by 32 33 571 geothermal glacial refugia, this hypothesis remains to be tested. 34 35 572 36 37 38 573 ACKNOWLEDGEMENTS 39 40 574 41 42 575 This study was supported by SYNTAX grant awarded to P.S. C.M.A.F. was funded by a 43 44 45 576 Brazilian Council for Scientific and Technological Development PhD studentship 46 47 577 (CNPQ -200439/2010-3). 48 49 578 50 51 REFERENCES 52 579 53 54 580 Baddeley, A. & Turner, R. (2005) spatstat: An R package for analyzing spatial point 55 581 patterns. Journal of Statistical Software, 12, 1–42. 56 57 58 59 24 60 Page 69 of 117 Journal of Biogeography

1 2 3 582 Beatty, G.E. & Provan, J. (2013) Post-glacial dispersal, rather than in situ glacial 4 583 survival, best explains the disjunct distribution of the Lusitanian plant species 5 6 584 Daboecia cantabrica (Ericaceae). Journal of Biogeography, 40, 335–344. 7 8 585 Bernatchez, L. (2001) The evolutionary history of brown trout (Salmo trutta L.) inferred 9 586 from phylogeographic, nested clade, and mismatch analyses of mitochondrial 10 587 DNA variation. Evolution, 55, 351–379. 11 12 588 Brochmann, C., Gabrielsen, T.M., Nordal, I., Landvik, J.Y., & Elven, R. (2003) Glacial 13 589 survival or tabula rasa? The history of North Atlantic biota revisited. Taxon, 52, 14 590 417–450. 15 16 17 591 Brower, A.V. (1994) Rapid morphological radiation and convergence among races of 18 592 the butterfly Heliconius erato inferred from patterns of mitochondrial DNA 19 593 evolution. Proceedings of the National Academy of Sciences, 91, 6491–6495. 20 21 594 Busby, J., Kingdon, A.,For & Williams, Peer J. (2011) Review The measured shallow temperature field 22 595 in Britain. Quarterly Journal of Engineering Geology and Hydrogeology, 44, 23 596 373–387. 24 25 597 Chiverrell, R.C. & Thomas, G.S.P. (2010) Extent and timing of the Last Glacial 26 27 598 Maximum (LGM) in Britain and Ireland: a review. Journal of Quaternary 28 599 Science, 25, 535–549. 29 30 600 Cicconardi, F., Borges, P.A.V., Strasberg, D., Oromí, P., López, H., Pérez-Delgado, 31 601 A.J., Casquet, J., Caujapé-Castells, J., Fernández-Palacios, J.M., Thébaud, C., & 32 602 Emerson, B.C. (2017) MtDNA metagenomics reveals large-scale invasion of 33 603 belowground arthropod communities by introduced species. Molecular Ecology, 34 26, 3104-3115. 35 604 36 37 605 Cicconardi, F., Fanciulli, P.P., & Emerson, B.C. (2013) Collembola, the biological 38 606 species concept and the underestimation of global species richness. Molecular 39 607 Ecology, 22, 5382–5396. 40 41 608 Cicconardi, F., Nardi, F., Emerson, B.C., Frati, F., & Fanciulli, P.P. (2010) Deep 42 609 phylogeographic divisions and long-term persistence of forest invertebrates 43 610 (Hexapoda: Collembola) in the North-Western Mediterranean basin. Molecular 44 45 611 Ecology, 19, 386–400. 46 47 612 Clark, C.D., Hughes, A.L.C., Greenwood, S.L., Jordan, C., & Sejrup, H.P. (2012) 48 613 Pattern and timing of retreat of the last British-Irish Ice Sheet. Quaternary 49 614 Science Reviews, 44, 112–146. 50 51 615 Convey, P. & Lewis Smith, R. i. (2006) Geothermal bryophyte habitats in the South 52 616 Sandwich Islands, maritime Antarctic. Journal of Vegetation Science, 17, 529– 53 617 538. 54 55 56 618 Coyer, J.A., Peters, A.F., Stam, W.T., & Olsen, J.L. (2003) Post-ice age recolonization 57 619 and differentiation of Fucus serratus L. (Phaeophyceae; Fucaceae) populations 58 620 in Northern Europe. Molecular Ecology, 12, 1817–1829. 59 25 60 Journal of Biogeography Page 70 of 117

1 2 3 621 DeSalle, R., Freedman, T., Prager, E.M., & Wilson, A.C. (1987) Tempo and mode of 4 622 sequence evolution in mitochondrial DNA of Hawaiian Drosophila. Journal of 5 6 623 Molecular Evolution, 26, 157–164. 7 8 624 Dixon, P.M. (2002) Ripley’s K function. Encyclopedia of environmetrics (ed. by A.H. 9 625 El-Shaarawi and W.W. Piegorsch), Wiley, Chichester New York. 10 11 626 Drummond, A.J., Suchard, M.A., Xie, D., & Rambaut, A. (2012) Bayesian 12 627 Phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and 13 628 Evolution, 29, 1969–1973. 14 15 629 Emerson, B.C., Cicconardi, F., Fanciulli, P.P., & Shaw, P.J.A. (2011) Phylogeny, 16 17 630 phylogeography, phylobetadiversity and the molecular analysis of biological 18 631 communities. Philosophical Transactions of the Royal Society B: Biological 19 632 Sciences, 366, 2391–2402. 20 21 633 Ezard, T., Fujisawa, T.,For & Barraclough, Peer T.G. Review (2009) SPLITS: SPecies’ LImits by 22 634 Threshold Statistics: R package version 1.0-18/r45. 23 24 635 Fraser, C.I., Terauds, A., Smellie, J., Convey, P., & Chown, S.L. (2014) Geothermal 25 636 activity helps life survive glacial cycles. Proceedings of the National Academy 26 27 637 of Sciences of the United States of America, 111, 5634–5639. 28 29 638 Fujisawa, T. & Barraclough, T.G. (2013) Delimiting species using single-locus data and 30 639 the Generalized Mixed Yule Coalescent approach: a revised method and 31 640 evaluation on simulated data sets. Systematic Biology, 62, 707–724. 32 33 641 Garrick, R.C., Rowell, D.M., Simmons, C.S., Hillis, D.M., & Sunnucks, P. (2008) Fine- 34 642 scale phylogeographic congruence despite demographic incongruence in two 35 low-mobility saproxylic springtails. Evolution, 62, 1103–1118. 36 643 37 38 644 Garrick, R.C., Sands, C.J., Rowell, D.M., Hillis, D.M., & Sunnucks, P. (2007) 39 645 Catchments catch all: long-term population history of a giant springtail from the 40 646 southeast Australian highlands — a multigene approach. Molecular Ecology, 16, 41 647 1865–1882. 42 43 648 Gibert, J., Danielopol, D., & Stanford, J.A. (1994) Groundwater Ecology. Academic 44 649 Press, San Diego, California. 45 46 47 650 Gotelli, N.J. & Entsminger, G.L. (2000) EcoSim: Null models software for ecology. 48 651 Version 5.0. Acquired Intelligence Inc. & Kesey-Bear, 49 50 652 Hänfling, B., Hellemans, B., Volckaert, F. a. M., & Carvalho, G.R. (2002) Late glacial 51 653 history of the cold-adapted freshwater fish Cottus gobio, revealed by 52 654 microsatellites. Molecular Ecology, 11, 1717–1729. 53 54 655 Hedderson, T.A. & Nowell, T.L. (2006) Phylogeography of Homalothecium sericeum 55 656 (Hedw.) Br. Eur.; toward a reconstruction of glacial survival and postglacial 56 57 657 migration. Journal of Bryology, 28, 283–292. 58 59 26 60 Page 71 of 117 Journal of Biogeography

1 2 3 658 Hewitt, G. (2000) The genetic legacy of the Quaternary ice ages. Nature, 405, 907–913. 4 5 659 Hewitt, G.M. (1999) Post-glacial re-colonization of European biota. Biological Journal 6 7 660 of the Linnean Society, 68, 87–112. 8 9 661 Hewitt, G.M. (2004) Genetic consequences of climatic oscillations in the Quaternary. 10 662 Philosophical Transactions of the Royal Society of London. Series B: Biological 11 663 Sciences, 359, 183–195. 12 13 664 Ho, S.Y.W. & Lo, N. (2013) The insect molecular clock. Australian Journal of 14 665 Entomology, 52, 101–105. 15 16 666 Hoarau, G., Coyer, J.A., Veldsink, J.H., Stam, W.T., & Olsen, J.L. (2007) Glacial 17 18 667 refugia and recolonization pathways in the brown seaweed Fucus serratus. 19 668 Molecular Ecology, 16, 3606–3616. 20 21 669 Hopkin, S.P. (1997) BiologyFor of Peerthe Springtails: Review (Insecta: Collembola). Oxford 22 670 University Press, Oxford, UK. 23 24 671 Hopkin, S.P. (2007) A Key to the Collembola (springtails) of Britain and Ireland. Field 25 672 Studies Council (FSC) AIDGAP, Shrewsbury, UK. 26 27 673 Kapli, P., Lutteropp, S., Zhang, J., Kobert, K., Pavlidis, P., Stamatakis, A., & Flouri, T. 28 29 674 (2017) Multi-rate Poisson Tree Processes for single-locus species delimitation 30 675 under Maximum Likelihood and Markov Chain Monte Carlo. Bioinformatics, 31 676 33, 1630–1638. 32 33 677 Katoh, K., Misawa, K., Kuma, K., & Miyata, T. (2002) MAFFT: a novel method for 34 678 rapid multiple sequence alignment based on fast Fourier transform. Nucleic 35 679 Acids Research, 30, 3059–3066. 36 37 Katz, A.D., Giordano, R., & Soto-Adames, F.N. (2015) Operational criteria for cryptic 38 680 39 681 species delimitation when evidence is limited, as exemplified by North 40 682 American Entomobrya (Collembola: ). Zoological Journal of the 41 683 Linnean Society, 173, 818–840. 42 43 684 Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, 44 685 S., Cooper, A., Markowitz, S., Duran, C., Thierer, T., Ashton, B., Meintjes, P., 45 686 & Drummond, A. (2012) Geneious Basic: An integrated and extendable desktop 46 47 687 software platform for the organization and analysis of sequence data. 48 688 Bioinformatics, 28, 1647–1649. 49 50 689 Luo, A., Ling, C., Ho, S.Y.W., Zhu, C.-D., & Mueller, R. (2018) Comparison of 51 690 methods for nolecular species delimitation across a range of speciation 52 691 scenarios. Systematic Biology, 67, 830–846. 53 54 692 McGaughran, A., Hogg, I.D., & Stevens, M.I. (2008) Patterns of population genetic 55 693 structure for springtails and mites in southern Victoria Land, Antarctica. 56 57 694 Molecular Phylogenetics and Evolution, 46, 606–618. 58 59 27 60 Journal of Biogeography Page 72 of 117

1 2 3 695 McInerney, C.E., Maurice, L., Robertson, A.L., Knight, L.R.F.D., Arnscheidt, J., 4 696 Venditti, C., Dooley, J.S.G., Mathers, T., Matthijs, S., Eriksson, K., Proudlove, 5 6 697 G.S., & Hänfling, B. (2014) The ancient Britons: groundwater fauna survived 7 698 extreme climate change over tens of millions of years across NW Europe. 8 699 Molecular Ecology, 23, 1153–1166. 9 10 700 Montgomery, W.I., Provan, J., McCabe, A.M., & Yalden, D.W. (2014) Origin of British 11 701 and Irish mammals: disparate post-glacial colonisation and species 12 702 introductions. Quaternary Science Reviews, 98, 144–165. 13 14 703 Nieberding, C., Libois, R., Douady, C.J., Morand, S., & Michaux, J.R. (2005) 15 16 704 Phylogeography of a nematode (Heligmosomoides polygyrus) in the western 17 705 Palearctic region: persistence of northern cryptic populations during ice ages? 18 706 Molecular Ecology, 14, 765–779. 19 20 707 Porco, D., Bedos, A., Greenslade, P., Janion, C., Skarżyński, D., Stevens, M.I., Jansen 21 708 van Vuuren, B.,For & Deharveng, Peer L. (2012) Review Challenging species delimitation in 22 709 Collembola: cryptic diversity among common springtails unveiled by DNA 23 710 barcoding. Invertebrate Systematics, 26, 470–477. 24 25 26 711 Provan, J. & Bennett, K.D. (2008) Phylogeographic insights into cryptic glacial refugia. 27 712 Trends in Ecology & Evolution, 23, 564–571. 28 29 713 Provan, J., Wattier, R.A., & Maggs, C.A. (2005) Phylogeographic analysis of the red 30 714 seaweed Palmaria palmata reveals a Pleistocene marine glacial refugium in the 31 715 English Channel. Molecular Ecology, 14, 793–803. 32 33 716 QGIS Development Team (2014) QGIS Geographic Information System. Open Source 34 Geospatial Foundation Project, 35 717 36 37 718 Rambaut, A., Suchard, M.A., Xie, D., & Drummond, A.J. (2014) Tracer v1.6. 38 39 719 Ramirez-Gonzalez, R., Yu, D.W., Bruce, C., Heavens, D., Caccamo, M., & Emerson, 40 720 B.C. (2013) PyroClean: denoising pyrosequences from protein-coding 41 721 amplicons for the recovery of interspecific and intraspecific genetic variation. 42 722 PLoS ONE, 8, 1–11. 43 44 723 Schmitt, T., Hewitt, G.M., & Müller, P. (2006) Disjunct distributions during glacial and 45 46 724 interglacial periods in mountain butterflies: Erebia epiphron as an example. 47 725 Journal of Evolutionary Biology, 19, 108–113. 48 49 726 Searle, J.B. (2008) The colonization of Ireland by mammals. The Irish Naturalists’ 50 727 Journal, 29, 109–115. 51 52 728 South, A. (1961) The Taxonomy of the British Species of Entomobrya (collembola). 53 729 Transactions of the Royal Entomological Society of London, 113, 387–416. 54 55 730 Stamatakis, A., Hoover, P., & Rougemont, J. (2008) A rapid bootstrap algorithm for the 56 57 731 RAxML Web servers. Systematic Biology, 57, 758–771. 58 59 28 60 Page 73 of 117 Journal of Biogeography

1 2 3 732 Stehlik, I., Blattner, F.R., Holderegger, R., & Bachmann, K. (2002) Nunatak survival of 4 733 the high Alpine plant Eritrichium nanum (L.) Gaudin in the central Alps during 5 6 734 the ice ages. Molecular Ecology, 11, 2027–2036. 7 8 735 Stevens, M.I., Frati, F., McGaughran, A., Spinsanti, G., & Hogg, I.D. (2007) 9 736 Phylogeographic structure suggests multiple glacial refugia in northern Victoria 10 737 Land for the endemic Antarctic springtail Desoria klovstadi (Collembola, 11 738 Isotomidae). Zoologica Scripta, 36, 201–212. 12 13 739 Taberlet, P., Fumagalli, L., Wust-Saucy, A.-G., & Cosson, J.-F. (1998) Comparative 14 740 phylogeography and postglacial colonization routes in Europe. Molecular 15 16 741 Ecology, 7, 453–464. 17 18 742 Tamura, K., Stecher, G., Peterson, D., Filipski, A., & Kumar, S. (2013) MEGA6: 19 743 Molecular Evolutionary Genetics Analysis Version 6.0. Molecular Biology and 20 744 Evolution, 30, 2725–2729. 21 For Peer Review 22 745 Teacher, A.G.F., Garner, T.W.J., & Nichols, R.A. (2009) European phylogeography of 23 746 the common frog (Rana temporaria): routes of postglacial colonization into the 24 747 British Isles, and evidence for an Irish glacial refugium. Heredity, 102, 490–496. 25 26 27 748 Tzedakis, P.C., Emerson, B.C., & Hewitt, G.M. (2013) Cryptic or mystic? Glacial tree 28 749 refugia in northern Europe. Trends in Ecology & Evolution, 28, 696–704. 29 30 750 Villesen, P. (2007) FaBox: an online toolbox for fasta sequences. Molecular Ecology 31 751 Notes, 7, 965–968. 32 33 752 Vincent, P. (1990) The Biogeography of the British Isles: An Introduction. Routledge, 34 753 London; New York. 35 36 Wares, J.P. (2001) Patterns of Speciation Inferred from Mitochondrial DNA in North 37 754 38 755 American Chthamalus (Cirripedia: Balanomorpha: Chthamaloidea). Molecular 39 756 Phylogenetics and Evolution, 18, 104–116. 40 41 757 Willis, K.J. & van Andel, T.H. (2004) Trees or no trees? The environments of central 42 758 and eastern Europe during the Last Glaciation. Quaternary Science Reviews, 23, 43 759 2369–2387. 44 45 760 Yalden, D.W. (1982) When did the mammal fauna of the British Isles arrive? Mammal 46 47 761 Review, 12, 1–57. 48 762 49 50 763 DATA ACCESSIBILITY 51 52 764 DNA sequences were deposited in BoldSystem (http://www.boldsystems.org/) under 53 54 55 765 accession numbers XXX 56 57 766 58 59 29 60 Journal of Biogeography Page 74 of 117

1 2 3 767 BIOSKETCH 4 5 6 768 Christiana M. A. Faria is currently a postdoctoral researcher at the laboratory of 7 8 769 Ecology and Evolution of Social Insects at the University of Ceará, Brazil. She is 9 10 770 interested in the application of molecular methods to the study of ecology, behaviour, 11 12 771 biogeography, species adaptation and diversification processes. 13 14 15 772 16 17 773 AUTHOR CONTRIBUTIONS 18 19 774 C.M.A.F and B.C.E. designed the research, P.S. collected the samples, C.M.A.F. 20 21 For Peer Review 22 775 conducted the laboratory work and analysed the data. C.M.A.F and B.C.E wrote the 23 24 776 paper, with input from P.S. 25 26 777 27 28 SUPPORTING INFORMATION 29 778 30 31 779 Appendix S1 Supplementary information on the materials used in this study 32 33 780 Appendix S2 Supplementary information for methods and results used in this study 34 35 781 Appendix S3 Spatial distribution patterns for Entomobrya and Lepidocyrtus OTUs 36 37 38 782 Appendix S4 Dated trees for Entomobrya and Lepidocyrtus geographically localized 39 40 783 monophyletic clusters 41 42 784 43 44 45 785 46 47 48 49 50 51 52 53 54 55 56 57 58 59 30 60 Page 75 of 117 Journal of Biogeography

1 2 3 4 786 Table 1. Summary data for each Entomobrya OTU sampled in Great Britain, reporting the number of individuals (n ind) and of mtDNA COI 5 6 7 787 haplotypes (n hap) found, the number of sites each OTU was sampled from (n sites), the number of sequences sampled from single localities (n 8 9 788 single), the number of localities sampled in glaciated (n glac) and unglaciated (n unglac) areas, the mean and maximum observed intraspecific p- 10 11 789 distances, and BOLD information on the locality of occurrence of the identical or near identical sequence found in this database (>99% 12 13 similarity). ‘No match’ a query sequence has no hit in Bold database. 14 790 For Peer Review 15 16 791 17 18 Mean/max BOLD most frequent 19 OTU morphospecies n ind n hap n sites n single n glac n unglac Range 20 intra-OTU p-dist sequence 21 22 1 intermediaa 165 22 47 11 21 26 widespread 2.1/4 Canada 23 24 2 intermedia 5 2 5 1 3 2 disjunct 0.1/0.2 Canada 25 26 27 3 nivalis/marginata 2 2 2 2 1 1 disjunct 0.2/0.2 no match 28 b 29 4 nivalis 94 38 33 29 12 21 widespread 1.3/3.2 France, Canada 30 31 5 nivalisc 54 16 24 12 6 18 spread 0.4/1.6 Norway 32 33 6 mutifasciatad 52 7 14 4 8 6 spread 0.1/0.5 France, Canada, SAfrica 34 35 7 nivalis 1 1 1 1 0 1 - - no match 36 37 38 39 31 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Biogeography Page 76 of 117

1 2 3 4 8 nicoleti 2 2 1 0 1 0 - 1.4/1.4 no match 5 6 9 marginata 3 2 2 2 0 2 - 2.4/3.6 no match 7 8 9 10 multifasciata 2 2 2 2 0 21 disjunct - France, Canada 10 e 11 11 nicoleti 106 44 33 37 16 17 widespread 0.3/1.6 no match 12 f 13 12 albocincta 172 37 55 27 15 40 widespread 1.2/2.9 France, Canada, Moldova 14 For Peer Review 15 792 OTUs are mostly composed of one morphospecies but individuals from different morphospecies are also found within these OTUs: 16 793 aintermedia (n=160), nivalis (n=3), albocincta (n=1), multifasciata (n=1), bnivalis (n=86), intermedia (n=2), multifasciata (n=6), cnivalis (n=49), intermedia 17 794 (n=4), marginata (n=1), dmultifasciata (n=50), intermedia (n=1), nicoleti (n=1), enicoleti (n=92), intermedia (n=4), multifasciata (n=7), nivalis (n=3), 18 795 falbocincta (n=171), intermedia (n=1) 19 796 20 21 797 22 23 24 798 25 26 799 27 28 800 29 30 31 801 32 33 802 34 35 803 36 37 38 39 32 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 77 of 117 Journal of Biogeography

1 2 3 4 804 Table 2. Summary data for each Lepidocyrtus OTU sampled in Great Britain, reporting: the number of individuals (n ind), number of mtDNA 5 6 7 805 COI haplotypes (n hap) found, number of sites each OTU was sampled from (n sites), number of localities sampled in glaciated (n glac) and 8 9 806 unglaciated (n unglac) areas, the mean and maximum observed intraspecific p-distances, and BOLD information on the locality of occurrence of 10 11 807 the identical or near identical sequence found in this database (>99% similarity). ‘No match’ a query sequence has no hit in Bold database, ‘not 12 13 provided’ the geographic information of similar sequence found in the database is missing, ‘<99% match’ a similar sequence was found in the 14 808 For Peer Review 15 16 809 database but it is less than 99% identical thus inhibiting taxonomic or geographic assignments. 17 18 810 19 20 Mean/max 21 BOLD most frequent OTU Morphospecies n ind n hap n sites n glac n unglac Range intra-OTU 22 sequence 23 p-distances a 24 1 lanuginosus/lignorum 189 9 41 25 16 widespread 0.02/0.5 Canada 25 26 2 lignorum 2 1 1 0 1 - - France 27 28 3 lignorum/curvicollisb 9 7 4 0 4 0.86/1.65 not provided 29 30 4 albocincta 1 1 1 0 1 - - <99% match 31 32 5 lanuginosus 1 1 1 1 0 - - <99% match 33 34 6 lanuginosus 13 2 4 3 1 localised 0.07/0.18 France 35 36 7 cyaneus 17 8 5 4 1 - 1.76/4.14 Canada 37 38 39 33 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Biogeography Page 78 of 117

1 2 3 4 8 cyaneus 11 4 5 1 4 localised 2.36/4.14 Tasmania 5 6 9 cyaneus/violaceusc 49 15 26 8 18 spread 0.59/1.98 Australia, Tasmania 7 8 9 10 curvicollis 1 1 1 0 1 - - France, Italy 10 11 11 curvicollis 4 2 4 1 3 disjunct 0.09/0.18 not provided 12 13 12 cyaneus 13 4 3 2 1 localised 1.09/2.54 no match 14 d For Peer Review 15 13 cyaneus/sp. 2 14 8 8 3 5 disjunct 1.12/1.8 Canada,France, Germany 16 17 14 cyaneus 1 1 1 0 1 - - Canada 18 19 15 lanuginosus/sp.1e 53 18 22 10 12 spread 1.6/4.68 Canada, USA, France, 20 21 Poland 22 23 16 cyaneus 4 2 2 0 2 localised 0.09/0.18 no match 24 25 17 lanuginosus/lignorumf 2 2 2 0 2 disjunct 2.16/ - no match 26 27 18 lignorum 7 6 2 1 1 localised 2.36/3.96 Canada 28 29 : a b 30 811 OTUs composed of individuals from more than one morphospecies lanuginosus (n=83), lignorum (n=82), curvicollis (n=13), cyaneus (n=9), ruber (n=1); 31 32 812 lignorum (n=5), curvicollis (n=3), lanuginosus (n=1); c cyaneus (n=35), violaceus (n=1); d cyaneus (n=13), Lepidocyrtus sp. 2 (n=1); e lanuginosus (n=33), 33 34 813 Lepidocyrtus sp.1 (n=9), lignorum (n=5), cyaneus (n=2), curvicollis (n=1); f lanuginosus (n=1), lignorum (n=1). 35 36 814 37 38 39 34 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 79 of 117 Journal of Biogeography

1 2 3 4 815 Table 3. Summary of Entomobrya ECOSIM data entry and diversity results (mean and 95% confidence interval – CI) for the overall OTU richness 5 6 7 816 and for each of the most abundant OTUs. For all simulated pairs, the larger community was rarefied down to the abundance level of the smaller 8 9 817 community being compared and the observed diversity (n OTU/hap) of the smaller community was tested to see if it fell within the estimated 10 11 818 95% confidence interval of the larger community. N ind = number of individuals, N otu/hap = observed number of OTUs or of haplotypes within 12 13 OTUs, glac = community in the glaciated area, unglac = community in the unglaciated area. *Community is used in a statistical sense (not 14 819 For Peer Review 15 16 820 ecological sense) meaning the collection of Entomobrya OTUs or collection of haplotypes within OTUs found in each area within Great Britain. 17 18 821 19 20 All OTUs OTU 1 OTU 4 OTU 5 OTU 6 OTU 11 OTU 12 21 22 23 glac unglac glac unglac glac unglac glac unglac glac unglac glac unglac glac unglac 24 25 N ind 264 394 75 87 30 63 14 40 20 32 58 50 31 141 26 27 N otu/hap 9 12 14 16 15 26 6 12 3 6 27 20 6 33 28 29 Mean - 11 - 14.7 - 16 - 6.3 - 4.4 23.7 - - 10 30 31 95% CI - 9 to 12 - 13 to 16 - 13 to 20 - 4 to 9 - 2 to 6 21 to 26 - - 6 to 14 32 33 822 34 35 36 823 37 38 39 35 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Biogeography Page 80 of 117

1 2 3 824 Table 4. Summary of Lepidocyrtus ECOSIM data entry and diversity results (mean and 4 5 6 825 95% confidence interval – CI) for overall OTU richness and for each of the most 7 8 826 abundant OTUs. For all simulated pairs, the larger community was rarefied down to the 9 10 827 abundance level of the smaller community being compared and the observed diversity 11 12 828 (n hap) of the smaller community was tested to see if it felt within the estimated 95% 13 14 15 829 confidence interval of the larger community. n ind = number of individuals, n otu/hap = 16 17 830 number of OTUs or of haplotypes within OTUs, glac = community in the glaciated area, 18 19 831 unglac = community in the unglaciated area. *Community is used in a statistical sense 20 21 For Peer Review 22 832 (not ecological sense) meaning the collection of Lepidocyrtus OTUs or collection of 23 24 833 haplotypes within OTUs found in each area within Great Britain. 25 26 All OTUs OTU 1 OTU 9 OTU 15 27 28 29 glac unglac glac unglac glac unglac glac unglac 30 31 n ind 184 205 117 72 15 34 19 34 32 33 n otu/hap 11 17 8 3 6 13 9 13 34 35 Mean - 16.5 6 - - 7 - 9 36 37 38 95% CI - 15 to 17 3 to 8 - - 4 to 9 - 7 to 11 39 40 834 41 42 835 43 44 45 836 46 47 837 48 49 838 50 51 52 839 53 54 840 55 56 841 57 58 59 36 60 Page 81 of 117 Journal of Biogeography

1 2 3 842 Table 5. Estimated times, in thousands of years, of the MRCA of geographically 4 5 6 843 localized Lepidocyrtus mitochondrial OTUs (in situ endemic variation) sampled across 7 8 844 Great Britain expressed as mean values with 95% highest posterior density (HPD) 9 10 845 intervals. 11 12 846 13 14 15 OTU Mean value 95% HPD 16 17 OTU 3 160,000 77,200-256,200 18 19 OTU 7 533,390 355,600-720,700 20 21 For Peer Review 22 OTU 8 424,200 257,720-610,100 23 24 OTU 12 236,800 129,400-359,600 25 26 OTU 18 451,400 280,000-655,100 27 28 29 847 30 31 848 32 33 849 34 35 36 850 37 38 851 39 40 852 41 42 853 43 44 45 854 46 47 855 48 49 856 50 51 52 857 53 54 858 55 56 859 57 58 59 37 60 Journal of Biogeography Page 82 of 117

1 2 3 860 FIGURE LEGENDS 4 5 6 861 7 8 862 Figure 1. Distribution of sampling sites in Great Britain; a complete list of geographic 9 10 863 coordinates and number of individuals sampled per locality can be found in Appendix 11 12 864 S1. Black triangles indicate the maximum extent of the British-Irish ice sheet during the 13 14 15 865 last Pleistocene glacial period, according to Clark et al. (2012) 16 17 866 18 19 867 Figure 2. Bayesian tree of Entomobrya mtCOI Sanger sequences (n = 174 unique 20 21 For Peer Review 22 868 haplotypes) derived from 90 sampling sites across Great Britain (see Fig. 1, Appendix 23 24 869 S1). Katianna sp. was sampled as an outgroup. Numbers immediately to the right of 25 26 870 morphospecies names correspond to sampling sites. Two monophyletic clusters of DNA 27 28 sequences, NivG1 and NivG2, are grey highlighted which represents a clear structuring 29 871 30 31 872 of genetic variation found within OTU 4. Clades were collapsed to reduce space and the 32 33 873 sizes of the triangles indicate the amount of sequence variation within the clade. Further 34 35 874 OTU details can be found in Table 1. 36 37 38 875 39 40 876 Figure 3. Bayesian tree of Lepidocyrtus mtCOI Sanger sequences (n = 98 unique 41 42 877 haplotypes) derived from 68 sampling sites across Great Britain (see Fig. 1, Appendix 43 44 45 878 S1). Katianna sp. was sampled as an outgroup. Numbers immediately to the right of 46 47 879 morphospecies names correspond to sampling sites. Geographically localized OTUs are 48 49 880 grey highlighted, except OTUs 6 and 16 because they did not present sequence 50 51 variation. Most abundant OTUs are collapsed to reduce space and the sizes of the 52 881 53 54 882 triangles indicate the amount of sequence variation within the clade. Further OTU 55 56 883 details can be found in Table 2. 57 58 59 38 60 Page 83 of 117 Journal of Biogeography

1 2 3 884 4 5 6 885 Figure 4. Geographical distributions of Entomobrya a) OTU 1 – grey squares and b) 7 8 886 OTU 4 – grey dots, according to their sampling sites in Great Britain. Ellipses in b) 9 10 887 indicate geographically localized ranges of two monophyletic groups (NIVG1 - big 11 12 888 ellipse, and NIVG2 - small ellipse restricted to the glaciated area) found in OTU 4 (see 13 14 15 889 text for details). Black small dots indicate remaining sampling sites. Black triangles 16 17 890 indicate the maximum extent of the British-Irish ice sheet during the last Pleistocene 18 19 891 glacial period, according to Clark et al. (2012). 20 21 For Peer Review 22 892 23 24 893 Figure 5. Geographical distributions of Lepidocyrtus OTUs with disjunctive 25 26 894 distributions in Great Britain: a) OTU 11 – black circles, OTU 17 = black squares, b) 27 28 OTU 13 – black circles and Lepidocyrtus OTUs with geographically localized 29 895 30 31 896 distributions c) OTU 3 – black squares, OTU 6 = black circles, OTU 7 = white stars, 32 33 897 OTU 18 = black triangles, d) OTU 8 = grey circles, OTU 12 = black squares, OTU 16 = 34 35 898 white stars. Overlapping shapes share same sampling sites. Small light grey dots 36 37 38 899 indicate remaining sampling sites. Light grey triangles indicate the maximum extent of 39 40 900 the British-Irish ice sheet during the last Pleistocene glacial period, according to Clark 41 42 901 et al. (2012). 43 44 45 902 46 47 48 49 50 51 52 53 54 55 56 57 58 59 39 60 Journal of Biogeography Page 84 of 117

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Figure 1 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 85 of 117 Journal of Biogeography

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 Figure 2 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Biogeography Page 86 of 117

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 For Peer Review 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 Figure 3 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 87 of 117 Journal of Biogeography

1 2 3 4 (a) 5 (b) 6 7 8 9 10 11 12 13 14 For Peer Review 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 Figure 4 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Biogeography Page 88 of 117

1 2 3 4 (a) (b) 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 (c) (d) 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 Figure 5 50 51 52 53 54 55 56 57 58 59 60 Page 89 of 117 Journal of Biogeography

1 2 3 Journal of Biogeography 4 5 6 SUPPORTING INFORMATION 7 8 9 Evidence for the Pleistocene persistence of Collembola in Great Britain 10 11 12 Christiana M. A. Faria, Peter Shaw, Brent C. Emerson 13 14 15 16 To complement the information given in the main manuscript, the following sections 17 18 provide Supplementary information for material (Appendix S1), methods (Appendix 19 20 S2) and results (Appendix S2-S4) used in the study 21 For Peer Review 22 23 24 25 26 27 Appendix S1 Supplementary information on the material used in this study 28 29 30 31 32 33 34 Table S1.1. Sampling sites within the island of Great Britain, reporting the geographic 35 36 coordinates and number of Entomobrya (n = 722) and Lepidocyrtus (n = 428) 37 38 individuals collected per locality. Locations coded according to Fig 1. Sites 46, 70, 71 39 40 41 and 82 are absent from the list as no specimens from these genera were collected at 42 43 these sites. 44 45 46 47 48 Locality Latitude Longitude Entomobrya Lepidocyrtus 49 50 1 57.22694 -3.744723 3 2 51 52 2 57.250828 -3.646657 1 2 53 54 55 3 57.056782 -3.650115 2 0 56 57 4 56.992771 -3.484993 3 6 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 1

Journal of Biogeography Page 90 of 117

1 2 3 5 53.567368 -2.232038 2 1 4 5 6 6 53.574394 -2.140705 2 1 7 8 7 53.711956 -1.745219 4 0 9 10 8 53.621056 -1.544229 7 6 11 12 13 9 53.267548 -3.517447 12 3 14 15 10 53.270206 -3.332264 9 15 16 17 11 53.390812 -3.188403 8 0 18 19 12 53.191521 -3.080297 3 1 20 21 For Peer Review 22 13 53.26902 -2.800899 7 12 23 24 14 53.383728 -2.403826 2 0 25 26 15 53.348923 -2.387444 5 2 27 28 29 16 53.383045 -2.369771 3 2 30 31 17 53.40139 -2.331261 8 1 32 33 18 53.403618 -2.28128 2 1 34 35 36 19 52.886627 -2.164448 13 0 37 38 20 53.400555 -2.160005 10 9 39 40 21 53.37479 -1.511842 4 11 41 42 22 51.76825 -4.620054 1 0 43 44 45 23 52.215801 -1.503167 16 1 46 47 24 51.653542 -4.957234 18 11 48 49 25 51.647945 -4.937279 14 8 50 51 52 26 51.678989 -3.993226 12 4 53 54 27 51.534397 -3.574383 21 3 55 56 28 51.539238 -3.128327 12 19 57 58 59 29 51.511406 -2.158805 12 16 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 2

Page 91 of 117 Journal of Biogeography

1 2 3 30 51.482948 -1.557258 15 1 4 5 6 31 50.512501 -4.82223 11 7 7 8 32 50.369995 -4.786108 10 0 9 10 33 50.487782 -4.762217 4 1 11 12 13 34 50.470009 -4.720558 6 8 14 15 35 50.654724 -4.290828 5 3 16 17 36 50.426395 -4.275012 4 1 18 19 37 50.376942 -4.034168 5 6 20 21 For Peer Review 22 38 50.633892 -3.565275 3 5 23 24 39 50.716389 -3.465 2 5 25 26 40 53.723221 -0.479807 2 5 27 28 29 41 53.727322 -0.369791 4 0 30 31 42 53.800648 -0.057395 14 5 32 33 43 52.623665 1.246741 44 13 34 35 36 44 52.41946 0.51529 2 0 37 38 45 52.423801 0.527952 4 0 39 40 47 52.341007 0.537557 11 0 41 42 48 52.414165 0.542738 17 1 43 44 45 49 51.718281 0.523698 1 2 46 47 50 52.146744 1.081943 20 0 48 49 51 51.32111 -0.475282 11 2 50 51 52 52 51.313606 -0.466388 3 0 53 54 53 51.316383 -0.464724 1 4 55 56 54 51.299717 -0.374997 11 18 57 58 59 55 51.173615 -0.371669 2 0 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 3

Journal of Biogeography Page 92 of 117

1 2 3 56 51.176941 -0.360549 23 0 4 5 6 57 51.312778 -0.335275 2 0 7 8 58 51.248577 -0.320651 14 0 9 10 59 51.248592 -0.32065 3 11 11 12 13 60 51.21553 -0.314491 6 2 14 15 61 51.157887 -0.314006 10 0 16 17 62 51.433052 -0.276102 13 12 18 19 63 51.626289 -4.256871 2 0 20 21 For Peer Review 22 64 51.456116 -0.244719 0 1 23 24 65 51.448338 -0.242504 20 19 25 26 66 51.135246 -0.249651 0 1 27 28 29 67 51.273663 0.042656 2 2 30 31 68 51.490414 0.285921 2 1 32 33 69 51.200829 0.518717 0 9 34 35 36 72 57.032429 -3.625677 1 0 37 38 73 57.005009 -3.541238 0 3 39 40 74 55.684959 -1.838683 6 6 41 42 75 55.675194 -1.799497 2 1 43 44 45 76 55.606403 -1.705754 10 5 46 47 77 55.054352 -2.61745 8 12 48 49 78 54.451942 -2.60667 4 9 50 51 52 79 54.991104 -2.363755 11 4 53 54 80 54.798607 -2.868619 5 2 55 56 81 55.05661 -1.632874 7 1 57 58 59 83 54.190464 -2.797168 0 3 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 4

Page 93 of 117 Journal of Biogeography

1 2 3 84 54.191376 -2.795045 0 1 4 5 6 85 53.498901 -1.774891 6 4 7 8 86 53.6982 -1.264826 6 2 9 10 87 53.291092 -3.712894 8 0 11 12 13 88 52.697304 -3.519311 4 0 14 15 89 52.699707 -3.621308 3 10 16 17 90 52.638248 -3.716803 11 0 18 19 91 52.58733 -3.784479 5 9 20 21 For Peer Review 22 92 52.711727 -3.575345 0 8 23 24 93 51.269722 -2.91999 1 4 25 26 94 50.955917 -3.952013 1 0 27 28 29 95 51.665211 -3.712372 0 1 30 31 96 53.411362 -1.830386 1 8 32 33 97 53.383255 -1.602855 12 7 34 35 36 98 52.666122 -3.282838 11 1 37 38 99 52.70047 -3.060518 18 0 39 40 100 52.67226 -3.407084 3 17 41 42 101 52.722786 -2.839915 40 20 43 44 45 102 52.362823 -1.94556 13 8 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 5

Journal of Biogeography Page 94 of 117

1 2 3 Table S1.2. Minimum (Min), maximum (Max) and overall mean uncorrected p- 4 5 6 distances (p-dis) found among Lepidocyrtus COII lineages collected sympatrically 7 8 from six localities in Panama (sequence data from Cicconardi et al., 2013). This data 9 10 was used to define the minimum pairwise distance threshold to define OTUs (see text 11 12 13 for details). 14 15 Localities N sympatric lineages Min p-dis Max p-dis Overall p-dist 16 17 L.b1b-SanFelix 3 0.15 0.17 0.11 18 19 L.b2b-ElValle 3 0.05 0.17 0.08 20 21 For Peer Review 22 L.ve-Fortuna 3 0.21 0.19 0.13 23 24 L.ve-P.I.L.A. 3 0.04 0.11 0.07 25 26 L.ve-Darien 4 0.04 0.20 0.17 27 28 L.ve-SanFelix 3 0.12 0.22 0.15 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 6

Page 95 of 117 Journal of Biogeography

1 2 3 4 5 Appendix S2 Supplementary material for methods and results used in this study 6 7 8 Table S2.1. Entomobrya and Lepidocyrtus monophyletic clusters recovered using the following approaches: NJ full - Neighbour-Joining with all 9 haplotypes dataset, BI hap - Bayesian inference with unique haplotypes dataset, ML hap – Maximum Likelihood with unique haplotypes dataset, 10 11 ML full - Maximum Likelihood with all haplotypes dataset (see text for details). Posterior (max. 1) and bootstrap (max. 100) support values are 12 presented for BI and NJ/ML trees respectively. Symbol ( ✔) denotes a cluster has been recovered. Singleton indicates clusters defined by a single 13 haplotype sequence, no support value associated. Variations in cluster recovering (e.g. split or association with other clusters) are indicated. For 14 example, within Entomobrya cluster 4 two Formonophyletic Peer clusters NIV1 andReview NIV2 can be found. 15 16 17 ENTOMOBRYA DATASET 18 Cluster NJ full tree bootstrap BI hap tree posterior ML hap tree bootstrap ML full tree bootstrap 19 1 ✔ 99 ✔ 1 ✔ 83 ✔ 80 20 21 2 ✔ 99 ✔ singleton ✔ singleton ✔ 97 22 3 ✔ 99 ✔ 1 ✔ 100 ✔ 98 23 ✔ with ✔ with ✔ with Niv1 but ✔ with Niv1 but 4 99 1 51 67 24 Niv1, Niv2 Niv1, Niv2 not Niv2 not Niv2 25 26 5 ✔ 99 ✔ 1 ✔ 100 ✔ 97 27 6 ✔ 99 ✔ 1 ✔ 100 ✔ 98 28 7 ✔ singleton ✔ singleton ✔ singleton ✔ singleton 29 8 ✔ 99 ✔ 1 ✔ 94 ✔ 97 30 31 9 ✔ 99 ✔ 1 ✔ 97 ✔ 96 32 10 ✔ singleton ✔ singleton ✔ singleton ✔ singleton 33 11 ✔ singleton ✔ singleton ✔ singleton ✔ singleton 34 12 ✔ 99 ✔ 1 ✔ 92 ✔ 89 35 36 13 ✔ 99 ✔ 1 ✔ 91 ✔ 93 37 LEPIDOCYRTUS DATASET 38 39 40 41 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 7 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Biogeography Page 96 of 117

1 2 3 4 5 Cluster NJ full tree bootstrap BI tree hap posterior ML tree hap bootstrap ML full tree bootstrap 6 1 ✔ 100 ✔ 1 ✔ 97 ✔ 67 7 8 2 ✔ 100 ✔ singleton ✔ singleton ✔ 98 9 3 ✔ 100 ✔ 1 ✔ 100 ✔ 95 10 4 ✔ singleton ✔ singleton ✔ singleton ✔ singleton 11 5 ✔ singleton ✔ singleton ✔ singleton ✔ singleton 12 13 6 ✔ 100 ✔ 1 ✔ 100 ✔ 91 14 7 split in 2 100 For✔ Peer1 Review✔ 100 ✔ 90 15 8 ✔ 100 ✔ 1 no within 9 no within 9 16 9 split in 2 99 ✔ 1 no split, plus 8 97 no split, plus 8 94 17 18 10 ✔ singleton ✔ singleton ✔ singleton ✔ singleton 19 11 ✔ 100 ✔ 1 ✔ 99 ✔ 91 20 12 ✔ 100 ✔ 1 ✔ 98 ✔ 92 21 13 ✔ 100 ✔ 0.99 ✔ 99 ✔ 94 22 23 14 ✔ singleton ✔ singleton ✔ singleton ✔ singleton 24 15 split in 2 100 ✔ 1 ✔ 98 ✔ 44 25 16 ✔ 100 ✔ 1 ✔ 1 ✔ 96 26 17 ✔ 100 ✔ 1 ✔ 1 ✔ 89 27 28 18 ✔ 100 ✔ 1 ✔ 87 ✔ 85 29 30 31 32 33 34 35 36 37 38 39 40 41 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 8 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 97 of 117 Journal of Biogeography

1 2 3 4 5 Table S2.2 Delimitation of Entomobrya and Lepidocyrtus OTUs using the following approaches: 96% threshold analysis, GMYC, mPTP using 6 BI unique haplotypes tree (BI hap), mPTP using ML unique haplotypes tree (ML hap) and mPTP using ML all haplotypes tree (ML full) (see 7 text for details). Symbol (✔) denotes that OTU has been defined by that method. Variations in delimited OTUs (such as splits or combinations) 8 are also indicated. For example, GMYC splits Entomobrya OTU1 in four different entities. N recovered counts the number of times an OTU 9 (without variations) has been delimited considering these five approaches. Final numbers of OTUs for each genus consider the most frequent 10 11 “form” of OTU while conservatively avoiding over splitting. For example, Entomobrya OTU1 without splits is defined in 4 out of 5 approaches, 12 thus it is considered a single OTU. Entomobrya OTU10 is combined with OTU11 in 3 out of 5 approaches, thus both are considered a single 13 OTU. Lepidocyrtus OTU9 is trickier because it has been split in 2 approaches and combined with OTU8 in other 2 approaches. We kept it as a 14 single OTU because OTU8 is defined in theFor majority of Peerapproaches (3 out Review of 5), then, when looking at only OTU9, it appears without splits in 3 15 out of 5 approaches, thus being considered a single OTU. 16 17 18 ENTOMOBRYA DATASET 19 96% GMYC mPTP mPTP mPTP OTU N recovered 20 threshold (BI hap) (BI hap) (ML hap) (ML full) 21 1 ✔ split in 4 ✔ ✔ ✔ 4 22 23 2 ✔ ✔ ✔ ✔ ✔ 5 24 3 ✔ ✔ ✔ ✔ ✔ 5 25 4 ✔ split in 5 split in 2 ✔ ✔ 3 26 5 ✔ ✔ ✔ ✔ ✔ 5 27 28 6 ✔ ✔ ✔ ✔ ✔ 5 29 7 ✔ ✔ plus 9 ✔ ✔ 4 30 8 ✔ ✔ ✔ ✔ ✔ 5 31 9 ✔ split in 2 plus 7 ✔ ✔ 3 32 33 10 ✔ ✔ plus 11 plus 11 plus 11 2 34 11 ✔ ✔ plus 10 plus 10 plus 10 2 35 12 ✔ ✔ ✔ ✔ ✔ 5 36 13 ✔ split in 2 split in 2 ✔ ✔ 3 37 38 Total 13 22 13 12 12 12 39 40 41 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 9 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Biogeography Page 98 of 117

1 2 3 4 5 LEPIDOCYRTUS DATASET 6 96% GMYC mPTP mPTP mPTP N recovered OTU 7 threshold (BI hap) (BI hap) (ML hap) (ML full) 8 9 1 ✔ ✔ ✔ ✔ ✔ 5 10 2 ✔ ✔ ✔ ✔ ✔ 5 11 3 ✔ ✔ ✔ ✔ ✔ 5 12 13 4 ✔ ✔ ✔ ✔ ✔ 5 14 5 ✔ For✔ Peer✔ Review✔ ✔ 5 15 6 ✔ ✔ ✔ ✔ ✔ 5 16 7 split in 2 ✔ split in 2 ✔ ✔ 3 17 18 8 ✔ ✔ ✔ within 9 within 9 3 19 9 split in 2 ✔ split in 2 ✔ plus 8 ✔ plus 8 1 20 10 ✔ ✔ ✔ ✔ ✔ 5 21 11 ✔ ✔ ✔ ✔ ✔ 5 22 23 12 ✔ ✔ ✔ ✔ ✔ 5 24 13 ✔ ✔ ✔ ✔ ✔ 5 25 14 ✔ ✔ ✔ ✔ ✔ 5 26 15 split in 3 ✔ split in 2 ✔ split in 2 2 27 28 16 ✔ ✔ ✔ ✔ ✔ 5 29 17 ✔ ✔ ✔ ✔ ✔ 5 30 18 ✔ ✔ split in 2 ✔ ✔ 4 31 Total 22 18 22 17 18 18 32 33 34 35 36 37 38 39 40 41 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 10 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 99 of 117 Journal of Biogeography

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Biogeography Page 100 of 117

1 2 3 Figure S2.1. Maximum likelihood tree of Entomobrya mtCOI Sanger sequences 4 5 (using all haplotypes) derived from 90 sampling sites across Great Britain (see Fig. 1, 6 Appendix S1: Table S1.1). Katianna sp. was sampled as an outgroup. Numbers 7 immediately to the right of morphospecies names correspond to sampling sites. A 8 monophyletic cluster of DNA sequences, NivG1, is grey highlighted, which 9 represents a clear structuring of genetic variation, found within OTU 4. Further OTU 10 details can be found in Table 1. 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 12

Journal of Biogeography

For Peer Review Page 101 of 117 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Biogeography Page 102 of 117

1 2 3 Figure S2.2. Maximum likelihood tree of Lepidocyrtus mtCOI Sanger sequences 4 5 (using all haplotypes) derived from 68 sampling sites across Great Britain (see Fig. 1, 6 Appendix S1: Table S1.1). Katianna sp. was sampled as an outgroup. Numbers 7 immediately to the right of morphospecies names correspond to sampling sites. 8 Geographically localized OTUs are grey highlighted, except OTUs 6 and 16 because 9 they did not present sequence variation. Further OTU details can be found in Table 1. 10 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 14

Page 103 of 117 Journal of Biogeography

1 2 Appendix S3 Spatial distribution patterns for Entomobrya and Lepidocyrtus OTUs 3 4 5 6 7 (a) (b) 8 9 10 11 12 13 14 For Peer Review 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 Figure S3.1. Geographical distributions of Entomobrya a) OTU 11 – red dots, and b) OTU 12 – purple dots, according to 36 their sampling localities in Great Britian. Small black dots indicate remaining sampling sites. Black triangles indicate the 37 38 maximum extent of the British-Irish ice sheet during the last Pleistocene glacial period. 39 15 40 41 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Journal of Biogeography Page 104 of 117

1 2 3 4 (a) (b) 5 6 7 8 9 10 11 12 13 14 For Peer Review 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 Figure S3.2. Geographical distributions of Entomobrya a) OTU 5 – blue dots, and b) OTU 6 – orange dots, according to their sampling 34 35 localities in Great Britain. Small black dots indicate remaining sampling sites. Black triangles indicate the maximum extent of the 36 British-Irish ice sheet during the last Pleistocene glacial period. 37 38 39 40 41 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 16 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 105 of 117 Journal of Biogeography

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Figure S3.3. Geographical distributions of Entomobrya mtDNA COI OTUs according to 49 50 their sampling localities in Great Britain. OTU 2 – green dots, OTU 3 - purple dots, 51 52 OTU 7 – light blue dot, OTU 8 - orange dot, OTU 9 – dark and light blue dots, OTU 10 53 – light blue and red dots. Black dots indicate remaining sampling sites. Black triangles 54 55 indicate the maximum extent of the British-Irish ice sheet during the last Pleistocene 56 57 glacial period. 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 17

Journal of Biogeography Page 106 of 117

1 2 3 (a) (b) 4 (c) 5 6 7 8 9 10 11 12 13 14 For Peer Review 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 Figure S3.4. Geographical distributions of widespread Lepidocyrtus in Great Britain a) OTU 1 – pink dots, b) OTU 9 – blue dots, c) OTU 30 15 – green dots. Small black dots indicate remaining sampling sites. Black triangles indicate the maximum extent of the British-Irish ice 31 32 sheet during the last Pleistocene glacial period. 33 34 35 36 37 38 39 40 41 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 18 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 107 of 117 Journal of Biogeography

1 2 3 4 5 (a) (b) 6 7 8 9 10 11 12 13 14 15 16 17

18 (c) (d) 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 (e) (f) 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 (g) 49 50 51 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 19

Journal of Biogeography Page 108 of 117

1 2 3 Figure S3.5. Ripley’s Kdot function for spatial distribution patterns for Entomobrya and 4 5 Lepidocyrtus localized monophyletic clusters sampled in Great Britain. Plots show that 6 OTUs deviate from completely random distribution with empirical values (black, red and 7 green lines) greater than that expected for a random distribution (blue dashed lines), 8 indicating their distribution to be clustered. Kiso – isotropic correction, Ktrans – translation 9 correction, Kbord – border method or reduced samples estimator, Kpois - completely random 10 Poisson point process. Localized monophyletic clusters: Entomobrya OTU4 a) NIVG1, b) 11 NIVG2, Lepidocyrtus c) OTU3, d) OTU7, e) OTU8, f) OTU12, g) OTU18. 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 20

Page 109 of 117 Journal of Biogeography

1 2 3 4 (a) (b) 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 (c) (d) 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 (e) (f) 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 21

Journal of Biogeography Page 110 of 117

1 2 Figure S3.6. Ripley’s Kdot function for spatial distribution patterns for Entomobrya and 3 Lepidocyrtus disjunctly distributed clusters sampled in Great Britain. Plots show that OTUs 4 mostly deviate from completely random distribution with empirical values (black, red and 5 6 green lines) greater than that expected for a random distribution (blue dashed lines). Kiso – 7 isotropic correction, Ktrans – translation correction, Kbord – border method or reduced samples 8 estimator, Kpois - completely random Poisson point process. Disjunctly distributed clusters: 9 Entomobrya a) OTU2, b) OTU3, c) OTU10 and Lepidocyrtus d) OTU11, e) OTU13, f) 10 OTU17. 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 22

Page 111 of 117 Journal of Biogeography

1 2 3 4

5 Appendix S4 Dated trees for Entomobrya and Lepidocyrtus geographically localized 6 monophyletic clusters 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 Figure S4.1. BEAST tree for Entomobrya geographically localized monophyletic 48 cluster NIVG1 found within the widespread OTU4. Numbers after the name indicate 49 localities where sequences were sampled in Great Britain (Table S1). Node ages were 50 estimated using a Collembola COI rate of 0.0421 substitutions/site/Ma (see text for 51 details). 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 23

Journal of Biogeography Page 112 of 117

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 Figure S4.2. BEAST tree for Entomobrya geographically localized monophyletic 39 cluster NIVG2 found within the widespread OTU4. Numbers after the name indicate 40 localities where sequences were sampled in Great Britain (Table S1). Node ages were 41 estimated using a Collembola COI rate of 0.0421 substitutions/site/Ma (see text for 42 details). 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 24

Page 113 of 117 Journal of Biogeography

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 32 33 Figure S4.3. BEAST tree for Lepidocyrtus geographically localized monophyletic 34 cluster OTU3. Numbers after the name indicate localities where sequences were 35 sampled in Great Britain (Table S1). Node ages were estimated using a Collembola 36 COI rate of 0.0421 substitutions/site/Ma (see text for details). 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 25

Journal of Biogeography Page 114 of 117

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Figure S4.4. BEAST tree for Lepidocyrtus geographically localized monophyletic 49 50 cluster OTU7. Numbers after the name indicate localities where sequences were 51 sampled in Great Britain (Table S1). Node ages were estimated using a Collembola 52 COI rate of 0.0421 substitutions/site/Ma (see text for details). 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 26

Page 115 of 117 Journal of Biogeography

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 32 33 34 Figure S4.5. BEAST tree for Lepidocyrtus geographically localized monophyletic 35 cluster OTU8. Numbers after the name indicate localities where sequences were 36 sampled in Great Britain (Table S1). Node ages were estimated using a Collembola 37 COI rate of 0.0421 substitutions/site/Ma (see text for details). 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 27

Journal of Biogeography Page 116 of 117

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 Figure S4.6. BEAST tree for Lepidocyrtus geographically localized monophyletic 32 33 cluster OTU12. Numbers after the name indicate localities where sequences were 34 sampled in Great Britain (Table S1). Node ages were estimated using a Collembola 35 COI rate of 0.0421 substitutions/site/Ma (see text for details). 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 28

Page 117 of 117 Journal of Biogeography

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 For Peer Review 22 23 24 25 26 27 28 29 30 31 32 33 Figure S4.7. BEAST tree for Lepidocyrtus geographically localized monophyletic 34 cluster OTU18. Numbers after the name indicate localities where sequences were 35 sampled in Great Britain (Table S1). Node ages were estimated using a Collembola 36 COI rate of 0.0421 substitutions/site/Ma (see text for details). 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 C.M.A. Faria et al. Pleistocene persistence of Collembola in Great Britain 29