Systematic Entomology Page 2 of 55

1

1 Out of the Orient: Post-Tethyan transoceanic and trans-Arabian routes

2 fostered the spread of Baorini skippers in the Afrotropics

3

4 Running title: Historical biogeography of Baorini skippers

5

6 Authors: Emmanuel F.A. Toussaint1,2*, Roger Vila3, Masaya Yago4, Hideyuki Chiba5, Andrew

7 D. Warren2, Kwaku Aduse-Poku6,7, Caroline Storer2, Kelly M. Dexter2, Kiyoshi Maruyama8,

8 David J. Lohman6,9,10, Akito Y. Kawahara2

9

10 Affiliations:

11 1 Natural History Museum of Geneva, CP 6434, CH 1211 Geneva 6, Switzerland

12 2 Florida Museum of Natural History, University of Florida, Gainesville, Florida, 32611, U.S.A.

13 3 Institut de Biologia Evolutiva (CSIC-UPF), Passeig Marítim de la Barceloneta, 37, 08003

14 Barcelona, Spain

15 4 The University Museum, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

16 5 B. P. Bishop Museum, 1525 Bernice Street, Honolulu, Hawaii, 96817-0916 U.S.A.

17 6 Biology Department, City College of New York, City University of New York, 160 Convent

18 Avenue, NY 10031, U.S.A.

19 7 Biology Department, University of Richmond, Richmond, Virginia, 23173, USA

20 8 9-7-106 Minami-Ôsawa 5 chome, Hachiôji-shi, Tokyo 192-0364, Japan

21 9 Ph.D. Program in Biology, Graduate Center, City University of New York, 365 Fifth Ave., New

22 York, NY 10016, U.S.A.

23 10 Entomology Section, National Museum of the Philippines, Manila 1000, Philippines

24

25 *To whom correspondence should be addressed: E-mail: [email protected] Page 3 of 55 Systematic Entomology

2

26

27 ABSTRACT

28 The origin of taxa presenting a disjunct distribution between and Asia has puzzled

29 biogeographers for centuries. This biogeographic pattern has been hypothesized to be the result

30 of transoceanic long-distance dispersal, Oligocene dispersal through forested corridors, Miocene

31 dispersal through the or passive dispersal on the rifting Indian plate. However,

32 it has often proven difficult to pinpoint the mechanisms at play. We investigate the biotic

33 exchange between the Afrotropics and the Oriental region during the Cenozoic, a period in

34 which geological changes altered landmass connectivity. We use Baorini skippers (,

35 Hesperiidae) as a model, a widespread clade of in the Old-World tropics presenting a

36 disjunct distribution between the Afrotropics and the Oriental region. We use anchored

37 phylogenomics to infer a robust evolutionary tree for Baorini skippers and estimate divergence

38 times and ancestral ranges to test biogeographic hypotheses. Our phylogenomic tree recovers

39 strongly supported relationships for Baorini skippers and clarifies the systematics of the .

40 Dating analyses suggest that these butterflies originated in the Oriental region, Greater Sunda

41 Islands, and Philippines in the early Miocene approximately 23 million years ago. Baorini

42 skippers dispersed from the Oriental region toward Africa at least five times in the past 20

43 million years. These butterflies colonized the Afrotropics primarily through trans-Arabian

44 geodispersal after the closure of the Tethyan seaway in the mid-Miocene. Range expansion from

45 the Oriental region toward the African continent likely occurred via the Gomphotherium land

46 bridge through the Arabian Peninsula. Alternative scenarios invoking long-distance dispersal and

47 vicariance are not supported. The Miocene climate change and biome shift from forested areas to

48 grasslands possibly facilitated geodispersal in this clade of skippers.

49

50 Systematic Entomology Page 4 of 55

3

51

52

53 KEYWORDS

54 Anchored phylogenomics; Arabotropical forest dispersal; evolution; Gomphotherium

55 land bridge; Hesperiidae; Lepidoptera phylogenomics; Neotethys Seaway closure; Old World

56 biogeography; transoceanic long-distance dispersal Page 5 of 55 Systematic Entomology

4

57 INTRODUCTION

58 The remarkable geological features of the Old-World tropics, such as the Indo-Australian

59 Archipelago (IAA), the Himalayan mountain range, and the Deccan traps are testaments to the

60 dynamic geological evolution that took place throughout the Cenozoic (Seton et al. 2012). The

61 collision of the Asian, Australian, and Pacific plates resulted in the assemblage of Wallacea (Hall

62 2013) and orogeny of New Guinea (Toussaint et al. 2014), allowing the Asian and Australian

63 biotas to connect through shallow marine straits and ephemeral land bridges between myriad

64 tropical islands (Voris 2000). The collision of the rifting Indian plate with the Asian plate in the

65 Paleocene (Hu et al. 2016), triggered the orogeny of the Himalayas (Zhang et al. 2012), and

66 connected the relic Gondwanan stock with the Laurasian biota. The closure of the Tethyan

67 Ocean in the Oligocene (Pirouz et al. 2017) triggered the orogeny of the Zagros mountain chain

68 along the Alpine-Himalayan orogenic belt, thereby connecting African and Asian biotas by a

69 land bridge (i.e., the Gomphotherium land bridge; Rögl 1998) through the Arabian Peninsula

70 (Harzhauser et al. 2007; Berra & Angiolini 2014). Disentangling the impact of these major

71 geological rearrangements on the evolution of the regional biota is necessary to understand the

72 mechanisms of lineage diversification in the Old-World tropics.

73 Of particular interest are clades of organisms disjunctly distributed in Asia and Africa,

74 because these allow testing hypotheses related to dispersal, vicariance, and extinction with

75 respect to the geology of the African, Arabian, Asian, and Indian plates. The distribution of such

76 clades has been explained by four biogeographic hypotheses. The first hypothesis, transoceanic

77 long-distance dispersal (LDD) between Asia and Africa, has been invoked to explain disjunct

78 distributions. Under this hypothesis (H1), the colonization of distant areas from a source

79 geographic range is accomplished directly and is not the result of range expansion followed by

80 widespread extinction/range contraction. A second hypothesis posits range expansion with trans-

81 Arabian dispersal (H2, also known as the “Gomphotherium land bridge hypothesis”, Harzhauser Systematic Entomology Page 6 of 55

5

82 et al. 2007) followed by regional extinction in the Arabian Peninsula. The Gomphotherium land

83 bridge hypothesis is based on geological evidence that the eastern part of the Tethys Ocean (i.e.,

84 Neotethys) that once separated Africa and Asia closed as a result of the Arabian plate moving

85 northward and colliding with the Asian plate in the Oligocene ca. 27 million years ago (Ma),

86 engendering the orogeny of the Zagros mountain chain in the process (Pirouz et al. 2017). A

87 third hypothesis is derived from the “boreotropical dispersal hypothesis”. This latter postulate

88 suggests that lineages expanded their geographic ranges in higher latitudes through boreotropical

89 forests that existed in the Eocene and Oligocene, before climate cooling triggered the wane of

90 boreotropical forests, in turn promoting ecological vicariance (Wolfe 1975). Recent studies (e.g.,

91 Couvreur et al. 2011; Sánchez-Ramírez et al. 2015) suggest that a similar process could be

92 invoked for Africa-Asia disjunctions, where range expansion could have happened after the

93 closure of the Tethys Ocean via now-vanished tropical forests in the Arabian Peninsula

94 (Ghazanfar & Fisher 1998; Griffin 2002). This hypothesis, which we coin “Arabotropical forest

95 dispersal” (H3), implies that range expansion would had to have happened between the

96 emergence of these habitats and their disappearance, triggered by the onset of a cooler climate in

97 the late Eocene and Oligocene (Zachos et al. 2001). As a result, this hypothesis predicts range

98 expansions resulting in ancestral widespread distributions, followed by vicariance triggered by

99 the aridification of the Arabian Peninsula and associated floristic reconfiguration (Ghazanfar &

100 Fisher 1998; Griffin 2002). The last hypothesis proposes that organisms rafted on the

101 Gondwanan Indian plate (“biotic ferry hypothesis” or “out-of- hypothesis”; Datta-Roy &

102 Karanth 2009) and subsequently diversified in Asia. Under this hypothesis (H4), lineages

103 occurring in Gondwana during the Cretaceous and therefore in the landmass comprising Africa,

104 India, and Madagascar at the time, would have been transported via the rafting Indian plate

105 across the Indian Ocean, before docking in the Oriental region in the Eocene ca. 60 Ma (Hu et al.

106 2016). Although this hypothesis was initially supported by phylogenetic patterns, it has since Page 7 of 55 Systematic Entomology

6

107 been largely rejected by numerous studies of unrelated plants and that use molecular

108 dating approaches (Toussaint et al. 2016). A related mechanism is the use of the Indian rafting

109 plate as a “stepping stone” whereby taxa dispersed from the Oriental region to insular India

110 before its final docking and then dispersed toward other Gondwanan landmasses (Toussaint et al.

111 2018a). Both processes imply ancient African and Oriental divergences. Overall, there are few

112 empirical studies focusing on taxa that investigate the evolution of African-Asian

113 disjunctions in a phylogenetic framework with the aim to test these competing hypotheses.

114 Baorini (Lepidoptera, Hesperiidae, Hesperiinae) are widespread butterflies

115 distributed from Madagascar, the Mascarenes, Africa and southern Europe, through the Arabian

116 Peninsula, and to the Oriental region and the IAA (Warren et al. 2009). The tribe currently

117 includes 17 genera (Fan et al. 2016; Zhu et al. 2016; de Jong & Coutsis 2017) and approximately

118 100 , which are mostly distributed in the Oriental region and the IAA, with a few African

119 species in geographically widely distributed genera or species-rich clades (Fan et al. 2016).

120 Moore, [1881] comprises several Oriental/Indo-Australian species and two species in

121 Madagascar, the Mascarenes, and southeastern coastal Africa. Evans, 1935 is endemic

122 to Africa and comprises three widespread species, and Hübner, [1819] has three African

123 endemic species and another more widely distributed species in the Oriental and Palearctic

124 regions. The mostly Oriental and Indo-Australian , Walker, 1870, comprises two

125 species with geographic ranges extending from the IAA to continental Africa and Madagascar.

126 The four genera Afrogegenes de Jong & Coutsis, 2017, Evans, 1949, Evans, 1937

127 and Larsenia Chiba, Fan & Sáfián, 2016, are wholly African, with multiple endemic species,

128 some restricted to Madagascar. The other genera Moore, 1881, Swinhoe, 1893,

129 de Nicéville, 1895, Mabille, 1904, Evans, 1937, Pseudoborbo Lee,

130 1966, Tsukiyamaia Zhu, Chiba & Wu, 2016, Zinaida Evans, 1937, Zenonoida Fan and Chiba,

131 2016 are distributed from the Oriental region including India to the Australian region and some Systematic Entomology Page 8 of 55

7

132 Pacific islands (e.g., Pelopidas lyelli (Rothschild, 1915) in the Solomon Islands and Vanuatu;

133 Braby 2000). A few recent molecular systematic studies have clarified some phylogenetic

134 relationships of Baorini (Jiang et al. 2013; Fan et al. 2016; Zhu et al. 2016; Li et al. 2017; Tang

135 et al. 2017), but these studies included a relatively small number of loci ( 5) and species ( 50),

136 and have been unable to confidently place genera, making it difficult to assess their

137 biogeographic history. In the present study, we use a phylogenomic approach to sequence 13

138 gene fragments including those most commonly used in butterfly phylogenetics (Wahlberg and

139 2008). We then combine this newly generated dataset with other available data to infer a

140 robust time-tree of nearly 70 Baorini species and test whether the distribution of Afrotropical

141 taxa can be linked to one of the four biogeographic hypotheses described above. We use a

142 hypothetico-deductive framework to test whether the phylogenomic pattern, divergence time

143 estimates, and ancestral range estimation are consistent with one or more of the hypotheses. We

144 do not test the Indian biotic ferry hypothesis because divergence time evidence suggests that

145 Baorini are much younger than the invoked geological events (e.g., Espeland et al. 2018).

146

147 MATERIALS AND METHODS

148 Taxon sampling and molecular biology

149 Samples of 32 Baorini species were collected in the field (Appendix 1) and placed in glassine

150 envelopes or directly in >95% ethanol. In the lab, DNA was extracted from abdomens or legs

151 using an OmniPrepTM DNA extraction kit (G-Biosciences) following Toussaint et al. (2018b).

152 Quantified DNA extracts were submitted to RAPiD Genomics (Gainesville, FL, USA) for library

153 preparation, hybridization enrichment and sequencing. Random mechanical shearing of DNA

154 was conducted with an average size of 300 bp followed by an end-repair reaction and ligation of

155 an adenine residue to the 3'-end of the blunt-end fragments to allow ligation of barcode adapters

156 and PCR-amplification of the library. Following library construction, solution-based target Page 9 of 55 Systematic Entomology

8

157 enrichment of Agilent SureSelect probes was conducted in a pool containing 16 libraries. These

158 libraries were enriched with the SureSelect Target Enrichment System for Illumina Paired-End

159 Multiplexed Sequencing Library protocol. Paired-end sequencing of 150-bp reads was

160 accomplished with an Illumina HiSeq.

161 We used the BUTTERFLY2.0 probe set (Kawahara et al. 2018) to capture the following

162 13 loci: acetyl-CoA (ACOA, 1020 bp), carbamoyl-phosphate synthetase 2, aspartate

163 transcarbamylase, and dihydroorotase (CAD, 1854 bp), catalase (CAT, 1290 bp), cytochrome

164 oxidase c subunit 1 (CO1, 1341 bp), dopa decarboxylase (DDC, 702 bp), elongation factor 1

165 alpha (EF1A, 1059 bp), glyceraldhyde-3-phosphate dehydrogenase (GAPDH, 606 bp), hairy cell

166 leukemia protein 1 (HCL, 633 bp), isocitrate dehydrogenase (IDH, 708 bp), malate

167 dehydrogenase (MDH, 681 bp), ribosomal protein S2 (RPS2, 471 bp), ribosomal protein S5

168 (RPS5, 555) and wingless (WGL, 240 bp).

169 We sampled sequences of 47 additional Baorini species that were available on GenBank

170 and BOLD, which were included in recent systematic studies of the tribe (Fan et al. 2016; Zhu et

171 al. 2016; Tang et al. 2017). We included sequence data for the mitochondrial ribosomal 16S (533

172 bp), and nuclear ribosomal 18S (378 bp) and 28S (857 bp). When possible, we built chimeras

173 between newly sequenced specimens and other specimens of the same species for which other

174 sequence data were available to improve sequence coverage for terminals. We included 19

175 outgroups from all subfamilies and major clades (Toussaint et al. 2018b) to allow the use

176 of secondary calibrations (see below). All sequence data were imported in Geneious R8.1.8

177 (Biomatters, USA), cleaned and aligned separately for each locus using MUSCLE (Edgar 2004).

178 The alignments of ribosomal loci were straightforward and did not require fine-tuning, likely

179 because of the evolutionary scale this taxon sampling represents. Gene trees were inferred using

180 FastTree 2.1.5 (Price et al. 2010) and inspected for contamination and sequence quality. Trees

181 were rooted with Macrosoma hyacinthina (Warren, 1905) (Hedylidae), since Hedylidae are the Systematic Entomology Page 10 of 55

9

182 sister group to Hesperiidae (Espeland et al. 2018; Toussaint et al. 2018b). The final matrix

183 included 89 species including 70 Baorini, and 16 concatenated loci totaling 12,928 aligned

184 nucleotides.

185

186 AHE Data assembly and cleanup

187 We used the pipeline for anchored phylogenomics of Breinholt et al. (2018) to create a matrix

188 from raw Illumina reads. Paired-end Illumina data were cleaned with Trim Galore! ver. 0.4.0

189 (www.bioinformatics.babraham.ac.uk), allowing a minimum read size of 30 bp, and we removed

190 bases with a Phred score below 20. The 13 loci that were sequenced using the BUTTERFLY2.0

191 kit (Kawahara et al. 2018) were assembled with iterative baited assembly (IBA, Breinholt et al.

192 2018), in which only reads with both forward and reverse reads that passed filtering were

193 included. Assembled reads from the probe region were blasted against the Danaus plexippus

194 (Linnaeus, 1758) (Nymphalidae) reference genome and BLAST results were used for single hit

195 and orthology filtering. Following Breinholt et al. (2018), the loci were screened for orthology

196 with a single hit threshold of 0.9 and genome mapping. The identified orthologous sequences

197 were screened for contamination by identifying and removing sequences that were nearly

198 identical at the and genus level (Breinholt et al. 2018). Sequences from each locus were

199 separately aligned with MAFFT v. 7.245 (Katoh and Standley 2013) and concatenated with

200 FASconCAT-G 1.0.4 (Kück and Meusemann 2010). Aliscore 2.2 (Kück et al. 2010) was used to

201 check for saturation and sites that appeared to evolve randomly. The individual locus alignments

202 and files used for phylogenetic inference in this study (see below) are available on Dryad

203 (XXXX).

204

205 Datasets, model partitioning and phylogenetic analysis Page 11 of 55 Systematic Entomology

10

206 We simultaneously estimated the best partitioning scheme and corresponding models of

207 nucleotide substitution in IQ-TREE 1.6.9 (Nguyen et al. 2015) using ModelFinder

208 (Kalyaanamoorthy et al. 2017) with the greedy algorithm and based on the Bayesian information

209 criterion corrected (BIC). The optimal models of nucleotide substitution were determined across

210 all available models in IQ-TREE including the FreeRate model (+R, Soubrier et al. 2012), that

211 relaxes the assumption of gamma distributed rates. The 13 protein-coding loci were divided by

212 codon positions (39 partitions), and the three non-protein-coding loci were left as independent

213 partitions (3 partitions). To find the most likely tree, 100 maximum likelihood (ML) searches

214 were conducted in IQ-TREE, with two calculations of nodal support: ultrafast bootstrap

215 (UFBoot) and SH-aLRT tests. We generated 1000 replicates for UFBoot (Hoang et al. 2018) and

216 SH-aLRT (Guindon et al. 2010). To reduce the risk of overestimating branch supports with

217 UFBoot due to severe model violations, we used hill-climbing nearest neighbor interchange

218 (NNI) to optimize each bootstrap tree (Hoang et al. 2018). All phylogenomic analyses were

219 conducted on the University of Florida HiPerGator High Performance Computing Cluster

220 (www.hpc.ufl.edu). When discussing branch support, we refer to 'strong' support as SH-aLRT ≥

221 80 and UFBoot ≥ 95, and 'moderate' support as SH-aLRT ≥ 80 or UFBoot ≥ 95.

222 We also performed an additional set of analyses based on AHE data only, excluding all

223 information from GenBank to ensure that chimeric constructions and molecular matrix

224 incompleteness were not biasing the phylogenetic inference. This reduced dataset comprised 50

225 taxa including 31 Baorini representatives. The methods were identical to the ones detailed above

226 for the full dataset. The best partitioning scheme and corresponding models of nucleotide

227 substitution were simultaneously estimated in IQ-TREE using ModelFinder based on the BIC,

228 and 100 ML searches were conducted to avoid local optima with computation of 1000 UFBoot

229 replicates and 1000 SH-aLRT tests.

230 Systematic Entomology Page 12 of 55

11

231 Divergence Time Estimation

232 We estimated divergence times in a Bayesian framework with BEAST 1.10.1 (Suchard et al.

233 2018). The best partitioning scheme and models of substitution were selected in PartitionFinder2

234 (Lanfear et al. 2017) using the greedy algorithm and the BIC across all models included in

235 BEAST (option models=beast). The dataset was partitioned a priori, similar to the ModelFinder

236 analysis (see above). To take into account the importance of clock partitioning, we implemented

237 (i) a single clock for all partitions, (ii) two clocks, one for the mitochondrial partition and one for

238 the nuclear partitions, and (iii) one clock for first and second codon positions of protein-coding

239 genes, one clock for third codon positions of protein-coding genes, and one clock for each

240 ribosomal gene for a total of five clocks. We assigned a Bayesian lognormal relaxed clock model

241 to the different clock partitions. We also tested different tree models by using a Yule or a birth-

242 death model in different analyses. Clock rates were set with an approximate continuous time

243 Markov chain rate reference prior (Ferreira & Suchard 2008). All analyses consisted of 100

244 million generations with a parameter and tree sampling every 5000 generations. We estimated

245 marginal likelihood estimates (MLE) for each analysis using path-sampling and stepping-stone

246 sampling (Baele et al. 2012, 2013), with 1000 path steps, and chains running for one million

247 generation with a log likelihood sampling every 1000 cycles.

248 Two hesperiid fossils, Pamphilites abdita Scudder, 1875 and Protocoeliades kristenseni

249 de Jong, 2016 were candidates to be included as fossil calibrations for this study (de Jong 2017).

250 However, preliminary testing in BEAST revealed that the age estimates obtained with these two

251 fossils were largely underestimated compared to the ages from Espeland et al. (2018) or Chazot

252 et al. (2019). This discrepancy is partly due to the difficulty of placing the oldest described

253 skipper fossil into an extant taxon: Protocoeliades kristenseni from Denmark, dated at ca. 55 Ma

254 (de Jong 2016). This fossil has been described as belonging to the based on

255 morphology (de Jong 2016). However, because relationships among Coeliadinae genera and Page 13 of 55 Systematic Entomology

12

256 their monophyly are uncertain, any placement of the fossil elsewhere than on the stem of

257 Coeliadinae remains contentious (de Jong 2016, 2017). Therefore, because Coeliadinae is sister

258 to the remainder of Hesperiidae (Espeland et al. 2018; Toussaint et al. 2018b), the fossil would

259 only calibrate a minimum age of 55 million years for crown Hesperiidae, while Espeland et al.

260 (2018) estimated this node to be ca. 76 Ma and the split between Hesperiidae and Hedylidae to

261 be ca. 106 Ma, and Chazot et al. (2019) estimated these nodes to ca. 65 Ma and ca. 99 Ma.

262 Because Chazot et al. (2019) has a more extensive taxon sampling including Baorini genera, we

263 chose this study for secondary calibrations over the more robust phylogenomic tree of Espeland

264 et al. (2018). We constrained the node corresponding to the split between Hesperiidae and

265 Hedylidae with a uniform prior encompassing the 95% credibility interval (lower=80.73 /

266 upper=119.22) estimated for this node in Chazot et al. (2019). We also constrained the crown of

267 Hesperiidae (lower=55.8 / upper=78.1) and the node corresponding to all Baorini excluding

268 Parnara (lower=10.3 / upper=17.2).

269

270 Ancestral Range Estimation

271 We used the R-package BioGeoBEARS 1.1.1 (Matzke 2018) to estimate ancestral ranges in

272 Baorini. Because the validity of model comparison in BioGeoBEARS is debated (Ree &

273 Sanmartin 2018), the analyses were only performed under the dispersal-extinction-cladogenesis

274 (DEC) model (Ree 2005; Ree & Smith 2008). The DEC model specifies instantaneous transition

275 rates between geographical ranges along the branches of a given phylogeny. The likelihoods of

276 ancestral states at cladogenesis events in the phylogeny are then estimated in a ML framework,

277 with probabilities of range transitions as a function of time. We used the BEAST Maximum

278 Clade Credibility (MCC) tree from the best analysis (see Results) with outgroups pruned. The

279 geographic distribution of Baorini was extrapolated from the literature (Chiba & Eliot 1991; de

280 Jong & Treadaway 1993; Braby 2000; Zhang et al. 2010; Huang 2011). In order to test the three Systematic Entomology Page 14 of 55

13

281 hypotheses related to the colonization of Africa, we used the following areas in the

282 BioGeoBEARS analyses: Western Palearctic (P) from southern Europe to ; Africa,

283 Madagascar and Mascarenes (F); Arabian Peninsula (A); Oriental (O) from Iran to Japan and

284 from China to the Malaysian Peninsula; Greater Sunda Islands and Philippines (G) including

285 , Java, Palawan, Sumatra and satellite islands; Wallacea (W) including the Lesser Sunda

286 Islands, Moluccas and Sulawesi including satellite islands; and and New Guinea (N)

287 including satellite islands.

288 Considering the relative geological stability of the African and Asian regions in the late

289 Cenozoic, we only took into account the dynamic geological history of the IAA by designing

290 two time slices with differential dispersal rate scalers. TS1 (25 Ma – 15 Ma) corresponds to the

291 period predating the acceleration of orogenies in the Philippines archipelago, Wallacea and New

292 Guinea (Yumul et al. 2008; Hall 2013; Toussaint et al. 2014), and TS2 (15 – present)

293 corresponds to the above-mentioned events, more active formation of the IAA and intensifying

294 sea-level fluctuations in the Plio-Pleistocene (Voris 2000). The dispersal rate scaler values were

295 selected according to terrain and water body positions throughout the timeframe of the age of the

296 tribe (Appendix 2). The maximum number of areas per ancestral state was fixed to five. To

297 reduce the computational burden, ancestral states corresponding to unrealistic disjunct areas

298 were removed manually from the list of possible states.

299

300 RESULTS

301 Phylogenetic Relationships

302 The phylogenetic tree based on the reduced dataset (AHE data only) with the highest likelihood

303 (lnL = -113733.176) from the 100 searches performed in IQ-TREE is presented in Figure 1 (see

304 Appendix 3 for the full tree including outgroups), and the one based on the full dataset (LnL = -

305 126096.182) is presented in Figure 2. Since the phylogenies inferred in ML with both datasets Page 15 of 55 Systematic Entomology

14

306 gave similar branching patterns, we only discuss the ones recovered with the greater taxon

307 sampling and presented in Figure 2. The inferred tree is well-resolved with more than 85% of

308 nodes that recovered moderate or strong support. The phylogeny inferred in BEAST under

309 Bayesian inference (BI) gave a very similar topology (Figure 2, Appendix 3) and a greater

310 number of nodes with strong support values (PP  0.95). Hesperia Fabricius, 1793 and Saliana

311 Evans, 1955 are recovered as sister to Baorini with strong support in both ML and BI (UFBoot =

312 100 / SH-aLRT = 100 / PP = 1.0). Baorini is recovered as monophyletic with strong support

313 (UFBoot = 100 / SH-aLRT = 100 / PP = 1.0) and is split into five clades. Parnara is recovered in

314 Clade I and as sister to the remainder of the tribe with strong support (UFBoot = 100 / SH-aLRT

315 = 100 / PP = 1.0). Clade II, which comprises the genera Iton, Tsukiyamaia, Zenonia, Zenonoida

316 and Zinaida; it is recovered with weaker support in ML (UFBoot = 37 / SH-aLRT = 50) but with

317 strong support in BI (PP = 1.0). Clade III is recovered with moderate to strong support (UFBoot

318 = 90 / SH-aLRT = 100 / PP = 0.99) and comprises Prusiana as sister to Caltoris. In Clade IV,

319 Baoris is recovered as sister to Pelopidas with moderate to strong support (UFBoot = 91 / SH-

320 aLRT = 100 / PP = 0.99). Finally, in Clade V, (Herrich-Schäffer, 1869) is

321 sister to a clade comprising the monotypic Pseudoborbo, the polyphyletic Borbo, Gegenes and

322 the sister genera Afrogegenes and Larsenia. Species of Baorini distributed in Africa,

323 Madagascar, or the Mascarenes are recovered in different clades across the tree.

324

325 Divergence Times and Biogeography

326 Dating analyses using different numbers of clocks and tree models estimated similar ages with

327 broadly overlapping credibility intervals (Table 1). Based on the MLE comparison (Table 1), the

328 analysis with two clocks and a Yule model was selected for further analyses. Results of this

329 analysis are presented in Figure 3 (see Appendix 4 for the full chronogram), along with results of

330 the DEC model in BioGeoBEARS (DEC, lnL = -176.24). The BioGeoBEARS analysis Systematic Entomology Page 16 of 55

15

331 suggested an origin in the Oriental region and Greater Sunda Islands/Philippines, with multiple

332 independent colonization events of Africa, Madagascar and the Mascarenes, but also of regions

333 east of Wallace’s Line in the Miocene. We recover an origin of Baorini in the early Miocene ca.

334 23 Ma in the Oriental region and Greater Sunda Islands/Philippines. Based on this result, we

335 estimate range expansion across several areas in Parnara (Clade I) toward the IAA and

336 Australian region. This initial range expansion in Parnara is followed by a second range

337 expansion toward Africa, Madagascar and the Mascarenes in the late Miocene to early Pliocene.

338 We infer a range contraction in the remainder of Baorini, a lineage that shows an Oriental origin

339 in the mid-Miocene (Figure 3). Within Clade II, we estimate range expansion toward West

340 Palearctic, Africa, Madagascar and the Mascarenes from the Oriental region in the Miocene with

341 a potential vicariance between Zenonia and Zinaida (Figure 3). We estimate an origin of Clade

342 III in the Greater Sunda Islands/Philippines, followed by range contraction in the Oriental region

343 in Caltoris. The ancestor of Prusiana colonized Wallacea sometime in the Miocene or Plio-

344 Pleistocene. We also infer range expansion toward the Greater Sunda Islands/Philippines in

345 Caltoris, followed by range expansion toward Wallacea and Australian region in the Miocene or

346 Pliocene. In Clade IV, the ancestor of Pelopidas is inferred to have expanded its range from the

347 Oriental region to Wallacea in the late Miocene, allowing a subsequent range expansion toward

348 Australia and New Guinea in P. lyelli. A similar range expansion is estimated in the sister pair P.

349 agna (Moore, [1866])/P. mathias (Fabricius, 1798). The colonization of Africa, Madagascar and

350 the Mascarenes by the ancestors of P. mathias and P. thrax (Hübner, [1821]) is estimated to have

351 occurred in the late Miocene to Plio-Pleistocene through independent range expansions. We

352 estimate a mid-Miocene range expansion from the Oriental region toward a vast region

353 encompassing the Arabian Peninsula, Western Palearctic region, Africa, Madagascar and the

354 Mascarenes, Oriental region and Greater Sunda Islands/Philippines in Clade CV. The Arabian

355 Peninsula was also colonized from Africa in the past 5 million years in Afrogegenes (Figure 3). Page 17 of 55 Systematic Entomology

16

356

357 DISCUSSION

358 Phylogenetics and of Baorini

359 Our phylogenomic analysis recovers a well-resolved phylogenetic hypothesis for Baorini

360 skippers, especially in the BEAST analysis. Zhu et al. (2016) recovered Parnara as sister to the

361 remainder of Baorini, a relationship consistent with our results (Figures 1 to 3) but at odds with

362 the conclusions of Fan et al. (2016). which recovered Larsenia as sister to the remainder of

363 Baorini including Parnara. Unlike the results of Fan et al. (2016), Larsenia is nested within

364 Clade V and these relationships are strongly supported in our analyses, although the exact

365 placement of Larsenia within this clade is uncertain (Figures 1 to 3). The placement of other

366 genera in the studies of Zhu et al. (2016) and Fan et al. (2016) were only partially consistent with

367 our results, albeit with weak support. These prior studies were based on few genetic markers

368 (three and five loci, respectively), resulting in inconsistencies among studies. Our inferences

369 consistently recover five clades with strong support that are somewhat consistent with the current

370 tribal classification. The only taxonomic discrepancy in our inferences is the polyphyly of Borbo.

371 Fan et al. (2016) described Larsenia to comprise a number of exclusively African species of

372 Borbo because their analysis recovered this clade as sister to all Baorini. Based on our

373 reconstruction, Larsenia is one of the most derived clades within Baorini (Figures 1 to 3). It is

374 unclear what should be done regarding Borbo detecta (Trimen, 1893) since it is recovered as

375 sister to Afrogegenes and Larsenia. Additional taxon sampling would be needed to propose

376 taxonomic changes in this clade.

377

378 Origins of Baorini skippers

379 The placement of Baorini within Hesperiinae remains unclear due to the staggering diversity of

380 the subfamily (> 2000 described species) and the lack of a comprehensive phylogeny for this Systematic Entomology Page 18 of 55

17

381 group. However, recent phylogenomic evidence (Toussaint et al. 2018b) suggests that Baorini

382 might be part of a large phylogenetic grade, thereby rendering difficult the inclusion of

383 outgroups that might inform the estimation of ancestral ranges (Toussaint et al. 2018a). Our

384 BioGeoBEARS analysis recovers an origin of Baorini in the Oriental region plus Greater Sunda

385 Islands/Philippines. We purposefully chose a broader definition of the Oriental region to avoid

386 losing biogeographic signal while retaining the power to test hypotheses relative to the

387 mechanisms for the colonization of Africa. Ancestors of Baorini likely originated in an area

388 climatically and biologically similar to the modern Oriental region, but a finer-scale

389 biogeographical and ecological study of specific clades would be needed to test this hypothesis.

390 Colonization of Wallacea and Sahul (i.e., the Australian region) would have been difficult before

391 ca. 1520 Ma, except perhaps through LDD (Hall 2013); the inferred ancestral ranges of early

392 diverging Baorini in the Oriental region is therefore unsurprising.

393

394 Hypothesis testing: African colonization of Baorini skippers

395 Our divergence time estimation supports H2 (Gomphotherium land bridge), as the African

396 continent was mostly colonized by range expansion (e.g., ancestors of and P.

397 thrax in clade IV, Figure 3) likely taking advantage of the Gomphotherium land bridge (Rögl

398 1998). Consequently, the BioGeoBEARS analysis rejects H1 (transoceanic long-distance

399 dispersal) although some dispersal events were likely required to explain for instance the

400 transition from the Oriental region to a larger range also comprising Western Palearctic and the

401 African continent in Clade II (Figure 3). Both our dating and BioGeoBEARS analysis reject H3

402 (Arabotropical forest dispersal). For the ancestors of Baorini to take advantage of tropical forests

403 as found in plants and fungi (Couvreur et al. 2011; Sánchez-Ramírez et al. 2015), they would

404 have had to colonize the Arabian Peninsula region in the Oligocene when these forests still

405 existed. Instead, our BioGeoBEARS estimate suggests that the ancestors of Baorini originated in Page 19 of 55 Systematic Entomology

18

406 the Oriental region in the Miocene. By the time Baorini skippers began to disperse to other

407 regions west of the Oriental region, the boreotropical forests had disappeared, due to a decrease

408 in global temperature (Zachos et al. 2001). It is therefore unlikely that these butterflies would

409 have taken advantage of these ecosystems to colonize the African continent.

410

411 Biogeographic history of Baorini skippers

412 The first colonization event toward Africa is found in Parnara, where a range expansion

413 event allowed the ancestor of Parnara except P. amalia (Semper, [1879]) to disperse from a

414 widespread range in the Palearctic, Oriental region, and IAA toward eastern Africa. Parnara

415 naso (Fabricius, 1798) is one of only two Parnara species to live in Africa, and its distribution is

416 limited to Madagascar and the Mascarenes (Chiba & Eliot 1991); the other African Parnara, P.

417 monasi (Trimen, 1889), is restricted to eastern coastal Africa. The current distribution of these

418 two species is estimated to be the result of range expansion in the Western (H1). We argue that

419 long-distance dispersal (H2) is possible alternative scenario because of the peculiar distribution

420 of Parnara skippers in southeastern coastal Africa. When considering LDD as an alternative

421 hypothesis to the range expansion pattern recovered by the DEC model (Figure 3) for the

422 colonization of Africa in Parnara, it is possible that now-sunken islands between India and the

423 Mascarenes facilitated such dispersal, acting as stepping-stones during periods of low sea-level

424 (e.g., Shapiro et al. 2002; Bradler et al. 2005, Warren et al. 2010). Bathymetric fluctuations have

425 occurred throughout the Cenozoic, with lowstands of up to 50 m below present sea level in the

426 last 5 million years (Miller et al. 2005). Such fluctuations coincide with colonization of Africa by

427 the ancestor of P. naso (and likely P. monasi), possibly by using islands that were subaerial at

428 the time as stepping-stones, including, for instance, Saya de Malha (Warren et al. 2010). African

429 Brusa (two extant species), which were not sampled in our study, may represent an additional

430 colonization event of Africa. Including P. monasi and generating more data for these two Systematic Entomology Page 20 of 55

19

431 African species will be critical in order to understand the underlying mechanism of colonization

432 in Parnara.

433 We infer a Miocene range expansion event from the Oriental region to Western Palearctic

434 and Africa in Clade II. Although a colonization route through the Palearctic region and into

435 northern Africa is possible, it is equally likely that Baorini skippers dispersed through the

436 Arabian Peninsula (see below). Our analysis suggests range expansion through the Arabian

437 Peninsula and Africa in Clade V. This pattern is consistent with the colonization of Africa by

438 Pelopidas and is largely compatible with H1. All of these colonization events seem to postdate

439 the closure of the Tethys Ocean, a geological suture that was until recently believed to be of

440 early Miocene age (e.g., Rögl 1998; Bosworth et al. 2005; Harzhauser et al. 2007). New

441 geological evidence based on magnetostratigraphy and strontium isotope analyses of

442 sedimentation in the High Zagros points to a slightly older origin of this land bridge, likely in the

443 mid-Oligocene, ca. 27 Ma (Pirouz et al. 2017). According to H1, the closure of the eastern part of

444 the Tethys Ocean (i.e., Neotethys) in the mid-Oligocene permitted the formation of the

445 Gomphotherium land bridge, a land corridor between Africa and the Oriental region (Rögl 1998).

446 Studies of unrelated taxa have suggested that this passage was used as a means of dispersal. For

447 instance, Bourguignon et al. (2017) hypothesized that multiple clades of termites (Isoptera) used

448 this bridge to disperse between the two continents. They argued that because these termite clades

449 were fungus-growers, geodispersal was more likely than LDD. Tamar et al. (2018) also

450 recognized the role played by the Gomphotherium land bridge in allowing dispersal to and from

451 Africa in a study on the biogeography of Uromastyx Merrem, 1820 lizards (Squamata,

452 Agamidae). In particular, they argued that the aridification of North Africa and the Arabian

453 Peninsula in the early Miocene might have favored dispersal in this clade. Miletinae butterflies

454 are also suggested to have dispersed into Africa via this corridor in the early Miocene

455 (Kaliszewska et al. 2015). Other butterfly studies have shed light on patterns consistent with the Page 21 of 55 Systematic Entomology

20

456 Gomphotherium land bridge hypothesis (Aduse-Poku et al. 2009, 2015; Sahoo et al. 2018). In the

457 case of Baorini skippers, the potential passage of the Gomphotherium land bridge is dated

458 between ca. 5 and 15 Ma in three different clades. When considering the credibility intervals

459 associated with these estimates, the crossing of the Gomphotherium land bridge could be

460 combined with more humid conditions in the Arabian Peninsula (Griffin 2002), as well as the

461 emergence of C4-grassland ecosystems in Eastern Mediterranean and African regions as early as

462 the mid-Miocene, followed by their dominance in the late Miocene (Edwards et al. 2010). All

463 Baorini skipper larvae are -feeders and we hypothesize that the synchronous emergence

464 of large grasslands and savannas in this key region might have facilitated dispersal toward

465 Africa, but also fostered diversification in Africa as suggested for other Lepidoptera (Kergoat et

466 al. 2012; Toussaint et al. 2012). Based on the co-occurrence of a land-bridge, appropriate host

467 plants, and a humid climate on the Arabian Peninsula and North Africa in the Miocene, the

468 disjunct distribution of Baorini skippers between Africa and Asia seems to stem principally from

469 range expansion from the Oriental region rather than transoceanic dispersal toward Africa.

470

471 ACKNOWLEDGEMENTS

472 We thank Niklas Wahlberg, an anonymous reviewer and Thomas Simonsen for insightful

473 comments and suggestions that contributed to strengthen this manuscript. We thank Antonio

474 Giudici, Geoffrey Prosser, Jee & Rani Nature Photography, Laitche, Hsu Hong Lin and Charles

475 James Sharp who provided photographs that were illustrated in this study. The University of

476 Florida's HiPerGator provided computational resources and technical support. EFAT

477 acknowledges UF Health Shands Hospital for assistance, and Marius Toussaint and Claire

478 Toussaint for inspiring this project on the 19th of May 2018. DJL acknowledges support from

479 DEB-1541557 and 1120380. AYK acknowledges funding from DEB-1541500 and 1541560. RV

480 acknowledges funding for sample collecting from grant CGL2016-76322-P (AEI/FEDER, UE). Systematic Entomology Page 22 of 55

21

481

482 CONFLICT OF INTEREST STATEMENT

483 The authors declare no conflict of interest.

484

485 REFERENCES

486 Aduse-Poku, K., Brattström, O., Kodandaramaiah, U., Lees, D.C., Brakefield, P.M. and

487 Wahlberg, N. (2015) Systematics and historical biogeography of the old-world butterfly

488 subtribe Mycalesina (Lepidoptera: Nymphalidae: Satyrinae). BMC Evolutionary Biology,

489 15(1), 167.

490 Aduse-Poku, K., Vingerhoedt, E. and Wahlberg, N. (2009) Out-of-Africa again: a phylogenetic

491 hypothesis of the genus Charaxes (Lepidoptera: Nymphalidae) based on five gene regions.

492 Molecular Phylogenetics and Evolution, 53(2), 463–478.

493 Baele, G., Lemey, P., Bedford, T., Rambaut, A., Suchard, M.A. & Alekseyenko, A.V. (2012).

494 Improving the accuracy of demographic and molecular clock model comparison while

495 accommodating phylogenetic uncertainty. Molecular Biology and Evolution, 29(9), 2157–

496 2167.

497 Baele, G., Li, W.L.S., Drummond, A.J., Suchard, M.A. and Lemey, P. (2012) Accurate model

498 selection of relaxed molecular clocks in Bayesian phylogenetics. Molecular Biology and

499 Evolution, 30(2), 239–243.

500 Berra, F. & Angiolini, L. (2014). The evolution of the Tethys region throughout the Phanerozoic:

501 a brief tectonic reconstruction. Petroleum systems of the Tethyan region: AAPG Memoir,

502 106, 1–27.

503 Bosworth, W., Huchon, P. & McClay, K. (2005). The Red Sea and Gulf of Aden basins. Journal

504 of African Earth Sciences, 43(1–3), 334–378. Page 23 of 55 Systematic Entomology

22

505 Bourguignon, T., Lo, N., Šobotník, J., Ho, S.Y., Iqbal, N., Coissac, E., Lee, M., Jendryka, M.M.,

506 Sillam-Dussès, D., Křížková, B. & Roisin, Y. (2017). Mitochondrial phylogenomics

507 resolves the global spread of higher termites, ecosystem engineers of the

508 tropics. Molecular Biology and Evolution, 34(3), 589–597.

509 Braby, M.F. (2000). Butterflies of Australia: their identification, biology and distribution.

510 CSIRO publishing.

511 Bradler, S., Cliquennois, N. & Buckley, T.R. (2015). Single origin of the Mascarene stick

512 : ancient radiation on sunken islands? BMC Evolutionary Biology, 15(1), 196.

513 Breinholt, J.W., Earl, C., Lemmon, A.R., Lemmon, E.M., Xiao, L. & Kawahara, A.Y. (2018).

514 Resolving relationships among the megadiverse butterflies and moths with a novel pipeline

515 for anchored phylogenomics. Systematic Biology, 67(1), 78–93.

516 Chazot, N., Wahlberg, N., Freitas, A.V.L., Mitter, C., Labandeira, C., Sohn, J.C., Sahoo, R.K.,

517 Seraphim, N., de Jong, R. and Heikkilä, M. (2019) Priors and posteriors in Bayesian timing

518 of divergence analyses: the age of butterflies revisited. Systematic Biology, in press.

519 Chiba, H. & Eliot, J.N. (1991). A revision of the genus Parnara Moore (Lepidoptera,

520 Hesperiidae), with special reference to the Asian species. Lepidoptera Science, 42(3), 179–

521 194.

522 Couvreur, T.L., Pirie, M.D., Chatrou, L.W., Saunders, R.M., Su, Y.C., Richardson, J.E. &

523 Erkens, R.H. (2011). Early evolutionary history of the flowering plant family Annonaceae:

524 steady diversification and boreotropical geodispersal. Journal of Biogeography, 38(4),

525 664–680.

526 Datta-Roy, A. & Karanth, K.P. (2009). The Out-of-India hypothesis: What do molecules

527 suggest? Journal of Biosciences, 34(5), 687–697. Systematic Entomology Page 24 of 55

23

528 de Jong, R. & Coutsis, J.G. (2017). A re-appraisal of Gegenes Hübner, 1819 (Lepidoptera,

529 Hesperiidae) based on male and female genitalia, with the description of a new genus,

530 Afrogegenes. Tijdschrift voor Entomologie, 160(1), 41–60.

531 de Jong, R. & Treadaway, C.G. (1993). The Hesperiidae (Lepidoptera) of the Philippines.

532 Zoologische Verhandelingen, 288, 1–115.

533 de Jong, R. (2016). Reconstructing a 55-million-year-old butterfly (Lepidoptera: Hesperiidae).

534 European Journal of Entomology, 113, 423–428.

535 de Jong, R. (2017). Fossil butterflies, calibration points and the molecular clock (Lepidoptera:

536 Papilionoidea). Zootaxa, 4270(1), 1–63.

537 Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high

538 throughput. Nucleic Acids Research, 32(5), 1792–1797.

539 Edwards, E.J., Osborne, C.P., Strömberg, C.A., Smith, S.A. & C4 Grasses Consortium (2010).

540 The origins of C4 grasslands: integrating evolutionary and ecosystem science. Science,

541 328(5978), 587–591.

542 Espeland, M., Breinholt, J., Willmott, K.R., Warren, A.D., Vila, R., Toussaint, E.F.A., Maunsell,

543 S.C., Aduse-Poku, K., Talavera, G., Eastwood, R., Jarzyna, M.A., Guralnick, R., Lohman,

544 D.J., Pierce, N.E. & Kawahara, A.Y. (2018). A comprehensive and dated phylogenomic

545 analysis of butterflies. Current Biology, 28(5), 770–778.

546 Fan, X., Chiba, H., Huang, Z., Fei, W., Wang, M. & Sáfián, S. (2016). Clarification of the

547 phylogenetic framework of the tribe Baorini (Lepidoptera: Hesperiidae: Hesperiinae)

548 inferred from multiple gene sequences. PloS one, 11(7), e0156861.

549 Ferreira, M.A. & Suchard, M.A. (2008). Bayesian analysis of elapsed times in continuous‐time

550 Markov chains. Canadian Journal of Statistics, 36(3), 355–368.

551 Ghazanfar, S.A. & Fisher, M. (1998). Vegetation of the Arabian Peninsula. Kluwer Academic

552 Publishers, Dordrecht, 365 pp. Page 25 of 55 Systematic Entomology

24

553 Griffin, D.L. (2002). Aridity and humidity: two aspects of the late Miocene climate of North

554 Africa and the Mediterranean. Palaeogeography, Palaeoclimatology,

555 Palaeoecology, 182(1–2), 65–91.

556 Guindon, S., Dufayard, J.F., Lefort, V., Anisimova, M., Hordijk, W. & Gascuel, O. (2010). New

557 algorithms and methods to estimate maximum-likelihood phylogenies: assessing the

558 performance of PhyML 3.0. Systematic Biology, 59(3), 307–321.

559 Hall, R. (2013). The palaeogeography of Sundaland and Wallacea since the Late

560 Jurassic. Journal of Limnology, 72(s2), 1.

561 Harzhauser, M., Kroh, A., Mandic, O., Piller, W.E., Göhlich, U., Reuter, M. & Berning, B.

562 (2007). Biogeographic responses to geodynamics: a key study all around the Oligo–

563 Miocene Tethyan Seaway. Zoologischer Anzeiger-A Journal of Comparative

564 Zoology, 246(4), 241–256.

565 Hoang, D.T., Chernomor, O., von Haeseler, A., Minh, B.Q. & Le, S.V. (2018). UFBoot2:

566 Improving the ultrafast bootstrap approximation. Molecular Biology and Evolution, 35(2),

567 518–522.

568 Hu, X., Garzanti, E., Wang, J., Huang, W., An, W. & Webb, A. (2016). The timing of India-Asia

569 collision onset–Facts, theories, controversies. Earth-Science Reviews, 160, 264–299.

570 Huang, H. (2011): Notes on the genera Caltoris Swinhoe, 1893 and Baoris Moore, [1881] from

571 China (Lepidoptera: Hesperiidae). Atalanta 42(1–4), 201–220.

572 Jiang, W., Zhu, J., Song, C., Li, X., Yang, Y. & Yu, W. (2013). Molecular phylogeny of the

573 butterfly genus Polytremis (Hesperiidae, Hesperiinae, Baorini) in China. PloS one, 8(12),

574 e84098.

575 Kaliszewska, Z.A., Lohman, D.J., Sommer, K., Adelson, G., Rand, D.B., Mathew, J., Talavera,

576 G. & Pierce, N.E. (2015). When caterpillars attack: biogeography and life history evolution

577 of the Miletinae (Lepidoptera: Lycaenidae). Evolution, 69(3), 571–588. Systematic Entomology Page 26 of 55

25

578 Kalyaanamoorthy, S., Minh, B.Q., Wong, T.K., von Haeseler, A. & Jermiin, L.S. (2017).

579 ModelFinder: Fast model selection for accurate phylogenetic estimates. Nature Methods,

580 14(6), 587.

581 Katoh, K. & Standley, D.M. (2013). MAFFT multiple sequence alignment software version 7:

582 improvements in performance and usability. Molecular Biology and Evolution, 30(4), 772–

583 780.

584 Kawahara, A.Y., Breinholt, J.W., Espeland, M., Storer, C.G., Plotkin, D., Dexter, K.M.,

585 Toussaint, E.F.A., St Laurent, R.A., Brehm, G., Vargas, S., Forero, D., Pierce, N.E.,

586 Lohman, D.J. (2018) Phylogenetics of moth-like butterflies (Papilionoidea: Hedylidae)

587 based on a new 13-locus target capture probe set. Molecular Phylogenetics and Evolution,

588 127, 600–605.

589 Kergoat, G.J., Prowell, D.P., Le Ru, B.P., Mitchell, A., Dumas, P., Clamens, A.L., Condamine,

590 F.L. & Silvain, J.F. (2012). Disentangling dispersal, vicariance and adaptive radiation

591 patterns: a case study using armyworms in the pest genus Spodoptera (Lepidoptera:

592 Noctuidae). Molecular Phylogenetics and Evolution, 65(3), 855–870.

593 Kück, P. and Meusemann, K. (2010). FASconCAT: Convenient handling of data matrices.

594 Molecular Phylogenetics and Evolution, 56(3), 1115–1118.

595 Kück, P., Meusemann, K., Dambach, J., Thormann, B., von Reumont, B.M., Wägele, J.W. &

596 Misof, B. (2010). Parametric and non-parametric masking of randomness in sequence

597 alignments can be improved and leads to better resolved trees. Frontiers in Zoology, 7(1),

598 10.

599 Lanfear, R., Frandsen, P.B., Wright, A.M., Senfeld, T. & Calcott, B. (2016). PartitionFinder 2:

600 new methods for selecting partitioned models of evolution for molecular and

601 morphological phylogenetic analyses. Molecular Biology and Evolution, 34(3), 772–773. Page 27 of 55 Systematic Entomology

26

602 Li, Y.P., Gao, K., Yuan, F., Wang, P. & Yuan, X.Q. (2017). Molecular systematics of the

603 butterfly tribe Baorini (Lepidoptera: Hesperiidae) from China. Journal of the Kansas

604 Entomological Society, 90(2), 100–108.

605 Matzke, N.J. (2018) BioGeoBEARS: BioGeography with Bayesian (and likelihood)

606 Evolutionary Analysis with R Scripts. version 1.1.1, published on GitHub on November

607 6, 2018.

608 Miller, K.G., Kominz, M.A., Browning, J.V., Wright, J.D., Mountain, G.S., Katz, M.E.,

609 Sugarman, P.J., Cramer, B.S., Christie-Blick, N. & Pekar, S.F. (2005). The Phanerozoic

610 record of global sea-level change. Science, 310, 1293–1298.

611 Nguyen, L.T., Schmidt, H.A., von Haeseler, A. & Minh, B.Q. (2014). IQ-TREE: a fast and

612 effective stochastic algorithm for estimating maximum-likelihood phylogenies. Molecular

613 Biology and Evolution, 32(1), 268–274.

614 Pirouz, M., Avouac, J.P., Hassanzadeh, J., Kirschvink, J.L. & Bahroudi, A. (2017). Early

615 Neogene foreland of the Zagros, implications for the initial closure of the Neo-Tethys and

616 kinematics of crustal shortening. Earth and Planetary Science Letters, 477, 168–182.

617 Price, M.N., Dehal, P.S. & Arkin, A.P. (2010). FastTree 2–approximately maximum-likelihood

618 trees for large alignments. PloS one, 5(3), e9490.

619 Ree, R.H. (2005). Detecting the historical signature of key innovations using stochastic models

620 of character evolution and cladogenesis. Evolution, 59(2), 257–265.

621 Ree, R.H. & Sanmartín, I. (2018). Conceptual and statistical problems with the DEC+ J model of

622 founder‐event speciation and it comparison with DEC via model selection. Journal of

623 Biogeography, 45(4), 741–749.

624 Ree, R.H. & Smith, S.A. (2008). Maximum likelihood inference of geographic range evolution

625 by dispersal, local extinction, and cladogenesis. Systematic Biology, 57(1), 4–14. Systematic Entomology Page 28 of 55

27

626 Rögl, F. (1998). Palaeogeographic considerations for Mediterranean and Paratethys seaways

627 (Oligocene to Miocene). Annalen des Naturhistorischen Museums in Wien. Serie A für

628 Mineralogie und Petrographie, Geologie und Paläontologie, Anthropologie und

629 Prähistorie, 279–310.

630 Sahoo, R.K., Lohman, D.J., Wahlberg, N., Müller, C.J., Brattström, O., Collins, S.C., Peggie, D.,

631 Aduse-Poku, K. and Kodandaramaiah, U. (2018) Evolution of Hypolimnas butterflies

632 (Nymphalidae): Out-of-Africa origin and Wolbachia-mediated introgression. Molecular

633 Phylogenetics and Evolution, 123, 50–58.

634 Sánchez‐Ramírez, S., Tulloss, R.E., Amalfi, M. & Moncalvo, J.M. (2015). Palaeotropical

635 origins, boreotropical distribution and increased rates of diversification in a clade of edible

636 ectomycorrhizal mushrooms (Amanita section Caesareae). Journal of Biogeography, 42(2),

637 351–363.

638 Seton, M., Müller, R.D., Zahirovic, S., Gaina, C., Torsvik, T., Shephard, G., Talsma, A., Gurnis,

639 M., Turner, M., Maus, S. & Chandler, M. (2012). Global continental and ocean basin

640 reconstructions since 200 Ma. Earth-Science Reviews, 113(3–4), 212–270.

641 Shapiro, B., Sibthorpe, D., Rambaut, A., Austin, J., Wragg, G.M., Bininda-Emonds, O.R., Lee,

642 P.L. & Cooper, A. (2002). Flight of the dodo. Science, 295(5560), 1683–1683.

643 Suchard, M.A., Lemey, P., Baele, G., Ayres, D.L., Drummond, A.J. & Rambaut, A. (2018)

644 Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus

645 Evolution, 4(1), vey016.

646 Tamar, K., Metallinou, M., Wilms, T., Schmitz, A., Crochet, P.A., Geniez, P. & Carranza, S.

647 (2018). Evolutionary history of spiny‐tailed lizards (Agamidae: Uromastyx) from the

648 Saharo‐Arabian region. Zoologica Scripta, 47(2), 159–173.

649 Tang, J., Huang, Z., Chiba, H., Han, Y., Wang, M. & Fan, X. (2017). Systematics of the genus

650 Zinaida Evans, 1937 (Hesperiidae: Hesperiinae: Baorini). PloS one, 12(11), e0188883. Page 29 of 55 Systematic Entomology

28

651 Toussaint, E.F.A., Condamine, F.L., Kergoat, G.J., Capdevielle-Dulac, C., Barbut, J., Silvain,

652 J.F. & Le Ru, B.P. (2012). Palaeoenvironmental shifts drove the adaptive radiation of a

653 noctuid stemborer tribe (Lepidoptera, Noctuidae, Apameini) in the Miocene. PloS one,

654 7(7), e41377.

655 Toussaint, E.F.A., Hall, R., Monaghan, M.T., Sagata, K., Ibalim, S., Shaverdo, H.V., Vogler,

656 A.P., Pons, J. & Balke, M. (2014). The towering orogeny of New Guinea as a trigger for

657 megadiversity. Nature Communications, 5, 4001.

658 Toussaint, E.F.A., Fikáček, M. & Short, A.E. (2016). India–Madagascar vicariance explains

659 cascade beetle biogeography. Biological Journal of the Linnean Society, 118(4), 982–991.

660 Toussaint, E.F.A. & Short, A. (2018a). Transoceanic Stepping–stones between Cretaceous

661 waterfalls? The Enigmatic Biogeography of pantropical Oocyclus cascade beetles.

662 Molecular Phylogenetics and Evolution, 127, 416–428.

663 Toussaint, E.F.A., Breinholt, J.W., Earl, C., Warren, A.D., Brower, A.V.Z., Yago, M., Dexter,

664 K.M., Espeland, M., Pierce, N.E., Lohman, D.J., Kawahara, A.Y. (2018b). Anchored

665 phylogenomics illuminates the skipper butterfly tree of life. BMC Evolutionary Biology,

666 18, 101.

667 Voris, H.K. (2000). Maps of Pleistocene sea levels in : shorelines, river systems

668 and time durations. Journal of Biogeography, 27(5), 1153–1167.

669 Wahlberg, N. & Wheat, C.W. (2008). Genomic outposts serve the phylogenomic pioneers:

670 designing novel nuclear markers for genomic DNA extractions of Lepidoptera. Systematic

671 Biology, 57, 231–242.

672 Warren, A.D., Ogawa, J.R. & Brower, A.V. (2009). Revised classification of the family

673 Hesperiidae (Lepidoptera: Hesperioidea) based on combined molecular and morphological

674 data. Systematic Entomology, 34(3), 467–523. Systematic Entomology Page 30 of 55

29

675 Warren, B.H., Strasberg, D., Bruggemann, J.H., Prys‐Jones, R.P. & Thébaud, C. (2010). Why

676 does the biota of the Madagascar region have such a strong Asiatic flavour? Cladistics,

677 26(5), 526–538.

678 Wolfe, J.A. (1975). Some aspects of plant geography of the Northern Hemisphere during the late

679 Cretaceous and Tertiary. Annals of the Missouri Botanical Garden, 264–279.

680 Yumul Jr, G.P., Dimalanta, C.B., Maglambayan, V.B. & Marquez, E.J. (2008). Tectonic setting

681 of a composite terrane: a review of the Philippine island arc system. Geosciences Journal,

682 12(1), 7.

683 Zachos, J., Pagani, M., Sloan, L., Thomas, E. & Billups, K. (2001). Trends, rhythms, and

684 aberrations in global climate 65 Ma to present. Science, 292(5517), 686–693.

685 Zhang, Y.L., Xue, G.X. & Yuan, F. (2010). Descriptions of the female genitalia of three species

686 of Caltoris (Lepidoptera: Hesperiidae: Baorini) with a key to the species from China.

687 Proceedings of the Entomological Society of Washington, 112(4), 576–584.

688 Zhang, J., Santosh, M., Wang, X., Guo, L., Yang, X. & Zhang, B. (2012). Tectonics of the

689 northern Himalaya since the India–Asia collision. Gondwana Research, 21(4), 939–960.

690 Zhu, J.Q., Chiba, H. & Wu, L.W. (2016). Tsukiyamaia, a new genus of the tribe Baorini

691 (Lepidoptera, Hesperiidae, Hesperiinae). ZooKeys (555), 37. Page 31 of 55 Systematic Entomology

30

692 TABLES

693 Table 1. Comparison of BEAST divergence time analyses.

Analy Relaxed Tree SS MLE PS MLE Baorini Clade I Clade II Clade III Clade IV Clade V sis Clocks Model A1 1 Yule - - 22.519 (18.277- 5.373 (3.378- 13.906 (11.661- 12.215 (10.048- 10.715 (8.625- 12.250 (10.189- 127312.2 127312.46 26.284) 8.031) 16.214) 14.222) 12.701) 14.071) 48 1 A2 1 Birth- - - 21.629 (17.145- 5.017 (3.137- 13.230 (10.716- 11.506 (9.246- 10.059 (7.924- 11.566 (9.366- death 127319.4 127320.46 25.808) 7.547) 15.774) 13.682) 12.198) 13.621) 29 59 A3 2 Yule - - 22.839 (19.543- 5.868 (3.193- 15.105 (13.193- 13.139 (11.137- 11.323 (9.496- 13.282 (11.447- 127146.0 127146.82 26.577) 10.337) 16.643) 14.880) 13.107) 14.824) 05 9 A4 2 Birth- - - 22.676 (18.794- 5.282 (2.872- 14.369 (11.995- 12.362 (10.014- 10.780 (8.695- 12.551 (10.456- death 127155.6 127155.34 26.654) 9.538) 18.373) 14.395) 12.674) 14.435) 05 1 A5 5 Yule - - 22.258 (18.489- 5.246 (3.263- 13.736 (11.478- 12.189 (10.170- 10.400 (8.515- 12.130 (10.253- 127256.9 127256.65 25.943) 7.751) 15.978) 14.113) 12.235) 13.938) 42 6 A6 5 Birth- - - 21.755 (17.937- 4.908 (3.057- 13.129 (10.827- 11.627 (9.604- 9.897 (8.004- 11.604 (9.624- death 127261.9 127261.82 25.543) 7.324) 15.351) 13.619) 11.774) 13.460) 44 2 694

695 Notes: PS, path sampling; SS, stepping-stone sampling; MLE, marginal likelihood estimate from the path sampling and stepping-stone analyses

696 conducted in BEAST; the values in the “Clade” columns are median ages in millions of years estimated in BEAST along with 95% credibility

697 intervals. Systematic Entomology Page 32 of 55

31

698 FIGURE LEGENDS

699 Figure 1. Maximum likelihood tree of Baorini grass skippers based on anchored phylogenomics

700 Best scoring ML phylogenetic tree inferred in IQ-TREE based on the concatenated dataset of 13

701 loci comprised in the BUTTERFLY2.0 kit (Kawahara et al. 2018). The five main Baorini clades

702 are labelled (I-V). A historical drawing of Pelopidas mathias is presented from the book Indian

703 Life by Harold Maxwell-Lefroy (1909).

704

705 Figure 2. Maximum likelihood tree of Baorini grass skippers based on an extended dataset

706 Best scoring ML phylogenetic tree inferred in IQ-TREE based on the concatenated dataset of 16

707 loci including ribosomal loci recovered from GenBank. The five main Baorini clades are labelled

708 (I-V). Green stars indicate species for which data was generated in this study while others for

709 which data was already available are not labelled. Photographs of Baorini species are given on

710 the right side of the figure. From top to bottom: Parnara ganga (credit: Antonio Giudici), Iton

711 semamora (credit: Antonio Giudici), (credit: Geoffrey Prosser), Zenonoida

712 discreta (credit: Antonio Giudici), Caltoris cahira (credit: Antonio Giudici), (Jee &

713 Rani Nature Photography), Pelopidas mathias (credit: Laitche), (credit: Hsu

714 Hong Lin), (credit: Charles James Sharp).

715

716 Figure 3. Bayesian divergence time estimates and historical biogeography of Baorini skippers

717 Median age timetree from the BEAST analysis using two Bayesian relaxed clocks and a Yule

718 speciation tree model. 95% credibility intervals are shown as grey bars on all nodes. Nodal

719 support values are posterior probabilities (PP), and nodes with PP > 0.95 are labelled with an

720 asterisk. The biogeographic estimation of ancestral ranges from the DEC analysis performed in

721 BioGeoBEARS is presented with the most likely state at each node. Range expansion and long-

722 distance dispersal events are indicated following the inserted caption. The major geological Page 33 of 55 Systematic Entomology

32

723 events that occurred in the past 40 million years and are relevant to the evolution of Baorini are

724 presented under the geological timescale. A historical drawing of Pelopidas mathias is presented

725 from the book Indian Insect Life by Harold Maxwell-Lefroy (1909). Polytremis lubricans Systematic Entomology Page 34 of 55 Pelopidas mathias

Pelopidas jansonis

Pelopidas agna

Pelopidas thrax Borbo

Borbo cinnara

Gegenes nostrodamus Baoris oceia Gegenes pumilio

Afrogegenes letterstedti Baoris farri Caltoris philippina Afrogegenes hottentota

Borbo detecta Caltoris cormasa V Larsenia fallax IV Prusiana prusias Larsenia holtzi

Larsenia perobscura Zenonia zeno III Larsenia sirena Baorini Hedylidae Zenonia anax Macrosoma hyacinthina

Badamia exclamationis Zinaida pellucida II chromus Burara etelka Iton semamora I benjaminii

Coeliades forestan Euschemon rafflesia

Parnara apostata Mylon illineatus Epargyreus clarus Passova gellias

Leptalina unicolor

Piruna pirus Hesperiidae Saliana esperi Trapezites petalia Coeliadinae

Euschemoninae zeus luedheri Eudaminae bononia Pyrginae SH-aLRT > 80 and UFBoot > 95 Heteropterinae SH-aLRT > 80 or UFBoot > 95 Trapezitinae inachus SH-aLRT < 80 and UFBoot < 95 Hesperiinae Outgroups Page 35 of 55 Systematic Entomology Parnara amalia I Parnara bada Parnara apostata Parnara ganga 91/85 Parnara ogasawarensis Parnara guttata Parnara batta Iton semamora II 50/37 Iton watsonii Tsukiyamaia albimacula Zenonia zeno Zenonia anax Baorini 73/28 Zenonoida discreta Zenonoida eltota Zinaida pellucida 94/89 Zinaida zina Zinaida jigongi 87/74 Zinaida matsuii Zinaida kiraizana Zinaida gigantea Zinaida suprema 85/90 Zinaida nascens Zinaida micropunctata Zinaida caerulescens Zinaida gotama Zinaida mencia Zinaida theca Prusiana prusias III 100/90 Caltoris brunnea Caltoris cormasa Caltoris philippina 100/92 Caltoris malaya Caltoris aurociliata 100/85 Caltoris bromus Caltoris sirius Caltoris cahira Caltoris septentrionalis 89/80 Baoris pagana Baoris oceia Baoris farri IV Baoris leechii 100/91 Baoris penicillata Pelopidas lyelli Pelopidas subochracea Pelopidas assamensis 71/61 Pelopidas mathias Pelopidas sinensis Polytremis lubricans 0.03 Pseudoborbo bevani 91/77 Borbo fatuellus Borbo ratek V Borbo borbonica Borbo cinnara Borbo gemella Gegenes nostrodamus Gegenes pumilio Borbo detecta AHE data (BUTTERFLY2.0 kit) Afrogegenes hottentota SH-aLRT > 80 and UFBoot > 95 Afrogegenes letterstedti SH-aLRT > 80 or UFBoot > 95 Larsenia holtzi SH-aLRT < 80 and UFBoot < 95 Larsenia fallax Larsenia perobscura Larsenia sirena * Parnara amalia Systematic Entomology * ParnaraPage bada 36 of 55 I Parnara apostata Parnara naso * Parnara ganga * Parnara ogasawarensis * Parnara guttata II * Parnara batta * Iton watsonii * Iton semamora Tsukiyamaia albimacula * * Zenonia zeno * Zenonia anax * Zenonoida discreta Zenonoida eltota * * Zinaida pellucida Zinaida zina * * Zinaida matsuii * Zinaida kiraizana * * Zinaida gigantea Zinaida suprema Zinaida jigongi * Zinaida mencia Zinaida theca * * Zinaida gotama Zinaida caerulescens * Zinaida nascens Zinaida micropunctata III Prusiana prusias * Caltoris brunnea Caltoris philippina Caltoris cormasa Caltoris aurociliata Caltoris malaya Range expansion Caltoris sirius Caltoris septentrionalis Caltoris cahira * BEAST PP ≥ 0.95 Caltoris kumara Caltoris bromus BEAST 95% CI * Baoris pagana * Baoris oceia Baoris penicillata Western Palearctic Baoris leechii * IV Africa / Madagascar / Mascarenes Baoris farri Pelopidas thrax Arabian Peninsula * Pelopidas lyelli * Pelopidas jansonis Oriental Region Pelopidas subochracea Greater Sunda Islands / Philippines Pelopidas sinensis * Pelopidas agna Wallacea Pelopidas mathias * Pelopidas conjuncta Australia / New Guinea Pelopidas assamensis Polytremis lubricans * Pseudoborbo bevani * * Borbo fatuellus * Borbo ratek * Borbo borbonica V * * Borbo cinnara Borbo gemella * Gegenes nostrodamus * Gegenes pumilio Borbo detecta * * Afrogegenes hottentota * Afrogegenes letterstedti Larsenia holtzi Larsenia fallax * Larsenia perobscura Larsenia sirena TS1 TS2 Time OLIGOCENE Intermittent Mascarenes/SeychellesMIOCENE subaerial island chain Plio. Ple. (Ma) 30 20 10 0 Formation of Wallacea Tethys Sea opened Formation of the Gomphotherium land bridge C4 grassland emergence Intermittent Mascarenes/Seychelles subaerial island chain