Louisiana State University LSU Digital Commons

LSU Master's Theses Graduate School

2016

Biogeography and Population Genetics on Sulawesi, Indonesia: Unrecognized Diversity in the Shrew Crocidura elongata

Ryan Andrew Eldridge
Louisiana State University and Agricultural and Mechanical College


Recommended Citation Eldridge, Ryan Andrew, "Biogeography and Population Genetics on Sulawesi, Indonesia: Unrecognized Diversity in the Shrew Crocidura elongata" (2016). LSU Master's Theses. 3390. https://digitalcommons.lsu.edu/gradschool_theses/3390

BIOGEOGRAPHY AND POPULATION GENETICS ON SULAWESI, INDONESIA: UNRECOGNIZED DIVERSITY IN THE SHREW CROCIDURA ELONGATA

A Thesis

Submitted to the Graduate Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements for the degree of Master of Science

in

The Department of Biological Sciences

by

Ryan Andrew Eldridge

B.S., The Ohio State University, 2012

August 2016

Acknowledgments

I am grateful for the abundance of individuals whose efforts were instrumental in the production of this thesis. Anang Achmadi, Kevin Rowe, Karen Rowe, James Patton, Carol Patton, Heru Handika, Alan Hitch, and Jacob Esselstyn all collected specimens that were used in this study. The members of the Esselstyn lab (Thomas Giarla, Terrence Demos, Mark Swanson, and Jonathan Nations) have provided endless advice and camaraderie. I am particularly indebted to Dr. Giarla, without whose patience much of my progress would have been far more arduous. I thank my committee members, Jeremy Brown and Frederick Sheldon, for helpful comments and suggestions during the production of this manuscript. Finally, I thank my advisor, Jacob Esselstyn, who accepted me as a graduate student in the first place, and whose guidance and steady hand made this thesis possible.

This work was funded in part by NSF grants DEB-1343517 and OISE-0965856, as well as National Geographic’s Committee for Research and Exploration.


Table of Contents

Acknowledgments

Abstract

1. Introduction

2. Materials and Methods
2.1 Taxon sampling
2.2 DNA sequencing
2.3 Mitochondrial gene tree and divergence date estimation
2.4 Phylogenetic analysis of concatenated nuclear genes
2.5 Assessment of geographic patterns of diversity
2.6 Species delimitation
2.7 Species tree estimation
2.8 Gene flow estimation

3. Results
3.1 Alignment contents and sequence variation
3.2 Mitochondrial tree and divergence time estimation
3.3 Phylogenetic analysis of concatenated nuclear genes
3.4 Assessment of geographic patterns of diversity
3.5 Species delimitation
3.6 Species tree estimation
3.7 Gene flow estimation

4. Discussion and Conclusions

References

Appendix: Supplementary information

Vita


Abstract

In view of its mountainous terrain, peninsular shape, and status as an agglomeration of formerly separate landmasses, the island of Sulawesi, Indonesia, presents a fertile opportunity for examining biogeographical processes. Previous work on Sulawesi primates and amphibians suggested the existence of distinct areas of endemism (AoEs), but the relevance of these zones to a broader set of taxa has not been widely investigated, nor the reasons for their existence fully explored. Here, we use population genetic analyses of the endemic Sulawesi shrew, Crocidura elongata, to assess biogeographical partitioning according to the putative AoEs. We uncover significant cryptic diversity within C. elongata that poorly aligns with the AoEs. Rather, we identify patterns consistent with the earliest divergence occurring along elevational gradients, and subsequent divergence evolving across the island’s AoEs. Our results suggest that disparate forces have contributed to the diversification of Sulawesi’s vertebrate fauna and complement other studies in highlighting the extensive intra-island diversification that has occurred among small mammal faunas on the larger islands of Indo-Australia.


1. Introduction

Insular Southeast Asia holds an important place in the history of biogeography and, indeed, the history of evolutionary biology (Darwin and Wallace 1858; Wallace 1869). Early naturalists documented striking zones of faunal turnover and remarkable levels of endemism (e.g., Huxley 1868; Wallace 1869; Dickerson 1928). Nevertheless, the fauna of this megadiverse region remains poorly known in many respects, and modern expeditions frequently uncover new species of vertebrates, including mammals (e.g., Heaney et al. 2011, 2014a; Esselstyn et al. 2012, 2013). It is perhaps unsurprising, then, that the biogeographical factors that shaped the region’s current patterns of diversity and endemism remain poorly understood. For instance, in the Philippines, the history of cyclic connections among certain islands during the Pleistocene has often been invoked to explain biodiversity patterns (e.g., Dickerson 1928; Inger 1954; Brown and Alcala 1970). However, the importance of this model as a mechanism of speciation has proven somewhat limited in the Philippines and neighboring areas, and other factors likely play a significant role (Gorog et al. 2004; Siler et al. 2010; Brown et al. 2013; Sheldon et al. 2015). Nevertheless, permanent and intermittent stretches of both water and alien habitat serve as partial barriers to dispersal, provide opportunities for allopatric speciation, and have contributed considerably (though not solely) to the development of the region’s overall diversity (Heaney et al. 2005; Jansa et al. 2006; Esselstyn and Brown 2009; Lohman et al. 2011; Rowe et al. 2016a).

Other important factors determining patterns of diversity and endemism include island area, age, isolation, and topographic complexity. Island area is positively correlated with levels of species richness on both oceanic and continental islands (Wilson 1961; Heaney 1984), and seems to foster in situ diversification when combined with long-term isolation and complex topographies (e.g., Heaney and Rickart 1990; Heaney 2011; Toussaint et al. 2014). For example, the large, topographically complex oceanic islands of Luzon and Sulawesi harbor adaptive radiations of mammals that have undergone extensive in situ diversification (Jansa et al. 2006; Rowe et al. 2016a). In addition to the ecologically and morphologically diverse radiations found on these large islands, recent studies have also discovered high levels of relatively cryptic diversity, with genetic analyses uncovering deep divergences among allopatric and morphologically similar lineages (Lim and Sheldon 2011; Heaney et al. 2011, 2014a, b; Balete et al. 2012; Musser 2014). These findings suggest that geographic or ecological processes occurring within individual islands often generate diversity and may initiate the speciation process.

Among Southeast Asia’s largest islands, Sulawesi presents a particularly compelling case for biogeographical study. Lying immediately east of Wallace’s Line (Wallace 1863), Sulawesi exhibits Wallacea’s emblematic mélange of the Sahulian (i.e., originating on the New Guinea/Australian continental plate) and Sundaic (Asian) faunas, albeit with a stronger Asian flavor (Whitten et al. 2002). Described as “anomalous” by Wallace (1880) as he struggled to classify its predominantly endemic biota, Sulawesi is remarkable in a number of geographic respects. First, it is among the largest islands in Southeast Asia (11th largest globally) and overwhelmingly the largest in Wallacea. Second, Sulawesi is centrally positioned relative to three highly diverse and distinctive potential source faunas found in Sahul (Australia and New Guinea), Sunda (Borneo, Java, and Sumatra), and the Philippines. Third, Sulawesi is strikingly “peninsular,” with its four large arms together comprising a substantially greater land area than the island’s central core (Figure 1.1). Fourth, Sulawesi’s topography spans an extensive elevational range, reaching from sea level to >3000 m in the case of several peaks in the island’s mountainous interior. Lastly, Sulawesi is a composite island, formed as no fewer than four prehistoric islands collided between 5 and 20 Ma (Mega-annum or million years ago), largely as a result of the ongoing collision between the Asian and Sahulian continental margins (Hall 2002; Spakman and Hall 2010; Hall 2011). Thus, Sulawesi’s combination of large area, peninsular shape, mountainous terrain, central location relative to disparate source faunas, and history as an archipelago provides numerous plausible mechanisms for generating high levels of species diversity within the island.

FIGURE 1.1: Map of Sulawesi with black lines corresponding to AoE boundaries. Numbers indicate the locations of sampling localities as follows: 0 = Luwu Timur, 1 = Mt. Gandangdewata, 2 = Salu Tiwo, 3 = Mt. Balease, 4 = Mt. Latimojong, 5 = Lore Lindu, 6 = Mt. Tompotika, 7 = Mt. Dako, 8 = Mt. Buliohuto, and 9 = Mt. Mekongga.

Early taxonomic work showed the preponderance of endemic taxa on Sulawesi (e.g., Miller and Hollister 1921a, b), but it was not until much later that biologists began to realize that diversity might be partitioned among Sulawesi’s peninsulas in a regular, predictable fashion. Fooden (1969) first noted that macaque hybrid zones correspond to natural geographic boundaries separating the peninsulas from the central core, and individual components of the north peninsula from each other. More recent genetic comparisons of amphibians, reptiles, and mammals have supported the notion that these areas are consistently places of lineage partitioning, suggesting that the geographic distributions of these taxa have been influenced by a shared mechanism (Evans et al. 2003a, b, 2008; McGuire et al. 2007; Shekelle et al. 2010; Setiadi et al. 2011). For example, noting shared patterns of endemism between Sulawesi macaques and the Celebes toad, Evans et al. (2003a, 2008) postulated a role for marine inundations and/or adaptation to distinct ecological circumstances in generating areas of endemism (AoEs; Figure 1.1) on Sulawesi. It is also possible that some organisms colonized the proto-Sulawesi archipelago broadly and diversified in allopatry prior to the collision of individual islands (e.g., squirrels; see Hawkins et al. 2016). However, the ages of divergences among the macaque and toad species noted by Evans et al. (2003a) are probably too young to fit this scenario.


Recently, new species of rodents that are apparently endemic to just the western portion of the central core AoE have been discovered (Esselstyn et al. 2012; Musser 2014; Rowe et al. 2014, 2016b), suggesting that denser sampling might reveal additional AoEs or that the AoE paradigm does not fit the island’s diverse rodent fauna. In any case, the taxonomic breadth across which Sulawesi’s AoE paradigm is relevant remains unknown. Testing the paradigm in additional taxa will help assess the importance of shared biogeographical events and mechanisms, while also offering guidance to conservation efforts.

Shrews of the genus Crocidura are well represented on Sulawesi, with six endemic species derived from two independent colonizations (Ruedi 1995; Ruedi et al. 1998; Esselstyn et al. 2009). Among Sulawesi’s shrews, the species Crocidura elongata offers a particularly promising opportunity to evaluate the AoE paradigm because it is widespread on the island and its members are easily distinguished from other known species by external characters (Ruedi 1995). Here, we perform multi-locus population genetic, phylogenetic, and species delimitation analyses on C. elongata sampled from across Sulawesi and gauge geographic structure among these populations. We interpret the results in terms of their consistency with the AoE paradigm and the potential mechanisms that might have produced genetic isolation in C. elongata.


2. Materials and Methods

2.1 Taxon sampling

Specimens of Crocidura elongata were collected from ten localities across Sulawesi, incorporating representatives from four of the island’s seven AoEs. These specimens are vouchered at the Field Museum of Natural History, Chicago (FMNH); Museum Victoria, Melbourne (NMV); Louisiana State University Museum of Natural Science, Baton Rouge (LSUMZ); Museum of Wildlife and Fish Biology, Davis (WFB); and Museum Zoologicum Bogoriense, Bogor (MZB). Sample sites include multiple localities within the central core (Mt. Gandangdewata [20 individuals], Salu Tiwo [4], Mt. Balease [10], Luwu Timur [3], Mt. Latimojong [6], and Lore Lindu National Park [9]) and NW (Mt. Dako [9] and Mt. Buliohuto [8]) AoEs, with one population sampled from each of the SE (Mt. Mekongga [3]) and E (Mt. Tompotika [3]) AoEs (Figure 1.1). All shrews were collected in forested habitats, with the majority taken in relatively undisturbed primary forest. Only the Salu Tiwo site consisted of secondary forest and shrubby, regenerating vegetation. Several of the sites include samples from multiple elevations, often encompassing the major forest types of lowland dipterocarp forest and montane forest (sensu Musser 2014). With the exception of one individual (WFB8203) from Mt. Mekongga that was sampled at 150 m above sea level, Salu Tiwo was the lowest sample site at 170 m. The highest site was on Mt. Gandangdewata, where sampling reached 2600 m. We generally found C. elongata co-occurring with 4–5 other species of Crocidura.

2.2 DNA sequencing

DNA was extracted from tissue samples using the Qiagen DNeasy protocol or Promega Wizard SV kits. We amplified one mitochondrial gene (cytochrome b [cyt b]) and 15 nuclear loci via the polymerase chain reaction (PCR). Of the 15 nuclear loci, eight (apolipoprotein B [ApoB]; brain-derived neurotrophic factor [BDNF]; breast cancer susceptibility gene [BRCA]; growth hormone receptor [GHR]; mast-cell growth factor [MCGF]; prostaglandin E2 receptor [PTGER]; recombination-activating gene [RAG-1]; and von Willebrand factor [vWF]) were amplified and sequenced following the protocols of Esselstyn et al. (2013). Primers for the remaining seven loci (ankyrin repeat domain [ANKRD]; cyclin-T [CCNT]; glutamate receptor, N-methyl D-aspartate [GRIN]; methyl-CpG-binding domain [MBD]; nuclear receptor coactivator [NCOA]; SLIT and NTRK family [SLITRK]; and transmembrane protein [TMEM]) were designed using alignments of coding loci from Sorex araneus, Bos taurus, and Mus musculus taken from the OrthoMam database (Douzery et al. 2014). General methods of amplification and cycle sequencing for these seven loci followed Esselstyn et al. (2013), but with locus-specific annealing temperatures (Table A.1 in Appendix) used during PCR thermal cycles. All nuclear loci are entirely protein coding except MCGF, which spans exon–intron boundaries. PCR products were purified using ExoSAP-IT and prepared for sequencing using the BigDye Terminator v3.1 Cycle Sequencing Kit and a non-commercial ethanol cleanup procedure. Prepared samples were sequenced in both directions at the Cornell University Biotechnology Resource Center. Sequences were edited by eye in GENEIOUS 7.1.7 (Kearse et al. 2012) and aligned using the MUSCLE algorithm (Edgar 2004) in GENEIOUS. Alignments were examined by eye and verified to be free of premature stop codons. Degenerate bases were called as heterozygotes by GENEIOUS. Chromatograms associated with each heterozygous site were subsequently checked by eye to avoid scoring sequencing errors as genetic variants.

All nuclear sequences were resolved into statistically probable haplotypes using PHASE 2.1.1 (Stephens et al. 2001; Stephens and Donnelly 2003). The online application SeqPHASE (Flot 2010) was used to convert FASTA files to PHASE input files, as well as convert PHASE output back to FASTA format.

2.3 Mitochondrial gene tree and divergence date estimation

We estimated a mitochondrial gene tree using Bayesian and maximum likelihood (ML) frameworks. An HKY + Γ model of nucleotide substitution was chosen using the Bayesian Information Criterion (BIC) in jModelTest 2.1.7 (Guindon and Gascuel 2003; Darriba et al. 2012), where we allowed three substitution schemes (JC/F81, K80/HKY, and SYM/GTR) and used an ML-optimized base tree for calculating the likelihood of each model.
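As an illustration of how a BIC comparison ranks candidate substitution models, the following minimal Python sketch may be helpful. It is not the jModelTest implementation: the log-likelihoods are invented for illustration, and the parameter counts include only substitution-model parameters (branch lengths on a fixed base tree add the same penalty to every model and so do not change the ranking). The alignment length of 1131 sites is the cyt b length reported in the Results.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical (made-up) log-likelihoods paired with the number of free
# substitution-model parameters for each candidate model.
candidates = {
    "JC": (-5200.0, 0),
    "HKY": (-5050.0, 4),     # kappa + 3 free base frequencies
    "HKY+G": (-4980.0, 5),   # HKY + gamma shape parameter
    "GTR+G": (-4975.0, 9),   # 5 rates + 3 frequencies + gamma shape
}

n_sites = 1131  # length of the cyt b alignment
scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
best_model = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:6s} BIC = {score:.2f}")
print("Preferred model:", best_model)
```

With these invented values, the small likelihood gain of GTR+G over HKY+G does not offset the penalty for four additional parameters, which is the kind of trade-off the BIC is designed to capture.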

We performed the Bayesian analysis in BEAST 2.1.3 (Bouckaert et al. 2014). Two independent runs of 2 × 10⁹ Markov chain Monte Carlo (MCMC) generations were initiated with random starting trees. The prior on the substitution rate was specified as a lognormal distribution with a mean of 1.0 and standard deviation of 1.25. We used a Yule model of tree shape. Samples were drawn every 2 × 10⁵ generations. Divergence dates between taxa were estimated using a mean substitution rate for cyt b of 0.01 substitutions per site per million years (Brown et al. 1979; see also Nabholz et al. 2007), with 95% of the prior probability between 0.001 and 0.02 substitutions per site per million years. We assumed a relaxed clock with a lognormal distribution. We changed the prior on the mean of the uncorrelated lognormal distribution of the substitution rate from a uniform distribution to a lognormal distribution. We assessed MCMC convergence and selected appropriate burn-in values by examining trace plots of the likelihood and other parameters, while verifying that adequate effective sample sizes (>200) were obtained, in TRACER v1.6 (Rambaut et al. 2014). Trees produced by BEAST were summarized using TREEANNOTATOR v2.1.2 after discarding the burn-in (Bouckaert et al. 2014).


We conducted the ML analysis using GARLI v2.0 (Zwickl 2006). We performed six independent analyses, each consisting of five replicates of 5 × 10⁶ generations. Outside of specifying the model of nucleotide substitution, we kept all settings at their defaults. We used Crocidura batakorum, a Philippine endemic species inferred as sister to the main radiation of Sulawesi shrews (Esselstyn et al. 2009), as the outgroup. ML bootstrap support values were obtained from 1000 bootstrap replicates and mapped onto the tree receiving the highest likelihood using Sumtrees v.4.1.0 in DendroPy v.4.1.0 (Sukumaran and Holder 2010).

2.4 Phylogenetic analysis of concatenated nuclear genes

We conducted a concatenated analysis using MrBayes v.3.2.5 using all 15 unphased nuclear loci. We performed a partitioned analysis, choosing models of nucleotide substitution (Table A.2 in Appendix) for each unphased gene using the BIC in jModelTest 2.1.7 (Guindon and Gascuel 2003; Darriba et al. 2012), in which we allowed three substitution schemes and used an ML-optimized base tree for calculating the likelihood of each model. We ran two independent MCMC analyses, each with four chains. Both analyses were run for 10⁶ generations, drawing samples every 1000 generations, and we discarded the first 25% of generations as burn-in. We used TRACER v1.6 (Rambaut et al. 2014) to confirm MCMC stationarity and ensure that effective sample sizes of >200 were achieved. Crocidura nigripes, a distantly related species (Ruedi et al. 1998; Esselstyn et al. 2009), was used as the outgroup.

2.5 Assessment of geographic patterns of diversity

We used the Bayesian approach implemented in STRUCTURE v2.3 (Pritchard et al. 2000) to cluster individuals according to genetic similarity and thereby ascertain geographic patterns within C. elongata. We used STRUCTURE’s admixture model and assumed that allele frequencies were correlated (Falush et al. 2003) between clusters. We chose not to use collection locality as a prior. Initial analyses, including the full set of 76 individuals and 15 phased nuclear loci, were run for 5 × 10⁴ generations with 10⁴ generations discarded as burn-in to explore the potential number of clusters (K) at values of 1–10 (10 replicates each). Continuing to use the full dataset, we subsequently ran STRUCTURE longer (7.5 × 10⁵ generations, 5 × 10⁵ generations discarded as burn-in), 20 times with a narrower range of K values (1–5) based on results from our initial analyses. As finer-scale population genetic partitions can be overshadowed by higher levels of genetic structuring (Evanno et al. 2005), additional runs of STRUCTURE were performed in a hierarchical manner similar to the approach used by Vähä et al. (2007). Following the identification of the highest-level split of clusters in the initial STRUCTURE analyses (“Level 1”), STRUCTURE was run independently on each of the resultant clusters (“Level 2”). This logic was extended to a third round (“Level 3”), by which point most individuals had sorted into clusters corresponding to collection localities.

In the majority of cases, we determined the best estimate of the number of clusters (K) using the method of Evanno et al. (2005) as implemented in the online application Structure Harvester (Earl and vonHoldt 2012). Estimates of K are made independently of any direct consideration of biological reality and therefore blind acceptance of estimated K values has been discouraged (Meirmans 2015). In this study, there was a single instance (Level 3c) in which the K identified using the method of Evanno et al. (2005) seemed dubious on biological grounds (see Results).
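The ΔK statistic underlying the Evanno et al. (2005) method can be sketched in a few lines of Python. The version below is a simplified illustration, not the Structure Harvester code: it takes the second-order difference of the mean log-probability across replicates and divides by the standard deviation at each K. The function name and the replicate values are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k: dict[int, list[float]]) -> dict[int, float]:
    """Return Delta K for every K that has both neighbours (K-1 and K+1).

    lnp_by_k maps each K to the ln P(D) values from replicate runs.
    Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd[L(K)], using replicate means.
    """
    mean_l = {k: mean(v) for k, v in lnp_by_k.items()}
    sd_l = {k: stdev(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in mean_l and k + 1 in mean_l and sd_l[k] > 0:
            second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
            delta[k] = second_diff / sd_l[k]
    return delta


# Made-up replicate log-probabilities for K = 1..5 (three replicates each).
example = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4501.0, -4499.0],
    3: [-4400.0, -4405.0, -4395.0],
    4: [-4380.0, -4390.0, -4370.0],
    5: [-4370.0, -4400.0, -4340.0],
}
delta_k = evanno_delta_k(example)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, "best K:", best_k)
```

Because ΔK requires values at both K − 1 and K + 1, it is undefined at the smallest and largest K tested, so it can never favor K = 1. This is one reason the estimated K is best weighed against biological plausibility rather than accepted automatically.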

All runs in Levels 2 and 3 were initially conducted for 5 × 10⁴ generations (burn-in of 10⁴ generations; 20 times for each K of 1–10), whereupon cluster assignment coefficients were compared between replicates to ensure consistency among runs at each K. The two analyses initially producing inconsistent results (3b and 3c) were re-run for longer durations (3b was run for 7.5 × 10⁵ generations with a burn-in of 5 × 10⁵, 20 times for each K 1–10; after using those settings in 3c continued to produce inconsistent results, we re-ran 3c for 2 × 10⁶ generations with a burn-in of 7.5 × 10⁵, 20 times for each K 1–10).

We used Mantel tests (Mantel 1967) in the ade4 package (Dray and Dufour 2007) in R (R Core Team 2016) to test for spatial autocorrelation between mitochondrial genetic distance (calculated with MEGA v6.0; Tamura et al. 2013) and both straight-line geographic distance and elevational difference. We based our results on 10⁴ permutations.
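The logic of a Mantel permutation test can be illustrated with a short, dependency-free Python sketch. This is not the ade4 implementation used in the study: the function names, the one-sided (upper-tail) p-value, and the toy distance matrices are assumptions made for illustration only.

```python
import random
from math import sqrt


def _pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def _lower_triangle(matrix, order):
    """Below-diagonal entries after reordering rows and columns by `order`."""
    return [matrix[order[i]][order[j]] for i in range(len(order)) for j in range(i)]


def mantel_test(dist_a, dist_b, permutations=10_000, seed=0):
    """Return (observed r, one-sided p-value) for two square distance matrices."""
    identity = list(range(len(dist_a)))
    b_vec = _lower_triangle(dist_b, identity)
    observed = _pearson(_lower_triangle(dist_a, identity), b_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute rows and columns of dist_a together
        if _pearson(_lower_triangle(dist_a, perm), b_vec) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy example: five hypothetical sites along a transect (positions in km),
# with genetic distance exactly proportional to geographic distance.
positions = [0.0, 1.0, 3.0, 6.0, 10.0]
geographic = [[abs(a - b) for b in positions] for a in positions]
genetic = [[0.01 * d for d in row] for row in geographic]
r, p = mantel_test(genetic, geographic, permutations=999, seed=1)
print(f"Mantel r = {r:.3f}, p = {p:.4f}")
```

Rows and columns are permuted jointly because the entries of a distance matrix are not independent observations; shuffling individual cells would overstate significance. The same function could be applied to a matrix of elevational differences in place of geographic distances.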

2.6 Species delimitation

Because our initial results showed substantial genetic divergence between populations, we tested the hypothesis that these populations represented different species and estimated species tree relationships simultaneously using BPP v.3.1 (Rannala and Yang 2013; Yang and Rannala 2014; Yang 2015). BPP compares the probabilities of different species boundaries and phylogenies using the multispecies coalescent (Yang 2015). We included all phased nuclear loci (but not cyt b) in this analysis. We divided individuals among 12 putative species. Each sampling locality was represented by one putative species, with the exceptions of Mt. Gandangdewata and Lore Lindu, each of which was divided into two putative species on the basis of our observations from our mitochondrial gene tree analyses (see Results). We also excluded one Mt. Mekongga individual (WFB8203) from the analysis because it appeared alone at the end of a long branch in the cyt b tree and we wanted to avoid artifacts of small sample size (initial runs including that individual produced inconsistent results). Each BPP analysis was run for 10⁵ steps after discarding a burn-in of 8,000. As prior selection influences the posterior probabilities for models (Yang and Rannala 2010), BPP was run with three separate combinations of priors for ancestral population size (θ) and root age (τ0), mirroring those applied by Leaché and Fujita (2010) (see Table 3.3 in Results). We ran BPP twice with each combination to ensure convergence.


2.7 Species tree estimation

We used *BEAST (Heled and Drummond 2010) to estimate relationships between the putative species supported in the BPP analysis. Models of nucleotide substitution (Table A.2 in Appendix) were chosen for each locus (we used all phased nuclear genes, plus cyt b) using the BIC in jModelTest 2.1.7 (Guindon and Gascuel 2003; Darriba et al. 2012) where we allowed three substitution schemes and used an ML-optimized base tree for calculating the likelihood of each model. Two identical runs of *BEAST with random starting trees were executed, each with a chain length of 2 × 10⁹, with samples drawn every 2 × 10⁵ generations. In initial analyses using a relaxed clock for each gene, MCMC trace plots failed to stabilize after 1.3 × 10⁹ generations, and we observed low variation in the rates of evolution for different lineages. For these reasons, as well as the fact that cyt b’s faster rate of evolution allows it a greater opportunity than the nuclear loci to depart from a strict rate, all nuclear genes were set to adhere to a strict clock model, while cyt b was fit to a relaxed clock. Divergence dates between taxa were also estimated by calibrating the tree with a substitution rate of 0.01 substitutions per site per million years at the cyt b locus (Brown et al. 1979; see also Nabholz et al. 2007), with 95% of the prior probability between 0.001 and 0.02 substitutions per site per million years. We used a Yule model of tree shape. We selected burn-in values, assessed MCMC convergence, and summarized trees produced by *BEAST using the same methods outlined above in the mitochondrial gene tree methods. We used Crocidura nigripes as the outgroup.

2.8 Gene flow estimation

We used the Bayesian coalescent model implemented in MIGRATE-N (Beerli and Felsenstein 2001; Beerli 2006) to estimate rates of gene flow between various clusters and groups of clusters identified in our STRUCTURE analyses. Because we wanted to assess patterns of gene flow among different sets of populations, we conducted three rounds of analyses.

In each of the three analyses we 1) used MIGRATE-N’s built-in FST-like calculation (Beerli and Felsenstein 1999) to generate initial parameter values; 2) assumed a constant mutation rate across loci; 3) assumed a finite sites mutation model; 4) enabled an MCMC heating scheme with four heated chains (allowing swapping between chains); 5) used uniform prior distributions; 6) discarded a burn-in of 30,000 trees per chain; 7) recorded 8000 total steps with a sampling increment of 200 (total trees sampled per replicate thus equaled 1.6 × 10⁵) for each of four replicates; and 8) used posterior modes as our parameter estimates. In analysis 1 we allowed the mutation-scaled effective immigration rates, M, to be asymmetrical, while in analyses 2 and 3, mutation-scaled effective immigration rates were assumed to be symmetrical. Additional, analysis-specific settings are given in Table 2.1. We used phased loci for each analysis.


TABLE 2.1: Settings specific to individual analyses conducted in MIGRATE-N (Beerli 2006; Beerli and Felsenstein 2001). θ = the mutation-scaled effective population size; M = the mutation-scaled effective immigration rate. 1a denotes lower-elevation individuals from Mt. Gandangdewata; 1b denotes higher-elevation individuals from the same mountain. 8203 refers to WFB8203 from Mt. Mekongga. The limits of the prior distributions apply to every population within an analysis.

Analysis   Population assessed   Localities included in population assessed   Limits of prior for θ   Limits of prior for M
1          A                     7, 8                                         0, 0.05                 0, 1000
1          B                     1, 4, 5                                      0, 0.05                 0, 1000
1          C                     0, 2, 3, 6, 9                                0, 0.05                 0, 1000
2          A                     7, 8                                         0, 0.03                 0, 6000
2          B                     1, 4, 5                                      0, 0.03                 0, 6000
2          C                     6                                            0, 0.03                 0, 6000
2          D                     2                                            0, 0.03                 0, 6000
2          E                     0, 3, 8203                                   0, 0.03                 0, 6000
3          A                     7, 8                                         0, 0.03                 0, 6000
3          B                     0, 2, 3, 6, 9                                0, 0.03                 0, 6000
3          C                     4                                            0, 0.03                 0, 6000
3          D                     5                                            0, 0.03                 0, 6000
3          E                     1a                                           0, 0.03                 0, 6000
3          F                     1b                                           0, 0.03                 0, 6000


3. Results

3.1 Alignment contents and sequence variation

Our cyt b alignment, including 46 unique haplotypes sequenced from 75 individuals, was 1131 bases long with 25.6% missing data. Sequence lengths for phased nuclear loci ranged from 450 to 793 bases, with the number of unique alleles sequenced ranging from 19 to 64 (Table 3.1). The percentage of missing data ranged from 0.3% to 26.3%. Inter-locality, uncorrected p-distances generated using MEGA v6.0 (Tamura et al. 2013) ranged from 0.017 to 0.121, while intra-locality distances ranged from 0 to 0.012 (Table 3.2).
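For readers unfamiliar with the measure, an uncorrected p-distance is simply the proportion of compared sites at which two aligned sequences differ. The Python sketch below is a minimal illustration rather than MEGA's implementation. It assumes pairwise deletion of sites with missing data or gaps, which is only one of several possible treatments (the treatment used for the values reported here is not stated), and the function name and sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str, missing: str = "N?-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a missing-data or gap character are
    skipped (pairwise deletion; an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Two made-up aligned fragments: one substitution, one site with missing data.
fragment_1 = "ACGTACGTAC"
fragment_2 = "ACGTTCGNAC"
distance = p_distance(fragment_1, fragment_2)
print(f"p-distance = {distance:.3f}")
```

Averaging this quantity over all pairs of individuals drawn from two localities gives a between-locality value of the kind tabulated in Table 3.2; because no substitution model is applied, larger values will tend to underestimate the true number of changes.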

3.2 Mitochondrial tree and divergence time estimation

In our Bayesian reconstruction of the mitochondrial gene tree, we observed deep divergences among the mitochondrial haplotypes represented in each locality (Figure 3.1). The earliest divergence separated Mt. Dako and Mt. Buliohuto from all other localities. Within the latter clade, the first divergence divided Mt. Gandangdewata, Mt. Latimojong and Lore Lindu from the other localities. Bayesian posterior probabilities for many nodes exceeded 0.95, although many nodes had considerably less support. For example, the node joining Mt. Dako and Mt. Buliohuto to the others received a posterior probability of 0.29; additionally, the node joining the clade of Luwu Timur individuals to the Mt. Balease clade received a posterior probability of 0.59. Most, but not all, of the sampled localities formed clades. The Mt. Gandangdewata and Mt. Mekongga samples were both paraphyletic, grouping according to elevation (one clade of individuals from Mt. Gandangdewata was sampled at ~1000 m above the other clade; likewise, one individual from Mt. Mekongga (WFB8203) was sampled at least 1350 m below the other two individuals from that locality). Individuals from Lore Lindu were monophyletic on the gene tree, but two clades were separated by relatively long branches and the node joining them was poorly supported (posterior probability of 0.48). Individuals from the two Lore Lindu clades were sampled sympatrically. We estimated the mean age of the oldest divergence within C. elongata at 2.10 Ma with a 95% HPD interval of 0.97–3.88 Ma.

TABLE 3.1: Summary of nucleotide sequence alignments for each phased, nuclear locus.

Locus    Sequence length   Total number of phased alleles   Number of unique haplotypes/phased alleles   Proportion missing data
ANKRD    568               116                              32                                           0.063
ApoB     550               148                              55                                           0.027
BDNF     450               148                              42                                           0.082
BRCA     650               150                              55                                           0.079
CCNT     612               132                              47                                           0.114
GHR      564               134                              33                                           0.017
GRIN     718               112                              41                                           0.085
MBD      757               136                              25                                           0.019
MCGF     593               144                              58                                           0.075
NCOA     715               138                              19                                           0.003
PTGER    473               148                              20                                           0.027
RAG-1    607               146                              56                                           0.033
SLITRK   711               128                              56                                           0.063
TMEM     719               72                               28                                           0.084
vWF      793               142                              64                                           0.263

TABLE 3.2: Uncorrected p-distances generated in MEGA. Each value represents the average number of base differences per site. Numbers in headings correspond to locality numbers (see Figure 1.1). 1a and 1b denote the lower- and higher-elevation groups of individuals from Mt. Gandangdewata; 8203 refers to WFB8203 from Mt. Mekongga.

Locality   0       3       4       1a      1b      2       7       8       5       9       8203    6
0          0
3          0.011   0.004
4          0.118   0.109   0.002
1a         0.111   0.100   0.017   0.001
1b         0.105   0.100   0.040   0.041   0.002
2          0.017   0.012   0.099   0.095   0.102   0.003
7          0.117   0.105   0.110   0.106   0.116   0.100   0.007
8          0.114   0.102   0.102   0.098   0.099   0.098   0.047   0.008
5          0.110   0.103   0.030   0.031   0.031   0.093   0.109   0.097   0.012
9          0.050   0.048   0.122   0.110   0.098   0.045   0.118   0.103   0.109   0.00
8203       0.044   0.041   0.119   0.114   0.106   0.037   0.115   0.107   0.116   0.045   n/a
6          0.029   0.025   0.097   0.089   0.089   0.031   0.100   0.100   0.094   0.048   0.031   0.004

FIGURE 3.1: Bayesian mitochondrial gene tree generated in BEAST illustrating the relationships between individuals of Crocidura elongata. Numbered boxes correspond to localities (see Figure 1.1; 1a and 1b denote the lower- and higher-elevation groups of individuals from Mt. Gandangdewata).

Our ML tree (Figure 3.2) largely supported the relationships established in the Bayesian analysis. However, individuals collected from Lore Lindu were paraphyletic, divided into two clades mirroring the two Lore Lindu clades observed in the Bayesian analysis. Additionally, the clade of individuals collected from Luwu Timur was nested among the Mt. Balease specimens.

3.3 Phylogenetic analysis of concatenated nuclear genes

For the most part, the relationships and groupings supported in our concatenated analysis (Figure 3.3) were consistent with those supported by the mitochondrial gene tree analyses. However, the concatenated analysis grouped all individuals from Lore Lindu into a single, somewhat poorly supported clade (posterior probability of 0.71). In addition, the individuals comprising the low-elevation Mt. Gandangdewata mitochondrial clade were paraphyletic in the concatenated analysis, although all individuals still grouped apart from the high-elevation Mt. Gandangdewata specimens.

FIGURE 3.2: Maximum likelihood mitochondrial gene tree generated in GARLI illustrating the relationships between individuals of Crocidura elongata, with Crocidura batakorum serving as the outgroup. Numbered boxes correspond to localities (see Figure 1.1; 1a and 1b denote the lower- and higher-elevation groups of individuals from Mt. Gandangdewata).

FIGURE 3.3: Concatenated nuclear gene tree generated in MrBayes illustrating the relationships between individuals of Crocidura elongata, with Crocidura nigripes serving as the outgroup. Numbered boxes correspond to localities (see Figure 1.1; 1a and 1b denote the lower- and higher-elevation groups of individuals from Mt. Gandangdewata).

Our concatenated analysis also suggested somewhat complicated patterns of genetic differentiation among individuals from the Mt. Balease, Luwu Timur, Mt. Tompotika, Salu Tiwo, and Mt. Mekongga localities. The three individuals from Mt. Mekongga formed a clade, albeit with a posterior probability of only 0.66. Additionally, the clade of Mt. Tompotika individuals was nested within a larger clade containing individuals from Mt. Balease and Luwu Timur. Individuals from Mt. Balease, with the exception of one individual that was sister to the Mt. Mekongga clade, were interspersed with the individuals from Luwu Timur.

3.4 Assessment of geographic patterns of diversity

The hierarchical implementation of STRUCTURE analyses revealed both broad- and fine-scale, geographically associated genetic partitions. The initial set of all individuals (Figure 3.4, Level 1) was estimated by the Evanno et al. (2005) method to comprise two clusters with no evidence of admixture. One cluster consisted of all individuals from Salu Tiwo, Mt. Balease, Luwu Timur, Mt. Tompotika, Mt. Dako, Mt. Buliohuto, and Mt. Mekongga; the other of all individuals from Mt. Gandangdewata, Mt. Latimojong, and Lore Lindu. We observed a distinct elevational difference between the two clusters (Figure 3.5), with the mean elevation of the latter cluster (2169 m) markedly above that of the former (864 m).
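The Evanno et al. (2005) criterion used to choose K can be sketched as follows. This is a simplified version that works from replicate means, taking ΔK as the absolute second-order change in mean L(K) divided by the standard deviation of L(K) across replicates. It is not the STRUCTURE HARVESTER code, and the function name and log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(log_likelihoods):
    """Return {K: delta K} from a mapping of K to replicate log-likelihoods.

    Delta K is undefined for the smallest and largest K tested, and for any K
    whose replicates have zero standard deviation; those K are omitted.
    """
    ks = sorted(log_likelihoods)
    means = {k: mean(log_likelihoods[k]) for k in ks}
    sds = {k: stdev(log_likelihoods[k]) for k in ks}
    delta = {}
    for k in ks:
        if k - 1 not in means or k + 1 not in means:
            continue
        if sds[k] == 0:
            continue
        second_order_change = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_order_change / sds[k]
    return delta


if __name__ == "__main__":
    # Hypothetical replicate values of L(K), for illustration only.
    runs = {
        1: [-1000.0, -1002.0],
        2: [-800.0, -802.0],
        3: [-790.0, -794.0],
        4: [-785.0, -795.0],
    }
    delta_k = evanno_delta_k(runs)
    best_k = max(delta_k, key=delta_k.get)
    print(best_k, delta_k)
```

Because ΔK cannot be evaluated at the smallest K tested, the criterion can never select K = 1, which is one reason each level of a hierarchical analysis should be interpreted alongside the raw likelihood plots.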

Level 2 of the STRUCTURE analysis produced a similar pattern, with each cluster identified in Level 1 splitting into two clusters with no signal of admixture (Figure 3.4, Level 2): individuals from Salu Tiwo, Mt. Balease, Luwu Timur, Mt. Mekongga, and Mt. Tompotika were split from those from Mt. Dako and Mt. Buliohuto, and the individuals from Mt. Latimojong and Lore Lindu, plus five individuals from Mt. Gandangdewata, were split from the remaining 15 individuals from Mt. Gandangdewata. The five Gandangdewata individuals assigned to the former cluster were collected at 1535 to 1600 m above sea level, while the 15 assigned to the latter were collected at 2200 to 2600 m above sea level (Figure 3.5). The two groups of Gandangdewata individuals corresponded to two clades identified in the mitochondrial gene tree analysis.

FIGURE 3.4: Summary and likelihood plots from our hierarchical implementation of STRUCTURE. The x-axes on the summary plots are representative of individual specimens. Specific shades on summary plots represent cluster assignments; the probability an individual belongs to a specific cluster is expressed by the height of that shade. Numbers within summary plots indicate localities from which individuals comprising the numbered cluster were sampled (see Figure 1.1; 1a and 1b denote the lower- and higher-elevation groups of individuals from Mt. Gandangdewata). The analysis labeled “Level 1” consists of all individuals from all localities. Arrows point from a cluster assignment in a previous analysis to an analysis including only the individuals assigned to that cluster. Below each summary plot is a plot showing the likelihoods (mean L(K)) for each number of cluster assignments (K) tested in that analysis. Error bars indicate standard deviation.

FIGURE 3.5: Elevational comparison of clusters identified in STRUCTURE analyses. Numbered points represent individuals identified with their locality number (see Figure 1.1). Individuals are grouped according to their split assignment in a specific STRUCTURE analysis; (a) illustrates the elevational difference between the two clusters identified in the Level 1 STRUCTURE analysis, and (b) illustrates the elevational difference between the two clusters identified in the Level 2a STRUCTURE analysis.

Each cluster identified in Level 2 of the analysis was subjected to its own STRUCTURE run. The analysis conducted on the cluster identified in Level 2a, comprising the higher-elevation group of Mt. Gandangdewata specimens, assigned each individual by nearly equal amounts to each of seven clusters; we regarded these results as biologically unrealistic and discarded them, leaving three Level 3 analyses.

All individuals in runs 3a and 3b sorted into clusters by locality with the exception that the low-elevation cluster of Mt. Gandangdewata specimens grouped with the Lore Lindu specimens. In Level 3c of the analysis, the most appropriate K was estimated as 5 by the Evanno et al. (2005) method. Individuals from Salu Tiwo and Mt. Tompotika formed their own clusters, as did two individuals from Mt. Mekongga. The remaining individual from Mt. Mekongga (WFB8203) and all individuals from Luwu Timur and Mt. Balease were partially assigned, to varying degrees, to two clusters (WFB8203 and the Luwu Timur specimens always received the four highest membership coefficients for one of the two clusters). However, because of the extensive degree of admixture observed between the two clusters, the biological basis for this number of clusters may be weak. Lowering K to 4 produced somewhat inconsistent results, but in most replicates Mt. Balease individuals were consolidated into one cluster, with WFB8203 and the Luwu Timur individuals’ assignments split between the Mt. Balease and Mt. Tompotika clusters. Because the individuals from Salu Tiwo and the two individuals from Mt. Mekongga form their own clusters among different estimates of K, we consider it sensible to view these as discrete groups. However, the extent to which WFB8203, as well as individuals from Mt. Tompotika, Mt. Balease, and Luwu Timur, represent defined entities is less clear.

Our Mantel tests indicated that both geographic distance (Mantel r statistic=0.58; p<<0.001) and elevational difference (Mantel r statistic=0.33; p<<0.001) exhibit significant correlation with mitochondrial genetic distance among individuals of C. elongata.
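The logic of a Mantel test can be sketched in a few lines: correlate the off-diagonal entries of two distance matrices, then build a null distribution by permuting the rows and columns of one matrix together. The version below is a minimal one-sided permutation test written for illustration. It is not the ade4 routine, the function names and the six-point example are hypothetical, and it does not handle matrices with zero variance.

```python
import random
from statistics import mean


def _upper_triangle(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Return (Mantel r, one-sided permutation p-value) for two square matrices."""
    n = len(dist_a)
    identity = list(range(n))
    flat_b = _upper_triangle(dist_b, identity)
    observed = _pearson(_upper_triangle(dist_a, identity), flat_b)
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute individuals, keeping rows and columns paired
        if _pearson(_upper_triangle(dist_a, order), flat_b) >= observed:
            as_extreme += 1
    p_value = (as_extreme + 1) / (permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical example: six individuals along a transect, with genetic
    # distance exactly proportional to geographic distance.
    positions = [0, 1, 2, 3, 4, 5]
    geographic = [[abs(a - b) for b in positions] for a in positions]
    genetic = [[0.01 * abs(a - b) for b in positions] for a in positions]
    print(mantel_test(geographic, genetic))
```

With 999 permutations the smallest attainable p-value is 0.001, so reporting a value far below that threshold implies a larger number of permutations.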

3.5 Species delimitation

Our BPP analyses supported the species status of most of the putative species we specified. Across every replicate, with each combination of priors, all putative species other than the two from Lore Lindu were delimited with posterior probabilities of 1 (Table 3.3). All replicates assigned a much higher posterior probability to a single species consisting of all individuals from Lore Lindu than to a division of these individuals into two species, although the extent of the difference between delimitation schemes varied between different sets of priors.

TABLE 3.3: Posterior probabilities for each species delimitation under three combinations of ancestral population size and root age prior values. Each probability is averaged across two replicates. Locality numbers correspond to those listed in Figure 1.1; 1a and 1b denote the lower- and higher-elevation groups of individuals from Mt. Gandangdewata.

          Mean Posterior Probability (± SD)
Locality  θ=(1,10); τ0=(1,10,1)       θ=(2,2000); τ0=(2,2000,1)   θ=(1,10); τ0=(2,2000,1)
3         1.00000 ± 0                 1.00000 ± 0                 1.00000 ± 0
0         1.00000 ± 0                 1.00000 ± 0                 1.00000 ± 0
2         1.00000 ± 0                 1.00000 ± 0                 1.00000 ± 0
6         1.00000 ± 0                 1.00000 ± 0                 1.00000 ± 0
9         1.00000 ± 0                 1.00000 ± 0                 1.00000 ± 0
4         1.00000 ± 0                 1.00000 ± 0                 1.00000 ± 0
1a        1.00000 ± 0                 1.00000 ± 0                 1.00000 ± 0
1b        1.00000 ± 0                 1.00000 ± 0                 1.00000 ± 0
7         1.00000 ± 0                 1.00000 ± 0                 1.00000 ± 0
8         1.00000 ± 0                 1.00000 ± 0                 1.00000 ± 0
5b/5a     0.871615 ± 0.16754895       0.90767 ± 0.04162031        0.99951 ± 0.00046669
5b        0.128385 ± 0.16754895       0.09233 ± 0.04162031        0.00049 ± 0.00046669
5a        0.128385 ± 0.16754895       0.09233 ± 0.04162031        0.00049 ± 0.00046669

3.6 Species tree estimation

Many initial analyses in *BEAST failed to converge. Generally, the likelihood and posterior MCMC plots visualized in TRACER trended downward before stabilizing, a pattern we initially attributed to having set an insufficiently high upper bound to the 95% prior probability interval of the mean of the uncorrelated lognormal distribution for cyt b. However, the initial downward trend in the likelihood score continued to some extent even after raising our estimate of that parameter to biologically unrealistic values (e.g., an upper bound to the 95% prior probability interval of 5.0 substitutions per base per million years), leading us to suspect a poor fit of this dataset to the multispecies coalescent. Nevertheless, our estimated topology (Figure 3.6) generally agreed with the mitochondrial gene tree and concatenated nuclear gene tree. The deepest divergence between clades of C. elongata was dated at 8.38 Ma with a 95% HPD interval of 6.77–10.11 Ma; however, due to the abnormal behavior of the MCMC plots described above, we question the reliability of this date and instead base our conclusions on the dates estimated in our mitochondrial gene tree analysis.

3.7 Gene flow estimation

For the most part, we estimated minimal gene flow between the localities included in our MIGRATE-N analyses (Table 3.4), but migration rates were higher among localities that grouped together in Level 3 STRUCTURE analyses (Figure 3.4). The migration rates estimated between the NW AoE (represented by Mt. Dako and Mt. Buliohuto) and any locality or group of localities elsewhere on the island never exceeded 0.02 immigrants per generation. Similarly, localities included in Level 2a of the STRUCTURE analysis never exhibited high rates of gene flow with localities included in Level 3c, with immigration rates never exceeding 0.01 immigrants per generation in either direction. Migration rates within each of those groups (localities included in Level 2a and those included in Level 3c) were generally much higher. Among localities comprising Level 3c (Mt. Balease/Luwu Timur/WFB8203, Salu Tiwo, and Mt. Tompotika; the two Mt. Mekongga individuals other than WFB8203 were excluded from analysis 2 to avoid artifacts from the small sample size), immigration rates ranged from 0.171 to 0.948 immigrants per generation, though we note that the posterior distribution for M between Salu Tiwo and Mt. Tompotika failed to converge. Likewise, immigration rates between localities comprising Level 3a (Mt. Latimojong, Lore Lindu and the lower-elevation group of Mt. Gandangdewata specimens) ranged from 0.208 to 0.691 immigrants per generation. The high-elevation Mt. Gandangdewata group, however, seemed largely isolated from all other populations, as the immigration rate from this group to the lower-elevation group from the same mountain was 0.03 immigrants per generation with no other immigration rates including the high-elevation group exceeding 0.01 immigrants per generation. In general, convergence in the posterior distribution for M was poorer when migration rates were higher.

FIGURE 3.6: Species tree generated in *BEAST illustrating the relationships between putative species within Crocidura elongata with Crocidura nigripes serving as the outgroup. Numbered boxes correspond to localities (see Figure 1.1; 1a and 1b denote the lower- and higher-elevation groups of individuals from Mt. Gandangdewata).

TABLE 3.4: Summary of results from MIGRATE-N analyses. X denotes a failure to reach convergence in the posterior distribution for that parameter. Population labels are specified in Table 2.1. Values provided for θ and M are posterior distribution modes, with the 95% probability interval given in parentheses.

Analysis  Population  θ                          Origin of   M                        Immigrants per
          assessed                               migrants                             generation ((θ*M)/4)
1         A           0.00192 (0.00063–0.00310)  B           13.0 (0–42.0)            0.006
                                                 C           41.0 (2.7–88.7)          0.020
          B           0.00198 (0.00097–0.00237)  A           15.0 (0–54.7)            0.008
                                                 C           20.3 (0–50.7)            0.010
          C           0.00112 (0.00023–0.00193)  A           7.7 (0–32.7)             0.002
                                                 B           11.0 (0–36.0)            0.003
2         A           0.00123 (0.0007–0.0018)    B           30.0 (0–132.0)           0.009
                                                 C           42.0 (0–148.0)           0.013
                                                 D           26.0 (0–128.0)           0.008
                                                 E           34.0 (0–140.0)           0.011
          B           0.00143 (0.0008–0.0021)    A           30.0 (0–132.0)           0.011
                                                 C           22.0 (0–120.0)           0.008
                                                 D           22.0 (0–124.0)           0.008
                                                 E           18.0 (0–120.0)           0.007
          C           0.00037 (0–0.0009)         A           42.0 (0–148.0)           0.004
                                                 B           22.0 (0–120.0)           0.002
                                                 D           X
                                                 E           3410.0 (2072.0–4468.0)   0.316
          D           0.00051 (0–0.001)          A           26.0 (0–128.0)           0.003
                                                 B           22.0 (0–124.0)           0.003
                                                 C           X
                                                 E           1338.0 (552.0–3072.0)    0.171
          E           0.00111 (0–0.002)          A           34.0 (0–140.0)           0.010
                                                 B           18.0 (0–120.0)           0.005
                                                 C           3410.0 (2072.0–4468.0)   0.948
                                                 D           1338.0 (552.0–3072.0)    0.371
3         A           0.0015 (0.0009–0.0021)     B           26.0 (0–124.0)           0.010
                                                 C           34.0 (0–140.0)           0.013
                                                 D           38.0 (0–144.0)           0.014
                                                 E           42.0 (0–152.0)           0.016
                                                 F           50.0 (1–156.0)           0.019
          B           0.0010 (0.000–0.0015)      A           26.0 (0–124.0)           0.007
                                                 C           30.0 (0–132.0)           0.008
                                                 D           56.0 (0–124.0)           0.014
                                                 E           26.0 (0–128.0)           0.007
                                                 F           22.0 (0–124.0)           0.006
          C           0.0004 (0–0.0008)          A           34.0 (0–140.0)           0.004
                                                 B           30.0 (0–132.0)           0.003
                                                 D           2082.0 (1404.0–3148.0)   0.208
                                                 E           4606.0 (3140.0–5948.0)   0.461
                                                 F           130.0 (0.0–300.0)        0.013
          D           0.0006 (0–0.0015)          A           38.0 (0–144.0)           0.006
                                                 B           56.0 (0–124.0)           0.009
                                                 C           2082.0 (1404.0–3148.0)   0.312
                                                 E           2574.0 (1924.0–4600.0)   0.386
                                                 F           86.0 (0–268.0)           0.013
          E           0.0006 (0–0.0011)          A           42.0 (0–152.0)           0.006
                                                 B           26.0 (0–128.0)           0.004
                                                 C           4606.0 (3140.0–5948.0)   0.691
                                                 D           2574.0 (1924.0–4600.0)   0.386
                                                 F           198.0 (24.0–456.0)       0.0298
          F           0.00003 (0–0.0003)         A           50.0 (1–156.0)           0.0005
                                                 B           22.0 (0–124.0)           0.0003
                                                 C           130.0 (0.0–300.0)        0.001
                                                 D           86.0 (0–268.0)           0.001
                                                 E           198.0 (24.0–456.0)       0.002
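The conversion behind the "immigrants per generation" column of Table 3.4, (θ*M)/4, can be checked with a short calculation. The sketch below applies it to a few of the tabulated posterior modes; the function name is an assumption made for the example, and small discrepancies from the table are presumably due to rounding of the reported θ and M values.

```python
def immigrants_per_generation(theta, m_scaled):
    """Immigrants per generation from the posterior modes of theta and M."""
    return theta * m_scaled / 4.0


if __name__ == "__main__":
    # (label, theta, M) triples copied from Table 3.4.
    examples = [
        ("Analysis 1, into A from B", 0.00192, 13.0),
        ("Analysis 1, into A from C", 0.00192, 41.0),
        ("Analysis 2, into D from E", 0.00051, 1338.0),
        ("Analysis 3, into E from F", 0.0006, 198.0),
    ]
    for label, theta, m_scaled in examples:
        rate = immigrants_per_generation(theta, m_scaled)
        print(f"{label}: {rate:.4f}")
```

Because the product depends on the θ of the receiving population, the same M can yield different immigration rates in the two directions between a pair of populations (for example, C and E in analysis 2).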


4. Discussion and Conclusions

Our analyses identify deep divergences among clades of Crocidura elongata, suggesting that the current single-species classification does not adequately reflect the diversity of the taxon.

Although our results were not entirely concordant, the analyses did agree in many respects. For instance, clusters identified in the STRUCTURE analyses (Figure 3.4) generally corresponded to mitochondrial clades and, with a few exceptions, to clades in our concatenated nuclear gene tree.

The topologies of our concatenated sequence tree and *BEAST species tree (Figs. 3.3 and 3.6) largely support the same relationships as our mitochondrial gene tree, though we treat our species tree with caution because of the problems with MCMC behavior described in Methods and Results.

The sharp geographic partitioning of C. elongata clades is inconsistent with the genetic partitioning in monkeys and toads observed by Evans et al. (2003a). Therefore, the historical factors responsible for the distributions of monkeys and toads do not appear to have been as important in shaping the modern, overall distribution of C. elongata lineages. However, we cannot rule out the possibility, if C. elongata is indeed a complex of multiple species, that one or more of the component species abides by the AoE paradigm. For example, if C. elongata consisted of a highland and a lowland species, corresponding to the initial split in the STRUCTURE analysis (Figs. 3.4 and 3.5), either species individually could exhibit genetic partitioning consistent with the AoE paradigm when examined across Sulawesi as a whole.

In addition to the elevational difference between the clusters identified in the initial STRUCTURE split, two cases of paraphyly observed in the mitochondrial gene tree analyses are consistent with diversification across elevational gradients. The two clades of individuals sampled from Mt. Gandangdewata group distinctly by elevation, with one clade sampled between 1535 and 1600 m, the other between 2500 and 2600 m. Similarly, on Mt. Mekongga one distinctive individual (WFB8203) was collected at 150 m, whereas the other two specimens were collected at 1515 and 1899 m, respectively. Both elevational partitions (those of Mt. Gandangdewata and Mt. Mekongga) were supported by the STRUCTURE analysis, although the Mt. Mekongga individuals formed a poorly supported clade in the concatenated analysis. The small sample from Mt. Mekongga (three individuals), however, precludes a firm conclusion regarding the diversity of C. elongata on that mountain. Additionally, our mitochondrial gene tree analyses were equivocal with respect to the monophyly of the Lore Lindu specimens, but the two clades from that locality were not separated by elevation. Rather, the three individuals comprising the smaller Lore Lindu clade were sympatric with individuals from the larger clade, and neither our concatenated nuclear gene tree nor the STRUCTURE analysis supported splitting the Lore Lindu individuals into two groups.

Elevation has long been posited as a driver of diversification among small mammals of Southeast Asia (Ruedi 1995; Musser et al. 2010; Justiniano et al. 2015), but empirical validation has been elusive. For example, noting morphological discrepancies between two C. elongata specimens collected at different elevations in Lore Lindu National Park, Ruedi (1995) suggested the possibility of altitudinal diversification. Our results here add empirical heft to that general speculation. Our estimation of minimal levels of gene flow between the two sides of the Level 1 STRUCTURE split, compared with moderate levels of gene flow within each side of the split (though not between Mt. Dako/Mt. Buliohuto and the other localities comprising that side of the split), strongly suggests that elevation played a role in the divergence of those lineages. Furthermore, our estimation of minimal gene flow between our highest-elevation samples, the group of Mt. Gandangdewata specimens sampled at 2500–2600 m, and the lower high-elevation lineages occupying Lore Lindu, Mt. Latimojong, and the 1535–1600 m section of Mt. Gandangdewata, suggests that elevation has played a role in lineage divergence on multiple occasions. Our observations of the Lore Lindu, Mt. Latimojong, and lower-elevation Mt. Gandangdewata groups suggest some degree of connectivity (either past or present) between high-elevation habitats, in contrast to repeated up-mountain colonizations by neighboring lowland species. This is somewhat analogous to the patterns observed in Philippine Apomys mice (Justiniano et al. 2015), with the exception that C. elongata lineages also occur in the lowlands, whereas Apomys on Luzon have been collected almost exclusively above 700 m.

In the Bayesian mitochondrial gene tree analysis, we estimated the root age at slightly older than 2 Ma (95% HPD 0.97–3.88 Ma). Thus, the oldest divergence within C. elongata appears to post-date the fusion of all components of the proto-Sulawesi archipelago, with the possible exception of a large fragment of the SE AoE (Hall 2002). If C. elongata, or some component of it, abides by the AoE paradigm, then isolation on proto-Sulawesi islands is probably not the cause of its modern distribution. This does not necessarily mean that the coalescence of islands did not influence the AoEs’ formation, however. For example, the islands’ fusion into a single landmass may not have erased historical ecological differences, such as soil and plant community compositions, that would have influenced subsequent population distributions and interactions. It should be noted, however, that the substitution rate used to calibrate both dating analyses in this study was taken from a distantly related species, a practice that has been criticized (Nabholz et al. 2007).

The consistent downward trends we observed in the species tree analysis’s posterior MCMC plots suggest a poor fit of our dataset to the multispecies coalescent, a phenomenon not uncommon in empirical studies (Reid et al. 2013). Factors contributing to poor fit can include gene flow between lineages, selection, gene duplication and/or extinction events, and genetic recombination (Reid et al. 2013). Our estimate of the date of the oldest divergence among clades of C. elongata from the species tree analysis (8.38 Ma) is much older than the date estimated in the mitochondrial gene tree analysis (2.10 Ma), a result inconsistent with gene flow having strongly influenced any departures from the multispecies coalescent (Leaché et al. 2014). However, we did estimate moderate levels of gene flow among some of the groups whose relationships were estimated in the species tree analysis, a violation of the multispecies coalescent assumption of no migration between groups. Additionally, because of the complicated patterns of differentiation among the Mt. Balease, Luwu Timur, Mt. Tompotika, and Mt. Mekongga localities we observed in the STRUCTURE and concatenated analyses, it is possible that the individuals collected from these localities have not diverged sufficiently to justify treating these groups as discrete populations. Although we treat our results from our species tree analysis warily, we note that, if taken at face value, they would leave open the possibility that dispersal to different components of the proto-Sulawesi archipelago contributed to the earliest divergences among clades of C. elongata. The plate tectonic reconstructions of Hall (2002) suggest that the landmasses that would later become Sulawesi remained divided by sea 8 Ma. In particular, the landmass most closely neighboring Sundaland, comprising the western portion of Sulawesi’s central core and its southwestern peninsula, was separated from other islands by narrow channels. Thus, any taxon with the capacity to disperse from Sundaland to western Sulawesi around 8 Ma could presumably have dispersed to other islands thereafter, a scenario that has been posited for Sulawesi squirrels (Hawkins et al. 2016). With respect to C. elongata, however, such a scenario remains unsupported outside of our species tree analysis (the mitochondrial substitution rate would need to be reduced to 0.003 substitutions per site per million years for the date estimated in the mitochondrial gene tree analysis to match the 8.38 Ma figure estimated in the species tree analysis).
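The rate adjustment mentioned above can be approximated with simple arithmetic if one assumes that an estimated node age scales inversely with the substitution rate. That is a simplification of a relaxed-clock analysis, not a description of how BEAST behaves. Under that assumption, moving the mitochondrial estimate from 2.10 Ma to 8.38 Ma requires a rate roughly four times slower. The function name and the input rate of 0.012 substitutions per site per million years are hypothetical, chosen only because they reproduce the 0.003 figure under this assumption; the calibration rate actually used is reported in the Methods.

```python
def rescaled_rate(rate, estimated_age, target_age):
    """Rate that would shift `estimated_age` to `target_age`, assuming age ~ 1/rate."""
    return rate * estimated_age / target_age


if __name__ == "__main__":
    mito_age_ma = 2.10          # mitochondrial gene tree estimate
    species_tree_age_ma = 8.38  # *BEAST species tree estimate
    fold_older = species_tree_age_ma / mito_age_ma
    print(f"species tree date is {fold_older:.2f}x older")

    # Hypothetical input rate, for illustration only.
    illustrative_rate = 0.012
    slower = rescaled_rate(illustrative_rate, mito_age_ma, species_tree_age_ma)
    print(f"rate needed under the inverse-scaling assumption: {slower:.4f}")
```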

In sum, Crocidura elongata, as defined by current classification (Miller and Hollister 1921a; Ruedi 1995), is a highly diverse taxon featuring a number of well-defined, highly diverged lineages. These lineages appear distinctly partitioned geographically, with elevation and collection locality predicting degree of genetic similarity. The current single-species classification is almost certainly insufficient to account for the diversity present. However, much greater geographic sampling is needed to determine whether genetic partitioning of the species comprising the C. elongata complex is concordant with the AoEs. Nevertheless, this study joins a growing body of work (e.g., Rickart et al. 2005; Esselstyn et al. 2013; Justiniano et al. 2015) in recognizing the importance of within-island diversification in building regional faunas, especially on the larger islands of Indo-Australia.


References

Balete DS, Rickart EA, Heaney LR, Alviola PA, Duya MV, Duya MRM, Sosa T, Jansa SA (2012) Archboldomys (Muridae: Murinae) reconsidered: a new genus and three new species of shrew mice from Luzon Island, Philippines. American Museum Novitates, 3754, 1–60.
Beerli P (2006) Comparison of Bayesian and maximum likelihood inference of population genetic parameters. Bioinformatics, 22, 341–345.
Beerli P, Felsenstein J (1999) Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics, 152, 763–773.
Beerli P, Felsenstein J (2001) Maximum likelihood estimation of a migration matrix and effective population sizes in n populations by using a coalescent approach. Proceedings of the National Academy of Sciences of the USA, 98, 4563–4568.
Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu C, Xie D, Suchard MA, Rambaut A, Drummond AJ (2014) BEAST 2: A Software Platform for Bayesian Evolutionary Analysis. PLoS Computational Biology, 10, DOI 10.1371/journal.pcbi.1003537.
Brown RM, Siler CD, Oliveros CH, Esselstyn JA, Diesmos AC, Hosner PA, Linkem CW, Barley AJ, Oaks JR, Sanguila MB, Welton LJ, Moyle RG, Peterson AT, Alcala AC (2013) Evolutionary processes of diversification in a model island archipelago. Annual Review of Ecology, Evolution, and Systematics, 44, 411–435.
Brown WC, Alcala AC (1970) The zoogeography of the Philippine Islands, a fringing archipelago. Proceedings of the California Academy of Sciences, 38, 105–130.
Brown WM, George Jr. M, Wilson AC (1979) Rapid evolution of mitochondrial DNA. Proceedings of the National Academy of Sciences of the USA, 76, 1967–1971.
Darriba D, Taboada GL, Doallo R, Posada D (2012) jModelTest 2: more models, new heuristics and parallel computing. Nature Methods, 9, 772.
Darwin C, Wallace AR (1858) On the tendency of species to form varieties; and on the perpetuation of varieties and species by natural means of selection. Zoological Journal of the Linnean Society, 3, 46–50.
Dickerson RE (1928) Distribution of life in the Philippines. Philippine Bureau of Science, Manila.
Douzery EP, Scornavacca C, Romiguier J, Belkhir K, Galtier N, Delsuc F, Ranwez V (2014) OrthoMaM v8: a database of orthologous exons and coding sequences for comparative genomics in mammals. Molecular Biology and Evolution, 31, 1923–1928.
Dray S, Dufour AB (2007) The ade4 package: implementing the duality diagram for ecologists. Journal of Statistical Software, 22, 1–20.


Earl DA, vonHoldt BM (2012) STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources, 4, 359–361.
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32, 1792–1797.
Esselstyn JA, Achmadi AS, Rowe KC (2012) Evolutionary novelty in a rat with no molars. Biology Letters, 8, 990–993.
Esselstyn JA, Brown RM (2009) The role of repeated sea-level fluctuations in the generation of shrew (Soricidae: Crocidura) diversity in the Philippine Archipelago. Molecular Phylogenetics and Evolution, 53, 171–181.
Esselstyn JA, Maharadatunkamsi, Achmadi AS, Siler CD, Evans BJ (2013) Carving out turf in a biodiversity hotspot: multiple unrecognized shrew species co-occur on Java Island, Indonesia. Molecular Ecology, 22, 4972–4987.
Esselstyn JA, Timm RM, Brown RM (2009) Do geological or climatic processes drive speciation in dynamic archipelagos? The tempo and mode of diversification in Southeast Asian shrews. Evolution, 63, 2595–2610.
Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology, 14, 2611–2620.
Evans BJ, Supriatna J, Andayani N, Setiati MI, Cannatella DC, Melnick DJ (2003a) Monkeys and toads define areas of endemism on Sulawesi. Evolution, 57, 1436–1443.
Evans BJ, Brown RM, McGuire JA, Supriatna J, Andayani N, Diesmos A, Iskandar D, Melnick DJ, Cannatella DC (2003b) Phylogenetics of fanged frogs: testing biogeographical hypotheses at the interface of the Asian and Australian faunal zones. Systematic Biology, 52, 794–819.
Evans BJ, McGuire JA, Brown RM, Andayani N, Supriatna J (2008) A coalescent framework for comparing alternative models of population structure with genetic data: evolution of Celebes toads. Biology Letters, 4, 430–433.
Falush D, Stephens M, Pritchard JK (2003) Inference of population structure: Extensions to linked loci and correlated allele frequencies. Genetics, 164, 1567–1587.
Flot J-F (2010) SeqPHASE: a web tool for interconverting PHASE input/output files and FASTA sequence alignments. Molecular Ecology Resources, 10, 162–166.
Fooden J (1969) Taxonomy and evolution of the monkeys of Celebes (Primates: Cercopithecidae). S. Karger, Basel, Switzerland.
Gorog AJ, Sinaga MH, Engstrom MD (2004) Vicariance or dispersal? Historical biogeography of three Sunda shelf murine rodents (Maxomys surifer, Leopoldamys sabanus and Maxomys whiteheadi). Biological Journal of the Linnean Society, 81, 91–109.
Guindon S, Gascuel O (2003) A simple, fast and accurate method to estimate large phylogenies by maximum likelihood. Systematic Biology, 52, 696–704.


Hall R (2002) Cenozoic geological and plate tectonic evolution of SE Asia and the SW Pacific: computer-based reconstructions and animations. Journal of Asian Earth Sciences, 20, 353–434.
Hall R (2011) Australia-SE Asia collision: plate tectonics and crustal flow. Geological Society, London, Special Publications, 355, 75–109.
Hawkins MTR, Leonard JA, Helgen KM, McDonough MM, Rockwood LL, Maldonado JE (2016) Evolutionary history of endemic Sulawesi squirrels constructed from UCEs and mitogenomes sequenced from museum specimens. BMC Evolutionary Biology, 16, DOI 10.1186/s12862-016-0650-z.
Heaney LR (1984) Mammalian species richness on islands on the Sunda Shelf, Southeast Asia. Oecologia, 61, 11–17.
Heaney LR (2011) Discovering diversity: studies of the mammals of Luzon Island, Philippines. Fieldiana Life and Earth Sciences, 2, 1–60.
Heaney LR, Balete DS, Rickart EA, Alviola PA, Duya MRM, Duya MV, Veluz MJ, VandeVrede L, Steppan SJ (2011) Seven new species and a new subgenus of forest mice (Rodentia: Muridae: Apomys) from Luzon Island. Fieldiana Life and Earth Sciences, 5, 1–60.
Heaney LR, Balete DS, Rickart EA, Veluz MJ, Jansa SA (2014b) Three new species of Musseromys (Muridae, Rodentia), the endemic Philippine tree mouse from Luzon Island. American Museum Novitates, 3802, 1–27.
Heaney LR, Balete DS, Veluz MJ, Steppan SJ, Esselstyn JA, Pfeiffer AW, Rickart EA (2014a) Two new species of Philippine forest mice (Apomys, Muridae, Rodentia) from Lubang and Luzon Islands, with a redescription of Apomys sacobianus Johnson, 1962. Proceedings of the Biological Society of Washington, 126, 395–413.
Heaney LR, Rickart EA (1990) Correlations of clades and clines: geographic, elevational, and phylogenetic distribution patterns among Philippine mammals. Pp. 321–332. In G. Peters and R. Hutterer (eds.), Vertebrates in the Tropics. Museum Alexander Koenig, Bonn, Germany.
Heaney LR, Walsh JS, Peterson AT (2005) The roles of geological history and colonization abilities in the genetic differentiation between mammalian populations in the Philippine archipelago. Journal of Biogeography, 32, 229–247.
Heled J, Drummond AJ (2010) Bayesian inference of species trees from multilocus data. Molecular Biology and Evolution, 27, 570–580.
Huxley TH (1868) On the distribution of the Alectoromorphae and Heteromorphae. Proceedings of the Zoological Society of London, 1868, 294–319.
Inger RF (1954) Systematics and zoogeography of Philippine Amphibia. Fieldiana, 33, 181–531.
Jansa SA, Barker FK, Heaney LR (2006) The pattern and timing of diversification of Philippine endemic rodents: evidence from mitochondrial and nuclear gene sequences. Systematic Biology, 55, 73–88.

40

Justiniano R, Schenk JJ, Balete DS, Rickart EA, Esselstyn JA, Heaney LR, Steppan SJ (2015) Testing diversification models of endemic Philippine forest mice (Apomys) with nuclear phylogenies across elevational gradients reveals repeated colonization of isolated mountain ranges. Journal of Biogeography, 42, 51–64. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, Thierer T, Ashton B, Mentjies P, Drummond A (2012) Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 28, 1647–1649. Leaché AD, Fujita MK (2010) Bayesian species delimitation in West African forest geckos (Hemidactylus fasciatus). Proceedings of the Royal Society B, 277, 3071–3077. Leaché AD, Harris RB, Rannala B, Yang Z (2014) The influence of gene flow on species tree estimation: a simulation study. Systematic Biology, 63, 17–30. Lim HC, Sheldon FH (2011) Multilocus analysis of the evolutionary dynamics of rainforest bird populations in Southeast Asia. Molecular Ecology, 20, 3414–3438. Lohman DJ, de Bruyn M, Page T, von Rintelen K, Hall R, Ng PKL, Shih H, Carvalho GR, von Rintelen T (2011) Biogeography of the Indo-Australian archipelago. Annual Review of Ecology, Evolution, and Systematics, 42, 205–226. Mantel N (1967) The detection of disease clustering and a generalized regression approach. Cancer Research, 27, 209–220. McGuire JA, Brown RM, Mumpuni, Riyanto R, Andayani N (2007) The flying lizards of the lineatus group (: Iguania: ): a taxonomic revision with descriptions of two new species. Herpetological Monographs, 21, 179–212. Meirmans PG (2015) Seven common mistakes in population genetics and how to avoid them. Molecular Ecology, 24, 3223–3231. Miller Jr. GS, Hollister N (1921a) Twenty new mammals collected by H. C. Raven in Celebes. Proceedings of the Biological Society of Washington, 34, 93–104. Miller Jr. 
GS, Hollister N (1921b) Descriptions of sixteen new murine rodents from Celebes. Proceedings of the Biological Society of Washington, 34, 67–76. Musser GG (2014) A systematic review of Sulawesi Bunomys (Muridae, Murinae) with the description of two new species. Bulletin of the American Museum of Natural History, 392, 1–313. Musser GG, Durden LA, Holden ME, Light JE (2010) Systematic review of endemic Sulawesi squirrels (Rodentia, Sciuridae), with descriptions of new species of associated sucking lice (Insecta, Anoplura), and phylogenetic and zoogeographic assessments of sciurid lice. Bulletin of the American Museum of Natural History, 339, 1–255. Nabholz B, Glémin S, Galtier N (2007) Strong variations of mitochondrial mutation rate across mammals—the longevity hypothesis. Molecular Biology and Evolution, 25, 120–130.

41

Pritchard JK, Stephens M, Donnelly P (2000). Inference of population structure using multilocus genotype data. Genetics, 155, 945–959. R Core Team (2016) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available from https://www.R-project.org/. Rambaut A, Suchard MA, Xie D, Drummond AJ (2014) Tracer v1.6. Available from http://beast.bio.ed.ac.uk/TRACER. Rannala B, Yang Z. (2013) Improved reversible jump algorithms for Bayesian species delimitation. Genetics, 194, 245–253. Reid NM, Hird SM, Brown JM, Pelletier TA, McVay JD, Satler JD, Carstens BC (2013) Poor fit to the multispecies coalescent is widely detectable in empirical data. Systematic Biology, 63, 322–333. Rickart EA, Heaney LR, Goodman SM, Jansa S (2005) Review of the Philippine genera Chrotomys and Celaenomys (Muridae) and description of a new species. Journal of Mammaology, 86, 415-428. Rowe KC, Achmadi AS, Esselstyn JA (2014) Convergent evolution of aquatic foraging in a new genus and species (Rodentia: Muridae) from Sulawesi Island, Indonesia. Zootaxa, 3815, 541–564. Rowe, KC, Achmadi AS, Esselstyn JA (2016a) Repeated evolution of carnivory among Indo- Australian rodents. Evolution, 70, 653–665. Rowe KC, Achmadi AS, Esselstyn JA (2016b) A new genus and species of omnivorous rodent (Muridae: Murinae) from Sulawesi, nested within a clade of endemic carnivores. Journal of Mammalogy. Ruedi M (1995) Taxonomic revision of shrews of the genus Crocidura from the Sunda Shelf and Sulawesi with description of two new species (Mammalia: Soricidae). Zoological Journal of the Linnean Society, 115, 211–265. Ruedi M, Auberson M, Savolainen V (1998) Biogeography of Sulawesian shrews: testing for their origin with a parametric bootstrap on molecular data. Molecular Phylogenetics and Evolution, 9, 567–571. 
Setiadi MI, McGuire JA, Brown RM, Zubairi M, Iskandar DT, Andayani N, Supriatna J, Evans BJ (2011) Adaptive radiation and ecological opportunity in Sulawesi and Philippine fanged frog (Limnonectes) communities. The American Naturalist, 178, 221–240. Shekelle M, Meier R, Wahyu I, Widateti, Ting N (2010) Molecular phylogenetics and chronometrics of Tarsiidae based on 12s mtDNA haplotypes: evidence for Miocene origins of crown tarsiers and numerous species within the Sulawesian clade. International Journal of Primatology, 31, 1083–1106. Sheldon FH, Lim HC, Moyle RG (2015) Return to the Malay Archipelago: the biogeography of Sundaic rainforest birds. Journal of Ornithology, 156, S91–S113.

42

Siler CD, Oaks JR, Esselstyn JA, Diesmos AC, Brown RM (2010) Phylogeny and biogeography of Philippine bent-toed geckos (Gekkonidae: Cyrtodactylus) contradict a prevailing model of Pleistocene diversification. Molecular Phylogenetics and Evolution, 55, 699– 710. Spakman W, Hall R (2010) Surface deformation and slab-mantle interaction during Banda Arc subduction rollback. Nature Geoscience, 3, 562–566. Stephens M, Donnelly P (2003) A comparison of Bayesian methods for haplotype reconstruction. American Journal of Human Genetics, 73, 1162–1169. Stephens M, Smith NJ, Donnelly P (2001) A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics, 68, 978–989. Sukumaran J, Holder MT (2010) DendroPy: a Python library for phylogenetic computing. Bioinformatics, 26, 1569–1571. Sukumaran J, Holder MT (2016) Sumtrees: Phylogenetic Tree Summarization. 4.1.0. Available at https://github.com/jeetsukumaran/DendroPy. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Molecular Biology and Evolution, 30, 2725–2729. Toussaint EFA, Hall RA, Monaghan MT, Sagata K, Ibalim S, Shaverdo HV, Volger AP, Pons J, Balke M (2014) The towering orogeny of New Guinea as a trigger for arthropod megadiversity. Nature Communications, 5, 1–10. Vähä J, Erkinaro J, Niemelä E, Primmer C (2007) Life-history and habitat features influence the within-river genetic structure of Atlantic salmon. Molecular Ecology, 16, 2638–2654. Wallace AR (1863) On the physical geography of the Malay Archipelago. Journal of the Royal Geographic Society, 33, 217–234. Wallace AR (1869) The Malay Archipelago, the land of the orang-utan and the bird of paradise: a narrative of travel with studies of man and nature. Macmillan, London. Wallace AR (1880) Island life, or the phenomena and causes of insular faunas and floras, including a revision and attempted solution of the problem of geological climates. 
Macmillan, London. Whitten AJ, Mustafa M, Henderson GS (2002) The ecology of Sulawesi. Periplus, Singapore. Wilson EO (1961) The nature of the taxon cycle in the Melanesian ant fauna. The American Naturalist, 95, 169–193. Yang Z (2015) The BPP program for species tree estimation and species delimitation. Current Zoology, 61, 854–865. Yang Z, Rannala B (2010) Bayesian species delimitation using multilocus sequence data. Proceedings of the National Academy of Sciences, 107, 9264–9269. Yang Z, Rannala B (2014) Unguided species delimitation using DNA sequence data from multiple loci. Molecular Biology and Evolution, 31, 3125–3135.

43

Zwickl DJ (2006) Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. dissertation, The University of Texas at Austin.

44

Appendix Supplementary information

TABLE A.1: Primer sequences and locus-specific annealing temperatures.

Gene name   Primer name   Sequence                    Annealing temperature (˚C)
ANKRD       F1            TTTCCAGCAGATCCGTCCAC        57
ANKRD       R1            ACTCCCAGAAGAACTGCTGC        57
CCNT        F1            TAGCTGCCCAGGTAGAACAG        57
CCNT        R1            TATGTGACAGGGGGAGGAGG        57
GRIN        F1            ACTATGGAATATGGCCAGAGCA      57
GRIN        R1            AGAAACCTTCCAGTCCAGCA        57
MBD         F1            AGCTCTGAGTGCCATGAGTG        52
MBD         R1            TTGAACGTCCTTGGCCTATAA       52
NCOA        F1            GCAGCAGCACAGGAAATAGC        56
NCOA        R1            TCCAGTCGCTCAAGTTTGGG        56
SLITRK      F1            ATAACCCATGGGACTGCACC        56
SLITRK      R1            TGGATCGCGTTGTACTCCAC        56
TMEM        F1            AGCAATGACAAGACTGGAAGA       52
TMEM        R1            GTGAAAACAACTGAAAGTCTTCTGC   52


TABLE A.2: Models of nucleotide substitution used for each phased and unphased nuclear locus.

Locus     Model used (phased loci)   Model used (unphased loci)
ANKRD     HKY                        F81
ApoB      HKY                        HKY + Γ
BDNF      JC                         K80 + Γ
BRCA      HKY                        HKY
CCNT      HKY                        JC + I + Γ
GHR       K80                        K80
GRIN      K80                        K80
MBD       HKY                        HKY
MCGF      HKY                        HKY
NCOA      K80                        K80
PTGER     F81                        HKY
RAG-1     K80                        K80 + I
SLITRK    K80                        K80
TMEM      HKY                        HKY
vWF       HKY + I                    HKY + I


Vita

Ryan Andrew Eldridge was raised in Rocky River, Ohio, and received his undergraduate degree in Evolution and Ecology from The Ohio State University in 2012. Ryan came to Louisiana State University in 2013 to pursue a Master of Science degree in Biological Sciences. He hopes to earn his degree in the summer of 2016.
