Louisiana State University LSU Digital Commons

LSU Doctoral Dissertations Graduate School

2010

Population genetics and systematics of a species-rich clade of Neotropical reef fishes, the tubeblenny genus Acanthemblemaria

Ron Israel Eytan
Louisiana State University and Agricultural and Mechanical College


Recommended Citation Eytan, Ron Israel, "Population genetics and systematics of a species-rich clade of Neotropical reef fishes, the tubeblenny genus Acanthemblemaria" (2010). LSU Doctoral Dissertations. 405. https://digitalcommons.lsu.edu/gradschool_dissertations/405

POPULATION GENETICS AND SYSTEMATICS OF A SPECIES-RICH CLADE OF NEOTROPICAL REEF FISHES, THE TUBEBLENNY GENUS ACANTHEMBLEMARIA

A Dissertation

Submitted to the Graduate Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements for the degree of Doctor of Philosophy

in

The Department of Biological Sciences

by Ron Israel Eytan B.S., University of Miami, 1999

December, 2010

Acknowledgements

I would like to thank my parents. My mother has been tireless in working to provide me with the many opportunities in life and in academia that I have enjoyed. My father bought me a microscope when I was five and took me on many weekend visits to the zoo, aquarium, planetarium, and the Museum of Science and Industry. I am very thankful to Dr. Michael Hellberg, who took a chance on me when other people would not. The members of my graduate committee, Robb Brumfield, Jim Cronin, Phil Hastings, Richard Stevens, and Andrew Whitehead, have generously shared their knowledge and expertise with me. Ray Clarke and Phil Hastings laid down the foundation on which I built my dissertation and have been extremely kind and generous while sharing their knowledge of Acanthemblemaria blennies with me. Other faculty members, including Bryan Carstens, Kyle Harms, and Mohamed Noor, kindly shared their knowledge and time. Melissa Baustian, James Maley, Nancy Rabalais, and Bill Stickle generously provided me with invaluable resources during my dissertation. I benefited greatly from interactions with and the support of faculty and staff in the Department of Biological Sciences, including Pat Arbour-Reily, Chris Austin, Chimene Boyd, Prosanta Chakrabarty, Nanette Crochet, Evanna Gleason, Cindy Henk, Scott Herke, John Larkin, Prissy Milligan, Tom Moore, Susan Murray, and Jackie Stephens. I owe much to the close friends I have made while at LSU, including Melissa DeBiasse, Matt Brown, Curt Burney, Jane Carlson, CJ Hayden, Verity Mathis, Emily McMains, John McVay, Rebecca Tedford, and Dan Warren, for both their friendship and support. Members of my cohort, fellow LSU graduate students, and friends Adriana Bravo, Gustavo Bravo, Matt Carling, Zac Cheviron, Santiago Claramunt, Brian Counterman, Andrés Cuervo, Matt Davis, Jessie Deichmann, Alice Dennis, Sheri Dixon-Schully, Ali Hamilton, CJ Hayden, Richard Gibbons, Sarah Hird, Elizabeth Jackson, Heather Jackson, Nathan Jackson, Haw Chuan Lim, James Maley, Ben Marx, John McCormack, Jonathan Myers, John McVay, Jamie Oaks, Brian O’Shea, Daniel Ortiz-Barrientos, Tim Paine, Tara Peltier, Carlos Prada, Fabio Raposo, Noah Reid, Mike Taylor, Deb Triant, and many others enriched my time at LSU. My field work would not have been possible without permits from the Belize Fisheries Department, the Bahamas Department of Fisheries, the Florida Keys National Marine Sanctuary, and the Government of the Netherlands Antilles. Assistance in the field and opportunities for field work were provided by Andy Caballero, Ray Clarke, Rachel Colin, Ben Holt, Howard Lasker, Ross Robertson, Klaus Rützler, Mike Taylor, Peter Wainwright, Dan Warren, Liz Whiteman, Culebra Divers, Dolphin Encounters, Paguera Divers, CARMABI, Nature Foundation Sint Maarten, the Carrie Bow Cay research station, the Calabash Cay research station, the Bocas del Toro research station, and their respective staffs, and the RV Walton Smith and its crew. Ray Clarke, Philip Hastings, Ben Holt, Cindy Klepadlo, Liz Whiteman, and the Scripps Institution of Oceanography Marine Vertebrate collection kindly provided tissue samples. I received funding for my dissertation from a Sigma Xi Grant in Aid of Research, a grant from the Lerner-Gray Fund for Marine Research, an ASIH Rainey fund award, the Association of Marine Labs of the Caribbean, a short-term fellowship from the Smithsonian Tropical Research Institute, the LSU graduate school, the Caribbean Coral Reef Ecosystem Project, and LSU BioGrads. Computational resources for portions of my dissertation were provided by the Louisiana Optical Network Initiative.


Table of Contents

ACKNOWLEDGEMENTS

ABSTRACT

CHAPTER

1 GENERAL INTRODUCTION

2 NUCLEAR AND MITOCHONDRIAL SEQUENCE DATA REVEAL AND CONCEAL DIFFERENT DEMOGRAPHIC HISTORIES AND POPULATION GENETIC PROCESSES IN CARIBBEAN REEF FISHES

3 THE PERFORMANCE OF BAYESIAN PHYLOGENETIC INFERENCE UNDER EXTREME SUBSTITUTION RATE VARIATION: EFFECTS ON CONCATENATED AND SPECIES TREE ANALYSES

4 A THORNY SITUATION: ACCOUNTING FOR CONFLICT BETWEEN MOLECULES AND MORPHOLOGY IN THE NEOTROPICAL REEF FISH CLADE ACANTHEMBLEMARIA ()

5 CONCLUSIONS

REFERENCES

APPENDIX 1: SUPPLEMENTARY DATA

APPENDIX 2: PERMISSION FROM EVOLUTION

VITA

Abstract

Neotropical coral reef fish communities are species-poor compared to those of the Indo-West Pacific. An exception to that pattern is the blenny clade Chaenopsidae, one of only three coral reef fish families endemic to the Neotropics. Within the chaenopsids, the genus Acanthemblemaria is the most species-rich. To understand the origin and maintenance of genetic and species diversity in these fishes, I characterized the population genetics for two Acanthemblemaria species, reconstructed the phylogeny of the group, and identified suites of correlated morphological characters responsible for the distinctive skull morphology of these fishes.

By combining nuclear and mitochondrial sequence data I was able to recover the complex demographic history of two closely related Acanthemblemaria species, A. aspera and A. spinosa. Old population expansions in both species were obscured by a rapid mitochondrial substitution rate, but the mitochondrial DNA allowed the recovery of a recent expansion in A. aspera corresponding to a period of increased habitat availability. However, the older expansions that took place in both species were only recovered using the nuclear markers.

Across the genus I found that mitochondrial COI is evolving nearly 100X faster than the nuclear markers and at an absolute rate of nearly 25% pairwise sequence divergence per million years. Replicate Bayesian phylogenetic analyses failed to converge on the same posterior distributions because proposals to update the rate multiplier parameter were rarely accepted, but when the tuning parameter was adjusted, all datasets converged quickly on to the same posterior distribution. When COI was included, posterior probabilities of the species tree were lower and topological estimates were worse than those from the nuclear-only dataset.

The species tree that was constructed for the genus conflicted with the morphological phylogeny for the group, primarily due to the convergence of skull bones with spines. By performing phylogenetic analyses on these characters, I resolved some of the conflicts between the morphological and molecular phylogenies. Divergence time estimates recovered a mid-Miocene origin for the genus, with speciation both before and after the closure of the Isthmus of Panama. Some sister taxa were broadly sympatric, but many occur in allopatry.

Chapter 1: General Introduction

The origin and maintenance of biodiversity has been of long-standing interest to ecologists and evolutionary biologists. Because they harbor the greatest species-richness in the oceans (Bellwood and Hughes, 2001), coral reef ecosystems have been particularly well studied. Coral reefs harbor the greatest diversity of marine fishes and attempts to understand that diversity have employed approaches ranging from population ecology (Sale, 1977) to molecular evolution (McMillan and Palumbi, 1997).

While over 75 different families of fishes are present on reefs (Bellwood and Wainwright, 2002), a small number of taxonomic groups have received the greatest attention from researchers (Sale, 1991; Sale, 2002). These are the Pomacentridae (damselfishes), Labridae (wrasses), Scaridae (parrotfishes), Acanthuridae (surgeonfishes), Pomacanthidae (angelfishes), and Chaetodontidae (butterflyfishes). While these groups have high species diversities, others have as much or more (Bellwood and Wainwright, 2002). Small, cryptic reef fishes, in particular, are generally very species-rich (Nelson and Wheeler, 2006), and compared with the six families listed above, much of this richness is likely undescribed and certainly understudied (Munday and Jones, 1998).

The Blennioidei is a perciform suborder of small, bottom-dwelling reef fishes. Blennies are a species-rich group composed of six families (Hastings and Springer, 2009b; Springer, 1993), with a total of 883 named species, and many more undescribed (Hastings and Springer, 2009a; Hastings and Springer, 2009b). While, in general, reef fish diversity decreases longitudinally from the Indo-West Pacific, the opposite is true of blennies, where the Labrisomidae, Dactyloscopidae, and Chaenopsidae blennies are the only reef fish families endemic to the New World (Bellwood and Wainwright, 2002; Hastings, 2009).

This dissertation sets out to examine the origins and maintenance of genetic and species diversity in the blenny genus Acanthemblemaria (Chaenopsidae), one of the most species-rich genera of fishes on Neotropical coral reefs, with 22 described species, 9 in the Tropical Eastern Pacific and 13 in the Caribbean (Hastings and Springer, 2009b). Acanthemblemaria blennies are typified by the presence of spinous processes on the bones of the skull (Metzelaar, 1919; Smith-Vaniz and Palacio, 1974; Stephens, 1963) and morphological characters related to head spines represent the majority of the characters used to infer the interspecific relationships in the group (Hastings, 1990).

I studied the diversity of the genus at different temporal and spatial scales. First, at the spatial scale of the entire Caribbean basin, I conducted a population genetics study on the two West Atlantic species with the largest geographic distributions, A. aspera and A. spinosa. Because they are closely related and have identical life histories, I was able to investigate whether ecological differences between these fishes corresponded to differences in population genetic patterns. Second, at the scale of the genome, I used Bayesian divergence dating methods to determine rates of molecular evolution, both absolute rates and relative rates between the mitochondrial and nuclear genomes. I then tested whether high rates of molecular evolution can affect the performance of Bayesian partitioned and species tree phylogenetic analyses. Last, at a macroevolutionary time scale, I characterized the causes of conflicts between the gene-based and morphology-based phylogenetic estimates for the genus, where taxa with clear affinities based on cranial morphology were not closely related in the molecular phylogeny.

Chapter 2: Nuclear and Mitochondrial Sequence Data Reveal and Conceal Different Demographic Histories and Population Genetic Processes in Caribbean Reef Fishes*

Introduction

The broad aim of comparative phylogeography is to infer how co-distributed taxa respond to shared evolutionary events (Avise, 2000; Hickerson et al., 2009). Each species is treated as a replicate sample of the underlying processes responsible for observed genetic patterns. Several well-documented geological processes have produced concordant patterns of genetic structure and historical demography, including Pleistocene glaciation in Europe (Hewitt, 2000; Taberlet et al., 1998) and northwestern North America (Brunsfeld et al., 2001; Carstens et al., 2005), the closure of the Central American seaway (Hickerson et al., 2006; Knowlton et al., 1993; Lessios, 2008), and the rise of the Andes (Burney and Brumfield, 2009). However, congruent population genetic patterns may arise due to different processes occurring at different times (i.e., pseudocongruence; Cunningham and Collins, 1994) and similar patterns of subdivision may not accurately reflect a shared history of co-occurring taxa.

Just as multiple co-occurring species may afford replication for inferring common historical events acting in a region, multiple genetic markers allow replicate samples of the demographic history of particular species (Brito and Edwards, 2009). By combining markers, researchers make the tacit assumption that those markers are behaving in a similar fashion. However, loci, like taxa, may conflict. A major source of this conflict is the inherent stochasticity in the time of lineage sorting for each marker (Hudson and Turelli, 2003). Different estimates of demographic history can also result when markers differ from one another in the mechanisms affecting their evolution, such as mode of transmission, effective population size, or rates of recombination (Brito and Edwards, 2009; Graur and Li, 2000; Hare, 2001; Zhang and Hewitt, 2003).

* Reprinted by permission of Evolution

The most common example of this in studies is the difference between nuclear and mitochondrial sequence markers. The former is transmitted biparentally and inter-locus recombination should mean that most nuclear markers provide replicate estimates of a common demography, while the latter is transmitted maternally as a single non-recombining block. Given the power afforded by the small effective population size of mitochondrial DNA (Moore, 1995) and the expense and effort required to survey multiple nuclear markers, the argument has been made that mitochondrial DNA offers more than enough power to address most questions (Barrowclough and Zink, 2009). However, as cost concerns recede with the advent of next-generation sequencing (Hudson, 2008), concerns about factors which could confound inferences provided by mitochondrial DNA (e.g. non-neutrality, extreme rate variation, recombination) have led some (Galtier et al., 2009) to suggest that mitochondrial DNA “is the worst marker” for population genetics and should not be used at all.

In theory, mitochondrial and nuclear DNA should be able to complement each other in demographic studies. The smaller effective population size of mitochondrial DNA should allow it to capture the signal of demographic events that cannot leave their footprints on the larger effective population size of nuclear markers. The strength of nuclear DNA lies in its ability to provide replicate samples of the underlying demographic history affecting the genome of an organism as well as replicate samples of the coalescent process. For this reason, sampling multiple nuclear markers can substantially reduce the variance of parameter estimates (Brito and Edwards, 2009; Carling and Brumfield, 2007; Felsenstein, 2006; Hey, 2010; Lee and Edwards, 2008). Discussions of the relative merits of mitochondrial and nuclear markers in phylogeography have centered on two different, but complementary, goals: to identify clades of populations (and thus phylogeographic breaks or cryptic species) using gene trees, and to reconstruct historical demography (Barrowclough and Zink, 2009; Edwards and Bensch, 2009; Zink and Barrowclough, 2008). While mitochondrial DNA is useful in delimiting geographically restricted clades, its power to estimate demographic parameters on its own is poor. The opposite is true of nuclear DNA. Combining both marker types allows investigators to identify clades and then estimate parameters of interest, such as migration rates (Barrowclough and Zink, 2009; Lee and Edwards, 2008). However, simple combination does not admit the possibility that mitochondrial and nuclear DNA can reveal different demographic events. In practice, such a dual gene class approach to inferring historical demography requires robust substitution rate estimates for both types of markers to reconcile them into a single time frame. Incorrect estimates of substitution rates can severely bias parameter estimates such as divergence times, population size changes, and migration, among others. Here I explore congruence of marker types, phylogeographic patterns, and demographic inferences between co-occurring taxa from a genus of reef fishes that contains sister taxa to either side of the Isthmus of Panama, allowing the calibration of taxon-specific substitution rates.

Acanthemblemaria is a genus of blennies occurring on both sides of the Isthmus of Panama and throughout tropical and sub-tropical waters of the western Atlantic and eastern Pacific. They are members of the Family Chaenopsidae, one of only two coral reef fish families with an exclusively Neotropical distribution (Stephens, 1963). The Western Atlantic members of the genus occur throughout the Caribbean basin, the Bahamas, and peninsular Florida (Smith-Vaniz and Palacio, 1974). All members in the genus are small (~1.2-3.5 cm SL) and are obligate dwellers of vacated invertebrate holes on shallow (<1 - ~22 meters) rocky and coral reefs (Böhlke and Chaplin, 1993; Clarke, 1994).

Acanthemblemaria aspera and A. spinosa are found throughout the Caribbean and the Bahamas and co-occur over large portions of their respective ranges (Fig. 2.1). The two species are closely related (Hastings, 1990; Eytan et al., unpub. data), share the same mating system (male resource defense polygyny; Hastings, 2002) and pelagic larval duration (21-24 days; Johnson and Brothers, 1989), and overlap ecologically (Clarke, 1994). The two species differ in microhabitat use, though. A. spinosa lives in shelters in live and standing dead coral off the reef surface (high-profile habitat). A. aspera is found at the base of standing dead corals or in coral rubble (low-profile habitat), sometimes at the base of corals housing A. spinosa (Clarke, 1994).

The differences in microhabitat use and specialization give the two species different propensities to go locally extinct. Clarke (1996) found that A. spinosa populations in St. Croix went locally extinct due to habitat degradation, specifically the destruction of standing dead Acropora palmata corals. The resulting coral rubble allowed A. aspera populations to increase in size. This type of population dynamic may lead to discordant demographic cycles, both between species and between populations. I hypothesize that if the two species do differ in historical demography, despite similar life histories, the contrast in microhabitat requirements for the two blenny species should favor population persistence in A. aspera rather than A. spinosa, as the latter is more prone to local extinction. However, it is not clear whether the population dynamics observed at ecological time scales will extend to evolutionary ones.

I collected mitochondrial and nuclear sequence data for both species from populations where they co-occur and analyzed the data with substitution rates calculated using a relaxed molecular clock. I then tested the expectation of phylogeographic and demographic concordance among co-distributed taxa. A. aspera and A. spinosa are good candidates for this test because of their similar life histories, close relationship, and nearly identical geographic distributions, but potential differences in historical demography due to different microhabitat use.


Materials and Methods

Sample Collection

Only individuals from populations where both species co-occur were used in the current study (Fig. 2.1 and Appendix 1 Table 1). Samples from species used as outgroups for analysis of substitution rates were collected from Panama and Belize (Appendix 1 Table 1). Samples were collected on SCUBA. A dilute solution of quinaldine sulfate was squirted into the blenny hole. A small glass vial was then placed over the hole, into which the fish immediately swam. Photo vouchers from freshly collected specimens are available from RIE for a subset of these individuals by request. Whole fishes were stored individually in 95% ethanol at -80° C.

Figure 2.1 Confirmed locations of A. aspera and A. spinosa populations in the Caribbean (Smith-Vaniz and Palacio, 1974 and RIE, pers. obs.). The localities sampled for this study are listed.

DNA Extraction, PCR and Sequencing

DNA was extracted from 11-16 individuals of each species from each of six populations using the Qiagen QIAamp DNA Mini Kit. DNA was extracted from a total of 84 individuals from each species, as well as from a single individual each of A. paula, A. betinensis, and A. exilispinus.

The polymerase chain reaction (PCR) was performed on a PTC-100 or 200 (MJ Research) to amplify three genetic markers: the protein-coding genes mitochondrial cytochrome oxidase I (COI) and nuclear recombination-activating gene 1 (rag1), and intron V of nuclear alpha-tropomyosin (atrop). The primers used, primer references, and PCR conditions can be found in Appendix 1 Table 2. Amplicons were purified with a Strataprep PCR Purification Kit (Stratagene) or directly sequenced without cleanup in both directions on an ABI 3100 or 3130 XL automated sequencer using 1/8 reactions of BigDye Terminators (V3.1, Applied Biosystems) and the amplification primers.

Alignment and Phasing

Sequencing reactions for rag1 and COI produced clear reads 539 and 704 bp long, respectively. Reads for atrop were 440 bp, 427 bp, 423 bp, and 418 bp long in A. spinosa, A. paula, A. betinensis, and A. exilispinus, respectively. The atrop sequences for A. aspera had length variants (425, 427 and 429 bp) as well as indel-hets, with the latter obscuring reads. Indel heterozygotes containing a single indel were resolved using CHAMPURU (Flot, 2007). Sequences containing more than one indel-het were cloned to resolve constituent allelic sequences using the Invitrogen TOPO TA Cloning Kit for Sequencing. The initial direct sequences were always used in determining allelic sequences from cloned DNA to avoid scoring any changes that resulted from errors introduced by the PCR.

The COI and rag1 sequences contained no gaps and were aligned using MUSCLE (Edgar, 2004), implemented in Geneious v 4.5.4 (Drummond et al., 2009), as were the intraspecific atrop datasets. The interspecific atrop alignment, consisting of one sequence each from A. spinosa, A. paula, A. betinensis, A. exilispinus, and A. aspera, was more difficult to align due to length polymorphisms. BAli-Phy v. 2.0.1 (Suchard and Redelings, 2006) was used to align the atrop sequences using the HKY substitution model, gamma distributed rate variation, and the default indel model. BAli-Phy was run four times to ensure concordance among runs. The final output from each run was separately analyzed, with all the samples before convergence discarded as burnin. The consensus alignment from the run with the highest posterior probability was used for subsequent analyses.

Alleles from sequences with multiple heterozygous single nucleotide polymorphisms (SNPs) were resolved using PHASE v2.1.1 (Stephens and Donnelly, 2003; Stephens and Scheet, 2005; Stephens et al., 2001). Input files were prepared for PHASE using the online software package SeqPHASE (Flot, 2009). Alleles determined by cloning heterozygotes or as output from CHAMPURU were used to create a “known” file for PHASE. A default probability threshold of 0.9 was used for all runs.

After initial PHASE runs, all datasets contained some individuals with unresolved SNPs. I cloned a subset of these individuals to directly determine their haplotype phase. The direct haplotype observations were then added to the “known” file and the datasets were reanalyzed. Final datasets for each gene and species contained no more than 3 individuals for which the phase of a single SNP was not resolved to 0.9 (6 total for rag1, 4 for atrop). After alignment and phasing of heterozygous SNPs, the final dataset contained 336 nuclear and 84 mitochondrial alleles for each species. The sequences have been submitted to GenBank with the accession numbers HM196865-HM197713.
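The allele totals reported above follow from the sampling design: 84 individuals per species, two diploid nuclear loci (rag1 and atrop), and one haploid mitochondrial locus (COI). A minimal sketch of that bookkeeping follows; the function name and arguments are illustrative and are not part of the original analysis.

```python
def allele_counts(individuals, nuclear_loci, mito_loci=1):
    """Return (nuclear, mitochondrial) allele totals for one species.

    Assumes every individual was phased at every nuclear locus
    (two alleles per diploid locus) and carries a single
    mitochondrial haplotype per mitochondrial locus.
    """
    nuclear = individuals * nuclear_loci * 2
    mitochondrial = individuals * mito_loci
    return nuclear, mitochondrial


# 84 individuals per species; rag1 and atrop are the two nuclear loci.
nuclear, mitochondrial = allele_counts(individuals=84, nuclear_loci=2)
print(nuclear, mitochondrial)
```

With the values from the text, this gives 336 nuclear and 84 mitochondrial alleles per species.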

Haplotype Networks

Parsimony networks were constructed for COI for both species using TCS v1.21 (Clement et al., 2000). Networks for the COI gene in A. spinosa failed to connect at the 95% confidence level so the connection limit was fixed at 70 in order to connect populations. Although this reduces the confidence of the resulting networks, the main purpose is to visualize the number of inferred mutations between populations, rather than infer relationships among populations.

Recombination

I tested for recombination using the tree-based SBP and GARD methods of Kosakovsky-Pond et al. (Pond et al., 2006) implemented online via the Datamonkey webserver (Pond and Frost, 2005) using the model of sequence evolution chosen by Datamonkey using the AIC criterion, with gamma distributed rate variation.

The GARD and SBP tests failed to find recombination in any of the pooled population datasets for any of the markers of either species. Recombination was detected via GARD and SBP for one population level dataset: the rag1 alignment for Belize and Honduras in A. spinosa. Recombination was detected with a p value = 0.1 at position 192. I performed all analyses involving the Belize and Honduras rag1 alignment twice, both for the whole alignment and for positions 1-192. The results of those analyses did not differ from those using the full alignment (not shown).

Population Structure

Sequences were collapsed into haplotypes using FaBox (Villesen, 2007) and treated as alleles based solely on identity, not the genetic distance between the haplotypes. Genetic subdivision was measured by pairwise ΦST values among populations, calculated using analysis of molecular variance (AMOVA) (Excoffier et al., 1992; Michalakis and Excoffier, 1996) as implemented in GenoDive v2.0b15.1 (Meirmans and Van Tienderen, 2004); for this purpose these values are equivalent to Weir and Cockerham’s θ (Weir and Cockerham, 1984). To allow meaningful comparisons between species and markers and to correct for high levels of variation within populations, which necessarily reduce measures of the proportion of variation partitioned among populations (Hedrick, 2005), the ΦST measures were estimated using a standardizing procedure (Meirmans, 2006) implemented by GenoDive. The significance of all comparisons was tested by permutation.
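Collapsing sequences into haplotypes by identity alone can be sketched as follows. This is an illustration of the idea rather than the FaBox implementation; the function names and the toy sequences are hypothetical.

```python
from collections import Counter


def collapse_haplotypes(sequences):
    """Count how many times each distinct aligned sequence occurs."""
    return Counter(seq.upper() for seq in sequences)


def allele_labels(sequences):
    """Label each sequence with an integer allele code.

    Two sequences receive the same code only if they are identical;
    the number of differences between haplotypes is ignored.
    """
    codes = {}
    labels = []
    for seq in sequences:
        key = seq.upper()
        if key not in codes:
            codes[key] = len(codes) + 1
        labels.append(codes[key])
    return labels


# Hypothetical toy alignment: the first and third sequences are identical.
toy = ["ACGT", "ACGA", "acgt", "TTGA"]
print(collapse_haplotypes(toy))
print(allele_labels(toy))
```

Because the labels carry no distance information, a haplotype that differs at one site is treated the same as one that differs at many.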

To detect differentiated populations (k) without the need to define populations a priori, and to determine if A. aspera and A. spinosa have congruent patterns of genetic differentiation, a Bayesian clustering analysis was implemented in STRUCTURE v2.3 (Pritchard et al., 2000) using the admixture model with uncorrelated allele frequencies. A recent extension to the STRUCTURE method (Hubisz et al., 2009), which uses sampling locality as a prior, was employed. The use of this prior does not tend to find population structure when none is present (Hubisz et al., 2009), but has been recommended for situations (like ours) where available data are limited (Pritchard et al., 2009).

I performed 10 replicate runs for k values between 1 (no population differentiation) and 6 (the maximum number of populations sampled). Each replicate was run for 10 million iterations following an initial burnin of 100,000 iterations. Best estimates of k were inferred using the method of Evanno et al. (2005) as implemented in STRUCTURE Harvester (Earl, 2009). The output files for the best estimate of k were then processed in CLUMPP (Jakobsson and Rosenberg, 2007) using the default parameters.
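The Evanno et al. (2005) statistic, ΔK, is the absolute second-order rate of change of the mean log likelihood across successive k values, divided by the standard deviation of the log likelihood among replicate runs at that k. The sketch below shows one common way to compute it from replicate means. It is not the STRUCTURE Harvester code, and the function name and toy likelihood values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """Compute Evanno's delta K.

    lnp maps each k to a list of ln Pr(X|K) values, one per replicate run.
    Delta K is defined only for k values that have both neighbours
    (k - 1 and k + 1), so the smallest and largest k are skipped.
    """
    means = {k: mean(values) for k, values in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 in lnp and k + 1 in lnp:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            # stdev is zero if all replicates agree exactly; real data
            # from independent MCMC runs will normally vary.
            result[k] = second_diff / stdev(lnp[k])
    return result


# Hypothetical log likelihoods for k = 1..3, two replicates each.
toy = {1: [-1000.0, -1002.0], 2: [-900.0, -902.0], 3: [-890.0, -894.0]}
print(delta_k(toy))
```

One consequence of this definition is that ΔK cannot be evaluated at k = 1, so the method cannot by itself support the absence of population structure.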

The Evanno et al. analysis identified two clusters for both species, but the initial bar plots from these runs showed four clusters (not shown). The Evanno et al. method for determining k is biased towards detecting the highest level of population structure in datasets with hierarchical population structure (Waples and Gaggiotti, 2006). This appeared to be the case here, as the initial k = 2 corresponds to a split between populations in the eastern and western Caribbean (data not shown). STRUCTURE was run again to determine if additional structure was present in the two identified clusters. The dataset for each species was split to represent membership in the two detected clusters and each was analyzed with the original run conditions, with the exception that log Pr(X|K) was estimated for k = 1-3.

Pairwise Sequence Divergence

The net average pairwise sequence divergence between each pair of populations for each species and each gene was calculated in MEGA v4.0 (Tamura et al., 2007) using the model of sequence evolution selected by the AIC in jModelTest (Posada, 2008). The models selected by the AIC for each marker and analysis can be found in Appendix 1 Table 3. If the chosen model was not available in MEGA, the next less complex model was used. To aid in direct comparisons, all models of sequence evolution for each marker were the same for both species, using the less complex model of sequence evolution when these differed.
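Net divergence between two populations X and Y is the mean between-population distance minus the average of the two mean within-population distances, d_A = d_XY - (d_X + d_Y)/2. The sketch below illustrates the calculation using uncorrected p-distances as a simplifying assumption; the analyses described above used model-corrected distances in MEGA. The function names and toy sequences are hypothetical.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(pop):
    """Mean pairwise distance among sequences within one population."""
    pairs = list(combinations(pop, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop_x, pop_y):
    """Mean pairwise distance over all between-population comparisons."""
    pairs = list(product(pop_x, pop_y))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def net_divergence(pop_x, pop_y):
    """d_A = d_XY - (d_X + d_Y) / 2."""
    within = (mean_within(pop_x) + mean_within(pop_y)) / 2
    return mean_between(pop_x, pop_y) - within


# Hypothetical toy populations of four-site sequences.
pop_x = ["AAAA", "AAAT"]
pop_y = ["TTAA", "TTAT"]
print(net_divergence(pop_x, pop_y))
```

Subtracting the within-population term removes the portion of the between-population distance that reflects polymorphism already segregating in each population.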

Gene Tree Construction

Gene trees for each species for each of the three markers were constructed in BEAST v1.4.8 (Drummond and Rambaut, 2007) using a constant size coalescent prior and a strict molecular clock. These were constructed for use in the GMYC analysis (see below), which requires rooted ultrametric gene trees. The BEAST analyses do not require the inclusion of an outgroup to root the tree because a molecular clock is enforced (Drummond and Rambaut, 2007; Huelsenbeck et al., 2002a).

MCMC analyses were run four times for either 10,000,000 steps (COI data) or 25,000,000 steps (nuclear markers), sampling every 1,000 steps for all, for a total of 10,000 and 25,000 trees, respectively. Convergence onto the posterior distribution was assessed using two methods. The first was visual inspection of traces in Tracer v1.4.1 and a check for concordance between runs. All parameters had effective sample size (ESS) values > 250. The second method for assessing convergence was Are We There Yet? (AWTY) (Nylander et al., 2008). Cumulative posterior probability plots were inferred using the cumulative function. Posterior probability estimates for each clade were compared between the four MCMC runs by producing scatter plots using the compare function. One fourth of the sampled trees were discarded as burnin for each test. The cumulative posterior probability plots indicated that all runs had stabilized, while the scatter plots showed that posterior probabilities for clades were similar for all compared runs. The maximum clade credibility (MCC) tree was calculated for each gene tree using TreeAnnotator v1.4.8 with a burnin of 6,250 for the two nuclear markers and 2,500 for COI.
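The tree counts and burnin values above follow directly from the chain lengths, the sampling interval, and the one-fourth burnin fraction. A short sketch of that arithmetic is given below; the function name is illustrative only.

```python
def mcmc_samples(chain_length, sample_every, burnin_fraction=0.25):
    """Return (trees sampled, trees discarded as burnin, trees retained)."""
    n_trees = chain_length // sample_every
    burnin = int(n_trees * burnin_fraction)
    return n_trees, burnin, n_trees - burnin


# Chain lengths and sampling interval as reported in the text.
coi = mcmc_samples(10_000_000, 1_000)
nuclear = mcmc_samples(25_000_000, 1_000)
print("COI:", coi)
print("nuclear:", nuclear)
```

This reproduces the 10,000 and 25,000 sampled trees and the burnin values of 2,500 and 6,250 passed to TreeAnnotator.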

Substitution Rate and Divergence Time Estimates

I estimated substitution rates and divergence times using a relaxed clock approach that allows rates to vary among species and divergence times to follow a probabilistic distribution rather than relying on point estimates. The use of a relaxed molecular clock is not appropriate for the analysis of intraspecific data, as the coalescent employs a strict clock (Hein et al., 2005). Any differences in branch lengths in a coalescent tree would only be caused by the variance of the Poisson process describing the number of mutations along a branch and not variation in substitution rates among lineages (Hein et al., 2005). Once all gene copies have coalesced, this restriction is relaxed and rates can vary among lineages.

I determined whether all gene copies had coalesced in individual populations by using the Generalized Mixed Yule Coalescent (GMYC) model (Pons et al., 2006). The GMYC analysis divides a single locus gene tree into a portion where a Yule speciation process affects the branch lengths and a portion where there is a shift to a coalescent branching process, using this boundary to define species. Here, it was used to detect the presence of a transition point from a coalescent to a Yule process in order to determine whether the use of a relaxed molecular clock was appropriate. If gene copies were found to belong to distinct clusters, then representative alleles from each cluster were used, while if no clustering was found, then a single representative allele for each species was used. The MCC trees inferred in BEAST were used for the GMYC model, which was implemented in R (R Development Core Team, 2009) using the SPLITS package (available at CRAN repository) and the single threshold model.

The substitution rates for the three genes were determined using a relaxed clock analysis in BEAST v1.4.8. Based on the results of the GMYC analysis, a tree was built including a single representative nuclear gene sequence for A. aspera, A. spinosa, A. paula, A. betinensis, and A. exilispinus (except for COI, which had 5 A. spinosa sequences representing populations delimited by SPLITS as having coalesced). A. paula was included because it is hypothesized to be sister to A. aspera (Hastings, 1990; Eytan et al., unpub. data) and could break a possible long branch leading to A. aspera. An exponential prior was placed on the time to most recent common ancestor (TMRCA) of the transisthmian geminate pair of A. betinensis and A. exilispinus. A mean of 7 million years with a zero offset of 3.1 million years was used for the exponential prior, which translates into a distribution with lower and upper 95% bounds of 3.28 and 28.92 million years, respectively. The offset of this prior represents the latest possible divergence between the pair, while the distribution also allows for the possibility that divergence may have occurred before the final closure of the isthmus, although with a decreasing probability further back in time. The uncorrelated lognormal relaxed clock (Drummond et al., 2006) was used to estimate branch rates. The nucleotide substitution model GTR + Γ was used for COI and HKY for the two nuclear markers.
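The 95% bounds quoted for the calibration prior can be recovered from the quantile function of an offset exponential distribution. The following is a minimal sketch, assuming the stated mean of 7 million years applies to the exponential before the 3.1 million year offset is added; the function name is illustrative.

```python
import math


def offset_exponential_quantile(p, mean, offset):
    """Quantile p of an exponential distribution with the given mean,
    shifted to the right by `offset`."""
    return offset - mean * math.log(1.0 - p)


# Mean and offset (in millions of years) as stated in the text.
lower = offset_exponential_quantile(0.025, mean=7.0, offset=3.1)
upper = offset_exponential_quantile(0.975, mean=7.0, offset=3.1)

print(f"2.5% quantile:  {lower:.2f} My")
print(f"97.5% quantile: {upper:.2f} My")
```

Under that assumption, the 2.5% and 97.5% quantiles round to the 3.28 and 28.92 million years reported for the prior.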

Each dataset was also run using the Kimura 2 parameter (K2P) model so that rate estimates would be directly comparable to those for other transisthmian species pairs in Lessios (2008). Further analyses were also done using both the GTR + Γ or HKY model and the K2P model under fixed substitution rates to determine what inferred transisthmian divergence times would be assuming previously published rates of molecular evolution in geminate coral reef fishes. These rates were taken from Table 3 in Lessios (2008) for species pairs assumed to have begun divergence at the final closure of the isthmus. Upper and lower values, where present, were used. The two rates used for COI were 1.03% per million years and 1.77% per million years. The rate used for rag1 was 0.097% per million years. No atrop data were included in Lessios (2008).

Two MCMC searches were conducted for each dataset with a Yule speciation prior on the gene tree for 10,000,000 (COI) or 25,000,000 (nuclear genes) generations. The log files from the runs were inspected using TRACER v1.4.1 to check for convergence in the Markov chain. Maximum clade credibility trees were constructed using TreeAnnotator v1.4.8. The mean rate for each marker and the inferred transisthmian divergence time, with their 95% upper and lower highest posterior densities as provided in TRACER, were recorded.

Demographic Reconstruction

To detect departures from a constant population size or neutrality, I used the summary statistics Fs (Fu, 1997) and R2 (Ramos-Onsins and Rozas, 2002), which have the greatest power to reveal population growth (Ramos-Onsins and Rozas, 2002). Large negative values of Fs and small positive values of R2 indicate population growth. I also used Tajima’s D (Tajima, 1989), as it also has good power to detect population growth (Ramos-Onsins and Rozas, 2002) but has the added benefit of being a two-tailed test. Significantly negative values of Tajima’s D indicate population growth (or a selective sweep), while significantly positive values are a signature of genetic subdivision, population contraction, or diversifying selection. The Fs, R2, and Tajima’s D tests were all implemented in DnaSP v5.1 (Librado and Rozas, 2009) for each gene, for each population, and for all populations combined, for each of the two species. The significance of all the tests was determined by 1000 coalescent simulations, also implemented in DnaSP.
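To make the sign convention for Tajima's D concrete, the sketch below computes the statistic from a small alignment using the standard formula of Tajima (1989). It is an illustration only, not the DnaSP implementation: it assumes equal-length sequences with no gaps or missing data, the toy alignments are invented, and it does not assess significance (which in this study came from coalescent simulations).

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences (no gaps)."""
    n = len(seqs)
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return None  # D is undefined without segregating sites
    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Invented toy data: an excess of singletons, as expected after expansion.
star_like = ["AAAAAA", "TAAAAA", "ATAAAA", "AATAAA", "AAATAA", "AAAATA"]
# Invented toy data: two deeply divergent groups, as under subdivision.
subdivided = ["AAAA", "AAAA", "AAAA", "TTTT", "TTTT", "TTTT"]

print("star-like D: ", tajimas_d(star_like))
print("subdivided D:", tajimas_d(subdivided))
```

The singleton-rich alignment gives a negative D and the two-group alignment a positive D, matching the interpretation given above.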

I reconstructed the historical demography of each species by using the GMRF skyride plot (Minin et al., 2008), implemented in BEAST v1.5.2. The GMRF skyride plot is a non-parametric analysis that uses the waiting time between coalescent events in a gene tree to estimate changes in effective population size over time. It differs from the related Bayesian skyline plot (Drummond et al., 2005) by not requiring the specification of a user defined prior on the number of population size changes in the history of the sample.

GMRF skyride plots using time-aware smoothing were constructed for each population as demarcated by the STRUCTURE analyses, each gene, and all populations combined, for both species. Rates of molecular evolution for each gene were fixed at the values obtained in the substitution rate estimation analysis. Each dataset was run twice for 10 million generations, sampling every 1,000, except for the all populations combined datasets, which were run for 100 million generations, sampling every 10,000. Output files were checked in Tracer and all ESS values were greater than 250. Bayesian skyride plots were then visualized in Tracer. Population size changes were deemed significant if the upper and lower 95% confidence intervals at the root of the plot did not overlap with those at the tips.
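The significance criterion for skyride plots amounts to an interval-overlap check. A minimal sketch follows; the interval values are invented for illustration and are not taken from the analyses, and the function names are hypothetical.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) overlap."""
    (a_low, a_high), (b_low, b_high) = a, b
    return a_low <= b_high and b_low <= a_high


def size_change_is_significant(root_interval, tip_interval):
    """Apply the criterion used here: the 95% interval on effective
    population size at the root must not overlap the one at the tips."""
    return not intervals_overlap(root_interval, tip_interval)


# Hypothetical log10(Ne) intervals, for illustration only.
expansion = size_change_is_significant((5.0, 6.2), (6.8, 7.5))
no_change = size_change_is_significant((5.0, 7.0), (6.8, 7.5))

print("non-overlapping intervals ->", expansion)
print("overlapping intervals     ->", no_change)
```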

Results

Similar Patterns of Population Subdivision for A. aspera and A. spinosa

A. aspera and A. spinosa had largely congruent patterns of population subdivision. Pairwise ΦST values showed that both species share few, if any, COI alleles among populations, with most populations composed entirely of private alleles (except for Belize – Honduras and Puerto Rico – St. Thomas for A. aspera and St. Thomas – Puerto Rico for A. spinosa) (Table 2.1). All pairwise COI ΦST values were significant, except in the case of Puerto Rico – St. Thomas for A. aspera.

While most A. spinosa populations did not share any nuclear alleles, all A. aspera populations did. The majority of pairwise comparisons for A. spinosa (9 out of 15) had a corrected ΦST of 1, but none did for A. aspera. The significance of pairwise ΦST values also differed between marker type and between species. Whereas all but one pairwise COI comparison was significant for A. aspera, 4 out of 15 comparisons using nuclear markers were not significant at p = 0.05. This was in contrast to A. spinosa, where all nuclear ΦST values were significant save one, Belize – Honduras, which also lacked significance in A. aspera.

Table 2.1 Corrected ΦST values for A. aspera (above diagonal) and A. spinosa (below diagonal). * - All ΦST values are significant at p = 0.05 except where marked by an asterisk.

The STRUCTURE analyses (Fig. 2.2) confirmed the presence of hierarchical population structure and recovered the same four clusters for both species, corresponding to the Bahamas, Belize and Honduras, Puerto Rico and St. Thomas, and St. Maarten. For each of the four reduced datasets, the Evanno et al. method selected k = 2. However, in the case of A. aspera there was multimodality in the assignment of individuals from St. Maarten to clusters (see Figure 2, where there is a probability of 0.6 and 0.4 for assignment of St. Maarten individuals to either the Puerto Rico – St. Thomas cluster or to a separate St. Maarten cluster, respectively). This suggested that k may equal 3 for A. aspera. However, in addition to the results from the Evanno et al. test, the values of L(K) for the eastern Caribbean A. aspera dataset were at their lowest at k = 1 and highest at k = 2 (Appendix 1 Figure 1).

Figure 2.2 Graphical summary of the results from the STRUCTURE analysis for k = 4 for A) A. aspera and B) A. spinosa. Each individual is represented by a vertical line broken into four colored segments to represent the estimated proportions of that individual’s genome originating from each of the four inferred clusters.

The COI haplotype networks also showed that geographic structuring of populations is largely congruent between the two species (Fig. 2.3). For both species, most populations were reciprocally monophyletic, with the exception of Belize and Honduras. There, A. spinosa did not share alleles among populations, but A. aspera did.

Large Differences in Population Divergence among Species

While overall measures of genetic subdivision were similar for the two species, the degree of genetic divergence among populations was not: A. spinosa had many more inferred mutations between populations than A. aspera (Table 2.2, Fig. 2.3). For the two A. aspera populations that had the largest COI genetic distances between them, Honduras and Puerto Rico/St. Thomas, pairwise sequence differences based on the number of inferred mutations in the haplotype network were 10 times greater for A. spinosa COI than for A. aspera (Fig. 2.3). The true ratio may be even greater than this conservative estimate, because the number of mutations inferred for A. spinosa using the TCS analysis did not account for the possibility of multiple mutations at the same site. Model-corrected estimates of net average pairwise sequence divergence between pairs of populations (Table 2.2) were significantly higher in A. spinosa than in A. aspera for all markers (Mann-Whitney U-test, p < 0.05 for each). The ratio of mean pairwise genetic distance in A. spinosa compared to A. aspera varied from 19.76 for rag1 to 25.02 for atrop, with COI at 22.57.

Gene Trees and Branching Processes Are Different for A. aspera and A. spinosa

The gene trees derived from the mitochondrial data recovered a pattern of subdivision similar to that in the STRUCTURE analyses, but the trees constructed from the nuclear markers did not (Figure 2.4). The A. spinosa COI gene tree recovered five reciprocally monophyletic clades, with all but the Bahamas having good support (BPP > 0.9). Unlike the STRUCTURE results, Belize and Honduras were reciprocally monophyletic. The A. aspera COI gene tree supported all clusters recovered in the STRUCTURE analysis (with the exception of the Bahamas) as monophyletic with good support.

Figure 2.3 COI haplotype networks for A) A. aspera and B) A. spinosa. Circle color indicates population and size is proportional to the number of individuals sharing that haplotype. Haplotypes shared by >1 individual are marked with the sample size. Black dots on branches are inferred mutations.

The A. spinosa atrop tree recovered a well-supported western Caribbean clade, but eastern Caribbean populations were paraphyletic (Figure 2.4). Neither the A. spinosa rag1 tree nor either of the nuclear DNA trees for A. aspera had any well-supported nodes that correspond to geography. Here, the eastern and western Caribbean clades are defined as being all populations to the east and west of the Mona Passage, respectively, as in Baums et al. (2005) and Taylor and Hellberg (2006), with the western Caribbean clade including the Bahamas.

The gene trees constructed for each marker for both species were used for the GMYC analyses. The results indicated that gene copies had not coalesced within separate populations in A. aspera; there was only one cluster present for all A. aspera markers (LRT p > 0.05). Based on those results, one representative A. aspera allele from each gene was used for the substitution rate estimates.

Table 2.2 Model-corrected net average pairwise sequence divergence between each pair of populations for A. aspera (above diagonal) and A. spinosa (below diagonal).

In A. spinosa, the COI gene copies coalesced within five of the six populations (Puerto Rico and St. Thomas gene copies coalesced together) (LRT p = 0.0028). This indicated that for COI there was a transition from a coalescent to a Yule branching process. For that reason, five representative COI sequences for A. spinosa, corresponding to the five delimited clusters from the GMYC analysis, were used for substitution rate estimates. The GMYC analysis did not indicate any population level clustering for the A. spinosa nuclear genes (LRT p > 0.05), so one representative allele for each of the two genes was used for substitution rate estimates.

Figure 2.4 Gene trees for each marker for A. aspera and A. spinosa inferred in BEAST. Colors at tips of branches indicate population origin for each allele. Stars represent nodes with greater than 0.90 BPP.

Mitochondrial Substitution Rates Are up to 37X Faster Than for Nuclear DNA

Substitution rate estimates revealed very rapid mitochondrial rates in Acanthemblemaria. Mitochondrial COI was 37.65X and 14.94X faster than rag1 and atrop, respectively. The mean and 95% upper and lower highest posterior density (HPD) for substitution rates across all taxa in the analysis, in substitutions/site/million years, for COI, atrop, and rag1 were 5.61 × 10⁻² (1.63 × 10⁻², 9.92 × 10⁻²), 4.65 × 10⁻³ (4.69 × 10⁻⁴, 1.02 × 10⁻²), and 1.45 × 10⁻³ (2.12 × 10⁻⁴, 2.87 × 10⁻³), respectively. The posterior distribution for the mean substitution rate for atrop was right-skewed, so the median rate (3.99 × 10⁻³) was used instead of the mean for all further analyses. These rates corresponded to values of 11.22, 0.798, and 0.298 percent sequence divergence per million years, respectively. The mean rate I have calculated for COI is over six times faster than the highest estimated COI substitution rate listed in Lessios (2008) (1.77% per million years) for geminate reef fish taxa assumed to have split at the final closure of the Central American Isthmus. For rag1, my mean rate is ~3 times faster than that in Lessios (2008) (0.097% per million years), although the lower bound of the posterior distribution of my estimate overlaps with that of Lessios.
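The percent divergence figures for COI and atrop are consistent with doubling the per-lineage rate (two lineages accumulate differences after a split) and expressing the result as a percentage. The sketch below assumes that conversion, which is inferred from the reported numbers rather than stated explicitly; the function name is illustrative.

```python
def percent_divergence_per_my(rate_per_lineage):
    """Convert substitutions/site/My along one lineage into percent
    pairwise sequence divergence per My (two lineages, times 100)."""
    return rate_per_lineage * 2 * 100


coi_pct = percent_divergence_per_my(5.61e-2)    # COI mean rate
atrop_pct = percent_divergence_per_my(3.99e-3)  # atrop median rate

# Fold difference relative to the fastest COI rate in Lessios (2008).
fold_vs_lessios = coi_pct / 1.77

print(f"COI:   {coi_pct:.2f}% per My")
print(f"atrop: {atrop_pct:.3f}% per My")
print(f"COI vs. 1.77% per My: {fold_vs_lessios:.1f}x")
```

The fold difference of about 6.3 corresponds to the "over six times faster" comparison made in the text.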

Transisthmian Divergence Time Estimates Are Concordant among Markers and Robust to the Model of Sequence Evolution

Estimates of divergence time for the transisthmian species pair A. betinensis and A. exilispinus agreed among markers and revealed a split before the final closure of the Isthmus of Panama. The mean and 95% HPDs for dates across the isthmus (TMRCA for A. betinensis and A. exilispinus) were 4.63 my (3.10, 8.17) for COI, 4.62 my (3.10, 8.14) for atrop, and 4.67 my (3.10, 8.41) for rag1. These estimated values showed that the divergence time estimates did not simply return the calibration prior for the divergence date (7.0 my (3.28, 28.92)) specified in BEAST.

Published estimates for COI substitution rates from Lessios (2008) recovered significantly older dates when they were used to estimate divergence across the isthmus, but the use of the published rag1 rate did not. For the slower COI rate from Lessios, 1.03% per million years, the mean and upper and lower 95% HPDs for inferred divergence times using the GTR + Γ model selected by jModelTest were 37.84 (25.94, 50.93) mya. When the K2P model was used, dates of 20.31 (16.64, 23.93) mya were recovered. For the faster substitution rate, 1.77% per million years, divergence times when using GTR + Γ were 21.92 (15.08, 29.58) mya. The dates inferred when using K2P were 11.82 (9.72, 14.012) mya. When the rag1 rate from Lessios (2008) (0.097% per million years) was used with the K2P model, the inferred mean and upper and lower 95% HPDs for the divergence date of A. exilispinus and A. betinensis were 7.25 (1.60, 14.29) mya.

Mean divergence times for A. exilispinus and A. betinensis did not differ from those inferred using the exponential calibration prior when the K2P model from Lessios (2008) was used, and neither did mean substitution rate estimates. The mean, lower, and upper 95% HPDs for divergence time using the K2P model were 4.62 (3.1, 8.19) mya for COI, 4.64 (3.1, 8.21) mya for atrop, and 4.64 (3.1, 8.25) mya for rag1. The mean substitution rates using the K2P model for COI, atrop, and rag1 were 2.72 × 10⁻² (1.0 × 10⁻², 4.32 × 10⁻²), 4.05 × 10⁻³ (5.45 × 10⁻⁴, 1.05 × 10⁻²), and 1.47 × 10⁻³ (2.45 × 10⁻⁴, 2.87 × 10⁻³) substitutions/site/million years, respectively.

Demographic Reconstruction Reveals Multiple Expansions and Large Effective Population Sizes

Reconstructions of historical demography revealed a recent population expansion in A. aspera and older expansions in both species. Test results for population size change in A. aspera based on summary statistics differed between markers (Table 2.3). For the combined datasets, the two nuclear markers showed a significant signal of population expansion, while the COI data did not.

The results from individual A. aspera populations were also mixed. For COI, a strong signal of population expansion was only inferred for the Bahamas (although there was some support for an expansion in Honduras). Likewise, the atrop data only recovered a strong signal of expansion in a single population, St. Maarten (Table 2.3). However, for rag1, expansions were detected in five out of six individual populations, with the exception of the Bahamas.

In contrast to the results from the summary statistics, the Bayesian skyride tests did not recover significant size changes for individual A. aspera populations for any markers surveyed (Appendix 1 Figures 2-5). However, the skyride plots for the combined datasets recovered signals of significant size change for all three markers (Fig. 2.5), all in the form of population expansions.

The historical demography of A. spinosa inferred using frequency-based tests differed from that of A. aspera. While there was little support from the A. aspera COI data for expansions in individual populations, 4 out of 6 A. spinosa populations did show a signal of expansion (Table 2.3). However, as in A. aspera, the atrop data recovered a strong signal of expansion in only a single population (also in St. Maarten). In addition, two populations had significantly positive Tajima’s D values (Puerto Rico and St. Thomas A. spinosa for atrop) and may have undergone population declines, although that signal was not found for the other two A. spinosa markers (Table 2.3).

Signals of population size changes were found in the combined A. spinosa datasets. In the case of the COI data, a significant signal of population decline was found. However, that result may stem from the deep divergence at COI among A. spinosa populations, which would cause significant positive values of Tajima’s D. Of the two nuclear markers, only the rag1 data showed a signal of expansion in A. spinosa (Table 2.3). A recent meta-analysis by Wares (2010) found that Tajima’s D values calculated from mitochondrial sequence data in natural populations are biased towards a deviation from neutrality and towards negative values, which could give a false signal of population expansion or a selective sweep. I do not believe that the bias observed by Wares has influenced my results. There are no significantly negative Tajima’s D values in my results that are not supported by at least one of the other frequency-based tests of neutrality (Table 2.3). In addition, there were cases where a significant signal of expansion was detected by the other two tests, but not by using Tajima’s D.

As in A. aspera, the Bayesian skyride tests did not recover significant size changes for any individual populations for any marker (Supplementary Information). The skyride plots for the combined datasets recovered signals of significant size change for 2 of the 3 combined A. spinosa datasets, both in the form of a population expansion (Fig. 2.5). The third plot, COI, showed a signal of decline, which, as with the summary statistics, was likely due to the large genetic divergence among populations.

Table 2.3 Results of demographic tests using summary statistics. Significance was determined by the test itself (only in the case of Tajima’s D) and/or through coalescent simulations. Significant test results are in bold. Values for A. aspera COI are not available for Puerto Rico or St. Thomas because there are no segregating sites in either population.

Mitochondrial and nuclear markers recovered different root heights and times of population size changes both within and between species (Fig. 2.5). In the case of A. aspera, the atrop plot showed a signal of population expansion for nearly the entire history of the sample, as evidenced by the median effective population size. The rag1 dataset showed an older maximum root height than in atrop, although the 95% HPD estimates of the root height for the two genes overlapped. The rag1 dataset did not suggest population expansion until roughly 500,000 years ago, the same time as the inferred expansion using atrop, showing concordance between the markers.

Both A. spinosa nuclear genes recovered signals of expansions, but the inferred timing of the expansions and root heights differed between the markers. The atrop dataset showed a population expansion that began ~400,000 years ago and a maximum TMRCA of 690,000 years, neither of which was significantly different from A. aspera atrop. In contrast, the rag1 data showed an expansion that began earlier, 1.5 million years ago, and a significantly older maximum root height than atrop (and A. aspera rag1) of 2.1 million years. Individual population level comparisons (Appendix 1 Table 4) did not recover significantly different root heights between the species for any of the markers or populations.

The effective population sizes that were estimated from the datasets where all populations were combined did not differ significantly between species for any of the markers (Appendix 1 Table 4). The results from the nuclear markers showed that both species have large effective population sizes, over 15 million. Individual populations also had large effective sizes for both species (Appendix 1 Table 4), with median numbers of individuals ranging from 1.2 to 11.7 million, consistent with reported population densities (Clarke, 1994) and museum collections (Greenfield, 1981; Greenfield and Johnson, 1990). The COI estimates, representing the effective number of females, were significantly lower than those for the nuclear datasets, with median effective population sizes for A. aspera and A. spinosa of 463,000 and 693,000 individuals, respectively.

Discussion

Recent studies have found that pelagic larval duration can be a poor predictor of differences in ΦST among marine species (Bowen et al., 2006; Weersing and Toonen, 2009). I found the opposite to be true for Acanthemblemaria aspera and A. spinosa: the two species have identical pelagic larval durations (21-24 days) (Johnson and Brothers, 1989) and showed near identical patterns of pairwise ΦST values (Table 2.1) and genetic differentiation as reported by STRUCTURE (Figure 2.2). These concordant patterns of subdivision recovered from frequency-based analyses were, however, superficial and misleading. A. spinosa has far greater COI divergence than A. aspera (~20X) among populations (Table 2.2), a difference ignored by the ΦST and STRUCTURE analyses because they treat all alleles identically, regardless of the number of substitutions separating them. Inferred patterns of historical demography (Figure 2.5) differed between mitochondrial and nuclear markers, which may be due to the very rapid rate of mitochondrial evolution in these fishes. This rapid rate has obscured signals of old expansion for both species, which were only revealed by nuclear DNA. However, the mitochondrial DNA, in conjunction with the nuclear DNA, allowed me to recover temporally separated population expansions in A. aspera.

Figure 2.5 GMRF skyride plots for each marker for A. aspera and A. spinosa inferred in BEAST. Time, in years, is shown on the x-axis. Effective population size in log number of individuals is shown on the y-axis. The central dark horizontal line in the plot is the median value for effective population size; the light lines are the upper and lower 95% HPD for those estimates. The vertical dashed line represents the median TMRCA. The upper 95% HPD on TMRCA is at the right end of the plot, while the lower 95% HPD is the vertical line to the left of the median. Horizontal dashed lines represent the cutoffs used in this study to assess significance of population size changes.

Substitution Rates

The large level of mtDNA sequence divergence among populations that I found for A. spinosa (Table 2.2) is surprising for a marine fish with a 21-24 day pelagic larval duration. This level of sequence divergence may not be unique to A. spinosa; a phylogeographic study on Acanthemblemaria from the eastern Pacific also found high mitochondrial DNA sequence divergence among populations (Lin et al., 2009). My results here reveal a mitochondrial substitution rate that is high both in absolute terms and relative to nuclear rates in the same species, and demonstrate the effect that incorrect substitution rate estimates have on the estimation of population genetic parameters.

The inferred COI substitution rate of 11.22% per million years is one of the fastest vertebrate mitochondrial rates known (Nabholz et al., 2008; Nabholz et al., 2009; Welch et al., 2008). One possible reason for the fast mitochondrial rate I estimated would be that the transisthmian calibration I employed was incorrect and that divergence occurred long before the closure of the isthmus, as seen in other taxa (Knowlton et al., 1993; Lessios, 2008; Marko, 2002). The use of slower rates seen in other teleosts discounts this possibility. When the slowest COI rate from Lessios (2008) for transisthmian geminate fishes (1.03% per million years) was used, the mean inferred divergence for A. exilispinus and A. betinensis was 37.8 million years ago (95% HPD of 25.9 to 50.9 Mya). At that time, abyssal water depths connected the eastern Pacific and western Atlantic (Lessios, 2008), making an initial divergence between species that live in close association with coral reefs and in 1 meter of water unlikely. For the maximum substitution rate listed in Lessios (2008), 1.77% per million years, and using the K2P model, mean divergence time is 11.8 million years ago, with lower and upper bounds of 9.7 and 14.0 Mya. These dates would also place the initial divergence of A. exilispinus and A. betinensis at a time when appropriate habitat did not exist.

In contrast, the substitution rates inferred for the nuclear genes do not appear to be exceptionally fast. The value I obtained for rag1, 0.30% per million years, was faster than that listed in Lessios (2008) for rag1. However, when the rate from Lessios (2008) was used in BEAST analyses (0.097% per million years), the confidence interval on the transisthmian divergence overlapped the one obtained here in the analysis using the exponential prior on divergence time. The substitution rate estimated for atrop, 0.80% per million years, agreed with published estimates for autosomal introns in birds (0.72% per million years) (Axelsson et al., 2004). Together, these data support an exceptionally fast mitochondrial substitution rate in Acanthemblemaria.

These results illustrate how using a published substitution rate that is inappropriate for the taxa under consideration can substantially bias demographic parameters. In addition to the extreme overestimation of divergence times illustrated above, effective population size estimates would be systematically overestimated by using a substitution rate that is too slow. An incorrect substitution rate also leads to biases in coalescent estimation of population size change and migration rates, as these values are dependent on substitution rate.
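The direction and rough size of this bias can be illustrated with a simple strict-clock calculation, in which time equals divergence divided by rate. This is a simplified sketch rather than the BEAST analysis: the observed divergence `d` is a made-up value, the function names are illustrative, and the calculation ignores the model effects (such as correction for multiple hits) that make the BEAST-inferred dates scale less neatly.

```python
def divergence_time_my(pairwise_divergence, pct_divergence_per_my):
    """Strict-clock divergence time in My: T = d / r, where d is a
    proportion and r is percent pairwise divergence per My."""
    return pairwise_divergence / (pct_divergence_per_my / 100.0)


def inflation_factor(estimated_rate, assumed_rate):
    """Factor by which a time (or any quantity inversely proportional
    to rate) is inflated when a slower rate is assumed."""
    return estimated_rate / assumed_rate


d = 0.10  # hypothetical observed pairwise divergence (proportion of sites)

t_estimated_rate = divergence_time_my(d, 11.22)  # COI rate from this study
t_published_rate = divergence_time_my(d, 1.77)   # fastest rate in Lessios (2008)

print(f"T with 11.22% per My: {t_estimated_rate:.2f} My")
print(f"T with 1.77% per My:  {t_published_rate:.2f} My")
print(f"inflation at 1.77%:   {inflation_factor(11.22, 1.77):.1f}x")
print(f"inflation at 1.03%:   {inflation_factor(11.22, 1.03):.1f}x")
```

Because coalescent estimates of effective population size are likewise inversely proportional to the substitution rate, a rate that is too slow would inflate them by a similar factor under this simplification.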

The ratio of mitochondrial to nuclear exon substitution rate, 37.6:1, is also one of the greatest known for (Caccone et al., 2004; Oliveira et al., 2008; The Nasonia Genome Working Group, 2010; Willett and Burton, 2004). This large ratio may have consequences for postzygotic isolation due to epistasis between co-adapted nuclear and mitochondrial genotypes. Proteins encoded in the mitochondrial genome, such as those responsible for oxidative phosphorylation, directly interact with nuclear-encoded proteins. Gene products from each genome must be able to work properly with each other, or organismal breakdown may occur. This has been seen in hybrids with mismatched nuclear and mitochondrial genomes (Burton et al., 2006; Rawson and Burton, 2002; The Nasonia Genome Working Group, 2010). In Nasonia wasps, nuclear genes that interact directly with the mitochondrion have a significantly higher nonsynonymous-to-synonymous substitution ratio (dN/dS) than those that do not (The Nasonia Genome Working Group, 2010). Finding similarly high compensatory dN/dS ratios in Acanthemblemaria would suggest the possibility of co-evolution of nuclear and mitochondrial genomes that could lead to hybrid breakdown through cyto-nuclear disequilibrium in these fishes.

Demographic Histories of A. aspera and A. spinosa

My demographic analyses revealed two bouts of population expansion: older expansions of both species and a younger one specific to A. aspera. The former was recovered by nuclear DNA, the latter by mitochondrial DNA. Alone, neither marker type, mitochondrial or nuclear, would have provided a complete picture of the historical demography of these fishes.

The nuclear data recovered a population expansion dating to 400,000 – 500,000 years ago for both species, although the A. spinosa rag1 data indicate a significantly older expansion than A. spinosa atrop, beginning ~1.5 Mya (Figure 2.5), as well as a significantly older root age (Supplementary Information). This discrepancy between nuclear loci probably arises from coalescent stochasticity, as different markers in the nuclear genome can have different times to their most recent common ancestor.

The COI skyride plot for all A. aspera populations combined recovered a population expansion beginning ~20,000 years BP. This coincides with the last glacial maximum, a period of lowered sea levels when there was 89% less available shelf area in the Caribbean basin (Bellwood and Wainwright, 2002) than at present. This reduced habitat availability may have caused reduced population sizes in A. aspera. Such a population bottleneck would cause the root of the COI gene tree to appear to be quite young. As glaciers receded and sea level in the Caribbean basin rose (Lambeck et al., 2002), habitat suitable for coral reef species was restored (Montaggioni, 2000) and populations grew. Genetic signatures of population expansion that date to increased habitat availability following maximum global glaciation have also been found in other coral reef fishes (Fauvelot et al., 2003; Rocha et al., 2005; Thacker et al., 2008).

Within A. aspera, then, mitochondrial and nuclear markers recovered different aspects of population history. The older population expansion may reflect the initial spread of both species throughout the Caribbean at the time of speciation and would account for the discrepancy between the inferred gene tree root heights of the mitochondrial and nuclear markers.

A single gene region cannot recover a population history older than the most recent bottleneck (Heled and Drummond, 2008) which, due to its smaller effective population size, should be more recent for mitochondrial than nuclear data. The severity and recency of the most recent bottleneck would affect the height of the gene tree, while the substitution rate of the mitochondria would determine the amount of signal that would be available to detect the recovery. The pattern of recent expansion revealed by COI was nearly hidden by high rates of substitution, which resulted in fixed mitochondrial haplotypes among populations that may have been interpreted as evidence for long-term isolation. This is evidenced by most A. aspera populations having private alleles and corrected ΦST values of 1, which is not the expectation under a scenario of population growth (Excoffier et al., 2009).

While both species shelter in holes in corals, they differ in their microhabitat use and in their propensity to go locally extinct (Clarke, 1994; Clarke, 1996). A. spinosa is found in shallower water than A. aspera (Clarke, 1994), although the two overlap at intermediate depths. A. spinosa occurs only in high-profile shelters up off the reef in living or standing dead coral, while the less specialized A. aspera can persist in low-profile habitat on the reef surface in coral rubble (Clarke, 1994). These differences in microhabitat use give A. spinosa a greater propensity to go locally extinct. Thus, the same processes that can cause A. spinosa populations to decline (when living and standing dead coral is destroyed and reduced to rubble) can allow A. aspera populations to grow (Clarke, 1996).

Given what is known about the differences in microhabitat requirements of these fishes, the expectation is that A. aspera populations should be more stable over time than A. spinosa populations, at least at ecological time scales. In addition to local high-frequency demographic cycles, regional changes in habitat availability due to glaciation would also be expected to favor persistence of A. aspera populations over A. spinosa at evolutionary time scales. Given that A. spinosa lives in shallower waters than A. aspera, the substantial reduction in shelf area during intervals of low sea level would be expected to have a greater effect on A. spinosa than A. aspera.

In this study, however, I found demographic patterns that were at odds with those predicted by the ecology and life histories of these blennies. On the one hand, the similarities in the life histories of the two species did not translate into concordant demographic histories. On the other hand, the differences I did find in the historical demography of the two species were in the opposite direction from that expected given their differences in microhabitat use. While there was a signal of bottlenecks in individual A. spinosa populations, as suggested by point estimates (Table 2.3) and TCS haplotype distributions (although not by Bayesian skyride tests), there was no evidence of a range-wide extirpation for the species. However, A. aspera appears to have undergone a range-wide bottleneck. Together, my results indicate that, compared to A. aspera, A. spinosa populations were better able to persist during lower sea levels at the last glacial maximum.

It is not clear then why I recovered a pattern of range-wide population expansion in A. aspera and population persistence in A. spinosa. In previous comparative studies of historical demography in marine taxa, interspecific differences correlated well with habitat requirements (Hickerson and Cunningham, 2005; Marko, 2004). However, those studies involved intertidal taxa that were directly affected by glaciation. It may be that less obvious factors than ecological differences are responsible for contrasting demographic patterns in coral reef taxa resulting from sea level changes. Sampling of additional nuclear markers to test for multiple population size changes using a method such as the extended Bayesian skyline plot (Heled and Drummond, 2008) could provide further insight into the timing and degree of demographic changes through multiple glacial cycles.

Conclusions

My study illustrates that mitochondrial and nuclear markers can reveal complementary information in historical demographic studies. The smaller effective size and rapid substitution rate of the mitochondrial DNA allowed the inference of a recent population expansion in A. aspera, while the slower nuclear DNA recovered an older expansion for both species. However, the rapid mitochondrial substitution rate also obscured the recent expansion in A. aspera.

Analyses of the mitochondrial data using frequency-based metrics alone did not indicate the underlying population expansions in A. aspera, neither the young one nor the old one. The results of the frequency-based tests, coupled with the STRUCTURE results, led to a pattern of subdivision that was very similar to that of A. spinosa even though the underlying demography of the two species was quite different.

Chapter 3: The Performance of Bayesian Phylogenetic Inference under Extreme Substitution Rate Variation: Effects on Concatenated and Species Tree Analyses

Introduction

Rates of molecular evolution are known to vary throughout the genome, sometimes substantially (Hellberg, 2006; Senchina et al., 2003; Wolfe et al., 1989). However, it is not clear how differences in rates of molecular evolution among genes affect multi-locus phylogenetic analyses. Heterogeneity in rates among and within markers can be accommodated by partitioning the dataset (Nylander et al., 2004; Yang, 1996) with a wide range of potential partitioning schemes. These schemes can range from combining all loci and treating them as a single locus with a single rate of substitution, to highly parameterized strategies, where each gene and codon position is assigned its own partition, each with its own substitution model.

In Bayesian phylogenetic analyses using MrBayes (Ronquist and Huelsenbeck, 2003), accommodating differences in rates of molecular evolution among partitions, as well as substitution rate parameters, has been shown to increase the accuracy of both parameter and topology estimates (Brandley et al., 2005; Brown and Lemmon, 2007; Nylander et al., 2004). In addition, not allowing for differences in rates of molecular evolution among partitions can lead to poor results (Marshall et al., 2006).

As of this writing, there are increasing numbers of large multilocus datasets (see the Assembling the Tree of Life website, www.phylo.org/atol/projects for a partial list). These datasets are frequently analyzed in a partitioned Bayesian framework and it is expected that variation in substitution rates among partitions will be present. Given the large number of markers, marker partitions, and the accompanying large numbers of parameters, the effect of substitution rate variation among loci should be considered.

When partitioned datasets are analyzed in a Bayesian framework, all partitions are assumed to have evolved on the same topology, with the same set of branch lengths (Nylander et al., 2004). However, as the overall rate may be quite different among the partitions, the among-partition rates are scaled according to a rate multiplier parameter, which allows branch lengths to be proportional across partitions. The rate multiplier is defined as m_i, the relative rate of the ith partition. A likelihood is then calculated for each partition using the shared set of branch lengths multiplied by that partition's rate multiplier (Brown et al., 2010). If a dataset is split into many partitions, there will also be many parameters in the rate multiplier, and many likelihoods to be calculated. For example, a ten gene dataset partitioned by codon position would have a rate multiplier parameter of (m_1, m_2, m_3, …, m_30). Determining the posterior distribution of the joint rate multiplier parameter is accomplished using Markov chain Monte Carlo (MCMC) sampling.
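The scaling described above can be illustrated with a short Python sketch. The branch lengths, partition names, and multiplier values below are invented for illustration and are not estimates from this study; the function name is likewise hypothetical.

```python
def scale_branch_lengths(shared_lengths, multipliers):
    """Return partition-specific branch lengths.

    Each partition sees the shared branch lengths multiplied by its own
    rate multiplier m_i, so lengths stay proportional across partitions.
    """
    return {
        name: [m * b for b in shared_lengths]
        for name, m in multipliers.items()
    }


# Hypothetical values, for illustration only.
shared = [0.10, 0.02, 0.05]
multipliers = {"slow_partition": 0.2, "fast_partition": 5.0}
scaled = scale_branch_lengths(shared, multipliers)
```

In this toy case every branch of the fast partition is 25 times longer than the same branch in the slow partition, which is the kind of disparity the shared branch lengths must absorb.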

One of the primary concerns when performing Bayesian phylogenetic analyses using MCMC is whether the chain has converged on the true posterior distribution (Huelsenbeck et al., 2002b; Nylander et al., 2004; Ronquist et al., 2009). Depending on the model, the parameter space may be very complex, with many local likelihood optima (Huelsenbeck et al., 2002b; Huelsenbeck et al., 2001). This can be problematic, especially for high dimension parameters such as the rate multiplier, because the Markov chain will have a harder time moving through parameter space and finding the true optimum likelihood peak (Huelsenbeck et al., 2002b; Huelsenbeck et al., 2001).

To move through the parameter space, proposals are made to update the Markov chain from its current state (denoted here as θ) to a new one (θ*) (Huelsenbeck et al., 2001; Ronquist et al., 2009; Yang, 2006). Whether the proposed move is accepted is decided by the proposal ratio, r (= θ*/θ), which compares the posterior probability at the proposed state with that at the current state. If r is > 1, the proposed move is always accepted. If r is < 1, the proposed change in state is accepted with probability r. Thus, if r is not much less than 1, meaning that the posterior probability at the new state is not much worse than at the original state, the update will most likely be accepted. However, if the posterior probability at θ* is much worse than at θ, r will be much less than 1 and the proposed update to the chain has a poor chance of being accepted (see section 7.3 and figure 7.4 of Ronquist et al. 2009).
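The acceptance rule just described can be sketched in Python as follows. This is a minimal illustration that assumes a symmetric proposal, so that r reduces to the ratio of posterior densities; it is not the code used by MrBayes, and the function name and example values are hypothetical.

```python
import random


def accept_move(post_current, post_proposed, rng=random):
    """Decide whether to accept a proposed state.

    r is the ratio of the (unnormalized) posterior at the proposed state
    to that at the current state. Moves with r >= 1 are always accepted;
    otherwise the move is accepted with probability r.
    """
    r = post_proposed / post_current
    if r >= 1.0:
        return True
    return rng.random() < r


# Hypothetical example: a proposal four times worse than the current state.
rng = random.Random(1)
trials = 20000
accepted = sum(accept_move(1.0, 0.25, rng) for _ in range(trials))
acceptance_rate = accepted / trials  # expected to be close to 0.25
```

A proposal that is only slightly worse than the current state is accepted most of the time, whereas one that is much worse is rarely accepted, which is why very peaked posteriors lead to low acceptance rates.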

The calculation of the proposal ratio depends on the proposal algorithm, which differs depending on the parameter being changed. In the case of the rate multiplier, new values are drawn from a Dirichlet distribution that is centered on the current parameter values of the chain (Ronquist and Huelsenbeck, 2003; Ronquist et al., 2009). The new values are determined by multiplying the current ones by the tuning parameter, α. (Note that the tuning parameter α is not the same as the shape parameter α of the Γ distribution of rate variation among sites). The higher the value of α, the closer the proposed parameter values will be to the current ones. Thus, by changing the magnitude of the α value, modest changes (large α) can be proposed, which may have a higher chance of being accepted than bold proposals (small α).
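The effect of the tuning parameter on proposal size can be illustrated with a small simulation. The sketch below assumes that the rate multipliers are expressed as proportions summing to one and that the proposal is a Dirichlet with parameters equal to α times the current proportions; the function names, the current values, and the two α values are hypothetical and chosen only to show the contrast.

```python
import random


def propose_dirichlet(current, alpha, rng):
    """Draw new proportions from a Dirichlet centered on `current`.

    Uses the standard construction: independent Gamma(alpha * p_i, 1)
    draws, normalized to sum to one. `current` must sum to one.
    """
    draws = [rng.gammavariate(alpha * p, 1.0) for p in current]
    total = sum(draws)
    return [d / total for d in draws]


def mean_step(current, alpha, n=2000, seed=0):
    """Average absolute change per component over n proposals."""
    rng = random.Random(seed)
    size = 0.0
    for _ in range(n):
        new = propose_dirichlet(current, alpha, rng)
        size += sum(abs(a - b) for a, b in zip(new, current)) / len(current)
    return size / n


current = [0.7, 0.2, 0.1]                   # hypothetical current state
bold = mean_step(current, alpha=50)         # small alpha: large moves
modest = mean_step(current, alpha=5000)     # large alpha: small moves
```

Under these assumptions the average step with the large α is roughly an order of magnitude smaller than with the small α, matching the verbal description of modest versus bold proposals.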

If update proposals are rejected too frequently, a sample of the posterior distribution may never be taken because the possible range of values for that parameter is never explored, as the Markov chain cannot move. This poor mixing of the chain can lead to a failure of replicate runs to converge onto the same posterior distribution, if they converge at all. However, by adjusting the tuning parameter, the optimal acceptance rate of new proposals can be determined. The effect of optimal proposal acceptance rates on Bayesian phylogenetic analyses has been explored for topology proposals (Lakner et al., 2008). However, the effect of acceptance rates of proposals to change the rate multiplier parameter has not.

The expectation, then, would be that large differences in substitution rates among partitions would lead to poor proposal acceptance for the rate multiplier parameter. This is because the overall rate is divided very unequally among partitions, which have to share the same set of branch lengths. That in turn would cause a small area of the parameter space to have the optimal likelihood, which would be exacerbated as the number of partitions, and the differences in rates among them, increased.

Here I test that expectation, and generally explore the behavior of partitioned Bayesian analyses for a group of reef fishes, the tube blenny genus Acanthemblemaria, that show large differences in the rate of molecular evolution among markers. In a previous study on the population genetics of two Caribbean Acanthemblemaria species, substitution rates were estimated for one mitochondrial and two nuclear markers using a pair of transisthmian geminate taxa to calibrate a molecular clock. That study (Eytan and Hellberg, 2010) found a substitution rate for mitochondrial COI that is very high in both absolute terms (11.2% per million years) and relative to nuclear markers (over 37 times faster than nuclear exons), the latter of which appear to be evolving at rates typical of vertebrate genes.

The large differences in substitution rates between mitochondrial and nuclear markers in Acanthemblemaria allow me to determine the extent to which among-marker rate differences affect phylogenetic reconstruction and how to ameliorate problems that may arise. Here I reconstruct the phylogeny of the genus Acanthemblemaria using five nuclear markers and one mitochondrial marker. I calculate the absolute rates of molecular evolution for each marker and marker class across the genus, as well as the relative rate of the mitochondrial to nuclear DNA using Bayesian relaxed clock divergence dating. I then estimate the Acanthemblemaria phylogeny in both Bayesian concatenated and species tree frameworks. The species tree analysis allows me to determine how markers with very different rates affect species tree estimation. In contrast to concatenation, species trees do not force markers to share the same set of branch lengths, nor the same topology, and accounting for rate variation among partitions (or in this case, genes) may not be as important as in concatenated analyses.

Materials and Methods

Study System: Acanthemblemaria Tube Blennies

The genus Acanthemblemaria is part of the coral reef fish Family Chaenopsidae sensu Stephens (1963). The Chaenopsidae is one of six families in the Suborder Blennioidei (Hastings and Springer, 2009b) and one of only three reef fish families with an exclusively New World distribution (Bellwood and Wainwright, 2002; Hastings, 2009). In the Pacific, Acanthemblemaria blennies range from the northern Gulf of California to the Golfo de Guayaquil in Ecuador. The Western Atlantic members of the genus occur throughout the Caribbean basin, the Bahamas, and peninsular Florida (Hastings, 2000; Hastings, 2009; Smith-Vaniz and Palacio, 1974; Stephens, 1963).

Acanthemblemaria is also the most species rich of the Chaenopsid genera (Hastings and Springer, 2009b), with 22 described species, 9 in the Tropical Eastern Pacific and 13 in the Caribbean (Hastings, 2009). The genus has also had the largest increase in the number of named species in the Chaenopsidae since the initial treatment of the family by Stephens (1963). Much of this growth has been due to the recognition that several species with large distributions may consist of several cryptic taxa (Hastings and Springer, 2009a; Hastings and Springer, 2009b; Lin and Galland, 2010).

Previous phylogenetic hypotheses have been proposed for the interrelationships of the genus based on morphological data (Hastings, 1990). Phylogenetic analysis of morphological data recovered a monophyletic Acanthemblemaria, as well as two well-supported transisthmian sister pairs, Acanthemblemaria betinensis – A. exilispinus and A. castroi – A. rivasi. The latter pair is an example of a rare sister relationship between Galapagos and southwest Caribbean shore fish species (Hastings, 2000; Hastings, 2009; Rosenblatt, 1967).

Sample Collection

Individuals from 16 out of the 22 named Acanthemblemaria species, as well as one undescribed species, were collected on SCUBA (Appendix 1 Table 5). Four outgroup taxa, based on relationships in Hastings (1990) and Almany and Baldwin (1996), were also included. Of the taxa collected, three pairs are putative transisthmian geminates (Hastings, 1990; Hastings and Springer, 1994), two in the ingroup and one in the outgroup. Photo vouchers from freshly collected specimens for a subset of these individuals have been submitted to Dryad with accession numbers ###. Whole fishes were stored individually in 95% ethanol or salt-saturated DMSO at -80 °C.

DNA Extraction, PCR and Sequencing

DNA was extracted using the Qiagen (Valencia, CA) QIAamp DNA Mini Kit. The polymerase chain reaction (PCR) was performed to amplify six genetic markers (Appendix 1 Table 6): the protein-coding genes mitochondrial cytochrome oxidase I (COI), nuclear recombination-activating gene 1 (rag1), titin-like protein (TMO4C4), melanocortin 1 receptor (MC1R), and SH3 and PX domain containing 3 gene (SH3PX3), and intron V from nuclear α-tropomyosin (atrop). PCR amplification of the full-length rag1 molecule was not possible for some taxa. A set of internal primers was developed for the study and used to amplify rag1 in those taxa.

Amplicons were either purified with a Strataprep PCR Purification Kit (Stratagene, La Jolla, CA) or used without cleanup, and were sequenced directly in both directions on an ABI 3100 or 3130 XL automated sequencer with 1/8 reactions of BigDye Terminators (V3.1, Applied Biosystems) and the amplification primers, or internal primers as indicated in Appendix 1 Table 6.

Sequence Alignment and Model Selection

Sequences for the five protein-coding genes contained no gaps and were aligned using MUSCLE (Edgar, 2004) as implemented in Geneious v3.6 (Drummond et al., 2007a). The α-tropomyosin sequences contained numerous gaps, with nearly every species having different length indels. BAli-Phy v. 2.0.1 (Suchard and Redelings, 2006) was used to align the atrop sequences. To decrease run times, a consensus sequence was used for individuals with the same gap lengths, except when gaps of the same size were shared between species. BAli-Phy was run using the GTR substitution model, gamma distributed rate variation, and the default indel model. BAli-Phy was run four times to ensure concordance among runs. The final output from each run was separately analyzed, with all samples before convergence, as determined by stationarity in the Markov chain, visualized in Tracer v1.5 (Rambaut and Drummond, 2010), discarded as burnin. The consensus alignment from the run with the highest posterior probability was used for subsequent analyses. All columns in the final consensus alignment with posterior probabilities less than 0.95 were discarded.

Models of sequence evolution were chosen for each of the protein-coding genes using ModelTest (Posada and Crandall, 1998) and three different partitioning strategies: the full sequence, each codon position separately, and first and second positions combined with the third position separate (the SRD06 model; Shapiro et al., 2006). In the case of α-tropomyosin, models were chosen for the full marker as well as exons and introns separately. Intron-exon boundaries were found by identifying the upstream and downstream splice junctions. I used the AIC because it is the least conservative model selection criterion and underparameterization is problematic for Bayesian phylogenetic analyses (Huelsenbeck and Rannala, 2004; Lemmon and Moriarty, 2004). However, time-reversible models that include parameters for substitutions that do not exist in the alignment (e.g., no columns with G to C changes) can lead to poor performance and skewed parameter estimates (Sullivan and Joyce, 2005). For that reason, I chose the next less complex model for a partition when a particular substitution type was not present. All models used can be found in Table S3. Compositional heterogeneity was tested for each marker and the full alignment using SeqVis (Ho et al., 2006) and the χ² test in PAUP* v4.0b10 (Swofford, 2003).
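For readers unfamiliar with the criterion, the AIC for a model with k free parameters and maximized log-likelihood ln L is AIC = 2k − 2 ln L, and the model with the lowest score is preferred. The Python sketch below shows the comparison; the model names, log-likelihoods, and parameter counts are hypothetical and do not come from the ModelTest output of this study.

```python
def aic(ln_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * ln_likelihood


def best_by_aic(candidates):
    """Return the name of the candidate model with the lowest AIC.

    `candidates` maps a model name to (ln_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical values, for illustration only.
candidates = {
    "model_simple": (-5230.0, 5),
    "model_complex": (-5210.0, 10),
}
chosen = best_by_aic(candidates)
```

Here the more complex model is chosen because its gain in log-likelihood (20 units) outweighs the penalty for its five extra parameters.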

MrBayes Heating, Branch Length Priors, and Proposal Settings

Preliminary Runs - Ten potential strategies for partitioning the concatenated dataset were considered (Table 3.1). Initial MrBayes runs of 10,000,000 generations sampling every 1000 were conducted twice with four heated chains for each of the ten partitioning schemes and with four different heating strategies (80 runs in all). In addition, all partitioning strategies were run without COI to test the effect of including this fast marker in the analysis.

Optimizing MCMC proposals - Proposal updates for the Markov chain were accepted infrequently, with poor mixing of the rate multiplier parameter (see Results). A two-part strategy was used to determine and implement optimal tuning parameters. MrBayes v3.2, which performs auto-tuning of proposal parameters, was used to determine the optimal tuning parameter for proposals to change the rate multiplier. Those were then used to supplant the default proposals in MrBayes v3.1.2, because the revision of v3.2 used in this study did not support parallelization of runs.

MrBayes v3.2 uses four priors for topologies and branch lengths not available in v3.1.2 (eSPR, eSS, pSPR, and Multiplier(V)). To avoid complications from including these tree priors and to make my proposal estimates comparable, I downloaded the v3.2 source code from the MrBayes SourceForge repository (revision 63, downloaded 5/8/2008) and edited the source code to turn off the additional tree priors (the model.c file was edited to change default moves and settings). Extended TBR and LOCAL, which are present in v3.1.2, were retained. The code for v3.2 was then recompiled, and each partitioning strategy with greater than one partition was run once with a single chain for 2.3-6 million generations, sampling every 500 and autotuning every 100 generations. The run lengths for each partitioning strategy were determined by observing when the tuning parameter values leveled off. MrBayes v3.1.2 contains a bug that prevents proposals from being changed when using batch files with input redirect. I edited the source code for the “command.c” file to fix the bug. The optimized tuning parameters were then input into MrBayes v3.1.2 from a batch file with input redirection and run in parallel using MPI. Those tuning parameters were then used in all subsequent MrBayes analyses. The final tuning parameters for each partitioning strategy are available in the Supplementary information. Source code for the edited versions of MrBayes 3.1.2 and 3.2 is available from RIE upon request.

Table 3.1 List of partitioning strategies used in this study.

Model  Name   Partition Description                                  Number of Partitions
1      FULL   All included nucleotide positions                      1
2      SNMAT  SRD06 model for nDNA, mtDNA, α-trop concatenated       5
3      SNMIE  SRD06 model for nDNA, mtDNA, α-trop intron and exon    6
4      GENES  Partitioned by gene region, α-trop concatenated        6
5      NMAT   nDNA by codon, mtDNA by codon, α-trop concatenated     7
6      NMIE   nDNA by codon, mtDNA by codon, α-trop intron and exon  8
7      SGAT   SRD06 model for each locus, α-trop concatenated        11
8      SGIE   SRD06 model for each locus, α-trop intron and exon     12
9      GCAT   Each locus by codon position, α-trop concatenated      16
10     GCIE   Each locus by codon position, α-trop intron and exon   17

No COI

Model  Name   Partition Description                                  Number of Partitions
1      FULL   All included nucleotide positions                      1
2      SNMAT  SRD06 model for nDNA, mtDNA, α-trop concatenated       3
3      SNMIE  SRD06 model for nDNA, mtDNA, α-trop intron and exon    4
4      GENES  Partitioned by gene region, α-trop concatenated        5
5      NMAT   nDNA by codon, mtDNA by codon, α-trop concatenated     5
6      NMIE   nDNA by codon, mtDNA by codon, α-trop intron and exon  6
7      SGAT   SRD06 model for each locus, α-trop concatenated        9
8      SGIE   SRD06 model for each locus, α-trop intron and exon     10
9      GCAT   Each locus by codon position, α-trop concatenated      13
10     GCIE   Each locus by codon position, α-trop intron and exon   14

Heating strategy and branch length priors - Heating strategies for the 10 different partitioning strategies were determined by performing single MrBayes runs with four heated chains and four different temperature settings for 1,000,000 generations, sampling every 100, for a total of 40 runs. Temperatures of 0.2, 0.1, 0.05, and 0.02 were specified for each run, with the exception of the SGIE strategy where an additional run using temperature = 0.01 was performed. The temperature that yielded chain swap acceptance rates between 0.2 and 0.8 was used for all subsequent MrBayes analyses.
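To show what these temperature settings imply, the sketch below computes the heat applied to each chain under the incremental heating scheme documented for MrBayes, in which chain i (with i = 0 for the cold chain) has heat 1 / (1 + T × i). It assumes four chains in total (one cold and three heated), which is my reading of the run setup; the function name is hypothetical.

```python
def chain_heats(temp, n_chains=4):
    """Heats for Metropolis-coupled chains under incremental heating.

    Chain 0 is the cold chain (heat 1.0); chain i has heat
    1 / (1 + temp * i), so lower temperatures keep chains more similar.
    """
    return [1.0 / (1.0 + temp * i) for i in range(n_chains)]


# Temperatures tested in the preliminary runs.
heats_by_temp = {t: chain_heats(t) for t in (0.2, 0.1, 0.05, 0.02)}
```

Lower temperatures give heated chains whose posterior surfaces are closer to that of the cold chain, which generally raises the rate at which swaps between chains are accepted.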

Once the optimal heating strategy was determined for each partitioning strategy, I determined which branch length prior to use by performing single MrBayes runs with four heated chains for 15,000,000 generations, sampling every 1500, and using the previously determined heating parameter. Four different branch length priors were tested, exponential distributions with means of 2, 10, 50, or 100. Optimal branch length priors for each partitioning strategy were chosen using 2 ln Bayes factors (Newton and Raftery, 1994).

Final MrBayes Runs and Determination of Partitioning Strategy

Once the optimal heating and branch length priors were determined for each partitioning strategy, final runs were performed in MrBayes. Between four and eight runs, each with four heated chains, for 15,000,000 generations, sampling every 1500, were performed for each partitioning strategy. Convergence onto the posterior distribution for the estimated topology was assessed using Are We There Yet? (AWTY) (Nylander et al., 2008). Convergence onto the posterior distribution for parameter estimates was assessed by effective sample size (ESS) values greater than 250, as determined in Tracer v1.5 (Rambaut and Drummond, 2010). The marginal likelihood was estimated with the method of Newton and Raftery (1994) with the modifications proposed by Suchard et al. (2001), implemented in Tracer v1.5 (Rambaut and Drummond, 2010). The partitioning strategy with the greatest pairwise 2 ln Bayes factor score was chosen, unless the difference between the two best strategies was <10.
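The comparison can be sketched as follows, taking 2 ln Bayes factor to be twice the difference in estimated log marginal likelihoods. The scheme names and marginal likelihood values are hypothetical, and the sketch only reports whether the best strategy clears the threshold of 10; it makes no assumption about what is done when it does not.

```python
def two_ln_bayes_factor(ln_ml_a, ln_ml_b):
    """2 ln BF in favor of model A over model B."""
    return 2.0 * (ln_ml_a - ln_ml_b)


def compare_best_two(marginal_ln_likelihoods, threshold=10.0):
    """Rank strategies and compare the best against the runner-up.

    Returns (best, runner_up, score, decisive), where `decisive` is True
    when the 2 ln BF between the two best strategies is at least the
    threshold.
    """
    ranked = sorted(marginal_ln_likelihoods,
                    key=marginal_ln_likelihoods.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    score = two_ln_bayes_factor(marginal_ln_likelihoods[best],
                                marginal_ln_likelihoods[runner_up])
    return best, runner_up, score, score >= threshold


# Hypothetical log marginal likelihoods, for illustration only.
example = {"scheme_A": -21050.0, "scheme_B": -20410.0, "scheme_C": -20400.0}
result = compare_best_two(example)
```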

Maximum Likelihood Gene Trees

Maximum likelihood trees for individual genes and for the fully concatenated dataset were constructed using GARLI v0.96b8-r601 (Zwickl, 2006), which allows analysis of partitioned datasets. Individual genes were partitioned by codon position (or intron/exon in the case of atrop) while the full alignment was partitioned using the GCIE strategy, which was chosen as the optimal partitioning strategy for MrBayes (see Results), with the BIC used to choose the substitution models for each partition in all cases. The substitution models used can be found in Appendix 1 Table 7. Default settings were used for the GARLI analyses with the following exceptions: Attachments per taxon were set at 145, genthreshfortopoterm was set to 50000, and searchreps were set to 4. Five separate bootstrap runs, each with 20 bootstrap repetitions, were done. The resulting 100 trees were summarized into a consensus tree using SumTrees (Sukumaran and Holder, 2008). The GARLI input files have been submitted to TreeBase and Dryad with accession numbers #### and ###, respectively.

Estimation of Variation and Absolute Values of Substitution Rates

A time-calibrated phylogeny was constructed using BEAST v1.5.4 (Drummond and Rambaut, 2007). The full alignment used in the previous analyses was employed using the GCIE partitioning strategy. Trees for all partitions were linked, while substitution models for each partition were unlinked. Trees were estimated using a single linked clock for all genes as well as separate clocks for each individual gene. The latter was done so that relative rates and variation in rates among genes could be calculated. The substitution models used for each partition were identical to those of the GARLI analysis. Runs were also performed without COI to determine its effect on estimates of topology and parameters.

Two replicate runs were performed for both the linked and unlinked clock analyses using an uncorrelated lognormal distribution on branch lengths and the calibration priors detailed below. In all cases, the Markov chain was run for 1 billion generations, sampling every 50,000. Convergence onto the posterior distribution for the estimated topology was assessed using AWTY (Nylander et al., 2008) for all BEAST analyses, where appropriate. Convergence onto the posterior distribution for parameter estimates was assessed by ESS values greater than 250, as determined in Tracer v1.5 (Rambaut and Drummond, 2010). Any parameters with low ESS values or unreasonable posterior parameter estimates had their priors adjusted and the analyses were run again. Operator weights were adjusted according to the changes suggested at the end of runs. This was done iteratively until good sample sizes and reasonable estimates were obtained for all parameters.

Calibration Priors

Priors on the time to most recent common ancestor (TMRCA) for two transisthmian species pairs were specified. The first species pair, A. betinensis and A. exilispinus, occur in <1 meter of water and are restricted to areas close to the isthmus (Hastings, 2009). This suggests that their progenitor was split close to the final closure of the isthmus. The calibration was given an exponential prior with a mean of 7 million years and a zero offset of 3.1 million years. This prior represents the most recent possible split for the geminates at the close of the isthmus, but allows for a split prior to the closure, although with decreasing probability back in time.
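The shape of this prior can be made concrete by computing a few quantiles. The sketch below assumes that the stated mean of 7 million years refers to the exponential distribution before the 3.1 million year offset is added; the function name is hypothetical.

```python
import math


def offset_exponential_quantile(q, mean, offset):
    """Quantile of an exponential distribution shifted by `offset`.

    For an exponential with the given mean, the q-th quantile is
    -mean * ln(1 - q); the offset shifts every quantile by a constant.
    """
    return offset - mean * math.log(1.0 - q)


# Prior on the A. betinensis / A. exilispinus split, in millions of years.
median = offset_exponential_quantile(0.5, mean=7.0, offset=3.1)
upper95 = offset_exponential_quantile(0.95, mean=7.0, offset=3.1)
```

Under this assumption, half of the prior mass falls between 3.1 and roughly 8 million years ago, and 95% falls more recently than roughly 24 million years ago.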

The second pair of geminates, A. rivasi and A. crockeri, have a Galapagos – Caribbean distribution (Hastings, 2009). While the most recent possible split between these two would have been the closure of the Isthmus and the earliest possible split the rise of the Galapagos (at most 17 million years ago; Werner and Hoernle, 2003), the split most probably occurred between those dates. A truncated normal prior for the split time of A. rivasi and A. crockeri was specified. A minimum offset of 3.1 million years, representing the most recent possible split for the species pair, was used. The mean and standard deviation were set at 10 and 3.52, respectively, which gave a 95% confidence interval of 3.1 to 16.9 million years.
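The stated interval can be checked against the central 95% of an untruncated normal distribution with the same mean and standard deviation. The sketch ignores the truncation itself, which removes only the small tail below 3.1 million years; the variable names are hypothetical.

```python
from statistics import NormalDist

# Prior on the A. rivasi / A. crockeri split, in millions of years.
prior = NormalDist(mu=10.0, sigma=3.52)

lower = prior.inv_cdf(0.025)  # 2.5th percentile
upper = prior.inv_cdf(0.975)  # 97.5th percentile
interval = (round(lower, 1), round(upper, 1))
```

The bounds round to 3.1 and 16.9 million years, so the chosen standard deviation places the closure of the Isthmus and the rise of the Galapagos at the edges of the central 95% of the prior.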

Substitution Rate Estimation

I was interested in the gene-specific variation in substitution rates among taxa. To obtain estimates for substitution rates and rate variation, as well as to visualize patterns of rate variation across the phylogeny, I used the ultrametric topology resulting from the time-calibrated tree produced from the previous BEAST analysis. Doing so removed the effect of phylogenetic estimation error (aside from that of the original BEAST analyses) from the substitution rate analyses. This tree was used as the starting topology and all operators that act on tree topology were removed from the BEAST xml file, which allowed branch lengths and rates to change while keeping the tree topology fixed.

Fixed topology analyses were conducted for each gene separately, as well as the concatenated nuclear gene dataset, with two runs of 100,000,000 generations, sampling every 5,000 for each. The models of sequence evolution used for the full alignment were used for each individual gene. Convergence was assessed in Tracer, but not AWTY, as a fixed topology was used. All xml files used for the BEAST analyses have been submitted to TreeBase and Dryad with accession numbers #### and ###, respectively.

Species Tree Estimation

Species tree analyses were conducted using the *BEAST package in BEAST v1.5.4 (Heled and Drummond 2010). Sequences were grouped by nominal species for the analyses. Two different datasets were used, one including the COI matrix and one without, with the same models of sequence evolution as in the BEAST substitution rate analyses. The datasets were run twice for 100,000,000 (nuclear DNA only) or 1,000,000,000 (nuclear and mitochondrial DNA) generations, sampling every 5,000 or 50,000, respectively. Convergence onto the posterior distribution was assessed as above.

Results

Sequence Data

Six gene regions, five nuclear and one mitochondrial, were successfully amplified in all taxa for a total alignment length of 4,411 bp. Aligned sequence lengths for each marker ranged from 1503 bp for rag1 to 280 bp for atrop (Table 3.2). In the case of atrop, many of the positions in the noncoding region of the nucleotide alignment had Bayesian posterior probabilities (BPP) <0.95 and many positions were excluded (data not shown). The included atrop noncoding data contained > 55% variable positions. All sequences have been submitted to GenBank with accession numbers XX######-XX######.

All markers were informative, with at least 15% variable and 11% parsimony informative sites for each. However, there were substantial differences in information content among genes. In particular, COI, while accounting for about 14% of the total dataset by length, contained over 28% of the variable sites and over 33% of the parsimony informative sites. For COI third codon positions, 206 out of 207 bp were variable and all 206 variable sites were parsimony informative (Table 3.2). Thus, although the third codon position of COI comprised < 5% of the total dataset, it contained over 22% of the variable sites and nearly 28% of the parsimony informative ones. Despite these striking differences in information content among markers, compositional heterogeneity tests found no significant differences in base frequencies among loci (not shown).

Table 3.2 Markers sampled for this study. Total marker lengths in base pairs, percent and number of variable and parsimony informative (PI) sites are shown for the final alignment.

Gene Region        Included Length (bp)   % Variable Sites (No.)   % PI Sites (No.)
RAG1               1503                   15.17 (228)              11.38 (171)
MC1R               855                    16.61 (142)              12.28 (105)
SH3PX3             741                    16.87 (125)              12.96 (96)
TMO4C4             411                    24.82 (102)              18.98 (78)
COI                621                    41.38 (257)              40.42 (251)
α-tropomyosin      280                    20.71 (58)               13.93 (39)
TOTAL              4411                   20.68 (912)              16.78 (740)

COI = 28.18% of variable sites; 33.92% of PI sites

Partition          Included Length (bp)   % Variable Sites (No.)   % PI Sites (No.)
RAG1 (1)           501                    6.59 (33)                5.39 (27)
RAG1 (2)           501                    5.19 (26)                3.39 (17)
RAG1 (3)           501                    33.73 (169)              26.55 (133)
MC1R (1)           285                    4.91 (14)                3.86 (11)
MC1R (2)           285                    2.46 (7)                 1.4 (4)
MC1R (3)           285                    42.81 (122)              36.14 (103)
SH3PX3 (1)         247                    2.43 (6)                 2.02 (5)
SH3PX3 (2)         247                    1.21 (3)                 0.81 (2)
SH3PX3 (3)         247                    46.96 (116)              38.06 (94)
TMO4C4 (1)         137                    12.41 (17)               8.76 (12)
TMO4C4 (2)         137                    5.84 (8)                 4.38 (6)
TMO4C4 (3)         137                    56.2 (77)                45.99 (63)
COI (1)            207                    21.74 (45)               20.29 (42)
COI (2)            207                    2.9 (6)                  1.45 (3)
COI (3)            207                    99.52 (206)              99.52 (206)
α-tropomyosin (I)  92                     56.52 (52)               42.39 (39)
α-tropomyosin (E)  188                    3.19 (6)                 1.06 (2)
TOTAL              4411                   20.68 (912)              16.78 (740)

COI third codon positions = 22.59% of variable sites; 27.84% of PI sites
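The shares quoted in the text and in the table notes follow directly from the site counts in Table 3.2. The short sketch below recomputes them; the dictionary layout and function name are illustrative choices, and the counts are transcribed from the table.

```python
# Counts transcribed from Table 3.2: (length in bp, variable sites, PI sites).
TABLE_3_2 = {
    "RAG1": (1503, 228, 171),
    "MC1R": (855, 142, 105),
    "SH3PX3": (741, 125, 96),
    "TMO4C4": (411, 102, 78),
    "COI": (621, 257, 251),
    "atrop": (280, 58, 39),
}
# COI third codon positions, from the partition section of the table.
COI_THIRD = (207, 206, 206)


def shares(entry, table):
    """Percent of total length, variable sites, and PI sites held by one entry."""
    totals = [sum(row[i] for row in table.values()) for i in range(3)]
    return tuple(round(100.0 * entry[i] / totals[i], 2) for i in range(3))


coi_shares = shares(TABLE_3_2["COI"], TABLE_3_2)
coi3_shares = shares(COI_THIRD, TABLE_3_2)
print(coi_shares)   # (share of length, share of variable sites, share of PI sites)
print(coi3_shares)
```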

MCMC Runs, Proposal Adjustments, and Partitioning Strategies

Proposals to Update the Rate Multiplier Were Rarely Accepted and Replicate MCMC Runs Failed to Converge

Of the nine partitioning strategies that allowed for multiple partitions with unlinked rate priors, five had poor mixing for the rate multiplier parameter (updates accepted ≤ 0.1% of the time) and four had significantly different log likelihoods among the eight replicate runs (Table 3.3). Adjusting the heating had no effect on the observed behavior or on the acceptance rates of update proposals.

The large differences in the mean log likelihoods calculated from replicate runs are apparent from the log likelihood traces (Figure 3.1A). The four strategies with 11 or more partitions (GCIE, GCAT, SGIE, SGAT) showed substantial variation among runs. Upper and lower log likelihoods for replicate runs differed by up to 383 log likelihood units (in the case of the GCAT strategy), with at least 116 log likelihood unit differences among runs. The strategies with fewer partitions did not differ significantly in log likelihood scores among runs (Tables 3.3A and B, Figure 3.1B-C), with none having upper and lower log likelihood estimates that differed by more than 81 log likelihood units among replicate runs.

Number of Partitions, Not the Exclusion of COI, Determined Proposal Acceptance Rates and MCMC Convergence Success

Because COI evolves rapidly, another set of replicate MrBayes runs was performed without the COI data to determine whether its inclusion was affecting convergence. However, when COI was removed from the alignment and runs were performed according to the original partitioning strategies (strategies and number of partitions with and without COI can be found in Table 3.1), among-run variation in log likelihoods remained high (Figure 3.2A). In addition, the acceptance rate for the rate multiplier parameter was still low, with updates accepted < 0.1% of the time (Table 3.4A). This, however, was not observed for all partitioning strategies, as only those with greater than six partitions (GCIE, GCAT, SGIE, SGAT) displayed that behavior (Table 3.4). Upper and lower log likelihood estimates for replicate runs again showed substantial differences in log likelihood units. The differences in log likelihood units among runs ranged from 141 for the GCIE partitioning strategy to over 264 for the SGIE strategy. An exception to this pattern was the GENES strategy, with five partitions, where log likelihoods also differed significantly between runs and a large difference in log likelihood units was present (158, greater than that from the full matrix analyses) (Figure 3.2C). As in the runs with the full molecular matrix, the strategies with fewer partitions (with the exception of GENES) did not differ significantly in log likelihood scores among runs (Figure 3.2B). The differences in log likelihood units among runs were comparable to those resulting from the full matrix analyses (Figure 3.1B).

Figure 3.1A-C Traces visualizing the estimates of the ln likelihoods from eight replicate runs for each of the nine strategies with greater than one partition (see Table 3.1). The name of the partitioning strategy and the number of partitions are shown in the upper left corner of each plot. All pre-burnin samples have been removed. The x-axis represents 10,000,000 generations of the Markov chain for all strategies. The y-axis represents the ln likelihoods, which differ between strategies. The range in ln likelihood values among runs is shown below the traces.

"Replicate" MCMC Runs Varied and the Best Runs Were Rare

I calculated 2 ln Bayes factors from the post-burnin samples for each of the replicate runs from the nine full data matrix partitioning strategies. As judged by the criteria of Kass and Raftery (1995), highly significant differences in Bayes factors (2 ln Bayes factors > 10) were found

Table 3.3A-B Initial runs without operator adjustment for each of the nine multi-partition strategies. Runs were performed twice for each heating strategy. ESS = effective sample size, LnL = log likelihood, TL = total length, Ratemult = the “rate multiplier” parameter.

Temp Run RateMult Accept. Rate Mean LnL LnL Sig Diff? Temp Run RateMult Accept. Rate Mean LnL LnL Sig Diff? GCAT 0.2 1 0.03 -19047.832 SGIE 0.2 1 0.03 -18763.95 Yes Yes 2 0.02 -18708.403 2 0.03 -18948.01a 0.1 1 0.02 -18883.309 0.1 1 0.03 -18892.18a Yes No 2 0.04 -18723.23 2 0.02 -18883.77 0.05 1 0.03 -18824.096 0.05 1 0.05 -18718.37 Yes No 2 0.03 -18898.421 2c 0.05 N/A 0.02 1 0.04 -18779.72b 0.02 1 0.03 -18760.63b No Nod 2 0.03 -18743.95b 2 0.04 -18871.23 0.01 1 0.05 -18899.43 Yes GCIE 0.2 1 0.05 -18636.16a 2 0.04 -18682.53 Yes 2 0.03 -18830.70 0.1 1 0.04 -18597.70 SGAT 0.2 1 0.03 -18871.49 Yes Yes 2 0.04 -18721.08 2 0.03 -18918.30 0.05 1 0.03 -18626.86 0.1 1 0.4 18931.72 Yes Yes 2 0.03 -18737.49b 2 0.03 -19010.06 0.02 1 0.03 -18885.99b 0.05 1 0.05 -18870.80 Yes Yes 2 0.03 -18629.04 2 0.03 -18927.32 0.02 1 0.03 -18868.60b No 2 0.04 -18838.99b

Table 3.3A. Color key for RateMult: red, less than or equal to 0.1%; yellow, less than or equal to 1. a = mean at highest point in posterior distribution; b = bimodal; c = sharp increase in LnL at end of run; d = overlap is only in the bimodal part of distribution.

Strategy  Temp  Run  RateMult Accept. Rate  Mean LnL    LnL Sig Diff?
NMIE      0.2   1    0.06                   -18695.03   No
NMIE      0.2   2    0.09                   -18695.23
NMIE      0.1   1    0.08                   -18695.03   No
NMIE      0.1   2    0.08                   -18695.43
NMIE      0.05  1    0.1                    -18694.02   No
NMIE      0.05  2    0.1                    -18693.90
NMIE      0.02  1    0.1                    -18693.68   No
NMIE      0.02  2    0.09                   -18694.68
SNMIE     0.2   1    1.05                   -19339.10   No
SNMIE     0.2   2    1.02                   -19338.75
SNMIE     0.1   1    1.05                   -19339.25   No
SNMIE     0.1   2    0.98                   -19338.34
SNMIE     0.05  1    1.01                   -19338.89   No
SNMIE     0.05  2    1.01                   -19338.60
SNMIE     0.02  1    0.96                   -19339.17   No
SNMIE     0.02  2    1.06                   -19338.80
NMAT      0.2   1    0.11                   -18763.80   No
NMAT      0.2   2    0.09                   -18762.08
NMAT      0.1   1    0.13                   -18763.18   No
NMAT      0.1   2    0.12                   -18763.87
NMAT      0.05  1    0.14                   -18763.83   No
NMAT      0.05  2    0.14                   -18763.90
NMAT      0.02  1    0.14                   -18265.25   N/A
NMAT      0.02  2    0.14                   -18765.76
SNMAT     0.2   1    1.9                    -19407.90   No
SNMAT     0.2   2    1.9                    -19407.81
SNMAT     0.1   1    1.83                   -19408.22   No
SNMAT     0.1   2    1.96                   -19408.24
SNMAT     0.05  1    1.88                   -19407.82   No
SNMAT     0.05  2    1.93                   -19408.12
SNMAT     0.02  1    1.9                    -19408.29   No
SNMAT     0.02  2    1.89                   -19407.96
GENES     0.2   1    0.21                   -19751.17   No
GENES     0.2   2    0.3                    -19768.79
GENES     0.1   1    0.3                    -19769.47   No
GENES     0.1   2    0.16                   -19749.99
GENES     0.05  1    0.34                   -19776.42   No
GENES     0.05  2    0.16                   -19747.00
GENES     0.02  1    0.2                    -19778.69   No
GENES     0.02  2    0.28                   -19773.76

Table 3.3B. Color key for RateMult: red, less than or equal to 0.1%; yellow, less than or equal to 1. a = mean at highest point in posterior distribution; b = bimodal; c = sharp increase in LnL at end of run; d = overlap is only in the bimodal part of distribution.

Figure 3.2A-C Estimates of the ln likelihoods from eight replicate runs for each of the nine strategies with greater than one partition, but with COI removed (see Table 3.1). Notation as in Figure 3.1.

among nearly every replicate run for the four strategies with greater than eight partitions (Table 3.5). Bayes factors were not significant among runs for any of the strategies with fewer than eight partitions, with the exception of the GENES strategy (not shown).

Although the log likelihoods among replicate runs were not significantly different for the GENES strategy (Tables 3.3A and B), the runs did have a large difference in log likelihood units when compared to the other less partitioned strategies (Figure 3.1C). The long “tail” of the log likelihood distributions for those runs most likely caused concordant differences in the associated harmonic mean likelihoods, as the Newton and Raftery method for calculating Bayes factors is sensitive to low likelihood values (Newton and Raftery, 1994; Nylander et al., 2004).
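The sensitivity of the harmonic mean to a few low likelihood values can be seen in a minimal sketch. The log likelihood samples below are invented toy values, not data from this study, and the function names are illustrative; the estimator is the standard harmonic mean of the sampled likelihoods, computed in log space for numerical stability.

```python
import math


def log_harmonic_mean(log_likes):
    """ln of the harmonic mean of likelihoods, given their natural logs.

    ln HM = ln(n) - ln(sum(exp(-lnL_i))), evaluated with the log-sum-exp trick.
    """
    negated = [-x for x in log_likes]
    m = max(negated)
    log_sum = m + math.log(sum(math.exp(v - m) for v in negated))
    return math.log(len(log_likes)) - log_sum


def two_ln_bayes_factor(run_a, run_b):
    """2 ln Bayes factor comparing run_a to run_b via harmonic means."""
    return 2.0 * (log_harmonic_mean(run_a) - log_harmonic_mean(run_b))


# Toy samples: identical except that one run has a single low "tail" value.
tight_run = [-100.0] * 100
tailed_run = [-100.0] * 99 + [-150.0]

print(log_harmonic_mean(tight_run))
print(log_harmonic_mean(tailed_run))
print(two_ln_bayes_factor(tight_run, tailed_run))
```

In this toy case a single sample out of 100 pulls the harmonic mean estimate down by tens of log likelihood units, which is far more than the threshold of 10 used to call a 2 ln Bayes factor highly significant.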

For each partitioning strategy, the best of the eight runs, as determined by 2 ln Bayes factors, was selected, and these best runs were then compared among strategies using Bayes factors. The GCIE partitioning strategy was favored (Table 3.6). However, when looking solely at the strategies with significantly different Bayes factors among runs (Table 3.5), in all cases the run with the highest Bayes factors for a given partitioning strategy occurred only once, indicating that for the best partitioning strategies, the best runs were rare.

Table 3.4A-B Rate multiplier acceptance rates for runs with and without COI with no operator adjustment, and including COI with operator adjustment.

Strategy  Temp  Run  Ratemult Accept. Rate  Ratemult Accept. Rate no COI  Ratemult Accept. Rate New Props.
GCAT      0.2   1    0.03                   0.04                          31.39
GCAT      0.2   2    0.02                   0.04                          32.62
GCAT      0.1   1    0.02                   0.03                          31.2
GCAT      0.1   2    0.04                   0.03                          31.63
GCAT      0.05  1    0.03                   0.04                          31.48
GCAT      0.05  2    0.03                   0.06                          32.04
GCAT      0.02  1    0.04                   0.05                          30.86
GCAT      0.02  2    0.03                   0.03                          31.36
GCIE      0.2   1    0.05                   0.04                          33.77
GCIE      0.2   2    0.03                   0.05                          33.98
GCIE      0.1   1    0.04                   0.04                          34.03
GCIE      0.1   2    0.04                   0.04                          34.7
GCIE      0.05  1    0.03                   0.07                          34.95
GCIE      0.05  2    0.03                   0.06                          34.7
GCIE      0.02  1    0.03                   0.06                          34
GCIE      0.02  2    0.03                   0.06                          34.5
SGIE      0.2   1    0.03                   0.04                          67.57
SGIE      0.2   2    0.03                   0.04                          68.19
SGIE      0.1   1    0.03                   0.05                          49.03
SGIE      0.1   2    0.02                   0.05                          68.14
SGIE      0.05  1    0.05                   0.05                          68.08
SGIE      0.05  2c   0.05                   0.04                          66.94
SGIE      0.02  1    0.03                   0.04                          66.4
SGIE      0.02  2    0.04                   0.04                          66.37
SGAT      0.2   1    0.03                   0.05                          72.34
SGAT      0.2   2    0.03                   0.03                          72.74
SGAT      0.1   1    0.4                    0.05                          72.43
SGAT      0.1   2    0.03                   0.05                          72.41
SGAT      0.05  1    0.05                   0.04                          71.93
SGAT      0.05  2    0.03                   0.05                          71.04
SGAT      0.02  1    0.03                   0.07                          71.72
SGAT      0.02  2    0.04                   0.04                          71.83

Table 3.4A. Color key for Ratemult: red, less than or equal to 0.1%; yellow, less than or equal to 1.

Strategy  Temp  Run  Ratemult Accept. Rate  Ratemult Accept. Rate no COI  Ratemult Accept. Rate New Props.
NMIE      0.2   1    0.06                   1.28                          9.37
NMIE      0.2   2    0.09                   1.26                          24.84
NMIE      0.1   1    0.08                   1.29                          13.72
NMIE      0.1   2    0.08                   1.28                          6.39
NMIE      0.05  1    0.1                    1.33                          10.44
NMIE      0.05  2    0.1                    1.32                          10.27
NMIE      0.02  1    0.1                    1.29                          12.22
NMIE      0.02  2    0.09                   1.25                          13.73
GENES     0.2   1    0.21                   1.2                           2.93
GENES     0.2   2    0.3                    1.96                          /
GENES     0.1   1    0.3                    0.47                          10.5
GENES     0.1   2    0.16                   0.46                          /
GENES     0.05  1    0.34                   0.23                          1.02
GENES     0.05  2    0.16                   0.49                          /
GENES     0.02  1    0.2                    0.55                          5.92
GENES     0.02  2    0.28                   0.55                          /
NMAT      0.2   1    0.11                   2.18                          5.11
NMAT      0.2   2    0.09                   2.15                          12.26
NMAT      0.1   1    0.13                   2.16                          5.42
NMAT      0.1   2    0.12                   2.14                          12.21
NMAT      0.05  1    0.14                   2.07                          5.17
NMAT      0.05  2    0.14                   2.14                          7.26
NMAT      0.02  1    0.14                   2.12                          7.9
NMAT      0.02  2    0.14                   2.12                          9.34
SNMIE     0.2   1    1.05                   10.17                         8.23
SNMIE     0.2   2    1.02                   10.24                         17.65
SNMIE     0.1   1    1.05                   10.31                         14.54
SNMIE     0.1   2    0.98                   10.24                         11.61
SNMIE     0.05  1    1.01                   10.31                         12.83
SNMIE     0.05  2    1.01                   10.24                         13.29
SNMIE     0.02  1    0.96                   10.21                         11.8
SNMIE     0.02  2    1.06                   10.34                         11.54
SNMAT     0.2   1    1.9                    14.82                         10.19
SNMAT     0.2   2    1.9                    1.09                          6.32
SNMAT     0.1   1    1.83                   2.1                           7.81
SNMAT     0.1   2    1.96                   14.7                          8.32
SNMAT     0.05  1    1.88                   1.93                          9.94
SNMAT     0.05  2    1.93                   15.02                         8.91
SNMAT     0.02  1    1.9                    14.73                         9.43
SNMAT     0.02  2    1.89                   14.88                         10.44

Table 3.4B. Color key for Ratemult: red, less than or equal to 0.1%; yellow, less than or equal to 1.

Optimizing Proposal Acceptance Rates for the Rate Multiplier Parameter Resulted in Faster Convergence and Decreased Variation among Replicate Runs

In MrBayes, the proposal algorithm that updates the rate multiplier has a default setting of 500 for the tuning parameter, α. This value appeared to be non-optimal, as < 0.1% of proposals to update the rate multiplier parameter were accepted (Table 3.4). When the adjusted values for the tuning parameter were used, the rate multiplier parameter showed good mixing and log likelihood values were stable among runs (Table 3.4 and Figure 3.3). Thus, optimizing proposal acceptance rates for the rate multiplier parameter resulted in faster convergence and a substantial reduction in among-run variation compared to when default tuning was used (Figures 3.1A and 3.3).

Figure 3.3 Traces visualizing the estimates of the ln likelihoods from the operator adjusted runs of the four most partitioned strategies. Each set represents four replicate runs for each strategy. All pre-burnin samples have been removed.


Determination of Final Partitioning Strategy

Final runs were performed once the final tuning parameter settings for each partitioning strategy, temperature settings, and branch length priors were determined. Pairwise comparisons of Bayes factors for the ten different partitioning strategies revealed highly significant support for the GCIE strategy, the same as chosen for the non-operator adjusted runs (Table 3.6). In addition, the rank order of the partitioning strategies, as judged by Bayes factors, was nearly the same for the operator vs. non-operator adjusted runs. The GCIE partitioning strategy was used for all final phylogenetic analyses.

Phylogenetic Reconstruction

Once tuning parameters were adjusted, the AWTY results showed that all Bayesian analyses had converged. In addition, all parameters for the mutational models had ESS values > 250. Models for the MrBayes and BEAST analyses, which were chosen using the AIC, as well as GARLI models using the BIC, for each dataset partitioning strategy can be found in Table S3.3.

Bayesian and Maximum Likelihood Analyses Yielded Well-Supported, Concordant Topologies

Both maximum likelihood and Bayesian analyses of phylogeny for individual gene regions recovered a monophyletic Acanthemblemaria genus (not shown). Support for relationships within the genus varied by marker type and inference method, though. The trees resulting from the partitioned maximum likelihood analyses generally had weaker support for most nodes compared to the Bayesian analyses (not shown).

The MrBayes analysis of the concatenated dataset analyzed under the GCIE partitioning strategy produced a well-supported phylogeny, with 34 of 36 nodes supported by Bayesian posterior probability (BPP) values greater than 0.95 (Fig. 3.4). The topology of the maximum likelihood tree was identical to that of the Bayesian tree, with 33 of 36 nodes supported by non-parametric bootstrap values of 75% or greater.

The genus Acanthemblemaria was recovered as monophyletic with good support (Fig. 3.4). The genus is nearly evenly split into two clades, each containing a well-supported transisthmian species pair. Each clade had a majority of either Eastern Pacific (clade I) or Caribbean taxa (clade II). Several additional monophyletic groups were found within clades I and II. The “barnacle blenny” clade (sensu Hastings, 1990) consists of all species in clade I with the exception of A. greenfieldi and A. chaplini. Clade I also contains the “hancocki species group” (sensu Hastings, 1990), consisting of all the species in the “barnacle blenny” clade except A. rivasi and A. castroi. Within clade II, two monophyletic groups were identified: clades A and B. Clade IIA consists of A. medusa, A. maria, and Acan. n. sp. Clade IIB contains A. spinosa, A. aspera, and A. paula.

Within clade II, the only Eastern Pacific taxon is the transisthmian geminate A. exilispinus. Clade I includes two Caribbean sister taxa, A. greenfieldi and A. chaplini, along with eastern Pacific taxa. In addition, a deeply divergent sister relationship was recovered between A. chaplini collected from its type locality in New Providence, the Bahamas and individuals from two other Caribbean populations, denoted A. cf. chaplini in Figure 3.4 (all collection localities can be found in supplemental table S1). Last, a previously proposed cryptic species sister to A. rivasi with a Venezuelan distribution (A. Acero, pers. comm. to PAH) is recovered as reciprocally monophyletic with A. rivasi from Panama, suggesting it may be a valid species.

Two nodes received poor support in both the Bayesian and maximum likelihood analyses. The first node, in the “hancocki species group”, supports a sister relationship between A. hancocki and A. macrospilus/A. balanorum. The second node, in clade II, creates a sister relationship between clades A and B. The node establishing the monophyly of clade IIA also received poor support, but only in the maximum likelihood analysis.

Estimated branch lengths in the concatenated analysis differed among taxa. Within the “barnacle blenny” clade every sister lineage differed in branch lengths, while in clade IIA, the branch leading to A. medusa was substantially shorter than those of its sister taxa. The largest difference in branch lengths was between the transisthmian geminates A. exilispinus and A. betinensis, with the branch leading to the latter twice as long as that leading to its sister species (Figure 3.4).

Including COI Increases Topological Uncertainty of Species Tree Estimates

The species tree analysis of the nuclear DNA dataset recovered a well-supported topology nearly identical to that from the concatenated analysis (Figure 3.5). The only difference was the placement of A. hancocki as basal to the rest of the species in the “hancocki species group”, where in the concatenated analysis A. crockeri was basal, although the node in question had poor support in both analyses (Figures 3.4 and 3.5A).

The species tree analysis using the dataset that included the rapidly evolving COI recovered a different topology from the analysis with only nuclear markers (Figures 3.5A and 3.5B). In contrast to the nuclear only tree, no sister relationships in the “hancocki species group” were recovered with good support. In general, the main effect of adding the mitochondrial COI data was much poorer support for several nodes that were well resolved in the nuclear only tree. In nearly all cases, the nodes for which support declined in the species tree also had poor support in the COI gene tree (Fig. 3.5C).

To further visualize the effect of adding COI to the species tree analysis, all post-burnin trees for both the full and nuclear only datasets were analyzed in DensiTree v1.4.5 (Bouckaert, 2010). The qualitative estimates of the consensus trees from each of the datasets show greater uncertainty in the full dataset, as more consensus trees, each representing fewer total trees from the posterior sample, were found, leading to a less dense tree than that from the nuclear DNA only dataset (Figure 3.6). Taken together, the decreases in posterior probabilities of the species tree, as well as the larger set of post-burnin consensus trees, show that with the addition of COI, topological uncertainty increases.

Figure 3.4 Acanthemblemaria phylogenetic estimate based on MrBayes analysis of concatenated dataset with GCIE partitioning. All intraspecific tips have been collapsed. Eastern Pacific taxa are in bold. The Bayesian consensus tree is shown. All nodes with posterior probabilities ≥ 0.95 are denoted by an asterisk above the branch. All nodes with maximum likelihood non-parametric bootstrap values ≥ 75% are denoted by an asterisk below the branch. Nodes denoted by a dash did not receive posterior probabilities ≥ 0.95 (above the branch) or non-parametric bootstrap values ≥ 75% (below the branch). Clades are demarcated. The clade within the gray box is the “hancocki species group”.

Substitution Rate Estimates

The BEAST analyses of substitution rates found that rates varied among taxa, but that most markers were evolving under a strict molecular clock. The tree resulting from the six-gene analysis showed some signal of rate variation throughout the phylogeny (Figure 3.7A). Where present, variation in substitution rates was almost exclusively limited to terminal branches. This indicates that differences in substitution rates were primarily between sister taxa, with no evidence of clade-wide shifts. The primary exception to the pattern of rate shifts at terminal branches was within Clade IIB, with an increased rate on the internal branch leading to the A. aspera/A. paula split, although there was no significant difference in rates between those taxa.

If the coefficient of variation in the mean substitution rate across the phylogeny is not significantly different from zero, then a strict clock cannot be rejected (Drummond et al., 2007b). Although a visual pattern of rate variation was observed in the six-gene phylogeny, the coefficient of variation in the mean substitution rate was not significantly greater than zero for the combined dataset and most individual markers (not shown).
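One way to make this criterion concrete is to ask whether the 95% highest posterior density (HPD) interval of the coefficient of variation abuts zero. The sketch below is an illustration under stated assumptions, not the procedure used in this study: the posterior samples are synthetic, the tolerance for "abutting zero" (0.01) is a hypothetical choice, and in practice the samples would come from the BEAST log files.

```python
import random


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, int(round(mass * n)))  # number of samples inside the interval
    # Width of every window of k consecutive ordered samples.
    widths = [(ordered[i + k - 1] - ordered[i], i) for i in range(n - k + 1)]
    _, start = min(widths)
    return ordered[start], ordered[start + k - 1]


def strict_clock_rejected(cov_samples, tol=0.01, mass=0.95):
    """Reject a strict clock only if the HPD lower bound is clear of zero.

    The coefficient of variation cannot be negative, so the interval never
    strictly contains zero; `tol` is an assumed cutoff for "abutting zero".
    """
    lower, _ = hpd_interval(cov_samples, mass)
    return lower > tol


rng = random.Random(42)
# Synthetic posteriors: one piled up against zero, one well away from it.
clocklike = [abs(rng.gauss(0.0, 0.1)) for _ in range(5000)]
non_clocklike = [rng.gauss(0.6, 0.08) for _ in range(5000)]

print(strict_clock_rejected(clocklike))
print(strict_clock_rejected(non_clocklike))
```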

The two genes that deviated from a strict molecular clock were rag1 and COI (Figures 3.7B and 3.7C). It was not clear whether these deviations resulted from substitution rate variation within Acanthemblemaria or from including the outgroup taxa. Branch rates for the rag1 and COI datasets were calculated on a fixed topology that excluded the outgroup lineages. When those outgroup taxa were removed, a strict clock could no longer be rejected for rag1 (Figure 3.7C). Conversely, a significant deviation from a strict clock was still recovered from the COI dataset (Figure 3.7B). Therefore, out of the six markers used in this study, only COI was not evolving under a strict clock within Acanthemblemaria.

Mitochondrial COI Substitution Rate Is 24.6% per Million Years and 97.5X Faster than Nuclear DNA

Substitution rate estimates revealed very rapid mitochondrial rates across the phylogeny, in agreement with previous results (Eytan and Hellberg, 2010). COI was evolving at a mean rate 97.5X greater than the combined nuclear markers (Figure 3.8), with upper and lower 95% HPD of 205:1 and 20:1, respectively. The mean (95% upper and lower HPD) substitution rates across the phylogeny, in substitutions/site/million years, for COI and the combined nuclear markers were 1.43 × 10^-1 (2.46 × 10^-1, 5.39 × 10^-2) and 1.67 × 10^-3 (2.74 × 10^-3, 6.47 × 10^-3), respectively. These rates corresponded to values of 24.6 (49.2, 10.68) and 0.33 (0.59, 1.29) percent sequence divergence per million years, respectively.
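The relationship between a per-lineage rate and percent pairwise divergence can be sketched as below. This assumes that divergence between two sequences accumulates along both lineages, so that percent divergence per million years is twice the per-lineage rate expressed as a percentage; that assumption is a common convention and reproduces the reported COI upper bound and nuclear mean, but it is offered as a reading of the conversion rather than as the stated procedure. The function name is illustrative.

```python
def percent_divergence_per_myr(rate_per_site_per_myr):
    """Convert a per-lineage rate (substitutions/site/Myr) to % pairwise divergence/Myr.

    Assumes divergence accumulates independently along both lineages (factor of 2).
    """
    return 2.0 * rate_per_site_per_myr * 100.0


coi_upper = percent_divergence_per_myr(2.46e-1)     # reported COI upper 95% HPD
nuclear_mean = percent_divergence_per_myr(1.67e-3)  # reported nuclear mean rate

print(round(coi_upper, 1))
print(round(nuclear_mean, 2))
```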

Discussion

The partitioning of molecular datasets when inferring phylogenies using Bayesian inference has been shown to increase the accuracy of both parameter and topology estimates (Brandley et al., 2005; Brown and Lemmon, 2007; Nylander et al., 2004). However, the success of this approach is dependent on good mixing of the Markov chain so that a robust sample of the posterior distribution of the model parameters can be obtained (Huelsenbeck et al., 2002b; Ronquist et al., 2009; Yang, 2006). In this study, I found poor mixing of the Markov chain, with proposals to update the rate multiplier parameter being rarely accepted. Log likelihoods also differed among replicate runs for strategies that employed more than eight partitions. Although I estimated a rapid mitochondrial substitution rate, including mitochondrial data in the MrBayes analyses did not cause the poor MCMC runs. Instead, as the number of partitions increased, the performance of the MrBayes analyses decreased, regardless of the severity of rate variation among partitions. However, this behavior could be ameliorated by adjusting a single proposal tuning parameter, that of the rate multiplier. Including the fast COI data did affect the species tree analysis: mitochondrial DNA caused decreased support for nodes and increased uncertainty in the species tree estimate compared to the exclusive use of nuclear markers. The increased variation in COI relative to other markers seems to have given it disproportionate weight in the posterior estimates of the species tree.

Figure 3.5 A. Species tree topology estimated from the nuclear DNA only dataset. Bayesian posterior probabilities (BPP) ≥ 0.95 for the nuclear and combined datasets are represented by asterisks above and below the nodes, respectively. An “X” indicates a split that was not recovered from the nuclear + mitochondrial dataset. B. The inset shows the resolution of the “hancocki species group” recovered from the nuclear + mitochondrial dataset with BPP. C. The COI gene tree with all nodes with BPP ≥ 0.95 denoted by asterisks.

Substitution Rates

When averaged over the entire phylogeny, the absolute and relative substitution rates for mitochondrial COI were considerably higher than the already-fast rates previously estimated (Eytan and Hellberg, 2010). Here I found a mean substitution rate of 24.6% pairwise sequence divergence per million years, while previous results found a rate of 11.2% per million years. A mean ratio of 97.5:1 was found here for mitochondrial to nuclear sequence substitution rates, substantially higher than the 37-fold difference found previously. These values place Acanthemblemaria blennies at the highest end of vertebrate mitochondrial substitution rate estimates, both absolute and relative to nuclear DNA (Nabholz et al., 2008; Nabholz et al., 2009; Welch et al., 2008).

Figure 3.6 The posterior distribution of post-burnin consensus species trees for (A) the nuclear DNA and (B) nuclear and mitochondrial DNA datasets, respectively. Darker lines indicate that a particular consensus tree contains a larger proportion of the posterior set of trees than lighter lines. Areas where there are dark lines can be interpreted as parts of the topology with high confidence, while light, overlapping lines represent low confidence.

Figure 3.7 BEAST analyses of substitution rates throughout the genus Acanthemblemaria. A. Tree resulting from six-gene analyses. Branches are colored by substitution rates. Blue branches are slow, while red branches are fast. B and C. Posterior density of the coefficient of variation for the mean COI (B) and rag1 (C) substitution rates. The value for the coefficient of variation is on the x-axis, while the y-axis represents the proportion of the sample in the posterior distribution. The blue and gray curves represent the posterior distributions of the analysis with and without the outgroup taxa, respectively. Values that do not overlap with zero indicate that a strict clock can be rejected.

Figure 3.8 Posterior distribution of the ratio of mitochondrial to nuclear substitution rates.

I was unable to reject a strict molecular clock in Acanthemblemaria for any markers, save for COI (Figure 3.7B). This inability to reject a strict clock may be because rate variation was not present in the majority of the tree (Figure 3.7A), which would cause the mean rate to appear more clock-like (Welch and Bromham, 2005). That local clocks may exist in Acanthemblemaria is suggested by the concatenated MrBayes tree, where several sister taxa have different branch lengths. The most prominent of these were A. betinensis and A. exilispinus, where the latter is on a substantially shorter branch than the former (Figure 3.4). This result is problematic, as it would appear that a strict clock should be rejected for these geminate taxa. The rejection of a strict clock between geminates may have a substantial effect on substitution rates estimated using the rise of the Isthmus of Panama as a calibration, as branch lengths will not be proportional to time and cannot provide an accurate estimate of absolute rates of molecular evolution.

That the rate of COI evolution in this genus is so great also has implications for species delimitation using DNA barcoding. COI-based species delimitation frequently makes use of a “barcoding gap”, where a certain level of sequence divergence is used to distinguish between within-species genetic variation and between-species genetic divergence (Hebert et al., 2003). A “barcoding gap” calculated in other taxa with slower rates of molecular evolution is expected to cause oversplitting in Acanthemblemaria blennies, as many populations have very large COI genetic distances among them (Eytan and Hellberg, 2010), but do not appear to differ in any other diagnostic feature (Smith-Vaniz and Palacio, 1974). While DNA barcoding may prove to be a useful tool for alpha taxonomy, it may not be appropriate to apply the same fixed “barcoding gap” to different taxonomic groups.

Partitioned Bayesian Analyses

In the course of estimating the phylogenetic tree for Acanthemblemaria using MrBayes, I found that a subset of partitioning strategies resulted in very poor performance of the Markov chain, with replicate runs of datasets with more than 8 partitions having large differences in mean log likelihoods (Figure 3.1 and Tables 3.3A and B). This poor performance was related to the mixing of the rate multiplier parameter, as proposals to update the Markov chain were seldom accepted (Tables 3.3A and B, 3.4). Because all partitions must share the same set of branch lengths (Nylander et al., 2004), large differences in substitution rates among partitions cause the overall rate to be divided unequally and could lead to a small portion of the parameter space containing the highest posterior probabilities. This would lead to update proposals being rarely accepted, because even small changes in the Markov chain would produce very different posterior probabilities (Ronquist et al., 2009). The expectation, then, was that the large differences in substitution rates between mitochondrial COI and the nuclear DNA in my dataset were the cause of the poor mixing I observed.

Contrary to my expectation, including the COI data had no effect on the acceptance of proposed changes to the rate multiplier, or on variation in log likelihoods. When COI was removed from the analyses, both acceptance of proposals and mixing of the Markov chain remained poor and differences in log likelihoods among runs remained large (Figure 3.2A).

Rather than the relative rates among partitions being the problem, the number of partitions in the data was responsible for the poor performance of the MCMC analysis. All strategies with more than 8 partitions (or 6 in the case of the nuclear-only dataset) were characterized by replicate runs with large differences in log likelihoods, although the effect was less pronounced when COI was removed (Figure 3.2A). However, the latter result would be expected because the number of partitions decreased with the exclusion of COI.

Changing the heating to allow more swaps between chains had no effect on run performance and did not aid in convergence or proposal acceptance rates (Tables 3.3A and B, Figure 3.1A). Adjusting the tuning parameter that produces updates to the rate multiplier parameter, however, did. For the four partitioning strategies with more than 8 partitions, replicate runs converged onto the same posterior distribution, and individual runs converged faster than when operators were not adjusted (compare Figure 3.3 with Figure 3.1A). In addition, proposals to update the rate multiplier parameter were accepted with far greater frequency once the tuning parameter was adjusted than before (Table 3.4). Simply adjusting the tuning parameter, without making any other changes, increased mixing of the Markov chain and decreased differences in log likelihoods among runs substantially (Figure 3.3).

In all cases, the adjusted tuning parameters were substantially greater than the default values in MrBayes. While MrBayes uses a default value for the tuning parameter α of 500, the auto-tuned proposals I employed had α values as high as 60,000 (see Supplementary information). Values this high suggest that only very small proposed changes to the Markov chain were being accepted (Ronquist et al., 2009).
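The relationship between α and the size of proposed moves can be illustrated with a small simulation. The sketch below assumes that rate multipliers are updated with a Dirichlet proposal centred on the current values and scaled by α, and it simplifies the multipliers to proportions that sum to 1. The function names, the example rates, and the step-size summary are hypothetical and are not taken from MrBayes.

```python
import random


def dirichlet_proposal(current, alpha, rng):
    """Draw new proportions from Dirichlet(alpha * current)."""
    draws = [rng.gammavariate(alpha * p, 1.0) for p in current]
    total = sum(draws)
    return [d / total for d in draws]


def mean_step(current, alpha, n_draws=2000, seed=1):
    """Average absolute change per element over many proposals."""
    rng = random.Random(seed)
    step = 0.0
    for _ in range(n_draws):
        proposed = dirichlet_proposal(current, alpha, rng)
        step += sum(abs(a - b) for a, b in zip(proposed, current)) / len(current)
    return step / n_draws


# Hypothetical relative rates for 10 partitions, normalized to sum to 1.
raw_rates = [1, 1, 2, 2, 3, 3, 5, 8, 10, 15]
total_rate = sum(raw_rates)
rates = [r / total_rate for r in raw_rates]

# Compare the default alpha reported in the text with the largest tuned value.
step_default = mean_step(rates, 500)
step_tuned = mean_step(rates, 60000)
print(step_default, step_tuned)
```

Under these assumptions the larger α yields much smaller proposed moves, which is consistent with the interpretation that only very small changes to the rate multipliers were being accepted.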

Without operator adjustment, differences in log likelihoods among runs were so large that for nearly all of the four highly partitioned strategies, significant Bayes factors were found among replicate runs (Table 3.5). Interestingly, when the best of these eight runs for each strategy was taken and Bayes factors were then calculated among them, I found nearly the same hierarchy of support for partitioning strategies as in the operator-adjusted runs (Table 3.6). Note, however, that the “best” of the eight replicate runs with no operator adjustment occurred just once (Tables 3.5 and 3.6), demonstrating that without operator adjustment, convergence could be rare and the best log likelihood may not be found.

Previous studies examining the effect of partitioning strategies in MrBayes did not find the same results I did, although none specifically examined mixing of the rate multiplier parameter. Of these, some did not account for variation in substitution rates among partitions (Brandley et al., 2005; Brown and Lemmon, 2007). For those that did, results were mixed, and strict comparisons are not possible due to different software versions or incomplete reporting of results. Marshall (2009) found evidence that the inferred rate multiplier value for individual partitions was grossly inaccurate, but did not seem to find poor mixing of the rate multiplier parameter, nor a direct effect of the number of partitions on convergence success. However, as just three partitions were used for his simulated and empirical datasets, it is not surprising that the poor MCMC performance I recovered was not found. Re-analyses of pre-existing datasets that contained up to 7 partitions also did not recover the pattern I found here, although fewer replicate runs were performed than in my study.

Nylander et al. (2004) did not find any failure of replicate runs to converge on the same log likelihoods, although they used a different version of MrBayes (v3.0). However, they also examined fewer partitions than in this study, a maximum of five. They did perform a single MCMC run for a 12-partition strategy, and convergence appeared to happen quite quickly (their Figure 8C), although it is not clear if rates were allowed to vary among partitions.

Last, Brown et al. (2010) appeared to find some evidence of poor mixing for the rate multiplier parameter in an 11-partition dataset. Posterior estimates for some of their partitions switched back and forth from very small to very large values (Figure 6C in Brown et al.). However, log likelihoods did not appear to have convergence problems (their Figure 6A). It is not clear how many runs were performed for those analyses or whether auto-tuning was employed (Brown et al. used MrBayes 3.2 for their study).

Studies employing large numbers of partitions could encounter the same behavior I found in my MrBayes analyses. For example, a phylogenomic study of birds by Hackett et al. (2008) contained 19 genes for 171 taxa. The MrBayes analyses of the dataset partitioned by gene failed to converge on the same log likelihood for replicate runs (Hackett et al., 2008). Although the proposal acceptance rate for the rate multiplier parameter was not given in their paper, their log likelihood plots were qualitatively similar to those from this study (compare Figure 3.1 to Figure S3 in Hackett et al.) and, as in my study, Metropolis coupling did not appear to aid in convergence.

Table 3.5 Pairwise 2ln Bayes factors (BF) calculated among replicate runs for each multi-partition strategy. Positive BF values greater than 10 are considered to be very strong support in favor of a given model (Kass and Raftery, 1995).

2lnBF   lnL        GCAT1    GCAT2    GCAT3    GCAT4    GCAT5    GCAT6    GCAT7    GCAT8
GCAT1   -19073.43  -
GCAT2   -18731.13  684.60   -
GCAT3   -18924.27  298.31   -386.28  -
GCAT4   -18745.66  655.54   -29.06   357.23   -
GCAT5   -18848.25  450.36   -234.24  152.05   -205.18  -
GCAT6   -18934.20  278.45   -406.15  -19.86   -377.09  -171.91  -
GCAT7   -18885.70  375.46   -309.14  77.14    -280.09  -74.90   97.00    -
GCAT8   -18875.22  396.41   -288.19  98.09    -259.14  -53.95   117.96   20.95    -

2lnBF   lnL        GCIE1    GCIE2    GCIE3    GCIE4    GCIE5    GCIE6    GCIE7
GCIE1   -18855.18  -
GCIE2   -18618.83  472.69   -
GCIE3   -18748.66  213.04   -259.65  -
GCIE4   -18650.27  409.82   -62.87   196.78   -
GCIE5   -18801.79  106.79   -365.91  -106.26  -303.04  -
GCIE6   -18981.71  -253.06  -725.75  -466.10  -662.88  -359.84  -
GCIE7   -18832.78  44.81    -427.88  -168.23  -365.01  -61.98   297.87   -

2lnBF   lnL        GENES1   GENES2   GENES3   GENES4   GENES5   GENES6   GENES7   GENES8
GENES1  -19771.29  -
GENES2  -19787.03  -31.46   -
GENES3  -19789.77  -36.96   -5.50    -
GENES4  -19781.88  -21.18   10.28    15.78    -
GENES5  -19802.58  -62.57   -31.11   -25.61   -41.39   -
GENES6  -19795.92  -49.25   -17.78   -12.29   -28.07   13.33    -
GENES7  -19864.66  -186.74  -155.28  -149.78  -165.56  -124.16  -137.49  -
GENES8  -19801.93  -61.27   -29.80   -24.31   -40.09   1.31     -12.02   125.47   -

2lnBF   lnL        SGAT1    SGAT2    SGAT3    SGAT4    SGAT5    SGAT6    SGAT7    SGAT8
SGAT1   -18893.59  -
SGAT2   -18937.81  -88.45   -
SGAT3   -18952.66  -118.14  -29.69   -
SGAT4   -19030.48  -273.78  -185.33  -155.64  -
SGAT5   -18891.62  3.94     92.39    122.08   277.72   -
SGAT6   -18948.81  -110.44  -22.00   7.70     163.34   -114.38  -
SGAT7   -18957.99  -128.80  -40.35   -10.66   144.98   -132.74  -18.35   -
SGAT8   -18878.15  30.88    119.32   149.02   304.66   26.94    141.32   159.68   -

2lnBF   lnL        SGIE1    SGIE2    SGIE3    SGIE4    SGIE5    SGIE6    SGIE7
SGIE1   -18785.29  -
SGIE2   -18904.98  -239.40  -
SGIE3   -18738.58  93.42    332.82   -
SGIE4   -18917.13  -263.69  -24.29   -357.11  -
SGIE5   -18972.38  -374.18  -134.79  -467.60  -110.50  -
SGIE6   -18978.45  -386.33  -146.93  -479.75  -122.64  -12.15   -
SGIE7   -18954.65  -338.73  -99.33   -432.15  -75.04   35.46    47.60    -
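The pairwise values in Table 3.5 appear to follow directly from the lnL column: each entry matches twice the difference between the lnL of the row run and the lnL of the column run, to within rounding. The sketch below reproduces three entries under that assumption; the function name is illustrative.

```python
def two_ln_bf(lnl_row, lnl_col):
    """2 ln Bayes factor of the row run relative to the column run."""
    return 2.0 * (lnl_row - lnl_col)


# lnL values for the first three GCAT replicates, copied from Table 3.5.
lnl = {"GCAT1": -19073.43, "GCAT2": -18731.13, "GCAT3": -18924.27}

bf_2_vs_1 = two_ln_bf(lnl["GCAT2"], lnl["GCAT1"])
bf_3_vs_1 = two_ln_bf(lnl["GCAT3"], lnl["GCAT1"])
bf_3_vs_2 = two_ln_bf(lnl["GCAT3"], lnl["GCAT2"])
print(round(bf_2_vs_1, 2), round(bf_3_vs_1, 2), round(bf_3_vs_2, 2))
```

A positive value favors the row run, so under the Kass and Raftery (1995) threshold cited in the caption, GCAT2 is very strongly favored over GCAT1 even though both are replicates of the same strategy.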

Although the current direction in molecular phylogenetics is towards species tree inference rather than concatenated analyses (Edwards, 2009), the latter will continue to be used, especially for studies directed at sampling many taxa and employing large numbers of loci. Some such studies aim to sequence up to 20 genes for thousands of species (see the Assembling the Tree of Life website for a partial list: www.phylo.org/atol/projects). Robust strategies for analyzing those datasets in a Bayesian framework are needed.

I suggest that the rate of accepted update proposals to the Markov chain should be examined as part of a partitioned Bayesian analysis, and note such inspection will become more important as more partitions are included in the model. Operator adjustment may be essential to allow good mixing of the Markov chain and convergence onto the same posterior distribution among replicate runs.
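One way to make this inspection routine is to screen the reported acceptance rates after each run. The sketch below is a minimal illustration: the proposal names, the example rates, and the 10% to 70% window are all assumptions chosen for demonstration rather than values from this study, and the window should be set to whatever range the analyst considers acceptable.

```python
def flag_poor_mixing(acceptance, low=0.10, high=0.70):
    """Return the proposals whose acceptance rate falls outside [low, high]."""
    return {
        name: rate
        for name, rate in acceptance.items()
        if rate < low or rate > high
    }


# Hypothetical acceptance rates (as proportions) for one run.
example_rates = {
    "rate_multiplier": 0.004,
    "branch_lengths": 0.31,
    "topology": 0.18,
}

flagged = flag_poor_mixing(example_rates)
print(flagged)
```

Any flagged proposal is a candidate for retuning before replicate runs are compared.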

Table 3.6 Top table: Pairwise 2ln Bayes factors (BF) calculated among the best out of all replicate runs for each multi-partition strategy without operator adjustment. Bottom table: Pairwise 2ln BF calculated among operator adjusted runs for each multi-partition strategy. BF values greater than 10 are considered to be very strong support in favor of a given model (Kass and Raftery, 1995). Partitioning strategies are listed in rank order from strongest to weakest support.

2lnBF, best runs with no operator adjustment
            GCAT_run2   GCIE_run2   GENES_run1  NMAT_run2   NMIE_run1   SGAT_run5   SGIE_run3   SNMAT_run3  SNMIE_run4
GCAT_run2   -
GCIE_run2   224.59      -
GENES_run1  -2080.33    -2304.92    -
NMAT_run2   -98.44      -323.03     1981.89     -
NMIE_run1   35.72       -188.87     2116.05     134.16      -
SGAT_run5   -294.04     -518.63     1786.29     -195.60     -329.76     -
SGIE_run3   -14.90      -239.48     2065.44     83.55       -50.61      279.15      -
SNMAT_run3  -1388.23    -1612.82    692.10      -1289.79    -1423.95    -1094.19    -1373.34    -
SNMIE_run4  -1250.50    -1475.09    829.83      -1152.06    -1286.22    -956.46     -1235.60    137.73      -

Rank order of partitioning strategies
Rank  Without operator adjustment  With operator adjustment
1     GCIE                         GCIE
2     NMIE                         GCAT
3     GCAT                         NMIE
4     SGIE                         SGIE
5     NMAT                         NMAT
6     SGAT                         SGAT
7     SNMIE                        SNMIE
8     SNMAT                        SNMAT
9     GENES                        GENES
10    -                            FULL

2lnBF with operator adjustment
       GCAT      GCIE      GENES     NMAT      NMIE      SGAT      SGIE      SNMAT     SNMIE     FULL
GCAT   -
GCIE   128.82    -
GENES  -2072.06  -2200.88  -
NMAT   -199.52   -328.34   1872.54   -
NMIE   -55.19    -184.01   2016.87   144.33    -
SGAT   -259.00   -387.82   1813.06   -59.48    -203.81   -
SGIE   -157.96   -286.78   1914.10   41.56     -102.77   101.04    -
SNMAT  -1354.98  -1483.80  717.08    -1155.47  -1299.80  -1095.99  -1197.03  -
SNMIE  -1199.35  -1328.17  872.71    -999.84   -1144.17  -940.36   -1041.40  155.63    -
FULL   -3591.88  -3720.70  -1519.82  -3392.36  -3536.69  -3332.89  -3433.93  -2236.90  -2392.53  -


Species Tree Analyses

The high mitochondrial substitution rate inferred for Acanthemblemaria did not cause difficulties for the partitioned Bayesian analyses, but it did have an effect on species tree estimation. When added to the analysis, COI lowered the posterior probabilities for nodes that were well supported in the nuclear-only dataset. The main cause for this appears to be that the fast rate of COI evolution increased uncertainty in the species tree topology. This was evidenced by all nodes with poor support (<0.95 BPP) in the COI gene tree having poor support in the nuclear + COI species tree as well (Figure 3.5). In addition, including the mitochondrial data led to a lower density of nodes in the posterior collection of consensus species trees (Figure 3.6).

The COI dataset had the most information of all the markers used in this study (Table 2). However, most of these sites may have amounted to little more than noise, with a comparatively low amount of signal to use for phylogenetic estimation. That, however, did not pose a problem for the concatenated analyses. It seems that the species tree estimation, because it integrates over all gene trees, gives equal weight to all markers, regardless of the amount of phylogenetic signal contained within any particular one. Indeed, that is the desired behavior of a Bayesian analysis, as it allows the incorporation of phylogenetic uncertainty (Huelsenbeck et al., 2000).

It is not clear how general this fast gene effect I observed may be. The case of COI in my dataset represents an extreme example of how poor quality information might overwhelm an otherwise robust dataset. However, the problem that it represents may have implications for any Bayesian species tree analysis where a minority of the markers in a dataset contains the majority of its information. When one marker has much more information than others, it may have a disproportionately large effect on the species tree estimate. It may be expected, though, that species tree inference methods that require fully resolved gene trees, such as STEM (Kubatko et al., 2009), would not be vulnerable to this problem. Further exploration of this issue using a combination of simulated and empirical data, along with different species tree inference methods, would be useful.

Acanthemblemaria Phylogeny

A robust and well-supported phylogeny was estimated for the tube blenny genus Acanthemblemaria. The topologies generated in the concatenated Bayesian and maximum likelihood analyses were largely concordant with that of the species tree analysis. These trees recovered some of the clades and species pairs found by Hastings (1990) in his morphological phylogeny of Acanthemblemaria, but there was also some conflict. The two hypothesized geminates were recovered, as was the “barnacle blennies” clade. The sister relationship of A. greenfieldi and A. chaplini was also confirmed. However, in contrast to the topologies recovered in this study, the Hastings tree was highly nested (Figure 1 of Hastings 1990), with a progression from Caribbean to Eastern Pacific taxa. Additionally, neither the “aspera species group”, consisting of A. medusa sister to A. aspera and A. paula, nor the proposed sister relationship between A. spinosa and A. maria found in the Hastings tree was recovered here (Figures 3.4 and 3.5).

Conclusions

I recovered a very fast mitochondrial substitution rate for Acanthemblemaria. That fast rate, however, did not cause the problems found in the partitioned MrBayes analysis. Instead, the large number of partitions (at least 8) was responsible for poor performance of the MrBayes analyses, leading to significantly different log likelihoods among replicate runs. This situation was ameliorated by adjusting the operator proposals for the Markov chain. The rapid mitochondrial substitution rate did affect the species tree analysis, though, leading to decreased support for many nodes compared to the nuclear dataset. This appeared to be caused by a poor signal to noise ratio in the mitochondrial data, which also contained the most information out of the markers surveyed. I strongly recommend that as part of any partitioned Bayesian analysis, the acceptance of proposals to change the rate multiplier parameter be optimized so that there is good mixing of the Markov chain. In addition, particular care should be given to Bayesian species tree estimation when a minority of markers contains the majority of information in a dataset.

Chapter 4: A Thorny Situation: Accounting for Conflict between Molecules and Morphology in the Neotropical Reef Fish Clade Acanthemblemaria (Chaenopsidae)

Introduction

Coral reef communities harbor the greatest marine fish diversity of any oceanic ecosystem. Biodiversity of coral reef fishes is highest in the Indo-West Pacific and decreases longitudinally to the east and west, with the Neotropics being species-poor in comparison (Bellwood and Wainwright, 2002; Briggs, 1974; Mora et al., 2003). The major exception to this pattern comes from the Blennioidei, a perciform suborder of small, bottom-dwelling reef fishes. Blennies are a species-rich group composed of six families. Of those, the Labrisomidae, the Dactyloscopidae, and the Chaenopsidae are the only reef fish families endemic to the New World (Bellwood and Wainwright, 2002; Hastings, 2009).

Acanthemblemaria (Metzelaar, 1919) is the most species-rich genus of chaenopsids, as well as one of the most species-rich genera of Neotropical blennies (Hastings, 2009; Hastings and Springer, 2009b). All members in the genus are small (~1.2-3.5 cm standard length) and are obligate dwellers of vacated invertebrate holes on shallow (<1 - ~22 meters) rocky and coral reefs (Stephens, 1963). As currently recognized, Acanthemblemaria contains 22 species, with a nearly even split of 10 in the Tropical Eastern Pacific and 12 in the Tropical Western Atlantic (Hastings, 2009). Since the original treatment of the family Chaenopsidae by Stephens (1963), more named species have been added to Acanthemblemaria than to any other chaenopsid genus. Much of this growth has been due to the recognition that several species with large distributions contain cryptic taxa (Hastings and Springer, 2009a; Hastings and Springer, 2009b; Lin and Galland, 2010).

The generic name Acanthemblemaria comes from the Greek Akanthos-, or thorn. The name is apt, as Acanthemblemaria blennies are typified by the presence of spinous processes on the frontal bones (Metzelaar, 1919; Smith-Vaniz and Palacio, 1974; Stephens, 1963). Morphological characters related to head spination represent the majority of the characters used to infer the interspecific relationships in the group (Hastings, 1990). A molecular phylogeny of the genus (Chapter 2) recovered Acanthemblemaria as monophyletic, but also recovered conflicts with the morphological hypothesis of Hastings (1990), where taxa with clear affinities based on cranial morphology were not closely related in the molecular phylogeny.

The characters used for inferring phylogenetic relationships must be independent of one another (Kluge, 1989). Suites of morphological characters that evolve in concert violate this dictate. Such correlated evolution is most likely to occur when a set of characters underlies a functionally adaptive phenotype (Emerson and Hastings, 1998). Such suites of correlated characters can mislead phylogenetic analyses because they track adaptive history instead of phylogeny (Holland et al., 2010; McCracken et al., 1999). In practice, it is difficult to determine whether characters are correlated. This is because a suite of characters that are highly correlated with one another is expected to produce the same result as a suite of independent characters with good phylogenetic signal: strong support for a given clade (Shaffer et al., 1991).

Here I test whether the morphological characters representing spinous processes in Acanthemblemaria are correlated with one another, and if accounting for that correlation can reconcile the molecular and morphological hypotheses for the genus. I reconstruct the species tree of the genus Acanthemblemaria using 5 nuclear markers. I also employ Bayesian relaxed clock divergence dating to determine the age of the group and timing of speciation among the members of the genus.

Materials and Methods

Taxon Sampling

Individuals from 16 out of the 22 named Acanthemblemaria species, as well as one undescribed species, were collected on SCUBA (Appendix 1 Table 5). Four outgroup taxa, chosen based on Hastings (1990) and Almany and Baldwin (1996), were also included in the study. Of the taxa collected, six are putative transisthmian geminates (Hastings, 1990; Hastings and Springer, 1994), with two geminate pairs in the ingroup and one in the outgroup. Photo vouchers from freshly collected specimens are available from RIE for a subset of these individuals by request. Whole fishes were stored individually in 95% ethanol or salt-saturated DMSO at -80° C.

DNA Extraction, PCR and Sequencing

DNA was extracted using the Qiagen (Valencia, CA) QIAamp DNA Mini Kit. The polymerase chain reaction (PCR) was performed to amplify five genetic markers (Appendix 1 Table 6): the nuclear protein-coding genes recombination-activating gene 1 (rag1), titin-like protein (TMO4C4), melanocortin 1 receptor (MC1R), and SH3 and PX domain containing 3 gene (SH3PX3), and intron V from nuclear α-tropomyosin (atrop). PCR amplification of the full-length rag1 molecule was not possible for some taxa. A set of internal primers was developed for the study and used to amplify rag1 in those taxa.

Amplicons were either purified with a Strataprep PCR Purification Kit (Stratagene, La Jolla, CA) or sequenced directly without cleanup. Sequencing was performed in both directions on an ABI 3100 or 3130 XL automated sequencer with 1/8 reactions of BigDye Terminators (V3.1, Applied Biosystems) and the amplification primers, or internal primers as indicated in Chapter 2.

Sequence Alignment and Model Selection

Sequences for the four protein-coding genes were aligned using MUSCLE (Edgar, 2004) as implemented in Geneious v3.6 (Drummond et al., 2007a). The α-tropomyosin sequences, which contained numerous gaps, were aligned in BAli-Phy v2.0.1 (Suchard and Redelings, 2006) using the GTR substitution model, gamma distributed rate variation, and the default indel model. BAli-Phy was run four times to ensure concordance among runs. All samples of the Markov chain taken before convergence, as determined by stationarity of the chain visualized in Tracer v1.5 (Rambaut and Drummond, 2010), were discarded as burnin. The consensus alignment from the run with the highest posterior probability was used for subsequent analyses and all positions with posterior probabilities less than 0.95 were discarded.
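The final filtering step can be sketched as follows. This is an illustration only: it assumes the alignment is held as a mapping from taxon name to aligned sequence and that a per-column posterior probability is available as a list, neither of which reflects the actual BAli-Phy output format. The taxon names and values are hypothetical.

```python
def filter_columns(alignment, column_pp, threshold=0.95):
    """Keep only alignment columns whose posterior probability is >= threshold."""
    keep = [i for i, pp in enumerate(column_pp) if pp >= threshold]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Toy alignment with one poorly supported column (index 2).
toy_alignment = {"taxon_a": "AC-GT", "taxon_b": "ACTGT"}
toy_pp = [0.99, 0.97, 0.60, 0.95, 1.00]

filtered = filter_columns(toy_alignment, toy_pp)
print(filtered)
```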

Models of sequence evolution and the partitioning strategy used were the same as in Chapter 2, and were determined using ModelTest (Posada and Crandall, 1998) and 2 ln Bayes factors, respectively. Molecular data for all analyses were partitioned by codon position for each of the protein-coding genes and by intron/exon boundaries for atrop.

Bayesian Species Tree and Divergence Dating Analyses

Species tree estimation - Species tree analyses were conducted using the *BEAST package in BEAST v1.5.4 (Heled and Drummond, 2010). Sequences were grouped by nominal species for the analyses. Trees and clocks were unlinked among all genes, with each gene region dated using the uncorrelated lognormal distribution (UCLD) (Drummond et al., 2006) and the calibrations detailed below. The datasets were run twice for 100,000,000 generations, sampling every 5,000. Convergence onto the posterior distribution for the estimated topology was assessed using the “compare” and “cumulative” functions in AWTY (Nylander et al., 2008). Convergence onto the posterior distribution for parameter estimates was assessed by effective sample size (ESS) values greater than 250, as determined in Tracer v1.5 (Rambaut and Drummond, 2010). A time-calibrated phylogeny of the concatenated dataset was also constructed in BEAST, using the same calibrations and run conditions as for the species tree.
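To show what an ESS threshold measures, the sketch below estimates ESS from the autocorrelation of a sampled parameter trace. It is a simplified estimator that truncates the autocorrelation sum at the first non-positive lag; it is not the algorithm implemented in Tracer, and the simulated traces are hypothetical stand-ins for real MCMC output.

```python
import random


def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(x, max_lag=200):
    """ESS = N / (1 + 2 * sum of positive autocorrelations)."""
    rho_sum = 0.0
    for lag in range(1, min(max_lag, len(x) - 1) + 1):
        rho = autocorr(x, lag)
        if rho <= 0:  # stop at the first non-positive autocorrelation
            break
        rho_sum += rho
    return len(x) / (1.0 + 2.0 * rho_sum)


rng = random.Random(42)

# Independent draws: a well-mixing trace.
iid = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Strongly autocorrelated AR(1) draws: a poorly mixing trace.
ar1 = [0.0]
for _ in range(1999):
    ar1.append(0.9 * ar1[-1] + rng.gauss(0.0, 1.0))

ess_iid = effective_sample_size(iid)
ess_ar1 = effective_sample_size(ar1)
print(round(ess_iid), round(ess_ar1))
```

Both traces contain 2,000 samples, but the autocorrelated trace carries far fewer effectively independent samples, which is why ESS rather than raw sample count is used as the convergence criterion.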

Divergence dating - Priors on the time to most recent common ancestor (TMRCA) for two transisthmian species pairs were specified. The first species pair, A. betinensis and A. exilispinus, occur in <1 meter of water and are restricted to areas close to the isthmus (Hastings, 2009). These distributions suggest that their progenitor was split close to the final closure of the isthmus. The calibration was given an exponential prior with a mean of 7 million years and a zero offset of 3.1 million years. This prior represents the most recent possible split for the geminates at the close of the isthmus, but allows for a split prior to the closure, although with decreasing probability back in time.
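The shape of this calibration can be made concrete by computing a few quantiles. The sketch below assumes the stated mean of 7 million years refers to the exponential component before the 3.1 million year offset is added; if the mean instead includes the offset, the values would differ. The function name is illustrative.

```python
import math


def offset_exponential_quantile(p, mean, offset):
    """Quantile of an exponential distribution shifted right by offset."""
    return offset - mean * math.log(1.0 - p)


MEAN_MY = 7.0    # mean of the exponential component, in million years
OFFSET_MY = 3.1  # hard minimum age, in million years

minimum_age = offset_exponential_quantile(0.0, MEAN_MY, OFFSET_MY)
median_age = offset_exponential_quantile(0.5, MEAN_MY, OFFSET_MY)
upper_95 = offset_exponential_quantile(0.95, MEAN_MY, OFFSET_MY)
print(minimum_age, round(median_age, 2), round(upper_95, 2))
```

Under this reading, half of the prior mass lies within about 5 million years of the minimum age, with a long tail toward older splits.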

The second pair of geminates, A. rivasi and A. crockeri, have a Galapagos-Caribbean distribution (Hastings, 2009). While the most recent possible split between these two would have been the closure of the Isthmus and the earliest possible split the rise of the Galapagos (at most 17 million years ago; Werner and Hoernle, 2003), the split most probably occurred between those dates. A truncated normal prior for the split time of A. rivasi and A. crockeri was specified. A minimum offset of 3.1 million years, representing the most recent possible split for the species pair, was used. The mean and standard deviation were set at 10 and 3.52, respectively, which gave a 95% confidence interval of 3.1 to 16.9 million years.
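The stated interval can be checked against the chosen mean and standard deviation. The sketch below computes the central 95% interval of the untruncated normal distribution, on the assumption that this is how the interval was derived; truncation at 3.1 million years would shift the quantiles slightly.

```python
from statistics import NormalDist

# Normal prior on the split time, in million years (before truncation).
prior = NormalDist(mu=10.0, sigma=3.52)

lower = prior.inv_cdf(0.025)
upper = prior.inv_cdf(0.975)
mass_below_minimum = prior.cdf(3.1)
print(round(lower, 2), round(upper, 2), round(mass_below_minimum, 3))
```

The lower bound of the central interval coincides with the 3.1 million year minimum, so the truncation removes only about 2.5% of the prior mass.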

Analysis of Morphological Data

A modified version of the morphological matrix from Hastings (1990) was analyzed. Acanthemblemaria stephensi and A. atrata were not sampled for the species tree analyses as tissue was not available, and were removed. Three taxa were added to the morphological matrix (Acan. n. sp., Protemblemaria bicirrus, and Cirriemblemaria lucasana) and scored for the set of 60 characters from Hastings (1990). The new matrix was analyzed in a Bayesian framework using MrBayes v.3.1.2 (Ronquist and Huelsenbeck, 2003) and the Mkv model for morphological data (Lewis, 2001). In MrBayes all characters were set as variable and unordered, save for three that were ordered in Hastings (1990): character 2 (number of spines on the nasal rami (excluding AFO process)), character 3 (process on the nasal bones anterior to the first anterofrontal sensory pore (AFO process)), and character 7 (anterolateral extent of the frontal ridge). The MrBayes analyses were run twice with 4 heated chains (temp=0.1) for 10,000,000 generations, sampling every 1,000. Convergence onto the posterior distribution for the model parameters and topology was assessed using Tracer v1.5 (Rambaut and Drummond, 2010) and Are We There Yet? (AWTY) (Nylander et al., 2008), respectively.

Identification of Correlated Incongruent Morphological Characters

The method of Holland et al. (2010) was used to identify morphological characters that are incongruent with the Acanthemblemaria species tree phylogeny and correlated with one another. The post-burnin set of consensus species trees was used to calculate the excess score and retention index statistics for each post-burnin tree, for each character in the morphological dataset. Groups of characters that were found to share an excess of incongruence with the set of species trees were then inferred to belong to correlated suites of incompatible characters.
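The retention index component of this procedure can be illustrated for a single unordered character on a single tree. The sketch below counts steps with Fitch parsimony and applies the standard formula RI = (g - s) / (g - m), where s is the observed number of steps, m the minimum, and g the maximum. It does not reproduce the excess score of Holland et al. (2010), and the tree representation, taxon labels, and function names are assumptions made for the example.

```python
def fitch_steps(tree, states):
    """Return (state_set, steps) for one unordered character.

    tree is a nested 2-tuple of taxon names (a rooted binary tree);
    states maps each taxon name to its character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def retention_index(tree, states):
    """Retention index of one character on one tree (None if uninformative)."""
    counts = {}
    for state in states.values():
        counts[state] = counts.get(state, 0) + 1
    observed = fitch_steps(tree, states)[1]
    minimum = len(counts) - 1                     # fewest possible steps
    maximum = len(states) - max(counts.values())  # most possible steps
    if maximum == minimum:
        return None
    return (maximum - observed) / (maximum - minimum)


# Hypothetical four-taxon tree and two binary characters.
toy_tree = (("sp_a", "sp_b"), ("sp_c", "sp_d"))
congruent = {"sp_a": 0, "sp_b": 0, "sp_c": 1, "sp_d": 1}
incongruent = {"sp_a": 0, "sp_b": 1, "sp_c": 0, "sp_d": 1}

ri_congruent = retention_index(toy_tree, congruent)
ri_incongruent = retention_index(toy_tree, incongruent)
print(ri_congruent, ri_incongruent)
```

In the analysis described above, a calculation of this kind would be repeated for every character across every tree in the post-burnin sample, so that each character receives a distribution of scores rather than a single value.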

Results

Molecular Data and Convergence Criteria

The five nuclear gene regions were successfully amplified in all taxa for a total alignment length of 3,790 bp. The lengths of the aligned sequences, as well as the proportion of variable and parsimony informative sites for each marker, were the same as for Chapter 2. For each of the analyses (time-calibrated species and concatenated trees, and the morphological tree) convergence diagnostics (AWTY results and ESS values >250) indicated that convergence onto the posterior distribution had occurred.

The Species Tree Estimate for Acanthemblemaria Yielded a Well-Supported Phylogeny but It Was In Significant Conflict with the Morphological Hypothesis

Comparison of Species Tree with Hastings 1990 - The Bayesian species tree estimate yielded a well-resolved topology with 13 of 19 nodes supported by Bayesian posterior probability (BPP) values greater than 0.95 (Figure 4.1A). However, many of the well-supported nodes conflicted with the morphological hypothesis of Hastings (1990) (Figure 4.1B) and the Bayesian estimate of the morphological data inferred in this study (Figure 4.1C).

As in Hastings (1990), Acanthemblemaria was recovered as monophyletic in the species tree analysis, here with high support (BPP = 1.0) (Figure 4.1A). Hastings’ phylogeny was highly nested, showing a progression from A. chaplini and A. greenfieldi at the base of the tree, through the Caribbean Acanthemblemaria taxa, to the “hancocki species group” at the crown (Figure 4.1B). In the *BEAST species tree, though, two major clades, here denoted as clade I and clade II, were recovered with high support (Figure 4.1A). Each of these clades contained a pair of transisthmian sister species, both of which were recovered with BPP of 1.0. Neither of these transisthmian pairs was basal to the other taxa in their respective clades. While the relationships of these two pairs of geminate taxa to the other members of their respective clades received high posterior support, both were less than 0.95 (Figure 4.1A).

Clade I was composed of a majority of Eastern Pacific taxa, clade II of mostly Caribbean taxa. In clade I, a monophyletic group of taxa that occurs in the Eastern Pacific, with the exception of the geminate A. rivasi, was found. This clade, A. crockeri + “the hancocki species group” (sensu Hastings 1990) was also recovered by Hastings. However, the species tree analysis recovered the transisthmian geminates A. castroi and A. rivasi as sister to the remaining species in the clade, with A. crockeri nested within the “hancocki species group”, but with poor support.

In clade II, the well-supported relationship between the geminate taxa A. betinensis and A. exilispinus was also present in the Hastings tree. However, many of the other splits in clade II conflicted with the morphological phylogeny. The A. maria/A. spinosa split was not recovered in the species tree, nor was the “aspera species group” of A. medusa, (A. aspera, A. paula). Instead, A. spinosa was found to be sister to A. aspera and A. paula, while A. maria was sister to the undescribed Acanthemblemaria species. A. medusa, which was placed as sister to A. aspera and A. paula in the “aspera species group” based on morphological data, was found to be sister to A. maria and Acan. n. sp., albeit with a BPP of 0.86.

Comparison of Species Tree with Bayesian Estimates of Morphology - The phylogeny based on Bayesian inference of morphological data closely mirrored that of Hastings (1990), although support was poor for many of the nodes (Figure 4.1C). All the splits and clades inferred by Hastings were recovered here with the exception of the hancocki/stephensi split, as the latter taxon was not included in this study. The undescribed Acanthemblemaria species, which was not included in Hastings, was recovered here as a member of the “hancocki species group”. Also, as in Hastings, the morphological tree inferred here was highly nested, with the same progression of taxa.

A. spinosa and A. medusa Were Responsible for the Majority of Incongruence between Molecules and Morphology

The splits network of the morphological tree with the sequence-based tree revealed that most of the terminal taxa agree between the two trees, as indicated by strictly bifurcating splits, including the entire “hancocki species group” clade (Figure 4.2). The two taxa that were responsible for the majority of the conflict between the two trees, as visualized by conflicting networks of splits, were A. spinosa and A. medusa. Both taxa were characterized by needing to traverse extra splits to be united with clades specified by either the morphological or the molecular phylogeny. When each of these taxa was removed from the tree, conflicting splits disappeared from the splits networks (Figure 4.2).


Figure 4.1 Molecular and morphological hypotheses of the phylogeny of Acanthemblemaria, with Tropical Eastern Pacific (TEP) and Caribbean taxa in bold or normal font, respectively. 1A. Bayesian species tree estimated in *BEAST. Posterior probabilities are shown at all nodes and branch lengths are in units of substitutions per site. The majority of taxa in Clade I occur in the TEP, while Clade II consists of primarily Caribbean species. 1B. Morphological phylogeny inferred using maximum parsimony from Hastings (1990). 1C. 50% majority rule consensus tree from the Bayesian estimate of the morphological dataset. Posterior probabilities greater than 0.5 shown at nodes. The “hancocki” and “aspera” species groups, sensu Hastings (1990) are enclosed by boxes.


Figure 4.2 Split networks of the morphological tree with the species tree. Splits that agreed between the two trees are indicated by strictly bifurcating splits. Conflicting splits are represented as a network of edges. 2A. Split network for all taxa. 2B. Split network after the removal of A. medusa. 2C. Split network after the removal of A. medusa and A. spinosa.
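As an illustration only, the compatibility criterion that underlies a split network can be sketched in Python. Two splits of the same taxon set can sit on one tree when at least one of the four pairwise intersections of their sides is empty; otherwise they conflict and appear as a box of parallel edges. The five-taxon set and the function name below are assumptions chosen for the example, not the taxon sampling or software used for Figure 4.2.

```python
def compatible(split_a, split_b, taxa):
    """Return True when two splits of `taxa` can occur on the same tree.

    Each split is given by one of its sides; the other side is the complement.
    The splits are compatible if some pair of sides has an empty intersection.
    """
    a, b = set(split_a), set(split_b)
    a_rest, b_rest = taxa - a, taxa - b
    return any(not (x & y) for x in (a, a_rest) for y in (b, b_rest))


# Hypothetical reduced taxon set for illustration.
TAXA = {"aspera", "paula", "spinosa", "maria", "medusa"}

morph_split = {"maria", "spinosa"}               # grouping from the morphological tree
species_split = {"aspera", "paula", "spinosa"}   # grouping from the species tree
shared_split = {"aspera", "paula"}               # grouping found in both

print(compatible(morph_split, species_split, TAXA))
print(compatible(shared_split, species_split, TAXA))
```

In this toy case the A. maria/A. spinosa split conflicts with the (A. aspera, A. paula, A. spinosa) split, while the A. aspera/A. paula split is compatible with both.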

Evidence of Suites of Mutually Incompatible Characters in A. spinosa, but Not A. medusa

I investigated the characters responsible for the conflict between the morphological and molecular trees and the source of the incompatible splits. In the case of A. spinosa, the A. maria/A. spinosa split that was recovered from the morphological matrix was supported by six characters (Table 4.1A). However, six other characters in the morphological matrix were incompatible with the A. maria/A. spinosa split (Table 4.1B). When a maximum parsimony (MP) tree was inferred using only these characters, six most parsimonious trees were found, all supporting the clade (A. aspera, (A. paula, A. spinosa)). This clade was found in the species tree as well, although in the species tree the sister relationship was (A. spinosa, (A. aspera, A. paula)). A single character in the morphological matrix (57; posterior pair of anterofrontal pores fused into a single medial pore) was a synapomorphy for the clade (A. aspera, A. paula, A. spinosa).
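The logic of a character favoring one topology over another can be sketched with a Fitch parsimony count for a single binary character. This is a minimal sketch under stated assumptions: the five-taxon trees, the outgroup, and the 0/1 coding of character 57 (1 = posterior anterofrontal pores fused) are simplifications made for illustration, not the matrix or the full topologies analyzed in this chapter.

```python
# Fitch small-parsimony step count for one character on a rooted binary tree.
# Trees are nested 2-tuples; leaves are taxon names.

# Hypothetical coding of character 57: 1 = pores fused, 0 = pores separate.
STATES = {"outgroup": 0, "maria": 0, "spinosa": 1, "aspera": 1, "paula": 1}


def fitch(tree, states):
    """Return (candidate_state_set, steps) for the subtree rooted at `tree`."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_steps + right_steps
    # Children disagree: one state change is required on this part of the tree.
    return left_set | right_set, left_steps + right_steps + 1


# Simplified stand-ins for the two competing hypotheses.
morph_like = ("outgroup", (("maria", "spinosa"), ("aspera", "paula")))
species_like = ("outgroup", ("maria", ("spinosa", ("aspera", "paula"))))

morph_steps = fitch(morph_like, STATES)[1]
species_steps = fitch(species_like, STATES)[1]
print(morph_steps, species_steps)
```

Under these assumptions the character needs two changes on the tree that pairs A. maria with A. spinosa, but only one on the tree that groups A. spinosa with A. aspera and A. paula, which is the sense in which it is incompatible with the A. maria/A. spinosa split.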

The inclusion of A. medusa in the “aspera species group”, which consists of (A. aspera, A. medusa, A. paula), was supported by two characters in the morphological matrix (Table 4.1C). However, the morphological dataset contained eight characters in conflict with the “aspera species group” (Table 4.1D). Unlike the conflicting characters for the A. maria/A. spinosa split, the maximum parsimony trees inferred from the characters conflicting with the “aspera species group” did not recover the clade found in the species tree: (A. medusa, (A. maria, A. n. sp.)). Instead, all the MP trees recovered a clade consisting of A. aspera, A. chaplini, A. greenfieldi, and A. medusa, and no morphological characters supported the clade found in the species tree.

Time Calibrated Phylogenies Recovered A Mid-Miocene Origin for Acanthemblemaria

Species Tree - The dated species tree analysis found that Acanthemblemaria originated in the mid-Miocene, with a complex pattern of speciation within the genus both before and after the closure of the Isthmus of Panama (Figure 4.3 and Table 4.2). The time to most recent common ancestor (TMRCA) of Acanthemblemaria was recovered with a mean of 13.1 mya and lower and upper confidence levels of 7.4 and 20.9 mya, respectively.

Table 4.1A-D The list of characters found which support or conflict with the placement of A. spinosa and A. medusa in the morphological phylogeny. Character numbers, names, and states are from the morphological data matrix used in this study, which was based on Hastings (1990).

Table 4.1A Characters supporting A. maria/A. spinosa split
Character Number   Character names and states
4    Lateral supratemporal ridge: spines present medially
5    Posterior extent of the frontal ridge: to lateral supratemporal ridge
7    Anterolateral extent of the frontal ridge: confluent with the dorsoposterior margin of the postorbital
27   Orbital margin of the postorbital: serrations or spines present
30   Dorso-posterior margin of the postorbital: a row of laterally projecting spines present, contiguous with a row of spines on the frontal wedge.
48   Shape of the proximal dorsal-fin pterygiophores (at the level of the mid-spinous dorsal fin): a single central strut present with a flat sheet of bone both anteriorly and posteriorly

Table 4.1B Characters incompatible with A. maria/A. spinosa split
Character Number   Character names and states
8    Central area of the frontal wedge: an open swath with no spines or ridges present (aspera) OR spines or ridges present (paula and spinosa).
31   Shape of the junction of the circumorbitals: entire, the lacrimal and postorbital both extending to the posterior angle
42   Neural spur, a lateral projection on the anterior portion of the neural arch: present on one to four caudal vertebrae (spinosa) OR present on five or more caudal vertebrae (aspera and paula)
47   Posterior inner margin of the pelvis: no ossified threads present.
56   Modal number of common pores: one
57   Posterior pair of anterofrontal pores: fused into a single medial pore

Table 4.1C Characters supporting A. medusa as part of the "aspera species group"
Character Number   Character names and states
21   Ventral margin of the lacrimal: three or four blades present
23   Ventral margin of the lacrimal at the third anterior infraorbital pore: a distinct notch present

Table 4.1D Characters incompatible with A. medusa as part of the "aspera species group"
Character Number   Character names and states
1    Anterior margin of the nasal bones: smooth (medusa and aspera) OR spines or serrations present (paula)
7    Anterolateral extent of the frontal ridge: confluent with the middle of the supraorbital flange, at or anterior to the second supraorbital sensory pore (medusa and aspera) OR confluent with the lateral edge of the supraorbital flange, at or posterior to the first supraorbital sensory pore (SO1) but anterior to the frontal/postorbital juncture (paula)
8    Central area of the frontal wedge: an open swath with no spines or ridges present (aspera and medusa) OR spines or ridges present (paula)
9    Frontal, between ridge and central swath: smooth (medusa) OR one or more spines present (aspera and paula)
44   Epipleural ribs: present on all precaudal vertebrae (within one before to one after the last precaudal vertebra) (medusa and paula) OR absent from two or more posterior precaudal vertebrae (aspera)
45   Hypural five: ossified, autogenous (paula) OR unossified or not autogenous (aspera and medusa) (Plesiomorphic condition uncertain).
51   Membrane posterior to the dorsal and anal fins: attached to region of the procurrent rays (medusa) OR confluent with the caudal fin (aspera and paula)
57   Posterior pair of anterofrontal pores: separate (medusa) OR fused into a single medial pore (aspera and paula)

Three out of six terminal splits in Acanthemblemaria (or five out of eight with the outgroup taxa) were inferred to have occurred prior to the closure of the Isthmus of Panama. Two of the three ingroup splits were the transisthmian geminates A. castroi/A. rivasi and A. betinensis/A. exilispinus, with mean split times of 4.6 and 4.2 mya, respectively. The third terminal split prior to the closure of the isthmus, that of A. chaplini/A. greenfieldi, had a mean divergence date of 8.2 mya, but was not significantly older than either of the geminate pairs. In addition to those three splits, two clades that did not include transisthmian geminates were also found to have split prior to the closure of the isthmus. The (A. spinosa, (A. aspera, A. paula)) clade had a mean TMRCA of 7.7 mya and the (A. medusa, (A. maria, A. n. sp.)) clade had a mean divergence time of 8.3 mya (Table 4.2).

For three pairs of terminal taxa, a split after the closure of the Isthmus of Panama could not be rejected. The mean TMRCAs for two of those splits, A. aspera/A. paula and A. maria/A. n. sp., were similar, 5.3 and 5.7 mya, with lower confidence limits of 2.63 and 2.76 mya, respectively. In contrast, the third split, A. balanorum/A. macrospilus, was substantially younger, with a mean inferred divergence time of 2.7 mya. The clade to which those two species belong, (A. hancocki, (A. crockeri, (A. balanorum, A. macrospilus))), was also inferred to have diverged after the closure of the Isthmus (3.9 mya, but with a lower confidence limit of 1.9 mya).
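The sorting of terminal splits relative to the Isthmus can be sketched as follows. The values are the species tree rows of Table 4.2; the decision rule (a split is treated as pre-closure when the lower 95% HPD bound is at least 3.1 mya) is an assumption about how the comparison could be made, and the function and variable names are hypothetical.

```python
ISTHMUS_CLOSURE_MYA = 3.1  # final closure date used in this chapter

# (mean, lower 95% HPD, upper 95% HPD) in millions of years; species tree, Table 4.2.
SPECIES_TREE_SPLITS = {
    "A. betinensis/A. exilispinus": (4.16, 3.1, 6.29),
    "A. castroi/A. rivasi": (4.63, 3.1, 7.31),
    "A. chaplini/A. greenfieldi": (8.2, 4.45, 13.42),
    "A. aspera/A. paula": (5.33, 2.63, 8.9),
    "A. maria/A. n. sp.": (5.74, 2.76, 9.62),
    "A. balanorum/A. macrospilus": (2.71, 1.08, 4.71),
}


def predates_closure(lower_hpd, closure=ISTHMUS_CLOSURE_MYA):
    """Assumed rule: the whole 95% HPD is at least as old as the closure."""
    return lower_hpd >= closure


before = sorted(
    name for name, (_, lower, _) in SPECIES_TREE_SPLITS.items()
    if predates_closure(lower)
)
unresolved = sorted(name for name in SPECIES_TREE_SPLITS if name not in before)

print("Pre-closure:", before)
print("Post-closure split not rejected:", unresolved)
```

With this rule, three of the six terminal splits fall before the closure and three have intervals that extend past it, matching the counts given in the text.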


Figure 4.3 Time-calibrated species tree, with branch lengths in units of millions of years. Support values for the species tree are the same as those from Figure 4.1A. Node bars indicate the 95% upper and lower HPDs for node heights. The vertical dashed line indicates the final closure of the Isthmus of Panama, 3.1 mya.
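For readers unfamiliar with the node bars, a 95% highest posterior density (HPD) interval can be sketched as the shortest window that contains 95% of the sorted posterior samples. This is a generic illustration that assumes a unimodal posterior; it is not the routine used by the dating software, and the node heights below are simulated purely for demonstration.

```python
import random


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (unimodal case)."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, int(round(mass * n)))  # number of samples inside the interval
    starts = range(n - window + 1)
    # Slide a window of fixed sample count and keep the narrowest one.
    best = min(starts, key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[best], ordered[best + window - 1]


# Simulated, right-skewed "node heights" in millions of years (not real output).
random.seed(1)
simulated_heights = [random.gammavariate(9.0, 1.5) for _ in range(5000)]
low, high = hpd_interval(simulated_heights)
print(round(low, 2), round(high, 2))
```

Because the window is chosen by width rather than by equal tails, the interval for a skewed posterior sits closer to the mode than a central 95% interval would.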

Concatenated Tree - The time-calibrated estimate of the phylogeny from the concatenated dataset yielded a well-supported phylogeny that was congruent with the species tree, both in topology and support, as well as divergence times of major clades and splits (Figure 4.4 and Table 4.2). For the splits and clades that were shared between the species tree and concatenated analyses (i.e. all interspecific splits) divergence dates were in agreement. However, there was a trend towards older divergence estimates from the species tree analysis compared to the concatenated analysis, although it was not significant (Table 4.2).

Table 4.2 Estimated divergence times for selected nodes in the species tree (top) and concatenated tree (bottom). All times are in millions of years and bold values indicate splits inferred to have occurred prior to the final closure of the Isthmus of Panama, 3.1 mya.

SPECIES TREE DIVERGENCE TIMES
NODE                                                            MEAN    LOWER 95 HPD    UPPER 95 HPD
Root                                                            29.07   16.74           47.97
Acanthemblemaria                                                13.15   7.75            21.44
A. betinensis/A. exilispinus                                    4.16    3.1             6.29
A. castroi/A. rivasi                                            4.63    3.1             7.31
A. spinosa, (A. aspera, A. paula)                               7.71    4.12            12.56
A. chaplini/A. greenfieldi                                      8.2     4.45            13.42
"barnacle blennies"                                             9.63    5.47            15.73
A. aspera/A. paula                                              5.33    2.63            8.9
Clade I                                                         10.53   6.14            17.25
Clade II                                                        10.74   6.14            17.25
myersi/E. nigra                                                 8.46    4.12            14.46
Cirriemblemaria lucasana/Protemblemaria bicirrus                12.08   6.04            20.44
A. maria/A. n. sp.                                              5.74    2.76            9.62
A. medusa, (A. maria/A. n. sp.)                                 8.31    4.53            13.57
"hancocki species group"                                        3.91    1.94            6.5
A. balanorum/A. macrospilus                                     2.71    1.08            4.71
A. crockeri, (A. balanorum/A. macrospilus)                      3.54    1.78            6.01
A. medusa, A. n. sp., A. maria, A. exilispinus, A. betinensis   9.6     5.54            15.38

CONCATENATED TREE DIVERGENCE TIMES
NODE                                                            MEAN    LOWER 95 HPD    UPPER 95 HPD
Root                                                            24.93   14.82           38.06
Acanthemblemaria                                                11.47   7.11            17.4
A. betinensis/A. exilispinus                                    3.96    3.1             5.72
A. spinosa, (A. aspera, A. paula)                               7.17    4.25            10.98
"barnacle blennies"                                             8.61    5.35            13.12
A. aspera/A. paula                                              5.1     2.94            7.99
Clade I                                                         9.7     6.08            14.81
Clade II                                                        9.4     5.79            14.24
/E. nigra                                                       7.49    3.97            11.96
Cirriemblemaria lucasana/Protemblemaria bicirrus                10.97   5.99            17.3
A. maria/A. n. sp.                                              5.46    3.06            8.55
A. medusa, (A. maria/A. n. sp.)                                 7.61    4.56            11.67
"hancocki species group"                                        3.66    2.01            5.73
A. balanorum/A. macrospilus                                     2.61    1.32            4.16
A. crockeri, (A. balanorum/A. macrospilus)                      3.2     1.73            5.03
A. medusa, A. n. sp., A. maria, A. exilispinus, A. betinensis   8.75    5.43            13.3
A. paula TMRCA                                                  1.45    0.62            2.46
A. aspera TMRCA                                                 0.58    0.13            1.16
A. spinosa TMRCA                                                1.61    0.7             2.7
A. maria TMRCA                                                  1.14    0.47            1.96
A. medusa TMRCA                                                 1.39    0.55            2.44
A. cf. chaplini TMRCA                                           0.72    0.21            1.37
A. rivasi/A. cf. rivasi                                         0.97    0.38            1.71
A. rivasi s.l./A. castroi                                       4.36    3.1             6.57
A. chaplini/A. cf. chaplini                                     5.06    2.77            7.95
A. chaplini s.l./A. greenfieldi                                 7.64    4.54            11.74

The concatenated analysis revealed substantial divergences among populations for six nominal species: A. chaplini, A. rivasi, A. medusa, A. maria, A. paula, and A. spinosa, where the mean TMRCA was at least 1 mya for all taxa (except A. rivasi at 0.97 mya). The most extreme example comes from A. chaplini. Individuals sampled from Bocas del Toro, Panama and the Abacos in northwest Bahamas were deeply diverged from A. chaplini sampled from New Providence, in the central Bahamas (Figure 4.4). The mean TMRCA for the intraspecific split in A. chaplini was 5.06 mya, with lower and upper HPDs of 2.77 and 7.95 mya, respectively (Table 4.2). This split time was significantly older than the one between the A. chaplini individuals from Panama and the northwest Bahamas (Table 4.2 and Figure 4.4). As opposed to A. chaplini, the split times were not significantly different among populations for any of the five other species with substantial intraspecific divergences (Figure 4.4).
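One way to read "significantly older" is as non-overlap of 95% HPD intervals. The sketch below applies that reading to concatenated tree values from Table 4.2. The criterion itself is an assumption made for illustration, since the test used is not spelled out here, and the helper names are hypothetical.

```python
# Lower and upper 95% HPD bounds (mya), concatenated tree rows of Table 4.2.
HPD = {
    "A. chaplini/A. cf. chaplini": (2.77, 7.95),
    "A. cf. chaplini TMRCA": (0.21, 1.37),
    "A. paula TMRCA": (0.62, 2.46),
    "A. spinosa TMRCA": (0.70, 2.70),
}


def intervals_overlap(a, b):
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def clearly_older(older, younger):
    """Assumed criterion: the whole HPD of `older` lies above that of `younger`."""
    return older[0] > younger[1]


deep_split = HPD["A. chaplini/A. cf. chaplini"]
shallow_split = HPD["A. cf. chaplini TMRCA"]
print(clearly_older(deep_split, shallow_split))
print(intervals_overlap(HPD["A. paula TMRCA"], HPD["A. spinosa TMRCA"]))
```

Under this reading, the deep A. chaplini split is distinguishable from the Panama/Abacos divergence, whereas the intraspecific TMRCAs of A. paula and A. spinosa are not distinguishable from one another.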

Discussion

Acanthemblemaria – Molecules versus Morphology

My phylogenetic reconstruction of the genus Acanthemblemaria based on molecular data was in significant conflict with the phylogenetic estimate of the group based on morphological data (Figs. 4.1 and 4.2). This was surprising because most of the morphological characters were related to complex spinous processes on the skulls of Acanthemblemaria blennies (Hastings, 1990) composed of different, presumably independent, bones (Hastings, 1990; Smith-Vaniz and Palacio, 1974).

Two species were responsible for most of the conflict between the molecular and morphological phylogenies – A. medusa and A. spinosa (Figure 4.2). A. spinosa (the “spinyhead blenny”) has an elaborate suite of spinous processes, the basis for the generic moniker Acanthemblemaria (akanthos, Greek for “thorn”) (Metzelaar, 1919). Analyses based on morphological data recovered A. maria as its sister species, both in this study (Figure 4.1C) and in Hastings (1990) (Figure 4.1B). A. maria has the most elaborate spinous processes in the group (Böhlke, 1961) and a gross skull morphology similar to A. spinosa (Smith-Vaniz and Palacio, 1974). However, the inferred sister relationship between A. maria and A. spinosa was, unexpectedly, not reflected in the genetically-based species tree, where A. spinosa was recovered as sister to A. aspera and A. paula (Figure 4.1A).

Figure 4.4 Time-calibrated Bayesian phylogeny of the concatenated dataset, with branch lengths in units of millions of years. Branches subtending nodes with <0.95 BPP are light; all others are bold. Node bars indicate the upper and lower 95% HPDs for node heights and the vertical dashed line indicates the final closure of the Isthmus of Panama, 3.1 mya. Locality abbreviations are listed after species names and are as follows: BA: Bahamas, BE: Belize, CPAN: Caribbean Panama, CR: Costa Rica, CU: Curaçao, GAL: Galapagos, HN: Honduras, MEX: Pacific Mexico, PPAN: Pacific Panama, PR: Puerto Rico, SAL: El Salvador, STX: St. Croix, SXM: Saint Maarten, VE: Venezuela. All locality information can be found in the Appendix.

The A. maria/A. spinosa clade recovered from the analyses of the morphological data was supported by six characters (Table 4.1A). Five of these come from three bones in the skull: the frontals, the supratemporal ridge, and the postorbitals (Table 4.1A; Hastings, 1990). In both species, the lateral supratemporal ridge and the dorso-posterior margin of the postorbital contain spines that are confluent with those found on the frontals (Hastings, 1990). This may functionally constrain the possible character states of the postorbital and the supratemporal ridge. Another possibility is that a shared pathway is responsible for the development of these three bones, and the implementation of that pathway has evolved independently in A. maria and A. spinosa. These hypotheses are not mutually exclusive.

Six morphological characters were incompatible with an A. maria and A. spinosa clade (Table 4.1B). A parsimony analysis of these six characters recovered the clade (A. aspera, A. paula, A. spinosa), which was also found in the species tree analysis (Figure 4.1A). However, in contrast to the species tree, the parsimony analysis recovered A. spinosa sister to A. paula, with A. aspera sister to these two taxa. Only one of those six characters relates to spines (Table 4.1B) and its state is shared by A. paula and A. spinosa (Hastings, 1990). Taken together with the convergent character states of skull bones in A. maria and A. spinosa, this result gives credence to the idea that suites of characters relating to spinous processes have evolved multiple times in Acanthemblemaria. These results suggest that although there was strong support in the morphological data for the sister relationship of A. maria and A. spinosa, there was also some support for the (A. aspera, A. paula, A. spinosa) clade, but it got “outvoted” in the morphological analyses.

In contrast to A. spinosa, the placement of A. medusa in the morphological analyses does not appear to be caused by convergence. The morphological phylogeny places A. medusa sister to A. aspera and A. paula, in the “aspera species group” (Figures 4.1B and 4.1C, and Hastings, 1990). This group is supported by two synapomorphies, both related to the lacrimal bone (Table 4.1C). However, more characters conflicted with the “aspera species group” than supported it: eight in total (Table 4.1D). When parsimony trees were constructed using these eight characters, the clade found in the species tree (A. maria, A. medusa, Acan. n. sp.) was not recovered. These results show that there was not strong support for the “aspera species group” in the morphological data. However, in contrast to A. spinosa, there was no support for an alternate placement of A. medusa.

Suites of characters can create substantial errors in phylogenetic analyses based on morphology because they can create the illusion that relationships are supported by more independent characters than is the case. Examples of suites of correlated characters point to the role of natural selection in the repeated evolution of functionally adaptive phenotypes (Emerson, 1982; Emerson and Hastings, 1998; Holland et al., 2010; McCracken et al., 1999).

The function of the spinous processes on the skull bones of Acanthemblemaria is not known. Acanthemblemaria blennies spend most of their lives in vacated invertebrate holes (Böhlke, 1957; Böhlke and Chaplin, 1993). As such, the heads of these fishes are frequently the only exposed part of their bodies and thus likely targets for (possibly convergent) selective pressure. While it has been proposed that there may be selection for skulls that efficiently block the blenny shelters as a means of defense against predators (Lindquist and Kotrschal, 1987), this hypothesis has not been tested. Skull morphology does not appear to be important in feeding behavior, nor does it influence predation success (Clarke et al., 2009; Clarke et al., 2005; Finelli et al., 2009). In addition, skull morphology does not appear to be influenced by interspecific competition for resources (Lindquist and Kotrschal, 1987). In fact, extensive spination might incur a cost, as it may restrict lateral movement of the jaw in these fishes (Rosenblatt and Stephens, 1978).

It seems unlikely that A. maria and A. spinosa are subject to the same selective pressures. A. maria occurs in high-energy environments on the reef crest or in shallow water and generally does not live in live or standing dead corals, nor does it shelter in holes high up off the reef substrate (Clarke, 1994; Greenfield, 1981; Greenfield and Johnson, 1990; Eytan and Hellberg, unpub. data). A. spinosa, on the other hand, is found in deeper, lower energy sections of the reef, typically in live or standing dead coral not close to the reef substrate (Clarke, 1989; Clarke, 1994; Clarke, 1996; Greenfield and Greenfield, 1982; Eytan and Hellberg, unpub. data).

Convergence in the skull spines of A. maria and A. spinosa may have arisen due to heterochrony. All Acanthemblemaria species have spinous processes on the frontal bones, but with differences in the degree of spination. A common pathway could underlie the development of spines in all species and different phenotypes arise due to differences in developmental timing. In the case of A. maria and A. spinosa, hypermorphosis, where there is a delay in the offset of a developmental process, could give rise to the extreme spination found in these species. As suggested by Emerson and Hastings (1998), this could be tested by studying the ontogenetic trajectory of spine development in a number of different Acanthemblemaria species to determine the onset and offset of this trait.

Acanthemblemaria Diversity

My results demonstrate that Acanthemblemaria species diversity is presently under-described. The molecular phylogenies inferred in this study supported the inclusion of the undescribed species from Isla Margarita (Acan. n. sp.) as belonging to Acanthemblemaria (Figures 4.1A and 4.4). In addition, two other lineages were identified as possible undescribed taxa. The first represents a population of A. rivasi from coastal Venezuela. Acero (1984) noted diagnosable differences between A. rivasi populations from the southern and southwestern Caribbean and those from Central America, where the species was originally described by Stephens (1970). Acero found that A. rivasi individuals from Colombia and Venezuela have significantly different numbers of total dorsal fin and segmented anal fin elements from those in Costa Rica and Panama. In addition, individuals from Venezuela have a pattern of bright blue dots on the head not found in Central American populations. These meristic and color differences between A. rivasi populations may warrant the description of a new species restricted to the south and southwestern Caribbean (A. Acero pers. comm. to P. Hastings), a valid diagnosis supported by the reciprocal monophyly of Venezuelan and Panamanian A. rivasi based on the concatenated dataset (Figure 4.4).

Another undescribed species, sister to A. chaplini, was found in the concatenated phylogeny. A. chaplini from New Providence, Bahamas, was recovered as sister to A. chaplini individuals from the Abacos in the Bahamas and Panama (Figure 4.4). These two were separated from the New Providence individual by a long branch, with a mean TMRCA of 5 my, which was deeper than that of some nominal congeners (Table 4.2 and Figure 4.4). This is despite the much greater distance between the Abacos and Panama (~2000 km) than the Abacos and New Providence (~130 km). The Abacos and New Providence are separated by the deep waters of the Northeast Providence Channel, which may help maintain the deep genetic divergence between the two populations. However, the Caribbean Sea between the Bahamas and Panama is not shallow, discounting the possibility that water depth alone was responsible for the isolation of these lineages.

Because New Providence is the type locality for A. chaplini (Böhlke, 1957), the individuals from the Abacos and Panama should be described as a new species. A species similar to A. chaplini, A. cubana, was recently described from Cuba (Garrido and Varela, 2008). A. cubana lives in sympatry with A. chaplini on Cuban reefs and is distinguished from the latter by slight differences in papillae. Given the slight differences between A. cubana and A. chaplini, it is not clear if the former is a valid species. However, those subtle differences may represent a deeply divergent lineage, such as the one I found in this study. Without further study it is difficult to determine the validity of A. cubana, whether it represents one of the two lineages I have sampled here, or if it belongs to a third, unsampled, lineage.

Biogeography and Timing of Speciation in Acanthemblemaria

My divergence dating analyses for the genus Acanthemblemaria recovered a mid-Miocene origin for the genus, and species pairs were found to have diverged both before and after the closure of the Isthmus of Panama (Figures 4.3 and 4.4). In addition, I found that sister taxa had a range of distributions, from broadly sympatric to completely allopatric (Figure 4.5 and Hastings, 2009).

The Isthmus of Panama has long been recognized as a major driver of allopatric marine speciation in the Neotropics (Hastings, 2000; Hastings, 2009; Jordan, 1908; Knowlton et al., 1993; Lessios, 2008; Lessios et al., 2001). However, its importance in the diversification of reef fishes has been equivocal. Taylor and Hellberg (2005) found that for the Neotropical goby genus Elacatinus, the Isthmus of Panama was associated with two splits and that no sister taxa were transisthmian geminates. Instead, the Risor clade was divided by the Isthmus, as was a basal Elacatinus species, which was sister to the rest of the genus. Likewise, Rocha et al. (2008) found that for Haemulon grunts there was limited support for the Isthmus playing a role in generating diversity. A single pair of geminate taxa was recovered in their analysis, while two pairs of taxa proposed by Jordan to be geminates (Jordan, 1908) were not. However, they did recover sister clades sundered by the Isthmus (Rocha et al., 2008).

Figure 4.5 The distributions (in yellow or blue) and degree of range overlap (in green) for the three Caribbean clades. 5A. A. spinosa, (A. aspera, A. paula). 5B. A. medusa, (A. maria, Acan. n. sp.). 5C. A. chaplini, A. greenfieldi.

My results are similar to these two studies, but with a more complicated pattern. I recovered two pairs of geminate taxa, A. betinensis and A. exilispinus, and A. castroi and A. rivasi (Figures 4.1, 4.3, 4.4). Both pairs were sister to other clades or pairs of species, and neither was nested in the phylogeny. I also recovered a basal split in Clade I between the A. greenfieldi/A. chaplini pair and the “hancocki species group”. Therefore, the Caribbean taxa were not monophyletic. This split in Clade I was quite old, with a mean TMRCA of 10.5 my and 9.7 my in the species tree and concatenated analyses, respectively, and matched the TMRCA of Clade II (Table 4.2). The basal split between the A. greenfieldi/A. chaplini pair and the “hancocki species group” was surprising, as they are well separated by morphology and by distribution (Hastings, 1990; Smith-Vaniz and Palacio, 1974). Given the age of this split and the differences between these species, Clade I may have been larger in the past, with subsequent extinctions, as suggested by the distributions of A. chaplini and A. greenfieldi (see below).

Both Taylor and Hellberg (2005) and Rocha et al. (2008) found that the majority of taxa in their studies diversified within ocean basins. However, the geography of speciation differed between Elacatinus and Haemulon. Taylor and Hellberg found that Caribbean Elacatinus species diversified in allopatry and that sister taxa had either allopatric or micro-allopatric distributions (Taylor and Hellberg, 2005). In contrast, Rocha et al. found that most sister taxa and closely related species had sympatric distributions (Rocha et al., 2008).

In this study, I found a combination of both patterns. The distributions of sister taxa and sister clades overlapped substantially in some cases, while others were allopatric (Figure 4.5).

The three Caribbean clades (A. spinosa, (A. aspera, A. paula); A. medusa, (A. maria, Acan. n. sp.); A. chaplini, A. greenfieldi) varied in their extent of range overlap (Figure 4.5). The species in the A. spinosa, (A. aspera, A. paula) clade had the largest degree of range overlap (Figure 4.5A). A. aspera and A. spinosa co-occur over a large portion of their respective ranges. A. paula was found in close sympatry with these species in two locations: the Belizean barrier reef and New Providence in the Bahamas. Since its description, A. paula has been considered a micro-endemic species, thought to only occur in a small area in Belize (Hastings, 2009; Johnson and Brothers, 1989). The species is very small (18 mm maximum standard length), lays few eggs (less than 5 per brood), and is a habitat specialist (Greenfield and Greenfield, 1982; Johnson and Brothers, 1989), giving credence to the idea that its ability to colonize new regions is poor. Here I document a 1500-kilometer range extension for the species, showing that A. paula’s distribution is much larger than previously thought.

These three taxa demonstrate fine scale habitat partitioning where they co-occur. In Belize, each species is found on a different section of the reef, spanning a depth gradient from ~1-5 meters in A. paula, 3-15 meters in A. spinosa, and 8-22 meters in A. aspera (Clarke, 1994; Eytan and Hellberg, unpub. data). Where they co-occur, these species partition out the substrate by hole size, coral type, and shelter height, in some cases all co-occurring on the same stand of coral (Clarke, 1994; Eytan and Hellberg, unpub. data). This fine scale partitioning could be an example of ecological character displacement to allow taxa to co-exist (Bay et al., 2001; Robertson, 1996). Alternatively, these species may have diverged in parapatry, with disruptive selection due to competition for shelters driving speciation. However, evidence to support either hypothesis is lacking, and further study is warranted to address this question.
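The depth gradient reported for Belize can be made concrete with a short calculation of how much depth each pair of species shares. The ranges are those given in the text (the shallow limit for A. paula is approximate); the function name and the use of simple interval overlap as a measure of partitioning are assumptions made for this sketch.

```python
from itertools import combinations

# Depth ranges in Belize, in meters, as reported in the text.
DEPTH_RANGE_M = {
    "A. paula": (1, 5),
    "A. spinosa": (3, 15),
    "A. aspera": (8, 22),
}


def shared_depth(a, b):
    """Length in meters of the depth band two ranges have in common (0 if none)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


overlaps = {
    (sp1, sp2): shared_depth(DEPTH_RANGE_M[sp1], DEPTH_RANGE_M[sp2])
    for sp1, sp2 in combinations(DEPTH_RANGE_M, 2)
}

for pair, meters in overlaps.items():
    print(pair, meters)
```

On these figures A. spinosa shares part of its depth band with each of the other two species, while A. paula and A. aspera do not overlap in depth at all, which is consistent with partitioning along a gradient rather than complete segregation.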

The A. medusa, (A. maria, Acan. n. sp.) clade also had a broad distribution and often overlapping ranges, but in no case do the ranges of sister taxa overlap. Acan. n. sp. was recovered as sister to A. maria. This new species has never been recorded east of Isla Margarita (Ramjohn, 1999), nor has it been recorded as far west as Los Roques, Venezuela (Cervigón, 1991), suggesting that its distribution is quite restricted. While its range is close to that of A. maria, the two taxa do not overlap, but have abutting distributions (Figure 4.5B).

The sister pair of A. chaplini and A. greenfieldi exist in complete allopatry with disjunct ranges (Figure 4.5C). A. chaplini is found in Florida and the Bahamas, as well as further south in Panama. Meanwhile, A. greenfieldi is found in the central and western Caribbean, in between the two regions where A. chaplini is found. A Panama – Florida distributional tract may not be uncommon though, as it has been found in both Elacatinus gobies and the coral Acropora palmata (Baums et al., 2005; Taylor and Hellberg, 2005; Taylor and Hellberg, 2006). These two species have the oldest divergence time of any Acanthemblemaria sister taxa (Figures 4.3 and 4.4, Table 4.2). It may be that extensive extinctions have occurred since these taxa split, perhaps in the eastern Caribbean or Caribbean coast of South America, leaving the observed allopatric distribution.

In contrast to the old split between A. chaplini and A. greenfieldi, the “hancocki species group” in the eastern Pacific is a young clade. The mean TMRCA of all taxa in the group was estimated to be 3.91 or 3.66 my for the species tree and concatenated analyses, respectively (Table 4.2). However, the lower 95% HPD was as young as 1.9 mya. Coupled with their continental distribution, this suggests diversification of this species group occurred after the closure of the Isthmus of Panama. The sister taxa in this group, A. macrospilus and A. balanorum, occur in sympatry in Mexico (Figure 2.1.2 in Hastings 2009). As in the A. spinosa, (A. aspera, A. paula) clade, where A. macrospilus and A. balanorum co-occur, they partition out the available habitat along a depth gradient (Lindquist, 1985). The basal member of the group, A. hancocki, is found in strict allopatry with regard to the rest of the species in this clade (Hastings, 2000; Hastings, 2009).

Determining the geography of speciation for any taxonomic group is difficult because current species distributions may not reflect those at the time of speciation (Losos and Glor, 2003). In the case of Acanthemblemaria, this is exacerbated by evidence that extinction (Clarke, 1996; Eytan and Hellberg, 2010), poorly known geographic ranges (Dennis et al., 2004; Dennis et al., 2005; Hastings and Robertson, 1999; this study), and the presence of cryptic taxa (Hastings, 2009; this study) may be common in this genus. However, the latter would not change the interpretation of the geographic distributions of congeners, as in all cases newly described or discovered species have been found to be sister to the nominal taxa.

Conclusions

In this study, three lineages were recovered as possible new species, which would bring the membership of the genus to 25 taxa, making Acanthemblemaria one of the most species-rich clades of Neotropical coral reef fishes. I found that the head spines characteristic of Acanthemblemaria have evolved numerous times, leading to conflict between the morphological and molecular phylogenies of the group. This was typified by A. spinosa and A. maria, both of which have elaborate spinous processes, but were not recovered as sister to each other in the molecular phylogenetic analyses. Numerous skull bones appear to have evolved in concert, perhaps due to selection acting on constrained developmental pathways. Bayesian divergence dating found that the genus diverged in the mid-Miocene. A complex pattern was recovered of clades diverging both before and after the closure of the Isthmus of Panama, almost entirely within ocean basins. While several clades had overlapping ranges, most sister taxa occur in allopatry. The exception was the A. spinosa, (A. aspera, A. paula) clade, which exists in sympatry. Fine scale habitat segregation may allow for co-existence of these taxa, and warrants further study.

Chapter 5: Conclusions

A historical perspective is necessary to understand the origins and maintenance of genetic and species diversity on coral reefs. This dissertation has focused on a group of understudied coral reef fishes, the Neotropical blenny genus Acanthemblemaria. In Chapter 2, I reconstructed the historical demography for two closely related Acanthemblemaria species, A. aspera and A. spinosa, using sequence data from one mitochondrial and two nuclear markers. I found that, despite being closely related, with similar life histories, A. spinosa populations were able to persist through the most recent glacial maximum, while A. aspera populations were not, as evidenced by a range-wide population expansion beginning ~20 kya in the latter and not the former. I was able to recover this recent expansion because the smaller effective population size and rapid substitution rate of the mitochondrial data provided a strong demographic signal. On the other hand, the slower nuclear DNA recovered an older expansion for both species. However, the rapid mitochondrial substitution rate also obscured the recent expansion in A. aspera.

Analyses of the mitochondrial data using frequency-based metrics alone did not indicate the underlying population expansions in A. aspera, either young or old. The results of the frequency-based tests led to a pattern of subdivision that was very similar to that of A. spinosa even though the underlying demography of the two species was quite different.

Analyses of substitution rates for the genus found that mitochondrial DNA is evolving at an extremely fast rate, both absolute and relative to nuclear DNA (Chapter 3). When estimated across a phylogeny of the entire genus constructed using five nuclear markers and one mitochondrial marker, mitochondrial COI was found to be evolving at nearly 25% pairwise sequence divergence per million years and 97.5 times faster than nuclear DNA, putting Acanthemblemaria blennies at the highest end of vertebrate mitochondrial substitution rates. This rapid rate may have consequences for post-zygotic hybrid breakdown if co-adapted nuclear and mitochondrial genes were to function poorly on hybrid genomic backgrounds.
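As a rough check on the magnitudes quoted above, the arithmetic can be sketched in Python. This is only an illustration: it rounds "nearly 25%" to 25.0, assumes the 97.5-fold difference compares rates expressed in the same units, and uses hypothetical function names that do not come from the analyses in Chapter 3.

```python
def per_lineage_rate(pairwise_rate):
    """Convert a pairwise divergence rate to a per-lineage rate.

    Pairwise divergence accumulates along both lineages since their
    common ancestor, so the per-lineage rate is half the pairwise rate.
    """
    return pairwise_rate / 2.0


def implied_slower_rate(fast_rate, fold_difference):
    """Rate of the slower marker implied by a fold difference (same units)."""
    return fast_rate / fold_difference


# Assumed inputs, taken from the rounded figures in the text.
COI_PAIRWISE_PCT_PER_MY = 25.0   # "nearly 25%" pairwise divergence per my
MITO_TO_NUCLEAR_FOLD = 97.5      # COI relative to nuclear DNA

coi_lineage = per_lineage_rate(COI_PAIRWISE_PCT_PER_MY)
nuclear_pairwise = implied_slower_rate(COI_PAIRWISE_PCT_PER_MY, MITO_TO_NUCLEAR_FOLD)

print(coi_lineage)                 # 12.5 (% per lineage per my)
print(round(nuclear_pairwise, 3))  # 0.256 (% pairwise per my)
```

Under these assumptions, the implied nuclear rate is roughly a quarter of a percent pairwise divergence per million years.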

Given the growing number of very large datasets for tree of life projects (tens of genetic markers for hundreds of taxa), I tested whether large differences in substitution rates among markers affected the performance of partitioned Bayesian analyses of phylogeny (Chapter 3). When I tested the effect of the rapid COI substitution rate on the performance of Bayesian phylogenetic analyses, I found that the rapid rate did not affect partitioned phylogenetic analyses. Instead, the number of partitions had a direct effect on the ability of partitioned analyses to converge on the posterior distribution of marginal likelihood, regardless of the relative substitution rates among markers. As more partitions were added, performance decreased. However, this was completely ameliorated by adjusting the proposal acceptance rate for the rate multiplier parameter. To ensure convergence of large partitioned datasets, it is essential to adjust this acceptance rate.
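The general idea of adjusting a proposal so that its acceptance rate falls in a useful range can be illustrated with a toy example. The sketch below is not the procedure or software used in Chapter 3: it is a generic Metropolis-Hastings sampler with a multiplier proposal on a single positive parameter, with an assumed Gamma-shaped toy target, an assumed target acceptance band of 0.2 to 0.4, and hypothetical function names throughout.

```python
import math
import random


def log_target(r, shape=2.0, rate=2.0):
    """Unnormalised log density of a toy Gamma(shape, rate) target (assumed)."""
    return (shape - 1.0) * math.log(r) - rate * r


def run_chain(n_iter, lam, rng, r=1.0):
    """Metropolis-Hastings with a multiplier proposal of tuning width lam.

    Returns the observed acceptance rate and the final state.
    """
    accepted = 0
    for _ in range(n_iter):
        step = lam * (rng.random() - 0.5)
        r_new = r * math.exp(step)
        # The Hastings ratio of a multiplier proposal is r_new / r,
        # which equals exp(step), so its log is simply step.
        log_alpha = log_target(r_new) - log_target(r) + step
        # 1 - random() lies in (0, 1], which avoids log(0).
        if math.log(1.0 - rng.random()) < log_alpha:
            r = r_new
            accepted += 1
    return accepted / n_iter, r


def tune_lambda(lam, rng, batches=50, batch_size=200, low=0.2, high=0.4):
    """Adjust lam in short batches until acceptance sits near [low, high]."""
    r = 1.0
    for _ in range(batches):
        acc, r = run_chain(batch_size, lam, rng, r)
        if acc > high:
            lam *= 1.5  # nearly every move accepted: take bolder steps
        elif acc < low:
            lam /= 1.5  # most moves rejected: take smaller steps
    return lam


rng = random.Random(1)
untuned_acc, _ = run_chain(5000, 0.01, rng)
tuned_lam = tune_lambda(0.01, rng)
tuned_acc, _ = run_chain(5000, tuned_lam, rng)
print(f"untuned acceptance: {untuned_acc:.2f}")
print(f"tuned lambda: {tuned_lam:.2f}, tuned acceptance: {tuned_acc:.2f}")
```

With a very small tuning width almost every proposal is accepted but the chain barely moves, which is one way a sampler can appear to run while failing to explore the posterior. Widening the proposal lowers the acceptance rate and lets the chain take larger steps.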

In contrast to the partitioned Bayesian analyses, I found that the rapid COI substitution rate did affect Bayesian species tree analyses (Chapter 3). When COI was added to the data matrix, the posterior estimates of topology were much poorer than when nuclear DNA was used alone. This appeared to be due to the large amount of information in the COI dataset relative to the nuclear markers: although COI accounted for 14% of the dataset, it contained over one third of the parsimony-informative sites. This allowed it to have a disproportionately large effect on the species tree analysis. However, that information appeared to contain more noise than signal, and adversely affected species tree estimates. Care should be taken with species tree analyses in which a minority of the dataset is responsible for a majority of its information.

The molecular phylogenies estimated for Acanthemblemaria were found to significantly conflict with those derived from a morphological dataset. I investigated the cause of this incongruence by comparing the species tree to the morphological tree to identify the specific morphological characters responsible for this conflict (Chapter 4). I found that the head spines characteristic of Acanthemblemaria have evolved numerous times, leading to incongruence between the morphological and molecular phylogenies of the group. This was typified by A. spinosa and A. maria, both of which have elaborate spinous processes, but were not recovered as sister to each other in the molecular phylogenetic analyses. Numerous skull bones appear to have evolved in concert, perhaps due to convergent selection acting on constrained developmental pathways.

Analysis of the concatenated phylogeny for the genus (Chapter 4) recovered three lineages as possible new species, which would bring the membership of the genus to 25 taxa, making Acanthemblemaria one of the most species-rich clades of Neotropical coral reef fishes.

Bayesian divergence dating found that the genus diverged in the mid-Miocene. A complex pattern was recovered of clades diverging both before and after the closure of the Isthmus of Panama, almost entirely within ocean basins. While several clades had overlapping ranges, most sister taxa occur in allopatry. The exception was the A. spinosa, (A. aspera, A. paula) clade, which exists in sympatry. Fine scale habitat segregation may allow for co-existence of these taxa, and warrants further study.

These studies of the genus Acanthemblemaria elucidated the processes responsible for the genetic and species diversity found in the group. As coral reef ecosystems are under increasing threat, understanding the evolutionary processes underlying their constituent taxa is of the utmost importance. Given their high species richness and regional endemicity, but poorly understood biology, this is especially true for cryptic reef fish groups such as Acanthemblemaria.

References

Acero, A. P. 1984. The Chaenopsine blennies of the southwestern Caribbean (Pisces: Clinidae: Chaenopsinae). II. The genera Acanthemblemaria, Ekemblemaria and Lucayablennius. Revista de Biologia Tropical 32:35-44.

Almany, G. R., and C. C. Baldwin. 1996. A new Atlantic species of Acanthemblemaria (Teleostei: Blennioidei: Chaenopsidae): Morphology and relationships. Proceedings of the Biological Society of Washington 109:419-429.

Avise, J. C. 2000. Phylogeography: the history and formation of species. Harvard University Press, Cambridge, Massachusetts.

Axelsson, E., N. Smith, H. Sundstrom, S. Berlin, and H. Ellegren. 2004. Male-biased mutation rate and divergence in autosomal, Z-linked and W-linked introns of chicken and turkey. Molecular Biology and Evolution 21:1538.

Barrowclough, G. F., and R. M. Zink. 2009. Funds enough, and time: mtDNA, nuDNA and the discovery of divergence. Molecular Ecology 18:2934-2936.

Baums, I. B., M. W. Miller, and M. E. Hellberg. 2005. Regionally isolated populations of an imperiled Caribbean coral, Acropora palmata. Molecular Ecology 14:1377-1390.

Bay, L. K., G. P. Jones, and M. I. McCormick. 2001. Habitat selection and aggression as determinants of spatial segregation among damselfish on a coral reef. Coral Reefs 20:289-298.

Bellwood, D. R., and T. P. Hughes. 2001. Regional-scale assembly rules and biodiversity of coral reefs. Science 292:1532-1534.

Bellwood, D. R., and P. C. Wainwright. 2002. The History and Biogeography of Fishes on Coral Reefs. Pages 5-32 in Coral Reef Fishes: Dynamics and Diversity in a Complex Ecosystem (P. F. Sale, ed.) Academic Press, San Diego.

Böhlke, J. 1957. The Bahaman species of emblemariid blennies. Proceedings of the Academy of Natural Sciences of Philadelphia 109:25-60.

Böhlke, J. E. 1961. The Atlantic species of the Clinid fish genus Acanthemblemaria. Notulae Naturae of The Academy of Natural Sciences of Philadelphia 346:1-7.

Böhlke, J. E., and C. G. C. Chaplin. 1993. Fishes of the Bahamas and Adjacent Tropical Waters, 2nd edition. University of Texas Press, Austin.

Bouckaert, R. 2010. DensiTree: making sense of sets of phylogenetic trees. Bioinformatics 26:1372.

Bowen, B. W., A. L. Bass, A. Muss, J. Carlin, and D. R. Robertson. 2006. Phylogeography of two Atlantic squirrelfishes (Family Holocentridae): exploring links between pelagic larval duration and population connectivity. Marine Biology 149:899-913.

Brandley, M. C., A. Schmitz, and T. W. Reeder. 2005. Partitioned Bayesian analyses, partition choice, and the phylogenetic relationships of scincid lizards. Systematic Biology 54:373-390.

Briggs, J. C. 1974. Marine Zoogeography. McGraw-Hill Book Company, New York.

Brito, P., and S. Edwards. 2009. Multilocus phylogeography and phylogenetics using sequence-based markers. Genetica 135:439-455.

Brown, J., S. Hedtke, A. Lemmon, and E. Lemmon. 2010. When trees grow too long: investigating the causes of highly inaccurate Bayesian branch-length estimates. Systematic Biology 59:145.

Brown, J. M., and A. R. Lemmon. 2007. The importance of data partitioning and the utility of Bayes factors in Bayesian phylogenetics. Systematic Biology 56:643-655.

Brunsfeld, S. J., J. Sullivan, D. E. Soltis, and P. S. Soltis. 2001. A comparative phylogeography of northwestern North America: a synthesis. Pages 319-340 in Integrating Ecology and Evolution in a Spatial Context (J. Silvertown, and J. Antonovics, eds.). Blackwell Science, Oxford.

Burney, C. W., and R. T. Brumfield. 2009. Ecology Predicts Levels of Genetic Differentiation in Neotropical Birds. American Naturalist 174:358-368.

Burton, R. S., C. K. Ellison, and J. S. Harrison. 2006. The sorry state of F-2 hybrids: Consequences of rapid mitochondrial DNA evolution in allopatric populations. American Naturalist 168:S14-S24.

Caccone, A., G. Gentile, C. Burns, E. Sezzi, W. Bergman, M. Ruelle, K. Saltonstall, and J. Powell. 2004. Extreme difference in rate of mitochondrial and nuclear DNA evolution in a large ectotherm, Galapagos tortoises. Molecular Phylogenetics and Evolution 31:794-798.

Carling, M. D., and R. T. Brumfield. 2007. Gene Sampling Strategies for Multi-Locus Population Estimates of Genetic Diversity (θ). PLoS ONE 2.

Carstens, B. C., S. J. Brunsfeld, J. R. Demboski, J. M. Good, and J. Sullivan. 2005. Investigating the evolutionary history of the Pacific Northwest mesic forest ecosystem: Hypothesis testing within a comparative phylogeographic framework. Evolution 59:1639-1652.

Cervigón, F. 1991. Los peces marinos de Venezuela. Fundación Científica Los Roques.

Clarke, R., C. Finelli, and E. Buskey. 2009. Water flow controls distribution and feeding behavior of two co-occurring coral reef fishes: II. Laboratory experiments. Coral Reefs 28:475-488.

Clarke, R. D. 1989. Population fluctuation, competition and microhabitat distribution of 2 species of tube blennies, Acanthemblemaria (Teleostei, Chaenopsidae). Bulletin of Marine Science 44:1174-1185.

Clarke, R. D. 1994. Habitat partitioning by chaenopsid blennies in Belize and the Virgin Islands. Copeia:398-405.

Clarke, R. D. 1996. Population shifts in two competing fish species on a degrading coral reef. Marine Ecology-Progress Series 137:51-58.

Clarke, R. D., E. J. Buskey, and K. C. Marsden. 2005. Effects of water motion and prey behavior on zooplankton capture by two coral reef fishes. Marine Biology 146:1145-1155.

Clement, M., D. Posada, and K. A. Crandall. 2000. TCS: a computer program to estimate gene genealogies. Molecular Ecology 9:1657-1659.

Cunningham, C. W., and T. M. Collins. 1994. Developing model systems for molecular biogeography: vicariance and interchange in marine invertebrates. Pages 405-433 in Molecular Ecology and Evolution: Approaches and Applications (B. Schierwater, B. Streit, P. Wagner, and R. DeSalle, eds.). Birkhaueser Verlag, Basel, Switzerland.

Dennis, G. D., D. Hensley, P. L. Colin, and J. J. Kimmel. 2004. New records of marine fishes from the Puerto Rican Plateau. Caribbean Journal of Science 40:70-87.

Dennis, G. D., W. F. Smith-Vaniz, P. L. Colin, D. A. Hensley, and M. A. McGehee. 2005. Shore fishes known from islands of the Mona Passage, Greater Antilles with comments on their zoogeography. Caribbean Journal of Science 41.

Drummond, A., and A. Rambaut. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7:214.

Drummond, A., A. Rambaut, B. Shapiro, and O. Pybus. 2005. Bayesian coalescent inference of past population dynamics from molecular sequences. Molecular Biology and Evolution 22:1185.

Drummond, A. J., B. Ashton, M. Cheung, J. Heled, M. Kearse, R. Moir, S. Stones-Havas, T. Thierer, and A. Wilson. 2007a. Geneious v3.6. Available from http://www.geneious.com.

Drummond, A. J., B. Ashton, M. Cheung, J. Heled, M. Kearse, R. Moir, S. Stones-Havas, T. Thierer, and A. Wilson. 2009. Geneious v4.5.4. Available from http://www.geneious.com.

Drummond, A. J., S. Y. W. Ho, M. J. Phillips, and A. Rambaut. 2006. Relaxed phylogenetics and dating with confidence. Plos Biology 4:699-710.

Drummond, A. J., S. Y. W. Ho, N. Rawlence, and A. Rambaut. 2007b. A rough guide to BEAST 1.4. BEAST Manual.

Earl, D. 2009. Structure Harvester v0.3. Available from: http://users.soe.ucsc.edu/~dearl/software/struct_harvest/.

Edgar, R. C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32:1792-1797.

Edwards, S., and S. Bensch. 2009. Looking forwards or looking backwards in avian phylogeography? A comment on Zink and Barrowclough 2008. Molecular Ecology 18:2930-2933.

Edwards, S. V. 2009. Is a new and general theory of molecular systematics emerging? Evolution 63:1-19.

Emerson, S. B. 1982. Frog postcranial morphology: identification of a functional complex. Copeia 1982:603-613.

Emerson, S. B., and P. A. Hastings. 1998. Morphological correlations in evolution: consequences for phylogenetic analysis. The Quarterly Review of Biology 73:141-162.

Evanno, G., S. Regnaut, and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 14:2611-2620.

Excoffier, L., R. Petit, and M. Foll. 2009. Genetic consequences of range expansion. Annual Review of Ecology, Evolution, and Systematics 40.

Excoffier, L., P. E. Smouse, and J. M. Quattro. 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes - application to human mitochondrial-DNA restriction data. Genetics 131:479-491.

Eytan, R. I., and M. E. Hellberg. 2010. Nuclear and mitochondrial sequence data reveal and conceal different demographic histories and population genetic processes in Caribbean reef fishes. Evolution 9999.

Fauvelot, C., G. Bernardi, and S. Planes. 2003. Reductions in the mitochondrial DNA diversity of coral reef fish provide evidence of population bottlenecks resulting from Holocene sea-level change. Evolution 57:1571-1583.

Felsenstein, J. 2006. Accuracy of coalescent likelihood estimates: Do we need more sites, more sequences, or more loci? Molecular Biology and Evolution 23:691-700.

Finelli, C., R. Clarke, H. Robinson, and E. Buskey. 2009. Water flow controls distribution and feeding behavior of two co-occurring coral reef fishes: I. Field measurements. Coral Reefs 28:461-473.

Flot, J. 2007. CHAMPURU 1.0: a computer software for unraveling mixtures of two DNA sequences of unequal lengths. Molecular Ecology Notes 7:974-977.

Flot, J.-F. 2009. seqPHASE: a web tool for interconverting PHASE input/output files and FASTA sequence alignments. Molecular Ecology Resources 10:162-166.

Fu, Y. X. 1997. Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics 147:915-925.

Galtier, N., B. Nabholz, S. Glémin, and G. Hurst. 2009. Mitochondrial DNA as a marker of molecular diversity: a reappraisal. Molecular Ecology.

Garrido, O. H., and C. Varela. 2008. Sobre la especie Acanthemblemaria chaplini (Pisces: Chaenopsidae) en Cuba, con la descripción de una especie nueva. Solenodon 7:29-36.

Graur, D., and W.-H. Li. 2000. Fundamentals of Molecular Evolution, 2nd edition. Sinauer Associates, Inc., Sunderland, Massachusetts.

Greenfield, D. W. 1981. The blennioid fishes of Belize and Honduras, Central America, with comments on their systematics, ecology, and distribution (Blennidae, Chaenopsidae, Labrisomidae, Tripterygiidae). Fieldiana Zoology:1-106.

Greenfield, D. W., and T. A. Greenfield. 1982. Habitat and resource partitioning between two species of Acanthemblemaria (Pisces: Chaenopsidae), with comments on the chaos hypothesis. The Atlantic Barrier Reef Ecosystem at Carrie Bow Cay, Belize, 1: Structure and Communities:499-507.

Greenfield, D. W., and R. K. Johnson. 1990. Community structure of western Caribbean blennioid fishes. Copeia:433-448.

Hackett, S. J., R. T. Kimball, S. Reddy, R. C. K. Bowie, E. L. Braun, M. J. Braun, J. L. Chojnowski, W. A. Cox, K.-L. Han, J. Harshman, C. J. Huddleston, B. D. Marks, K. J. Miglia, W. S. Moore, F. H. Sheldon, D. W. Steadman, C. C. Witt, and T. Yuri. 2008. A Phylogenomic Study of Birds Reveals Their Evolutionary History. Science 320:1763-1768.

Hare, M. 2001. Prospects for nuclear gene phylogeography. Trends in Ecology & Evolution 16:700-706.

Hastings, P. A. 1990. Phylogenetic relationships of tube blennies of the genus Acanthemblemaria (Pisces, Blennioidei). Bulletin of Marine Science 47:725-738.

Hastings, P. A. 2000. Biogeography of the Tropical Eastern Pacific: distribution and phylogeny of chaenopsid fishes. Zoological Journal of the Linnean Society 128:319-335.

Hastings, P. A. 2002. Evolution of morphological and behavioral ontogenies in females of a highly dimorphic clade of blennioid fishes. Evolution 56:1644-1654.

Hastings, P. A. 2009. Biogeography of New World Blennies. Pages 95-118 in The Biology of Blennies (R. A. Patzner, E. J. Gonçalves, P. A. Hastings, and B. J. Kapoor, eds.). Science Publishers, Enfield, New Hampshire.

Hastings, P. A., and D. R. Robertson. 1999. Notes on a collection of chaenopsid blennies from Bahia Azul, Bocas del Toro, Caribbean, Panama. Revue Francaise d'Aquariologie Herpetologie (Nancy) 26:33-38.

Hastings, P. A., and V. G. Springer. 1994. Review of Stathmonotus, with redefinition and phylogenetic analysis of the Chaenopsidae (Teleostei:Blennioidei). Smithsonian Contributions to Zoology:1-48.

Hastings, P. A., and V. G. Springer. 2009a. Recognizing diversity in blennioid fish nomenclature (Teleostei: Blennioidei). Zootaxa 2120:3-14.

Hastings, P. A., and V. G. Springer. 2009b. Systematics of the Blennioidei and the Included Families Dactyloscopidae, Chaenopsidae, Clinidae and Labrisomidae. Pages 3-30 in The Biology of Blennies (R. A. Patzner, E. J. Gonçalves, P. A. Hastings, and B. J. Kapoor, eds.). Science Publishers, Enfield, New Hampshire.

Hebert, P. D. N., S. Ratnasingham, and J. R. deWaard. 2003. Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proceedings Of The Royal Society Of London Series B-Biological Sciences 270:S96-S99.

Hedrick, P. W. 2005. A standardized genetic differentiation measure. Evolution 59:1633-1638.

Hein, J., M. H. Schierup, and C. Wiuf. 2005. Gene Genealogies, Variation and Evolution. Oxford University Press, New York.

Heled, J., and A. Drummond. 2008. Bayesian inference of population size history from multiple loci. BMC Evolutionary Biology 8:289.

Hellberg, M. 2006. No variation and low synonymous substitution rates in coral mtDNA despite high nuclear variation. BMC Evolutionary Biology 6:24.

Hewitt, G. 2000. The genetic legacy of the Quaternary ice ages. Nature 405:907-913.

Hey, J. 2010. Isolation with migration models for more than two populations. Molecular Biology and Evolution 27:905-920.

Hickerson, M., B. Carstens, J. Cavender-Bares, K. Crandall, C. Graham, J. Johnson, L. Rissler, P. Victoriano, and A. Yoder. 2009. Phylogeography's past, present, and future: 10 years after Avise 2000. Molecular Phylogenetics and Evolution.

Hickerson, M., E. Stahl, and H. Lessios. 2006. Test for simultaneous divergence using approximate Bayesian computation. Evolution:2435-2453.

Hickerson, M. J., and C. W. Cunningham. 2005. Contrasting quaternary histories in an ecologically divergent sister pair of low-dispersing intertidal fish (Xiphister) revealed by multilocus DNA analysis. Evolution 59:344-360.

Ho, J. W. K., C. E. Adams, J. B. Lew, T. J. Matthews, C. C. Ng, A. Shahabi-Sirjani, L. H. Tan, Y. Zhao, S. Easteal, S. R. Wilson, and L. S. Jermiin. 2006. SeqVis: Visualization of compositional heterogeneity in large alignments of nucleotides. Bioinformatics 22:2162-2163.

Holland, B. R., H. G. Spencer, T. H. Worthy, and M. Kennedy. 2010. Identifying Cliques of Convergent Characters: Concerted Evolution in the Cormorants and Shags. Systematic Biology 59:433-445.

Hubisz, M., D. Falush, M. Stephens, and J. Pritchard. 2009. Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources 9:1322-1332.

Hudson, M. 2008. Sequencing breakthroughs for genomic ecology and evolutionary biology. Molecular Ecology Resources 8:3-17.

Hudson, R. R., and M. Turelli. 2003. Stochasticity overrules the "three-times rule": Genetic drift, genetic draft, and coalescence times for nuclear loci versus mitochondrial DNA. Evolution 57:182-190.

Huelsenbeck, J. P., J. P. Bollback, and A. M. Levine. 2002a. Inferring the root of a phylogenetic tree. Systematic Biology 51:32-43.

Huelsenbeck, J. P., B. Larget, R. E. Miller, and F. Ronquist. 2002b. Potential Applications and Pitfalls of Bayesian Inference of Phylogeny. Systematic Biology 51:673-688.

Huelsenbeck, J. P., and B. Rannala. 2004. Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models. Systematic Biology 53:904-913.

Huelsenbeck, J. P., B. Rannala, and J. P. Masly. 2000. Accommodating Phylogenetic Uncertainty in Evolutionary Studies. Science 288:2349-2350.

Huelsenbeck, J. P., F. Ronquist, R. Nielsen, and J. P. Bollback. 2001. Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology. Science 294:2310-2314.

Jakobsson, M., and N. Rosenberg. 2007. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23:1801.

Johnson, G. D., and E. B. Brothers. 1989. Acanthemblemaria paula, a new diminutive chaenopsid (Pisces, Blennioidei) from Belize, with comments on life history. Proceedings of the Biological Society of Washington 102:1018-1030.

Jordan, D. S. 1908. The Law of Geminate Species. American Naturalist 42:73-80.

Kass, R. E., and A. E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association 90:773-795.

Kluge, A. 1989. A concern for evidence and a phylogenetic hypothesis of relationships among Epicrates (Boidae, Serpentes). Systematic Zoology 38:7-25.

Knowlton, N., L. A. Weigt, L. A. Solorzano, D. K. Mills, and E. Bermingham. 1993. Divergence in Proteins, Mitochondrial-DNA, and Reproductive Compatibility across the Isthmus of Panama. Science 260:1629-1632.

Kubatko, L. S., B. C. Carstens, and L. L. Knowles. 2009. STEM: species tree estimation using maximum likelihood for gene trees under coalescence. Bioinformatics 25:971-973.

Lakner, C., P. Van Der Mark, J. P. Huelsenbeck, B. Larget, and F. Ronquist. 2008. Efficiency of Markov chain Monte Carlo tree proposals in Bayesian phylogenetics. Systematic Biology 57:86.

Lambeck, K., Y. Yokoyama, and T. Purcell. 2002. Into and out of the Last Glacial Maximum: sea-level change during Oxygen Isotope Stages 3 and 2. Quaternary Science Reviews 21:343-360.

Lee, J. Y., and S. V. Edwards. 2008. Divergence across Australia's Carpentarian barrier: statistical phylogeography of the red-backed fairy wren (Malurus melanocephalus). Evolution 62:3117-3134.

Lemmon, A., and E. Moriarty. 2004. The importance of proper model assumption in Bayesian phylogenetics. Systematic Biology 53:265-277.

Lessios, H. 2008. The great American schism: divergence of marine organisms after the rise of the Central American Isthmus. Annual Review of Ecology, Evolution, and Systematics 39:63-91.

Lessios, H. A., B. D. Kessing, and J. S. Pearse. 2001. Population structure and speciation in tropical seas: Global phylogeography of the sea urchin Diadema. Evolution 55:955-975.

Lewis, P. O. 2001. A likelihood approach to estimating phylogeny from discrete morphological character data. Systematic Biology 50:913-925.

Librado, P., and J. Rozas. 2009. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25:1451-1452.

Lin, H.-C., and G. R. Galland. 2010. Molecular analysis of Acanthemblemaria macrospilus (Teleostei: Chaenopsidae) with description of a new species from the Gulf of California, Mexico. Zootaxa 2525:51-62.

Lin, H.-C., C. Sànchez-Ortiz, and P. A. Hastings. 2009. Colour variation is incongruent with mitochondrial lineages: cryptic speciation and subsequent diversification in a Gulf of California reef fish (Teleostei: Blennioidei). Molecular Ecology 18:2476-2488.

Lindquist, D. G. 1985. Depth Zonation, Microhabitat, and Morphology of 3 Species of Acanthemblemaria (Pisces, Blennioidea) in the Gulf of California, Mexico. Marine Ecology-Pubblicazioni Della Stazione Zoologica Di Napoli I 6:329-344.

Lindquist, D. G., and K. M. Kotrschal. 1987. The Diets in 4 Pacific Tube Blennies (Acanthemblemaria, Chaenopsidae) - Lack of Ecological Divergence in Syntopic Species. Marine Ecology-Pubblicazioni Della Stazione Zoologica Di Napoli I 8:327-335.

Losos, J. B., and R. E. Glor. 2003. Phylogenetic comparative methods and the geography of speciation. Trends in Ecology & Evolution 18:220-227.

Marko, P. B. 2002. Fossil calibration of molecular clocks and the divergence times of geminate species pairs separated by the Isthmus of Panama. Molecular Biology and Evolution 19:2005-2021.

Marko, P. B. 2004. 'What's larvae got to do with it?' Disparate patterns of post-glacial population structure in two benthic marine gastropods with identical dispersal potential. Molecular Ecology 13:597-611.

Marshall, D. 2009. Cryptic failure of partitioned Bayesian phylogenetic analyses: lost in the land of long trees. Systematic Biology.

Marshall, D. C., C. Simon, and T. R. Buckley. 2006. Accurate branch length estimation in partitioned Bayesian analyses requires accommodation of among-partition rate variation and attention to branch length priors. Systematic Biology 55:993-1003.

McCracken, K. G., J. Harshman, D. A. McClellan, and A. D. Afton. 1999. Data set incongruence and correlated character evolution: an example of functional convergence in the hind-limbs of stifftail diving ducks. Systematic Biology 48:683-714.

McMillan, W., and S. Palumbi. 1997. Rapid rate of control-region evolution in Pacific butterflyfishes (Chaetodontidae). Journal of Molecular Evolution 45:473-484.

Meirmans, P. 2006. Using the AMOVA framework to estimate a standardized genetic differentiation measure. Evolution 60:2399-2402.

Meirmans, P. G., and P. H. Van Tienderen. 2004. GENOTYPE and GENODIVE: two programs for the analysis of genetic diversity of asexual organisms. Molecular Ecology Notes 4:792-794.

Metzelaar, J. 1919. Report on the fishes collected by Dr. J. Boeke in the Dutch West Indies 1904-1905. With comparative notes on marine fishes of tropical West Africa. in Rapport Visscherij en de industrie van Zeeprodukten in de Kolonie Curaçao, Gravenhage.

Michalakis, Y., and L. Excoffier. 1996. A generic estimation of population subdivision using distances between alleles with special reference for microsatellite loci. Genetics 142:1061-1064.

Minin, V., E. Bloomquist, and M. Suchard. 2008. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Molecular Biology and Evolution 25:1459.

Montaggioni, L. 2000. Postglacial reef growth. Comptes Rendus de l'Academie des Sciences Series IIA Earth and Planetary Science 331:319-330.

Moore, W. 1995. Inferring phylogenies from mtDNA variation: mitochondrial-gene trees versus nuclear-gene trees. Evolution 49:718-726.

Mora, C., P. M. Chittaro, P. F. Sale, J. P. Kritzer, and S. A. Ludsin. 2003. Patterns and processes in reef fish diversity. Nature 421:933-936.

Munday, P., and G. Jones. 1998. The ecological implications of small body size among coral-reef fishes. Oceanography and Marine Biology 36:373-411.

Nabholz, B., S. Glémin, and N. Galtier. 2008. Strong variations of mitochondrial mutation rate across mammals--the longevity hypothesis. Molecular Biology and Evolution 25:120.

Nabholz, B., S. Glémin, and N. Galtier. 2009. The erratic mitochondrial clock: variations of mutation rate, not population size, affect mtDNA diversity across birds and mammals. BMC Evolutionary Biology 9:54.

Nelson, J., and A. Wheeler. 2006. Fishes of the World. Wiley New York.

Newton, M., and A. Raftery. 1994. Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society. Series B (Methodological) 56:3-48.

Nylander, J., J. Wilgenbusch, D. Warren, and D. Swofford. 2008. AWTY (are we there yet?): a system for graphical exploration of MCMC convergence in Bayesian phylogenetics. Bioinformatics 24:581.

Nylander, J. A. A., F. Ronquist, J. P. Huelsenbeck, and J. L. Nieves-Aldrey. 2004. Bayesian phylogenetic analysis of combined data. Systematic Biology 53:47-67.

Oliveira, D., R. Raychoudhury, D. Lavrov, and J. Werren. 2008. Rapidly evolving mitochondrial genome and directional selection in mitochondrial genes in the parasitic wasp Nasonia (Hymenoptera: Pteromalidae). Molecular Biology and Evolution 25:2167.

Pond, S. L. K., and S. D. W. Frost. 2005. Datamonkey: rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics 21:2531-2533.

Pond, S. L. K., D. Posada, M. B. Gravenor, C. H. Woelk, and S. D. W. Frost. 2006. GARD: a genetic algorithm for recombination detection. Bioinformatics 22:3096-3098.

Pons, J., T. G. Barraclough, J. Gomez-Zurita, A. Cardoso, D. P. Duran, S. Hazell, S. Kamoun, W. D. Sumlin, and A. P. Vogler. 2006. Sequence-based species delimitation for the DNA taxonomy of undescribed insects. Systematic Biology 55:595-609.

Posada, D. 2008. jModelTest: Phylogenetic model averaging. Molecular Biology and Evolution 25:1253-1256.

Posada, D., and K. Crandall. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817.

Pritchard, J. K., M. Stephens, and P. Donnelly. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945-959.

Pritchard, J. K., X. Wen, and D. Falush. 2009. Documentation for structure software: Version 2.3. Available from http://pritch.bsd.uchicago.edu.

Rambaut, A., and A. J. Drummond. 2010. Tracer v1.5. Available from http://beast.bio.ed.ac.uk/Tracer

Ramjohn, D. 1999. Checklist of coastal and marine fishes of Trinidad and Tobago. Marine Fishery Analysis Unit, Fisheries Division, Ministry of Agriculture, Land and Marine Resources, Trinidad and Tobago. Fisheries Information Series 8:151.

Ramos-Onsins, S., and J. Rozas. 2002. Statistical properties of new neutrality tests against population growth. Molecular Biology and Evolution 19:2092.

Rawson, P., and R. Burton. 2002. Functional coadaptation between cytochrome c and cytochrome c oxidase within allopatric populations of a marine copepod. Proceedings of the National Academy of Sciences 99:12955.

Robertson, D. R. 1996. Interspecific competition controls abundance and habitat use of territorial Caribbean damselfishes. Ecology 77:885-899.

Rocha, L. A., K. C. Lindeman, C. R. Rocha, and H. A. Lessios. 2008. Historical biogeography and speciation in the reef fish genus Haemulon (Teleostei: Haemulidae). Molecular Phylogenetics and Evolution 48:918.

Rocha, L. A., D. R. Robertson, C. R. Rocha, J. L. Van Tassell, M. T. Craig, and B. W. Bowen. 2005. Recent invasion of the tropical Atlantic by an Indo-Pacific coral reef fish. Molecular Ecology 14:3921-3928.

Ronquist, F., and J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572-1574.

Ronquist, F., P. van der Mark, and J. P. Huelsenbeck. 2009. Bayesian phylogenetic analysis using MRBAYES. Pages 210-265 in The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing. (P. Lemey, M. Salemi, and A.-M. Vandamme, eds.). Cambridge University Press, New York.

Rosenblatt, R. 1967. The zoogeographic relationships of the marine shore fishes of tropical America. Studies in Tropical Oceanography 5:579-592.

Rosenblatt, R. H., and J. S. Stephens. 1978. Mccoskerichthys sandae: a new and unusual chaenopsid blenny from the Pacific coast of Panama and Costa Rica. Contrib. Sci. Nat. Hist. Mus. Los Angeles Co. 293:1-22.

Sale, P. 1977. Maintenance of High Diversity in Coral Reef Fish Communities. American Naturalist 111:337-359.

Sale, P. F. 1991. The Ecology Of Fishes On Coral Reefs. Academic Press, San Diego, California.

Sale, P. F. 2002. Coral Reef Fishes: Dynamics And Diversity In A Complex Ecosystem. Academic Press, San Diego, California.

Senchina, D. S., I. Alvarez, R. C. Cronn, B. Liu, J. Rong, R. D. Noyes, A. H. Paterson, R. A. Wing, T. A. Wilkins, and J. F. Wendel. 2003. Rate Variation Among Nuclear Genes and the Age of Polyploidy in Gossypium. Molecular Biology and Evolution 20:633-643.

Shaffer, H. B., J. M. Clark, and F. Kraus. 1991. When molecules and morphology clash: a phylogenetic analysis of the North American ambystomatid salamanders (Caudata: Ambystomatidae). Systematic Zoology 40:284-303.

Shapiro, B., A. Rambaut, and A. J. Drummond. 2006. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Molecular Biology and Evolution 23:7-9.

Smith-Vaniz, W. F., and F. J. Palacio. 1974. Atlantic fishes of the genus Acanthemblemaria, with description of three new species and comments on Pacific species (Clinidae: Chaenopsinae). Proceedings of the Academy of Natural Sciences of Philadelphia 125:197-224.

Springer, V. 1993. Definition of the suborder Blennioidei and its included families (Pisces: Perciformes). Bulletin of Marine Science 52:472-495.

Stephens, J. S. 1963. A revised classification of the blennioid fishes of the American family Chaenopsidae. University of California Publications in Zoology 68:1-165.

Stephens, J. S. 1970. Seven new chaenopsid blennies from the Western Atlantic. Copeia:280-309.

Stephens, M., and P. Donnelly. 2003. A comparison of Bayesian methods for haplotype reconstruction from population genotype data. American Journal of Human Genetics 73:1162-1169.

Stephens, M., and P. Scheet. 2005. Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation. American Journal of Human Genetics 76:449-462.

Stephens, M., N. J. Smith, and P. Donnelly. 2001. A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics 68:978-989.

Suchard, M., R. Weiss, and J. Sinsheimer. 2001. Bayesian selection of continuous-time Markov chain evolutionary models. Molecular Biology and Evolution 18:1001.

Suchard, M. A., and B. D. Redelings. 2006. BAli-Phy: simultaneous Bayesian inference of alignment and phylogeny. Bioinformatics 22:2047-2048.

Sukumaran, J., and M. T. Holder. 2008. SumTrees: Summarization of Split Support on Phylogenetic Trees. Part of the DendroPy Phylogenetic Computation Library Version 2.1.3 (http://sourceforge.net/projects/dendropy).

Sullivan, J., and P. Joyce. 2005. Model Selection in Phylogenetics. Annual Review of Ecology, Evolution, and Systematics 36:445-466.

Swofford, D. L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Ver. 4. Sinauer Associates.

Taberlet, P., L. Fumagalli, A. Wust-Saucy, and J. Cosson. 1998. Comparative phylogeography and postglacial colonization routes in Europe. Molecular Ecology 7:453-464.

Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585.

Tamura, K., J. Dudley, M. Nei, and S. Kumar. 2007. MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24:1596-1599.

Taylor, M. S., and M. E. Hellberg. 2005. Marine radiations at small geographic scales: speciation in neotropical reef gobies (Elacatinus). Evolution 59:374-385.

Taylor, M. S., and M. E. Hellberg. 2006. Comparative phylogeography in a genus of coral reef fishes: biogeographic and genetic concordance in the Caribbean. Molecular Ecology 15:695-707.

Team, R. D. C. 2009. R: A language and environment for statistical computing. R Foundation for Statistical Computing.

Thacker, C., A. Thompson, D. Roje, and E. Shaw. 2008. New expansions in old clades: population genetics and phylogeny of Gnatholepis species (Teleostei: Gobioidei) in the Pacific. Marine Biology 153:375-385.

The Nasonia Genome Working Group. 2010. Functional and Evolutionary Insights from the Genomes of Three Parasitoid Nasonia Species. Science 327:343-348.

Villesen, P. 2007. FaBox: an online toolbox for FASTA sequences. Molecular Ecology Notes 7:965-968.

Waples, R., and O. Gaggiotti. 2006. What is a population? An empirical evaluation of some genetic methods for identifying the number of gene pools and their degree of connectivity. Molecular Ecology 15:1419-1440.

Wares, J. P. 2010. Natural distributions of mitochondrial sequence diversity support new null hypotheses. Evolution 64:1136-1142.

Weersing, K., and R. Toonen. 2009. Population genetics, larval dispersal, and connectivity in marine systems. Marine Ecology Progress Series 393:1-12.

Weir, B. S., and C. C. Cockerham. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370.

Welch, J., O. Bininda-Emonds, and L. Bromham. 2008. Correlates of substitution rate variation in mammalian protein-coding sequences. BMC Evolutionary Biology 8:53.

Welch, J. J., and L. Bromham. 2005. Molecular dating when rates vary. Trends In Ecology & Evolution 20:320-327.

Werner, R. K., and K. Hoernle. 2003. New volcanological and volatile data provide strong support for the continuous existence of Galápagos Islands over the past 17 million years. International Journal of Earth Sciences 92:904-911.

Willett, C. S., and R. S. Burton. 2004. Evolution of interacting proteins in the mitochondrial electron transport system in a marine copepod. Molecular Biology And Evolution 21:443-453.

Wolfe, K. H., P. M. Sharp, and W.-H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.

Yang, Z. 1996. Maximum-likelihood models for combined analyses of multiple sequence data. Journal of Molecular Evolution 42:587-596.

Yang, Z. 2006. Computational Molecular Evolution. Oxford University Press, Oxford.

Zhang, D. X., and G. M. Hewitt. 2003. Nuclear DNA analyses in genetic studies of populations: practice, problems and prospects (vol 12, pg 563, 2003). Molecular Ecology 12:1687-1687.

Zink, R., and G. Barrowclough. 2008. Mitochondrial DNA under siege in avian phylogeography. Molecular Ecology 17:2107-2121.

Zwickl, D. J. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. Dissertation, The University of Texas, Austin.

Appendix 1: Supplementary Data

Appendix 1 Table 1. Collection localities and sample sizes for species used in the present study.

A. aspera
Locality                              GPS                            Number of Individuals
New Providence, Bahamas               N 25.00719 W 077.54846         10
New Providence, Bahamas               N 25.09971 W 077.30686         6
Glover's Atoll, Belize                N 16.761617 W 87.763683        12
Honduras                              N 15.95428 W 86.51771          7
Honduras                              N 15.97659 W 86.48747          9
Culebra, Puerto Rico                  N 18 19.441' W 065 19.943'     3
Paguera, Puerto Rico                  N 17 53.872' W 066 57.904'     8
St. Thomas                            N/A                            13
St. Maarten                           N 17.99082 W 63.05679          16
TOTAL                                                                84

A. spinosa
Locality                              GPS                            Number of Individuals
New Providence, Bahamas               N 25.09971 W 077.30686         3
New Providence, Bahamas               N 25.00719 W 077.54846         9
Glover's Atoll, Belize                N 16.761617 W 87.763683        13
Honduras                              N 15.97659 W 86.48747          8
Honduras                              N 15.95428 W 86.51771          7
Culebra, Puerto Rico                  N 18 19.441' W 065 19.943'     7
Paguera, Puerto Rico                  N 17 53.872' W 066 57.904'     8
St. Thomas                            N/A                            15
St. Maarten                           N 17.99082 W 63.05679          14
TOTAL                                                                84

A. betinensis
Locality                              GPS                            Number of Individuals
Bocas del Toro, Bahia Azul, Panama    N 9°07.9' W -81°50.9'          1

A. exilispinus
Locality                              GPS                            Number of Individuals
Isla Taboga, Panama                   N 8°46.9' W -79°33.0'          1

A. paula
Locality                              GPS                            Number of Individuals
Glover's Atoll, Belize                N 16.761617 W 87.763683        1
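The GPS column above mixes decimal degrees (for example "N 25.00719") with degrees and decimal minutes (for example "N 18 19.441'"). The following is a minimal sketch, not part of the original study, of how a single coordinate in either notation could be converted to signed decimal degrees. The function name is hypothetical, and the sketch assumes each coordinate carries a leading hemisphere letter and no minus sign.

```python
import re

# Pattern: hemisphere letter, degrees, and an optional minutes field
# terminated by a prime mark (several prime/quote glyphs are accepted).
_COORD = re.compile(
    r"\s*([NSEW])\s*(\d+(?:\.\d+)?)(?:[\s°]+(\d+(?:\.\d+)?)['’‘′])?\s*"
)

def to_decimal_degrees(text):
    """Convert one coordinate string to signed decimal degrees.

    Hypothetical helper: south and west are returned as negative values.
    """
    match = _COORD.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    hemisphere, degrees, minutes = match.groups()
    value = float(degrees)
    if minutes is not None:
        value += float(minutes) / 60.0  # minutes are sixtieths of a degree
    return -value if hemisphere in "SW" else value

if __name__ == "__main__":
    for coord in ("N 25.00719", "W 077.54846", "N 18 19.441'", "W 065 19.943'"):
        print(coord, "->", round(to_decimal_degrees(coord), 5))
```

Converting every locality to one notation before mapping or computing distances avoids mixing the two formats by accident.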

Appendix 1 Table 2. PCR conditions, primer sequences, and references for Chapter 2.

Marker   Primer Name       Primer sequence                 Reference
COI      FISHCOILBC        TCAACYAATCAYAAAGATATYGGCAC      Baldwin et al. 2008
COI      FISHCOIHBC        ACTTCYGGGTGRCCRAARAATCA         Baldwin et al. 2008
ATROP    ATROP-L           GAGTTGGATCGCGCTCAGGAGCG         Hickerson and Cunningham 2005
ATROP    ATROP-H           CGGTCAGCCTCCTCAGCAATGTGCTT      Hickerson and Cunningham 2005
RAG1     RAG1Of2           CTGAGCTGCAGTCAGTACCATAAGATGT    Taylor and Hellberg 2005
RAG1     RAG1R1.539.519    CAGGACAGTTCTGAGTTTGGC           This study

Baldwin, C., J. Mounts, D. Smith, and L. Weigt. 2008. Genetic identification and color descriptions of early life-history stages of Belizean Phaeoptyx and Astrapogon (Teleostei: Apogonidae) with Comments on identification of adult Phaeoptyx. Zootaxa 1:2009.

Hickerson, M. J., and C. W. Cunningham. 2005. Contrasting quaternary histories in an ecologically divergent sister pair of low-dispersing intertidal fish (Xiphister) revealed by multilocus DNA analysis. Evolution 59:344-360.

Taylor, M. S., and M. E. Hellberg. 2005. Marine radiations at small geographic scales: speciation in neotropical reef gobies (Elacatinus). Evolution 59:374-385.

PCR conditions

COI and RAG1: One cycle of 94° C for 2 min, 50° C for 90 s, and 72° C for 2 min, followed by 38 cycles of 94° C for 45 s, 50° C for 1 min, and 72° C for 90 s, and a final cycle of 94° C for 40 s, 50° C for 1 min, and 72° C for 10 min.

ATROP: One cycle of 94° C for 2 min, 62° C for 90 s, and 72° C for 2 min, followed by 38 cycles of 94° C for 45 s, 62° C for 1 min, and 72° C for 45 s, and a final cycle of 94° C for 45 s, 62° C for 1 min, and 72° C for 10 min.
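Several primer sequences in Appendix 1 Table 2 contain IUPAC ambiguity codes (Y and R in the two COI primers), so each such primer stands for a small pool of oligonucleotides. The sketch below is illustrative only and was not part of the original analyses. It counts and enumerates the sequences a degenerate primer represents, using the standard IUPAC nucleotide code; the function names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

if __name__ == "__main__":
    primers = {
        "FISHCOILBC": "TCAACYAATCAYAAAGATATYGGCAC",
        "FISHCOIHBC": "ACTTCYGGGTGRCCRAARAATCA",
        "ATROP-L": "GAGTTGGATCGCGCTCAGGAGCG",
    }
    for name, seq in primers.items():
        print(name, len(seq), "bases, degeneracy", degeneracy(seq))
```

A primer with no ambiguity codes, such as ATROP-L, has a degeneracy of 1.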

Appendix 1 Table 3. Models of sequence evolution used in the study

Pairwise Sequence Divergence* + GMRF Skyride Plots
*If the model listed was not available in MEGA then the next less complex model was used

COI           All        BA       SXM
A. aspera     TrN+G      TrN      HKY
A. spinosa    TrN+G      HKY      HKY

ATROP         All        BA       SXM
A. aspera     TVMef+G    GTR+G    TIM1+G
A. spinosa    TVMef+G    TrN+G    TrN+G

RAG1          All        BA       SXM
A. aspera     SYM+G      HKY+G    TVM+G
A. spinosa    TrNef+G    TVM+G    TVMef+G

RAG1 BE_HN Trim (only for spinosa, using pos. 1-192)    N/A    TrNef+G

BEAST Gene Trees
**Models partitioned by codon position

COI      A. aspera**     HKY, HKY, TrN93
COI      A. spinosa**    HKY+G, HKY+G, GTR
ATROP    A. aspera       GTR+G
ATROP    A. spinosa      GTR+G
RAG1     A. aspera       GTR+G
RAG1     A. spinosa      TrN93ef+G

BEAST Substitution Rate Estimates
COI      GTR+G
ATROP    HKY
RAG1     HKY

Appendix 1 Table 4. Estimates of gene tree root heights and current effective population sizes from skyride analyses

GMRF Estimates of root heights
Estimates are given as Median, Upper, and Lower values of the test

COI           All                       BA                        BE_HN
A. aspera     2.36E4, 3.5E4, 1.47E4     3.58E4, 6.3E4, 1.47E4     1.94E4, 3.72E4, 7.39E3
A. spinosa    7.46E5, 9.51E5, 5.58E5    3.2E4, 6.15E4, 1.34E4     XXX

              PR_STO                    SXM
A. aspera     XXX                       2.57E4, 1.14E4, 8E-1
A. spinosa    1.92E4, 3.68E4, 6.26E3    2.52E4, 5.01E4, 7.94E4

ATROP         All                       BA                        BE_HN
A. aspera     3.59E5, 5.12E5, 2.39E5    1.02E6, 1.68E6, 5.37E5    6.1E5, 9.66E5, 3.12E5
A. spinosa    4.8E5, 6.93E5, 3.14E5     5.05E5, 8.92E5, 2.12E5    1.06E5, 2.2E5, 28.1E4

              PR_STO                    SXM
A. aspera     6.68E5, 1.06E6, 3.51E5    3.66E5, 6.71E5, 1.39E5
A. spinosa    5.38E5, 8.82E5, 2.59E5    2.35E5, 5.04E5, 5.97E4

RAG1          All                       BA                        BE_HN
A. aspera     4.85E5, 7.02E5, 2.89E5    1.21E6, 2.16E6, 5E5       6.33E5, 1.05E6, 2.79E5
A. spinosa    1.5E6, 2.1E6, 9.82E5      1.74E6, 3E6, 7.99E5       8.71E5, 1.47E6, 4.4E5

              PR_STO                    SXM
A. aspera     5.73E5, 1.03E6, 2.32E5    4.82E5, 9.7E5, 1.31E5
A. spinosa    7.21E5, 1.22E6, 3.48E5    1.5E6, 2.6E6, 6.6E5

GMRF Estimates of current effective size
Estimates are given as Median, Upper, and Lower values of the test

COI           All                       BA                        BE_HN
A. aspera     4.63E5, 1.67E6, 1.58E5    1.39E5, 1.3E6, 2.85E4     1.14E5, 7.59E5, 2.52E4
A. spinosa    6.93E5, 1.84E6, 3.61E5    1.75E5, 2.52E6, 1.37E4    XXX

              PR_STO                    SXM
A. aspera     XXX                       6.81E3, 9.69E3, 1.85E2
A. spinosa    9.12E4, 5.48E5, 2.28E4    7.65E4, 8.67E5, 12.43E4

ATROP         All                       BA                        BE_HN
A. aspera     1.74E7, 5.21E7, 6.51E6    3.18E6, 1.33E7, 1.23E6    5.64E6, 2.08E7, 1.88E6
A. spinosa    2.01E7, 5.78E7, 7.78E6    4.06E6, 3.59E7, 6.93E5    1.81E6, 1.03E7, 3.8E5

              PR_STO                    SXM
A. aspera     5.29E6, 2.12E7, 1.66E6    2.59E6, 1.55E7, 5.96E5
A. spinosa    4.87E6, 1.73E7, 1.66E6    1.16E6, 7.74E6, 2.45E5

RAG1          All                       BA                        BE_HN
A. aspera     2.29E7, 7.51E7, 8.25E6    4.4E6, 1.91E7, 1.34E6     9.11E6, 4.33E7, 2.32E6
A. spinosa    5.53E7, 1.53E8, 2.29E7    7.14E6, 4.26E7, 2.03E6    1.1E7, 4.97E7, 3.09E6

              PR_STO                    SXM
A. aspera     5.51E6, 2.59E7, 1.47E6    4.67E6, 3.48E7, 8.66E5
A. spinosa    9.51E6, 3.8E7, 2.82E6     5.68E6, 2.61E7, 1.78E6
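Each cell in Appendix 1 Table 4 holds three numbers in the order median, upper, lower, written in E notation, with XXX marking datasets for which no estimate is reported. The sketch below is an illustration rather than part of the original analysis. It parses one cell and checks that the three values are ordered as lower <= median <= upper, which is one simple way to screen a transcribed table for slips. The function names are hypothetical.

```python
def parse_cell(cell):
    """Parse 'median, upper, lower' into floats; return None for 'XXX'."""
    text = cell.strip()
    if text.upper() == "XXX":
        return None  # no estimate reported for this dataset
    parts = [float(p) for p in text.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three values, got {len(parts)}: {cell!r}")
    median, upper, lower = parts
    return median, upper, lower

def is_ordered(cell):
    """True if lower <= median <= upper, or if the cell has no estimate."""
    parsed = parse_cell(cell)
    if parsed is None:
        return True
    median, upper, lower = parsed
    return lower <= median <= upper

if __name__ == "__main__":
    # Cells copied as printed in the table above.
    for cell in ("2.36E4, 3.5E4, 1.47E4", "2.57E4, 1.14E4, 8E-1", "XXX"):
        print(f"{cell:28s} ordered: {is_ordered(cell)}")
```

A cell that fails the check is only a candidate for review against the original analysis output; the check alone cannot say which of the three values is off.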

Appendix 1 Table 5. Collection localities for samples used in Chapter 3.

Individual    Species    Locality    Lat/Lon    SIO Voucher ID
[Table rows are illegible in the source text.]

Appendix 1 Table 6. PCR conditions, primer sequences, and references for Chapter 3.

Marker    Primer Name         Primer sequence                 Reference
COI       FISHCOILBC          TCAACYAATCAYAAAGATATYGGCAC      Baldwin et al. 2008
COI       FISHCOIHBC          ACTTCYGGGTGRCCRAARAATCA         Baldwin et al. 2008
ATROP     ATROP-L             GAGTTGGATCGCGCTCAGGAGCG         Hickerson and Cunningham 2005
ATROP     ATROP-H             CGGTCAGCCTCCTCAGCAATGTGCTT      Hickerson and Cunningham 2005
RAG1      RAG1Of2             CTGAGCTGCAGTCAGTACCATAAGATGT    Taylor and Hellberg 2005
RAG1      RAG1F.4.27          AGCTGTAGTCAGTAYCACAARATG        This study
RAG1      RAG1S2F             CCGAGAAGGCTGTACGTTTCTCTT        Taylor and Hellberg 2005
RAG1      RAG1S1R             CCTGCCAGCACAGAAACAGACATA        Taylor and Hellberg 2005
RAG1      RAG1R1.539.519      CAGGACAGTTCTGAGTTTGGC           This study
RAG1      RAG1F3.519.539      GCCAAACTCAGAACTGTCCTG           This study
RAG1      RAG1S2R             CATTACCGGCTTGAGCTTCATCCT        Taylor and Hellberg 2005
RAG1      RAG1F4.1129.1148    ATGAATGGGAACTTTGCCCG            This study
RAG1      RAG1S3F             GCTCATGAGGCTCTATATTCAGATG       Taylor and Hellberg 2005
RAG1      RAG1Or2             CTGAGTCCTTGTGAGCTTCCATRAAYTT    Taylor and Hellberg 2005
SH3PX3    SH3PX3_F461         GTATGGTSGGCAGGAACYTGAA          Li et al. 2007
SH3PX3    SH3PX3_R1303        CAAACAKCTCYCCGATGTTCTC          Li et al. 2007
TMO4C4    TMO-F2              GAKTGTTTGAAAATGACTCGCTA         Near et al. 2004
TMO4C4    TMO-R2              AAACATCYAAMGATATGATCATGC        Near et al. 2004
MC1R      MC1RFor             ATGGAAATGACCAACRGGTCCYTGC       This study
MC1R      MC1RRev             CARGGTTYTMCGCAGCTCCTGGC         This study
MC1R      MC1RF477            TCCAGCATCCTCTTCATCG             This study
MC1R      MC1RR243            AGCATACCTGGGTGAACGTC            This study
MC1R      MC1RR907            CGTAAATGAGCGGGTCGATGA           This study
MC1R      MC1RR649            TATGAAGGTAGAGCACCGC             This study

Baldwin, C., J. Mounts, D. Smith, and L. Weigt. 2008. Genetic identification and color descriptions of early life-history stages of Belizean Phaeoptyx and Astrapogon (Teleostei: Apogonidae) with Comments on identification of adult Phaeoptyx. Zootaxa 1:2009.

Hickerson, M. J., and C. W. Cunningham. 2005. Contrasting quaternary histories in an ecologically divergent sister pair of low-dispersing intertidal fish (Xiphister) revealed by multilocus DNA analysis. Evolution 59:344-360.

Taylor, M. S., and M. E. Hellberg. 2005. Marine radiations at small geographic scales: speciation in neotropical reef gobies (Elacatinus). Evolution 59:374-385.

Near, T. J., D. I. Bolnick, and P. C. Wainwright. 2004. Investigating phylogenetic relationships of sunfishes and black basses (Actinopterygii: Centrarchidae) using DNA sequences from mitochondrial and nuclear genes. Molecular Phylogenetics and Evolution 32:344-357.

PCR conditions

COI, RAG1, TMO4C4, SH3PX3, and MC1R: One cycle of 94° C for 2 min, 50° C for 90 s, and 72° C for 2 min, followed by 38 cycles of 94° C for 45 s, 50° C for 1 min, and 72° C for 90 s, and a final cycle of 94° C for 40 s, 50° C for 1 min, and 72° C for 10 min.

ATROP: One cycle of 94° C for 2 min, 62° C for 90 s, and 72° C for 2 min, followed by 38 cycles of 94° C for 45 s, 62° C for 1 min, and 72° C for 45 s, and a final cycle of 94° C for 45 s, 62° C for 1 min, and 72° C for 10 min.
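The two thermocycler programs above can be written down as data, which makes them easier to compare and to check. The following is a hedged sketch, not the authors' protocol file: each program is a list of (number of cycles, steps) blocks, each step is a (temperature in °C, hold time in seconds) pair, and the ATROP first-cycle annealing time is assumed to be 1 min 30 s. The dictionary and function names are hypothetical. Ramp times between temperatures are not included, so the totals are lower bounds on actual run time.

```python
# Each program: list of (cycles, [(temperature_C, hold_seconds), ...]) blocks.
PCR_PROGRAMS = {
    "COI, RAG1, TMO4C4, SH3PX3, MC1R": [
        (1, [(94, 120), (50, 90), (72, 120)]),   # initial cycle
        (38, [(94, 45), (50, 60), (72, 90)]),    # main cycling
        (1, [(94, 40), (50, 60), (72, 600)]),    # final cycle, 10 min extension
    ],
    "ATROP": [
        (1, [(94, 120), (62, 90), (72, 120)]),   # 62 C hold assumed to be 90 s
        (38, [(94, 45), (62, 60), (72, 45)]),
        (1, [(94, 45), (62, 60), (72, 600)]),
    ],
}

def total_hold_seconds(program):
    """Sum of programmed hold times, ignoring temperature ramping."""
    return sum(cycles * sum(sec for _, sec in steps) for cycles, steps in program)

def total_cycles(program):
    """Total number of denature/anneal/extend cycles in the program."""
    return sum(cycles for cycles, _ in program)

if __name__ == "__main__":
    for name, program in PCR_PROGRAMS.items():
        secs = total_hold_seconds(program)
        print(f"{name}: {total_cycles(program)} cycles, {secs / 60:.1f} min of holds")
```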

Appendix 1 Table 7. Models of sequence evolution used in Chapter 3

Models used for MrBayes
Gene/Character Set          AIC
FULL MATRIX - ALL           GTR+I+G
RAG1/ALL                    GTR+I+G
RAG1/1st+2nd Pos            HKY+I
RAG1/1st Pos                HKY+I
RAG1/2nd Pos                GTR+I+G
RAG1/3rd Pos                HKY+G
MC1R/ALL                    HKY+I+G
MC1R/1st+2nd Pos            HKY+I
MC1R/1st Pos                GTR+I
MC1R/2nd Pos                HKY
MC1R/3rd Pos                GTR+G
SH3PX3/ALL                  GTR+I+G
SH3PX3/1st+2nd Pos          HKY+I
SH3PX3/1st Pos              F81+I
SH3PX3/2nd Pos              HKY+I
SH3PX3/3rd Pos              GTR+G
TMO4C4/ALL                  HKY+I+G
TMO4C4/1st+2nd Pos          HKY+I
TMO4C4/1st Pos              F81+G
TMO4C4/2nd Pos              HKY+I
TMO4C4/3rd Pos              K80+G
A-TROP/ALL*                 HKY+G
A-TROP/Intron               GTR
A-TROP/Exon                 HKY+I
COI/ALL                     GTR+I+G
COI/1st+2nd Pos             GTR+I+G
COI/1st Pos                 GTR+G
COI/2nd Pos                 GTR+I
COI/3rd Pos                 GTR+I+G
NUC EXONS/1st Pos           HKY+I
NUC EXONS/2nd Pos           GTR+I+G
NUC EXONS/3rd Pos           GTR+G
NUC EXONS/1st+2nd Pos       GTR+I+G

Models used for BEAST
Gene/Partition              AIC
RAG1/1st Pos                TrN+I
RAG1/2nd Pos                TVM+I+G
RAG1/3rd Pos                TrN+G
MC1R/1st Pos                TrN+I
MC1R/2nd Pos                HKY
MC1R/3rd Pos                TVM+G
SH3PX3/1st Pos              TIM+I
SH3PX3/2nd Pos              TrN+I+G
SH3PX3/3rd Pos              TVM+G
TMO4C4/1st Pos              TrN+G
TMO4C4/2nd Pos              TrN+I
TMO4C4/3rd Pos              K80+G
COI/1st Pos                 GTR+G
COI/2nd Pos                 TVM+I
COI/3rd Pos                 TIM+I+G
A-TROP/Intron               GTR
A-TROP/Exon                 TrN+I+G

Models used for GARLI
Gene/Partition              BIC
RAG1/1st Pos                HKY+I
RAG1/2nd Pos                K80+I
RAG1/3rd Pos                HKY+G
MC1R/1st Pos                K80+I
MC1R/2nd Pos                F81
MC1R/3rd Pos                TVM+G
SH3PX3/1st Pos              JC
SH3PX3/2nd Pos              F81+I
SH3PX3/3rd Pos              TVM+G
TMO4C4/1st Pos              F81
TMO4C4/2nd Pos              F81+I
TMO4C4/3rd Pos              K80+G
COI/1st Pos                 GTR+G
COI/2nd Pos                 K81uf
COI/3rd Pos                 TrN+I+G
A-TROP/Intron               HKY+I
A-TROP/Exon                 F81
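Table 7 lists models chosen under two criteria, AIC for the MrBayes and BEAST analyses and BIC for GARLI, and the two criteria often favor different models for the same partition because BIC penalizes extra parameters more heavily. The sketch below illustrates this with the standard definitions, AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL. All log-likelihoods, parameter counts, and the alignment length are invented for illustration and are not values from this study; the function names are hypothetical.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood

def best_model(scores):
    """Name of the model with the lowest criterion score."""
    return min(scores, key=scores.get)

if __name__ == "__main__":
    # Hypothetical fits: model name -> (lnL, number of free parameters).
    fits = {"HKY+G": (-2510.4, 5), "GTR+I+G": (-2503.9, 10)}
    n_sites = 600  # hypothetical alignment length

    aic_scores = {m: aic(lnl, k) for m, (lnl, k) in fits.items()}
    bic_scores = {m: bic(lnl, k, n_sites) for m, (lnl, k) in fits.items()}
    print("AIC choice:", best_model(aic_scores))
    print("BIC choice:", best_model(bic_scores))
```

With these made-up numbers the richer model wins under AIC and the simpler one under BIC, which mirrors the pattern of simpler models in the GARLI column.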

Appendix 1 Figure 1. Mean values and standard deviations of L(K) for K = 1-6 from 10 STRUCTURE runs of the A. aspera eastern Caribbean dataset. L(K) is lowest at K = 1 and highest at K = 2.
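The summary plotted in this figure, a mean and standard deviation of L(K) across replicate runs at each K, can be sketched as follows. The L(K) values in the example are invented placeholders, not output from the STRUCTURE runs reported here, and the function name is hypothetical.

```python
from statistics import mean, stdev

def summarize_lk(runs_by_k):
    """Return {K: (mean L(K), sample standard deviation)} across runs."""
    return {k: (mean(values), stdev(values))
            for k, values in sorted(runs_by_k.items())}

if __name__ == "__main__":
    # Hypothetical L(K) values from three replicate runs per K.
    runs = {
        1: [-1520.3, -1519.8, -1520.1],
        2: [-1498.6, -1501.2, -1499.9],
        3: [-1510.4, -1507.7, -1515.0],
    }
    for k, (avg, sd) in summarize_lk(runs).items():
        print(f"K = {k}: mean L(K) = {avg:.1f}, SD = {sd:.2f}")
```

At least two runs per K are needed, because the sample standard deviation is undefined for a single value.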

[Figure: Bahamas. Panels for A. aspera and A. spinosa at COI, ATROP, and RAG1; x-axis: Time.]

Appendix 1 Figure 2. Bayesian skyride plots for the Bahamas.

[Figure: Belize/Honduras. Panels for A. aspera and A. spinosa at COI, ATROP, and RAG1; x-axis: Time.]

Appendix 1 Figure 3. Bayesian skyride plots for Belize and Honduras. The skyride analyses for the A. spinosa COI dataset did not converge and those results are not shown.

[Figure: Puerto Rico/St. Thomas. Panels for A. aspera and A. spinosa at COI, ATROP, and RAG1; x-axis: Time.]

Appendix 1 Figure 4. Bayesian skyride plots for Puerto Rico and St. Thomas. The A. aspera COI dataset did not contain any segregating sites, so the skyride analysis was not performed for that dataset.

[Figure: St. Maarten. Panels for A. aspera and A. spinosa at COI, ATROP, and RAG1; x-axis: Time.]

Appendix 1 Figure 5. Bayesian skyride plots for St. Maarten.

Appendix 2: Permission from Evolution

JOHN WILEY AND SONS LICENSE TERMS AND CONDITIONS

Oct 25, 2010

This is a License Agreement between Ron I Eytan ("You") and John Wiley and Sons ("John Wiley and Sons") provided by Copyright Clearance Center ("CCC"). The license consists of your order details, the terms and conditions provided by John Wiley and Sons, and the payment terms and conditions.

All payments must be made in full to CCC. For payment instructions, please see information listed at the bottom of this form.

License Number: 2535991290655
License date: Oct 25, 2010
Licensed content publisher: John Wiley and Sons
Licensed content publication: Evolution
Licensed content title: NUCLEAR AND MITOCHONDRIAL SEQUENCE DATA REVEAL AND CONCEAL DIFFERENT DEMOGRAPHIC HISTORIES AND POPULATION GENETIC PROCESSES IN CARIBBEAN REEF FISHES
Licensed content author: Ron I. Eytan, Michael E. Hellberg
Licensed content date: Jul 1, 2010
Start page: no
End page: no
Type of use: Dissertation/Thesis
Requestor type: Author of this Wiley article
Format: Print and electronic
Portion: Full article
Will you be translating?: No
Order reference number:
Total: 0.00 USD

TERMS AND CONDITIONS

This copyrighted material is owned by or exclusively licensed to John Wiley & Sons, Inc. or one of its group companies (each a “Wiley Company”) or a society for whom a Wiley Company has exclusive publishing rights in relation to a particular journal (collectively “WILEY”). By clicking “accept” in connection with completing this licensing transaction, you agree that the following terms and conditions apply to this transaction (along with the billing and payment terms and conditions established by the Copyright Clearance Center Inc., (“CCC’s Billing and Payment terms and conditions”), at the time that you opened your Rightslink account (these are available at any time at http://myaccount.copyright.com).

Terms and Conditions

1. The materials you have requested permission to reproduce (the "Materials") are protected by copyright.

2. You are hereby granted a personal, non-exclusive, non-sublicensable, non-transferable, worldwide, limited license to reproduce the Materials for the purpose specified in the licensing process. This license is for a one-time use only with a maximum distribution equal to the number that you identified in the licensing process. Any form of republication granted by this licence must be completed within two years of the date of the grant of this licence (although copies prepared before may be distributed thereafter). Any electronic posting of the Materials is limited to one year from the date permission is granted and is on the condition that a link is placed to the journal homepage on Wiley’s online journals publication platform at www.interscience.wiley.com. The Materials shall not be used in any other manner or for any other purpose. Permission is granted subject to an appropriate acknowledgement given to the author, title of the material/book/journal and the publisher and on the understanding that nowhere in the text is a previously published source acknowledged for all or part of this Material. Any third party material is expressly excluded from this permission.

3. With respect to the Materials, all rights are reserved. No part of the Materials may be copied, modified, adapted, translated, reproduced, transferred or distributed, in any form or by any means, and no derivative works may be made based on the Materials without the prior permission of the respective copyright owner. You may not alter, remove or suppress in any manner any copyright, trademark or other notices displayed by the Materials. You may not license, rent, sell, loan, lease, pledge, offer as security, transfer or assign the Materials, or any of the rights granted to you hereunder to any other person.

4. The Materials and all of the intellectual property rights therein shall at all times remain the exclusive property of John Wiley & Sons Inc or one of its related companies (WILEY) or their respective licensors, and your interest therein is only that of having possession of and the right to reproduce the Materials pursuant to Section 2 herein during the continuance of this Agreement. You agree that you own no right, title or interest in or to the Materials or any of the intellectual property rights therein. You shall have no rights hereunder other than the license as provided for above in Section 2. No right, license or interest to any trademark, trade name, service mark or other branding ("Marks") of WILEY or its licensors is granted hereunder, and you agree that you shall not assert any such right, license or interest with respect thereto.

5. WILEY DOES NOT MAKE ANY WARRANTY OR REPRESENTATION OF ANY KIND TO YOU OR ANY THIRD PARTY, EXPRESS, IMPLIED OR STATUTORY, WITH RESPECT TO THE MATERIALS OR THE ACCURACY OF ANY INFORMATION CONTAINED IN THE MATERIALS, INCLUDING, WITHOUT LIMITATION, ANY IMPLIED WARRANTY OF MERCHANTABILITY, ACCURACY, SATISFACTORY QUALITY, FITNESS FOR A PARTICULAR PURPOSE, USABILITY, INTEGRATION OR NON-INFRINGEMENT AND ALL SUCH WARRANTIES ARE HEREBY EXCLUDED BY WILEY AND WAIVED BY YOU.

6. WILEY shall have the right to terminate this Agreement immediately upon breach of this Agreement by you.

7. You shall indemnify, defend and hold harmless WILEY, its directors, officers, agents and employees, from and against any actual or threatened claims, demands, causes of action or proceedings arising from any breach of this Agreement by you.

8. IN NO EVENT SHALL WILEY BE LIABLE TO YOU OR ANY OTHER PARTY OR ANY OTHER PERSON OR ENTITY FOR ANY SPECIAL, CONSEQUENTIAL, INCIDENTAL, INDIRECT, EXEMPLARY OR PUNITIVE DAMAGES, HOWEVER CAUSED, ARISING OUT OF OR IN CONNECTION WITH THE DOWNLOADING, PROVISIONING, VIEWING OR USE OF THE MATERIALS REGARDLESS OF THE FORM OF ACTION, WHETHER FOR BREACH OF CONTRACT, BREACH OF WARRANTY, TORT, NEGLIGENCE, INFRINGEMENT OR OTHERWISE (INCLUDING, WITHOUT LIMITATION, DAMAGES BASED ON LOSS OF PROFITS, DATA, FILES, USE, BUSINESS OPPORTUNITY OR CLAIMS OF THIRD PARTIES), AND WHETHER OR NOT THE PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THIS LIMITATION SHALL APPLY NOTWITHSTANDING ANY FAILURE OF ESSENTIAL PURPOSE OF ANY LIMITED REMEDY PROVIDED HEREIN.

9. Should any provision of this Agreement be held by a court of competent jurisdiction to be illegal, invalid, or unenforceable, that provision shall be deemed amended to achieve as nearly as possible the same economic effect as the original provision, and the legality, validity and enforceability of the remaining provisions of this Agreement shall not be affected or impaired thereby.

10. The failure of either party to enforce any term or condition of this Agreement shall not constitute a waiver of either party's right to enforce each and every term and condition of this Agreement. No breach under this agreement shall be deemed waived or excused by either party unless such waiver or consent is in writing signed by the party granting such waiver or consent. The waiver by or consent of a party to a breach of any provision of this Agreement shall not operate or be construed as a waiver of or consent to any other or subsequent breach by such other party.

11. This Agreement may not be assigned (including by operation of law or otherwise) by you without WILEY's prior written consent.

12. These terms and conditions together with CCC’s Billing and Payment terms and conditions (which are incorporated herein) form the entire agreement between you and WILEY concerning this licensing transaction and (in the absence of fraud) supersedes all prior agreements and representations of the parties, oral or written. This Agreement may not be amended except in a writing signed by both parties. This Agreement shall be binding upon and inure to the benefit of the parties' successors, legal representatives, and authorized assigns.

13. In the event of any conflict between your obligations established by these terms and conditions and those established by CCC’s Billing and Payment terms and conditions, these terms and conditions shall prevail.

14. WILEY expressly reserves all rights not specifically granted in the combination of (i) the license details provided by you and accepted in the course of this licensing transaction, (ii) these terms and conditions and (iii) CCC’s Billing and Payment terms and conditions.

15. This Agreement shall be governed by and construed in accordance with the laws of England and you agree to submit to the exclusive jurisdiction of the English courts.

16. Other Terms and Conditions:

BY CLICKING ON THE "I ACCEPT" BUTTON, YOU ACKNOWLEDGE THAT YOU HAVE READ AND FULLY UNDERSTAND EACH OF THE SECTIONS OF AND PROVISIONS SET FORTH IN THIS AGREEMENT AND THAT YOU ARE IN AGREEMENT WITH AND ARE WILLING TO ACCEPT ALL OF YOUR OBLIGATIONS AS SET FORTH IN THIS AGREEMENT.

V1.2

Gratis licenses (referencing $0 in the Total field) are free. Please retain this printable license for your reference. No payment is required.

If you would like to pay for this license now, please remit this license along with your payment made payable to "COPYRIGHT CLEARANCE CENTER" otherwise you will be invoiced within 48 hours of the license date. Payment should be in the form of a check or money order referencing your account number and this invoice number RLNK10871735.

Once you receive your invoice for this order, you may pay your invoice by credit card. Please follow instructions provided at that time.

Make Payment To: Copyright Clearance Center Dept 001 P.O. Box 843006 Boston, MA 02284-3006

If you find copyrighted material related to this license will not be used and wish to cancel, please contact us referencing this license number 2535991290655 and noting the reason for cancellation.

Questions? [email protected] or +1-877-622-5543 (toll free in the US) or +1-978-646-2777.

Vita

Ron Israel Eytan was born in 1976 in Minneapolis, Minnesota, to Alice and Serge Eytan. Six weeks later, he moved to Chicago. Growing up in the city, his primary introduction to the natural world was through public television and weekend trips to Chicago area museums and zoos with his parents. At an early age, he decided that he wanted to spend as much time as possible working with animals, preferably marine ones. At 15, Ron began spending his summers in California, SCUBA diving and learning about marine ecosystems. He attended the University of Miami and received his Bachelor of Science degree in Biology and Marine Science in 1999. During his college years, Ron spent a year in Australia, at James Cook University, as part of a study abroad program. Once there, he took field classes on the Great Barrier Reef and was introduced to coral reef ecosystems. This experience was a formative one, and would later drive his research interests. Upon graduating from the University of Miami, Ron moved home to Chicago to be close to his family. He began working in biomedical research labs, picking up valuable training in molecular methods along the way. In late 2001, he ventured out to San Diego, where he pursued post-graduate studies at Scripps Institution of Oceanography. In 2003, Ron began his doctoral work with Dr. Michael Hellberg at Louisiana State University, studying the population genetics and systematics of Caribbean coral reef fishes.
