Multilocus assessment of population differentiation in Baja California : implications for community assembly and conservation

A DISSERTATION SUBMITTED TO THE FACULTY OF UNIVERSITY OF MINNESOTA BY

Hernan Vazquez Miranda

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

F. Keith Barker and Robert M. Zink

August 2014

© Hernan Vazquez Miranda 2014

Acknowledgements

I would like to thank the many people who made writing this dissertation possible. First and foremost I thank my advisers Keith Barker and Bob Zink. They mentored me in every way, from finding and developing my own research interests to formulating innovative ideas, writing effectively and concisely, and learning how to obtain my own funding. They are the embodiment of human and academic kindness. My committee members, Ken Kozak and Peter Tiffin, helped me develop my ideas and expand my perspectives; they often pushed me to think “outside the box”. I thank members of the Systematics Discussion Group for debates on principles, methods, and insights that improved my understanding of how species originate and diversify. In particular I would like to thank my friends Ben Lowe, Matt Dufort, Tom Giarla, Liz Wallace, Lorissa Fujishin, Domink Halas, Chih-Ming Hung, Tricia Markle, Juan Diaz, and Tony Gamble for their honest and constant advice on my progress and new ideas.

Numerous people and institutions provided me with tissue loans. I am grateful to the curators of the Bell Museum of Natural History-University of Minnesota (BMNH), A.

Navarro-Sigüenza from the Museo de Zoología “Alfonso L. Herrera”-Universidad

Nacional Autónoma de México (MZFC), J. Klicka and S. Birks from the Burke Museum of Natural History-University of Washington (UWBM), K. Burns from San Diego State

University (SDSU), D. Dittmann and R. Brumfield from the Louisiana State University

Museum of Natural Science (LSUMNS), N. Rice from the Academy of Natural Sciences of Philadelphia (ANSP), M. Robbins from University of Kansas (KU), P. Sweet from the

American Museum of Natural History (AMNH), K. Rabenold from Purdue University, J.

Hinshaw from the University of Michigan Museum of Zoology (UMMZ), and B. Marks and D. Willard from the Field Museum of Natural History (FMNH). My collaborators in

Mexico, Adolfo Navarro, Vicente Contreras, Roberto Sosa, and Octavio Rojas helped me immensely to plan and conduct my fieldwork successfully. I would also like to acknowledge Mexican authorities for making my fieldwork a smooth process. Jaime

Castro and Erick Rodriguez, Avery Michenzi, Carolyn Disse, Jordan Herman, and Leah

VandenBosch assisted me in multiple lab and field procedures.

I received funding from multiple institutions to pay for research expenses. The University of Minnesota supported my research in multiple capacities: several Dayton-Wilkie Awards from the Bell Museum of Natural History, an International Graduate Student Grant from the Graduate School, two Huempfner Fellowships, and one Anderson Fellowship. I am also grateful to the Chapman Fund of the American Museum of Natural History and a Graduate Student Grant from the American Ornithologists’ Union for funding the initial stages. The Consejo Nacional de Ciencia y Tecnología of Mexico (CONACyT) gave me a scholarship for part of my stipend for four years. Several analyses were executed at the Minnesota Supercomputing Institute and on the CIPRES portal (Miller et al. 2010); I thank Matthew Miller for his computer assistance. I also thank Lynx Edicions for allowing me to reproduce HBW bird illustrations.

Finally I would like to thank a number of friends beyond count in Minnesota and

Mexico that made my time away from home enjoyable, allowing me to finish this process. My family, and in particular my mother, gave me every kind of love and support to follow my dreams and to learn how to fly.

Dedication

I dedicate this dissertation to my mother Susana Miranda, my friends everywhere, the

Land of 10,000 Lakes, and Baja California; the origin and the destination. And of course to cohorts of incredible ducklings, cubs, and kits that taught me the most important lesson in life: It’s Duck Duck Gray Duck and none of that goose nonsense.

Abstract

This dissertation is an assessment of biological diversification at the community, species, and population levels, from large continental scales in the Americas to small regions between Mexico and the U.S., using birds as a study system. In Chapter 1, I estimate when the avian community of the Baja California peninsula diverged from the mainland using mitochondrial and nuclear DNA sequences. I discovered that even though birds fly and could have arrived at the peninsula in independent dispersal waves, genetic estimates point to a few diversification events that correspond to historical barriers to gene flow and more recent ecological scenarios. Additionally, I find evidence for recognizing four peninsular lineages as valid species, doubling the number of endemic birds in Baja California. Chapter 2 is a collaboration with Keith Barker. In it, we explore the continental diversification of wrens in the genus Campylorhynchus, resolving all evolutionary relationships by sequencing 23 genes and multiple individuals per lineage, developing a new metric for comparing all sorts of phylogenetic trees, and clarifying aspects of biogeographic and behavioral evolution in the Neotropics. In Chapter 3, my collaborators Kelly Barr, Craig Farquhar, and Robert Zink and I merge historic fire ecology and population genetics to understand how and when the black-capped vireo (Vireo atricapilla) went from being a historically common bird to being considered endangered in its breeding grounds in the oak savannas of Oklahoma, Texas, and northern Mexico. Five online supplementary files (OSFs) accompany this dissertation: the first file includes voucher numbers, geographic information, substitution models, primers used, and full likelihood values in Chapter 1 (OSF 1); the second file contains Bayesian trees and taxon pair distributions in Chapter 1 (OSF 2); the third file includes voucher numbers, evolutionary models, recombination tests, and primers used in Chapter 2 (OSF 3); the fourth file includes the randomization design in Chapter 2 (OSF 4); and the fifth file includes geographic information for all samples, primers, and the multilocus phylogeny used in Chapter 3 (OSF 5).

Table of Contents

List of Tables
List of Figures

Chapter 1: Congruence despite biogeographic complexity: multilocus assessment of divergence in Baja California birds
  Introduction
  Materials and Methods
  Results
  Discussion

Chapter 2: The phylogenetic power of sex at the species level resolves evolutionary relationships among Neotropical wrens
  Introduction
  Materials and Methods
  Results
  Discussion

Chapter 3: Fluctuating fire regimes and their historical effects on genetic variation in an endangered savanna specialist
  Introduction
  Materials and Methods
  Results
  Discussion

Bibliography

Appendices
  Appendix 1: Time calibrated species tree phylogeny of Campylorhynchus

List of Tables

Table 1. Taxon pairs of Baja California birds

Table 2. Summary statistics per locus in the black-capped vireo

Table 3. Summary statistics per black-capped vireo locality

List of Figures

Figure 1.1. Map of Baja California

Figure 1.2. Splitting times (in millions of years) for all taxon pairs using BEAST

Figure 1.3. Prior testing and posterior probabilities for number of splitting events

Figure 1.4. Coalescent-based time estimates for splits in co-distributed taxon pairs

Figure 2.1. Trees from analyses of the five locus dataset

Figure 2.2. Trees from analyses of the 23 locus dataset

Figure 2.3. Cloudograms of species trees

Figure 2.4. Locus clade support variability and distribution of the HSN index

Figure 2.5. Locus rarefaction curves from randomizations

Figure 3.1. Breeding sampling localities and Bayesian genotype cluster assignment

Figure 3.2. Likelihood values of inferred number of genetic clusters

Figure 3.3. Isolation by distance pattern for nine nuclear loci

Figure 3.4. Extended Bayesian Skyline Plots and niche models

CHAPTER 1. Congruence despite biogeographic complexity: multilocus assessment of divergence in Baja California birds.

INTRODUCTION

The species composition of biological communities is the result of ecological and evolutionary processes that determine how species arise, co-exist, and interact. In The Origin of Species, Darwin (1859) elucidated how organisms adapt to their environment, but despite the title of his book, he did not clarify how species originate as independent biological entities. Since The Origin, much evolutionary research has centered on adaptation in individual species, but until recently has rarely addressed phenomena involving community-wide assemblages (Webb et al., 2002; Cavender-Bares et al., 2009). This gap is unfortunate, because studies of community-level differentiation offer insights into the process of speciation that are not possible in studies of single species, most notably the importance of regional environmental and geological change as causes of divergence. In particular, the timescale of community assembly largely determines what type of processes dictated how taxa came together (Cavender-Bares et al., 2009). Thus, determining when the species in a community arose in a historical context is fundamental to identifying the processes behind its assembly.

Comparative phylogeography is one statistical framework for the study of community diversification (Bermingham & Moritz, 1998; Beaumont et al., 2010). Demographic models employing the coalescent, which account for how alleles from multiple genes sort in space and time, provide estimates of population sizes, migration, and time of divergence that allow for distinguishing between alternative ecological and evolutionary scenarios (Knowles, 2009). Of those population parameters, one of the most important for statistical hypothesis testing in phylogeography is the temporal framework of divergence (Hickerson et al., 2010). Further, accounting for multiple sources of variation is critical to distinguish between historical scenarios. For example, a particular taxon or locus can harbor little or much variation due to stochastic processes in allele sorting, or possibly selection, during a speciation event, leading to the inference of multiple diversification events when in fact there was a single historical episode of divergence (Avise, 1992; Knowles, 2009). Thus, testing diversification hypotheses across multiple members of a community and across multiple loci is the most rigorous method to discern the number and timing of radiation pulses (Hickerson et al., 2006; Hickerson et al., 2010).

Islands are nature’s biodiversity laboratories. While most studies of biological radiation have centered on islands (Losos & Ricklefs, 2009), little is known about a geographic intermediate between islands and continents: peninsulas. Peninsulas have appealing attributes for studying community diversification; they harbor more species than islands but are not as complex as the mainland. One ideal study area is Baja

California (Fig. 1.1), the world’s second longest peninsula (ca. 1600 km) after the

Antarctic Peninsula (Rojas-Soto et al., 2003). Aided by its geologic history and isolation,

Baja California is biodiverse and a significant region of endemism (Grismer, 2000).

Extensive mountain chains run along its length, which provide a variety of elevations and ecoregions. Arid vegetation is common throughout the lowlands, but radically distinct variants, mainly coniferous and deciduous dry forests, exist at elevations over 1500 m (Wiggins, 1980). Tectonism and orogenesis have shaped the peninsula (Lindell et al., 2006) and its geologic origins trace back to the formation of the Gulf of California approximately 12 million years ago (mya; Stock & Hodges, 1989). Baja California was once part of western Mexico and it was shifted northward by action of the San Andreas fault from the North American plate to the Pacific plate in a slow tectonic process that culminated between 4.6 mya (Holt et al., 2000) and 5 mya (Grismer, 2000). In addition to these geologic events, Quaternary glacial cycles have also been proposed to affect species differentiation and environmental transformation (Grismer, 2002). Additionally, Baja

California has been a real island on two occasions. The first was when the Isthmus of La

Paz and Northern Gulf opened ca. 3 mya (Upton & Murphy, 1997; Riddle et al., 2000).

The second occasion was when a putative seaway joined the Gulf of California and the

Pacific Ocean ca. 1 mya in what is today the Vizcaíno Desert (Upton & Murphy, 1997).

The second event has been invoked to explain a consistent phylogeographic pattern in reptiles, mammals, birds, spiders, and plants (Upton & Murphy, 1997; Zink et al., 1997;

Grismer, 2000; Riddle et al., 2000; Rodríguez-Robles & De Jesus-Escobar, 2000; Zink et al., 2001; Lindell et al., 2005; Crews & Hedin, 2006; Trujano-Alvarez & Álvarez-

Castañeda, 2007; Garrick et al., 2009). The congruence of phylogenetic patterns in different taxa matching the seaway hypothesis implied that a single vicariant event was the principal evolutionary mechanism in this region (Riddle et al., 2000; Lindell et al.,

2006).

[Figure 1.1 (map). Labeled vicariant events: Gulf of California ~5 mya, Northern Gulf ~3 mya, Isthmus of La Paz ~3 mya, Midpeninsular Seaway ~1 mya. Taxon pairs, listed as Peninsular/South then Continental/North: Polioptila californica and Polioptila melanura; Auriparus flaviceps and Auriparus flaviceps; Campylorhynchus brunneicapillus and Campylorhynchus brunneicapillus; Toxostoma lecontei and Toxostoma lecontei; Callipepla californica and Callipepla gambelii; Melozone crissalis and Melozone aberti; Toxostoma cinereum and Toxostoma bendirei; Melanerpes uropygialis and Melanerpes uropygialis.]

Figure 1.1. Map of Baja California. Gray scale gradient represents elevation in meters above sea level. Thick dotted lines represent historical vicariant events (Riddle et al., 2000). Genera are as follows: Polioptila are gnatcatchers, Auriparus are verdins, Campylorhynchus are wrens, Toxostoma are thrashers, Callipepla are quails, Melozone are towhees, and Melanerpes are woodpeckers. This list order matches Table 1 and Figure 1.2.

Despite these results, the single-event hypothesis was first called into question by Grismer (2002), who suggested that some splits probably corresponded to climatic fluctuation during the Quaternary. More recently, a reanalysis accounting for across-taxon stochasticity in mitochondrial DNA (mtDNA) of a subset of those taxa (reptiles and mammals) revealed two splitting events instead of one (Leaché et al., 2007), arguing that the hypothesized seaway was an oversimplification of the complex biogeographic history of Baja California and that both geology and environmental change played roles in speciation (Leaché et al., 2007; Garrick et al., 2009). Unfortunately, the two pulses detected for reptiles and mammals in Baja California were not tested with multiple loci, and the priors for time were broader than statistically necessary (Hickerson et al., 2014). Furthermore, more vagile organisms in the peninsula have not been tested against the single-event hypothesis using coalescent algorithms, and the inference of multiple events could be biased by taxa with limited dispersal and by a lack of control for stochastic variance among loci.

North American desert communities are not as species rich as the tropics.

However, what they lack in species numbers they make up for in levels of endemicity.

Between one third and one fourth of Baja California insects and vascular plants, respectively, are endemic to the peninsula (Wiggins, 1980; Ayala et al., 1993).

Conversely, out of the 122 species of resident land birds, only three are officially considered endemic (Escalante et al., 1993; AOU, 1998). The reasons behind this pattern of low endemicity in birds are unknown, but may be related to dispersal potential in this taxon. However, nearly all year-round resident birds in Baja California are morphologically variable and have described subspecies north and south of the Vizcaíno

Desert (Grinnell, 1928). Thus, the low level of bird endemicity could be real, or due to incorrect ranking of recognized variation, or due to cryptic speciation as seen in other taxa (Grismer, 2000). For instance, Zink et al. (1997; 2001) found evidence of species- level differentiation in mitochondrial DNA (mtDNA) for at least three species in Baja

California that correspond to a morphological subspecies distinction (Grinnell, 1928), but which were all considered part of the same biological species (AOU 1998). In this study I employ a multi-taxon-multi-locus approach to estimating divergences within the community of birds in Baja California, to account for the stochastic variance across different species and different loci (Hickerson et al. 2006, 2007). This system allows for testing the time of community divergences for a taxon that could have arrived at multiple time intervals through flight and diversified over multiple vicariant events (Fig. 1.1;

Leaché et al., 2007; Garrick et al., 2009) or over climate change episodes during the end of the Pleistocene (Grismer, 2002). While the proximate goal of statistical phylogeography is to discern between historical events (Knowles, 2009), this research also has implications for conservation in helping to establish species limits and biogeographic units for conservation (Moritz, 2002). Thus, this study unites genetic study of community diversification with biological conservation.

MATERIALS AND METHODS

Taxon sampling. I sampled eight co-distributed avian taxon pairs across Baja

California and the mainland (Table 1, Fig. 1.1). Four of these taxon pairs are currently recognized as a single species with different subspecies in the peninsula and the adjacent continent (Grinnell, 1928; AOU, 1998). The other four correspond to peninsular species that have their sister taxon on the northern portion of Baja California and/or the adjacent mainland. I selected these taxon pairs because there is evidence of their mtDNA reciprocal monophyly in the region (Zink & Dittmann, 1991; Zink et al., 1997; Zink &

Blackwell, 1998a, b; Rojas-Soto et al., 2010; Lovette et al., 2012), and because they exhibit divergent phenotypes that match a common pattern of geographic variation

(Grinnell, 1928). Samples from these taxa come from my own collecting in Mexico and tissue loans from several museums (OSF 1, see Abstract).

Laboratory procedures and targeted loci. DNA was extracted from tissues using either a conventional phenol/chloroform protocol or the DNeasy Kit (Qiagen Inc., Valencia, CA) following the manufacturer’s instructions. I amplified and sequenced the mitochondrial NADH dehydrogenase subunit 2 (ND2) locus, as well as six nuclear loci. These nuclear loci include one sex-linked marker (ninth intron of aconitase 1, ACO1, chromosome Z) and five autosomal markers (Table 1). The autosomal loci were the 14th intron of ATP-citrate synthase (ACL), the eighth intron of α enolase/τ-crystallin (AETC), the 11th intron of glyceraldehyde-3-phosphate dehydrogenase (GAPDH), the fourth intron of myelin proteolipid protein (MPP), and the fifth intron of transforming growth factor β2 (TGFB2). For woodpeckers a substitute autosomal locus was sequenced, the seventh intron of β-fibrinogen (FIB7). Polymerase chain reaction (PCR) of each locus was carried out in 12.5 µl reactions containing 4.25 µl of water, 6.25 µl of GoTaq Green Mastermix (Promega, Madison, WI), 0.5 µl of each primer at a concentration of 10 nM, and 1 µl of template DNA. All loci were amplified with the following touchdown thermocycler conditions: an initial denaturation step of 95 °C for 3 min, followed by 5 cycles of 95 °C for 0.5 min, 58 °C for 0.5 min, and 72 °C for 1 min, repeating these steps while lowering the annealing temperature by 2 °C until it reached 52 °C for 20 cycles, and a final extension step of 72 °C for 6 min. PCR products were purified using a 3.4 µl volume of the Exonuclease I – Shrimp Alkaline Phosphatase enzyme cocktail (ExoSAP, USB Corporation, Cleveland, OH) following the manufacturer’s instructions. Both forward and reverse strands were sequenced using BigDye v3.1 terminators (Applied Biosystems, Foster City, CA) following the manufacturer’s recommendations, with electrophoresis on an ABI 3730xl automated Sanger sequencer. Primer sequences are in OSF 1.
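As a quick check on the PCR recipe above, the following minimal Python sketch sums the listed reagent volumes and lays out one possible reading of the touchdown schedule. The function names are hypothetical, and the assumption that each annealing temperature above 52 °C runs for 5 cycles is an interpretation of the text, not a protocol detail confirmed by the source.

```python
# Reagent volumes (in microliters) as listed in the text; the two primers
# are entered separately at 0.5 µl each.
REACTION_UL = {
    "water": 4.25,
    "GoTaq Green Mastermix": 6.25,
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "template DNA": 1.0,
}


def total_volume(mix):
    """Return the total reaction volume in microliters."""
    return sum(mix.values())


def touchdown_schedule(start_c=58, end_c=52, step_c=2,
                       cycles_per_step=5, final_cycles=20):
    """Return (annealing temperature in °C, number of cycles) pairs.

    Assumption: every annealing temperature above end_c is run for
    cycles_per_step cycles before dropping by step_c; the final
    temperature is held for final_cycles cycles.
    """
    schedule = []
    temp = start_c
    while temp > end_c:
        schedule.append((temp, cycles_per_step))
        temp -= step_c
    schedule.append((end_c, final_cycles))
    return schedule


if __name__ == "__main__":
    print("Total volume (µl):", total_volume(REACTION_UL))
    for temp, cycles in touchdown_schedule():
        print(f"anneal at {temp} °C for {cycles} cycles")
```

Under this reading the listed components add up to the stated 12.5 µl, and the program runs 35 amplification cycles in total.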

Sequence editing, allele phasing, and recombination. DNA sequences (.ab1 files) were assembled in SEQUENCHER 4.7 (Genecodes Inc., Ann Arbor, MI) and were verified to be of avian origin and from the correct locus by comparing them to reference sequences in the NCBI GenBank database using BLAST searches. For ND2, I translated each edited sample with the online amino acid translation tool from the European Bioinformatics Institute (EBI, see References) so that stop codons could be detected at early sequence contig stages. I aligned sequences for each locus using default parameters in CLUSTALX 2 (Larkin et al., 2007). For heterozygous positions in nuclear loci I separated alleles by in silico phasing in the program PHASE 2.1 (Stephens et al., 2001), converting the original aligned FASTA files from CLUSTALX with the program SEQPHASE (Flot, 2010) and running 100,000 generations with a burn-in of 10,000.

Allele phases were considered resolved with a probability threshold of 0.9. Alleles with lower probabilities were left with a polymorphic position and these sites were excluded from further analyses. I tested for recombination in our loci employing the Phi method

(Bruen et al., 2006) in the program SPLITS TREE 4.3 (Huson & Bryant, 2006) and for a battery of six additional recombination tests in the program RDP 4 (Martin et al., 2010).
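The phase-probability rule described above can be illustrated with a short Python sketch. It does not read PHASE output; the input dictionary, the function name, and the choice to treat a probability exactly equal to 0.9 as resolved are all assumptions made for illustration.

```python
PHASE_THRESHOLD = 0.9  # probability threshold stated in the text


def split_sites(phase_probs, threshold=PHASE_THRESHOLD):
    """Separate heterozygous sites into resolved and excluded positions.

    phase_probs maps an alignment position to the phase probability of
    that site (hypothetical input format). Sites at or above the
    threshold are treated as resolved; the rest are excluded from
    further analyses.
    """
    resolved = sorted(p for p, prob in phase_probs.items() if prob >= threshold)
    excluded = sorted(p for p, prob in phase_probs.items() if prob < threshold)
    return resolved, excluded


if __name__ == "__main__":
    # Made-up probabilities for three heterozygous positions.
    example = {10: 0.99, 57: 0.62, 203: 0.90}
    print(split_sites(example))
```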

Phylogenetic reconstruction and mitochondrial gene tree calibration. I constructed trees for the mitochondrial locus for each taxon pair in the program BEAST 1.8 (Drummond & Rambaut, 2007; Drummond et al., 2012) in order to confirm reciprocal monophyly between them and to produce an estimate of divergence times. I inferred tree topologies and time estimates using a Markov chain Monte Carlo (MCMC) run of 20 × 10^6 generations, sampling every 2000 steps, with the best-fitting model calculated in jMODELTEST 2.14 (Darriba et al., 2012; Table 1; OSF 1). The tree priors in each case followed a Yule speciation model with a starting unweighted pair-group method using averages (UPGMA) tree. I performed two iterations for each taxon pair with two different clock models: a strict clock (Drummond et al., 2002) and a lognormal relaxed clock (Drummond et al., 2006). For the strict clock calibrations I used a rate of 1.3% substitutions per lineage per million years for avian ND2, which corresponds to a 2.6% divergence rate (Arbogast et al., 2006). For the lognormal relaxed clock calibration I added a 0.45 standard deviation to the previous ND2 substitution rate (Smith & Klicka, 2010). As a proxy for divergence time I recorded the median estimate of the tree root and its 95% highest posterior density (HPD). I checked for convergence of parameters in TRACER 1.5 (Rambaut & Drummond, 2007) by sampling 9,000 trees after a 10% burn-in phase, and by assessing the effective sample sizes (ESS) of each parameter. Time assessments from two types of clocks were necessary to check for biases in a diversification framework because clock priors can significantly change the estimation of divergence time. I tested whether a relaxed clock model fit the data better than a strict clock using a likelihood ratio test (LRT; Felsenstein, 1981) by fitting clock-like and non-clock behaviors for the Bayesian trees in PAUP* 4.beta10 (Swofford, 2003) with each taxon pair’s respective evolutionary model as previously described. Additionally, I compared these models using Bayes Factors (BFs); however, the standard methods for distinguishing between hypotheses rely on a harmonic mean estimate of the marginal likelihood (MLE) that is often biased (Xie et al., 2011). To calculate MLEs effectively I employed the stepping-stone method (Xie et al., 2011) as implemented in BEAST (Baele et al., 2012) for 1 × 10^6 generations, sampling every 1000 iterations with 300 stepping-stone paths under each of the clock models, and then calculated the magnitude of likelihood support for the two clock models with BFs, assessing relative fit according to the scale from Kass & Raftery (1995).
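Two pieces of arithmetic in this section can be sketched in Python: the number of trees retained after burn-in, and the comparison of two clock models from their log marginal likelihoods. The function names and the example likelihood values are hypothetical, and the handling of values that fall exactly on a category boundary is an assumption; the category labels follow the 2 ln(BF) scale of Kass & Raftery (1995).

```python
def retained_samples(generations, sample_every, burnin_fraction):
    """Number of sampled trees left after discarding the burn-in."""
    total = generations // sample_every
    return total - round(total * burnin_fraction)


def two_ln_bf(log_ml_m1, log_ml_m2):
    """Return 2 ln(BF) for model 1 over model 2 from log marginal likelihoods."""
    return 2.0 * (log_ml_m1 - log_ml_m2)


def kass_raftery_label(value):
    """Map |2 ln(BF)| to the evidence categories of Kass & Raftery (1995).

    Assumption: a value exactly on a boundary goes to the higher category.
    """
    x = abs(value)
    if x < 2:
        return "not worth more than a bare mention"
    if x < 6:
        return "positive"
    if x < 10:
        return "strong"
    return "very strong"


if __name__ == "__main__":
    # 20 x 10^6 generations sampled every 2000 steps, 10% burn-in.
    print(retained_samples(20_000_000, 2000, 0.10))
    # Made-up log marginal likelihoods for a relaxed and a strict clock.
    stat = two_ln_bf(-2100.0, -2103.5)
    print(stat, kass_raftery_label(stat))
```

With the chain settings given in the text, the first function returns the 9,000 trees mentioned for the convergence check.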

Diversification events under a coalescent framework. Statistical phylogeography relies on accurate population parameter estimation to distinguish between hypotheses under different demographic and temporal scales (Knowles & Maddison, 2002; Knowles,

2009; Knowles & Kubatko, 2010). One of the most significant challenges is to obtain reliable confidence intervals that allow for the rejection of ecological or evolutionary events. This is difficult to accomplish with single-locus studies because confidence intervals are necessarily wider for any one locus (Edwards & Beerli, 2000). Moreover, comparative phylogeography relying on a single locus may be strongly biased because time of coalescence is inevitably associated with the stochastic process of lineage sorting and with when ancestral polymorphisms arose prior to population divergence (Knowles &

Maddison, 2002; Knowles, 2009; Knowles & Kubatko, 2010).

A way to control for the previously mentioned biases is to incorporate the variance of divergence times through the coalescent process for co-distributed taxa across geographic barriers using multilocus data (Hickerson et al., 2006). The divergence time in multiple taxa, multiple loci, and their variance is assessed jointly in a hierarchical Approximate Bayesian Computation (ABC) framework in the program MTML-msBAYES (Huang et al., 2011). Briefly, the ABC methodology uses coalescent simulations to generate sets of 88 “summary statistics”, such as nucleotide diversity (π) and Tajima’s D, that are compared with the real or “observed” set of values calculated from the DNA sequence loci using all taxon pairs. After a large number of summary statistic sets is obtained from coalescent simulated trees (typically on the order of millions), a logistic regression is fitted from the observed set of values to the closest 500-1000 simulations judged by Euclidean distances (Hickerson et al., 2006). This “closest set” of simulations is an approximation of the posterior distribution that is more computationally efficient than conventional likelihood-based coalescent methods (Hey & Nielsen, 2007). The ABC posterior yields the number of divergence events (Ψ), an estimate of divergence times (τ), and the dispersion index of times (Ω), which is the ratio of the variance of divergence times to the mean estimate of time. Two separate sets of simulations are required: A) to test for one divergence event (Ψ = 1, vicariance sensu Rosen, 1978) relative to multiple divergence events (Ψ > 1, soft vicariance and dispersal sensu Platnick, 1976; Hickerson & Meyer, 2008) and B) to constrain the number and calculate the time of the divergence events found in the previous step. I used BFs to statistically distinguish between Ψ = 1 and Ψ ≠ 1 using a ratio of the posterior marginal likelihoods corresponding to those Ψ values. A value of BF (M1,M2) ≥ 10 indicates a strong preference for the alternative model over the null model (Jeffreys, 1961). Parallel to Ψ, a dispersion index Ω ≤ 0.01 indicates that the coalescent variance from the observed taxon pairs is small enough to fit a single splitting event. Conversely, Ω > 0.01 suggests that the variance of times given the estimates of time fits multiple splitting events.
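The rejection step and the dispersion index described above can be sketched in a few lines of Python. This is a toy illustration, not the msBAYES implementation: it keeps only the closest simulations by Euclidean distance and omits the regression adjustment, the summary-statistic vectors are made up, and the use of the population variance in Ω is an assumption.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two equal-length summary-statistic vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_simulations(observed, simulations, k):
    """Return the k simulations whose summary statistics are closest to observed.

    simulations is a list of (parameters, summary_statistics) tuples
    (hypothetical format).
    """
    return sorted(simulations, key=lambda sim: euclidean(sim[1], observed))[:k]


def dispersion_index(times):
    """Omega: variance of divergence times divided by their mean.

    Assumption: the population variance is used.
    """
    return statistics.pvariance(times) / statistics.mean(times)


def fits_single_event(omega, cutoff=0.01):
    """Apply the Omega <= 0.01 rule of thumb stated in the text."""
    return omega <= cutoff


if __name__ == "__main__":
    sims = [("a", (0.0, 0.0)), ("b", (1.0, 1.0)), ("c", (5.0, 5.0))]
    print(closest_simulations((0.9, 1.2), sims, k=2))
    omega = dispersion_index([0.2, 1.0, 2.4])
    print(omega, fits_single_event(omega))
```

Identical divergence times give Ω = 0 and pass the single-event rule, whereas widely spread times give an Ω well above the 0.01 cutoff.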

I performed two sets of msBAYES analyses. The first included only mtDNA and the second was a combined multilocus analysis. I analyzed mtDNA by itself to directly compare my results to those seen in mammals and reptiles of Baja California (Leaché et al., 2007), and I analyzed all loci together to account for among-locus stochasticity and potentially reduce credibility intervals on divergence times. In both cases, I first determined the appropriateness of multiple priors by running 1000 test simulations for every case. Prior choice, in particular the upper limit of τ, is fundamental to determine Ψ and Ω with precision by ensuring the simulated sets fit the observed summary statistics (Hickerson et al., 2014). I varied the upper limit of τ from very small values (0.075, 0.5) to relatively larger values (1.5, 2.0, 10.0) plus the default limit of 1.0. The upper limit of θ (effective population size N times mutation rate µ) was calculated from the empirical values of π. To each test prior set file I appended the observed summary statistics vector, labeled it differently by adding an extra column with a 1, matching a zero in the first column of the test simulations, and performed principal component analyses (PCAs) in R 3.01 (R Development Team, 2010). I judged the best prior to be the one where the observed statistics were closest to the aggregation center on the first two PCA axes of the simulations (Hickerson et al., 2014). After prior selection, I ran and concatenated five sets of 1 × 10^6 simulations for a total of five million simulated summary statistic sets. For the first msBAYES analysis the prior of Ψ was a uniform distribution from a single event to the maximum number of possible events (the number of taxon pairs). The posterior distributions for Ψ, Ω, and τ were calculated in both analyses using a logistic regression on the closest 1000 simulations (acceptance-rejection rate = 0.0002 for 5 million simulations). For the second analysis Ψ was constrained to the number resulting from the previous step. I converted τ to absolute time using the equation t = τ * θ_Ave/µ g (Hickerson et al., 2006; Hickerson et al., 2007), where t is time in million years, τ is the mode of population divergence time in units of µ per generation, θ_Ave is the upper limit of the average ancestral population size divided by 2 (the other half goes to the daughter populations), µ is the substitution rate per lineage per million years, and g is the generation time (Dolman & Joseph, 2012) as used in the Bayesian tree clock prior calibrations. I used the ND2 rate (0.013) to calculate the mutation scalars for the nuclear loci (0.00193 and 0.0018 for sex-linked and autosomal loci respectively; Axelsson et al., 2004) and re-scaled the average µ for the multilocus analysis (Huang et al., 2011). The generation time for each taxon pair was the same, one year as estimated from the age of first reproduction (Bennett, 1986).
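The simulation totals and the acceptance rate quoted above follow from simple arithmetic, reproduced here as a small Python sketch. The function name is hypothetical; the inputs are the numbers stated in the text.

```python
def abc_tolerance(n_accepted, n_sets, sims_per_set):
    """Return (total simulations, fraction of simulations accepted)."""
    total = n_sets * sims_per_set
    return total, n_accepted / total


if __name__ == "__main__":
    # Five concatenated sets of 10^6 simulations, keeping the closest 1000.
    total, rate = abc_tolerance(1000, 5, 1_000_000)
    print(total, rate)
```

Keeping 1000 of the five million concatenated simulations corresponds to the acceptance rate of 0.0002 reported in the text.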

RESULTS

I assembled seven independent loci for eight taxon pairs co-distributed across Baja California and the adjacent mainland (Table 1, OSF 1). These taxon pairs represent non-migratory birds with phenotypic geographic structure recognized both below and above the species level (Grinnell, 1928; AOU, 1998). The entire dataset included 1113 alleles. Not all loci amplified or sequenced successfully; therefore, the average was 5.6 loci per taxon pair (min = 4, max = 7). The longest loci were ACO1, ranging between 1044 and 1066 base pairs (bps), and ND2 (1041 bps) for all taxon pairs. Medium-length loci included FIB7 at 820 bps, TGFB2 ranging from 585 to 610 bps, and ACL from 479 to 511 bps. The shortest loci were AETC, GAPDH, and MPP with products between 311 and 461 bps (Table 1).

The Bayesian mitochondrial trees from BEAST showed each taxon pair as reciprocally monophyletic regardless of the clock prior, with a clear divergence between northern/continental and southern populations/species (OSF 2, see Abstract). The ESS values for all priors and posterior densities were well into the thousands, indicating convergence of parameters. The divergence time estimates for all taxon pairs had similar medians between clock models (Fig. 1.2, Table 1). The most notable difference was the width of the 95% HPD intervals: strict clock priors had tighter intervals than those from a relaxed clock. LRTs indicated no statistically significant differences between clock and non-clock trees, justifying the clock-like behavior of ND2 for all the taxon pairs. Bayes Factors were for the most part congruent with the LRTs, showing support barely worth mentioning for the relaxed clock model. The only two exceptions were a positively favored strict clock for Toxostoma lecontei and a relaxed clock for the towhees with strong support (Table 1). The distribution of divergence times from strict clocks indicates two or three non-overlapping pulses of diversification. One event included gnatcatchers, verdins, and wrens at the Pliocene/Pleistocene boundary ~2.4 million years before present (mybp) with a 95% HPD span of 1.70-3.38 mybp (Fig. 1.2). A second event in the Early Pleistocene at ~0.92 mybp (0.58-1.30 mybp 95% HPD) included the Le Conte’s thrasher, the quails, and the towhees. Finally, a third divergence between 160,000 and 350,000 years ago was exhibited by the Gila woodpecker. The split between Bendire’s and gray thrashers lies between the two latter events, and because its 95% HPD overlaps both, it is not clear which event it reflects.

[Figure 1.2 plot: horizontal axis, time (0-5 million years); vertical axis, taxon pairs, from top to bottom: 8 gnatcatchers, 7 verdin*, 6 wren*, 5 quails, 4 Le Conte's*, 3 towhees, 2 Bendire/gray, 1 woodpecker*; legend, clock prior: strict, relaxed.]

Figure 1.2. Splitting times (in millions of years) for all taxon pairs using BEAST strict and relaxed clock priors of mtDNA, arranged in descending order. The North/Continental taxon faces right and the South/Peninsular taxon faces left. Taxa with an asterisk (*) are considered the same species and those without are considered sister species. Bird images reproduced with permission from the HBW (Lynx Edicions).

Accounting for stochastic variance across taxon pairs and across loci in a coalescent model, the best prior fitting the observed data was at τ= 1.0 (Fig. 1.3).

[Figure 1.3 plots: panels A-F, PCA 1 versus PCA 2 of simulated and observed summary statistics (obsSS) under the priors A) τ = 1.00, B) τ = 0.075, C) τ = 0.50, D) τ = 1.50, E) τ = 2.00, F) τ = 10.0; panels G) Probability of divergent events (Ψ) in mtDNA and H) Probability of divergent events (Ψ) with multiple loci, each plotting Prob(Ψ) against Number of Events (1-8) for prior and posterior.]

Figure 1.3. Prior testing and posterior probabilities for the number of splitting events in Baja California birds. Panels A-F are different priors for the upper bound of time of divergence (τ); panel A (default) was the one where simulations (blue dots) best fit the observed summary statistics (red dots). Note that PCA scales are not the same for all prior graphs. Panels G-H represent the number of splitting events in mtDNA (G; Ψ = 3) and multiple loci (H; Ψ = 2).

MsBAYES simulations found that the most likely number of events (Ψ) had a mode of three events for the mtDNA analysis and two events for the multilocus dataset (Fig. 1.3).

Bayes Factors strongly favored the rejection of a single splitting event (ΨmtDNA ≥ 1, BF = ∞; Ψmultilocus ≥ 1, BF = 13.15). All 1,000 accepted simulations of the mtDNA analysis and most of the multilocus simulations of the index Ω were well beyond the 0.01 threshold, confirming the rejection of a single event reflected in the coalescent variation of both mtDNA and the multilocus sample for these Baja Californian birds (Fig. 1.4).
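A Bayes Factor of this kind can be sketched as the ratio of posterior odds to prior odds for "more than one event" against "a single event". The Python example below is a minimal illustration under stated assumptions: the function names, the accepted Ψ draws, and the discrete uniform prior over one to eight events are all hypothetical and are not taken from the msBAYES runs reported here. It shows why the factor becomes infinite when no accepted simulation supports a single event.

```python
import math


def bayes_factor(posterior_prob, prior_prob):
    """Bayes factor for a hypothesis versus its complement.

    Computed as posterior odds divided by prior odds.
    """
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    if not 0.0 <= posterior_prob <= 1.0:
        raise ValueError("posterior_prob must lie between 0 and 1")
    if posterior_prob == 1.0:
        return math.inf  # no accepted simulation supports the complement
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    return posterior_odds / prior_odds


def posterior_multiple_events(psi_values):
    """Fraction of accepted simulations with more than one event (psi > 1)."""
    return sum(1 for psi in psi_values if psi > 1) / len(psi_values)


# Hypothetical accepted draws of psi; not the values from this study.
accepted_psi = [3, 3, 2, 3, 4, 2, 3, 3]
# Assumed prior: discrete uniform over 1..8 events, so P(psi > 1) = 7/8.
prior_multi = 7 / 8
print(bayes_factor(posterior_multiple_events(accepted_psi), prior_multi))
```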

[Figure 1.4 plots: top row, "Times of divergence (τ) with mtDNA"; bottom row, "Times of divergence (τ) with multiple loci". Each row has one panel plotting E(τ) against Ω = Var(τ)/E(τ) and one panel plotting the density of divergent events in million years (0.0-3.0), with peaks labeled τ1, τ2, and τ3 for mtDNA and τ1 and τ2 for multiple loci.]

Figure 1.4. Coalescent-based time estimates for splits in co-distributed taxon pairs. Top panels represent mtDNA values and bottom panels are from multilocus data. The bulk of the 1,000 accepted simulations in either case had a ratio of variance of time over mean time greater than the value that would fit a single event (dotted line, Ω ≤ 0.01; Hickerson et al., 2006).
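The dispersion index in the caption above, Ω = Var(τ)/E(τ), can be sketched in a few lines of Python. This is an illustration only: the divergence times are hypothetical, and the use of the population variance is an assumption, since the text does not state which variance estimator is applied. Only the 0.01 cutoff comes from the caption.

```python
from statistics import mean, pvariance

OMEGA_THRESHOLD = 0.01  # single-event cutoff cited in the caption


def dispersion_index(taus):
    """Omega = Var(tau) / E(tau) across taxon pairs.

    Assumes non-negative divergence times and the population variance.
    """
    expected = mean(taus)
    if expected == 0:
        return 0.0  # all divergence times are zero, so there is no dispersion
    return pvariance(taus) / expected


# Hypothetical divergence times for eight taxon pairs (arbitrary units).
simultaneous = [1.0] * 8
pulsed = [2.0, 2.0, 2.0, 0.9, 0.9, 0.9, 0.2, 0.2]

for label, taus in (("simultaneous", simultaneous), ("pulsed", pulsed)):
    omega = dispersion_index(taus)
    verdict = "single event" if omega <= OMEGA_THRESHOLD else "multiple events"
    print(label, round(omega, 3), verdict)
```

Identical divergence times give Ω = 0, which is consistent with a single event, whereas times grouped into separate pulses push Ω well above the cutoff.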

The time calibration of these diversification pulses (Fig. 1.4) in mtDNA suggests an older event at the Pliocene/Pleistocene boundary with a mode of 2.05 mybp (τ3 = 1.73-2.37 mybp 95% HPD), a second event in the Early Pleistocene with a mode of 0.78 mybp (τ2 = 0.54-1.05 mybp 95% HPD), and a more recent event in the Mid to Late Pleistocene with a mode of 0.21 mybp (τ1 = 0.00-0.39 mybp 95% HPD). The correspondence of taxon pairs to these events was exactly as inferred from the Bayesian strict clock divergences, with the clarification that the Bendire’s/gray thrasher split is part of the most recent event together with the Gila woodpecker. There was no overlap between any of these three divergent events in mtDNA.

Including multiple loci reduced the number of splitting events from three in mtDNA to two (Figs. 1.3-1.4). The older event included gnatcatchers, verdins, wrens, and quails at 0.58 mybp (τ2 = 0.15-1.35 mybp 95% HPD). The younger event included the Le Conte’s thrasher, towhees, Bendire’s and gray thrashers, and woodpeckers at 0.07 mybp (τ1 = 0.00-0.26 mybp 95% HPD). The HPDs of the two divergent events estimated from multiple loci overlapped by 0.11 million years.
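The overlap between HPD intervals reported above can be verified with a small Python sketch. The interval bounds are those given in the text for the mtDNA and multilocus analyses; the function and variable names are illustrative.

```python
from itertools import combinations


def interval_overlap(a, b):
    """Length of the overlap between two (low, high) intervals; 0 if disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


# 95% HPD intervals in million years before present, as reported in the text.
mtdna = {"tau1": (0.00, 0.39), "tau2": (0.54, 1.05), "tau3": (1.73, 2.37)}
multilocus = {"tau1": (0.00, 0.26), "tau2": (0.15, 1.35)}

for name, events in (("mtDNA", mtdna), ("multilocus", multilocus)):
    for first, second in combinations(events, 2):
        overlap = interval_overlap(events[first], events[second])
        print(name, first, second, round(overlap, 2))
```

All three mtDNA comparisons return 0.0, and the single multilocus comparison returns 0.11, matching the values stated in the Results.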

DISCUSSION

Comparative multi-locus phylogeography. One of the most appealing parts of studying biogeography is the complexity of patterns and events over different time scales. Assuming that a large enough region has had a relatively simple biogeographic history would seem at first glance like an oversimplification of our understanding of how species and geography evolve over time. In Baja California, there is a large congruence between phylogenetic branching patterns and geographic distribution of clades fitting a single vicariant event for a wide range of taxonomic groups with different dispersal capabilities (Upton & Murphy, 1997; Zink et al., 1997; Grismer, 2000; Rodríguez-Robles & De Jesus-Escobar, 2000; Zink et al., 2001; Lindell et al., 2005; Crews & Hedin, 2006; Trujano-Alvarez & Álvarez-Castañeda, 2007) that would suggest an exceptionally rare and simple case of biogeographic history. This simple scenario has been rejected for organisms with relatively low dispersal capabilities, including succulent plants (Garrick et al., 2009) and reptiles and mammals (Leaché et al., 2007), in a coalescent framework similar to the one I used in this study. Birds are considered powerful dispersers because of their flight ability (Crosby, 1972; Barker, 2007). This characteristic makes avian taxa a strong test of hypotheses of community-wide phenomena and the effects of common biogeographic barriers. All the results described herein indicate that birds reject the single vicariant event hypothesis in Baja California as well. However, these findings, based on mitochondrial and nuclear loci and accounting for multi-taxon and multi-locus stochasticity, show a high degree of congruence with the midpeninsular seaway in what is now the Vizcaíno Desert ca. 1 mybp (Figs. 1.1-1.2, 1.4; Upton & Murphy, 1997; Riddle et al., 2000) for 40-50% of the taxon pairs included in this study. Thus, the midpeninsular seaway can be invoked to partially explain the common phylogeographic patterns seen in all Baja Californian amniotes regardless of birds’ high vagility. Trejo-Torres & Ackerman (2001) found that plants with high vagility do not fit general historical biogeographic patterns in the Antilles and suggested that other organisms with similar dispersal capabilities would be consistent with orchids dispersed by air. In the Antilles case, birds in general fit a general biogeographic pattern (Vázquez-Miranda et al., 2007). My results under a coalescent framework reveal that despite the evidence for more than a single event, poor and agile dispersers in this peninsular community have been affected by common history.

Yet, that history is not purely vicariance. Diversification and dispersal events spanning the Holocene and the Pleistocene ought to show signatures of historical and ecological factors for community assembly (Cavender-Bares et al., 2009). The remaining 50-60% of the surveyed avifauna fit a more recent event spanning the end of the Pleistocene and the Quaternary. Grismer (1994; 2000) hypothesized that some diversification events in Baja California were too recent to fit the midpeninsular seaway model. Historical climate change has influenced the fragmentation of desert habitats in North America (Riddle & Hafner, 2006) that accelerated during the Mid to Late Pleistocene (Riddle et al., 2008) through cold and wet climate oscillations (Garrick et al., 2009). Due to the wide and overlapping HPD intervals for the most recent event, it is impossible to highlight a particular climate change scenario using the current data. The modes for mtDNA and multiple loci peaked before and after the Last Glacial Maximum (LGM). Even though there is evidence of climate shaping populations of some of these birds, for example the California gnatcatcher (P. californica) expanding its populations northwards during the Holocene (Zink et al., 2013), these demographic events happened after the split from their sister taxon associated with the seaway and cannot be addressed by the present study. There is further evidence of the contraction of geographic ranges to the south of the peninsula for several of these taxon pairs (Zink, 2014), but except for the gnatcatcher there are no wider multilocus coalescent-based demographic models available for comparison. Nevertheless, the confidence intervals of the most recent event detected exclude the youngest vicariant event that shaped the peninsular community, suggesting that post-seaway historical dispersal happened for at least half of the taxon pairs.

Part of the complexity of biogeographic history is the evolutionary processes behind speciation. Over long historical periods both vicariance and dispersal are processes that occur in most biogeographic regions (Vázquez-Miranda et al., 2007), and even in more recent timescales these two processes co-occur (Zink et al., 2000). My findings coincide with previous conclusions stemming from other coalescent-based studies (Leaché et al., 2007; Garrick et al., 2009): a single historical event is not enough to explain how the biological community assembled in the peninsula over time. However, there is geographic and temporal congruence in a handful of diversification pulses in amniotes. Further exploration in other animal and plant taxa, explicitly accounting for coalescent variance across loci and taxa, will help extend this to the broader biological community in the peninsula.

Single and multilocus datasets. The estimated number of diversification events differed between mtDNA alone and multiple loci. Although a difference of one extra τ value could seem small, it has implications for the interpretation of these results. On the one hand, multilocus studies are more solid statistically because they incorporate the stochasticity of lineage sorting and reduce confidence intervals and the uncertainty of parameters (Edwards & Beerli, 2000; Knowles & Maddison, 2002; Knowles, 2009). This would suggest that, despite accounting for across-taxon stochasticity, some of the ancestral polymorphisms in mtDNA yield idiosyncratic time estimates when a single locus is employed. The reduction of time estimates from multiple loci compared to single-locus divergences is a common pattern in phylogeography (Smith et al., 2013; Giarla et al., 2014). Multi-taxon, multilocus studies that have compared these differences between mitochondrial and combined estimates also report a reduction in the number of diversification events, highlighting the importance of including more than one locus in comparative phylogeography (Bell et al., 2011). On the other hand, simulations assessing the effect of the number of loci on the ability to distinguish between pulses of diversification show that when the number of loci is not relatively high (>16), multilocus results may be less reliable than mtDNA/single-locus estimates (Huang et al., 2011). The lack of HPD overlap in my mitochondrial time estimates and a small but visible overlap of 110,000 years in the times from the multilocus dataset could suggest insufficient power to decidedly distinguish between numbers of events using msBAYES despite having over 1000 alleles. Conversely, Hickerson et al. (2006) suggested that having as few as two alleles per locus was enough to estimate parameters. Future studies maximizing the number of loci over individuals will clarify whether the overlap in times is due to low statistical power or the true nature of the coalescent process. Nevertheless, a single-event hypothesis is strongly rejected by both mtDNA and combined results.

Under a coalescent model, the confidence intervals of all splitting events excluded vicariant events prior to the Pliocene-Pleistocene boundary (Fig. 1.1): the final closure of the Gulf of California (~5 mybp), the Northern Gulf (~3 mybp), and the Isthmus of La Paz (~3 mybp). In contrast, the times from BEAST using a relaxed clock spanned those older events. Even when a strict clock supported by Bayes Factors and LRTs was enforced, the events dated at ~3 mybp could not be rejected. Although time calibrations using sophisticated Bayesian algorithms (Drummond et al., 2007) are the method of choice in single-taxon studies, they do not incorporate the stochastic variance among co-distributed taxa (Hickerson et al., 2006; Hickerson et al., 2010; Huang et al., 2011). My empirical results show how important it is to use appropriate priors in comparative phylogeography (Hickerson et al., 2014). For example, when accounting for taxon-wide stochasticity, very broad priors make the absolute timing of diversification in peninsular mammals and reptiles (Leaché et al., 2007) “hard to interpret” (Hickerson et al., 2014), suggesting that those divergence times might be incorrect. Nevertheless, mammals, reptiles, and now birds appear to have comparable biogeographic histories in Baja California; the differences between the broad time estimates from Leaché et al. (2007) and those presented here could be due to differences in the assumed upper limit of τ. In cases where phylogenetic tree estimates are broad due to an assumed relaxed clock, the divergence times across taxon pairs would overlap more than necessary for direct comparisons with msBAYES results (McKay, 2012), or more ambiguity would be introduced if using them as a proxy for the number of divergent events before running coalescent simulations. In addition to the prior testing recommended by Hickerson et al. (2014), I suggest caution when assuming a relaxed clock over a simpler model without appropriate statistical testing in comparative phylogeography, because overlap in credibility intervals could be an artifact of an unjustified time calibration prior.

Ecological factors for divergence. Why did wrens, gnatcatchers, verdins, and quails diverge at the same time whereas woodpeckers, thrashers, and towhees diverged later? When calibrating splitting events in any phylogenetic or coalescent framework, researchers assume that substitution rates are uniform across similar taxa (e.g., one average rate for mammals and another for birds; Brown et al., 1979; Weir & Schluter, 2008, respectively). Since substitution rates are often correlated with metabolic rates (Martin & Palumbi, 1993), it is possible that detecting multiple events reflects taxon pairs of significantly different body size and not different historical events. This is unlikely for birds in Baja California (OSF 1) because the two largest-bodied birds included in this study, quails and woodpeckers, belong to distinct diversification scenarios (Figs. 1.2, 1.4). Medium-sized birds such as wrens, towhees, and thrashers also diverged at different times. The two smallest, verdins and gnatcatchers, are grouped with medium-sized wrens in the mtDNA results and with quails as well in the multilocus data. Wrens, verdins, and gnatcatchers are towards the lower end of the size spectrum of the sampled taxa, yet wrens’ weight overlaps with that of other medium-sized birds (OSF 1).

Similarly, foraging guild and nesting location do not follow a pattern matching the diversification scenarios (OSF 1). The only ecological characteristic that would match the three events detected in mtDNA (Fig. 1.3) would be foraging level: the oldest event includes brush and tree foragers and one brush-ground forager, and the middle event matches mostly ground foragers. Even this pattern breaks down because it includes one ground forager and one large tree forager (OSF 1). Although I did not explore other ecological factors, it seems that general breeding, feeding, and body size characteristics are not correlated with membership in a particular diversification pulse. This predominance of biogeographic events over ecological factors is expected in relatively old communities (Cavender-Bares et al., 2009).


Species limits from synchronous divergence and conservation. One of the impacts of phylogeographic studies is to help identify species limits, because conservation relies on an objective assessment of biodiversity informing taxonomic designations, policy, and management (Moritz, 2002). Baja California has a large number of endemic species recognized in most taxa: plants, arthropods, and most vertebrates have up to 80% endemicity in the peninsula (Wiggins, 1980; Ayala et al., 1993a; Riddle et al., 2000). The unusual taxonomic group is birds, with only 3 endemic species out of 122 (Escalante et al., 1993). Why would birds be so different from other taxa in their level of uniqueness? The results from this study show that some well-recognized species pairs diverged at the same time as populations deemed to be just subspecies (Grinnell, 1928). These peninsular populations are not only reciprocally monophyletic in mtDNA, but their divergences are synchronous with those of taxa recognized as valid species. Zink et al. (1997, 2001) proposed that the verdin, the cactus wren, and the Le Conte’s thrasher ought to be split into northern and southern species, but this taxonomic recommendation has not been adopted formally (AOU, 1998; Chesser et al., 2013). The likely cause is that the phenotypic differences between these lineages are not as “strong” as the ones exhibited by the quails and the Bendire’s/gray thrasher pair. In contrast, the genetic splits in towhees and gnatcatchers and their species validity (Zink & Dittmann, 1991; Zink & Blackwell, 1998a) have already been accepted (AOU, 1998; Chesser et al., 2013) even though phenotypic differentiation is similar to that exhibited by wrens and Le Conte’s thrashers. This taxonomic inconsistency is an example of how some historic taxonomic decisions have been subjective. Both verdins and Gila woodpeckers are reciprocally monophyletic as well (Table 1, OSF 2; Zink et al., 2001) and have divergent morphologies (Grinnell, 1928). My data show that two to three sets of taxon pairs have speciation events equivalent to those of recognized species (Fig. 1.4), indicating that at least to some degree the low endemicity of the avifauna of Baja California is a taxonomic artifact. Based on the available genetic data, I would recommend splitting cactus wrens, verdins, Le Conte’s thrashers, and Gila woodpeckers into two species each to accurately represent speciation events in Baja California, in one step doubling the number of endemic Baja California birds. This solution is very simple: elevating peninsular subspecies to species, as they all have described lineages and even common names (Grinnell, 1928), supported by a statistically solid multilocus coalescent framework.

Ecological and evolutionary factors determine how species assemble in a community. Geographic barriers to gene flow can fragment entire communities causing species divergence, while dispersal can homogenize formerly isolated communities.

Several major geologic events have partitioned the Baja California peninsula multiple times into islands over the last five million years. Even though several phylogeographic studies show a genetic and geographic correspondence to these vicariant events, most have focused on individual taxa or have not accounted for the genetic variance across taxa and across loci. My results show that there is congruence of historical temporal patterns in the most vagile of taxa in the peninsula, highlighting the importance of ancient geographic barriers relative to current gene flow in shaping the avifauna of Baja California. This congruence matches both vicariance and historical dispersal events that are concomitant biogeographic processes in most communities residing in a region over millions of years, despite the underlying geologic complexity.

Importantly, community-wide approaches to biological diversification can help detect and address flaws in using geographic and temporal congruence of divergence patterns. The basic role of systematics and phylogeography is to identify the units of biodiversity, and adding estimates from a community perspective could aid in this endeavor. Changes proposed in this study would double the number of endemic bird species for the peninsula and Mexico, with important implications for conservation and management.


Table 1. Taxon pairs of Baja California birds, number of loci, number of alleles (North and South), locus length in base pairs (bps), splitting times for mtDNA under two different clock models, probabilities of rejecting a clock-like behavior under a likelihood ratio test (LRT), and Bayes Factors (BF). Values in bold are statistically significant. All mtDNA trees showed reciprocal monophyly (OSF 2, see Abstract).

Taxon pair                              # Loci  Locus  North  South  Length (bps)  Strict clock (95% HPD)  Relaxed clock (95% HPD)  LRT   BF
Polioptila melanura/californica         5       ND2    5      5      1041          2.48 (1.76-3.38)        2.25 (0.67-5.06)         0.64  -0.78
  black-tailed/California gnatcatcher           ACO1   10     8      1062
                                                AETC   10     10     365
                                                GAPDH  10     10     372
                                                TGFB2  10     8      606
Auriparus flaviceps                     5       ND2    11     5      1041          2.42 (1.78-3.12)        2.17 (0.66-4.78)         0.99  -3.31
  verdin*                                       ACO1   14     10     1066
                                                GAPDH  14     10     382
                                                AETC   6      8      393
                                                TGFB2  14     10     600
Campylorhynchus brunneicapillus         7       ND2    9      10     1041          2.28 (1.70-2.98)        2.06 (0.60-4.50)         0.75  -1.19
  cactus wren*                                  ACO1   10     14     1027
                                                ACL    6      10     479
                                                AETC   10     10     376
                                                GAPDH  10     10     339
                                                MPP    10     14     378
                                                TGFB2  10     10     592
Callipepla gambelii/californica         4       ND2    7      9      1041          0.95 (0.64-1.34)        0.86 (0.25-1.96)         0.97  -1.28
  Gambel's/California quail                     AETC   10     12     347
                                                GAPDH  12     12     463
                                                TGFB2  12     12     592
Toxostoma lecontei                      7       ND2    17     14     1041          0.95 (0.64-1.30)        0.85 (0.26-1.92)         0.99  4.17
  Le Conte's thrasher*                          ACO1   10     16     1062
                                                ACL    8      12     511
                                                AETC   8      18     329
                                                GAPDH  12     18     370
                                                MPP    12     14     311
                                                TGFB2  8      18     610
Melozone aberti/crissalis               6       ND2    5      16     1041          0.87 (0.58-1.23)        0.80 (0.20-1.79)         0.91  -36.71
  Abert's/California towhee                     ACO1   10     10     1066
                                                ACL    10     12     494
                                                AETC   10     12     362
                                                GAPDH  10     12     387
                                                TGFB2  10     10     585
Toxostoma bendirei/cinereum             6       ND2    5      12     1041          0.52 (0.32-0.78)        0.47 (0.12-1.08)         0.93  -0.26
  Bendire's/gray thrasher                       ACO1   8      10     1066
                                                ACL    6      12     500
                                                AETC   10     12     328
                                                GAPDH  10     12     379
                                                TGFB2  8      12     602
Melanerpes uropygialis                  5       ND2    41     20     1041          0.24 (0.16-0.35)        0.22 (0.06-0.50)         0.99  -2.13
  Gila woodpecker*                              ACO1   54     32     1044
                                                FIB7   54     32     820
                                                GAPDH  8      10     461
                                                TGFB2  6      10     585
Total number of alleles (1113)                         550    563


CHAPTER 2. The phylogenetic power of sex at the species level resolves evolutionary relationships among Neotropical wrens

INTRODUCTION

Although the assembly of the tree of life is a crucial endeavor, systematists are faced with the reality that many phylogenies are difficult to resolve (Whitfield & Lockhart, 2007), and robust phylogenies are still out of reach for many groups. Adding loci is one general solution to the issue of resolution (Edwards & Beerli, 2000; Rokas & Carroll, 2005), but the question is, how many loci would we need to include for relationships to be fully solved with statistical support (Felsenstein, 2006)? Perhaps more importantly, what sort of loci are best to address such questions (Chojnowski et al., 2008): protein coding, introns, intergenic regions, GC-rich regions, retroposon insertions, sex-linked, autosomal, or some combination of these? Several power analyses have been done, addressing how much data needs to be acquired (Walsh et al., 1999; Fishbein et al., 2001; Rokas et al., 2003); yet several rely on simulations (Agapow & Purvis, 2002; Knowles & Kubatko, 2010; Leaché & Rannala, 2010; Liu & Yu, 2011) that may or may not accurately reflect the nature of gene stochasticity and other characteristics inherent to each taxon or locus type. Few studies have systematically sampled data from natural systems to determine the phylogenetic power of loci varying in relevant characteristics (Rokas et al., 2003; Rokas & Carroll, 2005; Corl & Ellegren, 2013).

One particularly interesting question, especially in the context of newly developed methods for species tree inference, is the issue of ploidy. Although mitochondrial DNA (mtDNA) is the most widely used marker in systematics and phylogeography (Zink & Barrowclough, 2008), it is effectively one genetic locus, and sampling multiple mtDNA genes is inefficient for methods that use gene trees as the unit of analysis (Edwards & Bensch, 2009). The usefulness of mtDNA in species tree estimation and phylogeography stems from its fast coalescent time (Zink & Barrowclough, 2008) and high evolutionary rate (Brown et al., 1979). Nuclear autosomal loci tend to coalesce, on average, four times slower than mtDNA because of the larger number of copies per individual and consequently higher effective population size (Ne; Kingman, 1982). If divergences in a particular species tree are much less than 2Ne generations, the average nuclear locus will have little chance of tracking the true branching pattern. For systems with sex chromosomes, ploidy reduces the coalescent time of a gene marker, bringing it closer to that of mtDNA, because any sex-linked locus has two copies from the homogametic sex but only one from the heterogametic sex in the gene population; consequently, the time to the most recent common ancestor (TMRCA) for such loci is only 1.5Ne. Therefore, using multiple independent sex-linked loci for phylogenetic reconstruction is potentially a way to more efficiently reconstruct the species tree (Jacobsen & Omland, 2011; Corl & Ellegren, 2013). The empirical question is whether autosomal loci are truly different from sex-linked markers in practice, since 2Ne is after all not too different from 1.5Ne. This is a particularly interesting question in birds, since variance in male reproductive success, a common feature of bird mating systems, has the chance to drive the effective size of sex-linked markers even lower than ploidy expectations (since males have two copies of each sex-linked locus; Bull, 1983; Nunney, 1993). The purpose of our study is to assess the phylogenetic power of loci with different ploidy levels in solving difficult evolutionary relationships at the species level, using empirical data. Our approach is to construct a multilocus phylogeny of a recent radiation with relatively short branches and to assess how many loci of differing ploidy levels need to be sampled to support individual branches in a species tree. We base our assumptions on coalescent expectations: for relatively long branches, more loci should be able to track the species tree; therefore, fewer loci will be required to recover such branches. Conversely, relatively shorter branches will have less time to accumulate mutations over evolutionary time; therefore, more loci will need to be sampled in order to recover those groupings. Additionally, intrinsic locus characteristics such as sequence length have clear effects on the resolving power of a locus below the species level (e.g., Carling & Brumfield, 2007), and we examine this question in the context of our data at the species level.
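The ploidy expectations described above can be made concrete with a short Python sketch. It assumes an idealized population with an equal sex ratio and no sex difference in the variance of reproductive success, which is exactly the assumption the text notes may be violated in birds. The dictionary, the function name, and the scaling of expected coalescence times to 2Ne for autosomal loci are illustrative choices based on the values quoted in the paragraph.

```python
from fractions import Fraction

# Gene copies carried by one breeding pair (one male, one female), assuming
# an equal sex ratio and equal variance in reproductive success between sexes.
COPIES_PER_PAIR = {
    "autosomal": 4,      # two copies in each sex
    "Z-linked": 3,       # two copies in males (ZZ), one in females (ZW)
    "mitochondrial": 1,  # haploid and maternally inherited
}


def relative_effective_size(marker):
    """Effective size of a marker class relative to autosomal loci."""
    return Fraction(COPIES_PER_PAIR[marker], COPIES_PER_PAIR["autosomal"])


for marker in COPIES_PER_PAIR:
    ratio = relative_effective_size(marker)
    # Scaled so that autosomal loci take 2Ne generations, as in the text.
    expected_time = 2 * ratio
    print(f"{marker}: relative Ne = {ratio}, expected time = {float(expected_time)} Ne")
```

Under these assumptions the Z-linked class sits at three quarters of the autosomal expectation (1.5Ne versus 2Ne) and mtDNA at one quarter, which matches the fourfold difference cited for autosomal loci relative to mtDNA.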

The focus of our work is the genus Campylorhynchus (Aves: Troglodytidae), a small genus containing the largest-bodied wrens. These wrens are an ideal case study to test the phylogenetic contribution of different loci because they diversified in a relatively brief time interval and a number of lineage splits are not well resolved (Barker, 2007). Additionally, Campylorhynchus wrens pose interesting questions in evolutionary ecology that apply to other groups of organisms. They are widespread in the Americas in a wide variety of habitats (Brewer, 2001), from the warmest deserts of North America (cactus wren, C. brunneicapillus) to the wettest forests in South America (white-headed wren, C. albobrunneus); therefore, understanding how they evolved to survive in climatic extremes is important to understanding how ecological speciation can drive diversification in the Neotropics. Most species are cooperative breeders and sing duets with varying numbers of nest helpers (Selander, 1964; Barker, 1999; Price, 2003); thus, a phylogenetic perspective on the evolution of such a complex behavioral syndrome could be enlightening. In addition, their distribution spans and predates the formation of the Isthmus of Panama (Barker, 2007); therefore, understanding the historical biogeography of the genus would shed light on the Great American Biotic Interchange (GABI) and on how communities have assembled over time (Cavender-Bares et al., 2009), in particular for Neotropical birds (DaCosta & Klicka, 2008; Smith & Klicka, 2010; Barker et al., in press). These wrens are currently classified into 13 species with multiple diagnostically distinct populations (i.e., subspecies or phylospecies, depending on treatment).

Traditionally, the genus was divided into two clusters corresponding to their overall phenotype: brown species with vertical dorsal stripes belong to the subgenus “Heleodytes,” primarily occurring in North America, and gray species with horizontal dorsal bars belong to the subgenus “Campylorhynchus,” mostly distributed in South America. A mitochondrial phylogeny found these two groupings as clades, but association of the cactus wren (C. brunneicapillus) with the “Heleodytes” clade was not significant (Barker, 2007), yielding a basal trichotomy. Alternate resolutions of this three-taxon problem would indicate whether cooperative breeding and duetting evolved once in the genus or were subsequently lost, whether living in dry and hot climates is an ancestral or derived condition, and whether there were more or fewer than two independent dispersals prior to the formation of the Isthmus of Panama. To address relationships in this group, we gathered and analyzed two data sets with up to 23 genetic loci and 99 individuals, focusing on questions of among-gene congruence, monophyly of basal units, and resolving power.

MATERIALS AND METHODS

Taxon sampling and laboratory procedures. We sampled all 13 currently recognized species of Campylorhynchus wrens (Brewer, 2001) and included as many diagnostically different populations as feasible, for a total of 24 operational taxonomic units (OTUs). The taxonomy of this genus below the species level is still poorly understood. Although we favor a classification that recognizes species as independent lineages with diagnostic differences—an evolutionary or phylogenetic species concept

(PSC, i.e. Cracraft, 1983)—most of these wrens are still considered polytypic species. A few studies have tried to draw species limits (Zink et al., 2001; Vázquez-Miranda et al.,

2009) but these splits have not been universally recognized (Chesser et al., 2013). We sampled on average five individuals per OTU; however, some species are rare in collections, so we included between two and seven individuals per taxon, for a total of 99 individuals plus five outgroups (OSF 3, see

Abstract), selected according to higher-level phylogenetic hypotheses for wrens and birds (Barker, 2004; Barker et al., 2004). We used two sampling schemes: a) a 5 locus dataset, with all 99 individuals sequenced for six genes (two mitochondrial, two autosomal, and two sex-linked, i.e. Z-linked in birds) for a total of five independent loci, because the two mitochondrial genes are linked and count as a single locus, to account for within-lineage relationships; and b) a 23 locus dataset, with one representative of each of the nominate 13 species and 24 genes, two mitochondrial, ten

autosomal, and 12 sex-linked for a total of 23 independent loci to assess relationships between lineages (OSF 3). Data collection was identical for both datasets, as follows. We extracted DNA with either a DNeasy kit (Qiagen, Valencia, CA) following manufacturer instructions, or a standard phenol chloroform protocol followed by cold ethanol precipitation, in both cases followed by a final DNA elution or resuspension in 200 µl of

TE, storing samples at -20 °C. We amplified and sequenced loci for both datasets using the primers described in OSF 3, each supplemented with M13 tails to facilitate sequencing in 96-well plates. Polymerase chain reaction (PCR) of each locus was performed in 12.5 µl reactions containing 4.25 µl of water, 6.25 µl of GoTaq Green Mastermix

(Promega, Madison, WI), 0.5 µl of each primer at a concentration of 10 nM, and 1 µl of template DNA. All loci were amplified with the following touchdown thermocycler conditions: initial denaturation step of 95 °C for 3 min, followed by 5 cycles of 95 °C for

0.5 min, 58 °C for 0.5 min and 72 °C for 1 min, then repeating these steps while lowering the annealing temperature by 2 °C until it reached 52 °C, which was held for 20 cycles, followed by a final extension step of 72 °C for 6 min. PCR products were purified using 3.4 µl of the Exonuclease I – Shrimp Alkaline Phosphatase enzyme cocktail (ExoSAP,

USB Corporation, Cleveland, OH) following the manufacturer’s instructions, or using magnetic beads (AMPure beads, Beckman Coulter, La Brea, CA). Sequencing of forward and reverse strands was done using BigDye v3.1 terminators (Applied Biosystems, Foster

City, CA) following recommendations by the manufacturer for electrophoresis on an ABI

3730xl automated sequencer.
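The touchdown program described above can be written out as an explicit list of annealing stages. The following is a minimal sketch, assuming that each annealing temperature above 52 °C was held for 5 cycles (our reading of the protocol, not a statement from the thermocycler files); the function name `touchdown_schedule` and its parameters are hypothetical.

```python
def touchdown_schedule(start=58.0, end=52.0, step=2.0,
                       cycles_per_step=5, final_cycles=20):
    """Return (annealing_temp_C, n_cycles) stages for a touchdown PCR.

    Assumption: every temperature above `end` is held for `cycles_per_step`
    cycles, and the final temperature `end` is held for `final_cycles`.
    """
    stages = []
    temp = start
    while temp > end:
        stages.append((temp, cycles_per_step))
        temp -= step
    stages.append((end, final_cycles))
    return stages


schedule = touchdown_schedule()
total_cycles = sum(n for _, n in schedule)

print("95 C for 3 min (initial denaturation)")
for temp, n in schedule:
    print(f"{n:2d} cycles: 95 C 0.5 min / {temp:.0f} C 0.5 min / 72 C 1 min")
print("72 C for 6 min (final extension)")
print("total cycles:", total_cycles)
```

Under this reading the program runs three touchdown stages (58, 56, and 54 °C) before the 20 cycles at 52 °C.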


Sequence editing, allele phasing and recombination. DNA sequences were assembled in SEQUENCHER 4.7 (Genecodes Inc., Ann Arbor, MI) and were verified to be of avian origin and the correct locus by comparing them to reference sequences in the

NCBI Genbank database using BLAST searches. For protein-coding loci we checked for stop codons using the online amino acid translation tool from European Bioinformatics

Institute (EBI). We aligned sequences for each locus using default parameters in

CLUSTALX 2 (Larkin et al., 2007) and MUSCLE 3 (Edgar, 2004). Both sets of alignments were essentially identical. For heterozygous positions in nuclear loci for the ingroup we separated each allele by using in silico phasing in the program PHASE 2.1

(Stephens et al., 2001) facilitated by SEQPHASE (Flot 2010), running 100,000 generations with a burn-in of 10,000. We considered allelic phase resolved at a probability threshold of 0.9. Alleles with lower probabilities were left with a polymorphic position and analyzed as ambiguous. We tested for recombination in our loci employing the Phi method (Bruen et al. 2006) in the program SPLITS TREE 4.3 (Huson and Bryant

2006) and with a battery of six additional recombination tests in the program RDP 4

(Martin et al. 2010).
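The phase-probability rule used during allele phasing can be illustrated per heterozygous site. This is only a sketch of the decision rule, not of PHASE or SEQPHASE themselves: the function `apply_phase_threshold` is hypothetical, and treating a probability exactly equal to 0.9 as resolved is an assumption.

```python
# IUPAC ambiguity codes for the six possible two-base heterozygotes.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def apply_phase_threshold(base_a, base_b, phase_prob, threshold=0.9):
    """Return the states written to the two alleles at one site.

    If the phase is resolved (probability >= threshold, an assumed inclusive
    cutoff) the two bases are kept on their inferred alleles; otherwise both
    alleles receive the ambiguity code and the site is analyzed as ambiguous.
    """
    if base_a == base_b or phase_prob >= threshold:
        return base_a, base_b
    code = IUPAC[frozenset((base_a, base_b))]
    return code, code


print(apply_phase_threshold("A", "G", 0.95))  # resolved phase
print(apply_phase_threshold("A", "G", 0.60))  # left ambiguous
```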

Testing for selection. Phylogenetic inferences assume neutrality of the marker under survey. We tested for the possibility of selection in a coalescent framework for the two datasets including all samples from all species using the Hudson-Kreitman-Aguadé

(HKA) test (Hudson et al., 1987) with 10,000 simulations as implemented in the program

HKA (J. Hey, see Bibliography), using a closely related taxon as an outgroup. For both datasets we used the closest outgroup possible (Bewick’s wren Thryomanes bewickii for

both datasets; sensu Barker, 2004) as a reference for segregating sites and genetic distance.

Phylogenetic reconstruction. We built phylogenetic trees using a two by three design table for the 5 locus and the 23 locus datasets in order to estimate relationships between OTUs and their respective branch lengths to test locus-by-locus support. The two rows correspond to estimations with and without mtDNA. The three columns correspond to a maximum likelihood concatenated analysis (ML), a Bayesian concatenated analysis (BY), and a Bayesian species-tree analysis (SP). This design produced six types of trees for each dataset. We tested for the best evolutionary model using jMODELTEST 2 (Darriba et al., 2012) on each locus and data set, using the

Bayesian Information Criterion (BIC). We also tested for the optimal partition scheme using PARTITIONFINDER (Lanfear et al., 2012) using the greedy algorithm and BIC for selecting the best scheme. The ML trees are the majority-rule consensus of 100 bootstrap replicate trees constructed using GARLI 2.01 (Zwickl, 2006) and all available evolutionary models implemented in PARTITIONFINDER. The BY trees are the majority-rule consensus trees from MRBAYES 3.2.2 (Ronquist et al., 2012) summarized in SUMTREES 3.12.0 (Sukumaran & Holder 2008; Sukumaran & Holder 2010) from two runs and four chains of 5 million generations and a burn-in of 10%. The SP trees are the maximum-clade credibility tree estimated in BEAST 1.7.4 (Drummond & Rambaut,

2007; Drummond et al., 2012) with three independent runs of 300 million generations for the 5 locus dataset and 500 million generations for the 23 locus dataset, each with a burn-in of 25%. Convergence of parameters, measured as effective sample size (ESS), and burn-in percentages were determined in TRACER 1.5 (Rambaut and Drummond 2007) for BY and

SP analyses. For concatenated analyses we used a single allele per individual drawn at random since only species-tree analyses allow multiple alleles per taxon for both the gene and species trees to be estimated jointly (Liu & Pearl 2007; Heled & Drummond, 2010); our SP analyses included both alleles from each individual. We used *BEAST to estimate divergence times. We incorporated absolute time by adding a substitution rate per lineage per million years (substitutions/lineage/myr) to each type of locus with these priors: a relaxed clock following a log-normal distribution, a constant piecewise root, and a birth-death coalescent model. The rates were as follows: for CYTB and ND2 we used 1.0 and

1.03%/lineage/myr (Weir & Schluter, 2008; Arbogast et al., 2006) respectively.

Similarly, for sex-linked and autosomal loci we used 0.18 and 0.195% (Axelsson et al.,

2004). We integrated the uncertainty of these rates for the relaxed clock prior using a standard deviation of 0.45 as suggested by Smith and Klicka (2010).

Species tree methods provide the opportunity to understand the relationship and conflicts between gene trees. In order to visualize agreement and disagreements between our gene trees we estimated “cloudograms” of our four SP trees’ posterior distribution in

DENSITREE 2.0 (Bouckaert, 2010) to show the effect of 5 loci versus 23 loci and the inclusion or exclusion of mtDNA in phylogenetic reconstruction.

Testing for differential signal between sex-linked and autosomal markers.

Although Sanger sequencing of 23 independent loci for any phylogenetic study is not a trivial task, it is still a modest representation of the genome, especially when Next

Generation sequencing technologies allow for sampling from hundreds to thousands of

loci (Faircloth et al., 2012; Lemmon et al., 2012). In order to test whether the variation found in our loci is representative of the stochastic effects in the genome and estimate which loci are most informative for a given sample of sites, we created a set of simulations based on our fully resolved Bayesian multilocus nuclear phylogeny from 22 nuclear loci (BY in Fig 2.1B). We simulated 1000 loci in SEQ-GEN 1.3.2 using its graphical interface SGRUNNER 1.5.3 (Rambaut & Grassly, 1997). For these simulations to mimic our empirical loci as closely as possible we divided them into five sets, each containing

200 loci of 200, 400, 600, 800, and 1000 base pairs (bps) respectively, that represent the sequence length range of the real data. We then reconstructed majority rule consensus trees for each nuclear locus in MRBAYES using the same conditions as for the single locus randomizations above, with their appropriate evolutionary model. To obtain the evolutionary model for the simulated loci we fitted the most generalized type of substitution, nucleotide composition, and rate heterogeneity of the empirical data with a

ML criterion in PAUP 4.10b (Swofford, 2003). Since our fully resolved nuclear phylogeny was generated in MRBAYES we consequently constructed BY trees for each simulated locus as above, with 2 million generations and a burn-in of 10% for exact comparison to the trees from empirical loci. Additional custom scripts were created to calculate BY majority rule consensus trees in parallel in SUMTREES and to extract the number of statistically supported clades from each locus's posterior clade-tree file. In order to standardize comparisons between all types of loci (sex-linked, autosomal, and simulated), we used an index of overall clade-in-tree support. This HSN index (for highly supported nodes) is the ratio of the number of statistically significant clades


(SSCs) in a given tree (≥0.95 posterior probability) over the maximum number of nodes in a fully-resolved tree (the number of taxa minus one, N - 1). The HSN index, SSCs/(N - 1), thus standardizes nodal support in different trees and reports the proportion of clades that are statistically supported in any number of trees regardless of their complexity or number of terminals, with values ranging from zero (no statistically significant clades) to one (every clade in a tree is significant).
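As a worked illustration of the HSN index defined above, the sketch below computes SSCs/(N - 1) from a list of nodal posterior probabilities. The function name `hsn_index` and the support values are hypothetical; only the 0.95 threshold and the N - 1 denominator come from the definition.

```python
def hsn_index(clade_supports, n_taxa, threshold=0.95):
    """HSN = (number of clades with support >= threshold) / (n_taxa - 1)."""
    if n_taxa < 2:
        raise ValueError("a tree needs at least two taxa")
    max_nodes = n_taxa - 1
    ssc = sum(1 for support in clade_supports if support >= threshold)
    if ssc > max_nodes:
        raise ValueError("more supported clades than nodes in a resolved tree")
    return ssc / max_nodes


# Hypothetical posterior probabilities for the 13 nodes of a 14-taxon tree.
toy_supports = [1.0, 1.0, 0.99, 0.98, 0.97, 0.96, 0.95,
                0.95, 1.0, 0.99, 0.80, 0.62, 0.45]
toy_hsn = hsn_index(toy_supports, n_taxa=14)
print(f"HSN = {toy_hsn:.2f}")
```

With ten of thirteen nodes at or above the threshold, the toy tree has the same ratio (10/13) as the one reported for the 23 locus ML trees.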

We generated a distribution of the HSN index for the simulations, and its 95% confidence interval (CI), by bootstrapping the simulations’ HSN index values 10,000 times in R 3.0.1 (R Core Team, 2010). We mapped the HSN values of the empirical loci onto this distribution to look for loci with support different from that expected by chance and to distinguish between high and low variability-prone loci.
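The bootstrap step can be sketched as follows. The original analysis was run in R and the text does not state which statistic was resampled, so this Python version makes two explicit assumptions: the bootstrapped statistic is the mean HSN value, and the interval is a percentile interval. The function name `bootstrap_mean_ci` and the input values are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Assumption: the statistic of interest is the mean; the source text only
    says that HSN values were bootstrapped 10,000 times.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = means[round((alpha / 2) * n_boot)]
    upper = means[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical HSN values for 200 simulated loci (k supported clades of 13).
toy_rng = random.Random(0)
toy_values = [toy_rng.randint(0, 6) / 13 for _ in range(200)]
ci_low, ci_high = bootstrap_mean_ci(toy_values)
print(f"95% bootstrap CI of mean HSN: {ci_low:.3f} to {ci_high:.3f}")
```

An empirical locus whose HSN value falls above `ci_high` would then be flagged as supporting more clades than the simulations predict.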

Sequence length is a primary factor determining variability below the species level because shorter loci carry less genetic information (Carling & Brumfield, 2007), yet the effects of locus length have not been tested at the species level despite the fact that phylogenetic “informativeness” and level of resolution are also correlated with locus length (Rokas & Carroll, 2005; Townsend 2007). In addition, male-driven molecular evolution predicts that loci on male-associated sex chromosomes will have higher mutation rates (Miyata et al., 1987; Ellegren & Fridolfsson, 1997) that could yield higher phylogenetic resolution. We therefore tested for the effects of locus length by fitting linear regressions of the HSN index on bases per locus by type of loci (sex-linked, autosomal, and simulations) to distinguish between ploidy and length effects on phylogenetic power.
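The regression of the HSN index on locus length, fitted separately for each type of locus, can be sketched with ordinary least squares. The function `linear_fit` and all data values below are hypothetical and serve only to show the calculation of the slope, intercept, and R².

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# Hypothetical (length in bp, HSN) values for each type of locus.
toy_loci = {
    "sex-linked": ([300, 450, 600, 800, 1000], [0.15, 0.23, 0.31, 0.38, 0.54]),
    "autosomal": ([250, 400, 650, 900, 1000], [0.31, 0.15, 0.38, 0.23, 0.31]),
}
for locus_type, (lengths, hsn_values) in toy_loci.items():
    slope, intercept, r2 = linear_fit(lengths, hsn_values)
    print(f"{locus_type}: slope = {slope:.5f} per bp, R2 = {r2:.2f}")
```

Comparing slopes and R² among locus types is what separates a length effect (shared slope) from a ploidy effect (different slopes or intercepts).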


Locus by locus contribution and clade support testing. We created a series of locus rarefaction analyses to assess the gene-specific support of each clade in the

Campylorhynchus phylogeny. For this series of tests we employed the 23-locus dataset excluding mtDNA. We designed three types of randomizations for the rarefaction analyses (OSF 4, see Abstract): 1) autosomal loci only, 2) sex-linked loci only, and 3) combined autosomal and sex-linked loci. Our randomizations included five draws of different numbers of loci used in phylogenetic estimation, from one locus to ten loci for the first two types (autosomal versus sex-linked) and 11 to 21 loci for the combined type.

These randomizations excluded one locus (21 out of 22) to allow for five different randomizations to be generated for each number of loci. Due to the computational burden of SP tree analyses, these randomizations were based on BY concatenated analyses. We assessed the presence of different clades in the Campylorhynchus phylogeny in the tree posterior distribution of two runs of the five randomizations per number of loci (1-21) in the program AWTY (Nylander et al., 2008) for a total of 10 runs per locus randomization. Autosomal only and sex-linked only randomizations were run for 2 million generations as ESS values were adequate (see RESULTS) and the combined loci randomizations were run for 5 million generations to ensure ESS values of at least 200.

We averaged the presence of any given clade across randomizations and runs to create rarefaction curves with their respective standard errors to identify the number of loci necessary to support a clade at a minimum of 95% posterior probability. The only exception for these locus randomizations was for n=10 autosomal, because we did not have 11 autosomal loci to randomize five independent draws. Thus, iteration “10” for

autosomal loci was included for graphical and plotting purposes only as it is the average of ten runs of the ten autosomal loci and not the product of real iterations from different random draws.
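The logic of the locus rarefaction design can be sketched as follows: draw five distinct random subsets of k loci, summarize clade presence across runs as a mean and standard error, and read off the smallest k at which the mean reaches 0.95. This is a minimal illustration and not the custom scripts used in the study; all function names, locus labels, and frequencies are hypothetical, and requiring the five draws to be distinct is an assumption.

```python
import math
import random
import statistics


def draw_locus_subsets(loci, k, n_draws=5, seed=0):
    """Return `n_draws` distinct random subsets of `k` loci."""
    if math.comb(len(loci), k) < n_draws:
        raise ValueError("not enough distinct subsets for this many draws")
    rng = random.Random(seed)
    draws = set()
    while len(draws) < n_draws:
        draws.add(tuple(sorted(rng.sample(loci, k))))
    return sorted(draws)


def mean_and_se(frequencies):
    """Mean clade presence and its standard error across runs."""
    mean = statistics.fmean(frequencies)
    se = statistics.stdev(frequencies) / math.sqrt(len(frequencies))
    return mean, se


def loci_needed(curve, threshold=0.95):
    """Smallest number of loci whose mean clade presence reaches threshold."""
    for k in sorted(curve):
        if curve[k] >= threshold:
            return k
    return None


nuclear_loci = [f"locus{i:02d}" for i in range(1, 23)]  # 22 placeholder names
subsets = draw_locus_subsets(nuclear_loci, k=21)
toy_curve = {1: 0.40, 2: 0.81, 3: 0.96, 4: 0.99}  # hypothetical means
print(len(subsets), "draws of 21 loci;", loci_needed(toy_curve), "loci needed")
```

The same check explains the exception noted above: with only ten autosomal loci there is a single subset of size ten, so `draw_locus_subsets` raises an error when asked for five draws of k = 10.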

Historical Biogeography and Behavioral Evolution. Wrens in the genus

Campylorhynchus offer an excellent opportunity to test hypotheses regarding the GABI and the evolution of complicated behavioral syndromes including cooperative breeding and vocal duetting. Consequently, we time calibrated the 5 locus dataset’s species tree from *BEAST using the substitution rates and clock prior we mentioned previously. The distribution area for each OTU was coded following Selander (1964), assigning lineages north of the Isthmus of Panama as present in North America, and those south of the isthmus as South American. The behaviors (cooperative breeding and duetting) were coded as presence/absence (Selander, 1964; Barker, 1999; Brewer, 2001). Although the evolution of these wrens’ behaviors and their biogeography merit a separate study using sophisticated model-based methods, for illustrative purposes, we employed a simple parsimony-based ancestral state reconstruction. We present these scenarios to show the implications of having fully resolved phylogenies.
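A parsimony-based reconstruction of a presence/absence character can be illustrated with the downpass of the Fitch algorithm. The sketch below is not the software used for the analyses; the four-tip tree, the tip labels, and in particular the coding of the outgroup as lacking the behavior are hypothetical choices made only to show how an equivocal reconstruction arises.

```python
def fitch(tree, tip_states):
    """Fitch downpass on a rooted binary tree given as nested 2-tuples.

    Returns (state_set_at_root_of_subtree, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, tip_states)
    right_set, right_cost = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical coding: 1 = cooperative breeding present, 0 = absent.
toy_states = {
    "outgroup": 0,
    "cactus_wren": 0,
    "Campylorhynchus_clade": 1,
    "Heleodytes_clade": 1,
}
toy_tree = ("outgroup",
            (("cactus_wren", "Campylorhynchus_clade"), "Heleodytes_clade"))

root_set, n_changes = fitch(toy_tree, toy_states)
print("root state set:", sorted(root_set), "minimum changes:", n_changes)
```

In this toy case the root set contains both states and two changes are required, which corresponds to two equally short explanations: one gain followed by one loss, or two independent gains. A complete reconstruction would add the Fitch uppass to assign final state sets to internal nodes.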

RESULTS

Phylogenetic trees. Our taxonomic sampling produced sequences for all individuals and all loci, except for one outgroup at one locus (PPDW1, Thryothorus genibarbis).

This sampling design covers 100% of the currently recognized Campylorhynchus species and 70% (24/36) of distinct populations recognized as subspecies (Selander, 1964;


Brewer, 2001) or species (Zink et al., 2001, Vázquez-Miranda et al., 2009). All of our trees for the 5 locus and the 23 locus datasets recovered Campylorhynchus wrens as monophyletic, although we did not test for ingroup monophyly exhaustively. The monophyly of this genus is well supported by previous studies of mtDNA and multilocus data (Barker, 2004; Barker, 2007; Vázquez-Miranda et al., 2009) as reflected in our 5 locus dataset and assumed in our 23 locus dataset.

Five locus dataset. These data were designed to test for monophyly of ingroup species. For the 5 locus dataset all our tree estimates (ML, BY, and SP) with or without mtDNA found the three clades that correspond to subgeneric classification (Selander,

1964; with the addition of C. brunneicapillus as a third “major” lineage) with substantial statistical support, corresponding to at least 70% bootstrap replicates and 0.95 posterior probabilities (Fig. 2.1; Hillis & Bull, 1993; Huelsenbeck et al., 2001; respectively). As seen in previous studies (Barker, 2007; Vázquez-Miranda et al., 2009) these include: (1) the brown species with vertical dorsal streaks or “Heleodytes” clade, (2) the gray species with horizontal dorsal bars or “Campylorhynchus” clade, and (3) the two species of cactus wren sensu Zink et al. (2001).

The 5 locus dataset includes 99 individuals and 24 OTUs that represent 70% of the described diversity in the genus Campylorhynchus (Fig. 2.1). The five loci included

(two mitochondrial and four nuclear genes) exhibited variation consistent with the neutrality assumption of phylogenetic reconstruction (HKA test: χ² = 2.24, P = 0.82, d.f. = 5).


ML and BY trees from concatenated analyses were nearly identical: the biggest difference between these two methods was a larger proportion of unresolved clades in the majority-rule consensus ML trees (Fig. 2.1). Concerning the earliest split in this genus, all concatenated analyses placed the cactus wren lineage as sister to the

“Campylorhynchus” clade, albeit with non-significant nodal support in most of them. The largest difference between trees depended on the inclusion of mtDNA in the concatenation. In the ML tree without mtDNA (Fig. 2.1A) only 29% of the OTUs (7/24) were monophyletic with ≥ 70% bootstrap support, and the HSN index for the nodes joining all OTUs was 0.56 (13 supported nodes/23 total nodes). The node supporting the cactus wren lineage as sister to the “Campylorhynchus” clade was not well supported (62% bootstrap). In contrast, the ML tree including mtDNA had 100% of the OTUs statistically supported as monophyletic (Fig. 2.1D). Additionally, its HSN index was 0.83 (20 supported clades/23 total nodes). Including mtDNA, the cactus wren lineage and the

“Campylorhynchus” clade sister relationship was still only moderately well supported

(72% bootstrap). Similarly, the BY tree excluding mtDNA (Fig. 2.1B) showed 58% of the OTUs (14/24) as monophyletic with ≥ 0.95 posterior probability support and a HSN index of 0.43 (10 supported clades/ 23 total nodes). The cactus wren lineage was in the exact same place in the BY tree topology with non-significant support (0.62 posterior probability). The BY tree including mtDNA (Fig. 2.1E) had 96% of the OTUs statistically supported as monophyletic with a HSN index of 0.87 (20 supported clades/

23 total nodes). With the inclusion of mtDNA, the cactus wren lineage showed the same topological placement in ML and BY trees with significant nodal support (0.99 posterior

probability, compared to 72% bootstrap in ML). Bayesian trees estimated under the species-tree (SP) paradigm (Edwards et al., 2007, Heled & Drummond 2010) showed topological features that were overall similar to those of the previously described concatenated analyses.

The main differences were in the generally lower levels of nodal significance when compared to BY trees and the placement of the cactus wren lineage. Species-tree methods assume each taxon is monophyletic, and estimation of the species tree relies on allele assignments defined a priori, because the unit of analysis is each OTU rather than each allele (Edwards et al., 2007; Knowles & Kubatko, 2010); therefore, we could not quantify the amount of OTU monophyly for our SP trees. Excluding mtDNA, the SP tree

(Fig. 2.1C) HSN index was 0.48 (11 supported clades/ 23 total nodes). The SP tree that included mtDNA (Fig. 2.1F) had a HSN index of 0.61 (14 supported clades/ 23 total nodes). For both SP trees the position of the cactus wren lineage was equivocal: it was sister to all other Campylorhynchus in the tree with no mtDNA (Fig. 2.1C) and sister to the “Heleodytes” clade with the inclusion of mtDNA (Fig. 2.1F) though both relationships were poorly supported (0.83 and 0.45 posterior clade credibility, respectively).

The relationships between Campylorhynchus OTUs (OSF 3) were similar for all of our trees in the sense that topological conflicts were few and had low support values.

The following are the strongly supported topological agreements in every tree (Fig. 2.1):

1) within the “Heleodytes” clade, the earliest split was yucatanicus; 2) griseus + minor

(nominate taxon C. griseus), and chiapensis were sister taxa; 3) there were two clusters within the “Campylorhynchus” clade, hypostictus + unicolor (nominate taxon C.

turdinus) and nuchalis were one subclade while zonatus + brevirostris + panamensis + costaricensis (all nominate taxon C. zonatus), megalopterus, albobrunneus, and fasciatus

+ pallescens (both nominate C. fasciatus) were sisters in a second subclade. The differences between ML, BY, and SP trees were the ambiguous position and nodal support for several relationships towards the tips of the phylogeny. The relationships between jocosus, gularis, and the five rufinucha OTUs were unclear with non-significant nodal support in all trees that excluded mtDNA. As noted previously (Barker, 2007), paraphyly was pervasive for the four OTUs classified under nominate taxon

Campylorhynchus zonatus (banded-backed wren) and their relationship to other OTUs with geographically adjacent lineages: zonatus from Mexico was sister to megalopterus

(nominate Campylorhynchus megalopterus); Central American OTUs (costaricensis and panamensis) were sister to a South American clade; brevirostris was sister to albobrunneus (nominate Campylorhynchus albobrunneus) within a clade sister to fasciatus and pallescens (the last two belonging to nominate taxon Campylorhynchus fasciatus). The trees that included mtDNA had increased nodal support for marginally significant relationships, whereas nodes with low support remained ambiguous.

Twenty three locus dataset. These data were designed to test relationships between currently recognized species. These trees, with the same analytical strategy described previously, included the 13 nominate Campylorhynchus lineages plus the divergent cactus wren from Baja California (affinis), rooted with a close relative, the

Bewick’s wren (Thryomanes bewickii) for a total of 15 taxa. Similar to the five locus dataset, these analyses were also consistent with an assumption of neutrality (HKA test:


χ² = 20.84, P = 0.59, d.f. = 23). The trees for all analyses were largely congruent with the results reported for the 5 locus dataset. All trees were almost identical, with the main differences in nodal support values towards the tips. Only one out of six trees (SP including mtDNA) placed the cactus wren sister to the “Heleodytes” clade; the other five trees, concatenated and species trees, with or without mtDNA, placed the cactus wren lineages (brunneicapillus and affinis) as sister to the “Campylorhynchus” clade (Fig. 2.2).

The placement of the cactus wren was not strongly supported in the dissenting tree (Fig.

2.2F) and the alternative placement was strongly supported in the two concatenated BY trees (Fig. 2.2B, 2.2E). The relationships between nominate lineages in all trees (Fig. 2.2) were as follows: the “Heleodytes” clade contained griseus sister to chiapensis in a core clade sister to rufinucha and gularis; the position of jocosus was unambiguous and sister to this core clade. The earliest split within “Heleodytes” was yucatanicus versus the remaining species. The “Campylorhynchus” clade included megalopterus sister to zonatus in a clade with albobrunneus and fasciatus; nuchalis was sister to turdinus and this clade was sister to the one previously mentioned. Note that one individual from hypostictus was included as representative of Campylorhynchus turdinus because we could not include samples from the nominate taxon (C. t. turdinus). In the SP tree with mtDNA (Fig. 2.2F) the relative order of jocosus, gularis, and rufinucha was ambiguous.

The overall nodal support for concatenated trees was higher than in species trees. The

ML trees without and with mtDNA had an HSN index of 0.77 (Fig. 2.2A, 2.2D; 10 SSC/

13 nodes), whereas both types of BY trees had an HSN index value of 1.0 (Fig. 2.2B,


2.2E; 13 SSC/13 nodes). In contrast, SP trees had the lowest HSN index values: without mtDNA 0.54 (Fig. 2.2C; 7/13 nodes) and with mtDNA 0.69 (Fig. 2.2F; 9/13 nodes).

Gene-tree concordance/discordance. Summarizing the complexity of gene tree stochasticity into a majority-rule tree or a posterior credibility tree can potentially mask agreements or disagreements between the sorting of genes in the speciation process.

Cloudograms are one way to visualize the posterior distribution and show branch densities related to agreements or disagreements. Darker colors and bifurcating branches indicate gene-tree congruence. Lighter colors and reticulate branches or “webs” indicate gene-tree incongruence. Our cloudograms (Fig. 2.3) showed contrasting lineage definitions depending on the number of included loci. In the 5 locus dataset, nuclear loci by themselves defined the same three lineages found with traditional methods, but the resolution level at the tips was poor, emphasizing the presence of incomplete lineage sorting

(and/or gene flow) in recently derived lineages. Most of a lineage’s definition in a species tree setting appears to come from the mtDNA gene trees when there is little conflict between nuclear loci. In contrast, species trees built using 23 loci yielded more gene-tree conflict when including mtDNA. The 23 locus species tree without mtDNA was topologically identical to the concatenated BY trees. The only difference is nodal support in different phylogenetic methods. Relatively short branches on species trees have poor posterior clade credibilities whereas posterior probabilities on concatenated trees are all above the 0.95 threshold. Longer internodes are well supported regardless of the tree construction method.


Gene tree simulations. Phylogenetic datasets generated using Sanger sequencing generally include a limited number of independent loci when compared to Next

Generation methods that capture hundreds to thousands of loci. We generated simulations following the stochasticity of the coalescent to estimate the effects of varying sample size on phylogeny estimation in Campylorhynchus. We calculated the HSN index for 500 simulated loci and compared its distribution to the loci used for the Campylorhynchus phylogeny (Fig. 2.4). The density distribution of our simulations showed most of the empirical loci falling within the 95% CI of gene tree HSN values’ stochasticity. Only one sex-linked and two autosomal loci supported more clades than expected by a random coalescent process (GPR98, GH and RHOD1, respectively; see OSF 3). These three loci were among the longest (~1000 bps). The number of supported clades in a locus was correlated with locus length in our simulations (R² = 0.62, P < 0.001) and in the empirical sex-linked loci (R² = 0.51, P < 0.01), whereas autosomal loci showed no such correlation (R² = 0.02, P = 0.62).

Clade support by locus type through randomizations. Testing the effects of adding and subtracting different types of loci on phylogenetic reconstructions allows for detecting and assessing the contribution of each locus to the overall support of clades. We compared three kinds of randomizations to test each type of locus contribution to the

Campylorhynchus phylogeny under a Bayesian framework: Sex-linked loci only, autosomal loci only, and combined loci (Fig. 2.5). Each randomization consisted of five independent draws from the set of 22 nuclear loci (23 locus dataset) times two runs each for an averaged value of clade presence in the posterior distribution of trees and its

associated standard error. We detected three general patterns depending on the relative length of the internodes (Fig. 2.5). Pattern one: long branches needed no more than two sex-linked loci to be present in the 95% posterior distribution of trees, whereas three to five autosomal loci were needed to find the same clades. Consequently, combined loci randomizations found clades on long branches every time. Clades on a medium internode behaved similarly; half the number of sex-linked loci was needed to find them compared to only incorporating autosomal loci (four to eight, respectively). Clades on short internodes in comparison needed between 18 and 20 combined loci to plateau in at least

95% presence on the tree posterior. In these cases neither type of locus was able to find clades reliably on its own. Pattern two: clades produced by randomizations based on a single locus were not present in 95% of the posterior tree distribution, regardless of ploidy and internode length. In most cases sex-linked loci were closer to the 95% threshold but the randomizations’ standard errors generally overlapped with those of autosomal loci. Pattern three: erroneous clade presence is more prevalent on sex-linked loci trees but the addition of many loci mitigates their existence in the posterior distribution. Trees from a single locus had fewer erroneous clades (including the outgroup Thryomanes as part of the ingroup, <30%) but this number increases to over

70% in combined analyses. By adding most of the 23 locus dataset (21 out of 22 loci) the presence of erroneous clades virtually disappears (<2%).

Historical biogeography and behavioral evolution. Our time reconstruction based on the species tree dataset including all individuals and five loci (Appendix 1) showed the split between Campylorhynchus and its sister lineage Thryomanes happened


12 mya (11-16 mya HPD), in the Miocene. This split followed a 38 mya divergence (28-

50 mya HPD) between the Holarctic Sitta nuthatches and the largely New World gnatcatchers and wrens in the families Polioptilidae and Troglodytidae, in accord with previous estimates (Barker et al., 2004). The radiation within Campylorhynchus and the separation between the mostly North American and predominantly South American clades occurred in the Late Miocene or at the Miocene/Pliocene boundary 7 mya (6-8 mya HPD), somewhat older than previously estimated (Barker, 2007). The speciation process within North and South America ensued in an equivalent timeframe 5 mya (4-6 mya HPD) spanning the final closure of the Isthmus of Panama. Most of the currently recognized species diverged from their sister lineages during the Pliocene or at the

Pliocene/Pleistocene boundary (1.5-3 mya) and the separation between OTUs occurred during the Pleistocene or at the Pleistocene/Holocene boundary. Our parsimony ancestral area reconstruction found three independent transitions from North America into South

America. One occurred at the limit of the final closure of the Isthmus of Panama and the other two happened after isthmian closure in a 3-4 mya timeframe. One of the two recent events is trans-Andean including OTUs pallescens, fasciatus, albobrunneus, and brevirostris. The older event and the other recent event correspond to two crossings into the cis-Andean region, the first including hypostictus, unicolor, and nuchalis and the second griseus and minor. The position of the cactus wren lineage (C. brunneicapillus and affinis) in the phylogeny significantly complicates inference of behavioral evolution, yielding two equally parsimonious explanations (Appendix 1). In the first scenario, the most recent common ancestor of Campylorhynchus evolved cooperative breeding and

51 song duets, and these behavioral syndromes were lost in the northernmost latitudes of the

North American warm deserts in the cactus wren lineage (Appendix 1). Alternatively, the

“Heleodytes” and “Campylorhynchus” clades could have gained both syndromes independently whereas the cactus wren lineage never evolved them.

DISCUSSION

Phylogenetic analyses emphasizing multiple individuals versus multiple loci.

Systematists are faced with multiple criteria of study design but may be limited either by availability of samples or resources (Cummings & Meyer, 2005). In the current study we found that when emphasizing taxonomic representation and number of individuals over number of loci, the resolution of relationships near the tips and at early splits with short internodes was poor when only a handful of loci were included in the analyses (Figs. 2.1, 2.3). Adding mtDNA to the five nuclear loci we sampled did improve resolution towards the tips but the earlier splits remained ambiguous (Fig. 2.1). In analyses maximizing number of loci over individuals, more resolving power was gained towards the root of the tree. Recent splits or clades subtended by short internodes were complicated and many loci were necessary to even begin to sort out their relationships (Fig. 2.5). The more loci sampled, the greater the chances of finding gene trees that will support a given split (Edwards & Beerli, 2000; Felsenstein, 2006; Lee et al., 2008). However, the largest proportion of simulations suggests that individual loci will not support even 20% of clades within Campylorhynchus. That in turn reduces the clade presence frequency in the species tree, lowering nodal support. Gene tree conflict seems more pervasive when including mtDNA (Shaw, 2002; Wiens et al., 2010; Fisher-Reid & Wiens, 2011; this study), but why? Mitochondrial DNA is the locus that should yield the highest number of supported clades (as quantified by the HSN index) due to its higher sorting rate and rapid rate of sequence evolution. The more loci with higher phylogenetic signal, the higher the probability of conflict will be, especially when genes are the unit of analysis instead of characters. Each empirical locus, regardless of ploidy, supported barely half of the nodes in the Campylorhynchus phylogeny. This result suggests that when only considering nuclear loci, gene-tree discordance (Knowles, 2009) in Campylorhynchus could occur at most 50% of the time, as the remaining cases are soft polytomies, but conflict will be larger if mtDNA trees do not agree with nuclear loci with relatively high HSN values. If the gene-tree discordance is due to lineage sorting, conflict with mtDNA is not a surprising outcome and is indicative of the coalescent nature of the evolutionary process close to the species level. Even if conflict is exacerbated by horizontal gene transfer (Kubatko, 2009), using mtDNA in phylogenetic estimation under a species-tree paradigm allows for detecting causes of conflict when closely related taxa share identical alleles from slow-sorting nuclear loci. If only those loci with little to no variation are surveyed, distinguishing patterns of common descent from sharing alleles due to gene flow becomes difficult. In addition, our results show that under a species-tree paradigm, a handful of nuclear loci do not contribute much to the phylogeny. Even though any study including at least two loci is ‘technically’ multilocus, the resulting species trees are primarily driven by the most informative loci, in many cases mtDNA, due to the absence of variation in slower-evolving loci (Knowles, 2009; Kubatko, 2009). In our analyses, consistent statistical support may not appear for short and medium internodes until as many as 21 loci are included. These results suggest that multilocus phylogenetic analyses should take the “multi-” part of the name more seriously.

Phylogenetic resolving power of a locus: is it length, ploidy, or how it’s used?

Our analyses began with a typical phylogenetic approach of tree reconstruction, from concatenation to species tree methods for full datasets, and then went through dissection of phylogenetic power locus-by-locus and by locus interaction. The distribution of individual gene tree variability showed that our empirical loci follow a neutral coalescent process just like the simulations, and that the outlier loci we detected are not the product of selection. More importantly, each locus within the HSN distribution 95% CI (Fig. 2.4) individually supports a small portion of clades regardless of the marker’s ploidy. In this by-locus framework, the significant predictor of clade support was length: the longer the locus, the more statistically supported clades it yields. A similar trend is found at the population level, where locus length is a primary factor determining the accuracy of effective population size estimation (Carling & Brumfield, 2007). Nevertheless, our longest loci supported only 30-50% of clades. The only locus that supported less than 30% of clades and was longer than 1000 bps was the protein-coding autosomal gene RAG1. This finding suggests that phylogenetic approaches studying recent radiations will find more informative loci if the markers are long, but at this evolutionary scale protein-coding genes are not ideal. New phylogenomic approaches use baits that capture loci of 400-600 bps (Faircloth et al., 2012; Lemmon et al., 2012). Assuming a study system similar to the one we present here, these novel approaches would likely produce loci supporting less than 20% of clades per locus. Moreover, sequence loci from single nucleotide polymorphism (SNP) discovery approaches (Eaton & Ree, 2013; Hipp et al., 2014) yield fragments of 50-200 bps that at most will produce gene trees supporting 15% of clades. We predict bait capture approaches with tiling probes producing loci over 1000 bps will have better results because shorter loci remain within the HSN index CI, supporting only 0-30% of clades (Fig. 2.4). Further, sequence loci from SNP discovery approaches that barely reach 200 bps will have a hard time yielding even a single supported clade; therefore, it is likely that concatenated approaches or SNP-based species-tree algorithms (Bryant et al., 2012) will be the most feasible alternative because those gene trees will mostly produce polytomies (Lemmon & Lemmon, 2013).

Although one sex-linked gene had the highest HSN, this type of locus also had the largest variability in support, with one marker yielding zero supported clades although each locus was variable. From a by-locus standpoint, our results suggest that sex-linked loci are not drastically more informative than autosomal loci when all clades on a tree are considered equal. Anecdotal evidence suggests that a single sex-linked locus, as sampled in most studies, can be equally variable or even less variable than an autosomal marker. Consistent with this, we have shown that full polytomies (HSN = 0.0) are expected under a neutral coalescent process with loci of the size and evolutionary rate of those sampled in this study. However, we also show that increasing the number of sampled sites for a sex-linked locus strongly increases the number of recovered clades, whereas this relationship is essentially flat for autosomal loci. If only a single locus is sampled, chances are that it will be barely informative, in particular if it is a short sequence. Our results emphasize how sampling multiple sex-linked loci may be more important than sampling the rest of the genome. For instance, our randomizations show that when branches are relatively long, fewer sex-linked loci than autosomal markers are needed to find clades in the posterior distribution of trees. Further, when interacting in a concatenated framework, fewer than three sex-linked loci are needed to reliably find clades whereas up to eight autosomal loci are needed for the same branch. For example, the mostly North American Heleodytes clade was found in 80% of all individual sex-linked loci and in every tree posterior from two or more sex-linked loci. By contrast, single autosomal loci only found this clade 30% of the time and it was not until six autosomal loci were concatenated that this clade appeared in 95% of the tree posterior distribution.
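The locus randomizations described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used in this study: the function name `rarefaction_curve`, the locus labels, and the toy scoring rule are hypothetical. In a real analysis `score_subset` would concatenate the sampled loci, run the tree inference, and return the fraction of the posterior containing the clade of interest.

```python
import random
from statistics import mean


def rarefaction_curve(loci, score_subset, sizes, n_reps=100, seed=1):
    """Average a clade-support score over random locus subsets of each size."""
    rng = random.Random(seed)  # fixed seed so the randomization is repeatable
    loci = list(loci)
    curve = {}
    for k in sizes:
        if not 1 <= k <= len(loci):
            raise ValueError(f"subset size {k} outside 1..{len(loci)}")
        # Draw k loci without replacement, n_reps times, and average the score.
        curve[k] = mean(score_subset(rng.sample(loci, k)) for _ in range(n_reps))
    return curve


# Toy stand-in for tree inference (hypothetical): the clade counts as "found"
# when at least one locus in the subset belongs to a made-up informative set.
LOCI = [f"locus{i}" for i in range(1, 11)]
INFORMATIVE = {"locus1", "locus2", "locus3"}


def toy_score(subset):
    return 1.0 if INFORMATIVE.intersection(subset) else 0.0


curve = rarefaction_curve(LOCI, toy_score, sizes=[1, 3, 5, 8, 10])
```

Running the same function separately on sex-linked, autosomal, and combined locus lists would give one curve per marker class, which is the comparison drawn in Fig. 2.5.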

Few studies have employed the power of sex-linked loci for phylogenetic reconstruction (Jacobsen & Omland, 2011), and they have rarely compared their phylogenetic power to that of autosomes (but see Corl & Ellegren, 2013). Corl & Ellegren (2013) found that sex-linked loci yield gene trees closer to the species tree and that fewer sex-linked genes are needed to find the species tree in the posterior. Our results are congruent with theirs. Nevertheless, previous studies have not dissected the tree into its clade-specific components or examined the effects of locus or internode length. Interestingly, trees from Corl & Ellegren (2013) included a deeper temporal scale (8-30 mya) and ours was shallower (1-8 mya). Finding that sex-linked loci outperform autosomes at multiple temporal scales further establishes the utility of phylogenetic markers linked to sex chromosomes, in particular when limited resources are available.


Other power analyses have also found that relatively small fractions of a dataset may be necessary to recover statistically supported clades (Rokas et al., 2003). As expected, very short branches seem hard to resolve regardless of marker ploidy. In our study, branches supporting the three main Campylorhynchus clades and the clade containing C. gularis and C. rufinucha each needed 81-86% of loci present to find those groups in a significant fraction of the tree posterior. These results indicate that these divergences occurred faster than the average coalescence time for nuclear markers. In these cases adding informative loci will be critical to resolving relationships reliably.
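The link between short internodes, coalescence time, and marker type can be illustrated with the standard three-taxon coalescent result: a gene tree matches the species tree with probability 1 - (2/3)exp(-T), where T is the internode length in coalescent units. The sketch below is not an analysis from this study. The function name, the example values, and the relative effective population sizes (3/4 for sex-linked and 1/4 for mitochondrial loci, which assume an equal sex ratio and equal variance in reproductive success) are illustrative assumptions.

```python
import math

# Effective population size of each marker class relative to autosomes
# (textbook values under an equal sex ratio; an assumption, not an estimate).
RELATIVE_NE = {"autosomal": 1.0, "sex_linked": 0.75, "mtDNA": 0.25}


def congruence_probability(generations, diploid_ne, marker="autosomal"):
    """P(gene tree matches species tree) for three taxa, one allele per taxon.

    generations: length of the internal branch in generations.
    diploid_ne:  autosomal effective population size of the ancestral lineage.
    """
    # Internode length in coalescent units for this marker class.
    t = generations / (2.0 * diploid_ne * RELATIVE_NE[marker])
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


# Hypothetical short internode: 50,000 generations with Ne = 100,000.
for marker in RELATIVE_NE:
    print(marker, round(congruence_probability(50_000, 100_000, marker), 3))
```

For any internode longer than zero the probability is highest for mtDNA, intermediate for sex-linked loci, and lowest for autosomal loci, and all three approach 1/3 as the internode shrinks, which is the regime in which the number of loci matters most.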

Some taxonomic groups such as salamanders and lungfish have massive genomes (Gregory, 2005; Sun et al., 2012) and pose significant challenges to phylogenomic approaches targeting de novo loci. Our results indicate that in such cases focusing on a handful of sex-linked genes could be adequate to resolve some clades and that hundreds or thousands of loci would not be required to get a fully resolved and well-supported phylogeny. One substantial difference between our nuclear concatenated and species trees is the statistical support on very short internodes. The species trees had nodal support ranging between 0.41 and 0.83 whereas the same nodes had probabilities of 1.00 in concatenated analyses, showing how concatenation tends to overestimate nodal support (Edwards et al., 2007). Although these differences do not alter our conclusions because concatenated and species trees were topologically identical, they do emphasize the need for even more loci to recover every node with high support under a species tree paradigm.

Campylorhynchus phylogeny, biogeography, and behavioral evolution before and after multilocus data. The current paradigm in systematic biology is that sampling large numbers of loci (often orders of magnitude more than sampled in phylogenetic studies merely a decade ago) will translate into higher resolution (Edwards et al., 2007; Faircloth et al., 2012; Lemmon et al., 2012; Smith et al., 2013), less conflict (McCormack et al., 2009) and tighter confidence intervals on times of divergence (Lee et al., 2008). Here, we address the practical question of what is gained in phylogenetic information by increasing the number of loci not by an order of magnitude, but by twenty-fold. In the broadest sense, the results presented here are highly congruent with an earlier attempt to solve evolutionary relationships in the genus Campylorhynchus using morphology and mitochondrial DNA. Our phylogenetic reconstructions recovered the two clades that correspond to brownish taxa with vertical streaks and grayish taxa with horizontal bars that had been proposed by general plumage characters and behavior before molecular phylogenetic methods were adopted (Selander, 1964); these two splits were also reflected in the mitochondrial phylogeny (Barker, 2007). The timing of speciation events was also congruent: when using similar mtDNA rate calibrations, even when a handful of nuclear rates are added, confidence intervals overlap. These similarities would indicate that including 23 loci instead of one does not translate into a significant phylogenetic improvement. However, our study yielded a fully resolved tree whereas mitochondrial studies yielded multiple unsupported nodes (Barker, 2007; Vázquez-Miranda et al., 2009). The most substantial difference was the ability to resolve nodes associated with very short branches that needed at least 19 nuclear loci to be recovered reliably.


The GABI was a significant event that led to major rearrangements in the New World’s biota, including speciation and extinction (Marshall et al., 1982; Simpson & Neff, 1985; Weir et al., 2009; Cody et al., 2010; Smith & Klicka, 2010; Pinto-Sánchez et al., 2012). For Neotropical taxa, it is often assumed that South America was the ancestral area and that species migrated north primarily after the Isthmus of Panama closed. However, many taxa, including some placental mammals (carnivores, deer, camels, rodents), pines, and emberizoid birds, had the opposite biogeographic pattern (Webb, 2006; Woodburne, 2010; Winger et al., 2014). Although New World wrens are most diverse in South America, their ancestral area has been hypothesized to be North America (Mayr, 1946; Barker et al., 2004). The wrens in the genus Campylorhynchus support this north-to-south biogeographic pattern following basal diversification events in Mexico with dispersal into northern South America. The most significant difference with the biogeographic scenario inferred from mitochondrial DNA alone (Barker, 2007) is that our multilocus calibration overlaps with an older date of the isthmus’ final closure of 4 mya (Kirby et al., 2008) instead of a closure time span of 2.5-3.1 mya (Webb, 1991) and that there is no nodal support ambiguity. Nevertheless, our results show how important the Isthmus of Panama was in promoting biotic interchange and subsequent diversification. Sister relationships in the genus occur between species that have allopatric distributions in radically different habitats (Selander, 1964; Brewer, 2001). For example, C. zonatus from Mexico lives in humid lowland evergreen forest whereas C. megalopterus is restricted to highland coniferous forests; similarly, C. gularis can only be found on the highland conifer-oak ecotone whereas its sister C. rufinucha requires lowland semideciduous dry forests (Brewer, 2001). In the few cases of overlapping distributions, sympatric species are not closely related (C. chiapensis and C. rufinucha in southern Mexico, and C. griseus and C. nuchalis in northern South America), and in these cases of sympatry the species pairs have drastic differences in body size (Selander, 1964). These distinct phenotypes may experience reduced ecological competition (Hutchinson, 1959), allowing taxa to co-exist, in particular when sufficient evolutionary time and divergence prevent exclusions (Pigot & Tobias, 2013; Zink, 2014). Thus, it is likely that species interactions and environmental and geographic preferences are responsible for this group’s speciation, in addition to specific biogeographic events.

We found two equally parsimonious scenarios for the evolution of cooperative breeding and duetting behaviors. They either emerged as single events prior to Campylorhynchus diversification and were subsequently lost in the cactus wren lineage (OTUs brunneicapillus and affinis), or were independently gained by the ancestors of two out of the three clades in the Campylorhynchus phylogeny. Although our parsimony reconstruction cannot distinguish between these two possibilities, it does suggest that both behaviors could be correlated. The loss or lack of these two syndromes in the cactus wren lineage may be correlated with its largely temperate distribution (Farley, 1996) and the high aridity of North America’s warm deserts (Barker, 1999; Riddle & Hafner, 2006; Jetz & Rubenstein, 2011). As stated previously, a more thorough analysis of the factors behind the evolution of these two behavioral syndromes is necessary to test for correlation of these behaviors with environmental and other ecological factors.
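The tie between the two scenarios can be reproduced with a small exhaustive parsimony count on a simplified tree. The sketch below is a hypothetical illustration rather than the reconstruction run for this study: it collapses the phylogeny to one outgroup and the three main clades, codes each behavioral syndrome as a binary character (1 = present, 0 = absent), and fixes the root state at 0 on the assumption that the behaviors were absent in the ancestor. The node names `ingroup` and `inner` are made up for the example.

```python
from itertools import product

# Tip states on the simplified tree (1 = cooperative breeding/duetting present).
TIPS = {"outgroup": 0, "Heleodytes": 1, "cactus_wren": 0, "Campylorhynchus": 1}

# Each internal node and its two children; the cactus wren lineage is sister
# to the "Campylorhynchus" clade, as recovered in the species tree.
TREE = {
    "root": ("outgroup", "ingroup"),
    "ingroup": ("Heleodytes", "inner"),
    "inner": ("cactus_wren", "Campylorhynchus"),
}


def count_changes(states):
    """Number of branches whose two ends differ in state."""
    return sum(
        states[parent] != states[child]
        for parent, children in TREE.items()
        for child in children
    )


def most_parsimonious(root_state=0):
    """Try every internal-state assignment and keep the cheapest ones."""
    internal = [node for node in TREE if node != "root"]
    scored = []
    for combo in product((0, 1), repeat=len(internal)):
        assignment = dict(zip(internal, combo))
        states = {**TIPS, "root": root_state, **assignment}
        scored.append((count_changes(states), assignment))
    best = min(cost for cost, _ in scored)
    return best, [a for cost, a in scored if cost == best]


best_cost, scenarios = most_parsimonious()
```

Under these assumptions two assignments tie at two changes: both internal nodes at 1 (one gain followed by a loss in the cactus wren lineage) and both at 0 (two independent gains), which mirrors the two scenarios discussed above.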


Conclusions. Studying radiations using phylogenetic methods is a complicated endeavor regardless of taxonomic, geographic, and temporal scales. Even though there is consensus regarding how important it is to include many loci to have better estimates of evolutionary relationships (Edwards & Beerli, 2000; Felsenstein, 2006; Lee et al., 2008), there is no agreement on the best locus sampling strategy. In this study we contrasted the level of statistical support of markers with different ploidy and length for clades in a New World wren radiation. We demonstrate that on a locus-per-locus basis length is correlated with clade support. More importantly, the number of loci necessary to support a clade varies with branch length; longer branches need fewer loci than short branches. In addition, we found that sex-linked loci tend to recover clades faster than autosomal markers of similar length when branches are relatively long. However, there are no differences in when a clade appears in the posterior distribution when it sits on a short branch, leaving the number of loci as the most important factor determining clade presence. Our species trees and concatenated analyses solved the earliest split in the genus Campylorhynchus, clarifying the position of C. brunneicapillus/affinis as sister to the largely South American “Campylorhynchus” group instead of the mostly North American “Heleodytes” group, pointing towards a clearer understanding of the group’s biogeographic history and behavioral evolution. Our results suggest it is possible to fully solve a species-level phylogeny even with relatively few, carefully selected loci. By sampling sex-linked markers, the number of loci necessary to find and support relationships can be surprisingly small. These findings become relevant for study systems where phylogenomic resources are not available or the opportunities to collect sequence information are limited. Sex chromosomes are an underutilized source of phylogenetic information and provide a useful tool to efficiently resolve phylogenies at the species level.


[Figure 2.1: six tree panels (A-F). Tips are individual samples of the Campylorhynchus OTUs (yucatanicus, jocosus, rufinucha, navarrii, humilis, gularis, capistratus, nigricaudatus, chiapensis, griseus, minor, affinis, brunneicapillus, hypostictus, unicolor, nuchalis, brevirostris, albobrunneus, megalopterus, panamensis, pallescens, fasciatus, costaricensis, zonatus, vulcanius) plus the outgroups Polioptila, Sitta, Salpinctes, Thryothorus, and Thryomanes. Node support legend: 90 ≤ BP, 70 ≤ BP < 90, BP < 70 (ML); 0.99 ≤ BP, 0.95 ≤ BP < 0.99, BP < 0.95 (Bayesian and species trees).]

Figure 2.1. Trees from analyses of the five locus data set. Top row (A-C) trees without mtDNA. Bottom row (D-F) trees with mtDNA. First column (A, D) maximum likelihood concatenation (ML) trees. Second column (B, E) Bayesian concatenation (BY) trees. Third column (C, F) species (SP) trees.

[Figure 2.2: six tree panels (A-F). Tips: Thryomanes (outgroup), yucatanicus4, jocosus1, rufinucha1, gularis1, griseus1, chiapensis2, brunneicapillus1, affinis3, hypostictus1, nuchalis1, zonatus8, megalopterus3, fasciatus1, albobrunneus4. Node support legend: 90 ≤ BP, 70 ≤ BP < 90, BP < 70 (ML); 0.99 ≤ BP, 0.95 ≤ BP < 0.99, BP < 0.95 (Bayesian and species trees).]

Figure 2.2. Trees from analyses of the 23 locus data set. Top row (A-C) trees without mtDNA. Bottom row (D-F) trees with mtDNA. First column (A, D) maximum likelihood concatenation (ML) trees. Second column (B, E) Bayesian concatenation (BY) trees. Third column (C, F) species (SP) trees. Note that hypostictus is considered the representative of Campylorhynchus turdinus.

Figure 2.3. Cloudograms of species trees. Top row, 5 locus trees. Bottom row, 23 locus trees. First column, trees without mtDNA. Second column, trees with mtDNA. Four-letter codes at the tree tips correspond to OTUs in OSF 3 (see Abstract).

[Figure 2.4: two plots. Panel A, "Effect of locus length over clade support variability": HSN index (0.0-0.6) against locus length (200-1200 bps) for simulations, empirical sex-linked loci, and empirical autosomal loci. Panel B, "Density distribution of clade support variability": density (0-5) against HSN index (0.0-0.6); legend entries: mean HSN sim. density, 95% btsp 10,000 reps., empirical sex-linked, empirical autosomal, 95% CI.]

Figure 2.4. A. Locus clade support variability as function of length (in bps; simulations were jittered for visualization purposes). B. Density distribution of the HSN index of simulated and empirical loci (jittered for visual purposes). In all cases there are 500 simulated, 12 sex-linked, and 10 autosomal loci. Empirical loci, including outliers, are neutral under the HKA test (see RESULTS).

[Figure 2.5: eight rarefaction panels around a central tree. Panels: A. rufinucha+gularis; B. Heleodytes clade; C. Heleodytes 'core' clade; D. Campylorhynchus clade; E. Campylorhynchus 'core' clade; F. nuchalis+turdinus clade; G. Outgroup as ingroup “wrong” clade; H. brunneicapillus + Campylorhynchus clade. In every panel the x-axis is the number of loci (1-21) and the y-axis is the posterior distribution presence of the clade (0-90). Legend: autosomal, sex-linked, combined. Central tree tips: Thryomanes, yucatanicus, jocosus, rufinucha, gularis, griseus, chiapensis, brunneicapillus, affinis, turdinus, nuchalis, fasciatus, albobrunneus, zonatus, megalopterus.]

Figure 2.5. Locus rarefaction curves from randomizations. Squares and solid black lines are sex-linked loci, circles and gray lines are autosomal loci, and diamonds and dashed lines are combined loci (see legend in panel B). The tree is the concatenated BY tree without mtDNA with every node strongly supported (≥ 0.95 posterior probability) with the exact same topology as its corresponding SP tree. A-H panels represent clades on the phylogeny (central figure). Panel G is an incorrect clade that includes the outgroup (Thryomanes) as part of the ingroup.

CHAPTER 3. Fluctuating fire regimes and their historical effects on genetic variation in an endangered savanna specialist

INTRODUCTION

Fire is considered a major natural architect of the world’s ecosystems and a driver of diversification. For example, Bytebier et al. (2011) found that a major shift in fire regime 20 million years ago led to a radiation of several clades of South African orchids. Savannas, a natural ecotone between grasslands and forest, depend on a critical balance of C4 grass accumulation and fire purging dense understory and closed canopies, which in turn promotes the growth of grasses and creates a positive feedback between savannas and fire (Beckage et al., 2009). The resultant woodland savanna biome provides habitat for many organisms (Skarpe, 1991). Despite these well-known effects of fire in recent ecosystems, few studies have explored the long-term relationship between organisms and shifts in fire regime.

Grasslands fluctuate with glacial cycles. In North America before the Last Glacial Maximum (LGM), at the Lower Pleniglacial (70,000 years before present, ybp), climate became drier and extensive forests transformed into grasslands and deserts. Between 60,000 and 55,000 ybp, rising temperatures favored steppe vegetation at temperate latitudes (Winograd et al., 1997; Adams et al., 1999). The current extent of grasslands throughout central North America developed after the LGM, having expanded northward as ice sheets retreated and changes in the fire regime occurred (Axelrod, 1985).

Fire-dependent savanna habitats are one of the most “globally imperiled ecosystems” due to anthropogenic pressures that have reduced the historic 20 million ha of continuous distribution from Minnesota to Texas to 12,000 highly fragmented ha after European settlement (McPherson, 1997; p. 34). One of those pressures was the suppression of fire. North American savannas expanded after the LGM when the fire cycles became more frequent in the Holocene around 10,000 ybp (Power et al., 2008; Marlon et al., 2009; Bryant, 2012). Pollen records show that in Texas juniper-oak savannas also expanded after the LGM, becoming the dominant vegetation on the Edwards Plateau and contacting prairie grasslands from the American Midwest (Bryant, 2012). These changes in the fire regime have been tied to events of rapid climate change between 15,000 and 10,000 ybp (Marlon et al., 2009) during the last glacial-interglacial transition in North America stemming from abrupt temperature changes, in particular the cooling at the Younger Dryas (~12,900-11,700 ybp; Steffensen et al., 2008). We expect changes in the amount of suitable habitat to affect population size, and consequently genetic diversity, of species that occur in obligate fire-dependent habitats. In particular, we expect a population expansion to coincide with the advent of a more active and consistent fire regime for species that are intimately associated with savannas and grasslands, followed by a very recent decline owing to anthropogenic fire suppression.

To test the fire-induced population expansion hypothesis we investigated mitochondrial and nuclear sequence diversity in an endangered vertebrate iconic of the juniper-oak savannas of the southern Great Plains, the black-capped vireo (BCVI, Vireo atricapilla). BCVI is an ideal system because its preferred nesting habitat requires a critical balance of open grassland, sparse brush, and very short and solitary trees with no canopy cover, a habitat largely dependent on fire (Axelrod, 1985; Beckage et al., 2009). There is a growing comparative framework of genetic markers for this species including allozymes (Fazio et al., 2004), microsatellites (Athrey et al., 2012a; Athrey et al., 2012b; Barr et al., 2008), and mitochondrial DNA (mtDNA; Zink et al., 2010). Although these markers provide resolution of recent demographic events, little information exists on BCVI’s deeper evolutionary history. Therefore, we sequenced multiple nuclear loci and used niche modeling to better understand BCVI population history through space and time. We suggest that knowledge of species history over both recent and historical time frames provides useful perspective for management. We tested previous conclusions that the species is a single evolutionarily significant unit. We determined whether genetic signatures of population fluctuations were consistent with fire history in the region, and we predicted that BCVI populations experienced at least one dramatic increase concomitant with the occurrence of Holocene fires 10,000 ybp.

Study species. The black-capped vireo is a monotypic species with a breeding distribution from Oklahoma (formerly as far north as Kansas) and Texas in the U.S. to Coahuila, Nuevo Leon and Tamaulipas in Mexico; it winters along the Pacific slope of western Mexico from Sonora to Oaxaca (Fig. 3.1; Grzybowski, 1995; Howell & Webb, 1995). It is listed as endangered in the United States (Ratzlaff, 1987) and Mexico (SEMARNAT, 2010), and as vulnerable by the International Union for Conservation of Nature (IUCN, 2001). Only 191 breeding pairs were found in Oklahoma and Texas in 1987 (Wilkins et al., 2006), though by 2005 the populations numbered about 6,000 breeding males, owing in part to conservation efforts and more geographically comprehensive censuses (USFWS, 2007; Wilkins et al., 2006). The main threats are thought to be land use transformation for cattle and the concomitant increase in brood parasites (brown-headed cowbirds, Molothrus ater) and predation (Smith et al., 2012). BCVI is considered a habitat specialist of juniper-oak savannas (Graber, 1961), depending on dense, brushy vegetation for nesting, and hence it is affected by native and exotic ungulate grazers (Wilkins et al., 2006) and fire suppression. High tree density renders the breeding grounds unsuitable. Periodic fires reset the successional cycle in juniper-oak savannas, providing the dense interspersed thickets BCVIs prefer for nesting.


Figure 3.1. Breeding sampling localities in Oklahoma, Texas, and Mexico, and Bayesian genotype cluster assignment of 124 individuals for K = 2 and K = 3 assuming admixture (top two) and no admixture (bottom two). Populations are arranged from north to south: Oklahoma (1), Camp Barkeley (2), Quail’s Ridge (3), Fort Hood (4), Balcones Canyon Lands (5), San Antonio (6), Kerr Wildlife Management Area (7), Dobbs Mountain (8), Kickapoo Caverns (9), Devil’s Ridge (10), Independence Creek (11), Mexico (12; it includes one individual from Coahuila, two from Nuevo Leon, and seven from Tamaulipas). The criss-cross pattern depicts current breeding (north) and wintering (south) distributions, and horizontal lines represent the previous breeding range (www.natureserve.org).


MATERIALS AND METHODS

Samples and genetic procedures. We analyzed genetic samples from BCVI individuals used in previous studies that were collected from 12 localities (Barr et al., 2008; Zink et al., 2010; OSF 5, see Abstract) representing the species’ current distribution in Oklahoma, Texas, and Mexico (Fig. 3.1), which were augmented by DNA samples from recently collected tail feathers. We used previously extracted DNA or used a standard phenol-chloroform protocol to improve DNA quality and quantity yield from tail feathers, followed by standard cold ethanol precipitation and final DNA elution in 40 µl of TE buffer, stored at -20 °C. In the tissue digestion phase the DNA extraction buffer was supplemented with 40 µl of dithiothreitol (DTT; 100 mg/ml) to break the disulfide bonds of keratinized feathers. We amplified and sequenced seven autosomal and two sex-linked loci (OSF 5). Polymerase chain reaction (PCR) of each locus was accomplished in 12.5 µl reactions containing 3.25 µl of water, 6.25 µl of GoTaq Green Mastermix (Promega), 0.5 µl of each primer at a concentration of 10 nM, and 2 µl of template DNA. All loci were amplified with the following touchdown thermocycler conditions: an initial denaturation step of 95 °C for 3 min, followed by 5 cycles of 95 °C for 0.5 min, 58 °C for 0.5 min, and 72 °C for 1 min; these steps were repeated while lowering the annealing temperature by 2 °C at a time until it reached 52 °C, which was used for 20 cycles, and a final extension step of 72 °C for 6 min. PCR products were purified using a 3.4 µl volume of the Exonuclease I – Shrimp Alkaline Phosphatase enzyme cocktail (ExoSAP, Affymetrix) following manufacturer instructions. PCR products of typical yield (100 ng/µl) or higher, as judged by visual comparison of the product band’s brightness to a DNA size ladder of 100 base pairs (bps, Fisher Scientific) in a 1% agarose gel, were diluted by a factor of two after PCR purification to reach an ideal concentration of 25 ng/µl. For those amplicons with visibly low concentration, the dilution step was skipped and the product sequenced directly to avoid polymerase error mutations due to re-amplification. Sequencing was done using the BigDye v3.1 termination protocol (Applied Biosystems) following recommendations by the manufacturer in an ABI 3730xl automated Sanger sequencer.
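The touchdown program and reaction mix described above can be sketched in a few lines of Python. This is only an illustration under one reading of the protocol (5 cycles each at annealing temperatures of 58, 56, and 54 °C, then 20 cycles at 52 °C); the function name `touchdown_schedule` and its parameters are hypothetical and were not part of the original workflow.

```python
def touchdown_schedule(start=58, stop=52, step=2, cycles_per_step=5, final_cycles=20):
    """Return (annealing_temp_C, n_cycles) pairs for a touchdown PCR program.

    Assumed reading of the protocol: the annealing temperature drops by
    `step` every `cycles_per_step` cycles until it reaches `stop`, which
    is then used for `final_cycles` cycles.
    """
    schedule = []
    temp = start
    while temp > stop:
        schedule.append((temp, cycles_per_step))
        temp -= step
    schedule.append((stop, final_cycles))
    return schedule


# Reaction components in microliters, as listed in the text.
REACTION_MIX_UL = {
    "water": 3.25,
    "GoTaq Green Mastermix": 6.25,
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "template DNA": 2.0,
}

schedule = touchdown_schedule()
total_cycles = sum(n for _, n in schedule)
total_volume = sum(REACTION_MIX_UL.values())

for temp, n in schedule:
    print(f"{n} cycles with annealing at {temp} C")
print(f"total cycles: {total_cycles}, reaction volume: {total_volume} ul")
```

Under this reading the components add up to the stated 12.5 µl reaction volume and the program runs 35 amplification cycles in total.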

Most DNA samples from non-blood or molting-feather origin had low and fragmented DNA yield and amplification of products larger than 300 bps was unsuccessful. To overcome this issue we sequenced products from blood using regular primers (table 2 in OSF 5) and redesigned specific primers for each locus flanking locus regions with polymorphic sites no longer than 300 bps (table 2 in OSF 5). Although for most loci excluded regions were adjacent to exons and invariable, for MC1R we chose a central range with the most variation. Amplification and sequencing for these smaller fragments was done as previously described.

Allele phasing and recombination. DNA sequences were edited in SEQUENCHER 4.7 (Genecodes Inc., Ann Arbor MI) and were verified to be of avian origin and the correct locus by comparing them to reference sequences in the NCBI GenBank database using BLAST searches. We also mapped loci to the chicken genome to determine their position in the avian genome using BLAST. For heterozygotes we separated each allele by in silico phasing in the program PHASE 2.1 (Stephens et al., 2001), converting the original aligned FASTA files with the program SEQPHASE (Flot, 2010) and running 100,000 generations with a burn-in of 10,000. We tested for recombination in our loci employing the Phi method (Bruen et al., 2006) in the program SPLITSTREE 4.3 (Huson & Bryant, 2006).

Testing for selection. Phylogenetic and phylogeographic inferences assume neutrality of the marker under survey, because selection can bias the nature of polymorphisms detected. We tested for the possibility of selection in a coalescent framework for all of our 10 loci using the Hudson-Kreitman-Aguadé (HKA) test (Hudson et al., 1987) with 10,000 simulations as implemented in the program HKA, using a closely related species to BCVI, the slaty vireo (SLVI, V. brevipennis) as a reference for segregating sites and genetic distance.

Estimating the age of BCVI. Understanding the evolutionary history of a species requires knowing how long it has been evolving independently since it diverged from its sister species. However, the phylogenetic relationships among all species of the family Vireonidae are unknown (Cicero & Johnson, 2001). Therefore we reconstructed a phylogeny of representative lineages of vireos and close relatives to calculate divergence times and relatedness in BEAST 1.7 (Drummond & Rambaut, 2007) by concatenating our 10 loci, using the ND2 gene for time calibration with the avian rate of 1.3% substitutions per lineage per million years (0.013 substitutions/lineage/million years; Arbogast et al., 2006), for 30 million generations and a burn-in of 10%.

Population genetic parameters and structure. We calculated measures of genetic variability using ARLEQUIN 3.2 (Excoffier et al., 2005): allele number, haplotype diversity (Hd), and nucleotide diversity (π). We calculated the fixation index (FST) from allelic frequencies and its significance with 1000 bootstrap permutations in ARLEQUIN. The presence of population structure in black-capped vireos has been debated (Athrey et al., 2012a; Athrey et al., 2012b; Barr et al., 2011; Barr et al., 2008; Fazio et al., 2004; Zink et al., 2010, 2011). We explored the deep historical structure employing only new data (nuclear sequence loci) to avoid bias from adding previously used markers. We used the Bayesian method of individual genotype assignment in the program STRUCTURE 2.3 (Pritchard et al., 2000) to test the hypothesis of no deep structure (K = 1) versus multiple clusters (K > 1). We coded our nuclear loci as single-nucleotide polymorphisms (SNPs; following Manthey et al., 2011) to calculate K with values between 1 and 20 clusters using the admixture (ancient panmixia) and non-admixture (ancient fragmentation) models with sampling locality as a prior and both correlated and uncorrelated allele frequencies (results were similar, so only correlated frequencies are reported), 20 repetitions per cluster with 1 x 10^6 iterations and a burn-in of 100,000 steps to ensure reproducibility of results (Gilbert et al., 2012), and a fixed lambda value (which was inferred by setting K = 1 and allowing lambda to be estimated in an initial analysis). We used the ΔK method (Evanno et al., 2005) to identify the best estimate of K in STRUCTURE HARVESTER (Earl, 2012). To evaluate isolation by distance (IBD), we plotted FST/(1 – FST) versus the log of geographic distance among sites. We recognize this assumes demographic equilibrium, and we use the results as a heuristic indication of the relationship between population proximity and genetic distance.
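The isolation-by-distance plot described above can be illustrated with a minimal sketch, assuming the common linearization FST/(1 – FST) regressed on the natural log of geographic distance. The pairwise values below are hypothetical placeholders, not data from this study, and the helper names are illustrative only; a formal test would normally also account for the non-independence of pairwise comparisons.

```python
import math


def linearized_fst(fst):
    """Linearized differentiation, FST / (1 - FST)."""
    return fst / (1.0 - fst)


def ols_slope_intercept(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (distance in km, pairwise FST) values for illustration only.
pairs = [(50.0, 0.020), (120.0, 0.031), (300.0, 0.018), (640.0, 0.027)]

log_distance = [math.log(d) for d, _ in pairs]
genetic_distance = [linearized_fst(f) for _, f in pairs]
slope, intercept = ols_slope_intercept(log_distance, genetic_distance)
print(f"slope = {slope:.5f}, intercept = {intercept:.5f}")
```

A slope near zero, as in these made-up values, is the pattern expected when genetic differentiation does not increase with distance.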


Niche modeling. Ecological niche modeling can offer testable phylogeographic hypotheses related to population isolation, contraction, or expansion (Carstens & Richards, 2007; McCormack et al., 2010). We estimated the potential distribution of black-capped vireos through space and time to test habitat expansion or reduction using records from the Breeding Bird Survey with the Maximum Entropy method in MAXENT (Phillips et al., 2006). We used 19 bioclimatic layers from three time periods for breeding and wintering distributions: Present, LGM, and Last Interglacial (LIG) from the WorldClim database (worldclim.org). The LGM occurred between 33,000 and 14,000 years before present (ybp, Clark et al., 2009) and the LIG occurred between 120,000 and 140,000 ybp (Otto-Bliesner et al., 2006). We clipped all layers to an area approximately 30% larger than the current combined breeding and wintering distribution. We ran MAXENT 10 times, and selected the climatic layers that contributed 5% or more to the model. We then reran the model using these layers and all locality records for our final maps.
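The layer-selection step above can be sketched as follows. This is a hedged illustration, not the procedure actually scripted for this chapter: it assumes that percent contributions are averaged across replicate runs before the 5% threshold is applied, and both the function `select_layers` and the contribution values are hypothetical.

```python
def select_layers(contributions_by_run, threshold=5.0):
    """Keep layers whose mean percent contribution across runs is >= threshold.

    `contributions_by_run` is a list of dicts, one per replicate run,
    mapping layer name -> percent contribution. Averaging across runs
    is an assumption made for this sketch.
    """
    n_runs = len(contributions_by_run)
    layers = contributions_by_run[0].keys()
    mean_contribution = {
        layer: sum(run[layer] for run in contributions_by_run) / n_runs
        for layer in layers
    }
    return sorted(layer for layer, c in mean_contribution.items() if c >= threshold)


# Hypothetical contributions from two replicate runs (illustration only).
runs = [
    {"bio1": 40.0, "bio12": 6.0, "bio15": 2.0},
    {"bio1": 44.0, "bio12": 4.0, "bio15": 3.0},
]
selected = select_layers(runs)
print(selected)
```

The retained layers would then be the only predictors supplied to the final model run.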

Historical demography. Most methods to calculate population size changes are based on statistical deviation from a specific model (e.g. constant size; Rogers, 1995) by permutations (Excoffier et al., 2005), and fluctuations of effective population size (Ne) need to be assessed independently (Kuhner et al., 1998); however, it is now possible to estimate population sizes and fluctuations through time without the a priori limitations of a particular demographic scenario, using the full predictive power of coalescent theory from multiple sequence loci (Ho & Shapiro, 2011). We tested the population expansion hypothesis using Extended Bayesian Skyline Plots (EBSP; Heled & Drummond, 2008) implemented in BEAST 1.7 with three independent runs of 100 x 10^6 generations and a burn-in of 10 x 10^6 steps. The model of evolution for each locus was calculated in the program JMODELTEST (Posada, 2008). We used a Yule model genealogical prior to calculate demographic growth and sizes, and for the time estimates we scaled all the nuclear loci by their ploidy with relaxed clocks to the mtDNA dataset with a strict clock and the same substitution rate we used for the phylogeny. To scale nuclear loci among themselves we used the avian sex-linked locus rate of 0.00195 substitutions/lineage/million years (Axelsson et al., 2004) with a strict clock and a relaxed clock for the autosomal loci. To transform Ne into absolute number of breeding individuals we divided the population size estimate by generation time (Heled & Drummond, 2008) assuming the age of reproduction of BCVI females to be one year (Grzybowski, 1990). We did four different skyline plots to test for sex-biased dispersal intrinsic to the locus: (i) “all evidence” included all 9 nuclear loci plus mtDNA from Zink et al. (2010), (ii) “nuclear evidence” included all nuclear loci, (iii) “male-biased dispersal” included the two sex-linked loci alone, and (iv) “female-biased dispersal” included only mtDNA. To estimate fire history we analyzed fluctuations of charcoal deposits in North America since the LGM. We plotted anomalous Z-scores at charcoal sites from Figure 5 in Power et al. (2008) for eastern and western North America onto our EBSP. An anomalous Z-score is a standardized measurement of how burned biomass differs from present-day charcoal: values of zero are similar to present day, positive values indicate more burned biomass than present day, and negative values less. Z-scores of ± 2 are considered large charcoal anomalies (see Power et al., 2008 for details). Large positive Z-scores should coincide with more frequent fires and, conversely, large negative values indicate historic fire reduction. There are records for only 35 LGM sites in the world (Power et al., 2008) and unfortunately there are no standardized publicly available Z-scores for older time intervals; thus, we were unable to explore pre-LGM correlations.
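The idea behind an anomalous Z-score can be illustrated with a simplified sketch. It assumes that anomalies are obtained by standardizing each value against the mean and standard deviation of a reference (present-day) period; the full procedure in Power et al. (2008) involves additional data transformations that are omitted here, and the numbers below are hypothetical.

```python
import statistics


def z_score_anomalies(values, base_values):
    """Standardize `values` against the mean and sample SD of a base period."""
    base_mean = statistics.mean(base_values)
    base_sd = statistics.stdev(base_values)
    return [(v - base_mean) / base_sd for v in values]


def is_large_anomaly(z, cutoff=2.0):
    """Flag anomalies at or beyond +/- cutoff (2 in the text)."""
    return abs(z) >= cutoff


# Hypothetical charcoal values (arbitrary units), for illustration only.
present_day = [8.0, 10.0, 12.0]
past = [10.0, 14.0, 4.0]

anomalies = z_score_anomalies(past, present_day)
flags = [is_large_anomaly(z) for z in anomalies]
print(anomalies, flags)
```

In this toy example the first value matches present-day levels, the second indicates more burned biomass than today, and the third indicates less.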

RESULTS

We obtained sequences for nine nuclear loci for 124 individuals from 12 localities (Table 2; Fig. 3.1; OSF 5) with a range of 206 to 293 bps (after primer sites were trimmed), phased all alleles with a probability of at least 0.9, and found no evidence of recombination. Each locus was variable, exhibiting from 5 to 55 alleles. The HKA test was not significant (χ² = 15.73, d.f. = 18, p = 0.61), consistent with neutrality of the loci.
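As a quick arithmetic check, the reported p-value can be recovered from the test statistic and degrees of freedom. The sketch below is not the HKA program itself; it only evaluates the upper tail of a chi-square distribution, using the closed-form expression that holds for even degrees of freedom. The function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses the closed form valid only for positive even `df`:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # k = 0 term of the series
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(15.73, 18)
print(round(p_value, 2))
```

The result rounds to 0.61, in agreement with the value reported above.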

The phylogeny of selected Vireonidae lineages (figure 1 in OSF 5) supports the sister relationship of BCVI and SLVI. These two species diverged during the Pliocene, ca. 4 million ybp (HPD 2.9 – 5.4 million ybp), indicating a long independent history for BCVI and that the fire history influenced the species rather than its common ancestor with its sister taxon.

Haplotype diversity (Hd) varied considerably, from a high of 0.96 (AETC locus) to 0.3 (ADAMS6). Nucleotide diversity (π) per locus followed a similar pattern (Table 2).

Only three of the nine FST values were statistically different from zero; the overall average of 0.026 was not statistically significant.


Table 2. Summary statistics per locus in the black-capped vireo. Sex-linked loci are those on the Z chromosome. Significant values (p < 0.05) are marked with an asterisk. Locus chromosome placement (Chr #), base pairs (# bps), allelic number (# alleles), allelic diversity (Hd), nucleotide diversity (π), and fixation index (FST).

Locus     Chr #   # bps   # alleles   Hd     π       FST
AdamS6    Z       291     5           0.3    0.001   0.025
IQGap2    Z       250     16          0.81   0.006   0.045
Gapdh3    1       206     12          0.48   0.003   0.024
Lama2     3       275     7           0.51   0.002   0.022
TGFB2.5   3       236     15          0.82   0.006   0.038*
Fib5      4       242     12          0.67   0.005   0.04
Trop6     10      277     14          0.65   0.003   0.002
MC1R      11      293     13          0.72   0.005   0.054*
AETC      21      278     55          0.96   0.012   0.029*

In general, there is little geographic variation in levels of genetic variability (Table 3). For example, nucleotide diversity (π), which takes into account sample size differences, shows considerable uniformity, although the San Antonio sample is slightly more variable than the others.

Table 3. Summary statistics per black-capped vireo locality (see Figure 3.1 for locality information)

Locality   Average number of alleles surveyed   π        Hd     # alleles
1. OK      19.78                                0.0043   0.68   5.2
2. CB      6.89                                 0.0044   0.57   3.1
3. QR      19.56                                0.0036   0.59   4.8
4. FH      21.25                                0.0039   0.52   5.1
5. BC      6.67                                 0.0043   0.69   3.3
6. SA      17.11                                0.0064   0.71   5.9
7. KW      32.00                                0.0051   0.68   7.6
8. DM      4.67                                 0.0032   0.44   2.1
9. KC      18.00                                0.0046   0.59   5.1
10. DR     8.89                                 0.0042   0.68   3.6
11. IC     20.22                                0.0050   0.68   5.8
12. MX     12.67                                0.0054   0.74   4.5


Geographic variation across loci. The most likely number of genetic clusters is K = 1. ΔK showed values of three or greater to be unlikely (Fig. 3.2). The ΔK statistic showed that the most likely K is 2 (results not shown); however, the ΔK statistic requires both lower and upper K likelihood values to be present, therefore K = 1 and 2 cannot be assessed by this method because K = 0 is nonsensical (Evanno et al., 2005; Pritchard et al., 2000). Nevertheless, K = 2 is less likely than K = 1 (Fig. 3.2), revealing no evidence of historical population structure. We found no evidence for IBD (Fig. 3.3).
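The limitation of the ΔK statistic noted above follows directly from how it is computed. The sketch below uses one common formulation (the absolute second difference of the mean log-likelihoods across replicates, divided by the standard deviation of the replicates at that K); the function name and the likelihood values are hypothetical and only illustrate why the smallest and largest K tested receive no ΔK.

```python
import statistics


def delta_k(lnp_by_k):
    """Compute Delta K for each K that has both neighbours K-1 and K+1.

    `lnp_by_k` maps K -> list of replicate Ln P(D) values.
    Assumed formulation: |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd L(K).
    """
    means = {k: statistics.mean(v) for k, v in lnp_by_k.items()}
    sds = {k: statistics.stdev(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        has_neighbours = (k - 1) in means and (k + 1) in means
        if has_neighbours and sds[k] > 0:
            second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_difference / sds[k]
    return result


# Hypothetical replicate log-likelihoods for K = 1..3 (illustration only).
lnp = {
    1: [-2600.0, -2602.0],
    2: [-2650.0, -2654.0],
    3: [-2800.0, -2810.0],
}
dk = delta_k(lnp)
print(dk)
```

Because K = 1 has no lower neighbour, it never appears in the output, so support for a single cluster has to be judged from the likelihood values themselves.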

Figure 3.2. Likelihood values of inferred number of genetic clusters (K; 1-20). Plots are mean values of 20 repetitions per K and vertical bars represent standard deviations: open circles and dotted lines represent population admixture assumption runs; filled diamonds and solid vertical bars represent no-admixture assumption runs (with a jitter offset on K of 1.2 for visual purposes). The most likely value of K in every case is 1 regardless of prior information. Y-axis: mean of Ln(Prob) with Std.Dev (20 reps); X-axis: number of clusters (K).

Using all nuclear loci and mtDNA, we found strong evidence for two effective population size change events (median = 1.8, 1 – 3; 95% highest posterior density [HPD]). The first increase occurred after the Lower Pleniglacial (70,000 ybp), with a posterior mean estimate of Ne = 249,580 breeding individuals (100,000 – 742,000; 95% HPD). Before the LGM (35,000 ybp) Ne was 780,000 individuals (181,000 – 1,780,000; 95% HPD) and remained constant until glacial retreat. At the end of the LGM we found a decrease in population size during the last glacial-interglacial transition, with the lowest HPD bound of 38,000 individuals at ~14,000 ybp. The second increase happened at the beginning of the Holocene (10,000 ybp) with Ne = 997,000 individuals (94,000 – 1,660,000; 95% HPD). Present-day Ne was 1,270,000 individuals (508,000 – 4,191,000; 95% HPD). The most recent common ancestor (MRCA) of all sampled alleles can be traced back to the Early Pleistocene, 1.74 million ybp (median root height; 900,000 – 2,480,000; 95% HPD), and BCVI populations were constant without significant fluctuations before 70,000 ybp (Fig. 3.4). Historical charcoal-record fluctuations coincide with the episodes of population increase and decrease described above. In North America, charcoal deposits during the LGM were similar to present-day levels and were then significantly reduced (anomalous Z-scores of -2 and -4 in eastern and western North America, respectively; figure 5 in Power et al., 2008), concomitant with the climatic fluctuations after the LIG (Marlon et al., 2009; Steffensen et al., 2008). The charcoal deposits increased between 12,000 and 9,000 ybp after the cooling of the Younger Dryas period (Fig. 3.4; figure 5 in Power et al., 2008).

Figure 3.3. Isolation by distance pattern for nine nuclear loci. The linear regression was non-significant (P = 0.89).


Demographic trends were maintained even after removing the mtDNA data (results not shown). Nevertheless, analyses using mtDNA or sex-linked loci alone were unable to reject the constant-size population hypothesis because their large posterior credibility intervals encompassed zero (results not shown), showing that potentially sex-biased dispersal markers are not driving the analysis.

Niche models. Niche models predicted a restricted breeding distribution at the LIG relative to today, an increase in suitable habitat at the LGM in northern Mexico, and presence of breeding habitat in Texas and Oklahoma at the Present that altogether correspond to a post-Pleistocene expansion of grasslands and juniper-oak savannas (Fig. 3.4; Bryant, 2012; Hofreiter & Stewart, 2009). Wintering distribution models found available habitat at each of the intervals, with some shifts from the lower Mexican Pacific coast south of the Transvolcanic Belt at the LIG towards the west at the LGM, and a higher probability of occurrence at the Present. Despite these differences, presence of wintering habitat in the last 120,000 years seems constant with some slight shifts from south to west (Fig. 3.4).

DISCUSSION

Roles for molecular markers. In the last decade many conservation genetics studies of endangered species explored recent population history using microsatellites (see Athrey et al., 2012a and references therein) and sometimes mitochondrial loci (see Evans & Sheldon, 2008; Fallon, 2007; Haig et al., 2011). For example, Munshi-South & Kharchenko (2010) analyzed 18 microsatellite loci and found that nearly all of their 14 study populations of Peromyscus leucopus in the New York City area were genetically differentiated. In this case, various parks apparently served as “refugia” in the past century during habitat fragmentation and urbanization, highlighting how rapidly mutating microsatellites can detect population fragmentation. Athrey et al. (2012a) compared microsatellite allele frequencies in museum specimens and current populations of BCVI and found an increase in FST over the past 100 years, likely due to increased habitat fragmentation. Few studies, however, address deeper time periods, and we were interested in ascertaining whether effects of historically shifting fire regimes could be observed in genetic data. Hence, we used nuclear gene sequences, which potentially index deeper time periods than microsatellites or mtDNA (Brito & Edwards, 2009). The goal was to reconstruct a long-term picture of a species so that current populations can be considered in a historical context. For example, these analyses can reveal if a species has undergone past fluctuations in size for natural reasons, and if so, what the magnitude of these fluctuations might have been.

Concomitant fluctuations between fire and population size illustrate the potential of fire to influence historical population structure of a juniper-oak savanna specialist. These historical patterns are consistent with current understanding of the importance of fire for BCVI habitat. Fire can influence groups with different dispersal abilities in different ways. Arthropod and small mammal species likely perish during major fires whereas birds likely avoid fire by flying to different areas (Whelan, 1995, p. 131). However, our data show how community-wide phenomena such as fire have had marked effects even on the most vagile members of a community. In grasslands and savannas, specialists tend to benefit from fires (Saab & Powell, 2005). Our findings suggest that this relationship is an old one. Just recently black-capped vireos were found in areas considered unsuitable habitat in northern Texas after fires consumed large parts of the woodlands (H. Mathewson, pers. comm.).

Genetic structure. We found no signatures of historical population structure in any of the loci we explored. The FST value of 0.026 that we found was consistent with that observed in microsatellites (Barr et al., 2008) and mtDNA (Zink et al., 2010), which shows consistency across different genetic markers, despite reasons why the markers could yield different empirical values (Zink & Barrowclough, 2008). The individual genotype assignment test rejected the population structure hypothesis as well (K = 1). Hence, although there might be local restrictions to gene flow (Athrey et al., 2012a; Athrey et al., 2012b; Barr et al., 2008), there are not multiple evolutionarily significant units within BCVI despite its existence as a single distinct lineage for over 1 million years. The geographic area occupied is likely too small for isolating barriers to cause significant historical structuring.

Barr et al. (2008) suggested that there is a limitation to gene flow at distances exceeding ca. 100 km. The plots of IBD for microsatellites (Barr et al., 2008), mitochondrial DNA (Zink et al., 2010) and nuclear gene sequences (Fig. 3.3) are all consistent with a slight effect of distance, if any. Whether this would be of management value is unclear, given that there is no significant pattern of genetic diversification or variability.


Distribution and historical demography through space and time. Our spatial distribution models strongly resemble known current breeding and wintering distributions of the black-capped vireo (Figs. 3.1 and 3.4). Our distribution predictions through time showed relatively little breeding habitat at the LIG, unlike the prediction by McPherson (1997), followed by an increase at the LGM in northern Mexico and its expansion northward into Texas, Oklahoma, and Kansas. This increase in habitat availability after the LGM into the Holocene is congruent with northward expansion of temperate species during warmer times (Hofreiter & Stewart, 2009) and an increase of pollen records associated with plant species characteristic of juniper-oak savannas of the Edwards Plateau after 10,000 ybp (Bryant, 2012). Given that the LGM and Present models are snapshots of two points in time, we cannot detect a precise moment when an increase in population size occurred in our ENMs, only that it occurred sometime during this interval. Our EBSP analyses are consistent with our niche models and suggest that the two population expansions occurred around 60,000 and 10,000 ybp, corresponding respectively to warmer climates after the Lower Pleniglacial and the beginning of the Holocene. These results support the hypothesis that a shifting fire regime before and after glacial periods resulted in increasing savanna habitat conducive to BCVI. Our winter species distribution models indicate that there has been available wintering habitat at least since the LIG through the present in the Mexican semideciduous forests. Those forests are considered the northernmost limit of the Neotropics (Udvardy & Udvardy, 1975) and seem to have been relatively stable for the last 1.5 million years (Becerra, 2005); therefore, our winter distribution prediction seems biologically realistic as it coincides with the time to the MRCA of BCVI and a single historic panmictic population (i.e., it does not suggest allopatric refugia).

Brood parasitism and predation (Smith et al., 2012) have been implicated in reduction of current BCVI populations. Brown-headed cowbirds are a native species and before the 19th century were found predominantly in the prairies west of the Mississippi River; in 100 yr they spread throughout eastern North America (Brittingham & Temple, 1983). In addition, cowbirds are present in Pleistocene fossil deposits from California to Florida (Miller, 1947; Oswald & Steadman, 2011). We can likely reject historical brood parasitism as a cause of present population decline because cowbirds have co-occurred with vireos at least since the LGM and have parasitized nests for millions of years (Lanyon, 1992). Biotic interactions with natural predators are difficult to study in paleocommunities. Nevertheless there is evidence of a consistent predatory species pool of mammals, reptiles, and raptors since the LGM and in Holocene localities in Texas (Blair, 1952; Smith et al., 2012) and the American Southwest (Oswald & Steadman, 2011); thus the historical appearance of a new major predator seems unlikely.

Conclusions. It would seem that human alteration of habitat has contributed significantly to the recent declines of the BCVI (Athrey et al., 2012a; Barr et al., 2008). Our data provide perspective on the species before human impacts. The question is: what is the magnitude of this effect? Assuming the average ratio of Ne to census population size (Nc) of 0.1 reported in empirical wildlife studies (Frankham, 1995; Palstra & Ruzzante, 2008), our results might indicate the existence of 12 million black-capped vireos, 1000X more than currently estimated (Wilkins et al., 2006). Thus, it seems likely that without human alteration of their habitat, black-capped vireos would be far more common, though not necessarily a great deal more widespread (Fig. 3.4). If one accepts the general trend in black-capped vireo populations, there was a large increase beginning approximately 10,000 ybp, after the LGM, but present populations are well below historical levels. The estimated reduction of North American savannas to 6% of their original extent seems to have had a significant impact on modern-day populations of black-capped vireos. As a caveat, we note that if we assume a generation time of two years, there are still many fewer birds at present than were estimated to exist at the maximum extent of fire-induced habitat.
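The arithmetic behind this comparison can be laid out explicitly. The sketch below takes the present-day Ne reported in the Results and the Ne/Nc ratio of 0.1 cited above; the current census figure of 12,000 birds is an assumption made only for this illustration (about 6,000 breeding males, doubled under an assumed even sex ratio), and the function name is hypothetical.

```python
def census_from_effective(ne, ne_nc_ratio=0.1):
    """Convert an effective population size into an expected census size."""
    return ne / ne_nc_ratio


NE_PRESENT = 1_270_000          # present-day Ne reported in the Results
expected_nc = census_from_effective(NE_PRESENT)

# Assumption for illustration: ~6,000 breeding males and an even sex ratio.
current_nc = 6_000 * 2
fold_difference = expected_nc / current_nc

# Caveat from the text: a two-year generation time halves the Ne estimate,
# because Ne is obtained by dividing the scaled estimate by generation time.
ne_two_year = NE_PRESENT / 2
expected_nc_two_year = census_from_effective(ne_two_year)

print(f"expected Nc: {expected_nc:,.0f} ({fold_difference:,.0f}x current)")
print(f"expected Nc with 2-yr generations: {expected_nc_two_year:,.0f}")
```

Under these assumptions the expected census size is on the order of 12 to 13 million birds, roughly three orders of magnitude above the current figure, and it remains in the millions even with the longer generation time.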

It is bothersome, however, that our LIG niche model did not reveal a much wider distribution, a result that cannot be corroborated using charcoal records because data are lacking for that time period. Thus, perhaps the effect of recent changes has been through fragmentation of the current range rather than a reduction in the areal extent of the range. In other words, the area currently occupied might support many more vireos if the habitat were more continuous, since population expansion may be hindered by fragmentation.


Figure 3.4. Extended Bayesian Skyline Plot and niche models at the Present, Last Glacial Maximum (LGM), and Last Interglacial (LIG). Y-axis is effective population size (Ne) in number of individuals, X-axis is in million years before present (ybp) with its 95% highest posterior density (HPD). The anomalous Z-score fluctuations represent differences in burned biomass (charcoal) relative to present-day charcoal deposits since the LGM (figure 5 in Power et al., 2008) for western (WNA) and eastern (ENA) North America. The dotted-dashed line represents the current census size (Nc; Wilkins et al., 2006). Only the last 70,000 years are shown.

BIBLIOGRAPHY

Adams, J., Maslin, M., & Thomas, E. (1999) Sudden climate transitions during the Quaternary. Progress in Physical Geography, 23, 1-36.
AOU. American Ornithologists’ Union. (1998) Checklist of North American Birds, 7th ed. American Ornithologists’ Union, Washington, D.C.
Agapow, P.-M. & Purvis, A. (2002) Power of eight tree shape statistics to detect nonrandom diversification: a comparison by simulation of two models of cladogenesis. Systematic Biology, 51, 866-872.
Arbogast, B.S., Drovetski, S.V., Curry, R.L., Boag, P.T., Seutin, G., Grant, P.R., Grant, B.R. & Anderson, D.J. (2006) The origin and diversification of Galapagos mockingbirds. Evolution, 60, 370-382.
Athrey, G., Barr, K.R., Lance, R.F., & Leberg, P.L. (2012a) Birds in space and time: genetic changes accompanying anthropogenic habitat fragmentation in the endangered black-capped vireo (Vireo atricapilla). Evolutionary Applications, 5, 540-552.
Athrey, G., Lance, R.F., & Leberg, P.L. (2012b) How far is too close? Restricted, sex-biased dispersal in black-capped vireos. Molecular Ecology, 21, 4359-4370.
Avise, J.C. (1992) Molecular population structure and the biogeographic history of a regional fauna: a case history with lessons for conservation biology. Oikos, 62-76.
Axelrod, D.I. (1985) Rise of the grassland biome, central North America. The Botanical Review, 51, 163-201.
Axelsson, E., Smith, N.G., Sundström, H., Berlin, S. & Ellegren, H. (2004) Male-biased mutation rate and divergence in autosomal, Z-linked and W-linked introns of chicken and turkey. Molecular Biology and Evolution, 21, 1538-1547.
Ayala, R., Griswold, T.L. & Bullock, S. (1993) The native bees of Mexico. Oxford University Press, New York.
Baele, G., Lemey, P., Bedford, T., Rambaut, A., Suchard, M.A. & Alekseyenko, A.V. (2012) Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Molecular Biology and Evolution, 29, 2157-2167.
Barker, F.K. (1999) The evolution of cooperative breeding in Campylorhynchus wrens: a comparative approach. PhD Thesis. University of Chicago.
Barker, F.K. (2004) Monophyly and relationships of wrens (Aves: Troglodytidae): a congruence analysis of heterogeneous mitochondrial and nuclear DNA sequence data. Molecular Phylogenetics and Evolution, 31, 486-504.
Barker, F.K. (2007) Avifaunal interchange across the Panamanian isthmus: insights from Campylorhynchus wrens. Biological Journal of the Linnean Society, 90, 687-702.
Barker, F.K., Cibois, A., Schikler, P., Feinstein, J., & Cracraft, J. (2004) Phylogeny and diversification of the largest avian radiation. Proceedings of the National Academy of Sciences, 101, 11040-11045.

Barker, F.K., Burns, K.J., Klicka, J., Lovette, I.J., & Lanyon, S.M. (In press) New insights into New World biogeography: an integrated view from the phylogeny of blackbirds, cardinals, sparrows, tanagers, warblers, and allies. Auk: Ornithological Advances.
Barr, K.R., Athrey, G., Lindsay, D.L., et al. (2011) Missing the forest for the gene trees: conservation genetics is more than the identification of distinct population segments. Auk 128, 792-794.
Barr, K.R., Lindsay, D.L., Athrey, G., et al. (2008) Population structure in an endangered songbird: maintenance of genetic differentiation despite high vagility and significant population recovery. Molecular Ecology 17, 3628-3639.
Beaumont, M.A., Nielsen, R., Robert, C., Hey, J., Gaggiotti, O., Knowles, L., Estoup, A., Panchal, M., Corander, J. & Hickerson, M. (2010) In defence of model-based inference in phylogeography. Molecular Ecology, 19, 436-446.
Becerra, J.X. (2005) Timing the origin and expansion of the Mexican tropical dry forest. Proceedings of the National Academy of Sciences of the United States of America 102, 10919-10923.
Beckage, B., Platt, W.J., & Gross, L.J. (2009) Vegetation, Fire, and Feedbacks: A Disturbance-Mediated Model of Savannas. The American Naturalist 174, 805-818.
Bell, R.C., MacKenzie, J.B., Hickerson, M.J., Chavarría, K.L., Cunningham, M., Williams, S. & Moritz, C. (2011) Comparative multi-locus phylogeography confirms multiple vicariance events in co-distributed rainforest frogs. Proceedings of the Royal Society B: Biological Sciences, rspb20111229.
Bennett, P. (1986) Comparative studies of morphology, life history and ecology among birds. PhD Thesis. University of Sussex.
Bermingham, E. & Moritz, C. (1998) Comparative phylogeography: concepts and applications. Molecular Ecology, 7, 367-369.
Blair, W.F. (1952) Mammals of the Tamaulipan biotic province in Texas. Texas Journal of Science 4, 230-250.
Bouckaert, R.R. (2010) DensiTree: making sense of sets of phylogenetic trees. Bioinformatics, 26, 1372-1373.
Brewer, D. (2001) Wrens, dippers and thrashers. A&C Black, New York.
Brito, P.H., & Edwards, S.V. (2009) Multilocus phylogeography and phylogenetics using sequence-based markers. Genetica 135, 439-455.
Brittingham, M.C., & Temple, S.A. (1983) Have cowbirds caused forest songbirds to decline? BioScience, 31-35.
Brown, W.M., George, M. & Wilson, A.C. (1979) Rapid evolution of animal mitochondrial DNA. Proceedings of the National Academy of Sciences, 76, 1967-1971.
Bruen, T.C., Philippe, H. & Bryant, D. (2006) A simple and robust statistical test for detecting the presence of recombination. Genetics, 172, 2665-2681.
Bull, J.J. (1983) Evolution of sex determining mechanisms. The Benjamin/Cummings Publishing Company, Inc.

Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N.A., & RoyChoudhury, A. (2012) Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution, 29, 1917-1932.
Bryant, V.M.J. (2012) Paleoenvironments. Texas State Historical Association. http://www.tshaonline.org/handbook/online/articles/sop02
Bytebier, B., Antonelli, A., Bellstedt, D.U., & Linder, H.P. (2011) Estimating the age of fire in the Cape flora of South Africa from an orchid phylogeny. Proceedings of the Royal Society B: Biological Sciences 278, 188-195.
Carling, M.D., & Brumfield, R.T. (2007) Gene sampling strategies for multi-locus population estimates of genetic diversity (θ). PLoS One, 2, e160.
Carstens, B.C., & Richards, C.L. (2007) Integrating coalescent and ecological niche modeling in comparative phylogeography. Evolution 61, 1439-1454.
Cavender-Bares, J., Kozak, K.H., Fine, P.V. & Kembel, S.W. (2009) The merging of community ecology and phylogenetic biology. Ecology Letters, 12, 693-715.
Chesser, R.T., Banks, R.C., Barker, F.K., Cicero, C., Dunn, J.L., Kratter, A.W., Lovette, I.J., Rasmussen, P.C., Remsen Jr, J. & Rising, J.D. (2013) Fifty-fourth supplement to the American Ornithologists' Union check-list of North American birds. Auk, 130, 558-572.
Chojnowski, J.L., Kimball, R.T., & Braun, E.L. (2008) Introns outperform exons in analyses of basal avian phylogeny using clathrin heavy chain genes. Gene, 410, 89-96.
Cicero, C., & Johnson, N.K. (2001) Higher-level phylogeny of New World vireos (Aves: Vireonidae) based on sequences of multiple mitochondrial DNA genes. Molecular Phylogenetics and Evolution 20, 27-40.
Clark, P.U., Dyke, A.S., Shakun, J.D., et al. (2009) The last glacial maximum. Science 325, 710-714.
Cody, S., Richardson, J.E., Rull, V., Ellis, C., & Pennington, R.T. (2010) The great American biotic interchange revisited. Ecography, 33, 326-332.
Corl, A., & Ellegren, H. (2013) Sampling strategies for species trees: the effects on phylogenetic inference of the number of genes, number of individuals, and whether loci are mitochondrial, sex-linked, or autosomal. Molecular Phylogenetics and Evolution 67, 358-366.
Cracraft, J. (1983) Species concepts and speciation analysis. Current Ornithology, Springer, p. 159-187.
Crews, S.C. & Hedin, M. (2006) Studies of morphological and molecular phylogenetic divergence in spiders (Araneae: Homalonychus) from the American southwest, including divergence along the Baja California Peninsula. Molecular Phylogenetics and Evolution, 38, 470-487.
Crosby, G.T. (1972) Spread of the Cattle Egret in the western hemisphere. Bird-Banding, 205-212.
Cummings, M.P., & Meyer, A. (2005) Magic bullets and golden rules: data sampling in molecular phylogenetics. Zoology, 108, 329-336.

DaCosta, J.M., & Klicka, J. (2008) The Great American Interchange in birds: a phylogenetic perspective with the genus Trogon. Molecular Ecology 17, 1328-1343.
Darriba, D., Taboada, G.L., Doallo, R. & Posada, D. (2012) jModelTest 2: more models, new heuristics and parallel computing. Nature Methods, 9, 772.
Darwin, C. (1859) On the Origin of the Species by Natural Selection. John Wiley & Sons. New York.
Dolman, G. & Joseph, L. (2012) A species assemblage approach to comparative phylogeography of birds in southern Australia. Ecology and Evolution, 2, 354-369.
Drummond, A.J. & Rambaut, A. (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology, 7, 214.
Drummond, A.J., Nicholls, G.K., Rodrigo, A.G. & Solomon, W. (2002) Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics, 161, 1307-1320.
Drummond, A.J., Ho, S.Y., Phillips, M.J. & Rambaut, A. (2006) Relaxed phylogenetics and dating with confidence. PLoS Biology, 4, e88.
Drummond, A.J., Suchard, M.A., Xie, D. & Rambaut, A. (2012) Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 29, 1969-1973.
Earl, D.A. (2012) STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources 4, 359-361.
Eaton, D.A., & Ree, R.H. (2013) Inferring phylogeny and introgression using RADseq data: an example from flowering plants (Pedicularis: Orobanchaceae). Systematic Biology, 62, 689-706.
EBI. European Bioinformatics Institute. http://www.ebi.ac.uk/Tools/st/emboss_transeq/
Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32, 1792-1797.
Edwards, S.V., & Beerli, P. (2000) Perspective: gene divergence, population divergence, and the variance in coalescence time in phylogeographic studies. Evolution, 54, 1839-1854.
Edwards, S.V., & Bensch, S. (2009) Looking forwards or looking backwards in avian phylogeography? A comment on Zink and Barrowclough 2008. Molecular Ecology, 18, 2930-2933.
Edwards, S.V., Liu, L., & Pearl, D.K. (2007) High-resolution species trees without concatenation. Proceedings of the National Academy of Sciences, 104, 5936-5941.
Ellegren, H., & Fridolfsson, A-K. (1997) Male-driven evolution of DNA sequences in birds. Nature Genetics, 17, 182-184.
Escalante, P., Navarro, A.G. & Peterson, A.T. (1993) A geographic, ecological, and historical analysis of land bird diversity in Mexico. Oxford University Press, New York.

Evanno, G., Regnaut, S., & Goudet, J. (2005) Detecting the number of clusters of individuals using the software structure: a simulation study. Molecular Ecology, 14, 2611-2620.
Evans, S.R., & Sheldon, B.C. (2008) Interspecific patterns of genetic diversity in birds: correlations with extinction risk. Conservation Biology, 22, 1016-1025.
Excoffier, L., Laval, G., & Schneider, S. (2005) Arlequin version 3.0: an integrated software package for population genetics data analysis. Evolution and Bioinformatics Online, 1, 47-50.
Faircloth, B.C., McCormack, J.E., Crawford, N.G., Harvey, M.G., Brumfield, R.T., & Glenn, T.C. (2012) Ultraconserved elements anchor thousands of genetic markers spanning multiple evolutionary timescales. Systematic Biology, sys004.
Fallon, S.M. (2007) Genetic data and the listing of species under the US Endangered Species Act. Conservation Biology, 21, 1186-1195.
Farley, G. (1996) Campylorhynchus brunneicapillus: A non-cooperative species in a cooperative genus. Ph.D. Dissertation, University of Arizona.
Fazio, V.W.I, Miles, D.B., & White, M.M. (2004) Genetic differentiation in the endangered Black-capped Vireo. Condor, 106, 377-385.
Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17, 368-376.
Felsenstein, J. (2006) Accuracy of coalescent likelihood estimates: do we need more sites, more sequences, or more loci? Molecular Biology and Evolution, 23, 691-700.
Fishbein, M., Hibsch-Jetter, C., Soltis, D.E., & Hufford, L. (2001) Phylogeny of Saxifragales (angiosperms, eudicots): analysis of a rapid, ancient radiation. Systematic Biology, 50, 817-847.
Fisher-Reid, M.C., & Wiens, J.J. (2011) What are the consequences of combining nuclear and mitochondrial data for phylogenetic analysis? Lessons from Plethodon salamanders and 13 other vertebrate clades. BMC Evolutionary Biology, 11, 300.
Flot, J.-F. (2010) SeqPHASE: a web tool for interconverting PHASE input/output files and FASTA sequence alignments. Molecular Ecology Resources, 10, 162-166.
Frankham, R. (1995) Conservation genetics. Annual Review of Genetics 29, 305-327.
Garrick, R., Nason, J., Meadows, C. & Dyer, R. (2009) Not just vicariance: phylogeography of a Sonoran Desert euphorb indicates a major role of range expansion along the Baja peninsula. Molecular Ecology, 18, 1916-1931.
Giarla, T.C., Voss, R.S. & Jansa, S.A. (2014) Hidden diversity in the Andes: comparison of species delimitation methods in montane marsupials. Molecular Phylogenetics and Evolution, 70, 137-151.
Gilbert, K.J., Andrew, R.L., Bock, D.G., et al. (2012) Recommendations for utilizing and reporting population genetic analyses: the reproducibility of genetic clustering using the program Structure. Molecular Ecology, 21, 4925-4930.
Graber, J.W. (1961) Distribution, habitat requirements, and life history of the black-capped vireo (Vireo atricapilla). Ecological Monographs, 31, 313-336.
Gregory, T.R. (2005) Genome size evolution in . The evolution of the genome, 1, 4-87.

Grinnell, J. (1928) A Distributional Summation of the Ornithology of Lower California. Berkeley 1928. University of California Press.
Grismer, L.L. (1994) The origin and evolution of the peninsular herpetofauna of Baja California, Mexico. Herpetological Natural History, 2, 51-106.
Grismer, L.L. (2000) Evolutionary biogeography on Mexico's Baja California peninsula: A synthesis of molecules and historical geology. Proceedings of the National Academy of Sciences, 97, 14017-14018.
Grismer, L.L. (2002) Amphibians and Reptiles of Baja California, Including Its Pacific Islands, and the Islands in the Sea of Cortés. University of California Press.
Grzybowski, J.A. (1990) Population and nesting ecology of the black-capped vireo-1990. Resource Protection Division. Texas Parks and Wildlife Department, Austin, Texas.
Grzybowski, J.A. (1995) Black-capped Vireo: Vireo atricapillus. American Ornithologists' Union.
Haig, S.M., Bronaugh, W.M., Crowhurst, R.S., et al. (2011) Genetic applications in avian conservation. Auk 128, 205-229.
Heled, J., & Drummond, A.J. (2008) Bayesian inference of population size history from multiple loci. BMC Evolutionary Biology, 8, 289.
Heled, J., & Drummond, A.J. (2010) Bayesian inference of species trees from multilocus data. Molecular Biology and Evolution, 27, 570-580.
Hey, J. HKA. http://lifesci.rutgers.edu/heylab/DistributedProgramsandData.htm#HKA
Hey, J., & Nielsen, R. (2007) Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proceedings of the National Academy of Sciences, 104, 2785-2790.
Hickerson, M., Carstens, B., Cavender-Bares, J., Crandall, K., Graham, C., Johnson, J., Rissler, L., Victoriano, P. & Yoder, A. (2010) Phylogeography's past, present, and future: 10 years after. Molecular Phylogenetics and Evolution, 54, 291-301.
Hickerson, M.J. & Meyer, C.P. (2008) Testing comparative phylogeographic models of marine vicariance and dispersal using a hierarchical Bayesian approach. BMC Evolutionary Biology, 8, 322.
Hickerson, M.J., Stahl, E.A. & Lessios, H. (2006) Test for simultaneous divergence using approximate Bayesian computation. Evolution, 60, 2435-2453.
Hickerson, M.J., Stahl, E. & Takebayashi, N. (2007) msBayes: pipeline for testing comparative phylogeographic histories using hierarchical approximate Bayesian computation. BMC Bioinformatics, 8, 268.
Hickerson, M.J., Stone, G.N., Lohse, K., Demos, T.C., Xie, X., Landerer, C. & Takebayashi, N. (2014) Recommendations for using msBayes to incorporate uncertainty in selecting an ABC model prior: a response to Oaks et al. Evolution, 68, 284-294.
Hillis, D.M., & Bull, J.J. (1993) An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Systematic Biology, 42, 182-192.
Hipp, A.L., Eaton, D.A., Cavender-Bares, J., Fitzek, E., Nipper, R., & Manos, P.S. (2014) A framework phylogeny of the American oak clade based on sequenced RAD data. PLoS One, 9, e93975.

Ho, S.Y., & Shapiro, B. (2011) Skyline-plot methods for estimating demographic history from nucleotide sequences. Molecular Ecology Resources, 11, 423-434.
Hofreiter, M., & Stewart, J. (2009) Ecological change, range fluctuations and population dynamics during the Pleistocene. Current Biology, 19, R584-R594.
Holt, J.W., Holt, E.W. & Stock, J.M. (2000) An age constraint on Gulf of California rifting from the Santa Rosalía basin, Baja California Sur, Mexico. Geological Society of America Bulletin, 112, 540-549.
Howell, S.N., & Webb, S. (1995) A guide to the birds of Mexico and northern Central America. Oxford University Press.
Huang, W., Takebayashi, N., Qi, Y. & Hickerson, M.J. (2011) MTML-msBayes: approximate Bayesian comparative phylogeographic inference from multiple taxa and multiple loci with rate heterogeneity. BMC Bioinformatics, 12, 1.
Hudson, R.R., Kreitman, M., & Aguadé, M. (1987) A test of neutral molecular evolution based on nucleotide data. Genetics, 116, 153-159.
Huelsenbeck, J.P., Ronquist, F., Nielsen, R., & Bollback, J.P. (2001) Bayesian inference of phylogeny and its impact on evolutionary biology. Science, 294, 2310-2314.
Huson, D.H. & Bryant, D. (2006) Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution, 23, 254-267.
Hutchinson, G.E. (1959) Homage to Santa Rosalia or why are there so many kinds of animals? American Naturalist, 145-159.
IUCN, SSC. (2001) IUCN Red List categories and criteria: version 3.1. Prepared by the IUCN Species Survival Commission.
Jacobsen, F., & Omland, K. (2011) Species tree inference in a recent radiation of orioles (Genus Icterus): multiple markers and methods reveal cytonuclear discordance in the northern oriole group. Molecular Phylogenetics and Evolution, 61, 460-469.
Jeffreys, H. (1961) Theory of probability. Oxford University Press, Oxford, UK.
Jetz, W., & Rubenstein, D.R. (2011) Environmental uncertainty and the global biogeography of cooperative breeding in birds. Current Biology, 21, 72-78.
Kass, R.E. & Raftery, A.E. (1995) Bayes factors. Journal of the American Statistical Association, 90, 773-795.
Kingman, J.F. (1982) The coalescent. Stochastic Processes and their Applications, 13, 235-248.
Kirby, M.X., Jones, D.S., & MacFadden, B.J. (2008) Lower Miocene stratigraphy along the Panama Canal and its bearing on the Central American Peninsula. PLoS One, 3, e2791.
Knowles, L.L. (2009) Statistical phylogeography. Annual Review of Ecology, Evolution, and Systematics, 40, 593-612.
Knowles, L.L. & Maddison, W.P. (2002) Statistical phylogeography. Molecular Ecology, 11, 2623-2635.
Knowles, L.L. & Kubatko, L.S. (2010) Estimating species trees: an introduction to concepts and models. Estimating species trees: practical and theoretical aspects. Wiley-Blackwell, Hoboken, New Jersey, 1-14.
Kubatko, L.S. (2009) Identifying hybridization events in the presence of coalescence via model selection. Systematic Biology, 58, 478-488.

Kuhner, M.K., Yamato, J., & Felsenstein, J. (1998) Maximum likelihood estimation of population growth rates based on the coalescent. Genetics, 149, 429-434.
Lanfear, R., Calcott, B., Ho, S.Y., & Guindon, S. (2012) PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution, 29, 1695-1701.
Lanyon, S.M. (1992) Interspecific brood parasitism in blackbirds (Icterinae): A phylogenetic perspective. Science, 255, 77-79.
Larkin, M.A., Blackshields, G., Brown, N., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A. & Lopez, R. (2007) Clustal W and Clustal X version 2.0. Bioinformatics, 23, 2947-2948.
Leaché, A.D., & Rannala, B. (2010) The accuracy of species tree estimation under simulation: a comparison of methods. Systematic Biology, syq073.
Leaché, A.D., Crews, S.C. & Hickerson, M.J. (2007) Two waves of diversification in mammals and reptiles of Baja California revealed by hierarchical Bayesian analysis. Biology Letters, 3, 646-650.
Lee, J.Y., Edwards, S.V., & Webster, M. (2008) Divergence across Australia's Carpentarian barrier: Statistical phylogeography of the red-backed fairy wren (Malurus melanocephalus). Evolution, 62, 3117-3134.
Lemmon, A.R., Emme, S.A., & Lemmon, E.M. (2012) Anchored hybrid enrichment for massively high-throughput phylogenomics. Systematic Biology, sys049.
Lemmon, E.M., & Lemmon, A.R. (2013) High-throughput genomic data in systematics and phylogenetics. Annual Review of Ecology, Evolution, and Systematics, 44, 99-121.
Lindell, J., Méndez-de la Cruz, F. & Murphy, R. (2005) Deep genealogical history without population differentiation: discordance between mtDNA and allozyme divergence in the zebra-tailed lizard (Callisaurus draconoides). Molecular Phylogenetics and Evolution, 36, 682.
Lindell, J., Ngo, A. & Murphy, R.W. (2006) Deep genealogies and the mid-peninsular seaway of Baja California. Journal of Biogeography, 33, 1327-1331.
Liu, L., & Pearl, D.K. (2007) Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Systematic Biology, 56, 504-514.
Liu, L., & Yu, L. (2011) Estimating species trees from unrooted gene trees. Systematic Biology, 60, 661-667.
Losos, J.B. & Ricklefs, R.E. (2009) Adaptation and diversification on islands. Nature, 457, 830-836.
Lovette, I.J., Arbogast, B.S., Curry, R.L., Zink, R.M., Botero, C.A., Sullivan, J.P., Talaba, A.L., Harris, R.B., Rubenstein, D.R. & Ricklefs, R.E. (2012) Phylogenetic relationships of the mockingbirds and thrashers (Aves: Mimidae). Molecular Phylogenetics and Evolution, 63, 219-229.
Manthey, J.D., Klicka, J., & Spellman, G.M. (2011) Isolation-driven divergence: speciation in a widespread North American songbird (Aves: Certhiidae). Molecular Ecology, 20, 4371-4384.

Marlon, J., Bartlein, P., Walsh, M., et al. (2009) Wildfire responses to abrupt climate change in North America. Proceedings of the National Academy of Sciences, 106, 2519-2524.
Martin, A.P. & Palumbi, S.R. (1993) Body size, metabolic rate, generation time, and the molecular clock. Proceedings of the National Academy of Sciences, 90, 4087-4091.
Martin, D.P., Lemey, P., Lott, M., Moulton, V., Posada, D. & Lefeuvre, P. (2010) RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics, 26, 2462-2463.
Marshall, L.G., Webb, S.D., Sepkoski, J.J., & Raup, D.M. (1982) Mammalian evolution and the great American interchange. Science, 215, 1351-1357.
Mayr, E. (1946) History of the North American bird fauna. Wilson Bulletin 58, 3-41.
McCormack, J.E., Huang, H., & Knowles, L.L. (2009) Maximum likelihood estimates of species trees: how accuracy of phylogenetic inference depends upon the divergence history and sampling design. Systematic Biology, 58, 501-508.
McCormack, J.E., Zellmer, A.J., & Knowles, L.L. (2010) Does niche divergence accompany allopatric divergence in Aphelocoma jays as predicted under ecological speciation?: insights from tests with niche models. Evolution, 64, 1231-1244.
McKay, B.D. (2012) A new timeframe for the diversification of Japanese mammals. Journal of Biogeography, 39, 1134-1143.
McPherson, G.R. (1997) Ecology and management of North American savannas. University of Arizona Press.
Miller, A.H. (1947) A new genus of icterid from Rancho La Brea. Condor 49, 22-24.
Miller, M.A., Pfeiffer, W., & Schwartz, T. (2010, November) Creating the CIPRES Science Gateway for inference of large phylogenetic trees. In Gateway Computing Environments Workshop (GCE), 2010 (pp. 1-8). IEEE.
Miyata, T., Hayashida, H., Kuma, K., Mitsuyasu, K., & Yasunaga, T. (1987) Male-driven molecular evolution: a model and nucleotide sequence analysis. Cold Spring Harbor symposia on quantitative biology, Cold Spring Harbor Laboratory Press, p. 863-867.
Moritz, C. (2002) Strategies to protect biological diversity and the evolutionary processes that sustain it. Systematic Biology, 51, 238-254.
Munshi-South, J., & Kharchenko, K. (2010) Rapid, pervasive genetic differentiation of urban white-footed mouse (Peromyscus leucopus) populations in New York City. Molecular Ecology, 19, 4242-4254.
Nunney, L. (1993) The influence of mating system and overlapping generations on effective population size. Evolution, 47, 1329-1341.
Nylander, J.A., Wilgenbusch, J.C., Warren, D.L., & Swofford, D.L. (2008) AWTY (are we there yet?): a system for graphical exploration of MCMC convergence in Bayesian phylogenetics. Bioinformatics, 24, 581-583.
Oswald, J.A., & Steadman, D.W. (2011) Late Pleistocene passerine birds from Sonora, Mexico. Palaeogeography, Palaeoclimatology, Palaeoecology, 301, 56-63.

Otto-Bliesner, B.L., Marshall, S.J., Overpeck, J.T., Miller, G.H., & Hu, A. (2006) Simulating Arctic climate warmth and icefield retreat in the last interglaciation. Science, 311, 1751-1753.
Palstra, F.P., & Ruzzante, D.E. (2008) Genetic estimates of contemporary effective population size: what can they tell us about the importance of genetic stochasticity for wild population persistence? Molecular Ecology, 17, 3428-3447.
Phillips, S.J., Anderson, R.P., & Schapire, R.E. (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190, 231-259.
Pigot, A.L., & Tobias, J.A. (2013) Species interactions constrain geographic range expansion over evolutionary time. Ecology Letters, 16, 330-338.
Pinto-Sánchez, N.R., Ibáñez, R., Madriñán, S., Sanjur, O.I., Bermingham, E., & Crawford, A.J. (2012) The Great American Biotic Interchange in frogs: Multiple and early colonization of Central America by the South American genus Pristimantis (Anura: Craugastoridae). Molecular Phylogenetics and Evolution, 62, 954-972.
Platnick, N.I. (1976) Concepts of dispersal in historical biogeography. Systematic Biology, 25, 294-295.
Posada, D. (2008) jModelTest: phylogenetic model averaging. Molecular Biology and Evolution, 25, 1253-1256.
Power, M.J., Marlon, J., Ortiz, N., et al. (2008) Changes in fire regimes since the Last Glacial Maximum: an assessment based on a global synthesis and analysis of charcoal data. Climate Dynamics, 30, 887-907.
Price, J.J. (2003) Communication with shared call repertoires in the cooperatively breeding stripe-backed wren. Journal of Field Ornithology, 74, 166-171.
Pritchard, J.K., Stephens, M., & Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945-959.
R Core Team (2010) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/
Rambaut, A., & Drummond, A. (2007) Tracer version 1.4. Computer program and documentation distributed by the author, website http://beast.bio.ed.ac.uk/Tracer [accessed August 2009].
Rambaut, A., & Grassly, N.C. (1997) Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Computer applications in the biosciences: CABIOS, 13, 235-238.
Ratzlaff, A. (1987) Endangered and threatened wildlife and plants: determination of the black-capped vireo to be an endangered species. Federal Register, 52, 37420-37423.
Riddle, B. & Hafner, D. (2006) A step-wise approach to integrating phylogeographic and phylogenetic biogeographic perspectives on the history of a core North American warm deserts biota. Journal of Arid Environments, 66, 435-461.
Riddle, B.R., Hafner, D.J., Alexander, L.F. & Jaeger, J.R. (2000) Cryptic vicariance in the historical assembly of a Baja California Peninsular Desert biota. Proceedings of the National Academy of Sciences, 97, 14438-14443.

Riddle, B.R., Dawson, M.N., Hadly, E.A., Hafner, D.J., Hickerson, M.J., Mantooth, S.J. & Yoder, A.D. (2008) The role of molecular genetics in sculpting the future of integrative biogeography. Progress in Physical Geography, 32, 173-202.
Rodríguez-Robles, J.A. & De Jesus-Escobar, J.M. (2000) Molecular Systematics of New World Gopher, Bull, and Pinesnakes (Pituophis: Colubridae), a Transcontinental Species Complex. Molecular Phylogenetics and Evolution, 14, 35-50.
Rogers, A.R. (1995) Genetic evidence for a Pleistocene population explosion. Evolution, 608-615.
Rokas, A., & Carroll, S.B. (2005) More genes or more taxa? The relative contribution of gene number and taxon number to phylogenetic accuracy. Molecular Biology and Evolution, 22, 1337-1344.
Rokas, A., Williams, B.L., King, N., & Carroll, S.B. (2003) Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature, 425, 798-804.
Rojas-Soto, O.R., Alcántara-Ayala, O. & Navarro, A.G. (2003) Regionalization of the avifauna of the Baja California Peninsula, Mexico: a parsimony analysis of endemicity and distributional modelling approach. Journal of Biogeography, 30, 449-461.
Rojas-Soto, O.R., Westberg, M., Navarro-Sigüenza, A.G. & Zink, R.M. (2010) Genetic and ecological differentiation in the endemic avifauna of Tiburon Island. Journal of Avian Biology, 41, 398-406.
Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D.L., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M.A., & Huelsenbeck, J.P. (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology, 61, 539-542.
Rosen, D.E. (1978) Vicariant patterns and historical explanation in biogeography. Systematic Biology, 27, 159-188.
Saab, V.A., & Powell, H.D. (2005) Fire and avian ecology in North America: process influencing pattern. Studies in Avian Biology 30, 1.
Selander, R.K. (1964) Speciation in wrens of the genus Campylorhynchus. University of California Press.
SEMARNAT (2010) Norma Oficial Mexicana NOM-059-SEMARNAT-2010, Protección ambiental-Especies nativas de Mexico de flora y fauna silvestres- Categorias de riesgo y especificaciones para su inclusion, exclusion o cambio - Lista de especies en riesgo. Diario Oficial de la Federación.
Shaw, K.L. (2002) Conflict between nuclear and mitochondrial DNA phylogenies of a recent species radiation: what mtDNA reveals and conceals about modes of speciation in Hawaiian crickets. Proceedings of the National Academy of Sciences, 99, 16122-16127.
Simpson, B.B., & Neff, J.L. (1985) Plants, their pollinating bees, and the Great American Interchange. In The great American biotic interchange (pp. 427-452). Springer US.
Skarpe, C. (1991) Impact of grazing in savanna ecosystems. Ambio, 351-356.

Smith, B.T. & Klicka, J. (2010) The profound influence of the Late Pliocene Panamanian uplift on the exchange, diversification, and distribution of New World birds. Ecography, 33, 333-342.
Smith, B.T., Harvey, M.G., Faircloth, B.C., Glenn, T.C. & Brumfield, R.T. (2013) Target capture and massively parallel sequencing of ultraconserved elements (UCEs) for comparative studies at shallow evolutionary time scales. Systematic Biology, syt061.
Smith, K.N., Cain, J.W., III, Morrison, M.L., & Wilkins, R.N. (2012) Nesting ecology of the black-capped vireo in southwest Texas. Wilson Journal of Ornithology, 124, 277-285.
Steffensen, J.P., Andersen, K.K., Bigler, M., et al. (2008) High-resolution Greenland ice core data show abrupt climate change happens in few years. Science 321, 680-684.
Stephens, M., Smith, N.J. & Donnelly, P. (2001) A new statistical method for haplotype reconstruction from population data. The American Journal of Human Genetics, 68, 978-989.
Stock, J. & Hodges, K. (1989) Pre-Pliocene Extension around the Gulf of California and the transfer of Baja California to the Pacific Plate. Tectonics, 8, 99-115.
Sukumaran, J., & Holder, M.T. (2008) SumTrees: summarization of split support on phylogenetic trees. Part of the DendroPy Phylogenetic Computation Library Version, 2.
Sukumaran, J., & Holder, M.T. (2010) DendroPy: a Python library for phylogenetic computing. Bioinformatics, 26, 1569-1571.
Sun, C., Shepard, D.B., Chong, R.A., Arriaza, J.L., Hall, K., Castoe, T.A., Feschotte, C., Pollock, D.D., & Mueller, R.L. (2012) LTR retrotransposons contribute to genomic gigantism in plethodontid salamanders. Genome Biology and Evolution, 4, 168-183.
Swofford, D.L. (2003) PAUP*. Phylogenetic analysis using parsimony (* and other methods). Version 4. Sinauer, Massachusetts.
Townsend, J.P. (2007) Profiling phylogenetic informativeness. Systematic Biology, 56, 222-231.
Trejo-Torres, J.C. & Ackerman, J.D. (2001) Biogeography of the Antilles based on a parsimony analysis of orchid distributions. Journal of Biogeography, 28, 775-794.
Trujano-Alvarez, A.L. & Álvarez-Castañeda, S.T. (2007) Taxonomic revision of Thomomys bottae in the Baja California Sur lowlands. Journal of Mammalogy, 88, 343-350.
Udvardy, M.D., & Udvardy, M. (1975) A classification of the biogeographical provinces of the world. International Union for Conservation of Nature and Natural Resources, Morges, Switzerland.
Upton, D.E. & Murphy, R.W. (1997) Phylogeny of the Side-Blotched Lizards (Phrynosomatidae: Uta) Based on mtDNA Sequences: Support for a Midpeninsular Seaway in Baja California. Molecular Phylogenetics and Evolution, 8, 104-113.


USFWS (2007) Black-capped Vireo (Vireo atricapilla). In: 5 Year Review: Summary and Evaluation, Arlington, Texas.
Vázquez-Miranda, H., Navarro-Sigüenza, A.G. & Morrone, J.J. (2007) Biogeographical patterns of the avifaunas of the Caribbean Basin islands: a parsimony perspective. Cladistics, 23, 180-200.
Vázquez-Miranda, H., Navarro-Sigüenza, A.G., & Omland, K.E. (2009) Phylogeography of the rufous-naped wren (Campylorhynchus rufinucha): Speciation and hybridization in Mesoamerica. Auk, 126, 765-778.
Walsh, H., Kidd, M., Moum, T., & Friesen, V. (1999) Polytomies and the power of phylogenetic inference. Evolution, 932-937.
Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002) Phylogenies and community ecology. Annual Review of Ecology and Systematics, 475-505.
Webb, S.D. (1991) Ecogeography and the great American interchange. Paleobiology, 266-280.
Webb, S.D. (2006) The great American biotic interchange: patterns and processes. Annals of the Missouri Botanical Garden, 245-257.
Weir, J.T. & Schluter, D. (2008) Calibrating the avian molecular clock. Molecular Ecology, 17, 2321-2328.
Weir, J.T., Bermingham, E., & Schluter, D. (2009) The great American biotic interchange in birds. Proceedings of the National Academy of Sciences, 106, 21737-21742.
Whelan, R.J. (1995) The ecology of fire. Cambridge University Press.
Whitfield, J.B. & Lockhart, P.J. (2007) Deciphering ancient rapid radiations. Trends in Ecology & Evolution, 22, 258-265.
Wiens, J.J., Kuczynski, C.A., & Stephens, P.R. (2010) Discordant mitochondrial and nuclear gene phylogenies in emydid turtles: implications for speciation and conservation. Biological Journal of the Linnean Society, 99, 445-461.
Wiggins, I.L. (1980) Flora of Baja California. Stanford University Press.
Wilkins, N., Powell, R.A., Conkey, A.A., & Snelgrove, A.G. (2006) Population status and threat analysis for the black-capped vireo. Department of Wildlife and Fisheries Sciences, Texas A&M University. Prepared for the US Fish and Wildlife Service, Region 2, 146.
Winger, B.M., Barker, F.K., & Ree, R.H. (2014) Temperate origins of long-distance seasonal migration in New World songbirds. Proceedings of the National Academy of Sciences, 201405000.
Winograd, I.J., Landwehr, J.M., Ludwig, K.R., Coplen, T.B., & Riggs, A.C. (1997) Duration and structure of the past four interglaciations. Quaternary Research, 48, 141-154.
Woodburne, M.O. (2010) The Great American Biotic Interchange: dispersals, tectonics, climate, sea level and holding pens. Journal of Mammalian Evolution, 17, 245-264.
Xie, W., Lewis, P.O., Fan, Y., Kuo, L. & Chen, M.-H. (2011) Improving Marginal Likelihood Estimation for Bayesian Phylogenetic Model Selection. Systematic Biology, 60, 150.


Zink, R.M. (2014) Homage to Hutchinson, and the role of ecology in lineage divergence and speciation. Journal of Biogeography, 41, 999-1006.
Zink, R.M. & Dittmann, D.L. (1991) Evolution of brown towhees: mitochondrial DNA evidence. Condor, 98-105.
Zink, R.M. & Blackwell, R.C. (1998a) Molecular Systematics and Biogeography of Aridland Gnatcatchers (Genus Polioptila) and Evidence Supporting Species Status of the California Gnatcatcher (Polioptila californica). Molecular Phylogenetics and Evolution, 9, 26-32.
Zink, R.M. & Blackwell, R.C. (1998b) Molecular systematics of the scaled quail complex (genus Callipepla). Auk, 394-403.
Zink, R.M. & Barrowclough, G.F. (2008) Mitochondrial DNA under siege in avian phylogeography. Molecular Ecology, 17, 2107-2121.
Zink, R.M., Blackwell, R.C. & Rojas-Soto, O. (1997) Species limits in the Le Conte's thrasher. Condor, 132-138.
Zink, R.M., Blackwell-Rago, R.C. & Ronquist, F. (2000) The shifting roles of dispersal and vicariance in biogeography. Proceedings of the Royal Society of London. Series B: Biological Sciences, 267, 497-503.
Zink, R.M., Kessen, A.E., Line, T.V. & Blackwell-Rago, R.C. (2001) Comparative phylogeography of some aridland bird species. Condor, 103, 1-10.
Zink, R.M., Groth, J.G., Vázquez-Miranda, H. & Barrowclough, G.F. (2013) Phylogeography of the California Gnatcatcher (Polioptila californica) Using Multilocus DNA Sequences and Ecological Niche Modeling: Implications for Conservation. Auk, 130, 449-458.
Zink, R.M., Jones, A.W., Farquhar, C.C., Westberg, M.C., & Rojas, J.I.G. (2010) Comparison of molecular markers in the endangered Black-capped Vireo (Vireo atricapilla) and their interpretation in conservation. Auk, 127, 797-806.
Zink, R.M., Jones, A.W., Farquhar, C.C., Westberg, M.C., & Rojas, J.I.G. (2011) Endangered species management and the role of conservation genetics: A response to Barr et al. Auk, 128, 794-797.
Zwickl, D. (2006) GARLI—genetic algorithm for rapid likelihood inference. See http://www.bio.utexas.edu/faculty/antisense/garli/Garli.html


APPENDICES

[Figure: two tree panels (1, 2). Legend: South America; North America; Ambiguous rec.; Isthmus of Panama c.; Cooperative breeding; Duetting; Non-cooperative b.; No duetting. Time axis: Eocene, Oligocene, Miocene, Pliocene, Pleistocene, Holocene.]

Appendix 1. Time-calibrated species tree (five loci, Fig. 1F), with time in millions of years, depicting three independent dispersal events from North America to South America and the two equally parsimonious scenarios (1, 2) for the evolution of cooperative breeding and duetting behavior in Campylorhynchus described in the text. Horizontal node bars represent the 95% HPD of node ages. The ingroup appears in lowercase and the outgroups in uppercase letters.
