
Population Differentiation, Historical Demography and Evolutionary Relationships Among Widespread Common Chaffinch Populations (Fringilla coelebs ssp.)

by

Pasan Samarasin-Dissanayake

A thesis submitted in conformity with the requirements for the degree of Master of Science, Department of Ecology and Evolutionary Biology, University of Toronto

© Copyright by Pasan Samarasin-Dissanayake 2010

Population Differentiation, Historical Demography and Evolutionary Relationships Among Widespread Common Chaffinch Populations ( Fringilla coelebs ssp. )

Pasan Samarasin-Dissanayake

Master of Science

Department of Ecology and Evolutionary Biology University of Toronto

2010

Abstract

Widespread species that occupy continents and oceanic islands provide an excellent opportunity to study the evolutionary forces responsible for population divergence. Here, I use multilocus coalescent-based population genetic and phylogenetic methods to infer the evolutionary history of the common chaffinch (Fringilla coelebs), a widespread Palearctic species. My results showed strong population structure between Atlantic islands. However, the two European subspecies can be considered one panmictic population based on gene flow estimates. My investigation of the effects of sampling on concatenation and Bayesian estimation of species trees (BEST) methods demonstrated that concatenation is more sensitive to sampling than BEST. Furthermore, concatenation can provide incorrect evolutionary relationships with high confidence when sample size is small. In conclusion, my results suggest European ancestry for the common chaffinch, and the Atlantic islands appear to have been colonized sequentially from north to south via the Azores.


Acknowledgements

First, I would like to thank Dr. Allan Baker for giving me an opportunity to work in his lab, giving me the independence to explore and learn, and providing guidance when I needed it. Thanks to Dr. Asher Cutter for helpful advice and for allowing me to use his cluster for analysis. I am indebted to Oliver Haddrath for training me in molecular laboratory techniques, designing an informative molecular marker (ANON OH) and providing tremendous support in and out of the lab. Thank you to everyone in the lab, including Dr. Erika Tavares, the most helpful postdoc one can imagine; Rosemary Gibson for supporting me in many ways; Alison Cloutier for fun and very educational times in the lab; and Yvonne Verkuil for assistance with analysis and some interesting discussions.

Thank you to past and present postdocs Dr. Sergio Pereira and Dr. Debbie Buehler for their assistance, Kristen Choffe for DNA sequencing help, and Cathy Dutton and Sue Chopra for administrative support. Thank you to Alivia Dey for friendship. I am very grateful to my parents for their constant support in whatever I chose to do, and to my sister for running some of my analyses on her computer and annoying me to write this thesis.


Table of Contents

Abstract ………………………………………………………………………………….. ii

Acknowledgements …………………………………………………………………….. iii

Table of contents ……………………………………………………………………….. iv

List of Figures ………………………………………………………………………….. vi

List of Tables ………………………………………………………………………….. viii

List of Appendices ……………………………………………………………………… ix

Chapter 1: General Introduction ………………………………………………………… 1

1.1 Molecular population genetics and molecular phylogenetics ………………. 1

1.2 The common chaffinch ( Fringilla coelebs ) ………………………………… 4

1.3 Thesis objectives ……………………………………………………………. 7

1.4 References …………………………………………………………………… 9

Chapter 2: Population genetic structure and historical demography of the common chaffinch ( Fringilla coelebs ) .………………………...…… 12

2.1 Abstract …………………………………………………………………….. 12

2.2 Introduction ………………………………………………………………… 12

2.3 Methods …………………………………………………………...……….. 17

2.3.1 Sampling ………………………………………………...….……. 17

2.3.2 DNA extraction, amplification and sequencing ………………….. 18

2.3.3 Data analysis ……………………………………………………... 21

2.4 Results ……………………………………………………………………… 24

2.4.1 Summary statistics and neutrality tests …………………………... 24

2.4.2 Haplotype networks …………………………………………….... 28

2.4.3 Genetic clusters and migration rates ……………………………... 32

2.4.4 Effective population sizes and population expansion ……………. 36

2.5 Discussion ………………………………………………………………….. 38

2.5.1 Population structure & subspecies status ……………………….... 41

2.5.2 Effective population size and population expansion …………….. 43

2.5.3 Conclusions ………………………………………………………. 46

2.6 References ………………………………………………………………….. 47


Chapter 3: Comparison of multilocus phylogenetic methods for inferring evolutionary relationships among recently diverged common chaffinch subspecies ( Fringilla coelebs ssp. ) ……………………………… 54

3.1 Abstract …………………………………………………………………….. 54

3.2 Introduction ………………………………………………………………… 54

3.3 Methods ……………………………………………………………………. 63

3.3.1 Sampling, DNA extraction, amplification and sequencing ……… 63

3.3.2 Data analysis ……………………………………………………... 66

3.4 Results ……………………………………………………………………… 69

3.5 Discussion ………………………………………………………………….. 73

3.5.1 Effects of sampling on phylogenetic inference …………………... 76

3.5.2 Species tree estimates with some gene flow ……………………... 81

3.5.3 Conclusion ……………….……………….……………………… 82

3.6 References ………………………………………………………………….. 83

Chapter 4: General conclusions ………………………………………………………... 89

4.1 References ………………………………………………………………….. 91


List of Figures

Chapter 1

Figure 1: Evolutionary relationships among Fringilla sp . ……………………...... 4

Figure 2: Distribution map of the common chaffinch ( Fringilla coelebs ) showing part of its total range …………………………………………….….. 5

Chapter 2

Figure 1: Map of sampled common chaffinch populations ……………………………. 17

Figure 2: Median-joining haplotype network of the control region of mtDNA ………. 29

Figure 3: Median-joining haplotype network of nuclear loci EF1α and PTPN ………. 30

Figure 4: Median-joining haplotype network of nuclear loci TROP and UBIQ ……… 31

Figure 5: Plot of likelihood of data for assumed number of populations (K) in Structure …………………………………………………………………... 33

Figure 6: Probabilistic assignment of individual genotypes to populations (K=5) in Structure ……………………………………………………………. 33

Chapter 3

Figure 1: Discordance between gene trees and the species tree ……………………….. 57

Figure 2: Geographic locations of sampled common chaffinch subspecies (Fringilla coelebs ssp.) ………………………………………………………. 60

Figure 3: Expected phylogenetic relationships among Atlantic island subspecies under different colonization hypotheses …………………………...…………. 62

Figure 4: Evolutionary relationships among common chaffinch subspecies from concatenated Bayesian inference …………………………………………….. 70

Figure 5: Evolutionary relationships among common chaffinch subspecies from estimation of species tree (BEST) method …………………………………… 71


Figure 6: The large data set molecular clock tree from Bayesian estimation of species tree (BEST) method ……...……………………………………….. 73

Figure 7: Effect of sampling when multiple gene lineages persist in recently diverged populations ……………………………………………... 76

Figure 8: Most probable colonization pattern of Atlantic Islands and …………. 79


List of Tables

Chapter 2:

Table 1: Sampled loci for population genetic analysis ………………………… 20

Table 2: Population genetic summary statistics ………………………………... 26

Table 3: Migration rate estimates between common chaffinch populations …... 35

Table 4: Effective population size and population expansion ……………….… 37

Chapter 3:

Table 1: Details of 9 nuclear loci sampled for phylogenetic analysis …………. 65


List of Appendices

Appendix 1: Details of the Chaffinch samples used in the study ……………………… 92


Chapter 1 General Introduction

The study of the evolutionary processes responsible for population divergence and speciation is a central area of research in evolutionary biology. Investigation of structured populations is a key component in understanding how population divergence eventually leads to speciation. According to a simple allopatric model of speciation, an ancestral population splits into two daughter populations and a barrier restricts gene flow between them. In the absence of gene flow, the two daughter populations follow different evolutionary trajectories as each population experiences random mutations, drift and/or selection. As genetic differences between the two daughter populations accumulate, reproductive isolation may result from genetic incompatibilities (Coyne and Orr 2004). Understanding the details of this process, from initial divergence to complete reproductive isolation, is an ongoing research goal in evolutionary biology. However, it generally takes a long time to go from initial divergence to reproductive isolation; hence the study of Earth's biodiversity is historical in nature. The fields of population genetics and phylogenetics attempt to understand the historical evolutionary and demographic processes responsible for current species distributions, and to infer past population divergence events using current molecular data.

1.1 Molecular population genetics and molecular phylogenetics

Historically, the fields of population genetics and phylogenetics have asked different questions and employed different methods. While population genetics is concerned with evolutionary forces acting within a species, the primary goal of phylogenetics is to infer how species relate to each other. However, these two fields can be thought of as parts of a continuum that runs from the initial divergence of populations, through the accumulation of genetic differences between those populations, to reproductive isolation, at which point the “species boundary” is crossed according to the biological definition of species. Early on, both fields used mtDNA extensively for evolutionary inference. While population genetics moved to microsatellites, phylogenetics required sequence-based markers and moved to nuclear markers (e.g. introns). However, recent advances in DNA sequencing technology have gradually shifted population genetic studies towards sequence-based markers rather than microsatellites, because sequences are much more information-rich than length polymorphisms. In addition, the two fields have converged in methodology as coalescent theory has come to the forefront of phylogenetic inference methods. Coalescent theory, the most active area of research in theoretical population genetics in the last few decades, provides a promising avenue to infer population genetic parameters and test evolutionary hypotheses. The theory, whose development is primarily attributed to Kingman (1982), traces a sample of sequences from the current generation back to its common ancestor. In the coalescent framework, gene genealogies are used to estimate population genetic parameters by treating genealogies as a random variable (Rosenberg and Nordborg 2002).

Coalescent genealogy samplers estimate parameters of interest by generating large collections of probable genealogies from the data and integrating over all sampled genealogies. Herein lies the fundamental difference between traditional phylogenetic and theoretical population genetic approaches. Although both use gene genealogies, the tree inferred from the data is the primary interest in phylogenetics, whereas in population genetics trees are treated as random variables used to estimate important population genetic parameters. The population genetic parameters estimated using the coalescent model provide more biologically realistic estimates with appropriate confidence intervals than estimates from classical summary statistics (Kuhner 2009). Also, coalescent simulations allow us to test demographic hypotheses, such as conformity to a certain population divergence pattern, and to test alternative timings for divergence or population bottlenecks.
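To make the idea of treating a genealogy as a random variable concrete, the following is a minimal sketch of the standard (Kingman) coalescent for a single panmictic population of constant size. It is an illustration only and not one of the genealogy samplers used in this thesis. The function names, the sample size of 10 and the number of replicates are assumptions chosen for the example. Time is measured in units of 2N generations, so that each pair of lineages coalesces at rate 1.

```python
import random


def simulate_tmrca(n_samples, rng):
    """Draw one time to the most recent common ancestor (TMRCA).

    While k lineages remain, the waiting time to the next coalescence
    is exponential with rate k*(k-1)/2 (the number of lineage pairs).
    Time is in units of 2N generations (assumed diploid, constant N).
    """
    tmrca = 0.0
    for k in range(n_samples, 1, -1):
        pair_rate = k * (k - 1) / 2.0
        tmrca += rng.expovariate(pair_rate)
    return tmrca


def expected_tmrca(n_samples):
    """Theoretical expectation of the TMRCA: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n_samples)


# Hypothetical settings for illustration only.
rng = random.Random(1)
n = 10
heights = [simulate_tmrca(n, rng) for _ in range(20000)]
mean_height = sum(heights) / len(heights)

print(f"simulated mean TMRCA: {mean_height:.3f}")
print(f"expected TMRCA:       {expected_tmrca(n):.3f}")
```

Each simulated height differs from the next even though the population parameters are identical, which is why inference methods integrate over many plausible genealogies rather than relying on a single tree.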

The recognition that coalescent models can be extended from the population level to the species level has given rise to new coalescent-based phylogenetic methods for estimating species trees, such as Bayesian estimation of species trees (BEST) (Liu 2008), minimizing deep coalescences (MDC) (Maddison 1997) and ESP-COAL (Degnan and Salter 2005; Carstens and Knowles 2007). These new methods are best suited for inferring evolutionary relationships between recently diverged groups, because any single gene tree may not represent the actual species tree in these situations. The methods therefore incorporate uncertainty in gene genealogies into the estimate of the phylogeny. As a result, the fields of phylogenetics and population genetics are moving from the single-locus approaches that were predominant for most of their history towards multilocus sequence-based coalescent approaches (Brito and Edwards 2008). The use of multiple loci provides independent parameter estimates from each locus and gives far better estimates with narrower confidence intervals (Pluzhnikov and Donnelly 1996; Baker 2007). Therefore, there has been an emphasis on using multilocus data for historical evolutionary inference.
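The benefit of sampling several independent loci can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not an analysis from this thesis: it treats loci as unlinked and identically distributed under the standard coalescent, uses tree height as a stand-in for a parameter estimate, and compares the spread of a one-locus estimate with that of a ten-locus average. All function names and settings are hypothetical.

```python
import random
import statistics


def tree_height(n_samples, rng):
    """Height of one coalescent genealogy, in units of 2N generations."""
    height = 0.0
    for k in range(n_samples, 1, -1):
        height += rng.expovariate(k * (k - 1) / 2.0)
    return height


def multilocus_mean_height(n_samples, n_loci, rng):
    """Average tree height over independent (unlinked) loci."""
    total = sum(tree_height(n_samples, rng) for _ in range(n_loci))
    return total / n_loci


def spread_of_estimate(n_samples, n_loci, replicates, rng):
    """Standard deviation of the multilocus average across replicates."""
    estimates = [
        multilocus_mean_height(n_samples, n_loci, rng)
        for _ in range(replicates)
    ]
    return statistics.stdev(estimates)


# Hypothetical settings for illustration only.
rng = random.Random(7)
sd_one_locus = spread_of_estimate(10, 1, 2000, rng)
sd_ten_loci = spread_of_estimate(10, 10, 2000, rng)
ratio = sd_one_locus / sd_ten_loci

print(f"SD with 1 locus:  {sd_one_locus:.3f}")
print(f"SD with 10 loci:  {sd_ten_loci:.3f}")
print(f"ratio (expected near sqrt(10) = 3.16): {ratio:.2f}")
```

Under these simplifying assumptions the spread shrinks roughly with the square root of the number of loci. Real loci differ in mutation rate, inheritance mode and length, so the gain in practice may be smaller or larger than this toy suggests.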


1.2 The common chaffinch ( Fringilla coelebs )

The genus Fringilla contains three species of finches that feed their young on insects rather than seeds. They are seed-eating birds restricted to the Old World. The three members of this genus are the brambling (Fringilla montifringilla), the blue chaffinch (Fringilla teydea) and the common chaffinch (Fringilla coelebs). Figure 1 shows the evolutionary relationships of species in this genus. The common chaffinch and the blue chaffinch are sister species, and the brambling is sister to the two chaffinches. The blue chaffinch is endemic to high-elevation pine forests in the Canaries (Grant 1979), while the brambling occupies forests of Northern Europe and Asia.

Figure 1: Evolutionary relationships among Fringilla sp.

The common chaffinch is a widespread Palearctic species that is classified into many subspecies (>19) based on marked phenotypic, behavioural and geographic differences. There are two described subspecies from North Africa (F. c. africana and F. c. spodiogenys), two from Europe (F. c. coelebs and F. c. gengleri) and five from the Atlantic islands (F. c. moreletti, F. c. maderensis, F. c. canariensis, F. c. ombriosa and F. c. palmae). The Atlantic islands are volcanic islands off the coast of Northwest Africa and are thought to have formed within the last 20 million years (Carracedo 1999). The Canary Islands are estimated to have formed 1-20 million years ago. The youngest island in the Canaries archipelago is El Hierro (home to F. c. ombriosa), which is estimated to be 1.1 million years old. La Palma (home to F. c. palmae) is estimated to be 2 million years old, while La Gomera (home to F. c. canariensis) is about 12 million years old (Carracedo 1999). The Madeira and Azores islands are thought to have formed about 5 million years ago. The Canaries are located about 100 km off the coast of Northwest Africa. Madeira is located between the Azores and the Canaries, about 800 km from the Azores and 400 km from the Canaries. The Azores archipelago extends approximately 600 km in a northwest-southeast direction and is about 1,500 km from Portugal (Figure 2).

Figure 2: Distribution map of the common chaffinch ( Fringilla coelebs ) showing part of its total range. The populations are named corresponding to their subspecies designations.


Based on the estimated origins of the Atlantic islands, the common chaffinch populations are believed to have diverged within the last million years. There are substantial morphological and behavioural differences between island and continental types (Grant 1979; Dennison and Baker 1991; Lynch and Baker 1994), with island populations having evolved shorter wings, longer legs, longer bills and larger bodies than continental conspecifics (Grant 1979). In addition, island populations are characterized by extensive dark blue dorsal plumage, wings and tails in males (Grant 1979). Grant (1979) also found that bill depth and width have increased substantially in the Azores compared to the Canaries, and that birds from the Canaries appear to be less variable than those from the Azores for bill, wing and leg characters. However, Dennison and Baker (1991) used 12 morphometric measurements taken from larger samples and did not find statistically significant differences in total variance between the Canary and Azores populations. They did note that total variance in the Canaries was lower than in the Azores and continental populations.

Lynch and Baker (1994) investigated the cultural evolution of chaffinch populations by analyzing song memes. Their analysis revealed that the African and European populations had the lowest levels of meme differentiation between populations. The Azores population had the highest level of meme diversity and the Canaries showed the highest level of among-population differentiation. The high levels of memetic differentiation between Atlantic islands shown in this study suggest limited migration between islands (Lynch and Baker 1994).

A previous phylogenetic analysis using mtDNA suggested that the Atlantic islands were sequentially colonized in a single wave via the Azores (Marshall and Baker 1999). However, the bootstrap support for some of the internal nodes in this phylogeny was low, and incomplete lineage sorting may have swamped the phylogenetic signal, yielding incorrect evolutionary relationships by traditional methods. A population genetic study that used the same locus to infer the location of Pleistocene refugia estimated moderate gene flow between some African and European populations, but the parameter estimates had large confidence intervals because they were inferred from a single locus. Nevertheless, the analysis suggested that Iberia, Greece and North Africa may have served as refugia during the Pleistocene glacial maximum (Griswold and Baker 2002). That study, however, did not sample any of the Atlantic island populations.

1.3 Thesis objectives

A new method of phylogenetic inference based on the coalescent was proposed by Liu and Pearl (2007) to address the problem of inferring evolutionary relationships when incomplete lineage sorting is present. This method has performed well in simulations and in a few empirical studies, showing potential for inferring correct evolutionary relationships among recently diverged species. Furthermore, coalescent theory allows us to estimate important population genetic parameters such as the population-scaled mutation rate (θ), migration rates and historical population size changes (i.e. population growth or bottlenecks). I employ these phylogenetic and population genetic tools to determine the evolutionary history of the common chaffinch, which can further our understanding of the speciation process on oceanic islands, the primary focus of my thesis.
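As a point of reference for what θ measures, the sketch below computes Watterson's estimator, a classical summary-statistic estimate of θ based on the number of segregating sites S in a sample of n sequences: θ_W = S / a_n, where a_n = 1 + 1/2 + ... + 1/(n-1). This is not the coalescent-sampler estimate used in this thesis; it is shown only because it is simple enough to verify by hand. The toy alignment and the function names are hypothetical.

```python
def count_segregating_sites(alignment):
    """Count alignment columns that contain more than one distinct base."""
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def watterson_theta(alignment):
    """Watterson's estimator of theta for the whole locus.

    theta_W = S / a_n, with a_n = sum_{i=1}^{n-1} 1/i,
    where n is the number of sequences in the sample.
    """
    n = len(alignment)
    if n < 2:
        raise ValueError("at least two sequences are required")
    a_n = sum(1.0 / i for i in range(1, n))
    return count_segregating_sites(alignment) / a_n


# Hypothetical toy alignment: 4 sequences, 10 sites, 3 variable sites.
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGTACCTAT",
]

n_segregating = count_segregating_sites(toy_alignment)
theta_locus = watterson_theta(toy_alignment)
theta_per_site = theta_locus / len(toy_alignment[0])

print(f"S = {n_segregating}")
print(f"theta_W per locus = {theta_locus:.4f}")
print(f"theta_W per site  = {theta_per_site:.4f}")
```

For this toy alignment S = 3 and a_4 = 11/6, so θ_W = 18/11, or about 1.64 per locus. Estimators of this kind ignore the genealogical information that coalescent samplers exploit, which is one reason the coalescent estimates are preferred here.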

In chapter 2, I use molecular population genetic tools to examine the population structure and historical demography of common chaffinch populations. I use the Bayesian genetic clustering program Structure to examine whether populations described on the basis of morphology correspond to genetic populations. In a coalescent framework, I estimate migration rates among populations and investigate the possibility of recent population growth. The primary objective is to determine the level of population differentiation among putative populations (specifically Atlantic island populations) and whether morphological subspecies correspond to genetic clusters. In addition, I investigate the demographic history of these populations by modeling population size changes in the recent past. One possibility is that these island populations have been isolated for thousands of generations, with very limited gene flow between them, essentially functioning as allopatric incipient species. Another possibility is that these populations have diverged very recently, experience ongoing gene flow and function as demes. Between these two extremes is the possibility that these populations have diverged relatively recently, have limited ongoing gene flow and are possibly in the process of acquiring reproductive isolation, in which case we may be seeing speciation in progress.

In chapter 3, I investigate evolutionary relationships between common chaffinch populations by constructing multilocus intraspecific phylogenies. I construct trees from concatenated sequence data using the popular phylogenetic package MrBayes, and also using the new coalescent-based phylogenetic inference method implemented in the program BEST. First, I investigate the effects of sampling on concatenated phylogenetic inference by comparing phylogenies constructed using few (4) sequences per population with those constructed using many (16) sequences per population. It is common practice in phylogenetics to use a single sequence to represent a species because interspecific sequence variation is expected to be significantly higher than intraspecific variation. For recently diverged groups, however, this may result in incorrect evolutionary inferences due to the stochasticity of the lineage sorting process. Therefore, I use a new coalescent-based phylogenetic method (BEST) (Liu and Pearl 2007; Liu 2008), which accounts for incomplete lineage sorting, to infer evolutionary relationships. I compare and contrast the phylogenies estimated by each method to determine the most likely evolutionary relationships among chaffinch populations.

I summarize my findings from the population genetic and phylogenetic analyses in chapter 4 to elucidate the evolutionary history of the common chaffinch. I also discuss future research directions in phylogenetic and population genetic inference methods to improve the accuracy of historical inferences. Chapters 2 and 3 are written as two primary research articles to be published in separate scientific journals.

1.4 References

Baker, A. J. 2007. Molecular advances in the study of geographic variation and speciation in birds. Auk 124:18-29.

Brito, P. H., and S. V. Edwards. 2008. Multilocus phylogeography and phylogenetics using sequence-based markers. Genetica:1-17.

Carracedo, J. C. 1999. Growth, structure, instability and collapse of Canarian volcanoes and comparisons with Hawaiian volcanoes. J Volcanol Geotherm Res 94:1-19.

Carstens, B. C., and L. L. Knowles. 2007. Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: An example from Melanoplus grasshoppers. Syst Biol 56:400-411.

Coyne, J. A., and H. A. Orr. 2004. Speciation. Sinauer Associates, Sunderland, MA.

Degnan, J. H., and L. A. Salter. 2005. Gene tree distributions under the coalescent process. Evolution 59:24-37.

Dennison, M. D., and A. J. Baker. 1991. Morphometric variability in continental and Atlantic island populations of chaffinches (Fringilla coelebs). Evolution 45:29-39.

Grant, P. R. 1979. Evolution of the Chaffinch, Fringilla coelebs, on the Atlantic islands. Biol J Linn Soc 11:301-332.

Griswold, C. K., and A. J. Baker. 2002. Time to the most recent common ancestor and divergence times of populations of common chaffinches (Fringilla coelebs) in Europe and North Africa: Insights into Pleistocene refugia and current levels of migration. Evolution 56:143-153.

Kingman, J. F. C. 1982. On the genealogy of large populations. J Appl Probab 19A:27-43.

Kuhner, M. K. 2009. Coalescent genealogy samplers: windows into population history. Trends Ecol Evol 24:86-93.

Liu, L. 2008. BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics 24:2542-2543.

Liu, L., and D. K. Pearl. 2007. Species trees from gene trees: Reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Syst Biol 56:504-514.

Lynch, A., and A. J. Baker. 1994. A population memetics approach to cultural evolution in chaffinch song: Differentiation among populations. Evolution 48:351-359.

Maddison, W. P. 1997. Gene trees in species trees. Syst Biol 46:523-536.

Marshall, H. D., and A. J. Baker. 1999. Colonization history of Atlantic island common chaffinches (Fringilla coelebs) revealed by mitochondrial DNA. Mol Phylogenet Evol 11:201-212.

Pluzhnikov, A., and P. Donnelly. 1996. Optimal sequencing strategies for surveying molecular genetic diversity. Genetics 144:1247-1262.

Rosenberg, N. A., and M. Nordborg. 2002. Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat Rev Genet 3:380-390.

Chapter 2 Population genetic structure and historical demography of the common chaffinch ( Fringilla coelebs )

2.1 Abstract

The widespread common chaffinch (Fringilla coelebs) is divided into over 19 subspecies based on morphological differences. Here, I investigated the genetic structure and demographic history of common chaffinch populations in Europe, Northern Africa and the Atlantic islands using 10 sequence-based genetic markers (mtDNA and 9 nuclear markers). The migration rate estimates from the multilocus coalescent analysis showed significant genetic differentiation among Atlantic island populations (4Nm < 1). High levels of gene flow were detected between the British chaffinch (F. c. gengleri) and the Western European chaffinch (F. c. coelebs) (4Nm = 7.86). The mismatch distribution analysis of mtDNA and the multilocus analysis suggest that all populations (except F. c. ombriosa) have been growing at relatively equal rates. Our results suggest that the British chaffinch is part of the more widespread Western European chaffinch population.

2.2 Introduction

The question of what constitutes a species is not without its share of controversy, and species delimitation continues to be a source of debate among biologists (Mallet 1995; Sites Jr and Marshall 2003; Hey 2006). Although the most commonly used definition of species today is the biological species concept (Mayr 1963), species are frequently described based only on morphology. In addition to morphological and biological species concepts, other species concepts such as genealogical species (Baum and Shaw 1995) rely on a phylogenetic framework, and define a species as “a basal, exclusive group of organisms, whose members are more closely related to each other than they are to any organism outside the group”. Furthermore, some have argued that “species” is a human classification constructed to divide the biodiversity continuum and not a real separate entity (Mallet 2008). Regardless, “species” is the best recognized unit of biodiversity in evolutionary biology and conservation. Even though the large majority of described species are morphological species, genetics has played an increasingly important role in species delimitation in the past few decades (Hebert et al. 2003; Sites Jr and Marshall 2003).

In the continuum of biological organization from genes to populations to higher levels of organization such as phyla, subspecies represent an intermediate step between populations and species. In general, subspecies are populations that have diverged sufficiently in morphological traits to warrant a distinction, but whose trait differences are not pronounced enough to classify them as distinct species. The common chaffinch (Fringilla coelebs), a widespread Palearctic passerine species, provides a great opportunity to study this continuum from structured populations to recognized species. Common chaffinches occupy much of Eurasia, Northern Africa and the Atlantic islands of the Canaries, Madeira and the Azores, and these populations have undergone sufficiently marked phenotypic differentiation to warrant designation of multiple subspecies (>19) (Grant 1979; Baker et al. 1990; Marshall and Baker 1999; Suárez et al. 2009).


The study of geographic variation in a species is often the first step in the quest to understand how a single ancestral species gives rise to two or more daughter species. Since the advent of phylogeography, mtDNA has played a crucial role in furthering our understanding of genetic variation within species, and it continues to play a critical role. mtDNA is the molecule of choice for phylogeographic and population genetic studies because it is easy to amplify, does not recombine (but see Rokas et al. 2003; White et al. 2008) and mutates faster than the average nuclear locus. These properties make mtDNA genes ideal for building gene genealogies and estimating population genetic parameters. However, coalescent theory predicts that gene trees from different loci can differ substantially in topology, and that different genes can have different histories even though they come from the same populations (Maddison 1997; Wakeley 2008; Nielsen and Beaumont 2009). Therefore, population genetic parameters estimated from a single locus may not accurately reflect the true evolutionary history of the population (Hey and Machado 2003; Nielsen and Beaumont 2009). Additionally, population genetic parameters estimated from a single locus have very large 95% credible intervals due to stochastic variance in the coalescent process (Edwards and Beerli 2000; Baker 2007). For these reasons, over-interpretation of phylogeographies and population genetic parameter estimates from a single locus has been criticized (Knowles and Maddison 2002; Edwards et al. 2005). Furthermore, concerns have been raised about historical demography based solely on mtDNA because mtDNA genes could be affected by selective sweeps (Bensch et al. 2006) and by sex biases in fitness or dispersal (Hare 2001). With rapidly advancing DNA sequencing technology and the development of new analytical tools based on coalescent theory, population genetic parameter estimates and interpretations from multiple loci can avoid the problems inherent in single-locus interpretations.

Baker et al. (1990) conducted a population genetic survey of the common chaffinch using 22 polymorphic allozyme loci and showed that the highest level of genetic differentiation was between continental (European and African populations pooled together) and Atlantic island populations (FST = 0.386). There was a high level of genetic differentiation among Atlantic islands (FST = 0.321), but the European population was only weakly differentiated from the African population (FST = 0.092). Furthermore, populations on different islands of the Canaries have undergone significantly more differentiation than populations on different islands of the Azores. The mean FST among Canary populations was 0.272, while it was 0.047 among the Azores islands (Baker et al. 1990). This suggests that even though distances between the Azores islands are greater than those between the Canary Islands, the Azores populations are essentially panmictic. They also reported that the Azores islands support much larger populations than the Canary Islands because of more agricultural land and a much more humid climate in the Azores.
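A rough way to relate the FST values above to gene flow is Wright's island-model approximation for diploid nuclear loci at equilibrium, FST ≈ 1 / (1 + 4Nm), which rearranges to 4Nm ≈ 1/FST - 1. The sketch below applies that rearrangement to the allozyme FST values reported by Baker et al. (1990). It is illustrative only: the island model assumes many equally sized populations exchanging migrants symmetrically at equilibrium, which is unlikely to hold for these populations, and it is not the coalescent method used in this study. The function names and dictionary labels are hypothetical.

```python
def four_nm_from_fst(fst):
    """Scaled migration rate 4Nm implied by FST under Wright's island model.

    Uses FST ~= 1 / (1 + 4Nm), so 4Nm ~= 1/FST - 1.
    Assumes diploid nuclear loci and migration-drift equilibrium.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return 1.0 / fst - 1.0


def fst_from_four_nm(four_nm):
    """Equilibrium FST implied by a scaled migration rate 4Nm."""
    if four_nm < 0.0:
        raise ValueError("4Nm must be non-negative")
    return 1.0 / (1.0 + four_nm)


# FST values as quoted in the text from Baker et al. (1990).
reported_fst = {
    "continent vs Atlantic islands": 0.386,
    "among Atlantic islands": 0.321,
    "Europe vs Africa": 0.092,
    "among Canary Islands": 0.272,
    "among Azores islands": 0.047,
}

implied_four_nm = {
    label: four_nm_from_fst(fst) for label, fst in reported_fst.items()
}

for label, value in implied_four_nm.items():
    print(f"{label:32s} FST = {reported_fst[label]:.3f}  4Nm ~ {value:.2f}")
```

Under these assumptions, the low FST among the Azores islands corresponds to a scaled migration rate roughly an order of magnitude higher than that implied among the Canary Islands, which is consistent with the contrast drawn in the text. The absolute values should not be compared directly with coalescent estimates of 4Nm.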

The widespread distribution of common chaffinches on continents and oceanic islands provides an excellent opportunity to investigate the effects of geographic isolation and genetic drift on population divergence and speciation. In this study, I focus on common chaffinch populations in Western Europe, Northern Africa and the Atlantic islands to investigate population structure and demographic history using coalescent-based multilocus population genetic tools. For simplicity, I use current subspecies designations to refer to the putative populations because subspecies names correspond very closely to geographic locations (Figure 1). However, the validity of certain subspecies designations has been questioned. One such point of contention is the validity of the British chaffinch (F. c. gengleri) as a separate subspecies from the more widespread F. c. coelebs in the rest of Western Europe. Others have questioned the validity of the three subspecies from the Canary Islands. For example, Mayr (1968) assigns the El Hierro population to a separate subspecies (F. c. ombriosa) from the La Palma population (F. c. palmae), while Baker et al. (1990) claimed that these two populations were phenotypically indistinguishable. Here, I investigate the genetic structure of common chaffinch populations using haplotype networks, Bayesian genetic clustering and estimates of migration rates between populations. I also estimate the effective population size and investigate the possibility of historical population size changes. Population genetic parameters are estimated in a multilocus coalescent framework to avoid the problems inherent to single-locus methods, thus providing more reliable inferences about the history of the common chaffinch.


Figure 1: Distribution map of the common chaffinch ( Fringilla coelebs ) showing part of its total range. The populations are named corresponding to the subspecies designation. There are three putative populations in the Canaries, each occupying different islands.

2.3 Methods

2.3.1 Sampling

A total of 121 chaffinches were sampled from nine populations corresponding to their subspecies designations (Figure 1). Appendix I provides the details of the sampled individuals. Nineteen individuals classified as F. c. coelebs were sampled from continental Europe (Denmark and Greece), eight individuals classified as F. c. gengleri from the United Kingdom, and fourteen individuals from Morocco and twelve from Tunisia, corresponding to F. c. africana and F. c. spodiogenys, respectively. From the Atlantic islands, fourteen individuals were sampled from the Azores population of F. c. moreletti, fourteen from the Madeira population of F. c. maderensis, and thirty-nine individuals from the Canary Islands. Twelve of the Canary Island individuals were from the island of El Hierro, classified as F. c. ombriosa; fourteen were from La Palma, classified as F. c. palmae; and fourteen were from La Gomera, classified as F. c. canariensis.

2.3.2 DNA extraction, amplification and sequencing

Genomic DNA was extracted from frozen muscle tissue using the standard proteinase K-phenol-chloroform method (Sambrook and Russell 2001) or rapid alkaline extraction (Rudbeck and Dissing 1998). Briefly, rapid alkaline extraction was carried out by first adding a minute amount of muscle tissue (equivalent to 2-3 µL of blood) to a 96-well PCR plate containing 20 µL of 0.2 M NaOH. The plate was covered and heated to 75 °C for 20 minutes in a thermocycler. Finally, the solution was neutralized by adding 180 µL of 0.04 M Tris-HCl (pH 7.5) and frozen.

The control region of the mtDNA and nine nuclear loci were amplified using locus-specific primers (Table 1). The amplified loci were: Aconitase II (ACON: Backström et al. 2008); B-Actin (B-ACT: Waltari and Edwards 2002); Elongation factor 1 alpha (EF1 α: Backström et al. 2008); Glyceraldehyde-3-phosphate dehydrogenase (GAPD: Friesen et al. 1999); Locus L27331 (L27331: Backström et al. 2008); Protein tyrosine phosphatase non-receptor 12 (PTPN12: Townsend et al. 2008); Tropomyosin (TROP: Friesen et al. 1999); Ubiquitin carboxyl-terminal esterase (UBIQ: Backström et al. 2008); Anonymous locus OH (ANON OH); and the control region of mtDNA (CR: Marshall and Baker 1997). All the nuclear loci with the exception of PTPN12 were non-coding.

The anonymous locus OH was developed by the following method. First, chaffinch genomic DNA was isolated, purified and digested using five different restriction enzymes, which were selected to leave single-stranded overhangs at each cut site. The enzymes used were HindIII, EcoRI, XbaI, XhoI and NheI. The digested DNA was then size selected and ligated to double-stranded linkers containing specific single-strand overhangs complementary to the restriction enzyme cut sites. The resulting library represented the complete chaffinch genome, with an average fragment size of approximately 1,500 bp and each fragment having amplifiable ends. Using a primer specific for the retrotransposon CR1 and a primer specific for the linker, fragments were amplified, size selected and cloned. A number of clones were sequenced to identify regions flanking the repetitive element. The flanking regions were searched with BLAST against the chicken genome to ensure they were present as a single copy, and from the candidate sequences primers were designed to amplify this region but to exclude the repetitive element.


Table 1: Sampled loci for population genetic analysis

Locus | Primer sequence | Fragment size (bp) | Chromosomal location in zebra finch | Model of sequence evolution | Reference
Aconitase II (ACON) | F- CCAATGCTTGTGGGCCATG; R- ATTGCGACCTGTGAAATTCC | 750 | 1A | GTR+I | Backström et al. 2008
Anonymous locus OH (ANON OH) | F- TCCCATTGCAACAACCTGTTCAC; R- GGGCACTTCAGTCACTCTGAC | 400 | 2 | GTR+I | NA
B-Actin (B-ACT) | F- CCTGATGGTCAGGTCATCA; R- CAGCAATGCCAGGGTACAT | 500 | 5 | GTR+I+G | Waltari and Edwards 2002
Elongation factor 1 alpha (EF1 α) | F- ATTGGCTACAACCCAGACAC; R- CAGGATGCAGTCCAAGGCT | 400 | 3 | HKY+G | Backström et al. 2008
Glyceraldehyde 3 phosphate dehydrogenase (GAPD) | F- ACCTTTAATGCGGGTGCTGGCATTGC; R- CATCAAGTCCACAACACGGTTGCTGTA | 350 | 1 | HKY+G | Friesen et al. 1997
Locus L27331 (L27331) | F- CCTAGCTAAATATGTTCTGGC; R- TAGGCTTCCTGATGATGGCT | 820 | 1 | GTR+I+G | Backström et al. 2008
Protein tyrosine phosphatase non-receptor 12 (PTPN12) | F- AGTTGCCTTGTWGAAGGRGATGC; R- CTRGCAATKGACATYGGYAATAC | 830 | 1A | HKY+I | Townsend et al. 2008
Tropomyosin (TROP) | F- GAGTTGGATCGGGCTCAGGAGCG; R- CGGTCAGCCTCTTCAGCAATGTGCTT | 450 | 28 | GTR+I | Friesen et al. 1999
Ubiquitin carboxyl-terminal esterase (UBIQ) | F- GCTTGTGGGACAATTGGG; R- TATTTGGCCCTCTCTTCAGG | 450 | 1 | HKY+I | Backström et al. 2008
mtDNA Control region (CR) | F- TCAGGGTATGTATAATATGC; R- CACTTGCTGTGAAGAGC | 480 | NA | GTR+I+G | Marshall and Baker 1997


Polymerase chain reactions (PCR) were carried out using 1.5 µL of DNA in a total reaction volume of 12.5 µL, with 1.25 µL of PCR buffer (10 mM Tris-HCl pH 8.3, 2.5 mM MgCl2, 50 mM KCl and 0.01% gelatin), 0.28 µL of 1X dNTPs, 0.5 µL of each primer, and 0.05 µL of Platinum ® Taq (5 units/µL) (Invitrogen Inc). The thermocycling profile was as follows: an initial denaturation step at 94 °C for 4 min; a total of 35 cycles, each consisting of a 30 sec denaturation step at 94 °C, a 30 sec annealing step starting at 65 °C and decreasing by one degree per cycle until the annealing temperature reached 55 °C, and a 30 sec extension step at 72 °C; and a final extension of 5 min at 72 °C. Amplified product was isolated by separation in a 2% agarose gel; the DNA band was cut out of the gel and purified for sequencing. Cycle sequencing reactions with forward and reverse PCR primers were carried out using BigDye Terminator 3.1 (Applied Biosystems) and visualized on an ABI 3100 DNA Sequencer.
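The touchdown annealing profile described above can be written out cycle by cycle. The following is a minimal sketch, assuming that the 35 cycles include the touchdown cycles and that the annealing temperature stays at 55 °C once it is reached; the function name and its parameters are illustrative and are not taken from any thermocycler software.

```python
def touchdown_annealing_temps(start_c=65, floor_c=55, total_cycles=35):
    """Annealing temperature (deg C) for each PCR cycle of a touchdown profile."""
    temps = []
    for cycle in range(total_cycles):
        # Drop 1 deg C per cycle, but never go below the floor temperature.
        temps.append(max(start_c - cycle, floor_c))
    return temps


schedule = touchdown_annealing_temps()
print(schedule[:12])        # first twelve annealing temperatures
print(schedule.count(55))   # number of cycles run at the floor temperature
```

Under these assumptions, ten cycles step down from 65 °C to 56 °C and the remaining 25 cycles anneal at 55 °C.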

2.3.3 Data analysis

For each locus, all sequences were edited using Chromas Pro 1.42 (Technelysium Pty. Ltd., Australia). The sequences were aligned initially with the program ClustalW (Thompson et al. 1994) in the BioEdit Sequence Alignment Editor (Hall 1999) and adjusted manually. For nuclear loci, heterozygous sites were identified from the presence of two equal-height peaks in the chromatograms. The program PHASE 2.1 (Stephens and Scheet 2005) was used to resolve the haplotypes from the unphased genotype data when a sequence contained multiple heterozygous sites. For a small subset of individuals that did not sequence well directly or that were needed for phasing, PCR products were cloned into the pCR 2.1 TOPO TA cloning vector (Invitrogen) and sequenced in both directions using M13 primers. This was done to eliminate the possibility of using paralogous genes for analysis and to verify that statistically inferred haplotypes were correct.

For each locus, the number of segregating sites (S), nucleotide diversity (π), haplotype diversity (Hd), Tajima’s D statistic (Tajima 1989), Fu and Li’s F* statistic (Fu and Li 1993) and the minimum number of recombination events (RM; Hudson and Kaplan 1985) were estimated using the DnaSP 5.0 software package (Librado and Rozas 2009). Deviation from the standard neutral model was tested using Tajima’s D and Fu and Li’s F* statistics, and significance was assessed by conducting 1000 coalescent simulations.
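To make the summary statistics concrete, the sketch below computes S, π, Hd and Tajima's D for a small set of aligned haplotypes. It is an illustration of the textbook definitions (Tajima 1989), not a reimplementation of DnaSP: it assumes equal-length sequences with no gaps or ambiguity codes, it returns D = 0 when S = 0 as a convention, and the function names and toy alignment are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(n, S, k):
    """Tajima's D from sample size n, segregating sites S and mean pairwise differences k."""
    if S == 0:
        return 0.0  # convention assumed here: no variation, report zero
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


def summary_stats(seqs):
    """Return (S, pi per site, Hd, Tajima's D) for aligned haplotype strings."""
    n, length = len(seqs), len(seqs[0])
    # S: alignment columns showing more than one nucleotide state.
    S = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    # k: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Hd: haplotype diversity with the n / (n - 1) sample-size correction.
    freqs = [seqs.count(h) / n for h in set(seqs)]
    hd = n / (n - 1) * (1 - sum(p * p for p in freqs))
    return S, k / length, hd, tajimas_d(n, S, k)


toy = ["ACGT", "ACGT", "ACGA", "TCGA"]  # hypothetical alignment
print(summary_stats(toy))
```

Significance in the thesis was assessed with coalescent simulations in DnaSP; this sketch only computes the point estimates.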

The Hudson-Kreitman-Aguadé (HKA) test (Hudson et al. 1987) implemented in DnaSP was conducted on loci that deviated from neutral predictions according to Tajima’s D and Fu and Li’s F* tests. The HKA test compares the observed and expected numbers of segregating sites within a species and the number of pairwise differences between species at two or more loci, and is more sensitive in detecting natural selection than neutrality tests based on the site frequency spectrum (Zhai et al. 2009). I also conducted the four-gamete test (Hudson and Kaplan 1985) implemented in DnaSP to determine whether recombination needed to be incorporated into the coalescent model for population genetic parameter estimates.
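The logic of the four-gamete test can be shown with a short sketch. Under an infinite-sites model without recombination, two biallelic sites can produce at most three of the four possible two-site haplotypes, so observing all four implies at least one recombination event between them. The code below only lists such site pairs; deriving RM from them requires the additional interval-pruning step of Hudson and Kaplan (1985), which is not shown. The function name and example haplotypes are hypothetical.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes):
    """Return index pairs of biallelic sites at which all four gametes are observed."""
    columns = list(zip(*haplotypes))
    # Only sites with exactly two states are informative for this test.
    biallelic = [i for i, col in enumerate(columns) if len(set(col)) == 2]
    incompatible = []
    for i, j in combinations(biallelic, 2):
        gametes = set(zip(columns[i], columns[j]))
        if len(gametes) == 4:
            incompatible.append((i, j))
    return incompatible


print(four_gamete_pairs(["AC", "AT", "GC", "GT"]))  # all four gametes present
print(four_gamete_pairs(["AC", "AT", "GC"]))        # only three gametes present
```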

I constructed median-joining haplotype networks for each locus using NETWORK 4.5.1 (available at http://www.fluxustechnology.com/) to explore relationships among sequences at each locus. The goal of constructing haplotype networks was to determine whether (a) haplotypes segregate according to geographic location and (b) there are signatures of recent population expansion. Haplotype networks therefore function as a qualitative first-pass test for exploring population structure and expansion.

To detect historical population expansion events, mismatch distributions were calculated using the program Arlequin 3.1 (Excoffier et al. 2005). The expected distribution under a sudden demographic expansion model (Rogers and Harpending 1992) was generated using 1000 parametric bootstrap replicates. The sum-of-squared deviations (SSD) between the observed and expected mismatch distributions was computed, and significant deviation from the demographic expansion model was assessed by calculating the proportion of simulations producing a larger SSD than the observed SSD (α = 0.05).
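The quantities used in this test can be sketched as follows: the observed mismatch distribution is the distribution of pairwise differences, the SSD measures its distance from the expected distribution, and the p-value is the share of simulated SSDs that reach the observed one. This is an illustration only. It does not fit the expansion model itself (Arlequin estimates the model parameters and simulates the replicates), the expected distribution is taken as given, and all names and the toy alignment are hypothetical. Ties are counted as "at least as large", which is an assumption.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Proportion of sequence pairs that differ at 0, 1, 2, ... sites."""
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    counts = Counter(diffs)
    return [counts[d] / len(diffs) for d in range(max(diffs) + 1)]


def ssd(observed, expected):
    """Sum of squared deviations between two distributions, padded to equal length."""
    size = max(len(observed), len(expected))
    obs = list(observed) + [0.0] * (size - len(observed))
    exp = list(expected) + [0.0] * (size - len(expected))
    return sum((o - e) ** 2 for o, e in zip(obs, exp))


def p_value(observed_ssd, simulated_ssds):
    """Proportion of simulated SSD values at least as large as the observed SSD."""
    return sum(s >= observed_ssd for s in simulated_ssds) / len(simulated_ssds)


toy = ["ACGT", "ACGT", "ACGA", "TCGA"]  # hypothetical alignment
print(mismatch_distribution(toy))
```

A small p-value (below α = 0.05) indicates that the observed distribution departs from the expansion model more than most simulated replicates do.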

To determine the genetic structure of the common chaffinch, I used the Bayesian genetic clustering program Structure 2.3.1 (Pritchard et al. 2000), which utilizes allele frequencies from multiple loci to assign sampled individuals to genetic clusters. Data were coded as haplotypes. Analysis was conducted with a burn-in of 400,000 iterations followed by two million MCMC iterations under the admixture model with correlated allele frequencies (Falush et al. 2003). Ten independent runs with different seed numbers were conducted at each K to calculate Ln [Pr(X|K)]. Simulations were run from K = 1 to K = 10.

I used the coalescent genealogy sampler program LAMARC 2.1.3 (Kuhner and Smith 2007) to estimate the population genetic parameters theta (θ), migration rates (M) and the population growth parameter (g), and incorporated recombination into the model. Two Bayesian replicates with four simultaneous searches with adaptive heating were conducted. The first 10,000 trees were discarded as burn-in, and then 50,000 trees were collected for parameter estimates by sampling every 50th tree. The best model of nucleotide sequence evolution for each locus was identified using the Akaike Information Criterion (AIC) in MrModeltest 2.3 (Nylander 2004) and specified for the analysis in LAMARC. Overall curve files for each parameter were inspected to ensure that parameter space was searched adequately and that estimates were reliable.

2.4 Results

2.4.1 Summary statistics and neutrality tests

In general, the number of segregating sites (S) and the average pairwise number of nucleotide differences per site between sequences (π) were higher in continental populations than in island populations (Table 2). The Canary Island populations harbored the least genetic diversity, while the continental European F. c. coelebs population was the most diverse. The diversity statistics (S, π, Hd) among the three Canaries populations were very similar. Among the Atlantic islands, the Azores population appears to be the most genetically diverse, but the Madeira population is close to it in its level of genetic diversity. The F. c. coelebs and F. c. gengleri populations had comparable numbers of segregating sites and nucleotide diversity at most loci. On the African continent, the F. c. africana population appears to be slightly more genetically diverse than the F. c. spodiogenys population. Recombination was detected for all examined loci except for the EF1 α locus. The B-Actin locus contained the highest number of recombination events as detected by the four-gamete test. No more than one recombination event per population was detected for the GAPD, PTPN12, TROP and UBIQ loci.

The results from Tajima’s D and Fu and Li’s F* neutrality tests were not significant (α = 0.05) except in a few cases, suggesting conformity to the neutral coalescent model. In general, Tajima’s D and Fu and Li’s F* statistics for most populations were negative but not significant (Table 2). Fu and Li’s F* statistic was negative for at least seven of the 10 loci in the F. c. africana, F. c. spodiogenys, F. c. coelebs, F. c. gengleri, F. c. maderensis and F. c. palmae populations. Fu and Li’s F* was negative across all loci for the F. c. coelebs population. Both Tajima’s D and Fu and Li’s F* were significantly negative for the F. c. palmae population at the EF1 α locus (D = -1.733, F* = -2.754), and for the F. c. coelebs population at the L27331 locus (D = -1.993, F* = -3.525). At the TROP locus, Tajima’s D was significantly negative for the F. c. maderensis population (D = -1.931) and Fu and Li’s F* was significantly negative for the F. c. ombriosa population (F* = -2.658). Since both these tests are sensitive to demographic history, the HKA test was conducted with the blue chaffinch as the outgroup. The HKA test did not detect significant deviation from neutrality (χ2 = 0.215, p = 0.64). Both Tajima’s D and Fu and Li’s F* were significantly positive for the control region of mtDNA in the F. c. moreletti population in the Azores (D = 2.186, F* = 1.780).


Table 2: Population genetic summary statistics

Locus | Population | N | S | π | Hd | Tajima's D | Fu and Li's F* | RM
ACON | F. c. canariensis | 12 | 0 | 0 | 0 | 0 | 0 | 0
ACON | F. c. maderensis | 18 | 7 | 0.0024 | 0.673 | -0.625 | 0.382 | 0
ACON | F. c. africana | 20 | 15 | 0.0055 | 0.726 | -0.346 | -0.832 | 1
ACON | F. c. spodiogenys | 20 | 14 | 0.0036 | 0.442 | -1.302 | -1.881 | 0
ACON | F. c. coelebs | 32 | 13 | 0.0032 | 0.716 | -0.979 | -0.965 | 2
ACON | F. c. gengleri | 16 | 9 | 0.0036 | 0.842 | -0.594 | -0.844 | 0
ACON | F. c. ombriosa | 10 | 0 | 0 | 0 | 0 | 0 | 0
ACON | F. c. palmae | 26 | 1 | 0.0001 | 0.077 | -1.156 | -1.727 | 0
ACON | F. c. moreletti | 28 | 8 | 0.0015 | 0.373 | -1.459 | -0.444 | 0
ANON OH | F. c. canariensis | 26 | 2 | 0.0017 | 0.52 | 0.184 | -0.514 | 0
ANON OH | F. c. maderensis | 28 | 9 | 0.0073 | 0.825 | 0.165 | -0.221 | 2
ANON OH | F. c. africana | 22 | 8 | 0.0066 | 0.597 | 0.020 | 0.150 | 1
ANON OH | F. c. spodiogenys | 24 | 2 | 0.0009 | 0.236 | -0.92 | -0.843 | 0
ANON OH | F. c. coelebs | 32 | 11 | 0.0082 | 0.847 | 0.300 | -0.084 | 2
ANON OH | F. c. gengleri | 16 | 14 | 0.0168 | 0.958 | 0.834 | 0.481 | 4
ANON OH | F. c. ombriosa | 22 | 8 | 0.0094 | 0.853 | 1.315 | 1.064 | 2
ANON OH | F. c. palmae | 28 | 1 | 0.0005 | 0.071 | -1.151 | -1.747 | 0
ANON OH | F. c. moreletti | 28 | 10 | 0.0073 | 0.836 | 0.257 | -0.052 | 0
B-ACT | F. c. canariensis | 26 | 1 | 0.0002 | 0.077 | -1.156 | -1.727 | 0
B-ACT | F. c. maderensis | 26 | 15 | 0.0048 | 0.628 | -1.803 | -1.299 | 0
B-ACT | F. c. africana | 28 | 20 | 0.0063 | 0.675 | -1.658 | -2.116 | 2
B-ACT | F. c. spodiogenys | 24 | 15 | 0.0081 | 0.822 | -0.094 | 0.321 | 2
B-ACT | F. c. coelebs | 36 | 28 | 0.0116 | 0.974 | -1.037 | -0.665 | 5
B-ACT | F. c. gengleri | 16 | 25 | 0.0150 | 0.992 | -0.412 | 0.202 | 4
B-ACT | F. c. ombriosa | 20 | 0 | 0 | 0 | 0 | 0 | 0
B-ACT | F. c. palmae | 26 | 0 | 0 | 0 | 0 | 0 | 0
B-ACT | F. c. moreletti | 24 | 9 | 0.0056 | 0.674 | 0.334 | 0.793 | 1
EF1 α | F. c. canariensis | 26 | 1 | 0.0005 | 0.148 | -0.714 | 0.288 | 0
EF1 α | F. c. maderensis | 28 | 4 | 0.0028 | 0.384 | -1.552 | -1.290 | 0
EF1 α | F. c. africana | 26 | 10 | 0.0092 | 0.763 | -1.105 | -1.345 | 0
EF1 α | F. c. spodiogenys | 24 | 15 | 0.0092 | 0.899 | -0.729 | -0.794 | 0
EF1 α | F. c. coelebs | 32 | 19 | 0.0075 | 0.917 | -1.653 | -1.868 | 0
EF1 α | F. c. gengleri | 16 | 12 | 0.0060 | 0.858 | -1.61 | -2.133 | 0
EF1 α | F. c. ombriosa | 16 | 0 | 0 | 0 | 0 | 0 | 0
EF1 α | F. c. palmae | 28 | 3 | 0.0007 | 0.206 | -1.733* | -2.754* | 0
EF1 α | F. c. moreletti | 26 | 7 | 0.0050 | 0.828 | -0.675 | -0.643 | 0
GAPD | F. c. canariensis | 24 | 1 | 0.0016 | 0.522 | 1.596 | 1.016 | 0
GAPD | F. c. maderensis | 28 | 4 | 0.0016 | 0.479 | -1.256 | -1.191 | 0
GAPD | F. c. africana | 28 | 10 | 0.0052 | 0.712 | -1.081 | -2.243 | 0
GAPD | F. c. spodiogenys | 24 | 7 | 0.0029 | 0.605 | -1.512 | -2.160 | 0
GAPD | F. c. coelebs | 38 | 12 | 0.0048 | 0.634 | -1.406 | -0.559 | 1
GAPD | F. c. gengleri | 16 | 13 | 0.0088 | 0.95 | -1.059 | -0.580 | 0
GAPD | F. c. ombriosa | 24 | 5 | 0.0023 | 0.538 | -1.353 | -1.326 | 0
GAPD | F. c. palmae | 26 | 4 | 0.0020 | 0.403 | -1.033 | -0.270 | 0
GAPD | F. c. moreletti | 28 | 6 | 0.0052 | 0.799 | 0.360 | 1.123 | 0
L27331 | F. c. canariensis | 22 | 10 | 0.0046 | 0.874 | 0.873 | 0.246 | 0
L27331 | F. c. maderensis | 20 | 15 | 0.0036 | 0.832 | -1.228 | -0.289 | 1
L27331 | F. c. africana | 24 | 20 | 0.0053 | 0.957 | -0.930 | -0.362 | 1
L27331 | F. c. spodiogenys | 24 | 14 | 0.0058 | 0.841 | 0.472 | 0.453 | 2
L27331 | F. c. coelebs | 32 | 20 | 0.0029 | 0.833 | -1.993* | -3.525* | 1
L27331 | F. c. gengleri | 16 | 17 | 0.0043 | 0.942 | -1.282 | -1.703 | 2
L27331 | F. c. ombriosa | 10 | 9 | 0.0046 | 0.867 | 0.607 | 0.352 | 0
L27331 | F. c. palmae | 24 | 11 | 0.0071 | 0.496 | 1.012 | 1.527 | 2
L27331 | F. c. moreletti | 24 | 14 | 0.0071 | 0.928 | 1.751 | 1.531 | 2
PTPN12 | F. c. canariensis | 16 | 7 | 0.0020 | 0.733 | -0.837 | -1.118 | 0
PTPN12 | F. c. maderensis | 16 | 4 | 0.0009 | 0.442 | -1.268 | -0.920 | 0
PTPN12 | F. c. africana | 24 | 12 | 0.0025 | 0.808 | -1.261 | -1.384 | 1
PTPN12 | F. c. spodiogenys | 22 | 9 | 0.0020 | 0.879 | -1.201 | -0.589 | 0
PTPN12 | F. c. coelebs | 34 | 12 | 0.0019 | 0.811 | -1.519 | -2.118 | 0
PTPN12 | F. c. gengleri | 16 | 9 | 0.0023 | 0.867 | -1.010 | -1.991 | 1
PTPN12 | F. c. ombriosa | 8 | 4 | 0.0016 | 0.75 | -0.726 | -0.963 | 0
PTPN12 | F. c. palmae | 20 | 2 | 0.0008 | 0.563 | 0.173 | -0.444 | 0
PTPN12 | F. c. moreletti | 28 | 7 | 0.0016 | 0.677 | -0.786 | -0.349 | 0
TROP | F. c. canariensis | 26 | 1 | 0.0013 | 0.508 | 1.533 | 0.991 | 0
TROP | F. c. maderensis | 28 | 8 | 0.0019 | 0.545 | -1.931* | -2.133 | 0
TROP | F. c. africana | 28 | 5 | 0.0021 | 0.577 | -0.902 | -0.780 | 0
TROP | F. c. spodiogenys | 24 | 2 | 0.0004 | 0.083 | -1.515 | -2.281 | 0
TROP | F. c. coelebs | 36 | 8 | 0.0034 | 0.771 | -0.891 | -0.868 | 1
TROP | F. c. gengleri | 14 | 6 | 0.0034 | 0.802 | -0.976 | -1.342 | 0
TROP | F. c. ombriosa | 24 | 3 | 0.0006 | 0.163 | -1.732 | -2.658* | 0
TROP | F. c. palmae | 26 | 1 | 0.0005 | 0.212 | -0.311 | 0.415 | 0
TROP | F. c. moreletti | 28 | 4 | 0.0025 | 0.648 | 0.010 | 0.058 | 0
UBIQ | F. c. canariensis | 26 | 1 | 0.0002 | 0.077 | -1.156 | -1.727 | 0
UBIQ | F. c. maderensis | 28 | 3 | 0.0007 | 0.267 | -1.527 | -1.694 | 0
UBIQ | F. c. africana | 26 | 10 | 0.0052 | 0.523 | -1.105 | -1.345 | 1
UBIQ | F. c. spodiogenys | 24 | 8 | 0.0057 | 0.652 | -0.735 | -0.470 | 0
UBIQ | F. c. coelebs | 34 | 9 | 0.0055 | 0.82 | -0.786 | -1.402 | 1
UBIQ | F. c. gengleri | 16 | 3 | 0.0029 | 0.692 | 0.868 | 0.231 | 0
UBIQ | F. c. ombriosa | 22 | 0 | 0 | 0 | 0 | 0 | 0
UBIQ | F. c. palmae | 26 | 1 | 0.0002 | 0.077 | -1.156 | -1.727 | 0
UBIQ | F. c. moreletti | 26 | 8 | 0.0073 | 0.751 | 0.229 | 0.169 | 1
CR | F. c. canariensis | 13 | 2 | 0.0019 | 0.718 | 1.102 | 1.122 | (1)
CR | F. c. maderensis | 13 | 10 | 0.0065 | 0.758 | -1.238 | -1.253 | 0
CR | F. c. africana | 14 | 9 | 0.0060 | 0.824 | -0.246 | -0.441 | 0
CR | F. c. spodiogenys | 12 | 13 | 0.0117 | 0.561 | 0.833 | 1.229 | 0
CR | F. c. coelebs | 19 | 14 | 0.0063 | 0.953 | -1.101 | -0.596 | (1)
CR | F. c. gengleri | 8 | 6 | 0.0052 | 0.857 | -0.786 | -0.943 | 0
CR | F. c. ombriosa | 11 | 2 | 0.0008 | 0.182 | -1.429 | -1.797 | 0
CR | F. c. palmae | 14 | 3 | 0.0010 | 0.143 | -1.671 | -2.255 | 0
CR | F. c. moreletti | 14 | 7 | 0.0075 | 0.736 | 2.186* | 1.78* | (1)

N: Sample size
S: Number of segregating sites
π: Average number of nucleotide differences per site between two sequences
Hd: Haplotype (gene) diversity
Tajima's D: Tajima’s D statistic (Tajima 1989) calculated in DnaSP. * p<0.05
Fu and Li's F*: Fu and Li's F* statistic (Fu and Li 1993) calculated in DnaSP. * p<0.05
RM: Minimum number of recombination events estimated using the four-gamete test (Hudson and Kaplan 1985)

2.4.2 Haplotype networks

The mtDNA haplotype network (Figure 2) shows a clear distinction between Atlantic island and continental populations. There is no haplotype sharing between the islands and the continents, and five mutational steps separate the two groups. In the Atlantic islands, the birds from the Canaries, Madeira and Azores islands are separated into different haplogroups with little haplotype sharing. Haplotype 4 is shared between the Azores and Madeira populations, while haplotype 2 is shared between Madeira and the Canary Islands. Within the Canary Islands, the three populations appear to segregate into distinct haplotypes with no haplotype sharing. The continental haplotype relationships are slightly more complex than those of the islands. Haplotype 9 is shared among all four putative continental populations. Haplotypes 15 and 21 are shared between the F. c. gengleri and F. c. coelebs populations. There are many low-frequency haplotypes, mostly belonging to the F. c. coelebs population.

[Figure 2 image: haplotype network with two labelled groups, Mainland and Islands]

Figure 2: Median-joining haplotype network of the control region of mtDNA. Each circle represents a haplotype with its size proportional to the frequency. Dashes on lines connecting two haplotypes represent the number of mutational steps between the two haplotypes.

Figures 3 and 4 show haplotype networks of four nuclear loci (EF1 α, PTPN12, TROP and UBIQ) for which at most one recombination event was detected. The nuclear networks show more haplotype sharing than the mtDNA network. In general, nuclear networks contain one very common haplotype that is shared among all the populations and many low-frequency haplotypes surrounding the common haplotype, resembling a “star” pattern. Some medium-frequency haplotypes are shared among all four continental populations or three of the four (e.g. H15 and H16 of EF1 α, H10 and H19 of PTPN12, H10 of TROP, H14 of UBIQ). Some haplotype structuring is evident at nuclear loci as well. For example, at the PTPN12 locus, all Canary Islands haplotypes are clustered together with very little sharing with other populations.

Figure 3: Median-joining haplotype network of nuclear loci EF1 α and PTPN12. Each circle represents a haplotype with its size proportional to the frequency. Dashes on lines connecting two haplotypes represent the number of mutational steps between the two haplotypes.

Figure 4: Median-joining haplotype network of nuclear loci TROP and UBIQ. Each circle represents a haplotype with its size proportional to the frequency. Dashes on lines connecting two haplotypes represent the number of mutational steps between the two haplotypes.

2.4.3 Genetic clusters and migration rates

Genetic relationships among assumed populations corresponding to the nine described subspecies were explored by conducting simulations (K = 1 to K = 10) in the genetic clustering program Structure 2.0. The objective was to determine the number of genetically distinct populations among sampled locations, and the level of admixture in individuals from each population. My prediction was that K = 9 would yield the best likelihood of the data, with nine distinct genetic clusters corresponding to the nine described subspecies. Instead, the best likelihood of the data was obtained at K = 5, with a log-likelihood of -4341. Figure 5 illustrates the average likelihood of the data for each K. When only two populations were assumed for the simulations, the Canary Island populations were grouped as one cluster and all other populations (Madeira, Azores, two African and two continental European) were grouped in the other cluster. When K was set to 3, the Canary populations were classified as one cluster, the Madeira and Azores populations as another cluster, and all continental populations as the third cluster. K = 4 grouped the Canaries as one cluster, the Madeira and Azores populations together as the second cluster, the African populations as the third cluster and the European populations as the fourth cluster. The best likelihood of the data (K = 5) grouped the Canaries, Madeira and Azores populations as three distinct clusters, with the two African populations grouped together as the fourth cluster and the European populations as the fifth cluster (Figure 6). Increasing the assumed number of populations beyond K = 5 did not divide these populations further according to geography; instead, the admixture level of individuals increased. Neither the two African subspecies (F. c. africana and F. c. spodiogenys) nor the two continental European subspecies (F. c. coelebs and F. c. gengleri) were separated into distinct genetic clusters even when the number of assumed populations was 10 (K = 10). Structure also did not divide the three Canary Island subspecies into distinct clusters even at K = 10.
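The choice of K described here, averaging Ln Pr(X|K) over replicate runs and taking the K with the highest mean, can be sketched in a few lines. The values below are made-up placeholders chosen only to show the calculation; they are not the Structure output from this study, and the function name is hypothetical.

```python
from statistics import mean


def best_k(ln_prob_by_k):
    """Return (K, mean Ln Pr(X|K)) for the K with the highest mean across runs."""
    means = {k: mean(runs) for k, runs in ln_prob_by_k.items()}
    k_best = max(means, key=means.get)
    return k_best, means[k_best]


# Hypothetical replicate log-likelihoods for three values of K (placeholders only).
placeholder_runs = {
    1: [-100.0, -102.0],
    2: [-90.0, -94.0],
    3: [-95.0, -99.0],
}
print(best_k(placeholder_runs))
```

This mirrors only the averaging step; other criteria for choosing K exist and are not considered in this sketch.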

[Figure 5 plot: y-axis Ln[P(X|D)] from -5500 to -4000; x-axis K from 1 to 10]

Figure 5: Plot of the likelihood of data for assumed number of populations (K) in Structure. Simulations were run from K = 1 to K = 10. Ten simulations were conducted for each K and the likelihood was averaged to determine the number of populations that best fit the data.

Figure 6: Probabilistic assignment of individual genotypes to populations at K = 5, the value found to give the highest likelihood of the data in the genetic clustering program Structure.


Table 3 provides the estimated migration rates among populations. The migration rate parameter M = 4Nm is proportional to the number of immigrants per generation (Nm). In general, migration rates between continental populations are high relative to migration among islands. The highest estimated migration rate is from the F. c. coelebs population in Europe into the F. c. gengleri population in the UK (4Nm = 7.86). The migration rate from F. c. gengleri to F. c. coelebs is moderate (4Nm = 2.71). The estimated migration rates between European and African populations are moderate to low (4Nm between 0.53 and 1.51). The highest migration rate into the Atlantic islands is from the F. c. coelebs population to the F. c. moreletti population (4Nm = 1.11). The highest estimated migration rate between any two Atlantic island populations is between the Azores and Madeira (4Nm = 0.70). The migration rates between the three Canary Island populations are also very low (4Nm < 0.5).


Table 3: Migration rate estimates between common chaffinch populations (M = 4Nm)

Rows give the receiving population (To); columns give the source population (From).

To \ From | F. c. canariensis | F. c. maderensis | F. c. africana | F. c. spodiogenys | F. c. coelebs | F. c. gengleri | F. c. ombriosa | F. c. palmae | F. c. moreletti
F. c. canariensis | ------ | 0.34 | 0.16 | 0.15 | 0.23 | 0.19 | 0.35 | 0.32 | 0.12
F. c. maderensis | 0.37 | ------ | 0.28 | 0.24 | 0.36 | 0.33 | 0.25 | 0.27 | 0.70
F. c. africana | 0.09 | 0.36 | ------ | 1.79 | 1.45 | 1.18 | 0.16 | 0.12 | 0.35
F. c. spodiogenys | 0.10 | 0.20 | 1.90 | ------ | 0.81 | 0.55 | 0.18 | 0.11 | 0.16
F. c. coelebs | 0.13 | 0.19 | 1.00 | 0.53 | ------ | 2.71 | 0.15 | 0.11 | 0.43
F. c. gengleri | 0.22 | 0.53 | 1.51 | 1.33 | 7.86 | ------ | 0.41 | 0.20 | 0.86
F. c. ombriosa | 0.40 | 0.30 | 0.31 | 0.25 | 0.29 | 0.28 | ------ | 0.44 | 0.23
F. c. palmae | 0.29 | 0.23 | 0.15 | 0.12 | 0.26 | 0.19 | 0.37 | ------ | 0.30
F. c. moreletti | 0.11 | 0.55 | 0.38 | 0.18 | 1.11 | 0.47 | 0.21 | 0.32 | ------


2.4.4 Effective population sizes and population expansion

Assuming equal mutation rates among populations, the parameter θ of different populations can be compared to infer relative effective population sizes. The Canaries populations have the lowest θ values, and the values of the three populations are very close to each other. The lowest estimated θ value is for the F. c. palmae population (θ = 0.0010), followed by the F. c. ombriosa population (θ = 0.0012) and the F. c. canariensis population (θ = 0.0013). The F. c. africana, F. c. spodiogenys, F. c. maderensis and F. c. moreletti populations have similar values of θ, approximately three to four times the values of the Canary Island populations (F. c. africana θ = 0.0039, F. c. spodiogenys θ = 0.0029, F. c. maderensis θ = 0.0029, F. c. moreletti θ = 0.0033). The two European populations have similar θ values as well. The F. c. gengleri population has a θ of 0.0097 and F. c. coelebs has the highest estimated θ of 0.0101, approximately ten times the Canary Island values. I approximated the effective population size (Ne) by using a published indirect mutation rate for avian introns and anonymous nuclear loci of 2.5 x 10^-9 substitutions/site/year (Lee and Edwards 2008) and a generation time of 2 years for the common chaffinch (Table 4).
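The conversion from θ to Ne can be sketched as a short calculation. It assumes the usual diploid relationship θ = 4Neµ, with µ expressed per site per generation (the per-year rate multiplied by the generation time); the function name and parameter names are illustrative. Because the θ values printed in the text are rounded, the results are close to, but not identical with, the Ne values in Table 4.

```python
def effective_population_size(theta, mu_per_site_per_year=2.5e-9,
                              generation_time_years=2):
    """Solve theta = 4 * Ne * mu for Ne, with mu converted to a per-generation rate."""
    mu_per_generation = mu_per_site_per_year * generation_time_years
    return theta / (4 * mu_per_generation)


# Rounded theta values quoted in the text for two populations.
for name, theta in [("F. c. palmae", 0.0010), ("F. c. coelebs", 0.0101)]:
    print(name, round(effective_population_size(theta)))
```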

Conformity to the recent population expansion model was tested by calculating the sum-of-squared deviations (SSD) between the observed and expected mismatch distributions of the control region of mtDNA. A sum-of-squared deviation of zero indicates a perfect match to the Rogers and Harpending (1992) population expansion model. Except for the F. c. spodiogenys population, the mismatch distributions of populations did not deviate significantly from the population expansion model (Table 4). LAMARC assumes that a population has been growing or shrinking at the same exponential rate for a long period of time. Therefore, the ability to detect population growth or a bottleneck depends on the amount of time a population experienced growth or the duration of a bottleneck. Positive values of g indicate a growing population, and negative values indicate a shrinking population from past to present. Estimated exponential population growth parameters (g) for each population are shown in Table 4. The maximum probability estimates (MPE) for each population are well above zero (ranging from 395 to 619) and the 95% confidence intervals for g lie above zero for all populations except the F. c. ombriosa population in the Canaries. As estimates of g tend to be biased upwards, a 95% CI of g that includes zero suggests no significant deviation from constant population size. The smallest estimate of g is for the F. c. moreletti population (g = 395), while the highest estimate of g is for the F. c. africana population (g = 619).
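To give a feel for the scale of g, the sketch below uses the exponential parameterisation θ(t) = θ(now) · exp(-g · t), with t measured backwards in time in mutational units (expected substitutions per site). This form is stated here as an assumption about how LAMARC defines g and should be checked against the LAMARC documentation; the function names are hypothetical.

```python
from math import exp, log


def theta_in_past(theta_now, g, t):
    """Theta at time t in the past under exponential change (t in mutational units)."""
    return theta_now * exp(-g * t)


def halving_time(g):
    """Time into the past, in mutational units, at which theta was half its current value."""
    return log(2) / g


# Example with the F. c. moreletti point estimates from the text (theta = 0.0033, g = 395).
t_half = halving_time(395)
print(t_half, theta_in_past(0.0033, 395, t_half))
```

Under this assumed form, a positive g means θ was smaller in the past (growth towards the present), which agrees with the sign convention described in the text.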

Table 4: Effective population size and population expansion

Population | θ | Ne | SSD | g (95% CI)
F. c. canariensis | 0.0013 | 65050 | 0.032 | 517 (8 - 845)
F. c. maderensis | 0.0029 | 144850 | 0.054 | 487 (210 - 823)
F. c. africana | 0.0039 | 192900 | 0.054 | 619 (404 - 881)
F. c. spodiogenys | 0.0029 | 144750 | 0.419* | 452 (82 - 692)
F. c. coelebs | 0.0101 | 503500 | 0.001 | 541 (380 - 725)
F. c. gengleri | 0.0097 | 486150 | 0.008 | 577 (329 - 812)
F. c. ombriosa | 0.0012 | 59750 | 0.076 | 477 (-33 - 808)
F. c. palmae | 0.0010 | 52200 | 0.030 | 498 (47 - 836)
F. c. moreletti | 0.0033 | 164550 | 0.174 | 395 (120 - 745)

θ: Population-scaled mutation parameter estimated by LAMARC (θ = 4Neµ)
Ne: Approximate effective population size estimated by using a mutation rate of 2.5 x 10^-9 substitutions/site/year and a generation time of 2 years.
SSD: The sum of squared deviations between observed and expected mismatch distribution under the population expansion model. *p < 0.05
g: Exponential population growth parameter estimated by LAMARC


2.5 Discussion

The wide geographic occurrence of the common chaffinch throughout Eurasia, Northern Africa and the Atlantic islands makes it an attractive species in which to study population structure and divergence. In particular, the contrast between continents and islands affords an opportunity to examine different windows on the evolutionary process, because oceanic islands are considered hotspots for speciation due to geographic isolation, stronger effects of genetic drift due to small population sizes, and natural selection. Additionally, the Atlantic islands are close to the African continent (shortest distance is 115 km), and the three main island archipelagos (Canaries, Madeira and Azores) are separated from one another by substantial distances (>400 km). At present, there are 19 described subspecies of the common chaffinch based on morphology, starting from Linnaeus in 1758. However, the genetic basis for subspecies delimitation is poorly understood. Here, I used a multilocus approach to study population differentiation and the demographic history of Western European, Northern African and Atlantic island chaffinches.

In general, genetic diversity in islands is expected to be less due to stronger effect of genetic drift. Due to their geographic isolation and drift, island populations in an archipelago are also expected to show higher levels of population differentiation than continental populations. As expected, genetic diversity (S and π) is lower in island populations compared to their continental conspecifics. The European populations harbored the highest genetic diversity, followed by African populations. Genetic diversity in the Azores and Madeira populations is similar, and is lowest in the Canary island

38

populations. When a population experiences demographic events such as bottlenecks or subdivision, the genetic signature of those events is expected to be observed throughout the genome. Selection, in contrast, is unlikely to operate at the whole-genome level, so its genetic signature is likely to be localized. Tajima’s D and Fu and Li’s F* neutrality tests are sensitive to the demographic history of populations. Therefore, deviation from neutrality could be due to population subdivision, recent population expansion or bottlenecks, or natural selection. Fu and Li’s F* is particularly sensitive to recent population expansion, and negative Fu and Li’s F* values across unlinked loci are suggestive of recent population expansion (Wakeley 2008). Recent population expansion is likely to be the cause of the significant negative deviation of Tajima’s D and Fu and Li’s F* in the F. c. coelebs population at the L27331 locus. In the F. c. coelebs population, Tajima’s D was negative at nine of the 10 loci, and Fu and Li’s F* was negative at all loci. Since none of the other populations showed significant negative deviation at this locus, this locus is unlikely to be under selection. A significant negative deviation from neutrality was detected for the F. c. palmae population at the EF1α locus. Both neutrality tests were negative at seven of the 10 loci for the F. c. palmae population, and none of the other populations showed significant deviation from neutrality at the EF1α locus. Since Tajima’s D and Fu and Li’s F* were significantly negative for the F. c. maderensis and F. c. ombriosa populations at the TROP locus, the HKA test was conducted because it is more sensitive in detecting selection. However, the HKA test failed to detect significant deviation from neutrality.
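To make the sign of Tajima’s D concrete, the following is a minimal sketch of the statistic as defined by Tajima (1989), written in Python. It is an illustration only and not the software used in this study; the two toy alignments (`star_like` and `two_clusters`) are hypothetical and chosen so that one has an excess of singletons (as after an expansion) and the other has two divergent haplotype groups at intermediate frequency (as under population structure).

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of equal-length, gap-free aligned sequences."""
    n = len(seqs)
    if n < 4:
        # The variance term is zero for n < 4, so D is undefined.
        raise ValueError("need at least four sequences")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    theta_w = seg_sites / a1  # Watterson's estimator (per locus)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / sqrt(variance)


# Hypothetical toy data: every mutation is a singleton (expansion-like).
star_like = ["AAAAA", "TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT"]
# Hypothetical toy data: two divergent haplotype groups (structure-like).
two_clusters = ["AAAAA", "AAAAA", "AAAAA", "TTTTT", "TTTTT", "TTTTT"]

print(tajimas_d(star_like))     # negative
print(tajimas_d(two_clusters))  # positive
```

The sketch only reproduces the direction of the deviation; significance testing, as reported in the text, requires the null distribution of D and is not attempted here.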

A significant positive deviation of Tajima’s D and Fu and Li’s F* was also detected for the control region of the F. c. moreletti population (D = 2.186). One possibility is that the control region of the mtDNA is under local purifying selection, since none of the other populations show significant deviation from neutrality. However, it is also possible that the positive deviation results from weak population structuring within the Azores, because the data set contained sequences from four of the Azores islands. The latter inference is supported by the fact that six of the 10 loci show positive (non-significant) Tajima’s D for the F. c. moreletti population.

Haplotype networks from nuclear loci can be difficult to interpret because network-construction algorithms are designed for non-recombining molecules, an assumption that is unlikely to hold for most nuclear loci. Nevertheless, general patterns of the network can be informative. In general, nuclear loci show much more haplotype sharing than the mtDNA haplotype network. Since the effective population size of nuclear DNA is approximately four times that of mtDNA, progress to reciprocal monophyly is faster for mtDNA genes than for nuclear loci. Therefore, ancestral polymorphisms persist much longer in nuclear DNA than in mtDNA. If the chaffinch populations diverged relatively recently from an ancestral population, the observed pattern of haplotype sharing at nuclear loci is expected. Another explanation for the difference between nuclear and mtDNA networks is the possibility of sex-biased dispersal. If females are mostly sedentary but males disperse widely across the geographic range that was sampled, the mtDNA network is expected to show structured haplotypes, while nuclear DNA would show haplotype sharing due to dispersal of males. However, chaffinches are sedentary for much of their range, and where they do move it is the females of Northern European populations that migrate southward in winter while many males remain in residence, the opposite of the pattern this explanation requires. Therefore, the observed difference in mtDNA and nuclear haplotype sharing cannot be explained by sex-biased dispersal.

2.5.1 Population structure & subspecies status

Clustering of individual genotypes in Structure, together with estimates of migration rates in LAMARC, does not suggest genetic differentiation between the British chaffinch F. c. gengleri and the widespread continental European chaffinch F. c. coelebs. Even when simulations were conducted assuming 10 total populations, there was no distinction in the genetic clusters between these two populations. Conventionally, Nm > 1 is considered a high enough migration rate between populations to erase population structuring. The migration rates were 4Nm = 7.86 from F. c. coelebs to F. c. gengleri and 4Nm = 2.71 from F. c. gengleri to F. c. coelebs. Therefore, migration rates between these two putative populations are collectively high enough to consider them as one population.
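The comparison with the Nm > 1 convention can be made explicit with a short calculation. The sketch below, in Python, converts the reported 4Nm estimates to Nm; the equilibrium FST values are an added illustration that assumes Wright's infinite-island approximation, FST ≈ 1/(1 + 4Nm), which is not a model fitted in this study. The function names are hypothetical.

```python
def nm_from_4nm(four_nm):
    """Number of migrants per generation implied by a 4Nm estimate."""
    return four_nm / 4.0


def island_model_fst(four_nm):
    """Equilibrium FST under Wright's infinite-island approximation.

    Assumption for illustration only: FST ~ 1 / (1 + 4Nm).
    """
    return 1.0 / (1.0 + four_nm)


# 4Nm estimates reported in the text.
estimates = {
    "coelebs -> gengleri": 7.86,
    "gengleri -> coelebs": 2.71,
    "ombriosa / palmae": 0.44,
}

for label, four_nm in estimates.items():
    nm = nm_from_4nm(four_nm)
    fst = island_model_fst(four_nm)
    print(f"{label}: Nm = {nm:.2f}, island-model FST = {fst:.2f}")

# Mean Nm over the two directions between the European subspecies.
mean_nm_europe = (nm_from_4nm(7.86) + nm_from_4nm(2.71)) / 2.0
print(f"mean Nm (coelebs/gengleri) = {mean_nm_europe:.2f}")
```

Taken singly, the estimate from F. c. gengleri to F. c. coelebs corresponds to Nm below 1, whereas the mean over both directions exceeds 1, which is consistent with describing the two rates as collectively high.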

For the Canary Island populations, clustering of genotypes with Structure suggests no distinct genetic structure among the three populations (Figure 6). However, migration rate estimates between putative populations are low enough to suggest significant population structure. The highest estimated migration rate for any population pair is between the F. c. ombriosa and F. c. palmae populations (4Nm = 0.44), which is very low. It has been noted that Structure clustering can suggest no population structure even when FST statistics are significant among putative populations, because Structure algorithms are designed to identify population clusters from individual allele frequencies without any prior information about populations. Hence, classical population structure parameters such as FST are more powerful in detecting population structure than Structure when predefined populations correspond closely to genetic clusters. Suárez et al. (2009) studied the genetic structure of Canary chaffinches using the cytochrome b gene and 16 microsatellite loci, and suggested that there are at least three subspecies in the Canaries.

However, their designations were slightly different from the current taxonomic designations. They suggested that chaffinches from should be considered a separate subspecies, apart from F. c. canariensis . Currently, chaffinches from La

Gomera, and Gran Canaria are considered to be F. c. canariensis. The estimated pairwise FST values from 14 microsatellite loci were 0.038 between La Palma (F. c. palmae) and El Hierro (F. c. ombriosa) and 0.049 between La Palma (F. c. palmae) and La Gomera (F. c. canariensis). These estimates are considerably lower than the FST value reported by Baker et al. (1990) for the Canary Islands, but are statistically significant. Our gene flow estimates are low enough to suggest three differentiated populations in the Canary Islands: La Palma (F. c. palmae), El Hierro (F. c. ombriosa), and La Gomera (F. c. canariensis).

Griswold and Baker (2002) used the control region of mtDNA to infer possible locations of Pleistocene refugia during the last glacial maximum (LGM). Based on their estimates of time to the most recent common ancestor (TMRCA) of sampled populations, they concluded that Iberia, the Greek islands and North Africa likely served as refugia for European chaffinches during the LGM (Griswold and Baker 2002). Therefore, it is possible that some Northern African and Northern European populations came into contact during the late Pleistocene, before European populations recolonized the north as ice sheets retreated from Northern Europe. Assuming European and African populations came into contact during the LGM, estimates of migration rates by LAMARC would be inflated due to shared polymorphisms, and the contemporary migration rates between Europe and Africa would be lower than estimated here. As there is no evidence of mixing of European and African chaffinches today based on our knowledge of migration to wintering sites, their genetic similarity is likely a result of European range contraction and limited contact with North African populations during the LGM. This is supported by differences in plumage, morphometrics and songs between these two continental populations (Grant 1979; Dennison and Baker 1991; Lynch and Baker 1994).

2.5.2 Effective population size and population expansion

The effective population size (Ne), a concept introduced by Sewall Wright (Wright 1931), measures the rate at which a finite, non-idealized population experiences genetic drift. Effective population size is often much lower than census population size in wild populations due to many factors (Frankham 2008). Some of the important factors that affect Ne include fluctuation in population size over time, variation in the number of offspring produced by breeding individuals, unequal sex ratios, inbreeding and population structure. In particular, skewed sex ratios, large fluctuations in population size from generation to generation, and high variance in offspring number among breeding individuals can cause the effective population size to be much lower than the census population size (Frankham 2008; Charlesworth 2009).
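Two of these effects have simple textbook expressions, sketched below in Python. The formulas are the standard ones for an unequal breeding sex ratio, Ne = 4 Nm Nf / (Nm + Nf), and for fluctuating population size, where Ne is the harmonic mean of the per-generation sizes. The numbers are hypothetical and are not estimates for chaffinches.

```python
def ne_unequal_sex_ratio(n_males, n_females):
    """Ne for a population with unequal numbers of breeding males and females."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def ne_fluctuating(sizes):
    """Ne over several generations as the harmonic mean of per-generation sizes."""
    return len(sizes) / sum(1.0 / n for n in sizes)


# Hypothetical examples, each with 100 breeders or about 1000 birds on average.
print(ne_unequal_sex_ratio(50, 50))        # equal sex ratio
print(ne_unequal_sex_ratio(10, 90))        # strongly skewed sex ratio
print(ne_fluctuating([1000, 1000, 1000]))  # constant size
print(ne_fluctuating([1000, 1000, 10]))    # one bottleneck generation
```

In both cases the census count is unchanged or nearly so, yet the skewed or bottlenecked population has a much smaller Ne, which is the direction of the discrepancy discussed above.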

The estimated effective population sizes generally correlate well with anecdotal census population sizes of common chaffinches. Assuming equal mutation rates among chaffinch populations, the continental European population (F. c. coelebs) has the largest effective population size, estimated at about 500,000. The chaffinch is a common species throughout Europe, and the breeding population is estimated to be about 83-240 million (Heath et al. 2000). The effective population size of the British chaffinch (F. c. gengleri) is estimated to be about 485,000. According to the last census of British birds, there were about 7.5 million breeding pairs of chaffinches in the UK and Ireland (Newton 1993). The effective population size estimates of the two African populations are approximately 200,000 for the F. c. africana population and 145,000 for the F. c. spodiogenys population. The similar estimates of effective population size for the Madeira and Azores populations (145,000 versus 165,000, respectively) were unexpected, because census population sizes at these two locations are thought to differ by about an order of magnitude (Baker et al. 1990). The Azores islands have large flocks of chaffinches that feed on agricultural land, but Madeira does not contain as much agricultural land as the Azores. Hence, there are far fewer chaffinches in Madeira than in the Azores. However, the Azores were only colonized by humans in the 15th century, so the current large populations in the Azores are likely due to a recent expansion aided by agricultural land. The effective size of the Madeira population could also be higher than expected due to high genetic diversity (θ) caused by some gene flow from the Azores and Canaries into Madeira. As expected from census population sizes, the Canary Islands have the smallest effective population sizes. The Canary Island chaffinches are restricted to broad-leaved forests at medium elevations (Grant 1979), and populations are thought to be at least an order of magnitude smaller than the Azores populations (Baker et al. 1990). Although the effective sizes of the Canary populations are not 10-fold smaller than that of the Azores population, they are about three times smaller.

There is strong evidence for recent population expansion in multiple chaffinch populations. Both the mismatch distribution analysis of mtDNA and estimation of the population growth parameter g from multilocus data suggest population expansion. The maximum probability estimates of g and their 95% CIs for all populations (except F. c. ombriosa) are similar enough to suggest that all populations are growing at roughly equal rates. It is reasonable to assume that at least the Northern European populations may have contracted during the last glacial maximum (about 21,000 years ago). Parts of northern Europe, including the United Kingdom and Scandinavia, were covered with ice, and ice sheets extended southwards into Belgium and Northern Germany during the LGM (Dahl 1998). Beyond the glaciers, tundra and steppe extended into central Europe, pushing the preferred habitats of chaffinches further south.
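For readers unfamiliar with the mismatch distribution, the following minimal Python sketch shows how it is tabulated: it is simply the frequency distribution of the number of differences between all pairs of sequences (Rogers and Harpending 1992). The alignment `toy_alignment` is hypothetical, and the sketch does not fit an expansion model or test goodness of fit as a dedicated population genetics package would.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequency of each pairwise difference count among aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(
        sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)
    )
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


# Hypothetical toy alignment in which every mutation is a singleton.
toy_alignment = ["AAAAA", "TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT"]

for differences, frequency in mismatch_distribution(toy_alignment).items():
    print(differences, round(frequency, 3))
```

In practice the observed distribution is compared with the smooth, unimodal curve expected after a sudden expansion; a ragged or multimodal distribution is more typical of populations of stable size.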

However, the genetic diversity of European populations is high, and it does not appear that European populations underwent a severe genetic bottleneck during the last glacial maximum. Until recently, it was believed that most temperate and boreal species survived the LGM near the Mediterranean (Hewitt 2000). Recent biogeographical studies with climate modeling suggest more northerly refuge areas, such as central Europe, for temperate and boreal species (Willis and van Andel 2004; Fløjgaard et al. 2009). Therefore, the ranges of European chaffinches may not have contracted as much as previously thought, and they might have survived in reasonable numbers during the LGM, retaining much of their genetic diversity. Pavlova et al. (2005) conducted a phylogeographic study of the common rosefinch (Carpodacus erythrinus), another widespread passerine in Eurasia, and found Northern European populations to be more genetically diverse than expected. They also found evidence for population expansion in many populations, similar to what was detected in chaffinches, but they believed that these expansions were older than post-Pleistocene warming (Pavlova et al. 2005). Considering the difficulty of inferring population size changes from coalescent times, it is difficult to state the timing of expansion with high confidence.

2.5.3 Conclusions

Our analysis suggests that chaffinch populations have been expanding in the recent past, likely after the last glacial maximum. Because chaffinches feed on various seeds and are often seen in farmland, increases in their populations are likely tied to an increase in agricultural land in their range, including that following colonization of the Atlantic islands by humans. The UK bird survey data from the last 40 years corroborate this hypothesis because chaffinch populations have grown in size, unlike many other bird species that have undergone rapid decline in numbers (Siriwardena et al. 1998). In light of our results and previous research, we suggest that the British chaffinch should be considered part of the more widespread western European subspecies. Gene flow estimates between the three Canary Island populations are low enough to consider them distinct populations. Therefore, there is not enough evidence to change the current taxonomic designation of the three subspecies in the Canaries.


2.6 References

Backström, N., S. Fagerberg, and H. Ellegren. 2008. Genomics of natural bird

populations: A gene-based set of reference markers evenly spread across the avian

genome. Mol Ecol 17:964-980.

Baker, A. J. 2007. Molecular advances in the study of geographic variation and

speciation in birds. Auk 124:18-29.

Baker, A. J., M. D. Dennison, A. Lynch, and G. Legrand. 1990. Genetic divergence in

peripherally isolated populations of chaffinches in the Atlantic islands. Evolution

44:981-999.

Baum, D., and K. L. Shaw. 1995. Genealogical perspectives on the species problem. Pp.

289–303 in P. C. Hoch and A. C. Stephenson, eds. Experimental and molecular

approaches to plant biosystematics. Missouri Botanical Garden, St. Louis, MO.

Bensch, S., D. E. Irwin, J. H. Irwin, L. Kvist, and S. Akesson. 2006. Conflicting patterns

of mitochondrial and nuclear DNA diversity in Phylloscopus warblers. Mol Ecol

15:161-171.

Charlesworth, B. 2009. Fundamental concepts in genetics: Effective population size and

patterns of molecular evolution and variation. Nat Rev Genet 10:195-205.

Dahl, E. 1998. The phytogeography of northern Europe : British Isles, Fennoscandia, and

adjacent areas. Cambridge University Press, Cambridge, U.K.

Dennison, M. D., and A. J. Baker. 1991. Morphometric variability in continental and

Atlantic island populations of chaffinches ( Fringilla coelebs ). Evolution 45:29-

39.


Edwards, S. V., and P. Beerli. 2000. Perspective: Gene divergence, population

divergence, and the variance in coalescence time in phylogeographic studies.

Evolution 54:1839-1854.

Edwards, S. V., S. B. Kingan, J. D. Calkins, C. N. Balakrishnan, W. B. Jennings, W. J.

Swanson, and M. D. Sorenson. 2005. Speciation in birds: Genes, geography, and

sexual selection. Proc Nat Acad Sci USA 102:6550-6557.

Excoffier, L., G. Laval, and S. Schneider. 2005. Arlequin (version 3.0): An integrated

software package for population genetics data analysis. Evol Bioinform 1:47-50.

Falush, D., M. Stephens, and J. K. Pritchard. 2003. Inference of population structure

using multilocus genotype data: Linked loci and correlated allele frequencies.

Genetics 164:1567-1587.

Fløjgaard, C., S. Normand, F. Skov, and J. C. Svenning. 2009. Ice age distributions of

European small mammals: Insights from species distribution modelling. J

Biogeogr 36:1152-1163.

Frankham, R. 2008. Effective population size/adult population size ratios in wildlife: A

review. Genet Res 89:491-503.

Friesen, V. L., B. C. Congdon, M. G. Kidd, and T. P. Birt. 1999. Polymerase chain

reaction (PCR) primers for the amplification of five nuclear introns in vertebrates.

Mol Ecol 8:2147-2149.

Fu, Y. X., and W. H. Li. 1993. Statistical tests of neutrality of mutations. Genetics

133:693-709.

Grant, P. R. 1979. Evolution of the Chaffinch, Fringilla coelebs , on the Atlantic islands.

Biol J Linn Soc 11:301-332.


Griswold, C. K., and A. J. Baker. 2002. Time to the most recent common ancestor and

divergence times of populations of common chaffinches ( Fringilla coelebs ) in

Europe and North Africa: Insights into pleistocene refugia and current levels of

migration. Evolution 56:143-153.

Hall, T. A. 1999. BioEdit: a user-friendly biological sequence alignment editor and

analysis program for Windows 95/98/NT. Nucl Acids Symp Ser 41:95-98.

Hare, M. P. 2001. Prospects for nuclear gene phylogeography. Trends Ecol Evol 16:700-

706.

Heath, M., C. Borggreve, and C. Peet, eds. 2000. European bird populations: estimates

and trends. Birdlife International, Cambridge.

Hebert, P. D. N., A. Cywinska, S. L. Ball, and J. R. DeWaard. 2003. Biological

identifications through DNA barcodes. Proc R Soc London, Ser B: Biol Sci

270:313-321.

Hewitt, G. 2000. The genetic legacy of the quaternary ice ages. Nature 405:907-913.

Hey, J. 2006. On the failure of modern species concepts. Trends Ecol Evol 21:447-450.

Hey, J., and C. A. Machado. 2003. The study of structured populations - New hope for a

difficult and divided science. Nat Rev Genet 4:535-543.

Hudson, R. R., and N. L. Kaplan. 1985. Statistical properties of the number of

recombination events in the history of a sample of DNA sequences. Genetics

111:147-164.

Hudson, R. R., M. Kreitman, and M. Aguade. 1987. A test of neutral molecular evolution

based on nucleotide data. Genetics 116:153-159.


Knowles, L. L., and W. P. Maddison. 2002. Statistical phylogeography. Mol Ecol

11:2623-2635.

Kuhner, M. K., and L. P. Smith. 2007. Comparing likelihood and Bayesian coalescent

estimation of population parameters. Genetics 175:155-165.

Lee, J. Y., and S. V. Edwards. 2008. Divergence across Australia's Carpentarian barrier:

Statistical phylogeography of the red-backed fairy wren ( Malurus

melanocephalus ). Evolution 62:3117-3134.

Librado, P., and J. Rozas. 2009. DnaSP v5: A software for comprehensive analysis of

DNA polymorphism data. Bioinformatics 25:1451-1452.

Lynch, A., and A. J. Baker. 1994. A population memetics approach to cultural evolution

in chaffinch song: Differentiation among populations. Evolution 48:351-359.

Maddison, W. P. 1997. Gene trees in species trees. Syst Biol 46:523-536.

Mallet, J. 1995. A species definition for the modern synthesis. Trends Ecol Evol 10:294-

299.

Mallet, J. 2008. Hybridization, ecological races and the nature of species: empirical

evidence for the ease of speciation. Philos Trans R Soc London, Ser B: Biol Sci

363:2971-2986.

Marshall, H. D., and A. J. Baker. 1997. Structural conservation and variation in the

mitochondrial control region of fringilline finches ( Fringilla spp) and the

greenfinch ( Carduelis chloris ). Mol Biol Evol 14:173-184.

Marshall, H. D., and A. J. Baker. 1999. Colonization history of Atlantic island common

chaffinches (Fringilla coelebs) revealed by mitochondrial DNA. Mol Phylogenet

Evol 11:201-212.


Mayr, E. 1963. Animal species and evolution. Harvard Univ. Press, Cambridge, MA.

Newton, I. 1993. Chaffinch Fringilla coelebs. In The New Atlas of Breeding Birds in

Britain and Ireland: 1988-1991 . (eds Gibbons, D.W., Reid, J.B. & Chapman,

R.A.). T. & A.D. Poyser, London.

Nielsen, R., and M. A. Beaumont. 2009. Statistical inferences in phylogeography. Mol

Ecol 18:1034-1047.

Nylander, J. A. A. 2004. MrModeltest v2. Program distributed by the author.

Evolutionary Biology Centre, Uppsala University.

Pavlova, A., R. M. Zink, and S. Rohwer. 2005. Evolutionary history, population genetics,

and gene flow in the common rosefinch ( Carpodacus erythrinus ). Mol Phylogenet

Evol 36:669-681.

Pritchard, J. K., M. Stephens, and P. Donnelly. 2000. Inference of population structure

using multilocus genotype data. Genetics 155:945-959.

Rogers, A. R., and H. Harpending. 1992. Population growth makes waves in the

distribution of pairwise genetic differences. Mol Biol Evol 9:552-569.

Rudbeck, L., and J. Dissing. 1998. Rapid, simple alkaline extraction of human genomic

DNA from whole blood, buccal epithelial cells, semen and forensic stains for

PCR. BioTechniques 25:588-592.

Sambrook, J., and R. W. Russell. 2001. Molecular Cloning : A Laboratory Manual. Cold

Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.

Siriwardena, G. M., S. R. Baillie, S. T. Buckland, R. M. Fewster, J. H. Marchant, and J.

D. Wilson. 1998. Trends in the abundance of farmland birds: a quantitative

comparison of smoothed Common Birds Census indices. J Appl Ecol 35:24-43.


Sites Jr, J. W., and J. C. Marshall. 2003. Delimiting species: A Renaissance issue in

systematic biology. Trends Ecol Evol 18:462-470.

Stephens, M., and P. Scheet. 2005. Accounting for decay of linkage disequilibrium in

haplotype inference and missing-data imputation. Am J Hum Genet 76:449-462.

Suárez, N. M., E. Betancor, T. E. Klassert, T. Almeida, M. Hernandez, and J. J. Pestano.

2009. Phylogeography and genetic structure of the Canarian common chaffinch

(Fringilla coelebs ) inferred with mtDNA and microsatellite loci. Mol Phylogenet

Evol 53:556-564.

Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA

polymorphism. Genetics 123:585-595.

Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: Improving the

sensitivity of progressive multiple sequence alignment through sequence

weighting, position-specific gap penalties and weight matrix choice. Nucleic

Acids Res 22:4673-4680.

Townsend, T. M., R. E. Alegre, S. T. Kelley, J. J. Wiens, and T. W. Reeder. 2008. Rapid

development of multiple nuclear loci for phylogenetic analysis using genomic

resources: An example from squamate reptiles. Mol Phylogenet Evol 47:129-142.

Wakeley, J. 2008. Coalescent Theory. Roberts and Company Publishers, Colorado.

Waltari, E., and S. V. Edwards. 2002. Evolutionary dynamics of intron size, genome size,

and physiological correlates in archosaurs. Am Nat 160:539-552.

Willis, K. J., and T. H. van Andel. 2004. Trees or no trees? The environments of central

and eastern Europe during the Last Glaciation. Quat Sci Rev 23:2369-2387.

Wright, S. 1931. Evolution in Mendelian populations. Genetics 16:97-159.


Zhai, W., R. Nielsen, and M. Slatkin. 2009. An investigation of the statistical power of

neutrality tests based on comparative and population genetic data. Mol Biol Evol

26:273-283.


Chapter 3 Comparison of multilocus phylogenetic methods for inferring evolutionary relationships among recently diverged common chaffinch subspecies ( Fringilla coelebs ssp. )

3.1 Abstract

When lineage sorting is incomplete, inference of true evolutionary relationships of taxa using molecular phylogenetic methods is a significant challenge. Here, I compare analysis of concatenated sequence data from multiple independent nuclear loci with MrBayes versus Bayesian estimation of species trees (BEST) in reconstructing evolutionary relationships among subspecies of common chaffinches (Fringilla coelebs) in the Atlantic islands and the nearby continents. All nine subspecies share the common allele(s) at each locus, and thus lineage sorting is incomplete. Concatenation resulted in the recovery of implausible phylogenetic trees, whereas BEST integrated discordant gene trees, with European lineages branching basally in the subspecies tree. The colonization of the Atlantic islands inferred from the BEST tree for the large data set was consistent with the out of Europe hypothesis, and was inferred to have occurred from the continent via the Azores to Madeira and finally to the Canaries.

3.2 Introduction

The ease of collecting DNA sequence data led to a resurgence in the field of phylogenetics in the 1980s. Today, molecular data are the most commonly used set of characters to infer evolutionary relationships. The basic idea behind molecular phylogenetics is that, because of common ancestry, two closely related species have DNA sequences that are more similar to each other than to those of a distantly related species. Mitochondrial DNA (mtDNA) was the marker of choice in early molecular phylogenetics because it is abundant and therefore easy to amplify using the polymerase chain reaction (PCR), evolves faster than the average nuclear gene, and does not recombine except on rare occasions (Avise 1987; Zink and Barrowclough 2008). Thus, early molecular trees tended to be mtDNA phylogenies. One of the implicit assumptions in molecular phylogenetics is that gene trees (how gene lineages are related to each other) are equivalent to the species tree (how species are related to each other). The reason for this is that the evolutionary history of a gene is embedded within the evolutionary history of the species, and they should be concordant with each other except under a few conditions. Therefore, molecular phylogenetics uses gene trees to estimate the species tree. However, one of the insights gained from the extension of coalescent theory to phylogenetics is that gene trees may not be congruent with the species tree (Hudson 1992; Maddison 1997).

For a simple case where evolutionary relationships need to be inferred among three species, the probability of discordance of gene trees with the species tree when a single sequence is sampled from each species is given by $P(\mathrm{discordant}) = \tfrac{2}{3}\,e^{-T}$, where T is measured in coalescent time units, which depend on the time interval between two subsequent divergence events and effective ancestral population size (Hudson 1983; Pamilo and Nei 1988).
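The behavior of this expression is easy to explore numerically. The Python sketch below evaluates it for a range of internal branch lengths; the helper `coalescent_units` assumes a diploid nuclear locus, for which one coalescent time unit corresponds to 2Ne generations, and the parameter values in the example are hypothetical.

```python
from math import exp


def p_discordant(t_coalescent):
    """Probability that a three-taxon gene tree disagrees with the species tree."""
    return (2.0 / 3.0) * exp(-t_coalescent)


def coalescent_units(generations, ne):
    """Convert an internal branch length to coalescent units.

    Assumption: diploid nuclear locus, so one unit equals 2 * Ne generations.
    """
    return generations / (2.0 * ne)


# Hypothetical ancestral effective size and internal branch lengths.
ancestral_ne = 100_000
for generations in (0, 50_000, 200_000, 1_000_000):
    t = coalescent_units(generations, ancestral_ne)
    print(generations, round(t, 2), round(p_discordant(t), 3))
```

When the internal branch has zero length the three possible gene tree topologies are equally likely, so two thirds of gene trees are discordant; the probability then decays exponentially as the interval between divergence events grows relative to the ancestral effective population size.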

Assuming there is no gene flow after the divergence event, gene lineages in sister species must coalesce before or in their common ancestor. Consequently, the probability


of gene tree discordance does not depend on time since the divergence event, and hence this problem is not limited to recently diverged species. If the time between successive divergence events is short and the size of the effective ancestral population is large, there is a reasonable chance that gene trees may not be congruent with the species tree.

However, inferring evolutionary relationships among recently diverged groups is further complicated by ancestral polymorphisms that are shared among these groups. An average of 4Ne generations after divergence is required for gene lineages to sort into monophyletic groups through the stochastic loss of lineages. Therefore, if the time since divergence is short, as in species that have undergone a rapid radiation, ancestral polymorphisms are likely to be shared among diverged groups, making it difficult to infer correct evolutionary relationships. Furthermore, some processes of molecular evolution such as gene duplication and loss, horizontal gene transfer, or hybridization can lead to incongruence between gene trees and species trees (Tajima 1983; Hudson 1992; Maddison 1997; Rosenberg 2002). In these cases, treating a single gene tree as the species tree will result in inferring incorrect evolutionary relationships. Figure 1 illustrates the case where two gene trees embedded within the species tree suggest two different evolutionary relationships, one of which is concordant with the species relationships (gene tree 1), while the other is not (gene tree 2). For these reasons, molecular phylogenetics is moving towards a multilocus approach (Peters et al. 2005; Brito and Edwards 2008; Edwards 2009).
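A rough calculation shows why incomplete sorting is plausible for chaffinches. The Python sketch below applies the 4Ne rule of thumb stated above to the effective size estimated for the Azores population in Chapter 2 (about 165,000) and to the upper bound of one million years for island colonization hypothesized by Grant (1979). The generation times of one and two years are assumptions made purely for illustration, and the function names are hypothetical.

```python
def generations_to_monophyly(ne):
    """Average number of generations for lineages to sort, using the 4Ne rule."""
    return 4.0 * ne


def sorting_expected(divergence_years, ne, generation_time_years):
    """True if the elapsed generations reach the 4Ne rule-of-thumb threshold."""
    generations_elapsed = divergence_years / generation_time_years
    return generations_elapsed >= generations_to_monophyly(ne)


azores_ne = 165_000              # estimate reported in Chapter 2
max_divergence_years = 1_000_000  # upper bound suggested by Grant (1979)

print(generations_to_monophyly(azores_ne))
for generation_time in (1.0, 2.0):  # hypothetical generation times in years
    print(generation_time,
          sorting_expected(max_divergence_years, azores_ne, generation_time))
```

Even at the oldest colonization date considered, the outcome depends on the assumed generation time, and any more recent divergence would fall short of the threshold, so shared ancestral polymorphism among the island subspecies is not surprising.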


Figure 1: Discordance between gene trees and the species tree. Individual gene trees may have different evolutionary histories and may not be concordant with the species tree if time between subsequent divergence events is short and ancestral population sizes are large. Gene tree 1 is concordant with the species tree whereas gene tree 2 is discordant.

When multiple loci are sequenced for phylogenetic analysis, the most common approach is to concatenate all these loci into a single sequence, which is then used to reconstruct evolutionary relationships (Rokas et al. 2003; Steppan et al. 2005). The idea is that, given enough data, the true phylogenetic signal will emerge. However, concatenation ignores the fact that genes are the functional units of the genome, and individual genes may have different evolutionary histories (Hey and Machado 2003; Maddison and Knowles 2006). Thus different genes may support different evolutionary relationships due to the stochasticity of the lineage sorting process, resulting in a concatenated tree that may not be the true species tree but instead a tree of averaged relationships. Simulation studies have shown that concatenation may infer the wrong species tree with high node support in certain circumstances (Kubatko and Degnan 2007). It might also be inappropriate to concatenate loci that have different modes of inheritance and significantly different effective population sizes (Ne), such as mtDNA and nuclear loci (Miyamoto and Fitch 1995; Moore 1995). An alternative to concatenation is the consensus tree method, which builds individual gene trees from multiple loci and selects the most common gene tree topology as the species tree (Jennings and Edwards 2005). The drawback of this approach is that it requires numerous independent loci and is effective only when evolutionary relationships need to be inferred for a small number of taxa (i.e. 3 or 4 species).
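The selection step of the consensus tree method amounts to a tally of gene tree topologies, as in the minimal Python sketch below. It assumes that each gene tree has already been estimated and reduced to a canonical topology string; the three-taxon topologies and their counts are hypothetical and are not results from this study.

```python
from collections import Counter


def most_common_topology(gene_tree_topologies):
    """Return the most frequent topology and the fraction of loci supporting it."""
    if not gene_tree_topologies:
        raise ValueError("need at least one gene tree")
    counts = Counter(gene_tree_topologies)
    topology, n_loci = counts.most_common(1)[0]
    return topology, n_loci / len(gene_tree_topologies)


# Hypothetical topologies from ten independent loci for taxa A, B and C.
gene_trees = (
    ["((A,B),C)"] * 6
    + ["((A,C),B)"] * 2
    + ["((B,C),A)"] * 2
)

print(most_common_topology(gene_trees))
```

With three taxa there are only three possible rooted topologies, so a modest number of loci can identify the most common one; the number of possible topologies grows very quickly with additional taxa, which is why the approach is limited to small problems.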

Inference of evolutionary relationships can be particularly difficult for recently diverged species because they are often not yet sorted into reciprocally monophyletic lineages, which is only apparent once multiple individuals are sampled from a species.

Neither concatenation nor the consensus tree method can successfully solve the problem of estimating the species tree when lineages are not reciprocally monophyletic (Carstens and Knowles 2007). However, lineage sorting can be mathematically modeled under the coalescent process. Recently, two new methods were developed to address this difficulty of incomplete sorting in newly diverged species. Carstens and Knowles (2007) proposed a maximum likelihood approach for estimating the species tree from gene tree probabilities for recently derived species with incomplete lineage sorting. This method probabilistically models the species tree from independent gene trees, incorporating a model of stochastic loss of gene lineages by drift (Carstens and Knowles 2007). In


addition, Liu and Pearl (2007) proposed a Bayesian hierarchical model (BEST) for estimating the species tree using multiple gene tree distributions and coalescent theory.

This model estimates gene trees and the species tree jointly by exploring the dependence of gene trees on the common species tree and the dependence of the species tree on gene trees across multiple loci. It assumes from coalescent theory that gene divergence in any two species should predate species divergence, and incorporates multiple alleles from a single species into the species tree estimate in a coalescent framework (Liu and Pearl

2007; Liu 2008).

Organisms inhabiting oceanic islands provide an excellent opportunity to study processes of microevolution and speciation. The common chaffinch (Fringilla coelebs) is a widespread Palearctic passerine species which occupies most of continental Europe, parts of Northern Africa and the Atlantic islands in the Azores, Madeira and Canaries archipelagos (Figure 2). These populations have undergone marked phenotypic differentiation, which has been used to classify them as different subspecies. Atlantic island populations have blue dorsal plumage, pale reddish orange breasts, shorter wings, longer legs, longer bills and larger bodies than continental conspecifics (Grant 1979).

There are two described subspecies from Northern Africa (F. c. africana and F. c. spodiogenys), and two subspecies from Europe (F. c. coelebs and F. c. gengleri). There is one subspecies from the Azores islands (F. c. moreletti), one from Madeira (F. c. maderensis) and three subspecies from the Canary Islands (F. c. canariensis, F. c. ombriosa, and F. c. palmae). While some taxonomists (e.g. Mayr 1968) assign separate subspecies status to F. c. ombriosa (from El Hierro Island in the Canaries), others consider it part of F. c. palmae. The sister group of the common chaffinch is the blue chaffinch (F. teydea), which is endemic to high altitude pine forests on Gran Canaria and Tenerife in the Canary Islands.

Figure 2: Geographic locations of sampled common chaffinch subspecies ( Fringilla coelebs ssp.) in this study.

Based on the estimated geological origins of these islands, Grant (1979) hypothesized that Atlantic islands might have been colonized by chaffinches within the last million years. Geographically, the Azores islands are closest to the European mainland, and the Canary Islands are only 115 km from Northwest Africa at their closest point (Figure 2). Madeira is located between the Azores and the Canaries, but is closer to the Canaries than the Azores. The morphological similarity of the Atlantic island subspecies suggests they have recent common ancestry. Therefore, it is reasonable to hypothesize a


wave of sequential colonization of Atlantic islands from either Africa or Europe. Since island plumage patterns somewhat resemble the extant North African populations and because Africa is the closest continent to any of the islands, colonization of the Atlantic islands from Africa might be expected. Under this out of Africa hypothesis, the island subspecies ( F. c. moreletti, F. c. maderensis, F. c. canariensis, F. c. ombriosa, F. c. palmae ) would form a monophyletic group. Figure 3a shows expected evolutionary relationships if the Atlantic islands were colonized sequentially from Africa through the

Canaries (closest to Africa). Figure 3b shows the expected subspecies relationships under the alternative out of Europe hypothesis that the Atlantic islands were colonized from

Europe via the Azores islands, as the archipelago is only 1500 km from Portugal. A third possibility is the nearest continent hypothesis, which predicts polyphyletic origins for chaffinches in the Atlantic islands because the Azores would have been colonized by migrants from Europe, and the Canaries by migrants from Northern Africa (Grant 1979) (Figure 3c). Grant (1979) earlier suggested that the islands may have been colonized by birds from the nearest neighboring continent due to some similarities in morphological traits between island populations and their nearest continental conspecifics.


Figure 3: Expected phylogenetic relationships among Atlantic island subspecies under different colonization hypotheses. (a) Expected evolutionary relationships in Atlantic island populations if they were colonized from Africa via Canaries. (b) Expected evolutionary relationships in Atlantic island populations if they were colonized from Europe via Azores. (c) Expected relationships in Atlantic islands if they were colonized from their nearest continents.

Marshall and Baker (1999) used mtDNA to infer the colonization history of island populations using maximum likelihood. The authors used the F. c. spodiogenys subspecies as the outgroup because the control-region haplotype found in this subspecies appeared genetically intermediate between the blue chaffinch and other common chaffinch haplotypes. However, subsequent analysis by Griswold and Baker (2002) indicated some haplotype sharing between European and African populations at the control region of mtDNA. Therefore, using F. c. spodiogenys as the outgroup may not be appropriate. The mtDNA phylogeny of Marshall and Baker placed island subspecies as a monophyletic clade sister to the continental subspecies, suggesting a single wave of colonization. However, they noted that the spectral analysis of phylogenetic splits showed substantial support for polyphyletic origins of island subspecies as well (Marshall and Baker 1999). The goal of the present study therefore is to use nuclear DNA and new statistical methods that account for incomplete lineage sorting to infer the evolutionary relationships among the subspecies of Fringilla coelebs . In this chapter, I (a) investigate the effects of sampling (i.e. the number of individuals sampled per population) on phylogenetic inference methods, and (b) compare trees recovered from concatenated Bayesian analysis versus Bayesian estimation of species tree (BEST) to infer phylogenetic hypotheses when lineage sorting is incomplete.

3.3 Methods

3.3.1 Sampling, DNA extraction, amplification and sequencing

I sampled 8 individuals from each subspecies for a total of 72 chaffinches.

Genomic DNA was extracted from frozen muscle tissue using a standard proteinase K-phenol-chloroform method (Sambrook and Russell 2001) or rapid simple alkaline extraction (Rudbeck and Dissing 1998). Briefly, rapid alkaline extraction was carried out by first adding a minute amount of muscle tissue (2-3 µL equivalent of blood) to a 96-well PCR plate containing 20 µL of 0.2 M NaOH. The plate was covered and heated to 75 °C for 20 minutes in a thermocycler. The solution containing DNA was neutralised by adding 180 µL of a 0.04 M Tris-HCl pH 7.5 solution and frozen.


Details of the nine nuclear loci that were amplified and sequenced are provided in Table 1. The loci are: Aconitase II (ACON) (Backström et al. 2008), B-Actin (B-ACT) (Waltari and Edwards 2002), Elongation factor 1 alpha (EF1α) (Backström et al. 2008), Glyceraldehyde-3-phosphate dehydrogenase (GAPD) (Friesen et al. 1999), Locus L27331 (L27331) (Backström et al. 2008), Protein tyrosine phosphatase non-receptor 12 (PTPN12) (Townsend et al. 2008), Tropomyosin (TROP) (Friesen et al. 1999), Ubiquitin carboxyl-terminal esterase (UBIQ) (Backström et al. 2008), and Anonymous locus OH (ANON OH). All of these loci except PTPN12 were non-coding. The anonymous locus OH was developed by the following method. First, chaffinch genomic DNA was isolated, purified and digested using five different restriction enzymes, which were selected to leave single-stranded overhangs at each cut site. The enzymes used were HindIII, EcoRI, XbaI, XhoI and NheI. The digested DNA was then size-selected and ligated to double-stranded linkers containing specific single-strand overhangs complementary to the restriction enzyme cut sites. The resulting library represented the complete chaffinch genome with an average fragment size of approximately 1,500 bp, and each fragment had amplifiable ends. Using a primer specific for the retrotransposon CR1 and a primer specific for the linker, fragments were amplified, size-selected and cloned. A number of clones were sequenced to identify the regions flanking the repetitive element. The flanking regions were searched against the chicken genome using BLAST to ensure they were present as a single copy, and primers were designed to amplify this region but to exclude the repetitive element.


Table 1: Details of the nine nuclear loci sequenced in this study along with models of sequence evolution used in phylogenetic reconstructions.

| Locus | Primer sequence | Fragment size (bp) | Chromosomal location in Zebra Finch | Model of sequence evolution | Reference |
|---|---|---|---|---|---|
| Aconitase II (ACON) | F- CCAATGCTTGTGGGCCATG; R- ATTGCGACCTGTGAAATTCC | 750 | 1A | GTR+I | Backström et al. 2008 |
| Anonymous locus OH (ANON OH) | F- TCCCATTGCAACAACCTGTTCAC; R- GGGCACTTCAGTCACTCTGAC | 400 | 2 | GTR+I | NA |
| B-Actin (B-ACT) | F- CCTGATGGTCAGGTCATCA; R- CAGCAATGCCAGGGTACAT | 500 | 5 | GTR+I+G | Waltari and Edwards 2002 |
| Elongation factor 1 alpha (EF1α) | F- ATTGGCTACAACCCAGACAC; R- CAGGATGCAGTCCAAGGCT | 400 | 3 | HKY+G | Backström et al. 2008 |
| Glyceraldehyde-3-phosphate dehydrogenase (GAPD) | F- ACCTTTAATGCGGGTGCTGGCATTGC; R- CATCAAGTCCACAACACGGTTGCTGTA | 350 | 1 | HKY+G | Friesen et al. 1997 |
| Locus 22731 (L27331) | F- CCTAGCTAAATATGTTCTGGC; R- TAGGCTTCCTGATGATGGCT | 820 | 1 | GTR+I+G | Backström et al. 2008 |
| Protein tyrosine phosphatase non-receptor 12 (PTPN12) | F- AGTTGCCTTGTWGAAGGRGATGC; R- CTRGCAATKGACATYGGYAATAC | 830 | 1A | HKY+I | Townsend et al. 2008 |
| Tropomyosin (TROP) | F- GAGTTGGATCGGGCTCAGGAGCG; R- CGGTCAGCCTCTTCAGCAATGTGCTT | 450 | 28 | GTR+I | Friesen et al. 1999 |
| Ubiquitin carboxyl-terminal esterase (UBIQ) | F- GCTTGTGGGACAATTGGG; R- TATTTGGCCCTCTCTTCAGG | 450 | 1 | HKY+I | Backström et al. 2008 |


Polymerase chain reactions (PCR) were carried out using 1.5 µL of DNA in a 12.5 µL total reaction volume, with 1.25 µL of PCR buffer (10 mM Tris-HCl pH 8.3, 2.5 mM MgCl₂, 50 mM KCl and 0.01% gelatin), 0.28 µL of 1X deoxynucleotide triphosphates (dNTPs), 0.5 µL of each primer, and 0.05 µL of Platinum® Taq (5 units/µL) (Invitrogen, Inc.). The thermocycling profile was as follows: an initial 94 °C denaturation step for 4 min, followed by a total of 35 cycles consisting of a 30 sec 94 °C denaturation step, a 30 sec annealing step starting at 65 °C and decreasing by one degree per cycle until the annealing temperature reached 55 °C, and a 30 sec 72 °C extension step, followed by a final extension of 5 min at 72 °C. Amplified product was isolated by separation in a 2% agarose gel, and the DNA bands were excised from the gel and purified by spinning through filter tips. Cycle sequencing reactions with forward and reverse PCR primers were carried out using BigDye Terminator 3.1 (Applied Biosystems) and visualized on an ABI 3100 DNA Sequencer.
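The touchdown annealing profile described above can be written out cycle by cycle. The following is a minimal Python sketch, assuming that the 35 cycles include the touchdown cycles and that all remaining cycles are held at 55 °C; the function name and parameters are illustrative and are not part of the original protocol.

```python
def touchdown_annealing_temps(start_c=65, floor_c=55, step_c=1, total_cycles=35):
    """Return the annealing temperature (deg C) used in each PCR cycle.

    Assumption: the temperature drops by step_c after every cycle until it
    reaches floor_c, and all remaining cycles are run at floor_c.
    """
    temps = []
    current = start_c
    for _ in range(total_cycles):
        temps.append(current)
        current = max(floor_c, current - step_c)
    return temps


schedule = touchdown_annealing_temps()
touchdown_cycles = schedule.index(55) + 1  # cycles needed to reach the floor
print(schedule[:12], touchdown_cycles, schedule.count(55))
```

Under these assumptions, the first 11 cycles step down from 65 °C to 55 °C and the remaining 24 cycles anneal at 55 °C.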

3.3.2 Data analysis

For each locus, all sequences were edited using the program Chromas Pro 1.42 (Technelysium Pty. Ltd., Australia). The sequences were initially aligned with the program ClustalW (Thompson et al. 1994) in BioEdit Sequence Alignment Editor (Hall 1999) and adjusted manually. Heterozygous sites were identified by the presence of two equal-height peaks in the chromatograms. The program PHASE 2.1 (Stephens and Scheet 2005) was used to resolve the haplotypes from the unphased genotype data when a sequence contained multiple heterozygous sites. PCR products from a small subset of individuals that did not sequence well directly or that were needed for phasing were cloned into the pCR 2.1 TOPO TA cloning vector (Invitrogen) and sequenced in both directions using M13 primers. This was done to eliminate the possibility of using paralogous genes for analysis and to verify that statistically inferred haplotypes were correct.

The four-gamete test implemented in DnaSP 5.0 (Librado and Rozas 2009) was conducted to estimate the minimum number of recombination events for each locus. I selected the largest non-recombining segment with the highest number of variable sites at each locus for phylogenetic analysis. To investigate the effects of sampling on phylogenetic inference, I constructed two data sets: a large data set containing 16 sequences (N = 8 individuals) per population, and a small data set of four sequences per population randomly drawn from sequences in the large data set. The best model of nucleotide evolution for each locus was determined using the Akaike information criterion (AIC) in MrModelTest 2.3 (Nylander 2004). The concatenated phylogenies were constructed using MrBayes v3.1.2 (Ronquist and Huelsenbeck 2003), and species trees were inferred with the coalescent-based Bayesian species tree inference program BEST v2.3 (Liu 2008).
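The four-gamete test rests on a simple rule: under an infinite-sites model, a pair of biallelic sites can show all four two-site haplotypes ("gametes") only if recombination has occurred between them. The sketch below illustrates that pairwise check in Python. It is not DnaSP's implementation, it does not compute the minimum number of recombination events, and the function name and toy sequences are hypothetical.

```python
from itertools import combinations


def four_gamete_incompatible_pairs(haplotypes):
    """Return pairs of site indices at which all four two-site gametes occur.

    haplotypes: list of equal-length aligned sequences (phased haplotypes).
    Only biallelic sites are considered.
    """
    n_sites = len(haplotypes[0])
    biallelic = [i for i in range(n_sites)
                 if len({h[i] for h in haplotypes}) == 2]
    pairs = []
    for i, j in combinations(biallelic, 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # all four combinations observed
            pairs.append((i, j))
    return pairs


# Toy alignment (hypothetical): sites 0 and 1 show AC, AT, GC and GT
toy = ["AC", "AT", "GC", "GT"]
print(four_gamete_incompatible_pairs(toy))
```

A segment containing no incompatible pair is a candidate non-recombining block of the kind selected for phylogenetic analysis.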

Analysis of the small data set with MrBayes was conducted with four simultaneous searches of tree space with four chains (three heated with heating parameter = 0.3) for 10 million generations, sampling every 1000th generation, and with the first three million generations discarded as burn-in. Analysis of the large data set was run with four simultaneous searches of tree space with four chains (three heated chains with heating parameter = 0.4) for 30 million generations, sampling every 1000th generation, and the first 10 million generations were discarded as burn-in. Runs were considered to have converged on the stationary distribution when the average standard deviation of split frequencies was below 0.01. I repeated the analysis three times with different seed numbers to ensure the optimal tree was found.
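The convergence criterion used here compares how often each bipartition (split) is sampled in independent runs. A minimal sketch of the calculation follows, assuming split frequencies have already been tabulated per run after burn-in. The dictionary layout, the 0.10 minimum-frequency filter and the use of the population standard deviation are illustrative assumptions rather than a description of MrBayes internals.

```python
from statistics import pstdev


def average_sd_split_frequencies(runs, min_freq=0.10):
    """Average, over splits, of the standard deviation of split frequencies.

    runs: list of dicts mapping a split label to its posterior frequency in
    that run (a split missing from a run is treated as frequency 0.0).
    Splits below min_freq in every run are ignored (assumed filter).
    """
    splits = set().union(*runs)
    sds = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in runs]
        if max(freqs) >= min_freq:
            sds.append(pstdev(freqs))
    return sum(sds) / len(sds) if sds else 0.0


# Hypothetical split frequencies from two independent runs
run1 = {"AB|CD": 0.99, "AC|BD": 0.01}
run2 = {"AB|CD": 0.98, "AC|BD": 0.02}
asdsf = average_sd_split_frequencies([run1, run2])
converged = asdsf < 0.01
```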

The BEST analysis for the small data set was conducted using four simultaneous searches with four chains (three heated with heating parameter = 0.7) for 60 million generations, with the first 40 million generations discarded as burn-in. Analysis of the large data set was conducted with four runs with four chains (three heated with heating parameter = 0.7) for 40 million generations, and the first 20 million generations were discarded as burn-in. Convergence was assessed by plotting generations versus Ln-likelihood to see when the chains were stationary.

To determine the likely continental location of the ancestral population that colonized the Atlantic islands, I used DIVA v. 1.1, which reconstructs ancestral distribution in a phylogeny using dispersal-vicariance analysis (Ronquist 1997). The Atlantic islands could have been colonized from an African ancestor or from a European ancestor. Assuming both possibilities are equally likely, I conducted an ancestral reconstruction analysis constraining the maximum number of ancestral areas to be two (i.e. Africa and/or Europe).


3.4 Results

The concatenated Bayesian trees for the small data set and the large data set are shown in Fig. 4a and 4b, respectively. As each population contained at least four sequences, terminal tips were collapsed to one for clarity. Some of the inferred relationships differed between the two data sets, but others were identical. For example, both trees grouped all the continental populations together in a clade, with the two African populations together and sister to European populations. Both trees suggested very close evolutionary relationships between F. c. coelebs and F. c. gengleri , because sequences from the two populations were mixed together in the tree. Additionally, both trees grouped all the Canary island populations together in a clade. However, the phylogeny estimated from the small data set suggests that the Madeira subspecies F. c. maderensis is more closely related to the continental subspecies with posterior probability (PP) of 1.0, and the Azores subspecies F. c. moreletti is more closely related to the three Canary subspecies, F. c. canariensis, F. c. palmae and F. c. ombriosa (PP = 0.65). When sample size was increased to 16 sequences per population, different evolutionary relationships emerged. Most notably, the tree constructed with the large data set grouped Madeira and Azores populations together (PP = 0.70), and the continental populations were sister to the Canary island populations (PP = 0.99). The small data set tree clearly suggests an island ancestor for chaffinches since the continental clade is well inside the tree and the Madeira population is basal to it. Island ancestry for chaffinches is also suggested by the tree inferred from the large data set.


Figure 4: Evolutionary relationships among common chaffinch subspecies inferred from concatenated Bayesian inference. The tree from the small data set (a) and the tree from the large data set (b) show different subspecies relationships. The blue chaffinch (Fringilla teydea ) was used as the outgroup.

The BEST trees (Fig. 5a & 5b) from the small and large data sets are not identical, but are similar. Both trees are compatible with northern European ancestry for chaffinch populations because F. c. gengleri (British chaffinch) is sister to all other subspecies.

The small data set tree is split into two clades, one containing all the Atlantic island populations and the other containing the widespread continental European subspecies F. c. coelebs with the two African subspecies ( F. c. africana and F. c. spodiogenys ). In the clade containing the Atlantic island populations, the Azores population is sister to the Madeira and Canary island populations. The F. c. canariensis population from Gran Canaria and Tenerife is sister to F. c. ombriosa from El Hierro and F. c. palmae from La Palma. The tree from the large data set differs from that from the small data set in the placement of the European subspecies F. c. coelebs . In the small data set tree, F. c. coelebs is sister to the two African subspecies (PP = 0.42) but the large data set tree puts F. c. coelebs and F. c. gengleri as successive sister groups to the other subspecies (PP = 0.55). Both trees place the Atlantic island subspecies in a monophyletic group with the Azores population splitting from the Madeira population from which the Canary island populations subsequently diverged.

Figure 5: Evolutionary relationships among common chaffinch subspecies from Bayesian estimation of species tree (BEST) method. Small data set tree (a) and large data set tree (b). The blue chaffinch ( Fringilla teydea ) was used as the outgroup.


There are similarities between all four estimated phylogenies. The Canary subspecies relationships are identical in all phylogenies. All four trees show a sister relationship between F. c. ombriosa and F. c. palmae in the western Canary islands, with F. c. canariensis in the central Canary islands sister to them. All phylogenies show a sister relationship between the two African subspecies. The key inferential difference between the two methods is that concatenated trees suggest island ancestry for chaffinch populations while the large sample BEST tree suggests European ancestry. The BEST molecular clock tree of the large data set (Figure 6) has very short internodes, indicating rapid divergence events. The short mutational time intervals between Atlantic island subspecies suggest sequential colonization in a relatively short period of time. The divergence events were dated by applying a molecular clock with a mutation rate of 2.5 × 10⁻⁹ substitutions per site per year (Lee and Edwards 2008). The most recent divergence event, between F. c. ombriosa and F. c. palmae , is dated to ~912,000 years ago, and the most ancient divergence event, the separation of F. c. gengleri from the rest of the chaffinches, to ~1.1 Mya.
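The dating step amounts to dividing a node height by the substitution rate. The following is a minimal sketch, assuming node heights are expressed in expected substitutions per site under a strict clock. The node heights below are back-calculated from the reported dates for illustration only and are not values taken from the BEST output.

```python
MUTATION_RATE = 2.5e-9  # substitutions per site per year (Lee and Edwards 2008)


def node_age_years(node_height, rate=MUTATION_RATE):
    """Convert a node height (substitutions per site) to an age in years."""
    return node_height / rate


# Illustrative node heights (hypothetical, chosen to match the reported dates)
examples = [("ombriosa/palmae split", 0.00228),
            ("gengleri vs. rest", 0.00275)]
for label, height in examples:
    print(f"{label}: ~{node_age_years(height):,.0f} years")
```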


Figure 6: The large data set molecular clock tree from BEST shows relationships among Fringilla coelebs ssp. Divergence times were estimated using a mutation rate of 2.5 × 10⁻⁹ substitutions per site per year.

3.5 Discussion

All traditional molecular phylogenetic methods estimate the gene tree from sequence data, which is assumed to be the species tree. If sequence data are collected from multiple loci, it is common practice to concatenate the sequences. The premise behind this practice is that gene trees from different loci should be similar due to their shared evolutionary history. The recognition that gene trees can differ from the species tree has recently given rise to so-called “species tree” methods, which are intended to estimate the species tree when gene trees are discordant. Simulation studies have shown that when gene trees are discordant, the species tree estimated by concatenating them together can be inaccurate (Kubatko and Degnan 2007). This is because concatenation averages different evolutionary relationships inferred from different gene trees to make the consensus tree. Consequently, the consensus tree could potentially suggest a relationship that is different from any of the gene trees, depending on the level of discordance among gene trees and the number of taxa in the species tree (Degnan and Rosenberg 2006, 2009).

Incomplete lineage sorting is common in shallow species trees, where taxa are closely related and the root is recent. This is often the cause of gene tree discordance in recently diverged groups. Hudson and Coyne (2002) calculated the time to reciprocal monophyly for a pair of populations and estimated that it takes 9-12 Ne generations after divergence for a moderate number of nuclear loci to reach reciprocal monophyly. About 2 Ne generations are required for mtDNA to reach reciprocal monophyly, whereas a nuclear locus requires about 9 Ne generations to reach reciprocal monophyly with a probability of 0.95 (Hudson and Coyne 2002). When the time since divergence is shorter than 9 Ne generations, multiple ancestral gene lineages likely persist in a population, making it difficult to infer correct evolutionary relationships. Incomplete lineage sorting is also seen in very deep species trees. If terminal branches are quite long but one or more of the internal branches are very short, genes can sort randomly into descendant lineages, causing gene tree discordance which persists to the present day. Gene tree discordance has been observed in hominids (Satta et al. 2000), Drosophila sp. (Pollard et al. 2006), cichlids (Takahashi et al. 2001), grasshoppers (Carstens and Knowles 2007), grassfinches (Jennings and Edwards 2005), and early bird evolution (Poe and Chubb 2004). Therefore, both recent divergence and rapid radiation in deep phylogenies can cause gene tree discordance.
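The effect of a short internal branch can be made concrete with the standard three-taxon coalescent result: if the internal branch lasts t generations in a diploid population of effective size Ne, two lineages fail to coalesce on it with probability exp(-t / 2Ne), and the gene tree then disagrees with the species tree with probability (2/3) exp(-t / 2Ne). The sketch below evaluates this expression. The Ne value is hypothetical, and the formula assumes neutrality, constant population size and no gene flow.

```python
import math


def prob_gene_tree_discordance(internode_generations, ne):
    """Probability that a three-taxon gene tree differs from the species tree."""
    # Internal branch length in coalescent units (2 Ne generations, diploid)
    t_coalescent = internode_generations / (2.0 * ne)
    return (2.0 / 3.0) * math.exp(-t_coalescent)


ne = 100_000  # hypothetical effective population size
for multiple in (0.5, 2, 9):
    t = multiple * ne
    p = prob_gene_tree_discordance(t, ne)
    print(f"internode = {multiple} Ne generations: P(discordance) = {p:.3f}")
```

This three-lineage probability is a different quantity from the reciprocal monophyly criterion of Hudson and Coyne (2002), which also depends on the number of sampled gene copies, but it shows the same qualitative pattern: discordance falls off quickly as the internode lengthens relative to Ne.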


The major cause of gene tree discordance in the chaffinch populations is incomplete lineage sorting, as is readily apparent in the median-joining networks produced for each of the nine nuclear genes used in building the BEST trees (Chapter 2). All nine subspecies populations share the common allele(s) at each locus, and only the African and isolated Atlantic island populations show a moderate degree of lineage sorting, which is still incomplete (Figures 3 and 4, Chapter 2). Despite this lack of resolution in the gene trees, the subspecies tree was resolved with the larger samples, but posterior probabilities in the 50% majority rule consensus trees were lowered as a consequence of incomplete lineage sorting within and among populations.

Paraphyly or polyphyly is commonly observed even in low level mtDNA phylogenies (Funk and Omland 2003). In their review, Funk and Omland (2003) found that 23% of 2,319 species surveyed were paraphyletic or polyphyletic, which can only be detected if multiple individuals are sampled per population. The level of paraphyly and polyphyly is likely to be higher in nuclear DNA due to larger effective population size relative to mtDNA. If multiple gene lineages persist in a population (i.e. they are not monophyletic), it is not difficult to imagine a scenario where sampling one individual from each group can lead to incorrect evolutionary relationships due to sampling error.

Figure 7 illustrates an example of four recently diverged populations, with populations A and D containing two gene lineages, whereas populations B and C are monophyletic. In this case, if just one lineage was sampled from each population, inferred relationships would be different from the actual relationships of the populations. This problem can be resolved by sampling multiple individuals to avoid sampling error, or by using multiple independent loci for phylogenetic reconstruction. By sampling multiple individuals per population and using multilocus data, I inferred evolutionary relationships among common chaffinch subspecies to avoid problems inherent in phylogenetic inference of recently diverged groups.

Figure 7: Effect of sampling when multiple gene lineages persist in recently diverged populations. In this scenario, sampling one individual from each population will result in incorrect evolutionary relationships, regardless of the lineage that would be sampled.

3.5.1 Effects of sampling on phylogenetic inference

Concatenation assumes congruent gene trees across different loci due to their shared evolutionary history and also that the estimated tree is identical to the species phylogeny (Liu et al. 2009). The difference in the two concatenated trees recovered for chaffinches in my study is likely due to sampling error because lineages are not sorted into monophyletic groups. When many ancestral gene lineages persist in a population shortly after divergence, the gene lineage one happens to sample may not be representative of the population if lineages are not monophyletic. As a result, sampling has a significant influence on the accuracy of inferred relationships. Therefore, it is important to incorporate intraspecific variation into phylogenetic estimates when lineage sorting is incomplete. Concatenation clearly seems to suffer from sampling effects with the small data set containing four sequences per population, and suggests different evolutionary relationships of the subspecies from those obtained with the large data set. For example, the concatenated tree from the small data set suggests that F. c. maderensis from Madeira is more closely related to continental subspecies than to other Atlantic island subspecies, while the large data set suggests F. c. maderensis is more closely related to the neighboring Azores population of F. c. moreletti .
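The small data set was described in the Methods as a random draw of four sequences per population from the 16 available. A sketch of such a draw is given below. The dictionary layout, sequence labels and fixed seed are assumptions made for illustration, and the original sampling procedure may have differed.

```python
import random


def subsample_per_population(sequences_by_pop, n=4, seed=None):
    """Randomly draw n sequences without replacement from each population."""
    rng = random.Random(seed)
    return {pop: rng.sample(seqs, n) for pop, seqs in sequences_by_pop.items()}


# Hypothetical labels: 16 phased sequences (8 individuals x 2 haplotypes)
# for each of the nine subspecies
subspecies = ["coelebs", "gengleri", "africana", "spodiogenys", "moreletti",
              "maderensis", "canariensis", "ombriosa", "palmae"]
large = {ssp: [f"{ssp}_{i:02d}" for i in range(1, 17)] for ssp in subspecies}
small = subsample_per_population(large, n=4, seed=1)
```

Repeating the draw with different seeds would give a rough sense of how much the concatenated topology depends on which four sequences happen to be chosen.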

As simulation studies by Kubatko and Degnan (2007) have shown, concatenation can provide strong support for an incorrect phylogeny when gene trees are discordant. In chaffinches there is very strong support (PP = 1.0) for the node grouping F. c. maderensis with the continental subspecies in the concatenated tree from the small data set. This relationship is not shown by any of the other trees and seems highly unlikely because it would suggest a Madeiran origin for all continental chaffinches, even though Madeira is a small isolated island in the Atlantic with a small population. Furthermore, Grant (1979) found that Atlantic island populations have evolved shorter wings and larger body mass. The island chaffinches have thus evolved traits for a sedentary lifestyle, while contemporary Northern European chaffinches migrate south in winter, are morphologically adapted for migration and are capable of long distance dispersal. I therefore conclude that inadequate sampling can lead concatenation to infer strongly supported but incorrect evolutionary relationships.

The BEST species trees are consistent with a single wave of colonization of the Atlantic islands in a north-south direction from a European ancestor, starting from the Azores via Madeira to the Canary Islands. The small data set BEST tree (Figure 5a) clearly supports the inference of European ancestry for Atlantic islands while it is somewhat ambiguous from the large data set tree (Figure 5b) because both European subspecies are basal to the node separating the Atlantic islands from Africa. As a result, it is possible that the common ancestor of the Atlantic islands and Africa might have been in Africa. The sequence of colonization events in the Atlantic islands from north (Azores) to south (Canaries) is more plausible if they were colonized from Europe than from

Africa due to geographic distances involved. If the common ancestor of the Atlantic islands and Africa was located in Africa, it would require the ancestor to fly all the way to the Azores from Africa (bypassing Madeira and the Canaries on the way) and then move southwards to colonize Madeira and the Canaries, compared to the ancestor flying from Europe to the Azores and then moving south to colonize Madeira and the Canaries. From an environmental standpoint, colonization from Europe is more likely because strong winds blow from north to southeast in winter when chaffinches flock in large numbers. The earliest divergence in the Atlantic islands also precedes divergence between the two African subspecies (Figure 6). This is more probable if the common ancestor of Atlantic islands and Africa was located in Europe than in the African continent (Figure 6). Under this hypothesis there were two separate colonization events from Europe, one to the Atlantic islands via the Azores (westwards) and a more recent colonization event southwards to Africa. I explored this possibility by conducting ancestral area reconstruction in the program DIVA, which allows both vicariance and dispersal events to best explain current biogeographic patterns. I assumed that either Europe or Africa was a possible location for the ancestor of the Atlantic island and African chaffinches, and reconstructed the


biogeographic history. Optimization under this assumption required four dispersal events to explain their current distribution, which is consistent with two separate colonization events from Europe, one to the Atlantic islands and one to Africa. The four inferred dispersal events comprise one dispersal event each from Europe to the Azores and from Europe to Northern Africa, and two dispersals within the Atlantic islands: one from the Azores to Madeira and one from Madeira to the Canaries. Figure 8 illustrates this colonization pattern, which appears to be the most likely scenario considering the phylogenetic analysis, migration capabilities of chaffinches, geographic distances between areas and wind patterns. Considering the geographic proximity, both the Atlantic islands and Africa may have been colonized by migrants from Iberia.

Figure 8: Most likely scenario for colonization of the Atlantic islands and Africa considering phylogenetic results and biological reasoning.


Marshall and Baker (1999) sequenced four mtDNA genes (cytochrome b, ATPase 6, NADH and the control region) to infer evolutionary relationships among common chaffinch subspecies. They sampled multiple individuals from all subspecies except F. c. gengleri and F. c. ombriosa and chose the most common haplotype from each subspecies to represent the taxa in the phylogenetic analysis due to methodological and computational constraints at the time. Two common control region haplotypes in F. c. canariensis and F. c. maderensis were included in phylogenetic analysis. Their tree, which was rooted with F. c. spodiogenys from Tunisia, clustered the Atlantic island subspecies as a monophyletic group (Marshall and Baker 1999). Interestingly, the two sampled haplotypes from F. c. canariensis and F. c. maderensis did not group together, suggesting incomplete sorting of ancestral polymorphisms or gene flow. Marshall and Baker (1999) suggested that the Atlantic islands were colonized by an African ancestor via the Azores, and that Europe was colonized from Africa in a separate colonization event.

However, my results suggest European ancestry for modern widespread chaffinches, and that Africa was colonized from Europe, in keeping with the recent split of the two African subspecies. Furthermore, Marshall and Baker (1999) hypothesized that the western Canary Islands (La Palma and El Hierro) were probably the first to be colonized, and the central and eastern islands were colonized subsequently. My analysis suggests the opposite. It appears the central islands (represented by F. c. canariensis ) were colonized first and the two western islands, whose populations are classified as two subspecies ( F. c. palmae from La Palma and F. c. ombriosa from El Hierro), were colonized later (Figures 4 and 5). Support for this sequence of colonization among Canary subspecies is provided in a recent phylogeographic study of Canary Island subspecies using the cytochrome b gene (Suárez et al. 2009).

3.5.2 Species tree estimates with some gene flow

Gene flow, in addition to incomplete sorting of ancestral polymorphisms, can decrease the ability to infer true evolutionary relationships from gene sequences. Current phylogenetic methods assume that there is no gene flow after a divergence event (evolution in complete isolation) and, unlike lineage sorting, gene flow cannot be easily incorporated into phylogenetic inference. Therefore, many questions remain about the accuracy of phylogenetic inferences when there is some gene flow between diverged groups. Eckert and Carstens (2008) investigated the accuracy of phylogenetic inference methods when some gene flow is present between species of interest. They tested two coalescent-based phylogenetic methods (ESP-COAL and Minimize deep coalescence) versus the concatenated method using simulations, and found that coalescent-based methods outperformed concatenation in all tested scenarios of gene flow (four models of gene flow: n-island, stepping stone, allopatric and parapatric, with Nem from 0.01 to 1). Concatenation performed very poorly for n-island and stepping stone models even at relatively low levels of gene flow (Nem = 0.10, Eckert and Carstens 2008). Therefore, coalescent-based phylogenetic methods appear to be robust to some gene flow relative to concatenation. Brumfield et al. (2008) used BEST as well as divergence population genetics methods to infer evolutionary relationships among hybridizing manakin species, and found that both methods gave the same conclusions about evolutionary relationships in manakins. Therefore, BEST appears to be robust to some gene flow as well.

The close inferred relationship between F. c. coelebs and the African subspecies in the BEST tree from the small data set and in the concatenated trees could be due to gene flow, and hence shared ancestral polymorphisms, between African and European populations during the Pleistocene glacial maxima. Griswold and Baker (2002) inferred likely Pleistocene refugia for the northern European chaffinches and concluded that they may have survived in Iberia, Greece and North Africa. Therefore, gene flow between Europe and Africa during the last glacial maximum could explain the clustering of F. c. coelebs with the two African subspecies. However, the effect of shared ancestral polymorphism seems to be overcome when more individuals are sampled from each population, so that F. c. coelebs is placed basally with F. c. gengleri in the tree constructed from the large data set.

3.5.3 Conclusions

In conclusion, the number of individuals and the number of loci that need to be sampled to infer the true phylogeny will depend on the evolutionary history of the group. If the populations or species diverged recently with short time intervals between divergence events (short internodes), a high level of gene tree discordance is expected. Therefore, it is advisable to sample many individuals per population and use many loci to accurately infer evolutionary relationships. My results emphasize the importance of sampling and incorporating multiple individuals into phylogenetic inference when inferring evolutionary relationships among closely related groups. Furthermore, coalescent-based methods appear to provide much more realistic estimates of evolutionary relationships than concatenation when gene tree discordance is widespread and populations are not reciprocally monophyletic.
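The link between short internodes and gene tree discordance can be illustrated with the standard three-taxon result from coalescent theory (e.g. Hudson 1983; Tajima 1983; Pamilo and Nei 1988): with one sampled lineage per taxon, the probability that a gene tree topology matches the species tree is 1 - (2/3)exp(-T), where T is the internode length in coalescent units (generations divided by 2Ne). The sketch below is only an illustration under those assumptions (three taxa, constant population size, no gene flow, no recombination within loci). The function name and the example values of T are hypothetical and are not estimates from the chaffinch data.

```python
import math


def concordance_probability(t_coalescent: float) -> float:
    """Probability that a gene tree matches the species tree.

    Three-taxon case, one lineage sampled per taxon. `t_coalescent` is the
    internode length in coalescent units (generations / 2Ne).
    """
    if t_coalescent < 0:
        raise ValueError("internode length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t_coalescent)


# Hypothetical internode lengths, chosen only to show the trend:
# shorter internodes give a lower chance of a concordant gene tree.
for t in (0.0, 0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"T = {t:>4}: P(concordant) = {concordance_probability(t):.3f}")
```

At T = 0 the three possible topologies are equally likely, so a single locus carries no information about the branching order; this is the regime in which sampling many loci and individuals matters most.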

3.6 References

Avise, J. C. 1987. Intraspecific phylogeography: the mitochondrial DNA bridge between population genetics and systematics. Annu Rev Ecol Syst 18:489-522.

Backström, N., S. Fagerberg, and H. Ellegren. 2008. Genomics of natural bird populations: A gene-based set of reference markers evenly spread across the avian genome. Mol Ecol 17:964-980.

Brito, P. H., and S. V. Edwards. 2008. Multilocus phylogeography and phylogenetics using sequence-based markers. Genetica:1-17.

Carstens, B. C., and L. L. Knowles. 2007. Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: An example from Melanoplus grasshoppers. Syst Biol 56:400-411.

Degnan, J. H., and N. A. Rosenberg. 2006. Discordance of species trees with their most likely gene trees. PLoS Genet 2:762-768.

Degnan, J. H., and N. A. Rosenberg. 2009. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol Evol 24:332-340.

Eckert, A. J., and B. C. Carstens. 2008. Does gene flow destroy phylogenetic signal? The performance of three methods for estimating species phylogenies in the presence of gene flow. Mol Phylogenet Evol 49:832-842.

Edwards, S. V. 2009. Is a new and general theory of molecular systematics emerging? Evolution 63:1-19.

Friesen, V. L., B. C. Congdon, M. G. Kidd, and T. P. Birt. 1999. Polymerase chain reaction (PCR) primers for the amplification of five nuclear introns in vertebrates. Mol Ecol 8:2147-2149.

Funk, D. J., and K. E. Omland. 2003. Species-level paraphyly and polyphyly: frequency, causes, and consequences, with insights from animal mitochondrial DNA. Annu Rev Ecol Evol Syst 34:397-423.

Grant, P. R. 1979. Evolution of the chaffinch, Fringilla coelebs, on the Atlantic islands. Biol J Linn Soc 11:301-332.

Hall, T. A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser 41:95-98.

Hey, J., and C. A. Machado. 2003. The study of structured populations - new hope for a difficult and divided science. Nat Rev Genet 4:535-543.

Hudson, R. R. 1983. Testing the constant-rate neutral allele model with protein sequence data. Evolution 37:203-217.

Hudson, R. R. 1992. Gene trees, species trees and the segregation of ancestral alleles. Genetics 131:509-512.

Hudson, R. R., and J. A. Coyne. 2002. Mathematical consequences of the genealogical species concept. Evolution 56:1557-1565.

Jennings, W. B., and S. V. Edwards. 2005. Speciational history of Australian grass finches (Poephila) inferred from thirty gene trees. Evolution 59:2033-2047.

Kubatko, L. S., and J. H. Degnan. 2007. Inconsistency of phylogenetic estimates from concatenated data under coalescence. Syst Biol 56:17-24.

Lee, J. Y., and S. V. Edwards. 2008. Divergence across Australia's Carpentarian barrier: Statistical phylogeography of the red-backed fairy wren (Malurus melanocephalus). Evolution 62:3117-3134.

Librado, P., and J. Rozas. 2009. DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25:1451-1452.

Liu, L. 2008. BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics 24:2542-2543.

Liu, L., and D. K. Pearl. 2007. Species trees from gene trees: Reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Syst Biol 56:504-514.

Liu, L., L. Yu, L. Kubatko, D. K. Pearl, and S. V. Edwards. 2009. Coalescent methods for estimating phylogenetic trees. Mol Phylogenet Evol 53:320-328.

Maddison, W. P. 1997. Gene trees in species trees. Syst Biol 46:523-536.

Maddison, W. P., and L. L. Knowles. 2006. Inferring phylogeny despite incomplete lineage sorting. Syst Biol 55:21-30.

Marshall, H. D., and A. J. Baker. 1999. Colonization history of Atlantic island common chaffinches (Fringilla coelebs) revealed by mitochondrial DNA. Mol Phylogenet Evol 11:201-212.

Miyamoto, M. M., and W. M. Fitch. 1995. Testing species phylogenies and phylogenetic methods with congruence. Syst Biol 44:64-76.

Moore, W. S. 1995. Inferring phylogenies from mtDNA variation: Mitochondrial gene trees vs. nuclear gene trees. Evolution 49:718-726.

Nylander, J. A. A. 2004. MrModeltest v2. Program distributed by the author. Evolutionary Biology Centre, Uppsala University.

Pamilo, P., and M. Nei. 1988. Relationship between gene trees and species trees. Mol Biol Evol 5:568-583.

Peters, J. L., K. G. McCracken, Y. N. Zhuravlev, Y. Lu, R. E. Wilson, K. P. Johnson, and K. E. Omland. 2005. Phylogenetics of wigeons and allies (Anatidae: Anas): the importance of sampling multiple loci and multiple individuals. Mol Phylogenet Evol 35:209-224.

Poe, S., and A. L. Chubb. 2004. Birds in a bush: five genes indicate explosive evolution of avian orders. Evolution 58:404-415.

Pollard, D. A., V. N. Iyer, A. M. Moses, and M. B. Eisen. 2006. Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting. PLoS Genet 2:1634-1647.

Rokas, A., B. L. Williams, N. King, and S. B. Carroll. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798-804.

Ronquist, F. 1997. Dispersal-vicariance analysis: A new approach to the quantification of historical biogeography. Syst Biol 46:195-203.

Ronquist, F., and J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572-1574.

Rosenberg, N. A. 2002. The probability of topological concordance of gene trees and species trees. Theor Popul Biol 61:225-247.

Rudbeck, L., and J. Dissing. 1998. Rapid, simple alkaline extraction of human genomic DNA from whole blood, buccal epithelial cells, semen and forensic stains for PCR. BioTechniques 25:588-592.

Sambrook, J., and D. W. Russell. 2001. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.

Satta, Y., J. Klein, and N. Takahata. 2000. DNA archives and our nearest relative: The trichotomy problem revisited. Mol Phylogenet Evol 14:259-275.

Stephens, M., and P. Scheet. 2005. Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation. Am J Hum Genet 76:449-462.

Steppan, S. J., R. M. Adkins, P. Q. Spinks, and C. Hale. 2005. Multigene phylogeny of the Old World mice, Murinae, reveals distinct geographic lineages and the declining utility of mitochondrial genes compared to nuclear genes. Mol Phylogenet Evol 37:370-388.

Suárez, N. M., E. Betancor, T. E. Klassert, T. Almeida, M. Hernandez, and J. J. Pestano. 2009. Phylogeography and genetic structure of the Canarian common chaffinch (Fringilla coelebs) inferred with mtDNA and microsatellite loci. Mol Phylogenet Evol 53:556-564.

Tajima, F. 1983. Evolutionary relationships of DNA sequences in finite populations. Genetics 105:437-460.

Takahashi, K., Y. Terai, M. Nishida, and N. Okada. 2001. Phylogenetic relationships and ancient incomplete lineage sorting among cichlid fishes in Lake Tanganyika as revealed by analysis of the insertion of retroposons. Mol Biol Evol 18:2057-2066.

Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673-4680.

Townsend, T. M., R. E. Alegre, S. T. Kelley, J. J. Wiens, and T. W. Reeder. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol 47:129-142.

Waltari, E., and S. V. Edwards. 2002. Evolutionary dynamics of intron size, genome size, and physiological correlates in archosaurs. Am Nat 160:539-552.

Zink, R. M., and G. F. Barrowclough. 2008. Mitochondrial DNA under siege in avian phylogeography. Mol Ecol 17:2107-2121.

Chapter 4

General conclusions

Earlier phylogenetic analyses based on mtDNA genes suggested that the ancestral common chaffinch population might have been located in Africa, and later expanded its range northwards to colonize Europe (Marshall and Baker 1999). However, the first coalescent analyses of the highly variable mtDNA control region, sampled across European localities, indicated that during Pleistocene glacial cycles chaffinches were likely confined to refugia in Iberia, Greece and possibly southern Italy (Griswold and Baker 2002), though migration to wintering sites in northern Africa might have occurred as well. My results, based on coalescent-based phylogenetic methods, suggest a European origin for common chaffinches, and that Africa was likely colonized independently from Europe. The Atlantic islands appear to have been colonized by chaffinches from Europe via the Azores, and then sequentially southeastward to Madeira and the Canaries. In the Canaries, the central islands (La Gomera and Tenerife) were probably colonized first and the western islands (El Hierro and La Palma) were colonized later. These conclusions based on phylogenetic inference are consistent with differences in genetic diversity among these populations. In a source-colonization scenario, the source population is expected to contain higher genetic diversity than any of the colonies because of founder effects. If colonization is stepwise, genetic diversity is expected to decrease sequentially from the initial colony to successive colonies unless population size is large or gene flow is substantial and ongoing. Population genetic analysis in this thesis shows that the European population harbors higher genetic diversity than the African and Atlantic island populations. This result is more consistent with the out-of-Europe hypothesis than with the out-of-Africa hypothesis. Additionally, both the common chaffinch (Fringilla coelebs) and its close relative, the brambling (Fringilla montifringilla), are distributed in Eurasia, supporting a European origin for the common chaffinch.
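The expectation of a stepwise loss of diversity can be made concrete with the textbook result that one generation at effective size N reduces expected heterozygosity by a factor of (1 - 1/(2N)). The sketch below applies that factor once per founding event along a colonization chain. The starting heterozygosity, the founder numbers and the function name are illustrative assumptions, not values estimated in this thesis, and the calculation ignores mutation, gene flow and population growth after founding.

```python
def serial_founder_heterozygosity(h_source, founder_sizes):
    """Expected heterozygosity after each founding event in a stepwise chain.

    Each founding event is modelled as a single generation at effective
    size n, which multiplies expected heterozygosity by (1 - 1/(2n)).
    Returns a list starting with the source value.
    """
    trajectory = [h_source]
    for n in founder_sizes:
        if n <= 0:
            raise ValueError("founder sizes must be positive")
        trajectory.append(trajectory[-1] * (1.0 - 1.0 / (2.0 * n)))
    return trajectory


# Hypothetical values chosen only for illustration (not thesis estimates).
labels = ["source", "colony 1", "colony 2", "colony 3"]
expected = serial_founder_heterozygosity(h_source=0.02, founder_sizes=[10, 10, 10])
for label, h in zip(labels, expected):
    print(f"{label:>8}: expected heterozygosity = {h:.5f}")
```

Under these assumptions diversity falls at every step, so the population with the highest diversity is the most plausible source, which is the logic applied to the European population above.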

The population genetic analysis suggests moderate to low gene flow between Europe and Africa. This is most likely a result of gene flow between European and African populations during the last glacial maximum, as contemporary migration between the two continents has not been observed. The Atlantic island populations appear to be highly structured, with very low gene flow between them. Even migration rate estimates between geographically close populations in the Canary Islands are very low. Analysis of population size changes over time suggests that all populations except the F. c. ombriosa population on the small island of El Hierro have experienced growth.

In the future, the isolation with migration (IM) model (Nielsen and Wakeley 2001) could be used to estimate divergence times between these populations, once a good working version of the program that allows simultaneous analysis of multiple populations is released. Coalescent simulations can also be used to test specific historical demographic hypotheses, such as population growth after population divergence versus recent population growth after glacial maxima. As these populations appear to have diverged rapidly over a short period of time, it might be necessary to collect more genetic data to confidently infer divergence times between subspecies. Also, data from a larger number of independent loci might be necessary to distinguish the best hypothesis among a collection of historical demographic scenarios.

Comparison of phylogenetic methods shows that concatenation is affected by sampling much more than the coalescent-based phylogenetic method (BEST). Concatenation can provide incorrect evolutionary relationships with high support when gene tree discordance is high. Therefore, it is important to sample multiple individuals per population and multiple loci to infer the true phylogeny. If populations are sorted into monophyletic groups, it may not be necessary to incorporate multiple individuals into phylogenetic inference. However, use of multiple loci increases the probability of recovering the true species tree. For closely related species that have not sorted into monophyletic groups, or for species that have undergone a rapid radiation, using coalescent-based phylogenetic methods with multiple individuals sampled per population will increase the probability of recovering the true species tree.
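Why additional loci help can be sketched with a deliberately simple model. Assume three taxa, one sampled lineage per taxon, independent loci, and a plurality vote over gene tree topologies as the estimator, with ties broken at random. This estimator is an illustrative stand-in and is not the BEST or concatenation procedure used in this thesis. Each locus is then concordant with probability p = 1 - (2/3)exp(-T) and takes each of the two discordant topologies with probability (1 - p)/2, where T is the internode length in coalescent units. The function names and the value T = 0.2 are hypothetical.

```python
import math


def topology_probabilities(t_coalescent):
    """Return (p_concordant, p_each_discordant) for the three-taxon case."""
    p = 1.0 - (2.0 / 3.0) * math.exp(-t_coalescent)
    return p, (1.0 - p) / 2.0


def plurality_success(n_loci, t_coalescent):
    """Probability that a plurality vote over loci picks the true topology.

    Sums the multinomial probability of every split of n_loci among the
    concordant topology (a) and the two discordant ones (b, c) in which
    the concordant topology is top-ranked. Ties for first place are
    assumed to be broken uniformly at random.
    """
    p, q = topology_probabilities(t_coalescent)
    total = 0.0
    for a in range(n_loci + 1):
        for b in range(n_loci - a + 1):
            c = n_loci - a - b
            top = max(a, b, c)
            if a < top:
                continue  # a discordant topology wins outright
            n_tied = (a == top) + (b == top) + (c == top)
            ways = math.comb(n_loci, a) * math.comb(n_loci - a, b)
            total += ways * p**a * q ** (b + c) / n_tied
    return total


# Hypothetical short internode (T = 0.2) and a range of locus counts.
for n in (1, 5, 11, 25, 51):
    print(f"{n:>2} loci: P(true topology recovered) = {plurality_success(n, 0.2):.3f}")
```

With a single locus the success probability equals the per-locus concordance probability, and it approaches 1 as loci are added whenever T is greater than zero. When T = 0 it stays at 1/3 regardless of the number of loci, because the data then carry no signal about the branching order.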

4.1 References

Griswold, C. K., and A. J. Baker. 2002. Time to the most recent common ancestor and divergence times of populations of common chaffinches (Fringilla coelebs) in Europe and North Africa: Insights into Pleistocene refugia and current levels of migration. Evolution 56:143-153.

Marshall, H. D., and A. J. Baker. 1999. Colonization history of Atlantic island common chaffinches (Fringilla coelebs) revealed by mitochondrial DNA. Mol Phylogenet Evol 11:201-212.

Nielsen, R., and J. Wakeley. 2001. Distinguishing migration from isolation: A Markov chain Monte Carlo approach. Genetics 158:885-896.

Appendix 1: Details of the Chaffinch samples used in the study

Bird ID   Subspecies   Sex  Location

AJB 4208  canariensis  F    La Gomera, Canary Islands
AJB 4209  canariensis  M    La Gomera, Canary Islands
AJB 4209  canariensis  M    La Gomera, Canary Islands
AJB 4210  canariensis  F    La Gomera, Canary Islands
AJB 4211  canariensis  M    La Gomera, Canary Islands
AJB 4212  canariensis  F    La Gomera, Canary Islands
AJB 4213  canariensis  M    La Gomera, Canary Islands
AJB 4214  canariensis  F    La Gomera, Canary Islands
AJB 4215  canariensis  M    La Gomera, Canary Islands
AJB 4216  canariensis  F    La Gomera, Canary Islands
AJB 4217  canariensis  M    La Gomera, Canary Islands
AJB 4218  canariensis  F    La Gomera, Canary Islands
AJB 4219  canariensis  M    La Gomera, Canary Islands
AJB 4220  canariensis  M    La Gomera, Canary Islands
AJB 4231  maderensis   F    Madeira Island
AJB 4237  maderensis   M    Madeira Island
AJB 4238  maderensis   M    Madeira Island
AJB 4239  maderensis   M    Madeira Island
AJB 4240  maderensis   F    Madeira Island
AJB 4241  maderensis   F    Madeira Island
AJB 4242  maderensis   M    Madeira Island
AJB 4353  maderensis   F    Madeira Island
AJB 4366  maderensis   F    Madeira Island
AJB 4367  maderensis   M    Madeira Island
MKP 584   maderensis   M    Madeira Island
MKP 586   maderensis   M    Madeira Island
MKP 587   maderensis   M    Madeira Island
MKP 589   maderensis   F    Madeira Island
AJB 5080  africana     M    Morocco
AJB 5085  africana     M    Morocco
AJB 5087  africana     M    Morocco
AJB 5088  africana     M    Morocco
AJB 5094  africana     M    Morocco
AJB 5096  africana     F    Morocco
AJB 5098  africana     M    Morocco
AJB 5099  africana     M    Morocco
AJB 5101  africana     M    Morocco
AJB 5103  africana     F    Morocco
AJB 5105  africana     M    Morocco
AJB 5106  africana     M    Morocco
MKP 539   africana     M    Morocco
MKP 540   africana     M    Morocco
MKP 2300  spodiogenys  M    Tunisia
MKP 2301  spodiogenys  F    Tunisia
MKP 2302  spodiogenys  M    Tunisia
MKP 2303  spodiogenys  M    Tunisia
MKP 2305  spodiogenys  M    Tunisia
MKP 2309  spodiogenys  M    Tunisia
MKP 2310  spodiogenys  M    Tunisia
MKP 2311  spodiogenys  M    Tunisia
MKP 2312  spodiogenys  M    Tunisia
MKP 2320  spodiogenys  M    Tunisia
MKP 2321  spodiogenys  F    Tunisia
MKP 2322  spodiogenys  M    Tunisia
MKP 2800  coelebs      F    Greece
MKP 2801  coelebs      F    Greece
MKP 2810  coelebs      F    Greece
MKP 2811  coelebs      M    Greece
MKP 2815  coelebs      F    Greece
MKP 2818  coelebs      F    Greece
MKP 2819  coelebs      F    Greece
MKP 2825  coelebs      F    Greece
MKP 2826  coelebs      F    Greece
AJB 5425  coelebs      M    Denmark
AJB 5432  coelebs      M    Denmark
AJB 5433  coelebs      M    Denmark
AJB 5434  coelebs      M    Denmark
AJB 5436  coelebs      M    Denmark
AJB 5437  coelebs      M    Denmark
AJB 5438  coelebs      M    Denmark
AJB 5439  coelebs      M    Denmark
AJB 5445  coelebs      M    Denmark
AJB 5446  coelebs      M    Denmark
AJB 5452  gengleri     M    England
AJB 5453  gengleri     M    England
AJB 5456  gengleri     M    England
AJB 5464  gengleri     M    England
AJB 5471  gengleri     M    England
AJB 5475  gengleri     M    England
AJB 5483  gengleri     M    Scotland
AJB 5484  gengleri     M    Scotland
AJB 4173  ombriosa     M    El Hierro, Canary Islands
AJB 4175  ombriosa     M    El Hierro, Canary Islands
AJB 4176  ombriosa     M    El Hierro, Canary Islands
AJB 4177  ombriosa     F    El Hierro, Canary Islands
AJB 4178  ombriosa     M    El Hierro, Canary Islands
AJB 4179  ombriosa     F    El Hierro, Canary Islands
AJB 4183  ombriosa     F    El Hierro, Canary Islands
AJB 4185  ombriosa     F    El Hierro, Canary Islands
AJB 4186  ombriosa     M    El Hierro, Canary Islands
AJB 4189  ombriosa     F    El Hierro, Canary Islands
AJB 4192  ombriosa     M    El Hierro, Canary Islands
MKP 574   ombriosa     M    El Hierro, Canary Islands
AJB 3927  palmae       M    La Palma, Canary Islands
AJB 3928  palmae       M    La Palma, Canary Islands
AJB 3933  palmae       F    La Palma, Canary Islands
AJB 3934  palmae       M    La Palma, Canary Islands
AJB 3935  palmae       M    La Palma, Canary Islands
AJB 3936  palmae       M    La Palma, Canary Islands
AJB 3937  palmae       M    La Palma, Canary Islands
AJB 3938  palmae       M    La Palma, Canary Islands
AJB 3940  palmae       M    La Palma, Canary Islands
AJB 3941  palmae       M    La Palma, Canary Islands
AJB 3957  palmae       F    La Palma, Canary Islands
AJB 3958  palmae       F    La Palma, Canary Islands
AJB 3959  palmae       F    La Palma, Canary Islands
AJB 3960  palmae       F    La Palma, Canary Islands
AJB 3850  moreletti    M    Sao Miguel, Azores Islands
AJB 3852  moreletti    M    Sao Miguel, Azores Islands
AJB 3860  moreletti    M    Sao Miguel, Azores Islands
AJB 4303  moreletti    M    Sao Jorge, Azores Islands
AJB 4308  moreletti    M    Sao Jorge, Azores Islands
AJB 4313  moreletti    M    Sao Jorge, Azores Islands
AJB 4607  moreletti    M    Graciosa, Azores Islands
AJB 4615  moreletti    M    Graciosa, Azores Islands
AJB 4626  moreletti    M    Graciosa, Azores Islands
AJB 4628  moreletti    M    Graciosa, Azores Islands
AJB 4676  moreletti    M    Flores, Azores Islands
AJB 4695  moreletti    M    Flores, Azores Islands
AJB 4696  moreletti    M    Flores, Azores Islands
AJB 4700  moreletti    M    Flores, Azores Islands