Divergence Population Genetics of the Genus Capsella Pádraic Corcoran
Total Page:16
File Type:pdf, Size:1020Kb
Divergence Population Genetics ofthe Genus Capsella Pádraic Corcoran Degree project inbiology, Master ofscience (2years), 2009 Examensarbete ibiologi 45 hp tillmasterexamen, 2009 Biology Education Centre and Department ofEvolutionary Functional Genomics, Uppsala University Supervisor: Prof. Martin Lascoux 1 "A contrary pancake surely, a fingerish atrocity but not without a queer charm all its own" (Flann O'Brien, The Third Policeman) 2 Abstract The population genetic study of diverging species or populations has seen many advances in recent years. Likelihood or Bayesian methods based on coalescent theory offer researchers the opportunity to estimate important parameters for the history of the species or populations under consideration. The divergence population genetics approach was utilized here at both the interspecific and intraspecific levels. In this study, fourteen nuclear loci were sequenced in the species Capsella bursa-pastoris sampled in Europe and the Far East. Seven loci were sequenced in the species Capsella rubella found in southern Europe and Israel. Phylogenetic methods were utilized to identify haplotype sharing between these species and the causes of this haplotype sharing were investigated using and divergence population genetics methods. The fitting of the Isolation-with-Migration model to the data identified post divergent gene flow as the cause of haplotype sharing between these species. It was also found that the origin of the polyploid C. bursa-pastoris was an extreme bottleneck event. At the intraspecific level of this study in C. bursa-pastoris, nucleotide diversity estimates for the Far Eastern populations were found to be lower than those in Europe. However, the Russian population appears to have diverged relatively recently from the European population of C. bursa-pastoris and there is no evidence of post divergence gene flow found between the Far Eastern Russian and Chinese populations in this study. These results indicate that polyploid speciation does not always result in reproductive isolation of the polyploid species from closely related species and that divergence population genetics methods can be utilized to better understand the history of a rapid colonizing plant like C. bursa- pastoris. 3 Introduction The study of diverging populations is central to understanding the early stages of speciation and by extension the origins of biodiversity. Diverging populations, from a purely neutral viewpoint, are embroiled in a competition between the forces of drift and migration. Drift in each of the diverging population is the force for divergence, while migration in the form of gene flow is the great homogenizing force (Wright 1931). It is the realization of the potency of migration as an homogenizing force between populations that has been so important in the formulation of the allopatric model for species formation, which emphasizes the preeminence of geographical isolation in the speciation process (Mayr 1963). In the allopatric model of speciation populations diverge geographically and accumulate a number of genetic differences to become reproductively isolated. Allopatric speciation has been considered, for a long time now, the most important model for speciation. This is due to the accumulation of many documented cases of allopatric divergence (Coyne and Orr 2004). An alternative to this model is the sympatric model of speciation which occurs when the species to be occupy the same geographical area (Coyne and Orr 2004). In this case, there can be migration between the species through introgressive hybridization. Indeed, there is now accumulating evidence for this divergence with gene flow speciation (Noor and Feder 2006). In recent years, a number of computational approaches based upon a coalescent framework have been developed to estimate parameters for models of speciation (Becquet and Przeworski 2007). A model where there is no role for gene flow between the diverging populations, allopatric speciation can be described by an isolation model (Wakeley and Hey 1997). This model has four parameters that can be estimated using mutilocus polymorphism data (Figure 1). These include the effective population sizes of the ancestral and two descendant populations/species, along with the estimation of the splitting time between populations. However, such models that ignore gene flow when it has occurred between the diverging populations can lead to overestimation of the ancestral effective population and underestimation of the divergence time (Wall 2003). More recently, an “Isolation-with-Migration” (IM) model has been developed that incorporate post divergent gene flow (Nielsen and Wakeley 2001, Hey and Nielsen 2004). This model has been implemented by a number of programs to estimate the parameters of speciation, or for estimation of parameters in the case of recently diverged populations. The details of this model are explained briefly later in this section (Hey and Nielsen 2004; Hey and Nielsen 2007; Becquet and Przeworski 2007). This study deals with both aspects of divergence discussed above, the divergence between species and the divergence between populations within a species. I shall study the case of introgression between two species of the genus Capsella and the divergent population genetics of three populations of C. bursa-pastoris. I shall make use of Bayesian inference using coalescent based model for diverging populations that can still exchange genes. A brief description of divergent population genetics and the inference of demographic history are given next, followed by the description of the model system for this study. 4 Inferring the demographic history of diverging populations "Oh, East is East, and West is West, and never the twain shall meet." (Rudyard Kipling) We are now at a time where we have experienced 150 years of “tree-thinking” in the field of evolutionary biology. It is now the norm for evolutionary biologists to visualize the relationship among species and populations as being a tree. This was after all the only figure to appear in Darwin‟s (1859) The Origin of Species. This “tree-thinking” has had a great influence over the fields of phylogenetics and phylogeography in particular. Once species are defined, evolutionary biologists wish to understand the evolutionary relationship between these species; they want to know the species tree. The field of study that seeks to infer this species tree is phylogenetics (Felsenstein 2003). Below the level of species we have the realm of the population geneticists, who spend their time trying to understand population structure and the evolutionary forces acting on populations (Gillespie 2004). Initially, it would seem that both fields do not overlap and have little need to communicate or to understand each other. However, this is not the case any longer. The field of phylogeography deals with understanding the processes underlying the geographic distributions of genealogical lineages. Phylogeography has utilized for many years a phylogenetic approach to understand the spatial distribution of gene lineages (Avise 2000). Traditionally, phylogeographic studies made inferences based on the reconstruction of a genealogy (gene tree) for a single locus (e.g. mtDNA), neglecting the fact that this genealogy may not accurately reflect the history of a population (Knowles and Maddison 2002; Hey and Machado 2003). Complications such as incomplete lineage sorting mean that the gene tree need not be the same as the species/population tree. Ancestral polymorphism can lead to the phenomenon of incomplete lineage sorting. This results in a pattern where the genealogical tree for a DNA sequence from individuals sampled from the descendant populations has a different branching order from the tree that relates the history of population splitting events (Avise 2000). For instance, in a large survey, Paterson et al. (2006) showed that as much as 29% of the segregating sites show a discordant pattern for human, chimp and gorilla. This means that almost a third of the sites would have led us to infer the wrong species tree. Eventually, these shared polymorphisms are lost in the descendant populations by the action of drift and selection in both. However, this not only presents difficulty for resolving population history, but it can also present difficulty for resolving the relationship between two recently diverged species that may have shared ancestral polymorphisms (Pamilo and Nei 1988). Theory also tells us that the gene trees within a population are random (Nielsen & Beaumont 2009), and that we can never have much confidence in ever knowing the population tree for any single locus. However, phylogeography has been evolving, and has seen a greater use of mutilocus coalescent based approaches that take advantage of the stochastic variance associated with genealogies (Edwards and Beerli 2000; Nielsen and Beaumont 2009). This new field that takes a more rigorous statistical approach to phylogeographic inference can be referred to as Statistical Phylogeography (Knowles and Maddison 2002). With the incorporation of coalescent models, the field of statistical phylogeography is a fist attempt to bridge the divide between population genetic and phylogenetics (Nielsen and Beaumont 2009; Avise 2000). So then, what information can we obtain from the unobserved genealogical relationship that does exist, and what does it tell us about the past demographics of the sampled populations? To answer this question requires a combination of Coalescent theory (Box1),