Hybridization and Evolution in the Genus Pinus

Baosheng Wang

Department of Ecology and Environmental Science Umeå 2013

Hybridization and Evolution in the Genus Pinus

Baosheng Wang

Department of Ecology and Environmental Science Umeå University, Umeå, Sweden 2013

This work is protected by the Swedish Copyright Legislation (Act 1960:729) Copyright©Baosheng Wang ISBN: 978-91-7459-702-8 Cover photo: Jian-Feng Mao Printed by: Print&Media Umeå, Sweden 2013

List of Papers

This thesis is a summary and discussion of the following papers, which are referred to by their Roman numerals.

I. Wang, B. and Wang, X.R. Mitochondrial DNA capture and divergence in Pinus provide new insights into the evolution of the genus. Submitted Manuscript

II. Wang, B., Mao, J.F., Gao, J., Zhao, W. and Wang, X.R. 2011. Colonization of the Tibetan Plateau by the homoploid hybrid . Molecular Ecology 20: 3796-3811.

III. Gao, J., Wang, B., Mao, J.F., Ingvarsson, P., Zeng, Q.Y. and Wang, X.R. 2012. Demography and speciation history of the homoploid hybrid pine Pinus densata on the Tibetan Plateau. Molecular Ecology 21: 4811–4827.

IV. Wang, B., Mao, J.F., Zhao, W. and Wang, X.R. 2013. Impact of geography and climate on the genetic differentiation of the subtropical pine Pinus yunnannensis. PLoS One. 8: e67345. doi:10.1371/journal.pone.0067345

V. Wang, B., Mahani, M.K., Ng, W.L., Kusumi, J., Phi, H.H., Inomata, N., Wang, X.R. and Szmidt, A.E. Extremely low nucleotide polymorphism in Pinus krempfii Lecomte, a unique flat needle pine endemic to Vietnam. Submitted Manuscript

Paper II & III are reprinted with the permission of Blackwell Publishing Ltd.

Abstract Gene flow and hybridization are pervasive in nature, and can lead to different evolutionary outcomes. They can either accelerate divergence and promote speciation or reverse differentiation. The process of divergence and speciation are strongly influenced by both neutral and selective forces. Disentangling the interplay between these processes in natural systems is important for understanding the general importance of interspecific gene flow in generating novel biodiversity in . This thesis first examines the importance of introgressive hybridization in the evolution of the genus Pinus as a whole, and then focusing on specific pine species, investigates the role of geographical, environmental and demographical factors in driving divergence and adaptation. By examining the distribution of cytoplasmic DNA variation across the wide biogeographic range of the genus Pinus, I revealed historical introgression and mtDNA capture events in several groups of different pine species. This finding suggests that introgressive hybridization was common during past species’ range contractions and expansions and thus has played an important role in the evolution of the genus. To understand the cause and process of hybrid speciation, I focused on the significant case of hybrid speciation in Pinus densata. I established the hybridization, colonization and differentiation processes that defined the origin of this species. I found P. densata originated via multiple hybridization events in the late Miocene. The direction and intensity of introgression with two parental species varied among geographic regions of this species. During the colonization on Tibetan Plateau from the ancestral hybrid zone, consecutive bottlenecks and surfing of rare alleles caused a significant reduction in genetic diversity and strong population differentiation. Divergence within P. densata started from the late Pliocene onwards, induced by regional topographic changes and Pleistocene glaciations. To address the role of neutral and selective forces on genetic divergence, I examined the association of ecological and geographical distance with genetic distance in populations. I found both neutral and selective forces have contributed to population structure and differentiation in P. yunnanensis, but their relative contributions varied across the complex landscape. Finally, I evaluated genetic diversity in the Vietnamese endemic Pinus krempfii. I found extremely low genetic diversity in this species, which is explained by a small ancestral population, short-term population expansion and recent population decline and habitat fragmentation. These findings highlight the role of hybridization in generating novel genetic diversity and the different mechanisms driving divergence and adaptation in the genus Pinus.

Keywords Adaptation, biogeography, coalescent simulation, cytoplasmic genome, demographic history, genetic diversity, hybridization, migration, Pinus, population structure, selection, speciation

Sammanfattning Genflöde och hybridisering är vanligt förekommande i naturen och kan leda till olika evolutionära lösningar. De kan leda till en uppsnabbad diversifiering och facilitera artbildning eller motverka differentiering. Diversifiering och artbildningsprocesser är starkt influerade av både neutrala och direkta urvals processer. Genom att separera interaktionen mellan dessa processer hos system i naturen ökar förståelsen för hur viktigt ett genflöde mellan arter är för att skapa ny biodiversitet hos växter. Denna avhandling kommer till att börja med undersöka hur viktig inkorporering av DNA mellan arter i släktet Pinus är för släktets evolution och gå vidare med att undersöka rollen hos geografiska, miljömässiga och demografiska faktorer i artbildning och anpassningar hos arter av tall. Genom att undersöka den geografiska fördelningen av DNA variationen i cytoplasma hos flertalet arter av tall, kunde vi påvisa en historisk inkorsning mellan arter från samma släkte samt inkorporering av mtDNA mellan arter av tall. Detta påvisar att både hybridisering och inkorporering av främmande DNA från en närbesläktad art har varit vanligt förekommande under den historiska utbredningens expandering och kontraktion och har på så sätt bidragit till evolutionen hos släktet. För att bättre förstå orsaken och hur artbildning genom hybridisering sker, fokuserade vi på en särskild artbildning hos Pinus densata som skett genom hybridisering. Vi påvisar hur hybridisering, kolonialisering och differentiering processen ser ut för ursprunget av denna art. Vi såg att P. densata är sprungen ur flera hybridiseringar under sen Miocene. Vilken och hur mycket DNA som inkorporerats mellan de två föräldraarter varierade mellan olika regioner av P. densata utbredning. Under kolonialiseringen av den Tibetanska platån från en ursprunglig hybridiseringszon, orsakade upprepade genetiska flaskhalsar och genetisk drift i expansionszonen kraftig minskning av den genetiska diversiteten samt en väldig populations differentiering. Uppdelningen inom P. densata började i sen Pilocene som en följd av regionala topografiska förändringar och glaciärbildningen under Pleistocene. För att undersöka neutrala och selektiva evolutionära krafter på uppdelning tittade vi på hur ekologiskt och geografiskt avstånd förhåller sig till genetiskt avstånd hos Pinus yunnanensis populationer. Vi kunde se att bägge dessa avstånd har bidragit till populationsstruktur och differentiering hos P. yunnanensis, men deras bidrag varierade över det heterogena landskapet. Avslutningsvis utvärderade vi den genetiska diversiteten hos den för Vietnam endemiska Pinus krempfii. Vi fann en extremt låg genetisk diversitet hos denna art, vilket förklaras av en liten ursprungspopulation, en kortvarig geografisk expansion och en nylig reduktion av antalet individer samt habitat fragmentering. Dessa upptäckter belyser hybridiseringens roll i att skapa ny genetisk diversitet och de mekanismer som driver uppdelning och anpassning inom släktet Pinus.

Table of Contents

Introduction 1 Introgression and Cytoplasmic Genome Capture 1 Hybrid Speciation 3 Divergence in the Face of Gene Flow 5 Mechanisms that Drive Genetic Divergence: Isolation-by-Distance (IBD) vs. Isolation-by-Ecology (IBE) 7 Study System 10 Phylogeny and biogeography of the genus Pinus 10 Pinus densata – a homoploid hybrid species on the Tibetan Plateau 12 Pinus yunnanensis – a subtropical pine endemic to southwest 13 Pinus krempfii – a Tertiary relic pine in Vietnam 14 Aims of the Thesis 16 Results and Discussion 17 MtDNA Capture and Biogeographic History of the Genus Pinus 17 MtDNA capture 17 Biogeography of the genus Pinus 20 Homoploid Hybrid Speciation of Pinus densata 21 Hybridization and colonization processes of Pinus densata 21 Demographic history of Pinus denasata 24 Role of Geography and Ecology in the Genetic Divergence of Pinus yunnanensis 28 Genetic diversity and phylogeography of Pinus yunnanensis 28 IBD vs. IBE 30 Genetic Variation and Population Demography of Pinus krempfii 32 Concluding Remarks 34 References 35 Acknowledgments 43

Introduction

Introgression and Cytoplasmic Genome Capture Gene flow and introgressive hybridization are pervasive in nature. About 10– 30% of animal and species are known to hybridize and/or to regularly exchange genes with other species, and a large proportion of the currently non-hybridizing species likely (repeatedly) hybridized in the past (Mallet 2007; Soltis & Soltis 2009; Abbott et al. 2013). Hybridization can have different evolutionary outcomes. It can accelerate differentiation and facilitate speciation through the rapid origin of novel biochemical, physiological, or morphological phenotypes that allow hybrid species to invade novel habitats, inaccessible to parental species (Mallet 2007; Abbott et al. 2010; Nolte & Tautz 2010). Conversely, hybridization may reverse differentiation of diverging species through the homogenizing effects of gene flow and recombination. The homogenizing effect of hybridization can be overcome by divergent selection that maintains or reinforces differentiation in the face of gene flow (Feder et al. 2012; Nosil & Feder 2012; Abbott et al. 2013).

The direction of introgression between species and populations is usually asymmetric because of various intrinsic and extrinsic factors (Hamilton et al. 2013a, b). Premating barriers, such as phenology and physical pollination environments, can influence the direction of pollen-mediated gene flow in natural populations (Gérardi et al. 2010; Hamilton et al. 2013b). After pollination, pollen competition could be prominent or exclusively present in one of the hybridizing species, thus reducing the number of hybrids in this species and resulting in directional introgression (Rahme et al. 2009; Lepais et al. 2013). Once the zygote is formatted, selection on the fitness of hybrids and their ability to backcross with parental species could also induce asymmetric introgression (Whitney et al. 2006; Lepais & Gerber 2011; Holliday et al. 2012).

In addition to asymmetric reproductive barriers, unidirectional or unbalanced gene flow can also be due to species distribution patterns. The relative abundance of species in sympatry or parapatry can affect hybridization dynamics with the expectation that introgression will be from the more abundant species towards the less abundant one (Lepais et al. 2009). One situation where population sizes between potentially hybridizing species are likely strongly imbalanced, is when a colonizing species spreads into an area already occupied by a related species. In this case, the invading species is initially rare, and thus genes from abundant resident species would

1 be incorporated into the colonizing species, amplified by the subsequent demographic growth of invading population, resulting in a massive introgression of resident genes into the invader’s gene pool (Currat et al. 2008). The asymmetric introgression from local species into invader is stronger for genome components with less gene flow, such as chloroplast (cp) genome in angiosperms and mitochondrial (mt) genome in (Currat et al. 2008). The rationale is the following: if intraspecific gene flow is low, then genes from the resident species that have introgressed into the invading species will not be diluted by migrants from other populations of the invading species, and will become rapidly fixed in the gene pool of the invader following demographic growth. Hence, the less intraspecific gene flow there is, the more interspecific gene flow is expected (Petit & Excoffier 2009).

In plants, dispersal is usually less effective than pollen dispersal. Therefore, maternally inherited organelle markers (e.g. cpDNA in angiosperms and mtDNA in conifers) that are only dispersed by should be more frequently introgressed, and hence more susceptible to interspecific cytoplasmic genome capture. In angiosperm species, cpDNA capture through hybridization and introgression is frequently observed (Rieseberg & Soltis 1991). In species, the mt genome is maternally inherited and dispersed via seeds, while the cp genome is paternally inherited and transmitted via pollen and seeds. The contrasting patterns of inheritance and high migration ability of wind-dispersed pollen usually result in higher gene flow for cpDNA than for mtDNA (Petit et al. 2005). In keeping with the prediction that introgression is negatively correlated with intraspecific gene flow, mtDNA should introgress more readily than cpDNA in conifers, thus resulting in capture of mtDNA from local species by invading species. Historical lateral transfer of the mt genome has been documented in conifers by phylogeographical studies, such as in pine (Senjo et al. 1999; Godbout et al. 2012) and spruce (Du et al. 2009). In all cases reported, the sharing of mitotypes between species (lineages) in their zone of sympatry or parapaty was proposed to be caused by mtDNA introgression during range expansion, with the mitotypes of local species (lineage) captured by invading species (lineage). The frequent cytoplasmic genome capture events detected in nature, e.g. cpDNA capture in angiosperms and mtDNA capture in conifers, indicate that hybridization and reticulate evolution are common in plants. Elucidating the patterns and consequences of historical introgression could shed light on the evolution of plant diversity.

2

Hybrid Speciation One potential outcome of hybridization is hybrid speciation through the creation of either novel polyploid (allopolyploid speciation) or homoploid (homoploid hybrid speciation, HHS) species. In allopolyploid speciation, genome duplication can restore fertility in new hybrids and create nearly- instantaneous reproductive isolation of incipient hybrids from their progenitor species. In contrast, HHS may encounter two difficulties during hybrid species formation and development. Firstly, hybridization between parental genomes that show certain degree of incompatibility may result in low-fitness homoploid hybrids (Soltis & Soltis 2009; Aitken & Whitlock 2013). Secondly, the lack of strong reproductive isolation between the hybrids and their parents would inhibit the hybrid genome stabilization (Buerkle et al. 2000; Abbott et al. 2010). Therefore, allopolyploid speciation was assumed to be the major mode of hybrid speciation in plants, while HHS was thought not as prevalent (and therefore important) as polyploid hybrid speciation. However, recent scientific research suggests that HHS in plants may be more common than previously thought. This recognition was facilitated by the increasing applicability of molecular genetic techniques to detecting homoploid hybrids in the wild (Wang & Szmidt 1994; Rieseberg 1997; Gross & Rieseberg 2005; Abbott et al. 2010).

Both theoretical and experimental studies indicate that HHS can be a rapid form of speciation (McCarthy et al. 1995; Rieseberg et al. 1996; Buerkle et al. 2000). However, a more recent analysis suggests that genomic regions associated with the initial evolution of reproductive isolation may become stabilized quickly, but the reminder of the genome likely takes a much longer time to be stabilized (Buerkle & Rieseberg 2008). Reproductive isolation in HHS could arise through rapid chromosomal reorganization, spatial isolation and/or ecological divergence (Rieseberg 2006). Early studies of the HHS emphasized the importance of chromosomal sterility barriers, but it has become evident that changes in the ecological attributes of hybrid populations and ecological selection promoting adaption to novel habitats may be particularly important in HHS (Gross & Rieseberg 2005). Theoretical studies show that availability and colonization of a novel habitat are key elements associated with the success of HHS, and chromosomal isolation alone is unlikely to lead to speciation at an appreciable rate (McCarthy et al. 1995; Buerkle et al. 2000). Divergent selection, leading to local adaption to new habitats, may provide critical ecological isolating barrier between hybrids and their progenitors (Gross & Rieseberg 2005; Abbott et al. 2010). The importance of ecological selection in HHS is also confirmed by the observations that hybrid species often occupy habitats that

3 differ from those occupied by their parental species (Rieseberg 1997; Mao & Wang 2011).

When ecological divergence is an important driver of speciation, it is possible that new ecologically similar lineages originate multiple times within different patches of the same habitat type. Molecular evidence suggests that multiple origins of homoploid hybrid species is the rule rather than an exception (Gross & Rieseberg 2005; Abbott et al. 2010). Strong genetic differentiation is expected in hybrids with multiple origins from genetically differentiated parental populations. The genetic structure of hybrid species could be potentially even more complex due to historical events following hybridization, such as the sorting of ancestral variation, local introgression between the hybrid and its parental species, and genetic drift during colonization of new niches (Arnold 1993; Abbott et al. 2010). Because extrinsic isolation barriers, such as those due to ecological selection, are often incomplete between closely related species, gene flow from parental species could lead to complex genetic structures in the hybrid species. Nonetheless, the hybrid species can maintain their ecogeographic distinctiveness, perhaps defined by just a few important genetic differences. If only a small number of factors contribute to reproductive isolation, then most of the hybrid genome should be influenced by introgression from parental species. This is supported by the findings of genome-wide introgression between hybrid Helianthus annuus texanus and its parental species, where the genetic islands of differentiation between species were smaller than 1cM (Scascitelli et al. 2010).

For hybrid populations established by the colonization of new niches that are geographically isolated from their parental species, introgression from parental species should not be a complicating factor during population establishment and subsequent evolution. Instead, selection and founder effects occurring during range expansion should have important roles in generating and shaping the resulting genetic structure of the hybrid species. Spatial expansion can generate allele frequency gradients, promoting the surfing of rare alleles into newly occupied territories, and inducing genetic structure in newly colonized areas (Klopfstein et al. 2006; Excoffier et al. 2009). Rare or low-frequency alleles should theoretically have a higher probability of reaching high frequencies at the expansion wave front, in addition to rare alleles becoming fixed by drift in newly colonized areas, particularly when the founder population is relatively small (Klopfstein et al. 2006; Hallatschek & Nelson 2008). In contrast, in their place of origin, these mutations may often disappear or remain at low frequencies (Edmonds et al. 2004). In this process, some ancestral haplotypes can be lost over time by lineage sorting, and the populations established in the newly colonized

4 regions would have genetic patterns distinct from those occurring in the hybrid zone.

Clearly, elucidating the processes that have influenced the distribution and genetic composition of a hybrid species requires the reconstruction of hybridization and colonization events over a range of temporal and spatial scales. For this purpose, phylogeographic studies based on molecular markers with different modes of inheritance and patterns of recombination, provide opportunities to trace the direction and intensity of hybridization during speciation. To gain further understanding of the processes and driving forces in any hybrid speciation event, more specific investigation of the demographic history and patterns of ecological and genomic divergence are needed.

Divergence in the Face of Gene Flow In the case of speciation in the presence of gene flow, two species generated by a splitting event from their common ancestor will diverge whilst exchanging genes (Pinho & Hey 2010). Under this scenario, patterns of shared and fixed differences between species will depend on divergence time and the migration rates among species, as well as the population size of the ancestral and descendent populations. Fixed polymorphisms will accumulate over time, while shared polymorphisms will be (re)introduced by introgression. Large ancestral populations contain higher levels of sequence polymorphism and larger descendent populations retain polymorphisms longer. An isolation-with-migration (IM) model has been developed to quantify the gene flow and establish the population demography in the process of divergence with gene exchange (Nielsen & Wakeley 2001).

The basic IM model considers an ancestral population that gives rise to two descendent populations that are connected by gene flow (Nielsen & Wakeley 2001; Hey & Nielsen 2004). This model has six parameters: population size parameters for one ancestral (θA) and two descendent populations (θ1 and θ2), a parameter for the timing of the split (t), and two gene exchange parameters (m1→2 and m2→1; Figure 1A). The IM model will be more complex than this basic one when considering more populations. A scenario considering three modern populations has 15 parameters, a four- population model has 28 parameters, and a ten-population model has as much as 190 parameters (Hey 2010). Figure 1B shows an IM model for three sampled populations in which sampled population 1 (θ1) and 2 (θ2) are the most recently diverged population pairs. Parameters in the IM model are usually scaled to the mutation rate when the model is fitted to genetic data.

5

In this framework, each population is represented by a population mutation rate, θ = 4Neμ, where Ne is the effective population size and μ is the mutation rate. Gene flow (m) is expressed as the rate of migration for each gene copy per mutation event, and the time parameter (t) is also scaled by the mutation rate.

Nielsen & Wakeley (2001) developed a likehood-based method to estimate the parameters of the IM model for a single non-recombining genetic locus. There have been several extensions and alterations of the original method: the use of multiple unlinked loci (Hey & Nielsen 2004), relaxation of the assumption of constant populations sizes (Hey 2005), reduction of the state space of MCMC simulation to just genealogies (Hey & Nielsen 2007), intralocus recombination (Becquet & Przeworski 2009), and the inclusion of more than two related populations (Hey 2010). These methods have been used extensively to estimate levels of gene exchange between species, subspecies and populations of the same species, and found convincing evidences for speciation in the presence of gene flow (Pinho & Hey 2010). However, the estimated gene flow with IM model could be biased by violation of assumptions, e.g. existence of population structure and gene exchange from unsampled species (Becquet & Przeworski 2009; Strasburg & Rieseberg 2010). Moreover, changes in the amount of gene flow over time cannot be investigated by current likehood-based methods (Strasburg & Rieseberg 2011). One way to address these concerns is to use more complex models with additional parameters. However, for more complex models, an analytical formula for the likelihood function might be elusive or the likelihood function might be too computationally intensive to evaluate.

ABθ N μ θ N μ 3UHVHQW    θ Nμ    θ Nμ θ Nμ m m→ → m m→ →

m→

m→ m→ m → t

θ Nμ

m→ t m→ t

θ$ N$μ

θ Nμ

3DVW

Figure 1. Isolation with migration (IM) model for (A) two sampled populations and (B) three sampled populations.

6

Approximate Bayesian computation (ABC) methods bypass the evaluation of the likelihood function, and thus can potentially be used to fit very complex demographic models to the genetic data (Beaumont et al. 2002). ABC involves simulating large numbers of data sets for the demographic model, where parameters of the model are drawn from a prior distribution. Each simulated data set is then compared to the observed genetic data using summary statistics. The simulated samples are accepted only when they are sufficiently close to the observed data. The accepted data points are then used to estimate the posterior distribution for the parameters of the model. Despite the capability in handling more parameters, ABC methods are based on comparing the summary statistics for simulated and observed data, and thus they are less accurate than likelihood-based methods using the complete data (Beaumont et al. 2002). Furthermore, in scenarios where many parameters need to be considered, it is not obvious how many summary statistics will be required to accurately capture complex demographic processes (Beaumont et al. 2002), and ABC methods suffer from the curse of dimensionality when the number of summary statistics is increased (Blum & Francois 2010). Therefore, the complex models still represent a computational challenge and demand more sequence information.

IM model tracks the history of population development and gene flow, in which gene exchange could occur at any point following the onset of divergence in sympatric or parapatric speciation, or even be induced by secondary contact after allopatric speciation. However, the IM model is unlikely to capture all the complexity of the divergence process. It alone cannot fully disentangle the interplay between selection, gene flow and recombination that occur during speciation.

Mechanisms that Drive Genetic Divergence: Isolation- by-Distance (IBD) vs. Isolation-by-Ecology (IBE) Understanding the factors that contribute to population differentiation is a long-standing goal in ecology and evolution. Patterns of genetic divergence often reflect spatial variation in gene flow, which can be influenced by neutral processes and natural selection. In most species, the distance of individual migration is much smaller in comparison to the distribution ranges of the species. This limited dispersal can result in patterns of spatial autocorrelation in the distribution of genetic variation: individuals that are close to each other are likely to be more related and genetically more similar than individuals that are geographically further apart. In other words, a positive relationship is expected between genetic divergence and geographic

7 distance: isolation-by-distance (IBD; Wright 1943). IBD is among the most common eco-evolutionary patterns observed in nature, and could lead to parapatric speciation (Jenkins et al. 2010). For example, under a strong IBD for enough time, novel mutations can arise and become fixed in different populations as gene flow has been greatly reduced. These mutations can eventually cause genetic incompatibilities and the evolution of reproductive isolation, known as the Bateson–Dobzhansky–Muller model (Muller 1942).

Most studies have tested IBD by performing Mantel tests. However, tests of IBD could be biased by hierarchical population structure. In a simulation study, the Mantel tests clearly show a positive correlation between geographical distance and genetic distance under a hierarchical island model with the assumption of no IBD (Meirmans 2012). Also a study on alpine plants showed that the results of Mantel tests were not related to IBD, but instead were strongly affected by geographically clustered population structure that resulted from postglacial recolonization (Meirmans et al. 2011). It is therefore advised to perform a test for IBD in combination with an analysis of higher level population structure, e.g. a partial Mantel test that takes the effects of hierarchical structure into account or a standard Mantel test performed in each of the clusters separately (Meirmans 2012).

Ecological divergence could limit gene flow between populations adapted to distinct niches, and thus produce genetic divergence. In this process, divergent selection first acts on a few (ecologically) relevant loci with the remainder of the genome unaffected. Over time and under favorable conditions of selection and recombination, localized divergence can extend to areas surrounding the loci under selection by divergence hitchhiking, and eventually to the entire genome via genome hitchhiking (Feder et al. 2012; Nosil & Feder 2012). At the genome hitchhiking stage, gene flow is effectively reduced across the entire genome and a generalized barrier to gene flow forms that can be detected with smaller sets of neutral molecular markers (Thibert-Plante & Hendry 2010; Feder et al. 2012). It is thus expected that environmental dissimilarity will correlate to neutral genetic population differentiation: isolation by ecology (IBE). Ecologically induced population differentiation has been well documented in natural populations (Schluter 2009) and a positive correlation between reproductive isolation among species and ecological divergence has been established (Funk et al. 2006). A correlation between neutral genetic differentiation and ecological divergence, or IBE, is generally recognized as evidence for the ecological speciation model (Nosil 2012).

A common approach used to test IBE mirrors that of IBD where a matrix of population differentiation is correlated with a matrix of ecological

8 distance. However, most environmental variables are spatially autocorrelate- d, and the spatial autocorrelation in the distribution of genetic variation (IBD) is also ubiquitous (Jenkins et al. 2010). Overlap between the spatial patterns of the environmental and genetic variables can easily lead to significant, but spurious, correlations (Meirmans 2012). Thus, to ensure the IBE pattern is not being driven by spatial autocorrelation, geographic distance should be controlled for in the model by using a partial correlation equation:

ݎ െݎ ݎ ݎ ȁ ൌ ୉ൈୋ ୋൈୈ ୉ൈୈ ୉ൈୋ ଶ ଶ ඥͳെݎୋൈୈඥͳെݎ୉ൈୈ

In this model, a partial IBE correlation (rE×G | D) is equally influenced by IBD (rG×D) and eco-spatial autocorrelation (rE×D). A simulation study examined the impact of co-linearity among geographical distance, ecological divergence and genetic differentiation on inferences of IBE, and found that in scenarios with moderate to high IBD and eco-spatial autocorrelation, detecting IBE might be a fruitless pursuit (Shafer & Wolf 2013).

Whether neutral processes or divergent selection under conditions of gene flow are the major drivers of population divergence is heavily debated. Based on Jenkins et al.’s (2010) meta-analysis, approximately 22% of the variance in neutral genetic differentiation could be explained by genetic drift. In contrast, only ~5% of neutral genetic differentiation can be purely explained by ecological divergence (Shafer & Wolf 2013). However, the relative role of IBD and IBE varies between species and across the distribution range within species. For example, a study of Boechera stricta suggested that while isolation by distance alone is sufficient to explain the moderate and continuous north–south divergence, environmental variables show larger contributions than geographical factors in the discrete genetic divergence between east and west (Lee & Mitchell-Olds 2011). The combination of informative molecular markers, spatial statistics and high resolution geographic information system (GIS) data provide the opportunity to explicitly evaluate the influence of environment and geography on the distribution of genetic variation within each species (Storfer et al. 2007; Manel et al. 2010; Sork et al. 2013).

9

Study System

Phylogeny and biogeography of the genus Pinus The genus Pinus () is one of the largest extant gymnosperm genera, with more than 100 species spread across the Northern Hemisphere (Mirov 1967). Over 40 taxonomic treatments have been proposed for the genus Pinus (reviewed in Price et al. 1998). In this thesis, the classification scheme of Gernandt et al. (2005) is followed. This scheme is based on cpDNA, ribosomal DNA and morphological information and widely accepted today. In Gernandt et al.’s (2005) classification, the genus Pinus is divided into two subgenera Pinus and Strobus, each with two sections: sections Pinus and Trifoliae in the former, and Parrya and Quinquefoliae in the latter. Further divisions generate 2-3 monophyletic subsections in each section, and totally 11 subsections are recognized (Figure 2).

Subsection Section Subgenus Pinus Pinus Pinaster Contortae Pinus Australes Trifoliae Ponderosae Gerardianae Krempfianae Quinquefoliae Strobus Strobus Nelsoniae Balfourianae Parrya Cembroides

Figure 2. Subgenera, sections and subsections recognized by Gernandt et al. (2005), and their phylogentic relationships based on full chloroplast genome sequences (adapted from Parks et al. 2012)

CpDNA data, including restriction fragments and gene sequence data, have been used extensively to infer phylogenies in the genus Pinus, and have generated largely consistent results on the relationships of the major lineages within this genus (Wang & Szmidt 1993; Wang et al. 1999; Gernandt et al. 2005; Eckert & Hall 2006; Parks et al. 2009; Parks et al. 2012). However, phylogenetic resolution at low taxonomic levels is still ambiguous,

10 and even the entire chloroplast genome is insufficient to fully resolve the most rapidly radiating lineages in the genus (Parks et al. 2009; Parks et al. 2012). Compared with cpDNA-based phylogenetic studies on the genus Pinus, studies utilizing mtDNA and nuclear DNA data tend to be more limited in their sampling, both in the number of different genomic regions and the taxa included, but have contributed new insights on the evolution of the genus (Syring et al. 2005; Syring et al. 2007; Willyard et al. 2007; Tsutsui et al. 2009; Willyard et al. 2009).

A few established mtDNA phylogenetic for subgenus Strobus differ from that of the cpDNA trees (Tsutsui et al. 2009). Incongruence has been also detected between nuclear gene trees (Syring et al. 2005), and both hybridization and incomplete lineage sorting have been proposed to be responsible for the lack of allelic monophyly within pine species (Syring et al. 2007; Willyard et al. 2009). In light of these findings, phylogenetic inferences based on a single source of molecular data (e.g. cpDNA) would be insufficient to derive strong conclusions regarding species relationships, and comparison of the three genomes would validate better the ambiguities in the evolutionary history of Pinus.

The origin of the genus Pinus is thought to be Early Cretaceous in the middle latitudes of Laurasia, as the earliest fossil attributed to Pinus belgica was found in Belgium and was dated to 145-125 MYA (Alvin 1960). By the Late Cretaceous, had a widespread distribution in Laurasia and had differentiated into two subgenera Pinus and Strobus (Millar 1998; Willyard et al. 2007). The evolution of the genus Pinus in Tertiary and Quaternary was highly influenced by climatic changes. The impacts of Quaternary glacial cycles on the distribution of pines and their genetic diversity have been well documented. For example, pine trees in boreal regions (e.g. P. sylvestris, P. contorta and P. banksiana) have shifted to southern latitudes during glaciations and colonized their current natural ranges from multiple and genetically differentiated glacial refugia (Naydenov et al. 2007; Godbout et al. 2012). In contrast, evolution of the genus during Tertiary remains obscure. Changing climates in the early Tertiary established warm and humid tropical/subtropical conditions throughout middle latitudes. Temperatures and rainfall reached their maxima in the early Eocene. Millar (1998) hypothesized that pines retreated into three important refugia (circumpolar high-latitude zone, low latitudes in North America and Euriasia, and upland regions in the middle latitudes of western North America) during the warm and humid periods of the Eocene, and recolonized middle latitudes following the cooling and drying of the climate at the end of Eocene. However, the history of isolation in the Eocene and the expansion process during Oligocene-Miocene remain unconfirmed, and

11

Millar’s hypothesis about Eocene refugia has not yet been tested using molecular data (but see Eckert & Hall 2006). Contractions and expansions out of Eocene and Pleistocene refugia could have resulted in contact between previously geographically separated species, and provided additional opportunities for hybridization and introgression. Therefore, hybridization and reticulate evolution could have been common in pines and played an important role in the evolution of the genus as a whole.

In the first study of this thesis (Paper I), I investigated historical introgression in the genus Pinus by comparing mtDNA trees with cpDNA trees. By using ancestral region reconstruction and molecular dating, we inferred the biogeographical history of the genus and the Eocene impacts on pine distribution and evolution.

Pinus densata – a homoploid hybrid species on the Tibetan Plateau Pinus densata represents a highly successful case of homoploid hybrid speciation (HHS) with far-reaching evolutionary consequences. This species forms extensive forests that regenerate well on the south-eastern (SE) Tibetan Plateau at elevations ranging from 2700 to 4200 m above sea level (Mao et al. 2009; Mao & Wang 2011). Ecological niche modeling has projected a vast area that may potentially be inhabited by P. densata, making it perhaps the most successful plant homoploid hybrid species in terms of the geographic scale of establishment reported to date (Mao & Wang 2011). Genetic analyses suggest that P. densata originated from hybridization between and Pinus yunnanensis without any change in ploidy level (Wang & Szmidt 1994; Wang et al. 2001; Liu et al. 2003; Song et al. 2003). Pinus tabuliformis is widely distributed from northern to central China, while P. yunnanensis has a relatively limited range in southwest (SW) China (Mao & Wang 2011). The distribution of the three pine species forms a geographical succession, with P. tabuliformis, P. densata and P. yunnanensis generally being found in northerly, intermediate and southerly latitudes, respectively (Mao & Wang 2011).

It was hypothesized that the evolution of P. densata following the initial hybridization events into a stabilized taxonomic unit was promoted by the uplift of the SE Tibetan Plateau, after which the successful hybrid lineages colonized the new, empty plateau habitat that was inaccessible to both parental species (Wang & Szmidt 1994; Wang et al. 2001; Ma et al. 2006). Patterns of variation in allozymes, cp, mt, and nuclear DNA show that individual populations of P. densata have very diverse genetic compositions, with reciprocal parentage and varying degrees of genomic contribution from

12 each parental species (Wang & Szmidt 1994; Wang et al. 2001; Song et al. 2003; Ma et al. 2006). These results suggest that individual populations of P. densata have unique evolutionary histories e.g. originated from independent hybridization, established by different colonization events, and varying introgression from parental species. However, due to inadequate geographic sampling in previous studies, the historical processes responsible for the current distribution and population structure of the species remain ambiguous.

In Paper II of this thesis, mt- and cpDNA variation was surveyed in P. densata and its two putative parental species throughout their ranges. We addressed the following questions: 1) Where did P. densata originate, and how did it colonize the SE Tibetan Plateau? 2) What genetic events occurred in the hybrid zone, and accompanied its colonization and evolution? In Paper III, we surveyed nucleotide polymorphisms in nuclear genes in P. densata and its parental species. Using this information and coalescent simulations, we inferred the divergence process of P. densata during its origin and subsequent colonization of vast new territories. Our specific objectives were: 1) to establish timeframes for the origin and subsequent divergence of P. densata lineages, 2) to determine how the size of the P. densata population varied throughout its speciation history, and 3) to characterize the patterns of introgression from parental species and the gene flow between divergent gene pools of P. densata.

Pinus yunnanensis – a subtropical pine endemic to southwest China Pinus yunnanensis is a subtropical pine endemic to SW China, which has a continuous distribution in the Yunnan-Guizhou region at elevations ranging from 700-3000 m above sea level across all the major river valleys (Mao & Wang 2011). Climatic conditions vary between regions divided by the mountain chains, and pronounced morphological variations in this pine have been recorded across its range (Yu et al. 1998; Mao et al. 2009). Pinus yunnanensis was involved in the speciation of P. densata as one of the parental species (Wang & Szmidt 1994; Wang et al. 2001; Song et al. 2003).

The topography of SW China is characterized by a number of large valley systems, which were reorganized and reinforced during the uplift of the SE Tibetan Plateau in the Late Miocene-Pliocene (Clark et al. 2004). Species in this region responded uniquely to these landscape changes. In a number of conifers, herbs and shrubs, phylogeographic studies have revealed major landscape effects in which the current mountain and valley systems have acted as natural dispersal barriers (Gao et al. 2007; Yuan et al. 2008), while

13 in some other plants, the spatial genetic structure was found to reflect the historical geography of the region rather than the current geography (Zhang et al. 2011; Yue et al. 2012). SW China has been free from glacial advances and retreats, creating a region with high biodiversity that has been maintained for millions of years (Yao et al. 2012). Thus, local adaptation and ecological divergence have potentially had sufficient time to influence the pattern of genetic differentiation in many local species. Similarly, the genetic divergence of P. yunnanensis could have been driven by both neutral and selective processes.

In Paper IV, we sampled populations of P. yunnanensis throughout its range to cover most of its ecological habitats. Both mtDNA and cpDNA variation and environmental data were analyzed to assess the influence of ecological and historical factors on genetic divergence in this species. We addressed the following questions: 1) How is genetic diversity distributed geographically in P. yunnanensis, and does the observed genetic pattern better reflect the modern or the historical geography? 2) What is the extent of environmental heterogeneity within the species’ range, and could ecological factors have promoted genetic differentiation among populations experiencing different habitats in this pine?

Pinus krempfii – a Tertiary relic pine in Vietnam Pinus krempfii is a unique pine endemic to Vietnam. It is a large , up to 30 m in height, with diameter at breast height up to 2 m. This species occurs in small groups of 10-30 individuals in tropical mixed broadleaf forest. Morphologically it differs from all the other pines by having two flat -like needles rather than typical pine needles (Lecomte 1921; Mirov 1967). As a result, since its first description by Lecomte (1921), there has been a considerable controversy over its classification (reviewed in Price et al. 1998). In most recent classifications, P. krempfii has been considered to belong to the subgenus Strobus (Price et al. 1998; Gernandt et al. 2005). The placement of P. krempfii in subgenus Strobus was supported by phylogenetic studies based on both cp and nuclear DNA data (Wang et al. 1999; Wang et al. 2000; Gernandt et al. 2005; Syring et al. 2005; Parks et al. 2012). A fossil-calibrated phylogeny suggests that the diversification within the genus Pinus was relatively recent, with the emergence of the P. krempfii lineage dating to 14-27 million years ago (Willyard et al. 2007).

Pinus krempfii is regarded as an endangered species and its distribution is limited to just two provinces in Vietnam (Nguyen et al. 2004). However, in spite of biological and economic importance of this unusual species, there is no information about the levels and patterns of genetic variation in its

14

Aims of the Thesis Gene flow and hybridization play important roles in plant evolution. The process of divergence and speciation is strongly influenced by neutral and selective forces and their interactions with the genetic system of the organisms. The dissection of these processes in natural systems is important for understanding the general importance of interspecific gene flow in generating novel biodiversity and the mechanisms driving divergent adaptation and speciation in plants. This thesis investigated the geographical distribution of introgression in the genus Pinus, the speciation history of the homoploid hybrid pine P. densata, and the patterns and mechanisms of divergence in two additional subtropical pine species. The specific aims were: I. To establish the historical introgression events in the evolution of the genus Pinus, and reconstruct the recent biogeographical history of the genus.

II. To establish the origin, population structure and demographic history of the hybrid pine P. densata.

III. To examine the impact of geography and environment on the genetic differentiation of the subtropical pine P. yunnanensis.

IV. To evaluate genetic diversity and population demography of an endangered endemic pine P. krempfii.

16

Results and Discussion

MtDNA Capture and Biogeographic History of the Genus Pinus

MtDNA capture In Paper I, we sequenced more than 11 kbp mtDNA segments in multiple accessions of 36 pine species. By using phylogenetic and phylogeographic methods, we traced the historical introgression and mtDNA capture events, and reconstructed the biogeographical history of the genus. The mtDNA based phylogenetic tree of the genus differed from that based on chloroplast DNA in the placement of several groups of species (Figure 4). In subgenus Pinus (hard pines), the Himalayan P. roxburghii and the southeast Asian tropical P. merkusii are grouped with east Asian pines of the subsection Pinus on the mtDNA tree (Figure 4). In contrast, on the cpDNA trees these two species are grouped with members of the Mediterranean subsection Pinaster (Gernandt et al. 2005; Eckert & Hall 2006; Parks et al. 2012). Pinus roxgburghii and P. merkusii are morphologically similar to Mediterranean pines (Klaus & Ehrendorfer 1989; Frankis 1993). Klaus & Ehrendorfer (1989) proposed that the Mediterranean ancestor of P. roxburghii followed the Tethys coast to the east and reached the Himalayan region where P. roxburghii arose. At present, P. roxburghii is restricted to the Himalayas, and isolated from the species of subsection Pinus. It is possible that P. roxburghii colonized further east along the Himalayas and encountered the ancestor of those east Asian hard pines before the uplift of the Tibetan Plateau. Spatial expansion can induce introgression from local species into colonizing species in genomes with low migration ability (e.g. the mt genome in pine), and result in mtDNA capture (Petit & Excoffier 2009). Under these scenarios, P. roxburghii could have captured mtDNA from the ancestor of some East Asian hard pines and subsequently fixed it by drift during range shifts. Pinus merkusii might have followed a similar evolutionary pathway to that of P. roxburghii: ancestral migration from the Mediterranean and capture of Asian mtDNA. It is even plausible that both species shared the same capture event, and their separation was due to the uplift of the Tibetan Plateau. A long isolation history and adaptation to different environments could lead to distinct morphology, population structure, and chemical composition in these species (Mirov 1967; Szmidt et al. 1996).

In subgenus Strobus, four Eurasian species (P. parviflora, P. cembra, P. armandii and P. wallichiana, which are referred to as PCAW species

17 hereafter) of subsection Strobus were grouped with subsections Gerardianae (excluding P. bungeana) and Krempfianae, and were well separated from all other species (non-PCAW species) of subsection Strobus on the mtDNA tree (Figure 4). In contrast, all previous studies based on cpDNA support the monophyly of subsection Strobus and its sister relationship to subsections Gerardianae and/or Krempfianae (Wang et al. 1999; Gernandt et al. 2005; Eckert & Hall 2006; Parks et al. 2012). This difference in the placement of the PCAW species on the mtDNA and cpDNA trees indicates the capture of an mtDNA lineage from the ancestor of subsections Gerardianae and Krempfianae by the ancestor of PCAW. The time of the capture event was estimated to 27-14 MYA. This time coincides with the Miocene expansion of pines from Eocene refugia, where the ancestor of the PCAW species could have encountered that of Gerardianae- Krempfianae and captured mtDNA from it.

The remaining species in subsection Strobus (non-PCAW species) formed one strongly supported clade on our mtDNA tree, in which species of central to south North America (P. lambertiana, P. flexilis P. ayacahuite and P. strobiformis) grouped with species from both sides of Beringia (P. koraiensis, P. pumila, P. monticola and P. albicaulis; Figure 4). This pattern is different from that based on cpDNA, in which species of central to southern North America are distinct from those from the Beringia region (Gernandt et al. 2005; Parks et al. 2012). This may point to another capture event of mtDNA in subgenus Strobus, probably from a northern species (P. albicaulis or P. monticola) by central and southern North American species.

Figure 4. Geographical distributions of species from subgenera Pinus (A) and Strobus (B) adapted from Mirov (1967). The distribution of major groups of species identified from mtDNA and cpDNA trees (see C for details) are shown by outlines and shaded areas, respectively. Species with different placements on the two trees are indicated in bold. (C) Contrasting phylogenies of mtDNA and cpDNA for the genus Pinus. Species with different placements on the two trees were indicated in bold. Within each subgenus, major groups of branches were colored individually. Two possible positions of P. bungeana revealed by different mtDNA segments are shown by a dashed red line. Branch lengths of mtDNA tree were drawn to reflect divergence estimations based on silent sites by using an 85 MYA calibration as the divergence time between the two subgenera of Pinus. The cpDNA tree was determined from full chloroplast genome sequences adapted from Parks et al. (2012). The geological time scale (MYA) was provided to refer branching events to specific epochs.

18

Biogeography of the genus Pinus In subgenus Strobus, the first two splitting events generated three main clades, section Parrya, and subsections Gerardiana-Krempfianae and Strobus, approximately 59 MYA and 45 MYA, respectively (Figure 4). These estimated divergence times are consistent with climate fluctuations during Eocene, when pine species retreated into three important refugia (Millar 1998). Thus, the early divergence in subgenus Strobus could have started or have been reinforced when the three main clades survived in different Eocene refugia. According to their current distribution, these three main clades could have been preserved in different Eocene refugia with section Parrya, subsections Gerardiana and Krempfianae and subsection Strobus confined to the middle latitudes of western North America, alongside the Tethys Ocean, and in the circumpolar high-latitude zone, respectively.

In contrast to subgenus Strobus, the divergence in subgenus Pinus started after the Eocene, during the Oligocene-Miocene. The crown group of subgenus Pinus is connected by a long stem, which extends over 50 million years (Myr) from the Late Cretaceous to the Oligocene. The tree topology of a long stem leading to a cluster of short branches indicates a mass extinction as suggested by a previous simulation study (Crisp & Cook 2009). Thus, subgenus Pinus could have experienced extinctions during its contraction into the Eocene refugium, most likely a single refugium located in a circumpolar high-latitude zone, from where they could colonize North America and Eurasia through Beringia after they expanded out of their refugium. During the Oligocene-Miocene expansion, genetic drift could have increased the genetic differentiation between lineages that colonized different continents. Geographical barriers created since the mid-Miocene would have facilitated divergence between pine species on different continents, and finally generated sections Pinus and Trifoliae in Eurasia and North America, respectively. Further colonization by each section within their respective continents could have induced Miocene species radiation within the different sections.

In conclusion, our analyses revealed clear discrepancies between the mtDNA and cpDNA trees of the genus Pinus. These discrepancies are associated with strong geographical patterns indicating mtDNA captures during past contraction and expansion histories. The multiple mtDNA capture events identified in this study suggest that hybridization and introgression were common and played an important role in the evolution of the genus. Patterns of divergence within the genus suggest earlier diversification of subgenus Strobus and its preservation in multiple refugia during the Eocene, in contrast to the more recent diversification of the extant

20 lineages of subgenus Pinus, which probably occurred during its expansion from a single Eocene refugium. These results provide new insights into the evolution of the genus.

Homoploid Hybrid Speciation of Pinus densata

Hybridization and colonization processes of Pinus densata In Paper II, we surveyed paternally inherited chloroplast (cp) and maternally inherited mitochondrial (mt) DNA variation within and among 54 populations of P. densata and its putative parental species throughout their respective ranges (Figure 5). High levels of total genetic diversity and population differentiation were detected in P. densata at both mtDNA (HT =

0.735 and GST = 0.841) and cpDNA (HT = 0.929 and GST = 0.126). The distribution of this variation was geographically highly structured (Figure 5). Analysis of mtDNA spatial genetic structure across all populations of P. densata detected seven groups of populations (Groups I-VII, Figure 5), while the cpDNA variation identified two groups of populations in this species (Groups A and B, Figure 5).

The ancestral hybrid zone The most genetically complex populations were those located in the northeastern periphery of the P. densata range (Groups I-IV). Both major and rare mitotypes representative of P. tabuliformis, P. yunnanensis and P. densata were detected in populations from this region. The presence of the common P. yunnanensis and P. tabuliformis mitotypes (e.g. M4, M8 and M24) in populations of P. densata from this region suggests that P. yunnanensis and P. tabuliformis overlapped and hybridized in this place, as both of these species appear to have acted as maternal parents. All populations of P. densata in this region (Group A) were dominated by P. tabuliformis chlorotypes, indicating predominant pollen flow from P. tabuliformis. These findings are consistent with hypotheses regarding evolutionary history of P. densata relating to the recent uplift of Tibetan Plateau. Hybridization could have occurred in the previously overlapping zone between P. yunnanensis and P. tabuliformis. The uplift of the south- eastern Tibetan Plateau and associated climate changes could have gradually pushed P. yunnanensis and P. tabuliformis range southward and northward to their present distribution, respectively. After that, populations in the ancestral hybrid zone became fragmented and isolated. The region of the ancestral hybrid zone between P. yunnanensis and P. tabuliformis is recognized as one of the lowland refugial areas at the southern and eastern fringes of the Tibetan Plateau (Lehmkuhl 1998; Frenzel et al. 2003), thus

21 ancestral polymorphism and unique haplotypes that arose through mutation and recombination could have been preserved, resulting in a mitotype-rich area.

The westward colonization before glaciations In contrast to the spatially restricted Groups I-IV in the ancestral hybrid zone, Groups V-VII each spanned a large geographic region with low levels of mtDNA diversity in the west of P. densata range. These three groups are each monomorphic (or almost monomorphic) for M13, M12 and M11, respectively (Figure 5). A similar, but less pronounced, reduction in genetic diversity was also detected in cpDNA. Populations in the western part of the P. densata range (Group B, corresponds to mtDNA Groups V-VII) were dominated by P. yunnanensis chlorotypes, indicating predominant pollen flow from P. yunnanensis. Populations in the most western part (Nos. 15-26) have different frequencies of cpDNA alleles from populations in other regions. These populations could have been isolated from parental species by seed and pollen exchange for a long time. The decline in genetic diversity for both mtDNA and cpDNA from the east to the west of P. densata range indicates a westward range expansion, during which a series of bottlenecks occurred along the colonization route. The westward colonization of P. densata probably arose in three waves, leading to establishment of Groups V-VII, respectively. The fixation of unique mitotypes in Groups V-VII was likely caused by allele surfing during this range expansion.

Figure 5. (A) Geographic distribution of the 54 sampled populations of P. densata, P. tabuliformis and P. yunnanensis. (B) The distribution of the 29 mitotypes detected in the three pine species. Pie charts show the proportions of the different mitotypes in each population. Seven groups (I-VII) of P. densata are shown. In the network, each link represents one mutation step, and circle size is proportional to the frequency of a mitotype over all populations. (C) The distribution and relationships of chlorotypes. Fifty common chlorotypes were clustered into two major groups. In the network, circle size is proportional to the frequency of a chlorotype over all populations. The pie chart for each population shows the proportions of the chlorotypes from the different groups. Chlorotypes that occurred only once or twice over all populations are in black. The two groups (A and B) of P. densata populations are also shown.

22

Pinus densata colonized the Tibetan Plateau before the last glaciations (see Paper III). Quaternary glaciations in Tibet were geographically limited to isolated mountain groups or smaller plateaus in higher mountains (Shi 2002; Lehmkuhl & Owen 2005). Thus, P. densata populations could have persisted during the glaciations in multiple small regional refugia, and this ancient genetic structure could have been preserved in extant populations. Repeated periods of isolation and range expansion during glaciation cycles could have enhanced population differentiation and independent evolution.

In summary, strong spatial genetic structure in both cp and mtDNA was detected in P. densata populations. The northeastern part of the P. densata range represents the ancestral hybrid zone and the central and western parts seem to have been established by multiple colonization events. Along the colonization route, consecutive bottlenecks and surfing of rare alleles caused a significant reduction in genetic diversity and strong population differentiation. The direction and intensity of introgression of P. densata with the two parental species varies among populations across its geographic range. Pollen flowed predominantly from P. tabuliformis in the ancestral hybrid zone, from P. yunnanensis in the western part, and the most western populations have been isolated from their parental species by seed and pollen flow for a long time. The observed spatial distribution of genetic diversity in P. densata appears to reflect the persistence of this species on the plateau during the last glaciations. Our results indicate that both the ancient and contemporary population dynamics have contributed to the spatial distribution of the genetic diversity in P. densata.

Demographic history of Pinus denasata In Paper III, we sampled populations of P. densata that are representative of different colonization events (Paper II), and surveyed nucleotide polymorphism over eight nuclear loci in 19 populations of P. densata and its parental species. Using this information and coalescence simulations, we assessed the historical changes in its population size, gene flow and divergence in time and space.

Genetic diversity and population structure at nuclear loci

High levels of nucleotide polymorphism (θw = 0.0098 and πt = 0.0065) and population differentiation (FST = 0.282) were detected in P. densata. The genetic composition of P. densata varied across geographic regions, and its populations could be roughly divided into three groups (I-III, Figure 6). The genetic composition of Group I (Nos. 1-3) and Group III (Nos. 9-10) were dominated by ancestry from P. tabuliformis and P. densata, respectively. Group II (Nos. 4-8) had evenly admixed ancestry from both parental species

24

extensive forests. Rapid speciation events triggered by the uplift of the plateau have also been proposed for other plant species in this region (e.g. Liu et al. 2006). Divergence within P. densata coincides with regional topographic changes during the Pliocene and middle Pleistocene glaciations (Zheng & Rutter 1998; Clark et al. 2004). Thus, differentiation between P. densata groups was affected by both regional tectonic events and climate fluctuations.

GroupI GroupII GroupIII 3UHVHQW   N  NH ; NH ; H ; 

 





 0<$

Ancestorof GroupII&III

NH  

 0<$

Ancestorofall threegroups  NH ; 3DVW

Figure 7. Summary of the isolation-with-migration model for the three P. densata groups. Fifteen demographic parameters estimated by IMa2 are shown. Each block represents a current or ancestral population with their estimated effective population size (Ne). Arrows denote the direction of gene flow with the estimated migration rate labeled above or below the arrow. The timings of the two splitting events are indicated in MYA.

The effective population size estimated for the ancestor of P. densata (ancestor of all three groups, 0.41 × 105) was much higher than that for the ancestor of Groups II and III (less than 1000, Figure 7), indicating that a severe bottleneck occurred during the westward colonization. This finding is consistent with the significant decline in genetic diversity in both cytoplasmic (Paper II) and nuclear genomes along the route of westward colonization. ABC simulations suggested that all three P. densata groups

26 experienced a bottleneck followed by exponential growth. The weakest and most ancient bottleneck was detected in Group I. This region is recognized as a refugial area (Lehmkuhl 1998; Frenzel et al. 2003), where populations could have been less exposed to climatic fluctuations than those in Groups II and III. In contrast to Group I, the effective population size for Groups II and III have increased from their ancestor by factors of 367 and 112, respectively.

Patterns of intraspecific gene flow varied among different P. densata groups (Figure 7). Relatively high gene flow (2Nm>1.0) was only detected between neighboring groups. Low gene flow between geographically distant groups (Groups I and III) was probably caused by the long distances separating them and the complex topography of southeast Tibet. Gene flow detected between Groups I and II is more likely to be a signature of historical gene exchange because continued gene flow would have homogenized the distinct mt and cpDNA structures observed in these groups (Paper II). Gene flow detected between Groups II and III may reflect a secondary contact after the most recent glaciations, because only the most geographically close populations between Group II and III showed similar chloroplast genome components (Paper II). Gene flow from Groups I and III into Group II was much higher than in the opposite directions. These asymmetric gene flows could be caused by intrinsic or extrinsic factors affecting hybridization outcomes. This interesting phenomenon is worth further investigation. On the other hand, we have to keep in mind that violation of IM assumption due to gene flow from unsampled populations may result in biased estimations of gene flow (Strasburg & Rieseberg 2010).

In summary, the present study based on nuclear DNA data revealed high levels of genetic diversity and population differentiation in P. densata. Coalescence analyses further established details of the demography and speciation history of this species, yielding four key findings. First, P. densata originated in the late Miocene triggered by the recent uplift of the Tibetan Plateau. The divergence within P. densata dates from the late Pliocene and was induced by regional topographic changes and Pleistocene glaciations. Second, the ancestral P. densata population was much larger than the ancestral population sizes of the central and western populations, indicating severe bottlenecks during the westward colonization. Third, recent population expansions have occurred in all geographic regions, especially in the western populations, but at different times in the past. Finally, gene flow between P. densata populations is limited and is observed only between adjacent geographic regions. Such genetic isolation could be due to geo- physical, ecological and intrinsic genetic factors.

27

Role of Geography and Ecology in the Genetic Divergence of Pinus yunnanensis In Paper IV, we sampled 16 populations throughout the range of P. yunnanensis, and investigated the patterning of intraspecific genetic diversity based on mt- and cpDNA data. By using this information, together with environmental data, we assessed the relative roles of ecological and geographic factors on genetic divergence within P. yunnanensis.

Genetic diversity and phylogeography of Pinus yunnanensis

Population differentiation in mtDNA (GST = 0.538) was higher than that in cpDNA (GST =0.108). This result is consistent with expectations regarding patterns of genetic diversity for these two cytoplasmic genomes. For mtDNA, seven population groups (I-VII) were detected, while cpDNA variation failed to reveal any meaningful phylogeographic grouping. Group I, VI and VII each spanned a large geographical area in the western, central and southeastern regions of P. yunnanensis, respectively (Figure 8). The other four groups (II-V) were each restricted to the centre of the P. yunnanensis range. Sharing of mitotypes between groups was widespread even though the groups were separated by both paleo- and modern river systems. The sharing of mitotypes between regions of P. yunnanensis indicates that there was probably a low relief of paleo-landscape, where regional populations were connected and this pattern is still visible in the extant population structure. Group II and III each have one population with unique mitotype compositions. These two populations were distributed along the paleo- Jinsha River, but separated from each other by a sharp bend in the current river course. The establishment of these two populations is likely to have been linked to the formation of the modern river course (Clark et al. 2004), during which distinct mitotypes could have been fixed by genetic drift. Taking together, we propose that the distribution of mtDNA variation in P. yunnanensis has been shaped by both the paleo-landscape and the formation of the modern regional topography. The observation of continuous genetic differentiation over the main range of P. yunnanensis, together with discrete isolated local clusters, suggests an ancient landscape that imposed little constraint on migration, but which was subsequently disrupted due to regional geological movements.

28

IBD vs. IBE Partial Mantel tests found that mtDNA genetic distance significantly correlated with both geographic and ecological distance (rgen-geo = 0.22, P < 0.05; rgen-eco = 0.28, P < 0.05), while cpDNA genetic distance only correlated with geographic distance (rgen-geo = 0.25, P < 0.01; rgen-eco = -0.06, P > 0.05) at species level. These results indicate that both geographic and environmental factors contributed to the pattern of mtDNA variation across the species as a whole, but only geographic distance affected cpDNA relatedness between populations.

Three distinct ecotypic clusters (Py-eco1, Py-eco2 and Py-eco3) were identified in P. yunnanensis (Figure 9). This division is broadly congruent with that based on mtDNA variation. The two peripheral Groups I and VII corresponded to Py-eco1 and Py-eco3, respectively, and all the other five groups (II-VI) in the central area belonged to Py-eco2. The Py-eco1 region has a humid subtropical climate similar to the Mio-Pliocene paleo-climate, in which the ancestor of P. yunnanensis was present (Xing et al. 2010). The Py- eco3 region represents a much drier climate than Py-eco1. Populations in this region have a distinct morphology characterized by thin and pendulous needles, which is considered to be adaptation to dry and hot environments (Li & Wang 1981). The congruence between ecological and genetic divisions suggests that environmental adaptation could have contributed to the genetic divergence of Py-eco1 and Py-eco3. Py-eco2 covers the major range of the species, including population Groups II-VI. Partial Mantel test revealed that population genetic distance in this region correlated only with geographic distance (rgen-geo = 0.18, P < 0.05; rgen-eco = 0.03, P > 0.05). Thus, the genetic groups recognized in Py-eco2 were shaped by geographical factors rather than by enviromental elements.

Overall, integrating the results of genetic analysis and ecological niche modeling revealed the ecological and phylogeographic processes driving intraspecific genetic divergence within P. yunnanensis. In this species, the observation of continuous genetic differentiation over the majority of its range and discrete isolated local clusters suggests a paleo-landscape that was generally well connected and imposed few migration constraints, but which was subsequently disrupted as a result of geomorphological movements in response to the uplift of the eastern Tibetan Plateau. The genetic distinctiveness of the western and southeastern populations is consistent with niche divergence and geographical isolation of these groups, suggesting that both geographical and environmental factors played a role in their differentiation. In contrast, population structure in the central area of P. yunnanensis was shaped mainly by neutral processes and local geography

30

Genetic Variation and Population Demography of Pinus krempfii In Paper V, we sampled 57 individuals from six natural populations of P. krempfii. Genetic variation of this species was analyzed at ten nuclear genes, 14 mitochondrial mtDNA regions and seven cpSSR loci. Our analysis revealed extremely low levels of nucleotide polymorphism in P. krempfii. For nuclear loci, the mean silent nucleotide diversity in P. krempfii (πs = 0.0015, θws = 0.0020) was 4-10 fold lower than most of previous estimates for other pines (Figure 10). For mtDNA, this species showed no variation across 10 mtDNA regions (~ 10 kbp). For the cpSSR, haplotype diversity (He = 0.911) detected in P. krempfii was comparable with other pines due to the hyper- variable nature of SSR markers. Thus, P. krempfii displays one of the lowest nucleotide diversity among pines reported this far.

Low (5.2%) but significant differentiation was detected among the extant populations of P. krempfii. This level of differentiation is comparable with pine species of wide distribution ranges (e.g. Ma et al. 2006). Apparently, despite relative proximity of individual populations the fragmented nature of P. krempfii distribution has led to certain degree of population differentiation. Unlike most other pines, P. krempfii does not form pure stands. Individual populations consist of small groups of trees or solitary individuals dispersed among dense thicket of other tree species (Nguyen & Thomas 2004). These conditions are likely to limit dispersal of its pollen and seed, and contribute to differentiation between local populations.

We used ABC to infer the demographic history of P. krempfii based on nuclear loci. The ABC model selection approach indicated population expansion in P. krempfii. Population expansion was also supported by a mismatch distribution test based on the cpDNA data. Assuming a generation time of 50 years and mutation rate per year of 7 × 10-10 estimated for the genus Pinus by Willyard et al. (2007), the estimated population size for P. krempfii was 1.43 × 104, and population growth started approximately 49 generations ago. This population expansion history is too short to allow for accumulation of extensive polymorphism. Moreover, the habitat of P. krempfii has deteriorated and become fragmented in the last decades (Nguyen et al. 2004), which could have resulted in the reduction and fragmentation of P. krempfii populations. The reduction in population size and population fragmentation could decrease the frequency of rare alleles in a very short time (Ellstrand & Elam 1993). However, the most recent population decline may not be detectable by the current data and

32

Concluding Remarks This thesis addressed the role of hybridization, geography and climate in the evolution of Pinus species. The geographical distribution of cytoplasmic DNA variations revealed relatively common introgression between pine species during the past range contractions and expansions of the different species, resulting in multiple mtDNA capture events in different geographical regions. Hybridization has generated new biodiversity through homoploid hybrid speciation in the genus. A significant case of hybrid speciation is the origin and establishment of P. densata via multiple hybridization events in the late Miocene, during the uplift of the Tibetan Plateau. The complex genetic structure detected in this species has been shaped by both ancient and contemporary population dynamics. Multiple origins from genetically differentiated parental lineages and varied parental introgression across geographical regions have resulted in the distinct genetic composition of different populations across the species range. During the colonization of the Tibetan Plateau from the ancestral hybrid zone, consecutive bottlenecks and surfing of the rare alleles have caused a significant reduction in genetic diversity and strong population differentiation. Further genetic divergence among geographical regions of this species was induced by regional topographic changes and climate changes from the Pliocene onwards.

Similar to the strong spatial genetic structure observed in P. densata, significant differentiation across complex landscapes was discovered in P. yunnanensis, a species distributed south of the southeast Tibetan Plateau. Both environmental and geographical factors were found to have contributed to genetic divergence in P. yunnanensis. However, their relative importance varied among population groupings across the species’ range. Further to the south in the tropical region of southeast , there is a marked reduction in pine species diversity. This thesis conducted the first population genetic study on endemic, relict P. krempfii in Vietnam, and found extremely low genetic diversity in the species. Further simulations suggested that a small ancestral population size, short-term population expansion and habitat fragmentation were likely responsible for this low genetic diversity.

Overall, the thesis highlights the important role of hybridization in generating novel genetic diversity in the genus Pinus. It also demonstrates the different mechanisms that drive divergence and adaptation in the genus. These results are important for understanding the drivers and tempo of adaptation and divergence in conifers, and advance our knowledge on the origin and maintenance of biological diversity on the Tibetan Plateau and other mountain systems. Future studies on adaptation and speciation should employ functional and population genomic data together with new statistical

34 tools to better characterize and quantify the interplay between selection, hybridization and other evolutionary processes, and their genomic and adaptive consequences.

References Abbott R et al. (2013) Hybridization and speciation. Journal of Evolutionary Biology 26, 229-246. Abbott RJ, Hegarty MJ, Hiscock SJ, Brennan AC (2010) Homoploid hybrid speciation in action. Taxon 59, 1375-1386. Aitken SN, Whitlock MC (2013) Assisted gene flow to facilitate local adaptation to climate change. Annual Review of Ecology, Evolution, and Systematics 44, in press. Alberto FJ et al. (2013) Potential for evolutionary responses to climate change evidence from tree populations. Global Change Biology 19, 1645- 1661. Alvin KL (1960) Further conifers of the Pinaceae from the Wealden Formation of Belgium. Institut Royal des Sciences Naturelles de Belgique Memoires 146, 1-39. Arnold ML (1993) Iris nelsonii (Iridaceae): origin and genetic composition of a homoploid hybrid species. American Journal of Botany 80, 577-583. Beaumont MA, Zhang WY, Balding DJ (2002) Approximate Bayesian computation in population genetics. Genetics 162, 2025-2035. Becquet C, Przeworski M (2009) Learning about modes of speciation by computional approaches. Evolution 63, 2547-2562. Blum MGB, Francois O (2010) Non-linear regression models for Approximate Bayesian computation. Statistics and Computing 20, 63- 73. Buerkle CA, Morris RJ, Asmussen MA, Rieseberg LH (2000) The likelihood of homoploid hybrid speciation. Heredity 84, 441-451. Buerkle CA, Rieseberg LH (2008) The rate of genome stabilization in homoploid hybrid species. Evolution 62, 266-275. Clark MK, Schoenbohm LM, Royden LH, Whipple KX, Burchfiel BC, Zhang X, Tang W, Wang E, Chen L (2004) Surface uplift, tectonics, and erosion of eastern Tibet from large-scale drainage patterns. Tectonics 23. Crisp MD, Cook LG (2009) Explosive radiation or cryptic mass extinction? Interpreting signatures in molecular phylogenies. Evolution 63, 2257- 2265. Currat M, Ruedi M, Petit RJ, Excoffier L (2008) The hidden side of invasions: massive introgression by local genes. Evolution 62, 1908- 1920.

35

Du FK, Petit RJ, Liu JQ (2009) More introgression with less gene flow: chloroplast vs. mitochondrial DNA in the Picea asperata complex in China, and comparison with other Conifers. Molecular Ecology 18, 1396- 1407. Eckert AJ, Hall BD (2006) Phylogeny, historical biogeography, and patterns of diversification for Pinus (Pinaceae): phylogenetic tests of fossil-based hypotheses. Molecular Phylogenetics and Evolution 40, 166-182. Edmonds CA, Lillie AS, Cavalli-Sforza LL (2004) Mutations arising in the wave front of an expanding population. Proceedings of the National Academy of Sciences, USA 101, 975-979. Ellstrand NC, Elam DR (1993) Population genetic consequences of small population size: implications for plant conservation. Annual Review of Ecology and Systematics 24, 217-242. Excoffier L, Foll M, Petit RJ (2009) Genetic consequences of range expansions. Annual Review of Ecology Evolution and Systematics 40, 481-501. Feder JL, Egan SP, Nosil P (2012) The genomics of speciation-with-gene- flow. Trends in Genetics 28, 342-350. Frankis MP (1993) Morphology and affinities of . In: International symposium Pinus brutia (ed. Tashkin O), pp. 11-18. Ministry of forestry, Ankara. Frenzel B, Bräuning A, Adamczyk S (2003) On the problem of possible last- glacial forest-refuge-areas within the deep valleys of eastern Tibet. Erdkunde 57, 182-198. Funk DJ, Nosil P, Etges WJ (2006) Ecological divergence exhibits consistently positive associations with reproductive isolation across disparate taxa. Proceedings of the National Academy of Sciences, USA 103, 3209-3213. Gao LM, Moeller M, Zhang XM, Hollingsworth ML, Liu J, Mill RR, Gibby M, Li DZ (2007) High variation and strong phylogeographic pattern among cpDNA haplotypes in Taxus wallichiana (Taxaceae) in China and north Vietnam. Molecular Ecology 16, 4684-4698. Gérardi S, Jaramillo-Correa JP, Beaulieu J, Bousquet J (2010) From glacial refugia to modern populations: new assemblages of organelle genomes generated by differential cytoplasmic gene flow in transcontinental black spruce. Molecular Ecology 19, 5265-5280. Gernandt DS, Lopez GG, Garcia SO, Liston A (2005) Phylogeny and classification of Pinus. Taxon 54, 29-42. Godbout J, Yeh FC, Bousquet J (2012) Large-scale asymmetric introgression of cytoplasmic DNA reveals Holocene range displacement in a North American boreal pine complex. Ecology and Evolution 2, 1853-1866. Gross BL, Rieseberg LH (2005) The ecological genetics of homoploid hybrid speciation. Journal of Heredity 96, 241-252.

36

Hallatschek O, Nelson DR (2008) Gene surfing in expanding populations. Theoretical Population Biology 73, 158-170. Hamann A, Aitken SN (2013) Conservation planning under climate change: accounting for adaptive potential and migration capacity in species distribution models. Diversity and Distributions 19, 268-280. Hamilton JA, Lexer C, Aitken SN (2013a) Differential introgression reveals candidate genes for selection across a spruce (Picea sitchensis x P. glauca) hybrid zone. New Phytologist 197, 927-938. Hamilton JA, Lexer C, Aitken SN (2013b) Genomic and phenotypic architecture of a spruce hybrid zone (Picea sitchensis x P. glauca). Molecular Ecology 22, 827-841. Harrison TM, Copeland P, Kidd WSF, Yin A (1992) Raising Tibet. Science 255, 1663-1670. Hey J (2005) On the number of New World founders: a population genetic portrait of the peopling of the Americas. Plos Biology 3, 965-975. Hey J (2010) Isolation with migration models for more than two populations. Molecular Biology and Evolution 27, 905-920. Hey J, Nielsen R (2004) Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genetics 167, 747-760. Hey J, Nielsen R (2007) Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proceedings of the National Academy of Sciences, USA 104, 2785-2790. Holliday JA, Suren H, Aitken SN (2012) Divergent selection and heterogeneous migration rates across the range of Sitka spruce (Picea sitchensis). Proceedings of the Royal Society B-Biological Sciences 279, 1675-1683. Ingvarsson PK (2008) Multilocus patterns of nucleotide polymorphism and the demographic history of Populus tremula. Genetics 180, 329-340. Jenkins DG et al. (2010) A meta-analysis of isolation by distance: relic or reference standard for landscape genetics? Ecography 33, 315-320. Klaus W, Ehrendorfer F (1989) Mediterranean pines and their history. Plant Systematics and Evolution 162, 133-163. Klopfstein S, Currat M, Excoffier L (2006) The fate of mutations surfing on the wave of a range expansion. Molecular Biology and Evolution 23, 482-490. Lecomte H (1921) Un pin remarquable de l'Annam, Pinus krempfii. Bulletin du Museum D'Histoire Naturelle, Paris 27, 191-192. Lee CR, Mitchell-Olds T (2011) Quantifying effects of environmental and geographical factors on patterns of genetic differentiation. Molecular Ecology 20, 4631-4642. Lehmkuhl F (1998) Extent and spatial distribution of Pleistocene glaciations in eastern Tibet. Quaternary International 45-6, 123-134.

37

Lehmkuhl F, Owen LA (2005) Late Quaternary glaciation of Tibet and the bordering mountains: a review. Boreas 34, 87-100. Lepais O, Gerber S (2011) Peproductive patterns shape introgression dynamics and species succession within the european white oak species complex. Evolution 65, 156-170. Lepais O, Petit RJ, Guichoux E, Lavabre JE, Alberto F, Kremer A, Gerber S (2009) Species relative abundance and direction of introgression in oaks. Molecular Ecology 18, 2228-2242. Lepais O, Roussel G, Hubert F, Kremer A, Gerber S (2013) Strength and variability of postmating reproductive isolating barriers between four European white oak species. Tree Genetics & Genomes 9, 841-853. Li ZJ, Wang XP (1981) The distribution of Pinus yunnanensis var. tenuifolia in relation to the environmental conditions. Chinese Journal of Plant Ecology 5, 28-37. Liu JQ, Wang YJ, Wang AL, Hideaki O, Abbott RJ (2006) Radiation and diversification within the Ligularia-Cremanthodium-Parasenecio complex (Asteraceae) triggered by uplift of the Qinghai-Tibetan Plateau. Molecular Phylogenetics and Evolution 38, 31-49. Liu ZL, Zhang D, Hong DY, Wang XR (2003) Chromosomal localization of 5S and 18S-5.8S-25S ribosomal DNA sites in five Asian pines using fluorescence in situ hybridization. Theoretical and Applied Genetics 106, 198-204. Ma XF, Szmidt AE, Wang XR (2006) Genetic structure and evolutionary history of a diploid hybrid pine Pinus densata inferred from the nucleotide variation at seven gene loci. Molecular Biology and Evolution 23, 807-816. Mallet J (2007) Hybrid speciation. Nature 446, 279-283. Manel S, Joost S, Epperson BK, Holderegger R, Storfer A, Rosenberg MS, Scribner KT, Bonin A, Fortin MJ (2010) Perspectives on the use of landscape genetics to detect genetic adaptive variation in the field. Molecular Ecology 19, 3760-3772. Mao JF, Li Y, Wang XR (2009) Empirical assessment of the reproductive fitness components of the hybrid pine Pinus densata on the Tibetan Plateau. Evolutionary Ecology 23, 447-462. Mao JF, Wang XR (2011) Distinct niche divergence characterizes the homoploid hybrid speciation of Pinus densata on the Tibetan Plateau. American Naturalist 177, 424-439. McCarthy EM, Asmussen MA, Anderson WW (1995) A theoretical assessment of recombinational speciation. Heredity 74, 502-509. Meirmans PG (2012) The trouble with isolation by distance. Molecular Ecology 21, 2839-2846.

38

Meirmans PG, Goudet J, Gaggiotti OE, IntraBioDiv C (2011) Ecology and life history affect different aspects of the population structure of 27 high- alpine plants. Molecular Ecology 20, 3144-3155. Millar CI (1998) Early evolution of pines. In: Ecology and biogeography of Pinus (ed. Richardson AD), pp. 69-91. Cambridge University Press, Cambridge (UK). Mirov NT (1967) The genus Pinus Ronald Press, New York. Muller HJ (1942) Isolating mechanisms, evolution and temperature. Biology Symposium 6, 71-125. Naydenov K, Senneville S, Beaulieu J, Tremblay F, Bousquet J (2007) Glacial vicariance in Eurasia: mitochondrial DNA evidence from Scots pine for a complex heritage involving genetically distinct refugia at mid-northern latitudes and in Asia Minor. Bmc Evolutionary Biology 7. Nguyen TH, Phan KL, Nguyen DTL, Thomas P, Farjon A, Averyanov L, Regalado J (2004) Vietnam Conifers: Conservation Status Review Fauna & Flora International, Vietnam Programme, Hanoi. Nielsen R, Wakeley J (2001) Distinguishing migration from isolation: a Markov chain Monte Carlo approach. Genetics 158, 885-896. Nolte AW, Tautz D (2010) Understanding the onset of hybrid speciation. Trends in Genetics 26, 54-58. Nosil P (2012) Ecological Speciation Oxford University Press, New York. Nosil P, Feder JL (2012) Genomic divergence during speciation: causes and consequences. Philosophical Transactions of the Royal Society B- Biological Sciences 367, 332-342. Parks M, Cronn R, Liston A (2009) Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes. Bmc Biology 7, 84. Parks M, Cronn R, Liston A (2012) Separating the wheat from the chaff: mitigating the effects of noise in a plastome phylogenomic data set from Pinus L. (Pinaceae). Bmc Evolutionary Biology 12, 100. Petit RJ, Duminil J, Fineschi S, Hampe A, Salvini D, Vendramin GG (2005) Comparative organization of chloroplast, mitochondrial and nuclear diversity in plant populations. Molecular Ecology 14, 689-701. Petit RJ, Excoffier L (2009) Gene flow and species delimitation. Trends in Ecology & Evolution 24, 386-393. Pinho C, Hey J (2010) Divergence with gene flow: models and data. Annual Review of Ecology, Evolution, and Systematics 41, 215-230. Price RA, Liston A, Strauss SH (1998) Phylogeny and systematics of Pinus. In: Ecology and Biogeography of Pinus (ed. Richardson DM), pp. 49-68. Cambridge University Press, Cambridge. Rahme J, Widmer A, Karrenberg S (2009) Pollen competition as an asymmetric reproductive barrier between two closely related Silene species. Journal of Evolutionary Biology 22, 1937-1943.

39

Rieseberg LH (1997) Hybrid origins of plant species. Annual Review of Ecology and Systematics 28, 359-389. Rieseberg LH (2006) Hybrid speciation in wild sunflowers. Annals of the Missouri 93, 34-48. Rieseberg LH, Sinervo B, Linder CR, Ungerer MC, Arias DM (1996) Role of gene interactions in hybrid speciation: evidence from ancient and experimental hybrids. Science 272, 741-745. Rieseberg LH, Soltis DE (1991) Phylogenetic consequences of cytoplasmic gene flow in plants. Evolutionary Trends in Plants 5, 65-84. Scascitelli M, Whitney KD, Randell RA, King M, Buerkle CA, Rieseberg LH (2010) Genome scan of hybridizing sunflowers from Texas (Helianthus annuus and H. debilis) reveals asymmetric patterns of introgression and small islands of genomic differentiation. Molecular Ecology 19, 521-541. Schluter D (2009) Evidence for ecological speciation and its alternative. Science 323, 737-741. Senjo M, Kimura K, Watano Y, Ueda K, Shimizu T (1999) Extensive mitochondrial introgression from Pinus pumila to P. parviflora var. Pentaphylla (Pinaceae). Journal of Plant Research 112, 97-105. Shafer ABA, Wolf JBW (2013) Widespread evidence for incipient ecological speciation: a meta-analysis of isolation-by-ecology. Ecology Letters 16, 940-950. Shi Y (2002) Characteristics of late Quaternary monsoonal glaciation on the Tibetan Plateau and in East Asia. Quaternary International 97-8, 79-91. Soltis PS, Soltis DE (2009) The role of hybridization in plant speciation. Annual Review of Plant Biology 60, 561-588. Song BH, Wang XQ, Wang XR, Ding KY, Hong DY (2003) Cytoplasmic composition in Pinus densata and population establishment of the diploid hybrid pine. Molecular Ecology 12, 2995-3001. Sork VL, Aitken SN, Dyer RJ, Eckert AJ, Legendre P, Neale DB (2013) Putting the landscape into the genomics of trees: approaches for understanding local adaptation and population responses to changing climate. Tree Genetics & Genomes 9, 901-911. Storfer A, Murphy MA, Evans JS, Goldberg CS, Robinson S, Spear SF, Dezzani R, Delmelle E, Vierling L, Waits LP (2007) Putting the 'landscape' in landscape genetics. Heredity 98, 128-142. Strasburg JL, Rieseberg LH (2010) How robust are "isolation with migration" analyses to violations of the IM model? A simulation study. Molecular Biology and Evolution 27, 297-310. Strasburg JL, Rieseberg LH (2011) Interpreting the estimated timing of migration events between hybridizing species. Molecular Ecology 20, 2353-2366.

40

Syring J, Farrell K, Businsky R, Cronn R, Liston A (2007) Widespread genealogical nonmonophyly in species of Pinus subgenus Strobus. Systematic Biology 56, 163-181. Syring J, Willyard A, Cronn R, Liston A (2005) Evolutionary relationships among Pinus (Pinaceae) subsections inferred from multiple low-copy nuclear loci. American Journal of Botany 92, 2086-2100. Szmidt AE, Wang XR, Changtragoon S (1996) Contrasting patterns of genetic diversity in two tropical pines: Pinus kesiya (Royle ex Gordon) and P.merkusii (Jungh et De Vriese). Theoretical and Applied Genetics 92, 436-441. Thibert-Plante X, Hendry AP (2010) When can ecological speciation be detected with neutral loci? Molecular Ecology 19, 2301-2314. Tsutsui K, Suwa A, Sawada K, Kato T, Ohsawa TA, Watano Y (2009) Incongruence among mitochondrial, chloroplast and nuclear gene trees in Pinus subgenus Strobus (Pinaceae). Journal of Plant Research 122, 509-521. Wang XR, Szmidt AE (1993) Chloroplast DNA-based phylogeny of Asian Pinus species (Pinaceae). Plant Systematics and Evolution 188, 197-211. Wang XR, Szmidt AE (1994) Hybridization and chloroplast DNA variation in a Pinus species complex from Asia. Evolution 48, 1020-1031. Wang XR, Szmidt AE, Nguyen HN (2000) The phylogenetic position of the endemic flat-needle pine Pinus krempfii (Pinaceae) from Vietnam, based on PCR-RFLP analysis of chloroplast DNA. Plant Systematics and Evolution 220, 21-36. Wang XR, Szmidt AE, Savolainen O (2001) Genetic composition and diploid hybrid speciation of a high mountain pine, Pinus densata, native to the Tibetan Plateau. Genetics 159, 337-346. Wang XR, Tsumura Y, Yoshimaru H, Nagasaka K, Szmidt AE (1999) Phylogenetic relationships of Eurasian pines (Pinus, Pinaceae) based on chloroplast rbcL, matK, rpl20-rps18 spacer, and trnV intron sequences. American Journal of Botany 86, 1742-1753. Whitney KD, Randell RA, Rieseberg LH (2006) Adaptive introgression of herbivore resistance traits in the weedy sunflower Helianthus annuus. American Naturalist 167, 794-807. Willyard A, Cronn R, Liston A (2009) Reticulate evolution and incomplete lineage sorting among the ponderosa pines. Molecular Phylogenetics and Evolution 52, 498-511. Willyard A, Syring J, Gernandt DS, Liston A, Cronn R (2007) Fossil calibration of molecular divergence infers a moderate mutation rate and recent radiations for Pinus. Molecular Biology and Evolution 24, 90-101. Wright S (1943) Isolation by distance. Genetics 28, 114. Xing Y, Liu YS, Su T, Jacques FMB, Zhou Z (2010) Pinus prekesiya sp. nov. from the upper Miocene of Yunnan, southwestern China and its

41

biogeographical implications. Review of Palaeobotany and Palynology 160, 1-9. Yao YF, Bruch AA, Cheng YM, Mosbrugger V, Wang YF, Li CS (2012) Monsoon versus uplift in southwestern China-Late Pliocene climate in Yuanmou Basin, Yunnan. Plos One 7, e37760. Yu H, Zheng SH, Huang RF (1998) Polymorphism of male cones in populations of Pinus yunnanensis Franch. Biodiversity Science 6, 267- 271 Yuan QJ, Zhang ZY, Peng H, Ge S (2008) Chloroplast phylogeography of Dipentodon (Dipentodontaceae) in southwest China and northern Vietnam. Molecular Ecology 17, 1054-1065. Yue LL, Chen G, Sun WB, Sun H (2012) Phylogeography of Buddleja crispa (Buddlejaceae) and its correlation with drainage system evolution in southwestern China. American Journal of Botany 99, 1726-1735. Zhang TC, Comes HP, Sun H (2011) Chloroplast phylogeography of Terminalia franchetii (Combretaceae) from the eastern Sino-Himalayan region and its correlation with historical river capture events. Molecular Phylogenetics and Evolution 60, 1-12. Zheng BX, Rutter N (1998) On the problem of quaternary glaciations, and the extent and patterns of Pleistocene ice cover in the Qinghai-Xizang (Tibet) Plateau. Quaternary International 45-6, 109-122. Zhisheng A, Kutzbach JE, Prell WL, Porter SC (2001) Evolution of Asian monsoons and phased uplift of the Himalayan Tibetan Plateau since Late Miocene times. Nature 411, 62-66.

42

Acknowledgments

This thesis and the papers included in it cover all research work that I have done during my PhD study. I would not have been able to achieve this without the help of many people and I would like to give my sincere appreciation here.

First of all, I will take the opportunity to thank my supervisor Xiao-Ru Wang for all her support and encouragement. Four years ago, she gave me the chance to join the group. She has introduced population genetics and evolutionary biology to me and has given me enormous help and guidance, especially for manuscript preparation during the past years. Her wisdom, preciseness and patience deserve my highest respect.

Many thanks to my co-supervisor Pelle Ingvarsson for his brilliant work in simulation modelling and discussion.

I want to thank my collaborators Jian-Feng Mao, Jie Gao and Wei Zhao for sampling, lab work and data analyses. Special thanks to Alan Hudson and Tomas Funda for reading through the thesis, polishing the writing and pointing out possible improvements. Thanks David Hall for his elegant translation of “Svenska sammanfattning” in this thesis. Thanks to Jin Pan for valuable comments and suggestion on illustrations of this thesis.

Thanks to Folmer, Barbara, Stacey, Carolina, Jing, Biyue, Amaryllis, Amanda and all others for scientific discussion in the journal club. I would like to thank my follow-up group members for their advices on my PhD study.

Finally, thanks my family for your love and support. I would especially like to thank my wife Hanhan. Thank you for accompanying me everyday.

43

Table S1 Descriptions of the ten investigated nuclear loci. Annealing Alignement o Locus PCR primers (5'-3') temperature ( C) length* (bp) Exon Intron IFG 8612 F: GAGTGCAAATAGCTACTCTCAGG 52 966 811-966 1-810 R: CAACAATAGGAATGTGCATGGTAAGG IF: CAACTGCCCTGTCAATAGGTC IR: GACCTATTGACAGGGCAGTTG CFX F: ATCCAAGGGCCAGTGTATGT 52 1102 1-203; 1067-1102 204-1066 R: CAAACGACTTCTCAAAGCTTCTCT IF: TATTGAGGAGGATATACCTG IR: GTAAGAGCTTCAACTACAGA 4CL F: CAGAGGTGGAACTGATTTC 52 1089 1-808; 1085-1089 809-1084 R: CTGCTTCTGTCATGCCGTA IF: CATCTATTCTCTCAATTCCG IR: TTTGTGATGTCCAGGACAAT TPP 1 F: AAAGAGCTGAGAAAGGAGAAAC 60 1195 1-161; 369-470; 784-978 162-368; 471-783; 979-1195 R: CCTCATACCACTGCAATAACTC IF: CTATACCTGAATTGGTACAG IR: GTCATGCAAGGAACTTAATC IFG 1934 F: CTACTGCTGCAACAATGGCAA 52 765 1-765 - R: TAGGCCCAGGCGTTGTTGTT SOS 27 F: AGAAACGTCTTAAGGCTGCTG 60 1252 1-9; 146-261; 337-513 10-145; 262-336; 514-1252 R: TGGTTGAAATTGTGATCCCTT Rav 2 F: GTTGCAGGAATTGATTGTGTG 60 752 38-661 1-37; 662-752 R: AATGGCAGAGACCAAGTTCTG GSTG 1 F: GCAACCAAGGAACAAAAGC 55 815 1-204; 471-815 205-470 R: TCGAATCTAACAAAGGCTCAC GSTH 2 F: ATGTCAGACCTCGAAGTTTTTG 55 494 1-48; 163-242; 441-494 49-162; 243-440 R: ATTTTATCAGAGTCTTGGATCCAT TPS 4 F: CATGGGTCAATTAGAATCTGTG 55 520 1-520 R: GATTTACCTTGATAGCACCACTTA Total 8950 4608 4342 * without ourgroup IF and IR, internal primers for sequencing Table S2 Primers used for mtDNA and cpDNA amplification. Primers generated polymorphic cpSSR sites are in bold. Annealing Sequence Length o Primer name Sequence (5´-3´) temperature ( C) (bp) Reference mtDNA 18S rRNA rrn 18-F GGGGTGCTTTCTTGACCATTTC 57 964 This study rrn 18-R CCCTACGGCTACCTTGTTACGACT matR matR -F GACCGACATCGACTCATCTCAA 57 1128 This study matR -R CATAATACGCCCACGCTCACT nad 3-rps 12 nad3/rps 12-F GTTCGCTAGTTTGTTTGATCCC 54 673 (Soranzo et al. 1999) nad3/rps 12-R TCCCAGCAAATCCTTGACTC nad 1 intron 2 nad1-2F CGCCCGTTTCCATTTCGTG 57 1510 This study nad1-2R GTGGCTCGTCCGTGCTTTG nad 5 intron 4 nad5-4F GGGGAGGTTTACAGGAGAT 54 574 This study nad5-4R AGTAGTGCCAGCAGGAATG cox 1 cox1-F TGGGAATCATCAACCTCA 55 1170 This study cox1-R GACCACGAAGAAACGAAA mh 09 mh09-F TCATCCATCCTCCAGCAACA 55 189 (Jeandroz et al. 2002) mh09-R TCATCCCCAGAAAGAGACAG mh 05 mh05-F GGGAGTCAGCGAAAGAAGTAA 52 Failed amplification (Jeandroz et al. 2002) mh05-R AGTCTCAGAGCCAGAAGCAG mh 09´ mh09´-F CCATCCAGCCATGTCTCATC 52 Failed amplification (Jeandroz et al. 2002) mh09´-R AGGGCTTCACATAGAGCATC mh 33 mh33-F TTCCCCAGACAGAACAGATAG 52 Failed amplification (Jeandroz et al. 2002) mh33-R GCTCTTAAGTGCTGGTTGATG mh 35 mh35-F CGATGACATCTCTTAGCTTCC 52 Failed amplification (Jeandroz et al. 2002) mh35-R TGGGGAATAGGATTCGGGTAA nad 5 intron 1 nad5-1F GGAAATGTTTGATGCTTCTTGGG 55 1227 (Wang et al. 2000) nad5-1R CTGATCCAAAATCACCTACTCG nad 4 intron 3 nad4-3F GGAGCTTTCCAAAGAAATAG 55 1939 (Dumolin-Lapegue et al. 1997) nad4-3R GCCATGTTGCACTAAGTTAC nad 7 intron 1 nad7-1F GGAACCGCATATTGGATCAC 55 1400 (Tian et al. 2010) nad7-1R GTTGTACCGTAAACCTGCTC Total mtDNA 10774 Table S2 continued Annealing Sequence Length o Primer name Sequence (5´-3´) temperature ( C) (bp) Reference cpDNA For genotyping Pt45002 PT45002-F AAGTTGGATTTTACCCAGGTG 52 ca. 184 (Vendramin et al. 1996) PT45002-R2 AACCCCAAGAACAAGAGGAT Pt71936 PT71936-F TTCATTGGAAATACACTAGCCC 52 ca. 139 (Vendramin et al. 1996) PT71936-R AAAACCGTACATGAGATTCCC Pt87268 PT87268-F GCCAGGGAAAATCGTAGG 52 ca. 147 (Vendramin et al. 1996) PT87268-R AGACGATTAGACATCCAACCC Pt30204 PT30204-F TCATAGCGGAAGATCCTCTTT 52 ca. 152 (Vendramin et al. 1996) PT30204-R CGGATTGATCCTAACCATACC PCP1289 PCP1289-F TCCTGGTTCCAGAAATGGAG 52 Failed amplification (Provan et al. 1998) PCP1289-R TAATTTGGTTCCAGAATTGCG PCP41131 PCP41131-F AAAGCATTTCCAGTTGGGG 52 113 (Provan et al. 1998) PCP41131-R GGTCAGGATTCATGTTCTTCC For sequencing Pt15169 PTS15169-F CCCCTTGTAAAGATGAAC 50 637 This study PTS15169-R AATGGCTTGGTGGTATGT Pt63718 PTS63718-F GATCTGGATATTTCCTCCGATAC 50 ca. 806 This study PTS63718-R AAGCGTTGGTTGGGTAAGC Pt100783 PTS100783-F AAAACAATAGCGTCCTCC 50 ca. 1017 This study PTS100783-R CCGAATATGGTCCAATCA Pt26081 PTS26081-F ATTCAGGGATTGTAAACG 50 780 This study PTS26081-R CGAGATGATAAGGGAACTA Pt36480 PTS36480-F TGAAAGGGAGTAAGATAAATGG 50 833 This study PTS36480-R GAGATAGATCGCAGGGAGG PKS108222* PKS108222-F ACCCTGAGATCGTTTGA 50 ca. 664 This study PKS108222-R TCCATTGCTATGCGTTT * Two cp SSR loci were included in this region Reference: Dumolin-Lapegue S, Pemonge MH, Petit RJ (1997) An enlarged set of consensus primers for the study of organelle DNA in plants. Mol Ecol 6, 393-397. Jeandroz S, Bastien D, Chandelier A, Du Jardin P, Favre JM (2002) A set of primers for amplification of mitochondrial DNA in Picea abies and other conifer species. Mol Ecol Notes 2, 389-392. Provan J et al. (1998) Gene-pool variation in Caledonian and European Scots pine (Pinus sylvestris L.) revealed by chloroplast simple-sequence repeats. Proc Roy Soc Lond B Biol Sci 265, 1697-1705. Soranzo N, Provan J, Powell W (1999) An example of microsatellite length variation in the mitochondrial genome of conifers. Genome 42, 158-161. Tian S, Lopez-Pujol J, Wang HW, Ge S, Zhang ZY (2010) Molecular evidence for glacial expansion and interglacial retreat during Quaternary climatic changes in a montane temperate pine (Pinus kwangtungensis Chun ex Tsiang) in southern China. Plant Syst Evol 284, 219-229. Wang XQ, Tank DC, Sang T (2000) Phylogeny and divergence times in Pinaceae: Evidence from three genomes. Mol Biol Evol 17, 773-781. Vendramin GG, Lelli L, Rossi P, Morgante M (1996) A set of primers for the amplification of 20 chloroplast microsatellites in Pinaceae. Mol Ecol 5, 595-598. Table S3 Prior distribution of the demographic parameters for standard neutral model (N), exponential growth model (G) and bottleneck model (B).

Model %-N 1 T 0 T d N 10-5 - 0.05 10-5 - 0.1 na na na G 10-5 - 0.05 10-5 - 0.1 10-3 - 1 10-5 - 1 na -5 -5 -3 -5 -4 -1 B 10 - 0.05 10 - 0.1 10 - 1 10 - 1 10 - 10 % - , per site nucleotide polymorphism; , per site recombination rate; N 1 , ancestral population size in exponential growth model or population size during bottleneck

in bottleneck model; T 0 , the time of the initial size change in growth model or the time since the end of bottleneck in bottleneck model; T d , duration of bottleneck.

N 1 are measured in units of current population size (N 0 ); T0 and T d are measured in units of 4N 0 generation. % - % -5 , , N A , N 1 , T 0 and T d are given by an uniform with minimum and maximum, e.g. for all simulations, the distribution of is an uniform range from 10 to 0.05. na, no paremeter in such model. % % 4 Table S4 Geographic location, sample sizes (N ), the number of segregating sites (S ), nucleotide polymorphism ( w , total sites; ws , silent sites), nucleotide diversity ( t , total sites; 4 4 s , silent sites; a , nonsynonymous), number of haplotypes (n h ) of the six populations of Pinus krempfii . 4CL CFX o o % % 4 4 4 % % 4 4 4 Populations Longitude ( E) Latitude ( N) NS n h w ws t s a S n h w ws t s a Da Chay 89A 108.6843 12.1758 9 0 1 00000 960.0024 0.0029 0.0030 0.0036 0 Da Chay 90A 108.7015 12.1756 13 5 4 0.0012 0.0022 0.0005 0.0009 0.0001 13 11 0.0031 0.0037 0.0037 0.0045 0 Da Chay 91B 108.6893 12.1938 12 4 5 0.0010 0.0017 0.0005 0.0009 0.0001 9 7 0.0022 0.0026 0.0025 0.0030 0 Total Da Chay 34 6 6 0.0012 0.0017 0.0003 0.0007 0.0001 13 14 0.0025 0.0030 0.0033 0.0039 0 Cong Troi 102 108.4095 12.091 3 0 1 0 0 0 0 0 3 3 0.0012 0.0014 0.0014 0.0017 0 Cong Troi 103 108.4667 11.9488 10 1 2 0.0003 0 0.0002 0 0.0003 11 12 0.0028 0.0034 0.0025 0.0030 0 Total Cong Troi 13 1 2 0.0002 0 0.0001 0 0.0002 12 14 0.0029 0.0034 0.0024 0.0029 0 Bidoup 108.6854 12.0475 10 2 3 0.0005 0.0012 0.0004 0.0009 0 10 9 0.0026 0.0031 0.0038 0.0045 0 Total species 57 7 7 0.0012 0.0016 0.0003 0.0006 0.0001 16 23 0.0027 0.0033 0.0033 0.0039 0

Table S4 continued GSTG 1 GSTH 2 o o % % 4 4 4 % % 4 4 4 Populations Longitude ( E) Latitude ( N) NS n h w ws t s a S n h w ws t s a Da Chay 89A 108.6843 12.1758 9 3 3 0.0011 0.0015 0.0004 0.0006 0.0003 6 6 0.0035 0.0033 0.0051 0.0050 0.0056 Da Chay 90A 108.7015 12.1756 13 3 3 0.0010 0.0014 0.0009 0.0007 0.0010 7 7 0.0037 0.0037 0.0031 0.0028 0.0042 Da Chay 91B 108.6893 12.1938 12 3 4 0.0010 0.0028 0.0007 0.0019 0.0010 8 10 0.0043 0.0045 0.0044 0.0045 0.0044 Total Da Chay 34 7 7 0.0018 0.0033 0.0007 0.0010 0.0005 8 10 0.0034 0.0035 0.0043 0.0041 0.0047 Cong Troi 102 108.4095 12.091 3 0 1 00000 450.0035 0.0037 0.0036 0.0035 0.0039 Cong Troi 103 108.4667 11.9488 10 3 3 0.0010 0.0015 0.0012 0.0015 0.0009 7 7 0.0040 0.0048 0.0059 0.0074 0.0019 Total Cong Troi 13 3 4 0.0010 0.0014 0.0012 0.0012 0.0010 7 7 0.0037 0.0044 0.0054 0.0067 0.0024 Bidoup 108.6854 12.0475 10 3 4 0.0010 0.0015 0.0005 0.0008 0.0002 6 9 0.0034 0.0040 0.0041 0.0043 0.0037 Total species 57 8 10 0.0019 0.0035 0.0008 0.0011 0.0006 8 15 0.0030 0.0032 0.0046 0.0048 0.0040

Table S4 continued IFG 1934 IFG 8612 o o % % 4 4 4 % % 4 4 4 Populations Longitude ( E) Latitude ( N) NS n h w ws t s a S n h w ws t s a Da Chay 89A 108.6843 12.1758 9 2 3 0.0008 0.0031 0.0008 0.0033 0 4 5 0.0012 0.0014 0.0011 0.0013 0 Da Chay 90A 108.7015 12.1756 13 2 3 0.0007 0.0028 0.0008 0.0031 0 4 5 0.0011 0.0013 0.0012 0.0013 0 Da Chay 91B 108.6893 12.1938 12 2 4 0.0007 0.0028 0.0005 0.0020 0 4 5 0.0011 0.0013 0.0010 0.0011 0 Total Da Chay 34 2 4 0.0006 0.0022 0.0007 0.0028 0 8 9 0.0018 0.0020 0.0010 0.0013 0 Cong Troi 102 108.4095 12.091 3 1 2 0.0006 0.0023 0.0004 0.0018 0 1 2 0.0004 0.0005 0.0003 0.0004 0 Cong Troi 103 108.4667 11.9488 10 4 5 0.0015 0.0045 0.0009 0.0025 0.0003 4 4 0.0012 0.0010 0.0013 0.0013 0.0008 Total Cong Troi 13 4 5 0.0014 0.0042 0.0008 0.0023 0.0003 4 4 0.0011 0.0010 0.0012 0.0012 0.0007 Bidoup 108.6854 12.0475 10 2 4 0.0007 0.0030 0.0009 0.0035 0 7 8 0.0021 0.0024 0.0015 0.0018 0 Total species 57 4 6 0.0010 0.0030 0.0007 0.0028 0.0001 11 11 0.0022 0.0023 0.0012 0.0013 0.0001 Table S4 continued Rav 2 SOS 27 o o % % 4 4 4 % % 4 4 4 Populations Longitude ( E) Latitude ( N) NS n h w ws t s a S n h w ws t s a Da Chay 89A 108.6843 12.1758 9 1 2 0.0004 0 0.0006 0 0.0010 1 3 0.0002 0.0001 0.0002 0.0002 0 Da Chay 90A 108.7015 12.1756 13 1 2 0.0003 0 0.0004 0 0.0007 0 1 00000 Da Chay 91B 108.6893 12.1938 12 1 2 0.0004 0 0.0005 0 0.0007 2 3 0.0004 0.0001 0.0001 0.0002 0 Total Da Chay 34 1 2 0.0003 0 0.0005 0 0.0008 3 5 0.0005 0.0001 0.0001 0.0001 0 Cong Troi 102 108.4095 12.091 3 1 2 0.0006 0.0016 0.0004 0.0012 0 3 3 0.0011 0.0013 0.0013 0.0016 0 Cong Troi 103 108.4667 11.9488 10 1 2 0.0004 0 0.0007 0 0.0011 0 1 00000 Total Cong Troi 13 2 3 0.0007 0.0009 0.0007 0.0003 0.0010 3 3 0.0006 0.0008 0.0004 0.0004 0 Bidoup 108.6854 12.0475 10 3 4 0.0011 0.0010 0.0008 0.0004 0.0011 0 1 0 0 0 0 0 Total species 57 3 4 0.0007 0.0007 0.0006 0.0001 0.0009 5 6 0.0008 0.0011 0.0001 0.0002 0

Table S4 continued TPP 1 TPS 4 o o % % 4 4 4 % % 4 4 4 Populations Longitude ( E) Latitude ( N) NS n h w ws t s a S n h w ws t s a Da Chay 89A 108.6843 12.1758 9 0 1 00000 120.0006 0 0.0004 0 0.0005 Da Chay 90A 108.7015 12.1756 13 1 2 0.0002 0 0.0001 0 0.0002 2 3 0.0010 0.0022 0.0004 0.0006 0.0004 Da Chay 91B 108.6893 12.1938 12 1 2 0.0002 0 0.0001 0 0.0002 3 4 0.0015 0.0044 0.0017 0.0031 0.0006 Total Da Chay 34 1 2 0.0002 0 0.0001 0 0.0002 3 4 0.0012 0.0034 0.0007 0.0014 0.0005 Cong Troi 102 108.4095 12.091 3 0 1 0 0000 230.0017 0.0036 0.0013 0.0027 0.0008 Cong Troi 103 108.4667 11.9488 10 0 1 00000 120.0005 0 0.0005 0 0.0002 Total Cong Troi 130100000 2 3 0.0010 0.0022 0.0007 0.0006 0.0007 Bidoup 108.6854 12.0475 10 1 2 0.0002 0.0003 0.0001 0.0001 0 1 2 0.0005 0 0.0002 0 0.0002 Total species 57 2 3 0.0003 0.0002 0 0 0.0001 3 4 0.0011 0.0031 0.0006 0.0010 0.0005 TableS5 Neutrality test at each locus as measured by Tajima's D , Fu and Li’s D* and F* , Fu's Fs , standardized Fay and Wu’s H , and the MK test. 4CL CFX MK test MK test Populations Tajima's DD*F* Fs H Fisher G test Tajima's DD*F* Fs H Fisher G test Da Chay 90A -1.7092 -2.2413 -2.4209 -1.2400 0.4298 0.5846 0.7960 0.7132 -0.1923 0.0974 -1.4480 0.5445 na na Da Chay 91B -1.4953 -0.8560 -1.1998 -2.8420 -1.5134 1.0000 0.1440 0.4573 1.3732* 1.2829 0.1310 -0.8995 na na Da Chay 89A na na na na na na na 0.9175 1.3827* 1.4451 0.9520 -0.9901 na na Cong Troi 102 na na na na na na na 1.1241 1.3958 1.4062 0.6150 -0.8514 na na Cong Troi 103 -0.5916 0.6495 0.3673 -0.0970 0.3992 1.0000 0 -0.3438 0.1161 -0.0191 -5.6180 -0.4155 na na Bidoup -0.5278 -0.5935 -0.6606 -0.4640 0.5485 0.4965 na 1.6614 1.4100* 1.7172* -0.6530 0.6158 na na Total -1.7067 -1.4859 -1.8489 -5.3330 -1.4131 1.0000 0.1420 0.5462 0.0215 0.2579 -6.2870 0.1422 na na

TableS5 continued GSTG 1 GSTH 2 MK test MK test Populations Tajima's DD*F* Fs H Fisher G test Tajima's DD*F* Fs H Fisher G test Da Chay 90A -0.3026 0.9684 0.7035 0.6200 0.7313 0.5455 0.7540 -0.4630 -0.7341 -0.7607 -1.4830 0.0518 na na Da Chay 91B -0.8023 0.9796 0.5541 -1.1950 0.4680 1.0000 na 0.0799 0.1405 0.1426 -3.4370 0.6814 na na Da Chay 89A -1.7130 -2.3015 -2.4579 -1.0250 0.3137 1.0000 0.1030 1.4693 1.2590 1.5168 0.1800 0.7805 na na Cong Troi 102 -0.7093 -0.3748 -0.4688 0.0200 0.8514 1.0000 0.0170 0.1491 0.0713 0.0931 -2.3440 0.7020 na na Cong Troi 103 0.4212 1.0065 0.9732 1.1730 0.8311 1.0000 0.1030 1.5534 1.2998 1.5857 -0.0860 1.0810 na na Bidoup -1.4407 -1.2550 -1.5017 -2.1350 0.4376 1.0000 0.1030 0.6711 0.5473 0.6723 -3.2180 0.9488 na na Total -1.5137 -0.9882 -1.3912 -6.3180 0.3946 1.0000 0.0080 1.1898 1.2501 1.4582 -3.4510 0.9785 na na

TableS5 continued IFG 1934 IFG 8612 MK test MK test Populations Tajima's DD*F* Fs H Fisher G test Tajima's DD*F* Fs H Fisher G test Da Chay 90A 0.2724 0.8256 0.7742 0.2990 -0.6822 0.14708 na 0.1359 0.0889 0.1185 -0.5400 0.7973 na na Da Chay 91B -0.6072 0.8373 0.5053 -1.9790 -1.9773 0.14708 na -0.3204 -0.8560 -0.8140 -1.0310 0.6068 na na Da Chay 89A 0.2204 0.8846 0.8103 0.1610 -0.7743 0.14708 na -0.3022 -0.7011 -0.6807 -1.1430 0.4422 na na Cong Troi 102 -0.9330 -0.9502 -0.9647 -0.0030 0.5518 0.27273 na -0.9330 -0.9502 -0.9647 -0.0030 -2.7588 na na Cong Troi 103 -1.2057 -0.7593 -1.0171 -2.2180 -0.9029 0.36230 1.5450 0.1862 -0.7593 -0.5732 0.4210 0.5099 na na Bidoup 0.4354 0.8662 0.8605 -0.9170 -0.3823 0.14708 na -0.8673 0.6766 -0.6517 -1.9150 0.6827 na na Total -0.4747 -0.2934 -0.4147 -2.0280 -0.8105 0.36230 1.5450 -1.1816 -1.9712 -2.0113 -4.2090 0.4113 na na TableS5 continued Rav 2 SOS 27 MK test MK test Populations Tajima's DD*F* Fs H Fisher G test Tajima's DD*F* Fs H Fisher G test Da Chay 90A 0.3809 0.6121 0.6308 0.7840 0.6011 1.0000 na na na na na na na na Da Chay 91B 0.4803 0.6227 0.6704 0.8470 0.6138 1.0000 na -1.5147 -2.1591 -2.2809 -2.0780 0.2574 na na Da Chay 89A 1.1662 0.6669 0.9102 1.2150 0.5509 1.0000 na -1.5352 -1.9890 -2.1385 -1.7960 0.3260 na na Cong Troi 102 -0.9330 -0.9502 -0.9647 -0.0030 0.5518 na na 1.1241 1.3958 1.4062 0.6150 0.8514 na na Cong Troi 103 1.1513 0.6495 1.0100 1.4670 0.2245 1.0000 na na na na na na na na Bidoup -0.6428 -1.2550 -1.2505 -1.0060 0.5405 0.4667 na na na na na na na na Total -0.3515 -0.6153 -0.6244 -0.5640 0.4521 0.4667 na -1.8627 -1.8098 -2.1612 -6.4980 0.1623 na na

TableS5 continued TPP 1 TPS 4 MK test MK test Populations Tajima's DD*F* Fs H Fisher G test Tajima's DD*F* Fs H Fisher G test Da Chay 90A -1.1556 -1.6338 -1.7271 -1.0940 0.1803 0.4000 na -1.2239 -0.6891 -0.9657 -1.4990 0.3411 1.0000 0.1380 Da Chay 91B -1.1593 -1.6058 -1.7042 -1.0280 0.1929 0.4000 na -0.6322 -0.1889 -0.3614 -0.9820 0.6348 1.0000 0.6800 Da Chay 89A na na na na na na na -0.5290 0.6669 0.4045 -0.0110 0.4285 1.0000 na Cong Troi 102 na na na na na na na -1.1320 -1.1553 -1.1951 -0.8580 0.7341 1.0000 0.1380 Cong Troi 103 na na na na na na na -0.0861 0.6495 0.5203 0.3810 0.5239 1.0000 na Bidoup -1.1644 -1.5396 -1.6477 -0.8790 0.2245 0.4000 na -1.6439 -1.5396 -1.6477 -0.8790 0.2245 1.0000 na Total -1.2920 -1.1017 -1.3595 -3.4760 0.0987 0.4000 na -0.7913 0.8159 0.3589 -1.4500 0.4252 1.0000 0.6800 *P<0.05; na , not caculated due to low polymorphism. Table S6 Population differentiation (F ST ) with each region and total range of Pinus krempfii . No. of Locus Regions populations IFG 8612 CFX 4CL TPP 1 IFG 1934 SOS 27 Rav 2 GSTG 1 GSTH 2 TPS 4 Multilocus Group Da Chay 3 -0.016 0.074* <0.001 -0.029 -0.005 0.023* -0.016 0.051* 0.039 0.016 0.038* Group Cong Troi 2 0.149 0.071 -0.043 NA -0.027 0.471** 0.270* -0.076 -0.001 -0.029 0.078* Total range 6 0,013 0,085** 0,018 -0.019 0,002 0,204** 0,026 0.048* 0,043* 0,005 0.052** *P<0.05; **P<0.01. Table S7 Sampling, data and references for the studies of nucleotide diversity in pines.

4 Species No. of loci N πt πs θw θws Ne × 10 * Reference

Pinus krempfii 10 57 0,0011 0.0015 0,0014 0,0020 1.39 This study P. tabuliformis 7 43 0,0085 0.0119 0,0107 0,0153 10.93 (Ma et al. 2006) 8 48 0,0070 0.0150 0,0092 0,0173 12.37 (Gao et al. 2012) P. yunnanensis 7 29 0,0067 0.0095 0,0055 0,0077 5.50 (Ma et al. 2006) 8 40 0,0046 0.0101 0,0060 0,0101 7.21 (Gao et al. 2012) P. densata 7 66 0,0086 0.0122 0,0101 0,0143 10.21 (Ma et al. 2006) 8 136 0,0065 0.0138 0,0098 0,0153 10,93 (Gao et al. 2012)

P. pinaster 11 208 0,0055 0.0085 0,0062 na 6,07 (Eveno et al. 2008) 6 122 0,0057 na 0,0045 na 3,20 (Grivet et al. 2011) 8 14 0,0024 na 0,0021 na 1,52 (Pot et al. 2005) P.radiata 8 23 0,0019 na 0,0019 na 1,36 (Pot et al. 2005) P. taeda 19 32 0,0040 0.0064 0,0041 0,0066 4,70 (Brown et al. 2004) 28 32 na na 0,0049 0,0059 4,20 (Neale & Savolainen 2004) 16 32 na na 0,0046 0,0070 5,00 (Neale & Savolainen 2004) 18 32 0,0051 0,0085 0,0053 0,0079 5,56 (Gonzalez-Martinez et al. 2006) 41 32 0,0049 0,0070 0,0060 0,0093 6,6 (Ersoz et al. 2010) P. halepensis 10 60 0.0018 na 0,0029 na 2,07 (Grivet et al. 2009) 6 93 0,0031 na 0,0026 na 1,86 (Grivet et al. 2011) P. sylvestris 16 40 0,0034† 0,0070 0,0061† 0,0050 3,57 (Pyhäjärvi et al. 2007) 17 40 0,0041 0.0057 na 0,0080 5.71 (Wachowiak et al. 2011a) 14 40 0,0060 0,0077 na 0,0089 6,35 (Wachowiak et al. 2009) 11 119 0,0032 0,0055 0,0053 0,0062 4,43 (Kujala & Savolainen 2012) 12 40 0,0078 0,0117 na 0,0095 6,79 (Wachowiak et al. 2011b) 8 28 0,0058 0,0076 0,0063 0,0055 3,94 (Ren et al. 2012) 8 35 na 0,0079 na 0,010 7,14 (Pyhajarvi et al. 2011) P. mugo 17 12 0,0049 0.0067 na 0,0072 5.14 (Wachowiak et al. 2011a) 12 169 0,0118 0,0185 na 0,0169 12,07 (Wachowiak et al. 2013) 310 12 0,0077 0,0065 0,0076 0,0067 4,79 (Mosca et al. 2012) P. uncinata 12 93 0,0113 0,0178 na 0,0134 9,57 (Wachowiak et al. 2013) P.cembra 280 12 0,0018 0,0024 0,0019 0,0024 1,71 (Mosca et al. 2012) P.chiapensis 3 7 0,0031 na na na 2,21 (Syring et al. 2007) P.ayacahuite 3 8 0,0036 na na na 2,57 (Syring et al. 2007) P.monticola 3 9 0,0092 na na na 6,57 (Syring et al. 2007) P.strobus 3 10 0,0044 na na na 3,14 (Syring et al. 2007) P.thunbergii 15 16 0,0061 na 0,00697 na 4,32 (Suharyanto & Shiraishi 2011) P.densiflora 15 16 0,0053 na 0,00778 na 3,76 (Suharyanto & Shiraishi 2011) 8 28 0,0060 0,0078 0,00642 0,0064 4,57 (Ren et al. 2012) P.luchuensis 15 16 0,0050 na 0,00607 na 3,59 (Suharyanto & Shiraishi 2011) P.balfouriana 5 40 0,0035 na 0,00325 na 2,32 (Eckert et al. 2008) P.canariensis 3 384 0,0036 na 0,00367 na 2,56 (Lopez et al. 2013)

P. contorta 21 95 0,0025 0,0035 0,00251 0,0034 2,43 (Eckert et al. 2012)

ISBN 978-91-7459-702-8

Department of Ecology and Environmental Science Umeå University 901 87 Umeå, Sweden