
Differentiation and Genetic Variability in Cork Oak Populations (Quercus suber L.)


UNIVERSIDADE DE LISBOA

FACULDADE DE CIÊNCIAS

DEPARTAMENTO DE BIOLOGIA ANIMAL

Differentiation and genetic variability in cork oak populations (Quercus suber L.)

Joana Seabra Pulido Neves da Costa

MESTRADO EM BIOLOGIA HUMANA E AMBIENTE

Lisboa

2011


Dissertation supervised by:

Prof. Dr. Octávio Fernando de Sousa Salgueiro Godinho Paulo

Dr. Dora Cristina Vicente Batista Lyon de Castro


Preliminary note

This master's thesis is written in English, since English is regarded as the universal language of science. For this reason, knowledge of and training in writing it are of considerable importance for anyone who intends to pursue a career in scientific research in Biology. Writing the thesis in English is also intended to speed up the preparation of manuscripts and subsequent scientific publications.

The bibliographic references follow the guidelines of the international scientific journal “Trends in Ecology and Evolution” (www.cell.com/trends/ecology-evolution/authors). This is one of the most relevant journals in the field in which this thesis was developed, and its citation system is convenient for reading scientific review texts. Given also its high impact factor in the scientific community, this journal seemed an appropriate reference for presenting the bibliography.

The study carried out in this thesis was developed within project PTDC/AGR-GLP/104966/2008, “Avaliação dos recursos genéticos e genómicos do sobreiro: bases para uma gestão prospectiva” (Assessment of the genetic and genomic resources of cork oak: bases for prospective management), funded by Fundação para a Ciência e Tecnologia (FCT).


Foreword

The present master's thesis is written in English. English is considered the universal language of science and, therefore, practising its writing and grammar is of the utmost importance for those who intend to pursue a career in Biology and scientific research. Writing this thesis in English also speeds up the submission of manuscripts for publication.

The bibliographic references follow the guidelines of the international scientific journal “Trends in Ecology and Evolution” (www.cell.com/trends/ecology-evolution/authors). This is one of the most relevant journals in the field in which this thesis was developed, with a high impact factor in the scientific community. It also has a citation system that is convenient when reading long texts.

This study is part of the project PTDC/AGR-GLP/104966/2008, “Avaliação dos recursos genéticos e genómicos do sobreiro: bases para uma gestão prospectiva”, funded by Fundação para a Ciência e Tecnologia (FCT).


Acknowledgements

At the end of this thesis I feel the need to thank all those who, in one way or another, made it possible.

My first thanks go to my supervisors, Octávio Paulo and Dora Batista. To Professor Octávio for the encouragement and the vote of confidence he placed in me from the very beginning. To Dora for proposing the topic of this master's thesis and for awakening my interest in plants.

To Professor Deodália Dias for the opportunities she gave me and for her unconditional support when problems slip out of our control and do not depend on us.

I particularly thank Professor Helena Almeida of the Instituto Superior de Agronomia for access to Herdade Monta da Fava, the source of some of the cork oak populations most important for the development of this work. And to everyone who, directly or indirectly, was important in collecting the samples.

To CoBiG2, for the group that formed and for the good times. To Francisco Pina-Martins and Vera Nunes for raising me in the early days of my laboratory life. To Sofia Seabra for her natural calm and friendship. To Eduardo Marabuto for his good humour, much needed in difficult times. To Sara Ema: incredible as it may seem to you, I think you stress more than I do, and that is an enormous help, as was staying with me until the day of submission… even on crutches. To Diogo Silva for sharing some difficult moments with supervisors and the like. To Catarina Dourado, Ana Sofia, Patrícia Brás, Renata Martins, Inês Modesto and Bruno Vieira, who helped me through the usual difficulties of a thesis. To the remaining members of CoBiG2, as well as former members and the most recent additions, thank you very much!

To Rita Oliveira and Raquel Vaz: you are not part of CoBiG2, but you are part of the family and deserve due recognition and thanks for everything you “put up with” from me.

A special thanks to four people who must have suffered a great deal with me. It would take pages of acknowledgements, but since I cannot do that, the intention remains. To Catarina Dourado, a particular thanks. It was a long road, and I thank you for your affection, support and friendship. To Eduardo Marabuto for the valuable comments, above all on the Introduction. You are right about many things, but compromises have to be made. To Sofia Seabra for the very important corrections; it would have been much harder without you. To Bruno Vieira for the endless hours spent listening to me complain about life in general and the thesis in particular. Always with much love and affection! Thank you, all four of you!

To Diana Martins. You are not always with me, but you are always thinking of me, and you have impeccable timing for when I need you most.

And to Dad, Mum and my grandparents. Although they are at the end of this list, they were probably the people who contributed most to the completion of this thesis. I clearly would not be here without the precious support of my family.


Summary

The year 2011 was designated “The International Year of Forests” by the United Nations General Assembly, in an attempt to raise public interest and to promote sustainable forest management and conservation for the benefit of future generations. FAO (Food and Agriculture Organization) estimates for 2010 showed that 31% of the Earth's land surface is still covered by forests and that trees account for 90% of terrestrial biomass, comprising a total of 60,000 to 100,000 taxa. However, certain human-induced changes, mainly deforestation and climate change, have raised the proportion of tree species threatened with extinction to about 10%.

In recent times, forest species have been widely used in population and evolutionary genetic studies, as well as in genomic studies. The main reason is the particular characteristics of these non-classical models: they result from millions of years of divergence and diversification and therefore present impressive levels of morphological diversity, evolutionary divergence and ecological diversity. Although the impact of global changes on these species will depend largely on their capacity to react, and on that of their ecosystems, genetic studies allow us, to some extent, to predict the evolutionary consequences of these changes, since they increase our knowledge of the biodiversity and evolution of these species.

The concept of “Phylogeography” was introduced by Avise et al. in 1987 and, over the last 25 years, has had a major impact on research, particularly in animals. In plants the results have not been as clear-cut, mainly because of a lack of genetic variability suitable for phylogeographic analysis. It has been considerably difficult to find a plant genetic marker with a resolving power similar to that of animal mitochondrial DNA. Nevertheless, plant phylogeography has developed considerably, especially in recent years, with the increasing use of nuclear molecular markers and the collection of information from larger fragments of the chloroplast genome.

The genus Quercus (oaks) is one of the most important groups of woody angiosperms in the northern hemisphere in terms of species diversity, ecological dominance and economic value. The genus is quite old, considering that the oldest fossil found dates from the Oligocene (34-23 million years ago). Oaks are dominant members of a wide variety of habitats, and 500-600 species are thought to exist on Earth.

Cork oak (Quercus suber L.) is one of the most important tree species of the western Mediterranean region, both economically and ecologically, where it defines open woodlands (created and maintained by man) known in Portugal as “montados”. The distribution range of cork oak, although discontinuous, extends from the Atlantic coast of North Africa and the Iberian Peninsula to south-western Italy, including the Mediterranean islands of Sicily and Sardinia, as well as the Mediterranean coastal areas of Algeria and Tunisia. Cork oak forests cover a total area of about 2.2 million hectares, from which 340,000 tonnes of cork are extracted per year. The largest areas are located in Portugal, with about 700,000 hectares, corresponding to 21% of the Portuguese forest area and 30% of the world's cork-producing area. Cork oak has been used since Antiquity for cork production, and this natural product has great economic value. The greatest threats, however, are faced by natural and marginal populations, which are often small, scattered and confined to restricted habitats. Many of these populations may be at risk of disappearing, mainly due to a lack of regeneration.

Because of the species' economic value, and also because cork oak woodlands are reservoirs of biodiversity and shelter a wide variety of endangered species, these populations represent important material for genetic studies that can underpin the design of conservation programs. It is therefore necessary to strengthen and expand our knowledge of the spatial organization of the species' genetic variation, so that conscious and informed decisions can be made about the conservation of genetic resources.

Phylogeographic studies of Quercus suber have been rather superficial, and some have even been inconclusive. As a result, the evolutionary history of the species is not well understood, most likely because of the limited number of areas sampled or the low informative content of the markers used. For example, studies involving Portuguese populations drew inferences from poor sampling of Portugal; since this is one of the most relevant regions in the present and past history of cork oak, wider coverage of the distribution range is needed, including some areas referred to as potential glacial refugia for other species. Moreover, since most phylogeographic studies rely on data derived from chloroplast DNA (cpDNA) (PCR-RFLPs and SSRs), one should consider whether other molecular approaches, or genetic markers evolving at faster rates than cpDNA, might indicate a different evolutionary scenario. This master's thesis proposes an approach different from previous studies, complementing data obtained from chloroplast and nuclear DNA. This approach has never been applied to cork oak, and it is expected to add relevant phylogeographic information. More specifically, the objectives of this work were: 1. To infer the evolutionary history and demographic patterns of Quercus suber; 2. To explore the patterns of hybridization and introgression of cork oak with other Quercus species; 3. To assess the levels of diversity and differentiation among and within some key cork oak populations.

Sequencing of several fragments made it possible to infer some details of the evolutionary history of the species. The traditional cpDNA was selected for sequencing of 3 intergenic regions (TrnL-F, TrnS-PsbC and TrnH-PsbA) in a total of 148 samples from 26 populations. However, because phylogeographic inferences based on a single type of non-recombining marker can give misleading information about the evolutionary history of a species, the nuclear genome (nuDNA) was also explored by sequencing a candidate gene potentially involved in osmotic stress (EST 2T13) in 104 samples from the same 26 populations. Both data sets revealed two lineages in cork oak. One lineage, the “pure lineage”, appears to be practically exclusive to cork oak and splits into three sub-lineages, possibly resulting from three refuge areas, one predominant in the western Mediterranean and the other two in the eastern Mediterranean. The other lineage appears associated with Quercus ilex (holm oak) and Quercus coccifera (kermes oak) and was named the “introgressed lineage”. This lineage seems to result from several hybridization and introgression events with Quercus ilex. The combined analysis of the cpDNA and nuDNA sequences suggests that this introgression occurred in both directions between the two species, and that these events were frequent and consecutive over a period of time.

Finally, nuclear microsatellites, both EST-derived (Expressed Sequence Tags) (EST-SSRs) and anonymous (nuSSRs), provided a perspective on the patterns of genetic diversity and population structure of cork oak. First, EST-SSRs were established as valid markers in cork oak, contradicting the idea that EST-SSRs tend to be poorly polymorphic. Subsequently, a combined analysis of these two marker types (5 EST-SSRs and 3 nuSSRs) in 379 individuals from 13 populations detected relatively low, but highly significant, genetic diversity. Although no population structure was detected among the Portuguese populations, which appear together in a single population group, there is a tendency to consider Catalonia (Spain) as one of the most differentiated populations.

Overall, the objectives of this work were accomplished, clarifying some points of the phylogeography and evolutionary history of cork oak. The introduction of new molecular markers was clearly informative, revealing new and unexpected aspects of the genetic patterns of the species and thus generating completely new explanatory hypotheses for cork oak.

Keywords

Quercus suber, geographic structure, microsatellites, ESTs, introgression


Abstract

Cork oak (Quercus suber L.) is one of the most important species, both economically and ecologically, in the Western Mediterranean region. Consequently, there is great interest in understanding its evolutionary history and current population structure. Although some details of the genetic divergence of cork oak populations have been uncovered, a different and complementary analysis of chloroplastidial and nuclear DNA markers (cpDNA and nuDNA) will most probably bring additional relevant phylogeographic information. So far, no study has attempted the molecular approach proposed here for cork oak, which combines cpDNA and nuDNA sequence variation with polymorphism data from anonymous nuclear microsatellites (nuSSRs) and EST-derived (Expressed Sequence Tags) microsatellites (EST-SSRs) to infer phylogeographic patterns and history, possible glacial refugia, diversity levels and geographic structure.

A genetic survey was conducted by sampling populations throughout the entire distribution range of the species. Genetic diversity was assessed at 8 nuclear microsatellite loci (3 EST-SSRs and 5 nuSSRs) in 379 individuals from 13 populations, and at 4 DNA sequences (3 cpDNA intergenic spacer regions and 1 osmotic-stress-related candidate gene) in 148 samples from 26 populations.

DNA sequences of both cpDNA and nuDNA confirmed two main lineages of cork oak haplotypes: the first, named the pure lineage (mostly exclusive to cork oak but also shared with Q. cerris), and the second, the introgressed lineage (shared with Q. ilex and Q. coccifera). The cpDNA sequences, however, reveal the complexity of the introgressed lineage, apparently indicating that hybridization and introgression events may have occurred frequently and consecutively over a period of time. The hypothesis of cork oak refugia during the last glaciations was also revisited (using the pure lineage of the cpDNA haplotypes), and three major haplotypes were detected, reflecting three possible refuge areas. Finally, the microsatellite data showed low but significant population differentiation, and the geographic subdivisions that could be defined grouped the Portuguese populations in one cluster, further characterizing the Catalonia (Spain) population as possibly the most differentiated.

Key words

Quercus suber, geographical structure, microsatellites, ESTs, introgressive hybridization


List of abbreviations

AFLP – Amplified Fragment Length Polymorphisms

BA – Bayesian analysis

BLAST – Basic Local Alignment Search Tool

bp – base pairs

BP – Before Present

CBOL – The Consortium for the Barcode of Life

COI or Cox1 – Cytochrome c oxidase I

cpDNA – Chloroplastidial DNA

ESTs – Expressed Sequence Tags

EST-SSRs – EST-derived SSRs

FAO – Food and Agriculture Organization (of the United Nations)

ITS – Internal Transcribed Spacer

kb – Kilobases

MCMC – Markov Chain Monte Carlo

MP – Maximum Parsimony

mtDNA – Mitochondrial DNA

nuDNA – Nuclear DNA

nuSSRs – Nuclear SSRs

PCR – Polymerase Chain Reaction

RAPDs – Random Amplified Polymorphic DNA

rDNA – Ribosomal DNA

RFLP – Restriction Fragment Length Polymorphism

sncDNA – Single-copy nuclear DNA

SNPs – Single Nucleotide Polymorphisms

SSR – Simple sequence repeats; microsatellites


Table of Contents

1. Introduction
Thesis Main Goals
1.1 An emblematic tree: Quercus suber L.
1.1.1 General aspects on cork oak and the “montado”
1.1.2 Taxonomic classification and phylogenetic studies
1.1.2.1 Barcoding in oak phylogenetics
1.1.3 Geographical distribution
1.1.4 Evolutionary history – Origin, glacial refugia and post-glacial recolonization
1.1.5 Genetic diversity studies
1.1.6 Hybridization and cytoplasmatic introgression
1.2 Molecular markers in phylogeography
1.2.1 Mitochondrial DNA (mtDNA)
1.2.2 Chloroplastidial DNA (cpDNA)
1.2.3 Nuclear DNA (nuDNA)
1.2.4 Simple Sequence Repeats (SSRs)
1.2.5 Expressed Sequence Tags (ESTs)
2. Materials and Methods
2.1 Sampling and DNA extraction
2.2 DNA sequencing
2.3 Microsatellite genotyping
2.4 Phylogenetic and phylogeographic analysis
2.5 Selective neutrality tests and demographic history
2.6 Genetic diversity and population differentiation
2.7 Genetic structure of populations
3. Results
3.1 Sequencing of chloroplast and nuclear DNA fragments
3.1.1 cpDNA and nuDNA diversity levels
3.1.2 Differentiation patterns
3.1.3 Mismatch distribution and neutrality tests
3.2 Microsatellite analysis
3.2.1 Genetic diversity values
3.2.2 Genetic differentiation among populations
3.2.3 Population structure
4. Discussion
4.1 Differentiation and demographic patterns
4.2 Hybridization and introgression
4.3 Genetic diversity and population structure
5. Final Remarks
6. Bibliographic References
Supporting Information
Supporting Information 1
Supporting Information 2
Supporting Information 3
Supporting Information 4
Supporting Information 5
Supporting Information 6
Supporting Information 7
Supporting Information 8
Supporting Information 9


Materials and Methods

1. Introduction

The year 2011 was designated “The International Year of Forests” by the United Nations General Assembly, in an attempt to raise awareness and promote more sustainable management and conservation of all types of forests for the benefit of current and future generations. Estimates by the Food and Agriculture Organization of the United Nations (FAO) for 2010 showed that 31% of the Earth's land surface is still covered by forests, which account for 90% of the Earth's terrestrial biomass [1]. Estimates of global tree species richness range from 60,000 to 100,000 taxa, and forests harbour the majority of the world's terrestrial biodiversity [2]. However, ongoing deforestation and other human-induced global changes (such as in climate and land use) have brought the proportion of the world's tree species threatened with extinction close to 10% [3,4]. Although the overall rate of deforestation remains alarmingly high (estimated at 9.4 million hectares per year in the late 1990s), it is, surprisingly, slowing down [5].

In recent years forest trees have been gaining much attention as non-classical models for several types of studies. They are particularly interesting for population, evolutionary genetic and genomic studies, since forest trees are the result of millions of years of lineage divergence and diversification and present remarkable levels of diversity in morphology, adaptation and ecology [6,7]. Although the impact of global changes on forest trees will ultimately depend to a great extent on the response of these trees and their ecosystems, genetic studies open the possibility of predicting the evolutionary consequences of future global changes by increasing knowledge of tree biodiversity and evolution [8]. For that purpose, phylogeographic studies seem to be an important step in understanding these processes.

Avise et al. introduced the concept of “Phylogeography” in 1987 [9], and over the past 25 years or so phylogeography has had a major impact on research, particularly on animal species. In plants, however, the results have not been as clear-cut. One of the major problems has been a lack of genetic variation useful for phylogeographic analysis: it is quite difficult to find plant genetic markers with a resolving power comparable to that of animal mitochondrial DNA (mtDNA) [7,10]. Nonetheless, plant phylogeography has come a long way over the last few years with the availability of nuclear markers and the collection of data from larger sections of the chloroplastidial genome [11].

The genus Quercus, of the family Fagaceae, is one of the most important groups of woody angiosperms in the northern hemisphere in terms of species diversity, ecological dominance and economic value. The genus is quite old: the oldest unequivocal oak fossils date from the Oligocene, which spans 34 to 23 million years before present. Oaks are dominant members of a wide variety of habitats, and some 500-600 species exist on Earth [12,13].

Quercus suber L. (commonly known as cork oak) is among the most important tree species, economically and ecologically, in the Western Mediterranean region, where it is endemic, defining unique open woodlands (created and maintained by man) known in Portugal as “montados” and in Spain as “dehesas”. Quercus suber has mostly been used to produce cork, a natural product of great economic value. The biggest threats, however, are faced by marginal natural populations, often growing in small, scattered stands and in restricted habitats, that are at risk of disappearing, mainly due to a lack of regeneration [14]. Owing to the species' economic value, and also because cork oak woodlands are renowned reservoirs of biodiversity, home to a variety of threatened and endangered species, and crucial in preventing soil erosion, Q. suber populations represent valuable material for genetic studies as well as for gene conservation programs. To that end, greater knowledge of the spatial organization of genetic variation within the species is needed so that informed decisions can be made about tree breeding and the conservation of genetic resources.

Thesis Main Goals

Although previous studies have already addressed population genetics in Quercus suber, there are still gaps in the current understanding of the evolutionary history of the species, mostly due to the limited number of geographical areas sampled or to the low informative content of the markers used. In particular, Portuguese populations have been poorly represented in those studies; since Portugal is one of the most relevant regions in the recent and past history of cork oak, a more complete range of cork oak distribution and differentiation should be covered, including some areas referred to as potential glacial refugia for other species. In addition, since most previous phylogeographic inferences are supported by data from chloroplastidial markers [Restriction Fragment Length Polymorphisms (RFLPs) and microsatellites (SSRs)], the question arises whether other molecular approaches, or genetic markers evolving at faster rates than chloroplastidial DNA (cpDNA), would provide a different microevolutionary scenario. Therefore, the main objectives of this work were to:

1. Infer the evolutionary history and demographic patterns of Quercus suber;

2. Assess the patterns of hybridization and introgression with other Quercus species;

3. Evaluate the diversity and differentiation levels among and within some key cork oak populations.

To achieve these goals, populations from the entire Mediterranean distribution of the species were analysed with several molecular markers using different approaches. A multi-locus sequencing approach was applied to infer the evolutionary history of the species, its relationships with other Quercus species and its introgression patterns. The traditional cpDNA marker was selected for the sequencing of several fragments. Additionally, because phylogeographic inferences based on a single non-recombining marker can be misleading, the nuclear genome was also explored by sequencing one candidate gene. Finally, expressed sequence tag (EST)-derived SSRs (EST-SSRs) and anonymous nuclear SSRs (nuSSRs) were used to provide a perspective on patterns of genetic diversity and population structure.
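The microsatellite markers above serve, among other things, to summarise genetic diversity. This section does not say which statistics are computed, so, as an illustration only, the sketch below computes two widely used per-locus summaries for diploid microsatellite data: observed heterozygosity (H_o, the proportion of heterozygous individuals) and expected heterozygosity, or Nei's gene diversity, H_e = 1 - Σ p_i², where p_i is the frequency of allele i at the locus. Genotypes are assumed to be stored as pairs of allele sizes; the function names and example genotypes are hypothetical and are not data from this study.

```python
from collections import Counter


def observed_heterozygosity(genotypes: list[tuple[int, int]]) -> float:
    """Fraction of individuals that are heterozygous at one locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes: list[tuple[int, int]], unbiased: bool = True) -> float:
    """Nei's gene diversity (1 minus the sum of squared allele frequencies) for one diploid locus."""
    alleles = [allele for pair in genotypes for allele in pair]
    n = len(alleles)  # number of sampled gene copies (2 x number of individuals)
    if n < 2:
        raise ValueError("at least two gene copies are needed")
    counts = Counter(alleles)
    he = 1.0 - sum((c / n) ** 2 for c in counts.values())
    # Nei's (1978) small-sample correction multiplies by n / (n - 1)
    return he * n / (n - 1) if unbiased else he


# Hypothetical genotypes (allele sizes in bp) of four trees at one SSR locus
locus = [(152, 156), (152, 152), (156, 160), (152, 160)]
print(observed_heterozygosity(locus))                  # 0.75
print(expected_heterozygosity(locus, unbiased=False))  # 0.625
print(round(expected_heterozygosity(locus), 4))        # 0.7143
```

In practice these values would be averaged over loci and compared among populations; dedicated population-genetics software usually reports them alongside differentiation statistics.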


1.1 An emblematic tree: Quercus suber L.

1.1.1 General aspects on cork oak and the “montado”

Cork oak (Quercus suber Linné, 1753) is an emblematic Mediterranean evergreen sclerophyllous tree. It is a slow-growing, extremely long-lived tree, reaching about 20 m in height, with massive branches forming a round crown (Fig. 1.1). It is a diploid (2n=24), monoecious species (with both male and female reproductive organs on the same individual) with a protandrous system (anthers mature before carpels) that ensures cross-pollination. Propagation in natural populations occurs through seed (acorn) dispersal and subsequent germination (sexual reproduction), a process called natural regeneration. As in most oak species, natural regeneration of cork oak depends mostly on wind and animals [15-17].

Cork oak and holm oak (Quercus ilex L., 1753) are the two main evergreen oak species in the western part of the Mediterranean Basin [17]. Particularly in the Iberian Peninsula, these two species occur mostly as semi-natural stands known as “montados”, open woodlands with a delicate and particular ecosystem, created and maintained by man.

The montado semi-natural landscape is valued because it represents a viable land use that still preserves a rich biodiversity at all levels, from insects and flora to top predators such as the Iberian Imperial Eagle (Aquila adalberti) or the Iberian lynx (Lynx pardinus), the world's most endangered cat, and their mutual prey species, the rabbit (Oryctolagus cuniculus).

Figure 1.1: Quercus suber L. – Cork oak's natural population in Serra da Estrela, Portugal.

They also represent an important economic resource but, with the exception of central Spain, holm oak forests can be regarded as rare cases of woodlands that have undergone very little or no silvicultural management. Cork oak management, however, is at a different level, since the species' high economic importance is associated with the harvesting not only of acorns but also of cork. The thick, soft bark of cork oak is used to produce the familiar cork, the main product responsible for the important economic role of this partly domesticated species. Trees are first stripped of cork from the lower portion of the trunk at about 14 years of age and subsequently every 9-12 years, and can live through this process for 100 to 500 years without any apparent effect on tree physiology. Acorns are eaten by birds and are highly valued as fattening fodder for domestic Iberian pigs. Since ancient times, cork oak has been favoured, and sometimes widely spread, by preferentially using acorns from trees producing good-quality cork [15,18-20]. Therefore, Q. suber is widely cultivated within its natural range but, according to Carrión et al. [21], without human activities cork oak would never develop pure stands in the , and would form mixed forests with other sclerophyllous and deciduous oaks, and with .
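Taking the stripping schedule quoted above at face value (first harvest at about 14 years, then every 9 to 12 years), the number of cork harvests a tree yields over a given productive life can be roughly bounded. The minimal sketch below assumes a constant interval between harvests and treats the lifespan as a free parameter; the function name `harvest_ages` is hypothetical, and real management schedules vary with site and tree condition.

```python
def harvest_ages(lifespan_years: int, first_harvest: int = 14, interval: int = 9) -> list[int]:
    """Ages (in years) at which cork is stripped, assuming a fixed interval after the first harvest."""
    # range() is empty when the tree never reaches the age of first stripping
    return list(range(first_harvest, lifespan_years + 1, interval))


# Bounds for a hypothetical 100-year life, using both ends of the 9-12 year range
for interval in (9, 12):
    ages = harvest_ages(100, interval=interval)
    print(f"every {interval} years: {len(ages)} harvests, last at age {ages[-1]}")
```

With a 9-year cycle this gives 10 harvests (the last at age 95); with a 12-year cycle, 8 harvests (the last at age 98).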

Cork oak forests cover 2.2 million hectares worldwide, from which 340,000 tons of cork are extracted per year (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007). The largest stands, covering about 700,000 ha, are located in Portugal and correspond to 21% of the forest area in Portugal and to 30% of the world's cork-producing area. Currently, the cork industry represents 3% of all Portuguese exports. Cork stoppers for bottles are the most representative product of this industry, accounting for 70% of its exports (see: http://www.amorim.com/cor_glob_cortica.php). Despite the economic importance of this renewable material, much remains to be discovered about both the biological and the genetic mechanisms involved in its formation. Human intervention, through extensive plantations and systematic clear-cutting in forests aimed at empirically selecting varieties with higher cork quality, is thought to have contributed strongly to the genetic homogenization of Q. suber populations in the Iberian Peninsula [22].
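The area and production figures in this paragraph can be cross-checked with simple arithmetic. The sketch below only reuses numbers quoted in the text; the variable names are illustrative. It assumes that the world's cork-producing area equals the 2.2 million ha of cork oak forest, which the text does not state explicitly; if the two areas differ, Portugal's share changes accordingly.

```python
# Figures quoted in the text
WORLD_CORK_OAK_HA = 2_200_000       # cork oak forest worldwide (ha)
PORTUGAL_CORK_OAK_HA = 700_000      # cork oak stands in Portugal (ha)
CORK_OUTPUT_T_PER_YEAR = 340_000    # cork extracted worldwide (tonnes/year)
SHARE_OF_PT_FOREST = 0.21           # cork oak share of the Portuguese forest area

# Portugal's share of the world total (assumes producing area == total cork oak area)
share_of_world = PORTUGAL_CORK_OAK_HA / WORLD_CORK_OAK_HA
# Total Portuguese forest area implied by the 21% figure
implied_pt_forest_ha = PORTUGAL_CORK_OAK_HA / SHARE_OF_PT_FOREST
# Average cork output per hectare of cork oak forest
mean_yield_t_per_ha = CORK_OUTPUT_T_PER_YEAR / WORLD_CORK_OAK_HA

print(f"Portugal's share of world cork oak area: {share_of_world:.1%}")
print(f"Implied Portuguese forest area: {implied_pt_forest_ha / 1e6:.2f} million ha")
print(f"Mean cork output: {mean_yield_t_per_ha:.2f} t/ha/year")
```

Under that assumption the ratio comes out at about 32%, consistent with the rounded 30% quoted, and the 21% figure implies roughly 3.3 million ha of forest in Portugal.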

Cork oak plantations are very important for the economy and play an important social and environmental role that has to be taken into consideration, as the unparalleled decline occurring in the Iberian Peninsula and in Morocco is threatening the entire ecosystem [22]. Although the marginal and natural populations of cork oak are possibly the most endangered, Iberian cork oak montados are also currently threatened and in decline due to multiple factors. The main factor contributing to this decline is the occurrence of very severe drought periods over several consecutive years [18]. The lack of natural regeneration (mainly due to overgrazing and insolation, particularly in North ) is another of the most important factors, so stand sustainability cannot rely exclusively on the decreasing resprouting ability of aged and decaying adult trees [23,24]. In Portugal and Spain, another factor contributing to this decline is ink disease, a root disease caused by the soilborne pathogen Phytophthora cinnamomi. Moreover, the increasing use of synthetic stoppers in place of traditional cork in wine bottles is an additional factor that, together with those stated above, threatens this ecosystem in the medium term [18,22].

Holm oak montados are also endangered, for some of the same reasons and for a number of others. Thus, the much-admired sustainability of montados is jeopardized, and these formations may become 'fossil forests' [23]. In this respect, the outlook is more favourable for Q. ilex, which is a more euryecious species than Q. suber (whose presence is limited by cold, drought and soil type). In recent years, there has been increasing recognition of the important contribution made by these species to the preservation of seminatural habitats and landscapes [18,23,24]. Several studies on the regeneration of Mediterranean forests have been published, some of them centred on Q. ilex and Q. suber [24,25]. However, these works have focused mainly on ecological aspects of regeneration, silviculture and land use [23], without addressing the genetic bases of montado regeneration or the diversity of populations and its consequences for adaptation, which is now an immediate priority to allow informed decisions on the conservation of genetic resources.

1.1.2 Taxonomic classification and phylogenetic studies

Several morphology-based classifications of Quercus have been proposed [12,26]; however, they have always been surrounded by controversy, mainly because of widespread intraspecific morphological variation, especially abundant in oaks, which may be produced by hybridization and by adaptation to environmental change [27]. As a result, classification has been anything but straightforward and, especially at the subgenus level, remains uncertain. The taxonomic scheme proposed by Schwarz in 1964 [26] is possibly the most widely accepted for the classification of cork oak, and appears to be the most suitable for describing the systematics of European oaks [19,28,29]. According to Flora Europaea [17] and that same taxonomic scheme, the genus Quercus is divided into four subgenera (or subsections), as follows:

Order: Fagales
Family: Fagaceae
Genus: Quercus
Subgenera: Cerris, Quercus, Sclerophyllodrys, Erythrobalanus

Quercus suber belongs to the family Fagaceae, genus Quercus and subgenus (or subsection) Cerris (Spach) Oersted.

Quercus comprises 500-600 species, of which 350–500 are distributed throughout the Northern Hemisphere [12,13,30]. Oaks are conspicuous members of the temperate deciduous forests of North America, Europe and Asia, as well as of the evergreen Mediterranean maquis. A smaller number of oak species (30–35) are evergreen and grow mainly in south-western Asia, western North America and around the Mediterranean Basin. In the Mediterranean area, only four evergreen oak species have been identified. These include Quercus alnifolia Poech. (golden oak), endemic to Cyprus, and Quercus suber L. (cork oak), distributed exclusively in the western part of the Mediterranean Basin. The third species is holly oak, a complex including Quercus coccifera L. and Quercus calliprinos Webb. Allozyme studies suggest that holly oak should perhaps be considered a single species (Q. coccifera L.) with subspecies coccifera and calliprinos [19]. The fourth Mediterranean oak species, Quercus ilex L. (holm oak), shows two morphological types, the rotundifolia and ilex types [18,19,26], which are sometimes regarded as distinct species.

According to Schwarz [26], Q. ilex and Q. coccifera (including subsp. calliprinos) belong to subgenus Sclerophyllodrys (O. Schwarz), whereas Q. suber and Q. alnifolia belong to subgenus Cerris (Spach). This classification was also supported by RFLP analysis of the nuclear ribosomal DNA (rDNA) 18S and 25S genes and spacer regions [28] and of chloroplast DNA [30], and by nuclear DNA (nuDNA) internal transcribed spacer (ITS) sequences [27,30], although Q. alnifolia was not included in these studies. Moreover, the study of Manos et al. [30] provided evidence that the two groups of Mediterranean oaks (subg. Sclerophyllodrys and subg. Cerris sensu Schwarz; “Ilex group” and “Cerris group” sensu Nixon) are monophyletic, as previously reported by Nixon [29]. More recently, they have been placed together in a larger group (the Eurasian Cerris group), which includes all the European and Asiatic evergreen oak species analysed [27,30]. Within subg. Cerris, several systematic studies indicate that Quercus cerris and Quercus crenata are the species most closely related to Q. suber [27,29-31].

1.1.2.1 Barcoding in oak phylogenetics

Tree species share several attributes, such as longevity, complex reproductive strategies, great potential for local adaptation, and slow mutation and speciation rates [2], that make barcoding of forest trees a captivating issue from both theoretical and practical points of view. “DNA barcoding” is a molecular approach to identifying the species to which any living organism belongs by using a standardised region of the genome (or several loci used together as a complementary unit). Ideally, the barcode system would be a universal and valuable resource allowing fast and unequivocal species identification and taxon characterization at any life stage of the specimen and from minimal tissue samples (http://www.barcoding.si.edu) [29,32,33]. Besides taxonomy, widespread application of barcoding would be a powerful research complement for molecular ecology, phylogenetics and population genetics [34].

The success of a DNA sequence as a species identification tool (the barcode) depends on the existence of unique substitutions that distinguish among closely related species, and on its ease of application across a broad range of taxa. A portion of the mitochondrial cytochrome c oxidase I (COI or cox1) gene is currently used as a universal barcode in certain groups of animals, fungi, diatoms and red algae. However, COI has proved unsuitable in land plants, mainly because of the low nucleotide substitution rates of the plant mitochondrial genome [7,35,36]. The nuclear and plastid plant genomes therefore offer the best prospects of yielding a suitable sequence (or set of sequences) for DNA barcoding, i.e., one variable enough to differentiate species, yet stable enough within species to show low infraspecific variability [33,35]. The difficulty of finding a single barcode locus in plants suggested a multilocus approach, with the chloroplast genome as the most promising target for barcoding plant species. A pool of loci has therefore been considered recently, with the greatest interest directed at seven candidates: rpoB, rpoC1 and rbcL as three easy-to-align coding regions, a section of matK as a rapidly evolving coding region, and trnH-psbA, atpF-atpH and psbK-psbI as three rapidly evolving intergenic spacers [36,37]. Based on the relative ease of amplification, sequencing and multiple alignment, and on the amount of variation displayed, many research groups have proposed different combinations of these loci [32,36-39]. However, in 2009 the CBOL (Consortium for the Barcode of Life) Plant Working Group recommended the combination of rbcL and matK as the most convenient in terms of universality, sequence quality and discrimination power. Nevertheless, it is still argued that, regardless of the regions adopted for barcoding, some species will always be better resolved with other regions [29,36,40]. Oaks are one such example, and represent an obstacle to the idea of a plant barcode.

A recent attempt at barcoding the Italian wild dendroflora with four plastid regions (trnH-psbA, rbcL, rpoC1, matK) revealed that the genus Quercus is not amenable to barcoding (0% discrimination success) [29], probably as a consequence of factors such as the low rate of variation in the plastid genome and hybridization. Moreover, it appears that the main obstacle to barcoding success in difficult genera such as Quercus cannot simply be overcome by adding more plastid DNA data. Nuclear DNA may offer some advantages owing to its higher mutation rates and different mode of inheritance. Discrimination of the same set of oak species has already been obtained using sequence variation in the internal transcribed spacer region of ribosomal DNA (ITS) [27], which even supports the recognition of the subgenera Sclerophyllodrys, Cerris and Quercus, as proposed by Schwarz [26]. The rapidly evolving ITS may thus represent a useful supplementary barcode in difficult genera, although it does not completely overcome existing problems, namely paralogy and other factors associated with the complex concerted evolution of this highly repeated part of the nuclear genome, which still require further refinement of current protocols [7,35].
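The "discrimination success" reported for Quercus can be made concrete with a minimal sketch. It assumes a strict identity criterion, under which a species counts as discriminated only if none of its barcode sequences also occurs in another species; published barcoding studies also use distance thresholds or tree-based criteria, so this is one possible definition rather than the method of [29]. The function name and the sequences are hypothetical toy data, not real oak alignments.

```python
from collections import defaultdict


def discrimination_success(samples):
    """Return the fraction of species discriminated by identical-sequence matching.

    samples: list of (species, aligned_barcode_sequence) pairs.
    A species is discriminated only if none of its sequences also occurs in
    another species (a strict identity criterion assumed for this sketch).
    """
    # Map each distinct sequence to the set of species carrying it.
    species_by_seq = defaultdict(set)
    for species, seq in samples:
        species_by_seq[seq].add(species)

    all_species = {species for species, _ in samples}
    discriminated = {
        species
        for species in all_species
        if all(len(species_by_seq[seq]) == 1
               for sp, seq in samples if sp == species)
    }
    return len(discriminated) / len(all_species)


# Hypothetical plastid barcodes: two oaks share an identical haplotype.
toy = [
    ("Q. suber", "ACGTTAGC"),
    ("Q. ilex", "ACGTTAGC"),
    ("Q. cerris", "ACGTTAGA"),
]
print(discrimination_success(toy))
```

In this toy set only Q. cerris has a private sequence, so one species in three is discriminated; when all species share plastid haplotypes, as suggested for oaks, the score falls to zero.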

1.1.3 Geographical distribution

The Mediterranean evergreen Quercus species form a group with overlapping habitats. In the western Mediterranean Basin, holm oak, cork oak and holly oak are the dominant broadleaved species. These three species are sympatric in many areas, but differences in their ecological requirements produce distinct responses to environmental conditions, and hence different evolutionary histories, as confirmed by several studies showing differences in their patterns of genetic variation at both the nuclear and cytoplasmic levels [18,19,30,41,42].

Figure 1.2: Geographical distribution of cork oak, Quercus suber, represented in dark grey. Based on Magri et al. [16]

Q. suber has a rather narrow geographical range compared with the other main evergreen Mediterranean oak species, mainly because of its ecological restrictions. The modern, rather discontinuous distribution of cork oak ranges from the Atlantic coasts of Morocco and the Iberian Peninsula to the south-eastern regions of Italy, and includes the main western Mediterranean islands of Sicily and as well as the coastal belts of Algeria and Tunisia, Provence (France) and Catalonia (Spain) [16,43] (Fig. 1.2).

Unlike holm oak, which shows a wide ecological amplitude, cork oak is restricted to the warm variants (mean temperature of the coldest month above 4-5 °C) of the humid and sub-humid Mediterranean areas, with at least 450 mm of mean annual rainfall [18,20].

In Europe, low winter temperatures appear to set the limits of the geographic distribution, and most cork oak stands are located below 800 m in altitude, since cork oak is less tolerant of frost and drought than the more widespread holm oak. In addition, whereas holm oak is indifferent to soil type, cork oak usually grows on acidic soils over granite, schist or sandy substrates, and avoids limestone and other carbonate substrates. Cork oak distribution is therefore more shifted to the west, and more patchy, than that of holm oak (sensu lato), which forms a continuum from to Portugal, including all the larger Mediterranean islands [17,18,20,43]. Nevertheless, within its geographical range, cork oak shows high levels of morphological and phenological variability, although most of this diversity is considered to be the result of past introgressive hybridization with other sympatric species [15,44,45]. Today, in their common distribution area, cork and holm oaks often grow together, and the local occurrence of morphologically intermediate trees has been reported [18,27].

1.1.4 Evolutionary history – Origin, glacial refugia and post-glacial recolonization

Several hypotheses have been advanced concerning the evolutionary history of cork oak as well as the geographical location of its centre of origin; however the details of its differentiation processes are still largely unknown.

It was originally suggested that Q. suber may have originated in the Iberian Peninsula, where the species has its current main range (Fig. 1.2). This hypothesis was based on geobotanical studies and on allozyme variation across the whole cork oak range, which revealed substantially higher genetic diversity in Iberian populations than in those from North Africa, Italy and France [18,27]. Palaeoecological data indicate that both cork and holm oak have been present in southern Europe since the end of the Tertiary period. In addition, two cork oak fossil records of Miocene age were found in Portugal, and two of Pliocene age were recovered in Tunisia and Galicia (Spain). An early Cenozoic origin of Q. suber in Iberia therefore seems plausible, followed, at the end of the Miocene, by the colonization of North Africa across the Strait of Gibraltar [16,18,43].

Alternatively, based on fossil records of other oak species of subgenus Cerris dating to the Tertiary and found in the Balkan Peninsula, it has also been considered that Q. suber might have appeared first further east (either in the Balkan Peninsula or in the Middle Eastern-Peri-Caucasian area), in common with the whole Cerris group. It has been suggested that the species expanded westward during the late Miocene and was widespread throughout the Mediterranean Basin during the Pliocene, where it survived thanks to the lack of climatic constraints, before going extinct in the eastern part of its distribution area [27,43]. Data from PCR-RFLPs of cpDNA fragments seem to provide additional support for an eastern origin of cork oak [43].

Glacial and periglacial environments have had a significant effect on the modern vegetation of Europe. It is widely accepted that the climatic oscillations of the Quaternary (i.e., over the past 1.8 million years) are among the most crucial determinants of the current distribution of biota at temperate latitudes. The spatial patterns of several tree species across the European continent are the long-term result of late-glacial and post-glacial migration from refugial populations that were able to withstand the severe climatic conditions of Pleistocene stadials [46-49]. With few exceptions [8,50,51], the glacial refugia postulated for most European woody angiosperms during the coldest periods of the last full glacial (37,000-16,000 years BP, before present) lie south of the 40° N parallel, which runs from central Portugal to Sardinia, Calabria and northern Greece. This parallel is considered to have marked the boundary between polar aridity and warmer climates during part of the Quaternary. The theory that southern Europe (particularly the three southern peninsulas: Balkan, Italian and Iberian) and the Near East provided appropriate conditions for refugia of temperate tree taxa is based on a number of assumptions about the full-glacial environments of those regions and their ability to supply the conditions necessary for growth [52,53].

The original refugial model implied that 'forests' could have survived in these southern locations during the cold stages of the Quaternary. However, extensive tree populations have never been detected. Instead, the traditional palaeogeographical models (although inferred from scarce palynological evidence) suggest a small number of refugia, the “few southern refugia” hypothesis [52,53]. Temperate tree taxa possibly survived in small pockets of microenvironmentally favourable locations, where usually only a few tree taxa are detected, and in low concentrations. Aridity was probably a significant limiting factor for tree growth. Under the “few southern refugia” hypothesis, temperate European tree species are expected to share common patterns of post-glacial colonization, with high levels of diversity in southern Europe decreasing northwards [54-57]. Some of the main European tree species have been analysed using molecular markers, including several Quercus species and Abies alba (European silver fir), and the resulting patterns of diversity correspond to the expected ones [58,59].

However, more complete palaeobotanical data sets [50,53], palaeoclimatic modelling [60] and genetic research [52] are starting to question the paradigm of “few southern refugia” in southern Europe (and in particular in the three southern peninsulas) during full glaciations. Increasing evidence indicates that, during the last full-glacial period, populations of coniferous and some deciduous trees grew much further north and east than previously assumed [53]. In addition, new palaeoclimatic simulations suggest that full-glacial conditions in central and eastern Europe were not nearly as severe as previously thought [60,61]. While some refugia for Mediterranean trees had previously been identified in the Iberian Peninsula, the results of López de Heredia et al. (2007), based on cpDNA PCR-RFLPs and a review of palaeobotanical data, support the presence of multiple refugia for the evergreen oaks within the Iberian Peninsula (e.g. the Cantabrian mountain ranges, south-eastern Spain or even central Spain) during at least the last glacial period [52]. Under the “multiple refugia” hypothesis, tree species that are present nowadays in the north would have recolonized these areas from populations located in the north of the Iberian Peninsula. Moreover, these populations would have acted as barriers preventing expansion from southern refugia. If that was the case, cpDNA data should show complex patterns of spatial distribution resulting from the formation of multiple secondary contact zones [8].

For the last glacial and postglacial periods, palynological data indicate the occurrence of cork oak in south-western Iberia since the Late Glacial (17,000-12,000 years BP) and in North Africa since the early Postglacial (approximately 8,500 years BP) [18,27]. It is accepted that, during the Quaternary glaciations, cork oak may have survived in scattered refugia with favourable microclimatic conditions, from which postglacial colonization occurred over recent millennia. Palynological [21] and molecular data [43,52] indicate a glacial refugium in south-western Iberia, from which the species expanded northwards in the absence of mountain barriers, favoured by the existence of siliceous substrates. The extensive introgression of Q. suber with Q. ilex may also point to several potential refugia in eastern Iberia [52]. RFLP analysis of the whole cpDNA shows a phylogeographical pattern of three groups corresponding to potential glacial refugia in Italy, North Africa and the Iberian Peninsula [43], from which, after the last glaciation, Q. suber may have begun migrating northward into southern France. However, no fossil record supports the molecular data for the Italian and North African refugia.

Reliable scientific evidence confirming the presence of Q. suber in more northern and eastern European countries is lacking. The Tertiary and Quaternary remains (megafossils and pollen) found in several European countries did not allow taxonomic identification at the species level and could be attributed to any Mediterranean oak species of the Cerris group [43,62]. In fact, Q. suber is more thermophilous and has stricter soil requirements than many other Quercus species, so a greater reduction of its range during glacial times is expected. However, the uncertainty of palynological discrimination and the low level of cpDNA variation itself could bias the identification of glacial refugia for Q. suber [52].

1.1.5 Genetic diversity studies

As cork oak is predominantly allogamous, i.e. favouring cross-fertilization, with a life span of 500 years or more and a low replacement rate, it can be expected that, at least in some places and mostly for selectively neutral characters, selection over time may have resulted in reduced genetic differentiation both among trees of the same population and between populations. However, differentiation among populations has been detected around the Mediterranean Basin by investigating both chloroplast and mitochondrial DNA (which are maternally inherited in oaks, as demonstrated by Dumolin et al. [63]) and allozyme variation [16,18,19,67]. Isozyme variation in the genus Quercus also shows that genetic variability is high and similar to that found in conifers [15,65]. One of the main causes of the high polymorphism found in cork oak, as in holm oak, may be the physiological plasticity of these species, which allows them to adapt to the variable and unpredictable climatic conditions characteristic of the Mediterranean region. High levels of diversity are observed within populations; conversely, low inter-population variability indicates that most of the total genetic diversity of the species is found within rather than among populations [15,18,19,23]. According to Elena-Rosselló & Cabrera [15], more than 83% of the total diversity in this species is found within populations, and the decline of kinship estimates with distance suggests that isolation by distance has led to this structure. The results obtained for Q. suber contrast with those found for most temperate forest species, for which a generally weak and narrower within-population structure is the trend [23]. In cork oak, gene flow between populations was estimated at more than one migrant per generation (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007), which under Wright's island model [66] is theoretically enough to prevent genetic drift from causing local genetic differentiation and, therefore, population divergence.
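The two quantities discussed above, the share of diversity found within populations and the number of migrants per generation, can be related with a short sketch. It assumes Nei's partition of gene diversity at a single locus (H_S within populations, H_T in the pooled population, and G_ST = (H_T - H_S)/H_T as an F_ST analogue, with populations weighted equally and no sample-size correction) and the equilibrium approximation of Wright's infinite island model, F_ST ≈ 1/(1 + 4Nm), which further assumes neutrality and equal population sizes. The function names and allele frequencies are hypothetical; this is not the estimator used in the studies cited here.

```python
def gene_diversity(freqs):
    """Nei's gene diversity (expected heterozygosity) at one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs):
    """Return (H_S, H_T, G_ST) for one locus, weighting populations equally.

    pop_freqs[k][i] is the frequency of allele i in population k; every
    population must list the same alleles in the same order.
    No sample-size correction is applied (a simplification of this sketch).
    """
    n = len(pop_freqs)
    h_s = sum(gene_diversity(f) for f in pop_freqs) / n
    # Pooled frequency of each allele = mean over populations.
    pooled = [sum(allele) / n for allele in zip(*pop_freqs)]
    h_t = gene_diversity(pooled)
    return h_s, h_t, (h_t - h_s) / h_t


def fst_from_nm(nm):
    """Equilibrium F_ST under Wright's island model: F_ST ~ 1 / (1 + 4Nm)."""
    return 1.0 / (1.0 + 4.0 * nm)


def nm_from_fst(fst):
    """The same relation solved for Nm (number of migrants per generation)."""
    return (1.0 / fst - 1.0) / 4.0


# Hypothetical biallelic locus scored in two populations.
h_s, h_t, gst = nei_gst([[0.6, 0.4], [0.4, 0.6]])
print(f"H_S={h_s:.2f}  H_T={h_t:.2f}  G_ST={gst:.3f}  within-population share={h_s / h_t:.0%}")
print(f"Nm implied by G_ST: {nm_from_fst(gst):.1f}")
print(f"F_ST for one migrant per generation: {fst_from_nm(1.0):.2f}")
```

Under these assumptions one migrant per generation corresponds to F_ST ≈ 0.2, and the within-population share H_S/H_T equals 1 - G_ST. Indirect Nm estimates of this kind are sensitive to departures from the island model and are best read as rough orders of magnitude.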

PCR-RFLPs over specific cpDNA fragments illustrate a complex pattern of variation in the evergreen oaks [19,41]. Jiménez et al. [41] detected three very distinct lineages of cpDNA haplotypes, two of them being present in cork oak. One of the lineages, the “suber” lineage, is specific to cork oak populations and may be considered as the original and most widely distributed lineage in this species. The partial geographical distribution of this lineage was reported by López de Heredia et al. [64], from peninsular Italy, Sardinia, Sicily, , northern Africa and the island of Minorca. Cork oak populations from the Spanish mainland and from the island of Majorca were characterized by another maternal lineage also shared with Q. ilex and Q. coccifera, the “ilex-coccifera I” lineage [41,64]. This fact was interpreted as the result of multiple and mainly unidirectional cytoplasmic introgression of Q. suber by Q. ilex.

RFLP analysis of the whole chloroplast DNA was used by Lumaret et al. [43], for the first time in Q. suber, to analyse phylogeographical variation over the whole species range (Fig. 1.3). The chlorotypes showed a clear phylogeographical pattern of three groups corresponding to potential glacial refugia in Italy, North Africa and the Iberian Peninsula. The most ancestral and the most recent groups were observed in populations located in the eastern and western parts of the species range, respectively. Unrelated chlorotypes of an 'ilex' cpDNA lineage were also identified in specific western populations [43]. Starting from the 'ilex' cpDNA variants acquired through interspecific introgression, further successive cpDNA changes may have occurred in Q. suber, so two distinct cpDNA lineages were predicted in cork oak. A particular chlorotype, S1, observed predominantly in continental Italy and Sicily, was identified by Lumaret et al. [43] in a few populations from Sardinia and from Corsica, which also shared a rare chlorotype, S7, with Tunisia. This situation possibly reflects rare natural events of long-distance dispersal from several geographical sources located in the areas closest to those islands. Moreover, intentional transport of acorns by people for economic purposes cannot be ruled out, and its impact on the geographical patterns of cork oak genetic variation should not be underestimated [43]. López-de-Heredia et al. [64] also proposed long-distance dispersal events to explain the sharing of a rare chlorotype by cork oak populations in Minorca and Sardinia.

Figure 1.3: Geographical distribution of the eight and six chlorotypes of the 'suber' and 'ilex' lineages, respectively, identified in Q. suber populations by Lumaret et al. [43]. Chlorotypes were scored by RFLP variation over the whole cpDNA molecule. The identity of the sampled populations and of the cpDNA chlorotypes, as well as their affiliation to the 'suber' or 'ilex' cpDNA lineages, are indicated in the figure. Source: Lumaret et al. [43].

Using cpDNA microsatellites, Magri et al. [16] analysed cork oak populations throughout the species' distribution range and found strong geographical structure, with five distinct haplotypes (Fig. 1.4). H3 (North Africa-Sardinia-Corsica-Provence) and H4 (Portugal-western Spain-south-western France-northern Morocco) were assumed to be the ancestral Q. suber haplotypes, with H1 and H2 (Italy) and H5 (scattered populations) originating through ancient or recent introgression with Q. cerris (H1 and H2) and Q. ilex (H5). In addition, the cpDNA SSR data, combined with palaeobotanical and geodynamic models, suggested that the distribution of cork oak populations is geographically consistent with the Oligocene and Miocene break-up of the European-Iberian continental margin, and that populations persisted on some of the separate microplates currently found in Tunisia, Sardinia, Corsica and Provence [16] (Fig. 1.5). All these events seem to have occurred without detectable cpDNA changes over a time span of more than 15 million years.

Figure 1.4: Distribution of cpDNA haplotypes found by Magri et al. [16] with cpDNA SSRs, and phylogenetic reconstruction of the relationships between haplotypes. The black circle in the network indicates a hypothesized mutation, which is required to connect existing haplotypes within the network under maximum parsimony. The grey area corresponds to the current distribution of Q. suber. Source: Magri et al. [16].
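Haplotype networks such as the one in Fig. 1.4 connect cpSSR haplotypes according to the number of mutational steps between them. The sketch below is a deliberately simplified stand-in: it assumes a stepwise mutation model (distance = sum over loci of the absolute difference in repeat number), builds a minimum spanning tree with Prim's algorithm, and flags edges longer than one step, which imply unsampled intermediate haplotypes comparable to the black circle in the figure. A minimum spanning tree is not the maximum-parsimony reconstruction used by Magri et al. [16], and the haplotype names and repeat numbers are hypothetical.

```python
def step_distance(h1, h2):
    """Mutational steps between two cpSSR haplotypes under a stepwise model:
    sum over loci of the absolute difference in repeat number."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def minimum_spanning_network(haplotypes):
    """Prim's algorithm on pairwise step distances.

    haplotypes: dict name -> tuple of repeat numbers (one per locus).
    Returns (connected, new, steps) edges in the order they were added.
    """
    names = list(haplotypes)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Shortest edge from the growing tree to any haplotype outside it;
        # ties, if any, are broken by iteration order (arbitrary for sets).
        edge = min(
            ((a, b, step_distance(haplotypes[a], haplotypes[b]))
             for a in in_tree for b in names if b not in in_tree),
            key=lambda e: e[2],
        )
        edges.append(edge)
        in_tree.add(edge[1])
    return edges


# Hypothetical repeat numbers at three chloroplast microsatellite loci.
toy_haplotypes = {
    "A": (10, 7, 12),
    "B": (11, 7, 12),
    "C": (11, 8, 12),
    "D": (12, 9, 12),
}
edges = minimum_spanning_network(toy_haplotypes)
for a, b, steps in edges:
    note = f" ({steps - 1} unsampled intermediate(s))" if steps > 1 else ""
    print(f"{a} - {b}: {steps} step(s){note}")
```

Here haplotypes C and D differ by two steps, so the network implies one unsampled intermediate between them; median-joining or statistical-parsimony methods would place such intermediates explicitly.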

The modern history of Quercus suber is closely linked to human use of its cork. For this reason, humans have been considered responsible for a reduction in genetic variation in some cork oak stands, as well as for hybridization with congeners [67]. Other cultivated tree species in the Mediterranean area display a similarly low geographical structure of genetic variation, suggesting multidirectional diffusion due to human activity. For example, in , the low geographical structure of chloroplast genetic diversity may be explained by a strong human impact [67,68]. However, the geographical distribution of the cork oak haplotypes found by Magri et al. [16] does not appear to be related to cultivation. In fact, fossil pollen and wood records suggest that cork oak was distributed in approximately the same areas as today even before the Neolithic.

Another possible explanation for these results is postglacial population expansion from the potential glacial refugia in Italy, North Africa and the Iberian Peninsula [43].

Some studies have also assessed the genetic variability of cork oak populations in Portugal. Coelho et al. [22] used AFLP markers and reported low levels of differentiation among cork oak populations. The reasons given are the outcrossing nature of the species, long-distance wind pollination and occasional secondary acorn dispersal by animals, which lead to extensive gene flow and increased homogeneity of allele frequencies among populations [22,45]. The population differentiation reported by Coelho et al. [22] (FST = 0.0172) is below the average of 0.07–0.09 expected for long-lived, wind-pollinated woody species. These results are similar to those obtained by Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007) with nuclear SSRs (FST = 0.02), confirming the absence of population structure. This pattern of genetic differentiation within Portuguese cork oak stands, some located as far as 700 km apart, may be explained by anthropogenic pressure in addition to constant gene flow. This study shows that 90% of the polymorphic markers identified in cork oak genotypes are uniformly distributed across the populations of , and Trás-os-Montes regions.

Figure 1.5: Reconstructions of the western Mediterranean palaeogeography and possible location of Quercus suber haplotypes found by Magri et al. [16] (colours as in Fig. 1.4). Continental microterranes rifted off the European-Iberian continental margin: Rif (R), Betic range (B), Balearics (Ba), Kabylies (Ka), Corsica (Co), Sardinia (Sa), Calabria (Ca). Source: Magri et al. [16].

1.1.6 Hybridization and cytoplasmatic introgression

Capture of unexpected chloroplast haplotypes through hybridization and introgression has been proposed as the most likely explanation for the sharing of cytoplasmic genes in both deciduous and evergreen oaks [69], as well as in other taxa [41,70]. Q. suber has been reported to hybridize with several species of the evergreen oak group, particularly holm oak [17,45], and this is regarded as one factor contributing to increased genetic diversity in cork oak [22]. Q. suber and Q. ilex have overlapping geographical distributions [17], and hybridization occurs in nature, although not frequently [45]. Nevertheless, these species are not very closely related, as shown by both cytoplasmic and nuclear genetic analyses [19,27,69], and belong to subgenera Cerris and Sclerophyllodrys, respectively [26], although a more recent classification places both species within the same Eurasian Cerris group [30].

The two most easily recognizable oak hybrids are Quercus x crenata (Q. cerris x Q. suber) and Quercus x morisii (Q. suber x Q. ilex) [27]. Note that these relatively rare hybrids (0.3%) are found only where both parental species co-occur. Mature hybrid individuals are easily recognized by their morphological traits, intermediate between those of the two parental oaks [27], but seedlings and even juvenile trees are morphologically very similar, so that in mixed stands species identification is usually very difficult or even impossible until the adult stage [71]. Asymmetric hybridization has been confirmed by Boavida et al. [45], who described post-pollination barriers in Q. suber to interspecific crosses with Q. ilex, Q. coccifera, Q. faginea and Q. robur. The cross between Q. ilex and Q. suber shows evidence of unidirectional compatibility: a higher success rate was reported for interspecific crosses in which Q. suber acts as pollen donor rather than as female parent, owing to differential pollen-tube growth in the two species [45]. In addition, since both species are protandrous and Q. ilex flowers earlier, early cork oak male flowers can pollinate late holm oak female flowers, whereas the reverse does not usually occur.

By analysing polymorphism at allozyme loci and DNA markers whose alleles are distinct in the two species where they grow in separate areas (diagnostic markers), evidence was obtained for the occurrence of hybrids and of genetic introgression (backcrosses between hybrids and parental species) between sympatric holm oak (female) and cork oak (male) in several locations [18,43,71]. Further evidence indicated that, in initial hybridization and in backcrosses, Q. ilex is predominantly, but not exclusively, the maternal species. This interpretation is supported by the discovery of “ilex-coccifera I” haplotypes (a chlorotype shared by Q. ilex and Q. coccifera) in Q. suber individuals, and by the absence of the opposite situation, i.e. no Q. suber haplotypes have been found within the Q. ilex pool [41].
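How diagnostic markers separate parental, hybrid and backcross genotypes can be illustrated with a minimal sketch. It assumes loci at which one allele is fixed in Q. suber and any other allele is taken to come from Q. ilex, and computes two simple summaries per tree: a hybrid index (proportion of suber-derived alleles) and interspecific heterozygosity (proportion of loci carrying one allele from each species). Under Mendelian inheritance an F1 is expected to score 0.5 and 1.0, and a first-generation backcross to Q. suber 0.75 and about 0.5 on average. The locus names, allele labels and genotypes are hypothetical, and published studies typically rely on more loci and likelihood-based assignment rather than this simple classification.

```python
def hybrid_index(genotype, diagnostic):
    """Proportion of alleles derived from species A across diagnostic loci.

    genotype:   dict locus -> (allele1, allele2) for one diploid tree.
    diagnostic: dict locus -> allele fixed in species A; any other allele at
                that locus is assumed to come from species B.
    Returns 0.0 for a pure species-B genotype and 1.0 for pure species A.
    """
    scored = [locus for locus in diagnostic if locus in genotype]
    if not scored:
        raise ValueError("no diagnostic loci scored")
    a_alleles = sum(
        allele == diagnostic[locus] for locus in scored for allele in genotype[locus]
    )
    return a_alleles / (2 * len(scored))


def interspecific_heterozygosity(genotype, diagnostic):
    """Fraction of scored diagnostic loci carrying one allele from each species."""
    scored = [locus for locus in diagnostic if locus in genotype]
    if not scored:
        raise ValueError("no diagnostic loci scored")
    het = sum(
        sum(allele == diagnostic[locus] for allele in genotype[locus]) == 1
        for locus in scored
    )
    return het / len(scored)


# Hypothetical diagnostic loci: allele "s" fixed in Q. suber; any other allele
# (here "i") is assumed to come from Q. ilex.
diagnostic = {"L1": "s", "L2": "s", "L3": "s", "L4": "s"}
f1 = {locus: ("s", "i") for locus in diagnostic}
backcross = {"L1": ("s", "s"), "L2": ("s", "i"), "L3": ("s", "s"), "L4": ("i", "s")}
for label, tree in (("F1-like", f1), ("backcross-like", backcross)):
    print(label, hybrid_index(tree, diagnostic), interspecific_heterozygosity(tree, diagnostic))
```

With only a few loci the expected classes overlap, which is one reason why trees of intermediate morphology are usually confirmed with several independent diagnostic markers.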

Hybridization and introgression can lead to the total replacement of the Q. suber chlorotype by the “ilex-coccifera I” lineage (a chlorotype shared by Q. ilex and Q. coccifera) [41,64]. This situation is common in eastern Spain, where siliceous soils are scarce and the effective population size is lower than in the continuous forests of western Iberia. It has been suggested, on the basis of the differences between the Q. ilex and Q. suber chlorotypes found in sympatric populations, that hybridization and introgression in these populations may be ancient [43]. Therefore, as reported by López de Heredia et al. [52], it cannot be ruled out that some populations in the eastern range of the species withstood glacial conditions by hybridizing with Q. ilex. For instance, a particular chlorotype found by these authors (named “c66”) predominates in all Q. suber populations from Catalonia (north-eastern Spain), while being very rare in Q. ilex.

Both López de Heredia et al. [64] and Lumaret et al. [43] reported that no cork oak populations in the eastern Mediterranean range of the species possess 'ilex' chlorotypes. However, in a Corsican population, one of the 50 cork oak individuals scored for cpDNA RFLPs possessed an 'ilex' chlorotype [43]. This suggests that cytoplasmic introgression of Q. suber by Q. ilex does occur in the eastern range, although apparently much less commonly. A substantial number of trees with morphology intermediate between the two species have also been observed in south-eastern continental Italy [27], Sardinia and Provence [42]. These trees predominantly possess an 'ilex' chlorotype, and for many of them a hybrid origin was confirmed using nuclear interspecific diagnostic markers. Interspecific hybridization is therefore likely to have occurred quite frequently in the eastern part of the range of Q. suber as well [43].

1.2 Molecular markers in phylogeography

The use of molecular markers has revolutionized research fields such as conservation biology, population biology and ecology. Markers provide a means of observing otherwise hidden aspects of natural history, whether these involve population-level interactions on ecological timescales or the evolutionary relationships of genes, populations and taxa [10]. As stated before, phylogeographic studies are scarce in plants compared with animals. One of the major problems is finding genetic variation suitable for this type of analysis: it has been difficult to find plant markers with a resolving power comparable to that of animal mitochondrial DNA [7,10]. To address this problem, and to inform the choice of molecular markers for this study, the literature on plant genomes and on the available molecular markers is reviewed below.

Plant cells contain three genomes: the nuclear genome and two cytoplasmic genomes, namely mitochondrial and chloroplastidial DNA. The two cytoplasmic genomes are of endosymbiotic origin and have lost various genes to the nucleus over time (and, sometimes, vice versa). Because of their prokaryotic ancestry, these organelle genomes resemble animal mtDNA in three respects: overall structure (closed-circle chromosomes), replication mode (with large populations of molecules per cell) and non-Mendelian inheritance. However, they also differ from animal mtDNA, and from one another, in some important molecular and evolutionary aspects [10,72].

1.2.1 Mitochondrial DNA (mtDNA)

Although the phylogeographic studies in animals rely heavily on the mitochondrial genome, in plants several characteristics make it poorly suited for these studies [7,11].

Plant mtDNA is highly variable in size across species, ranging from about 20 kilobases (kb) to 2500 kb. Inheritance is often, but not always, maternal. Plant mtDNA evolves rapidly with respect to gene order, and gene rearrangements are common. Surprisingly, however, its primary nucleotide sequence evolves slowly, about 100 times more slowly than in animals. As a result, individual loci rarely contain enough variation to generate an intraspecific phylogeographic signal. In these respects the evolutionary dynamics of plant and animal mtDNA differ greatly, and informative variation must be sought in other genomes [7,10].

1.2.2 Chloroplastidial DNA (cpDNA)

Although structurally similar to mtDNA, plant cpDNA follows different evolutionary rules. It varies moderately in size among species (from about 120 to 217 kb), and much of this variation is attributable to the extent of sequence repetition in a large inverted repeat region. The molecule contains about 120 genes coding for ribosomal and transfer RNAs and for several polypeptides involved in protein synthesis and photosynthesis. The chloroplastidial genome is transmitted maternally in most species, biparentally in some and paternally in others (notably most gymnosperms). It tends to evolve somewhat slowly both in gene arrangement and in primary nucleotide sequence: about 3 to 4 times faster than plant mtDNA, but still much more slowly than animal mtDNA. For this latter reason, cpDNA sequences have proven especially useful for estimating phylogenetic relationships in plants [7,10].

Intraspecific variation has been reported in a growing number of species, and almost all published plant phylogeographic studies have relied on the chloroplast genome as their only source of genetic variation [7,10]. Most of this variation has been revealed by restriction enzyme digestion of cpDNA (RFLP technique), in which genetic variants reflect the gain or loss of restriction sites or length variation [73]. A more recent restriction enzyme-based approach (PCR-RFLP) involves digesting PCR-amplified chloroplast loci to reveal restriction fragment length polymorphisms within the amplified fragment [52,59]. These readily accessible laboratory techniques allow large portions of the chloroplast genome to be surveyed in numerous individuals.

At least 50% of all cpDNA variation is believed to be attributable to small insertion/deletion mutations. However, concerns about the homology of length variants associated with simple-sequence repeat (SSR) polymorphisms need to be addressed before this technique can be widely applied to construct useful gene trees [7]. Ultimately, direct knowledge of cpDNA sequences would be most desirable for gene tree construction. Unlike restriction enzyme analyses, however, direct sequencing of cpDNA loci has so far seldom revealed enough variation for phylogeographic analysis. When searching for cpDNA loci with useful levels of sequence variation, two points must be considered: the mutation rate varies among regions of the chloroplast genome, and non-coding regions are more prone to mutation. Several small regions of the chloroplast genome (such as some intergenic spacers) therefore show potential for phylogeographic analysis [59,74,75].
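The logic of (PCR-)RFLP scoring can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a protocol from the cited studies. The amplicon is treated as linear, and only the top strand is considered. EcoRI (which cuts G^AATTC) serves purely as a familiar example enzyme, and the sequences are invented. The example shows how a single substitution inside the recognition site removes the cut and changes the fragment pattern seen on a gel.

```python
def digest(amplicon, site, cut_offset):
    """Return fragment lengths (bp) after cutting a linear amplicon.

    site: recognition sequence, e.g. "GAATTC" for EcoRI (illustrative choice).
    cut_offset: top-strand cut position within the site (G^AATTC -> 1).
    """
    amplicon, site = amplicon.upper(), site.upper()
    cuts = []
    pos = amplicon.find(site)
    while pos != -1:                      # record every occurrence of the site
        cuts.append(pos + cut_offset)
        pos = amplicon.find(site, pos + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical alleles: the mutant differs by one substitution inside the site.
wild_type = "AAAA" + "GAATTC" + "TTTT"
mutant = "AAAA" + "GACTTC" + "TTTT"
print(digest(wild_type, "GAATTC", 1))   # [5, 9]: two fragments
print(digest(mutant, "GAATTC", 1))      # [14]: site lost, amplicon uncut
```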

Previous attempts indicate that single cpDNA loci are only occasionally useful at the intraspecific level. However, technology is progressing and sequencing longer DNA fragments is becoming routine. Diligent sequencing efforts are therefore likely to uncover sufficient genetic variation, allowing studies to exploit more of the variation potentially available in the chloroplast and, ultimately, to achieve finer phylogeographic resolution [7,10]. Indeed, a few studies have already shown intergenic spacer regions to be useful for direct sequencing. Examples include trnT-L-F in Ficus carica [76][74], trnH-psbA in Eucalyptus perriniana [77][78] and the psbC-trnS intergenic spacer region [79] in several Quercus species [75].

1.2.3 Nuclear DNA (nuDNA)

The remaining alternative is the nuclear genome. It is still largely unexplored, but it offers a potentially inexhaustible source of informative genetic variation, and many investigators have recently been developing techniques and strategies to locate and efficiently sample appropriate variation in nuclear DNA.

The ITS region, although useful for plant systematics, is generally not very helpful for phylogeographic studies, for three reasons. First, in many of the species examined, no intraspecific variation has been detected in this region. Second, as part of a multicopy gene family, the ITS region is subject to poorly understood processes of concerted evolution, which may complicate the interpretation of sequence polymorphism at the intraspecific level. Third, when a locus is part of a multicopy gene or multigene family, PCR amplification with conserved primers may produce multiple fragments, including duplicated gene copies, pseudogenes and even recombinant PCR artifacts. Care is thus necessary to avoid comparing paralogous loci. Such loci may be especially difficult to detect where gene copies have been differentially homogenized among populations [7,80].


In principle, single-copy nuclear (scn) genes should also provide sufficient sequence data for phylogeographic assessments at the intraspecific level. However, three technical and biological obstacles must be considered:

1- the slow rate of sequence evolution at many nuclear loci;
2- in diploid organisms, the difficulty of isolating alleles one at a time;
3- intragenic recombination.

Nonetheless, scnDNA has been employed successfully in some phylogeographic assessments, and some of the most informative results have come from intron sequences of protein-coding genes [10,81]. So far, however, no single locus has proved universally useful across plant species.

Other features of the nuclear genome must also be taken into account in phylogeographic analysis. One is the set of complications involving interallelic recombination and heterozygosity: crossing-over between alleles of a locus produces recombinant, chimeric haplotypes. Another is that the homology of the loci in use needs to be ensured. Some (and probably many) 'single-copy' nuclear genes exist as part of small gene families consisting of two to ten expressed loci, possibly with additional pseudogenes [7,10,80,81].

Despite all of these potential problems, the nuclear genome is still perhaps the most dynamic and useful source of markers for studying plant phylogeography. It is much larger than the other genomes, and it holds most of the information that underlies how individuals are shaped by, and adapt to, their environment.

1.2.4 Simple Sequence Repeats (SSRs)

Microsatellites (or simple sequence repeats, SSRs) are short nucleotide motifs, typically 1 to 5 base pairs (bp) long, repeated in tandem up to about 60 times. They are widespread in both eukaryotic and prokaryotic genomes [82,83]. Traditional molecular markers are less accurate at estimating genetic differences between taxa and have insufficient statistical power, which led researchers to look for better alternatives such as microsatellites. Microsatellites have a set of characteristics that make them the markers of choice for many studies [82,84,85]. They are:

1- PCR-based;
2- co-dominant;
3- usually multiallelic and highly variable;
4- randomly dispersed throughout the genome;
5- easily scored by different methods.

Neutral nuclear SSRs (nuSSRs) are the markers of choice for diversity analysis, genetic mapping and association studies [86].

The use of microsatellites as polymorphic DNA markers has increased considerably over the years. Although they were originally developed for research in humans, they have been used extensively for genetic analysis in all classes of organisms, including plants [82,85]. With the development of other genetic markers, such as single nucleotide polymorphisms (SNPs) and amplified fragment length polymorphisms (AFLPs), the use of microsatellites was expected to decline. However, recent research has improved their application to such an extent that microsatellites will probably remain important genetic markers in various biological disciplines in the near future [11,82]. The initial cost of developing microsatellites may be high because sequence information is required, but once developed they can easily be maintained and shared between laboratories. Their ease of use, high reproducibility, low cost and abundance in living organisms make SSR loci ideal markers for genetic analysis. They are also multiallelic and generally show high heterozygosity and high mutation rates (from 10⁻⁶ to 10⁻² events per locus per generation). These properties can make them more informative than other markers such as random amplified polymorphic DNA (RAPDs) and AFLPs [82,85].

For Quercus suber in particular, the only cork oak-specific nuSSRs were developed by Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007). As a rule, SSRs are species-specific markers that must be developed de novo for each species, mainly because they usually occur in non-coding regions of the genome, which are not highly conserved. However, some studies [87,88] have shown that nuSSRs from other oak species transfer to cork oak, which potentially reduces the need to develop species-specific nuSSRs for this species.

1.2.5 Expressed Sequence Tags (ESTs)

Expressed Sequence Tags (ESTs) can serve as a source of molecular markers in the form of gene sequences, SSRs or SNPs, and are an easy way to access fragments of the transcriptome. They are short (200 to 800 bases), randomly selected sequences derived from cDNA libraries. Even when ESTs are not available for the organism under study, EST collections can serve as a bridge between the genomic resources of model organisms and diverse species of interest, usually non-model organisms. ESTs provide information on the mRNA populations transcribed within a given set of tissues, developmental stages, environmental conditions and genotypes [89,90]. For instance, direct sequencing of EST fragments followed by SNP detection would be a particularly useful way of studying the geographical distribution of genetic variation within species. Many ESTs correspond to genes of known function, and some of these may be directly involved in the genetic control of adaptive traits. ESTs are therefore genetic markers with real potential for detecting adaptive genetic diversity [90].

As an alternative to the conventional strategy for detecting anonymous SSRs, large numbers of novel SSRs can be isolated with comparatively little effort simply by in silico mining of EST databases [91,92]. This approach has become routine for some species. EST-SSRs (EST-derived SSRs) have several characteristics that make them valuable as genetic markers [91,93]:

1- they are abundant;
2- they show high levels of polymorphism compared with many other types of genetic markers;
3- they are inherited co-dominantly;
4- they are repeatable and clear to score;
5- they transfer well across related species.

Perhaps the greatest concern about the use of EST-SSRs in population genetic analysis is that selection acting on these loci might bias the estimation of population parameters. Indeed, divergent selection will increase differentiation among populations and reduce variability within them, whereas balancing selection is expected to have the opposite effect. However, large-scale comparative analyses suggest that only a very small percentage of all genes experience positive selection [91,93], although some fraction of EST-SSRs will inevitably be subject to selection.
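In silico mining of EST databases for SSRs essentially amounts to scanning sequences for perfect tandem repeats. The sketch below does this with a regular-expression backreference. The minimum repeat numbers in `MIN_REPEATS` are hypothetical values chosen for illustration. Real mining tools, and the thresholds used in [91,92], may differ, for instance by also reporting compound or imperfect repeats.

```python
import re

# HYPOTHETICAL minimum repeat numbers per motif length (1-6 bp), for illustration only.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AGAG', 'AA')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    sequence = sequence.upper()
    hits = []
    for k, n in min_repeats.items():
        # a k-bp unit followed by at least n-1 further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(sequence):
            motif = m.group(1)
            if is_primitive(motif):        # report each repeat at its shortest period
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


est = "TT" + "AG" * 6 + "CC"  # invented EST fragment
print(find_ssrs(est))         # [(2, 'AG', 6)]
```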

Recently, a large number of ESTs has been generated for oaks, and particularly for cork oak. Since ESTs are conserved gene sequences, primers designed for one species are likely to work well in related ones.

Ultimately, the potential of phylogeography may be fully accomplished when multiple loci are considered. The combined analysis of different marker types should allow a reconstruction of past population events in great detail, and also help understand their spatial structure and the dynamics of genetic diversity.


2. Materials and Methods

2.1 Sampling and DNA extraction

Sampling of 26 natural populations was performed across the entire Mediterranean distribution of the species (Fig. 1.2 and Table 2.1). In Portugal, the following locations were surveyed: Gerês, Serra da Estrela, Serra de São Mamede, Serra da Arrábida, Serra de Monchique, Serra do Buçaco, Azeitão and Serra de Sintra. Stands were considered natural populations when they consisted of irregularly spaced trees over 50 years old.

The remaining populations were sampled at Herdade Monte da Fava (Ermidas do Sado):

- Portugal: São Brás de Alportel;
- Spain: Cataluña, Montes de Toledo, Haza del Lino, Sierra de Aracena, Sierra Morena, Sierra de Guadarrama;
- Italy: Puglia, Lazio and Sicily;
- France: Var, Landes and Corsica;
- Algeria: Forêt des Guerbès;
- Tunisia: Mekna and Fermana;
- Morocco: Taza and Kenitra.

Herdade Monte da Fava harbours an international cork oak provenance trial established in 1998 within the framework of the EUFORGEN Q. suber network, covering the complete distribution range of the species. Access to these populations was kindly provided by Helena Almeida from Instituto Superior de Agronomia. From each population, 3 to 5 trees were sampled for the cpDNA and nuDNA fragment analyses. Young leaves were collected from spring 2009 to summer 2010 from a total of 119 adult trees distributed among the 26 sampled populations (Table 2.1).

Of the 26 populations, 13 were selected for wider sampling in the SSR study. The selected locations are representative of the entire Mediterranean distribution:

- Portugal: Gerês, Serra da Estrela, Serra da Arrábida, Serra de Monchique, Serra do Buçaco and Serra de Sintra;
- Spain: Cataluña and Haza del Lino;
- Italy: Puglia;
- Algeria: Forêt des Guerbès;
- Tunisia: Mekna;
- Morocco: Taza and Kenitra.

From each population, 22 to 32 trees were sampled. Young leaves were again collected from spring 2009 to summer 2010, from a total of 379 adult trees distributed among the 13 sampled populations (Table 2.1).

Several other Quercus species were also sampled from natural populations: Q. robur, Q. pyrenaica, Q. faginea, Q. rubra, Q. lusitanica, Q. canariensis, Q. cerris, Q. ilex (subsp. rotundifolia and subsp. ilex) and Q. coccifera. These samples were used to help determine the Q. suber lineages and to establish the phylogenetic relationships of these lineages more accurately. According to the taxonomic classification of Schwartz [26]:

- Q. cerris belongs, together with Q. suber, to the subgenus Cerris;
- Q. petraea, Q. robur, Q. pyrenaica, Q. faginea and Q. lusitanica belong to the subgenus Quercus;
- Q. coccifera and Q. ilex belong to the subgenus Sclerophyllodrys;
- Q. rubra belongs to the subgenus Erythrobalanus.

Castanea crenata was used as an outgroup (Table 2.1). The species identity of each tree was checked on the basis of leaf morphology, assessed during the growing season on fully elongated leaves, and, for Q. suber, on the presence of cork bark.

Leaves were ground thoroughly in liquid nitrogen with a mortar and pestle, and genomic DNA was then extracted following Qiagen's protocol for the DNeasy Plant Mini Kit (Qiagen). DNA integrity was checked by electrophoresis on 1% w/v agarose gels stained with RedSafe 20,000x (iNtRON Biotechnology).

2.2 DNA sequencing

Polymerase chain reaction (PCR) amplifications of three chloroplastidial DNA intergenic spacer regions (TrnL-F [74], TrnS-PsbC [79] and TrnH-PsbA [78]) were performed for 148 samples: 147 Quercus samples and the Castanea outgroup (Table 2.1). On the basis of preliminary results of the cpDNA fragment analysis, 104 of these 148 individuals were selected for amplification of one nuclear DNA fragment, the Expressed Sequence Tag (EST) 2T13 [94], an osmotic stress-related gene (Table 2.1). Each fragment was amplified with the primers described by the respective authors (Supporting Information 1 – Table S1.1 and Table S1.2).

To confirm Quercus species identity and to assess the usefulness of barcodes as phylogeographic markers, the official cpDNA barcode fragments (matK and rbcL) were amplified. The primers were those described by Cuénoud et al. [95] and Kress & Erickson [32], respectively (Supporting Information 1 – Table S1.1). Three individuals of each cork oak lineage identified in the previous analysis of cpDNA regions, and one individual of each other Quercus species, were selected for this analysis.

PCRs were performed in a final volume of 25 μL, with 1 μL of DNA (50–100 ng), 1x PCR buffer (Promega), 1 U Taq polymerase (Promega), 2.0 mM MgCl2, 0.12 mM dNTPs and 0.4 µM of each primer. The PCR amplification conditions were as follows:

- initial denaturation at 94 °C for 5 min;
- 30 cycles of denaturation at 94 °C for 20 s, annealing for 30 s (65 °C for the intergenic cpDNA fragments, 55 °C for the nuclear candidate gene and barcode fragments) and extension at 72 °C for 40 s;
- final extension at 72 °C for 7 min.

PCR reagents and amplification conditions were the same for all oak species.
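Preparing the 25 μL reactions described above from stock solutions is a C1V1 = C2V2 calculation. The sketch below takes its final concentrations from the text. The stock concentrations are hypothetical assumptions, since the protocol does not state them: 5x buffer, 25 mM MgCl2, 10 mM dNTPs, 10 µM primers and Taq at 5 U/µL.

```python
from dataclasses import dataclass


@dataclass
class Reagent:
    name: str
    final: float  # final concentration in the reaction (from the text)
    stock: float  # stock concentration in the same unit: HYPOTHETICAL value


def reaction_volumes(reagents, total_ul=25.0, template_ul=1.0,
                     taq_units=1.0, taq_stock_u_per_ul=5.0):
    """Per-reaction pipetting volumes (uL) from C1*V1 = C2*V2; water makes up the rest."""
    volumes = {r.name: total_ul * r.final / r.stock for r in reagents}
    volumes["Taq polymerase"] = taq_units / taq_stock_u_per_ul  # assumed 5 U/uL stock
    volumes["template DNA"] = template_ul
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("reagent volumes exceed the reaction volume")
    volumes["water"] = water
    return volumes


# Final concentrations from the cpDNA/nuDNA protocol; stock values are assumptions.
CP_PROTOCOL = [
    Reagent("PCR buffer (x)", 1.0, 5.0),
    Reagent("MgCl2 (mM)", 2.0, 25.0),
    Reagent("dNTPs (mM)", 0.12, 10.0),
    Reagent("forward primer (uM)", 0.4, 10.0),
    Reagent("reverse primer (uM)", 0.4, 10.0),
]

for name, vol in reaction_volumes(CP_PROTOCOL).items():
    print(f"{name:22s}{vol:6.2f} uL")
```

To prepare a master mix, multiply each volume except the template by the number of reactions plus a small overage.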

Amplification products were verified on 1% w/v agarose gels stained with RedSafe 20,000x (iNtRON Biotechnology), alongside the molecular weight marker HyperLadder™ IV (Bioline). Amplicons were purified using SureClean (Bioline).

Two further nuclear EST fragments were also tested: Phyt B (phytochrome B, involved in flowering phenology) [96] and Cons 58 (auxin-repressed protein) [97]. The primers were those described by the respective authors (Supporting Information 1 – Table S1.2). However, despite several optimization attempts, no amplification product was obtained.

Sequencing reactions were carried out using BigDye v3.1 chemistry (Applied Biosystems, ABI) on an ABI PRISM 310 automated sequencer. Amplicons were sequenced in both directions under the following cycling conditions:

- initial denaturation at 96 ºC for 1 min;
- 25 cycles of 96 ºC for 10 s and annealing at 50 ºC for 5 s;
- final extension at 60 ºC for 4 min.

The sequencing products were purified by ethanol precipitation as follows:

1- The total reaction volume was transferred to a 1.5 ml tube containing 1 μl of 3 M sodium acetate and 25 μl of absolute ethanol.
2- The mixture was incubated on ice for 30 min and then centrifuged at 10,000 g for 25 min.
3- The supernatant was discarded, 300 μl of 70% ethanol were added to each tube, and the tube was centrifuged for 15 min at the same speed. This washing step was performed twice.
4- Finally, the supernatant was completely discarded and the samples were air-dried in the dark until further processing.

The products were sequenced in an ABI PRISM® 310 Genetic Analyzer (Applied Biosystems, USA) available in the laboratory.

Chromatograms were checked manually for errors in SEQUENCHER v4.0.5 (Gene Codes Co.). For the nuclear fragment, double peaks of similar height at the same position in the chromatogram were considered evidence of potentially heterozygous sites. These sites were coded with the IUPAC ambiguity codes for subsequent analyses.
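The coding of heterozygous sites can be illustrated with a small helper that maps two co-occurring bases to their IUPAC ambiguity code. The two-base codes (R, Y, S, W, K, M) are standard. The text gives no numerical threshold for deciding that a secondary peak is "of similar size", so `min_ratio` is a hypothetical parameter. The peak heights in the example are invented.

```python
# Standard IUPAC nucleotide ambiguity codes for two-base (heterozygous) sites
IUPAC_2 = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_base(peaks, min_ratio=0.5):
    """Call one chromatogram position from {base: peak_height} (upper-case bases).

    min_ratio is a HYPOTHETICAL threshold: a secondary peak at least this
    fraction of the primary peak is treated as a second allele.
    """
    ranked = sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)
    (b1, h1), rest = ranked[0], ranked[1:]
    if rest and rest[0][1] >= min_ratio * h1:
        return IUPAC_2[frozenset((b1, rest[0][0]))]
    return b1


print(call_base({"A": 950, "G": 880}))  # R: two peaks of similar height
print(call_base({"C": 1000, "T": 90}))  # C: minor noise peak ignored
```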

BLAST (Basic Local Alignment Search Tool) searches against the NCBI database (http://blast.ncbi.nlm.nih.gov) were always performed to confirm the identity of the fragments. The matK sequence for Quercus crenata was retrieved from GenBank (accession number FN675334, [29]).

Table 2.1: Description of the sampled populations of each species and sample size for each marker (cpDNA and nuDNA sequences, and SSRs); "-" indicates that the marker was not analysed.

Species           Country   Site                            Code  Lat       Long      cpDNA  nuDNA  SSRs
Q. suber          Portugal  Azeitão                         AZT   38º 30'N  9º 02'W   5      3      -
                            Gerês                           GER   41º 40'N  8º 10'W   5      3      29
                            Serra de Monchique              MON   37º 19'N  8º 34'W   5      3      29
                            Serra da Arrábida               ARR   38º 50'N  9º 03'W   5      4      30
                            Serra de São Mamede             SSM   39º 23'N  7º 22'W   5      4      -
                            Serra de Sintra                 SIN   38º 45'N  9º 25'W   5      3      30
                            Serra do Buçaco                 BUC   40º 22'N  8º 21'W   5      3      30
                            São Brás de Alportel            SBA   37º 20'N  7º 56'W   5      3      -
                            Serra da Estrela                EST   40º 32'N  7º 51'W   5      4      32
                  Tunisia   Mekna, Tabarka                  MEK   36º 57'N  8º 51'E   5      3      28
                            Fermana                         FER   36º 35'N  8º 32'E   3      3      -
                  Algeria   Forêt des Guerbès               ALG   36º 54'N  7º 15'E   5      3      30
                  Italy     Puglia, Brindisi                PUG   40º 34'N  17º 40'E  5      3      22
                            Lazio, Tuscany                  LAZ   42º 25'N  11º 57'E  5      3      -
                            Sicily, Catania                 SIC   37º 07'N  14º 30'E  3      3      -
                  France    Landes, Soustons                LAN   43º 45'N  1º 20'W   5      2      -
                            Var, Bormes-les-Mimosas         VAR   43º 08'N  6º 15'E   3      3      -
                            Corsica, Sartene                COR   41º 37'N  8º 58'E   3      3      -
                  Morocco   Kenitra                         KEN   34º 05'N  6º 35'W   5      3      30
                            Rif, Taza                       TAZ   34º 12'N  4º 15'W   5      3      30
                  Spain     Sierra de Guadarrama            GUA   40º 31'N  3º 45'W   5      3      -
                            Montes de Toledo, Cañamero      TOL   39º 22'N  5º 21'W   5      3      -
                            Haza del Lino                   HAZ   36º 50'N  3º 18'W   5      5      29
                            Sierra Morena, Fuencaliente     MOR   38º 24'N  4º 16'W   4      3      -
                            Cataluña, Sta Coloma de Farnes  CAT   41º 51'N  2º 32'E   5      3      30
                            Sierra de Aracena, Jabugo       ARC   37º 54'N  6º 44'W   3      3      -
Q. rotundifolia   Portugal  Ermidas do Sado                       38º 00'N  8º 07'W   2      2      -
                            Serra da Arrábida                     38º 50'N  9º 03'W   1      1      -
                            Serra da Estrela                      40º 32'N  7º 51'W   1      1      -
                            Serra de São Mamede                   39º 23'N  7º 22'W   9      6      -
                            Fátima                                39º 37'N  8º 40'W   1      1      -
Q. ilex           France                                          43º 09'N  3º 03'E   2      1      -
Q. coccifera      Portugal  Cascais, Aldeia de Juzo               38º 72'N  9º 09'W   5      3      -
Q. faginea        Portugal  Serra da Arrábida                     38º 50'N  9º 03'W   1      1      -
Q. pyrenaica      Portugal  Serra da Estrela                      40º 32'N  7º 51'W   1      1      -
Q. robur          Portugal  Serra da Estrela                      40º 32'N  7º 51'W   1      1      -
Q. canariensis    Portugal  Lisbon                                38º 45'N  9º 09'W   1      -      -
Q. lusitanica     Portugal  Negrais                               38º 52'N  9º 17'W   1      1      -
Q. rubra          Portugal  Lisbon                                38º 45'N  9º 09'W   1      1      -
Q. cerris         Italy     Greve in Chianti                      43º 35'N  11º 18'E  1      1      -
Castanea crenata  Portugal  Vila Real                                                 1      1      -
Total                                                                                 148    104    379
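As a consistency check on Table 2.1, the column totals can be recomputed from the transcribed sample sizes. The Python sketch below does this; the data are copied from the table, with None standing for "-".

```python
# (cpDNA, nuDNA, SSRs) per Q. suber population, transcribed from Table 2.1;
# None stands for "-" (marker not analysed).
Q_SUBER = {
    "AZT": (5, 3, None), "GER": (5, 3, 29), "MON": (5, 3, 29), "ARR": (5, 4, 30),
    "SSM": (5, 4, None), "SIN": (5, 3, 30), "BUC": (5, 3, 30), "SBA": (5, 3, None),
    "EST": (5, 4, 32), "MEK": (5, 3, 28), "FER": (3, 3, None), "ALG": (5, 3, 30),
    "PUG": (5, 3, 22), "LAZ": (5, 3, None), "SIC": (3, 3, None), "LAN": (5, 2, None),
    "VAR": (3, 3, None), "COR": (3, 3, None), "KEN": (5, 3, 30), "TAZ": (5, 3, 30),
    "GUA": (5, 3, None), "TOL": (5, 3, None), "HAZ": (5, 5, 29), "MOR": (4, 3, None),
    "CAT": (5, 3, 30), "ARC": (3, 3, None),
}
# (cpDNA, nuDNA) for the other taxa and the outgroup, in table order
OTHER_TAXA = [
    (2, 2), (1, 1), (1, 1), (9, 6), (1, 1),      # Q. rotundifolia (five sites)
    (2, 1), (5, 3), (1, 1), (1, 1), (1, 1),      # Q. ilex to Q. robur
    (1, None), (1, 1), (1, 1), (1, 1), (1, 1),   # Q. canariensis to Castanea crenata
]


def total(values):
    """Sum a table column, skipping '-' entries."""
    return sum(v for v in values if v is not None)


cp = total(r[0] for r in Q_SUBER.values()) + total(r[0] for r in OTHER_TAXA)
nu = total(r[1] for r in Q_SUBER.values()) + total(r[1] for r in OTHER_TAXA)
ssr = total(r[2] for r in Q_SUBER.values())
print(cp, nu, ssr)   # 148 104 379, matching the "Total" row
```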

2.3 Microsatellite genotyping

A total of nine anonymous dinucleotide nuclear microsatellite markers (nuSSRs), previously developed in other oak species, were used in this study:

- MSQ13, first described in Q. macrocarpa Michx. [98];
- five loci described in Q. petraea (Matt.) Liebl. (QpZAG9, QpZAG15, QpZAG36, QpZAG46 and QpZAG110) [99];
- three loci described in Q. robur L. (QrZAG11, QrZAG7 and QrZAG20) [100].

Transferability of these SSRs to cork oak had been reported previously [87,88], and they are considered unlinked, anonymous markers [101]. Amplifications were performed with the primers designed by the respective authors, under the following conditions:

- initial denaturation at 94 °C for 5 min;
- 30 cycles of denaturation at 94 °C for 60 s, annealing at 50 °C for 30 s (locus-specific annealing temperatures in Table S2.1 – Supporting Information 2) and extension at 72 °C for 60 s;
- final extension at 72 °C for 10 min.

PCRs were performed in a final volume of 15 μL, with 0.5 μL of DNA (50–100 ng), 1x PCR buffer (Promega), 1 U Taq polymerase (Promega), 2.0 mM MgCl2, 0.12 mM dNTPs and 0.4 µM of each primer. However, loci QpZAG36 and QpZAG46 yielded no amplification product for most samples, or very unreliable scoring, despite the authors' PCR guidelines being followed and several optimization attempts. These two loci were therefore abandoned.

Two nuSSRs developed specifically for cork oak by Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007), QsA11 and QsD8, were also tested. However, despite optimization attempts, clear scoring was never achieved, and these loci were also discarded.

At the onset of this work no EST-derived microsatellites (EST-SSRs) had been developed specifically for cork oak. However, since ESTs are conserved gene sequences, primers designed for one species are likely to work in related ones. Six polymorphic EST-SSRs were therefore selected from Quercus mongolica (QmOST1, QmD12, QmAJ1, QmDN1, QmDN2 and QmDN3) [92]. The locus names were assigned for this work. They correspond, respectively, to the following NCBI dbEST (http://www.ncbi.nlm.nih.gov/dbEST) accession numbers: DN949770, CR627959, AJ577265, DN950717, DN949776 and DN950726. The specific primer pair used for each SSR is given in Ueno & Tsumura [92] (Supporting Information 2 – Table S2.2). The PCR amplification conditions were as follows:

- initial denaturation at 94 °C for 5 min;
- 30 cycles of denaturation at 94 °C for 30 s, annealing at 57 °C for 30 s (locus-specific annealing temperatures in Table S2.2 – Supporting Information 2) and extension at 72 °C for 30 s;
- final extension at 72 °C for 10 min.

PCRs were performed in a final volume of 15 μL, with 0.5 μL of DNA (50–100 ng), 1x PCR buffer (Promega), 1 U Taq polymerase (Promega), 1.5 mM MgCl2, 0.12 mM dNTPs and 0.3 µM of each primer. Despite several amplification attempts, locus QmDN2 yielded no PCR product.

Electrophoresis of the PCR products was performed on an ABI PRISM 310 automated sequencer. Genotypes were scored and visually checked using GENEMAPPER v3.7 (Applied Biosystems, Inc.). Possible genotyping errors were identified and corrected using MICRO-CHECKER v2.2.3 [102].

2.4 Phylogenetic and phylogeographic analysis

Datasets for each sequenced fragment were aligned in CLUSTAL X v2.0.12 [103,104], followed by manual refinement in BIOEDIT v7.0.9 [105]. To create the cpDNA concatenated matrix from the individual datasets of TrnL-F, TrnS-PsbC and TrnH-PsbA fragments, the CONCATENATOR v1.1.0 software was used [106].
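The concatenation step can be sketched as follows. This is not CONCATENATOR's implementation; it only illustrates what a concatenated matrix contains. Per-locus alignments are joined sample by sample, and samples missing from a locus are padded with "?". The 1-based position range of each locus is also recorded so that it can later be declared as a partition, printed here as MrBayes-style `charset` lines. Locus and sample names are hypothetical.

```python
def concatenate(alignments, missing="?"):
    """Join per-locus alignments into one matrix.

    alignments: {locus: {sample: aligned sequence}}; dict order gives locus order.
    Samples absent from a locus are filled with `missing`.
    Returns (matrix {sample: sequence}, partitions [(locus, first, last)], 1-based).
    """
    samples = sorted({s for aln in alignments.values() for s in aln})
    pieces = {s: [] for s in samples}
    partitions, start = [], 1
    for locus, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences differ in length (not aligned?)")
        length = lengths.pop()
        for s in samples:
            pieces[s].append(aln.get(s, missing * length))
        partitions.append((locus, start, start + length - 1))
        start += length
    return {s: "".join(p) for s, p in pieces.items()}, partitions


loci = {  # invented toy alignments with hypothetical locus names
    "trnLF": {"s1": "ACGT", "s2": "ACGA"},
    "trnHpsbA": {"s1": "TT-A", "s3": "TTCA"},
}
matrix, parts = concatenate(loci)
for locus, first, last in parts:
    print(f"charset {locus} = {first}-{last};")   # MrBayes-style partition line
```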

Phylogenetic analysis was performed using PAUP* v4.0.b4a [107]. Maximum parsimony (MP) analyses were carried out on all data sets. The optimal tree was found by a heuristic search with tree-bisection–reconnection as the branch-swapping algorithm. Initial trees were obtained via stepwise addition with 1000 replicates of random addition sequence. Bootstrapping with 1000 replicates was performed to evaluate the robustness of the nodes of the phylogenetic trees.
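The optimality criterion behind the MP analysis can be illustrated with Fitch's algorithm, which counts the minimum number of character changes that a fixed tree requires. The sketch also shows the column resampling that generates one bootstrap pseudo-replicate. It scores given trees only and does not reproduce PAUP*'s heuristic TBR search. Every character, including a gap if present, is treated as an unordered state. The alignment and trees are invented.

```python
import random


def fitch(tree, states):
    """Fitch's algorithm for one site. tree: taxon name or (left, right) tuple.

    Returns (candidate state set, minimum number of changes in the subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1   # one extra change


def tree_length(tree, alignment):
    """Parsimony length: sum of Fitch changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
               for i in range(n_sites))


def bootstrap_alignment(alignment, rng):
    """One nonparametric bootstrap pseudo-replicate: resample columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


aln = {"A": "AA", "B": "AC", "C": "GA", "D": "GC"}   # invented toy alignment
print(tree_length((("A", "B"), ("C", "D")), aln))   # 3
print(tree_length((("A", "D"), ("B", "C")), aln))   # 4
replicate = bootstrap_alignment(aln, random.Random(1))
```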

Bayesian analyses (BA) were undertaken using MRBAYES v3.1.2 [108] with the optimal model selected under the Akaike Information Criterion (AIC), as implemented in MrMODELTEST v2.3 [109]. For the combined data, model selection was carried out separately for each cpDNA dataset with MrMODELTEST, and the selected models were then implemented according to the author's recommendations. Indels were also included and scored as binary characters (absent/present). The posterior probabilities of the phylogenetic trees were estimated with a Metropolis-coupled Markov chain Monte Carlo sampling algorithm (MCMCMC), sampling every 1000th generation. The run lengths were:

- individual cpDNA datasets: 6 × 10⁶ generations;
- combined cpDNA dataset: 5 × 10⁸ generations;
- nuclear fragment dataset: 3 × 10⁶ generations.

Each analysis was run three times with one cold and three incrementally heated chains, starting from random trees. The first 10% of the sampled generations were discarded as burn-in. The remaining trees were combined and summarized in a 50% majority-rule consensus tree.
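Model choice under the AIC reduces to computing AIC = 2k − 2 ln L for each candidate model and keeping the lowest value, where k is the number of free parameters and L the maximum likelihood. The log-likelihoods and parameter counts below are hypothetical. MrMODELTEST obtains the likelihoods from PAUP* and uses its own parameter counts, so this sketch illustrates only the criterion itself.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def rank_models(candidates):
    """candidates: {model: (lnL, k)} -> list of (model, AIC, delta AIC), best first."""
    scores = {m: aic(lnl, k) for m, (lnl, k) in candidates.items()}
    best = min(scores.values())
    return sorted(((m, s, s - best) for m, s in scores.items()), key=lambda t: t[1])


# HYPOTHETICAL log-likelihoods and parameter counts, for illustration only.
fits = {"HKY": (-2050.4, 4), "GTR": (-2046.9, 8), "GTR+G": (-2040.2, 9)}
for model, score, delta in rank_models(fits):
    print(f"{model:6s} AIC={score:8.1f}  dAIC={delta:5.1f}")
```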

When aligned across all oak species, the cpDNA fragments presented several indels. Only MP and BA analyses were therefore performed, because these are the only approaches considered that allow indels to be treated as informative characters.

The program NETWORK v4.6 [110] was used to construct a median-joining network of haplotypes showing the number of mutational steps between them.
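A median-joining network builds on the minimum spanning trees that link haplotypes by their number of mutational steps, and then adds inferred median vectors. The sketch below covers only the first ingredient: a single minimum spanning tree built with Prim's algorithm on pairwise differences. It neither infers median vectors nor shows alternative equally short links, so it is a simplification of what NETWORK computes. The haplotypes are invented.

```python
def hamming(a, b):
    """Number of mutational steps (differing sites) between two aligned haplotypes."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haplotypes):
    """Prim's algorithm; returns edges (h1, h2, mutational steps)."""
    names = list(haplotypes)
    in_tree, edges = {names[0]}, []
    while len(in_tree) < len(names):
        # cheapest link between a connected and an unconnected haplotype
        u, v, d = min(
            ((a, b, hamming(haplotypes[a], haplotypes[b]))
             for a in names if a in in_tree
             for b in names if b not in in_tree),
            key=lambda e: e[2],
        )
        edges.append((u, v, d))
        in_tree.add(v)
    return edges


haps = {"H1": "AAAA", "H2": "AAAT", "H3": "AATT", "H4": "TTTT"}  # invented
for edge in minimum_spanning_tree(haps):
    print(edge)
```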

2.5 Selective neutrality tests and demographic history

Selective neutrality of each microsatellite locus was examined based on the sampling distribution of neutral alleles under the infinite-alleles model. The Ewens–Watterson homozygosity test [111] and the Ewens–Watterson–Slatkin exact test [112,113] were performed using the absolute allele frequency distribution, as implemented in ARLEQUIN v3.5 software [114]. In these tests, the expected null distribution of the homozygosity statistic (Fexp) is generated by simulating random neutral samples, which is then compared with the homozygosity observed in the original sample (Fobs). If the null hypothesis of selective neutrality is rejected (p<0.05), an Fobs/Fexp ratio less than 1 implies balancing selection in favour of heterozygotes and a ratio greater than 1 implies directional selection in favour of advantageous alleles.
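The Ewens–Watterson test can be approximated by Monte Carlo simulation. Under the infinite-alleles model, the distribution of allele counts conditional on the number of alleles k does not depend on θ. Neutral samples can therefore be drawn from a Hoppe urn with any convenient θ and kept only if they contain exactly k alleles. The sketch below does this; it is not ARLEQUIN's implementation. Counts are numbers of gene copies (two per diploid individual), and any data passed to it in the tests are invented.

```python
import random


def homozygosity(counts):
    """F = sum of squared allele frequencies."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def expected_k(theta, n):
    """Expected number of alleles in n genes under the infinite-alleles model."""
    return sum(theta / (theta + i) for i in range(n))


def theta_matching_k(k, n):
    """Theta giving E[K] close to k (log-scale bisection); only speeds up rejection."""
    lo, hi = 1e-6, 1e6
    for _ in range(100):
        mid = (lo * hi) ** 0.5
        if expected_k(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


def hoppe_urn(n, theta, rng):
    """One neutral sample of n genes; returns allele counts."""
    counts = []
    for i in range(n):
        if rng.random() < theta / (theta + i):   # always true for i == 0
            counts.append(1)                     # new allele
        else:                                    # copy a randomly chosen earlier gene
            r = rng.randrange(i)
            for j, c in enumerate(counts):
                if r < c:
                    counts[j] += 1
                    break
                r -= c
    return counts


def watterson_test(counts, reps=1000, seed=1):
    """Return (F_obs, F_exp, P(F_sim <= F_obs)), conditioning on the observed k."""
    rng = random.Random(seed)
    n, k = sum(counts), len(counts)
    theta = theta_matching_k(k, n)
    sims = []
    while len(sims) < reps:
        sample = hoppe_urn(n, theta, rng)
        if len(sample) == k:                     # rejection step: keep k-allele samples
            sims.append(homozygosity(sample))
    f_obs = homozygosity(counts)
    return f_obs, sum(sims) / reps, sum(f <= f_obs for f in sims) / reps
```

Following the text, a ratio F_obs/F_exp below 1 combined with a small lower-tail probability would point towards balancing selection.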

The mismatch distribution (1000 replicates) was used to infer the demographic history of the cork oak lineages present in each cpDNA and nuDNA dataset. Pairwise distances between haplotypes, the time since population expansion (τ) and the relative population size before (θ0) and after (θ1) expansion were calculated in ARLEQUIN. The fit of the observed distribution to the rapid expansion model was assessed with Harpending's (1994) raggedness index (r) and the sum of squared deviations (SSD), whose statistical significance was tested with 1000 bootstrap replicates in ARLEQUIN.
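The mismatch distribution and Harpending's raggedness index can be computed directly from aligned sequences, as in the minimal sketch below (hypothetical helper names, gap-free sequences assumed). The SSD statistic additionally requires the distribution expected under the fitted expansion model (τ, θ0, θ1) and bootstrap replicates, which ARLEQUIN provides and which are omitted here.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequencies x_0..x_d of the number of pairwise differences."""
    diffs = [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]
    counts = Counter(diffs)
    return [counts.get(i, 0) / len(diffs) for i in range(max(diffs) + 1)]


def raggedness(freqs):
    """Harpending's r = sum_{i=1}^{d+1} (x_i - x_{i-1})^2, with x_{d+1} = 0."""
    x = list(freqs) + [0.0]
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))


freqs = mismatch_distribution(["AAAA", "AAAT", "AATT"])
print(freqs)              # frequencies of 0, 1 and 2 differences: 0, 2/3, 1/3
print(raggedness(freqs))  # 2/3 for this toy example
```

Smooth, unimodal distributions give small r values, whereas ragged, multimodal ones (typical of stable populations) give larger values.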

Both Tajima's D [116] and Fu's Fs [117] were used to test for deviations from neutrality. Fu's Fs uses information from the haplotype distribution and is particularly sensitive to population demographic expansion, with low (negative) Fs values indicating an excess of recent mutations, usually as a result of expansion [117,118]. Tajima's D uses the average number of pairwise differences and the number of segregating sites in intraspecific DNA sequences to test for departure from neutral expectations, and generally takes negative values in populations that have experienced size changes or in sequences that have undergone selection [116,118]. Fu's Fs and Tajima's D were calculated in ARLEQUIN.
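For reference, Tajima's D compares the mean number of pairwise differences (π) with the number of segregating sites (S) scaled by a1 = Σ 1/i. A minimal sketch of the standard formula is given below, assuming aligned sequences without gaps or missing data; the function name is hypothetical, and Fu's Fs, which requires Ewens-based probabilities of the number of haplotypes, is not included.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D from aligned, equal-length sequences (no gaps assumed)."""
    n = len(seqs)
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(a != b for a, b in zip(s, t))
             for s, t in combinations(seqs, 2)) / n_pairs
    S = sum(len(set(column)) > 1 for column in zip(*seqs))  # segregating sites
    if S == 0:
        return 0.0  # D is undefined without segregating sites; 0.0 by convention here
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


# Star-like sample: each derived sequence carries one private mutation
star = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]
print(tajimas_d(star))  # negative: excess of rare variants
```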

2.6 Genetic diversity and population differentiation

Linkage disequilibrium (LD) between all pairs of polymorphic SSR loci was tested using the probability test implemented in GENEPOP v4.0 [119]. Using the complete sampling, nucleotide diversity (π) and its standard deviation, haplotype diversity (Hd) and indel haplotype diversity (IndelHd) were estimated for each selected sequenced fragment in DnaSP v10.01 [120].
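Haplotype diversity and nucleotide diversity follow simple formulas: Nei's Hd = n/(n-1)(1 - Σ p_i²), and π as the mean number of differences per site between pairs of sequences. The sketch below assumes gap-free, equal-length sequences and uses hypothetical function names; DnaSP additionally handles gaps, missing data and standard deviations, so this is only a simplified illustration.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: Hd = n/(n-1) * (1 - sum of p_i^2)."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """pi: mean number of differences per site over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / len(seqs[0])


sample = ["AAAA", "AAAA", "AAAT", "AATT"]
print(haplotype_diversity(sample))   # 5/6, about 0.833
print(nucleotide_diversity(sample))  # 7/24, about 0.292
```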

Gene diversity statistics (gene diversity He [121] and allelic richness A) were estimated for the microsatellites using FSTAT v2.9.3.2 [122,123]. Allelic richness (A) was corrected with the rarefaction method, based on a minimum sample size of 21 diploid individuals, which corresponded to the smallest number of individuals successfully genotyped for a given locus in a population. Private alleles were identified with GenAlEx v6.3 [124]. The inbreeding coefficient Fis [125] was calculated in ARLEQUIN, and its deviation from zero was tested with 10,000 allele permutations. Population differentiation was measured with FST [125] and RST [126] in ARLEQUIN.
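The rarefaction correction of allelic richness can be illustrated as follows. With a standardized sample of g gene copies (g = 42 for 21 diploid individuals), each allele contributes the probability of being drawn at least once, 1 - C(N - N_i, g)/C(N, g), where N is the number of gene copies sampled and N_i the count of allele i. The sketch applies to one locus in one population, and the function name is hypothetical.

```python
from math import comb


def allelic_richness(allele_counts, g=42):
    """Rarefied allelic richness: expected number of alleles among g genes.

    g = 42 gene copies corresponds to 21 diploid individuals.
    """
    N = sum(allele_counts)
    if N < g:
        raise ValueError("sample is smaller than the rarefaction size")
    # Allele i is missed only if all g draws come from the other N - N_i genes
    return sum(1 - comb(N - n_i, g) / comb(N, g) for n_i in allele_counts)


print(allelic_richness([43, 1]))  # 1 + 42/44: the singleton is kept with prob. 42/44
```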

SMOGD v1.2.5 [127] was used to estimate Jost's [128] measure of actual differentiation among populations (Dest), Hedrick's standardized measure of genetic differentiation G'ST [129], and the nearly unbiased estimator of relative differentiation GST [130].
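The relationships between these differentiation measures can be made explicit with a short sketch for a single locus, based on the mean within-population heterozygosity (Hs) and the total heterozygosity (Ht) over k populations: GST = (Ht - Hs)/Ht, Hedrick's G'ST = GST(k - 1 + Hs)/((k - 1)(1 - Hs)) and Jost's D = ((Ht - Hs)/(1 - Hs))(k/(k - 1)). The sketch assumes known allele frequencies and a hypothetical function name; SMOGD works with sample-size-corrected estimates of Hs and Ht, which are omitted here.

```python
def differentiation(pop_freqs):
    """Return (GST, G'ST, Jost's D) from allele frequencies of k populations.

    pop_freqs: one list of allele frequencies per population, same allele order.
    Uses plain (uncorrected) heterozygosities.
    """
    k = len(pop_freqs)
    # Mean expected heterozygosity within populations
    hs = sum(1 - sum(p * p for p in pop) for pop in pop_freqs) / k
    # Expected heterozygosity of the pooled (total) population
    mean_freqs = [sum(allele) / k for allele in zip(*pop_freqs)]
    ht = 1 - sum(p * p for p in mean_freqs)
    gst = (ht - hs) / ht
    g_prime_st = gst * (k - 1 + hs) / ((k - 1) * (1 - hs))
    jost_d = (ht - hs) / (1 - hs) * k / (k - 1)
    return gst, g_prime_st, jost_d


print(differentiation([[1.0, 0.0], [0.5, 0.5]]))  # (1/3, 5/9, 1/3)
```

Unlike GST, both G'ST and D reach 1 when populations share no alleles, which is why they can exceed GST at highly variable loci.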

Pairwise genetic differentiation between populations was estimated with FST, RST and Dest in FSTAT, ARLEQUIN and SMOGD, respectively. Standard Bonferroni corrections were applied to account for multiple testing.

Geographic patterns of genetic differentiation were tested by regressing genetic differentiation (FST) against geographic distance between pairs of samples, following Rousset [131] (FST/(1-FST) against the logarithm of the geographic distance between populations). The regression was estimated by reduced major axis regression, using IBDWS v3.03 [132]. Mantel tests were used to test the null hypothesis of no relationship between the genetic and geographic distance matrices.
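The isolation-by-distance analysis can be sketched as follows, assuming symmetric matrices of pairwise FST and geographic distances. Genetic distances are transformed to FST/(1 - FST) and geographic distances to their natural logarithm; the reduced major axis slope is sign(r)·sd(y)/sd(x), and a one-tailed Mantel test permutes population labels (rows and columns together). Function names, the example coordinates and the choice of the natural logarithm are assumptions of this sketch, not IBDWS settings.

```python
import math
import random


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rma_slope(x, y):
    """Reduced major axis slope: sign(r) * sd(y) / sd(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return math.copysign(math.sqrt(syy / sxx), pearson(x, y))


def isolation_by_distance(fst, dist, perms=999, seed=1):
    """Return (RMA slope, Mantel r, one-tailed Mantel p) for FST/(1-FST) vs ln(d)."""
    n = len(fst)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [math.log(dist[i][j]) for i, j in pairs]

    def genetic(order):
        # Transformed genetic values after relabelling populations by `order`
        return [fst[order[i]][order[j]] / (1 - fst[order[i]][order[j]])
                for i, j in pairs]

    y = genetic(list(range(n)))
    r_obs = pearson(x, y)
    rng = random.Random(seed)
    hits = 0
    for _ in range(perms):
        order = list(range(n))
        rng.shuffle(order)  # permute rows and columns of the genetic matrix together
        if pearson(x, genetic(order)) >= r_obs:
            hits += 1
    return rma_slope(x, y), r_obs, (hits + 1) / (perms + 1)


pos = [0, 10, 30, 70]  # hypothetical positions along a transect (km)
dist = [[abs(a - b) for b in pos] for a in pos]
lin = [[0.01 * math.log(d) if d else 0.0 for d in row] for row in dist]
fst = [[v / (1 + v) for v in row] for row in lin]  # perfect IBD by construction
slope, r, p = isolation_by_distance(fst, dist)
print(round(slope, 4), round(r, 4), p)
```

With only a handful of populations the number of distinct permutations is small, so the smallest attainable Mantel p value is limited accordingly.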

2.7 Genetic structure of populations

The Bayesian clustering method implemented in STRUCTURE v2.3.3 [133] was used to determine the genetic structure of the sampled populations for the microsatellite loci. Because preliminary analyses showed that overall differentiation was low, the newer clustering model was used, which is based not only on individual multilocus genotypes but also on sampling locations [134]. This LocPrior model allows the prior distribution of cluster assignments to vary among populations, and its authors recommend it to help detect population structure when the genetic data are not very informative. The parameter r indicates the extent to which the sampling locations are informative (small values, <1, indicate informative locations). Twenty independent runs were performed under a Markov chain Monte Carlo (MCMC) scheme for each value of K (the number of putative clusters), from 1 to 13 (the number of populations sampled). The admixture model with sampling locations as prior information [134] was selected, and allele frequencies were assumed to be correlated among populations [135]. Each run consisted of a burn-in of 50,000 iterations followed by an MCMC length of 1,000,000 iterations. The posterior probability of the data for a given K, LnP(D), was used to identify the most probable number of clusters, using both the ΔK (DeltaK) ad hoc statistic [136] and the guidelines in the software documentation [133]. Once the most likely K value was determined, the run with the highest posterior probability and lowest variance was chosen for interpreting the results. Final STRUCTURE results were visualized with DISTRUCT v1.1 [137].
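The ΔK statistic [136] can be computed from the LnP(D) values of the replicate runs, as in the sketch below: ΔK(K) = |mean L(K+1) - 2·mean L(K) + mean L(K-1)| / sd(L(K)). It is defined only for K values with both neighbours, so here K = 1 and K = 13 would be excluded. The function name is hypothetical, and the run values in the example are made up for illustration.

```python
import statistics


def evanno_delta_k(lnp):
    """Delta K from replicate LnP(D) values.

    lnp: {K: [LnP(D) of each run]} for consecutive K values.
    Delta K = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)).
    """
    ks = sorted(lnp)
    mean = {k: statistics.mean(v) for k, v in lnp.items()}
    return {k: abs(mean[k + 1] - 2 * mean[k] + mean[k - 1]) / statistics.stdev(lnp[k])
            for k in ks[1:-1]}


runs = {1: [-1000, -1002], 2: [-900, -904], 3: [-890, -892], 4: [-885, -889]}
delta = evanno_delta_k(runs)
print(max(delta, key=delta.get))  # 2
```

At least two runs per K are needed for the standard deviation, and identical LnP(D) values across runs would make ΔK undefined.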

The degree of population subdivision was also explored with the R package GENELAND v3.2.4 [138]. This approach determines the number of groups (K) using a Bayesian clustering model, run under an MCMC scheme, to detect the location of genetic discontinuities from individual geo-referenced multilocus genotypes [139]. GENELAND uses the geographical locations of individuals as prior information. The model treats the number of clusters as a parameter processed by the MCMC scheme without any approximation, and may therefore estimate the number of clusters better than procedures that do not take geographical locations into account [139,140]. Twenty independent MCMC runs were performed, allowing K to vary from 1 to 13 (the number of populations sampled), with the following parameters: 1,000,000 iterations, saving every 100th iteration (after a 10% burn-in), treating the number of genetic clusters as unknown, and using the Dirichlet model for allele frequencies (assumed to be correlated).

The groupings obtained with the GENELAND and STRUCTURE approaches were further tested with an Analysis of Molecular Variance (AMOVA) [141].
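The variance partition underlying AMOVA can be illustrated for the simplest case of a single level of grouping. The sketch below is a hypothetical helper that assumes a matrix of squared distances between individuals and one group label per individual; it computes the sums of squared deviations among and within groups, the corresponding variance components and ΦST. A hierarchical design (populations within groups) and the permutation tests used to assess significance are not included.

```python
def amova_phi_st(dist2, groups):
    """One-level AMOVA (among vs within groups) from squared distances.

    dist2: N x N matrix of squared distances between individuals.
    groups: group label for each individual. Returns Phi_ST.
    """
    n = len(groups)
    labels = sorted(set(groups))
    p = len(labels)
    ssd_total = sum(map(sum, dist2)) / (2 * n)
    ssd_within, sizes = 0.0, []
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        sizes.append(len(idx))
        ssd_within += sum(dist2[i][j] for i in idx for j in idx) / (2 * len(idx))
    ms_among = (ssd_total - ssd_within) / (p - 1)
    ms_within = ssd_within / (n - p)
    n0 = (n - sum(s * s for s in sizes) / n) / (p - 1)
    var_among = (ms_among - ms_within) / n0  # sigma^2 among groups
    var_within = ms_within                   # sigma^2 within groups
    return var_among / (var_among + var_within)


x = [0, 1, 2, 3]  # toy one-dimensional "genotypes"
d2 = [[(a - b) ** 2 for b in x] for a in x]
print(amova_phi_st(d2, ["A", "A", "B", "B"]))  # 7/9, about 0.778
```

As in ARLEQUIN output, the among-group component (and hence ΦST) can be negative when groups are less differentiated than expected by chance.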


3. Results

3.1 Sequencing of chloroplast and nuclear DNA fragments

3.1.1 cpDNA and nuDNA diversity levels

Initially, 148 samples were sequenced for the cpDNA fragments studied (Table 2.1). When aligned across all oak species, the cpDNA fragments presented several indels. Where alignments were ambiguous, several slightly different alignments were tested, including versions with the ambiguous positions or indels removed, without any major differences in the results. The nucleotide diversity was 0.00400 (+/- 0.0009), 0.00925 (+/- 0.0007) and 0.00549 (+/- 0.0004) for the TrnS/PsbC, TrnH/PsbA and TrnL-F fragments, respectively (Table 3.1). For the cork oak samples (119 of the 148 individuals), a total of 8 TrnS/PsbC haplotypes, 7 TrnH/PsbA haplotypes and 5 TrnL-F haplotypes were obtained. For the concatenated cpDNA dataset, 17 cork oak haplotypes were detected, with a nucleotide diversity of 0.00658 (+/- 0.0005) (Table 3.1). After a preliminary analysis of the cpDNA sequences, 104 of the 148 samples, representing the main groups, were selected and sequenced for the candidate gene EST 2T13 (Table 2.1). The alignment was straightforward and showed no potential heterozygous sites in the cork oak samples. The nucleotide diversity estimated for the EST 2T13 fragment is 0.02387 (+/- 0.0126) (Table 3.1), with 8 cork oak haplotypes.

Table 3.1: The length (bp), number of variable sites, total number of characters, number of indels, number of parsimony-informative sites (PI) and estimated nucleotide diversity (π) with its standard deviation for each dataset, using the complete sampling.

Dataset               Length (bp)   Variable sites   Total characters   Indels   PI   π
Individual cpDNA
  TrnS/PsbC           250           20               238                12       15   0.00400 +/- 0.0009
  TrnH/PsbA           478           54               448                30       34   0.00925 +/- 0.0007
  TrnL-F              381           18               374                7        14   0.00549 +/- 0.0004
Concatenated
  TrnS/TrnH/TrnL      1109          92               1060               49       63   0.00658 +/- 0.0005
Individual nuDNA
  EST 2T13            249           48               240                9        20   0.02387 +/- 0.0126


3.1.2 Differentiation patterns

Maximum parsimony (MP) trees for the cpDNA fragments TrnS/PsbC, TrnH/PsbA and TrnL-F are presented in Fig. 3.1a, Fig. 3.2 and Fig. 3.3, respectively. The trees derived from the Bayesian analyses (BA) were very similar to the MP trees; therefore, the MP tree of each fragment is presented with the respective bootstrap and clade credibility values. The concatenated tree supported the results of the individual trees (Supporting Information 3). In all the cpDNA phylogenetic trees (Fig. 3.1a, Fig. 3.2 and Fig. 3.3), four major groups were distinguished, named Groups A, B, C and D. Group A (highlighted in yellow in the figures) is composed exclusively of samples of subgenus Cerris, namely cork oak samples from several populations and Q. cerris. Group B is more complex, since it is composed of samples of several Quercus species, namely Q. suber (highlighted in orange in the trees – subgenus Cerris) and Q. coccifera (green), Q. ilex ilex (pink) and Q. ilex rotundifolia (red) of subgenus Sclerophyllodrys. Given the presence of cork oak samples in these two groups, and that the samples in each group were always the same for all the cpDNA fragments, the haplotypes belonging to Group A were considered a pure lineage of cork oak, while the samples belonging to Group B were considered an introgressed lineage of cork oak. Group C (highlighted in blue) is composed of several Quercus species from subgenus Quercus, specifically Q. faginea, Q. robur, Q. pyrenaica, Q. lusitanica and Q. canariensis. Finally, Group D consists of Q. rubra of subgenus Erythrobalanus.

In particular, Group A is composed of 92 cork oak samples (out of the 119) and the Q. cerris sample, and is characterized by low levels of variation and few haplotypes (Table 3.2). This was particularly evident for the TrnL-F fragment, for which only one haplotype was found (Fig. 3.3), and for the TrnH/PsbA fragment, where again only one major haplotype is present, although two low-frequency derived haplotypes occur in Puglia (Italy) (Fig. 3.2 and Table 3.2). In both these fragments, Q. cerris shares the same haplotype as Q. suber. Higher variation was found in the cork oak pure lineage for the TrnS/PsbC fragment (Table 3.2), allowing the distinction of three sublineages, named A1, A2 and A3 (Fig. 3.1 and Table 3.2). In Fig. 3.1b, a reconstruction of the phylogenetic tree for the pure lineage of the TrnS/PsbC fragment shows the major haplotypes of each sublineage and the mutational events that occurred during the formation of those haplotypes. Sublineage A1 (sl A1) is exclusive to the island of Sicily, sublineage A2 (sl A2) is present in West Mediterranean populations, and sublineage A3 (sl A3) is present in East Mediterranean populations (Fig. 3.4). For this fragment, Q. cerris shows a haplotype derived from sublineage A3 (Fig. 3.1a).

Figure 3.1: a) Maximum parsimony tree of the cpDNA TrnS/PsbC intergenic spacer region. Four groups are represented and colour-coded. Group A is highlighted in yellow: cork oak's pure lineage and Q. cerris (bright yellow – sublineage A2 (Sl A2); brownish yellow – sublineage A3 (Sl A3); light yellow – sublineage A1 (Sl A1)); Group B (orange – cork oak's introgressed lineage; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex); Group C is highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and Q. lusitanica; Group D is highlighted in light blue and consists of Q. rubra. Numbers at the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility values. b) Detailed phylogenetic reconstruction of the sublineages from Group A. Bootstrap support and Bayesian credibility values are provided above each branch. The site combinations below each branch represent the mutational events that occurred during the evolution of the three sublineages.

Figure 3.2: Maximum parsimony tree of the cpDNA TrnH/PsbA intergenic spacer region. Four groups are represented and colour-coded. Group A is highlighted in yellow: cork oak's pure lineage; Group B: orange – cork oak's introgressed lineage; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex; Group C is highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and Q. lusitanica; Group D is highlighted in light blue and consists of Q. rubra. Numbers at the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility values.

Figure 3.3: Maximum parsimony tree of the cpDNA TrnL-F intergenic spacer region. Four groups are represented and colour-coded. Group A is highlighted in yellow: cork oak's pure lineage; Group B: orange – cork oak's introgressed lineage; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex; Group C is highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and Q. lusitanica; Group D is highlighted in light blue and consists of Q. rubra. Numbers at the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility values.

Group B is composed of all the Q. coccifera and Q. ilex (subspecies ilex and rotundifolia) samples that were analysed, as well as 27 of the 119 cork oak samples. Most of the cork oak haplotypes in this group are shared with, or appear to be derived from, haplotypes present in the other species.

Table 3.2: Haplotype diversity (Hd), indel haplotype diversity (Indel Hd) and number of haplotypes (nr H) for each cpDNA and nuclear fragment, according to pure and introgressed cork oak lineages.

Fragment / lineage            Hd      Indel Hd   nr H
TrnS/PsbC
  Pure lineage (A)            0.000   0.471      4
  Introgressed lineage (B)    0.000   0.649      4
TrnH/PsbA
  Pure lineage (A)            0.024   0.024      3
  Introgressed lineage (B)    0.442   0.695      5
TrnL-F
  Pure lineage (A)            0.000   0.000      1
  Introgressed lineage (B)    0.613   0.413      4
EST 2T13
  Pure lineage (α)            0.494   0.028      6
  Introgressed lineage (β)    0.000   0.182      2

Cork oak samples in Group B, which belong to the introgressed lineage, generally presented more variability than those in Group A, the pure lineage (Table 3.2).

All three cpDNA fragments appear to distinguish Groups C and D, albeit with low resolution, which is most evident for the TrnL-F fragment. The position of these groups in the trees, and their phylogenetic relationships with the other groups, are also quite inconsistent (Fig. 3.1a, Fig. 3.2 and Fig. 3.3).

The analysis of the barcode matK fragment yielded roughly the same nucleotide diversity (π=0.0067 +/- 0.00095) as the other cpDNA fragments analysed, whereas the rbcL fragment presented no variation in any of the species. The analysis of the matK fragment and the resulting phylogenetic tree (Fig. 3.5) corroborated the presence of two cork oak lineages. Quercus cerris and Quercus crenata are classified, both by classic taxonomy [17] and by recent DNA barcode analysis [29], as the species most closely related to Q. suber (subgenus Cerris), and these species share the same haplotype as the cork oak samples characterized as the pure lineage (subgenus Cerris). Cork oak samples characterized as the introgressed lineage appear closely related to Quercus ilex ilex, Quercus ilex rotundifolia and Quercus coccifera (subgenus Sclerophyllodrys).


Figure 3.4: Geographical distribution of cork oak cpDNA haplotype lineages according to the TrnS/PsbC fragment. Pie charts represent the haplotype frequencies in the analysed populations; pie chart sizes reflect the number of samples per population (3-5). Colour codes follow those in the TrnS/PsbC tree (Fig. 3.1): yellow – cork oak's pure lineage (bright yellow – sublineage A2; brownish yellow – sublineage A3; light yellow – sublineage A1); orange – cork oak's introgressed lineage. The present distribution of the species is shown in grey.

Figure 3.5: Maximum parsimony tree of the cpDNA fragment matK. Four groups are represented and classified according to classic taxonomy (Tutin et al. 1993), following the four subgenera identified by Schwartz 1964 (Sclerophyllodrys, Cerris, Quercus and Erythrobalanus). Numbers at the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility value.


For the nuclear fragment EST 2T13, the best model of sequence evolution for the BA was determined, and the resulting tree presented a topology very similar to that of the MP tree. The MP tree is shown in Fig. 3.6a with the respective MP bootstrap values and BA clade credibility values. The nuclear tree reflects the same pattern as the cpDNA trees: the same four major groups are present, here named Groups α, β, γ and δ.

Group α consists of Q. suber [40 of the 52 samples used (Table 2.1)] and Q. cerris, similarly to the cpDNA results. The pattern of three sublineages (α1, α2 and α3) is also present, although in the nuclear fragment Q. cerris appears to share the same haplotype as the cork oak samples from sublineage α3. In Fig. 3.6b, a reconstruction of the phylogenetic tree for the nuclear pure lineage of the EST 2T13 fragment shows the major haplotypes of each sublineage and the six mutational events that occurred during the formation of those haplotypes. As in the cpDNA trees, Group β consists of cork oak, Q. coccifera and Q. ilex samples, Group γ is composed of the Quercus species belonging to subgenus Quercus, and Group δ consists of Q. rubra from subgenus Erythrobalanus. The phylogenetic relationships between the four groups most closely resemble those of the cpDNA TrnH/PsbA tree. The major differences between the nuDNA and cpDNA datasets concern the cork oak samples that compose Group α (which could be called the nuclear pure lineage) and its sublineages, and Group β (the nuclear introgressed lineage): their membership is not always the same in the two genomes. In particular, nuclear sublineage α1 is not exclusively composed of cork oak samples from Sicily, as in the cpDNA fragments, but also includes samples that belonged to sublineage A3 in the cpDNA; sublineage α2 is not exclusively West Mediterranean in the nuclear DNA, including samples from sublineage A3 as well as from the introgressed lineage; and sublineage α3 also loses its exclusiveness to East Mediterranean populations, including samples that belong to sublineage A2 and to the introgressed lineage in the cpDNA trees (Fig. 3.7). Another difference between the cpDNA and the nuclear DNA concerns the cork oak samples that compose Group β. These samples do not share haplotypes with Q. ilex and Q. coccifera as they did in the cpDNA fragments; instead, they present a major haplotype derived from a common ancestor shared with Q. ilex and Q. coccifera.

Figure 3.6: a) Maximum parsimony tree of the candidate gene EST 2T13. Four groups are represented and colour-coded. Group α is highlighted in yellow: cork oak's pure lineage and Q. cerris (bright yellow – sublineage α2; brownish yellow – sublineage α3; light yellow – sublineage α1); Group β (orange – cork oak's introgressed lineage; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex); Group γ is highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and Q. lusitanica; Group δ is highlighted in light blue and consists of Q. rubra. Numbers at the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility values. b) Detailed phylogenetic reconstruction of the sublineages from Group α. Bootstrap support and Bayesian credibility values are provided above each branch. The site combinations below each branch represent the 6 mutational events that occurred during the evolution of the three sublineages.

The diversity levels in the nuclear pure lineage are not as low as in the cpDNA, with six haplotypes (Table 3.2).

However, the diversity of the cork oak samples that belong to the nuclear introgressed lineage in Group β is lower than that of the other species in this group (Table 3.2 and Fig. 3.6). In Group γ, by contrast, diversity is higher than in the cpDNA, since each species is characterized by its own haplotype (Fig. 3.6).

Median-joining analysis of the cpDNA fragments resulted in haplotype networks (Supporting Information 4) reflecting the four major groups in the trees and the haplotypes that Q. suber in Group B shares with Quercus coccifera, Q. rotundifolia and Q. ilex.

Figure 3.7: Geographical distribution of cork oak nuDNA EST 2T13 haplotype lineages. Pie charts represent the haplotype frequencies in the analysed populations; pie chart sizes reflect the number of samples per population (3-5). Colour codes follow those in the EST 2T13 tree (Fig. 3.6): yellow – cork oak's nuclear pure lineage (bright yellow – sublineage A2'; brownish yellow – sublineage A3'; light yellow – sublineage A1'); orange – cork oak's nuclear introgressed lineage. The present distribution of the species is shown in grey.

3.1.3 Mismatch distribution and neutrality tests

Demographic histories of both cork oak lineages were evaluated with mismatch distributions and tests of the standard neutral model for a demographically stable population (Tajima's D [116] and Fu's Fs [117]) (Table 3.3). The TrnS/PsbC fragment provided slightly contradictory results. Based on the mismatch distribution, the null hypothesis of population demographic expansion was not rejected for either lineage (pure lineage – SSD=0.098, p=0.061; r=0.436, p=0.164; introgressed lineage – SSD=0.026, p=0.086; r=0.186, p=0.055), but these p values are somewhat marginal, and these statistics are conservative and use little of the information in the data. Detecting demographic size changes can be difficult with small numbers of samples or haplotypes, or when the population has experienced a very recent expansion. Fu's Fs has been shown to be more powerful than mismatch distributions in detecting both very recent and older population expansions [117,118], and this statistic (like Tajima's D) did not support population expansion for either lineage (Table 3.3). For the pure lineage, the TrnH/PsbA fragment showed strong evidence of recent population expansion: a non-significant sum of squared deviations (SSD=0.000, p=0.183) and Harpending's raggedness index (r=0.822, p=0.837), and significantly (p<0.001) negative values of Fu's Fs. Tajima's D, although not significant (p=0.119), also showed a negative tendency (D=-1.047). The TrnH/PsbA introgressed lineage showed no evidence of population expansion: SSD and r rejected the null hypothesis of expansion, consistent with the values of D and Fs. The mismatch distribution and Fu's Fs could not be calculated for the pure lineage of the TrnL-F fragment because only one haplotype is present. The TrnL-F introgressed lineage presented a mismatch distribution that departed (although marginally) from the stepwise growth model (SSD=0.132, p=0.076), but fitted the stepwise population expansion model according to Harpending's raggedness index (r=0.491, p=0.019). Fu's Fs and Tajima's D were positive and not significant, providing no support for population expansion (Table 3.3).

The demographic history and neutrality of the nuDNA fragment EST 2T13 were also evaluated. For the nuclear pure lineage, the null hypothesis of demographic expansion based on Harpending's raggedness index was not rejected (r=0.145, p=1.00). However, the SSD rejected the null hypothesis at a highly significant level (SSD=0.328, p=0.00), a result supported by the non-significant values of Fu's Fs and Tajima's D (although both values are negative) (Table 3.3). For the nuclear introgressed lineage, the mismatch analysis indicates demographic expansion, although this is not supported by Fu's Fs and Tajima's D (Table 3.3).


Table 3.3: Estimates of mismatch distribution parameters and neutrality tests. τ (tau) = time since population expansion; θ0 and θ1 = relative population size before and after expansion; SSD = sum of squared deviations; r = Harpending's raggedness index (τ to r: mismatch distribution); D = Tajima's D; Fs = Fu's Fs; ns = not significant; * significant at p<0.05; ** significant at p<0.01; *** significant at p<0.001.

Fragment / lineage            τ        θ0      θ1          SSD         r           D            Fs
TrnS/PsbC
  Pure lineage (A)            2.648    0.000   1.115       0.098 ns    0.436 ns    0.000 ns     0.947 ns
  Introgressed lineage (B)    0.959    0.000   99999.000   0.026 ns    0.186 ns    0.000 ns     -0.271 ns
TrnH/PsbA
  Pure lineage (A)            3.000    0.000   0.050       0.000 ns    0.882 ns    -1.047 ns    -3.773 ***
  Introgressed lineage (B)    10.273   0.000   8.887       0.236 **    0.461 ***   0.047 ns     5.891 ns
TrnL-F
  Pure lineage (A)            -        -       -           -           -           0.000 ns     -
  Introgressed lineage (B)    2.484    0.002   3.000       0.132 ns    0.491 *     0.771 ns     1.290 ns
EST 2T13
  Pure lineage (α)            0.000    0.000   3413.950    0.328 ***   0.145 ns    -1.323 ns    -0.741 ns
  Introgressed lineage (β)    2.965    0.450   0.450       0.021 ns    0.457 ns    0.000 ns     -0.176 ns

3.2 Microsatellite analysis

3.2.1 Genetic diversity values

For the EST-SSR markers, the QmDN1 locus was apparently monomorphic and was discarded from all subsequent analyses. Global evaluation of the microsatellite data set using Micro-Checker [102] revealed no evidence of genotyping errors due to stuttering or large allele dropout, but identified possible null alleles at two markers: QmOST1 and QmDN3. For the QmOST1 locus there is a marginal possibility of null alleles in the populations HAZ and MEK (Supporting Information 5 – Fig. 5.1). The QmDN3 locus showed signs of null alleles in all populations and was therefore eliminated from all subsequent analyses (Supporting Information 5 – Fig. 5.2). No linkage disequilibrium was detected between the three remaining EST-SSRs (Supporting Information 6). The total number of alleles (NA) in each population ranged from nine to fourteen and the allelic richness (A) from 3.000 to 4.109, with the SIN population clearly presenting the highest number of alleles and, consequently, the highest allelic richness. Gene diversity (expected heterozygosity over loci) ranged from 0.400 in SIN to 0.598 in CAT (Table 3.4). Only the SIN population departed significantly from Hardy-Weinberg equilibrium, at the 0.01 significance level (Table 3.4). The inbreeding coefficient for the SIN population was positive (Fis=0.1691); since the species, although monoecious, presents a protandrous system that ensures cross-pollination, a significant deviation from zero should reflect biparental inbreeding or population substructure (Table 3.4). In total, 18 alleles were identified at the three loci, and 7 alleles were exclusive to a single population (private alleles). Of these, 5 were exclusive to SIN and the others to MEK and CAT (Table 3.4). The private alleles were at the extremes of the allele size distribution and occurred at very low frequencies.
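For orientation, the inbreeding coefficient can be approximated as Fis ≈ 1 - Ho/He. The sketch below applies this simple form to the rounded SIN EST-SSR values in Table 3.4; the function name is hypothetical, and because the Fis values reported here were calculated in ARLEQUIN with the estimator cited in the Methods [125], which includes sample-size corrections, a small difference from the tabulated value is expected.

```python
def fis(ho, he):
    """Simple F_IS approximation: 1 - Ho/He (no sample-size correction)."""
    return 1 - ho / he


# SIN, EST-SSRs (Table 3.4): Ho = 0.333, He = 0.400
print(round(fis(0.333, 0.400), 4))  # 0.1675, close to the tabulated 0.1691
```

A positive value indicates a deficit of heterozygotes relative to Hardy-Weinberg expectations, which is the pattern discussed above for SIN.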

Table 3.4: Populations of Quercus suber sampled for the molecular genetic work with SSRs, including country and population abbreviations, total number of alleles (NA), allelic richness (A), number of private alleles (PA), expected (He) and observed (Ho) heterozygosities, and within-population inbreeding coefficients (Fis).

                     EST-SSRs (3 loci)                                nuSSRs (5 loci)
Country    Code+   NA   A       PA   Fis            Ho      He      NA   A       PA   Fis            Ho      He
Portugal   ARR     10   3.315   -    0.0162 ns      0.522   0.531   34   6.165   3    0.1064 ns      0.538   0.601
           BUC     10   3.025   -    -0.0005 ns     0.422   0.422   34   5.997   1    0.0399 ns      0.553   0.576
           EST     10   3.167   -    0.2190 ns      0.375   0.479   32   5.703   -    0.1449 ns      0.506   0.591
           GER     10   3.228   -    -0.0840 ns     0.506   0.467   30   5.635   -    0.0586 ns      0.557   0.591
           MON     11   3.479   -    0.1769 ns      0.393   0.476   35   6.494   1    0.1579 *       0.521   0.617
           SIN     14   4.109   5    0.1691 **      0.333   0.400   33   6.158   1    -0.0665 ns     0.621   0.583
Algeria    ALG     10   3.202   -    -0.0055 ns     0.522   0.519   38   6.759   1    0.0920 ns      0.510   0.548
Spain      CAT     11   3.465   1    -0.0218 ns     0.611   0.598   29   5.354   -    0.0628 ns      0.533   0.569
           HAZ     10   3.230   -    0.1188 ns      0.512   0.580   32   5.796   1    0.1181 ns      0.469   0.531
Morocco    TAZ     10   3.276   -    0.1083 ns      0.478   0.535   34   6.138   1    0.0261 ns      0.538   0.552
           KEN     11   3.365   -    -0.2336 ns     0.700   0.570   26   4.938   -    0.0433 ns      0.630   0.658
Tunisia    MEK     11   3.480   1    0.1689 ns      0.441   0.528   29   5.433   -    0.0348 ns      0.508   0.526
Italy      PUG     9    3.000   -    0.0403 ns      0.444   0.463   28   5.600   -    0.0642 ns      0.591   0.630

+ See Fig. 3.4 for the location of each population on a map of Europe. Significance levels after Bonferroni corrections: ns – not significant; * significant at p<0.05; ** significant at p<0.01.

For the nuSSRs, as previously shown by Burgarella et al. [142], the locus MSQ13 is particularly informative for detecting F1 hybrids between Q. suber and Q. rotundifolia, because their allele sizes do not overlap [88,142]. The locus was tested in some individuals from each population (including all the individuals detected as belonging to the introgressed lineages) and proved to be monomorphic at the expected allele size for Q. suber; it was therefore not used in the following analyses. Global evaluation of the microsatellite data set using Micro-Checker revealed no evidence of genotyping errors due to stuttering or large allele dropout, but identified possible null alleles in a few populations for markers QpZAG110, QrZAG20, QrZAG11 and QpZAG15 (Supporting Information 5 – Fig. 5.3, Fig. 5.4, Fig. 5.5). In addition, the QpZAG15 locus departed from Hardy-Weinberg equilibrium (HWE) in 9 populations (data not shown). Taking these results together, this locus was removed from subsequent analyses. No linkage disequilibrium was detected between the remaining loci (Supporting Information 6). The total number of alleles (NA) in each population ranged from 26 (KEN) to 36 (ALG) and the allelic richness (A) from 4.938 to 6.759. Gene diversity (expected heterozygosity over loci) ranged from 0.526 in Mekna to 0.658 in Kenitra (Table 3.4). These values are slightly higher than those obtained for the EST-SSRs in every population except the Spanish and Tunisian populations. Only the MON population departed significantly from HWE at the 0.05 significance level after Bonferroni correction (Table 3.4). Fis for the MON population was positive, which could reflect biparental inbreeding or population substructure (Table 3.4). In this case, however, since Micro-Checker marginally detected null alleles in this population for the QrZAG20 locus, a null-allele effect cannot be ruled out. In total, 56 alleles were identified at the five loci, nine of which were private alleles. Unlike those of the EST-SSRs, the private alleles showed no particular distribution across populations, apart from a slight tendency in the ARR population, which holds three of the nine (Table 3.4). The private alleles were mostly at the extremes of the allele size distribution and occurred at low frequencies.

No microsatellite (either nuSSR or EST-SSR) showed evidence of non-neutrality in the Ewens–Watterson and Ewens–Watterson–Slatkin tests (data not shown).

3.2.2 Genetic differentiation among populations

Different coefficients of genetic differentiation among populations were estimated for both types of SSR markers (Table 3.5). All coefficients except Dest displayed higher values for the EST-SSRs than for the nuSSRs, and for both marker types G'ST and D displayed slightly higher values than FST, GST and RST (Table 3.5). GST and FST showed that differentiation among populations was more than twice as high for the EST-SSRs (GST=0.066 for EST-SSRs vs. 0.031 for nuSSRs; FST=0.071 vs. 0.032). Nevertheless, for the remaining coefficients (RST, G'ST and D) the differences between the markers are not significant and, interestingly, the actual differentiation among populations estimated with Jost's D was the same for both SSR types (Dest=0.077) (Table 3.5).

Table 3.5: Genetic statistics for EST-SSRs and nuSSRs: number of alleles (NA), allelic richness (A), observed (Ho) and expected (He) heterozygosities, FST differentiation among populations according to Weir and Cockerham [125], RST differentiation among populations according to Slatkin [126], GST proportion of genetic diversity among populations according to Nei & Chesser [130], G'ST standardized measure of genetic differentiation according to Hedrick [129], and Dest estimator of actual differentiation according to Jost [128].

Locus           NA       A      Ho      He     FST     RST     GST    G'ST    Dest
EST-SSRs
QrOST1           9   4.320   0.540   0.610   0.064   0.038   0.060   0.148   0.093
QpD12            3   2.999   0.403   0.474   0.139   0.142   0.130   0.229   0.114
QmAJ1            6   3.182   0.501   0.542   0.017   0.019   0.020   0.045   0.025
All             18   1.750   0.481   0.542   0.071   0.066   0.066   0.141   0.077
nuSSRs
QpZAG110        23  13.149   0.817   0.872   0.022  -0.004   0.023   0.169   0.149
QpZAG9           7   3.124   0.138   0.142   0.014   0.015   0.014   0.016   0.003
QrZAG20          5   3.894   0.449   0.557   0.035   0.042   0.036   0.081   0.047
QrZAG7          10   6.625   0.689   0.756   0.057   0.138   0.055   0.204   0.158
QrZAG11         11   5.755   0.580   0.628   0.010   0.032   0.015   0.042   0.027
All             56   6.509   0.535   0.591   0.032   0.045   0.031   0.102   0.077
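To make the difference between the coefficients in Table 3.5 concrete, the sketch below computes GST, Hedrick's G'ST and Jost's D for one locus from population allele frequencies. It is a simplified illustration, assuming equal population weights and using plain frequency-based HS and HT without the Nei & Chesser sample-size corrections, so it will not reproduce the table's estimators exactly. The frequencies are hypothetical. The second example shows why G'ST and D can exceed GST: the maximum attainable GST shrinks as within-population diversity (HS) grows, and the standardized measures rescale for that.

```python
def hs_ht(pop_freqs):
    """Mean within-population (HS) and total (HT) gene diversity at one locus.
    pop_freqs: one list of allele frequencies per population, same allele order."""
    k = len(pop_freqs)
    hs = sum(1 - sum(p * p for p in freqs) for freqs in pop_freqs) / k
    mean_p = [sum(column) / k for column in zip(*pop_freqs)]
    ht = 1 - sum(p * p for p in mean_p)
    return hs, ht, k

def differentiation(pop_freqs):
    """GST (Nei), G'ST (Hedrick 2005) and D (Jost 2008); needs k >= 2 and HT > 0."""
    hs, ht, k = hs_ht(pop_freqs)
    gst = (ht - hs) / ht
    gst_prime = gst * (k - 1 + hs) / ((k - 1) * (1 - hs))
    jost_d = (ht - hs) / (1 - hs) * k / (k - 1)
    return {"GST": gst, "G'ST": gst_prime, "D": jost_d}

# Two hypothetical populations fixed for different alleles: all measures equal 1.
print(differentiation([[1.0, 0.0], [0.0, 1.0]]))
# Two diverse populations with shifted frequencies: G'ST and D exceed GST.
print(differentiation([[0.8, 0.2], [0.2, 0.8]]))
```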

Pairwise FST and RST values were computed and tested for the thirteen populations, for both EST-SSRs and nuSSRs. Values tended to be higher for the EST-SSR data matrix, but not consistently so; therefore both SSR matrices were analysed together. Overall genetic differentiation at the microsatellite loci was low (pairwise FST from 0.000 to 0.123), though highly significant (p<0.001) after Bonferroni correction in 51 of the 78 pairs (Table 3.6). The RST values closely resembled those of the FST matrix. The highest values were obtained for the populations CAT and KEN, followed by PUG. The Dest values, although similar to the FST and RST values, tended to be lower (Supporting Information 7). Isolation by distance was tested with a Mantel test, but no correlation was found between genetic differentiation and geographic distance among populations (r=0.1082, p=0.26).
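The Mantel test used for isolation by distance can be sketched as a permutation test on the correlation between two distance matrices; with 13 populations each matrix holds 13 × 12 / 2 = 78 pairwise values, the same 78 pairs as in Table 3.6. This minimal version assumes Pearson's r on the upper triangles, a one-sided test for positive correlation, and permutation of population labels in the genetic matrix. It is not necessarily the configuration used in this study (regressing FST/(1 - FST) on log geographic distance is a common alternative), and the toy matrices are invented.

```python
import random

def upper_triangle(matrix, order):
    """Values above the diagonal, with rows/columns re-indexed by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mantel_test(genetic, geographic, n_perm=999, seed=1):
    """Mantel r and a one-sided permutation p-value (permuted r >= observed r)."""
    n = len(genetic)
    identity = list(range(n))
    geo = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    perm = identity[:]
    hits = 1                       # the observed arrangement counts as one
    for _ in range(n_perm):
        rng.shuffle(perm)
        if pearson(upper_triangle(genetic, perm), geo) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)

# Toy example: 5 hypothetical populations on a line; differentiation grows with distance.
positions = [0, 10, 30, 60, 100]
geo_km = [[abs(a - b) for b in positions] for a in positions]
fst = [[0.001 * d for d in row] for row in geo_km]
r, p = mantel_test(fst, geo_km)
print(round(r, 3), p)
```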


Table 3.6: Pairwise FST (below the diagonal) and RST (above the diagonal) values between all pairs of populations. Significance symbols refer to the FST values.

     ALG      ARR      BUC      CAT      HAZ      EST      GER      PUG      KEN      TAZ      MON      SIN      MEK
ALG  --       0.023    0.044    0.055    0.013    0.063    0.021    0.075    0.073    0.022    0.059    0.041    0.000
ARR  0.023*** --       0.000    0.052    0.020    0.004    0.000    0.055    0.065    0.034    0.001    0.008    0.046
BUC  0.043*** 0.000ns  --       0.094    0.056    0.007    0.004    0.060    0.108    0.043    0.013    0.009    0.067
CAT  0.052*** 0.050*** 0.086*** --       0.051    0.094    0.084    0.141    0.083    0.067    0.096    0.092    0.086
HAZ  0.013ns  0.020ns  0.053*** 0.049*** --       0.041    0.025    0.087    0.043    0.021    0.050    0.065    0.031
EST  0.035*** 0.003ns  0.007ns  0.086*** 0.039*** --       0.000    0.056    0.092    0.034    0.006    0.035    0.044
GER  0.021*** 0.000ns  0.004ns  0.077*** 0.025*** 0.000ns  --       0.039    0.073    0.028    0.004    0.023    0.037
PUG  0.070*** 0.052*** 0.057*** 0.123*** 0.080*** 0.053*** 0.038*** --       0.101    0.055    0.072    0.100    0.084
KEN  0.068*** 0.061*** 0.097*** 0.076*** 0.042*** 0.084*** 0.068*** 0.092*** --       0.052    0.120    0.132    0.093
TAZ  0.021ns  0.033*** 0.041*** 0.063*** 0.020ns  0.033ns  0.027*** 0.052*** 0.049*** --       0.068    0.082    0.036
MON  0.056*** 0.001ns  0.013ns  0.087*** 0.048*** 0.006ns  0.004ns  0.067*** 0.107*** 0.063*** --       0.035    0.076
SIN  0.039*** 0.008ns  0.009ns  0.084*** 0.061*** 0.034*** 0.022*** 0.091*** 0.117*** 0.076*** 0.034*** --       0.069
MEK  0.000ns  0.044*** 0.063*** 0.079*** 0.030*** 0.042*** 0.036*** 0.077*** 0.085*** 0.035*** 0.070*** 0.065*** --

ns = not significant; * p<0.05; ** p<0.01; *** p<0.001

3.2.3 Population structure

The EST-SSR and nuSSR datasets were analysed separately and then merged to determine the genetic structure of the populations (Fig. 3.8). For the EST-SSRs, the logarithm of the probability of the data, LnP(D), estimated in STRUCTURE as a function of K, peaked at K=3 (mean values: LnP(D)=-2055.3; var[LnP(D)]=131.2), which was confirmed using Evanno's criterion [136] (Supporting Information 8 – Fig. S8.1). For the nuSSR dataset, LnP(D) peaked at K=4 (mean values: LnP(D)=-4749.3; var[LnP(D)]=238.9) and then decreased, but Evanno's criterion gave a higher ΔK for K=3 than for K=4 [136] (Supporting Information 8 – Fig. S8.2). For the combined dataset, LnP(D) peaked at K=4 (mean values: LnP(D)=-6704.3; var[LnP(D)]=303.8), but when ΔK was used to infer the number of clusters, K=2 had the highest value, with a second peak at K=4 (Supporting Information 8 – Fig. S8.3).
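Evanno's ΔK, used above to choose K, compares the second-order rate of change of LnP(D) across K with the spread of LnP(D) among replicate runs. The sketch below is a simplification of Evanno et al.'s per-run definition: it assumes the second difference is taken on the mean LnP(D) at each K and divides it by the standard deviation of the replicates at that K. The replicate values are invented, not those of this study; with these toy numbers LnP(D) peaks at K=4 while ΔK is highest at K=2, the kind of disagreement described above.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp_by_k):
    """lnp_by_k maps K -> list of LnP(D) values from replicate runs.
    Returns K -> |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using the mean L at each K.
    Delta K is undefined for the smallest and largest K tested."""
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
            delta[k] = abs(second_diff) / stdev(lnp_by_k[k])
    return delta

# Invented replicate LnP(D) values (three runs per K), for illustration only.
toy_runs = {
    1: [-1000, -1001, -999],
    2: [-800, -802, -798],
    3: [-750, -760, -740],
    4: [-704, -720, -688],
    5: [-710, -740, -680],
}
delta = evanno_delta_k(toy_runs)
best_k = max(delta, key=delta.get)
print(delta, best_k)   # best_k is 2 here, although mean LnP(D) is highest at K=4
```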

For the most likely run at each K, the r value was always low (below 1), indicating that the sampling locations were informative and helped to detect the population structure.

The results from the EST-SSR and nuSSR datasets differ slightly, which is not unexpected given the different types of SSRs (Fig. 3.8a and Fig. 3.8b). However, in both datasets almost every population can be assigned to one of the detected clusters. At K=2, for the EST-SSRs, the populations CAT and KEN can be assigned to one cluster (pink cluster) and ALG, ARR, BUC, EST, GER, PUG, MON and SIN to the other (blue cluster), while HAZ, TAZ and MEK appear as a mixture of both clusters (Fig. 3.8a). For the nuSSR dataset the groups are different: MEK appears differentiated from the remaining populations in the blue cluster, and ALG and HAZ appear as mixed populations, although slightly more similar to MEK (Fig. 3.8b). Although K=3 was supported for the EST-SSRs, most populations appear as a mixture of clusters at this K. CAT appears differentiated, alone in one cluster (pink cluster), as does SIN in another (blue cluster), although some SIN individuals show a higher probability of belonging to the pink cluster, along with CAT (Fig. 3.8a). For the nuSSRs at K=4, CAT again appears differentiated, alone in one cluster (blue cluster). The Italian population, PUG, can also be placed alone in another cluster (green cluster). The Portuguese populations (ARR, BUC, EST, GER, MON and SIN) can all be placed, to some extent, in a third cluster (pink cluster), and HAZ and TAZ appear as mixed populations (Fig. 3.8b).

For a more robust analysis, both data matrices were merged (Fig. 3.8c). At K=2 the populations ALG, CAT and MEK belong to the same pink cluster (assignment probabilities of 79%, 88% and 91%, respectively), and the Portuguese populations and PUG to the blue cluster (94% on average for the Portuguese populations and 85% for PUG). HAZ, KEN and TAZ appear as mixed populations, with a slight tendency towards the pink cluster (Fig. 3.8c). At K=4, CAT separates from the other populations in a green cluster (75%). PUG and KEN belong to the same yellow cluster (79% and 74%, respectively). MEK forms another cluster (85% for the pink cluster), and HAZ and ALG appear as mixed populations, although more closely related to the MEK cluster. The Portuguese populations all group together in the blue cluster (83% on average), with GER as the most admixed population of the group. TAZ is admixed among several clusters (Fig. 3.8c). The geographic distribution of the clusters obtained by STRUCTURE for the combined SSR dataset is presented in Fig. 3.9a for K=2 and in Fig. 3.9b for K=4.

To complement the STRUCTURE analyses, a GENELAND analysis was performed on the merged dataset. The geographical distribution of the six clusters detected is shown in Fig. 3.9c. The first cluster (purple) comprised the Portuguese populations (EST, GER, BUC, MON, SIN and ARR); the second (orange) contained only KEN; the third (green) grouped HAZ and TAZ; the fourth (grey) contained a single population, CAT; the fifth (blue) comprised ALG and MEK; and the sixth (red) contained only PUG.

AMOVA based on the clusters identified by GENELAND and STRUCTURE (Fig. 3.8 and Fig. 3.9) was always significant at the 0.001 level for the clusters detected, but also showed that the great majority of the genetic variation lies within populations (94%). The AMOVA based on the six GENELAND clusters yielded the highest value of genetic differentiation among groups (FCT=4.99) (Supporting Information 9).
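The variance partitioning behind the AMOVA result (most variation within populations) can be sketched with a single-level AMOVA in the style of Excoffier et al., which splits sums of squared distances between individuals into among- and within-population components. The analysis in this study had an additional level (groups of populations, giving FCT), which this sketch omits; it also assumes a precomputed matrix of squared distances between individuals (for microsatellites, squared Euclidean distances between allele-count vectors are one common choice). The toy data are invented.

```python
def amova_one_level(dist2, pops):
    """Single-level AMOVA. dist2: squared distances between individuals;
    pops: population label of each individual. Returns variance components and PhiST."""
    n = len(pops)
    groups = {}
    for i, label in enumerate(pops):
        groups.setdefault(label, []).append(i)
    p = len(groups)

    def ssd(idx):
        # Sum of squared deviations = (1/m) * sum of squared distances over pairs.
        return sum(dist2[i][j] for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)

    ssd_total = ssd(list(range(n)))
    ssd_within = sum(ssd(idx) for idx in groups.values())
    ms_among = (ssd_total - ssd_within) / (p - 1)
    ms_within = ssd_within / (n - p)
    n0 = (n - sum(len(idx) ** 2 for idx in groups.values()) / n) / (p - 1)
    var_among = (ms_among - ms_within) / n0
    var_within = ms_within
    return {
        "var_among": var_among,
        "var_within": var_within,
        "pct_within": 100 * var_within / (var_among + var_within),
        "PhiST": var_among / (var_among + var_within),
    }

# Toy example: one quantitative "genotype" per individual, two hypothetical populations.
values = [0, 2, 10, 12]
labels = ["A", "A", "B", "B"]
d2 = [[(x - y) ** 2 for y in values] for x in values]
print(amova_one_level(d2, labels))
```

In this deliberately well-separated toy example most variation lies among populations, the opposite of the pattern found for cork oak.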


Figure 3.8: Structure clustering results obtained for the a) EST-SSRs dataset (K=2 and 3); b) nuSSRs dataset (K=2, 3 and 4); and c) combined dataset (K=2, 3 and 4). Populations are separated by black bars and identified at the bottom. In all analyses, each distinct cluster is represented by a unique colour. Each individual is represented by a thin bar and the colours on each vertical bar represent the probability of the individual belonging to each cluster.


Figure 3.9: Geographic distribution of the clusters obtained by STRUCTURE and GENELAND: a) combined dataset with STRUCTURE for K=2; b) combined dataset with STRUCTURE for K=4; and c) combined dataset with GENELAND for K=6. Pie charts represent the assignment probabilities to each cluster, and each cluster is colour-coded. Pie chart sizes reflect the number of samples per population (22-32). In a) and b) the colour codes match those used in Fig. 3.8c for each cluster.


4. Discussion

4.1 Differentiation and demographic patterns

Maternally inherited cpDNA markers yield valuable information about the genetic variability associated with local populations or provenances [143]. The geographic patterns of cpDNA haplotypes in many widespread European forest trees are therefore sometimes interpreted under the assumption of survival in glacial refugia in southern and eastern Europe, outside the limits of the Weichselian ice sheet, followed by postglacial migration. Some species appear to have spread northwards and westwards from a single refugium, while others spread from multiple refugia [48,54,57,70,144].

The analysis of the cpDNA sequence data clearly shows (with the exception of the rbcL fragment) two well-established cork oak lineages, the pure lineage and the introgressed lineage, which are also supported by the sequencing of the nuclear candidate gene.

The cpDNA pure lineage described here seems to be related to the "suber" lineage previously described by Jiménez et al. [41], which is almost specific to cork oak populations and may be considered the original and most widely distributed lineage in this species [41,64]. The TrnS/PsbC fragment had the highest resolving power for this lineage, revealing three main haplotypes (A1, A2 and A3) (Fig. 3.1 and Fig. 3.4). These three sublineages occupy well-delimited geographic areas and possibly reflect refuge areas from which expansion putatively occurred after the last glaciation, a view somewhat supported by the mismatch distributions and neutrality tests (Table 3.3). Previous work by López de Heredia et al. [52] and Lumaret et al. [43] indicated the southern Iberian Peninsula as a possible refuge area, supported by palynological data. Although the results of this work are not conclusive enough to support this idea, sublineage A1 appears to have spread from a western Mediterranean area, consistent with a refuge area in the Iberian Peninsula. Lumaret et al. [43], based on RFLP analysis of the whole cpDNA, indicated two further possible refuge areas for cork oak, namely the southern Italian Peninsula and North Africa, although this is not supported by the fossil record [52]. The origin of sl A3 is difficult to determine because this haplotype is distributed throughout most of peninsular Italy and North Africa (Algeria and Tunisia). However, either of these geographic areas could have been a refuge for this lineage, in agreement with the results presented by Lumaret et al. [43]. Nevertheless, the presence of a haplotype (sl A2) restricted to Sicily was unexpected. Although no previous work suggests Sicily as a refuge area, the geographic restriction of this sublineage and the fact that it is roughly contemporaneous with the other two sublineages suggest that Sicily might indeed have been a refuge area for cork oak.
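The mismatch distributions and neutrality tests mentioned above summarize how pairwise differences between sequences are distributed. As an illustration only, the sketch below computes a mismatch distribution and Tajima's D, one widely used neutrality test, for aligned sequences; the sequences are invented. A negative D, as in this star-like toy sample, indicates an excess of rare variants, the pattern expected after population expansion.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def pairwise_differences(seqs):
    """Number of differing sites for every pair of aligned sequences."""
    return [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]

def mismatch_distribution(seqs):
    """Number of sequence pairs with 0, 1, 2, ... differences."""
    return dict(sorted(Counter(pairwise_differences(seqs)).items()))

def tajimas_d(seqs):
    """Tajima's (1989) D from nucleotide diversity (pi) and segregating sites (S)."""
    n = len(seqs)
    diffs = pairwise_differences(seqs)
    pi = sum(diffs) / len(diffs)
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        raise ValueError("no segregating sites: D is undefined")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Invented star-like sample: one ancestral sequence plus four single-mutation variants.
sample = ["AAAA", "CAAA", "ACAA", "AACA", "AAAC"]
print(mismatch_distribution(sample))   # {1: 4, 2: 6}
print(round(tajimas_d(sample), 3))     # negative: excess of rare variants
```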

The extensive introgression of Q. suber by Q. ilex may also point to several additional refuge areas. In fact, López de Heredia et al. [52] proposed north-eastern Spain (Catalonia) as a potential refuge area resulting from extensive hybridization with Q. ilex, arguing that populations from this area present a predominant "ilex chlorotype" that is very rare in holm oak. Therefore, the hypothesis that some populations withstood glacial conditions in this area (or elsewhere) by hybridizing cannot be ruled out. Although this cannot be fully corroborated, the CAT population shows a total replacement of the cpDNA pure lineage, which might indicate that the introgression events are ancient and indeed reflect a glacial refuge area. The same complete replacement of the cpDNA pure lineage appears to have occurred in MEK, and an almost complete one in HAZ.

However, more detailed inferences about the geographic origins of the haplotypes and their migration scenarios will require additional population sampling and, most likely, other genomic regions, because the low cpDNA variation could itself bias the identification of glacial refugia for Quercus suber.

There are no previous studies using nuclear genome sequences in this species. Compared with the cpDNA sequences, the nuclear DNA fragment appears to be more informative: its nucleotide diversity is higher than that of the cpDNA fragments, as is the number of haplotypes found for the pure lineage (Table 3.1 and Table 3.2). The analysis also reveals a more complex history of geographic distribution for cork oak. As with the cpDNA, the results showed a pure lineage composed of three sublineages, but their distribution is not as geographically structured as in the cpDNA dataset (Fig. 3.6 and Fig. 3.7). The nuDNA sublineage α3, which in the cpDNA was restricted to Sicily, extends to Lazio (Italy) and Tunisia. Sublineage α1, equivalent to cpDNA sublineage A1, was still the most frequent, but in the nuDNA it is not restricted to the western Mediterranean as it was in the cpDNA, extending, although less frequently, to the eastern Mediterranean. Likewise, sublineage α2 did not appear to be restricted to the eastern part of the species distribution. These differences between the cpDNA and nuDNA sequence data could be explained by long-distance pollen dispersal and/or high levels of polymorphism. However, the levels of polymorphism in the candidate gene (Table 3.1 and Table 3.2) do not appear high enough to account for these differences, so long-distance pollen dispersal, combined with more limited acorn dispersal, seems a better explanation. This is consistent with indirect methods based on measures of genetic differentiation for nuclear versus cpDNA markers in oaks, which suggest that pollen flow is much higher (by two orders of magnitude) than seed flow [145-147].
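The indirect comparison of pollen and seed flow cited above is often made with Ennos's (1994) island-model approximation, which converts FST at biparentally inherited nuclear markers (F_b) and at maternally inherited cpDNA markers (F_m) into a ratio of pollen to seed migration: r = [(1/F_b - 1) - 2(1/F_m - 1)] / (1/F_m - 1). The sketch assumes that formula; the FST values in the usage line are hypothetical, not estimates from this study, and were chosen only to show how weak nuclear structure combined with strong chloroplast structure yields a ratio of the order of 100.

```python
def pollen_to_seed_flow_ratio(fst_nuclear, fst_cpdna):
    """Ennos (1994) ratio of pollen to seed migration rates under an island model.
    fst_nuclear: FST at biparentally inherited (diploid) nuclear markers.
    fst_cpdna: FST at maternally inherited (haploid) chloroplast markers."""
    nuclear_term = 1 / fst_nuclear - 1
    cp_term = 1 / fst_cpdna - 1
    return (nuclear_term - 2 * cp_term) / cp_term

# Hypothetical values: weak nuclear structure, strong chloroplast structure.
ratio = pollen_to_seed_flow_ratio(fst_nuclear=0.03, fst_cpdna=0.8)
print(round(ratio, 1))  # 127.3
```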

The pattern of three sublineages found in this work clearly contrasts with the one previously found by Magri et al. [16]. Using cpDNA microsatellites, the authors analysed cork oak populations throughout the species range and found a strong geographical structure characterized by five distinct haplotypes (Fig. 1.4). Combining the cpDNA SSR data with palaeobotanical and geodynamic models led the authors to suggest an early Cenozoic origin for cork oak in the Iberian Peninsula and subsequent genetic drift geographically consistent with the Oligocene and Miocene break-up events [16] (Fig. 1.5). All these events seem to have occurred without detectable cpDNA changes over a time span of at least 15 to 25 million years. This is also somewhat inconsistent with the results of this work. Although most of the cpDNA fragments sequenced here showed no resolution, and therefore no haplotype variation capable of detecting the three sublineages, the TrnS/PsbC fragment shows that the sublineages are separated by a single mutational event (Fig. 3.1), which is unlikely to date back to the early Cenozoic.

4.2 Hybridization and introgression

Several morphology-based proposals for Quercus taxonomy have been presented [12,26]. Classification has not been straightforward and remains uncertain, especially at the subgenus level. The taxonomic scheme proposed by Schwarz [26] is possibly the most widely accepted for the classification of cork oak, and appears to be the most suitable for describing the systematics of European oaks [19,31,32].

Among the cpDNA fragments sequenced for the eleven Quercus species used in this study, the rbcL fragment showed no sequence variation across all 11 species, whereas the remaining four fragments (matK, TrnS/PsbC, TrnL-F and TrnH/PsbA) were in general able to distinguish the four subgenera (or subsections) proposed by Schwarz [26] (Quercus, Erythrobalanus, Sclerophyllodrys and Cerris) (Fig. 3.1a, Fig. 3.2, Fig. 3.3 and Fig. 3.5). However, the phylogenetic relationships between the subgenera are inconsistent among fragments, and accurate inferences about those relationships are not possible. Also, in accordance with the recent work of Piredda et al. [29], the genus Quercus appears not to be amenable to barcoding with the most common cpDNA sequences, since most of the species analysed within the same subgenus share the same cpDNA haplotype. Low cpDNA substitution rates and hybridization events are the likely causes [29].

The nuclear DNA, however, has much greater discriminatory power than the cpDNA. The EST 2T13 fragment supports the recognition of the subgenera Sclerophyllodrys, Cerris, Erythrobalanus and Quercus, in agreement with the works of Bellarosa et al. [27,28], which also used fragments of the nuclear genome. The EST 2T13 fragment also distinguishes all the species analysed and, although this requires further study, supports the idea that nuclear DNA might be a useful supplementary barcoding tool in difficult genera such as Quercus.

The complex evolutionary history of the Mediterranean evergreen oaks has already been addressed by other authors, who showed that Q. suber, Q. ilex and Q. coccifera share haplotypes as a result of successful hybridization and introgression of Q. suber by Q. ilex [41,43,52]. However, those results were based on RFLP analysis of the cpDNA only, with no insight into the nuclear genome. Sequencing of the cpDNA fragments immediately reveals the introgression events in Q. suber. Since the subgenera Sclerophyllodrys and Cerris are clearly distinguishable in the phylogenetic trees (Fig. 3.1, Fig. 3.2, Fig. 3.3 and Fig. 3.5), the presence of cork oak samples in both subgenera readily points to introgression in Q. suber, allowing the identification of a pure lineage of cork oak haplotypes in subg. Cerris and an introgressed lineage in subg. Sclerophyllodrys.

The distribution of the cpDNA introgressed lineage appears restricted to the western part of the species range and peripheral to the distribution of the pure lineage (specifically sublineage A3). Although the introgression events cannot be dated precisely, some may in fact reflect glacial refugia in this part of the range [possibly north-eastern Spain (Catalonia) and/or Morocco], where cork oak populations survived through introgression with Q. rotundifolia. During postglacial range expansion, the rapid spread of cork oak from the pure-lineage refugia may have limited the expansion of the introgressed lineage, giving rise to the mixed populations that present both haplotype lineages (Fig. 3.4). On the other hand, the phylogenetic trees do not allow us to rule out more recent or ongoing introgression events (Fig. 3.1, Fig. 3.2, Fig. 3.3). Hybridization still occurs, most frequently in central and eastern Iberia, where first-generation hybrids between Q. suber and Q. ilex are easily identified in the field [52].

The same introgressed lineage seems to be present in the nuclear DNA, although this has not been reported before. However, the cork oak samples belonging to the introgressed lineage are not always the same in both genomes: some samples of the cpDNA introgressed lineage have a nuclear genome of the pure lineage while others show evidence of a nuclear introgressed lineage, and some samples whose cpDNA belongs to the pure lineage have a nuclear genome from the introgressed lineage.

The flowering phenology and present-day ecology of the two species suggest that pollen flow should be predominantly from Q. suber into Q. ilex. Quercus suber performs better than Q. ilex as a pollen parent in interspecific crosses [45], and molecular evidence supports this expectation [18,43,71]. This would explain the cork oak samples that carry an introgressed cpDNA but a nuclear fragment from the pure lineage (see, for example, samples TAZ 1 or HAZ 5). However, the reverse also seems to occur, as some samples carry cpDNA from the pure lineage and nuclear DNA from the introgressed lineage (see TOL 3 or LAZ 2). Interestingly, some samples (see GER 5 or TAZ 2) carry both cpDNA and nuDNA fragments of the introgressed lineage.

Some authors have suggested that the sharing of haplotypes between Q. ilex and Q. coccifera in subg. Sclerophyllodrys results from introgression between these species or from incomplete lineage sorting [41,64]. The same occurs in subg. Cerris, between Q. cerris and Q. suber. The lack of cpDNA resolution might argue for incomplete lineage sorting, but previous authors have suggested introgression between these closely related species [16]. Although Quercus suber and Quercus cerris belong to the same taxonomic group, subgenus Cerris [17,30], they are morphologically very distinct and have different geographical and ecological ranges. The natural range of Q. cerris extends from central and southern Europe to Asia Minor; however, in peninsular Italy and in Sicily the ranges of Q. cerris and Q. suber overlap. In fact, Q. crenata is hypothesized to be a hybrid between Q. suber and Q. cerris, although other authors consider it a fixed species.

The analyses of the cpDNA datasets show that Quercus cerris and Quercus suber share the same haplotype for most of the fragments, which could point to incomplete lineage sorting. However, the higher resolving power of the TrnS/PsbC fragment (Fig. 3.1) places the Q. cerris haplotype as highly derived from sublineage A3, in the eastern Mediterranean area. Although this cpDNA fragment differentiates the species, it does not exclude possible, and perhaps somewhat ancient, hybridization events between Q. suber sl A3 and Q. cerris. The nuclear fragment shows that Q. cerris shares the same haplotype as Q. suber samples from sublineage α1, one of the lineages from the eastern Mediterranean area. Considering both types of markers, the cpDNA does not immediately suggest introgression between these species, and the nuclear candidate gene does not discriminate between this hypothesis and incomplete lineage sorting. Nevertheless, retention of ancestral polymorphism must also be considered, given that introgression between these species cannot be confirmed. These two hypotheses may be confounded, particularly since contemporary introgression cannot be ruled out owing to the co-occurrence of both species in some areas.


4.3 Genetic diversity and population structure

The populations for the SSR analyses were selected on the basis of the sequencing results and across the entire species range, in order to maximize the chance of surveying a large part of the species' genetic diversity.

Several recent studies have assessed genetic diversity and population structure in various species using a combined analysis of EST and genomic SSRs. Although some work has been done in cork oak with nuSSRs, there were no previous studies using EST-SSRs. Neutrality tests indicate that selection did not differentially affect the performance of EST-SSRs and nuSSRs in characterizing cork oak populations. Even though EST-SSRs are potentially exposed to selection, only a small percentage show evidence of positive selection [91,93]. It is nonetheless important to test EST-SSRs for selective neutrality before using them in population genetic analyses because, although most are unlikely to be under strong selection pressure, a small percentage may be [91,93]. The results also show that genetic diversity measured with EST-SSRs is similar to that measured with nuSSRs, and the EST-SSRs showed no evidence of null alleles or other genotyping errors. Taken together, this evidence suggests that EST-SSRs are appropriate markers for population genetic studies in cork oak.

The population differentiation found, although low, was significant and, at least for the EST-SSRs (FST=0.071; RST=0.066; Dest=0.077), close to the lower limit of the range of average values (0.07-0.09) expected for long-lived, wind-pollinated woody species (Table 3.5 and Table 3.6) [22]. The studies of Coelho et al. [22] and Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007) considered only Portuguese populations (FST=0.0172 and FST=0.02/RST=0.013, respectively), and the overall population differentiation found here across the species range was considerably higher (RST=0.066 for EST-SSRs vs. 0.045 for nuSSRs; FST=0.071 for EST-SSRs vs. 0.032 for nuSSRs) (Table 3.5). However, pairwise FST and RST values between Portuguese populations tend to be lower and non-significant (Table 3.6), denoting the small differentiation among these populations that was also found by Coelho et al. [22] and Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007). In agreement with these studies, we also found that most of the species' genetic diversity (94%) lies within rather than among populations.


The locus MsQ13 was previously suggested to be particularly informative for detecting F1 hybrids between Q. suber and Q. rotundifolia because their allele sizes do not overlap [88,142]. Although the locus was tested here in individuals from every population, including all individuals assigned to the introgressed lineages (either cpDNA or nuDNA), it was monomorphic at the allele size expected for Q. suber. Nevertheless, the work of Burgarella et al. [142] clearly demonstrates the difficulty of detecting introgressed hybrids in these species, even with microsatellite loci that are highly differentiated between species and have good diagnostic power. Also, some of the SSRs initially chosen for this work were discarded because they produced no amplification product, their scoring was highly doubtful, or they deviated strongly from HWE. Future work may require a more targeted choice of easily reproducible markers, as well as the inclusion of key holm oak populations for comparison when detecting hybrids.

Isolation by distance was tested, but no correlation was found between genetic differentiation and geographic distance among populations across the Mediterranean. However, in previous work on Spanish cork oak populations, Ramírez-Valiente et al. [148], using the same nuSSR battery as this work (with the exception of QpZAG46, which could not be scored clearly here), found that FST for the neutral markers was correlated with geographic distance. In the same work, the authors also found an association between leaf size and the microsatellite QpZAG46, suggesting possible linkage between QpZAG46 and genes controlling leaf size [148].

The population structure results from the EST-SSR and nuSSR datasets differ slightly, which is not unexpected given the different types of SSRs (Fig. 3.8a and Fig. 3.8b). However, for the merged dataset, from which the most consistent information is expected, the STRUCTURE and GENELAND results, although not in complete agreement, show the same emerging pattern: 1) The Portuguese populations grouped together in one cluster. There was no differentiation among the Portuguese populations, in agreement with the results of Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007). This might be explained by the relatively short geographic distances between these populations, which would allow gene flow to homogenize their allele frequencies; it is also in agreement with the low and mostly non-significant pairwise FST and RST values found between them; 2) Catalonia is clearly the most differentiated population: the analyses always placed it as the sole member of a cluster, and it had the highest pairwise FST and RST values.

Overall, GENELAND provided a more plausible scenario for the distribution of the clusters. In the STRUCTURE results, the population of Puglia (PUG) appeared in clusters that are difficult to explain: at K=2 it grouped with the Portuguese populations and at K=4 with Kenitra (KEN), even though at K=2 KEN and PUG were in opposite clusters. Whereas STRUCTURE groups KEN and PUG in one cluster, GENELAND places each of these populations in its own cluster (Figs. 3.8 and 3.9). The small number of SSRs and the low levels of differentiation might explain the implausible placement of some populations in the STRUCTURE analysis. However, the fact that GENELAND identified more clusters than STRUCTURE (six versus two or four), and that independent GENELAND runs identified the same clusters with similar posterior probabilities, could indicate that the GENELAND algorithm is more sensitive to weak spatial clusters when differentiation is low. A similar finding was recently reported by Wellenreuther et al. [149] in a study of Ischnura elegans, the blue-tailed damselfly.


5. Final Remarks

Extending over about 2.2 million ha in seven Mediterranean countries (Portugal, Spain, Algeria, Morocco, Italy, Tunisia and France), cork oak forest landscapes are one of the best examples of the multifunctional role of forests, maintained over thousands of years while promoting high levels of biodiversity. Well-managed cork oak forests provide valuable ecological functions such as soil conservation, buffering against climate change and desertification, water table recharge and run-off control, and they contribute to the survival of many species. Cork oak trees are essential for ensuring that these ecosystems maintain their ecological balance. These semi-natural woodlands thus provide a valuable income to local populations, both directly through cork harvesting and indirectly through other economically valuable resources such as grazing grounds for animals, and above all through the maintenance of an ecological balance.

Mediterranean regions have been facing a growing number of extreme weather events due to rapid climate change. Assessing the impacts of climate extremes on cork oak trees can help to plan better forest management practices for coping with future climate change and to achieve the sustainable development of the ecosystems and societies of the Mediterranean area.

Studying the consequences of past climate shifts for biodiversity is one of the best ways to validate models of the ecological and evolutionary consequences of future change. Advances in DNA analysis now allow the evolutionary history of forest trees to be reconstructed.

This work presented the first molecular approach assessing the potential of a combined analysis of chloroplast and nuclear DNA markers, using both sequence data and microsatellites. The importance of such synergistic analyses is highlighted when addressing questions such as the evolutionary history and the geographic patterns of population diversity.

Overall, the three major objectives of this work were achieved. It was possible to gather valuable information on the evolutionary history of Quercus suber. Sequence data revealed two major haplotype lineages, consistent across the nuclear and chloroplast genomes. Within the pure lineage, three sublineages and some signs of recent population expansion were detected. It is hypothesised that during the coldest periods cork oak survived only in areas with a more benign climate (possibly three refugia), from which, after the warming at the end of the last glacial period, it may have colonized its current distribution area.

It was also possible to explore the phylogenetic relationships between cork oak and other Quercus species from all four recognized subgenera. This also enabled the detection of the introgressed lineage in cork oak, resulting from several hybridization events with Q. ilex. Although some of these hybridization events appear to be old, ongoing hybridization cannot be ruled out. Moreover, although hybridization with and introgression from Q. ilex have already been reported by other authors, this work made it evident that introgression events are also detectable in the nuclear genome.

Finally, microsatellites revealed some differentiation and genetic structure among key cork oak populations. Although the differentiation and the clusters detected are rather weak, adding more microsatellite loci and populations may strengthen the results found here.


6. Bibliographic References

1 Food and Agriculture Organization of the United Nations (FAO) (2011) State of the World's Forests. FAO

2 Petit, R. J. and Hampe, A. (2006) Some Evolutionary Consequences of Being a Tree. Annual Review of Ecology, Evolution, and Systematics. 37, 187-214

3 Oldfield, S. et al. (1998) The World List of Threatened Trees, World Conservation Press, Cambridge

4 Hansen, A. J. et al. (2001) Global Change in Forests: Responses of Species, Communities, and Biomes. BioScience. 51, 765-779

5 Food and Agriculture Organization of the United Nations (FAO) (2010) Global Forest Resources Assessment 2010. Main report

6 González-Martínez, S. C. et al. (2006) Forest-tree population genomics and adaptive evolution. New Phytologist. 170, 227-238

7 Schaal, B. A. et al. (1998) Phylogeographic studies in plants: problems and prospects. Molecular Ecology. 7, 465-474

8 Petit, R. J. et al. (2005) Climate changes and tree phylogeography in the Mediterranean. Taxon. 54, 877-885

9 Avise, J. C. et al. (1987) Intraspecific Phylogeography: The Mitochondrial DNA Bridge Between Population Genetics and Systematics. Annual Review of Ecology and Systematics. 18, 489-522

10 Avise, J. C. (2009) Phylogeography: retrospect and prospect. Journal of Biogeography. 36, 3-15

11 Beheregaray, L. B. (2008) Twenty years of phylogeography: the state of the field and the challenges for the Southern Hemisphere. Molecular Ecology. 17, 3754-3774

12 Nixon, K. C. (1993) Infrageneric classification of Quercus (Fagaceae) and typification of sectional names. Annales des Sciences Forestières. 50, 25s-34s

13 Nixon, K. C. (2006) Global and Neotropical Distribution and Diversity of Oak (genus Quercus) and Oak Forests. In Ecology and Conservation of Neotropical Montane Oak Forests (Ecological Studies 185) (Kappelle, M., ed), pp. 3-13, Springer-Verlag

14 Pausas, J. G. et al. (2006) Regeneration of a marginal Quercus suber forest in the eastern Iberian Peninsula. Journal of Vegetation Science. 17, 729


15 Elena-Rosselló, J. A. and Cabrera, E. (1996) Isozyme Variation in Natural Populations of Cork-Oak (Quercus suber L.). Population Structure, Diversity, Differentiation and Gene Flow. Silvae Genetica. 45, 229-235

16 Magri, D. et al. (2007) The distribution of Quercus suber chloroplast haplotypes matches the palaeogeographical history of the western Mediterranean. Molecular Ecology. 16, 5259-5266

17 Tutin, T. G. et al. (1993) Flora Europaea, Volume 1, (2nd edn) Cambridge University Press

18 Toumi, L. and Lumaret, R. (1998) Allozyme variation in cork oak (Quercus suber L.): the role of phylogeography and genetic introgression by other Mediterranean oak species and human activities. Theoretical and Applied Genetics. 97, 647-656

19 Toumi, L. and Lumaret, R. (2001) Allozyme characterisation of four Mediterranean evergreen oak species. Biochemical Systematics and Ecology. 29, 799-817

20 Pausas, J. G. et al. (2009) The tree. In Cork Oak Woodlands on the Edge: Ecology, Adaptive Management, and Restoration (1st edn) (Aronson, J. et al., eds), pp. 11-21, Island Press

21 Carrión, J. S. et al. (2000) Past distribution and ecology of the cork oak (Quercus suber) in the Iberian Peninsula: a pollen-analytical approach. Diversity and Distributions. 6, 29-44

22 Coelho, A. C. et al. (2006) Genetic Diversity of Two Evergreen Oaks [Quercus suber (L.) and Quercus ilex subsp. rotundifolia (Lam.)] in Portugal using AFLP Markers. Silvae Genetica. 55, 146-152

23 Soto, A. et al. (2007) Differences in fine-scale genetic structure and dispersal in Quercus ilex L. and Q. suber L.: consequences for regeneration of Mediterranean open woods. Heredity. 99, 601-607

24 Pulido, F. J. et al. (2001) Size structure and regeneration of Spanish holm oak Quercus ilex forests and dehesas: effects of agroforestry use on their long-term sustainability. Forest Ecology and Management. 146, 1-13

25 Pons, J. and Pausas, J. G. (2006) Oak regeneration in heterogeneous landscapes: The case of fragmented Quercus suber forests in the eastern Iberian Peninsula. Forest Ecology and Management. 231, 196-204

26 Schwarz, O. (1964) Quercus L. In Flora Europaea, Volume 1 (2nd edn) (Tutin, T. G. et al., eds), pp. 71-76, Cambridge University Press

27 Bellarosa, R. et al. (2005) Utility of ITS sequence data for phylogenetic reconstruction of Italian Quercus spp. Molecular Phylogenetics and Evolution. 34, 355-370


28 Bellarosa, R. et al. (1990) Ribosomal RNA genes in Quercus spp. (Fagaceae). Plant Systematics and Evolution. 172, 127-139

29 Piredda, R. et al. (2011) Prospects of barcoding the Italian wild dendroflora: oaks reveal severe limitations to tracking species identity. Molecular Ecology Resources. 11, 72-83

30 Manos, P. S. et al. (1999) Phylogeny, Biogeography, and Processes of Molecular Differentiation in Quercus subgenus Quercus (Fagaceae). Molecular Phylogenetics and Evolution. 12, 333-349

31 Manos, P. S. et al. (2001) Systematics of Fagaceae: Phylogenetic test of reproductive trait evolution. International Journal of Plant Sciences. 162, 1361-1379

32 Kress, W. J. and Erickson, D. L. (2007) A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS ONE. 2, e508

33 Cowan, R. S. et al. (2006) 300,000 Species to Identify: Problems, Progress, and Prospects in DNA Barcoding of Land Plants. Taxon. 55, 611

34 Hajibabaei, M. et al. (2007) DNA barcoding: how it complements taxonomy, molecular phylogenetics and population genetics. Trends in Genetics. 23, 167-172

35 Chase, M. W. et al. (2005) Land plants and DNA barcodes: short-term and long-term goals. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 360, 1889-1895

36 Fazekas, A. J. et al. (2008) Multiple multilocus DNA barcodes from the plastid genome discriminate plant species equally well. PLoS ONE. 3, e2802

37 Chase, M. W. et al. (2007) A proposal for a standardised protocol to barcode all land plants. Taxon. 56, 295-299

38 Lahaye, R. et al. (2008) DNA barcoding the floras of biodiversity hotspots. PNAS. 105, 2923-2928

39 CBOL Plant Working Group (2009) A DNA barcode for land plants. PNAS. 106, 12794-12797

40 Neubig, K. M. et al. (2008) Phylogenetic utility of ycf1 in orchids: a plastid gene more variable than matK. Plant Systematics and Evolution. 277, 75-84

41 Jiménez, P. et al. (2004) High variability of chloroplast DNA in three Mediterranean evergreen oaks indicates complex evolutionary history. Heredity. 93, 510-515

42 Lumaret, R. et al. (2002) Phylogeographical variation of chloroplast DNA in holm oak (Quercus ilex L.). Molecular Ecology. 11, 2327-2336


43 Lumaret, R. et al. (2005) Phylogeographical Variation of Chloroplast DNA in Cork Oak (Quercus suber). Annals of Botany. 96, 853-861

44 Rushton, B. S. (1993) Natural hybridization within the genus Quercus L. Annals of Forest Science. 50, 73-90

45 Boavida, L. C. et al. (2001) Sexual reproduction in the cork oak (Quercus suber L.). II. Crossing intra- and interspecific barriers. Sexual Plant Reproduction. 14, 143-152

46 Bennett, K. D. (1997) Evolution and Ecology: the pace of life, Cambridge University Press.

47 French, H. M. (2007) The Periglacial Environment, (3rd edn) Longman.

48 Comes, H. P. and Kadereit, J. W. (1998) The effect of Quaternary climatic changes on plant distribution and evolution. Trends in Plant Science. 3, 432-438

49 Hewitt, G. M. (1999) Post-glacial re-colonization of European biota. Biological Journal of the Linnean Society. 68, 87-112

50 Willis, K. J. et al. (2000) The Full-Glacial Forests of Central and Southeastern Europe. Quaternary Research. 53, 203-213

51 Palmé, A. E. et al. (2003) Postglacial recolonization and cpDNA variation of silver birch, Betula pendula. Molecular Ecology. 12, 201-212

52 López de Heredia, U. et al. (2007) Molecular and palaeoecological evidence for multiple glacial refugia for evergreen oaks on the Iberian Peninsula. Journal of Biogeography. 34, 1505-1517

53 Willis, K. J. and Van Andel, T. H. (2004) Trees or no trees? The environments of central and eastern Europe during the Last Glaciation. Quaternary Science Reviews. 23, 2369-2387

54 Hewitt, G. M. (1996) Some genetic consequences of ice ages, and their role in divergence and speciation. Biological Journal of the Linnean Society.

55 Hewitt, G. M. (2000) The genetic legacy of the Quaternary ice ages. Nature. 405, 907-913

56 Petit, R. J. et al. (1997) Chloroplast DNA footprints of postglacial recolonization by oaks. PNAS. 94, 9996-10001

57 Taberlet, P. et al. (1998) Comparative phylogeography and postglacial colonization routes in Europe. Molecular Ecology. 7, 453-464


58 Konnert, M. and Bergmann, F. (1995) The geographical distribution of genetic variation of silver fir (Abies alba, Pinaceae) in relation to its migration history. Plant Systematics and Evolution. 196, 19-30

59 Dumolin-Lapègue, S. et al. (1997) Phylogeographic structure of white oaks throughout the European continent. Genetics. 146, 1475-1487

60 Pollard, D. and Barron, E. J. (2003) Causes of model-data discrepancies in European climate during Oxygen Isotope Stage 3 with insights from the last glacial maximum. Quaternary Research. 59, 108-113

61 Barron, E. and Pollard, D. (2002) High-Resolution Climate Simulations of Oxygen Isotope Stage 3 in Europe. Quaternary Research. 58, 296-309

62 Kvacek, Z. and Walther, H. (1989) Paleobotanical studies in Fagaceae of the European Tertiary. Plant Systematics and Evolution. 162, 213-229

63 Dumolin, S. et al. (1995) Inheritance of chloroplast and mitochondrial genomes in pedunculate oak investigated with an efficient PCR method. Theoretical and Applied Genetics. 91, 1253-1256

64 López de Heredia, U. et al. (2005) The Balearic Islands: a reservoir of cpDNA genetic variation for evergreen oaks. Journal of Biogeography. 32, 939-949

65 Kremer, A. and Petit, R. J. (1993) Gene diversity in natural populations of oak species. Annals of Forest Science. 50, 186-202

66 Wright, S. (1931) Evolution in Mendelian Populations. Genetics. 16, 97-159

67 Thompson, J. D. (2005) Plant Evolution in the Mediterranean, Oxford University Press.

68 Fineschi, S. et al. (2000) Chloroplast DNA polymorphism reveals little geographical structure in Castanea sativa Mill. (Fagaceae) throughout southern European countries. Molecular Ecology. 9, 1495-1503

69 Petit, R. J. et al. (2002) Chloroplast DNA variation in European white oaks: Phylogeography and patterns of diversity based on data from over 2600 populations. Forest Ecology and Management. 156, 5-26

70 Palmé, A. E. and Vendramin, G. G. (2002) Chloroplast DNA variation, postglacial recolonization and hybridization in hazel, Corylus avellana. Molecular Ecology. 11, 1769-1779

71 Elena-Rosselló, J. A. et al. (1992) Evidence for hybridization between sympatric holm-oak and cork-oak in Spain based on diagnostic enzyme markers. Vegetatio. 99, 115-118


72 Hamza, N. B. (2010) Cytoplasmic and nuclear DNA markers as powerful tools in populations' studies and in setting conservation strategies. African Journal of Biotechnology. 9, 4510-4515

73 Levy, F. et al. (1996) A population genetic analysis of chloroplast DNA in Phacelia. Heredity. 76, 143-155

74 Taberlet, P. et al. (1991) Universal primers for amplification of three non-coding regions of chloroplast DNA. Plant Molecular Biology. 17, 1105-1109

75 Aoki, K. et al. (2003) Intraspecific sequence variation of chloroplast DNA among the component species of evergreen broad-leaved forests in Japan. Journal of Plant Research. 116, 337-344

76 Baraket, G. et al. (2008) Chloroplast DNA analysis in Tunisian fig cultivars (Ficus carica L.): Sequence variations of the trnL-trnF intergenic spacer. Biochemical Systematics and Ecology. 36, 828-835

77 Rathbone, D. A. et al. (2007) Microsatellite and cpDNA variation in island and mainland populations of a regionally rare eucalypt, Eucalyptus perriniana (Myrtaceae). Australian Journal of Botany. 55, 513-520

78 Kress, W. J. et al. (2005) Use of DNA barcodes to identify flowering plants. PNAS. 102, 8369-8374

79 Nishizawa, T. and Watano, Y. (2000) Primer pairs suitable for PCR-SSCP analysis of chloroplast DNA in angiosperms. Journal of Phytogeography and Taxonomy. 48, 63-66

80 Calonje, M. et al. (2008) Non-coding nuclear DNA markers in phylogenetic reconstruction. Plant Systematics and Evolution. 282, 257-280

81 Hare, M. P. (2001) Prospects for nuclear gene phylogeography. Trends in Ecology & Evolution. 16, 700-706

82 Bhargava, A. and Fuentes, F. F. (2010) Mutational dynamics of microsatellites. Molecular Biotechnology. 44, 250-266

83 Goldstein, D. B. and Pollock, D. D. (1997) Launching Microsatellites: A Review of Mutation Processes and Methods of Phylogenetic Inference. Journal of Heredity. 88, 335-342

84 Qureshi, S. N. et al. (2004) EST-SSR: A New Class of Genetic Markers in Cotton. The Journal of Cotton Science. 8, 112-123

85 Oliveira, E. J. et al. (2006) Origin, evolution and genome distribution of microsatellites. Genetics and Molecular Biology. 29, 294-307


86 Lazrek, F. et al. (2009) The use of neutral and non-neutral SSRs to analyse the genetic structure of a Tunisian collection of Medicago truncatula lines and to reveal associations with eco-environmental variables. Genetica. 135, 391-402

87 Hornero, J. et al. (2001) Testing the Conservation of Quercus spp. Microsatellites in the Cork Oak, Q. suber L. Silvae Genetica. 50, 3-4

88 Soto, A. et al. (2003) Nuclear Microsatellite Markers for the Identification of Quercus ilex L. and Q. suber L. hybrids. Silvae Genetica. 52, 63-66

89 Nagaraj, S. H. et al. (2007) A hitchhiker's guide to expressed sequence tag (EST) analysis. Briefings in Bioinformatics. 8, 6-21

90 Bouck, A. and Vision, T. (2007) The molecular ecologist's guide to expressed sequence tags. Molecular Ecology. 16, 907-924

91 Kim, K. S. et al. (2008) Utility of EST-derived SSRs as population genetics markers in a beetle. Journal of Heredity. 99, 112-124

92 Ueno, S. and Tsumura, Y. (2007) Development of ten microsatellite markers for Quercus mongolica var. crispula by database mining. Conservation Genetics. 9, 1083-1085

93 Ellis, J. R. and Burke, J. M. (2007) EST-SSRs as a resource for population genetic analyses. Heredity. 99, 125-132

94 Porth, I. et al. (2005) Linkage mapping of osmotic stress induced genes of oak. Tree Genetics & Genomes. 1, 31-40

95 Cuénoud, P. et al. (2002) Molecular phylogenetics of Caryophyllales based on nuclear 18S rDNA and plastid rbcL, atpB and matK DNA sequences. American Journal of Botany. 89, 132-144

96 Jeffrey, J. A. and Lexer, C. (2008) A set of novel DNA polymorphisms within candidate genes potentially involved in ecological divergence between Populus alba and P. tremula, two hybridizing European forest trees. Molecular Ecology Resources. 8, 188-192

97 Casasoli, M. et al. (2006) Comparison of Quantitative Trait Loci for Adaptive Traits Between Oak and Chestnut Based on an Expressed Sequence Tag Consensus Map. Genetics. 172, 533-546

98 Dow, B. D. et al. (1995) Characterization of highly variable (GA/CT)n microsatellites in the bur oak, Quercus macrocarpa. Theoretical and Applied Genetics. 91, 137-141

99 Steinkellner, H. et al. (1997) Identification and characterization of (GA/CT)n-microsatellite loci from Quercus petraea. Plant Molecular Biology. 33, 1093-1096


100 Kampfer, S. et al. (1998) Characterization of (GA)n Microsatellite Loci from Quercus robur. Hereditas. 129, 183-186

101 Alberto, F. et al. (2010) Population differentiation of sessile oak at the altitudinal front of migration in the French Pyrenees. Molecular Ecology. 19, 2626-2639

102 Van Oosterhout, C. et al. (2004) Micro-Checker: Software for Identifying and Correcting Genotyping Errors in Microsatellite Data. Molecular Ecology Notes. 4, 535-538

103 Thompson, J. D. et al. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research. 25, 4876-4882

104 Larkin, M. A. et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics. 23, 2947-2948

105 Hall, T. A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series. 41, 95-98

106 Pina-Martins, F. and Paulo, O. S. (2008) Concatenator: Sequence Data Matrices Handling Made Easy. Molecular Ecology Resources. 8, 1254-1255

107 Swofford, D. L. (2003) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts

108 Ronquist, F. and Huelsenbeck, J. P. (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 19, 1572-1574

109 Nylander, J. (2004) MrModeltest V2. Evolutionary Biology Centre.

110 Bandelt, H. J. et al. (1999) Median-joining networks for inferring intraspecific phylogenies. Molecular Biology and Evolution. 16, 37-48

111 Watterson, G. A. (1978) The homozygosity test of neutrality. Genetics. 88, 405-417

112 Slatkin, M. (1994) An exact test for neutrality based on the Ewens sampling distribution. Genetical Research. 64, 71-74

113 Slatkin, M. (1996) A correction to the exact test based on the Ewens sampling distribution. Genetical Research. 68, 259-260

114 Excoffier, L. and Lischer, H. E. L. (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources. 10, 564-567


115 Harpending, H. C. (1994) Signature of ancient population growth in a low-resolution mitochondrial DNA mismatch distribution. Human Biology. 66, 591-600

116 Tajima, F. (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 123, 585-595

117 Fu, Y.-X. (1997) Statistical Tests of Neutrality of Mutations Against Population Growth, Hitchhiking and Background Selection. Genetics. 147, 915-925

118 Ramos-Onsins, S. E. and Rozas, J. (2002) Statistical properties of new neutrality tests against population growth. Molecular Biology and Evolution. 19, 2092-2100

119 Rousset, F. (2008) genepop'007: a complete re-implementation of the genepop software for Windows and Linux. Molecular Ecology Resources. 8, 103-106

120 Librado, P. and Rozas, J. (2009) DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 25, 1451-1452

121 Nei, M. (1987) Molecular Evolutionary Genetics. Columbia University Press, New York, USA. 512 pp

122 Goudet, J. (1995) FSTAT (Version 1.2): A Computer Program to Calculate F-Statistics. Journal of Heredity. 86, 485-486

123 Goudet, J. (2001) FSTAT, a program to estimate and test gene diversities and fixation indices (version 2.9.3). Available ,

124 Peakall, R. and Smouse, P. E. (2006) genalex 6: genetic analysis in Excel. Population genetic software for teaching and research. Molecular Ecology Notes. 6, 288-295

125 Weir, B. S. and Cockerham, C. C. (1984) Estimating F-statistics for the analysis of population structure. Evolution. 38, 1358-1370

126 Slatkin, M. (1995) A measure of population subdivision based on microsatellite allele frequencies. Genetics. 139, 457-462

127 Crawford, N. G. (2010) Smogd: Software for the Measurement of Genetic Diversity. Molecular Ecology Resources. 10, 556-557

128 Jost, L. (2008) GST and its relatives do not measure differentiation. Molecular Ecology. 17, 4015-4026

129 Hedrick, P. W. (2005) A Standardized genetic differentiation measure. Evolution. 59, 1633-1638


130 Nei, M. and Chesser, R. K. (1983) Estimation of fixation indices and gene diversities. Annals of Human Genetics. 47, 253-259

131 Rousset, F. (1997) Genetic differentiation and estimation of gene flow from F-statistics under isolation by distance. Genetics. 145, 1219-1228

132 Jensen, J. L. et al. (2005) Isolation by distance, web service. BMC Genetics. 6, 13

133 Pritchard, J. K. et al. (2000) Inference of population structure using multilocus genotype data. Genetics. 155, 945-959

134 Hubisz, M. J. et al. (2009) Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources. 9, 1322-1332

135 Falush, D. et al. (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 164, 1567-1587

136 Evanno, G. et al. (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology. 14, 2611-2620

137 Rosenberg, N. A. (2004) Distruct: a program for the graphical display of population structure. Molecular Ecology Notes. 4, 137-138

138 Guillot, G. et al. (2005) Geneland: a computer package for landscape genetics. Molecular Ecology Notes. 5, 712-715

139 Guillot, G. et al. (2005) A spatial statistical model for landscape genetics. Genetics. 170, 1261-1280

140 François, O. et al. (2006) Bayesian clustering using hidden Markov random fields in spatial population genetics. Genetics. 174, 805-816

141 Excoffier, L. et al. (1992) Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics. 131, 479-491

142 Burgarella, C. et al. (2009) Detection of hybrids in nature: application to oaks (Quercus suber and Q. ilex). Heredity. 102, 442-452

143 Lexer, C. et al. (2004) Hybrid zones as a tool for identifying adaptive genetic variation in outbreeding forest trees: lessons from wild annual sunflowers (Helianthus spp.). Forest Ecology and Management. 197, 49-64

144 Petit, R. J. et al. (2002) Identification of refugia and post-glacial colonisation routes of European white oaks based on chloroplast DNA and fossil pollen evidence. Forest Ecology and Management. 156, 49-74


145 Dow, B. D. and Ashley, M. V. (1996) Microsatellite analysis of seed dispersal and parentage of saplings in bur oak, Quercus macrocarpa. Molecular Ecology. 5, 615-627

146 Hu, X. S. and Ennos, R. A. (1999) Impacts of seed and pollen flow on population genetic structure for plant genomes with three contrasting modes of inheritance. Genetics. 152, 441-450

147 Streiff, R. et al. (1999) Pollen dispersal inferred from paternity analysis in a mixed oak stand of Quercus robur L. and Q. petraea (Matt.) Liebl. Molecular Ecology. 8, 831-841

148 Ramírez-Valiente, J. A. et al. (2009) Elucidating the role of genetic drift and natural selection in cork oak differentiation regarding drought tolerance. Molecular Ecology. 18, 3803-3815

149 Wellenreuther, M. et al. (2011) Environmental and climatic determinants of molecular diversity and genetic population structure in a coenagrionid damselfly. PLoS ONE. 6, e20440

150 Lewontin, R. C. (1964) The Interaction of Selection and Linkage. I. General Considerations; Heterotic Models. Genetics. 49, 49-67

151 Meirmans, P. G. and Hedrick, P. W. (2011) Assessing population structure: FST and related measures. Molecular Ecology Resources. 11, 5-18


Supporting Information


Supporting Information 1

The primers used to amplify each cpDNA fragment are summarized in Table S1.1, together with the PCR annealing temperatures and the fragment sizes.

Table S1.1: Description of the cpDNA fragments used: primer sequences, annealing temperature (Ta, in ºC) and fragment size (in base pairs).

| Locus | Forward primer | Reverse primer | Ta (ºC) | Size (bp) | Reference |
|---|---|---|---|---|---|
| TrnL-F | 5' GGT TCA AGT CCC TCT ATC CC 3' | 5' ATT TGA ACT GGT GAC ACG AG 3' | 65 | 381 | Taberlet et al., 1991 |
| TrnS-PsbC | 5' TGA ACC TGT TCT TTC CAT GA 3' | 5' GAA CTA TCG AGG GTT CGA AT 3' | 65 | 250 | Nishizawa & Watano, 2000 |
| TrnH-PsbA | 5' CGC GCA TGG TGG ATT CAC AAT CC 3' | 5' GTT ATG CAT GAA CGT AAT GCT C 3' | 65 | 478 | Kress et al., 2005 |
| matK | 5' CGA TCT ATT CAT TCA ATA TTT C 3' | 5' TCT AGC ACA CGA AAG TCG AAG T 3' | 65 | 740 | Cuénoud et al., 2002 |
| rbcLa | 5' ATG TCA CCA CAA ACA GAG ACT AAA GC 3' | 5' GTA AAA TCA AGT CCA CCR CG 3' | 65 | 552 | Kress & Erickson, 2007 |

The three nuclear candidate genes tested in this study are described in Table S1.2, together with the primers, the PCR annealing temperatures and the fragment sizes.

Table S1.2: Primer sequences and bibliographic references, annealing temperature (in ºC), fragment size (in base pairs) and locus information for the nuDNA fragments.

| Locus | Forward primer | Reverse primer | Description | Ta (ºC) | Size (bp) | Reference |
|---|---|---|---|---|---|---|
| EST 2T13 | 5' CAT GCA CTG CCA ATC TCA GAG A 3' | 5' ATA ATT TGC CTC ATC ACT ACA TAA GA 3' | Osmotic stress related gene | 55 | 249 | Porth et al., 2005 |
| Cons 58 | 5' CCA ATT CTC TTA GTG GCA AGG 3' | 5' GCT TTG GGA TGA TGT TTT GG 3' | Auxin repressed protein | * | * | Casasoli et al., 2006 |
| Phyt B | 5' ATA TGG CGA ATA TGG GGT CA 3' | 5' GGC ATC CAT TTC TGC ATT CT 3' | Phytochrome B, involved in flowering phenology | * | * | Jeffrey & Lexer, 2008 |

* Amplification product was never obtained for cork oak.


Supporting Information 2

The 11 dinucleotide nuclear microsatellite (nuSSR) markers are summarized in Table S2.1, and the 6 EST-SSRs tested in this study are described in Table S2.2.

Table S2.1: Description of the nuSSRs used concerning primer sequences, annealing temperatures (Ta in ºC), repeat motif and size ranges (in base pairs).

| Locus | Forward primer | Reverse primer | Ta (ºC) | Repeat motif | Expected size range (bp) | Found size range (bp) | Reference |
|---|---|---|---|---|---|---|---|
| MsQ13 | 5' TGG CTG CAC CTA TGG CTC TTA G 3' | 5' ACA CTC AGA CCC ACC ATT TTT CC 3' | 55 | (AG)n | 222-246 | 218 | Dow et al., 1995 |
| QpZAG9 | 5' GCA ATT ACA GGC TAG GCT GG 3' | 5' GTC TGG ACC TAG CCC TCA TG 3' | 50 | (AG)12 | 182-210 | 223-249 | Steinkellner et al., 1997 |
| QpZAG15 | 5' CGA TTT GAT AAT GAC ACT ATG G 3' | 5' CAT CGA CTC ATT GTT AAG CAC 3' | 57 | (AG)23 | 108-152 | 101-135 | Steinkellner et al., 1997 |
| QpZAG36 | 5' GAT CAAA AAT TTG GAA TAT TAA GAG AG 3' | 5' ACT GTG GTG GTG AGT CTA ACA TGT AG 3' | * | (AG)19 | 210-236 | * | Steinkellner et al., 1997 |
| QpZAG46 | 5' CCC CTA TTG AAG TCC TAG CCG 3' | 5' TCT CCC ATG TAA GTA GCT CTG 3' | * | (AG)13 | 190-222 | * | Steinkellner et al., 1997 |
| QpZAG110 | 5' GGA GGC TTC CTT CAA CCT ACT 3' | 5' GAT CTC TTG TGT GCT GTA TTT 3' | 50 | (AG)15 | 206-262 | 208-258 | Steinkellner et al., 1997 |
| QrZAG11 | 5' CCT TGA ACT CGA AGG TGT CCT T 3' | 5' GTA GGT CAA AAC CAT TGG TTG ACT 3' | 50 | (TC)18 | 238-263 | 255-281 | Kampfer et al., 1998 |
| QrZAG7 | 5' CAA CTT GGT GTT CGG ATC AA 3' | 5' GTG CAT TTC TTT TAT AGC ATT CAC 3' | 50 | (TC)17 | 115-153 | 115-133 | Kampfer et al., 1998 |
| QrZAG20 | 5' CCA TTA AAA GAA GCA GTA TTT TGT 3' | 5' GCA ACA CTC AGC CTA TAT CTA GAA 3' | 50 | (TC)22 | 160-200 | 161-171 | Kampfer et al., 1998 |
| QsA11 | 5' GAT CTC TTT GTC AAC CCA GAC 3' | 5' ATG TGT GTG GTG ATG GGT TT 3' | * | (CA)n | 258-276 | * | Simões de Matos, 2007 |
| QsD8 | 5' CTG CAA CTT TAT CCG CCT CC 3' | 5' GAT CCT CTG CTT CTC TCT G 3' | * | (CA)n | 140-150 | * | Simões de Matos, 2007 |

* Amplification product was never obtained, or the scoring was unreliable.


Table S2.2: Primer sequences [92], annealing temperature (in ºC), repeat motif, size ranges (in base pairs) and locus information for the EST-SSRs.

| Locus | Forward primer | Reverse primer | Ta (ºC) | Repeat motif | Expected size range (bp) | Found size range (bp) | Description |
|---|---|---|---|---|---|---|---|
| QmOST1 | 5' CAA CCA TCG AGG CCA TTA CGA A 3' | 5' TCA CCG ATC TTG AAG GTC CTC GA 3' | 58 | (AG)19 | 149-171 | 134-152 | EST DN949770; non-coding |
| QmD12 | 5' GCT CCC TGG TAG TCG GCT AAA GA 3' | 5' CAA TTG GGA CAA CAT GGA AGC AT 3' | 58 | (GCA)7 | 243-251 | 240-246 | EST CR627959; coding; zinc finger protein |
| QmAJ1 | 5' ATT CAG GCC GCA AAT CAA TAA GG 3' | 5' GAA ACT GGT CCC CTT CTC TTG GA 3' | 57 | (GAA)6 | 374-380 | 360-375 | EST AJ577265; coding; pheromone receptor-like protein |
| QmDN1 | 5' TAG TTT TCC CAG CGA ATC CAA CA 3' | 5' CTT CTT GAA GGG ACT GAC CCC AT 3' | 58 | (GGA)6 | 242-261 | 236 | EST DN950717; coding; salt tolerance protein |
| QmDN2 | 5' CAA CCA TCG AGG CCA TTA CGA A 3' | 5' TCA CCG ATC TTG AAG GTC CTC AG 3' | * | (AG)9 | 156-168 | * | EST DN949776; non-coding; 60S ribosomal protein L21 |
| QmDN3 | 5' TCA AAC AAT CTC AAG GCT CCC AA 3' | 5' GCT TTT GAG AAA CTT TGG CCA CC 3' | 58 | (TC)10 | 361-381 | 361-375 | EST DN950726; non-coding; putative carboxyl-terminal proteinase |

* Amplification product was never obtained, or the scoring was unreliable.


Supporting Information 3

The concatenated cpDNA matrix is 1109 bp long, of which 92 sites are variable. The model of sequence evolution for the Bayesian analysis (BA) was selected separately for each cpDNA data set. Because the BA tree was very similar to the MP tree, only the MP tree for the concatenated dataset is presented (Fig. S3.1). The concatenated tree supports the results of the individual trees, recovering the four major groups (Fig. 3.1a, Fig. 3.2 and Fig. 3.3). Group A, highlighted in yellow, comprises the cork oak samples of the pure lineage, distributed among the three sublineages (A1, A2 and A3), in accordance with the TrnS/PsbC tree (Fig. 3.1). Group B is the most variable, comprising several haplotypes of cork oak samples from the introgressed lineage, together with samples of Quercus ilex (subsp. rotundifolia and subsp. ilex) and Quercus coccifera. Group C, composed of several Quercus species, is closely related to Group A. Group D consists of Quercus rubra, which is placed as the species most distant from cork oak, as in the phylogeny of the TrnH/PsbA fragment (Fig. 3.2).
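To make explicit what a statement such as "1109 bp, of which 92 are variable" counts, the Python sketch below concatenates per-fragment alignments sample by sample and counts the columns that carry more than one nucleotide state. It is a minimal illustration under stated assumptions, not the software or settings used in this work: the sample names and toy sequences are hypothetical, and gaps ('-') and ambiguity codes such as 'N' are assumed not to count as states (programs differ in how they treat gaps).

```python
"""Concatenate aligned cpDNA fragments and count variable sites (sketch).

Sample names and sequences are hypothetical. Only A, C, G and T count as
nucleotide states, so gaps ('-') and ambiguity codes such as 'N' never make a
column variable; this gap convention is an assumption.
"""

NUCLEOTIDES = set("ACGT")


def concatenate(fragments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-fragment alignments sample by sample, in the given order."""
    samples = list(fragments[0])
    for frag in fragments:
        if set(frag) != set(samples):
            raise ValueError("every fragment must contain the same samples")
        if len({len(seq) for seq in frag.values()}) != 1:
            raise ValueError("sequences within a fragment must be aligned")
    return {s: "".join(frag[s] for frag in fragments) for s in samples}


def count_variable_sites(matrix: dict[str, str]) -> int:
    """Count alignment columns with more than one nucleotide state."""
    variable = 0
    for column in zip(*matrix.values()):
        states = {base.upper() for base in column} & NUCLEOTIDES
        if len(states) > 1:
            variable += 1
    return variable


if __name__ == "__main__":
    frag_a = {"s1": "ACGT", "s2": "ACGA", "s3": "AC-T"}  # hypothetical fragment
    frag_b = {"s1": "GGA", "s2": "GGA", "s3": "GTA"}     # hypothetical fragment
    matrix = concatenate([frag_a, frag_b])
    print(len(matrix["s1"]), count_variable_sites(matrix))  # 7 2
```

In the toy example the third column of the first fragment (G, G, -) is not counted as variable, because the gap is ignored; only the T/A column and the G/T column are.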


Figure S3.1: Maximum parsimony tree of the concatenated cpDNA dataset. The four groups are color coded. Group A is highlighted in yellow: cork oak's pure lineage (bright yellow: sublineage A2 (Sl A2); brownish yellow: sublineage A3 (Sl A3); light yellow: sublineage A1 (Sl A1)). Group B: orange, cork oak's introgressed lineage; green, Q. coccifera; red, Q. rotundifolia; pink, Q. ilex. Group C is highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and Q. lusitanica. Group D is highlighted in light blue and consists of Q. rubra. Numbers at the nodes are the MP bootstrap support values (1000 replicates) and the Bayesian credibility values.
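Bootstrap support values such as those at the nodes of Fig. S3.1 are conventionally obtained by resampling alignment columns with replacement, repeating the tree search on each pseudo-replicate alignment, and reporting, for each node, the percentage of replicate trees that contain it. The Python sketch below follows that standard description rather than the internals of the software used here, and shows only the resampling step; the parsimony search itself is not reproduced, and the four-sample alignment is hypothetical.

```python
import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Draw as many columns as the alignment has, sampling with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same column indices are used for every sample, so columns stay intact
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: dict[str, str], n_replicates: int = 1000,
                         seed: int = 1) -> list[dict[str, str]]:
    """Generate pseudo-replicate alignments.

    Each replicate would then be analysed with the same MP search; the support
    for a node is the percentage of replicate trees containing that node.
    """
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]


# Hypothetical four-sample alignment
aln = {"a": "ACGTAC", "b": "ACGAAC", "c": "TCGTAG", "d": "TCGAAG"}
reps = bootstrap_replicates(aln, n_replicates=1000)
print(len(reps), reps[0])
```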


Supporting Information 4

Median-joining analysis of the cpDNA fragments produced haplotype networks (Fig. S4.1, Fig. S4.2 and Fig. S4.3) that reflect the four major groups of the phylogenetic trees. The networks also show Q. suber haplotypes in clade B that are shared with Quercus coccifera, Q. ilex ilex and Q. ilex rotundifolia. Although the networks do not clearly resolve the phylogenetic relationships between the groups, they give simple visual support for the distances between them, expressed as mutational steps between haplotypes, and for haplotype frequencies. The median-joining networks of the haplotypes observed in each Quercus species for the TrnS/PsbC, TrnH/PsbA and TrnL-F fragments are presented in Fig. S4.1, Fig. S4.2 and Fig. S4.3, respectively.
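The two quantities the networks display, haplotype frequencies and the number of mutational steps between haplotypes, can be computed directly from aligned sequences. The Python sketch below does only that; it does not implement the median-joining algorithm itself, which additionally infers median vectors (the missing ancestral haplotypes drawn as black circles). Counting each differing aligned position, including a gap, as one step is a simplifying assumption, and the sample names and sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_frequencies(sequences: dict[str, str]) -> Counter:
    """Number of samples carrying each distinct aligned sequence (haplotype)."""
    return Counter(sequences.values())


def mutational_steps(h1: str, h2: str) -> int:
    """Differing aligned positions; a gap counts as one step (simplification)."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(a != b for a, b in zip(h1, h2))


def pairwise_steps(haplotypes) -> dict[tuple[str, str], int]:
    """Mutational steps between every pair of distinct haplotypes."""
    return {(h1, h2): mutational_steps(h1, h2) for h1, h2 in combinations(haplotypes, 2)}


# Hypothetical samples from three species
samples = {"suber_1": "ACGT", "suber_2": "ACGT", "ilex_1": "ACTT", "coccifera_1": "AGTT"}
freqs = haplotype_frequencies(samples)
steps = pairwise_steps(freqs)
print(freqs)
print(steps)
```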


Figure S4.1: Median-joining haplotype network generated from 250 bases of the TrnS/PsbC intergenic spacer region. Circle size reflects the relative frequency of each haplotype across 10 Quercus species. Shading indicates the proportion of individuals of a given species that carry a particular haplotype (yellow: cork oak's pure lineage and Q. cerris, with bright yellow for sublineage A2, brownish yellow for sublineage A3 (including Q. cerris) and light yellow for sublineage A1; orange: cork oak's introgressed lineage; green: Q. coccifera; red: Q. rotundifolia; pink: Q. ilex; dark blue: Q. robur, Q. pyrenaica, Q. faginea, Q. lusitanica and Q. canariensis; light blue: Q. rubra). Numbers in the network indicate the number of mutations between haplotypes. Black circles indicate missing ancestral haplotypes.


Figure S4.2: Median-joining haplotype network generated from 478 bases of the TrnH/PsbA intergenic spacer region. Circle size reflects the relative frequency of each haplotype across all 10 Quercus species. Shading indicates the proportion of individuals of a given species that carry a particular haplotype (yellow: cork oak's pure lineage, with Q. cerris; orange: cork oak's introgressed lineage; green: Q. coccifera; red: Q. rotundifolia; pink: Q. ilex; dark blue: Q. robur, Q. pyrenaica, Q. faginea, Q. lusitanica and Q. canariensis; light blue: Q. rubra). Numbers in the network indicate the number of mutations between haplotypes. Black circles indicate missing ancestral haplotypes.


Figure S4.3: Median-joining haplotype network generated from 381 bases of the TrnL-F intergenic spacer region. Circle size reflects the relative frequency of each haplotype across 10 Quercus species. Shading indicates the proportion of individuals of a given species that carry a particular haplotype (yellow: cork oak's pure lineage, with Q. cerris; orange: cork oak's introgressed lineage; green: Q. coccifera; red: Q. rotundifolia; pink: Q. ilex; dark blue: Q. robur, Q. pyrenaica, Q. faginea, Q. lusitanica and Q. canariensis; light blue: Q. rubra). Numbers in the network indicate the number of mutations between haplotypes. Black circles indicate missing ancestral haplotypes.


Supporting Information 5

A global evaluation of the EST-SSR dataset with MICRO-CHEKER v2.2.3 [102] revealed no evidence of genotyping errors due to stuttering or large allele dropout, but identified possible null alleles, indicated by a general excess of homozygotes, at two loci: QmOST1 and QmDN3 (p<0.05). For the QmOST1 locus, null alleles are possible in the Haza del Lino (HAZ) and Mekna (MEK) populations (Fig. S5.1). However, in both populations the observed homozygote frequencies lie only slightly outside the range of expected values. This microsatellite was therefore retained for the subsequent analyses.

For the QmDN3 locus, the observed homozygote frequencies were clearly outside the range of expected values. Because this was detected in all 13 populations, it strongly indicates that null alleles are indeed present at this locus. A representative example is shown in Fig. S5.2. This locus was therefore discarded from all subsequent analyses.

For the nuSSR dataset, the global evaluation with MICRO-CHEKER again revealed no evidence of genotyping errors due to stuttering or large allele dropout, but identified possible null alleles in a few populations at three markers: QpZAG110 (Fig. S5.3), QrZAG11 (Fig. S5.4) and QrZAG20 (Fig. S5.5). Specifically, null alleles were detected at QpZAG110 in the Serra da Arrábida (ARR) and Serra do Buçaco (BUC) populations, at QrZAG11 in the Serra da Estrela (EST) population, and at QrZAG20 in the Puglia (PUG) and Serra de Monchique (MON) populations. However, the observed homozygote frequencies lie only slightly outside the range of expected values, and null alleles are indicated in only one or two of the 13 populations analysed. These microsatellites were therefore retained for the subsequent analyses.
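The homozygote-excess criterion used above can be illustrated with the quantities behind it. The Python sketch below computes observed and expected heterozygosity for one locus in one population, together with Brookfield's first null-allele estimator, r = (He - Ho)/(1 + He), which is one of the null-allele estimators MICRO-CHEKER reports. It is not MICRO-CHEKER's Monte Carlo test: the expected heterozygosity carries no small-sample correction, and the genotypes (allele sizes in bp) are hypothetical. A positive r signals a heterozygote deficit, which is consistent with null alleles but can also arise from inbreeding or population substructure.

```python
from collections import Counter


def allele_frequencies(genotypes: list[tuple[int, int]]) -> dict[int, float]:
    """Allele frequencies from diploid genotypes (allele sizes in bp)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(freqs: dict[int, float]) -> float:
    """Hardy-Weinberg expectation 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def observed_heterozygosity(genotypes: list[tuple[int, int]]) -> float:
    """Fraction of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def brookfield_null_allele(genotypes: list[tuple[int, int]]) -> float:
    """Brookfield (1996) estimator 1: r = (He - Ho) / (1 + He)."""
    he = expected_heterozygosity(allele_frequencies(genotypes))
    ho = observed_heterozygosity(genotypes)
    return (he - ho) / (1.0 + he)


# Hypothetical genotypes showing an excess of homozygotes
genos = [(150, 150), (150, 152), (152, 152), (154, 154),
         (150, 154), (152, 152), (150, 150), (154, 154)]
he = expected_heterozygosity(allele_frequencies(genos))
ho = observed_heterozygosity(genos)
r = brookfield_null_allele(genos)
print(f"He={he:.3f} Ho={ho:.3f} r={r:.3f}")
```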


Figure S5.1: MICRO-CHEKER charts for the QmOST1 locus in the HAZ (panels a and b) and MEK (panels c and d) populations. The significance level is 0.05; a) Frequency differences in base pairs for the population HAZ; b) Homozygote frequencies for the population HAZ; c) Frequency differences in base pairs for the population MEK; d) Homozygote frequencies for the population MEK.


Figure S5.2: MICRO-CHEKER charts for the QmDN3 locus in the Serra da Arrábida (ARR) population, shown as representative of all 13 populations, which all indicated null alleles. The significance level is 0.05; a) Frequency differences in base pairs for the population ARR; b) Homozygote frequencies for the population ARR.


Figure S5.3: MICRO-CHEKER charts for the QpZAG110 locus in the ARR and BUC populations. The significance level is 0.05; a) Frequency differences in base pairs for the population ARR; b) Homozygote frequencies for the population ARR; c) Frequency differences in base pairs for the population BUC; d) Homozygote frequencies for the population BUC.


Figure S5.4: MICRO-CHEKER charts for the QrZAG11 locus in the EST population. The significance level is 0.05; a) Frequency differences in base pairs; b) Homozygote frequencies.


Figure S5.5: MICRO-CHEKER charts for the QrZAG20 locus in the PUG (panels a and b) and MON (panels c and d) populations. The significance level is 0.05; a) Frequency differences in base pairs for the population PUG; b) Homozygote frequencies for the population PUG; c) Frequency differences in base pairs for the population MON; d) Homozygote frequencies for the population MON.


Supporting Information 6

Linkage disequilibrium is the non-random association of alleles at two or more loci. It is a statistical association, and the loci do not necessarily have to be physically linked [150]. Genotypic linkage disequilibrium between all pairs of loci was tested with a contingency exact test in GenePop v4 [119] (Table S6.1). No significant departure from the null hypothesis of linkage equilibrium was detected. The eight polymorphic microsatellite markers were therefore considered suitable for this study.
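The per-population exact tests for each pair of loci are combined across populations with Fisher's method, as stated in the caption of Table S6.1. The statistic is X2 = -2 Σ ln(p_i), which under the null hypothesis follows a chi-square distribution with 2k degrees of freedom for k independent tests. The Python sketch below implements only this combination step, using the closed-form chi-square tail for even degrees of freedom; the per-population p-values in the usage line are hypothetical, and the Markov chain exact test that produces them in GenePop is not reproduced.

```python
import math


def fisher_combined_p(p_values: list[float]) -> tuple[float, float]:
    """Fisher's method: X2 = -2 * sum(ln p_i), chi-square with 2k df."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # Closed-form chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total


# Hypothetical per-population p-values for one pair of loci
stat, p = fisher_combined_p([0.40, 0.75, 0.12, 0.90])
print(f"X2={stat:.2f}, df=8, p={p:.3f}")
```

With a single p-value the method returns that p-value unchanged, which is a convenient sanity check of the tail formula.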

Table S6.1: Test for linkage disequilibrium between all pairs of loci using Fisher's method, as implemented in GenePop.

Loci combination | p
EST-SSRs
QmOST1 & QmD12 | 0.37
QmOST1 & QmAJ1 | 0.10
QmD12 & QmAJ1 | 0.07
nuSSRs
QpZAG110 & QpZAG9 | 0.38
QpZAG110 & QrZAG20 | 0.90
QpZAG9 & QrZAG20 | 0.43
QpZAG110 & QrZAG7 | 0.96
QpZAG9 & QrZAG7 | 0.51
QrZAG20 & QrZAG7 | 0.34
QpZAG110 & QrZAG11 | 0.81
QpZAG9 & QrZAG11 | 0.38
QrZAG20 & QrZAG11 | 0.10
QrZAG7 & QrZAG11 | 0.97
Complete dataset
QmOST1 & QpZAG110 | 0.95
QmOST1 & QpZAG9 | 0.88
QmOST1 & QrZAG20 | 0.18
QmOST1 & QrZAG7 | 0.00
QmOST1 & QrZAG11 | 0.44
QmD12 & QpZAG110 | 0.88
QmD12 & QpZAG9 | 0.95
QmD12 & QrZAG20 | 0.05
QmD12 & QrZAG7 | 0.25
QmD12 & QrZAG11 | 0.86
QmAJ1 & QpZAG110 | 0.58
QmAJ1 & QpZAG9 | 0.96
QmAJ1 & QrZAG20 | 0.59
QmAJ1 & QrZAG7 | 0.15
QmAJ1 & QrZAG11 | 0.82

Supporting Information 7

Although FST is widely used as a measure of population differentiation and structure, it has been criticized for its dependence on within-population diversity, which has led to the development of alternative statistics such as Jost's D [128], a measure of actual differentiation among populations. Nevertheless, Meirmans & Hedrick [151] recommend continuing to use FST in combination with the new statistics.
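For a single locus with allele frequencies known exactly, Jost's D can be written as D = [(HT - HS)/(1 - HS)] × [n/(n - 1)], where HS is the mean within-population heterozygosity, HT the heterozygosity of the pooled populations and n the number of populations. The Python sketch below contrasts it with GST = (HT - HS)/HT. It weights populations equally and omits the sample-size corrections used by the Dest estimator reported in Table S7.1, so it illustrates the definitions rather than reproducing those values; the allele labels are hypothetical.

```python
def jost_d_and_gst(pop_freqs: list[dict[str, float]]) -> tuple[float, float]:
    """Jost's D and Nei's GST for one locus, from exact allele frequencies.

    Populations are weighted equally; no sample-size correction is applied.
    """
    n = len(pop_freqs)
    if n < 2:
        raise ValueError("need at least two populations")
    alleles = set().union(*pop_freqs)
    # Mean within-population expected heterozygosity (HS)
    hs = sum(1.0 - sum(p * p for p in f.values()) for f in pop_freqs) / n
    # Expected heterozygosity of the pooled population (HT)
    pooled = [sum(f.get(a, 0.0) for f in pop_freqs) / n for a in alleles]
    ht = 1.0 - sum(p * p for p in pooled)
    d = (ht - hs) / (1.0 - hs) * n / (n - 1)
    gst = (ht - hs) / ht
    return d, gst


# Two hypothetical populations sharing no alleles, and two identical ones
no_shared = [{"A": 0.5, "B": 0.5}, {"C": 0.5, "D": 0.5}]
identical = [{"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5}]
d1, g1 = jost_d_and_gst(no_shared)
d0, g0 = jost_d_and_gst(identical)
print(d1, g1, d0, g0)
```

The first case shows the criticism in miniature: the two populations share no alleles, so D equals 1, yet GST is only 1/3 because within-population heterozygosity is high.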

Pairwise Dest values were estimated for the thirteen populations, with both SSR matrices analysed together. Overall genetic differentiation at the microsatellite loci was low (pairwise Dest from 0.000 to 0.097; Table S7.1). The Dest values closely resembled the FST and RST matrices (Table 3.6), although they tended to be lower.

Table S7.1: Pairwise Dest values between all pairs of populations.

Population | ALG | ARR | BUC | CAT | HAZ | EST | GER | PUG | KEN | TAZ | MON | SIN | MEK
ALG | -- | 0.010 | 0.021 | 0.031 | 0.005 | 0.012 | 0.007 | 0.056 | 0.039 | 0.006 | 0.033 | 0.012 | 0.000
ARR |  | -- | 0.000 | 0.017 | 0.016 | 0.001 | 0.000 | 0.050 | 0.040 | 0.008 | 0.000 | 0.000 | 0.024
BUC |  |  | -- | 0.031 | 0.060 | 0.002 | 0.002 | 0.041 | 0.065 | 0.009 | 0.005 | 0.003 | 0.028
CAT |  |  |  | -- | 0.035 | 0.050 | 0.029 | 0.097 | 0.051 | 0.045 | 0.043 | 0.037 | 0.070
HAZ |  |  |  |  | -- | 0.032 | 0.017 | 0.073 | 0.026 | 0.009 | 0.039 | 0.030 | 0.012
EST |  |  |  |  |  | -- | 0.000 | 0.037 | 0.057 | 0.013 | 0.001 | 0.016 | 0.015
GER |  |  |  |  |  |  | -- | 0.033 | 0.030 | 0.013 | 0.001 | 0.005 | 0.013
PUG |  |  |  |  |  |  |  | -- | 0.042 | 0.034 | 0.055 | 0.072 | 0.066
KEN |  |  |  |  |  |  |  |  | -- | 0.025 | 0.069 | 0.070 | 0.062
TAZ |  |  |  |  |  |  |  |  |  | -- | 0.037 | 0.016 | 0.010
MON |  |  |  |  |  |  |  |  |  |  | -- | 0.029 | 0.046
SIN |  |  |  |  |  |  |  |  |  |  |  | -- | 0.031
MEK |  |  |  |  |  |  |  |  |  |  |  |  | --

ALG: Forêt des Guerbès (Algeria); ARR: Arrábida (Portugal); BUC: Buçaco (Portugal); CAT: Cataluña (Spain); HAZ: Haza del Lino (Spain); EST: Estrela (Portugal); GER: Gerês (Portugal); PUG: Puglia (Italy); KEN: Kenitra (Morocco); TAZ: Taza (Morocco); MON: Monchique (Portugal); SIN: Sintra (Portugal); MEK: Mekna (Tunisia).


Supporting Information 8

The estimated number of populations (K) should be treated with care, and its biological interpretation may not be straightforward. We used the posterior probability of the data for a given K, LnP(D), to identify the most probable number of clusters, both by plotting the average values of LnP(D) and by using the ad hoc DeltaK (DK) statistic [136]. Because the number of groups with the highest LnP(D) in STRUCTURE does not always correspond to the real number of clusters, we relied on DeltaK, an ad hoc quantity based on the second-order rate of change of the log probability of the data with respect to the number of clusters, which tends to be a better predictor of the real number of clusters.
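Evanno's DeltaK for a given K is the absolute second-order difference of LnP(D) around K divided by the standard deviation of LnP(D) across replicate runs at K. Because STRUCTURE runs at different K are not paired, the Python sketch below applies the second difference to the mean LnP(D) at each K, a common practical choice that is an assumption here rather than a description of the exact procedure used. DeltaK is undefined at the smallest and largest K tested, and the LnP(D) values in the usage example are hypothetical.

```python
import math
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k: dict[int, list[float]]) -> dict[int, float]:
    """DeltaK = |L(K+1) - 2 L(K) + L(K-1)| / sd[L(K)], using mean LnP(D) per K.

    Defined only for K whose neighbours K-1 and K+1 were also run;
    needs at least two replicate runs per K for the standard deviation.
    """
    means = {k: mean(runs) for k, runs in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = means[k + 1] - 2.0 * means[k] + means[k - 1]
            sd = stdev(lnp_by_k[k])
            # A zero spread across runs leaves DeltaK undefined
            delta[k] = abs(second_diff) / sd if sd > 0 else math.nan
    return delta


# Hypothetical LnP(D) values from two STRUCTURE runs per K
runs = {1: [-1000.5, -999.5], 2: [-901.0, -899.0],
        3: [-891.0, -889.0], 4: [-886.0, -884.0]}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(delta, "best K =", best_k)
```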

The EST-SSR and nuSSR datasets were analysed separately and then merged to determine the genetic structure of the species (Fig. 3.8). The plots of the log probability of the data [LnP(D)] and of Evanno's criterion [136] are shown in Fig. S8.1, Fig. S8.2 and Fig. S8.3 for the EST-SSR, nuSSR and combined datasets, respectively.


Figure S8.1: Estimated number of populations (K) derived from the STRUCTURE clustering analyses of the EST-SSR dataset. The mean posterior probability of the data [LnP(D)] with its standard deviation over 20 replicate runs (above) and DeltaK (below) are plotted as a function of the number of clusters tested (K from 1 to 13).


Figure S8.2: Estimated number of populations (K) derived from the STRUCTURE clustering analyses of the nuSSR dataset. The mean posterior probability of the data [LnP(D)] with its standard deviation over 20 replicate runs (above) and DeltaK (below) are plotted as a function of the number of clusters tested (K from 1 to 13).


Figure S8.3: Estimated number of populations (K) derived from the STRUCTURE clustering analyses of the combined dataset. The mean posterior probability of the data [LnP(D)] with its standard deviation over 20 replicate runs (above) and DeltaK (below) are plotted as a function of the number of clusters tested (K from 1 to 13).


Supporting Information 9

Several hierarchical analyses of molecular variance (AMOVA) [141] were performed on allele frequencies (FST-based), each with 1000 permutations (Table S9.1). The aim was to assess how genetic variability is distributed across hierarchical levels: among groups (FCT), among populations within groups (FSC) and within populations (FST). The groupings tested followed the clusters (K) obtained with STRUCTURE [133] (Fig. 3.8) and GENELAND [138] (Fig. 3.9). The best genetic structure is assumed to be the one in which the groups (FCT) explain the largest share of the variation, that is, the one that maximizes differentiation between groups of populations.

Table S9.1: Percentage of variation at different hierarchical levels estimated with AMOVA. The analysis was performed on the combined SSR dataset, based on FST values.

Structure | Among groups (%) | Among populations within groups (%) | Within populations (%) | FCT | FSC | FST
Two clusters (K=2) | 2.81 | 3.16 | 94.03 | 0.02814*** | 0.03249*** | 0.02814***
Four clusters (K=4) | 3.37 | 2.45 | 94.18 | 0.03370*** | 0.02532*** | 0.05817***
Six clusters (K=6) | 4.99 | 0.85 | 94.16 | 0.04992*** | 0.00897** | 0.05844***

% = percentage of the total molecular variance explained. Significance levels: **P<0.01, ***P<0.001.
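In an AMOVA the three fixation indices are ratios of the estimated variance components: FCT is the among-group share of the total, FSC is the among-population share of the variance within groups, and FST is the combined among-group and among-population share of the total. The Python sketch below shows these ratios and uses the K=4 row of Table S9.1 as a check, with the rounded percentages standing in for the variance components, so agreement is only to about three decimal places. It does not estimate the components from sums of squares or run the 1000-permutation significance tests; both are left to the AMOVA implementation used for the table.

```python
def amova_f_statistics(sigma2_a: float, sigma2_b: float,
                       sigma2_c: float) -> tuple[float, float, float]:
    """F-statistics from AMOVA variance components.

    sigma2_a: among groups; sigma2_b: among populations within groups;
    sigma2_c: within populations.
    """
    total = sigma2_a + sigma2_b + sigma2_c
    f_ct = sigma2_a / total
    f_sc = sigma2_b / (sigma2_b + sigma2_c)
    f_st = (sigma2_a + sigma2_b) / total
    return f_ct, f_sc, f_st


# K=4 row of Table S9.1; rounded percentages used in place of the components
f_ct, f_sc, f_st = amova_f_statistics(3.37, 2.45, 94.18)
print(f"FCT={f_ct:.4f} FSC={f_sc:.4f} FST={f_st:.4f}")
```

The three indices are linked by (1 - FST) = (1 - FSC)(1 - FCT), so FST can never be smaller than either FSC or FCT.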
