Dog introgression patterns in a South European wolf population

MSc in Bioinformatics Master’s Thesis

Daniel Gómez-Sánchez
Barcelona, 2014

Approval of the tutors

Signed, Dr. Antonio Barbadilla Dr. Carles Lalueza-Fox

ACKNOWLEDGMENTS

First of all, it has been a pleasure working with the people of the Institut de Biologia Evolutiva (CSIC-UPF). In particular, I’m very grateful to the Paleogenomics group for the opportunity to work with a very professional team: to Dr. Carles Lalueza-Fox for accepting me in his group, to Dr. Óscar Ramírez because this work could not have been done without his help and ideas, to Iñigo Olalde and Federico Sánchez-Quinto for sharing their bioinformatic skills with me, and to Federica Pierini.

Second, I’m thankful to the faculty of the Universitat Autònoma de Barcelona’s MSc in Bioinformatics, in particular to my academic tutor Antonio Barbadilla, for the bioinformatic knowledge and skills taught.

Third, I would like to express my gratitude to Dr. Carles Vilà, Dr. Robert K. Wayne, Dr. Tomas Marques-Bonet and Dr. Jeffrey M. Kidd for the unpublished data used in this Master’s Thesis. I would also like to acknowledge the help of Raphael Carrasco and Conrad Enseñat for the donation of the and Wolf EEP samples; Dr. Natalia Sastre for the microsatellite analysis; and Dr. Adam Boyko, Dr. Bridgett vonHoldt and Dr. Malgorzata Pilot for the information about the 48K dataset.

I’m very thankful to Dr. Carles Lalueza-Fox, Dr. Óscar Ramírez, Iñigo Olalde and Dr. Antonio Barbadilla for the review and comments on the manuscript; and Patricia Rodríguez and Jordi Antonio Pinzón García for their help in the linguistic revision.

Last but not least, I’m very grateful to all the people who have believed, and continue to believe, in my early scientific career: to my parents Antonio and María Luisa, my sisters Alba and Alicia, Patricia Rodríguez and the rest of my family for their patience and affection; to my friends Alberto Segovia Sanz and Jordi Antonio Pinzón García for their aid in the battle to conquer informatics; and to Dr. Juan Luis Santos and all the people in the Cytogenetics lab at the Universidad Complutense de Madrid for my initiation in the scientific world.

Thank you all, because without you this Master’s Thesis would never have been written.


INDEX

Contents
1. INTRODUCTION
1.1. Extinction risk factors
1.2. Molecular markers
2. OBJECTIVES
3. MATERIAL AND METHODS
3.1. Sampling and sequencing
3.2. Mapping
3.3. SNP calling
3.4. Diversity analysis and inbreeding
3.5. Ancestry analysis
3.6. Hybridization analysis
4. RESULTS
4.1. Heterozygosity and inbreeding
4.2. Hybridization patterns
5. DISCUSSION
6. CONCLUSION
7. REFERENCES

Appendixes
1. Bioinformatics’ discussion
2. Results for non-Iberian samples
3. Heterozygosity by chromosome
4. Heterozygosity distribution for non-Iberian samples
5. Principal components’ boxplots and PCA with component 4
6. Cross-validation error of the ADMIXTURE analysis
7. Linear model details of heterozygosity-percentage block analysis

List of Tables
Table 1. Results for Iberian samples

List of Figures
Figure 1. Iberian wolf distribution
Figure 2. Diversity distribution
Figure 3. Diversity analysis
Figure 4. Runs of homozygosity
Figure 5. Principal component analysis
Figure 6. ADMIXTURE analysis of the present-work's dataset
Figure 7. ADMIXTURE analysis of the 48K-merged dataset
Figure 8. STRUCTURE analysis of introgressed wolves
Figure 9. Ancestry across the chromosome
Figure 10. Analysis of haplotype blocks
Figure 11. Iberian shared alleles

1. INTRODUCTION

Grey wolves (Canis lupus) were historically distributed across Europe, Asia and North America, but hunting by humans, deforestation and the loss of wild prey reduced their populations during the past centuries (Boitani 2003). The species became fragmented and confined to the southern European peninsulas (Iberia, Italy and the Balkans), Canada and the northern USA (Mech 1970; Breitenmoser 1998). In the 1960s legal protection led to a population expansion in USA and (Mech 1995); however, in Eastern Europe and Northern Asia there was neither protection nor extinction risk (Bibikov 1994). These differences in protection and threat between populations make the species a good model for conservation genetics and genomics.

In Europe, wolves have a discontinuous range in which three main populations can be distinguished, corresponding spatially to different glacial refugia and demographic histories (Pilot et al. 2014): large and interconnected populations under constant hunting pressure in Eastern Europe, and two relatively smaller, isolated and bottlenecked populations in Western Europe. Eastern European wolves are also connected with the Asian populations; nevertheless, hunting causes multiple local demographic fluctuations (for example, Ozolins & Andersone 2001; Sidorovich et al. 2003; Gomerčić et al. 2010). Currently, wolves in Western Europe are expanding from the partially protected populations in Italy (including the Apennine Peninsula and the Western Italian Alps) and the to other regions as or (Sastre 2011).

The Iberian Peninsula contains the largest wolf population in Western Europe (Boitani 2003; Silva et al. 2013), isolated at least since the extinction at the end of the 19th century of the France and wolves, and reduced by human eradication campaigns (Valverde 1971). With new conservation policies, the population subsequently expanded in range and size (Figure 1; Sastre 2011). Although the figure is controversial, the wolf population in the Iberian Peninsula is estimated at 2,200-2,500 individuals concentrated in the Northwestern region (Silva et al. 2013). South of the Duero river, however, only very fragmented and isolated populations with a high extinction risk are present (Silva et al. 2013). In 1994, the European Breeding of Endangered Species Programme (EEP) started a breeding program for the Iberian wolf derived from 15 founders according to the studbook. The relatively high number of independent founders and the subsequent genetic management have led to the conservation of the original variability in the EEP population (Ramírez et al. 2006).

Figure 1. Iberian wolf distribution. Range in the Iberian Peninsula between the 19th and 20th centuries. Taken from Sastre 2011.

1.1. Extinction risk factors

Small and isolated populations have an increased risk of extinction due to genetic drift and inbreeding, because the probability of mating between relatives increases (Frankham 2005; Wright et al. 2007). Mating between close relatives increases homozygosity in the offspring, which may reduce their fitness through inbreeding depression (Wright 1977; Falconer & Mackay 1996). Furthermore, small isolated populations are also known to have a higher risk of hybridizing (Godinho et al. 2011; Randi et al. 2014). Wolf populations are therefore sensitive to both processes because of the biogeographical history described above.

 Population bottlenecks

Bottlenecks are demographic processes that consist of a severe reduction in effective population size. Their consequences can include the loss of genetic diversity and increased consanguinity, accumulation of deleterious mutations and genetic drift (Bouzat 2010), which lead to reduced adaptability (Frankham et al. 1999) and an increased extinction risk through genetic and demographic processes (Keller 2002). Founder effects, over-exploitation by humans, diseases, starvation and other natural and biological catastrophes cause population bottlenecks, and their genetic effects depend on their strength, duration and level of isolation (Carmichael et al. 2001; Busch et al. 2007). In the Italian and Iberian Peninsulas, isolated populations suffered a severe bottleneck, and many studies have described the effects explained above in their genetic landscape (Sastre et al. 2010; Sastre 2011; Pilot et al. 2014).

 Population fragmentation

Fragmentation of populations due to habitat loss and modification is an increasingly important threat in the conservation of endangered species, because the diversity in a population can only increase through mutation or the exchange of genes with neighbouring populations (Vilà et al. 2003a). Isolated and fragmented populations have an increased extinction risk due to the lack of migration and the lower effective population size (Frankham 2005). Under these conditions, migration events may have important effects for the rescue of small and inbred populations (Tallmon et al. 2004). Detecting population fragmentation is therefore crucial for conservation management. European wolf populations have been more fragmented than American ones; thus, Old World grey wolves are strongly differentiated from one another owing to the lack of gene flow between populations and to genetic drift (Pilot et al. 2010).


 Inbreeding depression

Founder effects and isolation may reduce allelic and genotypic diversity within populations, increasing inbreeding and the probability of extinction. Inbreeding increases the frequency of individuals homozygous for deleterious alleles, which in turn reduces individual fitness. This phenomenon is known as inbreeding depression and may decrease the short-term viability of a population due to the loss of adaptive potential (Ouborg 2010; Ouborg et al. 2010). Wolves are prone to inbreeding depression (Liberg et al. 2005; Räikkönen et al. 2006, 2009), but large or fast-growing populations seem to avoid it thanks to selection for heterozygotes (Randi 2011). However, the decline of genetic variability is correlated with the effective population size, which is very small in wolves even in the largest populations (Randi 2011), probably due to increased levels of inbreeding together with decreased dispersal and immigration (Aspi et al. 2006). Confirmed negative effects of inbreeding depression in wild wolves include decreased over-winter survival of pups (Liberg et al. 2005) and congenital bone deformities in isolated wolf populations (Räikkönen et al. 2006, 2009).

 Hybridization

Hybridization between wild species and their domestic counterparts represents a threat to natural populations, although at the same time it can introduce genetic variation into isolated populations. The consequences could be the disruption of local adaptation, increased genetic homogenization between populations and extinction through introgressive hybridization (Rhymer & Simberloff 1996). Grey wolves and domestic dogs possess identical karyotypes and can generate fertile hybrids despite physiological and morphological differences (Wayne et al. 1989; Vilà & Wayne 1999). Until now, population genetic studies have not revealed large-scale introgression of dog genes in wolves with the markers used. Nevertheless, several works have started to detect hybrids in natural populations (Randi et al. 2000, 2014; Randi & Lucchini 2002; Andersone et al. 2002; Verardi et al. 2006; Sundqvist 2008; Godinho et al. 2011; Hindrikson et al. 2012). With only a few molecular markers, backcrosses from past generations cannot be detected (Randi 2008) and cryptic introgression is likely to go unnoticed (Currat et al. 2008). Moreover, introgressed variants may be indistinguishable from intraspecific variation (Caniglia et al. 2013). If introgression is sufficiently frequent, small and fragmented wolf populations can lose specific adaptations and subsequently become extinct. Wolf re-expansion waves are also at risk of being contaminated by hybridization because of the number of free-ranging or feral dogs (Randi 2011).

1.2. Molecular markers

Evolutionary processes in natural populations have been extensively analysed with classical population genetics (Ouborg et al. 2010). In grey wolves, many studies have been conducted in this way using microsatellite loci (Verardi et al. 2006; vonHoldt et al. 2010; Randi et al. 2014), MHC genes (Galaverni et al. 2013; Niskanen et al. 2014), mtDNA (Thalmann et al. 2013) and combinations of them (Ramírez et al. 2006; Sastre et al. 2010; Godinho et al. 2011). Advances in Next Generation Sequencing (NGS) technologies allow thousands of genetic markers to be examined, including indel polymorphisms and single nucleotide polymorphisms (SNPs). Access to a large number of loci permits researchers to overcome the analytical limitations associated with a small number of genetic markers (Allendorf et al. 2010), including in hybridization analysis (Twyford & Ennos 2012).

Many works have analysed the population processes of wolves and other canids using NGS technologies (Boyko et al. 2010; vonHoldt et al. 2010, 2011; Pilot et al. 2014), based on genotyping microarrays derived from the complete sequencing of the dog genome (Lindblad-Toh et al. 2005). Although this kind of data enhances the understanding of wolf populations (vonHoldt et al. 2011), microarrays designed from a closely related species could introduce a bias, because they include only the variation of that species. Until now, only a few works (Lindblad-Toh et al. 2005; Wang et al. 2013; Axelsson et al. 2013; Freedman et al. 2014) have included whole-genome sequencing of wolves; however, all of them focus on the study of domestication.


2. OBJECTIVES

The objective of this Master’s Thesis is to analyse a wolf population for the first time using genome-wide sequencing, including three Northwestern Iberian samples, one of them from the EEP, and the first individual from south of the Duero in the literature. These samples cover the current diversity of the whole Iberian Peninsula. Only one previous study (Godinho et al. 2011) has analysed the dynamics of wolf-dog hybridization in this Northwestern population, where wolves use agricultural habitats close to human settlements (Cuesta et al. 1991; Llaneza et al. 1996; Vos 2000; Blanco & Cortés 2007), which is likely to favour contact with feral and free-ranging dogs and may result in extensive hybridization (Petrucci-Fonseca 1982; Blanco et al. 1992).

The present work has two specific objectives:

• To analyse the degree of inbreeding in two small wolf populations, one of them on the edge of extinction.

• To analyse the level of dog hybridization and the introgression patterns in wolves.


3. MATERIAL AND METHODS

The bioinformatic methodology is discussed in Appendix 1.

3.1. Sampling and sequencing

We generated the whole-genome sequences of 4 Iberian wolves: one captive (Wolf EEP), two from the Northwestern population (Wolf Spain and Wolf Portugal) and one from Southern Spain (Sierra Morena). The Illumina libraries were constructed following the manufacturer's instructions and sequenced at the CNAG (Centre Nacional d’Anàlisi Genòmica) and the BGI (Beijing Genomics Institute). In addition, we included the genomes of 11 dogs (different breeds), 6 American and 6 Eurasian wolves (unpublished data; Freedman et al. 2014) for comparison purposes. All wild samples derive from animals killed or found dead for reasons other than this research and deposited in scientific collections. The captive wolf sample, whose origin is the Iberian Northwestern population, comes from the Parc Zoologic of Barcelona.

3.2. Mapping

All the sequences were mapped to the dog reference genome (canFam3.1) using BWA version 0.6.1 (Li & Durbin 2009) with the quality trimming parameter set to a Sanger quality score of 15 and default values for the remaining parameters. Next, I used Picard tools version 1.70 (http://picard.sourceforge.net/) to remove PCR duplicates and GATK version 2.5 (McKenna et al. 2010) to perform indel realignment. The resulting files were used for the SNP calling.

Then, the DepthOfCoverage tool implemented in GATK was used to compute the average depth of coverage of the autosomal data in this final set.

3.3. SNP calling

I produced a preliminary set of autosomal variants for wolves and dogs (19,640,837 SNPs) using the GATK UnifiedGenotyper and VariantFiltration with the filtering parameters recommended for the case in which Variant Quality Score Recalibration (VQSR) is not available (Auwera et al. 2013; more details in Appendix 1).

To avoid low-complexity regions and gaps, the mappable region was obtained using the GEM mappability program (Derrien et al. 2012) version 1.315 and custom Perl scripts. Keeping the variants that fell within these regions, I obtained a final dataset for all the samples containing 18,956,547 confident SNPs.
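The interval filter described above can be illustrated with a minimal Python sketch. It is not the custom Perl code used in this work: the function names, the half-open interval convention and the toy coordinates are assumptions made only for illustration.

```python
from bisect import bisect_right

def in_mappable(pos, intervals):
    """Return True if pos lies in one of the sorted, non-overlapping
    half-open intervals [start, end) (interval convention is an assumption)."""
    starts = [s for s, _ in intervals]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < intervals[i][1]

def filter_snps(snps, mappable):
    """Keep the (chrom, pos) SNPs that fall inside the mappable
    intervals of their chromosome."""
    return [(c, p) for c, p in snps if in_mappable(p, mappable.get(c, []))]

# Toy data, hypothetical coordinates
mappable = {"chr1": [(100, 200), (500, 900)]}
snps = [("chr1", 150), ("chr1", 250), ("chr1", 899), ("chr1", 900), ("chr2", 10)]
kept = filter_snps(snps, mappable)
print(kept)
```

On genome-scale data the list of interval starts would be built once per chromosome rather than on every call.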

3.4. Diversity analysis and inbreeding

To explore the genome-wide distribution of genetic variability in the Iberian samples, I computed heterozygosity across the genome in overlapping 1 Mb windows with a 200 kb sliding step, using in-house Perl and R scripts. For each window, the number of heterozygous positions was divided by the number of callable positions. Only windows with at least 100 kb of callable sequence were considered. This approach has been used in other studies (for example, Prado-Martinez et al. 2013) instead of nucleotide diversity (π; Tajima 1983), because the number of samples for each population is small.
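As an illustration of the windowed heterozygosity described above, the following Python sketch computes the rate per overlapping window. It is only a stand-in for the in-house Perl and R scripts: the function name, the use of position sets and the scaled-down toy values are assumptions.

```python
def window_heterozygosity(het_sites, callable_sites, chrom_len,
                          window=1_000_000, step=200_000, min_callable=100_000):
    """Return a list of (start, end, het_rate) for overlapping windows.

    het_sites and callable_sites are sets of 0-based positions on one
    chromosome. Windows with fewer than min_callable callable positions
    are skipped; het_rate = heterozygous callable sites / callable sites.
    """
    results = []
    for start in range(0, max(chrom_len - window, 0) + 1, step):
        end = start + window
        n_call = sum(1 for p in callable_sites if start <= p < end)
        if n_call < min_callable:
            continue
        n_het = sum(1 for p in het_sites
                    if start <= p < end and p in callable_sites)
        results.append((start, end, n_het / n_call))
    return results

# Toy chromosome of 20 bp with 10 bp windows and a 5 bp step (hypothetical values)
callable_sites = set(range(0, 12)) | {15, 16}
het_sites = {1, 3, 11, 16}
windows = window_heterozygosity(het_sites, callable_sites, chrom_len=20,
                                window=10, step=5, min_callable=4)
for w in windows:
    print(w)
```

Scanning the sets once per window is fine for a toy example; real chromosomes would call for sorted position arrays or prefix sums.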

To avoid coverage differences between samples, I removed variants in sample-specific non-callable regions, obtained with the GATK CallableLoci tool using a minimum base quality of 20 and minimum and maximum depth thresholds based on each sample's coverage distribution (taking the mean±5 autosomal read depth).

Runs of homozygosity (ROHs) are regions with a lower heterozygosity rate. For each sample, ROHs were computed with non-overlapping 1 Mb windows. Depending on their length, ROHs may be indicative of historical population demography and homozygosity by descent (Li et al. 2006; Hamzić 2011). Long ROHs (> 1Mb) are indicative of autozygosity, inbreeding or admixture (Boyko et al. 2010; Pilot et al. 2014). Because of this association with recent demography, I conservatively called a ROH only when at least two consecutive windows (≥ 2Mb) fell under a heterozygosity cutoff of 0.0005 (based on the heterozygosity distribution; Figure 2, Appendix 2).

To calculate the inbreeding coefficient based on runs of homozygosity, the following definition of FROH (Keller et al. 2011) was applied:

$$F_{ROH_j} = \frac{\sum_k \mathrm{length}(ROH_k)}{L_j}$$

where ROHk is the kth ROH of individual j and Lj is the genome length of individual j. The genome length for each sample was computed using the callable bases in the considered windows.
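The ROH criterion and the FROH ratio can be sketched in Python as follows. This is a hedged illustration, not the scripts used in this work: the function names are hypothetical, each ROH is assumed to span whole windows, and the genome length is passed in directly rather than derived from callable bases.

```python
def call_rohs(het_rates, cutoff=0.0005, min_windows=2):
    """Return (first_window_index, n_windows) for every run of at least
    min_windows consecutive windows with heterozygosity below cutoff.

    het_rates holds one value per non-overlapping window of a chromosome;
    None marks a window without enough callable bases and breaks a run.
    """
    runs, start = [], None
    for i, h in enumerate(list(het_rates) + [None]):  # sentinel closes a trailing run
        low = h is not None and h < cutoff
        if low and start is None:
            start = i
        elif not low and start is not None:
            if i - start >= min_windows:
                runs.append((start, i - start))
            start = None
    return runs

def f_roh(runs, genome_length, window=1_000_000):
    """Summed ROH length divided by genome length (ROH length is taken
    as n_windows * window, which is an assumption of this sketch)."""
    return sum(n * window for _, n in runs) / genome_length

# Toy per-window heterozygosity values (hypothetical)
rates = [0.0016, 0.0001, 0.0002, 0.0011, 0.0003, None, 0.0001, 0.0002, 0.0004]
runs = call_rohs(rates)
print(runs, f_roh(runs, genome_length=10_000_000))
```

The single low window at index 4 is not reported, which mirrors the conservative two-window minimum described in the text.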

3.5. Ancestry analysis

For the ancestry analysis, non-biallelic and missing markers were removed with a custom Python script, and the dataset was then filtered by minor allele frequency (MAF<0.01) and LD-pruned using PLINK version 1.07 (Purcell et al. 2007), with a sliding-window size of 50 SNPs (10 overlap) and r2=0.5. With this pruned dataset (4,558,774 SNPs), I performed an ADMIXTURE (Alexander et al. 2009) analysis, which uses the same statistical model as STRUCTURE (Pritchard et al. 2000). To assess the error, the program was run 5 times with K between 2 and 10 and 5-fold cross-validation (Alexander & Lange 2011). To visualize the relationships in these genotype data, a Principal Component Analysis (PCA) was performed using the smartPCA program implemented in the EIGENSOFT package version 5.0.1 (Price et al. 2006).
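To make the two filters concrete, here is a simplified Python sketch. It is not PLINK's algorithm: it assumes that markers with MAF below 0.01 are discarded, uses the squared Pearson correlation of 0/1/2 genotype codes as r2, and applies a greedy rule that compares each marker only with the most recently kept ones. All names and genotypes are hypothetical.

```python
def maf(genotypes):
    """Minor allele frequency from diploid genotypes coded 0/1/2."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)

def r2(x, y):
    """Squared Pearson correlation of two genotype vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def prune(snps, min_maf=0.01, window=50, max_r2=0.5):
    """Greedy pruning over an ordered list of (name, genotypes):
    drop low-MAF markers, then keep a marker only if its r2 with each of
    the last `window` kept markers does not exceed max_r2."""
    kept = []
    for name, g in snps:
        if maf(g) < min_maf:
            continue
        if all(r2(g, h) <= max_r2 for _, h in kept[-window:]):
            kept.append((name, g))
    return [name for name, _ in kept]

# Toy genotypes for six samples (hypothetical)
snps = [
    ("s1", [0, 1, 2, 0, 1, 2]),
    ("s2", [0, 1, 2, 0, 1, 2]),  # identical to s1, so in complete LD with it
    ("s3", [0, 0, 0, 0, 0, 0]),  # monomorphic, fails the MAF filter
    ("s4", [2, 0, 1, 1, 0, 2]),  # uncorrelated with s1
]
pruned = prune(snps)
print(pruned)
```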

To check and improve the ancestry results for the Iberian wolves, I combined the present-work’s data with a 48K dataset from previous works (Boyko et al. 2010; vonHoldt et al. 2010, 2011) to increase the number of samples. These data come from the Affymetrix Canine version 2 genome-wide SNP mapping array, which uses CanFam2 assembly coordinates. For this reason, each sample was mapped to this assembly and SNPs were called again as previously described. After joining both datasets, filtering by MAF and LD-pruning with PLINK using the same parameters as previously described, I obtained a set of 43,497 SNPs. This dataset was used to repeat the ADMIXTURE (only 3 runs) and PCA analyses as described above.


3.6. Hybridization analysis

To confirm the high level of admixture between the Sierra Morena wolf and dogs, the alleles shared between each Iberian wolf and every other sample were estimated. The percentage of shared alleles for each sample pair was computed by dividing the number of shared alleles by the total number of alleles present in both samples. For this analysis, I included all confident SNPs called (non-pruned dataset, 15,807,997 SNPs, without non-biallelic and missing markers). For the Iberian wolf population I also computed the alleles shared among all samples and drew a four-set Venn diagram with the VennDiagram R package version 1.6.5 (Chen & Boutros 2011).
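A minimal Python sketch of the pairwise percentage follows. It assumes that an allele is identified by its site and base, and that the denominator is the union of the alleles carried by the two samples; both are interpretations of the description above, and the names and genotypes are hypothetical.

```python
def allele_set(genotypes):
    """genotypes maps site -> (allele1, allele2); return {(site, allele), ...}."""
    return {(site, a) for site, pair in genotypes.items() for a in pair}

def shared_allele_pct(g1, g2):
    """Percentage of alleles carried by both samples over all alleles
    carried by at least one of them (denominator is an assumption)."""
    a, b = allele_set(g1), allele_set(g2)
    return 100 * len(a & b) / len(a | b)

# Toy genotypes at three biallelic sites (hypothetical)
wolf = {1: ("A", "A"), 2: ("C", "T"), 3: ("G", "G")}
dog = {1: ("A", "G"), 2: ("T", "T"), 3: ("G", "G")}
pct = shared_allele_pct(wolf, dog)
print(pct)
```

The same allele sets could be intersected across four samples to obtain the counts shown in a four-set Venn diagram.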

To determine dog-introgressed regions in the highly admixed Iberian wolf samples (Sierra Morena and Wolf Spain), PCAdmix version 1.0 (Brisbin et al. 2012) was used with a 50-SNP window size. Because this program needs phased genotypes, the complete pruned dataset was phased using SHAPEIT version 2.644 (Delaneau et al. 2013). To detect blocks of ancestry (haplotypes assigned to ancestral populations), PCAdmix was run with the 11 dogs as one ancestral population and the 6 Eurasian wolves plus Wolf Portugal and Wolf EEP as the second. To assess the result for both samples, the same analysis was subsequently performed for Wolf EEP and Wolf Portugal, excluding from the ancestral population only the sample being tested as admixed. From the 50-SNP block assignments of the four samples, the overall percentage of Dog/Dog, Wolf/Dog and Wolf/Wolf haplotypes was computed. To analyse the relationship of each kind of block with diversity, I fitted a linear model relating the incidence of each class to the mean heterozygosity of each chromosome using the lm function implemented in R (version 3.1.0).
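The block summary and the per-chromosome linear model can be sketched in Python. This is an illustrative stand-in for the PCAdmix post-processing and the R lm fit: the 'D'/'W' encoding, the function names, the choice of heterozygosity as the response variable and all numbers are assumptions.

```python
from collections import Counter

CLASS_NAMES = {"DD": "Dog/Dog", "DW": "Wolf/Dog", "WW": "Wolf/Wolf"}

def block_classes(hap1, hap2):
    """Percentage of each block class, given one ancestry letter per
    window for each haplotype ('D' for dog, 'W' for wolf)."""
    counts = Counter(CLASS_NAMES["".join(sorted(pair))] for pair in zip(hap1, hap2))
    total = sum(counts.values())
    return {name: 100 * counts[name] / total for name in CLASS_NAMES.values()}

def least_squares(x, y):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

# Toy ancestry calls for ten windows of one chromosome (hypothetical)
pcts = block_classes("DDWWWWWWWW", "DWWWDWWWWW")
print(pcts)

# Toy per-chromosome values: Wolf/Dog percentage vs mean heterozygosity (hypothetical)
wolf_dog_pct = [0, 10, 20, 30]
mean_het = [0.0010, 0.0012, 0.0014, 0.0016]
slope, intercept = least_squares(wolf_dog_pct, mean_het)
print(slope, intercept)
```

A positive slope in such a fit would correspond to chromosomes with more mixed-ancestry blocks being more heterozygous.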

For comparison purposes, a microsatellite analysis of Sierra Morena and Wolf Spain was carried out, genotyping 10 autosomal markers following the protocol of Sastre et al. (2010). Using a dataset that includes 31 Iberian wolves and 32 dog samples (Sastre 2011), a Bayesian model-based clustering approach implemented in STRUCTURE version 2.0 (Falush et al. 2007) was performed, running 100,000 Markov chain Monte Carlo repetitions and a burn-in period of 10,000 iterations for K=2.


4. RESULTS

4.1. Heterozygosity and inbreeding

Considering only Iberian wolves, Sierra Morena has the lowest mean heterozygosity rate (0.00109 het/bp, Table 1), with 41.85% of sliding windows falling into inbred regions (Figures 2 and 3, Appendix 3); Wolf Portugal seems to share the same pattern (0.00118 het/bp). The mean heterozygosity rate observed in the genome sequences of the Eurasian wolves is 0.0016 het/bp (except Wolf Italy, where it is the lowest; Appendixes 2, 3 and 4), consistent with other genome-wide studies (Lindblad-Toh et al. 2005; Freedman et al. 2014). Dog samples have a reduced heterozygosity (0.00088 het/bp; Appendixes 2, 3 and 4), which varies across breeds as previously described (Lindblad-Toh et al. 2005; Freedman et al. 2014).

Table 1. Results for Iberian samples.

Sample         Population    Cov    Het         FROH  %Dog blocks
Sierra Morena  South Spain   43.94  0.00109271  0.42  31.88
Wolf Spain     Northwestern  22.68  0.00154275  0.15  14.30
Wolf Portugal  Northwestern  24.30  0.00118270  0.30   2.94
Wolf EEP       Northwestern  22.74  0.00146647  0.15   3.20

Cov: autosomal coverage; Het: heterozygosity (het/bp); FROH: inbreeding coefficient; %Dog blocks: percentage of dog ancestry blocks

Figure 2. Diversity distribution. Density (a) and box plots (b) of heterozygosity in Iberian samples using 1Mb 200kb-overlapping windows. Dotted lines point out the cutoff used to define inbred windows.

Figure 3. Diversity analysis. Heterozygosity in Iberian samples using 1Mb 200kb-overlapping windows (blue lines). Dotted lines point out the median of each sample and red blocks are runs of homozygosity (ROHs) using 1Mb non-overlapping windows. Details for each sample in Appendix 3.

ROHs appear in all Iberian wolves (Figure 4), but Sierra Morena has chromosomes that are almost entirely homozygous (Figure 3, more details in Appendix 3). This sample shows the largest ROHs, at 40-60 Mbp, and its cumulative curve is the highest among the Iberian samples. Although Wolf Portugal also has runs longer than 40 Mbp (Figure 4b), its distribution is quite similar to that of the other Iberian wolves. Wolf Spain and Wolf EEP show similar cumulative curves of ROH length (Figure 4a), and almost all of their runs of homozygosity are shorter than 30 Mbp.

Figure 4. Runs of homozygosity. Cumulative (a) and total (b) counts for runs of homozygosity (ROHs) in Iberian samples computed using 1Mb non-overlapping windows. Note that inbred regions shorter than 2Mb are in the plot but not considered as ROHs.

Inbreeding coefficient analysis, calculated with FROH, leads to the same result: Sierra Morena is the most inbred Iberian wolf (FROH = 0.42), followed by Wolf Portugal (FROH = 0.30). Wolf Spain and Wolf EEP have an inbreeding coefficient (FROH = 0.15) that is half that of the most inbred Northwestern Iberian sample (Table 1). The FROH of Eurasian and American wolves (Appendix 2), except Wolf Italy (FROH = 0.51), Wolf China (FROH = 0.23) and both Wolf Mexico samples (A and B, FROH = 0.70), is much lower than that of Wolf Spain and Wolf EEP, with inbreeding coefficients between 0.01 and 0.09. On the other hand, dogs have a FROH between 0.20 and 0.44 (depending on the breed), higher than Northwestern Iberian wolves and around the Sierra Morena value.

4.2. Hybridization patterns

The 48K-merged and the present-work’s datasets give similar results in the PCA. Wolf Portugal and Wolf EEP cluster with Iberian wolves and near other Eurasian populations (Figure 5). Nevertheless, Wolf Spain and Sierra Morena are shifted from this cluster towards dogs on PC1, which clearly separates American wolves, Eurasian wolves and dogs (Appendix 5). In the present-work’s dataset, PC4 distinguishes Eurasian populations better, but shows the same pattern in the Iberian wolves (Appendix 5).

The ADMIXTURE analysis shows the same pattern of hybridization with dogs (Figure 6). Because the 48K dataset (Boyko et al. 2010; vonHoldt et al. 2010, 2011) comes from a microarray that maximizes dog variability and has more samples, dog ancestry in Wolf Spain is detected better in the 48K-merged dataset (Figure 7) than in the present-work’s dataset (Figure 6), although this component already appears at K=2. The cross-validation error for both datasets (Appendix 6) reflects these differences, with K=9 and K=2 as the best-supported number of clusters, respectively.

From the ADMIXTURE analysis at K=2, our samples have the following percentages for the dog component (this study and 48K-merged datasets, respectively): Sierra Morena 31.51% and 36.94%, Wolf Spain 10.43% and 17.69%, Wolf Portugal 0.00% and 4.47%, Wolf EEP 0.00% and 3.28%. Importantly, the 48K-merged dataset detects a higher percentage of introgression in every sample. It is likely that this bias is caused by the microarray design, which takes into account only the variability from the dog genome. Alternatively, for the admixed samples (Sierra Morena and Wolf Spain), the STRUCTURE analysis using microsatellites leads to values of 42.4% and 0.6% of dog component (Figure 8), respectively. These results are very different from those obtained with genomic data, suggesting wrong estimations due to the low number of markers.

Figure 5. Principal component analysis. Principal component analysis (PCA) of Sierra Morena (red), Wolf Portugal (blue), Wolf Spain (green) and Wolf EEP (orange) with the 48K-merged dataset samples (a) and samples from this work (b, c). In c), SNPs from dog blocks in Sierra Morena and Wolf Spain (Figure 9) are removed (note that in this case Iberian samples cluster closer).

Figure 6. ADMIXTURE analysis of the present-work’s dataset. Cross-validation error (Appendix 6) shows that the best-supported clustering is K=2, which differentiates dogs and wolves. Nevertheless, this analysis also differentiates North American (K=3), South American (K=4), Asian and European (K=5, K=7) and Iberian (K=8) wolves. Moreover, relationships between dog breeds are reflected in K=6-9.

Because of this displacement towards dogs, I analysed haplotype blocks of dog and Eurasian wolf ancestry in the admixed samples (Figure 9). The result shows that almost a third of Sierra Morena’s genome (31.88% of the 50-SNP blocks) comes from dogs, more than double the dog ancestry of Wolf Spain (14.30%; Table 1, Figure 9). Moreover, the ancestry pattern differs between the two samples: Sierra Morena has long dog haplotypes present at the same region in both chromosomes, whereas in Wolf Spain they are shorter and the two chromosomes show different distributions (Figure 10a). Wolf Portugal and Wolf EEP have only 3% dog ancestry (Table 1), a result that validates the method. These values are close to the percentage of the ADMIXTURE dog component and indicate an accurate estimation of the hybridization patterns with our genomic data, in contrast with microsatellites.

Figure 7. ADMIXTURE analysis of the 48K-merged dataset. Cross-validation error (Appendix 6) shows that the best-supported clustering is K=9, which differentiates dogs and different wolf populations: Asian (ASW), Central European (CEW), Italian (ITW), Iberian (IBW) and 3 different American (AMW) populations (left panel). On the right, zoom of the Iberian samples analysed in this work.

Figure 8. STRUCTURE analysis of introgressed wolves. Probabilistic assignment to the genetic clusters inferred by Bayesian analysis with K=2 of dogs, Iberian wolves (IBW) and the hybrid samples Wolf Spain (WS) and Sierra Morena (SM).

The linear models between heterozygosity and haplotype classes (Dog/Dog, Wolf/Dog and Wolf/Wolf) indicate that, in the hybrid samples, chromosomes with a higher percentage of blocks with both ancestries tend to be more genetically variable (Figure 10b, Appendix 7). This result suggests that the hybrid regions in Wolf Portugal increase the heterozygosity by introgression.

After removing the SNP windows that present a dog haplotype in at least one of the admixed samples, the PCA shows the same clusters as before (Figure 5c). In this case, Wolf Spain joins the Iberian cluster (Wolf Portugal and Wolf EEP), and Sierra Morena is close to them. However, Sierra Morena remains displaced towards dogs, including on PC4 (Appendix 5), which explains the Eurasian variation better.

Furthermore, using all confident markers, Sierra Morena shares around 70% of its alleles with dogs, almost 1% more than Wolf Spain and 3% more than the non-admixed samples (Wolf Portugal and Wolf EEP) (Figure 11a). On the other hand, of all the alleles present in the Iberian population (22,959,835 out of 31,616,032 in the dataset), Sierra Morena has 5% singletons, compared with 3% for the rest (Figure 11b). Moreover, the alleles shared exclusively between Sierra Morena and Wolf Spain are slightly more frequent (around 0.5% more) than those shared between Sierra Morena and the other Iberian wolves. Both results are consistent with the high hybridization level of Sierra Morena and a lower level of dog introgression in Wolf Spain.

a) b)

c) d)

Figure 9. Ancestry across the chromosome. Ancestry blocks from dogs (red) and Eurasian wolves (blue) in Sierra Morena (a), Wolf Spain (b), Wolf Portugal (c) and Wolf EEP (d). In the legend, N represents the number of individuals used as ancestral population. Figure 9. Ancestry across the chromosome

Figure 10. Analysis of haplotype blocks. Haplotype class block frequency (a) and heterozygosity-percentage linear model (b, details in Appendix 7) by sample and class. Each point represents a chromosome.

Figure 11. Iberian shared alleles. Shared alleles between samples in the present work’s dataset by pairs (a) and between the four Iberian samples (b). Percentage is calculated as the number of shared alleles divided by the total number of alleles in the considered samples.

5. DISCUSSION

The wolf population of Northwestern Iberia has been extensively studied in many respects (Ramírez et al. 2006; Sastre et al. 2010; Godinho et al. 2011; Sastre 2011; vonHoldt et al. 2011; Pilot et al. 2014), but this work includes for the first time the variability present in the south of the Iberian Peninsula. Although only one sample from this population was analysed, it could be the last individual, given the high extinction risk of an isolated one-pack group (Padial et al. 2000; Silva et al. 2013) composed of a single breeding pair, their offspring of the year and occasional older offspring (Randi 2011). Because of this small size and the controversy about the existence of the South Iberian wolf, I use “population” to refer to the data of the Sierra Morena individual. Comparing the genetic patterns of both populations helps to understand the dynamics of the Iberian wolf and its current conservation status. This study analyses the first NGS data from wolves in the context of conservation and population genetics; thus heterozygosity, inbreeding and hybridization patterns can be investigated in depth for each individual.

A major concern in wolf conservation genetics is the extensive hybridization between wolves and wild or domestic canids (Rhymer & Simberloff 1996; Randi 2011). Hybridization is a documented threat to canids, including the Ethiopian wolf with dogs (Gottelli et al. 1994), and the red wolf (Adams et al. 2003), the Great Lakes wolf (Leonard & Wayne 2008) and other North American wolves with coyotes (Roy et al. 1994). In Europe, hybridization between declining or expanding wolf populations and their domestic counterparts is an important threat (Randi 2011), and many hybrids have been reported using a small number of genetic markers (Randi et al. 2000, 2014; Randi & Lucchini 2002; Andersone et al. 2002; Verardi et al. 2006; Sundqvist 2008; Godinho et al. 2011; Hindrikson et al. 2012). Only one previous study (Godinho et al. 2011) provides information about hybridization between wolves and dogs in the Iberian Peninsula, reporting a hybridization occurrence of 4% (8 individuals) in the Northwestern population using 42 autosomal markers. In Godinho et al. (2011), some of the introgressed samples were selected because they presented dog phenotypic traits. Most of the hybrid individuals show a 50% dog component, and only two samples have a component lower than 20% (specifically, 15.2% and 18.1%). Both individuals carry a dog-like Y chromosome, which indicates the direction of the cross and suggests that the introgression is recent and thus more detectable. In the present work, one of the two wild samples from the Northwestern population shows a minor level of hybridization (around 15%), whereas Sierra Morena has about one third of its genome introgressed by dog (Table 1, Figure 9). Moreover, the mtDNA of both individuals (data not shown) presents a w1 wolf haplotype (from Vilà et al. 1999), which supports the major direction of wolf-dog hybridization detected in previous works (Vilà et al. 2003b; Godinho et al. 2011).
Nevertheless, the 15% hybridization level obtained for Wolf Spain with whole-genome data could not be detected here using 10 microsatellite markers (Figure 8). Conversely, the dog component of Sierra Morena estimated with microsatellites is almost 50%, far from the proportion of around 30% obtained with ADMIXTURE and PCAdmix (Figures 6, 7 and 9). In addition, Wolf Spain and Sierra Morena showed no hybrid phenotypic characteristics, contrary to the observations of Godinho et al. (2011). The results of this work suggest that microsatellite data might underestimate the hybridization incidence when the introgression level is around 15%; on the other hand, they might overestimate this parameter in the most introgressed samples. Further whole-genome information from the Iberian Peninsula will help to understand the proportion of hybridization in the Northwestern population and to estimate the hybridization occurrence accurately.

Because of the coexistence with feral and domestic dogs, hybridization has an important effect on small wolf populations like those of the Italian and Iberian Peninsulas (Verardi et al. 2006; Randi 2008; Godinho et al. 2011; Randi et al. 2014) and also on expanding populations in other European regions (Andersone et al. 2002; Sundqvist 2008; Hindrikson et al. 2012). Feral organisms might have an impact on the structure of local communities, leading to a loss of genetic diversity (Allendorf et al. 2001). Moreover, introgression of dog genes can decrease the adaptive potential of the hybrids and lead to extinction (Rhymer & Simberloff 1996). Introgressive hybridization would enhance genetic homogenization, breaking down local genetic adaptation. Habitat modification by anthropogenic action is increasing, and this leads to the fragmentation and isolation of many populations. Individuals in these small and isolated populations in contact with their domestic counterparts are more likely to hybridize because of the difficulty of finding mates of the same species. This is very important in the South Spain population, where the effective size is very small (Silva et al. 2013) and the habitat is close to human settlements (Cuesta et al. 1991; Llaneza et al. 1996; Vos 2000; Blanco & Cortés 2007). When introgression occurs, a relatively greater fraction of a small population would hybridize each generation, increasing the introgression rate even more (Rhymer & Simberloff 1996).

On the other hand, genetic diversity is important in conservation genetics because of its effects on inbreeding depression and the disruption of local adaptation (Allendorf et al. 2010). Heterozygosity varies considerably across the Iberian samples (Table 1, Figures 2 and 3): in Wolf Spain it is similar to that of other European wolf populations (Appendixes 2, 3 and 4), whereas Wolf Portugal has a lower rate; Wolf EEP, the captive sample, is as variable as Wolf Spain, and the diversity of Sierra Morena is the lowest, although very close to that of Wolf Portugal. Nevertheless, the four Iberian samples have a higher inbreeding coefficient than the Eurasian populations (except Wolf China and Wolf Italy; Appendix 2). Eurasian wolves have inbreeding coefficients between 0.01 and 0.09, except Wolf China (0.23) and Wolf Italy (0.51). The Wolf China sample was previously described (Freedman et al. 2014) as showing the same diversity and ROHs; the Italian population is known to have passed through a severe bottleneck with genetic effects (Lucchini et al. 2004; Fabbri et al. 2007; Randi 2008) that led to its inbreeding pattern. Genetic evidence of a bottleneck in the Iberian population has been reported (Sastre et al. 2010; Sastre 2011), and it explains the inbreeding coefficients obtained for the Iberian samples in the present work.

Although the EEP studbook indicates that Wolf EEP should have an inbreeding coefficient near 0, because only a few generations of crosses have taken place, Wolf EEP has the same inbreeding coefficient as Wolf Spain (FROH = 0.15). This value can be explained by the past bottleneck, which reduced the diversity of Iberian individuals (Sastre et al. 2010; Sastre 2011). An inbreeding coefficient of 0.125, close to the value of Wolf Spain, is produced by matings between grandparent and grandchild, half-siblings, or uncle and niece (assuming non-inbred parents). Wolf Portugal, in line with its lower heterozygosity, has a higher inbreeding coefficient (0.30), which is likely to involve matings between closely related wolves with the same inbreeding coefficient as Wolf Spain and Wolf EEP. In a population with a small effective size, inbreeding cannot be avoided (Randi 2011), as shown by the very high FROH (0.42) of Sierra Morena. This value is close to that expected under continuous mating between very close relatives (parent/offspring, full siblings and double first cousins in first degree), so the number of individuals must be very small. The autozygosity of Sierra Morena leads to a loss of genetic variation and to inbreeding depression, which can make the population disappear, a threat that is also likely in the highly inbred Iberian population.
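The pedigree values quoted above can be reproduced with Wright's path-counting rule, F = Σ (1/2)^(n1+n2+1) (1 + F_A), summed over the common ancestors of the two parents. The sketch below is only a worked illustration under the same assumption as in the text (non-inbred ancestors, one generation of mating between relatives); the function name and the table of path lengths are hypothetical helpers.

```python
def offspring_inbreeding(paths, ancestor_f=0.0):
    """Inbreeding coefficient of the offspring of two related parents.

    paths: one (n1, n2) pair per common ancestor, where n1 and n2 are the
    numbers of generations from each parent up to that ancestor
    (0 when the parent is itself the common ancestor).
    ancestor_f: inbreeding coefficient of the common ancestors.
    """
    return sum(0.5 ** (n1 + n2 + 1) * (1.0 + ancestor_f) for n1, n2 in paths)


# Path lengths for the mating types mentioned in the text.
MATINGS = {
    "parent/offspring": [(0, 1)],
    "full siblings": [(1, 1), (1, 1)],
    "half-siblings": [(1, 1)],
    "grandparent/grandchild": [(0, 2)],
    "uncle/niece": [(1, 2), (1, 2)],
    "double first cousins": [(2, 2)] * 4,
}

for name, paths in MATINGS.items():
    print(f"{name}: F = {offspring_inbreeding(paths):.3f}")
```

These are single-generation expectations; when close relatives mate over successive generations the ancestors are themselves inbred (ancestor_f > 0) and the coefficient accumulates beyond these values.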

In wolves, inbreeding affects the health of the population (Liberg et al. 2005; Räikkönen et al. 2013). Loss of genetic diversity (inbreeding depression) reduces reproduction and survival, increasing extinction risk (Frankham 2005). Inbreeding depression affects the ability of the population to adapt to environmental change. Although inbreeding depression can be reduced when selection removes the deleterious alleles (purging), this effect is weak in small populations, where deleterious alleles of small effect can drift to fixation (Frankham 2005). Consequently, these alleles increase in frequency and reduce reproductive fitness (Wright et al. 2007). The South Spain population has a very small effective size (only one pack; Silva et al. 2013), in which deleterious alleles are likely to reach a high frequency or fixation due to inbreeding. In Northwestern Iberia, although the estimated population size does not seem to endanger the diversity of the population (Silva et al. 2013) because of its recent growth, the inbreeding coefficient lies between 0.15 and 0.30 (Table 1), slightly higher than in other well-conserved Eurasian populations (Appendix 2). This increase in inbreeding suggests that the effective size values for the Iberian Peninsula might be overestimated (for example, by including juveniles; Vilà 2010), as the mating structure implied by the inbreeding coefficients indicates. More samples will be necessary to verify this hypothesis.
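The link between effective size and inbreeding invoked here can be illustrated with the classical expectation for an idealised random-mating population, F_t = 1 − (1 − 1/(2Ne))^t. The sketch below is not part of the analyses of this work: the function names are hypothetical, the number of generations (10) is an arbitrary assumption, and FROH is used only as a rough stand-in for F_t.

```python
def expected_inbreeding(ne, generations):
    """Expected inbreeding after `generations` in an idealised population of size ne."""
    return 1.0 - (1.0 - 1.0 / (2.0 * ne)) ** generations


def ne_for_inbreeding(f_target, generations):
    """Effective size that would produce f_target after `generations` (inverse relation)."""
    return 1.0 / (2.0 * (1.0 - (1.0 - f_target) ** (1.0 / generations)))


# Inbreeding coefficients reported in the text for Iberian samples;
# the 10 generations are a purely illustrative assumption.
for f_roh in (0.15, 0.30, 0.42):
    ne = ne_for_inbreeding(f_roh, generations=10)
    print(f"FROH = {f_roh:.2f} -> idealised Ne of about {ne:.0f}")
```

Under this idealised model, higher inbreeding coefficients over the same time span imply smaller effective sizes, which is the direction of the argument made above.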

Despite the differences in genetic variation, the three Northwestern wolves and Sierra Morena cluster together when the dog component is removed (Figure 5). Moreover, among the Northwestern wild individuals the dog component only appears in the least inbred one, which leads to the conclusion that the higher heterozygosity and lower inbreeding coefficient of this sample are due to hybridization (Figure 10). The genetic integrity of the Iberian population is at risk due to hybridization with dogs, as shown in the Sierra Morena and Wolf Spain samples. Nevertheless, the introgression patterns of Sierra Morena and Wolf Spain differ (Figures 9 and 10): the introgressed regions of Wolf Spain are always Dog/Wolf, whereas Sierra Morena also carries Dog/Dog haplotypes. This suggests that hybridization in the South population is frequent and that almost all the remaining individuals have introgression signals in their genomes. On the other hand, Wolf Spain shows a pattern more likely related to a sporadic hybridization event.


Here the regional and continental genetic patterns of wolves detected by vonHoldt et al. (2011) are confirmed using genome-wide sequences without bias. Shared alleles show a geographical pattern (Figure 11), even considering only Iberian samples: the shared percentage decreases with geographical distance, being highest with the Central Europe sample. However, the affinity of Sierra Morena with other Iberian samples is reduced by the introgression of dog alleles. Wolf Spain shows an increase of alleles shared with dogs, although it conserves its affinities with other Iberian wolves. The current Iberian population can be considered different from other European populations. The Iberian wolf has been described as a distinct subspecies (Cabrera 1907). Morphometric (Vilà 1993) and genetic (Vilà et al. 1999; Lucchini et al. 2004) studies describe a notable differentiation between Iberian and Eurasian wolves, which suggests that Iberian wolves have been separated from all other European wolves for a long time. In recognition of its evolutionary potential (Crandall et al. 2000), the Northwestern Iberian population demands separate management. Although previous studies based on a few molecular markers (Lucchini et al. 2004; Ramírez et al. 2006; Sastre et al. 2010; Sastre 2011) conclude that there is no severe reduction in genetic variability, here it is shown that the Wolf Portugal sample has an important reduction in diversity (Table 1, Figures 2, 3 and 4), and that in Wolf Spain hybridization reduces the inbreeding coefficient (Figure 10). These two lines of evidence indicate that the Northwestern Iberian population is at risk, because of either inbreeding or introgression. Moreover, compared with other European samples, the inbreeding coefficient is higher (Table 1, Appendix 2).
If high levels of inbreeding or hybridization are a widespread pattern in the Iberian Peninsula, these results suggest that the real effective population size is lower than previously estimated.


6. CONCLUSION

In summary, the similarities between the Sierra Morena individual and the other Iberian samples included in this Master’s Thesis show that the Northwestern population is at risk for the same reasons as Sierra Morena. A high inbreeding coefficient and introgression are well-known conservation threats (Rhymer & Simberloff 1996; Frankham 2005; Ouborg 2010; Allendorf et al. 2010; Randi 2011). Both factors are detected in the Northwestern samples, indicating that the population is not as well conserved as previously described. Nevertheless, two different patterns were detected in the two wild individuals: Wolf Spain has a heterozygosity rate approximately equal to that of other Eurasian populations, but dog introgression is present; on the other hand, Wolf Portugal has an increased inbreeding coefficient and no hybridization. Analysis of more samples could reveal which pattern predominates in the Northwestern Iberian population.

The conclusions derived from the present work are the following:

• The South Iberian wolf shows a loss of genomic diversity and extensive dog hybridization, which indicates an important extinction risk.

• The Northwestern Iberian wolf has higher diversity and less introgression than the South population, but the level still represents a threat to the population.

• The patterns of hybridization differ between the two populations: in the South, introgression is frequent and extended; in the Northwest, it is an occasional event.

• The Northwestern population has an inbreeding coefficient slightly higher than that of other healthy grey wolves as a consequence of the bottleneck suffered in the Iberian Peninsula.

• Although more samples are needed, the wolf population size seems to be overestimated in the Northwestern Iberian Peninsula.


7. REFERENCES

Adams JR, Kelly BT, Waits LP (2003) Using faecal DNA sampling and GIS to monitor hybridization between red wolves (Canis rufus) and coyotes (Canis latrans). Molecular ecology, 12, 2175–2186.

Alexander DH, Lange K (2011) Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC bioinformatics, 12, 246.

Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Research, 19, 1655–1664.

Allendorf FW, Hohenlohe PA, Luikart G (2010) Genomics and the future of conservation genetics. Nature reviews. Genetics, 11, 697–709.

Allendorf FW, Leary RF, Spruell P, Wenburg JK (2001) The problems with hybrids: setting conservation guidelines. Trends in Ecology & Evolution, 16, 613–622.

Andersone Ž, Lucchini V, Ozoliņš J (2002) Hybridisation between wolves and dogs in Latvia as documented using mitochondrial and microsatellite DNA markers. Mammalian Biology - Zeitschrift für Säugetierkunde, 67, 79–90.

Aspi J, Roininen E, Ruokonen M, Kojola I, Vilà C (2006) Genetic diversity, population structure, effective population size and demographic history of the Finnish wolf population. Molecular ecology, 15, 1561–76.

Auwera GA Van Der, Carneiro MO, Hartl C et al. (2013) From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. In: Current Protocols in Bioinformatics (eds Bateman A, Pearson WR, Stein LD, Stormo GD, Yates JR), pp. 11.10.1–11.10.33. Hoboken, NJ, USA.

Axelsson E, Ratnakumar A, Arendt M-L et al. (2013) The genomic signature of dog domestication reveals adaptation to a starch-rich diet. Nature, 495, 360–364.

Bibikov DI (1994) Wolf problem in Russia. Lutreola, 3, 10–14.

Blanco JC, Cortés Y (2007) Dispersal patterns, social structure and mortality of wolves living in agricultural habitats in Spain. Journal of Zoology, 273, 114–124.

Blanco JC, Reig S, de la Cuesta L (1992) Distribution, status and conservation problems of the wolf Canis lupus in Spain. Biological Conservation, 60, 73–80.

Boitani L (2003) Wolf conservation and recovery. In: Wolves. Behavior, Ecology, and Conservation (eds Mech LD, Boitani L), pp. 317–344. The University of Chicago Press, Chicago.

Bouzat JL (2010) Conservation genetics of population bottlenecks: the role of chance, selection, and history. Conservation Genetics, 11, 463–478.

Boyko AR, Quignon P, Li L et al. (2010) A simple genetic architecture underlies morphological variation in dogs. PLoS biology, 8, e1000451.

Breitenmoser U (1998) Large predators in the Alps: The fall and rise of man’s competitors. Biological Conservation, 83, 279–289.

Brisbin A, Bryc K, Byrnes J et al. (2012) PCAdmix: principal components-based assignment of ancestry along each chromosome in individuals with admixed ancestry from two or more populations. Human biology, 84, 343–364.

Busch JD, Waser PM, Dewoody JA (2007) Recent demographic bottlenecks are not accompanied by a genetic signature in banner-tailed kangaroo rats (Dipodomys spectabilis). Molecular ecology, 16, 2450–62.

Cabrera A (1907) Los lobos de España. Boletín de la Real Sociedad Española de Historia Natural, 7, 193–198.

Caniglia R, Fabbri E, Greco C et al. (2013) Black coats in an admixed wolf × dog pack is melanism an indicator of hybridization in wolves? European Journal of Wildlife Research, 59, 543–555.

Carmichael LE, Nagy JA, Larter NC, Strobeck C (2001) Prey specialization may influence patterns of gene flow in wolves of the Canadian Northwest. Molecular Ecology, 10, 2787–2798.

Chen H, Boutros PC (2011) VennDiagram: a package for the generation of highly-customizable Venn and Euler diagrams in R. BMC bioinformatics, 12, 35.

Crandall KA, Bininda-Emonds ORP, Mace GM, Wayne RK (2000) Considering evolutionary processes in conservation biology. Trends in Ecology & Evolution, 15, 290–295.

Cuesta L, Barcena F, Palacios F, Reig S (1991) The trophic ecology of the Iberian Wolf (Canis lupus signatus Cabrera, 1907). A new analysis of stomach’s data. Mammalia, 55, 239–254.

Currat M, Ruedi M, Petit RJ, Excoffier L (2008) The hidden side of invasions: massive introgression by local genes. Evolution, 62, 1908–1920.

Delaneau O, Zagury J-F, Marchini J (2013) Improved whole-chromosome phasing for disease and population genetic studies. Nature methods, 10, 5–6.

Derrien T, Estellé J, Marco Sola S et al. (2012) Fast computation and applications of genome mappability. PloS one, 7, e30377.

Fabbri E, Miquel C, Lucchini V et al. (2007) From the Apennines to the Alps: colonization genetics of the naturally expanding Italian wolf (Canis lupus) population. Molecular ecology, 16, 1661–1671.

Falconer DS, Mackay TFC (1996) Quantitative genetics. Pearson Education Limited.

Falush D, Stephens M, Pritchard JK (2007) Inference of population structure using multilocus genotype data: dominant markers and null alleles. Molecular ecology notes, 7, 574–578.

Frankham R (2005) Genetics and extinction. Biological Conservation, 126, 131–140.

Frankham R, Lees K, Montgomery ME et al. (1999) Do population size bottlenecks reduce evolutionary potential? Conservation, 2, 255–260.

Freedman AH, Gronau I, Schweizer RM et al. (2014) Genome sequencing highlights the dynamic early history of dogs. PLoS genetics, 10, e1004016.

Galaverni M, Caniglia R, Fabbri E, Lapalombella S, Randi E (2013) MHC variability in an isolated wolf population in Italy. The Journal of heredity, 104, 601–612.

Godinho R, Llaneza L, Blanco JC et al. (2011) Genetic evidence for multiple events of hybridization between wolves and domestic dogs in the Iberian Peninsula. Molecular ecology, 20, 5154–5166.

Gomerčić T, Sindičić M, Galov A et al. (2010) High genetic variability of the grey wolf (Canis lupus L.) population from Croatia as revealed by mitochondrial DNA control region sequences. Zoological Studies, 49, 816–823.

Gottelli D, Sillero-Zubiri C, Applebaum GD et al. (1994) Molecular genetics of the most endangered canid: the Ethiopian wolf Canis simensis. Molecular ecology, 3, 301–312.

Hamzić E (2011) Division of Sciences Levels of Inbreeding Derived from Runs of Homozygosity: A Comparison of Austrian and Norwegian Cattle Breeds. PhD thesis, University of Natural Resources and Life Sciences: Vienna.

Hindrikson M, Männil P, Ozolins J, Krzywinski A, Saarma U (2012) Bucking the trend in wolf-dog hybridization: first evidence from Europe of hybridization between female dogs and male wolves. PloS one, 7, e46465.

Keller L (2002) Inbreeding effects in wild populations. Trends in Ecology & Evolution, 17, 230–241.

Keller MC, Visscher PM, Goddard ME (2011) Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data. Genetics, 189, 237–249.

Leonard JA, Wayne RK (2008) Native Great Lakes wolves were not restored. Biology letters, 4, 95–98.

Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England), 25, 1754–1760.

Li L, Ho S, Chen C et al. (2006) Long Contiguous Stretches of Homozygosity in the Human Genome. Human mutation, 27, 1115–1121.

Liberg O, Andrén H, Pedersen H-C et al. (2005) Severe inbreeding depression in a wild wolf (Canis lupus) population. Biology letters, 1, 17–20.

Lindblad-Toh K, Wade CM, Mikkelsen TS et al. (2005) Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature, 438, 803–819.

Llaneza L, Fernández A, Nores C (1996) Dieta del lobo en dos zonas de (España) que difieren en carga ganadera. Doñana Acta Vertebrata, 23, 201–214.

Lucchini V, Galov A, Randi E (2004) Evidence of genetic distinction and long-term population decline in wolves (Canis lupus) in the Italian Apennines. Molecular Ecology, 13, 523–536.

McKenna A, Hanna M, Banks E et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research, 20, 1297–1303.

Mech LD (1970) The wolf: the ecology and behavior of an endangered species (Natural History Press, Ed,). Doubleday Publishing Co., N.Y.

Mech LD (1995) The challenge and opportunity of recovering wolf populations. Conservation Biology, 9, 270–278.

Niskanen AK, Kennedy LJ, Ruokonen M et al. (2014) Balancing selection and heterozygote advantage in major histocompatibility complex loci of the bottlenecked Finnish wolf population. Molecular ecology, 23, 875–889.

Ouborg NJ (2010) Integrating population genetics and conservation biology in the era of genomics. Biology letters, 6, 3–6.

Ouborg NJ, Pertoldi C, Loeschcke V, Bijlsma RK, Hedrick PW (2010) Conservation genetics in transition to conservation genomics. Trends in genetics, 26, 177–187.

Ozolins J, Andersone Z (2001) Status of large carnivore conservation in the Baltic States. Action plan for the conservation of wolf (Canis lupus) in Latvia. European Commission: Strasbourg, T-PVS, 73, 1–32.

Padial JM, Contreras FJ, Pérez J, Ávila E, Barea JM (2000) Análisis de la situación y problemática del lobo (Canis lupus signatus) en Sierra Morena Oriental (Sur de España). Galemys, 12, 37–44.

Petrucci-Fonseca F (1982) Wolves and stray-feral dogs in Portugal. In: III International Theriological Congress. Helsinki.

Pilot M, Branicki W, Jedrzejewski W et al. (2010) Phylogeographic history of grey wolves in Europe. BMC evolutionary biology, 10, 104.

Pilot M, Greco C, vonHoldt BM et al. (2014) Genome-wide signatures of population bottlenecks and diversifying selection in European wolves. Heredity, 112, 428–442.

Prado-Martinez J, Hernando-Herraez I, Lorente-Galdos B et al. (2013) The genome sequencing of an albino Western lowland gorilla reveals inbreeding in the wild. BMC genomics, 14, 363.

Price AL, Patterson NJ, Plenge RM et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature genetics, 38, 904–909.

Pritchard JK, Stephens M, Donnelly P (2000) Inference of Population Structure Using Multilocus Genotype Data. Genetics, 155, 945–959.

Purcell S, Neale B, Todd-Brown K et al. (2007) PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. American Journal of Human Genetics, 81, 559–575.

Räikkönen J, Bignert A, Mortensen P, Fernholm B (2006) Congenital defects in a highly inbred wild wolf population (Canis lupus). Mammalian Biology - Zeitschrift für Säugetierkunde, 71, 65–73.

Räikkönen J, Vucetich JA, Peterson RO, Nelson MP (2009) Congenital bone deformities and the inbred wolves (Canis lupus) of Isle Royale. Biological Conservation, 142, 1025–1031.

Räikkönen J, Vucetich JA, Vucetich LM, Peterson RO, Nelson MP (2013) What the Inbred Scandinavian Wolf Population Tells Us about the Nature of Conservation. PloS one, 8, e67218.

Ramírez O, Altet L, Enseñat C et al. (2006) Genetic assessment of the Iberian wolf Canis lupus signatus captive breeding program. Conservation Genetics, 7, 861–878.

Randi E (2008) Detecting hybridization between wild species and their domesticated relatives. Molecular ecology, 17, 285–293.

Randi E (2011) Genetics and conservation of wolves Canis lupus in Europe. Review, 41, 99–111.

Randi E, Hulva P, Fabbri E et al. (2014) Multilocus detection of wolf x dog hybridization in Italy, and guidelines for marker selection. PloS one, 9, e86409.

Randi E, Lucchini V (2002) Detecting rare introgression of domestic dog genes into wild wolf (Canis lupus) populations by Bayesian admixture analyses of microsatellite variation. Conservation Biology, 3, 31–45.

Randi E, Lucchini V, Christensen MF et al. (2000) Mitochondrial DNA Variability in Italian and East European Wolves: Detecting the Consequences of Small Population Size and Hybridization. Conservation Biology, 14, 464–473.

Rhymer JM, Simberloff D (1996) Extinction by hybridization and introgression. Annual Review of Ecology and Systematics, 27, 83–109.

Roy M, Geffen E, Smith D, Ostrander E, Wayne R (1994) Patterns of differentiation and hybridization in North American wolflike canids, revealed by analysis of microsatellite loci. Molecular biology and evolution, 11, 553–570.

Sastre N (2011) Genética de la conservación: el lobo gris (Canis lupus). PhD thesis, Universidad Autónoma de Barcelona: Spain.

Sastre N, Vilà C, Salinas M et al. (2010) Signatures of demographic bottlenecks in European wolf populations. Conservation Genetics, 12, 701–712.

Sidorovich VE, Tikhomirova LL, Jedrzejewska B (2003) Wolf Canis lupus numbers, diet and damage to livestock in relation to hunting and ungulate abundance in northeastern Belarus during 1990–2000. Wildlife Biol, 9, 103–111.

Silva JP, Toland J, Hudson T et al. (2013) LIFE and human coexistence with large carnivores (The EU LIFE Programme - European Commission, Ed,). DG Environment.

Sundqvist A (2008) Conservation Genetics of Wolves and their Relationship with Dogs. PhD thesis, Uppsala University: Sweden.

Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics, 105, 437–460.

Tallmon DA, Luikart G, Waples RS (2004) The alluring simplicity and complex reality of genetic rescue. Trends in ecology & evolution, 19, 489–96.

Thalmann O, Shapiro B, Cui P et al. (2013) Complete mitochondrial genomes of ancient canids suggest a European origin of domestic dogs. Science, 342, 871–874.

Twyford AD, Ennos RA (2012) Next-generation hybridization and introgression. Heredity, 108, 179–189.

Valverde JA (1971) El lobo español. Montes, 159, 228–241.

Verardi A, Lucchini V, Randi E (2006) Detecting introgressive hybridization between free-ranging domestic dogs and wild wolves (Canis lupus) by admixture linkage disequilibrium analysis. Molecular ecology, 15, 2845–2855.

Vilà C (1993) Aspectos morfológicos y ecológicos del lobo ibérico Canis lupus. PhD thesis, Universidad de Barcelona: Spain.

Vilà C (2010) Viabilidad de las poblaciones ibéricas de lobos. Enseñanzas de la genética para la conservación. In: Los lobos de la Península Ibérica. Propuestas para el diagnóstico de sus poblaciones. (eds Fernández-Gil A, Álvares F, Vilà C, Ordiz A), pp. 157–171. ASCEL, Palencia, Spain.

Vilà C, Amorim IR, Leonard JA et al. (1999) Mitochondrial DNA phylogeography and population history of the grey wolf Canis lupus. Molecular Ecology, 8, 2089–2103.

Vilà C, Sundqvist A-K, Flagstad Ø et al. (2003a) Rescue of a severely bottlenecked wolf (Canis lupus) population by a single immigrant. Proceedings. Biological sciences / The Royal Society, 270, 91–97.

Vilà C, Walker C, Sundqvist A-K et al. (2003b) Combined use of maternal, paternal and bi-parental genetic markers for the identification of wolf-dog hybrids. Heredity, 90, 17–24.

Vilà C, Wayne RK (1999) Hybridization between Wolves and Dogs. Conservation Biology, 13, 195–198.

vonHoldt BM, Pollinger JP, Earl DA et al. (2011) A genome-wide perspective on the evolutionary history of enigmatic wolf-like canids. Genome Research, 21, 1294–1305.

vonHoldt BM, Pollinger JP, Lohmueller KE et al. (2010) Genome-wide SNP and haplotype analyses reveal a rich history underlying dog domestication. Nature, 464, 898–902.

Vos J (2000) Food habits and livestock depredation of two Iberian wolf packs (Canis lupus signatus) in the north of Portugal. Journal of Zoology, 251, 457–462.

Wang G, Zhai W, Yang H et al. (2013) The genomics of selection in dogs and the parallel evolution between dogs and humans. Nature communications, 4, 1860.

Wayne RK, Van Valkenburgh B, Kat PW et al. (1989) Genetic and Morphological Divergence among Sympatric Canids. J. Hered., 80, 447–454.

Wright S (1977) Evolution and the genetics of populations. Vol. 3. Evolution and the genetics of populations. Univ. of Chicago Press, Chicago, IL.

Wright LI, Tregenza T, Hosken DJ (2007) Inbreeding, inbreeding depression and extinction. Conservation Genetics, 9, 833–843.


APPENDIX 1 Bioinformatics’ discussion

In this appendix, I present the pipeline diagrams for the analyses performed and discuss specific bioinformatics issues.

• Pipelines

Diagrams for the three main analyses done: mapping and variant calling, diversity analysis and hybridization analysis.

[Legend: symbol meanings in the pipeline diagrams]

[Pipeline diagram: Mapping and variant calling]

[Pipeline diagram: Diversity analysis]

[Pipeline diagram: Hybridization analysis]

• Bioinformatics’ discussion

SNP validation

Due to the small dataset used, I validated the calls following the GATK Best Practices recommendations [1], performing a hard filtering. The variants that pass the filters have:

• Quality by depth higher than 2.
• Root Mean Square (RMS) of the mapping quality higher than 40.
• Phred-scaled p-value using Fisher’s Exact Test to detect strand bias lower than 60.
• Consistency of the site with two segregating haplotypes (haplotype score) lower than 13.
• u-based z-approximation from the Mann-Whitney Rank Sum Test for mapping qualities higher than −12.5.
• u-based z-approximation from the Mann-Whitney Rank Sum Test for the distance from the end of the read for reads with the alternate allele higher than −8.
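As an illustration, the hard filter can be written as a single predicate over the INFO annotations of a SNP. This is a sketch, not the command used in this work: it takes the thresholds of the generic GATK hard-filtering recommendation for SNPs, in which the two rank-sum thresholds are negative (−12.5 and −8.0), assumes the standard GATK annotation names (QD, MQ, FS, HaplotypeScore, MQRankSum, ReadPosRankSum), and treats a missing rank-sum annotation as passing, which is an assumption.

```python
def passes_hard_filter(info):
    """Return True if a SNP passes the hard filter.

    info: dict mapping annotation names to float values for one site.
    Boundary values are treated as passing (an assumption of this sketch).
    """
    checks = [
        info["QD"] >= 2.0,                 # quality by depth
        info["MQ"] >= 40.0,                # RMS mapping quality
        info["FS"] <= 60.0,                # Fisher strand bias (Phred-scaled)
        info["HaplotypeScore"] <= 13.0,    # consistency with two haplotypes
        # Rank-sum annotations may be absent; a missing value counts as 0.0 here.
        info.get("MQRankSum", 0.0) >= -12.5,
        info.get("ReadPosRankSum", 0.0) >= -8.0,
    ]
    return all(checks)


# Hypothetical annotation values for two sites.
good_site = {"QD": 18.3, "MQ": 58.0, "FS": 1.2, "HaplotypeScore": 0.9,
             "MQRankSum": -0.4, "ReadPosRankSum": 0.7}
bad_site = {"QD": 1.1, "MQ": 35.0, "FS": 75.0, "HaplotypeScore": 20.0}

print(passes_hard_filter(good_site), passes_hard_filter(bad_site))
```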

Format conversion

In the analysis of the dataset, I used various published programs whose formats are different. Although some open programs and scripts deal with this problem (for example, VCFtools [2]), I wrote some scripts to have better control over the characteristics of the final dataset (https://github.com/magicDGS/bioConvert). For instance, a VCF to TPED/TMAP converter written in Python (vcf2tplink.py) removes non-biallelic SNPs and/or SNPs without GT information.
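The kind of conversion described here can be sketched as follows. This is not the code of vcf2tplink.py: the function name is hypothetical, and the handling choices (dropping lines whose FORMAT lacks GT, writing missing genotypes as "0 0", setting the genetic distance to 0, and building an identifier from chromosome and position when the ID is ".") are assumptions made for the example.

```python
def vcf_line_to_tped(line):
    """Convert one VCF data line to a TPED line, or return None if it is filtered out."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None  # no FORMAT/sample columns
    chrom, pos, snp_id, ref, alt = fields[:5]
    # Keep only biallelic SNPs: single-base REF and a single, single-base ALT.
    if len(ref) != 1 or len(alt) != 1 or alt == ".":
        return None
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return None  # no genotype information
    gt_index = fmt.index("GT")
    alleles = {"0": ref, "1": alt}
    genotypes = []
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index].replace("|", "/").split("/")
        if len(gt) != 2:
            gt = [".", "."]  # haploid or malformed call: treated as missing
        # Unknown or missing alleles are written as 0.
        genotypes.extend(alleles.get(allele, "0") for allele in gt)
    if snp_id == ".":
        snp_id = f"{chrom}:{pos}"
    # TPED columns: chromosome, SNP id, genetic distance, position, genotypes.
    return " ".join([chrom, snp_id, "0", pos] + genotypes)


example = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:10\t1|1:8\t./.:0"
print(vcf_line_to_tped(example))
```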

Analytical scripts

The custom Perl and Python scripts used in the analyses of this work are not available in any repository because they are still being optimized. In addition, R analyses were done using the plyr³ and reshape2⁴ packages to manage the data and ggplot2⁵ to visualize the results. Because I explored the data interactively through the command-line interface, no R scripts were written.


References

1. Auwera, G. A. Van Der et al. in Curr. Protoc. Bioinforma. (Bateman, A., Pearson, W. R., Stein, L. D., Stormo, G. D. & Yates, J. R.) 11.10.1–11.10.33 (2013).

2. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–8 (2011).

3. Wickham, H. The Split-Apply-Combine Strategy for Data Analysis. J. Stat. Softw. 40, 1–29 (2011).

4. Wickham, H. Reshaping Data with the reshape Package. J. Stat. Softw. 21, 1–20 (2007).

5. Wickham, H. ggplot2: elegant graphics for data analysis. (Springer New York, 2009).


APPENDIX 2 Results for non-Iberian samples

Table A2.1 shows the individual results and Table A2.2 the means for each population. Means are computed separately for dogs, Eurasian wolves (excluding Wolf Italy), North American wolves and South American wolves.

Table A2.1 Individual results

Sample                       Species/Region  Population                    Cov    Het         FROH
Wolf Croatia                 Eurasian Wolf   Central/Eastern Europe         6.98  0.00147319  0.09
Wolf China                   Eurasian Wolf   Middle Eastern Europe/Asia    26.36  0.00148438  0.23
Wolf India                   Eurasian Wolf   Middle Eastern Europe/Asia    24.90  0.00181422  0.01
Wolf Iran                    Eurasian Wolf   Middle Eastern Europe/Asia    26.27  0.00178093  0.03
Wolf Israel                  Eurasian Wolf   Middle Eastern Europe/Asia     6.01  0.00150744  0.05
Wolf Italy                   Eurasian Wolf   Italy                          5.81  0.00032140  0.51
Airedale Terrier             Dog             Modern breed                   7.33  0.00064358  0.44
Basenji                      Dog             Modern breed                   1.35  0.00068557  0.34
Boxer                        Dog             Modern breed                  29.33  0.00066418  0.41
Chinese Crested              Dog             Modern breed                  19.17  0.00076284  0.41
Chinook                      Dog             Modern breed                   7.84  0.00080670  0.39
English Cocker Spaniel       Dog             Modern breed                   9.66  0.00104400  0.25
Kerry Blue Terrier           Dog             Modern breed                  15.83  0.00068793  0.44
Labrador Retriever           Dog             Modern breed                  10.80  0.00110546  0.20
Miniature Schnauzer          Dog             Modern breed                   5.47  0.00076737  0.32
Soft Coated Wheaten Terrier  Dog             Modern breed                  17.18  0.00070319  0.41
Standard Poodle              Dog             Modern breed                  12.63  0.00101575  0.28
Wolf Great Lakes             American Wolf   North America                 24.34  0.00183124  0.08
Wolf Yellowstone A           American Wolf   North America                 25.73  0.00154630  0.18
Wolf Yellowstone B           American Wolf   North America                 24.07  0.00158641  0.13
Wolf Yellowstone C           American Wolf   North America                  5.41  0.00148466  0.09
Wolf Mexico A                American Wolf   South America                 23.59  0.00003753  0.70
Wolf Mexico B                American Wolf   South America                  5.23  0.00012047  0.70

Cov: autosomal coverage; Het: heterozygosity (het/bp); FROH: inbreeding coefficient


Table A2.2. Population results

Population                            N   Mean Het    Mean FROH
Eurasian Wolf (excluding Wolf Italy)   5  0.00161203  0.08
Dogs                                  11  0.00080787  0.35
North American Wolf                    4  0.00161215  0.12
South American Wolf                    2  0.00007900  0.70

Het: mean heterozygosity (het/bp); FROH: inbreeding coefficient
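The population means can be recomputed from the individual values of Table A2.1 with a few lines of Python. The sketch below is only a worked check of the arithmetic: the dictionary layout and the function name are hypothetical, and the grouping follows the text of this appendix (Wolf Italy is left out of the Eurasian mean).

```python
from statistics import mean

# (heterozygosity in het/bp, FROH) per sample, copied from Table A2.1.
INDIVIDUALS = {
    "Eurasian wolves (excluding Wolf Italy)": [
        (0.00147319, 0.09), (0.00148438, 0.23), (0.00181422, 0.01),
        (0.00178093, 0.03), (0.00150744, 0.05),
    ],
    "Dogs": [
        (0.00064358, 0.44), (0.00068557, 0.34), (0.00066418, 0.41),
        (0.00076284, 0.41), (0.00080670, 0.39), (0.00104400, 0.25),
        (0.00068793, 0.44), (0.00110546, 0.20), (0.00076737, 0.32),
        (0.00070319, 0.41), (0.00101575, 0.28),
    ],
    "North American wolves": [
        (0.00183124, 0.08), (0.00154630, 0.18),
        (0.00158641, 0.13), (0.00148466, 0.09),
    ],
    "South American wolves": [(0.00003753, 0.70), (0.00012047, 0.70)],
}


def population_means(samples):
    """Return (mean heterozygosity, mean FROH) rounded as in Table A2.2."""
    het = mean(h for h, _ in samples)
    froh = mean(f for _, f in samples)
    return round(het, 8), round(froh, 2)


if __name__ == "__main__":
    for population, samples in INDIVIDUALS.items():
        het, froh = population_means(samples)
        print(f"{population}\t{len(samples)}\t{het:.8f}\t{froh:.2f}")
```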


APPENDIX 3 Heterozygosity by chromosome

Heterozygosity of each sample in 1 Mb windows with 200 kb overlap. Red dots mark windows below 0.0005 heterozygous sites per base pair (inbred windows).
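The window computation can be sketched as follows. This is an illustration under stated assumptions, not the script used for the plots: the function name is hypothetical, positions are taken as 0-based, consecutive 1 Mb windows are assumed to share 200 kb (a step of 800 kb), only complete windows are reported, and the denominator is the full window length, which ignores uncallable bases.

```python
from bisect import bisect_left


def window_heterozygosity(het_positions, chrom_length,
                          window=1_000_000, overlap=200_000, cutoff=0.0005):
    """Return (start, end, het_per_bp, is_inbred) for each complete window.

    het_positions: 0-based positions of heterozygous sites on one chromosome.
    Windows are half-open intervals [start, end).
    """
    step = window - overlap  # assumption: 200 kb shared between neighbours
    positions = sorted(het_positions)
    results = []
    for start in range(0, chrom_length - window + 1, step):
        end = start + window
        # Count heterozygous sites with start <= position < end.
        n_het = bisect_left(positions, end) - bisect_left(positions, start)
        het = n_het / window
        results.append((start, end, het, het < cutoff))
    return results


if __name__ == "__main__":
    # Toy chromosome: one heterozygous site every 1 kb in the first megabase only.
    sites = list(range(0, 1_000_000, 1000))
    for row in window_heterozygosity(sites, chrom_length=2_600_000):
        print(row)
```

If the original windows instead advanced in 200 kb steps, the same function applies with `overlap=800_000`.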


APPENDIX 4 Heterozygosity distribution for non-Iberian samples

Density and box plots of heterozygosity in dogs, Eurasian wolves and American wolves, using 1 Mb windows with 200 kb overlap. Dotted lines mark the cutoff used to define an inbred window.

➢ Dogs

➢ Eurasian wolves

➢ American wolves

APPENDIX 5 Principal components’ boxplots and PCA with component 4

In the 48K-merged dataset, PC1 shows the differentiation between wolves and dogs, whereas PC2 represents the geographical variation of wolves. In the dataset from this work, PC1 also shows the differentiation between wolves and dogs, whereas PC2 clusters American wolves together. The geographical differentiation of Eurasian wolves is explained by PC4. The PCAs plotted below (PC1 against PC4) include the samples from this work, with and without the dog blocks of Sierra Morena and Wolf Spain.

➢ 48K-merged dataset

➢ Dataset from this work

▪ PCA with dog blocks

▪ PCA without dog blocks

APPENDIX 6 Cross-validation error of the ADMIXTURE analysis

Mean and standard deviation of the cross-validation error for the 5-run ADMIXTURE analysis of the dataset from this work and the 3-run analysis of the 48K-merged dataset. Note that the cross-validation error varies more in the dataset from this work.


APPENDIX 7 Linear model details of heterozygosity-percentage block analysis

Summary statistics (Table A7.1) for each linear model shown in Figure 10, followed by the residual, Q-Q and leverage plots.

Table A7.1. Summary statistics table.

Sample         Haplotype  Slope    p-value  Adj. R2
Sierra Morena  Dog/Dog    -112.40  0.0657    0.0657
Sierra Morena  Wolf/Dog    332.32  0.0000    0.7331
Sierra Morena  Wolf/Wolf  -219.92  0.0049    0.1774
Wolf Spain     Dog/Dog      16.21  0.0252    0.1074
Wolf Spain     Wolf/Dog    443.56  0.0002    0.3029
Wolf Spain     Wolf/Wolf  -459.77  0.0002    0.2968
Wolf Portugal  Dog/Dog     -17.37  0.0000    0.4639
Wolf Portugal  Wolf/Dog     29.95  0.0000    0.3534
Wolf Portugal  Wolf/Wolf   -12.58  0.0942    0.0502
Wolf EEP       Dog/Dog      -7.19  0.2841    0.0049
Wolf EEP       Wolf/Dog     18.03  0.1870    0.0213
Wolf EEP       Wolf/Wolf   -10.84  0.5100   -0.0153

Haplotype: haplotype class in both chromosomes; Slope: estimation for the slope (heterozygosity in het/bp); Adj. R2: adjusted R2 for the complete model
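For readers unfamiliar with the columns of Table A7.1, the sketch below shows how a slope and an adjusted R2 are obtained for a linear model with one predictor. It is a generic illustration: the function name and the toy data are hypothetical, the variables of the actual models are those of Figure 10, and the p-value is omitted because it requires the t distribution.

```python
def simple_linear_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r2, adjusted_r2). With one predictor,
    adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - 2).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - ss_res / syy
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, intercept, r2, adjusted_r2


if __name__ == "__main__":
    # Toy data, not taken from this work.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    print(simple_linear_fit(x, y))
```

The adjustment penalizes the small number of points, which is why the adjusted R2 can be slightly negative for a model that explains almost nothing, as in the last row of the table.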

▪ Dog/Dog haplotypes Sierra Morena

▪ Wolf/Dog haplotypes Sierra Morena

▪ Wolf/Wolf haplotypes Sierra Morena

▪ Dog/Dog haplotypes Wolf Spain

▪ Wolf/Dog haplotypes Wolf Spain

▪ Wolf/Wolf haplotypes Wolf Spain

▪ Dog/Dog haplotypes Wolf Portugal

▪ Wolf/Dog haplotypes Wolf Portugal

▪ Wolf/Wolf haplotypes Wolf Portugal

▪ Dog/Dog haplotypes Wolf EEP

▪ Wolf/Dog haplotypes Wolf EEP

▪ Wolf/Wolf haplotypes Wolf EEP