
Loxodonta africana distribution across African Database Input Zones

Hyeon Jeong Kim and Samuel K Wasser
Center for Conservation Biology, Department of Biology, University of Washington

March 15, 2019

Executive summary

The aim of this project is to identify the distribution of savannah, forest, and hybrid elephant populations within the input zones and range boundaries of the IUCN/SSC African Elephant Specialist Group's African Elephant Database (AED), using the Center for Conservation Biology's genetic information.

We selected for analysis a total of 2292 geo-referenced samples with genetic information at a minimum of 10 out of 16 microsatellite loci, but always including two highly subspecies discriminating loci. Of these samples, 1432, 519, and 171 samples were respectively identified as savannah, forest, or hybrid samples. The remainder of the samples did not meet our stringent criteria for subspecies designation.

The samples with subspecies status were found in 106 out of 411 AED input zones and 117 out of 975 AED range boundaries. The 106 input zones were distributed into 57 savannah, 34 forest, 4 hybrid, and 11 mixed population input zones.

To identify the subspecies status of the remaining 305 input zones, the data were analyzed using a k-nearest neighbor approach and a spatial population genetic analysis. A total of 96 and 129 input zones were respectively found to have only savannah or forest samples within 300 km of the polygon. Thirty-one of the remaining input zones had a mix of savannah, forest, and/or hybrid samples, whereas 32 input zones did not have samples within 300 km of the polygon boundary. Spatial population genetic analysis using the genetic information of the geo-referenced samples produced a map of genetic ancestry across the African continent, from which the subspecies status of the unknown input zones can be estimated.

METHOD

Reference Sample Subspecies Identification

Sample collection

Elephant samples used in this project were collected between 2000 and 2018 as part of the University of Washington Center for Conservation Biology (CCB) African elephant forensic database, established in 2004 (Wasser et al. 2004, 2015). Samples from a hybrid assessment project (Mondol et al. 2015) conducted by the CCB were also included. Samples in the reference database consist of fecal, blood and hair samples. Whenever possible, consecutive samples were collected at distances ≥1 km apart to minimize the chance of obtaining multiple samples from the same family group. Latitude/longitude coordinates for each sample or each batch of samples were recorded at the time of collection.

Genetic analysis and data filtering

DNA was extracted from each sample and genotyped at 16 di-nucleotide microsatellite loci following the methods of Wasser et al. (2004). All samples were extracted in duplicate and each extract was amplified 2-3 times per locus, using a multiple tubes approach to minimize allelic drop-out (i.e., missing alleles due to DNA amplification failure at a given locus). Stringent filter criteria were applied to the dataset: only samples with accurate geographic information and genetic data at a minimum of 10 out of 16 loci, always including the two loci with high subspecies-differentiating power (FH71 and SO4), were retained.
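The filter criteria can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the data layout (a dictionary mapping each locus name to a pair of alleles, or None when the locus failed to amplify) and the function name passes_filter are assumptions. Only the 10-of-16 threshold and the locus names FH71 and SO4 come from the text.

```python
# Hypothetical sketch of the sample filter; the data layout is an assumption.
REQUIRED_LOCI = ("FH71", "SO4")  # subspecies-discriminating loci named in the text
MIN_TYPED_LOCI = 10              # minimum number of typed loci out of 16


def passes_filter(genotype, has_coordinates):
    """Return True if a sample meets the geographic and genetic criteria.

    genotype maps locus name -> (allele1, allele2), or None if untyped.
    """
    if not has_coordinates:
        return False
    typed = [locus for locus, alleles in genotype.items() if alleles is not None]
    if len(typed) < MIN_TYPED_LOCI:
        return False
    # Both discriminating loci must be typed, whatever the total count is.
    return all(genotype.get(locus) is not None for locus in REQUIRED_LOCI)
```

A sample typed at 14 loci but missing FH71 would be rejected under this rule, whereas a sample typed at exactly 10 loci including both FH71 and SO4 would be kept.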

The subspecies status of each sample was identified following the methods of Mondol et al. (2015) using EBhybrids v. 0.991, a program written specifically for elephant subspecies and hybrid identification (Mondol et al. 2015; available at https://github.com/stephenslab/EBhybrids). EBhybrids uses allelic drop-out rates for each subspecies and the ancestry proportions of each sample to calculate the posterior probability that a given sample is a pure forest elephant, a pure savannah elephant, or a hybrid between the two subspecies, including whether the hybrid is F1 generation, F2 generation, or backcrossed to either a savannah or forest elephant.

The allelic drop-out rates were calculated using MicroDrop version 1.01 (Wang et al. 2012). The ancestry proportions were estimated with two programs: STRUCTURE v. 2.3.4 (Pritchard et al. 2000; Falush et al. 2003; 2007; Hubisz et al. 2009), with results compiled using CLUMPP v. 1.1.2 (Jakobsson & Rosenberg 2007), and TESS3 implemented in R (Chen et al. 2007; Caye et al. 2015). Samples retained for further analysis were those identified as forest, savannah, or hybrid with > 0.95 posterior probability of being in the respective subgroup in the EBhybrids analysis, based on both STRUCTURE and TESS3.
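The retention rule can be illustrated with a short Python sketch. It assumes, hypothetically, that the EBhybrids posterior probabilities from each run have already been collapsed into three categories (savannah, forest, hybrid) and stored as dictionaries. The function name and data layout are illustrative; only the 0.95 cutoff and the requirement that both the STRUCTURE-based and TESS3-based runs agree come from the text.

```python
THRESHOLD = 0.95  # posterior probability cutoff stated in the text


def assign_subspecies(post_structure, post_tess3, threshold=THRESHOLD):
    """Return the category supported by both runs, or None if unassigned.

    Each argument maps category -> posterior probability from one EBhybrids
    run (one using STRUCTURE ancestry, one using TESS3 ancestry).
    """
    best_s = max(post_structure, key=post_structure.get)
    best_t = max(post_tess3, key=post_tess3.get)
    if best_s != best_t:
        return None  # the two runs disagree on the most probable category
    if post_structure[best_s] > threshold and post_tess3[best_t] > threshold:
        return best_s
    return None  # same category, but at least one run is below the cutoff
```

Samples for which the function returns None correspond to those excluded from subsequent analysis because they did not meet the criteria for subspecies designation.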

Objective 1 Spatial analysis

The African Elephant Database (AED) includes 411 input zones in 37 countries and the AED range layer consists of 975 polygons divided into three occurrence categories: doubtful, possible, and known. The reference samples were merged with the input zones and range layers to identify the number of samples of each subspecies in each of the polygons. Spatial analysis was conducted using GeoPandas in Python 3.6.5 and all other data manipulation was conducted in R version 3.5.1 using the tidyverse packages.
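The idea behind the spatial merge can be sketched with the Python standard library alone. The report used GeoPandas. The sketch below swaps in a plain ray-casting point-in-polygon test so that it runs without dependencies. It assumes simple polygons without holes, non-overlapping zones, and coordinates treated as planar. The function names and data layout are illustrative.

```python
from collections import Counter, defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices, no holes."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_samples_per_zone(samples, zones):
    """samples: list of (lon, lat, subspecies); zones: dict name -> polygon.

    Returns (counts, outside): counts maps zone name -> Counter of
    subspecies; outside is the number of samples that fall in no zone.
    """
    counts = defaultdict(Counter)
    outside = 0
    for lon, lat, subspecies in samples:
        for name, polygon in zones.items():
            if point_in_polygon(lon, lat, polygon):
                counts[name][subspecies] += 1
                break  # assumes zones do not overlap
        else:
            outside += 1
    return dict(counts), outside
```

The same function applies to the range layer by passing the range polygons in place of the input zones.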

Objective 2 K-nearest-neighbor algorithm

A k-nearest-neighbor analysis was conducted to identify the most likely subspecies to be present in each of the input zones that had no overlap with reference samples. The 20 closest samples within 300 km of the input zone were identified, using only samples with unique locations to maximize the number of samples identified as nearest neighbors. This was conducted using a k-nearest neighbor algorithm implemented in the R package nngeo. The number of samples of each subspecies and the average distance to each subspecies were calculated.
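The neighbor search can be sketched in Python as follows. The analysis itself was run with the R package nngeo. This sketch makes two simplifying assumptions that are not from the report: each input zone is represented by a single (lat, lon) point rather than by its polygon, and distances are great-circle (haversine) distances on a sphere of radius 6371 km. The defaults k = 20 and 300 km come from the text.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_neighbor_summary(zone_point, samples, k=20, max_km=300.0):
    """Summarize the k closest samples within max_km of a zone.

    zone_point: (lat, lon) standing in for the zone polygon (assumption).
    samples: list of (lat, lon, subspecies) at unique locations.
    Returns dict subspecies -> (count, mean distance in km).
    """
    lat0, lon0 = zone_point
    dists = [(haversine_km(lat0, lon0, lat, lon), sub) for lat, lon, sub in samples]
    # Keep samples inside the search radius, then take the k closest.
    nearest = sorted(d for d in dists if d[0] <= max_km)[:k]
    grouped = defaultdict(list)
    for dist, sub in nearest:
        grouped[sub].append(dist)
    return {sub: (len(ds), sum(ds) / len(ds)) for sub, ds in grouped.items()}
```

An empty result means that no sample lies within 300 km of the zone, so the nearest-neighbor approach cannot inform that zone.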

Spatial inference based on genetic data

The software TESS3 uses spatial and genetic information to assign the ancestry proportion of each reference sample to either savannah or forest subspecies. The ancestry proportions were inferred over geographic space to predict the subspecies present in the unknown input zones. To plot the genetic information over geographic space, a raster file of Africa was downloaded (http://membres-timc.imag.fr/Olivier.Francois/RasterMaps.zip) and cropped to fit the boundaries of the AED Africa base layer.

RESULTS

Reference Sample Subspecies Identification

After filtering for geographic coordinates and the above-mentioned genetic criteria for subspecies assignment, 2292 elephant fecal samples were retained. A total of 2122 samples were identified to subspecies status: 1432 as savannah elephant, 519 as forest elephant, and 171 as hybrids. These numbers represent samples and not unique individuals. The remaining samples were excluded from all subsequent analyses because they did not meet the criteria to be assigned to one of the three categories.

Objective 1: To combine genetic data with input zones to identify the subspecies present in input zones and range boundaries.

Input zone

The African Elephant Database contains 411 input zones in 37 countries. All 2122 samples with clear subspecies status were spatially merged with the input zones to identify the subspecies of elephants present in each input zone (Figure 1).

Figure 1. Reference samples identified to subspecies status are shown (forest = green; savannah = orange, hybrids = blue) with AED input zones (light brown = input zones that contain no samples, brown = input zones that contain samples).

In total, 1821 samples were located inside 106 input zones in 29 countries while 301 samples did not fall inside any input zones. Table 1 shows the number of input zones classified as forest, savannah, hybrid or a combination of the three.

Table 1. Number of input zones classified as each subspecies and the number of countries.

Subspecies    Number of input zones    Number of countries
Savannah      57                       17
Forest        34                       13
Hybrid        4                        3
Mixed         11                       6

Eleven of the 106 input zones included more than one subspecies of elephants (Table 2). The detailed workflow of identifying subspecies status is shown in Figure 2.
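The classification used in Table 1 can be expressed as a small rule: a zone takes the label of its only subspecies, or "mixed" when samples of more than one category fall inside it. The Python sketch below is illustrative. The function name and the dictionary layout are assumptions, not the authors' code.

```python
def classify_zone(sample_counts):
    """Label a zone from its per-subspecies sample counts.

    sample_counts maps "savannah" / "forest" / "hybrid" -> number of samples.
    Returns None for a zone that contains no samples.
    """
    present = sorted(s for s, n in sample_counts.items() if n > 0)
    if not present:
        return None
    if len(present) > 1:
        return "mixed"
    return present[0]
```

For example, a zone with 70 hybrid and 4 savannah samples is labelled "mixed", while a zone with only forest samples is labelled "forest".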

Figure 2. Flow diagram detailing the number of input zones identified to subspecies status at each analysis.


Table 2. List of 11 input zones with samples of more than one subspecies status.

Input zone                                 Subspecies    Number of samples
Garamba Ecosystem                          Forest        12
Garamba Ecosystem                          Hybrid        1
Gourma aerial survey zone                  Hybrid        7
Gourma aerial survey zone                  Savannah      10
Gourma: surrounding area                   Hybrid        1
Gourma: surrounding area                   Savannah      6
Kibale National Park                       Hybrid        70
Kibale National Park                       Savannah      4
Mekrou Hunting Zone                        Forest        3
Mekrou Hunting Zone                        Hybrid        1
Murchison Falls Conservation Area          Hybrid        1
Murchison Falls Conservation Area          Savannah      31
Queen Elizabeth Conservation Area          Forest        1
Queen Elizabeth Conservation Area          Hybrid        31
Queen Elizabeth Conservation Area          Savannah      38
Sudanian Area Hunting Blocks               Hybrid        1
Sudanian Area Hunting Blocks               Savannah      9
Virunga (North & Central) National Park    Hybrid        6
Virunga (North & Central) National Park    Savannah      2
W du Bénin National Park                   Forest        17
W du Bénin National Park                   Hybrid        1
Zakouma National Park                      Forest        1
Zakouma National Park                      Savannah      25

Range layer

A total of 1875 samples were found within a range polygon and 247 samples were found outside any range polygon. The 1875 samples were found within 117 polygons of the range layer (Table 3). The majority of samples, 1804, fell inside a known range polygon, while 25 and 46 samples fell inside possible and doubtful range polygons, respectively (Figure 3).


Table 3. The number of samples found in each category of AED’s range layer.

Range category    Total number of polygons    Polygons with samples    Number of samples
Known             571                         95                       1804
Possible          190                         10                       25
Doubtful          214                         12                       46

Figure 3. The reference samples are shown in blue. The range layer’s known, possible, doubtful polygons are shown in green, yellow, and red, respectively. Polygons with samples found within them are shown in darker green, yellow, and red.

Objective 2: To determine species distribution of savannah, forest, hybrids and remaining unknown populations of African elephants across the input zones and range boundaries using a statistical model

K-nearest neighbor

Out of 305 unknown input zones, 242 input zones matched with samples of a single subspecies within 300 km of that zone, 31 input zones matched with samples of multiple subspecies, and 32 input zones had no nearest neighbors. Of the 242 input zones that matched with a single subspecies, 225 could be identified: 96 as forest and 129 as savannah input zones. The other 17 input zones matched with only a single sample, and therefore their subspecies status could not be determined. The detailed breakdown is shown in Figure 2.
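The decision rule implied by these counts can be sketched in Python. The function name and the outcome labels are illustrative assumptions. The rule itself (no neighbors, multiple subspecies, or a single sample all leave the zone undetermined) follows the text, and the closing assertions check that the reported counts add up.

```python
def knn_outcome(neighbor_counts):
    """neighbor_counts maps subspecies -> number of neighboring samples."""
    present = {s: n for s, n in neighbor_counts.items() if n > 0}
    if not present:
        return "no neighbors"
    if len(present) > 1:
        return "multiple subspecies"
    subspecies, n = next(iter(present.items()))
    if n == 1:
        return "single sample"  # one sample is not enough to assign status
    return subspecies


# Bookkeeping check on the counts reported in the text.
assert 242 + 31 + 32 == 305  # all unknown input zones are accounted for
assert 96 + 129 + 17 == 242  # single-subspecies matches
assert 31 + 32 + 17 == 80    # zones left undetermined after this step
```

The 80 zones that end in one of the three undetermined outcomes are the ones carried forward to the spatial inference step.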

The average distances from an unknown input zone to samples of each subspecies were 147 km for savannah, 139 km for forest and 78.1 km for hybrid samples.

Spatial inference

There were 80 input zones for which subspecies status could not be determined by merging the reference samples or by the k-nearest neighbor approach (highlighted in yellow in Figure 2). A spatially explicit population structure analysis was conducted to estimate the ancestry proportion of a sample coming from each of the 80 input zones. Each sample was estimated to be from either a forest or savannah subspecies with varying levels of admixture between the two subspecies. A value of 1 indicates a pure savannah elephant sample while a value of 0 indicates a pure forest elephant sample (Figure 4). These values were inferred across geographic space and merged with the input zones to predict the subspecies status of each of the input zones.
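One way to turn the interpolated ancestry surface into a per-zone value is to average the raster cells that fall inside each zone. The Python sketch below illustrates that idea only. The report does not state how cell values were summarized or which cutoffs were used, so the averaging step, the function names, and the margin parameter are all assumptions made for illustration.

```python
def zone_ancestry(cell_values):
    """Mean savannah ancestry (0 = forest, 1 = savannah) over a zone's cells.

    cell_values: interpolated ancestry values for raster cells inside the
    zone; None marks cells without a value (e.g. outside the raster).
    """
    values = [v for v in cell_values if v is not None]
    if not values:
        return None
    return sum(values) / len(values)


def label_from_ancestry(mean_value, margin=0.1):
    """Hypothetical labelling rule; margin is NOT a value from the report."""
    if mean_value is None:
        return "unknown"
    if mean_value >= 1.0 - margin:
        return "savannah"
    if mean_value <= margin:
        return "forest"
    return "admixed"
```

Reporting the mean ancestry value alongside any label keeps the underlying gradient visible for zones that sit between the two extremes.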

Figure 4. The distribution of forest and savannah genetic ancestry proportions plotted across the African continent. A: Reference samples identified as forest (green), savannah (orange), or hybrid are plotted on the inferred spatial distribution.

Conclusions

The spatial genetic analysis of African elephant samples shows distinct genetic and spatial separation between the savannah and forest subspecies, with limited hybridization. Savannah elephants are largely restricted to woodland-savannah habitat, whereas forest elephants are largely restricted to forest habitat. Hybrids are clustered at the junction between savannah and forest habitat. However, despite the vast availability of savannah-forest ecotone, hybrids are largely restricted to the borders of eastern DRC, western Uganda and South Sudan, and secondarily along the Mali-Burkina Faso and -Burkina Faso borders. These findings support the suggestion by Mondol et al. (2015) that this restricted geospatial hybrid concentration is largely due to asymmetrical poaching pressure, whereby the subspecies experiencing high poaching pressure flees to a safe haven in a neighboring country despite the habitat change. Mondol et al. (2015) also found that there was no parental sex-bias in the subspecies of these hybrids. Genome-wide studies of extinct and extant elephants by Palkopoulou et al. (2018) also indicate that hybridization between forest and savannah African elephants was extremely rare over their evolutionary history, despite a high overall occurrence of hybridization among Elephantidae as a whole.

REFERENCES

Caye K, Deist TM, Martins H, Michel O, François O. 2015. TESS3: fast inference of spatial population structure and genome scans for selection. Molecular Ecology Resources 16:540–548.

Caye K, François O. 2016. tess3r: Inference of Spatial Population Genetic Structure. R package version 1.1.0.

Chen C, Durand E, Forbes F, François O. 2007. Bayesian clustering algorithms ascertaining spatial population structure: a new computer program and a comparison study. Molecular Ecology Notes 7:747–756.

Dorman M. 2018. nngeo: k-Nearest Neighbor Join for Spatial Data. R package version 0.2.4. https://CRAN.R-project.org/package=nngeo

Falush D, Stephens M, Pritchard JK. 2003. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164:1567–1587.

Falush D, Stephens M, Pritchard JK. 2007. Inference of population structure using multilocus genotype data: dominant markers and null alleles. Molecular Ecology Notes 7:574–578.

Hubisz MJ, Falush D, Stephens M, Pritchard JK. 2009. Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources 9:1322–1332.

Jakobsson M, Rosenberg NA. 2007. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23:1801–1806.

Mondol S, Moltke I, Hart J, Keigwin M, Brown L, Stephens M, Wasser SK. 2015. New evidence for hybrid zones of forest and savanna elephants in Central and West Africa. Molecular Ecology 24:6134–6147.

Palkopoulou E, et al. 2018. A comprehensive genomic history of extinct and living elephants. Proceedings of the National Academy of Sciences 115:E2566–E2574. https://doi.org/10.1073/pnas.1720554115

Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945–959.

R Core Team. 2018. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

Wang C, Schroeder KB, Rosenberg NA. 2012. A maximum-likelihood method to correct for allelic dropout in microsatellite data with no replicate genotypes. Genetics 192:651–669.

Wasser SK, Mailand C, Mondol S, Clark W, Laurie C, Weir BS, Brown L. 2015. Genetic assignment of large seizures of elephant ivory reveals Africa's major poaching hotspots. Science 349:84–87.

Wasser SK, Shedlock AM, Comstock K, Ostrander EA, Mutayoba B, Stephens M. 2004. Assigning African elephant DNA to geographic region of origin: applications to the ivory trade. Proceedings of the National Academy of Sciences 101:14847–14852.

Wickham H. 2017. tidyverse: Easily Install and Load the 'Tidyverse'. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse

SUPPLEMENTARY FILES

Spatial data files
1. Shapefile of elephant samples: “reference-samples.shp”
2. ASCII raster file of estimated elephant distribution: “distribution-raster.ascii”

Reference files
1. Bibliography as a Zotero RDF file: “elephant-library.rdf”
2. Bibliography as a BibTeX file: “elephant-library.bib”

Dataframes
1. Dataframe of input zones merged with samples and their subspecies designation: “sj_id_106_iz.csv”
2. Dataframe of k-nearest-neighbor analysis results: “knn_d_273_iz.csv”
3. Dataframe of 80 unknown input zones with genetic ancestry results: “ts_id_80_iz.csv”
