HOST-PARASITE INTERACTIONS: COMPARATIVE ANALYSES OF POPULATION GENOMICS, DISEASE-ASSOCIATED GENOMIC REGIONS, AND HOST USE

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

By

SARA ROSE SEIBERT B.S., The Ohio State University, 2010 M.S., Wright State University, 2014

2020 Wright State University

WRIGHT STATE UNIVERSITY

GRADUATE SCHOOL

April 22, 2020

I HEREBY RECOMMEND THAT THE DISSERTATION PREPARED UNDER MY SUPERVISION BY Sara Rose Seibert ENTITLED Host-parasite interactions: comparative analyses of population genomics, disease-associated genomic regions, and host use BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy.

______Jeffrey L. Peters, Ph.D. Dissertation Director

______Mill W. Miller, Ph.D. Director, Biomedical Sciences Ph.D. Program

______Barry Milligan, Ph.D. Interim Dean of the Graduate School

Committee on Final Examination:

______Paula Bubulya, Ph.D.

______Quan Zhong, Ph.D.

______Michael Leffak, Ph.D.

______Oleg Paliy, Ph.D.

______Philip Lavretsky, Ph.D. (non-voting)

ABSTRACT

Seibert, Sara Rose Ph.D., Biomedical Sciences Ph.D. Program, Wright State University, 2020. Host-parasite interactions: comparative analyses of population genomics, disease-associated genomic regions, and host use

Birds are major vectors and reservoir hosts of pathogens across our globe. Pathogens can be spread through the dispersal and seasonal movements of migratory and resident wild birds. Although biogeographic barriers have the potential to restrict gene flow in wild birds completely or in part, variation in ecological drivers of dispersal likely influences the effectiveness of these barriers among different species. This dissertation seeks to enhance our knowledge of Southern Hemisphere waterfowl species’ dispersal, population genomics, and disease ecology using population genomic techniques (double-digest restriction-site associated DNA sequencing (ddRADseq)) and multivariate statistics. First, we investigated the roles of dispersal behavior and biogeographic barriers in genetically structuring populations of four species of Australasian waterfowl. Additionally, we explored host-parasite theory by comparing the genetic structure of a southern African waterfowl population and its Plasmodium parasite. Next, we identified genomic regions associated with Plasmodium infection in waterfowl using traditional and novel computational methods. Finally, we performed a meta-analysis to create a linear model to explain variation in Plasmodium infection rates in birds using host demographic variables on a global scale. The knowledge gained from this dissertation benefits research in the public health, agriculture, and wildlife management fields.

TABLE OF CONTENTS

Foreword
Chapter 1: Southern Hemisphere waterfowl population genomics and phylogeography
    Introduction
    Methods
    Results
    Discussion
    Conclusions
    Appendix 1
Chapter 2: Host-parasite population genomics
    Introduction
    Methods
    Results
    Discussion
Chapter 3: Identification of genomic regions associated with Plasmodium infection
    Introduction
    Methods
    Results
    Discussion
    Appendix 3
Chapter 4: Host demographic variables associated with avian Plasmodium
    Introduction
    Methods
    Results
    Discussion
    Appendix 4
Concluding Remarks
References

LIST OF FIGURES

FIGURE

Chapter 1

1.1 Biogeographical barriers in northern and southern Papua New Guinea
1.2 Radjah
1.3 Wandering Whistling-Duck
1.4 Green Pygmy-Goose
1.5 Pacific Black Duck

Appendix 1

S2 Double-digest Restriction site Associated DNA sequencing library preparation protocol
S3 Double-digest Restriction site Associated DNA sequencing bioinformatic pipeline
S4.1 Heat map for kinship and inbreeding coefficients in Radjah shelduck
S4.2 Heat map for kinship and inbreeding coefficients in Wandering Whistling-Duck
S4.3 Heat map for kinship and inbreeding coefficients in Green Pygmy-Goose
S4.4 Heat map for kinship and inbreeding coefficients in Pacific Black Duck
S5 Principal Component Analysis R script
S6.1 Principal component analysis for Radjah shelduck with sibling groups
S6.2 Principal component analysis for Wandering Whistling-Duck with sibling groups
S7 Non-metric multidimensional scaling R script
S10.1 Non-metric multidimensional scaling plot for Radjah shelduck
S10.2 Non-metric multidimensional scaling plot for Wandering Whistling-Duck
S10.3 Non-metric multidimensional scaling plot for Green Pygmy-Goose
S10.4 Non-metric multidimensional scaling plot for Pacific Black Duck

Chapter 2

2.1 Range map of Red-billed Duck
2.2 Principal component analysis of Red-billed Duck
2.3 Maximum likelihood phylogeny of Red-billed Duck nuclear loci
2.4 Maximum likelihood phylogeny of Red-billed Duck mitochondrial control region
2.5 Maximum likelihood phylogeny of Plasmodium cytochrome b gene

Chapter 3

3.1 Host-parasite disease triangle
3.2 ΦST distribution plot for Red-billed Duck
3.3 Principal component analysis of Red-billed Duck Plasmodium infected vs. non-infected
3.4 BayeScan plot for Red-billed Duck for Plasmodium infected vs. non-infected
3.5 Random Forest variable (allele) importance plot for Red-billed Duck
3.6 Rank assignment frequency plot for alleles of importance for Red-billed Duck Plasmodium infected vs. non-infected
3.7 Red-billed Duck A4417 genotypes and Plasmodium infection
3.8 Red-billed Duck A6724 genotypes and Plasmodium infection
3.9 Red-billed Duck A6648 genotypes and Plasmodium infection

Appendix 3

S1 Random Forest R script

Chapter 4

4.1 Forest plot of proportion of Plasmodium infected birds per study
4.2 Baujat plot of meta-analysis outlier studies
4.3 Cook’s distance for meta-analysis outlier studies
4.4 Funnel plot for proportion of avian Plasmodium infection
4.5 Trim and Fill plot for proportion of avian Plasmodium infection

Appendix 4

S1 Meta-analysis R script

LIST OF TABLES

TABLE

Chapter 1

1.1 ddRADseq data for four species of Australia and Papua New Guinea waterfowl

Appendix 1

S1.1 Australia and Papua New Guinea sampling information
S7.1 Pairwise estimates of divergence for four Australia-Papua New Guinea waterfowl species
S8.1 Radjah shelduck loci associated with principal components 1 and 2 from Figure 1.1B

Chapter 2

2.1 Red-billed Duck sampling information
2.2 Pairwise estimates of divergence for Red-billed Duck nuclear loci and mitochondrial control region
2.3 Pairwise estimates of divergence for Plasmodium spp. cytochrome b
2.4 Plasmodium spp. sampling information

Chapter 3

3.1 Top 10 loci of importance for classifying Red-billed Duck into Plasmodium infected and non-infected

Chapter 4

4.1 Avian Plasmodium studies in meta-analysis
4.2 Likelihood ratio test results for host demographic variables and interactions between variables

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Jeffrey L. Peters, for his support and guidance throughout this project, as well as his advice and encouragement to push the boundaries of my knowledge. I would also like to thank Dr. Paula Bubulya, Dr. Quan Zhong, Dr. Michael Leffak, Dr. Oleg Paliy, and Dr. Philip Lavretsky for serving as active members on my committee. Tremendous appreciation goes to Dr. Scott E. Baird for believing in my potential to be a principal investigator and supporting my academic career development. To all the strong female mentors in my life, thank you for supporting an early career female scientist. I hope one day I can do the same. My thanks to Molly Simonis, Safia Janjua, Leon Katona, Hannah W. Shows, Mary Westwood, and Jananie Rockwood, and the remainder of my graduate cohort for their support. I would like to express sincere gratitude towards my friends and family, especially my parents, Suzanne M. Seibert and Brian Keith Seibert Sr., for their constant support and encouragement throughout this process. I would not have been able to mentally survive my preliminary exam, dissertation proposal defense, and dissertation defense without my canine companion, Lilly Serena; thank you for your unconditional love and mostly silent support of me. Finally, to all those family members who passed away before they could see my dreams become a reality…we did it.

Foreword

Waterfowl species occupy every continent except Antarctica. However, detailed information about the dispersal, disease ecology, and population genomics of waterfowl species across biogeographic barriers in the Southern Hemisphere is limited (Peck & Congdon, 2004; Rhymer et al., 2004; McCracken et al., 2009; Guay et al., 2010; Roshier et al., 2012; Dhami et al., 2013). Some of these waterfowl species are nomadic, moving long distances in response to seasonal fluctuations in rainfall, under the cover of night and into remote, hard-to-reach areas (Marchant & Higgins, 1990; McEvoy et al., 2015). To better characterize current and historical patterns of dispersal in Southern Hemisphere waterfowl species, we used population genomic techniques to recover a pseudorandom scan of the genome. These methods have the capacity to provide insight into the evolutionary history and genetic connectivity of waterfowl populations across biogeographic barriers like water, mountains, and large tracts of dry land.

Waterfowl are major vectors (organisms that transfer pathogens to other organisms) and reservoir hosts (organisms in which the pathogen multiplies or develops) of infectious diseases across our globe (Gubler, 2002; Reed et al., 2003; Altizer et al., 2011; Hamer et al., 2012). Waterfowl dispersal, and how it influences the movement of pathogens, is therefore important for understanding the ecology of disease (Little and Ebert, 2000; Winker et al., 2007; Bataille et al., 2009; Archie et al., 2009; Biek and Real, 2010; Blanchong et al., 2016). Host movement can define the genetic structure of its pathogens (Price, 1980; Blouin et al., 1995; Nadler, 1995; Davies et al., 1999; Criscione & Blouin, 2004; Strobel et al., 2016). From population genomic and phylogeographic studies of the host, one may be able to qualitatively infer how pathogens have historically spread across the landscape and estimate dispersal patterns of future pathogens (Biek and Real, 2010; Altizer et al., 2011).

The coevolutionary arms race between host and parasite may result in selection for genetic variants associated with lower rates or milder symptoms of pathogen infection (Brunner and Eizaguirre, 2016). These genetic variants may rise in frequency within a waterfowl population that experiences high pathogen infection rates and severe symptoms. In a study of southern African birds, a disparity in Plasmodium spp. infection rates was observed between aquatic and terrestrial foragers (Hellard et al., 2016). Interestingly, female mosquitoes (the Plasmodium vector) spend a majority of their life cycle in water, and thus we would expect them to have greater rates of contact with aquatic foragers like waterfowl than with terrestrial birds. However, aquatic foragers may have been repeatedly exposed to Plasmodium, and this has likely resulted in natural selection for genetic variants associated with lower infection rates or milder infection symptoms (Bonneaud et al., 2006).

In addition to genomic information, host demographic variables have also been used to explain intra- and inter-specific variation in Plasmodium infection (Kamis and Ibrahim, 1989; Schuurs and Verheul, 1990; Zuk and Johnson, 1998; Duffy et al., 2000; Norris and Evans, 2000; Schall, 2000; Hasselquist, 2007; Trigunaite et al., 2015; Calero-Riestra and Garcia, 2016). These host demographic variables comprise behavioral, physiological, temporal, or species-specific differences between hosts. Identifying host demographic variables that may disrupt the transmission of these disease-causing pathogens in a model organism like waterfowl is of great importance to the health of humans, domestic livestock, and wildlife (Burkett-Cadena et al., 2014). Moreover, the identification of host demographics that explain the variation in Plasmodium infection rates is a crucial step towards understanding mosquito-borne disease transmission mechanisms, host population dynamics, and host population viability (Hasselquist, 2007; Dunn et al., 2013; Burkett-Cadena et al., 2014).

Economic impacts

Monitoring waterfowl populations around the world is also important for financial and health reasons, especially in the areas of agriculture, ecotourism, and public health (Sekercioglu, 2002; Reed et al., 2003; Miller et al., 2013). Agriculturalists are especially concerned with the impact of infected waterfowl on livestock and domestic poultry health (Miller et al., 2013). Global concern for the role waterfowl play in infectious disease movement has largely been focused on highly pathogenic avian influenza (HPAI). To date, HPAI has caused a minimum of three epidemics (i.e., Spanish flu (1918), Asian flu (1957) and Hong Kong flu (1968)) with approximately 40 million human deaths worldwide (Peiris et al., 2007). The most recent U.S. outbreak of HPAI strain H5N2, in 2014-2015, occurred in 15 states; over 50 million birds were culled, with a total economic loss valued at $3.3 billion (Greene, 2015; USDA, 2016; Zhao et al., 2019). The financial impacts of infectious diseases spread by wild birds are experienced by every economic class across our globe, whether through loss of income or loss of life.

Gaps in Knowledge

The data generated from this dissertation address knowledge gaps concerning the dispersal behavior, population genomics, and disease ecology of Southern Hemisphere waterfowl and one of their known pathogens. These results provide insight into candidate regions of the genome and host demographic variables associated with lower rates of Plasmodium infection. The knowledge gained from this dissertation will make valuable information available to wildlife officials by further characterizing the dispersal behavior of Southern Hemisphere waterfowl, allowing mitigation efforts to be prioritized towards the hosts that may pose the greatest threat of spreading pathogens. Additionally, the genomic regions identified as associated with Plasmodium infection are potential targets for therapeutic intervention during Plasmodium infection. Finally, identifying host demographic variables that explain variation in Plasmodium infection rates across bird species has broad application across our globe for disease management and species’ conservation initiatives.

Chapter 1:

Population genomics and phylogeography of four Australasian waterfowl species

Introduction

Species’ ranges are the product of evolutionary history, fundamental and realized niches, historical or current environmental conditions, and movement ecology (e.g. Chase & Leibold, 2003; Broennimann et al., 2006; Cumming, Gaidet & Ndlovu, 2012a). The integration of movement ecology (Cumming et al., 2012a) into classical biogeographical theory is necessary to provide a greater understanding of and ability to predict responses to habitat and climate change (Lamb, Silva, Joseph, Sunnucks, & Pavlova, 2019). Specifically, dispersal ability is an indicator of a species’ potential to escape declining environmental conditions, find mates, and exploit seasonal or variable resource surges (Nathan et al., 2008; Cumming et al., 2012a).

Impediments to dispersal can facilitate local adaptation, allopatric speciation, and variation in distributional patterns (Mayr, 1942; Kyrkjeeide et al., 2016). Thus, changes in environmental conditions and species’ responses to them and each other often manifest in spatiotemporal variation in their distribution and can potentially contribute to the formation of new species (Franklin & Miller, 2014). Though ringing or satellite telemetry can be used to study movement patterns, such methods are often hampered by small sample sizes and short time periods (Mech & Barber, 2002; Corrigan et al., 2018). Conversely, studying the molecular diversity of populations can reveal whether populations are genetically connected over evolutionary timescales, and estimates of connectivity can often be accomplished with a few individuals (Sonsthagen, Wilson, Lavretsky & Talbot, 2019). Thus, by identifying systematic variation in allele frequencies, phylogeographic investigations can provide insight into historical or current impediments to gene flow.

In Australia and New Guinea, there are approximately 30 recognized barriers to species’ distributions, including both terrestrial and marine barriers (reviews in Schodde & Mason, 1999; Schodde 2006; Bryant & Krosch 2016). Major marine, climatic, habitat or topographic barriers within the region include Torres Strait, Barkly Tablelands, Great Dividing Mountain Range, Carpentarian Barrier and three regions of savanna that intersect regions of rainforest habitat in Papua New Guinea (Figure 1.1). Historically, the importance of these barriers fluctuated as climatic oscillations resulted in variable sea levels and variable extent of the Australian arid zone (Byrne et al., 2008). Fluctuations in sea level created and dissipated marine barriers to movement between Australia and the island of New Guinea (Keast, 1961; Ford, 1987; Schodde & Mason, 1999; Lamb et al., 2019). Several phylogeographic studies have documented the influence of these barriers, and the eustatic fluctuations throughout the Pleistocene, on the genetic structuring and genomic divergence of populations of many Australo-Papuan birds (reviewed in Joseph & Omland, 2009; Donnellan et al., 2009; Kearns, Joseph, & Cook, 2010; Guay et al., 2010; Toon, Hughes, & Joseph, 2010; Dolman & Joseph, 2012; Dhami, Joseph, Roshier, Heinsohn, & Peters, 2013; Edwards, Crisp, Cook, & Cook, 2017; also reviewed in Joseph et al., 2019).

Figure 1.1: Proposed biogeographical barriers important in the species- and subspecies-level structuring of northern Australia and southern Papua New Guinea birds (adapted from Schodde and Mason 1999; Joseph and Omland, 2009).

Understanding of the relationship between movement behavior and genetic connectivity across biogeographic barriers is limited in Southern Hemisphere waterfowl species, especially in remote parts of Australia and the island of New Guinea (Peck & Congdon, 2004; Rhymer, Williams, & Kingsford, 2004; Guay et al., 2010; Roshier, Heinsohn, Adcock, Beerli, & Joseph, 2012; Dhami et al., 2013). A study of genetic connectivity of two closely related species of Australian teal, for example, revealed differential patterns of population genetic structure consistent with differences in movement behavior (Dhami et al., 2013). The grey teal (Anas gracilis) is a highly dispersive species that lacks population structure despite being geographically distributed across known biogeographic barriers. In contrast, the more sedentary chestnut teal (Anas castanea) is more structured across its range in the southern portions of Australia’s mesic zones.

Here, we report on genetic connectivity in four waterfowl species distributed across the Torres Strait in both southern Papua New Guinea, which comprises the eastern half of the island of New Guinea, and northern Australia.

Study Species

Our study taxa are four waterfowl species that differ in life history traits, ranges, and dispersal propensity, and which occur in both Australia and the island of New Guinea: Radjah Shelduck (Tadorna radjah), Green Pygmy-Goose (Nettapus pulchellus), Wandering Whistling-Duck (Dendrocygna arcuata), and Pacific Black Duck (Anas superciliosa). Within these species, subspecies are often divided by important geographical features. Specifically, the shelduck has two recognized subspecies: T. r. radjah in New Guinea and the Maluku Islands (Moluccas) and T. r. rufitergum in northern Australia (Marchant & Higgins, 1990). The black duck has three recognized subspecies: A. s. rogersi in southern New Guinea, Indonesia, and Australia, A. s. pelewensis in northern New Guinea and several Pacific islands, and A. s. superciliosa in New Zealand (Marchant & Higgins, 1990). Finally, the whistling-duck has three recognized subspecies: D. a. arcuata in the Philippines and Indonesia, D. a. australis in southern New Guinea and northern Australia, and D. a. pygmaea in northern New Guinea (Marchant & Higgins, 1990). In contrast, the pygmy-goose is monotypic (Dickinson & Remsen, 2013; del Hoyo & Collar, 2014; Beehler, Pratt & LeCroy, 2016).

Among our studied species, shelduck and pygmy-goose are thought to only make restricted, local movements. In contrast, whistling-duck and black duck readily undertake long-distance movements of hundreds of kilometers in response to environmental factors (Marchant & Higgins, 1990; McEvoy, Roshier, Ribot & Bennett, 2015), although some populations can be mostly sedentary on permanent waters. Our objective was to quantify the genetic connectivity among sampling locations of these species of ducks within and between Papua New Guinea and Australia. We predicted that, as a result of their greater capacity and predisposition for dispersal, the black duck and whistling-duck (long- and intermediate-distance movements, respectively) would have greater genetic connectivity among locations (i.e., no population sub-structuring) than the pygmy-goose or shelduck (local movements).

Materials and Methods

Sample Collection & DNA Isolation

Muscle tissue samples for Radjah Shelduck (N = 32), Wandering Whistling-Duck (N = 27), Green Pygmy-Goose (N = 15), and Pacific Black Duck (N = 40) were obtained from the Australian National Wildlife Collection and originated from sites in northern Australia and Papua New Guinea (PNG) (see Appendix 1 S1 Table S1.1). The PNG samples were collected from the Western Province and Central Province (W-PNG, C-PNG, respectively), and those from Australia were from northern Western Australia (WA), Northern Territory (NT), Cape York Peninsula (CYP), and eastern Queensland south of Cape York Peninsula (QLD) (Figures 2A – 5A). Genomic DNA was extracted following the DNeasy Blood and Tissue Kit manufacturer’s protocol (Qiagen, Valencia, CA), and quantified using a NanoDrop 2000 Spectrophotometer (Thermo Scientific, Waltham, MA).

ddRAD-seq library preparation

To generate DNA sequences from a pseudo-random sampling across the genome (Miller, Dunham, Amores, Cresko & Johnson, 2007), we used the double-digest restriction site associated DNA sequencing (ddRAD-seq) protocol of DaCosta and Sorenson (2014; protocol details provided in Appendix 1 S2). Approximately 1 µg of genomic DNA was digested using 10 U of restriction enzymes SbfI and EcoRI. Adapters containing sequences compatible with Illumina TruSeq reagents and barcodes for de-multiplexing sequencing reads were ligated to the sticky ends generated by double digest. Adapter-ligated DNA was size-selected (300–450 bp) using gel electrophoresis (2% low-melt agarose) and a MinElute gel extraction kit (Qiagen, Valencia, CA). Size-selected fragments were amplified using the polymerase chain reaction (PCR) with Phusion high-fidelity DNA polymerase (Thermo Scientific, Pittsburgh, PA). Amplified products were purified using magnetic AMPure XP beads (Beckman Coulter Inc., Indianapolis, IN). The concentration of purified PCR products was quantified using real-time PCR with an Illumina library quantification kit (KAPA Biosystems, Wilmington, MA) on an ABI 7900HT SDS (Applied Biosystems, Foster City, CA). Samples with compatible barcode combinations were pooled in equimolar concentrations and multiplexed libraries were sequenced on an Illumina HiSeq 2500 at TUCF Genomics, Tufts University (Medford, MA, USA).

Bioinformatics processing

Raw Illumina reads from each species were processed in separate runs using the computational pipeline described by DaCosta and Sorenson (2014; details in Appendix 1 S3) [Scripts available at: http://github.com/BU-RAD-seq/ddRAD-seq-Pipeline]. Reads from all individuals were assigned to individual samples based on unique barcode combinations. For each sample, identical reads were combined while still retaining read counts and the highest quality score for each nucleotide position. Individual reads that were >10% divergent and/or had an average Phred score of <20 were removed. Reads were then clustered into putative loci using the UCLUST function in USEARCH v. 5 (Edgar, 2010) with an –id setting of 0.85. The highest-quality read from each cluster was mapped to an assembled mallard (Anas platyrhynchos) reference genome (provided by T. Farault, unpubl. data) using BLASTN V. 2 (Altschul, Gish, Miller, Myers & Lipman 1990). Clusters with identical or nearly identical BLAST hits (i.e. aligned to ± 50 bp on the same reference genome scaffold) were combined. The reads within each cluster (i.e. putative locus) were aligned using MUSCLE V. 3 (Edgar, 2004). Alignments with end gaps due to indels and/or polymorphisms at the SbfI restriction site were either automatically trimmed or flagged for manual editing. Alignments with two or more polymorphisms in the last five base pairs were flagged for manual inspection.

Genotypes were scored as described in DaCosta and Sorenson (2014): homozygous genotypes were defined if greater than 93% of sequence reads were consistent with a single haplotype, whereas heterozygotes were defined if a second haplotype was represented by at least 29% of reads, or if a second haplotype was represented by as few as 10% of reads and the haplotype was present in other individuals. Individual genotypes that did not meet either criterion or contained more than two haplotypes were flagged. For these flagged genotypes, we retained the allele represented by the majority of reads and scored additional alleles as missing data. For apparently homozygous genotypes based on only 1 to 5 reads, which we considered “low depth”, the second allele was scored as missing. We retained all loci with ≤10% missing genotypes and ≤5% flagged genotypes.
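The scoring rules above can be illustrated with a minimal Python sketch. It is not the DaCosta and Sorenson (2014) pipeline code; the function name, the input layout (read counts per haplotype for one individual at one locus), and the order in which the rules are checked are assumptions made for illustration only.

```python
from typing import Dict, Optional, Set, Tuple

# (allele 1, allele 2 or None when scored as missing, status label)
Genotype = Tuple[str, Optional[str], str]


def call_genotype(read_counts: Dict[str, int],
                  alleles_seen_elsewhere: Set[str]) -> Genotype:
    """Hypothetical helper applying the read-fraction thresholds in the text."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads for this individual at this locus")
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    top, top_n = ranked[0]

    # Homozygote: more than 93% of reads support a single haplotype.
    if top_n / total > 0.93:
        if total <= 5:
            # "Low depth": second allele scored as missing.
            return (top, None, "low_depth")
        return (top, top, "homozygous")

    # Heterozygote: exactly two haplotypes, with the second at >= 29% of
    # reads, or at >= 10% if that haplotype occurs in other individuals.
    if len(ranked) == 2:
        second, second_n = ranked[1]
        frac = second_n / total
        if frac >= 0.29 or (frac >= 0.10 and second in alleles_seen_elsewhere):
            return (top, second, "heterozygous")

    # Anything else is flagged: keep the majority allele, second is missing.
    return (top, None, "flagged")
```

For example, `call_genotype({"A": 85, "B": 15}, {"B"})` is scored as a heterozygote because haplotype B is known from other individuals, whereas the same counts with an empty set are flagged.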

We categorized ddRAD-seq loci as either autosomal or sex-linked on the basis of BLAST hits to the Mallard genome and on sex-specific patterns of read depth and heterozygosity (Lavretsky et al., 2015; Peters et al., 2016). Because females carry only one copy of the Z-chromosome, they should have half as many reads for Z-linked loci compared to males and no heterozygosity. In contrast, the number of reads for autosomal loci should be comparable between males and females. The autosomal and Z-linked loci for each individual were combined for further analyses.
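The depth and heterozygosity expectations described above can be sketched as a simple classifier. This is only an illustration: the function name and the 0.75 depth-ratio cutoff are hypothetical, depths are assumed to be already normalized for each individual's total sequencing effort, and the actual assignment also drew on BLAST hits to the mallard genome.

```python
from statistics import mean
from typing import Sequence


def classify_locus(female_depths: Sequence[float],
                   male_depths: Sequence[float],
                   female_heterozygotes: int,
                   ratio_cutoff: float = 0.75) -> str:
    """Label a locus as Z-linked or autosomal from sex-specific patterns.

    ratio_cutoff is a hypothetical threshold placed between the expected
    female:male depth ratios of 0.5 (Z-linked) and 1.0 (autosomal).
    """
    depth_ratio = mean(female_depths) / mean(male_depths)
    # Z-linked loci: females show about half the male depth and no heterozygotes.
    if depth_ratio < ratio_cutoff and female_heterozygotes == 0:
        return "Z-linked"
    return "autosomal"
```

A locus with mean female depth near half the male depth but with heterozygous females would be left as autosomal here, since heterozygosity in females contradicts hemizygosity.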

Relatedness

To eliminate relatedness as a confounding factor in our population structure analyses, we first measured genotype similarities among individuals in our samples using maximum likelihood as implemented in the program ML-RELATE (Kalinowski, Wagner & Taper, 2006). ML-RELATE assumes no two individuals being compared are inbred, there are no migrants entering the population, and individuals were sampled from a panmictic population (Kalinowski et al., 2006). Under these conditions, full siblings are expected to have a relatedness (r) value of 0.5, and half siblings will have an r of 0.25.

However, when populations are not panmictic, inbreeding within subpopulations can result in individuals being genetically equivalent to siblings in analyses that include individuals from other subpopulations. Thus, to determine whether inferred sibling groups were indicative of true siblings or resulted from population structure, we used SPAGeDi 1.5a to calculate Ritland’s (1996) kinship coefficient (Fij) between pairs of individuals and an inbreeding coefficient (F) within individuals (Hardy and Vekemans, 2002). The kinship coefficient is the probability that two alleles, one sampled from each of two individuals, are identical by state for a given locus. The inbreeding coefficient is defined as the probability an individual carries two alleles that are identical by state at a given locus (Crow and Kimura, 1971). Both of these values are calculated relative to the probability of randomly sampling two identical alleles from the entire population; thus, positive values indicate two individuals are more related to each other (or more inbred) than expected by chance, and negative values indicate individuals are less related (or less inbred) than expected.

If our population is panmictic and a random sample from that population does not contain siblings, then we expect both relatedness among individuals and inbreeding within individuals to be near zero. Deviations from our null hypothesis would suggest our samples include related individuals or our population is subdivided (Ritland, 1996). In subdivided populations, individuals from the same population will be genetically more similar to each other than individuals from different populations, and therefore, kinship coefficients will be high. Population subdivision should also result in higher inbreeding (i.e., higher homozygosity) than expected under a model of panmixia. In contrast, if a sample from a population that is not subdivided includes siblings, then we expect kinship coefficients between those individuals to be high relative to values of inbreeding within individuals. In the event these analyses suggested sibships were a better explanation of genetic similarity than population structure, we filtered the data for full-sib (r ≈ 0.5) and half-sib (r ≈ 0.25) relationships, retaining for further analyses the individual with the largest number of recovered loci.
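The sibling filter described above could be implemented along the following lines. This is a sketch under stated assumptions, not the procedure actually scripted for the analyses: the function name is hypothetical, "r ≈ 0.25 or higher" is approximated by a fixed cutoff, and sibling groups are taken to be sets of individuals connected by any chain of related pairs.

```python
from typing import Dict, List, Tuple


def filter_siblings(loci_per_individual: Dict[str, int],
                    relatedness: List[Tuple[str, str, float]],
                    r_cutoff: float = 0.25) -> List[str]:
    """Keep one individual per inferred sibling group (most recovered loci)."""
    parent = {ind: ind for ind in loci_per_individual}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Join individuals whose pairwise relatedness meets the assumed cutoff.
    for a, b, r in relatedness:
        if r >= r_cutoff:
            parent[find(a)] = find(b)

    # Within each group, retain the individual with the most loci.
    best: Dict[str, str] = {}
    for ind, n_loci in loci_per_individual.items():
        root = find(ind)
        if root not in best or n_loci > loci_per_individual[best[root]]:
            best[root] = ind
    return sorted(best.values())
```

Unrelated individuals form groups of size one and are therefore always retained.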

Population structure

We used several approaches to visualize population structure within each species. First, to determine if our waterfowl species have discrete population units or are a panmictic population (see Peters et al., 2016), we used the R package ‘adegenet’ (Jombart, 2008) to conduct a Principal Component Analysis (PCA), which identifies components (parsimony informative loci) that describe the highest percentage of variation observed between samples (PCA R script Appendix 1 S5). Additionally, we used a non-metric multidimensional scaling plot (NMDS; R script Appendix 1 S9) to estimate pairwise dissimilarity between objects in low-dimensional space using the R package vegan (Oksanen, 2019). In general, individuals that are more genetically similar are expected to cluster together within the PCA and NMDS plots. For these analyses, alleles were categorized based on the full sequences for each locus rather than individual single nucleotide polymorphisms. For each locus, we coded alleles 1 to n, where n is the observed number of unique alleles/haplotypes at that locus. The loci represented by principal components 1 and 2 were identified and mapped to the mallard (Anas platyrhynchos) or chicken (Gallus gallus) genome via the Ensembl genome browser (Cunningham et al., 2018).
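The haplotype-based allele coding described above can be sketched as follows. The helper name, the use of 0 for missing alleles, and the numbering of alleles in order of first appearance are assumptions for illustration; the text only specifies that alleles at each locus are coded 1 to n.

```python
from typing import Dict, List, Optional, Tuple


def code_alleles(haplotypes: List[Optional[str]]) -> Tuple[List[int], Dict[str, int]]:
    """Code the full-sequence haplotypes observed at one locus as 1..n.

    Missing alleles (None) are coded 0, which is an assumption of this sketch.
    """
    codes: Dict[str, int] = {}
    coded: List[int] = []
    for seq in haplotypes:
        if seq is None:
            coded.append(0)
            continue
        if seq not in codes:
            # Next unused integer, so codes run from 1 to n.
            codes[seq] = len(codes) + 1
        coded.append(codes[seq])
    return coded, codes
```

Because whole sequences are compared, two haplotypes that differ at any site within the locus receive different codes, regardless of how many SNPs separate them.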

Second, we identified the optimum number of genetic populations (K) within each species’ dataset by calculating individual assignment probabilities in the program STRUCTURE v. 2.3.4 (Pritchard, Stephens & Donnelly, 2000). Alleles were coded as described for the PCA (see above). We evaluated ln Pr(X|K) for K = 1 to 6 populations, without incorporating a priori information about sampling locality or population origin. To reduce computational time, we included only parsimony-informative loci in analyses. STRUCTURE was run using an admixture model and correlated allele frequencies for 500,000 burn-in and 1,000,000 sampling generations. We replicated each analysis five times and calculated ∆K to determine the most likely number of populations (Evanno, Regnaut & Goudet, 2005) using STRUCTURE HARVESTER (Earl and von Holdt, 2011).
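For readers unfamiliar with ∆K, the statistic of Evanno et al. (2005) is the absolute second-order rate of change of ln Pr(X|K) divided by the standard deviation of ln Pr(X|K) across replicate runs. The sketch below is an illustration only and is not the STRUCTURE HARVESTER code: the function name and input layout are assumptions, and the second difference is taken on replicate means. Note that ∆K cannot be evaluated for the smallest or largest K tested.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both K-1 and K+1 available.

    lnp_by_k maps K to a list of ln Pr(X|K) values, one per replicate run
    (at least two replicates per K are needed for the standard deviation).
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            # |L(K+1) - 2 L(K) + L(K-1)| computed on replicate means
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp_by_k[k])
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta
```

The K with the largest ∆K is usually taken as the best-supported number of clusters; because K = 1 has no ∆K, support for a single population has to be judged from ln Pr(X|K) and the assignment plots.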

Third, we measured the association between geographic distance (straight-line distance) and relatedness among all sampling sites across Australia and Papua New Guinea. We calculated the pairwise kinship coefficients (Fij) of Ritland (1996) using the program SPAGeDi 1.5a (Hardy and Vekemans, 2002). A Mantel test was then used to test for a correlation between kinship and the geographic distance between pairs of samples using the program zt (Bonnet and Van de Peer, 2002). Under an isolation-by-distance model, we expect a negative correlation between kinship and geographic distance, because individuals in close proximity have a higher probability of sharing alleles.
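The logic of a Mantel test can be sketched as follows. This is a simplified stand-in for illustration and not the zt program: it assumes two square, symmetric matrices (geographic distance and kinship) given as nested lists, computes the Pearson correlation over the upper triangle, and obtains a one-tailed p-value for a negative correlation by jointly permuting the rows and columns of the kinship matrix. All function names and the default of 999 permutations are assumptions.

```python
import random
from statistics import mean


def _upper(matrix, order):
    """Upper-triangle values of `matrix` after reordering rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(dist, kin, permutations=999, seed=1):
    """Return (r, p); p is one-tailed for r being as negative as observed."""
    n = len(dist)
    identity = list(range(n))
    x = _upper(dist, identity)
    r_obs = _pearson(x, _upper(kin, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute individuals, not single matrix cells
        if _pearson(x, _upper(kin, order)) <= r_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting whole rows and columns together preserves the dependence among pairwise values that share an individual, which is the reason a Mantel test is used instead of an ordinary correlation test.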

Finally, composite pairwise population estimates of relative divergence (i.e., ΦST; the proportion of total genetic variation partitioned among sample groups), as well as nucleotide diversity (the average number of pairwise differences among all sequences within populations), were calculated in the R package ‘PopGenome’ (Hartl and Clark, 1997; Pfeifer, Wittelsbürger, Ramos-Onsins & Lercher, 2020).
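As an illustration of the nucleotide diversity definition above, the sketch below computes the average number of pairwise differences among aligned sequences and expresses it per site. It is not the PopGenome implementation; the function name is hypothetical, and the example assumes equal-length, gap-free sequences from a single population.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site among aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    # Count mismatched positions for every pair of sequences.
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / (len(pairs) * length)
```

For the three sequences `AAAA`, `AAAT`, and `AATT`, the pairwise differences are 1, 2, and 1, so the per-site value is 4 / (3 × 4) ≈ 0.33.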

Results

We obtained an average of 715,627 (± 396,054) high-quality reads per individual. For each species, more than 3,700 loci were recovered from >90% of individuals with ≤5% flagged genotypes. The median read depth for these loci across all four species was 135 (± 75.2) reads per individual per locus, and genotypes were complete for 98.1% and partial (one allele scored) for 0.6% of individuals per locus. Based on read depth and heterozygosity, we inferred that 94.3–95.7% of loci were autosomal and 4.3–5.7% were Z-linked.

Radjah Shelduck

A total of 3,838 variable ddRAD-seq loci were recovered for 32 individual Radjah Shelduck samples (Table 1.1). We detected four sets of closely related individuals comprising a total of 10 individuals related at the half-sib or full-sib level (r = 0.18–0.49; Appendix 1 S4: Fig. S4.1). Based on relatedness and inbreeding values, two sibling pairs (N = 4 individuals) from WA were identified; in both cases, putative siblings were collected on the same day and at the same location. The WA sibling groups had high kinship (avg. Fij = 0.19) and low inbreeding coefficients (avg. F = 0.03), which suggests a level of relatedness that is not the result of restricted gene flow. In contrast, a group of four highly related (r = 0.21–0.49) shelduck samples was identified in eastern Queensland, but they were sampled over three different years and from two sites. These QLD samples had high inbreeding coefficients (avg. F = 0.11), suggesting the high kinship resulted from genetic isolation rather than true sibships. Finally, the last group consisted of two PNG-C samples with high kinship (Fij = 0.11) and high inbreeding coefficients (avg. F = 0.15), which suggests population structure. Thus, the QLD and PNG-C samples were retained for further analyses, whereas one individual from each of the WA sibling groups was removed. Relatedness and inbreeding coefficient values were low for the remaining samples (r < 0.01; avg. F = 0.04). After filtering out full- and half-siblings, our final sample size for shelduck was 29 individuals.

Table 1.1: Sample size, number of ddRAD-seq polymorphic loci, median read depth, numbers of autosomal and sex-linked loci, and nucleotide diversity for four species of Australia and Papua New Guinea (AUS-PNG) waterfowl: Radjah Shelduck, Wandering Whistling-Duck, Green Pygmy-Goose, and Pacific Black Duck. Numbers of variable and parsimony-informative loci are presented as autosomal/Z-linked.

                                              Radjah                  Wandering               Green                   Pacific
                                              Shelduck                Whistling-Duck          Pygmy-Goose             Black Duck
No. of individuals                            29                      18                      14                      40
No. of reads per individual (± st. dev.)      556,682 (±219,122.8)    770,379 (±625,557.4)    776,117 (±281,672.1)    774,051 (±312,207.3)
No. of reads per locus per ind. (± st. dev.)  112 (±42.7)             144 (±121.4)            144 (±47.9)             144 (±61.1)
No. of loci                                   3,844                   4,227                   4,118                   3,720
No. of variable loci                          3,640/198               3,956/194               3,933/177               3,507/213
Parsimony-informative loci                    1,736/53                2,713/99                2,063/37                2,645/92
Nucleotide diversity                          0.0018                  0.0034                  0.0023                  0.0057

Both PCA (Figure 1.2b) and STRUCTURE (Figure 1.2c) revealed four main shelduck clusters, which largely corresponded with geography (Figure 1.2a). In the PCA plot, WA and NT samples clustered together, representing a single population, whereas samples from CYP, QLD, and PNG-C each clustered into different groups (Figure 1.2b). Samples from PNG-W fell at an intermediate distance between the PNG-C and CYP genetic clusters. Loci identified as principal components 1 and 2 were mapped to mallard and chicken reference genomes (Appendix 1 S8: Table S8.1). Similar results were observed in the shelduck NMDS plot (Appendix 1 S10: Fig. S10.1). Analyses in STRUCTURE indicated the data best fit a four-population model, and population assignment probabilities mirrored the PCA results (Figure 1.2c). However, upon evaluating assignment probabilities at K = 5 populations, additional resolution was observed and suggested that PNG-W and PNG-C may be genetically differentiated.

Figure 1.2: A) Radjah Shelduck range (stippled) and sampling locations (circles): northern Western Australia (WA; black), Northern Territory (NT; dark gray), Cape York Peninsula (CYP; light gray), eastern Queensland (QLD; white), western Papua New Guinea (PNG-W; horizontal stripe), and central PNG (PNG-C; diagonal stripe). B) Principal component analysis using 1,789 variable ddRAD-seq loci from 29 individuals revealing site-specific clusters. C) STRUCTURE bar plots for K = 4 populations (best-supported model) and K = 5 populations, which revealed additional separation within PNG.

Finally, estimates of relative differentiation further supported a moderate level of genetic differentiation (ΦST = 0.09) within shelduck. Specifically, WA and NT were genetically similar, indicating weak or no population structure (ΦST = 0.01), whereas all other sampling locations were genetically differentiated from each other (ΦST = 0.04 – 0.17; Appendix 1 S7: Table S7.1).

Wandering Whistling-Duck

We recovered 4,150 variable ddRAD-seq loci in 26 individual Wandering Whistling-Duck samples for our analyses (Table 1.1). From our data, we detected three sets of closely related whistling-ducks comprising a total of 11 individuals related at the half-sib or full-sib level (r = 0.13–0.51; Appendix 1 S4: Fig. S4.2). A group of five individuals collected from the same site and on the same day had high relatedness (r = 0.41–0.50) and high kinship coefficients (Fij = 0.11–0.14), but low inbreeding coefficients (avg. F < 0.0). Another group of four individuals from a different site on CYP were also all collected on the same day and had high relatedness (r = 0.41–0.48) and high kinship (Fij = 0.13–0.16), but low inbreeding (avg. F < −0.005). In contrast, three additional individuals collected from the same site and day had low relatedness and low kinship with each other and with the four related individuals (r = 0.0; Fij < −0.009). Finally, two individuals collected from a third site on CYP appeared to be related at a level equivalent to first cousins (r = 0.13) and had a moderately high kinship coefficient (Fij = 0.058), but low inbreeding (F = 0.01). The high relatedness and high kinship coefficients, coupled with low inbreeding coefficients, suggest that the genetic similarity among individuals is better explained by familial relationships than by population structure. Relatedness was low for the remaining samples (r < 0.01; Fij < −0.007). After filtering out closely related individuals (N = 8), our final sample size for whistling-duck was 18 samples (Figure 1.3a).

Figure 1.3: A) Wandering Whistling-Duck range (stippled) and sampling locations (circles): northern Western Australia (WA; black), Northern Territory (NT; dark gray), Cape York Peninsula (CYP; light gray), western Papua New Guinea (PNG-W; horizontal stripe), and central PNG (PNG-C; diagonal stripe). B) Principal component analysis using 2,812 variable ddRAD-seq loci from 18 individuals did not reveal site-specific clusters. C) STRUCTURE bar plot for K = 2 populations; K = 1 population was the best-supported model, and K = 2 populations did not reveal population genetic structure.

Pairwise composite ΦST values across sampling sites for whistling-duck were generally low, with an average of 0.021 (ΦST range = 0.006–0.037; Appendix 1 S7: Table S7.1). In addition to these low ΦST estimates, PCA (Figure 1.3b), NMDS (Appendix 1 S10: Fig. S10.2), and STRUCTURE (Figure 1.3c) clearly demonstrated an absence of population structure across sampled whistling-ducks.

Green Pygmy-Goose

We recovered 4,110 variable ddRAD-seq loci in 15 individual Green Pygmy-Goose samples for our analyses (Table 1.1). After removing one individual with a low number of reads, our final sample size was 14 individuals. No sibling relationships were recovered (r < 0.01; Appendix 1 S4: Fig. S4.3). Pairwise sampling-site composite ΦST values ranged from 0 to 0.015 (Appendix 1 S7: Table S7.1), indicating no genetic structure amongst our samples. The PCA (Figure 1.4b), NMDS (Appendix 1 S10: Fig. S10.3), and STRUCTURE (Figure 1.4c) analyses corroborated these nonsignificant ΦST values, supporting no genetic structure among sampled pygmy-goose.

Figure 1.4: A) Green Pygmy-Goose range (stippled) and sampling locations (circles): northern Western Australia (WA; black), Northern Territory (NT; dark gray), Cape York Peninsula (CYP; light gray), western Papua-New Guinea (PNG-W; horizontal stripe), and central PNG (PNG-C; diagonal stripe). B) Principal component analysis using 2,100 variable ddRAD-seq loci from 14 individuals did not reveal site-specific clusters. C) STRUCTURE bar plot for K = 2 populations; K = 1 population was the best-supported model, and K = 2 populations did not reveal population genetic structure.

Pacific Black Duck

We recovered 3,720 variable ddRAD-seq loci in 40 individual Pacific Black Duck samples for our analyses (Table 1.1). From our data, we did not detect half- or full-sib groups; relatedness was low among all samples (r < 0.01; Appendix 1 S4: Fig. S4.4). Low pairwise composite ΦST values across sampling sites (ΦST range = 0.005–0.035; Appendix 1 S7: Table S7.1) and an absence of clustering in PCA (Figure 1.5b), NMDS (Appendix 1 S10: Fig. S10.4), and STRUCTURE (Figure 1.5c) suggested a lack of population structure across the six black duck sampling sites.

Figure 1.5: A) Pacific Black Duck range (stippled) and sampling locations (circles): northern Western Australia (WA; black), Northern Territory (NT; dark gray), Cape York Peninsula (CYP; light gray), eastern Queensland (QLD; white), western Papua New Guinea (PNG-W; horizontal stripe), and central PNG (PNG-C; diagonal stripe). B) Principal component analysis using 2,737 variable ddRAD-seq loci from 40 individuals did not reveal sample-site specific clusters. C) STRUCTURE bar plot for K = 2 populations; K = 1 population was the best-supported model, and K = 2 populations did not reveal population genetic structure.

Discussion

Here, we present genetic data from four waterfowl species, Radjah Shelduck, Green Pygmy-Goose, Wandering Whistling-Duck, and Pacific Black Duck, each sampled in both southern Papua New Guinea and throughout northern Australia. Based on reported differences in their propensity to move long distances (Marchant & Higgins, 1990; Kear, 2005), we predicted differences in genetic connectivity among our study species. In general, population structure or lack thereof was partially explained by differences in vagility. For example, both black ducks (Figure 1.5) and whistling-ducks (Figure 1.3) were unstructured across their respective sampling sites, which is concordant with initial expectations for greater genetic connectivity resulting from a high propensity for long-distance movements in response to resource availability (Marchant & Higgins, 1990; Kear, 2005). We note, however, that northern New Guinean populations of black duck are recognized as a different subspecies (Beehler, Pratt & LeCroy, 2016), and our sampling did not include any individuals from that region; thus, there might be more population structure over a short distance than what was detected in this study.

Population structure within shelducks (Figure 1.2) followed geography, which reflects their sedentary nature. First, structure between southern Papua New Guinea and northern Australia suggests the Torres Strait acts as an effective barrier to movement, and is consistent with the recognition of separate subspecies on these landmasses (Figure 1.1; Marchant & Higgins, 1990). We also found support for population structure between the two sampled sites in Papua New Guinea, but this is based on a limited number of samples per site. Second, the lack of genetic structure between Western Australia and Northern Territory suggests a single interbreeding population, whereas Cape York Peninsula and eastern Queensland host differentiated populations, which is consistent with limited movement across the Black Mountain Barrier, Burdekin Gap, or Carpentarian Barrier (Figure 1.1; adapted from Schodde & Mason, 1999; Schodde, 2006; Joseph & Omland, 2009). We hypothesize that the dispersal behavior and movement ecology of shelducks, coupled with landscape complexity, have contributed to their population structure. Interestingly, shelducks from eastern Queensland (QLD) were highly inbred, which likely explains why they were so strongly separated from the other shelduck populations (Figure 1.2b, c). Reports of shelduck range contractions and fragmentation may coincide with a decrease in gene flow in this mostly sedentary species (Kear, 2005). Overall, our data suggest at least four subpopulations in shelducks: (1) Western Australia and Northern Territory, (2) Cape York Peninsula, (3) eastern Queensland, and (4) Papua New Guinea.

We predicted population structure within pygmy-goose because this species was thought to be largely sedentary (Marchant & Higgins, 1990). However, the genetic data support a single, panmictic population (ΦST = 0; Figure 1.4). Weak genetic differentiation may reflect a recent population expansion and an insufficient amount of time for genetic differences to accumulate. Alternatively, our results may suggest pygmy-goose is more dispersive than previously thought, and individuals occasionally or even regularly move between regions. The presumed sedentary nature of pygmy-goose might simply reflect limited natural history and movement data on this species. Uncovering discordances between presumed dispersal behavior and the genetic signature of a species demonstrates the utility of genome-scale data in quantifying genetic connectivity as a means to better understand the movement ecology of poorly studied species (Sonsthagen et al., 2019). Finally, our pygmy-goose data are consistent with the species being treated as monotypic.

Kinship confounds population structure

In our study, we found several pairs or groups of genetically similar individuals within the shelduck and whistling-duck samples. Including or excluding familial groups from analyses of shelduck did not have a major effect on the visualization of population structure (Appendix 1 S6: Fig. S6.1). However, the effect of familial groups on the visualization of population structure was especially prominent in the whistling-duck (Appendix 1 S6: Fig. S6.2). The inclusion of familial groups in our whistling-duck data set resulted in three clusters: Cape York Peninsula samples grouped within all three clusters, one of which also included all samples from the remaining sites. After retaining one representative sample from each familial group, we observed no site-specific clustering or evidence of population structure within the whistling-duck samples.

Our whistling-duck results conflict with previously reported inferences of population structure by Roshier et al. (2012), who investigated population structure in whistling-duck using microsatellite data from seven nuclear loci. They found fine-scale population structure among samples collected in northern Australia, Papua New Guinea, and Timor Leste (Roshier et al., 2012). Indeed, two flocks sampled 12.5 km and one week apart in the Aurukun area of western Cape York Peninsula were genetically differentiated. In our study, we included a small subset of the samples from Roshier et al. (2012) and found high kinship among several pairs of individuals collected from this region, suggesting kin groups might have contributed to the signatures of population structure. However, kinship seems inadequate to explain all the patterns. For example, in addition to the structure found within Cape York Peninsula, Roshier et al. (2012) found significant differentiation between Papua New Guinea and Western Australia; yet, we detected neither population differences between nor kin groups within these regions despite analyzing some of the same samples. The larger sample size examined by Roshier et al. (2012) may have provided finer-scale resolution for detecting population structure within whistling-duck than our data, despite the much lower number of markers examined. Future genomic studies with additional samples are necessary to resolve the contrast between our results and those of Roshier et al. (2012) and to better understand population connectivity in whistling-duck.

Our results further demonstrate that finding highly related individuals within a sampling site can be the product of population structure, kinship, or both (Sul, Martin & Eskin, 2018). In general, strong population structure would indicate populations are subdivided and that gene flow is restricted, and thus, individuals within the local population are genetically more similar to each other than to individuals from different populations (Ritland, 1996). Unrecognized kinship (i.e., sibling or family groups) would likely lead to underestimates of genetic differences among individuals within a particular region, and thus, contribute to underestimating the genetic connectivity of these individuals to the larger population (Sul et al., 2018; Voight and Pritchard, 2005), thereby overemphasizing the importance of biogeographic barriers.

Conclusions

Torres Strait, the Barkly Tablelands, Carpentarian Barrier, Great Dividing Range, and other biogeographic barriers in northern Australia appear insufficient to impede gene flow in waterfowl that disperse large distances, such as the Wandering Whistling-Duck, Pacific Black Duck, and Green Pygmy-Goose. Conversely, in mostly sedentary species such as the Radjah Shelduck, these barriers, coupled with the birds’ ecology and natural history, interact to restrict gene flow, particularly in eastern Queensland where we observed inbred individuals. Our study has provided significant new and sometimes surprising knowledge concerning the genetic connectivity of these four waterfowl species across biogeographic barriers in northern Australia and southern Papua New Guinea. Conservation and wildlife biologists will gain insight into the movements and structure of waterfowl populations within the Australo-Papua area that may not be so readily obtained from other kinds of data, such as tracking and behavioral studies. Our study explored the impact of the most obvious biogeographic barriers in northern Australia and southern Papua New Guinea on restricting gene flow in these four waterfowl species; however, we cannot rule out that habitat selection or mate selection may also contribute to the genetic differentiation we observed between sampling sites.

Appendix 1

Appendix 1 S1: Table S1.1. Australia and Papua New Guinea sampling information for Radjah Shelduck, Wandering Whistling-Duck, Green Pygmy-Goose, and Pacific Black Duck. Specimens with a sample I.D. prefix of B are from the Australian National Wildlife Collection, CSIRO, Canberra; R and WWD blood samples were collected by D. A. R. as part of a separate study. Abbreviations: CEN – Central Province, Papua New Guinea; WES – Western Province, Papua New Guinea; WA – Western Australia; NT – Northern Territory; CYP – Cape York Peninsula, Queensland; QLD – eastern Queensland; F – female; M – male; U – unknown sex.

Species              Sample I.D.  State/Prov.  Date collected  Sex  Latitude   Longitude
Anas superciliosa    B50679       WA           10/2/2004       M    -17.8464   122.7406
Anas superciliosa    B50756       WA           10/5/2004       M    -17.7742   122.8544
Anas superciliosa    B50795       WA           10/7/2004       F    -17.8886   122.6475
Anas superciliosa    B50796       WA           10/7/2004       M    -17.8886   122.6475
Anas superciliosa    B50829       WA           10/8/2004       M    -17.7781   122.8872
Anas superciliosa    B50830       WA           10/8/2004       M    -17.7781   122.8872
Anas superciliosa    B50964       WA           10/16/2004      M    -15.5711   128.7581
Anas superciliosa    B50965       WA           10/16/2004      M    -15.5711   128.7581
Anas superciliosa    B50966       WA           10/16/2004      M    -15.5711   128.7581
Anas superciliosa    B51096       NT           10/20/2004      F    -15.4869   129.7344
Anas superciliosa    B51132       NT           8/28/2004       U    -12.5972   131.3444
Anas superciliosa    R29651       CYP          7/4/2008        M    -17.491    140.8341
Anas superciliosa    R29652       CYP          7/14/2008       M    -15.633    141.817
Anas superciliosa    R29655       CYP          9/11/2008       M    -15.633    141.817
Anas superciliosa    R29660       CYP          9/23/2008       M    -13.8446   141.60544
Anas superciliosa    R29661       CYP          9/23/2008       U    -13.8446   141.60544
Anas superciliosa    R29662       CYP          9/23/2008       M    -13.8446   141.60544
Anas superciliosa    R29663       CYP          9/23/2008       M    -13.8446   141.60544
Anas superciliosa    R29664       CYP          9/23/2008       M    -13.8446   141.60544
Anas superciliosa    R29665       CYP          9/23/2008       U    -13.8446   141.60544
Anas superciliosa    R29666       CYP          9/23/2008       U    -13.8446   141.60544
Anas superciliosa    R29667       CYP          9/23/2008       M    -13.8446   141.60544
Anas superciliosa    R29668       CYP          9/23/2008       U    -13.8446   141.60544
Anas superciliosa    R29670       CYP          9/24/2008       U    -13.8446   141.60544
Anas superciliosa    R29671       CYP          9/24/2008       M    -13.8446   141.60544
Anas superciliosa    R29680       CYP          9/28/2008       U    -13.8446   141.60544
Anas superciliosa    R29681       CYP          9/28/2008       U    -13.8446   141.60544
Anas superciliosa    B29561       CYP          10/12/2001      F    -17.4986   140.8483
Anas superciliosa    B29562       CYP          10/12/2001      M    -17.4986   140.8483
Anas superciliosa    B51504       CYP          11/4/2006       F    -13.0019   142.0583
Anas superciliosa    B51583       CYP          11/6/2006       M    -13.0019   142.0583
Anas superciliosa    B51606       CYP          11/9/2006       M    -15.4367   141.7181
Anas superciliosa    B51607       CYP          11/9/2006       F    -15.4367   141.7181
Anas superciliosa    B51609       CYP          11/9/2006       F    -15.4367   141.7181
Anas superciliosa    B34813       QLD          1/1/2006        U    -24.0947   143.1683
Anas superciliosa    B34881       QLD          5/17/2003       M    -19.2833   146.8033
Anas superciliosa    B56175       WES          7/16/2014       M    -8.97432   141.26421
Anas superciliosa    B56176       WES          7/16/2014       M    -8.97432   141.26421
Anas superciliosa    B56177       WES          7/16/2014       F    -8.97432   141.26421
Anas superciliosa    B56058       CEN          10/27/2013      M    -8.97723   146.7836
Dendrocygna arcuata  B50968       WA           10/16/2004      M    -15.5711   128.7581
Dendrocygna arcuata  B34011       NT           10/29/2003      F    -12.8317   131.6519
Dendrocygna arcuata  B51130       NT           8/27/2004       M    -12.5972   131.3444
Dendrocygna arcuata  R29659       CYP          9/21/2008       U    -13.983    141.663
Dendrocygna arcuata  R29672       CYP          9/27/2008       U    -13.8446   141.60544
Dendrocygna arcuata  R29673       CYP          9/27/2008       U    -13.8446   141.60544
Dendrocygna arcuata  R29674       CYP          9/27/2008       U    -13.8446   141.60544
Dendrocygna arcuata  R29675       CYP          9/27/2008       U    -13.8446   141.60544
Dendrocygna arcuata  R29676       CYP          9/27/2008       U    -13.8446   141.60544
Dendrocygna arcuata  R29677       CYP          9/27/2008       U    -13.8446   141.60544
Dendrocygna arcuata  R29678       CYP          9/27/2008       U    -13.8446   141.60544
Dendrocygna arcuata  R29684       CYP          6/15/2009       U    -15.633    141.817
Dendrocygna arcuata  R29693       CYP          6/25/2009       U    -15.633    141.817
Dendrocygna arcuata  R29694       CYP          6/25/2009       U    -15.633    141.817
Dendrocygna arcuata  WWD1         CYP          9/20/2008       U    -13.983    141.663
Dendrocygna arcuata  B51629       CYP          11/10/2006      M    -15.4414   141.7775
Dendrocygna arcuata  B51630       CYP          11/10/2006      F    -15.4414   141.7775
Dendrocygna arcuata  B51631       CYP          11/10/2006      F    -15.4414   141.7775
Dendrocygna arcuata  B51632       CYP          11/10/2006      M    -15.4414   141.7775
Dendrocygna arcuata  B51633       CYP          11/10/2006      M    -15.4414   141.7775
Dendrocygna arcuata  B56371       WES          7/23/2014       F    -9.00801   141.2184
Dendrocygna arcuata  B56372       WES          7/23/2014       M    -9.00801   141.2184
Dendrocygna arcuata  B56376       WES          7/23/2014       M    -9.03352   141.13686
Dendrocygna arcuata  B56385       WES          7/23/2014       M    -9.00445   141.18745
Dendrocygna arcuata  B56037       CEN          10/26/2013      M    -8.99727   146.80446
Dendrocygna arcuata  B56073       CEN          10/27/2013      M    -8.99727   146.80446
Nettapus pulchellus  B50682       WA           10/3/2004       F    -17.8464   122.7406
Nettapus pulchellus  B50683       WA           10/3/2004       M    -17.8464   122.7406
Nettapus pulchellus  B50684       WA           10/3/2004       F    -17.8464   122.7406
Nettapus pulchellus  B50685       WA           10/3/2004       F    -17.8464   122.7406
Nettapus pulchellus  B50686       WA           10/3/2004       M    -17.8464   122.7406
Nettapus pulchellus  B50687       WA           10/3/2004       M    -17.8464   122.7406
Nettapus pulchellus  B34021       NT           10/29/2003      M    -12.8317   131.6519
Nettapus pulchellus  B34136       NT           12/21/2003      M    -12.4667   130.9833
Nettapus pulchellus  B51403       QLD          10/30/2006      F    -14.6139   144.2686
Nettapus pulchellus  B51404       QLD          10/30/2006      M    -14.6139   144.2686
Nettapus pulchellus  B51405       QLD          10/30/2006      F    -14.6139   144.2686
Nettapus pulchellus  B56373       WES          7/23/2014       F    -9.00801   141.2184
Nettapus pulchellus  B56035       CEN          10/26/2013      F    -8.99727   146.80446
Nettapus pulchellus  B56036       CEN          10/26/2013      M    -8.99727   146.80446
Tadorna radjah       B50849       WA           10/11/2004      M    -15.7036   127.8517
Tadorna radjah       B50850       WA           10/11/2004      M    -15.7036   127.8517
Tadorna radjah       B50851       WA           10/11/2004      M    -15.7036   127.8517
Tadorna radjah       B50912       WA           10/13/2004      U    -15.7036   127.8517
Tadorna radjah       B50913       WA           10/13/2004      M    -15.7036   127.8517
Tadorna radjah       B54962       WA           7/13/2010       F    -15.6014   128.4906
Tadorna radjah       B29998       NT           10/17/2002      M    -12.4211   131.2206
Tadorna radjah       B33550       NT           10/23/2002      M    -12.42     131.2242
Tadorna radjah       B34009       NT           10/29/2003      F    -12.8317   131.6519
Tadorna radjah       B34010       NT           10/29/2003      M    -12.8317   131.6519
Tadorna radjah       B54596       NT           7/10/2009       F    -14.742    135.292
Tadorna radjah       B54630       NT           7/11/2009       F    -14.717    135.305
Tadorna radjah       B54631       NT           7/11/2009       F    -14.717    135.305
Tadorna radjah       B29563       CYP          10/12/2001      M    -17.4986   140.8483
Tadorna radjah       B51407       CYP          10/30/2006      F    -14.6139   144.2686
Tadorna radjah       B51409       CYP          10/30/2006      F    -14.6306   144.225
Tadorna radjah       B51410       CYP          10/30/2006      F    -14.6306   144.225
Tadorna radjah       B51411       CYP          10/30/2006      M    -14.6306   144.225
Tadorna radjah       R29653       CYP          7/14/2008       F    -15.633    141.817
Tadorna radjah       R29654       CYP          10/9/2008       U    -15.633    141.817
Tadorna radjah       R29656       CYP          10/9/2008       U    -15.633    141.817
Tadorna radjah       R29657       CYP          11/9/2008       U    -15.633    141.817
Tadorna radjah       B39415       QLD          6/26/1985       F    -22.1667   148.5
Tadorna radjah       B43833       QLD          8/30/1991       M    -22.4108   150.2717
Tadorna radjah       B43834       QLD          8/30/1991       F    -22.4108   150.2717
Tadorna radjah       B44218       QLD          2/2/1992        M    -22.4167   150.2778
Tadorna radjah       B56148       WES          7/16/2014       F    -8.86682   141.24431
Tadorna radjah       B56149       WES          7/16/2014       M    -8.86682   141.24431
Tadorna radjah       B56342       WES          7/22/2014       F    -8.93294   141.27327
Tadorna radjah       B55988       CEN          10/25/2013      F    -9.06799   146.83099
Tadorna radjah       B55989       CEN          10/25/2013      M    -9.06799   146.83099

Appendix 1 S2: Double-digest restriction site associated DNA sequencing library preparation protocol

1. Extract genomic DNA from samples using a QIAgen DNeasy Blood & Tissue Kit following the manufacturer's protocol.

2. Measure DNA concentration with a Nanodrop; dilute DNA to a concentration of 1 µg in 30 µl of solution (note: we have successfully used the protocol with as little as 0.25 µg).

3. Cut DNA with 20 units each of the restriction enzymes SbfI and EcoRI.

a. Reaction: 1 µg DNA template (note: we have successfully used the protocol with as little as 0.25 µg), 20U SbfI, 20U EcoRI, 5 µl 10x NEB Buffer 4, ddH2O to bring total volume to 50 µl.

b. Thermocycler conditions: 30 min at 37°C, 20 min at 65°C, and hold at 4°C.

4. Ligate adapters to digested DNA.

a. Barcoded P1 adapter ligates to the overhang produced by SbfI (includes barcodes for each sample).

P1 Adapter:
Top: 5-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTxxxxxxTGCA-3
Bottom: 3-TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAxxxxxx-Phos-5
xxxxxx = barcode (see Appendix 1 S2: Table S2.1)

b. Divergent-Y P2 adapter ligates to the overhang produced by the “common” enzyme (includes indexes for each sample).

P2 Adapter:
Top: 5-Phos-ATTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACiiiiiiATCAGAACAA-3
Bottom: 3-TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGiiiiiiTAGAGCATACGGCAGAAGACGAAC-5
iiiiii = index (see Appendix 1 S2: Table S2.1)

c. Reaction: 50 µl, 2 µl 10x NEB Buffer 4, 0.6 µl rATP, 4 µl of P1 adapter, 12 µl of P2 adapter, 0.4 µl of ddH2O, and 1 µl of T4 ligase.

d. Thermocycler conditions: 30 min at 22°C, 20 min at 65°C, and hold at 4°C.

5. Size select pooled sample in agarose gel.

a. Run on 2% low melt agarose gel.

b. Add pooled internal size standards (300 and 450 bp) to each sample.

c. Run gel electrophoresis for ~2.5–3.5 hours at 110 volts.

d. Size select by cutting on the internal edge of each size standard (just below 450 bp and just above 300 bp).

6. Purify size-selected samples using the QIAgen MinElute kit following the manufacturer’s protocol.

7. Amplify size-selected fragments via PCR.

a. Primers:
RAD1.F*: 5’-AATGATACGGCGACCACCGAG-3’
RAD2.R*: 5’-CAAGCAGAAGACGGCATACGAG-3’

b. Reaction: 15 µl of template, 3 µl of F primer, 3 µl of R primer, 9 µl of ddH2O, and 30 µl of Phusion Mix.

c. Thermocycler conditions: 30 sec at 98°C; 18–26 cycles of 10 sec at 98°C, 30 sec at 60°C, and 40 sec at 72°C; 5 min at 72°C; and hold at 4°C.

8. Purify PCR products with AMPure XP beads following the manufacturer’s protocol.

9. Quantify purified PCR products using the KAPA Biosystems SYBR FAST qPCR kit following the manufacturer’s protocol.

a. Reaction: 2 µl of template DNA, 2 µl of ddH2O, and 6 µl of SYBR FAST mix + primers RAD1.F & RAD1.R.

b. qPCR profile conditions: 5 min at 95°C; 35 cycles of 30 sec at 95°C and 45 sec at 60°C; and hold at 4°C.

10. Pool samples in equimolar concentrations.

11. Submit library for Illumina sequencing.
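Step 10 (equimolar pooling) amounts to a simple calculation once library concentrations are known from the qPCR in step 9. The sketch below is an illustration under stated assumptions and is not part of the original protocol: it assumes concentrations are expressed in nM (so that 1 nM equals 1 fmol per µl), and the function name and the target amount of 10 fmol per library are hypothetical.

```python
def equimolar_volumes(conc_nm, fmol_per_library=10.0):
    """Return the volume (µl) of each library that contributes equal moles.

    conc_nm:          dict of library name -> concentration in nM
                      (1 nM = 1 fmol/µl).
    fmol_per_library: hypothetical target amount contributed by each library.
    """
    if any(c <= 0 for c in conc_nm.values()):
        raise ValueError("concentrations must be positive")
    # volume (µl) = amount (fmol) / concentration (fmol/µl)
    return {name: fmol_per_library / c for name, c in conc_nm.items()}
```

With this rule, a library at 2 nM contributes 5 µl and a library at 5 nM contributes 2 µl, so both add 10 fmol to the pool.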

Appendix 1 S2: Table S2.1. Barcodes and adaptors used in ddRAD-seq protocol.

Barcodes for P1 adaptors (xxxxxx)

Sbf001 ACACCT    Sbf003 ACAGCA    Sbf005 ACCAAG    Sbf007 ACCTAC
Sbf011 ACGTAG    Sbf014 ACTCGT    Sbf017 AGACCA    Sbf018 AGACGT
Sbf022 AGCATG    Sbf024 AGCTTC    Sbf027 AGGTAC    Sbf028 AGGTTG
Sbf029 AGTCCT    Sbf030 AGTCGA    Sbf034 CAACTC    Sbf036 CAAGTG
Sbf039 CACTCA    Sbf040 CACTGT    Sbf043 CAGTCT    Sbf046 CATCTG
Sbf051 CTAGAG    Sbf052 CTAGTC    Sbf053 CTCACA    Sbf054 CTCAGT
Sbf059 CTGTCA    Sbf060 CTGTGT    Sbf061 CTTCAG    Sbf062 CTTCTC
Sbf066 GAACTG    Sbf067 GAAGAG    Sbf070 GACAGT    Sbf072 GACTGA
Sbf073 GAGACT    Sbf075 GAGTCA    Sbf077 GATCAG    Sbf078 GATCTC
Sbf082 GTACTC    Sbf086 GTCAGA    Sbf090 GTGAGT    Sbf094 GTTCTG
Sbf098 TCACGT    Sbf102 TCCATG    Sbf106 TCGATC    Sbf110 TCTCGA
Sbf116 TGAGGT    Sbf117 TGCAAG    Sbf122 TGGATG    Sbf125 TGTCCA

Indexes for P2 adaptors (iiiiii)

i1 ATCACG    i2 CGATGT    i3 GCTGTA    i4 TAGCAC
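Because samples are distinguished by these six-base barcodes, it can be useful to know how many sequencing errors would be needed to turn one barcode into another. The sketch below is an illustrative check only and is not part of the pipeline scripts: the function names are hypothetical, and it simply computes the smallest number of mismatches (Hamming distance) between any two barcodes in a list.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_barcode_distance(barcodes):
    """Smallest pairwise Hamming distance within a set of barcodes."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))
```

For example, Sbf001 (ACACCT) and Sbf003 (ACAGCA) differ at two positions, so a single substitution error in one of them cannot produce the other.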

Appendix 1 S3: Double-digest restriction site associated DNA sequencing bioinformatic pipeline

Install the newest version of these programs on your computer

Usearch v. 5.0

Blastn v. 2.2.25

Muscle v. 3.8.31

Python3

Geneious

Steps (adapted from DaCosta and Sorenson, 2014; scripts are available at https://github.com/BU-RAD-seq/ddRAD-seq-Pipeline; more details are provided in the official manual: https://github.com/BU-RAD-seq/ddRAD-seq-Pipeline/blob/master/MANUAL.txt)

1. Change all index files and undetermined files from fastq to qseq.

a. python3 fastq2qseq1.8.py

2. Demultiplex sequences.

a. python3 First12.py filename.qseq

b. python3 IndexReads.py undetermined.qseq

c. python3 ParseImperfect.py undetermined.qseq

d. python3 ddRADparser.py SPECS.FILE

3. Condense (C) identical reads within each individual sample.

a. python3 CondenseSequences.py samplename.qseq 8

4. Filter (F) reads within each individual sample.

a. python3 FilterSequences.py samplenameC.qseq 8

5. Concatenate Condensed and Filtered (CF) files for samples of interest.

a. cat *CF.qseq > projectname.cat.qseq

6. Make stacks from all reads within the concatenated CF file (this also pulls the best sequence per stack for the step 7 BLAST search).

a. python3 uCLust85toBLAST.py projectname.cat.qseq #

7. BLAST representative sequences from each cluster (BLASTn)

a. python3 runBLAST.py projectname.cat.qseq \Anas_1.0 (duck genome)

8. Combine clusters with the same blast hit.

a. python3 CombineClusters.py projectname.cat.qseq

9. Align each cluster with MUSCLE.

a. python3 AlignClustersP2.py projectname.cat.qseq # hostlist

b. python3 AlignClusters.py projectname.cat.qseq #

10. Genotype individuals within each stack.

a. python3 RADGenotypes117.py projectname.cat.qseq # CCTGCAGG

Look at Clusterssummary to:

11. Find W-linked clusters for sexing (FEMALES > 0 DEPTH; MALES ~ 0 DEPTH)

a) Make excel file with clstr # & subsequent depth reads for all individuals.

12. Find good clusters:

a) For each cluster do:

=IF(AND(COUNTIF(depths.across.indiv.4cluster.X,">9")> 20% cutoff, , “Good.Colum> 20%cutoff,SUM(BadRatio, ExtraReads, 3rdAllele)<7),AVERAGE(depths.across.indiv.4cluster.X),0)

b) Sort clusters large > small == ALL GOOD CLUSTERS WILL HAVE #, BAD = 0

13a. Make FASTA file of all clusters for manual editing and their respective best BLAST

hit to be manually checked in Geneious.

- python3 rad2fasta_list.py projectname.cat.out (# of samples) Clstrlist.txt

13b. Get best BLAST hit for all clusters needing revision.

- python3 GetSeq_JD.py file.w.blast.txt 150

13c. Incorporate edited alignments into genotype file.

- python3 IncorporateEdits.py projectname.cat.qseq (# of samples)

Do this after final genotypes are obtained:

14. Autosome vs. Sex chromosomes: calculates depth and heterozygosity per locus:

a. Create list of clusters that will be retained in dataset (variable + constant)

b. python3 rad2parse_clstrs.py projectname.catRV.out (#of samples) projectname.xM-yF.list

c. python3 rad2HetSexStats.py projectname.catRV_parse.out (#of samples) sex.file

15. For each marker type out file, run script to get final structure.

Appendix 1 S4: Inbreeding coefficients, kinship coefficients, and additional principal component analyses for Radjah Shelduck and Wandering Whistling-Duck. Abbreviations and specimen prefixes as in Appendix 1 S1.

Appendix 1 S4: Fig. S4.1: Heat map showing pairwise comparisons of kinship coefficients and an inbreeding coefficient for Radjah Shelducks. In the pairwise comparisons, the increasingly darker shading in a box is positively correlated with the proportion of alleles that are identical by state between those individuals compared. In the same respect, an increase in the dark shading of inbreeding coefficient is positively correlated with the proportion of loci containing two alleles that are identical by state (homozygosity) relative to the probability of randomly sampling two identical alleles from the entire population. Note the high inbreeding coupled with high kinship within populations for most samples. In contrast, two pairs of individuals from WA had high kinship but comparatively low inbreeding, suggesting they were indeed siblings.


Appendix 1 S4: Fig. S4.2: Heat map showing pairwise comparisons of kinship coefficients and an inbreeding coefficient for Wandering Whistling-Ducks. In the pairwise comparisons, the increasingly darker shading in a box is positively correlated with the proportion of alleles that are identical by state between those individuals compared. In the same respect, an increase in the dark shading of inbreeding coefficient is positively correlated with the proportion of loci containing two alleles that are identical by state (homozygosity) relative to the probability of randomly sampling two identical alleles from the entire population. In general, most kinship coefficients were similar to inbreeding coefficients. However, two groups of individuals from CYP had high kinship coefficients, but low inbreeding coefficients, suggesting that the high genetic similarity resulted from sampling sibgroups.


Appendix 1 S4: Fig. S4.3: Heat map showing pairwise comparisons of kinship coefficients and an inbreeding coefficient for Green Pygmy-Goose samples. In the pairwise comparisons, the increasingly darker shading in a box is positively correlated with the proportion of alleles that are identical by state between those individuals compared. In the same respect, an increase in the dark shading of inbreeding coefficient is positively correlated with the proportion of loci containing two alleles that are identical by state (homozygosity) relative to the probability of randomly sampling two identical alleles from the entire population. Note the overall low inbreeding and kinship coefficients, suggesting a lack of population structure and of siblings in our sample.


Appendix 1 S4: Fig. S4.4: Heat map showing pairwise comparisons of kinship coefficients and an inbreeding coefficient for Pacific Black Duck samples. In the pairwise comparisons, the increasingly darker shading in a box is positively correlated with the proportion of alleles that are identical by state between those individuals compared. In the same respect, an increase in the dark shading of inbreeding coefficient is positively correlated with the proportion of loci containing two alleles that are identical by state (homozygosity) relative to the probability of randomly sampling two identical alleles from the entire population. Note the overall low inbreeding and kinship coefficients, suggesting a lack of population structure and of siblings in our sample. However, at least two individuals in our sample appear to be inbred, despite a lack of population structure.

Appendix 1 S5: R code for Principal component analysis.

title: "PCA"

author: "Sara Seibert"

date: "1/15/2020"

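1. Load package(s): adegenet

2. Read in data. read.structure() returns a genind object, which combines both alleles for each individual into a single row.

```{r}
library(adegenet)  # dudi.pca() comes from ade4, which adegenet loads as a dependency
obj1 <- read.structure("RASH.N29.L1789.NA.stru")
```

3. Convert the genind object to a matrix of scaled allele frequencies, replacing missing genotypes with the mean allele frequency.

```{r}
X <- scaleGen(obj1, NA.method = "mean")
```

4. Calculate principal components from the scaled matrix.

```{r}
# Centering and scaling are switched off because scaleGen() has already applied them
pca1 <- dudi.pca(X, center = FALSE, scale = FALSE, scannf = FALSE, nf = 10)
```

5. Write outfile with component scores.

```{r}
write.csv(cbind(dimnames(X)[[1]], pca1$li),
          "RASH_02032020_NA_PCscores.csv", row.names = FALSE)
```

6. Plot the eigenvalues associated with each principal component.

```{r}
plot(pca1$eig)
```

7. Calculate the eigenvalue percent (percentage of variance) for each component.

```{r}
eig.perc <- 100 * pca1$eig / sum(pca1$eig)
View(eig.perc)
```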

[Figure S6.1: scatter plot of PC 1 (x-axis) against PC 2 (y-axis).]

Appendix 1 S6: Fig. S6.1: Principal component analysis using 1,789 variable ddRADseq loci from 31 Radjah Shelducks. Individuals for each sample site (dots): western PNG (WP; black), central PNG (CP; diagonal stripe), western Australia (WA; light gray), Northern Territory (NT; white), Cape York Peninsula (CYP; gray), and eastern Queensland (QLD; horizontal stripe). The PCA revealed site-specific clusters when siblings were included in our analysis. However, these results were qualitatively similar to the analysis that excluded sibgroups (Fig. 1B).

[Figure S6.2: scatter plot of PC 1 (x-axis) against PC 2 (y-axis).]

Appendix 1 S6: Fig. S6.2: Principal component analysis using 2,812 ddRADseq loci from 26 Wandering Whistling-Ducks. Individuals for each sample site (dots): western PNG (WP; black), central PNG (CP; diagonal stripe), western Australia (WA; light gray), Northern Territory (NT; white), Cape York Peninsula (CYP; gray), and eastern Queensland (QLD; horizontal stripe). The PCA revealed three distinct clusters: two clusters of CYP samples and one cluster containing additional CYP samples and individuals from the remaining sampling sites. It was determined these two CYP clusters contained highly related individuals, and when excluded from the analysis, there was no evidence of population structure (Fig. 2B).

Appendix 1 S7: Table S7.1: Pairwise estimates of relative divergence (ΦST) between sampling sites: western PNG (WP), central PNG (CP), western Australia (WA), Northern Territory (NT), Cape York Peninsula (CYP), and eastern Queensland (QLD) for Radjah Shelduck (RASH), Wandering Whistling-Duck (WWDU), Pacific Black Duck (PBDU) and Green Pygmy-Goose (GPYG).

ΦST by species (columns: Radjah Shelduck, RASH; Wandering Whistling-Duck, WWDU; Pacific Black Duck, PBDU; Green Pygmy-Goose, GPYG)

Comparison   RASH     WWDU     PBDU     GPYG
CYP/NT       0.1041   0.013    0.019
CYP/QLD      0.097    0.016    0.0087
CYP/WA       0.054    0.022    0.0054
CYP/WP       0.052    0.0068   0.0080
CYP/CP       0.8098   0.011    0.018
NT/QLD       0.12     0.021    0.021    0.016
NT/WA        0.011    0.037    0.017    0.0081
NT/WP        0.081    0.018    0.030    -0.024
NT/CP        0.12     0.020    0.035    0.0085
QLD/WA       0.12     0.030    0.0068   0.0026
QLD/WP       0.12     0.026    0.012    -0.025
QLD/CP       0.17     0.028    0.021    0.0097
WA/WP        0.086    0.021    0.0095   -0.036
WA/CP        0.12     0.028    0.013    0.0072
WP/CP        0.064    0.016    0.014    -0.036

Appendix 1 S8: Table S8.1: The shelduck loci represented by principal components 1 and 2 in the PCA. DNA sequences from shelduck samples were mapped to the mallard and chicken genomes. Reference genome matches with the highest sequence similarity are listed with the known function and reference.

Appendix 1 S9: R code for non-metric multidimensional scaling

title: "RASH_1054_nmds"

author: "Sara Seibert"

date: "6/26/2019"

1. Load packages: ggplot2, vegan, and MASS

2. Read in data.

```{r}
library(ggplot2)
library(vegan)  # vegdist(), metaMDS(), stressplot(), ordiplot()
library(MASS)   # isoMDS()
RASH_29__1054_nmds <- read.csv("C:/Users/seibe/Documents/AUS_PNG/AUSPNG_data/RASH_data/RASH_1054.csv",
                               header = TRUE, sep = ",", na.strings = c("", "NA"))
```

3. Identify variables in dataset (columns 2 to 1055 hold the 1,054 loci).

```{r}
data <- RASH_29__1054_nmds[, 2:1055]
```

4. Create dissimilarity matrix. As written, vegdist() uses its default Bray-Curtis index; a Manhattan type matrix would require method = "manhattan".

```{r}
RASH_1054 <- vegdist(data)
plot(RASH_1054, type = "t")
```

5. Create a list of item points and associated stress.

```{r}
RASH.mds0 <- isoMDS(RASH_1054)
print(RASH.mds0)
```

6. Generate stress plot (Shepard plot), in which ordination distances are plotted against the observed dissimilarities and the fit is shown as a monotone step line. The plot reports the non-metric fit and the linear fit (the null model is that ordination distances are all equal).

```{r}
stressplot(RASH.mds0, RASH_1054)
```

7. Plot the ordination. NMDS maps observed dissimilarities nonlinearly onto ordination space and can therefore handle non-linear responses of any shape.

```{r}
ordiplot(RASH.mds0, type = "t")
```

8. Iterative search using several different random starts; selects among similar solutions to reduce the stress.

```{r}
RASH_1054 <- metaMDS(data)
```

9. Non-metric multidimensional scaling using Bray-Curtis distance.

```{r}
nMDS <- metaMDS(data, distance = "bray", k = 2)
```

10. Create and plot an nMDS Shepard diagram with the Spearman correlation.

```{r}
par(mfrow = c(1, 2), mar = c(3.5, 3.5, 3, 1), mgp = c(2, 0.6, 0), cex = 0.8, las = 1)
spear <- round(cor(vegdist(data, method = "bray"), dist(nMDS$points), method = "spearman"), 3)
plot(vegdist(data, method = "bray"), dist(nMDS$points),
     main = "Shepard diagram of nMDS",
     xlab = "True Bray-Curtis distance", ylab = "Distance in the reduced space")
mtext(line = 0.1, text = paste0("Spearman correlation = ", spear), cex = 0.7)
```

11. Code to generate a bar plot of stress by number of dimensions for nMDS

```{r}
n <- 10
stress <- vector(length = n)
for (i in 1:n) {
  stress[i] <- metaMDS(data, distance = "bray", k = i)$stress
}
names(stress) <- paste0(1:n, "Dim")
# x11(width = 10/2.54, height = 7/2.54)
par(mar = c(3.5, 3.5, 1, 1), mgp = c(2, 0.6, 0), cex = 0.8, las = 2)
barplot(stress, ylab = "stress")
```

[Figure S10.1: NMDS plot titled RASH_1054_nmds; x-axis DIM 1, y-axis DIM 2; Stress = 0.13.]

Appendix 1 S10: Fig. S10.1: To eliminate confounding effects of missing data on the orthogonal transformation, we also used non-metric multidimensional scaling (NMDS) for the 29 RASH samples. After filtering the 1,789 loci for missing data, we recovered 1,054 loci with categorical variables for all 29 samples. Each sample is represented by two dots, rather than one as in the PCA. In general, the NMDS spatial pattern of shelducks resembles the PCA plot. Samples cluster together based on site.

[Figure S10.2: NMDS plot titled WWDU_2114_NMDS; x-axis DIM 1, y-axis DIM 2; Stress = 0.18.]

Appendix 1 S10: Fig. S10.2: To eliminate confounding effects of missing data on the orthogonal transformation, we also used non-metric multidimensional scaling (NMDS) for the 18 Wandering Whistling-Duck samples. After filtering the 2,812 loci for missing data, we recovered 2,114 loci with categorical variables for all 18 samples. Each sample is represented by two dots, rather than one as in the PCA. In general, the NMDS spatial pattern of whistling-ducks resembles the PCA plot. Samples are dispersed and do not cluster together based on site.

[Figure S10.3: NMDS plot titled GPYG_1404_NMDS; x-axis DIM 1, y-axis DIM 2; Stress = 0.14.]

Appendix 1 S10: Fig. S10.3: To eliminate confounding effects of missing data on the orthogonal transformation, we also used non-metric multidimensional scaling (NMDS) for the 14 Green Pygmy-Geese. After filtering the 2,100 loci for missing data, we recovered 1,404 loci with categorical variables for all 14 samples. Each sample is represented by two dots, rather than one as in the PCA. In general, the NMDS spatial pattern of pygmy-geese resembles the PCA plot. Samples are dispersed and do not cluster together based on site.

[Figure S10.4: NMDS plot titled PBDU_1025_NMDS; x-axis DIM 1, y-axis DIM 2; Stress = 0.24.]

Appendix 1 S10: Fig. S10.4: To eliminate confounding effects of missing data on the orthogonal transformation, we also used non-metric multidimensional scaling (NMDS) for the 40 Pacific Black Ducks. After filtering the 2,737 loci for missing data, we recovered 1,025 loci with categorical variables for all 40 samples. Each sample is represented by two dots, rather than one as in the PCA. In general, the NMDS spatial pattern of black ducks resembles the PCA plot. Samples are dispersed and do not cluster together based on site.

Chapter 2: Host-parasite population genomics

Introduction

A burgeoning topic in disease ecology concerns the epidemiological importance of host-population genetics (Little and Ebert, 2000; Archie et al., 2009; Biek and Real, 2010; Blanchong et al., 2016). Among the key issues being investigated is host movement ecology and how it influences the movement of pathogens (Winker et al., 2007; Bataille et al., 2009; Biek and Real, 2010). By understanding the spatial and genetic connectivity between host and pathogen populations, one may be able to qualitatively infer the potential for spread of disease and the prospective paths pathogens may travel across landscapes (Biek and Real, 2010; Altizer et al., 2011).

Host movement can define the genetic structure of parasite populations (Price, 1980; Blouin et al., 1995; Nadler, 1995; Davies et al., 1999; Criscione & Blouin, 2004; Strobel et al., 2016). Obligate parasites rely on the availability of suitable hosts to survive and complete their life cycle (Combes, 1997). When infected hosts move across the landscape, they can provide opportunities for parasites to infect novel hosts and to disperse effectively across large geographic distances (Real and Biek, 2007). Obligate parasite genetic connectivity is dependent on the infected hosts’ movement and contact with susceptible hosts (McCoy et al., 2003; Real and Biek, 2007; Barret et al., 2008; Biek and Real, 2010; Daversa et al., 2017). Concordance between host and parasite population structure may indicate a highly dependent interaction (specialists) between these two organisms. Discordance in population structure may reflect differences in effective dispersal, effective population sizes, or life history traits (generalists) (Mulvey et al., 1991; McCoy et al., 2005; Keeney et al., 2009; Strobel et al., 2016).

Estimates of host-parasite population structure are even more complex for heteroxenous parasites, which require a minimum of two hosts to complete their life cycle. The genetic structure of heteroxenous parasites might be predicted to reflect a mixture of the two host species’ movements. However, Jarne and Théron (2001) predicted that the genetic structure of the parasite is dependent on the host with the highest genetic connectivity or dispersal. Any potential signal of local adaptation or systematic differences in allele frequency from the more strongly isolated host would be erased by a more motile host with more frequent dispersal events (Jarne and Théron, 2001; Keeney et al., 2009; Louhi et al., 2010). The relative role each host plays in its parasite’s genetic structure is still largely unexplored (Witsenburg et al., 2015).

A distinct class of heteroxenous parasites includes those transmitted by vectors (Witsenburg et al., 2015). For these parasites, the two host species required for completion of their life cycle are from highly diverged taxa (i.e., invertebrate versus vertebrate). Moreover, these parasites are obligate and cannot live outside their hosts. Multiple encounters between host and vector are then necessary for the sustainability of the parasite population. Witsenburg et al. (2015) compared the genetic structure of a haemosporidian parasite (Polychromophilus melanipherus), its vector the bat fly (Nycteribia schmidlii), and its host the bent-winged bat (Miniopterus schreibersii), and revealed that the parasite’s population structure matched neither that of the vector nor that of the host. This study suggests that gene flow within the P. melanipherus parasite population does not simply reflect that of the most motile host but is likely a summation of the host and vector dispersal patterns.

Mosquitoes are another well-studied vector of pathogens that cause diseases such as malaria and arboviral infections including West Nile and Zika (Huang et al., 2019). Avian malaria is a diseased host state caused by a parasitic protozoan (Plasmodium spp.) infection. Plasmodium spp. are obligate blood parasites that require an invertebrate host (mosquito) for sexual reproduction and a vertebrate host for asexual reproduction to complete their life cycle. Avian hosts and Plasmodium spp., the latter of which have been found in most bird species worldwide (Valkiūnas, 2005), provide excellent model systems for determining how spatial variation in host movement and distribution influences parasite genetic structure and adaptive host trait evolution (Cumming et al., 2012a).

Our study focuses on Plasmodium infections of one of the most common species of waterfowl in southern Africa, the Red-billed Duck (Anas erythrorhyncha; Irwin, 1981; Tarboton et al., 1987; Parker, 1994). The Red-billed Duck is a semi-nomadic dabbling duck with a range that extends from South Africa to Ethiopia and the Sudan and includes Madagascar. This range includes several potential barriers to gene flow including the Namib Desert, Kalahari Desert, Great Karoo plateau basin, and the Mozambique Channel (Kear, 2005; Figure 2.1). Using 3 years of satellite telemetry data, Cumming et al. (2012b)

Figure 2.1: Range map of Red-billed Duck in southern Africa. The distribution is outlined in gray. Sample sites are indicated by the colored dots: Barberspan, South Africa (BAR; orange); Strandfontein Sewage Treatment Plant in Cape Town, South Africa (STR; green); KwaZulu-Natal, South Africa (KZN; gray); Lake Ngami, Botswana (NGA; yellow); Lake Chuali, Mozambique (CHU; black); Namibia (NAM; red); and Lake Manyame and Chivero, Zimbabwe (ZW; blue).

determined the range of Red-billed Duck is largely influenced by life history and habitat requirements related to breeding success and annual molting migrations. Current movements likely reflect a historical response to environmental variability, rather than a short-term selection of optimal habitat for breeding or escaping resource attrition. Based on the Red-billed Duck’s capacity and predisposition for dispersal, we predict the genetic connectivity to be high across its known range. Moreover, we predict the population structure (or lack thereof) of their parasites (Plasmodium spp.) to be similar to that of this motile host. Here, we compare population structure of the host to the parasite to inform our general understanding of the role the host may be playing in the movement of these parasites across southern Africa.

Materials and Methods

Sample collection

Blood or tissue samples were collected from 68 Red-billed Ducks from seven sampling sites scattered across southern Africa. These comprised three sites in South Africa (Barberspan, BAR, N = 15; Strandfontein Sewage Treatment Plant in Cape Town, STR, N = 9; and KwaZulu-Natal, KZN, N = 6) and one site each in Botswana (Lake Ngami, NGA; N = 8), Mozambique (Lake Chuali, CHU; N = 2), Namibia (NAM; N = 10), and Zimbabwe (Lake Manyame and Chivero, ZW; N = 18) (Table 2.1). With the exception of the ten individuals from Namibia, which were only represented by muscle samples, all blood samples

Table 2.1: Sampling information of Red-billed Duck.

Sample site                   Sample I.D.  Tissue Type
Barberspan, South Africa      BAR0292a     Blood
Barberspan, South Africa      BAR0553a     Blood
Barberspan, South Africa      BAR0750a     Blood
Barberspan, South Africa      BAR1096a     Blood
Barberspan, South Africa      BAR1115a     Blood
Barberspan, South Africa      BAR1141a     Blood
Barberspan, South Africa      BAR1156a     Blood
Barberspan, South Africa      BAR1164a     Blood
Barberspan, South Africa      BAR1174a     Blood
Barberspan, South Africa      BAR1177a     Blood
Barberspan, South Africa      BAR1269a     Blood
Barberspan, South Africa      BAR1273a     Blood
Barberspan, South Africa      BAR1461a     Blood
Barberspan, South Africa      BAR1811a     Blood
Barberspan, South Africa      BAR2021a     Blood
Lake Chuali, Mozambique       CHU0102a     Blood
Lake Chuali, Mozambique       CHU0120a     Blood
KwaZulu-Natal, South Africa   KZN001a      Blood
KwaZulu-Natal, South Africa   KZN002a      Blood
KwaZulu-Natal, South Africa   KZN003a      Blood
KwaZulu-Natal, South Africa   KZN004a      Blood
KwaZulu-Natal, South Africa   KZN005a      Blood
KwaZulu-Natal, South Africa   KZN006a      Blood
Lake Ngami, Botswana          NGA0010a     Blood
Lake Ngami, Botswana          NGA0081a     Blood
Lake Ngami, Botswana          NGA0456a     Blood
Lake Ngami, Botswana          NGA0515a     Blood
Lake Ngami, Botswana          NGA0517a     Blood
Lake Ngami, Botswana          NGA0559a     Blood
Lake Ngami, Botswana          NGA0618a     Blood
Lake Ngami, Botswana          NGA0627a     Blood
Namibia                       PL1107a      Muscle
Namibia                       PL1108a      Muscle
Namibia                       PL1109a      Muscle
Namibia                       PL1110a      Muscle
Namibia                       PL1111a      Muscle
Namibia                       PL1112a      Muscle
Namibia                       PL1116a      Muscle
Namibia                       PL1117a      Muscle
Namibia                       PL1118a      Muscle
Namibia                       PL1119a      Muscle
Strandfontein, South Africa   STR0313a     Blood
Strandfontein, South Africa   STR0493a     Blood
Strandfontein, South Africa   STR0506a     Blood
Strandfontein, South Africa   STR0514a     Blood
Strandfontein, South Africa   STR0531a     Blood
Strandfontein, South Africa   STR0599a     Blood
Strandfontein, South Africa   STR0664a     Blood
Strandfontein, South Africa   STR0868a     Blood
Strandfontein, South Africa   STR0880a     Blood
Zimbabwe                      ZW0065a      Blood
Zimbabwe                      ZW0067a      Blood
Zimbabwe                      ZW0073a      Blood
Zimbabwe                      ZW0354a      Blood
Zimbabwe                      ZW0370a      Blood
Zimbabwe                      ZW0372a      Blood
Zimbabwe                      ZW0504a      Blood
Zimbabwe                      ZW0518a      Blood
Zimbabwe                      ZW0610a      Blood
Zimbabwe                      ZW0613a      Blood
Zimbabwe                      ZW0667a      Blood
Zimbabwe                      ZW0873a      Blood
Zimbabwe                      ZW1187a      Blood
Zimbabwe                      ZW1505a      Blood
Zimbabwe                      ZW1611a      Blood
Zimbabwe                      ZW1614a      Blood
Zimbabwe                      ZW1615a      Blood
Zimbabwe                      ZW1737a      Blood

were previously screened for avian malaria (Cumming et al., 2012b; Hellard et al., 2016).

DNA extraction

Genomic DNA extraction was performed according to the DNeasy Blood and Tissue Kit manufacturer’s protocol (Qiagen, Valencia, CA). DNA extractions were quantified using a NanoDrop 2000 Spectrophotometer (Thermo Scientific, Waltham, MA). Samples that did not meet a minimum concentration of 0.02 µg/µL were re-extracted.

ddRAD-seq library preparation

We used the double-digest restriction-site-associated DNA sequencing (ddRAD-seq) protocol of DaCosta and Sorenson (2014) to generate a pseudorandom genome-wide sample of loci (Miller et al., 2007; Baird et al., 2008). In short, 1 µg of genomic DNA was digested using 10 U of the restriction enzymes SbfI and EcoRI. Adapters containing sequences compatible with Illumina TruSeq reagents and barcodes for de-multiplexing sequencing reads were ligated to the sticky ends generated by the double digest. Adapter-ligated DNA fragments were size-selected (300 – 450 bp) using gel electrophoresis (2% low-melt agarose) and a MinElute gel extraction kit (Qiagen, Valencia, CA). Size-selected fragments were amplified using the polymerase chain reaction (PCR) with Phusion high-fidelity DNA polymerase (Thermo Scientific, Pittsburgh, PA). Amplified products were then purified using magnetic AMPure XP beads (Beckman Coulter Inc., Indianapolis, IN). The concentration of purified PCR products was quantified using real-time PCR with an Illumina library quantification kit (KAPA Biosystems, Wilmington, MA) on an ABI 7900HT SDS (Applied Biosystems, Foster City, CA). Barcoded samples were pooled in equimolar concentrations and multiplexed libraries were sequenced on an Illumina HiSeq 2000 (Yale University, Science Hill, Connecticut). Raw Illumina reads were deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive.

Bioinformatics processing

We used the computational pipeline described by DaCosta and Sorenson (2014) [Scripts available at: http://github.com/BU-RAD-seq/ddRAD-seq-Pipeline] to de-multiplex and process raw Illumina reads. The filtered reads from each individual were condensed into a single file for further processing. Reads were assigned to individual samples based on unique barcode combinations. For each sample, identical reads were combined while still retaining read counts and the highest quality score for each nucleotide position. Individual reads >10% divergent and/or those reads with an average Phred score of < 20 were filtered out. Reads were then clustered into putative loci using the UCLUST function in USEARCH v. 5 (Edgar, 2010) with an –id setting of 0.85. The highest-quality read from each cluster was mapped to the mallard reference genome (Anas platyrhynchos) via the Ensembl genome browser (Cunningham et al., 2018). Clusters with identical or nearly identical BLAST hits (i.e. aligned to ± 50 bp on the same reference genome scaffold) were combined. The reads within each cluster (i.e. putative locus) were aligned using MUSCLE v. 3 (Edgar, 2004), and then samples were genotyped within each aligned cluster. Alignments with end gaps due to indels, polymorphisms at the SbfI restriction site and/or greater than or equal to two polymorphisms in the last five base pairs were either automatically trimmed or flagged for manual editing.
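The rule for combining clusters with nearly identical BLAST hits can be illustrated with a minimal Python sketch. It assumes that each cluster's representative read has one best hit summarized as a scaffold name and a start position. The function name `merge_clusters_by_hit`, the tuple layout, and the chaining of neighbouring hits are illustrative assumptions and are not taken from CombineClusters.py.

```python
from itertools import groupby

def merge_clusters_by_hit(hits, window=50):
    """Group cluster IDs whose best BLAST hits lie within `window` bp on one scaffold.

    `hits` is a list of (cluster_id, scaffold, position) tuples (hypothetical layout).
    Hits are chained: a hit joins the current group when it lies within
    `window` bp of the previous hit on the same scaffold.
    """
    merged = []
    ordered = sorted(hits, key=lambda h: (h[1], h[2]))
    for _, scaffold_hits in groupby(ordered, key=lambda h: h[1]):
        group, last_pos = [], None
        for cluster_id, _, pos in scaffold_hits:
            # A gap larger than the window closes the current group.
            if last_pos is not None and pos - last_pos > window:
                merged.append(group)
                group = []
            group.append(cluster_id)
            last_pos = pos
        merged.append(group)
    return merged
```

For example, hits at positions 100 and 130 on one scaffold would be combined, whereas a hit at position 500 on that scaffold, or any hit on a different scaffold, would remain a separate locus.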

Genotypes were scored as described in DaCosta and Sorenson (2014): homozygous genotypes were defined if greater than 93% of sequence reads were consistent with a single haplotype, whereas heterozygotes were defined if a second haplotype was represented by at least 29% of reads, or if a second haplotype was represented by as few as 10% of reads and the haplotype was present in other individuals. Individual genotypes that did not meet either criterion or contained more than two haplotypes were flagged. We retained the allele represented by the majority of reads and scored additional alleles as missing data from flagged samples. The second allele was scored as missing for apparently homozygous genotypes based on 1 to 5 reads, which we considered “low depth”. All loci with ≤10% missing genotypes and ≤5% flagged genotypes were retained for further analyses.
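The genotype-scoring thresholds above can be summarized in a short Python sketch. This is one reading of the rules as stated in the text, not the code of RADGenotypes117.py: the function name, the input layout (a mapping from haplotype to read count), and the order in which the checks are applied (the homozygote test first, then the two-haplotype heterozygote tests) are assumptions made for illustration.

```python
def call_genotype(read_counts, alleles_in_others=frozenset(),
                  hom_min=0.93, het_min=0.29, het_min_shared=0.10, low_depth=5):
    """Return (allele1, allele2, flagged) for one individual at one locus.

    `read_counts` maps haplotype -> number of reads (hypothetical input layout).
    None marks an allele scored as missing data.
    """
    total = sum(read_counts.values())
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    major, major_n = ranked[0]
    # Homozygote: more than 93% of reads support a single haplotype.
    if major_n / total > hom_min:
        # Low depth (1 to 5 reads): the second allele is scored as missing.
        if total <= low_depth:
            return (major, None, False)
        return (major, major, False)
    if len(ranked) == 2:
        minor, minor_n = ranked[1]
        frac = minor_n / total
        # Heterozygote: second haplotype at >= 29% of reads, or >= 10% when the
        # same haplotype is also observed in other individuals.
        if frac >= het_min or (frac >= het_min_shared and minor in alleles_in_others):
            return (major, minor, False)
    # Neither criterion met, or more than two haplotypes: flag the genotype,
    # keep the majority allele, and score the other allele as missing.
    return (major, None, True)
```

Under these assumptions, a sample with 85 reads of one haplotype and 15 of another is scored as heterozygous only if the minor haplotype also occurs in other individuals; otherwise it is flagged.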

We categorized ddRAD-seq loci as either autosomal or sex-linked on the basis of chromosomal alignments to the mallard reference genome and on sex-specific patterns of read depth and heterozygosity (Lavretsky et al., 2015; Peters et al., 2016). Because females carry only one copy of the Z chromosome, they should have half as many reads for Z-linked loci compared to males and no heterozygosity. The number of reads for autosomal loci should be comparable between males and females. For loci identified as Z-linked, we retained a single allele and coded the second allele as missing in females. Autosomal and Z-linked loci for each individual were combined for relatedness and population structure analyses.

Data analyses

We measured sequence similarities among individuals in our samples using maximum likelihood estimates as implemented in the programs ML-RELATE and SPAGeDi to test for familial groups (Kalinowski et al., 2006; Hardy and Vekemans, 2002). Because no sibships were detected, all individuals were retained for further analysis.

We calculated composite pairwise population estimates of relative divergence (i.e., ΦST; the proportion of total genetic variation partitioned among sample groups) and nucleotide diversity (the average number of pairwise differences among all sequences within populations) for Red-billed Ducks and Plasmodium spp. in the R package ‘PopGenome’ (Hartl and Clark, 1997; Pfeifer et al., 2020).
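The definition of nucleotide diversity given above (the average number of pairwise differences among sequences within a population) can be made concrete with a minimal Python sketch. It assumes aligned sequences of equal length with no missing data and does not reproduce the PopGenome implementation; the function name is hypothetical.

```python
from itertools import combinations

def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    # Count mismatched positions for every pair, then average over pairs.
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)
```

Dividing the result by the alignment length would give a per-site value, which is how nucleotide diversity is often reported.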

To determine if the Red-billed Duck has discrete population units or is consistent with a panmictic population, we used a Principal Component Analysis (PCA) (parsimony-informative loci) in the R package ‘adegenet’ (Jombart, 2008). This multivariate statistical procedure orthogonally transforms genotypic data to identify components that describe the highest variation observed among samples. The two components (PC1 and PC2) that accounted for the largest amount of variance among samples were plotted on the X- and Y-axes, respectively. In general, individuals that are more genetically similar are expected to cluster together within the PCA plot.

Next, we identified the optimum number of genetic populations (K) in the program STRUCTURE v. 2.3.4 (Pritchard et al., 2000). Alleles were categorized based on full sequences for each locus rather than individual single nucleotide polymorphisms (SNPs). We coded alleles 1 to n, where n is the observed number of unique alleles/haplotypes at each locus. We evaluated the ln Pr(X|K) for K = 1 to 7 populations, without incorporating a priori information about sampling locality or population. STRUCTURE was run using an admixture model and correlated allele frequencies for 500,000 burn-in and 1,000,000 sampling generations. We replicated each analysis five times and calculated ∆K to determine the most likely number of populations (Evanno et al., 2005) using STRUCTURE HARVESTER (Earl and von Holdt, 2011).
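As a rough illustration of the ∆K statistic, the Python sketch below computes the absolute second-order rate of change of the mean ln Pr(X|K) across replicate runs, divided by the standard deviation of ln Pr(X|K) at that K. Working from replicate means and using the sample standard deviation are assumptions of this sketch; the function name and the input layout are hypothetical, and the values in the usage note are invented for illustration only.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """Evanno-style Delta K from replicate ln Pr(X|K) values.

    `lnp_by_k` maps K -> list of ln Pr(X|K) values across replicate runs,
    for consecutive values of K. Delta K is undefined for the smallest and
    largest K tested, and for any K whose replicates have zero variance.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    result = {}
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        # |L(K+1) - 2 L(K) + L(K-1)| computed on the replicate means.
        second_diff = abs(means[next_k] - 2 * means[k] + means[prev_k])
        result[k] = second_diff / stdev(lnp_by_k[k])
    return result
```

With made-up replicate values such as `{1: [-1000, -1002, -998], 2: [-900, -901, -899], 3: [-890, -892, -888], 4: [-885, -880, -890]}`, the largest ∆K falls at K = 2. Note that ∆K cannot be evaluated at K = 1, so it cannot by itself support a single panmictic population.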

Finally, we concatenated all loci for Red-billed Ducks, combining the two alleles for each individual into a consensus sequence with polymorphisms coded with ambiguity codes, to construct a maximum likelihood phylogeny using RAxML v8.1.15 (Stamatakis, 2014). We defined a GTR+Gamma substitution model and used 1000 rapid bootstrap replicates to assess nodal support. We used FigTree v1.4.0 (Rambaut, 2012) and MEGA (vers. 6.0.6; Tamura et al., 2013) to visualize trees. Bootstrap values greater than 70% were considered to be strong support for a clade (Weinell and Bauer, 2018).
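The step of combining an individual's two alleles into one consensus sequence can be sketched in Python using the standard IUPAC two-base ambiguity codes. The function name and the treatment of gaps or unknown bases (coded here as N) are assumptions of this sketch rather than details given in the text.

```python
# Standard IUPAC codes for two-base ambiguities.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def consensus(allele1, allele2):
    """Merge two aligned alleles into one sequence with IUPAC ambiguity codes."""
    if len(allele1) != len(allele2):
        raise ValueError("alleles must be aligned to the same length")
    out = []
    for a, b in zip(allele1.upper(), allele2.upper()):
        if a == b:
            out.append(a)
        else:
            # Any mismatch other than two different standard bases (for example
            # a gap or an N) becomes N; this fallback is an assumption of the sketch.
            out.append(IUPAC.get(frozenset((a, b)), "N"))
    return "".join(out)
```

For instance, alleles ACGT and ATGT differ at one C/T site and yield the consensus AYGT.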

Maximum likelihood phylogenies for 94 Red-billed Duck mitochondrial DNA (mtDNA) control region sequences (Peters and Cumming, unpubl. data) and 38 Plasmodium spp. mtDNA cytochrome b sequences (obtained from Cumming et al., 2012b; Hellard et al., 2016) were constructed in MEGA vers. 6.0.6 using the Tamura-Nei substitution model (Tamura et al., 2013). A nearest-neighbor-interchange (NNI) search was used to construct and estimate the best-fit substitution model for the ML phylogeny (Tamura et al., 2013). All positions containing insertions/deletions were eliminated from the analyses (complete deletion option). Branch support values were computed by bootstrap analyses with 1,000 replications.
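The effect of the complete deletion option can be illustrated with a short sketch: every alignment column in which any sequence has a gap is dropped before analysis. This is not MEGA code; the function name is hypothetical, and treating both "-" and "?" as gap or missing characters is an assumption.

```python
def drop_gapped_columns(alignment, gap_chars="-?"):
    """Remove every column in which at least one sequence has a gap."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    kept = [
        column for column in zip(*alignment)
        if not any(base in gap_chars for base in column)
    ]
    # Rebuild one string per sequence from the retained columns.
    return ["".join(bases) for bases in zip(*kept)] if kept else [""] * len(alignment)


# Hypothetical three-sequence alignment: columns 2, 3, and 5 contain gaps.
trimmed = drop_gapped_columns(["AC-GT", "ACTG-", "A-TGT"])
```

Only the two gap-free columns survive in this toy alignment, so each sequence is reduced to "AG".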

Plasmodium lineages were identified by inputting the cytochrome b sequences into the BLAST search option on the MalAvi website (http://130.235.244.92/Malavi/blast.html; Bensch et al., 2007). The region in which each Plasmodium lineage was first described was also noted.

Results

Red-billed Duck

We obtained an average of 670,000 (± 370,000) high quality reads per individual. A total of 3,482 loci were recovered from >90% of individuals and contained ≤5% flagged genotypes. The median read depth for these loci across all Red-billed Ducks was 85 (± 43.5) reads per individual per locus, and genotypes were complete for 94.3% and partial (one allele scored) for 2.0% of individuals per locus. Based on BLAST hits to the duck reference genome, read depth, and heterozygosity, we inferred that 3,303 (94.9%) loci were autosomal and 170 (4.9%) loci were Z-linked. In addition, we identified nine loci that had BLAST hits to the first 3,000,000 nucleotides on the 5’-end of the Z-chromosome, but relatively high depth and heterozygosity for females, as probable pseudo-autosomal regions; these loci were excluded from further analyses.

We recovered 3,303 variable autosomal loci in 68 individuals for our analyses. From our data, we did not detect half- or full-sib groups; average relatedness was low among all samples (r < 0). The composite ΦST value for Red-billed Ducks across the seven sampling sites was 0.016 (ΦST = 2.4 × 10⁻³ – 0.043; Table 2.2), indicating little genetic differentiation. Most notably, NGA and BAR samples were genetically the most similar (ΦST = 2.4 × 10⁻³), whereas KZN and CHU were the most differentiated (ΦST = 0.043). Moreover, our data suggested that CHU experiences greater impediments to gene flow (ΦST = 0.028 – 0.043) than any other sampling site (ΦST = 2.4 × 10⁻³ – 0.019); however, the sample size for CHU was small (N = 2 individuals), and ΦST values need to be interpreted cautiously.

A lack of population structure was further supported by the PCA, which revealed an absence of clustering amongst the seven sampling sites (Figure 2.2). Likewise, based on STRUCTURE analyses, ln Pr(X|K) indicated that K = 1 population was the best-supported model. Examining assignment probabilities for K = 2 populations did not reveal any evidence of subtle population structure.

Similar to the nuclear loci, the proportion of genetic variation between sampling sites for the Red-billed Duck mtDNA control region sequence was low for most pairwise comparisons (ΦST = −0.011; range = −0.034 – 0.025; Table 2.2), except those between Lake Chuali (CHU) and the remaining sites (ΦST = 0.080; range = −0.042 – 0.17; Table 2.2).

Finally, the maximum likelihood phylogenies generated for 2,885 ddRAD-seq loci and the mtDNA control region revealed shallow relationships among individuals, a lack of clustering of individuals from the same sites, and low nodal support (<50) for most of the clades (Figure 2.3 and Figure 2.4, respectively).

Figure 2.2: Principal component analysis of 2,885 ddRADseq loci from 68 Red-billed Duck samples. Red-billed Duck individuals for each sample site (dots): Namibia (NAM; Red), Lake Ngami, Botswana (NGA; Yellow), Strandfontein, South Africa (STR; Green), Barberspan, South Africa (BAR; Orange), Lake Chuali, Mozambique (CHU; Black), Zimbabwe (ZW; Blue), and Kwazulu-Natal, South Africa (KZN; Gray). The PCA revealed no evidence of population structure.

Table 2.2: Pairwise ΦST for concatenated 2885 nuclear loci from 68 Red-billed Ducks and 656 nucleotides of the mtDNA control region from 94 Red-billed Ducks. Sampling sites in southern Africa: NGA – Lake Ngami, Botswana; KZN – Kwazulu-Natal, South Africa; ZW – Zimbabwe; STR – Strandfontein, Cape Town, South Africa; CHU – Lake Chuali, Mozambique; NAM – Namibia; BAR – Barberspan, South Africa. N/A indicates not available; mtDNA sequences for the Kwazulu-Natal samples were not available.

Population 1    Population 2    ddRAD-seq    mtDNA
STR             BAR             0.0024       -0.020
STR             KZN             0.016        N/A
STR             NAM             0.010        -0.017
STR             NGA             0.0033       -0.034
STR             ZW              0.0027       -0.019
STR             CHU             0.03         0.091
BAR             KZN             0.014        N/A
BAR             NAM             0.0083       -0.0035
BAR             NGA             0.0055       -0.0071
BAR             ZW              0.0038       -0.0094
BAR             CHU             0.028        0.11
KZN             NAM             0.019        N/A
KZN             NGA             0.018        N/A
KZN             ZW              0.06         N/A
KZN             CHU             0.043        N/A
NAM             NGA             0.0088       -0.024
NAM             ZW              0.0066       0.025
NAM             CHU             0.036        -0.043
NGA             ZW              0.005        -0.0053
NGA             CHU             0.032        0.073
ZW              CHU             0.029        0.17

Figure 2.3: Maximum likelihood phylogeny generated from 2885 nuclear loci from 68 Red-billed Duck samples from southern Africa. Sample sites represented by colored dots: Red – Namibia; Yellow – Lake Ngami, Botswana; Green – Strandfontein, South Africa; Orange – Barberspan, South Africa; Black – Lake Chuali, Mozambique; Blue – Zimbabwe. We used 1000 bootstrap replications to evaluate nodal support. Bootstrap values greater than 50% were considered to be strong support for a clade.

Figure 2.4: Maximum likelihood phylogeny generated from mitochondrial control region sequences of 94 Red-billed Duck samples using MEGA version 6.0.6. Sample sites represented by colored dots: Red – Namibia; Yellow – Lake Ngami, Botswana; Green – Strandfontein, South Africa; Orange – Barberspan, South Africa; Black – Lake Chuali, Mozambique; Blue – Zimbabwe. We used 1000 bootstrap replications to evaluate nodal support. Bootstrap values greater than 50% were considered to be strong support for a clade. Scale: 0.002.

Plasmodium spp.

The cytochrome b gene (478 bp) was sequenced from 38 Plasmodium spp. isolated from Red-billed Ducks at four sampling locations (ZW, BAR, STR, and NGA; Cumming et al., 2012b; Hellard et al., 2016); all individuals screened from KZN (N = 6) and CHU (N = 5) were negative for infections, whereas samples from NAM (N = 10) were not screened. At the STR sampling site, only two Red-billed Ducks were infected with Plasmodium spp., and only one isolate was sequenced; therefore, STR was excluded from estimates of genetic differentiation and diversity. The composite ΦST value for Plasmodium spp. across sampling sites was 0.21 (ΦST = 0.18 – 0.53; Table 2.3), indicating substantial genetic differentiation amongst our samples. Plasmodium isolates from NGA and ZW were the most genetically dissimilar (ΦST = 0.53), indicating strong population structure and the presence of barriers to gene flow between these two sampling sites.

A maximum likelihood phylogeny using nucleotide sequences from the 38 Plasmodium spp. cytochrome b genes revealed six different clusters (lineages) of Plasmodium spp. among the four sampling sites (Figure 2.5). Red-billed Duck sample NGA0006 was found to be infected with a Haemoproteus parasite (Lineage: TANGAL01) and was used as an outgroup (Cumming et al., 2012b). A common lineage of Plasmodium spp. was observed in 21 ZW, 3 BAR, and 1 STR individuals; this lineage received high nodal support (99%) and was identified as lineage COPALB03 in the MalAvi database. The remaining Plasmodium spp. (N = 6 lineages sequenced from 13 samples) represented ‘rare’ lineages identified

Table 2.3: Pairwise ΦST for 478 base pairs of the mitochondrial cytochrome b gene sequence from 38 Plasmodium spp. across three sampling sites in southern Africa: BAR - Barberspan, South Africa; NGA - Lake Ngami, Botswana; ZW - Zimbabwe.

Pairwise comparison    ΦST
BAR/NGA                0.26
BAR/ZW                 0.18
NGA/ZW                 0.53

Table 2.4: Plasmodium identification and the specific lineage name. The more common lineage (COPALB03; Musa et al., 2019) and the rarer lineages (GRW02, LAIRI01, MILANS05, MILANS06, RTSR1, SYBOR10) were isolated from Red-billed Duck (Beadell et al., 2006; Pacheco et al., 2011; Njabo et al., 2011; Loiseau et al., 2017; Ishtiaq et al., 2012; Lopez et al., 2013; Beadell et al., 2004). Sample sites: NGA – Lake Ngami, Botswana; STR – Strandfontein, South Africa; BAR – Barberspan, South Africa; ZW – Zimbabwe.

Sample I.D.    Plasmodium lineage    Region first described
BAR0324        COPALB03              Madagascar, Africa
BAR0553        COPALB03              Madagascar, Africa
BAR0748        MILANS05              Cameroon, Africa
BAR1273        COPALB03              Madagascar, Africa
BAR1431        RTSR1                 Punjab, Pakistan
BAR1461        SYBOR10               Indian Ocean Islands
BAR1811        MILANS05              Cameroon, Africa
BAR2021        MILANS05              Cameroon, Africa
NGA0081        SYBOR10               Punjab, Pakistan
NGA0515        MILANS06              Gulf of Guinea, West Africa
NGA0516        MILANS06              Gulf of Guinea, West Africa
STR0518        LAIRI01               Arizona, U.S.A.
ZW0067         COPALB03              Madagascar, Africa
ZW0072         COPALB03              Madagascar, Africa
ZW0220         COPALB03              Madagascar, Africa
ZW0226         COPALB03              Madagascar, Africa
ZW0242         COPALB03              Madagascar, Africa
ZW0248         COPALB03              Madagascar, Africa
ZW0273         MILANS06              Gulf of Guinea, West Africa
ZW0278A        LAIRI01               Arizona, U.S.A.
ZW0278B        MILANS06              Gulf of Guinea, West Africa
ZW0285         COPALB03              Madagascar, Africa
ZW0288         COPALB03              Madagascar, Africa
ZW0299         COPALB03              Madagascar, Africa
ZW0354         MILANS06              Gulf of Guinea, West Africa
ZW0372         RTSR1                 Punjab, Pakistan
ZW0504         COPALB03              Madagascar, Africa
ZW0518         COPALB03              Madagascar, Africa
ZW0613         COPALB03              Madagascar, Africa
ZW0673         COPALB03              Madagascar, Africa
ZW0873         LAIRI01               Arizona, U.S.A.
ZW0928         COPALB03              Madagascar, Africa
ZW1102         LAIRI01               Arizona, U.S.A.
ZW1220         GRW02                 Hawai’i, U.S.A.
ZW1465         COPALB03              Madagascar, Africa
ZW1505         COPALB03              Madagascar, Africa
ZW1611         COPALB03              Madagascar, Africa
ZW1737         COPALB03              Madagascar, Africa

Figure 2.5: Maximum likelihood phylogeny generated from 478 bp of the cytochrome b gene sequenced from 38 Plasmodium spp. (Cumming et al., 2012b; Hellard et al., 2016; MEGA version 6.0.6). Sample sites represented by colored dots: Red – Namibia; Yellow – Lake Ngami, Botswana; Green – Strandfontein, South Africa; Orange – Barberspan, South Africa; Black – Lake Chuali, Mozambique; Blue – Zimbabwe. Red-billed Duck sample NGA0006 was infected with a Haemoproteus (TANGAL01) parasite and was used as an outgroup. Plasmodium spp. lineages are indicated on the right of the plot in bold. COPALB03 is referred to as the common lineage and the remaining lineages are considered rare. We used 1000 bootstrap replications to test nodal support. Bootstrap values greater than 50% were considered to be strong support for a clade. Scale: 0.002.

as MILANS05, RTSR1, SYBOR10, LAIRI01, and MILANS06 (Figure 2.5; Table 2.4).

Discussion

Here, we present genetic data from Red-billed Ducks and their Plasmodium parasites sampled across southern Africa. For heteroxenous parasites like Plasmodium spp., dispersal and genetic connectivity are expected to depend upon the movements of their vertebrate and invertebrate hosts, especially the host with the highest genetic connectivity or dispersal propensity (Jarne and Théron, 2001). In this study, we predicted that estimates of Plasmodium spp. population structure would reflect that of the more motile Red-billed Duck host. Consistent with the propensity of the Red-billed Duck to move long distances (Brown et al., 1982; Oatley and Prŷs-Jones, 1986; del Hoyo et al., 1992; Scott and Rose, 1996; Kear, 2005), we observed low genetic differentiation amongst all sampling localities for nuclear DNA (ΦST = 0.016) and mtDNA (ΦST = 0.020). These data are consistent with findings in other nomadic species of Southern Hemisphere waterfowl (Dhami et al., 2013; Chapter 1). However, population structure in Plasmodium spp. is strong (ΦST = 0.255) and discordant with that of their host (Table 2.3 and Table 2.2, respectively). We did not find evidence that movements by the host greatly influence the movement of these parasites across southern Africa.

Avian Plasmodium spp. are vectored by multiple species of mosquito in the Culex, Culiseta, and Aedes genera, all of which have overlapping distributions in southern Africa (Njabo et al., 2010; Okanga et al., 2013). Okanga et al. (2013) sampled four bird species and 15 mosquito species in the Western Cape Province of South Africa. Of the 15 mosquito species, five belonged to the genera Culex and Culiseta, which are known to vector avian Plasmodium. Additionally, Culex and Aedes mosquitoes were observed in the eastern portions of South Africa, Botswana, and Mozambique (Cornel et al., 2018). Thus, it appears that multiple species of mosquito vector Plasmodium spp. within the Red-billed Duck’s known range.

The variation in Red-billed Duck infection rates between sampling sites and the discordance in population structure between host and parasite may indicate species-specific variation in Plasmodium parasite load (Gutiérrez-López et al., 2020), contact rates of infected mosquitoes with the host (Medeiros et al., 2016; Gutiérrez-López et al., 2019), or the efficiency of the vector species in transmitting the parasite (Gutiérrez-López et al., 2020). Furthermore, the distribution of Plasmodium lineages could reflect specialization on mosquito vectors, and thus, the distribution of those vectors. Screening mosquitoes for Plasmodium in southern Africa is needed to examine the role of multiple potential vectors and vector dispersal, relative to host dispersal, on the population structure of Plasmodium spp.

The strong population structure observed in Plasmodium spp. was heavily influenced by the distribution of a common lineage with high prevalence in Red-billed Ducks sampled in Zimbabwe relative to rare lineages that were found at a higher prevalence in Red-billed Ducks sampled in Barberspan, South Africa, and Lake Ngami, Botswana. The common lineage (COPALB03) has been isolated from several other African waterfowl species, including Yellow-billed Duck (Anas undulata), Egyptian Goose (Alopochen aegyptiaca), Spur-winged Goose (Plectropterus gambensis), Hottentot Teal (Spatula hottentota), Cape Teal (Anas capensis), and White-faced Whistling Duck (Dendrocygna viduata), suggesting that it is a generalist distributed across southern Africa (Cumming et al., 2012b). Movements in any or all of these hosts could influence the distribution of the parasite. However, all of these species infected with the common lineage of Plasmodium are freshwater birds with broadly overlapping distributions in southern Africa, and they are nomadic in response to seasonal variability in rainfall (Red List of Threatened Species (IUCN), Data accessed: 03/2020). Therefore, we have no a priori reason to suspect that the generalist behavior of a waterfowl parasite would result in strong population structure.

The common lineage of Plasmodium spp. appears to have a broad distribution and was isolated from Red-billed Ducks sampled from Zimbabwe and both Barberspan and Strandfontein in South Africa. This lineage was also isolated from White-faced Whistling Ducks in Mozambique and passerines in Madagascar (Cumming et al., 2012b; Musa et al., 2019). However, the presence of Plasmodium at these sites does not necessarily imply that these birds were infected at that same site. Avian Plasmodium spp. infection is chronic (Bensch et al., 2007; Latta & Ricklefs, 2010; Asghar et al., 2012; LaPointe et al., 2012; Wood et al., 2013), and individuals could have acquired their parasite at any point during their annual cycle, perhaps in some other geographic region. The nomadic behavior of their hosts could have transported the parasite across the landscape, even if it is not present in the vectors at those sites. Although this hypothesis might account for the broad distribution of the common lineage, it cannot explain the overall high population structure observed for Plasmodium spp. in general. Future studies should investigate mosquito vectors and Plasmodium in overlapping ecoregions in southern Africa and across the whole distribution of these waterfowl species.

The ‘rare’ lineages of Plasmodium spp. isolated from Red-billed Duck may provide evidence of a spillover infection (Power and Mitchell, 2004). A spillover infection occurs when a reservoir host population with a high parasite prevalence comes into contact with a novel host population. Two of these rarer lineages of Plasmodium spp. have also been observed in the congeneric Yellow-billed Duck sampled from Barberspan (MILANS05; SYBOR10; Cumming et al., 2012b). The rare lineages of Plasmodium may represent a regionally abundant Plasmodium lineage that infects a reservoir avian host more often than these ducks or a lineage common to migratory birds wintering in southern Africa. Alternatively, these Plasmodium lineages might simply have a low abundance in southern Africa. Further investigation into the spillover infection hypothesis and movement of migrants into southern Africa would require additional sequence data from other bird species infected with these rare Plasmodium lineages.

Based on host-parasite theory, we expected to find evidence that population structure in a heteroxenous parasite tracks that of the more motile host. We did not find this relationship between the Red-billed Duck and its Plasmodium spp. Specifically, we found strong population structure in the Plasmodium spp. isolates, in contrast to the Red-billed Ducks, which exhibited little to no evidence of structure. We have provided a number of hypotheses that could explain the discordance observed between host and parasite. This list is not meant to be exhaustive, but rather is aimed at providing future research directions. Regardless, our study demonstrates that population structure in this heteroxenous parasite does not mirror that of its more motile host, filling a knowledge gap in our understanding of the complex interactions between lineages of avian malaria and their hosts in southern Africa.

Chapter 3:

Identification of genomic regions associated with Plasmodium infection

Introduction

Classic host–parasite theory describes coevolutionary dynamics via reciprocal changes in allele frequencies in host and parasite populations to maintain genetic diversity (either between or within populations) or select for rare genetic variants (Summers et al., 2003; Brunner and Eizaguirre, 2016). The consequences of these genetic changes depend on characteristics of the “disease triangle” (Figure 3.1; Brunner and Eizaguirre, 2016): a) the host corner includes life history traits, immunity and abundance, b) the parasite corner characterizes transmission rates, life cycle and virulence, and c) the environment corner represents the ecosystem the host and the parasite live in (Brunner and Eizaguirre, 2016). Established host-parasite interactions develop over time in an environment with brief punctuated advantages that lead to gradual coevolutionary dependencies that promote the survival and reproductive success of both species (Solomon et al., 2015).

The coevolutionary arms race between vertebrate host and Plasmodium parasite likely involves genetic changes to mechanisms required for Plasmodium to invade and replicate within host tissue (Cowman and Crabb, 2006). During the Plasmodium life cycle, Plasmodium sporozoites are injected into their vertebrate host through the saliva of female mosquitoes. Sporozoites travel through the circulatory system to infect and replicate within the host’s tissue. Plasmodium sporozoites are coated in circumsporozoite proteins (CSP). These proteins are integral to the first stages of parasitic invasion of the host’s tissue (McCutchan et al., 1996; Cowman and Crabb, 2006).

Figure 3.1: Disease ecology triangle

A phylogenetic analysis of the CSP protein-coding sequence of Plasmodium species infecting humans, birds, primates, and rodents revealed a common ancestry between human, chimpanzee, and bird Plasmodium species (McCutchan et al., 1996). Moreover, the phylogeny highlights the common ancestry between the species of Plasmodium that have the highest rates of mortality within humans and birds, Plasmodium falciparum and Plasmodium relictum, respectively (Loy et al., 2017; Hellgren et al., 2013). Similar phylogenetic results are observed for another developmental stage of Plasmodium in the vertebrate host. Plasmodium merozoite surface proteins (MSP) and the chitinase gene are both integral to parasite invasion and asexual reproduction within the vertebrate host (Hellgren et al., 2013; Garcia-Longoria et al., 2015). The phylogenetic results of CSP, MSP, and the chitinase gene provide insight into the origins, coevolutionary history, and host-parasite mechanisms in a wide range of vertebrate taxa.

Birds have the greatest number and diversity of Plasmodium species in comparison to mammals and lizards. Thus, birds may serve as an excellent model system for understanding the coevolutionary arms race between vertebrate hosts and their parasites (Hellard et al., 2016). By understanding the coevolutionary history and disease ecology of Plasmodium and its bird hosts, researchers may identify novel mechanisms vulnerable to targeted medical intervention in a wide range of vertebrate hosts (Marshall, 1942; Cohuet et al., 2006; Garcia et al., 2006; Jeffares et al., 2007; Pigeault et al., 2015).

In a study of southern African birds, a disparity in Plasmodium infection rates was observed between aquatic and terrestrial foragers (Hellard et al., 2016). Aquatic foragers, which comprise ducks and other water bird species, had lower rates of Plasmodium infection than terrestrial bird species such as passerines and pheasants. These results are notable when one considers the life cycle of the Plasmodium vector. Female mosquitoes spend a majority of their life cycle in water, and thus, we would expect them to have greater rates of contact with aquatic birds compared to terrestrial birds. The observations in Hellard et al. (2016) have led us to hypothesize that repeated exposure of aquatic birds to Plasmodium has resulted in pathogen-driven selection pressure to mitigate infection through genetic changes (Bonneaud et al., 2006).

Here, we focus our investigation on genetic changes associated with Plasmodium infection by using the Red-billed Duck genomic data generated in Chapter 2. We used two computational methods to identify loci associated with Plasmodium infection: a multinomial Dirichlet model (BayeScan) and an ensemble learning method for classification (Random Forest). The knowledge gained from these computational methods provides information on loci that are experiencing natural selection or that are important for the classification of Red-billed Duck samples as infected or non-infected. These results inform our general understanding of avian host adaptation, pathogen virulence, and potential targets for novel medical interventions (Jeffares et al., 2007).

Methods

DNA extraction, ddRADsequencing library preparation, and bioinformatics pipeline for Red-billed Duck

The laboratory and computational methods used to generate the Red-billed Duck genomic data are identical to those outlined in Chapter 2. For our analysis, we excluded samples from Namibia (N = 10) as they were not screened for avian Plasmodium infection.

Avian malaria screening and Plasmodium DNA sequencing

The infection status of Red-billed Ducks was previously determined in Cumming et al. (2012b) and Hellard et al. (2016). Results specific to Red-billed Ducks are summarized in Chapter 2.

ΦST and Bayescan FST

For each locus, we quantified the composite pairwise population estimate of relative divergence (i.e., ΦST; proportion of total genetic variation partitioned among sample groups) for Red-billed Duck using the function get.F_ST() from the R package ‘PopGenome’ (Hartl and Clark, 1997; R Core Team, 2019; Pfeifer, 2020). We tested for signatures of selection using the FST-outlier approach in BayeScan 2.1 (Gaggiotti and Foll, 2008; Fischer et al., 2011). This statistical methodology identifies loci whose FST values differ significantly from those expected under neutral evolution (Manunza et al., 2016). The BayeScan model parameters were set as 20 pilot runs of 5,000 iterations each followed by a burn-in of 50,000 iterations (Gaggiotti and Foll, 2008).

Principal Component Analysis (PCA)

The statistical methods used are identical to those outlined in Chapters 1 and 2, with the exception that samples were grouped based on Plasmodium infection status (negative vs. positive).

Random Forest

We used a non-parametric decision tree-based method (Random Forest; Appendix 3 S1 for R script) that combines several machine learning techniques to generate and aggregate a series of decisions to classify Red-billed Ducks into Plasmodium infected and non-infected groups (Chen and Ishwaran, 2012). Each decision tree generated in our algorithm is built from a bootstrap sample, that is, a sample drawn with replacement. Additionally, for each decision tree a subset of randomly selected variables was used to increase the randomness of our algorithm.
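The two sources of randomness described above can be sketched as follows. This is an illustration only, not the script in Appendix 3: the function name and the value of `n_try` are hypothetical, and the sketch draws the variable subset once per call, whereas the ‘randomForest’ R package draws candidate variables at each node split (its `mtry` argument).

```python
import random


def draw_tree_inputs(n_samples, n_features, n_try, rng):
    """Draw the random inputs used to grow one decision tree.

    Returns the bootstrap row indices (sampled with replacement), the
    out-of-bag rows that were never drawn, and a random subset of
    feature indices (sampled without replacement).
    """
    rows = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(rows))
    features = rng.sample(range(n_features), n_try)
    return rows, out_of_bag, features


# 58 screened ducks and 3,303 loci follow the text; n_try = 57 (roughly the
# square root of the number of loci) is an assumed, illustrative choice.
rng = random.Random(1)
rows, out_of_bag, features = draw_tree_inputs(58, 3303, 57, rng)
```

The out-of-bag individuals are those left out of a tree's bootstrap sample; they are what allows a forest to estimate its own classification accuracy without a separate test set.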

Variables of Importance

We used a computable internal measure of variable importance (VIMP) to rank variables. This feature is especially useful for high-dimensional genomic datasets (i.e., the Red-billed Duck data set; Chen and Ishwaran, 2012). To identify the VIMP of each Red-billed Duck variable (locus), we quantified the mean decrease in Gini and predictive accuracy using the function varImpPlot() from the R package ‘randomForest’ (R Core Team, 2019; Pfeifer, 2020). The mean decrease in the Gini coefficient measures how an individual variable contributes to the homogeneity of the nodes and leaves in the decision tree formation. Every time a variable was used to split a node in a tree, a Gini coefficient was generated for the child nodes and compared to that of the original node. VIMPs with a higher decrease in the Gini coefficient also produce nodes with higher purity (Chen and Ishwaran, 2012). The mean decrease in accuracy for each variable was determined by excluding that variable and calculating the effect on the accuracy of the Random Forest in classifying individuals as Plasmodium infected or non-infected (Liaw and Wiener, 2002). VIMPs with a larger mean decrease in accuracy are more important for classification. This process was replicated in ten independent runs of 5,000 decision trees. Only those variables of importance consistently identified in the top 10 across the independent runs were considered. Mean decreases in Gini and accuracy were plotted in a multi-way importance plot using the function plot_multi_way_importance() in the R package ‘randomForestExplainer’ (R Core Team, 2019; Liaw and Wiener, 2002).
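The Gini decrease for a single split can be worked through in a short sketch. It uses the standard Gini impurity (one minus the sum of squared class proportions) and is not taken from the ‘randomForest’ source; the function names and the labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted child impurity."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node holding four infected (I) and four non-infected (N) ducks.
parent = ["I", "I", "I", "I", "N", "N", "N", "N"]
perfect_split = gini_decrease(parent, ["I"] * 4, ["N"] * 4)
weak_split = gini_decrease(parent, ["I", "I", "I", "N"], ["I", "N", "N", "N"])
```

Here the perfect split removes all impurity (a decrease of 0.5), whereas the weak split removes only 0.125. Averaging such decreases over every split that uses a given locus, across all trees, gives that locus's mean decrease in Gini.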

BLAST and Annotation

The DNA sequences for all candidate loci under the influence of natural selection (P < 0.05) or classified as a VIMP were compared to the mallard (Anas platyrhynchos platyrhynchos) and chicken (Gallus gallus) reference genomes via the Ensembl genome browser (Cunningham et al., 2018). We used the Basic Local Alignment Search Tool (BLAST) to find regions within the reference genomes with high similarity to our loci of interest. For each Red-billed Duck locus, we noted the sequence with greatest similarity (lowest E-value), its gene name, chromosome, chromosomal position, and function.
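Selecting the best hit per locus by lowest E-value can be sketched as below. The record layout, field names, and values are all hypothetical and do not reproduce the Ensembl BLAST output format or any result from this study.

```python
def best_hit_per_locus(hits):
    """Return, for each query locus, the hit with the lowest E-value."""
    best = {}
    for hit in hits:
        locus = hit["locus"]
        if locus not in best or hit["evalue"] < best[locus]["evalue"]:
            best[locus] = hit
    return best


# Hypothetical hits for two made-up query loci (illustrative values only).
hits = [
    {"locus": "locus_1", "genome": "mallard", "chromosome": "1", "evalue": 3e-40},
    {"locus": "locus_1", "genome": "chicken", "chromosome": "1", "evalue": 2e-35},
    {"locus": "locus_2", "genome": "mallard", "chromosome": "5", "evalue": 1e-10},
    {"locus": "locus_2", "genome": "chicken", "chromosome": "12", "evalue": 4e-52},
]
best = best_hit_per_locus(hits)
```

With these toy values the best hit for `locus_1` comes from the mallard genome and the best hit for `locus_2` from the chicken genome.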

Results

In our sample, 15 Red-billed Ducks were infected by a Plasmodium lineage whereas 43 were not infected. The composite ΦST value between infected and non-infected individuals was 0.002 (ΦST range among loci = 0 – 0.37), indicating little genetic differentiation amongst samples (Figure 3.2). Some of the ΦST values were high and may indicate outlier loci. However, BayeScan did not detect genomic regions that had significantly higher-than-expected ΦST values relative to background levels of differentiation. These results were further supported by the PCA, which revealed an absence of clustering based on

Figure 3.2: ΦST distribution plot when Red-billed Ducks were grouped based on Plasmodium infection status. The x-axis represents the ΦST values and the y-axis represents the number of loci in which that ΦST value was observed in our data. As expected, many loci have very low ΦST values considering the lack of population structure observed within this data set.

Figure 3.3: Principal component analysis of Red-billed Duck loci and Plasmodium infection status. Red-billed Duck samples infected with Plasmodium are represented by red dots and non-infected are represented by black dots. Samples did not cluster with respect to infection status. The PCA1 and PCA2 percent eigenvalues are 1.74% and 1.73%, respectively.

Figure 3.4: BayeScan FST plotted against the Bayes decision factor, log10(PO), for Plasmodium infected and non-infected Red-billed Ducks. The decision factor is shown on a logarithmic scale (base 10) and is used to determine selection. The black line indicates the Bayes factor threshold of 2 (log10) that provides ‘decisive’ evidence for selection, corresponding to a posterior probability of 0.99. There were no significant outliers or loci under selection between subpopulations (Plasmodium infected and non-infected) of Red-billed Duck.

Plasmodium infection status (Figure 3.3). We did not observe any signatures of selection between infected and non-infected Red-billed Ducks (Figure 3.4).
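The correspondence between the log10(PO) threshold of 2 and a posterior probability of 0.99 (Figure 3.4) follows from the definition of posterior odds, PO = p / (1 − p). A minimal sketch, with hypothetical function names:

```python
import math


def posterior_probability(log10_po):
    """Convert log10 posterior odds into a posterior probability."""
    po = 10 ** log10_po
    return po / (1 + po)


def log10_posterior_odds(probability):
    """Convert a posterior probability (0 < p < 1) into log10 posterior odds."""
    return math.log10(probability / (1 - probability))


# Posterior odds of 100:1 give p = 100 / 101, roughly 0.990.
threshold_probability = posterior_probability(2)
```

A locus would therefore need a posterior probability of about 0.99 for the selection model to cross the threshold line, and none of the Red-billed Duck loci did.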

To identify Red-billed Duck genomic regions that are important for the classification of individuals into Plasmodium infected and non-infected groups, we used a Random Forest algorithm. Variables with a high mean decrease in Gini and accuracy were considered important for the classification of Red-billed Duck samples into Plasmodium infected or non-infected groups (Figure 3.5). We combined the rankings of the top 10 allelic variants from 10 replicates of the Random Forest algorithm and plotted them as a frequency of rank assignment (Figure 3.6). In the locus identifier A4417.01, A4417 represents an arbitrary autosomal locus and .01 represents the allele number. Our analysis suggests that A4417.01, A6724.01, and A6648.02 are consistently ranked in the top 3 in greater than 80% of the decision trees. In general, we observed that Red-billed Duck individuals homozygous or heterozygous for A4417.01 and A6724.01 had lower rates of Plasmodium infection (Figures 3.7-3.9). Finally, loci with a variable importance rank below three showed more variation in assignment frequency between trials, and their genotypes were not explored further.
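The rank assignment frequency described above can be sketched in base R. The importance scores below are hypothetical (three replicates, four variables, one invented locus name), and the helper `rank_frequency()` is an illustrative name rather than a function from the randomForest package; the dissertation used 10 replicates and the top 10 variables.

```r
# Hypothetical importance scores (e.g., mean decrease in Gini) from three
# replicate Random Forest runs; names are locus.allele identifiers.
replicate_importance <- list(
  c(A4417.01 = 0.90, A6724.01 = 0.70, A6648.02 = 0.50, A1000.01 = 0.10),
  c(A4417.01 = 0.80, A6724.01 = 0.40, A6648.02 = 0.60, A1000.01 = 0.20),
  c(A4417.01 = 0.95, A6724.01 = 0.65, A6648.02 = 0.30, A1000.01 = 0.05)
)

rank_frequency <- function(importance_list, top_n = 3) {
  variables <- sort(unique(unlist(lapply(importance_list, names))))
  # One column per replicate; rank 1 is the most important variable.
  ranks <- sapply(importance_list, function(scores) {
    r <- rank(-scores, ties.method = "first")
    r[variables]
  })
  rownames(ranks) <- variables
  # Ranks beyond top_n are pooled into an "NA" class.
  rank_levels <- c(as.character(seq_len(top_n)), "NA")
  freq <- t(sapply(variables, function(v) {
    labels <- ifelse(ranks[v, ] <= top_n, as.character(ranks[v, ]), "NA")
    as.vector(table(factor(labels, levels = rank_levels))) / ncol(ranks)
  }))
  colnames(freq) <- rank_levels
  freq
}

# Rows are variables, columns are ranks, entries are the fraction of
# replicates in which the variable received that rank.
freq <- rank_frequency(replicate_importance, top_n = 3)
```

Each row of `freq` sums to 1, so the rows can be drawn directly as stacked bars of rank assignment frequency.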

To understand the role these loci may play in Plasmodium infection, we used BLAST to map the important loci to the mallard or chicken genome and gain insight into their potential function (Table 3.1). The locus identified as A4417.01 had high sequence similarity to a guanine nucleotide-binding protein (G protein) in the mallard reference genome (Agebite et al., 2008). Next, the locus identified as A6724.01 was compared to both reference genomes, but the comparison did not reveal any information about a potential gene name or function. Finally, the locus identified as A6648.02 had sequence similarity to succinate-CoA ligase GDP-forming beta subunit (SUCLG2) in the chicken reference genome (Johnson et al., 1998). The variables ranked 4 or lower were also annotated (Table 3.1; Wise et al., 1990; Mbikay et al., 1995; Norris et al., 1997; Bridge et al., 1998; Hu et al., 2002; Wu et al., 2003; Caillé et al., 2004; Charles et al., 2011; Ahmed et al., 2014).


Figure 3.5: Random Forest variable importance plot for Red-billed Duck samples grouped by Plasmodium infection status. The x-axis represents the mean decrease in accuracy due to the exclusion of a single variable; the more important a variable was for classification of the data, the greater the mean decrease in accuracy. The y-axis represents the mean decrease in Gini. The variable identification tags are shown for the top 10 variables: A = autosomal, 4417 = locus, and .01 = allelic variant. The dots for the top 10 variables of importance are outlined in black. P >= 0.1, purple dot; 0.1 > P > 0.05, blue dot; 0.05 > P > 0.01, green dot; P < 0.01, red dot.

[Figure 3.6 image: stacked bar chart; x-axis, Red-billed Duck locus.allele; y-axis, rank assignment frequency (0 to 1); legend, ranks 1-10 and NA]

Figure 3.6: Rank assignment frequency plot for the top 10 alleles of importance for Red-billed Duck in association with Plasmodium infection. The x-axis represents the top variables of importance in 10 replicates of 5,000 decision trees. The y-axis represents the frequency with which they were assigned to a particular rank of importance. Rank of importance reflects the mean decrease in Gini and accuracy for each variable. Rankings are color coded as follows: 1, light blue; 2, orange; 3, green; 4, yellow; 5, purple; 6, pink; 7, dark blue; 8, red; 9, dark green; 10, brown; NA (rank of 11 or higher), black.

[Figure 3.7 image: stacked bar chart titled A4417 Genotype for Red-billed Duck; x-axis, genotype (A1,A1; A1,A2; A1,A3; A1,A4; A1,A5; A3,A3); y-axis, proportion of individuals (0 to 1); legend, non-infected and infected; bar labels as extracted, 42, 2, 8, 2, 1, 1]

Figure 3.7: Proportion of Red-billed Duck with A4417 genotypes and Plasmodium infection. The x-axis represents the A4417 genotypes and the y-axis represents the proportion of Plasmodium infected or non-infected individuals. The total number of Red-billed Ducks with each genotype is indicated in the stacked bar in white font. Individuals that are homozygous for allelic variant 1 have lower rates of Plasmodium infection than other genotypes (χ² = 19.57; df = 2; P < 0.001).

[Figure 3.8 image: stacked bar chart titled A6724 Genotype for Red-billed Duck; x-axis, genotype (A1,A1; A1,A2; A1,A3; A1,A4; A1,A5; A2,A3; A2,A7; A3,A4; A3,A7); y-axis, proportion of individuals (0 to 1); legend, non-infected and infected; bar labels as extracted, 21, 9, 10, 7, 3, 2, 1, 3, 1]

Figure 3.8: Proportion of Red-billed Duck with A6724 genotypes and Plasmodium infection. The x-axis represents the A6724 genotypes and the y-axis represents the proportion of Plasmodium infected or non-infected individuals. The total number of Red-billed Ducks with each genotype is indicated in the stacked bar in white font. Individuals that carry allelic variant 1 have significantly lower rates of Plasmodium infection (χ² = 7.48; df = 2; P < 0.025).

[Figure 3.9 image: stacked bar chart titled A6648 Genotype for Red-billed Duck; x-axis, genotype (A1,A1; A1,A2; A1,A3; A1,A4; A1,A5); y-axis, proportion of individuals (0 to 1); legend, non-infected and infected; bar labels as extracted, 38, 14, 3, 1, 1]

Figure 3.9: Proportion of Red-billed Duck with A6648 genotypes and Plasmodium infection. The x-axis represents the A6648 genotypes and the y-axis represents the proportion of Plasmodium infected or non-infected individuals. The total number of Red-billed Ducks with each genotype is indicated in the stacked bar in white font. Homozygotes for allelic variant 1 have lower rates of Plasmodium infection than heterozygotes (χ² = 10.73; df = 2; P < 0.01).
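The chi-squared comparisons reported in the genotype figure captions can be illustrated with a small contingency table in base R. The counts below are hypothetical and are not the dissertation's data; the grouping into three genotype classes is an assumption made only so that the test has df = 2, as reported in the captions.

```r
# Hypothetical counts (NOT the dissertation's data): rows are genotype
# classes at one locus, columns are Plasmodium infection status.
genotype_by_status <- matrix(
  c(36, 6,   # homozygous for allelic variant 1
     6, 6,   # heterozygous for allelic variant 1
     1, 3),  # genotypes without allelic variant 1
  ncol = 2, byrow = TRUE,
  dimnames = list(
    genotype = c("A1/A1", "A1/other", "other/other"),
    status   = c("non_infected", "infected")
  )
)

# Pearson chi-squared test of independence; df = (3 - 1) * (2 - 1) = 2.
# Small expected counts trigger a warning, so Fisher's exact test is run
# as a cross-check.
test  <- suppressWarnings(chisq.test(genotype_by_status))
exact <- fisher.test(genotype_by_status)

# Proportion infected within each genotype class.
infection_rate <- genotype_by_status[, "infected"] / rowSums(genotype_by_status)
```

With several genotype classes represented by only one to three birds, the chi-squared approximation is weak, which is the reason for pairing it with an exact test in this sketch.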

Table 3.1: The top 10 loci of importance for classifying Red-billed Duck into Plasmodium infected and non-infected. The table includes the locus ID, chromosomal position, model organism gene the locus mapped to, known gene function, and references for gene function.

Discussion

In our study, the Random Forest algorithm consistently identified loci of importance for classification of subpopulations into Plasmodium infected and non-infected. The top ranked locus (A4417.01) was ranked number one in 90% of our replicates; Red-billed Ducks that are homozygous for allelic variant one have significantly lower rates of Plasmodium infection than other genotypes.

This locus had a BLAST hit to a guanine nucleotide binding protein, alpha stimulating (G-proteins; GNAS). G-proteins influence the entry of the Plasmodium parasite into red blood cells (Harrison et al., 2003). This entry into the vertebrate host’s red blood cells is crucial for the pathogenesis of the parasite and is responsible for the symptoms associated with infection (Harrison et al., 2003; Murphy et al., 2006; Gupta et al., 2015). The A4417.01 coding sequence includes a deletion and potentially a frame shift, but additional characterization of this genomic region is needed to understand the impact this deletion may have on transcription and translation of GNAS. G-protein cellular signaling is very important for communication between cells and is likely conserved throughout the genome (Tuteja, 2009). The A4417.01 locus is an excellent candidate region for further characterization and may provide insight into the cellular mechanisms that explain the variation in Plasmodium infection in Red-billed Duck and other wild bird species (Gupta et al., 2015).

The Red-billed Duck A6724.01 locus mapped to a genomic region of the mallard or chicken genome that has not been annotated. The Red-billed Duck A6648.02 locus had high sequence similarity to a succinate-CoA ligase GDP-forming beta subunit (SUCLG2) in the Gallus gallus genome. SUCLG2 codes for an enzyme that is important for coupling the hydrolysis of succinyl-CoA to the synthesis of GTP in the citric acid cycle. Specifically, the beta subunit to which the Red-billed Duck A6648 locus mapped provides the nucleotide specificity of the enzyme and binds the substrate succinate (Johnson et al., 1998). We could not find any reported association of SUCLG2 with Plasmodium spp. infection, nor could we determine the impact that allelic variant two of the A6648 locus has on the function of this enzyme. The A6724.01 and A6648.02 Red-billed Duck loci may provide promising leads to novel, as yet undescribed candidate regions for targeted therapy for Plasmodium infection.

In addition to investigating the impact of A4417.01, A6724.01, and A6648.02, we should also characterize the genomic regions surrounding these loci and other loci with high importance. The Red-billed Duck dataset used in the Random Forest did not account for linkage disequilibrium between loci. Linkage disequilibrium may result in false positives in the variables of importance (Meng, 2009). The variables we have identified may be true positives or closely linked to a gene that influences infection probability (Meng, 2009). Alternative Random Forest methods have been developed for genomic data to counter linkage disequilibrium (Meng et al., 2009; Botta et al., 2014). However, these extended Random Forest methods only appear to be more sensitive to rare variants and genomic regions deviating from Hardy-Weinberg (HW) equilibrium (Botta et al., 2014). Linkage disequilibrium maps are not well characterized in the mallard genome. Therefore, we were not able to incorporate this information into our analysis prior to the Random Forest.

The traditional method of BayeScan did not identify any loci under selection between Plasmodium infected and non-infected Red-billed Duck. We could have lost power in the BayeScan analysis due to a low number of individuals in the subpopulations (43 non-infected; 15 infected) and/or weak selection (Tigano et al., 2017). The BayeScan method may reveal loci under selection if additional Red-billed Duck samples were included in the analysis. If the variation in Plasmodium infection is caused by multiple genes, the signatures of selection may be divided amongst the loci that contribute to that trait (Tigano et al., 2017). Increases in the complexity of the genomic architecture of this trait (i.e. more genes) decrease our ability to detect and identify loci under selection (Yeaman, 2015).

To our knowledge, this is the first example of using a Random Forest classification algorithm to identify ddRADseq loci associated with a disease. Further characterization of the genomic regions surrounding the top loci of importance for classification (A4417.01, A6724.01, and A6648.02) using whole genome sequencing will be crucial for improving our genotype/phenotype association study and for understanding linkage disequilibrium at a local scale in the Red-billed Duck genome. Additionally, we can apply knowledge gained from studies measuring recombination rates in birds to our Red-billed Duck genomic data to better understand the size of heritable genomic blocks and thus linkage disequilibrium (Backström et al., 2010). The results from Chapter 3 identified genomic regions associated with Plasmodium infection in Red-billed Duck and provided a promising lead on potential genomic targets for therapeutic intervention in avian Plasmodium infection.

Appendix 3

Appendix 3 S1: R script for Red-billed Duck Random Forest

title: "RBDU_randomforest_01242020"

author: "Sara Seibert"

date: "1/24/2020"

Note: R code is adapted from https://www.youtube.com/watch?v=dJclNIN-TPo

Pre-1: Load required packages

```{r}
library(randomForest)
library(randomForestExplainer)
library(caret)
```

1. Read in my data. The file does not contain any NAs for missing variables. Variables with missing data were removed in Excel prior to analysis.

```{r}

SNPdata <-read.csv("E:/RBDU_randomforest/RBDU_snps_woNA_03122020.csv", header = TRUE)

```

2. Check data format- we want factors not integers. Our .csv file has alleles represented by arbitrarily assigned numbers. R will read these as integers not factors.

```{r}
str(SNPdata)
```

3. Loop to change variables from integers to factors for the entire dataset.

```{r}
# Columns 2 to 8379 hold the grouping variable and the allele codes.
for (i in 2:8379) SNPdata[, i] <- as.factor(SNPdata[, i])
```
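Note: the same conversion can be written without hard-coding the last column index. The sketch below uses a small toy data frame rather than the real SNP file, and it assumes that column 1 is a sample identifier that should be left untouched; the column names other than PopData are hypothetical.

```r
# Toy stand-in for the SNP table: column 1 is a sample ID, column 2 the
# infection status, and the remaining columns are allele codes read as integers.
toy <- data.frame(
  SampleID = c("s1", "s2", "s3"),
  PopData  = c(1L, 2L, 1L),
  A0001    = c(1L, 1L, 2L),
  A0002    = c(3L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Convert every column except the first to a factor, whatever the
# number of loci in the file.
cols <- seq_len(ncol(toy))[-1]
toy[cols] <- lapply(toy[cols], as.factor)
```

Running str(toy) afterwards shows factors for every column except the sample ID.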

4. Run random forest for the entire dataset. The number of trees is set at 5000. I added class weight between infected and non-infected because of the disparity.

```{r}
# Drop column 1, keep column 2 (PopData), and keep only loci without NAs.
SNPRF <- randomForest(
  PopData ~ .,
  data = SNPdata[, c(FALSE, TRUE, !apply(is.na(SNPdata[, 3:8379]), 2, any))],
  ntree = 5000,
  localImp = TRUE,
  classwt = c(43/58, 15/58)
)
```

5. Volker's code for random forest results summary.

Notes:

Importance Frame Components

The importance frame contains 13 rows, each corresponding to a predictor, and 8 columns, of which one stores the variable names and the rest store the variable importance measures of a variable Xj:

- accuracy_decrease (classification): mean decrease of prediction accuracy after Xj is permuted
- gini_decrease (classification): mean decrease in the Gini index of node impurity (i.e. increase of node purity) by splits on Xj
- mse_increase (regression): mean increase of mean squared error after Xj is permuted
- node_purity_increase (regression): mean node purity increase by splits on Xj, as measured by the decrease in sum of squares
- mean_minimal_depth: mean minimal depth calculated in one of three ways specified by the parameter mean_sample
- no_of_trees: total number of trees in which a split on Xj occurs
- no_of_nodes: total number of nodes that use Xj for splitting (it is usually equal to no_of_trees if trees are shallow)
- times_a_root: total number of trees in which Xj is used for splitting the root node (i.e., the whole sample is divided into two based on the value of Xj)
- p_value: p-value for the one-sided binomial test using the following distribution:

*This test tells us whether the observed number of successes (number of nodes in which Xj was used for splitting) exceeds the theoretical number of successes if they were random (i.e. following the binomial distribution given above).

```{r}
summary(SNPRF)
str(SNPRF)
SNPRF$importance
sort(SNPRF$importance)
varImpPlot(SNPRF, sort = TRUE, n.var = 10, main = "Top 10 - Variables of Importance")
```

*** Ran this 10 times to get a consensus of ranking of variables****

6. Generate a multi_way importance plot.

```{r}
importance_frame <- measure_importance(SNPRF)
save(importance_frame, file = "importance_frame.rda")
load("importance_frame.rda")
importance_frame
plot_multi_way_importance(importance_frame,
                          x_measure = "accuracy_decrease",
                          y_measure = "gini_decrease",
                          size_measure = "p_value",
                          no_of_labels = 10,
                          main = "RBDU Random Forest SNP Importance plot")
```

Chapter 4: Host demographic variables associated with avian Plasmodium

Introduction

Birds are reservoir hosts for many mosquito-borne diseases known to infect humans (Reed et al., 2003; Hamer et al., 2012). Thus, the development of novel strategies to disrupt the transmission of these disease-causing pathogens is of great importance to the health of humans, domestic livestock, and wildlife (Burkett-Cadena et al., 2014). Basic scientific knowledge on host demographics that may explain differences in the rate of or susceptibility to mosquito-borne diseases is a crucial step towards understanding transmission mechanisms, host population dynamics, and host population viability (Hasselquist, 2007; Dunn et al., 2013; Burkett-Cadena et al., 2014).

In birds, the scientific literature often provides conflicting results concerning the strength and direction of the association between sex and Plasmodium infection (McCurdy et al., 1998; Calero-Riestra and Garcia, 2016). Some studies have documented higher rates of Plasmodium infection in male birds (Fernandez et al., 2009; Yohannes et al., 2009; Baillie et al., 2012; Biedrzycka et al., 2014; Calero-Riestra et al., 2016; Fast et al., 2016; Hanel et al., 2016), whereas others found higher rates in females (Fernandez et al., 2009; Fecchio et al., 2015; Bertram et al., 2017; Ishtiaq et al., 2017; Muriel et al., 2018) or found no association with sex at all (Kulma et al., 2013; Perez-Rodriguez et al., 2013; Podmokla et al., 2014; Lewicki et al., 2015; Svoboda et al., 2015; Smith et al., 2017). The sex disparity in Plasmodium infection rates observed between bird hosts is likely attributable to differences in exposure risk mediated by behavior and physiology (Magnhagen, 1991; McCurdy et al., 1998; Hasselquist, 2007; Dunn et al., 2011; Isaksson et al., 2013; Chiver et al., 2014; Burkett-Cadena et al., 2014; Calero-Riestra and Garcia, 2016; Escallon et al., 2016).

Behavioral differences in exposure risk that may explain sex-biased Plasmodium infection include differences in mating behaviors, parental care, and dispersal (Hasselquist, 2007; Chiver et al., 2014). Male birds often take part in riskier behaviors during the breeding season that are associated with securing a mate: mating displays, ornate plumages, and prospecting for extra-pair copulation (Dunn et al., 2011). These risky behaviors increase the visibility of male birds to potential mates and competitors but also inadvertently increase their ‘visibility’ to mosquitoes looking for a blood meal. Therefore, it is likely that males would have a greater rate of Plasmodium infection than females.

In addition to behavior, differences in physiology may explain sex bias in Plasmodium infection. Testosterone, the male sex hormone, reduces cell-mediated (T-cell) and humoral (B-cell) immunity (McCurdy et al., 1998; Hasselquist, 2007). The immunosuppressive effect of sex hormones, mostly testosterone, and its role in susceptibility to pathogens is well documented in mice, lizards, birds, and humans (Kamis and Ibrahim, 1989; Schuurs and Verheul, 1990; Zuk and Johnson, 1998; Duffy et al., 2000; Norris and Evans, 2000; Schall, 2000; Hasselquist, 2007; Trigunaite et al., 2015; Calero-Riestra and Garcia, 2016).

Developmental stage (age) of bird hosts is also a potential predictor of avian Plasmodium infection and severity of disease symptoms (Calero-Riestra and Garcia, 2016). Hatchlings and nestlings are more vulnerable to Plasmodium than adults. Their vulnerability to Plasmodium infection and severe disease symptoms is likely a manifestation of a naïve immune system, nakedness (without feathers), and restricted mobility (Fromhage, 2001; Cosgrove et al., 2006). The correlation between host age and parasite prevalence has rarely been studied explicitly, and therefore, drawing definitive conclusions about their association is difficult (Burkett-Cadena et al., 2010; Calero-Riestra and Garcia, 2016).

Ducks, swans, and geese are described as aquatic foragers. These birds are known to have low rates of Plasmodium infection (Peirce and Brooke, 1993; Merino et al., 1997; Piersma, 1997; Merino and Minguez, 1998; Figuerola, 1999; Engström et al., 2000; Jovani et al., 2001; Levin et al., 2009; Yohannes et al., 2009; Quillfeldt et al., 2010; Mendes et al., 2013; Ramey et al., 2013; Hellard et al., 2016; Soares et al., 2016; Martínez-de la Puente et al., 2017; Campioni et al., 2018) compared to terrestrial foragers like passerines and pheasants. This variation in Plasmodium infection may be explained by differences in vector contact, immunological function, or behavior (Piersma, 1997; Figuerola, 1999; Martínez-Abrain et al., 2004).

In addition to sex, age, and foraging strategy, we also tested absolute latitude and species as variables that may explain Plasmodium infection in birds. In Chapter 4, I performed a meta-analysis to determine key host demographic variables influencing patterns of Plasmodium in birds.

Methods

Literature Search Strategy

We performed a systematic search of the PubMed and MalAvi databases for publicly available data published between 2002 and 2019 to aggregate and contrast the results from studies on avian Plasmodium infection. Our literature search focused on studies using PCR to detect the presence of the Plasmodium parasite (Fallon et al., 2003). The following keywords were used as search criteria: “avian Plasmodium”, “avian malaria”, AND “2002-2018” to create a reference list of studies. To minimize publication bias, we manually searched the retrieved reference list for relevant studies that met the inclusion criteria. The minimum information required for study inclusion was sample size, species, sex (male, female), age (adult, juvenile), location, and scored Plasmodium infection status. Additional information was extrapolated from the species identification, including reproductive behavior, parental care, migratory behavior, and species range. We followed PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) guidelines to obtain the final dataset, which included total sample size aggregated to total data entries (Table 4.1). Criteria used to exclude studies from the meta-analysis included raw data not being publicly available, lack of a PCR method to determine Plasmodium infection, no confirmation of Plasmodium infection through DNA sequencing, experimental inoculation of Plasmodium, and use of the same data set in multiple publications. If sufficient data were unavailable, the study was excluded from the meta-analysis. These exclusion criteria filtered out studies that did not provide sufficient information for addressing the objective.

Statistical analysis

We performed a meta-analysis of proportions in the statistical programming environment R (R Core Team, 2019) using the random effects meta-analysis function rma.glmm() from the ‘metafor’ package, together with the ‘meta’ and ‘weightr’ R packages (adapted from Viechtbauer, 2010; Appendix 4 S1 for R script).

Steps used in meta-analysis

1. Test for heterogeneity and pool proportions for each study

We performed a random effects meta-analysis across our compiled dataset using Cochran’s Q (Q2), a chi-squared statistic for heterogeneity. The random effects model was used because it does not assume similar conditions or subjects across studies and does not weight studies based on the information provided (Gelman and Hill, 2006). We calculated summary proportions of Plasmodium infection (number infected / total tested) across all studies and host demographic variables (i.e. age, adult or juvenile; sex, male or female; foraging habitat, aquatic or terrestrial; latitude, species, and first author and year of publication). To pool effect sizes across all studies, we used a logit link to calculate the odds ratios from the summary proportions and quantitatively compare study proportions within our meta-analysis. We tested the odds ratios of summary proportions as a function of first author and year to measure Q2 across the dataset, the between-study variance (tau²), and the percentage of total variation attributable to between-study heterogeneity (I²). Finally, we created a linear model with a restricted maximum likelihood estimator using the function predict() from the package ‘metafor’ (Viechtbauer, 2010) in the statistical programming environment R (Sullivan and Feinn, 2012; R Core Team, 2019).

Table 4.1: Avian Plasmodium studies included in the meta-analysis. Study details in columns 1-3. Ppos – number of individuals positive for Plasmodium; Total – total number of birds tested for Plasmodium; Proportion – Ppos/Total; Sex – female (F), male (M); Age – adult (A), juvenile (J); Foraging – terrestrial (T), aquatic (A)
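The quantities named in this step (logit-transformed proportions, Cochran's Q, tau² and I²) can be illustrated in base R. This is a simplified sketch under stated assumptions: the counts are hypothetical, and the pooling uses the classical inverse-variance approach with a DerSimonian-Laird estimate of tau², which is a swapped-in technique and not the generalized linear mixed model fitted by rma.glmm() in the dissertation.

```r
# Hypothetical study-level counts (NOT the dissertation's data).
infected <- c(12, 5, 30, 8)
tested   <- c(100, 80, 120, 40)

# Logit-transformed proportions and their approximate sampling variances.
yi <- log(infected / (tested - infected))
vi <- 1 / infected + 1 / (tested - infected)

# Fixed-effect (inverse-variance) pooled estimate and Cochran's Q.
wi    <- 1 / vi
mu_fe <- sum(wi * yi) / sum(wi)
Q     <- sum(wi * (yi - mu_fe)^2)
df    <- length(yi) - 1
p_Q   <- pchisq(Q, df = df, lower.tail = FALSE)

# DerSimonian-Laird between-study variance (tau^2) and I^2 (percent).
C_const <- sum(wi) - sum(wi^2) / sum(wi)
tau2    <- max(0, (Q - df) / C_const)
I2      <- max(0, (Q - df) / Q) * 100

# Random-effects pooled logit, back-transformed to a proportion.
wi_re  <- 1 / (vi + tau2)
mu_re  <- sum(wi_re * yi) / sum(wi_re)
se_re  <- sqrt(1 / sum(wi_re))
pooled <- plogis(mu_re)
ci     <- plogis(mu_re + c(-1, 1) * qnorm(0.975) * se_re)
```

Working on the logit scale keeps the back-transformed summary proportion and its confidence limits inside the interval from 0 to 1, which is the practical reason for the logit link.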

2. Identify outliers and quantify their influence on overall summary proportion

To determine if any individual studies influenced total dataset heterogeneity with a large effect, squared Pearson’s residuals and Cook’s D values were calculated and visualized using the functions baujat() and influence() from the R package ‘metafor’ (Baujat et al., 2002; Viechtbauer, 2010; R Core Team, 2019). High Cook’s distance values indicate high residual values and influence (Cook, 1977).

3. Test for publication bias and its effect on the overall summary proportion

To detect and confirm publication bias, we generated a funnel plot using the function funnel.rma() from the R package ‘metafor’ (Viechtbauer, 2010) and performed an Egger’s regression test (Egger et al., 1997).
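The idea behind an Egger-type regression test can be sketched in base R as a weighted regression of effect size on standard error. The effect sizes and standard errors below are hypothetical, and the plain lm() fit is a simplification for illustration; it is not the exact model fitted by the ‘metafor’ routines used in the dissertation.

```r
# Hypothetical logit proportions and standard errors (NOT the dissertation's data).
yi  <- c(-2.0, -1.6, -1.1, -0.9, -0.4, -1.8)
sei <- c(0.20, 0.30, 0.45, 0.50, 0.70, 0.25)

# Weighted regression of effect size on standard error, with inverse-variance
# weights. A slope that differs from zero suggests small-study effects,
# i.e. funnel plot asymmetry.
fit <- lm(yi ~ sei, weights = 1 / sei^2)
slope_test <- summary(fit)$coefficients["sei", ]
```

The slope on the standard error in this weighted fit is the same quantity as the intercept in Egger's original formulation, where the standardized effect (yi / sei) is regressed on precision (1 / sei).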

4. Identify host demographic variables of influence on overall summary proportion

To determine which host demographics may be most associated with Plasmodium infection in birds, we created linear models using the function rma() from the R package ‘metafor’ (Viechtbauer, 2010; R Core Team, 2019). All linear models were constructed using Plasmodium infection odds ratios as a function of singular variables (e.g. sex) or all possible interactions with other variables where ecologically relevant. For example, we created models for Plasmodium infection odds ratios as a function of sex, as well as separate models for Plasmodium infection odds ratios as a function of interactive terms for sex and foraging; sex, foraging, and latitude; and sex, foraging, latitude, and age. Foraging habitat and species were nested variables since foraging habitat was determined by the natural history of each individual species.

5. Test the significance of host demographic variables of influence using a linear mixed effects model

To test the interaction terms that best described the odds of Plasmodium infection, we created a linear mixed effects model for the odds ratios as a function of foraging and sex with a random effect of first author and year of publication using the functions lmer() and anova() from the R packages ‘lme4’ and ‘lmerTest’ (R Core Team 2019; Bates et al., 2019; Kuznetsova et al., 2019). Model variables were tested using a Type II Wald’s Chi-squared analysis of variance (ANOVA) and a restricted maximum likelihood estimator for fit.

Results

The final data set for our meta-analysis included 5,630 birds from 41 different species distributed between ± 3-71° latitude in 33 peer-reviewed studies (Table 4.1).

1. Heterogeneity and summary proportion

We had substantial heterogeneity across the dataset (Q = 1057.2833, df = 92; tau² ± SE, 2 ± 0.36; P < 0.0001), indicating large variation across studies rather than within studies (West et al., 2010). Across the entire dataset, avian Plasmodium infection was detected in 17% of the total number of tested individuals (mean summary proportion = 0.17, lower CI = 0.13, upper CI = 0.22). The I² statistic was 94%, indicating that a large percentage of total variation resulted from variability between studies (Huedo-Medina et al., 2006). A forest plot was generated for our entire dataset (Figure 4.1), which describes the proportion of birds infected for each study class (e.g., adult, female) and the 95% confidence interval.

Figure 4.1: Forest plot of Plasmodium infection proportion in birds. Each row represents an age and sex combination class for each study. Additionally, each row provides the event (number of individuals infected with Plasmodium), total (number of individuals tested), proportion (percent of tested individuals infected), the 95% confidence interval (CI) for the proportion, and the weights assumed for Fixed or Random effect models. The summary proportion estimates for Fixed and Random effect models are provided at the bottom of the plot. The dotted line represents the Random effect estimate and the dashed line represents the Fixed effect estimate of the summary proportion.

2. Identify outliers of influence

To identify studies with large residuals that may be considered outliers influencing our summary proportion, we calculated squared Pearson residual values for each study and used a Baujat plot to visualize the data (Figure 4.2). From our analysis, we identified two potential outlier studies: 78 (Fernández et al., 2009) and 73 (Fecchio et al., 2015) (Figure 4.2). Neither study had a Cook’s distance above the influence threshold of 0.05; thus, neither potential outlier significantly influenced the overall summary proportion (Figure 4.3).

Figure 4.2: Baujat plot to identify outliers. The x-axis represents the squared Pearson residual for each study, and the y-axis represents the influence on the overall summary proportion. Outlier studies have high squared Pearson residual values and high influence on the overall result. Each numerical value = study number (Table 4.1). The squared standardized Pearson residual values have approximately a chi-squared distribution with df = 1. Thus, at a critical alpha value of 0.05, a squared standardized Pearson residual greater than 4 (blue line, i.e., χ²(1, 0.05) = 3.84) is significant. It appears studies 78 (Fernandez et al., 2009) and 73 (Fecchio et al., 2015) may be significant outlier studies.

[Figure 4.3 image: x-axis, study; y-axis, Cook’s distance]

Figure 4.3: Cook’s distance test for outlier influence. Studies and their representative numbers are shown on the x-axis and the Cook’s distance values on the y-axis. The threshold for influence was a Cook’s distance of 0.05. If an individual study’s value were above the influence threshold, that study would be indicated in red. No study, including studies 78 and 73, had a Cook’s distance greater than 0.05. Therefore, studies 78 and 73 are not considered outlier studies of influence and remained in the dataset.

3. Publication bias

To visualize the distribution of the individual proportion for each study compared to the mean summary proportion and its 95% confidence interval, we generated a funnel plot (Figure 4.4). The asymmetry of the dots in the funnel plot indicated publication bias. To quantify asymmetry in the funnel plot, we generated a Trim and Fill plot, which estimates the number of studies missing from our analysis and their effect on the summary proportion estimate (Figure 4.5). Significant asymmetry (P < 0.0001) was detected in the dataset. An estimated 24 studies with values greater than the overall summary proportion were missing from the meta-analysis. To confirm that publication bias was present, we used an Egger’s regression test to measure funnel plot asymmetry. This weighted regression used standard error as a predictor for asymmetry. Publication bias was significant (P < 0.0001).

Figure 4.4: Funnel plot for proportion of avian Plasmodium infection. The x-axis represents the proportion of Plasmodium infection for each study (black dot) and the y-axis represents the standard error of each study. The summary proportion (0.17) is represented by the vertical line bisecting the funnel shape. The summary proportion pseudo 95% confidence intervals are represented by the dotted lines. Pseudo 95% confidence interval = [summary proportion estimate − (1.96 × standard error)] and [summary proportion estimate + (1.96 × standard error)] for each standard error on the vertical axis. The asymmetry of the dot distribution indicates publication bias.
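The pseudo 95% confidence limits defined in the funnel plot caption can be computed directly in base R. This is a minimal sketch: the function name `funnel_limits()` and the grid of standard errors are hypothetical, and 0.17 is the summary proportion reported in the Results.

```r
# Pseudo confidence limits around a summary estimate across a grid of
# standard errors, as drawn by the dotted lines of a funnel plot.
funnel_limits <- function(summary_estimate, standard_error, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  data.frame(
    standard_error = standard_error,
    lower = summary_estimate - z * standard_error,
    upper = summary_estimate + z * standard_error
  )
}

# The grid of standard errors is hypothetical.
limits <- funnel_limits(0.17, standard_error = seq(0, 0.10, by = 0.02))
```

The limits widen linearly with the standard error, which produces the funnel shape; on the raw proportion scale the lower limit can fall below zero for imprecise studies.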

Figure 4.5: Trim and Fill plot for proportion of avian Plasmodium infection. The x-axis represents the proportion of Plasmodium infection for each study (black dot) and the y-axis represents the standard error of each study. The adjusted summary proportion (0.27) is represented by the vertical line bisecting the funnel shape. The summary proportion pseudo 95% confidence intervals are represented by the dotted lines. Pseudo 95% confidence interval = [summary proportion estimate − (1.96 × standard error)] and [summary proportion estimate + (1.96 × standard error)] for each standard error on the vertical axis. In our analysis, we estimate that 24 studies are missing (white dots) from the right side of our funnel plot, revealing significant publication bias (P < 0.0001), which suggests that studies with a high proportion of Plasmodium infected individuals are unpublished.

Likelihood ratio tests

Host demographic variables that best described the patterns of Plasmodium infection in birds were interactions between foraging and latitude (P = 0.017) and foraging and first author and year of publication (P < 0.0001; Table 4.2).

Table 4.2: Likelihood ratio test results for all host demographic variables and interactions between variables. Each row represents a variable or interaction among variables that may explain the variation in avian Plasmodium infection in the dataset. The interactions Foraging*Latitude and Foraging*author_year are significant. These variable interactions were used in the model created in Step 5.

Host demographic variables              p-value
Sex                                     0.094
Age                                     0.31
Foraging                                <.0001
Species                                 0.009
Latitude                                0.009
author_year                             <.0001
Foraging*Age                            0.11
Foraging*Sex                            0.12
Foraging*Latitude                       0.017
Foraging*author_year                    <.0001
author_year*Sex                         0.12
author_year*Age                         0.08
author_year*Latitude                    0.12
author_year*Foraging*Latitude           0.13
author_year*Foraging*Age                0.12
author_year*Foraging*Sex                0.13
author_year*Foraging*Latitude*Sex       0.13
author_year*Foraging*Latitude*Age       0.12

4. Test linear mixed effects model

We used the terms identified in Step 4 to construct a linear mixed effects model (lmer). We modeled the summary proportion of Plasmodium infection as a function of the interaction of foraging and latitude, with author_year as a random effect. By including author_year as a random effect, our model accounts for variation in the dataset that is not due to random chance. We expect differences from site to site not only because of random chance, but because the sites are inherently different. The interaction of foraging and author_year was excluded as a fixed effect, but included as a random effect in our model. The interaction of foraging and latitude did not explain Plasmodium infection proportions in birds (P = 0.4793), nor did foraging (P = 0.0839) or latitude (P = 0.5047) alone.

Discussion

The purpose of Chapter 4 was to identify host demographic variables that best explain the observed variation in Plasmodium infection in birds. Our literature search gathered information for over 6,000 individuals from 41 species across a wide range of latitudes (±3–71°).

Biological Implications

Previous studies have tried to explain variation in Plasmodium infection in birds. These studies have focused on season, taxonomic order, host sex, age, body mass, density, and foraging strata as explanatory variables for variation in Plasmodium infection (Hasselquist, 2007; Dunn et al., 2011; Baillie et al., 2012; Dunn et al., 2013; Pérez-Rodríguez et al., 2013; Isaksson et al., 2013; Cornelius et al., 2014; Kulma et al., 2014; Podmokła et al., 2015; Ramey et al., 2015; Lewicki et al., 2015; Bosholn et al., 2016; Calero-Riestra and García, 2016; Hellard et al., 2016; Escallon et al., 2016; Dubeic et al., 2016; Ishtiaq et al., 2017; Gutiérrez-López et al., 2019). However, these studies do not address foraging habitat (aquatic vs. terrestrial) or latitude, two variables that more directly reflect the likelihood of interacting with the mosquito vector. In addition, many of these previous studies monitor Plasmodium infection in only one or a few closely related species. Our study integrates all of these variables to model the variation in Plasmodium infection at the broadest scale.

In our study, age did not appear to be an important variable for describing differences in Plasmodium infection. This may be due to biases in our dataset, which may have over-represented adult birds. The age bias is most likely due to sampling technicians not assessing the age of birds in massive sampling efforts, to adult-only studies (Fernandez et al., 2009; Yohannes et al., 2009; Dunn et al., 2011; Baillie et al., 2012; Isaksson et al., 2013; Mendes et al., 2013; Knowles et al., 2014; Bosholn et al., 2016; Gangoso et al., 2016; Ganser et al., 2016; Ishtiaq et al., 2017; Martínez-de la Puente et al., 2017; Schmid et al., 2017; Muriel et al., 2018), or to authors not reporting age in their manuscripts because its association with Plasmodium infection was not significant.

Here, we did not observe a significant association of sex (M vs. F) or species with differences in Plasmodium infection in birds. It appears that sexual differences in behavior and physiology do not explain the variation in infection across bird species.

Historically, aquatic foragers have had low rates of Plasmodium infection (Peirce and Brooke, 1993; Merino et al., 1997; Piersma, 1997; Merino and Minguez, 1998; Figuerola, 1999; Engström et al., 2000; Jovani et al., 2001; Levin et al., 2009; Yohannes et al., 2009; Quillfeldt et al., 2010; Mendes et al., 2013; Ramey et al., 2013; Soares et al., 2016; Martínez-de la Puente et al., 2017; Campioni et al., 2018) compared to terrestrial foragers. Thus, many researchers studying aquatic foragers may not publish null results or may see little value in paying to screen for a parasite found at such low prevalence. The lower rates of Plasmodium infection may reflect host immunological competence, behavioral characteristics, or the presence of appropriate vectors at higher latitudes (Piersma, 1997; Figuerola, 1999; Martínez-Abrain et al., 2004).

The significant interaction observed between latitude and foraging may have been influenced by aquatic foragers being sampled more often at high latitudes or in more temperate regions. As previously mentioned, aquatic foragers are known to have lower rates of Plasmodium infection, but they also often breed at higher latitudes, where the vector of Plasmodium is less likely to be found because of lower temperatures and humidity (Burgess, 1959; Clements et al., 1963; Gilles et al., 1972; Beck-Johnson et al., 2013; Abiodun et al., 2017). In our study, the range of latitudes for aquatic foragers was ±18–71° compared to ±3–61° for terrestrial foragers, which may partially explain the significant interaction we observed between latitude and foraging. The biological implications of this meta-analysis are broad: it has the potential to reveal global trends in Plasmodium infection across bird species and latitudes.

Substantial Heterogeneity

Within the dataset, we observed substantial heterogeneity (94%) that likely skewed the overall summary proportions and impeded our ability to identify explanatory variables of variation in avian Plasmodium infection. In an effort to decrease heterogeneity, we will increase the number of studies and the number of explanatory variables included in our analysis. First, we plan to increase the number of search engines used to obtain peer-reviewed literature. Scopus and Google Scholar will be searched for manuscripts that fulfill our inclusion criteria. Moreover, we will request raw data from corresponding authors for publications that were excluded in the latter portions of the PRISM pipeline. Second, the host demographic variables assessed in this meta-analysis may not be sufficient to describe the data set. Other demographic variables that could improve our model include host body size, host density, host plumage color, and more specific descriptors of host foraging or Plasmodium species (Freeman-Gallant et al., 2001; Scheuerlein and Ricklefs, 2004; Isaksson et al., 2013; Gangoso et al., 2016). Additional data may alleviate the substantial heterogeneity observed in this data set and increase our ability to construct a model that describes the variation in Plasmodium infection in birds.
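To make the heterogeneity statistic concrete, the following base R sketch computes Cochran's Q and a Q-based I² from logit-transformed proportions. The counts are hypothetical, and the Q-based formula is the common textbook form; it is an assumption that it approximates the value reported by metafor, which derives I² from the estimated between-study variance.

```r
# Hypothetical counts for five studies (infected, tested); illustration only
pos   <- c(5, 40, 12, 60, 3)
total <- c(50, 120, 200, 150, 90)

# Logit-transformed proportions and their sampling variances
yi <- log(pos / (total - pos))
vi <- 1 / pos + 1 / (total - pos)

# Inverse-variance weights and the fixed-effect pooled logit
wi     <- 1 / vi
pooled <- sum(wi * yi) / sum(wi)

# Cochran's Q and the Q-based I^2 (percentage of variability beyond sampling error)
Q    <- sum(wi * (yi - pooled)^2)
df_q <- length(yi) - 1
I2   <- max(0, (Q - df_q) / Q) * 100

cat("Q =", round(Q, 2), " I2 =", round(I2, 1), "%\n")
```

A high I² indicates that most of the spread in study proportions reflects real differences between studies rather than sampling error, which is the situation described above.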

Biases in our analysis

In addition to heterogeneity between studies, several types of bias may be confounding our analysis. Publication bias observed in our data set revealed that an estimated 24 studies were missing from the right side of the funnel plot. If these 24 studies were included in our analysis, avian Plasmodium infection would be an estimated 27% of the total number of tested individuals (mean summary proportion = 0.27, CI lower = 0.22, CI upper = 0.32), compared to 17% without these studies.

There are multiple explanations for the publication bias observed within the compiled dataset. First, the inclusion or exclusion of a study from the dataset relied heavily on our ability to access and quantify the data. Many studies were excluded because we could not determine the proportion of individuals infected with Plasmodium in our demographic classes (adult female, adult male, etc.). Many of the studies excluded from our meta-analysis reported higher Plasmodium infection rates, but the lack of information differentiating host demographic classes or Plasmodium from Haemoproteus infections prevented their inclusion (Ribeiro et al., 2005; Szollosi et al., 2011; Grillo et al., 2012; Podmokla et al., 2014; Svoboda et al., 2015; Delhaye et al., 2016; Dubeic et al., 2016; Smith et al., 2017). Second, the type of data needed for this analysis may not be published. This may be due to negative results or to results that fall outside the set narrative of a publication. The findings from this analysis may motivate authors not only to publish avian Plasmodium results but, more specifically, to make their data openly accessible. Publication bias may explain our inability to identify host demographic variables that truly describe the patterns of avian Plasmodium infection.

Other types of bias in our analysis include search and selection bias. The search terms used in our literature search may have been too specific; broader search terms like “avian parasites” OR “avian infections” may increase the number of studies retrieved. Additionally, the use of specialized search engines, such as PubMed and MalAvi, likely limited the publications retrieved. Lastly, exclusion of non-English language studies could have negatively impacted our results for publication bias and asymmetry (Jüni et al., 2002; Moher, 2003). We will expand this dataset in the future to include studies that were not published in English but otherwise fulfilled our search criteria (Grégoire et al., 1995; Moher, 1996; Higgins et al., 2019).

Finally, selection bias refers to the tendency of meta-analytic authors to select particular studies for their analysis (Eisend and Tarrahi, 2014). Whereas publication bias is based on publication of a manuscript, selection bias occurs in response to the selection criteria used by a meta-analyst (Eisend and Tarrahi, 2014). Although we used inclusion and exclusion criteria to filter studies, these protocols may have influenced the studies used for our quantitative analysis and the strength of our meta-analytic estimate (Eisend and Tarrahi, 2014).

Linear mixed effects model

We identified interaction terms for foraging and latitude and for foraging and author_year that described some of the variation in Plasmodium infection. After constructing a linear mixed effects model, we determined that these variables were insufficient. Our efforts to build an explanatory model for avian Plasmodium infection are most likely complicated by the amount of heterogeneity and the significant publication bias in the dataset. With the inclusion of additional studies in the data set, foraging habitat or one of the other host demographic variables may become significant.

Whilst our meta-analysis did not identify host demographic variables that explain the variation in avian Plasmodium infection, the complications we experienced due to heterogeneity and publication bias may motivate authors to submit their publications or make their data available to the public. This line of work has the potential to improve wildlife management initiatives by identifying key host demographics that are associated with higher rates of infection and thus pose the greatest threat to the spread of mosquito-borne diseases.

Appendix 4

Appendix 4 S1: R script for Meta-analysis of Proportions on avian Plasmodium infection.

title: "Meta-analysis"

author: "Sara Seibert"

date: "1/15/2020"

1. Load these packages

```{r}
library("metafor")
library("meta")
library("weightr")
library(lme4)
library(car)
library(jtools)
library(sandwich)
library(multcomp)

```

2. Read in data. View data

```{r}
dat <- read.csv("E:/Meta-analyses/Plas_meta_long_02172020.csv")
print(dat, row.names = FALSE)

```

Heterogeneity and Outliers

3. Calculate effect size and heterogeneity

### Calculating overall summary proportion

Notes: Observed proportions need to be converted into logits to account for their non-normal distribution.

"PLO" stands for the logit transformation.

"REML" (used in the chunk below) estimates the degree of heterogeneity by restricted maximum likelihood; "ML" would use the maximum likelihood procedure.

predict = converts logits back to proportions.

Interpret results:

#pred = summary proportion (for each variable, where applicable); prevalence of CC

#tau2 = total between-study variance (unexplained + explained between-study variance)

#I2 = percentage of total variability attributable to between-study heterogeneity

#Q = tests whether the included studies share a common effect size; indicates whether there is substantial heterogeneity in the meta-analysis

```{r}
ies.logit = escalc(xi=Ppos, ni=Total, measure="PLO", data=dat)
pes.logit = rma(yi, vi, data=ies.logit, method="REML", weighted=TRUE)
pes = predict(pes.logit, transf=transf.ilogit, digits=5)
print(pes, digits=5)
print(pes.logit, digits=4)
confint(pes.logit, digits=2)

```

4. Calculating subgroup summary proportions according to binary variables: sex, age, and foraging.

```{r}
# Fit a separate random-effects model for each level of sex, age, and foraging
pes.logit.M=rma(yi,vi,data=ies.logit,subset=Sex=="M",method="REML")
pes.logit.F=rma(yi,vi,data=ies.logit,subset=Sex=="F",method="REML")
pes.logit.J=rma(yi,vi,data=ies.logit,subset=Age=="J",method="REML")
pes.logit.A=rma(yi,vi,data=ies.logit,subset=Age=="A",method="REML")
pes.logit.Aquatic=rma(yi,vi,data=ies.logit,subset=Foraging=="Aquatic",method="REML")
pes.logit.Terrestrial=rma(yi,vi,data=ies.logit,subset=Foraging=="Terrestrial",method="REML")

# Back-transform each subgroup estimate from logits to proportions
pes.M=predict(pes.logit.M,transf=transf.ilogit,digits=5)
pes.F=predict(pes.logit.F,transf=transf.ilogit,digits=5)
pes.J=predict(pes.logit.J,transf=transf.ilogit,digits=5)
pes.A=predict(pes.logit.A,transf=transf.ilogit,digits=5)
pes.Aquatic=predict(pes.logit.Aquatic,transf=transf.ilogit,digits=5)
pes.Terrestrial=predict(pes.logit.Terrestrial,transf=transf.ilogit,digits=5)

# Print the summary proportion, model fit, and heterogeneity confidence intervals
print(pes.M,digits=5);print(pes.logit.M,digits=2);confint(pes.logit.M,digits=2)
print(pes.F,digits=5);print(pes.logit.F,digits=2);confint(pes.logit.F,digits=2)
print(pes.J,digits=5);print(pes.logit.J,digits=2);confint(pes.logit.J,digits=2)
print(pes.A,digits=5);print(pes.logit.A,digits=2);confint(pes.logit.A,digits=2)
print(pes.Aquatic,digits=5);print(pes.logit.Aquatic,digits=2);confint(pes.logit.Aquatic,digits=2)
print(pes.Terrestrial,digits=5);print(pes.logit.Terrestrial,digits=2);confint(pes.logit.Terrestrial,digits=2)

```

5. Calculating summary proportions for continuous variables: species, author_year, and latitude

```{r}

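# NOTE: 'subset' expects a logical or numeric index vector; passing the Species column as-is may not select rows as intended.
pes.logit.Species=rma(yi,vi,data=ies.logit,subset=Species,method="REML")
pes.Species=predict(pes.logit.Species,transf=transf.ilogit,digits=5)
print(pes.Species,digits=5);print(pes.logit.Species,digits=2);confint(pes.logit.Species,digits=2)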

```

```{r}
pes.logit.Latitude=rma(yi,vi,data=ies.logit,subset=Latitude,method="REML",verbose=TRUE, digits=5, control=list(maxiter=1000))
pes.Latitude=predict(pes.logit.Latitude,transf=transf.ilogit,digits=5)
print(pes.Latitude,digits=5);print(pes.logit.Latitude,digits=2);confint(pes.logit.Latitude)

```

```{r}
pes.logit.author_year=rma(yi,vi,data=ies.logit,subset=author_year,method="REML")
pes.author_year=predict(pes.logit.author_year,transf=transf.ilogit,digits=5)
print(pes.author_year,digits=5);print(pes.logit.author_year,digits=2);confint(pes.logit.author_year,digits=2)
```

6. Identify study outliers that may be significantly influencing the average summary proportion.

Notes: Studies whose studentized residuals have absolute z-values greater than 2 could be outliers.

#rstudent = studentized residuals; externally standardized residuals#

```{r}
stud.res=rstudent(pes.logit)
abs.z=abs(stud.res$z)
stud.res[order(-abs.z)]

```
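The idea behind the studentized (deleted) residuals in step 6 can be illustrated without metafor. The sketch below is a simplification that assumes a fixed-effect model (no between-study variance), so it will not reproduce the values from rstudent() on a random-effects fit; the counts are hypothetical and the last study is deliberately extreme.

```r
# Hypothetical counts (infected, tested); the sixth study is deliberately extreme
pos   <- c(10, 12, 9, 11, 10, 9)
total <- c(100, 110, 95, 105, 100, 12)

# Logit-transformed proportions and their sampling variances
yi <- log(pos / (total - pos))
vi <- 1 / pos + 1 / (total - pos)

# Deleted residual for study i: compare yi with the pooled estimate of all other studies
z <- sapply(seq_along(yi), function(i) {
  w_rest      <- 1 / vi[-i]
  pooled_rest <- sum(w_rest * yi[-i]) / sum(w_rest)
  var_pooled  <- 1 / sum(w_rest)
  (yi[i] - pooled_rest) / sqrt(vi[i] + var_pooled)
})

# Studies with |z| > 2 are candidate outliers
flagged <- which(abs(z) > 2)
print(round(z, 2))
print(flagged)
```

Leaving each study out before pooling keeps an extreme study from pulling the reference estimate toward itself, which is the reason for using externally standardized residuals.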

7. Baujat plot to visualize outliers.

#Some studies may have very large residuals and may be considered outliers.

#Large residuals do not necessarily mean a study is influential; studies with a squared Pearson residual greater than 4 may be significant.

```{r}
baujat(pes.logit)

```

8. Diagnostic test to determine whether studies with a strong influence on the results should be removed. Influential studies will be marked red in the Cook's d plot where values are greater than 0.05.

```{r}
inf=influence(pes.logit)
print(inf)
plot(inf)

```

9. Remove significant outliers identified in the Baujat plot and Cook's d. In this example, studies 62 and 57 were outliers in the studentized residuals and the Baujat plot but not in Cook's d. Do not run this code unless you have outliers. If you do, enter the studies you want removed in the first line of code (the orange numbers).

```{r}
# Recompute effect sizes and the summary proportion without studies 62 and 57
ies.logit.noutlier=escalc(xi=Ppos,ni=Total,measure="PLO", data=dat[-c(62,57),])
pes.logit.noutlier=rma(yi,vi,data=ies.logit.noutlier,method="ML", weighted=TRUE)
pes.noutlier=predict(pes.logit.noutlier,transf=transf.ilogit)
print(pes,digits=5)
print(pes.noutlier,digits=5)

```

10. Generate a forest plot using metaprop function.

Notes: The # code just modifies the forest plot. I used the classic default. Forest plot should show up in your wd as the name you gave it in line 2

```{r}
pes.forest=metaprop(Ppos,Total,author_year,data=dat,sm="PLO",method.ci="NAsm", method.tau="REML")
png("pes.forest_02132020.png", width=1000, height=3000)
forest(pes.forest, col.square="black", hetstat=FALSE)

#forest(#pes.forest,

#xlim=c(0,-3), pscale =1,

#rightcols=c("Proportion","ci","w.random"),

#rightlabs=c("Proportion","95% C.I.","Weights"),

#leftcols = c("event","n"),

#leftlabs = c("author_year","Ppos","Total"),

#xlab = "Prevalence",

#fs.xlab=12,

#fs.study=12,

#fs.study.labels=12,

#fs.heading=12,

#squaresize = 0.5, col.square = "navy", col.square.lines = "navy",

#col.diamond = "navy", col.diamond.lines = "navy",

#comb.fixed=FALSE,

#lty.fixed=0,

#lty.random=2,

#type.study="square",

#type.random="diamond",

#ff.fixed="bold.italic",

#ff.random="bold.italic",

#hetlab="Heterogeneity:",

#fs.hetstat=10,

#smlab="",

#print.Q=TRUE,

#print.pval.Q=TRUE,

#print.I2=TRUE,

#print.tau2=FALSE,

#col.by="grey",

#digits=5)

dev.off()

```

Publication bias

11. To assess publication bias first generate a funnel plot.

```{r}
# funnel() dispatches to the rma method for the fitted model
funnel(pes.logit, atransf=transf.ilogit, xlab="Proportion", ylab="Standard Error", main="Funnel plot for Proportions")

```

12. Simulate the ‘missing’ studies in your funnel plot to add symmetry and predict their impact on the average summary proportion.

```{r}
pes
pes.trimfill=trimfill(pes.logit)
predict(pes.trimfill,transf=transf.ilogit,digits=5)
pes.trimfill

# Note: If you want the x-axis to be expressed as a logit-transformed proportion, remove the "#" sign before the following line.
#funnel(pes.trimfill)

# If you want the x-axis to be expressed as a proportion, use the following line.
funnel(pes.trimfill,atransf=transf.ilogit,yaxis="sei",xlab="Proportion",main="Trim and Fill plot", digits=6)

```

13. Test for publication bias significance using Egger's regression test

```{r}
regtest(pes.logit,model="lm",predictor="sei")

```
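For readers without metafor at hand, the classical weighted-regression form of Egger's test can be sketched in base R: the logit proportions are regressed on their standard errors with inverse-variance weights, and the slope is tested against zero. The counts are hypothetical, and it is an assumption that this mirrors what regtest() does with model="lm" and predictor="sei".

```r
# Hypothetical counts for six studies (infected, tested); illustration only
pos   <- c(5, 40, 12, 60, 3, 22)
total <- c(50, 120, 200, 150, 90, 80)

# Logit-transformed proportions, sampling variances, and standard errors
yi  <- log(pos / (total - pos))
vi  <- 1 / pos + 1 / (total - pos)
sei <- sqrt(vi)

# Weighted regression of effect size on standard error
egger_fit <- lm(yi ~ sei, weights = 1 / vi)

# A slope that differs from zero suggests funnel plot asymmetry
egger_p <- summary(egger_fit)$coefficients["sei", "Pr(>|t|)"]
print(summary(egger_fit)$coefficients)
```

A small p-value for the slope indicates that effect sizes change systematically with study precision, which is one signature of publication bias.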

Likelihood ratio tests

14. ANOVAs

-Age; Note: Not significant

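```{r}
# NOTE: res1 is fitted with ML and res2 with REML; a likelihood ratio test normally requires both models to use the same estimation method.
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Age), data = dat, method = "ML")
res2 <- rma(pes.logit$yi, pes.logit$vi, data = dat, method = "REML")
anova(res1, res2)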

```
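The comparisons in this step are likelihood ratio tests of nested models. The base R sketch below shows the mechanics under a simplifying assumption of a fixed-effect meta-regression (no between-study variance), with hypothetical counts and a hypothetical two-level foraging moderator; it does not reproduce the random-effects fits from rma(). Likelihood ratio tests of models that differ in their fixed effects are generally based on ML rather than REML fits.

```r
# Hypothetical data: counts (infected, tested) and a two-level moderator
pos      <- c(5, 40, 12, 60, 3, 22, 9, 35)
total    <- c(50, 120, 200, 150, 90, 80, 110, 95)
foraging <- factor(rep(c("Aquatic", "Terrestrial"), times = 4))

# Logit-transformed proportions and their (known) sampling variances
yi <- log(pos / (total - pos))
vi <- 1 / pos + 1 / (total - pos)

# Log-likelihood of a fixed-effect meta-regression with known variances vi
loglik_fe <- function(formula) {
  fit <- lm(formula, weights = 1 / vi)  # weighted least squares gives the ML coefficients
  sum(dnorm(yi, mean = fitted(fit), sd = sqrt(vi), log = TRUE))
}

ll_reduced <- loglik_fe(yi ~ 1)         # intercept only
ll_full    <- loglik_fe(yi ~ foraging)  # intercept + moderator

# Twice the log-likelihood difference is compared to a chi-square distribution
lrt_stat <- 2 * (ll_full - ll_reduced)
lrt_p    <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)
cat("LRT =", round(lrt_stat, 2), " p =", signif(lrt_p, 3), "\n")
```

The degrees of freedom equal the number of extra coefficients in the full model, here one for the two-level moderator.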

-Sex; Note: Not significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Sex), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, data = dat, method = "REML")
anova(res1, res2)

```

-Foraging; Note: Very significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Foraging), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, data = dat, method = "REML")
anova(res1, res2)

```

-Species; Note: Significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Species), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, data = dat, method = "REML")
anova(res1, res2)

```

-Latitude; Note: Significant

```{r}

res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Latitude), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, data = dat, method = "REML")
anova(res1, res2)

```

-author_year; Note: Very significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(author_year), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, data = dat, method = "REML")
anova(res1, res2)

```

-Pub_Year; Note: Very significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Pub_Year), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, data = dat, method = "REML")
anova(res1, res2)

```

-First_author; Note: Very significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(First_author), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, data = dat, method = "REML")
anova(res1, res2)

```

-Sex*Foraging when compared to foraging; Note: Not significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Sex, Foraging), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Foraging), data = dat, method = "REML")
anova(res1, res2)

```

-Age*Foraging; Note: Not significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Age, Foraging), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Foraging), data = dat, method = "REML")
anova(res1, res2)

```

-Latitude*Foraging; Note: Significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Latitude, Foraging), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Foraging), data = dat, method = "REML")
anova(res1, res2)

```

-author_year*foraging; Note: Significant

```{r}

res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(author_year, Foraging), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Foraging), data = dat, method = "REML")
anova(res1, res2)

```

-author_year*foraging*latitude; Note: Not significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(author_year, Foraging, Latitude), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Foraging, author_year), data = dat, method = "REML")
anova(res1, res2)

```

-author_year, foraging, age; Notes: Not significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(author_year, Foraging, Age), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Foraging, author_year), data = dat, method = "REML")
anova(res1, res2)

```

-author_year, foraging, Sex; Notes: Not significant

```{r}

res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(author_year, Foraging, Sex), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Foraging, author_year), data = dat, method = "REML")
anova(res1, res2)

```

-SEX*author_year; Notes: Not significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Sex, author_year), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(author_year), data = dat, method = "REML")
anova(res1, res2)

```

-age*author_year; Notes: Not significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Age, author_year), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(author_year), data = dat, method = "REML")
anova(res1, res2)

```

-latitude*author_year; Notes: Not significant

```{r}

res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Latitude, author_year), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(author_year), data = dat, method = "REML")
anova(res1, res2)

```

-author_year*foraging*latitude*sex; Notes: Not significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(author_year, Foraging, Latitude, Sex), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Foraging, author_year, Latitude), data = dat, method = "REML")
anova(res1, res2)

```

-author_year*foraging*latitude*Age; Notes: Not significant

```{r}
res1 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(author_year, Foraging, Latitude, Age), data = dat, method = "REML")
res2 <- rma(pes.logit$yi, pes.logit$vi, mods = cbind(Foraging, author_year, Latitude), data = dat, method = "REML")
anova(res1, res2)

```

Linear mixed effects model

15. Testing the significance of the linear mixed effects model

```{r}
# Fixed effects: latitude, foraging, and their interaction; random intercept for author_year
model <- lmer(Proportion ~ (Latitude*Foraging) + (1|author_year), data=dat)
summary(model)
Anova(model)

```

Concluding remarks

Here, we sought to gain knowledge concerning the dispersal, population genomics, and disease ecology of Southern Hemisphere waterfowl species. In Chapter 1, we focused on four species of waterfowl found in Papua New Guinea and Australia: Wandering Whistling-Duck, Pacific Black Duck, Green Pygmy-Goose, and Radjah Shelduck. Biogeographic barriers in northern Australia and into Papua New Guinea are insufficient to impede gene flow in the whistling-duck, black duck, and pygmy-goose. In contrast, the sedentary shelduck has restricted gene flow across these same barriers; the eastern Queensland population seems to be particularly isolated and subject to inbreeding. From these Australo-Papuan data, conservation and wildlife biologists gain insight into the movements and structure of waterfowl populations in this region that is not readily obtainable from other kinds of data, such as tracking and behavioral studies. Moreover, this chapter provides evidentiary support for a methodological shift in population genomic analyses: assessing kinship and inbreeding prior to quantifying population structure in wild bird populations.

Birds are major vectors and reservoir hosts of infectious diseases across our globe (Reed et al., 2003; Hamer et al., 2012; Gubler, 2002; Altizer et al., 2011). In Chapter 2, we compared the genetic structure of a Southern Hemisphere waterfowl host species, the Red-billed Duck, and its Plasmodium spp. parasite. Although host-parasite theory suggests that parasite population structure should reflect the movement of its more motile host, we did not observe this in our Red-billed Duck-Plasmodium study. We did observe that the common Plasmodium lineage (COPALB03) was more abundant in Red-billed Ducks sampled in Zimbabwe, whereas the rarer lineages (MILANS05, RTSR1, SYBOR10, LAIRI01, MILANS06) were found more often in Botswana and South Africa. From these data, disease ecologists gain a broader understanding of host-parasite interactions in southern Africa and in nomadic, Southern Hemisphere waterfowl species like the Red-billed Duck.

Waterfowl can also be used as a model organism to identify genomic regions associated with lower rates of Plasmodium infection. Based on previous work in southern Africa, we speculated that the lower rates of Plasmodium infection observed in waterfowl were due to natural selection for genetic variants that confer resistance to infection. In Chapter 3, we used the Red-billed Duck genomic data to detect genomic regions that may be experiencing selection or are important for classifying birds as infected or non-infected with Plasmodium. Using a non-traditional population genomic method, we identified several genomic regions and allelic variants associated with avian Plasmodium. The most important genomic region for classifying Red-billed Duck samples mapped to a protein-coding gene that is known to inhibit the invasion of red blood cells by Plasmodium parasites. These results provide candidate genomic regions that may be associated with resistance to avian malaria in birds, and could potentially lead to novel therapeutic interventions for Plasmodium infection in humans.

Finally, in Chapter 4, we explored host demographic variables that may explain the variation in Plasmodium infection rates in birds. Our meta-analysis had substantial heterogeneity between studies and significant publication bias, but we were still able to identify host variables associated with higher rates of Plasmodium infection. Unfortunately, the linear model constructed from these data did not significantly explain the variation in Plasmodium infection. To address the substantial heterogeneity and publication bias, future work will use additional search engines to retrieve literature and will request data that were not publicly available from corresponding authors. The results from this chapter are broadly applicable not only across bird species but also to other vertebrate taxa. Prioritizing the bird host characteristics that are associated with higher Plasmodium infection, and that therefore pose the greatest threat to disease transmission, would be valuable for monitoring and managing the movement of infectious diseases. The knowledge gained from this dissertation advances the fields of avian biology, disease ecology, and public health.

References

Abiodun, B. J., Adegoke, J., Abatan, A. A., Ibe, C. A., Egbebiyi, T. S.,

Engelbrecht, F., & Pinto, I. (2017). Potential impacts of climate change on

extreme precipitation over four African coastal cities. Climatic Change,

143(3-4), 399–413. doi: 10.1007/s10584-017-2001-5

Adegbite, N., Xu, M., Kaplan, F., Shore, E., & Pignolo, R. (2008). Diagnostic and

mutational spectrum of progressive osseous heteroplasia (POH) and other

forms of GNAS‐based heterotopic ossification. American Journal of

Medical Genetics Part A, 146A(14), 1788–1796. doi:

10.1002/ajmg.a.32346

Ahmed, S. B., & Prigent, S. A. (2014). A nuclear export signal and oxidative

stress regulate ShcD subcellular localisation: A potential role for ShcD in

the nucleus. Cellular Signalling, 26(1), 32–40. doi:

10.1016/j.cellsig.2013.09.003

Altizer, S., Bartel, R., & Han, B.A. (2011). Migration and Infectious

Disease Risk. Science 331: 296–302. doi: 10.1126/science.1194694

Altschul S.F., Gish W., Miller W., Myers E.W., & Lipman, D. J. (1990). Basic local

alignment search tool. Journal of Molecular Biology 215: 403–410.

An, S., Bleu, T., Hallmark, O. G., & Goetzl, E. J. (1998). Characterization of a

Novel Subtype of Human G Protein-coupled Receptor for

Lysophosphatidic Acid. Journal of Biological Chemistry, 273(14), 7906–

7910. doi: 10.1074/jbc.273.14.7906

Archie, E. A., Luikart, G., & Ezenwa, V. O. (2009). Infecting Epidemiology with

Genetics: a New Frontier in Disease Ecology. Trends in Ecology &

Evolution 24: 21–30.

Asghar, M., Westerdahl, H., Zehtindjiev, P., Ilieva, M., Hasselquist, D., &

Bensch, S. (2012). Primary peak and chronic malaria infection levels are

correlated in experimentally infected great reed warblers. Parasitology

139: 1246–1252.

Backstrom, N., Forstmeier, W., Schielzeth, H., Mellenius, H., Nam, K., Bolund,

E., … & Ellegren, H. (2010). The recombination landscape of the zebra

finch Taeniopygia guttata genome. Genome Research, 20(4), 485–495.

doi: 10.1101/gr.101410.109

Baillie, S. M., Gudex-Cross, D., Barraclough, R. K., Blanchard, W., & Brunton,

D. H. (2012). Patterns in avian malaria at founder and source

populations of an endemic New Zealand passerine. Parasitology

Research, 111(5), 2077–2089. doi: 10.1007/s00436-012-3055-y

Baird, N. A., Etter, P. D., Atwood, T. S., Currey, M. C., Shiver, A. L., Lewis, Z. A.,

Selker, E. U., Cresko, W. A., & Johnson, E. A. (2008). Rapid SNP

Discovery and Genetic Mapping Using Sequenced RAD Markers. PLoS

ONE 3: 10. doi: 10.1371/journal.pone.0003376

Barrett, L. G., Thrall, P. H., Burdon, J. J., & Linde, C. C. (2008). Life history

determines genetic structure and evolutionary potential of host–parasite

interactions. Trends in Ecology & Evolution, 23(12): 678–685. doi:

10.1016/j.tree.2008.06.017

Bataille, A., Cunningham, A. A., Cedeño, V., Cruz, M., Eastwood, G., Fonseca,

D. M., Causton, C. E., Azuero, R., Loaya, J., Martinez, J. D., & Goodman,

S. J. (2009). Evidence for regular ongoing introductions of mosquito

disease vectors into the Galápagos Islands. Proceedings of the Royal

Society B: Biological Sciences 276(1674): 3769–3775. doi:

10.1098/rspb.2009.0998

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2019). Fitting Linear Mixed-

Effects Models Using lme4. Journal of Statistical Software 67:1.

Baujat, B., Mahe, C., Pignon, J. P., & Hill, C. (2002). A graphical method for

exploring heterogeneity in meta-analyses: Application to a meta-analysis

of 65 trials. Statistics in Medicine, 21(18), 2641–2652.

Beadell, J. S., Gering, E., Austin, J., Dumbacher, J. P., Peirce, M. A., Pratt, T. K.,

… Fleischer, R. C. (2004). Prevalence and differential host-specificity of

two avian blood parasite genera in the Australo-Papuan region. Molecular

Ecology, 13(12), 3829–3844. doi: 10.1111/j.1365-294x.2004.02363.x

Beck-Johnson, L. M., Nelson, W. A., Paaijmans, K. P., Read, A. F., Thomas, M.

B., & Bjørnstad, O. N. (2013). The Effect of Temperature on Anopheles

Mosquito Population Dynamics and the Potential for Malaria

Transmission. PLoS ONE, 8(11). doi: 10.1371/journal.pone.0079276

Beehler B.M.P., Pratt T.K. & LeCroy M. (2016) Birds of New Guinea: distribution,

and systematics. Princeton University Press, Princeton (N.J.).

Bensch, S., Hellgren, O., & Pérez-Tris, J. (2009). MalAvi: a public database of

malaria parasites and related haemosporidians in avian hosts based on

mitochondrial cytochrome b lineages. Molecular Ecology Resources, 9(5),

1353–1358. doi: 10.1111/j.1755-0998.2009.02692.x

Bensch, S., Waldenström, J., Jonzén, N., Westerdahl, H., Hansson, B., Sejberg,

D., & Hasselquist, D. (2007). Temporal dynamics and diversity of avian

malaria parasites in a single host species. The Journal of Animal Ecology

76: 112–122.

Bentz, S., Rigaud, T., Barroca, M., Martin-Laurent, F., Bru, D., Moreau, J., &

Faivre, B. (2006). Sensitive measure of prevalence and parasitaemia of

haemosporidia from European blackbird (Turdus merula) populations:

value of PCR-RFLP and quantitative PCR. Parasitology, 133(06), 685.

doi: 10.1017/s0031182006001090

Bertram, M. R., Hamer, S. A., Hartup, B. K., Snowden, K. F., Medeiros, M. C.,

Outlaw, D. C., & Hamer, G. L. (2017). A novel Haemosporida clade at

the rank of genus in North American cranes (Aves: Gruiformes).

Molecular Phylogenetics and Evolution, 109, 73–79. doi:

10.1016/j.ympev.2016.12.025

Biedrzycka, A., Migalska, M., & Bielański, W. (2014). A quantitative PCR

protocol for detecting specific Haemoproteus lineages: molecular

characterization of blood parasites in a Sedge Warbler population from

southern Poland. Journal of Ornithology, 156(1), 201–208. doi:

10.1007/s10336-014-1116-y

Biek, R., & Real, L. A. (2010). The landscape genetics of infectious disease

emergence and spread. Molecular Ecology 19(17): 3515–3531. doi:

10.1111/j.1365-294x.2010.04679.x

Blanchong, J. A., Robinson, S. J., Samuel, M. D., & Foster, J. T. (2016).

Application of genetics and genomics to wildlife epidemiology. The Journal

of Wildlife Management, 80(4), 593–608. doi: 10.1002/jwmg.1064

Blouin, M. S., Yowell, C. A., Courtney, C. H., & Dame, J. B. (1995). Host

movement and the genetic structure of populations of parasitic

nematodes. Genetics 141: 1007–1014.

Bonneaud, C., Pérez-Tris, J., Federici, P., Chastel, O., & Sorci, G. (2006) Major

Histocompatibility Alleles Associated with Local Resistance To Malaria In

A Passerine. Evolution, 60, 383.

Bonnet, E. & Van de Peer, Y. V. D. (2002) zt: A Software Tool for Simple and

Partial Mantel Tests. - Journal of Statistical Software in press.

doi:10.18637/jss.v007.i10.

Bosholn, M., Fecchio, A., Silveira, P., Braga, É. M., & Anciães, M. (2016).

Effects of avian malaria on male behaviour and female visitation in

lekking blue-crowned manakins. Journal of Avian Biology, 47(4), 457–

465. doi: 10.1111/jav.00864

Botta, V., Louppe, G., Geurts, P., & Wehenkel, L. (2014). Exploiting SNP

Correlations within Random Forest for Genome-Wide Association

Studies. PLoS ONE, 9(4). doi: 10.1371/journal.po

Bridge, J. A., Nelson, M., Örndal, C., Bhatia, P., & Neff, J. R. (1998). Clonal

karyotypic abnormalities of the hereditary multiple exostoses

chromosomal loci 8q24.1 (EXT1) and 11p11-12 (EXT2) in patients with

sporadic and hereditary osteochondromas. Cancer, 82(9), 1657–1663.

doi: 10.1002/(sici)1097-0142(19980501)82:9<1657::aid-cncr10>3.0.co;2-3

Broennimann O., Thuiller W., Hughes G., Midgley G.F., Alkemade, J. M. R. &

Guisan, A. (2006) Do geographic distribution, niche property and life form

explain plants vulnerability to global change? Global Change Biology, 12,

1079–1093.

Brown, L.H., Urban, E.K., & Newman, K. (1982). The Birds of Africa, Volume I.

Academic Press, London.

Brunner, F. S., & Eizaguirre, C. (2016). Can environmental change affect

host/parasite-mediated speciation? Zoology, 119(4), 384–394. doi:

10.1016/j.zool.2016.04.001

Bryant, L. M. & Krosch, M. N. (2016) Lines in the land: a review of evidence for

eastern Australia’s major biogeographical barriers to closed forest taxa. -

Biological Journal of the Linnean Society 119: 238–264.

Burgess, L. (1959). Probing Behaviour of Aedes aegypti (L.) in Response to Heat

and Moisture. Nature, 184(4703), 1968–1969. doi: 10.1038/1841968a0

Burkett-Cadena, N. D., Bingham, A. M., & Unnasch, T. R. (2014). Sex-biased

avian host use by arbovirus vectors. Royal Society Open Science, 1(3),

140262. doi: 10.1098/rsos.140262

Byrne, M., Yeates, D. K., Joseph, L., Kearney, M., Bowler, J., Williams, M. A. J.,

Cooper, S., Donnellan, S.C., Keogh, J.S., Leys, R., Melville, J., Murphy,

D.J., Porch, N. & Wyrwoll, K.H. (2008). Birth of a biome: Insights into the

assembly and maintenance of the Australian arid zone biota. Molecular

Ecology 17(20): 4398–4417. doi:10.1111/j.1365-294X.2008.03899.x

Caillé, I., Allinquant, B., Dupont, E., Bouillot, C., Langer, A., Müller, U. &

Prochiantz, A. (2004) Soluble form of amyloid precursor protein regulates

proliferation of progenitors in the adult subventricular zone. Development

131:2173–2181.

Calero-Riestra, M., & García, J. T. (2016). Sex-dependent differences in avian

malaria prevalence and consequences of infections on nestling growth

and adult condition in the Tawny pipit, Anthus campestris. Malaria

Journal, 15(1). doi: 10.1186/s12936-016-1220-y

Campioni, L., Puente, J. M.-D. L., Figuerola, J., Granadeiro, J. P., Silva, M. C.,

& Catry, P. (2017). Absence of haemosporidian parasite infections in the

long-lived Cory’s shearwater: evidence from molecular analyses and

review of the literature. Parasitology Research, 117(1), 323–329. doi:

10.1007/s00436-017-5676-7

Charles, B. A., Shriner, D., Doumatey, A., Chen, G., Zhou, J., Huang, H., … &

Rotimi, C. N. (2011). A genome-wide association study of serum uric acid

in African Americans. BMC Medical Genomics, 4(1). doi: 10.1186/1755-

8794-4-17

Chase, J. M. & Leibold, M. A. (2003) Ecological niches: linking classic and

contemporary approaches. University of Chicago Press, Chicago, IL.

Chen, X., & Ishwaran, H. (2012). Random forests for genomic data

analysis. Genomics, 99(6), 323–329. doi: 10.1016/j.ygeno.2012.04.003

Chiver, I., Stutchbury, B.J.M., & Morton, E.S. (2014) Seasonal variation in male

testosterone levels in a tropical bird with year-round territoriality. Journal of

Field Ornithology, 85, 1–9.

Clements, A. N. (1963). The physiology of mosquitoes. New York: Macmillan.

Cohuet, A., Osta, M. A., Morlais, I., Awono‐Ambene, P. H., Michel, K., Simard,

F., … & Kafatos, F. C. (2006). Anopheles and Plasmodium: from

laboratory models to natural systems in the field. EMBO Reports, 7(12),

1285–1289. doi: 10.1038/sj.embor.7400831

Collins, J.E., Wright, C.L., Edwards, C.A., … Dunham, I. (2004). A genome

annotation-driven approach to cloning the human ORFeome. Genome Biol

5, R84 https://doi.org/10.1186/gb-2004-5-10-r84

Combes, C. 1997. Fitness of parasites: Pathology and selection. International

Journal for Parasitology 27(1): 1–10. doi: 10.1016/s0020-7519(96)00168-3

Cook, R. D. (1977). Detection of Influential Observation in Linear

Regression. Technometrics, 19(1), 15. doi: 10.2307/1268249

Cornel, A. J., Lee, Y., Almeida, A. P. G., Johnson, T., Mouatcho, J., Venter, M.,

… & Braack, L. (2018). Mosquito community composition in South Africa

and some neighboring countries. Parasites & Vectors, 11(1). doi:

10.1186/s13071-018-2824-6

Cornelius, E. A., Davis, A. K., & Altizer, S. A. (2014). How Important Are

Hemoparasites to Migratory Songbirds? Evaluating Physiological

Measures and Infection Status in Three Neotropical Migrants during

Stopover. Physiological and Biochemical Zoology, 87(5), 719–728. doi:

10.1086/677541

Corrigan, S., Lowther, A. D., Beheregaray, L. B., Bruce, B. D., Cliff, G., Duffy, C.

A., Foulis, A., Francis, M. P., Goldsworthy, S. D., Hyde, J. R., Jabado, R.

W., Kacev, D., Marshall L., Mucientes G. R., Naylor, G. J. P., Pepperell, J.

G., Queiroz, N., White, W. T., Wintner, S. P. & Rogers, P. J. (2018)

Population Connectivity of the Highly Migratory Shortfin Mako (Isurus

oxyrinchus Rafinesque 1810) and Implications for Management in the

Southern Hemisphere. Frontiers in Ecology and Evolution, 6.

Cosgrove, C. L., Knowles, S. C. L., Day, K. P., & Sheldon, B. C. (2006). No

Evidence For Avian Malaria Infection During The Nestling Phase In A

Passerine Bird. Journal of Parasitology, 92(6), 1302–1304. doi:

10.1645/ge-878r.1

Cowman, A. F. & Crabb, B. S. (2006) Invasion of Red Blood Cells by Malaria

Parasites. Cell, 124, 755–766.

Criscione, C. D., & Blouin, M. S. (2004). Life Cycles Shape Parasite Evolution:

Comparative Population Genetics Of Salmon Trematodes. Evolution

58(1): 198. doi: 10.1554/03-359

Crow, J. F. & Kimura, M. (1971) An Introduction to Population Genetics Theory. -

Population (French Edition) 26: 977.

Cumming, G. S., Gaidet, N., & Ndlovu, M. (2012a). Towards a unification of

movement ecology and biogeography: conceptual framework and a case

study on Afrotropical ducks. Journal of Biogeography 39: 1401–1411.

Cumming, G. S., Shepard, E., Okanga, S., Caron, A., Ndlovu, M., & Peters, J. L.

(2012b). Host associations, biogeography, and phylogenetics of avian

malaria in southern African waterfowl. Parasitology, 140(2), 193–201. doi:

10.1017/s0031182012001461

Cunningham, F., Achuthan, P., Akanni, W., et al. (2019). Ensembl 2019. Nucleic

Acids Res. 2019; 47(D1): D745–D751.

Dacosta, J. M. & Sorenson, M. D. (2014). Amplification Biases and Consistent

Recovery of Loci in a Double-Digest RAD-seq Protocol. - PLoS ONE.

doi:10.1371/journal.pone.0106713.

Daversa, D. R., Fenton, A., Dell, A. I., Garner, T. W. J., & Manica, A. (2017).

Infections on the move: how transient phases of host movement influence

disease spread. Proceedings of the Royal Society B: Biological Sciences

284(1869): 20171807. doi: 10.1098/rspb.2017.1807

Davies, N., Villablanca, F. X., & Roderick, G. K. (1999). Determining the source

of individuals: multilocus genotyping in nonequilibrium population genetics.

Trends in Ecology & Evolution 14(1): 17–21. doi: 10.1016/s0169-5347(98)01530-4

del Hoyo, J. & Collar, N. J. (2014) The HBW–BirdLife International illustrated

checklist of the birds of the world, Volume 1: non-passerines. Barcelona:

Lynx Editions

Delhaye, J., Jenkins, T., & Christe, P. (2016). Plasmodium infection and

oxidative status in breeding great tits, Parus major. Malaria Journal,

15(1). doi: 10.1186/s12936-016-1579-9

Dhami, K. K., Joseph, L., Roshier, D. A., Heinsohn, R., & Peters, J.L. (2012)

Multilocus phylogeography of Australian teals (Anas spp.): a case study of

the relationship between vagility and genetic structure. Journal of Avian

Biology, 44, 169–178.

Dickinson, E.C. & Remsen, J.V. (2013) The Howard and Moore Complete

Checklist of the Birds of the World. 4th Edition. Volume 1. Aves Press:

Eastbourne, UK.

Dolman, G. & Joseph, L. (2012) A species assemblage approach to comparative

phylogeography of birds in southern Australia. Ecology and Evolution, 2,

354–369.

Donnellan, S. C., Armstrong, J., Pickett, M., Milne, T., Baulderstone, J.,

Hollfelder, T. & Bertozzi, T. (2009) Systematic and conservation

implications of mitochondrial DNA diversity in emu-wrens, Stipiturus

(Aves: Maluridae). Emu - Austral Ornithology, 109, 143–152.

Dubiec, A., Podmokła, E., Zagalska-Neubauer, M., Drobniak, S. M., Arct, A.,

Gustafsson, L., & Cichoń, M. (2016). Differential prevalence and diversity

of haemosporidian parasites in two sympatric closely related non-

migratory passerines. Parasitology, 143(10), 1320–1329. doi:

10.1017/s0031182016000779

Duffy, D. L., Bentley, G. E., Drazen, D. L., & Ball, G. F. (2000). Effects of

testosterone on cell-mediated and humoral immunity in non-breeding adult

European starlings. Behavioral Ecology, 11(6), 654–662. doi:

10.1093/beheco/11.6.654

Dunn, J. C., Cole, E. F., & Quinn, J. L. (2011). Personality and parasites: sex-

dependent associations between avian malaria infection and multiple

behavioural traits. Behavioral Ecology and Sociobiology, 65(7), 1459–

1471. doi: 10.1007/s00265-011-1156-8

Dunn, J. C., Goodman, S. J., Benton, T. G., & Hamer, K. C. (2013). Avian blood

parasite infection during the non-breeding season: an overlooked issue in

declining populations? BMC Ecology, 13(1), 30. doi: 10.1186/1472-6785-

13-30

Earl, D. A. & von Holdt, B. M. (2011) STRUCTURE HARVESTER: a website and

program for visualizing STRUCTURE output and implementing the

Evanno method. - Conservation Genetics Resources 4: 359–361.

doi:10.1007/s12686-011-9548-7

Edgar, R. C. (2004) MUSCLE: multiple sequence alignment with high accuracy

and high throughput. - Nucleic Acids Research 32: 1792–1797.

Edgar, R. C. (2010) Search and clustering orders of magnitude faster than

BLAST. - Bioinformatics 26: 2460–2461.

Edwards, R. D., Crisp, M. D., Cook, D. H., & Cook, L. G. (2017) Congruent

biogeographical disjunctions at a continent-wide scale: Quantifying and

clarifying the role of biogeographic barriers in the Australian tropics. PLoS

One, 12.

Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-

analysis detected by a simple, graphical test. Bmj, 315(7109), 629–634.

doi: 10.1136/bmj.315.7109.629

Eisend, M., & Tarrahi, F. (2014). Meta-analysis selection bias in marketing

research. International Journal of Research in Marketing, 31(3), 317–326.

doi: 10.1016/j.ijresmar.2014.03.006

Engström, H., Dufva, R., Olsson, G., & Engstrom, H. (2000). Absence of

Haematozoa and Ectoparasites in a Highly Sexually Ornamented Species,

The Crested Auklet. Waterbirds: The International Journal of Waterbird

Biology, 23(3), 486. doi: 10.2307/1522187

Escallón, C., Weinstein, N. M., Tallant, J. A., Wojtenek, W., Rodríguez-Saltos,

C. A., Bonaccorso, E., & Moore, I. T. (2016). Testosterone and

Haemosporidian Parasites Along a Tropical Elevational Gradient in

Rufous-Collared Sparrows (Zonotrichia capensis). Journal of

Experimental Zoology Part A: Ecological Genetics and Physiology,

325(8), 501–510. doi: 10.1002/jez.2034

Evanno, G., Regnaut, S. & Goudet, J. (2005) Detecting the number of clusters of

individuals using the software structure: a simulation study. Molecular

Ecology 14: 2611–2620.

Fallon, S. M., Ricklefs, R. E., Swanson, B. L., & Bermingham, E. (2003).

Detecting Avian Malaria: An Improved Polymerase Chain Reaction

Diagnostic. Journal of Parasitology, 89(5), 1044–1047. doi: 10.1645/ge-3157

Fast, K. M., Walstrom, V. W., & Outlaw, D. C. (2016). Haemosporidian

Prevalence and Parasitemia In the Tufted Titmouse (Baeolophus

bicolor). Journal of Parasitology, 102(6), 636–642. doi: 10.1645/15-935

Fecchio, A., Lima, M., Silveira, P., Ribas, A., Caparroz, R., & Marini, M. (2015).

Age, but not sex and seasonality, influence Haemosporida prevalence in

White-banded Tanagers (Neothraupis fasciata) from central Brazil.

Canadian Journal of Zoology, 93(1), 71–77. doi: 10.1139/cjz-2014-0119

Fernández, M., Rojo, M. Á., Casanueva, P., Carrión, S., Hernández, M. Á., &

Campos, F. (2009). High prevalence of haemosporidians in Reed

Warbler Acrocephalus scirpaceus and Sedge Warbler Acrocephalus

schoenobaenus in Spain. Journal of Ornithology, 151(1), 27–32. doi:

10.1007/s10336-009-0417-z

Figuerola, J., Munoz, E., Gutierrez, R., & Ferrer, D. (1999). Blood parasites,

leucocytes and plumage brightness in the Cirl Bunting, Emberiza cirlus.

Functional Ecology, 13(5), 594–601. doi: 10.1046/j.1365-

2435.1999.00354.x

Fischer, M. C., Foll, M., Excoffier, L., & Heckel, G. (2011). Enhanced AFLP

genome scans detect local adaptation in high-altitude populations of a

small rodent (Microtus arvalis). Molecular Ecology, 20(7), 1450–1462. doi:

10.1111/j.1365-294x.2011.05015.x

Ford, J. (1987) Hybrid Zones in Australian Birds. Emu - Austral Ornithology, 87,

158–178.

Franklin, J. & Miller, J. A. (2014) Mapping species distributions: Spatial inference

and prediction. Cambridge University Press, Cambridge.

Freeman-Gallant, C. R., O’Connor, K. D., & Breuer, M. E. (2001). Sexual

selection and the geography of Plasmodium infection in Savannah

sparrows (Passerculus sandwichensis). Oecologia, 127(4), 517–521. doi:

10.1007/s004420000618

Fromhage, L. (2001). Parental Care and Investment. ELS, 1–7.

Gaggiotti, O. E., & Foll, M. (2010). Quantifying population structure using the F-

model. Molecular Ecology Resources, 10(5), 821–830. doi:

10.1111/j.1755-0998.2010.02873.x

Gangoso, L., Gutiérrez-López, R., Puente, J. M.-D. L., & Figuerola, J. (2016).

Genetic colour polymorphism is associated with avian malarial infections.

Biology Letters, 12(12), 20160839. doi: 10.1098/rsbl.2016.0839

Ganser, C., Gregory, A. J., Mcnew, L. B., Hunt, L. A., Sandercock, B. K., &

Wisely, S. M. (2016). Fine-scale distribution modeling of avian malaria

vectors in north-central Kansas. Journal of Vector Ecology, 41(1), 114–

122. doi: 10.1111/jvec.12202

Garcia, J.E., Puentes, A. & Patarroyo, M. E. (2006) Developmental Biology of

Sporozoite-Host Interactions in Plasmodium falciparum Malaria:

Implications for Vaccine Design. Clinical Microbiology Reviews, 19, 686–

707.

Garcia-Longoria, L., Møller, A. P., Balbontín, J., Lope, F. D. & Marzal, A. (2015)

Do malaria parasites manipulate the escape behaviour of their avian

hosts? An experimental study. Parasitology Research, 114, 4493–4501.

Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/

hierarchical models. Cambridge: Cambridge University Press.

Gillies, M. T., & Wilkes, T. J. (1972). The range of attraction of animal baits and

carbon dioxide for mosquitoes. Studies in a freshwater area of West

Africa. Bulletin of Entomological Research, 61(3), 389–404. doi:

10.1017/s0007485300047295

Goytain, A., & Quamme, G. A. (2005). Functional characterization of human

SLC41A1, a Mg2+ transporter with similarity to prokaryotic MgtE Mg2+

transporters. Physiological Genomics, 21(3), 337–342. doi:

10.1152/physiolgenomics.00261.2004

Greene, J. L. (2015). Update on the highly-pathogenic avian influenza outbreak

of 2014–2015. Congressional Research Service.

Grégoire, G., Derderian, F., & Lorier, J. L. (1995). Selecting the language of the

publications included in a meta-analysis: Is there a tower of babel bias?

Journal of Clinical Epidemiology, 48(1), 159–163. doi: 10.1016/0895-

4356(94)00098-b

Grillo, E. L., Fithian, R. C., Cross, H., Wallace, C., Viverette, C., Reilly, R., &

Mayer, D. C. G. (2012). Presence of Plasmodium and Haemoproteus in

Breeding Prothonotary Warblers (Protonotaria citrea: Parulidae):

Temporal and Spatial Trends in Infection Prevalence. Journal of

Parasitology, 98(1), 93–102. doi: 10.1645/ge-2780.1

Guay P. J., Chesser R.T., Mulder R.A., Afton A.D., Paton D.C. & McCracken

K.G. (2010) East–west genetic differentiation in Musk Ducks (Biziura

lobata) of Australia suggests late Pleistocene divergence at the Nullarbor

Plain. Conservation Genetics, 11, 2105–2120.

Gupta, A., Thiruvengadam, G., & Desai, S. A. (2015). The conserved clag

multigene family of malaria parasites: Essential roles in host–pathogen

interaction. Drug Resistance Updates, 18, 47–54. doi:

10.1016/j.drup.2014.10.004

Gutiérrez-López, R., Puente, J. M.-D. L., Gangoso, L., Soriguer, R., & Figuerola,

J. (2020). Plasmodium transmission differs between mosquito species and

parasite lineages. Parasitology, 1–7. doi: 10.1017/s0031182020000062

Gutiérrez-López, R., Puente, J. M.-D. L., Gangoso, L., Soriguer, R., & Figuerola,

J. (2019). Effects of host sex, body mass and infection by avian

Plasmodium on the biting rate of two mosquito species with different

feeding preferences. Parasites & Vectors, 12(1). doi: 10.1186/s13071-

019-3342-x

Hamer, S.A., Lehrer, E., & Magle, S.B. (2012) Wild Birds as Sentinels for Multiple

Zoonotic Pathogens Along an Urban to Rural Gradient in Greater Chicago,

Illinois. Zoonoses and Public Health, 59, 355-364.

Hanel, J., Doležalová, J., Stehlíková, Š., Modrý, D., Chudoba, J., Synek, P., &

Votýpka, J. (2015). Blood parasites in northern goshawk (Accipiter

gentilis) with an emphasis to Leucocytozoon toddi. Parasitology

Research, 115(1), 263–270. doi: 10.1007/s00436-015-4743-1

Hardy, O. J. & Vekemans, X. (2002) spagedi: a versatile computer program to

analyse spatial genetic structure at the individual or population levels. -

Molecular Ecology Notes 2: 618–620. doi:10.1046/j.1471-

8286.2002.00305.

Harrison, T. (2003). Erythrocyte G Protein-Coupled Receptor Signaling in

Malarial Infection. Science, 301(5640), 1734–1736. doi:

10.1126/science.1089324

Hartl, D. L. & Clark, A. G. (1997) Principles of population genetics. Sinauer

Associates, Inc. Publishers, Sunderland.

Hasselquist, D. (2007) Comparative immunoecology in birds: hypotheses and

tests. Journal of Ornithology, 148, 571–582.

Hellard, E., Cumming, G. S., Caron, A., Coe, E., & Peters, J. L. (2016) Testing

epidemiological functional groups as predictors of avian haemosporidia

patterns in southern Africa. Ecosphere, 7.

Hellgren, O., Kutzer, M., Bensch, S., Valkiūnas, G., & Palinauskas, V. (2013)

Identification and characterization of the merozoite surface protein 1

(msp1) gene in a host-generalist Avian malaria parasite, Plasmodium

relictum (lineages SGS1 and GRW4) with the use of blood

transcriptome. Malaria Journal, 12, 381.

Higgins, J. P. T., Thomas, J., Chandler, J., Cumpston, M., Li, T., Page, M. J. &

Welch, V. A. (2019). Cochrane Handbook for Systematic Reviews of

Interventions. 2nd Edition. Chichester (UK): John Wiley & Sons, 2019.

Hu, P., Wu, S., Sun, Y., Yuan, C.-C., Kobayashi, R., Myers, M. P., & Hernandez,

N. (2002). Characterization of Human RNA Polymerase III Identifies

Orthologues for Saccharomyces cerevisiae RNA Polymerase III Subunits.

Molecular and Cellular Biology, 22(22), 8044–8055. doi:

10.1128/mcb.22.22.8044-8055.2002

Huang, Y.-J. S., Higgs, S., & Vanlandingham, D. L. (2019). Arbovirus-Mosquito

Vector-Host Interactions and the Impact on Transmission and Disease

Pathogenesis of Arboviruses. Frontiers in Microbiology 10. doi:

10.3389/fmicb.2019.00022

Huedo-Medina, T. B., Sánchez-Meca, J., Marín-Martínez, F., & Botella, J. (2006).

Assessing heterogeneity in meta-analysis: Q statistic or I² index? Psychol

Methods, 11(2):193-206.

Irwin, M. P. S. (1981). The birds of Zimbabwe. Salisbury: Quest.

Isaksson, C., Sepil, I., Baramidze, V., & Sheldon, B. C. (2013). Explaining

variance of avian malaria infection in the wild: the importance of host

density, habitat, individual life-history and oxidative stress. BMC Ecology,

13(1), 15. doi: 10.1186/1472-6785-13-15

Ishtiaq, F. (2017). Exploring host and geographical shifts in transmission of

haemosporidians in a Palaearctic passerine wintering in India. Journal of

Ornithology, 158(3), 869–874. doi: 10.1007/s10336-017-1444-9

Jarne, P. & Théron, A. (2001). Genetic structure in natural populations of flukes

and snails: a practical approach and review. Parasitology 123(7): 27–40.

doi: 10.1017/s0031182001007715

Jeffares, A., Pain, A., Berry, A. & Cox, V. (2007). Genome variation and evolution

of the malaria parasite Plasmodium falciparum. Nature. Nature Publishing

Group.

Johnson, J. D., Mehus, J. G., Tews, K., Milavetz, B. I., & Lambeth, D. O. (1998).

Genetic Evidence for the Expression of ATP- and GTP-specific Succinyl-

CoA Synthetases in Multicellular Eucaryotes. Journal of Biological

Chemistry, 273(42), 27580–27586. doi: 10.1074/jbc.273.42.27580

Jombart, T. (2008) adegenet: a R package for the multivariate analysis of genetic

markers. Bioinformatics, 24, 1403–1405.

Joseph, L. & Omland, K. E. (2009) Phylogeography: its development and impact

in Australo-Papuan ornithology with special reference to paraphyly in

Australian birds. Emu - Austral Ornithology, 109, 1–23.

Joseph, L., Bishop, K. D., Wilson, C. A., Edwards, S. V., Iova, B., Campbell, C.

D., Mason, I., & Drew, A. (2019) A review of evolutionary research on

birds of the New Guinean savannas and closely associated habitats of

riparian rainforests, and grasslands. Emu - Austral Ornithology

119: 317–330. https://doi.org/10.1080/01584197.2019.1615844.

Jovani, R., Tella, J. L., Forero, M. G., Bertellotti, M., Blanco, G., Ceballos, O., &

Donazar, J. A. (2001). Apparent Absence of Blood Parasites in the

Patagonian Seabird Community: Is It Related to the Marine Environment?

Waterbirds: The International Journal of Waterbird Biology, 24(3), 430.

doi: 10.2307/1522076

Jüni, P., Holenstein, F., Sterne, J., Bartlett, C., & Egger, M. (2002). Direction and

impact of language bias in meta-analyses of controlled trials: empirical

study. International Journal of Epidemiology, 31(1), 115–123. doi:

10.1093/ije/31.1.115

Kalinowski, S. T., Wagner, A. P., & Taper, M. L. (2006) ml-relate: a computer

program for maximum likelihood estimation of relatedness and

relationship. - Molecular Ecology Notes 6: 576–579. doi:10.1111/j.1471-

8286.2006.01256.

Kamis, A. B. & Ibrahim, J. B. (1989) Effects of testosterone on blood leukocytes

in Plasmodium berghei-infected mice. Parasitology Research, 75, 611–

613.

Kear, J. (2005) Ducks, geese, and swans. Oxford University Press, Oxford.

Kearns, A. M., Joseph, L., & Cook, L. G. (2010) The impact of Pleistocene

changes of climate and landscape on Australian birds: a test using the

Pied Butcherbird (Cracticus nigrogularis). Emu - Austral Ornithology, 110,

285–295.

Keast, J. A. (1961) Bird speciation on the Australian continent. Bulletin of the

Museum of Comparative Zoology 123, 303–495.

Keeney, D. B., King, T. M., Rowe, D. L., & Poulin, R. (2009). Contrasting mtDNA

diversity and population structure in a direct-developing marine gastropod

and its trematode parasites. Molecular Ecology 18(22): 4591–4603. doi:

10.1111/j.1365-294x.2009.04388.x

Kloeker, S., & Wadzinski, B. E. (1999). Purification and Identification of a Novel

Subunit of Protein Serine/Threonine Phosphatase 4. Journal of Biological

Chemistry, 274(9), 5339–5347. doi: 10.1074/jbc.274.9.5339

Knowles, S. C. L., Wood, M. J., Alves, R., & Sheldon, B. C. (2014). Dispersal in

a patchy landscape reveals contrasting determinants of infection in a wild

avian malaria system. Journal of Animal Ecology, 83(2), 429–439. doi:

10.1111/1365-2656.12154

Kulma, K., Low, M., Bensch, S., & Qvarnström, A. (2013). Malaria infections

reinforce competitive asymmetry between two Ficedula flycatchers in a

recent contact zone. Molecular Ecology, 22(17), 4591–4601. doi:

10.1111/mec.12409

Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest

Package: Tests in Linear Mixed Effects Models. Journal of Statistical

Software, 82(13). doi: 10.18637/jss.v082.i13

Kyrkjeeide M.O., Hassel K., Flatberg K.I., Shaw A.J., Yousefi N. & Stenøien H.K.

(2016) Spatial Genetic Structure of the Abundant and Widespread

Peatmoss Sphagnum magellanicum Brid. Plos One, 11,

Lamb, A. M., Silva, A. G. D., Joseph, L., Sunnucks, P., & Pavlova, A. (2019)

Pleistocene-dated biogeographic barriers drove divergence within the

Australo-Papuan region in a sex-specific manner: an example in a

widespread Australian songbird. Heredity, 123, 608–621.

LaPointe, D. A., Atkinson, C. T., & Samuel, M. D. (2012). Ecology and

conservation biology of avian malaria. Annals of the New York Academy

of Sciences 1249: 211–226.

Latta, S. C. & Ricklefs, R. E. (2010). Prevalence patterns of avian haemosporida

on Hispaniola. Journal of Avian Biology 41: 25–33.

Lavretsky, P., Dacosta, J. M., Hernández-Baños, B. E., Engilis, A., Sorenson, M.

D. & Peters, J. L. (2015) Speciation genomics and a role for the Z

chromosome in the early stages of divergence between Mexican ducks

and mallards. - Molecular Ecology 24: 5364–5378.

Levin, I. I., Outlaw, D. C., Vargas, F. H., & Parker, P. G. (2009). Plasmodium

blood parasite found in endangered Galapagos penguins (Spheniscus

mendiculus). Biological Conservation, 142(12), 3191–3195. doi:

10.1016/j.biocon.2009.06.017

Lewicki, K. E., Huyvaert, K. P., Piaggio, A. J., Diller, L. V., & Franklin, A. B.

(2014). Effects of barred owl (Strix varia) range expansion on

Haemoproteus parasite assemblage dynamics and transmission in

barred and northern spotted owls (Strix occidentalis caurina). Biological

Invasions, 17(6), 1713–1727. doi: 10.1007/s10530-014-0828-5

Liaw, A. & Wiener, M. (2002). Classification and regression by randomForest. R

News 2, 18–22

Little, T. J. & Ebert, D. 2000. The cause of parasitic infection in natural

populations of Daphnia (Crustacea: Cladocera): the role of host genetics.

Proceedings of the Royal Society of London. Series B: Biological Sciences

267(1457): 2037–2042. doi: 10.1098/rspb.2000.1246

Loiseau, C., Melo, M., Lobato, E., Beadell, J. S., Fleischer, R. C., Reis, S., …

Covas, R. (2017). Insularity effects on the assemblage of the blood

parasite community of the birds from the Gulf of Guinea. Journal of

Biogeography, 44(11), 2607–2617. doi: 10.1111/jbi.13060

López, G., Muñoz, J., Soriguer, R., & Figuerola, J. (2013). Increased

Endoparasite Infection in Late-Arriving Individuals of a Trans-Saharan

Passerine Migrant Bird. PLoS ONE, 8(4). doi:

10.1371/journal.pone.0061236

Louhi, K.-R., Karvonen, A., Rellstab, C., & Jokela, J. (2010). Is the population

genetic structure of complex life cycle parasites determined by the

geographic range of the most motile host? Infection, Genetics and

Evolution 10(8): 1271–1277. doi: 10.1016/j.meegid.2010.08.013

Loy, D. E., Liu, W., Li, Y., Learn, G. H., Plenderleith, L. J., Sundararaman, S. A.,

… & Hahn, B. H. (2017). Out of Africa: origins and evolution of the human

malaria parasites Plasmodium falciparum and Plasmodium vivax.

International Journal for Parasitology, 47(2-3), 87–97. doi:

10.1016/j.ijpara.2016.05.008

Magnhagen, C. (1991) Predation risk as a cost of reproduction. Trends in

Ecology & Evolution, 6, 183–186.

Manunza, A., Cardoso, T. F., Noce, A., Martínez, A., Pons, A., Bermejo, L. A., …

& Amills, M. (2016). Population structure of eleven Spanish ovine breeds

and detection of selective sweeps with BayeScan and hapFLK. Scientific

Reports, 6(1). doi: 10.1038/srep27296

Marchant, S. & Higgins, P. J. (eds) 1990. Handbook of Australian, New Zealand

and Antarctic Birds. Volume 1: Ratites to Ducks. Oxford University Press,

Melbourne. ISBN 0-19-553244-9

Marshall, E. K. (1942) Chemotherapy Of Avian malaria. Physiological Reviews,

22, 190–204.

Martinez-Abrain, A., Esparza, B., & Oro, D. (2004) Lack of blood parasites in bird

species: does absence of blood parasite vectors explain it all? Ardeola

51:225–232

Martínez-de la Puente, J., Eberhart-Phillips, L. J., Carmona-Isunza, M. C.,

Zefania, S., Navarro, M. J., Kruger, O., … Figuerola, J. (2017). Extremely

low Plasmodium prevalence in wild plovers and coursers from Cape Verde

and Madagascar. Malaria Journal, 16(1). doi: 10.1186/s12936-017-1892-y

Mayr, E. (1942) Systematics and the origin of species. Columbia Univ. Press,

New York.

Mbikay, M., Seidah, N. G., Chrétien, M., & Simpson, E. M. (1995). Chromosomal

assignment of the genes for proprotein convertases PC4, PC5, and PACE

4 in mouse and human. Genomics, 26(1), 123–129. doi: 10.1016/0888-

7543(95)80090-9

McCoy, K. D., Boulinier, T., Tirard, C., & Michalakis, Y. (2003). Host-Dependent

Genetic Structure Of Parasite Populations: Differential Dispersal Of

Seabird Tick Host Races. Evolution 57(2): 288. doi: 10.1554/0014-3820(2003)057[0288:hdgsop]2.0.co;2

McCoy, K. D., Chapuis, E., Tirard, C., Boulinier, T., Michalakis, Y., Bohec, C. L.,

Maho, Y. L., & Gauthier-Clerc, M. (2005). Recurrent evolution of host-

specialized races in a globally distributed parasite. Proceedings of the

Royal Society B: Biological Sciences 272(1579): 2389–2395. doi:

10.1098/rspb.2005.3230

McCracken, K. G., Barger, C. P., Bulgarella, M., Johnson, K. P., Kuhner, M. K.,

Moore, A. V., … & Wilson, R. E. (2009). Signatures of High‐Altitude

Adaptation in the Major Hemoglobin of Five Species of Andean Dabbling

Ducks. The American Naturalist, 174(5), 631–650. doi: 10.1086/606020

McCurdy, D. G., Shutler, D., Mullie, A., & Forbes, M. R. (1998) Sex-Biased Parasitism of

Avian Hosts: Relations to Blood Parasite Taxon and Mating System.

Oikos, 82, 303.

McCutchan, T. F., Kissinger, J. C., Touray, M. G., Rogers, M. J., Li, J., Sullivan,

M., … & Miller, L. H. (1996). Comparison of circumsporozoite proteins from

avian and mammalian malarias: biological and phylogenetic

implications. Proceedings of the National Academy of Sciences, 93(21),

11889–11894. doi: 10.1073/pnas.93.21.11889

McEvoy, J. F., Roshier, D. A., Ribot, R. F., & Bennett, A. T. (2015). Proximate

cues to phases of movement in a highly dispersive waterfowl, Anas

superciliosa. Movement Ecology 3, 21. doi: 10.1186/s40462-015-0048-3.

Mech, L. D. & Barber, S. M. (2002) A critique of wildlife radio-tracking and its use

in national parks. Northern Prairie Wildlife Research Center, Jamestown,

ND.

Medeiros, M. C. I., Ricklefs, R. E., Brawn, J. D., & Hamer, G. L. (2015).

Plasmodium prevalence across avian host species is positively associated

with exposure to mosquito vectors. Parasitology, 142(13), 1612–1620. doi:

10.1017/s0031182015001183

Medeiros, M. C. I., Ricklefs, R. E., Goldberg, T. L., Ruiz, M. O., Hamer, G. L., &

Brawn, J. D. (2016). Overlap in the Seasonal Infection Patterns of Avian

Malaria Parasites and West Nile Virus in Vectors and Hosts. The

American Journal of Tropical Medicine and Hygiene, 95(5), 1121–1129.

doi: 10.4269/ajtmh.16-0236

Mendes, L., Pardal, S., Morais, J., Antunes, S., Ramos, J. A., Perez-Tris, J., &

Piersma, T. (2013). Hidden haemosporidian infections in Ruffs

(Philomachus pugnax) staging in Northwest Europe en route from Africa

to Arctic Europe. Parasitology Research, 112(5), 2037–2043. doi:

10.1007/s00436-013-3362-y

Meng, Y. A., Yu, Y., Cupples, L. A., Farrer, L. A., & Lunetta, K. L. (2009).

Performance of random forest when SNPs are in linkage

disequilibrium. BMC Bioinformatics, 10(1). doi: 10.1186/1471-2105-10-

78

Merino, S., & Mínguez, E. (2008). Absence of haematozoa in a breeding colony

of the Storm Petrel Hydrobates pelagicus. Ibis, 140(1), 180–181. doi:

10.1111/j.1474-919x.1998.tb04560.x

Merino, S., Potti, J., & Fargallo, J. A. (1997). Blood Parasites of Passerine Birds

from Central Spain. Journal of Wildlife Diseases, 33(3), 638–641. doi:

10.7589/0090-3558-33.3.638

Miller, M. R., Dunham, J. P., Amores, A., Cresko, W. A. & Johnson, E. A. (2007)

Rapid and cost-effective polymorphism identification and genotyping using

restriction site associated DNA (RAD) markers. Genome Research 17:

240–248. doi:10.1101/gr.5681207.

Moher, D., Pham, B., Lawson, M., & Klassen, T. (2003). The inclusion of reports

of randomised trials published in languages other than English in

systematic reviews. Health Technology Assessment, 7(41). doi:

10.3310/hta7410

Mulvey, M., Aho, J. M., Lydeard, C., Leberg, P. L., & Smith, M. H. (1991).

Comparative Population Genetic Structure of a Parasite (Fascioloides

magna) and Its Definitive Host. Evolution 45(7): 1628. doi:

10.2307/2409784

Muriel, J., Graves, J. A., Gil, D., Magallanes, S., Salaberria, C., Casal-López,

M., & Marzal, A. (2018). Molecular characterization of avian malaria in

the spotless starling (Sturnus unicolor). Parasitology Research, 117(3),

919–928. doi: 10.1007/s00436-018-5748-3

Murphy, S. C., Harrison, T., Hamm, H. E., Lomasney, J. W., Mohandas, N., &

Haldar, K. (2006). Erythrocyte G Protein as a Novel Target for Malarial

Chemotherapy. PLoS Medicine, 3(12). doi:

10.1371/journal.pmed.0030528

Musa, S., Mackenstedt, U., Woog, F., & Dinkel, A. (2019). Avian malaria on

Madagascar: prevalence, biodiversity and specialization of

haemosporidian parasites. International Journal for Parasitology, 49(3-4),

199–210. doi: 10.1016/j.ijpara.2018.11.001

Nadler, S. A. (1995). Microevolution and the Genetic Structure of Parasite

Populations. The Journal of Parasitology 81(3): 395. doi:

10.2307/3283821

Nakayama, Y., Nakamura, N., Oki, S., Wakabayashi, M., Ishihama, Y., Miyake,

A., … Kurosaka, A. (2012). A Putative Polypeptide N-

Acetylgalactosaminyltransferase/Williams-Beuren Syndrome

Chromosome Region 17 (WBSCR17) Regulates Lamellipodium Formation

and Macropinocytosis. Journal of Biological Chemistry, 287(38), 32222–

32235. doi: 10.1074/jbc.m112.370932

Nathan R., Getz W.M., Revilla E., Holyoak M., Kadmon R., Saltz D. & Smouse

P.E. (2008) A movement ecology paradigm for unifying organismal

movement research. Proceedings of the National Academy of Sciences,

105, 19052–19059.

Neto, J. M., Pérez-Rodríguez, A., Haase, M., Flade, M., & Bensch, S. (2015).

Prevalence and diversity of Plasmodium and Haemoproteus parasites in

the globally-threatened Aquatic Warbler Acrocephalus paludicola.

Parasitology, 142(9), 1183–1189. doi: 10.1017/s0031182015000414

Njabo, K. Y., Cornel, A. J., Bonneaud, C., Toffelmier, E., Sehgal, R. N. M.,

Valkiūnas, G., … & Smith, T. B. (2010). Nonspecific patterns of vector,

host and avian malaria parasite associations in a central African rainforest.

Molecular Ecology, 20(5), 1049–1061. doi: 10.1111/j.1365-

294x.2010.04904.x

Nolte, I. M., Wallace, C., Newhouse, S. J., Waggott, D., Fu, J., Soranzo, N., …

Jamshidi, Y. (2009). Common Genetic Variation Near the Phospholamban

Gene Is Associated with Cardiac Repolarisation: Meta-Analysis of Three

Genome-Wide Association Studies. PLoS ONE, 4(7). doi:

10.1371/journal.pone.0006138

Norris, F. A., Atkins, R. C., & Majerus, P. W. (1997). The cDNA Cloning and

Characterization of Inositol Polyphosphate 4-Phosphatase Type II. Journal

of Biological Chemistry, 272(38), 23859–23864. doi:

10.1074/jbc.272.38.23859

Norris, K., & Evans, M. R. (2000). Ecological immunology: life history trade-offs

and immune defense in birds. Behavioral Ecology, 11(1), 19–26. doi:

10.1093/beheco/11.1.19

Oatley, T.O. & Prys-Jones, R.P. A comparative analysis of movements of

southern African waterfowl (Anatidae), based on ringing recoveries. South

African Journal of Wildlife Research, 16(1), 1–6.

Okanga, S., Cumming, G. S., & Hockey, P. A. (2013). Avian malaria prevalence

and mosquito abundance in the Western Cape, South Africa. Malaria

Journal, 12(1). doi: 10.1186/1475-2875-12-370

Oksanen, J., Guillaume Blanchet, F., Friendly, M., Kindt, R., Legendre, P.,

McGlinn, D., Minchin, P.R., O'Hara, R. B., Simpson, G. L., Solymos, P.,

Henry, M., Stevens, H., Szoecs, E., & Wagner, H. (2019). vegan:

Community Ecology Package. R package version 2.5-6. https://CRAN.R-

project.org/package=vegan

Pacheco, M. A., Escalante, A. A., Garner, M. M., Bradley, G. A., & Aguilar, R. F.

(2011). Haemosporidian infection in captive masked bobwhite quail

(Colinus virginianus ridgwayi), an endangered subspecies of the northern

bobwhite quail. Veterinary Parasitology, 182(2-4), 113–120. doi:

10.1016/j.vetpar.2011.06.006

Parker, V. (1994). Swaziland bird atlas, 1985–1991. Websters.

Peck, D.R. & Congdon, B.C. (2004) Reconciling historical processes and

population structure in the sooty tern Sterna fuscata. Journal of Avian

Biology, 35, 327–335.

Peirce, M. A. & Brooke, M. (1993) Failure to detect blood parasites in seabirds

from the Pitcairn Islands. Seabird, 15: 72-74.

Pérez-Rodríguez, A., Puente, J. D. L., Onrubia, A., & Pérez-Tris, J. (2013).

Molecular characterization of haemosporidian parasites from kites of the

genus Milvus (Aves: Accipitridae). International Journal for Parasitology,

43(5), 381–387. doi: 10.1016/j.ijpara.2012.12.007

Peters, J. L., Lavretsky, P., Dacosta, J. M., Bielefeld, R. R., Feddersen, J. C. &

Sorenson, M. D. (2016) Population genomic data delineate conservation

units in mottled ducks (Anas fulvigula). Biological Conservation 203:

272–281.

Pfeifer, B., Wittelsbürger, U., Ramos-Onsins, S. E. & Lercher, M. J. (2020)

PopGenome: An Efficient Swiss Army Knife for Population Genomic

Analyses in R. Molecular Biology and Evolution 31: 1929–1936.

doi:10.1093/molbev/msu136.

Piersma, T. (1997). Do Global Patterns of Habitat Use and Migration Strategies

Co-Evolve with Relative Investments in Immunocompetence due to

Spatial Variation in Parasite Pressure? Oikos, 80(3), 623. doi:

10.2307/3546640

Pigeault, R., Vézilier, J., Cornet, S., Zélé, F., Nicot, A., Perret, P. & Rivero, A.

(2015). Avian malaria: a new lease of life for an old experimental model to

study the evolutionary ecology of Plasmodium. Philosophical Transactions

of the Royal Society B: Biological Sciences, 370(1675), 20140300.

Podmokła, E., Dubiec, A., Drobniak, S. M., Arct, A., Gustafsson, L., & Cichoń,

M. (2014). Avian malaria is associated with increased reproductive

investment in the blue tit. Journal of Avian Biology, 45(3), 219–224. doi:

10.1111/j.1600-048x.2013.00284.x

Power, A. G. & Mitchell, C. E. (2004). Pathogen Spillover in Disease Epidemics.

The American Naturalist 164(S5). doi: 10.1086/424610

Price, P. W. (1980). Evolutionary biology of parasites. Princeton, NJ: Princeton

UP.

Pritchard, J. K., Stephens, M. & Donnelly, P. (2000) Inference of population

structure using multilocus genotype data. Genetics 155, 945–959.

Quillfeldt, P., Arriero, E., Martínez, J., Masello, J. F., & Merino, S. (2011).

Prevalence of blood parasites in seabirds - a review. Frontiers in Zoology,

8(1), 26. doi: 10.1186/1742-9994-8-26

R Core Team (2019). R: A language and environment for statistical computing.

R Foundation for Statistical Computing. Vienna, Austria. URL:

https://www.R-project.org/

Rambaut, A. (2012). Figtree 1.4.0. Available:

http://tree.bio.ed.ac.uk/software/figtree/ (20 December 2019, date last

accessed)

Ramey, A. M., Fleskes, J. P., Schmutz, J. A., & Yabsley, M. J. (2013).

Evaluation of blood and muscle tissues for molecular detection and

characterization of hematozoa infections in northern pintails (Anas acuta)

wintering in California. International Journal for Parasitology: Parasites

and Wildlife, 2, 102–109. doi: 10.1016/j.ijppaw.2013.02.001

Ramey, A. M., Reed, J. A., Walther, P., Link, P., Schmutz, J. A., Douglas, D. C.,

… Soos, C. (2016). Evidence for the exchange of blood parasites

between North America and the Neotropics in blue-winged teal (Anas

discors). Parasitology Research, 115(10), 3923–3939. doi:

10.1007/s00436-016-5159-2

Ramey, A. M., Schmutz, J. A., Reed, J. A., Fujita, G., Scotton, B. D., Casler, B.,

… & Yabsley, M. J. (2015). Evidence for intercontinental parasite

exchange through molecular detection and characterization of

haematozoa in northern pintails (Anas acuta) sampled throughout the

North Pacific Basin. International Journal for Parasitology: Parasites and

Wildlife, 4(1), 11–21. doi: 10.1016/j.ijppaw.2014.12.004

Real, L. A. & Biek, R. (2007). Spatial dynamics and genetics of infectious

diseases on heterogeneous landscapes. Journal of The Royal Society

Interface 4(16): 935–948. doi: 10.1098/rsif.2007.1041

Reed, K. D., Meece, J.K., Henkel, J.S., & Shukla, S. K. (2003) Birds, Migration

and Emerging Zoonoses: West Nile Virus, Lyme Disease, Influenza A and

Enteropathogens. Clinical Medicine & Research, 1, 5-12.

Rhymer J.M., Williams M.J., & Kingsford R.T. (2004) Implications of

phylogeography and population genetics for subspecies taxonomy of Grey

(Pacific Black) Duck Anas superciliosa and its conservation in New

Zealand. Pacific Conservation Biology, 10, 57.

Ribeiro, S. F., Sebaio, F., Branquinho, F. C. S., Marini, M. Â., Vago, A. R., &

Braga, É. M. (2005). Avian malaria in Brazilian passerine birds:

parasitism detected by nested PCR using DNA from stained blood

smears. Parasitology, 130(3), 261–267. doi:

10.1017/s0031182004006596

Ritland, K. (1996) Estimators for pairwise relatedness and individual inbreeding

coefficients. Genetical Research 67: 175–185.

doi:10.1017/s0016672300033620.

Roshier D.A., Heinsohn R., Adcock G.J., Beerli P., & Joseph L. (2012)

Biogeographic models of gene flow in two waterfowl of the Australo-

Papuan tropics. Ecology and Evolution, 2, 2803–2814.

Santiago-Alarcon, D., Bloch, R., Rolshausen, G., Schaefer, H. M., &

Segelbacher, G. (2011). Prevalence, diversity, and interaction patterns of

avian haemosporidians in a four-year study of blackcaps in a migratory

divide. Parasitology, 138(7), 824–835. doi: 10.1017/s0031182011000515

Scaglione, F. E., Pregel, P., Cannizzo, F. T., Pérez-Rodríguez, A. D., Ferroglio,

E., & Bollo, E. (2015). Prevalence of new and known species of

haemoparasites in feral pigeons in northwest Italy. Malaria Journal,

14(1). doi: 10.1186/s12936-015-0617-3

Schall, J. (2000) Transmission success of the malaria parasite Plasmodium

mexicanum into its vector: role of gametocyte density and sex ratio.

Parasitology, 121.

Scheuerlein, A., & Ricklefs, R. E. (2004). Prevalence of blood parasites in

European passeriform birds. Proceedings of the Royal Society of London.

Series B: Biological Sciences, 271(1546), 1363–1370. doi:

10.1098/rspb.2004.2726

Schmid, S., Fachet, K., Dinkel, A., Mackenstedt, U., & Woog, F. (2017). Carrion

crows (Corvus corone) of southwest Germany: important hosts for

haemosporidian parasites. Malaria Journal, 16(1). doi: 10.1186/s12936-

017-2023-5

Schodde, R. & Mason, I. J. (1999) The directory of Australian birds: a taxonomic

and zoogeographic atlas of the biodiversity of birds in Australia and its

territories. CSIRO Publishing, Collingwood, VIC.

Schodde, R. (2006). Australia’s bird fauna today – Origins and development. In

‘Evolution and Biogeography of Australasian Vertebrates.’ (Eds J. R.

Merrick, M. Archer, G. Hickey, and M. Lee.) pp. 413–458. (AusciPub:

Sydney.)

Schrenzel, M. D., Maalouf, G. A., Keener, L. L., & Gaffney, P. M. (2003).

Molecular Characterization Of Malarial Parasites In Captive Passerine

Birds. Journal of Parasitology, 89(5), 1025–1033. doi: 10.1645/ge-3163

Schuurs, A. & Verheul, H. (1990) Effects of gender and sex steroids on the

immune response. Journal of Steroid Biochemistry, 35, 157–172.

Scott, D. A. & Rose, P. M. (1996). Atlas of Anatidae populations in Africa and

western Eurasia. Wetlands International, Wageningen, Netherlands.

Soares, L., Ellis, V. A., & Ricklefs, R. E. (2016). Co-infections of haemosporidian

and trypanosome parasites in a North American songbird. Parasitology,

143(14), 1930–1938. doi: 10.1017/s0031182016001384

Solomon, N., James, I., Alphonsus, N., & Nkiruka, R. (2015). A Review of Host-

Parasite Relationships. Annual Research & Review in Biology, 5(5), 372–

384. doi: 10.9734/arrb/2015/10263

Sonsthagen, S. A., Wilson, R. E., Lavretsky, P. & Talbot, S. L. (2019) Coast to

coast: High genomic connectivity in North American scoters. Ecology

and Evolution in press.

Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and

post-analysis of large phylogenies. Bioinformatics 30(9): 1312–1313. doi:

10.1093/bioinformatics/btu033

Stehling, O., Vashisht, A. A., Mascarenhas, J., Jonsson, Z. O., Sharma, T., Netz,

D. J. A., … Lill, R. (2012). MMS19 Assembles Iron-Sulfur Proteins

Required for DNA Metabolism and Genomic Integrity. Science, 337(6091),

195–199. doi: 10.1126/science.1219723

Strobel, H. M., Alda, F., Sprehn, C. G., Blum, M. J., & Heins, D. C. (2016).

Geographic and host-mediated population genetic structure in a cestode

parasite of the three-spined stickleback. Biological Journal of the Linnean

Society 119(2): 381–396. doi: 10.1111/bij.12826

Sul, J. H., Martin, L. S. & Eskin, E. (2018) Population structure in genetic studies:

Confounding factors and mixed models. PLOS Genetics.

Sullivan, G. M., & Feinn, R. (2012). Using Effect Size—or Why the P Value Is Not

Enough. Journal of Graduate Medical Education, 4(3), 279–282. doi:

10.4300/jgme-d-12-00156.1

Summers, K., Mckeon, S., Sellars, J., Keusenkothen, M., Morris, J., Gloeckner,

D., … Snow, H. (2003). Parasitic exploitation as an engine of diversity.

Biological Reviews, 78(4), 639–675. doi: 10.1017/s146479310300616x

Svoboda, A., Marthinsen, G., Pavel, V., Chutný, B., Turčoková, L., Lifjeld, J. T.,

& Johnsen, A. (2014). Blood parasite prevalence in the Bluethroat is

associated with subspecies and breeding habitat. Journal of Ornithology,

156(2), 371–380. doi: 10.1007/s10336-014-1134-9

Swanson, B. L., Lyons, A. C., & Bouzat, J. L. (2014). Distribution, prevalence

and host specificity of avian malaria parasites across the breeding range

of the migratory lark sparrow (Chondestes grammacus). Genetica,

142(3), 235–249. doi: 10.1007/s10709-014-9770-9

Szöllősi, E., Cichoń, M., Eens, M., Hasselquist, D., Kempenaers, B., Merino, S.,

… & Garamszegi, L. Z. (2011). Determinants of distribution and

prevalence of avian malaria in blue tit populations across Europe:

separating host and parasite effects. Journal of Evolutionary Biology,

24(9), 2014–2024. doi: 10.1111/j.1420-9101.2011.02339.x

Tamura, K., Stecher, G., Peterson, D., Filipski, A., & Kumar, S. (2013). MEGA6:

Molecular Evolutionary Genetics Analysis Version 6.0. Molecular Biology

and Evolution 30(12): 2725–2729. doi: 10.1093/molbev/mst197

Tarboton, W. R., Kemp, M., & Kemp, A. C. (1987). Birds of the Transvaal.

Pretoria: Transvaal Museum.

Tigano, A., Shultz, A. J., Edwards, S. V., Robertson, G. J., & Friesen, V. L.

(2017). Outlier analyses to test for local adaptation to breeding grounds in

a migratory arctic seabird. Ecology and Evolution, 7(7), 2370–2381. doi:

10.1002/ece3.2819

Toon A., Hughes, J.M., & Joseph, L. (2010) Multilocus analysis of honeyeaters

(Aves: Meliphagidae) highlights spatio-temporal heterogeneity in the

influence of biogeographic barriers in the Australian monsoonal zone.

Molecular Ecology, 19, 2980–2994.

Tremblay, L. O., Dyke, N. C., & Herscovics, A. (1998). Molecular cloning,

chromosomal mapping and tissue-specific expression of a novel human

α1,2-mannosidase gene involved in N-glycan maturation. Glycobiology,

8(6), 585–595. doi: 10.1093/glycob/8.6.585

Trigunaite, A., Dimo, J., & Jørgensen, T. N. (2015) Suppressive effects of

androgens on the immune system. Cellular Immunology, 294, 87–94.

Tuteja, N. (2009). Signaling through G protein coupled receptors. Plant Signaling

& Behavior, 4(10), 942–947. doi: 10.4161/psb.4.10.9530

United States Department of Agriculture (USDA). (2016). HPAI 2014/15

confirmed detections,

https://www.aphis.usda.gov/aphis/ourfocus/animalhealth/animal-disease-

information/avian-influenza-disease/SA_Detections_by_States/HPAI-

2014-2015-Confirmed-Detections.

Valkiūnas, G. (2005). Avian malaria parasites and other haemosporidia. CRC

Press. Boca Raton, Florida. pp. 946.

Vermeesch, J. R., Mertens, G., David, G., & Marynen, P. (1995). Assignment of

the human glypican gene (GPC1) to 2q35–q37 by fluorescence in situ

hybridization. Genomics, 25(1), 327–329. doi: 10.1016/0888-

7543(95)80152-c

Viechtbauer, W. (2010) Conducting Meta-Analyses in R with the metafor

Package. Journal of Statistical Software, 36.

Voight, B. & Pritchard, J. (2005) Confounding from Cryptic Relatedness in Case-

Control Association Studies. PLoS Genetics.

Watkins, P. A., Maiguel, D., Jia, Z., & Pevsner, J. (2007). Evidence for 26 distinct

acyl-coenzyme A synthetase genes in the human genome. Journal of

Lipid Research, 48(12), 2736–2750. doi: 10.1194/jlr.m700378-jlr200

Weinell, J. L. & Bauer, A. M. (2018). Systematics and phylogeography of the

widely distributed African skink Trachylepis varia species complex.

Molecular Phylogenetics and Evolution 120: 103–117. doi:

10.1016/j.ympev.2017.11.014

West, S. L., Gartlehner, G., Mansfield, A. J., Poole, C., Tant, E., Lenfestey, N.,

… & Lohr, K. N. (2010). Comparative Effectiveness Review Methods:

Clinical Heterogeneity. S.l.: Agency for Healthcare Research and Quality

(US).

Winker, K., McCracken, K. G., Gibson, D. D., Pruett, C. L., Meier, R., Huettmann,

F., Wege, M., Kulikova, I., Zhuravlev, Y. N., Perdue, M., et al. (2007).

Movements of Birds and Avian Influenza from Asia into Alaska. Emerging

Infectious Diseases 13(4): 547–552. doi: 10.3201/eid1304.061072

Wise, R. J., Barr, P. J., Wong, P. A., Kiefer, M. C., Brake, A. J., & Kaufman, R. J.

(1990). Expression of a human proprotein processing enzyme: correct

cleavage of the von Willebrand factor precursor at a paired basic amino

acid site. Proceedings of the National Academy of Sciences, 87(23),

9378–9382. doi: 10.1073/pnas.87.23.9378

Witsenburg, F., Clément, L., López-Baucells, A., Palmeirim, J., Pavlinić, I.,

Scaravelli, D., Ševčík, M., Dutoit, L., Salamin, N., Goudet, J., & Christe, P.

(2015). How a haemosporidian parasite of bats gets around: the genetic

structure of a parasite, vector and host compared. Molecular Ecology

24(4): 926–940. doi: 10.1111/mec.13071

Wood, M. J., Childs, D. Z., Davies, A. S., Hellgren, O., Cornwallis, C. K., Perrins,

C. M., & Sheldon, B. C. (2013). The epidemiology underlying age‐related

avian malaria infection in a long‐lived host: The mute swan Cygnus olor.

Journal of Avian Biology 44: 347–358.

Wu, M., Michaud, E. J., & Johnson, D. K. (2003). Cloning, functional study and

comparative mapping of Luzp2 to mouse Chromosome 7 and human

Chromosome 11p13–11p14. Mammalian Genome, 14(5), 323–334. doi:

10.1007/s00335-002-2248-6

Yang, J., Kim, O., Wu, J., & Qiu, Y. (2002). Interaction between Tyrosine Kinase

Etk and a RUN Domain- and FYVE Domain-containing Protein RUFY1.

Journal of Biological Chemistry, 277(33), 30219–30226. doi:

10.1074/jbc.m111933200

Yeaman, S. (2015). Local adaptation by alleles of small effect. The American

Naturalist, 186, S74–S89.

Yohannes, E., Križanauskienė, A., Valcu, M., Bensch, S., & Kempenaers, B.

(2008). Prevalence of malaria and related haemosporidian parasites in

two shorebird species with different winter habitat distribution. Journal of

Ornithology, 150(1), 287–291. doi: 10.1007/s10336-008-0349-z

Zhao, Y., Richardson, B., Takle, E., Chai, L., Schmitt, D., & Xin, H. (2019).

Airborne transmission may have played a role in the spread of 2015 highly

pathogenic avian influenza outbreaks in the United States. Scientific

Reports, 9(1). doi: 10.1038/s41598-019-47788-z

Zuk, M., & Johnsen, T. S. (1998). Seasonal changes in the relationship between

ornamentation and immune response in red jungle fowl. Proceedings of

the Royal Society of London. Series B: Biological Sciences, 265(1406),

1631–1635. doi: 10.1098/rspb.1998.0481
