Genetic Linkage to Explain Genetic Variation

22nd International Congress on Modelling and Simulation, Hobart, Tasmania, Australia, 3 to 8 December 2017 mssanz.org.au/modsim2017 Genetic linkage to explain genetic variation J.L. Mijangosa, C.E. Holleleyb,f, R.A. Nicholsb,c, I.N. Towersa, Z. Jovanoskia, H.S. Sidhua, S. Watta, A.T. Adamackd,e and W.B. Sherwinb a School of Physical, Environmental and Mathematical Sciences, University of New South Wales, Canberra, Australia. b Evolution and Ecology Research Centre, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney NSW 2052, Australia. c School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK. d Institute for Applied Ecology, University of Canberra, Bruce, Australia. e Department of Biology, Virginia Commonwealth University, Richmond, VA 23284, USA. f Australian National Wildlife Collection, National Research Collections Australia, CSIRO, Australia. Email: [email protected] Abstract: Population genetics theory has been used to develop models to inform biodiversity conservation. The implementation of these models for the decision-making, monitoring and evaluating conservation efforts has improved their efficacy. Many of these models are based on neutral loci, which are assumed to not have any effect on the survival or reproduction of organisms. However, when neutral loci are linked to loci that are under selection, they do not follow the expectations predicted by theoretical neutral models. The term linked selection has been used to refer to this phenomenon. Linked selection might accelerate the rate of loss of genetic diversity, with respect expectations under neutral models. This acceleration is typically expected to occur under two different scenarios: selective sweeps and background selection. A selective sweep occurs when an advantageous allele is spread across the population along with the alleles that are linked to it. Conversely, background selection occurs when a deleterious allele is eliminated from the population along with the alleles linked to it. A third scenario of linked selection has been hypothesised to occur in small populations: associative overdominance. This scenario rather than an acceleration involves a retardation of the loss rate of genetic diversity. The proposed mechanism in operation is a type of natural selection that maintains two or more alleles in the population (i.e. balancing selection). Ultimately, linked selection will bias the conclusions obtained by neutral models. To investigate the mechanisms by which linked selection alters genetic diversity, we built a population genetics model incorporating the main factors involved in the occurrence of linked selection. We modelled the following factors: recombination/linkage disequilibrium, fitness/selection, population size and dominance. Our overall aims are twofold: • Build a model that serves as a base to develop more complex models to test competing hypotheses of linked selection. • Implement the model in a programming language and validate it against theoretical expectations The model was implemented in the programing language R and was extensively tested. The program produces outputs that are consistent with predictions from population genetics theory. By using this model as a starting point, we will be able to investigate potential mechanisms by which linked selection affects genetic diversity and its derived consequences. Our research may highlight the importance of the need to adjust neutral models. Keywords: Associative overdominance, population genetics, linkage disequilibrium, linked selection 127 Mijangos et al., Genetic linkage to explain genetic variation 1. INTRODUCTION Genomes consist of sequences of deoxyribonucleic acid (DNA) that encode most of an organism’s traits. DNA is packaged within structures called chromosomes. Diploid organisms carry two copies of each chromosome with one copy inherited from each of its parents. The location of a specific DNA sequence within the genome is called a “locus” (“loci” in plural), and the different variants of DNA sequences at a locus are called “alleles”. The word “gene” is often ambiguously used to make reference to both “locus” and “allele”; therefore, we avoid the use of gene in this text. An individual carrying different alleles at a particular locus is said to be heterozygote at that locus, while an individual carrying the same allele at that locus is a homozygote at that locus. Population geneticists have developed theoretical models to understand and predict evolutionary processes. Many core models in population genetics are based on neutral loci, called neutral because they do not affect fitness1 (i.e. survival or reproduction of living organisms). Therefore, neutral loci are not affected by natural selection, but by genetic drift (variation in allele frequencies across generations due to random chance) and gene flow (exchanges of genetic material between populations). Neutral loci models have been developed to use our knowledge about genetic drift and gene flow to derive measures and parameters that can be used to inform the management of natural populations. For instance, such measures can help us to determine levels of landscape connectivity, population size, genetic divergence, population origin of individuals, delineation of populations and infer past and present population trajectories (Frankham et al. 2009). The effective population size (Ne) is a major concept in population genetics and is particularly important for informing the conservation and restoration of biodiversity. Ne predicts the effects of genetic drift such as rates of loss of neutral genetic diversity, and the interactions of genetic drift and natural selection such as fixation of deleterious and favourable alleles (England et al. 2006). Ne is defined as the size of an idealised population that would have the same amount of genetic drift as the population under consideration (Kimura & Crow 1963). TCGCGATA Mother's chromosome Neutral models frequently assume that neutral loci are Before recombination inherited by offspring as independent units from their TGGTCAGC Father's chromosome parents. However, loci that are close to each other on the same chromosome are often inherited together in clusters TCGCCAGC Chromosome A of different loci termed haplotypes. This non- After independent inheritance of loci will cause a non-random recombination TGGTGATA Chromosome B association between their alleles causing a linkage disequilibrium (LD) between loci. Conversely, the process of recombination rearranges loci within Figure 1. Diagram depicting the recombination haplotypes and hence dilutes the amount of LD between process. Upper DNA sequences represent loci. Recombination occurs during meiosis, when homologous chromosomes of an individual (its homologous chromosomes temporarily fuse and mother’s and its father’s chromosomes). Black exchange loci between each other (Fig. 1). As a result of arrows represent the location where recombination recombination, haplotypes with different combinations is occurring within a chromosome. Lower DNA of alleles arise, which are then inherited by offspring. sequences represent the resulting chromosomes (A and B) that may be transmitted to the individual’s offspring. When LD occurs between two loci, one neutral and one under selection, theoretical expectations for neutral loci break down, and sometimes neutral loci appear as if they are under selection (“linked selection”). As a consequence, this phenomenon biases the conclusions derived from neutral models. Our objective is to develop a population genetics model that recreates the mechanisms occurring in linked selection. This model will serve as a base to develop more complex models that allow us to investigate factors that may be influencing linked selection, such as dispersal, different types of selection and multiple loci under linked selection. This work will highlight the need to adjust neutral models when linked selection occurs and will provide insights into how linked selection may influence models used in conservation and restoration. 1 Please note that all the words or phrases in bold in this manuscript are defined in the glossary section. 128 Mijangos et al., Genetic linkage to explain genetic variation A) Selective Sweep In this manuscript, we present the factors involved 1 CATCGCGATAAC 3 GATGCCGAATAC with linked selection, and then outline how these 2GTTCGCGATAAC 3 GATGCCGAATAC factors were incorporated in the model. We then ran 3 GATGCCGAATAC 3 GATGCCGAATAC test simulations to validate the proposed model. We 4GTTGCCGTAAAC 3 GATGCCGAATAC used R v3.3.3 (R Core Team 2017) to implement 5GATCCGGAAAAG 3 GATGCCGAATAC the model and run the simulations. B) Background selection 1CATCGCGATAAC 1 CATCGCGATAAC 2GTTCGCGATAAC 2 GTTCGCGATAAC 2. HAPLOTYPES WITH LINKED 3 GATGCCCAAAAC SELECTION 4GTTGCCGTAAAC 4 G TTGCCGTAAAC 5GATCCGGAAAAG 5 GATCCGGAAAAG Linkage disequilibrium between nearby loci, within Deleterious Advantageous Neutral a haplotype, will cause loss of genetic variation if one or more loci within the haplotype is under Figure 2. Hypothetical scenarios showing the selection. Loss of genetic variation via linked consequences of selective sweeps and background selection is expected to occur under two different selection on genetic diversity after several scenarios. An allele under positive selection (i.e. generations. Five haplotypes are present in the advantageous) will be spread across the population population (1 to 5)

Load more