Manual on application of molecular tools in aquaculture and inland fisheries management. Part 1. Conceptual basis of population genetic approaches

Item Type book

Authors Nguyen, Thuy; Hurwood, David; Mather, Peter; Na-Nakorn, Uthairat; Kamonrat, Wongpathom; Bartley, Devin

Publisher Network of Aquaculture Centres in Asia-Pacific

Download date 30/09/2021 09:48:27

Link to Item http://hdl.handle.net/1834/18374 Manual on Application of Molecular Tools in Aquaculture and Inland Fisheries Management MANUAL

ON

APPLICATION Part 1

OF

MOLECULAR Conceptual basis of population genetic

TOOLS approaches

IN

AQUACULTURE

AND

INLAND

FISHERIES

MANAGEMENT : PART

1 NACA Monograph1

www.enaca.org Manual on Application of Molecular Tools in Aquaculture and Inland Fisheries Management

Part 1: Conceptual basis of population genetic approaches

Contributors

Thuy Nguyen Network of Aquaculture Centres in Asia-Pacifi c

David Hurwood, Peter Mather School of Natural Resource Sciences, Queensland University of Technology

Uthairat Na-Nakorn Kasetsart University, Thailand

Wongpathom Kamonrat Department of Fisheries, Thailand

Devin Bartley Food and Agriculture Organization of the United Nations

Queensland University of Technology Brisbane, Australia NACA MONOGRAPH SERIES NACA is an intergovernmental organization that promotes rural development through sustainable aquaculture. NACA seeks to improve rural income, increase food production and foreign exchange earnings and to diversify farm production. The ultimate benefi ciaries of NACA activities are farmers and rural communities.

Visit NACA online at www.enaca.org for hundreds of freely downloadable publications on aquaculture and aquatic resource management.

© Network of Aquaculture Centres in Asia-Pacifi c PO Box 1040, Kasetsart University Post Offi ce Ladyao, Jatujak Bangkok 10903 Thailand Email: [email protected]

Nguyen, T.T.T., Hurwood, D., Mather, P., Na-Nakorn, N, Kamonrat, W. and Bartley, D. 2006. Manual on applications of molecular tools in aquaculture and inland fi sheries management, Part 1: Conceptual basis of population genetic approaches. NACA Monograph No. 1, 80p.

ISBN 978-974-88246-1-1

Printed by Scandmedia, Bangkok. Contents

Preface ...... 5 Acknowledgements ...... 7 Background ...... 9 Target audiences ...... 11 Aims, scope and format of the manual ...... 12 Abbreviations ...... 13

Section 1. The fundamental nature of DNA ...... 15

1.1 Basic DNA structure ...... 17 1.2 Where does variation in DNA sequences come from? ...... 18

Section 2. Genetic variation in nature ...... 23

Section 3. Basic concepts in population genetics ...... 29

Section 4. Natural selection ...... 35

Section 5. Genetic drift ...... 41

Section 6. Non-random mating and population structure ...... 47

Section 7. Environmental influences on population processes ...... 55

Section 8. Ecological influences on population processes ...... 63

Glossary ...... 69 Bibliography ...... 79

3

Preface

The mandate of NACA is to support is produced to facilitate training member governments in their endea- processes that NACA will undertake vours to achieve long-term sustainabi- in the ensuing years to enable the lity of inland fi shery resource utilisation member nations to achieve the overall and aquaculture development. In this objectives in regard to maintaining regard, NACA plays a major role in biodiversity in relation to development developing human capacity in aspects of aquatic resources utilisation. in the member countries. We accept the fact that a number of In the current millennium, inland fi she- text books are available for reference ries resource utilisation and aquacul- in this fi eld. Most however, are ture development have to go hand in expensive for many users and some of hand with maintaining environmental the techniques provided in them are integrity and biodiversity. Conserving not always suitable for many of the biodiversity has become an important molecular laboratories in the region. consideration worldwide. Nations that This has prompted us to prepare this import aquaculture products, often manual, which is designed to be less stress that the production processes expensive, more “user friendly” and of must not negatively affect natural direct relevance to the region. biodiversity. Furthermore, conservation of biodiversity is an integral component of responsible fi sheries and enshrined in the FAO Code of Conduct for Responsible Fisheries. Consequently, NACA as mandated by its Governing Council, is embarking on a program that attempts to sustain genetic diversity in relation to inland fi sheries management and aquaculture develop- ment in the region.

One of the initial steps is to assist member nations to achieve the above broad objectives and to develop human capacity in the current methodologies used to assess genetic diversity and its applications to biodiversity issues in inland fi shery resource utilisation and aquaculture development. This manual

5

Acknowledgements

T. T. T. Nguyen would like to thanks Mr. Pedro Bueno, former Director General of NACA, without whose support the manual could not have become possible. Encouragement from Prof. Sena De Silva, Deakin University (current Director General of NACA) is very much appreciated. P. Mather and D. Hurwood would like to acknowledge the Australian Centre for International Agricultural Research (ACIAR) for funding support.

7

Background

It has generally been accepted that as a loss of valuable genetic material aquaculture can contribute signifi cantly such as locally adapted or to narrowing the gap between demand complexes or homogenisation and supply for aquatic food supplies. of previously structured populations Currently, aquaculture production is via fl ooding with exogenous genes. estimated to be 51.4 million tonnes In Thailand, one example of such annually, valued at US$60 billion. More impacts is the outcome of hybridisation importantly, developing countries, between the Thai walking catfi sh, particularly in Asia, account for over Clarias macrocephalus and the African 85% of current production. It is most catfi sh C. gariepinus (Senanan et al., likely that dominance of Asian coun- 2004). While the long-term impact of tries in aquaculture production will be this hybridisation is still to be deter- maintained into the foreseeable future. mined, there has been a general loss of genetic diversity in the native species. With increasing developments in Similarly, it has been a suggested aquaculture however, the sector that hybrid Clarias are contributing also has had to face public concern to the decline of native C. batrachus in regard to environmental effects. in the Mekong Delta (Welcomme Aquaculture development with no and Vidthayanon, 2003). A parallel regard for social and environmental situation appears to be occurring issues is no longer acceptable to the elsewhere in Viet Nam, but as yet no public, be it in developed or developing genetic analyses have been conducted countries. Aquaculture development (personal observation). needs increasingly to take into account environmental impacts. It is in this Stock enhancement is a common fi shery regard that maintaining and sustaining practice in the freshwaters of many the environment has become para- Asian nations, and is considered to be a mount. Attention to genetic diversity means by which fi sh food supplies can and biodiversity in aquaculture devel- be signifi cantly enhanced (Petr, 1998 opment and aquatic resource manage- De Silva, 2004). Many enhancement ment are therefore, crucial elements practices, except those in China and for sustainable environments. perhaps India, are however dependent primarily on exotic species, with little Introduction of new species/strains can understainding of their effects on affect biodiversity via impacts on the genetic diversity in the native species. native gene pool. New species/strains A limited study conducted in Thailand can hybridise with native stocks, appears to indicate that stock enhance- and hence alter the natural genetic ment together with escapees from architecture. This may be expressed aquaculture operations have brought

9 about a decrease in genetic diversity macrocephalus of wild and broodstock in the silver barb Puntius gonionotus populations in Thailand, while populations (Kamonrat, 1996). Indeed, Kamonrat (1996) demonstrated that a the observation itself indicates a need similar situation has resulted for silver to step up the number of similar studies barb Puntius gonionotus. in the region to enable measures to be adopted that ensure levels of genetic Another major concern is poor stock diversity and biodiversity can be management practices in hatcheries, sustained for the long-term. especially with respect to broodstock management, which may lead to losses The major regional genetic program of genetic variation in culture stocks initiatives in Asia have thus far largely due to genetic drift and inbreeding. been confi ned to selective breeding Although the number of published programs, a much needed area of works on this matter are limited (e. work for aquaculture development in g. Eknath and Doyle, 1990) there is the region. None of these programs anecdotal evidence for genetic erosion were directly related however, to of cultured stocks especially with contributing to aquatic resource regard to the major carp species. diversity. On the other hand, at recent regional workshops (Gupta and Acosta, Asian nations in the meeting of the 2001) in which most Asian nations were International Network of Genetics in represented, ongoing and planned Aquaculture in 2000 (Gupta and Acosta, genetic related work was discussed 2001) recognised that more attention and some consideration was made needs to be paid to biodiversity and regarding biodiversity and conservation conservation issues. Thus while atten- issues. Unfortunately, there were a very tion should be paid to genetic improve- limited number of biodiversity related ment of important cultured species, studies reported. increasing awareness of the potential impacts of aquaculture and fi sheries To date only a limited number of (and related activities) on biodiversity studies have addressed biodiversity is also very important at this stage. issues in freshwater species in the There is a need to build the capacity of region. These studies have raised regional fi sheries agencies in molecular however, important concerns regarding genetic techniques to address this issue. the potential negative impacts of Genetic diversity studies in the region aquaculture on biodiversity. Of should therefore focus on: particular concern is the ongoing practice of translocations and importa- • Genetic improvement of important tion of exotic strains/species for culture. cultured species Senanan et al. (2004) and Na-Nakorn et al. (2004) have provided evidence that African catfi sh (Clarias gariepinus) genes have introgressed into native C.

10 • Assisting management practices in raised by the public and nations that aquaculture operations, especially import aquatic products. It is in this broodstock management regard that there is a great need to build capacity in applied molecular • Resolving taxonomic uncertainties, genetic capabilities at the national and phylogenetic relationships, and regional levels. This will allow especially for those species or characterisation of the genetic populations that are endangered resources of relevant species important and/or commercially important to aquaculture and inland fi sheries in the respective nations/sub-region. • Documenting patterns of natural Knowledge on the applications of genetic diversity and identifying molecular genetics to aquaculture and management units fi sheries management will help reduce the negative impacts of many current • Assessing genetic impacts of activities on biodiversity, and allow cultured stocks on indigenous stocks development of suitable strategies for maintaining and sustaining diversity. It In the light of the major aquaculture will also help to provide a useful guide developments taking place in Asia, to the identifi cation and conservation urgent attention is needed on biodi- of genetic integrity of aquatic species versity and genetic integrity issues of within the region. cultured as well as indigenous wild stocks; issues that are increasingly

Target audiences

This manual is expected to enable The manual has gone through two NACA member country personnel to be development/improvement stages. The trained to undertake molecular genetic initial material was tested at a regional studies in their own institutions, and workshop and at the second stage as such is aimed at middle and higher feedback from participants was used to level technical grades. The manual can improve the contents. also provide useful teaching material for specialised advanced level university courses in the region and postgraduate students.

11 Aims, scope and format of the manual

The aim of this manual is to provide a utilised in population genetics and comprehensive practical tool for the systematic studies. In addition, a generation and analysis of genetic data brief discussion and explanation of for subsequent application in aquatic how these data are managed and resources management in relation to analysed is also included. genetic stock identifi cation in inland fi sheries and aquaculture.

The material only covers general background on genetics in relation to aquaculture and fi sheries resource management, the techniques and relevant methods of data analysis that are commonly used to address questions relating to genetic resource characterisation and population genetic analyses. No attempt is made to include applications of genetic improvement techniques e.g. selective breeding or producing genetically modifi ed organ- isms (GMOs).

The manual includes two ‘stand-alone’ parts:

• Part 1 – Conceptual basis of population genetic approaches: will provide a basic foundation on genetics in general, and concepts of population genetics. Issues on the choices of molecular markers and project design are also discussed.

• Part 2 – Laboratory protocols, data management and analysis: will provide step-by-step protocols of the most commonly used molecular genetic techniques

12 Abbreviations

A Adenine AA Amino Acid AFLP Amplifi ed fragment length polymorphism AMOVA Analysis of molecular variance ANOVA Analysis of variance C Cytosine DGGE Denaturing Gradient Gel electrophoresis DNA Deoxyribonucleic acid dsDNA Double stranded DNA G Guanine GD Genetic drift HWE Hardy-Weinberg Equilibrium IBD Isolation-by-distance or identical-by-descent kb 1000 nucleotide base pairs (kilobase) LHT Life history traits MDS Multidimensional scaling ordinations MHC Major histocompatability complex mRNA Messenger ribonucleic acid MSN Minium spanning network mtDNA Mitochondrial deoxyribonucleic acid MU Management units NCA Nested clade analysis nDNA Nuclear deoxyribonucleic acid Nm Effective number of migrants (where N= effective population size and m=mutation rate) NS Natural selection PCR Polymerase chain reaction RAPD Random amplifi ed polymorphic DNA RE Restriction enzyme RNA Ribonucleic acid SCR Semi-conservative replication SSCP Single strand conformational polymorphism SSR Simple sequence repeats T Thymine TGGE Temperature gradient gel electrophoresis U Uracil

13

SECTION 1

The fundamental nature of DNA

15

Traditional approaches in fi sheries for RNA) and a Phosphate group. The three identifying populations that should be components are bound covalently and managed separately (i.e. management when joined are called a Nucleotide. units) have relied on documenting There are four kinds of nucleotide population life history traits including present in any DNA strand. Essentially, reproductive condition both temporally the sugar and phosphate form the and spatially, breeding and feeding backbone of the molecule and the sites, population specifi c behaviours, backbone is identical in all DNA and and movement patterns to infer simi- RNA molecules. The only potential larity or independence of gene pools. difference between any two DNA or While the results often are in accord RNA molecules are the sequences of with subsequent population genetic nitrogenous bases, so it is this sequence analyses of the same populations this that encodes the genetic traits in an may not always be the case (i.e. obser- organism. There are four bases in both vations of morphological similarity does DNA and RNA: Thymine (T), Guanine not necessarily mean individuals belong (G), Adenine (A) and Cytosine (C) in to the same reproductive unit or DNA with Uracil (U) replacing (T) in observations of mating do not neces- RNA. For a long time the idea that all sarily imply successful reproductive genetic diversity could be explained by input into the population). Molecular sequence variation in four nucleotides analyses (either direct or indirect) have was disputed because scientists the capacity to directly test if morpho- could not comprehend how the logical similarity corresponds with diversity observable in nature could be genetic similarity or breeding actually explained by variation in only 4 bases. results in genetic exchange. This is This was because biologists already because a large amount of essentially new that there are 20 common Amino ecological and life history information Acids (AAs) that are the building blocks is retained in the DNA and is expressed of cells present in living organisms and as variation in DNA sequences. So it was diffi cult to see how four bases the basis of using Population Genetic could encode the diversity of amino approaches for identifying manage- acids unless groups of bases were read ment units in fi sheries is to understand together. The discovery of the genetic the basic attributes of DNA, how it code whereby bases are read in groups changes (evolves) and the limitations of three bases (Codons) and then on storage of life history information in decoded into Amino Acids (AAs) solved DNA sequences. this problem.

1.1 Basic DNA structure Other important aspects of DNA (RNA) structure to consider include; the DNA is a polymer and a macromolecule. Base-Pairing Rule whereby because It consists of three building blocks, of their chemical structure and the Nitrogenous bases, a Pentose sugar physical structure of the DNA molecule (Deoxyribose in DNA and Ribose in A binds to T (U) and C binds to G

17 and the fact that the bases in a DNA placed into a test tube at the start of a molecule are held on the inside of PCR reaction are basically identical to the helix and joined by a hydrogen what is present in the nucleus of a cell bond. This allows for DNA replication, during DNA replication except we use a necessary attribute for reproduction an artifi cial short piece of DNA (called (both cellular and whole organism) a Primer) to specify the sequence of and thus for near faithful transmission DNA we wish to amplify. The other of genetic information from cell to components are the same; target cell and organism to organism. DNA chromosomal DNA, a catalytic enzyme replication is Semi-conservative (SCR) - DNA Polymerase, building blocks of that implies that when DNA replicates new DNA strands – free nucleotides and the two strands separate with each a buffer to stabilise the reactions. By old strand acting as a template for cycling the reaction repeatedly, millions the production of a new strand that of copies of the target sequence are should have the reciprocal sequence to generated so we can easily harvest it the strand that was used to generate from the limited copies of other DNA it because replication occurs according sequences. So DNA replication provides to the base-pair rule (A - T and G – C). us with a method for producing a This is important to recognise because specifi c target DNA sequence from a this attribute provides the basis for mix of all sequences in the cell. later proof-reading of new strands of DNA whereby the sequence along the 1.2 Where does variation in new strand can be proof read by special DNA sequences come from? enzymes to check to see if the correct base has been incorporated. Where an When we compare the same DNA incorrect base has been incorporated in sequence from two individuals we the new strand and this is detected by may detect a different base at the the repair enzyme relative to the old identical point along the sequence. strand, it can be corrected. If however, This difference is referred to as a a change occurs in both strands simul- Mutation or base-pair substitution. taneously then repair enzymes have no Mutations are the result of ‘rare’ reference point to correct the change. errors during DNA replication but are The mechanism of DNA replication a basic requirement for Evolution as that occurs naturally in all cells forms a process of change because without the basis of a very powerful technique mutation all DNA sequences would be that was developed in the late identical to the fi rst DNA sequence(s) 1970’s/early 1980’s called Polymerase that evolved originally. Potential for Chain Reaction (PCR). Essentially, PCR accumulating mutational change is an mimics what happens naturally in the attribute of DNA and RNA molecules cell during DNA replication (in vivo) in and is the basis for the differences we a test tube (in vitro). We will discuss this see in living and extinct organisms. in more detail later but to demonstrate Mutations can occur anywhere in the the similarity, all of the ingredients DNA (both in coding and non-coding

18 DNA) but where they occur in coding most relevant to analyses of population DNA they may produce changes in the structure are point substitutions e.g. AA sequence and be expressed as new GAG to GUG and deletions or insertions phenotypes. If the mutation is present (Indels) of bases in a sequence e.g. GAG in some individuals in the population to GAGG. and not in others then the differential expression of the two phenotypes in a Effects that mutations can have vary particular environment allows the envi- widely from no effect on the individual ronment to select the most appropriate to death and there are no simple rules form. The effect may be to change the that we can apply to say what the relative frequency of the two different likely impact of a particular type of forms of the gene in the population mutation is going to be. The impact over time. Mutation rates vary widely is determined by where they occur in among DNA sequences in an organism’s the genome and what changes they genome and relative mutation rate to produce. The simple fact is however a large extent is determined by what that because mutations are random, role (if any) a particular DNA sequence when they occur in coding sequences serves in the organism. So, the more they are likely to be deleterious (i.e. important the role of an individual produce poor outcomes), simply sequence is to the individual, the more because they are random changes to slowly the sequence is likely to accumu- DNA. Ultimately the environment is late mutations and therefore evolve. the key however, as to whether a new mutation in coding DNA will provide There are a number of different ways in better or poorer phenotypes. which DNA can be modifi ed by muta- tions, from simple base-pair substitu- Until the development of molecular tions involving individual nucleotides technologies for examining vari- to changes in whole blocks of DNA, ation in natural populations, the to loss or gain of a sequence or larger most common characters used to changes that could include loss or gain document variation were studies of of one or more individual external morphological phenotypes or even whole sets and while mutations in genes that (polyploidy). When they occur, their code for morphological phenotypes probability of long-term incorporation can produce different outcomes, (survival) depends on their impact on often these mutations do not survive Gene function. Non-coding DNA will because there can be strong functional tend to accumulate more mutations constraints on many morphological and so evolve faster than coding DNA traits. Thus morphological evolution because mutations in this type of DNA can be a relatively slow process and do not directly affect gene function. populations may diverge genetically As a general rule, the more important without any changes appearing in the gene function, the lower the rate their external morphology. Many of mutation. The types of mutations morphological traits are also polygenic,

19 meaning that they are the product of individuals, populations, species etc. the combined effect of a few or more to be constructed without the need to commonly many gene loci that may consider complications of factors like be expressed differently in different transient impacts of natural selection environments. This means that many or environmental effects on sequence systematic or population variation divergence. Thus neutral markers can studies based solely on examination provide more fundamental information of variation in external morphological about phylogenetic relationships than traits may underestimate the real can studies of morphology alone. extent of underlying genetic variation and hence population structure. Simply But molecular markers like morpho- put, population studies based simply on logical markers can also have associated morphology are unlikely to detect all of problems that we need to address if we the signifi cant population structuring are going to use them constructively. that may exist in a species. Molecular For example, -based markers systematic studies in contrast, are not such as allozymes, often show low limited in the same way. levels of polymorphism hence they may not be suitable for detecting genetic Molecular markers can provide a more differentiation of organisms having fundamental data set than morphology weak population structure such as for examining relationships among many marine organisms. Also, allozyme populations and higher taxonomic studies only detect a portion of the levels. One important difference is actual genetic variation because not that they are not complicated by any all nucleotide changes lead to amino potential effect of the environment acid changes and not all amino acid because they are fi xed at fertilisation. substitutions result in electrophoreti- If we target areas of DNA that do not cally detectable mobility differences. encode phenotypes (i.e. non-coding A major problem with DNA markers DNA), these markers are usually can be Homoplasy. Since there are only neutral in respect to potential effects four potential character states at any of Natural Selection (NS). Thus they point along a DNA sequence (A, T, C or should accumulate mutations at a G), eventually by chance mutations can constant rate determined by their change a single base many times, but locus specifi c mutation rates. What this we are only able to determine the base means effectively is that the absolute that is present at a particular location number of mutations between homolo- when we sequence that region (i.e. gous sequences in two individuals now), not what may have been there in provides an absolute estimate of the the past. If we compare two individuals time since they shared a common and fi nd that they both share the same ancestor after allowing for the locus base at a particular point along a DNA specifi c mutation rate. Where DNA sequence we interpret this as similarity sequences evolve neutrally, they allow due to common descent. If however, phylogenetic relationships between they share the same base due to

20 homoplasy we have no way of knowing number of mutations between any this. Thus, it is essential to choose DNA two individuals in each population can markers carefully. Appropriate DNA be used to calculate the evolutionary markers need to evolve fast enough time that has elapsed since they last so that populations or species show shared a common ancestor, once we differences, because without variation have calibrated for mutation rate. Put there is no basis for phylogenetic infer- another way, the absolute number of ence, but they must also evolve slowly mutational differences between the enough so that there is little chance two individuals with the most similar of character convergence (homoplasy) genotypes in each population is a where we will score similarity, incor- direct refl ection of how closely related rectly. For any DNA marker there will the two forms are. Once we have this be a point reached when homoplasy information we can correlate estimated will become an issue, so we should divergence times with past earth choose a DNA marker appropriate for geological or climate history events the time frame we are examining to that may have impacted on the evolu- reduce possible confounding effects of tion of the different forms. homoplasy. This point theoretically, is when suffi cient evolutionary time has An example of this is the evolution elapsed, given the mutation rate at the of the fl ightless ratite birds (Emus, locus, for all four character states to Ostriches, Rheas, etc.). Ratites are an have been expressed (A, T, C and G) at ancient order of birds that now are a single point with the fi nal outcome limited to a few relict species confi ned being a return to a character state to the southern continents (Australia, that previously occurred there. When Africa and South America, respectively). this point is reached, we are likely to Molecular analyses confi rm both underestimate the real divergence time the relationship between the three between two individuals with the same surviving families and the fact that genotype. they last shared a common ancestor in the Cretaceous. The simplest The recognition that DNA sequences explanation for their evolution is that evolve at a constant rate as a function the common ancestor evolved when of their locus specifi c mutation rates, the three continents were part of a implies that the same gene sequence in super continent (Gondwana) that also two populations should evolve at the included Antarctica. This giant land same rate. This is the basis for the idea mass that was fractured subsequently of the ‘Molecular Clock’. Essentially, due to tectonic plate movement that what this means is that assuming lead to the sequential rafting of the population sizes are similar and remain three continents northward carrying constant over time, individuals in their ancestral fl ightless ratites with different populations will accumulate them (fi rst Africa, then South America mutations at approximately the same and fi nally Australasia). The relatives rate and so the absolute minimum of the ancient ratites (Emus, Ostriches

21 and Rheas etc.) are now living evidence for both the existence of giant super continents in the past but also for how Molecular Systematics can help to tease out the processes that have infl uenced modern biodiversity. Vicariance due to the splitting of the Gondwanan land mass has also been invoked to partially explain the distribution of freshwater galaxids among southern hemisphere continents (Waters et al., 2000).

22 SECTION 2

Genetic variation in nature

Genetic variation is essential for of sexually-reproducing individuals evolutionary change in populations. accumulate more genetic variation than While we are well accustomed to will smaller populations. detecting variation in our own species, we are less perceptive at detecting fi ne Mutations are the ultimate source of scale variation in other species. But genetic variation and occur randomly the level of phenotypic variation we along DNA sequences. Mutation rates can detect in ourselves is also present vary widely among DNA sequences in most other species. The amount of (over 1000 fold among genetic loci). An variation in a population will infl uence optimal rate exists however, for each its relative rate of evolution. Thus large DNA sequence, so populations of a populations should contain higher species and closely related species are levels of genetic variation than small likely to share the same mutation rate populations. What this means, at least for each common DNA sequence. conceptually, is that populations with little or no genetic variation have little A question that has long interested potential to respond to environmental evolutionary biologists is ‘how much disturbance and therefore have a genetic variation actually exists in higher probability of going extinct natural populations?’ For a long time when environments change. As a (based on observations of external consequence most extant populations morphological variation in natural are variable. populations) biologists believed that genetic diversity in natural popula- The genetic variation present in natural tions was relatively low because most populations comes from three funda- con-specifi cs looked morphologically mental sources: Mutation, Genetic very similar. So, when biologists Recombination and Sexual Reproduc- quantifi ed variation in individuals in tion. All variation ultimately arises from a population for morphological traits mutation but this varation is mixed (e.g. colour variants), they tended to among individual chromosome strands regard it as unusual and not typical by genetic recombination, and mixed of most traits that they examined. among diploid individuals by sexual This led to a view that Evolution as a reproduction when individuals combine process in general was conservative and their gametes to produce a zygote. The hence slow, and they argued that this extent of this variation means that no could be explained by the fact that if two individuals in a population (except mutations were random, then changes for monozygotic twins or clones), will in genes were likely to produce poor be genetically identical. Since most (deleterious) outcomes in mutants individuals are genetically unique and and hence will be lost as a result of population size determines how much ‘purifying selection’. This view however, mutational variation can accumulate developed at a time when we did not in a population, larger populations

25 know about non-coding DNA and to directly infl uence allele frequencies had no molecular data on levels and at a locus however, the locus must be patterns of genetic variation. coding and produce different pheno- types. Neutral evolution in contrast, The modern view of how much genetic is where changes in allele frequencies variation exists in natural populations occur simply as a consequence of is quite different to the ‘Classical View” accumulation of neutral mutations at described above and resulted from a locus and their frequencies change the development and application of as a result of random genetic drift molecular analyses of genetic diversity (a function of population size) over in natural populations that commenced time. This idea (developed by Kimura in the late 1960’s. The early studies in the 1960’s into the ‘Neutral Theory exploited the development of the of Evolutionary Change’) argues that technique of Allozyme Electrophoresis genetic drift acting in populations fi rst developed for human disease of different size (i.e. chance) will diagnosis in the early 1960’s. This was determine the fate of most individual extended to molecular analyses of mutations at a locus over time. Kimura DNA markers in the late 1970’s and showed that both coding and non- early 1980’s. What these studies have coding DNA sequences in theory, can shown is that genetic diversity in most evolve by genetic drift alone and that DNA sequences is in fact much higher there is no absolute requirement for than had been predicted from earlier NS to infl uence gene frequencies at a morphological studies and so required locus for populations to diverge. Even a new explanation. This led to the idea though the effect of GD will be greater that the high genetic variation evident in small populations, over evolutionary in most DNA sequences could result time isolated populations are likely to from two evolutionary mechanisms; diverge simply as a consequence of GD. Natural Selection (NS) in the form of This process will occur regardless of balancing selection and the random whether NS is affecting the frequency accumulation of neutral mutations. of alleles at the locus or not (unless strong balancing selection, i.e. hetero- Evolution via NS (or Darwinian zygote advantage, is present). Under evolution) results from the difference this model the amount of genetic in relative fi tness of the possible diversity at a locus is largely determined phenotypes present in a population. by a balance between how many new Evolutionary biologists now recognise alleles are entering the population by a variety of different types of natural mutation and the number being lost by selection that can change gene genetic drift (the balance is referred to frequencies in a population including: as ‘mutation-drift equilibrium’). Since heterozygote advantage, effect of loss of alleles via genetic drift occurs patchy environments, frequency more rapidly in small populations, large dependent selection, epistatic interac- populations will be more genetically tions etc. to name but a few. For NS diverse simply because there are more

26 individuals in the population in which So is most genetic variation (read mutations can occur and fewer alleles – evolution) in nature determined will be lost by drift. largely by adaptive or neutral processes? The fi rst point to recognise Thus developed two contrasting is that different types of DNA are more hypotheses that attempted to explain likely to be affected by one or other genetic change in populations over mechanism. Non-coding DNA is likely time: evolution via NS or neutral to be infl uenced by neutral evolution evolution. Both attempt to explain because no phenotypes are expressed, why there is so much genetic variation so the only way that non-coding DNA present in most natural populations, can be affected by NS is if by chance it but do so from essentially opposing occurs in close proximity to a coding positions. Proponents of Darwinian sequence on a chromosome and evolution (i.e. evolution via natural genetic variation at the non-coding selection) argue that genetic variation sequence is infl uenced by selection results from accumulation of mutants acting at the adjacent coding locus, that produce phenotypes that have indirectly. This is called a ‘hitch-hiking’ higher fi tness than alternative effect. On the other hand, coding DNA phenotypes at the locus driven by a can be affected by NS and the more variety of different selective processes important is the functional role of a singularly or in concert. In contrast, coding locus, the more likely it is that proponents of the neutral model of NS has, is or will affect genetic variation evolutionary change argue that genetic at the locus over time. variation accumulates in populations simply due to the fact that most new While evolutionary biologists agree mutations entering a population are that genetic variation levels in nature selectively neutral i.e. they affect fi tness are generally high and large popula- very little if at all and so it is chance tions contain on average more genetic and population size that are the most diversity, the relative importance of NS important factors that will affect their and genetic drift in generating diversity long-term fate. Since modelling has remains a matter of ongoing debate. shown that both mechanisms in theory can change gene frequencies and there are practical examples in nature of both processes, this led to a debate as to which mechanism was responsible for the majority of observable genetic variation in nature. The so-called ‘Neutralist - Selectionist debate’ that is still to be fi nally resolved.

27 28 SECTION 3

Basic concepts in population genetics

Basic concepts in population genetics heritable, secondly what evolutionary are central to understanding the mechanism(s) may have caused the processes that infl uence development change and thirdly, how the evolu- of population structure in natural tionary mechanism(s) are having their populations. The science of population effect. In nature however, satisfying the genetics focuses on heredity in groups three requirements is often not easy of individuals and populations and aims and so we may have to be satisfi ed by to describe the genetic composition inference rather than direct evidence of populations and to document and for some of the requirements. understand the forces that change their genetic composition over time. Thus at If we believe that a phenotypic trait its heart, population genetics seeks to in a population may be evolving understand the process of evolution. and wish to test the hypothesis and attempt to understand the process, we The fundamental starting point to need a foundation (essentially a ‘null understanding population genetics is hypothesis’). This is simply because to recognise the relationship between we can never prove the hypothesis DNA and the phenotype. Encrypted in that evolutionary processes have the DNA of all organisms is the genetic changed the gene frequencies at the information necessary to encode locus (or loci) coding for the trait, phenotypes, but fi rst in eukaryotes but we can refute the null hypothesis it has to be transcribed into a carrier that evolutionary processes have not molecule (mRNA) and this molecule changed the gene frequencies. To set then moves to the cytoplasm of up this null hypothesis test we need to eukaryote cells where it can be trans- employ the Hardy-Weinberg Principle. lated into the encoded polypeptide Hardy and Weinberg were mathemati- chain at the . Essentially there cians in the 1930s that developed are three steps in the process; transcrip- the basic mathematical platform for tion, translation and modern population genetics. They were of which only gene expression can be interested in how gene frequencies infl uenced by external environmental can change in natural populations and factors. While mutations can affect any recognised that before this process can stage of the process, only mutations be explored there has to be an a priori in the DNA have the potential to be reason for focusing on a particular trait. passed among generations. Put simply, there is no reason to try to understand what forces are changing From a population genetic perspective, gene frequencies at a locus or how they Evolution can be defi ned as any change have their effect without fi rst having in phenotypic frequency in a popula- reasonable evidence that gene frequen- tion over time. To demonstrate that cies have in fact, changed! So they evolution has occurred in a population modelled the effects of gene frequency we need to satisfy three requirements; change in populations over time and fi rst that the trait in question is developed what is now referred to as

31 the ‘Hardy-Weinberg Equation’ (H/W). natural populations will fully satisfy all The Hardy-Weinberg Principle can be of these attributes but unless we have defi ned as; ‘in the absence of migra- evidence to the contrary we can assume tion, mutation and natural selection, that most large natural populations will gene frequencies and genotypic approach H/W status. Characteristics frequencies remain constant in a large, of populations at H/W equilibrium are randomly mating population’. that allele frequencies at autosomal loci will not change across generations, The Hardy-Weinberg equation genotype frequencies will also remain essentially states the null hypothesis of constant and if H/W equilibrium is gene frequency change, i.e. that if no disturbed, it can be re-established evolutionary mechanisms are affecting within one generation of random the frequencies of alleles at a locus, mating. then the frequencies should not change over time or among generations. Thus, By inference populations that do not if we have the necessary information satisfy H/W equilibrium must be expe- and data to test the hypothesis for riencing changes in gene frequency a locus of interest and we are able due to some evolutionary mechanism. to refute the null hypothesis that An example would be changes in the no change in gene frequencies has frequency of a recessive allele that occurred, then we are in a position to causes a genetic disease in humans justify searching for a mechanism(s) across generations. When expressed in that may be causing the change and the homozygous state (rr), individuals to attempt to understand the process. suffer from the genetic disorder and Hardy and Weinberg recognised may die before they can reproduce, however, that there are qualifi cations thus when this happens, the frequency on the attributes of populations in of the ‘r’ allele in the population will which their principle would hold. They decline. Assuming that no other factors defi ned this population as a ‘Mendelian affect survival of the mutant recessive population’ and recognised that it allele over time, we should expect that must have the following attributes; be this allele will eventually go extinct diploid, sexual, outbreeding, randomly because it is not favoured by natural mating and large. Populations that selection. An example in humans is the satisfy these conditions are considered mutant allele that causes haemophilia. to have reached H/W equilibrium and Although inheritance of this allele is this means that every reproductive complicated by the fact that the locus individual has an equal chance of is sex-linked and so is inherited in a mating, all genotypes at a locus have different manner in males and females. equal fi tness, each new generation is a random sample of the previous Where we have data on gene frequen- generation’s gametes and no new cies in a population at a locus we can alleles appear in the population. We test to see if the population conforms now recognise that not all (if any!) to H/W equilibrium by comparing

32 the distribution of genotypes against those expected if the population was The difference between the at equilibrium. To do this we use the observed and expected values H/W equation i.e. in the simplest case, can be tested for statistical if the trait in question is determined by signifi cance, using a χ2 test for a single genetic locus with two alleles goodness of fi t. then if we let the frequency of the (a) (Observed - Expected)2 allele equal p and the frequency of the χ2 = Σ alternative allele (b) equal q then: Expected

p + q = 1 in a simple χ2 test with the ∑(observed But in diploid organisms for most – expected)2 / expected (with degrees of nuclear genes we inherit two copies of freedom equal to number of genotypes each gene, one from each of parent, minus number of alleles) (Example 1). so Hardy and Weinberg realised that If after completing such an analysis their equation needed to take the the result is that the population does diploid condition into consideration not conform to H/W equilibrium then and to recognise that there are three we can look for information that can ways that an individual can carry a and help to identify the likely causative b alleles if they are diploid. To address agent (evolutionary mechanism) with this issue they expanded the equation the results of the test providing some to deal with diploid genotypes so that; insight into the possible cause. For p2 is the probability of receiving a copy example, an excess of heterozygotes of the a allele from both parents, 2pq is may indicate balancing selection (i.e. the probability of being a heterozygote heterozygote advantage), whereas a and q2 is the probability of receiving a defi ciency of heterozygotes may refl ect copy of the b allele from both parents disruptive selection or non-random then: (assortative) mating.

p2 + 2pq + q2 = 1 While Darwin was aware of only a single class of causative agent for which The equation can also be expanded to he coined the term ‘natural selection’, deal with cases where there are more modern evolutionary biologists recog- than 2 alleles at the locus, e.g. nise at least six different mechanisms p + q + r = 1. Once we have data for the that can cause populations to deviate observed allele frequencies constituting from H/W equilibrium (mutation, the genotypes in the population we can migration/gene fl ow, non-random then use the H/W equation to calculate mating, genetic drift, natural selection the expected genotypic frequencies if and ‘molecular drive’). Of these; natural the population was at equilibrium. The selection, genetic drift and migration/ expected frequencies of genotypes can gene fl ow are the mechanisms most be compared with observed frequencies commonly considered to be the most

33 important ones that can affect popula- The degrees of freedom (df) in a test tion structure, at least over shorter involving n classes are usually equal evolutionary time frames. to n-1. That is, if the total number of individual (6 in this example) is divided Example 1. Observed distribution among n classes (3 genotypic classes in and expected Hardy-Weinberg the example), then once the expected equilibrium distribution of genotypes numbers have been computed for n-1 can be summarised in the Table on the classes (1 in the example), the expected following page: number of the last class is set. Thus in the above example there is only one Genotypes degree of freedom in the analysis. AA AB BB Observed 321 Check the χ2 value of 0.37 at df = 1 in Expected 2.66 2.67 0.67 Table 1 we will have P-value > 0.05 and (O - E)2 0.12 0.45 0.11 therefore we accept the null hypothesis (O - E)2/E 0.04 0.17 0.16 of Hardy-Weinberg equilibrium in the population in our example. χ2 = 0.04 + 0.17 + 0.16 = 0.37

Table 1. Chi-square Probabilities.

Probabilities df 0.95 0.90 0.70 0.50 0.30 0.20 0.10 0.05 0.01 0.001 1 0.004 0.016 0.15 0.46 1.07 1.64 2.71 3.84 6.64 10.83 2 0.10 0.21 0.71 1.39 2.41 3.22 4.61 5.99 9.21 13.82 3 0.35 0.58 1.42 2.37 3.67 4.64 6.25 7.82 11.35 16.27 4 0.71 1.06 2.20 3.36 4.88 5.99 7.78 9.49 13.28 18.47 5 0.15 1.61 3.00 4.35 6.06 7.29 9.24 11.07 15.09 20.52 6 1.64 2.20 3.83 5.35 7.23 8.56 10.65 12.59 16.81 22.46 7 2.17 2.83 4.67 6.35 8.38 9.80 12.02 14.07 18.48 24.32 8 2.73 3.49 5.53 7.34 9.52 11.03 13.36 15.51 20.09 26.13 9 3.33 4.17 6.39 8.34 10.66 12.24 14.68 16.92 21.67 27.88 10 3.94 4.87 7.27 9.34 11.78 13.44 15.99 18.31 23.21 29.59 11 4.58 5.58 8.15 10.34 12.90 14.63 17.28 19.68 24.73 31.26 12 5.23 6.30 9.03 11.34 14.01 15.81 18.55 21.03 26.22 32.91 13 5.89 7.04 9.93 12.34 15.12 16.99 19.81 22.36 27.69 34.53 14 6.57 7.79 10.82 13.34 16.22 18.15 21.06 23.69 29.14 36.12 15 7.26 8.55 11.72 14.34 17.32 19.31 22.31 25.00 30.58 37.70 20 10.85 12.44 16.27 19.34 22.78 25.04 28.41 31.41 37.57 45.32 25 14.61 16.47 20.87 24.34 28.17 30.68 34.38 37.65 44.31 52.62 30 18.49 20.60 25.51 29.34 33.53 36.25 40.26 43.77 50.89 59.70 50 34.76 37.69 44.31 49.34 54.72 58.16 63.17 67.51 76.15 86.66 Accept at 0.05 level Reject

34 SECTION 4

Natural selection

Charles Darwin and a colleague, Alfred relative fi tness of different genotypes Wallace, established that evolution at a locus in a population where we could result from the effects of natural have data on the average number of selection changing the frequency of surviving offspring per genotype across genetically determined traits in nature. at least two generations. Relative Since this was the only mechanism fi tness varies from 0 to 1 because the that had been proposed to drive calculation is made in such a way that evolutionary change in nature from the best performing genotype of all the middle of the 19th century until possible ones at the locus is always the 1930’s, it has attracted considerable given a fi tness value of 1 and poorer interest from evolutionary biologists genotypes a value less than 1. A over time and continues to do so. genotype that does not produce any Simply put, NS acts on heritable surviving offspring across generations variation and is the relative ability of will have a relative fi tness of 0. individuals with different phenotypes to survive and pass on their genes to Sometimes different genotypes can their offspring. Where NS is affecting have equal fi tness in a particular allele frequencies at a locus, over time environment or multiple niches are individuals with superior phenotypes available for different genotypes and (and hence superior underlying geno- where this occurs, multiple phenotypes types) in a particular environment will may do well over time. This is called tend to have more surviving offspring a ‘balanced polymorphism’ and is and so their alleles will increase in one way evolutionary biologists who frequency in the population at the support the notion that NS is the most expense of individuals with poorer important evolutionary mechanism, performing phenotypes. Differences believe that NS can maintain high in reproductive output was termed levels of genetic diversity in natural ‘relative fi tness’ by Darwin and by populations. Balanced polymorphisms this he meant that individuals in the may evolve for a number of different population with high relative fi tness reasons including that across the would on average provide more natural distribution of the species surviving offspring to the next genera- there may be different habitat patches tion compared with another individual that favour different phenotypes (and with a poorer phenotype. The hence infl uence the frequency of their comparison is always ‘relative’ because underlying alleles). The ‘peppered it is population specifi c and is made moth’ is a classic example of this kind against the best-performing genotype of balanced polymorphism because in in a particular environment. This is polluted environments in Britain where an important point because the best it occurs, the dark morph of the moth performing genotype may not always is favoured because it is more cryptic be the same genotype if populations to predators than is the light form. of the species are found in different In contrast in pristine environments, environments. We can estimate the where there is little air pollution from

37 coal dust, the light coloured form is the fi tness of individuals may vary favoured because the resting place geographically across the natural moths use during the day (trunks of distribution of the species, while oak trees) are covered with lichens ecological variation in fi tness may occur that are white and light grey in colour where factors such as differences in that provide more protection to the fi tness associated with substrate type, light coloured morph than to the dark depth, canopy cover etc. can infl uence morph. Lichens are very sensitive to air fi tness. pollution particularly coal dust and so in polluted areas they do not thrive and Once evolutionary biologists had recog- so the trunks of oak trees are basically nised that selection can act in a variety dark brown to black, the natural colour of ways, attempts were made to model of the tree bark. An alternative way the impact of selection on traits with balanced polymorphisms may evolve, different modes of inheritance. These is where multiple niches are available models attempt to predict the outcome in the same place (environment). Shell of selection. The most important factor colour variation in English land snails in the models is the time to ‘fi xation’ or (Genus Cepaea) can be infl uenced by when one allele (the one favoured by this process. There are patches within NS) reaches 100% and allelic variation a single habitat type where different at the locus is lost. Time to fi xation will colour morphs may be more cryptic depend on the starting frequency of (e.g. certain patches favour banded the allele favoured by NS, differences in snails and other patches may favour relative fi tness among genotypes and un-banded snails, so overall both the the mode of inheritance of the locus. alleles for ‘banded’ and ‘un-banded’ For simplicity, most selection models remain in the population. Thus selec- assume that selection pressure remains tion can favour one allele in a single constant over time but in reality this place or multiple alleles in the same may not necessarily always be true. place so that a balanced polymorphism Most models are essentially H/W evolves for a particular trait. models that incorporate selection co- effi cients i.e. an estimate of how much Relative fi tness can vary temporally, advantage the favoured genotype has geographically and ecologically for a over alternative genotype(s) at the population. If one or more of these locus. There are four basic kinds of effects are evident then polymorphisms selection model; (a) selection against will be common at the locus. Temporal the recessive homozygote, (b) selection variation in fi tness is where fi tness that favours the heterozygote, (c) may be affected in different ways selection against a single allele at a at different life history stages (e.g. locus and (d) selection that acts against eggs vs fi ngerlings) or with different heterozygotes. A special case of type seasons within a single life history (a) is referred to as a recessive lethal stage. Geographical variation in fi tness where only the recessive homozygote is occurs where factors that determine affected and always dies pre-reproduc-

38 tion, so the relative fi tness of this dominant inheritance it can quickly genotype is 0. Cystic Fibrosis in humans be eliminated by natural selection used to be an example of this type as long as it does not provide better of mutation until modern medicine outcomes than pre-existing forms of devised ways to prolong the life of the gene in certain environments as some affected individuals. is the case for alleles that codes for ‘sickle-cell’ anaemia in humans. While Once we have data on the mode of individuals that express the sickle cell inheritance of the mutant we can use allele in either the homozygous or relative fi tness estimates incorporated heterozygous state have lower fi tness into a H/W Model to determine the in most environments where humans likely time to fi xation under different occur, where Plasmodium falciparum selection intensities and look at the malaria is a problem (i.e. many tropical effect of the change with different and sub-tropical environments) hetero- starting gene frequencies. While the zygotes that express sickle-cell anaemia outcomes can be very diverse, one have higher fi tness than homozygous obvious characteristic is that the time normal individuals because they have required to purge a recessive allele higher resistance to infection from that produces even extremely poor the malarial parasite. Mortality due to fi tness outcomes for a sufferer is much malaria is higher in these areas than longer (in terms of generation time) the lower reproductive capacity that is than for an equivalent allele that shows associated with sickle-cell anaemia so dominant inheritance. Equally it will essentially the environment (presence take a much longer time for a new or absence of malaria) changes the recessive mutant that provides higher fi tness of an allele from negative to relative fi tness than pre-existing allelic positive. forms of the gene to reach fi xation than an equivalent dominant favoured As discussed earlier we now recognise mutation. The simple explanation that NS can take many forms and affect for these phenomena relate to the individuals and hence populations in differences in the mode of inheritance a diversity of ways. The evolutionary and the fact that NS can only act effects of NS can often be very when a mutation is expressed as a complicated and even in opposition phenotype, so deleterious mutations when different types of NS act in can remain hidden in the population concert on a population at the one with no effect simply because an time. For example sexual selection individual requires two copies of the may be favouring alleles in males in a gene to express the phenotype. This completely different way to females is one reason why so many mutations (e.g. favouring more conspicuous males that cause ‘nasty’ genetic disorders that just happen to be more obvious can remain in the genomes of species to predators as well) while other for many thousands of generations. forms of NS (e.g. NS favouring cryptic In contrast if a ‘nasty’ mutation shows colouration to avoid predation is acting

39 on both sexes). What we measure is the cumulative effects of both types of NS on colouration patterns not the individual effects of each process. Esti- mating the relative effect of individual NS agents even when they are known can be very diffi cult, because to do this we need to know; how many different selective agents are affecting a trait at one time, what are their individual effects and how they interact. In most cases this is not possible, so we simply look at their cumulative impact on phenotypes over time.

It is obvious however, that NS can be a very powerful mechanism for evolutionary change in natural populations and in certain situations can infl uence how populations are structured in space and time. This is especially true when populations have been in the past or are currently isolated from other populations so that gene exchange is either restricted or completely disrupted for extended periods of time. When this occurs, isolated populations are likely to experience their local environments differently because conditions will not be identical and so local selective agents may produce unrelated changes in gene frequencies and so result in population divergence. Geographical speciation models argue that this is the simplest and most widely accepted way in which new species can evolve from ancestral types (isolation leading to populations experiencing local environ- ments differently and hence NS driving their divergence).

40 SECTION 5

Genetic drift

The process of random genetic drift In a population of N diploid individuals is a powerful evolutionary force and there are 2N alleles at any particular is central to our understanding of locus. Therefore, if a new mutation population genetics. Random genetic arises, it will start in the population drift (GD) refers to the random fl uctua- with a frequency of 1/2N. This is also tions of allele frequencies from one roughly the probability of the new generation to the next. Sometimes mutation being passed on to the it is referred to as a sampling error next generation. It is also the prob- of gametes between generations. In ability of the new allele increasing a randomly mating population, the in frequency to fi xation. From this expectation that two particular alleles simple relationship it can be seen that coming together at fertilisation is a the eroding force of genetic drift on function of the relative frequencies genetic variability is purely a function of each allele in that population and of population size (N). The greater therefore should conform to Hardy- the population size, the smaller the Weinberg expectations. In essence, effect. Similarly in small populations, these expectations are rarely realised a new mutation will initially exist at a due to stochastic events that may affect relatively high frequency (relative to a random mating, for example, unequal large population of the same species) offspring numbers from individual thereby having a greater chance of females. being passed onto the next generation. Effects of small population sizes on Because of the random nature of genetic drifts will be further explained genetic drift, it is impossible to predict later. absolutely the fate of a particular allele. The effects of genetic drift on Kimura and Ohta (1971) calculated that an allele over time however, will be to it would take 4N generations for the either increase or decrease in frequency frequency of a newly mutated allele to in the population. Given suffi cient reach fi xation in a population. That is, time, the allele in question will if p is the frequency of allele a: increase in frequency until it reaches fi xation or alternatively decrease until Time to fi xation of allele a (i.e. p=1): it becomes extinct. Either way, the locus in question is heading towards T1(p)=4N a homozygous state. So even if we are unable to forecast the outcome of Conversely, the time it takes for allele a an individual allele, we can state that to be lost from the population is: the overall effect of drift is to reduce genetic variation (push polymorphisms Time to loss of allele a (i.e. p=0): towards homozygosity).

T0(p)=2ln(2N)

43 With the ratio between them: In the absence of selection and 2N/[ln(2N)] assuming that each mutation results in a unique allele, the It can be seen therefore, that in a level of genetic variation (hetero- population of 500 individuals, it takes zygosity) can be considered as approximately 145 times longer for the a balance between the force of allele to go to fi xation than it does for drift (that erodes variability) and it to be lost from the population. It is mutation (generating variability). easy to see therefore, that the majority Kimura and Crow (1964) defi ned of new genetic variants will be more the equilibrium of heterozygosity likely to go extinct than to become as: established in the population H = 4Nµ/(4Nµ + 1) Given enough time, every locus will become homozygous as a single allele where N is the population size will have drifted to fi xation at each and µ is the mutation rate. It gene locus within the population can be seen from this equation (i.e. a total lack of genetic variation). that if either population size or Because of the potentially huge mutation rate is low then hetero- timescale required for an allele to reach zygosity will also be low. This is fi xation, however, this rarely (if ever) intuitive as small population size occurs. During the time it takes for an results in elevated drift thereby allele to head towards fi xation, new reducing genetic variation. mutations are continuing to arise that A general rule regarding the are subject to the same pressure of relationship between these two drift (with their own respective prob- opposing processes is that if 4Nµ abilities of increasing in frequency). is much larger than one, then Hence, a population is in a constant mutation is the dominant process heterozygous state for many loci with and heterozygosity is high. If 4Nµ the persistence of the polymorphism is much lower than one, then dependent on population size. drift is the dominant force and heterozygosity will be low. Natural populations also rarely remain stable in size over time. Many populations at some stage experience pass their alleles onto the following a sudden crash in numbers (usually due generation and what genetic diversity to some extrinsic disturbance such as an does survive is subject to a greatly outbreak of disease). A rapid decline elevated pressure of drift. Ultimately, in size is referred to as a ‘Population the degree to which genetic variation Bottleneck’ that has a two-fold effect is lost is a function of two factors: i) the on genetic variability. Firstly, only a magnitude of the bottleneck (i.e. how relative few individuals manage to few individuals survive to reproduce)

44 and ii) the duration of the bottleneck drifting to fi xation of the same allele. (i.e. how long the population remains If there were fi ve alleles present, then at a low number). It can be argued that the probability would drop to a 20% the duration of the bottleneck has a chance. Also the probability of two greater impact with respect to the loss populations going to fi xation for the of genetic variability. That is, if a popu- same allele would be very low when lation that has undergone a bottleneck considered across many loci. Therefore, can recover numbers rapidly, then the the net effect of drift is to cause loss of variation will be attenuated. populations to differentiate (diverge genetically). Another form of population bottleneck is when a few individuals colonise an We have seen the relationship between environment previously unoccupied by mutation and genetic drift and how the species. This is known as a ‘Founder their interaction determines genetic Event’ with the new population being variation. These predictions are only subject to the forces of drift in the valid under the neutral theory of same way as seen with a bottleneck. evolution. Although many (most) The founding population will lose point mutations are selectively neutral, genetic variability much faster than will particular mutations may bestow a the parent population. signifi cant fi tness advantage on the individual. In this case, new mutations So far we have discussed the fate of may increase in frequency at a much genetic variation due to the forces of higher rate than may be expected drift within single populations. Genetic under neutral theory. The question is, drift also plays an important role in which evolutionary force is stronger? leading isolated populations to become Balancing selection will tend to keep genetically differentiated. This concept multiple alleles in relatively high is central to population genetic theory. frequencies at the locus under selection Because the process of drift is random, thereby maintaining high heterozy- alternative alleles within different gosity. This opposes the effects of drift populations will increase (or decrease) that reduces variability. The relationship in frequency. Eventually, populations between these two evolutionary forces will become fi xed at particular loci for is once again a function of population different alleles (i.e. total differentia- size. The smaller the population the tion). It should be recognised however, greater will be the probability that the that due to the random nature of drift, effects of drift will outweigh the effects two populations could also become of selection. The general rule is that if fi xed for the same allele by chance. The 4Ns (where s is the selection coeffi cient) probability of this occurring reduces is much less than one, then drift is the rapidly as the number of alleles at the most important process determining locus increases. For example, for a locus variability. If 4Ns is much greater than where there are only two alleles there one, then selection is likely to be the is a 50% chance of two populations dominant force. It should be noted

45 however, that other forms of natural cytoplasmic genomes (e.g. mitochon- selection (e.g. directional selection) can drial or chloroplast DNA) will exist also lead to reduced heterozygosity. at lower Ne than for nuclear genes. For example, mtDNA is maternally It is clear from the preceding discussion inherited (compared to bi-parental that the force of drift in infl uencing inheritance for nDNA) and is a haploid genetic variability depends on molecule (compared to diploidy of population size (N). Given this fact, nDNA), therefore half the number of it is probably important to briefl y parents times half the ploidy results in explore the value N. When we think of a four-fold reduction in Ne. Therefore, a population size, we merely see it as the effects of drift will be four times the number of individuals present at a greater on mtDNA genes than on location at a particular time. In terms nDNA genes in the same population. of population genetics however, this This concept and its implications for value can be misleading. The important assessing population structure will be point is that we are interested in the developed later. probability of alleles being transferred successfully to subsequent generations. Therefore we are only interested in the number of individuals that contribute their genes to the next generation (i.e. Effective population size can also individuals that breed successfully). be infl uenced by other factors These individuals constitute what is such as unequal sex ratios or called the ‘Effective Population Size’ particular breeding behaviours

(Ne). In nearly all cases, the effective where one or a few males breed population size is signifi cantly smaller with many females. The simple than the census population size. So formula for calculating Ne in even a population that appears to be these cases is: very large may in fact have a relatively small Ne and therefore be subject Ne = (4NmNf)/(Nm + Nf) to an elevated pressure of drift. The concept of Ne is particularly signifi cant where Nm is the number of to conservation genetics. That is, how breeding males and Nf is the small can the effective population be number of breeding females. until levels of inbreeding reduce the Ne is affected more (in terms overall fi tness of the population? of reduced numbers) by the rarer sex. This is because they Another important concept regarding constitute less than half of the

Ne is that it will vary depending on breeding population (sometimes which gene locus we look at. All auto- signifi cantly so) yet they still somal genes in the nuclear genome will contribute 50% of the gametes to follow the rule listed above. However, the next generation. genes on the Y chromosome or in

46 SECTION 6

Non-random mating and population structure

A Gene Pool is the collection of geno- clones. This situation is relatively rare in types present in all individuals that nature however, because self-fertilising constitute a reproducing population, so species have little genetic variation and essentially it comprises all individuals hence lose most of the advantages of who potentially could exchange genes. diversity. Even species that are capable Sometimes a gene pool is connected of self-fertilisation may not neces- directly i.e. reproducing individuals sarily engage in it (e.g. some mollusc can meet and exchange genes directly species). More common in nature is or exchange may be indirect via the situation where organisms within intermediates because individuals a gene pool practice some level of choose not to move large distances or inbreeding (i.e. non-random mating). individual dispersal distances are not The consequences of this can be that large enough to allow contact with populations will be structured spatially all members of the gene pool. While and/or temporally. one assumption of the H/W theorem is that individuals within a gene pool Inbreeding may result from both mate at random, this is seldom if ever intrinsic (e.g. behavioural traits) and/or the case in nature both for intrinsic extrinsic factors (e.g. physical barriers and extrinsic reasons. Thus individuals to dispersal). If individuals mate assor- that belong to a discrete gene pool tatively, that is they either choose other are often distributed as ‘demes’, local individuals as mates that are pheno- populations, subpopulations or popula- typically similar to themselves (positive tions and share more genes in common assortative mating) or individuals that with members of their own sub-group are phenotypically different to them- than with the rest of the gene pool. selves (negative assortative mating), When populations become subdivided this can affect the level of inbreeding in by limitations on dispersal, the popula- the population. An example of positive tion will inevitably become subdivided assortative mating may be the fact that as complete interbreeding may not in human populations, individuals more be possible, so mating will not be at often than at random select individuals random. This results in genes being of the opposite sex of similar height, structured spatially across the natural while examples of negative assortative distribution of the gene pool. mating include mate choice in mice and self incompatibility factors in A number of different types of non- some plants. Female mice have been random mating have been recognised shown to select males with different by evolutionary biologists. One form odours to themselves as mates when is ‘inbreeding’ where individuals share the opportunity exists. Odour in mice more genes by common descent than is in part, determined by Major Histo- would be expected by chance. Different compatability Complex genes (MHC) levels of inbreeding exist from one that provide a major component of the extreme of self-fertilisation where the bodies defence system against disease, population is essentially an assembly of parasites and pathogens. It is thought

49 Once modelers had determined that inbreeding increases homozygosity they worked out that where genetic data were available, this effect could be used to estimate the level of inbreeding that was occurring in a population essentially by comparing the observed heterozygosity against that predicted under H/W equilibrium given the observed allele frequencies. Thus the probability that two alleles are inbred is given by the inbreeding coeffi cient (F), where F is the probability that two alleles in an individual are identical by descent. F or the inbreeding coeffi cient varies from 0 where the population is completely outbred to 1 when the population is completely inbred so the population will consist of only AA and aa homozygotes for a two allele system. We can estimate the relative level of inbreeding in a population using:

1 F = Σ[( )n1+n2 +1 + (1+ F )] X 2 A Where:

• FX is the inbreeding coeffi cient of the individual in question

• FA is the inbreeding coeffi cient of the common ancestor, and

• n1 and n2 are the number of generations from the sire and the dam to the common ancestor, respectively.

The statistics of inbreeding were developed by Sewall Wright in the 1922 and later who modelled the effect of various processes on gene frequencies in natural populations and related this to what was expected under H/W equilibrium. The result is that the modern statistics of inbreeding take his name i.e. Wright’s (F) statistics and a variety of versions are available for analysis with genetic markers that possess different modes of inheritance and in theory mutate in different ways. The general equation is:

(1 – FIS)(1 – FST) = (1 – FIT)

Where:

• FIT is the correlation of uniting gametes relative to gametes drawn at random from the entire population

• FIS is the correlation of uniting gametes relative to gametes drawn at random from within a population and,

50 • FST is the correlation of uniting gametes within subpopulations relative to gametes drawn at random from the entire population.

The statistic of real interest in studies of population structuring is FST because in essence it measures the extent to which the populations under examina- tion are subdivided, or put another way, how much gene fl ow is occurring among subpopulations.

FST varies between 0 and 1 where an FST of 0 implies that the populations under examination have the same set of alleles in identical frequencies

and an FST of 1 implies that the populations share no alleles in common. In practice FST among populations is rarely larger than 0.5 and is often much less. Wright proposed for a simple two allelic system at a locus where FST > 0.25 constitutes very great differentiation and within the range 0.15 to 0.25

this constitutes ‘moderate differentiation’. The actual interpretation of FST is more complex. An example is that recently it has been shown that for hyper- variable genetic markers (e.g. microsatellites) that often possess many alleles

per locus, FST estimates among populations may be considerably lower than for traditional markers with fewer alleles per locus (e.g. allozymes).

that choice of a male with different is called ‘Gene Flow’ and is another odour type by female mice increases method apart from mutation by which the probability that their offspring new genes can enter a population. will be more heterozygous at MHC Gene fl ow is a very powerful force loci and this attribute may increase for homogenising gene frequencies overall fi tness of the offspring. Both among demes or populations and inbreeding and positive assortative the more gene fl ow that occurs the mating will increase homozygosity lower will be the level of inbreeding. while negative assortative mating will Essentially, gene fl ow is a force that increase heterozygosity above that opposes development of population predicted under the H/W model. differentiation and hence population sub-structuring. As gene fl ow increases As discussed earlier, a major factor it should also increase heterozygosity that keeps populations evolving as a in the receiving population. This effect unit is when they are connected by results from crosses among individuals ongoing dispersal. In theory, the more from different populations that did not effective dispersal that occurs (where have identical gene frequencies at all individuals move among populations loci at the start of the process. So gene and reproduce in the new site), the fl ow and inbreeding are essentially more similar populations should opposing forces that largely determine be, genetically. Effective dispersal the extent of population structure that

51 will evolve among demes. If gene fl ow dispersal is possible either directly or is high among subpopulations, then indirectly via generational connections, population structuring will be low individuals disperse more commonly at because inbreeding is reduced. If gene a relatively local scale so that subpopu- fl ow is low among subpopulations, lation differentiation is greatest at the then population structuring will be largest spatial scale. ‘Stepping Stone high because inbreeding will increase. Models’ are mathematically more complex and describe situations where Once the relationship between dispersal is only possible between inbreeding and gene fl ow was under- adjacent populations and the greater stood, interest focused on the diversity the geographic distance between of potential patterns of population populations the less chance there is of structure that could result in nature. gene fl ow, so there is genetic isolation So migration models were devised to by distance. In this case the relationship describe patterns of population subdivi- between FST and gene fl ow is: sion that were possible. Essentially 1/2 because they are population genetic FST = 1/(1 + 4Nm)(2Nµ/Nm) models, they describe the relative contribution that migrants make to Notice that the stepping stone model demes that they enter (i.e. the extent approaches the island model when of effective dispersal). populations become very large. Also, the stepping stone model is a function The simplest migration model is an of not only gene fl ow but the mutation ‘Island Model’ where subpopulations rate (µ) as well. of equal size over a geographical area interact in such a way so that they can An example could be catadromous exchange genes with equal probability. fi sh species that spend much of their An example could be subpopulations of life cycles in freshwater, but can have a fi sh species confi ned to a large lake. limited dispersal via the marine envi-

The relationship between FST and gene ronment and hence reach neighbouring fl ow (Nm) for the island model is: rivers. Complexity of stepping stone models can be increased by spatial and

FST = 1/(1 + 4Nm) temporal effects of the environment and this will have consequential A second kind of model is ‘Isolation effects on the relative complexity of by Distance’ where relative gene fl ow the mathematical equations used to among subpopulations of one large describe the relationships. population is affected by distance among subpopulations and/or possible The models discussed above attempt alternative paths by which individuals to describe the patterns of population can disperse. An example could be structure that can exist in specifi c populations of a species of fi sh that situations in nature. All rely on the occur widely across an ocean. While association between gene fl ow and

52 As biologists began to apply migration models to aquatic species they quickly realised that in some instances (e.g. riverine freshwater systems) that the existing models were not adequate to explain all possible limitations on gene fl ow. Riverine systems are unique in that they can impose a hierarchical structure on potential for gene fl ow on species that are obligate in that environment (e.g. some freshwater invertebrates and fi shes). This lead Meffe and Vrijenhoek (1988) to develop a specifi c model to address this situation, a model they called the ‘Stream Hierarchy Model’ (SHM). What this model attempts to describe is the fact that rivers and streams are essentially dendritic spatial systems for the organisms that are obligate users of them. Consequently, their patterns of genetic diversity should refl ect the dendritic nature of the habitat with genetic diversity is likely to increase down the system because of water fl ow effects on relative dispersal and gene fl ow structured hierarchically. Thus gene fl ow is structured according to the following hierarchy, within stream > among streams > among drainages so that:

HT = HC + DCR + DRS + DST

Where:

• HC = within population diversity

• DCR = differences among populations in a river

• DRS = differences among rivers in a drainage

• DST = differences among drainages

With the expectation that: DCR < DRS < DST

inbreeding, i.e. as gene fl ow increases, population or the migration rate. Nm level of inbreeding should decline. This is based on an Island model of popula- means that when we have data on how tion structure and estimates recurrent differentiated two or more populations gene fl ow among subpopulations and are from each other, in theory this tells is equivalent to ‘the probability that us how much gene fl ow is occurring an allele randomly chosen from the

(or has occurred historically between population comes from a migrant’. Nm them). This information is used to calcu- can be diffi cult to measure in nature, late Nm, a statistic that equates to the but if we know allele frequencies in number of migrants moving between both donor and recipient populations

53 before gene fl ow and the change in allele frequency in the recipient population after gene fl ow, then we can estimate Nm. This is because the change in allele frequency over time following gene fl ow is proportional to the difference in frequencies between donor and recipient populations. The outcome of modelling of the effect of different levels of gene fl ow among populations has shown quite clearly that even very limited gene fl ow is suffi cient to keep populations essentially, genetically homogenous. As little as a single migrant per generation is suffi cient in theory, to homogenise gene frequencies among populations. So only very limited dispersal is capable of restricting divergence that results from local selection and genetic drift effects.

54 SECTION 7

Environmental infl uences on population processes

In the previous sections we have Change in the physical environment discussed the genetic processes that that has affected levels of gene fl ow operate at the population level that among populations on an evolutionary principally determine population time scale has been immense. For structure (i.e. mutation, genetic example, the continents that we drift, gene fl ow and selection). These know today once were part of super- processes however, must operate within continents (Pangaea, Gondwana). a framework shaped by the environ- As the continents drifted apart (via ment (extrinsic factors) and the ecology plate tectonics) gene fl ow ceased, and life history traits of the species leaving populations isolated from each (intrinsic factors). In fact it is rather other (unless they were very good at meaningless to interpret population swimming or fl ying). The separation genetic data (especially for manage- of the super-continents happened so ment purposes) in isolation without long ago that most species affected by taking intrinsic and extrinsic factors it have since gone extinct or isolated into account. In this section we will populations have evolved into different look at the effect that the environment species. There are however, still several can play in shaping genetic variation in closely related taxa that share a Gond- natural populations, with the emphasis wanan distribution (e.g. marsupials, on freshwater systems. Firstly we can ratite birds, lungfi sh). Because the disregard mutation, as the effect of the separation occurred so long ago, it environment on the mutation process bears little application to intraspecifi c largely results in somatic mutations level processes. which are rarely heritable (e.g. solar radiation causing skin cancer). On a more recent evolutionary time scale however, many events have The environment can either promote or shaped the population structure of inhibit gene fl ow among populations extant species particularly during and as such a heterogeneous environ- the Pleistocene. Many populations ment (as is usually the case) will result became isolated due to the expansion in varying levels of population connec- of ice sheets during the most recent tivity. An important consideration is ice age. In fact many populations still that the environment or habitat of a bear the genetic signatures of these species is rarely stable over time and vicariant events in North America and therefore its impact on shaping popula- Europe even though levels of gene tion structure will consist of a historical fl ow are signifi cantly higher today and a contemporary component. That than 10,000 years ago. Mountain is, how the environment is affecting uplift due to tectonics or volcanism structure today (on a ecological time (geomorpholoical change) also has scale), and how it affected population resulted in much habitat fragmentation structure in the past (on an evolu- leading to population differentiation tionary time scale). and many of these populations still remain isolated today. The rise and fall

57 of sea levels (eustasy) also connected the Pleistocene, low sea levels resulted and isolated landmasses and hence in freshwater connection between populations repeatedly. For example Australia and New Guinea via ‘Lake much of the terrestrial fauna shared Carpentaria’. Several freshwater species among the Indonesian islands and (e.g. gudgeons, rainbowfi sh, freshwater between Australia and New Guinea prawns) still have a distribution that can be explained through this process. refl ect this history. Another effect Over this sort of time scale there of eustatic change has been on river was also signifi cant fl uctuations in systems that are currently isolated by temperature which played a signifi cant the marine environment but historically role in shaping genetic variation in had a freshwater confl uence at times populations. Changes in temperature of low sea level. This phenomenon generally led to reduced habitat avail- also explains the distribution of ability with intervening regions often genetic variation of a southeast inhospitable to dispersal. Asian freshwater catfi sh (Hemibagrus nemurus) among currently isolated All of the historical environmental river drainages. During the Pleistocene, fl uctuations mentioned above have had low sea levels resulted in freshwater signifi cant impacts on the population confl uences on the Sunda Shelf that structure of freshwater fauna through facilitated interdrainage gene fl ow the modifi cation of dispersal pathways. (Dodson et al. 1995). One of the most signifi cant effects resulted from geomorphological Sometimes climate has changed so change through the rearrangement rapidly (e.g. temperature), that species of drainage channels (e.g. river fail to evolve in situ and are forced capture). Under this scenario where to move to more suitable habitat. a stream fl owing to one river system This movement may take the form is ‘captured’ and begins fl owing in of latitudinal or altitudinal shifts. another direction, populations that had For example, freshwater crayfi sh in been connected through a high level Australia (Euastacus sp.) historically had of gene fl ow previously, became totally a widespread lowland distribution. As isolated while populations that may temperatures began to increase in the have been isolated started exchanging Miocene, they were forced to retreat genes. The geomorphological evolution further and further up mountains of drainage channels is generally seen where cool moist conditions still as the primary factor that infl uences remained. Eventually, connectivity the distribution of most obligate among mountain top populations was freshwater species. cut as the intervening lowlands became uninhabitable for crayfi sh. Sea level fl uctuations have also have a signifi cant infl uence on shaping On an ecological time scale, there population structure of freshwater are also many factors that can affect fauna. For example, towards the end of levels of gene fl ow, either promoting

58 or restricting it. Firstly, it must be disturbances (e.g. tropical cyclones, recognised that due to the nature of earthquakes, volcanic eruptions) that river systems, freshwater populations alter the landscape will probably affect are expected to be highly structured the qualities necessary for continued especially among drainages. The connectivity. For example, volcanic terrestrial environment and the marine eruptions in New Zealand some 2,000 habitat that separate rivers, inher- years ago with associated larva fl ows ently dictate that gene fl ow will be and ash deposits, resulted in small highly restricted. Climatic fl uctuations isolated populations of freshwater fi sh however, can overcome these barriers species with little or no potential for to dispersal. High rainfall can result in gene fl ow among them. freshwater plumes around the mouths of rivers (e.g. the freshwater plume at Other natural instream barriers to the mouth of the Amazon River some- dispersal include waterfalls, rapids times extends hundreds of kilometres and cascades. It is not uncommon into the Atlantic Ocean). Depending for upland populations to be totally on the scale of the plume and the isolated from downstream populations proximity of the neighbouring river that are divided by a signifi cant and mouths, connectivity among normally rapid change in stream profi le. Stream isolated rivers may exist and for a fl ow itself dictates that gene fl ow short period of time a small degree downstream is going to be signifi cantly of dispersal may result. Also, fl ooding greater than in an upstream direction, caused by high rainfall can lead to a unless species have evolved dispersal high degree of connectivity amongst mechanisms to counteract this effect normally isolated drainages resulting in (e.g. positive rheotaxis). Another massive interdrainage dispersal events, important barrier to dispersal in some especially in areas of low elevation (e.g. freshwater systems is just physical inland eastern Australia). distance. In extensive drainages such as the Mekong River, it is not physically Within a single drainage there also possible for an individual to traverse exist several natural barriers to gene the entire distance of the river in a fl ow, some of which are infl uenced single lifetime. climatically. For example, headwater streams that are continuous during the As can be seen from this discussion, wet season may be transformed into a there are many environmental factors series of isolated waterholes during the that can infl uence gene fl ow (either dry season. Also dispersal vectors such promoting or restricting) that operate as water currents may change season- over various temporal scales. For ally (e.g. Tonle Sap River, Cambodia). management purposes, an important These processes can affect gene fl ow. goal is to understand the magnitude Generally, most species have evolved of gene fl ow that is occurring today. to cope with seasonal environmental As such, one of the challenges of fl uctuations but random catastrophic population genetics is to be able to

59 differentiate the effects of historical divergent in isolation (mtDNA is a versus contemporary gene fl ow on particularly powerful marker for this the observed population structure. application). Early population genetic studies based on allozymes were largely unable to We know that the population process accomplish this. Allozymes (and to of random genetic drift is largely a certain extent mtDNA haplotypic a function of population size – the frequency data) can distinguish smaller the population the greater between high gene fl ow and total will be the effect of drift. It is also isolation, but the interpretation of commonly known that populations situations in between these extremes naturally fl uctuate in size over time can only ever be an educated guess. with much of the fl uctuation a result The development of more sensitive of environmental infl uences. Once techniques (i.e. DNA sequencing, again, these environmental fl uctuations microsatellites) has provided tools that have a historical and a contemporary allow us to determine more confi dently component. For example, during the the relative contributions of both Pleistocene much of the freshwater historical and contemporary processes habitat in the northern hemisphere to gene fl ow. was locked up as ice and what suitable habitat was left tended to fragment Finally, humans in recent times have large populations, sending many had a substantial impact on gene fl ow subpopulations extinct. The surviving in freshwater systems. Over the past individuals existed as small populations few hundred years, anthropogenic in small habitat refugia. During this modifi cations to natural water courses time much genetic variation would have been signifi cant. Mostly these have been lost. Subsequent climatic modifi cations such as dams, pollution warming re-opened much habitat and stream channel alteration have allowing the small populations to resulted in dispersal being restricted rapidly increase their range and expand further. Because most anthropogenic into areas previously unavailable to disturbance has occurred relatively them with an associated increase in recently, any cessation of gene fl ow population size. is unlikely to be detected in the data from a population genetics survey On an ecological time scale, natural (although some population structuring seasonal shifts result in fl uctuations has been recorded either side of some of available habitat (as seen in the dams in the United States that have previous section). When habitat is only been in existence for 50 years). On reduced such as in the dry season, the other hand, human mediated gene population size generally decreases. fl ow via interbasin transfers of water or Similarly seasonal fl uctuation can direct translocation of species among affect resource availability. If periods drainages can be detected if the newly of poor habitat and low resources mixed populations were genetically coincide, local populations may crash

60 and possibly become extinct. Because differential selection pressures due to seasonal fl uctuations are short lived, varying environments play a major role it is expected that the level of genetic in infl uencing genetic differentiation variation in the population will be among populations. determined by the duration of poor conditions and hence the effects of On the other hand, selection may act to drift when the population is at its homogenize allele frequencies among smallest size. If only a few individuals populations, even in the absence of make it through the bottleneck, gene fl ow. If the local environments regardless of the rate of recovery, of populations are similar, alleles may genetic variation will have been lost be under similar selection pressure (i.e. and recovery can take a long time. the same alleles are favoured in both populations) thereby creating a popula- While most species are well adapted tion structure that would be expected to their environments and have life under a model of high gene fl ow. This history traits that are well suited to highlights the necessity for choosing seasonal environmental fl uctuations, neutral markers for population studies catastrophic events can have a devas- that are capable of revealing any tating effect on population numbers population structure that may be and even result in local extinctions. present. It may take many generations before population size recovers and many more before genetic variation reaches pre-disturbance levels. The amount of genetic variation a population can maintain over time is determined by the population size at its lowest level, not at the highest.

When new mutations arise, they are either benefi cial, deleterious or neutral. Their relative fi tness is purely a function of the environment. Much genetic variation may exist in a population that is essentially selectively neutral. A sudden change in environmental conditions however, may result in a particular genetic variant having a signifi cant selective advantage (or disadvantage). This will lead to strong directional selection that will inherently reduce genetic variation. In association with genetic drift, localized

61 62 SECTION 8

Ecological infl uences on population processes

It is diffi cult to discuss ecological species that have freshwater larvae, infl uences on population processes fl y upstream as adults to breed. Some without incorporating environmental species, rather than compensating for factors because a species’ life history down stream dispersal, have evolved traits (LHT) will adjust over time to local physiological or behavioural traits environmental conditions. However, that assist them to avoid displacement certain LHTs will inherently infl uence in the fi rst place. Most freshwater the effects of gene fl ow and genetic crustaceans have an abbreviated larval drift. phase thereby reducing the time in the plankton. Some species are Gene fl ow can be achieved by individ- dorso-ventrally fl attened which makes uals at all life history stages (i.e. from them less ‘visible’ to the water current. fertilized eggs through to adult) or as Others still have the ability to adhere gametes (eggs or sperm). Most species to the substrate or some species glue have evolved a dispersal phase in their their eggs to the substrate. Behavioural life history in order to avoid inbreeding adaptations include brooding of eggs and competition with close relatives. or larvae, remaining at the edge of Dispersal can either be of a passive or the stream (where the current has less active nature. Passive dispersal is usually velocity), hiding under large immov- undertaken as gametes or as planktonic able objects (rocks, snags, etc.) and larvae, but exceptions do exist (e.g. burrowing into the substrate. some adult spiders disperse large distances in the wind by producing Irrespective of compensatory, physi- ‘silk parachutes’). Passive dispersal has ological or behavioural adaptations, advantages because minimum energy the majority of gene fl ow in freshwater is required, however a dispersal vector systems is in a downstream direction. is required (e.g. a water current). The Therefore downstream populations disadvantage of this form of dispersal tend to act as ‘sinks’ for genetic varia- is that the individual may end up in tion and should display higher levels unsuitable habitat. of diversity that populations further upstream. Furthermore, confl uence In most river systems, passive dispersal sites should have a mixture of all is always in a downstream direction. alleles present in the river branches This presents a problem – how do that lead to them. This effect is upstream reaches of a stream remain further accentuated if there is a colonized? Many freshwater taxa with barrier to dispersal such as a waterfall a passive dispersal stage also have that signifi cantly restricts upstream a compensatory behaviour at some movement. Therefore, new mutations stage of their life history. For example, that arise in upstream populations can many freshwater crustaceans display a disperse downstream but new variants positive rheotactic response as adults from downstream may not be found (i.e. they actively swim or walk against upstream. the current). Similarly, many insect

65 Species that have evolved an active population numbers, fl uctuations in dispersal phase are particularly vulner- environmental conditions (both predict- able to anthropogenically modifi ed able seasonal and catastrophic change) environments, especially migratory mean that most populations will go species. For example, dams or impound- through declines and expansions ments can interrupt long established (boom/bust). As discussed in previous dispersal pathways to breeding or sections, the severity and the duration feeding grounds. Disruptions to the of population declines will largely natural life history of the species in this determine the level of genetic variation manner will result in a marked reduc- that can be maintained in the gene tion in the potential for long-term pool. The most extreme form of popu- population persistence. lation size fl uctuation is that of extinc- tion and recolonisation. Depending Many species display sex-biased on the source and magnitude of the dispersal in their life history. That is, recolonisation, genetic diversity may either males or females, but not both, either increase or decrease, both within are the principal dispersers. This has and among populations. signifi cant implications for mtDNA studies due to the maternal inheritance Evolution of LHT in some species of the molecule. If dispersal is male has resulted in breeding systems mediated, then there is no effective where certain sexually mature dispersal of mitochondrial genes (i.e. individuals (usually the males) gain gene fl ow is zero). Therefore a mtDNA a high percentage of matings (e.g. survey may indicate strong genetic harem system in many Pinniped structuring while nuclear markers may species). This behaviour by itself reveal panmixia. A similar pattern reduces Ne signifi cantly. From Section may be seen in philopatric species 5 we know that unequal numbers of (those that return to their natal site to breeding males and females will result reproduce). Even though these species in the effective population size being may disperse over great distances (e.g. signifi cantly smaller than the total across oceans) if the female is philo- number or breeders. In some breeding patric, mtDNA gene fl ow is nil (e.g. this systems, selection has favoured mate pattern is seen in sea otters). choice (‘good genes’ hypothesis)

where the reduction in Ne is offset by Species evolve to maximise their the increased genetic quality of the reproductive output. This may be offspring (e.g. co-operative breeding in through breeding at a time of birds). maximum resource quantity/quality, iteroparity (multiple breedings over An important outcome of size fl uctua- time) or through modifying the repro- tions is that many natural populations ductive allocation to suit prevailing will rarely achieve mutation/drift environmental conditions. Irrespective of these adaptations to maintain high

66 equilibrium, a condition that forms a common assumption underlying many statistical analyses.

67 68 Glossary

Aestivation: Dormancy during summer or dry season.

Allele: An alternative form of a gene occurring at the gene locus.

Allopatric: Relating to the geographic distribution of populations/species with distributions that do not overlap.

Allozymes: Alternative forms of an enzyme coded for by different DNA sequences at a single genetic locus.

Ancestral retention: Isolated populations having the same allele from a time prior to isolation.

Autosomal loci: Gene sequences on non sex linked chromosomes.

Balanced polymorphism: Where multiple alleles exist at a single locus over evolu- tionary time.

Balancing selection: Process by which multiple alleles are maintained by selection at a coding locus.

Base-Pairing Rule: Where A binds to T (U) and G binds to C.

Bonferroni correction: Adjustment of the signifi cance (α) of a statistical test to reduce the probability of committing a Type I error through multiple comparisons.

Bootstrapping: Permutation method for testing the reliability of a node in a gene tree.

Coding: Region of DNA that can be transcribed and translated to produce func- tional polypeptide product.

Co-dominant: Locus where heterozygotes express a unique phenotype.

69 Codon: Nucleotide triplet in DNA or RNA that specifi es the amino acid to be inserted in a specifi c position of a polypeptide.

Confl uence: A place where two water channels join.

Conspecifi c: Of the same species.

Cytoplasm: All cell contents excluding nucleus.

Denature: Process of breaking the bonds between the two complementary strands of DNA through chemical or temperature stress.

Dendritic: Resembling a dendrite (nerve cell) which is characterised by many paths of connection.

Diploid: Referring to an organism having two sets of chromosomes, one from each parent.

Directional selection: Process by which one phenotype if favoured by selection produces an increase in relative frequency of the underlying allele.

Dispersal: The movement of individuals.

Dispersal vectors: Extrinsic entities that facilitate dispersal; physical (wind, water) or biological.

DNA markers: DNA sequences that can characterise individuals, populations, species, etc.

DNA polymerase: Enzyme that catalyses production of new DNA molecules.

DNA replication: Process of generating new DNA strands.

Dominant: Locus where heterozygotes are not detected from homozygote domi- nants.

Duplex: Single stranded DNA that has re-annealed to its complementary strand or another strand with different sequence (homoduplex and heteroduplex respec- tively).

Effective population size: The number of breeding adults in a population that contribute their genes to the next generation (Ne).

70 Electrophoresis: Procedure for separation of molecules (based on charge and/or structure) in an electric fi eld.

Epistatic interaction: Interaction of nonallelic genes to produce phenotypes.

Eukaryote: Referring to the superkingdom that contains organisms whose cells contain membrane-bound nuclei and mitochondria (Protista, Fungi, Animalia, and Plantae).

Eustasy: Rise and fall of sea levels.

Evolution: Changes in genetic composition that occur within populations from one generation to the next.

Evolutionary mechanism: A process that results in changes in gene frequencies in populations.

Extant: Currently in existence, not extinct.

Extrinsic factors: Factors outside the basic nature of something.

Fixation: The point when an allele reaches a frequency of 100% in a population.

Founder event: The formation of a new population by one or a few individuals.

Gamete: Germ cell (egg, sperm) with a haploid genome.

Gene: A hereditary unit that occupies a specifi c location (locus) on a chromosome, the physical entity that is transmitted from parent to offspring.

Gene expression: When a coding locus produces a phenotype (protein).

Gene fl ow: Successful movement of genes among populations via dispersal of individuals or gametes.

Gene frequency: The frequency that an allele occurs at within a population.

Gene function: The role that a specifi c coding DNA sequence plays.

Gene tree: A branching diagram depicting the inferred relationships among a group of genes or other DNA fragments.

Genetic Drift: Random changes in gene frequency due to chance.

71 Genetic recombination: Reassortment of DNA sequences on homologous chromo- somes at meiosis.

Genome: The total genetic material within a cell or individual.

Genotype: Genetic constitution of a cell or individual.

Geographical speciation: Where two species evolve from a common ancestor as a result of geographical isolation and independent evolution in different environ- ments.

Geomorphology: Study of the evolution of physical landscapes.

Gondwana: Supercontinent (consisting of South America, Africa, Australia and Antarctica) existing ~200 million years ago.

Haploid: Single copy of each gene.

Haplotype: Genetic constitution of a haploid cell.

Heritable: Capable of being passed from one generation to the next.

Heredity: The genetic transmission of characteristics from parent to offspring.

Heteroduplex: Combining DNA’s from different sources together.

Heterogeneous: Variable, made up of different elements.

Heterologus: Non-identical DNA sequences.

Heteroplasmic: Contains more than a single DNA sequence per cell.

Heterosis: Also known as hybrid vigour and heterozygote advantage. Occurs when the fi tness of individuals with two different alleles at a locus is greater than the fi tness of individuals with two identical alleles at a locus.

Heterozygote: An individual that carries two different alleles at a diploid locus.

Homoduplex: Single DNA strand bound to its mirror image.

Homoplasmic: Contains a single DNA sequence per cell.

72 Homoplasy: Denotes parallel or convergent evolution of DNA sequence informa- tion, same allelic state not resulting from descent from a common ancestor.

Homozygous: An individual that carries two copies of the same allele at a diploid locus.

In vitro: In an artifi cial environment outside an organism.

In vivo: Within a living organism.

Indels: Changes in DNA sequence, specifi cally insertion or deletion of nucleotides.

Inbreeding: Mating among closely related individuals.

Intraspecifi c: Pertaining to interactions among individuals of the same species.

Intrinsic factors: Factors belonging to the basic nature of something.

Introgression: Mixing of discrete entities/populations.

Iteroparity: The characteristic of breeding more than once in a lifetime.

Life history trait (LHT): Signifi cant feature of the life cycle through which an organism passes.

Lineage sorting: Process of particular genetic variants going extinct over evolu- tionary time in a population through random drift.

Locus: The site that a gene or molecular sequence occupies on a chromosome (plural loci).

Maternal: Relating to the mother.

Microsatellites: Tandem repeats of short DNA motif’s (non-coding DNA).

Migration: The mass directional movement of large numbers of individuals of a species as part of their life history.

Miocene: Geological time scale (epoch) from 23.8-5.3 million years ago.

Mismatch distribution: Distribution of the pairwise nucleotide differences among all DNA sequences in a sample, used for determining historical demographic change.

73 Molecular clock: Consistent accumulation/loss of mutations at a locus that occurs at the DNA level.

Molecular drive: A process of DNA ‘turnover’.

Molecular systematics: Use of DNA sequence data to characterise relationships among organisms.

Morphology: Shape, form, external structure or arrangement of an organism.

Multidimensional scaling: Multivariate statistical method that represents the multidimensional similarity of samples in two or three dimensions.

Mutation: Alteration in the arrangement or amount of genetic material of a cell.

Mutation/drift equilibrium: The increase in genetic variation in a population through mutation is offset by the reduction in genetic variation due to random drift.

Mutation rate: The frequency with which new mutations arise in a population.

Natural selection: Change in gene frequencies due to differences in individual fi tness.

Neutral markers: DNA sequences that evolve solely due to genetic drift and mutation.

Neutral Theory: Theory proposed by M. Kimura to account for the high level of genetic variation in populations; most point mutations are selectively neutral.

Niche: The position or role of a plant or animal species within its community or environment.

Nitrogenous bases: Bases that code for variation at the DNA level.

Non-coding: Region of DNA that is not transcribed and translated.

Nucleotides: Building blocks of DNA molecules.

Nucleotide diversity: Measure of genetic variation, the mean number of differences in a sample.

74 Null alleles: An allele that produces no functional product, a sequence that is not amplifi ed in PCR-based analysis because a variation in the DNA sequence annealing to the 3’ end of a primer results in nonamplifi cation of the expected segment.

Outbreeding: Breeding with individuals from another sub-population.

Pangaea: Supercontinent (all continents joined) existing ~ 225 million years ago.

Panmixia: A single gene pool, no barriers to gene fl ow.

Pentose sugar: Part of DNA ‘skeleton’.

Permutation test: Statistical procedure that repeatedly randomises a data set to create a null distribution for testing the signifi cance of an parameter estimated from the data.

Phenotype: The physical manifestation of a genotype, eg. the colouration pattern of a fi sh.

Philopatric: Returning to the natal site to reproduce.

Phosphate group: Part of DNA ‘skeleton’.

Phylogenetic: Relating to the hypothesised evolutionary relationships of indi- viduals, populations or species.

Phylogeography: Analysis of genealogy, population genetics or evolution within a geographical context.

Point substitutions: Mutations at a single base pair site.

Polygenic: Refers to a trait or phenotype whose expression is the result of the interaction of numerous genes.

Polymorphic: The occurrence of different forms, stages, or types in individual organisms or in organisms of the same species, independent of sexual variations.

Polyphyletic: The term for a group of organisms, when despite their being classi- fi ed together as one taxonomic category, it is thought that not all have descended from a common ancestor.

Polyploidy: The situation of cells or individuals having additional complete sets of chromosomes.

75 Ploidy: Number of sets of homologous chromosomes.

Population bottlenecks: Severe reduction in population size that reduces popula- tion genetic variation.

Post hoc: Planned after the fact.

Plate tectonics: Movement of continental plates.

Pleistocene: Geological time scale (epoch) from 1.8 million – 10,000 years ago.

Primer: A short oligonucleotide fragment from where nucleotide extension is initiated during PCR.

Proof reading: Process of scanning new DNA strands for replication “errors”.

Purifying selection: Process of removal of deleterious alleles from gene pools.

Recolonisation: The arrival of a number of individuals to re-establish a population that had gone extinct.

Refugia: Places of suitable habitat generally surrounded by inhospitable habitat.

Relative fi tness: Measure of relative reproductive success of different phenotypes.

Restriction enzymes: Prokaryote enzymes that cut DNA strands.

Rheotaxis: Active dispersal response when subject to a current; positive – against the current; negative - with the current.

Ribosome: Site of protein synthesis in cells.

River capture: Drainage rearrangement where a stream fl owing into one river system is diverted (captured) by an adjacent system that has a higher erosional rate.

Segregating site: Nucleotide position in a series of homologous DNA sequences where a mutation has occurred.

Semi-conservative replication: Where new DNA molecules are synthesised using an ‘old’ strand as a template.

Selection coeffi cient: The relative fi tness of a particular genetic variant(s).

76 Selectively neutral: Not affected by natural selection.

Sex-biased dispersal: Dispersal predominantly by one sex.

Simple sequence repeats: Microsatellites.

Sink: A population that accumulates genetic variation through gene fl ow from several other source populations.

Somatic: Of cells of the body as opposed to germ cells.

Stochastic: Involving chance or probability.

Stock: A group of organisms that shares the same genetic and demographic parameters.

Stream profi le: 2 dimensional cross section of a stream refl ecting elevation.

Substrate: Ground or other solid surface on which animals walk on or are attached to.

Temporal: Related to time.

Transcription: Process of encrypting DNA gene message onto a ‘carrier’ molecule (mRNA).

Transition: Point mutation where a purine base is replaced by another purine (A or T) or a pyrimidine is replaced by another pyrimidine (C or G).

Translation: Decoding of mRNA into polypeptide.

Transversion: A point mutation where a purine (A or T) is replaced by a pyrimidine (C or G) or vice versa.

Vicariance: Separation.

χ2 Test: A test that uses the chi-square statistic to test the fi t between a theoretical frequency distribution and a frequency distribution of observed data for which each observation may fall into one of several classes.

Zygote: Offspring (2n) that results at fusion of egg (n) and sperm.

77 Also see additional glossary in “Glossary of biotechnology and genetic engi- neering”, FAO Rsearch and Technology Paper, No.7 at: http: //www.enaca. org/modules/wfdownloads/singlefi le.php?cid=63&lid=769

78 Bibliography

De Silva, S. S. (2004). Fisheries in inland waters in the Asian region with special reference to stock enhancement practices. Bangkok, Thailand.

Dodson, J.J., Colombani, F. & Ng, P.K.L. (1995). Phylogeographic structure in mitochondrial DNA of a South-East Asian freshwater fi sh, Hemibagrus nemurus (Siluroidei, Bagridae) and Pleistocene sea-level changes on the Sunda shelf. Molecular Ecology 4: 331-346.

Eknath, A. E. & Doyle, R. W. (1990). Effective population size and rate of inbreeding in aquaculture of Indian major carps. Aquaculture 85: 293-305.

Gupta, M. V. & Acosta, B. O., eds. (2001). Fish genetics research in member countries and institutions of the International Network on Genetics in Aquaculture.

Kamonrat, W. (1996). Spatial genetic structure of Thai silver barb Puntius gonionotus (Bleeker) population in Thailand. Halifax, Canada: Dalhousie University.

Petr. T. (ed.) (1998). Inland fi shery enhancements. Papers presented at the FAO/DFID Expert Consultation on Inland Fishery Enhancements. Dhaka, Bangladesh, 7–11 April 1997. FAO Fisheries Technical Paper. No. 374. Rome, FAO. 1998. 463p.

Kimura, M. and Crow J. F., (1964). The number of alleles that can be maintained in a fi nite population. Genetics 49: 725-738.

Rhymer, J. M. & Simberloff, D. (1996). Extinction by hybridisation and introgression. Annual Review of Ecological Systematics 27: 83-109.

Senanan, W., Kapuscinski, A. R., Na-Nakorn, U. & Miller, L. (2004). Genetic impacts of hybrid catfi sh farming (Clarias macrocephalus x C. gariepinus) on native catfi sh populations in central Thailand. Aquaculture 235: 167-184.

Waters, J. M., Lopez, J. A. & Wallis, G. P. (2000). Molecular phylogenetics and biogeography of galaxiid fi shes (Osteichthyes: Galaxiidae): dispersal, vicariance and the position of Lepidogalaxias salamandroides. Systematic Biology 49, 777-795.

79 Welcomme, R. L. & Vidthayanon, C. (2003). The impacts of introduction and stocking of exotic species in the Mekong basin and policies for their control. Cambodia: Mekong River Commission.

80 Manual on Application of Molecular Tools in Aquaculture and Inland Fisheries Management MANUAL

ON

APPLICATION Part 1

OF

MOLECULAR Conceptual basis of population genetic

TOOLS approaches

IN

AQUACULTURE

AND

INLAND

FISHERIES

MANAGEMENT : PART

1 NACA Monograph1

www.enaca.org