The evolution of microsatellites

by Owen Charles Rose

A thesis submitted for the degree of Doctor of Philosophy of the University of London

The Galton Laboratory

Department of Biology

University College London

4 Stephenson Way

London NWl 2HE ProQuest Number: U641917

All rights reserved

INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted.

In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the . uest.

ProQuest U641917

Published by ProQuest LLC(2015). Copyright of the Dissertation is held by the Author.

All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. Microform Edition © ProQuest LLC.

ProQuest LLC 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, Ml 48106-1346 Abstract

This thesis investigates the evolutionary dynamics of microsatellite loci, and their use as measures of genetic variability for conservation studies.

An understanding of the dynamic process at microsatellite loci is developing, but at present it is not known what factors limit array size change - how microsatellites are bom and how they die. A novel approach to this question is made possible by the analysis of recently produced large scale genome sequence data. Chapter 2 proposes a lower size limit for slippage mutation - the birth of a microsatellite, based on the distribution of short tandem repeat array sizes in the genomes of the yeast, nematode and human. Chapter 3 presents a simple model to show how the two components of dynamic mutation, slippage mutation and unequal exchange, could combine to set an intrinsic limit on array size increase - the death of a microsatellite.

Microsatellite variation can be used as a surrogate for overall genomic diversity to test the hypothesis that frequent extinction and colonisation in fragmented populations can drastically reduce levels of genetic variability. Three microsatellite loci were cloned from the declining British butterfly Plebejus argus to examine the genetic effects of population turnover in this species. Two related studies were performed. Chapter 5 shows that serial colonisation events in an introduced P. argus metapopulation have caused significant losses of genetic variation. In Chapter 6, levels of variability and patterns of differentiation at microsatellite loci are used to show that turnover within natural populations has not led to genetic impoverishment of the species. Both studies show that the microsatellite data are consistent with studies of allozyme loci in the same populations. Co n t e n t s

ABSTRACT...... 2

CONTENTS...... 3

FIGURES...... 5

TABLES...... 6

ACKNOWLEDGEMENTS...... 7

DECLARATION...... 8

CHAPTER 1: AN INTRODUCTION TO THE EVOLUTIONARY DYNAMICS OF MICROSATELLITES...... 9 1.1 Characteristics of microsatellite loci ...... 9 1.2 M utation at microsatellite loci ...... 12 1.3 M icrosatellite evolutionary dynamics and population structure ...... 15 1.4 W hat limits microsatellite array size evolution ?...... 18

CHAPTER 2: A THRESHOLD SIZE FOR MICROSATELLITE EXPANSION...... 20

A bstrac t ...... 20 2.1 Introduction ...... 21 2.2 Me t h o d s ...... 22 2.2.1 Assembly of genomic sequence data...... 23 2.2.2 Searching of genome databases ...... 23 2.3 Results and discussion...... 25

CHAPTER 3: AN INTRINSIC MUTATIONAL CONSTRAINT ON MICROSATELLITE ARRAY EXPANSION...... 32

A bstrac t ...... 32 3.1 Introduction ...... 33 3.2 Evidence for a nd against a general functional role for microsatellites ...... 34 3.3 A n intrinsic mutational limit to microsatellite e x pa n sio n ...... 35 3.4 D escription of the m o del ...... 37 3.5 M odel operation ...... 38 3.6 M odel o u t p u t ...... 39 3.6.1 Arrays evolving under stepwise mutation only...... 39 3.6.2 Unequal exchange constrains array size expansion ...... 41 3.6.3 Comparison of model output with observed genomic...... data 43 3.7 D iscussion of model o u t pu t ...... 50 3.8 M icrosatellites a n d human disease ...... 52 3.9 Conclusions ...... 55

CHAPTER 4: CONSERVATION THROUGH THE MAINTENANCE OF GENETIC DIVERSITY. 56

4.1 Population size a n d genetic variability ...... 57 4.2 H ow DOES LOSS OF VARIATION REDUCE POPULATION VIABILITY?...... 59 4.3 Empirical studies of genetically depauperate populations ...... 61 4.4 M olecular genetic markers for conservation stu d ies ...... 64 CHAPTER 5: LOSS OF GENETIC VARIABILITY THROUGH POPULATION TURNOVER IN A BUTTERFLY...... 69 A bstr a c t ...... 69 5.1 Introduction ...... 70 5.2 M e t h o d s ...... 71 5.2.7 History of Plebejus argus in North Wales...... 77 5.2.2 Collecting sites...... 72 5.2.3 DNA extraction...... 72 5.2.4 Identification and analysis of microsatellite loci...... 74 5.2.5 Genetic variation at protein and mitochondrial loci...... 76 5.2.6 Analysis of data ...... 76 5.3 Re su l t s ...... 78 5.3.1 Genetic variability at microsatellite loci...... 78 5.3.2 Permutation tests for the effects of colonisation on levels of variability...... 80 5.3.3 Patterns of variability at allozyme and mitochondrial loci...... 83 5.4. D isc u ssio n ...... 88 5.4.1 Genetic effects of artificial introductions and natural colonisation...... 88 5.4.2 Comparison of microsatellite, allozyme and mitochondrial variation...... 90 5.4.3 Conservation of P. argus in North Wales...... 97

CHAPTER 6: ASSESSING THE GENETIC CONSEQUENCES OF DEMOGRAPHIC CHANGE IN BRITISH POPULATIONS OF A DECLINING BUTTERFLY SPECIES...... 92 A b s t r a c t ...... 92 6.1 Introduction ...... 93 6.1.2 P. argus in Britain ...... 94 6.2 M e t h o d s ...... 95 6.2.1 Sample collection...... 95 6.2.2 Collection of genetic data...... 95 6.2.3 Analysis of levels of genetic variability ...... 97 6.2.4 Analysis of genetic structure...... 98 6.2.5 Stepwise mutation and genetic structure at microsatellite loci...... 99 6.3 R e s u l t s ...... 100 6.3.1 Changes in genetic variability during the post-glacial spread o f P. argus...... 100 6.3.2 Estimation of demographic parameters from allele frequency distributions...... 105 6.4 D is c u s s io n ...... 109

CHAPTER 7: GENERAL CONCLUSIONS ...... 113 7.1 D y n a m ic m u t a t io n a t microsatellite l o c i ...... 113 7.2 G e n e t ic variability a n d p o p u l a t io n persistence ...... 114

LITERATURE CITED...... 117

APPENDIX 1...... 133

APPENDIX n ...... 136

APPENDIX m ...... 137 F ig u r e s

F ig u r e 1.1 S t epw ise m u t a t io n th r o u g h slipped - st r a n d m ispa ir in g ...... 13

Rg u r e 1.2 M u ta tio n through u n e q u a l e x c h a n g e ...... 15

F ig u r e 2.1 D istribution of sh o r t t a n d e m repeats in th e g en o m e o f th e y e a s t 2 7

F ig u r e 2 .2 D istribution o f sho rt t a n d e m r epeats in th e g en o m e o f t h e n e m a t o d e . . . 2 8

F ig u r e 2 .3 D istribution o f sh o r t t a n d e m repeats iN t h e g en o m e o f t h e h u m a n 2 9

FIGURE 3.1 D inucleotide a r r a y size distributions from g en o m e seq u e n c e d a t a ...... 3 4

R g u r e 3 .2 F l o w c h a r t o f m o d e l o pe r a t io n ...... 3 9

Rg u r e 3 .3 A r r a y e v o lu tio n u n d e r u pw a r d b ia se d stepw ise m u ta tio n o n l y ...... 4 0

Rg u r e 3 .4 A r r a y e v o lu tio n u n d e r sm a ll u p w a r d bia ses to stepw ise m u t a t io n ...... 41

Rg u r e 3 .5 A r r a y life histories u n d e r stepw ise m u ta tio n o n l y ...... 41

Rg u r e 3 .6 A r r a y evo lu tio n u n d e r stepw ise m u ta tio n a n d u n e q u a l e x c h a n g e ...... 4 2

Rg u r e 3 .7 A r r a y size distributions after 100 m illion generations ...... 4 3

Rg u r e 3 .8 C o m pa r iso n o f m o d el o u t pu t w ith h u m a n dinucleotide d a t a ...... 4 5

Rg u r e 3 .9 C o m pa r iso n o f m o d e l o u t pu t w ith y e a st microsatelute d a t a ...... 4 7

Rg u r e 3 .1 0 C o m pa r iso n of m o d el o u tpu t w ith n e m a to d e microsatelute d a t a ...... 4 8

Rg u r e 3.11 C o m pa r iso n o f m o d el o u t pu t w ith h u m a n microsatellite d a t a ...... 4 9

R g u r e 5.1 L o c a t io n s o f P. a r g u s populations in n o r th w a l e s ...... 7 3

Rg u r e 5 .2 C h a n g e s in allele n u m b e r a n d heterozygosity a t microsatelute L o c i... 8 3

Rg u r e 5 .3 C h a n g e s in allele n u m b er a n d heierozygosity a t a llo zy m e LOCi...... 8 6

Rg u r e 5 .4 C h a n g e s in allele n u m b e r a n d heterozygosity a t m tDNA ...... 8 7

Rg u r e 6.1 L o c a t io n o f st u d y populations ...... 9 6

Rg u r e 6 .2 a V a r ia bil it y a t 3 microsatellite l o c i ...... 1 0 0

Rg u r e 6 .2 b V a r ia bil it y a t 10 a llo zy m e lo c i ...... 1 0 0

Rg u r e 6 .3 D ist a n c e fro m S t u d l a n d H ea th a n d g e n e h c d iv e r s it y ...... 1 0 4 TABLES

T a b l e 5.1 F r e q u e n c ie s , heterozygosity , n u m b e r s o f a l l e l e s , a n d sa m p l e sizes a t MICROSATELLITE LOCI...... 7 9

T a b l e 5 .2 P a ir w ise probabilities o f o bta in in g t h e o b se r v e d c h a n g e s in a llele n u m b e r , HETEROZYGOSITY AND GENE FREQUENCY AT MICROSATELLITE LOQ...... 81

T a b l e 5 .3 C o m p a r iso n o f m a r k e r loci u s e d fo r g en o t y pin g P. argc /spopulations fr o m N . W a l e s ...... 8 4

T a b l e 5 .4 C o m pa r iso n o f differentiation a t a l l o z y m e , mitochondrial a n d MICROSATELLITE LOCI...... 85

Table 6.1 Location, sample size, and biotope of the 13 study populations...... 9 6

Table 6.2 Variability at m icrosatellite loci ...... 101

Table 6.3 Genetic effects of the colonisation of Britain from Europe...... 1 0 2

Table 6.4 Genetic effects of range expansion in B ritain...... 103

Table 6.5 Genetic structure at m icrosatellite loci...... 105

Table 6.6 Genetic structure at allozym e and m icrosatelute loci ...... 1 0 6

Table 6.7 Indirect and direct estimates of in Brihsh P. a r g u s...... 108 A cknowledgments

For technical help and advice I thank Mari-Wynn Burley, Sue Povey, David Whitehouse and particularly Stavros Malas, probably the only person in world with great enough patience to withstand my initial attempts at molecular . My thanks go to Daniel Falush for much useful practical help, discussion and argument. I also gratefully acknowledge the encouragement and advice of David Goldstein, Hidenori Tachida and Montgomery Slatkin for my work on microsatellite evolution. I thank Jim Mallet for being a consistently honest supervisor, and letting pursue my own ideas, and Steve Jones for having faith in me in the first place. The journey was made considerably easier and more enjoyable by the support of my friends, in particular Steve Phipps, Tony Ashworll,Darren Conway, Yvonne Graneau, Chris Jiggins, and Owen McMillan.

In particular, I would like to thank Martin Brookes for Figure 5.1 and so much more, my family for their constant support, and Jennie McCabe for everything. DECLARATION

Chapter 3 of this thesis is partly based on a computer simulation model written in collaboration with Daniel Falush. I was responsible for the initial intellectual conception of the model and its design, and collaborated with Daniel for the computer programming. Subsequently I performed all model iterations and comparisons with observed data, and am responsible for all elements of the presentation of the study.

Chapter 6 of this thesis compares microsatellite and allozyme data from British and European populations of P. argus. Allozyme gel electrophoresis was performed by Yvonne Graneau and Peter King at UCL. I am responsible for all analysis and interpretation of the data, comparisons with my own microsatellite data, and for all elements of the presentation of the study.

O w e n Ro se Dr. Ja m e s M a llet (University Supervisor) Ch a pter 1

An introduction to the evolutionary dynamics of microsatellites

Microsatellites are currently the markers of choice for a wide variety of genetic analyses. Consequently, much work has been focused on understanding the dynamic mutational mechanisms which give rise to high levels of polymorphism at these loci. Although such an understanding is developing, at present it is not known what factors limit microsatellite evolution, - how they are ‘bom’ and how they ‘die’. These two question are addressed in Chapters 2 and 3. The purpose of this chapter is describe the biology of microsatellites, what is currently known about their mutational processes, and why population studies indicate that their evolution may be limited.

1.1 Characteristics of microsatellite loci In 1986 Diethard Tautz suggested that the widespread existence of tandemly repeated DNA motifs in eukaryotic organisms would provide a major source of genetic variation in all regions of the genome (Tautz et al. 1986). This prediction has been spectacularly borne out in the ten years which have elapsed since. The ubiquitous distribution of poly (dT - dG)„ sequences in vertebrates was first demonstrated by Hamada et al. in 1982, and was soon confirmed for a variety of other short (1-6 bp) motifs (Tautz and Renz 1984). The subsequent discovery of widespread minisatellites (Jeffreys et al. 1985a), and satellite DNA (Willard and Waye 1987), confirmed the existence of a variety of tandemly repetitive DNA’s (Box 1.1) whose similarities suggested that they may be subject to the same general evolutionary processes.

Box 1.1 Tandemly repeated non-coding sequences.

SATELLITE SEQUENCES - core repeat units are typically lOObp, with overall array sizes up to lOOMh, located in heterochromatic chromosomal regions. Not generally as variable within populations as micro and minisatellites. (Charlesworth et al. 1994).

MINISATELLITE SEQUENCES - core repeat units typically 15-40bp, with overall array sizes up to 30 kb. Found in euchromatic chromosomal regions and are hypervariable in array size. (Jeffreys et al. 1985a).

MICROSATELLITE SEQUENCES - core repeat units are short, l-6bp, and total array lengths rarely exceed 200bp. HigMy variable and widely distributed throughout all eukaryotic genomes. (Jarne and Lagoda 1997). A consistent feature of tandem repetitive DNA is the variation in the number of repeat units between arrays at the same locus. This variability was first harnessed with minisatellite repeats, where genetic fingerprinting became a powerful tool in fine scale genetic analyses, such as individual identification and paternity studies (Jeffreys et al. 19856). The large size of minisatellite alleles (up to 50kb) renders them suitable for analysis by conventional Southern blotting techniques, but the routine use of microsatellite variability only became possible following the development of the polymerase chain reaction (PCR) in the late 1980’s. Microsatellite analysis usually involves locus specific PCR amplification and either radioactive or fluorescent labeling, followed by electrophoretic separation of different alleles. A wide variety of studies have shown these loci to be highly polymorphic, codominant, Mendelian markers with expected heterozygosities typically greater than 60% (Box 1.2).

Box 1.2 Microsatellite nomenclature Microsatellite - An array of tandemly repeated 1-6 base pair units. Microsatellites are named by the size of the core repeat unit, mononucleotide, dinucleotide, tetranucleotide etc., and may he further defined as pure, an unbroken array of repeat units, compound, a mixture of two or more different repeat arrays, or interrupted, where the array is interspersed with one or more non-repeating units. (In general, empirical analyses and theoretical treatments only concern pure repeat arrays).

for example: A pure dinucleotide repeat cattaCACACACACACACACAttcag A pure iciranucleotide repeat cgtatAGATAGATAGATAGATctact A compound mononucleotide repeat cgtatgAAAAAAAATTTTTTTgtactgg An interrupted irinucleotide repeat agatcCCGCCGTCCGACCGaacgga

High density genetic maps show that the distribution of microsatellite loci is more or less random across the whole genome (Dib et al. 1996, Dietrich et al. 1996). While this suggests that the majority of loci are selectively neutral, there is evidence of an upper size limit for trinucleotide repeats located within exons (Watkins et al. 1995). However, for higher organisms, with much non-coding DNA, these loci will be comparatively rare and should not affect the general assumption of neutrality for microsatellites in population genetic studies.

Analysis of genome sequence data shows that all possible types of mono, di and trinucleotide repeats, and 29 of the 33 possible tetranucleotide repeat types are present in primate genomes (Jurka and Pethiyagoda 1995), but the relative abundance of

10 particular repeat motifs differs across taxa. Mammalian genomes are very rich in CA repeats (Stallings et al. 1991), Yeast and C. elegans have more AT motifs (Rose unpublished data), while plants tend to have more abundant TA and G A repeats. The mean density of microsatellites also varies greatly between species. In mouse, dinucleotide repeats may be as common as one locus per 5kb and similarly high figures apply to humans and plants (Jame and Lagoda 1996). Conversely in Lepidoptera the density appears to be very low, with many independent attempts to isolate di and tetranucleotide repeats having failed (O. McMillan, I Saccheri, E. Meglecz, pers. comm.). Currently there is no functional or evolutionary explanation for these differences, and no model for predicting the occurrence of microsatellites.

Polymorphic microsatellite repeats represent a vast reservoir of easily scoreable genetic variation in the genomes of all eukaryotes. This has made them an important tool for various genetic analyses, most spectacularly in linkage mapping (Weissenbach et al. 1993, Dib et al. 1996), but also in forensics (Hagelberg et al 1991), and increasingly in population genetic studies (Bruford and Wayne 1993). Successful studies using microsatellites have now been performed in a wide variety of taxa including humans (Deka et al. 1995, Goldstein et al. 1995a), other mammals (Amos cj. 1993, Gotelli et al. 1995), birds (Ellegren et al. 1995, Primmer et al. 1997), turtles (FitzSimmons et al. 1995) and Hymenoptera ( Hugkes ayicl Que/Mer 1993}.

There is much interest in the connection between microsatellites and a number of human genetic diseases. Some of these involve dinucleotide repeats (Eschleman and Markowitz 1996), but the so-called trinucleotide exjjansion disorders are more prevalent (Mandel 1994). At these loci, the disease phenotype is associated with trinucleotide repeat alleles which are expanded above normal sizes. Disorders of this type include Fragile-X Syndrome, Huntingdon’s disease and . (This topic is discussed more fully in Chapter 3, section 3.8).

High levels of polymorphism at most microsatellite loci, and transitions from normal to pathological alleles at the disease associated microsatellites, derive from the dynamic nature of changes in array size. Understanding the mutational processes which operate at microsatellite loci is crucial for predicting array size dynamics and particularly the rapid rate at which variability is generated. In Chapters 2 and 3 ,1 use novel methods for studying microsatellite mutation, based on analysis of whole genome sequence data. Here, I first describe what is already known about the mutational processes, to make clear the assumptions on which the work of Chapters 2 and 3 is based.

11 1.2 Mutation at microsatellite loci Mutation rates have been studied directly, through empirical analyses of large pedigrees, and indirectly by applying population genetics theory to allele frequency distributions.

Direct observations of microsatellite mutation events in large studies of human and mouse pedigrees yield mutation rates of 1x10'^ - 1x10'^ per locus, per generation (ftiHas 1992, Weber and Wong 1993, Banchs et al. 1994), and even 1x10^ at one human tetranucleotide repeat locus (Mahtani and Willard 1993). Care must be taken in interpreting these results, as DNA maintained through cell lines is known to accumulate somatic , and mouse inbred lines could carry recessive mutations affecting DNA repair mechanisms which are important for microsatellite array stability. However, it is generally accepted that mutation rates at microsatellite loci are orders of magnitude above those for point nucleotide substitutions. Observed rates differ between loci and also between repeat types, with tetranucleotide repeats being nearly four times as mutable as dinucleotides, on average (Weber and Wong 1993).

Indirect estimates of mutation rates may be obtained from classical population genetics theory (Ewens 1972), where estimates of the population size and total number of alleles are available. Indirect estimates from dinucleotide repeat loci in the honeybee have yielded mutation rates of 1x10 "^- 1x10 ^ (Estoup et al. 1995), and in humans, trimeric and tetrameric repeats had mutation rates of 1x10 ^ (Edwards et al. 1992), values generally in agreement with direct observations. Although much less labour intensive to perform, indirect estimates must be treated with caution as unknown demographic histories frequently affect estimates of N^, and sampling effects mean that total numbers of alleles cannot, in practice, be precisely calculated.

High mutation rates are typical of tandem repetitive DNA’s, suggesting that these loci are affected by common mutational processes. However, unlike minisatellite loci, where hypervariability is thought to arise mainly through high rates of unequal recombination, the dominant mutational mechanism at microsatellite loci is thought to be polymerase slippage during DNA replication (Levinson and Gutman 1987). This process, henceforth termed ‘stepwise mutation’, causes an increase or decrease in . , or a-f&u) array size by one^repeat units.

The exact mechanism of stepwise mutation is unclear. Gain or loss of repeat units may occur through misalignment of complimentary strands following replication (Strand et al. 1993), or slippage of the polymerase enzyme complex itself (Richards and Sutherland 1994). By whichever mechanism the change in repeat number occurs, it is

12 clear that the mutant allele arises through a failure of DNA repair, as mutations which impair DNA repair enzymes can raise stepwise mutation rates by several hundred fold (Strand et al. 1993). This implies that replication errors at microsatellite loci are very frequent events which are usually corrected by efficient DNA repair systems. Figure 1.1 shows a schematic diagram of the hypothesised mechanism of stepwise mutation.

Figure 1.1 Stepwise mutation through slipped-strand mispairing. (after Strand et al. 1993)

Double stranded DNA with microsatellite repeat.

Strands transiently dissociate during DNA replication.

Strands reanneal in a mis aligned configuration.

DNA repair enzymes fail to excise mispaired repeat unit, but fill gap and extend array by one repeat.

A stepwise model of mutation is consistent with in vitro studies of microsatellite mutation events, where the vast majority (>80%) involve a gain or loss of one repeat

A high rate of rephcation error at microsatellite loci is also suggested by Schlotterer and Tautz (1992). From an in vitro study of slippage synthesis of simple sequences, they suggest that mutation rates at these sequences may be as high as one per hundred rephcation events. Their results imply that changes in repeat number occur through a gap-fiUing process similar to that proposed by Strand et al. (1993) (Figure 1.1). Additionally, Schlotterer and Tautz show that in vitro, the rate of shppage mutation is independent of the length of the repeat. This finding has recently been contradicted by in vivo studies from yeast (Wierdl et al. 1997), where mutation rates for poly GT arrays increase by more than two orders of magnitude as arrays increase in size from 15 to 99 bp. (Weber 1990, Goldstein and Clarke 1995), but Valdes found no such correlation in a

13 study of 100 human loci (Valdes et al. 1993). However, all of these studies are somewhat limited by the stringent screening processes used to obtain microsatellites for human linkage studies. These methods specifically select for large arrays, limiting the variance in size and reducing the power to detect any correlation with degree of polymorphism.

Large changes in allele size are less clearly understood, but it has been proposed that they may occur through unconstrained slippage (Richards and Sutherland 1994), or unequal crossing over. Richards and Sutherland propose that when both single stranded replication breaks occur within the repeat, the unanchored fragment could undergo large-scale slippage. Although plausible, this mechanism requires that the initial array size is very large (>80 repeats), and so may only have particular relevance to the disease related microsatellites, rather than being a general process affecting all , loci.

Unequal recombination between homologous repeat arrays could generate both large and small changes in allele size. This process could occur at mitosis or meiosis and between sister and non-sister homologues. Evidence from disease-associated trinucleotide repeat arrays, which show large-scale mutations on parental transmission (Yu et al. 1992), as well as somatic mosaicism for alleles with a large size variance (Whorle et al. 1993), suggest that if unequal recombination is responsible it occurs at both mitosis and meiosis. As with stepwise mutation, unequal recombination should become more likely with increasing array size, as the region of potential mis-alignment gets larger. Figure 1.2 shows a schematic diagram for a generalised process of unequal exchange.

Allele frequency distributions at microsatellite loci are generally consistent with the pattern expected from a predominantly stepwise mutation process. Distributions tend to 5how adjacent alleles sizes differing by one repeat unit, and positively skewed towards longer allele length (Rubinsztein et al. 1995), supporting the evidence for an upward bias to stepwise mutation from direct studies of mutation. Bimodal and multimodal allele frequency distributions are occasionally observed (Brooker et al. 1994, Taylor et al. 1994), these probably result from rare large changes in allele size followed by stepwise mutation around the new allele. There is also evidence that interrupted microsatellite arrays produce complex allele frequency distributions (Fomage et al. 1992).

14 Figure 1.2 Mutation through unequal exchange.

Paired Chromatids

Crossover between mis aligned repeats.

Reciprocal alteration of tract length.

Microsatellite arrays thus evolve rapidly through a combination of stepwise mutation and presumed unequal exchange. This has been termed Dynamic Mutation (Sutherland and Richards 1995). Empirical evidence shows that stepwise mutation occurs at frequencies orders of magnitude higher than point nucleotide substitutions, with a 2:1 bias in favour of gain of repeats. Stepwise mutations also appear to be by far the most dominant mutational process, occurring at least 10 times as frequently as larger size changes for arrays of 10-20 repeat units. These appear to be general features of the microsatellite mutation process, applying to all taxa as yet examined, and are the basic assumptions of the studies presented in Chapters 2 and 3. In the next section of this introduction, I describe why an understanding of the mutational processes of microsatellite loci is necessary for their use as markers in population and taxonomic studies.

1.3 Microsatellite evolutionary dynamics and population structure Microsatellites appear to represent the apotheosis of the genetic marker for population studies. They are abundant and widely distributed in all eukaryotic genomes, usually highly polymorphic, easily typed by PCR, and, with the possible exception of some trinucleotide repeats, approximately selectively neutral. However, a greater understanding of the nature of array size changes has shown the need for some caution in their use.

15 Population genetic analyses interpret patterns of genetic variation within and between populations to make inferences about the demographic parameters which must have affected those populations. However, the pattern of variation exhibited at a particular marker locus is not only a product of the population’s demographic history, but also of the particular mutational processes which affect the locus. These ‘intrinsic dynamics’ are different for particular types of genetic marker, and this must be accounted for when making inferences about demographic events. Traditional methods of estimating demographic parameters, such as population subdivision or genetic distance, from marker loci assume very low mutation rates (fi), so that the effects of gene flow (m) and drift predominate, (m & 1/2N » fi). They also assume infinite alleles, or k alleles, models, where mutations generate new alleles either not previously present in the population (infinite alleles model), or in one of k possible allelic states (k alleles model). Neither model assumes that there is any pattern to the mutational process (Nei 1972, 1977, Wright 1978).

The dynamic nature of array size change at microsatellite loci, through stepwise mutation and unequal exchange, potentially violates both of these assumptions, and must be accounted for when making demographic inferences from allele frequency distributions. An infinite alleles model assumes that a mutation event erases any memory of the prior allelic state, and so any excess genetic similarity between populations must be due to migration or historical association. Under a stepwise mutation model, the new mutant allele is closely related in size to the allele that mutated, and back mutation to alleles which are identical in state, but not in ancestry, will be common. This will result in an apparent excess of genetic similarity if Nm estimates are based on

Slatkin (1995) has presented a measure of population subdivision based on microsatellite allele frequencies which is analogous to Wright’s Fg^ (1978) but unbiased by stepwise mutations occurring at relatively high rates. Slatkin’s Rg^ (Chapter 6, section 6.2.5) is a measure of differentiation based on the variance in allele size between populations, which will gradually increase with time under the assumptions of a neutral stepwise model. The related measure of genetic distance Dg^, is a simple, unbiased estimator of the amount of variance in allele size which is between populations (Shriver et al. 1995).

The new models have been tested using data from a variety of organisms. Forbes et al. (1995) examined the divergence of domestic and wild sheep populations in the USA, comparing measures based on infinite alleles and stepwise mutation models. Among

16 allopatric populations of domestic sheep, provided greater resolution than R^y, and standard genetic distance measures compared well with known divergence times. This is probably because these populations diverged recently enough for the mutation process to be irrelevant. Over longer timescales, where there is sufficient accumulation of mutations, Rgy better distinguished interspecies comparisons among sheep, and more accurately dated the divergence of bighorn sheep. However, even using the size variance based measures, microsatellites produced an underestimate of divergence times when compared with evidence from biogeography and from mtDNA data.

An underestimate of divergence times was also found by Deka et al. (1994) in a study of 13 microsatellite loci in humans and chimpanzees. They calculated the ratio interpopulation and interspecies divergence times derived from Dg^ and found it to be nine times smaller than that derived from DNA sequence analysis and paleobiological evidence. This apparent excess of genetic similarity at microsatellite loci between species could be explained by a force which constrains the extent to which repeat arrays expand and contract. Such a constraint to array size evolution could arise either through selection, or some sort of mutational boundary. The implication of a mutational boundary explanation is that small repeats will tend to expand through dynamic mutation, while the same process will cause large arrays to contract.

The possibility of a mutational constraint on divergence was investigated by Garza et al. (1995), using the data of DiRienzo et al. (1994). Three human populations had very similar allele size distributions at 10 microsatellite loci, with mean allele lengths differing by less than one repeat unit at most loci. According to the neutral stepwise model, the difference in numbers of repeats in reproductively isolated populations should increase without bounds at a rate proportional to the time since separation. If this is true, and microsatellite mutation rates are of the order 10 ^, why is there so little divergence between these human populations? The similarity could be due to historical gene flow, and so Garza used the same 10 loci to compare humans and chimps. Again, average repeat number and allele size were very similar in both species. A biased stepwise mutation model, where the expected change under mutation depends on the distance of the allele from some assumed ‘target’ size, suggests that a small mutation bias can easily explain the lack of divergence between distinct lineages. However, as they point out, the extent of the bias necessary to reduce divergence to the levels observed depends on the mutation rate. If the mutation rates of these loci are all of the order of 10 ^ then it may not be necessary to invoke a mutation bias at all.

17 1.4 What limits microsatellite array size evolution? Population studies provide some support for the existence of a constraint on microsatellite array size. Much stronger evidence comes from genomic sequence studies which show that, despite an upward biased stepwise mutation process, the vast majority of microsatellite arrays are very small (<6 repeat units) (Jurka and Pethiyagoda 1995, Chapter 3, section 3.1). What constrains array expansion?

The possibility that there is selection on array size arises from the suggestion that microsatellites have a functional role. Speculations as to what this might be are discussed more fully in Chapter 3 (section 3.2), but include acting as hot-spots for meiotic recombination, facilitating exon shuffling, participating in the high order structuring of the eukaryotic genome, and the regulation of gene transcription, (Hamada et al. 1984 a&b, Treco and Amheim 1986, Nadir et al. 1996). A regulatory role for microsatellites in gene transcription has intriguing implications for spéciation and divergence. In recently diverged lineages where large phenotypic differences between species are accompanied by very little genetic divergence, (as between humans and chimpanzees), the marked phenotypic differences may arise through mutations in regulatory systems (King and Wilson 1975).

While selection may act to constrain the size of arrays at some loci, it seems unlikely that it can be general process acting across the vast number of microsatellites in eukaryotic genomes. And even though Garza et al. (1995) have shown that a mutational bias would constrain array size distribution, a mechanistic explanation for how this bias might operate does not yet exist.

The purpose of the following two chapters is to demonstrate how the mechanisms of dynamic mutation could set intrinsic limits on array size evolution. In Chapter 2 ,1 use the distribution of short tandem repeat array sizes in the genomes of the yeast, nematode and human to suggest an origin for stepwise mutation - the birth of a microsatellite. In Chapter 3 ,1 present a simple model to show how the two components of dynamic mutation, slippage mutation and unequal exchange, could combine to set an intrinsic limit on array size increase, and present a preliminary comparison of this model output with genome sequence data. Both approaches utilise the extensive data which is being made available by large scale genome sequencing projects. These data sets provide a powerful new tool for examining the distribution of microsatellite array types and sizes in organisms as diverse as bacteria, yeast and humans. Previous studies of microsatellite mutation have been based on arrays isolated by hybridisation screening. These screening protocols are generally designed to select for large arrays, resulting in a highly size-biased sample of microsatellites. Large

18 genome sequence databases contain information on all repeat arrays, a particularly important advantage when considering the limits of microsatellite evolution.

19 Ch apter 2

A threshold size for microsatellite expansion

Abstract Microsatellites are characterised by dynamic mutation, but at present, the evolutionary origins of this process are unknown. Understanding mutational processes through pedigree analysis or phylogenetic comparison is hampered by the large amount of data which must be collected in order to observe a few individual mutation events. To overcome this, I have used the size distributions of short tandem repeats in the yeast, nematode and human genomes, as ‘fingerprints’ of the mutational process which produced them. I show that dynamic mutation only occurs at array sizes of about eight nucleotides or greater, and that this threshold applies to all types and size classes of short tandem repeat. I propose that the eight nucleotide size threshold represents a structural constraint on the accuracy of DNA repair, leading to dynamic mutation, and may be regarded as the evolutionary origin, or ‘birth’ of a microsatellite.

20 2.1 Introduction When is a DNA repeat sequence a microsatellite? Microsatellites are tandem arrays of short (1-5 bp) repeats, characterised by rapid expansion and contraction through a process of ‘dynamic mutation’. Despite their importance in modem genetic analyses, and association with several human genetic diseases, little is known about the initial conditions necessary for dynamic mutation to occur - the evolutionary origin of a microsatellite. In this chapter, I use data on the frequency of large arrays in the genomes of Saccharomyces cerevisiae, Caenorhabditis elegans and Homo sapiens (hereafter yeast, nematode and human) to demonstrate the existence of a minimum threshold size necessary for a repeat sequence to undergo dynamic mutation. The existence of this threshold provides an important insight into the evolutionary properties of repeat arrays in the genome.

The major component of the dynamic mutation process is thought to be polymerase slippage during replication, resulting in an increase or decrease of array size by one repeat unit. Estimates of slippage mutation rates range from 10 ^ to 10 ^, values which are orders of magnitude greater than point nucleotide substitution rates. Large in vivo studies of pedigrees show a 2:1 bias in favour of gain of repeats and, coupled with high mutation rates, this suggests that microsatellites show a tendency to rapidly increase in size over time (Chapter 1, section 1.2). Given existing mechanistic explanations of slippage mutation, it is likely that the lower size limit to dynamic mutation is imposed by the minimum number of repeats necessary for post-replication mis-annealing to occur.

The possible existence of such a threshold was first demonstrated in a phylogenetic study of putative microsatellite loci in primates (Messier et al. 1996). The authors

examined a (GT)^ short tandem repeat in the r|-Globin pseudogene of the Great Apes

and a variety of New and Old World monkeys. In the African apes, a single nucleotide substitution had mutated the sequence into an (ATGT)^ tetranucleotide repeat which had subsequently expanded to (ATGT)^ in chimpanzee and gorilla, and (ATGT)^ in humans. A different substitution event in the Owl monkey had converted the repeat to a (GT)4 which had then expanded to (GT)g. In all other monkeys the repeat did not change in size. This suggests that once point nucleotide substitutions create arrays of sufficient size they may expand rapidly through dynamic mutation. Based on this single repeat sequence in primates, the minimum array size seems to be about four repeats for dinucleotide arrays and only two for tetranucleotide arrays. Thus an understanding of the rapid evolutionary dynamics of microsatellites is developing, but

21 both in vivo and phylogenetic studies are limited by the large amount of data which must be accumulated in order to observe a few mutation events.

Large scale genome sequencing projects allow a novel approach to the analysis of slippage mutation and the evolutionary origins of microsatellites. In a recent review, Eugene Koonin and Arcady Mushegian (1996) suggest that ‘the sequencing of the first complete genomes of cellular life forms signifies the onset of a new era in biology.’ Currently the complete genomes of two bacteria, Haemophilus influenzae and Mycoplasma genitalium, and that of the unicellular eukaryote Saccharomyces cerevisiae are publicly available. Well-advanced sequencing projects are also underway in a variety of other eukaryotes including the nematode. Drosophila, the pufferfish and man, with an expected completion date for the entire human genome within the next ten years. Knowledge of the complete functional content of genomes will potentially revolutionise many biological disciplines including gene expression (Bassett et al. 1996), comparative mapping (Eppig 1996), medical genetics and gene therapy (Gilbert 1992), and evolutionary biology (Sidow 1996). Large scale genome sequences also provide an extremely powerful method for examining the behaviour and evolution of non-coding sequences. They have already been exploited in the study of several classes of tandem repetitive DNA (Ting 1995, Hancock 1996, Richard and Dujon 1996, Smit 1996). Here I use large scale genome sequence data to study the evolutionary origins of microsatellites.

2.2 Methods I have performed a study of the size distribution of short tandem repeats in the genomes of the yeast, nematode and human to illustrate the mutational processes acting on these arrays. The study compares observed and expected numbers of repeat arrays in each size class

I assume a null hypothesis in which there is no dynamic mutation and repeat units (i.e. CA, AT, AGAT etc.) are randomly distributed within the genome. This null hypothesis is consistent with a simple model of genome evolution in which nucleotides evolve by random transitions and . Under this null hypothesis there are many more short repeat arrays than long ones. Therefore, if rapid dynamic mutation does occur, it will tend to create an excess of long repeat arrays, compared to the null hypothesis, even if slippage mutation is unbiased or slightly downwardly biased. For any particular type of array, the size at which this to overrepresentation occurs may reflect the beginning of dynamic mutation and the ‘birth’ of a microsatellite. I show evidence for such a minimum size threshold for slippage mutation, and suggest that for a variety of different short tandem repeats, this

22 threshold is consistently determined by the number of nucleotides in the array, rather than the number of repeats. As this chapter is concerned with the origins of microsatellites, I use the term short tandem repeat to describe any array of repeat units, and restrict the term microsatellite to those arrays presumed to be undergoing dynamic mutation,

2.2.1 Assembly of genomic sequence data The genome sequence of the budding yeast Saccharomyces cerevisiae was down­ loaded from the World Wide Web (WWW) site of the Martinsried Institute for Protein Sequences, Munich, Germany. This 12.06 Mb of non-ribosomal DNA comprises approximately 70% coding sequence, which encodes 6183 proteins (Bassett et al. 1996). Sequences were downloaded in 50 Kb files, header information was removed and the files were condensed into a single database for searching.

The small soil nematode Caenorhabditis elegans has a 100 Mb genome which is thought to encode about 13,100 genes (Hodgkin et al. 1995). Currently about 50% of the genome sequence is known, and is available directly from the ‘aCEDB’ WWW site at the Sanger Centre, Oxford, UK. I assembled a database consisting of 615 cosmid sequences from chromosomes 2,3,4,5 and X, a total of 18,983,313 bp. Cosmids were chosen to give maximal chromosomal coverage without overlap. (Although about 32 Mb of finished sequence data are available, much of this is in the overlaps between cosmids.)

Human sequence data were obtained by anonymous FTP from the WWW site of the Sanger Center, Oxford, UK. Although large quantities of human sequence data are available, much of this is piecemeal and concentrated on particular genes. Currently the longest continuous sequenced tracts are only about 2 Mb, and these are from particularly gene rich regions. Finished sequence data was selected from non­ overlapping cosmids in these regions, mostly from Xp and 22q 11.2 - 22qter. A total database of 7,173,839bp was assembled. The human database is consequently smaller and much less representative than the nematode or yeast, but it still represents the most advanced sequencing project of any higher organism.

2.2.2 Searching of genome databases The genome sequence databases were searched for short tandem repeats using çeorc^b^ software written by Erik Corry. All possible mononucleotide, dinucleotide and tetranucleotide repeats were detected. Each type of repeat unit has alternative counterparts and complimentary versions, for example (CA)„ arrays can also be represented as (AC)„, (GT)^, and (TG)„. Consequently the total number of possible,

23 non-overlapping repeat arrays are 2 mononucleotide, 4 dinucleotide and 33 tetranucleotide. However, as the databases only contain information from a single strand of DNA, it is necessary to search with both complimentary versions of each repeat unit in order to find all arrays of that type, i.e. CA repeats on one strand are represented as GT repeats on the complimentary strand. Trinucleotides were excluded from the analysis, as all array sizes of trinucleotide repeats coincide with the periodicity of codons within open reading frames. Consequently the pattern of trinucleotide array sizes is likely to be affected by selection on amino acid sequences.

To calculate the expected values for a particular array, I counted the total number of repeat units of that type in the genome database and use this to calculate the expected numbers of arrays at each size, under the assumption of the null hypothesis that repeat units are randomly distributed within the genome. An example calculation is given in Box 2.1.

Box 2.1 Calculation of expected numbers of arrays. For example - How many CACACA repeats are expected in the yeast genome under the assumption of a random distribution of the repeat unit CA? Total nucleotides = 12067315 Total CA =711971 Frequency of CA (p) = 0.059 Expected number of CACACA = (0.059)^ x 12067315 = 2478 (However this estimate must be modified to prevent the inclusion of longerCA repeat arrays) - Expected number of wCACACAxx = 2194 (Where 2194 = 12067315 x (l-p)(p^)(l-p) and (l-p) = the probability that XX is any pair of nucleotides other than CA).

Observed (O) and expected (E) numbers of repeat arrays at each size are compared. The expectation under the null hypothesis is that observed and expected numbers will be the same and 0/E values will equal 0. Dynamic mutation will cause rapid expansion of arrays leading to overrepresentation and positive O/E values at those sizes.

All possible types of mononucleotide and dinucleotide repeat units exist in expanded arrays in the genomes of the yeast, nematode and human. Some GC rich tetranucleotide repeats do not exist in any arrays of greater than two units, but this may be as a result of their rarity in the genomes, rather than any particular limit on their expansion potential. Figures 2.1 - 2.3 show the log O/E numbers of arrays at each size

24 for short tandem repeats in the yeast, nematode and human genomes. A logarithmic plot is used because of the extremely high 0/E values for large arrays, where the expected number effectively equals zero.

2.3 Results and discussion Figures 2.1-2.3 clearly show that for every type of repeat unit, large arrays are greatly overrepresented. This is consistent with the hypothesis that these arrays are subject to dynamic mutation and are behaving as microsatellites. With general exception of the mononucleotides A and T, most repeat types show a reasonably clear transition to overrepresentation. At arrays sizes of below about eight nucleotides, there are significant deviations between observed and expected values for most repeat types, however the arrays are both under and overrepresented - there is no obvious pattern. At sizes of greater than eight nucleotides almost all arrays are overrepresented with observed numbers typically orders of magnitude greater than expected.

This general pattern is probably most clearly exemplified in the yeast genome (Figure 2.1). In yeast, the mononucleotides A and T do not show a clear transition to overrepresentation and are consistently more numerous than expected even at small array sizes. This is probably explained by the great abundance of short poly (A.T) repeats which exist in all genomes, and not due to dynamic mutation operating at very small array sizes. To investigate this I searched the genome of the bacterium Mycoplasma genitalium for short tandem repeats. Microsatellites are not thought to exist in bacteria and, as expected, the Mycoplasma genome contains very few expanded arrays. However, it does contain a large overabundance of short poly (A.T) repeats. In both yeast and Mycoplasma these short sequences are so frequent and widely distributed that most must lie within coding regions. Consequently they are unlikely to have arisen through slippage mutation, as this would frequently disrupt open reading frames. Instead it seems probable that they are generated through some form of biased point nucleotide substitution.

Poly (G.C) repeat arrays are much less numerous, due to the overall A/T codon bias of the genome, and in yeast these sequences show a clear transition to overrepresentation at sizes above eight nucleotides. The slight deficit of arrays at size eight is consistent with this being a minimum threshold for dynamic mutation, as arrays of this size will tend to expand through biased slippage mutation faster than they are generated by point nucleotide substitution. Yeast dinucleotide and tetranucleotide arrays generally show a clear transition to overrepresentation, although for dinucleotide repeats this appears to be at sizes above six nucleotides. The genome of S. cerevisiae is extremely compact for a eukaryotic organism, with an average of one gene every 2kb, comprising

25 approximately 70% coding DNA, where dynamic mutation will be strongly selected against. Despite this, it contains approximately 4000 expanded (>8 nucleotide) mono, di and tetranucleotide repeats which together contain about 0.38% of the genomic DNA.

In the nematode (Figure 2.2), tetranucleotides show a clear expansion threshold, with all arrays overrepresented at sizes greater than eight nucleotides. For dinucleotides the transition is less clear, but lies between six and eight nucleotides. G and C mononucleotides in the nematode genome show the same pattern as those in yeast, a clear transition to overrepresentation with a deficit of arrays one size below the expansion threshold. However, in the nematode this deficit occurs at an array size of seven nucleotides. Mononucleotide A and T arrays in the nematode genome show no clear expansion threshold, despite being overrepresented at all sizes. Why this should be is not clear, unless there is some genome specific selection against rapid expansion through slippage mutation in these arrays. However, this seems unlikely as short (<6 bp) poly (A.T) repeats are very common in all genomes. Consequently if selection was necessary to prevent disruption caused by frequent dynamic mutation in these tracts, it should also apply to the yeast and human genomes.

Genome sequencing projects tend to begin with gene-rich regions, as a result the nematode sequence data used here probably contains a bias toward coding DNA when compared with the genome as a whole. However, it is still likely to contain more non­ coding DNA than the extremely compact yeast genome. This is supported by the presence of a greater proportion of expanded repeat arrays, more than 22000 in 18.9 Mb, comprising 1.13% of the total DNA.

Although large arrays are all greatly overrepresented in the human genome, the threshold for this transition is less clear (Figure 2.3). Short tandem repeats of all sizes tend to be present in excess, suggesting that the sequence composition of the database is highly non-random. There are known to be many small-scale sequence periodicities within human coding sequences, and the database used here is derived from particularly gene rich regions. As dynamic mutation is expected to be principally a property of non-coding regions, this human dataset is not ideal. Despite this, the pattern of 0/E values for human G and C mononucleotides is consistent with those of the yeast and nematode.

26 Figure 2.1 Distribution of short tandem repeats in the genome of the yeast

Mononucleotides. Dinucleotides. Tetranucleotides.

AT AAAT 14- 16- CA/GT CTTT

14- GA/CT CAAA 2 - GGAA

10- TAGA

10- ATGT TGAA ATTA Ô c

- 9

0 2 4 6 8 10 12 14 16 18 20 22 24 0 2 4 6 8 10 12 14 16 18 20 22 24 9 6 8 10 12 14 16 18 204 Array Size (nucleotides) Array Size (nucleotides) Array Size (nucleotides)

Ratio of observed to expected frequencies of microsatellite repeats in the yeast genome. Expected values are calculated assuming that repeat units are randomly distnbuted at their observed genomic frequencies. Tetranucleotide data comprise both homologous forms of each repeat, thus AAAT = AAAT-t-TTTA. Some G/C nch tetranucleotides do not exist in arrays of greater than two repeats, however this is probably due to the general scarcity of G’s and C’s in the genome. For the sake of clarity, only the eight commonest tetranucleotides are shown. 27 Figure 2.2 Distribution of short tandem repeats in the genome of the nematode

Mononucleotides. Dinucleotides. Tetranucleotides.

AAAT AT 1 0 - 1 6 - O AAA CA/GT

1 4 - C.AAA GA/CT GGAA GC A— C TAGA

TGAA 10- TCCA

Ô TACA c

.1

0 2 4 6 8 10 12 14 16 18 2 0 22 24 0 2 4 6 8 10 12 14 16 18 2 0 22 24 0 4 8 1 2 16 20 Array Size (nucleotides) Array Size (nucleotides) Array Size (nucleotides)

Ratio of observed to expected frequencies of microsatellite repeats in the nematode genome. Expected values are calculated assuming that repeat units are randomly distributed at their observed genomic frequencies. Tetranucleotide data comprise both homologous forms of each repeat, thus AAAT = AAAT+TTTA. For the sake of clanty, only the eight commonest tetranucleotides are shown.

28 Figure 2.3 Distribution of short tandem repeats in the genome of the human

Mononucleotides. Dinucleotides. Tetranucleotides.

4 0

1 4 - AT AAAT 3 6 - 1 6 - CA/GT CCCT

GA/CT 1 4 - CAAA

GC GGAA 2 8 - 10- TAGA

2 4 - ATGT 1 0 - U-l TGAA 20 - Ô ATTA

c 1 6 -

0 2 4 6 8 10 12 14 16 18 20 0 1 4 6 8 10 12 14 16 18 20 4 8 1 2 16 20 24 Array Size (nucleotides) Array Size (nucleotides) Array Size (nucleotides)

Ratio of observed to expected frequencies of microsatellite repeats in the human genome. Expected values are calculated assuming that repeat units are randomly distributed at their observed genomic frequencies. Tetranucleotide data comprise both homologous forms of each repeat, thus AAAT = AAAT+TTTA. For the sake of clarity, only the eight commonest tetranucleotides are shown. 29 In the calculation of expected numbers of arrays at each size I assume that repeat units are randomly distributed throughout the genome. Clearly selective constraints affect genome sequence composition, and systematic analysis of whole yeast chromosomes have shown the distribution of bases to be non-random (Oliver et al. 1992, Murakami et al. 1995). However, it is unlikely that such constraints would act in the same way on each different type and size class of short tandem repeat. It is also possible that localised variation in base composition, such as particular regions of AT or GC richness, could cause an overrepresentation of short tandem repeats containing those nucleotides. Indeed, most genomic regions are subject to such biases (Murakami et al. 1995, Dujon 1996). However, this would affect all array sizes and could not account for the threshold-linked change in array size abundance that I demonstrate in this chapter.

The sequence databases contain large amounts of coding DNA. Many small repeats will exist, and may be transcribed, within exons, but any slippage mutation here will be strongly selected against. These sequences will tend to obscure any transition to overrepresentation caused by the beginning of microsatellite dynamics, and it would be interesting to repeat the study with a database comprising only non-coding DNA. Despite the presence of large amounts of open reading frames in my sequence data, a clear transition is already shown for several repeat types and within different genomes.

Overall the sequence data from three genomes suggest that short tandem repeats show a transition to great overrepresentation at array sizes larger than eight nucleotides. I suggest that this excess is caused by expansion through rapid upward-biased slippage mutation, and that a minimum array size is required for this process. This is consistent with the phylogenetic analysis of Messier et al. (1996). They showed that once tetranucleotide and dinucleotide arrays were increased to a size of eight nucleotides, by substitution events, array expansion followed.

In this chapter, I have used the array size distribution of short tandem repeats in large scale genome sequence databases to investigate the origins of dynamic mutation. To answer the question posed at the beginning of the chapter, a DNA repeat sequence is a microsatellite when its array size is greater than eight nucleotides and it becomes subject to upward biased slippage mutation. The data also suggest that this threshold for expansion is consistently determined by the number of nucleotides in the array rather than the number of repeats. Despite a detailed knowledge of DNA replication, the mechanism of slippage mutation remains elusive, and it is unclear why it should

30 depend upon the number of nucleotides in an array. It has been shown that rates of dynamic mutation can be drastically altered by mutations affecting DNA mismatch repair (Strand et al. 1993). The microsatellite expansion threshold that I observe may therefore represent a structural constraint on the accuracy of this repair process.

31 Chapter 3

An intrinsic mutational constraint on microsatellite array expansion

Abstract Empirical studies show that microsatellite arrays grow in size through upwardly biased stepwise mutation. Conversely, population studies and analysis of genome sequence data, show that most arrays are small (<10 repeats). A strong constraint must act to limit array size increase, but at present there is no understanding of what this constraint might be. Various functional roles have been suggested for microsatellites, but it is unlikely that a generalised selective constraint applies to all types and size classes of these short tandem repeats. Instead, I present a simple model to show that the growth of arrays through stepwise mutation is opposed by the increasing instability of large arrays, due to unequal recombinational exchange. In this neutral model, array sizes are determined entirely by their intrinsic mutational properties. Under the assumptions of the model, I use mutational parameters from empirical studies to generate expected array size distributions within genomes. These are compared with observed array size distributions in genomic sequence data from yeast, nematode and human. These comparisons suggest that unequal exchange alone does not provide strong enough constraining force to account for the observed bias towards small array sizes. An increasing empirical understanding of the mutational dynamics of microsatellite loci will be required to more accurately define the parameters of the model, and allow a fuller examination of the hypothesis that array expansion is constrained by unequal recombinational exchange.

32 3.1 Introduction In the previous chapter, I proposed the existence of a minimum threshold size for dynamic mutation. That such a minimum limit exists is intuitively obvious. Without it, every nucleotide, or group of nucleotides, would potentially be a microsatellite, and rapid dynamic mutation would dominate DNA sequence evolution. However, the establishment of a lower size limit for dynamic mutation cannot alone account for microsatellite array size dynamics. The upward bias to stepwise mutation observed in empirical studies suggests that above the minimum threshold size, arrays should expand indefinitely through dynamic mutation. In the absence of some constraint on size increase, the genome would become dominated by microsatellite arrays.

In this chapter, I first argue that this limit to size increase does not arise through a general selective constraint on microsatellite function, but that it may be explained by processes of genomic turnover. I then present a simple model to show how interaction between the two components of dynamic mutation, stepwise mutation and unequal exchange, could put an intrinsic limit on microsatellite array expansion.

Strong selection against rapid size change could limit microsatellite array expansion within open reading frames. However, empirical data suggest that a size constraint applies to microsatellite arrays throughout the whole genome. In a study of partial sequences of primate DNA, Jurka and Pethiyagoda (1995) found large numbers of short repeat sequences but very few expanded arrays. This observation is supported by recent data from genomic sequencing projects (Chapter 2). As an example. Figure 3.1 shows the array size distribution of dinucleotide repeats in yeast and humans.

In each case only arrays of five repeats and greater are included. These arrays are all above the proposed threshold for dynamic mutation, (Chapter 2), and so should be subject to rapid upward biased change. But, though large arrays exist, the distribution is still dominated by small microsatellites. Hybridisation screening in a variety of taxa has revealed the presence of numerous microsatellites of >20 repeats, and microsatellites associated with human genetic disease may expand to contain hundreds of repeat units (Yu et al. 1992). Does the relative scarcity of large microsatellites imply a genome-wide selective constraint on array size, or can an explanation be based entirely on their mutational properties.

33 Figure 3.1 Dinucleotide array size distributions from genome sequence data

Yeast Human

300- ■ ------5 0 0 j

400-

200 - 300-

200- î 100- 1 100-

10 14 18 22 26 30 34 38 10 18 26 34 42 50

Array Size Array Size (nucleotides) (nucleotides)

3 .2 Evidence for and against a general functional role for microsatellites Almost as soon as it was realised that eukaryotic genomes were rich in poly (CA.GT) tracts, functional roles were suggested for these elements. Their potential to adopt a left handed DNA conformation (Z DNA), (Nordhiem and Rich 1983), indicates a possible role in regulating gene expression. of poly (CA.GT) tracts into expression vector plasmids can increase enzyme activity by up to ten-fold, depending upon the location of the repetitive element (Hamada et al. 19846). This finding leads to tantalizing speculation about a possible role for microsatellites in morphological evolution. King and Wilson (1975), suggested that only changes in gene expression could account for the large morphological differences between humans and chimpanzees, species which are very similar at the molecular level. Recent work indicates that microsatellite mutation may indeed vary in direction and rate between humans and chimps (Rubinsztein et al. 1995), and there are also differences in the organisation and dynamics of microsatellites between other taxa (Pardue et al. 1987 Brooker et al. 1994, Schug et al. 1997).

Other functional roles for microsatellites have also been proposed. Poly (CA.GT) microsatellites have been shown to enhance reciprocal meiotic recombination and cause multiple recombination events in yeast (Treco and Amheim 1988). This may be due to the increased sequence homology between pairing chromatids, or through the

34 increased binding of enzymes causing homologous pairing to Z-DNA. The genomic juxtaposition of different classes of repeated DNA has been a frequent observation (Tautz et al. 1986). The association of A rich microsatellites with A rich retroposons could indicate a functional role for microsatellites in the higher order organisation of the genome (Nadir et al. 1996). Homology driven integration of retroposons at ‘navigator’ microsatellite loci causes site-specific generation of A-rich sequences, which are thought to play a role in chromatin organisation (Gasser and Laemmli 1987).

Although these various functional roles have been assigned to microsatellites, none seems adequate to explain the general constraint on size observed across all types and sizes of repeat unit. There is good evidence against a general role for microsatellites in the regulation of gene expression. The distribution of expanded arrays (>10 repeats), used in genomic mapping studies, is approximately random across the genome as a whole, with no tendency towards clustering in euchromatic (gene rich) regions (Dib et al. 1996, Dietrich et al. 1996 ). Additionally, there is evidence that enhancement of gene expression may be particular to poly (CA.GT) arrays and not a general feature of all arrays capable of taking on Z DNA conformation (Hamada et al. 19846).

Generally function is attributed to the Z -DNA forming potential of poly(CA.GT) microsatellites, but the same arguments do not apply to other mono, di and tetranucleotide repeats which also appear to besufejecè to some form of size restriction. Furthermore, these studies have concentrated on relatively large arrays (>15 repeat units), due to the hybridisation procedures used to detect microsatellite loci. Genome- wide examination of microsatellite sizes (Figure 3.1), shows a continuous decline in frequency as repeat number increases, with no peak corresponding to a functionally important size. Although selection may be important in constraining array size at some microsatellite loci, particularly trinucleotide repeats within or near exons, it is unlikely to be a generalised process acting on all short tandem repeat arrays.

3.3 An intrinsic mutational limit to microsatellite expansion The realisation that eukaryotic genomes contain huge amounts of tandem repetitive, non-coding DNA has focused much attention on mechanisms of genomic turnover (Dover 1993). Processes such as gene conversion, slippage, unequal crossing over and retrotransposition have been shown to be important in the evolution of many genes including the human MHC complex (Kawamura et al. 1992), immunoglobulins (Rudikoff et al. 1992) and T-cell receptor genes (Kuhner et al. 1992). Crucially, it has been suggested that in DNA sequences not constrained by natural selection, sequence similarity will inevitably lead to the generation of tandem repetitive patterns through

35 unequal crossing-over and slippage mutation (Smith 1976, Stephan 1989). Here, I propose that the forces of genomic flux acting on microsatellite loci, stepwise mutation and unequal exchange, produce an intrinsic constraint on array size. This proposal is a purely neutral model of microsatellite evolution.

Models of stepwise mutation have generally been found to be in good agreement with allele frequency data at microsatellite loci within closely related populations (Valdes et al. 1993). Rubinszstein et al. (1994) demonstrated that an upwardly biased stepwise mutation process is required to explain the distribution of CAG repeat alleles at the Huntingdon’s disease gene, and DiRienzo et al. (1993) found that a two phase model, in which occasional mutations of greater than one repeat are allowed, better fitted observed data in human populations. However, while stepwise models may be appropriate for population genetic studies over short evolutionary timescales, it is clear that they cannot alone explain the evolution of microsatellite array size distribution. Empirical evidence indicates the existence of a bias towards upward stepwise mutation (Strand et al. 1993, Weber and Wong 1993, Banchs et al. 1994, Rubinzstein et al. 1994), Under stepwise mutation alone, arrays would increase to infinity with a positive probability, as there is no stable equilibrium for array size (Walsh 1987, TacKiJa 199^),

Although stepwise mutation is the dominant mutational process affecting microsatellite L ..... Wolff et al. (1989) used markers flanking the human D1S7 VNTR to suggest that unequal crossing over between homologous chromosomes is not the major mechanism involved in the generation of new alleles at VNTR loci. Statistical analysis of linkage data from 12 new alleles revealed that flanking DNA sequences were frequently not exchanged during mutation events. However, this does not rule out the possibihty of large size change through unequal sister-chromatid exchange at mitosis or meiosis, rephcation slippage or insertion or deletion of repeat units.

one repeat me muiaiionai process is eiiecuveiy sicpwisc, anu me pcisi&Lcnec umc ui the array is found to be up to two orders of magnitude higher than for unconstrained bite sizes, where the repeat is frequently recombined out of existence. This supports observed data from interspecies comparisons. For microsatellites, where the mutational process is predominantly stepwise, loci are frequently polymorphic in different but closely related species (Rubinzstein et al. 1994, Fitzsimmons et al. 1995). Whereas for minisatellites, where the mutational process is predominantly of unequal exchange, polymorphic loci in one species are not usually polymorphic in related species. Thus stepwise mutation and unequal recombination have very different

36 consequences for array size evolution. Under the former, small-scale size changes occur and individual arrays have long persistence times. Under the latter, arrays may rapidly explode to large size but their inherent instability ensures that they will only exist transiently before collapsing back to small size.

The purpose of this chapter is to demonstrate that the interaction of the two processes can constrain array sizes. I present a simple model of microsatellite array dynamics which combines upwardly biased stepwise mutation with unbiased unequal exchange. The model generates array size distributions which are then used in preliminary comparisons with data from the yeast, nematode and human genomes.

3.4 Description of the model Under the model scenario, an array expands gradually through an upwardly biased stepwise mutation process. As it expands, unequal recombination events become more probable and the average ‘bite size’ of such events increases. Unequal recombination increases or decreases array size with equal probability, but because instability increases as array size gets larger, the overall effect of this process is to reduce array size. However, as a minimum length of sequence similarity is necessary for unequal exchange to occur (Stephan 1989), microsatellite arrays should never be reduced below the minimum threshold size necessary for stepwise mutation. This is crucially different from the case of minisatellites. The larger core repeat unit size of minisatellites means that the minimum sequence similarity may correspond to a single repeat unit, meaning that the array can disappear completely through unequal exchange.

A simple iteration model was written in ‘C’ in collaboration with Daniel Falush. It follows Harding et al. (1992) in examining the change in size of a single microsatellite array through time. Over a large number of generations the resultant size distribution reaches a steady state which is assumed to reflect that of a large number of arrays evolving independently within a genome. Under this assumption, I am suggesting that when all microsatellite repeat arrays are sampled from millions of base pairs of sequence from different regions of the genome, they may be regarded as a population of independently evolving sequences. The model also assumes no birth and death of microsatellites. Downward stepwise mutation is not permitted at the minimum size threshold, and the assumption of a minimum overlap for unequal exchange ensures that arrays are not lost by this process. In this way, arrays subject to dynamic mutation are treated as an independent genomic entity, and the much less frequent process of point nucleotide substitution is ignored. This is clearly biologically unrealistic, but the model has been kept intentionally simplistic, as none of the mutational factors affecting

37 microsatellite arrays are understood well enough to allow for precise comparisons with genomic data. Rather, the purpose of this model is to demonstrate the potential for a mutational constraint to array size and suggest how this may be further investigated.

3.5 Model operation During a run of the program, array size (n) is modified according to three mutational processes, stepwise increases (n+\), stepwise decreases (n-\), or unequal exchange, (n + or - (1 to n-\)). The probability of each event occurring is specified by the ratios a, P and Ô respectively. At each mutational event the program calculates three time periods,

UPTIME = -log(x)/a«

DOWNTIME = -log(x)/p(M-l)

UNEQUALTIME =-\og{x)/àn^ where x is a random number between 0 and 1. The shortest of these time periods determines which mutation event has occurred and the array size is modified accordingly. Mutation rates are specified per repeat, such that larger arrays become

Valdes ct al. (1993) do not find such a con elation between increasing an ay size and mutation rate.

Unequal exchange is modelled as unequal sister strand exchange, allowing a single chromosomal lineage to be followed through time. When an unequal exchange event is specified by the model, two random cuts are made within the array to simulate the points of strand exchange. The new size = n + (cut 1 - cut 2), so n may increase or decrease depending upon the position of the two cuts. Under this mechanism, two mutation events are necessary for unequal exchange to occur. A consequence of this is that the probability of unequal exchange increases with the square of increasing array size, while the probability of stepwise mutation, either up or down, only increases linearly. Thus the relative instability of arrays increases greatly as they become longer.

Two forms of output from the model are examined. The life history of an array - its changes in size over time, and the distribution of time spent at each size over a very large number of generations.

38 Figure 3.2 Flow chart of model operation

Calculate: UPTIME DOWNTIME UNEQUALTIME

THEN IF UPTIME < DOWNTIME & n + (Cut 1 - Cut 2) UNEQUALTIME ELSE THEN IF DOWNTIME < UNEQUALTIME

ELSE

Calculate Cut 1 - Cut 2

Iterations start with array size (n) = 1. UPTIME, DOWNTIME and UNEQUALTIME correspond to the length dependent probabilities of stepwise increase; stepwise decrease and unequal exchange respectively.

3.6 Model output Output from the model is considered in three stages. Firstly, I show that there will be no stable array size distribution for arrays evolving under stepwise mutation alone. I then show that a combination of stepwise mutation and unequal exchange can constrain expansion and produce stable array size distributions. Finally, I use mutational parameters from empirical studies of microsatellites to generate array size distributions which are then compared with observed data.

3.6.1 Arrays evolving under stepwise mutation only Figures 3.3-3.5 show changes in size over time for arrays evolving only through stepwise mutation, but subject to different degrees of bias in the direction of this process. For each of these arrays the rate of upward stepwise mutation is set at IxlO'"^ per repeat. This gives a rate of 1x10 ^ for arrays of 10 repeats, a figure typical of those obtained in direct observations of mutation rates at dinucleotide microsatellites. This dependence of mutation rate on repeat length is crucial in determining size dynamics as it may vary by two or more orders of magnitude during the lifetime of an array.

39 Figure 3.3 shows the course of three independent repeat arrays evolving under stepwise mutation with a 2:1 bias in favour of gain of repeats. After an initial and variable lag phase, when mutation events are rare, the array size dynamics become predictable and the repeats expand rapidly towards infinite size. This expansion phase is reached quickly on an evolutionary timescale, within tens of thousands of generations.

Figure 3.3 Array size evolution under upward biased stepwise mutation only

Life histories of three independent arrays when a = IxlO""^ and |3 = 5x10'^

50 1 50

1 4 0 ■ :

3 0 -

N 20-: 2 0 -

10 ■ 1 0 .:

20000 40000 50000 100000 30000 60000

Time in Generations

Figure 3.4 shows the life histories of arrays evolving under stepwise mutation, but with a smaller bias in favour of gain of repeats. There is no stable equilibrium for array

size. a:p ratios of 1.25:1 and 1.05:1 increase the length of the lag time, but a rapid and predictable expansion still occurs within 200000 generations. Figure 3.5 shows that even with unbiased stepwise mutation, arrays will still grow to large size (>20 repeats) with high probability.

Under the assumptions of this model, a downward biased stepwise mutation process effectively constrains array size and so could possibly explain the observed bias towards small microsatellites (Figure 3.5). However the persistence of this array is an artifact of the model, in which arrays are not allowed to contract below the minimum threshold size for slippage mutation. In reality, a downward bias to stepwise mutation would frequently contract repeat arrays to below this threshold, and in the absence of some other equally rapid generating process microsatellites could not persist. Thus the model demonstrates, as expected, that in the absence of selection, there will be no stable array size distribution for microsatellites evolving under stepwise mutation only.

40 Figure 3.4 Array size evolution under small upward biases to stepwise mutation

a = 1x10-4 p = 7.5x10'^ a = 1x10-4 p = 9.5x10-5 50 501

40

30

20 I 10 0 100000 200000

Time in Generations

Figure 3.5 Array life histories under stepwise mutation only

Unbiased Downward Biased a = 1x10^ P = 1x10-4 a = 1x10^ P = 5x10-4 40 40

30 30-

20 2 0 -

10 10

0 ^ 1 1 1 |"i 0 100000 200000 0 200000 400000

Time in Generations

3.6.2 Unequal exchange constrains array size expansion Figure 3.6 shows the life history of a typical array evolving under the dual processes of stepwise mutation and unequal exchange. At small array sizes (<10 repeats), upward biased stepwise mutation dominates and the microsatellite grows gradually. As it increases in size the array becomes a more effective template for unequal exchange, which decreases or increases size with equal probability. However the dependence of mutation rate on repeat number means large arrays are highly unstable and have very

41 low persistence times. Consequently array life history is dominated by stepwise mutation at relatively small size - the general pattern observed in empirical studies.

Figure 3.6 Array size evolution under stepwise mutation and unequal exchange

a = 1x10-3 p = 5x10^ Ô = 5x10'^

120

I 100 - c p s 80 CHh - (D 60 .5

4 0 - 1 20

0 r T- -|— I— r ) 20000 400004 0 0 0 0 60000

Time in Generations

Figure 3.7 demonstrates that unequal exchange can constrain array expansion and produce stable size distributions. The graph shows the array size frequency distributions for a variety of different ratios of stepwise mutation to unequal exchange. Distributions represent the amount of time spent at each size by a single array during 100 million generations of evolution. This is assumed to represent the steady state achieved by a large number of independently evolving arrays within a single genome. In each case there is a 2:1 bias in favour of upward stepwise mutation, as suggested by empirical studies. Mean array size is determined by the ratio of stepwise mutation to unequal exchange, and is kept below 20 repeats even when stepwise mutation is two orders of magnitude more frequent than unequal exchange.

Thus it is easy to show that, in principle, an unbiased process of unequal exchange could constrain the tendency of arrays to grow in size through biased stepwise mutation, through the inherent instability of large arrays. However, the purpose of this chapter is to go further and ask if this hypothesis is supported by what little we know about the processes of mutation at microsatellite loci. As with the previous chapter, I use the size distribution of repeat arrays within genomes as a ‘fingerprint’ of the mutational processes which may have produced them. I compare size frequency

42 distributions generated iVoin the iteration model with observed genomic data from yeast, nematode and human to examine which combination of mutational parameters produce the best fit. The parameters of interest are degree of bias in stepwise ratio, and ratio of stepwise mutation to unequal exchange, because these are the ones for which some empirical data is available. Absolute mutation rates are unimportant as the distributions are assumed to be static.

Figure 3.7 Array size distributions after 100 million generations

0 . 4 - Mutation ratio a : (3 : Ô

o c

»

010 20 30 40 50 Array Size (repeat units)

3.6.3 Comparison of model output with observed genomic data The model only considers array behaviour under dynamic mutation, other genomic processes are assumed to act at rates low enough to be ignored. This necessitated isolating a subset of such arrays from the observed genomic data. Consequently, only repeat arrays of greater than 10 nucleotides were considered, as these lie above the (hieshold for dynamrc mutation outlined previously. Additionally, there is the comphcation of selection against the products of dynamic mutation in coding regions. Repeats of >10 nucleotides may exist within open reading frames, but are not expected to be subject to rapid size change. This will necessarily cause a bias towards smaller repeat arrays. In order to partially overcome thrs, I use 0-E values for ai'rays of greater than 10 nucleotides (for calculation of expected numbers of arrays at each size, see

43 Chapter 2). Although this is somewhat arbitrary, the aim was to overcome the effects of sequences under strong selection, it does not matter if some microsatellite arrays were also excluded. Genomic array size frequency distributions thus consist of observed - expected numbers of arrays at each size greater than 10 nucleotides, for mono, di and tetranucleotides.

The goodness of fit between observed size distributions and model data are tested by calculating the log-likelihood support function (({)) for each combination of mutation parameters. This function takes the form,

(|) = E logj>. Wherea is the observed frequency, and p is the expected frequency of arrays in the zth

size class. As (|) is calculated from the log^ of fractions it will necessarily be negative. Each set of mutation parameters gave an expected dataset, which was compared with a specific genomic array size distribution to give a value of (|). Higher values of (|) represent better the fit between observed and expected distributions. Results are presented in the general form of the likelihood surface, generated by varying the ratios of upward and downward stepwise mutation (a and p) to unequal exchange (Ô). On figures 8-11, the x axis corresponds to the ratio a:0, the y axis to the ratio p:0, and the

z axis to the value of (]). The highest point of the likelihood surface represents the best fit of expectation with data. Arbitrary minimum contour values have been chosen to assist clarity, all outlying regions of the surface lie below these values. The aim of the

analysis is to determine whether, when the ratio of a:p is approximately 2:1, the

maximum value of (|) corresponds to a ratio of stepwise mutation to unequal exchange

comparable with that found in empirical studies. The line of this 2:1 ratio of a:p is

indicated on each graph.

Figure 3.8 shows the (|) surface for human dinucleotide repeat arrays, when ratios of

stepwise mutation (a and p) to unequal exchange (ô) range from 0 to 100.

44 Figures. 8 ([) surfaces for comparisons of model size distributions with human dinucleotide data

(solid line indicates a: |3 = 2:1)

80 -

60 -

^ 40 -

20 -

-3200 -3140 -3080 -3020 -2960 -2900 -2840 -2780 -2720 -2660 -2600

Rather than a single point of maximum likelihood, there is a clear ridge of best fit between model output and data. This is not surprising, as at any level of Ô there will always be a balance between a and (3 which will best fit the data. The point of interest is where the observed 2:1 bias towards upward stepwise mutation crosses the ridge of best fit. It is clear from Figure 3.8 that such a bias can only account for the observed anay size distribution at very high levels of unequal exchange relative to stepwise mutation. Figure 3.8 also shows that an increasing bias is required to best explain the data as the ratio of a and P to Ô increases. This is to be expected, as the overall effect

45 of ô is to constrain array size expansion. Hence as the relative frequency of 8 decreases, a higher rate of p is required to account for the observed bias towards small arrays.

Figures 3.9 to 3.11 show the (|) surfaces for mono, di and tetranucleotide repeats in

Humans, nematode and yeast. Stepwise mutation to unequal exchange ratios of up to 20:1 are considered. All the surfaces show ridges of best fit, and all show an increase in the bias towards p as the ratio of a and p to Ô increases. If a 2:1 upward bias to

stepwise mutation applies to mono and tetranucleotide repeats as well as to dinucleotides, then it is clear that very high levels of unequal recombination would be required to explain the observed data. With the exception of mononucleotide repeats in humans, all the maximal values of (|) are at a:8 values of less than 10:1, where the 2:1 a:p bias exists. It must also be remembered that these ratios are determined by array size. For a dinucleotide repeat, a a:0 ratio of 10:1 at size 2 becomes a 1:1 ratio at 20 nucleotides, because the rate of 8 increases with the square of array size. This means that for dinucleotide microsatellites of approximately 10 repeats, probably the most studied type, equal rates of stepwise mutation and unequal recombination should apply. This is not supported by empirical studies where a:0 is probably greater than

20:1 for arrays larger than 10 repeats (Strand et al. 1993, Weber and Wong 1993, Banchs et al. 1994).

For each organism, tetranucleotide repeat arrays appear to be particularly constrained. In yeast and nematode a downward biased stepwise mutation process is required to account for the array size distribution at every value of a:8. Although expanded

tetranucleotide repeats were rare in the sequence data I examined, they are known to be highly polymorphic when present (Mahtani and Willard 1992, Weber and Wong 1993,). It is possible that as the core repeat size becomes large, slippage mutation

becomes less frequent and the ratio of Ô to a and p will increase. In this way, tetranucleotide repeats may begin to show more ‘minisatellite-like’ behaviour. While this might account for the fact that tetranucleotide array size distributions are more constrained than those of mono and di nucleotides, it is unlikely that even this can account for the extremely high levels of unequal exchange implied from this analysis.

46 Figure 3.9 (|) surfaces for comparisons of model size distributions with yeast microsatellite data

(solid line indicates a: (3 = 2:1)

Mononucleotides p:5 Contour Values M i n 3750 Max ------3450 Interval = 30

10

Dinucleotides

10 Contour Values

Min — -1500 M a x 1300 Interval = 20

0

Tetranucleotides

p: 6 lo: Contour Values

Min -3750 Max — -3 150 Interval = 60

47 Figure 3.10 (() surfaces for comparisons of model size distributions with nematode microsatellite data

(solid line indicates a: (3 = 2:1)

Mononucleotides

p : Ô Contour Values

Min ------2600 Max ------2380 Interval = 20

10

Dinucleotides

Contour Values p ; ô Min ------2400 Max —r -2150 Interval = 25

Tetranucleotides

(3: Ô lo’ Contour values Min ------4500 Max — -3800 Interval = 70

48 Figure 3 II ^ surfaces for comparisons of model size distributions with human microsatellite data

(solid line indicates (x: (3 = 2:1)

Mononucleotides

(3: Ô 10 Contour Values

Min ------5500 Max ------5300 Interval = 20

Dinucleotides.

(3:0 10- Contour Values M i n 3000 Max ------2800 Interval = 40

Tetranucleotides.

(3:6 10 Contour Values M i n 10000 Max — -9200 Interval = 80

10 a : Ô 49 3.7 Discussion of model output I have investigated the possibility that, in the absence of any generalised selective constraint, microsatellite array sizes are determined by their intrinsic mutational dynamics. My general finding, from the simple iteration model presented here, is that the mutational process of unequal recombinational exchange will act to constrain array size expansion. Simulation studies of the behaviour of repeat arrays evolving under unequal exchange predict that they will have short persistence times and rapidly disappear in the absence of some form of amplification process (Walsh 1987, Harding et al. 1992). At microsatellite loci, upward biased stepwise mutation provides just such an amplifying force.

Previous comparisons of model microsatellite evolutionary dynamics with empirical data have generally been in a population context and consider allele size frequency distributions at a single locus (Valdes et al. 1993, Deka et al. 1995). Typically, array sizes at these loci are large (>20 repeats), meaning that they are quite unrepresentative of microsatellites in the genome as a whole. I have used data from large-scale genome sequencing projects to study all microsatellite within a genome, a more preferable approach when considering the upper and lower limits of array size evolution.

The purpose of this study has been to use general mutational properties of microsatellite loci, derived from many empirical studies, to try and account for general distributions of arrays sizes within genomes. The size distributions will be the result of a balance between mutational processes tending to cause array expansion and those tending to cause array contraction. Data on the former is quite extensive, while data on the latter is almost entirely lacking. Microsatellite arrays will tend to expand through biased stepwise mutation. This process is reasonably well understood and has been convincingly demonstrated in a variety of different taxa. On the other hand, while unbiased recombinational exchange will effectively constrain array expansion, as instability increases with array size, little is known about the mechanism of this process. It has even been asserted that it is not important in microsatellite evolution (Stephan and Cho 1994). Consequently, I have only attempted to describe array size distributions in terms of the degree of bias to the stepwise mutation process, and the ratio of stepwise mutation to unequal exchange, on which some data is available.

My specific finding is that although unequal exchange can act to constrain array size, it would have to operate at extremely high levels in order to adequately explain the observed data. Most of the empirical data on the mutational processes of microsatellites is derived from dinucleotide repeat loci. Under my mutational constraint model, where

50 the rate of unequal exchange increases with the square of array size (Ôn^), dinucleotide array size distributions in the yeast, nematode and human genomes are all best explained by ratios of stepwise mutation to unequal exchange of <10 :L This means that at array sizes of >10 repeats, unequal exchange will be the dominant process. This is not supported by empirical observations, which show that stepwise mutation is still the most common mutational process for microsatellite arrays of 10-20 repeat units.

Additionally, arrays of = 20 repeat units show long persistence times in inter-species comparisons (Fitzsimmons et al. 1995, Rubinsztein et al. 1995), as predicted under a predominantly stepwise mutation model (Harding et al. 1992).

This study represents a preliminary comparison, and could be refined to provide a better dataset with which to test mutational constraint models. It is inevitable that genome sequencing projects have concentrated on the genomes of lower organisms, with their very high gene density, and the gene rich areas of higher organisms. Consequently, all the sequence data used in this study contains a high proportion of coding regions, where the products of dynamic mutation will be strongly selected against. In the absence of dynamic mutation, the probability of finding repeat arrays decreases with increasing array size, necessarily causing a bias towards smaller arrays. This effect was decreased by excluding from the observed size distributions the number of repeat arrays which were expected to occur by chance under the assumption of a random distribution of repeat units. However, the problem could be more effectively overcome by compiling sequence databases containing only non-coding DNA. In principle this is already possible for the yeast genome, where the position of all open reading frames is already known. It could also be performed on any sequence data by using software designed to find exons, such as the GRAIL package. With the implementation of such procedures and the continuing progress of genome sequencing projects, it is clear that much more extensive and informative sequence databases will become available.

Another reason for the failure of this mutational constraint model to explain the observed data, is that while its general properties agree with empirical observation of mutational processes, specific assunptions must be made in the absence empirical data. As stated earlier, almost all empirical knowledge of microsatellite mutation is derived from studies on relatively large arrays (>15 repeat units). It is possible that the parameters of slippage mutation are different for small arrays - there may be a smaller bias in favour of gain of repeats and no length dependence for mutation rate. Either of these two possibilities would increase the time spent at small sizes and reduce the amount of unequal exchange required to account for such biased size distributions.

51 Detailed assumptions of the model may also be incorrect. Stepwise mutation rates may increase exponentially, instead of linearly, with increasing array size, for example. However, the model serves its general purpose, which is to demonstrate that the inherent instability of arrays through unequal exchange can place a constraint on an upwardly biased stepwise mutation process.

Much of the genomes of eukaryotes are composed of non-coding, repetitive, apparently functionless ‘junk’ DNA. The ability of modem molecular techniques to harness these sequences for a variety of analytical purposes has focused much attention on the processes of genomic flux which produce them. This has revealed a world of dynamic sequences, expanding, spreading and contracting through processes of turnover largely oblivious to selection at the level of the organism. Under such circumstances any selectionist - neutralist argument becomes redundant - the answer merely depending at which level the phenotype is drawn. It seems likely that the dynamics of microsatellites, as well as those of other tandemly repeated sequences, such as minisatellite and satellite DNA, are largely determined by such intrinsic processes. This does not rule out the possibility that for some microsatellite loci, array sizes are influenced by selection at the level of the organismal phenotype.

There is some evidence that processes of genomic turnover and natural selection interact in what Dover calls ‘molecular co-evolution’ (Dover 1993). Unequal crossing over is thought to homogenise co-evolving regions of the MHC locus in primates (Klein et al. 1993), and the latitudinal dine in array size at a TG repeat locus in the D. melanogaster per gene may be related to the thermal stability of the gene function (Costa et al. 1992, Peixoto et al. 1993). Another particularly intriguing possibility for molecular co-evolution concerns those microsatellites most associated with coding regions - the trinucleotide expansion diseases.

3.8 Microsatellites and human disease Several inherited human diseases have recently been linked with microsatellite loci. The most well studied of these are the trinucleotide expansion related disorders, where the disease phenotype is associated with a huge increase in microsatellite array size. The instability of these arrays, leading to them being termed ‘fragile-sites’, show features of the mutational constraint model presented here. In this section, I first describe the dynamics of disease related microsatellites, and then describe how these dynamics can be accounted for by my model.

The first genetic disease of this type to be reported was Fragile-X-Syndrome, the most common form of familial mental retardation (Yu et al. 1992). The disease phenotype is

52 caused by dynamic mutation of a CCG trinucleotide repeat in the 5’ untranslated region of the FMRl gene. On normal X chromosomes this repeat array is polymorphic, usually with 6-50 copies of the CCG motif. Above 50 copies the repeat begins to undergo larger changes in size, through the so-called pre-mutation phase of up to 230 copies. Carriers of premutation alleles can transmit still larger changes in array size to give full mutation alleles of >200 repeats. Three other fragile sites have since been discovered, all are dynamic mutations of CCG arrays but the genes associated with these sites, if there are any, have not been identified (Sutherland and Richards 1995).

Dynamic mutation of AGC microsatellite arrays has also been found to be associated with a number of human neurological disorders, most notably Myotonic Dystrophy (DM) (Sutherland et al. 1991). In DM the repeat array undergoes a similar dynamic mutation to Fragile -X, but in this case there is not such a clearly determined threshold for full mutation, rather the severity of the disorder increases with increasing array size (Harper et al. 1992). In some of the other neurogenetic disorders linked with trinucleotide expansion the ACG repeat appears to code for polyglutamine and increases in copy number of as little as 5% can result in the disease.

The discovery of an increasing number of such loci has provoked interest in the relationship between repeat expansion and the actual genetic lesions which cause disease. There has been much debate as to whether the disease phenotypes arise through generalised processes, which potentially affect many loci, termed trans effects, or whether the effects are locus specific, so-called cis effects. Examples of trans effects would be mutations in mismatch repair enzymes causing elevated stepwise mutation rates, or an increase in the méthylation of the CCG repeat motifs which occurs in expanded alleles at the Fragile X locus. Conversely, the intrinsic mutational dynamics of a particular locus will play a cis acting role in the development of the disease phenotype.

It seems likely that bothcis and trans acting factors are likely to be involved in the expression of trinucleotide expansion related diseases. It is known that mutations in the yeast MLHl and MSH2 genes, both of which are involved in mismatch repair, cause greatly increased levels of stepwise mutation in repeat arrays (Strand et al. 1993). A similar phenomenon affecting human DNA repair enzymes could perhaps explain the tissue specific repeat instability exhibited by Myotonic Dystrophy and Huntingdon’s disease. In both of these cases, the target tissues for the disorder have increased array sizes when compared with other tissues (Anvret et al. 1993, Telenius et al. 1994).

53 The possibility of molecular co-evolution between natural selection and processes of genome turnover arises from the suggestion that dynamic mutation of trinucleotide repeat arrays might represent some sort of adaptive mutation (Richards and Sutherland 1995). A failure of DNA mismatch repair is thought to be responsible for ‘so-called’ adaptive mutation in bacteria by allowing the rapid accumulation of polymerase errors (Cairns and Foster 1991). Richards and Sutherland hypothesise that stepwise mutation at trinucleotide repeat loci allows for a rapid and flexible selective response to environmental stress if they directly affect gene function. Other possible examples of molecular co-evolution at microsatellite loci are the Huntingdon’s disease and 1 loci, where c ^ e rs of moderately expanded arrays have MltMyiS o f hum W r Ox OTir59nV»q higher fitness^than carriers of smaller ‘hbrmal’ alleles (Frontali et al. 1996).

In addition to potential frawj-acting factors, it will be observed that a dynamic mutation model, like that presented in this chapter, can provide a cis acting explanation for many of the peculiar features of the trinucleotide repeat diseases. In the terms of the model, normal alleles in the size range 6-50 repeat units may be regarded as small arrays, principally subject to stepwise mutation, and exhibiting a gradual tendency to increase in size due to the upward bias of such changes. The pre-mutation alleles (50 - 200 repeats) represent medium arrays which become unstable through a higher frequency of large size changes due to unequal exchange or unconstrained slippage mutation. Large array size changes at premutation alleles will occasionally produce much larger arrays in the so-called full mutation range and, in the case of Fragile X, these are responsible for the disease phenotype. With Fragile X, the probability of producing affected children increases as the premutation allele gets larger. This is predicted by the dynamic mutation model, in which increasing array size is correlated with increasing instability.

The instability of large arrays is not limited to meiotic chromosomes, as demonstrated through somatic mosaicism for array size at the Fragile X locus (Wohrle et al. 1993). However some from of imprinting may operate within this region as only maternally transmitted premutations subsequently expand to full mutation size post-zygotically. The predictions of a dynamic mutation model may also offer insights into the general prevalence of microsatellite related disorders. Many human gene loci may contain trinucleotide repeats, but these will generally be small and unlikely to expand through instability. Founder effects in human populations will result in the rare fixation of large alleles at particular loci. The greater instability of these large alleles make them much more likely to undergo large size changes into the pre and full mutation states. Such an effect is illustrated by the population history of Myotonic Dystrophy, where some

54 African populations do not contain expanded alleles at the DM locus, and are also free from the disease (Goldman et al, 1994).

Population effects would also account for the fact that mouse homologues of the human genes containing unstable repeats generally have shorter repeats with lower levels of polymorphism. Richards and Sutherland state that dynamic mutations and disease causing trinucleotide expansions have been found only in the human genome. I predict that dynamic mutation is a general feature of all microsatellite loci, in any organism. Genetic drift within populations will usually cause the loss of the rare large alleles at trinucleotide repeat loci that act as a template for the pathalogical expanded arrays. Thus though trinucleotide repeats may exist within many exons, recognisable phenotypic effects will only occur at a few loci, and these will differ between species. We have not found trinucleotide repeat related disease phenotypes in the mouse because we are not looking at the right loci yet.

3.9 Conclusions I have shown that the two components of dynamic mutation, stepwise mutation and unequal exchange, can interact to limit microsatellite array expansion. I have also shown how large scale genome sequence data may be used to test predictions about microsatellite evolutionary dynamics. The simple model of dynamic mutation presented here is generally supported by observed genomic data, but the specific mutational parameters I chose are not. However, an increasing empirical knowledge of the mutational processes at microsatellite loci will allow the development of more realistic models. These, combined with more refined analysis of genome sequence data should lead to a greater understanding of the evolutionary dynamics of microsatellite arrays.

55 Chapter 4

Conservation through the maintenance of genetic diversity

Does genetics matter in conservation? At first sight the answer to this question is a resounding yes - attempts to conserve biodiversity are attempts to conserve genetic diversity. However, the practical question for conservationists is more subtle; should management strategies for particular species be directly concerned with preserving genetic variation, or should they concentrate on the traditional approach of population demography? An example of the importance of genetic considerations is in breeding programs in captive populations of highly endangered species. In almost all of these cases great care is taken to control inbreeding and maximise the amount of genetic variability within existing individuals (Templeton and Read 1983, Ebenhard 1995). These situations are slightly artificial however, as natural populations of such small size would probably become extinct for demographic reasons long before the deleterious effects of inbreeding were manifested. In natural populations Lande (1988) has emphasised that low population densities are more likely to lead to reduced reproduction for non-genetic reasons such as inability to find a mate or a lack of the social interactions necessary for breeding. The question remains - can genetic factors lead to the extinction of populations not immediately threatened by demographic stochasticity?

The issue is complex, encompassing a hierarchy of unanswered questions: under what demographic conditions will populations become genetically impoverished? does this genetic impoverishment cause a reduction in fitness? can any such reduction in fitness threaten population persistence? and how should levels of genetic variability best be measured in the first place? I have addressed the first and last of these questions by studying the effects of population turnover in the silver studded blue butterfly, Plebejus argus, (Chapters 5 and 6). In this chapter I introduce the theoretical basis of conservation genetics, and examine how well this is substantiated by empirical studies. I also present a short review of the molecular genetic markers available for measuring levels of variability.

56 4.1 Population size and genetic variability Human land use practices are causing changes in habitat structure at a rate unprecedented in evolutionary history. Frequently the effect of this change is to fragment previously widespread habitats into arrays of smaller isolated patches, with concomitant reductions in the size of the populations that inhabit them. In small populations, random genetic drift will tend to reduce genetic variability and increase homozygosity, which may affect the ability of the organisms to adapt to environmental change. The loss of genetic variability will be dependent on the effective population size (NJ, This is the number of individuals in an ideal population which would give the same rate of genetic drift as that in the actual population. In other words, it is a measure of the average number of individuals which contribute genes to subsequent generations. The expected rate of loss of heterozygosity through genetic drift will be 1/(2N^) per generation (Wright 1978). This represents only a very small per generation loss of variability for large populations, but if population sizes remain small for several generations then genetic variability can be severely depleted.

Attempts have been made to try and estimate the minimum population size which must be maintained to offset the detrimental effects of drift. Franklin (1980), suggested that a population with an effective size of 500 will be able to maintain stable levels of heritable variation in selectively neutral quantitative characters. Although the measure is speculative, it has been used as the basis of several specific conservation management plans. In two of these cases, the Northern spotted owl, Strix occintalis, and the red cockaded woodpecker, Picoides borealis. Lande argues that attempts to maintain genetic diversity, through an emphasis on minimum viable population size, have ignored basic demographic and ecological considerations, with the result that these populations are still highly endangered (Lande 1988). Lande goes on to argue that, in general, demography will be more important than population genetics in conservation plans. He says, ‘if a population becomes extinct for demographic reasons, such as habitat destruction, the amount of genetic variation it has is irrelevant.’ Clearly the implication is that for threatened species, ecological parameters are changing at a rate which makes genetic change unimportant.

However, the dynamics of some fragmented populations may give rise to conditions where major genetic changes can take place over relatively short timescales. The idea that there will be genetic effects of population fragmentation is nothing new, Sewall Wright considered the problem in the 1940’s. He described the degree of population differentiation in terms of two key parameters N^, the effective population size, and m, the rate of gene flow between populations. Wright showed that in population arrays where and m are low enough, the effects of drift will predominate, and may cause

57 substantial local differentiation. Wright measured this differentiation as the degree of identity by descent between alleles within subpopulations compared to the identity by descent of alleles in the total population. This measure is termed and Wright demonstrated that for populations at mutation/drift equilibrium, = l/(4Nm +1) (Wright 1978). Wright was not concerned with questions of conservation, but believed that the extensive changes in allele frequencies caused by genetic drift in small, isolated populations could result in evolutionary pathways not possible by natural selection alone. His work assumes an island type model of population structure in which small isolated populations receive migrants from a large source population of essentially fixed gene frequency. Thus even if genetic drift causes a significant loss of variation in the population isolates, there is always a ‘reservoir’ of variability remaining in the source, and, according to Franklin, this would only need to number approximately 500 individuals to prevent a population wide loss of variability. This suggests that even in species which have undergone range fragmentation, population persistence is unlikely to be affected by genetic considerations.

This simplistic view overlooks the important process of population turnover. Many species with a fragmented distribution exist within a mosaic of available habitat patches which show temporal variation in their suitability. In this case, the overall distribution of the population within the habitat is governed by a balance between local extinction and colonisation. Levins developed a simple model to describe the dynamics of subdivided populations in terms of this extinction/colonisation balance (Levins: in Hanski and Gilpin 1991). Rather than producing a detailed account of specific subdivided populations. Levins’ important contribution was to emphasise the dynamic nature of these systems, stressing particularly that there are no permanent populations and hence no ‘reservoirs’ of genetic variability. Ecologists have become interested in Levins-type subdivided populations, and have extended his work to analyse in more detail a variety of processes, including population size distribution (Hastings 1991), dispersal processes (Hansson 1991), colonisation ability (Ebenhard 1991) and predator-prey interactions (Nachman 1991).

The potential demographic effects of population turnover are now well studied and have been used in the design of some specific conservation management plans. The genetic consequences of such demographic upheavals have received much less attention. The importance of effective population size in maintaining genetic diversity has already been described, but in spatially discontinuous populations subject to turnover, frequent colonisation events may result in effective population sizes which are orders of magnitude lower than observed census sizes.

58 Population turnover has two consequences for the genetic structure of fragmented populations. The first is episodes of genetic drift associated with sampling individuals who become colonists, and the second is episodes of gene flow associated with the movement of migrants between established populations. In a simulation study, Gilpin showed that populations subject to turnover lose heterozygosity at a rate much faster than stable populations of the same size, through drift during colonisation bottlenecks (Gilpin 1991). The model assumes a particularly severe form of genetic bottleneck, as all colonisation events consist of only two diploid individuals with no subsequent gene flow between populations. In natural fragmented populations, loss of genetic variation during bottlenecks would be at least partially offset by migration from surrounding populations, there might also be some post-bottleneck mutational recovery of genetic variation (Whitlock 1992). However, in less restrictive models it has been shown that sampling effects will usually dominate, and that the general effect of population fragmentation will be to reduce levels of genetic variation (Slatkin 1976).

4.2 How does loss of variation reduce population viability? Theoretical results suggest that the environmental conditions faced by many species may cause depletion of genetic variability. However, the relevance of such genetic impoverishment to population viability is difficult to establish and will depend upon the specific effects of inbreeding depression in populations of small effective size.

Inbreeding depression occurs through the segregation of partially recessive alleles which would usually be kept rare by selection in large populations. Even in historically large outbreeding populations, rapid inbreeding decreases the mean of many major fitness components, such as litter size in pigs and mice and seed yield in maize (Falconer 1989). This inbreeding depression is the result of the effects of a few recessive mutations of large effect (lethal and sublethal), and many mildly detrimental mutations with more nearly additive effects. Experiments in Drosophila have used flies homozygous for entire chromosomes to examine these two components. Simmons and Crow (1977) show that approximately half of the inbreeding depression in viability is caused by rare, nearly recessive, lethal and sublethal mutations, with the remainder caused by numerous mutations of small effect.

The extent to which this inbreeding depression affects population persistence will depend upon the effectiveness of selection in purging the deleterious mutations. Gradual inbreeding through a steady reduction in population size over many generations will allow selection to remove mutations of large effect when they become homozygous. However, mildly deleterious mutations may be nearly neutral and could reach high frequency in the population. Thus, due to their relatively higher rate of

59 spontaneous generation and near neutrality, mildly detrimental mutations may well be the most damaging to population viability. This is substantiated by the rebound in fitness shown by highly bottlenecked species in laboratory experiments (Bryant et al. 1990, Brakefield and Saccheri 1994). Butterflies and houseflies both show an increase in the mean values of several fitness components after initial reductions due to the inbreeding depression caused by a tight bottleneck. The fact that recovery begins after only a few (5-7) generations, even when the founding population size is only one pair, suggests that the inbreeding depression caused by recessive lethals, and near lethals, is, i’y\ unlikely to represent much of a threat to most natural populations.

In contrast, the accumulation of mildly deleterious, near neutral, mutations is relevant to population persistence for two reasons. Firstly, the so-called ‘mutational meltdown’ scenario, in which fitness is gradually eroded by the accumulation of mildly deleterious mutations which are not purged by selection, and secondly, in considerations of the minimum viable population size required to preserve sufficient levels of additive genetic variation for ‘normal’ rates of evolution to proceed.

Lande (1995) has modelled the effect of the fixation of new deleterious mutations on the intrinsic rate of increase of a population. When all mutations are subject to the same selection coefficients, that is s against heterozygotes and 2s against homozygotes, the mean time to extinction is a nearly exponential function of population size (NJ. In this case, is such a rapidly increasing function of that there is little risk of population extinction for populations where > 100. When there is variance in the selection coefficient acting on new deleterious mutations, a much more realistic situation for natural populations, then is asymptotically proportional to N / (when s is exponentially distributed). There is thus a more gradual increase of with increasing Ng, indicating that with variance in s, the accumulation of mildly detrimental mutations could present a considerable risk of extinction even where > 200. Lande (1995) showed that under demographic stochasticity the average time to extinction increases almost exponentially with the carrying capacity of the population, whereas under environmental stochasticity the average extinction time scales as a power of carrying capacity. Thus for populations where > 100, genetic stochasticity, through the accumulation of mildly deleterious mutations, represents the same order of risk as environmental stochasticity, and a greater risk than demographic stochasticity.

The accumulation of nearly neutral (quasi-neutral) variation is also of importance for its contribution to the standing variation of the population, which in turn is important for adaptive change. The aim in establishing Minimum Viable Population sizes (MVP) (Franklin 1980, Soule 1980), was to estimate the size of population required to

60 preserve levels of additive genetic variance in the face of genetic drift. Franklin and Soule based their estimates on the assumption that under a mutation/drift equilibrium Vg = 2NgV^ (where Vg = additive genetic variance, = effective population size, = additive genetic variation created in each generation by spontaneous mutation). The heritability of a character is proportional to the additive genetic variation, and for typical characters of morphology and physiology, heritabilities range between 0.2 - 0.8 (Falconer 1989). Experiments using highly inbred lines in animals and plants, have shown that = 10 ^ Taking a typical heritability of 0.5 and solving the equation (Vg = 2NgV^), gives = 500 - the Franklin/Soule number. However, as Lande points out, much of this new mutation will be selected against and so will not contribute to the standing variation of the population. Experiments suggest that only about 10% of the spontaneous mutational variance is quasi-neutral (Lopez and Lopez-Fanjul 1993). If of lO '^ is used then becomes 5000 for quasi-neutral mutation.

Theory suggests that both the avoidance of mutational meltdown and the maintenance of quasi-neutral variation require effective population sizes of the order of thousands of individuals. Genetic stochasticity could thus present a real threat to population persistence, and this threat may be being realised in populations traditionally thought large enough to be of no concern to conservationists. This work has formed the basis of the suggestion that genetic factors are as relevant to conservation as demographic and environmental concerns. However, empirical studies linking loss of genetic variability to reduced fitness are equivocal, even in species which are apparently extremely genetically depauperate.

4.3 Empirical studies of genetically depauperate populations Possibly the most celebrated example of an endangered species where a population bottleneck has been invoked to explain low genetic diversity, is that of the cheetah Acinonyx jubatas. Over 100 protein loci have been analysed in this species revealing almost complete monomorphism and extremely low levels of heterozygosity. The high degree of genetic relatedness between individuals is confirmed by the acceptance of skin grafts made between ‘unrelated’ cheetahs. O’Brien et al. (1987), has postulated that this genetic uniformity is the result of two genetic bottlenecks in the recent history of the cheetah, one approximately 100,000 years ago, and one as recently as the last century.

A known bottleneck, of as few as 30 individuals, during the 1890’s is thought to be responsible for the lack of genetic variation at allozyme loci in northern elephant seals. Although numbers have now recovered to tens of thousands of individuals, no genetic variation has been found at the 55 allozyme loci thus far examined (Bonnell and

61 Selander 1974). Ellegren et al. (1993), examined the effects of a population bottleneck in Swedish beavers, which were completely eliminated from Sweden through hunting during the 1870’s. Réintroductions were made using 80 individuals transported from Norway between 1922 and 1939, since when population numbers have climbed to approximately 100,000. Post-bottleneck Swedish beavers are monomorphic for RFLP’s at the MHC locus and also have very low levels of variability at multilocus fingerprints - loci which are normally so variable that they are used for distinguishing individuals within pedigrees.

Molecular genetic studies have revealed that the gray wolves of Isle Royale in Lake Superior have only about 50% of the allozyme heterozygosity of mainland populations and also all carry the same mtDNA haplotype (Wayne et al. 1991). This loss of variability is associated with the reduction of the population to just 12 individuals. Members of the population are now as closely related as full sibs in a normal outbreeding wolf colony. Other examples of populations where a lack of detectable genetic variation has been associated with a severe reduction in effective population size include Florida tree snails (Hillis et al. 1991), wombats (Taylor et al. 1994) and bison (McClenaghan et al. 1990).

Even if colonisation bottlenecks are frequent, and even if they do result in significant losses of genetic variation, the question remains - does this loss of variation affect the life history of the organism concerned and hence have any relevance to conservation? In all the examples cited above, the post-bottleneck populations have grown to substantial levels and none of the species is demographically highly endangered. However, high census numbers may hide a deleterious genetic legacy from periods of very small effective population size.

A frequently cited consequence of lost allelic diversity is a reduction in the ability of a population to respond to future environmental change, leaving it in danger of extinction. A specific example would be the necessity for producing variability in the immune response when confronted with a variety of pathogens. When a captive population of cheetahs in Oregon was swept by a series of corona-virus associated diseases, the result was 60% mortality, whilst no other Felidae were infected (O’Brien and Evermann 1988). Additionally, there are particularly high levels of juvenile mortality in the cheetah, possibly as a result of inbreeding increasing the frequency of homozygotes for deleterious alleles of large effect (O’Brien 1996). It has also been suggested that a reduced immune response was responsible for catastrophic population crashes following distemper virus outbreaks in populations of the harbour seal in the Dutch Wadden sea, populations known to be lacking in scoreable allozyme variation

62 (Swart et al. 1996). Wayne et al. (1991) have speculated that the rapid decline of the Isle Royale wolf population may be in part due to a recognition triggered response to avoid incest, preventing the formation of pair bonds.

A comprehensive study of the effects of genetic variability on fitness has been carried out on the Sonoran topminnow, Poeciliopsis monacha, by Vrijenhoek (1994). He examined 25 allozyme loci and four fitness characters in three separate populations of the fish. The population with the lowest heterozygosity also showed the highest mortality, the slowest growth rate, the poorest fecundity and the weakest developmental stability. Additionally, Vrijenhoek examined a population bottleneck in Poeciliopsis caused by the recolonisation of river pools where the species had previously become extinct during a severe drought. The post bottleneck populations were homozygous at allozyme loci which were polymorphic in fish from the source pools. The homozygous populations exhibited reduced levels of fitness, as measured by fluctuating asymmetry, when compared with source populations, and also had high rates of spinal curvature and other visible deformities. Vrijenhoek then reintroduced genetically variable females from the source population and showed that these out competed the homozygous Poeciliopsis individuals and levels of variation returned to normal within three generations. Selection against inbred individuals has also been observed during cyclical population crashes in song sparrows (Keller et al. 1994), and by the rapid spread of introduced alleles through previously inbred populations (Berry et al. 1991).

Do these findings have any relevance to conservation efforts? From the examples cited above, the answer appears to be yes, loss of genetic variability can affect the fitness of organisms. However, there are good reasons for exercising caution in interpreting these results. In the case of the Northern elephant seal and the Swedish beaver populations, low levels of genetic variability do not seem to have impeded rapid post­ bottleneck recovery of population size. Extremely low levels of detectable genetic variation have been found in six species of seal, frequently unconnected with particular demographic incidents (Simonsen et al. 19826, Swart et al. 1996). Many widespread and successful species, especially including carnivorous mammals, also have low levels of genetic diversity (Allendorf 1979, Simonsen et al. 1982a, Avise 1995). Additionally, as stated earlier, different species may not undergo the same genetic response to periods of low population size. The cost of inbreeding, as measured by survival in crosses between first degree relatives, varied by two orders of magnitude in a survey of captive mammalian species (Ralls et al. 1988).

63 Swedish beaver populations have grown rapidly since the bottleneck during recolonisation. Ellegren et al. (1993) speculate that the beaver may be tolerant to periods of inbreeding, as it lives in small colonies, often just a single family, from which the dispersal of juveniles is restricted. It has been suggested that moderate levels of inbreeding in normally outbreeding organisms will act to purge the genome of deleterious recessive alleles through selection against recessive homozygotes (Carson 1968). A similar effect is responsible for the observed ‘fitness rebound’ in intensely bottlenecked populations (Bryant et al. 1993, Brakefield and Saccheri 1994). Consequently, it is not safe to assume that a particular demographic episode will affect all species in the same manner, the response will depend in part upon the evolutionary history of the population in question.

Concerns have also been raised about surveys of genetic variability based on one type of genetic marker (Hedrick 1986). In the Florida tree snail, an almost complete lack of allozyme variation contrasts starkly with a huge range of morphological variation (Hillis et al. 1991). Even in the cheetah, normal levels of genetic variability are now being revealed at microsatellite loci, in comparison with widespread monomorphism at allozyme loci (M. Bruford pers. comm.). Finally, even in cases where populations with low genetic variability are suffering reduced fitness, care must be taken in establishing a clear causal link between genetic poverty and population vulnerability. To return to the example of the cheetah, most juvenile mortality in the wild has been shown to be due to predation by lions rather than directly due to the cost of inbreeding. And as regards the dangers of monomorphism and disease resistance, Caro and Laurenson (1994) ruthlessly point out, ‘only 60% of captive cheetah succumbed to feline infectious peritonitis in Oregon, that is, not nearly all.’

4.4 Molecular genetic markers for conservation studies Conservation genetic studies are generally concerned with how demographic decline has shaped the genetic architecture of populations through the conflicting processes of genetic drift and gene flow. Consequently, an ideal analysis would utilise a highly variable neutral genetic marker which faithfully reflects the influence of these two processes across the genome as a whole. Modem molecular techniques have placed a wide variety of genetic markers at the disposal of the population geneticist, however, all studies face several inherent difficulties.

Any realistic experimental design can only sample a tiny fraction of the genomic DNA, leading to sampling effects associated with having too few loci, or too little allelic diversity at those loci (Hillis et al. 1991, Hughes and Queller 1993). Additionally, the pattern of variability at a neutral locus is not only a product of its demographic history,

64 but also of its specific mutational dynamics. Thus comparisons between different types of marker loci may give conflicting results (Begun and Aquadro 1993, Pogson et al. 1995, Scribner et al. 1995). The suggestion has even been made that studies should focus on particular regions of the genome considered to be of particular importance to population viability, such as the MHC locus (Hughes 1991). However this has proved controversial, as other loci are known to contribute to disease resistance, and selective breeding to maintain MHC diversity could lead to inadvertent losses of variability at other polygenes underlying important adaptive traits (Gilpin and Wills 1991, Miller and Hedrick 1991).

Allozyme electrophoresis has traditionally been the mainstay in studies of population genetic diversity. Assays are relatively cheap and easy to perform, and codominant Mendelian marker information can rapidly be obtained from a large number of loci (reviewed in Frankham 1995). However, allozyme loci are frequently invariant, especially in endangered populations (Bonnell and Selander 1974, O’Brien et al. 1985, Hillis et al. 1991), and doubts have been raised about their selective neutrality (Karl and Avise 1992, Begun and Aquadro 1993). Another drawback is the general necessity for fresh protein extracts, meaning that analyses are invasive and cannot be performed on hair, scales or ancient or decaying specimens.

The first DNA based assays of population genetic variation utilised restriction fragment length polymorphisms (RFLP’s). The banding patterns revealed by the hybridisation of specific probes to restricted total genomic DNA provide easily scoreable codominant markers, which are frequently more variable than allozyme loci (Begun and Aquadro 1993, Pogson et al. 1995). They have been used in the study of several endangered species (O’Brien et al. 1994), but are somewhat limited by the amount of DNA required (>10|ig per individual), and the labour-intensive probing and blotting procedures.

The development of the polymerase chain reaction (PCR) has brought about a revolution in the analysis of the DNA sequence itself. Probably the crudest application of the technique is for RAPD (randomly amplified polymorphic DNA). Here the use of arbitrary primers to amplify fragments of total genomic DNA obviates the need for any specific probes or sequence information. It is extremely quick and simple to perform, even with small amounts of template (Williams et al. 1990, Hadrys et al. 1992, Nadeau et al. 1992). While being useful in genetic mapping and strain identification, RAPD’s are dominant markers and consequently traditional measures of estimating population diversity cannot be used, although methods of estimating nucleotide

65 diversity from RAPD data have been developed (Clark and Lanigan 1993, Gibbs et al. 1994). In general, attempts to use RAPD have been beset with problems of repeatability and lack of knowledge about the identity of individual bands within multi­ band profiles (David Heckel pers. comm,).

Amplified fragment length polymorphism (AFLP) represents a more advanced form of RAPD analysis. It involves selective rounds of PCR amplification with random primers on digested DNA. Restriction of the DNA prior to amplification effectively limits the size of PCR products obtainable, meaning that much higher resolution gels may be run, and ‘loci’ can be identified (Vos et al. 1995, Karp et al. 1996). Although AFLP’s promise to be a powerful tool in gene mapping, they are subject to the same problems of band anonymity as RAPD, and are unlikely to become the method of choice for estimating genetic diversity.

The discovery of numerous widely distributed arrays of tandem repetitive DNA has dramatically increased the power of molecular genetic analyses (Chapter 1, section 1.2). Minisatellite polymorphisms were originally revealed by hybridisation to oligonucleotide probes to produce multilocus ‘fingerprint’ profiles, similar to those of RAPD and AFLP (Jeffreys et al. 19856). Cloning of flanking regions also allows single locus PCR of minisatellites to give a codominant marker. The hypervariabilty of minisatellites makes them ideal individual specific markers for understanding mating systems in small wild or captive populations, and for estimating diversity in particularly genetically depauperate populations (Wayne et al. 1991, Ellegren et al. 1993, Fleischer et al. 1994, Signer et al. 1994). However, such high levels of variability generate problems for population studies. Individual bands from fingerprint profiles cannot be assigned to specific loci, meaning that population hetrozygosities, and allele frequencies cannot generally be ascertained.

There are several reasons for regarding microsatellites as the ideal molecular genetic marker for population studies (Bruford and Wayne 1993). They are highly variable, codominant, Mendelian, neutral genetic markers which are widely distributed in all eukaryotes. Once locus specific primers are obtained, analysis is relatively easy - labeled PCR followed by electrophoresis - with no need for hybridisation or blotting. The smaller size of microsatellite alleles also makes them more suitable for PCR than minisatellites. Microsatellites have been used to genetically characterise numerous endangered populations (Gotelli et al. 1994, Forbes and Boyd 1996, Garcia-Moreno et al. 1996), and only require minute quantities of DNA for amplification, allowing non- invasive sampling (Rose et al. 1994), and even the sampling of extinct populations (Taylor et al. 1994).

66 The principal technical limitation of microsatellite analyses is the species specific cloning required to design primers for amplification. Although construction of DNA libraries enriched for microsatellite sequences should make selection of individual loci easier (Edwards et al. 1996), methodological problems have still been encountered in many species (T. Coote, O. McMillan, I. Saccheri, B. Meglecz pers. comm.). Specific analyses have been developed to account for dynamic mutation when interpreting population data from microsatellite loci (Goldstein et al. 19956, Shriver et al. 1995, Slatkin 1995), but the demographic events affecting most endangered populations have probably occurred recently enough to make these unnecessary.

Finally, analysis of the DNA sequence composition itself is possible. The principal use of direct DNA sequence data in conservation genetics has been in the construction of phytogenies based on sequence similarity for assessing the taxonomic status of endangered populations (Avise and Nelson 1989, O’Brien 1994, Lehman et al. 1991). Following an initial phase of cloning to design primers for PCR, sequence data may be analysed in a number of different ways. Variation in nucleotide composition can be revealed by sensitive detection assays such as DOGE (denaturing gradient gel electrophoresis) (Norman et al. 1994), or SSCP (single stranded conformational polymorphism) (Orita et al. 1989). However, both of these systems are extremely sensitive to operating conditions and problems of repeatability make them difficult to use when studying populations (O. Rose unpublished data). Another alternative is to digest the amplified products with specific restriction enzymes (PCR-RFLP) and visualise the products via electrophoresis on agarose or acrylamide (Alkopyanz et al. 1992, Brookes et al. 1997).

The choice of marker for a particular study will be determined by an interaction of the cost involved, time and specialist skills available, and the level of resolution required - allozymes provide informative phylogenetic distinctions between conspecific populations and closely related species, but are too invariant for paternity analysis, where hypervariable minisatellites are ideal. Only DNA sequence analysis offers the potential for resolution of variation at all taxonomic levels, but is extremely time consuming and expensive to perform. Where the concern is with the standing levels of variation in small or endangered populations, a strong case can be made for the use of allozyme and microsatellite loci. These presumably approximately neutral, codominant markers provide good genome coverage and may be interpreted through standard population genetics theory, to infer the effects of demographic changes.

It has also been argued that mitochondrial DNA analyses have a special significance for understanding the demography of endangered populations (Moritz 1994, Avise 1995).

67 Maternal inheritance of the single copy mtDNA genome ensures that it has an effective population size only one quarter that of nuclear genes (Birky et al. 1993), resulting in particular sensitivity to population decline. Avise (1995) argues that the demography of matrilines may be of particular relevance to conservation as gene flow is often highly gender biased and successful recruitment depends heavily on the demography of females. However, questions have been raised about the assumed neutrality of mtDNA, because the absence of recombination greatly increases the possibility of hitchhiking through selection at linked sites (Ballard and Kreitman 1995).

The status of genetic approaches for identifying individuals, populations or species in need of conservation is not in doubt (Avise and Hamrick 1996). The suggestion that genetic factors can play a primary causitive role in global biodiversity loss is much more contentious. A wide range of molecular tools are available to test the theoretical claim that genetic impoverishment can lead to the decline and eventual extinction of small populations. However, the empirical evidence for this hypothesis is, at best, equivocal. Despite good empirical evidence linking inbreeding depression with reduced fitness, there has been no convincing demonstration that this can present a significant threat to population persistence when compared with demographic and environmental factors. An important advance would be to show that genetic impoverishment can occur in populations where demography is not already the subject of concern.

I have investigated the effects of population turnover in the silver studded blue butterfly P. argus. Frequent extinction and colonisation in fragmented local populations of this sedentary species provide conditions under which there could be substantial loss of genetic variability through drift. I examine the genetic effects of population turnover at two levels. Chapter 5 studies an introduced P. argus metapopulation in North Wales, to examine losses of variability associated with spread through fragmented habitat: a scenario of great relevance to many species. In Chapter 6, genetic variability and differentiation in nine isolated British populations of the butterfly are used to examine the effects of thousands of years of population turnover. In both studies, comparisons between different molecular markers are used to assess their suitability as surrogate measures of overall genomic diversity.

68 Chapter 5

Loss of genetic variability through population turnover in a butterfly

Abstract Fragmented populations of the silver studded blue butterfly (Plebejus argus), in North Wales provide an ideal system with which to examine the genetic effects of population turnover. This thriving metapopulation has been established through a series of artificial introductions and natural colonisation events. Microsatellite loci were cloned and used to analyse levels of variability (in terms of numbers of alleles and expected heterozygosity), in one source and eight daughter populations. Inferences made from sample data using permutation tests (Brookes et al. 1997), indicate that serial colonisation events can significantly reduce levels of genetic variability, but individual population bottlenecks do not. These data were compared with a study of variability at allozyme and mtDNA loci, in the same populations, to compare the sensitivity of these different molecular markers to population turnover. Allozyme and microsatellite loci showed a consistent average response to turnover, but differed widely during individual colonisation events. Microsatellites show greater change in expected heterozygosity, while allozymes showed greater changes in numbers of alleles. This difference may reflect the greater evenness of allele frequency distributions at microsatellite loci, where rarer alleles contribute more to expected heterozygosity. The mtDNA locus showed greater losses of variation than allozymes or microsatellites in daughter populations, but the reduced statistical power of only using a single locus meant that changes associated with individual colonisation events were rarely significant. Losses of variability at all three marker loci only affect rare alleles, and post-introduction turnover within the metapopulation indicates that current levels of variation can be maintained within transient and fragmented habitat. It seems very unlikely that population turnover is introducing a damaging genetic legacy into these thriving populations.

69 5.1 Introduction In the current climate of habitat destruction and fragmentation, the demographic and genetic threats faced by small populations have caused much interest amongst conservation biologists. Traditionally, demographic threats have been regarded as paramount (Soule 1987, Lande 1988, Nunney and Campbell 1993), but the specific properties of dynamic metapopulations may make them vulnerable to genetic catastrophe (Hanski and Gilpin 1991). The theory is that repeated colonisation and extinction in spatially discontinuous populations lead to small effective population sizes which in turn cause loss of genetic variation through drift and ultimately loss of fitness through inbreeding depression (Chapter 4). However, the relevance of this to real populations is unclear and untested.

Empirical studies have generally focused on the effects of single bottlenecks (O’Brien et al. 1985, Johnson 1988, Elle gren et al. 1993, Taylor et al. 1994). Frequently, the bottleneck itself is inferred from genetic data, in the absence of any demographic evidence (Bonnell and Selander 1974, O’Brien et al. 1987). Additionally, in many studies, low levels of variability have been identified almost as an afterthought, in populations which are already demographically threatened (O’Brien et al. 1985, Wayne et al. 1991, Swart et al. 1994). If genome wide loss of genetic variability is to be taken seriously as an independent threat to natural populations, it must be demonstrated in populations where demography is not already the subject of concern.

I have studied losses of genetic variation at microsatellite loci, associated with turnover in a butterfly metapopulation. The population history of the silver studded blue butterfly, Plebejus argus, in North Wales provides an ideal system with which to test the genetic effects of a realistic conservation scenario - the réintroduction of an endangered invertebrate species into a fragmented habitat. The demographics of the system are well documented (Thomas 1985, Thomas and Harrison 1992, Lewis et al. 1997), and allow the effects of both artificial introductions and multiple natural colonisations to be studied. P. argus is sedentary and a poor coloniser, enhancing the potential for loss of genetic variability through drift. Additionally, population census numbers are high (10^ - lO'^ individuals). This means that although very local, the species is not currently regarded as endangered, and large (80 individuals) samples may be taken without affecting population structure.

A major problem for conservation genetic studies is how to measure genome-wide losses of variability and the associated fixation of deleterious alleles at polymorphic loci (Chapter 4, section 4.3). The use of marker loci as surrogates for overall genomic diversity is widespread, but encounters two inherent sampling errors. Firstly, actual

70 losses of allelic diversity can only be confirmed by sampling of entire populations, a practical impossibility in almost all cases. Secondly, any analysis of molecular genetic variability can only sample a tiny fraction of the genome, and the different evolutionary dynamics of particular markers will in part determine the genetic effects of demographic upheaval. For example, if protein polymorphism is under strong selection, allozyme variability may reflect current selection rather than demographic history.

I have attempted to overcome these problems in a study of the effects of single and multiple colonisation events in P. argus on levels of variability at microsatellite loci. I use recently developed permutation procedures (Brookes et al. 1997), to make inferences about population-wide changes from sample data. I also compare losses of allelic diversity at microsatellites with those at allozyme and mtDNA loci, to examine the consistency of different molecular genetic markers in revealing the changes in genetic structure caused by population turnover.

5.2 Methods

5.2.1 History of Plebejus argus in North Wales British populations of the silver studded blue butterfly, Plebejus argus, inhabit early successional limestone grasslands and sandstone heaths, usually on warm South facing slopes at low elevation. Heathland disturbances, such as grazing or burning, are essential for this species, as eggs are laid at the vegetation boundaries, where the larval stages have a mutualistic relationship with Lasius ants (Jordano and Thomas 1992). The fragmentary nature of such disturbances ensure that P. argus is patchily distributed wherever it has been studied (Thomas and Harrison 1992). This century has seen the ‘improvement’ of most British lowland heaths, and those that remain are much less intensively grazed. This, combined with the poor dispersal powers of P. argus has resulted in an 80% reduction in its former range since 1900 (Thomas 1985, Thomas and Lewington 1991, Warren 1993).

This study concentrates on one source and eight derived populations on the limestone grasslands of North Wales (Figure 5.1). These calcareous grasslands consist of patches of short sward with high densities of P. argus host plants (various Leguminosae and Cistaceae), interspersed with areas of outcrop and chalk scree. The source population at the Great Orme’s Head, near Llandudno is thought to have been present for 10^ - lO'^ years (Dennis 1977), and currently numbers about 10^ adults (Thomas 1985). In 1942,90 mated females from Great Orme’s Head were introduced to suitable, but unoccupied habitat at Rhyd-y-foel in the Dulas Valley, 15km away

71 (Marchant 1956). Subsequently there have been a series of natural colonisation events from Rhyd-y-foel leading to the establishment of a metapopulation consisting of six persistent, and several transient, patches with a total size of about 10^ individuals in 1983, (the ecology of the Dulas valley metapopulation is fully described in Thomas and Harrison 1992). In 1983 a second introduction of 30 mated females from the Great Orme was made at Graig Fawr, near Prestatyn, 29 km away. This population now numbers several thousand individuals. In addition to the two introductions, a site approximately 1 km from the Great Orme population, at Happy Valley, was naturally colonised soon after 1983.

5.2.2 Collecting sites Samples were collected during July and August in 1992 and 1994. Locations of sampling sites are given in Figure 5.1. They are, the source population. Great Orme’s Head (“GO”, grid ref. SH 768 824); the introduced populations at Graig Fawr (“GF”, SJ 059 803) and Rhyd-y-foel (“RYF”, SH 913 765); the populations naturally colonised from RYF, Borth-wryd (“BW”, SH 913 763), Garth Gogo (“GG”, SH 917 762), Mynydd Marian (“MM”, SH 891 773), Plas-Newydd (“PN”, SH 905 766), Terfyn (“T”, SH 913 776); and the population naturally colonised from the Great Orme - Happy Valley (“HV”, SH 780 832). Transect walks were conducted through each site for a period of three weeks to monitor emergence of adults, and collections were made only once substantial emergence had occurred. Additionally, only males were collected to minimise the effects on population numbers. Upon collection, individuals were immediately frozen in liquid nitrogen and transported to the laboratory for storage at -80°C prior to analysis.

5.2.3 DNA extraction A standard phenol-chloroform DNA extraction was performed for the construction of the genomic library. This involved overnight digestion of head, thorax and abdomen with proteinase K, followed by two extractions with phenol chloroform (1:1) at 4°C and two extractions with chloroform / isoamylalcohol (24:1). The DNA was precipitated with absolute alcohol and, after centrifuging, the pellet was washed in 70% alcohol and resuspended in 100|il TE.

72 Figure 5.1 Locations ofP. argus populations on the limestone grasslands of North Wales.

reradui

AAbergde

y X y y

River DulasV ) e T

MM/PN RYF ® ®GG PN# # BW

. 1km

73 Extraction of DNA for PCR was performed using a high-salt method on butterfly thoraxes. These extractions can be performed very rapidly and avoid the necessity to use harmful reagents such as phenol and chloroform. The DNA obtained is less pure than that yielded by phenol-chloroform extractions but is adequate for PCR.

Each butterfly thorax was homogenised in 250|il TNES buffer (0.01 MTris, O.IM

NaCl, ImM EDTA, 0.1%SDS), and incubated overnight at 55® C with 250|ig

Proteinase K. An equal volume of 2.6M NaCl was added, and the mixture was centrifuged for 20min at 4®C. The DNA precipitate was washed in 70% ethanol and

air-dried, before dissolving in lOOpl HjO

5.2.4 Identification and analysis of microsatellite loci The method of library construction and screening broadly follows that of Hughes and Queller (1993). Pooled genomic DNA from four individuals was digested to completion with Sau 3AI (Gibco BRL), giving a high yield of fragments in the range 400-600bp. These fragments were then dephosphorylated, purified (Promega ‘Magic PCR prep’ system), and ligated into pBluescript IISKM13 (+) vector (Stratagene). Ligation reactions included a great excess of target DNA to prevent recicularisation of the undephosphorylated vector. The library was titrated by transformation into ‘Epicurian coli’ XL-Blue MRF competent cells (Stratagene). Recombinants were identified by blue/white colour selection on media containing X-Gal/ IPTG. Platings were standardised to yield approximately 4000 colonies per ‘megaplate’ (Nunc- Intermed).

The library was screened by hybridisation to radiolabelled GAjj and GT^g oligonucleotides. Platings were replica-lifted and fixed to Hybond-N-membrane (Amersham) and then pre-hybridised for 1 hour at 65®C in 1.2X SSPE 2% SDS with

10% Dextran Sulphate. Oligonucleotides were 5’ end-labeled with adCTP^^ using T4

polynucleotide kinase (Gibco BRL) at 37®C for 1 hour. Labeled oligos were then added to the prehybridisation mixture and allowed to hybridise to filters overnight at 65®C. Following hybridisation, the filters were washed twice in 0.2X SSPE, 1% SDS for 30 mins at 64®C. Dried filters were then exposed to Amersham Hyperfilm MP for 24-48 hours at -70®C with intensifying screens.

Positive colonies identified from the screening were picked, grown and re-screened. Re-screening was by PCR amplification of complete inserts, using the pBluescript T3

74 and Ks primers, followed by Southern-blotting and repeat hybridisations of the PCR products. Positive colonies from the second screen were ‘cleaned’ by replating, picking six distinct white colonies and subjecting these to another round of PCR screening. Colonies which still gave a strong positive signal were purified (Quiagen Tip-20 miniprep) and sequenced at the automatic sequencing facility of the Virology department at the Royal Postgraduate Medical School, Hammersmith Hospital. A total of 152 positive colonies were picked, of these 8 still gave strong positive signals after rescreening and cleaning, and seven microsatellite loci were identified following sequencing. This is equivalent to approximately 1 GA or CT microsatellite per 10,000 recombinant cells, or 1 per 5 Mb in the P. argus genome (assuming an average insert size of 500bp).

PCR primers were designed from flanking sequences using the PRIMER 0.5 program (Whitehead Institute for Biomedical Research) and synthesised by Oswell DNA services. PCR reactions were optimised by magnesium and temperature titration and final reactions contained approximately 150ng DNA, 6pM dCTP, lOOpM each of

dA/T/GTP, 1.5 - 2.5mM Mg^^, 5pM each primer 0.04pCi dCTP^^, 20% glycerol, IX Advanced Biotechnologies Reaction Buffer 4 and 0.25 units of Advanced Biotechnologies Taq Polymerase. Amplifications were performed on a Hybaid Omnigene thermal cycler with annealing temperatures of 58 - 62°C. Desired product sizes were less than 200bp, so no elongation stage was necessary. After extensive titration tests, only three primer pairs produced reliable and easily scoreable genotypes. The corresponding loci were named Papl, Pap2 and Pap3. (Primer sequences and PCR protocols are given in Appendix II).

Following amplification, 2|Xl of each reaction product was mixed with an equal volume

of formamide loading buffer (Sambrook et al. 1989), denatured for 2mins at 88°C and run on 8% acrylamide gels containing 0.46g/ml urea. After 12-24 hours of autoradiography, gels were scored by eye on a light box. During the course of the study I also demonstrated that these PCR reactions could be successfully performed by direct amplification from small fragments removed from the wings of live butterflies (Rose et al. 1994, Appendix HI). Although slightly less reliable than standard DNA extractions, this method could be valuable for non-invasive sampling from highly endangered populations.

Only obtaining three useable loci from such an extensive screening and purification program illustrates one of the main drawbacks of attempting to use microsatellite loci.

75 For most species, where little or no sequence data is available, time consuming and labour intensive molecular methods must be employed, whereas allozymes can be used immediately and are usually informative. Attempts to isolate microsatellites in Lepidoptera seem to have met with particular difficulty. Currently, I am aware of one other successful attempt. Emese Meglecz has isolated and used primers for three microsatellite loci in Pamassius mnemosyne, using a straightforward total genomic library screening method similar to that outlined here. Attempts to use libraries enriched for microsatellites have generally been unsuccessful, tending to yield compound and complex repeats (I Saccheri, I Emelianov, O McMillan pers. comm.). It seems likely that microsatellites may be rare in Lepidopteran genomes, compared with other taxa. Taking into account the stringent hybridisation conditions I used, an optimistic estimate of 1 GT microsatellite per megabase in the P. argus genome compares with one every 18 Kb in the mouse genome (Stallings et al. 1991). In Pamassius mnemosyne, Meglecz detected one microsatellite every 96Kb using two dinucleotide and two trinucleotide probes.

5.2.5 Genetic variation at protein and mitochondrial loci Data on microsatellite variability was compared with allozyme and mtDNA data collected by Brookes et al. (1997). They obtained data from 12 polymorphic allozyme loci using methods given in Emelianov et al. (1995) as modified from Richardson et al. (1986), Pasteur et al. (1988) and Mallet et al. (1993). They also performed an RFLP analysis of 780bp of the mitochondrial ‘AT’ rich (control) region (methods in Brookes et al. 1997).

5.2.6 Analysis of data Gene frequencies, expected heterozygosities and deviations of genotypes from Hardy Weinberg equilibrium were calculated using the program BIOSYS-1 (Swofford and Selander 1981). Changes in genetic variation were measured by changes in gene frequency, expected heterozygosity (HJ or number of alleles (nj. Loss of alleles is particularly relevant to conservation, as selection can act to restore perturbed gene frequencies, whereas lost alleles will only be replaced by mutation or immigration. However, realistic constraints on sampling often mean that rare alleles will be missed.

Expected heterozygosity (Nei’s unbiased estimate, H^ = (2N/ (2N-1)) (l-(Epj^)),

which combines the number of alleles and their relative frequencies could be regarded as a more complete measure (Nei et al. 1975). However, although H^ is expected to decline predictably during periods of small population size (Crow 1986), drift-induced changes in allele frequency may cause increases in heterozygosity at some loci (Leberg

76 1992). Theoretical treatments also suggest that a very small population bottleneck is required to appreciably decrease heterozygosity (Wright 1931). The reduction in in a population of given size can be represented by -

= Ho (1 - l/2Ney where H^= heterozygosity of the sample, H q is the heterozygosity of the original population and is the effective population size. Even a one generation bottleneck of only two individuals is expected to contain about 75% of the heterozygosity of the original population. Thus there is no single ideal measure of drift induced change in bottlenecked populations. In this study I have examined changes in heterozygosity, numbers of alleles and gene frequencies between source and founded populations.

Sampling considerations also raise difficulties for assessing the significance of any changes in genetic variability which may be found. Parametric statistical tests of changes in and n^ require estimates of the population allele frequency distributions. However, the genetic consequences of population bottlenecks predominantly concern the loss of rare alleles, precisely those alleles which are most likely to be missed during sampling. Thus accurate predictions about the frequency distributions of rare alleles cannot be made. This is particularly relevant when using allelically diverse markers such as microsatellite loci, where many alleles are expected to be at low frequency.

I have avoided this problem by using a pairwise permutation method (Brookes et al. 1997), which does not rely on assumptions about population gene frequency distributions. This computer simulation first pools all of the individual genotypes from the pair of populations under test, and then samples individuals at random and without replacement to form two new populations of the same size as the originals. The H^ and n^of the two new populations are calculated. Each pairwise test is repeated 1000 times to generate a probability estimate of finding the observed differences in H^ and n^. This tests the null hypothesis that the two samples are drawn from the same population with regard to numbers of alleles and expected heterozygosity. Sampling without replacement avoids assumptions about the allele frequency distributions in the actual populations and sampling of complete genotypes avoids assumptions about the interdependence of loci.

Differences in gene frequency between pairs of populations were analysed by exact G tests of genotype frequency contingency tables using the program GENEPOP 3 (Raymond and Rousset 1995a). The hypothesis tested is H^ = ‘the genotypic distribution is identical across populations’ and the rejection zone is defined as the sum

77 of the probabilities of all tables having a higher or equal G value than the observed one (Raymond and Rousset 19956).

5.3 Results

5.3.1 Genetic variability at microsatellite loci Table 5.1 shows the allele frequency distributions at three microsatellite loci for the nine populations of the study system. Genotypes in four populations (PN, MM, HV and GF), show a strong deviation from HWE (x^p = < 0.01 where = HWE), at locus Papl, and one population (T), shows a strong deviation from HWE at locus Pap3. The sedentary nature of P. argus suggests that fine scale population structure

A x^ test may be inappropriate in this case, where many genotypes are only expected to occur at low frequency. The pooling procedures used by BIOSYS-1 to avoid this problem also greatly reduce the informativeness of the data. In order to overcome this, genotypes were tested for HWE using the Fisher exact test of GENEPOP 3.1. Despite this more accurate analysis, strong deviations from HWE persist in populations PN, MM, HV and GF at locus Papl (p = < ().()(K) in all 4 populations), and in population T at locus Pap3 (p = ().()()2).

Under the assumption of random mating and no population subdivision, the frequency of the null allele (r) may be estimated as r = F|g/(2-F;g), where Fj^is the proportion of heterozygous deficit (Chakraborty et al. 1992). However, when genotype frequencies at loci Papl and Pap3 are corrected for r by this method, large deviations from HWT^ persist. At locus Pap3 in population Terfyn, the expected frequency of r derived from F,s gives five null/null homozygotes, but a clear, scoreable genotype is obtained from every individual in this population. For locus Papl in Mynydd Marian, the expected frequency of r should produce 16 null/null blank lanes on the gel, whereas only three are observed. If the original observed deviations are not caused by a simple null allele, it is possible that they may be due to intra-allelic competition during amplification. It is known that where there is a large size difference at a locus, the smaller allele may be preferentially amplified by PCR (J. Nahmias pers. comm.). It is also possible that mutations in the flanking regions could alter the tertiary structure of the nucleic acid molecule, leading to differential amplification between alleles. The could result in a failure of some alleles to amplify in specific genotypic combinations rather than a general failure of amplification per se.

78 A total of eighteen microsatellite alleles were found in the nine populations of the study system, all of which are present in the source population at the Great Orme (Table 5.1).

Table 5.1 Frequencies, heterozygosity and numbers of alleles, and sample sizes at microsatellite loci. Allele designations indicate relative sizes in bp, where 2 is the smallest allele in each case.

POPULATION GO HV GFRYFGGTBW PN MM Locus Allele Papl 2 0.139 0.046 0.218 0.059 0.115 0.159 0.014 0.134 4 0.082 0.085 0.185 0.221 0.105 0.173 0.134 0.232 0.122 6 0.664 0.685 0.468 0.721 0.872 0.705 0.610 0.613 0.652 8 0.066 0.108 0.129 -- 0.006 - 0.028 0.067 12 0.033 0.038 -- 0.023 - 0.098 0.113 0.024 14 0.008 0.038 ------16 0.008 ------N 61 65 62 34 43 78 41 71 82 Locus Allele Pap2 2 0169 0152 0.105 0.119 0.033 0.108 0.133 0.074 0.059 4 0459 0476 0.711 0.498 0.609 0.488 0.469 0.481 0.576 6 0372 0372 0.184 0.393 0.359 0.404 0.398 0.444 0.365 N 74 82 76 41 47 84 49 81 85 Locus Allele Pap3 2 0.222 0.309 0.200 0.463 0.319 0.470 0.398 0.383 0.463 4 0.184 0.130 0.162 0.134 0.106 0.089 0.235 0.104 0.189 6 0.114 0.105 0.054 0.073 0.277 0.018 0.071 0.078 0.079 8 0.291 0.278 0.292 0.207 0.202 0.351 0.194 0.214 0.201 10 0.051 0.062 0.015 0.012 - 0.018 0.010 0.013 - 12 0.070 0.086 0.092 0.098 0.096 0.036 0.051 0.149 0.055 14 0.051 0.006 0.138 0.012 - 0.018 0.041 0.058 0.012 16 0.019 0.025 0.046 ------N 79 81 65 41 47 84 49 77 82 Average H, 0.658 0.640 0.654 0.584 Ô.5Ù1 0.567 0.647 0.633 0.5*3 Total N, 18 17 15 13 11 14 14 15 14

All of the derived populations show losses of alleles. Generally the losses are only of rare alleles (p < 0.05), however Papl allele 8 is missing from Rhyd-y-foel, Borth wryd and Garth Gogo, having been at a frequency of 0.066 at Great Orme, and allele Pap 1-2, which is at frequency 0.139 at Great Orme is lost from Garth Gogo. The

79 greatest allelic loss is between populations Great Orme and Garth Gogo, which only has 11 alleles, representing a 39% loss of allelic diversity. There are a total of 15 different alleles present in the six Dulas Valley populations, although the site of the original introduction, Rhyd-y-foel, only contains 13 alleles. However, it is not likely that subsequent colonisation bottlenecks have actually increased the number of alleles. This apparent anomaly may represent the smaller sample size at Rhyd-y-foel compared with other Dulas valley populations (42 individuals, compared with 80+ at Terfyn, Plas Newydd, and Mynydd Marian), or a post-introduction population crash.

All of the naturally colonised and artificially introduced populations show reductions in heterozygosity when compared with Great Orme. Again, the greatest reduction is between Great Orme and Garth Gogo where 24% of the expected heterozygosity has been lost. As with allele numbers, natural colonisation events within the Dulas Valley appear to have increased the genetic diversity measured by H^. Plas Newydd, Terfyn and Borth-wryd all have higher heterozygosity than Rhyd-y-foel, from where they were originally colonised. This could be due the observed changes in gene frequency which accompanied colonisation, or again may reflect the subsequent population history of Rhyd-y-foel.

5.3.2 Permutation tests for the effects of colonisation on levels of variability All of the populations derived from the Great Orme show the expected reductions of n^ and Hg associated with drift during colonisation bottlenecks. However, in a study concerned with the conservation of genetic variation, it is important to ascertain if these reductions in genetic variation are greater than the random differences which might be present in any subdivided group of populations. This was tested by performing permutation tests between pairs of populations at three different hierarchical levels relevant to conservation interests. I first ask if there are significant reductions in genetic variation associated with the initial colonisation and introductions from Great Orme. Secondly, I ask if the naturally colonised populations of the Dulas valley are significantly more genetically depauperate than the Great Orme - this represents the loss of variation associated with serial colonisation events. Finally I examine the changes of genetic variation along the actual (or presumed) routes of colonisation to show the effects of individual population bottlenecks. Results are presented Table 5.2 and Figure 5.2.

The daughter populations of Great Orme provided three tests of the null hypothesis that specific colonisation events have not significantly reduced allelic diversity or heterozygosity. Consequently, the acceptable confidence interval was modified by

80 Bonferroni’s correction iplm), wherep is the nominal probability level, and m is the number of independent tests. In this case the revised probability level is p = 0.05/3 = 0.017.

Only the artificial introduction to the Dulas Valley at Rhyd-y-foel was associated with a significant reduction in genetic diversity at microsatelhte loci, despite the fact that more individuals were introduced there than at Graig Fawr (Table 5.2). Obviously the introduction size (N), tells us little about N^, and it is possible that Graig Fawr was colonised by more individuals than Rhyd-y foel. It is also possible that drift associated with subsequent population turnover at Rhyd-y foel caused additional losses of variation. The sample from the naturally colonised population at Happy Valley is almost genetically identical to that from the Great Orme with only one very rare allele having been lost. However, these two populations are separated by a distance of less than 300m and it is possible that there is frequent substantial gene flow between them.

Table 5.2 Pairwise probabilities of obtaining the observed changes in allele number, heterozygosity and gene frequency at microsatellite loci, under the null hypothesis of no differentiation between the Great Orme and descendant populations. Probability of change in H, and n„ calculated by permutation test (Brookes et al. 1997), probability of change in gene frequency calculated by exact G test (section 5.2.6).

Probability of change in: Population N. H. Gene frequency Initial colonisations from GO (p crit = 0.017) HV 0.909 0.572 0.210 GF 0.067 0.878 0.000 RYF 0.007 0.032 0.003 Serial colonisations from GO (p crit = 0.010) GG 0.000 0.000 0.000 MM 0.004 0.020 0.000 PN 0.060 0.342 0.000 BW 0.008 0.586 0.040 T 0.018 0.001 0.000

The second null hypothesis of conservation interest states that multiple colonisation events do not lead to significant losses of genetic variation, in terms of numbers of alleles or heterozygosity. Five tests of this hypothesis are available by comparing each local population of the Dulas Valley with Great Orme. The results these tests are also presented in Table 5.2, where the acceptable confidence interval for samples to be regarded as significantly different is/? = 0.05/5 = 0.01.

81 Serial colonisation events have greatly reduced levels of genetic diversity within the populations of the Dulas Valley (Table 5.2). Plas Newydd is the only Dulas Valley population which has not undergone a significant reduction in either number of alleles or expected heterozygosity at microsatellite loci, when compared with the Great Orme. Plas Newydd is a large and particularly well sheltered site when compared with Rhyd- y-foel, Borth-wryd and Mynydd Marian. It is likely that, once colonised, this site has persisted and acted as a source for colonists following extinctions of neighbouring populations. It is known that there has been such population turnover within the Dulas Valley since the initial establishment of the populations. P. argus was present at Borth- wryd in 1972, extinct there by 1983, and had recolonised by 1990 (Thomas and Harrison 1993). Additionally, the genetic data strongly suggests that the original introduction site at Rhyd-y-foel, a particularly small patch, has been extinct at some point since 1945. Levels of genetic variation at Plas Newydd may thus best represent those initially introduced to the valley.

Figure 5.2 shows the changes in allele number, heterozygosity and gene frequency associated with each presumed individual colonisation event, also given are the probabilities of obtaining these changes under the assumption of no differentiation between populations (Within the Dulas Valley the most likely colonisation routes have been inferred from the proximity of the populations). The null hypothesis tested is, ‘individual colonisation events associated with the spread of the butterfly do not cause significant losses of genetic variation at microsatellite loci’, and the results of the permutation tests suggest that it cannot be rejected.

Levels of genetic variation may have been maintained either by large colonising propagules or post-colonisation gene flow. The introduction (N = 30 mated females) to Graig Fawr occurred over a distance great enough to rule out gene flow from the Great Orme (29km), but the bottleneck effective population size was high enough to prevent significant loss of alleles or heterozygosity. Additionally, mark-release experiments within the Dulas valley itself suggest that gene flow may be substantial, with up to 1.4% of individuals moving between patches separated by 13 -200 meters (Lewis et al. 1997). The only colonisation apparently associated with a large loss of genetic variation, in terms of both n^ and H^, is the initial introduction of 90 mated females to Rhyd-y-foel. However, as stated previously, genetic and ecological evidence indicates that the levels of genetic variation at Rhyd-y-foel are probably the product of post colonisation events, as well as due to the initial colonisation.

82 Figure 5.2 Schematic diagram showing changes in allele number and heterozygosity at microsatellite loci along routes of artificial introductions and natural colonisation events. Solid boxes show n, and H, for each population. Dashed boxes show the two tailed permutation test probabilities of the null hypothesis of no difference in number of alleles (ua, underlined) and heterozygosity (He) between populations. Appropriate Bonferroni correction should be applied to these probabilities. For example, the nominal level of significance for change during any individual colonisation event is P = 0.05/8 = 0.006. Bold arrows indicate documented artificial introductions. Thin arrows represent presumed routes of natural colonisation. Staight line distances between populations are given.

HV 0.2km GO 29km GF Ha 17 ^ ! 0~851 1 18 1 15 He 0.640 ^ ^ 1 0.544 ! 0.658 , 0.909 0.654

0.593 1km 0^74 0.567 I 0.521 '[ I 0.204 I

I 0.648 I I 0.577"' I ' 0.566 "' 0.024 I 0.633 \ 0.084 I 0.15km

0.647

5.3.3 Patterns of variability at allozyme and mitochondrial loci This study is also concerned with the efficacy of using various different genetic markers in conservation genetic studies. The nine populations of the study system have been assayed for variation at allozyme and mitochondrial loci by Brookes et al. (1997). Table 5.3 compares the levels of variation revealed by all three types of marker and the costs and effort involved in using them.

83 Table 5.3 Comparison of marker loci used for genotypingP. argus populations from N. Wales. Cost are for consumables, and length of study includes development time for microsatellite and mtDNA loci.

Microsatellites Allozymes mt DNA RFLP Number of loci 3 12 1 Total n„ 18 40 19 Average pop. H, 0.609 0.236 0.609 Average pop. 14.6 27.7 7 Average A -8.6% -5.85% -16.5% Average A n^ -21.6% -25.9% -33.8% Max A H, -24% (GO=>GG) -14% (GO=>RYF) -26% (GO=>PN) Max A n„ -39% (GO=>GG) -36% (GO=>RYF) -60% (GO=>T) Cost of study £3500 £1000 £1000 Length of study 14 months 6 months 6 months

Microsatellites and mtDNA RFLP's have revealed higher levels of per locus polymorphism than allozymes. However, the relative ease of allozyme electrophoresis means that a larger fraction of the genome can be sampled faster and at lower cost than for the two molecular markers. Allozyme and microsatellites show similar average reductions in and n^ between Great Orme and its daughter populations, though microsatellites are somewhat more sensitive to changes in H^, while allozymes are more sensitive to changes in n^. This difference may be due to the greater evenness of allele frequency distributions at microsatellite loci, where rarer alleles contribute more

to l-(Zpi^). Conversely, at five allozyme loci (Got-f, Sdh, Idh, Pp and Lgg-s), there are common alleles at frequencies of greater than 95%, and one or two very rare alleles. These very rare alleles are very likely to be lost during population bottlenecks, but contribute little to H^. The high allelic diversity of the single mtDNA locus cause this marker to be more sensitive to the effects of sampling on and n^ than either microsatellites or allozymes.

Brookes et al, (1997) also used permutation tests between Great Orme and its daughter populations to examine the effects of colonisation, and their results are compared directly with those obtained from microsatellites in Table 5.4

84 Table 5.4 Comparison of differentiation at allozyme, mitochondrial and microsatellite loci. Figures given are pairwise probabilities of obtaining the observed changes in each genetic parameter under the null hypothesis of no differentiation between Great Orme and descendent populations. Probabilities calculated as for Table 5.2. Allozyme and microsatellite data from Brookes et al. (1997).

Probability of change in: Population n. H. Gene frequency Initial colonisations from GO (p crit = 0.017) HV allozymes (A) 0.003 0.043 0.042 mtDNA (mt) 0.822 0.526 0.121 microsatellites (M) 0.909 0.572 0.210 GF A 0.000 0.634 0.000 mt 0.089 0.000 0.000 M 0.067 0.878 0.000 RYF A 0.001 0.018 0.006 mt 0.785 0.077 0.005 M 0.007 0.032 0.003 Serial colonisations from GO (p crit = 0.010) GG allozymes (A) 0.007 0.027 0.061 mtDNA (mt) 0.220 0.037 0.245 microsatellites (M) 0.000 0.000 0.000 MM A 0.027 0.685 0.001 mt 0.007 0.034 0.023 M 0.004 0.020 0.000 PN A 0.008 0.710 0.000 mt 0.511 0.000 0.007 M 0.060 0.342 0.000 BW A 0.003 0.455 0.126 mt 0.478 0.003 0.040 M 0.008 0.586 0.040 T A 0.000 0.193 0.000 mt 0.478 0.003 0.040 M 0.018 0.001 0.000

The 12 allozyme loci show the greatest sensitivity to changes in allele number. All of the initial introductions and serial colonisations appear to have caused reductions in allozyme n^ which are extremely unlikely under the null hypothesis of no differentiation between populations. Expected heterozygosity is a less sensitive measure of the genetic effects of both single and serial colonisation events across all three types of genetic marker. There is some evidence that colonisation has caused

85 significant reduction of heterozygosity at allelically diverse microsatellite and mtDNA loci (GO=»T, GO=>PN, GO=>GG), but allozymes do not detect this. Changes in gene frequency between Great Orme and all daughter populations are consistently revealed. Garth Gogo is the only population in which the probabilities of the observed gene frequency change show a large discrepancy between the different markers.

Figure 5.3 Changes in allele number and heterozygosity at allozyme loci, during spread ofP.argus in North Wales. Details of figure as for fig 5.2.

0 . 2 k m HV GO 2 9 k m GF “ a 29 Ô.ÔÔ3 . 36 I 25 ^ , 0.043 J ] 0.634 H e 0.229 0.249 0.245

1 \ô .œ i 0.018 1 1 3 k m MM 29 0.245 T I k m 27 rôwfv^ 0.236 J RYF I 0.114 I 23 0 . 6 k m 0.214 Â227 0 . 5 k m PN I 0.009 I 0.323 I 28 0.465 I 0.758 I 0.056 I 0..245 0 . 1 5 k m GG 26 Ï 0.220 BW 26 0.242

86 Figure 5.4 Changes in allele number and heterozygosity at the mtDNA control region, during spread ofP. argus in North Wales. Details of figure as for fig 5.2.

0 . 2 k m HV GO 2 9 k m GF U a 9 0.822 1 . 10 I^ ] 0 . 0 S 9 6 He 0.526 1 \o.ooo 0.788 0.759 -1 0.530

MM 5 0.663 T I k m 4 Ô.ÔÔ8 0.368 RYF i 0.025 , 0.080 I 1 5 k m 8 0.046 . 0 . 6 k m 0.632 ! - - p -J [■Jæ T 0 . 5 k m PN I [‘ 0 5 0 £ “i I 0.951 ^ 8 0.465 I 0..534 0.056 I 0 . 1 5 k m GG 6 i 0.623 BW 7 0.580

Figures 5.3 and 5.4 complement Figure 5.2 and show the probabilities of obtaining the observed changes in allele number, heterozygosity and gene frequency for individual colonisation events at allozyme and mitochondrial loci respectively. For allozymes, unlike microsatellites, all three initial colonisation events from the Great Orme have resulted in significant losses of alleles. However, permutation tests again suggest that following an initial purging of rare alleles during the introduction to Rhyd-y-foel, the genetic effects of subsequent natural colonisation events within the Dulas Valley are minimal. Allozyme data also supports the hypothesis that Rhyd-y-foel has suffered further bottlenecks at some point since butterflies were introduced in 1942, as it has less alleles than any of the other Dulas Valley populations. When all colonisation events within the Dulas Valley are considered together, there are no significant reductions in numbers of mitochondrial alleles, and only the introduction from Great Orme to Graig Fawr results in a significant loss of mitochondrial heterozygosity.

87 5.4. Discussion

5.4.1 Genetic effects of artificial introductions and natural colonisation The well studied demographic history of P. argus in North Wales allows a distinction to be made between the genetic effects of single and multiple colonisation events. At microsatellite loci, neither single natural colonisations nor the artificial introduction of 30 mated females to Graig Fawr are associated with a significant loss of genetic diversity, whereas there is strong evidence that serial colonisation events can cause reductions in numbers of alleles and expected heterozygosity. In contrast, n^ at allozyme loci showed significant reductions in samples from both artificial introductions and the natural colonisation of Happy Valley, as well as in the populations serially colonised from Great Orme. None of the individual colonisation events associated with the spread of the butterfly through the Dulas Valley have caused significant losses of n^ or at allozyme or microsatellite loci. This is presumably because the markers lose sensitivity to genetic drift when rare alleles are purged during the initial colonisation events.

While there is much concern that loss of genetic diversity may be a threat to population survival, there no clear agreement as to how that genetic diversity is best measured. Genetic diversity can be seen as analogous to species diversity. Numbers of alleles is equivalent to species ‘richness’, whereas expected heterozygosity has the advantage of incorporating both allele numbers (richness) and frequencies (evenness) (Mallet 1996). Expected heterozygosity could thus been seen as the obvious choice. However, the rare alleles which are most sensitive to drift contribute little to the expected heterozygosity. Additionally, large reductions in heterozygosity may be regarded as recoverable, when caused by fluctuations in gene frequency without actual losses of alleles. Measures of allelic richness also have their associated difficulties. While heterozygosity measures stabilise rapidly with increasing sampling effort, the lower limit of allele frequency is 1/2N, and it will be impossible to detect all alleles in a large population unless the whole population is sampled.

This difficulty is illustrated by the detection of four rare allozyme alleles in the Dulas Valley which were not sampled at the Great Orme. These alleles are most likely present at Great Orme and were only found in the Dulas Valley because of the greater overall sampling intensity (398 individuals in total, compared with 90 at Great Orme), Thus to ask whether two populations differ significantly in numbers of alleles is extremely difficult. The permutation method used here strictly only tests whether the two samples differ significantly in n^ and H^. However, permutation tests between 1992 and 1994 collections from the same populations failed to detect any significant changes in n^ or

88 Hg, and no novel microsatellite alleles were found in the Dulas Valley, despite much higher sampling intensity. This suggests that for microsatellite loci at least, our samples provide good representations of population allele frequency distributions.

Changes in and n^ at genetic marker loci in P. argus suggests that, in line with theoretical predictions (Gilpin 1991), populations subject to turnover can lose genetic variation rapidly even though census numbers remain high. Gilpin uses particularly extreme conditions to show that drift associated with frequent colonisation and extinction can erode levels of genetic variation orders of magnitude faster than in stable populations - all colonising propagules in his model consist of two individuals and there is no recovery of variation through mutation or migration. In the Dulas Valley, where the initial introduction was 90 mated females and the rapid spread since 1942 indicates that interpopulation migration could be quite frequent, populations still show significant losses of genetic variation. On average these populations have lost 11% of Hg and 25% of n^ at microsatellite loci compared with the source population at Great Orme, despite maintaining census numbers in the tens of thousands. Where loss of through drift over t generations is H, = Hg(l-1/2NJ\ a single population as small as 1000 individuals would only be expected to lose 2.5% of in the 50 years since the introduction to the Valley. It is clear that population turnover has accelerated the depletion of levels of genetic variation in P. argus, but will this loss continue?

The rapid spread of the species following introduction in the Dulas Valley suggest high levels of migration, but the maintenance of significant gene frequency differences between many of these local populations show that they are genetically distinct. Direct estimates of dispersal also suggest that these differences will persist, as gene flow between the local populations is extremely rare (Lewis et al. 1997). The successional nature of limestone grassland vegetation, and the highly specific habitat requirements of P. argus, mean that small local populations are ephemeral, and there is genetic and demographic evidence of continuing population turnover within the Dulas Valley (Thomas and Harrison 1992, Brookes et al. 1997). However, most of the genetic changes between GO and its daughter populations involve the loss of alleles which were initially rare (p<0.05 at GO). Failure to detect any significant losses of H^ or n^ associated with colonisation events within the Dulas Valley at microsatellite, allozyme or mitochondrial loci, suggest that levels of genetic variation similar to those seen today can be maintained across a shifting mosaic of suitable habitat.

89 5.4.2 Comparison of microsatellite, allozyme and mitochondrial variation The utility of any genetic marker for detecting changes in levels of variation will depend upon the number and extent of polymorphisms at the loci examined. The relatively large number of alleles and high heterozygosities at P argus microsatellite loci are typical of those found in other taxa (Bruford and Wayne 1993, Jame and Lagoda 1997), and make these markers particularly attractive for studying the effects of drift and gene flow during colonisation. However, the number of loci available is low and the cost per genotype, in terms of time and money, is high. The general problems associated with developing microsatellites in previously unexamined species will gradually be alleviated by the accumulation of primers and the increasing use of interspecies amplification (Deka et al. 1994, Gotelli et al. 1994). Consequently microsatellites can still be regarded as being ‘in development’ for population studies and may soon be as straightforward to use as allozyme electrophoresis - currently the cheapest and most reliable method for revealing genomic variation in most species.

Average reductions in n^ and between Great Orme and its daughter populations are very similar at microsatellite and allozyme loci (Table 5.3). However, individual colonisation events reveal inconsistencies in the sensitivity of the two markers, caused by the small sample of total genomic variation revealed by each. This illustrates the general problem of extrapolating from marker loci to genome wide effects. Although the extent of polymorphism at individual allozyme loci was lower than for microsatellites, the larger fraction of the genome sampled (12 loci), gave more rare alleles. Consequently, n^ at allozyme loci was more sensitive to the initial colonisations from Great Orme. This is generally consistent with other comparative studies of marker loci, where resolution of spatial genetic differentiation increases with total allele number (Karl and Avise 1992, Zhang et al. 1993, Pogson et al. 1995).

Genetic variation in mitochondrial DNA is expected to be particularly sensitive to population bottlenecks because maternally inherited mtDNA has only a quarter of the population size of nuclear genes (Birky et al. 1989), and has an approximately 10 fold higher rate of base substitution (Brown et al. 1982). Among all the newly colonised populations in the study, an average of 36% of alleles and 22% of heterozygosity has been lost at the mitochondrial locus examined. The locus used by Brookes et al. (1997) was highly variable, revealing a total of 19 alleles. Despite large changes in mitochondrial allele number and heterozygosity associated with several colonisation events, these are rarely significant changes. Clearly the sampling effects at a single locus with many rare alleles will be large and so large differences in allele number and heterozygosity will be a relatively frequent product of the pairwise permutation tests.

90 This study suggests that there is little to choose between using microsatellite or allozyme loci for measuring the genome-wide effects of population bottlenecks. The overall consistency of the two markers indicates that these allozyme loci can be regarded as selectively neutral, and they provide the most effective coverage of the genome in terms of both cost and time. However, most of the sensitivity to changes in variability at allozyme loci arises through the loss of very rare alleles, effects which probably have little relevance to viability. Thus it may be argued that as technical developments make larger numbers of loci available, microsatellites will provide the most sensitive measure of changes in both richness and evenness of genetic variability.

5.4.3 Conservation of P. argus in North Wales The importance of genetics in conservation is difficult to assess. While genetic considerations are undoubtedly of great concern in the management of small captive populations (Templeton 1987, Butler et al. 1994), natural populations of the same size will usually be much more vulnerable to demographic threats (Lande 1988, Nunney and Campbell 1993). Despite studies connecting a paucity of genetic variation with reduced fitness (O’Brien et al. 1985, Keller et al. 1994, Vrijenhoek 1994), the relevance of this to population extinction remains highly contentious (Caro and Laurenson 1994). It is not surprising that demographic arguments are more persuasive, an endangered population in its last throes will almost certainly be finally extinguished as a consequence of its extremely low numbers. The question for conservation geneticists is more subtle, could genetic factors be important in lowering population sizes enough to make them vulnerable to demographic catastrophe?

I have studied what is likely to become an increasing frequent scenario for many endangered British species - artificial introductions and natural colonisation events associated with spread through a fragmented habitat. This spread has been accompanied by significant losses of alleles and reductions in heterozygosity at microsatellite, allozyme and mitochondrial loci. However, only rare alleles have been lost, and while their importance for future adaptation is impossible to assess, they cannot, by definition, have much relevance to current population fitness. Additionally, the thriving daughter populations in the Dulas Valley are some of the largest in the country, with approximately 10^ adults present in the entire metapopulation. The recent history of P. argus in North Wales seems to represent a conservation success story which has rested far more on the availability and maintenance of a network of suitable habitat, than genetic factors.

91 C h a p t e r 6

Assessing the genetic consequences of demographic change in British populations of a declining butterfly species

Abstract The metapopulation structure in British populations of P. argus allows the genetic effects of habitat fragmentation to be studied over timescales of thousands of generations. Current patterns of genetic variability reflect the conflicting effects of gene flow during the initial spread of the species and genetic drift during population turnover. I have examined these patterns at allozyme and microsatellite loci in 9 British and 4 European populations. Permutation tests between source and daughter populations, and spatial analysis of levels of genetic variability, suggest that colonisation bottlenecks associated with the spread of the species across Britain, have been large enough to prevent substantial drift. Significant losses of allozyme alleles have occurred during the colonisation of Britain from Europe, but once these rare alleles are purged, neither marker gave consistent evidence for persistent losses of variation through population turnover within Britain. Rg^^and Fg^ estimates based on microsatellite loci gave consistent estimates of Nm, indicating that stepwise mutations have not had a significant effect on allele frequency distributions within these populations. Levels of genetic structuring at allozyme loci were also consistent with those at microsatellites and show little interpopulation differentiation between completely isolated populations (Microsatellite Fg^ = 0.103, Allozyme Fg^ = 0.107 for all populations). Indirect estimates of gene flow based on the genetic data suggest that current patterns of genetic variation in British populations of P. argus are the product of greatly enhanced gene flow during the post-glacial spread of the species, followed by low levels of drift in isolated populations of large effective size. Taken in conjunction with the results of Chapter 5, this suggest that losses of genetic variability through population turnover in P. argus are not affecting the persistence of this declining species.

92 6.1 Introduction Despite having lost genetic diversity through serial colonisation events, the thriving P. argus populations of the Dulas Valley show no signs of undergoing a genetically induced demographic catastrophe (Chapter 5). However, this is a ‘young’ system (only 55 generations old), and it is unclear if continual population turnover could deplete levels of variation to the extent where individual fitness is affected. The genetic effects of metapopulation structure over much longer timescales can be revealed through the patterns of genetic variation in established populations of P. argus from Britain and Europe.

The current distribution of P. argus in Britain is a product of great range expansion during the Flandrian invasion which followed the last ice age, and great range contraction following the ‘improvement’ of most British lowland heaths and chalk grasslands during this century (Thomas 1985, Thomas and Lewington 1991, Dennis 1992). Increased levels of gene flow during range expansion may produce patterns of

Additional references for the genetic consequences of post glacial expansion are Nichols and Hewitt (1994) and Hewitt (1996). homogeneity (Scwaegerle and Schaal 1979, Ross 1983, 8tone and Sunnucks 1993). The interaction of these two processes will determine the overall levels and distribution of genetic variability.

Understanding past and present interactions of drift and gene flow is crucial to any study concerned with the conservation of genetic variation. They can be studied directly, by monitoring current dispersal or population size, or indirectly, from spatial distributions of allele frequencies (Slatkin 1987). Indirect estimates average demographic processes over historical timescales, and frequently show discrepancies with direct estimates which only reflect current conditions (Ehrlich et al. 1975, Jones et al. 1981, Karl and Avise 1992). However, the nature and magnitude of such discrepancies reflect the past influences of drift and gene flow, and so can be used to reconstruct the demographic history of the species.

In this chapter, I use neutral marker loci to determine whether colonisation and population turnover have significantly reduced genetic variability in British populations of P. argus. I then use spatial patterns of genetic structure at the same loci to obtain indirect estimates of the demography of the species. These are compared with direct estimates from ecological studies (Thomas 1985, Thomas and Harrison 1992, Lewis et al. 1997), to assess the relative effects of post-glacial population spread and subsequent turnover within isolated metapopulations, on levels of genetic variability.

93 6.1.2 P. argus in Britain The silver studded blue butterfly is widely but patchily distributed throughout Europe and Asia, on early successional heaths and grasslands. It is univoltine and polyphagous, feeding on various Ericaceae, Leguminoseae and Cistaceae. Eggs are laid in warm microclimates on the vegetation boundaries, where the caterpillars feed on tender meristematic shoots and are tended by ants of the genus Lasius (Jordano and Thomas 1992). Consequently the species is dependent on the broken ground and young vegetation of early succession following habitat disturbance. In Britain, P. argus finds such suitable habitat on four different biotopes.

1. Heathland. Host plants include heathers (Erica cinerea. Erica tetralix and Calluna vulgaris), and gorse (Ulex galli and Ulex europaeus). Succession is caused by disturbances such as burning, cutting or grazing of mature heaths. Following disturbance, the habitat probably remains suitable for P. argus for 10-50 years during which time the dwarf shrubs mature and obscure the vegetation margins (Thomas and Harrison 1992). Thus frequent disturbance will cause a shifting mosaic of suitable habitat patches within a larger heathland area.

2. Mossland. Similar to wet heathland, supporting the same heather host plants. Peat digging has traditionally been the major form of mossland disturbance and creates a turnover of suitable habitat on the same timescale as for dry heaths (Thomas 1985).

3. Calcareous (limestone and chalk) grasslands. Host plants comprise mainly of rock roses (Helianthemum), and birds-foot trefoil (Lotus comiculatus). Here the early successional habitat is generally formed by grazing, quarrying activity and natural disturbances on crags and screes (Chapter 5, section 5.2.1). Succession through natural disturbances is slow and consequently some habitat may remain suitable for P. argus for tens or hundreds of years on this biotope (Thomas and Harrison 1992).

4. Calcareous sand dunes. Restricted to the North Cornish coast, this biotope supports the same host plants as limestone grassland. Disturbances include grazing, where vegetation densities are high enough, and dune erosion.

P. argus is particularly suitable for studying the long-term effects of habitat fragmentation. A combination of the destruction of lowland heaths and the cessation of traditional agricultural practices has made it one of Britain’s most rapidly declining butterfly species (Thomas 1985, Thomas and Lewington 1991). However, where present, the butterfly is still numerous with some populations exceeding one million individuals. The ecology of the species is well studied. P. argus is very sedentary and

94 a poor coloniser (Thomas 1985, Lewis et al. 1997). 90% of adults move less than 20m in their 4-5 day lifespans, and the furthest recorded distance ever moved by an individual is 350m (Thomas 1985), suggesting that populations separated by more than lOKm of unsuitable habitat are effectively completely isolated. This allows the genetic effects of population turnover to be assessed in isolated natural metapopulations with high enough census numbers not to be threatened by demographic stochasticity. Direct estimates of dispersal and census size are also available for comparison with those estimated indirectly from neutral genetic marker loci (Thomas 1985, Thomas and Harrison 1992, Lewis et al. 1997).

This study analyses genetic variation at microsatellite and allozyme loci in samples of individuals from 13 isolated British and European populations, including all four remaining Northern British populations which pre-date the introduction to the Dulas valley in 1942 (Chapter 5). I use levels of genetic variability, with direct and indirect estimates of dispersal and population size in P.argus , to differentiate between three possible demographic scenarios. 1) that drift dominates, and populations have lost genetic variability rapidly through colonisation events and population turnover. 2) that gene flow dominates, and genetic heterogeneity established during range expansion persists, despite current isolation. 3) that populations are at equilibrium and losses of variability through drift are balanced by gains through gene flow or mutation.

6.2 Methods

6.2.1 Sample Collection Samples were collected during June and July of 1992 and 1993, collection procedures were as described in Chapter 5 (section 5.2.2). European samples were collected by H. Descimon. Details of samples are given in Table 6.1 and locations in Figure 6.1.

6.2.2 Collection of genetic data Individuals were genotyped at the three microsatellite loci Papl, Pap2 and Pap3 according to the methods described in Chapter 5 (section 5.2.4). Locus Pap2 repeatedly failed to amplify in any individuals from the Chobham Common and Monte Baldo samples. Excluding these two populations, the amplification success rate at this locus was 97.5%. Failure from an entire population suggests that a PCR contaminant was introduced during the DNA extraction procedure, which was performed in population ‘batches’. It is surprising that any reaction contaminant should selectively affect one locus. However, Pap2 amplifications had to be performed in the presence of more magnesium ions than Papl and Pap3 (1.5mM compared with l.OmM), possibly indicating that this reaction is more sensitive to contaminants.

95 Table 6.1 Location, sample size (N), and biotope of the 13 study populations.

Name Abbreviation Location Biotope N BRITAIN South Stack Range SSR SH 207 826 Heathland 42 Great Orme GO SH 768 824 Limestone Grassland 40 Llyn Hafod-y-Llyn HYL SH 575 490 Mossland 41 Frees Heath PH SJ 559 374 Heathland 42 Chobham Common CC SU 955 653 Heathland 39 Studland Heath SH SZ 025 835 Heathland 42 Penns Weare m SY 696 708 Limestone Grassland 42 Breney Common BC SX 055 613 Heathland 40 Penhale Sands PSE SW 774 556 Sand Dune 40 EUROPE Crevoux CR 44“ 30’ 6°40’ Limestone Grassland 39 Cervieres CV 44° 90’ 6°45’ Limestone Grassland 31 Plar d’Aups PA 43°35’ 5°55’ Heathland 22 Monte Baldo MB 45° 50’ 10° 55’ Limestone Grassland 17

SSR ,G0

PA HYL CC

PSB Figure 6.1. Location of study populations. SH

With the exception of these 2 populations, all 3 loci produced clear, easily scoreable genotypes with an overall amplification success rate of 90.5%. However, as in the North Wales populations (Chapter 5), genotypes at microsatellite locus Papl showed significant deviations from Hardy Weinberg expectation (%^p <0.01 where H» =

96 HWE) in 11 of 13 populations. Again this was probably caused by interallelic

A test may be inappropriate in this case where many genotypes are only expected to occur at low frequency. The poohng procedures used by BIOSYS-1 to avoid this problem also greatly reduces the informativeness of the data. In order to overcome this, genotypes were tested for HWE using the Fisher exact test of GENEPOP 3.1. Despite using this more accurate procedure, the strong deviations from HWE (p = <0.001) observed at locus Papl in 11 populations persist.

6.2.3 Analysis of levels of genetic variability The genetic consequences of colonisation and population turnover were measured in terms of changes in numbers of alleles (n^), heterozygosity (HJ and gene frequencies between source and derived populations. Putative ‘European source’ and ‘British source’ populations were identified as those having the highest levels of genetic diversity, (See Chapter 5, section 5.2.6 for a discussion of the use of H^ and n^ as indices of genetic diversity). The significance of changes in genetic variability between ‘source’ and ‘derived’ populations were assessed by performing pairwise permutation tests (Chapter 5, section 5.2.6), at two levels. 1) between a source European population and the derived British populations to test the significance of losses of variability associated with the original colonisation of Britain. 2) Between a source British population and other British populations to test for significant losses of variation associated with the spread of the species in this country. Differences in genetic heterogeneity between source and derived populations was tested for significance by performing G-based exact tests on allele frequency contingency tables using the program GENEPOP 3.1 (Raymond and Rousset 1995).

I also tested for the effects of sequential colonisation by examining spatial variation in levels of genetic variability in British populations. This was performed by linear regression of H^ and n^ on physical distance from source. Following Stone and Sunnucks (1993), log^-transformed values of heterozygosity and numbers of alleles were used, because this allows values to asymptote (as monomorphism at all loci is approached), without becoming negative. A significant, non-curvilinear, negative relationship between genetic variability measures and distance from the source population would suggest a tendency for genetic variability to decline predictably during range expansion. Such an effect would be produced by sequential subsampling during serial colonisation events (Stone and Sunnucks 1993)

97 6.2.4 Analysis of genetic structure Levels of genetic subdivision in British and European populations of P. argus were quantified by estimating Wright’s Fg-p for microsatellite and allozyme loci (Wright 1978). Weir and Cockerham’s 0 estimator of was calculated using the program

FSTAT (Goudet 1986), as it has the advantage of using weighting regimes to account for differences in sample sizes and levels of polymorphism between loci. Fg^, the between population component of genetic variance, is standardised by mean allele frequencies among all populations; consequently alleles at neutral loci should give similar estimates of the statistic, regardless of their average frequencies. FSTAT also tests if Fgy estimates are significantly different from zero by performing permutation tests of multilocus genotypes among samples.

The historical effects of drift and gene flow in P. argus were assessed by comparing indirect estimates of Nm (where N is the local population size and m is the fraction immigrating per generation) derived from Fg^ at microsatellite and allozyme loci with direct estimates of dispersal. The degree of genetic structure present between populations is a result of present and past interactions of drift, mutation and gene flow. Calculating Fg^ allows demographic parameters to be indirectly estimated from the resulting patterns of genetic heterogeneity. As an estimate of gene flow between populations, Wright showed that Fg^ = 1/(1+4Nm) for selectively neutral alleles, in populations at equilibrium between gene flow and genetic drift, and where mutation rates are assumed to be « m. Although this relationship between Fg.j. and Nm is based on an infinite island population model, it also approximates to a stepping stone model, where gene flow only occurs between adjacent populations (Slatkin and Barton 1989).

In natural populations, demographic perturbations will frequently prevent the attainment of gene flow/drift equilibria, which are approached slowly (over timescales of the order of 2N^ generations, where 2N^ is the effective population size (Whitlock 1992)). However, the differences between indirect estimates of Nm, and direct observations of dispersal indicate the relative importance of drift and gene flow in the history of the population, and allow the nature of demographic perturbations to be inferred. I first determine whether British populations of P.argus are likely to be at equilibrium between loss of variation through drift and gain of variation through gene

flow and mutation (Fg.j.= l/(l+4Ng(m+jU)). In the absence of such an equilibrium, I ask whether current levels of differentiation are high as a result of historically increased drift (l/2Ng » m, ji), or low as a result of historically increased gene flow (m »

1/2N„ M)

98 6.2.5 Stepwise mutation and genetic structure at microsatellite loci It has been suggested that may not be the most appropriate statistic to describe allele frequency data at microsatellite loci (Slatkin 1995), as it is based on the assumptions ofa&alleles mutation model where mutation events generate new alleles randomly in one of k possible allelic states. This assumption is almost certainly violated at microsatellite loci, where frequent stepwise mutation means that most newly mutated alleles are closely related to their progenitor - the mutational process has a ‘memory’. However, this ‘mutational memory’ can be utilised to derive statistics which more effectively measure differentiation at microsatellite loci, because the difference in repeat number between alleles carries information about the time that has passed since they shared a common ancestral allele.

Slatkin (1991), showed that for the k alleles mutation model, when k is large and mutation rates are low, = ( f - t^yt, wheret is the average coalescence time of 2 copies of an allele from the total population, and is the average coalescence time of 2 copies of an allele from the same subpopulation. For microsatellite loci subject to stepwise mutation, these coalescence times correspond to the variance in size between alleles. So the expected variance in allele size within populations, E (S^) = where 2|ifg represents the number of mutations in both allelic lineages since common ancestry, and is the variance of change in allele size. Similarly, the expected variance in allele size between populations E(S) = . So taking the ratio

Rst = (S - S^)/S = )l2\itd^^ which after canceling gives Rg^ = (f- suggests that Rg^ for microsatellite loci under the stepwise mutation model is analogous to Fg^under the k allele model (Slatkin 1995).

estimates and their associated Nm values for P. argus microsatellite loci were obtained using the program Rg^ CALC (Goodman 1997). These were compared with Fgy estimates from the same loci, over different spatial and temporal scales to assess the importance of the mutational process in determining allele frequency distributions. Slatkin has demonstrated that Fg.^ tends to underestimate levels of genetic differentiation when compared with R ^ over demographic timescales long enough for the mutational process to be relevant. To account for sampling of a relatively small number of populations, Nm is derived from R^^by Nm = ((dj,-l)/4dJ((l/Rg-j.) -1), where d^ is the number of populations. (A correction for sampling bias is already incorporated into Weir and Cockerham’s 0 estimator of Fgy).

99 6.3 Results

6.3.1 Changes in genetic variability during the post-glacial spread ofP. argus Table 6.2 presents allele frequency data at three microsatellite loci for the 13 populations of the study system. A total of 38 alleles are present, 31 in British and 34 in European populations, with 4 alleles unique to Britain and 7 unique to Europe. Sampling intensity was much greater in Britain than Europe (368 individuals compared with 78), suggesting that European populations of P. argus have more allelicly diversity than British, as estimates of n^ asymptote slowly with increasing sampling effort. Levels of genetic variability in individual populations are shown in Figure 6.2.

c v SSR GO CR 25 20 17 21 710 MB .734 658 823 15 H 7 T 732 PA 16 14 ,712 459

733

FSB- 20 ,737 Figure 6.2a. Num bers of alleles and expected heterozygosity at 3 m icrosatellite loci. 14 18 25 603 730 758

s s ir 33 24 34 361 H. ,261 25 -J pPH“ 396 HTir / 348 ■nr 24 30 ,341 TC- 377 29 374

PSB" 25 ,335 Figure 6.2b. Num bers of alleles and expected heterozygosity at 10 allozym e loci. 24 24 28 314 319 390

100 Table 6.2. Frequencies, heterozygosity and numbers of alleles at microsatellite loci.

POPULATION Locus Allele SSR GO HYLPH CCSH PW BCPSB CR CVPA MB P503 2 ------0 . 0 1 9 0 . 0 5 0 -- 4 ------0.025- - 6 ------0.025- - 1 0 ------0 . 0 2 5 -- 1 4 0 . 0 2 4 --- 0 . 0 2 4 0 . 0 1 5 - 0 . 0 1 6 0 . 2 0 0 0 . 2 0 0 0 . 0 2 5 -- 16 0.071 0.107 --- 0 . 0 3 0 ---- 0 . 1 0 0 0 . 8 4 6 - 1 8 - 0 . 0 5 4 - 0.125 0.167 0.015 0.038 --- 0 . 0 2 5 - 0 . 0 9 4 20 0.024 - 0.019 0.179 0.048 0.091 0.077 - 0.020 0.040 0.025 - 0 . 0 3 1 2 2 ---- 0 . 1 2 1 - 0 . 1 0 9 0 . 0 8 0 --- 0 . 2 1 9 2 4 0 . 4 0 5 0 . 7 3 2 0 . 5 7 7 0 . 3 0 4 0.500 0.242 0.500 0.578 0.360 0.260 0.250 0 . 0 7 7 0 . 2 8 1 2 5 0 . 3 3 3 -- 0 . 0 5 4 0 . 0 7 1 0 . 0 1 5 0 . 0 1 9 0 . 0 9 4 - 0 . 0 1 9 - 0 . 0 3 8 0 . 0 3 1 2 6 0 . 1 4 3 0.054 0.096 0.143 - 0.167 0.250 0 . 0 4 7 0.080 0.231 0.375 0 . 0 3 8 0 . 0 3 1 28 - 0.036 0.115 0 . 1 9 6 0 . 1 6 7 0 . 1 9 7 0 . 0 5 8 0 . 1 5 6 0 . 2 6 0 0 . 0 7 7 -- 0 . 1 8 8 3 0 - - 0 . 1 9 2 - 0 . 0 2 4 0.106 0.058 - - 0.058 0 . 0 5 0 -- 34 -0.018------0 . 0 3 1 3 6 ------0 . 0 9 4 N 2 8 2 8 2 6 2 8 2 1 3 3 2 6 3 2 2 5 3 6 2 6 1 6 1 6 Pap2 2 ------0 . 0 2 5 - 4 0.214 0.190 0 . 0 9 4 0 . 2 5 7 - 0 . 1 2 8 0 . 3 8 3 0 . 2 0 0 0.105 0.095 0.033 0 . 2 7 5 - 6 0.286 0.431 0.484 0.414 - 0 . 3 4 6 0.350 0.629 0.434 0.068 0 . 1 5 0 0 . 4 2 5 - 8 0.482 0.379 0.250 0.329 - 0.346 0.133 0.171 0.434 0.734 0.583 0 . 2 5 0 - 1 0 0 . 0 1 8 - 0 . 0 9 4 - - 0.038 - - - 0.027 0 . 1 5 0 -- 1 2 - - - - - 0.026 0.133 - 0.026 - 0 . 0 3 3 -- 1 4 - - - - - 0 . 0 2 6 ------1 6 - -0.078 ------0 . 0 1 4 0 . 0 3 3 -- 1 8 ------0 . 0 5 4 0 . 0 1 7 0 . 0 2 5 - N 2 8 2 9 3 2 3 5 0 3 9 3 0 3 5 3 8 3 7 3 0 2 0 0 Pap3 2 ----- 0 . 0 2 d 0 . 0 9 1 0 . 0 1 4 ----- 16 - - - - 0.019 - - - 0 . 0 2 7 --- 0 . 0 3 1 18 0.222 0.290 - 0.015 0.038 0.115 0.076 0.143 0.176 0.313 0.357 0 . 0 2 5 0 . 0 9 4 2 0 0 . 1 3 0 0 . 1 2 9 0 . 1 5 6 0 . 3 2 4 0 . 3 2 7 0 . 1 4 1 0 . 0 3 0 0 . 0 7 1 0 . 1 6 2 0 . 0 3 1 0 . 0 7 1 0 . 1 0 0 0 . 0 9 4 2 2 0 . 2 5 9 0 . 0 6 5 0.172 0.265 0.346 0.462 0.318 0 . 5 4 3 0 . 3 2 4 0 . 0 4 7 0 . 0 2 4 0 . 7 7 5 0 . 4 0 6 24 0.074 - 0.109 - 0.058 0.090 0 . 2 7 3 - 0 . 0 4 1 0 . 3 5 9 0 . 4 0 5 0 . 0 7 5 0 . 1 5 6 26 0.056 0.323 0.250 0.015 0 . 0 7 7 0 . 1 0 3 0.136 0.229 0.054 0 . 1 2 5 0 . 0 4 8 0 . 0 2 5 0 . 0 6 3 2 8 0 . 0 1 9 - - 0 . 0 4 4 0 . 0 1 9 - - - 0 . 1 0 8 0 . 0 4 7 0 . 0 2 4 -- 3 0 - 0 . 0 8 1 0 . 0 9 4 - 0.115 0.013 - - 0 . 0 1 4 - 0 . 0 4 8 -- 3 2 0 . 1 3 0 0.048 0.219 0.338 ---- 0 . 0 4 1 0 . 0 7 8 0 . 0 2 4 - 0 . 1 2 5 3 4 0.019 0.048 - - - 0.038 0 . 0 7 6 - 0 . 0 5 4 --- 0 . 0 3 1 36 0.0190.016 - - - 0.013 ------3 8 0 . 0 7 4 ------N 2 7 3 1 3 2 3 4 2 6 3 9 3 3 3 5 3 7 3 2 2 1 2 0 1 6 A v . H . 0.734 0.658 0.712 0.732 0.733 0.758 0.730 0.603 0.737 0.624 0.710 0.459 0.823 T o t . N . 20 17 16 15 15 25 18 14 20 21 26 14 17

I have used the microsatellite allele frequency data, in conjunction with allozyme data generated by Y. Graneau and P. King to examine the genetic effects of colonisation events, in terms of reductions of and n^, associated with the spread of P. argus following the last ice age.

101 Individuals from Crevoux and Cervieres were pooled to give a ‘European source’ population. These two limestone grassland populations are in very close proximity, have relatively large sample sizes and are near the centre of the assumed glacial distribution of P. argus. Pairwise permutation tests of differences in n^ and were performed between each British population and this putative European source (Table 6.3). This tests the null hypothesis that there have been no losses of genetic variation associated with the colonisation of Britain from Europe.

Table 6.3 shows that the samples from British populations frequently have significantly less genetic variation than those from European populations. As expected, large losses of alleles are more frequent than large reductions in heterozygosity at microsatellite and allozyme loci, as rare alleles, which contribute little to standing levels of heterozygosity are the most sensitive to the effects of drift. Indeed at Chobham Common and Studland Heath, significant losses of allozyme alleles are associated with significant increases in heterozygosity. The marker system with the highest total number of alleles, allozymes, is also the most sensitive, with all British populations having significantly reduced n^.

Table 6.3 Genetic effects of the colonisation of Britain from Europe. Probabilities of obtaining the observed differences in n„ and gene frequency distributions under the null hypothesis of no differentiation between European and British populations. Probabilities of change in gene frequencies obtained from exact G tests as described in section 5.2.6. There are 9 tests of the above null hypothesis and so a revised probability level of 0.05/9 = 0.006 is used as a nominal level of significant difference. * indicates a significantincrease in H, between source and derived population.

Population Microsatellites Allozym es He Ha Gene He Ha Gene freq freq South Stack Range 0.089 0.477 0.000 0.000 0.000 0.000 Great Orme 0.702 0.068 0.000 0.003 0.000 0.000 Lynn Hafod-y-Lyn 0.714 0.000 0.000 0.647 0.000 0.000 Frees Heath 0.042 0.000 0.000 0.164 0.000 0.000 Chobham Common 0.806 0.243 0.000 *0.001 0.000 0.000 Studland Heath *0.005 0.677 0.002 *0.000 0.000 0.000 Penns Weare 0.094 0.003 0.000 0.055 0.000 0.000 Breney Common 0.034 0.000 0.000 0.011 0.000 0.000 Penhale Sands 0.043 0.040 0.000 0.829 0.000 0.000

Permutation tests of differences in numbers alleles and heterozygosity were also performed between Studland Heath and each British population. Studland Heath was

102 chosen as the putative British source population as it has the highest levels of genetic diversity of any British population at three of the four measures, (the exception being allozyme n^), and is situated on the edge of the Hampshire basin, the traditional stronghold of P. argus in Britain. The null hypothesis tested is that bottleneck induced drift, associated with spread through Britain, has caused no losses of genetic variation. It is rejected convincingly in a number of cases (Table 6.4).

Table 6.4 Genetic effects of range expansion in Britain. Probabilities of obtaining the observed differences in n„ and gene frequency distributions under the null hypothesis of no differentiation between Studland Heath and derived populations. Probabilities of change in gene frequencies obtained from exact G tests as described in section 5.2.6. There are 8 tests of the above null hypothesis and so a revised probability level of 0.05/8 = 0.006 is used as a nominal level of significant difference

Population Microsatellites Allozymes He Ha Gene freq He Ha Gene freq South Stack Range 0.3é0 0.081 0.000 0.000 0.058 0.000 Great Orme 0.001 0.084 0.000 0.000 1.000 0.000 Lynn Hafod-y-Lyn 0.137 0.001 0.000 0.010 0.133 0.000 Frees Heath 0.265 0.000 0.000 0.004 0.177 0.000 Chobham Common 0.540 0.150 0.000 0.410 0.874 0.000 Penns Weare 0.269 0.006 0.000 0.000 0.059 0.000 Breney Common 0.000 0.000 0.000 0.000 0.080 0.000 Penhale Sands 0.419 0.121 0.000 0.000 0.211 0.000

Both Table 6.3 and Table 6.4 illustrate the problems of using numbers of alleles and heterozygosity at neutral marker loci as indices of genetic diversity for studying the conservation of genetic variation. Allozyme n^ in P. argus is a highly sensitive indicator of the genetic effects of the initial colonisation of Britain (Table 6.3), but the purging of rare alleles during this process suggest that it is much less sensitive to drift during the subsequent spread of the species in Britain (Table 6.4). Similarly, many significant losses of at allozyme loci have accompanied the spread of P. argus through Britain (Table 6.4). However, it seems likely that the particularly high levels Hg at Studland Heath (higher than at Crevoux or Cervieres, at both allozyme and microsatellite loci. Figure 6.2 ), are the result of random gene frequency changes during its establishment (Table 6.3), and may not accurately reflect standing levels of variation in the population. Greater evenness of allele frequency distributions, combined with the possibility of multilocus estimates, give microsatellites an advantage over allozymes. The inconsistent pattern of changes in microsatellite H^ and n^ associated with the post-glacial spread of P. argus (Tables 6.3 and 6.4), suggests a

103 historical scenario in which localised demographic changes dominate, after rapid initial spread.

Linear regression of genetic variability measures on distance from source for British populations does not support a hypothesis of gradual erosion of genetic variability through sequential colonisation events (Figure 6.3).

4.00 Figure 6.3. y = -0.001X + 3.854 f = 0.403 Relationship between distance from Studland Heath (SH) and genetic 3.80. diversity, n^ j^ta comprises microsatellite and allozyme loci.

3.40.

n = 8 p = 0.0847 3.20 100 200 300 400

Microsatellites A llozym es 4.40 3.80 y = -O.OOOx + 4.297 # = 0.108 y= -0.001x4-3.610 # = 0.398

4.30 3.60

loge loge He 4.20- H . 3.40

4.10- 3.20

n = 9 p = 0.3643 n = 9 p = 0.0411 4.00 3.00 100 200 300 400 100 200 300 400

Distance from Studland Heath (Km)

The consistent negative relationship between genetic variability measures and physical distance is a consequence of using the most variable population (Studland Heath) as the hypothetical source. However, non significant relationships imply that there is no constant rate of decline in variability with increasing physical distance from source. There is some evidence that decline in allozyme H^ is dependent on distance from Studland Heath (p = 0.0411) but, in isolation, this result is unconvincing, as the

104 principal effect of drift during colonisation bottlenecks will be to cause loss of alleles (Regression of allozyme n^ on distance is not significant p = .2955).

Permutation tests between source and derived populations, and spatial analysis of levels of variability, indicate that persistent losses of genetic variation have not occurred during the spread of P. argus within Britain. This may be because colonisation bottlenecks have been large enough to prevent substantial drift, or because gene flow has played a greater role in the demographic history of the species than current levels of dispersal would indicate. This was investigated by using levels of genetic structure at microsatellite and allozyme loci to reconstruct an indirectly obtained demography of the species.

6.3.2. Estimation of demographic parameters from allele frequency distributions Levels of genetic structure at microsatellite loci were examined by computing and Rgy statistics and their associated Nm estimates at three spatial levels, across all populations, across British populations and across continental populations. Results are presented in Table 6.5.

Table 6.5 Genetic structure at microsatellite loci. Fst estimates were computed using FSTAT (Goudet 1986), Rgx estimates were calculated using Rgx CALC (Goodman 1997). Nm estimates were derived from formulae given in section 6.2.5. p = probability Fgx > 0, derived from permutations of genotypes between samples. Due to the absence of data at locus Pap 2 in Cbobbam Common (CC) and Monte Baldo (MB), two estimates of Fgy and Rg^ were calculated - across all three loci but excluding populations CC and MB, and across Papl and Pap3 in all populations. These are termed estimates (Est.) 1 and 2 respectively.

Locus Est Est Nm P Est Nm P All populations Papl 0.112 0.120 Pap2 0.090 0.057 Pap3 0.109 0.169 All loci 1 0.103 2.177 < 0.001 0.115 1.737 < 0.001 2 0.107 2.086 < 0.001 0.100 2.064 < 0.001 British populations 1 0.060 3.917 < 0.001 0.056 3.636 < 0.001 (all loci) 2 0.072 3.222 < 0.001 0.054 3.851 < 0.001 Continental populations1 0.226 0.856 < 0.001 0.136 1.061 < 0.001 (all loci) 2 0.208 0.952 < 0.001 0.134 1.215 < 0.001

105 Fgy and estimates are robust to the exclusion of populations Chobham Common and Monte Baldo, and locus Pap 2. Both yield consistent values of Nm at each different hierarchy of population structure; Nm « 3.5 for British populations, « 1 for

Continental populations and = 2 for all populations. The close agreement between levels of gene flow estimated by Fg^j. and suggests that over timescales of about ten thousand years, stepwise mutations have not accumulated at a rate sufficient to bias Fg^ (Section 6.2.4). This implies mutation rates at these loci of < 10'^ per gamete per generation. Under these circumstances, the specific mutational dynamics of microsatellite loci can be ignored, and it can be assumed that Njn has not significantly affected allele frequency distributions at these three microsatellite loci in the populations of the study system. Consequently, microsatellite Fg^ estimates will be used in comparison with allozymes, and for subsequent demographic analysis.

Fgy and Nm estimates derived from microsatellite and allozyme loci are compared in Table 6.6.

Table 6.6. Genetic structure at allozyme and microsatellite loci. Fst» Nm andp value estimates derived as for Table 6.5. Estimate 2 of microsatellite Fst is used (Loci Papl and 3 in all populations).

Allozymes Microsatellites Structure level Locus F^ Nm p Nm p pgm 0.054 gpi 0.095 mpi 0.087 me 0.165 sdh 0.043 idh 0.066 fum 0.048 PP 0.257 idhl 0.135 akl 0.147 All populations All loci 0.107 2.09 <0.001 0.103 2.177 <0.001 Britain All 0.097 2 J3 <0.001 0.060 3.917 <0.001 Continental All 0.116 1.91 <0.001 0.226 0.856 <0.001

Although Goudet (1996), cautions that ‘estimates of Fgy based on less than 5 polymorphic loci are bound to be dubious’, microsatellite (2 loci), and allozyme (10

106 loci), estimates of for all 13 populations are in good agreement. Continental populations show greater subdivision than British at both markers. This may reflect either higher average interpopulation distances, or longer time since isolation, both of which will be expected to increase the effects of drift. For populations in continental Europe (CR, CV, MB, PA), almost twice as much of the variance in microsatellite allele frequencies is between populations, when compared with allozymes. Other studies have found that allelically diverse markers, such as microsatellites or RFLP’s do reveal more structure that allozymes (Begun and Aquadro 1993, Scribner et al. 1994, Pogson et al. 1995) however, this effect should be consistent at all levels of subdivision. In this case the anomaly may result from the sampling effects of low number of loci in a small sample of continental populations mimicking the effects of increased drift.

Indirect estimates of Nm based on F^^ at both types of marker loci (Table 6.6), are much higher than direct estimates obtained from field studies of dispersal. Extensive mark-recapture studies suggest that P. argus is a very sedentary species, adults move an average of only 20m in a lifetime, and the longest flights ever recorded are of the order of a few hundred meters (Thomas 1985, Lewis et al. 1997). All of the populations in the study are separated by at least tens of km and have probably not exchanged individuals for hundreds or thousands of generations (Thomas and Lewington 1993). This is supported by the observation that the Dulas Valley system described in Chapter 5 is 13km from an old established population of approximately 1 million P.argus at the Great Orme, but was never naturally colonised. This discrepancy between direct and indirect estimates of Nm suggest that the lack of genetic differentiation in British populations of P. argus is not the product of a balance between gene flow and genetic drift.

In the absence of gene flow, low levels of differentiation between British populations could be maintained through a mutation drift balance, under the assumption that F^^. = l/(l+4NgjU ). Taking typical allozyme mutation rates of 10 ^ - 10^ (Neel et al. 1986),

and assuming m = 0, effective population sizes would have to be of the order of 200,000- 2 million to generate enough new mutations to account for the observed Fg^ values. Microsatellite mutation rates are generally higher, typically 10 ^ 10'"^. If this was the case for the three loci studied here, Fg^ values at microsatellite loci could be the result of a mutation drift equilibrium in P. argus populations with effective sizes of thousands or tens of thousands. However, the similarity of Fg^ and measures at microsatellite loci, suggest that a rapid stepwise mutation process is not important in partitioning genetic variation among the populations. Consequently, unless extremely

107 large effective population sizes have been maintained (« 1 million individuals), it is unlikely that current patterns of genetic differentiation are the result of a mutation/drift equilibrium.

Equilibria between the forces affecting neutral genetic variation are approached slowly. It is unlikely that they have been reached in the large British populations of P. argus since the demographic reconstruction following the last ice age. Instead, current genetic patterns of P. argus in Britain are probably a historical legacy of greatly increased gene flow during rapid range expansion into open habitat, following the retreat of ice approximately ten thousand years b.p. (Dennis 1993). Gradual forest coverage of Britain, culminating in the ‘Boreal climax’ 5-7 thousand years b.p. (Dennis 1992), would have caused subdivision of an effectively panmictic British population. Under this scenario, isolated populations do not exchange individuals following subdivision, and the extent of differentiation though drift is determined by subsequent effective population sizes. This hypothesis was tested by using Fg^as a measure of heterozygote deficit, to estimate the effective population sizes required to cause the observed decline in heterozygosity through drift since population subdivision.

Assuming levels of genetic variation at Studland Heath to be representative of pre­ isolation levels, pairwise comparisons of between this and other British populations were used to estimate N^, where (l-Fg^)' = (l-(l/2Ne))‘(l-FgT)o, where t is the time in generations since subdivision. I assume that the study populations have been effectively completely isolated since the Boreal climax, so t = 6000. These indirect estimates of effective population size are compared with direct estimates in Table 6.7, for four P. argus populations where census size data is available, and assuming that = approximately N/2 (Nunney and Cambell 1993).

Table 6.7. Comparison of indirect and direct estimates of in British populations of P. argus. Census size numbers obtained from Thomas (1985).

Pairwise Fg? withSH Indirect N, Population Microsatellite Allozymes Microsatellite Allozymes Census N, GO 0.1087 0.0707 2.5x10“ 4x10“ Ix 10" HYL 0.0648 0.1241 5x10' 2x10" 2-10x10' SSR 0.0418 0.1364 7x10' 2.5x10" 2-10x10' PH 0.1237 0.0560 2.5x10" 1x10' 1-4x10'

108 Table 6.7 suggests that if current low levels of genetic differentiation are the result of genetic drift in populations isolated for at least 6000 generations, effective population sizes must have been historically larger than at present. This is possible, however, as Lynn Hafod-y-Lynn, South Stack Range, and Frees Heath are all heath and mossland populations, the biotopes which have been most greatly in decline this century. Conversely, on the limestone grassland at the Great Orme, land use has remained much the same for at least the past 100 years (Thomas 1985), and it is likely that numbers of P. argus have remained undiminished. Here it appears that effective sizes could have remained large enough to account for the lack of differentiation for Studland Heath.

6.4 Discussion Levels of genetic variability at marker loci in P. argus suggest that effective population sizes have remained large enough to prevent substantial genetic drift during the spread of the species across its current range. Indeed, the greatly increased gene flow associated with this spread still dominates patterns of genetic heterogeneity in British populations of the butterfly. Loss of genetic variability associated with colonisation of a new range has been demonstrated in a wide variety of taxa, and include both natural and artificial introductions (Scwaegerle and Schaal 1979, Bryant et al. 1981, Ross 1983, Johnson 1988, Vrijenhoek 1994, Brookes et al. 1997). In a classic study of the effects of serial colonisation in an invaded range. Stone and Sunnucks (1993) showed a striking linear loss of variability with distance from origin, in the spread of the cynipid gallwasp across Western Europe.

No such clear effect is found for P.argus in Britain. Pairwise permutation tests between source and colonised populations do reveal significant losses of H^ and n^ associated with spread from Europe across Britain, but these losses are rarely consistent, either between different types of marker loci within populations, or between different populations at the same type of marker locus. Spatial analysis of genetic variability levels also show no significant effect of distance from source on levels of variation.

Direct observations of the mobility of P. argus suggest that gene flow between populations will be low, and that drift, enhanced by turnover within populations, will predominate (Thomas and Harrison 1992, Hanski and Thomas 1996). This is contrary to the indirect estimates presented here, based on allele frequency distributions at marker loci, which indicate that levels of gene flow have been high enough to prevent substantial genetic differentiation. This discrepancy between direct and indirect estimates of gene flow is a frequent observation in a variety of

109 species (Ehrlich et al. 1975, McCauley and Eanes 1987, Slatkin 1987, Peterson 1996), and arises through the difficulty of accurately measuring gene flow directly, and because indirect estimates average levels of gene flow over long periods.

The direct observation of low mobility of P. argus is based on mark-recapture studies in fragmented habitats (Thomas 1985, Lewis et al. 1997), where long distance dispersal entails risk of isolation. The possibility that individuals have higher mobility when no suitable habitat is available cannot be ruled out. For many species, dispersal occurs over distances much shorter than individuals are capable of moving (Ehrlich and Raven 1969). To test this, M. Brookes, Y. Graneau and I, released 12 individual male P. argus in a large field of mown pasture, containing no host plants or exposed ground. All the butterflies rapidly dispersed over hundreds of meters, distances as great as the longest ever recorded flights on suitable habitats, where less than 1% of individuals move more than 100 meters in their 3-5 day lifespans (Lewis et al. 1997). The rapid colonisation of the Dulas Valley (Chapter 5), where intersite distances are >500m, also suggests that direct approaches underestimate the true dispersal ability of P. argus. Despite this evidence, the general inability of the species to colonise over distances of greater than 10 km (Thomas 1985, Brookes et al. 1997), suggests that there is no gene flow between the study populations.

Where effective population sizes are large, change through genetic drift will be slow. Indirect estimates average the effects of gene flow across these timescales, and so historical perturbations of population structure or dispersal will cause discrepancies with direct estimates. The history of P. argus in post ice-age Britain probably reflects just such a demographic reconstruction. The reappearance of P. argus in Britain, along with most of the other butterfly fauna, during the pre-boreal ‘Flandrian invasion’, was probably characterised by rapid dispersal and widespread distribution, enhanced by the warm summer temperatures (17°C av.) and low ground cover. Greatly increased gene flow during this period would gradually have been reduced by increasing population isolation through forest growth. The great decline of the species since the turn of this century mean that population sizes have probably been historically larger than current census numbers indicate. Thomas (1985), notes that in 400 km^ surrounding Frees Heath, the only surviving midlands population, 5% of the place names on the 1:50 000 OS map incorporate the word heath, although there is scarcely any of this biotope left. This, and direct observation of current population sizes (Table 6.7), suggests that even if populations were completely isolated from 6000 years b.p., effective sizes may have remained large enough to account for the low levels of genetic divergence at neutral marker loci.

110 Complete isolation of P. argus populations allows a test of the hypothesis that turnover in a metapopulation increases the rate of loss of genetic variation (Gilpin 1991). Where drift is enhanced through colonisation bottlenecks and extinction events, heath and mossland biotopes will be expected to lose genetic variation faster than limestone grasslands, as their rate of turnover is approximately twice as fast (Thomas 1985). If the study populations have not exchanged genes for at least 6000 generations, enhanced drift on heath and mossland biotopes should have resulted in lower levels of Hg and n^, and increased when compared with limestone grassland populations. There were no significant differences between biotopes in levels of genetic variability (p>0.05 Mann Whitney U test, for and n^ at allozyme and microsatellite loci), or estimates. Although only two limestone grassland populations are used in this comparison, the result concurs with the permutation test data and suggests that even though frequent population turnover does occur in P. argus populations, effective population sizes remain large enough to prevent substantial losses of variability through drift.

Recent great advances in molecular techniques have allowed a more detailed examination of population genetic structure, but care must be taken when extrapolating from marker data to demographic parameters. Patterns of variability at marker loci reflect an interaction between the demographic history of the species and the intrinsic mutational dynamics of the sequences in question. For most analyses using allozyme loci, infinite alleles mutation models are assumed, in which mutational ‘noise’ is considered low enough to be ignored. At highly variable microsatellite loci, increasingly the marker of choice for population genetic studies (Bruford and Wayne 1993), a highly non-random mutational process, operating at high rates, could significantly affect patterns of genetic heterogeneity (Slatkin 1995). Empirical studies have shown that, when infinite alleles models are assumed, divergence in the estimates of demographic parameters extrapolated from microsatellite and allozyme data increases with time (Deka et al. 1994, Forbes et al. 1995, Goldstein et al. 1995a). Methods which incorporate stepwise mutation provide more consistent agreement between microsatellites and other genetic, morphological and paleobiological data (Deka et al. 1994, Forbes et al. 1995, Shriver et al. 1995).

In P. argus, there was close agreement between estimates of gene flow estimated from microsatellite data assuming infinite alleles (F^y), and stepwise mutation (R^j), models. These measures remained consistent for populations which have diverged over different evolutionary timescales (within Britain and within Europe.). This is probably because the timescales of divergence of the study populations are not long enough to have allowed a substantial occurrence of stepwise mutation events (Slatkin 1995,

111 Rousset 1995). In the studies of Forbes et al. (1995) and Deka et al. (1994) Fg^ and Rg.p based estimates of divergence differed substantially only at the sub-species level and above, this, and the data from P. argus populations, suggest that the intrinsic mutational dynamics of microsatellite loci may be irrelevant for the interpopulation comparisons considered here.

British populations of P. argus provide an ideal natural system for investigating the hypothesis that population turnover can drastically reduce genome wide genetic variability; a crucial first stage in the argument that genetic factors can play a causative role in species extinctions. This rapidly declining species has been isolated in fragmented metapopulations for thousands of generations, where the timescale of local patch turnover is tens of generations. This, combined with its apparently sedentary nature and poor colonising ability suggest that drift induced losses of variability could lead to inbreeding depression, even in populations with high census numbers. This is the scenario envisaged by Gilpin (1991), and is especially relevant to the increasing number of species with small local populations due to habitat fragmentation.

Genetic data does not support such a catastrophic outlook. Losses of variability associated with turnover in P. argus are generally small, inconsistent, and only concern rare alleles. This does not mean that genetic losses of significance to viability have not occurred. The microsatellite and allozyme loci used are only surrogates of overall genomic diversity, and the accuracy of heterozygosity as a relative measure of inbreeding could be improved by increasing the number and polymorphism of the marker loci used. However, the overall consistency in the patterns of genetic subdivision revealed by the two markers is encouraging in the light of the increasing use of microsatellites for population analysis.

The results of this study indicate the power of gene flow to maintain levels of genetic diversity even in patchily distributed and sedentary species. British populations of P. argus may well be facing a catastrophe, but it will be environmental, and not genetic. Genetic drift has not caused a significant loss of diversity in these populations over the last 10000 years. Current rates of habitat loss will remove it all within the next 100.

112 Chapter 7 General Conclusions

This thesis has been concerned with the evolutionary dynamics of microsatellites and their use as genetic markers in conservation studies. Each chapter contains an independent discussion of the results presented therein. Here I conclude with a brief general discussion of each of these two, somewhat divergent, sections.

7.1 Dynamic mutation at microsatellite loci Microsatellites represent something of a holy grail in the search for the ideal genetic marker. Their widespread distribution and high polymorphism has resulted in a huge range of applications, including the transformation of linkage mapping and increasingly in forensics and population genetics (Bruford and Wayne 1993, Jame and Lagoda 1996). However, the dynamic mutation processes which underlie this utility are still poorly understood. In addition to being of paramount importance for the analysis of population genetic structure, elucidating the correct mutation model will offer insight into fundamental processes of genomic evolution. Microsatellites are one of several classes of tandemly repetitive DNA, sequences which share many properties of dynamic evolution and together comprise large amounts of most eukaryotic genomes. There is increasing awareness that redundancy and turnover in these repetitive elements may have important co-evolutionary roles in genome structure and function (Cavalier-Smith 1985, Dover 1993).

Repeat copy number at microsatellite loci is usually low compared with other repetitive DNA classes, despite the rapid nature of size change. Understanding what constrains dynamic mutation - how microsatellites are bom, and what stops them growing in size - will give important insights into the general mutational processes of repetitve DNA. In Chapters 2 and 3 respectively, I proposed explanations for the lower and upper size limits to dynamic mutation at microsatellite loci. I suggested that these limits are imposed by the intrinsic mutational properties of microsatellites, and may arise without a generalised selective constraint.

The intention of these studies is to present a novel and provocative approach, rather than to derive an exact analytical description of array size dynamics. The results of Chapter 3 indicate that models of microsatellite evolution will not be very useful without a greater empirical understanding of mutational processes at microsatellite loci. Knowledge is accumulating rapidly, however, especially from well designed experimental systems in Saccharomyces cerevisiae (Petes et al. 1993, 1997). Indeed,

113 this system has just been used to show that rates of instability for poly GT arrays increase by more than two orders of magnitude as arrays increase in size from 15 to 99 bp (Wierdl et al. 1997). This dependency of stability on length is a crucial assumption of the model presented in Chapter 3. Greater understanding of the evolutionary dynamics of microsatellites will also undoubtedly come from the increasing range of taxa from which they have been isolated. Interspecies comparisons first indicated that array size evolution is subject to constraint (Chapter 1.3), and should provide greater information on rates, degrees of bias and levels of instability in the dynamic mutation processes (Brooker et al. 1994, Rubinstein et al. 1995, Schug et al. 1997).

Many areas of biology have been revolutionised by whole genome sequencing projects, and the study of genomic evolution is set to become one of these. Precise information on the number, size and distribution of microsatellites will soon be available for model organisms spanning a wide range of taxa. Making the best use of this data will require new approaches, and a willingness to look at old problems from new angles. I advocate such a new approach in Chapters 2 and 3. Assumption of neutrality allows microsatellites in sequence databases to be viewed as a huge population of independently evolving arrays. Manipulation of sequences to produce more specific databases (for example, all non-coding sequences. Chapter 2), will also become easier with the proliferation of bioinformatics software.

I hope that the studies reported in Chapters 2 and 3 will encourage further work using this new approach. Combining a greater understanding of mutation mechanisms with improved sequence database analysis should allow the development and testing of more realistic models, and lead to a clearer understanding of microsatellite evolutionary dynamics. Such an understanding could find application in the association between microsatellites and human genetic disease, and will enhance the use of these loci as molecular markers for population and conservation genetics.

7.2 Genetic variability and population persistence The scope of genetics in conservation issues is broad, ranging from studies of paternity in tiny captive populations, to systematics and the assessment of species identity (Schonewald-Cox et al. 1983, Avise and Hamrick 1996). I have studied what is probably its most contentious application, the relevance of overall levels of genetic variability to population persistence. For the identification of conservation relevant entities (Operational Taxonomic Units), genetic approaches are as valid as any other. Disagreements in this field revolve around the inevitably subjective question of what aspect of biodiversity we are trying to conserve. But the claim that genetic

114 impoverishment could be a primary cause of extinctions is a much more complicated and controversial question.

Does reduced heterozygosity really matter? Some prima facie evidence suggests that it does. There are numerous examples of endangered populations with low levels of scoreable genetic variation, and in some cases these have been linked with reductions in individual fitness (Chapter 4). However, most of these studies confuse cause and effect. Genetic impoverishment is a likely effect of drastic reductions in population size caused by demographic or environmental factors. While this knowledge helps to inform management plans for such highly endangered species, it does not provide a new insight into conserving global biodiversity.

I examined whether losses of genetic variation could be a cause, rather than an effect, of population decline in the silver studded blue butterfly. The case for genetic stochasticity as a cause of extinctions has been strengthened by recent theoretical work, and suggests that it could represent as big a threat as demographic or environmental factors in spatially discontinuous populations subject to turnover (Chapter 4.2). As prevailing human land use practices are causing increasing fragmentation of natural habitats, this represents an area of genuine concern.

Populations of P. argus have lost genetic variation through population turnover over a variety of spatial scales (Chapters 5 and 6). However there have been no apparent fitness costs associated with these losses (Brookes et al. 1997), indeed the populations are thriving, a lack of suitable habitat being the only apparent limit to a more widespread distribution. This dichotomy is typical of other studies, although some convincing examples of natural inbreeding depression do exist (Keller et al. 1994, Jimenez et al. 1994), and exemplifies the problem of attributing population decline to genetic factors.

How should losses of genetic variability be quantified and interpreted? Permutation tests may indicate significant reductions in variability, in terms of heterozygosity and numbers of alleles, associated with periods of small population size (Chapters 5 & 6), but their significance in terms of individual fitness it not clear. Even the discovery of particularly low levels of variation does not lead to a clear plan for conservation. According to the ‘dominance’ view of inbreeding depression, such a population may quickly become purged of deleterious recessive alleles, and fitness may rebound rapidly (Bryant et al. 1990, Brakefield and Saccheri 1994). Whereas under the assumptions of an ‘overdominance’ view, fitness could suffer from the accumulation

115 of many deleterious recessive alleles of small effect, or a failure to respond to new ecological challenges (Lande 1995, Avise and Hamrick 1994).

The tools available to conservation geneticists have proliferated rapidly in the last 20 years (Chapter 4.4), allowing increasing genetic scrutiny of a wider diversity of endangered populations. Whether increased scrutiny has lead to greater clarity is another matter. I showed that allozyme and microsatellite loci showed broadly the same response to population turnover at both local (Chapter 5) and national (Chapter 6) scales, strengthening the view that either type of marker can be used as a surrogate for overall genomic diversity. However, consistency at neutral marker loci does not necessarily make them reliable indicators of genetic health. Many widespread and thriving species have low levels of scoreable genetic variation, and even in genetically depauperate endangered species, the link between marker loci and fitness is highly contentious (Caro and Laurenson 1990). An alternative is to concentrate on maintaining diversity at loci of special adaptive significance (Hughes 1991). However, this approach is just as subjective, presupposing that such loci can be reliably identified.

Conservation is ultimately not a scientific activity. Decisions on where and what to preserve being matters of aesthetics or expediency. However, science does have a vital role to play in deciding how to conserve, informing management decisions and ensuring that they are effective, once made. The explosive population growth and excessive material demands of the human race are increasing the pressure to turn scientific knowledge into practical conservation plans. Genetic approaches form a significant part of many of these (Annual report of the Institute of Zoology: Science for Conservation 1996). However, the failure to establish a clear link between overall genetic variability and population viability raise important questions about the validity of such an approach for practical conservation.

The ‘heterozygosity - viability’ issue is extremely complex, but poorly understood. Frequently, as in most areas of conservation biology, the cry is made for more and better studies to shed light on the problem. Knowledge is always preferable to ignorance, but in the 3 year course of the average research grant, many species are lost and the human population grows by 200 million (Avise and Hamrick 1996). Scientific knowledge is not keeping pace with political and social development. Conservationists must be practical. In the light of many inconclusive studies, it may be time to lay to rest the issue of reduced genetic variation as a primary cause of population extinction.

116 Literature cited

Alkopyanz, N., Bukanov, N., Westblom, T. U., and Berg, D. B. (1992) PCR-based RFLP analysis of DNA-sequence diversity in the gastric pathogen Helicobacter- pylori. Nucleic Acids Research 20: 6221-6225. Ammerman, A. J., and Cavalli-Sforza, L.L. (1984) The Neolithic transition and the genetics of populations in Europe. Princeton University Press; Princeton N.J. Amos, B., Schlotterer, C., and Tautz, D. (1993) Social-structure of pilot whales revealed by analytical DNA profiling. Science 260: 670-672. Anvret, M., Ahlberg, G., Grandell, U., Hedberg, B., Johnson, K., and Edstrom, L. (1993) Larger expansions of the CTG repeat in muscle compared to lymphocytes from patients with Myotonic-Dystrophy. Human Molecular Genetics 2: 1397- 1400. Avise, J. C. (1995) Mitochondrial-DNA polymorphism and a connection between genetics and demography of relevance to conservation. Conservation Biology 9: 686-690. Avise, J. C., and Hamrick, J. L. (1996) Conservation Genetics: Case histories from nature. Chapman and Hall. Avise, J. C., and Nelson, W. S. (1989) Molecular genetic-relationships of the extinct dusky seaside sparrow. Science 243: 646-648. Ballard, J. W. O., and Kreitman, M. (1995) Is mitochraidrial DNA a strictly neutral marker? Trends in Ecology and Evolution 10: 485-488. Banchs, I., Bosch, A., Guimera, I., Lazaro, C., Puig, A., and Estivill, X. (1994) New alleles at microsatellite loci in CEPH families mainly arise from somatic mutations in the lymphoblastoid cell-lines. Human Mutation 4: 232-232. Bassett, D. E., Basrai, M. A., Connelly, C., Hyland, K. M., Kitagawa, K., Mayer, M. L., Morrow, D. M., Page, A. M., Resto, V. A., Skibbens, R. V., and Hieter, P. (1996) Exploiting the complete yeast genome sequence. Current Opinion In Genetics & Development 6: 763-766. Begun, D. J., and Aquadro, C. F. (1993) African and North-American populations of Drosophila-melanogaster are very different at the DNA level. Nature 365: 548- 550. Berry, R. J., Triggs, G. S., King, P., Nash, H. R., and Noble, L. R. (1991) Hybridization and gene flow in house mice introduced into an existing population on an island. Journal Of Zoology 225: 615-632. Birky, C. W., Maruyama, T., and Fuerst, P. (1983) An approach to population and evolutionary genetic theory for genes in mitochondria and chloroplasts, and some results. Genetics 103: 513-527.

117 Bonnell, M L. and R.K. Selander (1974) Elephant seals: genetic variation and near extinction. 5'd^/icc 184:908-909. Brakefield, P. M., and Saccheri, I. J. (1994) Guidelines in conservation genetics and the use of population cage experiments with butterflies to investigate the effects of genetic drift and inbreeding, pp 165-179 in Conservation Genetics. Edited by V. Loeschcke, J. Tomiuk and S. K. Jain. Birkhauser; Basel. Brooker, A. L., Cook, D., Bentzen, P., Wright, J. M., and Doyle, R. W. (1994) Organization of microsatellites differs between mammals and cold-water teleost fishes. Canadian Journal Of Fisheries and Aquatic Sciences 51: 1959-1966. Brookes, M L et al. (1997) Genetic analysis of founder bottlenecks in the rare British butterfly Plebejus argus. Conservation Biology 11: 648-661 Brown, W. M., Prager, E. M., Wang, A., and Wilson, A. C. (1982) Mitochondrial- DNA sequences of primates - tempo and mode of evolution. Journal Of Molecular Evolution 18:225-239. Bruford, M. and R.K. Wayne (1993) Microsatellites and their application to population genetic studies. Current Opinion in Genetics and Development 3: 939-943. Bryant, E. H., Meffert, L. M., and McCommas, S. A. (1990) Fitness rebound in serially bottlenecked populations of the house-fly. American Naturalist 136:

Bryant, E H„ Vandijk, H., and VanDclden, W. (1981) Genetic variability of the face lly, Musca-autumnalis Degeer, in relation to a population bottleneck. Evolution 35: 872-881. uairns, J . , and Foster, P.L. (1991) Adaptive reversion of a in Escherichia-coli. Genetics 128: 695-701. Callen, D. F., Thompson, A. D., Shen, Y., and Phillips, H. A. (1993) Incidence and origin of 'null' alleles in microsatellite markers. American Journal of Human 52: 922-927. Caro, T. M., and Laurenson, M. K. (1994) Ecological and genetic-factors in conservation - a cautionary tale. Science 263: 485-486. Carson, H. L. (1968) The population flush and its genetic consequences, pp 123-137 in Population Biology and Evolution. Edited by R. C. Lewontin. Syracuse Univ. Press; Syracuse NY. Cavalier-Smith, T. (1985) The Evolution of Genome Size. John Wiley and Sons Ltd; London. Chakraborty, R., Deandrade, M., Diager, S. P., and Budowle, B. (1992) Apparent heterozygote deficiencies observed in DNA typing data and their implications in forensic applications. Annals of Human Genetics 56: 45-47.

118 Charlesworth, B., Sniegowski, P., and Stephan, W. (1994) The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371: 215-220. Clark, A. G., and Lanigan, C. M. S. (1993) Prospects for estimating nucleotide divergence with RAPDs. Molecular Biology and Evolution 10: 1096-1111. Costa, R., Peixoto, A. A., Barbujani, G., and Kyriacou, C. P. (1992) A latitudinal cline in a Drosophila clock gene. Proceedings Of the Royal Society Of London Series B-Biological Sciences 250: 43-49.

Crow. .T. F. 11 9R6^ Rncir Ponrenfc in Pnnnl?>tir»n AnanfifAfivA qnrl F\/r>lntir>nor\/

Dallas, J. F. (1992) Estimation of microsatellite mutation rates in recombinant inbred strains of mouse. Mammalian Genome 3: 452-456.

dinucleotide (dC-dA)^. (dG-dT)„ polymorphisms in world populations. American Journal Of Human Genetics 56: 461-474. Deka, R., Shriver, M. D., Yu, L. M., Jin, L., Aston, C. E., Chakraborty, R., and Ferrell, R. E. (1994) Conservation of human-chromosome-13 polymorphic microsatellite (CA)„ repeats in chimpanzees. Genomics 22: 226-230. Dennis, R. L. H. (1993) An evolutionary history of British butterflies. pp217-245 in The Ecology of Butterflies in Britain. Edited by R. L. H. Dennis. Oxford University Press; Oxford. Dib, C., Faure, S., Fizames, C., Samson, D., Drouot, N., Vignal, A., Millasseau, P., Marc, S., Hazan, J., Seboun, E., Lathrop, M., Gyapay, G., Morissette, J., and Weissenbach, J. (1996) A comprehensive genetic-map of the human genome based on 5,264 microsatellites. 380: 152-154. Dietrich, W. E., Miller, J., Steen, R., Merchant, M. A., Damronboles, D., Elusain, Z., Dredge, R., Daly, M. J., Ingalls, K. A., Oconnor, T. J., Evans, C. A., Deangelis, M. M., Levinson, D. M., Kruglyak, L., Goodman, N., Copeland, N. G., Jenkins, N. A., Hawkins, T. L., Stein, L., Page, D. C., and Lander, E. S. (1996) A comprehensive genetic-map of the mouse genome. Nature 381: 172-172. DiRienzo, A., Peterson, A. C., Garza, J. C., Valdes, A. M., Slatkin, M., and Freimer, N. B. (1994) Mutational processes of simple-sequence repeat loci in human- populations. Proceedings Of the National Academy Of Sciences Of the United States Of America 91:3166-3170. DiRienzo, A., Thomas, R., Peterson, A. C., Valdes, A. M., Garza, C., Slatkin, M., and Freimer, N. B. (1993) Mutations at simple sequence repeats in human- populations. American Journal Of Human Genetics 53: 792-792. Dover, G. A. (1993) Evolution of genomic redundancy for advanced players. Current Opinion in genetics and Development 3: 902-910.

119 Dujon, B. (1996) Yeast Genome. M S-Medicine Sciences 12: 29-32. Ebenhard, T. (1991) Colonization in metapopulations - a review of theory and observations. Biological Journal Of the Linnean Society 42: 105-121. Ebenhard, T. (1995) Conservation breeding as a tool for saving animal species from extinction. Trends In Ecology & Evolution 10: 438-443. Edwards, A., Hammond, H. A., Jin, L., Caskey, C. T., and Chakraborty, R. (1992) Genetic-variation at 5 trimeric and tetrameric tandem repeat loci in 4 human- population groups. Genomics 12: 241-253. Edwards, K. J., Barker, J. H. A., Daly, A., Jones, C., and Karp, A. (1996) Microsatellite libraries enriched for several microsatellite sequences in plants. Biotechniques 20: 758. Ehrlich, P. R., and Raven, P. H. (1969) Differentiation of populations. Science 165: 1228-32. Ehrlich, P. R., White, R. R., Singer, M. C., McKechnie, S. W., and Gilbert, L. E. (1975) Checkerspot butterflies: a historical perspective. Science 188: 221-228. Ellegren, H., Hartman, G., Johansson, M., and Andersson, L. (1993) Major histocompatibility complex monomorphism and low-levels of DNA- fingerprinting variability in a reintroduced and rapidly expanding population of beavers. Proceedings Of the National Academy Of Sciences USA 90: 8150- 8153. Ellegren, H., Lifjeld, J. T., Slagsvold, T., and Primmer, C. R. (1995) Handicapped males and extrapair paternity in pied flycatchers - a study using microsatellite

Ellegren, H., Primmer, C. R., and Sheldon, B. C. (1995) Microsatellite ‘evolution’: directionahty or bias? Nature Genetics 11: 360-362.

nost races or sibling species? Heredity 75: 416-424. Eppig, J. T. (1996) Comparative maps: Adding pieces to the mammalian jigsaw puzzle. Current Opinion In Genetics & Development 6: 723-730. f^cKlemov^» J- R , and Markowitz, S. D. (1996) Mismatch repair defects in human caicinogenQsis. Human Molecular Genetics 5: 1489-1494. Estoup, A., Garnery, L., Solignac, M., and Cornuet, J. M. (1995) Microsatellite variation in honey-bee (Apis-mellifera I) populations - hierarchical genetic- structure and test of the infinite allele and stepwise mutation models. Genetics 140: 679-695. Ewens, W. J. (1972) The sampling theory of selectively neutral alleles. Theoretical Population Biology 3:87-112. Falconer, D. S. (1989) Introduction to quantitative genetics. Third ed. John Wiley; New York.

120 FitzSimmons N, N, Moritz C, Moore S,S. (1995) Conservation and Dynamics of Microsatellite Loci over 300 Million Years of Marine Turtle Evolution. Molecular Biology and Evolution. 12: 432-440. Fleischer, R. C., Tarr, C. L., and Pratt, T. K. (1994) Genetic-structure and mating system in the palila, an endangered hawaiian honeycreeper, as assessed by XyHA-fingQTgnntmg. Molecular Ecology 3: 383-392. Forbes, S. H., and Boyd, D. K. (1996) Genetic-variation of naturally colonizing wolves in the central rocky-mountains. Conservation Biology 10: 1082-1090. Forbes, S. H., Hogg, J. T., Buchanan, F. C., Crawford, A. M., and Allendorf, F. W. (1995) Microsatellite evolution in congeneric mammals - domestic and bighorn sheep. Molecular Biology and Evolution 12: 1106-1113. Fomage, M., Chan, L., Siest, G., and Boerwinkle, E. (1992) Allele frequency- distribution of the (TG)„(AG)^ microsatellite in the apolipoprotein c-II gene. Genomics 12: 63-68. Frankham, R. (1995) Conservation genetics. Annual Review of Genetics 305-327. Franklin, I. R. (1980) Evolutionary change in small populations, pp 135-149 in Conservation Biology: An evolutionary-ecological perspective. Edited by M. E. Soule. Sinauer; Sunderland MA. Frontali, M., Sabbadini, G., Novelletto, A., Jodice, C., Naso, F., Spadaro, M., Giunti, P., Jacopini, A. G., Veneziano, L., Mantuano, E., Malaspina, P., Ulizzi, L., Brice, A., Durr, A., and Terrenato, L. (1996) Genetic fitness in Huntingtons-disease and Spinocerebellar ataxia-1 - a population-genetics model for CAG repeat expansions. Annals Of Human Genetics 60: 423-435. Garcia-Moreno, J., Matocq, M. D., Roy, M. S., Geffen, E., and Wayne, R. K. (1996) Relationships and genetic purity of the endangered mexican wolf based on analysis of microsatellite loci. Conservation Biology 10: 376-389. Garza, J. C., Slatkin, M., and Freimer, N. B. (19956) Microsatellite allele frequencies in humans and chimpanzees, with implications for constraints on allele size. Molecular Biology and Evolution 12: 594-603. Gasser, S. M., and Laemmli, U. K. (1987) A glimpse at chromosomal order. Trends In Genetics 3: 16-22. Gibbs, H. L., Prior, K. A., and Weatherhead, P. J. (1994) Genetic-analysis of populations of threatened snake species using RAPD markers. Molecular Ecology 3: 329-337. Gilbert, W. (1992) A vision of the Grail, in The code of codes: scientific and social issues of the Human Genome Project. Edited by K. A. Hood. Harvard University Press. Gilpin, M. (1991) The genetic effective size of a metapopulation. Biological Journal Of the Linnean Society 42: 165-175.

121 Gilpin, M., and Wills, C. (1991) MHC and captive breeding - a rebuttal. Conservation Biology 5: 554-555. Goldman, A., Ramsay, M., and Jenkins, T. (1994) Absence of Myotonic-Dystrophy in Southern African negroids is associated with a significantly lower number of CTG trinucleotide repeats. Journal Of Medical Genetics 31: 37-40. Goldstein, D. B., and Clark, A. G. (1995) Microsatellite variation in North-American populations of Drosophila- melanogaster. Nucleic Acids Research 23: 3882- 3886. Goldstein, D. B., Linares, A. R., Cavalli Sforza, L. L., and Feldman, M. W. (1995a) Genetic absolute dating based on microsatellites and the origin of modem humans. Proceedings Of the National Academy Of Sciences USA 92: 6723- 6727. Goldstein, D. B., Linares, A. R., Cavalli Sforza, L. L., and Feldman, M. W. (19956) An evaluation of genetic distances for use with microsatellite loci. Genetics 139: 463-471. Goodman, S. J. (1997) Rg^CALC: A collection of computer programs for calculating estimates of genetic differentiation from microsatellite data and determining their significance. Molecular Ecology In press: Gottelli, D., Sillerozubiri, C., Applebaum, G. D., Roy, M. S., Girman, D. J., Garciamoreno, J., Ostrander, E. A., and Wayne, R. K. (1994) Molecular- genetics of the most endangered canid - the Ethiopian wolf Canis-simensis. Molecular Ecology 3:301-312. Goudet, J. (1995) Fstat version 1.2: a computer program to calculate F-statistics. Heredity 86: 485-486. Gray, I. C., and Jeffreys, A. J. (1991) Evolutionary transience of hypervariable minisatellites in man and the primates. Proceedings Of the Royal Society Of London Series B-Biological Sciences 243: 241-253. Hadrys, P., Balick, M., and Schierwater, B. (1992) Applications of randomly amplified polymorphic DNA (RAPD) in molecular ecology. Molecular Ecology 1: 55-63. Hagelberg, E., Gray, I. C., and Jeffreys, A. J. (1991) Identification of the skeletal remains of a murder victim by DNA analysis. Nature 352: 427-429. Hamada, H., Petrino, M. G., and Kakunaga, T. (1982) A novel repeated element with Z-DNA-forming potential is widely found in evolutionarily diverse eukaryotic genomes. Proceedings Of the National Academy Of Sciences USA 79: 6465-6469. Hamada, H., Petrino, M. G., Kakunaga, T., Seidman, M., and Stollar, B. D. (1984a) Characterization of genomic poly(dT-dG). poly(dC-dA) sequence - structure, organization, and conformation. Molecular and Cellular Biology 4: 2610-2621.

122 Hamada, H., Seidman, M., Howard, B. H., and Gorman, C. M. (19846) Enhanced gene-expression by the poly(dT-dG). poly(dC-dA) sequence. Molecular and Cellular Biology 4: 2622-2630, Hancock, J. M. (1996) Simple sequences in a minimal genome. Nature Genetics 14: 14-15. Hanski, I., and Gilpin, M. (1991) Metapopulation dynamics - brief-history and conceptual domain. Biological Journal Of the Linnean Society 42: 3-16. Hanski, I., and Thomas, C. D. (1994) Metapopulation Dynamics and conservation - a spatially explicit model applied to butterflies. Biological Conservation 68: 167- 180. Hansson, L. (1991) Dispersal and connectivity in metapopulations. Biological Journal Of the Linnean Society 42: 89-103. Harding, R. M., Boyce, A. J., and Clegg, J. B. (1992) The evolution of tandemly repetitive DNA - recombination rules. Genetics 132: 847-859. Harding, R. M., Boyce, A. J., Martinson, J. J., Flint, J., and Clegg, J. B. (1993) A computer-simulation study of VNTR population-genetics - constrained recombination rules out the infinite alleles model. Genetics 135: 911-922. Harper, P. S., Harley, H. G., Reardon, W., and Shaw, D. J. (1992) in Myotonic-Dystrophy - new light on an old problem. American Journal Of Human Genetics 51: 10-16. Hastings, A. (1991) Structured models of metapopulation dynamics. Biological Journal O f the Linnean Society 42: 57-71. Hedrick, P. W. (1986) Genetic-polymorphism in heterogeneous environments - a

Hewitt, G. M. (1996) Some genetic consequences of ice ages, and their role in divergence and spéciation. BiolJ. Linn Soc. 58: 247-276.

Heredity 82: 282-286. Hodgkin, J., Plasterk, R. H. A., and Waterston, R. H. (1995) The nematode Caenorhabditis-elegans and its genome. Science 270: 410-414. Hughes, A. L. (1991) MHC polymorphism and the design of captive breeding programs. Conservation Biology 5: 249-251. Hughes, C. R., and Queller, D. C. (1993) Detection of highly polymorphic microsatellite loci in a species with little allozyme polymorphism. Molecular Ecology 2: 131-137. Jame, P., and Lagoda, P. J. L. (1996) Microsatellites, from molecules to populations and back. Trends In Ecology & Evolution 11: 424-429. Jeffreys, A. J., Wilson, V., and Thein, S. L. (1985a) Hypervariable minisatellite regions in human DNA. Nature 314: 67-73.

123 Jeffreys, A. J., Wilson, V., and Thein, S. L. (19856) Individual-specific fingerprints of human DNA. Nature 316: 76-79. Jimenez, J. A., Hughes, K. A., Alaks, G., Graham, L., and Lacy, R. C. (1994) An experimental study of inbreeding depression in a natural habitat. Science 266: 271-273. Johnson, M. S. (1988) Founder effects and geographic variation in the land snail Theba pisana. Heredity 61: 133-142. Jones, J. S., Bryant, S. H., Lewontin, R. C., Moore, J. A., and Prout, T. (1981) Gene flow and the geographical-distribution of a molecular polymorphism in Drosophila-pseudoobscura. Genetics 98: 157-178. Jordano, D., and Thomas, C. D. (1992) Specificity of an ant-lycaenid interaction. Oecologia 91: 431-438. Jurka, J., and Pethiyagoda, C. (1995) Simple repetitive DNA-sequences from primates - compilation and analysis. Journal Of Molecular Evolution 40: 120- 126. Karl, S. A., and Avise, J. C. (1992) Balancing selection at allozyme loci in oysters - implications from nuclear RFLPs. Science 256: 100-102. Karp, A., Seberg, O., and Buiatti, M. (1996) Molecular techniques in the assessment of boim\ca\ diwQTsiiy. Annals Of Botany 78: 143-149. Kawamura, S., Saitou, N., and Ueda, S. (1992) Concerted evolution of the primate immunoglobulin alpha-gene through gene conversion. Journal Of Biological Chemistry 267: 7359-7367. Keller, L. P., Arcese, P., Smith, J. N. M., Hochachka, W. M., and Steams, S. C. (1994) Selection against inbred song sparrows during a natural-population bottleneck. Nature 372: 356-357. King, M .C. unci j^/ij^oiv A C. (1975) Evolution at two levels in Human and Chimpanzees. 188: 107-116. Klein, J., Ghuigin, C., Figueroa, P., Mayer, W. E., and Klein, D. (1993) Different modes of MHC evolution in primates. Molecular Biology and Evolution 10: 48- 59. Koonin, E. V., and Mushegian, A. R. (1996) Complete genome sequences of cellular life forms: Glimpses of theoretical evolutionary genomics. Current Opinion In Genetics & Development 6: 757-762. Kuhner, M. K., and Peterson, M. J. (1992) Genetic exchange in the evolution of the human MHC class-11 loci. Tissue Antigens 39: 209-215. Lande, R. (1988) Genetics and demography in biological conservation. Science 241: 1455-1460. Lande, R. (1994) Risk of population extinction from fixation of new deleterious mutations. Evolution 48: 1460-1469.

124 Lande, R. (1995) Mutation and conservation. Conservation Biology 9: 782-791. Leberg, P. L. (1992) Effects of population bottlenecks on genetic diversity as measured by allozyme electrophoresis. Evolution 46: 477-494. Lehman, N., Eisenhawer, A., Hansen, K., Mech, L. D., Peterson, R. O., Gogan, P. J. P., and Wayne, R. K. (1991) Introgression of coyote mitochondrial-DNA into sympatric North- American gray wolf populations. Evolution 45: 104-119. Levinson, G., and Gutman, G. A. (1987) Slipped-strand mispairing - a major mechanism for DNA-sequence evolution. Molecular Biology and Evolution 4: 203-221. Lewis, O.T., Thomas C. D., Hill, J. K., Brookes, M. I., Robin Crane, T. P., Graneau, Y. A., Mallet J. L.B., Rose O. C. (1997) Three ways of assessing metapopulation structure in the butterfly, Plebejus argus. Ecological Entomology 22:283-293 Lopez, M. A., and Lopez Banjul, C. (1993) Spontaneous mutation for a quantitative trait in Drosophila- melanogaster .2. distribution of mutant effects on the trait and fitness. Genetical Research 61: 117-126. Mahtani M, M, Willard H,F. (1993) A polymorphic X-linked tetranucleotide repeat locus displaying a high rate of new mutation: implications for mechanisms of mutation at short tandem repeat loci. Human Molecular Genetics 2: 431-437. Mallet, J. (1996) The genetics of biological diversity: from varieties to species. In Biodiversity: Biology of Numbers and Difference. Edited by K. J. Gaston. Blackwell; Oxford. Mallet, J., Korman, A., Heckel, D. G., and King, P. (1993) Biochemical genetics of Heliothis and Helicoverpa (Lepidoptera: Noctuidae) and evidence for a founder event in Helicoverpa zea. Annals of the Entomological Society of America 86: 189-197. Mandel, J. L. (1994) Trinucleotide diseases on the rise. Nature Genetics 7: 453-455. Marchant, A.J. (1956) Plebejus argus caemensis (Thompson) at Llanddulas and Rhyd-y-Foel. Entomologist 89: 234. McCauley, D. E., and Eanes, W. F. (1987) Hierachical population structure analysis of the milkweed beetle Tetraopes tetraopthalmus. Heredity 58: 193-201. McClenaghan, L. R., Berger, J., and Truesdale, H. D. (1990) Founding lineages and genic variability in plains bison (Bison-bison) from Badlands-national-park, South-Dakota. Conservation Biology 4: 285-289. Messier, W., Li, S. H., and Stewart, C. B. (1996) The birth of microsatellites. Nature 381: 483-483. Miller, P. S., and Hedrick, P. W. (1991) MHC polymorphism and the design of captive breeding programs - simple solutions are not the answer. Conservation Biology 5: 556-558.

125 Moritz, C. (1994) Applications of mitochondrial-DNA analysis in conservation - a critical-review. Molecular Ecology 3: 401-411. Murakami, Y., et al. (1995) Analysis of the nucleotide-sequence of chromosome-VI from saccharomyces-cerevisiae. Nature Genetics 10: 261-268. Nachman, G. (1991) An acarine predator-prey metapopulation system inhabiting greenhouse cucumbers. Biological Journal of the Linnaen Society 42: 285-303. Nadeau, J., Bedigian, H. G., Bouchard, G., Denial, T., Kosowsky, M., Pugh, S., Sargeant, B., Turner, R., and Paigen, B. (1992) Multilocus markers for mouse genome analysis - PCR amplification based on single primers of arbitrary nucleotide-sequence. Mammalian Genome 3: 55-64. Nadir, B., Margalit, H., Gallily, T., and Bensasson, S. A. (1996) Microsatellite spreading in the human genome - evolutionary mechanisms and structural implications. Proceedings Of the National Academy Of Sciences USA 93: 6470- 6475. Neel, J. V., Satoh, C., Goriki, K., Fujita, M., Takahashi, N., Asakawa, J., and Hazama, R. (1986) The rate with which spontaneous mutation alters the electrophoretc mobility of polypeptides. Proceeding of the National Academy of Sciences USA 83: 389-393. Nei, M. (1972) Genetic distance between populations. American Naturalist 106: 283- 292. Nei, M. (1977) F-statistics and the analysis of of gene diversity in subdivided populations. Annals of Human Genetics 41: 225-233. Nei, M., T. Maruyama, R. Chakraborty. (1975) The bottleneck effect and genetic

Nichols, R. A., and Hewitt, G. M. (1994) The genetic consequences of long distance dispersal during colonization. Heredity 72: 312-317.

Academy of Sciences USA 80: 1821-1825. Norman, J. A., Moritz, C., and Limpus, C. J. (1994) Mitochondrial-DNA control region polymorphisms - genetic-markers for ecological-studies of marine turtles. Molecular Ecology 3: 363-373. Nunney, L., and Campbell, K. A. (1993) Assessing minimum viable population size - demography meets population-genetics. Trends In Ecology & Evolution 8: 234- 239. O'Brien, S. J. (1996) Conservation genetics of the felidae. pp 50-74 in Conservation Genetics: Case histories from nature. Bdited by J. C. Avise and J. L. Hamrick. Chapman and Hall. O'Brien, S. J., and Bvermann, J. F. (1988) Interactive influence of infective disease and genetic diversity in natural populations. Trends in Ecology and Evolution. 3: 254-259.

126 O'Brien, S. J., Wildt, D. E., Bush, M., Caro, T. M., Fitzgibbon, C., Aggundey, L, and Leakey, R. E. (1987) East African cheetahs: evidence for two population bottlecks. Prodeeding of the National Academy of Sciences USA 84: 508-511. O'Brien, S.J., et al (1985) Genetic basis for species vulnerability in the cheetah. Science 227: 1428-1434. Obrien, S. J. (1994) A role for molecular genetics in biological conservation. Proceedings of the National Academy of Sciences USA 91: 5748-5755. Oliver, S. G. et al. (1992) The complete DNA-sequence of yeast chromosome-III. Nature 357: 38-46. Orita, M., Suzuki, Y., Sekiya, T., and Hayashi, K. (1989) Rapid and sensitive detection of point mutations and DNA polymorphisms using the polymerase chain-reaction. Genomics 5: 874-879. Pardue, M. L., Lowenhaupt, K., Rich, A., and Nordheim, A. (1987) (dC-dA)„.(dG- dT)„ sequences have evolutionarily conserved chromosomal locations in Drosophila with implications for roles in chromosome structure and function. EMBO Journal 6: 1781-1789. Pasteur, N., G. Pasteur, F. Bonhomme, J. Catalan, and J. Britton-Davidian. (1988) Practical Isozyme Genetics. Ellis Horwood.; Chichester. Peixoto, A. A., Campesan, S., Costa, R., and Kyriacou, C. P. (1993) Molecular evolution of a repetitive region within the per gene of Drosophila. Molecular Biology and Evolution 10: 127-139. Pemberton, J. M., Slate, J., Bancroft, D. R., and Barret, J. A. (1995) Nonamplifying alleles at microsatellte loci - a caution for parentage and population studies. Molecular Ecology 4: 249-252. Peterson, M. (1996) Long-distance gene flow in the sedentary butterfly, Euphilotes (Lepidoptera: lycaenidae). 50: 1990-1999. Petes, T. D., Greenwell, P. W., and Dominska, M. (1997) Stabilizationof microsatellite sequences by variant repeats in the yeast Saccharomyces cerevisiae. Genetics 146: 491-498. Pogson, G. H., Mesa, K. A., and Boutilier, R. G. (1995) Genetic population- structure and gene flow in the atlantic cod Gadus- morhua - a comparison of allozyme and nuclear RFLP loci. Genetics 139: 375-385. Primmer, C. R., Moller, A. P., and Ellegren, H. (1997) A wide-range survey of cross-species microsatellite amplification in birds. Molecular Ecology 6: 101- 101. Queller, D. C., Strassmann, J. E., and Hughes, C. R. (1988) Genetic relatedness in colonies of tropical wasps with multiple queens. Science 242: 1155-1157. Ralls, K., J.D. Ballou, A.Templeton (1988) Estimates of lethal equivalents and the cost of inbreeding in mammals. R/o/ogy 2: 185-193.

127 Raymond, M., and Rousset, F. (\995a) An exact test for population differentiation. Evolution 49: 1280-1283. Raymond, M., and Rousset, F. (19956) GENEPOP (version 1.2): population genetics software for exact tests and ecumenism. Heredity 86: 248-249. Richard, G. F., and Dujon, B. (1996) Distribution and variability of trinucleotide repeats in the genome of the yeast Saccharomyces-cerevisiae. Gene 174: 165- 174. Richards, R. I., and Sutherland, G. R. (1994) Simple repeat DNA is not replicated simply. Nature Genetics 6: 114-116. Richardson, B. J., P R. Baverstock and M. Adams. (1986) Allozyme Elecrophoresis. A Handbook for Animal Systematics and Population Studies. Academic Press; London. Rose, O. C., Brookes, M. I., and Mallet, J. L. B. (1994) A quick and simple nonlethal method for extracting DNA from butterfly wings. Molecular Ecology 3: 275-275. Ross, H. A. (1983) Genetic differentiation of starling {Sturnus vulgaris: Aves) populations in New Zealand and Great Britain. Journal of the Zoological Society o f London 201: 351-362. Rousset, F. (1996) Equilibrium values of measures of population subdivision for stepwise mutation processes. Gmé?6c5' 142: 1357-1362. Rubinsztein, D. C., Amos, W., Leggo, J., Goodburn, S., Jain, S., Li, S. H., Margolis, R. L., Ross, C. A., and Ferguson-Smith, M. A. (1995) Microsatellite evolution - evidence for directionality and variation in rate between species. Nature Genetics 10: 337-343. Rubinsztein, D. C., Amos, W., Leggo, J., Goodburn, S., Ramesar, R. S., Old, J., Bontrop, R., McMahon, R., Barton, D. E., and Eerguson-Smith, M. A. (1994) Mutational bias provides a model for the evolution of Huntingtons- disease and predicts a general increase in disease prevalence. Journal Of Medical Genetics 32: 142-142. Rudikoff, S., Fitch, W. M., and Heller, M. (1992) Exon-specific gene correction (conversion) during short evolutionary periods - homogenization in a 2-gene family encoding the beta-chain constant region of the lymphocyte-t antigen recepior. Molecular Biology and Evolution 9: 14-26. Sambrook, J., E. Fritsch and Maniatis T. (1989) Molecular cloning, a laboratory

Schlottcrcr, C., and Tautz, D. (1992) Slippage synthesis of simple sequence DNA. NA.R. 20: 211-215.

popUiaiHJllî». UClijtiimiu i w.. ..

128 Schug, M. D., Mackay, T. F. C., and Aquadro, C. F. (1997) Low mutation rates of microsatellite loci in Drosophila melanogaster. Nature Genetics 15: 99-102. Schwaegerle, K., and Schaal, B. A. (1979) Genetic variability and founder effect in the pitcher plantSarracenia purpurea L. Evolution 33: 1210-1218. Scribner, K. T., Amtzen, J. W., and Burke, T. (1994) Comparative-analysis of intrapopulation and interpopulation genetic diversity in Bufo-bufo, using allozyme, single-locus microsatellite, minisatellite, and multilocus minisatellite data. Molecular Biology and Evolution 11: 737-748. Shriver, M. D., Jin, L., Boerwinkle, E., Deka, R., Ferrell, R. E., and Chakraborty, R. (1995) A novel measure of genetic-distance for highly polymorphic tandem repeat loci. Molecular Biology and Evolution 12: 914-920. Sidow, A. (1996) Genome duplications in the evolution of early vertebrates. Current Opinion In Genetics & Development 6: 715-722. Signer, E. N., Schmidt, C. R., and Jeffreys, A. J. (1994) DNA variability and parentage testing in captive waldrapp ibises. Molecular Ecology 3: 291-300. Simmons, M. J., and Crow, J. F. (1977) Mutations affecting fitness in Drosophila populations. Annual Review of Genetics 11: 49-78. Simonsen, V. (1982a) Electrophoretic variation in large mammals .2. The red fox, Vulpes- vulpes, the stoat, Mustela-erminea, the weasel,Mustela-nivalis, the pole cat, Mustela-putorius, the pine marten, Martes-martes, the beech marten,Martes- foina, and the badger, Meles-meles. Hereditas 96: 299-305. Simonsen, V., Allendorf, F. W., Eanes, W. F., and Kapel, F. O. (19826) Electrophoretic variation in large mammals .3. The ringed seal, Pusa- hispida, the harp seal,Pagophilus-groenlandicus, and the hooded seal, Cystophora- cristata. Hereditas 97: 87-90. Slatkin, M. (1976) Gene flow and genetic drift in a species subject to frequent local extinctions. Theoretcal Population Biology 12: 253-262. Slatkin, M. (1987) Gene Flow and the Geographic Structure of Natural Populations. Science 236: 787-792. Slatkin, M. (1995) A measure of population subdivision based on microsatellite allele frequencies. Genetics 139: 457-462. Slatkin, M., and Barton, N. H. (1989) A comparison of three indirect methods for estimating average levels of gene flow. Evo/wriow 43: 1349-1368. Smit, A. F. A. (1996) The origin of interspersed repeats in the human genome. Current Opinion In Genetics & Development 6: 743-748. Smith, G.P. (1976) Evolution of repeated DNA sequences in unequal crossover. Science 191: 528-535.

129 Soule, M. E. (1980) Thresholds for survvial: maintaining fitness and evolutionary potential. p p l5 1-169 in Conservation Biology: An evolutionary-ecological perspective. Edited by M. E. Soule. Sinauer; Sunderland MA. Soule, M E., editor (1987) Viable Populations for Conservation. Cambridge University Press, Cambridge, UK. Stallings, R. L., Ford, A. P., Nelson, D., Torney, D. C., Hildebrand, C. E., and Moyzis, R. K. (1991) Evolution and distribution of (GT)„ repetitive sequences in mammalian genomes. Genomics 10: 807-815. Stephan, W. (1989) Tandem-repetitive noncoding DNA - forms and forces. Molecular Biology and Evolution 6: 198-212. Stephan, W., and Cho, S. (1994) Possible role of natural-selection in the formation of tandem- repetitive noncoding DNA. Genetics 136: 333-341. Stone, G. N., and Sunnucks, P. (1993) Genetic consequences of an invasion through a patchy environment - the cynipid gallwasp Andricus quercuscalicis (Hymenoptera, cynipidae). Molecular Ecology 2: 251-268. Strand, M., Prolla, T. A., Liskay, R. M., and Petes, T. D. (1993) Destabilization of tracts of simple repetitive DNA in yeast by mutations affecting DNA mismatch repair. Nature 365: 274-276. Sutherland, G. R., and Richards, R.I. (1995) Simple tandem DNA repeats and human genetic disease. Proceedings of the National Academy of Sciences USA 92: 3636-3641. Sutherland, G. R., Haan, E. A., Kremer, E., Lynch, M., Pritchard, M., Yu, S., and Richards, R. I. (1991) Hereditary unstable DNA- a new explanation for some old genetic questions. Lancet 338: 289-292. Swart, J. A. A., Reijnders, P. J. H., and Vandelden, W. (1996) Absence of genetic- variation in harbor seals {Phoca-vitulina) in the Dutch Wadden-sea and the British Wash.Conservation Biology 10: 289-293. Swofford, D. L., and Selander, R. B. (1981) Biosys 1: a FORTRAN program for the comprehensive analysis of data in population genetics and systematics. Journal of Heredity 72: 281-283. Tachida, H., and lizuka, M. (1992) Persistence of repeated sequences that evolve by replication slippage. Genetics 131: 471 -478. Tautz, D., and Renz, M. (1984) Simple sequences are ubiquitous repetitive components of eukaryotic genomes. Nucleic Acids Research 12: 4127-4138. Tautz, D., Trick, M., and Dover, G. A. (1986) Cryptic simplicity in DNA is a major source of genetic-variation. Nature 322: 652-656. Taylor, A. C., Sherwin, W. B., and Wayne, R. K. (1994) Genetic-variation of microsatellite loci in a bottlenecked species - the northern hairy-nosed wombat Lasiorhinus-kreffiii. Molecular Ecology 3: 277-290.

130 Telenius, H., Kremer, B., Goldberg, Y. P., Theilmann, J., Andrew, S. E., Zeisler, J., Adam, S., Greenberg, C., Ives, J., Clarke, L. A., and Hayden, M. R. (1994) Somatic and gonadal mosaicism of the Huntington disease gene GAG repeat in brain and sperm. Nature Genetics 7: 113-113. Templeton, A. R., and Read, B. (1983) The elimination of inbreeding depression in a captive herd of Speke's gazelle. pp241-261 in Genetics and Conservation: A reference for managing wild animal and plant populations. Edited by C. M. Schonewald-Cox, S. M. Chambers, B. MacBryde and L. Thomas. Benjamiin/Cummings; Menlo Park CA. Templeton, A. R., Davis, S. K., and Read, B. (1987) Genetic-variability in a captive herd of spekes gazelle {Gazella- spekei). Zoo Biology 6: 305-313. Thomas, C. D. (1985) The status and conservation of the butterfly Plebejus-argus I (lepidoptera, lycaenidae) in North-West Britain. Biological Conservation 33: 29- 51. Thomas, C. D., and Harrison, S. (1992) Spatial dynamics of a patchily distributed butterfly species. Journal Of Animal Ecology 61: 437-446. Thomas, J., and Lewington, R. (1991) The Butterflies of Britain and Ireland. Dorling Kindersley; London. Ting, S. J. Y. (1995) A binary model of repetitive DNA-sequence in Caenorhabditis- elegans. DNA and Cell Biology 14: 83-85. Treco, D., and Amheim, N. (1986) The evolutionarily conserved repetitive sequence d(tG . aC)„ promotes reciprocal exchange and generates unusual recombinant tetrads during yeast meiosis. Molecular and Cellular Biology 6: 3934-3947. Valdes, A. M., Slatkin, M., and Freimer, N. B. (1993) Allele frequencies at microsatellite loci - the stepwise mutation model revisited. Genetics 133: 737- 749. Vos, P., Hogers, R., Bleeker, M., Reijans, M., Vandelee, T., Homes, M., Frijters, A., Pot, J., Peleman, J., Kuiper, M., and Zabeau, M. (1995) AFLP - a new technique for DNA-fingerprinting. Nucleic Acids Research 23: 4407-4414. Vrijenhoek, R.C. (1994) Genetic diversity and fitness in small populations. In Conservation Genetics. Edited by V. Loeschcke, J. Tomiuk and S. K. Jain. Birkhauser; Basel. Walsh, J. B. (1987) Persistence of tandem arrays - implications for satellite and simple- sequence DNAs. Genetics 115: 553-567. Warren, M. S. (1993) The conservation of British butterflies. pp246-274 in The Ecology of Butterflies in Britain. Edited by R. L. H. Dennis. Oxford University Press; Oxford. Watkins, W. S., Bamshad, M., and Jorde, L. B. (1995) Population-genetics of trinucleotide repeat polymorphisms. Human Molecular Genetics 4: 1485-1491.

131 Wayne, R. K., Lehman, N., Girman, D., Gogan, P. J. P., Gilbert, D. A., Hansen, K., Peterson, R. G., Seal, U. S., Eisenhawer, A., Mech, L. D., and Krumenaker, R. J. (1991) Conservation genetics of the endangered Isle-Royale gray wolf. Conservation Biology 5: 41-51. Weber, J. L., and Wong, C. (1993) Mutation of human short tandem repeats. Human Molecular Genetics 2: 1123-1128. Weber, James (1990) Informativeness of Human (dC-dA)„. (dG-dT)^ Polymorphisms. Genomics 7: 524-530. Weir, B. S., and Cockerham, C. C. (1984) Estimating F-statistics for the analysis of population structure. Evo/Mf/on 1984: 1358-1370. Weissenbach J, Gyapay G, Dib C, Vignal A, (1992) A Second-Generation Linkage Map ofthe Human Genome. Nature 359: 794-801. Willard., H. P., and Waye, J. S. (1987) Hierachical order in chromosome-specific human alpha-satellite DNA.Trends in Genetics 3: 192-198. Whitlock, M. C. (1992) Nonequilibrium structure in forked fungus beetlesiextinction, colonisation, and the genetic variance among populations. The American Naturalist 139: 952-970. Whitlock, M., and McCauley, D. E. (1990) Some population genetic consequences of colony formation and extinction: genetic correlations within founding groups. Evolution 44: 1717-1724. Wierdl, M., Dominska, M., and Petes, T. D. (1997) Microsatellite instability in yeast: Dependence on the length of the microsatellite. Genetics 146: 769-779. Williams, J. G. K., Kubelik, A. R., Livak, K. J., Rafalski, J. A., and Tingey, S. V. (1990) DNA polymorphisms amplified by arbitrary primers are useful as genetic- markers. Nucleic Acids Research 18: 6531-6535. Wohrle, D., Hennig, I., Vogel, W., and Steinbach, P. (1993) Mitotic stability of Fragile-X mutations in differentiated cells indicates early postconceptional

Wolff, R. K., Plaetke, R., Jeffreys, A. J., and White, R. (1989) Unequal crossing over between homologous chromosomes is not the major mechanism involved in the generation of new alleles at VNTR loci. Genomics 5: 382-384.

Kremer, E., Lynch, M., Pritchard, M., Sutherland, G. R., and Richards, R. I. (1992) Fragile-X syndrome - unique genetics of the heritable unstable element. American Journal Of Human Genetics 50: 968-980. Zhang, Q., Maroff, M. A. Saghai, and Kleinhofs, A. (1993) Comparitive diversity analysis of RFLP's and isozymes within and among populations ofHordeum vulgare ssp. spontaneum. Genetics 134: 909-916.

132 A ppendix I Programming code for mutation model. (Chapter 3, sections 3.4 & 3.5)

Programmed in ‘C’.

#include

#include #include

#define steps 1000000 #define upratio 6.0 #define downratio 3.0 #define unequalratio 1.0 #defme max 100 #define seedstart 989476 #define INTERVAL 1000000

typedef long storearray [max] ; typedef double doublearray [max]; typedef long bigarray [max] [max] ;

static doublearray time;

static long size; static int seed;

#defme MBIG 1000000000 #define MSEED 161803398 #define MZ 0 #define FAC (1.0/MBIG)

‘Generation of random numbers’

float ran3(idum) int *idum; { static int inext, inextp; static long ma[56]; static int iff = 0; long mj, mk; int i, ii, k;

if (*idum < 0 II iff == 0) { iff=l; mj = MSEED - (*idum < 0 ? -*idum : *idum); mj %= MBIG; ma[55] = mj; m k-1; for (i = 1 ; i <= 54; i++) { ii = (21 * i) % 55; ma[ii] = mk;

133 mk = mj - mk; if (mk < MZ) mk += MBIG; mj = ma[ii]; } for (k = 1 ; k < 4; k++) for (i = 1; i < 55; i++) { ma[i] -= ma[l + (i + 30) % 55]; if (ma[i] < MZ) ma[i] += MBIG; } inext = 0; inextp = 31; *idum = 1 ; } if (++inext == 56) inext = 1 ; if (++inextp == 56) inextp = 1 ; mj = ma[inext] - ma[inextp]; if (mj < MZ) mj += MBIG; ma[inext] = mj; return mj * FAC; } ‘manipulation of array size' static onerunO { long X , y, z, cutl, cut2, newsize,counter,oldtime; double UPTIME, UNEQUALTIME, TOTALTIME, DOWNTIME,CUMTIME;

size = 1 ; TOTALTIME = 0.0; oldtime=0; counter=l; for (x = 0; X < max; x++) { . time[x] = 0.0; }

for (x = 0; X < steps; x++) { if ((long)(TOTALTIME/INTERVAL-(double)oldtime)>0) { printfC % .5E % 121d\n", TOTALTIME, size); oldtime+=l;} UPTIME = -log(ran3(&seed)) / ((double) size * (double)upratio); DOWNTIME = -log(ran3(&seed)) / ((double) (size-1.0) ^(double) downratio); UNEQUALTIME = -log(ran3(&seed)) / ((double) size * (double) size*(double)unequalratio) ; if (UPTIME < UNEQUALTIME && UPTIME < DOWNTIME) { TOTALTIME += UPTIME; if (size < max) time [size] += UPTIME; newsize = size + 1 ; } else { if (UNEQUALTIME < DOWNTIME) {

134 TOTALTIME += UNEQUALTIME; if (size < max) time[size] += UNEQUALTIME; cutl = (long) (ran3(&seed) * (double) size); cut2 = (long) (ran3(&seed) * (double) size); newsize = size + cutl - cut2; } else { TOTALTIME += DOWNTIME; if (size < max) time[size] += DOWNTIME; newsize = size -1; } ) size = newsize; /* if (log 10((double)size) > (double)counter) { counter+=l; printf("%121d% .5E\n",size,TOTALTIME); } */

} CUMTIME=0.0; for (x = 0; X < max; x++) { CUMTIME+=time[x] ; printf("% 121d % ,5E % .5E \n", x, time[x] / TOTALTIME,time[x]); } printf("% 121d % .5E % .5E \n", max, 1.0- CUMTIME/TOTALTIME,TOTALTIME-CUMTIME);

}

main(argc, argv) int argc; char *argv[]; { long x; double Y; seed = seedstart; onerunO;

135 A p p e n d ix n Primer Sequences and Reaction Conditions

P a p l Forward GGCACGACAGAATACTCAG Reverse CTGCTTGGCTTTTTCACAAG PCR product length « 150bp

P ap2 Forward TAGCTCAGGTCACTCCAGCC Reverse GCATAATGCGCTAGTAATGC PCR product length « 130bp

Pap3 Forward GTGGATGCGTGCGTACTG Reverse GCGCTGCAAACCTGAAATC PCR product length « 125bp

Radioactive PCR of loci Pap 1, Pap2 and Pap3.

PCR reaction mix: 10|Xl final volume Advanced Biotechnologies buffer IV 1 x dCTP 6|iM dATP 100p,M dTTP lOOpM dGTP lOOpM Mg2+ l.OmM (1.5mM Pap2) Oligonucleotide Primers 5pM each 32PdCTP 0.04|iCi Taq. Polymerase 0.25 units DNA 150ng Glycerol 20%

Reaction Conditions: TD = 94°C - 20 seconds TA = 58°C - 30 seconds TE = ------X 35 cycles.

136 A p p en d ix m

M olecular Ecology (1994) 3, 275

TECHNICAL NOTE A quick and simple nonlethal method for extracting DNA from butterfly wings

O. C. ROSE, M. I. BROOKES and J. L. B. MALLET The Gallon Laboratory, Department of Genetics and Biometry, University College London, 4 Stephenson Way, London NW l 2HE, UK

The polymerase chain reaction (PCR) is now used rou­ 123456789 10 tinely in analyses of genetic variation for population and evolutionary studies. Such studies usually involve the analysis of a large number of individuals. In these cir­ 1 018 cumstances, DNA extraction, using a conventional phe­ 5 0 6 nol/ chloroform-based approach, can be laborious and time consuming. In addition, extracting DNA from in­ sects and other invertebrates normally necessitates the killing of specimens. Invasive sampling from small, natu­ ral populations is clearly not ideal, especially when the particular study organism is rare or endangered. Here we report on an extremely quick and simple, noninvasive technique for extracting DNA from butterfly wing frag­ ments. Such crude DNA extracts have been used success­ fully in PCR of both nuclear and mitochondrial DNA markers. Fig. 1 Agarose gel electrophoresis of mitochondrial and nuclear Using a pair of fine forceps (sterilized by several DNA amplification products, showing comparison between phe­ washes in ethanol and heating over a flame), a small frag­ nol/chloroform-extracted DNA and crude DNA extracts from ment of tissue about 2 mm square is removed from the wings of P. argiis (2% Agarose gel (Sigma) run at 80 V for 2 h). Lanes; 1 & 6 molecular weight marker (sizes in bp); 2-5 amplifica­ edge of the butterfly wing. This is then gently homog­ tions of the AT-rich region of mtDNA using the met39- and enised in 20 |iL of TE (10 mM Tris HCl, 1 m.M EDTA, pH 8) 12s438+ primers (designed by Martin Taylor, University of Ari­ and left overnight at room temperature. Homogenisation zona); 2 no DNA control; 3 phenol/chloroform-extracted DNA; 4 is performed by grinding thoroughly with a sterile pellet & 5 crude DNA wing extracts; 7-10 nuclear DNA amplifications mixer (Scotlab), in a 0.5 mL microfuge tube. Samples are (primers were designed from partial sequences of random clones stored at 4 °C. Figure 1 shows PCR amplification prod­ at UCL); 7 no DNA control; 8 phenol/ chloroform-extracted DNA; ucts using 1 pL of the crude extract in a 50 pL total reac­ 9 & 10 crude DNA wing extracts. Length variation has been re­ vealed by the nuclear primers. tion volume. In general, 1 pL of crude wing extract pro­ duces a similar product yield to 1 pL of total butterfly phenol/chloroform-extracted DNA. Occasionally, it has This technique is currently being applied to the study been necessary to increase the amount of extract in the of genetic variation in metapopulations of two endan­ reaction (up to 10 pL) in order to increase the yield of PCR gered British butterfly species, the silver studded blue, product. Successful amplifications have been performed Plebejus argus, and the silver spotted skipper, Hesperia on unfrozen, dried specimens as well as those stored at - comma. Genetic analyses can now be performed on small 70 °C. The technique will allow efficient and rapid field local populations with few, if any, detrimental effects on sampling without the need for cumbersome cold storage the long term survival of these populations. The tech­ apparatus. nique should make it possible to study other rare and en­ dangered invertebrate species which are sensitive to perturbations caused by invasix-e sampling.

KL'yii'ord^: butterflies, conservation, DN'A extraction, mtDNA, nuclear DNA, PCR Acknowledgement

Recch'cd 21 luiiiian/ 1994: revi^ioiniaypti'd 2 March 1994 This research is funded bv an NERC grant to |\1.

137