MOLECULAR POPULATION GENETIC ANALYSES OF LAKE VICTORIA FISHES USING MICROSATELLITE DNA MARKERS

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Lizhao Wu, B. S., M. S.

*****

The Ohio State University 1999

Dissertation Committee:

Dr. Paul A. Fuerst, Advisor
Dr. Arthur H. M. Burghes
Dr. Thomas J. Byers
Dr. Thomas W. Prior

Approved by

Advisor
Molecular Genetics Graduate Program

UMI Number: 9931702

UMI Microform 9931702 Copyright 1999, by UMI Company. All rights reserved.

This microform edition is protected against unauthorized copying under Title 17, United States Code.

ABSTRACT

The cichlid flocks of East Africa offer powerful models of explosive speciation and reiterative adaptive radiation. However, the phylogenetic history and population structure of the fishes of the Lake Victoria Region (LVR) have been problematic, due to a lack of sufficiently variable genetic markers. Previous studies of Lake Victoria cichlids using mitochondrial DNA (mtDNA) sequences showed little divergence among species. To determine whether low mtDNA variation was due to small sample sizes, or is a characteristic of the cichlids of Lake Victoria, a 432 base pair mtDNA region was sequenced from 35 individuals of a widespread LVR species, Astatoreochromis alluaudi, from six localities. Very little phylogenetic information was detected, and no significant relationship existed between either the frequency or the occurrence of mtDNA haplotypes and the geographical source of samples. The results suggest that the low mtDNA diversity in the LVR cichlid species reported previously was not solely a function of small sample sizes, and rule out the usefulness of mtDNA sequence data for phylogenetic inferences of the LVR cichlids.

To develop markers that can be used to study the problematic phylogeny of cichlids in the LVR, both dinucleotide and trinucleotide microsatellite DNA probes were used to screen a partial genomic library constructed from A. alluaudi. The (GT)n motifs were estimated to occur at an average interval of 24 kb in the A. alluaudi genome. Like several other teleost genomes, the A. alluaudi genome seems to have a higher percentage of long dinucleotide microsatellites than mammalian genomes.

Unlike mtDNA sequencing data, nine DNA microsatellite markers revealed significant regional differentiation among the same six population samples of A. alluaudi that were used for mtDNA studies, coupled with very high levels of intra-population genetic variability. Furthermore, measures of genetic differentiation based on microsatellites are consistent with patterns predicted by both the biogeography and the jaw morphology of the six populations.

In sharp contrast to both allozyme markers and mtDNA sequences, 14 microsatellite markers detected substantial amounts of genetic variation within each of 24 cichlid species from the LVR, as well as a single representative species from Lake Malawi. Phylogenetic analyses indicated that microsatellite markers, as a group, contain informative phylogenetic signals that not only confirmed taxonomic relationships of some, but not all, morphologically defined congeners, but also suggested regional speciation and differentiation in the LVR. Moreover, the microsatellite-based phylogeny is consistent with an ancient invasion of Lake Edward species into the Lake Victoria basin after Lake Victoria dried up and refilled in the Late Pleistocene. Microsatellites appear to evolve rapidly enough to reveal population structure and phylogeny of the tightly knit members of the Lake Victoria haplochromine cichlid species flock. However, a robust and reliable phylogeny of these cichlids may still require additional data from more microsatellite markers, as well as the inclusion of additional genera and additional sets of congeners from different lakes.

Dedicated to my parents for their faith in education and science

ACKNOWLEDGMENTS

I would like to sincerely thank my advisor, Dr. Paul A. Fuerst, for his constant stimulation, guidance, support, and patience throughout the course of my dissertation research. I would also like to thank the other members of my committee, Dr. Arthur H. M. Burghes, Dr. Thomas J. Byers, and Dr. Thomas W. Prior, for their helpful advice on my research and valuable comments on my dissertation. I am especially indebted to Drs. Thomas J. Byers and Paul A. Fuerst for suggesting numerous improvements, both grammatical and scientific, to an earlier draft of my dissertation. Special thanks go to Dr. Les Kaufman at Boston University, for his broad knowledge of the Lake Victoria cichlid system and intellectual support.

I thank my friends and colleagues in the Fuerst laboratory, Greg Booton, Wenrui Duan, Jeannette Kreiger, Brian Mark, Godfrey Mbahinzireki, Wilson Mwanja, Brady Porter, Malcolm Schug, and Diane Stothard, for providing technical support, stimulating ideas, making jokes, and admitting the ignorance of the number of beetle species. Special thanks go to Greg Booton and Wilson Mwanja, for their insights into the Lake Victoria cichlid problems, and to Malcolm Schug, for his understanding, encouragement, and intellectual support. Data analysis and figure preparation were assisted by Greg Booton, Jeannette Kreiger, and Brian Mark.

This project would have been impossible without the samples collected by numerous researchers: Wilson Mwanja; Les Kaufman; Collins Chapman and Lauren Chapman at the University of Florida; Audrey Armoudlian at Michigan State University; and researchers at the Fisheries Research Institute of Uganda, the Kenyan Marine and Fisheries Research Institute, and the Columbus Zoo. All the samples were morphologically identified by Les Kaufman. These studies were supported by grants from the National Science Foundation, the U.S. Department of Agriculture, the Columbus Zoo, the Sigma Xi Grant-In-Aid Research Award, the Ohio State University Graduate Student Alumni Research Award, as well as the Presidential Fellowship from the Ohio State University.

Finally, I would like to deeply thank my wife, Judy, and my daughters, Janet and Jessica, for their love, support, sacrifices, and patience throughout my career at the Ohio State University. I may never repay them for their understanding of my absence from their lives during many evening hours and weekends when they needed me the most.

VITA

November 28, 1963 ...... Born in Ezhou, Hubei, P. R. China

1982 ...... B. S. Biology, The Central China Normal University

1985 ...... M. S. Genetics, The Chinese Academy of Sciences

1985 — 1987 ...... Research Assistant, The Chinese Academy of Sciences

1987 — 1992 ...... Research Associate, The Chinese Academy of Sciences

1993 — present ...... Graduate Teaching and Research Associate, The Ohio State University

1997 ...... Presidential Fellow, Graduate School at The Ohio State University

PUBLICATIONS

1. Wu, L., L. Kaufman, and P. Fuerst, “Isolation of microsatellite markers in Astatoreochromis alluaudi and their cross species amplification in other African cichlids.” Mol. Ecol., in press (1999).

2. Wilson, G. A., C. Strobeck, L. Wu, and J. W. Coffin, “Characterization of microsatellite loci in caribou Rangifer tarandus, and their use in other artiodactyls.” Mol. Ecol. 6: 697-699 (1997).

3. Wu, L. and Z. Wang, “Studies on the expressions and regulations of isozymic genes in silver carp (Hypophthalmichthys molitrix) during their ontogenesis.” Acta Hydrobiol. Sin., 21: 49-58 (1997a).

4. Wu, L. and Z. Wang, “Biochemical genetic structure and variation in a natural population of silver carp from the middle reaches of the Yangtze River.” Acta Hydrobiol. Sin., 21: 157-162 (1997b).

5. Wu, L., G. Booton, M. Chandler, L. Kaufman, and P. Fuerst, “Use of DNA microsatellite loci to identify populations and species of Lake Victoria haplochromine cichlids.” In: "Aquaculture Biotechnology", pp. 105-113. Edited by E. M. Donaldson and D. D. MacKinlay. American Fisheries Society, Bethesda, MD (1996).

6. Wu, L. and Z. Wang, “The difference in low temperature endurance between mud carp and the mixed-sperm-inseminated mud carp of the second successive generation.” Acta Hydrobiol. Sin., 17(3): 206-210 (1993).

7. Wu, L. and Z. Wang, “Biochemical genetic structure and variation in a natural population of grass carp from the middle reaches of the Yangtze River.” Chinese J. Genet., 19(3): 221-27 (1992a).

8. Wu, L. and Z. Wang, “Studies on the development genetics of isozymes in bighead carp (Aristichthys nobilis).” Acta Hydrobiol. Sin., 16(1): 8-17 (1992b).

9. Wu, L. and Z. Wang, “Biochemical genetic structure and variation in a natural population of bighead carp from the lower reaches of the Yangtze River.” Acta Hydrobiol. Sin., 15(1): 94-6 (1991).

10. Li, S., L. Wu, J. Wang, Q. Chou, and Y. Chen (editors), "Comprehensive Genetic Study on Chinese Carps". Shanghai Scientific & Technical Publishers, Shanghai, P. R. China (1990).

11. Wu, L. and Z. Wang, “A preliminary study on the polymorphisms of isozymic loci in grass carp (Ctenopharyngodon idellus).” Acta Hydrobiol. Sin., 12(2): 116-24 (1988).

12. Wu, L. and Z. Wang, “Studies on the developmental genetics of isozymes in grass carp (Ctenopharyngodon idellus). I. Analysis of isozymes in various tissues and organs of grass carp.” Chinese J. Genet., 14(4): 287-93 (1987a).

13. Wu, L. and Z. Wang, “Studies on the developmental genetics of isozymes in grass carp (Ctenopharyngodon idellus). II. Analysis of isozymes during the early development of grass carp.” Chinese J. Genet., 14(5): 395-8 (1987b).

FIELDS OF STUDY

Major Field: Molecular Genetics.

Specialization: Molecular Population Genetics and Evolutionary Genetics

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
1.1 Eastern African cichlids
1.1.1 Lake Victoria and its major satellite lakes
1.1.2 Explosive speciation
1.1.3 Controversial taxonomy
1.1.4 Population decline and species extinction
1.2 Microsatellites
1.2.1 Technical overview
1.2.2 Applications
1.2.3 Possible cellular functions
1.2.4 Mutational process and mutational mechanism
1.2.5 Mutation rates
1.2.6 Advantages of microsatellites as genetic markers
1.2.7 Disadvantages and limitations of using microsatellite markers

2. Isolation and characterization of microsatellite markers in Astatoreochromis alluaudi and their cross-species amplifications in other African cichlids
2.1 Introduction
2.2 Materials and methods
2.2.1 Isolation and sequencing of microsatellite markers
2.2.2 Microsatellite PCR analysis
2.2.3 Biological materials and DNA extraction
2.3 Results
2.3.1 Cloning, characterization of microsatellite markers
2.3.2 Microsatellite PCR primer design and initial screening
2.3.3 Cross-species amplification
2.4 Discussion
2.4.1 Abundance of (GT)n motif in A. alluaudi genome
2.4.2 Do teleost genomes contain more lengthy dinucleotide repeat motifs than mammalian genomes?
2.4.3 Potential of cross-species applications

3. Microsatellites reveal regional differentiation in the Lake Victoria cichlid fish, Astatoreochromis alluaudi
3.1 Introduction
3.2 Materials and methods
3.2.1 Biological materials and DNA extraction
3.2.2 Amplification and sequencing of mtDNA
3.2.3 Microsatellite genotyping
3.2.4 Data analysis
3.3 Results
3.3.1 Variation of mtDNA
3.3.2 Intra-population microsatellite variability
3.3.3 Population structure
3.3.4 Population genetic relationships
3.3.5 Microsatellite variability and phylogenetic inference of other LVR haplochromine cichlids
3.4 Discussion
3.4.1 Intrapopulation genetic variability
3.4.2 Genetic relationships of A. alluaudi populations consistent with geographical histories
3.4.3 F statistics supporting the interpretation of high mobility despite significant population subdivision of A. alluaudi
3.4.4 Population correlation between genetic distance (as measured by microsatellites) and pharyngeal jaw morphology
3.4.5 Potential of A. alluaudi microsatellite markers in cichlid phylogenetic studies

4. Microsatellite genetic variation and phylogeny of Lake Victoria cichlid species
4.1 Introduction
4.2 Materials and methods
4.2.1 Biological specimens
4.2.2 Microsatellite markers and PCR genotyping
4.2.3 Data analyses
4.3 Results
4.3.1 Cross-species amplification of microsatellite primers
4.3.2 Microsatellite variation within species
4.3.3 Phylogenetic analyses based on 11 microsatellite markers
4.3.4 Phylogenetic analyses based on 14 microsatellite markers
4.4 Discussion
4.4.1 Success of cross-species amplifications of microsatellite primers
4.4.2 Microsatellite variation within haplochromine cichlid species
4.4.3 Correlation between allele sizes and genetic variability
4.4.4 Phylogenetic inferences for haplochromine cichlids
4.5 Conclusions

Literature cited

LIST OF TABLES

Table

1.1 The surface area and maximum depth of the Great Lakes of East Africa and several other freshwater lakes in the world
1.2 The number of cichlid species, non-cichlid species, and the percentage (%) of endemic species in major East African lakes and river systems
2.1 Microsatellite core sequences, primer sequences and PCR conditions
2.2 Species, sample sizes, and collection sites for cross-species amplifications
2.3 Repeat motifs for the 23 positive dinucleotide microsatellite clones
2.4 Repeat motifs for the 20 positive trinucleotide microsatellite clones
2.5 Cross-species amplification using nine pairs of A. alluaudi microsatellite PCR primers
2.6 Genetic variability of microsatellites in LVR cichlid species
2.7 Estimated occurrence frequency of (GT)n repeat motifs in various teleost genomes
3.1 Pairwise straight-line geographical distances and “realistic” geographical distances for the six sampling locations
3.2 mtDNA sequence diversity of A. alluaudi samples
3.3 Microsatellite variability within A. alluaudi populations
3.4 Number and percentage of private alleles
3.5 Pairwise Fst (and as well as Nm (and M r) for the six A. alluaudi populations based on microsatellite data
3.6 Pairwise allele sharing distances (DAS) and Nei’s (1972) standard genetic distances for the six population samples
4.1 Sample sizes and sampling locations of the cichlid species
4.2 Repeat motifs and PCR conditions of the seven microsatellite markers developed in other laboratories
4.3 Summary of microsatellite genetic variability within 25 haplochromine cichlid species
4.4 Correlation between allele size and genetic variability of homologous microsatellite markers among different species
A.1 Tissue sample ID, DNA sample ID, tissue sample collection locations and years for A. alluaudi samples

LIST OF FIGURES

Figure

1.1 A map of the Lake Victoria Region (LVR)
3.1 Map of the LVR with the sample collection sites labelled
3.2 Bases at 10 variable sites for the 10 haplotypes of A. alluaudi (a) and a minimum mutation network for the 10 haplotypes (b)
3.3 A UPGMA phenogram of the 10 mtDNA haplotypes
3.4 A UPGMA phenogram based on Nei’s (1972) genetic distances for the six A. alluaudi populations
3.5 A neighbor-joining tree based on Nei’s (1972) genetic distances of the five Lake Victoria cichlid species
4.1 An unrooted neighbor-joining tree based on Nei’s (1972) genetic distances of 11 markers
4.2 An unrooted neighbor-joining tree based on Nei’s genetic distances of 11 markers (topology only)
4.3 An unrooted neighbor-joining tree based on allele sharing distances of 11 markers (topology only)
4.4 An unrooted neighbor-joining tree based on the stepwise weighted genetic distances of 11 markers (topology only)
4.5 An unrooted neighbor-joining tree based on Nei’s genetic distances of 14 markers (topology only)
4.6 An unrooted neighbor-joining tree based on allele sharing distances of 14 markers (topology only)
4.7 An unrooted neighbor-joining tree based on stepwise weighted genetic distances of 11 markers (topology only)
4.8 A consensus neighbor-joining tree based on Nei’s genetic distances of 14 markers
F.1 Pairwise Nei’s genetic distances based on 11 microsatellite markers
F.2 Pairwise allele sharing genetic distances based on 11 microsatellite markers
F.3 Pairwise stepwise weighted genetic distances based on 11 microsatellite markers

CHAPTER 1

Introduction

1.1 EASTERN AFRICAN CICHLIDS

Although Lake Victoria is not the largest lake, Lake Tanganyika is not the deepest lake, and Lake Malawi is not the oldest lake in the world, the Great Lakes in the East African Rift Valley are second to none when it comes to their richness in fish species and the speed at which those fish have evolved (Fryer & Iles, 1972). Since their discovery about a century ago, the cichlid species assemblages in the Great Lakes have fascinated evolutionary biologists. They have been one of the most extensively studied model systems among vertebrates. Because a set of projects in the Fuerst laboratory has focused on Lake Victoria cichlids (Booton, 1995; Booton et al., 1999; Wu et al., 1996; Wu et al., 1999), and because the cichlid speciation event happened more recently and more rapidly in Lake Victoria than in the other two lakes, the following review will primarily deal with the Lake Victoria system.

Lake Victoria and its major satellite lakes

Situated across the equator, Lake Victoria is the largest freshwater lake in Africa. It covers an area of 68,800 km² (300 km from north to south and 280 km from east to west), making it the second largest freshwater lake in the world (Table 1.1; Figure 1.1). Unlike Lakes Tanganyika and Malawi, Lake Victoria is set in a large shallow basin, with a maximum depth of only 79 meters and a mean depth of 40 meters (Kaufman et al., 1996). Strictly speaking, Lake Victoria is not a rift lake. Instead, it was formed by a combination of direct rainfall on the surface, the fusion of a few older and smaller lakes, and the inflow from several nearby rivers (Kaufman et al., 1996).

North of Lake Victoria is Lake Kyoga. A very shallow lake with a maximum depth of only 8 meters (Pitcher & Hart, 1995), Lake Kyoga is fed by the Nile River system through the Victoria Nile River, which receives discharges from Lake Victoria. Water from the Victoria Nile River is further drained to the northwest into Lake Albert, a typical rift lake with a maximum depth of 56 meters (Pitcher & Hart, 1995). Lake Edward and Lake George lie west of Victoria. The former is a typical rift lake with a maximum depth of 117 meters, but the latter is very shallow, with a maximum depth of only 3 meters (Pitcher & Hart, 1995). These two lakes are connected to each other by the Kazinga Channel. Connections between Lake Victoria and Lake Edward may have existed as recently as several thousand years ago (Kaufman et al., 1996).

Explosive speciation

The cichlid fauna in the rift valley Great Lakes of East Africa form one of the most extraordinary assemblages of freshwater species in the world. For evolutionary biologists, these cichlid species flocks represent a very unusual evolutionary phenomenon called “explosive speciation”. These assemblages include the largest number of species among living vertebrates. They are also the most diverse group ecologically, behaviorally, and morphologically. In the animal kingdom, only the famous Hawaiian Drosophila species flock (Carson & Kaneshiro, 1976) comes close to the East African cichlid species flock.

Explosive speciation has occurred in each of the three Great Lakes. Hundreds of cichlid species, but far fewer species of other fishes, have appeared in each of the lakes in a short time period (Table 1.2). More interestingly, almost all of the cichlid species found in the Great Lakes are endemic to a particular lake. In sharp contrast, nearby river systems such as the Nile River and Niger River hold far fewer cichlid species, most of which are not endemic (Table 1.2).

Cichlids have a wide geographical range. They are found not only in Africa, but also in South America, Central America, and Asia. Only in East Africa have cichlids undergone explosive speciation. Although the origin and past history of Lake Victoria are still not completely clear, it was believed that Lake Victoria was formed between 250,000 and 750,000 years ago (Greenwood, 1984). However, recent geological evidence indicated that Lake Victoria must have completely dried up for a period of a few thousand years, and was not refilled until the Late Pleistocene period (~12,400 years ago) (Johnson et al., 1996). Because the drying up was severe, it is suspected that few, if any, fish species would have been able to survive. Therefore, most, if not all, of the present-day 400 plus endemic cichlid species in Lake Victoria must have evolved in the last 12,400 years, probably from a single common ancestral riverine or lacustrine founder species (Meyer et al., 1990). Alternatively, the lake could have been colonized directly by many ancient species from the nearby satellite lakes or rivers when Lake Victoria was refilled and fed.

It was once believed that temporary geographical isolation of cichlid species in satellite lakes around Lake Victoria could be an important factor for the extreme cichlid species diversity in Lake Victoria (Greenwood, 1965). In other words, most of the present-day Lake Victoria cichlid species might already have been present in the satellite lakes when the water refilled and fed Lake Victoria. However, such a scenario is no longer favored (Kaufman et al., 1997) because the number of cichlid species found in each of the satellite lakes is too small to explain the extreme cichlid species diversity in Lake Victoria. Moreover, this hypothesis also failed to explain the diversity of “rock” species in Lake Victoria because there are almost no “rock” species in any of the major satellite lakes of Lake Victoria (Galis & Metz, 1998). By the same token, the idea that most of the present-day cichlid species in Lake Victoria were produced directly through colonization of many ancient species from the nearby rivers is also not well supported (see Table 1.2). Therefore, it is more likely that most of the present-day Lake Victoria cichlid species have evolved inside the lake from one, or perhaps a few, ancestral founder species from the nearby pre-Pleistocene lakes or river systems (Booton et al., 1999; Kaufman et al., 1996).

However, the exact mechanisms behind the explosive evolution in the East African Great Lakes remain unresolved. Recent studies suggested that sexual selection based on female mate choice for male coloration probably plays a very important role in the rapid speciation of Lake Victoria cichlid species (Seehausen et al., 1997; Seehausen et al., 1998). Although various cichlid species can interbreed and produce fertile offspring under laboratory conditions, they are usually sexually isolated in nature by mate choice. Males of most cichlid species differ in coloration. This is especially true for closely related sympatric species. Females strongly prefer males with a particular coloration, especially when light conditions are very good (Seehausen & van Alphen, 1998). Mate choice of females for males with different coloration could help maintain reproductive isolation for sympatric species and color morphs (Knight et al., 1998; van Oppen et al., 1998).

Ecologically, Lake Victoria cichlid species are extremely diverse. As a whole, they utilize almost all possible ecological resources (niches) and eat all the possible food organisms available in the lake (Greenwood, 1991). Moreover, phenotypic plasticity (Stiassny, 1991) is a common feature for many morphological characters, such as the pharyngeal jaw morphologies. Phenotypic plasticity refers to morphological changes under different environmental conditions without genetic modifications. Phenotypic plasticity not only provides the ability for a species to survive in, and to adapt to, different ecological niches, but also provides the ability for a species to evolve.

Currently, it is generally believed that mate choice and niche differentiation are the two key factors for rapid speciation of cichlid species in the Great Lakes (Galis & Metz, 1998). Niche differentiation and adaptive radiation provide the opportunity for conspecific populations to split, whereas sexual selection promotes the speciation event.

Controversial taxonomy

The cichlid species group in the East African Great Lakes belongs to the family Cichlidae, suborder Labroidei, order Perciformes (Kaufman & Liem, 1982). Based on a special type of dentition, all the Lake Victoria cichlid species were originally placed in the genus Haplochromis (Hilgendorf, 1888 in Lippitsch, 1993). When examining cichlid species in Lake Tanganyika, Regan (1920) claimed that Haplochromis was the largest genus in Africa, and cichlid species in the East African Great Lakes have since then been called “haplochromine cichlids”.

Partly due to Regan’s authority in taxonomy, large-scale revisions of cichlid taxonomy did not occur until the late 1970’s. In a series of two papers, P. H. Greenwood, the “Father of Cichlids”, undertook a major revision of cichlid taxonomy, covering cichlid species from both the Lake Victoria and the Lake Tanganyika assemblages (Greenwood, 1979, 1980). In the revision, the entire haplochromine cichlid species complex was divided into more than 20 genera or, in a few cases, subgenera. However, this revision has not been universally accepted among ichthyologists and taxonomists for several reasons (Lippitsch, 1993). The major objections are that the revision was based on few character differences and that, based on the key characters used, it was almost impossible to assign newly discovered species to the revised genera. Moreover, most of the characters used in the revision are associated with the trophic apparatus (i.e., pharyngeal jaw morphology), which is subject to strong selection pressure. Finally, any revision excluding the largest assemblage, the Lake Malawi cichlids, was at best incomplete. Nevertheless, a more recent study based on 96 scale and squamation (i.e., epidermal scale arrangement) characters confirmed most of the genera proposed in Greenwood’s revision, even though many taxonomic placements still remain controversial (Lippitsch, 1993). This later study also suggested that members in the super lineage of the Lake Victoria-Edward-Kivu haplochromine cichlids belong to a monophyletic group.

The major challenge for phylogenetic analysis of the East African cichlid species flock is that many putative species have significant overlaps for various widely used morphological characters. There are very few useful morphological characters available for cichlid taxonomists. Moreover, most of the phylogenetically informative characters are further confounded by phenotypic plasticity. For example, as mentioned above, most of the morphological characters that Greenwood (1979, 1980) used in his revision are associated with the trophic apparatus, which is generally thought to be subject to selection pressure. Therefore, since the 1980’s, increasing efforts have been made to use biochemical and/or genetic markers in cichlid phylogenetic studies. Unfortunately, many early studies documented extremely low levels of genetic variation in Eastern African cichlids, especially in Lake Victoria cichlids. For example, data from 10 allozyme loci failed to resolve 10 Lake Victoria haplochromine cichlid species (Sage et al., 1984).

Although analyses of the fast-evolving mitochondrial DNA sequences revealed considerable amounts of genetic variation in cichlid species flocks of Lake Tanganyika and Lake Malawi, genetic variation for the Lake Victoria species flock is much lower (Meyer et al., 1990; Sturmbauer & Meyer, 1992). Genetic variation for an endemic Tropheus lineage of six species in Lake Tanganyika is six times greater than that of the Lake Victoria species flock (Sturmbauer & Meyer, 1992). In fact, among 14 Lake Victoria cichlid species, there was no intraspecific variation reported in 363 base pairs (bp) of the cytochrome b gene, and on average, there were only about three nucleotide substitutions between species (Meyer et al., 1990). None of these substitutions were phylogenetically informative in a cladistic sense. Several studies have documented that mitochondrial DNA sequence information is useful to differentiate various haplochromine cichlid species in Lake Tanganyika or Lake Malawi (Kocher et al., 1993; Sturmbauer & Meyer, 1992). Such information, however, is much less useful in differentiating the members of the young cichlid species flock in Lake Victoria (Meyer et al., 1990).
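As a rough illustration of how little divergence these figures imply, the following minimal Python sketch converts the counts quoted above (about three substitutions across 363 bp of cytochrome b) into an uncorrected proportion of differing sites. The function name p_distance and the constants are illustrative assumptions rather than part of the cited analyses, and no correction for multiple substitutions is applied.

```python
# Uncorrected pairwise divergence (p-distance) implied by the figures quoted above.
# Names and constants are illustrative assumptions, not the cited authors' code.
SITES_COMPARED = 363      # cytochrome b base pairs compared (as cited in the text)
MEAN_SUBSTITUTIONS = 3    # approximate mean number of differences between species


def p_distance(differences: float, sites: int) -> float:
    """Return the proportion of compared sites that differ between two sequences."""
    if sites <= 0:
        raise ValueError("sites must be positive")
    return differences / sites


p = p_distance(MEAN_SUBSTITUTIONS, SITES_COMPARED)
print(f"p-distance = {p:.4f} ({p:.2%} of sites)")
```

Under these assumptions, fewer than one site in a hundred differs between a typical pair of species, which is consistent with the poor phylogenetic resolution described above.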

Since the early 1990’s, fast-evolving nuclear DNA markers have been employed in a few studies with various levels of success. For example, a study on Lake Victoria cichlid fauna using the fast-evolving internal transcribed spacer one (ITS-1) sequences of the ribosomal RNA gene operon resolved intergeneric relationships, but revealed very low levels of interspecific divergence (Booton et al., 1999). Several other studies using different nuclear DNA markers have achieved considerable phylogenetic resolution among the three major assemblages in the three Great Lakes, but little or no phylogenetic resolution has been obtained within the Lake Victoria cichlid assemblage (Mayer et al., 1998; Sultmann et al., 1995; Zardoya et al., 1996). To better understand the evolutionary relationships of various Lake Victoria haplochromine cichlids, new genetic markers, especially the ones with very high mutation rates, had to be developed and employed. Microsatellite DNA markers could be the class of markers to fill this need (see below).

Population declines and species extinction

The waters of Lake Victoria are shared by three developing countries: Tanzania (51%), Uganda (43%), and Kenya (6%) (Kaufman et al., 1996). This region of Africa maintains one of the highest population growth rates in the world (Kaufman, 1996). As a major source of food protein, fishes have greatly benefited people around the lake for many years. Cichlids, including some haplochromine cichlids, used to be the mainstay of the fisheries in Lake Victoria. In this century, especially since the late 1960s, however, major changes have occurred in the ecosystem of Lake Victoria, and its cichlid fish fauna has experienced dramatic population declines as well as species extinction. About two thirds of the once-present endemic haplochromine species have disappeared or are threatened with imminent extinction (Witte et al., 1995). In fact, what happened to the Lake Victoria haplochromine cichlid species group has been considered the largest mass extinction event witnessed by humans in modern times (Barel et al., 1985). Conservation biologists are afraid that unless effective conservation measures are taken immediately, the fascinating Lake Victoria cichlid fauna may soon become “living fossils”.

At least three factors, all of which are linked to human activities in one way or another, have been implicated in the catastrophe that occurred in Lake Victoria. Over-fishing in Lake Victoria has been well documented as a negative factor affecting the Lake Victoria cichlid community, especially the larger species (Kudhongania & Chitamwebwa, 1995; Witte et al., 1995). The increasing demand for fish as a major food resource for the people around Lake Victoria has led to more aggressive and more efficient fishing approaches. For example, when the fish community in Lake Victoria began to decline, fishermen were tempted to use gill nets with smaller and smaller meshes, thereby further decimating the fish fauna and making population recovery very difficult.

Traditionally, Lake Victoria was considered to be a cichlid lake based on the predominance of cichlid species in the fish fauna (Kudhongania & Chitamwebwa, 1995). Data from the early 1950s indicated that cichlids, including both haplochromines and non-haplochromines (i.e., tilapias), accounted for more than half of the fishery yield in Lake Victoria. In the early 1960s, a voracious predator of cichlids, the Nile perch (Lates niloticus), was introduced into Lake Victoria to rejuvenate the gradually declining Lake Victoria fishery. This introduction appeared to be successful, as shown by the increasing catch rates of the Nile perch. By the mid-1980s, the Nile perch accounted for more than 80% of the catch in Lake Victoria (Kaufman et al., 1996). Unfortunately, this success later turned out to be only temporary, and it came at a high price. The population explosion of the Nile perch was mirrored by a dramatic loss of haplochromine cichlid diversity in Lake Victoria, even in areas where intensive fishing was not practiced (Kaufman et al., 1996; Kudhongania & Chitamwebwa, 1995; Witte et al., 1995).

In addition to over-fishing and the introduction of exogenous predatory species, environmental changes have also had a catastrophic impact on cichlid species diversity in Lake Victoria (Kudhongania & Chitamwebwa, 1995). These environmental changes include 1) temporary increases in water level that damage the rooted vegetation; 2) the loss of the papyrus swamps, which reduces protection against wave action; and 3) the gradual enrichment of the water, as well as industrialization and urbanization, which degrade water quality and decrease water transparency (Kaufman, 1996).

Recently, a study by Seehausen et al. (1997) elegantly demonstrated that increasing water turbidity in Lake Victoria plays an important role in the dramatic loss of haplochromine cichlid diversity in Lake Victoria by blocking the reproductive isolation mechanism through interfering with mate choice and relaxing sexual selection.

Attempts to slow the process of extinction and to develop strategies for the sustainable management and conservation of the endangered Lake Victoria cichlid fauna are being made not only in Africa, but also in Europe and North America. The ultimate success of such efforts will depend on better knowledge of the ecosystem of Lake Victoria, so that appropriate strategies to stabilize and restore the environment can be developed. It will also depend on a better understanding of levels of genetic variation within taxa, as well as of the interpopulation and interspecific genetic relationships of various taxa, so that appropriate taxonomic units can be identified for preservation in situ and/or for captive breeding and reintroduction of endangered species.

II. MICROSATELLITES

Microsatellites are a class of DNA markers that consist of short tandem repeat (STR) DNA sequences with core repeat motifs of 1 to 6 bp (Beckmann & Weber, 1992). The repeat units of the most frequently used microsatellites are 2 bp, 3 bp, and 4 bp, referred to as dinucleotide, trinucleotide, and tetranucleotide microsatellites, respectively. Although microsatellite repeat motifs were first identified in the early 1980s (Hamada et al., 1982; Miesfeld et al., 1981), they were not fully exploited until 1989, when three papers independently reported the characterization of allelic variation at microsatellite loci by the polymerase chain reaction (PCR) (Litt & Luty, 1989; Tautz, 1989; Weber & May, 1989). In the past decade, microsatellites have been increasingly used in a range of research areas (see below).
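The definition above (perfect tandem repeats of a 1 to 6 bp motif) can be illustrated with a minimal Python sketch that scans a sequence for such runs. The function name and the minimum repeat count of five are illustrative assumptions and are not taken from any of the cited studies.

```python
import re

def find_microsatellites(seq, min_repeats=5, max_motif=6):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    Motif lengths of 1-6 bp follow the definition in the text; min_repeats
    is an arbitrary, illustrative threshold (assumption).
    """
    seq = seq.upper()
    # The lazy quantifier makes the shortest motif win, so (GT)6 is reported
    # as a GT repeat rather than as (GTGT)3.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits

# A dinucleotide and a tetranucleotide repeat embedded in short flanks.
print(find_microsatellites("AAA" + "GT" * 6 + "CC"))
print(find_microsatellites("CC" + "GATA" * 5 + "TT"))
```

Only perfect repeats are detected here; interrupted or compound repeats would need a more tolerant search.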

Technical overview of microsatellite marker development

Except in a few model organisms, such as humans, mice, and fruit flies, where large genomic DNA sequence databases can be searched directly for microsatellite sequences, development of microsatellite markers for a new species usually still relies on expensive and time-consuming laboratory procedures, including subgenomic library construction, library screening, and DNA sequencing.

Subgenomic library construction

The most frequently used subgenomic libraries for microsatellite marker development contain size-selected inserts that are small enough to be sequenced without designing internal sequencing primers, but large enough to harbor repeat DNA sequences and to provide enough flanking sequence for PCR primer design (see below). The commonly used insert size is from 300 to 600 bp. Such a size-selected genomic library can be generated by digesting the genomic DNA with one or two four-cutter restriction enzymes, such as Sau3A, and ligating the 300-600 bp inserts into a vector that has been precut with a restriction enzyme generating sticky ends compatible with those generated by the four-cutter enzymes. Since the inserts are generally small, bacteriophages (e.g., M13) or plasmids are the most commonly used vectors. To prevent self-ligation, the ends of either the inserts or the vector are usually protected by removing the 5'-phosphate group with alkaline phosphatase.
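The digestion and size-selection step can be sketched in silico. The Python example below cuts a sequence at every GATC site (the Sau3A recognition sequence, cut immediately before the G) and keeps fragments of 300 to 600 bp. The function names and the toy sequence are illustrative assumptions; a real digest is partial and is size-selected on a gel rather than computed.

```python
def digest(seq, site="GATC", cut_offset=0):
    """Cut seq at every occurrence of site.

    cut_offset is the cut position within the site; 0 places the cut
    immediately before the G of GATC, as for Sau3A.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop empty fragments, e.g. when a site sits at the very start.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

def size_select(fragments, lo=300, hi=600):
    """Keep fragments in the 300-600 bp window described in the text."""
    return [f for f in fragments if lo <= len(f) <= hi]

# Toy sequence (assumption): two GATC sites, 400 bp apart.
toy = "A" * 100 + "GATC" + "C" * 396 + "GATC" + "T" * 50
fragments = digest(toy)
print([len(f) for f in fragments])
print([len(f) for f in size_select(fragments)])
```

Every retained fragment begins with GATC, which mirrors the compatible sticky ends used for ligation into the precut vector.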

To increase the percentage of positive clones in the library, two library enrichment techniques are available:

i. Library enrichment by primer amplification (e.g., Ostrander et al., 1992):

Single-stranded DNA (ssDNA) is made from the subgenomic library constructed as described above, and then used as a primary genomic library to construct various enrichment sublibraries for particular microsatellite motifs. To construct each sublibrary, a single PCR cycle is performed using the ssDNA as template and one particular repeat sequence as the PCR primer. The dsDNA produced by this single PCR cycle is then ligated to seal nicks, and subsequently transformed into DH5αF' competent cells to construct individual enrichment sublibraries, each of which is now enriched for specific repeat motifs.

ii. Library enrichment by biotin affinity capture: For this approach, a size-selected subgenomic library is constructed, and subsequently used as template for an asymmetric PCR using only one universal primer from the vector sequences, so that the predominant PCR products are single-stranded DNA. A biotinylated microsatellite probe is then added to hybridize to the targeted PCR products. The hybridized PCR products are captured through the interaction between the biotin tag and streptavidin-coated magnetic beads. Unbound beads are washed off, and the retained PCR products, which are now enriched for the particular microsatellite probes used above, are released from the beads. The predominantly single-stranded PCR products are then reamplified using both universal primers of the vector, and the exponentially amplified PCR products are cloned and screened for the particular microsatellite repeat motifs (Kandpal et al., 1994; Kijas et al., 1994).

In addition to the library enrichment approaches, techniques are also available for developing microsatellite markers without constructing a genomic library. For instance, using a single primer with a tract of specific repeat sequences at the 3'-end and a degenerate septamer at the 5'-end, Brachet et al. (1999) were able to obtain PCR products with the particular repeat sequences at both ends but in opposite orientations. By cloning (i.e., TA cloning) and sequencing the PCR products, at least two separate microsatellite sequences with the same type of repeat motif can in principle be obtained from each PCR product. Another advantage of this approach, besides avoiding genomic library construction, is that only one (instead of two) unique primer needs to be designed for PCR genotyping of each individual microsatellite marker, because the primer used for the initial single-primer PCR amplification can be used as the other primer for PCR genotyping (Fisher et al., 1996).

Library screening

Microsatellite repeat probes can be labeled with γ-³²P-ATP, and then used to screen the subgenomic DNA libraries. To avoid the safety concerns of using isotopes, non-radioactive labeling approaches, such as biotin labeling, can also be used (Paetkau & Strobeck, 1994; Wu et al., 1999). One additional advantage of using biotin-labeled probes is that library screening and detection can be performed in a single day (Paetkau & Strobeck, 1994). In mammalian genomes and many other vertebrate genomes, (GT)n repeat motifs are the most abundant. Therefore, a (GT)n probe is normally the first choice for developing dinucleotide microsatellite markers. However, a disadvantage of dinucleotide microsatellite markers is the production of stutter bands or shadow bands that normally show up in the genotyping process. Such bands can sometimes make interpretation of the data difficult or ambiguous. One alternative is to develop trinucleotide or tetranucleotide microsatellite markers, which are much easier to score both because their PCR products have larger size differences and because they present far fewer shadow band problems in genotyping. Unfortunately, both trinucleotide and tetranucleotide microsatellite sequences are much less abundant than dinucleotide repeats in vertebrate genomes. In any case, multiple microsatellite probes with similar annealing (hybridization) temperatures can be used simultaneously to increase the screening efficiency.

Microsatellite PCR primer design and genotyping

DNA sequences flanking the repeat motifs can be used to design primers for microsatellite PCR genotyping analysis. For genotyping purposes, PCR primers are usually designed as close to the repeat motifs as possible to minimize the sizes of the PCR products. This is especially helpful for radioactive genotyping (see below), because the smaller the PCR products, the better the resolution will be. PCR genotyping can be performed radioactively by labeling one primer of each PCR primer pair with γ-³²P-ATP, or by incorporating radioactive nucleotides into the PCR products during the amplification process. Radioactive PCR products are separated in denaturing sequencing gels, and their sizes can be estimated fairly easily and accurately. The incorporation approach, however, was used only in earlier years because of its inconsistency in scoring.

Alternatively, several non-radioactive genotyping approaches are also available. For example, ethidium bromide-stained MetaPhor agarose gels can resolve differences as small as 4 bp for PCR products between 200 and 300 bp (Goldstein & Clark, 1995). The most widely used non-radioactive approach involves a fluorescence-based PCR genotyping technique, which was originally developed by Diehl et al. (1990), and further improved by Edwards et al. (1991) and Ziegle et al. (1992). In this approach, one primer of each PCR primer pair is labeled with one of three fluorescent dyes, each with a different emission wavelength, thereby allowing the simultaneous genotyping, in a single gel lane, of up to three microsatellite markers whose PCR products overlap in size. The biggest advantage of using fluorescently labeled primers is the high genotyping efficiency: fluorescence-based PCR genotyping can score up to 24 markers in a single gel lane (Schwengel et al., 1994) when combined with multiplex PCR (Morral & Estivill, 1992) in 96-well microtiter plates (Todd, 1992). In addition, a fourth fluorescent dye can be used to incorporate an internal size standard in each gel lane, thereby allowing fast, semi-automated sizing of microsatellite alleles by use of an automated sequencer and microsatellite genotyping software, such as GENESCAN 672 or Genotyper (e.g., Lee & Kocher, 1996).

Applications of microsatellites

In the past decade, microsatellites have been used for different levels of genetic analyses. At the genomic level, highly variable microsatellite markers have been widely used in genome mapping of model organisms, disease gene (locus) mapping, and quantitative trait locus (QTL) mapping (Weissenbach et al., 1992). In fact, including microsatellite markers in the human genome mapping project has greatly facilitated the mapping process (Cooperative Human Linkage Center, 1994). Microsatellites have also been used in genome mapping of other scientifically and/or economically important organisms, including mice (Dietrich et al., 1992), rats (Serikawa et al., 1992), pigs (Johansson et al., 1992), dogs (Ostrander et al., 1993), zebrafish (Postlethwait et al., 1994), tilapias (Kocher et al., 1998), and mosquitoes (Zheng et al., 1996). More recently, however, increasing efforts have been made to develop a new set of genetic markers, single nucleotide polymorphisms (SNPs), for future genome mapping and disease candidate gene mapping studies (Wang et al., 1998). This is because SNPs are even more abundant than microsatellite sequences and because genotyping of SNPs is easier to automate (Cavalli-Sforza, 1998; Kruglyak, 1997).

At the organismal level, microsatellites have been successfully used for paternity analysis (e.g., Fjerdingstad et al., 1998; Knight et al., 1998; Morin et al., 1994), pedigree determination (Herbinger et al., 1995), individual relatedness inference (Ishibashi et al., 1997), and estimation of rates of male mating success (Gullberg et al., 1997). In addition, recent studies have also demonstrated that microsatellites are ideal markers for estimating effective population sizes (Fiumera, 1999; Tarr et al., 1998).

So far, the application of microsatellite markers has been best documented, and has been most successful, at the inter-population level. Numerous studies have documented various levels of regional population differentiation in a variety of species. For example, microsatellites are very powerful for the differentiation of various human populations (Bowcock et al., 1994), bear populations (Paetkau & Strobeck, 1994), and populations of various fish species, such as the Pacific steelhead (Nielsen et al., 1995), the Atlantic salmon (McConnell et al., 1995), and the East African cichlid species (van Oppen et al., 1997). Microsatellites are also excellent genetic markers for evaluating levels of gene flow and migration rates (e.g., Amos et al., 1994; Roy et al., 1994).

Application of microsatellite markers at the interspecific level is currently very rare and still in its infancy. There are at least two reasons for this. Few microsatellite markers are capable of amplifying genomic DNA from a large number of species. Furthermore, it is still too early to determine what types of data analysis tools are most appropriate for such studies. Nevertheless, it has been shown recently that, with appropriately chosen markers, microsatellites can be very useful for phylogenetic analysis of closely related species groups, such as the Drosophila melanogaster species complex (Harr et al., 1998). The success of future studies on phylogenetic inference using microsatellite markers will depend greatly on both the availability of a large number of microsatellite markers for the species group of interest and a better understanding of the mutational mechanism of microsatellite markers (see below).

Possible cellular functions of microsatellites

Because microsatellite sequences occur 5-10 times more frequently than the equivalent random motifs (Tautz et al., 1986), and because alternating purine/pyrimidine repeats such as (GT)n and (GC)n can form a Z-DNA structure (Nordheim & Rich, 1983), it has been speculated that microsatellites might be involved in some specific cellular events. So far, limited evidence has implicated microsatellites in several important cellular functions, such as regulation of gene expression (Hamada et al., 1984), gene rearrangement (Boehm et al., 1989), chromosome packing/condensing (Stallings et al., 1991), and the replication of telomeres (Blackburn & Szostak, 1984). In addition, since some microsatellite sequences, such as (GT)n (Boehm et al., 1989), are frequently found at or near the breakpoints of chromosomal rearrangements, it has been speculated that microsatellites might stimulate homologous or illegitimate recombination. Several pieces of empirical evidence have shown that a (GT)n repeat sequence stimulated homologous recombination at the extrachromosomal level (Bullock et al., 1986; Wahls et al., 1990), but had no effect at the chromosomal level (Sargent et al., 1996).

The best known cellular functions of microsatellite sequences come from genetic studies of special types of human inherited diseases and cancers. Since 1991, several specific types of trinucleotide repeats (e.g., CAG) have been shown to be directly associated with about a dozen human genetic disorders, including four congenital fragile X syndromes (Fu et al., 1991; Knight et al., 1993; Nancarrow et al., 1994; Parrish et al., 1994), and seven neurodegenerative disorders (Brook et al., 1992; Burke et al., 1994; Kawaguchi et al., 1994; Koide et al., 1994; La Spada et al., 1991; Orr et al., 1993; The Huntington's Disease Collaborative Research Group, 1993). In all cases, expansions or increases in copy numbers of repeat sequences result in the disease phenotype. Therefore, such human genetic disorders are referred to as trinucleotide expansion diseases, or trinucleotide diseases (Mandel, 1994).

In addition, in the past few years, genome-wide microsatellite instability has also been well documented in different types of cancerous cells (Aaltonen et al., 1993). It is unclear, however, whether microsatellite instability is the cause or the consequence of tumorigenesis, although indirect evidence collected by Aquilina et al. (1994) and Strand et al. (1993) favors the consequence scenario (but also see Parson et al., 1993).

Mutational process and mutational mechanism of microsatellites

The success of using microsatellite markers for various population genetic studies relies greatly on appropriate models for microsatellite data analyses, such as estimates of Fst values, genetic distances, and divergence time. Several recent studies have shown that estimation of mutation rate and/or inference of mutational mechanism and mutational processes are essential to generating such models (Goldstein et al., 1995a; 1995b; Kimmel et al., 1996; Slatkin, 1995).

Although the precise mutational process and mutational mechanisms at microsatellite loci are not well understood, theoretical studies and empirical observations have already shed some light on them. Two alternative mechanisms have been proposed to explain microsatellite mutations: DNA polymerase slippage during DNA replication and unequal crossing-over during homologous recombination (Strand et al., 1993).

Schlotterer & Tautz (1992) demonstrated that DNA polymerase slippage alone can generate all types of di- and trinucleotide repeat motifs starting from short primers in vitro, suggesting that slippage might be the cause of mutations at microsatellite loci in vivo. In addition, several studies using different mutant strains of E. coli or yeast also suggest an important role for slippage in microsatellite mutations. For example, mutations affecting mismatch repair in E. coli (mutL and mutS) increased microsatellite instability 13-fold (Henderson & Petes, 1992), whereas mutations affecting recombination in E. coli (recA) (Levinson & Gutman, 1987) and yeast (rad52) (Henderson & Petes, 1992) had no effect on microsatellite stability. Similarly, Strand et al. (1993) have shown that in yeast, while mutations affecting the proof-reading functions of DNA polymerase had little effect on microsatellite instability, mutations affecting DNA mismatch repair increased microsatellite instability 100- to 700-fold. Furthermore, mutations at a tetranucleotide repeat observed in pedigree studies by Mahtani & Willard (1993) likely arose by DNA polymerase slippage rather than unequal crossing-over between nonsister chromatids, although unequal crossing-over between sister chromatids could not be ruled out. Taking all this information together, it is now generally believed that DNA polymerase slippage is the primary, if not the exclusive, cause of mutations at microsatellite loci. However, when the repeat length reaches some unknown threshold, unequal crossing-over could also be an important factor.

Two extreme models could also potentially be related to the mutational process of microsatellites. The first model is the infinite-allele model (IAM) (Kimura & Crow, 1964), which predicts that mutations are "memoryless" and can involve any number of tandem repeats, resulting in new alleles that have not been previously encountered in the population. The second model is the stepwise mutation model (SMM) (Kimura & Ohta, 1978), which predicts that a parental allele can mutate only by gaining or losing one repeat unit, and therefore the mutant allele could be one that is already present in the population. In other words, mutations under the SMM are not "memoryless".

Empirical studies on an artificial construct in yeast (Henderson & Petes, 1992) and mutation analyses in mice (Dallas, 1992) or humans (Weber & Wong, 1993) indicated that the majority of mutations at microsatellite loci involve the addition or subtraction of one or a small number of repeat units, favoring the SMM. However, conflicting results have also been documented. Mutations involving changes of a large number of repeat units have been well documented at specific trinucleotide microsatellite loci that are associated with certain human genetic disorders (Fu et al., 1991; Knight et al., 1993; Nancarrow et al., 1994; Parrish et al., 1994).

Using computer simulations, Valdes et al. (1993) found that the allele frequency distributions observed at 108 dinucleotide microsatellite loci in three human populations were consistent with the SMM if the product of the effective population size and mutation rate was greater than one. Similarly, Shriver et al. (1993) demonstrated that mutations of microsatellites with 3 - 5 bp repeat units fit well into the one-step SMM, and microsatellites with 1 - 2 bp repeat units fit the SMM slightly less well, whereas mutations of minisatellites (repeat units of 15 - 70 bp) were better explained by the IAM.

In contrast, Estoup et al. (1995a) provided evidence that mutations at microsatellites overall fit the IAM better, although the authors cautiously pointed out that their data might be biased because the majority (>70%) of the microsatellites used in their study are compound repeats consisting of two or three different repeat motifs.

Realizing that neither the one-step SMM nor the IAM alone might be able to explain all the mutational processes, Di Rienzo et al. (1994) postulated a two-phase model, in which most mutations are single repeat unit changes, but rare large changes in repeat copy number do occur. Their observations suggested that this new model provided the best fit to most of the 10 microsatellite loci in human populations.
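The difference between the one-step SMM and the two-phase model can be made concrete with a small simulation. The Python sketch below applies successive mutations to a single allele lineage and records the size of each change in repeat units. The probability of a single-step change (0.9), the maximum jump size (10 repeats), and the uniform distribution of large jumps are illustrative assumptions only; they are not the parameter values used by Di Rienzo et al. (1994).

```python
import random

def mutate_smm(repeats, rng):
    """One-step SMM: gain or lose exactly one repeat unit (floor of 1 repeat)."""
    return max(1, repeats + rng.choice((-1, 1)))

def mutate_tpm(repeats, rng, p_single=0.9, max_jump=10):
    """Two-phase model: mostly single steps, occasionally a larger jump.

    p_single and max_jump are hypothetical illustration values.
    """
    if rng.random() < p_single:
        step = 1
    else:
        step = rng.randint(2, max_jump)
    return max(1, repeats + rng.choice((-1, 1)) * step)

def simulate(mutate, start=20, n_mutations=1000, seed=1):
    """Apply successive mutations to one lineage; return absolute step sizes."""
    rng = random.Random(seed)
    allele, sizes = start, []
    for _ in range(n_mutations):
        new = mutate(allele, rng)
        sizes.append(abs(new - allele))
        allele = new
    return sizes

smm_steps = simulate(mutate_smm)
tpm_steps = simulate(mutate_tpm)
print("largest SMM step:", max(smm_steps))
print("largest TPM step:", max(tpm_steps))
```

Under the SMM no recorded change exceeds one repeat unit, whereas the two-phase sketch occasionally produces multi-repeat changes of the kind described above.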

Mutation rates of microsatellites

Early theoretical studies have revealed a positive correlation between the number of alleles and the mutation rate (Ewens, 1972). Furthermore, empirical observations showed that most microsatellite markers are highly variable, characterized by large numbers of alleles and high heterozygosity (Weber & May, 1989). Goldstein & Clark (1995) argued that the high level of variation observed at microsatellite loci is due to their high mutation rates.
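One way to see this relationship between allele number and mutation rate is the standard infinite-allele expectation associated with the Ewens sampling theory: in a sample of n gene copies, the expected number of distinct alleles is the sum over i = 0 to n - 1 of θ/(θ + i), where θ = 4Neμ for a diploid population. Whether this is the specific result the citation above refers to is an assumption; the Python sketch below simply evaluates the formula for a few values of θ.

```python
def expected_alleles(theta, n):
    """Expected number of distinct alleles in a sample of n gene copies
    under the infinite-allele model, with theta = 4 * Ne * mu (diploid)."""
    return sum(theta / (theta + i) for i in range(n))

# For fixed Ne, a higher mutation rate means a higher theta and therefore
# more expected alleles in a sample of 100 gene copies.
for theta in (0.1, 1.0, 10.0):
    print(theta, round(expected_alleles(theta, 100), 2))
```

The expectation increases monotonically with θ, so for a fixed effective population size a higher mutation rate implies more alleles in a sample.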

So far, several different approaches have been used to estimate mutation rates of microsatellites. The most straightforward approach is simply to count the rare mutant alleles by genotyping a large number of offspring based on pedigree data. Taking this painstaking approach, Weber & Wong (1993) examined a total of 20,000 parent-offspring transfers of alleles by genotyping 40 CEPH reference families for 28 microsatellites on human chromosome 19. The average mutation rate estimated was 1.2 × 10⁻³ per locus per gamete per generation, ranging from 0 to 8 × 10⁻³. Hastbacka et al. (1992), on the other hand, used a different approach to estimate the mutation rate of three microsatellites in the Finnish population. This approach is based on linkage disequilibrium, and the mutation rates estimated are from 3X 1 O'* to 4X10^. Using marker heterozygosity and maximum likelihood determination, Edwards et al. (1992) estimated the mutation rates for tri- and tetranucleotide microsatellites in humans to be from 2 X 10 ^ to 2 X 10^. The last approach that has been successfully used to estimate the mutation rates of microsatellites is to genotype DNA samples from multiple-generation inbred lines. Using this time-saving and less expensive approach in mice, Dallas et al. (1992) estimated the mutation rates for two microsatellites to be 1.2 × 10⁻⁴ and 4.7 × 10⁻⁴, respectively.
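The direct (pedigree-counting) approach amounts to dividing the number of observed mutations by the number of allele transfers scored, per locus and pooled over loci. A minimal Python sketch follows; the locus names and counts are hypothetical and are not data from any of the studies cited above.

```python
def mutation_rates(counts):
    """counts maps locus name -> (mutations observed, allele transfers scored).

    Returns per-locus rates and the pooled rate over all loci.
    """
    per_locus = {}
    total_mut = 0
    total_transfers = 0
    for locus, (n_mut, n_transfers) in counts.items():
        if n_transfers <= 0:
            raise ValueError("each locus needs at least one scored transfer")
        per_locus[locus] = n_mut / n_transfers
        total_mut += n_mut
        total_transfers += n_transfers
    return per_locus, total_mut / total_transfers

# Hypothetical counts for three loci (illustration only).
example = {"locus_A": (0, 700), "locus_B": (1, 800), "locus_C": (4, 500)}
per_locus, pooled = mutation_rates(example)
print(per_locus)
print(pooled)
```

Because mutations are rare, a locus with no observed mutations yields an estimate of zero, which is why per-locus ranges in such studies can start at 0 even when the pooled average is positive.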

At the present time, estimation of mutation rates of microsatellites has been limited to only a few species, such as mammalian species (Dietrich et al., 1992), birds (Primer et al., 1998), and Drosophila (Hutter et al., 1998; Schlotterer et al., 1998; Schug et al., 1997; Schug et al., 1998). Mutation rates of microsatellites in different mammalian species, including humans (Edwards et al., 1992; Hastbacka et al., 1992; Mahtani & Willard, 1993; Weber & Wong, 1993), mice (Dallas et al., 1992; Dietrich et al., 1992), rats (Serikawa et al., 1992), and pigs (Ellegren et al., 1995), are usually from 10⁻⁵ to 10⁻², and the average rates are generally in the same range from 10~* to 10^, a measure several orders of magnitude higher than the average mutation rate for single copy nuclear genes.

By screening 157,680 allele-generations for 24 microsatellites through 30 mutation accumulation inbred lines of Drosophila melanogaster, Schug et al. (1996) obtained an average mutation rate estimate of 6.3 × 10⁻⁶, a number considerably lower than those reported for the mammalian species. More recently, Schlotterer et al. (1998) obtained an identical estimate of average mutation rate for 24 loci in Drosophila melanogaster, but all nine mutations occurred at a single locus, which has an estimated mutation rate of 3.0 X 10"*.

Advantages of microsatellites as genetic markers

Compared with conventional genetic markers, microsatellite markers have several advantages that have led to their rapid acceptance and wide use in a variety of research areas. First of all, microsatellites are abundant in all the eukaryotic genomes examined except the yeast genome (Hearne et al., 1992). For instance, there are at least 10^ CA repeats in the mouse and human genomes (Hamada et al., 1982). More importantly, unlike minisatellites or VNTRs, which are primarily located in the distal regions of human chromosomes, microsatellites are evenly distributed across the entire genome (Stallings et al., 1991; but also see O'Reilly & Wright, 1995), a feature making them better representatives of a species' genetic background.

Secondly, different microsatellites may have different ranges of variability, with a positive correlation between the repeat length of microsatellite markers and their variability (Weber, 1990). This feature makes it feasible to choose different microsatellite markers for specific types of applications (O'Reilly & Wright, 1995). For example, less variable markers might be very appropriate for population genetic analyses (Carvalho & Hauser, 1994) and phylogenetic analysis of closely related taxa (Harr et al., 1998), whereas highly variable markers might be preferable for genome mapping and linkage analyses (Weissenbach et al., 1992), paternity exclusion (Morin et al., 1994), as well as population genetic studies of species where conventional genetic markers revealed little variation (Hughes & Queller, 1993; Taylor et al., 1994).

Thirdly, in contrast to minisatellites, most microsatellite markers are single locus. This allows the number of alleles, the allele frequencies, and the heterozygosity to be estimated for each locus, whereas only the average heterozygosity over all loci can be estimated for multilocus markers (Stephens et al., 1992).
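As an illustration of these per-locus summaries, the Python sketch below computes the number of alleles, the allele frequencies, and the observed and expected heterozygosity from diploid genotypes at one locus. Expected heterozygosity is taken here as one minus the sum of squared allele frequencies, without a small-sample correction; the allele sizes are hypothetical.

```python
from collections import Counter

def locus_summary(genotypes):
    """genotypes: list of (allele_a, allele_b) sizes for diploid individuals.

    Returns (number of alleles, allele frequencies, observed heterozygosity,
    expected heterozygosity as 1 - sum of squared allele frequencies).
    """
    alleles = Counter(a for g in genotypes for a in g)
    total = sum(alleles.values())
    freqs = {a: c / total for a, c in alleles.items()}
    h_obs = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    h_exp = 1.0 - sum(p * p for p in freqs.values())
    return len(freqs), freqs, h_obs, h_exp

# Hypothetical allele sizes (bp) for four individuals at a single locus.
sample = [(100, 102), (100, 100), (102, 104), (100, 104)]
print(locus_summary(sample))
```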

Fourthly, microsatellite sequences usually cover less than 300 bp. Therefore, they can be genotyped by PCR analysis. More importantly, the size of different PCR products can be determined easily by running a known DNA sequence size marker (e.g., M13 sequence) along with the PCR products in a denaturing polyacrylamide sequencing gel. This makes cross-gel comparison possible, and makes it unnecessary to drop alleles of ambiguous size or to bin (pool) alleles of similar size (O'Reilly & Wright, 1995).

Finally, although microsatellite sequences are usually highly variable, primarily due to different numbers of the repeat motif, the flanking sequences, from which PCR primers are designed, are usually conserved among closely related species (Roy et al., 1994), genera (Schlotterer et al., 1991; Vaiman et al., 1994; Zardoya et al., 1996), or even families (FitzSimmons et al., 1995). This makes cross-species application of microsatellite markers feasible.

Disadvantages and limitations of using microsatellite markers

Technical limitations

A common technical problem for microsatellite genotyping is the appearance of shadow bands or stutter bands (Litt et al., 1993), which sometimes makes it very difficult to differentiate between a homozygote and a heterozygote whose two alleles differ by only one repeat unit. Empirical studies demonstrated that shadow bands are probably generated by DNA polymerase slippage during PCR (Hauge & Litt, 1993). Such a problem could be serious for some dinucleotide microsatellite markers. One way to avoid it is to use either trinucleotide or tetranucleotide microsatellite markers, which produce few or almost no shadow bands during the PCR genotyping process. Adding formamide to the sequencing gel at a final concentration of 32% could also help to slightly ameliorate the shadow band problem (Litt et al., 1993).

Another potential technical limitation is false heterozygote deficiency due to the non-detection of one of the two alleles in heterozygotes. Technically, at least two problems could result in non-detection of a particular allele. The first problem is short allele dominance (Wattier et al., 1998), in which the shorter alleles are preferentially amplified over the longer alleles in heterozygotes. This usually happens when the sizes of the two alleles are very different, and could result in a false heterozygote deficiency if the longer alleles fail to amplify. The second problem is related to low copy numbers of the target DNA for some particular DNA sources, such as faeces, feathers, or hair (Gagneux et al., 1997; Gerloff et al., 1995; Taberlet et al., 1996). In this case, only one allele is amplified by chance due to low copy numbers of the templates. Therefore, the dropout allele could be either the shorter allele or the longer allele.

Homoplasy

Homoplasy refers to a situation where two characters (alleles) are identical by “state” but not identical by descent. This could be a problem for microsatellite markers, especially when they are applied to phylogenetic studies of distantly related taxa. Homoplasy exists when alleles with the same size (or the same “state”) of PCR products either represent different sequences or represent exactly the same sequence but evolved through independent processes (Estoup et al., 1995b). The first type of allele homoplasy can arise when: 1) insertions/deletions (indels) within the repeat motifs are compensated by indels within the flanking sequences [e.g., size of (N)7(CA)23(N)13 = size of (N)7(CA)20(N)19 in Grimaldi & Crouau-Roy, 1997, where N could be G, A, T, or C]; 2) indels within one type of repeat motif are compensated by indels within the other type of repeat motif in a compound microsatellite marker [e.g., size of (N)20(GT)10(AC)10(N)30 = size of (N)20(GT)11(AC)9(N)30]; and 3) point mutations occur within the repeat motifs (point mutations in the flanking sequences can be ignored) [e.g., size of (N)20(GT)10(N)30 = size of (N)20(GT)9(AT)(N)30]. In any of the three cases, homoplasy can be identified by sequencing all the alleles with the same size.
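The size equalities in the three examples can be checked with a short sketch. The helper `product_size` is hypothetical and simply parses the bracket notation used above, treating a missing count as one copy of the motif.

```python
import re


def product_size(structure):
    """Return the length in bp of an allele written like '(N)7(CA)23(N)13'.

    Each '(MOTIF)count' block contributes len(MOTIF) * count bases;
    a block with no count, such as '(AT)', is counted once.
    """
    size = 0
    for motif, count in re.findall(r"\(([A-Z]+)\)(\d*)", structure):
        size += len(motif) * (int(count) if count else 1)
    return size


# The three pairs of size-identical alleles given in the text.
examples = [
    ("(N)7(CA)23(N)13", "(N)7(CA)20(N)19"),
    ("(N)20(GT)10(AC)10(N)30", "(N)20(GT)11(AC)9(N)30"),
    ("(N)20(GT)10(N)30", "(N)20(GT)9(AT)(N)30"),
]
for first, second in examples:
    print(first, product_size(first), second, product_size(second))
```

In each pair the two alleles have the same total length although their sequences differ, which is why they would be scored as one allele on a sizing gel.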

The second type of homoplasy arises when two alleles with the same size and exactly the same repeat sequence evolved through different evolutionary processes. This is possible because microsatellites mostly follow a stepwise mutation model, so that identical present-day alleles could come from independent evolutionary lineages. For example, allele (GT)20 could arise either from (GT)18 via (GT)19 or from (GT)22 via (GT)21. The two resulting (GT)20 alleles are not identical by descent. Unlike the first type of homoplasy, this type cannot be inferred from the present-day alleles. Fortunately, levels of homoplasy are usually minimal unless distantly related populations or species are involved.
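The (GT)20 example can be made concrete with a small sketch that enumerates mutation paths under a strict stepwise model, in which each mutation adds or removes exactly one repeat unit. The function `stepwise_paths` is illustrative only; real microsatellite mutation is not restricted to single-step changes.

```python
def stepwise_paths(start, target, steps):
    """List every path of exactly `steps` single-repeat mutations
    leading from `start` repeats to `target` repeats."""
    if steps == 0:
        return [[start]] if start == target else []
    paths = []
    for nxt in (start - 1, start + 1):  # lose or gain one repeat unit
        for tail in stepwise_paths(nxt, target, steps - 1):
            paths.append([start] + tail)
    return paths


from_18 = stepwise_paths(18, 20, 2)
from_22 = stepwise_paths(22, 20, 2)
print(from_18)  # [[18, 19, 20]]
print(from_22)  # [[22, 21, 20]]
```

Both lineages end at 20 repeats, and nothing in the final allele records which path was taken.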

Null alleles

Null alleles are alleles that fail to amplify in the PCR genotyping process. They have been well documented in various studies (e.g., Paetkau & Strobeck, 1995). At least three situations can result in null alleles. The first two are short allele dominance and random allele dropout due to low copy numbers of DNA templates, both of which are technical limitations of microsatellite PCR genotyping (see above). The third situation is due to mutations in at least one of the two primer sequences. Such mutations could be either point mutations (Paetkau & Strobeck, 1995) or deletions (Callen et al., 1993). The presence of null alleles due to mutations in the flanking sequences is usually not a major problem unless distantly related populations or distantly related species groups are involved.

The presence of null alleles may be detected either by violations of classical Mendelian inheritance in pedigree studies, or by significant heterozygote deficiencies in the Hardy-Weinberg test. However, heterozygote deficiency alone does not necessarily indicate the presence of null alleles because other factors, such as the Wahlund effect and inbreeding, could also result in heterozygote deficiency. In general, heterozygote deficiency due to null alleles is marker-specific, while heterozygote deficiency due to inbreeding or the Wahlund effect is genome-wide. Furthermore, failure to amplify some templates for specific markers is also a good indication of the presence of null alleles. Once identified, the null allele problem can be resolved by designing a new primer or primer pair and retyping all the individuals that were scored as homozygotes or that completely failed to amplify (Paetkau & Strobeck, 1995). Recently, Brookfield (1997) developed a simple approach to estimate null allele frequencies based on the heterozygote deficiency in microsatellite data.
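As a hedged illustration of a heterozygote-deficiency estimator, the sketch below uses the form r = (He - Ho)/(1 + He), which is commonly attributed to Brookfield. The formula is not given in the text above and is an assumption here; it should be checked against the original paper before use. The heterozygosity values are invented.

```python
def null_allele_frequency(h_exp, h_obs):
    """Estimate the null allele frequency at one locus from heterozygote deficiency.

    Assumed form: r = (He - Ho) / (1 + He). A heterozygote excess
    (Ho > He) is reported as a frequency of zero.
    """
    return max(0.0, (h_exp - h_obs) / (1.0 + h_exp))


# Hypothetical locus: expected heterozygosity 0.80, observed 0.60.
r = null_allele_frequency(0.80, 0.60)
print(round(r, 3))
```

Because the estimate is driven entirely by the deficiency at one locus, it cannot by itself distinguish null alleles from inbreeding or the Wahlund effect; comparison across loci is still needed.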

In summary, although the fascinating Lake Victoria cichlid species assemblage has provided an excellent model system for evolutionary biologists, taxonomists, and conservation biologists, the evolutionary relationships among the members of the young species flock remain unclear. A better understanding of levels and patterns of genetic variation within and among various Lake Victoria cichlid species will be essential to major success in future studies of evolutionary biology, taxonomy, and conservation biology of the Lake Victoria cichlid species group. Although the inability of many conventional genetic markers to detect sufficient amounts of variation among various Lake Victoria cichlid species has presented a major challenge for researchers, the recent emergence and successful application of highly variable microsatellite markers in various population genetic studies may provide the tools needed for phylogenetic studies of Lake Victoria haplochromine cichlid species.


Figure 1.1: A map of the Lake Victoria Region (LVR).

Lake                  Surface area (km²)   Maximum depth (m)
East Africa
  Lake Victoria            68800                  79
  Lake Tanganyika          32900                1435
  Lake Malawi              22490                 706
North America
  Lake Superior            83270                 393
  Lake Michigan            58020                 281
  Lake Erie                25680                  64
Asia
  Lake Baikal              30500                1741

Table 1.1: The surface area and maximum depth of the Great Lakes of East Africa and several other freshwater lakes in the world (modified from Kaufman et al., 1996).

Water body       Estimated age     Cichlid          Non-cichlid
                                   n        %       n         %
Malawi           1-2 mya           500      99      45        62
Tanganyika       2-4 mya           165      99      75        70
Victoria         250-750 kya       >400     99      38        16
Albert                             12       98      37         5
Edward/George                      60       99      17        12
Niger                              10       20      124        4
Nile                               10       20      105       20
Zaire                              40       65      c. 650    80

Table 1.2: The number of cichlid species, non-cichlid species, and the percentage (%) of endemic species in major East African lakes and river systems. All figures are approximations, and include both described species and newly discovered but undescribed species. Data were compiled from Greenwood (1991), Kaufman et al. (1996), and Pitcher & Hart (1995). n: number of species; %: percentage of species that are endemic to a particular lake.

CHAPTER 2

Isolation and Characterization of Microsatellite Markers in Astatoreochromis alluaudi and Their Cross-species Amplifications in Other African Cichlids

INTRODUCTION

The Great Lakes of East Africa, which include Lake Victoria, Lake Malawi, and Lake Tanganyika, are second to none in cichlid species richness and the speed at which those species have evolved (Fryer & Iles, 1972). In Lake Victoria alone, there are an estimated 400-600 haplochromine cichlid species, almost all of which are endemic to the lake (Greenwood, 1991; Kaufman, 1997). In contrast, there are only about 40 non-cichlid fish species in this lake, and only 42% are endemic (Kaufman & Ochumba, 1993; Pitcher & Hart, 1995).

It appears that most of the endemic species in Lake Victoria evolved in the last 12,000 years (Johnson et al., 1996), derived within the greater Lake Victoria region from a set of regional genera that diverged from a monophyletic origin perhaps as recently as 225,000 years (Kaufman et al., 1997), and certainly less than one million years ago (Meyer et al., 1990). Unfortunately, many of the endemic Lake Victoria haplochromine species have recently been lost or are threatened with extinction due to human interference, such as over-fishing, water pollution, and introduction of the voracious predatory fish, the Nile perch (Pitcher & Hart, 1995; Stiassny, 1991). Currently, conservation efforts to preserve the endangered Lake Victoria cichlid species are underway in Africa, Europe, and North America.

Knowledge about the evolutionary relationships among the Lake Victoria cichlid species is vital for understanding the mechanisms that governed the explosive speciation, and for developing practical guidelines for the conservation efforts. For many years, the details of the evolutionary relationships of Eastern African cichlids have been greatly debated. There are very few morphological characters that are phylogenetically informative for the intrafamilial relationships of cichlids, and some of the potentially most useful morphological characters are confounded by phenotypic plasticity (Stiassny, 1991). Moreover, efforts to employ more reliable genetic approaches have been severely hampered by the lack of sufficiently variable genetic markers. Conventional genetic markers have not detected significant amounts of variation in Lake Victoria cichlid species. For example, Sage et al. (1984), using 10 allozyme markers, failed to distinguish among 10 Lake Victoria haplochromine cichlid species. Similarly, Meyer et al. (1990) obtained over 800 bp of mtDNA sequences, including the highly variable control region (D-loop), but found very little intraspecific variation and very low levels of divergence among 14 species of Lake Victoria cichlids representing nine genera. Therefore, to better understand the evolutionary mechanisms and to infer the population genetic structure of haplochromines, more highly variable genetic markers have to be developed.

In the past few years, microsatellites, or short tandem repeats (STR), have been isolated and characterized in many different organisms, and have been increasingly applied to different types of population genetic analyses, such as paternity exclusion (e.g., Morin et al., 1994) and population structure (e.g., Amos et al., 1994; Bowcock et al., 1994; McConnell et al., 1995; Paetkau & Strobeck, 1994; Ruzzante et al., 1996). The wide use of microsatellites is attributable to a list of highly favorable features: high abundance but random distribution in eukaryotic genomes (Hearne et al., 1992), high mutation rates (e.g., Weber & Wong, 1993; but also see Schug et al., 1996; Schlotterer et al., 1998), general conservation of flanking sequences among closely related species (Morris et al., 1996; Rico et al., 1996; Scribner et al., 1996; Zardoya et al., 1996), and relative ease of genotyping via PCR with accurate sizing of alleles (Litt & Luty, 1989).

As the first step toward our general goal of inferring the evolutionary history and relationships of LVR haplochromines using rapidly evolving genetic markers, I here report the isolation and characterization of dinucleotide and trinucleotide microsatellite markers from a Lake Victoria regional endemic, Astatoreochromis alluaudi, as well as the cross-species amplifications of microsatellites for other LVR cichlid species.

MATERIALS AND METHODS

Isolation and sequencing of microsatellite markers

A size-selected partial genomic library was constructed and subsequently screened for microsatellites, following the procedures of Paetkau and Strobeck (1994), with slight modifications. Briefly, A. alluaudi genomic DNA was extracted using a standard proteinase K, phenol-chloroform extraction protocol (Sambrook et al., 1989), and then completely digested with Sau3AI (Boehringer Mannheim) for 2 hr at 37°C. A pool of digested DNA fragments of 300 to 500 bp was recovered from a 0.7% agarose gel by centrifugation through glass wool, and subsequently dephosphorylated with calf intestine alkaline phosphatase (Boehringer Mannheim) according to the manufacturer’s protocol to prevent multiple insert cloning. A partial genomic library was constructed by ligating the dephosphorylated DNA inserts into an M13 bacteriophage vector that had been predigested with BamHI (GIBCO BRL). Ligation was performed at 16°C for 8-12 hr with 50 ng of M13 DNA, 50-100 ng of genomic DNA, and 0.5 unit of T4 ligase (GIBCO BRL). Ligation products were ethanol precipitated using yeast tRNA as a carrier, and then transformed into competent DH5αF' E. coli cells using an electroporator (BioRad) according to the manufacturer's protocol. Electroporation was performed at 2.5 kV for about 5 milliseconds.

Recombinant plaques were lifted with nitrocellulose filters (S & S), and subsequently screened either with a mixture of 5'-biotinylated (GT)n and (CT)n probes for dinucleotide microsatellites, or with a mixture of 5'-biotinylated (ACC)n and (AGC)n probes for trinucleotide microsatellites. All the probes were synthesized by OPERON TECHNOLOGIES, INC. Nitrocellulose filter lifts were incubated at 37°C for 1 hr in 2X SSC, 0.1% SDS, 50 µg/ml proteinase K, and then hybridized to the probes in 6X SSC, 0.05% sodium pyrophosphate and 4% BLOTTO (5% skim milk powder and 0.02% sodium azide) for 2 hr at 50°C for dinucleotide probes, and at 60°C for trinucleotide probes. Post-hybridization washes were carried out in 2X SSC and 0.05% sodium pyrophosphate at a temperature 12°C higher than the corresponding hybridization temperature. Positive signals were detected with the BLUGENE biotin detection system (GIBCO BRL) according to the manufacturer’s protocol.

Putative positive clones were isolated after two or three rounds of screening. Double-stranded DNA was made from the phage suspension of putative positive clones using the standard small-scale double-stranded M13 DNA preparation protocol (Sambrook et al., 1989), and was subsequently used as template for sequencing with the GIBCO BRL dsDNA Cycle Sequencing System following the manufacturer’s protocol. Sequencing was performed in both directions using the M13/pUC18 forward 23-base sequencing primer (GIBCO BRL) and the M13/pUC18 reverse 16-base sequencing primer (OPERON TECHNOLOGIES, INC.). Sequences of nine positive clones for which primer pairs were tested on population samples were deposited in GenBank with the following Accession Numbers: U66809 (OSU12t), U66810 (OSU09d), U66811 (OSU13d), U66812 (OSU16d), U66813 (OSU19d), U66814 (OSU19t), U66815 (OSU20d), U66816 (OSU21d), and U66817 (OSU22d).

Microsatellite PCR analysis

PCR primers complementary to the sequences flanking 17 microsatellite motifs were designed using the computer program OLIGO (Rychlik & Rhoads, 1989). Table 2.1 lists the primer sequences and PCR conditions for the 17 microsatellite loci that were used for initial screening. Because the initial population screening on A. alluaudi suggested that genotypic distributions were significantly different from the Hardy-Weinberg expectations at three loci (OSU19d, OSU20d, and OSU22d), extra pairs of PCR primers (listed as OSUxxxN in Table 2.1) were designed to test for the presence of null alleles due to variation in primer sequences.

For PCR analysis, each forward primer was end-labeled with [γ-^^P]-ATP using T4 polynucleotide kinase (GIBCO BRL). PCR reactions were carried out in 5 or 10 µl of a mixture containing 20-30 ng of DNA template, 0.25 mM of each primer, 100 mM deoxynucleotide triphosphate (dNTP), 1.5 mM MgCl2, and 0.5 unit of Taq polymerase (GIBCO BRL). Amplification was achieved by running the appropriate number of cycles (Table 2.1) of 45 sec at 95°C, 45 sec at the appropriate annealing temperature (Table 2.1), and 30 sec at 74°C. This main PCR profile was preceded by a long denaturation step at 95°C for 5 min, and followed by a long extension step at 74°C for 6 min. Multiplex PCR was performed for markers OSU12t and OSU21d. PCR products were resolved in 6% or 8% polyacrylamide sequencing gels with 7M urea, and visualized by autoradiography. Accurate sizing of PCR bands was achieved by running several pUC18 plasmid control sequencing products along with the microsatellite PCR products.

Biological materials and DNA extraction

Samples of 10 cichlid species (Table 2.2) were used to test cross-species amplifications of microsatellite PCR primers from A. alluaudi. All specimens, including the one A. alluaudi individual that was used to construct the partial genomic library, were collected from Kenyan and Ugandan waters in the northern and western portions of the Lake Victoria Region between 1992 and 1996, except for A. straeleni, which was obtained from an aquarium dealer (Ned Bowers).

White muscle (epaxial musculature) tissue samples were collected in the field and fixed in 95% ethanol until DNA extraction. After tissue collection, entire fish specimens were photographed, fixed and preserved in 10% formalin, and retained as taxonomic vouchers. The formalin-preserved specimens were then transported to the laboratory of Dr. Les Kaufman at Boston University, and identified morphologically. Genomic DNA was extracted from the alcohol-preserved muscle tissues using either the standard proteinase K, phenol-chloroform extraction protocol (Sambrook et al., 1989), or the NaOH extraction protocol (Zhang & Tiersch, 1993).

RESULTS

Cloning and characterization of microsatellite markers

Approximately 8,910 recombinant plaques were screened using the dinucleotide microsatellite probes. A total of 158 putative positive clones were identified, and over 100 such clones were isolated. Of these, 24 clones have been sequenced, and 23 contained the expected dinucleotide repeat motifs. Twenty-two clones had (GT)n repeat motifs, and one clone had a (CT)n repeat motif (Table 2.3). The number of uninterrupted repeat motifs ranged from six to 47. Among the 23 clones, 12 clones had at least 20 uninterrupted repeat motifs, seven clones had at least 30 uninterrupted repeat motifs, and four clones had at least 40 uninterrupted repeat motifs. According to Weber's (1990) classification, 16 (70%) clones contained perfect repeats, three (13%) clones contained imperfect repeats, and four (17%) clones contained compound repeats.

For the trinucleotide probes, about 22,000 recombinant clones were screened, and 127 putative positive clones were isolated. Nineteen of the 21 clones that have been sequenced contained trinucleotide repeat motifs (Table 2.4). The majority (63%) of them were perfect repeats, while 16% were imperfect, and 21% were compound. The uninterrupted trinucleotide repeats recovered were generally very short, ranging from two to 10 repeat units. Five clones contained at least two similar trinucleotide motifs that were only one base different from each other, including clones AA12 [(NGC)n and (CAN)n], AA14 [(CNG)n(CAA)(CAG)3], AA15 [(CNT)n(CCN)n], AA19 [(ANC)n], and AA24 [(CNG)n(TGG)(CAG)n] (Table 2.4).

Microsatellite PCR primer design and initial screening in A. alluaudi

Among the 42 positive clones that contain the expected repeat motifs (Table 2.3 and Table 2.4), a total of 17 pairs of PCR primers (Table 2.1) were designed based on the sequences flanking the repeat motifs of 16 clones. Six pairs of primers were designed for trinucleotide repeat motifs, and 11 pairs were designed for dinucleotide repeat motifs. No primers were designed from the other 26 clones for one or more of the following reasons: 1) the repeat sequence is too short; 2) the repeat motif is too close to the cloning site to design a primer, or there is not enough flanking sequence; 3) the flanking sequence is not clear enough to design a primer; 4) the clone contains multiple dinucleotide repeats.

Nine of the 17 pairs of PCR primers seemed to successfully amplify single loci of genomic DNA from A. alluaudi and produce polymorphic, scorable PCR products. These nine sets of primers were subsequently used as genetic markers for population studies. They include seven dinucleotide microsatellite markers (OSU09d, OSU13d, OSU16d, OSU19d, OSU20d, OSU21d, and OSU22d) and two trinucleotide microsatellite markers (OSU12t and OSU19t). The remaining eight pairs of PCR primers either did not produce scorable PCR products, or seemed to amplify multiple targets (loci), or did not reveal polymorphism in at least five individuals of A. alluaudi.

Cross-species amplifications

To test the conservation of the A. alluaudi microsatellite PCR primer sequences in other African cichlid species, we used nine pairs of PCR primers (OSUxxx in Table 2.1) to amplify genomic DNA from the 10 other cichlid species (Table 2.2) without changing the PCR conditions used in A. alluaudi. All primers successfully amplified the A. alluaudi congener, A. straeleni. Seven pairs of primers successfully amplified the presumptive homologous loci in the seven other LVR haplochromine species tested, and gave rise to typical microsatellite “shadow” bands. The remaining two pairs of primers, OSU13d and OSU22d, also produced amplification products in all the other haplochromine species tested. However, they did not produce consistent results, since they gave rise to multiple PCR products in a significant number of individuals. In the more distantly related tilapiine cichlids, only five pairs of primers (OSU09d, 12t, 16d, 20d, and 21d) produced clear scorable PCR amplification products with “shadow” bands (Table 2.5).

All seven successfully amplified markers are polymorphic in each of the seven Lake Victoria haplochromine cichlid species. In sharp contrast to both allozyme markers and mtDNA sequences, these microsatellite markers detected substantial amounts of genetic variation within each species, with the average number of observed alleles ranging from 7.9 to 17.4 per locus, and the average expected heterozygosity (Nei, 1987) estimated to be from 0.61 to 0.67 (Table 2.6).

DISCUSSION

Abundance of (GT)n motifs in the A. alluaudi genome

Various library screening studies have made it clear that dinucleotide microsatellite sequences are abundant within vertebrate genomes, including several fish genomes (reviewed in O'Reilly & Wright, 1995). In the A. alluaudi library presented here, about 1.8% (158/8,910) of the recombinant plaques screened were putative positives for dinucleotide microsatellites. This percentage is similar to that reported in library screening studies for other vertebrate genomes, including the Atlantic salmon (Salmo salar) genome (McConnell et al., 1995). Assuming that 95.8% (23/24) of the 158 putative positive clones did indeed contain dinucleotide repeat motifs (i.e., that 1.7% of the total plaques screened contained dinucleotide repeats), and that 95.7% (22/23) of the real positive clones recovered from the library contained (GT)n motifs, we estimated that about 1.6% ((158 × 22 × 23)/(8,910 × 23 × 24)) of the recombinant plaques in the library harbored (GT)n motifs. Further assuming that the number of dinucleotide repeat motifs recovered from the library is representative of occurrence frequencies in the entire A. alluaudi genome, and that the average insert size in the library is 400 bp, we estimated that (GT)n motifs occur at an average interval of 24 kb in the A. alluaudi genome. This frequency compares favorably with the estimated occurrence frequencies of (GT)n motifs in the genomes of brown trout (Salmo trutta), Atlantic salmon, and a Lake Malawi cichlid (Pseudotropheus zebra), but is slightly lower than those in the genomes of zebrafish (Brachydanio rerio), Atlantic cod (Gadus morhua), and bluegill sunfish (Lepomis macrochirus) (Table 2.7).
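The chain of estimates in this paragraph can be retraced step by step. The sketch below uses only the counts quoted in the text; the variable names are illustrative, and the interval is computed simply as the mean insert size divided by the fraction of plaques carrying a (GT)n motif.

```python
# Counts quoted in the text for the dinucleotide screen.
plaques_screened = 8910
putative_positives = 158
sequenced, confirmed = 24, 23   # sequenced clones, and those with a dinucleotide repeat
gt_clones = 22                  # confirmed clones carrying a (GT)n motif
mean_insert_bp = 400            # assumed average insert size

# Fraction of plaques that were putative positives (about 1.8%).
frac_putative = putative_positives / plaques_screened
# Corrected for the one sequenced clone without a repeat (about 1.7%).
frac_dinuc = frac_putative * confirmed / sequenced
# Restricted to (GT)n motifs (about 1.6%).
frac_gt = frac_dinuc * gt_clones / confirmed
# One (GT)n motif per this many kb of cloned DNA.
interval_kb = mean_insert_bp / frac_gt / 1000

print(f"{frac_putative:.2%} {frac_dinuc:.2%} {frac_gt:.2%} {interval_kb:.1f} kb")
```

Carried through without intermediate rounding, the interval comes out between 24 and 25 kb, in line with the figure of about 24 kb given above.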

Unlike dinucleotide microsatellites, trinucleotide microsatellites have only been extensively screened in a few genomes, primarily in mammalian genomes (e.g., Gastier et al., 1995) and invertebrate genomes (e.g., Strassmann et al., 1996). A comprehensive survey of the human genome indicates that trinucleotide repeat sequences are one to two orders of magnitude less frequent than (GT)n repeats, and that the average size is less than 15 repeat units (Gastier et al., 1995). So far, teleost genomes have not been comprehensively screened for trinucleotide repeats (but see Lee & Kocher, 1996; Chenuil et al., 1997). In the present study, we found that about 0.5% ((127 × 19/21)/22,000) of the recombinant plaques screened contained one of the two trinucleotide repeat motifs. Therefore, in the A. alluaudi genome, trinucleotide repeat sequences are about seven times (1.7/(0.5/2)) less frequent than the (GT)n repeat sequences. Most of the trinucleotide repeat motifs recovered from the library screening were around five repeat units, and none of the 19 trinucleotide repeat sequences contained more than 10 repeat units. These observations suggest that recovering longer trinucleotide repeats may require a higher screening stringency.
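The same arithmetic can be retraced for the trinucleotide screen. The sketch assumes, as the counts in the Results imply, that the 0.5% figure is the 127 putative positives scaled by the confirmed fraction (19/21) and divided by the 22,000 clones screened, and that the division by two reflects the two trinucleotide probes used. Variable names are illustrative.

```python
# Counts quoted in the text for the trinucleotide screen.
tri_clones_screened = 22000
tri_putative = 127
tri_sequenced, tri_confirmed = 21, 19
n_tri_probes = 2                 # two trinucleotide probes were mixed
dinuc_percent = 1.7              # dinucleotide figure quoted in the text

# Percentage of screened clones carrying a trinucleotide repeat.
tri_percent = 100 * tri_putative * tri_confirmed / tri_sequenced / tri_clones_screened
# Per-probe comparison with the dinucleotide figure, using the rounded 0.5%.
ratio = dinuc_percent / (round(tri_percent, 1) / n_tri_probes)

print(f"{tri_percent:.2f}% of clones; ratio = {ratio:.1f}")
```

The ratio of 6.8 is the "about seven times" quoted in the text.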

Do teleost genomes contain more lengthy dinucleotide repeat motifs than mammalian genomes?

While the number of dinucleotide repeat units in the microsatellite-containing clones from mammalian libraries rarely exceeds 30 (Weber, 1990), such long repeat sequences are found frequently in teleost genomes. For example, the proportion of dinucleotide microsatellite sequences with at least 30 repeat units is 56.3% in Atlantic cod (Brooker et al., 1994), at least 32.0% in rainbow trout (Oncorhynchus mykiss) (Morris et al., 1996), 23.1-26.2% in Atlantic salmon (McConnell et al., 1995), and 8.3-13.3% in bluegill sunfish (Colbourne et al., 1996). Brooker et al. (1994) speculated that the longer microsatellite repeats found in teleost fishes were caused by more frequent DNA polymerase slippage during replication due to temperature fluctuations. If this were the case, one would expect shorter dinucleotide repeats in those teleost species (such as tilapia and Eastern African cichlids) that live in environments with less temperature fluctuation than those of salmonids. Data from a recent extensive library screening study in tilapia by Lee & Kocher (1996) seem consistent with this hypothesis. Among the 133 dinucleotide repeat sequences that they recovered, only 8.3% had at least 30 uninterrupted repeat units, and none had 40 or more uninterrupted repeat units. In the present study, we estimated that 30.4% of the dinucleotide microsatellite clones isolated from A. alluaudi contained at least 30 continuous repeat units, and 21.7% contained at least 40 continuous repeat units. This is not surprising, however, considering that A. alluaudi, because of its high tolerance to anoxia (Chapman et al., 1995), is able to exploit marginal environments that are subject to severe temperature fluctuations. It is worth noting that since different library screening techniques and/or different library screening stringencies have been employed in different laboratories, some screening techniques or conditions may favor the recovery of shorter repeat motifs. Therefore, a more comprehensive library screening with identical screening conditions is necessary before generalizations can be made about the profile of dinucleotide repeats in teleost genomes.

Potential of cross-species applications of A. alluaudi microsatellite markers in cichlid phylogenetic studies

One of the biggest advantages of microsatellite markers in phylogenetic studies is that sequences flanking the repeat motifs are generally conserved among closely related species. This feature is of great importance to cross-species applications such as cichlid population genetic studies and phylogenetic analyses, considering the expensive and time-consuming process of developing microsatellite markers for a particular species. Several previous studies have shown that microsatellite primers designed for one teleost species (source species) could amplify genomic DNA from other closely related species (target species) (McConnell et al., 1995; Morris et al., 1996; Rico et al., 1996; Scribner et al., 1996).

In the present study, I have found that seven of the nine pairs of microsatellite primers developed in A. alluaudi amplified homologous loci in seven other LVR haplochromines. Data presented here have also indicated very high genetic variability within each of the taxa. With an average sample size of 28.4 individuals, an average of 12.7 alleles were observed per locus per species, and the average expected heterozygosity was 0.64 (Table 2.2). In A. alluaudi, the average number of alleles observed at the seven loci in 128 individuals is 18.3 (Wu et al., manuscript submitted). Taking the sample size differences into account, A. alluaudi and the other seven LVR haplochromines exhibit comparable levels of polymorphism at the seven microsatellite loci (also see Chapter 3). This finding contrasts with a study by FitzSimmons et al. (1995), who reported that at polymorphic loci, non-source (target) turtle species have fewer alleles and lower heterozygosity than the source turtle species in which the microsatellite primers were developed. The discrepancy between the study of FitzSimmons et al. (1995) and the present study may be explained by the fact that the cichlid species used in the present study are much more closely related to one another than the turtle species that were compared. The finding that most of the microsatellite markers developed from A. alluaudi revealed an extremely high level of variation in each of the other seven LVR haplochromine species is very encouraging, because the genetic markers may now be sufficiently variable to allow us to infer population structure and to obtain a robust Lake Victoria cichlid phylogeny.

Finally, it is also interesting to note that the two markers (OSU13d and OSU22d) that harbor long repeat motifs failed to consistently amplify homologous loci in the seven other haplochromines, suggesting that the primer sequences at these two highly polymorphic loci are variable among cichlid species. Glenn et al. (1996) have shown that at alligator microsatellite loci, allelic diversity is negatively correlated with the evolutionary conservation of sequences flanking the repeat motifs. This information offers some guidance for choosing appropriate microsatellite markers in future studies of cichlid phylogeny.

Locus     Core sequence                   Primer sequence (5' - 3')        Tm     Cycles
OSU09d    (TC)3(TG)20(CGT)14              F: CCTCTGTAGTGATGTTTAATCTCTGT    60°C   28
                                          R: TGACACTGCACTTACTTGGCT
OSU12t    (NGC)13                         F: TCAAACACCCACAGCCTTCA          60°C   22
                                          R: CGGTGATTGCTGTTGATACTGA
OSU13d    (GT)25                          F: TAAGCTGATAGGAACCCAAC          58°C   30
                                          R: ACTCCTATTTTGTTATTTTTGTGA
OSU16d    (GT)10                          F: GGCGAATGGTGGGTCAAG            58°C   32
                                          R: ATGTTGCTTGCCGCTGC
OSU19d    (GT)47                          F: CAGTGCTTTGGTGGTGCT            55°C   30
                                          R: CATGACGTCTTTCAATAAGGAT
OSU19dN                                   F: GATCACTTGTCAGTGCTTTG          55°C   30
                                          R: GATAGAACACTTAGAGTGCAGG
OSU19t    (CA)11(ANC)12                   F: TGAAGGACAAAGCAGGACTG          60°C   28
                                          R: TGCCCGAACCTTTTTATTTA
OSU20d    (GT)47                          F: GAAGTGGGATTTGCAGCTTG          60°C   30
                                          R: CATGCTTACAAAGAACAGGGTTAC
OSU20dN                                   F: ACACCTGGGTGAGACTGGC           58°C   31
                                          R: TTAGAGCGTGTCACACAGCAT
OSU21d    (GT)9(GC)(GT)4                  F: GCCGCTCAGAGTTTGGTG            60°C   22
                                          R: AGGCATGTGTCAGTTCATCCT
OSU22d    (GT)41                          F: TGAAATCAAATACTAGAGCAAATA      55°C   32
                                          R: GGAGTTTAAAAATGATGCGT
OSU22dN                                   F: GATCACTTTTTCCCTTTTA           50°C   30
                                          R: ATTCATACAACTACTGGCAC
OSU02d    (TG)13(TT)(TG)5                 F: TCTAGAGTCCAAAGCAGGTG          50°C   30
                                          R: ACAGCCTGTGGAAAAATATC
OSU04d    (TG)2(TA)(TG)20                 F: ACTGAATTTGTTGTGTAGC           50°C   35
                                          R: TATTATTCTTATAACATTTATACC
OSU14d    (TG)41                          F: TGTGCAGTCGTTATGTTATC          60°C   30
                                          R: CACTATCCAAGTTTGAACTCTAAG
OSU17d    (AT)5(GT)30                     F: ACATCATTTCTTGGTAACA           50°C   35
                                          R: TTCAGAGAGATAGAGATATATTT
OSU18d    (GT)9                           F: TATGTGGGTGTTATGAAAGCA         52°C   30
                                          R: GGGATCTACCACCTTGTGAC
OSU14t    (CAG)5(CCG)(CAG)4(CAA)(CAG)3    F: GAGCCCCGCAGTGTCGC             64°C   30
                                          R: CCTCCAGCTCCACCTCCAGA
OSU24t1   (TCC)10                         F: TCACAATCGCCGGCGGT             64°C   30
                                          R: CGGAGCCGACCACGCAG
OSU24t2   (CGG)2(CAG)4(TGG)(CAG)7         F: CCGTTGTTGCGACCAGCC            64°C   30
                                          R: CGTCCTCCTCCAAGTTCTCCTG

Table 2.1: Microsatellite core sequences, primer sequences and PCR conditions-

PCR conditions are defined by PCR cycle numbers and annealing temperature (Tm).

Species                             Sample size (N)   Collection site

Astatoreochromis straeleni          1                 Lake Tanganyika*
Astatotilapia velifer               41                Lake Nabugabo
Astatotilapia velifer               29                Lake Kayugi
Haplochromis “ruby”                 19                Lake Nawampasa
Oreochromis esculentus              1                 Lake Kyoga
Oreochromis niloticus               1                 Lake Kachira
Paralabidochromis “black para”      23                Lake Kyoga
Paralabidochromis “rock kribensis”  31                Lake Victoria
Paralabidochromis sp.               16                Lake Victoria
Yssichromis fusiformis              29                Lake Victoria
Yssichromis laparogramma            10                Lake Victoria

Table 2.2: Species, sample sizes, and collection sites for cross-species amplifications.

*A. straeleni is found only in Lake Tanganyika.

Clone   Locus     Repeat motif

LW1     OSU01d    (GT),
LW2     OSU02d    (TG),3(TT)(TG)5
LW3     OSU03d    (CT):,(AC)^
LW4     OSU04d    (TG),(TA)(TG)3 o
LW5     OSU05d    (GT)^o
LW6     OSU06d    (GT),,
LW7     OSU07d    (GT),g
LW9     OSU09d    (TC)3(TG),,(CGT),^
LW10    OSU10d    (TA) p(N),2(GT)3,
LW11    OSU11d    (GT)3(AT)(GT),3
LW12    OSU12d    (GT)a
LW13    OSU13d    (GT)^
LW14    OSU14d    (TGI,
LW15    OSU15d    (GT),
LW16    OSU16d    (GT),.
LW17    OSU17d    (AT)5(GT)3o
LW18    OSU18d    (GT),
LW19    OSU19d    (GT),,
LW20    OSU20d    (GT),,
LW21    OSU21d    (GT),(GC)(GT)^
LW22    OSU22d    (GT)„
LW23    OSU23d    (CT),o(GT),3
LW24    OSU24d    (GT)33

Table 2.3: Repeat motifs for the 23 positive dinucleotide microsatellite clones.

Clone   Locus     Repeat motif

AA3     OSU03t    (CAT)(CAA)(CCA)zG(CCA)
AA4     OSU04t    (ACC),
AA5     OSU05t    (CAT);
AA7     OSU07t    (CA)«?
AA8     OSU08t    (CCC)(ACC)s
AA9     OSU09t    (CCA);(CCC)
AA10    OSU10t    (GCA),(GTA)
AA11    OSU11t    (ACC).
AA12    OSU12t    (AGC);(CGC)(AGC),(TGC)(AGC)(CGC)(AGC)
        OSU12t2   (CAG),(CAA)(CAG)2(CAA)2(CAG)z(CAA)(CAG)
AA13    OSU13t    (CCT)(CCA)5
AA14    OSU14t    (CAG);(CCG)(CAG)4(CAA)(CAG),
AA15    OSU15t    (CGT)(CAT),(CCT)(CCA)(CCT)(CCA)6
AA16    OSU16t    (CAG);
AA17    OSU17t    (CCA);
AA19    OSU19t    (CA),,(ACC),(AAC),
AA21    OSU21t    (ACC);
AA22    OSU22t    (TCC),T(TCC),
AA23    OSU23t    (AAC).
AA24    OSU24t1   (TCC),o
        OSU24t2   (CGG),(CAG)4(TGG)(CAG)7
AA26    OSU26t    (ACC)4(N),(CA)„

Table 2.4: Repeat motifs for the 20 positive trinucleotide microsatellite clones. *N: either G, or A, or T, or C.

Locus     A. straeleni   Tilapia species   Haplochromine cichlids

OSU09d    +              +                 +
OSU12t    +              +                 +
OSU13d    +              +/-               +/-
OSU16d    +              +                 +
OSU19d    +              +/-               +
OSU19t    +              +/-               +
OSU20d    +              +                 +
OSU21d    +              +                 +
OSU22d    +              +/-               +/-

Table 2.5: Cross-species amplification using nine pairs of A. alluaudi microsatellite PCR primers. +: clear PCR products were obtained; +/-: unclear or unscorable PCR products were obtained.

Species                     OSU09d  OSU12t  OSU16d  OSU19d  OSU19t  OSU20d  OSU21d  Average

A. velifer             n    7       4       27      28      25      28      3       17.4
                       He   0.37    0.38    0.91    0.95    0.93    0.93    0.23    0.67
H. “ruby”              n    6       2       26      16      21      22      2       13.6
                       He   0.34    0.10    0.95    0.92    0.91    0.94    0.33    0.64
P. “rock kribensis”    n    3       2       20      16      24      22      2       12.7
                       He   0.47    0.12    0.87    0.88    0.87    0.92    0.41    0.65
P. “black para”        n    2       3       23      21      19      19      2       12.6
                       He   0.09    0.20    0.94    0.93    0.87    0.90    0.16    0.58
P. sp.                 n    4       2       16      14      16      16      2       10.0
                       He   0.33    0.22    0.92    0.91    0.90    0.90    0.30    0.64
Y. laparogramma        n    4       2       12      10      12      13      0       7.9
                       He   0.57    0.48    0.89    0.87    0.90    0.91    0.00    0.67
Y. fusiformis          n    6       2       23      20      26      23      2       14.7
                       He   0.32    0.21    0.94    0.93    0.88    0.94    0.03    0.61
Average                n    4.6     2.4     21.0    17.9    20.4    20.4    2.0     12.7
                       He   0.36    0.23    0.92    0.91    0.90    0.92    0.22    0.64

Table 2.6: Genetic variability of microsatellites in LVR cichlid species. n: number of observed alleles; He: expected heterozygosity.

Genome                                    Frequency (kbp^-1)   Reference

A. alluaudi                               24                   The present study
Atlantic cod (Gadus morhua)               7                    Brooker et al., 1994
Atlantic salmon (Salmo salar)             24-35                McConnell et al., 1995
Bluegill sunfish (Lepomis macrochirus)    14                   Colbourne et al., 1996
Brown trout (Salmo trutta)                23                   Estoup et al., 1993
Malawian cichlid (Pseudotropheus zebra)   35                   Van Oppen et al., 1997
Zebrafish (Brachydanio rerio)             12                   Goff et al., 1992

Table 2.7: Estimated occurrence frequency of (GT)n repeat motifs in various teleost genomes.

CHAPTER 3

Microsatellites Reveal Regional Differentiation in the Lake Victoria Cichlid Fish, Astatoreochromis alluaudi

INTRODUCTION

The cichlid species flocks of the three Eastern African Great Lakes, Lake Victoria, Lake Malawi, and Lake Tanganyika, represent a unique example of explosive speciation and reiterative adaptive radiation in vertebrates (Fryer & Iles, 1972). Each of the regional lacustrine faunas includes several hundred cichlid species (nearly all endemic) and a smaller number of coexisting non-cichlid fishes (Greenwood, 1991; Pitcher & Hart, 1995). For example, the haplochromine fauna of Lake Victoria, the youngest of the three great lakes, consists of an estimated 400-600 haplochromine cichlid species, more than 99% of which are endemic (Greenwood, 1991; Kaufman, 1997; Pitcher & Hart, 1995). In contrast, there are only about 40 non-cichlid fish species in this lake, and only 42% are endemic (Pitcher & Hart, 1995). The great lakes cichlid faunas also stand in contrast to those of nearby river systems (e.g., Nile River and Niger River), in which many fewer cichlid species are present and most of them are not endemic (Greenwood, 1991; Kaufman et al., 1997).

The status of most Lake Victoria Region (LVR) haplochromine cichlids as distinct species is clear by any of the measures currently used by evolutionary biologists. They differ in appearance, morphology, and microhabitat, and mate assortatively in nature (Seehausen et al., 1997). Nonetheless, few morphological characters are consistently informative of intrafamilial relationships, and some of the potentially most useful characters are confounded by phenotypic plasticity (Stiassny, 1991). Moreover, conventional genetic markers have revealed low levels of polymorphism or very little divergence within and among Eastern African cichlids. Such a lack of genetic divergence is especially noteworthy among the LVR haplochromines. Previous studies using allozyme markers (Sage et al., 1984), nuclear DNA markers (Mayer et al., 1998), and the normally highly variable mtDNA sequences (Meyer et al., 1990) all showed very low levels of divergence among Lake Victoria haplochromine cichlid species. The results from the mtDNA study are especially surprising considering that mtDNA sequences are usually more variable than single-copy nuclear genes and have been successfully used to infer population structure and relationships in many studies of closely related fish species (Stepien & Kocher, 1997).

New genetic markers with different characteristics, especially DNA microsatellites, are now available to reconsider the situation of the LVR cichlids. Microsatellite loci diverge more rapidly than mitochondrial or nuclear DNA sequences. Here, I compare the patterns of variability and divergence obtained from a set of microsatellite markers developed from the LVR haplochromine species Astatoreochromis alluaudi (Wu et al., in press) with results obtained from mtDNA sequences. A. alluaudi is one of the most widespread haplochromine species, distributed in the Lake Victoria basin of Kenya, Uganda and Tanzania, as well as being found in Lakes Edward, Nakachira and Kakivali of Uganda. It has been used as a model system for the study of phenotypic plasticity, but belongs to a genus of haplochromine cichlids that contains only two other described species throughout East Africa. Because of its widespread distribution in the LVR, we hypothesized that the mtDNA analysis might have underestimated genetic diversity due to the small sample size examined in the previous study (Meyer et al., 1990). Examining a larger number of individuals from multiple geographical locations would allow us to test whether low estimates of mtDNA variation in the LVR cichlids were due in part to small sample sizes or, alternatively, were a characteristic deriving from the probable recent divergence of members of the fauna (Kaufman, 1997; Johnson et al., 1996; Meyer et al., 1990). Further, the results from mtDNA sequence analysis can then be contrasted with the diversity and phylogenetic patterns obtained when DNA microsatellite loci are studied in the same populations. Finally, I examine whether microsatellite markers can differentiate other species of the haplochromine flock of Lake Victoria.

MATERIALS AND METHODS

Biological material and DNA extraction

Detailed information about all the specimens used in this study is compiled in Appendix A. A total of 128 individuals of A. alluaudi were collected from six localities in the LVR (Figure 3.1): Jinja Pier (sample size N = 13), Napoleon Gulf (N = 18), Lake Kachira (N = 22), Lake Kyoga (N = 22), Lake Nawampasa (N = 23), and Lake Kabeleka (N = 30). Among the six sampling locations, Jinja Pier and Napoleon Gulf are both within Lake Victoria and are only several kilometers apart. They are located on the eastern side and the western side of the Victoria Nile entry, respectively. Lake Nawampasa is a small lake that is geographically isolated from the greater Lake Kyoga by slightly elevated, dike-like ground. During high flooding seasons, however, water connections could exist between Lake Kyoga and Lake Nawampasa in either direction. Lake Victoria and Lake Kyoga are connected by a large river, the Victoria Nile. The water connection is primarily unidirectional, from Lake Victoria to Lake Kyoga. Since 1954, when a dam was built on the Victoria Nile, this connection between the two major lake systems has been blocked. Lake Kabeleka is also a small lake that used to be part of the greater Lake Edward/George system, but is currently geographically semi-isolated from the latter, even though water connections between them still exist via numerous swamps, especially during the high flooding seasons. The water connection between the Lake Edward/George system and Lake Victoria is bidirectional through the Katonga River. However, the water level in the Katonga River fluctuates dramatically between seasons. Lake Kachira belongs to the Kooki Lake system and is isolated from all other major lakes, including Lake Victoria. Nevertheless, during some high flooding seasons, limited water connection could exist between Lake Kachira and Lake Victoria through numerous swamps and the Kagera River.

Four additional LVR cichlid species were also included in the phylogenetic analysis. They are Astatotilapia velifer from Lake Nabugabo (N = 41) and from Lake Kayugi (N = 29), Paralabidochromis “rock kribensis” from Lake Victoria (N = 31), Paralabidochromis sp. from Lake Victoria (N = 16), and Yssichromis fusiformis from Lake Victoria (N = 29). All specimens were collected from Kenyan and Ugandan waters in the northern and western portions of the LVR between 1992 and 1996. Sample collection procedures and genomic DNA extraction protocols were as described in Chapter 1.

Amplification and sequencing of mtDNA

Amplifications of mtDNA were performed using primers H14724F (Song, 1994) and tPhenR (5’-CTAGGGCCCATCTTAACATCTTCAT-3’, designed by B. Porter). PCR was carried out in 50 µl of a mixture containing 300-900 ng of DNA template, 25 pmol of each primer, 12.5 pmol of each deoxyribonucleoside triphosphate (dNTP), 125 pmol of MgCl2, and 1 unit of Taq DNA polymerase (GIBCO BRL). Amplification was performed in a Perkin Elmer Thermocycler with 35 cycles of 45 sec at 95°C, 1 min at 52-56°C, and 2 min at 74°C. This main PCR profile was preceded by a long denaturation step at 95°C for 5 min and followed by a long extension step at 74°C for 6 min. Individual PCR products were purified with a QIAquick PCR Purification Kit (QIAGEN, Inc.), and 5-10 µl of the purified PCR products were bi-directionally sequenced with a BRL cycle sequencing kit using two internal primers: tThr (5’-AGAGCGCCGGTCTTGTAATCC-3’, designed by B. Porter) and H16498 (Meyer et al., 1990).

Microsatellite genotyping

Nine microsatellite markers were used in the present study: OSU09d, OSU12t, OSU13d, OSU16d, OSU19d, OSU19t, OSU20d, OSU21d, and OSU22d. Primer sequences as well as PCR conditions are described in Chapter 2. Primers designated OSUxxxN were only used to analyze individuals of A. alluaudi that were initially scored as homozygous when OSUxxx primers were used, to rule out possible “null” alleles. Individual PCR products were resolved in 6% or 8% polyacrylamide sequencing gels with 7 M urea, with sizes being determined by running pUC18 plasmid control sequences in adjacent lanes.

Data analyses

Nucleotide diversity of mtDNA, or π (Nei, 1987), was calculated using the computer program DnaSP (Rozas & Rozas, 1997), version 2.2. Intra-population polymorphism of microsatellite markers was measured by the number of observed alleles, observed heterozygosity, and unbiased gene diversity (Nei, 1987). Genotypic distributions were tested for Hardy-Weinberg equilibrium using the computer software GENEPOP (version 3.1) (Raymond & Rousset, 1995) with a 1,000,000-step (1,000 batches of 1,000 iterations) Markov chain for the exact probability test, and a 300,000-step (300 batches of 1,000 iterations) Markov chain for the global tests across loci or across populations for heterozygote deficiency (Rousset & Raymond, 1995).
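The three intra-population measures named above (number of alleles, observed heterozygosity, and unbiased gene diversity) can be illustrated with a minimal Python sketch. This is not the software used in the study; the function name locus_diversity and the genotype layout (one pair of allele sizes per diploid individual) are assumptions made for illustration. Gene diversity is computed as 2n/(2n - 1) times one minus the sum of squared allele frequencies, the usual small-sample form attributed to Nei (1987).

```python
from collections import Counter


def locus_diversity(genotypes):
    """Summarise one locus in one population sample.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid
    individual (hypothetical input layout).
    Returns (number of alleles, observed heterozygosity,
    unbiased gene diversity).
    """
    n = len(genotypes)
    if n < 1:
        raise ValueError("at least one genotype is required")
    # Count every gene copy: two per individual.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    # Observed heterozygosity: fraction of individuals with two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Small-sample correction 2n / (2n - 1) applied to 1 - sum(p_i^2).
    h_exp = (total / (total - 1)) * (1.0 - sum_sq)
    return len(counts), h_obs, h_exp
```

Calling the function once per locus and per population, then averaging over loci, gives summaries of the same kind as the per-population means reported in this chapter.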

Inter-population differentiation of microsatellite markers was evaluated by two genetic distance measures: Nei's (1972) standard genetic distance (DS) and the allele sharing distance (DAS) (Bowcock et al., 1994). Genetic distances were calculated using the computer program Microsat (version 1.5d) developed by E. Minch (http://lotka.stanford.edu/microsat.html). Phenograms based on each of the two distance measures were constructed using either the unweighted pair group method with arithmetic mean (UPGMA; Sneath & Sokal, 1973) or the neighbor-joining method (NJ; Saitou & Nei, 1987) implemented in the computer program MEGA (Molecular Evolutionary Genetic Analysis; Kumar et al., 1993).
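For readers unfamiliar with Nei's (1972) standard genetic distance, the following Python sketch shows the textbook form of the calculation, D = -ln[Jxy / sqrt(Jx Jy)], where Jx and Jy are the sums over loci of squared allele frequencies in each population and Jxy is the sum of cross products. It is an illustration only and does not reproduce Microsat; the function name and the dictionary-based input layout are assumptions, and no sample-size correction is applied.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's (1972) standard genetic distance between two populations.

    pop_x, pop_y: lists with one entry per locus; each entry is a
    dict mapping allele -> frequency (hypothetical input layout).
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("both populations need the same, non-empty set of loci")
    jx = jy = jxy = 0.0
    for freq_x, freq_y in zip(pop_x, pop_y):
        jx += sum(p * p for p in freq_x.values())
        jy += sum(q * q for q in freq_y.values())
        # Only alleles present in both populations contribute to Jxy.
        jxy += sum(p * freq_y.get(allele, 0.0) for allele, p in freq_x.items())
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))
```

A full matrix of pairwise distances built this way is the kind of input that UPGMA or neighbor-joining clustering takes.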

Population subdivision was estimated using Wright’s F statistics (FST, theta estimate; Weir & Cockerham, 1984). Pairwise values of FST were used to estimate the average number of migrants per generation (Nm) between populations. Calculations were performed using the computer software FSTAT (Goudet, 1995), version 1.2. Under a finite island model, Nm = [(1/FST) - 1]/4. Statistical significance of FST values was tested using FSTAT with 1,000 permutations. For comparison, the FST analogue for microsatellite markers, Slatkin’s (1995) RST, as well as the Nm analogue MR (Slatkin, 1995), were also calculated using Microsat (version 1.5). Here MR = [(1/RST) - 1]/4.
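The conversion from FST to the number of migrants per generation described above is simple enough to sketch directly. The Python fragment below is an illustration only, not the FSTAT or Microsat code; the function name migrants_per_generation is an assumption introduced here. The same expression applies to RST when computing MR.

```python
def migrants_per_generation(fst):
    """Estimate Nm from FST (or MR from RST) as (1/FST - 1) / 4.

    The estimate is undefined for values at or below zero, so such
    inputs (for example a small negative RST) are rejected.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("expected a value strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0
```

For the overall FST of 0.049 reported in this chapter the function returns about 4.85. The estimate rises steeply as FST approaches zero, so rounded FST values reproduce tabulated Nm values only approximately.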

To test for Slatkin’s (1993) isolation by distance, we used the computer package GENEPOP (Raymond & Rousset, 1995), version 3.1a, to perform Mantel’s tests for the independence between geographic distances and FST or RST with 1,000 permutations. Two sets of geographical distances, determined by Dr. L. Kaufman at Boston University, were used for these tests (Table 3.1). The “straight-line geographical distances” were the shortest direct distances among the sampling sites, while the “realistic migrational distances” were based on probable migration routes, assessed from topographic maps and field reconnaissance with a precision altimeter.
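The logic of a Mantel permutation test can be sketched in a few lines of Python. This is a generic illustration using a Pearson correlation between the two distance matrices and a one-sided p-value; it is not GENEPOP's implementation, whose statistic and options may differ. The function names, the list-of-lists matrix layout, and the fixed random seed are assumptions made for the example.

```python
import math
import random


def _upper(matrix, order):
    """Upper-triangle entries after relabelling rows/columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(genetic, geographic, permutations=1000, seed=0):
    """Return (observed correlation, one-sided permutation p-value).

    genetic, geographic: symmetric square matrices (lists of lists)
    over the same populations in the same order.
    """
    n = len(genetic)
    identity = list(range(n))
    geo = _upper(geographic, identity)
    r_obs = _pearson(_upper(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel populations in one matrix only
        if _pearson(_upper(genetic, order), geo) >= r_obs - 1e-12:
            hits += 1
    # Count the observed arrangement itself among the permutations.
    return r_obs, (hits + 1) / (permutations + 1)
```

With only six populations there are 720 possible relabellings, so 1,000 random permutations sample that set with replacement; the p-value is therefore coarse.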

RESULTS

Variation of mtDNA

Sequences were determined for a 432 base pair segment of mtDNA, including the entire proline tRNA gene and the first part of the hypervariable control region, from 35 A. alluaudi individuals (Table 3.2). Consistent with the data of Meyer et al. (1990), very little variation was observed in the mtDNA region that we sequenced. Only 10 variable sites were identified (Figure 3.2a), none of which is occupied by more than two alternatives, and only four of which are phylogenetically informative. Among the 10 variable sites, nine involve nucleotide substitutions and one involves an insertion/deletion event. All nine substitutions are transitions (Figure 3.2a). Among the 35 individuals, only 10 different haplotypes were identified. The single Lake Victoria A. alluaudi sequence reported in Meyer et al. (1990) is identical to haplotype 3 except at three positions where we obtained nucleotides different from those reported in Meyer et al. (1990) (i.e., A vs. G at position 46, T vs. C at position 420, and T or a deletion vs. A at position 439). At these three positions, no substitution variation (apart from a single deletion at position 439) was found among the 35 individuals that we sequenced. The complete sequences of the 10 haplotypes, plus one A. alluaudi sequence and one Astatotilapia burtoni sequence from Meyer et al. (1990), are compiled in Appendix B.

Nucleotide diversity (Nei, 1987) was low. Among the four population samples with multiple individuals, the Jinja Pier sample from the largest lake, Lake Victoria, has the highest nucleotide diversity (0.0056) (with six haplotypes identified among seven individuals), followed by the Lake Kyoga population sample (0.0036). The two population samples from smaller lakes (Kachira and Kabeleka) have relatively low nucleotide diversity (0.0014 and 0.0000, respectively). Figure 3.2b shows a minimum mutation network for the 10 mtDNA haplotypes. Because adenine was found at position 124 in all other haplochromine cichlids analyzed previously (Meyer et al., 1990), haplotype 1 is more likely to be the ancestral haplotype within A. alluaudi. It also occurs in more localities than other haplotypes.
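Nucleotide diversity values of the kind quoted above are the average number of pairwise differences per site among sampled sequences. The Python sketch below illustrates that definition; it is not DnaSP, and the function name, the use of "-" as the gap symbol, and the decision to skip every alignment column that contains a gap are assumptions made for this example.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site among aligned sequences.

    sequences: list of equal-length strings, one per sampled individual.
    Alignment columns containing a gap ('-') in any sequence are skipped
    (an assumption of this sketch).
    """
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    columns = [i for i in range(length) if all(s[i] != "-" for s in sequences)]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    pairs = list(combinations(sequences, 2))
    differences = sum(
        sum(1 for i in columns if a[i] != b[i]) for a, b in pairs
    )
    return differences / (len(pairs) * len(columns))
```

A sample in which every individual carries the same haplotype gives exactly zero, as reported for Lake Kabeleka.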

To test whether mtDNA sequence information can give us any insight about the relationships of the six populations, a UPGMA phenogram was constructed based on the 10 haplotypes (Figure 3.3). No significant relationship existed between either the frequency or the occurrence of mtDNA haplotypes and the geographical source of samples, both because individuals from different sampling locations can share the same haplotype (e.g., haplotype 1 is shared by individuals from Lakes Victoria, Kyoga and Kachira), and because individuals from the same sampling location can represent multiple haplotypes that did not form a single cluster (e.g., individuals from Lake Victoria represent seven of the 10 haplotypes).

Intra-population microsatellite variability

All nine microsatellite markers were polymorphic in A. alluaudi, characterized by multiple alleles and high levels of observed heterozygosity (Table 3.3). Complete allele frequency data for the six populations at the nine microsatellite loci are compiled in Appendix C. A total of 199 alleles were observed in 128 A. alluaudi individuals. All of the populations analyzed are highly polymorphic for six markers: OSU09d, OSU13d, OSU19d, OSU19t, OSU20d, and OSU22d. At these loci, the total number of alleles observed per locus ranged from 19 to 43, the average observed heterozygosity at individual loci ranged from 0.86 to 0.96, and the average gene diversity (Nei, 1987) at individual loci ranged from 0.83 to 0.96. The frequencies of the most common alleles at these six loci were low, usually less than 0.20. The remaining three markers, OSU12t, OSU16d, and OSU21d, were less variable, with the total number of alleles ranging from three to five, the average observed heterozygosity from 0.02 to 0.17, and the average gene diversity from 0.02 to 0.16 at each locus. For each of the six populations, the mean number of alleles per locus ranged from 7.9 (Kachira) to 13.7 (Kyoga), and the average observed heterozygosity from 0.50 (Kachira) to 0.67 (Jinja Pier & Kabeleka). Over all populations, for each of the nine microsatellite markers, the mean number of alleles observed ranged from 1.3 (OSU12t) to 21.2 (OSU22d), and the average observed heterozygosity ranged from 0.02 (OSU12t) to 0.96 (OSU22d).

Exact probability tests on each of the six populations revealed no significant departure from Hardy-Weinberg expectations for the three less variable markers (OSU12t, OSU16d, and OSU21d) and three of the highly variable markers (OSU09d, OSU13d, and OSU19t), whereas significant deviations from the genotypic proportions expected under Hardy-Weinberg equilibrium were detected for the remaining three highly polymorphic markers (OSU19d, OSU20d, and OSU22d). By using the additional pairs of PCR primers (OSUxxxN in Table 2.1) for loci OSU19d and OSU22d, we found that most individuals that were originally scored as homozygotes were actually heterozygotes. However, the additional primers for locus OSU20d did not reveal any previously undetected heterozygotes. For this new data set, significant deviations from Hardy-Weinberg expectations were no longer detected at either locus OSU19d or locus OSU22d. The Markov chain global tests revealed a significant heterozygote deficiency at only one (OSU20d) of the nine loci when testing across populations, and in only one (Kabeleka) of the six populations when testing across loci.

Population structure

For the six population samples, the overall FST value is 0.049. Permutation tests showed that FST for each locus except locus OSU12t was significantly different from zero, and the FST over all loci was also significantly different from zero. Moreover, 22.6% (44/199) of the alleles observed in A. alluaudi are private alleles, that is, alleles found only at single sampling sites. All the private alleles had very low allele frequencies, and the sizes of the majority of private alleles were at either the lower end or the higher end of the allele size profile (Appendix C). The number and percentage of private alleles at each locus and in each population are given in Table 3.4.
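The definition of a private allele used above (an allele found at only one sampling site) translates directly into a set operation. The following Python sketch is illustrative only; the function name, the representation of an allele as a (locus, size) pair, and the example data in the usage note are hypothetical and are not taken from Appendix C.

```python
from collections import defaultdict


def private_alleles(alleles_by_population):
    """Find alleles observed in exactly one population.

    alleles_by_population: dict mapping population name -> set of
    (locus, allele) pairs observed there (hypothetical layout).
    Returns (dict of population -> set of private alleles,
             total number of distinct alleles).
    """
    owners = defaultdict(set)
    for population, alleles in alleles_by_population.items():
        for allele in alleles:
            owners[allele].add(population)
    private = {population: set() for population in alleles_by_population}
    for allele, populations in owners.items():
        if len(populations) == 1:
            # The single owner of this allele holds it privately.
            private[next(iter(populations))].add(allele)
    return private, len(owners)
```

Dividing the number of private alleles by the total number of distinct alleles gives a percentage of the kind reported in Table 3.4. Because private alleles are rare by nature, that percentage is sensitive to sample size.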

Not surprisingly, in pairwise comparisons among population samples (Table 3.5), the lowest pairwise FST value (0.002) was observed between the two population samples from Lake Victoria (Jinja Pier and Napoleon Gulf), followed by that between the two populations from the Lake Kyoga basin (Kyoga and Nawampasa) (0.006). The pairwise FST values between the sample from Lake Kachira and each of the other samples were considerably higher than those between any other two samples. All pairwise FST values except that between the two Lake Victoria samples showed significant population structure (Table 3.5).

The average number of migrants per generation estimated from FST values was generally high, ranging from 1.8 (between the Jinja Pier population and the Kachira population) to 121.3 (between the Jinja Pier population and the Napoleon Gulf population) (Table 3.5). The pattern of pairwise RST values (Table 3.5) was generally consistent with that of FST values, although almost all pairwise RST values were higher than their respective FST values. The highest RST values were also observed between the Kachira population and each of the other five populations, ranging from 0.143 to 0.272. In contrast, the pairwise RST values between any other two populations were much smaller (from 0.026 to 0.116), except that a small negative value (-0.01) was observed between the Jinja Pier population and the Napoleon Gulf population.

In testing isolation by distance, a significant positive correlation was obtained between either FST or RST and realistic migrational distances (p = 0.017 and 0.030, respectively), but not between FST or RST and straight-line geographical distances (p = 0.077 and 0.130, respectively).

Population genetic relationships

Allele frequencies at the nine microsatellite loci were used to estimate population relatedness using two types of genetic distances (DS and DAS) (Table 3.6). The topologies of the UPGMA phenograms obtained from DS and DAS are identical: the Jinja Pier population and the west Napoleon Gulf population form one population cluster; the Kyoga population and the Nawampasa population form the second cluster; and the Kachira population falls on the deepest branch. Figure 3.4 shows the topology of the UPGMA phenogram obtained from DS. The topologies of the two NJ phenograms (not shown) obtained from DS and DAS are also identical, but differ slightly from the UPGMA phenograms in that the Kyoga population and the Nawampasa population come off sequentially from the branch leading to the Lake Victoria cluster.

Microsatellite variability and phylogenetic inference of other LVR haplochromine cichlids

To test whether microsatellite markers can detect substantial amounts of variation and contain useful phylogenetically informative signals for LVR haplochromine cichlid species, we used the nine pairs of microsatellite primers designated as OSUxxx (Wu et al., 1999) to amplify genomic DNA from four other LVR haplochromine cichlid species. Seven pairs of primers successfully amplified the homologous loci in each of the species tested and detected high levels of genetic variation within each species. The average number of observed alleles ranges from 12.7 to 17.4 per locus, and the average expected heterozygosity from 0.58 to 0.67 (Wu et al., 1999). Furthermore, a neighbor-joining tree (Figure 3.5) based on Nei’s (1972) standard genetic distances showed that the group of microsatellite markers can differentiate all four species, with the two populations of A. velifer forming one clade and the two congeneric species of Paralabidochromis forming the other clade. Consistent with previous genetic studies (Meyer et al., 1990; Sage et al., 1984), A. alluaudi is more distantly related to the other LVR haplochromine cichlid species than the latter are to one another.

DISCUSSION

Although the cichlid species flocks in the Eastern African rift valley lakes have received extensive attention from evolutionary biologists and population geneticists, previous genetic studies (Mayer et al., 1998; Meyer et al., 1990; Sage et al., 1984) have lamented the extreme difficulty of resolving phylogenetic relationships and population structure in LVR haplochromine cichlids. This has been a frustrating obstacle to progress in an otherwise ideal and information-laden system. The present extended mtDNA study, with a larger number of individuals sampled from several geographical locations, still revealed very low levels of mtDNA nucleotide diversity even though A. alluaudi is a widespread haplochromine species in the LVR. Such results suggest that the low mtDNA diversity in LVR cichlid species reported previously (Meyer et al., 1990) was not solely a function of small sample sizes, and they rule out useful phylogenetic inference from mtDNA sequence data. In sharp contrast, microsatellite markers, as a group, revealed significant regional differentiation of A. alluaudi in the LVR, probably as a result of migrational isolation by distance. Our comparative study has several important implications.

Intrapopulation genetic variability

All of the nine microsatellite markers are polymorphic in at least two of the six natural A. alluaudi populations surveyed. The six markers with longer repeat motifs showed much higher genetic variability than the three markers with shorter repeat motifs.

This finding is consistent with the notion that repeat lengths of microsatellite markers are positively correlated with their degree of polymorphism (Weber, 1990).

Among the six populations, the population from Lake Kachira, which is geographically semi-isolated from all the other major lakes, has the lowest mean number of observed alleles (7.9), the lowest average observed heterozygosity (0.50), and the lowest average gene diversity (0.53) (Table 3.3). These numbers for the other five populations are considerably higher, and similar to one another. An extreme example was seen at locus OSU09d, where only four alleles were observed in the Kachira population, whereas 15 to 17 alleles were observed in each of the other five populations. The fact that the Kachira population has the lowest genetic diversity among the six populations is in concert with the data from Randomly Amplified Polymorphic DNA (RAPD) markers, which showed that A. alluaudi individuals from Lake Kachira were more similar to one another than are individuals from either Jinja Pier or Lake Kyoga (Black, M., G. Booton, L. Kaufman, and P. Fuerst, unpublished data).

Genotypic frequencies at eight of the nine markers fit Hardy-Weinberg expectations quite well, whereas a significant heterozygote deficiency was observed at the OSU20d locus. Such deviation from Hardy-Weinberg equilibrium is not completely unexpected, considering that many alleles were observed in each population and that the sample size was not very large. Nonrandom mating seems unlikely to be an explanation in our case, since such departures were not observed at the other loci. Given that a recent study showed a negative correlation between allelic diversity and the evolutionary conservation of flanking sequences at alligator microsatellite markers (Glenn et al., 1996), and that the alternative primers for loci OSU19d and OSU22d revealed additional heterozygotes, the presence of null alleles at locus OSU20d may be a more reasonable explanation for the departure from Hardy-Weinberg expectations.

Genetic relationships of A. alluaudi populations consistent with their geographical histories

As highly variable genetic markers, microsatellites are ideal for studying the evolutionary relationships of closely related taxa, including the different populations of a species. For example, McConnell et al. (1995) showed that two of the three microsatellite markers in Atlantic salmon can distinguish between populations, and that the Canadian fish are clearly genetically different from the European fish based on the presence of unique alleles. Likewise, using only five microsatellite loci, Ruzzante et al. (1996) demonstrated the existence of genetically distinct inshore and offshore populations among over-wintering cod in Newfoundland.

In the present study, we used nine microsatellite markers to evaluate the genetic structure of six A. alluaudi populations and to infer their evolutionary relationships. Two types of pairwise genetic distances (DS and DAS) were calculated for the six populations. Nei's standard distance has been widely used in allozyme studies, and behaves well in terms of the probability of obtaining a perfect tree from microsatellite data (Takezaki & Nei, 1996). The DAS distance measure has been successfully employed to differentiate human populations using microsatellite data (Bowcock et al., 1994).

Overall, the topology of the phylogenetic phenogram depicted in Figure 3.4 is consonant with our knowledge about the geographical and historical relationships in the LVR. For example, populations that are very close geographically are also close genetically (e.g., Jinja and Napoleon Gulf; Kyoga and Nawampasa), while the geographically isolated Lake Kachira population is the most genetically distinct. Consideration of the genetic relationships of samples from Lakes Kachira and Kabeleka with respect to the Victoria-Kyoga cluster of samples provides insight into the factors determining genetic differentiation within the LVR. Lake Kachira is one of the Kooki Lakes, a large valley swamp system in the Kagera Basin of Uganda that is both far from Jinja and higher in elevation than Lake Victoria. Even though the straight-line distance between Lake Kachira and Jinja/Napoleon Gulf is shorter than that between Lake Kabeleka and Jinja/Napoleon Gulf, the realistic putative migration distances are reversed, since the most probable connection between Lake Kachira and Lake Victoria is the Kagera River (see Figure 3.1).

Several other factors could further help to explain why the Lake Kachira

population is the most distantly related group among the six populations. The Kachira

population may have undergone more genetic drift because of its greater geographical

isolation. Although Lake Kachira is larger than Lake Kabeleka, it is highly isolated from

any larger lakes by numerous swamp barriers, a substantial drop in slope, and many river

miles. The very low level of polymorphism in the Kachira population (Table 3.3)

indicates that it may have experienced a historical bottleneck. Recent field data suggest

that the Kabeleka population, on the other hand, may be in relatively close contact with

that of Lake George (Chapman & Kaufman, unpublished). Alternatively, faunal

exchange between Lake Victoria and the Lake Edward/George system may have taken

place more recently than we would have expected given the timing of known major

floods, and the current very low level of the Katonga River (Kaufman et al., 1997). A final possibility is that the Kachira A. alluaudi are derived from the ancient riverine stock in the Kagera River basin that actually predates the most recent free passage between the Victoria and Edward/George systems, which could have been as recent as 8,000 years ago. Falsification of any of these hypotheses requires data from additional localities in the Edward/George systems.

F statistics supporting the interpretation of high mobility despite significant population subdivision of A. alluaudi

Fst values and the estimated number of migrants between each pair of populations (Table 3.5) indicate the presence of recent and/or extensive gene flow between populations within the same lake system (Jinja Pier and Napoleon Gulf; Kyoga and Nawampasa). However, gene flow between the Kachira population and any of the other five populations is highly restricted, probably due to geographical isolation. Fst values at eight of the nine loci, and the overall Fst value over all loci, are all significantly different from zero, and 22.6% of the alleles observed in A. alluaudi are private alleles (Table 3.4). These observations support the interpretation of population subdivision within A. alluaudi, and are generally consistent with the biogeographical data for the six populations. Since the frequencies of all the private alleles are very low, and since the sizes of most of the private alleles lie toward the ends of the allele size range, the percentage of private alleles would be expected to decrease when a larger number of individuals is analysed.
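The private-allele percentages discussed above can be illustrated with a short Python sketch. It assumes that a private allele is one observed in exactly one population, that the per-population percentage is the number of private alleles divided by the number of alleles seen in that population, and that the overall percentage is the number of private alleles divided by the number of distinct alleles at the locus. These definitions are consistent with the entries in Table 3.4, but the function name and the example allele sizes are hypothetical.

```python
from typing import Dict, Set


def private_alleles(alleles_by_pop: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Return, for each population, the alleles found in no other population."""
    result: Dict[str, Set[int]] = {}
    for pop, alleles in alleles_by_pop.items():
        # Union of the alleles seen in every other population.
        others: Set[int] = set()
        for other_pop, other_alleles in alleles_by_pop.items():
            if other_pop != pop:
                others |= other_alleles
        result[pop] = alleles - others
    return result


# Hypothetical allele sizes (bp) at one locus in three populations.
example = {
    "A": {100, 102, 104},
    "B": {102, 104},
    "C": {104, 106},
}

private = private_alleles(example)
all_alleles = set().union(*example.values())
n_private = sum(len(v) for v in private.values())
overall_pct = 100.0 * n_private / len(all_alleles)

for pop in example:
    pct = 100.0 * len(private[pop]) / len(example[pop])
    print(pop, sorted(private[pop]), round(pct, 1))
print("overall", n_private, round(overall_pct, 1))
```

Because rare alleles are the ones most likely to be missed in other populations, this count tends to fall as sample sizes grow, which is the expectation stated in the text.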

Although the pairwise Rst values revealed patterns of gene flow similar to those suggested by pairwise Fst values, most of the Rst values are larger than their respective Fst values. Several other studies have also shown higher Rst values than the respective Fst values (Ruzzante et al., 1996; Viard et al., 1996). It is worthwhile to note that a negative Rst value was observed between the Jinja Pier and the West Napoleon Gulf populations, which are geographically very close to each other (Figure 3.1; Table 3.1). Negative Rst values were also recorded in the study by Ruzzante et al. (1996). Kimmel et al. (1996) pointed out that, from a statistical point of view, a negative Rst value could be obtained if two populations are genetically very close and if the genetic variation within each population is high, so that “their differences are dominated by statistical noise” (Kimmel et al., 1996).

The relatively high estimates of the average numbers of migrants per generation (Nm) exchanged between populations of A. alluaudi (Table 3.5), some of which are geographically quite distant from one another (Figure 3.1; Table 3.1), differ greatly from results in a similar study of populations of several Lake Malawi cichlid species (van Oppen et al., 1997). In A. alluaudi, the average Nm values were estimated to be from 1.8 to 2.7 between the Lake Kachira population and the other five populations, from 6.3 to 13.2 among populations from different lake systems other than Lake Kachira, and from 38.6 to 121.3 between populations from the same lake systems (Table 3.5). In the case of Lake Malawi cichlids, small but significant population differentiation was inferred for samples collected across very short geographical distances (~40 km). The average Nm values between each of the pairwise population samples were very small. Similarly, we found significant differences (Fst = 0.053; Nm = 4.5) between the two population samples of Astatotilapia velifer collected from two nearby lakes (Nabugabo and Kayugi) in Western Uganda. Therefore, the high migration rates estimated in A. alluaudi (Table 3.5; Nm from 1.8 to 121.3) suggest that it is an unusually mobile haplochromine species, congruent with other knowledge of this species. A. alluaudi has an extremely broad habitat distribution. In our surveys we have found it from mud bottoms in open lakes to tiny streams and swampy puddles in the surrounding hills (Chapman et al., 1996). It also has a widespread geographical distribution within the greater LVR (Greenwood, 1981), tolerance to hypoxic barriers to dispersal (Chapman et al., 1995; 1996), and trophic plasticity (Greenwood, 1965). High mobility for Astatoreochromis might also help explain its astonishingly conservative evolutionary track record. Only three species have been described in this genus that is distributed across East Africa, while species richness in other haplochromine genera can range from tens to scores of species (Greenwood, 1981; Kaufman et al., 1997).
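As a rough cross-check of the Nm values quoted above, the sketch below assumes the usual island-model approximation, Nm = (1/Fst - 1)/4. This section does not state which estimator was actually used, so both the formula and the function name `nm_from_fst` are assumptions. The rounded Fst values from the text and Table 3.5 do, however, reproduce the reported Nm values to one decimal place.

```python
from typing import Optional


def nm_from_fst(fst: float) -> Optional[float]:
    """Island-model estimate of migrants per generation: Nm = (1/Fst - 1) / 4.

    Returns None when the statistic is zero or negative, where the estimate
    is undefined (compare the "n.a." entry for the negative Rst value in
    Table 3.5).
    """
    if fst <= 0:
        return None
    return (1.0 / fst - 1.0) / 4.0


# Fst values taken from the text and from Table 3.5.
print(round(nm_from_fst(0.053), 1))  # A. velifer, Nabugabo vs Kayugi -> 4.5
print(round(nm_from_fst(0.122), 1))  # Jinja Pier vs Kachira -> 1.8
print(nm_from_fst(-0.011))           # negative value -> None
```

Small discrepancies against the table (for example 1.799 here versus 1.807 in Table 3.5) are expected, since the tabulated Fst values are rounded to three decimal places.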

Population correlation between genetic distance (as measured by microsatellites) and pharyngeal jaw morphology

The data on population relationships have a bearing on important laboratory studies of phenotypic plasticity. A. alluaudi is well known for its plasticity in jaw morphology, manifested in the expression of diet-dependent tooth and pharyngeal morphs (Hoogerhoud, 1986). Initially, these morphs were described as subspecies: A. alluaudi alluaudi as the “crusher” type, and A. alluaudi occidentalis bearing a pharyngeal morphology typical of insectivorous species (Greenwood, 1965). Although studies on phenotypic plasticity in the laboratory suggested that such a subspecies division was unlikely (Hoogerhoud, 1986), the microsatellite data presented here indicate that individuals from Lake Kachira are genetically very different from individuals from other lakes. To see whether there is a correlation between microsatellite data and pharyngeal jaw morphology in the six population samples used in the present study, Dr. Les Kaufman, one of our collaborators at Boston University, determined the form and size of the central teeth on the lower pharyngeal jaw by otoscopic examination for at least 10 of the largest possible size-matched individuals from each of the six samples. Peak derived tooth form was then classified as “molariform”, “papilliform”, or “intermediate”. The four Victoria and Kyoga samples in our study all exhibited strongly molariform dentition and pharyngeal hypertrophication, while the Kachira sample was relatively hypotrophic, a common feature of the Kooki lakes fishes examined by Greenwood (1965). Fishes from Lake Kabeleka are intermediate in jaw morphology, tending toward molariform. This pattern is consistent with the genetic relationships among populations (Figure 3.4). Detailed comparisons of pharyngeal hypertrophication in these and other study populations are underway.

Results from the present genetic analysis open the intriguing possibility that the relatively greater genetic distances between the sample from Lake Kachira and those from Lake Victoria or Kyoga may be correlated with significant, genetically fixed trophic differentiation. If so, regional variation in the ecology and ecomorphology of A. alluaudi could have both a genetic and an epigenetic component, thus implying that the celebrated plasticity of pharyngeal crushers among the LVR haplochromines has a strong genetic component in addition to the epigenesis so elegantly demonstrated by Hoogerhoud (1986). In this, the control system for A. alluaudi functional morphology may in part resemble that of the Cuatro Cienegas cichlid, Cichlasoma minckleyi (Kornfield & Taylor, 1982; Liem & Kaufman, 1984).

Potential of A. alluaudi microsatellite markers in cichlid phylogenetic studies

The inability of conventional genetic markers to reveal substantial variation, or to achieve phylogenetic resolution for the LVR haplochromines, is not entirely surprising, considering growing evidence for their very recent divergence. Severe drying in the last glacial maximum suggests that endemic Lake Victoria haplochromine species evolved in the last 12,400 years (Johnson et al., 1996), derived from regional genera probably no older than 225,000 years (Kaufman et al., 1997; Meyer et al., 1990). Therefore, the lack of species or even generic resolution found using mtDNA is likely to be inherent in the short evolutionary history of the members of the Lake Victoria aquatic system.

The striking contrast in A. alluaudi of low mtDNA nucleotide diversity and high microsatellite variability points to microsatellite markers as the method of choice for population genetic analyses and phylogenetic inference of the LVR cichlids, a group that has evolved in a time frame probably too short for less variable genetic markers to accumulate sufficient variation. Here we have applied a subset of seven microsatellite markers to a set of other species from the LVR. These markers detected high genetic variability within each taxon (Wu et al., 1999). More importantly, phylogenetic analyses based on limited preliminary data indicate that, as a group, microsatellite markers can differentiate various LVR cichlid species, and that the resulting microsatellite-based phylogeny is generally consistent with predictions based on morphology. In the next chapter, I will present the data from an expanded phylogenetic analysis of LVR cichlid species using 14 microsatellite markers and 25 LVR haplochromine cichlid species.

Together, the results presented here strongly suggest the great potential of microsatellite markers in assessing population structure and obtaining a robust phylogeny of the Lake Victoria haplochromine cichlid species flock, tasks that have not been fulfilled using conventional genetic markers.

                Jinja Pier   Napoleon Gulf   Kachira   Kyoga    Nawampasa   Kabeleka
Jinja Pier      -            4.76            257.38    118.21   97.78       330.60
Napoleon Gulf   9.11         -               255.70    122.73   101.01      330.22
Kachira         756.1        753.42          -         294.69   323.59      126.21
Kyoga           130.68       138.05          975.43    -        54.97       321.95
Nawampasa       219.33       226.47          886.78    88.42    -           364.53
Kabeleka        611.95       606.47          688.35    831.28   742.63      -

Table 3.1: Pairwise straight-line geographical distances (above diagonal) and “realistic” geographical distances (below diagonal) for the six sampling locations. Distances are in kilometers.

Population      n     Sample No.                                                               Nh     π
Jinja Pier      7     16603, 16856, 14559, 14548, 14545, 14551, 14553                          6      0.0056
Napoleon Gulf   1     98024                                                                    n.a.   n.a.
Kachira         10    97090, 97147, 97149, 97091, 97154, 97087, 97084, 97092, 97161, 97089     2      0.0014
Kyoga           8     16619, 16620, 16621, 16630, 19812, 19813, 19819, 19820                   4      0.0036
Nawampasa       1     17536                                                                    n.a.   n.a.
Kabeleka        8     K19, K26, K30, K33, K35, K36, K37, K39                                   1      0.0000
Average         5.8                                                                            3.3    0.0026

Table 3.2: mtDNA sequence diversity of A. alluaudi samples. n: number of individuals sequenced; Nh: number of mtDNA haplotypes (from 10 total haplotypes observed); π: nucleotide diversity (Nei 1987); n.a.: not applicable due to only one individual sampled.

Table 3.3: Microsatellite variability within A. alluaudi populations. 2N: number of chromosomes scored; Nall: number of alleles observed; Ho: average observed heterozygosity; He: average unbiased gene diversity (Nei 1987).

Locus    Stat   Jinja Pier   Napoleon Gulf   Kachira   Kyoga   Nawampasa   Kabeleka   Average   Total
OSU09d   2N     24           36              44        44      44          58                   250
         Nall   15           16              4         16      17          16         13.8      27
         Ho     1.00         1.00            0.32      0.91    0.91        0.91       0.84
         He     0.91         0.92            0.28      0.91    0.96        0.90       0.81
OSU12t   2N     26           34              44        44      44          60                   252
         Nall   1            1               1         1       3           1          1.3       3
         Ho     0.00         0.00            0.00      0.00    0.09        0.00       0.02
         He     0.00         0.00            0.00      0.00    0.09        0.00       0.02
OSUlSd   2N     26           36              44        44      46          54                   250
         Nall   15           18              14        20      19          21         17.8      28
         Ho     1.0          0.89            0.91      0.91    0.91        0.82       0.91
         He     0.91         0.93            0.88      0.93    0.92        0.93       0.92
OSU16d   2N     26           34              44        44      46          60                   254
         Nall   3            3               1         4       3           2          2.7       4
         Ho     0.23         0.24            0.00      0.18    0.09        0.30       0.17
         He     0.21         0.21            0.00      0.17    0.08        0.26       0.16
OSU19d   2N     26           34              44        44      46          60                   254
         Nall   10           14              7         19      21          18         14.8      29
         Ho     1.00         0.88            0.68      0.96    0.91        0.90       0.89
         He     0.86         0.91            0.82      0.93    0.94        0.92       0.90
OSU19t   2N     26           36              42        44      46          60                   254
         Nall   8            9               4         14      12          12         9.8       19
         Ho     0.86         0.56            0.43      0.77    0.82        0.70       0.69
         He     0.73         0.71            0.72      0.88    0.85        0.87       0.79
OSU20d   2N     24           34              44        44      44          56                   246
         Nall   16           17              21        16      23          15         18.0      41
         Ho     0.92         0.82            0.82      0.73    0.91        0.57       0.80
         He     0.92         0.93            0.92      0.93    0.94        0.91       0.93
OSU21d   2N     26           34              44        44      44          60                   254
         Nall   1            2               2         1       2           3          1.8       5
         Ho     0.00         0.00            0.09      0.00    0.05        0.63       0.13
         He     0.00         0.11            0.09      0.00    0.04        0.48       0.12
OSU22d   2N     26           36              44        44      46          60                   256
         Nall   16           18              17        22      23          31         21.2      43
         Ho     0.92         1.00            1.00      0.96    1.00        0.90       0.96
         He     0.93         0.91            0.91      0.94    0.95        0.96       0.93
Average  Nall   9.4          10.9            7.9       12.6    13.7        13.2       11.3
         Ho     0.67         0.61            0.50      0.63    0.64        0.67       0.62
         He     0.63         0.65            0.53      0.63    0.65        0.70       0.64

Locus    Stat   Jinja Pier   Napoleon Gulf   Kachira   Kyoga   Nawampasa   Kabeleka   Overall
OSU09d   Np     1            1               0         2       2           0          6
         %      6.7          6.3             0.0       12.5    11.8        0.0        22.2
OSU12t   Np     0            0               0         0       2           0          2
         %      0.0          0.0             0.0       0.0     66.7        0.0        66.7
OSUlSd   Np     0            1               1         1       0           2          5
         %      0.0          5.6             7.1       5.0     0.0         9.5        17.9
OSU16d   Np     0            0               0         0       0           0          0
         %      0.0          0.0             0.0       0.0     0.0         0.0        0.0
OSU19d   Np     0            0               0         2       3           0          5
         %      0.0          0.0             0.0       10.5    13.0        0.0        17.2
OSU19t   Np     0            0               0         1       1           2          4
         %      0.0          0.0             0.0       7.1     8.3         16.7       21.1
OSU20d   Np     1            0               5         3       0           1          10
         %      6.3          0.0             23.8      15.8    0.0         5.9        24.4
OSU21d   Np     0            1               0         0       0           2          3
         %      0.0          50.0            0.0       0.0     0.0         66.7       60.0
OSU22d   Np     0            0               1         4       1           4          10
         %      0.0          0.0             5.9       18.2    4.3         12.9       23.3
Average  Np     0.2          0.3             0.8       1.4     1.0         1.2        5.0
         %      2.3          3.0             9.7       9.7     7.5         9.1        22.6

Table 3.4: Number (Np) and percentage (%) of private alleles.

                Jinja Pier   Napoleon Gulf   Kachira   Kyoga     Nawampasa   Kabeleka
Jinja Pier      -            121.336         1.807     10.366    9.139       6.327
                             (n.a.)          (0.669)   (3.083)   (3.222)     (1.905)
Napoleon Gulf   0.002        -               2.522     13.208    20.333      8.718
                (-0.011)                     (0.762)   (9.365)   (5.069)     (2.327)
Kachira         0.122**      0.090**         -         2.628     2.673       2.053
                (0.272)      (0.247)                   (0.754)   (1.498)     (0.805)
Kyoga           0.024**      0.019**         0.087**   -         38.572      9.383
                (0.075)      (0.026)         (0.249)             (9.365)     (3.373)
Nawampasa       0.027**      0.012**         0.086**   0.006*    -           7.775
                (0.072)      (0.047)         (0.143)   (0.026)               (4.060)
Kabeleka        0.038**      0.028**         0.109**   0.026**   0.031**     -
                (0.116)      (0.097)         (0.237)   (0.069)   (0.058)

Table 3.5: Pairwise Fst (and Rst) (below diagonal), as well as Nm (and Mr) (above diagonal) for the six A. alluaudi populations based on microsatellite data. *p < 0.05; **p < 0.01; n.a.: not applicable due to a negative Rst value.

                Jinja Pier   Napoleon Gulf   Kachira   Kyoga    Nawampasa   Kabeleka
Jinja Pier      -            0.388           0.658     0.450    0.502       0.511
Napoleon Gulf   0.005        -               0.636     0.470    0.424       0.473
Kachira         0.180        0.135           -         0.611    0.593       0.752
Kyoga           0.030        0.029           0.140     -        0.369       0.487
Nawampasa       0.048        0.028           0.134     0.016    -           0.511
Kabeleka        0.070        0.055           0.205     0.054    0.065       -

Table 3.6: Pairwise allele sharing genetic distances (Das; above diagonal) and Nei’s (1972) standard genetic distances (Ds; below diagonal) for the six population samples.

[Map. Legible labels: Lake Albert, Lake Kyoga, Lake George, Lake Edward, Lake Victoria; scale bar marked 40, 80, 120, 160 km.]

Figure 3.1: Map of the LVR with the sample collection sites labelled as: 1. Jinja Pier; 2. Napoleon Gulf; 3. Lake Kyoga; 4. Lake Nawampasa; 5. Lake Kabeleka; 6. Lake Kachira.

Figure 3.2: (a) Bases at 10 variable sites for the 10 mtDNA haplotypes of A. alluaudi. Dots: identical sequences to haplotype 1; dash: deletion. Numbers above the haplotypes are the nucleotide positions for the 10 variable sites relative to positions of the aligned sequences from Meyer et al. (1990). Numbers in parentheses are numbers of individuals with the particular haplotypes.

(b) A minimum mutation network for the 10 haplotypes. Dotted lines represent alternative mutation pathways, and slashes represent the number of inferred mutation steps along the mutation pathways.

(a) [Alignment of bases at the 10 variable sites. Only the haplotype 1 sequence (ATATTTAAGT) and the haplotype distribution below are legible.]

Haplotype   Individuals
1           Jinja (1), Kyoga (1), Kachira (9)
2           Kyoga (4), Nawampasa (1)
3           Jinja (1), Kyoga (1)
4           Kabeleka (8), Napoleon Gulf (1)
5           Kachira (1)
6           Kyoga (2)
7           Jinja (2)
8           Jinja (1)
9           Jinja (1)
10          Jinja (1)

(b) [Minimum mutation network for the 10 haplotypes.]

[UPGMA phenogram. Leaf labels, top to bottom: Victoria (1), Kyoga (1), Kachira (9); Victoria (1), Kyoga (1); Kyoga (4), Nawampasa (1); Victoria (1); Kachira (1); Kyoga (2); Victoria (1); Victoria (1); Victoria (2); Napoleon Gulf (1), Kabeleka (8).]

Figure 3.3: A UPGMA phenogram of the 10 mtDNA haplotypes. Numbers in parentheses represent the number of individuals that exhibit the particular haplotypes.

[UPGMA phenogram. Leaf labels, top to bottom: Jinja, Napoleon Gulf, Kyoga, Nawampasa, Kabeleka, Kachira. Scale bar: 0.01.]

Figure 3.4: A UPGMA phenogram based on Nei's (1972) genetic distances for the six A. alluaudi populations.

[Neighbor-joining tree. Leaf labels: Y. fusiformis; P. sp.; P. "rock kribensis"; A. velifer (N); A. velifer (K). Scale bar: 0.01.]

Figure 3.5: A neighbor-joining tree based on Nei’s (1972) genetic distances of the five Lake Victoria cichlid species. The tree was rooted using A. alluaudi as an outgroup. N: Nabugabo; K: Kayugi.

CHAPTER 4

Microsatellite Genetic Variation and Phylogeny of Lake Victoria Cichlid Species

INTRODUCTION

Uncovering the phylogenetic relationships of closely related taxa using molecular markers continues to be a major challenge for contemporary evolutionary biologists. This is especially true for groups such as the recently diverged members of the endemic Lake Victoria cichlid species flock (Meyer et al., 1990). A common problem in such cases is that many conventional genetic markers, such as allozyme markers, are not variable enough to detect sufficient variation for phylogenetic inferences.

Cichlid species in Lake Victoria are extremely diverse morphologically, trophically, and behaviorally. However, morphology, ecology, and behaviour could potentially interact during the development of individuals to produce different “forms” that might be described as different species. Although different cichlid species may interbreed and produce fertile offspring under laboratory conditions, hybridization does not seem to occur normally in the wild (Seehausen et al., 1997). In their natural environment, different cichlid species, including species that are sympatric, are thought to be reproductively isolated by sexual selection through female mate choice of male coloration (Seehausen et al., 1997). Nevertheless, the inability of numerous genetic studies (Mayer et al., 1998; Meyer et al., 1990; Sage et al., 1984) to differentiate various Lake Victoria cichlid species leaves some argument about the identification of hundreds of forms based on morphology alone.

The failure of many conventional genetic markers to achieve phylogenetic resolution of Lake Victoria cichlid species is not completely surprising. The lake has a recent geological origin, having almost completely disappeared during a drier period that occurred about 12,000 years ago during the Late Pleistocene (Johnson et al., 1996; also see Chapter 1). Therefore, most of the present-day cichlid species could have evolved in a period of time too short for most conventional genetic markers to accumulate sufficient mutational variation for phylogenetic inferences. If this interpretation is correct, application of genetic markers with much higher mutation rates may uncover the phylogenetic relationships of the tightly knit members of the Lake Victoria cichlid species flock.

In the past decade, short tandem repeat DNA sequences, microsatellites, have been receiving increasing attention for population genetic studies of closely related taxa (e.g., Bowcock et al., 1994). These sequences are abundant in all vertebrate genomes but have a random distribution. They have extremely high mutation rates with high levels of allelic variation, and are relatively easy to genotype (see Chapter 1). Microsatellite markers have detected considerable amounts of genetic variation in several species where conventional genetic markers revealed very little genetic polymorphism (Gertsch et al., 1995; Hughes & Queller, 1993; Taylor et al., 1994). One additional common feature of microsatellite loci makes them ideal genetic markers for phylogenetic studies of various Lake Victoria cichlid species. Although microsatellite repeat motifs usually have high population variability, the unique sequences that flank the repeat motifs are generally conserved among closely related species (Roy et al., 1994), genera (Schlotterer et al., 1991; Vaiman et al., 1994), and even some families (FitzSimmons et al., 1995). Therefore, it is logical to assume that microsatellite markers isolated from one Lake Victoria cichlid species might be able to amplify genomic DNA from all, or most, of the other Lake Victoria cichlid species, a favorable situation for phylogenetic studies.

In the present study, I used 14 microsatellite markers to investigate the levels of genetic variation within species and to infer the phylogenetic relationships among 24 endemic cichlid species from the LVR, as well as a single representative species from Lake Malawi.

MATERIALS AND METHODS

Biological specimens

A total of 574 specimens were sampled, representing 24 putative endemic haplochromine cichlid species from the LVR and one species from Lake Malawi (Table 4.1). Detailed sample collection information for all the specimens is presented in Appendix D. All the tissue samples were collected between 1993 and 1996. Specimens of Astatotilapia calliptera were taken from Lake Malawi, whereas all the other specimens were taken from the LVR (Table 4.1). The LVR includes Lake Victoria, the Lake Kyoga system (including Lakes Kyoga and Nawampasa), the Lake Edward/George system (including Lakes Edward, George, and Kabeleka), the Lake Nabugabo system (including Lakes Nabugabo, Kayugi, and Kanjanja), and the Kooki Lake system (including Lake Kachira) (see Figure 1.1; Figure 3.1). Sample collection procedures and DNA extraction protocols were described in Chapters 2 and 3. All specimens were morphologically identified by Dr. L. Kaufman at Boston University.

Microsatellite markers and PCR genotyping

A total of 14 microsatellite markers were used in this study. Among the 14 markers, seven (OSU09d, 12t, 16d, 19d, 19t, 20d, and 21d) were developed from a Lake Victoria haplochromine cichlid species, Astatoreochromis alluaudi (Wu et al., 1999), four were developed from one of two haplochromine cichlid species (Pseudotropheus zebra from Lake Malawi and Haplochromis nigricans from Lake Victoria) (http://tilapia.unh.edu/WWWPages/TGP/CA-Melanochromis.html), and three were from a tilapia species, Oreochromis niloticus (Lee & Kocher, 1996). Detailed information about the seven A. alluaudi microsatellite markers, as well as their PCR conditions, was presented in Chapter 2. Only primers OSUXXX listed in Table 2.1 of Chapter 2 were used for all species except A. alluaudi, for which the alternative primers OSUXXXN were also used to type individuals that were scored as homozygotes when OSUXXX were used. Information about the other seven markers is presented in Table 4.2. PCR cycle conditions were as described in Chapter 2. Multiplex PCR was performed for markers UNH169 and DXTUCA-15, because of their similar PCR amplification conditions and non-overlapping allele size ranges.

Data analyses

PCR amplification success rates were determined for each locus-species combination. An amplification was considered successful only when it produced one or two scorable PCR bands for a single individual. Unsuccessful amplifications included either no amplification (no visible bands) in at least two independent PCR trials, or amplifications that produced no typical microsatellite bands (judged both by the shadow band pattern and by expected size ranges). Amplifications that produced more than two scorable (sizable) typical microsatellite bands for a single individual, or that produced non-scorable typical microsatellite bands, were not counted in calculating the amplification success rates.

Levels of intraspecific genetic variation at microsatellite markers were measured by the number of observed alleles in the entire sample of each species, observed heterozygosity, and unbiased gene diversity (Nei, 1987). Interspecific genetic divergence was estimated by three genetic distance measures, each of which was also used in phylogenetic analyses. Nei’s standard genetic distance (Nei, 1972) has been one of the most widely used distance measures for various genetic markers. The allele sharing distance measure has been successfully used to reconstruct phylogenetic lineages of various human populations based on microsatellite data (Bowcock et al., 1994). The stepwise weighted genetic distance measure (Dsw) was specifically developed for microsatellite data (Shriver et al., 1995). It takes into account both allele frequencies and allele sizes. Pairwise genetic distances were calculated using the computer program Microsat (version 1.5d) developed by E. Minch (http://lotka.stanford.edu/microsat.html). Phylogenetic trees based on each of the three genetic distance measures were constructed using the neighbor-joining (NJ) algorithm (Saitou & Nei, 1987) as implemented in the computer program MEGA (Kumar et al., 1993). To evaluate the reliability of the inferred trees, 100 bootstrap replicates were obtained using Microsat (version 1.5d). For each replicate sample, the distance matrix for Nei’s genetic distance was determined. The neighbor-joining tree was constructed for each replicate using NEIGHBOR, and the consensus tree with the percentage of clades appearing was obtained using CONSENSE. Both NEIGHBOR and CONSENSE are implemented in the computer package PHYLIP (version 3.5c) (Felsenstein, 1993).
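To make the first of the three distance measures concrete, the following Python sketch computes Nei's (1972) standard genetic distance, D = -ln(Jxy / sqrt(Jx Jy)), from per-locus allele frequencies. It is only an illustration and not the Microsat implementation: it applies no sample-size correction, and the function name, data layout, and allele frequencies are all hypothetical.

```python
import math
from typing import Dict, List

# One locus: allele size (in bp) -> frequency in the population.
LocusFreqs = Dict[int, float]


def nei_standard_distance(pop_x: List[LocusFreqs], pop_y: List[LocusFreqs]) -> float:
    """Nei's (1972) standard genetic distance, D = -ln(Jxy / sqrt(Jx * Jy)).

    Jx, Jy and Jxy are sums over loci of the within- and between-population
    allele identities; dividing each by the number of loci would cancel in
    the ratio, so plain sums are used.
    """
    if not pop_x or len(pop_x) != len(pop_y):
        raise ValueError("both populations need the same non-empty list of loci")
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(q * q for q in fy.values())
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical two-locus allele frequencies for two populations.
pop_a = [{100: 0.5, 102: 0.5}, {200: 1.0}]
pop_c = [{100: 0.5, 104: 0.5}, {200: 1.0}]

print(round(nei_standard_distance(pop_a, pop_c), 4))
```

A matrix of such pairwise values is the kind of input that a neighbor-joining program takes; bootstrap replicates would be produced by resampling loci before recomputing the matrix.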

RESULTS

Cross-species amplification of microsatellite primers

Results from a pilot study presented in Chapter 2 indicated that all seven A. alluaudi microsatellite markers used in this study successfully amplified each of the seven other haplochromine cichlid species from the LVR (Table 2.5). The present expanded study on cross-species amplifications of microsatellite markers confirms the preliminary finding. All seven Astatoreochromis alluaudi microsatellite markers successfully amplified genomic DNA from all 25 species examined, including one representative Lake Malawi species, Astatotilapia calliptera. The average amplification success rate was 96.5%, ranging from 93.5% (OSU19d) to 99.2% (OSU09d) (Table 4.3). The average amplification success rate for the other four haplochromine cichlid markers was slightly lower [92.6%, ranging from 82.7% (DXTUCA-14) to 98.5% (DXTUCA-8)]. The average amplification success rate for the three tilapia microsatellite markers was the lowest (average 90.1%, ranging from 86.3% to 96.8%). In general, the lack of scorable amplifications for markers DXTUCA-3 and DXTUCA-14 was primarily due to unsizable PCR products or multiple atypical microsatellite PCR bands. For other markers, complete failure of amplification was the common cause of unscorable amplifications. Over all 14 microsatellite markers, the average amplification success rate per species was 94.0%, ranging from 86.4% (Thoracochromis pharyngalis) to 100.0% (Paralabidochromis sp.).

Microsatellite variation within species

All 14 microsatellite markers were polymorphic for almost all the species examined (Table 4.3). Among the 350 species-marker combinations (25 species x 14 markers), only one was not scorable (DXTUCA-14 for Harpagochromis squamipinnis), and only 11 were monomorphic. Locus OSU12t was monomorphic for five species, OSU21d was monomorphic for two species, and four markers (OSU09d, UNH142, UNH169, and DXTUCA-15) were monomorphic for one species each. Among the species, Astatoreochromis alluaudi was monomorphic for three markers, Prognathochromis venator was monomorphic for two markers, and six other species were monomorphic for one marker each.

On average, 45.3 chromosomes (or about 23 individuals) per species were scored per marker. Estimated genetic variability for each microsatellite marker was generally high (Table 4.3). The average number of observed alleles was 9.9 per species per marker, the average observed heterozygosity was 0.592, and the average expected heterozygosity (unbiased gene diversity) was estimated to be 0.665. Among the 25 species, P. venator had the lowest average number of observed alleles (6.6), and A. alluaudi had the lowest average unbiased gene diversity (0.532). Among the 14 markers, OSU12t appeared to be the least variable, with the lowest average number of observed alleles (2.0), the lowest average observed heterozygosity (0.172), and the lowest average expected heterozygosity (0.190). However, locus OSU12t did not have the lowest total number of observed alleles (7 compared with 5 for OSU21d). OSU20d was the most variable marker, with the highest average number of alleles (19.2), the highest average expected heterozygosity (0.942), and the highest total number of observed alleles (69). However, it did not have the highest observed heterozygosity (0.714 compared with 0.929 for DXTUCA-3). A total of 517 alleles were observed for the 14 markers in the 25 species. The average total number of observed alleles per marker was 36.7, with 11 of the 14 markers having at least 30 alleles observed per marker. Allele frequency data for all the marker-species combinations are presented in Appendix E.
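The per-locus variability statistics reported above can be sketched in Python as follows. The sketch assumes diploid genotypes recorded as pairs of allele sizes and uses the usual unbiased gene diversity estimator, He = (2N / (2N - 1)) (1 - sum of squared allele frequencies), where 2N is the number of chromosomes scored. The function name and the example genotypes are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Genotype = Tuple[int, int]  # the two allele sizes (bp) of one diploid individual


def locus_variability(genotypes: List[Genotype]) -> Tuple[int, int, float, float]:
    """Return (2N, number of alleles, observed heterozygosity, unbiased gene diversity)."""
    if not genotypes:
        raise ValueError("at least one genotype is required")
    alleles = [allele for genotype in genotypes for allele in genotype]
    n_chrom = len(alleles)  # 2N: chromosomes scored
    counts = Counter(alleles)
    # Observed heterozygosity: fraction of individuals with two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    # Unbiased gene diversity with the 2N / (2N - 1) small-sample correction.
    sum_sq = sum((c / n_chrom) ** 2 for c in counts.values())
    he = n_chrom / (n_chrom - 1) * (1.0 - sum_sq)
    return n_chrom, len(counts), ho, he


# Hypothetical genotypes for four individuals at one locus.
sample = [(100, 102), (100, 100), (102, 104), (100, 102)]
n_chrom, n_alleles, ho, he = locus_variability(sample)
print(n_chrom, n_alleles, round(ho, 3), round(he, 3))
```

An observed heterozygosity well below the gene diversity at a locus is the kind of deficit that the text later treats as a possible sign of null alleles.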

Phylogenetic analyses of haplochromine cichlids based on 11 microsatellite markers

A common concern about the quality of data obtained in cross-species applications of microsatellite markers is the presence of null alleles (see Chapter 1). If null alleles exist at a very high frequency, a large number of individuals that are in fact heterozygotes carrying one scorable allele and one null allele would be scored as false homozygotes. In such cases, the frequency of the scorable alleles may be overestimated, which may in turn affect phylogenetic analyses. Strict confirmation of the presence of null alleles relies on designing new PCR primers using the flanking sequences outside the original PCR primers and sequencing the newly amplified PCR products. However, observation of a very low amplification success rate and/or a significant deficiency in observed heterozygosity compared with the expected heterozygosity would be good indications of the presence of null alleles.

The presence of null alleles is probably not a serious problem for the present study. This conclusion is reached because of the high amplification success rates for each of the 14 markers (Table 4.3) and because previous studies suggest that the majority of the haplochromine cichlid species in the LVR are very similar genetically (Meyer et al., 1990; Sage et al., 1984). Nevertheless, to minimize the potential effect of null alleles on phylogenetic analyses, the first set of phylogenetic analyses used only microsatellite markers that met at least one of the following three criteria in every species: 1) the marker had an amplification success rate of at least 60%; 2) the marker was scored successfully in at least 15 individuals; 3) the marker did not show a significant reduction of observed heterozygosity compared with the expected heterozygosity. Based on these criteria, three markers (UNH142, UNH149 and DXTUCA-14) were excluded from the following initial phylogenetic analyses because they did not meet any of the three criteria for at least one species each (UNH142 for Thoracochromis pharyngalis; UNH149 for Astatotilapia elegans; and DXTUCA-14 for H. squamipinnis, Harpagochromis “slick”, and Pseudocrenilabrus multicolor victoriae) (Table 4.3).
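The marker-selection rule described above can be written as a small filter. In this Python sketch a marker is retained only if, in every species, it satisfies at least one of the three criteria. The class and function names, the marker and species labels, and all the numbers are hypothetical, and the heterozygote-deficit flag is supplied as an input because the significance test is not specified in this section.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MarkerStats:
    success_rate: float  # amplification success rate in one species (0 to 1)
    n_scored: int        # individuals scored successfully in that species
    het_deficit: bool    # True if Ho is significantly lower than He


def passes(stats: MarkerStats) -> bool:
    """A marker passes in one species if it meets at least one criterion."""
    return stats.success_rate >= 0.60 or stats.n_scored >= 15 or not stats.het_deficit


def retained_markers(table: Dict[str, Dict[str, MarkerStats]]) -> List[str]:
    """Keep markers that pass in every species (table: marker -> species -> stats)."""
    return [
        marker
        for marker, by_species in table.items()
        if all(passes(stats) for stats in by_species.values())
    ]


# Hypothetical values for two markers in two species.
example_table = {
    "markerA": {
        "species1": MarkerStats(0.95, 22, False),
        "species2": MarkerStats(0.55, 12, False),  # passes on criterion 3 only
    },
    "markerB": {
        "species1": MarkerStats(0.90, 20, False),
        "species2": MarkerStats(0.40, 9, True),    # fails all three criteria
    },
}

print(retained_markers(example_table))
```

Under this reading a single species failing all three criteria is enough to exclude a marker, which matches the stated reason for dropping three of the 14 markers.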

All the pairwise genetic distance matrices for the three distance measures are presented in Appendix F. The three distance methods produce trees with slightly different branching patterns and clusters, but with considerable similarities. Furthermore, except for branches leading to a few basal outgroup species, branch lengths are generally small on all phylogenetic trees. An unrooted NJ phylogenetic tree based on Nei’s standard genetic distances of the complete data set for 11 markers (excluding UNH142, UNH149 and DXTUCA-14) is shown in Figure 4.1 and Figure 4.2 (topology only). On the tree, Pseudocrenilabrus multicolor victoriae is the outgroup (basal group) for all the other species. The tree shows subsequent branching by Thoracochromis pharyngalis, Astatoreochromis alluaudi, and two Lake Edward species. The remaining haplochromine cichlid species form three monophyletic groups. The basal monophyletic group (Group I) includes three non-congeneric species from Lake Edward. The second monophyletic group (Group II) includes four species (two Harpagochromis species, and one species each from the genera Prognathochromis and Psammochromis). A total of 12 species belong to the largest monophyletic group (Group III), including five species from Lake Victoria, four species from the Lake Kyoga system (also including Nawampasa), two species from the Lake Nabugabo system (also including Lakes Kayugi and Kayanja), and one species from Lake Malawi. Five of the 12 species belong to the genus Astatotilapia, and four species belong to the genus Paralabidochromis.
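For readers unfamiliar with Nei’s standard genetic distance, the following minimal sketch shows how it can be computed from allele frequencies as D = -ln(Jxy / sqrt(Jx Jy)), where Jx, Jy and Jxy are sums over loci of the within- and between-population allele frequency products. It assumes the uncorrected form of the distance; whether a sample-size correction was applied in the analyses above is not stated here. The function name and the frequencies are hypothetical.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between two populations.

    freqs_x, freqs_y: lists with one dict per locus, mapping allele -> frequency.
    """
    jx = jy = jxy = 0.0
    for px, py in zip(freqs_x, freqs_y):
        jx += sum(p * p for p in px.values())
        jy += sum(p * p for p in py.values())
        jxy += sum(p * py.get(allele, 0.0) for allele, p in px.items())
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


# Two hypothetical species scored at two loci (allele sizes in base pairs).
species_a = [{150: 0.5, 152: 0.5}, {200: 1.0}]
species_b = [{150: 0.5, 154: 0.5}, {200: 1.0}]
print(round(nei_standard_distance(species_a, species_b), 4))  # prints: 0.1823
```

A matrix of such pairwise values is the input to a neighbor-joining tree; the tree-building step itself is not sketched here.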

On the unrooted NJ phylogenetic tree based on allele sharing distances (Figure 4.3), the basal nodes are similar to those in Figure 4.2 except that the sole Lake Malawi representative, Astatotilapia calliptera, clusters with Astatoreochromis alluaudi. For the remaining species, there are two monophyletic groups. Group I includes five non-congeneric species exclusively from Lake Edward. Group II includes a total of 15 species. Similar to Group III on the previous tree (Figure 4.1 and Figure 4.2), all except one of the 15 species are from the Lake Victoria basin, including Lake Victoria (5 species), the Lake Kyoga system (6), and the Lake Nabugabo system (3). The only outsider species (Astatotilapia aenocolor) is from Lake Edward.

For the unrooted NJ phylogenetic tree based on the stepwise weighted genetic distances (Figure 4.4), Pseudocrenilabrus multicolor victoriae is again the outgroup, followed by Astatoreochromis alluaudi and the single Lake Malawi species. Five Lake Edward species sequentially come off from the basal nodes after the three outgroup species. For the remaining species, there are two monophyletic groups. The first group (Group I) includes three species from Lake Edward, two species from the Lake Kyoga system, and one species from the Lake Nabugabo system. Within this group, the two species from the genus Harpagochromis form a cluster. The second monophyletic group (Group II) includes 11 species from Lake Victoria, the Lake Kyoga system, and the Lake Nabugabo system. Within this group, the two species from the genus Yssichromis form a cluster.

Phylogenetic analyses of haplochromine cichlids based on 14 microsatellite markers

Computer simulation studies have demonstrated a positive correlation between the number of microsatellite markers used and the reliability of a phylogeny inferred from microsatellite markers (Takezaki & Nei, 1996). In the phylogenetic analyses presented above, three microsatellite markers were excluded because of questionable data for only five of the possible 75 (25 × 3) species-marker combinations. Moreover, the allele frequency distributions for the five excluded species-marker combinations are similar to other allele frequency distributions (Appendix E). Therefore, to see whether including data from the three additional markers improves the tree topologies, phylogenetic analyses were conducted based on allele frequency data from all 14 microsatellite markers for 24 species, with Harpagochromis squamipinnis excluded because it could not be scored for locus DXTUCA-14 (Table 4.3).

An unrooted NJ phylogenetic tree based on Nei’s genetic distances of 14 markers is shown in Figure 4.5. Compared with Figure 4.2, the three deepest basal group species are identical. The major differences are that 1) Astatotilapia calliptera from Lake Malawi is an outgroup species for all the LVR cichlid species except the three deepest outgroup species; and 2) congeneric populations or species form clusters: the two populations of Astatotilapia velifer form a cluster, as do the two Lake Victoria species of Yssichromis and three members of Paralabidochromis.

The NJ phylogenetic tree based on the allele sharing distances of 14 markers (Figure 4.6) is similar to the 11 marker tree in Figure 4.3. The major differences are that 1) on the 14 marker tree, clusters form between the two populations of A. velifer, between the two congeneric Lake Victoria Yssichromis species, between the two Astatotilapia species from the Lake Kyoga system, and between three species of Paralabidochromis; and 2) Y. pappenheimi is not embedded in the Lake Edward exclusive species group.

The stepwise weighted genetic distance measure seems to show the biggest alteration in the overall tree topology when data for the three additional markers were included (Figure 4.7 vs. Figure 4.4). The two identifiable monophyletic groups on the 11 marker tree have become three monophyletic groups on the 14 marker tree. The larger non-Lake Edward monophyletic species group (Group II, 11 species) on the 11 marker tree becomes much smaller on the 14 marker tree (seven species, with two A. velifer populations forming a cluster).

To evaluate the reliability of the analyses presented above, consensus trees were constructed based on bootstrapping with 100 replicates. Figure 4.8 shows a consensus tree based on Nei’s genetic distance. The topology of the consensus tree is very similar to the neighbor-joining tree from the observed data (Figure 4.1 and 4.2). However, bootstrap values for most branches are under 40%, indicating that the tree is poorly resolved.
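The idea behind a bootstrap value can be illustrated with a simplified sketch. It assumes that loci are the units resampled with replacement, and instead of building a full consensus tree it only counts how often two taxa are each other's nearest neighbour across replicates. The function names, the toy distance and the data are hypothetical; this is a much reduced stand-in for the consensus-tree procedure, not a description of the software used.

```python
import random


def bootstrap_pair_support(data, pair, distance, replicates=100, seed=0):
    """Fraction of bootstrap replicates in which the two taxa in `pair`
    are each other's nearest neighbour.

    data: dict taxon -> list of per-locus allele-frequency dicts.
    distance: function taking two such lists and returning a number.
    """
    rng = random.Random(seed)
    taxa = list(data)
    n_loci = len(next(iter(data.values())))
    a, b = pair
    hits = 0
    for _ in range(replicates):
        # Resample loci with replacement; the same loci are used for all taxa.
        idx = [rng.randrange(n_loci) for _ in range(n_loci)]
        sample = {t: [data[t][i] for i in idx] for t in taxa}

        def nearest(t):
            others = (u for u in taxa if u != t)
            return min(others, key=lambda u: distance(sample[t], sample[u]))

        if nearest(a) == b and nearest(b) == a:
            hits += 1
    return hits / replicates


def overlap_distance(x, y):
    """Toy distance: one minus the mean shared allele frequency per locus."""
    shared = [
        sum(min(p, py.get(allele, 0.0)) for allele, p in px.items())
        for px, py in zip(x, y)
    ]
    return 1.0 - sum(shared) / len(shared)


# Hypothetical data: taxa A and B are identical, C shares no alleles with them.
toy = {
    "A": [{1: 1.0}, {5: 1.0}, {9: 1.0}],
    "B": [{1: 1.0}, {5: 1.0}, {9: 1.0}],
    "C": [{2: 1.0}, {6: 1.0}, {8: 1.0}],
}
print(bootstrap_pair_support(toy, ("A", "B"), overlap_distance))  # prints: 1.0
```

With real data, a grouping supported in fewer than 40% of replicates changes with the particular set of loci drawn, which is why low values are read as weak resolution.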

DISCUSSION

Success of cross-species amplifications of microsatellite primers

The present study confirms that microsatellite markers are indeed useful for cross-species amplifications among closely related species of East African haplochromine cichlids. Overall, 94.0% of the amplifications produced scorable PCR products. The results also confirm that the success rates for cross-species amplifications are negatively correlated with the genetic distances between the source species (in which the microsatellite markers were developed) and the target species (to which cross-species amplification is applied).

As a group, the three markers developed from the most distantly related non-haplochromine cichlid species, the Nile tilapia (Oreochromis niloticus), have the lowest amplification success rates, with an average of 90.1%, compared to 96.5% for the seven markers from Astatoreochromis alluaudi, and 92.6% for the four markers from two other East African haplochromine cichlid species (Table 4.3). This study demonstrated that most microsatellite markers developed in one haplochromine cichlid species are indeed useful for many other haplochromine cichlid species. Therefore, development of microsatellite markers for each individual species is not necessary for cross-species applications, such as population genetic analysis and phylogenetic reconstruction of Lake Victoria haplochromine cichlids.
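The group averages quoted above can be reproduced from the per-marker amplification success rates in the Average rows of Table 4.3, as the following sketch shows. Two points are assumptions rather than statements from the table: the full marker names are expansions of the abbreviated column headings, and the assignment of the UNH, OSU and DXTUCA prefixes to the Nile tilapia, Astatoreochromis alluaudi and the two other haplochromine source species, respectively, is inferred from the number of markers in each group.

```python
# Mean amplification success rate per marker across the 25 species
# (the "%" entries of the Average rows of Table 4.3).
success = {
    "OSU09d": 0.992, "OSU12t": 0.971, "OSU16d": 0.986, "OSU19d": 0.935,
    "OSU19t": 0.959, "OSU20d": 0.941, "OSU21d": 0.969,
    "UNH142": 0.872, "UNH149": 0.863, "UNH169": 0.968,
    "DXTUCA-14": 0.827, "DXTUCA-15": 0.981, "DXTUCA-8": 0.985,
    "DXTUCA-3": 0.909,
}


def group_means(rates):
    """Average the success rates of markers sharing a name prefix."""
    groups = {}
    for marker, rate in rates.items():
        prefix = "DXTUCA" if marker.startswith("DXTUCA") else marker[:3]
        groups.setdefault(prefix, []).append(rate)
    return {g: sum(v) / len(v) for g, v in groups.items()}


means = group_means(success)
overall = sum(success.values()) / len(success)
for group in ("UNH", "OSU", "DXTUCA"):
    print(f"{group}: {100 * means[group]:.2f}%")
print(f"Overall: {100 * overall:.2f}%")
```

To one decimal place the results agree with the 90.1%, 96.5%, 92.6% and 94.0% figures given in the text.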

Microsatellite variation within haplochromine cichlid species

Although previous studies using various genetic markers indicated that haplochromine cichlid species from the LVR are genetically “depauperate” (Meyer et al., 1990; Sage et al., 1984), the present study clearly demonstrated that there is a substantial amount of genetic variation at microsatellite loci within each of the Lake Victoria cichlid species (Table 4.3). This is most likely a result of the higher mutation rates of microsatellite markers compared to either mtDNA or allozyme markers (see Chapter 1). The fact that considerable amounts of microsatellite variation were revealed in Lake Victoria cichlid species is consistent with a recent study by Nagl et al. (1998), in which high levels of nucleotide substitution mutations were observed in a few Lake Victoria haplochromine cichlid species.

Among all the 25 species, Prognathochromis venator appears to be the least variable (Table 4.3). Its average number of observed alleles per marker (6.6) is the smallest, even though its sample size (an average of 75.6 chromosomes scored per marker) is much higher than the overall average (45.3). P. venator is endemic to the Lake Nabugabo system, which is much smaller than other satellite lake systems of Lake Victoria. Furthermore, the P. venator samples used in the present study were taken from Lake Kayanja, a very small lake in the Nabugabo system. Therefore, it is possible that the P. venator population sampled in the present study has a small effective population size and/or experienced a historical population bottleneck due to a small founder population, either of which would result in lower levels of microsatellite genetic variation.

Correlation between allele sizes and genetic variability among haplochromine cichlids

Previous studies have demonstrated that within a species, nonhomologous microsatellite markers with longer repeat motifs usually have higher genetic variability than markers with shorter repeat motifs. Such a positive correlation has rarely been documented for homologous microsatellite markers among species (but see Harr et al., 1998). In the present study, the success of the 14 microsatellite markers in amplifying genomic DNA from almost all of the 25 species allows examination of the correlation between repeat size and variation for homologous microsatellite markers across different species.

Among the 14 microsatellite markers, two markers (OSU12t and OSU21d) showed very low levels of genetic variation in all 25 species and three markers (OSU19t, DXTUCA-3 and DXTUCA-14) showed high levels of genetic variation in all the species (Table 4.3; Appendix E). Among the remaining nine markers, a clear positive correlation exists between allele size (repeat length) and genetic variability (Table 4.4; also refer to Table 4.3 and Appendix E). In these cases, for each homologous microsatellite marker, species in which a larger average allele size (or longer repeat motifs) was observed usually have higher genetic variability (a higher number of observed alleles and higher gene diversity). Such a positive correlation for homologous markers among species is consistent with the argument that longer repeat motifs tend to have higher mutation rates.

Phylogenetic inferences for haplochromine cichlids

Phylogenetic reconstruction of the evolutionary history of Lake Victoria haplochromine cichlid species has been a major challenge for evolutionary biologists. One major problem has been a lack of genetic markers with appropriate levels of variation (Meyer et al., 1990; Sage et al., 1984). Given that microsatellite markers, as a group, revealed tremendous amounts of genetic variation in each of the 25 haplochromine cichlid species examined in this study, it is logical to believe that microsatellite markers might be more useful for phylogenetic analysis of Lake Victoria haplochromine cichlids than other genetic markers.

The present study, which uses microsatellite markers for phylogenetic inferences across various species, represents the most extensive study of its kind. It greatly extends both the number of markers used and the number of species examined. Phylogenetic analyses did show that, in sharp contrast to conventional genetic markers, microsatellite markers, as a group, contain important phylogenetic signals. However, the branch lengths estimated between divergence events are very short. This is consistent with the very recent divergence of the large number of species that make up the present-day members of the East African cichlid species flocks.

Overall, the present phylogenetic studies bear the following implications:

Placement of divergent taxa

i. On all phylogenetic trees (Figures 4.1-4.7), it is clear that Pseudocrenilabrus multicolor victoriae is the most divergent taxon of the haplochromine cichlid species examined. It appears to have differentiated even from other divergent types, including Astatoreochromis alluaudi and the sole Lake Malawi representative, Astatotilapia calliptera. P. multicolor victoriae has very different allele frequency distributions from other haplochromine cichlid species used in this study (Appendix E). At several loci (e.g., DXTUCA-8), it has a set of alleles that do not overlap with those in any other species. The basal outgroup status of P. multicolor victoriae is consonant with an earlier study using the nuclear ribosomal RNA internal transcribed spacer (ITS-1) (Booton et al., 1999). It is also consistent with the observation that P. multicolor victoriae is one of the few widespread haplochromine cichlid species in the LVR.

ii. Consistent with previous studies using both nuclear markers (Mayer et al., 1998; Sage et al., 1984) and mitochondrial DNA markers (Meyer et al., 1990), Astatoreochromis alluaudi is an outgroup for other haplochromine species from both Lake Victoria and Lake Malawi. Like P. multicolor victoriae, A. alluaudi is also a widespread species in the LVR.

iii. Although the exact placement of Astatotilapia calliptera is uncertain from the 11 marker trees (Figures 4.1-4.4), the 14 marker trees (Figures 4.5-4.7) suggest that this Lake Malawi haplochromine is an outgroup of all LVR haplochromine species except the two or three deepest LVR outgroup species. Such an outgroup status is also consistent with other studies which show that members of the Lake Malawi cichlid species flock may represent a sister group to the LVR lineage (Meyer et al., 1990).

iv. Although the exact phylogenetic position of Thoracochromis pharyngalis is unclear because of the inconsistency among various trees, its status as a divergent taxon is established in all the phylogenetic analyses. Together, the identification of divergent haplochromine species using microsatellite markers is generally consistent with that based on other genetic markers, suggesting the effectiveness of microsatellite markers in phylogenetic inferences.

Taxonomic relationships

The 25 species used in the present study represent 11 genera, in seven of which congeners have been examined. When using only 11 microsatellite markers, it appears that none of the three genetic distance measures recovers congeneric relationships very effectively (Figures 4.1-4.4). Although some of the congeneric pairs of Astatotilapia, Harpagochromis, Paralabidochromis, and Yssichromis were recovered by at least one distance measure, in most cases not all congeners are included in a single clade. For example, Yssichromis pappenheimi never clustered with either of its two congeners, even though the latter did form a cluster with one distance measure (Dsw, Figure 4.4). Among the 25 species examined, seven species are from the genus Astatotilapia, but none of the distance measures identifies a monophyletic group that includes more than two Astatotilapia congeners. It is surprising that the two populations of Astatotilapia velifer never formed a cluster on the 11 marker trees.

When using 14 microsatellite markers, however, morphologically defined taxonomic relationships were recovered with more consistency (Figures 4.5-4.7). On all three trees, the two populations of Astatotilapia velifer always form a cluster, as do the two populations of Astatoreochromis alluaudi. Moreover, the congeneric relationships of Yssichromis fusiformis and Y. laparagramma were established on each of the three trees. Finally, on two of the three trees, the congeneric relationships of Paralabidochromis beadlei, P. “rock kribensis”, and P. sp. were also recovered.

The inability of microsatellite markers to recover all congeneric relationships may be due to one or more of the following factors. First, some of the congeneric relationships based on morphological data may not be appropriate. In other words, the current genus classification of Lake Victoria cichlid species is sometimes not valid. For example, among the 25 species, seven taxa are in fact undescribed species (Table 4.1). Therefore, their genus placement is no better than tentative. Moreover, it is a common practice in haplochromine cichlid taxonomy to place a species in the genus Astatotilapia when its genus status is uncertain. In other words, Astatotilapia may well be an artificial genus.

Secondly, it has been shown that the genus Astatotilapia is not monophyletic (Lippitsch, 1993). Instead, it consists of at least two separate lineages, a lacustrine lineage and a riverine lineage (Lippitsch, 1993). Similarly, recent morphological studies suggest that the genus Paralabidochromis may not be monophyletic either, with a possible second lineage being closer to the genera Neochromis and Haplochromis (Kaufman, personal communications).

Finally, theoretical studies and empirical analyses of microsatellite data suggest that both the number of markers and the sample sizes must be quite large to ensure a perfect tree topology using microsatellite markers, primarily because of their high levels of genetic variability (Takezaki & Nei, 1996). The obvious improvement in congener recovery on the 14 marker trees, compared with the 11 marker trees, supports such an argument. In the present study, although the average sample sizes are moderate (23 individuals per species), there are several species in which fewer than 15 individuals were sampled. Moreover, although the present study represents the largest number of loci applied to any similar phylogenetic analysis, the number of markers used in this study is still smaller than that recommended from theoretical studies (Takezaki & Nei, 1996). In contrast to the lack of genetic variation for many conventional genetic markers, some of the microsatellite markers that were used in the present study may actually be too variable for phylogenetic reconstruction, as they can potentially lead to high levels of homoplasy.

Therefore, in future studies, in addition to using a larger number of microsatellite markers as well as larger sample sizes, efforts should also be made to employ microsatellite markers with lower intraspecific variability to ameliorate the potential homoplasy problem (Harr et al., 1998).

Regional speciation and differentiation

On all the phylogenetic trees presented above, there is always a large species cluster represented by species exclusively from the Lake Victoria basin (including Lake Victoria, the Lake Kyoga system and the Lake Nabugabo system). The only exception is that on the 14 marker tree based on the allele sharing distances, a Lake Edward endemic, Yssichromis pappenheimi, is embedded in the large Lake Victoria basin species group (Figure 4.6). Similarly, multiple species from Lake Edward, almost all of which are non-congeneric, are either recovered as a monophyletic group (when Ds or DAS were used), or as a group of species that sequentially come off from the divergent taxa (when Dsw is used).

The clustering of non-Lake Edward species and the embedding of multiple non-congeneric Lake Edward haplochromine species are analogous to results obtained in an earlier study using randomly amplified polymorphic DNA (RAPD) markers (Booton, 1995). In that study, Booton (1995) showed that species were grouped more often by their geographic locations (Western vs. Eastern) within Lake Victoria than by their genus placement based on morphological data. If such a pattern continues to be uncovered in expanded phylogenetic analyses using genetic markers, it would strongly suggest that many of the endemic haplochromine cichlid species might have originated by parallel evolution. Such a phenomenon is of course not new to the East African cichlid species flocks. Kocher et al. (1994) elegantly demonstrated that morphologically similar species pairs from Lake Malawi and Lake Tanganyika did not cluster. Instead, morphologically dissimilar non-congeneric species from within a lake were genetically more similar to each other than to their counterparts from the other lake. However, confirmation of convergent evolution in the LVR requires additional genetic data on more congeners from different lakes.

The clustering of Lake Edward species could also have different causes. It could be explained by a recent common origin of those species. Alternatively, it may reflect some hybridization among Lake Edward endemic species. The fact that the DAS genetic distance measure reveals stronger Lake Edward clustering than the other two distance measures favors the latter scenario. This is because the DAS genetic distance measure is based on the percentage of alleles shared by each species pair, and therefore can detect lower levels of hybridization more easily than the other two genetic distance measures.

Invasion of ancient Lake Edward species into the Lake Victoria basin

On all the phylogenetic trees, after the outgroup taxa Pseudocrenilabrus multicolor victoriae and Astatoreochromis alluaudi, there are always several species exclusively from Lake Edward radiating from basal nodes, either clustered as a monophyletic group (for DAS), or sequentially coming off from the deeper node (for Dsw), or a combination of the two (for Ds). In contrast, there is always a monophyletic group that is derived from a more recent ancestral node of the trees and includes species exclusively from the Lake Victoria basin. Such tree topologies strongly suggest that most of the present-day haplochromine cichlid species from the Lake Victoria basin are derived from ancestral species from the Lake Edward system after the most recent refilling of Lake Victoria.

This argument is well supported by a recent study by Kaufman et al. (Kaufman, personal communications). By examining the geography and histories of several major satellite lakes of Lake Victoria, they concluded that if there were a refugium for haplochromine fauna during the Lake Victoria drying-up event, Lake Edward would be the primary suspect because it would have been the only regional lake that remained wet during the period when Lake Victoria was drying up.

CONCLUSIONS

The present study demonstrates that considerable amounts of microsatellite genetic variation are present in all of the cichlid species from the LVR. As genetic markers for the phylogenetic analysis of Lake Victoria cichlid species, microsatellites are sufficiently variable to recover both regional relationships and many (but not all) taxonomic relationships defined by morphological characters. Furthermore, the phylogenetic analysis raises the very interesting possibility that Lake Edward might have been the refugium of the haplochromine cichlid fauna during the Last Glacial Maximum, when Lake Victoria was almost completely dry, and that many of the present-day cichlid species in the Lake Victoria basin were derived from a group of ancestral species in Lake Edward. Finally, although microsatellite markers, as a group, bear phylogenetic signals, the small branch lengths (Figure 4.1) and low bootstrap values (Figure 4.8) seen in phylogenetic trees for the LVR cichlids suggest that the branching order obtained in this study may not be highly stable (reliable). Therefore, a robust phylogeny of the LVR cichlid species will probably require expanded studies using more taxa and more microsatellite markers, preferably those with slightly less variation than most of the markers used in the present study.

Species                                   Taxonomic   Sampling    Sample  Sampling year
                                          status      lake        size

Astatoreochromis alluaudi                 Described   Victoria    31      1992; 1993; 1995; 1996
                                                      Kyoga       22      1993
Astatotilapia aenocolor                   Described   Edward      20      1996
Astatotilapia calliptera                  Described   Malawi      17      1996
Astatotilapia elegans                     Described   Edward      20      1996
Astatotilapia “fat tooth”                 sp. nov.    Nawampasa   20      1996
Astatotilapia latifasciata                Described   Kyoga       35      1996
                                                      Nawampasa   4       1996
Astatotilapia nubila                      Described   Nawampasa   19      1995
Astatotilapia velifer                     Described   Kayugi      29      1993
                                                      Nabugabo    41      1993; 1994
Entochromis nigripinnis                   Described   Edward      14      1996
Haplochromis “ruby”                       sp. nov.    Nawampasa   19      1995
Harpagochromis “slick”                    sp. nov.    Kyoga       20      1994
Harpagochromis squamipinnis               Described   Edward      20      1996
Paralabidochromis beadlei                 Described   Nabugabo    20      1996
Paralabidochromis “black para”            sp. nov.    Kyoga       23      1995
Paralabidochromis “rock kribensis”        sp. nov.    Victoria    31      1993; 1994
Paralabidochromis sp.                     sp. nov.    Victoria    16      1995
Prognathochromis venator                  Described   Kayanja     40      1996
Psammochromis schubotzi                   Described   Edward      20      1996
Psammochromis “shovel mouth”              sp. nov.    Nawampasa   30      1995; 1996
Pseudocrenilabrus multicolor victoriae    Described   Nabugabo    20      1995
                                                      Kabeleka    5       1996
Thoracochromis pharyngalis                Described   Edward      20      1996
Thoracochromis wingatii                   Described   Edward      25      1996
Yssichromis fusiformis                    Described   Victoria    29      1994
Yssichromis laparagramma                  Described   Victoria    10      1993
Yssichromis pappenheimi                   Described   Edward      11      1994; 1996

Table 4.1: Sample sizes and sampling locations of the cichlid species.

Locus       Repeat motif                  Primer sequence (5’-3’)        Tm      Cycles

UNH142      (AC)21                        F: CTTTACGTTGACGCAGT           56°C    30
                                          R: GTGACATGCAGCAGATA
UNH149      (CA)2TA(CA)6TA(CA)9TA(CA)3    F: TTAAAACCAGGCCTACC           56°C    30
                                          R: GTTCTGAGCTCATGCAT
UNH169      (TG)9                         F: GCTCATTCATATGTAAAGGA        56°C    30
                                          R: TATTTTTTGGGAAGCTGA
DXTUCA-3    unknown                       F: ACCGAAAGAAAGAGCCT           55°C    30
                                          R: GAAGTTTGTTAGCTGGTCA
DXTUCA-8    unknown                       F: CTTGAAAACTGTCCTCAAA         56°C    30
                                          R: CAAAAGCAAGGGAATAAG
DXTUCA-14   unknown                       F: CAGAATAGAGCTTTGATGGAT       58°C    30
                                          R: GCACGCACGCACAAAC
DXTUCA-15   unknown                       F: GCTGTGTAATCCCAAACTCC        56°C    30
                                          R: GTATTTAGCTTTCCTCTGTGCT

Table 4.2: Repeat motifs and PCR conditions of the seven microsatellite markers developed in other laboratories.

Table 4.3: Summary of microsatellite genetic variability within 25 haplochromine cichlid species. 2N: number of chromosomes scored; n: number of alleles observed; He: expected heterozygosity (unbiased gene diversity); Ho: observed heterozygosity; %: amplification success rate.

120 Species 09d 121 16d 19d 191 20d 2ld 142 149 169 CA-14 CA-15 CA-8 CA-3 Average A. alluaudi 2N 104 104 104 104 106 102 104 94 86 76 92 90 92 86 96.0 n 24 1 4 24 15 33 2 1 20 3 27 1 5 20 12.9 Wc 0.949 0.000 0.197 0.940 0.870 0.965 0.038 0.000 0.930 0.392 0.958 0.000 0.275 0.934 0.532 //. 0.962 0.000 0.250 0.942 0.887 0.804 0.000 0.000 0.884 0.421 0.891 0.000 0.217 1.000 0.518 % 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.940 0.935 0.950 0.939 0.898 0.939 0.936 0.967 A. aenocolor 2N 40 40 38 38 38 32 40 32 34 40 14 40 40 38 36.0 n 3 1 20 21 14 19 3 8 9 2 10 7 4 17 9.9 Wc 0.347 0.000 0.956 0.962 0.878 0.935 0.405 0.768 0.836 0.467 0.945 0.690 0.509 0.946 0.689 0.350 0.000 0.895 0.842 0.579 0.625 0.250 0.750 0.353 0.400 0.714 0.750 0.600 0.895 0.572 % 1.000 1.000 0.950 0.950 0.950 0,842 1.000 0.800 0.850 1.000 0.350 1.000 1.000 0.950 0.903 X. calliptera 2N 34 34 34 34 32 34 34 32 32 34 28 34 34 34 33.0 n 3 2 4 13 8 19 1 5 6 4 14 2 10 14 7.5 to 0.348 0.513 0.736 0.902 0.708 0.952 0.000 0.740 0.784 0.570 0.947 0.059 0.756 0.922 0.636 0.412 0.353 0.765 0.882 0.563 0.882 0.000 0.938 0.688 0.412 0.857 0.059 0.588 0.941 0.600 % 1.000 1.000 1.000 1.000 0.941 1.000 1.000 0.941 0.941 1.000 0.875 1.000 1.000 1.000 0.978 /I. elegans 2N 40 40 40 36 36 32 38 34 22 40 28 40 40 32 35.6 n 4 1 19 13 8 18 3 12 9 2 15 7 7 12 9.3 0.568 0.000 0.959 0.916 0.617 0.960 0.495 0.891 0.835 0.492 0.952 0.673 0.713 0.925 0.714 //. 0.550 0.000 0.900 0.778 0.500 0.625 0.263 0.765 0.273 0.400 0.786 0.750 0.700 1.000 0.592 % 1.000 1.000 1.000 0.900 0.900 0.842 0.950 0.850 0.550 1.000 0.737 1.000 1.000 0.800 0.895 /4, “fat tooth” 2N 40 30 40 30 30 36 30 34 22 38 36 38 38 34 34.0 n 3 2 19 13 11 16 2 9 6 3 14 6 5 15 8.9 W. 0.145 0.186 0.951 0.910 0.770 0.956 0.287 0.888 0.684 0.198 0.921 0.734 0.680 0.950 0.661 //. 0.150 0.200 1.000 0.933 0.533 0.778 0.333 0.588 0.273 0.211 0.833 0.684 0.632 1.000 0.582 % 1.000 0.833 1.000 0.833 0.833 0.900 0.833 0.850 0.611 0.950 0.900 0.950 0.950 0.850 0.878

(Continued) Table 4.3, (Continued)

A. latifasciata 2N 78 68 76 58 72 66 66 64 62 68 62 68 78 54 67.0 n 3 2 23 20 23 21 2 9 14 3 18 8 5 15 11.9 H. 0.123 0.233 0.917 0.932 0.914 0.938 0.088 0.633 0.872 0.307 0.928 0.726 0.697 0.912 0.658 Ho 0.128 0.265 0.895 0.931 0.889 0.667 0.091 0.500 0.387 .0235 0.945 0.647 0.410 0.852 0.539 % 1.000 0.872 1.000 0.744 0.923 0.892 0.846 0.821 0.795 0.872 0.816 0.850 1.000 0.730 0.869 A. niibila 2N 38 38 38 38 38 38 38 36 38 38 38 38 38 38 37.9 n 4 2 23 17 19 18 2 5 9 2 19 8 4 16 10.4 Ho 0.245 0.193 0.974 0.953 0.962 0.932 0.149 0.748 0.822 0.235 0.943 0.772 0.459 0.939 0.666 Ho 0.263 0.211 0.789 1.000 0.947 0.947 0.158 0.667 0.895 0.263 0.684 0.789 0.632 0.947 0.657 % 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.947 1.000 1.000 1.000 1.000 1.000 1.000 0.996 A. velifer 2N 140 134 132 132 140 130 138 86 70 80 72 76 76 74 105.7 M n 7 4 27 28 25 28 5 14 15 4 24 6 6 17 15.0 to Ho 0.377 0.384 0.914 0.957 0.935 0.939 0.230 0.871 0.896 0.121 0.962 0.688 0.761 0.914 0.711 Ho 0.229 0.358 0.712 0.788 0.843 0.446 0.145 0.814 0.829 0.100 0.833 0.711 0.658 0.838 0.593 % 1.000 0.971 0.957 0.957 1.000 0.929 1.000 0.880 0.875 1.000 0.947 0.950 0.950 0.881 0.950 E. nigripinnis 2N 28 28 28 28 26 28 28 28 28 28 28 28 28 28 27.9 n 3 1 20 14 11 20 2 8 11 2 18 6 3 15 9.6 Ho 0.370 0.000 0.968 0.942 0.812 0.968 0.071 0.841 0.749 0.508 0.968 0.616 0.558 0.955 0.666 Ho 0.429 0.000 0.929 1.000 0.846 0.929 0.071 0.429 0.643 0.286 0.714 0.571 0.357 0.929 0.581 % 1.000 1.000 1.000 1.000 0.929 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.995 H. “ruby” 2N 36 38 38 38 34 36 38 38 34 32 36 36 38 38 36.4 n 6 2 26 16 21 22 2 9 9 2 18 6 5 15 11.3 Ho 0.352 0.102 0.976 0.942 0.939 0.970 0.341 0.792 0.788 0.121 0.952 0.837 0.486 0.916 0.680 Ho 0.389 0.105 1.000 1.000 0.706 0.667 0.316 0.895 0.706 0.125 0.944 0.889 0.474 0.737 0.639 % 0.947 1.000 1.000 1.000 0.944 1.000 1.000 1.000 0.944 0.842 0.947 0.947 1,000 1.000 0.970

(Continued) Table 4.3. (Continued)

H. squamminis

2N 40 40 38 38 40 38 40 40 32 40 - 40 40 30 35,4

n 2 1 16 14 7 18 3 11 8 2 - 4 6 15 7,6

0.224 0.000 0.923 0.930 0.809 0.950 0.273 0.891 0.792 0.409 - 0.672 0.654 0,943 0,652 Mo 0.150 0.000 0.684 0,789 0.850 0.842 0.200 0.850 0.875 0.550 - 0.450 0.600 0,800 0,588 % 1.000 1.000 0.950 0.950 1.000 0.950 1.000 1.000 0.800 1.000 - 1.000 1.000 0.750 0.886 H. “slick” 2N 40 40 40 34 40 38 40 40 40 40 8 40 40 40 35.4 n 2 2 16 15 6 17 2 6 6 3 7 5 8 15 7.6 //. 0.050 0.142 0.929 0.914 0.397 0.939 0.508 0.556 0.468 0,268 0.964 0.769 0.673 0,929 0.652 /Y. 0.050 0.200 0.900 1.000 0.350 0.789 0.400 0.550 0.300 0.300 1.000 0.800 0.550 0.950 0.588 % 1.000 1.000 1.000 0.850 1.000 0.950 1.000 1.000 1.000 1.000 0.200 1.000 1.000 1.000 0.886 P. “black para” 2N 44 46 46 46 42 40 46 44 46 38 46 44 42 46 44.0 N3 n 2 3 23 21 16 19 2 6 13 2 16 6 4 16 10.8 W 0.089 0.204 0.965 0.953 0.887 0.923 0.162 0.727 0.842 0.053 0.900 0.751 0.585 0.937 0.641 //. 0.091 0.174 0.913 1.000 0.714 0.850 0.187 0.727 0.609 0.053 1.000 0.909 0.619 0.826 0.612 % 0.957 1.000 1.000 1.000 1.000 0.952 1.000 0.957 1.000 0.826 1.000 0.955 0.905 1.000 0.968 P. beadlei 2N 40 34 40 30 38 28 32 24 28 40 38 40 40 30 34.4 n 4 2 21 14 13 15 2 8 9 2 14 6 4 13 9.1 M. 0.233 0.214 0.962 0.938 0.792 0.950 0.226 0.826 0.875 0.142 0.932 0.782 0.565 0.931 0.688 Mo 0.250 0.294 0.850 0.800 0.632 0.571 0.250 0.500 0.500 0.050 0.842 0.650 0.500 0,933 0.544 % 1.000 0.850 1.000 0.833 0.950 0.824 0.800 0.600 0.700 1.000 0.950 1.000 1.000 0,750 0.875 P. “rock kribensis” 2N 62 62 62 60 58 60 62 38 36 32 32 40 32 40 48.3 n 3 2 20 16 24 22 2 7 7 2 16 5 3 15 10.3 Mo 0.476 0.123 0.883 0.894 0.887 0.940 0.419 0.738 0.771 0.272 0.935 0.683 0.599 0,941 0,683 Mo 0.548 0.129 0.613 0.867 0.897 0.467 0.387 0.737 0.556 0.188 1.000 0.850 0.750 0,950 0,638 % 1.000 1.000 1.000 0.968 0.967 1.000 1.000 0.950 0.947 0.889 0.889 1.000 0.895 1.000 0,965

(Continued) Table 4.3. (Continued)

P. sp. 2N 32 32 32 32 32 32 32 32 32 30 32 32 32 32 31.9 n 4 2 16 14 16 16 2 7 5 2 11 6 6 18 8.9 H. 0.337 0.226 0.948 0.940 0.931 0.935 0.315 0.597 0.720 0.186 0.911 0.819 0.744 0.952 0.683 Ho 0.375 0.000 0.688 0.625 0.875 0.750 0.125 0.500 0.063 0.200 0.750 0.875 0.750 1.000 0.541 % 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 P. Venator 2N 80 80 80 78 80 80 80 72 38 78 76 78 78 80 75.6 n 1 2 8 8 3 19 2 4 13 1 12 4 3 12 6.6 0.000 0.481 0.768 0.767 0.344 0.910 0.486 0.696 0.893 0.000 0.861 0.646 0.641 0.869 0.597 Ho 0.000 0.425 0.675 0.923 0.375 0.750 0.400 0.750 0.421 0.000 0.605 0.615 0.513 0.900 0.525 % 1.000 1.000 1.000 0.975 1.000 1.000 1.000 0.900 0.475 0.975 0.950 0.975 0.975 1.000 0.945 P. schubollz 2N 40 38 32 34 34 38 38 24 34 40 28 40 40 30 34.9 n 2 3 15 16 11 14 3 16 10 3 19 4 7 14 9.6 Ho 0.409 0.104 0.946 0.925 0.886 0.922 0.286 0.971 0.742 0,569 0.971 0.706 0.642 0.943 0.715 Ho 0.450 0.105 0.750 0.824 0.412 0.474 0.316 0.583 0.706 0.750 1.000 0.500 0.500 1.000 0.602 % 1.000 0.950 0.842 0.850 0.850 0.950 0.950 0.600 0.850 1.000 0.700 1.000 1.000 0.750 0.878 P. “shovel mouth” 2N 56 58 60 58 60 56 58 56 56 60 60 60 60 60 58.4 n 7 2 25 17 9 26 2 9 6 2 21 6 4 14 10.7 Ho 0.688 0.034 0.953 0.938 0.709 0.957 0.494 0.668 0.605 0.345 0.944 0.676 0.466 0.873 0.668 Ho 0.321 0.034 0.967 1.000 0.633 0.786 0.345 0.714 0.357 0.233 0.733 0.767 0,467 0.933 0.592 % 0.947 0.967 1.000 0.967 1.000 0.933 0.967 0.933 0.933 1.000 1.000 1.000 1.000 1.000 0.975 P. multicolor victoriae 2N 48 48 50 50 48 44 50 50 48 50 16 50 50 50 45.3 n 19 3 5 8 9 17 1 5 25 22 9 5 20 16 11.7 Ho 0.901 0.551 0.630 0.641 0.657 0.920 0.000 0.729 0.965 0.944 0.900 0.559 0.945 0.929 0.704 Ho 0.917 0.583 0.600 0.600 0.542 0.727 0.000 0.840 1.000 0.840 0.625 0.480 1.000 1.000 0.730 % 0.960 0.960 1.000 1.000 0.960 0.880 1.000 1.000 0.960 1.000 0.320 1.000 1.000 1.000 0.931

Table 4.3. (Continued)

T. pharyngalis
2N 40 40 38 34 40 32 40 8 28 40 34 40 40 28 34.4
n 4 2 11 15 7 9 2 7 6 2 13 7 4 12 7.1
He 0.672 0.050 0.851 0.941 0.558 0.865 0.185 0.964 0.794 0.097 0.914 0.358 0.345 0.905 0.607
Ho 0.700 0.050 0.895 1.000 0.450 0.438 0.200 0.750 0.500 0.100 0.824 0.350 0.200 1.000 0.533
% 1.000 1.000 0.950 0.850 1.000 0.842 1.000 0.200 0.700 1.000 0.850 1.000 1.000 0.700 0.864

T. wingati
2N 50 50 50 44 46 42 50 30 36 50 24 50 50 36 43.4
n 4 2 21 18 12 23 2 7 12 2 11 6 9 14 10.2
He 0.479 0.115 0.951 0.939 0.802 0.970 0.115 0.818 0.903 0.510 0.924 0.725 0.691 0.910 0.704
Ho 0.280 0.120 0.800 1.000 0.609 0.667 0.120 0.533 0.389 0.360 1.000 0.800 0.680 0.833 0.585
% 1.000 1.000 1.000 0.880 0.920 0.840 1.000 0.724 0.783 1.000 0.480 1.000 1.000 0.720 0.881

Y. fusiformis
2N 60 58 58 58 60 60 58 58 54 52 56 58 58 56 57.4
n 6 2 23 20 26 23 2 12 3 2 23 6 7 17 12.3
He 0.324 0.216 0.958 0.951 0.898 0.958 0.034 0.799 0.237 0.434 0.953 0.815 0.581 0.934 0.649
Ho 0.267 0.172 0.862 0.724 0.800 0.633 0.034 0.822 0.185 0.308 0.893 0.862 0.517 0.964 0.575
% 1.000 0.967 1.000 0.967 1.000 1.000 0.967 1.000 0.931 0.897 1.000 1.000 1.000 1.000 0.981

Y. laparagramma
2N 20 18 20 18 18 20 18 18 18 18 18 18 18 18 18.4
n 4 2 12 10 12 13 2 9 2 2 11 6 3 12 7.1
He 0.595 0.503 0.932 0.922 0.948 0.958 0.000 0.804 0.471 0.111 0.935 0.699 0.569 0.594 0.671
Ho 0.700 0.333 0.700 0.667 0.667 0.800 0.000 0.667 0.444 0.111 0.667 0.667 0.667 1.000 0.578
% 1.000 0.900 1.000 0.900 0.900 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.971

Y. pappenheimi
2N 22 22 22 22 22 22 22 20 22 22 22 22 22 20 21.7
n 4 3 16 13 10 15 3 9 13 2 10 7 6 13 8.9
He 0.398 0.177 0.970 0.944 0.827 0.952 0.567 0.884 0.935 0.524 0.913 0.749 0.619 0.932 0.742
Ho 0.455 0.182 1.000 1.000 0.909 1.000 0.364 0.900 1.000 0.455 0.545 0.818 0.727 1.000 0.740
% 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.909 1.000 1.000 1.000 1.000 1.000 0.909 0.987

Table 4.3. (Continued)

Average
2N 50.1 48.8 49.4 49.6 48.4 46.7 48.8 41.3 39.0 44.2 38.5 45.6 45.8 42.1 45.3
n 5.1 2.0 17.3 15.9 13.5 19.2 2.2 8.1 9.8 3.1 15.4 5.4 5.9 14.9 9.9
He 0.388 0.190 0.885 0.916 0.789 0.942 0.247 0.754 0.768 0.331 0.935 0.648 0.609 0.928 0.665
Ho 0.377 0.172 0.801 0.877 0.680 0.714 0.194 0.671 0.553 0.294 0.808 0.626 0.572 0.929 0.592
% 0.992 0.971 0.986 0.935 0.959 0.941 0.969 0.872 0.863 0.968 0.827 0.981 0.985 0.909 0.940

Total n 44 7 47 48 56 70 6 31 54 30 41 11 35 37 36.9

Table 4.4: Correlation between allele size and genetic variability of homologous microsatellite markers among different species. 2N: number of chromosomes scored; n: number of alleles observed; He: unbiased gene diversity (Nei 1987). All numbers for grouped species are average values. Comparisons were made only for species in which at least 40 chromosomes were scored for a particular marker, to minimize sampling error. P. venator was excluded from this comparative analysis because it has lower genetic variability than the other species for almost all markers. Mean allele size was calculated as the sum, over all alleles, of the product of each allele's size and its frequency. For pooled species, it is the average of the mean allele sizes of all the species in the pool.

Marker     Species                   2N     Mean allele size (bp)  n     He

OSU09d     A. alluaudi               104    183.1                  24    0.949
           P. multicolor victoriae   48     170.3                  19    0.901
           15 other species          54.0   128.6                  3.7   0.347

OSU16d     A. alluaudi               104    72.3                   4     0.197
           P. multicolor victoriae   50     70.2                   8     0.641
           11 other species          58.5   112.2                  21.5  0.940

OSU19d     P. multicolor victoriae   50     89.6                   8     0.641
           8 other species           70.0   154.0                  20.5  0.938

OSU20d     P. multicolor victoriae   44     131.7                  17    0.920
           8 other species           69.5   175.5                  24.4  0.949

UNH142     A. alluaudi               94     147.0                  1     0.000
           P. multicolor victoriae   50     149.0                  5     0.729
           7 other species           52.6   169.7                  9.9   0.749

UNH149     A. alluaudi               86     143.2                  20    0.930
           P. multicolor victoriae   48     162.7                  25    0.965
           H. “slick”                40     120.5                  6     0.468
           P. “shovel mouth”         56     118.6                  6     0.605
           Y. fusiformis             54     113.6                  3     0.237
           3 other species           59.3   131.3                  14.0  0.870

UNH169     P. multicolor victoriae   50     183.7                  22    0.944
           13 other species          51.2   135.6                  2.5   0.350

DXTUCA-15  A. alluaudi               90     78.0                   1     0.000
           15 other species          48.4   87.1                   5.7   0.685

DXTUCA-8   P. multicolor victoriae   50     103.9                  20    0.945
           14 other species          52.6   79.3                   5.7   0.583
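The two summary statistics reported in Table 4.4 can be illustrated with a minimal Python sketch. It assumes each sample is simply a list of allele sizes in base pairs, one entry per chromosome scored, and it assumes the usual form of the unbiased gene diversity estimator, He = 2N/(2N - 1) x (1 - sum of squared allele frequencies). The function names and the sample data are hypothetical and are not taken from the dissertation.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele size: frequency} for a list of scored chromosomes."""
    counts = Counter(alleles)
    total = len(alleles)
    return {size: count / total for size, count in counts.items()}


def mean_allele_size(alleles):
    """Sum over alleles of (allele size x allele frequency)."""
    freqs = allele_frequencies(alleles)
    return sum(size * p for size, p in freqs.items())


def unbiased_gene_diversity(alleles):
    """Assumed estimator: 2N/(2N - 1) * (1 - sum of p_i squared)."""
    two_n = len(alleles)  # number of chromosomes scored
    if two_n < 2:
        raise ValueError("need at least two chromosomes")
    freqs = allele_frequencies(alleles)
    homozygosity = sum(p * p for p in freqs.values())
    return two_n / (two_n - 1) * (1.0 - homozygosity)


def pooled_mean_allele_size(samples):
    """Average of the per-species mean allele sizes in a pool."""
    means = [mean_allele_size(s) for s in samples]
    return sum(means) / len(means)


# Hypothetical data: allele sizes (bp) for 10 chromosomes at one marker.
sample = [100] * 4 + [102] * 4 + [106] * 2
print(mean_allele_size(sample))
print(unbiased_gene_diversity(sample))
```

For the hypothetical sample above, the mean allele size is 102.0 bp and the gene diversity is about 0.711. A monomorphic sample (n = 1) gives a gene diversity of exactly zero, as in the A. alluaudi rows for UNH142 and DXTUCA-15.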

[Figure 4.1: tree diagram not reproduced; scale bar from 0.0 to 0.1.]

Figure 4.1. An unrooted neighbor-joining tree based on Nei’s genetic distances of 11 markers.

[Figure 4.2: tree diagram not reproduced.]

Figure 4.2. An unrooted neighbor-joining tree based on Nei’s genetic distances of 11 markers (topology only).

[Figure 4.3: tree diagram not reproduced.]

Figure 4.3. An unrooted neighbor-joining tree based on allele sharing genetic distances of 11 markers (topology only).

[Figure 4.4: tree diagram not reproduced.]

Figure 4.4. An unrooted neighbor-joining tree based on the stepwise weighted genetic distances of 11 markers (topology only).

[Figure 4.5: tree diagram not reproduced.]

Figure 4.5. An unrooted neighbor-joining tree based on Nei’s genetic distances of 14 markers (topology only).

[Figure 4.6: tree diagram not reproduced.]

Figure 4.6. An unrooted neighbor-joining tree based on allele sharing genetic distances of 14 markers (topology only).

[Figure 4.7: tree diagram not reproduced.]

Figure 4.7. An unrooted neighbor-joining tree based on stepwise weighted genetic distances of 14 markers (topology only).

[Figure 4.8: tree diagram not reproduced; groups in the diagram are labeled LVB, Edward, and widespread.]

Figure 4.8: A consensus neighbor-joining tree based on Nei’s genetic distances of 14 markers. Numbers on branches are bootstrap values. Bootstrap values below 20% were not shown.

LITERATURE CITED

Aaltonen, L. A. et al., 1993. Clues to the pathogenesis of familial colorectal cancer. Science, 260: 812-816.

Amos, B., C. Schlotterer, and D. Tautz, 1994. Social structure of pilot whales revealed by analytic DNA profiling. Nature, 260: 670-672.

Aquilina, G. et al., 1994. A mismatch recognition defect in colon carcinoma confers DNA microsatellite instability and a mutator phenotype. PNAS, 91: 8905-8909.

Barel, C. et al., 1985. Destruction of fisheries in Africa’s lakes. Nature, 315: 19-20.

Beckmann, J. S. and J. L. Weber, 1992. Survey of human and rat microsatellites. Genomics, 12: 627-631.

Blackburn, E. and J. Szostak, 1984. The molecular structure of centromeres and telomeres. Ann. Rev. Biochem., 53: 163-194.

Boehm, T. et al., 1989. Alternating purine-pyrimidine tracts may promote chromosomal translocation seen in a variety of human lymphoid tumors. EMBO J., 8: 2621-2631.

Booton, G., 1995. Molecular Genetic Analysis of the Phylogenetic Relationships of Lake Victoria Cichlid Fish. Ph.D. Dissertation. The Ohio State University.

Booton, G. et al., 1999. Evolution of the ribosomal RNA internal transcribed spacer one (ITS-1) in cichlid fishes of the Lake Victoria region. Mol. Phylogenet. Evol., in press.

Bowcock, A. M. et al., 1994. High resolution of human evolutionary trees with polymorphic microsatellites. Nature, 368: 455-457.

Brachet, S., M. F. Jubier, M. Richard, B. Jung-Muller, and N. Frascaria-Lacoste, 1999. Rapid identification of microsatellite loci using 5’ anchored PCR in the common ash Fraxinus excelsior. Mol. Ecol., 8: 160-163.

Brook, J. D. et al., 1992. Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3' end of a transcript encoding a protein kinase family member. Cell, 68: 799-808.

Brooker, A. L., D. Cook, P. Bentzen, J. M. Wright, and R. W. Doyle, 1994. Organization of microsatellites differs between mammals and cold-water teleost fishes. Can. J. Fish. Aquat. Sci., 51: 1959-1966.

Bullock, P., J. Miller, and M. Botchan, 1986. Effects of poly[d(pGpT).d(pApC)] and poly[d(pCpG).d(pGpC)] repeats on homologous recombination in somatic cells. Mol. Cell. Biol., 6: 3948-3953.

Burke, J. R. et al., 1994. The Haw River Syndrome: Dentatorubropallidoluysian atrophy (DRPLA) in an African-American family. Nature Genet., 7: 521-524.

Callen, D. F. et al., 1993. Incidence and origin of “null” alleles in the (AC)n microsatellite markers. Am. J. Human Genet., 52: 922-927.

Carson, H. L. and K. Y. Kaneshiro, 1976. Drosophila of Hawaii: systematics and ecological genetics. Ann. Rev. Ecol. Syst., 7: 311-345.

Carvalho, G. R. and L. Hauser, 1994. Molecular genetics and the stock concept in fisheries. Rev. Fish Biol. Fish., 4: 326-350.

Cavalli-Sforza, L. L., 1998. The DNA revolution in population genetics. Trends Genet., 14: 60-66.

Chapman, L. J., C. A. Chapman, L. S. Kaufman, and E. McKenzie, 1995. Hypoxia tolerance in twelve species of East African cichlids: Potential for low oxygen refugia in Lake Victoria. Cons. Biol., 9: 1-15.

Chapman, L. J., C. A. Chapman, R. Ogutu-Ohwayo, M. Chandler, and L. S. Kaufman, 1996. Refugia for endangered fishes from an introduced predator in Lake Nabugabo, Uganda. Cons. Biol., 10: 554-561.

Chenuil, A., E. Desmarais, L. Pouyaud, and P. Berrebi, 1997. Does polyploidy lead to fewer and shorter microsatellites in Barbus (Teleostei: Cyprinidae)? Mol. Ecol., 6: 169-178.

Colbourne, J. K., B. D. Neff, J. M. Wright, and M. R. Gross, 1996. DNA fingerprinting of bluegill sunfish (Lepomis macrochirus) using (GT)n microsatellites and its potential for assessment of mating success. Can. J. Fish. Aquat. Sci., 53: 342-349.

Cooperative Human Linkage Center, 1994. A comprehensive human linkage map with centimorgan density. Science, 265: 2049-2054.

Dallas, J. F., 1992. Estimation of microsatellite mutation rates in recombinant inbred strains of mouse. Mammal. Genome, 3: 452-456.

Diehl, S. R., J. Ziegle, G. A. Buck, T. R. Reynolds, and J. L. Weber, 1990. Automated genotyping of human DNA polymorphisms. Am. J. Hum. Genet., 47: A177.

Dietrich, W. et al., 1992. A genetic map of the mouse suitable for typing intraspecific crosses. Genetics, 131: 423-447.

Di Rienzo, A. et al., 1994. Mutational processes of simple-sequence repeat loci in human populations. PNAS, 91: 3166-3170.

Edwards, A., A. Civitello, H. Hammond, and C. T. Caskey, 1991. DNA typing and genetic mapping with trimeric and tetrameric tandem repeats. Am. J. Hum. Genet., 49: 746-756.

Edwards, A., H. A. Hammond, L. Jin, C. T. Caskey, and R. Chakraborty, 1992. Genetic variation at five trimeric and tetrameric repeat loci in four human population groups. Genomics, 12: 241-253.

Ellegren, H., 1995. Mutation rates at porcine microsatellite loci. Mammal. Genome, 6: 376-377.

Estoup, A., P. Presa, F. Krieg, D. Vaiman, and R. Guyomard, 1993. (CT)n and (GT)n microsatellites: a new class of genetic markers for Salmo trutta L. (brown trout). Heredity, 71: 488-496.

Estoup, A., L. Garnery, M. Solignac, and J. M. Cornuet, 1995a. Microsatellite variation in honey bee (Apis mellifera L.) populations: Hierarchical genetic structure and test of the infinite allele and stepwise mutation models. Genetics, 140: 679-695.

Estoup, A., L. Garnery, M. Solignac, and J. M. Cornuet, 1995b. Size homoplasy and mutational processes of interrupted microsatellites in two bee species Apis mellifera and Bombus terrestris (Apidae). Mol. Biol. Evol., 12: 1074-1084.

Ewens, W. J., 1972. The sampling theory of selectively neutral alleles. Theo. Popul. Biol., 3: 87-112.

Felsenstein, J., 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.

Fisher, P. J., R. C. Gardner, and T. E. Richardson, 1996. Single locus microsatellites isolated using 5’ anchored PCR. Nucl. Acid Res., 24: 4369-4371.

Fitzsimmons, N. N., C. Moritz, and S. Moore, 1995. Conservation and dynamics of microsatellite loci over 300 million years of marine turtle evolution. Mol. Biol. Evol., 12: 432-440.

Fiumera, A. C., P. G. Parker, and P. A. Fuerst, 1999. Effective population size and maintenance of genetic diversity in captive bred populations of the Lake Victoria cichlid Prognathochromis perrieri. Cons. Biol., in press.

Fjerdingstad, E. J., J. J. Boomsma, and P. Thoren, 1998. Multiple paternity in the leafcutter ant Atta colombica - a microsatellite DNA study. Heredity, 80: 118-126.

Fryer, G. and T. D. Iles, 1972. The Cichlid Fishes of the Great Lakes of Africa: their Biology and Evolution. Oliver and Boyd, Edinburgh.

Fu, Y. H. et al., 1991. Variation of the CGG repeat at the fragile X site results in genetic instability: Resolution of the Sherman paradox. Cell, 67: 1047-1058.

Gagneux, P., C. Boesch, and D. S. Woodruff, 1997. Microsatellite scoring errors associated with noninvasive genotyping based on nuclear DNA amplified from shed hair. Mol. Ecol., 6: 861-868.

Galis, F. and J. A. J. Metz, 1998. Why are there so many cichlid species? Trends Ecol. Evol., 13(1): 1-2.

Gastier, J. M. et al., 1995. Survey of trinucleotide repeats in the human genome: assessment of their utility as genetic markers. Hum. Mol. Genet., 4: 1829-1836.

Gerloff, U., C. Schlotterer, and K. Rassmann, 1995. Amplification of hypervariable simple sequence repeats (microsatellites) from excremental DNA of wild living bonobos (Pan paniscus). Mol. Ecol., 4: 515-518.

Glenn, T. C., W. Stephan, H. C. Dessauer, and M. J. Braun, 1996. Allelic diversity in alligator microsatellite loci is negatively correlated with GC content of flanking sequences and evolutionary conservation of PCR amplifiability. Mol. Ecol., 13: 1151-1154.

Goff, D. J. et al., 1992. Identification of polymorphic simple sequence repeats in the genome of zebrafish. Genomics, 14: 200-202.

Goldstein, D. B. and A. G. Clark, 1995. Microsatellite variation in North American populations of Drosophila melanogaster. Nucl. Acid Res., 23(19): 3882-3886.

Goldstein, D., A. R. Linares, L. L. Cavalli-Sforza, and M. W. Feldman, 1995a. An evaluation of genetic distances for use with microsatellite loci. Genetics, 139: 463-471.

Goldstein, D., A. R. Linares, L. L. Cavalli-Sforza, and M. W. Feldman, 1995b. Genetic absolute dating based on microsatellites and the origin of modern humans. PNAS, 92: 6723-6727.

Goudet, J., 1995. Fstat version 1.2. A computer program to calculate F-statistics. J. Heredity, 86: 485-486.

Greenwood, P. H., 1965. Environmental effects on the pharyngeal mill of a cichlid fish, Astatoreochromis alluaudi, and their taxonomic implications. Proc. Linn. Soc. Lon., 176: 1-10.

Greenwood, P. H., 1979. Towards a phyletic classification of the ‘genus’ Haplochromis (Pisces: Cichlidae) and related taxa Pt. I. Bull. British Mus. Zool., 35: 265-322.

Greenwood, P. H., 1980. Towards a phyletic classification of the ‘genus’ Haplochromis (Pisces: Cichlidae) and related taxa Pt. II. Bull. British Mus. Zool., 39: 1-101.

Greenwood, P. H., 1981. The Haplochromine Fishes of the East African Lakes. Cornell University Press, Ithaca, New York.

Greenwood, P. H., 1984. African cichlids and evolutionary theories. In: Evolution of Fish Species Flocks (A. A. Echelle & I. Kornfield, eds.), pp. 141-154. University of Maine at Orono Press.

Greenwood, P. H., 1991. Speciation. In: Cichlid Fishes: Behaviour, Ecology and Evolution (Keenleyside, M. H. A., ed.), pp. 86-102. Chapman & Hall, London.

Grimaldi, M.-C. and B. Crouau-Roy, 1997. Microsatellite allelic homoplasy due to variable flanking sequences. J. Mol. Evol., 44: 336-340.

Gullberg, A., M. Olsson, and H. Tegelstrom, 1997. Male mating success, reproductive success and multiple paternity in a natural population of sand lizards: behavioural and molecular genetics data. Mol. Ecol., 6: 105-112.

Hamada, H., M. Petrino, and T. Kakunaga, 1982. A novel repeated element with Z-DNA-forming potential is widely found in evolutionarily diverse eukaryotic genomes. PNAS, 79: 6465-6469.

Hamada, H., M. Seidman, B. H. Howard, and C. M. Gorman, 1984. Enhanced gene expression by the Poly(dT-dG).Poly(dC-dA) sequence. Mol. Cell. Biol., 4: 2622-2630.

Harr, B., S. Weiss, J. R. David, G. Brem, and C. Schlotterer, 1998. A microsatellite-based multilocus phylogeny of the Drosophila melanogaster species complex. Curr. Biol., 8: 1183-1186.

Hastbacka, J. et al., 1992. Linkage disequilibrium mapping in isolated founder populations: Diastrophic dysplasia in Finland. Nature Genet., 2: 204-211.

Hauge, X. Y. and M. Litt, 1993. A study of the origin of ‘shadow bands’ seen when typing dinucleotide repeat polymorphisms by PCR. Hum. Mol. Genet., 2: 411-415.

Hearne, C., S. Ghosh, and J. A. Todd, 1992. Microsatellites for linkage analysis of genetic traits. Trends Genet., 8: 288-294.

Henderson, S. T. and T. D. Petes, 1992. Instability of simple sequence DNA in Saccharomyces cerevisiae. Mol. Cell. Biol., 12: 2749-2757.

Herbinger, C. M. et al., 1995. DNA fingerprinting based analysis of paternal and maternal effects on offspring growth and survival in commercially reared rainbow trout. Aquaculture, 137: 245-256.

Hilgendorf, P., 1888. Fische aus dem Victoria-Nyanza (Ukerewe-See). Sitzungsbericht der Gesellschaft naturforschender Freunde Berlin, 1888: 75-79.

Hoogerhoud, R. J. C., 1986. Taxonomic and ecological aspects of morphological plasticity in molluscivorous haplochromines (Pisces, Cichlidae). Annls. Mus. r. Afr. Cent. Sci. Zool., 251: 131-134.

Hughes, C. R. and D. C. Queller, 1993. Detection of highly polymorphic microsatellite loci in a species with little allozyme polymorphism. Mol. Ecol., 2: 131-137.

Hutter, C. M., M. D. Schug, and C. F. Aquadro, 1998. Microsatellite variation in Drosophila melanogaster and Drosophila simulans: a reciprocal test of the ascertainment bias hypothesis. Mol. Biol. Evol., 15: 1620-1636.

Ishibashi, Y., T. Saitoh, S. Abe, and M. C. Yoshida, 1997. Sex-related spatial kin structure in a spring population of grey-sided voles Clethrionomys rufocanus as revealed by mitochondrial and microsatellite DNA analyses. Mol. Ecol., 6: 63-71.

Johansson, M., H. Ellegren, and L. Andersson, 1992. Cloning and characterization of highly polymorphic porcine microsatellites. J. Hered., 83: 196-198.

Johnson, T. C., C. A. Scholz, M. R. Talbot, K. Kelts, and R. D. Ricketts, 1996. Late Pleistocene desiccation of Lake Victoria and rapid evolution of cichlid fishes. Science, 273: 1091-1093.

Kandpal, R. P., G. Kandpal, and S. M. Weissman, 1994. Construction of libraries enriched for sequence repeats and jumping clones and hybridization selection for region-specific markers. Proc. Natl. Acad. Sci. U.S.A., 91: 88-92.

Kaufman, L., 1997. Asynchronous taxon cycles in haplochromine fishes of the Lake Victoria region. South Afri. J. Sci., 93: 601-606.

Kaufman, L., L. J. Chapman, and C. A. Chapman, 1996. The Great Lakes. In: East African ecosystems and their conservation (T. R. McClanahan & T. P. Young, eds.), pp. 191-216. Oxford University Press, New York.

Kaufman, L. S., L. C. Chapman, and C. Chapman, 1997. Evolution in fast forward: Haplochromine fishes of the Lake Victoria region. Endeavour, 21: 23-30.

Kaufman, L. and K. F. Liem, 1982. Fishes of suborder Labroidei (Pisces: Perciformes): phylogeny, ecology and evolutionary significance. Breviora, 472: 1-19.

Kaufman, L. S. and P. Ochumba, 1993. Evolutionary and conservation biology of cichlid fishes as revealed by faunal remnants in Lake Victoria. Cons. Biol., 7: 719-730.

Kawaguchi, Y. et al., 1994. CAG expansions in a novel gene for Machado-Joseph disease at chromosome 14q32.1. Nature Genet., 8: 221-228.

Kijas, J. M. H., J. C. S. Fowler, C. A. Garbett, and M. R. Thomas, 1994. Enrichment of microsatellites from the citrus genome using biotinylated oligonucleotide sequences bound to streptavidin-coated magnetic particles. Biotechniques, 16(4): 656-662.

Kimmel, M., R. Chakraborty, D. N. Stivers, and R. Deka, 1996. Dynamics of repeat polymorphisms under a forward-backward mutation model: within- and between-population variability at microsatellite loci. Genetics, 143: 549-555.

Kimura, M. and J. F. Crow, 1964. The number of alleles that can be maintained in a finite population. Genetics, 49: 725-738.

Kimura, M. and T. Ohta, 1978. Stepwise mutation model and distribution of allelic frequencies in a finite population. PNAS, 75: 2868-2872.

Knight, M. E., G. F. Turner, C. Rico, M. J. H. van Oppen, and G. M. Hewitt, 1998. Microsatellite paternity analysis on captive Lake Malawi cichlids supports reproductive isolation by mate choice. Mol. Ecol., 7: 1605-1610.

Knight, S. J. L. et al., 1993. Trinucleotide repeat amplification and hypermethylation of a CpG island in FRAXE mental retardation. Cell, 74: 127-134.

Kocher, T. D., J. A. Conroy, K. R. McKaye, and J. R. Stauffer, 1993. Similar morphologies of cichlid fish in Lakes Tanganyika and Malawi are due to convergence. Mol. Phylogenet. Evol., 2: 158-165.

Kocher, T. D., W.-J. Lee, H. Sobolewska, D. Penman, and B. McAndrew, 1998. A genetic linkage map of a cichlid fish, the tilapia (Oreochromis niloticus). Genetics, 148: 1225-1232.

Koide, R. et al., 1994. Unstable expansion of CAG repeat in hereditary dentatorubral-pallidoluysian atrophy (DRPLA). Nature Genet., 6: 9-13.

Kornfield, I. and J. N. Taylor, 1982. A new species of polymorphic fish, Cichlasoma minckleyi from Cuatro Cienegas, Mexico, (Teleostei: Cichlidae). Proc. Biol. Soc. Wash., 96: 253-269.

Kruglyak, L., 1997. The use of a genetic map of biallelic markers in linkage studies. Nature Genet., 17: 21-24.

Kudhongania, A. W. and D. B. R. Chitamwebwa, 1995. Impact of environmental change, species introductions and ecological interactions on the fish stocks of Lake Victoria. In: The Impact of Species Changes in African Lakes (Pitcher, T. J. and P. J. B. Hart, eds.), pp. 19-32. Chapman & Hall, New York.

Kumar, S., K. Tamura, and M. Nei, 1993. MEGA: molecular evolutionary genetic analysis, version 1.01. University College: Pennsylvania State University Press.

La Spada, A. R., E. M. Wilson, D. B. Lubahn, A. E. Harding, and K. H. Fischbeck, 1991. Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature, 352: 77-79.

Lee, W.-J., and T. D. Kocher, 1996. Microsatellite DNA markers for genetic mapping in Oreochromis niloticus. J. Fish Biol., 49: 169-171.

Levinson, G. and G. A. Gutman, 1987. High frequencies of short frameshifts in poly-CA/TG tandem repeats borne by bacteriophage M13 in Escherichia coli K-12. Nucl. Acids Res., 15: 5323-5338.

Liem, K. F. and L. S. Kaufman, 1984. Intraspecific macroevolution: functional biology of the polymorphic cichlid species Cichlasoma minckleyi. In: Evolution of Fish Species Flocks (A. A. Echelle and I. Kornfield, eds.), pp. 203-215. Orono: University of Maine at Orono Press.

Lippitsch, E., 1993. A phyletic study on lacustrine haplochromine fishes (Perciformes, Cichlidae) of East Africa, based on scale and squamation characters. J. Fish Biol., 42: 903-946.

Litt, M., X. Hauge, and V. Sharma, 1993. Shadow bands seen when typing polymorphic dinucleotide repeats: some causes and cures. Biotechniques, 15(2): 280-284.

Litt, M. and J. A. Luty, 1989. A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. Am. J. Hum. Genet., 44: 397-401.

Mahtani, M. M. and H. F. Willard, 1993. A polymorphic X-linked tetranucleotide repeat locus displaying a high rate of new mutation: implications for mechanisms of mutation at short tandem repeat loci. Hum. Mol. Genet., 2: 431-437.

Mandel, J. L., 1994. Trinucleotide diseases on the rise. Nature Genet., 7: 453-455.

Mayer, W. E., H. Tichy, and J. Klein, 1998. Phylogeny of African cichlid fishes as revealed by molecular markers. Heredity, 80: 702-714.

McConnell, S. K., P. O'Reilly, L. Hamilton, J. Wright, and P. Bentzen, 1995. Polymorphic microsatellite loci from Atlantic salmon (Salmo salar): genetic differentiation of North American and European populations. Can. J. Fish. Aquat. Sci., 52: 1863-1872.

Meyer, A., T. D. Kocher, P. Basasibwaki, and A. C. Wilson, 1990. Monophyletic origin of Lake Victoria cichlid fishes suggested by mitochondrial DNA sequences. Nature, Lond., 347: 550-553.

Miesfeld, R., M. Krystal, and N. Arnheim, 1981. A member of a new repeated sequence family which is conserved throughout eucaryotic evolution is found between the human delta and beta globin genes. Nucl. Acids Res., 9: 5931-5947.

Morin, P. A., J. Wallis, J. J. Moore, and D. S. Woodruff, 1994. Paternity exclusion in a community of wild chimpanzees using hypervariable simple sequence repeats. Mol. Ecol., 3: 469-478.

Morral, N. and X. Estivill, 1992. Multiplex PCR amplification of three microsatellites within the CFTR gene. Genomics, 13: 1362-1364.

Morris, D. B., R. K. Richard, and J. M. Wright, 1996. Microsatellites from rainbow trout (Oncorhynchus mykiss) and their use for genetic studies of salmonids. Can. J. Fish. Aquat. Sci., 53: 120-126.

Nagl, S., H. Tichy, W. E. Mayer, N. Takahata, and J. Klein, 1998. Persistence of neutral polymorphisms in Lake Victoria cichlid fish. Proc. Natl. Acad. Sci. USA, 95: 14238-14243.

Nancarrow, J. K. et al., 1994. Implication of FRA16A structure for the mechanism of chromosomal fragile site genesis. Science, 264: 1938-1941.

Nei, M., 1972. Genetic distance between populations. Amer. Nat., 106: 283-291.

Nei, M., 1987. Molecular Evolutionary Genetics. New York: Columbia University Press.

Nielsen, J. L., C. A. Gan, J. M. Wright, D. Morris, and K. Thomas, 1995. Biogeographic distributions of mitochondrial and nuclear markers for southern steelhead. Mol. Mar. Biol. Biotech., 3: 281-293.

Nordheim, A. and A. Rich, 1983. The sequence (dC-dA)n.(dG-dT)n forms left-handed Z-DNA in negatively supercoiled plasmids. Proc. Natl. Acad. Sci., 80: 1821-1825.

O'Reilly, P. and J. M. Wright, 1995. The evolving technology of DNA fingerprinting and its application to fisheries and aquaculture. J. Fish Biol., 47 (Suppl. A): 29-55.

Orr, H. T. et al., 1993. Expansion of an unstable trinucleotide CAG repeat in spinocerebellar ataxia type 1. Nature Genet., 4: 221-226.

Ostrander, E. A., G. F. Sprague, and J. Rine, 1993. Identification and characterization of dinucleotide repeat (CA)n markers for genetic mapping in dogs. Genomics, 16: 207-213.

Ostrander, E. A., P. M. Jong, J. Rine, and R. Duyk, 1992. Construction of small-insert genomic DNA libraries highly enriched for microsatellite repeat sequences. Proc. Natl. Acad. Sci. USA, 89: 3419-3423.

Paetkau, D. and C. Strobeck, 1994. Microsatellite analysis of genetic variation in black bear population. Mol. Ecol., 3: 489-495.

Paetkau, D. and C. Strobeck, 1995. The molecular basis and evolutionary history of a microsatellite null allele in bears. Mol. Ecol., 4: 519-520.

Parrish, J. E. et al., 1994. Isolation of a GCC repeat showing expansion in FRAXF, a fragile site distal to FRAXA and FRAXE. Nature Genet., 8: 229-235.

Parsons, R. et al., 1993. Hypermutability and mismatch repair deficiency in RER+ tumor cells. Cell, 75: 1227-1236.

Postlethwait, J. H. et al., 1994. A genetic linkage map for the zebrafish. Science, 264: 699-703.

Pitcher, T. J. and P. J. B. Hart, 1995. The Impact of Species Changes in African Lakes. Chapman & Hall, New York.

Primmer, C. R., N. Saino, A. P. Moller, and H. Ellegren, 1998. Unraveling the processes of microsatellite evolution through analysis of germ line mutations in barn swallows Hirundo rustica. Mol. Biol. Evol., 15: 1047-1054.

Raymond, M. and F. Rousset, 1995. Genepop (version 1.2): population genetics software for exact tests and ecumenicism. J. Hered., 86: 248-249.

Regan, C. T., 1920. The classification of the fishes of the family Cichlidae. I. The Tanganyika genera. Annals and Magazine of Natural History, Ser. 9, 5: 33-53.

Rico, C., I. Rico, and G. Hewitt, 1996. 470 million years of conservation of microsatellite loci among fish species. Proc. Roy. Soc. Lon., 263B: 549-557.

Roy, M. S. et al., 1994. Patterns of differentiation and hybridization in North American wolflike canids, revealed by analysis of microsatellite loci. Mol. Biol. Evol., 11: 553-570.

Rousset, F. and M. Raymond, 1995. Testing heterozygote excess and deficiency. Genetics, 140: 1413-1419.

Rozas, J. & Rozas, R. 1997. DnaSP version 2.0: a novel software package for extensive molecular population genetics analysis. Comput. Appl. Biosci. 13, 307-311.

Ruzzante DE, Taggart CT, Cook D, Goddard S (1996) Genetic differentiation between inshore and offshore Atlantic cod (Gadus morhua) off Newfoundland: microsatellite DNA variation and antifreeze level. Canadian Journal of Fisheries and Aquatic Sciences, 53, 634-645.

Rychlik W, Rhoads RE (1989) A computer program for choosing optimal oligonucleotides for filter hybridization, sequencing, and in vitro amplification of DNA. Nucleic Acids Research, 17, 8543-8551.

Sage, R. D., Loiselle, P. V., Basasibwaki, P. & Wilson, A. C. 1984. Molecular versus morphological change among cichlid fishes of Lake Victoria. In: Evolution of Fish Species Flocks (eds. A. A. Echelle & I. Kornfield), pp. 185-201. Orono: University of Maine at Orono Press.

Saitou, N., and M. Nei, 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol., 4: 406-425.

Sambrook J, Fritsch EF, Maniatis T (1989) Molecular Cloning: A Laboratory Manual, 2nd edn. Cold Spring Harbor Laboratory Press, New York.

Sargent, R. G., R. V. Merrihew, R. Nairn, G. Adair, M. Meuth & J. H. Wilson, 1996. The influence of a (GT)29 microsatellite sequence on homologous recombination in the hamster adenine phosphoribosyltransferase gene. Nucl. Acids Res., 24: 746-753.

Schlotterer, C., B. Amos, and D. Tautz, 1991. Conservation of polymorphic simple sequence loci in cetacean species. Nature, 354: 63-65.

Schlotterer, C. & D. Tautz, 1992. Slippage synthesis of simple sequence DNA. Nucleic Acids Res., 20: 211-215.

Schlotterer, C., R. Ritter, B. Harr, and G. Brem, 1998. High mutation rate of a long microsatellite allele in Drosophila melanogaster provides evidence for allele-specific mutation rates. Mol. Biol. Evol., 15: 1269-1274.

Schug, M. D., C. M. Hutter, K. A. Wetterstrand, M. S. Gaudette, T. F. C. Mackay, and C. F. Aquadro, 1998. The mutation rates of di-, tri- and tetranucleotide repeats in Drosophila melanogaster. Mol. Biol. Evol., 15: 1751-1760.

Schug, M. D., T. F. C. Mackay, and C. F. Aquadro, 1997. Low mutation rates of microsatellite loci in Drosophila melanogaster. Nature Genet., 15: 99-102.

Schwengel, D. A., A. E. Jedlicka, E. J. Nanthakumar, J. L. Weber and R. C. Levitt, 1994. Comparison of fluorescence-based semi-automated genotyping of multiple microsatellite loci with autoradiographic techniques. Genomics, 22: 46-54.

Scribner KT, Gust JR, Fields RL (1996) Isolation and characterization of novel salmon microsatellite loci: cross-species amplification and population genetic applications. Canadian Journal of Fisheries and Aquatic Sciences, 53, 833-841.

Seehausen, O., F. Witte, J. J. M. van Alphen, and N. Bouton, 1998. Direct mate choice maintaining diversity among sympatric cichlids in Lake Victoria. J. Fish Biol., 53 (Suppl. A): 37-55.

Seehausen, O. and J. J. M. van Alphen, 1998. The effect of male coloration on female mate choice in closely related Lake Victoria cichlids (H. nyererei complex). Behavioural Ecology and Sociobiology, 42(1): 1-8.

Seehausen, O., van Alphen, J. J. M., and Witte, F., 1997. Cichlid fish diversity threatened by eutrophication that curbs sexual selection. Science, 277: 1808-1811.

Serikawa, T. et al., 1992. Rat gene mapping using PCR-analyzed microsatellites. Genetics, 131: 701-721.

Shriver, M. D., L. Jin, E. Boerwinkle, R. Deka, R. E. Ferrell et al., 1995. A novel measure of genetic distance for highly polymorphic tandem repeat loci. Mol. Biol. Evol., 12: 914-920.

Shriver, M. D., L. Jin, R. Chakraborty, and E. Boerwinkle, 1993. VNTR allele frequency distribution under the stepwise mutation model. Genetics, 134: 983-993.

Slatkin, M., 1993. Isolation by distance in equilibrium and nonequilibrium populations. Evolution, 47: 264-279.

Sneath, P. H. A., and R. R. Sokal, 1973. Numerical Taxonomy. W. H. Freeman, San Francisco.

Song, C. B., 1994. Molecular Evolution of the Cytochrome B gene Among Percid Fishes. Ph.D. Dissertation, University of Illinois at Urbana-Champaign, USA.

Stallings, R. L., A. F. Ford, D. Nelson, D. C. Torney, C. E. Hildebrand & R. K. Moyzis, 1991. Evolution and distribution of (GT)n repetitive sequences in mammalian genomes. Genomics, 10: 807-815.

Stephen, J. C., D. A. Gilbert, N. Yuhki, and S. J. O'Brien, 1992. Estimation of heterozygosity for single-probe multilocus DNA fingerprints. Mol. Biol. Evol., 9: 729-743.

Stepien, C. A. and T. D. Kocher, 1997. Molecules and morphology in studies of fish evolution. In: Molecular Systematics of Fishes (C. Stepien, T. D. Kocher eds), pp. 1-11. New York: Academic Press.

Stiassny, M. L. J., 1991. Phylogenetic intrarelationships of the family Cichlidae: An overview. In: Cichlid Fishes: Behaviour, Ecology and Evolution (M. H. A. Keenleyside, ed.), pp. 1-35. London: Chapman & Hall.

Strand, M., T. A. Prolla, R. M. Liskay and T. D. Petes, 1993. Destabilization of tracts of simple repetitive DNA in yeast by mutations affecting DNA mismatch repair. Nature, 365: 274-276.

Strassmann JE, Solis CR, Barefield K, Queller DC (1996) Trinucleotide microsatellite loci in a swarm-founding neotropical wasp, Parachartergus colobopterus and their usefulness in other social wasps. Mol. Ecol, 5: 459-461.

Sturmbauer, C. and A. Meyer, 1992. Genetic divergence, speciation and morphological stasis in a lineage of African cichlid fishes. Nature, 358: 578-581.

Sultmann, H., W. E. Mayer, F. Figueroa, H. Tichy, and J. Klein, 1995. Phylogenetic analysis of cichlid fishes using nuclear DNA markers. Mol. Biol. Evol., 12: 1033-1047.

Taberlet, P. et al., 1996. Reliable genotyping of samples with very low DNA quantities using PCR. Nucl. Acids Res., 24(16): 3189-3194.

Takezaki, N. and M. Nei, 1996. Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics, 144: 389-399.

Tarr, C. L., S. Conant, and R. C. Fleischer, 1998. Founder events and variation at microsatellite loci in an insular passerine bird, the Laysan finch (Telespiza cantans). Mol. Ecol., 7: 719-731.

Tautz, D., 1989. Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucl. Acids Res., 17: 6463-6467.

Tautz, D., M. Trick and G. A. Dover, 1986. Cryptic simplicity in DNA is a major source of genetic variation. Nature, 322: 652-656.

Taylor, A. C., W. B. Sherwin, and R. K. Wayne, 1994. Genetic variation of microsatellite loci in a bottlenecked species: the northern hairy-nosed wombat Lasiorhinus krefftii. Mol. Ecol., 3: 277-290.

The Huntington's Disease Collaborative Research Group, 1993. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington's disease chromosomes. Cell, 72: 971-983.

Todd, J. A., 1992. La carte des microsatellites est arrivée! Hum. Mol. Genet., 1: 663-666.

Vaiman, D. et al., 1994. Conservation of a syntenic group of microsatellite loci between cattle and sheep. Mammal. Genome, 5: 310-314.

Valdes, A. M., M. Slatkin and N. B. Freimer, 1993. Allele frequencies at microsatellite loci: the stepwise mutation model revised. Genetics, 133: 737-749.

van Oppen, M. J. H. et al., 1997. Unusually fine-scale genetic structuring found in rapidly speciating Malawi cichlid fishes. Proc. Roy. Soc. Lond., 264B: 1803-1812.

van Oppen, M. J. H. et al., 1998. Assortative mating among rock-dwelling cichlid fishes supports high estimates of species richness from Lake Malawi. Mol. Ecol., 7: 991-1001.

Viard, R. et al., 1996. Microsatellites and the genetics of highly selfing populations in the freshwater snail Bulinus truncatus. Genetics, 142: 1237-1247.

Wahls, W. P., L. J. Wallace and P. D. Moore, 1990. The Z-DNA motif d(TG)30 promotes reception of information during gene conversion events while stimulating homologous recombination in human cells in culture. Mol. Cell. Biol., 10: 785-793.

Wang, D. G. et al., 1998. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science, 280: 1077-1082.

Wattier, R., C. R. Engel, P. Saumitou-Laprade, and M. Valero, 1998. Short allele dominance as a source of heterozygote deficiency at microsatellite loci: experimental evidence at the dinucleotide locus Gv1CT in Gracilaria gracilis (Rhodophyta). Mol. Ecol., 7: 1569-1573.

Weber, J. L., 1990. Informativeness of human (dC-dA)n (dG-dT)n polymorphisms. Genomics, 7: 524-530.

Weber, J. L. and C. Wong, 1993. Mutation of human short tandem repeats. Hum. Mol. Genet., 2: 1123-1128.

Weber, J. L. & P. E. May, 1989. Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am. J. Hum. Genet., 44: 388-396.

Weir, B. S. and C. C. Cockerham, 1984. Estimating F-statistics for the analysis of population structure. Evolution, 38: 1358-1370.

Weissenbach, J. et al., 1992. A second-generation linkage map of the human genome. Nature, 359: 794-801.

Witte, F., T. Goldschmidt, and J. H. Wanink, 1995. Dynamics of the haplochromine cichlid fauna and other ecological changes in the Mwanza Gulf of Lake Victoria. In: The Impact of Species Changes in African Lakes (Pitcher, T. J. and P. J. B. Hart, eds.), pp. 83-110. Chapman & Hall, New York.

Wu, L., G. Booton, M. Chandler, L. Kaufman, and P. Fuerst, 1996. Use of DNA microsatellite loci to identify populations and species of Lake Victoria haplochromine cichlids. In: Aquaculture Biotechnology (E. M. Donaldson and D. D. MacKinlay, eds.), pp. 105-113. American Fisheries Society, Bethesda, MD.

Wu, L., Kaufman, L. and Fuerst, P. A. 1999. Isolation of microsatellite markers in Astatoreochromis alluaudi and their cross-species amplifications in other African cichlids. Molecular Ecology, in press.

Zardoya R. et al., 1996. Evolutionary conservation of microsatellite flanking regions and their use in resolving the phylogeny of cichlid fishes (Pisces: Perciformes). Proc. Roy. Soc. Lond., 263B: 1589-1598.

Zhang, Q. and R. Tiersch, 1993. Rapid isolation of DNA for genetic screening of catfishes by polymerase chain reaction. Trans. Am. Fish. Soc., 123: 997-1001.

Zheng, L., M. Q. Benedict, F. H. Collins, and F. C. Kafatos, 1996. An integrated genetic map of the African human malaria vector mosquito, Anopheles gambiae. Genetics, 143: 941-952.

Ziegle, J. S. et al., 1992. Application of automated DNA sizing technology for genotyping microsatellite loci. Genomics, 14: 1026-1031.

APPENDIX A:

Data Relevant to Chapter 3,

Information on A. alluaudi Samples

Table A.1: Tissue sample ID, DNA sample ID, tissue sample collection locations and years for A. alluaudi samples.

DNA ID   Sample ID   Sampling location   Sampling year
2        16567       Jinja Pier          1993
3        16603
4        16856
5        16867
5.2      16848
48       14559                           1992
51       14548
52       14545
53       14546
54       14551
66       14547
67       14556
68       14555
70       14553

18       97143       Lake Kachira        1993
19       97090
20       97148
21       97147
22       97149
23       97158
24       97155
25       97091
26       97152
27       97077
28       97154
29       97087
30       97088
31       97159
32       97084
33       97092
34       91716
35       97082
37       97081
38       97080
39       97161
40       97089

6        16617       Lake Kyoga          1993
7        16619
8        16620
9        16621
10       16622
11       16623
12       16662
13       16671
14       16630
15       16653
16       16654
17       16657
17.1     16632
17.2     16655
36       16656
71       19812                           1994
72       19813
73       19819
74       19820
75       19950
76       19951
77       19851

AF1      0315        L. Nawampasa        1996
AF2      0316
AF3      0317
AF4      0318
AF5      0319
AF6      0320
AF7      0321
AF8      0322
AF9      0324
AF10     0325
AF11     0326
AF12     0327
AF13     0328
AF14     17474                           1995
AF15     17475
AF16     17496
AF17     17497
AF18     17498
AF19     19499
78       17535
79       17536
80       17764
81       17765

AF20     17793       Napoleon Gulf       1995
AF21     17794
AF22     17795
AF23     17796
AF24     17797
AF25     17846
AF26     0002                            1996
AF27     0003
AF28     0005
AF29     0006
AF30     0042
AF31     0043
AF32     0044
AF33     0045
AF34     0046
112      98189
113      98024
114      98057

82       K1          Lake Kabeleka       1996
83       K2
84       K3
85       K4
86       K5
87       K6
88       K7
89       K18
90       K19
91       K20
92       K21
93       K22
94       K23
95       K24
96       K25
97       K26
98       K27
99       K28
100      K29
101      K30
102      K31
103      K32
104      K33
105      K34
106      K35
107      K36
108      K37
109      K38
110      K39
111      K40

APPENDIX B:

Data Relevant to Chapter 3,

Aligned mtDNA Sequences for A. alluaudi Samples

Figure B.1: mtDNA sequences for the 10 haplotypes of A. alluaudi, aligned with sequences of a single A. alluaudi specimen from Lake Victoria as well as a single Astatotilapia burtoni specimen (Meyer et al., 1990). Dots: sequence identical to the first sequence; dash: deletion. Numbers above the sequence alignments are the nucleotide positions relative to positions of the aligned sequences from Meyer et al. (1990). Aa: A. alluaudi; Kac: Lake Kachira; Vic: Lake Victoria; Kyo: Lake Kyoga; Kab: Lake Kabeleka; Aa Meyer et al.: A. alluaudi sequence from Meyer et al. (1990). Numbers after the lake abbreviations are sample IDs in Appendix A.
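The dot notation described in the caption can be sketched in a few lines of Python. This is only an illustration, not part of the original analysis: the function names `expand_row` and `differences` are hypothetical, the ten-base reference is the first group of the Aa Kac 19 row in the block covering positions 120-169, and the dotted row is an assumed example with the spaces between groups of ten already removed.

```python
def expand_row(reference: str, row: str) -> str:
    """Replace each dot in an aligned row with the reference base at that column.

    Any other character (a substituted base, or '-' for a deletion) is kept.
    """
    if len(row) != len(reference):
        raise ValueError("row and reference must have the same aligned length")
    return "".join(ref if ch == "." else ch for ref, ch in zip(reference, row))


def differences(reference: str, row: str, start: int = 0):
    """List (position, reference base, row character) for every non-dot column."""
    return [
        (start + i, ref, ch)
        for i, (ref, ch) in enumerate(zip(reference, row))
        if ch != "."
    ]


# Illustrative input: reference taken from the first sequence, row assumed.
reference = "GCATATATGT"
row = "....G....."

print(expand_row(reference, row))
print(differences(reference, row, start=120))
```

The `start` argument lets the reported positions follow the numbering printed above each alignment block rather than counting from zero.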

                  2222222222 3333333333 4444444444 5555555555 6666666666
                  0123456789 0123456789 0123456789 0123456789 0123456789
Aa Kac 19         TTCAAACAAA GGGGATTTTA ACCCCTACCC CTAGCTCCCA AAGCTAGGAT
Aa Kac 33         ......
Aa Kyo 08         ......
Aa Kyo 09         ......
Aa Kyo 71         ......
Aa Vic 51         ......
Aa Vic 52         ......
Aa Vic 54         ......
Aa Vic 70         ......
Aa Kab 101        ......
Aa Meyer et al.   ...... G ......
A. burtoni        ...... A ..... C ...... A ......

                                                   1111111111 1111111111
                  7777777777 8888888888 9999999999 0000000000 1111111111
                  0123456789 0123456789 0123456789 0123456789 0123456789
Aa Kac 19         CCTAATTTAG ACTATTGTCT GCCGGGCTCT GCCTTTTATG TAAACGCAAT
Aa Kac 33         ......
Aa Kyo 08         ......
Aa Kyo 09         ......
Aa Kyo 71         ......
Aa Vic 51         ......
Aa Vic 52         ......
Aa Vic 54         ......
Aa Vic 70         ......
Aa Kab 101        ......
Aa Meyer et al.   ......
A. burtoni        T C ...... CG ...... T...

                  1111111111 1111111111 1111111111 1111111111 1111111111
                  2222222222 3333333333 4444444444 5555555555 6666666666
                  0123456789 0123456789 0123456789 0123456789 0123456789
Aa Kac 19         GCATATATGT ATTATCACCA TTATTCTATA TCAAACATAT CCTATATATA
Aa Kac 33         ....G......
Aa Kyo 08         ....G......
Aa Kyo 09         ......
Aa Kyo 71         ....G......
Aa Vic 51         ....G.
Aa Vic 52         ....G.
Aa Vic 54         ....G.
Aa Vic 70         ....G.
Aa Kab 101        ....G..
Aa Meyer et al.   ....G.
A. burtoni        ......

                  1111111111 1111111111 1111111111 2222222222 2222222222
                  7777777777 8888888888 9999999999 0000000000 1111111111
                  0123456789 0123456789 0123456789 0123456789 0123456789
Aa Kac 19         AATACATAAT ATTCACAAAG ACATAGATTT ATTTCCCACA TATTTGTTAA
Aa Kac 33         .G ......
Aa Kyo 08         ......
Aa Kyo 09         ......
Aa Kyo 71         .G......
Aa Vic 51         C. .
Aa Vic 52         CC.
Aa Vic 54         ......
Aa Vic 70         ......
Aa Kab 101        G.
Aa Meyer et al.   ......
A. burtoni        TC ...T.T..GA .T.... CAA . .A.A

                  2222222222 2222222222 2222222222 2222222222 2222222222
                  2222222222 3333333333 4444444444 5555555555 6666666666
                  0123456789 0123456789 0123456789 0123456789 0123456789
Aa Kac 19         GAACATTTTA ACTAAGGGGT ACATAAACCA TAACTGAAAC TTTTCCAATA
Aa Kac 33         ......
Aa Kyo 08         ......
Aa Kyo 09         C.
Aa Kyo 71         ......
Aa Vic 51         ......
Aa Vic 52         ......
Aa Vic 54         ......
Aa Vic 70         ......
Aa Kab 101        ......
Aa Meyer et al.   ......
A. burtoni        C....C...... TA ......

                  2222222222 2222222222 2222222222 3333333333 3333333333
                  7777777777 8888888888 9999999999 0000000000 1111111111
                  0123456789 0123456789 0123456789 0123456789 0123456789
Aa Kac 19         AATATTAATG AAATACTGAA CGATAGTTTA AGACCGATCA CACCTCTCAC
Aa Kac 33         ......
Aa Kyo 08         ......
Aa Kyo 09         ......
Aa Kyo 71         ......
Aa Vic 51         G ......
Aa Vic 52         ......
Aa Vic 54         G ......
Aa Vic 70         G ......
Aa Kab 101        ......
Aa Meyer et al.   ......
A. burtoni        . T T .TCC......

                  3333333333 3333333333 3333333333 3333333333 3333333333
                  2222222222 3333333333 4444444444 5555555555 6666666666
                  0123456789 0123456789 0123456789 0123456789 0123456789
Aa Kac 19         TAGTTAAGAT ATACCAAGTA CCCACCATCC TATTCATTAC CAATACTTAA
Aa Kac 33         ......
Aa Kyo 08         ......
Aa Kyo 09         ......
Aa Kyo 71         ......
Aa Vic 51         ......
Aa Vic 52         ......
Aa Vic 54         ......
Aa Vic 70         ......
Aa Kab 101        ......
Aa Meyer et al.   ......
A. burtoni        ...... T ...... T.C.T. .C...T.

                  3333333333 3333333333 3333333333 4444444444 4444444444
                  7777777777 8888888888 9999999999 0000000000 1111111111
                  0123456789 0123456789 0123456789 0123456789 0123456789
Aa Kac 19         TGTAGTAAGA GCCCACCATC AGTTGATTTC TTAATGTTAA CGGTTCTTGA
Aa Kac 33         ...... A ......
Aa Kyo 08         ......
Aa Kyo 09         ......
Aa Kyo 71         ......
Aa Vic 51         ......
Aa Vic 52         ......
Aa Vic 54         ......
Aa Vic 70         ......
Aa Kab 101        ......
Aa Meyer et al.   ...... C ....
A. burtoni        ...... C ......

                  4444444444 4444444444 4444444444 44
                  2222222222 3333333333 4444444444 55
                  0123456789 0123456789 0123456789 01
Aa Kac 19         AGGTCAAGGA CAATTATTCG TGGGGGTTTC CC
Aa Kac 33         ......
Aa Kyo 08         ......
Aa Kyo 09         ......
Aa Kyo 71         ......
Aa Vic 51         ......
Aa Vic 52         ......
Aa Vic 54         ......
Aa Vic 70         - ......
Aa Kab 101        ......
Aa Meyer et al.   ...... A ......
A. burtoni        G ...... A.

APPENDIX C:

Data Relevant to Chapter 3,

Allele frequency data for the six A. alluaudi samples.

161 OSUOSd Allele Jinja Pier Napoleon Gulf Kachira Kyoga Nawampasa Kabeleka 141 0.068 145 0.042 0.091 0.159 149 0.045 163 0.023 0.017 165 0.083 0.086 167 0.023 169 0.042 0.028 0.045 0.045 0.034 171 0.042 0.028 0.023 0.091 0.023 0.069 173 0.028 0.023 0.017 175 0.139 0.841 0.045 0.114 0.069 177 0.111 0.045 0.091 0.138 179 0.208 0.091 0.136 0.023 0.155 181 0.042 0.056 0.052 183 0.042 0.083 0.159 0.068 0.138 185 0.111 0.023 0.068 0.017 187 0.083 0.056 0.091 0.159 0.052 189 0.042 0.056 0.023 0.023 0.017 191 0.028 0.068 0.023 0.017 193 0.083 0.083 195 0.042 0.045 0.023 0.052 197 0.056 0.023 0.023 199 0.042 0.083 0.068 201 0.083 0.045 0.069 205 0.042 207 0.028 209 0.023 213 0.083 0.028

0SU12t Allele Jinja Pier Napoleon Gulf Kachira Kyoga Nawampasa Kabeleka 101 0.023 107 1.000 1.000 1.000 1.000 0.955 1.000 116 0.023

OSU13d Allele Jinja Pier Napoleon Gulf Kachira Kyoga Nawampasa Kabeleka 101 0.019 105 0.077 0.023 0.023 0.065 107 0.028 0.023 0.022 0.056 109 0.038 0.056 0.091 0.019 111 0.154 0.028 0.022 113 0.077 0.083 0.023 0.045 0.022 0.093 115 0.028 0.045 0.045 0.019 117 0.038 0.056 0.045 0.130 0.037 119 0.115 0.111 0.023 0.043 0.093

162 121 0.115 0.083 0.068 0.091 0.022 0.037 123 0.038 0.111 0.045 0.022 0.167 125 0.077 0.056 0.182 0.045 0.022 0.056 127 0.038 0.045 0.091 0.174 0.056 129 0.038 0.028 0.023 0.045 0.087 0.056 131 0.227 0.091 0.087 0.019 133 0.038 0.056 0.068 0.136 0.065 0.019 135 0.038 0.056 0.091 0.022 0.056 137 0.038 0.028 0.045 0.045 0.065 139 0.083 0.023 0.043 0.056 141 0.077 0.056 0.023 0.022 0.056 143 0.023 0.037 145 0.028 0.068 0.043 147 0.068 0.022 0.019 149 0.023 0.019 151 0.028 153 0.019 155 0.023 157 0.023

0SU 16d Allele Jinja Pier Napoleon Gulf Kachira Kyoga Nawampasa Kabeleka 72 0.885 0.882 1.000 0.909 0.957 0.850 74 0.038 0.088 0.045 0.022 0.150 76 0.077 0.029 0.023 78 0.023 0.022

0SU19d Allele Jinja Pier Napoleon Gulf Kachira Kyoga Nawampasa Kabeleka 114 0.023 118 0.045 122 0.022 126 0.022 128 0.029 0.022 130 0.029 0.023 0.033 132 0.029 0.022 0.017 138 0.038 0.117 140 0.045 0.017 142 0.023 0.033 144 0.088 0.023 0.022 146 0.023 0.023 0.065 0.033 148 0.115 0.136 0.065 0.017 150 0.077 0.118 0.250 0.023 0.022 152 0.269 0.088 0.023 0.068 0.022 154 0.038 0.147 0.045 0.087 0.017 156 0.077 0.118 0.091 0.043 0.033 158 0.077 0.029 0.227 0.091 0.109 0.017

163 160 0.154 0.088 0.114 0.045 0.043 0.067 162 0.115 0.059 0.068 0.045 0.109 0.100 164 0.038 0.159 0.091 0.022 0.133 166 0.029 0.136 0.045 0.043 0.050 168 0.059 0.091 0.109 0.050 170 0.023 0.043 0.117 172 0.029 0.022 0.033 174 0.022 0.033 176 0.022 0.050 178 0.059 0.022 0.033 180 0.022

0SU19t Allele Jinja Pier Napoleon Gulf Kachira Kyoga Nawampasa Kabeleka 122 0.083 125 0.045 126 0.038 0.056 0.045 0.043 127 0.022 128 0.023 0.022 130 0.077 0.056 0.022 131 0.028 0.310 0.205 0.083 132 0.115 0.222 0.333 0.205 0.130 0.033 133 0.067 134 0.038 0.056 0.114 0.109 0.167 135 0.023 0.109 0.150 136 0.190 0.068 0.109 0.117 137 0.231 0.167 0.167 0.068 0.283 0.017 138 0.028 0.023 0.022 139 0.038 0.056 0.045 0.067 140 0.423 0.333 0.068 0.087 0.133 141 0.045 0.043 0.067 142 0.038 0.023 146 0.017

OSU20d Allele Jinja Pier Napoleon Gulf Kachira Kyoga Nawampasa Kabeleka 141 0.089 145 0.091 147 0.068 149 0.045 0.045 151 0.029 0.068 0.091 153 0.023 0.091 0.023 0.018 155 0.029 0.045 157 0.042 0.088 0.114 0.068 0.054 159 0.042 0.118 0.023 0.068 161 0.125 0.091 0.018 163 0.167 0.118 0.045 0.023 0.023 0.089

164 165 0.042 0.068 0.161 167 0.042 0.059 0.182 0.023 169 0.088 0.068 0.089 171 0.029 0.023 0.091 0.136 0.125 173 0.029 0.045 0.023 0.036 175 0.083 0.059 0.023 0.023 0.125 177 0.042 0.045 0.036 179 0.042 0.029 0.023 0.018 181 0.042 0.023 183 0.029 0.023 0.023 0.068 185 0.059 0.023 0.023 0.018 187 0.059 0.054 189 0.042 0.023 0.018 191 0.088 0.023 0.023 193 0.042 0.029 0.023 0.023 0.036 195 0.023 0.023 0.023 197 0.083 0.029 0.091 0.023 199 0.068 0.045 201 0.042 0.068 0.018 203 0.023 0.045 205 0.083 0.023 0.045 209 0.029 0.045 211 0.042 213 0.045 217 0.045 223 0.114 227 0.045 0.023 237 0.023 241 0.023 253 0.023

OSU21d Allele Jinja Pier Napoleon Gulf Kachira Kyoga Nawampasa Kabeleka 146 0.059 156 1.000 0.941 0.955 1.000 0.977 0.683 158 0.045 0.023 166 0.117 178 0.200

OSU22d Allele Jinja Pier Napoleon Gulf Kachira Kyoga Nawampasa Kabeleka 89 0.023 95 0.045 0.022 99 0.045 101 0.022 103 0.050 105 0.050

165 107 0.038 0.056 0.043 0.017 111 0.067 113 0.056 0.065 0.083 115 0.038 0.111 0.017 117 0.017 119 0.038 0.028 0.023 0.065 0.017 121 0.038 0.045 0.022 0.033 123 0.023 0.022 0.017 125 0.038 0.028 0.023 0.043 127 0.077 0.159 0.087 0.050 129 0.028 0.023 0.043 131 0.115 0.083 0.023 0.043 0.017 133 0.114 0.068 0.087 0.017 135 0.038 0.045 0.022 0.067 137 0.056 0.022 0.017 139 0.077 0.056 0.043 141 0.091 0.091 0.043 0.017 143 0.028 0.045 0.087 145 0.038 0.083 0.068 0.045 0.022 0.050 147 0.192 0.083 0.068 0.045 0.043 0.017 149 0.038 0.056 0.023 0.045 0.043 0.033 151 0.083 0.136 0.043 0.017 153 0.115 0.028 0.023 0.033 155 0.038 0.083 0.023 0.043 0.017 157 0.023 0.033 159 0.038 0.023 0.023 0.033 161 0.028 0.050 163 0.045 165 0.159 0.033 167 0.038 0.023 0.068 0.022 0.033 169 0.023 0.017 171 0.114 0.017 173 0.028 0.045 175 0.023 0.017 177 0.023 0.050 181 0.023 183 0.023

APPENDIX D:

Data Relevant to Chapter 4,

Information on Cichlid Samples

Tissue sample ID, DNA sample ID, tissue sample collection locations and years for 24 haplochromine cichlid species. Information on A. alluaudi samples was presented in Appendix A.

167 DNA ID Sample Sampling Sampling DNA Sample Sampling ID lake year ID ID lake year

Y. fusiformts A. velifer YFl 19686 Victoria 1994 AVI 17015 Nabugabo 94 YF2 19688 2 17016 YF3 19689 3 17017 YF4 19690 4 17018 YF5 19691 5 17019 YF6 19692 6 17020 YF7 19693 7 17021 YF8 19694 8 17022 YF9 19695 9 19518 YFIO 19701 10 19519 YFll 19702 11 19520 YFI2 19703 12 19521 YFI3 19704 13 19522 YF14 19706 14 19523 YF15 19707 15 19531 YFI6 19708 16 19533 YF17 19709 17 19610 YFI8 19673 18 19611 YFI9 19674 19 19613 YF20 19675 20 19614 YF21 19676 21 19615 YF22 19677 22 19616 YF23 19678 23 19617 YF24 19679 24 19618 YF25 19720 25 19619 YF26 19721 27 19645 YF27 19722 28 19646 YF28 19723 29 19647 YF29 19700 30 19648 31 19649 P. Venator 32 19650 PVl 0575 Kayanja 1996 33 19651 2 0576 34 19652 3 0577 35 19653 4 0578 36 19654 5 0579 37 19655 6 0580 38 19661 7 0581 39 16424 93 8 0582 " 40 16425 9 0583 " 41 16428 10 0584 " 42 16429 II 0585 " 43 16498 Kayugi 93 12 0586 " 44 16432 13 0587 " 45 16433 14 0588 Kayanja 1996 46 16434

168 15 0589 47 16435 16 0590 48 16436 17 0591 49 16437 18 0592 50 16438 19 0641 51 16439 20 0642 52 16440 21 0643 53 16441 22 0644 54 16443 23 0646 55 16447 24 0647 56 16448 25 0648 57 16449 26 0649 58 16450 27 0650 59 16451 28 0651 60 16452 29 0652 61 16453 30 0653 62 16445 31 0654 63 16446 32 0655 65 16455 33 0656 66 16456 34 0657 67 16457 35 0658 68 16458 36 0659 69 16459 37 0662 70 16460 38 0663 71 16461 39 0664 72 16462 40 0665 A. caliptera A. latifasciata AstcMl AstcMl Malawi 1996 ALI 0732 Kyoga 1996 2 AstcM2 " AL2 0733 3 AstcM3 AL3 0735 4 AstcM4 AL4 0736 5 AstcM5 " AL5 0737 6 AstcM6 " AL6 0738 7 AstcM7 AL7 0739 8 AstcM8 " AL8 0741 9 AstcM9 " ALIO 0748 10 AstcMlO " " 11 0749 11 AstcMl 1 " 12 0750 12 AstcMl 2 " 13 0752 13 AstcMl 3 " 14 0753 14 AstcM14 " " 15 0754 15 AstcMl 5 " " 16 0755 16 AstcMl 6 " " 17 0756 17 AstcMl 7 " " 18 0766 19 0767 £■. nigripinnis 20 0768 Enil 98226 Edward 1996 21 0108 Nawampasa 1996 2 98259 22 0109 3 98273 23 0110 4 98278 24 0112 5 98279

169 25 0740 Kyoga 1996 6 98284 Edward 1996 26 0770 7 98290 ** " 27 0771 8 98293 " 28 0772 9 98297 " 29 0773 10 98298 " 30 0774 11 98301 " 31 0775 12 98340 32 0776 13 98344 "" 33 0777 14 98380 " 34 0778 35 0779 P- sp. 36 0781 Plvl PLVl Victoria 1995 37 0783 2 PLV2 38 0784 3 PLV3 39 0785 4 PLV4 40 0786 5 PLV5 6 PLV6 X. “ fat tooth” 7 PLV7 AFI 0160 Nawampasa 1996 8 PLV8 2 0161 9 PLV9 3 0162 10PLVIO 4 0163 11PLVll 5 0164 12 PLV12 6 0165 13PLV13 7 0166 14 PLV14 8 0167 15 PLV15 9 0168 16 PLV16 10 0169 II 0211 P. “rock kribensis” 12 0212 1 19458 Jinja Pier 1994 13 0213 2 19460 14 0214 3 19461 15 0235 4 19462 16 0270 5 19467 17 0271 6 19472 18 0273 7 19480 19 0274 8 19488 20 0275 9 19491 10 19493 A. elegans 11 19495 AEl Edl32 Edward 1996 12 19496 2 133 13 19497 3 134 14 19500 4 135 15 19501 5 136 16 16584 1993 6 137 17 16860 7 138 18 16858 8 139 19 16559 9 140 20 16852 10 160 21 16864 11 161 Edward 1996 22 16869

170 12 162 23 16873 13 163 24 16855 14 164 25 16857 15 165 26 16557 16 166 27 16853 17 167 28 16885 18 168 29 16886 19 169 30 16887 20 170 31 16888

P. multicolor victoriae T. wingatti PMI CHL16 Nabugabo 1995 TWl 97655 Edward 19‘ 2 CHL17 2 97656 3 CHL18 3 97657 4 CHL19 4 97658 5 CHL20 5 97664 6 CHL76 6 97668 7 CHL22 7 97669 8 CHL23 8 97671 9 CHL24 9 97672 10 CHL25 10 97676 11 MAN16 11 97677 12 MAN17 12 97682 13 MAN18 13 97683 14 MAN19 14 97684 15 MAN20 15 97687 16 MAN21 16 97690 17 MAN22 17 97692 18 MAN23 18 97693 19 MAN24 19 97694 20 MAN25 20 98341 21 BC96PMV Kabeleka 1996 21 98359 22 K97PMV 22 98361 23 K98PMV 23 98375 24 K99PMV 24 98381 25 KlOOPMV 25 98382

T. pliaryngalis A. aenocolor TPI Edi 96 Edward 1996 Aael EDI 3 Edward 19S 2 197 2 ED14 3 198 3 ED15 4 200 4 ED16 5 201 5 ED17 6 202 6 ED18 7 203 7 ED19 8 204 8 ED20 9 205 9 ED21 10 206 10 ED22 11 207 11 ED23 " 12 208 12 ED28 " 13 209 Edward 1996 13 ED29 "

171 14 210 14 ED30 15 211 15 ED31 16 212 16 ED32 17 213 17 ED33 18 214 18 ED34 19 215 19 ED35 20 216 20 ED36

P. beadlei P. schuboltz PDl 0493 Nabugabo 1996 PScl ED75 Edward 1996 2 0494 2 ED 141 3 0495 3 ED 142 4 0496 4 ED 143 5 0497 5 ED 144 6 0498 6 ED 145 7 0499 7 ED 146 8 0500 8 ED 147 9 0501 9 ED148 10 0502 10 ED 149 11 0503 11 ED 150 12 0504 12 ED151 13 0505 13 ED 152 14 0506 14 ED 153 15 0507 15 ED 154 16 0508 16 ED 155 17 0509 17 ED 183 18 0510 18 ED 184 19 0512 19 ED185 20 0513 20 ED 186

H. squantipinnis H. “slick’ HSqL 98221 Edward 1996 HSU 97300 Edward 1994 2 98224 2 97301 3 98225 3 97303 4 98232 4 97304 5 98236 5 97306 6 98237 6 97307 7 98238 7 97310 8 98239 8 97314 9 98240 9 97315 10 98251 10 97320 11 98253 11 19787 12 98254 12 19931 13 98255 13 19970 14 98256 14 19890 15 98258 15 19891 16 98266 16 19861 17 98270 17 19858 18 98271 18 19955 19 98321 19 19971 20 98330 20 97651

172 A. niibila H. “ ruby” Anul 17442 Nawampasa 1995 Rubyl 17401 Nawampasa 1995 2 17443 2 17402 3 17444 3 17403 4 17445 4 17404 5 17446 5 17405 7 17457 6 17406 8 17486 7 17407 9 17487 8 17408 10 17488 9 17409 11 17489 10 17410 12 17490 11 17411 13 17509 12 17412 14 17510 13 17413 15 17511 14 17414 16 17512 15 17415 17 17513 16 17757 18 17514 17 17758 19 17515 18 17768 20 17516 19 17769

P. “shovel mouth” P. “black para” A shl 0125 Nawampasa 1996 Pbl 17558 Kyoga 1995 2 0126 2 17571 "" 3 0127 3 17572 "" 4 0128 4 17573 "" 5 0132 5 17574 " " 6 0218 6 17619 Kyoga 1995 7 0219 7 17714 Kyoga 1995 8 0220 8 17715 Kyoga 1995 9 0221 9 17716 10 0222 10 17717 11 0223 11 17719 12 0224 12 17720 13 0291 13 17722 14 0292 14 17725 15 0293 15 17728 16 0296 16 17711 17 0297 17 17572 18 0298 18 17581 19 0299 19 17176 20 0300 20 17636 21 0301 21 17724 22 0302 22 17721 23 0303 23 17723 24 0305 25 0306 P. laparagramma 26 0307 Plal 19672 Victoria 1994 27 0308 2 19775 "" 28 0309 Nawampasa 1996 3 19726 Victoria 1994

173 29 0310 4 19761 Victoria 1994 30 17464 Nawampasa 1995 5 19762 31 17466 6 19684 32 17522 7 19685 8 19711 Y. pappenhemmi 9 19712 Ypal 97418 Edward 1994 10 19705 2 97354 1994 3 97467 1994 4 98243 1996 5 98349 1996 6 98345 1996 7 98228 1996 8 ED6YP 9 ED7YP 10 ED8Ysp 11 ED107YP

APPENDIX E:

Data Relevant to Chapter 4,

Allele frequency data for all the 25 species at 14 microsatellite markers.

Alleles were designated as the sizes (in base pairs) of PCR products. Species names were abbreviated as follows:

Aal: A. alluaudi; Aae: A. aenocolor; Acm: A. caliptera from L. Malawi;

Ael: A. elegans; Afa: A. “fat tooth”; Ala: A. latifasciata;

Anu: A. nubila; Ave: A. velifer; Eni: E. nigripinnis;

Hru: H. “ruby”; Hsq: H. squamipinnis; Hsl: H. “slick”;

Pbp: P. “black para”; Pbe: P. beadlei; Pro: P. “rock kribensis”;

Plv: P. sp. (Victoria); Pve: P. venator; Psc: P. schuboltz;

Psh: P. “shovel mouth”; Pmu: P. multicolor victoriae;

Tph: T. pharyngalis; Twi: T. wingatti; Yfu: Y. fusiformis;

Yla: Y. laparagramma; Ypa: Y. pappenhemi.

175 0SU16d Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psb Afa Ael AcM Ypa Via Yfu Pmu Hru Hsq Hsl Pvc Tpli Twi Enl Psc 64 0.235 66 0.265 0.460 68 0.118 72 0.894 0.382 0.400

74 0.058 0.100

76 0.038

78 0.010 0.026

80 0.053 0.079 0.020

82 0.094 0.0750.075 0.0080.008 0.0130.013 0.105 0.105 0.0500.050 0.052 0.026 84 0.0260.026 0.025 0.045 0.2110.211 0.020 0.020 0.031

86 0.016 0.0750.075 0.0220.022 0.0230.023 0.013 0.013 0.0830.083 0.100 0.017 0.053 0.026 0.040 0.036 88 0.081 0.094 0.0500.050 0.0230.023 0.0390.039 0.0260.026 0.026 0.026 0.0330.033 0.025 0.025 0.075 0.150 0.020 0.026 0.211 0.020 0.107 90 0.026 0.026 0.025 0.263 0.040 0.036 92 0.048 0.0250.025 0.030 0.030 0.026 0.026 0.050 0.050 0.050 0.050 0.075 0.045 0.200 0.034 0.020 0.125 0.053 0.120 ^ 94 0.022 0.015 0.053 0.017 0.025 0.050 0.034 0.053 0.026 0.036 95 0.094 0.025 0.061 0.092 0.0530.053 0.017 0.017 0.125 0.025 0.050 0.069 0.026 0.053 0.040 98 0.0430.043 0.030 0.030 0.105 0.105 0.026 0.026 0.050 0.050 0.025 0.091 0.100 0.026 0.053 0.080 0.036 100 0.016 0.063 0.075 0.0430.043 0.145 0.145 0.079 0.079 0.0500.050 0.034 0.026 0,036

102 0.016 0.025 0.022 0.250 0.197 0.0260.026 0.026 0.026 0.0170.017 0.075 0.075 0.050 0.034 0.079 0.060 0.063 104 0.032 0.063 0.050 0.022 0.053 0.132 0.017 0.075 0.050 0.069 0.026 0.036 0.031 106 0.113 0.031 0.022 0.038 0.026 0.079 0.079 0.017 0.050 0.050 0.091 0.050 0.034 0.026 0.237 0.025 0.038 0.020 0.036 0.125

108 0.306 0.063 0.0250.025 0.0220.022 0.0830.083 0.053 0.053 0.079 0.079 0.050 0.150 0.017 0.063

110 0.032 0.094 0.0500.050 0.023 0.023 0.026 0.026 0.026 0.026 0.053 0.053 0.0170.017 0,0250,025 0.025 0.026 0.388 0.125 112 0.048 0. 0.125 0.065 0.015 0.039 0.026 0.025 0.052 0.026 0.079 0.050 0.088 0.060 0.036 0.094

114 0.065 0. 0.025 0.022 0.038 0.013 0.053 0.017 0.125 0.045 0.050 0.017 0.026 0.075 0.026 0.080 0.143

116 0.016 0.0500.050 0.015 0.015 0.0660.066 0.0260.026 0.026 0.083 0.083 0.025 0.045 0.034 0.050 0.053 0.040 0.036 118 0.048 0.031 0.0150.015 0.0530.053 0.0670.067 0.075 0.045 0.053 0.026 0.050 0.040 0.036 0.031 120 0.032 0.050 0.065 0.045 0.013 0.033 0.025 0.045 0.017 0.079 0.026 0.075 0.120 0.036 0.031

122 0 . 0.000 0.043 0.030 0.013 0.026 0.026 0.050 0.025 0.045 0.050 0.017 0.026 0.020 0.071 0.053 124 0.075 0.087 0.045 0.013 0.105 0.150 0.050 0.136 0.121 0.026 0.200 0.080 126 0,156 0,075 0,087 0,061 0,013 0,053 0,050 0,025 0,045 0,050 0,052 0,079 0,079 0,075 0,071

128 0,016 0,031 0,025 0,043 0,023 0,026 0,053 0,017 0,100 0,045 0,052 0,026 0,053 0,050 0,020 0,071 0,094 130 0,065 0,023 0,013 0,053 0,026 0,050 0,100 0,091 0,050 0,069 0,026 0,026 0,075 0,200 0,094 132 0,032 0,025 0,087 0,015 0,053 0,033 0,050 0,025 0,091 0,069 0,026 0,053 0,025 0,025 0,020 0,094 134 0,022 0,045 0,053 0,053 0,050 0,075 0,069 0,053 0,031 136 0,043 0,013 0,053 0,026 0,017 0,017 0,053 0,020 0,036 138 0,016 0,031 0,025 0,022 0,015 0,026 0,017 0,025 0,053 0,038 0,040 0,036 140 0,032 0,025 0,065 0,026 0,053 0,026 0,0500,200 0,036 0,031

142 0,063 0,025 0,043 0,008 0,026 0,045 0,026 0,053

144 0,016 0,022 0,015 0,013 0,026 0,017

146 0,026 0,017 0,045 0,026

148 0,016 0,008 0,053 0,026

150 0,026 0,053 0,025 0,025 152 0,033 0.105 0,025 154 0,026 156 0,036

158 0.025

OSU09d

Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psh Afa Ael AcM Ypa Yla Yfu Pmu Hru Hsq Hsl Pve Tph Twi Eni Psc

122 0,694 0,813 0,875 0,955 0,771 0,936 0,800 0,868 0,429 0,925 0,600 0,794 0,773 0,600 0,817 0,806 0,875 0,975 1,000 0,400 0,700 0,786 0,275

124 0,025 0,007 0,050 125 0,056 126 0,075 0,025 129 0,013 135 0,007 0,026 0,028

137 0,136 139 0,194 0,025 0,029 0,025 0,100 0,028

143 0,113 0,094 0,075 0,171 0,125 0,079 0,036 0,275 0,176 0,045 0,250 0,117 0,028 0,125 0,025 0,400 0,180 0,143 0,725 145 0,048

147 0,031 0,017 0,063

151 0,045 0,026 0,029 0,045 0,208 0,056 0,150 0,040 0,071 155 0.208

157 0.054 159 0.007 0.089 0.050 0.050 0,080

161 0.357 163 0.007 0.021 165 0.019 0.021

167 0.010 0.063 0.051 0.100 169 0.038 171 0.058 0.017 0.042 173 0.010 175 0.067 0.125 177 0.058 0.042 179 0.106 0.042

181 0.029 0.017 183 0.106 0.021

185 0.048

187 0.077 0.017
189 0.038 0.021
191 0.038 0.021
193 0.048
195 0.029 0.021
197 0.029

199 0.067 0.018 201 0.019

203 0.018 0.021 205 0.010 0.021

207 0.010

209 0.010

211 0.021
213 0.029 0.021
217 0.042
219 0.021

OSU12t

Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psh Afa Ael AcM Ypa Yla Yfu Pmu Hru Hsq Hsl Pve Tph Twi Eni Psc

101 0.529 0.045 0.025
104 0.026
107 1.000 0.935 0.875 0.882 0.891 0.754 0.868 1.000 0.895 0.983 0.900 1.000 0.471 0.909 0.611 0.879 0.188 0.947 1.000 0.925 0.613 0.975 0.940 1.000 0.947
110 0.007 0.625
113 0.065 0.125 0.118 0.043 0.224 0.132 0.105 0.017 0.100 0.045 0.389 0.121 0.188 0.053 0.075 0.388 0.060 0.026
119 0.065
125 0.015

OSU19d

Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psh Afa Ael AcM Ypa Yla Yfu Pmu Hru Hsq Hsl Pve Tph Twi Eni Psc

86 0.140
88 0.580
90 0.040
92 0.080
94 0.020
96 0.060
98 0.060
102 0.020
106 0.206
108 0.029
110 0.059
112 0.045
114 0.010 0.023
116 0.053
118 0.019

120 0.079 0.026 0.059 0.029

122 0.029 124 0.094 0.017 0.036 0.059 126 0.033 0.029 128 0,010 0.017 0.0340.026 0.029 0.045 0.029 130 0.019 0.008 0.059

132 0.010 0.022 0.068 0.053 0.029 0.052 0.079 0.118 134 0.017 0.063 0.022 0.023 0.017 0.026 0.017 0.0560.086 0.088 136 0.017 0.022 0.045 0.017 0.026 0.056 0.079 0.029 0.023 0.036 138 0.010 0.015 0.0260.0260.086 0.083 0.052 0.026 0.107 0.029 140 0.019 0.094 0.043 0.030 0.017 0.026 0.026 0.069 0.059 0.034 0.059 0.036 142 0.010 0.050 0.063 0.067 0.023 0.017 0.026 0.056 0.034 0.053 0.026 0.088 0.023 0.071 0.029 144 0.038 0.017 0.063 0.133 0.043 0.015 0.034 0.034 0.056 0.111 0.052 0.053 0.029 0.023 146 0.010 0.100 0.033 0.065 0.030 0.069 0.053 0.121 0.067 0.028 0.206 0.136 0.103 0.026 0.184 0.176 0.026 0.029 0.068 0.029 148 0.087 0.017 0.033 0.130 0.045 0.138 0.105 0.053 0.069 0.133 0.091 0.017 0.053 0.026 0.059 0.051 0.114 0.029

150 0.067 0.200 0.067 0.109 0.030 0.017 0.000 0.053 0.033 0.083 0.088 0.091 0.056 0.017 0.105 0.079 0.029 0.256 0.088 0.023 0.071 0.029 152 0.125 0.017 0.156 0.033 0.022 0.045 0.052 0.053 0.053 0.017 0.033 0.056 0.222 0.069 0.053 0.105 0.029 0.103 0.029 0.023 0.206 154 0.077 0.031 0.033 0.065 0.068 0.103 0.026 0.079 0.034 0.133 0.111 0.103 0.132 0.053 0.206 0.118 0.068 0.143 0.088

156 0.096 0.200 0.094 0.100 0.087 0.015 0.155 0.053 0.132 0.052 0.067 0.111 0.045 0.086 0.132 0.029 0.026 0.118 0.045 0.143 0.147

158 0.067 0.033 0.031 0.043 0.053 0.052 0.053 0.079 0.103 0.033 0.111 0.118 0.091 0.167 0.052 0.105 0.105 0.088 0.385 0.029 0.114 0.107 0.088
160 0.087 0.083 0.125 0.067 0.087 0.030 0.103 0.105 0.105 0.200 0.056 0.045 0.052 0.053 0.059 0.115 0.118 0.114 0.071 0.029
162 0.067 0.100 0.031 0.100 0.022 0.045 0.034 0.026 0.053 0.103 0.167 0.222 0.059 0.056 0.069 0.053 0.029 0.029 0.114 0.036

164 0.048 0.083 0.063 0.133 0.043 0.083 0.017 0.026 0.026 0.034 0.033 0.083 0.045 0.111 0.017 0.053 0.079 0.059 0.114 0.036 166 0.029 0.033 0.031 0.033 0.043 0.091 0.052 0.053 0.053 0.103 0.059 0.034 0.053 0.053 0.071 0.029 168 0.058 0.063 0.033 0.022 0.061 0.017 0.053 0.079 0.017 0.026 0.118 0.038 0.023 0.059 170 0.010 0.133 0.022 0.045 0.017 0.105 0.033 0.029 0.182 0.034 0.026 0.079 0.023

172 0.010 0.017 0.022 0.015 0.034 0.026 0.052 0.028 0.091 0.056 0.017 0.026 0.029 0.023 174 0.022 0.038 0.026 0.053 0.017 0.045 0.029

176 0.043 0.023 0.033 0.045 0.029 0.036

178 0.019 0.023 0.028 0.045 0.026

180 0.015 0.069

182 0.008

184 0.008

OSU19t

Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psh Afa Ael AcM Ypa Yla Yfu Pmu Hru Hsq Hsl Pve Tph Twi Eni Psc

96 0.028

99 0,033

101 0,017 0,156 0,071 0,014 0,289 0,053 0,500 0,028 0,045 0,017 0,021 0,075 0,775 0,113 0,025 103 0,083 0,229 0,325 0,250 0,152 0,147 104 0,026 0,059 106 0,026 0,045 0,5420,059 0,6250,130 0,088 108 0,007 0,021 0,025

110 0,031 0,028 0,029 0,025 0,038 112 0,207 0,156 0,421 0,024 0,050 0,056 0,184 0,105 0,083 0,133 0,611 0,409 0,067 0,083 0,176 0,225 0,050 0,025 0,391 0,077 0,265

114 0,063 0,024 0,021 0,079 0,105 0,033 0,056 0,045 0,083 0,021 0,175 0,075 0,800 0,130 0,385

116 0,017 0,021 0,139 0,053 0,050 0,467 0,083 0,063 0,091 0,056 0,050 0,042 0,125 0,025 0,043 0,231 0,147

118 0,017 0,184 0,310 0,100 0,014 0,026 0,026 0,183 0,033 0,083 0,091 0,021 0,176 0,050 0,025 0,088 0,038 0,059 120 0,017 0,156 0,079 0,129 0,056 0,079 0,033 0,083 0,029 0,050 0,022 0,038

122 0.034 0.026 0.064 0.222 0.026 0.033 0.017 0.029 0.022 0.038
124 0.034 0.031 0.026 0.071 0.086 0.069 0.079 0.033 0.045 0.029
125 0.019 0.022
126 0.047 0.086 0.031 0.007 0.028 0.053 0.219 0.022
128 0.009 0.021 0.069 0.079 0.031 0.056 0.033
130 0.038 0.034 0.031 0.024 0.053 0.029 0.059
131 0.094

132 0,189 0,034 0,063 0,026 0,024 0,036 0,014 0,026 0,017 0,033 0,031 0,300 0,059 134 0,075 0,031 0,095 0,029 0,026 0,094 0,059 135 0,009

136 0,028 0,259 0,024 0,036 0,014 0,026 0,053 0,067 0,500 0,091 0,017 0,038 137 0,142

138 0,019 0,034 0,053 0,024 0,043 0,028 0,026 0,045 0,056 0,017 0,029

139 0,047

140 0,245 0,017 0,0290,014 0,026 0,031 0,056 0,033 0,029 141 0,019

142 0,019 0,017 0,053 0,021 0,014 0,067 0,028 0,056 0,017 0,029 144 0,0480,129 0,0830,026 0,029 146 0,017 0,063 0,026 0,024 0,021 0,014 0,017 0,031 0,091 0,056 0,021 0,059 0,038 148 0,017 0,031 0,029 0,028 0,0530,053 0,017 0,029 0,022 150 0,017 0,024 0,021 0,167 0,033 0,022 0,038 0,029 152 0,017 0,063 0,026 0,095 0,007 0,014 0,111 0,017 0,038 0,059 154 0,017 0,024 0,007 0,042 0,0260,053 0,017 0,029 0,022 156 0,017 0,024 0,014 158 0,014 0,029 0.025 0,059 160 0,017 0,048 0,026 162 0,053 0,053 0,017 163 0,017 164 0,017 155 0,053 0,056 0,017 166 0,029

167 0,017 0,031 0,033 0,017 168 0,029

169 0.053 0.167 0.017
170 0.048 0.025
171 0.033 0.111 0.017
172 0.048 0.067 0.029
173 0.007 0.026 0.056
175 0.031 0.053 0.050
177 0.017 0.031 0.026 0.033

179 0,017 181 0,014

185 0,007 0,033

OSU20d

Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psh Afa Ael AcM Ypa Yla Yfu Pmu Hru Hsq Hsl Pve Tph Twi Eni Psc

117 0,136 119 0,018 0,114

121 0,205 123 0,045 125 0,023 127 0.045 129 0.023 133 0,094 135 0.045 137 0.029 0.023

141 0.031 0.071 0.059 0.023 0.028 0.132 0.026 143 0.050 0.023 0.026 145 0.039 0.077 0.125 0.0290.045 0.023 0.188 0.024 0.036 0.105 147 0.029 0.023 0.031 0.056 0.088 0.017 0.056 0.053 0.024 149 0.020 0.071 0.175 0.111 0.029 0.045 0.091 151 0.039 0.017 0.008 0.076 0.026 0.094 0.029 0.068 0.2500.048 0.036 153 0.039 0.017 0.156 0.023 0.045 0.063 0.211 0.107 0.063 0.045 0.050 0.083 0.068 0.026 0.026 0.071 155 0.029 0.033 0.025 0.023 0.045 0.094 0.026 0.018 0.063 0.088 0.045 0.017 0.023 0.083 0.031

157 0.088 0.033 0.143 0.025 0.046 0.083 0.050 0.017 0.023 0.056 0.071 0.026
159 0.059 0.067 0.031 0.071 0.015 0.031 0.053 0.028 0.031 0.118 0.050 0.083 0.056 0.053 0.053 0.156 0.048 0.036 0.158
161 0.069 0.050 0.200 0.023 0.015 0.079 0.125 0.125 0.147 0.100 0.100 0.083 0.105 0.063 0.024 0.143 0.026
163 0.088 0.125 0.143 0.100 0.185 0.094 0.029 0.056 0.079 0.079 0.024 0.107
165 0.039 0.117 0.031 0.071 0.050 0.023 0.045 0.018 0.056 0.063 0.029 0.045 0.017 0.056 0.026 0.105 0.156 0.048 0.071 0.105

167 0.029 0.167 0.063 0.036 0.023 0.015 0.031 0.026 0.018 0.083 0.063 0.050 0.028 0.132 0.031 0.053 169 0.029 0.083 0.031 0.050 0.038 0.015 0.026 0.056 0.031 0.136 0.050 0.111 0.026 0.026 0.036 171 0.049 0.067 0.071 0.025 0.046 0.106 0.031 0.091 0.100 0.050 0.079 0.158 0.031 0.048 173 0.010 0.050 0.071 0.025 0.038 0.061 0.105 0.056 0.031 0.029 0.045 0.067 0.105 0.026 0.024 0.079 175 0.049 0.017 0.071 0.031 0.091 0.083 0.031 0.050 0.028 0.079 0.079 0.048 177 0.010 0.025 0.031 0.036 0.056 0.045 0.100 0.056 0.024 0.036

179 0.029 0.017 0.050 0.038 0.015 0.071 0.031 0.029 0.182 0.100 0.033 0.026 0.095 0.071 181 0.010 0.036 0.023 0.152 0.026 0.031 0.091 0.050 0.028 0.053 0.026 0.071 0.036 0.079 183 0.020 0.017 0.094 0.050 0.023 0.076 0.026 0.054 0.056 0.033 0.026 0.0480.036

185 0.029 0.067 0.031 0.036 0.091 0.053 0.036 0.083 0.100 0.067 0.026 0.053 0.071 0.036 0.158 187 0.020 0.063 0.046 0.030 0.031 0.105 0.018 0.031 0.017 0.028 0.026 0.094 0.024 0.036

189 0.010 0.050 0.046 0.219 0.053 0.067 191 0.029 0.050 0.025 0.085 0.031 0.018 0.150 0.033 0.048

193 0.020 0.156 0.015 0.030 0,031 0.036 0.056 0.026 0.071 0.105 195 0,010 0.031 0.031 0.018 0.017 0.028 0.026 0,036 197 0,029 0,008 0.015 0.063 0.031 0.050 0.017 0.028 0,053 199 0.017 0,024 201 0,010 0.063 0.015 0.031 0.028 0.063 0.029 0.033 0,053 203 0.031 0.067 0.028 205 0,020 0.031 0.008 0.071 0.088 0.028 0.026

207 0.036 0.029 0.079 0,026

209 0,010 0,017 0.031 0.054 0,048 211 0,010 0,033 0,031 0.018 0.031 0.059

213 0,020 0.031 . 0.025 0.036 0.045 0.050 0,175 0,024 0,036 0,026 215 0.036 0,025 0.018

217 0,017 0.071 0.030 0.036 0,013

219 0.017 0.083 0.029 0,026

221 0.033 0.031 0.018 0.056 0.029 0.113
223 0.045 0.050
225 0.026 0.013
227 0.010 0.031 0.028 0.028 0.025
229 0.017 0.025 0.008 0.015 0.026 0.113 0.036 0.026

231 0,028 0,163 0,024 0,036 233 0.045 0,025

235 0,015 0,018 0,038 0,036

237 0.031 0.015 0,088

239 0.053 0.018 0,028 0,025

241 0.025 0,025

243 0.018 0,050 245 0.026

247 0,025

249 0,025 0,036 251 0.053 0.045 0,013 253 0.036 255

259 0,013 261 0.013

OSU21d

Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psh Afa Ael AcM Ypa Yla Yfu Pmu Hru Hsq Hsl Pve Tph Twi Eni Psc

146 0.019
152 1.000

154 0.290 0.188 0.125 0.087 0.123 0.045 0.200 0.079 0.414 0.167 0.158 0.455 0.017 0.211 0.850 0.450 0.600 0.100 0.060 0.036 0.079

156 0.981 0.710 0.813 0.875 0.913 0.870 0.955 0.750 0.921 0.586 0.833 0.684 1.000 0.500 1.000 0.983 0.789 0.075 0.550 0.400 0.900 0.940 0.964 0.079 158 0.050 0.158

166 0.007 0.045 0.075 0.842

UNH142

Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psh Afa Ael AcM Ypa Yla Yfu Pmu Hru Hsq Hsl Pve Tph Twi Eni Psc

145 0.063 0.360
147 1.000 0.040
149 0.060
151 0.016 0.438 0.111 0.320
153 0.031 0.156 0.017 0.220

155 0.219

157 0.125 0.086

159 0.031 0.031 0.017 0.083 161 0.053 0.031 0.083 0.035 0.578 0.018 0.029 0.056 0.053 0.050 0.042

163 0.031 0.012 0.016 0.147 0.138 0.026 0.050 0.025 0,107 165 0.042 0.012 0.031 0.018 0.059 0.029 0.056 0.034 0.053 0.139

167 0.026 0.031 0.042 0.093 0.172 0.438 0.278 0.018 0.088 0.265 0.050 0.250 0.625 0.250 0.167 0.143 0.083 169 0.474 0.625 0.250 0.318 0.244 0.094 0.094 0.333 0.411 0.088 0.150 0.111 0.034 0.053 0.125 0.250 0.125 0.333 0.286 0.042 171 0.333 0.182 0.174 0.031 0.188 0.278 0.411 0.176 0.088 0.250 0.056 0.155 0.263 0.125 0.025 0.444 0.083 173 0.105 0.125 0.068 0.163 0.031 0.094 0.083 0.071 0.176 0.088 0.200 0.444 0.397 0.368 0.125 0.025 0.278 0.250 0.083 175 0.079 0.386 0.070 0.031 0.028 0.018 0.059 0.147 0.0560.052 0.105 0.025 0.050 0.125 0.200 0.042 177 0.158 0.094 0.012 0.094 0.059 0.150 0.0260.075 0.125 0.100 0.036

179 0.063 0.042 0.047 0.018 0.176 0.050 0.034 0.133 0.071 0.042

181 0.125 0.023 0.035 0.018 0.050 0.017 0.075 0.125 0.033 0.083 183 0,105 0.083 0.023 0.050 0.139 0.125 0.033 0.071 185 0.058 0.059 0.050 187 0.023 0.029 0.125 189 0.050 0.083 191 0.023 0.118 0.036 193 0.029 0.017 0.053 0.083 195 0.059 0.050

197 0.056 0.042 199 0.042

201 0.029 0.056 0.083 203 0.042 205 0.042

UNH149

Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psh Afa Ael AcM Ypa Yla Yfu Pmu Hru Hsq Hsl Pve Tph Twi Eni Psc

93 0.125
95 0.029
105 0.406 0.091

107 0.043 0.125 0.250 0.194

109 0.043 0.045 0.357 0.056 0.029 111 0.071 0.026 0.471 112 0.250 0.375 0.286 0.054 0.107 0.028 113 0.139 0.313 0.071 0.022 0.271 0.194 0.294 0.053 0.136 0.045 0.156 0.091 0.333 0.870 0.375 0.025 0.079 0.071 0.088 115 0.389 0.250 0.036 0.022 0.057 0.226 0.235 0.316 0.054 0.545 0.182 0.125 0.045 0.667 0.093 0.353 0.050 0.053 0.036 0.111 0.500 0.206 116 0.145 0.147 0.091 0.227 0.219 0.143 0.071 0.029

117 0.065 0.118 0.079 0.036 0.136 0.045 0.053 0,107 0.194

118 0.571 0.063 0.045 0.031 0.725 119 0.043 0.014 0.045 0.045 0.028 120 0.105 0.268 0.031 0.100 0.071 121 0.035 0.088 0.056 0.029 125 0.035 0.037 0.026 126 0.028 0.029 127 0.022 0.045 0.053 129 0.012

131 0.093 0.059

133 0.047 0.043 0.057 0.032 0.111 0.036 135 0.058 0.031 0.036 0.326 0.016 0.026 0.036 137 0.035 0.056 0.364 0.083 0.132 0.036

139 0.140 0.111 0.196 0.029 0.263 0.045 0.136 0.021 0.050 0.036 141 0.035 0.028 0.130 0.057 0.026 0.042 0.029 143 0,151 0.143 0.065 0.071 0.263 0.029 145 0.012 0.214 0.071 0.021 0.026 147 0.047 0.057 0.045 0.042 149 0.071 0.021 0.050 0.079 0.036

151 0.023 0.022 0.016 0.029 0.045 0.031 0.029

153 0.012 0.014 0.113 0.079 0.018 0.063 0.059 155 0.093 0.113 0.045 0.156 0.132

157 0.058 0.071 0.022 0.045 0.042 0.053 0.083 0.071
159 0.035 0.091 0.104 0.031
161 0.047 0.022 0.091 0.063 0.029 0.125 0.026
162 0.016 0.056
163 0.091 0.091 0.063
164 0.056

165 0,023 0.029 0.021 0.036 0.029

167 0.016 0.063

169 0.083

171 0.012 0.042 0.029 0.028

173 0.016 0.021 0.294

175 0.016 0.021

177 0.045 0.021

179 0.053 0.029 181 0.036 0.021 0.118

183 0.042 185 0.021 187 0.016 0.059 189 0.021 191 0.021 193 0.021 197 0.021

UNH169

Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psh Afa Ael AcM Ypa Yla Yfu Pmu Hru Hsq Hsl Pve Tph Twi Eni Psc

131 0.026 133 0.026 0.044 0.025 135 0.750 0.844 0.900 0.925 0.974 0.938 0.824 0.650 0.868 0.783 0.895 0.400 0.500 0.500 0.944 0.692 0.938 0.725 0.850 1.000 0.950 0.500 0,571 0.600 137 0.224 0.156 0.100 0.075 0.026 0.038 0.132 0.350 0.132 0.217 0.079 0.600 0.441 0.500 0.056 0.308 0.063 0.275 0.125 0.050 0.500 0.429 0.250 139 0.029 0,150

143 0.029

153 0.100 161 0.013

163 0.013
165 0.100
167 0.160
169 0.020
171 0.040
175 0.100
179 0.020
181 0.020
187 0.040
189 0.060
191 0.020
195 0.040
197 0.020
199 0.020
201 0.020
203 0.040
209 0.020
215 0.020
217 0.040
219 0.020
221 0.040
235 0.040

DXTUCA-14

Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psh Afa Ael AcM Ypa Yla Yfu Pmu Hru Hsq Hsl Pve Tph Twi Eni Psc

71 0.063
73 0.022
75 0.022 0.048

79 0.051 0.097

81 0.017 0.022
83 0.210 0.107 0.029
85 0.014 0.081 0.143
87 0.056 0.032 0.028 0.036
89 0.031 0.014 0.032 0.143 0.026 0.107 0.036 0.018 0.059 0.036
91 0.063 0.014 0.071 0.033 0.018 0.056 0.036
93 0.065 0.026 0.042 0.071
95 0.063 0.056 0.026 0.050 0.036
97 0.034 0.031 0.026 0.083 0.071
99 0.022 0.017 0.028 0.056 0.125 0.125 0.071 0.036
101 0.022 0.028 0.032 0.071 0.026 0.067 0.071 0.054 0.056 0.237 0.059 0.083 0.036

103 0.068 0.094 0.132 0.022 0.042 0.032 0.105 0.071 0.091 0.036 0.111 0.059 0.036

105 0.125 0.063 0.026 0.069 0.032 0.0530.050 0.036 0.083 0.167 0.036 0.071

107 0.051 0.156 0.063 0.053 0.014 0.048 0.053 0.017 0.111 0.071 0.111 0.071 0.028 0.029 0.0420.036 0.036 109 0.031 0.063 0.022 0.056 0.071 0.053 0.050 0.111 0.107 0.045 0.111 0.036 0.088 0.125 0.036 111 0.085 0.031 0.063 0.105 0.152 0.028 0.065 0.214 0.026 0.067 0.083 0.036 0.143 0.227 0.056 o.in 0.036 113 0.068 0.156 0.156 0.105 0.152 0.014 0.071 0.026 0.033 0.167 0.071 0.071 0.136 0.056 0.018 0.111 0.039 0.118 0.083 0.071

115 0.156 0.105 0.056 0.032 0.033 0.028 0.071 0.036 0.036 0.125 0.092 0.088 0.107 0.036

117 0.102 0.031 0.105 0.042 0,032 0,143 0.184 0.133 0,167 0.071 0.136 0,0560,018 0,250 0,039 0.167 0.071 0.107 119 0,017 0,188 0,026 0,152 0,056 0,071 0,033 0,083 0,107 0,111 0,054 0,028 0,125 0,029 0,042 0,071 0,107 121 0,063 0,063 0,053 0,196 0,069 0,048 0,026 0,033 0,036 0,045 0,056 0,036 0,063 0,125 0,235 0,083 0,036 0,071 123 0,136 0,053 0,043 0,042 0,079 0,133 0,071 0,054 0,188 0,026 0,059 0,036 0,071 125 0,136 0,031 0,132 0,056 0,071 0,026 0,033 0,107 0,089 0,028 0,224 0,059 0,036 0,071 127 0,031 0,065 0,056 0,048 0,071 0,017 0,056 0,071 0,222 0,107 0,118 0,088 0,036 0,036

129 0,085 0,031 0,053 0,022 0,042 0,016 0,079 0,017 0,028 0,071 0,036 0,091 0,071 0,063 0,083 0,079 0,107 0,036 131 0,034 0,026 0,022 0,079 0,067 0,0280,036 0,018 0,028 0,013 0,036 133 0,085 0,031 0,026 0,043 0,069 0,048 0,017 0,028 0,125 0,188 0,056 0,013 135 0,094 0,069 0,0560,036 0,091 0,125 0,013 0,071

137 0.017 0.031 0.028 0.036 0.056 0.018 0.063 0.083 0.125 0.105
139 0.017 0.091 0.111 0.018 0.028 0.042 0.071
141 0.036 0.045 0.036 0.063 0.036
143 0.017 0.053 0.018 0.063 0.028
145 0.014 0.026 0.018 0.028
147 0.250 0.028
151 0.063 0.036
155 0.028 0.036
157 0.036

DXTUCA-3

Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psh Afa Ael AcM Ypa Yla Yfu Pmu Hru Hsq Hsl Pve Tph Twi Eni Psc

103 0.075

105 0,031 0,018

107 0,019 0,059 109 0,000 0,029

111 0,022 0,027 0,019 0,206 0,050 0,083 113 0,012 0,162 0,026 0,026 0,0590,050 0,026 0,071

115 0,047 0,033 0,027 0,019 0,053 0,017 0,059 0,025 0,013 0,033 117 0,035 0,025 0,063 0,0330,088 0,050 0,0280,036 0,033 119 0,012 0,075 0,031 0,033 0,118 0,059 0,026 0,025 0,036 121 0,058 0,063 0,033 0,022 0,027 0,074 0,026 0,053 0,088 0,063 0,029 0,050 0,056 0,054 0,033 0,056 0,071 0,033 123 0,023 0,100 0,063 0,067 0,065 0,037 0,079 0,050 0,059 0,063 0,059 0,1110,0360,020 0,056 0,036 125 0,012 0,050 0,031 0,133 0,087 0,167 0,105 0,026 0,088 0,031 0,029 0,111 0,0260,033 0,0360,056 0,071 0,100 127 0,012 0,075 0,031 0,033 0,130 0,041 0,037 0,105 0,079 0,094 0,088 0,056 0,071 0,026 0,100 0,075 0,036 0,083 0,100 129 0,081 0,075 0,031 0,133 0,130 0,068 0,056 0,053 0,026 0,033 0,125 0,088 0,050 0,089 0,053 0,033 0.050 0,013 0,028 0,071 0,067

131 0,035 0,125 0,067 0,022 0,135 0,074 0,026 0,158 0,067 0,088 0,125 0,147 0,100 0,111 0,018 0.053 0,133 0,025 0,125 0,071 0,028 0,107 0,033 133 0,128 0,125 0,065 0,162 0,074 0,105 0,105 0,200 0,088 0,031 0,088 0,050 0.167 0,143 0,079 0,133 0,075 0,038 0,143 0,133 135 0,128 0,025 0,031 0,067 0,065 0,081 0,111 0,026 0,026 0,050 0,059 0,063 0,029 0,250 0,111 0,036 0,079 0,033 0,100 0,163 0,107 0,111 0,100

137 0,105 0,025 0,156 0,100 0,043 0,041 0,185 0,079 0,105 0,133 0,059 0,125 0,100 0,056 0,054 0,105 0,133 0,125 0,075 0,107 0,250 0,071

139 0,081 0,075 0,125 0,087 0,068 0,037 0,053 0,079 0,250 0,088 0,156 0,050 0,056 0,089 0,211 0,150 0,225 0,083 0,036 0,033

141 0,058 0,075 0,094 0,133 0,109 0,027 0,074 0,053 0,026 0,033 0,029 0,094 0,050 0,056 0,125 0,026 0,033 0,100 0,075 0,036 0,071 0,100

143 0,058 0,063 0,133 0,054 0,132 0,105 0,017 0,031 0,029 0,036 0,067 0,125 0,013 0,250 0,083 0,035 0,100 145 0,023 0,075 0,041 0,053 0,083 0,089 0,025 0,175 0,036 0,028 0,071

147 0.047 0.031 0.043 0.014 0.053 0.017 0.029 0.100 0.080 0.053 0.025 0.071 0.100
149 0.012 0.050 0.033 0.043 0.014 0.019 0.026 0.017 0.050 0.056 0.071 0.120 0.158 0.067 0.025 0.013 0.071 0.028 0.033
151 0.035 0.025 0.063 0.043 0.026 0.053 0.056 0.036 0.140 0.053 0.067 0.143
153 0.031 0.022 0.026 0.026 0.029 0.018 0.060 0.026 0.071
155 0.031 0.014 0.029 0.100 0.067 0.036
157 0.018 0.020
159 0.031 0.050 0.040 0.033
161 0.140
163 0.040 0.033
165 0.080

167 0,020

169 0,040 173 0,020 175 0,040

179 0,040

DXTUCA-15

Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psh Afa Ael AcM Ypa Yla Yfu Pmu Hru Hsq Hsl Pve Tph Twi Eni Psc

78 1.000 0.029

80 0,023 0,971 0,056

82 0.094 0,025 0.068 0,013 0,074 0,025 0,211 0,017 0,132 0,050 0,029 0,045 0,056 0,034 0,040 0,139 0,025 0,140 0,071 84 0,200 0.219 0,225 0,114 0,171 0,206 0,075 0,132 0,167 0,132 0,100 0,045 0,500 0,190 0,140 0,111 0,250 0,071

86 0,300 0,188 0,325 0,250 0,316 0,456 0,500 0,158 0,267 0,132 0,550 0,409 0,224 0,640 0,222 0,400 0,200 0,013 0,800 0,440 0,607 0,250 88 0,450 0,281 0,250 0,409 0,434 0,162 0,250 0,395 0,483 0,474 0,125 0,318 0,278 0,241 0,140 0,222 0,400 0,350 0,474 0,050 0,260 0,143 0,050 90 0,025 0,188 0,125 0,136 0,053 0,015 0,075 0,053 0,017 0,053 0,050 0,091 0,056 0,103 0,040 0,222 0,150 0,050 0,205 0,075 0,060 0,036 0,325 92 0,025 0,031 0,050 0,013 0,029 0,025 0,053 0,050 0,079 0,025 0,056 0,207 0,083 0,150 0,308 0,050 0,040 0,375 94 0,029 0,050 0,100 0,045 0,050 0,071 96 0,060

98 0,045

DXTUCA-8

Allele Aal Pro Plv Pbe Pbp Ave Ala Aae Anu Psh Afa Ael AcM Ypa Yla Yfu Pmu Hru Hsq Hsl Pve Tph Twi Eni Psc

60 0,017

66 0,013 0,045 70 0,025

72 0,094 0,026 0,026 0,017

74 0,438 0,250 0,200 0,214 0,197 0,179 0,075 0,211 0,067 0,237 0,050 0,556 0,103 0,237 0,275 0,050 0,359 0,025 0,100 0,125 0,167 0,211 0,385 0,200 0,053 0,217 0,237 0,150 0,235 0.227 0,056 0,172 0,040 0,026 0,050 0,075 0,192 0,800 0,080 0,107 0,300 78 0,050 0,024 0,197 0,051 0,026 0,026 0,050 0,059 0,045 0,034 0,026 0,125 0,025 80 0,011 0,469 0,438 0,625 0,595 0,355 0,359 0,675 0,711 0,700 0,474 0,500 0,441 0,591 0,389 0,621 0,684 0,525 0,550 0,449 0,150 0,520 0,607 0,525 82 0,063 0,017 0,045 0,060 0,026 0,025 0,025 0,020 84 0,848 0,026 0,050 0,1500,029 0,034 0.075 0,125 0,180 0,286 0,025 86 0,076 0,088

88 0,054 0,050 0,045 0,020 0,070 90 0,011 0,029 0,020

93 0,040

94 0,029 0,025 0,075

95 0,040

96 0,000 0,050 0,025 0,020 97 0,040 98 0,029

99 0,020 0,040 0,025 100 0,050 101 0.080 0.020 0.025 103 0.120 105 0.020 107 0.060 109 0.120 111 0.100 113 0.100 115 0.020 116 0,029 117 0.040 121 0.020 122 0.029 125 0.020 131 0.020

APPENDIX F:

Data Relevant to Chapter 4:

Pairwise genetic distance matrices.

AAV
AAY 0.012
RKR 0.595 0.628
PLA 0.562 0.583 0.057
PBP 0.605 0.619 0.074 0.038
AVN 0.579 0.595 0.057 0.022 0.029
AVK 0.672 0.691 0.170 0.128 0.132 0.106
YLA 0.594 0.618 0.108 0.076 0.112 0.053 0.178
YFU 0.540 0.562 0.098 0.046 0.055 0.066 0.161 0.112
PBE 0.606 0.628 0.056 0.017 0.025 0.036 0.139 0.111 0.052
ALA 0.581 0.595 0.111 0.060 0.066 0.050 0.131 0.131 0.069 0.056
AAE 0.617 0.632 0.094 0.055 0.080 0.103 0.158 0.226 0.051 0.057 0.081
ANU 0.586 0.598 0.057 0.017 0.024 0.042 0.130 0.087 0.024 0.034 0.066 0.048
PSH 0.670 0.684 0.145 0.126 0.143 0.150 0.240 0.267 0.150 0.158 0.210 0.092 0.122
AFA 0.621 0.636 0.069 0.048 0.046 0.039 0.159 0.118 0.072 0.054 0.055 0.098 0.043 0.160
AEL 0.659 0.682 0.153 0.163 0.212 0.222 0.292 0.338 0.136 0.114 0.180 0.072 0.165 0.203 0.210
ACM 0.681 0.699 0.432 0.428 0.387 0.417 0.488 0.448 0.341 0.419 0.396 0.402 0.369 0.569 0.414 0.474
YPA 0.738 0.745 0.111 0.128 0.124 0.140 0.247 0.320 0.112 0.087 0.137 0.039 0.109 0.117 0.121 0.045 0.431
PMU 1.954 1.957 1.944 2.023 2.077 2.057 1.897 2.356 2.067 1.942 1.866 1.748 2.124 1.920 2.175 1.653 2.270 1.688
HSQ 0.924 0.957 0.168 0.199 0.233 0.245 0.337 0.427 0.279 0.211 0.288 0.163 0.220 0.184 0.215 0.232 0.707 0.083 1.726
HSL 0.720 0.733 0.154 0.093 0.136 0.134 0.219 0.253 0.146 0.131 0.176 0.095 0.121 0.086 0.138 0.261 0.543 0.131 2.138 0.165
PVE 1.015 1.033 0.270 0.228 0.252 0.235 0.375 0.320 0.318 0.291 0.344 0.328 0.240 0.320 0.259 0.512 0.723 0.305 2.236 0.242 0.244
TPH 0.667 0.679 0.306 0.276 0.296 0.256 0.314 0.382 0.291 0.270 0.209 0.247 0.310 0.361 0.316 0.311 0.616 0.328 1.259 0.466 0.419 0.618
TWI 0.540 0.542 0.133 0.100 0.128 0.146 0.210 0.217 0.072 0.077 0.098 0.053 0.078 0.182 0.122 0.037 0.390 0.050 1.632 0.239 0.203 0.412
ENI 0.516 0.521 0.141 0.108 0.107 0.148 0.187 0.233 0.062 0.098 0.089 0.047 0.082 0.188 0.118 0.076 0.375 0.092 1.624 0.257 0.205 0.366 0.249 0.035
PSC 0.943 0.975 0.434 0.431 0.499 0.488 0.563 0.580 0.418 0.417 0.507 0.366 0.455 0.446 0.480 0.347 0.876 0.356 1.959 0.373 0.504 0.645 0.434 0.410 0.432
HRU 0.573 0.599 0.056 0.016 0.013 0.047 0.150 0.112 0.050 0.016 0.080 0.064 0.016 0.118 0.055 0.158 0.415 0.090 2.033 0.178 0.116 0.236 0.292 0.101 0.097 0.406

Figure F.1. Pairwise Nei's genetic distances based on 11 microsatellite markers. Species name abbreviations are as in Appendix E except the following: AAV: Astatoreochromis alluaudi from Lake Victoria; AAY: Astatoreochromis alluaudi from Lake Kyoga; AVN: Astatotilapia velifer from Lake Nabugabo; AVK: Astatotilapia velifer from Lake Kayugi.

AAV
AAY 0.289
RKR 0.942 1.002
PLA 0.944 0.999 0.498
PBP 0.928 0.955 0.449 0.450
AVN 0.925 0.928 0.402 0.408 0.371
AVK 1.135 1.144 0.601 0.598 0.535 0.471
YLA 1.009 1.026 0.550 0.577 0.603 0.485 0.619
YFU 0.819 0.862 0.442 0.428 0.392 0.401 0.594 0.542
PBE 0.956 1.002 0.398 0.367 0.347 0.364 0.585 0.571 0.411
ALA 0.892 0.886 0.472 0.459 0.430 0.389 0.533 0.562 0.413 0.399
AAE 0.977 0.964 0.500 0.455 0.491 0.476 0.574 0.751 0.391 0.460 0.474
ANU 0.908 0.929 0.418 0.397 0.368 0.380 0.524 0.564 0.345 0.410 0.408 0.415
PSH 1.027 1.027 0.519 0.574 0.525 0.517 0.688 0.763 0.490 0.563 0.611 0.476 0.510
AFA 1.013 0.955 0.423 0.446 0.389 0.376 0.584 0.608 0.467 0.404 0.391 0.532 0.402 0.564
AEL 1.003 1.061 0.497 0.552 0.591 0.563 0.730 0.729 0.513 0.464 0.555 0.432 0.554 0.566 0.610
ACM 1.070 1.032 0.831 0.982 0.851 0.852 0.930 0.941 0.767 0.926 0.869 0.833 0.862 1.002 0.902 0.837
YPA 1.215 1.115 0.568 0.681 0.524 0.545 0.732 0.809 0.565 0.534 0.588 0.498 0.560 0.534 0.586 0.476 0.866
PMU 2.194 2.186 2.189 2.024 2.142 2.129 2.164 2.329 2.012 2.111 1.994 2.010 2.086 2.097 2.221 1.960 2.236 1.907
HSQ 1.089 1.172 0.530 0.584 0.582 0.586 0.722 0.797 0.551 0.548 0.622 0.497 0.518 0.571 0.611 0.556 0.969 0.507 1.897
HSL 0.994 1.011 0.477 0.494 0.491 0.470 0.665 0.711 0.462 0.454 0.519 0.503 0.469 0.424 0.487 0.573 1.012 0.505 2.238 0.501
PVE 1.401 1.406 0.702 0.711 0.666 0.665 0.797 0.781 0.804 0.750 0.819 0.329 0.688 0.738 0.691 0.916 1.211 0.761 2.567 0.695 0.632
TPH 1.023 1.052 0.757 0.764 0.743 0.700 0.783 0.823 0.753 0.690 0.655 0.724 0.684 0.823 0.761 0.724 1.077 0.870 1.567 0.860 0.844 1.088
TWI 0.920 0.906 0.510 0.550 0.526 0.518 0.652 0.635 0.455 0.460 0.448 0.453 0.476 0.560 0.515 0.403 0.815 0.450 1.804 0.525 0.510 0.914 0.658
ENI 0.867 0.871 0.530 0.549 0.486 0.546 0.646 0.670 0.442 0.516 0.492 0.461 0.493 0.588 0.529 0.432 0.753 0.527 1.859 0.572 0.593 0.876 0.679 0.410
PSC 1.157 1.224 0.726 0.788 0.781 0.796 1.006 0.946 0.669 0.720 0.770 0.691 0.759 0.816 0.805 0.713 1.150 0.730 2.254 0.645 0.774 0.987 0.818 0.681 0.719
HRU 0.901 0.955 0.450 0.398 0.360 0.395 0.609 0.612 0.461 0.370 0.498 0.469 0.371 0.509 0.422 0.541 0.920 0.547 2.093 0.556 0.467 0.694 0.719 0.534 0.491 0.760

Figure F.2. Pairwise allele sharing genetic distances based on 11 microsatellite markers. Species name abbreviations are as in Appendix E except the following: AAV: Astatoreochromis alluaudi from Lake Victoria; AAY: Astatoreochromis alluaudi from Lake Kyoga; AVN: Astatotilapia velifer from Lake Nabugabo; AVK: Astatotilapia velifer from Lake Kayugi.

AAV
AAY 0.204
RKR 4.199 3.998
PLA 4.152 4.021 0.134
PBP 5.003 4.732 0.264 0.276
AVN 4.351 4.137 0.086 0.074 0.162
AVK 4.523 4.161 0.259 0.299 0.475 0.249
YLA 3.726 3.769 0.490 0.582 0.748 0.609 0.764
YFU 4.231 4.084 0.182 0.092 0.107 0.134 0.456 0.444
PBE 4.594 4.272 0.172 0.092 0.284 0.091 0.304 0.985 0.317
ALA 4.255 4.068 0.185 0.161 0.464 0.148 0.287 0.568 0.340 0.196
AAE 4.831 4.551 0.370 0.041 0.354 0.251 0.352 1.137 0.307 0.128 0.381
ANU 4.399 4.238 0.134 0.068 0.084 0.114 0.348 0.410 0.044 0.214 0.258 0.236
PSH 4.125 4.004 1.046 0.596 1.190 0.909 1.070 1.859 1.024 0.748 1.183 0.516 0.961
AFA 4.693 4.370 0.171 0.102 0.258 0.115 0.251 0.834 0.309 0.010 0.130 0.177 0.180 0.858
AEL 4.500 4.046 0.471 0.391 0.774 0.431 0.611 1.479 0.670 0.131 0.533 0.305 0.629 0.696 0.325
ACM 3.112 2.538 2.368 2.304 2.858 2.472 2.366 2.457 2.491 2.447 2.200 2.573 2.488 3.590 2.460 2.640
YPA 4.915 4.709 0.347 0.198 0.300 0.177 0.428 1.278 0.364 0.082 0.461 0.091 0.255 0.509 0.157 0.286 3.036
PMU 8.546 7.835 10.281 9.518 10.892 10.082 10.329 10.725 9.957 10.006 10.362 9.614 10.376 9.049 10.399 9.081 8.180 10.113
[Rows HSQ, HSL, PVE, and TPH are illegible in the scanned source; surviving fragments: 516, .358, 0.161, .498, 2.170, 2.452, 7.195, 1.303, 1.632, 4.486]
TWI 4.342 4.113 0.467 0.184 0.734 0.390 0.521 1.269 0.561 0.143 0.399 0.140 0.496 0.425 0.233 0.113 2.670 0.149 9.265 0.237 0.480 2.129 1.115
ENI 4.496 4.199 0.290 0.117 0.352 0.221 0.397 1.154 0.338 0.035 0.328 0.086 0.253 0.585 0.084 0.123 2.519 0.056 9.695 0.274 0.556 1.884 1.417 0.057
PSC 4.829 4.588 0.858 0.722 1.099 0.882 1.034 1.654 0.876 0.843 1.179 0.700 0.982 0.845 0.934 0.682 3.516 0.683 9.438 0.760 1.144 2.758 1.619 0.632 0.646
HRU 4.528 4.353 0.219 0.090 0.055 0.115 0.509 0.787 0.081 0.165 0.385 0.207 0.039 0.875 0.228 0.525 2.620 0.162 10.052 0.434 0.686 2.168 1.637 0.461 0.192 0.899

Figure F.3. Pairwise stepwise weighted genetic distances based on 11 microsatellite markers. Species name abbreviations are as in Appendix E except the following: AAV: Astatoreochromis alluaudi from Lake Victoria; AAY: Astatoreochromis alluaudi from Lake Kyoga; AVN: Astatotilapia velifer from Lake Nabugabo; AVK: Astatotilapia velifer from Lake Kayugi.
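The distance matrices in this appendix are derived from per-locus allele frequencies of the kind tabulated in Appendix E. As an illustration only, the following minimal sketch computes one such pairwise distance, assuming that "Nei's genetic distance" refers to Nei's (1972) standard distance, D = -ln(Jxy / sqrt(Jx Jy)), with the homozygosity terms averaged over loci. The appendix does not state which of Nei's measures was used, so this choice is an assumption. The function name and the example frequencies are hypothetical and are not taken from the tables above.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's (1972) standard genetic distance between two populations.

    pop_x and pop_y are lists with one dict per locus, each dict mapping
    an allele size to its frequency in that population (hypothetical layout).
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("both populations need the same non-empty set of loci")

    j_x = j_y = j_xy = 0.0
    for freq_x, freq_y in zip(pop_x, pop_y):
        # Sums of squared frequencies within each population at this locus.
        j_x += sum(p * p for p in freq_x.values())
        j_y += sum(q * q for q in freq_y.values())
        # Sum of frequency products for alleles shared by both populations.
        j_xy += sum(p * freq_y.get(allele, 0.0) for allele, p in freq_x.items())

    # Average each term over loci before forming the identity ratio.
    n_loci = len(pop_x)
    j_x, j_y, j_xy = j_x / n_loci, j_y / n_loci, j_xy / n_loci

    if j_xy == 0.0:
        # No shared alleles at any locus: the distance is unbounded.
        return math.inf
    return -math.log(j_xy / math.sqrt(j_x * j_y))


# Hypothetical two-locus example (frequencies invented for illustration).
pop_a = [{107: 1.0}, {154: 0.3, 156: 0.7}]
pop_b = [{107: 0.9, 113: 0.1}, {154: 0.2, 156: 0.8}]
print(round(nei_standard_distance(pop_a, pop_b), 4))
```

Applying the function to every pair of populations and keeping only the entries below the diagonal would give a lower-triangular layout like the matrices above.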
