
MOLECULAR EVOLUTION OF THE GAPC GENE FAMILY

IN AMSINCKIA SPECTABILIS (BORAGINACEAE)

Joëlle R. Pérusse

Department of Biology

McGill University, Montreal

November 2001

A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements of the degree of Master of Science

© Joëlle R. Pérusse 2001

National Library of Canada, Acquisitions and Bibliographic Services, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

0-612-78937-3

PREFACE

This thesis carries a credit weight of 39 credits, from a total of 45 credits required for the Master's degree. Graduate credits are a measure of the time assigned to a given task in the graduate program. They are based on the consideration that a term of full-time graduate work is equivalent to 12 to 16 credits, depending on the intensity of the program.

ABSTRACT

This thesis investigates the molecular evolution of the cytosolic glyceraldehyde 3-phosphate dehydrogenase (GapC) gene family in two varieties of Amsinckia spectabilis (Boraginaceae) that differ in mating system. Examination of sequence variation suggests that the gene family consists of three or four members.

Compared with GapC in Arabidopsis thaliana, the GapC locus in A. spectabilis has at least two fewer introns and one intron that is double in length. Strong purifying selection was detected at each putative locus since the divergence of the Amsinckia spectabilis lineage from species in the allied family Solanaceae. Mean nucleotide diversity across all observed loci is 0.0036 for the inbreeding variety spectabilis, and 0.0049 for the outbreeding variety microcarpa. Outbreeding populations were systematically more diverse than inbreeding populations at GapC-IV. These results are discussed in the context of theory for the fate of duplicate genes, and the background selection hypothesis.

RÉSUMÉ

La présente thèse étudie l'évolution moléculaire de la famille des gènes de la glycéraldéhyde 3-phosphate déshydrogénase cytosolique (GapC) chez deux variétés d'Amsinckia spectabilis (Boraginaceae) qui diffèrent de système de reproduction.

L'examen de la variation des séquences suggère que la famille des gènes GapC comprend trois ou quatre membres chez Amsinckia spectabilis. Les loci GapC chez Amsinckia spectabilis contiennent un intron deux fois plus long, et au moins deux introns de moins, comparé à GapC chez Arabidopsis thaliana. Une forte sélection purificatrice fut détectée à chaque locus GapC depuis la divergence de la lignée à laquelle appartient Amsinckia spectabilis et des espèces de la famille Solanaceae. La diversité génétique évaluée à partir des séquences d'ADN est de 0.0036 chez la variété auto-féconde spectabilis, et de 0.0049 chez la variété à fertilisation croisée microcarpa. Les populations à fertilisation croisée étaient systématiquement plus diverses au locus GapC-IV que celles auto-fécondes. Ces résultats sont discutés dans le contexte de la théorie sur le sort des gènes dupliqués et de l'hypothèse de "background selection".

ACKNOWLEDGEMENTS

First and foremost, I would like to thank my thesis supervisor, Daniel J. Schoen, for his guidance and support throughout this degree, and acknowledge his editorial expertise and his valuable help in data analysis. Thanks also to the other members of my supervisory committee, Candace Waddell and Tom Bureau, for excellent advice and use of their facilities. I also want to thank my undergraduate thesis advisor, Terry A. Wheeler, for getting me involved in research.

Thank you to Lily, Steve, Aaron, Aura, Hien and Nabil, who have helped me with laboratory techniques and troubleshooting. Thanks to Nikoleta for doing most of the sequencing. Thanks to Mark, Claire and Frank of the McGill Phytotron for excellent care, and to Isabelle for doing part of the planting.

While at McGill, I have enjoyed the company of, and discussions of evolutionary topics with, Steve, Sara, Mattieu, Aura, Natalia, Denis, Gray, Malorie, Adrian, Kathy and others.

Thanks to my sister Marie-Andrée, and my friends Vanessa, Malorie, Alex K., Alex S., Adrian, Lily, Andrea, Jade and Jessica, for their encouragement, for making me laugh and for keeping me entertained.

Last but not least, I would like to thank my parents for teaching me the value of education and hard work, and for instilling in me the love of learning.

To my beloved family, for your support and for all the "little" things that you did for me while I was hard at work. This thesis is dedicated to you.

This work was supported in part by a postgraduate scholarship from the Natural Sciences and Engineering Research Council of Canada (NSERC) to J.R. Pérusse and by an NSERC operating grant to D.J. Schoen.

TABLE OF CONTENTS

Preface

Abstract

Résumé

Acknowledgements

Table of Contents

List of Tables

List of Figures

Introduction

Materials and Methods

Results

Discussion

Conclusion

Literature cited

Tables

Figures

Appendix A

LIST OF TABLES

Table 1. Amsinckia spectabilis varieties studied

Table 2. Primer pairs and corresponding annealing temperatures

Table 3. List of individuals with the 18 bp deletion in their GapC gene sequence

Table 4. Putative locus-defining substitutions in relation to the consensus Amsinckia GapC sequence shown in Figure 1

Table 5. Linkage disequilibrium between putative loci-defining sites

Table 6. Analysis of selective constraint in four putative GapC loci in Amsinckia spectabilis

Table 7. Analysis of selective constraint in four putative GapC loci in Amsinckia spectabilis (singletons removed)

Table 8. Analysis of selective constraint for each putative A. spectabilis GapC locus analyzed in combination with Solanaceous GapC coding sequence data

Table 9. Functional consequences of amino acid substitutions in haplotypes of Amsinckia spectabilis at GapC loci

Table 10. Nucleotide variation at four putative GapC loci in Amsinckia spectabilis varieties that differ in mating system

Table 11. Nucleotide variation at four putative GapC loci in Amsinckia spectabilis varieties that differ in mating system (singletons removed)

Table 12. Nucleotide variation at GapC-I and GapC-IV loci within Amsinckia spectabilis populations that differ in mating system

Table 13. Nucleotide variation at GapC-I and GapC-IV loci within Amsinckia spectabilis populations that differ in mating system (singletons removed)

LIST OF FIGURES

Figure 1. Alignment of the partial GapC gene sequence of Arabidopsis thaliana and Amsinckia spectabilis

Figure 2. Graphical representation of intron/exon structure in a partial GapC gene sequence of Arabidopsis thaliana and Amsinckia spectabilis

Figure 3. Neighbor-joining tree of GapC haplotypes in Amsinckia spectabilis

Figure 4. Haplotypes produced by single individuals in Amsinckia spectabilis populations

Figure 5. Representative result of PCR-RFLP analysis, depicted here for population 28

Figure 6. Results from a Southern hybridization between a GapC fragment and Amsinckia spectabilis genomic DNA digested with HhaI (lane 1) and HindIII (lane 2)

Figure 7. ML tree of GapC sequences from Amsinckia spectabilis and Solanaceous species tobacco, tomato, petunia and potato

INTRODUCTION

DNA sequences are a valuable source of information on the evolutionary process. Many disciplines, from comparative biology to population genetics, profit from sequence data analyses. For example, studies involving phylogenetics and biogeography (Whelan et al. 2001; Pagel 1999) have reconstructed gene trees, primarily based on plastid sequences (Simon et al. 1994), which have successfully been compared to species trees based on morphological characters. The increased ease of obtaining molecular genetic data has resulted in a greater resolution of historical demographic parameters for populations (Goldstein and Harvey 1999), and in more accurate estimates of population divergence (Edwards and Beerli 2000). Population geneticists have put much emphasis on the level and patterns of evolution as revealed by DNA sequences, with a particular emphasis on natural selection, drift and recombination (Aquadro 1997). Methods have been developed to detect selection, but when significant departures from neutrality are observed, the question arises as to whether selection or population demography is the main cause (reviewed in Kreitman 2000; Neilsen 2001; Otto 2000).

The research described in this thesis is based on information retrievable from DNA sequence data. The initial objective was to examine levels of DNA polymorphism between populations with contrasting mating systems. Over the course of the project it became apparent that the gene selected for study, which codes for cytosolic glyceraldehyde 3-phosphate dehydrogenase (GapC), had been duplicated at least once. Thus, determining the number of GapC loci in the Amsinckia spectabilis genome, and how to distinguish among them, became important objectives in the analysis of the molecular evolution of this gene, as it has been shown that pooling of paralogous loci may artifactually increase estimates of diversity and divergence (Gaut 1998). Thus the scope of the study was broadened to include a preliminary molecular evolutionary analysis of the diversity of the entire GapC gene family in Amsinckia spectabilis. A literature review on both gene duplicate evolution and neutral genetic diversity follows.

EVOLUTION OF GENE DUPLICATES

Genes may increase in copy number in a number of ways (Li 1997). Whole genome duplication, or polyploidization, occurs spontaneously by the accidental doubling of the genome. As a consequence, gene order and dosage (number of sets of genes) are conserved. This type of duplication is the most important in evolutionary terms according to Ohno (1970), and is postulated to have been important to the increase of genomic complexity (reviewed in Cooke et al. 1997). The most widely known polyploid organisms are plants, but examples exist within reptiles, amphibians, invertebrates (some flatworms, leeches and shrimp) and fish.

Polysomy or aneuploidy occurs when entire chromosomes are duplicated. This is usually due to nondisjunction, the failure of sister chromatids to migrate to opposite poles at anaphase during cell division. Partial polysomy occurs when a fragment of a chromosome is duplicated. In humans, chromosome duplications are most often lethal or have serious phenotypic effects (e.g. Down and Cri du Chat syndromes).

Apart from chromosome duplication, individual genes may be duplicated either entirely or partially. Tandem duplication, reverse duplication and unequal crossing-over may be responsible for partial polysomy, and partial to entire gene duplications. In the case of tandem duplication, the duplicated regions are adjacent to one another and gene order is preserved. Adjacent duplicated regions also characterize reverse duplication, but in this case gene order is reversed. Finally, unequal crossing-over occurs between chromosomal homologs that are not perfectly aligned at prophase during cell division.

An additional mechanism by which genes may become duplicated is through reverse transcription of an mRNA intermediate. This mechanism can move copies of genes to unrelated sites in the genome.

Haldane (1932) recognized the important role of gene duplication in the evolutionary process. Gene duplication is still viewed today as the main way by which new genes arise (Ohno 1970), although other means such as exon shuffling (Gilbert 1978) and alternative splicing (Smith et al. 1989) may also be important in the evolution of new genes and gene products.

Advantages associated with gene duplication include gene dosage effects (e.g. Ohno 1970; Thomas 1993), where the potential for production of a limited product such as rRNAs, tRNAs, immunoglobulins and interferons increases with the number of genes. Gene duplication may also lead to heterozygous advantage (Ohno 1970; Koehn and Rasmussen 1967), increased fidelity of DNA replication enzymes (Thomas 1993), protection of the organism against defective developmental genes (Cooke et al. 1997), and the generation of biodiversity by leading to postmating reproductive behaviors (Lynch and Conery 2000). In addition, the selective advantage conferred by duplicate genes has been demonstrated in experimental bacterial populations exposed to adverse conditions (e.g. toxic media or low nutrient concentrations; reviewed by MacIntyre 1976), and in the peach-potato aphid, where duplication confers insecticide resistance (Devonshire and Sawicki 1979).

Gene duplication may also have deleterious effects, even when small regions are involved. For example, when duplicated, the bar region of the Drosophila genome leads to a decreased number of eye facets (the "bar effect"). When triplicated, the phenotype is even more pronounced (the "double bar effect") (Sturtevant and Beadle 1962).

Cooke et al. (1997) noted that in vertebrates and other complex metazoans, redundancy is commonly seen in genes involved in development, cell proliferation control, and DNA replication fidelity. This is in contrast to "housekeeping" or metabolic genes, where ancient duplicates seem to be rare.

The rate of gene duplication has been estimated to be of the same order of magnitude as the rate of mutation per nucleotide site (Li 1997). Specifically, average rates of gene duplication have been estimated to be 0.0023 per gene per million years in Drosophila, 0.0083 per gene per million years in yeast, and 0.0208 per gene per million years in Caenorhabditis elegans (Lynch and Conery 2000).

Sets of related genes descended from a common ancestral gene are known as gene families. The fate of the members of these gene families is still a focus of interest among evolutionary biologists today.

The classical model for the fate of duplicate genes

Early work by Haldane (1933) suggested that duplicate genes would be silenced by mutation creating null alleles, and that the duplication would affect linkage relationships among active genes by changing gene order. Nonsense, missense and frameshift mutations, point mutations affecting splicing of introns, and transposable element insertions may all alter or destroy gene function.

Fisher (1935) added that because some genotypes that carry null alleles may be selectively disfavored, the population will approach mutation-selection balance, where functional alleles segregate at both loci. Selection will only maintain the duplicate having the lower mutation rate. If mutation rates are similar, then one of the duplicates will eventually be eliminated by genetic drift (Fisher 1935).

Ohno (1970) is credited with elucidating the classical model for the fate of duplicate genes. This model states that following gene duplication, selection will be relaxed so that one of the copies will accumulate mutations and become nonfunctional, unless it evolves a new function.

The initial fixation of a duplicate gene

Simulation studies have shown that the initial fixation of a gene duplicate can occur through either genetic drift or positive selection (Ohta 1987, 1988a,b).

A gene duplicate in mutation-selection balance will increase in frequency provided it is selectively advantageous (Clark 1994). The selective advantage may arise from masking the accumulation of null alleles, especially in tightly linked loci (Clark 1994). In addition, recurrent duplication is effective in protecting the duplicated gene from loss (Clark 1994). Chance effects are also not negligible in determining which gene duplicates become fixed; that is, different gene family members may be acquired under the same environment (Ohta 1989). Fixation of a duplicate is more common in diploid than in haploid organisms, presumably because of the potential for recessivity and the lower equilibrium frequency of null alleles (Clark 1994).

Rate of fixation of a null allele

The rate of fixation of nonfunctional genes is equal to $\mu$ per gene per generation, where $\mu$ is the mutation rate per gene per generation (Nei 1969). The accumulation of null alleles increases with the rate of unequal crossing over; this accumulation appears to be more rapid under interchromosomal, rather than intrachromosomal, crossing over (Ohta 1987, 1988a, 1989). An increase in population size, selection against mutant alleles, and a decrease in mutation rate all lower the fixation rate (Watterson 1983).
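The equality between the fixation rate and the mutation rate can be recovered from the standard neutral argument. The sketch below assumes (these assumptions are not stated explicitly above) a diploid population of $N$ individuals and null mutations that are selectively neutral because the other copy remains functional; $\mu$ denotes the mutation rate per gene per generation. Each generation about $2N\mu$ new null alleles arise, and each has probability $1/(2N)$ of eventually being fixed by drift, so the fixation rate $k$ is

```latex
k = 2N\mu \times \frac{1}{2N} = \mu
```

Population size cancels in this neutral case, so it can influence the fixation rate only once selection against mutant alleles is taken into account.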

Probability of fixation of null allele

The probability of fixation of a null allele is thought to depend on population size, on the age of the duplication event (Bailey et al. 1978), on the heterozygous effect of lethal alleles, and on linkage (Nei and Roychoudhury 1973). Fisher (1935) concluded that in an infinitely large population, null alleles are never fixed. In small populations, however, drift may fix a null or lethal allele (Nei and Roychoudhury 1973).

Reverse mutation was included in Nei and Roychoudhury's (1973) examination of the probability of fixation of a lethal (or nonfunctional) gene. This is only likely to occur in small populations over short periods of time (Bailey et al. 1978). Walsh (1995) calculated the probability that an advantageous allele, rather than a null allele, is fixed to be $\left(\frac{1 - e^{-S}}{\rho S} + 1\right)^{-1}$, where $S = 4N_e s$, $N_e$ is the effective population size, $\rho$ is the ratio of advantageous to null mutations, and $s$ is the selection coefficient of new advantageous mutations. Therefore the probability that a locus becomes nonfunctional, as opposed to evolving a new function, is high unless $\rho S$ is much greater than one.
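To give a feel for how this probability behaves, the following minimal Python sketch evaluates the expression attributed to Walsh (1995), P = ((1 - e^(-S)) / (ρS) + 1)^(-1) with S = 4Nes, where ρ is the ratio of advantageous to null mutations. The function name and the parameter values are hypothetical and chosen only for illustration; they are not taken from Walsh (1995).

```python
import math


def prob_advantageous_fixed(rho, n_e, s):
    """Probability that an advantageous allele, rather than a null allele, is fixed.

    rho : ratio of advantageous to null mutations
    n_e : effective population size
    s   : selection coefficient of new advantageous mutations
    """
    big_s = 4.0 * n_e * s
    if big_s == 0.0:
        # Neutral limit: (1 - e^-S) / S tends to 1 as S tends to 0
        return 1.0 / (1.0 / rho + 1.0)
    return 1.0 / ((1.0 - math.exp(-big_s)) / (rho * big_s) + 1.0)


# Hypothetical values, for illustration only
RHO = 1e-4
SEL = 0.01
for n_e in (1e2, 1e4, 1e6):
    p = prob_advantageous_fixed(RHO, n_e, SEL)
    print(f"Ne = {n_e:.0e}, rho*S = {RHO * 4 * n_e * SEL:.3g}, P = {p:.4f}")
```

With these illustrative inputs the probability stays small until ρS approaches or exceeds one, which is the condition stated in the text.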

Time to gene silencing

According to theoretical analyses, the time to gene silencing is dependent on $N\mu$, where $N$ is the effective population size and $\mu$ is the mutation rate per site (Li 1980a). When $N\mu$ is smaller than 0.01, the mean loss time is equal to half the mean extinction time for a neutral allele under irreversible mutation. When $N\mu$ is greater than one, the mean extinction time is at least twice as large as the mean extinction time for a neutral allele under irreversible mutation (Li 1980a).

Watterson (1983) determined that the mean time for silencing either of the duplicate loci is approximated by the mean time to fixation of either mutant allele from the stationary distribution for p (p = 2S-xy), plus the mean time for this stationary distribution to be achieved, where S is the scaled selection parameter, and x and y are the frequencies of null or deleterious alleles at each duplicate locus as a function of time. The time to gene silencing is retarded by selection against double homozygote mutants. These results agree with Kimura and King (1979), Li (1980a), and Takahata and Maruyama (1979), depending on the value of S, and with empirical estimates of rates of gene duplication (Lynch and Conery 2000). Specifically, Lynch and Conery (2000) found, from the analysis of eukaryote sequences retrieved from genomic databases, that gene duplicates have an average half-life of four million years.
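The half-life reported by Lynch and Conery (2000) can be translated into an expected surviving fraction of duplicates if one assumes, as a simplification, that silencing proceeds at a constant rate (exponential decay). The short Python sketch below makes that assumption explicit; the function name and the time points are illustrative.

```python
HALF_LIFE_MY = 4.0  # average half-life of a gene duplicate in million years (Lynch and Conery 2000)


def surviving_fraction(t_my, half_life_my=HALF_LIFE_MY):
    """Expected fraction of duplicates not yet silenced after t_my million years.

    Assumes a constant loss rate through time, which is an assumption of this
    sketch rather than a result taken from the source.
    """
    return 0.5 ** (t_my / half_life_my)


# Illustrative time points, in million years
for t in (1, 4, 8, 20):
    print(f"{t:>2} My: {surviving_fraction(t):.3f}")
```

Under this simplification, half of a cohort of duplicates would remain unsilenced after four million years and a quarter after eight.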

Rate of gene silencing

Allendorf (1979) suggested that the rate of gene silencing increases with the effective population size, provided that selection acts against multiple functional alleles. Under these conditions, however, the initial fixation of the duplicate gene would also be selectively disfavored (Walsh 1995).

From simulations, Li (1980a,b) determined that linkage has little effect on the rate of gene silencing at duplicate loci when the population size is below 10,000 individuals, or if $1/\mu$ is smaller than the population size. In populations larger than 10,000 individuals, partial dominance slows the rate of gene silencing (Li 1980b). Surprisingly, Li (1980b) also found that the correlation between gene silencing and mutation rates is weak, perhaps due to chance effects.

Evolution of new gene function

The evolution of a new gene function may occur through a single mutation, although a series of mutations is more likely (Ferris and Whitt 1979). For example, it has been demonstrated that a change in three amino acids results in a new function for chalcone synthase (Tropf et al. 1995), and that a change in one to three amino acids caused by stepwise mutation in beta-lactamases leads to susceptibility to extended-spectrum cephalosporins in Klebsiella pneumoniae (Chang et al. 2001). A change in function from extracellular transport of hydrophobic ligands to enzymatic activity is caused by unpairing of a single cysteine residue in the hydrophobic pocket of prostaglandin D synthase in humans (Nagata et al. 1991).

New roles for members of gene families can also be in the form of different temporal and spatial gene expression (reviewed by Cooke et al. 1997).

There is also evidence that functional diversity is commonly generated by domain shuffling and local sequence variation (Todd et al. 2001). In particular, changes in gene structure are thought to convert proto-oncogenes into cellular oncogenes leading to tumor growth (Hannig and Ottilie 1993).

Insights into the fate of duplicate genes gained by the application of molecular methods

Studies of tetraploid or ancient tetraploid organisms that have used molecular data have been conducted mostly in vertebrates. They have advanced the level of detail of our knowledge about the evolution of gene duplication and of the fate of duplicate loci. Findings from these studies have led to questions about the appropriateness of the classical model for predicting the fate of duplicate genes. For example, molecular studies have revealed a high proportion of functionally preserved genes, evidence of purifying selection acting on all known duplicates, and low numbers of segregating null alleles.

A crucial detail that has emerged from the study of ancient tetraploid organisms is that more genes than expected under the classical model have been functionally conserved. Ferris and Whitt (1977) observed that only 53% of duplicate loci in Catostomid fish had been silenced. Hughes and Hughes (1993) estimated that 76.8% of the duplicate loci in Xenopus laevis had not been silenced. In both of these studies, it was confirmed that the high preservation of duplicates was not associated with gene dosage requirements. In plants, maize, an allopolyploid, has maintained 72% of protein-coding genes for 11 million years (White and Doebley 1998). These observations refute the notion that null alleles at a duplicate locus become fixed due to drift, as described in Nei and Roychoudhury (1973).

From a different perspective, models evaluated by Nadeau and Sankoff (1997), which aimed at explaining gene persistence and were based on empirical data from mouse and human genes, were consistent with genes being almost as likely to acquire new and essential functions as to become lost through function-impairing mutations. This may be because most mutations are neutral rather than disadvantageous (Nadeau and Sankoff 1997).

Further studies have revealed that 17 pairs of nonallelic, duplicated genes in Xenopus laevis were subject to purifying selection at a constraint level similar to their mammalian orthologues (Hughes and Hughes 1993). Diversifying selection was also observed in members of multigene families (Hughes 1994). The presence of either purifying or diversifying selection acting on duplicate alleles is not consistent with Ohno's model for the fate of duplicate loci.

Lynch and Conery (2000), who conducted a comparative study of nucleotide sequence data, suggested that selection is not relaxed completely following gene duplication. Gene duplicates were found to experience moderately relaxed selection or accelerated evolution at replacement sites, followed by increased selective constraint (Lynch and Conery 2000).

Alternative models of gene duplication

Li (1980a) proposed a model for the fate of duplicate genes specific to polyploid organisms, to explain electrophoretic data obtained from tetraploid fishes. Li's (1980a) model may be divided into three phases. In the first phase, disomic segregation of chromosomes is reestablished. During this period, gene loss is only expected to occur through the loss of chromosomes and is therefore considered unlikely. In the second phase there is loss of duplicate gene expression if the presence of the normal locus shields the organism from fitness loss. During the third phase, regulatory or functional divergence of the duplicate gene(s) continues. According to this model, duplicate genes may be preserved for long time periods, assuming one or more of the following conditions hold: 1) long time to diploidization; 2) large effective population size; 3) low mutation rate; and 4) divergence in regulation patterns or function.

Hughes (1994) described an alternative model for the evolution of novel proteins. The main difference between his and the classical model is that duplication is preceded by a period of gene sharing, which allows the duplicates to specialize in function once the duplication event occurs. This is thought to be an irreversible process, and because one copy maintains the ancestral function, the other may drift to fixation. If neither specialization nor silencing occurs, it is postulated that the gene will be subject to purifying selection.

Cooke et al. (1997) remark that a very small selective advantage, on the order of the germ line mutation rate, would suffice for selection to maintain the second gene indefinitely. Two other scenarios were envisioned by these authors for the maintenance of duplicate genes. In the first, duplicates perform the same function but have different efficiencies, so that the less efficient gene is maintained by selection in mutants. In the second, the first gene's efficiency is highest, but the other duplicate may persist due to a secondary function.

The Duplication-Degeneration-Complementation model (DDC) was proposed by Force et al. (1999). The main difference between the DDC model and the classical model for the fate of gene duplicates is that degenerative mutations lead to silencing of the duplicate in the classical model, whereas they lead to the preservation of duplicates due to complementation in the DDC model.

The DDC model is divided into two phases. During the first phase, each gene duplicate has a subfunction that is mutated, resulting in gene silencing (qualitative subfunctionalization), diminished expression (quantitative subfunctionalization) or the evolution of a new function (neofunctionalization). Both duplicates then become essential, since (different) required subfunctions are coded for in each duplicate. During the second phase of DDC, the duplicate genes preserved by neofunctionalization or subfunctionalization may lose additional complementary subfunctions as degenerative mutations continue to accumulate.

Previous studies of gene duplication in plants

Although gene duplications are common in eukaryotes, plant gene duplicates or gene families have not contributed much to our knowledge of gene duplication compared with animal studies (particularly in vertebrates). We have learned through genetic mapping and sequence analysis that plant genomes contain large amounts of segmental duplications (reviewed in Bennetzen 2000).

Patterns of evolution of a few selected duplicate genes or gene families of plants have been examined. Among them, alcohol dehydrogenase has been studied in palms and grasses (Morton et al. 1996; Gaut et al. 1999), cotton (Small and Wendell 2000), and Leavenworthia (Charlesworth et al. 1998). Phosphoglucose isomerase has been well characterized and well studied within the genus Clarkia (Thomas et al. 1993; Gottlieb and Ford 1996). Chalcone synthase, involved in secondary metabolism, has been widely studied, particularly in the common and Japanese morning glory (reviewed in Durbin et al. 2000). The findings are summarized below.

Briefly, different evolutionary rates (Morton et al. 1996; Small and Wendell 2000; Durbin et al. 1995; Fukada-Tanaka et al. 1997), as well as functional divergence and differential selective constraint (Durbin et al. 1995; Fukada-Tanaka et al. 1997), were observed between gene copies. Results from gene structure analyses indicate that introns may be conserved (Morton et al. 1996) or vary between gene copies (Charlesworth et al. 1998; Small and Wendell 2000). Gene expression was confirmed in most cases, as were shifts in gene expression (Gaut et al. 1999; Durbin et al. 2000). Finally, one pseudogene and one eliminated gene were also reported (Gottlieb and Ford 1997). Clearly, we have relatively few detailed analyses of the dynamics and evolution of gene duplication in plants. More work on the evolution of gene families, especially in non-cultivated plants, seems warranted.

NEUTRAL SEQUENCE DIVERSITY

The neutral theory of molecular evolution was proposed independently by Kimura (1968a,b) and King and Jukes (1969) to account for the high levels of genetic diversity revealed by allozyme studies. The theory postulates that molecular changes and variability within species are due to the random genetic drift of mutations, which are assumed to be neutral or nearly neutral. Neutrality does in fact appear to account for a substantial fraction of the observed genetic diversity (Clegg 1997). However, relative evolutionary distances and levels of nucleotide diversity may become uncoupled. In particular, neutral loci, which are not expected to be subject to selection, may have genetic diversity reduced below that expected under neutrality. Some factors that contribute to this are discussed below.
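Nucleotide diversity, the measure of sequence variation referred to above, is commonly computed as the average number of pairwise differences per site among sampled sequences. The Python sketch below illustrates that definition on a made-up toy alignment; the function name is hypothetical, the sequences are not data from this study, and gaps or missing data are assumed to be absent.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site among aligned sequences of equal length."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched sites for every pair of sequences
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total_diffs / (len(pairs) * length)


# Toy alignment (made-up data, for illustration only)
toy_alignment = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
print(nucleotide_diversity(toy_alignment))
```

Estimates from real data would normally also handle alignment gaps and report sampling variance, which this sketch omits.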

Factors influencing population level variability at neutral loci

Low levels of population genetic diversity at neutral loci are thought to arise primarily from demography (i.e. population history, subdivision), through the action of selection (genetic hitchhiking, background selection), and in some cases from inbreeding. Population history can exert strong effects on the levels of DNA polymorphism. For instance, recent reductions in population size or recent colonizing events (termed "population bottlenecks") can lead to loss of population genetic variation at neutral loci. Population subdivision may also reduce genetic variability if one subpopulation or deme contributes more migrants than the others, thereby reducing effective population size and genetic diversity (Nagylaki 1982, 2000; Whitlock and Barton 1997).

Hitchhiking is defined as a change in the frequency of alleles at a neutral locus (typically loss of allelic variation) due to selection acting at a nearby, linked locus (Kojima and Schaffer 1967; Maynard Smith and Haigh 1974). The influence of hitchhiking in areas of the genome with reduced recombination and in populations of self-fertilizing (selfing) organisms can be especially strong (Hedrick 1980).

Background selection, like hitchhiking, occurs when there is linkage between neutral and selected loci, but in this case selection is postulated to act against deleterious mutations located throughout the genome (Charlesworth et al. 1993). Simulation studies suggest that background selection can reduce genetic diversity in random mating populations (outcrossers), primarily in regions of the genome where there are low levels of genetic recombination (e.g., near centromeres or in other recombinational cold spots). However, in populations of self-fertilizing organisms (selfers), which have low effective recombination rates compared with outcrossers, reduction in diversity due to background selection is expected to be seen throughout the genome (Charlesworth et al. 1993). Moreover, background selection can also influence between-population genetic variation (Charlesworth et al. 1997). In other words, it could contribute to the relatively high Fst values (i.e., high levels of population differentiation) seen among some populations of selfing species (Nordborg 1997).

According to classical neutral theory (Pollack 1987), a 50% reduction in within-population genetic diversity is expected in completely selfing versus completely outcrossing populations, regardless of the action of the three above-mentioned phenomena. This reduction arises due to the halving of the effective population size in complete selfers (as compared to random-mating populations).
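The 50% expectation can be illustrated with a short calculation. The sketch below is not taken from the thesis; it assumes the standard textbook relations F = s / (2 - s) for the equilibrium inbreeding coefficient at selfing rate s, and Ne = N / (1 + F) for the effective population size. The function names are hypothetical.

```python
def inbreeding_coefficient(selfing_rate):
    """Equilibrium inbreeding coefficient, assuming F = s / (2 - s)."""
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing rate must lie between 0 and 1")
    return selfing_rate / (2.0 - selfing_rate)


def relative_effective_size(selfing_rate):
    """Ne / N under partial selfing, assuming Ne = N / (1 + F)."""
    return 1.0 / (1.0 + inbreeding_coefficient(selfing_rate))


if __name__ == "__main__":
    # Neutral diversity scales with Ne, so this ratio is also the expected
    # diversity relative to a completely outcrossing population.
    for s in (0.0, 0.5, 1.0):
        print(f"selfing rate {s:.1f}: Ne/N = {relative_effective_size(s):.2f}")
```

Under these assumptions complete selfing (s = 1) gives F = 1 and Ne/N = 0.5, which is the halving referred to above; any further reduction has to come from other factors.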

Reductions in diversity below that expected from the reduction in effective population size in selfing relative to outcrossing organisms may be caused by background selection. In the presence of genome-wide, recurrent deleterious mutation (i.e., under the genetic conditions associated with background selection), simulations show that populations with high selfing rates undergo reductions in diversity of as much as 80% compared to what is expected under outcrossing (Charlesworth et al. 1993). Recombination rates are also known to affect effective population size, and empirical observations of sequence diversity suggest that recombination is positively correlated with DNA polymorphism (e.g. in Drosophila: Begun and Aquadro, 1992). Empirical evidence pointing to a correlation between levels of variation and rates of recombination has been recorded for tomato (Stephan and Langley 1998). However, a study of the effect of the mating system and recombination rate on polymorphism has shown that the former was more important in tomato (Baudry et al. 2001). Such reductions in diversity have traditionally been attributed to hitchhiking.

Distinguishing between the possible diversity-reducing factors

Distinguishing between the different possible diversity-reducing factors outlined above is of primary importance in evolutionary genetics. In particular, the overall importance of deleterious mutation as an evolutionary force is one of the most controversial issues in population biology (Lynch et al. 1999).

Statistical tests based on neutral theory have been developed to detect the presence of selection. Tests designed for a single locus include Tajima's D (1989) and Fu and Li's (1993) D* and F* tests. Simulation studies show that Tajima's D is the most powerful of the molecular polymorphism tests, but it is limited by a time frame component (Simonsen et al. 1995). Recently, Galtier et al. (2000) developed a likelihood test to distinguish whether demography or selection is more likely to be responsible for reductions in genetic diversity. In addition, Fay and Wu (2000) described the H statistic, which measures the excess of high compared to intermediate frequency variants, a pattern only expected with genetic hitchhiking. Tests using data from two loci include Fu and Li's test based on coalescent theory (1993) and Watterson's measure of homozygosity (1977, 1978), and tests using data from two species include the M-K test (McDonald and Kreitman 1991). However, the null hypothesis of these tests makes assumptions about the demographics of the population, such as a constant population size and the absence of population structure (Nielsen 2001), which may be violated. Further complications arise from the fact that there is only a certain time period after fixation of an advantageous mutation during which hitchhiking may be detected (about 0.5N generations in the absence of recombination; Simonsen et al. 1995; Fu 1997). A closer examination of patterns of neutral variability is thus in order.
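To make the single-locus tests more concrete, the following is a minimal sketch of Tajima's D using the constants published by Tajima (1989). It is an illustration only, not the software used in this thesis; the function name and the toy sequences are hypothetical, and the code assumes gap-free aligned sequences of equal length.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of equal-length, gap-free aligned sequences."""
    n = len(seqs)
    if n < 4:
        # The variance term is zero for samples of two or three sequences.
        raise ValueError("need at least four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites in the sample.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("D is undefined without segregating sites")

    # k: mean number of pairwise differences between sequences.
    n_pairs = n * (n - 1) / 2
    k = sum(sum(a != b for a, b in zip(x, y))
            for x, y in combinations(seqs, 2)) / n_pairs

    # Constants from Tajima (1989); they depend on sample size only.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (k - seg_sites / a1) / sqrt(variance)


if __name__ == "__main__":
    # Toy data: every variant is a singleton, so D is negative.
    print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
    # Toy data: variants at intermediate frequency, so D is positive.
    print(tajimas_d(["AA", "AA", "TT", "TT"]))
```

An excess of rare variants (negative D) is the pattern expected after a selective sweep or population expansion, which is why the statistic is sensitive to artifactual singletons.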

Charlesworth et al. (1993) argue that patterns of nucleotide diversity among populations can provide an additional means of discriminating between population parameters, hitchhiking, and background selection. In particular, when demographic factors such as bottlenecks are responsible, some populations of selfers may exhibit lower levels of variation than outcrossers (i.e., because of the effect of the mating system on diversity), but because population size varies in both selfers and outcrossers, there is no a priori expectation that all populations of selfers will necessarily exhibit reduced variation compared with all populations of outcrossers. Under hitchhiking there is again the expectation that some but not all selfing populations will exhibit reduced variation (i.e., strong reductions are expected only in those selfing populations that, because of their particular ecological conditions, have recently experienced strong selection at a linked locus). On the other hand, under background selection there is the expectation that all populations of selfers will exhibit reduced, genome-wide variation. The expectation arises because deleterious mutation is assumed to be a universally occurring property of populations and genomes. In outcrossers, background selection will affect only parts of the genome.

Kim and Stephan (2000) analyzed the interaction between directional and background selection and their combined effects on neutral variation. They found that the predominant force responsible for the reduction in diversity depends on the level of recombination. Genetic hitchhiking was important in regions of lower recombination, while background selection was important in regions of higher recombination.

Previous studies of the effect of mating systems on genetic diversity in plants

Allozyme diversity in selfing species has been observed to be on the order of half that of outcrossers (Hamrick and Godt 1990; Schoen and Brown 1991). This agrees with the theoretical expectations arising from the influence of the mating system on effective population size, as discussed above (Pollack 1987), but because diversity is not reduced below 50% of that of outcrossers, the findings do not in general support the existence of background selection, nor do they provide evidence for genetic hitchhiking. Allozyme variation, however, may not be appropriate for testing the background selection hypothesis, as there is evidence that allozyme loci are under balancing selection (Gillespie 1991).

The amount of DNA polymorphism data for plant populations is increasing rapidly (Clegg 1997). Yet few of these studies were designed to detect the effect of selection at linked loci, or have considered that differences in mating system could account for large differences in variability (Charlesworth and Pannell 2001). To date, there have been five surveys that have compared sequence variation in related taxa that exhibit contrasting mating systems (Liu et al. 1999; Miyashita et al. 1998; Cummings and Clegg 1998; Savolainen et al. 2000; and Baudry et al. 2001). None of these suggest that background selection was operating, except for Baudry et al. (2001), who did not exclude this possibility.

Because of violation of the assumption of neutrality (Liu et al. 1999), and/or due to experimental design problems and limited sample sizes, the results of most of these studies are difficult to interpret in light of the background selection hypothesis (Miyashita et al. 1998; Cummings and Clegg 1998; and Savolainen et al. 2000). Recently, Baudry et al. (2001), who also had a small number of sequence replicates, found that the mating system has a larger influence on the level of DNA polymorphism than recombination. They were unable to confidently infer which potential diversity-reducing factor(s) was responsible for the observed reduction in nucleotide diversity in the inbreeding species, with the exception of one locus that experienced genetic hitchhiking. Further empirical studies testing the relative nucleotide diversity in related inbreeding and outbreeding populations, as conducted here with Amsinckia spectabilis, therefore seem justified.

Research objectives pertaining to the evolution of the cytosolic glyceraldehyde-3-phosphate dehydrogenase loci in Amsinckia spectabilis

In the research reported in this thesis, I have studied the molecular evolution of the cytosolic glyceraldehyde 3-phosphate dehydrogenase gene family in Amsinckia spectabilis, an angiosperm with contrasting mating systems. The primary objectives were to: 1) identify the minimum number of loci belonging to the GapC gene family in Amsinckia spectabilis; 2) use sequence data to examine patterns of selection in members of the GapC gene family; 3) contrast patterns of sequence diversity between outcrossing and inbreeding Amsinckia spectabilis; and 4) attempt to discriminate which selective or demographic factors known to reduce variability have been important in the evolution of GapC loci.

MATERIALS AND METHODS

The genus Amsinckia

Frequent evolutionary shifts in the mating system in the annual, diploid genus Amsinckia have led to the existence of closely related outcrossing and selfing populations (Ray and Chisaki 1957a, b; Schoen et al. 1997). Morphological and molecular genetic evidence suggests that the shift from heterostyly to homostyly has been frequent in Amsinckia (Ray and Chisaki 1957a, b; Schoen et al. 1997). In heterostylous flowers, the anther and style are widely separated. Two forms of flowers are found in populations: "pin" and "thrum". In pin flowers, the stigma is located at the opening of the corolla and the anthers are found at its base. The reciprocal is seen in thrums. This promotes outcrossing in nature. Homostyly is characterized by a single physical location of the stigma and of the anthers inside the corolla. Heterostylous plants require pollinators for seed set, while homostylous plants set large amounts of seed by spontaneous self-fertilization.

The presence of the homostylous species of Amsinckia in disturbed and marginal habitats has earned it the reputation of a weed. Indeed, it is widely distributed in the western United States (Ray and Chisaki 1957a). The species studied here, A. spectabilis (2n=10), is found on the Pacific coast from northern Baja California to Washington and Vancouver Island.

Amsinckia spectabilis occurs as two separate named varieties that differ in their floral biology (Ray and Chisaki 1957a; Ganders et al. 1985). A. spectabilis var. spectabilis has small, homostylous flowers, and inhabits coastal dunes. A. spectabilis var. microcarpa has larger, heterostylous flowers, and occupies stabilized sandy habitats.

Plant materials

Seeds were collected during fieldwork in California by Johnston and Schoen (1995, 1996). Their original population numbers are used here, but individual plants in each population were assigned unique identification numbers. The populations used to generate sequence data for the GapC locus are listed in Table 1, along with the collection locality and a brief description of the mating system.

Samples of six populations, each consisting of one progeny individual propagated from the seed of 25 different naturally occurring plants, were grown in the McGill University Phytotron. Approximately five seeds per parent plant were planted at a depth of 0.5 cm, in soil prepared from a 3:1 mixture of sand and black earth, with 10 grams of lime added per liter of black earth. After incubation at 4° C for 10 days, seedlings were grown under controlled conditions (22° C, 10h:14h light:dark cycle) until tissue was taken from one individual per pot for DNA extraction.

GapC

The gene chosen for the study of polymorphism in Amsinckia spectabilis codes for the ubiquitous glyceraldehyde 3-phosphate dehydrogenase (GAPDH). In plants, GAPDH is involved in two important metabolic processes: glycolysis in the cytosol and the Calvin cycle in the chloroplast. During glycolysis, GAPDH catalyzes the oxidative phosphorylation of glyceraldehyde 3-phosphate into 1,3-bisphosphoglycerate (Harris and Waters 1976). The homo-tetrameric (C4) cytosolic protein (EC 1.2.1.12) has an N-terminal domain, responsible for binding the coenzyme NAD+, an S-loop which is thought to confer thermostability to the protein (Walker et al. 1980), and a catalytic C-domain (Fothergill-Gillmore and Michels 1993). In the chloroplast, GAPDH (EC 1.2.1.13) catalyzes the reverse reaction. It exists as two isoenzymes: the main form is a heterodimer (A2B2) (Cerff, 1979) and, in contrast to the cytosolic protein, is NADP+ dependent (Cerff 1982).

In plants, GAPDH is encoded by three separate nuclear genes (Cerff and Chambers 1979): GapA and GapB code for the chloroplast isoenzymes (Cerff and Kloppstech, 1982), while GapC codes for the cytosolic protein. The divergence of the chloroplast and cytosolic GAPDH genes predates that of eukaryotes and prokaryotes (Shih et al. 1988a). This is consistent with the fact that GapA and GapB are more similar to each other than either is to GapC. GapC is more closely related to animal and fungal GAPDHs than it is to the other two subunits (Shih et al. 1986). This study focuses on the GapC gene in Amsinckia spectabilis. GapC has been documented to exist in maize as a gene family of four members (Manjunath and Sachs 1997), and in Arabidopsis thaliana as three separate loci (based on a search of the TIGR Arabidopsis thaliana database performed on January 22, 2002), although Southern hybridizations have suggested a single copy (Shih et al. 1988b).

PCR amplification, cloning and sequencing

DNA was extracted from fresh young tissue using the DNeasy Plant Mini Kit (Qiagen) or Nucleon PhytoPure Genomic DNA Extraction Kit (Amersham-Pharmacia). A 1.0 kb fragment was amplified by the Polymerase Chain Reaction (PCR) using universal primers GPDX7F and GPDX9R for plant GapC (Strand et al. 1997) in a single individual. That PCR product was sequenced as described below, and its sequence was used to design the reverse primer AsGapC-II. Forward primer AsGapC-I was designed based on a tomato, tobacco and petunia GapC consensus sequence (GenBank accession numbers U97257, AJ133422 and X60346, respectively). This second primer pair yielded four PCR products per PCR reaction. Bands ~1.0 kb in length were gel-extracted (QIAEX II Gel Extraction Kit, Qiagen), sequenced, and used to design Amsinckia spectabilis specific primers AsGapC-III and AsGapC-IV. These primers amplify a single, sharp, ~950 base pair (bp) region of the GapC gene in Amsinckia. A fourth forward primer, AsGapC-V, was designed to target a specific putative locus. All primers designed for this study and their respective annealing temperatures are listed in Table 2.

The standard 25 µL PCR reaction consisted of 1X PCR buffer (Qiagen), 20 mM of each dNTP (Amersham-Pharmacia), 10 pM of each primer (BioCorp Canada), 2.5 µg of purified BSA (New England Biolabs), and one unit of HotStar Taq polymerase (Qiagen). The PCR reaction was conducted on a GeneAmp® PCR System 9700 (Perkin Elmer Applied Biosystems) under the following conditions: 15 minutes at 95°C; 28 to 30 cycles of 60 seconds at 95°C, 60 seconds at the appropriate annealing temperature, and 60 seconds at 72°C; and a final extension step of 7 minutes at 72°C. The products were loaded on a 1X TBE, 1% agarose (Gibco) gel, stained with ethidium bromide, and visualized under ultraviolet light.

Reactions containing products of ~950 bp were cloned using the Original Cloning and TOPO Cloning kits (Invitrogen) as directed by the manufacturer. Subcloning, microbial cultures, and plasmid isolation were performed following Sambrook et al. (1989). Clones were screened for the presence of inserts by EcoRI digestion, followed by gel visualization. Positive clones were purified for sequencing using a QIA Prep Spin Plasmid Miniprep Kit (Qiagen). Sequencing was done according to the manufacturer's recommendations on a Li-Cor LONG READIR 4200 automated sequencer using a SequiTherm EXCEL II kit (Epicentre) with M13 forward and reverse primers. Sequencing of both strands was performed to ensure accuracy in base calling. Discrepancies between strands were resolved through sequence chromatograph analysis using the program Trev viewer of the Staden Package version 1.5 (Medical Research Council, Laboratory of Molecular Biology, Cambridge, UK) or marked as missing data. Strands were assembled using BioEdit version 5.0.6 (Hall 1999). Identity of the product was confirmed by BLAST (Altschul et al. 1997) against the GenBank database. Amsinckia spectabilis sequences will be deposited into the GenBank database.

PCR-RFLP using diagnostic enzymes

Because the analysis of preliminary sequence data suggested the presence of several distinct GapC loci, a combination of PCR amplification and Restriction Fragment Length Polymorphism, PCR-RFLP, was used to investigate the simultaneous presence of the different sequence subsets (i.e. putative loci) in single individuals belonging to inbreeding populations. The first step was to generate PCR products using the standard Amsinckia primers AsGapC-III and AsGapC-IV as described above. This was followed by restriction digestion using enzymes ("diagnostic enzymes"), each chosen to recognize and cut a single group of sequences suspected of corresponding to one of the putative GapC loci. This method can be particularly useful for detecting multiple GapC loci in inbreeding populations since there is the a priori expectation of homozygosity at each locus. Therefore, the recognition of more than one amplified PCR product of a single individual by all diagnostic enzymes suggests the presence of multiple GapC loci. The PCR products served as templates for the restriction digestions (with enzymes AccI, SspI and EarI), which were carried out at 37° C in a total volume of 15 µl for at least one hour. Digestion products were run on agarose and scored with respect to the presence of the expected band size.

Southern hybridization

Southern hybridizations were performed as follows to obtain an estimate of the number of GapC loci in the Amsinckia spectabilis genome. To prepare the probe, a plasmid clone with a sequenced insert corresponding to individual 1027 (population 91005) was used to generate a PCR product using the Amsinckia-specific primers AsGapC-III and AsGapC-IV. After size verification on an agarose gel, the product was cleaned using a High Pure PCR Product Purification Kit (Roche Biochemicals) and random-labeled with digoxigenin, using DIG-High Prime (Roche Biochemicals), following the manufacturer's instructions. DIG Quantification Teststrips (Roche Biochemicals) were used for determining the concentration of the labeled probe.

Genomic DNA chosen from a single individual (whenever possible) in each population was aliquoted into 90 µl samples. Three aliquots from each sample were digested overnight in the presence of 2.5% spermidine (Amersham-Pharmacia) with EcoRI, HhaI, or HindIII in 100 µl total volume. These enzymes do not have restriction sites within the portion of the GapC loci used as a probe. The products were run for 900 volt-hours on a 0.5X TBE, 1% agarose gel free of ethidium bromide. High salt upward capillary transfer onto a positively charged nylon membrane (Roche Biochemicals) was done overnight for a maximum of 20 hours. DNA transferred onto the membrane was fixed by exposure to UV illumination on a standard transilluminator for 195 seconds. The labeled probe was incorporated into 10 ml of Dig Easy Hyb (Roche Biochemicals), and the hybridization reaction was carried out for 18 hours at 42° C. Both low and high stringency washes were conducted as recommended by the manufacturer. Chemiluminescent detection was done using ready-to-use CSPD (disodium 3-(4-methoxyspiro{1,2-dioxetane-3,2'-(5'-chloro)tricyclo[3.3.1.1^3,7]decan}-4-yl)phenyl phosphate) substrate according to the manufacturer's recommendations.

Special consideration in sequence analysis: singletons

A particular concern with sequence data is the risk of including artifactual singletons (base changes unique to a single sequence). The presence of these artifacts may be attributable to the Taq DNA polymerase used in PCR, which has an inherent error rate of approximately 1 × 10^-5, or to the bacterial polymerases used in molecular cloning, which are expected to have a lower fidelity rate than Taq DNA polymerase.

It is important to recognize this potential problem because some analyses commonly used to detect selection examine the frequency spectrum of variants, and are sensitive to rare variants. Thus, a conservative approach of removing singletons from all sequences was undertaken for certain analyses described below.

Data Analysis

Sequence identity and alignment

Assembled sequences were aligned using ClustalX (Thompson et al. 1997). A BLAST (Altschul et al. 1997) search of GenBank was performed on February 18, 2001 to identify genomes of other plants that share a high level of nucleotide similarity at their GapC locus. Manual alignments were done using GeneDoc (Nicholas and Nicholas 1997).

Phylogenetic analyses

Neighbor-joining analysis of sequences grouped into haplotypes was performed following the method of Saitou and Nei (1987) as implemented in ClustalX (Thompson et al. 1997). Maximum likelihood (ML) phylogenetic analysis of coding regions was conducted using PAUP version 4.0 beta (Swofford 1998) with the HKY85 model of nucleotide substitution (Hasegawa et al. 1985). Phylogenetic trees were visualized with TreeView (Win32) 1.6.5 (Page 1996).

dN/dS calculations

The ratio of the number of nonsynonymous substitutions per nonsynonymous site to the number of synonymous substitutions per synonymous site (dN/dS) can provide a measure of the degree of selective constraint acting upon a coding region. PAML version 3.0c (Yang 1997) was used to calculate dN/dS values from the ML tree generated by PAUP. To test if dN/dS values depart from that expected under selective neutrality (i.e. dN/dS = 1.0), the test statistic $2\Delta L = 2(L_e - L_c)$ was calculated, where $L_e$ = log likelihood associated with the estimated ratio of dN/dS and $L_c$ = log likelihood when dN/dS is constrained to a value of 1.0. The test statistic is distributed as $\chi^2$ with one degree of freedom.
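The likelihood ratio test described above can be sketched in a few lines. This is an illustration only, not the PAML procedure itself; the function name and the log-likelihood values are hypothetical. It relies on the identity that, for one degree of freedom, the upper tail probability of a chi-square variable x equals erfc(sqrt(x / 2)).

```python
from math import erfc, sqrt


def lrt_one_df(lnl_estimated, lnl_constrained):
    """Return (2*deltaL, p-value) for a likelihood ratio test with 1 df.

    lnl_estimated:   log likelihood with dN/dS estimated freely
    lnl_constrained: log likelihood with dN/dS fixed at 1.0
    """
    statistic = 2.0 * (lnl_estimated - lnl_constrained)
    # Numerical noise can make the statistic slightly negative; clamp at 0.
    statistic = max(statistic, 0.0)
    # Upper tail of the chi-square distribution with one degree of freedom.
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical log likelihoods, for illustration only.
    stat, p = lrt_one_df(-1234.50, -1235.10)
    print(f"2*deltaL = {stat:.3f}, p = {p:.3f}")
```

A statistic above roughly 3.84 corresponds to p < 0.05, so small differences in log likelihood between the free and constrained models will not reject neutrality.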

Linkage disequilibrium

Linkage disequilibrium among sites was investigated using DnaSP 3.51 (Rozas and Rozas, 1999) and tested for significance using Fisher's exact test and a $\chi^2$ test.

Functional consequences of nonsynonymous substitutions

Haplotype sequences were translated using GeneDoc (Nicholas and Nicholas 1997) and examined for nonsynonymous substitutions. Replacement changes in amino acids were characterized by Grantham values (Grantham 1974), which take into account composition, polarity and molecular volume differences for each amino acid pair. The range of values that was used to assign functional consequences to replacement changes follows Li et al. (1984) and is as follows: < 50 is conservative, 51-100 is moderately conservative, 101-150 is moderately radical and > 151 is radical.
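The classification scheme can be written as a small lookup. This is a sketch under one stated assumption: the ranges as given leave the values 50 and 151 unassigned, so here 50 is treated as conservative and 151 as radical. The function name is hypothetical.

```python
def classify_grantham(value):
    """Assign a functional category to a Grantham distance.

    Assumption: boundary values not covered by the stated ranges
    (50 and 151) are placed in the adjacent outer category.
    """
    if value <= 0:
        raise ValueError("Grantham distances are positive")
    if value <= 50:
        return "conservative"
    if value <= 100:
        return "moderately conservative"
    if value <= 150:
        return "moderately radical"
    return "radical"


if __name__ == "__main__":
    # Leu/Ile (5) and Cys/Trp (215) are the extremes of Grantham's matrix.
    for pair, distance in (("Leu/Ile", 5), ("Cys/Trp", 215)):
        print(pair, distance, classify_grantham(distance))
```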

Diversity of GapC loci

Diversity at each putative GapC locus was estimated by π and θw using Proseq v.2.8 beta (Filatov 1999). π is equal to the number of differences per site between two randomly chosen sequences, while θw is calculated from the number of segregating sites in the sample.
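The two estimators can be illustrated as follows. This is a minimal sketch, not the Proseq implementation; it assumes gap-free aligned sequences of equal length, reports both statistics per site, and uses hypothetical function names and toy data.

```python
from itertools import combinations


def _check(seqs):
    """Validate the alignment and return (sample size, alignment length)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return len(seqs), len(seqs[0])


def nucleotide_diversity(seqs):
    """pi: mean pairwise differences per site."""
    n, length = _check(seqs)
    total = sum(sum(a != b for a, b in zip(x, y))
                for x, y in combinations(seqs, 2))
    return total / (n * (n - 1) / 2) / length


def wattersons_theta(seqs):
    """theta_w per site: segregating sites divided by a1 = sum of 1/i."""
    n, length = _check(seqs)
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    a1 = sum(1.0 / i for i in range(1, n))
    return seg_sites / a1 / length


if __name__ == "__main__":
    sample = ["AAAA", "AAAT", "AATA", "ATAA"]  # toy alignment
    print(nucleotide_diversity(sample), wattersons_theta(sample))
```

Under neutrality both statistics estimate the same quantity, so a marked difference between them is itself informative about the frequency spectrum of variants.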

RESULTS

Structure of the GapC locus in Amsinckia spectabilis

Identity of amplicons generated by both the universal and Amsinckia-specific primers was confirmed by BLAST (Altschul et al. 1997) against the GenBank database to be a portion of the catalytic domain of the cytosolic GAPDH gene. Within all A. spectabilis sequences, sequence length is conserved except for an 18 base pair deletion between bases 318 and 334 of the consensus sequence. This indel is seen in only a small number of individuals from populations 91004 and 88028, which are listed in Table 3.

The consensus sequence of the Amsinckia GapC fragment investigated in this study shows identity of 82 to 84% in DNA sequence, and of 91% in amino acid sequence, to the GapC sequences of the solanaceous species tobacco, petunia and potato (the Solanales is a closely allied order to the ).

The Arabidopsis thaliana GapC (GenBank accession M64119) and a consensus of all Amsinckia spectabilis sequences from this study were manually aligned (Figure 1) with GeneDoc (Nicholas and Nicholas 1997) using the results of a pairwise BLAST (Altschul et al. 1997) search as a reference. The region amplified in Amsinckia spectabilis is delimited by the last 15 nucleotides of exon 5 at the 5' end, and the first 25 nucleotides of exon 9 at the 3' end. The coding regions, which represent approximately one half of the Amsinckia spectabilis sequence, show a high degree of conservation with those of the Arabidopsis sequence. There are, however, notable differences between the non-coding regions of the two species' GapC loci (refer to Figure 2). The most pronounced difference is the complete loss of introns 6 and 7 in Amsinckia spectabilis. In addition, the portion of the sequence corresponding to intron 5 in Amsinckia spectabilis is about twice as long as in Arabidopsis thaliana, while intron 8 is of comparable length in these species.

Evidence of at least three separate GapC loci in Amsinckia

Several lines of evidence from this study point to the existence of at least three separate GapC loci in the Amsinckia spectabilis genome. This evidence includes phylogenetic analysis of sequence data, correlation of specific base substitutions at a number of sites, linkage disequilibrium analyses of sequence data, PCR-RFLP analyses and Southern hybridizations.

Phylogenetic analyses. To simplify the analysis of the relationships between all A. spectabilis sequences analysed in this study, parsimony non-informative sites (i.e. singletons) were removed and the sequences were grouped into haplotypes (Appendix A). Results from the phylogenetic analysis of the haplotypes are shown in Figure 3. Apart from two haplotypes, which represent a very small percentage of the total number of sequences collected, all haplotypes fall neatly into four distinct groups. Each such group of sequences was assigned the status of putative locus and will hereafter be referred to as GapC-I through IV (Appendix A).

The putative loci are defined by sets of correlated base substitutions and one insertion as listed in Table 4. Most of the substitutions that distinguish these putative loci fall within the non-coding region. Of the three substitutions located within the coding region (at positions 555, 567 and 636), none cause replacement changes in the amino acid sequence.

The two haplotypes that do not correspond to any locus, representing populations 91003 and 91004 (two individuals each), were found to be closely related to those of GapC-I and GapC-II but missing defining substitutions. Haplotype 13 was missing one substitution to be classified under GapC-I. Haplotype 14 shared one of the substitutions defining GapC-I, but could not be classified as a member of that locus because one substitution common to GapC-I and GapC-II was missing.

The genetic distance between groups of haplotypes is measured by the number of shared substitutions (Figure 3). Haplotypes 29 through 72 (putative locus 4) form the most distant group, while haplotypes 1 through 12 (putative locus 1) and 15 through 21 (putative locus 2) are very similar to each other. Haplotypes 22 to 28 (putative locus 3) are more similar to haplotypes corresponding to putative loci 1 and 2 than to those of putative locus 4.

Linkage disequilibrium among sites. Linkage disequilibrium analyses were conducted to quantify the magnitude and test the significance of the association between combinations of nucleotide positions that define each presumed GapC locus. Linkage disequilibrium (D') is equal to D/Dmax, where D is a measure of the deviation of gametic frequencies observed in the sequence data from equilibrium frequencies (Lewontin and Kojima 1964), and Dmax is the maximum value of D. Most significant comparisons among loci-defining sites had values of D' equal to 1.0, the maximum value (Table 5). The few remaining significant comparisons also had D' values very close to the maximum.
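The definition of D' can be illustrated for a pair of biallelic sites. This is a sketch, not the DnaSP implementation; it assumes the usual convention in which Dmax depends on the sign of D, and the function name and haplotype counts are hypothetical.

```python
def d_prime(n_AB, n_Ab, n_aB, n_ab):
    """Signed D' = D / Dmax from counts of the four two-site haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at the first site
    p_B = (n_AB + n_aB) / total  # frequency of allele B at the second site
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both sites must be polymorphic")

    # D: observed AB frequency minus its expectation under independence.
    d = n_AB / total - p_A * p_B
    # Dmax: the largest magnitude D can reach given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return d / d_max


if __name__ == "__main__":
    # Only two of the four possible haplotypes occur: complete association.
    print(d_prime(5, 0, 0, 5))
    # All four haplotypes occur: partial association.
    print(d_prime(4, 1, 1, 4))
```

A value of 1.0 in magnitude means that at least one of the four possible haplotypes is absent, which is the pattern expected when the sites being compared distinguish separate loci.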

Observation of several distinct GapC sequences within individuals from inbreeding populations. The expectation of high homozygosity in organisms from inbreeding populations provides an additional tool for inferring the number of GapC loci amplified by PCR cloning. In an extreme inbreeder, such as Amsinckia spectabilis var. spectabilis, the number of sequences produced by a single individual that fall into separate groups of sequences corresponding to putative loci can be taken as additional evidence of the number of GapC loci per genome. Individuals for which multiple sequences were available (i.e. through sequencing multiple clones per PCR reaction) were examined and placed onto the neighbor-joining tree of all haplotypes. Figure 4 shows the placement of haplotypes present within different single individuals. Three distinct and distantly related haplotypes were seen in the two selfing individuals, individuals G and 242. This evidence suggests the existence of three rather than four GapC loci. In the outcrossing individuals 1003 and 1027, four haplotypes were placed on the neighbor-joining tree, each in a separate and well-defined group.

PCR-RFLP. PCR-RFLP was performed to verify that all putative loci were present in the PCR products taken from individuals in all populations studied. The diagnostic restriction enzymes used were as follows: EarI (New England Biolabs) for putative loci 1 and 2, SspI (Gibco) for putative locus 3, and AccI (Gibco) for putative locus 4.

The enzyme SspI did not produce the expected digestion products. EarI and AccI were reliable in producing bands of expected sizes. Results are summarized in Figure 5. Three bands, which correspond to the expected sizes of the raw PCR product and to that of both restriction digestion products, are seen in each lane. This indicates that the PCR products from all individuals investigated have restriction sites for these two enzymes. This is especially informative in the case of the inbreeding populations, as it reinforces the idea that each group of sequences is a distinct locus, and not a divergent allelic form.

Southern hybridization. Results from Southern hybridizations are shown in Figure 6. In two individuals, 4 or 5 discernible bands, ranging in size from approximately 1.65 to 3.60 kb, were seen. Each band represents a GapC locus in Amsinckia spectabilis.

Analysis of functional consequences of nonsynonymous substitutions

Grantham values (1974) and their functional consequences (following Li et al. 1984) were obtained for the 15 amino acid replacements observed in the translated haplotype sequences (Table 9). Grantham values are based on composition, polarity and molecular volume differences between amino acid pairs, and provide a quantitative estimate of the biophysical effect associated with replacement changes. One nonsynonymous substitution from tryptophan to a stop codon was found in one haplotype representing three GapC-I sequences from individuals belonging to two different populations. Out of 163 amino acid sites, 2.45% of amino acid changes were radical, 1.23% moderately radical, 3.68% moderately conservative, and 1.23% conservative.

Pattern of sequence evolution in coding regions of the GapC loci

One measure of selection on coding sequences is the dN/dS ratio. It represents the number of nonsynonymous substitutions per nonsynonymous site divided by the number of synonymous substitutions per synonymous site. A dN/dS greater than one indicates the action of positive or diversifying selection, while a value of less than one is likely to be due to negative or purifying selection.

The program PAML version 3.0c (Yang, 1997) used for this work calculates dN/dS ratios for the entire phylogeny of sequences analyzed.
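The ratio itself can be illustrated with a minimal sketch. It assumes a simple counting approach in the spirit of Nei and Gojobori (1986), with a Jukes-Cantor correction for multiple hits, rather than the maximum likelihood procedure implemented in PAML. The function names and the example counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, dN/dS) from counts of differences and of sites."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0:
        raise ZeroDivisionError("dS is zero; the ratio is undefined")
    return d_n, d_s, d_n / d_s


# Hypothetical counts for a 489 bp (163 codon) alignment, illustration only.
d_n, d_s, omega = dn_ds(nonsyn_diffs=6, nonsyn_sites=360,
                        syn_diffs=5, syn_sites=129)
print(f"dN = {d_n:.4f}, dS = {d_s:.4f}, dN/dS = {omega:.4f}")
```

With these made-up counts the ratio falls below one, the pattern expected under purifying selection; whether such a value differs significantly from one is a separate statistical question.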

Individual loci analyzed separately in the phylogeny of A. spectabilis sequences. The coding region in the A. spectabilis GapC sequence was inferred from the annotated A. thaliana full-length GapC sequence (GenBank accession M64119). The coding regions corresponding to exons 6 through 8 were included in the analysis. Estimates of dN/dS values were obtained with and without singletons removed from the data.

The dN/dS ratios estimated from the original data were smaller than one for all putative loci, with values ranging from 0.5985 in GapC-I to 0.8215 in GapC-IV (Table 6). However, no value differed significantly from one. The dN/dS ratios estimated from sequences free of singletons were also smaller than one (Table 7). Again, no value was significantly different from one. Non-significance of the dN/dS ratios in these analyses may be due to the relatively low degree of variation observed among the sequences assigned to individual loci.

Amsinckia GapC loci analyzed together with GapC sequences from the related family Solanaceae. The Solanaceae is relatively closely allied to the Boraginaceae (Chase et al. 1993). To test for selective constraint in Amsinckia GapC loci since the divergence of the Solanaceae and Amsinckia lineages, a maximum likelihood (ML) analysis of dN/dS was conducted using the consensus sequence of each putative locus together with Solanaceous GapC sequences for which GenBank data were available. Consensus sequences for each of the putative Amsinckia spectabilis GapC loci were obtained using GeneDoc version 2.5 (Nicholas and Nicholas 1997). GapC mRNA sequences of Nicotiana tabacum (tobacco; accession M14419), Lycopersicon esculentum (tomato; accession U97257), Petunia hybrida (accession X60346) and Solanum tuberosum (potato; accession U17005) were retrieved from the GenBank database. Homologous coding regions were selected based on a pairwise BLAST (Altschul et al. 1997) search with the Amsinckia spectabilis coding sequence set as the query.

The ML phylogeny of the above four Solanaceous GapC coding regions and the A. spectabilis locus-specific consensus coding sequences, obtained using PAUP, is shown in Figure 7. The tree file was modified by distinguishing the branch beneath the Amsinckia GapC sequences from the branches leading to the Solanaceous sequences, and PAML was used to obtain dN/dS estimates for the A. spectabilis loci versus the four Solanaceous GapC loci. This analysis yielded estimates of dN/dS that were all significantly less than 1.0 at the P < 0.001 level for the four putative A. spectabilis GapC loci (Table 8).
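One common way to test whether a dN/dS estimate differs from one is a likelihood ratio test that compares a model in which the ratio is estimated freely with a model in which it is fixed at one. The sketch below assumes that approach, with one degree of freedom. The text does not state which test was used, and the log-likelihood values shown are hypothetical.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt):
    """Likelihood ratio test with one degree of freedom.

    lnl_null: log-likelihood with dN/dS fixed at 1 (hypothetical input).
    lnl_alt:  log-likelihood with dN/dS estimated freely.
    Returns the statistic 2 * (lnl_alt - lnl_null) and its P value.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic <= 0:
        return statistic, 1.0
    # For a chi-square variable with 1 d.f., P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(lnl_null=-1006.0, lnl_alt=-1000.0)
print(f"2 * delta lnL = {stat:.2f}, P = {p:.5f}")
```

The chi-square tail probability is computed with the standard library so that the example has no external dependencies; a statistics package would normally be used for more than one degree of freedom.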

Nucleotide variability in Amsinckia spectabilis GapC

Nucleotide diversity, θ, was investigated for each putative Amsinckia spectabilis GapC locus with and without singletons removed from the data. θ = 4Neμ, where Ne is the effective population size and μ is the mutation rate. Because both parameters defining θ are unknown, DNA polymorphism is estimated by π (also known as Tajima's estimator of θ) and θw (Nei 1987). π is the average number of nucleotide differences per site between two randomly chosen sequences. θw is based on the number of segregating sites (or mutations) within the sample; it is also considered a better estimator because it has a smaller stochastic variance.
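The two estimators can be sketched for a small alignment as follows. This is a minimal illustration that assumes gap-free sequences of equal length and uses the usual scaling of the number of segregating sites S by the sum of 1/i for i from 1 to n-1. The function names and the toy alignment are hypothetical and are not the thesis data.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns at which the sequences are not all identical."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences, expressed per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs) / length


def watterson_theta(seqs):
    """theta_w per site: S divided by a_n = sum(1/i, i = 1..n-1) and by length."""
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n / len(seqs[0])


# Toy alignment (hypothetical); gaps are assumed to be absent.
alignment = [
    "AAAAAAAAAA",
    "AAAAAAAAAT",
    "AAAAAAAATT",
    "AAAAAAAAAA",
]
print("S       =", segregating_sites(alignment))
print("pi      =", round(nucleotide_diversity(alignment), 4))
print("theta_w =", round(watterson_theta(alignment), 4))
```

Because π weights each variant by its frequency while θw counts every segregating site equally, the two values diverge when low-frequency variants such as singletons are common.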

Both π and θw were estimated separately for all sites combined, as well as for silent (synonymous and non-coding) sites, with indels ignored, for Amsinckia spectabilis var. spectabilis (inbreeding) and Amsinckia spectabilis var. microcarpa (outbreeding). The results are shown in Tables 10 and 11. In all cases, regardless of the presence or absence of singletons, silent sites were as diverse as or more diverse than all sites combined, except for var. spectabilis at GapC-II.

In all four putative loci, nucleotide diversity was higher in the outbreeding variety (var. microcarpa) than in the inbreeding variety (var. spectabilis), independent of whether all sites or only silent sites were considered, except for π at GapC-I, and π at all sites and θ at GapC-IV with singletons removed. Exceptions aside, this effect is not dependent on the presence of singletons. The ratio of π at silent sites between inbreeding and outbreeding varieties ranged from 50% at GapC-IV to 75.3% at GapC-III (singletons present) and from 56.5% at GapC-III to 107.1% at GapC-I (singletons removed).

The population-level DNA polymorphism survey shows that without singletons removed, all A. s. microcarpa populations were more diverse than A. s. spectabilis populations, for both types of sites examined (Tables 12 and 13). In addition, silent sites were always more diverse than all sites combined.

When singletons were removed, not all inbreeding populations were less diverse than all populations of outbreeders at GapC-I: population 88028 was more diverse than the other inbreeding populations, and population 91005 was much less diverse than the other outbreeding populations. Although the difference between inbreeding and outbreeding populations at GapC-IV is small, the former are still less diverse than the latter. Standard deviations are especially high for some estimates of π.

DISCUSSION

A preliminary analysis of molecular evolution at the cytosolic glyceraldehyde 3-phosphate dehydrogenase gene, GapC, was conducted in Amsinckia spectabilis (Boraginaceae). It is suggested that GapC is a gene family of three or four similar members in A. spectabilis. This conclusion is based on four lines of evidence: 1) phylogenetic analysis of sequence data; 2) correlation among specific base substitutions at a number of sites (positive linkage disequilibrium); 3) PCR-RFLP analyses; and 4) Southern hybridizations. The structure of the GapC locus in Amsinckia spectabilis was found to be different from that of Arabidopsis thaliana, principally by the loss of two introns and a twofold increase in the length of another intron. A small percentage of the amino acid substitutions that account for the differences among the haplotypes were classified as radical, but most changes were synonymous, conservative, or moderately conservative. A single nonsynonymous substitution caused a stop codon to replace an amino acid at one of the loci. Analyses of patterns of substitution intended to detect departure from selective neutrality yielded non-significant results when each locus was examined separately. Strong deviation from neutrality was detected, however, with dN/dS ratios less than one, when each locus was analyzed in conjunction with representatives of the closely allied plant family Solanaceae. Mean nucleotide diversity across all GapC loci is 0.0036 (0.0019 with singletons removed) for the inbreeding var. spectabilis, and 0.0049 (0.0025 with singletons removed) for the outbreeding var. microcarpa. At the individual population level, there is a systematic difference in diversity between outbreeding and inbreeding populations at GapC-IV.

Evidence for at least three separate GapC loci in the Amsinckia spectabilis genome

Clearly, members of the GapC gene family in Amsinckia spectabilis are very similar to one another: they are only distinguishable through correlated base substitutions at specific positions in the sequence. This helps to account for why initial efforts to amplify a single locus through the design of primers specific to a PCR band failed. The combination of phylogenetic analyses, specific base substitutions, linkage disequilibrium analyses and phylogenetic distribution of clones generated from single individuals provides strong evidence that the groups of sequences represent three or four separate loci.

Although GapC-I and GapC-II are most likely distinct from GapC-III and GapC-IV, the evidence that they are separate from each other is weaker than that for the other loci. Only a small number of sequences generated from single inbreeding individuals correspond to GapC-II; no inbreeding individuals were observed to possess sequences corresponding to both GapC-I and GapC-II. One method that has the potential to solve this problem is PCR-RFLP, but because only three specific base substitutions separate loci I and II, no restriction enzyme could be found that would recognize one locus but not the other. With this in mind, the possibility that what is referred to as GapC-I and GapC-II are actually differentiated alleles of the same locus must be considered.

Linkage disequilibrium was estimated in order to investigate the association between the base substitutions at positions that define putative loci. Most significant comparisons reached the maximal value of D' or were very close to it. The correlation between sites that distinguish the four putative loci does not appear to have arisen by chance (Table 5), so the statistical evidence corroborates the existence of four separate putative loci. Alternatively, the specific nucleotide substitutions may define distinct alleles maintained by balancing selection. However, this explanation implies that the alleles differ enough biochemically or ecologically that maintaining them would be advantageous. This seems unlikely, as most differences between "loci" are single nucleotide substitutions located in the non-coding regions.
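The D' statistic referred to above can be sketched for a pair of biallelic sites. This is a minimal illustration of Lewontin's normalization of the disequilibrium coefficient D by its maximum attainable magnitude given the allele frequencies. The function name and the example haplotypes are hypothetical, and the software actually used for the thesis analyses may differ in detail.

```python
def d_prime(haplotypes):
    """Lewontin's D' for two biallelic sites.

    haplotypes: list of (allele_at_site_1, allele_at_site_2) tuples,
    one per sampled sequence. The alleles of the first haplotype are
    used as the reference alleles A and B.
    """
    n = len(haplotypes)
    ref_a, ref_b = haplotypes[0]
    p_a = sum(1 for a, _ in haplotypes if a == ref_a) / n
    p_b = sum(1 for _, b in haplotypes if b == ref_b) / n
    p_ab = sum(1 for a, b in haplotypes if a == ref_a and b == ref_b) / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # at least one site is monomorphic
    return d / d_max


# Hypothetical samples: only two haplotypes present, then all four present.
complete = [("A", "G")] * 6 + [("T", "C")] * 4
mixed = [("A", "G")] * 5 + [("A", "C")] + [("T", "G")] + [("T", "C")] * 3
print("complete association:", round(d_prime(complete), 3))
print("all four haplotypes: ", round(d_prime(mixed), 3))
```

D' reaches its maximal value whenever fewer than four of the possible two-site haplotypes are present in the sample, which is the pattern expected if diagnostic substitutions never occur in mixed combinations.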

Unfortunately, while suggestive, the Southern hybridizations are not definitive in elucidating the total number of GapC loci in Amsinckia spectabilis, primarily because of partial restriction digestion of Amsinckia spectabilis DNA, and because the signal generated from digoxigenin-labeled probes was weak (Figure 6).

Structure and functionality of the GapC loci in Amsinckia spectabilis

The loss of two introns at the GapC loci of Amsinckia spectabilis as compared to Arabidopsis thaliana suggests that the Amsinckia spectabilis GapC might have arisen through the reverse transcription of an mRNA intermediate. This would imply that there is yet another GapC locus within the Amsinckia genome, or that gene conversion has taken place at some point during the evolution of this gene family. Alternatively, intron loss could have occurred in the ancestor of Amsinckia spectabilis. Although not uncommon in plants, the loss of introns here is unexpected, considering that glyceraldehyde 3-phosphate dehydrogenase is a slowly evolving enzyme (Fothergill-Gilmore and Michels 1993), and that there is strong conservation of gene structure at this locus throughout the plant and animal kingdoms (Shih et al. 1988a). It would be interesting to investigate when, relative to other plant lineages, the loss of these introns occurred. Unfortunately, there are no other published full-length plant GapC DNA sequences, apart from that of Arabidopsis thaliana.

Based on the analysis of the synonymous and nonsynonymous substitutions, it appears likely that GapC-II, GapC-III and GapC-IV are all functional. On the other hand, since one nonsynonymous substitution (from tryptophan to a stop codon) was found in a haplotype representing three individuals from two different populations at GapC-I, it is possible that this locus is a pseudogene. Some haplotypes belonging to all putative loci, except GapC-III, have nonsynonymous substitutions that lead to radical changes in the amino acid sequence. The fact that radical or moderately radical changes in amino acids were not observed at GapC-III may be an artifact of the small number of sequences obtained for this locus.

Pattern of sequence evolution in coding regions of the GapC loci

The patterns of sequence evolution in coding regions of the GapC loci in Amsinckia spectabilis were investigated by estimating the ratio dN/dS, a method that has proven useful for the detection of selection (Nei and Gojobori 1986). There is strong evidence that purifying selection has occurred at the GapC loci since the divergence of the Amsinckia lineage from the Solanaceae. The presence of singletons in the data does not affect these conclusions.

The fact that selective constraint appears to have been acting on all GapC loci in Amsinckia spectabilis refutes the classical model for the fate of duplicate genes (Ohno 1970), which states that after duplication, one duplicate gene will maintain the ancestral function while the other will either gain a new function or become a pseudogene through the accumulation of mutations. It is not possible, however, to determine which of the other models proposed in the last decade (e.g. Hughes 1994; Force et al. 1999) best fits the observations reported here, since only approximately half of the entire gene was sequenced, and information on evolution in the regulatory regions is lacking.

Diversity at GapC loci

DNA polymorphism was estimated for each putative GapC locus in both A. spectabilis varieties. As expected, the inbreeding variety of A. spectabilis was less diverse than the outbreeding variety. Nevertheless, instead of the large reduction in diversity in inbreeding relative to outbreeding taxa predicted by the background selection and hitchhiking models (e.g. Charlesworth et al. 1993; Maynard Smith and Haigh 1974), GapC was only 25% less diverse in the inbreeding than in the outbreeding populations. While nucleotide diversity within varieties may be due to between-population variation (Charlesworth and Pannell 2001), this is unlikely to be the case here, as sequences from individuals of different populations often share the same haplotype (Appendix A). Moreover, this possibility was investigated through a population-level survey of diversity at GapC-I and GapC-IV, which had adequate sample sizes in most populations. Nucleotide diversity was greater in the outbreeding than in the selfing populations, at levels that agree with the effect of the mating system on effective population size and genetic diversity (Pollack 1987). Because no reduction in genetic diversity in inbreeding versus outbreeding populations below that expected from the reduction in effective population size was observed, the sequence data from this study do not support the predictions of the background selection model (Charlesworth et al. 1993).
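The reduction expected from the mating system alone can be sketched with the standard neutral result that partial selfing reduces effective population size to Ne = N / (1 + F), where F = s / (2 - s) is the equilibrium inbreeding coefficient for selfing rate s. The sketch assumes equal census sizes, neutrality, and no background selection or hitchhiking. The function names are hypothetical, and the selfing rates shown are illustrative rather than estimates for Amsinckia.

```python
def inbreeding_coefficient(selfing_rate):
    """Equilibrium inbreeding coefficient F = s / (2 - s) under partial selfing."""
    if not 0 <= selfing_rate <= 1:
        raise ValueError("selfing rate must lie between 0 and 1")
    return selfing_rate / (2 - selfing_rate)


def relative_diversity(selfing_rate):
    """Expected neutral diversity relative to an outcrosser, via Ne = N / (1 + F)."""
    return 1 / (1 + inbreeding_coefficient(selfing_rate))


# Illustrative selfing rates only.
for s in (0.0, 0.5, 1.0):
    print(f"s = {s:.1f}: relative diversity = {relative_diversity(s):.2f}")
```

Under these assumptions complete selfing halves the expected diversity, so a reduction of about 25% lies within the range expected from the mating system alone, whereas background selection or hitchhiking would be expected to push diversity below one half.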

It is worthwhile to consider that several factors may mask the effect of background selection on nucleotide diversity. For one, overestimation of the selfing rates for the homostylous populations of Amsinckia spectabilis may have occurred; selfing rates were estimated in only one season (Johnston and Schoen 1996; Schoen et al. 1997). There is also the possibility that the GapC loci are situated in genomic regions of low recombination, which, under the background selection hypothesis, could lead to reduced variability independent of the mating system. Unfortunately, genetic mapping has not been done in Amsinckia or other closely allied taxa, so this hypothesis could not be tested. Another possibility is that the selfing varieties of A. spectabilis may have originated from the outcrossing varieties recently in evolutionary time (Schoen et al. 1997). If this is indeed true, reductions in diversity through the selective removal of linked deleterious mutations may not have yet progressed to the level expected in an inbreeder.

One inbreeding population, 91010, showed greatly reduced genetic diversity at silent sites relative to other inbreeding and outbreeding populations at the GapC-I locus. The fact that a much lower level of DNA polymorphism was not observed for that same population at GapC-IV would seem to rule out demographic factors such as a population bottleneck, which is expected to affect diversity at all loci equally. A local selective factor may account for this observation.

Other statistical tests of neutrality have not yet been performed on the data obtained in this study, mainly because the proportion of artifactual singletons is unknown. Statistical tests of neutrality have been shown to be able to partially distinguish between factors responsible for reducing genetic diversity (Charlesworth et al. 1993). For instance, Tajima's D (1989), which measures the difference between estimates of θ and π, was found to have the power to detect background selection (Simonsen et al. 1995). But since Tajima's D is highly sensitive to the presence of low-frequency variants, artifactual singletons may result in false positive results (type I error). Sequencing of more clones per individual will be required before such neutrality tests are performed.
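The sensitivity of Tajima's D to low-frequency variants can be seen from its definition, sketched below following the formulas of Tajima (1989). The function takes the sample size n, the number of segregating sites S, and the mean number of pairwise differences k (not divided by sequence length). The function name and the input values are hypothetical and are not results from this study.

```python
import math


def tajimas_d(n, seg_sites, mean_pairwise_diffs):
    """Tajima's D from sample size n, segregating sites S, and the mean
    number of pairwise differences k (not divided by sequence length)."""
    if n < 4 or seg_sites < 1:
        # For n < 4 the variance term is zero and D is undefined.
        raise ValueError("need at least four sequences and one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (mean_pairwise_diffs - seg_sites / a1) / math.sqrt(variance)


# Hypothetical sample: 10 sequences, 16 segregating sites, k = 35/9.
d_value = tajimas_d(n=10, seg_sites=16, mean_pairwise_diffs=35 / 9)
print(f"Tajima's D = {d_value:.3f}")
```

Each additional singleton raises S by one but adds only 2/n to k, so an excess of singletons, whether real or introduced by PCR or cloning error, drives D toward negative values.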

CONCLUSIONS

The initial objective of this study was to compare nucleotide diversity between the two closely related varieties of Amsinckia spectabilis with contrasting mating systems. This study is one of the first to do so at the population level. The use of closely related taxa provided a way to test for background selection (Charlesworth et al. 1993) to see if it contributes to the lower levels of genetic variability reported in several species of selfers compared to outcrossers, as assessed using allozymes (Hamrick and Godt 1990; Schoen and Brown 1991).

Prior to addressing the effect of mating systems on genetic variability, the number and defining characteristics of each member of the GapC gene family in Amsinckia spectabilis had to be determined. Strong evidence for three putative copies of the GapC gene was found, while weaker evidence suggests the existence of a fourth locus. All loci are highly conserved in gene structure and differ only by single base substitutions at specific positions. Evidence that purifying selection has been acting on all observed members of the GapC gene family in A. spectabilis since the divergence of the closely allied plant family Solanaceae and of the Amsinckia lineage is in agreement with studies of tetraploid organisms that refute the classical model for the fate of duplicated genes (e.g. Hughes and Hughes 1993). Whether the newer models for the fate of duplicate genes can successfully explain the observations made here remains to be clarified with entire gene sequences including regulatory regions. Tests for functional divergence in the form of shifts in gene expression would also be useful in this respect.

GapC-IV was chosen to study nucleotide polymorphism based on strong evidence that it was a single locus, and for practical reasons (large number of base substitutions unique to this locus useful for primer design). Levels of polymorphism were found to be on the same order of magnitude as those observed in Leavenworthia at the phosphoglucose isomerase locus (Liu et al. 1999), and in Lycopersicon at CT208 (in L. chilense), sucr (in L. pimpinellifolium, L. chilense and L. hirsutum), CT251 (in L. chilense and L. hirsutum), CT268 (in L. pimpinellifolium) and CT143 (in L. pimpinellifolium and L. hirsutum) (Baudry et al. 2001) for both selfing and outcrossing populations. The small reduction in genetic diversity observed in Amsinckia spectabilis GapC between outcrossing and inbreeding populations is consistent with the effect of selfing on the effective population size and genetic diversity (Pollack 1987).

LITERATURE CITED

Allendorf, F.W. 1979. Rapid loss of duplicate gene expression by natural selection. Heredity 43: 247-258.

Altschul, S.F., T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D.J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25: 3389-3402.

Aquadro, C.F. 1997. Insights into the evolutionary process from patterns of DNA sequence variability. Current Opinion in Genetics and Development 7(6): 835-840.

Bailey, G.S., R.T.M. Poulter and P.A. Stockwell. 1978. Gene duplication in tetraploid fish: Model for gene silencing at unlinked duplicated loci. Proceedings of the National Academy of Sciences USA 75(11): 5575-5579.

Baudry, E., C. Kerdelhue, H. Innan and W. Stephan. 2001. Species and recombination effects on DNA variability in the tomato genus. Genetics 158(4): 1725-1735.

Begun, D.J. and C.F. Aquadro. 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356: 519-520.

Bennetzen, J.L. 2000. Comparative sequence analysis of plant nuclear genomes: microcolinearity and its many exceptions. The Plant Cell 12: 1021-1029.

Cerff, R. 1979. Quaternary structure of higher plant glyceraldehyde-3-phosphate dehydrogenases. European Journal of Biochemistry 94: 243-247.

Cerff, R. 1982. Separation and purification of NAD- and NADP-linked glyceraldehyde-3-phosphate dehydrogenases from higher plants. Pp. 683-694 in Methods in Chloroplast Molecular Biology, edited by M. Edelman, R.B. Hallick and N.H. Chua. Elsevier Biomedical Press, Amsterdam.

Cerff, R. and S.E. Chambers. 1979. Subunit structure of higher plant glyceraldehyde-3-phosphate dehydrogenases (EC 1.2.1.12 and EC 1.2.1.13). The Journal of Biological Chemistry 254(13): 6094-6098.

Cerff, R. and K. Kloppstech. 1982. Structural diversity and differential light control of mRNAs coding for angiosperm glyceraldehyde-3-phosphate dehydrogenases. Proceedings of the National Academy of Sciences of the United States of America 79(24): 7624-7628.

Chang, F.Y., L.K. Siu, C.P. Fung, M.R. Huang and M. Ho. 2001. Diversity of SHV and TEM beta-lactamases in Klebsiella pneumoniae: Gene evolution in Northern Taiwan and two novel beta-lactamases, SHV-25 and SHV-26. Antimicrobial Agents and Chemotherapy 45(9): 2407-2413.

Charlesworth, B., M.T. Morgan and D. Charlesworth. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134: 1289-1303.

Charlesworth, B., M. Nordborg and D. Charlesworth. 1997. The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genetical Research 70: 155-174.

Charlesworth, D. and J.R. Pannell. 2001. Mating systems and population genetic structure, or how a couple of botanists learned to stop worrying and love the coalescent or theory is some use, after all. In Plants Stand Still, But Their Genes Don't, edited by J. Silvertown and J. Antonovics. Blackwell (in press).

Charlesworth, D., F.L. Liu and L. Zhang. 1998. The evolution of the alcohol dehydrogenase gene family by loss of introns in plants of the genus Leavenworthia (Brassicaceae). Molecular Biology and Evolution 15(5): 552-559.

Chase, M.W., D.E. Soltis, R.G. Olmstead, D. Morgan, D.H. Les, B.D. Mishler, M.R. Duvall, R.A. Price, H.G. Hills, Y.-L. Qiu, K.A. Kron, J.H. Rettig, E. Conti, J.D. Palmer, J.R. Manhart, K.J. Sytsma, H.J. Michaels, W.J. Kress, K.G. Karol, W.D. Clark, M. Hedren, B.S. Gaut, R.K. Jansen, K.-J. Kim, C.F. Wimpee, J.F. Smith, G.R. Furnier, S.H. Strauss, Q.-Y. Xiang, G.M. Plunkett, P.S. Soltis, S.M. Swensen, S.E. Williams, P.A. Gadek, C.J. Quinn, L.E. Eguiarte, E. Golenberg, G.H. Learn Jr., S.W. Graham, S.C.B. Barrett, S. Dayanandan and V.A. Albert. 1993. Phylogenetics of seed plants: An analysis of nucleotide sequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden 80: 528-580.

Clark, A.G. 1994. Invasion and maintenance of a gene duplication. Proceedings of the National Academy of Sciences USA 91: 2950-2954.

Clegg, M.T. 1997. Plant genetic diversity and the struggle to measure selection. Journal of Heredity 88(1): 1-7.

Cooke, J., M.A. Nowak, M. Boerlijst and J. Maynard-Smith. 1997. Evolutionary origins and maintenance of redundant gene expression during metazoan development. Trends in Genetics 13(9): 360-364.

Cummings, M.P. and M.T. Clegg. 1998. Nucleotide sequence diversity at the alcohol dehydrogenase 1 locus in wild barley (Hordeum vulgare ssp. spontaneum): An evaluation of the background selection hypothesis. Proceedings of the National Academy of Sciences USA 95: 5637-5642.

Devonshire, A.L. and R.M. Sawicki. 1979. Insecticide-resistant Myzus persicae as an example of evolution by gene duplication. Nature 280: 140-141.

Durbin, M.L., G.H. Learn, G.A. Huttley and M.T. Clegg. 1995. Evolution of the chalcone synthase gene family in the genus Ipomoea. Proceedings of the National Academy of Sciences USA 92: 3338-3342.

Durbin, M.L., B. McCaig and M.T. Clegg. 2000. Molecular evolution of the chalcone synthase multigene family in the morning glory genome. Plant Molecular Biology 42: 79-92.

Edwards, S.V. and P. Beerli. 2000. Perspective: Gene divergence, population divergence, and the variance in coalescence time in phylogeographic studies. Evolution 54(6): 1839-1854.

Fay, J.C. and C.-I. Wu. 2000. Hitchhiking under positive Darwinian selection. Genetics 155: 1405-1413.

Ferris, S.D. and G.S. Whitt. 1977. Loss of duplicate gene expression after polyploidisation. Nature 265: 258-260.

Ferris, S.D. and G.S. Whitt. 1979. Evolution of the differential regulation of duplicate genes after polyploidization. Journal of Molecular Evolution 12: 267-317.

Filatov, D.A. and D. Charlesworth. 1999. DNA polymorphism, haplotype structure and balancing selection in the Leavenworthia PgiC locus. Genetics 153: 1423-1434.

Fisher, R.A. 1935. The sheltering of lethals. American Naturalist 69: 446-455.

Force, A., M. Lynch, F.B. Pickett, A. Amores, Y. Yan and J. Postlethwait. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151: 1531-1545.

Fothergill-Gilmore, L.A. and P.A.M. Michels. 1993. Evolution of glycolysis. Progress in Biophysics and Molecular Biology 59: 105-235.

Fu, Y.-X. 1997. Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics 147: 915-925.

Fu, Y.-X. and W.-H. Li. 1993. Statistical tests of neutrality of mutations. Genetics 133: 693-709.

Fukada-Tanaka, S., A. Hoshino, Y. Hisatomi, Y. Habu, M. Hasebe and S. Iida. 1997. Identification of new chalcone synthase genes for flower pigmentation in the Japanese and common morning glories. Plant Cell Physiology 38: 754-758.

Galtier, N., F. Depaulis and N.H. Barton. 2000. Detecting bottlenecks and selective sweeps from DNA sequence polymorphism. Genetics 155: 981-987.

Ganders, F.R., S.K. Denny and D. Tsai. 1985. Breeding systems and genetic variation in Amsinckia spectabilis (Boraginaceae). Canadian Journal of Botany 63: 533-538.

Gaut, B.S. 1998. Molecular clocks and nucleotide substitution rates in higher plants. Evolutionary Biology 30: 93-120.

Gaut, B.S., A.S. Peek, B.R. Morton and M.T. Clegg. 1999. Patterns of genetic diversification within the Adh gene family in the grasses (Poaceae). Molecular Biology and Evolution 16(8): 1086-1097.

Gilbert, W. 1978. Why genes in pieces? Nature 271: 501.

Gillespie, J.H. 1991. The Causes of Molecular Evolution. Oxford University Press, Oxford.

Goldstein, D.B. and P.H. Harvey. 1999. Evolutionary inference from genomic data. BioEssays 21: 148-156.

Gottlieb, L.D. and V.S. Ford. 1996. Phylogenetic relationships among the sections of Clarkia (Onagraceae) inferred from the nucleotide sequences of PgiC. Systematic Botany 21: 1-18.

Gottlieb, L.D. and V.S. Ford. 1997. A recently silenced, duplicate PgiC locus in Clarkia. Molecular Biology and Evolution 14(2): 125-132.

Grantham, R. 1974. Amino acid difference formula to help explain protein evolution. Science 185: 862-864.

Haldane, J.B.S. 1932. The Causes of Evolution. Cornell University Press, Ithaca, New York.

Haldane, J.B.S. 1933. The part played by recurrent mutation in evolution. American Naturalist 67: 5-19.

Hall, T.A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series 41: 95-98.

Hamrick, J.L. and M.J.W. Godt. 1990. Allozyme diversity in plant species. Pp. 43-63 in Plant Population Genetics, Breeding and Genetic Resources, edited by A.H.D. Brown, M.T. Clegg, A.L. Kahler and B.S. Weir. Sinauer, Sunderland, Massachusetts.

Hannig, G. and S. Ottilie. 1993. Evolution of SRC-type protein tyrosine kinases and receptor protein tyrosine kinases. Endocytobiosis and Cell Research 9(2-3): 105-134.

Harris, J.I. and M.G. Waters. 1976. Glyceraldehyde 3-phosphate dehydrogenase. Pp. 1-49 in The Enzymes, 3rd Edition, Vol. XIII, edited by P.D. Boyer. Academic Press, Inc., New York.

Hasegawa, M., H. Kishino and T.-A. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22: 160-174.

Hedrick, P.W. 1980. Hitchhiking: A comparison of linkage and partial selfing. Genetics 94: 791-808.

Hughes, A.L. 1994. The evolution of functionally novel proteins after gene duplication. Proceedings of the Royal Society of London. Series B. 256(1346): 119-124.

Hughes, M.K. and A.L. Hughes. 1993. Evolution of duplicate genes in a tetraploid animal, Xenopus laevis. Molecular Biology and Evolution 10(6): 1360-1369.

Johnston, M.O. and D.J. Schoen. 1995. Mutation rates and dominance levels of genes affecting total fitness in two angiosperm species. Science 267: 226-229.

Johnston, M.O. and D.J. Schoen. 1996. Correlated evolution of self-fertilization and inbreeding depression: an experimental study of nine populations of Amsinckia (Boraginaceae). Evolution 50: 1478-1491.

Koehn, R.K. and D.I. Rasmussen. 1967. Polymorphic and monomorphic serum esterase heterogeneity in catostomid fish populations. Biochemical Genetics 1: 131-144.

Kim, J. and W. Stephan. 2000. Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics 155: 1415-1427.

Kimura, M. 1968a. Evolutionary rate at the molecular level. Nature 217: 624-626.

Kimura, M. 1968b. Genetic variability maintained in a finite population due to mutational production of neutral and nearly neutral isoalleles. Genetical Research 11: 247-269.

Kimura, M. and J.L. King. 1979. Fixation of a deleterious allele at one of two "duplicate" loci by mutation pressure and random drift. Proceedings of the National Academy of Sciences USA 76: 2858-2861.

King, J.L. and T.H. Jukes. 1969. Non-Darwinian evolution. Science 164: 788-798.

Kojima, K.-I. and H.E. Schaffer. 1967. Survival process of linked mutant genes. Evolution 21: 518-531.

Kreitman, M. 2000. Methods to detect selection in populations with applications to the human. Annual Review of Genomics and Human Genetics 1: 539-559.

Lewontin, R.C. and K.-I. Kojima. 1960. The evolutionary dynamics of complex polymorphisms. Evolution 14: 458-472.

Li, W.-H. 1980a. Rate of gene silencing at duplicate loci: a theoretical study and interpretation of data from tetraploid fishes. Genetics 95: 237-258.

Li, W.-H. 1980b. Evolutionary change of duplicate genes. Isozymes: Current Topics in Biological and Medical Research 6: 55-92.

Li, W.-H. 1997. Molecular Evolution. Sinauer Associates, Sunderland, Massachusetts.

Li, W.-H., C.-I. Wu and C.-C. Luo. 1984. Nonrandomness of point mutation as reflected in nucleotide substitutions in pseudogenes and its evolutionary implications. Journal of Molecular Evolution 21: 58-71.

Liu, F., D. Charlesworth and M. Kreitman. 1999. The effect of mating system differences on nucleotide diversity at the phosphoglucose isomerase locus in the plant genus Leavenworthia. Genetics 151: 343-357.

Lynch, M. and J.S. Conery. 2000. The evolutionary fate and consequences of duplicate genes. Science 290: 1151-1155.

Lynch, M.J., J. Blanchard, D. Houle, T. Kibota, S. Schultz, L. Vassilieva and J. Willis. 1999. Perspective: Spontaneous deleterious mutation. Evolution 53: 645-663.

MacIntyre, R.J. 1976. Evolution and ecological value of duplicate genes. Annual Review of Ecology and Systematics 7: 421-468.

76 Manjunath, S. and M.M. Sachs. 1997. Molecular characterization and promoter analysis ofthe maize cytosolic glyceraldehyde-3-phosphate dehydrogenase gene family and its expression during anoxia. Plant Molecular biology 33: 97-112.

Maynard-Smith, land 1 Haigh. 1974. The hitch-hicking effect of a favorable gene. Genetical Research 23: 23-35.

McDonald, J. and M. Kreitman. 1991. Adaptive protein evolution at the adh locus in Drosophila. Nature 351: 652-654.

Miyashita, N.T., A. Kawabe, H. Innan and R. Terauchi. 1998. Intra- and interspecific DNA variation and codon bias of the alcohol dehydrogenase (Adh) locus in Arabis and Arabidopsis species. Molecular Biology and Evolution 15(11): 1420-1429.

Morton, B.R., B.S. Gaut and M.T. Clegg. 1996. Evolution of alcohol dehydrogenase genes in the Palm and Grass families. Proceedings of the National Academy of Sciences USA 93: 11735-11739.

Nadeau, J.H. and D. Sankoff. 1997. Comparable rates of gene loss and functional divergence after genome duplications early in vertebrate evolution. Genetics 147: 1259-1266.

Nagata, A., Y. Suzuki, M. Igarashi, N. Eguchi, H. Toh, Y. Urade and O. Hayaishi. 1991. Human brain prostaglandin D synthase has been evolutionarily differentiated from lipophilic-ligand carrier proteins. Proceedings of the National Academy of Sciences USA 88: 4020-4024.

Nagylaki, T. 1982. Geographical invariance in population genetics. Journal of Theoretical Biology 99: 159-172.

Nagylaki, T. 2000. Geographical invariance and the strong migration limit in subdivided populations. Journal of Mathematical Biology 41(2): 123-42.

Nei, M. 1969. Gene duplication and nucleotide substitution in evolution. Nature 218: 1160-1161.

Nei, M. 1987. Molecular Evolutionary Genetics. Columbia University Press, New York.

Nei, M. and A.K. Roychoudhury. 1973. Probability of fixation of nonfunctional genes at duplicate loci. The American Naturalist 107 (955): 362-372.

Nei, M. and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Molecular Biology and Evolution 3: 418-426.

Nielsen, R. 2001. Statistical tests of selective neutrality in the age of genomics. Heredity 86: 641-647.

Nicholas, K.B. and H.B. Nicholas Jr. 1997. GeneDoc: A tool for editing and annotating multiple sequence alignments. Distributed by the author at www.cris.com/~ketchup/genedoc.shtml

Nordborg, M. 1997. Structured coalescent processes on different time scales. Genetics 146: 1501-1514.

Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Berlin.

Ohta, T. 1987. Simulating evolution by gene duplication. Genetics 115: 207-213.

Ohta, T. 1988a. Further simulation studies on evolution by gene duplication. Evolution 42: 375-386.

Ohta, T. 1988b. Time for acquiring a new gene by duplication. Proceedings of the National Academy of Sciences USA 85: 3509-3512.

Ohta, T. 1989. Role of gene duplication in evolution. Genome 31: 304-310.

Otto, S. 2000. Detecting the form of selection from DNA sequence data. Trends in Genetics 16(12): 526-529.

Page, R. D. M. 1996. TREEVIEW: An application to display phylogenetic trees on personal computers. Computer Applications in the Biosciences 12: 357-358.

Pagel, M. 1999. Inferring the historical patterns of biological evolution. Nature 401(6756): 877-884.

Pollak, E. 1987. On the theory of partially inbreeding finite populations. I. Partial selfing. Genetics 117: 353-360.

Ray, P.M. and H.F. Chisaki. 1957a. Studies on Amsinckia. I. A synopsis of the genus, with a study of heterostyly in it. American Journal of Botany 44: 529-536.

Ray, P.M. and H.F. Chisaki. 1957b. Studies on Amsinckia. II. Relationships among the primitive species. American Journal of Botany 44: 537-544.

Rozas, J. and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15: 174-175.

Saitou, N. and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4: 406-425.

Sambrook, J., E.F. Fritsch and T. Maniatis. 1989. Molecular Cloning: A Laboratory Manual. Second Edition. Cold Spring Harbor Laboratory Press, Plainview, New York.

Savolainen, O., C.H. Langley, B.P. Lazzaro and H. Fréville. 2000. Contrasting patterns of nucleotide polymorphism at the alcohol dehydrogenase locus in the outcrossing Arabidopsis lyrata and the selfing Arabidopsis thaliana. Molecular Biology and Evolution 17(4): 645-655.

Schoen, D.J. and A.H.D. Brown. 1991. Intraspecific variation in population gene diversity and effective population size correlates with the mating system in plants. Proceedings of the National Academy of Sciences USA 88: 4494-4497.

Schoen, D.J., M.O. Johnston, A.-M. L'Heureux and J.V. Marsolais. 1997. Evolutionary history of the mating system in Amsinckia (Boraginaceae). Evolution 51(4): 1090-1099.

Shih, M.C., G. Lazar and H.M. Goodman. 1986. Evidence in favor of the symbiotic origin of chloroplasts: primary structure and evolution of tobacco glyceraldehyde-3-phosphate dehydrogenases. Cell 47: 73-80.

Shih, M.C., P. Heinrich and H.M. Goodman. 1988a. Intron existence predated the divergence of eukaryotes and prokaryotes. Science 242: 1164-1166.

Shih, M.C., P. Heinrich and H.M. Goodman. 1988b. Unpublished results cited in Shih, M.C., P. Heinrich and H.M. Goodman. 1988a.

Simon, C., F. Frati, A. Beckenbach, B. Crespi, H. Liu and P. Flook. 1994. Evolution, weighting, and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved polymerase chain reaction primers. Annals of the Entomological Society of America 87(6): 651-701.

Simonsen, K.L., G.A. Churchill and C.F. Aquadro. 1995. Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141: 413-429.

Small, R.L. and J.F. Wendel. 2000. Copy number lability and evolutionary dynamics of the Adh gene family in diploid and tetraploid cotton (Gossypium). Genetics 155: 1913-1926.

Smith, C.W.J., J.G. Patton and B. Nadal-Ginard. 1989. Alternative splicing in the control of gene expression. Annual Review of Genetics 23: 527-577.

Spofford, J.B. 1969. Heterosis and the evolution of duplications. American Naturalist 103: 407-432.

Stephan, W. and C. H. Langley. 1998. DNA polymorphism in Lycopersicon and crossing-over per physical length. Genetics 150: 1585-1593.

Strand, A.E., J. Leebens-Mack and B.G. Milligan. 1997. Nuclear DNA-based markers for plant evolutionary biology. Molecular Ecology 6(2): 113-118.

Sturtevant, A.H. and G.W. Beadle. 1962. An Introduction to Genetics. Dover Publications, New York.

Swofford, D.L. 1998. PAUP*. Phylogenetic analysis using parsimony (and other methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.

Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585-595.

Takahata, N. and T. Maruyama. 1979. Polymorphism and loss of duplicate gene expression: A theoretical study with application to tetraploid fish. Proceedings of the National Academy of Sciences USA 76: 4521-4525.

Thomas, B.R., V.S. Ford, E. Pichersky and L.D. Gottlieb. 1993. Molecular characterization of duplicate cytosolic phosphoglucose isomerase genes in Clarkia and comparison to the single gene in Arabidopsis. Genetics 135: 895-905.

Thomas, J.H. 1993. Thinking about genetic redundancy. Trends in Genetics 9: 395-399.

Thompson, J.D., T.J. Gibson, F. Plewniak, F. Jeanmougin and D.G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 24: 4876-4882.

Todd, A.E., C.A. Orengo and J.M. Thornton. 2001. Evolution of function in protein superfamilies, from a structural perspective. Journal of Molecular Biology 307(4): 1113-1143.

Tropf, S., B. Karcher, G. Schroder and J. Schroder. 1995. Reaction mechanisms of homodimeric plant polyketide synthases (stilbene and chalcone synthase). A single active site for the condensing reaction is sufficient for synthesis of stilbenes, chalcones and 6'-deoxychalcones. Journal of Biological Chemistry 270: 7922-7928.

Walker, J.E., A.F. Carne, M.J. Runswick, J. Bridgen and J.I. Harris. 1980. D-glyceraldehyde-3-phosphate dehydrogenase. Complete amino-acid sequence of the enzyme from Bacillus stearothermophilus. European Journal of Biochemistry 108(2): 549-65.

Walsh, J.B. 1995. How often do duplicated genes evolve new functions? Genetics 139: 421-428.

Watterson, G.A. 1977. Heterosis or neutrality? Genetics 85: 789-814.

Watterson, G.A. 1978. The homozygosity test of neutrality. Genetics 88: 405-417.

Watterson, G.A. 1983. On the time for gene silencing at duplicate loci. Genetics 105: 745-766.

Whelan, S., P. Lio and N. Goldman. 2001. Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends in Genetics 17(5): 262-272.

White, S. and J. Doebley. 1998. Of genes and genomes and the origin of maize. Trends in Genetics 14: 327-332.

Whitlock, M.C. and N.H. Barton. 1997. The effective size of a subdivided population. Genetics 146: 427-441.

Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13: 555-556.

Table 1. Amsinckia spectabilis varieties studied

Variety           Population  Collection locality        Selfing rate

Inbreeding
  spectabilis     91010       Alisal Slough, CA          0.998^a
  spectabilis     91011       Zmudowski State Beach, CA  1.0^b
  spectabilis     88028       Pescadero Beach, CA        not estimated^c

Outbreeding
  microcarpa      91003       Nipomo, CA                 0.55^b
  microcarpa      91004       Santa Maria, CA            not estimated^d
  microcarpa      91005       Lompoc, CA                 0.33^b

a: Johnston and Schoen (1996)
b: Schoen et al. (1997)
c: complete seed set in the absence of pollinators, homostylous
d: no seed set in the absence of pollinators, heterostylous

Table 2. Primer pairs and corresponding annealing temperatures

Forward primer            Reverse primer            Annealing temperature (°C)

AsGapC-I                  AsGapC-II                 49
AGGCTGCTGCTCACTTGAAG      TCGGTAGACACTACATCATCT

AsGapC-III                AsGapC-IV                 49
GCTGCTCACTTGAAGGTCTG      CTTGAGCTTGCCTTCTGATTC

AsGapC-V                  AsGapC-IV                 48
GGGTTTGATTGGAATCTCAAAC    CTTGAGCTTGCCTTCTGATTC

Table 3. List of individuals with the 18 bp deletion in their GapC gene sequence

Population  Individual  Number of clones sequenced

88028       242         3
88028       G           3
88028       Q           1

91004       646         1
91004       624         3
91004       607         1
91004       663         1

Table 4. Locus-defining substitutions in relation to the consensus Amsinckia GapC sequence shown in Figure 1

Position 26 105 122 139 146 168 183 194 198 233 284 289 305 316 326 332 555* 567* 636* 840 897

Locus 1  T G G C A CC G G C AT A C T C C A G C
Locus 2  T GG CC C G G C G CA T TC C A G C
Locus 3  G T G - C T C G Cl C G T A C T C T Cl G T
Locus 4  T T C T A G C T C TT G T G C C AT G T C

*: coding position
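
The idea behind locus-defining substitutions, in which a cloned sequence is attributed to a putative locus from the bases it carries at a set of diagnostic positions, can be sketched as follows. This is a minimal illustration and not necessarily the procedure used in this thesis: the function name `assign_locus`, the locus names and the positions and bases in `toy_sites` are all hypothetical and are not the values of Table 4.

```python
def assign_locus(sequence, diagnostic_sites):
    """Return (best_locus, matches) by counting agreements at diagnostic sites.

    sequence: dict mapping alignment position -> base observed in one clone.
    diagnostic_sites: dict mapping locus name -> {position: expected base}.
    """
    scores = {}
    for locus, sites in diagnostic_sites.items():
        # Count the diagnostic positions where the clone carries the expected base.
        scores[locus] = sum(
            1 for pos, base in sites.items() if sequence.get(pos) == base
        )
    # Ties are resolved in favour of the first locus listed.
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical diagnostic table (NOT the values of Table 4).
toy_sites = {
    "locus A": {10: "T", 20: "G", 30: "C"},
    "locus B": {10: "T", 20: "T", 30: "A"},
    "locus C": {10: "G", 20: "T", 30: "C"},
}
# Hypothetical clone, described only by its bases at the diagnostic positions.
clone = {10: "G", 20: "T", 30: "C"}

result = assign_locus(clone, toy_sites)
print(result)
```

A clone that matches no locus at every site would still be given its closest match, so in practice the match count should be inspected alongside the assignment.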

Table 5. Linkage disequilibrium between putative locus-defining sites. D' values are above the diagonal; results from Fisher's exact test are below the diagonal. Note: only significant comparisons are shown.

Site 2 26 105 122 146 168 183 194 198 233 284
Site 1
26 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
105 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
122 *** *** 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
146 *** *** *** 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
168 *** *** *** *** 1.0000 1.0000 1.0000 1.0000 1.0000
183 *** *** *** *** 1.0000 1.0000 1.0000 1.0000
194 *** *** *** *** *** *** 1.0000 1.0000 1.0000
198 *** *** *** *** *** *** *** 1.0000 1.0000
233 *** *** *** *** *** *** *** *** 1.0000
284 *** *** *** *** *** *** *** *** ***
289 *** *** *** *** *** *** *** ***
305 *** *** *** *** *** *** *** ***
316 *** *** *** *** *** *** *** *** *** ***
326 *** *** *** *** *** *** *** ***
332 *** *** *** *** *** *** *** *** *** ***
555 *** *** *** *** *** *** *** *** *** ***
567 *** *** *** *** *** *** *** ***
636 *** *** *** *** *** *** *** ***
840 ** *** *** *** *** ** *** *** *** ***
897 *** *** *** *** *** *** *** *** ***
... continued

**: P < 0.01
***: P < 0.001

Table 5 (concluded).

Site 2 289 305 316 326 332 555 567 636 840 897
Site 1
26 1.0000 1.0000 1.0000 1.0000 1.0000
105 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
122 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.858 1.0000 1.0000
146 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.858 1.0000 1.0000
168 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.858 1.0000 1.0000
183 1.0000 1.0000 1.0000 0.771 1.0000
194 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.858 1.0000 1.0000
198 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.858 1.0000 1.0000
233 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.858 1.0000 1.0000
284 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.858 1.0000 1.0000
289 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
305 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
316 *** *** 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.0510
326 *** *** 1.0000 1.0000 1.0000 1.0000 1.0000
332 *** *** *** *** 1.0000 1.0000 1.0000 1.0000 0.0510
555 *** *** *** *** *** 1.0000 0.858 1.0000 0.0510
567 *** *** *** *** *** *** 1.0000 1.0000
636 *** *** *** *** *** *** *** 0.861
840 *** *** *** *** *** *** *** *** 1.0000
897 ** ** ** **

**: P < 0.01
***: P < 0.001

Table 6. Analysis of selective constraint in four putative GapC loci in Amsinckia spectabilis

Putative locus  Sample size  dN/dS ratio  lnL (dN/dS estimated)  lnL (dN/dS constrained to 1.0)  2ΔlnL

GapC-I          42           0.5985       -770.083368            -770.795435                     1.424134 NS
GapC-II         31           0.6834       -715.413757            -715.701576                     0.575638 NS
GapC-III        14           0.6513       -646.187398            -646.340105                     0.305414 NS
GapC-IV         93           0.8215       -1137.395321           -1137.728932                    0.667222 NS

NS: Not significant

Table 7. Analysis of selective constraint in four putative GapC loci in Amsinckia spectabilis (singletons removed)

Putative locus  Sample size  dN/dS ratio  lnL (dN/dS estimated)  lnL (dN/dS constrained to 1.0)  2ΔlnL

GapC-I          42           0.3629       -612.957205            -613.198249                     0.482088 NS
GapC-II         31           0.6546       -642.875284            -643.024386                     0.298204 NS
GapC-III        14           0.9803       -616.720158            -616.720297                     0.000278 NS
GapC-IV         93           0.6740       -801.710735            -802.197999                     0.974528 NS

NS: Not significant

Table 8. Analysis of selective constraint for each putative A. spectabilis GapC locus analyzed in combination with solanaceous GapC coding sequence data

                dN/dS estimated separately                        dN/dS constrained to 1.0 along branch leading to Amsinckia sequences
Putative locus  dN/dS Amsinckia  dN/dS Solanaceae  lnL            dN/dS Amsinckia  dN/dS Solanaceae  lnL            2ΔlnL

GapC-I          0.1042           0.0620            -1245.415309   1.0000           0.0474            -1254.744638   18.658658***
GapC-II         0.1042           0.0620            -1245.415309   1.0000           0.0474            -1254.744638   18.658658***
GapC-III        0.0997           0.0622            -1246.793399   1.0000           0.0471            -1257.080880   20.574962***
GapC-IV         0.0987           0.0620            -1247.447195   1.0000           0.0467            -1257.847208   20.800026***

***: P < 0.001
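
The 2ΔlnL column in Tables 6 to 8 is twice the difference between the two log-likelihoods reported in each row. A minimal Python sketch of that arithmetic follows. It assumes the statistic is compared with a chi-square distribution with one degree of freedom, which is an assumption made here because the degrees of freedom are not stated in these tables; the function name `lrt_one_df` is illustrative only.

```python
import math


def lrt_one_df(lnl_free, lnl_constrained):
    """Return (2 * delta lnL, p-value) for a likelihood-ratio test with 1 df.

    lnl_free: log-likelihood with dN/dS estimated.
    lnl_constrained: log-likelihood with dN/dS constrained to 1.0.
    """
    stat = 2.0 * (lnl_free - lnl_constrained)
    # With 1 degree of freedom (assumed), the chi-square upper tail
    # probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Log-likelihoods copied from Table 6 (GapC-I) and Table 8 (GapC-III).
stat_t6, p_t6 = lrt_one_df(-770.083368, -770.795435)
stat_t8, p_t8 = lrt_one_df(-1246.793399, -1257.080880)

print(f"Table 6, GapC-I:   2dlnL = {stat_t6:.6f}, p = {p_t6:.3f}")
print(f"Table 8, GapC-III: 2dlnL = {stat_t8:.6f}, p = {p_t8:.2e}")
```

Under that assumption, the GapC-I comparison from Table 6 gives a p-value above 0.05 and the GapC-III comparison from Table 8 gives one below 0.001, in line with the NS and *** labels in the tables.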

Table 9. Functional consequences of amino acid substitutions in haplotypes of Amsinckia spectabilis at GapC loci

Table 10. Nucleotide variation at four putative GapC loci in Amsinckia spectabilis varieties that differ in mating systems

                      Sample  All sites                           Silent sites
                      size    π ± SD            θ ± SD^a          π ± SD            θ ± SD^a

GapC-I
  spectabilis         15      0.0034 ± 0.00318  0.0063 ± 0.00267  0.0046 ± 0.00425  0.0084 ± 0.00371
  microcarpa          27      0.0042 ± 0.00584  0.0119 ± 0.00415  0.0067 ± 0.00916  0.0187 ± 0.00658

GapC-II
  spectabilis         4       0.0038 ± 0.00256  0.0039 ± 0.00250  0.0035 ± 0.00224  0.0034 ± 0.00248
  microcarpa          26      0.0042 ± 0.00479  0.0098 ± 0.00348  0.0054 ± 0.00569  0.0116 ± 0.00435

GapC-III
  spectabilis         4       0.0039 ± 0.00260  0.0040 ± 0.00254  0.0058 ± 0.00385  0.0059 ± 0.00387
  microcarpa          10      0.0064 ± 0.00409  0.0101 ± 0.00647  0.0077 ± 0.00353  0.0122 ± 0.00566

GapC-IV
  spectabilis         53      0.0034 ± 0.00800  0.0166 ± 0.00494  0.0038 ± 0.00899  0.0187 ± 0.00581
  microcarpa          40      0.0049 ± 0.00899  0.0186 ± 0.00576  0.0076 ± 0.01377  0.0285 ± 0.00891

SD: standard deviation
a: with no recombination

Table 11. Nucleotide variation at four putative GapC loci in Amsinckia spectabilis varieties that differ in mating systems (singletons removed)

                      Sample  All sites                           Silent sites
                      size    π ± SD            θ ± SD^a          π ± SD            θ ± SD^a

GapC-I
  spectabilis         15      0.0015 ± 0.00094  0.0018 ± 0.00102  0.0011 ± 0.00057  0.0011 ± 0.00073
  microcarpa          27      0.0014 ± 0.00122  0.0025 ± 0.00111  0.0020 ± 0.00156  0.0032 ± 0.00151

GapC-II
  spectabilis         4       0.0020 ± 0.00128  0.0020 ± 0.00142  0.0014 ± 0.00075  0.0011 ± 0.00114
  microcarpa          26      0.0024 ± 0.00185  0.0038 ± 0.00157  0.0036 ± 0.00271  0.0055 ± 0.00240

GapC-III
  spectabilis         4       0.0026 ± 0.00173  0.0027 ± 0.00181  0.0047 ± 0.00308  0.0047 ± 0.00322
  microcarpa          10      0.0046 ± 0.00250  0.0047 ± 0.00231  0.0075 ± 0.00403  0.0076 ± 0.00378

GapC-IV
  spectabilis         53      0.0014 ± 0.00232  0.0048 ± 0.00171  0.0014 ± 0.00225  0.0047 ± 0.00190
  microcarpa          40      0.0014 ± 0.00095  0.0020 ± 0.00090  0.0018 ± 0.00118  0.0024 ± 0.00126

SD: standard deviation
a: with no recombination

Table 12. Nucleotide variation at GapC-I and GapC-IV loci within Amsinckia spectabilis populations that differ in mating systems

                      Sample  All sites                           Silent sites
                      size    π ± SD            θ ± SD^a          π ± SD            θ ± SD^a

GapC-I
  spectabilis
    Population 91010  6       0.0020 ± 0.00151  0.0026 ± 0.00160  0.0021 ± 0.00159  0.0027 ± 0.00190
    Population 91011  3       0.0040 ± 0.00297  0.0040 ± 0.00278  0.0070 ± 0.00519  0.0070 ± 0.00485
    Population 88028  6       0.0034 ± 0.00242  0.0042 ± 0.00230  0.0053 ± 0.00369  0.0064 ± 0.00368
  microcarpa
    Population 91003  10      0.0033 ± 0.00291  0.0055 ± 0.00262  0.0049 ± 0.00431  0.0082 ± 0.00398
    Population 91004  11      0.0042 ± 0.00343  0.0066 ± 0.00298  0.0070 ± 0.00563  0.0108 ± 0.00493
    Population 91005  6       0.0046 ± 0.00332  0.0058 ± 0.00310  0.0074 ± 0.00528  0.0091 ± 0.00498

GapC-IV
  spectabilis
    Population 91010  18      0.0035 ± 0.00388  0.0078 ± 0.00309  0.0040 ± 0.00401  0.0080 ± 0.00345
    Population 91011  20      0.0031 ± 0.00387  0.0078 ± 0.00302  0.0041 ± 0.00554  0.0111 ± 0.00444
    Population 88028  15      0.0033 ± 0.00353  0.0070 ± 0.00292  0.0052 ± 0.00550  0.0108 ± 0.00461
  microcarpa
    Population 91003  15      0.0041 ± 0.00428  0.0084 ± 0.00346  0.0055 ± 0.00616  0.0121 ± 0.00509
    Population 91004  12      0.0043 ± 0.00388  0.0075 ± 0.00327  0.0062 ± 0.00569  0.0110 ± 0.00490
    Population 91005  13      0.0048 ± 0.00473  0.0092 ± 0.00386  0.0078 ± 0.00756  0.0147 ± 0.00623

SD: standard deviation
a: with no recombination

Table 13. Nucleotide variation at GapC-I and GapC-IV loci within Amsinckia spectabilis populations that differ in mating systems (singletons removed)

                      Sample  All sites                           Silent sites
                      size    π ± SD            θ ± SD^a          π ± SD            θ ± SD^a

GapC-I
  spectabilis
    Population 91010  6       0.0004 ± 0.00030  0.0005 ± 0.00052  0.0004 ± 0.00031  0.0005 ± 0.00054
    Population 91011  3       0.0008 ± 0.00059  0.0008 ± 0.00080  0.0008 ± 0.00061  0.0008 ± 0.00081
    Population 88028  6       0.0018 ± 0.00121  0.0021 ± 0.00135  0.0019 ± 0.00124  0.0021 ± 0.00138
  microcarpa
    Population 91003  10      0.0011 ± 0.00090  0.0017 ± 0.00103  0.0012 ± 0.00092  0.0017 ± 0.00106
    Population 91004  11      0.0024 ± 0.00171  0.0033 ± 0.00167  0.0025 ± 0.00175  0.0033 ± 0.00170
    Population 91005  6       0.0006 ± 0.00030  0.0005 ± 0.00052  0.0007 ± 0.00031  0.0005 ± 0.00053

GapC-IV
  spectabilis
    Population 91010  18      0.0015 ± 0.00124  0.0025 ± 0.00122  0.0019 ± 0.00124  0.0025 ± 0.00144
    Population 91011  20      0.0013 ± 0.00135  0.0027 ± 0.00128  0.0014 ± 0.00175  0.0035 ± 0.00180
    Population 88028  15      0.0012 ± 0.00112  0.0022 ± 0.00115  0.0019 ± 0.00162  0.0032 ± 0.00176
  microcarpa
    Population 91003  15      0.0016 ± 0.00130  0.0026 ± 0.00129  0.0028 ± 0.00228  0.0045 ± 0.00226
    Population 91004  12      0.0021 ± 0.00163  0.0032 ± 0.00159  0.0031 ± 0.00249  0.0048 ± 0.00249
    Population 91005  13      0.0023 ± 0.00197  0.0038 ± 0.00182  0.0040 ± 0.00345  0.0067 ± 0.00319

SD: standard deviation
a: with no recombination

Figure 1. Alignment of the partial GapC gene sequence of Arabidopsis thaliana and Amsinckia spectabilis. Light gray shading indicates non-coding nucleotides of A. thaliana. Exons are numbered according to annotations for A. thaliana. Nucleotides shown on the black background are identical in both organisms. The Amsinckia sequence represents the consensus sequence for this species.

[Figure 1: alignment of the Arabidopsis and Amsinckia partial GapC sequences, with exons 6 to 9 labelled.]

Figure 2. Graphical representation of intron/exon structure in a partial GapC gene sequence of Arabidopsis thaliana and Amsinckia spectabilis. Thick lines depict exons, and thin lines, introns. Black dashed lines indicate correspondence between Arabidopsis thaliana and Amsinckia spectabilis exons. White dashed lines indicate exon boundaries in Amsinckia spectabilis as inferred by comparison with Arabidopsis thaliana.

[Figure 2: schematic of exon and intron structure.]

Figure 3. Neighbor-joining tree of GapC haplotypes in Amsinckia spectabilis. Position of locus-defining substitutions and direction of change are boxed. Scale bar represents 0.01 nucleotide substitutions per site.

[Figure 3: neighbor-joining tree with clades labelled Putative locus 1 to 4; scale bar 0.01.]

Figure 4. Haplotypes produced by single individuals in Amsinckia spectabilis populations. Circled numbers correspond to previously defined haplotypes produced by each individual. The neighbor-joining tree is as shown in Figure 3.

[Figure 4: four panels, one per individual: G (population 88028), 242 (population 88028), 1003 (population 91005) and 1027 (population 91005); each panel shows the tree with clades labelled Putative locus 1 to 4.]

Figure 5. Representative result of PCR-RFLP analysis, depicted here for population 28. A: Restriction digestion by AccI of a fragment of GapC-IV generated by PCR. B: Restriction digestion by EarI of fragments of GapC-I and II generated by PCR. Lane M is a 100 bp DNA ladder; lanes 1 through 7 in A and B correspond to PCR-RFLP results from the same individuals. Three bands are seen in all lanes: the largest are undigested PCR products; the two others are restriction digestion products of the amplicon (in both A and B).

[Figure 5: gel images, panels A and B, each with lanes M and 1 to 7.]

Figure 6. Results from a Southern hybridization between a GapC fragment and Amsinckia spectabilis genomic DNA digested with HhaI (lane 1) and HindIII (lane 2). A digoxigenin-labeled marker is found in lane M. Black arrows point to bands in lane 1, white arrows point to bands in lane 2.

[Figure 6: Southern blot image; marker sizes 2027, 1904, 1584, 1375 and 947.]

Figure 7. ML tree of GapC sequences from Amsinckia spectabilis and the solanaceous species tobacco, tomato, petunia and potato. Scale bar represents 0.1 nucleotide substitutions per site.

[Figure 7: tree with tips Tomato, Potato, Tobacco, AsGapC-IV, AsGapC-III, AsGapC-I, AsGapC-II and Petunia; scale bar 0.1.]

Appendix A. Correspondence between haplotype, sequence and putative GapC locus

Haplotype  Sequence    Putative locus

1          a46071      1
1          a510093     1
1          a510775     1
1          c510275     1
2          a107052
3          a31581
4          a31621      1
4          a282133     1
4          a282424     1
5          a118281     1
5          a28G1       1
6          a31532      1
6          a46022b     1
6          a118033     1
6          a118091     1
6          a282304     1
7          a282374
8          a31293
9          a46044
9          a46221
10         a107353
11         a46612
11         a46684
12         a31022      1
12         a31253      1
12         a31262      1
12         a31575      1
12         a31662      1
12         a31773      1
12         a46153      1
12         a46195      1
12         a46242      1
12         a46255b     1
12         a46263      1
12         a510021     1
12         a510031     1
12         a510193     1
12         a107014b    1
12         a107122     1
12         a107241b    1
12         a107321     1
12         a282281     1
13         a31214      2
13         a31414      2
14         a46032      2
14         a46693      2
15         a46514      2
16         a46713      2
17         a46315      2
17         a510253     2
17         a510325     2
18         a510502b    2
18         a282194     2
19         a31693      2
19         a118124b    2
20         a510053     2
20         a510151     2
20         a510152     2
20         a510214     2
20         a510241     2
20         a510342     2

Bold: population. Default: individual. Italics: clone.

Appendix A (continued)

Haplotype  Sequence    Putative locus

20         a282262     2
21         a510032     2
21         a510042     2
21         a510072     2
21         a510101     2
21         a510125     2
21         a510172     2
21         a510271     2
21         a510332     2
21         a510352     2
21         a510362     2
21         a107073     2
22         a282421     3
           a28G2       3
23         a28Q1       3
24         a46463      3
25         a46072      3
25         a46241      3
25         a46632b     3
26         a46215      3
26         a510103b    3
27         a510033     3
27         b510271     3
28         a46182      3
28         a510071     3
28         a282143     3
29         a3178c1     4
30         a31815      4
31         a51021c3    4
32         a31144      4
32         a118504     4
33         a28236c1    4
34         a3125c3     4
35         a107192b    4
35         a118041     4
35         a118082     4
35         a118213     4
35         a118253b    4
35         a118322     4
35         a282414     4
36         a3121c1     4
37         a107102b    4
38         a107095     4
39         a4646c3     4
39         a51018c5    4
40         a3177c5     4
41         a118181     4
42         a107365     4
43         a51003c1    4
44         a510112     4
45         a118513     4
46         a118201     4
47         a118152     4
48         a118301     4
49         a4647c2     4
50         a4650c2     4
50         a107033     4

Appendix A (concluded)

Haplotype  Sequence    Putative locus

51         a4604c1     4
52         a4603c1     4
53         a28G4       4
54         a3102c2     4
54         a3126c5     4
54         a3129c2     4
54         a4607c1     4
54         a4615c1     4
54         a4624c3     4
54         a51023c1    4
54         a118104     4
54         a28Gc1      4
55         a51024c3    4
56         a51006c1    4
57         a118163     4
58         a51007c3    4
59         a28244c1    4
60         a51015c2    4
61         a107113     4
62         a107141     4
62         a107171     4
63         a3154c1     4
64         a4661c2     4
65         a510204     4
65         a107151     4
65         a107205     4
66         a31462      4
67         a4626c2     4
67         a107264     4
68         a118093     4
69         a28213c1    4
70         a51010c1    4
71         a118025     4
72         a3106c5     4
72         a3118c2     4
72         a3162c3     4
72         a31651      4
72         a46053b     4
72         a46303      4
72         a510011     4
72         a51033c1    4
72         a107042     4
72         a107081     4
72         a107163     4
72         a107224     4
72         a107291     4
72         a107331b    4
72         a118011     4
72         a118061b    4
72         a118143     4
72         a118224     4
72         a118244     4
72         a282072     4
72         a28210c1    4
72         a28214c2    4
72         a282161     4
72         a282205     4
72         a282245     4
72         a282263     4
72         a282345     4
72         a28242c4    4