Copyright © 2004 by the Genetics Society of America

Population Genetics of the Wild Yeast Saccharomyces paradoxus

Louise J. Johnson,*,1 Vassiliki Koufopanou,* Matthew R. Goddard,† Richard Hetherington,* Stefanie M. Schäfer*,2 and Austin Burt*

*Department of Biological Sciences and †NERC Centre for Population Biology, Imperial College at Silwood Park, Ascot SL5 7PY, United Kingdom

Manuscript received November 4, 2002
Accepted for publication September 22, 2003

ABSTRACT

Saccharomyces paradoxus is the closest known relative of the well-known S. cerevisiae and an attractive model organism for population genetic and genomic studies. Here we characterize a set of 28 wild isolates from a 10-km² sampling area in southern England. All 28 isolates are homothallic (capable of mating-type switching) and wild type with respect to nutrient requirements. Nine wild isolates and two lab strains of S. paradoxus were surveyed for sequence variation at six loci totaling 7 kb, and all 28 wild isolates were then genotyped at seven polymorphic loci. These data were used to calculate nucleotide diversity and number of segregating sites in S. paradoxus and to investigate geographic differentiation, population structure, and linkage disequilibrium. Synonymous site diversity is ~0.3%. Extensive incompatibilities between gene genealogies indicate frequent recombination between unlinked loci, but there is no evidence of recombination within genes. Some localized clonal growth is apparent. The frequency of outcrossing relative to inbreeding is estimated at 1.1% on the basis of heterozygosity. Thus, all three modes of reproduction known in the lab (clonal replication, inbreeding, and outcrossing) have been important in molding genetic variation in this species.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AJ515177–AJ515216, AJ515322–AJ515352, and AJ515430–AJ515449.

1Corresponding author: Institute of Genetics, University of Nottingham, Queens Medical Centre, Nottingham NG7 2UH, United Kingdom. E-mail: [email protected]

2Present address: Department of Infectious Disease Epidemiology, Imperial College, London W2 1PG, United Kingdom.

MANY fields in biology have progressed by the concentrated study of a select group of model systems. In population and evolutionary genetics, only a few species such as Drosophila and humans have been widely adopted, and it might make sense to consider what other taxa might best complement these. The yeast Saccharomyces cerevisiae has a number of characteristics that would seem to make it ideal (Zeyl 2000): (i) It is already a well-studied model system in biochemistry, cell biology, classical genetics, and molecular biology; (ii) genomes can be precisely altered by homologous recombination; and (iii) long-term experiments with large population sizes and sensitive fitness assays are readily possible in the laboratory. These features suggest that one may be more likely to be able to investigate and interpret the functional significance of natural DNA sequence variation in this species than in any other eukaryote. Moreover, it has a relatively small and gene-rich genome, reducing the size of the problem to be solved.

However, there is a problem: S. cerevisiae has long been associated with humans, and in collecting strains it is difficult to determine to what extent they are escaped domestics or otherwise greatly affected by human activity (Vaughan-Martini and Martini 1995; Naumov et al. 1992a). This could greatly affect their population genetics, severely complicating interpretations and reducing the extent to which lessons learned with this species are likely to be widely applicable. For example, one survey of S. cerevisiae in wineries revealed some surprising findings, including 31% of strains heterozygous for a lethal mutation and 23% heterozygous or homozygous for heterothallism, i.e., an inability to undergo mating-type switching (Mortimer 2000). The association between Drosophila and humans has posed similar problems (Andolfatto and Przeworski 2000; Wall et al. 2002).

One way to circumvent this problem would be to study a close relative that has the same advantages, but not the disadvantage. S. paradoxus is (along with S. cariocanus) the closest known relative of S. cerevisiae (Goddard and Burt 1999). The two species appear to be biochemically indistinguishable (Barnett et al. 1990), have the same chromosome number, and appear to be largely syntenic (Naumov et al. 1992b). Growth preferences in the lab are the same as for S. cerevisiae, and genetic engineering by the same homologous gene replacement methods used in S. cerevisiae is possible (E. Louis, personal communication). Thus, many of the advantages still apply. Moreover, it has been isolated from many natural locations worldwide (e.g., Sniegowski et al. 2002) and apparently has not been widely domesticated. Gene flow between S. cerevisiae and S. paradoxus is also unlikely; hybrids can be formed, but

Genetics 166: 43–52 (January 2004)

are almost completely sterile (Naumov et al. 1997a). Overall DNA sequence divergence between the two species is thought to be ~20% (Herbert et al. 1988), and synonymous site divergence at the loci studied here is ~30%.

In the laboratory, the life cycle of S. paradoxus is the same as that of S. cerevisiae (Herskowitz 1988). It normally reproduces mitotically as a diploid, but when starved of nitrogen undergoes meiosis and produces four haploid spores encapsulated in an ascus. There are two mating types, and the spores usually mate within the ascus upon germination, but if this does not happen, they are able to reproduce mitotically as haploids. Haploid cells are constitutively ready to mate and can outcross. However, haploid mitoses are associated with a sophisticated mechanism of mating-type switching, with the result that cells can also mate with their clonemates, producing an entirely homozygous diploid ("autodiploidization"). Thus, S. paradoxus may undergo two types of self-fertilization: intra-ascus mating and autodiploidization. For a review of ascomycete mating systems, see Nelson (1996).

In this article we describe a preliminary investigation into the genetics of a single population of S. paradoxus, focusing on quantifying levels of nucleotide variation and analyzing the pattern of variation to infer mating system (and, to a lesser extent, dispersal).

MATERIALS AND METHODS

Collections: S. paradoxus was isolated from the bark of oak trees (Quercus, mainly Quercus robur; Naumov et al. 1998) in Silwood Park and Windsor Great Park. Bark scrapings (~1 g) were collected from 86 oak trees on each of two dates, with two scrapings on opposite sides of the tree on each date. Scrapings were aseptically transferred to acidified malt medium [5% malt extract (Sigma, Dorset, UK), 0.4% lactic acid (Sigma) w/v] in loosely capped vials and shaken for 2 days at 30°. Many types of microbe were present in the medium so a selection procedure was incorporated to isolate S. paradoxus. Dilutions of the 48-hr culture were plated on acidified malt and incubated for 24 hr at 30°. The resulting colony-forming units were visually inspected and colonies looking like S. paradoxus were picked, placed on YPD [1% yeast extract (Merck, Dorset, UK), 2% peptone (Merck), 2% glucose (BDH, Leicestershire, UK)], and then subsamples were tested for their ability to form tetrads when placed upon nitrogen-starving medium (2% potassium acetate; BDH). Heterozygosity was maintained in the original samples because they were not stimulated to sporulate. For those that formed tetrads, the internal transcribed spacer region (ITS1-5.8rRNA-ITS2) was amplified using primers ITS1 and ITS4 (White et al. 1990) and then visualized via electrophoresis through 1% agarose. ITS amplicons of roughly the correct size were sequenced (with an ABI 373) and compared to the ITS sequences from the S. paradoxus (CBS 432) and S. cerevisiae type strains. Three types of sequence were recovered. Two of these were largely unalignable to the Saccharomyces sequences and were identified as Hanseniaspora osmophila (CBS 313) and Torulaspora delbrueckii (CBS 404), using BLAST (Altschul et al. 1990). All sequences in the third category were very similar to the S. paradoxus sequence and were included in our sample. Our procedure therefore allowed the isolation of both S. paradoxus and S. cerevisiae strains with substantial variability within each species. The initial collection of 344 bark scrapings yielded 28 isolates.

Other strains: The Centraalbureau voor Schimmelcultures (CBS) supplied CBS 432, the type strain of S. paradoxus, and the Danish lab strain CBS 5829, here referred to as "Type" and "Danish," respectively. Two S. paradoxus isolates from the Russian Far East (FE), CBS 8436 and CBS 8444, were included for comparison. These isolates differ from European S. paradoxus at allozyme loci (Naumov et al. 1997b) and show ~5% synonymous site divergence from the type strain of S. paradoxus at the six sequenced loci. These strains, referred to herein as FE1 and FE2, respectively, were kindly provided by Edward Louis. All S. cerevisiae sequence data were from the Yeast Genome Project (Goffeau et al. 1996).

Phenotypic assays: To isolate individual spores for phenotypic assays, all wild isolates were grown on sporulation medium for 4 days, and resultant asci were enzymatically digested (10 min in a 50-µl solution of 10 mg/ml sulfatase, 10 mg/ml lyticase at 25°). Individual spores were removed with a Zeiss micromanipulator and incubated at 25° for 4 days on YPD agar to allow colony growth. Colonies were replica plated to minimal and sporulation media and after 3 days examined for growth or surveyed by microscopy for the presence of tetrads. The presence of tetrads was considered indicative of mating-type switching. All media were made according to Sherman (1991).

Molecular methods: Nine wild isolates were chosen randomly for an initial survey of sequence variation. Total DNA was extracted (Sherman 1991) and diluted 100-fold for use as a PCR template. Six genes involved in mate recognition were amplified from the nine wild isolates and from the Type strain, Danish, FE1, and FE2 isolates. Details of genes and primers are given in Table 1. All 28 wild isolates were then genotyped at polymorphic sites by restriction at the MFA1 and AGA2 loci, using enzymes Tsp45I and AseI, respectively, and by sequencing fragments of MFα1, SAG1, STE2, and STE3.

Microsatellite locus: Twenty S. cerevisiae microsatellite primer pairs (Field and Wills 1998) were tested on S. paradoxus. Of these only 3 gave a PCR product with S. paradoxus, and 1 was found to be polymorphic, a variable-length repeat in the TFA1 gene (chromosome XI in S. cerevisiae). The wild isolates were genotyped at this locus by polyacrylamide gel electrophoresis of radioactively end-labeled PCR products (Sambrook et al. 1989). A representative of each mobility group was sequenced to determine the length of each allele.

Statistical analysis and software used: Nucleotide diversity π at synonymous and nonsynonymous sites, and synonymous site divergence, were calculated using DnaSP (Rozas and Rozas 1999; available at http://www.ub.es/dnasp/). Parsimony analysis of gene trees and comparisons among them by the partition homogeneity test (Farris et al. 1994) were performed using PAUP (Swofford 2002). To test for deviations from neutrality, we compared the variance of branch lengths on the genealogy to that from 1000 random genealogies with the same total branch length, constructed using N. Barton's genealogies package (available at http://helios.bto.ed.ac.uk/evolgen/barton/index.html) for Mathematica (Wolfram Research 1999). Tests for overrepresentation of genotypes and linkage disequilibrium were performed using MultiLocus (Agapow and Burt 2001; available at http://www.bio.ic.ac.uk/evolve/software/multilocus/index.html). The correlation between genetic and geographical distance across all pairs of isolates was tested by randomization, in Mathematica.

TABLE 1
Primer sequences

Gene and primers, 5′–3′, forward first

MFA1 (YDR461w): a-pheromone, chromosome 4
  MFL-5px:      CTG TTG CTC GGA TAA AAT CAA G
  MFL-6px:      GGA TAA CAG TAA CAG CGC TAA G

MFα1 (YPL187w): α-pheromone, chromosome 16
  sMFG1-U:      AAA GCA ACA ACA GGT TTT GG
  sMFG1-L:      CAA ATT GAA ATA TGG CAG GC
  MFAL-SEQF*:   TTT TAA TAC ACA CAA ATA AAT TAT CC
  MFAL-SEQR*:   TGA GAA AGT TGA TTT TGT TAC GC

STE2 (YFL026w): α-pheromone receptor, chromosome 6
  STE2-142F:    ACT GTT ACT CAG GCT ATT ATG TTC G
  STE2-1539R:   TAA TCC AAT GAA AAA AAA TCA CTG C
  STE2-497F*:   TGA CAT CAA TAT CTT TCA CTT TCA CTT TAG G
  STE2-1078F*:  TCA GAA AGA ACT TTT GTT GCT GAG G
  STE2-1148R*:  CCT TGT ATT TTT TGA ACT CGT GG
  STE2-235R*:   AAA CTT GGT TGA TAA TGA AAA TTG G

STE3 (YKL178c): a-pheromone receptor, chromosome 11
  STE3-F3:      TGG ACA CAT TCA TTA CCT ACC ACG
  STE3-ENDR:    TTT CTG AAC TAA GCT CAT TTG AAC
  STE3-530R*:   GAA AAC GAA CAG CAC CAA GG
  STE3-989F*:   AGG ATT TAC AGC AGG TGG ATG G
  STE3-997R*:   TTT CAG AAT CGG TAG AGA ATG G

AGA2 (YGL032c): a-agglutinin subunit, chromosome 7
  AGA2-7PX:     CTT TTG TTG TTC GGG CAT TTC C
  AGA2-8PX:     GTT GGC TAT TAT GAT AGT CCA TCC

SAG1 (YJR004c): α-agglutinin, chromosome 10
  SAG1-78F:     GCT ATG TGA ACC AAA AAA AGA TAC C
  SAG1-2005R:   GCC TGA TGT TGA AGA ATA ATA TGC
  SAG1-411R*:   GTT TTT TGC GAT GAA TCT GAC AGC
  SAG1-711F*:   AAT GTC TGA TGT GGT GAA TTT CG
  SAG1-1317F*:  GTC GGA AGT AAT CAG TCA TGT GG
  SAG1-1503R*:  GAT GTT GAA GTC ACA ATA GGT ACG

* Internal sequencing primers.

RESULTS

Isolations: S. paradoxus was isolated from 28 of 344 bark scrapings, a success rate of 8%. There was no obvious difference in success rate between large and small trees or samples with different aspect. From the 4 bark scrapings taken per tree (2 on each of two dates), 63 trees produced no isolates, 18 produced one isolate, and 5 produced two isolates. No S. cerevisiae strains were recovered although they were not excluded by our procedure.

Phenotypic variation: All 28 wild isolates were induced to undergo meiosis, and the four haploid spores were dissected from the asci. The resultant colonies were all capable of growth on minimal medium, demonstrating that none of the 28 strains carried an auxotrophic mutation. The frequency of auxotrophic mutants is thus 0, with a 2-unit upper support limit of 0.069. In S. cerevisiae ~60 genes can mutate to auxotrophy, as estimated by counting gene names denoting amino acid auxotrophy in the yeast genome (Goffeau et al. 1996). The spontaneous mutation rate in the lab is ~10⁻⁸/locus/mitotic generation (Drake 1991; Zeyl and Devisser 2001). If the same values apply to S. paradoxus in nature, and the population is at mutation selection balance (i.e., the frequency of deleterious mutants is equal to q = u/s, where u is the mutation rate and s is the selection coefficient), the minimum harmonic mean selection coefficient against auxotrophic mutants necessary to keep them at the observed frequency is ~60 × 10⁻⁸/0.069 ≈ 10⁻⁵. Thus, even very small selection coefficients would be sufficient to keep the mutants at the observed low frequency.

All colonies grown from haploid spores were also capable of forming tetrads on sporulation medium, indicating that they had autodiploidized following mating-type switching (i.e., were homothallic). In S. cerevisiae it appears that there is only one locus that can mutate to give a heterothallic phenotype (HO); making the same calculations as above indicates that the minimum selection coefficient against such mutants in the wild is ~10⁻⁷.

Molecular data set 1: DNA sequences from nine isolates: The initial survey of molecular variation involved sequencing six loci from nine wild isolates plus the Type, Danish, FE1, and FE2 isolates. Sequence variation was discovered at each of the six loci, and there were a total of 24 polymorphic sites and one polymorphic repeat in ~7000 bp of sequence from nine isolates (see Table 2). None of the isolates was heterozygous at any of these polymorphic sites. Three isolates (T8.1, T21.4, and T32.1) had identical genotypes; subsequent analysis (described below for data set 2) suggests that they are part

TABLE 2
Polymorphisms

[Table body not recoverable from the source text. Rows: Type, Danish, Q70.8, Q59.1, Q32.3, Q4.1, T62.1, T32.1, T21.4, T8.1, T76.6, FE1, FE2, and S. c. Columns: polymorphic base positions in MFα1 (including column R), MFA1, STE3, STE2, SAG1, and AGA2.]

Nucleotide polymorphisms found in the initial survey of nine wild isolates and the Type and Danish strains are shown. Also shown, for comparison, are the nucleotides found in the Far Eastern strains and in S. cerevisiae (S. c). Dots indicate identity to Type. Bases are numbered from the start of the coding sequence; negative numbers indicate upstream positions. Column R shows number of repeats of the MFα1 pheromone unit. Noncoding regions are shown in italic type.
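The arithmetic behind the selection-coefficient bounds in the Phenotypic variation results can be checked with a short sketch. It assumes, because the text does not spell out the calculation, that the 2-unit support limit comes from a binomial likelihood for observing zero mutants among 28 isolates, and it applies q = u/s with the mutation rate summed over loci. The function names are illustrative only and are not taken from the article.

```python
import math


def upper_support_limit(n_sampled, log_units=2.0):
    """Upper support limit on a frequency q when 0 of n_sampled carry the trait.

    Assumption: binomial sampling, so the log-likelihood is n * ln(1 - q),
    which is 0 at q = 0. The limit is the q at which it has fallen by
    `log_units` natural-log units.
    """
    return 1.0 - math.exp(-log_units / n_sampled)


def min_selection_coefficient(n_loci, mutation_rate_per_locus, q_upper):
    """Smallest s compatible with q <= q_upper at mutation-selection balance (q = u/s)."""
    total_mutation_rate = n_loci * mutation_rate_per_locus
    return total_mutation_rate / q_upper


q_max = upper_support_limit(28)                               # about 0.069
s_auxotrophy = min_selection_coefficient(60, 1e-8, q_max)     # order 1e-5
s_heterothallism = min_selection_coefficient(1, 1e-8, q_max)  # order 1e-7

print(f"q_max = {q_max:.3f}")
print(f"s (auxotrophy, ~60 loci) = {s_auxotrophy:.1e}")
print(f"s (heterothallism, 1 locus) = {s_heterothallism:.1e}")
```

Under these assumptions the sketch gives an upper limit of about 0.069 and bounds of the same order as those quoted in the text (10⁻⁵ and 10⁻⁷).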

TABLE 3
Estimates of nucleotide diversity in S. paradoxus wild isolates

                  Coding sequence                Noncoding sequence
Gene    Strains   bp     πa × 10³   πs × 10³     bp     π × 10³
MFA1    9         111    0          23.31        463    1.2
MFα1    9         534    0          6.94         150    2.6
STE2    9         1296   0          2.16         214    1.8
STE3    9         1413   0.21       1.42         450    2.5
AGA2    9         264    0          0            221    1.8
SAG1    8         1956   0          3.96         158    3.4
Total             5534   0.07       3.53         1656   1.7

Average pairwise diversity per nucleotide site at synonymous (πs) and nonsynonymous (πa) sites of coding regions and of adjacent noncoding sequence is shown. Noncoding regions considered are upstream of MFα1 and SAG1; downstream of MFA1, STE2, and STE3; and 91 bp upstream + 130 bp downstream of AGA2. The STE2 sequence does not include the first 200 bp.
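The quantity tabulated above, average pairwise diversity per nucleotide site, can be illustrated with a minimal sketch. It computes π over all sites of a toy alignment; it does not separate synonymous from nonsynonymous sites as DnaSP does, and the sequences are hypothetical, not data from this study.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site across all pairs of sequences."""
    n_sites = len(seqs[0])
    if any(len(s) != n_sites for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    return total_differences / (len(pairs) * n_sites)


# Hypothetical 10-bp alignment of four isolates (illustration only).
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGTTCGTAT",
]
pi = nucleotide_diversity(toy_alignment)
print(f"pi = {pi:.4f} per site, or {pi * 1e3:.1f} x 10^-3")
```

In the toy alignment the six pairs differ at 7 sites in total, so π = 7/60 per site.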

of a single clone. No other pair of isolates had identical genotypes. Table 3 shows the average pairwise diversity per nucleotide site of these six genes in wild isolates. Only one amino acid polymorphism is seen among the nine wild isolates; the nonsynonymous nucleotide diversity at these loci is low (~0.01%), comparable to that found in humans (Li and Sadler 1991). By contrast, the synonymous and noncoding nucleotide diversity is relatively high (~0.3%), comparable to that found in Drosophila melanogaster (Begun and Aquadro 1992), although this is still far lower than the diversity of ~5% seen between sympatric isolates of Escherichia coli (Hall and Sharp 1992). These results indicate that the six genes are under purifying selection in S. paradoxus.

Gene trees for each locus, rooted using the Far Eastern isolates and S. cerevisiae, are shown in Figure 1. The data fit these trees perfectly, i.e., their consistency index is 1 (Farris 1989): There is no homoplasy within the European data. Far Eastern and European isolates, however, share a polymorphism in MFα1 pheromone repeat number. There are fixed differences between Far Eastern and European MFα1 sequences at other sites, so this homoplasy must have been created either by recombination between alleles from the Far East and

Figure 1. Gene trees for 11 European S. paradoxus isolates at six loci. Identical sequences at each locus are grouped together, and branch lengths are labeled with number of base changes. T21.4 and T32.1 are in all cases identical to T8.1 and have been omitted. Arrows indicate ancestral state as indicated by the Far Eastern isolates.
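The consistency index used to summarize how well the data fit a tree (Farris 1989) is the ratio of the minimum possible number of steps to the number of steps on the shortest tree found. A minimal sketch follows; the function name is illustrative, the single-locus step counts are hypothetical, and the combined-data figures (30 steps, 7 more than the minimum) are those reported in the Results.

```python
def consistency_index(min_steps, observed_steps):
    """Consistency index: minimum possible steps divided by steps on the tree.

    A value of 1 means no homoplasy; lower values mean more homoplasy.
    """
    if min_steps <= 0 or observed_steps < min_steps:
        raise ValueError("require 0 < min_steps <= observed_steps")
    return min_steps / observed_steps


# Hypothetical single-locus tree on which every change occurs exactly once.
ci_single_locus = consistency_index(3, 3)

# Combined European data: shortest tree of 30 steps, 7 longer than the minimum.
ci_combined = consistency_index(30 - 7, 30)

print(ci_single_locus, round(ci_combined, 2))
```

A single-locus tree with no homoplasy gives 1, and the combined data give roughly 0.77, the value quoted for the six-locus analysis.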

Europe or by parallel mutations. Parallel mutation is a plausible cause, as repeat number is highly variable in Saccharomyces (Kitada and Hishinuma 1988) and varies from two to four repeats in our set of 28 wild isolates (see below). Overall, then, there is no compelling evidence of recombination within any of these genes.

To test for recombination between genes, the data from all six loci were combined for parsimony analysis. The European isolates give a shortest tree of 30 steps, 7 steps longer than the minimum possible (consistency index = 0.77), showing extensive homoplasy. Eight of the 15 possible pairs of gene trees conflict, and no branch is common to all 6 trees. Moreover, nucleotide sites in the same gene are significantly more likely to agree than sites in different genes (partition homogeneity test, P = 0.002). Recombination does therefore appear to have occurred between the six genes, each of which is on a different chromosome.

Interestingly, for none of the genes do our wild isolates form a monophyletic clade with respect to the Type and Danish strains (with the possible exception of SAG1). This indicates either gene flow on the scale of thousands of kilometers or large populations since divergence such that variation present at the time of divergence has not sorted out.

To compare the gene trees to the expectation under the null hypothesis of a neutral coalescent, we calculated the variance of branch lengths in the genealogies and compared them to those found on randomized genealogies with the same total number of mutations. For this analysis the sample size was taken as seven (i.e., clonemates were excluded). For STE3, seven of the eight differences segregating within our wild isolates are on the same branch and the variance of branch lengths is 4.1, significantly higher than that in random genealogies (P ≈ 0.005). For SAG1, all three segregating differences are on the same branch, and the variance is 0.75, also significant (P ≈ 0.05). This clumping of nucleotide changes on the genealogies could have resulted from nonindependent mutation (perhaps unlikely since the changes occurred >600 bp apart), introgression from other more divergent populations, or balancing selection at a linked locus.

Molecular data set 2: genotypes of 28 isolates at seven loci: The second data set consists of all 28 isolates genotyped for at least one polymorphism per locus sequenced, plus a microsatellite locus (Table 4). Six isolates, including the three found to be identical in data set 1, had identical genotypes. This is unlikely in a randomized data set (P < 0.001), and all 6 isolates were collected within 600 m of one another over a 3-month period (Figure 2). We interpret these 6 isolates as part of a clone. If five of these six clonemates are removed from the data set, there remain 5 pairs of identical isolates and only 18 different genotypes. This is fewer than would be expected in a randomized data set (P = 0.05), suggesting that one or more of these are also clonemates. One such pair (Q15.1 and Q16.1) was collected from the same tree at the same time and is the most likely candidate; each other pair is separated by >500 m and the data do not allow one to distinguish whether these are clonemates or are identical just by chance.

Apart from this localized clonal growth, there is no obvious correlation between genotype and geographic location. With all isolates included, there is a significant positive regression across all pairs of isolates of genotypic distance (proportion of loci at which the isolates differ) on geographical distance (slope = 0.01 km⁻¹, P ≈ 0.02). However, if only a single (randomly chosen) isolate of each distinct genotype is included in the analysis, the regression is not significant (slope = 0.005 km⁻¹, P ≈ 0.25). It appears that this population experiences frequent gene flow on a kilometer scale.

Homozygosity and inbreeding: In the entire data set, only a single isolate was heterozygous, at a single locus (Table 4). Wright's inbreeding coefficient, F, estimated from the fixation index (Brown 1979) is 0.99. This suggests a high level of inbreeding. In the appendix we model a mixed-mating population in which diploid individuals are derived either from intra-ascus mating or from random outcrossing. Using this model, the maximum-likelihood estimate of the outcrossing rate is 1.1%, with 2-unit support limits of 0.06 and 5%. If autodiploidization occurs in the wild, this method will underestimate the true outcrossing rate, as autodiploidization removes heterozygosity far more quickly than intra-ascus mating does (appendix).

Recombination: In both data sets, there is abundant evidence of recombination between loci. Of the 21 possible pairs of loci, 18 are phylogenetically incompatible (i.e., show evidence of past recombination). Parsimony analysis of the entire data set gives a shortest tree of 22 steps, compared to a minimum possible of 12 (consistency index = 0.54). Taken as a whole there is significant multilocus linkage disequilibrium (IA = 0.21, rD = 0.035, P ≈ 0.02), but not if each distinct genotype is reduced to a single observation (IA = −0.05, rD = −0.008, P > 0.6).

DISCUSSION

Like S. cerevisiae, S. paradoxus is capable of three types of reproduction in the laboratory: clonal replication, inbreeding, and outcrossing. All three appear to be important in molding the pattern of genetic variation in our natural population. Evidence for clonal replication comes from the repeated isolation of the same genotype, more than would be expected by chance: Among our 28 wild isolates, 6 appear to be members of a single clone, and at least one of the other five pairs of identical genotypes is also likely to be clonemates. There may have been inbreeding in the ancestry of these clonemates, or even mating between clonemates, but inbreeding alone without clonal replication would not lead to such

an overrepresentation of genotypes. Evidence for inbreeding comes from the high homozygosity. An assumption in making this inference is that S. paradoxus in the field behaves as it does in the lab, and in particular that the diplophase predominates, and so the cells we isolated are diploid. In principle, an alternative explanation for the lack of heterozygosity is that cells are haploids in nature, but autodiploidize in the early stages of the isolation procedure. However, we do not consider it likely that S. paradoxus should change its life cycle so drastically in response to laboratory conditions. Finally, evidence for outcrossing comes from the single heterozygote we found plus the genealogical incompatibility between loci and absence of linkage disequilibrium.

This contrast between the great excess of homozygosity and the absence of linkage disequilibrium between genes reflects the fact that even small amounts of outcrossing and recombination will randomize alleles at different loci (Maynard Smith 1994). Nevertheless, inbreeding reduces the effective rate of recombination (re) in the population below the actual rate (ra), according to the relation re = (1 − F)ra (Dye and Williams 1997; Nordborg 2000). This is because recombination is effective only in heterozygous individuals, and inbreeding reduces the frequency of heterozygotes. In our population, F = 0.99, and so the effective recombination rate is 1% of what it would be in a random-mating population. This means that linkage disequilibrium should extend for greater distances along the genome than would otherwise be the case and may have contributed to the absence of evidence for recombination within any of the genes studied. This extension of linkage disequilibrium along the genome means that DNA sequences will be more informative for at least some types of analyses than would otherwise be the case (Nordborg 2000), which makes S. paradoxus yet more

TABLE 4
Genotypes of 28 wild S. paradoxus isolates

         Month      MFα1          MFA1  STE3      STE2        SAG1  AGA2
ID       collected  333, 354, R   17    792, 805  1382, 1387  1578  346, 347  TFA1

Silwood Park
W7       10/96      TA4           C     TG        ?           G     G-        2
S36.7    12/97      TG4           C     CG        TT          G     G-        3
T4b^a    5/98       TG4           T     TG        AC          A     G-        1
T8.1^a   5/98       TG4           T     TG        AC          A     G-        1
T18.2    5/98       TG4           C     TG        ?           A/G   G-        3
T21.4^a  5/98       TG4           T     TG        AC          A     G-        1
T22.1^a  5/98       TG4           T     TG        AC          A     G-        1
T26.3    7/98       ?             C     TG        AC          A     TT        1
T27.3^a  7/98       TG4           T     TG        AC          A     G-        1
T32.1^a  7/98       TG4           T     TG        AC          A     G-        1
T62.1    7/98       TA4           T     CG        AC          G     G-        2
T68.2^a  7/98       TA4           T     TG        AC          G     G-        2
T76.6^a  7/98       TG4           C     TT        AC          G     TT        2

Windsor Great Park
Q4.1     9/98       TA4           C     TG        AC          G     TT        1
Q6.1     9/98       TA4           T     TG        AC          A     TT        1
Q14.4^a  9/98       TA2           C     TG        TT          G     G-        2
Q15.1^a  9/98       AA3           C     TG        TT          A     TT        2
Q16.1^a  9/98       AA3           C     TG        TT          A     TT        2
Q31.4^a  9/98       TG4           C     TT        AC          G     TT        2
Q32.3    9/98       TG4           C     TG        AC          A     G-        1
Q43.5^a  9/98       TA4           C     TG        ?           G     G-        1
Q59.1    10/98      TG4           C     TG        TT          G     G-        1
Q62.5    10/98      TG4           C     TG        TT          G     TT        2
Q69.8^a  10/98      TA2           C     TG        ?           G     G-        2
Q70.8^a  10/98      TA4           T     TG        AC          G     G-        2
Q74.4^a  10/98      TA4           C     TG        ?           G     G-        1
Q89.8    10/98      TG4           C     CG        TT          A     G-        1
Q95.3    10/98      TG4           C     TG        AC          G     G-        1

Bases or repeat numbers are shown for polymorphic sites at seven loci. Numbers under the gene names indicate polymorphic positions scored (see Table 2). Two MFα1 alleles were absent from the nine-isolate set: both differ from Type sequence by the G → A change at base 354; allele 3 has a further T → A change at base 333 and three pheromone repeats. Allele 4 has two pheromone repeats.
a Isolate IDs with indistinguishable genotypes.
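The relation re = (1 − F)ra quoted in the Discussion, together with a fixation-index estimate of F, can be illustrated with a short sketch. It assumes the textbook form F = 1 − Ho/He for the fixation index; the article cites Brown (1979) without giving the formula here, and the heterozygosity values below are hypothetical, chosen only so that F equals the reported 0.99.

```python
def fixation_index(h_observed, h_expected):
    """Fixation index F = 1 - Ho/He (assumed textbook form, not quoted from the article)."""
    if h_expected <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_observed / h_expected


def effective_recombination(r_actual, f):
    """Effective recombination rate under inbreeding: r_e = (1 - F) * r_a."""
    return (1.0 - f) * r_actual


# Hypothetical heterozygosities (illustration only).
f_example = fixation_index(h_observed=0.005, h_expected=0.5)

# With F = 0.99, recombination is about 1% as effective as under random mating.
relative_rate = effective_recombination(r_actual=1.0, f=0.99)

print(f"F = {f_example:.2f}, r_e / r_a = {relative_rate:.2f}")
```

With F = 0.99 the ratio re/ra is 0.01, which is the 1% figure given in the text.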

Figure 2. Locations of oak trees from which wild isolates were collected. Superimposed circles indicate isolates from the same tree. Suspected clones are shown as open circles.

attractive as a model system for population genetics and genomics. Also relevant, of course, is the actual rate of recombination, and it is interesting that S. cerevisiae has one of the highest known recombination rates per megabase of DNA. One explanation is that this has evolved to compensate for a low rate of outcrossing, as is suggested to explain the high chiasmata frequency seen in selfing plants (e.g., Zarchi et al. 1972). Alternatively, it is possible that the high rate of recombination has evolved as a consequence of intense selection pressures imposed by domestication (Burt and Bell 1987). It will be interesting to see whether S. paradoxus also has a high rate of recombination in lab crosses and to determine just how far linkage disequilibrium extends along the genome.

The low effective rate of recombination over distances of ~1 kb allowed us to reconstruct genealogies for each gene. We compared the variance of branch lengths to those found on random genealogies and detected significant deviations from neutrality in two genes, both in the direction of changes being clumped on the genealogy. Nonindependent mutation, introgression, or balancing selection could give rise to such a pattern, although formal theoretical work would be useful in clarifying this. If balancing selection operates, it is probably not heterozygote advantage (given the low levels of heterozygosity), but frequency-dependent selection.

Inbreeding in S. paradoxus can occur both by intra-ascus mating and by autodiploidization (as well as by mating between other types of relatives) and it is not possible with our data to determine the relative frequency of these alternatives. One possible approach would be to compare heterozygosity at loci tightly linked to the mating-type locus to that at unlinked loci; if there has not been switching, heterozygosity near the mating-type locus will be maintained, even with selfing. Presumably switching does occur at least occasionally, as otherwise selection would not maintain the underlying mechanism.

Inbreeding species present some difficulties for interpreting sequence variability, due to genotypes being nonindependent. Although inbreeding predominates over outcrossing in S. paradoxus, it is not as extreme in this regard as some other yeasts, at least in the laboratory: in many species, mating typically occurs between a haploid mother cell and a daughter bud (Johannsen and van der Walt 1980; Kurtzman and Fell 1998). Other species probably outcross more than S. paradoxus, in particular species that are vegetatively haploid and heterothallic (Kurtzman and Fell 1998). It would be interesting to compare patterns of genetic variation for such species with those found here.

Finally, the results reported here differ markedly from those reported for S. cerevisiae from wineries, in which there was a high frequency of heterozygous strains, recessive lethals, and heterothallism (Mortimer 2000). These differences are presumably the effect of domestication, although the precise details remain obscure. With the development of wild strain collections, such as are available for Drosophila, and the identification of more molecular markers in this species, S. paradoxus may prove to be a valuable addition to the current suite of model organisms available to the population geneticist.

Thanks go to Alexandra Eggington and Celine Vass for technical help. This work was funded by the Natural Environment Research Council in studentships to Louise Johnson, Matthew Goddard, and Richard Hetherington; and a grant to Austin Burt.

LITERATURE CITED

Agapow, P.-M., and A. Burt, 2001 Indices of multilocus linkage disequilibrium. Mol. Ecol. Notes 1: 101–102.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers and D. J. Lipman, 1990 Basic local alignment search tool. J. Mol. Biol. 215: 403–410.
Andolfatto, P., and M. Przeworski, 2000 A genome-wide departure from the standard neutral model in natural populations of Drosophila. Genetics 156: 257–268.
Barnett, J. A., R. W. Payne and D. Yarrow, 1990 Yeasts: Characteristics and Identification. Cambridge University Press, Cambridge, UK/London/New York.
Begun, D. J., and C. F. Aquadro, 1992 Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356: 519–520.
Brown, A. H. D., 1979 Enzyme polymorphism in populations. Theor. Popul. Biol. 15: 1–42.
Burt, A., and G. Bell, 1987 Mammalian chiasma frequencies as a test of two theories of recombination. Nature 326: 803–805.
Drake, J. W., 1991 A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. USA 88: 7160–7164.
and Saccharomyces douglasii—a paradigm of incipient evolution. Mol. Gen. Genet. 213: 297–309.
Herskowitz, I., 1988 Life cycle of the budding yeast Saccharomyces cerevisiae. Microbiol. Rev. 52: 536–553.
Johannsen, E., and J. P. van der Walt, 1980 Hybridization studies within the genus Schwanniomyces Klöcker. Can. J. Microbiol. 26: 1199–1203.
Kitada, K., and F. Hishinuma, 1988 Evidence for preferential multiplication of the internal unit in tandem repeats of MFalpha genes in Saccharomyces yeasts. Curr. Genet. 13: 1–5.
Kurtzman, C. P., and J. W. Fell, 1998 The Yeasts: A Taxonomic Survey, Ed. 4. Elsevier, Amsterdam.
Li, W.-H., and L. A. Sadler, 1991 Low nucleotide diversity in man. Genetics 129: 513–523.
Maynard Smith, J., 1994 Estimating the minimum rate of genetic transformation in bacteria. J. Evol. Biol. 7: 525–534.
Mortimer, R. K., 2000 Evolution and variation of the yeast (Saccharomyces) genome. Genome Res. 10: 403–409.
Naumov, G. I., E. Naumova and M. Korhola, 1992a Genetic identification of natural Saccharomyces sensu stricto yeasts from Finland, Holland and Slovakia. Antonie van Leeuwenhoek 61: 237–243.
Naumov, G. I., E. S. Naumova, R. A. Lantto, E. J. Louis and M. Korhola, 1992b Genetic homology between Saccharomyces cerevisiae and its sibling species S. paradoxus and S. bayanus: electrophoretic karyotypes. Yeast 8: 599–612.
Naumov, G. I., E. S. Naumova and A. Querol, 1997a Genetic study of natural introgression supports delimitation of biological species in the Saccharomyces sensu stricto complex. Syst. Appl. Microbiol. 20: 595–601.
Naumov, G. I., E. S. Naumova and P. D. Sniegowski, 1997b Differentiation of European and Far East Asian populations of Saccharomyces paradoxus by allozyme analysis. Int. J. Syst. Bacteriol. 47: 341–344.
Naumov, G. I., E. S. Naumova and P. D. Sniegowski, 1998 Saccharomyces paradoxus and Saccharomyces cerevisiae are associated with exudates of North American oaks. Can. J. Microbiol. 44: 1045–1050.
Nelson, M. A., 1996 Mating systems in ascomycetes: a romp in the sac. Trends Genet. 12: 69.
Nordborg, M., 2000 Linkage disequilibrium, gene trees and selfing: an ancestral recombination graph with partial self-fertilization. Genetics 154: 923–929.
Rozas, J., and R. Rozas, 1999 DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15: 174–175.
Sambrook, J., E. F. Fritsch and T. Maniatis, 1989 Molecular Cloning: A Laboratory Manual, Ed. 2. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Sherman, F., 1991 Getting started with yeast, pp. 3–21 in Guide to Yeast Genetics and Molecular Biology, edited by C. Guthrie and G. R. Fink. Academic Press, San Diego.
Sniegowski, P. D., P. G. Dombrowski and E. Fingerman, 2002 Saccharomyces cerevisiae and Saccharomyces paradoxus coexist in
Dye, C., and B. G.
Williams, 1997 Multigenic drug resistance among a natural woodland site in North America and display different inbred malaria parasites. Proc. R. Soc. Lond. Ser. B Biol. Sci. levels of reproductive isolation from European conspecifics. 264: 61–67. FEMS Yeast Res. 1: 299–306. Farris, J. S., 1989 The retention index and rescaled consistency Swofford, D. L., 2002 PAUP*. Phylogenetic Analysis Using Parsimony index. Cladistics 5: 417–419. (*and Other Methods), Version 4. Sinauer Associates, Sunderland, Farris, J. S., M. Kallersjo, A. C. Kluge and C. Bult, 1994 Testing MA. significance of incongruence. Cladistics 10: 315–319. Vaughan-Martini, A., and A. Martini, 1995 Facts, myths and leg- Field, D., and C. Wills, 1998 Abundant microsatellite polymor- ends on the prime industrial microorganism. J. Indust. Microbiol. phism in S. cerevisiae, and the different distributions of microsatel- 14: 514–522. lites in eight prokaryotes and S. cerevisiae, result from strong Wall, J. D., P. Andolfatto and M. Przeworski, 2002 Testing mod- mutation pressures and a variety of selective forces. Proc. Natl. els of selection and demography in Drosophila simulans. Genetics Acad. Sci. USA 95: 1647–1652. 162: 203–216. Goddard, M. R., and A. Burt, 1999 Recurrent invasion and extinc- White, T. J., T. Bruns, S. Lee and J. W. Taylor, 1990 Amplification tion of a selfish gene. Proc. Natl. Acad. Sci. USA 96: 13880–13885. and direct sequencing of fungal rRNA genes for phylogenetics, Goffeau, A., B. G. Barrell, H. Bussey, R. W. Davis, B. Dujon et pp. 315–322 in PCR Protocols: A Guide to Methods and Applications, al., 1996 Life with 6000 genes. Science 274: 563–567. edited by M. A. Innes,D.H.Gelfand,J.J.Sninsky and T. J. Hall, B. G., and P. M. Sharp, 1992 Molecular population genetics White. Academic Press, San Diego. of Escherichia coli: DNA sequence diversity at the celC, crr and gutB Wolfram Research, 1999 Mathematica, Version 4. Wolfram Re- loci of natural isolates. Mol. Biol. Evol. 9: 654–665. 
search, Champaign, IL. Herbert, C. J., G. Dujardin, M. Labouesse and P. P. Slonimski, Zarchi, Y., G. Simchen, J. Hillel and T. Schaap, 1972 Chiasmata 1988 Divergence of the mitochondrial leucyl transfer-RNA syn- and the breeding system in wild populations of diploid wheats. thetase genes in 2 closely related yeasts, Saccharomyces cerevisiae Chromosoma 38: 77–94. 52 L. J. Johnson et al.

Zeyl, C., 2000 Budding yeast as a model organism for population genetics. Yeast 16: 773–784.

Zeyl, C., and J. A. G. M. DeVisser, 2001 Estimates of the rate and distribution of fitness effects of spontaneous mutation in Saccharomyces cerevisiae. Genetics 157: 53–61.

Communicating editor: D. Charlesworth

APPENDIX

To estimate the frequency of outcrossing compatible with the observed level of heterozygosity, we first modeled a mixed-mating population in which haploid cells either mate within the ascus with probability $s$ or mate randomly in the population with probability $t$ ($= 1 - s$). Note first that in such a population, the probability that an individual chosen at random is derived from $x$ generations of selfing (i.e., there are exactly $x$ generations of selfing in its ancestry before one gets back to an outcrossing event) is $s^x t$. Second, the probability that an individual derived from $x$ generations of selfing is homozygous at locus $i$ is $1 - \mathrm{HW}_i (2/3)^x$, where $\mathrm{HW}_i$ is the Hardy-Weinberg proportion of heterozygotes in the population at that locus. Note that in this system selfing reduces heterozygosity by one-third every generation, not by one-half, as in more familiar systems where the gametes that fuse in selfing come from independent meioses (e.g., plants). This is because, with intra-ascus mating, each haploid spore produced from a heterozygous diploid shares an allele with only one of its three potential mating partners. Finally, the overall probability that a random individual is homozygous at the $i$th locus is the product of these two probabilities, summed over all possible numbers of generations of selfing in its ancestry:

$$p(\text{homozygous}) = \sum_{x=0}^{\infty} s^x t \left(1 - \mathrm{HW}_i \left(\tfrac{2}{3}\right)^x\right).$$

In our data set there are 7 loci, and the probability that an individual will be homozygous at all of them is then

$$p(\text{all homozygous}) = \sum_{x=0}^{\infty} s^x t \prod_{i=1}^{7} \left(1 - \mathrm{HW}_i \left(\tfrac{2}{3}\right)^x\right).$$

Note that this assumes the loci are independent. For the six isolates with missing data, the inside product is done over only the loci for which there are data. Finally, isolate T18.2 is homozygous at 5 loci and heterozygous at SAG1, and the probability of an individual having this genotype is

$$p(\text{T18.2}) = \sum_{x=0}^{\infty} s^x t \left[\prod_{i=1}^{5} \left(1 - \mathrm{HW}_i \left(\tfrac{2}{3}\right)^x\right)\right] \mathrm{HW}_{SAG1} \left(\tfrac{2}{3}\right)^x.$$

When we count only one isolate of each distinct genotype, the data consist of 14 completely homozygous genotypes, two homozygous isolates with unknown STE2 genotype, one homozygous isolate with unknown MFα1 genotype, and the heterozygote T18.2. The probability of observing the entire data set is therefore

$$p(\text{data}) = p(\text{all homozygous})^{14} \times p(\text{missing } STE2)^{2} \times p(\text{missing } MF\alpha1) \times p(\text{T18.2}).$$

The maximum possible value of this occurs at an outcrossing rate of $t = 1.1\%$, with 2-unit support limits of 0.06 and 5%.

We also modeled a mixed-mating population in which individuals were derived either from mating between clonemates (autodiploidization) with probability $s$ or from random outcrossing with probability $t$. In this case individuals are either completely homozygous at all loci or heterozygous at Hardy-Weinberg proportions, and the probability an individual is homozygous at the $i$th locus is

$$p(\text{homozygous}) = s + t\,(1 - \mathrm{HW}_i).$$

With this model the maximum-likelihood outcrossing rate is 6%, with 2-unit support limits of 0.3 and 23%, higher than that in the previous model, as a greater frequency of outcrossing is needed to counterbalance the more intense inbreeding caused by autodiploidization.