C. R. Biologies 334 (2011) 620–628

Comptes Rendus Biologies

Evolution / Évolution

Yeasty clocks: Dating genomic changes in yeasts

Horloges tremblantes : datation des changements génomiques chez les levures

Thomas Rolland, Bernard Dujon *

Unité de génétique moléculaire des levures (CNRS URA2171 and University P.-M.-Curie UFR927), Institut Pasteur, 25, rue du Docteur-Roux, 75724 Paris cedex 15, France

ARTICLE INFO

Article history: Received 12 November 2010; Accepted after revision 17 March 2011; Available online 1 July 2011

Keywords: Evolution; Mutational rate; Polymorphism; Divergence times; Synteny conservation

ABSTRACT

Calibration of clocks to date evolutionary changes is of primary importance for comparative genomics. In the absence of fossil records, the dating of changes during yeast genome evolution can only rely on the properties of the genomes themselves, given the uncertainty of extrapolations using clocks from other organisms. In this work, we use the experimentally determined mutational rate of Saccharomyces cerevisiae to calculate the numbers of successive generations corresponding to observed sequence polymorphism between strains or species of other yeasts. We then examine synteny conservation across the entire subphylum of Saccharomycotina yeasts, and compare this second clock, based on chromosomal rearrangements, with the first one, based on sequence divergence. A non-linear relationship is observed that, interestingly, also applies to insects although, for equivalent sequence divergence, their rate of chromosomal rearrangements is higher than that of yeasts. © 2011 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.

RÉSUMÉ

Keywords: Evolution; Mutation rate; Polymorphism; Divergence time; Synteny conservation

Calibrating molecular clocks to date evolutionary changes is of great importance for comparative genomics. In the absence of fossils, the dating of changes during the evolution of yeast genomes can rely only on the properties of the genomes themselves, given the uncertainty of extrapolations from the clocks of other organisms. In this work, we use the mutation rate experimentally determined in Saccharomyces cerevisiae to calculate the numbers of successive generations corresponding to the degrees of sequence polymorphism observed between strains or species of other yeasts. We then examine synteny conservation across the entire subphylum of Saccharomycotina yeasts, and compare this second clock, based on chromosomal rearrangements, with the first, based on sequence divergence. A non-linear relationship is observed, which also applies to insects although, for equivalent sequence divergence, their rate of chromosomal rearrangements is higher than that of yeasts. © 2011 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.

* Corresponding author. E-mail address: [email protected] (B. Dujon).

1631-0691/$ – see front matter © 2011 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved. doi:10.1016/j.crvi.2011.05.010

1. Introduction

The concept of molecular evolutionary clocks is central to modern [...]. From the pioneering work of Zuckerkandl and Pauling [1], it is commonly admitted that amino-acid substitutions between orthologous proteins accumulate with the time separating them from their common ancestor, and differences between aligned sequences are, therefore, used to build phylogenetic trees and to estimate the dates of separation between living species (or groups of species). With the increasing availability of genome sequence data, it became clear, however, that the rate at which sequences evolve varies among lineages [2], leading to the idea of relaxed molecular clocks [3–5], and raising the question of appropriate calibration to date major phylogenetic separations.

In fungi, for example, this problem was remarkably illustrated by the work of Taylor and Berbee [6]: depending upon the reference used to calibrate the clock, the separation date between Ascomycota and Basidiomycota varies between 400 and 1800 Myr. Similarly, the origin of Saccharomycotina (budding yeasts) is dated, according to calibrations, at 250 Myr ago or 900 Myr ago, i.e. a range of uncertainty linking the Permian-Triassic transition to deep Precambrian times. Even when calibration is properly set, extrapolation of molecular clocks to large evolutionary scales gives only seemingly precise results once the statistical limits of confidence are properly taken into consideration [7]. Greater precision would require independent calibration points within short evolutionary timescales using increased taxon sampling or continuous fossil records, two conditions not always readily accessible.

The identification of Paleopyrenomycites devonicus as the oldest fossil ascomycete, dated to 400 Myr [8], played an important role in calibrating the fungal tree of life, but such fossils remain rare in fungi. They are also non-existent in yeasts, if one excepts amber inclusions, which have received only limited attention so far [9,10] and are, anyway, too recent for setting clocks over long evolutionary times. Increasing taxon sampling is not easier for yeasts, since it is unlikely that living intermediates exist, given their very mode of propagation that creates constant bottlenecks.

Another important problem for dating using molecular data is that substitution rates also vary between the different genes of the same organism. In yeasts, for example, a dispersion of nearly three orders of magnitude exists in the rate of non-synonymous substitutions per site (dN) between the fastest and the slowest evolving proteins [11]. The dispersion is lower in organisms with smaller genetically effective population sizes such as Drosophila and mammals [12], hence the necessity to compare homogeneous groups of organisms sharing similar life style and mode of propagation to properly date evolutionary changes. Yeasts offer such a case, with more than three dozen species fully sequenced [13] and population genomic studies now available for a few of them [14,15]. These fungi proved particularly meaningful to elucidate the mechanisms of unicellular eukaryotic genome evolution by allowing us to easily confront hypotheses based on comparative genome analysis with the results of direct experimental approaches [16]. Most yeasts whose genomes have been fully sequenced so far belong to the Saccharomycotina (also called hemiascomycetes), a large subphylum of Ascomycota that includes Saccharomyces cerevisiae. Despite the conservation of their unicellular mode of life with bud formation, these yeasts cover a very broad evolutionary range, and very important degrees of sequence divergence exist between orthologous genes of distinct yeast species, even those belonging to the same clade [17,18]. Dating major evolutionary changes in yeast genomes, such as the change of codon assignation in the CTG group [19], the triplication of mating cassettes in Saccharomycetaceae [13], or the whole-genome duplication in the ancestry of Saccharomyces sensu stricto and related clades [20], remains, therefore, highly imprecise. Phylogenetic interpolation within the fungal tree of life has been attempted [21–23], but the specific mode of propagation of yeasts, with rapid clonal expansions, raises the question of the validity of comparisons with multicellular organisms having obligate sexual reproduction and possibly distinct evolutionary rates. A specific calibration of the molecular clock of yeasts is, therefore, desirable. But, besides the genomic changes themselves, no independent piece of information, such as fossil records, is available to cover their very large evolutionary range.

In this work, we have addressed this question from two different viewpoints. Starting from the mutation rates that have been precisely measured by experiments in S. cerevisiae [24–26], we have computed the minimal number of successive generations separating distinct lineages in this yeast, and extrapolated similar calculations to the separation of species within clades. This clock is appropriate for short evolutionary timescales but gradually loses precision with increasing evolutionary range. We have, therefore, looked for a second clock more appropriate to larger evolutionary timescales by examining the relationship between sequence divergence and degrees of chromosomal rearrangements. This relationship has been quantitatively established over the entire evolutionary range of Saccharomycotina, and compared to a similar relationship established for insects.

2. Calibrating sequence divergence in terms of the minimal number of successive generations

The spontaneous mutation rate has recently been determined with precision in S. cerevisiae by three independent approaches. A per-base-pair mutation rate ($\mu$) was established for two genes using the classical Luria-Delbrück fluctuation assays [24]. Figures of $3.80 \times 10^{-10}$ and $6.44 \times 10^{-10}$ per nucleotide per generation were obtained for the URA3 and the CAN1 genes, respectively, indicating that, even if not entirely uniform across the genome, the mutation rate shows a limited variation range (ca. two times). An independent estimation of the per-base-pair mutation rate ($\mu$) along the entire genome was obtained using novel sequencing technology in mutation-accumulation experiments [25]. Partial resequencing (ca. 40% genome coverage) of four independent cultures of S. cerevisiae grown in rich medium for a total of ca. 4800 generations after 200 successive single-cell bottlenecks gave a complete description of the spectrum and frequencies of spontaneous mutations. Although some variations were again observed between the different parts of the S. cerevisiae genome, results converge to an average figure of $3.3 \times 10^{-10}$ mutations per nucleotide per generation, ca. 90% of which are nucleotide substitutions and 10% indels. This figure is in excellent agreement with the Luria-Delbrück assays on reporter-construct studies cited above. Finally, figures of $3.8 \times 10^{-10}$ to $2.0 \times 10^{-10}$ base substitutions per nucleotide per generation were reported for three strains of S. cerevisiae using sequencing of cell lines grown with or without meiotic cycles [26]. We, therefore, admitted for this work that the spontaneous rate of nucleotide substitution in S. cerevisiae under laboratory conditions is $3 \times 10^{-10}$ mutations per site per generation.

Assuming that such mutations are independent and neutral, one can then simply calculate the theoretical frequency of mutants ($m$) after $n$ successive generations from the initial genome using the following equation:

$$m = 1 - (1 - \mu)^{n} \qquad (1)$$

Note that $m$ represents the proportion of nucleotides mutated at least once from the origin, not the final result in terms of sequence changes (the same nucleotide can reappear after multiple changes). Fig. 1 illustrates the quantitative results of this equation. With a mutational rate of $3 \times 10^{-10}$ mutations per site per generation, half of the nucleotides are expected to have been mutated at least once after ca. $2.3 \times 10^{9}$ generations, the other half remaining non-mutated. The same calculation predicts that ca. $3.3 \times 10^{7}$ and ca. $3.5 \times 10^{8}$ successive generations are needed for, respectively, 1% and 10% of nucleotides to have mutated at least once, i.e. figures frequently observed in yeast genome comparisons (see below). To illustrate the effects of varying mutation rates, the same calculation was repeated for values of $\mu$ ranging from $1 \times 10^{-10}$ to $10 \times 10^{-10}$. For 1%, 10% and 50% of nucleotides mutated at least once, the lower and upper limits of generation numbers are, respectively, 10–100 million, 100–1000 million and 700–7000 million (Fig. 1). Although such figures are obviously theoretical and based on the seemingly improbable hypothesis of neutrality for all mutations and exclusive clonal propagation, they are useful to help us understand the evolution of yeasts compared to other organisms.

Fig. 1. Theoretical mutant frequency as a function of successive generations. Theoretical curves representing the predicted fraction of non-mutated genetic elements (nucleotides, genes or genomes) in yeasts (ordinate) after increasing numbers of successive cellular generations (abscissa, log scale) under the hypothesis of a constant mutation rate and independent and neutral mutations (see text). Dotted curve refers to the fraction of non-mutated nucleotides for a mutation rate $\mu = 3 \times 10^{-10}$ mutation per nucleotide per generation. Hatched area gives expected limits for mutational rates of $1 \times 10^{-10}$ (right limit) and $10 \times 10^{-10}$ (left limit), respectively. Similar curves and hatched areas are drawn for the same mutational rates for genes (assuming an average gene size of 1500 nucleotides, dashed curve and area) and for complete yeast genomes (assuming a genome size of 12 million nucleotides, plain curve and area).

Under laboratory conditions, S. cerevisiae has been estimated to undergo a maximum of ca. 3000 generations per year [27]. Figures for wild populations are not precisely known, but are likely to be lower. We have, therefore, assumed a range of 100 to 1000 generations per year. With this range, calculation shows that 50% of the nucleotides in a yeast genome will be mutated at least once after only a few million years, i.e. a time comparable to the origin of hominoids. If one extends the same calculation to genes or to entire genomes based on their sizes in nucleotides (Fig. 1), it appears that half of the protein-coding genes in a yeast genome will be mutated at least once in only a few thousand years (ca. $10^{6}$ generations) and half of the yeast genomes will be mutated at least once in less than a year. Such short times on the geological scale are to be compared with the estimated age of Saccharomycotina yeasts (above). This perspective predicts that presently living yeast species, even those usually regarded as "closely related", can only be distantly related to one another in terms of molecular evolution. Of course, results of Fig. 1 only represent the maximum possible frequency of mutants after a given number of successive generations (or the minimum time necessary to reach a given level of sequence divergence between two yeasts derived from a common ancestor). In reality, mutations are not all neutral (in particular in compact genomes such as those of yeasts), and those affecting fitness will have a decreased or increased probability of becoming fixed in populations. For yeasts, however, this bias against mutation fixation is probably limited because, although not quantitatively established for wild populations, bottlenecks are likely to play a major role, hence increasing genetic drift at the expense of selection [25].

Since several S. cerevisiae strains have now been sequenced [14,15,28–30], we found it interesting to calculate the theoretical number of generations separating each of these strains from the reference laboratory strain S288c. Table 1 gives such figures for several frequently used S. cerevisiae laboratory strains, as well as for a few isolates of S. paradoxus. As can be seen, the least diverging strain of S. cerevisiae, A364A, appears to have undergone at least one million generations from its common ancestor with S288c, i.e. more than the total number of generations since the human-chimpanzee separation. The most divergent S. cerevisiae strain, SK1, has undergone 6.8–11 million successive generations (depending upon dataset; resequencing and array hybridization give slightly different results) from its common ancestor with S288c. Similarly, the closest strain of S. paradoxus has undergone ca. one million generations since its common ancestor with the reference strain, but the divergence of other strains appears much more ancient (up to 63.3 million generations). Using

Table 1. Sequence polymorphism between strains of S. cerevisiae and S. paradoxus.

Species        Reference strain  Compared strain  Number of SNPs  SNP frequency (%)  n           Ref.

S. cerevisiae  S288C             A364A            6,538           0.060              1,000,300   [15]
               S288C             W303             11,976          0.110              1,834,342   [15]
               S288C             CENPK            16,406          0.150              2,501,877   [15]
               S288C             FL100            22,446          0.210              3,503,680   [15]
               S288C             RM11             29,508          0.270              4,506,086   [15]
               S288C             SK1              44,148          0.410              6,847,380   [15]

               S288C             W303             -               0.072              1,200,432   [14]
               S288C             RM11-1a          -               0.364              6,077,734   [14]
               S288C             SK1              -               0.659              11,019,682  [14]

S. paradoxus   CBS432            CBS5829          -               0.068              1,133,719   [14]
               CBS432            N-44             -               1.209              20,272,796  [14]
               CBS432            DBVPG6304        -               3.736              63,459,609  [14]
               CBS432            YPS138           -               3.727              63,303,795  [14]

Data taken from [14,15]. The table gives the frequency of SNP observed for each listed strain compared to the reference sequence and the deduced number of generations (n) undergone by this strain from its common ancestor with that reference. Calculation of n according to Fig. 2.
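
As an illustration only, the arithmetic behind Eq. (1) and the n column of Table 1 can be sketched in a few lines of Python. The function names are hypothetical and not taken from the paper. The rate of 3e-10 substitutions per site per generation is the value adopted in the text, and the factor 1/2 reflects the assumption, stated in the legend of Fig. 2, that both lineages accumulate mutations at the same rate.

```python
import math

MU = 3e-10  # substitutions per site per generation (value adopted in the text)


def fraction_mutated(n, mu=MU, sites=1):
    """Eq. (1): fraction of elements of `sites` nucleotides mutated at least
    once after n generations, assuming independent, neutral mutations."""
    return -math.expm1(n * sites * math.log1p(-mu))


def generations_for_fraction(m, mu=MU, sites=1):
    """Inverse of Eq. (1): generations needed for a fraction m to be mutated."""
    return math.log1p(-m) / (sites * math.log1p(-mu))


def generations_from_snp(snp_percent, mu=MU):
    """Fig. 2 equation: generations since the common ancestor, with the
    observed SNP frequency shared equally between the two lineages."""
    return 0.5 * math.log1p(-snp_percent / 100.0) / math.log1p(-mu)


if __name__ == "__main__":
    print(f"50% of nucleotides: {generations_for_fraction(0.5):.3g} generations")
    print(f"A364A vs S288C:     {generations_from_snp(0.060):,.0f} generations")
```

Under these assumptions, the A364A and SK1 rows of the [15] dataset are reproduced to within one generation. `log1p` and `expm1` are used because the rate is so small that computing `1 - mu` directly would lose precision.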

recent population genomics studies [14,15] and similar calculations, we have reanalyzed the population structures of S. cerevisiae and S. paradoxus (Fig. 2). A striking difference appears between the two species using the available references. In S. cerevisiae, less than 10% of strains are separated from the reference by a relatively small number of generations (1–3 million), whereas the majority of strains have undergone 5–10 million generations after separation (or 4 to 7, depending on datasets). Whether the latter forms a homogeneous population or not can only be determined by using different references. The population of S. paradoxus (only available from [14]) is made of a homogeneous majority of strains very closely related to the reference (less than one million generations) and two subpopulations that separated much earlier (ca. 20 and 65 million generations from the last common ancestor, respectively). This heterogeneity coincides with the idea that S. paradoxus strains remain limited within geographic boundaries for a long time, while the homogeneity of the S. cerevisiae population is related to the frequent formation of mosaics among strains [14].

Fig. 2. Dating populations of Saccharomyces from sequence polymorphism. The figure represents the cumulative frequency distributions of strains from S. cerevisiae (black lines) and S. paradoxus (grey line) as a function of the number of successive generations (abscissa) they have each undergone from their common ancestor with the cognate reference strain (S288c for S. cerevisiae strains, and CBS432 for S. paradoxus strains). The number of successive generations (abscissa) is calculated from the SNP rate of each strain relative to the reference, using the equation:

$$n = \frac{1}{2} \cdot \frac{\log(1 - m)}{\log(1 - \mu)}$$

where $\mu$ is the mutation rate per nucleotide per generation (here $3 \times 10^{-10}$) and $m$ is the observed frequency of SNP. The factor 1/2 compared to Eq. (1) is due to the assumption of equivalent mutational rates in both lineages (reference and studied strain) from their common ancestor. SNP data are taken from the I40 rates in [14] (triangles) for the 37 strains of S. cerevisiae and the 35 strains of S. paradoxus (resequencing of references ignored) and from [15] (dots) for 62 strains of S. cerevisiae (resequencing of reference ignored).

We have tried to extend our calculations to larger evolutionary distances, such as those observed between species of the same clade, even if precision should diminish. An interesting case of a yeast genome has recently been discovered and fully sequenced (Leh-Louis et al., in preparation). This yeast was formed by hybridization between two parents differing from each other by ca. 12% nucleotide substitutions on average, a figure which, according to our calculations, corresponds to ca. 210 million generations from their common ancestor, i.e. an order of magnitude probably comparable to the separation of fishes from mammals. Other interesting cases are, in principle, offered by the existence of pseudogenes, since they are expected to diverge in sequence at the neutral rate [31]. However, the original sequences of the ancestral functional gene are unfortunately very rarely available. Pseudogenes corresponding to duplicated ohnologs in the genome of S. cerevisiae offer a means to alleviate this difficulty. For example, a pseudogene corresponding to an ancient copy of the Lys-tRNA synthetase gene lies between YBR060c and YBR061c after duplication of the functional KRS1 ancestral gene [32]. Given the fact that the two functional copies conserved in S. uvarum (660.15 and 678.163) are 98.8% identical in sequence (consistent with a strong functional constraint on this essential enzyme) and are 89% identical in sequence to the functional gene of S. cerevisiae (KRS1, YDR037w), it is possible to conclude that the S. cerevisiae pseudogene differs from its ancestral sequence by ca. 30–40% of nucleotide substitutions which, according to our calculation, corresponds to a minimum of 1.1–1.7 billion successive generations. This estimate is, of course, not precise, but it gives us an order of magnitude for the minimal age of the whole-genome duplication at the origin of Saccharomyces sensu stricto and related clades. Extension of this method to larger phylogenetic distances becomes increasingly problematic, however: first because nucleotide sequence alignments become more uncertain as sequence divergence increases, and second because of the over-simplification of reality inherent to the hypothesis of neutrality and clonal expansion. Given the large evolutionary span covered by the sequenced yeast genomes, another method is, therefore, needed.

3. Chromosomal rearrangements as an estimation of species divergence times

Our second method to estimate the evolutionary divergence between yeasts is based on the conservation of synteny. In the group of Saccharomyces sensu stricto and related clades, the genome duplication followed by extensive gene loss has so profoundly affected the gene order map, by creating a 1:2 relationship with the non-duplicated yeasts of the same family [33,34], that synteny conservation cannot be used as a simple evolutionary clock. The subsequent release of complete genome sequences of numerous other yeasts now allows us to examine this problem across a very broad evolutionary range. In a previous investigation, five protoploid species of Saccharomycetaceae have been compared, giving us a first description of the number and size of conserved syntenic blocks in yeasts [18]. We have now extended this analysis to another group of yeasts, collectively designated as "CTG", and separated from the Saccharomycetaceae family at an early branching point within the Saccharomycotina yeasts ([13], see also Santos et al., this issue). Many sequenced species of this group are only known as diploids and were, therefore, disregarded to eliminate possible artifacts on synteny conservation (available sequences correspond to the haploid equivalent). We have, therefore, only studied the five fully sequenced haploid species from this group: Debaryomyces hansenii [35], Pichia (Scheffersomyces) stipitis [36], Candida (Meyerozyma) guilliermondii, Clavispora lusitaniae and Lodderomyces elongisporus [17]. As an outgroup, we have used the genome of Yarrowia lipolytica [35], which is neither a Saccharomycetaceae nor a member of the CTG group. All pairwise comparisons were performed between the 11 yeast species, as described in Fig. 3, and conserved syntenic blocks were defined using the same parameters as [18], namely a minimum of five conserved orthologs and a maximum of 10 intervening genes. As published previously, the five protoploid Saccharomycetaceae share 200 to 300 short syntenic blocks (average size of 20 genes) in all pairwise comparisons, except for the Kluyveromyces (Lachancea) thermotolerans/Saccharomyces (Lachancea) kluyveri pair. These two species belong to the same clade (Lachancea) within the Saccharomycetaceae family. Similar number and size distributions of conserved syntenic blocks are observed among the pairwise comparisons between the five CTG species. This time, the D. hansenii/C. guilliermondii pair forms the exception, indicating that these two species are more closely related to each other than are the other three (despite the fact that they belong to two distinct clades, Debaryomyces and Meyerozyma, respectively). If one now compares species of the Saccharomycetaceae family to those of the CTG group, the number of conserved syntenic blocks and their average size drop (100–200 blocks of average size 14 genes).

To quantitatively estimate the conservation of synteny between any two yeasts (in order to further support comparisons across the entire group of species studied), we calculated for all pairs of compared species the number of orthologous genes present in conserved syntenic blocks and expressed it as a fraction of the total number of orthologous genes between the two species. We found 3600–4300 orthologous genes in conserved syntenic blocks for comparisons within the Saccharomycetaceae (corresponding to 85% to 95% of all orthologs, Fig. 4A). Similarly, 3100–4300 orthologous genes are in conserved syntenic blocks for comparisons within the CTG group (68% to 92%). Comparisons between the protoploid Saccharomycetaceae and the CTG yeasts, in contrast, reveal only 750–1400 orthologous genes in conserved syntenic blocks (15% to 35%). When

Fig. 3. Number and size of conserved syntenic blocks between Saccharomycotina yeasts. On the left, the commonly accepted topology is shown [57] (top: protoploid Saccharomycetaceae, middle: CTG yeasts, bottom: Y. lipolytica). For each pairwise comparison, the table indicates the total number of conserved syntenic blocks (left) and their average size (in coding genes, right). Conserved syntenic blocks are defined by sets of at least five adjacent orthologous genes (defining anchor points), conserved in order between two species, and separated by a maximum of 10 intervening genes [18]. Orthologous genes were previously extracted by the IONS method using sequence and neighborhood similarity (Seret and Baret, in preparation) for Zygosaccharomyces rouxii, K. thermotolerans, S. kluyveri, Kluyveromyces lactis, Eremothecium (Ashbya) gossypii, D. hansenii and Y. lipolytica. Orthology relationships between the protoploid and CTG species, and between the CTG group and Y. lipolytica, were deduced from Reciprocal Best Hits (RBH), using the blastp program [58]. Z. rouxii: ZYRO; K. thermotolerans: KLTH; S. kluyveri: SAKL; K. lactis: KLLA; E. gossypii: ERGO; D. hansenii: DEHA; C. guilliermondii: CAGU; P. stipitis: PIST; C. lusitaniae: CLLU; L. elongisporus: LOEL; Y. lipolytica: YALI.
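
The block definition given in this legend (at least five anchor points, conserved in order, at most 10 intervening genes) can be illustrated with a deliberately simplified sketch. This is not the procedure of [18]: it is a greedy chaining heuristic written for illustration. The `Anchor` layout, the function name and the assumption that each gene is identified by its rank along a chromosome are all hypothetical. A single misplaced ortholog ends a block here, which a real implementation would probably tolerate.

```python
from typing import List, Tuple

# Hypothetical anchor layout: (chromosome in A, gene rank in A,
#                              chromosome in B, gene rank in B)
Anchor = Tuple[str, int, str, int]


def syntenic_blocks(anchors: List[Anchor], min_anchors: int = 5,
                    max_gap: int = 10) -> List[List[Anchor]]:
    """Greedily chain orthologous anchors into collinear blocks."""
    blocks: List[List[Anchor]] = []
    current: List[Anchor] = []
    direction = 0  # 0 = not yet known, +1 / -1 = orientation in genome B
    for anchor in sorted(anchors):  # walk along genome A
        if current:
            prev = current[-1]
            step_b = anchor[3] - prev[3]
            same_chroms = anchor[0] == prev[0] and anchor[2] == prev[2]
            close_a = anchor[1] - prev[1] - 1 <= max_gap   # intervening genes in A
            close_b = 0 < abs(step_b) <= max_gap + 1       # intervening genes in B
            sign = 1 if step_b > 0 else -1
            if same_chroms and close_a and close_b and direction in (0, sign):
                current.append(anchor)
                direction = sign
                continue
            if len(current) >= min_anchors:  # close the finished block
                blocks.append(current)
        current, direction = [anchor], 0
    if len(current) >= min_anchors:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    toy = [("A1", i, "B1", i) for i in range(6)]                # collinear run
    toy += [("A1", 40 + i, "B1", 100 - i) for i in range(5)]    # inverted run
    print([len(block) for block in syntenic_blocks(toy)])
```

Summing the block lengths and dividing by the number of anchors gives the fraction of orthologs in synteny used as the abscissa in Fig. 4.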

Fig. 4. Conservation of synteny and its relationship with sequence divergence. A. Estimation of the minimal (green) and maximal (red) numbers of genome rearrangements (ordinate) between Saccharomycotina yeasts. Abscissa represents the ratio of the number of orthologs in conserved synteny blocks over the total number of identified orthologs in the pair of yeast species compared. Symbols correspond to those described in (B). B. Relationship between conserved synteny and sequence divergence among Saccharomycotina yeasts. Syntenic blocks considered are defined in Fig. 3. Abscissa represents the ratio of the number of orthologs in conserved synteny blocks over the total number of identified orthologs in the pair of yeast species compared. Ordinate represents the average amino-acid identity between all orthologous proteins for each pair of yeast species considered. The red dot corresponds to comparison of any species with itself. Linear correlations have been fitted for the whole dataset, and independently for the two subsets corresponding to less than 40% or more than 60% of orthologs in synteny, respectively. C. Comparison of yeasts to insects. Abscissa and ordinate, same as (B). Insect and vertebrate data have been extracted from [45]. Yeast data have been recomputed using the same parameters as for insect data. Conserved syntenic blocks were reconstructed from aligned orthologs defined from RBH (with more than 30 amino-acid long alignments to avoid domain detection), assuming a minimum of two anchor points and a maximum of one intervening gene (compare to (B)). Linear correlations have been fitted for each of the two datasets, and for corresponding subsets as in (B). Abscissa limits are less than 35% and more than 60% for insects, and less than 70% and more than 80% for yeasts.
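
The quantities plotted in panel A follow from three counts per species pair. As a minimal sketch, assuming the rule stated in the main text (the minimum is the number of syntenic blocks; the maximum is the number of orthologs lying outside blocks), and with a hypothetical function name:

```python
def rearrangement_bounds(n_blocks, n_orthologs, n_orthologs_in_blocks):
    """Return (minimum, maximum, synteny ratio) for one pair of species.

    minimum: one rearrangement per identified syntenic block
    maximum: one rearrangement per ortholog outside any block
    ratio:   orthologs in blocks / all orthologs (abscissa of Fig. 4)
    """
    minimum = n_blocks
    maximum = n_orthologs - n_orthologs_in_blocks
    ratio = n_orthologs_in_blocks / n_orthologs
    return minimum, maximum, ratio


if __name__ == "__main__":
    # K. thermotolerans vs S. kluyveri, counts quoted in the text
    low, high, ratio = rearrangement_bounds(84, 4609, 4448)
    print(low, high, f"{ratio:.1%}")
```

With the counts quoted for K. thermotolerans and S. kluyveri, this gives the bounds of 84 and 161 reported in the text.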

Y. lipolytica is compared to any member of the previous two groups, even lower conservation of synteny is observed.

Ancestral genome reconstruction is generally done by trying to minimize the postulated rearrangements necessary to account for extant genomes [37–41]. Given the large evolutionary distances between studied yeasts, the estimation of the number of actual rearrangements from the observed syntenic blocks is not trivial. We have, therefore, opted for minimal and maximal estimates using the following principles: the minimal number of rearrangements should be at least equal to the number of identified syntenic blocks, and the maximum number of rearrangements is equal to the total number of orthologs minus those present in syntenic blocks (Fig. 4A). For example, between K. thermotolerans and S. kluyveri, the minimal number of rearrangements is 84, and the maximal one is 161 (4609 identified orthologs – 4448 orthologs in syntenic blocks). Interestingly, the two numbers are very close for this comparison, as is the case for D. hansenii and C. guilliermondii (minimum 111 and maximum 281), but diverge for longer evolutionary distances. For species presenting more than 65% of orthologs in synteny, the number of rearrangements ranges from 250 to 1500 (Fig. 4A). For species presenting less than 35% of orthologs in synteny, the difference between minimum and maximum values is too large to allow reliable reconstruction of ancestral genomes. In addition to the broadening of observable figures, the number of rearrangements becomes more and more difficult to evaluate with increasing evolutionary distance due to the superposition of events. Following the original work of [42], breakpoint reuse has been proposed to have a great impact on the dynamics of genomes. Micro-inversions involving one or a few genes, and consequently forming short conserved blocks, have been shown to deeply affect the estimation of breakpoint reuse in human and mouse evolution [43]. More recently, the analysis of 12 closely related Drosophila species has shown that breakpoint reuse is stronger in internal branches of the phylogenetic tree, while uniquely used breakpoints are specific to more derived lineages [44]. By analyzing the distribution of synteny block sizes in protoploid Saccharomycetaceae, it has been shown that breaks are not random in genomes [18], as previously reported for insects [45]. Although different in [...], breakpoint reuse is not different from the presence of hot-spots and cold-spots in meiotic recombination (see [46] for S. cerevisiae, for example).

At this point, it is interesting to analyze the relationships between the conservation of synteny and the divergence of sequences. Fig. 4B shows the results. We observe two groups of points, corresponding respectively to intra-family comparisons (protoploid, on the one hand, and CTG species, on the other) and to interfamily comparisons, including Y. lipolytica. By fitting two independent regression lines, we show that the relationship between the percentage of orthologs in syntenic blocks and the sequence divergence is described by two linear correlations. The steeper slope for the first group of points (short evolutionary distances) indicates rapid sequence divergence for limited loss of synteny. The flattened slope for the second group of points suggests saturation of sequence divergence due to functional constraints for very long evolutionary distances.

The data previously reported by [45] for eight members of the Drosophila genus and four other insects show an astonishing similarity with our yeast results. Because they used slightly different parameters to calculate conserved syntenic blocks, we have recalculated the yeast data using their parameters (minimum of two conserved orthologous genes separated by a maximum of one intervening gene) to allow direct comparisons (Fig. 4C). As can be seen by comparing Fig. 4B to Fig. 4C, application of the insect parameters to the yeast dataset results in a translation to higher synteny values, without altering the overall shape of the curves. Remarkably, we observe a similar split into two groups of points for both insects and yeasts, despite the fact that sequences are globally less diverged in insects than in yeasts. For a similar interval of sequence identity (ca. 50–60%), the insect genomes are clearly much more rearranged than the yeast genomes. Conversely, for similarly high conservation of synteny (above 80%), yeast sequences are much more divergent than insect sequences. Several hypotheses can account for the accelerated chromosomal reshuffling in insects compared to yeasts, including the very distinct architectures of their genomes, and their sexual reproduction. Insect genomes vary in size from 152 to 231 Mb [47], as compared to 8.7 to 15.5 Mb for most yeast genomes, except the Y. lipolytica genome of 20.5 Mb [13]. They contain numerous and diverse transposable elements (for example 1572 partial or full-size elements in D. melanogaster [48]), as compared to only a few in yeast genomes (zero in some protoploid genomes [18] to a dozen in most S. cerevisiae strains [14]).

4. Discussion

In the absence of a properly set evolutionary clock for yeasts, based on reliable external data, and in view of the difficulty of applying clocks that would simultaneously be valid over short and very long evolutionary ranges, we have developed here two methods to relate sequence divergence, number of generations and genome rearrangements. Calculations based on the known mutational rate of S. cerevisiae illustrate that the minimum number of successive generations separating different strains of the same species is necessarily large, and rapidly becomes very large when two related species of the same clade are compared. Given generation times in nature, the mutational clock for yeast genomes is, therefore, necessarily very rapid. Our theoretical assumption about neutrality of mutations and exclusively clonal expansion (used to simplify the calculations) does not alter this conclusion. If anything, the number of generations needed to obtain the sequence divergence observed between yeast genomes can only be larger than the one calculated here on the neutrality hypothesis. Indeed, disadvantageous mutations will have a lower probability of being fixed in populations and advantageous ones cannot represent the majority. A systematic analysis of the fitness of mutations in yeasts would certainly be very informative. However, the repetitive bottlenecks predicted to occur in natural yeast populations (to keep sustainable cell numbers) do create a trend towards neutrality, genetic drift becoming prominent over selection. The existence of sexual reproduction in natural yeast populations does not change our conclusions, since similar base substitution rates were found in S. cerevisiae between purely vegetative lines and lines undergoing one meiotic cycle every 20 vegetative divisions [26].

The clock based on synteny conservation also presents some limits with increasing evolutionary distances. First, with current methods to assign gene orthology relationships based on sequence similarity, the number of recognizable orthologs diminishes when sequences diverge too much. Second, the observable number of conserved syntenic blocks tends to underestimate the actual number of chromosomal rearrangements due to superposition of events and accumulation of micro-rearrangements embedding a few genes. These limitations are also discussed by Drillon and Fischer (this volume) for yeast and vertebrate comparisons. The similarity of the relationship between synteny and
Insect genomes have sequence divergence among yeasts and insects, however, larger intergenic regions than yeast genomes (ca. 4800 bp shows that a synteny-based clock is very appropriate for on average for insects [49] compared to ca. 490 bp for intra-family taxa and becomes less appropriate for inter- yeasts [50]) and larger and more numerous spliceosomal family comparisons. At this larger evolutionary scale, a [49] (Neuve´glise et al., this volume). The acceler- better taxon sampling remains central to the correct ated chromosomal reshuffling in insects compared to estimation of evolutionary times. yeasts is further magnified by the fact that the mutational Whatever the progresses in setting appropriate clocks, 9 rate of Drosophila melanogaster (3.5 10À mutations per the correct construction of phylogenetic trees will have to  nucleotide per generation, a value experimentally mea- better incorporate non-vertical exchanges. In yeasts, the sured by sequencing three strains [51]), is roughly ten formation of interspecific hybrids appears to be frequent times greater than that of S. cerevisiae. Consequently, [52,53], even though the contribution of this phenomenon similar sequence divergence values correspond to a to yeast evolution remains to be quantified. Similarly, smaller number of generations in insects than in yeasts. acquisition of horizontally transferred genes [54] and T. Rolland, B. Dujon / C. R. Biologies 334 (2011) 620–628 627 introgression of large chromosomal segments from dis- [15] J. Schacherer, J.A. Shapiro, D.M. Ruderfer, L. Kruglyak, Comprehensive polymorphism survey elucidates population structure of Saccharomy- tantly related species [30] contribute to alter the clocks. In ces cerevisiae, Nature 458 (2009) 342–345. principle, building gene-specific and lineage-specific [16] B. Dujon, Yeasts illustrate the molecular mechanisms of eukaryotic clocks would be the solution [55] but it results in complex genome evolution, Trends Genet. 
22 (2006) 375–387. [17] G. Butler, M.D. Rasmussen, M.F. Lin, M.A. Santos, S. Sakthikumar, C.A. models whose biological relevance remains to be estab- Munro, et al., Evolution of pathogenicity and sexual reproduction in lished. Finally, to complete the evolutionary clocks of eight Candida genomes, Nature 459 (2009) 657–662. , one should note the accelerated mutation rate [18] J.L. Souciet, B. Dujon, C. Gaillardin, M. Johnston, P.V. Baret, P. Cliften, 9 et al., Comparative genomics of protoploid Saccharomycetaceae, Ge- of mitochondrial DNA (e.g. 12.9 10À mutations per  nome Res. 19 (2009) 1696–1709. nucleotide per generation as experimentally determined [19] S.E. Massey, G. Moura, P. Beltra˜o, R. Almeida, J.R. Garey, M.F. Tuite, et al., for S. cerevisiae [25]), and the fact that pieces of Comparative evolutionary genomics unveils the molecular mechanism mitochondrial DNA (NUMTs) enter chromosomes of yeasts of reassignment of the CTG codon in Candida spp., Genome Res. 13 (2003) 544–557. [56] and other species, reminding us of the intensity of [20] K.H. Wolfe, D.C. Shields, Molecular evidence for an ancient duplication novel sequence acquisition within nuclear genomes of of the entire yeast genome, Nature 387 (1997) 708–713. eukaryotes. [21] R. Friedman, A.L. Hughes, Gene duplication and the structure of eu- karyotic genomes, Genome Res. 11 (2001) 373–381. [22] R.B. Langkjaer, P.F. Cliften, M. Johnston, J. Piskur, Yeast genome dupli- cation was followed by asynchronous differentiation of duplicated Disclosure of interest genes, Nature 421 (2003) 848–852. [23] D.A. Fitzpatrick, M.E. Logue, J.E. Stajich, G. Butler, A fungal phylogeny The authors declare that they have no conflicts of based on 42 complete genomes derived from supertree and combined interest concerning this article. gene analysis, BMC Evol. Biol. 6 (2006) 99. [24] G.I. Lang, A.W. Murray, Estimating the per-base-pair mutation rate in the yeast Saccharomyces cerevisiae, 178 (2008) 67–82. [25] M. Lynch, W. 
Sung, K. Morris, N. Coffey, C.R. Landry, E.B. Dopman, et al., Acknowledgements A genome-wide view of the spectrum of spontaneous mutations in yeast, Proc. Natl. Acad. Sci. U. S. A. 105 (2008) 9272–9277. [26] K.T. Nishant, W. Wei, E. Mancera, J.L. Argueso, A. Schlattl, N. Delhomme, We thank our colleagues from the Ge´nolevures et al., The baker’s yeast diploid genome is remarkably stable in vege- Consortium (GDR2354 CNRS) for helpful discussions, tative growth and , PLoS Genet. 6 (9) (2010). and particularly Philippe Baret, Laurence Despons, Ve´ro- [27] J.C. Fay, J.A. Benavides, Evidence for domesticated and wild populations nique Leh-Louis and Marie-Line Seret for communicating of Saccharomyces cerevisiae, PLoS Genet 1 (2005) 66–71. [28] W. Wei, J.H. McCusker, R.W. Hyman, T. Jones, Y. Ning, Z. Cao, et al., unpublished results. T.R. is the recipient of a fellowship Genome sequencing and comparative analysis of Saccharomyces cere- from the French Ministe`re de l’Enseignement Supe´rieur et visiae strain YJM789, Proc. Natl. Acad. Sci. U. S. A. 104 (2007) 12825– de la Recherche. B.D. is a member of Institut Universitaire 12830. [29] S.W. Doniger, H.S. Kim, D. Swain, D. Corcuera, M. Williams, S.P. Yang, de France. et al., A catalog of neutral and deleterious polymorphism in yeast, PLoS Genet. 4 (2008) e1000183. [30] M. Novo, F. Bigey, E. Beyne, V. Galeote, F. Gavory, S. Mallet, et al., References -to-eukaryote gene transfer events revealed by the genome sequence of the wine yeast Saccharomyces cerevisiae EC1118, Proc. Natl. [1] E. Zuckerlandl, L. Pauling, Molecules as documents of evolutionary Acad. Sci. U. S. A. 106 (2009) 16333–16338. history, J. Theor. Biol. 8 (1965) 357–366. [31] I. Lafontaine, B. Dujon, Origin and fate of pseudogenes in Hemiasco- [2] R.J. Britten, Rates of DNA sequence evolution differ between taxonomic mycetes: a comparative analysis, BMC Genom. 11 (2010) 260. groups, Science 231 (1986) 1393–1398. [32] G. Fischer, C. Neuve´ glise, P. Durrens, C. 
Gaillardin, B. Dujon, Evolution of [3] M.J. Sanderson, A nonparametric approach to estimating divergence gene order in the genomes of two related yeast species, Genome Res. 11 times in the absence of rate constancy, Mol. Biol. Evol. 14 (1997) 1218– (2001) 2009–2019. 1231. [33] F.S. Dietrich, S. Voegeli, S. Brachat, A. Lerch, K. Gates, S. Steiner, et al., [4] A.D. Yoder, Z.H. Yang, Estimation of primate speciation dates using local The Ashbya gossypii genome as a tool for mapping the ancient Saccha- molecular clocks, Mol. Biol. Evol. 17 (2000) 1081–1090. romyces cerevisiae genome, Science 304 (2004) 304–307. [5] R. Lanfear, J.J. Welch, L. Bromham, Watching the clock: studying [34] M. Kellis, B.W. Birren, E.S. Lander, Proof and evolutionary analysis of variation in rates of molecular evolution between species, Trends Ecol. ancient genome duplication in the yeast Saccharomyces cerevisiae, Evol. 25 (2010) 495–503. Nature 428 (2004) 617–624. [6] J.W. Taylor, M.L. Berbee, Dating divergences in the Fungal Tree of Life: [35] B. Dujon, D. Sherman, G. Fischer, P. Durrens, S. Casare´gola, I. Lafontaine, review and new analyses, Mycologia 98 (2006) 838–849. et al., Genome evolution in yeasts, Nature 430 (2004) 35–44. [7] D. Graur, W. Martin, Reading the entrails of chickens: molecular time- [36] T.W. Jeffries, I.V. Grigoriev, J. Grimwood, J.M. Laplaza, A. Aerts, A. scales of evolution and the illusion of precision, Trends Genet. 20 (2004) Salamov, et al., Genome sequence of the lignocellulose-bioconverting 80–86. and xylose-fermenting yeast Pichia stipitis, Nat. Biotechnol. 25 (2007) [8] T.N. Taylor, H. Hass, H. Kerp, The oldest fossil ascomycetes, Nature 399 319–326. (1999) 648. [37] O. Jaillon, J.M. Aury, F. Brunet, J.L. Petit, N. Stange-Thomann, E. Mauceli, [9] P. Veiga-Crespo, M. Poza, M. Prieto-Alcedo, T.G. Villa, Ancient genes of et al., Genome duplication in the teleost fish Tetraodon nigroviridis Saccharomyces cerevisiae, Microbiology 150 (2004) 2221–2227. 
reveals the early vertebrate proto-karyotype, Nature 431 (2004) [10] P. Veiga-Crespo, L. Blasco, M. Poza, T.G. Villa, Putative ancient 946–957. microorganisms from amber nuggets, Int. Microbiol. 10 (2007) [38] G. Bourque, G. Tesler, P.A. Pevzner, The convergence of cytogenetics and 117–122. rearrangement-based models for ancestral genome reconstruction, [11] D.A. Drummond, J.D. Bloom, C. Adami, C.O. Wilke, F.H. Arnold, Why Genome Res. 16 (2006) 311–313. highly expressed proteins evolve slowly, Proc. Natl. Acad. Sci. U. S. A. [39] J.L. Gordon, K.P. Byrne, K.H. Wolfe, Additions, losses, and rearrange- 102 (2005) 14338–14343. ments on the evolutionary route from a reconstructed ancestor to the [12] T. Bedford, I. Wapinski, D.L. Hartl, Overdispersion of the molecular clock modern Saccharomyces cerevisiae genome, PLoS Genet. 5 (2009) varies between yeast, Drosophila and mammals, Genetics 179 (2008) e1000485. 977–984. [40] G Jean, D.J. Sherman, M. Nikolski, Mining the semantics of genome [13] B. Dujon, Yeast evolutionary genomics, Nat. Rev. Genet. 7 (2010) super-blocks to infer ancestral architectures, J. Comput. Biol. 16 (2009) 512–524. 1267–1284. [14] G. Liti, D.M. Carter, A.M. Moses, J. Warringer, L. Parts, S.A. James, et al., [41] C. Chauve, H. Gavranovic, A. Ouangraoua, E. Tannier, Yeast ancestral Population genomics of domestic and wild yeasts, Nature 458 (2009) genome reconstructions: the possibilities of computational methods II, 337–341. J. Comput. Biol. 17 (2010) 1097–1112. 628 T. Rolland, B. Dujon / C. R. Biologies 334 (2011) 620–628

[42] P. Pevzner, G. Tesler, Genome rearrangements in mammalian evolution: lessons from human and mouse genomes, Genome Res. 13 (2003) 37–45.
[43] D. Sankoff, P. Trinh, Chromosomal breakpoint reuse in genome sequence rearrangement, J. Comput. Biol. 12 (2005) 812–821.
[44] A. Bhutkar, S.W. Schaeffer, S.M. Russo, M. Xu, T.F. Smith, W.M. Gelbart, Chromosomal rearrangement inferred from comparisons of 12 Drosophila genomes, Genetics 179 (2008) 1657–1680.
[45] E.M. Zdobnov, P. Bork, Quantification of insect genome divergence, Trends Genet. 23 (2007) 16–20.
[46] T.D. Petes, Meiotic recombination hot spots and cold spots, Nat. Rev. Genet. 2 (2001) 360–369.
[47] A.G. Clark, M.B. Eisen, D.R. Smith, C.M. Bergman, B. Oliver, T.A. Markow, et al., Evolution of genes and genomes on the Drosophila phylogeny, Nature 450 (2007) 203–218.
[48] J.S. Kaminker, C.M. Bergman, B. Kronmiller, J. Carlson, R. Svirskas, S. Patel, et al., The transposable elements of the Drosophila melanogaster euchromatin: a genomics perspective, Genome Biol. 3 (2002), RESEARCH0084.
[49] D.C. Presgraves, Intron length evolution in Drosophila, Mol. Biol. Evol. 23 (2006) 2203–2213.
[50] B. Dujon, The yeast genome project: what did we learn? Trends Genet. 12 (1996) 263–270.
[51] P.D. Keightley, U. Trivedi, M. Thomson, F. Oliver, S. Kumar, M.L. Blaxter, Analysis of the genome sequences of three Drosophila melanogaster spontaneous mutation accumulation lines, Genome Res. 19 (2009) 1195–1201.
[52] Y. Nakao, T. Kanamori, T. Itoh, Y. Kodama, S. Rainieri, N. Nakamura, et al., Genome sequence of the lager brewing yeast, an interspecies hybrid, DNA Res. 16 (2009) 115–129.
[53] B. Dunn, G. Sherlock, Reconstruction of the genome origins and evolution of the hybrid lager yeast Saccharomyces pastorianus, Genome Res. 18 (2008) 1610–1623.
[54] T. Rolland, C. Neuvéglise, C. Sacerdot, B. Dujon, Insertion of horizontally transferred genes within conserved syntenic regions of yeast genomes, PLoS One 4 (2009) e6515.
[55] P.S. Novichkov, M.V. Omelchenko, M.S. Gelfand, A.A. Mironov, Y.I. Wolf, E.V. Koonin, Genome-wide molecular clock and horizontal gene transfer in bacterial evolution, J. Bacteriol. 186 (2004) 6575–6585.
[56] N. Jacques, C. Sacerdot, M. Derkaoui, B. Dujon, O. Ozier-Kalogeropoulos, S. Casarégola, Population polymorphism of nuclear mitochondrial DNA insertions reveals widespread diploidy associated with loss of heterozygosity in Debaryomyces hansenii, Eukaryot. Cell 9 (2010) 449–459.
[57] C.P. Kurtzman, J.W. Fell, T. Boekhout, The yeasts: a taxonomic study, fifth ed., Elsevier, Amsterdam, 2011.
[58] S.F. Altschul, T.L. Madden, A.A. Schäffer, J. Zhang, Z. Zhang, W. Miller, D.J. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25 (1997) 3389–3402.
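
Supplementary sketch. The article's argument that similar sequence divergence corresponds to fewer generations in insects than in yeasts can be illustrated with a short calculation. The Python below is a minimal sketch, not the authors' published procedure. It assumes neutral mutations, clonal expansion, two lineages diverging independently (so that divergence d ≈ 2·μ·g), and no correction for multiple substitutions at the same site. The function name `min_generations`, the 1% example divergence, and the yeast rate (taken as one tenth of the Drosophila rate, following the "roughly ten times" statement) are illustrative assumptions.

```python
"""Back-of-the-envelope mutational clock (illustrative sketch only)."""

# Rate quoted in the text for Drosophila melanogaster [51],
# in mutations per nucleotide per generation.
MU_DROSOPHILA = 3.5e-9

# ASSUMPTION: the text only states that the insect rate is "roughly ten
# times greater" than that of S. cerevisiae, so the yeast rate is
# approximated here as one tenth of the insect rate.
MU_YEAST_APPROX = MU_DROSOPHILA / 10.0


def min_generations(divergence: float, mutation_rate: float) -> float:
    """Generations along each lineage needed to reach `divergence`.

    ASSUMPTIONS (simplifications for illustration):
    - all mutations are neutral and lineages expand clonally;
    - both lineages accumulate changes independently: d = 2 * mu * g;
    - no correction for multiple substitutions at the same site.
    """
    if not 0.0 <= divergence < 1.0:
        raise ValueError("divergence must be a fraction in [0, 1)")
    if mutation_rate <= 0.0:
        raise ValueError("mutation_rate must be positive")
    return divergence / (2.0 * mutation_rate)


if __name__ == "__main__":
    d = 0.01  # hypothetical 1% nucleotide divergence, for illustration only
    for name, mu in (("insect", MU_DROSOPHILA), ("yeast", MU_YEAST_APPROX)):
        print(f"{name}: {min_generations(d, mu):.3g} generations per lineage")
```

Under these assumptions the generation count scales inversely with the mutation rate, so a tenfold lower rate in yeast implies about ten times more generations for the same divergence. Selection against deleterious mutations would only increase the true number, as noted in the Discussion.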

C. R. Biologies 334 (2011) 916


Erratum

Corrigendum to the article: Yeasty clocks: Dating genomic changes in yeasts [C. R. Biologies 334 (2011) 620–628]

Thomas Rolland, Bernard Dujon *

Unité de génétique moléculaire des levures (CNRS URA2171 and University P.-M.-Curie UFR927), Institut Pasteur, 25, rue du Docteur-Roux, 75724 Paris cedex 15, France

In Figure 4, right panel, Human vs. Mouse should read Human vs. Fish.

DOI of original article: 10.1016/j.crvi.2011.05.010

* Corresponding author.

E-mail address: [email protected] (B. Dujon).

1631-0691/$ – see front matter © 2011 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved. doi:10.1016/j.crvi.2011.10.001