2010-Rolland Et Al.Pdf
Total Page:16
File Type:pdf, Size:1020Kb
Dynamic evolution of megasatellites in yeasts. Thomas Rolland, Bernard Dujon, Guy-Franck Richard To cite this version: Thomas Rolland, Bernard Dujon, Guy-Franck Richard. Dynamic evolution of megasatellites in yeasts.. Nucleic Acids Research, Oxford University Press, 2010, 38 (14), pp.4731-9. 10.1093/nar/gkq207. pasteur-01370686 HAL Id: pasteur-01370686 https://hal-pasteur.archives-ouvertes.fr/pasteur-01370686 Submitted on 23 Sep 2016 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution - NonCommercial| 4.0 International License Published online 31 March 2010 Nucleic Acids Research, 2010, Vol. 38, No. 14 4731–4739 doi:10.1093/nar/gkq207 Dynamic evolution of megasatellites in yeasts Thomas Rolland1,2,3, Bernard Dujon1,2,3 and Guy-Franck Richard1,2,3,* 1Institut Pasteur, Unite´ de Ge´ ne´ tique Mole´ culaire des Levures, Department ‘Genomes and Genetics’, 25 rue du Dr Roux, 2CNRS, URA2171 and 3Universite´ Pierre et Marie Curie, UFR 927, F-75015 Paris, France Received January 13, 2010; Revised March 10, 2010; Accepted March 12, 2010 ABSTRACT homology with any other tandem repeat or gene sequenced so far. Among the 84 minisatellites previously Megasatellites are a new family of long tandem reported in Saccharomyces cerevisiae (3), four harbor a repeats, recently discovered in the yeast Candida tandemly repeated motif of 135 bp or larger, and may glabrata. Compared to shorter tandem repeats, qualify as megasatellites (a motif is defined as the such as minisatellites, megasatellite motifs range smallest DNA sequence that is tandemly repeated within in size from 135 to more than 300 bp, and allow cal- a minisatellite or a megasatellite). These four S. cerevisiae culation of evolutionary distances between individ- megasatellites are found in the paralogous FLO1, FLO5, ual motifs. Using divergence based on nucleotide FLO9 genes (135-bp repeated motif), involved in floccula- substitutions among similar motifs, we determined tion and cellular adhesion, and in the NUM1 gene (192-bp the smallest distance between two motifs, allowing repeated motif), encoding a protein required for nuclear their subsequent clustering. Motifs belonging to the migration along microtubules during cell division (3–5). The 44 megasatellites found in C. glabrata are present in same cluster are recurrently found in different 29 different protein-coding genes and six pseudogenes. megasatellites located on different chromosomes, Two major families of megasatellites have been described: showing transfer of genetic information between the ‘SFFIT family’ (due to the conservation of these five megasatellites. In comparison, evolution of the few amino acids in each motif) present in 11 genes and three similar tandem repeats in Saccharomyces cerevisiae pseudogenes, and the ‘SHITT family’ in 12 genes and six FLO genes mainly involves subtelomeric homolo- pseudogenes. Four genes and three pseudogenes carry gous recombination. We estimated selective con- both types of megasatellites. The remaining 10 genes straints acting on megasatellite motifs and their contain megasatellites that do not share significant host genes, and found that motifs are under strong homology with SFFIT and SHITT megasatellites. purifying selection. Surprisingly, motifs inserted Remarkably, although minisatellites are evenly distributed within pseudogenes are also under purifying selec- in the C. glabrata genome, megasatellites show a very strong bias towards locations in subtelomeric regions (2). tion, whereas the pseudogenes themselves evolve The existence of tandem repeats with such long motifs, neutrally. We propose that megasatellite motifs and their abundance in this yeast genome, raise the propagate by a combination of three different mo- question of their very origin. Regular minisatellites in lecular mechanisms: (i) gene duplication, (ii) ectopic other yeasts generally contain motifs that are 9- to 81-bp homologous recombination and (iii) transfer of long (3), the average motif size being 27 bp. In C. glabrata, motifs from one megasatellite to another one. the average motif size of a regular minisatellite is slightly These mechanisms actively cooperate to create shorter (21 bp), whereas SHITT and SFFIT megasatellites new megasatellites, that may play an important contain much longer motifs (135 and 300 bp, respectively). role in the adaptation of Candida glabrata to its It was proposed, several years ago, that minisatellites are human host. initially formed by replication slippage between two short sequences (5 bp) spaced by a few nucleotides, thus creating an initial motif duplication that could further expand by INTRODUCTION replication slippage and unequal sister-chromatid recom- Megasatellites are a new class of large tandem repeats that bination between the two motifs (6). This model, however, were recently discovered in the Candida glabrata genome does not explain how minisatellites (or megasatellites) sequence (1,2). They are widespread in the genome of this propagate into new genes to form large families, as hemiascomycetous yeast species, but share no significant observed in C. glabrata. *To whom correspondence should be addressed. Tel: +33 1 40 61 34 54; Fax: +33 1 40 61 34 56; Email: [email protected] ß The Author(s) 2010. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. 4732 Nucleic Acids Research, 2010, Vol. 38, No. 14 In the present work, we address this question using an eliminate the risk of misassembly of repeated sequences in silico approach to infer evolutionary relationships (1). We verified here the presence of each individual between megasatellite motifs both at the intra- and motif of these megasatellites using original sequence inter-genic levels in the C. glabrata and S. cerevisiae reads of those BACs. Megasatellites present in genes genomes. Minisatellite motifs are too short to measure CAGL0E00143g, CAGL0E01661g and CAGL0I10098g evolutionary distances between them. By contrast, evolu- (1) were not considered in this work because they were tionary distances can be measured between megasatellites, not covered by BACs. In addition, the correctness of that contain longer DNA motifs, by classical sequence megasatellite sequences were verified by direct sequencing homology methods. In addition, to the expected similarity of PCR products for the genes CAGL0K13024g, detected between motifs belonging to the same CAGL0I10200g and CAGL0I10362g. Note that the megasatellite, we also found a surprising conservation sequence of CAGL0I10200g is not exactly identical to between motifs located in different megasatellites, suggest- Ge´ nolevures database. In total, out of 23 ing transfer of motifs from one megasatellite to another megasatellite-containing genes and pseudogenes presented one. We discuss possible mechanism(s) responsible for in Table 1, only two genes (CAGL0G10219g and these ‘motif jumps’ among megasatellites, and their CAGL0H10626g) and four pseudogenes possible selection during evolution of this pathogenic (CAGL0B05093g, CAGL0F00110g, CAGL0H00132g and yeast. CAGL0I00110g) were not directly verified in this work. Note that the list contains five pseudogenes, annotated MATERIALS AND METHODS as such because they contain 11–69 stop codons or an extensive 30 deletion (CAGL0A04873g). Motifs were ex- Megasatellite sequences tracted from the megasatellites as described in Thierry The starting set of megasatellite-containing genes was ex- et al. (1). Incomplete motifs at 50-or30- ends were tracted from the complete genomic sequence of C. glabrata eliminated before analysis. (http://www.genolevures.org/). In this sequence, 16 Saccharomyces cerevisiae FLO megasatellites are megasatellite-containing regions were determined from described in (3). FLO1, FLO5 and FLO9 sequences were BAC inserts instead of direct shotgun assembly to retrieved from SGD (http://www.yeastgenome.org/). Table 1. Megasatellites studied in this work and their corresponding motifs Organism Gene Name MS number Motif Family IS C. glabrata CAGL0A04873g 114/230 31xSHITT/3xSFFIT +/– CAGL0B05093g 231 11xSHITT – CAGL0E00231g 206/207 6xSHITT/3xSFFIT –/– CAGL0F00110g 115 4xSHITT + CAGL0G10219g 209 4xSHITT – CAGL0H00132g 234 4xSHITT – CAGL0H010626g 211 2xSHITT – CAGL0I00110g 235/236 2xSHITT/8xSFFIT –/– CAGL0I00157g 222/223 10xSHITT/3xSFFIT –/– CAGL0L00227g 112 4xSHITT + CAGL0I10246g 215 5xSFFIT – CAGL0I10340g 216 2xSFFIT – CAGL0I10200g 237 9xSFFIT – CAGL0J01774g 108 6xSHITT + CAGL0K13024g 221 5xSHITT – CAGL0L13310g (EPA11) 225/226 10xSHITT/6xSFFIT –/– CAGL0L13332g (EPA13) 227 5xSHITT + CAGL0C00253g 205 5xSFFIT – CAGL0I10147g 214 32xSFFIT – CAGL0I10362g 217/218 3xSHITT/5xSHITT –/– CAGL0J05170g 202 10xSHITT + CAGL0L09911g 224 5xSFFIT – CAGL0L10092g 229/228 5xSHITT/2xSFFIT +/– S. cerevisiae YAR050W