Doctoral thesis

Université Pierre et Marie Curie

Doctoral school "Complexité du vivant", ED 515

Estimating mutation rates: implications for the diversification and evolution of eukaryotic phytoplankton

Marc Krasovec

19 October 2016, Banyuls-sur-Mer

Gwenaël Piganeau, Université Pierre et Marie Curie, Thesis supervisor
Sophie Sanchez-Ferandin, Université Pierre et Marie Curie, Thesis supervisor
Vincent Laudet, Université Pierre et Marie Curie, President of the jury
Laurent Duret, CNRS, UMR 5558, Reviewer
Olivier Tenaillon, INSERM, UMR 1137, Reviewer
Delphine Sicard, INRA, UMR 1083, Examiner


Acknowledgements

My deepest gratitude goes to my two thesis supervisors, Gwenaël Piganeau and Sophie Sanchez-Ferandin, for giving me the opportunity to carry out this thesis with them, and above all for their unfailing and constant support over these three years. Beyond the high quality of their supervision, I greatly enjoyed working with them for their many professional and personal qualities. These three years spent with Gwenaël and Sophie were a time of great professional and personal fulfilment for me, and a unique and excellent experience for my future life.

I also wish to thank the members of the jury for agreeing to evaluate my work, Laurent Duret, Olivier Tenaillon, Delphine Sicard and Vincent Laudet, as well as the members of my thesis committees, Delphine Sicard, Jean-Paul Cadoret and Adam Eyre-Walker.

I would also like to thank the laboratory Biologie Intégrative des Organismes Marins and the environmental genomics of phytoplankton team for hosting me and making this thesis possible.

More generally, I thank my family, first of all my mother Christine, without whom I would never have gone so far in my studies or in my personal development, as well as my siblings Caroline, David, Frédéric (or Lélic) and my twin brother Gabriel, who like me is a great admirer of living things.

Many people at the Arago laboratory also helped me in my work: within the team, Nigel Grimsley, Hervé Moreau, Evelyne Derelle and Sheree Yau, and I am especially grateful to Elodie Desgranges and Claire Hemon, my two office and laboratory colleagues.

I also thank the members of the cytometry platform, David and Christophe Salmeron, always available to come to the rescue of a broken cytometer.

Finally, I thank my friends Florian, Sylvain and Alex (you know who you are) and the doctoral students of the laboratory for the discussions, the evenings, with a special mention for the gaming nights, and the friendships that will last long after the end of this thesis. To name a few, I of course thank my dear fellow PhD student Hugo L and his wife, Océane the aristocrat, Sandrine and her little felines, Margot and her larval companion who always have sweets to give me, my imaginary prime minister Mathieu, whom I thank for making the trip, Hugo B for the philosophical discussions about Homo sapiens, Marine, Tatiana and Remy, Mariana, Mathias who is going to get himself sequenced, Nathalie, Brian and Elsa, and Daniel. To everyone mentioned above, I am grateful that you put up with my sometimes pointless and uninteresting talk about cats, chinchillas (of which Kalam and Glorfindel are the finest representatives), and my noticeably undemocratic ideas.

CONTENTS

List of abbreviations

CHAPTER 1: INTRODUCTION
1. General introduction
2. The stakes of mutation research
3. Variation in the mutation rate
4. Mutation accumulation experiments
    1. Terumi Mukai's first experiments
    2. The effect of mutations on fitness
        1. Terumi Mukai's successors
        2. Adaptive landscape
        3. Genotype-environment interactions
            1. Changes in the effect of mutations
            2. Stress and hypermutators
5. Direct estimates of the mutation rate
    1. Variation in the mutation rate between genomes
        1. Genome size
        2. Effective size (Ne)
        3. Generation time
        4. Metabolic rate and temperature
    2. Variation in the mutation rate within genomes
        1. Direction of transcription and replication
        2. Replication timing
        3. Coding regions and expression level
        4. GC composition
6. New biological models
    1. The ecological importance of phytoplankton
    2. Presentation of the species
        1. Choice of biological models
        2. The Mamiellophyceae
        3. The
            1. General presentation
            2. Horizontal gene transfers
7. Objectives of the thesis

CHAPTER 2: EFFECTS OF MUTATIONS ON FITNESS

CHAPTER 3: THE MUTATION RATE IN THE MAMIELLOPHYCEAE

CHAPTER 4: HORIZONTAL GENE TRANSFERS: THE CASE OF RCC4223

CHAPTER 5: IMPACT OF THE MUTATION RATE ON BIOTECHNOLOGIES

CHAPTER 6: DISCUSSION AND CONCLUSION
1. Fitness variation independent of mutations
    1. Phenotypic plasticity
    2. Bacteria present in O. tauri cultures
    3. The role of structural variation in the phenotype
2. Limits to the estimation of the mutation rate
3. Perspectives for EAMs
4. General conclusion

Appendices

Lists of figures and tables

Bibliography

Abstract

List of abbreviations

a: Effect of a mutation on fitness
ADN: Deoxyribonucleic acid (DNA)
ARN: Ribonucleic acid (RNA)
CV: Variation in the effect of mutations
ΔV: Change in the variance of the fitness data
ΔM: Mean change in fitness per generation
EAM: Mutation accumulation experiment (expérience d'accumulation de mutations)
G: Genome size
GCeq: Equilibrium GC content of the genome
Ge: Coding genome size
GxE: Genotype-environment interactions
HGT: Horizontal gene transfer
Mb: Megabase
MMR: Mismatch repair
Ne: Effective population size
OmV1: Ostreococcus mediterraneus Virus 1
PFGE: Pulsed-field gel electrophoresis
R1: Mutation rate from GC to AT
R2: Mutation rate from AT to GC
RCC: Roscoff culture collection
ROS: Reactive oxygen species
TCR: Transcription-coupled repair
U: Mutation rate per genome
Uc: Karyotypic mutation rate per genome
Ud: Deleterious mutation rate per genome
µ: Mutation rate per nucleotide

CHAPTER 1:

INTRODUCTION

1. General introduction

Since Charles Darwin published On the Origin of Species and the principle of natural selection in 1859, generations of biologists have studied the fundamental questions surrounding the evolution and diversity of life. At that time genetics was unknown, and Darwin was unaware of the mechanisms that generate the variability and diversity on which natural selection acts. Mendel's laws were rediscovered in 1900, and in 1902 Walter Sutton proposed the chromosome theory of heredity. The existence of mutations was demonstrated in 1911 by Thomas Morgan through experiments on Drosophila. Mutations are the engine of evolution and underlie the adaptive potential of species, because they are the main source of the diversity on which selection can act. Biologists have therefore long been interested in the roles of mutations, and the discoveries of the early 20th century led to the modern evolutionary synthesis, in particular through the work of Sewall Wright, John B. S. Haldane, Hermann J. Muller and Julian Huxley (Haldane, 1949, 1937; Muller, 1928; Wright, 1932). The elucidation of the structure of DNA (Watson and Crick, 1953) opened the way to sequencing technologies, which make it possible to observe directly the appearance of mutations in a genome, their frequencies and their distributions. Their effects on survival have also been explored (Eyre-Walker and Keightley, 2007; Haldane, 1937; Muller, 1950) in order to understand the evolutionary and adaptive processes of living organisms. Kimura's neutral theory of evolution, proposed in the 1960s, brought a new view of evolution by putting forward chance, in the form of genetic drift, as a force as important as selection (Kimura, 1991, 1987, 1968). Genetic drift is the random variation of allele frequencies in a population (Charlesworth, 2009), independent of selection or migration. Drift is stronger in small populations, and therefore in populations of low effective size (Wright, 1931), and can act against natural selection (Charlesworth, 2009; Willi et al., 2006). Mutations are subject to these evolutionary forces, and the mutation rate itself is subject to natural selection and to the randomness of genetic drift.

2. The stakes of mutation research

The diversity we observe on Earth within the three domains of life (bacteria, archaea and eukaryotes) results from the processes of selection and mutation. A mutation is an alteration of the DNA molecule, at the level of a single site or of a chromosome. This alteration can be the replacement of one nucleotide by another, an insertion or deletion of sequence, a break, a duplication, a chromosomal rearrangement or another modification of the DNA. Two origins of mutations can be distinguished: replication errors on the one hand, and mutagenic factors on the other (ultraviolet radiation, oxidative stress or radioactivity, for example); see the review by Maki (2002) and Figure 1.

Mutations are a major issue for research in biology and medicine. In fundamental research, they are studied to answer central questions about evolution and the adaptive capacities of species. Genetic diversity, which stems in part from mutations, is studied in ecology for the conservation of threatened species (Ellegren and Galtier, 2016). In medicine, mutations are studied because of their involvement in various genetic diseases and cancers (Ding et al., 2015; Salk et al., 2010). Two points are of particular interest to evolutionary biologists and to the scientific community in general. The first is how mutations affect the fitness of organisms, that is, their capacity to survive and reproduce. The effect of a mutation is then defined as advantageous (fitness increases), neutral (fitness does not change) or deleterious (fitness decreases). The second is to understand how frequently mutations arise, and which factors influence the mutation rate and its possible variation at different scales.

We will therefore first review the state of the art on the effects of mutations on fitness and their role in adaptation, followed by a non-exhaustive list of factors that partly explain variation in the mutation rate between and within genomes. This thesis addresses both questions by studying the mutation rate and the effect of mutations on fitness, using five species of green algae (Chlorophyta, Plantae, Eukaryota) as biological models.

[Figure 1: flow diagram. Labels: intact DNA; endogenous or exogenous mutagens; no damage; lesion; pre-replication DNA damage; correct repair, incorrect or partial repair, absence of repair; intact DNA, mutation, lesion; replication, replication error, replication arrest; post-replication outcome: no mutation, mutation, lethal mutation.]

Figure 1. Mutation processes, modified from Gao et al. (2016). Mutations arise from replication errors or from mutational factors independent of replication. In both cases, repair mechanisms exist to correct some of these mutations. A mutation that is not repaired may or may not be transmitted to the offspring, depending on the organism's mode of reproduction.

3. Variation in the mutation rate

The notion of a molecular clock emerged in the early 1960s (Bromham and Penny, 2003). The molecular clock hypothesis posits that mutations arise in a genome at a constant, continuous rate. The molecular clock has been used to date phylogenies, but later studies invalidated the hypothesis by showing variation in the mutation rate between taxa (Britten, 1986; Bromham, 2009) and within taxa (Bousquet et al., 1992; Bromham et al., 1996). In addition, current data show substantial variation within a single species, for example according to the mode of reproduction, with a higher mutation rate in asexual populations (Henry et al., 2012; Neiman et al., 2010). This is also the case among strains of Chlamydomonas reinhardtii (Ness et al., 2015b), with a tenfold difference between the lowest and highest mutation rates. Beyond this variation between and within species, the mutation rate also varies within genomes, as in the mitochondrial genome of angiosperms (Laroche et al., 1997), or between organelle and nuclear DNA, as in Drosophila or Caenorhabditis elegans (Denver et al., 2000; Haag-Liautard et al., 2008; Smith, 2015; Xu et al., 2012). A review in mammals describes the many variations in mutation rate within a genome (Hodgkinson and Eyre-Walker, 2011), from the scale of adjacent sites to that of whole chromosomes. We know, for example, that in chimpanzees the mutation rate is higher on the Y sex chromosome than on the other chromosomes (Consortium, 2005; Ebersberger et al., 2002). Within chromosomes, certain trinucleotides have been shown to mutate preferentially (Ness et al., 2015b; Sung et al., 2015), and regions with short repeated sequences mutate faster than the rest of the genome (Ma et al., 2012; Tesson et al., 2013). This variation highlights the importance of understanding which evolutionary forces act on the mutation rate and make it vary between and within genomes. Results of this kind are obtained in part by studying the mutation rate directly, through mutation accumulation experiments (EAM). This is the approach used in this thesis on the five model species.

4. Mutation accumulation experiments

4.1. Terumi Mukai's first experiments

The first estimates of the mutation rate date from the 1960s, with the mutation accumulation experiments (EAM) of Terumi Mukai on Drosophila (Keightley and Eyre-Walker, 1999; Mukai, 1964), although the first experiments on mutations had been developed several decades earlier by Muller (Crow and Abrahamson, 1997; Muller, 1927). At that time the mutation rate could not be estimated by sequencing, which did not yet exist; instead, the deleterious mutation rate (Ud) was estimated from fitness data. The principle of a mutation accumulation experiment is to maintain daughter lines derived from a mother line for a given number of generations, and to compare the daughter lines at the end of the experiment with the ancestral type (Figure 2). During the experiment, a series of bottlenecks is needed to keep the effective size (Ne) of the lines as low as possible. The effective size of a population, a notion introduced by Sewall Wright (Wright, 1931), is the part of the population that takes part in reproduction, or the theoretical size of an ideal population (that is, a population with random mating, or panmixia) that would have the same diversity as the real population. The larger the effective population size, the more efficient selection is. Conversely, the smaller the effective size, the stronger genetic drift is. Reducing the effective size of the lines therefore removes natural selection as far as possible and allows the mutation rate to be estimated before selection. We thus have access to all mutations (except lethal ones), which defines the spontaneous mutation rate (Drake et al., 1998). This is why studying the existing diversity of a population or species is not sufficient to estimate the spontaneous mutation rate: only the diversity remaining after selection is measured. In Drosophila, because of diploidy and sexual reproduction, the mother line is generally made inbred and homozygous before the experiment starts (Keightley et al., 2014a, 2014b, 2009).
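The link between effective size and drift can be illustrated with a minimal sketch. It assumes a haploid Wright-Fisher population, a standard idealised model that the text does not name; the function name and the example population sizes are hypothetical.

```python
def drift_variance(p: float, n: int) -> float:
    """Variance of an allele frequency after one generation of binomial
    sampling in a haploid Wright-Fisher population of n individuals
    (assumed model; p is the current allele frequency)."""
    if n <= 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n > 0 and 0 <= p <= 1")
    return p * (1.0 - p) / n


# Hypothetical sizes: a single-cell bottleneck versus larger populations.
for n in (1, 100, 1_000_000):
    print(n, drift_variance(0.5, n))
```

Under this assumption the sampling variance is largest for a single-individual bottleneck, which is the regime an EAM aims for so that drift, rather than selection, decides the fate of new mutations.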

[Figure 2: diagram. Labels: control (ancestral type); ancestral type; mutation; N lines.]

Figure 2. Diagram of a mutation accumulation experiment. The lines are maintained through a succession of bottlenecks. At the end of the experiment, a comparison of fitness or of genomes makes it possible to study the effect of mutations and their distribution in the genome.

In this way, variation in fitness due to the recombination of different alleles during meiosis is avoided. Since all loci carry the same allele, the assumption is made that only mutations create variation in fitness. From fitness data estimated by reproductive success (number of eggs and number of hatchings), Mukai developed a statistical method and calculated the mutation parameters:

\[ \Delta M = U_d\,E(a), \qquad \Delta V = U_d\,E(a^2) = U_d\,\left[E(a)^2 + \mathrm{Var}(a)\right] \]

\[ U_d \geq \frac{\Delta M^2}{\Delta V}, \qquad E(a) \leq \frac{\Delta V}{\Delta M} \]

where a is the effect of a mutation, ΔV the change in the variance of the fitness data per generation and ΔM the mean change in fitness per generation. ΔV and ΔM can be estimated directly by regression on the fitness data measured during the experiment. The increase in the variance of fitness results from mutations that change fitness, whether they are advantageous or deleterious. Mukai estimated Ud = 0.34 deleterious mutations per genome per generation, the first estimate of a deleterious mutation rate in an organism, and E(a) = 0.027 as the mean reduction in fitness caused by a mutation. This rate is a minimum mutation rate, because it only accounts for deleterious mutations (only the decline in fitness enters the calculation of the mutation parameters). He also estimated the lethal mutation rate at 0.006 mutations per generation. Following Mukai's method, another method based on maximum likelihood (Fry et al., 1999; Keightley, 1994; Keightley and Bataillon, 2000; Keightley and Caballero, 1997) was developed. Like Mukai's method, it uses fitness data from an EAM, but it yields estimates with a lower variance.

4.2. The effect of mutations on fitness

4.2.1. Terumi Mukai's successors

Mukai's statistical method has been used by various biologists to estimate Ud in different model organisms. A review was published in 2009 (Halligan and Keightley, 2009). In general, fitness declines over the generations in the mutant lines of every species tested, including the following examples:
- Drosophila melanogaster (Fernández and López-Fanjul, 1996; Fry, 2004, 2001, Fry et al., 1999, 1996; Fry and Heinsohn, 2002; Houle et al., 1992; Huey et al., 2003; Keightley, 1994; Schrider et al., 2013; Wang et al., 2014);
- Caenorhabditis elegans (Ajie et al., 2005; Baer et al., 2006, 2006; Davies et al., 1999; Estes et al., 2004; Katju et al., 2014; Vassilieva et al., 2000; Vassilieva and Lynch, 1999);
- Saccharomyces cerevisiae (Korona, 1999; Wloch et al., 2001; Zeyl and DeVisser, 2001);
- Daphnia pulex (Deng and Lynch, 1997; Korona, 1999, 1999; Latta et al., 2013; Schaack et al., 2013);
- Arabidopsis thaliana (Deng and Lynch, 1997; Rutter et al., 2012; Schultz et al., 1999; Shaw et al., 2000);

Other organisms have been studied less by EAM, but more and more data are becoming available across the tree of life: Chlamydomonas reinhardtii (Morgan et al., 2014), Tetrahymena thermophila (Brito et al., 2010), Dictyostelium discoideum (Hall et al., 2013) and Escherichia coli (Cao et al., 2014; Kibota and Lynch, 1996).
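The estimates of Ud in these studies rest on the method attributed to Mukai, which in its usual Bateman-Mukai form bounds the deleterious rate by ΔM²/ΔV and the mean effect by ΔV/ΔM. A minimal sketch follows; the function name is hypothetical, and the inputs are illustrative values built from the figures quoted for Mukai (Ud = 0.34, E(a) = 0.027) under the added assumption that all mutations have the same effect, in which case the bounds are reached exactly.

```python
def bateman_mukai(delta_m: float, delta_v: float) -> tuple[float, float]:
    """Return (minimum U_d, maximum E(a)) from the per-generation decline in
    mean fitness (delta_m, taken as a positive magnitude) and the
    per-generation increase in among-line variance (delta_v)."""
    if delta_m <= 0 or delta_v <= 0:
        raise ValueError("delta_m and delta_v must be positive")
    u_min = delta_m ** 2 / delta_v   # lower bound on the deleterious rate
    a_max = delta_v / delta_m        # upper bound on the mean effect
    return u_min, a_max


# Illustrative inputs only: equal effects assumed, so Var(a) = 0.
u_d, a = 0.34, 0.027
u_min, a_max = bateman_mukai(u_d * a, u_d * a ** 2)
print(u_min, a_max)
```

When effects vary between mutations, the same inputs give a value below the true Ud and above the true E(a), which is why the rate obtained this way is read as a minimum.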

On this basis, it is argued that the majority of mutations are deleterious, that is, they reduce the capacity to survive. The impact of deleterious mutations in a population or an individual is defined as the mutation load, or genetic load (Agrawal and Whitlock, 2012; Charlesworth et al., 1990): the difference between optimal fitness and actual fitness. Deleterious mutations are normally removed by natural selection, but drift can maintain or fix them in a population. The effect of deleterious mutations on populations has been widely explored (Agrawal and Whitlock, 2012; Charlesworth and Charlesworth, 1998; Kondrashov, 1995, 1988, Lande, 1994, 1988; Lynch et al., 1999), as has the statistical estimation of mutational parameters in natural populations (Deng et al., 2002; Deng and Lynch, 1996; Li and Deng, 2005). A population with a small effective size is more sensitive to deleterious mutations because natural selection is less efficient (Eyre-Walker et al., 2002; Higgins and Lynch, 2001; Houle, 1992; Lande, 1998; Lynch et al., 1995; Lynch and Gabriel, 1990; Willi et al., 2006). If selection is too weak, it cannot efficiently purge deleterious mutations. This can affect threatened species with small population sizes: drift and deleterious mutations can accentuate the loss of diversity and viability of a population. However, the "optimal" mutation rate results from a trade-off between the cost of deleterious mutations and the benefit of advantageous mutations (Wielgoss et al., 2013). The effective size of a population therefore plays an essential role in the strength of selection and the adaptive capacity of that population (Gossmann et al., 2012). Thus, the probability that a mutation becomes fixed in a population depends on the intensity of drift and selection, and on the effect of the mutation on survival (neutral, advantageous or deleterious).
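How drift and selection jointly set the fate of a mutation can be sketched with Kimura's diffusion approximation for the fixation probability. This formula is a standard population-genetic result that the text does not spell out; the sketch assumes a diploid population whose census size equals its effective size, and the function name and parameter values are hypothetical.

```python
import math


def fixation_probability(s: float, n_e: float) -> float:
    """Fixation probability of a new mutation with selection coefficient s
    in a diploid population of effective size n_e (census size assumed
    equal to n_e), using Kimura's diffusion approximation."""
    p0 = 1.0 / (2.0 * n_e)          # initial frequency of a single new copy
    if s == 0.0:
        return p0                   # neutral case: fixation by drift alone
    try:
        return math.expm1(-4.0 * n_e * s * p0) / math.expm1(-4.0 * n_e * s)
    except OverflowError:
        return 0.0                  # strongly deleterious in a very large population


# A slightly deleterious mutation (hypothetical s = -0.001), compared with
# the neutral expectation 1 / (2 * n_e) in a small and a large population.
for n_e in (10, 10_000):
    ratio = fixation_probability(-0.001, n_e) * 2 * n_e
    print(n_e, ratio)
```

In the small population the deleterious mutation fixes almost as often as a neutral one, whereas in the large population its fixation is essentially excluded, which matches the argument that selection purges deleterious mutations efficiently only when the effective size is large.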

4.2.2. Adaptive landscape and the distribution of mutational effects

As we have just seen, the majority of mutations appear to be deleterious, but some are neutral or advantageous (Hall et al., 2008; Joseph and Hall, 2004). The distribution of fitness effects of mutations depends in part on the fitness level of a genome in a given environment. Assuming that there is a maximum possible fitness in an environment, the proportion of deleterious mutations increases as the fitness of the genome approaches this maximum. The set of possible fitness values is defined as the adaptive landscape (Orr, 2005; Petren, 2013), a notion introduced by Sewall Wright and Fisher (Mousseau and Roff, 1987; Edwards, 2000; Zhang, 2012). There are many theoretical models of adaptive landscapes, in particular the "single-peak" landscape (Wright, 1932), the "rugged" landscape with valleys (Martin and Wainwright, 2013; Steinberg and Ostermeier, 2016) and the "holey" landscape (Gavrilets, 1997).

The "single-peak" model, the simplest, is a fitness peak with one possible maximum (Figure 3). In this case the fitness of the genome, as a function of mutations and epigenetics (Kaity et al., 2008), moves along the peak between the maximum and the minimum. The closer the fitness of the genome is to the maximum, the more likely mutations are to be deleterious; conversely, a genome with low fitness will experience more advantageous mutations (Tenaillon et al., 2016). Finally, if the fitness of the genome is too low, it may simply be eliminated by selection.
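The single-peak argument can be made concrete with a one-dimensional sketch in the spirit of Fisher's geometric model. The model is an assumption introduced here for illustration, not taken from the text: a single trait, fitness that decreases with distance from one optimum, and mutational effects drawn from a normal distribution with standard deviation `sigma`. The function name and numerical values are hypothetical.

```python
import math


def fraction_beneficial(distance: float, sigma: float) -> float:
    """Probability that a random mutation brings the trait closer to the
    optimum, for a trait at the given distance from a single fitness peak
    and mutational effects d ~ Normal(0, sigma**2). A mutation is
    beneficial when |distance + d| < |distance|, which has probability
    0.5 * erf(sqrt(2) * |distance| / sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.erf(math.sqrt(2.0) * abs(distance) / sigma)


# Hypothetical distances from the optimum, with sigma = 0.5.
for distance in (0.0, 0.1, 1.0, 5.0):
    print(distance, fraction_beneficial(distance, 0.5))
```

Under these assumptions the share of advantageous mutations falls to zero at the peak and approaches one half far from it, which is the qualitative pattern described for the single-peak landscape.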

The second model is the "holey" model (Gavrilets, 1997), in which maximal fitness is defined as the floor of the adaptive landscape. Advantageous mutations merely keep the genome at this level. The floor is punctured by fitness holes, into which the genome "falls" when deleterious mutations occur.

Finally, the most widely accepted model, which has already been demonstrated in bacteria (Nahum et al., 2015) and in species such as a teleost of the genus Cyprinodon (Martin and Wainwright, 2013), is the "rugged" landscape. Here there are many fitness peaks, with valleys or plateaus across which the genome moves. Moreover, a valley between fitness peaks can lead to the differentiation of two populations, hence the importance of this hypothesis in evolution. Under this model, a population with a small effective size can reach a higher fitness peak than a population with a larger effective size (Rozen et al., 2008). Because selection is efficient, a large population quickly reaches the nearest fitness peak. A population with a small effective size, by contrast, may reach a higher peak because genetic drift moves it across the fitness landscape.

In all cases, whatever the model adopted, it is mainly mutations that raise or lower the fitness of the genome on the adaptive landscape, and advantageous mutations that give access to higher fitness. Furthermore, an adaptive landscape is defined for a specific genotype and environment. The position of a genome is therefore the result of the genotype-environment interaction and of the genetic background. The adaptive landscape thus changes following an environmental change (Matuszewski et al., 2014) or along an environmental gradient (Laughlin and Messier, 2015), with an adaptive trade-off (Elena and Lenski, 2003) between environments (Figure 4).

Beyond advantageous or deleterious mutations, neutral mutations are just as important, because the distribution of fitness effects of mutations can change. The impact of advantageous mutations can increase or decrease, and mutations that are neutral under one set of conditions can have an effect under others. This change in the fitness effects of neutral mutations highlights the importance of existing variation as a basis for immediate adaptation to environmental change (Barrett and Schluter, 2008; Hermisson and Pennings, 2005). This is known as standing genetic variation.

[Figure 3: plot. Vertical axis: fitness of the genome; horizontal axis: position of the genome. Legend: neutral, advantageous and deleterious mutations.]

Figure 3. Representation of the fitness landscape. The fitness of the genotype (in blue) moves and changes according to the effect of the mutations that arise. The closer the fitness of the genotype is to the maximum, the more likely mutations are to be deleterious. Likewise, an environmental change can shift the genotype on the fitness landscape. The difference between the position of the genotype and the fitness maximum is defined as the mutation load.

[Figure 4: plot of fitness for three genotypes across two environments.]

Figure 4. Change in the fitness of a genotype between environments, illustrating how the effect of mutations and variability differ between environments. The figure is taken from Elena and Lenski (2003). Genotype 1 is specialised in environment A but poorly adapted to B, and the reverse holds for genotype 2, whereas genotype 3 is a generalist.

4.2.3. Genotype-environment interaction

4.2.3.1. Changes in the effect of mutations

Several studies of mutant lines from mutation accumulation experiments have explored the effects of environmental change on the distribution of fitness effects of mutations by comparing the fitness of the same line across conditions, notably in Drosophila (Fry et al., 1996; Fry and Heinsohn, 2002; Kondrashov and Houle, 1994; Korona, 1999), the nematode Caenorhabditis elegans (Baer et al., 2006) and the plant Arabidopsis thaliana (Rutter et al., 2012). As seen in the previous section on the adaptive landscape, fitness is expected to change with conditions. In the laboratory, several easily controlled variables can be tested, for example resource availability (Chang and Shaw, 2003) and light (Kavanaugh and Shaw, 2005) in Arabidopsis thaliana. From such studies, mutation parameters can be estimated as for EAMs and compared, in order to formulate hypotheses about the biological implications of mutations, of which there are three (Martin and Lenormand, 2006). First, a change in U (number of mutations per genome per generation) reflects a difference in mutational effects, with mutations that have a detectable effect in one condition but are neutral (or undetectable, depending on the fitness trait considered) in another. Changes in gene expression between two conditions are one possible example. Second, a change in a (mean effect of a mutation) indicates a change in the intensity of selection, because the effects of a mutation vary. The stronger the effect of a mutation, the more selection can influence its frequency in a population: stronger counter-selection is expected against a deleterious mutation with a larger impact on fitness. Finally, if CV (the variation in the effect of mutations) changes, an effect of stress on mutations is expected. If a population is adapted to a condition and close to the fitness maximum, the effect of mutations will most often be deleterious. Under stress, however, the population is no longer at its fitness optimum, which changes the effect of new mutations.

4.2.3.2. Stress and hypermutators

Under stress, it is traditionally accepted that deleterious mutations have a stronger impact on fitness (Elena and de , 2003). This was found, for example, in a study of the impact of stress on lines from mutation accumulation experiments in Chlamydomonas reinhardtii (Kraemer et al., 2015). However, this observation is not systematic, and several articles suggest that stress does not affect the magnitude of the impact of deleterious mutations (Andrew et al., 2015; Kishony and Leibler, 2003). The impact of stress on mutational effects can be characterised in three ways (Elena and , 2003): first, the mutation may be unconditionally deleterious, with a deleterious effect that increases with stress; second, the mutation may be conditionally neutral, that is, neutral in some conditions and deleterious in others; third, the mutation may be conditionally beneficial, that is, advantageous in some conditions but deleterious or neutral in others.

Under stress, mutator alleles that significantly increase the mutation rate have been found in bacteria (Couce et al., 2013; Taddei et al., 1997). Such a mechanism is advantageous in an unfavourable environment: new mutations arise more frequently, which increases the probability of advantageous mutations and therefore of adaptation (Giraud et al., 2001; Sniegowski et al., 1997; Tenaillon et al., 1999). No mutator alleles are known in eukaryotes, but in Drosophila individuals that are less well adapted to a new environment appear to have a higher mutation rate than others (Sharp and Agrawal, 2012). The same observation has been made in the plant A. thaliana (Jiang et al., 2014) and in yeast (Shor et al., 2013). This increase is, however, smaller than in bacterial hypermutators.

5. Direct estimates of the mutation rate

5.1. Intergenomic variation in the mutation rate

5.1.1. Genome size

Today, next-generation sequencers make it possible to estimate the mutation rate directly by comparing genomes from the start and the end of a mutation accumulation (MA) experiment (see Table 1 for current estimates). An online database is also available (Wei et al., 2014). These estimates have led to several hypotheses about the biological and ecological factors that shape the evolution of the mutation rate.

Among the first articles, Drake proposed in 1991, from data on unicellular organisms, that the number of mutations per genome is constant (Drake, 1991; Drake et al., 1998). This constant would be U = 0.0033 mutations per genome per replication in microorganisms. More precisely, the claim is that the genomic mutation rate U varies much less than either the per-nucleotide mutation rate µ or genome size. As a result, the per-nucleotide mutation rate decreases as genome size increases, so that the number of mutations U per genome stays constant at each replication (Figure 5). If the mutation rate did not decrease as genome size increased, the number of mutations per genome would grow. This would raise the probability that deleterious mutations appear at each replication, which can compromise survival. This relationship does not seem to hold for eukaryotes, however, where the mutation rate increases with genome size (Smeds et al., 2016; Sung et al., 2012a).
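The relationship between µ, genome size and U described above can be illustrated with a short Python sketch. It assumes that U is simply the product of the per-nucleotide rate µ and the genome size G in nucleotides, which is how the U column of Table 1 appears to be obtained; the function names are hypothetical and chosen for illustration only.

```python
# Minimal sketch: U = mu * G, with G converted from Mb to nucleotides.
# Function names are illustrative; the constant is Drake's value quoted in the text.

DRAKE_U = 0.0033  # mutations per genome per replication (Drake, 1991)


def mutations_per_genome(genome_size_mb: float, mu_per_site: float) -> float:
    """Return U, the expected number of mutations per genome per generation."""
    return mu_per_site * genome_size_mb * 1e6


def mu_expected_under_drake(genome_size_mb: float, u: float = DRAKE_U) -> float:
    """Return the per-nucleotide rate that would keep U constant for a genome of this size."""
    return u / (genome_size_mb * 1e6)


# Two rows taken from Table 1: (genome size in Mb, mu per nucleotide per generation)
examples = {
    "Arabidopsis thaliana Col-0": (157.0, 7.00e-9),
    "Escherichia coli 3k": (4.6, 1.88e-10),
}

for name, (size_mb, mu) in examples.items():
    u = mutations_per_genome(size_mb, mu)
    mu_drake = mu_expected_under_drake(size_mb)
    print(f"{name}: U = {u:.4f}, mu expected under Drake's rule = {mu_drake:.2e}")
```

Under these assumptions the E. coli estimate falls within an order of magnitude of Drake's constant, whereas the A. thaliana estimate is far above it, in line with the remark that the rule does not seem to hold for eukaryotes.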

Table 1. Spontaneous mutation rates estimated from mutation accumulation experiments. Only mutation rate estimates obtained by whole-genome sequencing are listed. G is the genome size in Mb, µ is the mutation rate per nucleotide per generation and U is the number of mutations per genome per generation. The table excludes mutation rates measured in artificially mutant lines (in which a DNA repair mechanism has been removed) or from pedigrees, as in humans or mice (these data are available in chapter 3, Table S10).

Species                                G       µ          U        Reference
Arabidopsis thaliana Col-0             157.0   7.00E-09   1.0990   (Ossowski et al., 2010)
Caenorhabditis elegans N2              100.3   2.50E-09   0.2508   (Denver et al., 2009)
Caenorhabditis elegans N2              100.3   3.10E-09   0.3109   (Denver et al., 2009)
Caenorhabditis elegans N2              100.3   1.33E-09   0.1334   (Denver et al., 2012)
Caenorhabditis elegans PB306           100.3   1.62E-09   0.1625   (Denver et al., 2012)
Caenorhabditis briggsae PB800          108.4   1.44E-09   0.1561   (Denver et al., 2012)
Caenorhabditis briggsae HK104          108.4   1.23E-09   0.1333   (Denver et al., 2012)
Pristionchus pacificus PS312           133.1   2.0E-09    0.2663   (Weller et al., 2014)
Drosophila melanogaster Madrid         122.0   3.50E-09   0.4270   (Keightley et al., 2009)
Drosophila melanogaster Florida        122.0   5.49E-09   0.6698   (Schrider et al., 2013)
Drosophila melanogaster Florida        122.0   2.80E-09   0.3416   (Keightley et al., 2014a)
Heliconius melpomene                   273.8   2.90E-09   0.7940   (Keightley et al., 2014b)
Chlamydomonas reinhardtii CC-2937      112     2.08E-10   0.0233   (Ness et al., 2012)
Chlamydomonas reinhardtii CC-124       112     6.76E-11   0.0076   (Sung et al., 2012a)
Chlamydomonas reinhardtii              112     9.63E-10   0.1079   (Ness et al., 2015b)
Paramecium tetraurelia d4-2            72.1    1.94E-11   0.0014   (Sung et al., 2012b)
Saccharomyces cerevisiae FY10          12.3    3.30E-10   0.0041   (Lynch et al., 2008)
Saccharomyces cerevisiae               12.3    1.67E-10   0.0021   (Zhu et al., 2014)
Schizosaccharomyces pombe ED668        12.6    2.00E-10   0.0025   (Farlow et al., 2015)
Schizosaccharomyces pombe              12.6    1.70E-10   0.0021   (Behringer and Hall, 2015)
Dictyostelium discoideum AX4           34.2    2.90E-11   0.0010   (Saxer et al., 2012)
Burkholderia cenocepacia HI2424        7.7     1.33E-10   0.0010   (Dillon et al., 2015)
Escherichia coli 3k                    4.6     1.88E-10   0.0009   (Lee et al., 2012)
Escherichia coli 6k                    4.6     2.45E-10   0.0011   (Lee et al., 2012)
Mesoplasma florum L1                   0.8     9.78E-09   0.0078   (Sung et al., 2012a)
Mycobacterium tuberculosis H37Rv       4.4     2.58E-10   0.0011   (Ford et al., 2011)
Salmonella typhimurium LT2             4.8     7.00E-10   0.0034   (Lind and Andersson, 2008)
Bacillus subtilis                      4.2     3.28E-10   0.0014   (Sung et al., 2015)
Pseudomonas aeruginosa                 6.6     7.92E-11   0.0005   (Dettman et al., 2016)
Deinococcus radiodurans BBA816         3.2     4.99E-10   0.0016   (Long et al., 2015a)
Mycobacterium smegmatis                7.0     5.27E-10   0.0036   (Kucukyildirim et al., 2016)

[Figure 5: mutation rate (10^-11 to 10^-6, log scale) plotted against genome size in Mb (10^-3 to 10^2, log scale), with points grouped as viruses, eukaryotes, bacteria and archaea.]

Figure 5. Relationship between mutation rate and genome size. Figure taken from Sung et al. (2012a). In microorganisms, the mutation rate decreases as genome size increases. As a result, deleterious mutations arise infrequently in the largest genomes.

5.1.2. Effective population size (Ne)

Another key factor is effective population size, which determines the strength of selection acting on the mutation rate (Charlesworth, 2009; Lanfear et al., 2014), through the notion of the drift barrier (Martincorena and Luscombe, 2013; Sung et al., 2012a). According to Lynch, the mutation rate is lower in microorganisms because their large effective population size allows efficient selection (Lynch, 2010a). The mutation rate could nevertheless be expected to be lower in multicellular organisms because of the somatic damage caused by deleterious mutations (Lynch, 2008), in particular cancers (Cowin et al., 2010; Knudson, 2000). In multicellular organisms, the mutation rate varies between tissues (Lynch, 2010a), and the mutation rate in the germline is lower than in somatic cells (Lynch and Hagner, 2015), which limits the transmission of deleterious mutations to subsequent generations. The cost of deleterious mutations drives selection towards a low mutation rate, with a theoretical mutation rate defined as optimal (Figure 6). This optimal mutation rate is never reached, however, because of genetic drift. There is therefore a limit, called the "drift barrier", which prevents selection from reaching the optimal mutation rate, itself a trade-off between adaptation, the fitness cost of deleterious mutations and the cost of replication (Martincorena and Luscombe, 2013). In conclusion, the actual mutation rate is closer to the optimal mutation rate in organisms with a large effective population size (such as microorganisms) than in organisms with a smaller effective population size, such as metazoans (Figure 7).

[Figure 6: fitness cost (y-axis) plotted against mutation rate (x-axis), showing the cost of deleterious mutations, the cost of replication, the optimal mutation rate, the drift limit and the observed mutation rate.]

Figure 6. The drift barrier and the cost of replication. Figure taken from Martincorena and Luscombe (2013). Drift limits the selection acting on the mutation rate, which cannot reach the optimal mutation rate, defined as the trade-off between the cost of deleterious mutations and the cost of replication fidelity. Species with a large effective population size are more likely to approach the optimal mutation rate.

[Figure 7: mutation rate (10^-10 to 10^-7, log scale) plotted against effective population size (10^4 to 10^8, log scale) for M. domesticus, H. sapiens, A. thaliana, N. crassa, C. elegans, D. melanogaster, P. falciparum, S. cerevisiae and C. reinhardtii.]

Figure 7. Relationship between effective population size and mutation rate. Figure taken from Ness et al. (2012). Effective population size determines the efficiency of selection, and therefore the ability to reach the lowest possible mutation rate and so limit the appearance of deleterious mutations.

5.1.3. Generation time

Because generation time varies with the biological and ecological characteristics of species, the number of mutations arising per unit of time varies as well. Studies have tried to understand the influence of generation time on the mutation rate (Laird et al., 1969), notably in vertebrates (Martin and Palumbi, 1993; Mooers and Harvey, 1994). In general, the mutation rate is observed to decrease as generation time increases (Table 2). More recent studies on molluscs (Thomas et al., 2010) and bacteria (Weller and Wu, 2015) tend to confirm this hypothesis. This implies that species with short generation times have a greater capacity to generate new mutations, and therefore to adapt.

Table 2. Correlation between generation time and mutation rate. Data from Martin and Palumbi (1993). The mutation rate decreases as generation time increases and as metabolic rate decreases.

Species                    Substitutions per site   Metabolic rate   Generation time
                           per billion years        (O2/kg/h)        (days)
Owl monkey (douroucouli)   2.1                      450              880
Spider monkey              1.9                      415              1 700
Macaque                    1.8                      430              1 095
Gibbon                     1.7                      370              3 410
Orangutan                  1.2                      230              4 290
Gorilla                    1.2                      200              3 438
Chimpanzee                 1.2                      220              3 190
Human                      1.1                      210              6 200
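The trend stated in the caption of Table 2 can be checked with a short Python sketch. A plain Pearson correlation is used here as an illustrative choice, not as the analysis performed in the cited study, and it ignores the fact that related species are not statistically independent. The species labels and variable names are assumptions made for readability.

```python
from math import sqrt

# Rows of Table 2: (species, substitutions per site per billion years,
#                   metabolic rate in O2/kg/h, generation time in days)
TABLE_2 = [
    ("Owl monkey", 2.1, 450, 880),
    ("Spider monkey", 1.9, 415, 1700),
    ("Macaque", 1.8, 430, 1095),
    ("Gibbon", 1.7, 370, 3410),
    ("Orangutan", 1.2, 230, 4290),
    ("Gorilla", 1.2, 200, 3438),
    ("Chimpanzee", 1.2, 220, 3190),
    ("Human", 1.1, 210, 6200),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rates = [row[1] for row in TABLE_2]
metabolic = [row[2] for row in TABLE_2]
generation = [row[3] for row in TABLE_2]

r_generation = pearson(rates, generation)  # expected to be negative
r_metabolic = pearson(rates, metabolic)    # expected to be positive

print(f"substitution rate vs generation time: r = {r_generation:.2f}")
print(f"substitution rate vs metabolic rate:  r = {r_metabolic:.2f}")
```

With only eight primate species the coefficients are indicative at best, but their signs match the caption: a negative association with generation time and a positive one with metabolic rate.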

5.1.4. Metabolic rate and temperature

In addition to the effect of generation time described above, Martin and colleagues showed that the mutation rate increases with metabolic rate, measured by respiration (Martin and Palumbi, 1993). This increase is generally explained by a higher production of reactive oxygen species (ROS), which induce oxidative stress (Baer et al., 2007). When the organism produces too many of them, ROS can cause mutations, in particular through guanine oxidation (Foster et al., 2015) or cytosine deamination (Cooke et al., 2003; Dizdaroglu, 1992; Hurst and Williams, 2000).

Beyond metabolic rate, temperature could also influence the mutation rate (Lewis et al., 2016; Wolfenden, 2014). According to Wolfenden, hydrothermal vents could have accelerated evolution because high temperature increases the rate of enzymatic reactions, such as peptide hydrolysis, and the instability of DNA. Chemical reactions causing irreversible changes would thus have been more frequent, notably the hydrolytic deamination of cytosines and adenines, which become uracils and hypoxanthines (Wolfenden, 2014). Related to the two previous points, the "metabolic theory of ecology" proposes that molecular evolution accelerates with temperature. This translates into more speciation and divergence in tropical regions in several taxa, including plants, amphibians and mammals (Gillman et al., 2010; Mittelbach et al., 2007; Rolland et al., 2014; Wright et al., 2010).

5.2. Intragenomic variation in the mutation rate

5.2.1. Direction of transcription and replication

A study in Bacillus subtilis (Paul et al., 2013) suggests that the mutation rate is heterogeneous depending on the direction of replication and transcription along the DNA strand. An increase in the mutation rate is observed in so-called "replication-transcription conflict" regions. The mutation rate is higher in genes oriented against the direction of replication, which implies that evolutionary rates vary between genes. This replication-transcription conflict has also been reported in other studies of lines derived from mutation accumulation experiments (Schroeder et al., 2016). The mutation rate is also higher in certain regions called "mutation hotspots". In bacteria, notably Escherichia coli, these hotspots have been located at sites where the replication forks of the template and non-template strands stall or collide (Foster et al., 2013). Likewise, replication fidelity differs between the sense and antisense strands, which also influences the mutation rate (Fijalkowska et al., 1998).

5.2.2. Replication timing

Beyond the direction of replication, replication timing leads to a higher mutation rate at the end of replication, a phenomenon that is well known in mammals (Chen et al., 2010; Stamatoyannopoulos et al., 2009). In other words, the longer replication takes, the higher the mutation rate can be in late-replicating regions. Two hypotheses have been put forward to explain this: depletion of the pool of available nucleotides, and reduced efficiency of DNA mismatch repair (MMR). MMR reduces the appearance of mutations during replication (Fukui and Fukui, 2010; Jiricny, 2006; Kunkel and Erie, 2015; Li, 2008). Mutation accumulation experiments with lines made artificially MMR-deficient (Denver et al., 2005; Jiricny, 2006; Lang et al., 2013; Lee et al., 2012; Long et al., 2015b; Sung et al., 2015) show that these repair mechanisms lower the mutation rate by a factor of about 100 and, depending on whether they are active, can change the direction and distribution of mutations. This late-replication mutational bias also exists in bacteria (Hudson et al., 2002) and in yeast (Lujan et al., 2014). The type and rate of replication errors can also depend on the DNA polymerase involved: different DNA polymerases do not have the same fidelity and therefore introduce more or fewer errors (Hestand et al., 2016; Kunkel and Bebenek, 2000). Eukaryotes, for example, have many polymerases with different functions and enzymatic capacities (Hubscher et al., 2002).

5.2.3. Coding regions and expression level

A third source of intragenomic variation is the difference in mutation rate between coding and non-coding regions of the genome, and the level of expression. The mutation rate indeed appears to be lower in highly expressed genes (Eyre-Walker and Bulmer, 1995; Martincorena et al., 2012). Mutation accumulation experiments have confirmed this, notably in yeast (Zhu et al., 2014). Two explanations can be proposed: MMR, which may be more efficient in coding regions (Foster et al., 2015), and transcription-coupled repair (TCR), which repairs DNA in highly expressed regions (Hanawalt and Spivak, 2008). However, several studies in Escherichia coli (Beletskii and Bhagwat, 1996; Chen and Zhang, 2013; Klapacz and Bhagwat, 2002) contradict these results and show that highly expressed genes mutate faster than other genes. One hypothesis is a link between the transcription rate and the mutability of the transcribed region: transcription can interfere with replication (Kim and Jinks-Robertson, 2012), as discussed in the paragraph on the direction of replication and transcription.

5.2.4. GC content

Finally, GC content influences the mutation rate through the ratio of transversions to transitions and the relative proportions of A-T to G-C and G-C to A-T mutations. Hershberg and Petrov showed a mutational bias in bacteria (Hershberg and Petrov, 2010), with a higher proportion of mutations from G-C to A-T nucleotides than of other substitution types. Mutation accumulation experiments generally also show a G-C to A-T mutation bias, for example in Caenorhabditis elegans (Denver et al., 2009), Arabidopsis thaliana (Ossowski et al., 2010), Escherichia coli (Lee et al., 2012) and Salmonella typhimurium (Lind and Andersson, 2008). This observation is not systematic in bacteria, however, with two counter-examples (Dillon et al., 2015; Long et al., 2015a); see Table 3. Some frequent types of mutation, such as cytosine deamination (Coulondre et al., 1978; Fryxell and Zuckerkandl, 2000) and guanine oxidation, are known to induce G-C to A-T mutations (Cooke et al., 2003; Dizdaroglu, 1992). This mutational bias is well known in mammals, where CpG sites, that is, CG dinucleotides, mutate faster than the rest of the genome (Hodgkinson and Eyre-Walker, 2011). The mutation rate at CpG sites is about 10 times higher than at other sites. As a result, CpG dinucleotides occur at only 20% of their expected frequency in the human genome (Lander et al., 2001). The relationship is not that simple, however, because mammals also have regions called "CpG islands", which are very GC-rich (Bird, 1986). In these regions of about 1 kb, the mutation rate is lower than that of CpGs located elsewhere in the genome (Cohen et al., 2011). This is thought to result from the stability of cytosine methylation, which is influenced by the GC richness of adjacent nucleotides (Elango et al., 2008). Given this bias, we might expect the GC content of some genomes to decline over generations. Yet in bacteria some genomes are very GC-rich (close to 70%). This is partly explained by selection for optimal codons that are richer in GC (Hildebrand et al., 2010). Indeed, even synonymous mutations can affect fitness, as shown previously (Glémin, 2010), because of codon usage bias (Ikemura, 1981). Codon usage bias is the preferential use of some codons over others, notably according to the number of sequences encoding the transfer RNA associated with those codons.
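The notion of CpG dinucleotides occurring at a fraction of their expected frequency can be made concrete with a short Python sketch. It assumes the commonly used convention in which the expected number of CpGs is (number of C × number of G) / sequence length; the function name and the toy sequences are illustrative and are not taken from the cited studies.

```python
def cpg_observed_over_expected(sequence: str) -> float:
    """Ratio of observed CpG dinucleotides to the number expected from base composition.

    Expected count = (count of C * count of G) / sequence length.
    Returns 0.0 when the expected count is zero (no C, no G, or empty sequence).
    """
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        return 0.0
    expected = seq.count("C") * seq.count("G") / length
    if expected == 0:
        return 0.0
    observed = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return observed / expected


# Toy sequences for illustration only
print(cpg_observed_over_expected("ACGT"))    # one CpG where 0.25 is expected
print(cpg_observed_over_expected("CCAAGG"))  # C and G present but no CpG
```

On this scale, the 20% figure quoted for the human genome corresponds to a ratio of about 0.2, whereas CpG islands have ratios much closer to 1.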
Biased gene conversion can also explain an increase in the GC content of a genome (Chen et al., 2007; Duret and Galtier, 2009; Galtier et al., 2001; Glémin et al., 2015; Mugal et al., 2013). It is a bias in mismatch repair, generally during recombination, that leads to GC enrichment of the genome. This phenomenon also seems to occur in bacteria (Lassalle et al., 2015), which recombine less than the eukaryotes in which biased gene conversion is usually studied.

In addition, differences in the direction of mutations have also been observed between the nuclear genome and the organelle genomes of Chlamydomonas reinhardtii (Ness et al., 2015a). In Chlamydomonas reinhardtii, the GC content of the organelles (47%) is lower than that of the nuclear genome (63%), which may partly explain this difference.

To predict the number of mutations as a function of genome composition, it is useful to compute the equilibrium GC content (Sueoka, 1962), that is, the GC content of the genome at which there are as many G-C to A-T mutations as A-T to G-C mutations. Because mutations are biased from G-C towards A-T, the mutation rate is generally higher for G and C nucleotides than for A and T. GCeq is computed with the following equations:

\[
R_1 = \frac{(GC \to AT)}{GC_n}, \qquad R_2 = \frac{(AT \to GC)}{AT_n}, \qquad GC_{eq} = \frac{R_2}{R_1 + R_2}
\]

where (GC→AT) is the number of G-C to A-T mutations and GC_n is the number of G and C nucleotides in the genome; (AT→GC) and AT_n are defined in the same way for A and T.

Table 3. The GC-to-AT bias, in the last column of the table, is the ratio of the GC to AT mutation rate to the AT to GC mutation rate. Data are taken from Dillon et al. (2015).

Species (%GC)          A/T->T/A   G/C->C/G   A/T->G/C   G/C->A/T   Bias towards AT
B. cenocepacia (67)    2.67       2.38       12.23      9.95       0.81
E. coli (51)           2.8        2.88       15.38      18.79      1.22
M. florum (27)         15.67      185.36     62.68      1000.97    15.97
H. sapiens (45)        129        295        581        1219       2.1
D. melanogaster (42)   98.06      74.52      149.14     643.95     4.32
S. cerevisiae (38)     3.03       7.82       12.43      27.55      2.22
A. thaliana (36)       43.56      123.63     165.52     1035.38    6.26
C. elegans (35)        17.5       16.89      24.19      101.32     4.19
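The last column of Table 3 and the equilibrium GC content defined earlier can be recomputed with a short Python sketch. It assumes that the A/T->G/C and G/C->A/T columns are per-site rates expressed in the same units, so that they can stand in for the two per-site rates of the GCeq formula; the function and variable names are hypothetical.

```python
# Rates copied from Table 3: species -> (A/T->G/C rate, G/C->A/T rate)
TABLE_3 = {
    "B. cenocepacia": (12.23, 9.95),
    "E. coli": (15.38, 18.79),
    "M. florum": (62.68, 1000.97),
    "H. sapiens": (581, 1219),
}


def at_bias(at_to_gc: float, gc_to_at: float) -> float:
    """Ratio of the GC->AT rate to the AT->GC rate (last column of Table 3)."""
    return gc_to_at / at_to_gc


def gc_equilibrium(at_to_gc: float, gc_to_at: float) -> float:
    """Equilibrium GC content: AT->GC rate divided by the sum of both rates."""
    return at_to_gc / (at_to_gc + gc_to_at)


for species, (at_to_gc, gc_to_at) in TABLE_3.items():
    bias = at_bias(at_to_gc, gc_to_at)
    gc_eq = gc_equilibrium(at_to_gc, gc_to_at)
    print(f"{species}: bias towards AT = {bias:.2f}, GCeq = {100 * gc_eq:.1f}%")
```

Under this assumption, a bias above 1 gives an equilibrium GC content below 50%, and the only species in the table with a bias below 1, B. cenocepacia, is also the one with the highest observed GC content.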

6. New biological models

6.1. The ecological importance of phytoplankton

Phytoplankton is the photosynthetic fraction of plankton, found worldwide in all aquatic ecosystems (de Vargas et al., 2015). The term does not designate a monophyletic group, however, and therefore does not correspond to a natural group of organisms in the evolutionary sense. It includes organisms from different kingdoms, among eukaryotes and bacteria, notably cyanobacteria. Eukaryotic phytoplankton is highly diverse and is found in all kingdoms except the unikonts (which include, among others, fungi and metazoans). Primary production by phytoplankton accounts for about half of the Earth's primary production (Field et al., 1998) and forms the base of most ocean ecosystems (Li, 1994; Worden et al., 2004; Jardillier et al., 2010). Phytoplankton is therefore essential for trophic transfers, and also plays a fundamental role in the planet's biogeochemical cycles (Worden et al., 2015). For example, diatoms are responsible for about 40% of oceanic primary production (Boyd and Newton, 1995) and play a key role in biogeochemical cycles such as carbon export (Boyd and Newton, 1999; Buesseler, 1998).

In this thesis, we focus on the Chlorophyta (Friedl and Rybalka, 2012; Leliaert et al., 2012; Lewis and McCourt, 2004), or "green algae", which comprise 4,300 species within the eukaryotic kingdom of the green lineage (Archaeplastida or Plantae). Photosynthesis arose in the green lineage through the primary endosymbiosis, in which a eukaryotic cell acquired its chloroplast from a cyanobacterium about 1.6 billion years ago (Yoon et al., 2004). The chlorophytes show a wide diversity of life forms (De Clerck et al., 2012): unicellular and multicellular species, freshwater, marine or brackish species, colonial or non-colonial species, and symbiotic species.

6.2. Presentation of the species

6.2.1. Choice of biological models

The aim of this doctoral work is to gain a better understanding of the evolutionary and adaptive processes of eukaryotic pico-phytoplankton. Experiments of this kind now allow substantial progress in research on evolution. This thesis adds to the existing literature on mutation accumulation experiments and makes it possible to assess the adaptive potential of a major ecological group.

To this end, we chose five species of green algae (Figure 9): Ostreococcus tauri RCC4221 (Blanc-Mathieu et al., 2014; Derelle et al., 2006), Ostreococcus mediterraneus RCC2590 (Subirana et al., 2013), Bathycoccus prasinos RCC1105 (Moreau et al., 2012), Micromonas pusilla RCC299 (Worden et al., 2009) and Picochlorum sp. RCC4223. All belong to the class Mamiellophyceae (Marin and Melkonian, 2010), except the genus Picochlorum, which belongs to the class Trebouxiophyceae (Henley et al., 2004); see the phylogenetic tree in Figure 8. Detailed strain records are available in the Appendices. Five reasons guided these choices:

First, all these species are routinely grown in the laboratory, in L1 medium (see the composition of L1 in the appendix) at 20 °C, with an 8 h-16 h day-night cycle. The cultures are clonal but not axenic, that is, they contain bacteria. Mastering the culture conditions is an essential step in setting up experiments and protocols with these species. For the experiments, all strains come from the Roscoff Culture Collection (http://roscoff-culture-collection.org/), a microorganism bank based in France and available for research.

Second, these species have become study models, with an extensive literature that gives access to a range of biological and ecological information. This is especially true of the Mamiellophyceae, for which a few examples from the literature are given here (Abby et al., 2014; Blanc-Mathieu et al., 2014, 2013; Demir-Hilton et al., 2011; Grimsley et al., 2010; Jancek et al., 2008; Palenik et al., 2007; Piganeau et al., 2011b; Rodríguez et al., 2005; Šlapeta et al., 2006; Sullivan et al., 2015).

Third, and related to the previous point, the genome of these species has been fully sequenced, which is essential for a study of the mutation rate. The genomes and associated data are available from two sites: ORCAE (Sterck et al., 2012) for annotation and Picoplaza (Vandepoele et al., 2013) for comparative genome analysis. This is not the case for Picochlorum RCC4223, however, whose genome assembly and annotation are part of this thesis.

Fourth, they display broad genetic and genomic diversity: a small haploid genome of 13 to 21 Mb (Table 4), with GC contents ranging from 46 to 63%. These genomic differences are precisely what interests us in the mutation accumulation (MA) experiments, as they allow us to test the hypotheses set out in the second part of this introduction.

Table 4. Genomic diversity of the species used for the mutation accumulation experiments. The genetic diversity of our models, notably GC content and genome size, is of interest for testing their role in mutation rate variation.

Species                       RCC    Genome (Mb)   %GC   Genes    Coding genome (%)
Ostreococcus tauri            4221   12.5          56    8 116    81.21
Ostreococcus mediterraneus    2590   13.5          69    7 492    84.25
Bathycoccus prasinos          1105   15.1          48    7 847    83.09
Micromonas pusilla            299    21.0          63    10 286   81.85
Picochlorum sp.               4223   13.7          46    8 755    79.45

Finally, green algae in general are studied for their biotechnological potential (Becker, 2007; Brennan and Owende, 2010; Chisti, 2007; Mata et al., 2010). The possibility of exploiting algal lipids, notably in the search for biofuels (Brennan and Owende, 2010; Hannon et al., 2010), has led many researchers to optimise protocols for lipid production or extraction in certain species, notably among the Trebouxiophyceae (Dassey and Theegala, 2013; Garzon-Sanabria et al., 2012; Gerken et al., 2013; S.-J. Park et al., 2012; Tran et al., 2014; Yang et al., 2014; Zhu and Dunford, 2013). A recent study has also highlighted a potential medical application (Black et al., 2014) of the genus Nannochloris. Briefly, Nannochloris eukaryotum (or Picochlorum eukaryotum) cells spontaneously enter human retinal pigment epithelium cells. The algae that enter are viable and photosynthetically active, and they divide. These retinal cells play a crucial role in the formation of the retinal vascular network by regulating the expression of vascular growth factors, which depends on the oxygen concentration. Several eye diseases are linked to defects in the regulation of these growth factors. For the authors, oxygen production through photosynthesis by Nannochloris cells that have entered retinal cells therefore seems an avenue worth exploring. Knowledge of the mutation rate is particularly important here, because it is useful for experimental evolution studies that can be used to select lines of interest. This is why a chapter is devoted to this question, focusing on the species Picochlorum RCC4223.

[Figure 8: phylogenetic tree. Taxa shown, from top to bottom: Acrosiphonia sp. and Oltmannsiellopsis viridis (Ulvophyceae); Dunaliella salina and Chlamydomonas reinhardtii (Chlorophyceae); Tetraselmis striata; Chlorella vulgaris, Picochlorum oklahomensis and Picochlorum RCC4223 (Trebouxiophyceae); Picocystis salinarum; Nephroselmis rotunda and Nephroselmis astigmatica (Nephroselmidophyceae); Ostreococcus tauri, Bathycoccus prasinos, Micromonas pusilla, Crustomastix stigmata and Monomastix minuta (Mamiellophyceae); Prasinoderma coloniale; Pyramimonas olivacea and Pyramimonas disomata (Pyramimonadales); Coleochaete nitellarum (outgroup).]

Figure 8. Phylogenetic tree of the , taken from Marin and Melkonian (2010), built from sequences encoding the 18S rRNA. The Mamiellophyceae form a basal group that diverged early, whereas the Trebouxiophyceae are more derived.

[Figure 9: four electron micrographs, panels A to D; scale bars 1.44 µm, 50 nm and 100 nm.]

Figure 9. Electron micrographs of the species used for the mutation accumulation experiments. A. Bathycoccus prasinos (Yau et al., 2015). B. Micromonas pusilla (Yau et al., 2015). C. Ostreococcus sp. (Yau et al., 2015). D. Picochlorum sp. (photo by Claire Hemon).

6.2.2. The Mamiellophyceae

The Mamiellophyceae make up part of what is defined as eukaryotic pico-phytoplankton (Massana, 2011). They are ubiquitous and have been isolated in the Mediterranean Sea and in the Atlantic and Pacific Oceans (de Vargas et al., 2015; Demir-Hilton et al., 2011). The group notably includes the smallest known free-living eukaryotes, about one micrometre in size, the most studied of which is Ostreococcus tauri, discovered in the Thau lagoon (France) in 1994 (Courties et al., 1994). They have a very simple cellular organisation, with only one mitochondrion and one chloroplast, and are characterised by the absence of a cell wall.

The four Mamiellophyceae species studied belong to the order Mamiellales, whose first divergence is dated to 65 million years ago and corresponds to the divergence of the genus Micromonas at the end of the Cretaceous (Šlapeta et al., 2006). Finally, diversity within the class Mamiellophyceae is very high (Piganeau et al., 2011a). Despite the phylogenetic proximity of the different species, 14 and 6 genetically distinct strains have been identified within Ostreococcus tauri and O. mediterraneus, respectively.

The known species of Mamiellales have unusual chromosomes, called "outlier chromosomes", whose GC content is lower than that of the rest of the genome and which contain many repeated regions. In total, O. tauri has 20 chromosomes, O. mediterraneus and B. prasinos 19 each, and M. pusilla 17. The karyotype of these four species is shown in Figure 10.

[Figure 10: PFGE gel image. Lanes labelled Mp, Om, Bp and Ot, with a yeast marker lane (Y); size labels (axis titled "Size in Mb") include 1 900, 1 640, 1 100, 945, 915, 815, 680, 555, 450 and 250.]

Figure 10. PFGE migration of the complete DNA of the 4 Mamiellophyceae species. B. prasinos has 19 chromosomes, O. tauri 20, O. mediterraneus 19 and M. pusilla 17. The size markers are phage Lambda (first well on the left) and yeast (last well on the right).

6.2.3. The Trebouxiophyceae

6.2.3.1. General presentation

The class Trebouxiophyceae evolved more recently than the Mamiellophyceae in the tree of the Chlorophyta, and also includes genera that are well known in biology, such as Chlorella. Sequenced genomes also exist for the Trebouxiophyceae (Blanc et al., 2012, 2010; Gao et al., 2014; Pombert et al., 2014). The closest to our strain Picochlorum RCC4223 is strain Picochlorum SE3 (Foflonker et al., 2016, 2015), with a genome of 13.5 Mb and a GC content of 46% (Foflonker et al., 2015). Several cell division strategies are known (Yamamoto et al., 2007): fission (Picochlorum bacillaris), budding (Nannochloris coccoides) and autosporulation (N. eucaryotum, N. atomus).

Within the Trebouxiophyceae, two genera are of interest to us: Nannochloris and Picochlorum, the latter including strain RCC4223. The phylogeny is not yet fully resolved (Henley et al., 2004; Yamamoto et al., 2007, 2003, 2001), and several species or strains have been renamed from one genus to the other. Synonyms therefore exist, which can cause confusion. According to current knowledge, species of the genus Picochlorum are characterised by high halotolerance (Foflonker et al., 2015; Henley et al., 2002; von Alvensleben et al., 2013), up to 90 g.L-1.

6.2.3. The Trebouxiophyceae 2. Horizontal gene transfers

The genus Picochlorum can help to answer, in part, a third question about the evolution and diversification of eukaryotic phytoplankton: what is the contribution of horizontal gene transfers (HGTs) to the diversification of phytoplankton?

Indeed, HGTs have been identified in several eukaryotic species belonging to the Chlorophyta (Picochlorum SE3 (Foflonker et al., 2015), B. prasinos (Moreau et al., 2012), Chloromonas brevispina (Raymond, 2014) and Chlorella variabilis (Blanc et al., 2010)) as well as in red algae (Rhodobionta), for example Galdieria phlegrea (Qiu et al., 2013). These transfers allow the acquisition of new genes and new functions that increase diversity and adaptive capacity. HGTs are well known and frequent in bacteria and archaea (Vos et al., 2015), and are also documented in eukaryotes (Schönknecht et al., 2014). It has been proposed that eukaryotes able to survive in extreme environments have retained more genes of bacterial origin acquired by horizontal transfer, as in Galdieria sulphuraria (Schönknecht et al., 2013). Studying another Picochlorum genome, from a halotolerant and thermotolerant strain, could therefore provide new data on the hypothesis of HGTs towards the Plantae.

For the reasons given above (HGTs and biotechnological potential), we considered it important to include Picochlorum RCC4223 in our studies of the mutation rate in green algae. This is why, in addition to the mutation accumulation experiments and their results, part of the thesis work consists of assembling and annotating this new genome.

7. Thesis objectives

What are the impacts of new mutations on the fitness of eukaryotic pico-phytoplankton (Chlorophyta)?

To answer this question, we carried out mutation accumulation (MA) experiments on the four Mamiellophyceae species presented above, following fitness over time. Fitness is estimated as the number of cell divisions per day, from T0 to Tfinal. In total, this fitness study covers 7 lines of M. pusilla, 8 of B. prasinos, 23 of O. mediterraneus and 19 of O. tauri. In addition, at the end of the experiment, fitness is tested under different salinity conditions and under stress (presence of herbicides). The aim is to determine whether fitness changes between environments, which would result from changes in the effects of mutations.

What is the spontaneous mutation rate of green algae (Chlorophytes), and which factors influence it?

The lines used to estimate the mutation rate come from MA experiments run with the same protocol as the one used to address the first question above. Mutations are rare events, which imposes two experimental constraints: enough lines and enough generations are needed for mutations to be observable at the end of the experiment. Forty lines per species were maintained for as long as possible, to obtain as many independent generations as possible under our experimental conditions. The genomes of these lines are sequenced with Illumina technology (162 in total), and bioinformatic analysis allows us to estimate directly the mutation rate and the distribution of mutations across the genome. Two underlying questions arise: the variability of the mutation rate between genomes and within genomes.

What is the role of HGTs in the diversification of green algae (Chlorophytes)?

Finally, the opportunity to study a new genome (that of Picochlorum RCC4223) makes it possible to explore other mechanisms of adaptation. HGTs have been proposed as a mechanism generating diversity in green algae. This new genome will either support or fail to support this hypothesis, depending on whether horizontal gene transfers towards Picochlorum are found. The candidate HGTs of Picochlorum SE3 will be searched for in the new RCC4223 genome. In addition, experimental work and comparative genomics will be used to characterize the new strain.

What are the implications of the mutation rate for the domestication of green algae?

The mutation rate of Picochlorum RCC4223 is estimated in the same way as for the Mamiellophyceae. However, the experiment includes only 12 lines and the ancestral type. This estimate makes it possible to discuss the mutation rate of a green alga of biotechnological interest. Knowledge of its mutation rate may therefore be a parameter to take into account when choosing species.

CHAPTER 2:

THE EFFECT OF MUTATIONS ON FITNESS

What are the impacts of new mutations on the fitness of eukaryotic pico-phytoplankton (Chlorophyta)?

To try to understand the impact of mutations on fitness in green algae, we carried out MA experiments on 40 lines of each Mamiellophyceae species. This study is the first of its kind in the Mamiellophyceae, which had never been the subject of MA experiments. Depending on the species, the lines were maintained for 204 to 378 days, with a series of one-cell bottlenecks every 14 days. Unlike most microorganisms previously used in MA experiments, the cultures are maintained in liquid rather than solid medium. For this reason, we used flow cytometry to follow the growth of the lines and the controls. All lines derive from a T0 clone, and are maintained at the smallest possible effective population size for the MA lines and at a size 100 times larger for the control. The control, in which selection is effective, keeps the fitness of the ancestral type at its optimal level. Fitness is estimated as the number of cell divisions per day. Because lost lines were replaced by surviving ones during the experiment, only fully independent lines are considered in the statistical analyses. In addition to fitness monitoring, two tests explore the genotype-environment interaction in the effects of mutations. For this, the fitness of the lines is measured under different conditions: a salinity gradient from 5 to 65 g.l-1, and two conditions with herbicides. The chosen species tolerate a wide range of salinities: we wanted to see how the mutant lines respond to these changes. The herbicide media are used to study the response to stress. This study shows a decrease in fitness in O. tauri, in agreement with the literature (Halligan and Keightley, 2009). We also highlight the importance of the GxE interaction in adaptation. Indeed, some lines show significant fitness changes (increase or decrease) in the GxE test conditions that were not detected during the MA experiment. We also discuss possible experimental problems, notably the high loss of lines in some species, in particular B. prasinos and M. pusilla. The supplementary material for this chapter is available on pages 137 to 142.

INVESTIGATION

Fitness Effects of Spontaneous Mutations in Picoeukaryotic Marine Green Algae

Marc Krasovec,*,1 Adam Eyre-Walker,§ Nigel Grimsley,* Christophe Salmeron,† David Pecqueur,† Gwenael Piganeau,*,1 and Sophie Sanchez-Ferandin*

*Sorbonne Universités, UPMC Univ Paris 06, CNRS, Biologie Intégrative des Organismes Marins (BIOM), Observatoire Océanologique, F-66650 Banyuls/Mer, France, †Sorbonne Universités, UPMC Univ Paris 06, CNRS, Observatoire Océanologique de Banyuls (OOB), F-66650 Banyuls/Mer, France, and §School of Life Sciences, University of Sussex, Brighton BN1 9QG, United Kingdom

ABSTRACT Estimates of the fitness effects of spontaneous mutations are important for understanding the adaptive potential of species. Here, we present the results of mutation accumulation experiments over 265–512 sequential generations in four species of marine unicellular green algae, Ostreococcus tauri RCC4221, Ostreococcus mediterraneus RCC2590, Micromonas pusilla RCC299, and Bathycoccus prasinos RCC1105. Cell division rates, taken as a proxy for fitness, systematically decline over the course of the experiment in O. tauri, but not in the three other species where the MA experiments were carried out over a smaller number of generations. However, evidence of mutation accumulation in 24 MA lines arises when they are exposed to stressful conditions, such as changes in osmolarity or exposure to herbicides. The selection coefficients, estimated from the number of cell divisions/day, vary significantly between the different environmental conditions tested in MA lines, providing evidence for advantageous and deleterious effects of spontaneous mutations. This suggests a common environmental dependence of the fitness effects of mutations and allows the minimum mutation/genome/generation rates to be inferred at 0.0037 in these species.

KEYWORDS spontaneous mutation; mutation accumulation; fitness effects; marine pico-phytoplankton; single cell cultures

Copyright © 2016 Krasovec et al.
doi: 10.1534/g3.116.029769
Manuscript received March 29, 2016; accepted for publication May 5, 2016; published Early Online May 10, 2016.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.116.029769/-/DC1
1 Corresponding authors: Pierre and Marie Curie University (UPMC), 1 Avenue de Fontaulé, 66650 Banyuls-sur-Mer, France. E-mails: [email protected]; [email protected]

Mutations are the main drivers of genetic diversity that enable species to adapt by natural selection. Estimating the spontaneous mutation rate and the fitness effects of mutations is, thus, essential for a better understanding of the evolution and the adaptive potential of species (Wright 1932; Kondrashov 1988). A proportion of new mutations are deleterious (Charlesworth and Charlesworth 1998; Keightley and Lynch 2003; Lynch et al. 1999), and some of the strongest evidence for this comes from mutation accumulation (MA) experiments, pioneered by Mukai in Drosophila melanogaster (Mukai 1964). The accumulation of mutations can be measured experimentally by monitoring the growth, or other fitness traits, of independent lines starting from one genotype for a given number of generations (see Halligan and Keightley 2009 for a review). Serial bottlenecks make natural selection ineffective in the face of genetic drift and permit deleterious mutations to segregate and become fixed in MA lines. Since Mukai's first experiments in Drosophila, many MA experiments have been performed in different organisms: Arabidopsis thaliana (Shaw et al. 2000), Caenorhabditis elegans (Ajie et al. 2005; Katju et al. 2014; Vassilieva et al. 2000; Vassilieva and Lynch 1999), Daphnia pulex (Deng et al. 2002; Deng and Lynch 1997; Schaack et al. 2013), Dictyostelium discoideum (Hall et al. 2013), D. melanogaster (Fernández and López-Fanjul 1996; Fry 2004, 2001; Fry et al. 1999; Keightley 1994; Schrider et al. 2013), Saccharomyces cerevisiae (Wloch et al. 2001; Zeyl and DeVisser 2001), and Tetrahymena thermophila (Long et al. 2013). Generally, these experiments show a decrease of fitness in the MA lines as the experiment progresses, consistent with a substantial proportion of spontaneous mutations being deleterious.

MA experiments also enable the relationship between the fitness effects of mutations and the environment to be explored. Knowledge about genotype–environment (GxE) interactions is essential to understand the adaptation process, because fitness effects of mutations may change with time and spatial scales. In D. melanogaster (Fry et al. 1996; Kondrashov and Houle 1994), C. elegans (Baer et al. 2006) or S. cerevisiae (Korona 1999), the fitness effects of spontaneous mutations

Volume 6 | July 2016 | 2063
change with environmental conditions. However, this interaction is not systematic; in the case of A. thaliana, one experiment showed a positive GxE interaction in fitness effects of mutations (Rutter et al. 2012), whereas other studies did not (Chang and Shaw 2003; Kavanaugh and Shaw 2005). The nature of the change in mutational effect with environmental conditions allows us to infer three biological implications (Martin and Lenormand 2006): (i) a change in the genomic mutation rate U can be interpreted as changes in the expression of mutated genes, (ii) an increase of the fitness variance suggests a variation in the fitness effects of mutation between environments, (iii) a change in the average fitness measured might be explained by increased selection strength in harsh conditions.

In harsh environments, the effects of deleterious mutations are expected to increase, because of the biological and ecological pressure induced by stress. However, this view is disputed by experimental evidence in Escherichia coli (Kishony and Leibler 2003) and C. elegans (Andrew et al. 2015). In general, the interaction between stress and fitness effects of mutations may be categorized as follows (Elena and de Visser 2003): first, unconditionally deleterious, with the magnitude of the stress increasing the deleterious effect; second, conditionally neutral, i.e., neutral in some conditions and deleterious in others; third, conditionally beneficial, i.e., advantageous in some conditions but deleterious in others.

While most MA experiments have been performed in model organisms, no results are available in marine phytoplanktonic organisms. Here, we report MA experiments in four haploid marine green algae (Chlorophyta): Ostreococcus tauri RCC4221 (Blanc-Mathieu et al. 2014), Ostreococcus mediterraneus RCC2590 (Subirana et al. 2013), Micromonas pusilla RCC299 (Worden et al. 2009), and Bathycoccus prasinos RCC1105 (Moreau et al. 2012). All species belong to the Mamiellales order (class Mamiellophyceae, Marin and Melkonian 2010), and are widespread members of the marine phytoplankton (De Vargas et al. 2015) that sustain the marine ecosystem in coastal areas (Worden et al. 2004). These green algae contain the smallest known free-living eukaryotes (Courties et al. 1994), defined as the pico-phytoplankton (see Massana 2011 for a review). They have a simple cell organization, with only one chloroplast and one mitochondrion, and a small genome of 13–21 Mb.

MATERIALS AND METHODS

Biological models
We performed MA experiments on four haploid marine green algae (Chlorophyta): O. tauri RCC4221, O. mediterraneus RCC2590, M. pusilla RCC299, and B. prasinos RCC1105. All cultures are available from the Roscoff Culture Collection (http://roscoff-culture-collection.org/). The identity of each strain was confirmed by 18S rDNA sequencing and PFGE migration (Schwartz and Cantor 1984) at the start of the experiment. All species were kept in L1 liquid medium (salinity of 35 g/L) with a light:dark (LD) cycle of 8:16 (8 hr light, 16 hr dark) in 24-well plates, at 20°, except for B. prasinos RCC1105, for which the cycle was 12:12 LD.

MA experiments
Each experiment was started with one single cell, which divided to produce the ancestral population, from which single cells were sampled to generate independent lines by one cell inoculation (Figure 1). For each species, we inoculated 40 MA lines, kept in 24-well microtiter plates. As a control, the ancestral population was cultured in the same conditions, but with an inoculation of 100 cells, to maintain a larger effective population size. We kept one microplate of controls, i.e., 24 control replicates.

Classically, in MA experiments of unicellular organisms, a colony of cells is transferred to a fresh agar plate at each bottleneck to allow the separation of the cells and the random sampling of a new cell. However, this is not possible in these species as they do not grow on the surface of gelled media, and only grow slowly within gelled medium, in contrast to S. cerevisiae, D. discoideum or Chlamydomonas reinhardtii (Hall et al. 2013; Morgan et al. 2014; Wloch et al. 2001). Nevertheless, they are easily cultured in liquid medium in the laboratory. Therefore, we developed an experimental protocol combining flow cytometry, which has the advantage of counting individual cells while verifying cell size and fluorescence, and transfer of single cells in liquid media. Bottlenecks of MA lines to one cell were performed every 14 d. However, since the number of sampled cells follows a Poisson distribution, the probability of line loss by sampling one single cell is 0.37. Indeed, in contrast with agar plate protocols, a colony cannot be observed in liquid medium, and the cell densities were never large enough to be seen as green. Thus, we measured the number of cells in our wells and calculated the volume needed to extract 10 cells, from which we sampled six for the next new six wells with fresh media. Thus, we maintained six replicates per line at each bottleneck.

If we assume that cells are uniformly distributed through the medium, the number of sampled cells, N, is Poisson distributed:

$$P(N; \bar{N}) = \frac{e^{-\bar{N}}\,\bar{N}^{N}}{N!} \tag{1}$$

We inoculated those cells into a volume V from which we drew aliquots such that we ultimately discarded a proportion q of the sample. For a particular sample, the probability that all N cells are discarded is simply $q^{N}$. Thus, the overall probability that we discard all cells and hence lose a line is:

$$G = \sum_{N=0}^{\infty} P(N; \bar{N})\, q^{N} \tag{2}$$

If we wanted to include pipetting error, we could model this by assuming that the volume sampled differs from that intended by a factor $\alpha$ which is gamma distributed with a mean of 1 and a shape parameter of $\beta$. Now equation 1 becomes:

$$P(N; \bar{N}, \beta) = \int_{\alpha=0}^{\infty} \frac{e^{-\alpha\bar{N}}\,(\alpha\bar{N})^{N}}{N!}\, D(\alpha; \beta)\, d\alpha \tag{3}$$

This is actually a negative binomial:

$$P(N; \bar{N}, \beta) = \frac{\beta^{\beta}}{N!\,\Gamma(\beta)}\, \bar{N}^{N}\, (\bar{N}+\beta)^{-N-\beta}\, \Gamma(N+\beta) \tag{4}$$

So the probability of observing k or more line losses over t transfers is given by multiplying G from equation (2) by k and t.

One fifth of the microtiter plate's volume was used for cell counting, using a FACSCanto II flow cytometer (Becton Dickinson, Franklin Lakes, NJ) equipped with an air-cooled laser providing 15 mW at 488 nm with the standard filter set-up. Becton Dickinson Trucount(TM) beads were used to calculate the abundance of the cells as described by Pecqueur et al. (2011). A total of 20 µl of mixed fluorescent beads 1 µm in diameter (Molecular Probes Inc., Eugene, OR) were added as an internal standard to 300 µl of the diluted sample (20th dilution). The flow rate of the cytometer was set to high (acquisition time: 1 min). Eukaryotic pico-phytoplankton cells were detected and analyzed using natural chlorophyll fluorescence (chlorophyll a FL3 670 nm LP). The

Figure 1 Mutation accumulation (MA) experiments in pico-algae. Flow cytometer measurements were performed every 14 d to make one cell bottlenecks for each line. The ancestral culture of each species came from one single cell, inoculated in a well to grow enough cells to start the experiment. The ancestral culture was maintained with higher effective population size in the control lines (inoculation of 100 cells) and MA lines by reinoculating one single cell, in six replicates per line, in 24-well microtiter plates.

flow cytometry data were analyzed using BD FACSDiva (Becton Dickinson).

In total, the experiments involved 27 bottlenecks over 378 d for O. tauri, 21 bottlenecks over 294 d for O. mediterraneus, 21 bottlenecks over 302 d for M. pusilla, and 16 bottlenecks over 224 d for B. prasinos (Table 1).

Estimation of fitness
We estimated the fitness of lines from the number of divisions/day, G, calculated over a period of 14 d using the equation:

$$G = e^{[\ln(N_t/1)/t]} \tag{5}$$

Nt is the final number of cells just before the bottleneck and t is the number of days between two bottlenecks (t = 14). G is the number of generations/day. To compare G between different MA lines over time, the relative fitness, Gr (Gr = GMA/Gcontrol), was computed. The effective population size of MA lines and control line populations at each bottleneck was estimated as the harmonic mean of the population size between t = 1 and t = 14 days. Following Chevin (2011), the fitness effects of mutations in the MA lines at the end of the experiment were measured by estimating the selection coefficient scaled by the generation time, ST.

$$S_T = \frac{\ln(G_{MA}) - \ln(G_{control})}{\ln(G_{control})}\,\ln 2 \tag{6}$$

Fitness assays in stressful conditions
Upon completion of the MA experiments in O. mediterraneus, M. pusilla, and B. prasinos, we used MA lines that had survived from the first to the last generations in each species for further investigations in stressful conditions: nine MA lines of O. mediterraneus, seven MA lines of M. pusilla, and eight MA lines of B. prasinos. For O. mediterraneus, 24 MA lines reached the end of the experiment, of which nine were chosen randomly for practicality.

Before starting fitness assays, we transferred MA lines in L1 medium flasks and let them grow for 2 wk to have enough cells to inoculate cultures. Fitness assays were performed in 48-well microtiter plates, with a starting population of 50,000 cells/well. For herbicide tolerance tests, we used Diuron at 10 µg/L and Irgarol 1051 at 1 µg/L (Sanchez-Ferandin et al. 2013). We tested salinities of 5, 20, 35, 50, and 65 g/L using L1 medium supplements (Guillard and Hargraves 1993). The number of biological replicates was three for each MA line and four for each control. Cell concentrations were obtained by flow cytometry 7 d after plate inoculation and ST was estimated as specified above. This corresponds to a total of 52 well measurements for M. pusilla, 58 for B. prasinos, and 64 for O. mediterraneus. In O. tauri, the MA experiment was completed 6 months before the start of the fitness assays under stressful conditions, so fitness assays could not be performed for this species.

Statistical analysis
First, to investigate the relationship between fitness, G, and the number of sequential generations, we used data from those lines that survived throughout the experiment: 21 lines for O. tauri, 24 for O. mediterraneus, eight for B. prasinos, and seven for M. pusilla. We performed an ANOVA on the control data to test whether G changed significantly between bottleneck times. The change in fitness of MA lines as a function of time was thereafter analyzed by dividing the growth rate in the MA lines by the growth rate in the control, Gr, to remove the variation in the experimental set-up through time. For each line, the relationship between the relative fitness (Gr) and the number of generations was tested using Pearson's correlation.

Table 1 Summary of mutation accumulation experiments for four species

Species                     Number of lines   Average number of generations per line   Ne   T0–Tf (d)
O. tauri RCC4221            21                512                                      8    378
O. mediterraneus RCC2590    24                272                                      6    294
M. pusilla RCC299           7                 272                                      6    302
B. prasinos RCC1105         8                 265                                      8    224

The number of lines is the number of surviving independent lines since the start of the experiment (T0) to the end (Tf). Ne is the average of effective population size between each bottleneck. The last column is the total duration of the experiment. The probability of line loss was estimated using equation (2) in the Materials and Methods section, N = 10, and q = 0.4. Expected number of line losses (Lexp) is estimated for each species as a function of the coefficient of variation in sampling cells (Table 2).
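The per-bottleneck probability of line loss referred to in this note can be checked numerically. The sketch below sums equation (2) directly, using the Poisson form (equation 1) when there is no pipetting error and the negative binomial form (equation 4) otherwise. Three things are assumptions rather than statements from the article: the function names, the relation shape = 1/CV² (which holds for a gamma distribution with mean 1), and the closed forms exp(−N̄(1−q)) and (β/(β+N̄(1−q)))^β, which are the standard probability generating functions of these two distributions.

```python
import math

def log_pmf(n, mean_cells, shape=None):
    """log P(N = n): Poisson (eq. 1) if shape is None, else negative binomial (eq. 4)."""
    if shape is None:
        return -mean_cells + n * math.log(mean_cells) - math.lgamma(n + 1)
    return (shape * math.log(shape) + n * math.log(mean_cells)
            - (n + shape) * math.log(mean_cells + shape)
            + math.lgamma(n + shape) - math.lgamma(n + 1) - math.lgamma(shape))

def line_loss_probability(mean_cells, discard_fraction, cv=0.0, n_max=500):
    """Equation (2): sum over N of P(N) * q**N, truncated at n_max."""
    # Assumption: a gamma-distributed volume factor with mean 1 has CV = 1/sqrt(shape).
    shape = None if cv == 0 else 1.0 / cv ** 2
    return sum(math.exp(log_pmf(n, mean_cells, shape)) * discard_fraction ** n
               for n in range(n_max + 1))

def closed_form(mean_cells, discard_fraction, cv=0.0):
    """Closed-form value of the same sum (generating function evaluated at q)."""
    kept = mean_cells * (1.0 - discard_fraction)
    if cv == 0:
        return math.exp(-kept)
    shape = 1.0 / cv ** 2
    return (shape / (shape + kept)) ** shape

# N = 10 sampled cells on average, q = 0.4 discarded, as stated in the Table 1 note.
for cv in (0.0, 0.05, 0.4, 0.5):
    print(cv, line_loss_probability(10, 0.4, cv), closed_form(10, 0.4, cv))

# Sampling a single cell on average and keeping everything (q = 0):
print(line_loss_probability(1, 0.0))
```

Under these assumptions the four values come out close to the p column of Table 2, and the last line gives exp(−1), the 0.37 quoted in the Materials and Methods for single-cell sampling.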

Second, for fitness assays in stressful conditions, ST was calculated in all conditions using GMA and Gcontrol at each condition as explained above. We used a pairwise Student's test to detect changes between MA lines and control. The p-value was corrected for multiple testing using the Bonferroni-Holm method (Holm 1979), as implemented in R. Because MA lines could have fixed more than one mutation during MA experiments, the selection coefficient is estimated for a potential set of mutations, including their possible epistatic effects on fitness.

To check that the environmental assays were indeed stressful for our cultures, Gcontrol of the 24 controls at the end of the MA experiment was compared to Gcontrol of the four controls in each of the environmental conditions. A significant decrease of G in an environmental condition confirmed its stressful effect.

Finally, the salinity of 35 g/L is the standard salinity of culture. We performed a Fisher-Snedecor test to detect changes in variance between the standard salinity and the other salinities.

Statistical analyses were performed with R (version 3.1.1) (R Core Team 2014).

Data availability
Supplemental Material, Table S1, Table S2, Table S3, and Table S4 contain fitness data of each MA line during the experiments. Table S5, Table S6, and Table S7 contain fitness data for fitness assays in herbicides and salinity gradient conditions. Control data during MA experiments are provided in Table S8, Table S9, Table S10, and Table S11.

RESULTS

MA experiments
The average effective population sizes across the experiment were six cells for O. mediterraneus and M. pusilla and eight cells in B. prasinos and O. tauri (Table 1). The effective population size in the control, which was started with an initial cell number of 100, was estimated to be 600 for M. pusilla, 650 for O. mediterraneus, and 700 for the other two species. Between each bottleneck, depending on species and lines, the lines divided 10–20 times, corresponding to 512 independent sequential generations/line for O. tauri, 272 for O. mediterraneus, 265 for B. prasinos, and 272 for M. pusilla, on average (Table 1).

Fitness effects of mutations during the MA experiment
We measured the fitness of our MA lines as the number of cell divisions that occurred between two bottlenecks. There was no increase or decrease in the growth rate of the control lines with generation time, but there was a significant variation between bottleneck times (ANOVA, p-value < 0.001) for all species. The fitness values of MA lines were thus divided by the mean fitness estimation of the control, Gcontrol, to yield relative fitness values, Gr; this was done to eliminate any changes in fitness due to uncontrolled variation in the experimental set-up.

The average Gr of O. tauri MA lines per bottleneck event decreases significantly with time (Pearson correlation test, r = −0.49, p-value = 0.047). Also, four independent MA lines of the 21 had an individually significant decrease of Gr (Pearson correlation test; r = −0.54, p-value = 0.026; r = −0.51, p-value = 0.035; r = −0.56, p-value = 0.018; and r = −0.55, p-value = 0.022) (Table S4).

In O. mediterraneus, Gr significantly increased in one line (Pearson correlation test, r = 0.52, p-value < 0.05). This line is the only one with a significant increase in fitness. No significant increase or decrease of within-species fitness variation of Gr was detected for M. pusilla (Table S1), B. prasinos (Table S2), and O. mediterraneus (Table S3). We also investigated whether the number of lines lost varied over the course of the experiments: the data are consistent with a constant line loss over the course of the experiments in all four species. However, the observed number of lines lost was higher than expected by chance for a coefficient of variation in sampling error equal to or smaller than 5% (Table 2) in all species.

Fitness effects in stressful conditions
Herbicide stress: Both herbicides significantly decreased fitness in the control lines in all tested species when compared to those cultured without herbicide (Wilcoxon test, p-value < 0.001); the herbicides reduced growth rate by 52% and 74% for B. prasinos, 40% and 42% for M. pusilla, and 52% and 48% for O. mediterraneus, in Irgarol 1051 and Diuron media, respectively. In some cases, the variance significantly increased in MA lines (Fisher-Snedecor test, p-value < 0.05 in Irgarol 1051 for O. mediterraneus and M. pusilla; p-value < 0.001 for B. prasinos with the two herbicides). A change of variance is as expected in stressful conditions, because of the revelation of mutation effects.

For each species, the selection coefficients, ST, are shown in Figure 2. In contrast with the MA experimental conditions, some MA lines showed significantly lower or higher fitnesses with a significant negative or positive selection coefficient. In addition, ST changed between the two conditions for some identical MA lines. In all, one MA line had a significantly positive selection coefficient, while two MA lines had a significantly negative selection coefficient in the two conditions.

In summary, out of 24 tested lines, 12 lines (50%) had a significantly negative ST in at least one herbicide, whereas five lines (21%) had a significantly positive ST.

Osmolarity stress: MA and control lines were exposed to lower (salinities of 5 and 20 g/L) and higher (salinities of 50 and 65 g/L) levels of salinity than the seawater of their natural environment (35 g/L). Below, we define an environment as stressful if the controls grow more slowly in this environment than in standard conditions, the magnitude of stress being estimated by the growth rate reduction. Both high

Table 2 Statistical probabilities of line loss

                  O. tauri             O. mediterraneus     B. prasinos          M. pusilla
CV     p          Lexp   P(L ≥ Lobs)   Lexp   P(L ≥ Lobs)   Lexp   P(L ≥ Lobs)   Lexp   P(L ≥ Lobs)
0      0.0025     2.7    0             2.1    0             2.4    0             1.7    0
0.05   0.0026     2.8    0             2.2    0             2.5    0             1.8    0
0.4    0.0150     16.2   0.09          12.6   0             14.4   0             10.2   0
0.5    0.0260     28.1   0.89          21.8   0.5           25.0   0.0000        17.7   0.0033

Statistical probabilities of line loss, with p the probability of line loss at each bottleneck, Lexp the expected number of line losses for each experiment, and Lobs the number of observed line losses. Probability of observing Lobs or more line losses, as a function of the number of lines, the number of bottlenecks, t (16, 21, and 27 bottlenecks depending on species), and the coefficient of variation of the sampling error (gamma distribution with average 1 and Coefficient of Variation CV). As an example, for O. tauri, the probability of obtaining the observed line loss, Lobs, over the number of bottlenecks performed, with a CV of 0.4, is 0.09 [P(L ≥ Lobs)], the expected line loss, Lexp, being 16.2.

and low salinities are stressful for B. prasinos. In contrast, the control lines of both M. pusilla and O. mediterraneus grew faster in the slightly lower salinity treatment (20 g/L), and O. mediterraneus also grew faster in the lowest salinity treatment (5 g/L) than in the standard conditions (35 g/L), suggesting that lower salinity is not necessarily stressful. A change in the selection coefficient of MA lines is thus not necessarily a consequence of a stress, but just due to benign changes of an environmental parameter.

Stress may be expected to increase the fitness variance. To test this, we compared the variance of ST in each condition with the standard conditions (35 g/L). The variance of the fitness of MA lines was significantly higher for O. mediterraneus in the higher salinity, the most stressful condition (p-value < 0.01). This was also the case for B. prasinos in the two higher and lower salinities (p-value < 0.001) and at 20 g/L (p-value < 0.05). In contrast, we did not detect any significant change of the variance in the fitness of M. pusilla MA lines between tested conditions.

The three species showed contrasting patterns in terms of the direction of selection coefficient variation, estimated from the number of cell divisions/day (Figure 3). In O. mediterraneus, ST was systematically negative for the MA lines. In particular, the decrease of ST was the most significant in the highest salinity, which was the most stressful. B. prasinos and M. pusilla were much more variable. In B. prasinos, almost all MA lines had a significantly higher fitness than the control under stressful conditions, whereas in M. pusilla approximately half of the lines with significantly different fitness to the control had higher fitness, and half had lower fitness. Strikingly, the MA lines in B. prasinos with higher fitness under low salinity also had higher fitness in higher salinity.

In conclusion, all 24 MA lines investigated had a significantly lower or higher selection coefficient than the control lines in at least one condition, in accordance with the accumulation of spontaneous mutations in each MA line and a variation in the effects of spontaneous mutations in different environments.

DISCUSSION

No fitness decrease in three out of four species: no mutations or mutations with no fitness effects?

Except for O. tauri, most MA lines did not show any evidence of fitness decrease during the experiment. This is despite running the experiment with a low average effective population size of around eight individuals, at maximum, over 265–272 generations. Several factors might explain the absence of fitness decrease in most MA lines.

First, it could be due to a very low mutation rate. The low mutation rate could be a result of large effective population sizes in these species, which enable selection for a lower mutation rate, limiting the appearance of deleterious mutations (Lynch 2010; Sung et al. 2012). Nevertheless, it is possible to estimate a minimum mutation rate, assuming that a significant fitness difference between the controls and the MA lines might be the result of at least one mutation. Since each of the MA lines has a significant fitness difference with the control in at least one condition, this corresponds to nine mutations for O. mediterraneus, seven for M. pusilla, and eight for B. prasinos. Depending on the number of generations and the genome size, the minimum mutation rate is thus 2.72 × 10⁻¹⁰ mutations/site/generation for O. mediterraneus (i.e., 0.0037 mutations/genome/generation), 1.75 × 10⁻¹⁰ for M. pusilla (i.e., 0.0037 mutations/genome/generation), and 2.52 × 10⁻¹⁰ for B. prasinos (i.e., 0.0038 mutations/genome/generation). These estimates are consistent with estimates in other unicellular organisms, like C. reinhardtii (Ness et al. 2012) with 2.08 × 10⁻¹⁰ mutations/site/generation, or S. cerevisiae with 3.30 × 10⁻¹⁰ mutations/site/generation (Lynch et al. 2008), Schizosaccharomyces pombe with 2.00 × 10⁻¹⁰ mutations/site/generation (Farlow et al. 2015), Burkholderia cenocepacia with 1.33 × 10⁻¹⁰ mutations/site/generation (Dillon et al. 2015), or E. coli with 2.45 × 10⁻¹⁰ mutations/site/generation (Lee et al. 2012). Thus, fitness assays suggest that the minimum mutation rates of our strains are not lower than those in other species and are close to the constant mutation rate proposed by Drake (Drake 1991), that is U = 0.0033 in microorganisms.

Second, our measure of fitness may not be well suited to detect the effect of mutations. In a MA experiment in D. discoideum, Hall and coworkers followed eight fitness traits, and showed that two of them did not decrease (Hall et al. 2013). We measured fitness as the rate at which the population increased over the 2 wk period between two bottlenecks. Most of the species tend to divide once a day, in rhythm with the natural LD cycle, and so this is probably a robust character, particularly under the benign lab conditions. Likewise, cell death may not occur very often under laboratory conditions. However, the fact that all MA lines show significant fitness differences with the control lines under stressful conditions suggests that at least some mutations with fitness effects have occurred. Indeed, the fitness effects of mutations change across environments. Previous mutation experiments in Caenorhabditis (Baer et al. 2006) and D. melanogaster (Fry et al. 1996; Fry and Heinsohn 2002) suggest that mutational parameters change, as expected because of GxE interactions.

Third, although all of these species are usually haploid, some lines may have become diploid during the experiment, which may have masked the effects of some deleterious mutations. However, we would expect an increase of cell size with ploidy change, but this was not observed by flow cytometry.

Finally, the duration of the experiment may not have been sufficient to detect the effects of deleterious mutations. A decrease of fitness was

Volume 6 July 2016 | Effects of Mutations in Algae | 2067

Figure 2 Selection coefficients, ST, in media containing Irgarol 1051 or Diuron herbicides. Empty circles with a number: MA lines with significant ST differences (Student's test, p-value < 0.01). Left to right in the two graphs: B. prasinos in orange (eight MA lines), M. pusilla in blue (seven MA lines), and O. mediterraneus in green (nine MA lines). The ST of controls are presented as white plots on the left of the MA lines. MA, mutation accumulation.
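The minimum mutation rate discussed in the text can be reproduced with a short calculation. The sketch below assumes one mutation per MA line, as stated in the Discussion, and uses the reported generation numbers (272 and 265). The genome size of 13.5 Mb is an illustrative assumption, chosen within the 13 to 21 Mb range reported for these species; the function name is hypothetical.

```python
def minimum_mutation_rate(n_mutations, n_lines, generations, genome_size_bp):
    """Return (per-genome, per-site) minimum mutation rates per generation.

    n_mutations is the minimum number of mutations inferred from fitness
    differences (at least one per MA line that differs from the control).
    """
    per_genome = n_mutations / (n_lines * generations)
    per_site = per_genome / genome_size_bp
    return per_genome, per_site


# Nine MA lines, nine inferred mutations, 272 generations each;
# the 13.5 Mb genome size is an assumed value for illustration only.
per_genome, per_site = minimum_mutation_rate(9, 9, 272, 13.5e6)
print(f"{per_genome:.4f} mutations/genome/generation")
print(f"{per_site:.2e} mutations/site/generation")
```

Because every line contributes exactly one inferred mutation, the per-genome rate reduces to the inverse of the number of generations, which is why the three species give nearly identical values.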

observed in O. tauri, which was allowed to accumulate mutations over a longer period than the other three species (512 generations as compared to the 272 and 265 in the other species). Indeed, recent MA experimental studies in C. reinhardtii (Morgan et al. 2014) and D. discoideum (Hall et al. 2013) reported a decrease in fitness with similar effective population sizes and higher numbers of sequential generations (Ne = 6.5 during 1000 generations, and Ne = 7.5 during 994 generations, respectively). However, increasing the number of sequential generations beyond 200 was not possible: in B. prasinos and M. pusilla, the MA experiments had to be stopped as a consequence of the high line loss at each bottleneck. The number of lines lost was leading to a stagnation of the total number of independent generations in the experiments. Line loss occurred at each bottleneck from the start of the experiment and there was no trend (increase or decrease) in the number of lines lost with time.

There are three possible explanations for line loss.

First, it could be due to sampling error, since single cell transfer cannot be checked by eye or light microscopy due to small cell size. The probability of sampling one single cell from a volume follows a Poisson distribution and the probability of sampling no cell is thus 0.37. To overcome this high rate of loss, our experimental procedure was to sample a volume of culture predicted by flow cytometry to contain 10 cells and divide this into six wells of a culture plate (see Materials and Methods). The probability of line loss is thus smaller than 10⁻² in all experiments (Table 2). Coefficients of variation of 0.4–0.5 are needed to account for the observed line loss. However, since cytometry counts and pipetting errors are below 1%, it is highly unlikely that the sampling procedure is responsible for the observed level of line loss.

Second, line loss may be the consequence of lethal mutations or strong selection imposed by the experiment. If the experiment was associated with selection, we would expect the growth rates from the control cultures, reinoculated at the same time with 100 cells, to increase over the course of the experiment. There is no evidence for this in any experiment. On the other hand, if lethal mutations are responsible for the line loss, the rate of lethal mutations per generation can be estimated by the proportion of lost lines divided by the number of generations and is 0.025 and 0.019 per genome per generation in B. prasinos and M. pusilla, respectively. Compared to the known spontaneous mutation rates in other microorganisms (Drake 1991) and the estimations above, these lethal mutation rates would be five to sevenfold higher than the spontaneous mutation rates reported above. This corresponds to lethal mutation rates that are too high to be viably supported by a population.

A third hypothesis is that line loss is not the consequence of cell death but the consequence of the absence of cell division. In lab conditions, living cells usually engage in cell division at the end of the day, after light exposure, provided nutrients are available. Without bottleneck to one single cell, line loss in culture maintenance is exceptional. However, if cell division is triggered by an environmental factor produced by the culture, it may be halted as a consequence of the reinoculation step of one single cell. Consistent with this hypothesis, we observed that lost lines were transferred from significantly smaller volumes: from 2 ml on average, while maintained lines have been transferred from 4 ml, on average, for M. pusilla and B. prasinos (Student's test, p-values < 0.001 and < 0.01, respectively). The difference in line loss rates between species could thus be the consequence of a difference in dependence of cell division on an environmental factor, lost during the reinoculation step. This environmental factor may be a metabolite produced by the culture, e.g., a phytohormone (Bartel 1997; Piotrowska-Niczyporuk and Bajguz 2014). This high level of line loss reveals a knowledge gap on the induction of cell division in nonmodel microorganisms and reduces the amount of data available for fitness estimates. However, it does not alter the growth rate estimates of the maintained lines or the estimations of mutations per generation.

Increase or decrease of fitness under stressful conditions

Changes in environmental conditions clearly enable the detection of substantial variation in fitness between MA lines. This is as expected if the fitness effect of mutation changed between environments. The variance between the MA lines is greater than the variance between the control lines, suggesting that some mutations, not detected in MA standard conditions, have been fixed in our MA lines. The significant variation in fitness of some MA lines may be the result of several nonmutually exclusive factors.

First, stressful conditions might exacerbate already existing fitness differences (Kondrashov and Houle 1994), so the MA lines may have accumulated more slightly deleterious mutations than the control lines because they have smaller Ne, but the overall difference in fitness between the MA and control lines is not

2068 | M. Krasovec et al.

Figure 3 Selection coefficients in five salinity conditions. Empty circles with number are MA lines with significant differences to controls (Student's test, p-value < 0.01). (A) O. mediterraneus in green, nine MA lines. (B) M. pusilla in blue, seven MA lines. (C) B. prasinos in orange, eight MA lines. The ST of controls are presented as white plots on the left of the MA lines. MA, mutation accumulation.

detectable under the standard MA conditions. However, such differences in fitness might be detectable in a stressful environment because the selection intensity changes (Martin and Lenormand 2006). A change in selection intensity might come about through a change in the environment (Fry and Heinsohn 2002; Rutter et al. 2012), or a change in the effect of an allele, for example by a change in gene expression. In another green alga, C. reinhardtii, Kraemer and coworkers also highlight the effects of stress on the amplification of deleterious mutations and their impact on fitness (Kraemer et al. 2015).

Second, the fixation of mutations, particularly slightly deleterious mutations, is faster in the MA lines because they have smaller Ne. As a consequence, these slightly deleterious mutations, which could become advantageous in a novel environment, can accumulate in the MA lines but not in the controls. They may thereby increase the fitness in some of these MA lines. In addition, both the control and MA lines have accumulated mutations that are neutral under the original conditions but deleterious under the stressful conditions, causing the fall of fitness among MA lines.

Conclusion

We investigated the accumulation of mutations in four marine green picoalgae. Despite a modest number of sequential generations per MA line, we found evidence for a variation in fitness effects of spontaneous mutations from benign to stressful environments. This allowed us to estimate a minimum per genome mutation rate of 0.0037.

ACKNOWLEDGMENTS

We acknowledge Hervé Moreau, Sheree Yau, and the Genomics of Phytoplankton lab for support and stimulating discussions. We also thank three anonymous referees for their constructive comments on a previous version of this manuscript. We are grateful to Sebastien Peuchet, Aurelien De Jode, Claire Hemon, and Elodie Desgranges for technical assistance with the mutation accumulation experiments from 2011–2013, and to the Agence Nationale de la Recherche (ANR) for supporting them (PICOVIR, DECOVIR, TARA-GIRUS; BLAN07-1_200218, ANR-12-BSV7-0009, ANR-09-PCS-GENM-218). This work was funded by grant ANRJCJC-SVSE6-2013-0005 to G.P. and S.S.F.

LITERATURE CITED

Ajie, B. C., S. Estes, M. Lynch, and P. C. Phillips, 2005 Behavioral degradation under mutation accumulation in Caenorhabditis elegans. Genetics 170: 655–660.
Andrew, J. R., M. M. Dossey, V. O. Garza, M. Keller-Pearson, C. F. Baer et al., 2015 Abiotic stress does not magnify the deleterious effects of spontaneous mutations. Heredity 115: 503–508.
Baer, C. F., N. Phillips, D. Ostrow, A. Avalos, D. Blanton et al., 2006 Cumulative effects of spontaneous mutations for fitness in Caenorhabditis: role of genotype, environment and stress. Genetics 174: 1387–1395.
Bartel, B., 1997 Auxin biosynthesis. Annu. Rev. Plant Physiol. Plant Mol. Biol. 48: 51–66.
Blanc-Mathieu, R., B. Verhelst, E. Derelle, S. Rombauts, F.-Y. Bouget et al., 2014 An improved genome of the model marine alga Ostreococcus tauri unfolds by assessing Illumina de novo assemblies. BMC Genomics 15: 1103.
Chang, S.-M., and R. G. Shaw, 2003 The contribution of spontaneous mutation to variation in environmental response in Arabidopsis thaliana: responses to nutrients. Evolution 57: 984–994.
Charlesworth, B., and D. Charlesworth, 1998 Some evolutionary consequences of deleterious mutations. Genetica 102–103: 3–19.
Chevin, L.-M., 2011 On measuring selection in experimental evolution. Biol. Lett. 7: 210–213.
Courties, C., A. Vaquer, M. Troussellier, J. Lautier, M. J. Chrétiennot-Dinet et al., 1994 Smallest eukaryotic organism. Nature 370: 255.
Deng, H.-W., G. Gao, and J.-L. Li, 2002 Estimation of deleterious genomic mutation parameters in natural populations by accounting for variable mutation effects across loci. Genetics 162: 1487–1500.
Deng, H. W., and M. Lynch, 1997 Inbreeding depression and inferred deleterious-mutation parameters in Daphnia. Genetics 147: 147–155.
De Vargas, C., S. Audic, N. Henry, J. Decelle, F. Mahé et al., 2015 Ocean plankton. Eukaryotic plankton diversity in the sunlit ocean. Science 348: 1261605.
Dillon, M. M., W. Sung, M. Lynch, and V. S. Cooper, 2015 The rate and molecular spectrum of spontaneous mutations in the GC-rich multichromosome genome of Burkholderia cenocepacia. Genetics 200: 935–946.
Drake, J. W., 1991 A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. USA 88: 7160–7164.
Elena, S. F., and J. A. G. de Visser, 2003 Environmental stress and the effects of mutation. J. Biol. 2: 12.
Farlow, A., H. Long, S. Arnoux, W. Sung, T. G. Doak et al., 2015 The spontaneous mutation rate in the fission yeast Schizosaccharomyces pombe. Genetics 201: 737–744.
Fernández, J., and C. López-Fanjul, 1996 Spontaneous mutational variances and covariances for fitness-related traits in Drosophila melanogaster. Genetics 143: 829–837.
Fry, J. D., 2001 Rapid mutational declines of viability in Drosophila. Genet. Res. 77: 53–60.
Fry, J. D., 2004 On the rate and linearity of viability declines in Drosophila mutation-accumulation experiments: genomic mutation rates and synergistic epistasis revisited. Genetics 166: 797–806.
Fry, J. D., and S. L. Heinsohn, 2002 Environment dependence of mutational parameters for viability in Drosophila melanogaster. Genetics 161: 1155–1167.
Fry, J. D., S. L. Heinsohn, and T. F. C. Mackay, 1996 The contribution of new mutations to genotype-environment interaction for fitness in Drosophila melanogaster. Evolution 50: 2316–2327.
Fry, J. D., P. D. Keightley, S. L. Heinsohn, and S. V. Nuzhdin, 1999 New estimates of the rates and effects of mildly deleterious mutation in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 96: 574–579.
Guillard, R. R. L., and P. E. Hargraves, 1993 Stichochrysis immobilis is a diatom, not a chrysophyte. Phycologia 32: 234–236.
Hall, D. W., S. Fox, J. J. Kuzdzal-Fick, J. E. Strassmann, and D. C. Queller, 2013 The rate and effects of spontaneous mutation on fitness traits in the social amoeba, Dictyostelium discoideum. G3 (Bethesda) 3: 1115–1127.
Halligan, D. L., and P. D. Keightley, 2009 Spontaneous mutation accumulation studies in evolutionary genetics. Annu. Rev. Ecol. Evol. Syst. 40: 151–172.
Holm, S., 1979 A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6: 65–70.
Katju, V., L. B. Packard, L. Bu, P. D. Keightley, and U. Bergthorsson, 2014 Fitness decline in spontaneous mutation accumulation lines of Caenorhabditis elegans with varying effective population sizes. Evolution 69: 104–116.
Kavanaugh, C. M., and R. G. Shaw, 2005 The contribution of spontaneous mutation to variation in environmental responses of Arabidopsis thaliana: responses to light. Evolution 59: 266–275.
Keightley, P. D., 1994 The distribution of mutation effects on viability in Drosophila melanogaster. Genetics 138: 1315–1322.
Keightley, P. D., and M. Lynch, 2003 Toward a realistic model of mutations affecting fitness. Evolution 57: 683–685.
Kishony, R., and S. Leibler, 2003 Environmental stresses can alleviate the average deleterious effect of mutations. J. Biol. 2: 14.
Kondrashov, A. S., 1988 Deleterious mutations and the evolution of sexual reproduction. Nature 336: 435–440.
Kondrashov, A. S., and D. Houle, 1994 Genotype-environment interactions and the estimation of the genomic mutation rate in Drosophila melanogaster. Proc. Biol. Sci. 258: 221–227.
Korona, R., 1999 Genetic load of the yeast Saccharomyces cerevisiae under diverse environmental conditions. Evolution 53: 1966–1971.
Kraemer, S. A., A. D. Morgan, R. W. Ness, P. D. Keightley, and N. Colegrave, 2015 Fitness effects of new mutations in Chlamydomonas reinhardtii across two stress gradients. J. Evol. Biol. 29(3): 583–593.
Lee, H., E. Popodi, H. Tang, and P. L. Foster, 2012 Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl. Acad. Sci. USA 109: E2774–E2783.
Long, H. A., T. Paixão, R. B. R. Azevedo, and R. A. Zufall, 2013 Accumulation of spontaneous mutations in the ciliate Tetrahymena thermophila. Genetics 195: 527–540.
Lynch, M., 2010 Evolution of the mutation rate. Trends Genet. 26: 345–352.
Lynch, M., J. Blanchard, D. Houle, T. Kibota, S. Schultz et al., 1999 Perspective: spontaneous deleterious mutation. Evolution 53: 645–663.
Lynch, M., W. Sung, K. Morris, N. Coffey, C. R. Landry et al., 2008 A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. USA 105: 9272–9277.
Marin, B., and M. Melkonian, 2010 Molecular phylogeny and classification of the Mamiellophyceae class. nov. (Chlorophyta) based on sequence comparisons of the nuclear- and plastid-encoded rRNA operons. Protist 161: 304–336.
Martin, G., and T. Lenormand, 2006 The fitness effect of mutations across environments: a survey in light of fitness landscape models. Evolution 60: 2413–2427.
Massana, R., 2011 Eukaryotic picoplankton in surface oceans. Annu. Rev. Microbiol. 65: 91–110.
Moreau, H., B. Verhelst, A. Couloux, E. Derelle, S. Rombauts et al., 2012 Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. Genome Biol. 13: R74.
Morgan, A. D., R. W. Ness, P. D. Keightley, and N. Colegrave, 2014 Spontaneous mutation accumulation in multiple strains of the green alga, Chlamydomonas reinhardtii. Evolution 68: 2589–2602.
Mukai, T., 1964 The genetic structure of natural populations of Drosophila melanogaster. I. Spontaneous mutation rate of polygenes controlling viability. Genetics 50: 1–19.
Ness, R. W., A. D. Morgan, N. Colegrave, and P. D. Keightley, 2012 Estimate of the spontaneous mutation rate in Chlamydomonas reinhardtii. Genetics 192: 1447–1454.

Pecqueur, D., F. Vidussi, E. Fouilland, E. L. Floc'h, S. Mas et al., 2011 Dynamics of microbial planktonic food web components during a river flash flood in a Mediterranean coastal lagoon. Hydrobiologia 673: 13–27.
Piotrowska-Niczyporuk, A., and A. Bajguz, 2014 The effect of natural and synthetic auxins on the growth, metabolite content and antioxidant response of green alga Chlorella vulgaris (Trebouxiophyceae). Plant Growth Regul. 73: 57–66.
R Core Team, 2014 R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria.
Rutter, M. T., A. Roles, J. K. Conner, R. G. Shaw, F. H. Shaw et al., 2012 Fitness of Arabidopsis thaliana mutation accumulation lines whose spontaneous mutations are known. Evolution 66: 2335–2339.
Sanchez-Ferandin, S., F. Leroy, F.-Y. Bouget, and F. Joux, 2013 A new, sensitive marine microalgal recombinant biosensor using luminescence monitoring for toxicity testing of antifouling biocides. Appl. Environ. Microbiol. 79: 631–638.
Schaack, S., D. E. Allen, L. C. Latta, K. K. Morgan, and M. Lynch, 2013 The effect of spontaneous mutations on competitive ability. J. Evol. Biol. 26: 451–456.
Schrider, D. R., D. Houle, M. Lynch, and M. W. Hahn, 2013 Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 194: 937–954.
Schwartz, D. C., and C. R. Cantor, 1984 Separation of yeast chromosome-sized DNAs by pulsed field gradient gel electrophoresis. Cell 37: 67–75.
Shaw, R. G., D. L. Byers, and E. Darmo, 2000 Spontaneous mutational effects on reproductive traits of Arabidopsis thaliana. Genetics 155: 369–378.
Subirana, L., B. Péquin, S. Michely, M. L. Escande, J. Meilland et al., 2013 Morphology, genome plasticity, and phylogeny in the genus Ostreococcus reveal a cryptic species, O. mediterraneus sp. nov. (Mamiellales, Mamiellophyceae). Protist 164: 643–659.
Sung, W., M. S. Ackerman, S. F. Miller, T. G. Doak, and M. Lynch, 2012 Drift-barrier hypothesis and mutation-rate evolution. Proc. Natl. Acad. Sci. USA 109: 18488–18492.
Vassilieva, L. L., and M. Lynch, 1999 The rate of spontaneous mutation for life-history traits in Caenorhabditis elegans. Genetics 151: 119–129.
Vassilieva, L. L., A. M. Hook, and M. Lynch, 2000 The fitness effects of spontaneous mutations in Caenorhabditis elegans. Evolution 54: 1234–1246.
Wloch, D. M., K. Szafraniec, R. H. Borts, and R. Korona, 2001 Direct estimate of the mutation rate and the distribution of fitness effects in the yeast Saccharomyces cerevisiae. Genetics 159: 441–452.
Worden, A. Z., J. K. Nolan, and B. Palenik, 2004 Assessing the dynamics and ecology of marine picophytoplankton: the importance of the eukaryotic component. Limnol. Oceanogr. 49: 168–179.
Worden, A. Z., J. H. Lee, T. Mock, P. Rouzé, M. P. Simmons et al., 2009 Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas. Science 324: 268–272.
Wright, S., 1932 The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proc. Sixth Int. Congr. Genet. 1: 356–366.
Zeyl, C., and J. A. DeVisser, 2001 Estimates of the rate and distribution of fitness effects of spontaneous mutation in Saccharomyces cerevisiae. Genetics 157: 53–61.

Communicating editor: S. I. Wright


CHAPTER 3:

MUTATION RATE IN THE

MAMIELLOPHYCEAE


What is the spontaneous mutation rate of green algae (Chlorophyta), and how does it vary?

The mutant lines from the mutation accumulation (MA) experiments accumulated ~80 to ~500 independent generations. This chapter deals with the genomic data from these lines, 153 of which were sequenced with Illumina MiSeq and HiSeq, together with the 4 ancestral types. Only the Mamiellophyceae are considered here; the mutation rate of Picochlorum RCC4223 is addressed in Chapter 5. All data were processed with the same bioinformatic pipeline. In brief, genomes are aligned to the available reference genome with BWA (Li and Durbin, 2010); the output files are processed with Samtools (Li et al., 2009); mutations are identified with GATK (DePristo et al., 2011). Substantial variation in the mutation rate is highlighted and discussed. Non-coding and weakly expressed regions mutate more than other regions of the genome. This may be explained by repair mechanisms linked to transcription, known as transcription-coupled repair (TCR). The mutation rates obtained are compared with those in the existing literature on MA experiments and pedigree studies (human and mouse). We discuss the role of genome size, and of the distance between the actual GC content and the equilibrium GC content, in between-species variation of the mutation rate. Indeed, a mutation bias is observed, with mutations from GC to AT more frequent than the reverse. This higher mutation rate at G and C nucleotides results in a higher mutation rate for genomes that are far from their GC equilibrium. In summary, this study highlights several factors of within-genome and between-genome variation and attempts to provide answers regarding the evolution of the mutation rate in eukaryotes. The supplementary material for this chapter is available on pages 143 to 155.
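The link between mutation bias, equilibrium GC content and the genome-wide mutation rate can be made concrete with a small sketch. It uses the standard two-rate model in which u is the per-site mutation rate from G or C to A or T, and v the rate in the opposite direction. The function names and the numerical rates are hypothetical and serve only as an illustration; they are not estimates from this chapter.

```python
def equilibrium_gc(u_gc_to_at: float, v_at_to_gc: float) -> float:
    """GC content at which GC->AT and AT->GC mutations balance each other."""
    return v_at_to_gc / (u_gc_to_at + v_at_to_gc)


def average_mutation_rate(gc: float, u_gc_to_at: float, v_at_to_gc: float) -> float:
    """Per-site mutation rate averaged over a genome with GC content gc."""
    return gc * u_gc_to_at + (1.0 - gc) * v_at_to_gc


# Hypothetical rates in arbitrary units: GC sites mutate twice as fast as AT sites.
u, v = 2.0, 1.0
gc_eq = equilibrium_gc(u, v)                    # one third
rate_at_eq = average_mutation_rate(gc_eq, u, v)
rate_observed = average_mutation_rate(0.60, u, v)
relative_increase = rate_observed / rate_at_eq - 1.0
```

With these illustrative values, a genome at 60% GC has a mutation rate about 20% above that of a genome at its equilibrium GC content, which shows how a GC-rich genome under an AT-biased mutation spectrum mutates faster on average.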


The rate of spontaneous mutation in pico-algae and implications for mutation rate variation

Krasovec Marc*, Eyre-Walker Adam‡, Sanchez-Ferandin Sophie*, Piganeau Gwenael*.

* Sorbonne Universités, UPMC Univ Paris 06, CNRS, Biologie Intégrative des Organismes Marins (BIOM), Observatoire Océanologique, F-66650, Banyuls/Mer, France
‡ School of Life Sciences, University of Sussex, Brighton BN1 9QG, United Kingdom

Keywords: Spontaneous mutation rate, Mutation accumulation, Effective population size, GC content, Phytoplankton.

Corresponding authors: [email protected]

ABSTRACT

Mutation, the ultimate source of genetic variation, has been studied by generations of evolutionary biologists. Genome-wide spontaneous mutation rates have been estimated by mutation accumulation experiments in many model species. Here, we report mutation rate estimations in four marine green algal species: Bathycoccus prasinos, Ostreococcus tauri, Ostreococcus mediterraneus and Micromonas pusilla. There is a twofold variation of spontaneous mutation rate between species, from µ = 4.4 × 10⁻¹⁰ mutations per nucleotide per generation to 9.8 × 10⁻¹⁰. Within genomes, there is a threefold increase in the mutation rate in lowly transcribed regions, consistent with transcription-coupled DNA repair. The mutation rate variation between species can be explained by genome size, consistent with a lower fidelity of replication in larger genomes. Additionally, we provide evidence that departure from equilibrium GC content impacts the mutation rate, accounting for up to a 70% increase of the mutation rate in some eukaryotic species.

INTRODUCTION

Mutations are responsible for the genetic variability within organisms (Wright, 1932), which permits adaptation by natural selection. Thus, estimation of the mutation rate (µ) is important for a better understanding of evolution and adaptability. Estimating the mutation rate was difficult until recently, because mutations are rare events, so methods relied either on reporter constructs, for example the reversion to antibiotic resistance, or on phylogenetic methods that required knowledge of divergence times and assumptions of neutrality. However, new sequencing technologies have allowed the estimation of the mutation rate from either offspring-parent trios, in humans (Abecasis et al., 2010; Conrad et al., 2011) and mice (Adewoye et al., 2015; Uchimura et al., 2015), or mutation accumulation (MA) experiments (Halligan and Keightley, 2009; Lynch et al., 2008) in organisms such as Drosophila melanogaster (Haag-Liautard et al., 2007; Keightley et al., 2014a, 2009), Arabidopsis thaliana (Ossowski et al., 2010), Caenorhabditis elegans (Denver et al., 2012, 2009, 2004), unicellular eukaryotes such as Saccharomyces cerevisiae (Lang and Murray, 2008; Lynch et al., 2008; Wloch et al., 2001; Zhu et al., 2014) and bacteria such as Escherichia coli (Lee et al., 2012) and Salmonella typhimurium (Lind and Andersson, 2008). The mutation rate varies considerably across the tree of life, from 1.94 × 10⁻¹¹ in the ciliate Paramecium tetraurelia (Sung et al., 2012b) to 9.78 × 10⁻⁹ in the bacterium Mesoplasma florum (Sung et al., 2012a).

The variation of mutation rate between species appears to be correlated with two factors: genome size (Drake, 1991; Drake et al., 1998), in particular the size of the protein-coding component of the genome (Lynch, 2010a), and effective population size (Lynch, 2010a; Sung et al., 2012a). Both of these correlations may arise because of the limitations that genetic drift imposes on selection to minimize the mutation rate (Sung et al., 2012a). In asexual species, selection will favour an intermediate mutation rate, which generates sufficient advantageous mutations whilst not generating too many deleterious mutations. In contrast, in sexual species, selection always acts to minimize the mutation rate, because a modifier of the mutation rate only stays linked to the mutations it causes for a short period of time, and deleterious mutations are more prevalent than advantageous mutations, increasing the genetic load (Agrawal and Whitlock, 2012).

However, genetic drift ultimately limits the degree to which the mutation rate can be reduced (Martincorena and Luscombe, 2013), because the strength of selection acting on a modifier is equal to γ·U·s, where γ is the proportional decrease in the mutation rate, U is the genomic rate of mutation and s is the average strength of selection against deleterious mutations. If γ·U·s < 1/Ne, then selection will be ineffective against the modifier and the mutation rate cannot be reduced further. Hence we expect the per-site rate of mutation to depend upon the effective population size (Charlesworth, 2009; Lanfear et al., 2014), with species with larger Ne having lower mutation rates, and upon genome size: the more selected sites there are, the lower the mutation rate should be. These predictions appear to be largely upheld (Lynch, 2010a).
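The drift limit described above can be written as a simple comparison. The sketch below tests whether selection on a mutation-rate modifier exceeds the drift threshold 1/Ne, and gives the smallest effective population size at which it would. The function names and the parameter values are hypothetical and chosen only to illustrate the inequality.

```python
def selection_on_modifier(gamma: float, U: float, s: float) -> float:
    """Strength of selection on a modifier that lowers the mutation rate.

    gamma: proportional decrease in the mutation rate caused by the modifier
    U: genomic mutation rate per generation
    s: average selection coefficient against deleterious mutations
    """
    return gamma * U * s


def selection_is_effective(gamma: float, U: float, s: float, Ne: float) -> bool:
    """True when selection on the modifier is stronger than drift (1/Ne)."""
    return selection_on_modifier(gamma, U, s) > 1.0 / Ne


def minimum_effective_size(gamma: float, U: float, s: float) -> float:
    """Smallest Ne for which gamma * U * s equals 1/Ne."""
    return 1.0 / selection_on_modifier(gamma, U, s)


# Hypothetical values: a modifier lowering the mutation rate by 10%,
# U = 0.004 mutations per genome per generation, s = 0.01.
threshold_ne = minimum_effective_size(0.1, 0.004, 0.01)
```

With these illustrative values the threshold is an effective size of about 250,000: a modifier of this effect would be effectively neutral in smaller populations and selected in larger ones.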

It has also been observed that there is variation in the mutation rate within a genome at a number of different scales, from differences between chromosomes, to variation between regions on a chromosome and variation between adjacent sites (Hodgkinson and Eyre-Walker, 2011; Schrider et al., 2011). As an example, the Y chromosome in humans and chimps mutates faster than the other chromosomes (Ebersberger et al., 2002). It is also known that the mitochondrial genome has a higher mutation rate than the nuclear genome in Caenorhabditis elegans (Denver et al., 2009, 2000), Homo sapiens (Rebolledo-Jaramillo et al., 2014) and Drosophila melanogaster (Haag-Liautard et al., 2008; Keightley et al., 2009). Within chromosomes, it has been shown that nucleotide context affects the mutability of a site in Chlamydomonas reinhardtii (Ness et al., 2015b), Bacillus subtilis (Sung et al., 2015) and humans (Aggarwala and Voight, 2016; Gojobori et al., 1982). In mammals, the most conspicuous effect is the high mutability of CpG dinucleotides resulting from cytosine deamination (Coulondre et al., 1978; Fryxell and Zuckerkandl, 2000), which leads to an 80% reduction in the frequency of the CpG dinucleotide in the human genome (Lander et al., 2001). Gene expression also affects the rate of mutation, but the direction of its effect is controversial. An initial report suggested that the mutation rate is lower in highly expressed genes (Martincorena et al., 2012). However, analysis of MA lines in Escherichia coli showed that the mutation rate increases with gene expression (Chen and Zhang, 2013). This latter finding is consistent with observations in Saccharomyces cerevisiae and humans (C. Park et al., 2012; Polak and Arndt, 2008). This phenomenon, which results from alterations of the DNA sequence associated with the transcription process, is known as transcription-associated mutagenesis (Kim and Jinks-Robertson, 2012).

In this study, we provide the first estimates of the spontaneous mutation rate in 4 species of haploid green algae (Chlorophyta, Mamiellophyceae (Marin and Melkonian, 2010)): Ostreococcus tauri RCC4221 (Blanc-Mathieu et al., 2014), O. mediterraneus RCC2590 (Subirana et al., 2013), Micromonas pusilla RCC299 (Worden et al., 2009) and Bathycoccus prasinos RCC1105 (Moreau et al., 2012), with compact genomes containing 83% to 84% coding sequences. Green algae constitute one of the most important photosynthetic groups on Earth, with a ubiquitous distribution in the global ocean (de Vargas et al., 2015), and play a fundamental role in food webs and biogeochemical cycles (Worden et al., 2015). These green algae span a large evolutionary divergence, as revealed by a high proportion of species-specific genes and high amino-acid divergence between orthologous genes (Jancek et al., 2008; Šlapeta et al., 2006). Their genome size ranges from 13 Mb to 21 Mb and their average GC content from 48% to 63%. Combined with spontaneous mutation rates from previous studies, these new data enable the exploration of the role of genome size, transcription rates and GC content in mutation rate variation.

MATERIAL AND METHODS

MA experiments

Mutation accumulation experiments were performed on four haploid marine green algae (Chlorophyta): Ostreococcus tauri RCC4221, O. mediterraneus RCC2590, Micromonas pusilla RCC299 and Bathycoccus prasinos RCC1105. All strains are maintained in the Roscoff Culture Collection (RCC), in France (http://roscoff-culture-collection.org/). MA lines were started from a clonal population and maintained in L1 liquid medium in 24 wells of a microtiter plate, with a one-cell bottleneck every 14 days (Krasovec et al., 2016). Serial bottlenecks largely removed the influence of natural selection (the average effective population size, estimated as the harmonic mean of cell number, varied between 6 and 9 across the four species, Table S1).

Cell concentrations of MA lines were measured by flow cytometry using a FACSCanto II flow cytometer (Becton Dickinson, Franklin Lakes, NJ, U.S.A.), based on their natural chlorophyll fluorescence (FL3 acquisition at 670 nm) and side scatter (SSC) acquisitions. Depending on cell concentration, the volume corresponding to one cell was inoculated into a new well plate with fresh medium (we always assumed N0=1 to estimate the effective population size). The number of generations per day, G, was estimated as follows:

$$G = e^{\left[\ln\left(N_t / N_0\right)/t\right]}$$

where Nt is the number of cells in the well at bottleneck time, N0 = 1 and t = 14 days. MA experiments were performed over a period of 224 to 378 days depending on the species, and MA lines accumulated between 80 and 500 independent generations (Table S1).
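The harmonic-mean estimate of effective population size mentioned above can be sketched as follows. This is an illustration under a stated assumption that is not in the text: cells divide synchronously by binary fission, so that population sizes within one bottleneck cycle are 1, 2, 4, ..., Nt. The function names and the final cell count are hypothetical.

```python
import math


def doublings(n_t: float, n_0: float = 1.0) -> float:
    """Number of binary divisions needed to grow from n_0 to n_t cells."""
    return math.log2(n_t / n_0)


def harmonic_mean_ne(n_doublings: int, n_0: float = 1.0) -> float:
    """Harmonic mean of population size over one bottleneck cycle.

    Assumes synchronous doubling: sizes are n_0, 2*n_0, ..., 2**g * n_0.
    """
    sizes = [n_0 * 2 ** i for i in range(n_doublings + 1)]
    return len(sizes) / sum(1.0 / n for n in sizes)


if __name__ == "__main__":
    g = round(doublings(16384))  # hypothetical final count of 2**14 cells
    print(g, harmonic_mean_ne(g))
```

Because the sum of reciprocals is dominated by the first few generations after the bottleneck, the harmonic mean stays small even when the final population is large: a cycle of 14 doublings gives a value of about 7.5, within the range of 6 to 9 reported in the text.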

Sequencing

DNA from the ancestral types and MA lines was extracted as described previously (Winnepenninckx et al., 1993) and sequenced with Illumina technology. All library preparations and sequencing were performed by GATC biotech® (Konstanz, Germany). Two different sequencing platforms were used: MiSeq for O. tauri and O. mediterraneus, and HiSeq for B. prasinos and M. pusilla. Reads from ancestral types and MA lines were aligned to the reference genomes using BWA (Li and Durbin, 2010) (M. pusilla: GCA_000151265.1; O. tauri: GCF_000214015.2; B. prasinos: ; O. mediterraneus in preparation), and SAMtools (Li et al., 2009) was used to obtain bam and mpileup files. The four ancestral types and 150 MA lines were sequenced: 40 for O. tauri, 37 for O. mediterraneus, 37 for M. pusilla and 36 for B. prasinos.

Mutation identifications

Mutations were called from mpileup files (Li et al., 2009) using GATK (DePristo et al., 2011). The final mutation candidates were filtered to remove low mapping quality regions (<50), low coverage regions (<5 reads), and mutations shared between all MA lines. The number of callable sites per genome above these thresholds was computed to estimate the per base pair mutation rate (97 to 99% of each genome was callable, Table S2). All mutation candidates were compared to the ancestral type to discard spurious candidates that result from discrepancies between the reference genome and the ancestral strain at the start of the MA experiment (e.g. 9 substitutions in O. tauri RCC4221 occurred between 2001 and 2009 (Blanc-Mathieu et al., 2014)). Sanger re-sequencing of 22 randomly chosen mutation candidates confirmed all of them (true positive rate = 100%). The category of each mutation (non-synonymous, synonymous, intronic or intergenic) was extracted with snpEff (Cingolani et al., 2012). This calling method was used for both base substitutions and indels.
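The filtering logic described above can be sketched as follows. This is only an illustration of the three filters and the comparison with the ancestor, not the pipeline actually used; the `Candidate` record, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    line: str      # MA line identifier (hypothetical field names throughout)
    chrom: str
    pos: int
    alt: str
    mapq: float    # mapping quality at the site
    depth: int     # number of reads covering the site


def filter_candidates(cands, ancestor_variants, n_lines, min_mapq=50, min_depth=5):
    """Keep candidates that pass the thresholds, are absent from the ancestor,
    and are not shared by all MA lines."""
    # Record in which lines each variant (chrom, pos, alt) was called.
    lines_per_variant = {}
    for c in cands:
        lines_per_variant.setdefault((c.chrom, c.pos, c.alt), set()).add(c.line)

    kept = []
    for c in cands:
        key = (c.chrom, c.pos, c.alt)
        if c.mapq < min_mapq or c.depth < min_depth:
            continue  # low mapping quality or low coverage
        if key in ancestor_variants:
            continue  # already differs from the reference in the ancestor
        if len(lines_per_variant[key]) == n_lines:
            continue  # shared by every line, hence not a de novo mutation
        kept.append(c)
    return kept


if __name__ == "__main__":
    cands = [
        Candidate("L1", "chr1", 100, "A", 60, 10),
        Candidate("L2", "chr1", 100, "A", 60, 10),
        Candidate("L2", "chr3", 7, "T", 55, 6),
    ]
    print(filter_candidates(cands, ancestor_variants=set(), n_lines=2))
```

In this toy input the variant at chr1:100 is present in both lines and is discarded, while the variant private to line L2 is retained.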

Mutation rate at equilibrium GC content

Let R1 be the rate of mutation from GC to AT, R2 the rate from AT to GC, R3 the rate of mutation between A and T, and R4 the rate between G and C, each estimated as

$$R_n = \frac{(NN \rightarrow NN)}{NN_n} \qquad (1)$$

where NNn is the number of GC or AT sites in the genome and NN→NN is the number of mutations of the corresponding type (e.g. GC→AT or AT→GC). It is then straightforward to show that the GC content at mutational equilibrium (Sueoka, 1962) is

$$GC_{eq} = \frac{R_2}{R_1 + R_2} \qquad (2)$$

Assuming that R1, R2, R3 and R4 are constant, the expected mutation rate at equilibrium is

$$\mu_{eq} = GC_{eq} \, (R_1 + R_3) + (1 - GC_{eq}) \, (R_2 + R_4) \qquad (3)$$
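The equilibrium quantities defined in this section can be computed with a short sketch. The function names are hypothetical and the four rates are arbitrary illustrative numbers, not values measured in this study.

```python
def gc_equilibrium(r1: float, r2: float) -> float:
    """Equilibrium GC content: R2 / (R1 + R2)."""
    return r2 / (r1 + r2)


def mutation_rate_at(gc: float, r1: float, r2: float, r3: float, r4: float) -> float:
    """Expected per-site mutation rate for a genome with GC fraction `gc`."""
    return gc * (r1 + r3) + (1.0 - gc) * (r2 + r4)


if __name__ == "__main__":
    # Hypothetical rates in arbitrary units, with GC->AT twice as frequent as AT->GC.
    r1, r2, r3, r4 = 2.0, 1.0, 0.5, 0.5
    gc_eq = gc_equilibrium(r1, r2)
    mu_eq = mutation_rate_at(gc_eq, r1, r2, r3, r4)
    print(gc_eq, mu_eq)
```

Evaluating `mutation_rate_at` at a GC fraction other than the equilibrium value gives the rate expected for a genome that is away from mutational equilibrium, under the same assumption of constant R1 to R4.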

Mutation spectrum tests

To investigate the effects of context, we extracted the 10 bp on either side of each mutated site and used binomial tests to investigate whether a particular trinucleotide, either NXN or NNX, where X is the mutated site, has a significantly higher or lower mutation rate. We also ran a logistic regression to test whether the GC content surrounding the site affected whether the site had a mutation or not. To investigate whether gene expression affected the rate of mutation, we used STAR (Dobin et al., 2013) to compute the coverage of the genome by RNAseq data, available from the ORCAE web site (Sterck et al., 2012), for B. prasinos (RNAseq data from Moreau and co-workers (Moreau et al., 2012)) and O. tauri (RNAseq data from Blanc-Mathieu and co-workers (Blanc-Mathieu et al., 2014)). Statistical analyses were performed with R (version 3.1.1) (R Development Core Team, 2011).
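The context extraction step can be sketched as follows. The helper names and the toy sequence are hypothetical, positions are assumed to be 0-based, and the sketch covers only the extraction of flanking sequence, the NXN trinucleotide and the local GC content that would feed the statistical tests.

```python
def flanking_context(seq: str, pos: int, flank: int = 10):
    """Return (left flank, mutated base, right flank) around 0-based position `pos`."""
    left = seq[max(0, pos - flank):pos]
    right = seq[pos + 1:pos + 1 + flank]
    return left, seq[pos], right


def trinucleotide(seq: str, pos: int):
    """Trinucleotide NXN centred on `pos`, or None at the ends of the sequence."""
    if pos < 1 or pos > len(seq) - 2:
        return None
    return seq[pos - 1:pos + 2]


def gc_fraction(s: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    s = s.upper()
    return (s.count("G") + s.count("C")) / len(s)


if __name__ == "__main__":
    genome = "ACGTACGTACGTACGTACGTACGTA"  # toy sequence, for illustration only
    left, base, right = flanking_context(genome, 12)
    print(left, base, right, trinucleotide(genome, 12), gc_fraction(left + right))
```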

RESULTS

Mutation rates within Mamiellophyceae

We performed an MA experiment in four species of algae. Altogether, we found 238 single nucleotide mutations and 48 indels, summarized in Table 1. Mutation types are provided in Tables S3 to S8. The numbers of synonymous and non-synonymous mutations are as expected if mutations are randomly distributed across sites for all species (Table 2), consistent with a lack of selection on non-synonymous spontaneous mutations during the MA experiments. We thus assume that the rates and patterns of mutation are not affected by selection.

The base-substitution mutation rate (µbs) and the insertion-deletion mutation rate (µID) per nucleotide per generation were estimated on callable sites, which represented 97 to 99% of the genome (Table S2). The total mutation rate, µtot, is the sum of µbs and µID. Total mutation rates varied over two-fold, from 4.4 × 10^-10 mutations per site per generation in B. prasinos to 9.8 × 10^-10 in M. pusilla, although the differences between species were not significant (Kruskal-Wallis test).
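A per-nucleotide, per-generation rate of this kind is commonly obtained by dividing the number of mutations by the total number of site-generations surveyed. The sketch below assumes this estimator, which the text does not spell out; the function name and the input numbers are hypothetical.

```python
def mutation_rate(n_mutations: int, lines) -> float:
    """Mutations per site per generation.

    `lines` is an iterable of (callable_sites, generations) pairs, one per MA line.
    """
    site_generations = sum(sites * gens for sites, gens in lines)
    return n_mutations / site_generations


if __name__ == "__main__":
    # Hypothetical example: two lines, each with 1e7 callable sites and 500 generations.
    lines = [(1e7, 500), (1e7, 500)]
    mu_bs = mutation_rate(10, lines)  # base substitutions
    mu_id = mutation_rate(2, lines)   # insertions and deletions
    print(mu_bs, mu_id, mu_bs + mu_id)  # the last value is the total rate
```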

Table 1. Summary of spontaneous mutation rates in four Mamiellophyceae species. BS is the number of base-substitution mutations, Ins the number of insertions and Del the number of deletions. G is the genome size in Mb and µ the mutation rate per nucleotide per generation (× 10^-10). TotGen is the total number of generations accumulated per species.

Species            TotGen   G (Mb)   BS   Ins   Del   µbs    µID    µtot
O. tauri           17 250   12.46    91   5     8     4.19   0.60   4.79
O. mediterraneus    8 380   13.34    54   3     8     4.92   1.00   5.92
B. prasinos         4 145   14.96    22   5     5     3.02   1.37   4.39
M. pusilla          4 994   20.99    71   2     12    8.15   1.61   9.76

Non-random mutation events in the genome

It has been reported that the rate of mutation at a site varies between nucleotides, and that some trinucleotides are more mutable than others, both in eukaryotes (Ness et al., 2015b) and bacteria (Sung et al., 2015). However, we did not detect any influence of adjacent nucleotides or GC content upon the mutation rate in our data. Nevertheless, the analysis of the distribution of mutations across these four species reveals significant deviations from a uniform distribution of mutations along the genome.

First, mutation events tend to cluster within adjacent nucleotides: of our 238 base-substitution mutations across all species, 37 occurred adjacent to one another. These clustered mutations probably represent single mutational events, since the mutations of each cluster are found within a single strain. No mutations were found in the mitochondrial or chloroplast genomes of these species; this is perhaps not surprising, since both genomes are small relative to the nuclear genome and both have lower nucleotide diversity than the nuclear genome, suggesting that they might have lower mutation rates, consistent with patterns seen in higher plants (Smith, 2015). Second, there is an excess of mutations in non-coding (Chi-squared test, P-value<0.01) and lowly expressed sequences (Wilcoxon test, P-value<0.001) as opposed to coding regions (Table S9). The mutation rate varies by about three-fold (Table 2), in contrast to what is observed in both humans and yeast (C. Park et al., 2012). Third, there are significantly more deletions than insertions (Binomial test, P-value<0.05) if we combine the data from all species. A deletion bias has been reported in species from the three domains of life (Kuo and Ochman, 2009), and may have contributed to the compact genomes of Mamiellophyceae species. Most indels appeared in non-coding regions, and among the 20 indels occurring in coding regions, there are 14 frameshifts, 2 codon insertions and 4 codon deletions. Last, mutations are overrepresented in the first and last 1000 bp of chromosomes (Binomial test, P-value < 0.001). However, despite these hypervariable telomeric regions, there are no significant differences in the mutation rate between chromosomes (Chi-squared test, ns).
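The deletion bias can be checked with an exact binomial test on the insertion and deletion counts of Table 1. The sketch below is a plain-Python stand-in for the test, assuming a two-sided test against equal probabilities of insertion and deletion; the text does not state which variant of the test was used, and the function name is hypothetical.

```python
from math import comb


def binomial_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial P-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


deletions = 8 + 8 + 5 + 12   # Del column of Table 1
insertions = 5 + 3 + 5 + 2   # Ins column of Table 1
p_value = binomial_two_sided(deletions, deletions + insertions)

if __name__ == "__main__":
    print(deletions, insertions, p_value)
```

With 33 deletions out of 48 indels, the resulting P-value is below 0.05, in agreement with the significance reported in the text.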

Table 2. Mutation rate variation between coding and non-coding sequences. The bias of mutation toward non-coding sequences is significant, with P-value<0.01 (Chi-squared test). Syn and non-syn are the synonymous and non-synonymous point mutations.

Species            % genome               µ × 10^-10        µ × 10^-10            N mutations
                   non-coding : coding    coding regions    non-coding regions    syn : non-syn
O. tauri           18.4 : 81.6            3.9               8.9                   19 : 42
O. mediterraneus   15.6 : 84.4            5.0               11.7                   9 : 34
B. prasinos        16.9 : 83.1            3.4               14.7                   5 : 10
M. pusilla         18.1 : 81.9            8.15              16.1                  15 : 41

The direction of base-substitution mutations

The mutational spectrum of the Mamiellophyceae is biased towards GC to AT mutations (significant for O. tauri and M. pusilla; Binomial test, P-value<0.01 and P-value<0.05, respectively) (Figure 1 and S1). The equilibrium GC content, GCeq, is substantially lower than the current GC content (GCeq = 36.8% and GCobs = 59.0% for O. tauri, 43.5% and 56.0% for O. mediterraneus, 46.2% and 63.8% for M. pusilla and 36.8% and 48.0% for B. prasinos), which suggests that other forces are acting to maintain the GC content above its mutational equilibrium. Note that two chromosomes are defined as outlier chromosomes in Mamiellophyceae because they have a lower GC content than the other chromosomes, and are closer to GCeq. These chromosomes have a GC content of 51.3% in M. pusilla, 49.9% in O. mediterraneus, 54.3% in O. tauri and 41.9% in B. prasinos.

Figure 1. The GC to AT and AT to GC mutations in the four species (bar chart of the number of GC→AT and AT→GC mutations for O. tauri, O. mediterraneus, B. prasinos and M. pusilla). GC to AT bias is significant in O. tauri and M. pusilla (Binomial test, P-value<0.01 and P-value = 0.02, respectively).

Inter-genomic variation in the mutation rate in eukaryotes

Several ecological and biological factors have been proposed to explain the variation in the mutation rate between species, such as genome size and effective population size. We compiled the available estimates of the spontaneous mutation rate from whole genome sequencing in wild-type strains (Table S10), adding the new mutation rate estimates from this study.

Following Sung and co-workers, we performed a meta-analysis to investigate the effect of effective population size on the mutation rate (Sung et al., 2012a). Using the Ne estimates provided in Table S10, we observe a significant decrease of the mutation rate with the effective population size across all species (n=17, Pearson correlation, P-value<0.001, ρ=-0.71) (Figure 2). There is a negative correlation between genome size (G) and mutation rate in bacteria (n=8, Pearson correlation, P-value=0.001, ρ=-0.95) and a positive correlation between G and µ in eukaryotes, where the mutation rate increases with genome size (n=18, Pearson correlation, P-value=0.001, ρ=0.69, Figure 3A). These results are consistent with previous meta-analyses by Lynch and by Sung et al. (Lynch, 2010a; Sung et al., 2012a), where it was suggested that the proportion of coding regions, rather than genome size, was the relevant parameter. The increase of U with genome size in eukaryotes reveals an increase in the number of mutations per genome at each division (n=18, Pearson correlation, P-value=0.0001, ρ=0.92, Figure 3C). Note that there are two outlier species in eukaryotes, Dictyostelium discoideum and Paramecium tetraurelia (Figure 3). Excluding these two species would provide more significant results. Additionally, the relation between genome size and mutation rate is also observed using the size of the protein-coding genome, excluding the two outlier species (n=16, Pearson correlation, P-value=0.0001, ρ=0.84, Figure 3B).

Figure 2. Correlation of the base substitution mutation rate and the effective population size (n=17, Pearson correlation P-value<0.001, ρ=-0.71). X-axis: log10 of effective population size (Ne); y-axis: log10 of nucleotide mutation rate (µ). Legend: Bacteria, Unicellular Eukaryotes, Mamiellophyceae, Metazoans, Arabidopsis.
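The correlations reported here are computed on log10-transformed values. A minimal sketch of that calculation is given below; the function names are hypothetical and the input values are invented for illustration, they are not the data of Table S10.

```python
import math


def pearson(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log10_pearson(x, y) -> float:
    """Pearson correlation after log10 transformation of both variables."""
    return pearson([math.log10(v) for v in x], [math.log10(v) for v in y])


if __name__ == "__main__":
    # Invented values: mutation rate falling ten-fold for each ten-fold increase in Ne.
    ne = [1e4, 1e5, 1e6, 1e7]
    mu = [1e-8, 1e-9, 1e-10, 1e-11]
    print(log10_pearson(ne, mu))
```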

Figure 3. Correlation of the base substitution mutation rate, µ, in log10 scale. Raw data come from Table S10. Blue regressions are done without the 2 outliers, Dictyostelium discoideum (Dd) and Paramecium tetraurelia (Pt). A. Mutation rates as a function of the genome size G (n=18 eukaryotes, Pearson correlation, P-value=0.001, ρ=0.69; n=16 eukaryotes, Pearson correlation, P-value<0.0001, ρ=0.89; n=8 bacteria, Pearson correlation, P-value<0.0003, ρ=-0.95). B. Mutation rates as a function of effective genome size, estimated as the coding genome size, Ge (n=16 eukaryotes, Pearson correlation, P-value<0.0001, ρ=0.84). C. Mutation rate per genome as a function of genome size (n=18 eukaryotes, Pearson correlation, P-value<0.0001, ρ=0.92). X-axes: log10 of genome size (Mb) in A and C, log10 of effective genome size (Mb) in B; y-axes: log10 of nucleotide mutation rate (µ) in A and B, log10 of genomic mutation rate (U) in C.

One striking feature of the mutation spectrum in Mamiellophyceae is the high GC->AT mutation bias and the large gap between the observed genomic GC content and the equilibrium GC content predicted from the pattern of mutation (Table S11). Departures from the equilibrium GC content affect the mutation rate; if the mutation rate is biased towards AT and the observed GC content is above the equilibrium value, then the mutation rate is elevated relative to its value at the equilibrium GC content (Figure 3A). If we calculate the mutation rate at the equilibrium GC content, we find that the observed mutation rate can be up to 2.5-fold higher than expected at equilibrium. The ratio of the observed and the equilibrium mutation rates is highly correlated to the ratio of the observed and equilibrium GC content, GCr (Pearson, P-value = 2 × 10^-16, ρ = 0.99, Figure 4B). The correlation between the observed mutation rate and GCr is also positive (Pearson correlation, P-value = 0.01, ρ = 0.51) (Figure 4A). GC->AT bias, estimated as R1/R2, is positively correlated to the nucleotide mutation rate, excluding Mesoplasma florum and Paramecium tetraurelia (Pearson correlation, P-value=0.002, ρ=0.61), Figure S3.
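The link between the departure from GC equilibrium and the elevation of the mutation rate can be made explicit with a short derivation. It assumes, as in the Methods, that the rates R1 to R4 are constant, and it additionally assumes that the observed rate takes the same form as the equilibrium rate evaluated at the observed GC content. Writing the expected rate for a genome of GC content GC as

$$\mu(GC) = GC \, (R_1 + R_3) + (1 - GC) \, (R_2 + R_4),$$

the difference between the observed and the equilibrium rates is

$$\mu_{obs} - \mu_{eq} = (GC_{obs} - GC_{eq}) \, \left[(R_1 + R_3) - (R_2 + R_4)\right],$$

and, with $GC_r = GC_{obs} / GC_{eq}$,

$$\frac{\mu_{obs}}{\mu_{eq}} = 1 + (GC_r - 1) \, \frac{GC_{eq} \, \left[(R_1 + R_3) - (R_2 + R_4)\right]}{\mu_{eq}}.$$

Under these assumptions, when GC sites are more mutable than AT sites ($R_1 + R_3 > R_2 + R_4$) and the genome is more GC-rich than its mutational equilibrium ($GC_r > 1$), the observed rate exceeds the equilibrium rate, and the excess grows linearly with $GC_r$.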

Figure 4. Correlation between mutation rate and gap from GC equilibrium. Pt is Paramecium tetraurelia and Mp is Mesoplasma florum. A. Observed base-substitution mutation rates (log10 of µ) as a function of the relative gap from GCeq, GCr = GC/GCeq (Pearson, P-value = 0.01, ρ = 0.51, excluding Pt). B. Correlation between the relative increase of the observed mutation rate over the equilibrium mutation rate (µobs/µeq) and the relative gap from GCeq (Pearson, P-value = 2 × 10^-16, ρ = 0.99). Legend: Bacteria, Unicellular Eukaryotes, Mamiellophyceae, Metazoans, Arabidopsis.

To test the effect of effective population size, Ne, effective genome size, Ge, genome size, G, and distance to equilibrium GC content, GCr, on the spontaneous mutation rate, we performed stepwise selection of these predictor variables using the stepAIC function from R version 3.1.1 (R CoreTeam 2014). The data set comprised n=11 eukaryotes, excluding the outlier Paramecium tetraurelia. The final fitted model included two parameters, G and GCr (AIC=-28.8). In conclusion, spontaneous mutation rates in eukaryotes increase with genome size G and with distance to equilibrium GC content, GCr. In this dataset, the effective population size effect may be cancelled by the genome size effect, as Ne and G are negatively correlated (Pearson correlation, P-value = 0.004, ρ = -0.64).

DISCUSSION

We have performed mutation accumulation experiments in 4 species of pico-phytoplankton followed by whole genome sequencing. In total we have observed 238 point mutations and 48 indels. These have allowed us to study various aspects of the mutation rate. The genome coverage of each mutation accumulation line is 98% and mutation rates vary from µ = 4.4 × 10^-10 to 9.8 × 10^-10 mutations per nucleotide per generation.

Within-genome variation of mutation rate

We observed a two- to three-fold difference in the mutation rate between coding and non-coding regions in our dataset. There are several possible explanations for this observation. First, this could simply reflect selection against spontaneous mutations in coding regions. The MA experiment was designed such that all but the most strongly deleterious mutations would accumulate. However, strongly deleterious mutations will lead to line loss, which is something we observed (Krasovec et al., 2016). In coding regions, approximately one third of nucleotide positions are synonymous and are thus expected to have little consequence on fitness. If selection occurred during the MA experiments, mutations in coding regions should be biased towards synonymous mutations. There is no excess of synonymous mutations in any of the MA experiments (Chi-squared test, NS), consistent with a lack of selection during the experiment.

Second, the lower mutation rate in coding regions could reflect a difference in the efficiency of mismatch repair (MMR) between coding and non-coding regions (Kunkel and Erie, 2015). MMR efficiency may be optimized in coding regions of the genome (Foster et al., 2015; Lee et al., 2012). In E. coli, MA experiments using wild-type and MMR-deficient lines show that the bias between coding and non-coding sequences disappears in MMR-deficient lines (Foster et al., 2015). Third, the higher mutation rate in non-coding regions could result from transcription-coupled DNA repair (TCR) (Hanawalt and Spivak, 2008). This system repairs DNA lesions that are encountered during transcription and hence is more likely to act in the expressed regions of the genome. In Mamiellophyceae, genes coding for TCR have been identified (gene family HOMO03P001591 from the PicoPLAZA database) (Vandepoele et al., 2013).

In conclusion, mutations in Mamiellophyceae do not occur randomly along the genome, and we report one of the largest differences in the mutation rate between coding and non-coding sequences in eukaryotes. The difference in the mutation rate between coding and non-coding sequences may affect mutation rate estimates in some other species. In our data we were able to call de novo mutations in 98.5% of the genome, but this fraction is much smaller in some other studies, such as those on A. thaliana (Ossowski et al., 2010) (78%), C. reinhardtii (Ness et al., 2015b, 2012) (75%) and Heliconius melpomene (Keightley et al., 2014b) (46%). In some of these studies there is a bias towards coding regions, which will decrease the estimated mutation rate if there are differences between coding and non-coding regions as we have reported here.

Inter-specific mutation rate variation

The genomic mutation rate varies by about two-fold amongst the four species of Mamiellophyceae investigated here. Including these in an analysis of all mutation rate estimates confirms the negative correlation between the mutation rate and effective population size, and the positive correlation between genome size (G and Ge) and the mutation rate in eukaryotes (Smeds et al., 2016). There are a number of potential explanations for why the mutation rate might be positively correlated to genome size. (i) Larger genomes are more costly to replicate and this might lead to a trade-off in fidelity. (ii) As we have shown, the mutation rate is higher in non-coding sequences, and larger genomes have a higher proportion of non-coding DNA: ~80% of the genome is coding in Mamiellophyceae, ~20% to ~50% in D. melanogaster, C. elegans or A. thaliana, and less than 2% in H. sapiens and mice. (iii) It could be due to the negative correlation between genome size and effective population size. To investigate this last explanation, we ran a multiple regression of the log mutation rate against the log genome size and log effective population size. The best model only keeps the genome size, which is more relevant than the effective population size (AIC=-26.1 for G and -24.8 for G and Ne).

In addition to genome size, the departure from the equilibrium GC base composition is responsible for a substantial part of mutation rate variation between species, because the mutation rate increases with the relative distance of the GC content from equilibrium. The mutation rate expected at equilibrium GC composition is lower than the observed mutation rate, especially in Arabidopsis thaliana (Ossowski et al., 2010) and Mesoplasma florum (Sung et al., 2012a). Most species have a higher GC content than expected at equilibrium from the GC->AT and AT->GC mutation rates, and this gap is responsible for an increase of the mutation rate. The forces that increase the GC content of the genome thus contribute to an increase in the spontaneous mutation rate in the majority of species studied by MA experiments (Table S10). Two mechanisms could be responsible for an increase in GC content above the equilibrium value: selection and biased gene conversion. (i) Selection can act on protein-coding sequences, synonymous codon use and gene regulatory sequences, in a manner which is expected to lead to a less biased base composition than the mutational spectrum would cause. (ii) Biased gene conversion, a byproduct of recombination, has been identified in many organisms, from bacteria (Lassalle et al., 2015) and yeast (Harrison and Charlesworth, 2011; Lesecque et al., 2013) to humans (Duret and Arndt, 2008; Duret and Galtier, 2009). There is indirect evidence of recombination in O. tauri (Grimsley et al., 2010), so GC-biased gene conversion may be involved in shaping the GC content of Mamiellophyceae.

CONCLUSION

Our analysis of de novo mutations (DNM) in Mamiellophyceae species has shown that the spontaneous mutation rate is ~3-fold higher in non-coding than in coding regions. In eukaryotes, mutation rates increase with effective genome size and with the distance from the GC content equilibrium, supporting the idea that processes increasing the GC content may influence the spontaneous mutation rate. We therefore propose that the distance of a genome from its GC equilibrium is an important parameter in determining the mutation rate.

ACKNOWLEDGEMENTS

We are grateful to Claire Hemon, Elodie Desgranges and Christophe Salmeron for technical assistance with the mutation accumulation experiments from 2012 to 2015 and to the Genomics of Phytoplankton lab for support and stimulating discussions. We acknowledge the O. mediterraneus genome consortium for access to the complete genome data and the GenoToul Bioinformatics platform from Toulouse, France, for bioinformatics analysis support and GenoToul cluster availability. This work was funded by ANRJCJC-SVSE6-2013-0005 to GP and SSF.

CHAPTER 4:

HORIZONTAL GENE TRANSFERS: THE CASE OF PICOCHLORUM RCC4223

What is the contribution of HGTs to the diversification of green algae (Chlorophyta)?

In addition to mutations, other processes also create genetic diversity. One of these processes is horizontal gene transfer. Horizontal gene transfers have been proposed in several species of Chlorophyceae, including a Picochlorum species, Picochlorum SE3. For this reason, a new reference genome from a new Picochlorum strain, Picochlorum RCC4223, is studied here to test this hypothesis.

The genome was built from PacBio RS II and Illumina MiSeq 2000 sequencing data. Several assemblers were used until a genome of satisfactory quality was obtained: HGAP (Chin et al., 2013), ABySS (Simpson et al., 2009), SGA (Simpson and Durbin, 2012), SSPACE (Boetzer et al., 2011) and Geneious (Kearse et al., 2012). Annotation was carried out using the ORCAE (Sterck et al., 2012) and PicoPLAZA (Vandepoele et al., 2013) databases.

Twenty-seven HGT candidates are proposed, and a comparative genomics analysis identifies gene families that are over-represented in this new strain compared with the other known green algal genomes: several reverse transcriptases, a polyketide synthase family, an endonuclease and a helicase.

In parallel with these genomic studies, a phenotypic characterisation confirms the halotolerance already observed in the genus Picochlorum, combined with a strong thermotolerance in RCC4223. In addition, the study of a metagenome obtained from 4 sampling sites in the Mediterranean Sea confirms the presence of 18S sequences belonging to Picochlorum RCC4223 in the marine environment.

Finally, this genome constitutes a new resource for the scientific community in general.

The supplementary material for this chapter is available on pages 156 to 158.

Genomic insights into a thermotolerant and halotolerant Trebouxiophyceae: Picochlorum costavermella RCC4223

Marc Krasovec*, Sophie Sanchez-Ferandin*, Stephane Rombauts#, Nigel Grimsley*, Sheree Yau*, Claire Hemon*, Hugo Lebredonchel*, Emmelien Vancaester#, Hervé Moreau*, Klaas Vandepoele#, Gwenaël Piganeau*

*Sorbonne Universités, UPMC Univ Paris 06, CNRS, Biologie Intégrative des Organismes Marins (BIOM), Observatoire Océanologique, F-66650 Banyuls/Mer, France

#Department of Plant Systems Biology (VIB) and Department of Plant Biotechnology and Bioinformatics (Ghent University), Technologiepark 927, 9052, Ghent, Belgium

Keywords: Picochlorum, Green algae biotechnology, Horizontal gene transfers.

Corresponding authors: [email protected]

ABSTRACT

The genus Picochlorum groups halotolerant green algae, studied both for their high tolerance to extreme environments and for their high potential for biotechnologies. Here, we investigate the new genome of the strain Picochlorum RCC4223, isolated from an estuary connected to the Mediterranean Sea. The genome is 13.7 Mb in length and contains 9315 genes, and its GC content is 46%. The average gene length is 1155 bp, shorter than in other green algae, and the genome contains 19 extended gene families. This study confirms the presence of horizontal gene transfers (HGT) from bacteria in the genome of Picochlorum. Last, 18S sequences 100% identical to that of RCC4223 were present in four metagenomes from Mediterranean Sea samples, consistent with a marine habitat for the strain RCC4223.

INTRODUCTION

In aquatic and ocean ecosystems, photosynthetic planktonic microorganisms are taxonomically very diverse, with representatives in five of the six super-groups of the eukaryotic tree of life (Not et al., 2012), and account for an important part of primary production on Earth (Field et al., 1998; Worden et al., 2004). Chlorophyta (green algae) are ubiquitous members of phytoplanktonic communities (de Vargas et al., 2015) and are descendants of the first endosymbiosis, when a unicellular heterotroph captured a cyanobacterium that evolved into the chloroplast, 1.6 billion years ago (Yoon et al., 2004). Among Chlorophyta, the Trebouxiophyceae is a monophyletic class of green algae proposed by Friedl (Friedl, 1995), which contains the representative genus Chlorella. The Trebouxiophyceae include phenotypically very diverse organisms: flagellates, coccoids, colonies and multicellular organisms (De Clerck et al., 2012), photosynthetic symbionts of other eukaryotes (Blanc et al., 2010) and organisms with diverse cell division strategies (Yamamoto et al., 2007, 2003). Phylogenetic analyses place the Trebouxiophyceae as a recent group in Chlorophyta evolution (Friedl and Rybalka, 2012; Leliaert et al., 2012). The lack of discriminating morphological features in the coccoid unicellular Trebouxiophyceae (e.g. Chlorella, Picochlorum, Nannochloris) led to a conundrum in species description in many genera, which has been partially solved by the generalization of molecular techniques. The same organisms have sometimes been assigned to either the genus Nannochloris or the genus Picochlorum, and a distinction was proposed to clarify their phylogeny (Henley et al., 2004). It was suggested that Picochlorum groups marine or saline autosporic taxa, supported by 18S rDNA phylogeny, including Picochlorum RCC4223, Picochlorum SE3 (Foflonker et al., 2015) and Picochlorum oklahomensis (Henley et al., 2004), while Nannochloris groups freshwater algae. These algae are characterized by a genome size estimated from 13 Mb to 50 Mb depending on the species (Yamamoto et al., 2001), a thick cell wall and halotolerance (Foflonker et al., 2016, 2015; Henley et al., 2002). The domestication of new algal species, guided by evolutionary biology, appears to be one possible solution to the challenge posed by the shortage of natural resources (Carroll et al., 2014). The biotechnological potential of green algae is a field of intense investigation. Several biotechnological applications have been proposed for Picochlorum algae as a consequence of their suitable lipid and protein content for aquaculture (Becker, 2007; Chen et al., 2012), biofuel production (de la Vega et al., 2011; S.-J. Park et al., 2012; Tran et al., 2014; Zhu and Dunford, 2013) or bioremediation (von Alvensleben et al., 2013). Interestingly, a recent study suggests that microalgae from the Picochlorum genus may form a symbiosis with human cell cultures. Black and co-workers (Black et al., 2014) provided evidence that a spontaneous symbiosis between Picochlorum eukaryotum and human retinal cells can be maintained in culture. This suggests that this alga is a good model to study the onset of photosymbiosis. In this study, we provide the complete genome of Picochlorum RCC4223, a resource for the investigation of metabolic pathways and genome evolution mechanisms in Trebouxiophyceae.

MATERIALS AND METHODS

Strain characterisation

The Picochlorum RCC4223 strain was isolated from the estuary of the river La Massane (France) in June 2011. The water sample was spread on a Petri dish containing L1 agar medium (Guillard and Hargraves, 1993). One colony was successively isolated, cloned and kept in a flask of L1 seawater medium. We sequenced the complete 18S rDNA and obtained the karyotype by pulsed-field gel electrophoresis (PFGE), following the protocol described by Yamamoto and co-workers (Yamamoto et al., 2001). The phylogenetic position of Picochlorum RCC4223 was assessed from the 18S rDNA sequence with four methods: maximum likelihood, neighbor joining, maximum parsimony and Bayesian inference.

Phenotypic traits

Picochlorum species are characterized by a large range of extremotolerance. To explore the phenotypic traits of Picochlorum RCC4223, fitness tests were performed in two conditions. First, halotolerance was tested using freshwater 3N-BBM medium and a salinity gradient (10 to 70 g.L⁻¹, obtained from L1 medium by adjusting salinity). Algae were placed in 48-well plates, with 5 samples per salinity and 5 controls in standard L1 medium. Plates were maintained for one week at 20 °C under a 12 h-12 h light-dark cycle.

Second, cultures were maintained in the same conditions as in the halotolerance test, but only in L1 medium and at two temperatures: 20 °C and 35 °C. For comparison with other green algae, the model species Ostreococcus tauri (Blanc-Mathieu et al., 2014; Courties et al., 1994) was exposed to the same conditions. Cell concentrations were measured with a flow cytometer (Becton Dickinson, Franklin Lakes, NJ, U.S.A.) and used to estimate the growth rate with the following equation:

$$G = e^{\left[\ln(N_t/N_0)/t\right]}$$

with N0 = 5000, Nt the final cell number and t = 7. G is defined as the number of cell divisions per day.
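As a minimal sketch of how this growth measure can be computed from a flow cytometry count, the function below applies the equation as written. The function name, its default arguments and the example count are illustrative assumptions, not part of the original protocol.

```python
import math

def growth_rate(n_final, n_start=5000, days=7):
    """Return G = exp(ln(n_final / n_start) / days), as defined in the text."""
    if n_final <= 0 or n_start <= 0 or days <= 0:
        raise ValueError("cell counts and duration must be positive")
    return math.exp(math.log(n_final / n_start) / days)

# Hypothetical well: 5 000 cells grown to 640 000 cells in 7 days,
# i.e. a 128-fold increase (not a measured value).
example_g = growth_rate(640_000)
print(round(example_g, 3))
```

Keeping N0 and t as parameters lets the same function serve for other inoculum sizes and durations.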

Picochlorum RCC4223 was isolated from the estuary of a river. As a consequence, it is uncertain whether its habitat is mainly freshwater or marine. We sampled two marine stations (SOLA marine station, 47 27’136 N 03 32’360 E; MOLA marine station, 42°27‘205 N 03°32’565 E; Mediterranean Sea, France) and two lagoon stations in the Leucate lagoon (Mediterranean Sea, France). Two hypervariable regions of the 18S rDNA, V4 (380 nt) and V9 (94 nt), were amplified by PCR in eight samples from the SOLA station, 14 samples from the MOLA station and 30 samples from the Leucate lagoon. Sequencing was done on an Illumina HiSeq (GATC biotech®, Konstanz, Germany) and the data were analysed using mothur (Schloss et al., 2009).

Sequencing and genome assembly

DNA was extracted using a CTAB protocol modified from Winnepenninckx and co-workers (Winnepenninckx et al., 1993), and RNA using the Direct-zol™ RNA MiniPrep Kit from Zymo Research®. Two technologies were used for whole-genome sequencing. First, sequencing was done with the PacBio RS II SMRT® technology by GATC biotech® (Konstanz, Germany), who also assembled the genome with the HGAP assembler (InView™ De novo Genome 2.0). Second, sequencing was done using MiSeq technology (GATC biotech®, Konstanz, Germany). MiSeq reads were assembled with ABySS (Simpson et al., 2009).

To improve the PacBio HGAP assembly, we removed ABySS contigs shorter than 1 kb and bacterial contigs from both the ABySS and HGAP assemblies. ABySS contigs were then aligned to the PacBio contigs to build scaffolds using Geneious. MiSeq reads were also mapped to the HGAP scaffolds with two different assembly programs, SSPACE (Boetzer et al., 2011) and SGA (Simpson and Durbin, 2012). The parameters and options tested are presented in the supplementary material.

Genome annotations

The RNA libraries were constructed by GATC biotech® (Konstanz, Germany) and sequenced on an Illumina HiSeq. Annotation was done using the ORCAE (Sterck et al., 2012) and Picoplaza (Vandepoele et al., 2013) databases. Twenty-eight HGT candidates were proposed by Foflonker and co-workers in Picochlorum SE3 (Foflonker et al., 2015). These HGT candidates were searched for in the RCC4223 genome using blastp; seven of the 28 candidates were found. Additionally, new HGT candidates were identified by searching for best blastp hits against non-Chlorophyta genes, using the approach previously applied to Bathycoccus (Moreau et al., 2012). Twenty-four candidate genes with a best blast hit on bacterial genes were thus retrieved. Extended gene families were explored with the Picoplaza platform, by comparing genes from 38 eukaryotic species (including Metazoa, Fungi, Chlorophyta, Embryophyta, Rhodophyta, Haptobionta and Stramenopiles).

RESULTS

Strain characterisation

The phylogenetic position of Picochlorum costavermella inferred from the 18S rDNA sequence (Figure 1A) is close to that of the other sequenced Picochlorum genome, Picochlorum SE3 (Foflonker et al., 2015). Transmission electron microscopy reveals a small cell of 1 to 2 µm, with a simple organisation and a thick cell wall of ~50 nm (Figure 1C). The karyotype is provided in Figure 2 and suggests that the genome is composed of 13 chromosomes of ~95 to ~1 800 kb.


Phenotypic tests reveal a high tolerance to salinity variations (Figure 1B): the strain has its fastest growth rate at salinities from 20 to 35 g.L⁻¹ and is able to grow at all salinities tested. Growth is lower in fresh water and starts to decrease above a salinity of 40 g.L⁻¹. Nevertheless, the salinity gradient used here confirms that the strain can develop in a wide range of salinities. The temperature assay indicates a similar tolerance to a wide range of temperatures: Picochlorum RCC4223 is able to grow at 35 °C (Figure 1D), whereas Ostreococcus tauri cultures do not survive when exposed to this temperature.

Figure 1. (A) Phylogenetic tree summarizing the position of Picochlorum RCC4223 within the Chlorophyta, based on the 18S rDNA sequence. (B) Fitness of Picochlorum RCC4223 along a salinity gradient from fresh water to 70 g/L. The proxy for fitness is the number of cell divisions per day. For each point, there were 3 replicates started with 5 000 cells. (C) Transmission electron micrograph of Picochlorum RCC4223. Note the thick cell wall. Cell size is about 1 to 2 µm, close to that of the smallest known eukaryotes. (D) Fitness at high temperature (35 °C) compared with standard culture conditions (20 °C). O. tauri grows faster in standard conditions but does not survive at 35 °C, contrary to Picochlorum RCC4223. [Panel annotations: (B) N0 = 5 000, n = 3, fw = fresh water, x-axis: salinity, y-axis: cell divisions per day; (D) N0 = 100 000, L1 medium, y-axis: cell divisions per day.]

The mothur analysis of the two marine stations and the lagoon is consistent with a marine distribution of RCC4223. Two OTU reference sequences had 100% sequence identity with the RCC4223 V4 and V9 sequences in the three stations. However, the RCC4223 V4 region is 100% identical to that of Picochlorum RCC2935, RCC897, RCC142, RCC140 and RCC14, and the RCC4223 V9 region is 100% identical to that of Picochlorum sp. Azis1. Despite the 100% identical sequences found in environmental samples, it is thus not possible to conclude that the RCC4223 strain itself is present. However, the Picochlorum genus is clearly present both at the coastal station (SOLA) and at the offshore station (MOLA). Together with its higher growth rate in saline water, this suggests that RCC4223 is better adapted to the marine environment, although it was isolated from an estuary.

Figure 2. PFGE migration of Picochlorum RCC4223 (left) and yeast ladder (right). Thirteen chromosomes are identified, with sizes from 95 to 1 800 kb. Bands are labelled Ch_n N (Sx), where Ch_n is the chromosome number, N is the estimated chromosome size in kb and (Sx) is the contig possibly associated with the chromosome (see contig sizes in Table S4). PFGE migration gave a genome size of ~14.1 Mb.

Genome assembly

PacBio sequencing generated 266 217 reads, with a mean read length of 6 215 bp and a mean coverage of 69.94x. The HGAP assembly comprised 361 contigs with a total length of 21 260 393 bp. Maximum and minimum contig lengths were 965 kb and 2.3 kb, with N50 = 244 kb. After removal of bacterial contigs, 147 HGAP contigs were retained: the total length was reduced to 14 233 452 bp, with an average GC content of 46%. The statistics of the raw HGAP assembly and the assembly report from GATC biotech® (Konstanz, Germany) are provided in Table S1 and the supplementary materials. Illumina MiSeq sequencing generated 2.3 million reads of 300 bp, which mapped to 94.63% of the HGAP genome. The best results obtained from ABySS using Illumina reads are presented in Table S2. The ABySS assembly was composed of 19 896 contigs with N50 = 11 212 bp. After removal of bacterial contigs and of contigs shorter than 1 kb, 2 503 contigs were kept and mapped onto the HGAP assembly with Geneious, SGA (Table S3) and SSPACE. The final assembly was composed of 40 scaffolds of 10.5 kb to 1 794 kb, with N50 = 691 kb, a GC content of 46% and a total length of 13.74 Mb. These values are similar to those of the Picochlorum SE3 genome, which is 13.5 Mb long with 46.1% GC. The mitochondrial sequence corresponds to one contig of 42.9 kb and 41% GC, and the chloroplast sequence to one contig of 78.2 kb and 31.9% GC. The final assembly is presented in Table S4, and possible correspondences between scaffold sizes and chromosomes are presented in Figure 2.
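Since several assemblies are compared through their N50, a short sketch of how this statistic is usually computed may help. It uses the common definition: the length of the contig at which the cumulative length, taken from the longest contig downwards, first reaches half of the assembly. The contig lengths are hypothetical and are not taken from Tables S1 to S4.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length

# Hypothetical contig lengths in kb (illustration only).
toy_contigs_kb = [965, 700, 400, 244, 120, 60, 11]
print(n50(toy_contigs_kb))
```

Because N50 depends on the total length, removing many short or contaminant contigs, as done here for bacterial contigs, can change it substantially.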

Genome annotations

The genome of Picochlorum RCC4223 contains 9 315 genes, of which 2 738 are shorter than 500 bp and 879 are shorter than 200 bp (see open reading frame (ORF) lengths in Figure 3).

The RCC4223 genome reveals a high number of extended gene families. These extensions result from tandem gene duplications. First, gene families involved in a variety of transposons and transposable elements are extended in the genome of RCC4223 compared to SE3: (i) the reverse transcriptases HOM03P000019 (53 vs 17; number of copies in RCC4223 and SE3, respectively) and HOM03P007658 (10 vs 1), and (ii) the transposase HOM03P005338 (15 vs 2) and the MULE transposase domain HOM03P007225 (10 vs 2).

Second, the polyketide synthase family HOM03P000145 (23 vs 6), which is implicated in fatty acid chain formation and in the production of polyketide secondary metabolites, is extended. With 23 copies, this genome contains the largest repertoire of polyketide synthase genes compared to other green algae. Polyketides are known in bacteria, plants and fungi, and may be involved in the synthesis of antibiotics, chemotherapeutic compounds or toxins (Carreras et al., 1997; Hopwood, 1997). In plants, two molecules are particularly well studied, stilbene and chalcone, both implicated in phytoalexin synthesis (Schröder and Schröder, 1990).

Third, the DDE superfamily endonucleases are extended (HOM03P005481 and HOM03P005432, 12 vs 5 and 11 vs 2, respectively). Members of the DDE family are responsible for coordinating metal ions for catalysis.

Fourth, the DNA helicase Pif1 family HOM03P000255 (21 vs 4) is extended; Pif1 is involved in DNA repair and in genome stability at telomeres in yeast (Pinter et al., 2008) and human (Mateyak and Zakian, 2006).

Last, the zinc finger SWIM-type family HOM03P007225 (10 vs 2), which is involved in different biological functions (Laity et al., 2001), is extended.

Of the 28 HGT candidates from the strain SE3 (Foflonker et al., 2015), 7 were found in the RCC4223 genome (09g03890, 03g06310, 08g00570, 10g04270, 15g01560, 15g01560, 17g01380). In addition, 27 new HGT candidates were identified: 01g04510, 01g02120, 01g05920, 01g07290, 01g09100, 01g10000, 02g01220, 03g02930, 03g03660, 04g02340, 05g00790, 06g02940, 07g01620, 08g03650, 09g00220, 09g01800, 10g00960, 12g02430, 13g00040, 14g01620, 16g00230, 18g00250, 20g00540, 20g00570, 21g00070, 21g00420.

Figure 3. Comparison of open reading frame (ORF) lengths between Picochlorum species. [Histograms for Picochlorum RCC4223 and Picochlorum SE3; x-axis: gene length (bp), from 0 to 5 000; y-axis: frequency.]

Figure 4. Gene family extensions between Picochlorum SE3 and Picochlorum RCC4223. [Axes: Picochlorum SE3 (x) and Picochlorum RCC4223 (y), both from 10 to 50; legend: number of gene families per position, in classes 1-2, 3-8, 9-25, 26-75, 76-400 and 400+.]

DISCUSSION

Origin of RCC4223

Picochlorum species are commonly found in marine and saline ecosystems (see Picochlorum strains at http://roscoff-culture-collection.org). The presence of Picochlorum RCC4223 in marine stations and its higher fitness in saline water justify its affiliation to the Picochlorum genus. The new genome of the RCC4223 strain provides a new tool (i) to better study extremophile green algae and (ii) to explore metabolic pathways in these algae.

HGT candidates

It has been proposed that horizontal gene transfers (HGTs) from Bacteria to Plantae, including green algae such as Bathycoccus prasinos RCC1105 (Moreau et al., 2012), Picochlorum SE3 (Foflonker et al., 2015) and Chlorella variabilis (Blanc et al., 2010), and the red alga Galdieria phlegrea (Qiu et al., 2013), are more extensive than previously thought and are not limited to the transfer of genes from the chloroplast to the nucleus. HGTs are a fundamental mechanism of adaptation in Bacteria and Archaea (Vos et al., 2015), enabling the acquisition of new genes involved in metabolic pathways, stress resistance or other biological functions. Such cross-kingdom gene transfers are rare in eukaryotes, but there is evidence that HGT conferred survival capability on extremophile eukaryotes (Schönknecht et al., 2013). However, none of the HGT candidates we identified in Picochlorum RCC4223 can be directly linked to the thermotolerance or halotolerance of Picochlorum. Some HGT candidates are present in both RCC4223 and SE3, suggesting that these genes were acquired earlier by their common ancestor. The presence of species-specific HGT candidates indicates that more recent transfers also occurred, and thus that HGT participates actively and regularly in the genome evolution and adaptation of these species.

Extended gene families

Genes belonging to an over-represented family are located at adjacent sites, suggesting an origin by tandem duplication. In the case of the polyketide synthase family HOM03P000145, genes 02g08770 to 02g08840 and 16g00230 to 16g00280 are arranged in tandem, with shorter gene lengths than those commonly known in this gene family. Adjacent genes have very low levels of amino acid identity (~35%), pointing to a very ancient origin by duplication. The zinc finger SWIM-type family HOM03P007225 comprises 3 genes of ~2 300 bp sharing ~95% amino acid identity. There are also two sets of 3 short adjacent genes whose concatenated sequences have ~95% amino acid identity with the 2 300 bp genes. The principal mechanism of gene family extension we observed could therefore be the division of existing genes, which first arise by duplication and are later shortened by mutations introducing internal stop codons. This does not necessarily mean loss of function, and RNA data suggest that all these genes are expressed.

CONCLUSION

This study provides a new resource for the study of green algae: a new Picochlorum genome, from the halotolerant and thermotolerant strain RCC4223. The genome of Picochlorum RCC4223 is characterized by a high number of genes compared to other green algae with similar genome sizes, explained in part by the extension of several gene families with short ORFs. In addition, our results support the hypothesis of horizontal gene transfers from bacteria to algae. Last, given the interest of Picochlorum species for biotechnology, this new genome is an opportunity to explore their potential in depth.

ACKNOWLEDGEMENTS

We are grateful to the Genomics of Phytoplankton lab for support and stimulating discussions, and to the GenoToul Bioinformatics platform (Toulouse, France) for bioinformatics analysis support and GenoToul cluster availability. This work was funded by ANRJCJC-SVSE6-2013-0005 to GP and SSF.

CHAPTER 5: IMPACT OF THE MUTATION RATE ON BIOTECHNOLOGY

What are the impacts of the mutation rate on the domestication of green algae?

Species of the genus Picochlorum are studied for various biotechnological applications. For this reason, and also to broaden knowledge of the mutation rate, a mutation accumulation (MA) experiment was carried out with Picochlorum RCC4223. Twelve lines were followed for about 150 generations per line over 199 days. Unlike the experiments with the Mamiellophyceae, there is no control with which to better follow the evolution of the fitness of the lines over time. The method used to identify mutations is identical to that presented in Chapter 3, and the reference genome is presented in Chapter 4.

In summary, a decrease in fitness over time is observed in the lines, owing to deleterious mutations (however, there is no normalization by a control as in Chapter 2). The mutation rate per genome is higher than in the Mamiellophyceae and in most unicellular eukaryotes. There are only 21 mutations, which is too few to obtain robust statistical results on the distribution of mutations in the genome, as was also the case in the Mamiellophyceae species.

This study emphasizes the variability of the mutation rate, including between closely related species, and above all the use of the spontaneous mutation rate for experimental evolution and the domestication of green algae. Indeed, green algae are now one of the best-represented groups among estimates of spontaneous mutation rates. This knowledge can be taken into account when choosing a species for the design of a protocol.


Spontaneous mutation rate of the Chlorophyta Picochlorum RCC4223

Krasovec Marc*, Piganeau Gwenael*, Sanchez-Ferandin Sophie*.

* Sorbonne Universités, UPMC Univ Paris 06, CNRS, Biologie Intégrative des Organismes Marins (BIOM), Observatoire Océanologique, F-66650 Banyuls/Mer, France

Keywords: Mutation rate, Mutation accumulation, Green algae biotechnology, Picochlorum.

Corresponding authors: [email protected]

ABSTRACT

The mutation rate is a crucial parameter for understanding the evolution and adaptive capacity of species, because mutations are the origin of the variability on which selection acts. Spontaneous mutations may be an alternative to mutagenesis for microalgal domestication purposes. To investigate the potential of spontaneous mutations to generate genetic variation, we performed a mutation accumulation experiment using Picochlorum RCC4223, a green alga with important biotechnological potential. A decrease in fitness during the experiment, attributed to deleterious mutations, is observed. The spontaneous mutation rate is 1.012 × 10⁻⁹ mutations per nucleotide per generation, one of the highest estimates in unicellular eukaryotes.

INTRODUCTION

Natural selection allows species to adapt from standing genetic variation, fuelled by mutations, which constitute the ultimate source of diversity (Wright 1932). Quantifying the rate of mutations and their effects is thus of primary importance for a better understanding of evolution. Mutation accumulation (MA) experiments give access to the spontaneous mutation rate (Halligan and Keightley, 2009), thanks to the development of high-throughput sequencing technologies (see Wei and co-workers for a database (Wei et al., 2014)). The principle is to maintain MA lines derived from an ancestral type through serial bottlenecks, which remove selection and allow deleterious spontaneous mutations to be fixed. Current estimates of mutation rates are on the order of 1 × 10⁻⁹ mutations per nucleotide per generation in metazoans, such as Drosophila melanogaster (Keightley et al., 2014a, 2009; Schrider et al., 2013), Caenorhabditis elegans (Denver et al., 2012, 2009) or Heliconius melpomene (Keightley et al., 2014b), and on the order of 1 × 10⁻¹⁰ in microorganisms, such as Saccharomyces cerevisiae (Lang and Murray, 2008; Lynch et al., 2008; Zhu et al., 2014), Schizosaccharomyces pombe (Behringer and Hall, 2015; Farlow et al., 2015) and Escherichia coli (Lee et al., 2012).

The mutation rate is expected to be low so that the mutational load due to deleterious mutations is limited (Agrawal and Whitlock, 2012). As a consequence, selection tends to decrease the mutation rate. Effective population size is therefore a key parameter because it determines the relative intensity of selection and drift, which are inversely related (Charlesworth, 2009). Microorganisms with high effective population sizes, like green algae, are expected to reach a low mutation rate (Lynch, 2010a; Sung et al., 2012a). Conversely, strong genetic drift imposes a drift barrier that prevents natural selection from reaching the optimal mutation rate (Martincorena and Luscombe, 2013). In eukaryotes, the mutation rate is positively correlated with genome size (Smeds et al., 2016; Krasovec et al., 2016, in preparation), which varies by 100-fold, from a few Mb to a few Gb. A larger genome increases the cost of replication, which leads to a decrease in replication fidelity.

Although knowledge about mutation rate variation between species is improving, intra-species variation is also commonly observed in MA experiments (Behringer and Hall, 2016). Such variation is thus independent of genome size and effective population size. In bacteria, the mutation rate can increase by 100-fold because of mutator alleles (Sniegowski et al., 1997; Taddei et al., 1997; Tenaillon et al., 1999). This increases the chance of observing new mutations (including advantageous mutations). In eukaryotes, stress can induce an increase in the mutation rate (Jiang et al., 2014), but not as strongly as in bacteria.

In green algae, MA studies are available for a wide range of species. These MA experiments were conducted to estimate either the fitness effects of mutations (Kraemer et al., 2015; Krasovec et al., 2016; Morgan et al., 2014) or the spontaneous mutation rate, in Chlamydomonas reinhardtii (Ness et al., 2015b, 2012; Sung et al., 2012a) and four Mamiellophyceae species (Krasovec et al., 2016, in preparation). This knowledge is relevant for biotechnological applications of green algae, such as the production of biofuels (Brennan and Owende, 2010; Chisti, 2007; Mata et al., 2010) and of proteins for health food or cosmetics (Becker, 2007). All natural populations harbour standing genetic variability (Barrett and Schluter, 2008), which enables adaptation to environmental changes and pressures. Thus, domestication of traits of interest could be achieved from standing genetic diversity. Alternatively, adaptation may also occur from new mutations, so that estimating spontaneous mutation rates is also paramount for investigating the possible rate of domestication of green algae. In this study, we performed a mutation accumulation experiment in Picochlorum RCC4223, a green alga belonging to the class Trebouxiophyceae within the Chlorophyta (Henley et al., 2004). It has a small haploid genome of ~13.5 Mb with 79.5% coding sequence and 46% GC content (Krasovec et al., in prep). Strains of the genus Picochlorum are versatile algae for large-scale culturing, capable of growing in a wide range of salinities and temperatures (Foflonker et al., 2016, 2015). They also constitute interesting models in different fields, such as medicine (Black et al., 2014), biofuels (Wang et al., 2016) and aquaculture, as a dietary complement given their high protein content (Chen et al., 2012).

MATERIAL AND METHODS

MA experiment

The Picochlorum RCC4223 culture was isolated in our lab from the estuary of the river La Massane (France) and has been deposited at the Roscoff Culture Collection (http://roscoff-culture-collection.org/, France). Twelve MA lines derived from a clonal ancestral population were kept in 24-well plates at 20 °C under a 16 h-8 h dark-light cycle. MA lines were inoculated with a single cell and maintained by serial one-cell bottlenecks every 14 days. At each bottleneck, cell concentration was measured with a FACSCanto II flow cytometer (Becton Dickinson, Franklin Lakes, NJ, U.S.A.) using natural chlorophyll fluorescence (FL3 acquisition at 670 nm) and SSC acquisitions.

Effective population size, Ne, was estimated as the harmonic mean of the cell number between bottlenecks, and the number of generations was given by the following equation:

$$G = e^{\left[\ln(N_t/1)/t\right]}$$

Nt is the total number of cells measured by flow cytometry and t is the time between two bottlenecks, i.e. 14 days. Lines were maintained for 199 days and went through a series of 14 bottlenecks. G is also used as a proxy for fitness to estimate the effects of mutations over time, from T0 to Tf, with a linear model.
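The harmonic-mean estimate of Ne can be illustrated with a small sketch. It assumes, purely for illustration, that each line restarts from one cell and doubles synchronously at every generation; the function name and the number of doublings are hypothetical and do not come from the experiment.

```python
from statistics import harmonic_mean

def effective_population_size(generations_between_bottlenecks):
    """Harmonic mean of the population sizes 1, 2, 4, ... between two bottlenecks.

    Assumption (illustrative only): the line restarts from a single cell and
    doubles synchronously at every generation.
    """
    sizes = [float(2 ** g) for g in range(generations_between_bottlenecks + 1)]
    return harmonic_mean(sizes)

# Hypothetical cycle with about 10 doublings between two bottlenecks.
print(round(effective_population_size(10), 2))
```

Because the harmonic mean is dominated by the smallest values, the single-cell bottleneck keeps Ne at a few cells even when the well reaches a very large final density.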

Sequencing and mutation identification

We extracted DNA using the CTAB protocol for Illumina MiSeq sequencing, performed by GATC biotech® (Konstanz, Germany). The 12 MA lines and the ancestral type were sequenced. To identify mutations, we used the method described by Krasovec and co-workers (Krasovec et al., 2016, in preparation): MiSeq reads were aligned to the reference genome with BWA (Li and Durbin, 2010), bam files were processed with SAMtools (Li et al., 2009) and mutations were called with GATK (DePristo et al., 2011). Final vcf files and mutation candidates were then obtained after the following filtering steps: removal of sites with low mapping quality (<40), sites with low coverage (<5) and candidates shared by two MA lines. SnpEff (Cingolani et al., 2012) was used to classify mutations as synonymous, non-synonymous, intronic or intergenic, using the annotation available on the ORCAE web site (Sterck et al., 2012). This mutation calling pipeline was used for both base substitutions and insertions-deletions (indels).
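The three filtering steps can be sketched as follows. This is not the pipeline actually used: the record layout (dictionaries with the keys 'line', 'contig', 'pos', 'mapq' and 'depth') is a hypothetical simplification of what would be parsed from the vcf files, and checking for shared sites before quality filtering is an assumption.

```python
from collections import defaultdict

def filter_candidates(calls, min_mapq=40, min_depth=5):
    """Keep well-supported candidate mutations that are private to one MA line.

    `calls` is a list of dicts with the hypothetical keys
    'line', 'contig', 'pos', 'mapq' and 'depth'.
    """
    # Record which lines carry a call at each site, before quality filtering,
    # so that a site seen in two lines is discarded even if one call is weak.
    lines_per_site = defaultdict(set)
    for call in calls:
        lines_per_site[(call["contig"], call["pos"])].add(call["line"])

    kept = []
    for call in calls:
        site = (call["contig"], call["pos"])
        if call["mapq"] < min_mapq:
            continue  # low mapping quality
        if call["depth"] < min_depth:
            continue  # low coverage
        if len(lines_per_site[site]) > 1:
            continue  # shared by two or more MA lines
        kept.append(call)
    return kept

# Toy records (invented values).
toy_calls = [
    {"line": 1, "contig": 1, "pos": 100, "mapq": 60, "depth": 30},
    {"line": 2, "contig": 1, "pos": 200, "mapq": 20, "depth": 30},
    {"line": 3, "contig": 2, "pos": 300, "mapq": 60, "depth": 3},
    {"line": 4, "contig": 3, "pos": 400, "mapq": 60, "depth": 30},
    {"line": 5, "contig": 3, "pos": 400, "mapq": 60, "depth": 30},
]
print([(c["line"], c["pos"]) for c in filter_candidates(toy_calls)])
```

Discarding candidates shared between lines rests on the idea that an independent mutation is very unlikely to hit the same site twice, so a shared call more plausibly reflects a variant already present in the ancestor or a mapping artefact.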

Mutation spectrum

Pearson's chi-squared test was used to compare the observed distribution of mutations with the expected distribution. The expected distribution (H0) was defined assuming that mutations appear randomly and independently in the genome. We compared the distribution of mutations between coding and non-coding regions; the expression level of mutated sites, using STAR (Dobin et al., 2013); synonymous and non-synonymous base substitutions; the direction of mutations from each nucleotide to the others; and the nucleotide context (between 2 and 10 nucleotides) around mutated sites.
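As an illustration of this kind of test, the sketch below computes a Pearson chi-squared statistic for two classes (coding versus non-coding). The counts are our own tally of Table 2 (14 coding mutations, including the two frame shifts, and 7 intergenic ones), and the expected fractions use the 79.5% coding content quoted in the introduction; whether the original analysis used exactly these inputs is an assumption. With an expected count below 5 in one class, the chi-squared approximation is rough.

```python
import math

def chi_squared_two_classes(observed, expected_fractions):
    """Pearson chi-squared statistic and p-value for two classes (1 d.f.)."""
    total = sum(observed)
    expected = [total * f for f in expected_fractions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Illustrative inputs: 14 coding and 7 intergenic mutations (tally of Table 2),
# against a genome assumed to be 79.5% coding.
chi2, p_value = chi_squared_two_classes([14, 7], [0.795, 0.205])
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

The closed-form p-value holds only for one degree of freedom; comparisons with more classes, such as the twelve substitution types, would need the general chi-squared distribution.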

GC bias and GCeq were estimated from the following equations (Sueoka, 1962):

$$R_1 = \frac{(GC \rightarrow AT)}{GC_n}, \quad R_2 = \frac{(AT \rightarrow GC)}{AT_n}, \quad GC_{eq} = \frac{R_2}{R_1 + R_2}, \quad GC_{bias} = GC - GC_{eq}$$

GCn and ATn are the total GC and AT; GC→AT and AT→GC are the number of nucleotide changes; GCeq is the GC at the equilibrium, meaning the GC content where the number of mutations from GC to AT and AT to GC is equal.
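A small sketch of the equilibrium calculation, using the usual form GCeq = R2 / (R1 + R2) with R1 the GC→AT rate and R2 the AT→GC rate. The counts and numbers of sites below are hypothetical round numbers chosen so that the result is easy to check by hand; they are not the values of this study.

```python
def gc_equilibrium(gc_to_at, at_to_gc, gc_sites, at_sites):
    """Return (R1, R2, GCeq) from mutation counts and numbers of GC and AT sites."""
    r1 = gc_to_at / gc_sites  # GC -> AT rate per GC site
    r2 = at_to_gc / at_sites  # AT -> GC rate per AT site
    gc_eq = r2 / (r1 + r2)    # GC content at which both fluxes balance
    return r1, r2, gc_eq

# Hypothetical counts: 6 GC->AT and 3 AT->GC mutations, 1 000 sites of each kind.
r1, r2, gc_eq = gc_equilibrium(6, 3, 1000, 1000)
print(round(gc_eq, 3))
```

When GC→AT mutations are more frequent per site than AT→GC mutations, GCeq falls below 50%, and an observed GC content above GCeq indicates that forces other than mutation maintain it.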

RESULTS

Picochlorum mutation rate

The experiment lasted 199 days, corresponding to an average of 133 generations per MA line with an effective population size of Ne ~6. About 97% of the genome was usable for mutation identification (Table 1). MA lines accumulated 21 mutations: 19 base substitutions (Figure 1) and 2 insertions-deletions (indels) (Table 2). Nineteen of these mutations were validated by PCR. Although MA lines were maintained for a similar number of generations, the distribution of mutations between lines is strongly heterogeneous (Table 1). Considering all MA lines, µbs is 9.19 × 10⁻¹⁰ base-substitution mutations per nucleotide per generation and µID is 9.64 × 10⁻¹¹ indels per nucleotide per generation. The total mutation rate µ is 1.012 × 10⁻⁹ mutations per nucleotide per generation. This corresponds to Ubs = 0.0119 base-substitution mutations per genome per generation and UID = 0.0013 indels per genome per generation. No mutation was found in the mitochondrial and chloroplast genomes.
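The per-nucleotide rate can be approximated from Table 1 as the number of mutations divided by the total number of callable sites multiplied by generations, summed over lines. The sketch below assumes a genome length of 13.5 Mb (the exact length used in the original calculation is not stated), so the result should land close to, but not exactly on, the reported value.

```python
def mutation_rate(n_mutations, lines, genome_length_bp):
    """Mutations per nucleotide per generation.

    `lines` is an iterable of (callable_fraction, generations) pairs,
    one per MA line.
    """
    site_generations = sum(
        fraction * genome_length_bp * generations
        for fraction, generations in lines
    )
    return n_mutations / site_generations

# (callable genome fraction, generations) for the 12 MA lines, from Table 1.
ma_lines = [
    (0.972, 143), (0.971, 148), (0.972, 120), (0.971, 125),
    (0.971, 123), (0.973, 128), (0.977, 90), (0.969, 134),
    (0.972, 145), (0.972, 156), (0.981, 130), (0.972, 153),
]
ASSUMED_GENOME_LENGTH_BP = 13_500_000  # assumption, see lead-in

mu = mutation_rate(21, ma_lines, ASSUMED_GENOME_LENGTH_BP)
print(f"{mu:.2e} mutations per nucleotide per generation")
```

Multiplying µ by the genome length gives the per-genome rate U, which is the quantity that matters when predicting how fast new variants arise in a culture.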

The fitness of the MA lines decreased significantly during the experiment (linear model, coefficient = -0.36, P-value = 1.7 × 10⁻⁶), suggesting that some spontaneous mutations are deleterious. However, no fitness measurement of a control line with a high effective population size is available to normalize the fitness data of the MA lines (Chevin, 2011; Krasovec et al., 2016). Although a fitness decrease is expected as a consequence of deleterious mutations, it cannot be excluded that the fitness variation in MA lines arises from the experimental set-up.

We propose to estimate the arrival of new mutations in a Picochlorum RCC4223 culture started from a modest inoculation of N0 = 10 cells and maintained for 30 days, assuming U = 0.0132 mutations per genome per cell division and one cell division per day. Under these simple assumptions, this leads to a culture of 10 × 2³⁰ cells (~5.4 × 10⁹ cells), corresponding to 2³⁰ cell divisions. This culture would thus contain ~3.57 × 10⁷ mutant cells (3.54 × 10⁷ cells with one mutation, 23.4 × 10⁴ cells with 2 mutations, 1 543 cells with 3 mutations and 10 cells with 4 mutations).

Mutation distribution

Mutations in non-coding regions were more frequent than expected by chance, but this is not statistically significant. The proportion of synonymous and non-synonymous mutations was as expected under neutral evolution (Table 2), consistent with a lack of selection against non-synonymous mutations.

Figure 1. Number of base-substitution mutations observed in MA lines of Picochlorum RCC4223, by substitution type (transitions: G→A, C→T, T→C, A→G; transversions: G→C, C→G, G→T, C→A, T→G, A→C, T→A, A→T). 19 base-substitution mutations were detected, and 17 were confirmed by PCR.

Table 1. Distribution of the mutations. BS and ID are the base-substitution and insertion-deletion mutations. G* corresponds to the percentage of the genome usable for mutation identification.

MA line   G* (%)   BS   ID   Generations
1         97.2     1    0    143
2         97.1     2    0    148
3         97.2     0    0    120
4         97.1     1    0    125
5         97.1     0    0    123
6         97.3     6    0    128
7         97.7     1    0    90
8         96.9     1    0    134
9         97.2     0    0    145
10        97.2     5    1    156
11        98.1     1    0    130
12        97.2     0    1    153

Table 2. Distribution of the mutations in the genome, with the effects predicted by SnpEff.

Contig   Position   Reference   Mutation   Effect
1        1412265    A           G          Synonymous coding
2        1558965    T           A          Non synonymous coding
3        1431057    G           A          Intergenic
3        1438974    C           T          Synonymous coding
4        388329     G           T          Intergenic
4        532113     C           A          Non synonymous coding
4        632871     A           G          Non synonymous coding
4        726437     T           A          Non synonymous coding
4        934361     C           A          Intergenic
6        671466     G           A          Non synonymous coding
10       496542     C           G          Non synonymous coding
10       615708     A           T          Intergenic
12       132864     T           C          Intergenic
12       160261     C           T          Synonymous coding
12       545145     T           C          Intergenic
13       215260     G           A          Non synonymous coding
13       240891     G           C          Non synonymous coding
13       475120     C           G          Intergenic
41       2537       A           C          Non synonymous coding
27       93213      TG          T          Frame shift
13       8595       T           TTA        Frame shift

R1 (GC→AT mutation rate) and R2 (AT→GC mutation rate) are equal to 9.61 × 10^-10 and 4.12 × 10^-10 mutations per nucleotide per generation, respectively. We obtained a GCeq of 30%, while the observed GC content is 46%. The base-substitution mutation rate increases by 8.8% as a consequence of the distance of the GC content from equilibrium, meaning that this distance has a moderate influence on the high mutation rate in Picochlorum. No influence of the nucleotide context was observed in the distribution of the mutations.
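As a worked check, the equilibrium GC content can be computed from the two directional rates. The sketch below assumes the usual two-rate model, GCeq = R(AT→GC) / (R(AT→GC) + R(GC→AT)), which is not spelled out in this section. It also assumes that 9.61 × 10^-10 is the GC→AT rate and 4.12 × 10^-10 the AT→GC rate, the assignment under which the reported GCeq of 30% is recovered. The function and variable names are ours.

```python
def gc_equilibrium(rate_gc_to_at: float, rate_at_to_gc: float) -> float:
    """Equilibrium GC fraction when only these two mutation rates act."""
    return rate_at_to_gc / (rate_at_to_gc + rate_gc_to_at)


# Rates in mutations per nucleotide per generation.
# Assumption: the larger of the two reported values is the GC->AT rate.
RATE_GC_TO_AT = 9.61e-10
RATE_AT_TO_GC = 4.12e-10
OBSERVED_GC = 0.46  # observed GC content reported in the text

gc_eq = gc_equilibrium(RATE_GC_TO_AT, RATE_AT_TO_GC)
distance_from_equilibrium = OBSERVED_GC - gc_eq

print(f"GCeq = {gc_eq:.3f}, observed GC = {OBSERVED_GC:.2f}, "
      f"distance = {distance_from_equilibrium:.3f}")
```

With the opposite assignment of the two rates, the same formula gives an equilibrium GC content of about 70%, which would not match the value reported here.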

Mutation rate variation

The Chlorophyta are one of the best-represented phylogenetic groups among spontaneous mutation rate estimates from MA experiments. In addition to Picochlorum, four mutation rates from Mamiellophyceae (Krasovec et al., in preparation) and several mutation rates from multiple strains of Chlamydomonas reinhardtii (Ness et al., 2015b, 2012; Sung et al., 2012a) are available (Table 3).

In C. reinhardtii, the mutation rate varies by ~40-fold between strains, a range similar to that observed across all unicellular eukaryotes. In Mamiellophyceae, µ is ~6.2 × 10^-10 (± 2.5 × 10^-10). Considering the other eukaryotes with multiple mutation rate estimates, µ is ~39.3 × 10^-10 (± 14.0 × 10^-10) in Drosophila melanogaster (Keightley et al., 2014a, 2009; Schrider et al., 2013) and ~21.4 × 10^-10 (± 8.1 × 10^-10) in Caenorhabditis elegans (Denver et al., 2012, 2009). This corresponds to ~35% intraspecific variation of µ. This part of the mutation rate variability is thus independent of species characters such as genome size or effective population size. It suggests that a mutation rate estimated in one strain or species may not be extrapolated to phylogenetically closely related species, and implies that there is little or no phylogenetic constraint on the mutation rate. The mutation rate observed in RCC4223 is thus not necessarily representative of the mutation rate of the Picochlorum genus.


Table 3. Available direct spontaneous mutation rate estimates from mutation accumulation experiments in Chlorophyta, taking into account base substitutions and indels. G: genome size (Mb); G*: percentage of the genome usable for mutation identification; µ: mutation rate per nucleotide per generation (× 10^-10); U: mutation rate per genome per generation.

Species                     G (Mb)   G* (%)   µ (× 10^-10)   U        References
C. reinhardtii CC-2937      105      59.0     3.23           0.0362   (Ness et al., 2012)
C. reinhardtii CC-124       121      -        0.676          0.0076   (Sung et al., 2012a)
C. reinhardtii CC-1952      104      71.5     4.05           0.0454   (Ness et al., 2015b)
C. reinhardtii CC-2931      104      69.7     15.6           0.1747   (Ness et al., 2015b)
C. reinhardtii CC-1373      104      75.8     28.1           0.3147   (Ness et al., 2015b)
C. reinhardtii CC-2342      104      69.2     11.1           0.1243   (Ness et al., 2015b)
O. tauri RCC4221            13.0     97.5     4.79           0.0062   Krasovec et al., 2016
O. mediterraneus RCC2590    13.5     97.2     5.92           0.0081   Krasovec et al., 2016
M. pusilla RCC299           21.0     99.6     9.76           0.0205   Krasovec et al., 2016
B. prasinos RCC1105         15.0     99.5     4.39           0.0066   Krasovec et al., 2016
Picochlorum sp. RCC4223     14.3     97.0     10.12          0.0132   This study
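The summary figures quoted in the text for C. reinhardtii and the Mamiellophyceae can be recomputed from the µ column of Table 3. The sketch below is only an illustration: it assumes that the "± " value is a sample standard deviation, which the text does not state, and the variable names are ours.

```python
from statistics import mean, stdev

# µ values (x 10^-10 mutations per nucleotide per generation), copied from
# Table 3 in table order.
MU_C_REINHARDTII = [3.23, 0.676, 4.05, 15.6, 28.1, 11.1]
MU_MAMIELLOPHYCEAE = [4.79, 5.92, 9.76, 4.39]  # O. tauri, O. mediterraneus, M. pusilla, B. prasinos

# Fold difference between the highest and lowest C. reinhardtii estimates.
fold_variation = max(MU_C_REINHARDTII) / min(MU_C_REINHARDTII)

# Mean and spread across the four Mamiellophyceae.
# Assumption: spread is the sample standard deviation (n - 1 denominator).
mam_mean = mean(MU_MAMIELLOPHYCEAE)
mam_sd = stdev(MU_MAMIELLOPHYCEAE)
mam_cv = mam_sd / mam_mean

print(f"C. reinhardtii fold variation: {fold_variation:.1f}")
print(f"Mamiellophyceae: mean = {mam_mean:.2f}, sd = {mam_sd:.2f}, cv = {mam_cv:.0%}")
```

Under this assumption the script gives a fold variation of about 42 and a mean of about 6.2 with a standard deviation of about 2.5, in line with the ~40-fold and ~6.2 × 10^-10 (± 2.5 × 10^-10) quoted in the text.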

DISCUSSION

High mutation rate

The mutation rate of Picochlorum RCC4223 is one of the highest spontaneous mutation rates estimated in a microorganism. Among unicellular eukaryotes, higher mutation rates have been reported in three strains of C. reinhardtii (Ness et al., 2015b).

First, a high mutation rate can be a consequence of poor adaptation to laboratory conditions. Picochlorum RCC4223 was isolated from brackish water, and we performed the MA experiment just one year after its isolation, during which it was kept in L1 medium. In eukaryotes, the mutation rate can increase in genomes of low fitness or under stressful conditions, as in A. thaliana (Jiang et al., 2014) and D. melanogaster (Sharp and Agrawal, 2012). This increases the chance that beneficial mutations appear. Although a higher mutation rate ensures a fast supply of new mutations, it also increases the number of deleterious mutations. Indeed, in accordance with the literature, the fitness of the MA lines decreased during the experiment, as expected if mutations are deleterious (Ajie et al., 2005; Fry, 2001; Hall et al., 2013; Keightley, 1994; Shaw et al., 2000). However, the large effective population size of microorganisms allows effective selection against this larger pool of deleterious mutations, and therefore limits the genetic load (Agrawal and Whitlock, 2012; Lynch and Gabriel, 1990) due to this high mutation rate. Species with a large effective population size could sustain a higher mutation rate for a given period if necessary, without suffering too heavily from deleterious mutations.

Second, mutation rate variation between MA lines can arise from the stochastic production of DNA repair proteins, as reported in Escherichia coli (Uphoff et al., 2016). This effect could be responsible for an increased mutation rate in some MA lines, raising the overall mutation rate estimated from the whole experiment.

Biotechnological potential of Picochlorum RCC4223

The spontaneous mutation rate is a key parameter for estimating the number of mutants in a microalgal culture. In Picochlorum RCC4223, our calculation suggests that the number of spontaneously generated mutants is sufficient for effective and fast experimental evolution, provided that an efficient selection procedure is available to isolate mutants with the relevant trait. For example, single cells with a larger size or a higher lipid content can be sorted by flow cytometry using fluorescent dyes such as Nile red and BODIPY 505/515 (Rumin et al., 2015). Mutation and selection are of primary importance for species domestication, as biotechnologies and the use of new species are reported as a possible solution to global challenges (Carroll et al., 2014).

In Picochlorum RCC4223, we expect a genotype of interest to be reached quickly given a large effective population size and strong selection. Despite the decrease in fitness due to deleterious mutations under MA conditions, the adaptive potential of Picochlorum is not affected. First, efficient selection due to the large effective population size removes deleterious mutations. Second, mutation effects are environment dependent, and a mutation that is deleterious in one condition is not necessarily deleterious in another (Krasovec et al., 2016).

The chance of obtaining a desirable genotype may be increased by mutagenesis. In green algae, different protocols have been tested and have yielded cells of interest (Cazzaniga et al., 2014; Ota et al., 2013; Vonlanthen et al., 2015). However, mutagens also increase the deleterious mutation rate, and thus the mutation load. Algal populations must therefore be exposed to a mutagen carefully so as not to compromise their survival. In the case of ultraviolet irradiation, the survival rate reaches just 10% (Cazzaniga et al., 2014). The use of chemical mutagens could be avoided for Picochlorum RCC4223 in culture, provided that the mutation rate and the effective population size are high. It has recently been shown in Chlamydomonas reinhardtii that experimental evolution with a large effective population size increases the growth rate over generations without the use of mutagens (Perrineau et al., 2014).

To conclude, knowledge of green algal mutation rates allows the number of mutations appearing in a culture to be estimated, and the mutation rate may be considered as a criterion when selecting species for domestication.

CONCLUSION

This study provides the first direct spontaneous mutation rate estimate in a Picochlorum species. The mutation rate of RCC4223 appears to be appreciably higher than the mutation rates commonly found in unicellular eukaryotes, including other green algae, with the exception of a few strains of C. reinhardtii. This high mutation rate strengthens the case for Picochlorum species as promising algae for biotechnological research.


CHAPTER 6:

GENERAL DISCUSSION AND

CONCLUSION


1. Fitness variation independent of mutations

1.1. Phenotypic plasticity

The fitness data obtained from the mutant lines (chapter 2) indicate that the effect of mutations in eukaryotic phytoplankton can be very large. The lines accumulated on average only 2 or 3 mutations each (Tables A1 to A3, pages 159 to 161). Yet their fitness differs significantly from that of the control, being lower or higher depending on the line. A handful of mutations therefore appears sufficient to induce a detectable change in fitness. Indeed, some mutations could have a strong impact on survival, notably a mutation that introduces a stop codon in a protein, or one that reduces the efficiency of a transcription factor regulating a metabolic pathway.

The sequencing data give us the opportunity to correlate particular mutations with fitness data. Even so, the available data do not allow us to understand the relationship between a given mutation and a measured effect. For example, line 3 of O. mediterraneus grows significantly faster in the medium with Irgarol and more slowly in the medium with Diuron. This line fixed three mutations: two non-synonymous substitutions and one frameshift deletion (Tables A2 and A3). The affected genes are identified (a glycoside hydrolase, a triphosphate hydrolase and a MYB domain), but it is not possible to conclude on the actual impact of any one of these mutations on the observed fitness difference.

One point, however, tempers the importance of mutations in the observed fitness variation: two lines fixed no mutation and yet show a significant change in fitness (line 7 of O. tauri shows a significant decrease in fitness during the MA experiment (Table A1), and line 5 of B. prasinos has a higher fitness in the media with herbicides). The high sequencing coverage over almost the entire genome reduces the probability that a mutation arose without being identified.

Although this concerns only 2 of all the lines whose fitness was studied, these results highlight the importance of non-mutational factors that may be involved in fitness variation. This fitness difference may come from variation in gene transcription and methylation (Jones, 2012), or from strong plasticity, as proposed by Collins and collaborators in different species of green algae (Collins et al., 2014; Schaum et al., 2015; Schaum and Collins, 2014; Scheinin et al., 2015). Plasticity is defined as the ability of a single genotype to produce several phenotypes in response to environmental variation. It has been shown that, in a fluctuating environment, half of this response can be provided by plasticity. If so, it is difficult to assert that the fitness changes result from mutations (even where mutations have been identified). The change in fitness would then be linked to both mutations and phenotypic plasticity. Nevertheless, the fitness differences are indeed observed, whether or not they are of mutational origin. It is rather the precise role of mutations that is difficult to determine in this fitness study.

Moreover, many studies of fitness variation in lines derived from MA experiments were carried out before sequencing data became as accessible as they are today (Halligan and Keightley, 2009). A change in fitness was then interpreted as the consequence of the appearance of a mutation. From there, the mean and variance of the fitness trait were used to calculate the mutational parameters. It is therefore possible that, in these studies, the role of mutations in fitness was sometimes overestimated because of the absence of sequencing data.

1.2. Bacteria present in O. tauri cultures

The MA experiments also made it possible to address questions other than those relating to mutations, in particular the bacterial community that may be associated with Mamiellophyceae cultures. Interactions between bacteria and unicellular eukaryotic phytoplankton are known (Cole, 1982), for example in diatoms (Bacillariophyceae) (Schäfer et al., 2002). For instance, some bacteria can produce vitamins that stimulate phytoplankton growth (Croft et al., 2005). No fully axenic (that is, bacteria-free) culture of Mamiellophyceae has yet been obtained in our laboratory. Even with different methods (centrifugation, antibiotics), bacteria remain present in the culture. If axenic cultures are impossible or difficult to obtain, it is reasonable to think that this is because some bacteria promote or stimulate the growth of the culture. If so, an observed decrease in fitness or the loss of a line could be explained by the absence of the bacterium associated with the algal culture. In the MA experiments, the lines may have been inoculated, stochastically, with or without bacteria at the bottlenecks.

For the O. tauri lines that survived the whole MA experiment, a particular bacterial community should be found in the cultures at the end of the experiment if it is indispensable for promoting the growth of the microalga. We can also assume that this bacterial community was inoculated together with O. tauri at each bottleneck.

To test this hypothesis, we plated the culture media of 16 O. tauri lines on Petri dishes and identified the bacterial colonies by sequencing the 16S rDNA marker. The results of this study are presented in an article accepted in the journal Frontiers in Microbiology (Lupette et al., 2016, in press, see appendices page 173). The most represented bacterial genus is Marinobacter, already found in cultures of dinoflagellates or haptophytes (Alavi et al., 2001; Amin et al., 2009; Hold et al., 2001). Marinobacter are known to produce siderophores, iron chelators used by phytoplankton (Martinez et al., 2003, 2000; Vraspir and Butler, 2009), and vitamin B12 (cobalamin). It is not possible, however, to conclude at this stage of the study on a possible symbiotic association between the identified bacteria and O. tauri. For that, further experiments must be carried out on axenic cultures to test the effects of the presence or absence of bacteria, in different culture media. If this relationship is confirmed, the line losses observed during the MA experiments could perhaps be explained in part by the absence of bacteria at the time of the bottleneck.

1.3. The role of structural variation in the phenotype

Beyond the mutations studied in the previous chapters (substitutions and insertions-deletions), there are structural variations of the karyotype with important phenotypic consequences. The different strains of a single Mamiellophyceae species are partly differentiated by their karyotypes, which vary in the size of the so-called "outlier" chromosomes (Subirana et al., 2013). These are chromosomes with a GC content lower than that of the rest of the genome, and with many repeated regions. There are 1 or 2 "outlier" chromosomes per genome depending on the species, out of the total number of chromosomes (17 to 20 depending on the species) that make up the karyotype.

In addition, Mamiellophyceae are infected by specific prasinoviruses (Yau et al., 2015), as is the case for the model virus of O. tauri, OtV5 (Derelle et al., 2008). In the laboratory, infection of an algal culture by the virus induces very high mortality (referred to as lysis of the culture), which is nevertheless not total because part of the algal population is resistant to the virus. Variation in the size of the "outlier" chromosomes appears to be partly responsible for the acquisition of this resistance in O. tauri (Yau et al., in revision).

For these reasons, PFGE migration (Schwartz and Cantor, 1984) was performed on all the lines of O. mediterraneus, M. pusilla and B. prasinos to detect any change of karyotype during the MA experiments. In parallel, we tested the sensitivity of the O. mediterraneus RCC2590 lines to the O. mediterraneus virus, OmV0 (Yau et al., 2015).

These two tests showed that four O. mediterraneus lines do not have the same karyotype as the ancestral type. None of the other lines, including those of the other species, changed karyotype. The PFGE result is shown in Figure 1, with a variation in the size of chromosome 11. In addition, the 4 lines with a modified karyotype were sensitive to the virus, whereas all the others and the control strain were resistant to the virus. This is only a correlation, however, and the molecular mechanisms that may link these two observations are not known.

Furthermore, we do not know whether these cells constitute a subpopulation within strain O. mediterraneus RCC2590, characterised by a variation in chromosome 11, or whether these variations result from the mutation accumulation process. Under the second hypothesis, the estimate of the karyotypic mutation rate of O. mediterraneus RCC2590 is Uc = 0.00046 mutations per genome per generation.

To test this, an experiment was carried out with different clones of strain O. mediterraneus RCC2590 isolated by flow cytometry. Each clone was exposed to the OmV0 virus. In the event of lysis, the karyotype was checked by PFGE. Depending on the result, it is possible to estimate the share of the RCC2590 population that carries this chromosomal variation, which is certainly correlated with resistance to the OmV0 virus. According to this experiment, there is indeed a subpopulation, about 7% of the cells, with this karyotypic feature. PacBio sequencing of one line of each karyotype gave better precision in the assembly of this chromosome, notably by revealing a repeated region of 60 kb in the resistant line. These results are included in a manuscript on the analysis of the genome of O. mediterraneus RCC2590 (Yau et al., in preparation).

Figure 1. PFGE migration of the lines from the O. mediterraneus MA experiment. Line numbers in white are lines identical to the ancestral type; line numbers in red are lines showing a karyotype variation and sensitivity to the OmV0 virus. Yellow rectangles indicate the karyotype variations observed in the lines.

2. Limits to the estimation of the mutation rate

Different factors play a role in determining the mutation rate and its variation (Figure 2). The mutation rate is a trade-off between these factors, which include genome size and the cost of replication, the strength of selection and drift, and the burden of deleterious mutations, among others. The hypotheses predicting the impact of each of these factors on the mutation rate partly derive from the results of MA experiments. It is therefore necessary to discuss the possible biases of this type of experimental approach.

Figure 2. Overview of the factors that influence the mutation rate: polymerase fidelity, MMR and TCR, selection, drift, deleterious mutations, adaptive mutations, cost of repair, cost of replication, genome size, distance from GC equilibrium and external mutagens.

First, these experiments are mostly carried out on organisms that have been known and maintained in the laboratory for years. Laboratory conditions have therefore become optimal conditions, to which the biological models used have been able to adapt and in which, apart from certain specific experiments, they are not subject to interspecific competition, to a variable environment or to strong selection (after a certain period of adaptation to the laboratory). Consider green algae in the MA experiments on Mamiellophyceae: the cultures are clonal (no competition with another species), adapted to laboratory conditions (with light and temperature cycles that have been perfectly constant for years), in a rich medium that is never limiting (no competition for resources) and without predation. Although no hypermutators are known in eukaryotes, the mutation rate can vary under stress (Jiang et al., 2014; Sharp and Agrawal, 2012). In nature, we can assume that species are never exposed to laboratory conditions.

Given these arguments, it is reasonable to question how representative this type of experiment is of the actual mutation rate in nature. More MA experiments carried out under stress conditions could bring estimates closer to the actual mutation rate in nature. This has been done with Caenorhabditis elegans for temperature (Matsuba et al., 2013), but it would be interesting to carry out this type of experiment under stress with other species. The mutation rate can thus also vary with experimental conditions, which explains part of the variation observed between experiments on the same organism. One article focuses on this question (Behringer and Hall, 2016) by examining the results of independent MA experiments in D. melanogaster, A. thaliana, C. reinhardtii, C. elegans, S. pombe and S. cerevisiae. The mutation rate differs between experiments, but the mutational spectrum is statistically identical (that is, the same biases are observed, for example the GC to AT mutation bias) in S. pombe, C. elegans and D. melanogaster (Behringer and Hall, 2015; Farlow et al., 2015).

Second, the number of mutation rates available at the whole-genome level for "wild" strains with a relatively high number of lines is small and not representative of the eukaryotic tree. Mutation rates from MA experiments are available for five metazoans and two yeasts (all Unikonta), seven Archaeplastida (six green algae and Arabidopsis thaliana), Dictyostelium discoideum, Paramecium tetraurelia and nine bacteria. Moreover, coverage of large genomes is often below 80% (78% for Arabidopsis thaliana (Ossowski et al., 2010), 46% for Heliconius (Keightley et al., 2014b)). More generally, the mutation rate is not uniform at any scale, hence the importance of the widest possible coverage when estimating a mutation rate at the scale of a genome.

3. Perspectives for MA experiments

MA experiments have played a major role in improving our understanding of mutations and their effects. This thesis is a contribution to this subject, and presents the first protocols using flow cytometry to carry out MA experiments with picophytoplankton species that do not grow on plates. This last point is important because the microorganisms studied so far by MA experiments are organisms that can grow on Petri dishes, that is, on solid medium. This is experimentally more convenient than liquid medium, but the number of candidate species is limited.

New sequencers are fostering new studies of mutation rates, and the number of mutation rates available from MA experiments is increasing. Nevertheless, major taxa are not represented, notably stramenopiles, dinoflagellates, haptophytes, excavates, rhizarians and all of the Archaea (we refer here to mutation rates at the scale of a complete genome). It is therefore necessary to explore biological models other than those known for years in the laboratory. Only a significant increase in the number of species, exploring the whole tree of life, will provide further insight into the biological and ecological factors that influence the mutation rate and its variation.

Indeed, the conclusions on the factors influencing mutation rate variation were drawn from data obtained in Archaeplastida and opisthokonts. The mutation rates of Paramecium tetraurelia (Sung et al., 2012b) and Dictyostelium discoideum (Saxer et al., 2012), which are distant from these two eukaryotic kingdoms, are precisely outliers with respect to, for example, the role of genome size.

For these reasons, following this thesis work, a sixth MA experiment was started on a model species, Phaeodactylum tricornutum RCC2967 (stramenopiles, Bacillariophyceae; details in the Appendices), a diatom whose genome is available and annotated (Bowler et al., 2008). Its genome is 27.5 Mb with a GC content of 48.8%. Should the experiment succeed (that is, should enough lines be retained to obtain 8000 independent generations), this would be the first directly estimated mutation rate for a stramenopile species. The aim is to extend current knowledge of mutation rates to other biological models. The protocol of this experiment is identical to the one described in chapter 2, with 42 lines monitored and then sequenced after ~200 generations per line. Other interesting candidate species exist, but this requires a published, good-quality, annotated genome, from a species that is as easy as possible to handle. One example is the model haptophyte Emiliania huxleyi, even though there are complications in the assembly of its genome (Read et al., 2013).
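The generation budget of this design can be checked with simple arithmetic. The sketch below is only an illustration: it takes the 42 lines, ~200 generations per line and the 8000-generation target from the text, treats "~200" as exactly 200, and assumes all surviving lines reach the same number of generations. The function names are ours.

```python
import math


def total_generations(n_lines: int, generations_per_line: int) -> int:
    """Independent generations accumulated across all MA lines."""
    return n_lines * generations_per_line


def minimum_lines(target_generations: int, generations_per_line: int) -> int:
    """Smallest number of surviving lines that reaches the target."""
    return math.ceil(target_generations / generations_per_line)


N_LINES = 42                 # lines monitored (from the text)
GENERATIONS_PER_LINE = 200   # "~200" in the text, taken as exact here
TARGET_GENERATIONS = 8000    # target stated in the text

planned = total_generations(N_LINES, GENERATIONS_PER_LINE)
needed = minimum_lines(TARGET_GENERATIONS, GENERATIONS_PER_LINE)
tolerated_losses = N_LINES - needed

print(f"planned generations: {planned}")
print(f"lines needed for the target: {needed}")
print(f"line losses tolerated: {tolerated_losses}")
```

Under these assumptions the design yields 8400 generations and tolerates the loss of two lines before falling below the target.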

Finally, a last experiment is planned to study the effect of temperature and stress in Picochlorum RCC4223. This strain can grow at up to 35°C. One experiment in standard medium at 20°C and another at 30°C will allow the two conditions to be compared. A change in mutation rate may reflect an effect of temperature or of stress. 20 lines in each condition will be followed for 150 generations.
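To give an idea of the number of mutations such a design could yield, the expected count per condition can be sketched from the genome-wide rate estimated for this strain. The example below assumes that U = 0.0132 mutations per genome per generation (Table 3 of the Picochlorum chapter) applies unchanged in the 20°C condition and that no line is lost. The fold change used for the second condition is purely hypothetical and serves only to show how the comparison would be set up.

```python
def expected_mutations(n_lines: int, generations: int, u_per_genome: float) -> float:
    """Expected number of mutations summed over all MA lines."""
    return n_lines * generations * u_per_genome


N_LINES = 20            # lines per condition (from the text)
GENERATIONS = 150       # generations per line (from the text)
U_BASELINE = 0.0132     # mutations per genome per generation, RCC4223

# Hypothetical value, chosen only for illustration: not a measured effect.
HYPOTHETICAL_FOLD_CHANGE = 2.0

expected_20c = expected_mutations(N_LINES, GENERATIONS, U_BASELINE)
expected_30c = expected_mutations(
    N_LINES, GENERATIONS, U_BASELINE * HYPOTHETICAL_FOLD_CHANGE
)

print(f"expected mutations at baseline rate: {expected_20c:.1f}")
print(f"expected mutations at {HYPOTHETICAL_FOLD_CHANGE:g}x the baseline rate: {expected_30c:.1f}")
```

At the baseline rate this corresponds to about 40 expected mutations per condition (20 × 150 × 0.0132 ≈ 39.6).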

4. General conclusion

In conclusion, this thesis work allowed the estimation of 5 new spontaneous mutation rates and provides a new assembled and annotated green alga genome. The number of species for which the mutation rate had been directly estimated by MA experiments was 12 at the start of the thesis (September 2013) and 25 at the end (September 2016), counting the 5 green algae from this work. These results partly answer the question of the origin of mutation rate variation within a genome and between species. A meta-analysis confirms the role of effective population size and genome size in the evolution of the mutation rate. We also highlight the impact of the relative distance of genomic GC content from equilibrium on the increase of the mutation rate. More generally, this contribution helps to address the broader question of the role of mutations in evolution, and highlights the role of mutation accumulation experiments in the progress made by biologists on the questions surrounding mutations and adaptation. To conclude, it is essential to develop mutation accumulation approaches on a broader range of species. The effect of mutations on fitness is increasingly well known, as are the mutational spectra of classical biological models. It is crucial to move towards new models, not only to re-evaluate existing hypotheses but also to develop new ones that might not have emerged from knowledge based on classical models.

APPENDICES

L1 Medium

Guillard and Hargraves (1993)

This enriched seawater medium is based upon f/2 medium (Guillard and Ryther 1962) but has additional trace metals. It is a general purpose marine medium for growing coastal algae.

To prepare, begin with 950 mL of filtered natural seawater. Add the quantity of each component as indicated below, and then bring the final volume to 1 liter using filtered natural seawater. The trace element solution and vitamin solutions are given below. Autoclave. Final pH should be 8.0 to 8.2.

Component                 Stock Solution          Quantity   Molar Concentration in Final Medium
NaNO3                     75.00 g L^-1 dH2O       1 mL       8.82 × 10^-4 M
NaH2PO4 · H2O             5.00 g L^-1 dH2O        1 mL       3.62 × 10^-5 M
Na2SiO3 · 9H2O            30.00 g L^-1 dH2O       1 mL       1.06 × 10^-4 M
trace element solution    (see recipe below)      1 mL       ---
vitamin solution          (see recipe below)      0.5 mL     ---
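The molar concentrations in the last column follow from the stock concentration, the volume added and the molar mass of each salt. The sketch below recomputes them for the three macronutrient stocks. The molar masses are standard values that we supply; they are not given in the recipe, and the function name is ours.

```python
# Standard molar masses in g/mol (supplied here, not part of the recipe).
MOLAR_MASS = {
    "NaNO3": 84.99,
    "NaH2PO4.H2O": 137.99,
    "Na2SiO3.9H2O": 284.20,
}

# Stock concentrations in g/L, from the recipe table.
STOCK_G_PER_L = {
    "NaNO3": 75.00,
    "NaH2PO4.H2O": 5.00,
    "Na2SiO3.9H2O": 30.00,
}


def final_molarity(stock_g_per_l: float, volume_added_ml: float,
                   final_volume_l: float, molar_mass: float) -> float:
    """Molar concentration after adding a stock volume to the final medium."""
    grams_added = stock_g_per_l * volume_added_ml / 1000.0
    return grams_added / molar_mass / final_volume_l


# 1 mL of each stock is added to 1 liter of medium.
final_conc = {
    name: final_molarity(STOCK_G_PER_L[name], 1.0, 1.0, MOLAR_MASS[name])
    for name in STOCK_G_PER_L
}

for name, molarity in final_conc.items():
    print(f"{name}: {molarity:.2e} M")
```

The results agree with the tabulated values to three significant figures.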

L1 Trace Element Solution

To 950 mL dH2O add the following components and bring final volume to 1 liter with dH2O. Autoclave.


Component            Stock Solution          Quantity   Molar Concentration in Final Medium
Na2EDTA · 2H2O       ---                     4.36 g     1.17 × 10^-5 M
FeCl3 · 6H2O         ---                     3.15 g     1.17 × 10^-5 M
MnCl2 · 4H2O         178.10 g L^-1 dH2O      1 mL       9.09 × 10^-7 M
ZnSO4 · 7H2O         23.00 g L^-1 dH2O       1 mL       8.00 × 10^-8 M
CoCl2 · 6H2O         11.90 g L^-1 dH2O       1 mL       5.00 × 10^-8 M
CuSO4 · 5H2O         2.50 g L^-1 dH2O        1 mL       1.00 × 10^-8 M
Na2MoO4 · 2H2O       19.9 g L^-1 dH2O        1 mL       8.22 × 10^-8 M
H2SeO3               1.29 g L^-1 dH2O        1 mL       1.00 × 10^-8 M
NiSO4 · 6H2O         2.63 g L^-1 dH2O        1 mL       1.00 × 10^-8 M
Na3VO4               1.84 g L^-1 dH2O        1 mL       1.00 × 10^-8 M
K2CrO4               1.94 g L^-1 dH2O        1 mL       1.00 × 10^-8 M

f/2 Vitamin Solution

(Guillard and Ryther 1962, Guillard 1975)

First, prepare primary stock solutions. To prepare final vitamin solution, begin with 950 mL of dH2O, dissolve the thiamine, add the amounts of the primary stocks as indicated in the quantity column below, and bring final volume to 1 liter with dH2O. At the NCMA we autoclave to sterilize. Store in refrigerator or freezer.

Component                    Primary Stock Solution   Quantity   Molar Concentration in Final Medium
thiamine · HCl (vit. B1)     ---                      200 mg     2.96 × 10^-7 M
biotin (vit. H)              0.1 g L^-1 dH2O          10 mL      2.05 × 10^-9 M
cyanocobalamin (vit. B12)    1.0 g L^-1 dH2O          1 mL       3.69 × 10^-10 M
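For the vitamins, the final concentration results from two successive dilutions: the primary stock into 1 liter of vitamin solution, then 0.5 mL of vitamin solution into 1 liter of medium (quantity given in the L1 medium table). The sketch below chains the two steps. The molar masses are standard values that we supply, not part of the recipe, and the names are ours.

```python
# Standard molar masses in g/mol (supplied here, not part of the recipe).
MOLAR_MASS = {
    "thiamine.HCl": 337.27,
    "biotin": 244.31,
    "cyanocobalamin": 1355.37,
}

# Grams of each vitamin in 1 liter of vitamin solution.
# Thiamine is weighed directly; the others come from primary stocks (g/L x mL / 1000).
GRAMS_PER_L_VITAMIN_SOLUTION = {
    "thiamine.HCl": 0.200,
    "biotin": 0.1 * 10 / 1000.0,
    "cyanocobalamin": 1.0 * 1 / 1000.0,
}

VITAMIN_SOLUTION_ML_PER_L_MEDIUM = 0.5  # from the L1 medium table


def molarity_in_medium(grams_per_l_solution: float, molar_mass: float) -> float:
    """Final molarity after diluting the vitamin solution into the medium."""
    dilution = VITAMIN_SOLUTION_ML_PER_L_MEDIUM / 1000.0
    return grams_per_l_solution * dilution / molar_mass


vitamin_conc = {
    name: molarity_in_medium(grams, MOLAR_MASS[name])
    for name, grams in GRAMS_PER_L_VITAMIN_SOLUTION.items()
}

for name, molarity in vitamin_conc.items():
    print(f"{name}: {molarity:.2e} M")
```

The same two-step reasoning applies to the trace metals that are prepared from primary stocks, with 1 mL of trace element solution per liter of medium.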

ROSCOFF CULTURE COLLECTION INTERNATIONAL MARINE CULTURE COLLECTION

RCC 4221 Ostreococcus tauri

STATUS
Status: Distributed
Cryopreserved:

IDENTITY
Class: Mamiellophyceae
Order: Mamiellales
Ecotype:
Strain name: BCC145000
Other names: La Reine (from BCC1000, RCC745)

ORIGIN
Ocean origin: Mediterranean Sea
Region of origin: Thau lagoon
Country of origin: France
Cruise:
Isolation station:
Isolation depth: 0 m
GPS position: +42°n/a 24', +3° 36'
Isolation Date: 3/5/1995 00:00:00
Isolator: Courties

CULTURE
Size: 1.0 µm
Cell shape: Coccoid
Cell assemblage:
Growth medium: L1
Growth light: 100 µEin
Growth temperature: 20.0°C
Remarks: Clonal cultures obtained from RCC745. Replaces RCC745

Roscoff Culture Collection Station Biologique de Roscoff, Place Georges Teissier, 29680 ROSCOFF Cedex, France Phone : +33 2 98 29 25 64, Fax : +33 2 98 29 23 24 For any question, please contact us. Web site : www.roscoff-culture-collection.org

RCC 2590 Ostreococcus mediterraneus

STATUS
Status: Distributed
Cryopreserved:

IDENTITY
Class: Mamiellophyceae
Order: Mamiellales
Ecotype: clade D
Strain name: P_4-03_1
Other names: BCC 102000

ORIGIN
Ocean origin: Mediterranean Sea
Region of origin: Gulf of Lion
Country of origin: France
Cruise:
Isolation station:
Isolation depth: 1 m
GPS position: +43°n/a 24', +3° 36'
Isolation Date: 23/3/2009 00:00:00
Isolator: Stephanie Michely

CULTURE
Size: 1.0 µm
Cell shape: coccoid
Cell assemblage:
Growth medium: K
Growth light: 100 µEin
Growth temperature: 20.0°C
Remarks:


RCC 1105 Bathycoccus prasinos

STATUS Status: Distributed Cryopreserved:

IDENTITY Class: Mamiellophyceae Order: Mamiellales Ecotype: Strain name: BBan7 Other names: BCC4000

ORIGIN Ocean origin: Mediterranean Sea Region of origin: Gulf of Lion Country of origin: France Cruise: Isolation station: Banyuls Bay, SOLA Isolation depth: 0 m GPS position: +42° 27', +3° 32' Isolation Date: 1/1/2006 00:00:00 Isolator: N.Grimsley

CULTURE Size: Cell shape: Cell assemblage: Growth medium: K Growth light: 100 µEin Growth temperature: 20.0°C Remarks: The strain deposited at RCC appears to be Ostreococcus. We do not know when the contamination occurred. The Banyuls lab has recloned the culture from a cryopreserved aliquot (RCC4222) that can be ordered.


RCC 299 Micromonas pusilla

STATUS Status: Distributed Cryopreserved:

IDENTITY Class: Mamiellophyceae Order: Mamiellales Ecotype: clade A Strain name: NOUM17 Other names: NOUM97017, NIES-2672

ORIGIN Ocean origin: Pacific Ocean Region of origin: Equatorial Pacific Country of origin: Cruise: Ebene Isolation station: Isolation depth: 0 m GPS position: -22° 20', +166° 20' Isolation Date: 10/2/1998 00:00:00 Isolator: Boulben S.

CULTURE Size: 2.0 µm Cell shape: flagellate Cell assemblage: Growth medium: K Growth light: 100 µEin Growth temperature: 20.0°C Remarks: The clonal version of this strain is RCC 827. We recommend that you use RCC 827 rather than RCC 299.


RCC 4223 Picochlorum sp

STATUS Status: Distributed Cryopreserved:

IDENTITY Class: Trebouxiophyceae Order: Ecotype: Strain name: BCC143000 Other names: 10M2F12

ORIGIN Ocean origin: Mediterranean Sea Region of origin: Gulf of Lion Country of origin: France Cruise: Isolation station: Isolation depth: 0 m GPS position: +42° 32', +2° 59' Isolation Date: 11/8/2011 00:00:00 Isolator: Subirana

CULTURE Size: 1.0 µm Cell shape: coccoid Cell assemblage: Growth medium: L1 Growth light: 100 µEin Growth temperature: 20.0°C Remarks:

Genome sequencing is currently under way in the Banyuls lab.


RCC 2967 Phaeodactylum tricornutum

STATUS Status: Distributed Cryopreserved:

IDENTITY Class: Bacillariophyceae Order: Naviculales Ecotype: Strain name: Pt1_8.6 Other names: CCAP 1055/1, Pt Gen,COUGH, CCMP632

ORIGIN Ocean origin: Atlantic Ocean Region of origin: Blackpool Country of origin: UK Cruise: Isolation station: Isolation depth: 0 m GPS position: +54° 0', -4° 0' Isolation Date: Isolator:

CULTURE Size: 10.0 µm Cell shape: fusiform Cell assemblage: Growth medium: F/2 Growth light: 100 µEin Growth temperature: 20.0°C Remarks:

The strain has been fully described in De Martino et al. (2007) J Phycol 43: 992-1009. Its genome sequence was published in Bowler, C., Allen, A.E., Badger, J.H., Grimwood, J., Jabbari, K., Kuo, A., Maheswari, U. et al. 2008. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature 456: 239-44.

CHAPTER 2

Table S1. Normalized Gr of the Micromonas pusilla MA lines that survived from the beginning to the end of the mutation accumulation experiment, at each bottleneck, from 14 to 302 days. Gtot is the total number of generations of the MA line, and P-value is the result of the linear correlation test used to detect an increase or a decrease of fitness from the normalized data. We could not normalize the MA growth rates for days 70 and 98 because the control lines did not grow, although the bottlenecks were performed.

Line  14    28    42    56    84    112   126   140   154   168   182   196   210   224   245   260   274   288   302   Gtot  P-value
1     0.90  1.26  1.02  1.16  1.25  1.33  1.01  1.08  1.05  0.95  1.14  1.07  0.99  1.01  1.19  1.01  1.02  1.00  0.96  275   NS
2     1.00  1.09  1.21  0.97  1.46  1.4   1.02  1.04  1.08  1.12  0.98  1.01  0.93  1.13  1.15  0.92  1.00  1.05  1.15  277   NS
3     1.08  1.13  1.01  0.92  1.23  1.34  0.99  1.1   1.04  1.35  0.95  1.08  0.98  1.00  1.10  1.10  1.09  1.00  1.21  284   NS
4     0.88  1.27  1.01  0.96  1.25  1.53  0.97  1.17  0.95  1.24  1.00  1.13  0.75  1.2   1.01  1.01  0.95  0.80  0.99  261   NS
5     0.94  1.02  1.14  0.90  1.24  1.21  1.14  0.86  0.94  0.96  0.83  1.15  0.81  1.18  0.99  0.87  1.01  0.93  1.13  247   NS
6     0.96  1.09  1.14  1.00  1.25  1.46  1.02  1.10  0.91  1.32  1.01  1.17  0.98  1.12  1.21  0.95  1.01  0.94  1.21  280   NS
7     0.91  1.28  1.04  1.12  1.38  1.33  1.02  1.16  0.99  1.34  0.90  1.08  0.97  1.06  1.22  1.09  0.95  0.91  1.22  285   NS

Table S2. Normalized Gr of the Ostreococcus mediterraneus MA lines that survived from the beginning to the end of the mutation accumulation experiment, from 14 to 294 days. Gtot is the total number of generations of the MA line, and P-value is the result of the linear correlation test used to detect an increase or a decrease of fitness from the normalized data.

Line  14    28    42    56    70    84    98    112   126   140   154   168   182   196   210   224   238   252   266   280   294   Gtot  P-value
1     1.12  0.98  1.07  1.01  1.07  0.98  1.06  1.13  1.04  1.04  0.91  1.04  1.12  1.00  0.95  0.89  1.11  1.00  1.05  1.03  0.98  287   NS
2     1.12  1.12  1.09  1.06  0.98  1.06  0.99  1.10  0.87  1.01  1.00  1.00  1.06  1.04  0.96  0.92  1.07  0.96  0.98  0.97  0.96  280   NS
3     0.99  0.96  0.99  0.98  0.85  1.12  0.89  0.95  0.91  0.95  0.88  0.98  1.12  0.89  0.90  1.03  0.91  0.97  1.10  0.94  0.95  252   NS
4     0.95  1.07  0.95  1.13  1.06  1.00  0.97  1.02  0.99  1.04  0.95  1.10  1.12  0.88  1.03  0.98  1.02  1.06  1.16  1.00  1.12  288   NS
5     0.90  0.93  1.06  0.85  1.22  0.91  1.00  0.90  0.97  1.12  1.05  1.04  1.09  1.01  0.99  1.05  1.01  0.97  1.03  0.93  0.86  268   NS
6     0.93  0.99  0.98  0.91  1.02  0.95  1.15  0.93  0.86  0.98  1.12  1.02  1.04  1.01  0.97  1.09  0.96  1.11  1.02  0.99  0.95  271   NS
7     1.05  0.95  1.04  0.91  0.97  0.94  1.07  1.13  0.87  0.99  0.98  1.00  1.07  0.93  0.97  0.97  1.06  1.15  0.94  0.97  1.00  271   NS
8     0.90  0.96  0.84  0.92  0.91  0.86  1.05  0.98  1.03  0.98  1.07  1.11  0.96  1.23  0.97  1.14  0.93  0.93  1.01  1.03  1.15  271   increase*
9     1.05  0.94  0.97  1.09  0.98  1.14  0.96  0.92  1.09  1.03  0.99  0.91  1.09  0.95  0.99  1.05  1.11  0.97  1.06  1.00  0.96  278   NS
10    1.00  1.10  1.03  0.92  1.06  0.98  0.96  1.01  0.91  1.07  0.94  0.91  1.02  1.03  0.98  0.91  0.99  0.92  0.92  1.06  0.92  262   NS
11    0.93  1.04  0.93  1.06  1.08  0.97  1.00  1.10  1.07  0.93  1.02  1.06  0.85  1.18  0.98  0.96  0.88  1.10  0.90  0.93  1.01  270   NS
12    0.95  0.99  1.01  0.99  1.03  0.98  0.93  1.01  0.95  1.08  0.89  1.03  0.97  1.01  0.92  1.03  1.00  0.92  1.02  0.86  0.99  260   NS
13    1.03  0.96  1.09  0.97  0.89  1.13  0.89  1.07  0.93  1.02  0.84  0.99  1.05  0.95  1.11  0.95  1.05  1.02  0.95  0.93  1.04  267   NS
14    1.10  0.96  1.02  0.92  1.11  0.91  0.93  1.20  0.89  0.96  1.08  1.00  0.98  0.95  1.08  0.97  1.02  1.15  1.00  1.06  0.94  277   NS
15    0.94  0.92  0.92  1.01  0.95  1.01  1.03  0.89  1.00  0.97  0.90  1.04  1.04  0.90  1.10  0.86  1.11  0.87  0.90  1.13  1.02  259   NS
16    1.00  0.88  0.99  0.99  0.86  1.02  1.15  0.94  0.91  1.16  0.88  1.00  0.95  1.09  1.08  1.11  0.96  1.15  0.94  1.14  0.94  276   NS
17    0.93  0.98  0.94  1.12  0.92  0.97  1.06  0.88  1.08  0.97  1.08  1.09  1.02  1.02  1.03  1.09  1.11  1.00  1.03  1.14  0.96  284   NS
18    0.97  1.00  0.92  1.17  0.97  0.94  0.97  0.98  0.93  1.01  1.02  0.89  1.14  0.84  1.02  0.95  0.95  1.00  0.93  1.05  0.99  262   NS
19    1.03  1.04  0.99  1.02  1.00  1.07  1.03  1.04  1.02  0.97  0.98  1.05  0.98  0.96  1.03  0.98  0.97  0.96  1.05  1.06  1.13  281   NS
20    0.94  1.02  1.03  0.92  1.16  1.04  1.05  0.96  1.13  0.98  1.10  1.05  0.95  1.18  0.93  1.23  0.98  1.12  1.08  0.96  1.04  295   NS
21    0.99  1.00  0.86  1.09  0.92  1.02  0.91  0.93  1.23  0.95  0.91  1.02  0.88  1.06  0.93  1.09  0.88  0.89  1.05  1.04  1.07  264   NS
22    0.95  1.09  1.01  0.96  1.04  1.08  0.94  1.02  0.96  1.00  1.05  1.06  0.88  1.01  0.95  1.02  0.87  1.14  0.90  1.11  1.04  273   NS
23    0.94  1.08  1.12  1.06  1.06  0.95  0.95  0.86  1.00  0.94  1.11  1.00  1.09  0.92  1.07  0.95  1.01  0.92  1.06  0.94  1.11  275   NS
24    1.04  1.04  1.01  1.01  1.08  0.91  0.91  0.91  1.20  0.92  0.98  0.97  0.96  0.95  1.05  0.94  1.01  0.99  1.07  1.01  0.96  269   NS


Table S3. Normalized Gr of the Bathycoccus prasinos MA lines that survived from the beginning to the end of the mutation accumulation experiment, at each bottleneck, from 14 to 224 days. Gtot is the total number of generations of the MA line, and P-value is the result of the linear correlation test used to detect an increase or a decrease of fitness from the normalized data.

Line  14    28    42    56    70    84    98    112   126   140   154   168   182   196   210   224   Gtot  P-value
1     0.95  1.25  1.05  1.10  0.97  1.00  1.06  1.07  1.45  0.96  0.89  1.06  1.16  1.00  1.15  1.06  258   NS
2     1.08  1.30  0.95  1.13  1.11  1.18  1.02  1.11  1.38  0.92  0.91  1.02  1.18  0.91  1.20  1.11  269   NS
3     1.03  1.25  0.85  1.17  1.22  1.05  1.19  1.02  1.56  0.93  0.90  0.91  1.14  1.17  1.07  0.91  264   NS
4     1.04  1.12  1.04  0.97  1.01  1.15  1.02  0.79  1.26  0.98  1.05  1.04  1.19  1.15  1.14  0.92  251   NS
5     1.09  1.03  1.13  1.18  1.23  1.35  1.04  0.96  1.41  0.85  1.02  0.93  1.09  0.99  0.89  1.05  262   NS
6     1.04  1.19  0.98  1.18  1.21  1.00  1.06  0.97  1.40  1.02  0.90  0.87  1.20  1.02  1.06  1.06  259   NS
7     1.13  1.12  1.28  1.10  1.12  0.95  1.23  1.13  1.40  0.92  1.05  1.19  1.30  1.22  1.39  0.96  296   NS
8     1.02  1.13  1.10  1.00  1.24  1.12  1.14  1.14  1.34  0.95  1.14  0.80  1.24  1.05  1.04  0.86  263   NS

Table S4. Normalized Gr of the Ostreococcus tauri MA lines that survived from the beginning to the end of the mutation accumulation experiment, at each bottleneck, from 140 to 378 days. Gtot is the total number of generations of the MA line, and P-value is the result of the linear correlation test used to detect an increase or a decrease of fitness from the normalized data. For O. tauri, however, there were two exceptions in bottleneck time: one at 11 days, and one after 18 days (Ne = 9.5 for this data point).

Line  140   151   165   179   192   224   238   252   266   280   294   308   322   336   350   364   378   Gtot  P-value
1     1.16  1.12  1.42  1.13  1.21  1.28  1.21  1.20  1.29  1.31  1.13  1.15  1.01  1.03  1.31  1.12  1.10  519   NS
2     1.19  1.26  1.34  1.12  1.27  1.19  1.24  1.21  1.36  1.26  1.15  1.07  1.03  1.17  1.29  1.12  1.16  526   NS
3     1.10  1.39  1.23  1.23  1.16  1.24  1.07  1.15  1.33  1.21  1.10  1.11  0.93  1.11  1.35  1.11  1.21  515   NS
4     1.16  1.21  1.18  1.26  1.20  1.23  1.05  1.23  1.23  1.23  1.17  0.99  1.09  1.09  1.33  1.05  1.08  492   NS
5     1.23  1.31  1.18  1.16  1.14  1.2   1.07  1.23  1.34  1.21  1.18  1.11  1.05  1.07  1.28  1.12  1.08  503   NS
6     1.08  1.31  1.22  1.19  1.21  1.21  1.09  1.26  1.19  1.23  1.27  1.07  1.01  1.08  1.21  1.10  1.13  514   NS
7     1.18  1.6   1.31  1.24  1.17  1.24  1.14  1.18  1.37  1.23  1.21  1.05  1.02  1.12  1.22  1.21  1.06  521   decrease*
8     1.14  1.35  1.29  1.26  1.27  1.34  1.00  1.29  1.3   1.18  1.15  0.95  1.00  1.10  1.23  1.13  1.1   517   decrease*
9     1.22  1.29  1.22  1.29  1.20  1.25  1.07  1.13  1.2   1.26  1.18  1.17  1.02  1.12  1.14  1.21  1.04  509   NS
10    1.09  1.42  1.22  1.15  1.19  1.24  1.25  1.07  1.23  1.34  1.09  1.07  0.93  1.23  1.25  1.19  1.1   526   NS
11    1.18  1.33  1.24  1.16  1.18  1.18  1.16  1.19  1.18  1.32  1.14  1.18  0.99  1.21  1.27  1.2   1.14  525   NS
12    1.19  1.17  1.21  1.18  1.11  1.16  1.16  1.14  1.18  1.33  1.02  1.14  0.87  1.13  1.15  1.23  1.12  498   NS
13    1.19  1.30  1.18  1.12  1.16  1.23  1.13  1.28  1.27  1.25  1.12  1.01  1.04  1.11  1.09  1.23  1.17  511   NS
14    1.11  1.40  1.14  1.22  1.26  1.34  1.16  1.14  1.18  1.36  1.20  1.02  0.99  1.26  1.09  1.18  1.30  526   NS
15    1.08  1.27  1.19  1.22  1.22  1.21  1.16  1.12  1.28  1.3   1.03  0.97  0.98  1.19  1.08  1.26  1.24  498   NS
16    1.19  1.30  1.17  1.27  1.25  1.24  1.08  1.17  1.28  1.16  1.10  0.97  1.00  1.14  1.13  1.28  1.20  520   NS
17    1.18  1.3   1.17  1.25  1.26  1.31  1.11  1.20  1.30  1.12  1.21  1.03  0.95  1.16  1.07  1.33  1.14  511   NS
18    1.18  1.64  1.4   1.24  1.18  1.2   1.22  1.17  1.32  1.12  1.09  1.09  1.03  1.14  1.19  1.20  1.16  521   decrease*
19    1.12  1.28  1.34  1.3   1.24  1.3   1.13  1.23  1.30  1.23  0.99  1.00  1.05  1.15  1.16  1.12  1.14  507   decrease*
20    1.05  1.23  1.40  1.22  1.27  1.21  1.07  1.17  1.35  1.20  1.07  1.01  1.03  1.16  1.15  1.17  1.09  505   NS
21    1.03  1.30  1.31  1.16  1.23  1.27  1.14  1.21  1.16  1.17  1.08  1.03  1.05  1.22  1.00  1.28  1.17  500   NS
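The captions of Tables S1 to S4 mention a linear correlation test on the normalized growth rates without giving its details. The sketch below is one possible reading, assuming a Pearson correlation between time (days) and normalized Gr, tested with Student's t on n - 2 degrees of freedom. It uses MA line 7 of O. tauri (Table S4) as input; the function names and the tabulated critical value are illustrative choices, not taken from the thesis.

```python
import math

# Days of each bottleneck and normalized Gr of O. tauri MA line 7 (Table S4).
DAYS = [140, 151, 165, 179, 192, 224, 238, 252, 266,
        280, 294, 308, 322, 336, 350, 364, 378]
LINE7 = [1.18, 1.6, 1.31, 1.24, 1.17, 1.24, 1.14, 1.18, 1.37,
         1.23, 1.21, 1.05, 1.02, 1.12, 1.22, 1.21, 1.06]

# Two-sided 5% critical value of Student's t with 15 degrees of freedom.
T_CRIT_DF15 = 2.131


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for the null hypothesis of zero correlation (n - 2 df)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    r = pearson_r(DAYS, LINE7)
    t = t_statistic(r, len(DAYS))
    trend = "decrease" if r < 0 else "increase"
    verdict = trend if abs(t) > T_CRIT_DF15 else "NS"
    print(f"r = {r:.2f}, t = {t:.2f}, verdict: {verdict}")
```

For this line the correlation is negative and exceeds the 5% threshold, in line with the "decrease*" entry of the table; other lines may behave differently if the original test was not a plain Pearson test.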


Table S5. Average of G of MA lines and control of Micromonas pusilla for each environmental test. P-value column indicates the result of the pairwise test to compare MA lines fitness with the control (NS p-value non significant, * p-value significant at 5%, ** p-value significant at 1%, *** p-value significant at 0.1%).

Condition \ Line    1        2        3        4        5        Control
Irgarol 1 µg.L^-1   0.98     0.97     0.52***  0.78***  0.37***  1.08
Diuron 10 µg.L^-1   1.08     0.96**   0.93**   1.06     0.89***  1.05
Salinity 5 g.L^-1   0.98     0.80     1.01     0.95     1.03     0.83
Salinity 20 g.L^-1  1.64**   1.52***  1.69**   1.55***  1.73     1.80
Salinity 35 g.L^-1  1.66     1.54**   1.72     1.60     1.79**   1.65
Salinity 55 g.L^-1  1.51**   1.46**   1.64**   1.49**   1.66**   1.35
Salinity 65 g.L^-1  1.24***  1.05***  1.26***  1.10***  1.23***  0.84

Table S6. Average of G of MA lines and control of Bathycoccus prasinos for each environmental test. P-value column indicates the result of the pairwise test to compare MA lines fitness with the control (NS p-value non significant, * p-value significant at 5%, ** p-value significant at 1%, *** p-value significant at 0.1%).

Condition \ Line    1        2        3        4        5        6        7        8        Control
Irgarol 1 µg.L^-1   1.00     0.74***  1.00     0.88     1.67***  0.74***  1.04     1.46***  0.97
Diuron 10 µg.L^-1   0.60     0.52     0.55     0.68     1.08***  0.47     0.57     0.65     0.52
Salinity 5 g.L^-1   1.42***  1.41***  1.25***  0.82     1.13**   1.48***  1.51***  0.69     0.72
Salinity 20 g.L^-1  2.22**   2.33**   2.15**   1.70**   2.30**   2.36**   2.32**   2.20**   1.93
Salinity 35 g.L^-1  2.05     2.03     1.99     1.86**   2.05     2.14     2.08     2.21     2.23
Salinity 55 g.L^-1  1.72     1.76     1.56     1.56     1.62     1.70     1.79     1.57     1.88
Salinity 65 g.L^-1  1.04**   1.08**   0.81     0.84     0.98     1.14**   1.10**   0.64     0.86

Table S7. Average of G of MA lines and control of Ostreococcus mediterraneus for each environmental test. P-value column indicates the result of the pairwise test to compare MA lines fitness with the control (NS p-value non significant, * p-value significant at 5%, ** p-value significant at 1%, *** p-value significant at 0.1%).

Condition \ Line    1        2        3        4        5        6        7        8        9        Control
Irgarol 1 µg.L^-1   1.07***  1.06***  1.04***  0.86     0.95     0.78**   0.67***  0.82     0.73***  0.89
Diuron 10 µg.L^-1   0.87**   0.98     0.92     0.9      0.93     0.91     0.95     0.88**   0.88**   0.96
Salinity 5 g.L^-1   1.87     1.83     1.88     1.93     1.90     1.68***  1.60***  1.93     1.66***  1.86
Salinity 20 g.L^-1  1.97     2.07     2.08     2.10     2.04     1.79*    1.75*    2.11     2.00     2.00
Salinity 35 g.L^-1  1.86     1.80     1.94     1.93     1.71     1.63**   1.65**   1.90     1.82     1.81
Salinity 55 g.L^-1  1.34     1.28     1.62     1.46     1.59     1.21     1.54     1.29     1.36     1.49
Salinity 65 g.L^-1  0.73***  0.68***  0.63***  0.59***  1.14     1.14     1.23     0.73***  0.91***  1.13

CHAPTER 3

Table S1. Summary of mutation accumulation experiments. Ne is the average effective population size during the experiment (estimated using the harmonic mean of cell number). Total lines is the number of MA lines sequenced at the end of the MA experiment. Total gen is the total number of independent generations summed over all sequenced MA lines. T0 to Tf is the duration of the mutation accumulation experiment, from the inoculation of the MA lines to the DNA extraction. Gen. is generation.

Species                   Total lines  Total gen  Mean gen per MA line  Ne   T0 to Tf (days)
O. tauri RCC4221          40           17 250     431                   8.5  378
O. mediterraneus RCC2590  37           8 379      235                   7    294
M. pusilla RCC299         37           4 145      112                   6    299
B. prasinos RCC1105       36           4 994      139                   8.5  224
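As an illustration of the harmonic-mean estimate of Ne mentioned in the caption of Table S1, the sketch below computes the harmonic mean of cell numbers over one growth cycle. It assumes, hypothetically, that each cycle starts from a single cell and that the population doubles at every generation; the actual cell counts used in the experiments are not reproduced here.

```python
from statistics import harmonic_mean


def cycle_effective_size(generations_per_cycle: int) -> float:
    """Harmonic mean of cell numbers over one cycle started from one cell.

    Assumes perfect doubling: 1, 2, 4, ... cells at generations 0, 1, 2, ...
    """
    cell_numbers = [2.0 ** t for t in range(generations_per_cycle + 1)]
    return harmonic_mean(cell_numbers)


if __name__ == "__main__":
    for g in (6, 10, 16):
        print(f"{g} generations per cycle: Ne ~ {cycle_effective_size(g):.2f}")
```

Because the harmonic mean is dominated by the smallest values, the single-cell bottleneck keeps Ne small even when the culture reaches many thousands of cells before the next transfer.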

Table S2. Portion of the genome usable for mutation calling (G*). G is the total genome size; G*min is the minimum and G*max the maximum usable genome size. Sizes are in Mb, with the percentage of G in parentheses.

Species G (Mb) G*average (%) G*min (%) G*max (%)

O. tauri RCC4221 13.03 12.60 (97.5) 11.82 (91.54) 12.84 (99.45)

O. mediterraneus RCC2590 13.48 13.10 (97.2) 12.28 (91.10) 13.35 (99.08)

B. prasinos RCC1105 15.07 15.02 (99.6) 14.66 (97.23) 15.03 (99.73)

M. pusilla RCC299 21.11 21.01 (99.5) 20.55 (97.36) 21.07 (99.79)
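The per-nucleotide mutation rate can be related to Tables S1 and S2 with a simple calculation. The sketch below assumes that the rate is the number of base substitutions divided by the product of the callable genome size (G*average) and the total number of generations. The mutation counts are the numbers of rows in the base-substitution tables of this chapter (91 for O. tauri, 54 for O. mediterraneus, 71 for M. pusilla), and the function name is hypothetical.

```python
def mutation_rate(n_mutations: int, callable_sites: float, total_generations: float) -> float:
    """Base-substitution rate per nucleotide per generation."""
    return n_mutations / (callable_sites * total_generations)


# (number of substitutions, G*average in bp, total generations)
DATA = {
    "O. tauri": (91, 12.60e6, 17250),
    "O. mediterraneus": (54, 13.10e6, 8379),
    "M. pusilla": (71, 21.01e6, 4145),
}

if __name__ == "__main__":
    for species, values in DATA.items():
        print(f"{species}: {mutation_rate(*values):.2e}")
```

Under this assumption the three rates round to the values reported for these species in Table S10; the exact procedure used in the thesis may differ in detail, for example in how clustered mutations are counted.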

Table S3. Base-substitution mutations in O. tauri.

Chromosome  Position  Reference  Mutation  Effect
1           281349    T          C         Synonymous coding
1           462543    T          A         -
1           462544    A          T         -
1           536352    G          A         -
1           688195    A          G         -
1           762227    A          G         Non-synonymous coding
1           764912    G          A         -
1           934994    C          T         Synonymous coding
2           202377    C          A         Non-synonymous coding
2           744253    G          A         Synonymous coding
2           759388    C          T         Non-synonymous coding
2           817108    C          T         Non-synonymous coding
2           820022    C          A         Non-synonymous coding
2           868393    G          A         Synonymous coding
2           1070559   A          G         Non-synonymous coding
3           41080     C          T         Synonymous coding
3           127716    G          A         Non-synonymous coding
3           210680    A          G         -
3           353325    C          T         Non-synonymous coding
3           379647    C          T         Non-synonymous coding
3           779548    T          C         Synonymous coding
3           971494    C          T         Non-synonymous coding
4           117       C          A         -
4           366731    C          T         Synonymous coding
4           524836    G          A         Non-synonymous coding
4           735752    C          T         Non-synonymous coding
5           5699      C          T         Non-synonymous coding
5           405535    G          A         Synonymous coding
5           498490    T          G         Non-synonymous coding
5           600053    T          A         -
5           600056    G          A         -
5           658543    G          A         Non-synonymous coding
5           668815    C          T         Non-synonymous coding
6           12972     C          T         Non-synonymous coding
6           18960     C          T         -
6           46323     T          G         Non-synonymous coding
6           162319    T          C         Non-synonymous coding
6           334481    G          C         -
6           661927    C          T         -
6           716579    C          T         Non-synonymous coding
7           475871    A          G         Non-synonymous coding
7           612454    C          T         Synonymous coding
7           702913    A          C         Non-synonymous coding
7           744212    G          A         Non-synonymous coding
8           66027     G          A         -
8           185592    G          C         -
9           31962     G          A         Non-synonymous coding
9           126121    C          T         Synonymous coding
9           393218    C          T         Non-synonymous coding
9           398617    C          T         Synonymous coding
9           514297    C          T         Non-synonymous coding
9           643568    A          G         Non-synonymous coding
10          108529    G          A         Non-synonymous coding
10          510428    C          T         Non-synonymous coding
10          562914    G          A         Synonymous coding
10          587147    G          A         -
11          222249    A          G         -
11          446073    T          C         Synonymous coding
11          452825    C          A         Non-synonymous coding
11          472590    G          A         Stop gained
11          474928    A          G         -
12          94        A          G         -
12          419488    C          A         Non-synonymous coding
12          460233    T          G         -
13          145936    G          A         Synonymous coding
13          365169    C          T         Non-synonymous coding
13          496008    C          T         Non-synonymous coding
14          75583     G          A         Intron
14          210547    G          A         Non-synonymous coding
14          231150    G          T         -
14          257724    C          A         Non-synonymous coding
14          374547    G          C         Synonymous coding
14          495621    G          A         Synonymous coding
14          500741    C          T         Non-synonymous coding
15          165137    C          T         Non-synonymous coding
15          177872    G          T         Non-synonymous coding
15          352817    A          G         Synonymous coding
16          460708    C          T         Synonymous coding
16          325515    G          T         Synonymous coding
16          521001    C          T         Non-synonymous coding
17          35411     T          G         Non-synonymous coding
17          410925    C          T         -
17          410926    G          A         -
17          410927    A          G         -
17          410928    C          G         -
18          122316    A          G         -
18          252024    T          G         -
18          345570    T          G         -
18          345641    G          A         -
20          49667     C          T         Non-synonymous coding
2           165267    T          G         -

Table S4. Base-substitution mutations in O. mediterraneus.

Chromosome  Position  Reference  Mutation  Effect
1           240410    T          C         Synonymous coding
1           281668    A          T         Non-synonymous coding
1           384332    T          C         -
1           500961    T          G         -
1           504598    G          A         -
1           960787    G          T         Non-synonymous coding
2           783209    C          A         Synonymous coding
3           168617    T          G         Non-synonymous coding
3           194253    A          T         Non-synonymous coding
4           4863      A          G         Non-synonymous coding
4           250707    T          C         -
4           271238    A          G         -
4           271242    G          T         -
4           271243    A          G         -
4           271249    A          C         -
4           271258    C          T         -
4           271264    G          T         -
5           204024    G          A         Non-synonymous coding
5           237455    G          A         Non-synonymous coding
5           369175    C          T         Non-synonymous coding
5           561353    C          A         Non-synonymous coding
6           167738    C          T         Non-synonymous coding
6           365873    C          T         Synonymous coding
6           638629    C          A         Non-synonymous coding
7           136735    A          C         Non-synonymous coding
7           298488    C          T         Non-synonymous coding
8           59263     G          A         Non-synonymous coding
8           106639    G          A         Non-synonymous coding
8           165454    G          A         Synonymous coding
8           246199    T          C         Non-synonymous coding
8           285631    G          A         Synonymous coding
8           390757    T          A         Non-synonymous coding
8           425236    T          G         Non-synonymous coding
8           437922    G          C         Non-synonymous coding
8           699362    G          A         Synonymous coding
9           552630    G          T         Non-synonymous coding
9           701807    A          C         Non-synonymous coding
9           717500    C          A         -
9           737678    G          T         Non-synonymous coding
10          165842    G          A         Non-synonymous coding
13          107450    T          C         Non-synonymous coding
13          162729    C          T         Synonymous coding
13          586916    T          C         Non-synonymous coding
14          60175     T          A         Non-synonymous coding
14          60217     G          A         Non-synonymous coding
14          98081     A          C         Non-synonymous coding
15          112247    C          A         Start gained
16          222414    C          T         Synonymous coding
16          297544    C          T         Synonymous coding
17          368001    G          T         Non-synonymous coding
17          422490    C          T         -
18          272777    A          G         Non-synonymous coding
18          299852    A          G         Non-synonymous coding
19          81732     C          G         Non-synonymous coding

Table S5. Base-substitution mutations in B. prasinos.

Chromosome  Position  Reference  Mutation  Effect
1           42082     C          A         Non-synonymous coding
1           79489     C          T         Synonymous coding
1           418914    T          G         Non-synonymous coding
1           1305044   G          C         Synonymous coding
11          525583    C          T         Non-synonymous coding
12          50        T          A         -
12          122406    T          C         -
13          208600    T          A         Non-synonymous coding
17          72        G          C         -
18          999       C          A         Non-synonymous coding
18          235306    T          G         -
19          69819     A          T         -
19          145968    A          T         -
2           556426    G          C         Non-synonymous coding
4           135852    C          T         Synonymous coding
4           531741    C          G         Non-synonymous coding
5           87328     C          T         Non-synonymous coding
5           145129    A          T         Intron
5           602981    C          G         Synonymous coding
5           797176    A          T         Non-synonymous coding
8           574299    C          A         Non-synonymous coding
9           93038     C          T         Synonymous coding

Table S6. Base-substitution mutations in M. pusilla.

Chromosome  Position  Reference  Mutation  Effect
1           101651    G          T         -
1           319728    T          A         -
1           319729    C          A         -
1           356850    G          C         Non-synonymous coding
1           868786    A          G         -
1           907594    A          G         -
1           989553    G          A         Non-synonymous coding
1           1166944   A          G         -
1           1311961   A          G         Non-synonymous coding
1           1328941   C          T         -
1           1480609   A          C         -
1           1531771   T          C         -
1           1771291   T          C         -
2           296       C          T         -
2           197240    G          C         Non-synonymous coding
2           318653    C          G         Non-synonymous coding
2           318654    A          C         Non-synonymous coding
2           318655    G          C         Non-synonymous coding
2           371001    G          T         Non-synonymous coding
2           629587    T          G         Non-synonymous coding
2           1231590   G          A         Synonymous coding
2           1439488   C          T         Non-synonymous coding
2           1898300   G          T         Synonymous coding
2           1898302   A          C         Non-synonymous coding
2           1898303   C          A         Non-synonymous coding
2           1898304   T          A         Non-synonymous coding
2           1898305   T          C         Synonymous coding
3           1370371   C          G         Non-synonymous coding
3           1375097   G          A         Non-synonymous coding
3           1708043   C          T         Non-synonymous coding
4           3470      C          A         Synonymous coding
4           334275    G          A         Synonymous coding
4           571514    G          A         Non-synonymous coding
4           1224092   G          A         Synonymous coding
5           486431    C          A         Synonymous coding
5           508099    G          C         Synonymous coding
5           556716    C          G         Non-synonymous coding
5           896616    C          T         -
5           1108981   G          A         Non-synonymous coding
6           1235633   G          A         Non-synonymous coding
7           570664    G          T         Non-synonymous coding
7           570665    A          C         Synonymous coding
7           570666    G          T         Non-synonymous coding
7           672236    G          A         -
7           746112    G          C         Synonymous coding
8           93285     G          T         Non-synonymous coding
8           273349    C          T         Non-synonymous coding
8           284318    G          T         Non-synonymous coding
8           303063    G          A         Synonymous coding
8           310233    T          A         Non-synonymous coding
8           310234    G          C         Non-synonymous coding
8           1077655   G          T         Non-synonymous coding
9           1126769   A          T         Synonymous coding
11          173723    G          A         -
11          430804    C          T         Non-synonymous coding
11          899300    A          G         Non-synonymous coding
11          899309    G          A         Non-synonymous coding
12          323466    T          C         Non-synonymous coding
12          797741    C          T         Synonymous coding
12          823088    T          G         Non-synonymous coding
13          688257    C          G         Synonymous coding
14          346000    G          T         Synonymous coding
14          346390    G          C         Synonymous coding
14          761603    G          A         Non-synonymous coding
15          146889    A          T         Non-synonymous coding
15          146890    T          G         Non-synonymous coding
15          320895    A          G         Synonymous coding
15          408470    A          C         -
15          410306    C          T         Non-synonymous coding
15          504682    G          T         Non-synonymous coding
16          329517    C          T         Non-synonymous coding

Table S7. Insertions in the four species.

Chromosome  Position  Reference  Mutation  Effect

Bathycoccus prasinos
2   102     A  AC    -
9   521140  A  AAG   -
13  570953  T  TGCC  Codon insertion
14  272     A  ACC   -
18  523     A  AC    -

Micromonas pusilla
1  319727  A  AC   -
2  378439  C  CCG  Frame shift

Ostreococcus tauri
4   76071   C  CG       -
10  455685  T  TCGTCGG  Codon insertion
15  109437  C  CG       -
17  226384  C  CG       -
20  153964  G  GA       -

Ostreococcus mediterraneus
1   384327  G  ATT     -
8   729406  C  CA      Frame shift
12  432782  C  CTACTG  -

Table S8. Deletions in the four species.

Chromosome  Position  Reference  Mutation  Effect

Bathycoccus prasinos
1   164711  AGGCGAGCAGTG  A  -
1   164726  AATTCAATTTCAATA  A  -
5   622569  GA  G  Frame shift
5   963834  AAATATCTATTG  A  -
14  329038  TG  T  -

Micromonas pusilla
1   1787228  TTATTCCTTCGAAGCTTACGTACG  T  -
1   1924354  CA  C  Frame shift
1   319750  GTACCTTCGAAGGTATAA  G  -
3   214  CT  C  -
5   140463  CGCGAGACCTCG  C  Frame shift
6   1235612  AGGAGGAGGAGGGGGAGGAGGG  A  Codon deletion
8   310219  CGATCTGCGTCCGGTG  C  Codon deletion
10  1045182  GCGC  G  -
10  1160336  TCA  T  -
11  899291  ACAGACGGCACAGCTGGCG  A  Codon deletion
14  474  AACCCTTCGT  A  -
16  287228  AACTCGAGTTGACAAGACC  A  -

Ostreococcus tauri
1   601169  GA  G  -
6   46321  CT  C  Frame shift
8   273361  AAC  A  Frame shift
8   72783  GACACCCGCGTGTACGGGACCGCGACCC  G  -
11  222762  GA  G  Frame shift
12  403601  CGCGAGACCGGCGCACATCGCCGTCGTCGCCACCGTCGGAAACT  C  Frame shift
13  135  CA  C  -
17  87  CT  C  -

Ostreococcus mediterraneus
6   284415  CG  C  Frame shift
7   156017  GT  G  -
7   1775  TGTTGCC  T  -
8   350046  CGTT  C  Codon deletion
9   185017  CACGGCGACGACGAACGATGGCG  C  Frame shift
12  567140  TCG  T  Frame shift
12  328305  GT  G  -
13  112767  ATCTATCGTCGCGACGGCGGTCGTCTCTATG  A  Codon deletion

Table S9. RNAseq coverage of exons and other sequences at mutated and non-mutated sites in B. prasinos and O. tauri.

Species               Sequence type  N mutations  Mutated site coverage  Non-mutated site coverage  P-value (Wilcoxon test)
Bathycoccus prasinos  Non-exons      13           247                    646                        0.0004
Bathycoccus prasinos  Exons          19           505                    667                        0.123
Ostreococcus tauri    Non-exons      38           26                     149                        3.45E-07
Ostreococcus tauri    Exons          64           127                    126                        0.382

[Figure S1: bar charts of the number of base-substitution mutations of each of the 12 types, one panel per species. X-axis: transitions (G→A, C→T, T→C, A→G) and transversions (G→C, C→G, G→T, C→A, T→G, A→C, T→A, A→T). Panel annotations: O. tauri, GC to AT > AT to GC (Binomial test, P-value=0.0001); O. mediterraneus, GC to AT > AT to GC (Binomial test, NS); M. pusilla, GC to AT > AT to GC (Binomial test, P-value=0.02); B. prasinos, GC to AT > AT to GC (Binomial test, NS).]

Figure S1. The distribution of the base-substitution mutations. A GC to AT bias is observed in the four species, and is significant in O. tauri and M. pusilla.
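The categories used in Figure S1 can be derived mechanically from the Reference and Mutation columns of the base-substitution tables. The sketch below is a minimal illustration with hypothetical function names: it labels a substitution as a transition or a transversion and reports whether it changes GC content. The sample rows are the first entries of the O. tauri table.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Transition if both bases are purines or both pyrimidines, else transversion."""
    same_family = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_family else "transversion"


def gc_direction(ref: str, alt: str) -> str:
    """Direction of the change with respect to GC content."""
    if ref in "GC" and alt in "AT":
        return "GC>AT"
    if ref in "AT" and alt in "GC":
        return "AT>GC"
    return "neutral"


if __name__ == "__main__":
    # First rows of the O. tauri base-substitution table (reference, mutation).
    rows = [("T", "C"), ("T", "A"), ("A", "T"), ("G", "A"), ("A", "G")]
    print(Counter(substitution_class(r, a) for r, a in rows))
    print(Counter(gc_direction(r, a) for r, a in rows))
```

A binomial test on the GC>AT and AT>GC counts would additionally need the expected proportion of each class given the GC content of the genome, which is not shown here.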

Table S10. Spontaneous base-substitution mutation rates estimated by mutation accumulation experiments in Bacteria and Eukaryotes. G is the genome size in Mb, µ is the mutation rate per nucleotide per generation and U is the number of mutations per genome per generation. The value from Ness et al. (2015) is the average of 6 strains. Effective population sizes come from the supplementary material of Lynch (2010a), except for Mus musculus (Phifer-Rixey et al., 2012), Heliconius melpomene (Keightley et al., 2014b), Ficedula albicollis (Backström et al., 2013), Arabidopsis thaliana (Cao et al., 2011), Caenorhabditis elegans (Cutter, 2006), Caenorhabditis briggsae (Cutter et al., 2006), Drosophila melanogaster (Shapiro et al., 2007) and O. tauri (Blanc-Mathieu et al., in preparation). Ge is the estimated length of protein-coding sequences, as provided in the ensembl.org database.

Species G Ge µ U Ne References

Homo sapiens 3300.0 24.4 1.29E-08 38.5500 2.00E+04 (Lynch, 2010b)

Mus musculus 2700.0 24.3 5.40E-09 14.5800 2.00E+05 (Uchimura et al., 2015)

Ficedula albicollis 1100 26.1 4.60E-09 5.0600 4.50E+05 (Smeds et al., 2016)

Arabidopsis thaliana 134.4 45.0 7.00E-09 1.0990 2.50E+05 (Ossowski et al., 2010)

Caenorhabditis elegans 100.3 27.2 1.48E-09 0.1479 8.00E+04 (Denver et al., 2012)

Caenorhabditis briggsae 108.4 26.2 1.34E-09 0.1447 6.00E+04 (Denver et al., 2012)

Pristionchus pacificus 133.1 25.3 2.00E-09 0.2663 - (Weller et al., 2014)

Drosophila melanogaster 148.0 21.2 5.49E-09 0.6698 1.15E+06 (Schrider et al., 2013)

Heliconius melpomene 273.8 17.9 2.90E-09 0.7940 2.00E+06 (Keightley et al., 2014b)

Ostreococcus tauri 13.0 10.6 4.19E-10 0.0054 9.60E+06 This study

Ostreococcus mediterraneus 13.5 11.4 4.92E-10 0.0065 - This study

Bathycoccus prasinos 15.1 12.5 3.07E-10 0.0046 - This study

Micromonas pusilla 21.1 17.3 8.15E-10 0.0172 - This study

Chlamydomonas reinhardtii 112.0 19.7 9.63E-10 0.1079 3.10E+07 (Ness et al., 2015b)

Saccharomyces cerevisiae 12.3 8.8 1.67E-10 0.0021 6.20E+06 (Zhu et al., 2014)

Schizosaccharomyces pombe 12.6 7.1 2.00E-10 0.0025 2.60E+06 (Farlow et al., 2015)

Paramecium tetraurelia 72.1 53.9 1.94E-11 0.0014 1.20E+08 (Sung et al., 2012b)

Dictyostelium discoideum 34.2 21.1 2.90E-11 0.0010 - (Saxer et al., 2012)

Bacillus subtilis 4.2 3.6 3.28E-10 0.0014 6.30E+07 (Sung et al., 2015)

Escherichia coli 4.6 4.1 2.20E-10 0.0010 1.80E+08 (Lee et al., 2012)

Mesoplasma florum 0.8 0.7 9.78E-09 0.0078 1.10E+06 (Sung et al., 2012a)

Burkholderia cenocepacia 7.7 6.8 1.33E-10 0.0010 - (Dillon et al., 2015)

Pseudomonas aeruginosa 6.6 6.0 7.92E-11 0.0005 2.00E+07 (Dettman et al., 2016)

Salmonella typhimurium 4.8 4.3 7.00E-10 0.0034 - (Lind and Andersson, 2008)

Mycobacterium tuberculosis 4.4 4.0 2.58E-10 0.0011 - (Ford et al., 2011)

Deinococcus radiodurans 3.2 2.9 4.99E-10 0.0016 - (Long et al., 2015a)
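The relation between µ, G and U in Table S10 can be illustrated with a short calculation. The sketch below assumes that U is simply the product of the per-nucleotide rate and the genome size in base pairs; the function name is hypothetical. This reproduces several rows of the table to rounding, although not every row, which suggests that some values were computed from a slightly different genome size.

```python
def genomic_mutation_rate(mu: float, genome_size_mb: float) -> float:
    """Mutations per genome per generation, U = mu * G (G converted from Mb to bp)."""
    return mu * genome_size_mb * 1e6


if __name__ == "__main__":
    rows = {
        "Ostreococcus tauri": (4.19e-10, 13.0),
        "Micromonas pusilla": (8.15e-10, 21.1),
        "Escherichia coli": (2.20e-10, 4.6),
    }
    for species, (mu, g) in rows.items():
        print(f"{species}: U = {genomic_mutation_rate(mu, g):.4f}")
```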

Table S11. Effect of the GC gap from equilibrium on the base-substitution mutation rate. R1, R2, R3 and R3 were used to calculate GCeq and the mutation rate at equilibrium, µeq. The ratio µ/µeq estimates the elevation of the mutation rate due to the GC gap; for example, in O. tauri the mutation rate is increased by 11.8%.

Species µ GC GCeq GCr GC>AT relative to AT>GC µ/µeq References

Homo sapiens 1.29E-08 0.420 0.323 1.301 2.098 1.081 (Lynch, 2010b)

Mus musculus 5.40E-09 0.424 0.207 2.051 3.842 1.295 (Uchimura et al., 2015)

Ficedula albicollis 4.60E-09 0.443 0.311 1.426 2.219 1.116 (Smeds et al., 2016)

Arabidopsis thaliana 7.00E-09 0.367 0.138 2.663 6.255 1.640 (Ossowski et al., 2010)

Caenorhabditis elegans 1.48E-09 0.354 0.193 1.837 4.189 1.225 (Denver et al., 2012)

Caenorhabditis briggsae 1.34E-09 0.377 0.211 1.784 3.732 1.227 (Denver et al., 2012)

Pristionchus pacificus 2.00E-09 0.427 0.157 2.711 5.350 1.381 (Weller et al., 2014)

Drosophila melanogaster 5.49E-09 0.419 0.188 2.228 4.318 1.324 (Schrider et al., 2013)

Heliconius melpomene 2.90E-09 0.331 0.248 1.335 3.032 1.044 (Keightley et al., 2014b)

Ostreococcus tauri 4.19E-10 0.590 0.365 1.615 1.737 1.118 This study

Ostreococcus mediterraneus 3.73E-10 0.560 0.433 1.293 1.310 1.017 This study

Bathycoccus prasinos 3.07E-10 0.480 0.366 1.312 1.733 1.029 This study

Micromonas pusilla 8.15E-10 0.638 0.462 1.382 1.166 1.030 This study

Chlamydomonas reinhardtii 9.63E-10 0.619 0.259 2.392 2.864 1.428 (Ness et al., 2015)

Saccharomyces cerevisiae 1.67E-10 0.384 0.311 1.235 2.216 1.067 (Zhu et al., 2014)

Schizosaccharomyces pombe 2.00E-10 0.360 0.264 1.364 2.790 1.122 (Farlow et al., 2015)

Paramecium tetraurelia 1.94E-11 0.279 0.072 3.884 12.921 1.543 (Sung et al., 2012b)

Bacillus subtilis 3.28E-10 0.437 0.443 0.986 1.256 0.999 (Sung et al., 2015)

Escherichia coli 2.20E-10 0.506 0.450 1.124 1.222 1.010 (Lee et al., 2012)

Mesoplasma florum 9.78E-09 0.270 0.059 4.582 15.970 2.628 (Sung et al., 2012a)

Burkholderia cenocepacia 1.33E-10 0.669 0.551 1.213 0.814 0.978 (Dillon et al., 2015)

Pseudomonas aeruginosa 7.92E-11 0.662 0.396 1.671 1.524 1.083 (Dettman et al., 2016)

Salmonella typhimurium 7.00E-10 0.521 0.559 0.932 0.788 1.008 (Lind and Andersson, 2008)

Mycobacterium tuberculosis 2.58E-10 0.656 0.276 2.376 2.622 1.512 (Ford et al., 2011)

Deinococcus radiodurans 4.99E-10 0.668 0.643 1.039 0.555 0.983 (Long et al., 2015)
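The GCeq and GCr columns of Table S11 can be recovered from the other columns with a simple two-state model. The sketch below assumes that the equilibrium GC content is R2 / (R1 + R2), where R1 is the GC to AT rate and R2 the AT to GC rate, so that GCeq = 1 / (1 + R1/R2), and that GCr is the ratio GC / GCeq. These formulas are inferred from the tabulated values rather than quoted from the thesis, and the function names are hypothetical. The µ/µeq column is not reproduced because it also depends on rates not listed in the table.

```python
def gc_equilibrium(r1_over_r2: float) -> float:
    """Equilibrium GC content for a GC>AT to AT>GC rate ratio R1/R2."""
    return 1.0 / (1.0 + r1_over_r2)


def gc_gap_ratio(gc: float, r1_over_r2: float) -> float:
    """Observed GC content relative to its equilibrium value (GCr)."""
    return gc / gc_equilibrium(r1_over_r2)


if __name__ == "__main__":
    # (observed GC, R1/R2) taken from Table S11.
    rows = {"Homo sapiens": (0.420, 2.098), "Ostreococcus tauri": (0.590, 1.737)}
    for species, (gc, ratio) in rows.items():
        print(f"{species}: GCeq = {gc_equilibrium(ratio):.3f}, "
              f"GCr = {gc_gap_ratio(gc, ratio):.3f}")
```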

[Figure S2: scatter plot of the log10 of the nucleotide mutation rate (µ) against R1 (GC to AT) / R2 (AT to GC). Legend: Bacteria, Unicellular Eukaryotes, Mamiellophyceae, Metazoans, Arabidopsis. Labelled points: Mp and Pt.]

Figure S2. Correlation between the strength of the GC bias (R1/R2) and the nucleotide mutation rate. Mp is Mesoplasma florum and Pt is Paramecium tetraurelia (n=23 excluding Pt and Mp, Pearson correlation, P-value=0.002, correlation coefficient=0.61).

[Figure S3: bar charts titled "Nucleotides context of mutations", showing the number of mutations for each trinucleotide, grouped by the mutated site (A, T, C, G).]

Figure S3. Mutational context of base-substitution mutations. Mutations occur at the last position of the trinucleotides. This figure pools all mutations of the four species. Although some trinucleotides are mutated more frequently than expected by chance, no significant bias is detected.
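As an illustration of how the mutational context of Figure S3 could be tabulated, the sketch below extracts the trinucleotide whose last base is the mutated site and counts the contexts. The sequence, positions and function name are hypothetical, and the sketch ignores strand, which a real analysis would have to handle.

```python
from collections import Counter


def trinucleotide_context(sequence: str, position: int) -> str:
    """Trinucleotide ending at the 1-based `position` (the mutated site)."""
    if position < 3 or position > len(sequence):
        raise ValueError("position must leave two upstream bases within the sequence")
    return sequence[position - 3:position].upper()


if __name__ == "__main__":
    # Hypothetical chromosome fragment and mutated positions (1-based).
    fragment = "ACGTACGGTCA"
    mutated_positions = [3, 6, 8, 11]
    contexts = Counter(trinucleotide_context(fragment, p) for p in mutated_positions)
    for context, count in sorted(contexts.items()):
        print(context, count)
```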

CHAPTER 4

Table S1. Raw statistics from PacBio RS II sequencing and the polishing step.

Job Metric                                  Value
Polished Contigs                            361
N50 Contig Length                           244 124
Sum of Contig Lengths                       21 260 393
Adapter Dimers (0-10bp)                     0.02%
Short Inserts (11-100bp)                    0.01%
Number of Bases                             1 654 575 141
Number of Reads                             266 217
N50 Read Length                             10 870
Mean Read Length                            6 215
Mean Read Score                             0.85
Mapped Reads                                237 383
Mapped Read Length of Insert                1 971
Average Reference Length                    446 159
Average Reference Bases Called              100.0%
Average Reference Consensus Concordance     99.98%
Average Reference Coverage                  69.94
Polymerase Read Quality                     0.852

Table S2. Best assembly statistics obtained with ABySS from MiSeq reads (K=80 and n=10).

           n       n:200   n:N50  min  N80    N50     N20     max      sum
Unitigs    20 806  15 924  1 042  200  2 411  10 108  25 033  99 228   41.55e6
Contigs    19 925  15 327  902    200  2 534  11 210  29 270  189 496  41.62e6
Scaffolds  19 896  15 298  889    200  2 536  11 212  29 270  483 331  41.62e6
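The N50 values reported in these assembly tables follow the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal sketch of that calculation is given below; the function name `n50` and the example lengths are illustrative assumptions, not output from ABySS, SGA or HGAP.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until
    the running sum reaches at least half of the total assembly length;
    the length of the contig reached at that point is the N50.
    """
    lengths = sorted(lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in kb), for illustration only.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Assemblers may apply a minimum contig length before computing the statistic (the n:200 column above suggests a 200 bp threshold), so the same filter should be applied before comparing values.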

Table S3. SGA results from mapping the MiSeq reads onto the HGAP assembly.

Classified 140 vertices as unique (13.96 Mbp)
Classified 6 vertices as repeat (0.12 Mbp)
Classified 59 vertices as spurious (0.70 Mbp)
Constructed 81 scaffolds from 81 contigs
Total bases: 13.26 Mbp
Max scaffold: 964 334 bp
N50 scaffold: 436 237 bp
Mean scaffold: 163 701 bp

Table S4. Final genome assembly, comprising 64 contigs. The contig labelled Mt is the mitochondrial genome and the contig labelled Cl is the chloroplast genome.

Contig  Size (kb)  GC%
1       1793.9     46.4
2       1666.5     46.5
3       1505.7     46.4
4       965.3      46.4
5       764.1      46.5
6       691.2      46.4
10      664.0      46.6
7       651.1      46.3
8       588.1      46.6
12      578.9      46.5
9       564.2      46.6
13      475.2      46.9
14      361.5      47.1
16      349.9      47.9
11      349.1      46.3
15      260.0      46.7
18      227.8      46.9
17      225.0      47.5
29      186.6      43.5
19      178.3      46.3
22      94.6       46.4
27      93.3       47.3
20-Cl   78.2       31.9
30      75.5       40.1
92      47.9       36.9
101     45.8       36.0
21-Mt   42.9       41.0
24      40.2       51.7
103     39.2       44.9
104     39.1       34.9
111     37.8       35.5
105     37.5       36.4
128     35.0       34.9
114     33.3       44.7
117     28.2       35.1
23      28.1       44.6
130     27.2       35.8
39      27.0       39.8
25      24.7       47.7
40      23.7       38.0
132     21.9       35.1
165     20.6       35.7
41      20.1       60.8
151     19.0       34.6
28      18.2       47.9
26      16.6       44.7
190     16.5       34.7
138     15.2       35.2
42      14.9       45.0
174     14.2       30.9
169     13.7       34.8
43      13.5       57.3
31      12.8       45.9
44      12.7       61.0
180     12.7       40.9
195     12.7       37.1
181     12.3       34.3
208     12.1       37.3
45      11.9       62.7
202     11.5       36.0
199     11.0       35.7
32      10.7       54.6
206     10.7       35.7
46      10.5       46.0

CHAPTER 6

Table A1. Mutations in the O. tauri lines that show a significant decrease in fitness over the course of the mutation accumulation experiment. For intergenic mutations, the two adjacent genes are given.

Line  Chromosome  Position  Fitness effect  Gene           Annotation
12    3           210680    Intergenic      Ostta03g01200  Spermidine spermine synthases family
                                            Ostta03g01210  TPMT family
12    5           5699      Nonsynonymous   Ostta05g00040  FAD-dependent pyridine nucleotide disulphide oxidoreductase
12    6           18960     Intergenic      Ostta06g00060  Helicase, C-terminal
                                            Ostta06g00070  NA
12    7           612454    Synonymous      Ostta07g03740  Transducin family protein
12    8           66027     Intergenic      Ostta08g00400  Zinc finger, CCHC-type
                                            Ostta08g00410  Translation initiation factor 3 subunit D
35    3           353325    Nonsynonymous   Ostta03g02180  Conserved oligomeric Golgi complex
35    10          510428    Nonsynonymous   Ostta10g03080  Zinc finger, C2H2
36    1           688195    Intergenic      Ostta01g04240  NA
                                            Ostta01g04250  Methylase domain
36    11          452825    Nonsynonymous   Ostta11g02480  Filamin/ABP280 repeat-like
36    13          365169    Nonsynonymous   Ostta13g02140  tRNA/rRNA methyltransferase

Table A2. Deletions and insertions identified in the lines used for the fitness assays. For intergenic mutations, the adjacent genes are given.

Line  Fitness effect   Gene           Annotation

Micromonas pusilla
2     Intergenic       Mipur14g00010  Polyribonucleotide nucleotidyltransferase
4     Intergenic       Mipur01g07370  NA
                       Mipur01g07380  Zinc finger, RING-type
4     Frameshift       Mipur02g01650  SKI-interacting protein
5     Intergenic       Mipur01g01440  Translation elongation factor
                       Mipur01g01450  Small nuclear RNA activating complex
5     Intergenic       Mipur01g01440  Translation elongation factor
                       Mipur01g01450  Small nuclear RNA activating complex

Bathycoccus prasinos
4     Codon insertion  Bathy13g02570  General substrate transporter
7     Intergenic       Bathy05g04970  General substrate transporter
                       Bathy05g04980  Dynein heavy chain
7     Intergenic       Bathy18g00010  TonB-dependent receptor
                       Bathy18g00020  ATP-binding cassette superfamily

Ostreococcus mediterraneus
3     Frameshift       Ostme06g01720  Glycoside hydrolase catalytic domain
6     Intergenic       Ostme12g03410  NA

Table A3. Substitutions identified in the lines used for the fitness assays.

Line  Fitness effect    Gene           Annotation

Micromonas pusilla
1     Intergenic        Mipur01g06170  Tetratricopeptide-like helical
                        Mipur01g06160  NA
1     Intergenic        Mipur02g00010  Protein binding; NA
1     Intergenic        Mipur11g01020  Intracellular protein transport
                        Mipur11g01030  NA
2     Intergenic        Mipur01g03700  Ribosomal protein L21
                        Mipur01g03710  Heme binding, Cytochrome b5, Fatty acid desaturase
4     Intergenic        Mipur01g04900  Protein kinase, transferase activity
                        Mipur01g04910  ATPase
4     Synonymous        Mipur07g02835  Clusterin-associated protein-1
4     Nonsynonymous     Mipur07g02835  Clusterin-associated protein-1
4     Nonsynonymous     Mipur07g02835  Clusterin-associated protein-1
4     Nonsynonymous     Mipur11g02270  Membrane, transferase activity, transferring phosphorus-containing groups
4     Nonsynonymous     Mipur15g02150  Sulfotransferase activity
5     Intergenic        Mipur01g01440  Translation elongation factor, Nucleic acid-binding
                        Mipur01g01450  Small nuclear RNA activating complex
5     Intergenic        Mipur01g01440  Translation elongation factor, Nucleic acid-binding
                        Mipur01g01450  Small nuclear RNA activating complex
5     Nonsynonymous     Mipur01g04180  SKI-interacting protein
5     Intergenic        Mipur15g02130  Multidrug efflux transporter AcrB
                        Mipur15g02140  Methyltransferase FkbM
1-3   Nonsynonymous     Mipur02g01500  Pectin lyase fold
1-3   Nonsynonymous     Mipur02g01500  Immunoglobulin E-set
1-3   Nonsynonymous     Mipur02g01500  Parallel beta-helix repeat
1-3   Nonsynonymous     Mipur05g02350  Protein kinase, catalytic domain, transferase activity, transferring phosphorus-containing groups

Bathycoccus prasinos
1     Intergenic        Bathy18g01200  RNA polymerase
                        Bathy18g01210  NA
2     Synonymous        Bathy01g00480  Esterase, SGNH hydrolase-type, lipase
2     Intergenic        Bathy12g00620  D-galacturonic acid reductase
3     Synonymous        Bathy01g00250  NA
3     Intergenic        Bathy05g00830  GTP cyclohydrolase II
                        Bathy05g00840  NA
3     Synonymous        Bathy05g03350  ATP-dependent Clp protease proteolytic subunit
4     Nonsynonymous     Bathy05g04130  DNA-binding
6     Nonsynonymous     Bathy01g02250  NA
6     Nonsynonymous     Bathy13g00710  ATP-dependent metalloprotease FtsH
7     Intergenic        Bathy19g00710  NA
8     Intergenic        Bathy12g00610  NA
                        Bathy12g00620  NA

Ostreococcus mediterraneus
1     Start codon gain  Ostme09g04340  NA
1     Nonsynonymous     Ostme13g00630  Cation/H+ exchanger
2     Nonsynonymous     Ostme05g02160  Cleavage/polyadenylation specificity factor A subunit, C-terminal
3     Nonsynonymous     Ostme05g01170  Myb domain, plants
3     Nonsynonymous     Ostme05g01310  P-loop containing nucleoside triphosphate hydrolase
4     Nonsynonymous     Ostme05g03160  P-loop containing nucleoside triphosphate hydrolase
5     Nonsynonymous     Ostme07g00860  Steroid receptor RNA activator-protein coat protein complex II, Sec31
5     Intergenic        Ostme17g02510  Putative 5-3 exonuclease
                        Ostme17g02520  NA
7     Nonsynonymous     Ostme01g01630  COMM domain
7     Nonsynonymous     Ostme08g01390  Exonuclease, phage-type/RecB
8     Nonsynonymous     Ostme09g04240  P-loop containing nucleoside triphosphate hydrolase
8     Nonsynonymous     Ostme09g04420  Regulator of K+ conductance
9     Nonsynonymous     Ostme01g05620  Acetyl-coenzyme A transporter 1
9     Synonymous        Ostme13g01030  RNA recognition motif domain, eukaryote

What are the fitness effects of spontaneous mutations in picophytoplankton?

Marc Krasovec1, Gwenael Piganeau1, Adam Eyre-Walker2, Nigel Grimsley1, David Pecqueur3, Christophe Salmeron3, Elodie Desgranges1, Claire Hemon1, Sophie Sanchez-Ferandin1

1 Oceanological Observatory of Banyuls, UMR 7232, Banyuls-sur-mer, 66650, France
2 University of Sussex, Evolution Behaviour and Environment, United Kingdom
3 Oceanological Observatory of Banyuls, UMS 2348, Banyuls-sur-mer, 66650, France

Context: Mutations are the main source of diversity upon which natural selection can act (1). The study of mutation rates and their effects on fitness is fundamental to our understanding of evolution rates.

Biological models: Our picophytoplankton models (Ostreococcus mediterraneus, Micromonas pusilla and Bathycoccus prasinos) belong to the Mamiellophyceae class (Chlorophyta). They are the smallest free-living eukaryotes (1 µm) described to date (3). They possess a simple cellular organization with one mitochondrion, one chloroplast and a 13 to 20 Mbp haploid nuclear genome. The spontaneous mutation rate is the lower limit of the rate of adaptation and paramount for understanding the consequences of rapid environmental changes on these species.

How can we estimate the fitness effects of spontaneous mutations in eukaryotic picophytoplankton? Mutation accumulation (MA) experiments (2) aim to estimate the effects of spontaneous mutations on fitness. We followed the evolution of fitness, measured as the growth rate between bottlenecks, in 40 to 60 lines (MA lines) from 3 strains passaged through serial bottlenecks during 200 to 400 days.

[Figure: MA experiment conditions. T0 = one cell per well; cell count by flow cytometer; one-cell bottleneck every 14 days; ancestral line and n mutant lines.]

Summary of MA experiment results

[Table: species (O. mediterraneus, M. pusilla, B. prasinos), number of lines, number of generations and change in fitness; the cell values are not recoverable from the extraction.]

There was no evidence for a variation in fitness in MA lines along the course of the experiment.

How do the fitness effects of mutations change in stressful environments?
- Salinity tests (5 to 65 g/l)
- Herbicide tests (irgarol and diuron)

Number of MA lines assayed for fitness effects in stressful conditions

Species           Number of lines  Total number of generations
O. mediterraneus  9                300
M. pusilla        7                250
B. prasinos       8                250

Results: lines with a significant difference in fitness

[Figure: cell divisions per day (G) of the controls (ancestral line) and of the mutation lines of O. mediterraneus at salinities of 5, 20, 35, 50 and 65 g/l, and of B. prasinos, M. pusilla and O. mediterraneus exposed to Irgarol (1 µg/l) and Diuron (10 µg/l).]

Salinity tests:
- MA lines were less fit in stressful environments
- the ancestral type was never outcompeted by MA lines

Herbicide tests:
- MA lines were equally, less or more fit depending on species

Conclusion: Taking the number of cell divisions per day as a proxy for fitness, the fitness of MA lines shows little or no difference compared with the ancestral line. In stressful environments, the difference in fitness between MA lines was more striking, particularly for M. pusilla. We found evidence for changes in fitness effects of mutations in B. prasinos exposed to herbicides, where some MA lines outcompete the ancestral line. This study shows the importance of genotype/environment interactions to understand species adaptation.

1 Wright, S., 1932. The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proc. Sixth Int. Congr. Genet. 1, 356–366.
2 Halligan, D.L., Keightley, P.D., 2009. Spontaneous Mutation Accumulation Studies in Evolutionary Genetics. Annu. Rev. Ecol. Evol. Syst. 40, 151–172.
3 Courties, C., Vaquer, A., Troussellier, M., Lautier, J., Chrétiennot-Dinet, M.J., Neveux, J., Machado, C., Claustre, H., 1994. Smallest eukaryotic organism. Nature 370, 255–255.

Funding: CNRS, UPMC Complexité du vivant and ANR SVSE6-0004 PHYTNESS
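The fitness proxy used in the poster above, the number of cell divisions per day (G), can be sketched as follows. The formula assumes exponential growth between two cell counts, G = log2(Nt / N0) / t, which is a standard form and not necessarily the exact expression used by the authors. The function name and the example counts are hypothetical.

```python
import math


def divisions_per_day(n_start, n_end, days):
    """Cell divisions per day, assuming exponential growth.

    n_start: cells at the start (one cell per well after a bottleneck)
    n_end:   cells counted by flow cytometry at the end of the interval
    days:    time between the two counts
    """
    if n_start <= 0 or n_end <= 0 or days <= 0:
        raise ValueError("counts and duration must be positive")
    return math.log2(n_end / n_start) / days


# Hypothetical example: one cell growing to 16 384 cells in 14 days.
print(divisions_per_day(1, 16384, 14))
```

With one-cell bottlenecks every 14 days, `n_start` is 1 and `days` is 14, so G depends only on the final cell count.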

What are the fitness effects of spontaneous mutations in picophytoplankton?

Marc Krasovec1, Adam Eyre-Walker2, Nigel Grimsley1, David Pecqueur1, Christophe Salmeron1, Elodie Desgranges1, Claire Hemon1, Sophie Sanchez-Ferandin1, Gwenael Piganeau1

1 Oceanological Observatory of Banyuls, UMR 7232, Banyuls-sur-mer, 66650, France
2 University of Sussex, Evolution Behaviour and Environment, United Kingdom

Context: Mutations are the main source of diversity upon which natural selection can act (1). The study of mutation rates and their effects on fitness is fundamental to our understanding of evolution and adaptation rates.

Mutation accumulation (MA) experiments (2) allow the estimation of the fitness effects of spontaneous mutations. MA lines from 3 strains were subcultured through serial bottlenecks during 200 to 400 days to allow the segregation of spontaneous deleterious mutations. The growth rate between bottlenecks was taken as a proxy for fitness.

Biological models: Our picophytoplankton models (Ostreococcus tauri and mediterraneus, Micromonas pusilla and Bathycoccus prasinos) belong to the Mamiellophyceae family (Chlorophyta). They contain the smallest free-living eukaryotes (1 µm) described to date (3) and possess a simple cellular organization with one mitochondrion, one chloroplast and a 13 to 21 Mbp haploid nuclear genome. The spontaneous mutation rate sets the lower limit of the rate of adaptation and is crucial to understand the consequences of rapid environmental changes on these species.

How can we estimate the fitness effects of spontaneous mutations in eukaryotic picophytoplankton?

[Figure: MA experiment design. T0 = one cell per well; one-cell bottleneck every 14 days; cell count by flow cytometer; ancestral line and n mutant lines.]

[Table: species and strains (M. pusilla RCC299, B. prasinos RCC1105, O. tauri RCC4221, O. mediterraneus RCC2590), number of lines, average generations, T0 to Tf (days) and number of mutant lines; the assignment of values to rows is not recoverable from the extraction.]

What are the fitness effects of spontaneous mutations in standard subculturing conditions?
=> There is little evidence for variation in fitness in MA lines along the course of the experiment: 5 MA lines decrease, 1 increases.

Do the fitness effects of mutations change in stressful environments?
- Salinity tests (5 to 65 g/l)
- Algicide tests (irgarol and diuron)

[Figure: mean fitness relative to the control under salinity stress (5, 20, 35, 50 and 65 g/l; results obtained from O. mediterraneus) and under algicide stress (Irgarol, 1 µg.L-1; Diuron, 10 µg.L-1) for M. pusilla, B. prasinos and O. mediterraneus.]

=> MA lines are less fit than the control culture when osmolarity changes: salinity stress reveals fitness effects of spontaneous mutations.
=> There is high variation in fitness between MA lines; strikingly, 5 MA lines outcompete the control.

Conclusion: Taking the number of cell divisions per day as a proxy for fitness, the fitness of MA lines shows little difference compared with the control line. However, fitness effects of mutations are revealed in stressful conditions. Considering that each significant fitness difference between the control and the MA lines might be the result of at least one mutation, the minimum mutation rate is thus 2.72 x 10^-10 mutations per nucleotide per generation for O. mediterraneus, 1.75 x 10^-10 for M. pusilla and 2.52 x 10^-10 for B. prasinos. This study also pinpoints the importance of genotype-environment interactions to understand species adaptation.

1 Wright, S., 1932. The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proc. Sixth Int. Congr. Genet. 1, 356–366.
2 Halligan, D.L., Keightley, P.D., 2009. Spontaneous Mutation Accumulation Studies in Evolutionary Genetics. Annu. Rev. Ecol. Evol. Syst. 40, 151–172.
3 Courties, C., Vaquer, A., Troussellier, M., Lautier, J., Chrétiennot-Dinet, M.J., Neveux, J., Machado, C., Claustre, H., 1994. Smallest eukaryotic organism. Nature 370, 255–255.

Funding: CNRS, UPMC Complexité du vivant and ANR SVSE6-0004 PHYTNESS
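The lower-bound argument in the conclusion of the poster above (each MA line with a significant fitness difference carries at least one mutation) can be written as a short calculation. The sketch below is one plausible reading of that reasoning: the number of such lines divided by the total number of nucleotide-generations assayed. The function name, its parameters and the example values are hypothetical, and the poster does not state the exact formula used.

```python
def minimum_mutation_rate(n_lines_with_effect, n_lines, generations_per_line,
                          genome_size_bp):
    """Lower bound on the mutation rate per nucleotide per generation.

    Assumes each line with a significant fitness difference carries at
    least one mutation, so the count of such lines is a minimum number
    of mutations over n_lines * generations_per_line * genome_size_bp
    nucleotide-generations.
    """
    if min(n_lines, generations_per_line, genome_size_bp) <= 0:
        raise ValueError("lines, generations and genome size must be positive")
    nucleotide_generations = n_lines * generations_per_line * genome_size_bp
    return n_lines_with_effect / nucleotide_generations


# Hypothetical values: 5 affected lines out of 10 assayed,
# 100 generations per line, 10 Mbp haploid genome.
print(minimum_mutation_rate(5, 10, 100, 10_000_000))
```

Because mutations without a detectable fitness effect are not counted, the result can only underestimate the true rate.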

Population genomics of phytoplankton: Ostreococcus tauri as a case study

[Poster: the remaining text, tables and figures are not recoverable because the font encoding was lost in extraction.]


Stable Production of a Lytic Prasinovirus by its Picoalgal Host, Ostreococcus mediterraneus

Sheree Yau1, Marc Krasovec1, Nigel Grimsley1, Evelyne Derelle1, Sophie Sanchez-Ferandin1, Stephane Rombauts2, Klaas Vandepoele2, Gwenael Piganeau1

1 Integrative Biology of Marine Organisms (BIOM)-CNRS UMR7232 - Observatoire Océanologique de Banyuls, FRANCE
2 Department of Plant Systems Biology, VIB, Ghent, BELGIUM

Background

Globally distributed marine algae of the genus Ostreococcus are among the smallest known free-living eukaryotes (<1 micron diameter) [1]. Ostreococcus genomes possess a "Small Outlier Chromosome" (SOC), so named because it has a lower GC content than the rest of the genome and shares little sequence homology with other species [2]. All Ostreococcus species are infected by prasinoviruses, large DNA viruses thus far all known to be strictly lytic. Surprisingly, a prasinovirus genome, termed OmV0, was assembled with the genome of the recently described species, O. mediterraneus RCC2590 [3]. The culture has shown no sign of infection in standard batch culture conditions since its isolation and cloning in 2008.

Fig 1. Electron micrograph of O. mediterraneus (scale bar: 0.5 micron).

AIM: Determine how OmV0 reproduces and the genetic basis for its stable coexistence with its host.

Methods

A) Pulsed-field gel electrophoresis (PFGE) and hybridisation
* Chromosomes were separated by PFGE.
* Radiolabelled probes specific to OmV0 and the host genes were hybridised to the chromosomes in the gel.

B) Test for production of OmV0 virions (Fig 2): the "wild type" RCC2590 culture was centrifuged at 8,000 g for 20 mins and filtered (<0.8 micron), and the cell-free filtrate was tested by PCR for the major capsid protein gene.

C) Mutation accumulation (MA) experiment and infection of MA lines
* A virus-free clone was obtained by limiting dilution and used as the ancestor of the MA experiment (Fig 3. Mutation accumulation experiment design).
* Culture medium filtrate from the RCC2590 "wild-type" strain was added to the MA lines (24 independent) and lysis was observed.

D) Determination of genomic changes in MA lines by single-molecule PacBio sequencing.

Results

1. OmV0 is not integrated into the O. mediterraneus RCC2590 genome
Fig 4. Hybridisation of OmV0 and 18S rRNA specific probes to the PFGE-separated chromosomes of the "wild type" O. mediterraneus RCC2590 culture. The 18S rRNA probe hybridised to the chromosomal band corresponding to the size predicted from the assembled genome. Both OmV0 probes, unique to two separate genomic regions, hybridised to the same physical location on the gel, corresponding to the predicted size of the complete linear OmV0 genome (200 kb).

2. O. mediterraneus RCC2590 culture medium filtrate contains OmV0
Fig 5. PCR amplification of the OmV0 major capsid protein gene in the culture medium filtrate. M) 1 Kb molecular ladder 1) No DNA template control 2) Genomic DNA preparation 3) Culture medium filtrate

3. Mutant O. mediterraneus lines are lysed by OmV0 produced by the parent strain, RCC2590
Fig 6. Three independent MA culture lines (MA3, MA23, MA24) of 24 (40 total lines) visibly lysed upon addition of OmV0 produced by the "wild type" O. mediterraneus RCC2590, indicating that OmV0 is capable of lytic reproduction. Neither the ancestor line nor the other independent MA lines were lysed by OmV0. [Panels: Ancestor line, MA3, MA23 and MA24, each infected and non-infected.]

4. Mutant OmV0-susceptible O. mediterraneus lines have decreased SOC size
Fig 7. Hybridisation of SOC-specific probes to the PFGE gel of the "wild type" OmV0-producing RCC2590 and the OmV0-susceptible MA3, MA23 and MA24 lines shows a decrease in SOC size. MA lines that were not lysed by OmV0 showed no change in SOC size as assessed by PFGE (not shown), suggesting a strong correlation between deletion in the SOC and susceptibility to lysis by OmV0. The estimated SOC sizes vary between the OmV0-susceptible MA lines, confirming that the deletions occurred independently.

5. A 58 kb region is deleted in the SOC of mutant line MA3
Fig 8. Alignment of the SOC from O. mediterraneus RCC2590 to the SOC of the MA3 line shows a 58 kb deletion at one end of the MA3 SOC. This deleted region contains primarily repeated sequences found elsewhere in the genome but also at least 6 unique ORFs of unknown function. [Labels: RCC2590 SOC: 644 kb; MA3 SOC: 590 kb.]

Conclusions and Perspectives
* O. mediterraneus stably produces an infectious lytic prasinovirus, OmV0.
* Spontaneous mutation leading to susceptibility to OmV0 lysis occurs in ~12% of independent MA lines.
* Coexistence of OmV0 and O. mediterraneus in culture is linked to the presence of a genomic region located on the SOC.
* The OmV0-O. mediterraneus system is, as far as we know, the first report of this type of virus-host interaction to be isolated directly from the environment. A similar phenomenon is observed in diverse marine algal host-virus systems in culture [4], whose significance in the environment is yet to be explored.

References
[1] Courties et al. (1994) Smallest eukaryotic organism. Nature. 370:255.
[2] Moreau et al. (2012) Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. Genome Biology. 13: R74.
[3] Subirana et al. (2013) Morphology, genome plasticity and phylogeny in the genus Ostreococcus reveal a cryptic species, O. mediterraneus sp. nov. (Mamiellales, Mamiellophyceae). Protist. 164: 643–659.
[4] Thyrhaug et al. (2003) Stable coexistence in marine algal host–virus systems. Mar. Ecol. Prog. Ser. 254:27-35.

Phytoplankton species from Mamiellales order at the crossroads of genetics, ecology, experimental biology and bioinformatics: the PHYTNESS project.

Sophie Sanchez-Ferandin, Marc Krasovec, Gwenael Piganeau

UMR 7232 Biologie Intégrative des Organismes Marins (BIOM), Génomique Evolutive et Environnementale du Phytoplancton, Observatoire Océanologique, 66650 Banyuls-sur-mer

Context

Represented in all oceans of the globe, the picoeukaryotic green algae (size < 2 µm) from the Mamiellophyceae class are essential actors of planktonic communities of the Mediterranean Sea. The Ostreococcus genus was originally described from a strain discovered in the Thau lagoon 20 years ago, Ostreococcus tauri, known as the smallest eukaryotic free-living cell (0.8 µm) to date (1).

Figure 1. The "green yeast" for plant biologists (Henderson et al., 2007). Simple cellular organization with one mitochondrion, one chloroplast and a 13 Mbp haploid nuclear genome. Successfully used to gain knowledge into fundamental cellular processes such as organellar inheritance (2), cell division, circadian clock (3), lipid and starch synthesis.

The PHYTNESS project aims to fill major gaps in our basic understanding of the processes underlying genome evolution in these microalgae by a mutation accumulation (MA) approach.

Mutation

[Diagram: reference strain and N mutant strains.]

MA experiment in 40 independent strains (starting from one cell).

Sequencing of all daughter strains after thousands of generations.

Bioinformatics approach to estimate the mutation rate by replication error, µ = mutation rate per nucleotide per cell division.

Other Mamiellophyceae used in this study. A: O. mediterraneus; B: Micromonas sp.; C: Bathycoccus sp.

Incubation in different conditions

Physiological experiments will make it possible to estimate the phenotypic diversity and its link with genetic diversity by finding genome-wide associations between some haplotypes and discrete phenotypes. Similar phenotypic screening has previously been done at the laboratory by comparing growth rates under different environmental stresses (4).

To conclude, estimations of the mutation rates of these different microalgae will bring new understanding to the mode and tempo of evolution of these enigmatic eukaryotes that drive the ocean's ecosystem.

Interesting to notice: in addition to their ecological importance, these microalgae could offer new biotechnological potential as possible sources of biofuels and dietary "omega-3" lipid food supplements, and they have fostered several international initiatives in the US (Joint Genome Initiative), and more recently in Europe (TARA-Oceans) to gain knowledge into their diversity and metabolic potential.

References
(1) Courties C, Vaquer A, Troussellier M, Lautier J, Chrétiennot-Dinet MJ, Neveux J, Machado C, Claustre H. 1994. Smallest eukaryotic organism. Nature 370:255.
(2) Blanc-Mathieu R., Sanchez-Ferandin S., Eyre-Walker A. and Piganeau G., 2013. Organellar Inheritance in the Green Lineage: Insights From Ostreococcus tauri. Genome Biol. Evol., 5(8):1503-11.
(3) Djouani E.-B., Christie J., Sanchez-Ferandin S., Sanchez F., Bouget F.-Y., Corellou F., 2011. A eukaryotic LOV-histidine kinase regulates circadian clock function in the picoalga Ostreococcus. The Plant Journal, 65(4): 578-588.
(4) Sanchez-Ferandin S., Leroy F., Bouget F.-Y. and Joux F., 2013. A New, Sensitive Marine Microalgal Recombinant Biosensor using Luminescence Monitoring for the Toxicity Testing of Antifouling Biocides. Applied Environmental Microbiology, 79:631-638.

Marinobacter dominates the bacterial community of the Ostreococcus tauri phycosphere in culture

Josselin Lupette1,2, Raphaël Lami3, Marc Krasovec2, Nigel H. Grimsley2, Hervé Moreau2, Gwenael Piganeau2, Sophie Sanchez-Ferandin2*

1 CEA / CNRS / INRA / Université Grenoble Alpes UMR 5168, France, 2 Observatoire Océanologique, UMR 7232 Biologie Intégrative des Organismes Marins, BIOM, France, 3 Observatoire Océanologique, USR3579 Laboratoire de Biodiversité et Biotechnologies Microbiennes, LBBM, France

Keywords: Ostreococcus tauri, Marinobacter sp., Picoalgae, Bacteria, interactions, Phytoplankton

ABSTRACT Microalgal-bacterial interactions are commonly found in marine environments and are well known in diatom cultures maintained in the laboratory. These interactions also exert strong effects on bacterial and algal diversity in the oceans. Small green eukaryotic algae of the class Mamiellophyceae (Chlorophyta) are ubiquitous and some species, such as Ostreococcus spp., are particularly important in Mediterranean coastal lagoons, and are observed as dominant species during phytoplankton blooms in the open sea. Despite this, little is known about the diversity of bacteria that might facilitate or hinder O. tauri growth. We show, using 16S rDNA sequences, that the bacterial community found in O. tauri RCC4221 laboratory cultures is dominated by γ-proteobacteria from the Marinobacter genus, regardless of the growth phase of O. tauri RCC4221, the photoperiod used, or the nutrient conditions (limited in nitrogen or phosphorus) tested. Several strains of M. algicola were detected, all closely related to strains found in association with taxonomically distinct organisms, particularly with dinoflagellates and coccolithophorids. These sequences were more distantly related to M. adhaerens, M. aquaeolei and bacteria usually associated with euglenoids. This is the first time, to our knowledge, that distinct Marinobacter strains have been found to be associated with a green alga in culture.

LIST OF FIGURES AND TABLES

CHAPTER 1: INTRODUCTION

Figure 1. Mutation processes.

Figure 2. Diagram of a mutation accumulation experiment.

Figure 3. Representation of the fitness landscape according to Fisher.

Figure 4. Change in the fitness of a genotype between environments.

Figure 5. Relationship between mutation rate and genome size.

Figure 6. The drift barrier and the cost of replication.

Figure 7. Relationship between effective population size and mutation rate.

Figure 8. Electron micrographs of the species used for the mutation accumulation experiments.

Figure 9. Phylogenetic tree of the Chlorophyta.

Figure 10. PFGE migration of the whole genomic DNA of the four Mamiellophyceae species.

Table 1. Spontaneous mutation rates estimated by mutation accumulation experiments.

Table 2. Correlation between generation time and mutation rate.

Table 3. The GC to AT bias.

Table 4. Genomic diversity of the species used for the mutation accumulation experiments.

CHAPTER 2: EFFECTS OF MUTATIONS ON FITNESS

Figure 1. Mutation accumulation (MA) experiments in pico-algae.

Figure 2. Selection coefficients, ST, in Irgarol 1051 or Diuron.

Figure 3. Selection coefficients in five salinity conditions.

Table 1. Summary of mutation accumulation experiments for four species.

Table 2. Statistical probabilities of line loss.

CHAPTER 3: THE MUTATION RATE IN THE MAMIELLOPHYCEAE

Figure 1. The GC to AT and AT to GC mutations in the four species.

Figure 2. Correlation of the base substitution mutation rate and the effective population size.

Figure 3. Correlation of the base substitution mutation rate.

Figure 4. Correlation between mutation rate and gap from GC equilibrium.

Table 1. Summary of spontaneous mutation rates in four Mamiellophyceae species.

Table 2. Mutation rate variation between coding and non-coding sequences.

CHAPTER 4: HORIZONTAL GENE TRANSFERS: THE CASE OF PICOCHLORUM RCC4223

Figure 1. Phylogenetic and phenotypic analysis of Picochlorum RCC4223.

Figure 2. PFGE migration of Picochlorum RCC4223.

Figure 3. Open Reading Frame (ORF) length comparison between Picochlorum species.

Figure 4. Gene family extensions in Picochlorum RCC4223.

CHAPTER 5: IMPLICATIONS OF THE MUTATION RATE FOR BIOTECHNOLOGY

Figure 1. Number of base-substitution mutations observed in MA lines of Picochlorum RCC4223.

Table 1. Distribution of the mutations between MA lines.

Table 2. The distribution of the mutations in the genome, with the predicted effects from SnpEff.

Table 3. Available direct spontaneous mutation rate estimations in Chlorophyta.

CHAPTER 6: DISCUSSION AND CONCLUSION

Figure 1. PFGE migration in the lines from the O. mediterraneus mutation accumulation experiment.

Figure 2. Overview of the factors that influence the mutation rate.

APPENDICES

CHAPTER 2: EFFECTS OF MUTATIONS ON FITNESS

Table S1. Normalized Gr from Micromonas pusilla.

Table S2. Normalized Gr from Ostreococcus mediterraneus.

Table S3. Normalized Gr from Bathycoccus prasinos.

Table S4. Normalized Gr from Ostreococcus tauri.

Table S5. Average of G of MA lines and control of Micromonas pusilla for each environmental test.

Table S6. Average of G of MA lines and control of Bathycoccus prasinos for each environmental test.

Table S7. Average of G of MA lines and control of Ostreococcus mediterraneus for each environmental test.

CHAPTER 3: THE MUTATION RATE IN THE MAMIELLOPHYCEAE

Figure S1. The distribution of the base-substitution mutations.

Figure S2. Correlation between the strength of the GC bias (R1/R2) and the nucleotide mutation rate.

Figure S3. Mutational context of base-substitution mutations.

Table S1. Summary of mutation accumulation experiments.

Table S2. The part of the genome usable (G*) for mutation calling.

Table S3. Base-substitution mutations in O. tauri.

Table S4. Base-substitution mutations in O. mediterraneus.

Table S5. Base-substitution mutations in B. prasinos.

Table S6. Base-substitution mutations in M. pusilla.

Table S7. Insertions in the four species.

Table S8. Deletions in the four species.

Table S9. RNA-seq coverage of exons and other sequences in mutated and non-mutated sites.

Table S10. Spontaneous base-substitution mutation rates estimated by mutation accumulation.

Table S11. Effect of GC gap from equilibrium on the base-substitution mutation rate.

CHAPTER 4: HORIZONTAL GENE TRANSFERS: THE CASE OF PICOCHLORUM RCC4223

Table S1. Raw statistical results from PacBio RS II sequencing and the polishing step.

Table S2. Best statistical results from the ABySS assembly.

Table S3. SGA results with MiSeq read mappings in the HGAP assembly.

Table S4. Total final genome assembly.

CHAPTER 6: DISCUSSION AND CONCLUSION

Table A1. Mutations in the O. tauri lines that show a significant decrease in fitness.

Table A2. Deletions and insertions identified in the lines used for the fitness assays.

Table A3. Substitutions identified in the lines used for the fitness assays.

BIBLIOGRAPHIE

181 Abby, S.S., Touchon, M., De Jode, A., Grimsley, N., Piganeau, G., 2014. Bacteria in Ostreococcus tauri cultures - friends, foes or hitchhikers? Front. Microbiol. 5, 505. Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, 1000 Genomes Project Consortium, 2010. Nature 467(7319): 1061–1073. Adewoye, A.B., Lindsay, S.J., Dubrova, Y.E., Hurles, M.E., 2015. The genome-wide effects of ionizing radiation on mutation induction in the mammalian germline. Nat. Commun. 6, 6684. Aggarwala, V., Voight, B.F., 2016. An expanded sequence context model broadly explains variability in polymorphism levels across the human genome. Nat. Genet. 48, 349–355. Agrawal, A.F., Whitlock, M.C., 2012. Mutation Load: The Fitness of Individuals in Populations Where Deleterious Alleles Are Abundant. Annu. Rev. Ecol. Evol. Syst. 43, 115–135. Ajie, B.C., Estes, S., Lynch, M., Phillips, P.C., 2005. Behavioral Degradation Under Mutation Accumulation in Caenorhabditis elegans. Genetics 170, 655–660. Alavi, M., Miller, T., Erlandson, K., Schneider, R., Belas, R., 2001. Bacterial community associated with Pfiesteria-like dinoflagellate cultures. Environ. Microbiol. 3, 380–396. Amin, S.A., Green, D.H., Hart, M.C., Küpper, F.C., Sunda, W.G., Carrano, C.J., 2009. Photolysis of iron–siderophore chelates promotes bacterial–algal mutualism. Proc. Natl. Acad. Sci. U. S. A. 106, 17071–17076. Andrew, J.R., Dossey, M.M., Garza, V.O., Keller-Pearson, M., Baer, C.F., Joyner- Matos, J., 2015. Abiotic stress does not magnify the deleterious effects of spontaneous mutations. Heredity. Backström, N., Sætre, G.-P., Ellegren, H., 2013. Inferring the demographic history of European Ficedula flycatcher populations. BMC Evol. Biol. 13, 2. Baer, C.F., Miyamoto, M.M., Denver, D.R., 2007. Mutation rate variation in multicellular eukaryotes: causes and consequences. Nat. Rev. Genet. 8, 619– 631. Baer, C.F., Phillips, N., Ostrow, D., Avalos, A., Blanton, D., Boggs, A., Keller, T., Levy, L., Mezerhane, E., 2006. 
Cumulative Effects of Spontaneous Mutations for Fitness in Caenorhabditis: Role of Genotype, Environment and Stress. Genetics 174, 1387–1395.

182 Barrett, R.D.H., Schluter, D., 2008. Adaptation from standing genetic variation. Trends Ecol. Evol. 23, 38–44. Becker, E.W., 2007. Micro-algae as a source of protein. Biotechnol. Adv. 25, 207– 210. Behringer, M.G., Hall, D.W., 2016. The repeatability of genome-wide mutation rate and spectrum estimates. Curr. Genet. 1–6. Behringer, M.G., Hall, D.W., 2015. Genome-Wide Estimates of Mutation Rates and Spectrum in Schizosaccharomyces pombe Indicate CpG Sites are Highly Mutagenic Despite the Absence of DNA Methylation. G3 GenesGenomesGenetics 6, 149–160. Beletskii, A., Bhagwat, A.S., 1996. Transcription-induced mutations: Increase in C to T mutations in the nontranscribed strand during transcription in Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 93, 13919–13924. Bird, A.P., 1986. CpG-rich islands and the function of DNA methylation. Nature 321, 209–213. Black, C.K., Mihai, D.M., Washington, I., 2014. The Photosynthetic Eukaryote Nannochloris eukaryotum as an Intracellular Machine To Control and Expand Functionality of Human Cells. Nano Lett. 14, 2720–2725. Blanc, G., Agarkova, I., Grimwood, J., Kuo, A., Brueggeman, A et al., 2012. The genome of the polar eukaryotic microalga Coccomyxa subellipsoidea reveals traits of cold adaptation. Genome Biol. 13, R39. Blanc, G., Duncan, G., Agarkova, I., Borodovsky, M., Gurnon, J et al., 2010. The Chlorella variabilis NC64A genome reveals adaptation to photosymbiosis, coevolution with viruses, and cryptic sex. Plant Cell 22, 2943–2955. doi:10.1105/tpc.110.076406 Blanc-Mathieu, R., Sanchez-Ferandin, S., Eyre-Walker, A., Piganeau, G., 2013. Organellar inheritance in the green lineage: insights from Ostreococcus tauri. Genome Biol. Evol. 5, 1503–1511. Blanc-Mathieu, R., Verhelst, B., Derelle, E., Rombauts, S., Bouget, F.-Y., Carré, I., Château, A., Eyre-Walker, A., Grimsley, N., Moreau, H., Piégu, B., Rivals, E., Schackwitz, W., Van de Peer, Y., Piganeau, G., 2014. 
An improved genome of the model marine alga Ostreococcus tauri unfolds by assessing Illumina de novo assemblies. BMC Genomics 15, 1103.

183 Boetzer, M., Henkel, C.V., Jansen, H.J., Butler, D., Pirovano, W., 2011. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27, 578–579. Bousquet, J., Strauss, S.H., Doerksen, A.H., Price, R.A., 1992. Extensive variation in evolutionary rate of rbcL gene sequences among seed plants. Proc. Natl. Acad. Sci. U. S. A. 89, 7844–7848. Bowler, C., Allen, A.E., Badger, J.H., Grimwood, J., Jabbari, K et al., 2008. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature 456, 239–244. Boyd, P., Newton, P., 1995. Evidence of the potential influence of planktonic community structure on the interannual variability of particulate organic carbon flux. Deep Sea Res. Part Oceanogr. Res. Pap. 42, 619–639. Boyd, P.W., Newton, P.P., 1999. Does planktonic community structure determine downward particulate organic carbon flux in different oceanic provinces? Deep Sea Res. Part Oceanogr. Res. Pap. 46, 63–91. Brennan, L., Owende, P., 2010. Biofuels from microalgae-A review of technologies for production, processing, and extractions of biofuels and co-products. Renew. Sustain. Energy Rev. 14, 557–577. Brito, P.H., Guilherme, E., Soares, H., Gordo, I., 2010. Mutation accumulation in Tetrahymena. BMC Evol. Biol. 10, 354. Britten, R.J., 1986. Rates of DNA sequence evolution differ between taxonomic groups. Science 231, 1393–1398. Bromham, L., 2009. Why do species vary in their rate of molecular evolution? Biol. Lett. 5, 401–404. Bromham, L., Penny, D., 2003. The modern molecular clock. Nat. Rev. Genet. 4, 216–224. Bromham, L., Rambaut, A., Harvey, P.H., 1996. Determinants of rate variation in mammalian DNA sequence evolution. J. Mol. Evol. 43, 610–621. Buesseler, K.O., 1998. The decoupling of production and particulate export in the surface ocean. Glob. Biogeochem. Cycles 12, 297–310. Cao, H., Butler, K., Hossain, M., Lewis, J.D., 2014. Variation in the fitness effects of mutations with population density and size in Escherichia coli. 
PloS One 9, e105369. Cao, J., Schneeberger, K., Ossowski, S., Günther, T., Bender, S., Fitz, J., Koenig, D., Lanz, C., Stegle, O., Lippert, C., Wang, X., Ott, F., Müller, J., Alonso-

184 Blanco, C., Borgwardt, K., Schmid, K.J., Weigel, D., 2011. Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat. Genet. 43, 956– 963. Carreras, C.W., Pieper, R., Khosla, C., 1997. The chemistry and biology of fatty acid, polyketide, and nonribosomal peptide biosynthesis 85–126. Carroll, S.P., Jørgensen, P.S., Kinnison, M.T., Bergstrom, C.T., Denison, R.F., Gluckman, P., Smith, T.B., Strauss, S.Y., Tabashnik, B.E., 2014. Applying evolutionary biology to address global challenges. Science 346, 1245993. Cazzaniga, S., Dall’Osto, L., Szaub, J., Scibilia, L., Ballottari, M., Purton, S., Bassi, R., 2014. Domestication of the green alga Chlorella sorokiniana: reduction of antenna size improves light-use efficiency in a photobioreactor. Biotechnol. Biofuels 7, 157. Chang, S.-M., Shaw, R.G., 2003. The contribution of spontaneous mutation to variation in environmental response in Arabidopsis thaliana: responses to nutrients. Evol. Int. J. Org. Evol. 57, 984–994. Charlesworth, B., 2009. Effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10, 195–205. Charlesworth, B., Charlesworth, D., 1998. Some evolutionary consequences of deleterious mutations. Genetica 102–103, 3–19. Charlesworth, B., Charlesworth, D., Morgan, M.T., 1990. Genetic loads and estimates of mutation rates in highly inbred plant populations. Nature 347, 380–382. Chen, C.-L., Rappailles, A., Duquenne, L., Huvet, M., Guilbaud, G., Farinelli, L., Audit, B., d’Aubenton-Carafa, Y., Arneodo, A., Hyrien, O., Thermes, C., 2010. Impact of replication timing on non-CpG and CpG substitution rates in mammalian genomes. Genome Res. 20, 447–457. Chen, J.-M., Cooper, D.N., Chuzhanova, N., Férec, C., Patrinos, G.P., 2007. Gene conversion: mechanisms, evolution and human disease. Nat. Rev. Genet. 8, 762–775. Chen, T.-Y., Lin, H.-Y., Lin, C.-C., Lu, C.-K., Chen, Y.-M., 2012. Picochlorum as an alternative to Nannochloropsis for grouper larval rearing. 
Aquaculture 338– 341, 82–88. Chen, X., Zhang, J., 2013. No Gene-Specific Optimization of Mutation Rate in Escherichia coli. Mol. Biol. Evol. 30, 1559–1562.

185 Chevin, L.-M., 2011. On measuring selection in experimental evolution. Biol. Lett. 7, 210–213. Chin, C.-S., Alexander, D.H., Marks, P., Klammer, A.A., Drake, J., Heiner, C., Clum, A., Copeland, A., Huddleston, J., Eichler, E.E., Turner, S.W., Korlach, J., 2013. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods 10, 563–569. Chisti, Y., 2007. Biodiesel from microalgae. Biotechnol. Adv. 25, 294–306. Cingolani, P., Platts, A., Wang, L.L., Coon, M., Nguyen, T., Wang, L., Land, S.J., Lu, X., Ruden, D.M., 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly (Austin) 6, 80–92. Cohen, N.M., Kenigsberg, E., Tanay, A., 2011. Primate CpG Islands Are Maintained by Heterogeneous Evolutionary Regimes Involving Minimal Selection. Cell 145, 773–786. Cole, J.J., 1982. Interactions Between Bacteria and Algae in Aquatic Ecosystems. Annu. Rev. Ecol. Syst. 13, 291–314. Collins, S., Rost, B., Rynearson, T.A., 2014. Evolutionary potential of marine phytoplankton under ocean acidification. Evol. Appl. 7, 140–155. Conrad, DF., Keebler, JE., DePristo, MA., Lindsay, SJ., Zhang, Y et al., 2011. Variation in genome-wide mutation rates within and between human families. Nat Genet. 43(7): 712–714. Consortium, T. 1000 G.P., 2010. A map of human genome variation from population- scale sequencing. Nature 467, 1061–1073. Consortium, T.C.S. and A., 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87. Cooke, M.S., Evans, M.D., Dizdaroglu, M., Lunec, J., 2003. Oxidative DNA damage: mechanisms, mutation, and disease. FASEB J. Off. Publ. Fed. Am. Soc. Exp. Biol. 17, 1195–1214. Couce, A., Guelfo, J.R., Blázquez, J., 2013. Mutational spectrum drives the rise of mutator bacteria. PLoS Genet. 9, e1003167. Coulondre, C., Miller, J.H., Farabaugh, P.J., Gilbert, W., 1978. Molecular basis of base substitution hotspots in Escherichia coli. Nature 274, 775–780. 
Courties, C., Vaquer, A., Troussellier, M., Lautier, J., Chrétiennot-Dinet, M.J., Neveux, J., Machado, C., Claustre, H., 1994. Smallest eukaryotic organism. Nature 370, 255–255.

186 Cowin, P.A., Anglesio, M., Etemadmoghadam, D., Bowtell, D.D.L., 2010. Profiling the Cancer Genome. Annu. Rev. Genomics Hum. Genet. 11, 133–159. Croft, M.T., Lawrence, A.D., Raux-Deery, E., Warren, M.J., Smith, A.G., 2005. Algae acquire vitamin B12 through a symbiotic relationship with bacteria. Nature 438, 90–93. Crow, J.F., Abrahamson, S., 1997. Seventy Years Ago: Mutation Becomes Experimental. Genetics 147, 1491–1496. Cutter, A.D., 2006. Nucleotide polymorphism and linkage disequilibrium in wild populations of the partial selfer Caenorhabditis elegans. Genetics 172, 171– 184. Cutter, A.D., Félix, M.-A., Barrière, A., Charlesworth, D., 2006. Patterns of Nucleotide Polymorphism Distinguish Temperate and Tropical Wild Isolates of Caenorhabditis briggsae. Genetics 173, 2021–2031. Dassey, A.J., Theegala, C.S., 2013. Reducing electrocoagulation harvesting costs for practical microalgal biodiesel production. Environ. Technol. 35, 691–697. Davies, E.K., Peters, A.D., Keightley, P.D., 1999. High Frequency of Cryptic Deleterious Mutations in Caenorhabditis elegans. Science 1748. De Clerck, O., Bogaert, K.A., Leliaert, F., 2012. Diversity and Evolution of Algae: Primary Endosymbiosis. Adv. Bot. Res. 64, 55–86. de la Vega, M., Díaz, E., Vila, M., León, R., 2011. Isolation of a new strain of Picochlorum sp and characterization of its potential biotechnological applications. Biotechnol. Prog. 27, 1535–1543. De Vargas, C., Audic, S., Henry, N., Decelle, J., Mahé, F et al., 2015. Ocean plankton. Eukaryotic plankton diversity in the sunlit ocean. Science 348, 1261605. Demir-Hilton, E., Sudek, S., Cuvelier, M.L., Gentemann, C.L., Zehr, J.P., Worden, A.Z., 2011. Global distribution patterns of distinct clades of the photosynthetic picoeukaryote Ostreococcus. ISME J. 5, 1095–1107. Deng, H.-W., Gao, G., Li, J.-L., 2002. Estimation of deleterious genomic mutation parameters in natural populations by accounting for variable mutation effects across loci. Genetics 162, 1487–1500. 
Deng, H.W., Lynch, M., 1997. Inbreeding depression and inferred deleterious- mutation parameters in Daphnia. Genetics 147, 147–155.

187 Deng, H.W., Lynch, M., 1996. Estimation of Deleterious-Mutation Parameters in Natural Populations. Genetics 144, 349–360. Denver, D.R., Dolan, P.C., Wilhelm, L.J., Sung, W., Lucas-Lledó, J.I., Howe, D.K., Lewis, S.C., Okamoto, K., Thomas, W.K., Lynch, M., Baer, C.F., 2009. A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc. Natl. Acad. Sci. U. S. A. 106, 16310–16314. Denver, D.R., Feinberg, S., Estes, S., Thomas, W.K., Lynch, M., 2005. Mutation Rates, Spectra and Hotspots in Mismatch Repair-Deficient Caenorhabditis elegans. Genetics 170, 107–113. Denver, D.R., Morris, K., Lynch, M., Thomas, W.K., 2004. High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430, 679–682. Denver, D.R., Morris, K., Lynch, M., Vassilieva, L.L., Thomas, W.K., 2000. High direct estimate of the mutation rate in the mitochondrial genome of Caenorhabditis elegans. Science 289, 2342–2344. Denver, D.R., Wilhelm, L.J., Howe, D.K., Gafner, K., Dolan, P.C., Baer, C.F., 2012. Variation in base-substitution mutation in experimental and natural lineages of Caenorhabditis nematodes. Genome Biol. Evol. 4, 513–522. DePristo, M.A., Banks, E., Poplin, R., Garimella, K.V., Maguire, J.R., Hartl, C., Philippakis, A.A., del Angel, G., Rivas, M.A., Hanna, M., McKenna, A., Fennell, T.J., Kernytsky, A.M., Sivachenko, A.Y., Cibulskis, K., Gabriel, S.B., Altshuler, D., Daly, M.J., 2011. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498. Derelle, E., Ferraz, C., Escande, M.-L., Eychenié, S., Cooke, R., Piganeau, G., Desdevises, Y., Bellec, L., Moreau, H., Grimsley, N., 2008. Life-cycle and genome of OtV5, a large DNA virus of the pelagic marine unicellular green alga Ostreococcus tauri. PloS One 3, e2250. Derelle, E., Ferraz, C., Rombauts, S., Rouzé, P., Worden, A.Z et al., 2006. 
Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc. Natl. Acad. Sci. U. S. A. 103, 11647–11652. Dettman, J.R., Sztepanacz, J.L., Kassen, R., 2016. The properties of spontaneous mutations in the opportunistic pathogen Pseudomonas aeruginosa. BMC Genomics 17.

188 Dillon, M.M., Sung, W., Lynch, M., Cooper, V.S., 2015. The Rate and Molecular Spectrum of Spontaneous Mutations in the GC-Rich Multi-Chromosome Genome of Burkholderia cenocepacia. Genetics. Ding, J., McConechy, M.K., Horlings, H.M., Ha, G., Chun Chan, F., Funnell, T., Mullaly, S.C., Reimand, J., Bashashati, A., Bader, G.D., Huntsman, D., Aparicio, S., Condon, A., Shah, S.P., 2015. Systematic analysis of somatic mutations impacting gene expression in 12 tumour types. Nat. Commun. 6, 8554. Dizdaroglu, M., 1992. Oxidative damage to DNA in mammalian chromatin. Mutat. Res. 275, 331–342. Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., Gingeras, T.R., 2013. STAR: ultrafast universal RNA-seq aligner. Bioinforma. Oxf. Engl. 29, 15–21. Drake, J.W., 1991. A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. U. S. A. 88, 7160–7164. Drake, J.W., Charlesworth, B., Charlesworth, D., Crow, J.F., 1998. Rates of spontaneous mutation. Genetics 148, 1667–1686. Duret, L., Arndt, P.F., 2008. The Impact of Recombination on Nucleotide Substitutions in the Human Genome. PLOS Genet 4, e1000071. Duret, L., Galtier, N., 2009. Biased Gene Conversion and the Evolution of Mammalian Genomic Landscapes. Annu. Rev. Genomics Hum. Genet. 10, 285–311. Ebersberger, I., Metzler, D., Schwarz, C., Pääbo, S., 2002. Genomewide Comparison of DNA Sequences between Humans and Chimpanzees. Am. J. Hum. Genet. 70, 1490–1497. Edwards, A.W.F., 2000. The Genetical Theory of Natural Selection. Genetics 154, 1419–1426. Elango, N., Kim, S.-H., Vigoda, E., Yi, S.V., 2008. Mutations of different molecular origins exhibit contrasting patterns of regional substitution rate variation. PLoS Comput. Biol. 4, e1000015. Elena, S.F., de Visser, J.A.G., 2003. Environmental stress and the effects of mutation. J. Biol. 2, 12. Elena, S.F., Lenski, R.E., 2003. 
Evolution experiments with microorganisms: the dynamics and genetic bases of adaptation. Nat. Rev. Genet. 4, 457–469.

189 Ellegren, H., Galtier, N., 2016. Determinants of genetic diversity. Nat. Rev. Genet. 17, 422–433. Estes, S., Phillips, P.C., Denver, D.R., Thomas, W.K., Lynch, M., 2004. Mutation Accumulation in Populations of Varying Size: The Distribution of Mutational Effects for Fitness Correlates in Caenorhabditis elegans. Genetics 166, 1269– 1279. Eyre-Walker, A., Bulmer, M., 1995. Synonymous substitution rates in enterobacteria. Genetics 140, 1407–1412. Eyre-Walker, A., Keightley, P.D., 2007. The distribution of fitness effects of new mutations. Nat. Rev. Genet. 8, 610–618. Eyre-Walker, A., Keightley, P.D., Smith, N.G.C., Gaffney, D., 2002. Quantifying the Slightly Deleterious Mutation Model of Molecular Evolution. Mol. Biol. Evol. 19, 2142–2149. Farlow, A., Long, H., Arnoux, S., Sung, W., Doak, T.G., Nordborg, M., Lynch, M., 2015. The Spontaneous Mutation Rate in the Fission Yeast Schizosaccharomyces pombe. Genetics genetics.115.177329. Fernández, J., López-Fanjul, C., 1996. Spontaneous mutational variances and covariances for fitness-related traits in Drosophila melanogaster. Genetics 143, 829–837. Field, null, Behrenfeld, null, Randerson, null, Falkowski, null, 1998. Primary production of the biosphere: integrating terrestrial and oceanic components. Science 281, 237–240. Fijalkowska, I.J., Jonczyk, P., Tkaczyk, M.M., Bialoskorska, M., Schaaper, R.M., 1998. Unequal fidelity of leading strand and lagging strand DNA replication on the Escherichia coli chromosome. Proc. Natl. Acad. Sci. U. S. A. 95, 10020– 10025. Foflonker, F., Ananyev, G., Qiu, H., Morrison, A., Palenik, B., Dismukes, G.C., Bhattacharya, D., 2016. The unexpected extremophile: Tolerance to fluctuating salinity in the green alga Picochlorum. Algal Res. 16, 465–472. Foflonker, F., Price, D.C., Qiu, H., Palenik, B., Wang, S., Bhattacharya, D., 2015. Genome of the halotolerant green alga Picochlorum sp. reveals strategies for thriving under fluctuating environmental conditions. Environ. Microbiol. 
17, 412–426.

190 Ford, C.B., Lin, P.L., Chase, M.R., Shah, R.R., Iartchouk, O., Galagan, J., Mohaideen, N., Ioerger, T.R., Sacchettini, J.C., Lipsitch, M., Flynn, J.L., Fortune, S.M., 2011. Use of whole genome sequencing to estimate the mutation rate of Mycobacterium tuberculosis during latent infection. Nat. Genet. 43, 482–486. Foster, P.L., Hanson, A.J., Lee, H., Popodi, E.M., Tang, H., 2013. On the mutational topology of the bacterial genome. G3 Bethesda Md 3, 399–407. Foster, P.L., Lee, H., Popodi, E., Townes, J.P., Tang, H., 2015. Determinants of spontaneous mutation in the bacterium Escherichia coli as revealed by whole- genome sequencing. Proc. Natl. Acad. Sci. U. S. A. 112, E5990-5999. Friedl, T., 1995. Inferring Taxonomic Positions and Testing Genus Level Assignments in Coccoid Green Lichen Algae: A Phylogenetic Analysis of 18s Ribosomal Rna Sequences from Dictyochloropsis Reticulata and from Members of the Genus Myrmecia (chlorophyta, Trebouxiophyceae Cl. Nov.)1. J. Phycol. 31, 632–639. Friedl, T., Rybalka, N., 2012. Systematics of the Green Algae: A Brief Introduction to the Current Status. Prog. Bot. 73 259. Fry, J.D., 2004. On the rate and linearity of viability declines in Drosophila mutation- accumulation experiments: genomic mutation rates and synergistic epistasis revisited. Genetics 166, 797–806. Fry, J.D., 2001. Rapid mutational declines of viability in Drosophila. Genet. Res. 77, 53–60. Fry, J.D., Heinsohn, S.L., 2002. Environment dependence of mutational parameters for viability in Drosophila melanogaster. Genetics 161, 1155–1167. Fry, J.D., Heinsohn, S.L., Mackay, T.F.C., 1996. The Contribution of New Mutations to Genotype-Environment Interaction for Fitness in Drosophila melanogaster. Evolution 50, 2316–2327. Fry, J.D., Keightley, P.D., Heinsohn, S.L., Nuzhdin, S.V., 1999. New estimates of the rates and effects of mildly deleterious mutation in Drosophila melanogaster. Proc. Natl. Acad. Sci. U. S. A. 96, 574–579. Fryxell, K.J., Zuckerkandl, E., 2000. 
Cytosine deamination plays a primary role in the evolution of mammalian isochores. Mol. Biol. Evol. 17, 1371–1383.

191 Fukui, K., Fukui, K., 2010. DNA Mismatch Repair in Eukaryotes and Bacteria, DNA Mismatch Repair in Eukaryotes and Bacteria. J. Nucleic Acids J. Nucleic Acids 2010, 2010, e260512. Galtier, N., Piganeau, G., Mouchiroud, D., Duret, L., 2001. GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics 159, 907–911. Gao, C., Wang, Y., Shen, Y., Yan, D., He, X., Dai, J., Wu, Q., 2014. Oil accumulation mechanisms of the oleaginous microalga Chlorella protothecoides revealed through its genome, transcriptomes, and proteomes. BMC Genomics 15, 582. Gao, Z., Wyman, M.J., Sella, G., Przeworski, M., 2016. Interpreting the Dependence of Mutation Rates on Age and Time. PLoS Biol. 14, e1002355. Garzon-Sanabria, A.J., Davis, R.T., Nikolov, Z.L., 2012. Harvesting Nannochloris oculata by inorganic electrolyte flocculation: effect of initial cell density, ionic strength, coagulant dosage, and media pH. Bioresour. Technol. 118, 418– 424. Gavrilets, S., 1997. Evolution and speciation on holey adaptive landscapes. Trends Ecol. Evol. 12, 307–312. Gerken, H.G., Donohoe, B., Knoshaug, E.P., 2013. Enzymatic cell wall degradation of Chlorella vulgaris and other microalgae for biofuels production. Planta 237, 239–253. Gillman, L.N., Keeling, D.J., Gardner, R.C., Wright, S.D., 2010. Faster evolution of highly conserved DNA in tropical plants. J. Evol. Biol. 23, 1327–1330. Giraud, A., Matic, I., Tenaillon, O., Clara, A., Radman, M., Fons, M., Taddei, F., 2001. Costs and benefits of high mutation rates: adaptive evolution of bacteria in the mouse gut. Science 291, 2606–2608. Glémin, S., 2010. Surprising Fitness Consequences of GC-Biased Gene Conversion: I. Mutation Load and Inbreeding Depression. Genetics 185, 939–959. Glémin, S., Arndt, P.F., Messer, P.W., Petrov, D., Galtier, N., Duret, L., 2015. Quantification of GC-biased gene conversion in the human genome. Genome Res. Gojobori, T., Li, W.-H., Graur, D., n.d. 
Patterns of nucleotide substitution in pseudogenes and functional genes. J. Mol. Evol. 18, 360–369.

192 Gossmann, T.I., Keightley, P.D., Eyre-Walker, A., 2012. The Effect of Variation in the Effective Population Size on the Rate of Adaptive Molecular Evolution in Eukaryotes. Genome Biol. Evol. 4, 658–667. Grimsley, N., Péquin, B., Bachy, C., Moreau, H., Piganeau, G., 2010. Cryptic sex in the smallest eukaryotic marine green alga. Mol. Biol. Evol. 27, 47–54. Guillard, R.R.L., Hargraves, P.E., 1993. Stichochrysis immobilis is a diatom, not a chrysophyte. Phycologia 32, 234–236. Haag-Liautard, C., Coffey, N., Houle, D., Lynch, M., Charlesworth, B., Keightley, P.D., 2008. Direct estimation of the mitochondrial DNA mutation rate in Drosophila melanogaster. PLoS Biol. 6, e204. Haag-Liautard, C., Dorris, M., Maside, X., Macaskill, S., Halligan, D.L., Houle, D., Charlesworth, B., Keightley, P.D., 2007. Direct estimation of per nucleotide and genomic deleterious mutation rates in Drosophila. Nature 445, 82–85. Haldane, J.B.S., 1949. The Rate of Mutation of Human Genes. Hereditas 35, 267– 273. Haldane, J.B.S., 1937. The Effect of Variation of Fitness. Am. Nat. 71, 337–349. Hall, D.W., Fox, S., Kuzdzal-Fick, J.J., Strassmann, J.E., Queller, D.C., 2013. The Rate and Effects of Spontaneous Mutation on Fitness Traits in the Social Amoeba, Dictyostelium discoideum. G3 GenesGenomesGenetics 3, 1115– 1127. Hall, D.W., Mahmoudizad, R., Hurd, A.W., Joseph, S.B., 2008. Spontaneous mutations in diploid Saccharomyces cerevisiae: another thousand cell generations. Genet. Res. 90, 229–241. Halligan, D.L., Keightley, P.D., 2009. Spontaneous Mutation Accumulation Studies in Evolutionary Genetics. Annu. Rev. Ecol. Evol. Syst. 40, 151–172. Hanawalt, P.C., Spivak, G., 2008. Transcription-coupled DNA repair: two decades of progress and surprises. Nat. Rev. Mol. Cell Biol. 9, 958–970. Hannon, M., Gimpel, J., Tran, M., Rasala, B., Mayfield, S., 2010. Biofuels from algae: challenges and potential. Biofuels 1, 763–784. Harrison, R.J., Charlesworth, B., 2011. 
Biased Gene Conversion Affects Patterns of Codon Usage and Amino Acid Usage in the Saccharomyces sensu stricto Group of Yeasts. Mol. Biol. Evol. 28, 117–129.
Henley, W.J., Hironaka, J.L., Guillou, L., Buchheim, M.A., Buchheim, J.A., Fawley, M.W., Fawley, K.P., 2004. Phylogenetic analysis of the “Nannochloris-like” algae and diagnoses of Picochlorum oklahomensis gen. et sp. nov. (Trebouxiophyceae, Chlorophyta). Phycologia 43, 641–652.
Henley, W.J., Major, K.M., Hironaka, J.L., 2002. Response to Salinity and Heat Stress in Two Halotolerant Chlorophyte Algae. J. Phycol. 38, 757–766.
Henry, L., Schwander, T., Crespi, B.J., 2012. Deleterious Mutation Accumulation in Asexual Timema Stick Insects. Mol. Biol. Evol. 29, 401–408.
Hermisson, J., Pennings, P.S., 2005. Soft Sweeps: Molecular Population Genetics of Adaptation From Standing Genetic Variation. Genetics 169, 2335–2352.
Hershberg, R., Petrov, D.A., 2010. Evidence That Mutation Is Universally Biased towards AT in Bacteria. PLoS Genet. 6, e1001115.
Hestand, M.S., Houdt, J.V., Cristofoli, F., Vermeesch, J.R., 2016. Polymerase specific error rates and profiles identified by single molecule sequencing. Mutat. Res. Mol. Mech. Mutagen. 784–785, 39–45.
Higgins, K., Lynch, M., 2001. Metapopulation extinction caused by mutation accumulation. Proc. Natl. Acad. Sci. U. S. A. 98, 2928–2933.
Hildebrand, F., Meyer, A., Eyre-Walker, A., 2010. Evidence of Selection upon Genomic GC-Content in Bacteria. PLoS Genet. 6.
Hodgkinson, A., Eyre-Walker, A., 2011. Variation in the mutation rate across mammalian genomes. Nat. Rev. Genet. 12, 756–766.
Hold, G.L., Smith, E.A., Rappé, M.S., Maas, E.W., Moore, E.R., Stroempl, C., Stephen, J.R., Prosser, J.I., Birkbeck, T.H., Gallacher, S., 2001. Characterisation of bacterial communities associated with toxic and non-toxic dinoflagellates: Alexandrium spp. and Scrippsiella trochoidea. FEMS Microbiol. Ecol. 37, 161–173.
Hopwood, D.A., 1997. Genetic Contributions to Understanding Polyketide Synthases. Chem. Rev. 97, 2465–2498.
Houle, D., 1992. Comparing Evolvability and Variability of Quantitative Traits. Genetics 130, 195–204.
Houle, D., Hoffmaster, D.K., Assimacopoulos, S., Charlesworth, B., 1992. The genomic mutation rate for fitness in Drosophila. Nature 359, 58–60.
Hubscher, U., Maga, G., Spadari, S., 2002. Eukaryotic DNA polymerases. Annu. Rev. Biochem. 71, 133–163.
Hudson, R.E., Bergthorsson, U., Roth, J.R., Ochman, H., 2002. Effect of chromosome location on bacterial mutation rates. Mol. Biol. Evol. 19, 85–92.
Huey, R.B., Gilchrist, G.W., Ward, K., Maves, L., Pepin, D., Houle, D., 2003. Mutation Accumulation, Performance, Fitness. Integr. Comp. Biol. 43, 387–395.
Hurst, L.D., Williams, E.J., 2000. Covariation of GC content and the silent site substitution rate in rodents: implications for methodology and for the evolution of isochores. Gene 261, 107–114.
Ikemura, T., 1981. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 151, 389–409.
Jancek, S., Gourbière, S., Moreau, H., Piganeau, G., 2008. Clues about the genetic basis of adaptation emerge from comparing the proteomes of two Ostreococcus ecotypes (Chlorophyta, Prasinophyceae). Mol. Biol. Evol. 25, 2293–2300.
Jardillier, L., Zubkov, M.V., Pearman, J., Scanlan, D.J., 2010. Significant CO2 fixation by small prymnesiophytes in the subtropical and tropical northeast Atlantic Ocean. ISME J. 4, 1180–1192.
Jiang, C., Mithani, A., Belfield, E.J., Mott, R., Hurst, L.D., Harberd, N.P., 2014. Environmentally responsive genome-wide accumulation of de novo Arabidopsis thaliana mutations and epimutations. Genome Res. 24, 1821–1829.
Jiricny, J., 2006. The multifaceted mismatch-repair system. Nat. Rev. Mol. Cell Biol. 7, 335–346.
Jones, P.A., 2012. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat. Rev. Genet. 13, 484–492.
Joseph, S.B., Hall, D.W., 2004. Spontaneous Mutations in Diploid Saccharomyces cerevisiae. Genetics 168, 1817–1825.
Kaity, A., Ashmore, S.E., Drew, R.A., Dulloo, M.E., 2008. Assessment of genetic and epigenetic changes following cryopreservation in papaya. Plant Cell Rep. 27, 1529–1539.
Katju, V., Packard, L.B., Bu, L., Keightley, P.D., Bergthorsson, U., 2014. Fitness decline in spontaneous mutation accumulation lines of Caenorhabditis elegans with varying effective population sizes. Evol. Int. J. Org. Evol.
Kavanaugh, C.M., Shaw, R.G., 2005. The contribution of spontaneous mutation to variation in environmental responses of Arabidopsis thaliana: responses to light. Evol. Int. J. Org. Evol. 59, 266–275.
Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., Thierer, T., Ashton, B., Meintjes, P., Drummond, A., 2012. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinforma. Oxf. Engl. 28, 1647–1649.
Keightley, P.D., 1994. The Distribution of Mutation Effects on Viability in Drosophila melanogaster. Genetics 138, 1315–1322.
Keightley, P.D., Bataillon, T.M., 2000. Multigeneration maximum-likelihood analysis applied to mutation-accumulation experiments in Caenorhabditis elegans. Genetics 154, 1193–1201.
Keightley, P.D., Caballero, A., 1997. Genomic mutation rates for lifetime reproductive output and lifespan in Caenorhabditis elegans. Proc. Natl. Acad. Sci. U. S. A. 94, 3823–3827.
Keightley, P.D., Eyre-Walker, A., 1999. Terumi Mukai and the Riddle of Deleterious Mutation Rates. Genetics 153, 515–523.
Keightley, P.D., Ness, R.W., Halligan, D.L., Haddrill, P.R., 2014a. Estimation of the spontaneous mutation rate per nucleotide site in a Drosophila melanogaster full-sib family. Genetics 196, 313–320.
Keightley, P.D., Pinharanda, A., Ness, R.W., Simpson, F., Dasmahapatra, K.K., Mallet, J., Davey, J.W., Jiggins, C.D., 2014b. Estimation of the spontaneous mutation rate in Heliconius melpomene. Mol. Biol. Evol.
Keightley, P.D., Trivedi, U., Thomson, M., Oliver, F., Kumar, S., Blaxter, M.L., 2009. Analysis of the genome sequences of three Drosophila melanogaster spontaneous mutation accumulation lines. Genome Res. 19, 1195–1201.
Kibota, T.T., Lynch, M., 1996. Estimate of the genomic mutation rate deleterious to overall fitness in E. coli. Nature 381, 694–696.
Kim, N., Jinks-Robertson, S., 2012. Transcription as a source of genome instability. Nat. Rev. Genet. 13, 204–214.
Kimura, M., 1991. The neutral theory of molecular evolution: a review of recent evidence. Idengaku Zasshi 66, 367–386.
Kimura, M., 1987. Molecular evolutionary clock and the neutral theory. J. Mol. Evol. 26, 24–33.
Kimura, M., 1968. Evolutionary rate at the molecular level. Nature 217, 624–626.
Kishony, R., Leibler, S., 2003. Environmental stresses can alleviate the average deleterious effect of mutations. J. Biol. 2, 14.
Klapacz, J., Bhagwat, A.S., 2002. Transcription-dependent increase in multiple classes of base substitution mutations in Escherichia coli. J. Bacteriol. 184, 6866–6872.
Knudson, A.G., 2000. Chasing the Cancer Demon. Annu. Rev. Genet. 34, 1–19.
Kondrashov, A.S., 1995. Contamination of the genome by very slightly deleterious mutations: why have we not died 100 times over? J. Theor. Biol. 175, 583–594.
Kondrashov, A.S., 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336, 435–440.
Kondrashov, A.S., Houle, D., 1994. Genotype-environment interactions and the estimation of the genomic mutation rate in Drosophila melanogaster. Proc. Biol. Sci. 258, 221–227.
Korona, R., 1999. Genetic Load of the Yeast Saccharomyces cerevisiae under Diverse Environmental Conditions. Evolution 53, 1966–1971.
Kraemer, S.A., Morgan, A.D., Ness, R.W., Keightley, P.D., Colegrave, N., 2015. Fitness effects of new mutations in Chlamydomonas reinhardtii across two stress gradients. J. Evol. Biol. n/a-n/a.
Krasovec, M., Eyre-Walker, A., Grimsley, N., Salmeron, C., Pecqueur, D., Piganeau, G., Sanchez-Ferandin, S., 2016. Fitness Effects of Spontaneous Mutations in Picoeukaryotic Marine Green Algae. G3 GenesGenomesGenetics 6, 2063–2071.
Kucukyildirim, S., Long, H., Sung, W., Miller, S.F., Doak, T.G., Lynch, M., 2016. The Rate and Spectrum of Spontaneous Mutations in Mycobacterium smegmatis, a Bacterium Naturally Devoid of the Post-replicative Mismatch Repair Pathway. G3 GenesGenomesGenetics g3.116.030130.
Kunkel, T.A., Bebenek, K., 2000. DNA Replication Fidelity. Annu. Rev. Biochem. 69, 497–529.
Kunkel, T.A., Erie, D.A., 2015. Eukaryotic Mismatch Repair in Relation to DNA Replication. Annu. Rev. Genet. 49.
Kuo, C.-H., Ochman, H., 2009. Deletional Bias across the Three Domains of Life. Genome Biol. Evol. 1, 145–152.
Laird, C.D., McConaughy, B.L., McCarthy, B.J., 1969. Rate of Fixation of Nucleotide Substitutions in Evolution. Nature 224, 149–154.
Laity, J.H., Lee, B.M., Wright, P.E., 2001. Zinc finger proteins: new insights into structural and functional diversity. Curr. Opin. Struct. Biol. 11, 39–46.
Lande, R., 1998. Risk of population extinction from fixation of deleterious and reverse mutations. Genetica 102–103, 21–27.
Lande, R., 1994. Risk of Population Extinction from Fixation of New Deleterious Mutations. Evolution 48, 1460–1469.
Lande, R., 1988. Genetics and demography in biological conservation. Science 241, 1455–1460.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., et al., 2001. Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Lanfear, R., Kokko, H., Eyre-Walker, A., 2014. Population size and the rate of evolution. Trends Ecol. Evol. 29, 33–41.
Lang, G.I., Murray, A.W., 2008. Estimating the per-base-pair mutation rate in the yeast Saccharomyces cerevisiae. Genetics 178, 67–82.
Lang, G.I., Parsons, L., Gammie, A.E., 2013. Mutation rates, spectra, and genome-wide distribution of spontaneous mutations in mismatch repair deficient yeast. G3 Bethesda Md 3, 1453–1465.
Laroche, J., Li, P., Maggia, L., Bousquet, J., 1997. Molecular evolution of angiosperm mitochondrial introns and exons. Proc. Natl. Acad. Sci. U. S. A. 94, 5722–5727.
Lassalle, F., Périan, S., Bataillon, T., Nesme, X., Duret, L., Daubin, V., 2015. GC-Content Evolution in Bacterial Genomes: The Biased Gene Conversion Hypothesis Expands. PLoS Genet. 11, e1004941.
Latta, L.C., Morgan, K.K., Weaver, C.S., Allen, D., Schaack, S., Lynch, M., 2013. Genomic background and generation time influence deleterious mutation rates in Daphnia. Genetics 193, 539–544.
Laughlin, D.C., Messier, J., 2015. Fitness of multidimensional phenotypes in dynamic adaptive landscapes. Trends Ecol. Evol. 30, 487–496.
Lee, H., Popodi, E., Tang, H., Foster, P.L., 2012. Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl. Acad. Sci. U. S. A. 109, E2774-2783.
Leliaert, F., Smith, D.R., Moreau, H., Herron, M.D., Verbruggen, H., Delwiche, C.F., Clerck, O.D., 2012. Phylogeny and Molecular Evolution of the Green Algae. Crit. Rev. Plant Sci. 31, 1–46.
Lesecque, Y., Mouchiroud, D., Duret, L., 2013. GC-Biased Gene Conversion in Yeast Is Specifically Associated with Crossovers: Molecular Mechanisms and Evolutionary Significance. Mol. Biol. Evol. 30, 1409–1419.
Lewis, C.A., Crayle, J., Zhou, S., Swanstrom, R., Wolfenden, R., 2016. Cytosine deamination and the precipitous decline of spontaneous mutation during Earth’s history. Proc. Natl. Acad. Sci. U. S. A. 113, 8194–8199.
Lewis, L.A., McCourt, R.M., 2004. Green algae and the origin of land plants. Am. J. Bot. 91, 1535–1556.
Li, G.-M., 2008. Mechanisms and functions of DNA mismatch repair. Cell Res. 18, 85–98.
Li, H., Durbin, R., 2010. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., 1000 Genome Project Data Processing Subgroup, 2009. The Sequence Alignment/Map format and SAMtools. Bioinforma. Oxf. Engl. 25, 2078–2079.
Li, J., Deng, H.-W., 2005. Estimation of the rate and effects of deleterious genomic mutations in finite populations with linkage disequilibrium. Heredity 95, 59–68.
Li, W.K.W., 1994. Primary production of prochlorophytes, cyanobacteria, and eucaryotic ultraphytoplankton: Measurements from flow cytometric sorting. Limnol. Oceanogr. 39, 169–175.
Lind, P.A., Andersson, D.I., 2008. Whole-genome mutational biases in bacteria. Proc. Natl. Acad. Sci. U. S. A. 105, 17878–17883.
Long, H., Kucukyildirim, S., Sung, W., Williams, E., Lee, H., Ackerman, M., Doak, T.G., Tang, H., Lynch, M., 2015a. Background Mutational Features of the Radiation-Resistant Bacterium Deinococcus radiodurans. Mol. Biol. Evol. 32, 2383–2392.
Long, H., Sung, W., Miller, S.F., Ackerman, M.S., Doak, T.G., Lynch, M., 2015b. Mutation rate, spectrum, topology, and context-dependency in the DNA mismatch repair-deficient Pseudomonas fluorescens ATCC948. Genome Biol. Evol. 7, 262–271.
Lujan, S.A., Clausen, A.R., Clark, A.B., MacAlpine, H.K., MacAlpine, D.M., Malc, E.P., Mieczkowski, P.A., Burkholder, A.B., Fargo, D.C., Gordenin, D.A., Kunkel, T.A., 2014. Heterogeneous polymerase fidelity and mismatch repair bias genome variation and composition. Genome Res. 24, 1751–1764.
Lynch, M., 2010a. Evolution of the mutation rate. Trends Genet. TIG 26, 345–352.
Lynch, M., 2010b. Rate, molecular spectrum, and consequences of human mutation. Proc. Natl. Acad. Sci. 107, 961–968.
Lynch, M., 2008. The Cellular, Developmental and Population-Genetic Determinants of Mutation-Rate Evolution. Genetics 180, 933–943.
Lynch, M., Blanchard, J., Houle, D., Kibota, T., Schultz, S., Vassilieva, L., Willis, J., 1999. Perspective: Spontaneous Deleterious Mutation. Evolution 53, 645–663.
Lynch, M., Conery, J., Burger, R., 1995. Mutation Accumulation and the Extinction of Small Populations. Am. Nat. 146, 489–518.
Lynch, M., Gabriel, W., 1990. Mutation Load and the Survival of Small Populations. Evolution 44, 1725–1737.
Lynch, M., Hagner, K., 2015. Evolutionary meandering of intermolecular interactions along the drift barrier. Proc. Natl. Acad. Sci. 112, E30–E38.
Lynch, M., Sung, W., Morris, K., Coffey, N., Landry, C.R., Dopman, E.B., Dickinson, W.J., Okamoto, K., Kulkarni, S., Hartl, D.L., Thomas, W.K., 2008. A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. U. S. A. 105, 9272–9277.
Ma, X., Rogacheva, M.V., Nishant, K.T., Zanders, S., Bustamante, C.D., Alani, E., 2012. Mutation hotspots in yeast caused by long-range clustering of homopolymeric sequences. Cell Rep. 1, 36–42.
Maki, H., 2002. Origins of Spontaneous Mutations: Specificity and Directionality of Base-Substitution, Frameshift, and Sequence-Substitution Mutageneses. Annu. Rev. Genet. 36, 279–303.
Marin, B., Melkonian, M., 2010. Molecular phylogeny and classification of the Mamiellophyceae class. nov. (Chlorophyta) based on sequence comparisons of the nuclear- and plastid-encoded rRNA operons. Protist 161, 304–336.
Martin, A.P., Palumbi, S.R., 1993. Body size, metabolic rate, generation time, and the molecular clock. Proc. Natl. Acad. Sci. U. S. A. 90, 4087–4091.
Martin, C.H., Wainwright, P.C., 2013. Multiple Fitness Peaks on the Adaptive Landscape Drive Adaptive Radiation in the Wild. Science 339, 208–211.
Martin, G., Lenormand, T., 2006. The fitness effect of mutations across environments: a survey in light of fitness landscape models. Evol. Int. J. Org. Evol. 60, 2413–2427.
Martincorena, I., Luscombe, N.M., 2013. Non-random mutation: the evolution of targeted hypermutation and hypomutation. BioEssays News Rev. Mol. Cell. Dev. Biol. 35, 123–130.
Martincorena, I., Seshasayee, A.S.N., Luscombe, N.M., 2012. Evidence of non-random mutation rates suggests an evolutionary risk management strategy. Nature 485, 95–98.
Martinez, J.S., Carter-Franklin, J.N., Mann, E.L., Martin, J.D., Haygood, M.G., Butler, A., 2003. Structure and membrane affinity of a suite of amphiphilic siderophores produced by a marine bacterium. Proc. Natl. Acad. Sci. 100, 3754–3759.
Martinez, J.S., Zhang, G.P., Holt, P.D., Jung, H.T., Carrano, C.J., Haygood, M.G., Butler, A., 2000. Self-assembling amphiphilic siderophores from marine bacteria. Science 287, 1245–1247.
Massana, R., 2011. Eukaryotic picoplankton in surface oceans. Annu. Rev. Microbiol. 65, 91–110.
Mata, T.M., Martins, A.A., Caetano, N.S., 2010. Microalgae for biodiesel production and other applications: A review. Renew. Sustain. Energy Rev. 14, 217–232.
Mateyak, M.K., Zakian, V.A., 2006. Human PIF helicase is cell cycle regulated and associates with telomerase. Cell Cycle Georget. Tex 5, 2796–2804.
Matsuba, C., Ostrow, D.G., Salomon, M.P., Tolani, A., Baer, C.F., 2013. Temperature, stress and spontaneous mutation in Caenorhabditis briggsae and Caenorhabditis elegans. Biol. Lett. 9, 20120334.
Matuszewski, S., Hermisson, J., Kopp, M., 2014. Fisher’s geometric model with a moving optimum. Evol. Int. J. Org. Evol. 68, 2571–2588.
Mittelbach, G.G., Schemske, D.W., Cornell, H.V., Allen, A.P., Brown, J.M., et al., 2007. Evolution and the latitudinal diversity gradient: speciation, extinction and biogeography. Ecol. Lett. 10, 315–331.
Mooers, A.O., Harvey, P.H., 1994. Metabolic rate, generation time, and the rate of molecular evolution in birds. Mol. Phylogenet. Evol. 3, 344–350.
Moreau, H., Verhelst, B., Couloux, A., Derelle, E., Rombauts, S., Grimsley, N., Van Bel, M., Poulain, J., Katinka, M., Hohmann-Marriott, M.F., Piganeau, G., Rouzé, P., Da Silva, C., Wincker, P., Van de Peer, Y., Vandepoele, K., 2012. Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage. Genome Biol. 13, R74.
Morgan, A.D., Ness, R.W., Keightley, P.D., Colegrave, N., 2014. Spontaneous mutation accumulation in multiple strains of the green alga, Chlamydomonas reinhardtii. Evolution 68, 2589–2602.
Mousseau, T.A., Roff, D.A., 1987. Natural selection and the heritability of fitness components. Heredity 59 (Pt 2), 181–197.
Mugal, C.F., Arndt, P.F., Ellegren, H., 2013. Twisted signatures of GC-biased gene conversion embedded in an evolutionary stable karyotype. Mol. Biol. Evol. 30, 1700–1712.
Mukai, T., 1964. The Genetic Structure of Natural Populations of Drosophila melanogaster. I. Spontaneous Mutation Rate of Polygenes Controlling Viability. Genetics 50, 1–19.
Muller, H.J., 1950. Our load of mutations. Am. J. Hum. Genet. 2, 111–176.
Muller, H.J., 1928. The Measurement of Gene Mutation Rate in Drosophila, Its High Variability, and Its Dependence upon Temperature. Genetics 13, 279–357.
Muller, H.J., 1927. Artificial Transmutation of the Gene. Science 66, 84–87.
Nahum, J.R., Godfrey-Smith, P., Harding, B.N., Marcus, J.H., Carlson-Stevermer, J., Kerr, B., 2015. A tortoise–hare pattern seen in adapting structured and unstructured populations suggests a rugged fitness landscape in bacteria. Proc. Natl. Acad. Sci. 112, 7530–7535.
Neiman, M., Hehman, G., Miller, J.T., Logsdon, J.M., Taylor, D.R., 2010. Accelerated mutation accumulation in asexual lineages of a freshwater snail. Mol. Biol. Evol. 27, 954–963.
Ness, R.W., Kraemer, S.A., Colegrave, N., Keightley, P.D., 2015a. Direct estimate of the spontaneous mutation rate uncovers the effects of drift and recombination in the Chlamydomonas reinhardtii plastid genome. Mol. Biol. Evol.
Ness, R.W., Morgan, A.D., Colegrave, N., Keightley, P.D., 2012. Estimate of the Spontaneous Mutation Rate in Chlamydomonas reinhardtii. Genetics 192, 1447–1454.
Ness, R.W., Morgan, A.D., Vasanthakrishnan, R.B., Colegrave, N., Keightley, P.D., 2015b. Extensive de novo mutation rate variation between individuals and across the genome of Chlamydomonas reinhardtii. Genome Res. 25, 1739–1749.
Not, F., Siano, R., Kooistra, W.H.C.F., Simon, N., Vaulot, D., Probert, I., 2012. Diversity and Ecology of Eukaryotic Marine Phytoplankton. Adv. Bot. Res.
Orr, H.A., 2005. The genetic theory of adaptation: a brief history. Nat. Rev. Genet. 6, 119–127.
Ossowski, S., Schneeberger, K., Lucas-Lledó, J.I., Warthmann, N., Clark, R.M., Shaw, R.G., Weigel, D., Lynch, M., 2010. The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327, 92–94.
Ota, S., Matsuda, T., Takeshita, T., Yamazaki, T., Kazama, Y., Abe, T., Kawano, S., 2013. Phenotypic spectrum of Parachlorella kessleri (Chlorophyta) mutants produced by heavy-ion irradiation. Bioresour. Technol. 149, 432–438.
Palenik, B., Grimwood, J., Aerts, A., Rouzé, P., Salamov, A., et al., 2007. The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proc. Natl. Acad. Sci. U. S. A. 104, 7705–7710.
Park, C., Qian, W., Zhang, J., 2012. Genomic evidence for elevated mutation rates in highly expressed genes. EMBO Rep. 13, 1123–1129.
Park, S.-J., Choi, Y.-E., Kim, E.J., Park, W.-K., Kim, C.W., Yang, J.-W., 2012. Serial optimization of biomass production using microalga Nannochloris oculata and corresponding lipid biosynthesis. Bioprocess Biosyst. Eng. 35, 3–9.
Paul, S., Million-Weaver, S., Chattopadhyay, S., Sokurenko, E., Merrikh, H., 2013. Accelerated gene evolution through replication-transcription conflicts. Nature 495, 512–515.
Perrineau, M.-M., Gross, J., Zelzion, E., Price, D.C., Levitan, O., Boyd, J., Bhattacharya, D., 2014. Using Natural Selection to Explore the Adaptive Potential of Chlamydomonas reinhardtii. PLOS ONE 9, e92533.
Petren, K., 2013. The Evolution of Landscape Genetics. Evolution 67, 3383–3385.
Phifer-Rixey, M., Bonhomme, F., Boursot, P., Churchill, G.A., Piálek, J., Tucker, P.K., Nachman, M.W., 2012. Adaptive Evolution and Effective Population Size in Wild House Mice. Mol. Biol. Evol. 29, 2949–2955.
Piganeau, G., Eyre-Walker, A., Jancek, S., Grimsley, N., Moreau, H., 2011a. How and why DNA barcodes underestimate the diversity of microbial eukaryotes. PloS One 6, e16342.
Piganeau, G., Grimsley, N., Moreau, H., 2011b. Genome diversity in the smallest marine photosynthetic eukaryotes. Res. Microbiol. 162, 570–577.
Pinter, S.F., Aubert, S.D., Zakian, V.A., 2008. The Schizosaccharomyces pombe Pfh1p DNA helicase is essential for the maintenance of nuclear and mitochondrial DNA. Mol. Cell. Biol. 28, 6594–6608.
Polak, P., Arndt, P.F., 2008. Transcription induces strand-specific mutations at the 5’ end of human genes. Genome Res. 18, 1216–1223.
Pombert, J.-F., Blouin, N.A., Lane, C., Boucias, D., Keeling, P.J., 2014. A lack of parasitic reduction in the obligate parasitic green alga Helicosporidium. PLoS Genet. 10, e1004355.
Project, the 1000 G., 2011. Variation in genome-wide mutation rates within and between human families. Nat. Genet. 43, 712–714.
Qiu, H., Price, D.C., Weber, A.P.M., Reeb, V., Yang, E.C., Lee, J.M., Kim, S.Y., Yoon, H.S., Bhattacharya, D., 2013. Adaptation through horizontal gene transfer in the cryptoendolithic red alga Galdieria phlegrea. Curr. Biol. CB 23, R865-866.
Raymond, J.A., 2014. The ice-binding proteins of a snow alga, Chloromonas brevispina: probable acquisition by horizontal gene transfer. Extrem. Life Extreme Cond. 18, 987–994.
Read, B.A., Kegel, J., Klute, M.J., Kuo, A., Lefebvre, S.C., et al., 2013. Pan genome of the phytoplankton Emiliania underpins its global distribution. Nature 499, 209–213.
Rebolledo-Jaramillo, B., Su, M.S.-W., Stoler, N., McElhoe, J.A., Dickins, B., Blankenberg, D., Korneliussen, T.S., Chiaromonte, F., Nielsen, R., Holland, M.M., Paul, I.M., Nekrutenko, A., Makova, K.D., 2014. Maternal age effect and severe germ-line bottleneck in the inheritance of human mitochondrial DNA. Proc. Natl. Acad. Sci. 111, 15474–15479.
Rodríguez, F., Derelle, E., Guillou, L., Le Gall, F., Vaulot, D., Moreau, H., 2005. Ecotype diversity in the marine picoeukaryote Ostreococcus (Chlorophyta, Prasinophyceae). Environ. Microbiol. 7, 853–859.
Rolland, J., Condamine, F.L., Jiguet, F., Morlon, H., 2014. Faster speciation and reduced extinction in the tropics contribute to the Mammalian latitudinal diversity gradient. PLoS Biol. 12, e1001775.
Rozen, D.E., Habets, M.G.J.L., Handel, A., de Visser, J.A.G.M., 2008. Heterogeneous Adaptive Trajectories of Small Populations on Complex Fitness Landscapes. PLoS ONE 3, e1715.
Rumin, J., Bonnefond, H., Saint-Jean, B., Rouxel, C., Sciandra, A., Bernard, O., Cadoret, J.-P., Bougaran, G., 2015. The use of fluorescent Nile red and BODIPY for lipid measurement in microalgae. Biotechnol. Biofuels 8, 42.
Rutter, M.T., Roles, A., Conner, J.K., Shaw, R.G., Shaw, F.H., Schneeberger, K., Ossowski, S., Weigel, D., Fenster, C.B., 2012. Fitness of Arabidopsis thaliana mutation accumulation lines whose spontaneous mutations are known. Evol. Int. J. Org. Evol. 66, 2335–2339.
Salk, J.J., Fox, E.J., Loeb, L.A., 2010. Mutational Heterogeneity in Human Cancers: Origin and Consequences. Annu. Rev. Pathol. Mech. Dis. 5, 51–75.
Saxer, G., Havlak, P., Fox, S.A., Quance, M.A., Gupta, S., Fofanov, Y., Strassmann, J.E., Queller, D.C., 2012. Whole genome sequencing of mutation accumulation lines reveals a low mutation rate in the social amoeba Dictyostelium discoideum. PloS One 7, e46759.
Schaack, S., Allen, D.E., Latta, L.C., Morgan, K.K., Lynch, M., 2013. The effect of spontaneous mutations on competitive ability. J. Evol. Biol. 26, 451–456.
Schäfer, H., Abbas, B., Witte, H., Muyzer, G., 2002. Genetic diversity of “satellite” bacteria present in cultures of marine diatoms. FEMS Microbiol. Ecol. 42, 25–35.
Schaum, C.E., Collins, S., 2014. Plasticity predicts evolution in a marine alga. Proc. Biol. Sci. 281.
Schaum, C.-E., Rost, B., Collins, S., 2015. Environmental stability affects phenotypic evolution in a globally distributed marine picoplankton. ISME J.
Scheinin, M., Riebesell, U., Rynearson, T.A., Lohbeck, K.T., Collins, S., 2015. Experimental evolution gone wild. J. R. Soc. Interface 12, 20150056.
Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B., Lesniewski, R.A., Oakley, B.B., Parks, D.H., Robinson, C.J., Sahl, J.W., Stres, B., Thallinger, G.G., Horn, D.J.V., Weber, C.F., 2009. Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Appl. Environ. Microbiol. 75, 7537–7541.
Schönknecht, G., Chen, W.-H., Ternes, C.M., Barbier, G.G., Shrestha, R.P., Stanke, M., Bräutigam, A., Baker, B.J., Banfield, J.F., Garavito, R.M., Carr, K., Wilkerson, C., Rensing, S.A., Gagneul, D., Dickenson, N.E., Oesterhelt, C., Lercher, M.J., Weber, A.P.M., 2013. Gene Transfer from Bacteria and Archaea Facilitated Evolution of an Extremophilic Eukaryote. Science 339, 1207–1210.
Schönknecht, G., Weber, A.P.M., Lercher, M.J., 2014. Horizontal gene acquisitions by eukaryotes as drivers of adaptive evolution. BioEssays 36, 9–20.
Schrider, D.R., Houle, D., Lynch, M., Hahn, M.W., 2013. Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 194, 937–954.
Schrider, D.R., Hourmozdi, J.N., Hahn, M.W., 2011. Pervasive multinucleotide mutational events in eukaryotes. Curr. Biol. CB 21, 1051–1054.
Schröder, J., Schröder, G., 1990. Stilbene and chalcone synthases: related enzymes with key functions in plant-specific pathways. Z. Für Naturforschung C J. Biosci. 45, 1–8.
Schroeder, J.W., Hirst, W.G., Szewczyk, G.A., Simmons, L.A., 2016. The Effect of Local Sequence Context on Mutational Bias of Genes Encoded on the Leading and Lagging Strands. Curr. Biol. CB.
Schultz, S.T., Lynch, M., Willis, J.H., 1999. Spontaneous deleterious mutation in Arabidopsis thaliana. Proc. Natl. Acad. Sci. U. S. A. 96, 11393–11398.
Schwartz, D.C., Cantor, C.R., 1984. Separation of yeast chromosome-sized DNAs by pulsed field gradient gel electrophoresis. Cell 37, 67–75.
Shapiro, J.A., Huang, W., Zhang, C., Hubisz, M.J., Lu, J., Turissini, D.A., Fang, S., Wang, H.-Y., Hudson, R.R., Nielsen, R., Chen, Z., Wu, C.-I., 2007. Adaptive genic evolution in the Drosophila genomes. Proc. Natl. Acad. Sci. U. S. A. 104, 2271–2276.
Sharp, N.P., Agrawal, A.F., 2012. Evidence for elevated mutation rates in low-quality genotypes. Proc. Natl. Acad. Sci. U. S. A. 109, 6142–6146.
Shaw, R.G., Byers, D.L., Darmo, E., 2000. Spontaneous Mutational Effects on Reproductive Traits of Arabidopsis thaliana. Genetics 155, 369–378.
Shor, E., Fox, C.A., Broach, J.R., 2013. The Yeast Environmental Stress Response Regulates Mutagenesis Induced by Proteotoxic Stress. PLoS Genet. 9, e1003680.
Simpson, J.T., Durbin, R., 2012. Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22, 549–556.
Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J.M., Birol, I., 2009. ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–1123.
Šlapeta, J., López-García, P., Moreira, D., 2006. Global Dispersal and Ancient Cryptic Species in the Smallest Marine Eukaryotes. Mol. Biol. Evol. 23, 23–29.
Smeds, L., Qvarnstrom, A., Ellegren, H., 2016. Direct estimate of the rate of germline mutation in a bird. Genome Res.
Smith, D.R., 2015. Mutation rates in plastid genomes: they are lower than you might think. Genome Biol. Evol. 7, 1227–1234.
Sniegowski, P.D., Gerrish, P.J., Lenski, R.E., 1997. Evolution of high mutation rates in experimental populations of E. coli. Nature 387, 703–705.
Stamatoyannopoulos, J.A., Adzhubei, I., Thurman, R.E., Kryukov, G.V., Mirkin, S.M., Sunyaev, S.R., 2009. Human mutation rate associated with DNA replication timing. Nat. Genet. 41, 393–395.
Steinberg, B., Ostermeier, M., 2016. Environmental changes bridge evolutionary valleys. Sci. Adv. 2, e1500921.
Sterck, L., Billiau, K., Abeel, T., Rouzé, P., Van de Peer, Y., 2012. ORCAE: online resource for community annotation of eukaryotes. Nat. Methods 9, 1041.
Subirana, L., Péquin, B., Michely, S., Escande, M.-L., Meilland, J., Derelle, E., Marin, B., Piganeau, G., Desdevises, Y., Moreau, H., Grimsley, N.H., 2013. Morphology, Genome Plasticity, and Phylogeny in the Genus Ostreococcus Reveal a Cryptic Species, O. mediterraneus sp. nov. (Mamiellales, Mamiellophyceae). Protist 164, 643–659.
Sueoka, N., 1962. On the genetic basis of variation and heterogeneity of DNA base composition. Proc. Natl. Acad. Sci. U. S. A. 48, 582–592.
Sullivan, S., Petersen, J., Blackwood, L., Papanatsiou, M., Christie, J.M., 2015. Functional characterization of Ostreococcus tauri phototropin. New Phytol.
Sung, W., Ackerman, M.S., Gout, J.-F., Miller, S.F., Williams, E., Foster, P.L., Lynch, M., 2015. Asymmetric Context-Dependent Mutation Patterns Revealed through Mutation–Accumulation Experiments. Mol. Biol. Evol. 32, 1672–1683.
Sung, W., Ackerman, M.S., Miller, S.F., Doak, T.G., Lynch, M., 2012a. Drift-barrier hypothesis and mutation-rate evolution. Proc. Natl. Acad. Sci. U. S. A. 109, 18488–18492.
Sung, W., Tucker, A.E., Doak, T.G., Choi, E., Thomas, W.K., Lynch, M., 2012b. Extraordinary genome stability in the ciliate Paramecium tetraurelia. Proc. Natl. Acad. Sci. U. S. A. 109, 19339–19344.
Taddei, F., Radman, M., Maynard-Smith, J., Toupance, B., Gouyon, P.H., Godelle, B., 1997. Role of mutator alleles in adaptive evolution. Nature 387, 700–702.
Tenaillon, O., Barrick, J.E., Ribeck, N., Deatherage, D.E., Blanchard, J.L., Dasgupta, A., Wu, G.C., Wielgoss, S., Cruveiller, S., Médigue, C., Schneider, D., Lenski, R.E., 2016. Tempo and mode of genome evolution in a 50,000-generation experiment. Nature.
Tenaillon, O., Toupance, B., Le Nagard, H., Taddei, F., Godelle, B., 1999. Mutators, population size, adaptive landscape and the adaptation of asexual populations of bacteria. Genetics 152, 485–493.
Tesson, S.V.M., Legrand, C., van Oosterhout, C., Montresor, M., Kooistra, W.H.C.F., Procaccini, G., 2013. Mendelian inheritance pattern and high mutation rates of microsatellite alleles in the diatom Pseudo-nitzschia multistriata. Protist 164, 89–100.
Thomas, J.A., Welch, J.J., Lanfear, R., Bromham, L., 2010. A generation time effect on the rate of molecular evolution in invertebrates. Mol. Biol. Evol. 27, 1173–1180.
Tran, D., Giordano, M., Louime, C., Tran, N., Vo, T., Nguyen, D., Hoang, T., 2014. An Isolated Picochlorum Species for Aquaculture, Food, and Biofuel. North Am. J. Aquac. 76, 305–311.
Uchimura, A., Higuchi, M., Minakuchi, Y., Ohno, M., Toyoda, A., Fujiyama, A., Miura, I., Wakana, S., Nishino, J., Yagi, T., 2015. Germline mutation rates and the long-term phenotypic effects of mutation accumulation in wild-type laboratory mice and mutator mice. Genome Res.
Uphoff, S., Lord, N.D., Okumus, B., Potvin-Trottier, L., Sherratt, D.J., Paulsson, J., 2016. Stochastic activation of a DNA damage response causes cell-to-cell mutation rate variation. Science 351, 1094–1097.
Vandepoele, K., Van Bel, M., Richard, G., Van Landeghem, S., Verhelst, B., Moreau, H., Van de Peer, Y., Grimsley, N., Piganeau, G., 2013. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes. Environ. Microbiol. 15, 2147–2153.
Vassilieva, L.L., Hook, A.M., Lynch, M., 2000. The fitness effects of spontaneous mutations in Caenorhabditis elegans. Evol. Int. J. Org. Evol. 54, 1234–1246.
Vassilieva, L.L., Lynch, M., 1999. The rate of spontaneous mutation for life-history traits in Caenorhabditis elegans. Genetics 151, 119–129.
Von Alvensleben, N., Stookey, K., Magnusson, M., Heimann, K., 2013. Salinity Tolerance of Picochlorum atomus and the Use of Salinity for Contamination Control by the Freshwater Cyanobacterium Pseudanabaena limnetica. PLoS ONE 8, e63569.
Vonlanthen, S., Dauvillée, D., Purton, S., 2015. Evaluation of novel starch-deficient mutants of Chlorella sorokiniana for hyper-accumulation of lipids. Algal Res. 12, 109–118.
Vos, M., Hesselman, M.C., Beek, T.A. te, Passel, M.W.J. van, Eyre-Walker, A., 2015. Rates of Lateral Gene Transfer in Prokaryotes: High but Why? Trends Microbiol. 23, 598–605.
Vraspir, J.M., Butler, A., 2009. Chemistry of marine ligands and siderophores. Annu. Rev. Mar. Sci. 1, 43–63.
Wang, A.D., Sharp, N.P., Agrawal, A.F., 2014. Sensitivity of the Distribution of Mutational Fitness Effects to Environment, Genetic Background, and Adaptedness: A Case Study with Drosophila. Evolution 68, 840–853.
Wang, S., Shi, X., Palenik, B., 2016. Characterization of Picochlorum sp use of wastewater generated from hydrothermal liquefaction as a nitrogen source. Algal Res.-Biomass Biofuels Bioprod. 13, 311–317.
Watson, J.D., Crick, F.H.C., 1953. Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid. Nature 171, 737–738.
Wei, W., Ning, L.-W., Ye, Y.-N., Li, S.-J., Zhou, H.-Q., Huang, J., Guo, F.-B., 2014. SMAL: A Resource of Spontaneous Mutation Accumulation Lines. Mol. Biol. Evol. 31, 1302–1308.
209 Weller, A.M., Rödelsperger, C., Eberhardt, G., Molnar, R.I., Sommer, R.J., 2014. Opposing Forces of A/T-Biased Mutations and G/C-Biased Gene Conversions Shape the Genome of the Nematode Pristionchus pacificus. Genetics 196, 1145–1152. Weller, C., Wu, M., 2015. A generation-time effect on the rate of molecular evolution in bacteria. Evol. Int. J. Org. Evol. 69, 643–652. Wielgoss, S., Barrick, J.E., Tenaillon, O., Wiser, M.J., Dittmar, W.J., Cruveiller, S., Chane-Woon-Ming, B., Médigue, C., Lenski, R.E., Schneider, D., 2013. Mutation rate dynamics in a bacterial population reflect tension between adaptation and genetic load. Proc. Natl. Acad. Sci. U. S. A. 110, 222–227. Willi, Y., Van Buskirk, J., Hoffmann, A.A., 2006. Limits to the Adaptive Potential of Small Populations. Annu. Rev. Ecol. Evol. Syst. 37, 433–458. Winnepenninckx, B., Backeljau, T., De Wachter, R., 1993. Extraction of high molecular weight DNA from molluscs. Trends Genet. TIG 9, 407. Wloch, D.M., Szafraniec, K., Borts, R.H., Korona, R., 2001. Direct estimate of the mutation rate and the distribution of fitness effects in the yeast Saccharomyces cerevisiae. Genetics 159, 441–452. Wolfenden, R., 2014. Massive thermal acceleration of the emergence of primordial chemistry, the incidence of spontaneous mutation, and the evolution of enzymes. J. Biol. Chem. 289, 30198–30204. Worden, A.Z., Follows, M.J., Giovannoni, S.J., Wilken, S., Zimmerman, A.E., Keeling, P.J., 2015. Rethinking the marine carbon cycle: Factoring in the multifarious lifestyles of microbes. Science 347, 1257594. Worden, A.Z., Lee, J.-H., Mock, T., Rouzé, P., Simmons, M.P et al., 2009. Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas. Science 324, 268–272. Worden, A.Z., Nolan, J.K., Palenik, B., 2004. Assessing the dynamics and ecology of marine picophytoplankton: The importance of the eukaryotic component. Limnol. Oceanogr. 49, 168–179. Wright, S., 1932. 
The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proc. Sixth Int. Congr. Genet. 1, 356–366. Wright, S., 1931. Evolution in Mendelian Populations. Genetics 16, 97–159. Wright, S.D., Gillman, L.N., Ross, H.A., Keeling, D.J., 2010. Energy and the tempo of evolution in amphibians. Glob. Ecol. Biogeogr. 19, 733–740.

210 Xu, S., Schaack, S., Seyfert, A., Choi, E., Lynch, M., Cristescu, M.E., 2012. High mutation rates in the mitochondrial genomes of Daphnia pulex. Mol. Biol. Evol. 29, 763–769. Yamamoto, M., Nishikawa, T., Kajitani, H., Kawano, S., 2007. Patterns of asexual reproduction in Nannochloris bacillaris and Marvania geminata (Chlorophyta, Trebouxiophyceae). Planta 226, 917–927. Yamamoto, M., Nozaki, H., Kawano, S., 2001. Evolutionary relationships among multiple modes of cell division in the genus Nannochloris (chlorophyta) revealed by genome size, actin gene multiplicity, and phylogeny. J. Phycol. 37, 106–120. Yamamoto, M., Nozaki, H., Miyazawa, Y., Koide, T., Kawano, S., 2003. Relationship between presence of a mother cell wall and speciation in the unicellular microalga Nannochloris (Chlorophyta)1. J. Phycol. 39, 172–184. Yang, F., Xiang, W., Sun, X., Wu, H., Li, T., Long, L., 2014. A novel lipid extraction method from wet microalga Picochlorum sp. at room temperature. Mar. Drugs 12, 1258–1270. Yau, S., Grimsley, N., Moreau, H., 2015. Molecular ecology of Mamiellales and their viruses in the marine environment. Perspect. Phycol. 83–89. Yoon, H.S., Hackett, J.D., Ciniglia, C., Pinto, G., Bhattacharya, D., 2004. A molecular timeline for the origin of photosynthetic eukaryotes. Mol. Biol. Evol. 21, 809– 818. Zeyl, C., DeVisser, J.A., 2001. Estimates of the rate and distribution of fitness effects of spontaneous mutation in Saccharomyces cerevisiae. Genetics 157, 53–61. Zhang, X.-S., 2012. Fisher’s Geometrical Model of Fitness Landscape and Variance in Fitness Within a Changing Environment. Evolution 66, 2350–2368. Zhu, Y., Dunford, N.T., 2013. Growth and Biomass Characteristics of Picochlorum oklahomensis and Nannochloropsis oculata. J. Am. Oil Chem. Soc. 90, 841– 849. Zhu, Y.O., Siegal, M.L., Hall, D.W., Petrov, D.A., 2014. Precise estimates of mutation rate and spectrum in yeast. Proc. Natl. Acad. Sci. U. S. A. 111, E2310-2318.


Summary

Mutations are the main source of diversity on which selection acts to allow species to adapt. Studying the effect of mutations on survival, and the mutation rate itself, is therefore essential to better understand evolution. Using a mutation accumulation experimental approach, we address these two questions in five green algal models (Ostreococcus tauri, O. mediterraneus, Bathycoccus prasinos, Micromonas pusilla, and Picochlorum RCC4223). We show a decline in fitness over time due to deleterious mutations, and a strong genotype-environment interaction on the effect of mutations. The mutation rate varies at the intra-genomic and inter-specific scales, with two main results: an increase in the mutation rate in non-coding regions, and an increase in the mutation rate with genome size in eukaryotes and with the deviation of the genome from its GC equilibrium. In addition, the assembly and annotation of the genome of a picoalga of the genus Picochlorum make it possible to study the role of horizontal gene transfer in the Chlorophyta.

Abstract

Mutations are the main source of diversity on which selection acts to allow species to adapt. Studying the effect of mutations on survival and estimating spontaneous mutation rates are therefore essential to better understand evolution. Using a mutation accumulation experimental approach, we investigated mutation effects and mutation rates in five green algal models (Ostreococcus tauri, O. mediterraneus, Bathycoccus prasinos, Micromonas pusilla, and Picochlorum RCC4223). We found a decline in fitness over time caused by deleterious mutations, and a significant genotype-environment interaction on the fitness effect of mutations. The mutation rate varies at the inter-specific and intra-genomic scales, with two main results: an increase in the mutation rate in non-coding regions, consistent with transcription-coupled repair, and an increase in the mutation rate with genome size in eukaryotes and with the deviation of genome GC content from its equilibrium value. In addition, a new Picochlorum genome is provided to investigate the role of horizontal gene transfer in the Chlorophyta.
