Génomique des populations et association génotype- phénotype des écotypes de touladi du lac Supérieur

Mémoire

Alysse Perreault-Payette

Maîtrise en biologie Maître ès sciences (M. Sc.)

Québec, Canada

© Alysse Perreault-Payette, 2016

Génomique des populations et association génotype- phénotype des écotypes de touladi du lac Supérieur

Mémoire

Alysse Perreault-Payette

Sous la direction de :

Louis Bernatchez, directeur de recherche Pascal Sirois, codirecteur de recherche

Résumé

L’apparition et le maintien d’écotypes adaptés à différentes niches écologiques, en situation de sympatrie, est régit par une multitude de facteurs. Ceux-ci sont essentiels pour la compréhension des processus évolutifs impliqués mais aussi pour la gestion et la conservation des populations en question. Le touladi (Salvelinus namaycush) est un salmonidé reconnu pour la présence d’écotypes liée à l’utilisation des ressources et de l’habitat à travers l’Amérique du Nord. Un total de quatre écotypes a été décrit vivant dans le lac Supérieur, se différenciant par l’habitat utilisé, l’alimentation, la morphologie ainsi que l’ostéologie. L’objectif principal de la présente étude était de quantifier l’étendue de la différentiation génétique entre les différents sites d’échantillonnage ainsi qu’entre les différents écotypes. Un second objectif était d’identifier des marqueurs potentiellement sous sélection entre les différents écotypes reflétant de possibles adaptations locales. Pour ce faire, un total de 486 individus, représentant les quatre écotypes pour chacun des quatre sites d’échantillonnages, a été génotypé à 6822 SNPs (polymorphisme de nucléotide simple). De plus, des analyses morphométriques ont été effectuées afin de caractériser l’ampleur de la divergence morphologique entre les écotypes à chacun des sites. Les résultats ont montré une différentiation génétique, bien que faible, plus prononcée entre les sites d’échantillonnage qu’entre les écotypes à chacun de ces sites. Des indices indiquant la présence de sélection divergente ont aussi été décelés entre les écotypes ou en association avec des variations morphologiques, dont certains marqueurs représentant des traits importants dans la divergence des différents écotypes. Les résultats de cette étude permettront une meilleure gestion et conservation des populations de touladi du lac Supérieur en plus d’éclairer le choix possible de populations sources pour l’ensemencement des autres Grands Lacs.

iii

Abstract

Understanding the emergence and maintenance of sympatric ecotypes adapted to various trophic niches is a central topic in evolutionary biology, and also has implications for conservation and management. Lake Trout (Salvelinus namaycush) is renowned for the occurrence of phenotypically distinct ecotypes linked to resource and habitat use throughout North America. A total of four ecotypes have been described in Lake Superior that differ in terms of habitat, diet, morphology and osteology. The principal objective of this study was to quantify the extent of genetic differentiation among sampling sites and among ecotypes. The secondary objective was to identify markers potentially under divergent selection among the four ecotypes that may underlie local adaptation. To this end, a total of 486 individuals were genotyped at 6822 SNPs (single nucleotide polymorphism). In addition, these analyses were conducted alongside morphometric analyses to characterise the extent of morphological divergence among ecotypes within each sampling site. Results reveal that overall genetic differentiation is weak and is higher among sites than among ecotypes within each site. Moreover, we found evidence for divergent selection among ecotypes, and in some instances in association with morphological variation. These markers represent ecologically important traits linked to ecotype divergence. Results from this study will benefit management and conservation practices, and will guide the choice of source populations for stocking in the Great Lakes.

iv

Table des matières

Résumé ...... iii Abstract ...... iv Table des matières ...... v Liste des tableaux ...... vii Liste des figures ...... viii Remerciements ...... x Avant-propos ...... xi Introduction ...... 1 L’apparition de nouvelles espèces ...... 1 Parallélisme dans la divergence phénotypique...... 3 Le touladi ...... 4 En Amérique du Nord ...... 4 Dans le lac Supérieur ...... 5 L’origine et le maintien des écotypes ...... 6 Problématique ...... 7 Bouleversement du système aquatique ...... 7 Plan de réintroduction des poissons de fonds ...... 8 Objectifs ...... 9 Chapter I: Investigating the Extent of Parallelism in Morphological and Genomic Divergence among Lake Trout Ecotypes in Lake Superior ...... 10 Résumé ...... 11 Abstract ...... 13 Introduction ...... 14 Methods ...... 17 Results ...... 26 Discussion ...... 42 Acknowledgements ...... 51 Conclusion ...... 52 Analyses morphologiques ...... 52

v

Analyses génomiques et divergence adaptative ...... 53 Différences historiques et impacts anthropogéniques ...... 54 Implications pour la gestion et la conservation ...... 56 Limitations et perspectives futures ...... 57 Bibliographie ...... 59 Annexe 1 ...... 67

vi

Liste des tableaux

Table 1. Sampling site information and consensus analysis of body shape, head shape and visual identification. Number of fish sampled per sampling site (N), year of collection and coordinates is provided. Ecotypes were identified by consensus analysis of body shape (B) and/or head shape (H) and/or visual identification (V). Fish for subsequent genetic analysis were chosen based on ecotype consensus. Fish less than 430 mm long were removed prior to the analysis………………………………………………………………………...... ….19

Table 2. Models and parameters selected to assign an ecotype to each fish from the four sites separately. The number of principal components used and the corresponding cumulative percentage of explained variance are in brackets. Models chosen and parameterisation of the covariance matrix are either identified as EII (spherical distribution, equal volume, and equal shape), VEI (diagonal distribution, variable volume, and equal shape), VII (spherical distribution, variable volume, equal shape) or EEI (diagonal distribution, equal volume, and equal shape). The number of groups found (G), the corresponding bayesian information criterion (BIC) and mean uncertainty are also listed for each model….…....29

Table 3. MANOVA on body and head shape to investigate the effect of the ecotype, the site of origin, the sex and all interactions. Significant variables are in bold……………..…..…...30

Table 4. Between-group PCA analysis of group distance among ecotypes within and among sites with 10000 permutation for body (below diagonal) and head (above diagonal) shape. Ecotypes are: Siscowet (FT), Humper (HT), Lean (LT) and Redfin (RF). Group distances with significant p-value (< 0.05) are followed by an asterix (*). Within sites analyses are highlighted in gray and significant between site differences for the same ecotype are in bold. ………………………………………………………………………………..….….31

Table 5. Number of SNPs remaining after each filtration step. Allelic imbalance corresponds to the ratio of the number of sequences for the major allele on the number of sequences for the minor allele……………………………………………………………………….….....33

Table 6. Population statistics estimated with 6822 SNPs: the observed heterozygosity (Ho), the expected heterozygosity (He), the inbreeding coefficient (Gis), the effective population size (Ne) and confidence interval in brackets, and the number of polymorphic loci (N). Genome- wide diversity () and the increase in individual homozygosity relative to mean Hardy- Weinberg expected homozygosity (Fh) was calculated on the dataset prior to filtration. Effective population size for ecotypes with sample size < 15 individuals were not calculated (NA)…………………………………………………………………..………...35

Table 7. Analysis of molecular variance (AMOVA) on 486 individuals and 6822 SNPs. Missing data has been replaced by random picking in the overall pool of allele frequency…...….36

vii

Liste des figures

Figure 1. Map of Lake Superior sampling sites; Isle Royale, Superior Shoals, Stannard Rock and Big Reef. Circles correspond to sampling locations for each site………………….…….18

Figure 2. Landmark and semi-landmark positions digitized on fish body and head. Homologous landmarks are represented by red dots and semi-landmarks by black dots. (a) Sixteen homologous landmarks and four semi-landmarks were placed on each fish body; (1) tip of the snout, (2) posterior tip of the maxillary, (3) center of the eye, (4) top of the head above the center of the eye, (5) posterior of neurocranium above the edge of opercula, (6) anterior insertion of dorsal fin, (7) posterior insertion of dorsal fin, (8) anterior insertion of adipose fin, (9) dorsal insertion of caudal fin, (10) midpoint of hypural plate, (11) ventral insertion of caudal fin, (12) posterior insertion of anal fin, (13) anterior insertion of anal fin, (14) anterior insertion of pelvic fin, (15) 50% of body length, (16) 40% of body length, (17) 30% of body length, (18) 20% of body length, (19) anterior insertion of pectoral fin, (20) connection (isthmus) between branchiostegals. (b) Eight homologous landmarks and twenty semi-landmarks were placed on each fish head; (1) tip of the snout, (2-11) ten equally spaced semi-landmarks, (12) posterior edge of the opercula, (13-22) ten equally spaced semi-landmarks, (23) tip of the lower jaw, (24) posterior tip of the maxillary, (25) anterior edge of the eye, (26) center of the eye, (27) posterior edge of the eye, (28) insertion of pectoral fin……………………………….…………………………………………21-22

Figure 3. Between-group PCA on partial warps of 501 Lake Trout. (a) First and second principal components for head shape representing 56.2% and 15.9% of the variance respectively distinguishing the four ecotypes. (b) First and second principal components for body shape representing 65.5% and 14.5% of the variance respectively that distinguish leans from siscowets based mainly on belly curvature. (c) Third and fourth principal component for head shape representing 11.5% and 7.2% of the variance respectively distinguishing the four sites. (d) Third and fourth principal components for body shape representing 8.8% and 3.0% of the variance respectively distinguishing the four sites. The colored points refer to the mean scores for each ecotype in each site. The sites are: Big Reef (black), Isle Royale (blue), Stannard Rock (red) and Superior Shoals (green). Ecotypes are: Siscowet (FT), Humper (HT), Lean (LT) and Redfin (RF). Under each ecotype are drawn the consensus shapes of all four ecotypes (gray) with the outline of the ecotype in question (black). The shaded ellipses have been drawn for clarity. ……………………………………………..32

Figure 4. Population structure analysis of Lake Trout; a) Admixture plot based on 486 individuals and 6822 SNPs (including outliers) for different values of K. Individuals are shown by sites and ecotypes. b) Neighbour joining tree based on 486 individuals and 6822 SNPs including outliers. Yellow circles represent Big Reef, orange circles Stannard Rock, blue circles Isle Royale and green circles Superior Shoals. Bootstrapping support is indicated on each branch. The four ecotypes are represented for each site; Lean (LT), Humper (HT), Redfin (RF) and Siscowet (FT).…………………………………………………..………………38

Figure 5. Assignment success of individuals to their sampling sites (a) or ecotypes (b). Percentage assignment is written below circles with the exact number of individuals assigned within brackets. Percentage of correct assignment to either sampling sites or ecotypes is in bold.

viii

Sites are: Big Reef (BR), Isle Royale (IR), Stannard Rock (SR), Superior Shoals (SS). Ecotypes are: Humper (HT), Siscowet (FT), Lean (LT) and Redfin (RF).…………..….39

Figure 6. Venn diagrams of outliers detected by LFMM and BAYESCAN among sites, ecotypes or among ecotypes within sites a) Outliers detected among the four sites by BAYESCAN and LFMM including outliers detected by LFMM using morphological PC scores. b) Outliers detected among the four pooled ecotypes by BAYESCAN and LFMM including outliers detected by LFMM using morphological PC scores. c) Outliers among ecotypes within each site detected by LFMM…………………………………...………………….…….41

ix

Remerciements

Un goût certain pour tout ce qui a trait à la nature a toujours été présent en moi, ainsi qu’un intérêt particulier pour la vie aquatique sous toutes ses formes. Par conséquent, les métiers en relation avec la vie sauvage m’ont toujours attirée et c'est pourquoi j’ai choisi d’étudier dans le domaine de la biologie. C’est également la raison principale de ma venue au Laboratoire du Dr. Louis Bernatchez, titulaire de la chaire de recherche du Canada en génomique et conservation de ressources aquatiques.

J’aimerais le remercier pour m’avoir donné la chance de poursuivre ma passion dans son laboratoire, de même que tous les étudiants, stagiaires et professionnels de recherches présents, sans qui ce projet n’aurait pu être mené à terme. Plus précisément, j’aimerais mentionner, pour leur précieuse collaboration, Cécilia Hernandez pour la partie « laboratoire » du projet, mes collègues de bureau, dont Simon Bernatchez, ainsi que Charles Perrier, Martin Laporte, Jean-Sébastien Moore, Anne-Laure Ferchand, Clément Rougeux, Laura Benestan, Thierry Gosselin et Éric Normandeau pour leurs aides et conseils quant à l’analyse et l’interprétation des résultats.

Ce projet n’aurait pu prendre vie sans le concours et le soutien de la Commission des Pêcheries des Grands Lacs, et tout particulièrement, sans celui de monsieur Andrew Muir. Toute ma reconnaissance à C. Krueger, C. Bronte, M. Hansen et tous les membres de l’équipage du Kiyi pour leur aide lors de l’échantillonnage sur le lac Supérieur. Et c’est sans oublier la collaboration de monsieur Frederick Goetz, de l'Agence américaine d'observation océanique et atmosphérique sans qui l’étude du site « Isle Royale » n’aurait pas été possible.

Finalement, j’aimerais exprimer ma gratitude à ma mère et mon conjoint qui, malgré bien des embûches, ont toujours cru en moi et m’ont encouragée dans cette aventure.

Sur ce, je vous souhaite une bonne lecture.

x

Avant-propos

Vous trouverez, au cœur de ce mémoire, un article intitulé « Investigating the Extent of Parallelism in Morphological and Genomic Divergence among Lake Trout Ecotypes in Lake Superior ». Cet article a été accepté avec corrections mineures par la revue « Molecular Ecology ».

Les co-auteurs sont messieurs Andrew Muir, Frederick Goetz, Charles Perrier et Éric Normandeau, mon co-directeur Pascal Sirois et mon directeur monsieur Louis Bernatchez.

Ont effectué, en grande majorité, le travail de terrain, messieurs Andrew Muir et Frederick Goetz. Ont participé aux analyses, messieurs Éric Normandeau et Charles Perrier, ce dernier ayant aussi travaillé à l’interprétation des résultats. Pour ma part, j’ai réalisé le travail en laboratoire, rédigé l’article, effectué les analyses et interpréter les résultats. De plus, tous les co-auteurs ont collaboré à la révision et l’amélioration du manuscrit.

Ce projet a été rendu possible grâce à la Commission des Pêcheries des Grands Lacs (GLFC) ainsi qu’au regroupement Ressources Aquatiques Québec (RAQ).

xi

Introduction

L’apparition de nouvelles espèces

La biodiversité, telle qu’on la connaît aujourd’hui, est le fruit de plusieurs processus évolutifs agissant de concert ou de façon individuelle sur des individus ou des populations. La nature de ces forces soit la mutation, la dérive génétique, la migration, la sélection ainsi que leurs modes d’action, a toujours été au centre de l’étude de l’évolution. On reconnaît maintenant leur importance dans plusieurs autres domaines tels que la gestion et la conservation des ressources. Le processus de spéciation est connu comme étant un moteur de diversité par lequel une espèce se divise en deux ou plusieurs espèces distinctes (Hendry 2009; Weissing et coll. 2011). L’apparition de ces nouvelles espèces est causée par plusieurs forces agissant comme barrières aux flux de gènes entre ces taxons naissants (Turelli et coll. 2001; Hendry 2009). Ces forces prennent la forme d’isolements reproducteurs agissant à plusieurs étapes du cycle de vie, soit avant la reproduction (pré-zygotique) ou directement sur les hybrides produits (post-zygotique). Les barrières pré-zygotiques peuvent être de types écologiques, comportementales, mécaniques ou gamétiques (Schluter 2001; Turelli et coll. 2001) tandis que celles post-zygotiques découlent de la stérilité ou la mortalité des hybrides, ou de leur valeur sélective plus faible en comparaison des parents (Butlin et coll. 2012; Turelli et coll. 2001).

La spéciation est le plus souvent considérée comme un processus graduel où l’accumulation de différences génétiques amène un isolement reproductif (Hendry et coll. 2009). C’est pourquoi, la spéciation est représentée comme un continuum, où à l’une des extrémités, on retrouve des populations présentant des variations adaptatives continues sans aucun isolement reproductif et, à l’autre extrémité, des populations présentant des différences adaptatives accompagnées par un isolement reproductif irréversible (Hendry et coll. 2009). On retrouve donc des populations à différents stades, dits intermédiaires, de ce continuum.

1

La compréhension des facteurs menant à un isolement reproductif et, plus spécifiquement à son niveau d’achèvement, est primordial tant au niveau évolutif qu’au niveau de la conservation (Seehausen et coll. 2008). Par exemple, le niveau d’achèvement de l’isolement reproductif peut déterminer si, lorsque remise en contact, deux populations ou espèces vont fusionner en une seule et même espèce et ainsi perdre les adaptations acquises (Seehausen et coll. 2008).

On retrouve traditionnellement trois modes de spéciation selon le contexte spatial : l’allopatrie, la parapatrie et la sympatrie. L’utilisation et la véracité de ce classement a souvent fait l’objet de débats et de remises en question, mais est encore en vigueur aujourd’hui (Butlin et coll. 2008; Fitzpatrick et coll. 2009). La forme la plus acceptée est celle dite allopatrique, où les populations ou espèces sont séparées par des barrières géographiques empêchant ou réduisant de manière significative la dispersion et, par conséquent, l’échange de gènes (Mallet et coll. 2009). Ce faisant, les populations accumulent des différences génétiques par dérive, ou par des pressions de sélection différentes selon les habitats occupés (Turelli et coll. 2001). Lors d’un contact secondaire, l’accumulation de ces différences pourra mener à un isolement reproductif empêchant complètement ou diminuant le flux génique. Pour sa part, la spéciation dite en parapatrie inclut un certain niveau d’échange de gènes entre les populations seulement à des zones de contacts ou d’hybridation (Butlin et coll. 2012). À l’autre extrémité, on retrouve la spéciation dite en sympatrie où les populations occupent une même région géographique et où tous les individus ont la capacité de se rencontrer physiquement plus ou moins fréquemment (Mallet et coll. 2009). Ce dernier mode de spéciation est causé par la présence d’une sélection divergente liée aux facteurs biotiques et abiotiques présents (Turelli et coll. 2001). Ces facteurs pourraient prendre la forme d’une divergence écologique associée à l’utilisation des ressources et de l’habitat (Blackie et coll. 2003; Hendry 2009).

La présence d’une sélection divergente sur des traits entre populations utilisant différentes ressources et menant à l’isolement reproductif est souvent appelée spéciation écologique (Hendry 2009; Schluter 2016; Gavrilets et coll. 2007). Cette forme de spéciation cause l’apparition d’un certain isolement pré-zygotique sous la forme de sélection sexuelle ou de

2

ségrégation temporelle, et post-zygotique lié à la sélection contre les formes intermédiaires quant à l’exploitation des ressources et à leur compétition (Hendry 2009).

La présence d’un ou de plusieurs phénotypes distincts dans une population utilisant différentes ressources est appelée le polymorphisme des ressources (Wimberger 1994; Skúlason et Smith 1995). Ce polymorphisme peut prendre la forme de différences morphologiques marquées, de différences comportementales ou de traits d’histoire de vie (Skúlason et Smith 1995). On retrouve souvent le polymorphisme des ressources dans un contexte de niche inoccupée et de diminution dans la compétition interspécifique (Skúlason et Smith 1995). Ces conditions, présentes dans les lacs lors du retrait des glaciers, ont été proposées comme moteur dans la radiation adaptative observée chez les poissons d’eau douce en Amérique du Nord. Cette radiation se traduit par la différentiation d’une espèce ancestrale en une série d’espèces adaptées à différentes niches écologiques (Schluter 2001). Un des exemples les plus connus est l’évolution répétée des formes limnétiques et benthiques retrouvées en sympatrie chez plusieurs espèces de poissons telles que le corégone (Coregonus), l’épinoche à trois épines (Gasterosteus aculeatus), et le crapet-soleil (Lepomis gibbosus) et le crapet arlequin (Lepomis macrochirus) (Jonsson 2001; Wimberger 1994; Skùlason et Smith 1995).

Parallélisme dans la divergence phénotypique

Le parallélisme phénotypique, soit l’apparition répétée de phénotypes similaires adaptés à certaines conditions environnementales, a été observé chez plusieurs vertébrés (Elmer et Meyer 2011; Arendt et Reznick 2008). Les outils génomiques modernes permettent maintenant de vérifier si ce parallélisme au niveau phénotypique est suivi d’une base génétique parallèle sous-jacente. Ce parallélisme génétique peut découler du polymorphisme génétique ancestral présent chez l’espèce ou prendre la forme de nouvelles mutations (Elmer et Meyer 2011). Le processus d’adaptation à partir du polymorphisme génétique ancestral est considéré comme plus probable et plus rapide que la mutation pour trois raisons. Premièrement, les allèles sont déjà disponibles, deuxièmement ils sont présents à une plus

3

haute fréquence dans la population, et troisièmement, étant déjà présents, les allèles ont été testés par la sélection et dès lors sont compatibles avec le reste du génome (Barrett et Schluter 2008; Elmer et Meyer 2011). Plusieurs exemples de parallélisme phénotypiques associés à une base génétique similaire ont été documentés chez la souris (Peromyscus polionotus), l’épinoche (Gasterosteus aculeatus), le corégone (Coregonus), les cichlidés et le tétra aveugle (Astyanax mexicanus). Par contre, pour les mêmes espèces énumérées précédemment, une ou plusieurs populations affichant le même phénotype ne montraient pas de parallélisme au niveau génétique et, par conséquent, le phénotype exprimé serait causé par un changement moléculaire différent au courant du développement (Elmer et Meyer 2011; Arendt et Reznick 2008). On peut donc observer au sein d’une même population différente routes évolutives menant à un même phénotype.

Le touladi

En Amérique du Nord

Le touladi est un salmonidé répandu en Amérique du Nord dont la distribution suit les limites de la calotte glacière du Wisconsin, où suivant le retrait des glaciers, il a colonisé les plans d’eau de toutes tailles (Muir et coll. 2015; Wilson et Hebert 1996; Eshenroder 2008). Les populations présentes en Amérique du Nord, proviennent de plusieurs refuges glaciaires, soit ceux de l’Atlantique, du Béringien, du Mississippi, du Montana et du Nahanni (Wilson et Hebert 1998). Ce poisson d’eau douce est reconnu pour sa diversité, comparable à celle de l’Omble chevalier, tant au niveau de la morphologie, des traits d’histoire de vie que de la physiologie et de l’écologie (Muir et coll. 2015). L’apparition répétée de formes adaptées à l’eau peu profonde ou ayant des adaptations liées à un mode de vie en eau profonde a été répertoriée dans plusieurs lacs nord-américains (Muir et coll. 2015). Par exemple, on retrouve deux écotypes dans le lac Mistassini et le lac Rush, dont l’un adapté à la vie en eau peu profonde et l’autre aux eaux profondes (Zimmerman et coll. 2007; Muir et coll. 2015). Il est à noter que dans des lacs de plus grande superficie, on peut retrouver jusqu’à quatre

4

écotypes différents. Par exemple, jusqu’à quatre écotypes adaptés aux eaux peu profondes ont été trouvé au Grand lac de l’Ours (Harris et coll. 2014; Muir et coll. 2015), tandis qu’au Grand lac des Esclaves, on retrouve un écotype adapté aux eaux peu profondes ainsi que deux adaptés aux eaux profondes (Zimmerman et coll. 2006; Muir et coll. 2015).

Dans le lac Supérieur

Dans le lac Supérieur, on retrouve quatre écotypes de touladi : le « lean » dit maigre, le « siscowet » dit gras, le « humper » dit bossu et le « redfin » (Muir et coll. 2014). Ces écotypes diffèrent principalement au niveau de l’habitat, de l’alimentation, de la morphologie et dans certains traits d’histoire de vie, tels que la croissance et la taille à la maturité (Muir et coll. 2015). Le maigre est le seul écotype exploité par la pêche commercial et récréative en raison du faible taux de lipides présent dans ses tissus (Bronte et Sitar 2008). Cet écotype est décrit comme ayant une forme allongée, un museau droit et pointu ainsi qu’un long pédoncule caudal lui permettant de nager de manière prolongée (Burnham-Curtis & Smith 1994; Moore & Bronte 2001; Zimmerman et coll. 2006). On le retrouve en eaux peu profondes (< 100 m) près des côtes où il se nourrit de petites espèces de poissons pélagiques ou de fond tels que l’éperlan arc-en-ciel (Osmerus mordax), le cisco de lac (Coregonus artedi) et le chabot visqueux (Cottus cognatus) (Goetz et coll. 2011; Ray et coll. 2007; Zimmerman et coll. 2006). À l’opposé, l’écotype gras est le plus abondant et on le retrouve principalement en eaux profondes (> 100 m) au large des côtes (Bronte et coll. 2003; Ray et coll. 2007; Goetz et coll. 2011). Cet écotype est décrit comme ayant un corps robuste et épais, une tête plutôt petite, un museau court et incliné ainsi que de grands yeux. Il est aussi reconnu pour avoir un pédoncule caudal épais et long ainsi que de grandes nageoires lui permettant d’effectuer de soudaines poussées de vitesse pour attraper ses proies. De plus, l’écotype gras tient son nom du niveau élevé de lipides dans ses viscères lui donnant une flottabilité plutôt neutre, et par le fait même, facilitant les migrations verticales dans la colonne d’eau à la poursuite de ses proies. Celles-ci sont composées principalement de corégoninés (corégones, ciscos, ménominis), de chabots de profondeur (Myoxocephalus thompsoni) et de lottes (Lota lota) (Goetz et coll. 2011; Ahrenstorff et coll. 2011; Hrabik et

5

coll. 2014; Bronte et coll. 2003; Bronte and Sitar 2008; Burnham-Curtis and Smith 1994; Hansen et coll. 2012). Habitant les hauts-fonds au large des côtes, l’écotype bossu est peu répandu dans le lac Supérieur (Zimmerman et coll. 2006). On le reconnaît à sa petite taille, à sa petite tête ayant un museau incliné et de grands yeux, à une paroi abdominale plutôt mince ainsi qu’à un niveau de lipides intermédiaire (Bronte et coll. 2003; Goetz et coll. 2011; Moore et Bronte 2001). C’est le seul écotype qui ne soit pas piscivores à l’âge adulte, se nourrissant principalement de Mysis diluviana (Stafford et coll. 2014). Finalement, l’écotype « redfin » a été décrit récemment près de l’Isle Royale dans la partie ouest du lac Supérieur (Muir et coll. 2014). Il est décrit comme ayant un corps robuste, une tête, un museau et des yeux larges, un pédoncule caudal long et épais ainsi que de grandes nageoires (Muir et coll. 2014). On le retrouve en eaux de profondeur moyenne (~ 80 m) et il semble avoir un niveau de lipides semblable et même plus élevé que l’écotype gras (Muir et coll. 2014; Hansen et coll. 2016). De plus, il est de plus grande taille, plus lourd, plus longévif et a une croissance plus lente que les autres écotypes à l’Isle Royale (Hansen et coll. 2016).

L’origine et le maintien des écotypes

Plusieurs hypothèses ont été avancées pour expliquer la présence de ces écotypes dans le lac Supérieur (Wilson et Mandrak 2004; Eshenroder 2008). La première est la plasticité phénotypique induite par l’environnement qui se traduit par l’apparition de plusieurs phénotypes à partir d’un seul génotype selon des variations biotiques ou abiotiques. La deuxième hypothèse sous-entend la présence d’une base génétique entre les différents écotypes, et donc d’un certain isolement reproductif (Goetz et coll. 2010). Plusieurs évidences pointent vers la deuxième hypothèse, soit la présence d’une base génétique, pour expliquer l’apparition et le maintien des écotypes. En premier lieu, les écotypes (maigre, bossu, et gras) diffèrent au niveau de certains os crâniens, plus spécifiquement au niveau de la forme de l’opercule, du supraethmoïde ou du dermethmoïde. Des variations environnementales ou ontogéniques sont peu plausibles pour expliquer les différences morphologiques observées avec ces os (Burnham-Curtis et Smith 1994). De plus, la progéniture des écotypes maigres et gras a été élevée en milieu contrôlé et retenait la plupart des traits distinctifs les séparant (Goetz et coll. 2010). La même étude a aussi décelé des

6

différences au niveau de l’expression de certains gènes, entre ces écotypes, associés au métabolisme des lipides. Aussi, deux études utilisant des microsatellites ont montré, bien que faiblement, une plus grande variance liée aux écotypes qu’aux localités (Page et coll. 2004; Guinand et coll. 2012). Par contre, plusieurs études ont démontré un schéma contraire, soit une plus grande variance liée aux sites d’échantillonnage qu’aux écotypes (Dehring et coll. 1981; Ihssen et coll. 1988; Baillie et coll. 2016). L’homogénéisation génétique et phénotypique des écotypes de touladi, vivant dans le lac Supérieur, a été avancée pour expliquer la présence d’une plus grande différentiation génétique entre les sites d’échantillonnage qu’entre les écotypes (Baillie et coll. 2016). En effet, en étudiant des échantillons préeffondrements et contemporains, Baillie et collaborateurs (2016) ont identifié un regroupement génétique par écotypes plus importants chez les individus préeffondrements que chez les individus contemporains qui se regroupent majoritairement par site d’échantillonnage. La surpêche, l’ensemencement intensif et l’introduction d’espèces invasives ont été énoncés pour expliquer le plus grand échange de gènes, observé aujourd’hui, entre les différents écotypes (Baillie et coll. 2016).

Problématique

Bouleversement du système aquatique

Jusqu’à dix variétés de touladis ont été décrites de manière anecdotique par les pêcheurs du lac Supérieur, selon des distinctions au niveau de l’apparence physique, de l’habitat et du comportement. Celles-ci étaient exploitées commercialement, tout particulièrement la forme maigre dû à son faible taux en lipides (Goodier 1981; Bronte et Sitar 2008). Dans les années 1960, la pêche s’est effondrée et le touladi a disparu des Grands Lacs à l’exception du lac Supérieur et d’une partie du lac Huron (Bronte et Sitar 2008). Cet effondrement a été causé principalement par l’activité humaine, dont la surpêche, la dégradation de l’habitat et l’introduction d’espèces envahissantes notamment la lamproie marine (Bronte et Sitar 2008; Page et coll. 2003; Page et coll. 2004). En ce qui a trait aux salmonidés, six espèces ont été introduites dont la truite brune et arc-en-ciel, ainsi que les saumons chinook, coho, rose et

7

atlantique. Elles ont été introduites afin de diversifier et d’augmenter la pêche sportive, mais ont eu pour effet de changer la dynamique des poissons-fourrages ainsi que de leurs prédateurs (Bronte et coll. 2003). Le contrôle de la lamproie marine et l’ensemencement intensif ont permis de restaurer les stocks de touladis dans le lac Supérieur, mais bien en deçà des niveaux historiques (Bronte & Sitar 2008). Malgré plusieurs efforts de restauration dans les autres Grands Lacs, aucune population reproductrice autonome n’a pu être créée dans ces derniers (Page et coll. 2003).

Plan de réintroduction des poissons de fonds

Le rétablissement des poissons de fonds tels que le touladi dans les Grands Lacs est encore incertain et a fait l’objet d’un article par deux biologistes de la Commission des Pêcheries des Grands Lacs (Zimmerman et Krueger 2009). Ceux-ci ont examiné plusieurs facteurs pouvant freiner ou empêcher son rétablissement dans les Grands Lacs. À cet effet, quatre sujets de recherche ont été désignés comme capitaux, soient : − une plus grande compréhension des variations spatiales et temporelles liées au recrutement, − l’identification de la ou des sources de mortalité des premiers stades de vie, − la comparaison de l’histoire de vie et de l’écologie de l’écotype maigre avec ceux des autres écotypes, et − la détermination du niveau de différentiation génétique entre les écotypes et les populations. La diversité génétique est associée au potentiel évolutif d’une espèce face aux changements (Toro et Caballero 2005). C’est ainsi que le maintien d’une grande diversité génétique, ainsi qu’une grande diversité phénotypique, augmenterait la résilience et la résistance des espèces face aux changements dans l’environnement. Donc, l’établissement d’une combinaison de populations adaptées à divers habitats devrait augmenter la résilience du touladi dans un même plan d’eau (Zimmerman et Krueger 2009).

L’utilisation d’outils génomiques de pointe ainsi que des approches pangénomiques permettrait d’augmenter la résolution génétique entre les différents écotypes, et ce faisant de

8

préciser les mécanismes soutenant l’origine et le maintien des écotypes de touladi dans le Lac Supérieur. De plus, un échantillonnage à plusieurs sites connus pour l’abondance de cette espèce permettrait de déterminer la connectivité génétique à travers le lac, mais aussi de vérifier la présence de variabilité génétique liée à la localité.

Objectifs

L’objectif principal de cette étude était d’approfondir les connaissances quant à l’origine et l’ampleur de la différentiation génétique des différents écotypes de touladi dans une optique de gestion et de conservation. Afin d’atteindre cet objectif, deux cibles spécifiques ont été identifiés soit : (i) de documenter et comparer l’étendue de la structure et de la connectivité génétique entre les différents écotypes et entre les différents sites d’échantillonnage et (ii) d’identifier de possibles divergences adaptatives incluant les associations génotype- phénotype entre les différents écotypes.

9

Chapter I: Investigating the Extent of Parallelism in Morphological and Genomic Divergence among Lake Trout Ecotypes in Lake Superior

Chapitre I: Évaluation de l’étendue du parallélisme dans la divergence morphologique et génomique des écotypes de touladi du lac Supérieur

10

Résumé

Comprendre le processus de spéciation écologique par lequel de nouvelles espèces émergent est au centre de la biologie évolutive, mais il est tout aussi important en ce qui a trait à la conservation et la gestion. Le touladi (Salvelinus namaycush) est reconnu pour la présence d’écotypes associés à l’utilisation de l’habitat et des ressources à travers l’Amérique du Nord. Nous avons utilisé le séquençage de nouvelle génération afin de définir la structure génétique fine des quatre écotypes de touladis décrits dans le lac Supérieur, le plus grand lac en Amérique du Nord. Pour ce faire, 486 individus provenant des quatre écotypes présents en sympatrie à quatre sites d’échantillonnage ont été génotypés à 6822 SNPs en utilisant la technologie de séquençage « RAD ». De plus, les traits phénotypiques de tous les individus séquencés ont été documentés sous la forme d’analyses morphométriques conjointes de la forme de la tête et du corps ainsi qu’une identification visuelle. Les résultats obtenus ont révélé la présence de différents niveaux de différentiation morphologique et génétique à l’intérieur des différents sites. De manière générale, la différentiation génétique était faible, mais significative et était en moyenne près de trois fois plus prononcées entre les sites (FST

Moyen = 0.016 [0.012; 0.021]) qu’entre les écotypes à chacun de ces sites (FST Moyen= 0.005 [0.004; 0.007]). Ceci indique un flux de gènes plus important ou un ancêtre commun plus récent à l’intérieur de chaque site, qu’entre les populations du même écotype. Malgré, le peu d’indication quant à la présence d’une origine commune des différentes populations appartenant à chaque écotype, des individus correspondant à la description des différents écotypes ont été identifiés à chacun des sites, ce qui suggère une convergence dans les traits phénotypiques. Des indices de la présence de sélection divergente ont été trouvés entre les écotypes ou en association avec des variations morphologiques. En effet, plusieurs marqueurs associés au métabolisme des lipides ainsi qu’à l’acuité visuelle sont particulièrement intéressants dans un contexte de divergence entre les écotypes. Globalement, la présence de différents niveaux de différentiation génomique entre les écotypes à chacun des sites ainsi que la présence de loci différentiés associés à des fonctions

11

biologiques d’intérêt supportent un scénario de divergence répété des différents écotypes en sympatrie, à l’intérieur de chaque localité.

12

Abstract

Understanding the emergence of species through the process of ecological speciation is a central question in evolutionary biology which also has implications for conservation and management. Lake Trout (Salvelinus namaycush) is renowned for the occurrence of different ecotypes linked to resource and habitat use throughout North America. We used next generation sequencing to unravel the fine genetic structure of the four Lake Trout ecotypes described in Lake Superior. A total of 486 individuals from four sites where the four ecotypes occur in sympatry were genotyped at 6822 filtered SNPs using RADseq technology. Phenotypic traits of all sequenced fish were documented using a combination of morphometric analyses. Our results revealed different extent of morphological and genetic differentiation within the different sites. Overall, genetic differentiation was weak but significant and was on average three times higher between sites (Mean FST = 0.016) than between ecotypes within sites (Mean FST = 0.005) indicating higher level of flow and/or a more recent shared ancestor between ecotypes within each site than between populations of the same ecotype. Evidence of divergent selection was also found between ecotypes and/or in association with morphological variation. Outlier loci found in gene related to lipid metabolism and visual acuity were of particular interest in this context of ecotypes divergence. Overall, the occurrence of different levels of both genomic and phenotypic differentiation between ecotypes within each site with several differentiated loci linked to relevant biological functions support the presence of a continuum of divergence in Lake Trout.

13

Introduction

The study of diversification and ultimately speciation is central to evolution and relevant for conservation biology (Weissing et al. 2011). The most common and established mechanism of speciation is divergence in allopatry, where spatial and geographical barrier prevent gene flow, thus allowing genetic incompatibilities to accumulate, subsequently resulting in reproductive isolation following secondary contact (Coyne and Orr 2004; Tittes and Kane 2014). Some examples of allopatric isolation mechanisms in fishes include the glacial cycles in North America responsible for the origin of many freshwater species (April et al. 2013), the rise and fall of Lake Tanganyika, and the barriers created by high water flow in large rivers such as the Amazon or Congo River (reviewed in Bernardi 2013). However, a geographic barrier is not always needed and speciation can emerge in sympatry, or in parapatry despite high gene flow, by divergent selection on ecologically important traits (Tittes and Kane 2014; Gavrilets et al. 2007). Divergent selection on ecological traits can be caused by biotic and abiotic influences where adaptations to different environment or ecological niches result in the emergence of reproductive incompatibilities (Bernardi 2013; Nosil et al. 2009). The latter may create a continuum of divergence from continuous variation within a single gene pool, to ecotype formation and finally to complete differentiation and reproductive isolation (Lu and Bernatchez 1999; Nosil et al. 2009; Hendry 2009). Models and case studies have shown that sympatric speciation is possible under gene flow when few loci underlying the divergent trait undergo strong selection, whereas gene flow homogenises the rest of the genome (Gavrilets et al. 2007; Franchini et al. 2013).

Ecological speciation has been extensively documented in several geologically young fish species living in sympatry. For instance sympatric speciation has occurred in Midas cichlids (Amphilophus spp.) (Franchini et al. 2013), Lake Victoria cichlids (Wagner et al. 2013) but more commonly in several temperate freshwater fishes such as stickleback (Gasterosteus spp.), smelt (Osmerus spp.) and mainly in salmonids such as whitefish (Coregonus spp.), trout (Salmo spp.), Pacific salmon (Oncorhynchus spp.) and charrs (Salvelinus spp.) (Taylor 1999; Jonsson and Jonsson 2001). Sympatric speciation is usually linked to trophic

14

polymorphism in which intra-specific ecotypes use different habitats and resources (Blackie et al. 2003; Hansen et al. 2012; Smith and Skúlason 1996). Trophic polymorphism is common in post-glacial lakes where retreat of the ice sheet creates unoccupied niches and opportunities for intra-specific competition (Blackie et al. 2003; Zimmerman et al. 2009). These conditions are believed to be responsible for the extensive radiation in North American freshwater fishes where several species are adapted to different ecologic niches (Schluter 2001). Parallel evolution of shared phenotypic traits linked to trophic resource use has been demonstrated in several postglacial systems. These shared morphological traits between populations can be accompanied by shared genetic architecture underlying the ecologically important traits or can arise from independent genetic processes (Ralph and Coop 2014). For example, the repeated divergence of marine and freshwater stickleback exhibiting similar phenotypic changes in body armour have been described and the repeated reduction in armour plates was found to be controlled by the same set of loci (Colosimo et al. 2005; Jones et al. 2012). On the other hand, convergent phenotypic traits may not always be controlled by similar developmental pathways as is the case for cavefish (Astyanax spp.), beach mice (Peromyscus polionotus), and fruit fly (Drosophila spp.) (reviewed in Arendt and Reznick 2008; Bernatchez (submitted)). For instance, the evolution of parallel phenotypic divergence between benthic normal and limnetic dwarf whitefish (Coregonus spp.) in several North American lakes was found to be only partially associated with parallelism at the genome level (Gagnaire et al. 2013; Laporte et al. 2015).

Lake Trout (Salvelinus namaycush) are renowned for the occurrence of different ecotypes linked to resource and habitat use throughout North America. In small lakes, Lake Trout diverged mainly into a planktivorous and piscivorous ecotype (Vander Zander et al. 2000), whereas several large lakes harbor at least four ecotypes associated with differential resource partitioning (Muir et al. 2015). For instance, four different ecotypes occur in Great Bear Lake and Lake Superior, three in Great Slave Lake and two in Lake Mistassini and Rush Lake (Muir et al. 2015). In Lake Superior, four distinct ecotypes have been reported that are recognized based on differences in morphology and coloration but also in life history traits, physiology and ecology (Muir et al. 2015). For instance, they differ in traits such as growth rate, asymptotic length and weight, size at sexual maturity, as well as developmental rate of

15

fertilized eggs or fry. They also differ in physiology such as buoyancy and swim bladder retention (Muir et al. 2015; Hansen et al. 2016). The ‘lean’ ecotype has a slender, streamlined body with low body lipid content, and occupies shallow waters where it preys upon pelagic fishes (Bronte et al. 2003; Goetz et al. 2011; Burnham-Curtis and Smith 1994; Moore and Bronte 2001; Zimmerman et al. 2009). The ‘humper’ ecotype inhabits offshore, mid-water shoals, feeds on small prey and is sexually mature at relatively smaller sizes (< 500 mm) (Stafford et al. 2014; Burnham-Curtis and Smith 1994; Hansen et al. 2016). It also has a small head with moderately large eyes dorsally positioned and short-paired fins (Zimmerman et al. 2009; Moore and Bronte 2001; Bronte et al. 2003). The ‘siscowet’ ecotype is recognized by its sloping snout, moderately large eyes and high body fat content which may facilitate diel vertical migration to follow the migration of ciscoes (Ahrenstorff et al. 2011; Hrabik et al. 2014; Bronte et al. 2003; Bronte and Sitar 2008; Burnham-Curtis and Smith 1994; Hansen et al. 2012). Lastly, the ‘redfin’ ecotype has a robust body, a large head, a long deep peduncle and large fins (Muir et al. 2014). Several hypotheses have been proposed to explain the origin of these ecotypes (Wilson and Mandrak 2004; Eshenroder 2008). These could be the result of developmental plasticity in which a single genotype expresses different phenotypes matching selection optima or can be genetically based or a mix of both (Goetz et al. 2010). While this does not rule out a role for developmental plasticity, two lines of evidence suggest some genetic basis for the phenotypic differences observed between the ecotypes. First, progeny from wild lean and siscowet gametes have been raised in a common garden experiment and key phenotypic features that differentiate wild leans and siscowets such as condition factor, morphology and lipid content were maintained (Goetz et al. 2010). Furthermore, the same study uncovered transcriptional differences in lipid-related between the two ecotypes (Goetz et al. 2010). Second, morphological differences in the operculum and supraethmoid bones have been documented between leans, siscowets and humpers. Cranial bones are of taxonomic significance in salmonids and are unlikely affected by environmental conditions and ontogenic shifts (Burnham-Curtis and Smith 1994).

Lake Trout (Salvelinus namaycush) were once the dominant predator in the Great Lakes. It historically supported one the most important freshwater commercial fisheries before being

16

extirpated in the 1950s in all lakes except Lake Superior, where it is now considered restored and Lake Huron, where recruitment has been increasing (Riley et al. 2007), but it remains at relatively low abundance (Zimmerman and Krueger 2009, Bronte et al. 2003). The collapse of Lake Trout populations has been associated with anthropogenic factors, including habitat degradation, pollution and overfishing, as well as predation by invasive sea lamprey following the construction of navigation canals (Bronte and Sitar 2008; Page et al. 2003; Page et al. 2004). Impediments to its recovery or restoration have been the object of a review by Zimmerman and Krueger 2009 and guidelines have been provided to maintain, increase or reintroduce Lake Trout population in the Laurentian Great Lakes. Understanding and evaluating genetic structure and diversity of remaining Lake Trout population has been identify as key research topic.

The general goal of this study was to gain insight into the nature and origin of the different Lake Trout ecotypes in Lake Superior. More specifically, we aimed to; 1) investigate the extent of both morphological and genome wide genetic differentiation and connectivity among the four Lake Trout ecotypes from different geographic locations within the lake; and 2) identify possible adaptive genetic differentiation among ecotypes by means of genome scans and genotype-phenotype associations. To achieve this, we used RADseq to genotype Lake Trout from the four ecotypes and from four sites from Lake Superior. In parallel, geometric morphometric analyses were performed on head and body shape.

Methods

Sampling Fish from the four Lake Trout ecotypes were sampled in 2013-2014 from four sites in Lake Superior; Big Reef (2014), Stannard Rock (2013-2014), Superior Shoals (2013) and Isle Royale (2013) (Figure 1, Table 1). For the first three sites, a nylon gill net, 183 m long by 1.8 m high, was used with 30.5 m long panels of different mesh sizes (50.8-114.3 mm). Nets were deployed for 24 hours at different depth ranges (0-50 m, 50-100 m and >100 m) approximately representing preferred depths of the different ecotypes. A picture of each fish

17

was taken following the protocol in Muir et al. (2012) and a biopsy of either the adipose or pectoral fin was collected and preserved in 95% ethanol. The fourth site, Isle Royale, was sampled in 2013 using overnight sets of 274-823 m long gill nets with nine panels (91.4 m long by 1.83 m high) of single mesh size (5.1 cm, 6.4 cm, 7.6 cm, 8.9 cm, 10.2 cm, 11.4 cm, 12.7 cm, 14.0 cm, 15.2 cm). Pictures were taken using the same protocol (Muir et al. 2012) and liver or gonads were conserved in RNA Later. Samples without pictures or genetic material were removed from subsequent analyses. Information about total length (mm), wet weight (g) and sex, and depth of capture were recorded for each sampled individual.

Figure 1. Map of Lake Superior sampling sites; Isle Royale, Superior Shoals, Stannard Rock and Big Reef. Circles correspond to sampling locations for each site.

18

Table 1. Sampling site information and consensus analysis of body shape, head shape and visual identification. Number of fish sampled per sampling site (N), year of collection and coordinates is provided. Ecotypes were identified by consensus analysis of body shape (B) and/or head shape (H) and/or visual identification (V). Fish for subsequent genetic analysis were chosen based on ecotype consensus. Fish less than 430 mm long were removed prior to the analysis.

Lean- Siscowet- Humper- Redfin- no Sites Year Coordinates N Total like like like like consensus B, H,V B, H, V V V 46°46,545°N Consensus 39 46 8 17 13 123 Big Reef 2014 132 86°28,378°W Chosen 39 40 8 17 104

B, H, V B, H, V V V 47°11,450°N Consensus 63 66 24 24 40 217 Stannard 2013- 362 Rock 2014 87°11,531°W Chosen 40 40 24 24 128

B, H, V B, H, V H,V B, V 47°21,550°N Consensus 55 37 35 33 42 202 Isle Royale 2013 214 88°30,497°W Chosen 40 37 34 33 144

B, H, V B, H, V V B, V 48°02,464°N Consensus 35 133 11 62 74 315 Superior 2013 394 Shoals 87°07,536°W Chosen 31 41 11 42 125

19

1 2 Ecotype assignment 3 Consensus of both morphometric analyses (body and head) and visual identification as 4 visual interpretation of fish pictures by Lake Trout experts (see Acknowledgements) was 5 used to assign an ecotype to each fish per Muir et al. (2014). Fish less than 430 mm long 6 with the exception of humper-like fish, that are < 430 mm when sexually mature, were 7 excluded to remove the confounding effect of ontogenetic shifts in morphology 8 (Zimmerman et al. 2009). Body and head were analysed separately to distinguish 9 locomotion (body) from feeding habit (head). In addition, morphometric analyses were 10 conducted separately for each site to investigate morphological variation among sites. 11 Landmarks and semi-landmarks were digitized and analysed with the Thin Plate Spline suite 12 (TPS; State University of New York at Stony Brook; http://life.bio.sunysb.edu/morp). First, 13 for each fish a rectangular grid was overlaid to identify belly curvature corresponding to 20- 14 30-40-50% of body length using the program REVIT (Autodesk) (Figure 2a). The body grid 15 was anchored at the tip of the snout and the midpoint of the hypural plate. Second, 16 16 homologous landmarks and four semi-landmarks were digitized on each fish with the 17 program TpsDig2 and semi-landmarks were slid using TpsUtil (Figure 2a). Semi-landmarks 18 were used to represent belly curvature which is known to be distinctive between the two 19 major ecotypes, leans and siscowets (Muir et al. 2014). Similarly, a squared grid was 20 overlaid on each fish head dividing it into 10 equally spaced sections using the program 21 REVIT (Figure 2b). The head grid was anchored at the tip of the snout and the posterior 22 edge of the opercula. Eight homologous landmarks and 20 semi-landmarks were digitized on 23 each fish head with the program TpsDig2 and semi-landmarks were slid using TpsUtil 24 (Figure 2b). Distortions from rotation and size were removed by the program TpsRelw 25 producing partial warps scores which are size-free variables. A principal component analysis 26 (PCA) was performed to reduce the number of morphometric variables or scores and extract 27 divergent morphometric patterns. Subsequently, relevant axes were supplied to a Bayesian 28 clustering analysis implemented in the R package Mclust v.4. Mclust is a normal mixture 29 modeling for model-based cluster analysis, classification and density estimation that include 30 the Bayesian Information Criterion (BIC) for model selection and that do not require priori 31 information about groups such as discriminant function analysis (Fraley & Raftery 2012).

20

32 Components accounting for more than 65% of the variance were supplied to the Mclust 33 algorithm. The best model was the one with the highest BIC able to separate leans from 34 siscowets since they are the most morphologically differentiated ecotypes (Fraley and 35 Raftery 2012; Muir et al. 2014). Group classification resulting from the chosen model was 36 retrieved for each individual. The visual identification of each collected fish from Big Reef, 37 Stannard Rock and Superior Shoals was conducted by visual consensus of three trained 38 biologists. Visual identification of Isle Royale fish was provided by an experienced 39 biologist. An ecotype was assigned to each fish based on the consensus from body shape, 40 head shape and visual identification. Two out of three similar ecotype assignments were 41 needed to assign to each fish a particular ecotype. In the case of different head, body and 42 visual assignations; the fish were assigned « no consensus» and removed from subsequent 43 analyses. Fish for subsequent genetic analyses were chosen as follow: (1) fish with 100% 44 consensus having the lowest group uncertainty and (2) fish with 2/3 consensus having the 45 lowest group uncertainty. However, if an ecotype was not identified based on morphometric 46 analysis, the visual identification only was used and taken into account in subsequent 47 analyses. 48

a)

49

21

b)

50

51 Figure 2. Landmark and semi-landmark positions digitized on fish body and head. Homologous 52 landmarks are represented by red dots and semi-landmarks by black dots. (a) Sixteen homologous 53 landmarks and four semi-landmarks were placed on each fish body; (1) tip of the snout, (2) posterior 54 tip of the maxillary, (3) center of the eye, (4) top of the head above the center of the eye, (5) posterior 55 of neurocranium above the edge of opercula, (6) anterior insertion of dorsal fin, (7) posterior 56 insertion of dorsal fin, (8) anterior insertion of adipose fin, (9) dorsal insertion of caudal fin, (10) 57 midpoint of hypural plate, (11) ventral insertion of caudal fin, (12) posterior insertion of anal fin, 58 (13) anterior insertion of anal fin, (14) anterior insertion of pelvic fin, (15) 50% of body length, (16) 59 40% of body length, (17) 30% of body length, (18) 20% of body length, (19) anterior insertion of 60 pectoral fin, (20) connection (isthmus) between branchiostegals. (b) Eight homologous landmarks 61 and twenty semi-landmarks were placed on each fish head; (1) tip of the snout, (2-11) ten equally 62 spaced semi-landmarks, (12) posterior edge of the opercula, (13-22) ten equally spaced semi- 63 landmarks, (23) tip of the lower jaw, (24) posterior tip of the maxillary, (25) anterior edge of the eye, 64 (26) center of the eye, (27) posterior edge of the eye, (28) insertion of pectoral fin. 65 66 Morphometric analysis 67 Two multivariate analyses were used to test for morphological differences between the four 68 ecotypes at the four sites and to investigate among site differences for the same ecotype. 69 First, a principal component analysis (PCA) was performed to reduce variable 70 dimensionality, and components explaining most of the variance were selected based on the 71 broken stick method. Then a multivariate analysis of variance (MANOVA) was conducted 72 in R (package “stats”) on the selected components. Since partial scores derived from a 73 configuration that included semi-landmarks do not have the same number of free variables 74 as degrees of freedom, a requisite of MANOVA, a between-group analysis (groupPCA) 75 implemented in the R package “morpho” was conducted on partial warps (Webster and 76 Sheets 2010). This analysis takes into account uneven group size and does not require

22

77 normality or homogeneity of variance (Mitteroecker and Bookstein 2011). The Euclidean 78 distance between group mean was tested using 10 000 permutations. For both analyses, the 79 effects of the sampling site, sex and ecotype were tested. 80

81 Sample DNA extraction and sequencing 82 Genomic DNA was extracted from individuals representing each ecotype at the four sites 83 using a salt-extraction protocol adapted from Aljanabi and Martinez (1997). Sample quality 84 and concentration were checked on 1% agarose gels and using the NanoDrop 2000 85 spectrophotometer (Thermo Scientific). Each individual’s genomic DNA was normalized to 86 20 ng/µL of 10 µL (200ng total) using PicoGreen (Fluoroskan Ascent FL, Thermo 87 Labsystems) in 96 well plates. The libraries were constructed and sequenced on the Ion 88 Torrent Proton platform following the protocol in Mascher et al. 2013. Briefly, restriction 89 digest buffer (NEB4) and two restriction enzymes (PstI and MspI) were added to each 90 sample. Digestion was completed by incubation at 37oC for two hours and enzymes were 91 inactivated by incubation at 65oC for 20 minutes. Two adaptors (one unique to each sample 92 and the second common) were added to each sample and ligation was performed using a 93 ligation master mix followed by the addition of T4 ligase. The ligation reaction was 94 completed at 22oC for 2 hours followed by 65oC for 20 minutes to inactivate the enzymes. 95 Samples were pooled in 48-plex and cleaned-up using QIAquick PCR purification kits. The 96 library was then amplified by PCR and sequenced on the Ion Torrent Proton P1v2 chip. The 97 detailed methods for SNP identification, SNP filtering and genotyping using STACKS 98 v.1.32 (Catchen et al. 2011) are presented in Supplementary materials. 99

100 Genetic diversity and differentiation 101 We first estimated pairwise population differentiation using Weir’s and Cockerham’s

102 estimator of pairwise FST (Weir and Cockerman 1984) in GenoDive 2.0b23 (Meirmans and

103 Van Tienderen 2004) with 10000 permutations. Similarly, measures of observed (Ho) and

104 expected heterozygosity (He) and inbreeding (FIS) were estimated using GenoDive 2.0b23b.

105 Effective population size (Ne) and number of polymorphic loci (N) for each site was 106 estimated using the program NeEstimator v.2.01 (Do et al. 2014). Briefly, the program was 107 run with the linkage disequilibrium model, the random mating system and a critical value of

23

108 0.05 (Pcrit) to exclude alleles that occur in only a single copy in the sample. Genome-wide 109 diversity () and the increase in individual homozygosity relative to mean Hardy-Weinberg

110 expected homozygosity (Fh) were estimated for each site with the dataset prior to filtration 111 using the R package stackr (https://github.com/thierrygosselin/stackr). Lastly, an analysis of 112 molecular variance (AMOVA) was conducted to quantify the proportion of genetic variance 113 explained by sites relative to that explained by variation among the four ecotypes (Meirmans 114 and Van Tienderen 2004). The AMOVA was run with two different levels of hierarchical 115 subdivision; first with sites nested within ecotypes and then ecotypes nested within sites. A 116 total of 10000 permutations were used to access significance and an infinite allele model 117 was chosen. Because AMOVA does not allow missing data, missing values were replaced 118 by randomly selecting alleles proportional to total allele frequency in Genodive 2.0b23. 119 120 Population clustering 121 Population clustering and connectivity was estimated with the program ADMIXTURE 1.23 122 (Alexander et al. 2009). This program estimates ancestry in a model-based manner where 123 individuals are considered unrelated and allows choosing the best number of possible 124 genetic groups present in the data based on a cross-validation procedure. The program was 125 run with values of K ranging from 1 to 20. A population tree was built using the program 126 TreeFit (Kalinowski 2009) and visualized with the program FigTree v1.4.2 127 (http://tree.bio.ed.ac.uk/software/figtree/). Genetic distances were calculated using  (Weir 128 and Cockerham 1984) between each pair of population and the neighbor-joining (NJ) 129 distance-based method was used for tree construction. Support for each branch was assessed 130 by bootstrapping using 1000 permutations (Kalinowski 2009). 131 132 Population assignment 133 Population assignment was conducted to investigate the power to classify an unknown 134 individual to either a sampling site or an ecotype. This analysis was run using Genodive 135 2.0b23 with the home likelihood criteria (the likelihood that an individual comes from the 136 population where it was sampled), which is more appropriate when only part of all possible 137 source population have been sampled (Meirmans and Van Tienderen 2004). A significance 138 threshold of 0.05 was applied and zero frequencies were replaced by 0.005 as suggested by

24

139 Meirmans and Van Tienderen (2004). To avoid bias due to the calculation of allele 140 frequencies from the same individuals which are subsequently assigned, the program uses 141 the leave one out (LOO) validation procedure in which a targeted individual is removed 142 from its source population before calculation of the allele frequency. For this analysis, 143 missing values were replaced by randomly picking alleles from the global allele pool. All 144 loci were used for this analysis such that no correction was necessary to avoid high grading 145 bias associated with using a subset of markers based on their ranking of level of 146 differentiation (Anderson 2010). 147 148 Outlier detection and phenotype-genotype associations 149 We used two different types of approaches to detect outlier SNPs potentially under divergent 150 selection between ecotypes and sites: 1) genome scans performed among the different 151 ecotypes and/or sites; and 2) association tests between genotypes and continuous phenotypic 152 values. 153 154 For the first approach, two different methods were used to detect outlier SNPs potentially 155 under divergent selection (1) among the four sites (individuals from the different ecotypes 156 were pooled), (2) among the four ecotypes (individuals from the different sites were pooled) 157 and (3) among the four ecotypes within each site, independently. First, the program

158 Bayescan v1.2 was used to detect outliers based on locus-specific FST with a prior odd of 159 10000 and a false discovery rate (FDR) of 0.05. Bayescan was run with 5000 iterations and a 160 burn-in length of 100000 as recommended by Foll and Gaggiotti (2008). Second, the 161 program LFMM (Latent Factor Mixed Models) from the R package LEA was used to detect 162 outliers based on allele frequencies exhibiting significant statistical association with selected 163 phenotypes. Categorical variables were coded as orthogonal matrices on which a principal 164 component analysis was applied and resulting scores were supplied to the LFMM analysis. 165 LFMM was run with five repetitions, 10000 cycles and 5000 burn-in as recommended by 166 Frichot and François (2015). P-values were adjusted from their distribution and possible 167 associations corrected for population structure detected from the admixture analysis as 168 suggested by Frichot and François (2015). 169

25

170 For the second approach, phenotype-genotype associations were analysed with LFMM. This 171 technique can uncover subtle changes in allele frequencies (such as expected in polygenic 172 selection) that are not detected in traditional outlier analyses (Rellstab et al. 2015). LFMM 173 was run with the same parameters stated in the previous paragraph with ten repetitions 174 including the p-value adjustment, an FDR of 0.05 and a correction for population structure 175 based on the admixture analyses. The phenotypic variables were the principal components 176 scores for each individual that explain most of the variation for head and body shape based 177 on the broken stick method. 178 179 180 Loci potentially under selection detected by the different approaches (Bayescan and LFMM) 181 were blasted against the Rainbow Trout genome (Oncorhynchus mykiss) (Berthelot et al. 182 2014) to determine possible functions with the following parameters: an e-value threshold of 183 1e-6, a word size of 11 bp and a max target of 100 bp. Resulting loci were filtered based 184 upon three criteria; the number of similar hits, the bit-score and sequence length. First, loci 185 with only one hit and having ≥ 50 bp long were kept. Second, loci with multiple hits having 186 the first best hit ≥ 20 bit-score higher than the second best hit with sequence length ≥ 50bp 187 were kept.

188 Results

189 190 Ecotype identification and morphometric analyses 191 Based upon consensus analysis between head, body and visual identification, an ecotype was 192 assigned to each fish. First, the best model for each site that distinguished, at least, between 193 leans and siscowets with BIC values and mean uncertainties was used for ecotype 194 assignment (Table 2). For each site, fish to be sequenced were chosen from the consensus 195 identification (Table 1). If some ecotypes were not distinguished by the morphometric 196 analysis from either head or body shape, expert visual identification from these fish was 197 used (Muir et al. 2015). Based on the broken stick method, the first four PCs were retained 198 for body shape and the first six PCs were retained for head shape corresponding to 70% and 199 81% of total variance respectively to conduct the multivariate analysis of variance

26

200 (MANOVA). First, the overall shape difference between ecotypes was assessed by pooling 201 similar ecotypes from the four sites. For the head shape, the ecotypes, the sites and the sex 202 were significantly different (p < 0.001). Interactions between ecotypes and sex (p < 0.01) or 203 sites (p < 0.001) were also significant. Similar results were observed for body shape (p < 204 0.001) (Table 3). The group-PCA revealed the same pattern for the head and body shape 205 except that no difference between sexes was detected (Table S1, Supporting information). 206 The first axes of the group-PCA for head shape explained 56.2% of the variance and 207 discriminated siscowets from leans, whereas the second axis explaining 15.9% of the 208 variance discriminated humpers from redfins (Figure 3a). For body shape, the first two axes 209 of the group-PCA explained 65.5% and 14.5% of the variance and mainly distinguished 210 leans from siscowets. In both head and body analyses, the third and fourth axes 211 discriminated Lake Trout more by sampling sites than ecotypes (Figure. 3b-3d). Ecotypes 212 were not significantly different within all sites, either based on morphometric analyses of 213 head or body shape but yet could be differentiated by visual inspection (Table 4, Figure 3a- 214 3b). Within Big Reef, only leans differed from siscowets and redfins in terms of both head 215 and body shapes (p < 0.05). Within Isle Royale, head and body shape differed between all 216 four ecotypes (p < 0.05) with two exceptions; humpers did not differ from leans in body 217 shape and leans did not differ from redfins in head shape. Within Stannard Rock, leans 218 differed from siscowets and redfins in both head and body shape (p < 0.05). Finally, within 219 Superior Shoals, siscowet body shape differed from all other ecotypes (p < 0.05), except for 220 head shape which was not different from humper’s. In addition, leans head shape was 221 different from redfin’s (p < 0.05). In some cases, similar ecotypes from different sites had 222 significant different head and/or body shapes (Table 4, Figure 3c-3d). Indeed, siscowets 223 head shape differed among sites whereas body shape was not different between Big Reef 224 and Stannard Rock. Body shape of Superior Shoals leans differed from other leans except 225 Isle Royale’s whereas Isle Royale leans differed from Stannard Rock’s. On the other hand, 226 Isle Royale lean heads differed from both Stannard Rock and Big Reef leans. Redfins from 227 Isle Royale differed in head and body shape from all other redfins and lastly humpers were 228 not different from site to site. Despite the fact that not all ecotypes from all sites were 229 morphologically different based on morphometric analyses, we conserved this grouping for

27

230 the genetic analysis based on the visual inspection of other traits (e.g. size of mature fish, 231 body or fin colours). 232

28

Table 2. Models and parameters selected to assign an ecotype to each fish from the four sites separately. The number of principal components used and the corresponding cumulative percentage of explained variance are in brackets. Models chosen and parameterisation of the covariance matrix are either identified as EII (spherical distribution, equal volume, and equal shape), VEI (diagonal distribution, variable volume, and equal shape), VII (spherical distribution, variable volume, equal shape) or EEI (diagonal distribution, equal volume, and equal shape). The number of groups found (G), the corresponding bayesian information criterion (BIC) and mean uncertainty are also listed for each model.

Body Head

Number of Mean Number of Sites Model G BIC Model G BIC Mean uncertainty PC used uncertainty PC used Big Reef 3 (66%) EII 2 2073.192 0.08 ± 0.01 4 (73%) EEI 3 2405.928 0.13 ± 0.01

Isle Royale 4 (68%) VII 3 4870.388 0.18 ± 0.01 4 (71%) VEI 3 3878.7 0.17 ± 0.01

Superior Shoals 4 (68%) VEI 3 7341.97 0.29 ± 0.01 4 (68%) VEI 3 5926.035 0.21 ± 0.01

Stannard Rock 4 (72%) EEI 3 4920.14 0.16 ± 0.01 4 (71%) VEI 3 4149.481 0.18 ± 0.01

29

Table 3. MANOVA on body and head shape to investigate the effect of the ecotype, the site of origin, the sex and all interactions. Significant variables are in bold.

Body Head

Variables Df Pillai Approx F Num Df Den Df Pr (> F) Df Pillai Approx F Num Df Den Df Pr (> F)

Ecotype 3 0.59616 28.7686 12 1392 < 2.2 e-16 3 0.79729 27.871 18 1386 < 2.2 e-16

Site 3 0.43694 19.7752 12 1392 < 2.2 e-16 3 0.90345 33.181 18 1386 < 2.2 e-16

Sex 1 0.10732 13.8857 4 462 1.052e-10 1 0.10040 8.556 6 460 7.883 e-09

Ecotype: Site 9 0.47419 6.9486 36 1860 < 2.2 e-16 9 0.52612 4.966 54 2790 < 2.2 e-16

Ecotype: Sex 3 0.05388 2.1214 12 1392 0.01339 3 0.11061 2.948 18 1386 3.28 e-05

Site : Sex 3 0.04189 1.6427 12 1392 0.07412 3 0.04268 1.111 18 1386 0.3344

Ecotype: Site : Sex 9 0.07982 1.0520 36 1860 0.38565 9 0.12499 1.099 54 2790 0.2891

30

Table 4. Between-group PCA analysis of group distance among ecotypes within and among sites with 10000 permutation for body (below diagonal) and head (above diagonal) shape. Ecotypes are: Siscowet (FT), Humper (HT), Lean (LT) and Redfin (RF). Group distances with significant p-value (< 0.05) are followed by an asterix (*). Within sites analyses are highlighted in gray and significant between site differences for the same ecotype are in bold.

Group distances Big Reef Isle Royale Stannard Rock Superior Shoals FT HT LT RF FT HT LT RF FT HT LT RF FT HT LT RF FT 0.028 0.048* 0.022 0.047* 0.042* 0.045* 0.045* 0.036* 0.052* 0.059* 0.027 0.038* 0.037 0.048* 0.032* HT 0.017 0.031 0.031 0.065* 0.033 0.035 0.042 0.031 0.034 0.041 0.018 0.057* 0.037 0.035 0.033 LT 0.030* 0.021 0.042* 0.085* 0.045* 0.029* 0.050* 0.042* 0.032 0.018 0.031 0.072* 0.048* 0.028 0.045* Big Reef RF 0.009 0.019 0.030* 0.064* 0.049* 0.041* 0.044* 0.042* 0.053* 0.051* 0.028 0.047* 0.037 0.041* 0.029 FT 0.025* 0.038* 0.046* 0.027* 0.068* 0.076* 0.070* 0.068* 0.087* 0.097* 0.064* 0.045* 0.071* 0.085* 0.066* HT 0.025* 0.018 0.014 0.027* 0.041* 0.044* 0.052* 0.026 0.031 0.050* 0.037* 0.064* 0.040 0.037* 0.039* LT 0.025* 0.018 0.014 0.027* 0.043* 0.013 0.027 0.049* 0.042* 0.036* 0.035* 0.063* 0.043 0.030 0.038*

Isle Royale RF 0.028* 0.022 0.032* 0.028* 0.048* 0.028* 0.023* 0.056* 0.058* 0.056* 0.043* 0.064* 0.050* 0.047* 0.044* FT 0.017 0.017 0.022* 0.020 0.034* 0.018 0.020* 0.027* 0.028 0.047* 0.024 0.058* 0.034 0.039* 0.037* HT 0.030* 0.019 0.015 0.033* 0.049* 0.017 0.018 0.028* 0.019 0.029 0.033 0.074* 0.042 0.029 0.044*

Rock LT 0.038* 0.026 0.014 0.039* 0.056* 0.021* 0.019*0.033*0.029* 0.014 0.040* 0.082* 0.053* 0.025 0.051*

Stannard RF 0.016 0.016 0.019 0.016 0.035* 0.017 0.019 0.027* 0.011 0.020 0.026* 0.051* 0.031 0.034 0.030 FT 0.036* 0.048* 0.053* 0.038* 0.029* 0.049* 0.049* 0.057* 0.041* 0.055* 0.062* 0.041* 0.043 0.067* 0.041* HT 0.024 0.025 0.027* 0.027 0.038* 0.024 0.024 0.035* 0.019 0.025 0.032* 0.019 0.034* 0.035 0.017 LT 0.021* 0.016 0.020* 0.022 0.040* 0.016 0.013 0.019 0.014 0.017 0.024* 0.014 0.045* 0.019 0.031* Shoals Superior RF 0.020* 0.025 0.028* 0.022 0.031* 0.024* 0.024* 0.034* 0.018 0.028* 0.035* 0.017 0.029* 0.011 0.018

31

Figure 3. Between-group PCA on partial warps of 501 Lake Trout. (a) First and second principal components for head shape representing 56.2% and 15.9% of the variance respectively distinguishing the four ecotypes. (b) First and second principal components for body shape representing 65.5% and 14.5% of the variance respectively that distinguish leans from siscowets based mainly on belly curvature. (c) Third and fourth principal component for head shape representing 11.5% and 7.2% of the variance respectively distinguishing the four sites. (d) Third and fourth principal components for body shape representing 8.8% and 3.0% of the variance respectively distinguishing the four sites. The colored points refer to the mean scores for each ecotype in each site. The sites are: Big Reef (black), Isle Royale (blue), Stannard Rock (red) and Superior Shoals (green). Ecotypes are: Siscowet (FT), Humper (HT), Lean (LT) and Redfin (RF). Under each ecotype are drawn the consensus shapes of all four ecotypes (gray) with the outline of the ecotype in question (black). The shaded ellipses have been drawn for clarity.

Sequencing and SNP calling Raw reads cleaning and demultiplexing resulted in a total of 1.6 billion reads with an average of 3.2 million reads per individual and a relatively small mean coefficient of variation (CV) of 0.23. The assembly resulted in a catalog containing 1,052,664 loci and a total of 212,804 SNPs (49,399 loci) after the population module. Fifteen individuals having

32

more than 40% missing genotype were removed from the analysis. After custom filtration 6822 high quality SNPs were retained for subsequent analysis (Table 5).

Table 5. Number of SNPs remaining after each filtration step. Allelic imbalance corresponds to the ratio of the number of sequences for the major allele on the number of sequences for the minor allele.

Before filtration: Count Catalog 1,052,664 loci

After population module (≥ 212,804 SNP 70% individuals in ≥ 2 sites )

Filters: Genotype quality: Genotype likelihood (≥ 6) Allelic imbalance (≤ 5) 193,678 SNP Hardy-Weinberg: Heterozygosity (≤ 0.6) Fis [-0.3; 0.3] 185,445 SNP MAF: Global (≥ 0.01) and/or 17,812 SNP Local (site) (≥ 0.05) Locus: Maximum number of SNP 13,984 SNP per locus (≤ 8) Position: 1th SNP per locus kept 6822 SNP

Genetic diversity and differentiation among sites and ecotypes

Genetic statistics revealed modest but significant FST among some ecotypes within each sampling site (Mean FST = 0.0055) (Table S2, Supporting information). Mean FST among ecotypes within sites were as follow: Big Reef 0.0047, Isle Royale 0.0087, Stannard Rock 0.0053 and Superior Shoals 0.0032. No trend in patterns of genetic diversity was observed between ecotypes within each site (Table 6). That is, there was no evidence that diversity in some ecotypes tended to be higher than in others. On the other hand, FST among sites were on average three times higher than observed among ecotypes within site (Mean FST = 0.016).

For instance, FST between sites (all four ecotypes pooled) were Big Reef  Isle Royale 0.017, Big Reef  Stannard Rock 0.009, Big Reef  Superior Shoals 0.022, Stannard Rock

33

 Isle Royale 0.02, Superior Shoals  Isle Royale 0.01 and Stannard Rock  Superior Shoals 0.02). A lower value between Big Reef  Stannard Rock and Isle Royale  Superior Shoals site pairs was consistent with their closer geographic proximity. Also genetic diversity parameters tended to show greater differences between sites, than between ecotypes within sites (Table 6). Namely, genetic diversity, in terms of nucleotide diversity

() and heterozygosity (Ho, He), was notably lower within Stannard Rock in comparison to the three other sites (Table 6). Overall, Superior Shoals ecotypes had the lowest effective population size (Ne) estimates while having, with Isle Royale, the highest inbreeding coefficient (Gis, Fh) whereas ecotypes from Big Reef had the highest effective population size (Ne) and the lowest inbreeding coefficient (Gis, Fh) while Stannard Rock showed intermediate indices. The more pronounced pattern of population differentiation between sites than between ecotypes was also evidenced by the AMOVA which revealed no net genetic variance explained by the ecotype grouping (FCT = -0.002) compared to the net genetic variance explained by sites grouping (FCT = 0.011) (Table 7).

34

Table 6. Population statistics estimated with 6822 SNPs: the observed heterozygosity (Ho), the expected heterozygosity (He), the inbreeding coefficient (Gis), the effective population size (Ne) and confidence interval in brackets, and the number of polymorphic loci (N). Genome-wide diversity () and the increase in individual homozygosity relative to mean Hardy-Weinberg expected homozygosity (Fh) was calculated on the dataset prior to filtration. Effective population size for ecotypes with sample size < 15 individuals were not calculated (NA).

Ho He  Ne Gis Fh N LT 0.067 0.067 0.000319 415 [356; 498] -0.007 -1.04E-07 4132 FT 0.074 0.072 0.000353 925 [698; 1365] -0.027 -1.29E-07 4564 RF 0.073 0.071 0.000337 1341 [646; infinite] -0.03 -1.44E-07 3238 Big Reef HT 0.072 0.069 0.000334 NA -0.034 -1.84E-07 2022 LT 0.075 0.077 0.000393 181 [171; 194] 0.027 -5.58E-08 4904 FT 0.074 0.075 0.000358 122 [116; 128] 0.012 -5.52E-08 4751 RF 0.079 0.080 0.000387 124 [118; 131] 0.015 -5.41E-08 4663

Isle Royale HT 0.074 0.075 0.000366 192 [178; 209] 0.017 -5.36E-08 4510 LT 0.062 0.061 0.000284 487 [408; 603] -0.011 -9.50E-08 3649 FT 0.062 0.061 0.000285 159 [149; 171] -0.007 -9.18E-08 3638

Rock RF 0.062 0.061 0.000281 250 [213; 301] -0.018 -1.03E-07 2923

Stannard HT 0.061 0.060 0.000276 199 [174; 230] -0.016 -1.05E-07 2920 LT 0.081 0.082 0.000385 146 [137; 155] 0.01 -6.30E-08 4538 FT 0.078 0.080 0.000378 133 [127; 139] 0.021 -5.14E-08 5022 RF 0.076 0.077 0.000367 94 [91; 97] 0.007 -7.57E-08 5227 Shoals Superior HT 0.073 0.074 0.000367 NA 0.012 -7.57E-08 2727

35

Table 7. Analysis of molecular variance (AMOVA) on 486 individuals and 6822 SNPs. Missing data has been replaced by random picking in the overall pool of allele frequency.

Source of variation % Variance F-stat F-value Std.Dev. c.i.2.5% c.i.97.5% p-value

Sites as group

Among sites 0.011 FCT 0.011 0.000 0.01 0.012 < 0.001

Among ecotypes 0.004 FSC 0.004 0.000 0.004 0.005 < 0.001 within sites Ecotypes as group

Among ecotypes -0.002 FCT -0.002 0.000 -0.002 -0.002 0.95

Among sites within 0.015 FSC 0.015 0.000 0.014 0.016 < 0.001 ecotypes

Clustering analysis The Admixture program identified two groups (best K) corresponding to pairs of sites: Big Reef and Stannard Rock vs. Isle Royale and Superior Shoals (Figure 4a). No migrants from Isle Royale and Superior Shoals were detected in the Big Reef/Stannard Rock cluster while results suggested the occurrence of migrants and admixed individuals in the Isle Royale/Superior Shoals cluster with a tendency for a greater proportion of migrants in Superior Shoals (Figure 4a). At K3-K4 Big Reef individuals tended to cluster separately from those of Stannard Rock although lean trout from Big Reef tended to be more similar to Stannard Rock leans. At K5, Isle Royale could be discriminated from Superior Shoals. Lastly, at K6 all four sites differed and some additional within-site distinctions began to appear. Within Big Reef, leans were distinct from other ecotypes, being more similar to the lean/redfin cluster from Stannard Rock. Within Isle Royale, siscowets were distinct from the other ecotypes while no obvious difference emerged between ecotypes within Superior Shoals. In addition, some siscowets from Stannard Rock seemed to be similar to Isle Royale siscowets. The NJ population tree mainly grouped ecotypes from different sites together

36

with pair of sites closer geographically also clustering more closely in the tree (Figure 4b). In addition, as observed in the Admixture analysis, leans from Big Reef were closer to leans from Stannard Rock than from the other ecotypes within Big Reef. Admixture also showed that siscowets from Isle Royale were most distinct from the other three ecotypes within this site (Figure 4b).

37 a)

b)

Figure 4. Population structure analysis of Lake Trout; a) Admixture plot based on 486 individuals and 6822 SNPs (including outliers) for different values of K. Individuals are shown by sites and ecotypes. b) Neighbour joining tree based on 486 individuals and 6822 SNPs including outliers. Yellow circles represent Big Reef, orange circles Stannard Rock, blue circles Isle Royale and green circles Superior Shoals. Bootstrapping support is indicated on each branch. The four ecotypes are represented for each site; Lean (LT), Humper (HT), Redfin (RF) and Siscowet (FT).

38

Population assignment Assignment success to sampling sites, based on the 6822 SNPs, was high, being 85% on average and up to 95% for Isle Royale and 93% for Stannard Rock (Figure 5a). Miss- assigned individuals to Big Reef were only assigned to Stannard Rock and vice-versa. Superior Shoals had a lower assignment success (78%) and had miss-assigned individuals to the three other sites. On the contrary, assignment success to ecotypes was low, being 41% on average, ranging from 12% for humpers up to 61% for siscowets (Figure 5b). Ecotype assignment success within each sampling site was highly variable, being highest on average within Isle Royale (55%) and lowest within Superior Shoals (21%) and in fact similar to random expectation while Big Reef (33%) and Stannard Rock (40%) showed intermediate results (data not shown). Assignment success within Isle Royale, was 76% for siscowets, 62% for leans, 52% for humpers and 33% for redfins, whereas assignment success within Superior Shoals was 46% for siscowets, 23% for redfins, 18% for leans and 0% for humpers. Within Big Reef, individuals were assigned either to siscowets or leans whatever their current ecotype was. For instance, assignment success for siscowets was 82%, 51% for leans and 0% for humpers or redfins. Stannard Rock showed a similar pattern, where assignment success for siscowets was the highest (76%) followed by leans (74%) while the assignment for humpers (10%) and redfins (0%) was low.

Figure 5. Assignment success of individuals to their sampling sites (a) or ecotypes (b). Percentage assignment is written below circles with the exact number of individuals assigned within brackets. Percentage of correct assignment to either sampling sites or ecotypes is in bold. Sites are: Big Reef (BR), Isle Royale (IR), Stannard Rock (SR), Superior Shoals (SS). Ecotypes are: Humper (HT), Siscowet (FT), Lean (LT) and Redfin (RF).

39

Outlier detection and phenotype-genotype associations Bayescan identified a total of 52 outliers from which 49 occurred between the four sites and three between the four ecotypes (Figure 6a-6b). No outliers were detected between ecotypes within each site. In addition, no outliers were common between sites and ecotype comparisons. For LFMM, the p-values were adjusted using a lambda of 0.55 (ℷ) and population structure was corrected for each analysis using the number of ancestral groups (K) identified by Admixture for the overall dataset or within each site separately. According to the Admixture results, a K of five was used for between sites and between pooled ecotype comparisons while for within site comparisons, a K of two was used for Big Reef and Superior Shoals and a K of three was used for Isle Royale and Stannard Rock. LFMM identified a total of 670 unique outliers: 554 between sites, and 116 between ecotypes in which 20 were common to both comparisons (Figure 6a-6b). Thus, the number of outliers between ecotypes was lower than that observed between sites. For within site comparisons between ecotypes, 359 outliers were detected within Big Reef, 131 within Isle Royale, 360 within Stannard Rock and 120 within Superior Shoals. Overall, up to 27 outliers were common between some sites but none were common to all sites (Figure 6c). No outliers detected among ecotypes were common between LFMM and BAYESCAN but height were common among sites.

Based on the broken stick method, the first four principal components were selected to represent head shape and the first six principal components were selected to represent body shape, for a total of 10 shape variables. Briefly, the p-values were adjusted using a lambda of 0.55 (ℷ) and population structure was corrected using a K of five. A total of 915 unique associations were detected with an FDR of 0.05 in which several were common between variables (Table S3, Supporting information). Four of these associated SNPs were common with BAYESCAN outliers (one with the between ecotype comparison and three with the between site comparison) (Figure 6a-6b). In addition, 349 of these associated SNPs were common with the previous LFMM analysis. Briefly, 71 were in common with between site comparison, 20 with between ecotypes comparison, 85 with within Big Reef comparison, 48 with within Isle Royale comparison, 91 with within Stannard Rock and 34 with within Superior Shoals comparisons.

40

Figure 6. Venn diagrams of outliers detected by LFMM and BAYESCAN among sites, ecotypes or among ecotypes within sites a) Outliers detected among the four sites by BAYESCAN and LFMM including outliers detected by LFMM using morphological PC scores. b) Outliers detected among the four pooled ecotypes by BAYESCAN and LFMM including outliers detected by LFMM using morphological PC scores. c) Outliers among ecotypes within each site detected by LFMM.

Annotation A total of 2056 loci detected either by BAYESCAN or LFMM between sites, between ecotypes or in association with phenotypic variation were blasted against the Rainbow Trout genome. After quality filtering, 258 loci were kept that had an annotation in genes (Table S4). From those with a known biological function, markers linked to lipid transport and

41

metabolism as well as visual development and perception were of particular interest given previously documented phenotypic characteristics differentiating Lake Trout ecotypes (see Discussion).

Discussion

This is the first study to combine genomic and morphometric analyses from all four lake trout ecotypes from several different locations in Lake Superior. This provided the unique opportunity to investigate among and within site variation and the extent of parallelism, both at the phenotypic and genomic level. Both morphometric and genomic analyses revealed within site morphological and genetic differences between ecotypes, but in general, genetic differences were more pronounced among sites than among ecotypes, even when comparing populations of the same ecotype. Similarly, we observed that values of demographic and genetic diversity parameters generally tended to vary more by site than by ecotype. Moreover, the extent of both morphological and genetic differences among ecotypes observed within site varied from one location to the other. In addition, genome scans and association tests identified several loci potentially implicated in local adaptation and phenotypic divergence among ecotypes, among which loci linked to lipid metabolism and transport as well as visual acuity and development are of particular interest (see Discussion below). The relatively large number of outlier loci identified, which globally showed relatively modest levels of genetic differentiation among sites or ecotypes, suggests a polygenic origin of both local adaptation between sites and ecotypic differentiation. We discuss the implications of these results for the understanding of the biological processes responsible for the emergence of the different ecotypes of Lake Trout as well as for their management.

Parallel evolution of ecotypes? Parallel evolution, the repeated evolution of similar phenotypic traits, has been shown in many populations within the same species (reviewed in Elmer and Meyer 2011). Shared phenotypic traits that evolved independently are generally believed to indicate the presence

42

of shared environmental pressures between locations driving morphological changes to similar optimum (Butlin et al. 2013). The evolution of these similar traits can be underlain by similar molecular changes or arise from different molecular processes (Elmer and Meyer 2011; Ralph and Coop 2014; Bernatchez submitted). Here, similar ecotypes corresponding to previously published descriptions were identified within all sampling sites. The fact that different ecotypes within sites were in general genetically more similar than different populations of the same ecotype among sites suggests that parallel evolution is implicated in the generation and maintenance of ecotypes. Moreover, although to a lesser extent, some morphological components could discriminate Lake Trout by sampling sites and in some cases, different populations of a same ecotype from different sites were not totally morphologically different. Such inter-site differences within ecotype have previously been reported by Bronte and Moore (2007) for siscowet which was interpreted as either the presence of different reproductive populations and/or a plastic response to different environmental conditions among sites. It is noteworthy, that head shape better discriminated ecotypes than body shape, as reported previously in other Lake Trout studies both from the Great lakes and elsewhere (Chavarie et al. 2013; Moore and Bronte 2001; Alfonso 2004; Moore and Bronte 2007; Magalhaes et al. 2009). More pronounced ecotypic differentiation of head shape could suggest a predominant role for feeding habit compared to locomotion as the main driver for these morphological differences (Chavarie et al. 2013; Magalhaes et al. 2009; Jonsson and Jonsson 2001).

Generally speaking, we found very limited support for a shared genetic origin among populations of the same ecotype. That is, we generally observed less genetic differences among ecotypes within sites than among populations (sites) for the same ecotype. The exception to this general pattern was for the lean ecotype for which we observed more genetic similarity between populations from Big Reef and Stannard Rock than between leans and other ecotypes from these locations. Similar results were previously reported by Ihssen (1988) and Dehring et al. (1981) who showed based on allozymes that Lake Trout of the lean ecotype from four different locations differed in allele frequencies. In contrast with our general results showing higher differentiation among sites than among ecotypes, a recent study conducted in Great Bear Lake, found more pronounced genetic differentiation among

43

Lake Trout ecotypes than among sampling sites (Harris et al. 2014). These authors hypothesized that stronger genetic and morphological differentiation in Great Bear Lake could be due to its more pristine environment and limited human impact compared to Lake Superior where these factors may have altered the original pattern of population structuring. For instance, considerable stocking and fishery harvest has occurred in Lake Superior, which could certainly have had an impact on the extent of population admixture (Guinand et al. 2003) compared to Great Bear Lake, which has not been stocked and has only been subject to minor fishery harvest. However, it is noteworthy that stocking has been done essentially for the lean ecotype (Page et al. 2004) such that it seems unlikely that this could explain the general pattern of structuring we documented for other ecotypes, although it could possibly explain why leans from different locations were more similar in some cases, as explained above.

Evolutionary origin of ecotypes and continuum of divergence The main emerging pattern overall suggests the independent origin of the different ecotypes at each sampling site. Relatively large geographic distances between these sites, known for the relatively high occurrence of the four ecotypes separated by ranges of much lower abundance, may have contributed to reduce genetic exchange between spatially isolated populations. Thus, localized movement have been reported for Lake Trout based on tagging studies where an average movement of approximately 40 km has been reported (Kapuscinski et al. 2005; Eschmeyer 1955). Considering that the closest sites in this study are separated by about 69 km (Big Reef/Stannard Rock) to 87 km (Isle Royale/Superior Shoals) and that sites that are the farthest are separated by 98 km (Superior Shoals/Stannard Rock) to 212 km (Isle Royale/Big Reef), the presence of spatially genetically differentiated stocks is consistent with this observation. Spatial isolation could also have been exacerbated by historical water level fluctuations. Lake Superior has a very diverse bathymetric habitat covered by peaks and valleys, thus creating geographical barriers particularly when water levels fluctuated. This situation is thought to have occurred 8000 years ago, which could have triggered the spatial pattern of genetic divergence seen today (Bronte and Moore 2007). Our data demonstrates different levels of both genetic and phenotypic divergence underlying the observed ecotypes ranging from intra-population polymorphism to clear genetically

44

distinct populations within a location. The extent of morphological differentiation in both head and body shape was also variable depending on the site being examined. Although the explanations for this pattern of continuum in morphological divergence are only hypothetical at this time, this could reflect different levels of trophic polymorphism associated with different selective pressures (e.g. competitive interactions), as reported for other species, including Lake Whitefish (Coregonus clupeaformis) (Lu and Bernatchez 1999), Arctic Charr (S. alpinus) (Gislason et al. 1999), or Three-spined Stickleback (Gasterosteus aculeatus) (Hendry et al. 2009). The extent of genetic divergence between ecotypes was also variable depending on the site examined suggesting that different levels of reproductive isolation accompany different levels of phenotypic divergence (see reference above, also reviewed by Hendry 2009).

In sum, the combined genomic and morphological data support the hypothesis that ecotypic differentiation among Lake Trout ecotypes from different geographic locations within Lake Superior can be arrayed along a continuum from quasi-panmixia to relatively pronounced reproductive isolation, mimicking the inter-specific pattern described by Hendry (2009) among lacustrine north temperate freshwater fishes. Consequently, variation along this continuum might profitably be used for studying factors, outlined by Hendry (2009), that can promote or constrain progress toward ecological speciation, including plasticity, natural selection, mate choice, geography, or historical contingency. However, the present study cannot rule out that different anthropogenic impacts among sites could have also contributed to the observed pattern of genomic and phenotypic variation. Indeed, a recent study conducted by Baillie et al. (2016) highlighted substantive losses of genetic diversity and genetic distances in lean, siscowet and humper trout from post-collapse recovery (1995- 1999) compared to contemporary period (2004-2013). This homogenisation could be the result of overexploitation, intensive stocking and invasions of non-native species which could have led to the overlap in breeding or foraging area thus increasing hybridisation. Biodiversity losses and speciation reversal caused by anthropogenic activities have been recorded in several freshwater species such as Lake Victoria cichlids (Seehausen et al. 2008), Great Lakes ciscoes (Coregonus spp.) (Todd and Stedman 1989) as well as the

45

European whitefish (Coregonus lavaretus) and Alpine whitefish (Coregonus spp.) (Bhat et al. 2014; Hudson et al. 2013).

Evidence of local adaptation Some evidence of divergent selection among ecotypes and sites was uncovered in the present study using different approaches. A larger proportion of markers identified by LFMM and Bayescan occurred among sites in comparison to among ecotypes. It is noteworthy that a greater proportion of markers identified as outliers or associated with phenotypic differentiation was found for head shape compared to whole body shape. This is in accordance with the morphometric analysis in which head shape differences were more pronounced than whole body shape, further supporting the hypothesis of the role of feeding ecology as an important driver for morphological differentiation of these ecotypes (Chavarie et al. 2013; Moore and Bronte 2007). Taken together, the combined results obtained for “neutral” and potentially “adaptive” markers highlight the contribution of both spatial isolation and local adaptation in shaping ecotypic variation within each sampling site.

Uncoupling of genotype and phenotype variation Both detection methods (Bayescan and LFMM) differentiated more outlier markers among sampling sites than among Lake Trout ecotypes, again supporting the view that spatial variables (e.g. different environmental conditions) are more important than ecotypic differentiation in explaining the observed pattern of population structure in Lake Superior. In both spatial and ecotypic differentiation however, a much larger proportion of markers potentially under selection were detected by the LFMM method compared to Bayescan, the former known to be more sensitive to polygenic effects, suggesting that weak or polygenic selection might be responsible for the observed pattern of “adaptive differentiation”, both spatially and between ecotypes (Rellstab et al. 2015). Moreover, the same method (LFMM) uncovered markers potentially under selection among ecotypes within all four sampling sites, none being common to all sites. Independent markers of selection among sites provide further support for the independent origin of ecotypes within each site. These results also suggest that phenotypic parallelism in Lake Trout ecotypes is not accompanied by parallelism at the genome level whereby the expression of a given ecotype in different sites

46

is controlled by a different polygenic architecture. The uncoupling of phenotypic and genotypic differentiation has been reported in many species, such as mice (Peromyscus maniculatus), cichlids, cavefish (Astyanax mexicanus), stickleback (Gasterosteus spp.), as well as ciscoes and whitefish (Coregonus spp.) (reviewed in Elmer and Meyer 2011; Bernatchez submitted). For instance, ciscoes in Lake Nipigon exhibit four morphological and ecological different forms without evidence of corresponding neutral genetic differentiation (Turgeon et al. 1999). Similarly, ciscoes from several inland lakes showed variable phenotypic differentiation which was not correlated to genetic distinction (Turgeon et al. 2016). Also Laporte et al. (2015) recently reported a clear pattern of phenotypic parallelism in body shape between dwarf and normal sympatric pairs of lake whitefish with similar genomic architecture underlying these traits being observed between some pairs but different genome architecture in others.

Functional annotation

Visual development and acuity Of the 258 loci for which successful annotation could be retrieved, several were of particular interest and linked to ecotypic differences observed in the present system. Two loci were linked to visual development and acuity of the retina; retinal guanylyl cyclase 2 and retinitis pigmentosa 1-like 1 , and one to visual perception; peripherin-2-like. Both markers linked to visual development and acuity were found in significant association with the second component of the head shape analysis from which the highest loadings was for the eye position (landmark number 26). Changes in size, location and sensitivity of the eyes have been associated to adaptation to low light environment (Von der Emde et al. 2004). Indeed, larger eyes with a predominance of rods are known to increase visual acuity (Von der Emde et al. 2004). Large eyes, close to the snout have been reported in other deepwater, salmonid morphs similar to the siscowet and humper ecotypes in Lake Superior, potentially reflecting adaptation to low light condition (Eshenroder 2008; Skoglund et al. 2015; Moore and Bronte 2001). The retinal guanylyl cyclise 2 is hypothesized to play a functional role in rods and cones of photoreceptors and might be required for recovery of the dark state after phototransduction in humans (Uniprot). Similarly, retinitis pigmentosa 1-like 1 protein

47

(RP1L1) is known to be required for differentiation of photoreceptor cells in human (Uniprot). In addition, it might play a role in the organisation of the outer segment of rod and cone photoreceptors in human based on sequence similarity (Uniprot).

Lipid metabolism Annotated markers of interest were also linked to lipid binding, transport, regulation and metabolism. A total of three annotated markers were linked to lipid binding; the spectrin beta non-erythrocytic 4-like isoform x1 and 1-like isoform x2 (SPTBN4, SPTBN1) and the calcium-dependant secretion activator 1 (CADPS), and one markers was linked to transport; the lipid phosphate phosphatase-related protein type 4-like (LPPR4) (available at: http://genecards.org/, updated on 7 February 2016). SPTBN4 was found in significant association with the first component of the body shape analysis which was linked to belly curvature whereas SPTBN1 and CADPS were found in significant association with among ecotype analysis and LPPR4 in significant association with head depth. PPAP2A, a paralog of LPPR4 is known to encode an enzyme playing a role in hydrolysis and uptake of lipids in humans (available at: http://genecards.org/, updated on 7 February 2016). Another marker, the Pr domain zinc-finger 16-like isoform x2 (PRDM16) was found in significant association with among ecotype analysis and is known to be a transcriptional regulator of the differentiation of skeletal muscle into brown adipose tissue in mice (Seale et al. 2008). Similarly, the peroxisome proliferator-activated receptor gamma coactivator 1-alpha-like isoform 1 (PPARGC1A) is involve in fatty acid metabolism and more specifically adipose tissue development. Interestingly, PPARγ was found highly expressed in Atlantic Salmon liver and adipose tissue and was suggested to play an important role in regulation of fatty acid degradation (Ruyter et al. 1997). High lipid levels in the muscle of the deepwater siscowet ecotype have long been described and suggested to facilitate vertical migration in the water column by providing hydrostatic lift (Henderson and Anderson 2002; Eschmeyer and Phillips 1965).

Also, Goetz et al. (2010) showed that differences in lipid levels in leans and siscowets persist when reared under identical conditions. They also found several differentially expressed genes in controlled conditions between these two ecotypes linked to lipid

48

metabolism. Interestingly, four of the annotated markers in the present study were found differentially expressed by Goetz et al. (2010), further suggesting that our study identified candidate genes involved in the differentiation between these ecotypes. The four markers in common are the alpha-tectorin-like protein, the fk506-binding protein 5-like isoform x3, the galactosamine (n-acetyl)-6-sulfate sulfatase and the Peroxisome proliferator activated receptor. Functional descriptions of most of these genes are still lacking, therefore mechanistic links between these markers and Lake Trout adaptations remains unknown.

In sum, the hypothesis of genetically based adaptation in Lake Trout is supported by at least a few divergent annotated genes that are linked to biological functions (e.g., vision, lipids metabolism). These same genes are believed to play roles in the local adaptation to different water depths and trophic resource use (Goetz et al. 2010). Arguably, our results do not rule out a role for phenotypic plasticity induced by exposure to different environmental conditions, which will require further common garden studies of other ecotypes from other locations, as performed by Goetz et al. (2010). In fact, phenotypic plasticity may have played an important role in the diversification of Lake trout ecotypes within Lake Superior. Indeed, the presence of environmentally induced (plastic) polymorphism within population has been hypothesized to facilitate the process of divergence (Adams and Huntingford 2004; Pfennig et al. 2010). Thus, phenotypic plasticity can promote the emergence of divergent phenotypes on which selection can act (Pfennig et al. 2010). In addition, trophic polymorphism may be an effective way to promote speciation by resource use because it may trigger reproductive isolation (Pfennig et al. 2010; Smith and Skúlason 1996). Finally, studies on sympatric ecotypes such as cichlids, whitefish and arctic charr have shown, using common-garden experiments, that some morphological characters were plastic and others heritable, thus demonstrating the role of phenotypic plasticity in shaping divergence (Adams and Huntingford 2004; Magalhaes et al. 2009; Lundsgaard-Hansen et al. 2013).

Limitations Admittedly, we must also consider the possibility that several alternative factors could explain the pattern of continuum in ecotypic divergence observed here. Namely, sample sizes were small in some cases, especially for the humper ecotype, which could have limited

49

our power to detect genetic divergence, namely between humper and redfin. Also, our capability of assigning Lake Trout to different ecotypes based on their morphology varied among sites, which may have created artificially admixed groups of individuals resulting in lower level of differentiation among them. In such a case however, the clustering analysis performed with Admixture should have detected such groups of admixed individuals from different populations, which was not the case here. Instead, Admixture revealed homogeneous groups of individuals, independent of their ecotype in locations where we observed very weak or no genetic differentiation.

Management implications The maintenance of genetic diversity, and thus the potential of a species to evolve in the face of a changing environment is central in conservation genomics and fishery management (Toro and Caballero 2005). Improper management may lead to depletion of the resource and/or impaired resilience by decreasing genetic diversity or eroding local adaptations (Laikre et al. 2005; Zimmerman et al. 2009). Management units are groups of conspecific individuals among which connectivity is sufficiently low so that each group should be managed separately (Palsboll et al. 2007). The delineation of these management units is still debated and has usually been based upon the rejection of panmixia (Waples and Gaggiotti 2006) or the absolute amount of population divergence between populations (Palsboll et al. 2007). Thresholds above which populations should be considered distinct management units do not exist, but a dispersal level < 10% has been suggested (Palsboll et al. 2007). Based on our results, the primary basis to define management units in Lake Trout of Lake Superior should be the sampling sites rather than ecotypes since we observed pronounced levels of net genetic differentiation and high assignment success (varying between 74-95%) among sites compared to net genetic differentiation and very low assignment success (varying between 12-61%) among ecotypes. Yet, depending on locations, ecotypic differentiation must also be considered since ecotypes were also genetically distinct in some cases, such as Isle Royale in particular. Also, evidence of local adaptation was uncovered and therefore, caution must be taken within sites to avoid depletion of locally adapted traits by stocking or exploitation. Since the extirpation of Lake Trout from most of the Great Lakes other than Superior, stocking programs have been developed in some lakes without success (Page et al.

50

2003). Matching stocking sites with proper ecotype could increase reintroduction success (Zimmerman et al. 2009). Based on this study, we would advocate for reintroduction and translocation of Lake Trout from the least genetically differentiated site, namely Superior Shoals since this would provide the full range of ecotypic differentiation within a quasi- panmictic gene pool, a situation that would be reminiscent of the early stage of ecological speciation (Smith and Skúlason 1996; Hendry 2009). Moreover, such intra-population polymorphism may increase survival in a new environment while maintaining genetic diversity and potential for local adaptation (Wennersten and Forsman 2012). In addition, our results provided limited evidence for local adaptation associated with ecotypic differentiation at this location, which could improve survival in a different lake environment given that local adaptation is typically associated with trade-offs wherein locally adapted individuals exhibit higher fitness in their local environment compared with individuals from a different population and environment (Kawecki and Ebert 2004). However, further studies on the extent of population differentiation throughout Lake Superior will be necessary not only to better define boundaries to gene flow but also characterise potentially adaptive traits in other localities.

Acknowledgements

We thank the Great Lakes Fishery Commission for the funding of the project and particularly to C Bronte, C Krueger and M Hansen for their help in fish identification. We also thank the KIYI crew for the opportunity to fish on the research boat on Lake Superior. We thank « Ressource Aquatique Québec (RAQ) » for bursaries and monetary help in assisting conferences. We are also grateful to Laura Benestan for graphical input, as well as Thierry Gosselin for bioinformatics support. This study was supported by a research grant from Science and Engineering Research Canada (NSERC strategic program) to L Bernatchez and Pascal Sirois, as well as a research grant from the Great Lakes Fisheries Commission.

51

Conclusion

Les avancées liées aux technologies de séquençage de nouvelle génération permettent maintenant la découverte de milliers de marqueurs chez des centaines d’individus sans aucun a priori. Ces marqueurs sont de plus en plus utilisés pour l’étude de la structure de population, mais aussi pour la détection d’adaptations locales. Cette étude à pris avantage de ces nouvelles technologies pour déterminer, à plus fine échelle, la structure génétique des différents écotypes de touladis du lac Supérieur ainsi que la présence potentielle d’adaptations locales. En combinaison aux analyses génétiques, cette étude à utilisé des analyses morphométriques afin de quantifier les différences dans la forme du corps et de la tête de tous les individus étudiés. Les résultats obtenus nous ont permis d’approfondir les connaissances tant au niveau de la structure génétique entre les écotypes et entre les sites d’échantillonnages qu’au niveau de la présence de divergence adaptative. En outre, cette étude à permis de vérifier si l’apparent parallélisme au niveau phénotypique est aussi présent au niveau génotypique.

Analyses morphologiques

En utilisant la forme du corps et de la tête, il nous a été possible, dans tous les sites échantillonnés, de distinguer différents groupes d’individus ayant des différences morphologiques correspondant aux écotypes décrits dans la littérature. Par contre, tous les écotypes n’affichaient pas de différences morphologiques significatives, et ce, selon le site étudié. Conséquemment, un certain continuum de divergence phénotypique a été observé dans les différents sites à l’étude. Par exemple, Isle Royale présente le plus haut niveau de divergence, avec les quatre écotypes morphologiquement différents, tandis qu’à Stannard Rock et Big Reef seulement l’écotype maigre se différencie. Puis, à Superior Shoals, c’est l’écotype gras qui est morphologiquement le plus distinct. Il est intéressant de noter que certaines composantes morphologiques semblent différencier les différents sites d’échantillonnages ce qui explique que les populations de certains écotypes semblables en apparence soient morphologiquement différents entre eux. Ces résultats suggèrent la

52

présence de différentes pressions de sélection à chacun des sites générant et maintenant les différences observées.

Analyses génomiques et divergence adaptative

Le séquençage de type « RAD » à permis l’identification de 6822 polymorphismes de nucléotide simple (SNP) génotypés chez 486 individus provenant des quatre sites d’échantillonnage et des quatre écotypes. Ces marqueurs ont mis en évidence, une différenciation génétique faible, plus grande entre les sites d’échantillonnage qu’entre les écotypes, ce qui indique un flux génique plus important entre écotypes qu’entre les différentes localités. Cette même tendance a été observée dans les différents indices de diversité génétique, où les sites d’échantillonnages montrent une plus grande différence qu’entre les écotypes tant au niveau de la consanguinité et de la diversité qu’au niveau de l’hétérozygotie. Un patron similaire a aussi été observé lors des analyses d’assignations populationnelles où chaque individu pouvait être assigné à son site d’échantillonnage avec succès, mais rarement à l’écotype qui lui était assigné. La seule exception était l’écotype maigre des sites Stannard Rock et Big Reef qui semblait faire partie du même groupement génétique. La présence d’une plus grande différentiation intersite qu’intrasite entre les différents écotypes avait été notée dans deux autres études basées sur des allozymes (Ihssen 1988; Dehring et coll. 1981). La structure génétique observée pourrait être expliquée par la faible distance parcourue, en moyenne, par les touladis du lac Supérieur (Kapuscinski et coll. 2005; Eschmeyer 1955), qui est en deçà de la distance séparant les différentes localités. De plus, une hypothèse d’isolement bathymétrique, il y a 8000 ans, a été avancée par Bronte et Moore (2007) pour expliquer le patron observé. Le lac Supérieur est constitué de vallées et de pics pouvant, lorsque le niveau de l’eau est très bas, agir comme barrière à la dispersion. Finalement, certains écotypes sont génétiquement différents à l’intérieur de certains sites. Conséquemment, l’étendue de la divergence génétique entre écotypes semble dépendante du site évalué ce qui indique un degré d’achèvement variable de l’isolation reproductif. L’ensemble de ces résultats suggère une évolution indépendante des différents écotypes à chacun des sites.

53

Malgré la faible différentiation génétique, plusieurs marqueurs potentiellement sous sélection divergente ont été identifiés, entre les différents sites, entre les écotypes, entre les écotypes à chacun des sites et en association avec la variation morphologique. Cependant, aucun marqueur, potentiellement sous sélection divergente, n’a été décelé en commun dans les quatre sites entre les différents écotypes. De manière générale, de cinq à 27 marqueurs, ont été identifié en commun (entre écotypes) entre deux site, et seulement trois en commun dans trois sites. Des marqueurs identifiés, huit ont retenu notre attention, étant en lien avec les différences écologiques et phénotypiques observées chez les différents écotypes de touladis. En résumé, cinq de ces marqueurs sont en relation avec le transport et le métabolisme des lipides et les trois autres avec le développement et l’acuité visuelle de la rétine ainsi que la perception visuelle. En fait, une des caractéristiques principales permettant l’identification visuelle des écotypes de touladi est le niveau de gras dans les tissus se manifestant par une plus grande épaisseur du corps chez les écotypes vivant en eaux profondes (ex. gras). Un haut taux de lipides dans les viscères a été suggéré comme force hydrostatique permettant les migrations diurnes dans la colonne d’eau (Henderson et Anderson 2002; Eschmeyer et Phillips 1965). De plus, des changements quant à la forme, la position ainsi que la sensibilité des yeux ont été démontrés chez plusieurs espèces aquatiques comme étant une adaptation à une faible luminosité (Von der Emde et coll. 2004). Ces marqueurs pourraient être en lien avec l’utilisation de l’habitat du touladi dans le lac Supérieur et plus spécifiquement avec sa position dans la colonne d’eau.

Différences historiques et impacts anthropogéniques

Plus de 10 variétés de touladis étaient décrites par les pêcheurs avant 1955 (Goodier 1981). Puis, l’arrivée de la lamproie marine et de plusieurs autres espèces invasives ainsi que des changements anthropogéniques (surexploitation, pollution), ont mené à l’effondrement de la pêche dans les années 60 (Bronte & Sitar 2008; Page et coll. 2003; Page et coll. 2004). Afin d’y restaurer la pêche, le lac Supérieur à été ensemencé avec des individus de l’écotype maigre provenant de lignées créées à partir des populations originaires du même lac (Page et

54

coll. 2004). Un des impacts de ces ensemencements, notés par Guinand et coll. (2003) et (2012), fut la diminution de la diversité génétique des populations de maigre ainsi que, à l’heure actuelle, leur plus grande affinité aux populations d’élevage qu’aux populations ancestrales. De façon similaire, un FST moyen plus bas entre les maigres qu’entre les autres écotypes à été observé dans la présente étude. Les deux autres écotypes, soit, gras et bossu, occupant différentes niches écologiques et ayant peu de pression de pêche, ont toujours été considérés comme peu affectés par ces changements (Guinand et coll. 2012). Cependant, l’étude conduite par Guinard et coll. (2012) a démontré une perte de diversité génétique chez ces écotypes au même titre que celui ensemencé (maigre). Cette perte de diversité génétique, pour tous les écotypes, a aussi été observée par une étude de Baillie et coll. (2016) entre des échantillons de la période de rétablissement (1995-1999) et ceux de la période contemporaine (2004-2013). En effet, une structure génétique par écotype, peu importe le site d’origine, a été observée chez les échantillons de 1995-1999, et elle a été remplacée par une structure génétique principalement par localité d’échantillonnage chez les échantillons contemporains. Ces observations mettent en lumière un plus grand échange de gènes entre écotypes contemporains, réduisant la distance génétique les séparant. La surexploitation, l’ensemencement intensif et l’invasion d’espèces exotiques pourraient avoir créé un chevauchement dans les régions utilisées pour la reproduction ou l’alimentation des différents écotypes et, se faisant, contribuer à l’affaiblissement de l’isolement reproductif (Baillie et coll. 2016). Les activités humaines représentent une force pouvant conduire à l’inversement du continuum de spéciation et à la diminution ou la disparition d’adaptations locales. La perte de biodiversité et l’inversion du processus de spéciation, appelée la dé- spéciation, causés par des changements liés à l’activité humaine, ont été observés chez plusieurs espèces d’eau douce telles que le corégone européen (Coregonus lavaretus) (Bhat et coll. 2014; Hudson et coll. 2013), les ciscos (Coregonus sp.) des Grands Lacs ainsi que chez les cichlidés du lac Victoria (Seehausen et coll. 2008).

55

Implications pour la gestion et la conservation

Le maintien de la diversité génétique est important non seulement pour la conservation, mais aussi pour la gestion des pêcheries, puisque cette diversité représente le potentiel évolutif et adaptatif d’une espèce face à un changement dans l’environnement (Toro and Caballero 2005). On peut donc conclure qu’une perte de diversité génétique, mais aussi la perte d’adaptations locales peuvent altérer le maintien et la résilience de l’espèce en question et même causer son épuisement (Laikre et coll. 2005; Zimmerman et coll. 2009). Cette diversité peut être représentée par des écotypes, vivant en sympatrie et adaptés à différentes niches écologiques. Dans cette optique, le maintien du polymorphisme écotypique permettrait d’assurer la continuité de la ressource. Basé sur les résultats de notre étude, nous recommandons, en premier lieu, une gestion par localité puis par écotypes puisque la présence d’adaptations locales a été décelée.

En ce qui a trait à la réintroduction du touladi dans les autres Grands Lacs, deux stratégies ont été énoncées spécifiquement pour le système des Grands Lacs par Krueger et coll. (1981). La première stratégie consiste à prélever un échantillon d’une population sauvage, représentant toute la diversité génétique de l’espèce et à effectuer des croisements afin de maximiser la variabilité génétique de la progéniture à ensemencer. Cette stratégie se base sur la sélection présente dans l’environnement afin de trouver la combinaison génétique adéquate. À l’inverse, la seconde stratégie repose sur l’acquisition d’individus représentant des populations déjà adaptées à certaines conditions environnementales similaires à celles du plan d’eau à ensemencer. Cette stratégie devrait permettre l’établissement de génotypes déjà adaptés, et limiter, lorsque présentes, l’hybridation avec les populations indigènes. Largement perturbé par l’activité humaine, le système des Grands Lacs peut ne plus correspondre aux niches trophiques existantes avant ces changements lorsque des populations indigènes étaient encore présentes. C'est pourquoi, nous préconisons l’utilisation de populations présentant un certain niveau de polymorphisme phénotypique et une très faible différentiation génétique comme source pour des ensemencements futurs. Le site de Superior Shoals serait un bon candidat comme source d’ensemencements futurs puisqu’il est le site où la différentiation génétique entre les écotypes est la plus faible tout en gardant une

56

diversité phénotypique intéressante. Ce site reflète, de notre point de vue, les premières étapes de la spéciation écologique. Ce faisant, le polymorphisme tant génétique que phénotypique présent à Superior Shoals pourrait ainsi augmenter les chances de survie dans un nouvel environnement tout en maintenant une certaine variabilité génétique ainsi qu’un certain potentiel d’adaptations locales.

Limitations et perspectives futures

Certaines limitations ont été rencontrées dans la réalisation de la présente étude. En premier lieu, un faible nombre d’individus pour certains écotypes dans certains sites (ex. bossu, redfin) a été échantillonné, ce qui a pu biaiser les indices de différentiation génétique ainsi que la divergence morphologique. En deuxième lieu, notre capacité visuelle ainsi que le succès mitigés des programmes statistiques à identifier et assigner un écotype aux individus échantillonnés étaient variables selon les sites ce qui aurait pu amener à la création artificielle de groupe d’individus mixtes. Même si cette tendance n’a pas été observée dans les analyses de type « structure », elle aurait pu contribuer à la diminution des indices de différentiation. De plus, en ce qui a trait au site de l’Isle Royale, les individus génotypés et utilisés pour les analyses morphologiques ont été triés sur le volet, par un collaborateur, comme étant les meilleurs représentants de chaque écotype contrairement aux autres sites où tous les individus échantillonnés ont été utilisés dans la première partie des analyses morphologiques. Cette distinction aurait pu exacerber les différences dans la forme du corps et de la tête observées lors des tests de différentiation morphologique.

La présente étude est la seule à ce jour à avoir utilisé les nouvelles technologies de séquençage et à avoir combiné des analyses génétiques et morphologiques des écotypes de touladis du lac Supérieur. Il est donc nécessaire d’approfondir les résultats présentés dans cette étude pour avoir une vue d’ensemble des processus évolutifs opérant dans ce système. Il serait nécessaire, entre autres, de déterminer l’étendue de la différentiation génétique dans l’ensemble du lac Supérieur, en ajoutant des sites d’échantillonnage dans d’autres régions

57

telles que la partie ouest du lac. L’ajout de ces sites pourrait aider à délimiter les barrières au flux génétique avec plus d’exactitude, et ainsi mieux en comprendre la dynamique. Pareillement, il serait intéressant d’utiliser des analyses d’inférence démographique afin de comprendre l’évolution des écotypes de touladi dans le lac Supérieur ainsi que la possibilité de leur récente homogénéisation. Ces informations seraient un atout quant à la gestion et à la conservation de la ressource, mais aussi dans la compréhension de l’effet des activités humaines sur les populations naturelles.

58

Bibliographie

Adams CE, Huntingford FA (2004) Incipient speciation driven by phenotypic plasticity? Evidence from sympatric population of arctic charr. Biological Journal of the Linnean Society, 81, 611-618.

Ahrenstorff T, Hrabik TR, Stockwell JD, Yule DL, and Sass GG (2011) Seasonally dynamic diel vertical migrations of Mysis diluviana, coregonine fishes, and siscowet lake trout in the pelagia of western Lake Superior. Transactions of the American Fisheries Society, 140, 1504–1520.

Alexander DH, November J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Research Cold Spring Harbor Lab, 19, 1655-1664.

Alfonso NR (2004) Evidence for two morphotypes of lake charr, Salvelinus namaycush, from Great Bear Lake, Northwest Territories, Canada. Environmental Biology of Fishes, 71, 21-32.

Aljanabi SM, Martinez I (1997) Universal and rapid salt-extraction of high quality genomic DNA for PCR-based techniques. Nucleic Acids Research, 25, 4692-4693.

Anderson EC (2010) Assessing the power of informative subsets of loci for population assignment: standard methods are upwardly biased. Molecular Ecology Resources, 10, 701-710.

April J, Hanner RH, Dion-Côté A et al. (2013) Glacial cycles as an allopatric speciation pump in north-eastern American freshwater fishes. Molecular Ecology, 22, 409-422.

Arendt J, Reznick D (2008) Convergence and parallelism reconsidered: what have we learned about the genetics of adaptation? Trends in Ecology and Evolution, 23, 26-32.

Bhat S, Amundsen P, Knudsen R et al. (2014) Speciation reversal in European whitefish (Coregonus lavaretus (L.)) caused by competitor invasion. PLoS ONE, 9, 1-10.

Baillie SM, Muir AM, Scribner K et al. (2016) Loss of genetic diversity and reduction of genetic distance among lake trout Salvelinus namaycush ecomorphs, Lake Superior 1959 to 2013. Journal of Great Lakes Research, 42, 1-13.

Barrett RDH, Schluter D (2008) Adaptation from standing genetic variation. Trends in Ecology and Evolution, 23, 38-44.

Bernardi G (2013) Speciation in fishes. Molecular Ecology, 22, 5487-5502.

Bernatchez, L. Adaptive evolutionary potential in the face of environmental change: considerations from studies of genomic variation in fishes. Journal of Fish Biology.

Berthelot C, Brunet F, Chalopin D (2014) The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nature Communications, 5.

Bhat S, Amundsen P, Knudsen R et al. (2014) Speciation reversal in European whitefish (Coregonus lavaretus (L.)) caused by competitor invasion. PLoS ONE, 9, 1-10.

59

Blackie C, Weese D, Noakes D (2003) Evidence for resource polymorphism in the lake charr (Salvelinus namaycush) population of Great Bear Lake, Northwest Territories, Canada. Ecoscience, 10, 509-514.

Bronte CR, Ebener MP, Schreiner DR et al. (2003) Fish community change in Lake Superior, 1970 – 2000. Canadian Journal of Fisheries and Aquatic Sciences, 60, 1552-1574.

Bronte CR, Moore SA (2007) Morphological variation of siscowet lake trout in Lake Superior. Transactions of the American Fisheries Society, 136, 509-517.

Bronte CR, Sitar SP (2008) Harvest and relative abundance of siscowet lake trout in Michigan waters of Lake Superior, 1929-1961. Transactions of the American Fisheries Society, 137, 916-926.

Burnham-Curtis MK, Smith G (1994) Osteological evidence of genetic divergence of lake trout (Salvelinus namaycush). Copea, 4, 843-850.

Butlin RK, Galindo J, Grahame W (2008) Sympatric, parapatric or allopatric: the most important way to classify speciation? Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 363, 2997-3007.

Butlin RK, Debelle A, Kerth C et al. (2012) What do we need to know about speciation? Trends in Ecology and Evolution, 27, 27-39.

Butlin RK, Saura M, Charrier G et al. (2013) Parallel evolution of local adaptation and reproductive isolation in the face of gene flow. Evolution, 68, 935-949.

Catchen JM, Amores A, Hohenlohe P et al. (2011) Stacks: building and genotyping loci de novo from short-read sequences. G3, 1, 171-182.

Chavarie L, Howland KL, Tonn WM (2013) Sympatric polymorphism in lake trout: The coexistence of multiple shallow-water morphotypes in Great Bear Lake. Transactions of the American Fisheries Society, 142, 814-823.

Colosimo PF, Hosemann KE, Balabhadra S et al. (2005) Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science, 307, 1928-1933.

Coyne JA, Orr HA (2004) Speciation. Sinauer Associates, Sunderland, MA. 545 pp.

Danecek P, Auton A, Abecasis G et al. (2011) The variant call format VCFtools. Bioinformatics, 27, 2156-2158.

Dehring T, Brown A, Daugherty C et al. (1981) Survey of the genetic variation among eastern Lake Superior lake trout (Salvelinus namaycush). Canadian Journal of Fisheries and Aquatic Sciences, 38, 1738-1746.

Do C, Waples RS, Peel D et al. (2014) NeEstimator V2 : re-implementation of software for the estimation of comtempory effective population size (Ne) from genetic data. Molecular Ecology Resources, 14, 209-214.

Elmer KR, Meyer A (2011) Adaptation in the age of ecological genomics: Insights from parallelism

60

and convergence. Trends in Ecology and Evolution, 26, 298-306.

Eshenroder R (2008) Differentiation of deep-water lake charr Salvelinus namaycush in North American lakes. Environmental Biology of Fishes, 83, 77-90.

Eschmeyer PH (1955) The reproduction of lake trout in southern Lake Superior. Transactions of the American Fisheries Society, 84, 47-74.

Eschmeyer PH, Phillips AM (1965) Fat content of the flesh of siscowets and lake trout from Lake Superior. Transactions of the American Fisheries Society, 94, 62-74.

Fitzpatrick BM, Fordyce JA, Gavrilets S (2009) Pattern, process and geographic modes of speciation. Journal of Evolutionary Biology, 22, 2342-2347.

Foll M , Gaggiotti OE (2008) A genome scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. Genetics, 180, 977-993.

Fraley C, Raftery AE (2012) MCLUST Version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation. Technical Report no. 597, Department of Statistics, University of Washington, June 2012.

Franchini P, Fruciano C, Spreitzer M et al. (2013) Genomic architecture of ecologically divergent body shape in a pair of sympatric crater lake cichlid fishes. Molecular Ecology, 23, 1828-1845.

Frichot E, François O (2015) LEA: an R package for landscape and ecological association studies. Methods in Ecology and Evolution, 6, 925-929.

Gagnaire P, Pavey S, Normandeau E et al. (2013) The genetic architecture of reproductive isolation during speciation-with-gene-flow in Lake Whitefish species pairs assessed by RAD sequencing. Evolution, 67, 2483-2797.

Gavrilets S, Vose A, Barluenga M et al. (2007) Case studies and mathematical models of ecological speciation. 1. Cichlids in a crater lake. Molecular Ecology, 16, 2893-2909.

Gislason D, Ferguson MM, Skúlason S, Snorrason SS (1999) Rapid and coupled phenotypic and genetic divergence in Icelandic Arctic Char (Salvelinus alpinus). Canadian Journal of Fisheries and Aquatic Sciences, 56, 2229-2234.

Goetz F, Rosauer D, Sitar S et al. (2010) A genetic basis for the phenotypic differentiation between siscowet and lean lake trout (Salvelinus namaycush). Molecular Ecology, 19, 176-196.

Goetz F, Sitar S, Rosauer D et al. (2011) The reproductive biology of siscowet and lean lake trout in southern Lake Superior. Transactions of the American Fisheries Society, 140, 1472-1791.

Goodier JL (1981) Native lake trout (Salvelinus namaycush) stocks in the Canadian waters of Lake Superior Prior to 1955 '. Journal of Fisheries and Aquatic Sciences, 38, 1724-1737.

Gosselin T, Bernatchez L (2016). stackr: GBS/RAD data exploration, manipulation and visualization using R. R package version 0.2.1. https://github.com/thierrygosselin/stackr.

Guinand B, Scribner KT, Page KS et al. (2003) Genetic variation over space and time: analyses of

61

extinct and remnant lake trout populations in the Upper Great Lakes. Proceedings of the Royal Society Biological sciences, 270, 425-433.

Guinand B, Page KS Burnham-Curtis MK et al. (2012) Genetic signatures of historical bottlenecks in sympatric lake trout (Salvelinus namaycush) morphotypes in Lake Superior. Environmental Biology of Fishes, 95, 323-334.

Hansen M, Nate N, Muir A et al. (2016) Life history variation among four lake trout morphs at Isle Royale, Lake Superior. Journal of Great Lakes Research, 42, 421-432.

Hansen M, Nate N, Krueger C et al. (2012) Age, growth, survival, and maturity of lake trout morphotypes in Lake Mistassini, Quebec. Transactions of the American Fisheries Society, 141, 1492-1503.

Harris LN, Chavarie L, Bajno R et al. (2014) Evolution and origin of sympatric shallow-water morphotypes of lake trout, Salvelinus namaycush, in Canada’s Great Bear Lake. Heredity, 114, 94- 106.

Henderson B, Anderson D (2002) Phenotypic differences in buoyancy and energetic of lean and siscowet lake charr in Lake Superior. Environmental Biology of Fishes, 64, 203-209.

Hendry AP (2009) Speciation. Nature, 458, 162-164.

Hendry AP, Bolnick DI, Bernier D et al. (2009) Along the speciation continuum in sticklebacks. Journal of Fish Biology, 75, 2000-2036.

Hrabik TR, Rothb BM, Ahrenstorff T (2014) Predation risk and prey fish vertical migration in Lake Superior: Insights from an individual based model of siscowet (Salvelinus namaycush). Journal of Great Lakes Research, 40, 730–738.

Hudson AG, Vonlanthen P, Bezault E et al. (2013) Genomic signatures of relaxed disruptive selection associated with speciation reversal in whitefish. BMC Evolutionary Biology, 13, 108.

Ihssen PE, Casselman JM, Martin GW et al.(1988) Biochemical genetic differentiation of lake trout (Salvelinus namaycush) stocks of the Great Lakes region. Canadian Journal of Fishery and Aquatic Sciences, 45, 1018-1029.

Jones FC, Grabherr MG, Chan YF et al. (2012) The genomic basis of adaptive evolution in threespine sticklebacks. Nature, 484, 55-61.

Jonsson B, Jonsson N (2001) Polymorphism and speciation in Arctic charr. Journal of Fish Biology, 58, 605-638.

Kalinowski ST (2009) How well do evolutionary trees describe genetic relationships between populations? Heredity, 102, 506-513.

Kapuscinski K, Hansen S, Schram S (2005) Movements of lake trout in U.S. waters of Lake Superior, 1973–2001. North American Journal of Fisheries Management, 25, 696-708.

Kawecki TJ, Ebert D (2004) Conceptual issues in local adaptation. Ecology Letters, 7, 1225-1241.

62

Krueger CC, Gharrett AJ, Dehring TR et al. (1981) Genetic aspects of fisheries rehabilitation programs. Canadian Journal of Fisheries and Aquatic Sciences, 38, 1877-1881.

Laikre L, Palm S, Ryman N (2005) Genetic population structure of fishes: Implications for coastal zone management. Ambio, 34, 111-119.

Laporte M, Rogers S, Dion-Côté A et al. (2015) RAD-QTL mapping reveals both genome level parallelism and different genetic architecture underlying the evolution of body shape in Lake Whitefish (Coregonus clupeaformis) species pairs. G3: Genes| Genomes | Genetics, 5, 1481-1491.

Lu G, Bernatchez L (1999) Correlated trophic specialization and genetic divergence in sympatric Lake Whitefish ecotypes (Coregonus clupeaformis): support for the ecological speciation hypothesis. Evolution, 53, 1491-1505.

Lundsgaard-Hensen B, Matthews B, Vonlanthen P et al. (2013) Adaptive plasticity and genetic divergence in feeding efficiency during parallel adaptive radiation of whitefish (Coregonus spp.). Journal of Evolutionary Biology, 26, 483-498.

Mallet J, Meyer A, Nosil P et al. (2009) Space, sympatry and speciation. Journal of Evolutionary Biology, 22, 2332-2341.

Magalhaes IS, Mwaiko S, Schneider MV et al. (2009) Divergent selection and phenotypic plasticity during incipient speciation in Lake Victoria cichlid fish. Journal of Evolutionary Biology, 22, 260- 274.

Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17, 10-12.

Mascher M, Wu S, St.Amand P et al. (2013) Application of genotyping-by-sequencing on semiconductor sequencing platforms: a comparison of genetic and reference-based marker ordering in barley. PLoS ONE, 8, 1-11.

Meirmans PG, Van Tienderen PH (2004) GENOTYPE and GENODIVE: two programs for the analysis of genetic diversity of asexual organisms. Molecular Ecology Notes, 4, 792-794.

Mitteroecker P, Bookstein F (2011) Linear discrimination, ordination, and the visualization of selection gradients in modern morphometrics. Evolutionary Biology, 38, 100-114.

Moore S, Bronte C (2001) Delineation of sympatric morphotypes of lake trout in Lake Superior. Transactions of the American Fisheries Society, 130, 1233-1240.

Moore S, Bronte C (2007) Morphological variation of siscowet lake trout in Lake Superior. Transactions of the American Fisheries Society, 136, 509-517.

Muir AM, Hansen M, Bronte C et al. (2015) If Arctic charr Salvelinus alpinus is ‘the most diverse vertebrate’, what is the lake charr Salvelinus namaycush? Fish and Fisheries.

Muir AM, Bronte C, Zimmerman MS et al. (2014) Ecomorphological diversity of lake trout at Isle Royale, Lake Superior. Transactions of the American Fisheries Society, 143, 972-987.

63

Muir AM, Vecsei P, Krueger CC (2012) A Perspective on perspectives: methods to reduce variation in shape analysis of digital images. Transactions of the American Fisheries Society, 141, 1161-1170.

Nosil P, Harmon L, Seehausen O (2009) Ecological explanations for (incomplete) speciation. Trends in Ecology and Evolution, 24, 145-156.

Page KS, Scribner KT, Bennett KR et al. (2003) Genetic assessment of strain-specific sources of lake trout recruitment in the Great Lakes. Transactions of the American Fisheries Society, 132, 877- 894.

Page KS, Scribner KT, Burnham-Curtis M (2004) Genetic diversity of wild and hatchery lake trout populations: relevance for management and restoration in the Great Lakes. Transactions of the American Fisheries Society, 133, 674-691.

Palsbøll PJ, Bérubé M, Allendorf FW (2007) Identification of management units using population genetic data. Trends in ecology & evolution, 22, 11-16.

Pfennig DW, Wund MA, Snell-Rood EC et al. (2010) Phenotypic plasticity’s impacts on diversification and speciation. Trends in Ecology & Evolution, 25, 459-467.

Ralph PL, Coop G (2014) Convergent evolution during local adaptation to patchy landscape. PLOS Genetics, 11.

Ray BA, Hrabik TR, Ebener MP et al. (2007) Diet and prey selection by Lake Superior lake trout during spring, 1986–2001. Journal of Great Lakes Research, 33, 104–113.

Rellstab C, Gugerli F, Eckert AJ et al. (2015) A practical guide to environmental association analysis in landscape genomics. Molecular Ecology, 24, 4348-4370.

Riley SC, He J.X, Johnson JE, O'Brien TP, Schaeffer JS (2007) Evidence of widespread natural reproduction by lake trout Salvelinus namaycush in the Michigan waters of Lake Huron. Journal of Great Lakes Research, 33, 917-921.

Ruyter B, Anderson Ø, Dehli A et al. (1997) Peroxisome proliferator activated receptors in Atlantic salmon (Salmo Salar): effects on PPAR transcription and acyl-CoA oxidase activity in hepatocytes by peroxisome proliferators and fatty acids. Biochimica et Biophysica Acta, 1348, 331-338.

Schluter D (2001) Ecology and the origin of species. Trends in Ecology and Evolution, 16, 372-380.

Schluter D (2016) Speciation, ecological opportunity, and latitude. The American Naturalist, 187, 1- 18.

Seale P, Bjork B, Yang W et al. (2008) PRDM16 controls a brown fat/skeletal muscle switch. Nature, 454, 961-967.

Seehausen O, Takimoto G, Roy D et al. (2008) Speciation reversal and biodiversity dynamics with hybridization in changing environments. Molecular Ecology, 17, 30-44.

64

Skoglund S, Siwertsson A, Amundsen P et al. (2015) Morphological divergence between three Arctic charr morphs - the significance of the deep-water environment. Ecology and Evolution, 15, 3114-3129.

Skúlason S, Smith TB (1995) Resource polymorphisms in vertebrates. Trends in Ecology and Evolution, 10, 366–370.

Smith TB, Skúlason S (1996) Evolutionary significance of resource polymorphisms in fishes, amphibians, and birds. Annual review of ecology and systematic, 27, 111-133.

Stafford CP, McPhee MV, Eby LA et al. (2014) Introduced lake trout exhibit life history and morphological divergence with depth. Canadian Journal of Fisheries and Aquatic Sciences, 71, 10- 20.

Taylor EB (1999) Species pairs of north temperate freshwater fishes: Evolution, taxonomy, and conservation. Reviews in Fish Biology and Fisheries, 9, 299-324.

Tittes S, Kane N (2014) The genomics of adaptation, divergence and speciation: a congealing theory. Molecular Ecology, 23, 3938-3940.

Todd TN, Stedman RM (1989). Hybridization of ciscoes (Coregonus spp.) in Lake Huron. Canadian Journal of Zoology, 67,1679- 1685.

Toro MA, Caballero A (2005) Characterization and conservation of genetic diversity in subdivided populations. Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 360, 1367-1378.

Turelli M, Barton NH, Coyne JA (2001) Theory and speciation. Trends in ecology & evolution, 16, 330-343.

Turgeon J, Estoup A, Bernatchez L (1999) Species flock in the North American Great Lakes: molecular ecology of Lake Nipigon ciscoes (Teleostei: Coregonidae: Coregonus). Evolution, 53, 1857-1871.

Turgeon J, Reid SM, Bourret A et al. (2016) Morphological and genetic variation in cisco (Coregonus artedi) and shortjaw cisco (C. zenithicus): multiple origins of shortjaw cisco in inland lakes require a lake-specific conservation approach. Conservation Genetics, 17, 45-56.

Vander Zanden MJ, Shuter BJ, Lester NP et al. (2000) Within- and among-population variation in the trophic position of a pelagic predator, lake trout (Salvelinus namaycush). Canadian Journal of Fisheries and Aquatic Sciences, 57, 725-731.

Von der Emde G, Mogdans J, Kapoor B (2004) The Senses of Fish: Adaptations for the Reception of Natural Stimuli. Springer, India.

Wagner C, Keller I, Wittwer S et al. (2013) Genome-wide RAD sequence data provide unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlids adaptive radiation. Molecular Ecology, 22, 787-798.

65

Waples RS, Gaggiotti O (2006) What is a population? An empirical evaluation of some genetic methods for identifying the number of gene pools and their degree of connectivity. Molecular Ecology, 15, 1419-1439.

Webster M, Sheets HD (2010) A practical introduction to landmark-based geometric morphometrics. In: Alroy J, Hunt G, editors. Quantitative Methods in Paleobiology Paleontological Society Papers. 16,163–188.

Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358–70.

Weissing F, Edelaar P, Van Doorn G (2011) Adaptive speciation theory: a conceptual review. Behavioral Ecology and Sociobiology, 65, 461-480.

Wennersten L, Forsman A (2012) Population-level consequences of polymorphism, plasticity and randomized phenotype switching: a review of predictions. Biological Reviews, 87, 756-767.

Wilson CC, Hebert PDN (1996) Phylogeographic origins of lake trout (Salvelinus namaycush) in eastern North America. Canadian Journal of Aquatic Sciences, 53, 2764-2775.

Wilson CC, Hebert PDN (1998) Phylogeography and postglacial dispersal of lake trout (Salvelinus namaycush) in North America. Canadian Journal of Aquatic Sciences, 55, 1010-1024.

Wilson CC, Mandrak NE (2004) History and evolution of lake trout in shield lakes: past and future challenges. In:Boreal Shield Watersheds: Lake Trout Ecosystems in a Changing Environment (eds Gunn JM, Steedman RJ, Ryder RA), CRCPress, Boca Raton, Florida.

Wimberger PH. Trophic polymorphisms, plasticity, and speciation in vertebrates. In: Stouder DJ, Fresh K, Feller RJ, editors. Theory and application in fish feeding ecology. Columbia, SC: University of South Carolina Press; 1994. pp. 19–43.

Zimmerman MS, Schmidt SN, Krueger CC et al. (2009) Ontogenetic niche shifts and resource partitioning of lake trout morphotypes. Canadian Journal of Fisheries and Aquatic Sciences, 66, 1007–1018.

Zimmerman MS, Krueger CC (2009) An ecosystem perspective on re-establishing native deepwater fishes in the Laurentian Great Lakes. North American Journal of Fisheries Management, 29, 1352- 1371.

Zimmerman MS, Krueger CC, Eshenroder RL (2006) Phenotypic diversity of lake trout in Great Slave Lake: differences in morphology, buoyancy, and habitat depth. Transactions of the American Fisheries Society, 135, 1056-1067.

Zimmerman MS, Krueger CC, Eshenroder RL (2007) Morphological and ecological differences between shallow- and deep-water lake trout in Lake Mistassini, Quebec. Journal of Great Lakes Research, 33, 156-169.

66

Annexe 1

SNP identification and genotyping Raw reads quality was assessed with FastQC (v.0.11.3) (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc) before and after filtering with Cutadapt (Martin 2011). Reads were filtered with Cutadapt to remove possible adaptor sequences “AGATCGGAAGAGCGGG” left from the sequencing and/or possibly preceded by the frequent cutter sequence “CCGA”. For recognition, two nucleotide mismatches were allowed in the adaptor from the 3’ end (-e 0.2) and sequences less than 50 bp were discarded (-m 50). Subsequently, the program STACKS v.1.32 was used for read assembly and SNP genotyping (Catchen et al. 2011). Briefly, reads were cleaned, demultiplexed and reads longer than 80 bp were trimmed to 80 bp by the module process radtags. For de novo assembly (ustacks), a minimum of four reads were required to form a stacks (-m 4), a maximum of five mismatches was allowed between stacks to form a locus (-M 5), a maximum of seven mismatches to align secondary reads (-N 7) and a maximum of two stacks per locus was allowed (--max_locus_stacks 2). In addition, an alpha level of 0.05, a model type bounded (lower bound of zero, higher bound of 0.1) and one mismatch between loci (-n 1) were chosen for catalog building (cstacks and sstacks). Then, the program rxstacks was used to make genotype corrections based on all populations and individuals. This module was run with a log likelihood filter of -10 (--ln_lim 10) chosen based on the log likelihood distribution of catalog loci, a confounded filter limit of 0.75 (--conf_lim 0.75), the same model type as previously mentioned and haplotype pruning was enabled. Modules cstacks and sstacks were run again to rebuild a new catalog with corrections from rxstacks. Finally, the module population with a population map grouping individuals per site was run to generate a vcf output. Loci being retained had to be present in a minimum of 70% of individuals from a given site (-r 0.7), in at least two sites (-p 2), have a minimum stack depth of 4 (-m 4) and a minimum allele frequency of 0.001 (-a 0.001). Subsequently, individuals with more than 40% missing genotype calculated with vcftools (Danecek et al. 2001) were removed and the population module was run again without these individuals. Lastly, the resulting vcf file was filtered using custom script to retain the best quality SNPs (Table 2). Briefly, to be kept for subsequent analyses, a SNP should have a high genotype quality by having 1) a maximum allelic imbalance of 5 (ratio of the number of sequences for the major allele on the number of sequences for the minor allele) and 2) a minimum genotype likelihood of 6. An SNP should not be in Hardy-Weinberg disequilibrium by having 1) an observed heterozygosity ≤ 0.6, and 2) a FIS ≥ -0.3 and ≤ 0.3. An SNP should also have a global minor allele frequency (MAF) ≥ 0.01 or a local MAF ≥ 0.05. Loci with less than eight SNPs were kept and the first SNP per locus was kept to limit the extent of linkage disequilibrium.

67

Table S1. Between-group PCA analysis of group distance among the four ecotypes pooled (Siscowet (FT), Humper (HT), Redfin (RF), Lean (LT)) with 10000 permutation for head and body shape. Group distances differences were significant (p < 0.001) for all comparisons.

Head Body FT HT LT FT HT LT H 0.041 0.029 T LT 0.055 0.031 0.033 0.010 RF 0.037 0.031 0.031 0.019 0.016 0.019

68

Table S2. Paiwise differentiation (FST) and genetic diversity indices calculated between each pair of ecotypes within and among sites with 6822 SNPs. Significant FST are indicated with asterix (**p-values ≤ 0.01, *p-values ≤ 0.01). Within site comparisons are highlighted in grey and between sites comparisons for similar ecotypes are in bold. Ecotypes are: Siscowet (FT), Humper (HT), Lean (LT) and Redfin (RF).

Big Reef Isle Royale Stannard Rock Superior Shoals LT FT RF HT LT FT RF HT LT FT RF HT LT FT RF HT LT -- FT 0.008** -- RF 0.006** 0.000 -- Big Reef HT 0.012** 0.000 0.002 -- LT 0.017** 0.019** 0.016** 0.015** -- FT 0.022** 0.022** 0.022** 0.020** 0.010** -- RF 0.027** 0.024** 0.021** 0.020** 0.001* 0.012** --

Isle Royale HT 0.028** 0.023** 0.020** 0.020** 0.006** 0.017** 0.006** -- LT 0.006** 0.015** 0.013** 0.022** 0.018** 0.025** 0.027** 0.030** -- FT 0.014** 0.014** 0.014** 0.017** 0.024** 0.022** 0.031** 0.034** 0.012** --

Rock RF 0.008** 0.013** 0.010** 0.018** 0.017** 0.024** 0.025** 0.028** 0.002* 0.005** --

Stannard Stannard HT 0.015** 0.014** 0.014** 0.021** 0.020** 0.026** 0.027** 0.030** 0.009** 0.002* 0.002 -- LT 0.025** 0.018** 0.017** 0.021** 0.014** 0.023** 0.015** 0.021** 0.026** 0.019** 0.017** 0.014** -- FT 0.024** 0.021** 0.018** 0.018** 0.011** 0.021** 0.013** 0.018** 0.021** 0.021** 0.014** 0.013** 0.002 -- RF 0.033** 0.030** 0.026** 0.025** 0.010** 0.018** 0.009** 0.014** 0.033** 0.032** 0.026** 0.025** 0.006** 0.004** -- Shoals Superior Superior HT 0.029** 0.027** 0.022** 0.020** 0.009** 0.019** 0.009** 0.013** 0.029** 0.026** 0.021** 0.019** 0.005* 0.002** 0.000 --

69

Table S3. SNPs associated with variation in head and body shape detected by LFMM. P-values were adjusted with a ℷ of 0.55 and population structure was corrected using a K of 5. The first four PCs of body shape and the first six PCs of head shape were chosen as phenotypic variables based on the broken stick method. SNPs associated with each variable are highlighted in grey. The other numbers represent SNPs common between variables.

Body Head PCs 1 2 3 4 1 2 3 4 5 6 1 189 2 4 74 Body 3 2 0 63 4 1 0 2 77 1 65 6 3 4 129 2 7 0 4 6 6 154 3 2 6 2 3 4 2 107 Head 4 4 3 4 4 5 0 5 113 5 13 14 1 5 3 4 3 2 93 6 7 1 4 3 3 5 1 2 3 113

70

Table S4. Hits against the Rainbow Trout genome (Oncorhynchus mykiss) of loci sequences identified as potentially under selection according to BAYESCAN or LFMM between ecotypes or between sites as well as in association with phenotypic variation. The comparison in which SNPs were identified is described as follow: between ecotypes (E), between sites (S) or in association with phenotypic variation (P). Gene name, loci ID and transcript ID is provided for the remaining 258 markers after quality filtration as well as their percentage identity, sequence length, base-pair differences between query and reference, and significance (e-value).

Comparison % length diff. Gene name Loci Transcript ID e-value type identity (bp) (bp) 1 anoctamin‐8‐like isoform x2 26073 E, P GSONMT00054401001 100.0 80 0 4.73E‐36 2 at‐rich interactive domain‐containing protein 3b‐like 53967 E GSONMT00067039001 100.0 80 0 4.73E‐36 3 baculoviral iap repeat‐containing protein 6 15229 E, P GSONMT00060030001 100.0 80 0 4.73E‐36 4 glutamate receptor delta‐1‐like isoform x1 56484 P GSONMT00065227001 100.0 80 0 4.73E‐36 5 histone‐lysine n‐methyltransferase mll3 isoform x3 19958 E GSONMT00068823001 100.0 80 0 4.73E‐36 6 hydrocephalus‐inducing protein homolog 5057 E, S, P GSONMT00076258001 100.0 80 0 4.73E‐36 7 latent‐transforming growth factor beta‐binding protein 3 isoform x1 47104 E GSONMT00069497001 100.0 80 0 4.73E‐36 lish domain and heat repeat‐containing protein kiaa1468 homolog 8 26998 P GSONMT00038235001 100.0 80 0 4.73E‐36 isoform x1 9 myotubularin‐related protein 4‐like isoform x2 55983 E, P GSONMT00040680001 100.0 80 0 4.73E‐36 10 nuclear pore complex protein nup153‐like isoform x1 44429 E, P GSONMT00025986001 100.0 80 0 4.73E‐36 11 pms1 protein homolog 1 isoform x1 101365 P GSONMT00075117001 100.0 80 0 4.73E‐36 12 protein wnt‐8c‐like 90881 P GSONMT00002071001 100.0 80 0 4.73E‐36 13 spectrin beta non‐erythrocytic 4‐like isoform x1 60407 P GSONMT00074634001 100.0 80 0 4.73E‐36 14 zinc finger protein 319‐like 7800 S GSONMT00015126001 100.0 80 0 4.73E‐36 15 adnp homeobox protein 2‐like 111578 E GSONMT00007460001 100.0 80 0 4.73E‐36 16 biorientation of in cell division protein 1‐like 1‐like 43621 E GSONMT00032398001 100.0 80 0 4.73E‐36 17 calcium homeostasis endoplasmic reticulum isoform x2 104892 E GSONMT00072562001 100.0 80 0 4.73E‐36 18 chondroadherin 6477 S GSONMT00047209001 100.0 80 0 4.73E‐36 19 e3 ubiquitin‐protein ligase dtx4‐like isoform x1 2976 S GSONMT00030035001 100.0 80 0 4.73E‐36 20 fts and hook‐interacting protein homolog 5061 S GSONMT00041201001 100.0 80 0 4.73E‐36 21 guanine nucleotide‐binding protein g g g subunit beta‐3‐like 5695 S GSONMT00025707001 100.0 80 0 4.73E‐36

71

22 lim homeobox protein lmx‐ ‐like isoform x1 38018 E GSONMT00078944001 100.0 80 0 4.73E‐36 23 nucleoprotein tpr 4730 S GSONMT00034367001 100.0 80 0 4.73E‐36 24 peripherin‐2‐like 5575 S GSONMT00069513001 100.0 80 0 4.73E‐36 25 probable g‐protein coupled receptor 149‐like 2063 S GSONMT00003168001 100.0 80 0 4.73E‐36 26 protein piccolo‐like 100242 E GSONMT00019652001 100.0 80 0 4.73E‐36 27 transcriptional activator gli3‐like 3941 S GSONMT00079138001 100.0 80 0 4.73E‐36 28 transmembrane and coiled‐coil domains protein 3‐like 9953 E GSONMT00061555001 100.0 80 0 4.73E‐36 29 zinc finger ccch domain‐containing protein 13‐like 6116 S GSONMT00022019001 100.0 80 0 4.73E‐36 30 transposable element tcb1 transposase 28382 S GSONMT00023176001 100.0 79 0 1.70E‐35 31 roundabout homolog 1‐like 9822 P GSONMT00063386001 100.0 80 0 6.12E‐35 32 alpha‐n‐acetylgalactosaminide alpha‐ ‐sialyltransferase 5‐like 30704 P GSONMT00057798001 98.8 80 1 2.20E‐34 33 ankyrin repeat and btb poz domain‐containing protein btbd11 90882 E, P GSONMT00040601001 98.8 80 1 2.20E‐34 34 cub domain‐containing protein 1‐like 17799 P GSONMT00047670001 98.8 80 1 2.20E‐34 35 desmoplakin isoform x1 8766 E GSONMT00079639001 98.8 80 1 2.20E‐34 36 gap junction alpha‐9 protein 55108 S GSONMT00041354001 98.8 80 1 2.20E‐34 37 interferon regulatory factor 2‐binding protein 2‐b‐like 57567 E GSONMT00034070001 98.8 80 1 2.20E‐34 38 intestinal‐type alkaline phosphatase 1‐like isoform x2 30228 P GSONMT00032149001 98.8 80 1 2.20E‐34 39 laminin subunit alpha‐2‐like 30362 P GSONMT00072221001 98.8 80 1 2.20E‐34 40 low quality protein: cullin‐9‐like 50860 E GSONMT00080507001 98.8 80 1 2.20E‐34 41 lysine‐specific demethylase 5b‐b‐like isoform x1 123864 P GSONMT00028137001 98.8 80 1 2.20E‐34 42 methylated‐dna‐‐protein‐cysteine methyltransferase 23742 E, P GSONMT00082523001 98.8 80 1 2.20E‐34 43 ngfi‐a‐binding protein 1‐like 14232 P GSONMT00029060001 98.8 80 1 2.20E‐34 44 ninjurin‐1‐like 18559 P GSONMT00003195001 98.8 80 1 2.20E‐34 45 nuclear pore complex protein nup214‐like 24790 E, P GSONMT00050062001 98.8 80 1 2.20E‐34 46 PREDICTED: uncharacterized protein LOC100001321 38523 S GSONMT00029755001 98.8 80 1 2.20E‐34 47 ras‐related protein rab‐19‐like 52545 E GSONMT00060221001 98.8 80 1 2.20E‐34 48 superkiller viralicidic activity 2‐like 2 5005 E,S GSONMT00070735001 98.8 80 1 2.20E‐34 49 suppressor of cytokine signaling 4 35879 P GSONMT00043783001 98.8 80 1 2.20E‐34 50 upf0461 protein c5orf24 homolog 24563 E GSONMT00044986001 98.8 80 1 2.20E‐34

72

51 activating molecule in becn1‐regulated autophagy protein 1‐like 59404 E GSONMT00078025001 98.8 80 1 2.20E‐34 52 alpha‐( )‐fucosyltransferase‐like 2335 S GSONMT00009526001 100.0 80 0 2.20E‐34 53 cgmp‐dependent protein kinase 2 1137 E,S GSONMT00064545001 98.8 80 1 2.20E‐34 54 dentin sialophospho isoform x4 116789 E GSONMT00061661001 98.8 80 1 2.20E‐34 55 eukaryotic translation initiation factor 2‐alpha kinase 4 1515 E,S GSONMT00055112001 98.8 80 0 2.20E‐34 56 excitatory amino acid transporter 5 60687 E GSONMT00061594001 98.8 80 1 2.20E‐34 57 junction‐mediating and ‐regulatory 17692 E GSONMT00031579001 98.8 80 1 2.20E‐34 58 magnesium transporter protein 1 6437 S GSONMT00031369001 98.8 80 1 2.20E‐34 59 myomegalin isoform x8 1017 E, S GSONMT00068188001 98.8 80 1 2.20E‐34 phosphatidylinositol 4‐phosphate 3‐kinase c2 domain‐containing 60 113457 E GSONMT00020329001 98.8 80 1 2.20E‐34 subunit gamma‐like pleckstrin homology domain‐containing family g member 4b‐like 61 4745 S GSONMT00044973001 98.8 80 1 2.20E‐34 isoform x2 62 pleckstrin homology domain‐containing family h member 1 14563 E GSONMT00055075001 98.8 80 1 2.20E‐34 63 pr domain zinc finger protein 16‐like isoform x2 60678 E GSONMT00044086001 98.8 80 1 2.20E‐34 64 receptor‐type tyrosine‐protein phosphatase delta‐like isoform x7 47268 E GSONMT00014097001 98.8 80 1 2.20E‐34 65 rho gtpase‐activating protein 35‐like 4005 S GSONMT00013509001 98.8 80 1 2.20E‐34 66 syntabulin‐like isoform x1 2309 S GSONMT00071916001 98.8 80 1 2.20E‐34 67 ww domain‐containing adapter protein with coiled‐coil isoform 3 4741 S GSONMT00077786001 98.8 80 1 2.20E‐34 68 zinc finger e‐box‐binding homeobox 2‐like 3041 E,S GSONMT00047434001 98.8 80 1 2.20E‐34 69 septin‐10‐like isoform x2 91888 E GSONMT00051997001 98.7 79 1 7.92E‐34 70 plexin‐a2‐like 88779 P GSONMT00018513001 98.7 80 1 2.85E‐33 71 dynactin subunit 6 6279 S GSONMT00036651001 98.7 80 1 2.85E‐33 72 etoposide‐induced protein homolog 95493 E GSONMT00006738001 98.7 78 1 2.85E‐33 73 a‐kinase anchor protein mitochondrial‐like 7702 P GSONMT00030994001 97.5 80 2 1.02E‐32 74 diacylglycerol kinase zeta‐like 92913 P GSONMT00078021001 97.5 80 2 1.02E‐32 75 gamma‐butyrobetaine dioxygenase‐like 126319 P GSONMT00064074001 97.5 80 2 1.02E‐32 76 gdp‐man:man c ‐pp‐dol alpha‐ ‐mannosyltransferase‐like 2931 E,S GSONMT00053049001 97.5 80 2 1.02E‐32 77 leucine‐rich alpha‐2‐glycoprotein 42286 P GSONMT00068092001 97.5 80 2 1.02E‐32

73

78 mediator of rna polymerase ii transcription subunit 9 7050 P GSONMT00060569001 97.5 80 2 1.02E‐32 79 methionine‐‐trna mitochondrial‐like 22696 E, P GSONMT00017184001 97.5 80 2 1.02E‐32 80 neurogenic locus notch homolog protein 1‐like 42458 E, P GSONMT00069756001 97.5 80 2 1.02E‐32 81 protein dispatched homolog 1 91805 P GSONMT00065735001 97.5 80 2 1.02E‐32 82 protein fam65a 112120 P GSONMT00076273001 97.5 80 2 1.02E‐32 83 serine threonine‐protein kinase tao2‐like 92344 S GSONMT00021298001 97.5 80 2 1.02E‐32 84 spartin isoform x2 92211 P GSONMT00001726001 97.5 80 2 1.02E‐32 85 synembryn‐b‐like isoform x2 94140 P GSONMT00046577001 97.5 80 2 1.02E‐32 86 tpa_inf: von willebrand factor 110066 E, P GSONMT00000124001 97.5 80 2 1.02E‐32 87 ubiquitin thioesterase otu1 6860 P GSONMT00049668001 97.5 80 2 1.02E‐32 88 vwfa and cache domain‐containing protein 1 22652 P GSONMT00015355001 97.5 80 2 1.02E‐32 89 zinc finger protein 831 23802 P GSONMT00026913001 97.5 80 2 1.02E‐32 90 cap‐gly domain‐containing linker protein 1‐like 101752 E GSONMT00003762001 97.5 80 2 1.02E‐32 91 chromodomain‐helicase‐dna‐binding protein 6 39341 E GSONMT00003315001 97.5 80 2 1.02E‐32 92 cue domain‐containing protein 1‐like isoform x3 5267 S GSONMT00023710001 97.5 80 2 1.02E‐32 93 fidgetin‐like 20127 E GSONMT00055663001 97.5 80 2 1.02E‐32 94 helicase srcap‐like 35881 E GSONMT00065964001 97.5 80 2 1.02E‐32 95 krueppel‐like factor 15 4930 S GSONMT00010151001 97.5 80 2 1.02E‐32 96 leucine‐rich repeat‐containing protein 58 2875 S GSONMT00032546001 97.5 80 2 1.02E‐32 97 metal regulatory transcription factor 1‐like 1479 S GSONMT00034432001 97.5 80 2 1.02E‐32 98 neuronal cell adhesion molecule‐like isoform x2 2622 S GSONMT00047986001 97.5 80 2 1.02E‐32 99 peregrin‐like isoform x1 5778 S GSONMT00010432001 97.5 80 2 1.02E‐32 100 periplakin isoform x1 4315 S GSONMT00063255001 97.5 80 2 1.02E‐32 101 PREDICTED: uncharacterized protein LOC100537266 59898 E GSONMT00043169001 97.5 80 2 1.02E‐32 102 protein apcdd1‐like 5529 S GSONMT00082480001 97.5 80 2 1.02E‐32 103 rho guanine nucleotide exchange factor 40‐like isoform x2 6623 S GSONMT00018296001 97.5 80 2 1.02E‐32 104 secretory carrier‐associated membrane protein 3‐like 2801 S GSONMT00036121001 97.5 80 2 1.02E‐32 105 snf‐related serine threonine‐protein kinase 36716 E GSONMT00080026001 97.5 80 2 1.02E‐32 106 tgf‐beta‐activated kinase 1 and map3k7‐binding protein 2 94333 E GSONMT00072246001 97.5 80 2 1.02E‐32

74

107 zinc finger and scan domain‐containing protein 20‐like 1986 S GSONMT00023307001 97.5 80 2 1.02E‐32 108 zinc finger protein 608‐like 23534 E GSONMT00002581001 98.7 77 1 1.02E‐32 109 zinc finger protein glis1‐like isoform x1 39569 E GSONMT00057812001 97.5 80 2 1.02E‐32 110 mam domain‐containing protein 2 10687 P GSONMT00070516001 98.7 77 1 3.68E‐32 peroxisome proliferator‐activated receptor gamma coactivator 1‐ 111 5927 S GSONMT00000773001 97.5 80 2 3.68E‐32 alpha‐like isoform x1 112 protein phosphatase 1g‐like 108785 E GSONMT00069799001 98.7 80 1 3.68E‐32 113 zinc finger c2hc domain‐containing protein 1a‐like isoform x2 129154 E GSONMT00027196001 98.7 80 1 1.33E‐31 114 adenosine monophosphate‐protein transferase ficd 47974 S GSONMT00075256001 96.3 80 3 4.77E‐31 115 down syndrome cell adhesion molecule‐like isoform x2 28379 P GSONMT00004960001 96.3 80 2 4.77E‐31 116 ellis‐van creveld syndrome protein 47920 E, P GSONMT00066234001 96.3 80 3 4.77E‐31 117 ferm and pdz domain‐containing protein 3‐like isoform x1 6627 E, P, S GSONMT00036849001 96.3 80 3 4.77E‐31 118 histone‐lysine n‐ h3 lysine‐79 specific‐like isoform x2 19976 P GSONMT00070213001 96.3 80 3 4.77E‐31 119 integrator complex subunit 2‐like isoform x2 29694 P GSONMT00038514001 96.3 80 3 4.77E‐31 120 interferon‐induced protein with tetratricopeptide repeats 2‐like 49731 E, P GSONMT00033943001 96.3 80 3 4.77E‐31 121 lipid phosphate phosphatase‐related protein type 4‐like 44717 P GSONMT00063321001 96.3 80 3 4.77E‐31 122 propionyl‐ carboxylase alpha mitochondrial‐like 29072 P GSONMT00009160001 97.4 80 2 4.77E‐31 123 sal‐like protein 4‐like 22105 S GSONMT00052097001 96.3 80 3 4.77E‐31 124 serine incorporator 1‐like 149 E, S GSONMT00048097001 96.3 80 3 4.77E‐31 125 slit homolog 3 44076 P GSONMT00077320001 96.3 80 3 4.77E‐31 126 upf0676 103486 P GSONMT00029988001 96.3 80 3 4.77E‐31 127 zinc finger protein gli2‐like 59157 E, P GSONMT00044295001 96.3 80 2 4.77E‐31 128 claudin‐8‐like 51333 E GSONMT00008093001 96.3 80 3 4.77E‐31 129 c‐x‐c chemokine receptor type 3‐like 131424 E GSONMT00014976001 96.3 80 3 4.77E‐31 130 eh domain‐binding protein 1‐like protein 1‐like isoform x3 669 E,S GSONMT00069163001 96.3 80 3 4.77E‐31 131 erythrocyte band 7 integral membrane protein 1663 S GSONMT00000261001 96.3 80 3 4.77E‐31 132 line‐1 type transposase domain‐containing protein 1‐like 4638 S GSONMT00047942001 96.3 80 3 4.77E‐31 133 unconventional myosin‐xviiia‐like isoform x3 33489 E GSONMT00010938001 96.3 80 3 4.77E‐31 134 dedicator of cytokinesis protein 8‐like 2022 S GSONMT00044149001 96.2 80 3 1.71E‐30

75

135 e3 ubiquitin isg15 ligase trim25‐like 32916 P GSONMT00045931001 96.2 79 3 1.71E‐30 136 fibronectin‐like 5825 S GSONMT00034687001 98.6 80 1 1.71E‐30 137 phosphoribosylformylglycinamidine synthase‐like 16077 E GSONMT00041297001 97.4 80 2 1.71E‐30 138 spondin‐1‐like 1310 E,S GSONMT00003226001 96.3 80 2 1.71E‐30 139 serine threonine‐protein kinase tao2‐like 32442 E GSONMT00009756001 98.6 72 1 6.17E‐30 140 btb poz domain‐containing protein 7 34907 P GSONMT00034538001 95.0 80 4 2.22E‐29 141 f‐box only protein 39 49300 P GSONMT00078654001 95.0 80 4 2.22E‐29 142 neuronal cell adhesion molecule‐like isoform x6 60346 P GSONMT00077257001 97.3 74 2 2.22E‐29 143 oocyte zinc finger protein 20 62155 P GSONMT00073933001 95.0 80 4 2.22E‐29 144 potassium voltage‐gated channel subfamily kqt member 5‐like 49260 P GSONMT00075916001 95.0 80 4 2.22E‐29 145 tata‐binding protein‐associated factor 172 96522 P GSONMT00066587001 98.6 71 1 2.22E‐29 146 zinc finger and btb domain‐containing protein 10 26226 P GSONMT00060560001 96.1 80 3 2.22E‐29 147 ankyrin repeat domain‐containing protein sowahd 6312 E, S GSONMT00048012001 95.0 80 4 2.22E‐29 148 g‐protein coupled receptor family c group 5 member c‐like 5130 S GSONMT00075979001 95.1 80 3 2.22E‐29 149 gram domain‐containing protein 1a‐like 2650 S GSONMT00071944001 95.1 80 3 2.22E‐29 150 msx2‐interacting protein 16188 E GSONMT00058032001 95.0 80 4 2.22E‐29 151 PREDICTED: uncharacterized protein LOC102077212 isoform X1 61415 E GSONMT00006987001 95.0 80 4 2.22E‐29 152 tyrosine‐protein kinase 223‐like 2874 S GSONMT00061056001 95.0 80 4 2.22E‐29 153 coproporphyrinogen‐iii mitochondrial‐like 61752 E GSONMT00038790001 94.9 79 4 7.98E‐29 154 von willebrand factor a domain‐containing protein 1‐like 3681 S GSONMT00002588001 94.9 79 4 7.98E‐29 155 cgmp‐dependent protein kinase 1‐like isoform x2 25138 S GSONMT00077402001 96.0 80 3 2.87E‐28 156 torsin‐1a‐interacting protein 2‐like 13691 P GSONMT00041820001 98.6 79 1 2.87E‐28 157 3‐oxoacyl‐ 14817 E,P GSONMT00038888001 93.8 80 5 1.03E‐27 158 protein phosphatase 1 regulatory subunit 15b 51399 E, P GSONMT00023162001 93.8 80 5 1.03E‐27 159 transposable element tc1 transposase 81477 E GSONMT00028808001 93.8 80 5 1.03E‐27 160 transposase 90146 E, P GSONMT00022528001 93.8 80 5 1.03E‐27 161 alpha‐tectorin‐like 4780 S GSONMT00042960001 94.8 80 4 1.03E‐27 162 calcium‐dependent secretion activator 1 14710 E GSONMT00081944001 93.8 80 5 1.03E‐27 163 disabled homolog 2‐like isoform x2 4846 S GSONMT00067469001 93.8 80 5 1.03E‐27

76

164 scan domain‐containing protein 3‐like 94766 E GSONMT00064321001 96.0 80 2 1.03E‐27 165 small integral membrane protein 18 40969 E GSONMT00030951001 97.2 71 2 1.03E‐27 166 substance‐p receptor‐like 2964 S GSONMT00064849001 93.8 80 5 1.03E‐27 167 transposable element tcb2 transposase 62130 E GSONMT00071423001 93.8 80 3 1.03E‐27 168 60s ribosomal export protein nmd3‐like 44065 P GSONMT00055465001 93.8 80 4 3.71E‐27 169 ring finger and ccch‐type zinc finger domains 1 55538 E GSONMT00053435001 100.0 80 0 3.71E‐27 170 angiomotin‐like isoform x3 105971 P GSONMT00039788001 93.8 80 2 1.33E‐26 171 monocarboxylate transporter 8 29910 S GSONMT00020277001 93.8 80 2 1.33E‐26 172 cadherin egf lag seven‐pass g‐type receptor 2 6751 S GSONMT00069954001 92.5 80 6 4.80E‐26 173 kelch‐like protein 42‐like 88414 E GSONMT00002306001 92.5 80 6 4.80E‐26 174 phosphoinositide 3‐kinase regulatory subunit 5 6368 S GSONMT00038984001 100.0 80 0 4.80E‐26 175 sodium‐ and chloride‐dependent gaba transporter 3‐like isoform x1 114270 E GSONMT00018144001 98.5 65 1 4.80E‐26 176 bcl‐2‐like protein 13‐like isoform x1 14966 P GSONMT00072455001 93.3 80 5 6.21E‐25 177 alpha‐tectorin‐like protein 124863 P GSONMT00024493001 91.3 80 7 2.23E‐24 178 at‐rich interactive domain‐containing protein 5b‐like 1478 E,S GSONMT00039390001 91.3 80 7 2.23E‐24 179 coiled‐coil domain‐containing protein 37 2333 S GSONMT00011100001 97.0 80 1 2.23E‐24 180 transient receptor potential channel pyrexia‐ partial 1686 S GSONMT00001467001 91.4 80 6 2.23E‐24 181 host cell factor 1‐like 56836 P GSONMT00002482001 94.4 70 3 8.03E‐24 182 isoform cra_a 15491 P GSONMT00067158001 96.9 64 2 8.03E‐24 183 interferon‐inducible protein gig1 10291 E GSONMT00041936001 92.1 80 6 8.03E‐24 184 retinitis pigmentosa 1‐like 1 protein 7273 P GSONMT00048786001 94.2 69 4 2.89E‐23 185 sodium channel protein type 4 subunit alpha b‐like 85408 E GSONMT00044608001 91.0 80 7 2.89E‐23 186 dystrophin isoform x4 3350 S GSONMT00015403001 92.0 80 6 2.89E‐23 187 leukotriene b4 receptor 1‐like 381 E,S GSONMT00014336001 94.2 80 4 2.89E‐23 188 methyltransferase‐like protein 24‐like 3897 S GSONMT00062568001 90.2 80 6 2.89E‐23 189 leucine‐rich repeat‐containing protein fam211b‐like 54510 P GSONMT00078867001 90.9 77 7 1.04E‐22 190 PREDICTED: uncharacterized protein K02A2.6‐like 65204 E GSONMT00015899001 90.0 80 8 1.04E‐22 191 cugbp elav‐like family member 4‐like isoform x1 1227 E,P,S GSONMT00019216001 95.3 80 3 3.74E‐22 192 endonuclease‐reverse transcriptase 21356 P GSONMT00006616001 96.7 79 2 3.74E‐22

77

193 PREDICTED: uncharacterized protein LOC101470296 55708 E, P GSONMT00015684001 90.8 76 7 3.74E‐22 194 ras gtpase‐activating protein ngap‐like isoform x2 33175 E GSONMT00079805001 95.3 66 3 3.74E‐22 195 mucin‐17 isoform x2 2594 S GSONMT00038383001 89.9 80 8 3.74E‐22 196 epsin‐1‐like isoform x2 93556 P GSONMT00058523001 100.0 56 0 4.83E‐21 197 retinal guanylyl cyclase 2 37974 P GSONMT00027020001 88.8 80 9 4.83E‐21 198 transposase 52174 E, P GSONMT00022816001 88.9 80 8 4.83E‐21 199 nedd4‐binding protein 1 3743 S GSONMT00026945001 88.8 80 9 4.83E‐21 200 glutaredoxin‐1 3742 S GSONMT00007498001 100.0 52 0 1.74E‐20 201 protein nlrc3‐like 37523 P GSONMT00030459001 91.3 80 6 6.25E‐20 202 armadillo repeat‐containing protein 3 isoform x2 20486 E GSONMT00054705001 87.8 80 8 6.25E‐20 203 atherin‐like isoform x1 57709 E GSONMT00029835001 92.5 80 4 6.25E‐20 204 protein furry homolog isoform x1 129507 E GSONMT00082309001 90.3 72 7 6.25E‐20 205 anaphase‐promoting complex subunit 1‐like 27912 E GSONMT00060195001 89.3 80 6 2.25E‐19 206 b‐cell lymphoma 6 protein 32679 P GSONMT00053887001 87.5 80 10 2.25E‐19 207 dihydropyrimidinase‐related protein 1 60094 E GSONMT00006482001 87.7 80 8 2.25E‐19 208 protein prune homolog 2 103777 P GSONMT00061660001 90.1 71 7 2.25E‐19 209 serine threonine‐protein kinase tao1 isoform x1 40119 E GSONMT00029417001 98.1 58 1 2.25E‐19 210 galactosamine (n‐acetyl)‐6‐sulfate sulfatase 31139 E, P GSONMT00017477001 92.2 77 5 8.09E‐19 211 anoctamin‐3‐like isoform x2 6662 S GSONMT00009836001 96.4 80 2 8.09E‐19 212 myoferlin‐like isoform x1 7997 E GSONMT00065503001 98.1 80 0 8.09E‐19 213 spectrin beta non‐erythrocytic 1 isoform x2 262 E,S GSONMT00082057001 98.1 52 1 8.09E‐19 214 gtpase imap family member 4‐like 27693 P GSONMT00061887001 88.6 80 0 1.05E‐17 215 histone‐lysine n‐methyltransferase setdb1‐b‐like 42583 P GSONMT00036114001 92.1 80 4 1.05E‐17 216 f‐box lrr‐repeat protein 13 4796 S GSONMT00011303001 86.6 80 8 1.05E‐17 217 39s ribosomal protein mitochondrial 44945 E GSONMT00058397001 93.0 57 4 1.35E‐16 218 protein unc‐80 homolog isoform x4 48129 P GSONMT00056878001 96.1 51 2 1.35E‐16 219 actin‐binding protein anillin‐like isoform x1 47445 P GSONMT00039489001 94.4 76 2 4.87E‐16 220 cd63‐like protein 20810 E, P GSONMT00006698001 96.0 50 2 4.87E‐16 221 protein nlrc3‐like 49036 E GSONMT00048785001 91.5 77 5 4.87E‐16

78

222 ras‐related protein rab‐13‐like 22595 E GSONMT00012658001 94.3 58 3 4.87E‐16 223 toll‐like receptor 1 44326 P GSONMT00042809001 88.2 68 8 4.87E‐16 224 transposase domain containing protein 108 E, P, S GSONMT00013300001 88.4 69 7 4.87E‐16 225 ubiquitin carboxyl‐terminal hydrolase 34 248 E, S GSONMT00052855001 84.9 80 7 4.87E‐16 226 tetratricopeptide repeat protein 14 13748 P GSONMT00025748001 87.3 74 7 1.75E‐15 227 fk506‐binding protein 5‐like isoform x3 54907 P GSONMT00063317001 84.8 80 10 6.30E‐15 228 protein nlrc3‐like 51698 P GSONMT00066240001 84.8 80 10 6.30E‐15 229 protein nlrc3‐like 30234 E GSONMT00007861001 87.1 80 6 2.27E‐14 230 ig kappa chain v region k29‐213 55567 E, P GSONMT00046394001 88.5 61 7 8.15E‐14 231 synergin gamma‐like isoform x1 65880 E, P GSONMT00079034001 92.3 80 4 8.15E‐14 232 kruppel‐like factor 11 33099 E GSONMT00005619001 83.5 80 13 8.15E‐14 233 zinc finger protein 384‐like isoform x1 66566 S GSONMT00028645001 85.0 71 3 2.93E‐13 234 1‐phosphatidylinositol ‐bisphosphate phosphodiesterase eta‐2 41895 E GSONMT00081825001 83.3 80 13 2.93E‐13 235 phosphorylated ctd‐interacting factor 1 66233 E GSONMT00001876001 92.2 51 4 2.93E‐13 236 centrosomal protein c10orf90 homolog 26665 E GSONMT00082518001 83.3 80 12 1.05E‐12 237 b‐cell receptor cd22‐like 33713 E GSONMT00037077001 82.9 80 11 1.36E‐11 238 zinc finger bed domain‐containing protein 4‐like 37107 E GSONMT00064976001 85.5 63 4 1.36E‐11 239 tonsoku‐like protein 27469 P GSONMT00076779001 81.3 80 15 4.90E‐11 240 ankyrin repeat and sam domain‐containing protein 6 6360 S GSONMT00079699001 88.9 73 5 4.90E‐11 241 at‐rich interactive domain‐containing protein 4b‐like isoform x1 1829 S GSONMT00031238001 84.1 66 8 1.76E‐10 242 coronin‐1c‐like isoform x2 1149 E,S GSONMT00082735001 84.1 67 7 1.76E‐10 243 kinesin‐like protein kif26a‐like isoform x2 169 E,S GSONMT00076742001 86.4 66 7 1.76E‐10 244 lrr and pyd domains‐containing protein 3‐like 108763 P GSONMT00004546001 85.0 60 9 6.34E‐10 245 gag‐like protein 415 E,S GSONMT00074011001 88.2 79 6 6.34E‐10 246 proteasome inhibitor pi31 subunit 5111 S GSONMT00048310001 86.2 58 6 6.34E‐10 247 protein 59797 P GSONMT00054486001 88.0 52 6 2.28E‐09 248 reverse transcriptase‐like protein 124094 E GSONMT00023192001 80.5 77 15 2.28E‐09 249 coiled‐coil domain‐containing protein 127 1087 E,S GSONMT00079705001 88.0 80 6 2.28E‐09 250 extracellular calcium‐sensing receptor‐like 18338 E, P GSONMT00015522001 82.6 75 9 8.21E‐09

79

251 reverse transcriptase‐like protein 106006 P GSONMT00027348001 84.5 80 9 8.21E‐09 252 transposable element tcb1 transposase 80190 E, P GSONMT00025176001 85.5 55 7 2.95E‐08 253 apoptotic chromatin condensation inducer in the nucleus‐like 49123 P GSONMT00061509001 85.2 80 7 1.06E‐07 254 poly ‐specific ribonuclease parn 97473 E GSONMT00024346001 83.9 62 9 1.06E‐07 255 4‐hydroxyphenylpyruvate dioxygenase‐like 122734 E, P GSONMT00014774001 86.0 58 5 3.82E‐07 256 nadh dehydrogenase subunit 5 36525 P GSONMT00039388001 83.6 80 9 3.82E‐07 257 rna‐directed dna polymerase from mobile element jockey‐like 44521 E, P, S GSONMT00059868001 82.5 76 7 3.82E‐07 258 ras association domain‐containing protein 9‐like 27884 E GSONMT00062885001 78.5 79 17 3.82E‐07

80