MS. IVA POPOVIC (Orcid ID : 0000-0001-6582-4236)

Article type : Original Article

Comparative genomics reveals divergent thermal selection in warm- and cold-tolerant marine mussels

Iva Popovic *, 1 and Cynthia Riginos 1

1 School of Biological Sciences, University of Queensland, St Lucia, Queensland 4072, Australia

* Corresponding Author:
Email: [email protected]

Short Title: Positive selection in a marine mussel

Word Count: 7711

Elements of Manuscript:

Abstract
Introduction
Methods
Results
Discussion
References Cited
Tables
Figures

This is the author manuscript accepted for publication and has undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1111/mec.15339

This article is protected by copyright. All rights reserved

Abstract

Investigating the history of natural selection among closely related species can elucidate how genomes diverge in response to disparate environmental pressures. Molecular evolutionary approaches can be integrated with knowledge of gene functions to examine how evolutionary divergence may affect ecologically-relevant traits such as temperature tolerance and species distribution limits. Here, we integrate transcriptome-wide analyses of molecular evolution with knowledge from physiological studies to develop hypotheses regarding the functional classes of genes under positive selection in one of the world’s most widespread invasive species, the warm-tolerant marine mussel Mytilus galloprovincialis. Based on existing physiological information, we test the hypothesis that genomic functions previously linked to divergent temperature adaptation at the whole-organism level show accelerated molecular divergence between warm-adapted M. galloprovincialis and cold-adapted congeners. Combined results from codon model tests and analyses of polymorphism and divergence reveal that divergent selection has affected genomic functions previously associated with species-specific expression responses to heat stress, namely oxidative stress defense and cytoskeletal stabilisation. Examining specific loci implicated in thermal tolerance among Mytilus species (based on interspecific biochemical or expression patterns), we find close functional similarities between known thermotolerance candidate genes under positive selection and positively selected loci under predicted genomic functions (those associated with divergent expression responses). Taken together, our findings suggest a contribution of temperature-dependent selection in the molecular divergence between warm- and cold-adapted Mytilus species that is largely consistent with results from physiological studies. More broadly, this study provides an example of how independent experimental evidence from ecophysiological investigations can inform evolutionary hypotheses about molecular adaptation in closely related non-model species.

Keywords: positive selection, Mytilus, molecular evolution, thermal tolerance, transcriptome

Introduction

Comparisons among closely related species adapted to contrasting niches provide an opportunity to investigate how genomes diverge in response to different environmental conditions (Oliver et al. 2010). In cases where specific amino acids are known to affect protein function, analyses of intraspecific polymorphism and divergence can be used to directly study functional variation in natural populations (Dean and Thornton 2007; Storz and Wheat 2010; Storz et al. 2015). Such methods can uncover the strength and direction of selection on functional variation and corroborate the adaptive significance of the loci under study (e.g., Storz et al. 2009; Linnen et al. 2009; Natarajan et al. 2015). However, the polygenic basis of many traits governing physiological tolerance presents significant challenges for linking genetic variation to ecologically-relevant phenotypes or to differences among species adapted to divergent environments (Storz and Wheat 2010; Rockman 2012; Le Corre and Kremer 2012). Furthermore, the lack of complete reference genomes for many non-model species, coupled with difficulties in establishing multigenerational pedigree relationships, precludes the use of quantitative genetic tools for examining heritable trait variation and testing hypotheses about the molecular basis of adaptation (Colautti and Lau 2015; Sherman et al. 2016).

Knowledge of gene functions can be integrated with molecular evolutionary approaches to interrogate the complex relationship between genomic variation and ecologically-relevant traits (Storz et al. 2015). Investigations of genome-wide expression can be used to classify hundreds of loci into causative functional groups responding to specific ecological stressors (Lockwood et al. 2010; Somero 2012). In turn, such functional classifications can provide insight into the molecular traits differentiating species-specific physiological responses, even in cases where phenotypes are controlled by many co-dependent genes (Rockman 2008). In contrast, molecular evolutionary approaches examine how selection has directly shaped gene sequences and do not require pre-existing knowledge about the phenotypes involved in adaptation (Anisimova and Liberles 2007; Li et al. 2008; Ellegren 2008).
For example, genomic comparisons of substitution rate variation have demonstrated lineage-specific positive selection in protein-coding loci associated with reproductive, immune defense and sensory perception functions in diverse taxa (e.g., Clark et al. 2003; Nielsen et al. 2005; Holloway et al. 2007; Roux et al. 2014a). Yet, without the context of the organism’s environment or specific hypotheses regarding the selective pressures acting on focal lineages (e.g., Oliver et al. 2010; Barreto et al. 2010; Ladner et al. 2012; Koester et al. 2013), genome scans may encourage false speculations about the adaptive significance of positively selected loci (Pavlidis et al. 2012; Storz et al. 2015). As a result, analyses relating molecular signatures of positive selection to lineage-specific adaptations could be strengthened by independent experimental evidence under which a priori hypotheses about the genomic functions under selection can be formulated (Pavlidis et al. 2012).

In this study, we combine genome-wide analyses of positive selection with knowledge from whole-organism physiological studies, where previous physiological investigations provide clear predictions regarding the functional classes of genes likely to be under positive selection for species divergence. We focus on the warm-tolerant marine mussel Mytilus galloprovincialis (McDonald et al. 1991; Lowe et al. 2000), which is one of the world’s most widespread invasive species. Mytilus galloprovincialis is the only Mytilus species that is known to have established invasive populations outside of its native range in the Mediterranean Sea, including introductions in California, South Africa, and Australia (McDonald and Koehn 1988; Branch and Steffani 2004; Popovic et al. 2019). Over the last two decades, the genus Mytilus has become an important comparative system for studying links between species distribution limits, marine invasive success and thermal physiology, and there is a growing body of experimental evidence for understanding key molecular functions involved in environmental adaptation in this group (Lockwood et al. 2015). The Mytilus genus therefore holds many advantages of a nearly model system, with a breadth of ecophysiological studies and functional insights into the genetic basis of thermal adaptation, despite life history features that do not allow for accessible inbred lines or multiple generations required to link physiological differences to their genetic basis.

Ecophysiological studies investigating the role of environmental adaptation in setting species distributions have largely focused on M. galloprovincialis in the Northeastern Pacific, where introduced populations have displaced native Mytilus trossulus in central and southern California (Figure 1A; Rawson et al. 1999; Geller 1999; Braby and Somero 2006a). Physiological studies have implicated interspecific differences in temperature tolerance as a primary factor explaining species distribution limits and the ability of introduced M. galloprovincialis to outcompete native congeners in warm habitats where their distributions overlap (Lockwood and Somero 2011; Fields et al. 2012; Lockwood et al. 2015; Figure 1B). Indeed, comparative physiological and biochemical investigations have demonstrated marked differences between Mytilus species, consistent with higher warm-temperature tolerance in introduced M. galloprovincialis relative to cold-adapted congeners (e.g., temperature-dependent gene expression, Hofmann and Somero 1996; Lockwood et al. 2010; proteomic response, Tomanek and Zuzow 2010; Fields et al. 2012; Tomanek 2012; metabolic enzyme activity, Fields et al. 2006; Lockwood and Somero 2012; and cardiac function, Braby and Somero 2006b; reviewed in Lockwood and Somero 2011; Lockwood et al. 2015).

Physiological investigations have also provided insight into the genetic basis of thermal tolerance differences among Mytilus species (Lockwood et al. 2015). Fixed amino acid substitutions in two metabolic enzymes have been sufficient to explain functional divergence in protein heat sensitivity conferring temperature tolerance between warm- and cold-adapted Mytilus species (i.e. cytosolic malate dehydrogenase, cMDH, Fields et al. 2006; isocitrate dehydrogenase, IDH, Lockwood and Somero 2012). Investigations across broader arrays of expressed genes and proteins have also identified functional classes of genes mediating interspecific variation in temperature tolerance (Lockwood et al. 2010; Tomanek 2012). Specifically, genomic functions linked to heat shock responses and oxidative stress (e.g., small molecular chaperones) were associated with the strongest species-specific responses to heat stress differentiating M. galloprovincialis and M. trossulus in both expression and proteomic studies (Lockwood et al. 2010; Tomanek and Zuzow 2010; Tomanek 2012; Fields et al. 2012). Collectively, this body of research has provided extensive experimental evidence supporting a role of temperature-dependent adaptation in the divergence of M. galloprovincialis from cold-adapted congeners. The Mytilus system therefore provides a unique opportunity to compare the genomes of warm-tolerant and cold-tolerant species in the context of known physiological adaptations associated with species-specific temperature tolerance (Braby and Somero 2006a; Lockwood et al. 2015).

Here, we test the hypothesis that genomic functions experimentally linked to species-specific temperature adaptation show accelerated evolutionary divergence in warm-tolerant and invasive M. galloprovincialis, relative to three cold-tolerant, non-invasive congeners. Based on existing physiological information, we predict that genomic functions associated with divergent expression responses to heat stress (i.e. genes functionally characterised by roles in oxidative stress, proteolysis, energy metabolism, cell signaling, ion transport and cytoskeletal binding and organisation; Lockwood et al. 2010) have experienced positive selection in M. galloprovincialis. Alternatively, without a contribution of divergent thermal selection to species divergence, we would expect positively selected loci to fall predominantly within functional categories shown to evolve at high relative rates in other taxa (e.g., reproductive or immune response-related genes involved in co-evolutionary arms races driven by sexual conflict or host-pathogen defense; Nielsen et al. 2005; Kosiol et al. 2008; Roux et al. 2014a). We focus on protein-coding genes and compare 2719 de novo assembled orthologous transcripts from four northern hemisphere Mytilus species: M. californianus, M. trossulus, M. edulis and M. galloprovincialis. We use both codon model tests and intraspecific polymorphism and divergence-based analyses to identify positively selected loci belonging to predicted genomic functions (Figure S1).
We also explore whether any known thermotolerance candidate genes (specific loci previously implicated in thermal tolerance in Mytilus species) have experienced positive selection in M. galloprovincialis.

Methods

Transcriptome Data, Processing and Assembly

We obtained published RNAseq data for four Mytilus species (total of 24 individuals; Table 1) from our previous population genomic investigation (Popovic et al. 2019). RNAseq data are derived from mantle tissue total RNA and paired-end reads are available on the NCBI Sequence Read Archive (SRA) (BioProject ID: PRJNA560413). M. galloprovincialis were sampled from genetically divergent populations from the Atlantic coast of Europe (n=5) and the Mediterranean Sea (n=10) (Fraïsse et al. 2016; Table 1). Outgroup specimens (Mytilus californianus, n=3; Mytilus trossulus, n=3; M. edulis, n=3) were collected from known allopatric ranges to minimise the possibility of sampling hybrid individuals (Table 1). All samples were genotyped for the species diagnostic marker Glu-5’ (Rawson et al. 1996) to obtain an initial indication of species identity, and subsequent population genomic analyses of transcriptome-derived variants ensured that samples were non-hybrid parental taxa (Popovic et al. 2019). Read quality and adapter contamination were assessed using FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/). We used Trimmomatic (v0.36) (Bolger et al. 2014) to remove residual adapter sequences from paired reads, and quality trimming was performed using a 4 bp sliding window, a phred-scale average quality score of 20, and a minimum size filter of 50 bp. To reduce redundancy among high-coverage reads (including rRNA contaminants) and to discard associated sequence errors, each dataset was digitally normalised using Trinity’s insilico_read_normalization.pl script with a default kmer size of 25 and maximum read coverage of 50 (Grabherr et al. 2011). Overlapping paired reads were merged using FLASH v1.2.11 (Magoč and Salzberg 2011) with a minimum overlap length of 10 bp.

The resulting read datasets were pooled for each taxon and used to create de novo assemblies using Trinity v2.0.6 (Grabherr et al. 2011) with default parameters. Because downstream analyses required obtaining putative full-length orthologous transcripts, we performed a series of filtering steps on each assembly to remove close paralogs and non-coding RNAs, and to reduce the inclusion of redundant transcripts representing alternatively spliced isoforms, partial contigs or divergent alleles. First, the output of the Trinity pipeline was filtered to retain only the longest isoform per gene group. Transcripts with high sequence similarity were then clustered using Cd-Hit-Est (Li and Godzik 2006; Fu et al. 2012) with a minimum sequence identity threshold of 95% of the shortest sequence. Next, Transdecoder (Haas et al. 2013) was used to reduce assemblies to protein-coding sequences with significant matches to the Pfam protein database.
Finally, to minimise the possibility of misalignment of orthologous genes between species or the inclusion of non-homologous sequences, we retained only genes with complete predicted open reading frames (ORF) greater than 100 amino acids in which both a start and a stop codon were identified.

Transcriptome Assembly Quality and Gene Annotation

Predicted coding sequences were queried against the Uniprot-Swissprot protein database using blastx with an e-value of $10^{-3}$ for significant matches. Subsequently, contigs with significant blast matches to likely environmental contaminants, including bacteria, fungi, viruses, protists (Alveolata), green (Viridiplantae) and red algae (Haptophyceae), and other eukaryotic contaminants (i.e. Euglenozoa) were removed using Biopython v1.68 and R (R Development Core Team 2017). Additional annotation was carried out using the Trinotate 3.0.1 annotation pipeline using blastx and the Pfam protein database. To assess the completeness and expected gene content of each filtered assembly, we looked for the proportion of Benchmarking Universal Single Copy Orthologs (BUSCO; Simão et al. 2015), a set of 429 single-copy genes selected from OrthoDB that are shared by higher eukaryotes. The M. galloprovincialis assembly was also subjected to a blastn search (e-value=$10^{-3}$) against the M. galloprovincialis draft genome (Murgarella et al. 2016). Additional assembly quality metrics, such as the proportion of aligned reads, were assessed by mapping the FLASH-merged normalised reads onto each assembly using Bowtie2 v2.2.9 (Langmead and Salzberg 2012).

Ortholog Identification and Alignment

To identify putative orthologs, we used the best-hit blast method Proteinortho v5.13 (Lechner et al. 2011). Clusters of putative orthologous transcripts, termed orthogroups, were identified under an e-value threshold of $10^{-10}$ and alignment parameters including 50% query sequence coverage, 25% minimum sequence identity, and 95% minimum similarity thresholds for additional blast hits. Only orthogroups containing at least one sequence from each taxon were retained for downstream analyses. In cases where multiple transcripts from the same taxon were identified as putative orthologs, the phylogeny-based PhyloTreePruner pipeline (Kocot et al. 2013) was used to choose a single species representative for each group. We collapsed poorly supported nodes with bootstrap support of less than 60% into polytomies and applied a pairwise distance method (-r flag; SCaFoS; Roure et al. 2007) to select the best orthologous sequence if multiple sequences formed a monophyletic clade from the same taxon. As input for PhyloTreePruner, nucleotide alignments of orthogroups were generated using a codon-based alignment algorithm in PRANK (-F -codon option; Löytynoja and Goldman 2008), which takes evolutionary relationships into consideration to increase the accuracy of aligning homologous sequences in the presence of indel variation (Jordan and Goldman 2012). Individual gene trees were generated for each alignment using FastTree2 (Price et al. 2010) with default options (Jukes-Cantor with CAT approximation) and gamma likelihood approximation.

The final dataset consisted of 2719 high-quality orthologous groups including all four taxa. Prior to selection analyses, alignments were screened for the effects of intragenic recombination (which can influence the tree topology of individual sites) using the single break point method implemented in HyPhy (Kosakovsky Pond et al. 2006). The GARDprocessor.bf module was used to assess whether significant breakpoints are due to topological incongruence using the Shimodaira-Hasegawa test at a significance level of P=0.05. In the presence of a significant recombination breakpoint, alignments were partitioned and a new topology was generated for each gene partition prior to selection analyses. Branch lengths and node labels were removed from the newick formatted tree files using the R package Ape (Paradis et al. 2004).

Branch-Site Models of Molecular Evolution

To identify loci under positive selection in the M. galloprovincialis lineage, we carried out branch-site tests of positive selection for each alignment using the codeml method in PAML v4.8 (Yang et al. 2005; Zhang et al. 2005; Yang 2007). Because the rate of synonymous substitution approximates neutral evolution, the ratio of nonsynonymous substitutions per nonsynonymous site (dN) relative to synonymous substitutions per synonymous site (dS), omega ($\omega = d_N/d_S$), can be interpreted as evidence of positive selection ($\omega > 1$) for amino acid substitutions or purifying selection ($\omega < 1$) against nonsynonymous changes. Given an a priori hypothesis, the branch-site method implements a maximum likelihood codon model to compare $\omega$ variation among a set of foreground branches (i.e. the lineage tested for positive selection) and background branches in the focal gene tree to identify whether selection is significantly different on the specified foreground lineage. The selection model permits all sites in a foreground branch to vary in $\omega$, which can belong to one of three rate classes: $\omega_0 < 1$, $\omega_1 = 1$ and $\omega_2$ estimated as a free parameter relative to the background rate of evolution. The background rate can reflect either purifying selection (class 2a) or neutral evolution (class 2b). Positive selection is inferred if $\omega_2 \gg 1$ and the likelihood of the selection model is significantly greater than the likelihood of the null model where $\omega$ for the foreground lineages is set to neutral evolution ($\omega = 1$). The likelihood ratio test statistic (LRT), twice the log-likelihood difference between the two models ($2\Delta\ell$), was compared to the $\chi^2$ distribution (critical value=3.84; P=0.05) with one degree of freedom to determine if the selection model significantly improved the fit to the genetic data. In cases where the LRT is significant, Bayesian inference (Bayes Empirical Bayes; BEB) was used to estimate the posterior probability that individual codons in the foreground branch are under positive selection (Yang et al. 2005). Codons were inferred to be under positive selection if the BEB critical posterior probability value was P≥0.95.

To account for gene tree discordance among individual loci that may result from incomplete lineage sorting (Mendes and Hahn 2016), we generated individual gene trees for each alignment and specified the M. galloprovincialis lineage as the foreground branch (Figure 1C) as input for the PAML analyses. Codeml analyses were conducted using pairwise deletion of sequence gaps to estimate $\omega$ for sites that included indels between orthologous sequences. To ensure that codon misalignment did not affect our conclusions about site-specific positive selection (e.g., Mallick et al. 2009), alignments of genes under selection were visually inspected post hoc to identify unreliable alignments at positively selected sites. We excluded loci with characteristics indicative of potential misalignments leading to high $\omega$ variation: 1) positively selected sites within five codons of an indel; 2) positively selected sites within five codons of a start/stop codon; and 3) positively selected sites within a cluster of selected codons (Markova-Raina and Petrov 2011).
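As a rough illustration of the screening steps in this section, the sketch below computes the likelihood ratio test for one locus and flags positively selected sites that meet any of the three exclusion criteria. It is a minimal Python sketch, not the procedure used in the study (where alignments were inspected visually). The function names, the 1-based codon coordinates and the working definition of a "cluster" (another selected site within the same five-codon window) are assumptions made for the example.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Return (statistic, p-value) for a likelihood ratio test with 1 df.

    The statistic is twice the log-likelihood difference between the
    selection (alternative) model and the null model.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of a chi-square variable with 1 df:
    # P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def unreliable_sites(selected, indel_codons, n_codons, window=5):
    """Return selected codon positions (1-based) failing any exclusion check.

    selected     -- codon positions with BEB posterior probability >= 0.95
    indel_codons -- codon positions that are gapped in the alignment
    n_codons     -- alignment length in codons (start = 1, stop = n_codons)
    window       -- distance threshold in codons (five in the text)
    """
    flagged = set()
    for site in selected:
        near_indel = any(abs(site - gap) <= window for gap in indel_codons)
        near_end = (site - 1) <= window or (n_codons - site) <= window
        # Assumed cluster definition: another selected site within the window.
        clustered = any(o != site and abs(site - o) <= window for o in selected)
        if near_indel or near_end or clustered:
            flagged.add(site)
    return sorted(flagged)


if __name__ == "__main__":
    stat, p = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-998.08)
    print(f"LRT = {stat:.2f}, P = {p:.4f}")
    print(unreliable_sites([3, 50, 120, 122, 200], indel_codons=[48], n_codons=300))
```

Under these assumptions, a locus would be kept in the reduced set only if `unreliable_sites` returns an empty list for its selected codons.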
We termed this reduced dataset as having the ‘highest-confidence alignments’ under positive selection.

Analyses of Positive Selection using Polymorphism Variation

Additional power to detect positive selection can be obtained through combining divergence data with intraspecific polymorphism data for the focal lineage (Bierne and Eyre-Walker 2004; Christe et al. 2017). Given the relatively recent divergence between M. galloprovincialis and M. edulis (~1-2.5 million years; Boon et al. 2009; Roux et al. 2014b; ~2% genomic divergence, Roux et al. 2016; Fraïsse et al. 2018), as well as historical introgression (Fraïsse et al. 2016), we expected that incomplete lineage sorting may be pervasive in some genomic regions. Moreover, positive selection can be overestimated if species have split only very recently (<10Ne generations; Keightley and Eyre-Walker 2012). To minimise the effects of shared polymorphism and to keep species comparisons consistent with ecophysiological studies, we compared intraspecific polymorphism to divergence relationships of replacement and synonymous substitutions in two genetically divergent native M. galloprovincialis populations from the eastern and western Mediterranean (n=10) and Atlantic (n=5) (Table 1), relative to M. trossulus. RNAseq reads were mapped to 2719 reference contigs using Bowtie2 v2.2.9 (Langmead and Salzberg 2012) and PCR duplicates were removed using Picard MarkDuplicates (http://picard.sourceforge.net). Variants were identified using Freebayes (https://github.com/ekg/freebayes). To minimise the inclusion of erroneous or low quality variants, we removed singletons, indel or complex variants, and columns with more than 70% missing data from the resulting VCF file using vcftools (Danecek et al. 2011). The remaining Single Nucleotide Polymorphisms (SNPs) were filtered for a minimum quality of 30, and individual genotype calls below a quality score of 30 and depth coverage of 5 reads were removed.
Using a total of 44,750 SNPs, we reconstructed consensus haplotype sequences for each M. galloprovincialis individual using filtered variants and reference sequences in BCFtools v1.3.1, such that heterozygous sites were consecutively assigned a strand. The resulting 2498 gene alignments were subset by population, merged with the corresponding M. trossulus outgroup sequence, and aligned using PRANK as outlined above (Löytynoja and Goldman 2008). From each dataset, we discarded sequences with premature stop codons (annotated in SNPeff; Cingolani et al. 2012) and loci for which no variant information was retained following filtering. We calculated divergence and polymorphism counts for synonymous and replacement substitutions for each gene, for each locality, using the Polymorphorama perl script (Bachtrog and Andolfatto 2006; Haddrill et al. 2008). We did not consider synonymous polymorphisms with frequencies below 10% (i.e. Fay et al. 2001; Bierne and Eyre-Walker 2004; Charlesworth and Eyre-Walker 2008).

We performed two tests of selection to determine whether adaptive fixation could be detected when polymorphism data are considered. Because strong population structure may lead to false classification of fixed substitutions (between divergent lineages) as polymorphisms and obscure true signals of positive selection, the subsequent tests were performed on Mediterranean and Atlantic M. galloprovincialis populations independently. Assuming that the contribution of polymorphism substitutions to adaptive evolution is relatively small, we calculated alpha, the proportion of nonsynonymous substitutions fixed in each lineage by positive selection for each locus, based on equation 2 in Smith and Eyre-Walker (2002):

\[ \alpha = 1 - \frac{D_s P_n}{D_n P_s} ; \]

where $D_n$ and $D_s$ are the numbers of replacement and synonymous substitutions per gene, and $P_n$ and $P_s$ are the numbers of replacement and synonymous polymorphisms per gene, respectively. A G-test of independence with the Williams correction was carried out in R to assess the significance of alpha at P≤0.05 (McDonald and Kreitman 1991). To estimate the relative strength of natural selection and establish how the contribution of slightly deleterious polymorphisms (that may contribute to false signals of adaptive fixation; Keightley and Eyre-Walker 2012; Mugal et al. 2013; but see Eyre-Walker and Keightley 2009) may vary among genes and M. galloprovincialis populations, we calculated the DoS statistic at the level of each locus with significant values of alpha for each locality (Stoletzki and Eyre-Walker 2011):

\[ \mathrm{DoS} = \frac{D_n}{D_n + D_s} - \frac{P_n}{P_n + P_s} ; \]

where positive DoS values indicate positive selection and negative DoS values suggest slightly deleterious mutations segregating in the focal population. To validate that adaptive divergence has occurred within the M. galloprovincialis lineage (rather than divergence driven in the outgroup lineage), we repeated polymorphism and divergence statistics post hoc using M. edulis as an outgroup for seven candidate loci showing evidence of positive selection in both codon models and significant and positive alpha/DoS values in the initial analyses conducted against a M. trossulus outgroup (Table 2).

Identification of Known Thermotolerance Candidate Genes

We obtained specific loci previously identified as thermotolerance candidate genes based on their biochemical or expression patterns among Mytilus congeners in physiological investigations (Table S1). Gene sets included thermotolerance candidates for (A) divergent functional adaptation (n=2; i.e. cMDH, IDH, Fields et al. 2006; Lockwood and Somero 2012); (B) species-specific differential expression under heat-stress (n=96; Lockwood et al. 2010); and (C) shared transcriptional responses in three Mytilus congeners under acute heat stress (n=175; Connor and Gracey 2012; Lockwood et al. 2015). A total of 273 thermotolerance candidate genes were assigned to reference transcripts using reciprocal best hits in Proteinortho v5.13 (Lechner et al. 2011), but only a small proportion of candidate loci (36/273) were matched to complete orthologs in all four taxa. Because we applied stringent filtering to remove genes expressed at low levels before assigning orthologous sequences, it is likely that many candidate loci were excluded from the reference dataset, and thus, downstream analyses. Indeed, the majority of matched transcripts were found in trace counts in the transcriptome assemblies, such that we could only recover partial open reading frames in one or more species. We did not consider these partial transcripts that could lead to unreliable interspecific alignments.

Enrichment Analyses of Positively Selected Genes

We determined whether Gene Ontology (GO) categories were enriched among sets of positively selected loci in the R-bioconductor package topGO v2.26 (Alexa et al. 2006; Alexa and Rahnenfuhrer 2010). We applied Fisher’s exact tests using the topGO weight01 algorithm and considered GO terms assigned to at least five genes to identify enriched biological process categories (P≤0.001). Because enrichment analyses are highly dependent on accurately characterising the diversity and abundance of GO categories in the reference assembly (including those expressed at low levels), enrichment analyses were performed against two reference gene sets: (i) 38,534 protein-coding transcripts (complete and partial ORFs) from the M. galloprovincialis reference assembly; and (ii) 2719 one-to-one orthologous transcripts analysed in this study (Supplementary Material).

Results

Transcriptome Assembly

Quality filtering and Trinity read normalisation resulted in a large reduction of reads (up to 84.6%) prior to assembly, which was largely due to the high presence of rRNA contaminants in the raw cDNA libraries. Filtering and de novo assembly statistics are summarised in Table S2. The removal of alternatively spliced variants and transcripts with high sequence similarity resulted in 80,523 to 130,389 non-redundant transcripts per reference assembly. Based on these assemblies, we extracted transcripts containing likely ORFs, which resulted in 7,046 to 13,681 complete coding sequences per species for downstream analyses (Table S3).
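The "complete coding sequence" criterion described in the Methods (a start codon, a stop codon and an ORF longer than 100 amino acids) can be sketched as a simple check on a nucleotide sequence. This is an illustrative Python sketch rather than the Transdecoder logic used in the study: the function name is hypothetical, and the example assumes the standard genetic code, ATG as the only start codon, and an amino acid count that excludes the terminal stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def is_complete_orf(cds, min_aa=100):
    """Return True if cds looks like a complete ORF longer than min_aa residues.

    A sequence passes when its length is a multiple of three, it starts
    with ATG, ends with a stop codon, and contains no internal stop codon.
    """
    seq = cds.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    if any(codon in STOP_CODONS for codon in codons[1:-1]):
        return False
    # Amino acid count excludes the terminal stop codon (assumption).
    return len(codons) - 1 > min_aa


if __name__ == "__main__":
    complete = "ATG" + "GCT" * 100 + "TAA"  # 101 residues plus stop
    partial = "ATG" + "GCT" * 100           # no stop codon
    print(is_complete_orf(complete), is_complete_orf(partial))
```

Partial ORFs, such as those recovered for many thermotolerance candidates, would fail this check because one of the two terminal codons is missing.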

Assembly Quality, Annotation and Ortholog Identification

Functional annotation against the Uniprot-Swissprot protein database returned significant blastx hits for ~70% of predicted coding sequences for each species; only ~5% of transcripts were identified as environmental contaminants and removed from each dataset (Table S4). Quality assessments suggested high quality for de novo transcriptome reconstruction: the proportion of normalised reads that successfully mapped back to each respective assembly ranged from 82.4% to 92.3%. Additionally, the percentage of full-length and partial BUSCO orthologs recovered from each assembly was greater than 87% (Table S5), suggesting high assembly completeness; this value is similar to proportions of single-copy orthologs identified in other comparative genomic studies (e.g., Hodgins et al. 2014). Furthermore, more than 97% of contigs in the M. galloprovincialis draft genome (although presumed to be highly fragmented; see Murgarella et al. 2016; Table S6) recovered a significant hit to our M. galloprovincialis transcriptome assembly. Reciprocal best hits of complete coding sequences yielded 2719 high-quality orthologous groups including all four taxa, of which 79 (3%) required pruning of putatively paralogous sequences in at least one taxon. A significant recombination breakpoint was identified in nine alignments; each of these alignments was partitioned and a new topology was generated for each partition prior to selection analyses.

Genomic Functions under Positive Selection in Warm-Tolerant M. galloprovincialis

Branch-site analyses revealed 99 loci (3.6%) with significant evidence for positive selection (ω>>1) in M. galloprovincialis using the LRT (Figure 2; Table S7). A smaller subset of 38 genes (1.4%) additionally returned at least one codon under positive selection at a BEB posterior probability cut-off (P≥0.95; Figure 2). To minimise the effects of alignment error on selection tests, we filtered this subset of 38 genes to a reduced gene set (n=19) containing loci with the ‘highest-confidence alignments’ under positive selection. To account for intraspecific polymorphism that may contribute to high ω variation in M. galloprovincialis, we used a McDonald-Kreitman framework to compare patterns of polymorphism and divergence and to test for positive selection in the same 2719 reference loci.
In contrast to branch-site tests, a larger number of genes (n=175) showed evidence of adaptive divergence in one or both populations (P≤0.05) (Figure 2). Estimates of positive alpha were greater than 0.5 for all 175 selected genes (P≤0.05). While the low number of polymorphic sites at some genes may account for relatively high alpha estimates, these findings are consistent with high proportions of fixed substitutions observed in other species (Smith and Eyre-Walker 2002; Galtier 2016).

We calculated the direction of selection (DoS) statistic separately for each locus to assess the contribution of positive (DoS=positive) and relaxed purifying selection (DoS=negative) to high ω variation in the M. galloprovincialis lineage (Table S7). Significant and positive DoS values were obtained for 17 loci that were also positively selected in branch-site LRTs, among which only seven loci also had one or more codons under positive selection (BEB P≥0.95; Figure 2; Table 2); when only loci with the ‘highest-confidence alignments’ were considered, four loci showed evidence for positive selection in both sets of analyses (Figure 2). We find that five out of six loci with functional annotations have documented roles in heat tolerance based on functional gene annotations in the GO database (Ashburner et al. 2000), the UniProt Knowledge Database (The UniProt Consortium 2017), or direct associations to heat-stress responses in published Mytilus ecophysiological studies (i.e. TNF, CPT1A; Table 2). Specifically, three of these positively selected loci fall within predicted genomic functions (oxidative defense and cytoskeletal stabilisation) associated with the strongest species-specific responses to temperature stress in physiological studies (i.e. MGST3, CSAD, MDM1; Table 2; Figure 2). Only one out of these six loci had a functional annotation linked to reproduction (i.e. HSD17B2).

Because positive alpha/DoS estimates may reflect divergence driven by adaptive evolution in the outgroup lineage rather than selection in the focal species, we also conducted polymorphism and divergence analyses using M. edulis as an outgroup for the seven loci in which we identified both significant branch-site tests (LRT P≤0.05 and BEB P≥0.95) and adaptive divergence relative to M. trossulus (Table 2). Compared to M. edulis, we found significant and positive alpha/DoS values in one or both localities (Mediterranean or Atlantic) for all seven loci, implicating accelerated evolution in the M. galloprovincialis lineage (Table S8). Among all loci, a small proportion of transcripts had significantly negative DoS values (17.6% and 21.8% in the Mediterranean and Atlantic populations, respectively; Figure 2b; Table S7), suggesting weakly deleterious nonsynonymous substitutions segregating in the population; however, only one of these loci was also positively selected in branch-site analyses (using inference based on the LRT statistic), indicating that signatures of accelerated evolution in the M. galloprovincialis lineage are unlikely to be due to relaxed purifying selection at selected loci.
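
As a worked illustration of the two summary statistics reported in this section, the sketch below computes DoS from the expression given in the Methods, and alpha in the form 1 − (Ds × Pn) / (Dn × Ps) described by Smith and Eyre-Walker (2002). It is a minimal example rather than the pipeline used in this study: the function names and the substitution counts are hypothetical, the choice of this particular alpha estimator is an assumption, and no significance testing is included.

```python
from typing import Optional


def direction_of_selection(dn: int, ds: int, pn: int, ps: int) -> Optional[float]:
    """Return DoS = Dn/(Dn+Ds) - Pn/(Pn+Ps), or None if a ratio is undefined."""
    if dn + ds == 0 or pn + ps == 0:
        return None  # no fixed differences or no polymorphisms at this locus
    return dn / (dn + ds) - pn / (pn + ps)


def alpha_estimate(dn: int, ds: int, pn: int, ps: int) -> Optional[float]:
    """Return alpha = 1 - (Ds*Pn)/(Dn*Ps), or None if the denominator is zero."""
    if dn == 0 or ps == 0:
        return None
    return 1.0 - (ds * pn) / (dn * ps)


if __name__ == "__main__":
    # Hypothetical counts for a single locus (not taken from this study):
    # Dn/Ds = nonsynonymous/synonymous fixed differences,
    # Pn/Ps = nonsynonymous/synonymous polymorphisms.
    counts = dict(dn=12, ds=8, pn=3, ps=9)
    dos = direction_of_selection(**counts)
    alpha = alpha_estimate(**counts)
    print(f"DoS = {dos:.3f}, alpha = {alpha:.3f}")
```

With these hypothetical counts, the nonsynonymous share of fixed differences (12/20) exceeds the nonsynonymous share of polymorphisms (3/12), so both statistics are positive; reversing that relationship yields a negative DoS, the pattern interpreted above as weakly deleterious variation segregating in the population.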

Thermotolerance Candidate Genes under Positive Selection

Out of 36 thermotolerance candidate genes identified from physiological investigations (and present in our reference dataset), we found five genes under positive selection in either codon model tests or polymorphism and divergence-based analyses (Figure 2; Table 3). Four loci belonged to candidate group C based on shared transcriptional heat-stress responses in three Mytilus congeners. Specifically, branch-site models identified two homologs under positive selection using inference based on the LRT statistic (Table 3). The first homolog had a blastx annotation to a small heat shock protein of the alpha-crystallin protein family and Heat Shock Protein 25 (HSP25; based on M. californianus EST annotations; GenBank accession ES737726), which is up-regulated in heat-stressed Mytilus congeners (Connor and Gracey 2012; Lockwood et al. 2015). The branch-site model also identified a single codon with a high Bayesian posterior probability of positive selection at site 185 (albeit not significant; BEB P=0.905) encoding a functional amino acid substitution in M. galloprovincialis (Figure 3). The second homolog returned a best hit to protein Shootin-1 (SHTN1), which is involved in actin filament binding (Connor and Gracey 2012; Lockwood et al. 2015). Two candidate genes showed significantly positive alpha/DoS values: the oxidoreductase glutathione S-transferase omega-1 (GSTO1) and Dipeptidyl peptidase 1 (i.e. cathepsin C; CTSC) (Table 3; Connor and Gracey 2012; Lockwood et al. 2015). Positive alpha/DoS values were also identified for the thermotolerance candidate gene cMDH (group A: divergent functional adaptation), corroborating that observed amino acid differences between M. trossulus and M. galloprovincialis are likely adaptive (Fields et al. 2006; Table 3). Post hoc inspections of cMDH read alignments revealed that M. edulis is polymorphic for valine and asparagine amino acids at the functional codon 114, while M. galloprovincialis is fixed for asparagine based on available data.

Gene Ontology Enrichment in Positively Selected Loci

We obtained GO terms for 21,185 out of 38,534 protein-coding transcripts in the M. galloprovincialis assembly. Positively selected loci identified in PAML were significantly enriched for GO terms associated with sulfur compound metabolism and organonitrogen compound catabolic processes (P<0.001) (Table 4). A wider diversity of GO terms was enriched among genes with significant and positive alpha/DoS values, including terms involved in RNA splicing, RNA-protein binding, and liver development (Table 4). Results of enrichment analyses using the smaller dataset of 2719 analysed transcripts as the reference gene set are in Table S9.

Discussion

We demonstrate that functional classes of genes linked to species-specific temperature adaptation in physiological studies show accelerated evolutionary divergence between warm- and cold-adapted Mytilus species. We find that divergent selection has affected genomic functions previously associated with divergent expression responses to thermal stress, with the strongest signatures of positive selection (i.e. significant in both branch-site tests and analyses of polymorphism and divergence) in a small set of loci involved in oxidative stress defense.
Furthermore, known thermotolerance candidate genes under positive selection in either selection test share close functional similarities to several positively selected loci within predicted genomic functions (those associated with divergent expression responses). These findings suggest a contribution of temperature-dependent selection in the divergence of M. galloprovincialis that is largely consistent with results from physiological studies. More broadly, our study highlights how independent evidence from ecophysiological studies can inform evolutionary hypotheses about molecular divergence in non-model species.

Genomic Functions linked to Temperature Adaptation show the Strongest Signatures of Positive Selection

Positively selected loci in the M. galloprovincialis lineage included those involved in immunity, reproduction and cellular processes relating to transport, signaling and protein modification (Table S7). Such functional gene categories frequently evolve under positive selection in other taxa (Nielsen et al. 2005; Kosiol et al. 2008). A striking result here, however, is the greater weight of evidence of positive selection acting on loci with functional roles characterised by the same genomic functions differentiating warm- and cold-adapted Mytilus species in expression studies (Table 2). Out of seven loci with signatures of positive selection in both selection tests (i.e. genes with at least one codon with a high Bayesian posterior probability within the positively selected site class (ω2>>1) and significantly positive alpha/DoS values), five candidates have documented functions in heat stress tolerance (Table 2). Furthermore, we find that three of these loci belong to predicted genomic functions associated with species-specific expression, namely oxidative stress defense and cytoskeletal stabilisation (MGST3, CSAD, MDM1; Figure 2; Table 2). In contrast, only one selected locus was characterised by a reproductive function (HSD17B2).

Positive selection in loci functionally linked to transcriptional adaptation suggests that expression divergence between warm- and cold-adapted Mytilus species may be paralleled by sequence evolution across longer evolutionary timescales. Significant correlations between expression divergence and increased evolutionary rates (dN/dS) have been identified in a number of terrestrial taxa (e.g., Drosophila, Lemos et al. 2005; Solenopsis fire ants, Hunt et al. 2013; conifers, Hodgins et al. 2016), with positive selection at specific genes contributing to observed correlations in some species (e.g., Drosophila, Nuzhdin et al. 2004). In some cases, however, as in conifers, significant correlations between evolutionary rate and expression divergence are only evident when groups of loci (binned by their expression levels) are compared (Hodgins et al. 2016). Furthermore, inconsistent support for such patterns in other taxa (e.g., yeast, Tirosh and Barkai 2008; sunflowers, Renaut et al. 2012) suggests that expression divergence does not always predict sequence evolution in the same loci. Selective pressures toward common adaptive optima are likely to operate through divergent molecular pathways (Moyers and Rieseberg 2013).
In the present study, we found no evidence for positive selection on specific thermotolerance candidate genes showing divergent expression responses among Mytilus congeners (group B; Lockwood et al. 2010). Instead, our results suggest that commonalities between high relative rates of molecular evolution and species-specific expression responses may be observed at higher functional classifications but may be rarer at the level of individual loci.

Functional Similarities between Positively Selected Loci and Known Thermotolerance Candidate Genes

Examining specific thermotolerance candidate genes (i.e. known to confer thermal adaptation in physiological studies), we find five candidates under positive selection in independent selection tests (Figure 2; Table 3). Notably, selected thermotolerance candidate genes share close functional similarities with positively selected loci falling under predicted genomic functions (Table 2; Table 3). First, proteins involved in cytoskeletal reorganisation (i.e. SHTN1, MDM1, small HSP25; Haslbeck et al. 2005; Tomanek 2012) were significantly affected by temperature and heat-induced oxidative stress in expression and proteomic studies in Mytilus species (Lockwood et al. 2010; Tomanek 2012) and are also under positive selection in this study. Second, the thermotolerance candidate GSTO1 and the positively selected locus MGST3 both encode glutathione S-transferase enzymes that control oxidative stress through the metabolism of glutathione, the most abundant free thiol responsible for redox regulation of reactive oxygen species (ROS) in the cell (Dalle-Donne et al. 2009; Tomanek 2015). Third, functional similarities are evident between CSAD and CTSC; both enzymes regulate organic osmolytes and ROS through targeted breakdown of sulfur-containing amino acids in the intracellular milieu (McGuire et al. 1992; Connor and Gracey 2012; Lockwood et al. 2015). We propose that lineage-specific divergence among functionally analogous but non-homologous proteins implicates strong selection on oxidative stress-related genomic functions. This is consistent with suggestions that selection acting on ROS-scavenging cellular pathways is a primary feature differentiating M. galloprovincialis from cold-adapted congeners (Tomanek 2012; 2015). Specifically, genes involved in the targeted metabolism of sulfur-containing thiol groups (e.g., glutathione, cysteine) were significantly enriched among positively selected gene sets (Table 4). Loci involved in sulfur metabolism have been shown to diverge under positive selection between ecologically divergent sea urchins occupying shallow and deep water habitats (Allocentrotus sp., Oliver et al. 2010), and may be important candidates associated with temperature-related ecological shifts experienced by Mytilus species (this study).

Combining Codon Models with Polymorphism and Divergence Analyses

Why did we not always identify congruent signatures of locus-specific positive selection in codon model tests and polymorphism and divergence-based analyses?
Observed discrepancies between analytical approaches may reflect inherent differences in the methods used in this study. Global McDonald-Kreitman-based tests of adaptive divergence (i.e. alpha/DoS) rely on summed counts of polymorphism and divergence substitutions within individual genes (Bierne and Eyre-Walker 2004; Stoletzki and Eyre-Walker 2011). As a result, the power of such methods is limited to gene-wide signals of selection operating over long evolutionary timescales. In turn, such methods may underestimate adaptive fixation if current polymorphism is maintained by polygenic selection or soft selective sweeps (Storz and Wheat 2010; Messer and Petrov 2013). Furthermore, because signatures of selection cannot be specified to individual codons, methods based on pooled polymorphism statistics are likely to miss selective targets if adaptive evolution is constrained to small gene regions, such as protein binding domains in long conserved proteins (e.g., Popovic et al. 2014; Hart et al. 2014).

In contrast, likelihood-based branch-site tests detect variation in positive selection among individual amino acids and lineages in a phylogeny (e.g., Anisimova and Yang 2007). The small number of taxa analysed in this study, however, will have limited the power of the branch-site model to estimate gene-wide ω variation and positive selection in a single focal lineage (Anisimova et al. 2001; Stoletzki and Eyre-Walker 2011; Lu and Guindon 2014). The sensitivity of such gene-tree based methods may be further diluted for closely related Mytilus species in which shared ancestral polymorphism is predicted to be high, synonymous substitutions to be low (Stoletzki and Eyre-Walker 2011), and interspecific introgression common (Fraïsse et al. 2016). On the other hand, lineage-specific estimates of ω are difficult to interpret for loci in which shared polymorphism is high and reciprocally monophyletic gene trees are predicted to be rare. Branch-site models assume that all substitutions contributing to divergence are fixed within species. However, both ancestral and lineage-specific polymorphisms (adaptive and weakly deleterious) can contribute to divergence in closely related species, and may lead to high ω variation when only a single sequence per taxon is used. The inclusion of population-level polymorphism data provides one way to validate these assumptions; however, small sample sizes, coupled with missing individual genotypes and the loss of variants that did not meet filtering thresholds (e.g., rare variants or complex structural polymorphisms), mean that our data provide conservative estimates of the extent of shared variation segregating within M. galloprovincialis. Nevertheless, congruence between selection tests in this study for focal loci illustrates the power of combining these complementary approaches for (i) distinguishing between positive selection and relaxed purifying selection leading to high ω variation; and (ii) identifying genes that are among the most likely candidates for adaptive species divergence, rather than artefacts of neutral evolutionary processes or methodological biases.

Genetics of Species Divergence and Implications for Thermal Adaptation

For the case of M.
galloprovincialis, warm-temperature tolerance appears to have facilitated initial invasions into southern California (Lockwood and Somero 2011; Lockwood et al. 2015), and our results suggest that divergence at genomic functions linked to thermal adaptation is paralleled at the molecular level. To date, cMDH and IDH are the only enzymes for which there is genetic and biochemical evidence for functional variation affecting thermal tolerance limits in Mytilus species (Fields et al. 2006; Lockwood and Somero 2012). The loci under positive selection in this study should be of interest for genetic assays of protein function and potential roles in temperature adaptation in this system (Lockwood and Somero 2011). For example, positive selection in a homolog of CSAD has acted on a single codon conferring a functional group change from a polar and uncharged threonine to a non-polar and hydrophobic valine residue in M. galloprovincialis (Table 2; Figure 3). CSAD is a rate-limiting enzyme for taurine biosynthesis through targeted breakdown of sulfur amino acids (de la Rosa and Stipanuk 1985). Taurine is a major free organic acid in the tissues of osmoconforming marine invertebrates with primary functions in regulating osmotic desiccation, stabilising membranes, and reducing oxidative damage (Silva and Wright 1992; Schaffer et al. 2010; Jong et al. 2012; Yancey and Siebenaller 2015). Interestingly, physiological work in M. californianus has demonstrated significant and positive correlations of taurine levels in gill tissues with exposure to elevated temperatures, pointing to thermoprotective properties of taurine in response to heat stress under natural conditions (Gleason et al. 2017). However, hypotheses regarding CSAD activity and taurine turnover under thermal stress have not yet been investigated in warm- and cold-tolerant Mytilus species and present valuable directions for future research.

In another example, positive selection on the gene encoding the oxidative stress enzyme MGST3 is evident on a single codon with three nucleotide substitutions encoding a nonpolar and hydrophobic tryptophan in M. galloprovincialis compared to polar glutamine acid residues in the other Mytilus species (Table 2; Figure 3). MGST3 is a glutathione S-transferase (GST) enzyme that catalyses the binding of cellular glutathione to cysteine residues of proteins for protection against oxidative stress (Abele and Puntarulo 2004; Dalle-Donne et al. 2009). GSTs also show elevated rates of sequence evolution in Acropora corals (Voolstra et al. 2011) and have been identified as candidates for population-level differentiation under divergent thermal selection (Bay and Palumbi 2014), pointing to functions mediating temperature tolerance in other marine invertebrates. Positively selected loci identified in this study provide opportunities to compare allele frequency patterns in multiple M. galloprovincialis populations to determine whether genes under selection for species divergence (identified in this study) are also important for local adaptation within species. Sequence data can also be used in combination with computational approaches to investigate how specific amino acid substitutions affect three-dimensional protein structure and thermal stability, or whether these sites comprise binding regions undergoing large conformational changes during catalysis (e.g., Fields et al. 2012; Fields et al. 2015; Liao et al. 2017; Saarman et al. 2017).
Such data would allow a more direct assessment of the contributions of thermal selection on the evolution of species characteristics mediating present-day distribution limits.

Conclusions

In summary, our findings corroborate independent lines of experimental evidence for temperature adaptation with signatures of positive selection in the same functional classes of genes differentiating warm- and cold-adapted Mytilus species in expression studies. Considering specific loci, we find close functional similarities between known thermotolerance candidate genes under positive selection and positively selected loci under predicted genomic functions (associated with divergent expression responses). Specifically, these results highlight positive selection in genes encoding core elements of biochemical adaptation to temperature: proteins involved in controlling oxidative stress through the targeted metabolism of sulfur-containing thiol groups directly (e.g., glutathione S-transferases) or indirectly by regulating antioxidant osmolytes in the cellular milieu (e.g., enzymes involved in sulfur amino acid catabolism), and contributing to cytoskeletal stabilisation (e.g., tubulin and actin binding proteins, and heat shock proteins). However, we recovered no evidence for positive selection on known thermotolerance candidate genes that have shown divergent expression patterns between congeners (group B; Lockwood et al. 2010). Our findings suggest that a correspondence between expression divergence and sequence divergence under positive selection is often not evident at the level of individual loci, but may emerge at higher functional classifications. Taken together, molecular divergence of M. galloprovincialis is consistent with warm-temperature adaptation demonstrated by physiological studies. Our study provides an example of how genomic-enabled studies of molecular adaptation can complement direct experimental measures of gene expression from whole-organism ecophysiological studies.

Acknowledgements

We would like to thank all of the colleagues who helped with mussel collections and bioinformatics, in particular, G. Rouse, M. Hart, B. Popovic, L. Malard, J. Thia, A. Matias, C. McDougall, H. Lui, members of the Palumbi Lab and Hacky Hour UQ. We are very grateful to D. Ortiz-Barrientos, S. Palumbi and G. Somero for useful suggestions and for many insightful discussions on Mytilus genomics and physiology. This work is supported by the Australian Biological Resources Study (ABRS) National Taxonomy Research Grant (grant number RF216-11), and awards from the Natural Sciences and Engineering Research Council of Canada (to I.P.), the Society for the Study of Evolution (to I.P.) and the University of Queensland (to I.P.).

Data Accessibility Statement

RNAseq data are available on the NCBI Sequence Read Archive (SRA) (BioProject ID: PRJNA560413) and associated geographic sampling information can be found in the Genomic Observatories MetaDatabase (https://geome-db.org). Protein coding transcriptome assemblies and supporting genomic data sets that support the findings of this study are openly available in the Dryad digital repository at https://doi.org/10.5061/dryad.5qfttdz1h.

Author Contributions

I.P. and C.R. conceived the study and wrote the manuscript. I.P. collected and analysed the data.

Figure 1. Geographical and thermal distributions for M. galloprovincialis (MG), M. edulis (ME), M. trossulus (MT) and M. californianus (MC). a) Present day distributions in the northern hemisphere based on published genetic literature; b) Mean Sea Surface Temperature (SST) data corresponding to species range. Average annual SST data were compiled from the National Oceanographic Data Center (www.nodc.noaa.gov) and analysed in R (R Development Core Team 2017); c) Cladogram illustrating taxonomic relationships. Branch-site model hypothesis test of high relative rates of amino acid substitution along the MG foreground lineage is indicated in red.

Figure 2. Summary of selection test results. a) Venn diagram showing counts of de novo assembled transcripts belonging to the total protein-coding reference assembly, dataset of 2719 one-to-one orthologous groups analysed, known thermotolerance candidate genes, and positively selected gene sets according to each selection test. Thermotolerance candidate gene groups correspond to known candidates for (A) divergent genetic adaptation, (B) species-specific differential expression, and (C) shared transcriptional responses under acute heat stress. Seven genes with evidence of positive selection in both branch-site tests (i.e. Likelihood ratio test (LRT), P≤0.05; Bayesian Empirical Bayes (BEB), P≥0.95) and polymorphism and divergence analyses are shown in the grey box. Four positively selected genes with the ‘highest-confidence alignments’ are marked with an asterisk; b) Strength of selection profiles for orthologous gene groups, showing the relationship between per-gene LRT statistic and DoS statistic for the Mediterranean M. galloprovincialis population (refer to Figure S2 for Atlantic population comparisons). Significant LRT values are in cyan. Significant DoS values are solid circles. Seven genes showing the strongest signatures of positive selection are outlined in red.

Figure 3. Amino acid substitutions and their functional group attributes in selected genes with evidence of positive selection in M. galloprovincialis. Significantly positively selected codons with a Bayesian posterior probability P≥0.95 are indicated in bold.

References Cited

Abele, D., & Puntarulo, S. (2004). Formation of reactive species and induction of antioxidant defence systems in polar and temperate marine invertebrates and fish. Comparative Biochemistry and Physiology Part A: Molecular & Integrative Physiology, 138(4), 405–415.
Anisimova, M., Bielawski, J. P., & Yang, Z. (2001). Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Molecular Biology and Evolution, 18(8), 1585–1592.
Anisimova, M., & Liberles, D. A. (2007). The quest for natural selection in the age of comparative genomics. Heredity, 99(6), 567.
Anisimova, M., & Yang, Z. (2007). Multiple hypothesis testing to detect lineages under positive selection that affects only a few sites. Molecular Biology and Evolution, 24(5), 1219–1228.
Alexa, A., Rahnenfuhrer, J., & Lengauer, T. (2006). Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics, 22(13), 1600–1607.
Alexa, A., & Rahnenfuhrer, J. (2010). topGO: enrichment analysis for gene ontology. R Package Version, 2(0).

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1), 25.
Bachtrog, D., & Andolfatto, P. (2006). Selection, recombination and demographic history in Drosophila miranda. Genetics, 174(4), 2045–2059.
Barreto, F. S., Moy, G. W., & Burton, R. S. (2010). Interpopulation patterns of divergence and selection across the transcriptome of the copepod Tigriopus californicus. Molecular Ecology, 20(3), 560–572.
Bay, R. A., & Palumbi, S. R. (2014). Multilocus adaptation associated with heat resistance in reef-building corals. Current Biology, 24(24), 2952–2956.
Bierne, N., & Eyre-Walker, A. (2004). The genomic rate of adaptive amino acid substitution in Drosophila. Molecular Biology and Evolution, 21(7), 1350–1360.
Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114–2120.
Boon, E., Faure, M. F., & Bierne, N. (2009). The flow of antimicrobial peptide genes through a genetic barrier between Mytilus edulis and M. galloprovincialis. Journal of Molecular Evolution, 68(5), 461–474.
Braby, C. E., & Somero, G. N. (2006a). Ecological gradients and relative abundance of native (Mytilus trossulus) and invasive (Mytilus galloprovincialis) blue mussels in the California hybrid zone. Marine Biology, 148(6), 1249–1262.
Braby, C. E., & Somero, G. N. (2006b). Following the heart: temperature and salinity effects on heart rate in native and invasive species of blue mussels (genus Mytilus). Journal of Experimental Biology, 209(13), 2554–2566.
Branch, G. M., & Steffani, C. N. (2004). Can we predict the effects of alien species? A case-history of the invasion of South Africa by Mytilus galloprovincialis (Lamarck). Journal of Experimental Marine Biology and Ecology, 300(1-2), 189–215.
Charlesworth, J., & Eyre-Walker, A. (2008). The McDonald-Kreitman test and slightly deleterious mutations. Molecular Biology and Evolution, 25(6), 1007–1015.
Christe, C., Stölting, K. N., Paris, M., Fraïsse, C., Bierne, N., & Lexer, C. (2017). Adaptive evolution and segregating load contribute to the genomic landscape of divergence in two tree species connected by episodic gene flow. Molecular Ecology, 26(1), 59–76.
Cingolani, P., Platts, A., Wang, L. L., Coon, M., Nguyen, T., Wang, L., et al. (2012). A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly, 6(2), 80–92.
Clark, A. G., Glanowski, S., Nielsen, R., Thomas, P., Kejariwal, A., Todd, M. J., et al. (2003). Positive selection in the human genome inferred from human–chimp–mouse orthologous gene alignments. In: Cold Spring Harbor symposia on quantitative biology (Vol. 68, pp. 479–486). Cold Spring Harbor Laboratory Press.

Colautti, R. I., & Lau, J. A. (2015). Contemporary evolution during invasion: evidence for differentiation, natural selection, and local adaptation. Molecular Ecology, 24(9), 1999–2017.
Connor, K. M., & Gracey, A. Y. (2012). Circadian cycles are the dominant transcriptional rhythm in the intertidal mussel Mytilus californianus. Proceedings of the National Academy of Sciences, USA, 108, 16110–16115.
Dalle-Donne, I., Rossi, R., Colombo, G., Giustarini, D., & Milzani, A. (2009). Protein S-glutathionylation: a regulatory device from bacteria to humans. Trends in Biochemical Sciences, 34(2), 85–96.
Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, M. A., et al. (2011). The variant call format and VCFtools. Bioinformatics, 27(15), 2156–2158.
Dean, A. M., & Thornton, J. W. (2007). Mechanistic approaches to the study of evolution: the functional synthesis. Nature Reviews. Genetics, 8(9), 675.
Ellegren, H. (2008). Comparative genomics and the study of evolution by natural selection. Molecular Ecology, 17(21), 4586–4596.
Eyre-Walker, A., & Keightley, P. D. (2009). Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Molecular Biology and Evolution, 26(9), 2097–2108.
Fay, J. C., Wyckoff, G. J., & Wu, C.-I. (2001). Positive and negative selection on the human genome. Genetics, 158(3), 1227–1234.
Fields, P. A., Rudomin, E. L., & Somero, G. N. (2006). Temperature sensitivities of cytosolic malate dehydrogenases from native and invasive species of marine mussels (genus Mytilus): sequence-function linkages and correlations with biogeographic distribution. Journal of Experimental Biology, 209(4), 656–667.
Fields, P. A., Dong, Y., Meng, X., & Somero, G. N. (2015). Adaptations of protein structure and function to temperature: there is more than one way to “skin a cat.” Journal of Experimental Biology, 218(12), 1801–1811.
Fields, P. A., Zuzow, M. J., & Tomanek, L. (2012). Proteomic responses of blue mussel (Mytilus) congeners to temperature acclimation. Journal of Experimental Biology, 215(7), 1106–1116.
Fraïsse, C., Roux, C., Gagnaire, P.-A., Romiguier, J., Faivre, N., Welch, J. J., & Bierne, N. (2018). The divergence history of European blue mussel species reconstructed from Approximate Bayesian Computation: the effects of sequencing techniques and sampling strategies. PeerJ, 6, e5198.
Fu, L., Niu, B., Zhu, Z., Wu, S., & Li, W. (2012). CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics, 28(23), 3150–3152.
Galtier, N. (2016). Adaptive protein evolution in animals and the effective population size hypothesis. PLoS Genetics, 12(1), e1005774–23.
Geller, J. B. (1999). Decline of a native mussel masked by sibling species invasion. Conservation Biology, 13(3), 661–664.

Gleason, L. U., Miller, L. P., Winnikoff, J. R., Somero, G. N., Yancey, P. H., Bratz, D., & Dowd, W. W. (2017). Thermal history and gape of individual Mytilus californianus correlate with oxidative damage and thermoprotective osmolytes. Journal of Experimental Biology, 220(22), 4292–4304.
Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., & Zeng, Q. (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology, 29(7), 644.
Haas, B. J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P. D., Bowden, J., Couger, M. B., Eccles, D., Li, B., & Lieber, M. (2013). De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols, 8(8), 1494.
Haddrill, P. R., Bachtrog, D., & Andolfatto, P. (2008). Positive and negative selection on noncoding DNA in Drosophila simulans. Molecular Biology and Evolution, 25(9), 1825–1834.
Hart, M. W., Sunday, J. M., Popovic, I., Learning, K. J., & Konrad, C. M. (2014). Incipient speciation of sea star populations by adaptive gamete recognition coevolution. Evolution, 68(5), 1294–1305.
Haslbeck, M., Franzmann, T., Weinfurtner, D., & Buchner, J. (2005). Some like it hot: the structure and function of small heat-shock proteins. Nature Structural & Molecular Biology, 12(10), 842–846.
Hodgins, K. A., Bock, D. G., Hahn, M. A., Heredia, S. M., Turner, K. G., & Rieseberg, L. H. (2014). Comparative genomics in the Asteraceae reveals little evidence for parallel evolutionary change in invasive taxa. Molecular Ecology, 24(9), 2226–2240.
Hodgins, K. A., Yeaman, S., Nurkowski, K. A., Rieseberg, L. H., & Aitken, S. N. (2016). Expression divergence is correlated with sequence evolution but not positive selection in conifers. Molecular Biology and Evolution, 33(6), 1502–1516.
Hofmann, G. E., & Somero, G. N. (1996). Interspecific variation in thermal denaturation of proteins in the congeneric mussels Mytilus trossulus and M. galloprovincialis: evidence from the heat-shock response and protein ubiquitination. Marine Biology, 126(1), 65–75.
Holloway, A. K., Lawniczak, M. K. N., Mezey, J. G., Begun, D. J., & Jones, C. D. (2007). Adaptive gene expression divergence inferred from population genomics. PLoS Genetics, 3(10), e187–7.
Hunt, B. G., Ometto, L., Keller, L., & Goodisman, M. A. (2012). Evolution at two levels in fire ants: the relationship between patterns of gene expression and protein sequence evolution. Molecular Biology and Evolution, 30(2), 263–271.
Jong, C. J., Azuma, J., & Schaffer, S. (2012). Mechanism underlying the antioxidant activity of taurine: prevention of mitochondrial oxidant production. Amino Acids, 42(6), 2223–2232.
Jordan, G., & Goldman, N. (2012). The effects of alignment error and alignment filtering on the sitewise detection of positive selection. Molecular Biology and Evolution, 29(4), 1125–1139.
Keightley, P. D., & Eyre-Walker, A. (2012). Estimating the rate of adaptive molecular evolution when the evolutionary divergence between species is small. Journal of Molecular Evolution, 74(1-2), 61–68.

Koester, J. A., Swanson, W. J., & Armbrust, E. V. (2013). Positive selection within a diatom species acts on putative protein interactions and transcriptional regulation. Molecular Biology and Evolution, 30(2), 422–434.
Kocot, K. M., Citarella, M. R., Moroz, L. L., & Halanych, K. M. (2013). PhyloTreePruner: a phylogenetic tree-based approach for selection of orthologous sequences for phylogenomics. Evolutionary Bioinformatics, 9, EBO–S12813.
Kosakovsky Pond, S. L., Posada, D., Gravenor, M. B., Woelk, C. H., & Frost, S. D. (2006). GARD: a genetic algorithm for recombination detection. Bioinformatics, 22(24), 3096–3098.
Kosiol, C., Vinař, T., da Fonseca, R. R., Hubisz, M. J., Bustamante, C. D., Nielsen, R., & Siepel, A. (2008). Patterns of positive selection in six mammalian genomes. PLoS Genetics, 4(8), e1000144–17.
Ladner, J. T., Barshis, D. J., & Palumbi, S. R. (2012). Protein evolution in two co-occurring types of Symbiodinium: an exploration into the genetic basis of thermal tolerance in Symbiodinium clade D. BMC Evolutionary Biology, 12(1), 217.
Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357–359.
la Rosa, de, J., & Stipanuk, M. H. (1985). Evidence for a rate-limiting role of cysteine-sulfinate decarboxylase activity in taurine biosynthesis in vivo. Comparative Biochemistry and Physiology Part B: Comparative Biochemistry, 81(3), 565–571.
Le Corre, V., & Kremer, A. (2012). The genetic differentiation at quantitative trait loci under local adaptation. Molecular Ecology, 21(7), 1548–1566.
Lechner, M., Findeiß, S., Steiner, L., Marz, M., Stadler, P. F., & Prohaska, S. J. (2011). Proteinortho: detection of (co-) orthologs in large-scale analysis. BMC Bioinformatics, 12(1), 124.
Lemos, B., Bettencourt, B. R., Meiklejohn, C. D., & Hartl, D. L. (2005). Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein-protein interactions. Molecular Biology and Evolution, 22(5), 1345–1354.
Li, Y. F., Costello, J. C., Holloway, A. K., & Hahn, M. W. (2008). “Reverse ecology” and the power of population genomics. Evolution, 62(12), 2984–2994.
Li, W., & Godzik, A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 22(13), 1658–1659.
Liao, M.-L., Zhang, S., Zhang, G.-Y., Chu, Y.-M., Somero, G. N., & Dong, Y.-W. (2017). Heat-resistant cytosolic malate dehydrogenases (cMDHs) of thermophilic intertidal snails (genus Echinolittorina): Protein underpinnings of tolerance to body temperatures reaching 55 °C. Journal of Experimental Biology, 220(11), 2066–2075.
Linnen, C. R., Kingsley, E. P., Jensen, J. D., & Hoekstra, H. E. (2009). On the origin and spread of an adaptive allele in deer mice. Science, 325(5944), 1095–1098.

Lockwood, B. L., Sanders, J. G., & Somero, G. N. (2010). Transcriptomic responses to heat stress in invasive and native blue mussels (genus Mytilus): molecular correlates of invasive success. Journal of Experimental Biology, 213(20), 3548–3558.
Lockwood, B. L., & Somero, G. N. (2011a). Invasive and native blue mussels (genus Mytilus) on the California coast: the role of physiology in a biological invasion. Journal of Experimental Marine Biology and Ecology, 400(1-2), 167–174.
Lockwood, B. L., & Somero, G. N. (2012). Functional determinants of temperature adaptation in enzymes of cold- versus warm-adapted mussels (Genus Mytilus). Molecular Biology and Evolution, 29(10), 3061–3070.
Lockwood, B. L., Connor, K. M., & Gracey, A. Y. (2015). The environmentally tuned transcriptomes of Mytilus mussels. Journal of Experimental Biology, 218(12), 1822–1833.
Löytynoja, A., & Goldman, N. (2008). Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science, 320(5883), 1632–1635.
Lu, A., & Guindon, S. (2013). Performance of standard and stochastic branch-site models for detecting positive selection among coding sequences. Molecular Biology and Evolution, 31(2), 484–495.
Lowe, S., Browne, M., Boudjelas, S., & De Poorter, M. (2000). 100 of the world's worst invasive alien species: a selection from the global invasive species database (Vol. 12). Invasive Species Specialist Group Auckland.
Magoč, T., & Salzberg, S. L. (2011). FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics, 27(21), 2957–2963.
Mallick, S., Gnerre, S., Muller, P., & Reich, D. (2009). The difficulty of avoiding false positives in genome scans for natural selection. Genome Research, 19(5), 922–933.
Markova-Raina, P., & Petrov, D. (2011). High sensitivity to aligner and high rate of false positives in the estimates of positive selection in the 12 Drosophila genomes. Genome Research, 21(6), 863–874.
McDonald, J. H., & Koehn, R. K. (1988). The mussels Mytilus galloprovincialis and M. trossulus on the Pacific coast of North America. Marine Biology, 99(1), 111–118.
McDonald, J. H., & Kreitman, M. (1991). Adaptive protein evolution at the Adh locus in Drosophila. Nature, 351(6328), 652.
McDonald, J. H., Seed, R., & Koehn, R. K. (1991). Allozymes and morphometric characters of three species of Mytilus in the Northern and Southern Hemispheres. Marine Biology, 111(3), 323–333.
McGuire, M. J., Lipsky, P. E., & Thiele, D. L. (1992). Purification and characterization of dipeptidyl peptidase I from human spleen. Archives of Biochemistry and Biophysics, 295(2), 280–288.
Mendes, F. K., & Hahn, M. W. (2016). Gene tree discordance causes apparent substitution rate variation. Systematic Biology, 65(4), 711–721.
Messer, P. W., & Petrov, D. A. (2013). Population genomics of rapid adaptation by soft selective sweeps. Trends in Ecology & Evolution, 28(11), 659–669.

Moyers, B. T., & Rieseberg, L. H. (2013). Divergence in gene expression is uncoupled from divergence in coding sequence in a secondarily woody sunflower. International Journal of Plant Sciences, 174(7), 1079–1089.
Mugal, C. F., Wolf, J. B. W., & Kaj, I. (2013). Why time matters: Codon evolution and the temporal dynamics of dN/dS. Molecular Biology and Evolution, 31(1), 212–231.
Murgarella, M., Puiu, D., Novoa, B., Figueras, A., Posada, D., & Canchaya, C. (2016). A first insight into the genome of the filter-feeder mussel Mytilus galloprovincialis. PLoS ONE, 11(3), e0151561.
Natarajan, C., Hoffmann, F. G., Lanier, H. C., Wolf, C. J., Cheviron, Z. A., Spangler, M. L., et al. (2015). Intraspecific polymorphism, interspecific divergence, and the origins of function-altering mutations in deer mouse hemoglobin. Molecular Biology and Evolution, 32(4), 978–997.
Nielsen, R., Bustamante, C., Clark, A. G., Glanowski, S., Sackton, T. B., Hubisz, M. J., Fledel-Alon, A., Tanenbaum, D. M., Civello, D., & White, T. J. (2005). A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biology, 3(6), e170.
Nuzhdin, S. V., Wayne, M. L., Harmon, K. L., & McIntyre, L. M. (2004). Common pattern of evolution of gene expression level and protein sequence in Drosophila. Molecular Biology and Evolution, 21(7), 1308–1317.
Oliver, T. A., Garfield, D. A., Manier, M. K., Haygood, R., Wray, G. A., & Palumbi, S. R. (2010). Whole-genome positive selection and habitat-driven evolution in a shallow and a deep-sea urchin. Genome Biology and Evolution, 2, 800–814.
Paradis, E., Claude, J., & Strimmer, K. (2004). APE: Analyses of phylogenetics and evolution in R language. Bioinformatics, 20, 289–290.
Pavlidis, P., Jensen, J. D., Stephan, W., & Stamatakis, A. (2012). A critical assessment of storytelling: gene ontology categories and the importance of validating genomic scans. Molecular Biology and Evolution, 29(10), 3237–3248.
Popovic, I., Marko, P. B., Wares, J. P., & Hart, M. W. (2014). Selection and demographic history shape the molecular evolution of the gamete compatibility protein bindin in Pisaster sea stars. Ecology and Evolution, 4(9), 1567–1588.
Popovic, I., Matias, A. M. A., Bierne, N., & Riginos, C. (2019). Twin introductions by independent invader mussel lineages are both associated with recent admixture with a native congener in Australia. Evolutionary Applications, 00, 1–18.
Price, M. N., Dehal, P. S., & Arkin, A. P. (2010). FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS ONE, 5(3), e9490.
Rawson, P. D., Agrawal, V., & Hilbish, T. J. (1999). Hybridization between the blue mussels Mytilus galloprovincialis and M. trossulus along the Pacific coast of North America: evidence for limited introgression. Marine Biology, 134(1), 201–211.

Rawson, P. D., Joyner, K. L., Meetze, K., & Hilbish, T. J. (1996). Evidence for intragenic recombination within a novel genetic marker that distinguishes mussels in the Mytilus edulis species complex. Heredity, 77(6), 599.
Renaut, S., Grassa, C., Moyers, B., Kane, N., & Rieseberg, L. (2012). The population genomics of sunflowers and genomic determinants of protein evolution revealed by RNAseq. Biology, 1(3), 575–596.
Rockman, M. V. (2008). Reverse engineering the genotype–phenotype map with natural genetic variation. Nature, 456(7223), 738.
Rockman, M. V. (2012). The QTN program and the alleles that matter for evolution: all that's gold does not glitter. Evolution, 66(1), 1–17.
Roure, B., Rodriguez-Ezpeleta, N., & Philippe, H. (2007). SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evolutionary Biology, 7(1), S2.
Roux, J., Privman, E., Moretti, S., Daub, J. T., Robinson-Rechavi, M., & Keller, L. (2014a). Patterns of positive selection in seven ant genomes. Molecular Biology and Evolution, 31(7), 1661–1685.
Roux, C., Fraïsse, C., Castric, V., Vekemans, X., Pogson, G. H., & Bierne, N. (2014b). Can we continue to neglect genomic variation in introgression rates when inferring the history of speciation? A case study in a Mytilus hybrid zone. Journal of Evolutionary Biology, 27(8), 1662–1675.
Roux, C., Fraïsse, C., Romiguier, J., Anciaux, Y., Galtier, N., & Bierne, N. (2016). Shedding light on the grey zone of speciation along a continuum of genomic divergence. PLoS Biology, 14(12), e2000234.
Saarman, N. P., Kober, K. M., Simison, W. B., & Pogson, G. H. (2017). Sequence-based analysis of thermal adaptation and protein energy landscapes in an invasive blue mussel (Mytilus galloprovincialis). Genome Biology and Evolution, 9(10), 2739–2751.
Schaffer, S. W., Jong, C. J., Ramila, K. C., & Azuma, J. (2010). Physiological roles of taurine in heart and muscle. Journal of Biomedical Science, 17(1), S2.
Sherman, C., Lotterhos, K. E., Richardson, M. F., Tepolt, C. K., Rollins, L. A., Palumbi, S. R., & Miller, A. D. (2016). What are we missing about marine invasions? Filling in the gaps with evolutionary genomics. Marine Biology, 163(10), 198.
Silva, A. L., & Wright, S. H. (1992). Integumental taurine transport in Mytilus gill: short-term adaptation to reduced salinity. Journal of Experimental Biology, 162(1), 265–279.
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., & Zdobnov, E. M. (2015). BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 31(19), 3210–3212.
Smith, N. G., & Eyre-Walker, A. (2002). Adaptive protein evolution in Drosophila. Nature, 415(6875), 1022–1024.
Somero, G. N. (2012). The physiology of global change: linking patterns to mechanisms. Annual Review of Marine Science, 4, 39–61.

Stoletzki, N., & Eyre-Walker, A. (2011). Estimation of the neutrality index. Molecular Biology and Evolution, 28(1), 63–70.
Storz, J. F., Runck, A. M., Sabatino, S. J., Kelly, J. K., Ferrand, N., Moriyama, H., et al. (2009). Evolutionary and functional insights into the mechanism underlying high-altitude adaptation of deer mouse hemoglobin. Proceedings of the National Academy of Sciences, 106(34), 14450–14455.
Storz, J. F., & Wheat, C. W. (2010). Integrating evolutionary and functional approaches to infer adaptation at specific loci. Evolution, 64(9), 2489–2509.
Storz, J. F., Bridgham, J. T., Kelly, S. A., & Garland, T., Jr. (2015). Genetic approaches in comparative and evolutionary physiology. American Journal of Physiology - Regulatory, Integrative and Comparative Physiology, 309(3), R197–R214.
Tirosh, I., & Barkai, N. (2008). Evolution of gene sequence and gene expression are not correlated in yeast. Trends in Genetics, 24(3), 109–113.
Tomanek, L. (2012). Environmental proteomics of the mussel Mytilus: Implications for tolerance to stress and change in limits of biogeographic ranges in response to climate change. Integrative and Comparative Biology, 52(5), 648–664.
Tomanek, L. (2015). Proteomic responses to environmentally induced oxidative stress. Journal of Experimental Biology, 218(12), 1867–1879.
Tomanek, L., & Zuzow, M. J. (2010). The proteomic response of the mussel congeners Mytilus galloprovincialis and M. trossulus to acute heat stress: implications for thermal tolerance limits and metabolic costs of thermal stress. Journal of Experimental Biology, 213(20), 3559–3574.
Voolstra, C. R., Sunagawa, S., Matz, M. V., Bayer, T., Aranda, M., Buschiazzo, E., DeSalvo, M. K., Lindquist, E., Szmant, A. M., & Coffroth, M. A. (2011). Rapid evolution of coral proteins responsible for interaction with the environment. PLoS ONE, 6(5), e20392.
Yancey, P. H., & Siebenaller, J. F. (2015). Co-evolution of proteins and solutions: protein adaptation versus cytoprotective micromolecules and their roles in marine organisms. Journal of Experimental Biology, 218(12), 1880–1896.
Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution, 24(8), 1586–1591.
Yang, Z., Wong, W. S., & Nielsen, R. (2005). Bayes empirical Bayes inference of amino acid sites under positive selection. Molecular Biology and Evolution, 22(4), 1107–1118.
Zhang, J., Nielsen, R., & Yang, Z. (2005). Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Molecular Biology and Evolution, 22(12), 2472–2479.
[dataset] Popovic, I., & Riginos, C.; 2019; Dryad; Dataset links provided upon acceptance.
[dataset] Popovic, I., Matias, A. M. A., Bierne, N., & Riginos, C.; 2019; BioProject ID: PRJNA560413; NCBI sequence read archive.


Table 1. Details of sample collection locations.

| Analysis | Taxon | Location | Latitude | Longitude | Population | Samples sequenced |
|---|---|---|---|---|---|---|
| Species-level analyses | Mytilus californianus | Scripps Institute of Oceanography, CA, USA | 32.86°N | 117.25°W | - | 3 |
| | Mytilus trossulus | Lighthouse Park, Vancouver, BC, Canada | 49.33°N | 123.26°W | - | 3 |
| | Mytilus edulis | Darling Marine Station, Maine, USA | 43.94°N | 69.56°W | - | 3 |
| | Mytilus galloprovincialis | Herceg Novi, Montenegro (Eastern Mediterranean) | 42.45°N | 18.53°E | - | 3* |
| Population-level analyses | Mytilus galloprovincialis | Herceg Novi, Montenegro (Eastern Mediterranean) | 42.45°N | 18.53°E | Mediterranean | 5 |
| | | Crique des Issambles, France (Western Mediterranean) | 43.35°N | 6.71°E | Mediterranean | 5 |
| | | Primel, France (Atlantic) | 48.71°N | 3.81°W | Atlantic | 5 |

* Samples also used for population-level analyses.

Table 2. Genomic functions under positive selection in M. galloprovincialis as evidenced by both branch-site tests and polymorphism-divergence analyses.

| | Genomic Functional Class | Gene Annotation | Locus | Branch-site tests (a): LRT statistic () | Branch-site tests (a): BEB (codon site) | Polymorphism-divergence tests (b): alpha | Polymorphism-divergence tests (b): DoS | Functional Description abc; Uniprot ID |
|---|---|---|---|---|---|---|---|---|
| | Oxidative Stress | Microsomal glutathione S-transferase 3 | MGST3 | 8.062 (999)* | 0.979 (42)* | 1.00 A; 1.00 M* | 0.317 A; 0.317 M* | Glutathione peroxidase activity (c); Q3T100 |
| Heat Stress | Oxidative Stress | Cysteine sulfinic acid decarboxylase | CSAD | 6.160 (26)* | 0.951 (166)* | 0.912 A*; 1.00 M* | 0.305 A*; 0.349 M* | Taurine biosynthesis, sulfur amino acid catabolism (c); Q64611 |
| | Cytoskeletal Reorganisation | Nuclear protein MDM1 | MDM1 | 28.07 (998)* | 0.973 (7)*; 0.972 (8)*; 0.990 (9)*; 0.976 (19)* | 0.723 A; 1.00 M* | 0.155 A; 0.237 M* | Microtubule binding (c); Q9D067 |
| | Stress Response | Tumour necrosis factor (PFAM protein domain) | TNF | 7.800 (999)* | 0.917 (7); 0.996 (39)* | 0.880 A*; 0.859 M* | 0.425 A*; 0.406 M* | Heat-stress response (d); PF00229.17 (f) |
| | Lipid Metabolism | Carnitine O-palmitoyltransferase 1 | CPT1A | 12.93 (94)* | 0.999 (703)* | 0.775 A; 0.858 M* | 0.166 A; 0.199 M* | Carnitine metabolism, fatty acid beta-oxidation (c), heat-stress response (d); P32198 |
| | Reproduction | Estradiol 17-beta dehydrogenase 2 | HSD17B2 | 5.030 (4)* | 0.911 (129); 0.956 (191)* | 0.577 A*; 0.522 M* | 0.211 A*; 0.182 M* | Estrogen biosynthetic process (c); P37059 |
| | NA | No Hit | NA | 13.20 (15)* | 0.917 (8); 0.946 (14); 0.922 (18); 0.951 (53)* | 1.000 A*; 0.853 M | 0.569 A*; 0.409 M | NA |

(a) Branch-site test statistics for loci under positive selection with the criteria of both a significant Likelihood Ratio Test (LRT) statistic (P≤0.05) and at least one codon with a Bayesian Empirical Bayes (BEB) posterior probability ≥95%.
(b) Polymorphism-divergence statistics for loci with significantly positive alpha and DoS values in Atlantic (A) or Mediterranean (M) populations relative to M. trossulus (P≤0.05). Significant p-values are marked with an asterisk (*).
(c) Functional description based on Gene Ontology Consortium classification.
(d) Functional description based on Mytilus ecophysiological studies (Lockwood et al. 2010; Connor and Gracey 2012).
(f) Functional description based on PFAM protein domain classification.
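The two branch-site criteria stated in footnote a can be illustrated with a minimal sketch. It assumes the LRT statistic is compared against a chi-square distribution with one degree of freedom, a common convention for branch-site tests; the null distribution actually used for the values in the table is not stated here, so that choice and the function names `lrt_p_value` and `passes_branch_site_criteria` are assumptions made for illustration only.

```python
import math


def lrt_p_value(lrt_statistic: float) -> float:
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom.

    Assumption: a chi-square (df = 1) null; for df = 1 the survival function
    equals erfc(sqrt(x / 2)).
    """
    if lrt_statistic <= 0:
        return 1.0
    return math.erfc(math.sqrt(lrt_statistic / 2.0))


def passes_branch_site_criteria(lrt_statistic, beb_posteriors,
                                p_cutoff=0.05, beb_cutoff=0.95):
    """True if the LRT is significant AND at least one codon reaches the BEB cutoff."""
    significant = lrt_p_value(lrt_statistic) <= p_cutoff
    has_site = any(p >= beb_cutoff for p in beb_posteriors)
    return significant and has_site


# Values taken from the tables: MGST3 (LRT 8.062, BEB 0.979) and
# HSP25 (LRT 7.601, BEB 0.905).
print(passes_branch_site_criteria(8.062, [0.979]))
print(passes_branch_site_criteria(7.601, [0.905]))
```

Under these assumptions a locus can have a significant LRT and still fail the combined criterion when no codon reaches the 0.95 posterior cutoff.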

Table 3. Known thermotolerance candidate loci with evidence of positive selection in either branch-site tests or polymorphism-divergence tests (a, b).

| Genomic Functional Class | Gene Annotation | Locus | Branch-site tests (a): LRT statistic () | Branch-site tests (a): BEB (codon site) | Polymorphism-divergence tests (b): alpha | Polymorphism-divergence tests (b): DoS | Functional Description abc; Uniprot ID |
|---|---|---|---|---|---|---|---|
| Oxidative Stress | Glutathione S-transferase omega-1 | GSTO1 | 0 (1) | - | 0.844 A*; 0.786 M | 0.433 A*; 0.367 M | Glutathione oxidoreductase activity (c), heat-stress response (d); P78417 |
| Oxidative Stress | Dipeptidyl peptidase (cathepsin C) | CTSC | 0 (1) | - | 0.739 A*; 0.719 M* | 0.229 A*; 0.223 M* | Sulfur amino acid catabolism (c), heat-stress response (d); P53634 |
| Energy Metabolism | Cytosolic malate dehydrogenase | cMDH | 0 (1) | - | 1.00 A*; 1.00 M* | 0.176 A*; 0.200 M* | Oxidoreductase activity, gluconeogenesis (c); heat-stress response (d); Q3T145 |
| Cytoskeletal Reorganisation | Small heat shock protein 25 | HSP25 | 7.601 (999)* | 0.905 (185) | 1.00 A; 0.643 M | 0.250 A; 0.161 M | Heat-stress response (c, d); Q17849 |
| Cytoskeletal Reorganisation | Shootin-1 | SHTN1 | 4.821 (72)* | None Identified | 0.607 A; 0.520 M | 0.155 A; 0.127 M | Actin filament binding (c); heat-stress response (d); A0MZ66 |

(a) Branch-site test statistics for loci under positive selection with the criteria of both a significant Likelihood Ratio Test (LRT) statistic (P≤0.05) and at least one codon with a Bayesian Empirical Bayes (BEB) posterior probability ≥95%.
(b) Polymorphism-divergence statistics for loci with significantly positive alpha and DoS values in Atlantic (A) or Mediterranean (M) populations relative to M. trossulus (P≤0.05). Significant p-values are marked with an asterisk (*).
(c) Functional description based on Gene Ontology Consortium classification.
(d) Functional description based on Mytilus ecophysiological studies (Lockwood et al. 2010; Connor and Gracey 2012).
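For readers unfamiliar with the alpha and DoS columns, the sketch below shows the simplest count-based forms of the two statistics: the Direction of Selection statistic of Stoletzki and Eyre-Walker (2011) and the McDonald-Kreitman based alpha of Smith and Eyre-Walker (2002). The tabulated values may have been obtained with a different estimator, so this is an illustration under that assumption. The counts used in the example are hypothetical and are not data from this study.

```python
def direction_of_selection(dn: int, ds: int, pn: int, ps: int) -> float:
    """DoS = Dn / (Dn + Ds) - Pn / (Pn + Ps).

    dn, ds: nonsynonymous and synonymous fixed differences (divergence).
    pn, ps: nonsynonymous and synonymous polymorphisms.
    Positive values are consistent with adaptive evolution.
    """
    if dn + ds == 0 or pn + ps == 0:
        raise ValueError("DoS is undefined without divergence or polymorphism")
    return dn / (dn + ds) - pn / (pn + ps)


def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """alpha = 1 - (Ds * Pn) / (Dn * Ps), the estimated share of adaptive substitutions."""
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, for illustration only.
dn, ds, pn, ps = 8, 10, 2, 12
print(round(direction_of_selection(dn, ds, pn, ps), 3))
print(round(mk_alpha(dn, ds, pn, ps), 3))
```

Under this simple estimator, alpha equals 1 whenever no nonsynonymous polymorphisms are observed (Pn = 0), which would be one way for a locus to show alpha of 1.00 alongside a moderate DoS value.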

Table 4. Enrichment of gene ontology terms among positively selected gene sets (P≤0.001), conducted against the complete M. galloprovincialis protein-coding assembly as the reference gene set.

| Analysis of Selected Gene List | Gene Count | GO ID | Biological Process | P-value | Number of Significant Annotated Genes |
|---|---|---|---|---|---|
| 1. Branch-site Tests | | | | | |
| 1a. LRT p≤0.05 | 99 | GO:0006790 | sulfur compound metabolic process | 0.00076 | 1* |
| | | GO:1901565 | organonitrogen compound catabolic process | 0.00076 | 1* |
| 1b. LRT p≤0.05; BEB p≥0.95 | 38 | GO:0006790 | sulfur compound metabolic process | 0.00051 | 1* |
| | | GO:1901565 | organonitrogen compound catabolic process | 0.00051 | 1* |
| | | GO:0016054 | organic acid catabolic process | 0.00081 | 1* |
| 1c. LRT p≤0.05; BEB p≥0.95 ‘highest-confidence alignment’ | 19 | GO:0006790 | sulfur compound metabolic process | 0.00025 | 1* |
| | | GO:1901565 | organonitrogen compound catabolic process | 0.00025 | 1* |
| | | GO:0016054 | organic acid catabolic process | 0.00041 | 1* |
| 2. Polymorphism-Divergence Tests | 175 | GO:0000375 | RNA splicing, via transesterification reaction | 1.6e-06 | 4 |
| | | GO:0006790 | sulfur compound metabolic process | 1.8e-06 | 2* |
| | | GO:0022618 | ribonucleoprotein complex assembly | 9.3e-06 | 3 |
| | | GO:0001889 | liver development | 3.4e-05 | 2 |

* Cysteine sulfinic acid decarboxylase (CSAD) is annotated with significantly enriched GO terms


[Figure: panels a), b) and c). Recoverable axis labels: Species (MG, ME, MT, MC) and Mean Sea Surface Temperature (°C).]

[Figure: a) Venn diagram of gene sets drawn from the Mytilus galloprovincialis protein-coding reference assembly (35,815) and the high-quality one-to-one orthologous groups (2,719): genes with positive alpha/DoS values (n=175); positively selected genes in LRTs (n=99); positively selected genes in LRTs with at least one codon under positive selection (BEB; n=38); thermotolerance candidate gene groups A, B, C (n=273). Numbered genes: 1. Estradiol 17-beta dehydrogenase 2 (HSD17B2)*; 2. Cysteine sulfinic acid decarboxylase (CSAD)*; 3. Tumor Necrosis Factor (PFAM domain) (TNF)*; 4. Microsomal glutathione S-transferase 3 (MGST3)*; 5. Carnitine O-palmitoyltransferase 1 (CPT1A); 6. No Hit; 7. Nuclear protein MDM1 (MDM1). b) DoS statistic plotted against LRT statistic, with genes 1 to 7 labelled.]

| Gene Name | GSTO1 | MGST3 | CSAD | HSP25 |
|---|---|---|---|---|
| Codon | 11 | 42 | 166 | 185 |
| M. galloprovincialis | Ala (GCA) | Trp (TGG) | Val (GTG) | Leu (CTG) |
| M. edulis | Ser (TCA) | Glu (GAA) | Thr (ACG) | His (CAC) |
| M. trossulus | Thr (ACA) | Glu (GAA) | Thr (ACG) | His (CAT) |
| M. californianus | Glu (GAA) | Glu (GAA) | Thr (ACG) | His (CAT) |

Amino acid class key: Nonpolar Hydrophobic; Polar Uncharged; Polar Acidic; Polar Basic.
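The codon and amino acid assignments at the four sites shown for GSTO1, MGST3, CSAD and HSP25 can be cross-checked with a short sketch. It assumes the standard genetic code and uses the species abbreviations MG, ME, MT and MC for M. galloprovincialis, M. edulis, M. trossulus and M. californianus; the dictionary layout and the helper names `translate` and `unique_to_focal` are hypothetical and introduced only for this illustration.

```python
# Subset of the standard genetic code covering only the codons at these sites.
STANDARD_CODE = {
    "GCA": "Ala", "TCA": "Ser", "ACA": "Thr", "ACG": "Thr",
    "GAA": "Glu", "TGG": "Trp", "GTG": "Val", "CTG": "Leu",
    "CAC": "His", "CAT": "His",
}

# Codons per gene and species, transcribed from the figure.
SITES = {
    "GSTO1": {"codon": 11, "MG": "GCA", "ME": "TCA", "MT": "ACA", "MC": "GAA"},
    "MGST3": {"codon": 42, "MG": "TGG", "ME": "GAA", "MT": "GAA", "MC": "GAA"},
    "CSAD": {"codon": 166, "MG": "GTG", "ME": "ACG", "MT": "ACG", "MC": "ACG"},
    "HSP25": {"codon": 185, "MG": "CTG", "ME": "CAC", "MT": "CAT", "MC": "CAT"},
}


def translate(codon: str) -> str:
    """Translate a single DNA codon using the subset table above."""
    return STANDARD_CODE[codon.upper()]


def unique_to_focal(site: dict, focal: str = "MG") -> bool:
    """True if the focal species' amino acid is absent from every other species."""
    focal_aa = translate(site[focal])
    others = [translate(c) for sp, c in site.items() if sp not in (focal, "codon")]
    return focal_aa not in others


for gene, site in SITES.items():
    residues = {sp: translate(c) for sp, c in site.items() if sp != "codon"}
    print(gene, site["codon"], residues, unique_to_focal(site))
```

The check only confirms that each listed codon encodes the listed residue and that the M. galloprovincialis residue differs from the other three species at each site; it says nothing about whether a given substitution is adaptive.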