Received: 28 August 2020 | Revised: 4 December 2020 | Accepted: 18 December 2020 DOI: 10.1111/mec.15787

ORIGINAL ARTICLE

Genome-wide single nucleotide polymorphism markers reveal population structure and dispersal direction of an expanding nuisance algal bloom species

Karin Rengefors1 | Raphael Gollnisch1 | Ingrid Sassenhagen1,2 | Karolina Härnström Aloisi1,3 | Marie Svensson1 | Karen Lebret1 | Dora Čertnerová4 | William A. Cresko5 | Susan Bassham5 | Dag Ahrén6

1Department of Biology, Lund University, Lund, Sweden Abstract 2Department of and Genetics, Species invasion and range expansion are currently under scrutiny due to increasing Uppsala University, Uppsala, Sweden anthropogenic impact on the natural environment. This is also true for harmful 3Nordic Genetic Resource Centre (NordGen), Alnarp, Sweden algal blooms, which have been reported to have increased in frequency. However, 4Department of Botany, Faculty of this research is challenging due to the ephemeral nature, small size and mostly low Science, Charles University, Prague, Czech concentrations of microalgae in the environment. One such species is the nuisance Republic 5Institute of Ecology and Evolution, microalga Gonyostomum semen (Raphidophyceae), which has increased in occurrence University of Oregon, Eugene, OR, USA in northern Europe in recent decades. The question of whether the species has 6 Department of Biology, National expanded its habitat range or if it was already present in the lakes but was too rare Bioinformatics Infrastructure Sweden (NBIS), SciLifeLab, Lund, Sweden to be detected remains unanswered. The aim of the present study was to determine the genetic structure and dispersal pathways of G. semen using RAD (restriction-site- Correspondence Karin Rengefors, Department of Biology, associated DNA) tag sequencing. For G. semen, which has a huge genome (32 Gbp), we Lund University, Ecology Building, Lund, faced particular challenges, but were nevertheless able to recover over 1000 single Sweden. Email: [email protected] nucleotide polymorphisms at high coverage. Our data revealed a distinct population genetic structure, demonstrating a divide of western and eastern populations that Funding information Vetenskapsrådet, Grant/Award Number: probably represent different lineages. Despite significant genetic differentiation 621-2012-3726; Svenska Forskningsrådet among lakes, we found only limited isolation-by-distance. While we had expected a Formas, Grant/Award Number: 215-2010- 751; H2020 Marie Skłodowska-Curie pattern of recent expansion northwards, the data demonstrated from the Actions, Grant/Award Number: SINGEK northeast/east towards the southwest/west. This genetic signature suggests that the H2020 MSCA-ITN-2015-ETN 675752 observed gene flow may be due to dispersal by autumn migratory birds, which act as dispersal vectors of resistant resting propagules that form at the end of the G. semen blooms.

KEYWORDS algal blooms, Gonyostomum semen, , population structure, RAD-seq, SNPs

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. © 2021 The Authors. Molecular Ecology published by John Wiley & Sons Ltd.

.wileyonlinelibrary.com/journal/mec  Molecular Ecology. 2021;30:912–925 | 912 RENGEFORS et al. | 913

1 | INTRODUCTION levels are potential drivers that are correlated with the increased fre- quency of blooms (Hagman et al., 2019; Rengefors et al., 2012; Trigal Habitat range expansion and species invasions are important pro- et al., 2013). Moreover, Lebret et al. (2018) recently found that high cesses in determining species biogeographies and pat- abundances of G. semen occurred mainly in lakes with high iron, an el- terns. The ever-increasing anthropogenic impact on the natural ement that has increased in recent decades in freshwater environment, including climate change, has increased scientific inter- (Björnerås et al., 2017). Although monitoring data convincingly show est in these processes. While most focus has been on multicellular increased bloom occurrence in new lakes, they do not reveal the pat- animals and plants, especially in terms of conservation issues, micro- tern of invasion, when it started, or if G. semen was always present in bial organisms undergo equivalent changes. In recent decades there a specific lake but without being detected. has been an increase in the report of harmful algal blooms (HABs) An alternative approach to study invasive species is to utilize in freshwater and marine environments around the world (Anderson genetic markers to follow the tracks of a range expansion, as the et al., 2002; Paerl & Otten, 2013). In the ocean, these HABs are expansion process leaves an imprint on contemporary patterns mainly attributed to range expansion due to increased seawater of genetic diversity (Excoffier et al., 2009; Waters et al., 2013). temperatures and eutrophication (Gobler et al., 2017), as well as hu- Phylogeographical studies of G. semen indicate that a single mito- man-mediated dispersal through ballast water and aquaculture (Bolch chondrial haplotype has expanded in northern Europe (Lebret et al., & de Salas, 2007; Lilly et al., 2002). In freshwater systems, increased 2015), possibly following deglaciation, similar to many other species nutrient levels together with climatic changes are suggested to have (Waters et al., 2013). Population genetic markers (amplified fragment increased cyanobacterial HABs (Michalak et al., 2013; Paerl & Otten, length polymorphism [AFLP]) were utilized to investigate the more 2013). Nevertheless, studies on microalgal invasions and expansions recent expansion in Finland and Scandinavia, but the AFLP-generated are still relatively few, and are hampered by methodological issues data could not resolve any population structure in this system (Lebret associated with the detection and identification of ephemeral and et al., 2013). In addition, no isolation-by-distance was detected, ge- nonpathogenic microorganisms (Litchman, 2010). Thus, it has been netic diversity was low, as was genetic differentiation among lakes. difficult to obtain conclusive evidence showing that the microalga in Together these data indicated a very recent expansion, but did not question was not present in a specific habitat previously, or when provide additional information about the colonization pattern. Due it started becoming more abundant. Typically, microalgal species are to the inherent traits of the AFLP method (i.e., dominant marker, no noted only once they form nuisance blooms, when cell densities in nucleotide information), it was not possible to calculate direction of monitoring samples are high enough to be detected and identified. migration, or to obtain a finer spatial structure analysis. Despite the methodological impediments and often lacking his- Whole genome sequencing or single nucleotide polymorphism torical data, there are a few documented invasive microalgal species (SNP) data would probably give the resolution necessary to trace (Padisák et al., 2016). One of these is the raphidophyte Gonyostomum the expansion of G. semen. However, like many other HAB species, semen, which forms nuisance algal blooms in freshwater environments including most of the dinoflagellates and raphidophytes, G. semen across the world (Karosiene et al., 2014; Rengefors et al., 2012). This has been estimated to have a very large genome (in the Gbp per species can dominate the phytoplankton community almost completely, cell range) (Sassenhagen et al., 2015). The size of these genomes has up to 98% of the biomass, and last at high density for several months hitherto restricted population genetic analyses of HAB species to (Lebret et al., 2012). These blooms reduce the recreational value of lakes microsatellite markers and AFLP analyses. Although these methods as the cells expel mucilage threads upon physical contact (Cronberg have provided significant revelations about phytoplankton popula- et al., 1988), which causes the water to become slimy and may result in tion genetic structure (reviewed in Rengefors et al., 2017), they are skin irritation (Sörensen, 1954). G. semen cells are flagellated and fragile, also of limited use for fine-scale patterns and specific population but form resistant resting cysts following sexual reproduction (Figueroa metrics. Due to the large genome size, whole-genome sequencing & Rengefors, 2006). The resting cysts remain dormant in the lake sed- and genome assembly of many HAB species has proved very diffi- iments until germination in the spring (Figueroa & Rengefors, 2006; cult, and is out of the question for the hundreds of individual sam- Rengefors et al., 2012), and probably both play a role in survival during ples needed for population genetic analysis. To identify thousands adverse conditions and serve as dispersal propagules. of SNPs without the existence of a reference genome, RAD-seq (i.e., While originally found only in very humic small bog lakes restriction-site-associated DNA sequencing) (Davey et al., 2011; (Cronberg et al., 1988), G. semen has expanded its habitat range in Etter et al., 2011), provided a promising approach for G. semen. Scandinavia and Baltic countries in northern Europe during the past The main objective of the present study was to explore the ge- 50 years according to Finnish, Swedish and Norwegian national mon- netic signature and pattern of the expanding G. semen population in itoring programmes (Hagman et al., 2015; Karosiene et al., 2014; Fenno-Scandinavia using SNP markers obtained from RAD-seq of Lepistö et al., 1994; Rengefors et al., 2012). The data show that the genome. This objective included determining the genome size blooms occur in more lakes of different characteristics, but no geo- of G. semen and optimizing the RAD protocol for microeukaryotes graphical direction of the expansion has been discerned. The under- with large genomes. We anticipated detecting fine-scale population lying causes of the increase of blooms are unclear, but temperature structure within the Fenno-Scandinavian metapopulation, which (longer growth season) and increased dissolved organic carbon (DOC) was not revealed with AFLP markers. A genetic footprint of the 914 | RENGEFORS et al. post-glacial recolonization of the region was also expected, as well in 2012 (Sassenhagen et al., 2015). Strains from lakes Helgasjön and as lake populations with low genetic diversity due to recent inva- Stråken were isolated in all three years (2010, 2011, and 2012) (Table sions. Finally, we hypothesized that we would observe a directional S1). Approximately 50–60 cells per lake were individually isolated to gene flow (dispersal) that would reveal the recent colonization of establish monoclonal strains. Cells were washed and transferred into new habitats, possibly displaying a northward expansion in response wells of 96-well plates containing 50% sterile-filtered lake water and to increased temperature. 50% MWC medium as described by Lebret et al. (2013). When suf- ficient biomass had been grown (~150,000 cells), cultures were har- vested by centrifugation, the supernatant was removed and pellets 2 | MATERIALS AND METHODS were stored at −80°C. DNA extractions were made by CTAB extrac- tion according to Lebret et al. (2013) and Sassenhagen et al. (2015). 2.1 | Sampling and cultures The final number of surviving strains from each lake ranged from 13 to 19, and the total added up to 153 strains. The Gonyostomum lake populations were sampled by establishing monoclonal algal cultures (strains) from individually isolated cells. Cells were sampled from nine lakes in Sweden, Norway and Finland, across 2.2 | DNA content estimation three consecutive years (Figure 1). All except two lakes (Osbysjön and Dammen) were sampled in the summer of 2010 as described by Lebret To accurately estimate the nuclear genome size of Gonyostomum et al. (2013). Osbysjön and Dammen were sampled both in 2011 and semen (strain BO-182), we employed propidium iodide flow

FIGURE 1 Map showing location of lakes sampled. Norway: LU = Lundebyvatn, IS = Isesjøen, GJ = Gjølsjøen. Sweden: OS = Osbysjön, DA = Dammen, HE = Helgasjön, ST = Stråken, MJ = Mjöträsket. Finland: KY = Kylänalanen [Colour figure can be viewed at wileyonlinelibrary.com] RENGEFORS et al. | 915 cytometry (FCM). Similar to other raphidophytes, G. semen is con- and decreased the volume of NEB2 buffer (1 µl) used in the P1 sidered to be diploid (Figueroa & Rengefors, 2006; Yamaguchi & adapter ligation. P1 adapters contained unique 7-bp barcodes to Imai, 1994). Two millilitres of a well-grown culture were centrifuged allow multiplexing strains in downstream library preparation, and in a MiniSpin centrifuge (5 min, 2040 g; Eppendorf) and the super- 3 µl of barcoded P1 adapter (100 µm) was used in each ligation re- fluous medium was removed by pipetting. Subsequently, 500 µl action. Multiplexed samples were sheared to around 450 bp using of ice-cold LB01 lysis buffer (Dolezel et al., 1989) was added to a Bioruptor sonicator (Diagenode). Moreover, AMPure XP beads the pellet of cells. To isolate the sample nuclei, several glass beads (Beckman Coulter) were used to remove redundant P1 adapters of 1.5 mm diameter were added and the cells were disrupted for (left-side size selection excluding fragments below 100 bp, factor 3 min at 25 Hz using a Retsch MM200 mixer mill (Retsch). To re- 1.2:1 beads/sample). To increase the yield following the end repair lease nuclei of a plant standard (Vicia faba cv. Inovec, 2C = 26.9 pg; and 3′-dA overhang addition steps, elution from Qiagen MinElute Dolezel et al., 1992), a ~ 20-mg piece of fresh leaf tissue was PCR Purification Kit (Qiagen) spin columns was done in three elution chopped with a razor blade in another 500 µl of ice-cold LB01 lysis steps instead of one. After P2 adapter ligation, AMPure XP beads buffer. The nuclei suspension was thoroughly mixed and filtered were used to remove redundant P2 adapters (see above). The final through a 42-µm nylon mesh into a special cuvette for direct use full amplification was performed with 67 ng of DNA template in a with the flow cytometer. Following a 20-min incubation at room 100-µl reaction volume and 18 PCR (polymerase chain reaction) cy- temperature, 40 µg/ml of propidium iodide, 40 µg/ml of RNase cles. The 300–700-bp size fraction of the PCR product was excised IIA and 1.8 μl/ml β-mercaptoethanol were added to the sample. and purified from an agarose gel. Twenty uniquely barcoded strains The stained sample was immediately analysed using a Partec were pooled per lane for sequencing. Samples were subsequently CyFlow SL cytometer (Partec) equipped with a green solid-state sequenced with Illumina technology at the SNP&SEQ Technology laser (Cobolt Samba, 532 nm, 100 mW). To minimize the effect Platform of the SciLifeLab facility in Uppsala, Sweden. Sequencing of random instrumental shift, three measurements were analysed was performed using Illumina HiSeq2000 v4-chemistry (125 bp), on separate days. Each sample measurement was done with up to the first batch with single-end and the remainder with paired-end 5000 particles and the resulting FCM histograms analysed using sequences. The R2 reads were thus not used in the downstream

FloMax ver. 2.4d (Partec). The sample peak was identified as G1 analyses. These sequences have been submitted to the NCBI SRA (vegetative cells) and the absolute nuclear DNA amount (C-value) database under BioProject ID PRJNA659541. was calculated as the sample G1 peak mean fluorescence/standard

G1 peak mean fluorescence × standard 2C DNA content (according to Doležel, 2005). 2.4 | RAD/SNP identification

All data was demultiplexed, quality-checked and processed using 2.3 | RAD library preparations the stacks software (Catchen et al., 2011, 2013) version 1.35. The analysis pipeline was run manually rather than using the de novo

A pilot test was first performed with four strains (KY20, MJ21, MJ12, denovo _ map.pl pipeline. Following ustacks, which builds loci, the GJ02) for which library preparation was carried out by Floragenex. number of retained sequences, RAD tags and SNPs per sample were

Genomic DNA was digested with the restriction enzyme SbfI and collected. cstacks parameters were tested with a pilot data set of 100-bp single-end reads were sequenced on an Illumina HiSeq2000. four strains with 4,000,000 reads each. The parameters were cho- From this data set, the minimum number of reads required to ob- sen with the criteria to maintain a mean coverage of at least 30, tain enough coverage per locus (>50×) and the number of samples and maximize number of utilized reads and polymorphic SNPs, by that can be multiplexed was determined. Specifically, this was done varying mismatch (M) and depth of stack (m) parameters. The final by first determining the average number of RAD-sites per sample program ustacks of the stacks pipeline was run with the following based on the pilot data and using the step process _ radtags in the parameters -m 5 -M 3 -N 1, where N denotes the maximum distance stacks software pipeline (see next section). This number was mul- to align secondary reads to primary stacks. For more information tiplied by the aimed coverage (i.e., 50) to determine the minimum regarding parameter optimization in stacks see Paris et al. (2017) and number of reads for each sample. Next, the expected total number Rochette and Catchen (2017). of Illumina reads per lane was divided by the minimum number of For population genetic structure, migration rates and metrics, reads per sample. The resulting number represented the number of data were filtered to select SNP loci found in all (nine) populations samples that could be multiplexed per lane. In the current case, this and in at least 80% of the individuals (within the stacks populations was ~20 samples. program). Only the first SNP per RAD-tag sequence was utilized to In-house, we subsequently followed a RAD library preparation avoid using multiple linked neighbouring SNPs. All data files were protocol modified from Amores et al. (2011) and Etter et al. (2011). formatted for downstream analyses directly in stacks populations. To For each sample, 1 µg of genomic DNA was digested with 0.5 µl explore effects of different filters (number of populations included SbfI-HF (NEB). We used 0.5 µl of 2000 U/µl T4 ligase (NEB) in the (-p), per cent of individuals containing the locus (-r), a range of p and P1 and P2 adapter (for sequences see Appendix S1) ligation steps r was analysed in terms of number of retained SNPs. To check that 916 | RENGEFORS et al. the RAD-tag sequences did not belong to contamination, we tested a thinning of 100 (i.e., only every 100th iteration was saved), utili - for potential human or bacterial contaminant sequences using the zation of the implemented spatial model, and correlated and inde- taxonomic sequence classifier kraken version 1.0 (Wood & Salzberg, pendent allele frequencies. 2014). The identified contaminant sequences were then submitted to a similarity search against the NCBI nucleotide collection using blast version 2.11.0+ (Altschul et al., 1998). 2.7 | Direction of gene flow

The direction of gene flow was determined using two different 2.5 | Population genetic metrics methods for the populations in Finland, Norway and Sweden. First, the online software divmigrate (Sundqvist et al., 2018) was used

Population differentiation (FST) and nucleotide diversity (π) were based on the data set from stacks populations including all SNPs calculated directly in stacks based on all SNPs utilized by the stacks (1092 variant sites). The analyses were run using GST as a measure of populations program. Nucleotide diversity was based on all called genetic variation, 1000 bootstraps, and alpha set to 0.05. Different SNPs including low-frequency variants. Data on the number of RAD- filter thresholds for relative migration direction were applied (0– sites, variant alleles, polymorphic sites, and observed and expected 0.45). In addition, a subset of four lake populations, each from a heterozygosity were also obtained. genodive version 2.0b27 different geographical area (northern Sweden: Mjöträsket, Finland;

(Meirmans & Van Tienderen, 2004) was used to calculate PhiST (from Kylänalanen, Norway: Isesjøen, southern Sweden: Helgasjön), was an analysis of molecular variance [AMOVA]), heterozygosity, Hardy– run in order to compare with the models explored in migrate (see Weinberg equilibrium (HW) and isolation-by-distance (IBD, Mantel below). test). The coalescence-based program migrate version 4.4.3 (Beerli & Palczewski, 2010) allowing model comparison using a Bayesian ap- proach (Palczewski & Beerli, 2014) was also utilized to resolve the 2.6 | Population structure analyses population structure and determine the direction of gene flow. For this, three replicate data sets of 100 variant RAD loci were selected Population genetic structure was inferred using a Bayesian anal- randomly and analysed to compare different population models. ysis in the software structure version 2.3 (Falush et al., 2003; A subset of the lakes was used to represent each of the four re- Pritchard et al., 2000). In total, 135 individuals from all nine pop - gions (Norway, northern Sweden, southern Sweden, Finland). The ulations in northern Europe and 1092 variant loci were utilized. evaluated models were a panmictic model (one large population The number of putative populations (K) was set from 1 to 12, with comprising all four sites), a four-population full migration model of 10,000 burn-ins and 50,000 MCMC (Markov chain Monte Carlo) unrestricted migration among all populations, and four-population repetitions, and 10 iterations. The model expected to best explain models with four different general geographical patterns of direc- the data set allowed for the parameters “admixture” and “corre- tional migration. Furthermore, four refined directional migration lated alleles.” The model was also tested with and without a lo- models with different geographical orientations were compared cation prior (i.e., the lake population sampled). Estimates of the (Table S2). Assuming an underlying southwards migration from best K to describe the data was determined using the software Mjöträsket in northern Sweden towards all three other sites and structure harvester with the Evanno method (Earl & vonHoldt, directional migration southwest from Kylänalanen in Finland to- 2012) as well as manual visual inspection. Structure plots were wards Helgasjön in southern Sweden, different directional migra- visualized using the on-line tool clumpak (Kopelman et al., 2015). tion scenarios between Kylänalanen and Isesjøen in Norway, and Population structure was also determined using a k-clustering and between Isesjøen and Helgasjön were tested in this refined model principal component analysis (PCA) with the software genodive, as comparison. this method makes no assumptions about Hardy–Weinberg equi - A modified version of the Python utility stacks2mig (distrib- librium. Analyses were performed after adjusting for missing data uted under the MIT open source licence, © 2015 Peter Beerli) was within genodive. Clustering was performed from one to 12 clus- used to generate migrate input files. Appropriate prior boundaries ters using both allele frequencies and an AMOVA approach, with were determined from preliminary exploratory analyses. Analyses

50,000 annealing steps and 20 random starts. Based on structure combining three replicate runs were conducted using a static heat- results, comparison between groups and AMOVA of groups was ing scheme with four chains (1.00, 1.50, 3.00, 1,000,000.00) with performed with genodive (1092 loci). Population structure was fur- 12,000,000 steps sampled and 600,000 steps recorded, after dis- ther explored including a spatial model utilizing georeferenced in - carding 400,000 steps as burn-in per replicate run. The degree of dividuals in the landscape genetic r package geneland version 4.0.3 support for one model over another was hereby quantified using (Guillot et al., 2005). For the simulations, no uncertainty regarding Bayes factors (Jeffreys, 1961). Bayes factors and model probabili- the spatial coordinates and up to nine possible populations were ties were calculated using the Python utility bf.py (Peter Beerli 2010, assumed. Parameter settings included 100,000 MCMC iterations, updated 2015). RENGEFORS et al. | 917

3 | RESULTS Analysis of the SNP data from the G. semen lake populations showed that all lake populations were genetically differentiated.

3.1 | Genome size Calculation of genetic differentiation (FST) was based on 135 indi-

viduals. Overall, FST values were significant and ranged from mod-

We estimated the nuclear genome size of Gonyostomum semen (strain erate to high differentiation. Lowest FST values were found among BO-182) using propidium iodide FCM. Three independent measure- the four lakes in southern Sweden (0.016–0.025). Highest genetic ments were conducted, each with relatively low coefficients of variation differentiation was found between the westernmost lake Isesjøen for both sample and standard nuclei (Table 1; Figure S1). Mean absolute and the northernmost lake Mjöträsket (0.108) and the eastern- DNA content (2C) of vegetative cells was 32.52 pg (≈31.81 Gbp). most Kylänalanen (0.105). Isesjøen, in turn, was only moderately differentiated from the other two Norwegian lakes, Gjølsjøen and Lundebyvatn (Table 3).

3.2 | Sequencing and SNP metrics When genetic distance (PhiST) was plotted against geographi- cal distance, there is only a weak correlation (r2 = 0.08) (Figure 2). The mean retained number of reads was 8,187,810 (i.e., 82.7% utilized Furthermore, the Mantel test for IBD was not significant (p = .155).

RADtags), resulting in a mean coverage of 69.6× across all samples. Per The data illustrate that the range of pairwise FST is very wide, al- sample, the total number of RAD sites had a mean of 117,626 and the though there is a tendency of increasing FST with geographical dis- total number of SNPs had a mean of 13,864,867. Filtration steps to tance if only minimum and maximum FST are considered. include SNP loci found in at least 80% of the individuals in each of the nine lake populations resulted in a total of 240,128 variant and fixed loci, of which 1092 were variant RAD loci (Table 2). When only eight 3.4 | Population genetic structure populations were required (instead of nine), the number of polymor- phic loci increased to 4545 . Reducing loci to be present in 60% of the Based on the Evanno method, the best clustering using structure individuals increased the polymorphic loci to 15,292. Further analy- for the admixture model with correlated frequencies was for three ses demonstrated that one lake (Osbysjön) had several samples of low population clusters (K = 3), but with a second delta K peak at K = 5. quality and reduced the total number of loci using the strict filtering. When a location prior was added, K = 2 best explained the struc-

Sequence classification of the resulting loci using kraken identi- ture, again with a smaller peak at K = 5 (Figure 3; Figure S2). When fied four candidate contaminant loci of potential bacterial or human K = 2, the two populations consisted of a western cluster with the origin. Similarity searches (blastn) of these loci resulted in weak three Norwegian lakes, and a second cluster with all other lakes. matches caused by low-complexity regions and are hence not likely With K = 3, a third cluster is formed consisting of the northern to originate from contamination. Swedish lake Mjöträsket and the Finnish lake Kylänalanen. For K = 5, the populations consisted of the four southern Swedish lakes as one population and the two eastern Norwegian lakes as a second 3.3 | Genetic differentiation and population, while the western Norwegian lake Lundebyvatn, the population metrics northern Swedish lake Mjöträsket and the Finnish lake Kylänalanen each formed separate populations (Figure 3). Using correlated al-

The percentage of polymorphic loci was low and varied from 0.19% lele frequencies and the location prior, structure also found that (Kylänalanen) to 0.25% (Isesjøen) (Table 2). The nucleotide diversity K = 2 best explained the data. The AMOVA of groups of popula- for all loci (variant and fixed) was also low, ranging from 0.0006 to tions, based on K = 2, resulted in an F-value of 0.113 (p = .025). 0.0008. For the variant sites, the observed heterozygosity ranged K-means cluster analysis, based on either allele frequencies or from 0.14 (Dammen) to 0.19 (Stråken), both located in southern AMOVA, yielded best clustering with K = 2 with Calinski & Harabasz's Sweden. In Stråken the observed heterozygosity was higher than the pseudo-F, and K = 3 with Bayesian Information Criterion. For K = 2, expected heterozygosity (0.19 vs. 0.17), and this was also the site Isesjøen and Gjølsjøen in Norway formed one cluster, and all other with fewest private alleles. The overall population Hardy–Weinberg lakes a second cluster. For K = 3, Isesjøen and Gjølsjøen formed a equilibrium was slightly negative (−0.094). western cluster and the southern Swedish lakes a southern cluster,

TABLE 1 Summary statistics of the flow cytometry data of Gonyostomum semen (strain BO-182)

Sample/standard ratio of Absolute genome size of Absolute genome size of Sample CV Standard Replicate relative fluorescence sample (Gbp) sample (pg) (%) CV (%)

1 1.195 31.43 32.13 2.82 2.98 2 1.221 32.11 32.83 3.26 3.80 3 1.212 31.88 32.60 2.99 3.47

Note: CV = coefficient of variation of nuclear fluorescence intensity. 918 | RENGEFORS et al.

TABLE 2 Basic population metrics for nine lake populations based on a data set of 1092 variant SNP loci out of 240,128 sites in total (variant + fixed), which are present in all nine populations and in 80% of the individuals within each lake population; data are from stacks populations

Variant positions (1092 RAD loci)

Number of Lake individuals Private alleles Observed heterozygosity Expected heterozygosity

Dammen (DM) 15 62 0.14 0.15 Stråken (ST) 15 25 0.19 0.17 Helgasjön (HE) 19 51 0.15 0.15 Osbysjön (OS) 17 40 0.14 0.16 Mjöträsket (MJ) 17 43 0.14 0.14 Lundebyvatn (LU) 13 52 0.13 0.16 Isesjøen (IS) 13 54 0.17 0.17 Gjøelsjøn (GJ) 13 31 0.15 0.16 Kylänalanen (KY) 13 26 0.14 0.14

All positions—variant and fixed (240,128 RAD loci)

Nucleotide Lake Polymorphic loci (%) diversity, π Observed heterozygosity Expected heterozygosity Dammen (DM) 0.23 0.0007 0.0006 0.0007 Stråken (ST) 0.24 0.0008 0.0008 0.0008 Helgasjön (HE) 0.25 0.0007 0.0007 0.0007 Osbysjön (OS) 0.24 0.0008 0.0006 0.0007 Mjöträsket (MJ) 0.19 0.0006 0.0007 0.0006 Lundebyvatn (LU) 0.24 0.0007 0.0006 0.0007 Isesjøen (IS) 0.25 0.0008 0.0008 0.0008 Gjøelsjøn (GJ) 0.22 0.0008 0.0007 0.0007 Kylänalanen (KY) 0.19 0.0006 0.0006 0.0006

while Lundebyvatn (Norway) formed a cluster with the northern were away from lake Kylänalanen in Finland (the furthest east) and Swedish and Finnish lake. Mjöträsket in the north. Consequently, the significant migration di- PCA was utilized to determine the major axis of genetic varia- rection was overall towards the lakes in the west/south (southern tion of the data set. By plotting PCA1 against PCA2, a pattern was Swedish lakes, Norwegian lakes). Above the 0.3 filter threshold, the displayed showing tightly clustered lakes from southern Sweden gene flow routes were only towards the lakes Stråken and Osbysjön (Figure 4) in the upper left-hand quadrant. This cluster diverged from in Sweden, and towards Lundebyvatn in Norway (Figure 5b). From the northern Swedish and Finnish lakes on the PCA2-axis and from lake Mjöträsket in the north, the direction of migration was sig- the Norwegian lakes on the PCA1-axis. Isesjøen was most distant nificant only to the south-west (i.e., to lakes Stråken, Dammen, from all the other lakes (Figure 4). Osbysjön and Lundebyvatn at filter threshold 0.3). In addition,

Spatial analyses using geneland, which takes landscape into ac- there was a significant direction of migration from Lundebyvatn to count, also resulted in two clear spatial populations when using the Isesjøen. At the 0.3 filter threshold there was no significant direction independent allele frequencies model. Here, one population con- of migration to or from lakes Helgasjön or Gjølsjøen. However, at a tained the three Norwegian sites, and all other Swedish and Finnish lower threshold there was a general westward direction of migration lakes. With correlated allele frequencies, geneland detected eight towards Gjølsjøen, while there was a low level of relative migration separate clusters, but grouped into only six lake clusters. away from Helgasjön towards Isesjøen and Gjølsjøen (Figure 5a) in the west. There was no significant migration towards the north or east in any of the lakes. When only four lakes were included in the 3.5 | Direction of migration (gene flow) analysis (Isesjøen, Mjöträsket, Helgasjön and Kylänalanen), the sig- nificant migratory directions were all towards Isesjøen (Figure 5c).

The pattern of migration direction as calculated by divmigrate Analyses of different population and migration models using mi- showed a clear direction of gene flow from east/northeast towards grate showed the best fit of a model assuming four genetically dis- the west/southwest (Figure 5a). The significant migration directions tinct populations with unrestricted migration among all populations RENGEFORS et al. | 919

TABLE 3 (a) Pairwise genetic differentiation (FST) between lake populations based on 1092 RAD SNPs above diagonal, pairwise geographical distance (km) below the diagonal. (b) Pairwise genetic differentiation PhiST (above diagonal) and Jost's D (below diagonal) from genodive

(a)

FST / km DM ST HE OS MJ LU IS GJ KY DM 0.024 0.023 0.025 0.048 0.040 0.091 0.084 0.057 ST 65 0.016 0.017 0.044 0.032 0.083 0.072 0.045 HE 51 27 0.022 0.032 0.028 0.084 0.072 0.040 OS 28 90 79 0.055 0.040 0.084 0.080 0.054 MJ 1243 1179 1195 1268 0.045 0.108 0.090 0.048 LU 373 325 352 383 1006 0.060 0.042 0.037 IS 356 311 338 364 1037 31 0.031 0.105 GJ 355 306 333 366 1008 21 35 0.078 KY 697 645 646 725 728 691 711 677

(b)

PhiST / D DM ST HE OS MJ LU IS GJ KY DM 0.059 0.054 0.058 0.116 0.103 0.228 0.208 0.145 ST 0.012 0.039 0.039 0.101 0.077 0.198 0.172 0.110 HE 0.010 0.008 0.054 0.088 0.076 0.218 0.188 0.110 OS 0.012 0.008 0.011 0.129 0.104 0.206 0.191 0.132 MJ 0.023 0.021 0.017 0.027 0.114 0.255 0.210 0.115 LU 0.022 0.017 0.015 0.023 0.023 0.151 0.102 0.091 IS 0.060 0.052 0.055 0.054 0.064 0.037 0.076 0.250 GJ 0.052 0.043 0.044 0.048 0.048 0.023 0.017 0.186 KY 0.030 0.023 0.022 0.028 0.021 0.018 0.063 0.042

Note: See Table 2 for lake abbreviations.

(Table 4), while the panmictic population model was the least likely of the compared models. Among the directional migration models, the model assuming a general migration direction from north to south performed better than the inverse directional migration model (south to north) on all three replicate data sets. When comparing an eastwards against a westwards migration pattern, a general trend was not visible and model rank order differed between different data sets. In the comparison of further refined migration models, combining an underlying general southwards migration pattern with directional components either towards the east or the west, those models that included some degree of migration from Isesjøen to- wards Helgasjön in the south (refined models 2 and 4) consistently outperformed the other models (refined models 1 and 3). A general trend of directionality between Isesjøen and Kylänalanen towards the west vs. towards the east could not be detected.

4 | DISCUSSION

FIGURE 2 Genetic distance (F ) plotted against geographical New patterns were revealed in the population genetic structure ST distance (km). The blue dotted line indicates a linear fit, R2 = 0.08 and migration patterns of the nuisance alga Gonyostomum semen [Colour figure can be viewed at wileyonlinelibrary.com] 920 | RENGEFORS et al.

(a) FIGURE 3 structure analyses based on two different models (correlated allele frequencies with and without a location prior added), here showing the population division (K) with the highest delta K-values according to the Evanno method (see Figure S2). Vertical bars indicate individual (b) isolates, and the y-axis shows assignment fraction. Each population identified by structure is displayed in a different colour. (a) K = 3, admixed, correlated allele frequencies; K = 5, admixed, correlated allele frequencies; K = 2 admixed, correlated allele frequencies with location (c) prior added; K = 5, admixed, correlated allele frequencies and location prior added [Colour figure can be viewed at wileyonlinelibrary.com]

(d)

displaying a clear division of an eastern and a western population. Third, we found a strong pattern of gene flow from the northeast/ east towards the southwest/west, suggesting that the direction of gene flow mainly reflects dispersal by autumn migratory birds. The latter is in contrast to the expectations of discerning a pattern of recent expansion northwards. These results and their ecological im- plications are discussed below.

4.1 | RAD-seq in microeukaryotes

RAD-seq has been used in a number of multicellular plant and animal species to identify SNPs de novo (Emerson et al., 2010; Hohenlohe et al., 2010), but so far only to a limited extent in microalgae (Postel et al., 2020). Here we show that RAD-seq can also be used for mi- crobial eukaryotes with large genomes. To our knowledge, this is the first genome size estimate for the genus Gonyostomum. The only other available DNA content estimate for Raphidophyceae is for the marine species Heterosigma carterae, which is five times smaller, with FIGURE 4 Plot of PCA of components 1 and 2. Colour codes a genome of 5.43–6.12 pg (≈5.31–5.98 Gbp; Veldhuis et al., 1997). are based geographical location. Yellow diamonds are southern Based on the average cut frequency of the eight-cutter enzyme Sbf1 Swedish lakes, red diamonds Norwegian (western) lakes, green we had a priori expected 1.2 million RAD sites in the G. semen ge- diamond northern Swedish lake, and blue diamond Finnish (eastern) nome, given a roughly estimated genome size of 80 Gbp. Instead, we lake [Colour figure can be viewed at wileyonlinelibrary.com] found that the genome size was 32 Gbp and a mean of only 117,626 RAD-sites. This result implies that an eight-cutter enzyme was ap- by using genome-wide SNPs obtained from RAD-seq. First, we propriate for G. semen, that is the sites were sufficiently numerous showed that RAD-seq can be used to reliably identify SNPs in a mi- to provide the necessary SNPs for downstream analyses, but few croalga with a large microeukaryote genome (2C = 32 Gbp). Second, enough to have enough sequencing coverage at an affordable cost. by using ~1000 RAD-derived SNP loci, the fine structure of the Based on the preliminary data we aimed at a coverage of at least 50×, Fenno-Scandinavian lake populations of G. semen could be detected, but had a final coverage of 70×. This means that more individuals RENGEFORS et al. | 921

(a) FIGURE 5 Relative migration networks based on 1000 bootstraps and GST method in divmigrate. Only significant directions are shown. Note that no arrows denotes no significant direction, not a lack of migration. (a) All nine lakes and no filter threshold. (b) All nine lakes, filter threshold set to 0.3. (c) Four lakes (HE, IS, MJ, KY). Norway: LU = Lundebyvatn, IS = Isesjøen, GJ = Gjølsjøen. Sweden: OS = Osbysjön, DA = Dammen, HE = Helgasjön, ST = Stråken, MJ = Mjöträsket. Finland: KY = Kylänalanen [Colour figure can be viewed at wileyonlinelibrary.com]

of this study, leading to a total higher number of reads sequenced. The unexpectedly low number of RAD sites in relation to the ge- nome size could not be explained in this study. This could be due to many gene copies, as in the dinoflagellates, which are evolutionarily unrelated, but also have very large genomes (Hackett et al., 2004). Another explanation is that the genome contains AT-rich regions where the enzyme does not cut, because SbfI recognizes the cut site CCTGCA^GG. At present there is not enough information about (b) raphidophyte genomes to resolve this. Nevertheless, we can con- clude that RAD-seq should be useful for other microbial eukaryotes with similar genomes, including other HAB-forming raphidophytes.

4.2 | Population structure revealed when using SNPs instead of AFLP

The main objective of this study was to use population genetics to understand the recent expansion of G. semen in northern Europe. In contrast to the analyses made based on AFLP, which suggested a single Fenno-Scandinavian population (Lebret et al., 2013), the structure analyses based on the SNP data revealed a clear population differentiation. The clearest division, supported by all analyses, was that of the Norwegian lakes in a western cluster, and all the Swedish and Finnish lakes in an eastern cluster. This suggests that these two populations stem from different post-glacial source populations. The differentiation must be recent because all the strains belong to the (c) same cox1 and 18S haplotype, as shown in an earlier study (Lebret et al., 2015). In both k-clustering analyses performed, G. semen in the Norwegian lake Lundebyvatn, close to the Swedish border, ap- pears to be an admixed population. The Lundebyvatn population is genetically more closely related to the eastern cluster despite being geographically closer to the other two Norwegian popula- tions. However, it shows significant admixture as shown by the

structure analyses and is equally genetically differentiated ( FST) from the other Norwegian lakes and the Swedish lakes. The reason behind this east–west population divide could be due to different colonization events (i.e., either different time points or lineages of G. semen). A more fine-scaled population genetic structure can also be de-

tected in the SNP data. The structure analysis also supports a pattern could have been multiplexed, thereby reducing library preparation of five distinct populations that coincide well with the geographical and sequencing costs further. The higher coverage was mainly due clustering of the lakes. Here, all the southern Swedish lakes that are to the improvement of the Illumina technology during the duration within 90 km from each other form a separate cluster, as do the two 922 | RENGEFORS et al.

TABLE 4 Model probabilities, Bayes factors and the log marginal likelihoods of different population and migration models of three 4.3 | Weak correlation between genetic random sets of 100 polymorphic RAD loci. (a-c) Model comparison isolation and geographical distance of a panmictic population model, a full migration model and four general geographical patterns of directional migration. (d-f) Model We had expected to see some pattern revealing the post-glacial comparison of a set of refined migration models. colonization of lakes, which our earlier work using AFLP did not re-

Model- veal. In fact the data showed a weak correlation between genetic Model Log(mL) LBF probability distance and geographical distance, although the Mantel test was

(a) not significant. This indicates that IBD has some effect, but that Panmictic model −18,789.61 −4656.75 .0000 this effect is overridden by other factors. For example, Osbysjön and Mjöträsket are 1200 km apart but have a pairwise F of 0.055, South-to-north −17,455.69 −3322.83 .0000 ST while Lundebyvatn and Isesjøen are only 30 km apart and have an East-to-west −17,451.03 −3318.17 .0000 FST of 0.060. Our data show that while there is a trend of the dif- West-to-east −17,429.48 −3296.62 .0000 ferentiation to increase with distance, there is a huge scatter at each North-to-south −17,424.64 −3291.78 .0000 pairwise distance. This clearly points to dispersal limitation caused Full migration −14,132.86 0.00 1.0000 by other factors than distance. Sassenhagen et al. (2015) ruled out (b) that hydrological connectivity played an important role in dispersal South-to-north −17,521.92 −18.03 .0000 of G. semen, but postulated that on a regional scale (<100 km), animal West-to-east −17,511.93 −8.04 .0003 vectors, such as insects and birds, may facilitate gene flow. High dif- North-to-south −17,511.65 −7.76 .0004 ferentiation between relatively nearby lakes has been hypothesized East-to-west −17,503.89 0.00 .9993 to be due to priority effects of first colonizers (Sefbom et al., 2015). (c) Recently, Sundqvist et al. (2018) also suggested that long-term dor- East-to-west −17,146.81 −47.58 .0000 mant resting stages contribute to increased population differentia- tion by acting as an anchoring effect on phytoplankton populations. South-to-north −17,132.41 −33.18 .0000 In addition to genetic isolation with distance, we had hypoth- North-to-south −17,101.17 −1.94 .1256 esized to find lakes with lower genetic diversity than some other West-to-east −17,099.23 0.00 .8744 lakes, indicating that these populations were established recently. (d) However, the genetic diversity in the lakes did not vary much in the Refined-model-3: −17,074.11 −37.92 .0000 percentage of polymorphic loci or in heterozygosity. Lake Stråken Refined-model-1: −17,058.04 −21.85 .0000 had the highest genetic diversity (HO = 0.19) and also has records of Refined-model-2: −17,036.37 −0.18 .4551 G. semen from the late 1940s (Sörensen, 1954). However, Helgasjön Refined-model-4: −17,036.19 0.00 .5449 has records from that time but had an HO = 0.15. For the other lakes (e) in the study, there has either been no monitoring or only as of the Refined-model-3: −17,172.61 −17.32 .0000 early 2000s , at a time when G. semen blooms were already pres- Refined-model-1: −17,171.31 −16.02 .0000 ent. For most lakes, the observed heterozygosity is slightly lower Refined-model-4: −17,169.32 −14.03 .0000 than the expected heterozygosity, which may be a reflection of ear- Refined-model-2: −17,155.29 0.00 1.0000 lier population bottlenecks. Only the Lake Stråken population has a higher observed than expected heterozygosity. This possibly re- (f) flects outbreeding due to dispersal towards Lake Stråken, as can be Refined-model-1: −16,783.11 −36.18 .0000 observed from the divmigrate analysis (Figure 4). Refined-model-3: −16,776.41 −29.48 .0000 Refined-model-2: −16,758.69 −11.76 .0000 Refined-model-4: −16,746.93 0.00 1.0000 4.4 | Direction of gene flow

We had anticipated that we would observe a directional gene flow westernmost Norwegian lakes (within 35 km), while the remaining (dispersal) that would reveal the recent colonization of new habitats, three lakes each form genetically separate populations. This indi- possibly displaying a northward expansion in response to increased cates that the lake populations are genetically differentiated, which temperature. Instead, the divmigrate gene flow analyses suggested is also supported by the FST values. The lakes that clustered together that the main migration direction is towards the south/southwest, had FST values ranging from 0.016 to 0.031, while lakes that were and not towards the north as hypothesized. The analysis of direc- differentiated by structure had FST values ranging from 0.032 to tional migration using migrate supports a dominance of gene flow in

0.105. Of note, the range of FST was three to four times lower than a north-to-south direction compared to a south-to-north direction. the FST values obtained using AFLP markers for in part the same set However, the directionality of gene flow on the geographical east– of individual samples, suggesting that AFLP overestimated FST. west axis remains ambiguous in the analysis. Based on the migrate RENGEFORS et al. | 923 analyses, support for a full migration model with bidirectional asym- with contributions from R.G., I.S. and D.A. All co-authors read and metric migration was significantly higher than for other simpler mod- edited the manuscript. els, which assumed unidirectional migration patterns. This absence of a clear signature of directional migration also suggests recent DATA AVAILABILITY STATEMENT range expansion of G. semen in Fenno-Scandinavia. The sequence data have been submitted to the NCBI SRA database Together, the patterns of migration show that G. semen is not under BioProject ID PRJNA659541. https://www.ncbi.nlm.nih.gov/ currently expanding in a northward direction, which is also corrob- b i o p r o​ j e c t / PRJNA​ 6​ 5 9 5 4 1 orated by monitoring data. Consequently, we can confirm that the expansion observed is not a result of a species dispersing north as ORCID a response to temperature increase. The gene flow direction prob- Karin Rengefors https://orcid.org/0000-0001-6297-9734 ably reveals an older migratory route, not that of recent introduc- Raphael Gollnisch https://orcid.org/0000-0001-6177-8877 tions. Phytoplankton are dispersed either by water, wind or animal William A. Cresko https://orcid.org/0000-0002-3496-8074 vectors (Kristiansen, 1996; Tesson et al., 2016), and in this case we Dag Ahrén https://orcid.org/0000-0003-4713-0032 can discard both water (the lakes are not hydrologically connected) and wind. Winds in Sweden are generally from the west or south- REFERENCES west (Swedish Meteorological and Hydrological Institute, https:// Altschul, S., Madden, T., Schaffer, A., Zhang, J. H., Zhang, Z., Miller, W., www.smhi.se/kunsk​apsba​nken/klima​t/sveri​ges-klima​t/vind-i-sveri​ & Lipman, D. (1998). Gapped BLAST and PSI-BLAST: A new gen- eration of protein database search programs. FASEB Journal, 12(8), ge-1.31309), the opposite direction of the dominating migration di- A1326. rection. Instead we propose that G. semen cysts are dispersed via Amores, A., Catchen, J., Ferrara, A., Fontenot, Q., & Postlethwait, J. H. migratory waterfowl such as mallard ducks, whose autumn migra- (2011). Genome Evolution and Meiotic Maps by Massively Parallel tory routes are from the east/north towards the west/southwest, DNA Sequencing: Spotted Gar, an Outgroup for the Teleost Genome Duplication. Genetics, 188(4), 799–U779. https://doi. similar to the direction of gene flow (Tolf et al., 2012). Waterfowl org/10.1534/genet​ics.111.127324. are known to transport viable propagules of other aquatic organisms Anderson, D. M., Glibert, P. M., & Burkholder, J. M. (2002). Harmful such as bryozoans, crustaceans and plant seeds (Charalambidou & algal blooms and eutrophication: Nutrient sources, composi- Santamaría, 2002; Charalambidou et al., 2003; Green et al., 2002). tion, and consequences. Estuaries, 25(4B), 704–726. https://doi. Phytoplankton cysts have also been shown to be able to survive org/10.1007/bf028​04901. Atkinson, K. M. (1972). Birds as transporters of algae. British Phycological passage in bird guts (Atkinson, 1972; Tesson et al., 2018). Coleman Journal, 7, 319–321. (2001) further suggested that the of a chlorophyte Beerli, P., & Palczewski, M. (2010). Unified framework to evaluate pan- taxon was followed bird migratory paths. Thus, it is also likely that G. mixia and migration direction among multiple sampling locations. semen cysts could also be dispersed by birds. The cysts are produced Genetics, 185(1), 313–326. Björnerås, C., Weyhenmeyer, G. A., Evans, C. D., Gessner, M. O., at the end of the summer and early autumn (Rengefors et al., 2012), Grossart, H.-P., Kangur, K., Kokorite, I., Kortelainen, P., Laudon, and thus could be co-dispersed with birds flying south/southwest. H., Lehtoranta, J., Lottig, N., Monteith, D. T., Nõges, P., Nõges, T., Potential vectors are the mallards, which could disperse cysts at a Oulehle, F., Riise, G., Rusak, J. A., Räike, A., Sire, J., … Kritzberg, E. distance of 150 km in 2.25 hrs, and 800 km in 12 hrs based on the S. (2017). Widespread Increases in Iron Concentration in European and North American Freshwaters. Global Biogeochemical Cycles, calculations for dinoflagellate cysts (Tesson et al., 2018). 31(10), 1488–1500. https://doi.org/10.1002/2017g​b005749. Bolch, C. J. S., & de Salas, M. F. (2007). A review of the molecular ev- ACKNOWLEDGEMENTS idence for ballast water introduction of the toxic dinoflagellates This project was supported by the Swedish Research Council Gymnodinium catenatum and the Alexandrium "tamarensis com- plex" to Australasia. Harmful Algae, 6(4), 465–485. https://doi. Formas 215-2010-751 and the Swedish Research Council (VR) org/10.1016/j.hal.2006.12.008. 621-2012-3726 to K.R. The EU ITN programme SINGEK, Marie Catchen, J. M., Amores, A., Hohenlohe, P. A., Cresko, W. A., & Sklodowska-Curie grant 675752, supported R.G. The computations Postlethwait, J. H. (2011). Stacks: Building and genotyping loci de were performed on resources provided by SNIC through Uppsala novo from short-read sequences. G3: Genes, Genomes, Genetics, 1, 171–182. Multidisciplinary Center for Advanced Computational Science Catchen, J. M., Hohenlohe, P. A., Bassham, S., Amores, A., & Cresko, (UPPMAX) under Project SNIC 2017/7-439. W. A. (2013). Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124–3140. AUTHOR CONTRIBUTIONS Charalambidou, I., & Santamaría, L. (2002). Waterbirds as endozoochor- K.R. designed the research. K.L. and I.S. sampled materials. K.L., I.S., ous dispersers of aquatic organisms: a review of experimental evi- dence. Acta Oecologica, 23, 165–176. K.H.A. and M.S. performed laboratory work. S.B. and W.A.C. ad- Charalambidou, I., Santamaría, L., & Figuerola, J. (2003). How far can vised on design and helped trouble-shoot the RAD method develop- the freshwater bryozoan Cristatella mucedo disperse in duck guts. ment. K.R. and D.A. analysed sequence data. D.C. analysed genome Archiv Fuer Hydrobiologie, 157(4), 547–554. size. K.R., R.G., D.A. and I.S. performed population genetic analyses. Coleman, A. W. (2001). Biogeography and speciation in the Pandorina/ Volvulina (Chlorophyta) superclade. Journal of Phycology, 37(5), K.R., R.G., I.S. and D.A. discussed the results. K.R. wrote the paper 836–851. 924 | RENGEFORS et al.

Cronberg, G., Lindmark, G., & Björk, S. (1988). Mass development of the threespine stickleback using sequenced RAD tags. PLoS Genetics, flagellate Gonyostomum semen (Raphidophyta) in Swedish forest 6(2), e10000862. https://doi.org/10.1371/journ​al.pgen.10000862. lakes - an effect of acidification? Hydrobiologia, 161, 217–236. Jeffreys, H. (1961). Theory of probability. Oxford University Press. Davey, J. W., Hohenlohe, P. A., Etter, P. D., Boone, J. Q., Catchen, J. M., Karosiene, J., Kasperoviciene, J., Koreiviene, J., & Vitonyte, I. (2014). & Blaxter, M. L. (2011). Genome-wide genetic marker discovery Assessment of the vulnerability of Lithuanian lakes to expansion and genotyping using next-generation sequencing. Nature Reviews of Gonyostomum semen (Raphidophyceae). Limnologica, 45, 7–15. Genetics, 12, 499–510. https://doi.org/10.1016/j.limno.2013.10.005. Doležel, J. (2005). Plant DNA flow cytometry and estimation of nuclear Kopelman, N. M., Mayzel, J., Jakobsson, M., Rosenberg, N. A., & genome size. Annals of Botany, 95(1), 99–110. Mayrose, I. (2015). Clumpak: a program for identifying cluster- Dolezel, J., Binarová, P., & Lucretti, S. (1989). Analysis of Nuclear DNA ing modes and packaging population structure inferences across content in plant cells by Flow cytometry. Biologia Plantarum, 31(2), K. Molecular Ecology Resources, 15(5), 1179–1191. https://doi. 113–120. org/10.1111/1755-0998.12387. Dolezel, J., Sgorbati, S., & Lucretti, S. (1992). Comparison of 3 DNA flu- Kristiansen, J. (1996). Dispersal of freshwater algae - a review. orochromes for flow cytometic estimation of nuclear-DNA con- Hydrobiologia, 336, 151–157. tent in plants. Physiologia Plantarum, 85(4), 625–631. https://doi. Lebret, K., Kritzberg, E., Figueroa, R. I., & Rengefors, K. (2012). Genetic org/10.1034/j.1399-3054.1992.850410.x. diversity within and genetic differentiation between blooms of a Earl, D. A., & vonHoldt, B. M. (2012). STRUCTURE HARVESTER: a web- microalgal species. Environmental Microbiology, 14(9), 2395–2404. site and program for visulaizing STRUCTURE output and imple- Lebret, K., Kritzberg, E., & Rengefors, K. (2013). Population genetic menting the Evanno method. Conservation Genetics Resources, 4(2), structure of a microalgal species under expansion. PLoS One, 8(12), 359–361. 82511–82519. Emerson, K. J., Merz, C. R., Catchen, J. M., Hohenlohe, P. A., Cresko, Lebret, K., Ostman, O., Langenheder, S., Drakare, S., Guillemette, F., & W. A., Bradshaw, W. E., & Holzapfel, C. M. (2010). Resolving Lindstrom, E. S. (2018). High abundances of the nuisance raphido- postglacial phylogeography using high-throughput sequenc- phyte Gonyostomum semen in brown water lakes are associated ing. Proceedings of the National Academy of Sciences of the United with high concentrations of iron. Scientific Reports, 8, 1–10. https:// States of America, 107(37), 16196–16200. https://doi.org/10.1073/ doi.org/10.1038/s4159​8-018-31892​-7 pnas.10065​38107. Lebret, K., Tesson, S., Kritzberg, E., Carmelo, T., & Rengefors, K. (2015). Etter, P. D., Bassham, S., Hohenlohe, P. A., Johnson, E. A., & Cresko, W. Phylogeography of the freshwater raphidophyte Gonyostomum A. (2011). SNP discovery and genotyping for evolutionary genet- semen confirms a recent expansion in northern Europe by a sin- ics using RAD sequencing. In V. Orgogozo, & M. V. Rockman (Eds.), gle haplotype. Journal of Phycology, 51(4), 768–781. https://doi. Molecular methods for evolutionary genetics, methods in molecular bi- org/10.1111/jpy.12317. ology, Vol. 772. Springer Science+Business Media. Lepistö, L., Antikainen, S., & Kivinen, J. (1994). The occurrence of Excoffier, L., Foll, M., & Petit, R. J. (2009). Genetic consequences Gonyostomum semen (Ehr.) Diesing in Finnish lakes. Hydrobiologia, of range expansions. Annual Review of Ecology, Evolution, and 273, 1–8. Systematics, 40(1), 481–501. https://doi.org/10.1146/annur​ev.ecols​ Lilly, E. L., Kulis, D. M., Gentien, P., & Anderson, D. M. (2002). Paralytic shell- ys.39.110707.173414. fish poisoning toxins in France linked to a human-introduced strain of Falush, D., Stephens, M., & Pritchard, J. K. (2003). Inference of popula- Alexandrium catenella from the western Pacific: evidence from DNA tion structure: Extensions to linked loci and correlated allele fre- and toxin analysis. Journal of Plankton Research, 24(5), 443–452. quencies. Genetics, 164, 1567–1587. Litchman, E. (2010). Invisible invaders: non-pathogenic invasive mi- Figueroa, R. I., & Rengefors, K. (2006). Life cycle and sexuality of the crobes in aquatic and terrestrial ecosystems. Ecology Letters, 13, freshwater raphidophyte Gonyostomum semen (Raphidophyceae). 1560–1572. Journal of Phycology, 42, 859–871. Meirmans, P. G., & Van Tienderen, P. H. (2004). GENOTYPE and Gobler, C. J., Doherty, O. M., Hattenrath-Lehmann, T. K., Griffith, A. GENODIVE: two programs for the analysis of genetic diversity of W., Kang, Y., & Litaker, R. W. (2017). Ocean warming since 1982 asexual organisms. Molecular Ecology Notes, 4, 792–794. has expanded the niche of toxic algal blooms in the North Atlantic Michalak, A. M., Anderson, E. J., Beletsky, D., Boland, S., Bosch, N. S., and North Pacific oceans. Proceedings of the National Academy Bridgeman, T. B., Chaffin, J. D., Cho, K., Confesor, R., Daloglu, I., of Sciences of the United States of America, 114(19), 4975–4980. DePinto, J. V., Evans, M. A., Fahnenstiel, G. L., He, L., Ho, J. C., https://doi.org/10.1073/pnas.16195​75114. Jenkins, L., Johengen, T. H., Kuo, K. C., LaPorte, E., … Zagorski, Green, A. J., Figuerola, J., & Sánchez, M. I. (2002). Implications of water- M. A. (2013). Record-setting algal bloom in Lake Erie caused by bird ecology for the dispersal of aquatic organisms. Acta Oecologica, agricultural and meteorological trends consistent with expected 23, 177–189. future conditions. Proceedings of the National Academy of Sciences Guillot, G., Mortier, F., & Estoup, A. (2005). GENELAND: a computer pack- of the United States of America, 110(16), 6448–6452. https://doi. age for landscape genetics. Molecular Ecology Notes, 5(3), 708–711. org/10.1073/pnas.12160​06110. Hackett, J. D., Anderson, D. M., Erdner, D. L., & Bhattacharya, D. (2004). Padisák, J., Vasas, G., & Borics, G. (2016). Phycogeography of fresh- Dinoflagellates: A remarkable evolutionary experiment. American water phytoplankton: traditional knowledge and new molecular Journal of Botany, 91(10), 1523–1534. tools. Hydrobiologia, 764(1), 3–27. https://doi.org/10.1007/s1075​ Hagman, C. H. C., Ballot, A., Hjermann, D. O., Skjelbred, B., Brettum, P., & 0-015-2259-4. Ptacnik, R. (2015). The occurrence and spread of Gonyostomum semen Paerl, H. W., & Otten, T. G. (2013). Harmful Cyanobacterial Blooms: (Ehr.) Diesing (Raphidophyceae) in Norwegian lakes. Hydrobiologia, Causes, Consequences, and Controls. Microbial Ecology, 65(4), 995– 744(1), 1–14. https://doi.org/10.1007/s1075​0-014-2050-y. 1010. https://doi.org/10.1007/s0024​8-012-0159-y. Hagman, C. H. C., Skjelbred, B., Thrane, J. E., Andersen, T., & de Wit, H. Palczewski, M., & Beerli, P. (2014). Population model comparison using A. (2019). Growth responses of the nuisance algae Gonyostomum multi-locus datasets. In M. H. Chen, L. Kuo, & P. O. Lewis (Eds.), semen (Raphidophyceae) to DOC and associated alterations of Bayesian phylogenetics: Methods, algorithms, and applications (pp. light quality and quantity. Aquatic Microbial Ecology, 82(3), 241–251. 187–200). CRC. https://doi.org/10.3354/ame01894. Paris, J. R., Stevens, J. R., & Catchen, J. M. (2017). Lost in parameter Hohenlohe, P. A., Bassham, S., Etter, P. D., Stiffler, N., Johnson, E. A., & space: a road map for stacks. Methods in Ecology and Evolution, Cresko, W. A. (2010). Population genomics of parallel adaptation in 8(10), 1360–1373. https://doi.org/10.1111/2041-210x.12775. RENGEFORS et al. | 925

Postel, U., Glemser, B., Alekseyeva, K. S., Eggers, S. L., Groth, M., Tolf, C., Bengtsson, D., Rodrigues, D., Latorre-Margalef, N., Wille, Glockner, G., & Beszteri, B. (2020). Adaptive divergence across M., Figueiredo, M. E., Jankowska-Hjortaas, M., Germundsson, Southern Ocean gradients in the pelagic diatom Fragilariopsis A., Duby, P.-Y., Lebarbenchon, C., Gauthier-Clerc, M., Olsen, kerguelensis. Molecular Ecology, 29, 4913–4924. https://doi. B., & Waldenström, J. (2012). Birds and Viruses at a Crossroad - org/10.1111/mec.15554 Surveillance of Influenza A Virus in Portuguese Waterfowl. PLoS One, Pritchard, J. K., Stephens, M., & Donnelly, P. (2000). Inference of pop- 7(11), e49002. https://doi.org/10.1371/journ​al.pone.0049002 ulation structure using multilocus genotype data. Genetics, 155, Trigal, C., Hallstan, S., Johansson, K. S. L., & Johnson, R. K. (2013). Factors 945–959. affecting occurence and bloom formation of the nuisance flagellate Rengefors, K., Kremp, A., Reusch, T. B. H., & Wood, A. M. (2017). Genetic Gonyostomum semen in boreal lakes. Harmful Algae, 27, 60–67. diversity and evolution in eukaryotic phytoplankton: revelations Veldhuis, M. J. W., Cucci, T. L., & Sieracki, M. E. (1997). Cellular Dna from population genetic studies. Journal of Plankton Research, 39(2), Content of Marine Phytoplankton Using Two New Fluorochromes: 165–179. Taxonomic and Ecological Implications. Journal of Phycology, 33(3), Rengefors, K., Weyhenmeyer, G. A., & Bloch, I. (2012). Temperature as 527–541. a driver for the expansion of the microalga Gonyostomum semen in Waters, J. M., Fraser, C. I., & Hewitt, G. M. (2013). Founder takes all: Swedish lakes. Harmful Algae, 18, 65–73. density-dependent processes structure biodversity. Trends in Rochette, N. C., & Catchen, J. M. (2017). Deriving genotypes from RAD- Ecology & Evolution, 28(2), 78–85. https://doi.org/10.1016/j. seq short-read data using Stacks. Nature Protocols, 12(12), 2640– tree.2012.08.024. 2659. https://doi.org/10.1038/nprot.2017.123. Wood, D. E., & Salzberg, S. L. (2014). Kraken: ultrafast metagenomic se- Sassenhagen, I., Sefbom, J., Säll, T., Godhe, A., & Rengefors, K. (2015). quence classification using exact alignments. Genome Biology, 15(3), Freshwater protists do not go with the flow: Population struc- R46. ture in Gonyostomum semen independent of connectivity among Yamaguchi, M., & Imai, I. (1994). A microfluorometric analysis of nuclear lakes. Environmental Microbiology, 17(12), 5063–5072. https://doi. DNA at different stages in the life hisory of Chattonella antigua and org/10.1111/1462-2920.12987 Chattonella marina (Raphidophyceae). Phycologia, 33, 163–170. Sefbom, J., Sassenhagen, I., Rengefors, K., & Godhe, A. (2015). Priority effects in a planktonic bloom-forming marine diatom. Biology Letters, 11(5), 20150184. SUPPORTING INFORMATION Sörensen, I. (1954). Gonyostomum semen (Ehrenb.) Diesing - en vattenor- ganism av teoretiskt och praktiskt intresse. Svensk Faunistisk Revy, Additional supporting information may be found online in the 2, 47–52. Supporting Information section. Sundqvist, L., Godhe, A., Jonsson, P. R., & Sefbom, J. (2018). The anchor- ing effect - long-term dormancy and genetic population structure. ISME Journal, https://doi.org/10.1038/s4139​6-018-0216-8. How to cite this article: Rengefors K, Gollnisch R, Sassenhagen Tesson, S. V. M., Okamura, B., Dudaniec, R. Y., Vyverman, W., Löndahl, J., I, et al. Genome-wide single nucleotide polymorphism markers Rushing, C., Valentini, A., & Green, A. J. (2016). Integrating microor- ganism and macroorganism dispersal: modes, techniques and chal- reveal population structure and dispersal direction of an lenges with particular focus on co-dispersal. Ecoscience, 22(2–4), expanding nuisance algal bloom species. Mol Ecol. 109–124. https://doi.org/10.1080/11956​860.2016.1148458. 2021;30:912–925. https://doi.org/10.1111/mec.15787 Tesson, S. V. M., Weissbach, A., Kremp, A., Lindstrom, A., & Rengefors, K. (2018). The potential for dispersal of microalgal resting cysts by migratory birds. Journal of Phycology, 54(4), 518–528. https://doi. org/10.1111/jpy.12756.