A Forest Dark

An Evolutionary History of Norway Spruce

Alexis R. Sullivan

Department of Ecology and Environmental Science Umeå University 2020 This work is protected by the Swedish Copyright Legislation (Act 1960:729) Dissertation for PhD ISBN: 978-91-7855-211-5 Cover design by Thomas Ågren Electronic version available at: http://umu.diva-portal.org/ Printed by: CityPrint i Norr AB Umeå, Sweden 2020

Midway upon the journey of our life I found myself within a forest dark, for the straightforward path had been lost.

How hard a thing it is to say what was this forest savage, rough, and stern, the very thought renews my fear.

So bitter is it, death is little more; But to rehearse the good it also brought, I will speak of the other things I saw.

Dante Alighieri, Inferno. Translation modified from Henry Wadsworth Longfellow, with apologies.

List of Papers
Author Contributions
Sammanfattning
The big why?
Gymnosperms: improbable model systems
Thesis aims and objectives
Ecology and distribution
Genome structure and evolution
Nuclei of dark matter
Are bigger genomes worse?
Mitogenomes
Unique assembly challenges
Mitogenome size variation
Plastomes
Phylogenetics
The fossil history of spruce
Phylogenetics: gene trees and species trees
Hybridization
Ubiquitous but cryptic?
Promiscuous spruce
The rise of Norway spruce
Climatic revolutions and the last trees standing
Out of the ice
… and into the fire: Spruce enters the Anthropocene
The future
Acknowledgements
References

Abstract

Embedded within the relationships among species is a dense forest of gene trees, each with a potentially unique and discordant history. Such widespread genealogical heterogeneity is expected, but embracing this hierarchy of discordance while reconstructing the histories of populations and species remains a major challenge.

In this thesis, I studied the history of the genes and genomes contained within Norway spruce (Picea abies: Pinaceae), a forest tree distributed throughout boreal and montane Europe. I sequenced plastid genomes from all the commonly recognized Picea species and developed a novel strategy to assemble the bacterial-sized mitochondrial genome of Norway spruce. Using multispecies coalescent network models, I reconstructed the relationships among populations of Norway spruce and the parapatric Siberian spruce (P. obovata) and distinguished between drift and hybridization as sources of phylogenetic discord.

Norway spruce holds heterogeneous histories at multiple levels of organization. Although organelle genomes are expected to be clonal and uniparentally inherited, the chloroplast genome held by Norway spruce originated after sexual recombination between two divergent lineages. In the mitochondrial genome, recombination creates a diverse population of genome arrangements subjected to drift and selection within individuals and populations. Genetic diversity among populations is shaped in nearly equal measure by divergence and hybridization. Norway spruce is discordance distilled.

Key words: phylogenetics, genome assembly, recombination, Picea, hybridization, mitogenome, Norway spruce, phylogeography, phylogenetic networks, plastome

List of Papers

I. Sullivan, A.R., B. Schiffthaler, S.L. Thompson, N.R. Street, X.-R. Wang. 2017. Interspecific plastome recombination reflects ancient reticulate evolution in Picea (Pinaceae). Molecular Biology and Evolution 34:1689-1701.

II. Sullivan, A.R., Y. Eldfjell, B. Schiffthaler, N. Delhomme, T. Asp, K.H. Hebelstrup, O. Keech, L. Öberg, I.A. Møller, L. Arvestad, N.R. Street, X-R. Wang. 2019. The mitogenome of Norway spruce and a reappraisal of mitochondrial recombination in plants. Genome Biology and Evolution 12: 3586–3598.

III. Sullivan, A.R., J. Gao, Y. Jin, E. Mudrik, D. Politov, X-R. Wang. Evidence for the hybrid origin of Norway spruce. Manuscript.

Author Contributions

Paper I: ARS, SLT, and XRW conceived and designed the study. ARS and SLT collected samples. ARS performed all analyses with support from BS and NRS. ARS wrote the manuscript. ARS and XRW critically revised the manuscript with input from all authors.

Paper II: ARS, IMM, NRS, LA, and XRW conceived and designed the study. BS, ND, and NRS obtained sequence data. TA, KHH, OK, LÖ, and IMM designed and implemented the ancient spruce sampling and sequencing. YE and LA designed and implemented the support vector machine. ND carried out the transcriptome assembly and annotation. ARS performed all other analyses with support from BS, ND, and NRS. ARS wrote the manuscript. ARS and XRW critically revised the manuscript with input from all authors.

Paper III: ARS conceived and designed the study with advice from EM, DP, and XRW. EM and DP collected samples. JG and YJ performed laboratory experiments. ARS performed all analyses with support from XRW. ARS wrote the manuscript with input from XRW. All authors approved the manuscript.

Sammanfattning

Within the relationships among species lies a dense forest of gene phylogenies, each of which can display a history that is unique and discordant with all others. Such breadth of genealogical heterogeneity is expected, but accounting for this hierarchy of discordance when reconstructing the histories of populations and species remains an enormous challenge.

In this thesis, I have studied the history of the genes and genomes found in Norway spruce (Picea abies: Pinaceae), a forest tree distributed across the boreal and mountainous parts of Europe. I sequenced the plastid genomes of the recognized Picea species and developed a new strategy to piece together the bacterial-sized mitochondrial genome of Norway spruce. Using a multispecies coalescent network model, I reconstructed the evolutionary history of Norway spruce and the parapatric Siberian spruce (P. obovata) and distinguished genetic drift and hybridization as the causes of phylogenetic discord.

Norway spruce has a heterogeneous history at several levels of organization. Although organelle genomes are expected to be inherited clonally from one parent, the chloroplast of Norway spruce arose from sexual recombination between two distinct lineages. In the mitochondrial genome, recombination creates a population of many different genome arrangements that experience drift and selection at the individual and population levels. Norway spruce populations are shaped in nearly equal part by divergence and hybridization. In this respect, Norway spruce is a concentrated form of discord.

The big why?

I began my PhD intending to study the effects of silviculture on the genetic diversity of Norway spruce (Picea abies: Pinaceae). As an immigrant to Sweden, I first consulted a couple of books to learn where, exactly, the species I signed up to study could be found. It did not take me long to discover that "Norway spruce" refers to a set of fuzzily defined and possibly polyphyletic groups. From there, I decided more fundamental questions needed to be addressed first: Do different groups of Norway spruce comprise a single species? What is the history of Norway spruce in an evolutionary context: where, when, and from whom did they evolve? Are spruce species boundaries permeable, and if so, how much gene flow occurs? It also did not take me long to discover that answering these questions would not be straightforward.

My thesis results from my quest to answer these basic questions about Norway spruce while struggling to navigate the complexities and challenges that stood in the way. Organelle genomes are clonal, non-recombining, and inherited from just one parent. Until they’re not. Phylogenies estimated from genes like the plastid rbcL and mitochondrial nad5 tell us something about the history of species. Except they don’t, not really. Genetic structure can reveal the magnitude and spatial extent of hybridization. But only if you’re studying a population where nothing else has ever happened.

Some of these challenges arose from my own misunderstanding of evolution, like the difference between gene trees and species trees. Some of these misunderstandings are endemic to the field, like just how little you can infer about population history from those ubiquitous genetic structure barplots. Others are fundamentally difficult problems, like reconciling the myriad gene genealogies within a population, especially when hybridization is added to the mix. Some are technical: how do you assemble a genome that exists in vivo in hundreds of tiny, recombining pieces? All these challenges, however, were amplified by my study organism, the majestic but unruly Norway spruce.

Gymnosperms: improbable model systems

If I designed an ideal organism for evolutionary studies, the result would not resemble any gymnosperm, and probably least of all a coniferous tree. On my wish list, I would jot down short generation times, on the order of weeks and certainly not decades, to be able to empirically test hypotheses about local adaptation, mechanisms of mating isolation, and how exactly periods of gene flow (their duration, periodicity, and magnitude) interact with population divergence and speciation. I'd request a small stature, so I could conduct controlled crosses and collect seeds without worrying about my life insurance policy. A small genome, perhaps 135,000,000 base pairs over five chromosomes like Arabidopsis thaliana (Arabidopsis Genome Initiative 2000), would help stretch my research budget and enable elegant models based on genome-wide patterns of coalescence or linkage disequilibrium, like the multiple sequentially Markovian coalescent (Schiffels & Durbin 2014). Instead, the characteristics of spruce and the rest of the Pinaceae, including the pines (Pinus), firs (Abies), larches (Larix), and a host of other genera with limited species diversity (Cathaya, Cedrus, Keteleeria, Nothotsuga, Tsuga), are exactly opposite to every item on my wish list.

Spruce do have attributes that make them appealing study objects, not least their charisma and the beauty of their habitats. As dominant trees of the boreal forest, the Earth's largest terrestrial biome, spruce play integral roles in the global carbon and water cycles (Goulden et al. 1998; Gerten et al. 2004). More direct links to human livelihoods also exist: millions of cubic meters of spruce and other coniferous trees are exported annually as lumber and pulp (FAO 2011). Fennoscandia alone contributes 14% of the world's timber, pulp, and paper, much of which comes from spruce (Hannerz 2003a, 2003b). Thanks to this economic importance, many aspects of Picea ecology and biology are relatively well understood. The silvics of North American and European species have been a major research focus in forestry over the last century (Burns & Honkala 1990), and Norway spruce may be the single most studied tree in the world (Sullivan 1994). Extensive progeny and clonal1 trials have established the heritability and variability of phenotypic traits like bud burst and set, growth, and stress resistance (Dietrichson 1969; Skrøppa 1982; Ekberg et al. 1991; Skrøppa 1991; Hylen 1997). Ambitious common garden experiments, like the International Union of Forest Research Organizations (IUFRO) 1964/68 Provenance Trial (Krutzsch 1974), provide easily accessible collections of population germplasm and insights into the extent (or lack thereof) of local adaptation (e.g. Ujvári-Jármay et al. 2016). Even the genome structure and evolution of Picea are not exactly terra incognita: genome assemblies and linkage maps of Norway spruce (Nystedt et al. 2013) and white spruce (P. glauca; Birol et al. 2013; Warren et al. 2015) are available, and improved versions are under development.

Other benefits of studying the evolution of Picea and other conifers are, in fact, the same attributes that make them challenging. Genomic signatures of selection and speciation are lost over time, but the >25- to 60-year generation times of spruce, coupled with the recent origin of most species (Feng et al. 2018; Shao et al. 2019), should make tests of adaptive evolution and hybridization more powerful (e.g. Hamilton et al. 2013; Chen et al. 2014). While long intervals between meioses and recent divergence times can mitigate sequence saturation, they also contribute to incomplete lineage sorting (ILS)2, which makes estimating relationships among species challenging (Maddison 1997). This problem is not unique to conifers: recent studies have documented pervasive gene tree discordance from ILS and other biological (Jarvis et al. 2014) and methodological sources (Reddy et al. 2017), even in organisms that might naively be expected to be better behaved than conifers. There is satisfaction in knowing that approaches devised for a challenging group of organisms have a good chance of being broadly applicable, a sort of smug martyrdom familiar to anyone who has tried to use anthropocentric bioinformatics software on 10-fold larger genomes.

1 Norway spruce is easily propagated by cuttings and somatic embryogenesis! This is an obvious benefit for evolutionary studies, albeit one tempered by the time to cone development.
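The link between generation time and ILS can be made concrete with the textbook three-species result of the multispecies coalescent: if the internal branch of the species tree lasts T coalescent units (t generations divided by 2Ne for diploids), a gene tree disagrees with the species tree with probability (2/3)e^(-T). The Python sketch below is illustrative only; it assumes no gene flow, and the population size, branch duration, and generation times are hypothetical values chosen for comparison, not estimates for Picea.

```python
import math


def p_discordant(branch_years: float, generation_years: float, ne: float) -> float:
    """Probability that a three-species gene tree disagrees with the species tree.

    Assumes the multispecies coalescent with no gene flow. The internal branch
    spans branch_years / generation_years generations, converted to coalescent
    units by dividing by 2 * Ne (diploids). Each of the two discordant
    topologies then has probability exp(-T) / 3.
    """
    generations = branch_years / generation_years
    t_coalescent = generations / (2 * ne)
    return (2.0 / 3.0) * math.exp(-t_coalescent)


# Hypothetical comparison: the same one-million-year internal branch and
# Ne = 100,000, experienced by an annual plant versus long-lived trees.
for gen_time in (1, 25, 50):
    p = p_discordant(1_000_000, gen_time, 100_000)
    print(f"generation time {gen_time:>2} yr: P(discordant) = {p:.3f}")
```

With branch duration and Ne held fixed, longer generation times leave far fewer generations for lineages to coalesce, pushing discordance towards the 2/3 ceiling expected when lineages sort at random; a larger Ne has the same effect.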

Another benefit of working with Picea and other conifers appears to be a quirk of nature. A spandrel, if you like (Gould & Lewontin 1979). Like other plants, gymnosperms have three semi-independent genomes, found in plastids, mitochondria, and nuclei, respectively. However, unlike in most organisms, one of the organelles (the plastid) is generally transmitted by the pollen parent in conifers (Szmidt et al. 1987; Neale & Sederoff 1989; Crosby & Smith 2012). The reason for this paternal inheritance, if one exists, is unclear because organelles are expected to be predominantly transmitted by the mother due to the higher genetic load in male gametes (Greiner et al. 2015). Regardless, the contrasting modes of inheritance allow genetic contributions from each parental lineage to be tracked independently and unambiguously through time. At least, more or less. Biparental inheritance and sexual recombination can occur (Szmidt et al. 1987; Erixon & Oxelman 2008; Barnard-Kubow et al. 2016), as I discovered while trying to make sense of the conflicting phylogenetic signals along the Picea plastid genome (Sullivan et al. 2017).

2 Large population sizes and short time periods between speciation events are also important sources of ILS.

Thesis aims and objectives

I cannot tell a continuous story about Norway spruce from its deepest origins down to the connectivity of populations in the modern day. Life, it turns out, is messier than that. Instead, in this thesis introduction I try to fold the results from the three studies summarized below into our broader knowledge of genome evolution, phylogenetics, and speciation.

1. Interspecific plastome recombination reflects ancient reticulate evolution in Picea (Pinaceae): We found evidence for plastomes generated by interspecific hybridization and sexual recombination in the clade comprising Norway spruce and ten other species. These results reconcile previous conflicting plastid-based phylogenies and strengthen the mounting evidence of reticulate evolution in Picea. Given the relatively high frequency of hybridization and biparental plastid inheritance in plants, we suggest interspecific plastome recombination may be more widespread than currently appreciated and could underlie reported cases of discordant plastid phylogenies.

2. The mitogenome of Norway spruce and a reappraisal of mitochondrial recombination in plants: We developed a novel assembly strategy to achieve a highly contiguous draft of the 5 Mb mitogenome of Norway spruce. By exploiting the in vivo structural variants captured by long sequencing reads, we estimated homologous recombination rates repeat by repeat and found evidence for rampant recombination at repeats of all sizes. Prompted by this discovery, we assessed the genomic support for the prevailing hypothesis that recombination is predominantly driven by repeat length, with larger repeats facilitating DNA exchange more readily. Overall, we found mixed support for this view: recombination dynamics were heterogeneous across vascular plants, and highly active small repeats (ca. 200 bp) were present in about a third of the studied mitogenomes.

3. Evidence for the hybrid origin of Norway spruce: We sequenced organelle genomes and >500,000 nuclear SNPs from 30 trees spanning the range of Norway and Siberian (P. obovata) spruce. Both organelle genomes in Norway spruce were polyphyletic on a continental scale, and multispecies coalescent network models strongly supported a hybrid origin for the northern population. Although we limited this analysis to individuals outside the putative hybrid zone, phylogenetic networks indicated that about 40% of the northern Norway spruce nuclear genome originated from a Siberian spruce-like lineage around 240,000 years ago, whereas gene flow from southern Norway spruce populations ceased ~730,000 years ago.
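One simple way to turn long reads into a per-repeat recombination estimate, as summarized in study 2, is to count reads that cross a repeat and anchor in unique sequence on both sides: a read joining the flanks found in the assembly supports the reference arrangement, while any other pairing indicates a recombinant isoform. The Python sketch below is a simplified illustration with a hypothetical input format and hypothetical flank labels; the estimator actually used in Paper II may differ.

```python
from collections import defaultdict


def recombination_frequencies(spanning_reads, reference_pairs):
    """Per-repeat fraction of long reads showing a non-reference flank pairing.

    spanning_reads: iterable of (repeat_id, left_flank, right_flank) tuples,
        one per read that crosses a repeat copy and is anchored in unique
        sequence on both sides (hypothetical input format).
    reference_pairs: dict mapping repeat_id -> set of (left, right) flank
        pairs present in the assembled arrangement.
    """
    counts = defaultdict(lambda: [0, 0])  # repeat_id -> [reference, alternative]
    for repeat_id, left, right in spanning_reads:
        is_reference = (left, right) in reference_pairs[repeat_id]
        counts[repeat_id][0 if is_reference else 1] += 1
    return {rid: alt / (ref + alt) for rid, (ref, alt) in counts.items()}


# Hypothetical repeat R1 with copies A-R1-B and C-R1-D in the assembly;
# recombination between the two copies yields A-R1-D and C-R1-B.
reference = {"R1": {("A", "B"), ("C", "D")}}
reads = [("R1", "A", "B")] * 6 + [("R1", "C", "D")] * 2 + [("R1", "A", "D"), ("R1", "C", "B")]
print(recombination_frequencies(reads, reference))  # {'R1': 0.2}
```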

Ecology and distribution

Figure 1. Distributions of commonly recognized Picea species. Reproduced from Lockwood et al. (2013) with permission from Elsevier Inc.

Spruce cover vast expanses of the northern hemisphere, circumscribing nearly the entire boreal region and extending south to the temperate mountains of Asia, Europe, and North America (Fig. 1). Because spruce are predominantly adapted to cool, moist conditions, their biomass is concentrated across the boreal zones of Eurasia and North America. However, this circumpolar region contains only four spruce species: Norway spruce, Siberian spruce (P. obovata), white spruce, and black spruce (P. mariana). The center of Picea diversity, comprising 50% of the species, lies in the mountains of south-central China, followed by the mountains of Japan and the North American Cordillera (Eckenwalder 2009; Farjon & Filer 2013).

Spruce species generally share similar morphologies and life history attributes. All are predominantly wind-pollinated and highly outcrossing. Most are slow-growing, shade-tolerant to mid-tolerant late-successional species that prefer cool sites and well-drained soils (Burns & Honkala 1990). Adult trees are generally tall, ranging from a minimum of ca. 12 m in some black spruce populations up to 65 m in Sitka spruce (P. sitchensis). Some species, like Sitka and Norway spruce, regularly live 300 years or more (Burns & Honkala 1990).

Regeneration most often occurs under a closed forest. Saplings may wait in the understory for decades until a small disturbance creates space in the canopy (Steijlen & Zackrisson 1987). Sexual maturity in natural populations depends on crown size, and trees may be 30-60 years old before they first produce seed cones (Zasada & Gregory 1969). Some species, like Norway spruce, are notoriously poor seed producers: northern populations produce good crops only once a decade on average (Safford 1974). Vegetative reproduction is of marginal importance, and spruce seeds do not persist well in the soil (Safford 1974). Overall, spruce have long lives and low reproductive output, and they are intolerant of large disturbance events that destroy advanced regeneration.

Genome structure and evolution

Nuclei of dark matter

Gymnosperms are among the worst hoarders of the plant kingdom. Holoploid genome sizes3 average nearly 18,000,000,000 base pairs (18,000 Mb), and genomes up to 35,000 Mb are not uncommon (Zonneveld 2012; Pellicer & Leitch 2019). However, gymnosperms do not hold the record for the largest genomes. That distinction belongs to Paris japonica (2n=8x=40), a woodland lily with an astonishing ~149,000 Mb of DNA, followed closely by the epiphytic whisk fern Tmesipteris obliqua (2n=8x=416) with ~147,000 Mb (Hidalgo et al. 2017). In fact, no gymnosperm manages to crack the elite list of ‘gigantic’ genomes (sensu Pellicer et al. 2018). Instead, gymnosperm genomes are unique because they are consistently large (Table 1). While other plant groups may have a few examples of genomic gigantism, they also contain extremely tiny genomes (Table 1) and are overall strongly skewed towards smaller sizes (Pellicer et al. 2018).

Table 1. Summary of 1C genome sizes (Mb) in major plant groups. Conifers II comprises Araucariaceae, Podocarpaceae, and the four Cupressaceae-like families. Genome size estimates were obtained from the Plant DNA C-values database (Pellicer & Leitch 2019; retrieved February 4, 2020).

Group            Mean      Min       Max        σ
Angiosperms      5,030     65        149,185    8,761
Monocots         9,393     191       149,185    12,298
Eudicots         2,447     65        100,842    3,854
Pteridophytes    11,870    78        147,598    13,562
Gymnosperms      17,983    2,205     35,280     7,167
Pinaceae         21,028    11,221    35,280     6,745
Conifers II      11,562    4,067     31,492     4,050
Cycadales        19,144    12,129    31,703     3,608
Gnetales         11,579    3,793     17,856     5,605

3 1C values, the DNA content of the unreplicated gametic chromosome complement.

Large genome sizes in gymnosperms in general, and Pinaceae in particular, are unique because they cannot be attributed to polyploidy. While all plants descend from a lineage that underwent at least one whole genome duplication event (Li et al. 2015; Wendel et al. 2016), such events have been much more common in angiosperms than in gymnosperms (Wood et al. 2009). Neopolyploidy, the state of having more than one set of chromosomes in generative cells, helps to explain the incredible genomes of P. japonica and T. obliqua (Hidalgo et al. 2017) but occurs in only a handful of gymnosperms (Scott et al. 2016). The remainder are diploid, with Pinaceae conifers having 2n=2x=24 chromosomes almost without exception.

An average gymnosperm has no more genes or exonic sequence than the diminutive Arabidopsis thaliana (Table 2), although estimates of gene space vary due to the difficulty of annotating discontinuous assemblies (Wegrzyn et al. 2014). Conifers do have exceptionally large introns (Nystedt et al. 2013; Stevens et al. 2016; Wegrzyn et al. 2014; Table 2), but this stems largely from the insertion of transposable elements (TEs; Elliott & Gregory 2015), segments of DNA able to cut or copy and paste themselves throughout the genome (Bourque et al. 2018). Among the potential contributors to genome size, TE transpositions and eliminations are the single most important (Elliott & Gregory 2015; Wendel et al. 2016; Table 2).

Table 2. Comparison of genome sizes and sequence content for nine published gymnosperm assemblies and two selected angiosperms.

Species                        Size       Karyotype   Intron    TEs            CDS
                               (1C, Mb)   (2n)        size      Mb (%)         Mb (%)
Gymnosperms
Picea abies [1]                19,600     24          1,000     13,720 (70)    56 (0.29)
Picea glauca [2,3]             20,000     24          3,900     -              74 (0.37)
Pinus taeda [4,5,6]            20,800     24          3,000     15,392 (74)    53 (0.25)
Pinus lambertiana [6,7]        31,000     24          10,200    24,490 (79)    18 (0.06)
Abies alba [6]                 18,200     24          300       -              64 (0.35)
Pseudotsuga menziesii [6,8]    15,700     26          2,300     11,304 (72)    42 (0.27)
Larix sibirica [6,9]           12,300     24          350       -              49 (0.40)
Ginkgo biloba [10]             10,600     24          7,900     7,950 (75)     50 (0.47)
Gnetum montanum [11]           4,200      44          2,800     3,276 (78)     45† (1.07)
Angiosperms
Arabidopsis thaliana [12]      135        10          163       22 (16)        37 (27.41)
Populus trichocarpa [13]       484        38          379       69.3 (14)      45 (9.30)

† Estimated from the number of genes multiplied by their mean exon length.
Sources: [1] Nystedt et al. 2013; [2] Birol et al. 2013; [3] Warren et al. 2015; [4] Neale et al. 2014; [5] Wegrzyn et al. 2014; [6] Mosca et al. 2019; [7] Stevens et al. 2016; [8] Neale et al. 2017; [9] Kuzmin et al. 2019; [10] Guan et al. 2016; [11] Wan et al. 2018; [12] Zapata et al. 2016; [13] Tuskan et al. 2006.
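The percentages in Table 2 are each sequence class divided by the 1C genome size. The short Python check below recomputes them from the values transcribed from the table; it adds no new data and is only a quick way to catch transcription or rounding errors.

```python
# Values transcribed from Table 2: (1C size in Mb, TE Mb or None, CDS Mb).
TABLE_2 = {
    "Picea abies": (19_600, 13_720, 56),
    "Picea glauca": (20_000, None, 74),
    "Pinus taeda": (20_800, 15_392, 53),
    "Pinus lambertiana": (31_000, 24_490, 18),
    "Abies alba": (18_200, None, 64),
    "Pseudotsuga menziesii": (15_700, 11_304, 42),
    "Larix sibirica": (12_300, None, 49),
    "Ginkgo biloba": (10_600, 7_950, 50),
    "Gnetum montanum": (4_200, 3_276, 45),
    "Arabidopsis thaliana": (135, 22, 37),
    "Populus trichocarpa": (484, 69.3, 45),
}


def percent(part: float, whole: float) -> float:
    """Share of the 1C genome, in percent."""
    return 100.0 * part / whole


for species, (size, te, cds) in TABLE_2.items():
    te_text = f"{percent(te, size):.0f}%" if te is not None else "n/a"
    print(f"{species:<22} TEs {te_text:>4}   CDS {percent(cds, size):.2f}%")
```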

Conifers are graveyards of inactive TEs (Nystedt et al. 2013; Wegrzyn et al. 2014; Stevens et al. 2016; Table 2). Phylogenetic analyses of the reverse transcriptase gene found in some TEs4 suggest most transpositions occurred prior to the diversification of extant genera, with only a tiny fraction still active in extant species (Nystedt et al. 2013). Given enough time, the activity of a few TEs can result in considerable differences in genome size among lineages. For example, the slow-and-steady proliferation of a few TEs over the last 125 million years (Saladin et al. 2017) explains the ~40% larger genome of sugar pine (P. lambertiana) compared to loblolly pine (P. taeda; Stevens et al. 2016). Genome sizes in Picea show little variation (Pellicer & Leitch 2019), but this is because extant species descend from a common ancestor dating to around 20-25 Mya (Feng et al. 2018; Shao et al. 2019) and simply haven't had enough time to evolve larger differences.

4 That is, the more abundant ‘copy and paste’ Class I elements, which include long terminal repeat retrotransposons (LTRs) and long interspersed nuclear elements (LINEs). Short interspersed nuclear elements (SINEs) don't encode a reverse transcriptase but are still Class I elements because they propagate through an RNA intermediate.
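To see why slow-and-steady proliferation suffices, assume, for illustration only, that the whole ~40% size difference between sugar pine and loblolly pine accumulated in one lineage at a constant exponential rate over 125 million years. The Python sketch below computes the implied net growth rate; real TE dynamics are episodic and include deletions, so this is a back-of-the-envelope bound rather than a model of pine genome evolution.

```python
import math


def compound_rate_per_myr(size_ratio: float, million_years: float) -> float:
    """Constant per-million-year growth rate that turns a genome of size 1
    into one of size `size_ratio` after `million_years`.

    Illustrative assumption: growth is exponential and confined to one lineage.
    """
    return math.exp(math.log(size_ratio) / million_years) - 1.0


rate = compound_rate_per_myr(1.40, 125)
print(f"net growth = {100 * rate:.2f}% per million years")
```

The result is roughly a quarter of a percent per million years, far too slow to notice on ecological timescales yet enough to separate two related genomes by thousands of megabases.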

While TEs are ubiquitous in eukaryotes, their contribution to genome size varies from about 5% in a typical bird to over 70% in gymnosperms (Sotero-Caio et al. 2017). Plants generally have more TEs than other eukaryotes (Elliott & Gregory 2015), but the high proportion among gymnosperms (Table 2) is still an anomaly, along with the Nobel Prize-winning exception of Zea mays (McClintock 1941). Gymnosperms, and particularly conifers, seem unable to rid themselves of these selfish elements through unequal homologous recombination (Nystedt et al. 2013). Whereas angiosperms show rapid TE turnover and conifers merely accumulate TEs, the gymnosperm Gnetum montanum, a member of the bizarre and diverse gnetophytes, appears to have evolved a third strategy: slowly eliminating TEs without acquiring new ones (Wan et al. 2018).

Although TE insertions are initially slightly deleterious, their long-term dynamics are a central driver of evolution (Bourque et al. 2018). A particularly striking example of morphological evolution connected to TEs is the mammalian placenta, which likely arose through the repeated capture and domestication of endogenous retroviruses (Bourque et al. 2018). Domesticated TEs also replace the lost telomerase gene in Drosophila (Pardue & DeBaryshe 2011) and play important roles in centromere structure in other organisms (Bourque et al. 2018). More generally, TEs induce genome rearrangements, modify gene expression, remodel regulatory networks, create novel transcripts, and cause cascading responses through chromatin remodeling and small RNAs (Sahebi et al. 2018; Bourque et al. 2018). TEs need not be active to wreak havoc on a host's genome. For example, TEs whose transposition machinery has degraded into uselessness still promote large-scale rearrangements through ectopic recombination (Bourque et al. 2018).

Are bigger genomes worse?

As genome sizes increase, so do the retention of duplicated genes (Lynch & Conery 2003), the proportion of repetitive DNA including TEs, and the number and size of genes and introns (Lynch & Conery 2003; Elliott & Gregory 2015). Genome size, in turn, may be negatively correlated with effective population size (Ne), a somewhat nebulous parameter that reflects the strength of genetic drift, that is, the rate at which neutral variation is randomly lost between generations. These correlations led Lynch and Conery (2003) to hypothesize that populations with smaller Ne are less able to purge weakly deleterious mutations, such as yet another TE transposition, and so eventually end up lugging around oversized genomes. Smaller genomes may be optimal, but the interplay between the strength of selection and effective population size leaves such populations unable to shed their excess baggage.

Empirical data suggest larger genomes come with a cost (Grotkopp et al. 2004; Ahuja & Neale 2005; Leitch et al. 2009), but the causative role of drift has received mixed support, particularly in plants (Whitney, Baack, et al. 2010; Ai et al. 2012). Most concerningly, a reanalysis of the Lynch and Conery dataset showed that the correlation between Ne and genome size is negligible once the confounding effect of phylogenetic dependence is controlled (Whitney & Garland 2010; Whitney et al. 2011). In other words, spruce might have large genomes and small Ne, but that's just the way their ancestors have lived since the Jurassic, and they are creatures of habit.

Nevertheless, a mechanistic relationship between drift and the accumulation of slightly deleterious mutations is firmly grounded in population genetic theory, and the equivocal empirical support may arise from the difficulty of estimating Ne (Whitney et al. 2011). Should a significant relationship between Ne and genome size be found, then the many aspects of organismal biology correlated with population size must also be considered as potential causative mechanisms (Charlesworth & Barton 2004).
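The population-genetic argument can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation with selection coefficient s in a diploid population, u = (1 - e^(-2s)) / (1 - e^(-4 Ne s)), here assuming census size equals Ne and additive effects. The Python sketch below uses a hypothetical cost of s = -10^-5 for one extra TE insertion; the value is chosen only to show how the same weakly deleterious mutation behaves almost neutrally when Ne is small and is efficiently purged when Ne is large. It is not an estimate for spruce.

```python
import math


def fixation_probability(s: float, ne: int) -> float:
    """Kimura's diffusion approximation for the fixation probability of a new
    mutation with selection coefficient s in a diploid population.

    Illustrative assumptions: census size equals Ne, the mutation starts as a
    single copy (frequency 1 / (2 * Ne)), and selection is additive.
    Very large 4 * Ne * |s| (above ~700) overflows math.expm1.
    """
    if s == 0:
        return 1.0 / (2 * ne)  # neutral limit
    # -expm1(x) equals 1 - exp(x) but stays accurate when x is tiny
    numerator = -math.expm1(-2 * s)
    denominator = -math.expm1(-4 * ne * s)
    return numerator / denominator


s = -1e-5  # hypothetical fitness cost of one more TE insertion
for ne in (1_000, 10_000, 100_000):
    relative = fixation_probability(s, ne) * 2 * ne  # 1.0 means "as if neutral"
    print(f"Ne = {ne:>7,}: fixation probability = {relative:.3f} x neutral")
```

The same mutation fixes at close to the neutral rate when Ne is 1,000 but at well under a tenth of that rate when Ne is 100,000, which is the intuition behind Lynch and Conery's hypothesis; whether real genome sizes follow it is the empirical question discussed above.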

Mitogenomes

Plant nuclear genomes may be wonderfully variable and complex, but plant mitogenomes are frankly weird. Mitochondria share an α-proteobacterial ancestor, a conserved core proteome, and a near-universal function as the site of cellular energy production (Gray 2014). While this holds for plants, it's hard to generalize much further. Consider everything we know about vertebrate mitogenomes: they're small, with 13 intronless protein-coding genes and little non-coding DNA; they are structurally stable, with little evidence for recombination at all; and they are virtually always inherited maternally (Ladoukakis & Zouros 2017). But plants do it all differently.

Figure 2. Size distribution of mitogenomes from animals (n = 9,300), angiosperms, lycophytes and ferns (n = 229), gymnosperms excluding Pinaceae (n = 7), and Pinaceae conifers (n = 8). Extremely small and large mitogenomes are found in plants, and on average, gymnosperms may have larger mitogenomes than other plants. Data from NCBI Genome Resource and Sullivan et al. (2019).

Plant mitogenomes vary 100-fold or more in size (Sloan et al. 2012; Fig. 2), substitution rate (Sloan et al. 2009), and rearrangement rate (Cole et al. 2018). Gene repertoires typically comprise 24 core genes and 17 others that are liable to gain and loss through recurrent horizontal (Sanchez-Puerta 2014) and intergenomic (Adams & Palmer 2003) transfer. Many other regions are actively transcribed (Jackman et al. 2015; Wu et al. 2015; Sullivan et al. 2019), which could indicate novel functional elements, but entire organelle genomes can also be transcribed indiscriminately (Sanitá Lima & Smith 2017). Some plants show exuberant inter- and intramolecular recombination, causing their mitogenomes to be distributed among numerous isoforms (Sullivan et al. 2019; Kozik et al. 2019), while others are curiously staid (Sullivan et al. 2019). This diversity is even more impressive considering how few reference-quality plant mitogenomes are available to date (231 for plants vs. 9,300 for animals; NCBI Genome Resource; accessed January 30, 2020).

Unique assembly challenges

Plant mitogenomes have proven difficult to assemble, leading to an unusual situation where nuclear genome assemblies outnumber those of an organelle. Part of this may stem from a misguided lack of interest (perhaps some researchers imagine plant mitogenomes to be more like our own and dismiss them), but their structure can also make them genuinely difficult to assemble. A principal challenge is the extent of DNA shared between the nucleus and mitochondrion (Alverson et al. 2010, 2011; Sullivan et al. 2019), which makes identifying mitochondrial DNA within a pool of total genomic DNA difficult to impossible. For some species, this issue can be resolved by physically separating intact mitochondria from the rest of the cell, but this has proven difficult in coniferous gymnosperms, perhaps due to secondary compounds in their leaves. Strategies to identify mitogenomic scaffolds using GC content and copy number can be effective (e.g. Naito et al. 2013) but fail in some species, such as Norway spruce, possibly because of intergenomic DNA exchanges.
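In its simplest form, a GC-content and copy-number screen flags scaffolds whose read depth far exceeds the genome-wide median and whose GC content falls inside an organelle-like window. The Python sketch below is a generic illustration, not the method of Naito et al. (2013); the function names, the depth multiplier, and the GC window are hypothetical placeholders that would need tuning for each species.

```python
from statistics import median


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return gc / acgt if acgt else 0.0


def flag_organelle_like(scaffolds, min_depth_ratio=5.0, gc_window=(0.40, 0.50)):
    """Return names of scaffolds whose read depth is at least `min_depth_ratio`
    times the median depth and whose GC content lies inside `gc_window`.

    scaffolds: dict mapping scaffold name -> (sequence, mean read depth).
    Both thresholds are hypothetical placeholders, not published values.
    """
    baseline = median(depth for _, depth in scaffolds.values())
    low, high = gc_window
    flagged = []
    for name, (seq, depth) in scaffolds.items():
        if depth >= min_depth_ratio * baseline and low <= gc_content(seq) <= high:
            flagged.append(name)
    return flagged
```

A rule of this kind depends on organellar scaffolds standing out from nuclear ones; where mitochondrial copies are only modestly more abundant than the nuclear genome, or nuclear insertions of mitochondrial DNA share its composition, it can mislabel scaffolds in either direction.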

Our mitogenome assembly method used an initial classification step to identify mitogenome-like scaffolds in an assembly, which were then used for in silico enrichment of Pacific Biosciences (PacBio) Sequel reads from total genomic DNA (Fig. 3). A variant of this approach that uses only the support vector machine (SVM) to identify the mitogenome from an existing assembly may be adequate for other species (Eldfjell 2018)5, but the sensitivity and specificity of the classification were mediocre for Norway spruce. Assembling the PacBio reads first and identifying the mitogenome afterwards also proved infeasible, because only a tiny fraction of the data originated from mitochondria. I was dumbfounded initially, but this result could be expected because many, if not most, plant mitochondria lack a copy of their genome (Preuten et al. 2010). In barley, tobacco, and Arabidopsis, complete mitogenome copies can be just 10-fold more abundant than the nuclear genome, despite each cell having hundreds of mitochondria (Preuten et al. 2010). Assuming spruce needles hold a similar ratio6, only 0.25% of the total base pairs in a cell originate from the mitogenome! In this light, it is unsurprising that my attempts to assemble the mitogenome directly from such a heterogeneous and complex pool of reads produced terrible results.

Figure 3. Strategy used to assemble the bacterial-size Norway spruce mitogenome.

5 For example, the SVM had high recall and low false discovery rates for Abies, Juniperus, Gnetum, and Taxus.
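
The 0.25% figure is a back-of-envelope calculation. The short check below takes the copy numbers and genome sizes in footnote 6 at face value: two nuclear copies of 19,600 Mb, 20 mitogenome copies of 4.9 Mb, and 2,000 plastome copies of 0.124 Mb. It treats them as a per-cell census, which is a simplifying assumption.

```python
# Copy numbers and genome sizes (Mb) taken from footnote 6; an assumed per-cell census.
compartments = {
    "nuclear": (2, 19_600.0),
    "mitochondrial": (20, 4.9),
    "plastid": (2_000, 0.124),
}


def base_pair_fractions(comps):
    """Return each compartment's share of the total DNA (bp) in one cell."""
    totals = {name: copies * size for name, (copies, size) in comps.items()}
    grand_total = sum(totals.values())
    return {name: bp / grand_total for name, bp in totals.items()}


fractions = base_pair_fractions(compartments)
for name, frac in fractions.items():
    print(f"{name:>13}: {frac:.4%}")  # the mitochondrial share is about 0.25%
```

The mitochondrial share works out to 98 Mb of roughly 39,546 Mb, just under 0.25%. The plastomes contribute about 0.63%, despite their small size.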

Assembling the long reads enriched for mitochondrial-like sequences proved to be a workable solution. The lenient criteria used for in silico enrichment retained poorly conserved intergenic regions, while the contiguity of the resulting assemblies helped to identify and remove dubious contigs afterwards (Fig. 3). In the end, we achieved a respectable assembly measuring 4.9 Mb over four contigs, with each nucleotide covered by an average of 284 reads. Especially considering the in vivo dynamism of the mitogenome, this assembly is only a rough model for the arrangement of mitochondrial sequences. More sophisticated approaches could further illuminate the network of genome structures and their interactions (Jackman et al. 2019; Kozik et al. 2019), but our method can be readily applied to other worst-case scenarios.
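
The exact enrichment criterion is not described here. As one illustration of what "lenient" in silico enrichment can look like, the sketch below recruits a long read if a modest fraction of its k-mers occur in the mitogenome-like seed scaffolds, on either strand. The k-mer approach, the value of `k`, the `min_shared` threshold, and all function names are hypothetical. In practice, an aligner-based filter could serve the same purpose.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """All distinct k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_seed_index(seed_scaffolds, k):
    """k-mers from both strands of the mitogenome-like seed scaffolds."""
    index = set()
    for seq in seed_scaffolds:
        index |= kmers(seq, k)
        index |= kmers(reverse_complement(seq), k)
    return index


def enrich_reads(reads, seed_index, k, min_shared=0.05):
    """Names of reads whose fraction of k-mers found in seed_index >= min_shared.

    A deliberately low (hypothetical) min_shared keeps reads that only partly
    overlap the seeds, e.g. reads extending into poorly conserved intergenic DNA.
    """
    kept = []
    for name, seq in reads.items():
        read_kmers = kmers(seq, k)
        if not read_kmers:
            continue
        shared = len(read_kmers & seed_index) / len(read_kmers)
        if shared >= min_shared:
            kept.append(name)
    return kept


seeds = ["ACGTTGCAAGGCTTAC"]
reads = {
    "r_mito": "ACGTTGCAAGG" + "T" * 10,            # partly overlaps a seed
    "r_minus": reverse_complement("GCAAGGCTTAC"),  # seed region, opposite strand
    "r_nuclear": "C" * 16,                         # shares no k-mers
}
index = build_seed_index(seeds, k=5)
print(enrich_reads(reads, index, k=5))
```

Lowering `min_shared` trades specificity for sensitivity. Any nuclear reads it lets through would then have to be recognized and removed after assembly, as described above.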

Mitogenome size variation

Like its nuclear counterpart, the Picea mitogenome is exceptionally large but unremarkable in gene content (Jackman et al. 2015; Sullivan et al. 2019; Jackman et al. 2019). Although repetitive sequences in Picea sum to almost twice the size of an average plant mitogenome, the proportion of repetitive DNA is perfectly average (Dong et al. 2018; Sullivan et al. 2019). Nor is Picea an outlier: there seems to be no strong correlation between mitogenome size and repeat proliferation in gymnosperms (Sullivan et al. 2019), Silene species (Sloan et al. 2012), or the Fabaceae (Choi et al. 2019). However, repeat expansion may be the dominant source of size variability in cucurbits (Alverson et al. 2010, 2011; Rodríguez-Moreno et al. 2011).

6 Assuming two copies of the 19,600 Mb nuclear genome, 20 copies of the ~4.9 Mb mitogenome, and 2,000 copies of the 0.124 Mb plastome.

Mitogenome size variation in plants may be driven by recurrent intergenomic sequence transfers (Goremykin et al. 2012). In practice, testing this hypothesis would be challenging even with a complete set of genomes, because DNA shared by the nucleus and mitochondrion may have originated in either genome. Long terminal repeat retrotransposons (LTRs), a class of transposable element (TE) particularly abundant in plant nuclear genomes, may act as directional markers because no evidence suggests they remain active in the mitogenome. If this is the case, then LTR fragments unambiguously indicate transfer from the nuclear genome, although whether their abundance scales proportionally with the total amount of transferred DNA is unknown.

Figure 4. Larger mitogenomes contain proportionally more transposable element (TE) sequences. Data points are from the Fabaceae (Choi et al. 2019), Solanaceae (Gandini et al. 2019), and gymnosperms (Sullivan et al. 2019).

Consistent with the intergenomic transfer hypothesis, larger mitogenomes contain disproportionately more nuclear TE sequences (Fig. 4). Note that these values were collated from the literature, and TE estimates may vary systematically by study. For example, Kan et al. (2020) analyzed some of the same genomes as Sullivan et al. (2019) but found very different TE abundances. A second important caveat is that these estimates have not been corrected for phylogenetic non-independence, and the data come from only three lineages: Fabaceae (Choi et al. 2019), Solanaceae (Gandini et al. 2019), and gymnosperms (Sullivan et al. 2019). Although preliminary, this result suggests that a standardized and more detailed investigation of the intergenomic transfer hypothesis, using TEs as markers, could be fruitful.

Plastomes

Land plant plastids are an oasis of tranquility in an otherwise chaotic genomic landscape. While exceptions occur and will continue to be discovered (reviewed in Chaw et al. 2018; Mower & Vickrey 2018; Ruhlman & Jansen 2018), the plastome is highly conserved in size, gene content, structure, and substitution rate. For a change of pace, plastomes in gymnosperms tend to be smaller than in angiosperms, averaging 120 genes over just 130,000 bp (130 kb; Ginkgo and cycads are exceptions; Chaw et al. 2018). Angiosperm plastomes have retained a pair of 15-30 kb inverted repeats (IRs) from the common ancestor of land plants, while conifers (Pinaceae, Cupressaceae, and gnetophytes) have lost one copy, possibly in two independent events (Wu et al. 2011). The IR is involved in DNA repair through intramolecular recombination and gene conversion (Bock 2007; Day & Madesis 2007; Zhu et al. 2016) and may be important to genome stability, which could explain why plastome arrangements are more diverse7 in conifers than in angiosperms (Wu et al. 2011).

In conjunction with this straightforward genome evolution, patterns of plastome inheritance seem quite simple, with a few exceptions. Or rather, seemed (Wolfe & Randle 2004; Sullivan et al. 2017; Ramsey & Mandel 2019). Until recently, the plastome copies in an organism appeared to be rather strictly uniparentally inherited, uniform in sequence, and non-recombining. Moreover, plastome paralogs seem quite rare, especially compared to the large gene families found in the nucleus or the extensive intergenomic transfers to the mitogenome (Goremykin et al. 2012; Sullivan et al. 2019). When these assumptions hold, the plastome is expected to unambiguously reflect a single parental lineage, and phylogenetic analyses can focus on this pattern of descent without worrying about the intricacies of genome evolution. It’s no exaggeration to say plastome sequences completely revolutionized plant systematics and taxonomy (e.g. APG 1998, 2003, 2009), and you’d be forgiven for using plastid loci without expecting any evolutionary curveballs.

7 Although the structural diversity in conifers is sometimes described with phrases like “extensively rearranged,” this is relative to angiosperm plastomes. Even the most rearranged plastome does not approach the degree of gene shuffling in the Picea mitogenome.

We sequenced 65 plastomes from 35 spruce species with the aim of reconstructing the pattern of pollen-mediated, patrilineal descent. I then planned to compare this to the history inferred from the maternally inherited mitogenome to detect incongruences that could be attributed to interspecific hybridization (e.g. Bouillé et al. 2011). Instead, I found that the North American spruce species couldn’t be placed in relation to the other clades with any support (Sullivan et al. 2017). Previous studies had achieved high resolution using just a fragment of the plastome (Ran et al. 2006; Bouillé et al. 2011), so I set out to find the source of the problem. First, I checked the most obvious sources of error: were my samples mislabeled? Were my plastome assemblies adequate? Was the sequence alignment awful? Having excluded these possibilities, I then considered two biological explanations: positive selection, and sequence saturation, where repeated mutation has erased the phylogenetic signal (Parks et al. 2012). While directional selection may act on a few codons, excluding these sites did not improve the support of the phylogeny (Sullivan et al. 2017). Finally, I looked at the distribution of topologies produced during the bootstrapping procedure and noticed something curious: two distinct and equally well-supported positions were found for the North American clades, the same two positions alternatively supported by the plastid loci used by Bouillé et al. (2011). While lacking the data to draw more definitive conclusions, they suggested:

“The differential positioning of [the North American clades […] likely indicates cpDNA heterologous recombination from more or less ancient hybridization […]”

Skeptical, I decided to test this hypothesis using a multitude of approaches, which all supported one conclusion: the plastome shared by Norway spruce and a group of other Eurasian species is the product of sexual recombination between two divergent lineages originating in North America, potentially through Jezo spruce (P. jezoensis) as an intermediary (Fig. 5). Roughly half of the plastome (denoted ‘F1’ and ‘F2’ in Fig. 5) reflects a close relationship between the white (‘glauca’) and Norway (‘abies’) spruce clades, but the other half indicates black spruce (P. mariana) and relatives (‘mariana’) are more closely related to Norway spruce (‘F3’ in Fig. 5). Subsequent analyses of the nuclear genome have found that the Norway spruce clade is related to white spruce through vertical descent, but it has also received numerous genes from the black spruce clade through hybridization (Feng et al. 2018).

Figure 5. Phylogenies inferred from the structural regions of the Picea plastome are strongly discordant. While the F3 region supports a sister relationship between the white spruce (“glauca”) and Norway spruce (“abies”) clades, the F2 and F1 regions suggest “abies” and the black spruce clade (“mariana”) share a more recent common ancestor.

Sexually recombinant plastomes could be more common than currently appreciated, especially in the ~15% of plant genera where hybridization occurs (Whitney, Ahern, et al. 2010), because mechanisms maintaining uniparental organelle inheritance tend to break down when parents are genetically distant (Hansen et al. 2007; Ellis et al. 2008; Barnard-Kubow et al. 2016). Since our study, intraspecific plastome recombination has been reported in Cupressaceae conifers (Zhu et al. 2018). Previous studies that found phylogenetic conflict among plastid loci (Erixon & Oxelman 2008; Parks et al. 2012) suggest more cases of sexual recombination may be found on reexamination. Studies like ours have helped challenge the traditional view of plastomes as ‘simple’ genomes and prompted calls for more nuanced analyses of their evolutionary history (Walker et al. 2019; Gonçalves et al. 2019; Ramsey & Mandel 2019).

Phylogenetics

Phylogenetics seeks to understand the evolutionary relationships among species. Placing these relationships in the context of geologic time and space is often also a goal, as is inferring when, where, and how many times a trait – for example, the presence or absence of vessels8 or the ability to tolerate extreme cold – appeared in history. Progress towards these goals has drawn on two historically separate pools of evidence: the fossil record and homologous gene sequences.

The fossil history of spruce

Pinaceae fossils are abundant thanks to their woody seed cones, bracts, and twigs and their tough, resinous leaves. However, having tissues amenable to preservation is only the first step in a lengthy, stochastic process of becoming part of the fossil record. The dead organism must also be buried in a suitable substrate, and this sedimentary coffin must survive weathering and reworking well enough to be assigned to a geologic time period (Matthews et al. 2017). If this process is successful, the fossil must then be discovered and identified by anatomical and morphological traits (Matthews et al. 2017; Gernandt et al. 2018). If we seek to understand the relationships of these fossils to living taxa, then the morphological characteristics preserved in the fossils must also be measured in extant taxa and assessed for their ability to serve as a proxy for evolutionary relationships (Gernandt et al. 2016, 2018). While this may seem pedantic, consider that the discovery and description of P. burtonii in 2012 promised to extend the Picea fossil record by 75 million years (Klymiuk & Stockey 2012), but subsequent comparative studies found the taxonomic affinities of this fossil to be ambiguous (Gernandt et al. 2016, 2018). Overall, our understanding of morphological features and character evolution among living and fossil Pinaceae is still too poor to fully apply the >120 putative Picea fossils (LePage 2001) towards understanding the evolution of extant lineages (Gernandt et al. 2018). The situation is similar for the rest of Pinaceae and other gymnosperms: while fossils are fortuitously abundant, few can be unambiguously assigned to extant genera (Gernandt et al. 2016).

8 A type of water-conducting cell found in some plants.

Table 3. The oldest Picea fossils found by geographic region. The two oldest, P. burtonii and P. farjonii, may instead belong to a genus closely related to Picea.

Region                  Species               Stratigraphy        Age (Mya)   Locality
North America (>60°N)   P. burtonii (1)       Lower Cretaceous    136         Vancouver Island, Canada
North America (>60°N)   P. nansenii (2)       Eocene              45          Axel Heiberg Island, Canada
North America (>60°N)   P. palustris (2)      Eocene              45          Axel Heiberg Island, Canada
North America (>60°N)   P. svedruppi (2)      Eocene              45          Axel Heiberg Island, Canada
North America (<60°N)   P. coloradensis (2)   Eocene              45          Idaho, USA
Asia (>87°E)            P. farjonii (3)       Lower Cretaceous    112         Tevshiin Govi, Mongolia
Asia (>87°E)            P. snatolensis (2)    Late Eocene         38-34       Kamchatka, Russia
Asia (<87°E)            P. mugodzharica (2)   Early Oligocene     32-30       Mugodzhar Hills, Kazakhstan
Europe (<55°E)          P. beckii (2)         Late Oligocene      24-26       Lausitz, Germany

Sources: (1) Klymiuk & Stockey (2012); (2) LePage (2001); (3) Herrera et al. (2016).

While the two oldest Picea-like fossils are not without controversy (Gernandt et al. 2016, 2018), the exquisitely preserved seed cones and foliage from the Canadian High Arctic dated to 45 Mya might be the oldest that can be definitively assigned to the genus (LePage 2001; Table 3). However, calibrated molecular clocks have placed the divergence between Pinus and Picea in the Late Jurassic or Early Cretaceous (e.g. Willyard et al. 2007; Nystedt et al. 2013; Lockwood et al. 2013), which makes placing the two oldest spruce-like fossils in the genus particularly appealing (Table 3). Leaving these uncertainties aside, the distribution and abundance of Eocene and later fossils suggest that extant lineages originated in the North American Cordillera or the Arctic Archipelago and migrated to Asia across the Bering Land Bridge as the Eocene thermal maximum subsided (Ledig et al. 2004). From there, spruce dispersed to Europe as the climate continued to cool. This fossil scenario accords well with biogeographic reconstructions from molecular data (Ran et al. 2006, 2015; Shao et al. 2019), although the high extinction rate in Pinaceae should ward off the temptation to interpret any of the fossil taxa as the ancestors of surviving lineages (Crisp & Cook 2011; Leslie et al. 2012).

Phylogenetics: gene trees and species trees

Each gene9 in an organism shares a correlated but not identical history (Maddison 1997). Imagine a population of haploid, clonal organisms with equal fitness that produce a random number of offspring each generation. Population size remains constant, and the random number of offspring per individual introduces stochasticity into the genealogy. In this scenario, called a Fisher-Wright model, the probability that any two offspring share a parent in the previous generation is 1/N, where N is the number of individuals (Nordborg 2003). Conversely, the probability that they do not share an ancestor in the previous t generations is $(1 - 1/N)^t$, which rapidly approaches zero as t increases (Rosenberg 2002).
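
The two probabilities above can be checked numerically. The following is a minimal sketch that assumes the haploid Fisher-Wright setting just described: each offspring picks its parent uniformly at random from the N individuals of the previous generation. The function names and the values N = 100 and t = 50 are illustrative only.

```python
import random


def prob_no_shared_ancestor(n: int, t: int) -> float:
    """Analytic probability that two lineages have not coalesced within t generations."""
    return (1 - 1 / n) ** t


def simulate_no_shared_ancestor(n: int, t: int, reps: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check: trace two lineages back t generations.

    Each generation, both lineages independently pick a parent uniformly from
    the n individuals; they coalesce if they pick the same one (probability 1/n).
    """
    rng = random.Random(seed)
    not_coalesced = 0
    for _ in range(reps):
        for _ in range(t):
            if rng.randrange(n) == rng.randrange(n):
                break
        else:  # loop finished without the lineages sharing a parent
            not_coalesced += 1
    return not_coalesced / reps


n, t = 100, 50
print(prob_no_shared_ancestor(n, t), simulate_no_shared_ancestor(n, t))
```

For N = 100, the probability of not yet sharing an ancestor is about 0.61 after 50 generations. It falls below 0.01 after 500 generations, since $(1 - 1/N)^t \approx e^{-t/N}$.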

More often, we are interested in the history of populations rather than that of any given gene. In sexual organisms, recombination and reproduction uncouple genomic regions and randomly shuffle them into offspring, generation after generation (Maddison 1997). A single gene genealogy, or gene tree, reflects only a single realization of this stochastic process. Extending this scenario to multiple groups with shared ancestry introduces the possibility that a gene sampled in one population may coalesce first with a lineage from a different population. The probability of this outcome increases with population size and decreases with the number of generations since divergence. This phenomenon is termed incomplete lineage sorting (ILS).

9 Here, a stretch of DNA internally free from recombination.

Under some combinations of population sizes and divergence times, the most likely gene tree does not mirror the species or population history (Degnan & Rosenberg 2006). More generally, ILS is pervasive, and simply taking the consensus of a large sample of gene trees provides no guarantee that the correct species tree will be recovered (Edwards 2009; Roch & Steel 2015). Building upon Kingman’s (1982) coalescent approximation of the Fisher-Wright model, the multispecies coalescent (MSC) treats gene trees as random variables generated by the shared ancestry of a group of populations, that is, a species tree (Rannala & Yang 2003; Degnan & Rosenberg 2009; Liu et al. 2009). The most sophisticated implementations of the MSC coestimate gene trees and species trees along with population sizes and divergence times (e.g. Heled & Drummond 2009). These methods can accurately infer species trees even under extremely high levels of ILS, but as a tradeoff, they demand considerable sequence data and computational resources (Ogilvie et al. 2016), which often makes them impossible to apply to real datasets (Liu et al. 2015).

Genes in Picea can reflect a huge number of different histories because spruces have large population sizes, long lifespans (Chen et al. 2010), and a history of rapid diversification (Feng et al. 2018; Shao et al. 2019). Given realistic estimates of population sizes and generation times, genes in spruce take about 45-55 million years to sort after speciation (Chen et al. 2010), yet all extant species can be traced to a common ancestor living ca. 20 Mya (Feng et al. 2018; Shao et al. 2019). A gene sampled from the three most distantly related species has a 70%10 chance of matching the topology of the species tree (Hudson 1983; Nei 1986). For more closely related species, for example Norway spruce and two relatives, that probability is about 55%. While these odds might sound favorable, they quickly decline to almost nothing as more taxa are added (Degnan & Rosenberg 2006). Any species tree with more than five taxa is expected to have at least one relationship that produces incongruent gene trees more often than not (Degnan & Rosenberg 2006). Such relationships can still be resolved, but not without coalescent models and considerable effort (e.g. Cloutier et al. 2019).
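
The three-taxon probabilities quoted above follow from a standard coalescent result (Hudson 1983; Nei 1986). Let t be the internal branch of the species tree, in coalescent units. The two lineages from the sister species coalesce on that branch with probability $1 - e^{-t}$. Otherwise, all three lineages enter the ancestral population, where each of the three topologies is equally likely. Together these give $P = 1 - \tfrac{2}{3}e^{-t}$.

The sketch below evaluates this formula, inverts it to find the branch length implied by a given probability, and checks it by simulation. Function names are illustrative, and branch lengths are taken as already expressed in coalescent units.

```python
import math
import random


def p_concordant(t: float) -> float:
    """Probability a three-taxon gene tree matches the species tree (Hudson 1983; Nei 1986)."""
    return 1 - (2 / 3) * math.exp(-t)


def branch_length_for(p: float) -> float:
    """Invert p_concordant: the internal branch (coalescent units) giving probability p."""
    if not (1 / 3 < p < 1):
        raise ValueError("p must lie strictly between 1/3 and 1")
    return math.log((2 / 3) / (1 - p))


def simulate_concordance(t: float, reps: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo version of the same two-step argument."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Waiting time until the two sister lineages coalesce (rate 1 per coalescent unit).
        if rng.expovariate(1.0) < t:
            hits += 1  # coalesced on the internal branch: gene tree matches
        elif rng.randrange(3) == 0:
            hits += 1  # ancestral population: 1 of 3 equally likely topologies matches
    return hits / reps


for p in (0.70, 0.55):
    t = branch_length_for(p)
    print(f"P = {p:.2f} -> t = {t:.3f} coalescent units, simulated P = {simulate_concordance(t):.3f}")
```

Under this formula, the 70% and 55% figures correspond to internal branches of roughly 0.80 and 0.39 coalescent units. Even an infinitely short branch still yields the matching topology one-third of the time.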

Coalescent species trees have been published for Picea only within the last two years (Feng et al. 2018; Shao et al. 2019). Notably, coalescent methods and those assuming an absence of ILS yield quite different species tree estimates when applied to the same data (Feng et al. 2018; Shao et al. 2019). Although the relationships of some lineages differ between the two studies, the coalescent species tree estimates from each study are more congruent with each other than those produced with alternative (i.e., concatenation) methods (Feng et al. 2018; Shao et al. 2019). These results lend confidence to the accuracy of their species tree estimates, provided the assumptions of the MSC model hold.

ILS is not the only cause of gene tree heterogeneity, although it may be the most pervasive (Edwards 2009). Horizontal gene transfer, gene duplication, natural selection, recombination, and hybridization (Degnan 2018) also generate variation in gene genealogies. Hybridization – here meaning the effective transfer of genes from one species to another through sexual reproduction – and horizontal gene transfer are unique because they create gene trees incompatible with the species tree. Under ILS, incongruent gene trees are still produced by a single bifurcating species history. In contrast, hybridization introduces genes that evolved along a separate history before being transferred into a different species. Importantly, the standard MSC model assumes ILS is the sole source of gene tree heterogeneity. One might expect such model violations simply to reduce support for nodes involving hybrid lineages, but gene flow instead seems to produce highly supported, but wrong, species tree estimates (Leaché et al. 2013; Solís-Lemus et al. 2016).

10 Using the equation $P = 1 - \frac{2}{3}e^{-t}$, where t is the internal branch length in coalescent units, converted from branch lengths reported in Feng et al. (2018) assuming Ne = 100,000 and a generation time of 50 years.

Hybridization is common in plants (Whitney, Ahern, et al. 2010) and can be a major source of error in species tree estimates. Identifying hybridization, however, is difficult because ILS can be mistakenly interpreted as gene flow between species (Muir & Schlötterer 2005). In spruce and other conifers, the different uniparental genealogies of the mitogenome and plastome can be a useful tool for identifying the more prominent cases. Although the mitogenome and plastome are essentially single-gene estimates (but see Sullivan et al. 2017; Zhu et al. 2018), their predominantly uniparental inheritance and clonal replication give them an effective population size of just ~25-50%11 of that of the nuclear genome. All else being equal, organelle genomes are therefore expected to reach reciprocal monophyly and share a topology with the species tree much faster than nuclear genes.
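
The effect of a smaller effective population size can be made concrete with the standard three-taxon result $P = 1 - \tfrac{2}{3}e^{-t}$ (Hudson 1983; Nei 1986), where t is the internal branch length in coalescent units. The sketch below relies on a simplifying assumption: a locus whose effective size is a fraction f of the nuclear value experiences the same branch (in generations) as t/f coalescent units. The values f = 1/2 and 1/4 follow footnote 11. The nuclear branch length of 0.4 is illustrative only.

```python
import math


def p_concordant(t: float) -> float:
    """Three-taxon probability that a gene tree matches the species tree."""
    return 1 - (2 / 3) * math.exp(-t)


def organelle_branch(t_nuclear: float, ne_fraction: float) -> float:
    """Assumed rescaling: a locus with ne_fraction of the nuclear effective size
    sees the same branch (in generations) as t_nuclear / ne_fraction coalescent units."""
    return t_nuclear / ne_fraction


t_nuc = 0.4  # illustrative nuclear internal branch, in coalescent units
for label, f in [("nuclear", 1.0),
                 ("hermaphrodite organelle (1/2 Ne)", 0.5),
                 ("dioecious organelle (1/4 Ne)", 0.25)]:
    t = organelle_branch(t_nuc, f)
    print(f"{label:>34}: t = {t:.2f}, P(match) = {p_concordant(t):.3f}")
```

With these illustrative values, the chance of matching the species tree rises from about 0.55 for a nuclear gene to about 0.70 and 0.87 for the two organelle cases.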

In spruce, plastid and mitochondrial gene trees suggest markedly different evolutionary histories. Maternally inherited loci show a strong geographic pattern: mitogenomes of proximate species coalesce first (Bouillé et al. 2011; Ran et al. 2015). For example, the European spruce species – Caucasian (P. orientalis), Serbian (P. omorika), and southern populations of Norway spruce – share a more recent common ancestor in their mitogenomes (Sullivan, A.R., unpublished data) than in their plastomes (Ran et al. 2015; Sullivan et al. 2017) or nuclear genomes (Feng et al. 2018; Shao et al. 2019). The paternally inherited plastome is more complicated because it has undergone sexual recombination in Picea (Sullivan et al. 2017). Overall, however, it shows a less geographically structured topology and suggests surprising links between North American, Eurasian, and Asian taxa that may be attributable to long-distance pollen dispersal (Bouillé et al. 2011; Ran et al. 2015; Sullivan et al. 2017). Notably, the strongest disagreements between the species trees of Feng et al. (2018) and Shao et al. (2019) involve species with radically different placements on mitogenome and plastome gene trees. While these patterns suggest hybridization, such incongruences may alternatively be explained by ILS (Wang et al. 2018). Separating the two requires more testing, such as statistical comparisons of divergence times and coalescent model testing (e.g. Folk et al. 2016; Sousa et al. 2017; Lee-Yaw et al. 2019).

11 The effective population size of an organelle depends on the mating system. For dioecious plants, it should be about ¼Ne of the nuclear genome, and for hermaphroditic plants with equal parental contributions, about ½Ne.

Hybridization

Phylogenetic relationships among many plants are more shrub-like than tree-like (Goulet et al. 2017). Gene flow between lineages is the main source of reticulate evolution, but this phenomenon comes in many flavors, depending on when it occurred, how much, how often, and between whom. The vocabulary used to describe patterns of gene flow is often ambiguous and varies among studies, but four main modes of hybridization can be discerned, each of which can be divided into more nuanced groups:

Introgressive hybridization: genes are incorporated from one species into another through interbreeding and backcrossing without creating a new population (Folk et al. 2018). Introgression is expected to be a gene-specific pattern rather than a genome-wide one (Harrison & Larson 2014).

Admixture: often used as a synonym for introgressive hybridization (e.g. Dannemann & Racimo 2018) but generally implies more extensive (Kronforst et al. 2006) or more recent (Kim et al. 2018) gene flow that results in a hybrid population. With continued backcrossing, populations that are admixed today may eventually come to resemble introgressed populations.

Hybrid speciation: this could be considered a special case of admixture where gene flow results in a new, distinct lineage, often through polyploidization (Folk et al. 2018). Some argue that hybridization per se must be the cause of divergence (Schumer et al. 2014).

Isolation with Migration: refers to gene flow between sister lineages (Nielsen & Wakeley 2001). This could occur during the early stages of divergence, when genes are exchanged as barriers to reproduction are gradually established, or after populations have evolved independently for some period and come into contact again.

All these modes of hybridization can pose a problem for phylogenetic inference. Because isolation with migration occurs between two lineages immediately descended from a common ancestor, gene flow should not alter the species tree topology, but estimates of population sizes and divergence times will be severely biased (Leaché et al. 2013). Gene flow between paraphyletic taxa will influence the species tree estimate depending on the divergence between the lineages and on the timing and magnitude of gene flow. Concerningly, even low rates of gene flow (e.g., 0.01 migrants per generation) can result in inaccurate inferences of the major vertical inheritance signal in a species tree (Solís-Lemus et al. 2016).

Ubiquitous but cryptic?

Detecting hybridization remains a difficult task. Challenges related to genotyping or computation persist, but technological advances have allowed a shift to a more fundamental question: how do we distinguish among different sources of genetic similarity? Many of the methods used for inferring hybridization do not actually distinguish between ILS and hybridization (reviewed in Payseur & Rieseberg 2016). For example, the genetic clustering algorithm STRUCTURE and related programs model drift, not gene flow, and divergent demographic histories among populations can be indistinguishable from hybridization (Lawson et al. 2018). Principal component analysis (PCA) suffers from similar limitations (Lawson et al. 2018). Others, like the ABBA-BABA statistic (Green et al. 2010), are informed by coalescent theory but still require an estimate of a species tree, among other limitations (Elworth et al. 2019). The logic becomes circular: we must exclude hybrids to infer the species tree, but we must infer the species tree to exclude hybrids.
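
The following is a minimal sketch of the ABBA-BABA (D) statistic. It assumes one aligned haploid sequence per taxon, biallelic sites, and an outgroup carrying the ancestral allele. The taxa must be supplied in the order of an assumed tree (((P1, P2), P3), O), which is exactly the species-tree dependence described above. Under ILS alone, ABBA sites (P2 and P3 share the derived allele) and BABA sites (P1 and P3 share it) are expected in equal numbers, so D = (ABBA - BABA)/(ABBA + BABA) should be near zero. An excess of one pattern suggests gene flow. The function name is illustrative, and significance testing (usually a block jackknife) is omitted.

```python
def abba_baba(p1: str, p2: str, p3: str, outgroup: str):
    """Count ABBA/BABA site patterns and return (n_abba, n_baba, D).

    Sequences must be aligned and ordered according to an assumed tree
    (((p1, p2), p3), outgroup). Only sites with exactly two alleles across the
    four taxa are used; the outgroup allele is treated as ancestral ('A').
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    n_abba = n_baba = 0
    for a1, a2, a3, anc in zip(p1, p2, p3, outgroup):
        site = (a1, a2, a3, anc)
        if len(set(site)) != 2 or "-" in site or "N" in site:
            continue  # skip invariant, multiallelic, gapped, or missing sites
        d1, d2, d3 = (a != anc for a in (a1, a2, a3))  # True = derived allele
        if d2 and d3 and not d1:
            n_abba += 1  # P2 and P3 share the derived allele
        elif d1 and d3 and not d2:
            n_baba += 1  # P1 and P3 share the derived allele
    total = n_abba + n_baba
    d_stat = (n_abba - n_baba) / total if total else float("nan")
    return n_abba, n_baba, d_stat


# Toy alignment: two ABBA sites, one BABA, one BBAA, one invariant, one multiallelic.
result = abba_baba("AAGGAA", "GGAGAC", "GGGAAG", "AAAAAA")
print(result)
```

Here, swapping the roles of P1 and P2 flips the sign of D. This is one reason the statistic cannot be interpreted without first committing to a species tree.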

Phylogenetic networks extend the MSC to model ILS and hybridization simultaneously to reconstruct topologies, population sizes, and divergence times (Elworth et al. 2019). Networks consequently avoid the circular logic of specifying a ‘backbone’ tree. They are equally appropriate for modelling relationships among species or populations and are robust to the mode of hybridization (Wen & Nakhleh 2018). Informally, phylogenetic networks use coalescence times and topologies across numerous genes to dissect the signal of hybridization from ILS. A gene coalescing before12 population divergence is consistent with hybridization, as is the asymmetrical frequency of incongruent gene tree topologies (Pamilo & Nei 1988). These predictions inform complex gene tree probability distributions under a given species network (Yu et al. 2014; Wen & Nakhleh 2018). Although phylogenetic networks are a considerable conceptual advance because they place reticulation on an equal footing with speciation, widespread adoption is limited by their ponderous computational demands (Elworth et al. 2019).

12 That is, more recently in time.
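
The asymmetry argument can be illustrated with a deliberately simplified three-taxon model. This model is assumed here for illustration and is not taken from any particular network method. With probability 1 - γ, a gene follows the species tree ((A, B), C), whose internal branch is t1. With probability γ, it follows a hybrid history ((B, C), A), whose internal branch is t2. Both branches are in coalescent units. Under ILS alone, the two minority topologies are equally frequent; mixing in the hybrid history inflates one of them. The symbols γ, t1, and t2 and the function names are illustrative. Network implementations such as PhyloNet compute gene tree probabilities under far more general models.

```python
import math

# Topology labels name the sister pair in a rooted triplet, e.g. "AB|C" = ((A, B), C).
TOPOLOGIES = ("AB|C", "BC|A", "AC|B")


def triplet_probs(sister: str, t: float) -> dict:
    """ILS-only gene tree probabilities for a species tree with sister pair `sister`
    and internal branch t (coalescent units)."""
    minor = math.exp(-t) / 3
    return {topo: (1 - 2 * minor if topo == sister else minor) for topo in TOPOLOGIES}


def network_triplet_probs(t1: float, t2: float, gamma: float) -> dict:
    """Hypothetical two-history mixture: species tree AB|C (weight 1 - gamma)
    and hybrid history BC|A (weight gamma)."""
    tree = triplet_probs("AB|C", t1)
    hybrid = triplet_probs("BC|A", t2)
    return {topo: (1 - gamma) * tree[topo] + gamma * hybrid[topo] for topo in TOPOLOGIES}


ils_only = network_triplet_probs(t1=0.5, t2=0.5, gamma=0.0)
with_gene_flow = network_triplet_probs(t1=0.5, t2=0.5, gamma=0.2)
print({k: round(v, 3) for k, v in ils_only.items()})
print({k: round(v, 3) for k, v in with_gene_flow.items()})
```

Without gene flow, BC|A and AC|B occur at the same frequency. With γ = 0.2, BC|A becomes noticeably more common than AC|B, which is the asymmetry that network methods exploit.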

Promiscuous spruce

Figure 6. Hybridization network of 21 species based on controlled crosses. Species are considered to cross ‘readily’ if there are no apparent intrinsic barriers. ‘Cross successfully’ indicates reduced fecundity relative to monospecific crosses, while crosses are ‘difficult’ if viable seedlings are rarely produced. Reciprocal crosses were not tested in most cases. Data summarized from Wright (1955), Fowler (1983), Gordon (1986), Fowler (1987), and Ledig et al. (2004).

Evidence for hybridization in Picea comes from molecular studies and controlled crosses (Fig. 6). Both lines of evidence have their limitations. While a controlled cross unequivocally demonstrates species interfertility, the ca. 30 years to sexual maturity in spruce usually precludes testing hybrid viability. More generally, the strength of pre- and post-zygotic isolation often depends on ecological context (Widmer et al. 2009) and most controlled crosses, at least in long-lived trees, lack this component. Molecular studies can potentially identify effective gene flow, even at very low or variable levels (e.g. Sullivan et al. 2016; Hamilton et al. 2014), yet they generally are not proof positive because of the difficulty of distinguishing hybridization from other processes.

Genomic evidence of natural hybridization is available for a growing number of spruce taxa (Li et al. 2010; Hamilton et al. 2013; Zou et al. 2013; de la Torre et al. 2014; de Lafontaine et al. 2015; Sun et al. 2015). Many more crosses are attested by discordant mitogenome and plastome topologies (e.g. Du et al. 2009). While studies are increasingly ambitious, phylogenetic networks are still out of reach for most. Instead, many combine hypothesis testing and model selection to compare coalescent histories with and without gene flow. This approach can be powerful and rigorous, but in practice, hybridization scenarios often violate model assumptions (Hey 2009; Strasburg & Rieseberg 2010; Cruickshank & Hahn 2014; Hey et al. 2015). In the case of the arbitrarily complex models possible in fastsimcoal2 (Excoffier et al. 2013), the utility of the ‘best’ model depends on the quality of the enumerated and tested alternatives (Ruffley et al. 2018).

Phylogenetic network methods may simply refine estimates of gene flow and divergence, but they could also more radically challenge our understanding of hybridization. Most new insights will probably come from detecting ancient introgression among the deeper branches of the species phylogeny, where population genetic approaches have limited power. In a major step in this direction, Feng et al. (2018) analyzed patterns of identity by descent (IBD) and gene tree frequencies in Picea and concluded that species trees fail to capture the reticulate relationships among the major clades. Among extant taxa, controversial cases are obvious candidates for revision (cf. Li et al. 2010; Sun et al. 2014). Applying network models to well-supported examples may also reveal, for example, that gene flow occurred only between the ancestors of two species and that contemporary hybridization is absent.

The rise of Norway spruce

Climatic revolutions and the last trees standing

Three million years ago, the Earth’s climate took a turn for the worse.13 The narrow strip of water separating the Americas yielded to accumulating volcanic rocks, eventually changing the global energy cycle and creating conditions ripe for glaciation (Bartoli et al. 2005). Coniferous forests lined the Arctic islands throughout the Pliocene, but continental glaciation progressed rapidly, and from ca. 1.7 Mya onward, global temperatures never again exceeded modern values (Repenning & Brouwers 1992; Jost et al. 2009). The Ice Ages had begun.

Forests in Europe during the late Pliocene held much more diversity than today, including taxa no longer present such as Ginkgo and Sequoia (Kvaček et al. 2019). At least three species of spruce are well represented by cone and foliage remains in central Europe, which resemble tigertail (P. torano, restricted to Japan), Serbian, and Norway spruce (Mai & Wähnert 2000; Teodoridis et al. 2009; Kvaček et al. 2019). Trees similar to Norway spruce were widely distributed throughout the Russian Plain (Velichkevich & Zastawniak 2003). Further east, mixed species-rich coniferous forests prevailed in the Urals and western Siberia, replacing the semi-desert vegetation of the Miocene and early Pliocene (Puchkov & Danukalova 2009; Volkova 2011). But by 1.65 Mya, arctic tundra vegetation dominated western Siberia (Volkova 2011). In Europe, trees like cypress (Taxodium) and hemlock (Tsuga) disappeared, and forests started to resemble those of the current day (Couvering 1996; Magri & Palombo 2013; Bizzarri et al. 2018).

13 I sit here writing in January at 64°N and feel inclined to side with the geologists who describe warm periods as ‘optima’ and cold as ‘degradation’.

The first step towards the development of the modern spruce forests across the Eurasian taiga occurred against this backdrop. Today, the Norway-Siberian spruce complex can be divided into four genetic groups (Fig. 7), roughly corresponding to 1) Norway spruce in the boreal forest of Fennoscandia and western Russia, 2) Norway spruce in montane to subalpine habitats in continental Europe, 3) a somewhat enigmatic group around the Ural Mountains and western Siberia, and 4) trees distributed across the rest of Siberia, the aptly named Siberian spruce (P. obovata). This genetic structure is well supported (Lagercrantz & Ryman 1990; Tollefsrud et al. 2008; Tollefsrud, Latałowa, van der Knaap, et al. 2015; Tsuda et al. 2016; Sullivan et al. Paper III), but the relationships, divergence times, and the amount and timing of hybridization among these groups are less certain.

Figure 7. Nuclear genetic structure of the Norway-Siberian spruce complex supports four distinct populations corresponding to southern P. abies (red), northern P. abies (yellow), P. obovata (blue), and their putative hybrid zone (olive). Numbers correspond to sampling locations, with details given in Table S1 of Sullivan et al. Paper III.

We used the phylogenetic network model for biallelic single nucleotide polymorphisms (SNPs) implemented in PhyloNet (Than et al. 2008; Zhu et al. 2018; Zhu & Nakhleh 2018) to reconstruct the history of these four spruce populations (Fig. 8). Coincident with the beginning of persistent global cooling cycles of increasing severity (Repenning & Brouwers 1992), gene flow into the Ural-Siberian lineage ceased simultaneously from both the ancestor of Norway spruce and the other Siberian spruce lineage (P. obovata sensu stricto; Fig. 8). The Ural-Siberian lineage is often inferred to be the product of active hybridization between Norway and Siberian spruce, but the network does not support this conclusion. Instead, the lineage arose from much older gene flow and has since diverged in relative isolation (Sullivan et al. Paper III).

Figure 8. Maximum likelihood phylogenetic networks show massive gene flow into northern Picea abies from a P. obovata-like taxon (A, B) and support a reticulate history for the P. obovata-like Ural-Siberian lineage (C). Branch lengths are in substitutions per site. Inheritance probabilities, that is, the probabilities that a gene originates from each parent, are denoted by γ.
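The inheritance probability γ has a simple generative reading. The sketch below is a minimal illustration, assuming unlinked loci and a single sampled lineage per locus, so that coalescence within branches can be ignored: at a reticulation node, each locus traces back to the introgressing parent with probability γ and to the other parent with probability 1 − γ. The function names and the value γ = 0.4 are hypothetical, chosen only to echo the roughly 40% introgressed fraction reported for northern Norway spruce in Sullivan et al. Paper III; this is neither PhyloNet code nor the fitted estimate.

```python
import random


def trace_loci(n_loci: int, gamma: float, seed: int = 1) -> list[bool]:
    """Return, for each unlinked locus, whether it descends from the
    introgressing parent of a reticulation node.

    Hypothetical simplification: one sampled lineage per locus, so each
    locus independently follows the introgressing parent with
    probability gamma and the other parent with probability 1 - gamma.
    """
    rng = random.Random(seed)
    return [rng.random() < gamma for _ in range(n_loci)]


def introgressed_fraction(origins: list[bool]) -> float:
    """Share of loci inherited through the reticulation (introgression) edge."""
    return sum(origins) / len(origins)


# Illustrative gamma only; not the value fitted in Sullivan et al. Paper III.
origins = trace_loci(n_loci=10_000, gamma=0.4)
print(f"introgressed fraction: {introgressed_fraction(origins):.3f}")
```

Averaged over many loci, the introgressed fraction converges on γ, which is why γ can be read as the expected share of the genome inherited through a reticulation edge; actual network inference must additionally model coalescence within each branch.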

The Earth suddenly cooled again ca. 900 Kya, during the first of the 100,000-year cycles established during the Mid-Pleistocene Transition (Clark et al. 2006; Hughes & Gibbard 2018). Oceanic energy circulation weakened, global CO2 declined, and ice sheets thickened (Hughes & Gibbard 2018). Contiguous glaciation in Europe reached as far as Moscow during the glacial maxima of 790–700 Kya, and nearly to the Caspian Sea around 621 Kya, during the most extensive of all Pleistocene glaciations. Around this time, the northern and southern Norway spruce populations became isolated, and they likely remained so until the Holocene (Tollefsrud et al. 2009).

Severed from the southern population and perhaps undergoing population decline (Heuertz et al. 2006) and range shifts, northern Norway spruce began to receive massive gene flow from the Ural-Siberian lineage. By the time gene flow ceased around 240 Kya, about 40% of the nuclear genome and both organelle genomes of northern Norway spruce had been introgressed from Ural-Siberian spruce or P. obovata sensu stricto (Sullivan et al. Paper III). Today, these trees live in Sweden and Norway, and more extensive population sampling indicates increasing Siberian ancestry in more eastern forests and possibly ongoing hybridization (Tsuda et al. 2016; Chen et al. 2019). In contrast, gene flow between north and south is still limited, at least among trees selected to mitigate the influence of forestry over the last 200 years (Myking et al. 2016; Chen et al. 2019).

Plant and animal communities in Europe underwent two major periods of rapid turnover, centered around 1.7 Mya14 and 850 Kya (Palombo 2014; Magri et al. 2017). The divergence time between Ural-Siberian spruce and P. obovata, and that between the northern and southern Norway spruce populations, correspond satisfyingly well to these broader biogeographic patterns. The phylogenetic network is also surprisingly congruent with a demographic model that Tsuda et al. (2016) fit to 10 microsatellites (SSRs). I suspect they would also be surprised, because this model differed considerably from a second model they estimated from 22 nuclear genes (Tsuda et al. 2016), which should be more robust given the myriad uncertainties in scoring and analyzing microsatellites (Putman & Carbone 2014). Both models from Tsuda et al. (2016) and our phylogenetic network are incongruent with the topology and divergence times proposed by Chen et al. (2019). Although that study used an identical genotyping method, our filtering strategies differed considerably. For example, they included genotypes with coverage as low as 2–4x, whereas I required a minimum of 30x. Sequencing error can probably explain some of the differences between our studies, because any spurious alleles must still be accommodated by a demographic model. Unfortunately, neither study explains its choice of a priori species tree topology, and it is unclear which alternative scenarios were examined and how they performed relative to one another (e.g. Ruffley et al. 2018).

14 The climatic changes during this period may also have led to the appearance of the first hominids in Europe. See references in Palombo (2014).
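Why a 2–4x threshold admits more spurious genotypes than a 30x threshold can be illustrated with simple binomial arithmetic. This sketch assumes that each read samples either allele of a true heterozygote with probability 0.5, and it ignores base qualities and the genotype likelihoods that real variant callers use; the function names are hypothetical, and the `min_depth` default of 30 merely mirrors the threshold stated in the text, not the actual filtering pipeline.

```python
def single_error_allele_fraction(depth: int) -> float:
    """Share of reads at a site contributed by one erroneous read."""
    return 1 / depth


def het_dropout_probability(depth: int) -> float:
    """Chance that all reads at a true heterozygote carry the same allele.

    Assumes each read independently samples either allele with p = 0.5,
    so P(all reads show allele A) + P(all show allele B) = 2 * 0.5**depth.
    """
    return 2 * 0.5 ** depth


def passes_depth_filter(depth: int, min_depth: int = 30) -> bool:
    """Hypothetical minimum-coverage filter for a single genotype call."""
    return depth >= min_depth


for depth in (2, 4, 30):
    print(
        f"depth={depth:>2}x  "
        f"one-error share={single_error_allele_fraction(depth):.3f}  "
        f"het dropout={het_dropout_probability(depth):.2e}  "
        f"kept={passes_depth_filter(depth)}"
    )
```

At 2x, a single erroneous read makes up half of the reads at a site and can masquerade as a heterozygote, while a true heterozygote has a 50% chance of showing only one allele; at 30x, one error contributes about 3% of reads and allelic dropout becomes negligible.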

Out of the ice…

Ice cover in Europe peaked for the last time from around 27 to 18 Kya but extended little farther east than the Finnish border or farther south than modern Warsaw (Hughes et al. 2016). Increased aridity, rather than milder temperatures, most likely limited the ice extent (Hughes & Gibbard 2018). Although spruce is generally quite drought-averse, macrofossil remains attest to the persistence of scattered populations almost up to the Fennoscandian ice margin (Binney et al. 2009). Spruce also persisted north of the montane glaciations in central Europe, in the Alpine and Carpathian forelands (Willis & van Andel 2004; Jankovská & Pokorný 2008). More controversially, trees may have persisted along the Norwegian coast, as suggested by ancient DNA (Parducci et al. 2012) and surprisingly old macrofossils in the Swedish mountains (Tollefsrud, Latałowa, van der Knaap, et al. 2015).

These stragglers reaped little reward for their persistence. Population expansion on the Russian Plain is not evident until nearly 10,000 years later, around 13–15 Kya (Tollefsrud, Latałowa, van der Knaap, et al. 2015), and not until ca. 7–9 Kya in central Europe and Fennoscandia (Latałowa & van der Knaap 2006; Tollefsrud et al. 2008). Genetic evidence corroborates the fossil record: the region corresponding to modern Poland, Belarus, and Ukraine is a strong barrier to migration in spruce (Tsuda et al. 2016) and other arctic and boreal plants (Eidesen et al. 2013). Although northern and southern spruce populations met in central Poland ~3 Kya, the distribution of mitogenome diversity (Tollefsrud et al. 2008; Dering & Lewandowski 2009) reveals that seeds from one population very rarely became established in the range of the other. Norway spruce seemed set to maintain the deep subdivision put in place over half a million years earlier (Sullivan et al. Paper III) for another glacial cycle.

… and into the fire: Spruce enters the Anthropocene

Forestry has been big business in Fennoscandia for the last millennium (Myking et al. 2016). Timber extraction began slowly, but large areas were eventually deforested to meet growing demands for charcoal and tar, which were needed for smelting iron and sealing ship hulls (Myking et al. 2016). Mass importation of Norway spruce seeds from outside Fennoscandia began ca. 1900 to replenish depleted forests, to the point that native forests in southern Sweden probably no longer exist (Almäng 1996). Recent phylogeographic studies of Norway spruce have made considerable efforts to exclude trees from secondary forests (Tollefsrud et al. 2008, 2009; Tsuda et al. 2016). This strategy helps to identify patterns of migration, gene flow, and adaptation, but as a consequence the extent of translocated genotypes remains largely unknown.

Foresters soon realized that trees from Romania and especially Belarus outperformed native populations and those imported from central Europe (Hannerz & Westin 2005; Westin & Haapanen 2013). Many of these translocated trees became part of Swedish tree breeding efforts (Chen et al. 2019). In a survey of the phenotypically superior trees used to establish the Swedish breeding program, Chen et al. (2019) estimated that 55–75% originated from recent translocations. While these trees grow well and may perform better under projected climate change (Milesi et al. 2019), they originate from a genetically distinct lineage that had been isolated from Fennoscandia’s native spruce for ~730,000 years (Sullivan et al. Paper III).

Given that about 40% of the northern population’s genome derives from Siberian introgression that sets it apart from the southern lineage, I suggest caution in further industrial-scale population transfers. As a first step, identifying the proportion of non-native trees in breeding programs and seed orchards would help to estimate the risk, if any, that the 300+ million improved seedlings produced annually pose to the native population.

The future

At the beginning of this thesis work, I set out with the ambition to develop a scientific delimitation for the collection of populations we call Norway spruce, to understand the historical relationships among them and their occasional dalliances with other species, and to place it all within the broader history of the genus Picea. It’s hard to say how much progress I made towards those goals because each answer prompts a dozen more questions. Yes, the deep genetic divergence between northern and southern populations of Norway spruce stems from vicariance and introgression, but why does it persist? Yes, Norway and Siberian spruce come from clearly independent lineages, but don’t they seem a little too independent for interfertile species in such proximity?

Along the way I stumbled across some unexpectedly captivating puzzles in the form of organelle genomes. With my background in ecology and forest management, I never expected to be fascinated by how plant mitogenomes remain functional when every aspect of their existence seems to be a death wish. With all that potential for intra-individual drift and selection, how much does a mitogenome change from one meiosis to the next? And if you can pinch off bits of useless sequence through this process, why do mitogenomes remain so large? Why does the plastome remain a center of cytoplasmic calm? And, hey, isn’t it weird that mitochondria don’t need their genome, but chloroplasts can’t function without theirs?

These lines of questioning mirror a conceptual division found throughout this kappa and in the three manuscripts comprising my thesis. While I considered the evolution of genomes and the evolution of species separately at length, forging explicit links between them proved difficult.

Genes, genomes, populations, and species are hierarchically organized, so processes affecting genes can have bottom-up effects, and pressures acting only on higher levels can cascade downward. While this relationship is clear, attempts to demonstrate causality have met with mixed success; for example, the relationship between genome size and population size is equivocal (Lynch & Conery 2003; Whitney, Baack, et al. 2010), but changes in genome structure can clearly contribute to speciation (Feulner & De-Kayne 2017). Advancing our understanding of the interplay between these top-down and bottom-up processes in evolution seems a worthwhile goal for my future research.

Acknowledgements

First things first: I must extend my deepest gratitude to my adviser Xiao-Ru Wang. No one else in my entire life has fought harder for me. I am lucky to have you as a mentor, advocate, and friend.

I put off writing this section until the eleventh hour. Worse, actually: this thesis is late to the print office. So many people have contributed to my research over the years that I don’t know where to begin. I’ll try to keep it brief and focused on academic matters, but please know, dear reader, this only captures a small portion of the folks who have extended their skills, knowledge, networks, and/or friendship to me.

Thanks to my co-supervisors Nathaniel R. Street and Rosario García Gil for their advice and support. Bergvik Skog AB generously helped fund my PhD position. Thanks to Åke Granqvist, my industrial supervisor, and the rest of the office for being so welcoming, warm, and open to my research ideas. I am grateful to the Research School in Forest Genetics, Biotechnology and Breeding at UPSC, who organized funding for my position and meetings, courses, and field trips for us. The administrative staff at EMG have kindly and patiently helped me with all the small problems that come with being a foreigner, right down to accompanying me to Skatteverket.

I’ve been lucky to work with Wei Zhao, David Hall, Jie Gao, Jingxiang Meng, Zandra Fagernäs, Alisa Kravtsova, and Yuqing Jin while at EMG, and I regret that most of their contributions are not included in this thesis. Thank you all for your tireless hard work and indomitable spirits. I promise our papers are coming, soon! Georgy Moukerya accompanied me on many sample-collecting expeditions for little reward. Special thanks also must go to David for translating my abstract and thesis title on very short notice. You are always so generous with your time.

The Visby Institute funded my semester at the Siberian Federal University. I am grateful to Konstantin Krutovsky for being an excellent host and connecting me to his network in Russia, Germany, and beyond. These include Dmitry Politov and Elena Mudrik and their group at the Vavilov Institute of General Genetics in Moscow, who are tireless adventurers and curious about everything. I’m excited to start on the next chapter of spruce research and look forward to many future collaborations together.

Thanks to the larger EMG evolutionary biology group for the camaraderie and scientific discussions. Although my journal club attendance was sporadic (I’m sorry! It’s too early!), I enjoyed hearing the perspectives of Jing Wang, Jin Pan, Folmer Bokma, Barbara Giles, Jung Koo Kang, Xavier Thibert-Plante, Doreen Huang, Eric Capo and Jenny Olsson.

FORBIO, the Norwegian research school in biosystematics, funded my attendance at several excellent courses in ecological niche modeling, species delimitation, and the multispecies coalescent. I am grateful to the student I met, whose name I cannot recall, who told me about the program.

I’ve spent most of the last few years perpetually exhausted and stressed, but my friends have always been understanding and supportive. I hope to be back soon. Haley Gore and Jennifer C. Martin, I’ll love you forever. Thanks to the Ålidhem crew for a wonderful summer of 2019. Thomas Ågren, thank you for designing my cover and keeping my life together over the last few months.

References

Adams KL, Palmer JD. 2003. Evolution of mitochondrial gene content: gene loss and transfer to the nucleus. Mol. Phylogenet. Evol. 29:380–395. doi: 10.1016/s1055-7903(03)00194-5.
Ahuja MR, Neale DB. 2005. Evolution of genome size in conifers. Silvae Genet. 54:126–137.
Ai B, Wang Z-S, Ge S. 2012. Genome size is not correlated with effective population size in the Oryza species. Evolution. 66:3302–3310. doi: 10.1111/j.1558-5646.2012.01674.x.
Almäng A. 1996. Foreign provenances of Norway spruce and Scots pine in Swedish forestry. Arbetsrapport 54. Sveriges Lantbruksuniversitet, 53 pp.
Alverson AJ et al. 2010. Insights into the evolution of mitochondrial genome size from complete sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae). Mol. Biol. Evol. 27:1436–1448. doi: 10.1093/molbev/msq029.
Alverson AJ, Rice DW, Dickinson S, Barry K, Palmer JD. 2011. Origins and recombination of the bacterial-sized multichromosomal mitochondrial genome of cucumber. Plant Cell. 23:2499–2513. doi: 10.1105/tpc.111.087189.
Angiosperm Phylogeny Group (APG). 1998. An ordinal classification for the families of flowering plants. Ann. Missouri Bot. Gard. 85:531–553. doi: 10.2307/2992015.
Angiosperm Phylogeny Group (APG). 2003. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Bot. J. Linn. Soc. 141:399–436. doi: 10.1046/j.1095-8339.2003.t01-1-00158.x.
Angiosperm Phylogeny Group (APG). 2009. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot. J. Linn. Soc. 161:105–121. doi: 10.1111/j.1095-8339.2009.00996.x.

Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 408:796–815. doi: 10.1038/35048692.
Barnard-Kubow K, McCoy MA, Galloway LF. 2016. Biparental chloroplast inheritance leads to rescue from cytonuclear incompatibility. New Phytol. doi: 10.1111/nph.14222.
Bartoli G et al. 2005. Final closure of Panama and the onset of northern hemisphere glaciation. Earth Planet. Sci. Lett. 237:33–44. doi: 10.1016/j.epsl.2005.06.020.
Binney HA et al. 2009. The distribution of late-Quaternary woody taxa in northern Eurasia: evidence from a new macrofossil database. Quat. Sci. Rev. 28:2445–2464. doi: 10.1016/j.quascirev.2009.04.016.
Birol I et al. 2013. Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data. Bioinformatics. 29:1492–1497. doi: 10.1093/bioinformatics/btt178.
Bizzarri R et al. 2018. Palaeoenvironmental and climatic inferences from the late early Pleistocene lacustrine deposits in the eastern Tiberino Basin (central Italy). Quat. Res. 90:201–221. doi: 10.1017/qua.2018.41.
Bock R. 2007. Structure, function, and inheritance of plastid genomes. In: Cell and Molecular Biology of Plastids. Bock, R, editor. Springer Berlin Heidelberg: Berlin, Heidelberg pp. 29–63. doi: 10.1007/4735_2007_0223.
Bouillé M, Senneville S, Bousquet J. 2011. Discordant mtDNA and cpDNA phylogenies indicate geographic speciation and reticulation as driving factors for the diversification of the genus Picea. Tree Genet. Genomes. 7:469–484. doi: 10.1007/s11295-010-0349-z.
Bourque G et al. 2018. Ten things you should know about transposable elements. Genome Biol. 19:199. doi: 10.1186/s13059-018-1577-z.
Burns RM, Honkala BH. 1990. Silvics of North America Volume 1: Conifers. Forest Service, United States Department of Agriculture, Washington, D.C., United States.
Charlesworth B, Barton N. 2004. Genome Size: does bigger mean worse? Curr. Biol. 14:R233–R235. doi: 10.1016/j.cub.2004.02.054.

Chaw S-M, Wu C-S, Sudianto E. 2018. Evolution of Gymnosperm Plastid Genomes. In: Plastid Genome Evolution. Chaw, S-M & Jansen, RK, editors. Vol. 85 Academic Press pp. 195–222. doi: 10.1016/bs.abr.2017.11.018.
Chen J et al. 2014. Clinal variation at phenology-related genes in spruce: parallel evolution in FTL2 and Gigantea? Genetics. 197:1025–1038. doi: 10.1534/genetics.114.163063.
Chen J et al. 2019. Genomic data provide new insights on the demographic history and the extent of recent material transfers in Norway spruce. Evol. Appl. 12:1539–1551. doi: 10.1111/eva.12801.
Chen J, Källman T, Gyllenstrand N, Lascoux M. 2010. New insights on the speciation history and nucleotide diversity of three boreal spruce species and a Tertiary relict. Heredity. 104:3–14. doi: 10.1038/hdy.2009.88.
Choi I-S et al. 2019. Fluctuations in Fabaceae mitochondrial genome size and content are both ancient and recent. BMC Plant Biol. 19:448. doi: 10.1186/s12870-019-2064-8.
Clark PU et al. 2006. The middle Pleistocene transition: Characteristics, mechanisms, and implications for long-term changes in atmospheric pCO2. Quat. Sci. Rev. 25:3150–3184. doi: 10.1016/j.quascirev.2006.07.008.
Cloutier A et al. 2019. Whole-genome analyses resolve the phylogeny of flightless birds (Palaeognathae) in the presence of an empirical anomaly zone. Syst. Biol. 68:937–955. doi: 10.1093/sysbio/syz019.
Cole LW, Guo W, Mower JP, Palmer JD. 2018. High and variable rates of repeat-mediated mitochondrial genome rearrangement in a genus of plants. Mol. Biol. Evol. 35:2773–2785. doi: 10.1093/molbev/msy176.
Couvering JA Van, ed. 1996. The Pleistocene Boundary and the Beginning of the Quaternary. Cambridge University Press: Cambridge. doi: 10.1017/CBO9780511585760.
Crisp MD, Cook LG. 2011. Cenozoic extinctions account for the low diversity of extant gymnosperms compared with angiosperms. New Phytol. 192:997–1009. doi: 10.1111/j.1469-8137.2011.03862.x.

Crosby K, Smith DR. 2012. Does the mode of plastid inheritance influence plastid genome architecture? PLoS One. 7:1–8. doi: 10.1371/journal.pone.0046260.
Cruickshank TE, Hahn MW. 2014. Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Mol. Ecol. 23:3133–3157. doi: 10.1111/mec.12796.
Dannemann M, Racimo F. 2018. Something old, something borrowed: admixture and adaptation in human evolution. Curr. Opin. Genet. Dev. 53:1–8. doi: 10.1016/j.gde.2018.05.009.
Day A, Madesis P. 2007. DNA replication, recombination, and repair in plastids. In: Cell and Molecular Biology of Plastids. Bock, R, editor. Springer Berlin Heidelberg: Berlin, Heidelberg pp. 65–119. doi: 10.1007/4735_2007_0231.
Degnan JH. 2018. Modeling hybridization under the network multispecies coalescent. Syst. Biol. 67:786–799. doi: 10.1093/sysbio/syy040.
Degnan JH, Rosenberg NA. 2006. Discordance of species trees with their most likely gene trees. PLoS Genet. 2:e68. doi: 10.1371/journal.pgen.0020068.
Degnan JH, Rosenberg NA. 2009. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol. Evol. 24:332–340. doi: 10.1016/j.tree.2009.01.009.
Dering M, Lewandowski A. 2009. Finding the meeting zone: Where have the northern and southern ranges of Norway spruce overlapped? For. Ecol. Manage. 259:229–235. doi: 10.1016/j.foreco.2009.10.018.
Dietrichson J. 1969. The geographic variation of spring-frost resistance and growth cessation in Norway spruce (Picea abies (L.) Karst). Meddelelser fra det Norske Skogforsøksvesen. 27:96–106.
Dong S et al. 2018. The complete mitochondrial genome of the early flowering plant Nymphaea colorata is highly repetitive with low recombination. BMC Genomics. 19:614. doi: 10.1186/s12864-018-4991-4.

Du FK, Petit RJ, Liu JQ. 2009. More introgression with less gene flow: Chloroplast vs. mitochondrial DNA in the Picea asperata complex in China, and comparison with other Conifers. Mol. Ecol. 18:1396–1407. doi: 10.1111/j.1365-294X.2009.04107.x.
Eckenwalder JE. 2009. Conifers of the World: the Complete Reference. Timber Press, Portland, Oregon.
Edwards SV. 2009. Is a new and general theory of molecular systematics emerging? Evolution. 63:1–19. doi: 10.1111/j.1558-5646.2008.00549.x.
Eidesen PB et al. 2013. Genetic roadmap of the Arctic: Plant dispersal highways, traffic barriers and capitals of diversity. New Phytol. 200:898–910. doi: 10.1111/nph.12412.
Ekberg I, Eriksson G, Nilsson C. 1991. Consistency of phenology and growth of intra- and interprovenance families of Picea abies. Scand. J. For. Res. 6:323–333.
Eldfjell Y. 2018. Identifying mitochondrial genomes in draft whole genome shotgun assemblies of six gymnosperm species. Bachelor’s thesis. Stockholm University.
Elliott TA, Gregory TR. 2015. What’s in a genome? The C-value enigma and the evolution of eukaryotic genome content. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 370:20140331. doi: 10.1098/rstb.2014.0331.
Ellis JR, Bentley KE, McCauley DE. 2008. Detection of rare paternal chloroplast inheritance in controlled crosses of the endangered sunflower Helianthus verticillatus. Heredity. 100:574–580. doi: 10.1038/hdy.2008.11.
Elworth RAL, Ogilvie HA, Zhu J, Nakhleh L. 2019. Advances in computational methods for phylogenetic networks in the presence of hybridization. In: Bioinformatics and Phylogenetics. Warnow, T, editor. Springer International Publishing: pp. 317–360. doi: 10.1007/978-3-030-10837-3_13.
Erixon P, Oxelman B. 2008. Reticulate or tree-like chloroplast DNA evolution in Sileneae (Caryophyllaceae)? Mol. Phylogenet. Evol. 48:313–325. doi: 10.1016/j.ympev.2008.04.015.

Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M. 2013. Robust demographic inference from genomic and SNP data. PLoS Genet. 9:e1003905. doi: 10.1371/journal.pgen.1003905.
Farjon A, Filer D. 2013. An Atlas of the World's Conifers: An Analysis of their Distribution, Biogeography, Diversity and Conservation Status. Brill, Leiden, the Netherlands.
Feng S et al. 2018. Trans-lineage polymorphism and nonbifurcating diversification of the genus Picea. New Phytol. doi: 10.1111/nph.15590.
Feulner PGD, De-Kayne R. 2017. Genome evolution, structural rearrangements and speciation. J. Evol. Biol. 30:1488–1490. doi: 10.1111/jeb.13101.
Folk RA, Mandel JR, Freudenstein JV. 2016. Ancestral gene flow and parallel organellar genome capture result in extreme phylogenomic discord in a lineage of angiosperms. Syst. Biol. 66:320–337. doi: 10.1093/sysbio/syw083.
Folk RA, Soltis PS, Soltis DE, Guralnick R. 2018. New prospects in the detection and comparative analysis of hybridization in the tree of life. Am. J. Bot. 105:364–375. doi: 10.1002/ajb2.1018.
Food and Agriculture Organization of the United Nations (FAO). 2011. FAO Yearbook of Forest Products. FAO Forest Series 45, FAO Statistics Series 201. Rome, Italy.
Fowler D. 1983. The hybrid black × Sitka spruce, implications to phylogeny of the genus Picea. Can. J. For. Res. 13:108–115.
Fowler D. 1987. The hybrid white × Sitka spruce: species crossability. Can. J. For. Res. 17:413–417.
Gandini CL, Garcia LE, Abbona CC, Sanchez-Puerta MV. 2019. The complete organelle genomes of Physochlaina orientalis: Insights into short sequence repeats across seed plant mitochondrial genomes. Mol. Phylogenet. Evol. 137:274–284. doi: 10.1016/j.ympev.2019.05.012.
Gernandt DS et al. 2016. Phylogenetics of extant and fossil Pinaceae: methods for increasing topological stability. Botany. 94:863–884.

Gernandt DS, Reséndiz Arias C, Terrazas T, Aguirre Dugua X, Willyard A. 2018. Incorporating fossils into the Pinaceae tree of life. Am. J. Bot. 105:1329–1344. doi: 10.1002/ajb2.1139.
Gerten D, Schaphoff S, Haberlandt U, Lucht W, Sitch S. 2004. Terrestrial vegetation and water balance—hydrological evaluation of a dynamic global vegetation model. J. Hydrol. 286:249–270.
Gonçalves DJP, Simpson BB, Ortiz EM, Shimizu GH, Jansen RK. 2019. Incongruence between gene trees and species trees and phylogenetic signal variation in plastid genes. Mol. Phylogenet. Evol. 138:219–232. doi: 10.1016/j.ympev.2019.05.022.
Gordon A. 1986. Breeding, genetics and genecological studies in spruce for tree improvement in 1983 and 1984, Sault Ste. Marie, Ontario. Pages 112–116 in Proceedings of the Twentieth Meeting of the Canadian Tree Improvement Association.
Goremykin VV, Lockhart PJ, Viola R, Velasco R. 2012. The mitochondrial genome of Malus domestica and the import-driven hypothesis of mitochondrial genome expansion in seed plants. Plant J. 71:615–626. doi: 10.1111/j.1365-313X.2012.05014.x.
Gould SJ, Lewontin RC. 1979. The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proc. R. Soc. London. Ser. B. Biol. Sci. 205:581–598.
Goulden M, Wofsy S, Harden J, Trumbore S, Crill P, Gower S, Fries T, Daube B, Fan S-M, Sutton D. 1998. Sensitivity of boreal forest carbon balance to soil thaw. Science. 279:214–217.
Goulet BE, Roda F, Hopkins R. 2017. Hybridization in plants: old ideas, new techniques. Plant Physiol. 173:65–78. doi: 10.1104/pp.16.01340.
Gray MW. 2014. The pre-endosymbiont hypothesis: a new perspective on the origin and evolution of mitochondria. Cold Spring Harb. Perspect. Biol. 6:a016097. doi: 10.1101/cshperspect.a016097.
Green RE et al. 2010. A draft sequence of the Neandertal genome. Science. 328. doi: 10.1126/science.1188021.

Greiner S, Sobanski J, Bock R. 2015. Why are most organelle genomes transmitted maternally? Bioessays. 37:80–94. doi: 10.1002/bies.201400110.
Grotkopp E, Rejmánek M, Sanderson MJ, Rost TL. 2004. Evolution of genome size in Pines (Pinus) and its life-history correlates: supertree analyses. Evolution. 58:1705–1729. doi: 10.1111/j.0014-3820.2004.tb00456.x.
Hamilton JA, De la Torre AR, Aitken SN. 2014. Fine-scale environmental variation contributes to introgression in a three-species spruce hybrid complex. Tree Genet. Genomes. 11:817. doi: 10.1007/s11295-014-0817-y.
Hamilton JA, Lexer C, Aitken SN. 2013. Differential introgression reveals candidate genes for selection across a spruce (Picea sitchensis × P. glauca) hybrid zone. New Phytol. 197:927–938. doi: 10.1111/nph.12055.
Hannerz M. 2003a. Finnish forest research in brief. Scand. J. For. Res. 3:196–198.
Hannerz M. 2003b. Swedish forest research in brief. Scand. J. For. Res. 5:387–390.
Hannerz M, Westin J. 2005. Autumn frost hardiness in Norway spruce plus tree progeny and trees of the local and transferred provenances in central Sweden. Tree Physiol. 25:1181–1186.
Hansen AK, Escobar LK, Gilbert LE, Jansen RK. 2007. Paternal, maternal, and biparental inheritance of the chloroplast genome in Passiflora (Passifloraceae): Implications for phylogenetic studies. Am. J. Bot. 94:42–46. doi: 10.3732/ajb.94.1.42.
Harrison RG, Larson EL. 2014. Hybridization, introgression, and the nature of species boundaries. J. Hered. 105:795–809. doi: 10.1093/jhered/esu033.
Heled J, Drummond AJ. 2009. Bayesian inference of species trees from multilocus data. Mol. Biol. Evol. 27:570–580. doi: 10.1093/molbev/msp274.
Herrera F et al. 2016. New fossil Pinaceae from the Early Cretaceous of Mongolia. Botany. 94:885–915.

Heuertz M et al. 2006. Multilocus patterns of nucleotide diversity, linkage disequilibrium and demographic history of Norway spruce [Picea abies (L.) Karst]. Genetics. 174:2095–2105. doi: 10.1534/genetics.106.065102.
Hey J. 2009. Isolation with migration models for more than two populations. Mol. Biol. Evol. 27:905–920. doi: 10.1093/molbev/msp296.
Hey J, Chung Y, Sethuraman A. 2015. On the occurrence of false positives in tests of migration under an isolation-with-migration model. Mol. Ecol. 24:5078–5083. doi: 10.1111/mec.13381.
Hidalgo O, Pellicer J, Christenhusz MJM, Schneider H, Leitch IJ. 2017. Genomic gigantism in the whisk-fern family (Psilotaceae): Tmesipteris obliqua challenges record holder Paris japonica. Bot. J. Linn. Soc. 183:509–514. doi: 10.1093/botlinnean/box003.
Hudson RR. 1983. Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23:183–201.
Hughes ALC, Gyllencreutz R, Lohne ØS, Mangerud J, Svendsen JI. 2016. The last Eurasian ice sheets – a chronological database and time-slice reconstruction, DATED-1. Boreas. 45:1–45. doi: 10.1111/bor.12142.
Hughes PD, Gibbard PL. 2018. Global glacier dynamics during 100 ka Pleistocene glacial cycles. Quat. Res. 90:222–243. doi: 10.1017/qua.2018.37.
Hylen G. 1997. Genetic variation of wood density and its relationship with growth traits in young Norway spruce. Silvae Genet. 46:55.
Jackman SD et al. 2015. Organellar genomes of white spruce (Picea glauca): assembly and annotation. Genome Biol. Evol. 8:29–41. doi: 10.1093/gbe/evv244.
Jackman SD et al. 2019. Largest complete mitochondrial genome of a gymnosperm, Sitka spruce (Picea sitchensis), indicates complex physical structure. bioRxiv. 601104. doi: 10.1101/601104.
Jankovská V, Pokorný P. 2008. Forest vegetation of the last full-glacial period in the Western Carpathians (Slovakia and Czech Republic). Preslia. 80:307–324.

Jarvis ED et al. 2014. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science. 346:1320–1331. doi: 10.1126/science.1253451.

Jost A et al. 2009. High resolution climate and vegetation simulations of the Late Pliocene, a model-data comparison over western Europe and the Mediterranean region. Clim. Past. 5:585–606. doi: 10.5194/cp-5-585-2009.

Kan S-L, Shen T-T, Gong P, Ran J-H, Wang X-Q. 2020. The complete mitochondrial genome of Taxus cuspidata (Taxaceae): eight protein-coding genes have transferred to the nuclear genome. BMC Evol. Biol. 20:10. doi: 10.1186/s12862-020-1582-1.

Kim BY, Huber CD, Lohmueller KE. 2018. Deleterious variation shapes the genomic landscape of introgression. PLoS Genet. 14:e1007741. doi: 10.1371/journal.pgen.1007741.

Kingman JFC. 1982. The coalescent. Stoch. Process. Their Appl. 13:235–248.

Klymiuk AA, Stockey RA. 2012. A Lower Cretaceous (Valanginian) seed cone provides the earliest fossil record for Picea (Pinaceae). Am. J. Bot. 99:1069–1082. doi: 10.3732/ajb.1100568.

Kozik A et al. 2019. The alternative reality of plant mitochondrial DNA: One ring does not rule them all. PLoS Genet. 15:e1008373. doi: 10.1371/journal.pgen.1008373.

Kronforst MR, Young LG, Blume LM, Gilbert LE. 2006. Multilocus analyses of admixture and introgression among hybridizing Heliconius butterflies. Evolution. 60:1254–1268. doi: 10.1111/j.0014-3820.2006.tb01203.x.

Krutzsch P. 1974. The IUFRO 1964/68 provenance test with Norway Spruce (Picea abies (L.) Karst.). Silvae Genet. 23:58–62.

Kvaček Z, Teodoridis V, Denk T. 2019. The Pliocene flora of Frankfurt am Main, Germany: taxonomy, palaeoenvironments and biogeographic affinities. Palaeobio. Palaeoenv. doi: 10.1007/s12549-019-00391-6.

de la Torre AR, Wang T, Jaquish B, Aitken SN. 2014. Adaptation and exogenous selection in a Picea glauca × Picea engelmannii hybrid zone: implications for forest management under climate change. New Phytol. 687–699. doi: 10.1111/nph.12540.

Ladoukakis ED, Zouros E. 2017. Evolution and inheritance of animal mitochondrial DNA: rules and exceptions. J. Biol. Res. 24:2. doi: 10.1186/s40709-017-0060-4.

de Lafontaine G, Prunier J, Gérardi S, Bousquet J. 2015. Tracking the progression of speciation: variable patterns of introgression across the genome provide insights on the species delimitation between progenitor–derivative spruces (Picea mariana × P. rubens). Mol. Ecol. 24:5229–5247. doi: 10.1111/mec.13377.

Lagercrantz U, Ryman N. 1990. Genetic structure of Norway spruce (Picea abies): concordance of morphological and allozymic variation. Evolution. 44:38–53. doi: 10.1111/j.1558-5646.1990.tb04278.x.

Latałowa M, van der Knaap WO. 2006. Late Quaternary expansion of Norway spruce Picea abies (L.) Karst. in Europe according to pollen data. Quat. Sci. Rev. 25:2780–2805. doi: 10.1016/j.quascirev.2006.06.007.

Lawson DJ, van Dorp L, Falush D. 2018. A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots. Nat. Commun. 9:3258. doi: 10.1038/s41467-018-05257-7.

Leaché AD, Harris RB, Rannala B, Yang Z. 2013. The influence of gene flow on species tree estimation: a simulation study. Syst. Biol. 63:17–30. doi: 10.1093/sysbio/syt049.

Ledig FT, Hodgskiss PD, Krutovskii KV, Neale DB, Eguiluz-Piedra T. 2004. Relationships among the spruces (Picea, Pinaceae) of southwestern North America. Syst. Bot. 29:275–295. doi: 10.1600/036364404774195485.

Lee-Yaw JA, Grassa CJ, Joly S, Andrew RL, Rieseberg LH. 2019. An evaluation of alternative explanations for widespread cytonuclear discordance in annual sunflowers (Helianthus). New Phytol. 221:515–526. doi: 10.1111/nph.15386.

Leitch IJ et al. 2009. Genome size diversity in orchids: consequences and evolution. Ann. Bot. 104:469–481. doi: 10.1093/aob/mcp003.

LePage BA. 2001. New species of Picea A. Dietrich (Pinaceae) from the middle Eocene of Axel Heiberg Island, Arctic Canada. Bot. J. Linn. Soc. 135:137–167. doi: 10.1006/bojl.2000.0386.

Leslie AB et al. 2012. Hemisphere-scale differences in conifer evolutionary dynamics. Proc. Natl. Acad. Sci. 109:16217–16221. doi: 10.1073/pnas.1213621109.

Li Y et al. 2010. Demographic histories of four spruce (Picea) species of the Qinghai-Tibetan Plateau and neighboring areas inferred from multiple nuclear loci. Mol. Biol. Evol. 27:1001–1014. doi: 10.1093/molbev/msp301.

Li Z et al. 2015. Early genome duplications in conifers and other seed plants. Sci. Adv. 1:e1501084. doi: 10.1126/sciadv.1501084.

Liu L, Wu S, Yu L. 2015. Coalescent methods for estimating species trees from phylogenomic data. J. Syst. Evol. 53:380–390. doi: 10.1111/jse.12160.

Liu L, Yu L, Kubatko L, Pearl DK, Edwards SV. 2009. Coalescent methods for estimating phylogenetic trees. Mol. Phylogenet. Evol. 53:320–328. doi: 10.1016/j.ympev.2009.05.033.

Lockwood JD et al. 2013. A new phylogeny for the genus Picea from plastid, mitochondrial, and nuclear sequences. Mol. Phylogenet. Evol. 69:717–727. doi: 10.1016/j.ympev.2013.07.004.

Lynch M, Conery JS. 2003. The origins of genome complexity. Science. 302:1401–1404. doi: 10.1126/science.1089370.

Maddison WP. 1997. Gene trees in species trees. Syst. Biol. 46:523–536. doi: 10.1093/sysbio/46.3.523.

Magri D, Palombo MR. 2013. Early to Middle Pleistocene dynamics of plant and mammal communities in South West Europe. Quat. Int. 288:63–72. doi: 10.1016/j.quaint.2012.02.028.

Magri D, Di Rita F, Aranbarri J, Fletcher W, González-Sampériz P. 2017. Quaternary disappearance of tree taxa from southern Europe: Timing and trends. Quat. Sci. Rev. 163:23–55. doi: 10.1016/j.quascirev.2017.02.014.

Mai DH, Wähnert V. 2000. On the problems of the Pliocene floras in Lusatia and Lower Silesia. Acta Palaeobot. 40:165–205.

Matthews JJ, Liu AG, McIlroy D. 2017. Post-fossilization processes and their implications for understanding Ediacaran macrofossil assemblages. In: Earth System Evolution and Early Life: A Celebration of the Work of Martin Brasier. Brasier, AT, McIlroy, D, & McLoughlin, N, editors. Geological Society, London, Special Publications. doi: 10.1144/SP448.20.

McClintock B. 1941. The stability of broken ends of chromosomes in Zea mays. Genetics. 26:234–282.

Milesi P et al. 2019. Assessing the potential for assisted gene flow using past introduction of Norway spruce in southern Sweden: Local adaptation and genetic basis of quantitative traits in trees. Evol. Appl. 12:1946–1959. doi: 10.1111/eva.12855.

Mower JP, Vickrey TL. 2018. Structural diversity among plastid genomes of land plants. In: Plastid Genome Evolution. Chaw, S-M & Jansen, RK, editors. Vol. 85. Academic Press, pp. 263–292. doi: 10.1016/bs.abr.2017.11.013.

Muir G, Schlötterer C. 2005. Evidence for shared ancestral polymorphism rather than recurrent gene flow at microsatellite loci differentiating two hybridizing oaks (Quercus spp.). Mol. Ecol. 14:549–561. doi: 10.1111/j.1365-294X.2004.02418.x.

Myking T, Rusanen M, Steffenrem A, Kjær ED, Jansson G. 2016. Historic transfer of forest reproductive material in the Nordic region: drivers, scale and implications. Forestry. 89:325–337. doi: 10.1093/forestry/cpw020.

Naito K, Kaga A, Tomooka N, Kawase M. 2013. De novo assembly of the complete organelle genome sequences of azuki bean (Vigna angularis) using next-generation sequencers. Breed. Sci. 63:176–182. doi: 10.1270/jsbbs.63.176.

Neale DB, Sederoff RR. 1989. Paternal inheritance of chloroplast DNA and maternal inheritance of mitochondrial DNA in loblolly pine. Theor. Appl. Genet. 77:212–216. doi: 10.1007/BF00266189.

Nielsen R, Wakeley J. 2001. Distinguishing migration from isolation: a Markov chain Monte Carlo approach. Genetics. 158:885–896.

Nordborg M. 2003. Coalescent Theory. Handb. Stat. Genet. doi: 10.1002/0470022620.bbc21.

Nystedt B et al. 2013. The Norway spruce genome sequence and conifer genome evolution. Nature. 497:579–584. doi: 10.1038/nature12211.

Ogilvie HA, Heled J, Xie D, Drummond AJ. 2016. Computational performance and statistical accuracy of *BEAST and comparisons with other methods. Syst. Biol. 65:381–396. doi: 10.1093/sysbio/syv118.

Palombo MR. 2014. Deconstructing mammal dispersals and faunal dynamics in SW Europe during the Quaternary. Quat. Sci. Rev. 96:50–71. doi: 10.1016/j.quascirev.2014.05.013.

Pamilo P, Nei M. 1988. Relationships between gene trees and species trees. Mol. Biol. Evol. 5:568–583. doi: 10.1093/oxfordjournals.molbev.a040517.

Parducci L et al. 2012. Glacial survival of boreal trees in northern Scandinavia. Science. 335:1083–1086. doi: 10.1126/science.1216043.

Pardue M-L, DeBaryshe PG. 2011. Retrotransposons that maintain chromosome ends. Proc. Natl. Acad. Sci. 108:20317–20324. doi: 10.1073/pnas.1100278108.

Parks M, Cronn R, Liston A. 2012. Separating the wheat from the chaff: mitigating the effects of noise in a plastome phylogenomic data set from Pinus L. (Pinaceae). BMC Evol. Biol. 12:100. doi: 10.1186/1471-2148-12-100.

Payseur BA, Rieseberg LH. 2016. A genomic perspective on hybridization and speciation. Mol. Ecol. 25:2337–2360. doi: 10.1111/mec.13557.

Pellicer J, Hidalgo O, Dodsworth S, Leitch IJ. 2018. Genome size diversity and its impact on the evolution of land plants. Genes. 9:88. doi: 10.3390/genes9020088.

Pellicer J, Leitch IJ. 2019. The Plant DNA C-values database (release 7.1): an updated online repository of plant genome size data for comparative studies. New Phytol. doi: 10.1111/nph.16261.

Preuten T et al. 2010. Fewer genes than organelles: extremely low and variable gene copy numbers in mitochondria of somatic plant cells. Plant J. 64:948–959. doi: 10.1111/j.1365-313X.2010.04389.x.

Puchkov V, Danukalova G. 2009. The Late Pliocene and Pleistocene history of the Southern Urals Region in the light of neotectonic data. Quat. Int. 201:4–12. doi: 10.1016/j.quaint.2008.05.024.

Putman AI, Carbone I. 2014. Challenges in analysis and interpretation of microsatellite data for population genetic studies. Ecol. Evol. 4:4399–4428. doi: 10.1002/ece3.1305.

Ramsey AJ, Mandel JR. 2019. When one genome is not enough: Organellar heteroplasmy in plants. Annu. Plant Rev. online. 619–658. doi: 10.1002/9781119312994.apr0616.

Ran JH, Shen TT, Liu WJ, Wang PP, Wang XQ. 2015. Mitochondrial introgression and complex biogeographic history of the genus Picea. Mol. Phylogenet. Evol. 93:63–76. doi: 10.1016/j.ympev.2015.07.020.

Ran JH, Wei XX, Wang XQ. 2006. Molecular phylogeny and biogeography of Picea (Pinaceae): Implications for phylogeographical studies using cytoplasmic haplotypes. Mol. Phylogenet. Evol. 41:405–419. doi: 10.1016/j.ympev.2006.05.039.

Rannala B, Yang Z. 2003. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics. 164:1645–1656.

Reddy S et al. 2017. Why do phylogenomic data sets yield conflicting trees? Data type influences the avian tree of life more than taxon sampling. Syst. Biol. 66:857–879. doi: 10.1093/sysbio/syx041.

Repenning C, Brouwers E. 1992. Late Pliocene-early Pleistocene ecologic changes in the Arctic Ocean borderland. In: U.S. Geological Survey Bulletin 2036 p. 37.

Roch S, Steel M. 2015. Likelihood-based tree reconstruction on a concatenation of aligned sequence data sets can be statistically inconsistent. Theor. Popul. Biol. 100:56–62. doi: 10.1016/j.tpb.2014.12.005.

Rodríguez-Moreno L et al. 2011. Determination of the melon chloroplast and mitochondrial genome sequences reveals that the largest reported mitochondrial genome in plants contains a significant amount of DNA having a nuclear origin. BMC Genomics. 12:424. doi: 10.1186/1471-2164-12-424.

Rosenberg NA. 2002. The probability of topological concordance of gene trees and species trees. Theor. Popul. Biol. 61:225–247. doi: 10.1006/tpbi.2001.1568.

Ruffley M et al. 2018. Combining allele frequency and tree-based approaches improves phylogeographic inference from natural history collections. Mol. Ecol. 27:1012–1024. doi: 10.1111/mec.14491.

Ruhlman TA, Jansen RK. 2018. Aberration or analogy? The atypical plastomes of Geraniaceae. In: Plastid Genome Evolution. Chaw, S-M & Jansen, RK, editors. Vol. 85. Academic Press, pp. 223–262. doi: 10.1016/bs.abr.2017.11.017.

Safford LO. 1974. Picea A. Dietr. Spruce. In: Schopmeyer, C. S., ed. Seeds of woody plants in the United States. Agric. Handb. 450. Washington, DC: U.S. Department of Agriculture, Forest Service, pp. 587–597.

Sahebi M et al. 2018. Contribution of transposable elements in the plant’s genome. Gene. 665:155–166. doi: 10.1016/j.gene.2018.04.050.

Saladin B et al. 2017. Fossils matter: improved estimates of divergence times in Pinus reveal older diversification. BMC Evol. Biol. 1–15. doi: 10.1186/s12862-017-0941-z.

Sanchez-Puerta MV. 2014. Involvement of plastid, mitochondrial and nuclear genomes in plant-to-plant horizontal gene transfer. Acta Soc. Bot. Pol. 83.

Sanitá Lima M, Smith DR. 2017. Pervasive transcription of mitochondrial, plastid, and nucleomorph genomes across diverse plastid-bearing species. Genome Biol. Evol. 9:2650–2657. doi: 10.1093/gbe/evx207.

Schiffels S, Durbin R. 2014. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46:919–925. doi: 10.1038/ng.3015.

Schumer M, Rosenthal GG, Andolfatto P. 2014. How common is homoploid hybrid speciation? Evolution. 68:1553–1560. doi: 10.1111/evo.12399.

Scott AD, Stenz NWM, Ingvarsson PK, Baum DA. 2016. Whole genome duplication in coast redwood (Sequoia sempervirens) and its implications for explaining the rarity of polyploidy in conifers. New Phytol. 211:186–193. doi: 10.1111/nph.13930.

Shao C-C et al. 2019. Phylotranscriptomics resolves interspecific relationships and indicates multiple historical out-of-North America dispersals through the Bering Land Bridge for the genus Picea (Pinaceae). Mol. Phylogenet. Evol. 141:106610. doi: 10.1016/j.ympev.2019.106610.

Skrøppa T. 1982. Genetic variation in growth rhythm characteristics within and between natural populations of Norway spruce: a preliminary report. Silva Fenn. 16:160–167.

Skrøppa T. 1991. Within-population variation in autumn frost hardiness and its relationship to bud-set and height growth in Picea abies. Scand. J. For. Res. 6:353–363.

Sloan DB et al. 2012. Rapid evolution of enormous, multichromosomal genomes in flowering plant mitochondria with exceptionally high mutation rates. PLOS Biol. 10:e1001241. doi: 10.1371/journal.pbio.1001241.

Sloan DB, Oxelman B, Rautenberg A, Taylor DR. 2009. Phylogenetic analysis of mitochondrial substitution rate variation in the angiosperm tribe Sileneae. BMC Evol. Biol. 9:260. doi: 10.1186/1471-2148-9-260.

Solís-Lemus C, Yang M, Ané C. 2016. Inconsistency of species tree methods under gene flow. Syst. Biol. 65:843–851. doi: 10.1093/sysbio/syw030.

Sotero-Caio CG, Platt II RN, Suh A, Ray DA. 2017. Evolution and diversity of transposable elements in vertebrate genomes. Genome Biol. Evol. 9:161–177. doi: 10.1093/gbe/evw264.

Sousa F, Bertrand YJK, Doyle JJ, Oxelman B, Pfeil BE. 2017. Using genomic location and coalescent simulation to investigate gene tree discordance in Medicago L. Syst. Biol. 66:934–949. doi: 10.1093/sysbio/syx035.

Steijlen I, Zackrisson O. 1987. Long-term regeneration dynamics and successional trends in a northern Swedish coniferous forest. Can. J. Bot. 65:839–848.

Stevens KA et al. 2016. Sequence of the sugar pine megagenome. Genetics. 204:1613–1626. doi: 10.1534/genetics.116.193227.

Strasburg JL, Rieseberg LH. 2010. How robust are ‘isolation with migration’ analyses to violations of the IM model? A simulation study. Mol. Biol. Evol. 27:297–310. doi: 10.1093/molbev/msp233.

Sullivan AR et al. 2019. The mitogenome of Norway spruce and a reappraisal of mitochondrial recombination in plants. Genome Biol. Evol. 12:3586–3598. doi: 10.1093/gbe/evz263.

Sullivan AR, Owusu SA, Weber JA, Hipp AL, Gailing O. 2016. Hybridization and divergence in multi-species oak (Quercus) communities. Bot. J. Linn. Soc. 181:99–114. doi: 10.1111/boj.12393.

Sullivan AR, Schiffthaler B, Thompson SL, Street NR, Wang X-R. 2017. Interspecific plastome recombination reflects ancient reticulate evolution in Picea (Pinaceae). Mol. Biol. Evol. 34:1689–1701. doi: 10.1093/molbev/msx111.

Sullivan J. 1994. Picea abies. Fire Effects Information System. U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station, Fire Sciences Laboratory.

Sun Y et al. 2014. Evolutionary history of Purple cone spruce (Picea purpurea) in the Qinghai–Tibet Plateau: homoploid hybrid origin and Pleistocene expansion. Mol. Ecol. 23:343–359. doi: 10.1111/mec.12599.

Sun Y, Li Lili, Li Long, Zou J, Liu J. 2015. Distributional dynamics and interspecific gene flow in Picea likiangensis and P. wilsonii triggered by climate change on the Qinghai-Tibet Plateau. J. Biogeogr. 42:475–484. doi: 10.1111/jbi.12434.

Szmidt AE, Aldén T, Hällgren JE. 1987. Paternal inheritance of chloroplast DNA in Larix. Plant Mol. Biol. 9:59–64. doi: 10.1007/BF00017987.

Tajima F. 1983. Evolutionary relationship of DNA sequences in finite populations. Genetics. 105:437–460.

Teodoridis V, Kvaček Z, Uhl D. 2009. Late Neogene palaeoenvironment and correlation of the Sessenheim-Auenheim floral complex. Palaeodiversity. 2:1–17.

Than C, Ruths D, Nakhleh L. 2008. PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships. BMC Bioinformatics. 9:322. doi: 10.1186/1471-2105-9-322.

Tollefsrud MM et al. 2008. Genetic consequences of glacial survival and postglacial colonization in Norway spruce: Combined analysis of mitochondrial DNA and fossil pollen. Mol. Ecol. 17:4134–4150. doi: 10.1111/j.1365-294X.2008.03893.x.

Tollefsrud MM et al. 2009. Combined analysis of nuclear and mitochondrial markers provide new insight into the genetic structure of North European Picea abies. Heredity. 102:549–562. doi: 10.1038/hdy.2009.16.

Tollefsrud MM, Latałowa M, van der Knaap WO, Brochmann C, Sperisen C. 2015. Late Quaternary history of North Eurasian Norway spruce (Picea abies) and Siberian spruce (Picea obovata) inferred from macrofossils, pollen and cytoplasmic DNA variation. J. Biogeogr. 42:1431–1442. doi: 10.1111/jbi.12484.

Tsuda Y et al. 2016. The extent and meaning of hybridization and introgression between Siberian spruce (Picea obovata) and Norway spruce (P. abies): cryptic refugia as stepping stones to the west? Mol. Ecol. 46:1–17. doi: 10.1111/mec.13654.

Ujvári-Jármay É, Nagy L, Mátyás C. 2016. The IUFRO 1964/68 Inventory Provenance Trial of Norway spruce in Nyírjes, Hungary – results and conclusions of five decades. Acta Silv. Lignaria Hungarica. 12:1–2. doi: 10.1515/aslh-2016-0001.

Velichkevich FY, Zastawniak E. 2003. The Pliocene flora of Kholmech, south-eastern Belarus and its correlation with other Pliocene floras of Europe. Acta Palaeobot. 43:137–259.

Volkova VS. 2011. Paleogene and Neogene stratigraphy and paleotemperature trend of West Siberia (from palynologic data). Russ. Geol. Geophys. 52:709–716. doi: 10.1016/j.rgg.2011.06.003.

Walker JF, Walker-Hale N, Vargas OM, Larson DA, Stull GW. 2019. Characterizing gene tree conflict in plastome-inferred phylogenies. PeerJ. 7:e7747. doi: 10.7717/peerj.7747.

Wan T et al. 2018. A genome for gnetophytes and early evolution of seed plants. Nat. Plants. 4:82–89. doi: 10.1038/s41477-017-0097-2.

Wang K et al. 2018. Incomplete lineage sorting rather than hybridization explains the inconsistent phylogeny of the wisent. Commun. Biol. 1:169. doi: 10.1038/s42003-018-0176-6.

Warren RL et al. 2015. Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism. Plant J. 83:189–212. doi: 10.1111/tpj.12886.

Wegrzyn JL et al. 2014. Unique features of the loblolly pine (Pinus taeda L.) megagenome revealed through sequence annotation. Genetics. 196:891–909. doi: 10.1534/genetics.113.159996.

Wen D, Nakhleh L. 2018. Coestimating reticulate phylogenies and gene trees from multilocus sequence data. Syst. Biol. 67:439–457. doi: 10.1093/sysbio/syx085.

Wendel JF, Jackson SA, Meyers BC, Wing RA. 2016. Evolution of plant genome architecture. Genome Biol. 17:37. doi: 10.1186/s13059-016-0908-1.

Westin J, Haapanen M. 2013. Norway spruce - Picea abies (L.) Karst. In: Best Practice for Tree Breeding in Europe. Mullin, T. & Lee, SJ, editors. Gävle, pp. 29–47.

Whitney KD, Ahern JR, Campbell LG, Albert LP, King MS. 2010. Patterns of hybridization in plants. Perspect. Plant Ecol. Evol. Syst. 12:175–182. doi: 10.1016/j.ppees.2010.02.002.

Whitney KD, Baack EJ, et al. 2010. A role for nonadaptive processes in plant genome size evolution? Evolution. 64:2097–2109. doi: 10.1111/j.1558-5646.2010.00967.x.

Whitney KD, Boussau B, Baack EJ, Garland Jr T. 2011. Drift and genome complexity revisited. PLoS Genet. 7:e1002092. doi: 10.1371/journal.pgen.1002092.

Whitney KD, Garland Jr T. 2010. Did genetic drift drive increases in genome complexity? PLoS Genet. 6:e1001080. doi: 10.1371/journal.pgen.1001080.

Widmer A, Lexer C, Cozzolino S. 2009. Evolution of reproductive isolation in plants. Heredity. 102:31–38. doi: 10.1038/hdy.2008.69.

Willis KJ, van Andel TH. 2004. Trees or no trees? The environments of central and eastern Europe during the Last Glaciation. Quat. Sci. Rev. 23:2369–2387. doi: 10.1016/j.quascirev.2004.06.002.

Willyard A, Syring J, Gernandt DS, Liston A, Cronn R. 2007. Fossil calibration of molecular divergence infers a moderate mutation rate and recent radiations for Pinus. Mol. Biol. Evol. 24:90–101. doi: 10.1093/molbev/msl131.

Wolfe AD, Randle CP. 2004. Recombination, heteroplasmy, haplotype polymorphism, and paralogy in plastid genes: Implications for plant molecular systematics. Syst. Bot. 29:1011–1020. doi: 10.1600/0363644042451008.

Wood TE et al. 2009. The frequency of polyploid speciation in vascular plants. Proc. Natl. Acad. Sci. 106:13875–13879. doi: 10.1073/pnas.0811575106.

Wright JW. 1955. Species crossability in spruce in relation to distribution and taxonomy. For. Sci. 1:319–349.

Wu C, Wang Y-N, Hsu C-Y, Lin C-P, Chaw S-M. 2011. Loss of different inverted repeat copies from the chloroplast genomes of Pinaceae and Cupressophytes and influence of heterotachy on the evaluation of gymnosperm phylogeny. Genome Biol. Evol. 3:1284–1295. doi: 10.1093/gbe/evr095.

Wu CS, Lin CP, Hsu CY, Wang RJ, Chaw SM. 2011. Comparative chloroplast genomes of Pinaceae: Insights into the mechanism of diversified genomic organizations. Genome Biol. Evol. 3:309–319. doi: 10.1093/gbe/evr026.

Wu Z, Stone JD, Štorchová H, Sloan DB. 2015. High transcript abundance, RNA editing, and small RNAs in intergenic regions within the massive mitochondrial genome of the angiosperm Silene noctiflora. BMC Genomics. 16:938. doi: 10.1186/s12864-015-2155-3.

Yu Y, Dong J, Liu KJ, Nakhleh L. 2014. Maximum likelihood inference of reticulate evolutionary histories. Proc. Natl. Acad. Sci. 111:16448–16453. doi: 10.1073/pnas.1407950111.

Zasada C, Gregory RA. 1969. Regeneration of white spruce, with reference to interior Alaska: a literature review. USDA Forest Service research paper PNW 79.

Zhu A, Fan W, Adams RP, Mower JP. 2018. Phylogenomic evidence for ancient recombination between plastid genomes of the Cupressus-Juniperus-Xanthocyparis complex (Cupressaceae). BMC Evol. Biol. 18:137. doi: 10.1186/s12862-018-1258-2.

Zhu A, Guo W, Gupta S, Fan W, Mower JP. 2016. Evolutionary dynamics of the plastid inverted repeat: the effects of expansion, contraction, and loss on substitution rates. New Phytol. 209:1747–1756. doi: 10.1111/nph.13743.

Zhu J, Nakhleh L. 2018. Inference of species phylogenies from bi-allelic markers using pseudo-likelihood. Bioinformatics. 34:i376–i385. doi: 10.1093/bioinformatics/bty295.

Zhu J, Wen D, Yu Y, Meudt HM, Nakhleh L. 2018. Bayesian inference of phylogenetic networks from bi-allelic genetic markers. PLoS Comput. Biol. 14:e1005932. doi: 10.1371/journal.pcbi.1005932.

Zonneveld BJM. 2012. Conifer genome sizes of 172 species, covering 64 of 67 genera, range from 8 to 72 picogram. Nord. J. Bot. 30:490–502. doi: 10.1111/j.1756-1051.2012.01516.x.

Zou J et al. 2013. Population genetic evidence for speciation pattern and gene flow between Picea wilsonii, P. morrisonicola and P. neoveitchii. Ann. Bot. 112:1829–1844. doi: 10.1093/aob/mct241.
