Genomic studies of speciation and gene flow Why study speciation genomics?

Long-standing questions (role of geography/gene flow)

How do genomes diverge?

Find speciation genes Genomic divergence during speciation

1. Speciation as a bi-product of physical isolation

2. Speciation due to selection – without isolation evolution.berkeley.edu Genomic divergence during speciation

1. Speciation as a bi-product of physical isolation 0.8 0.4 frequency d S 0.0 80 100 120 140 160 180 Transect position (km)

Cline theory - e.g. Barton and Gale 1993

2. Speciation due to selection – without isolation evolution.berkeley.edu Genomic divergence during speciation

1. Speciation as a bi-product of physical isolation T

i

m ? e

2. Speciation due to selection – without isolation evolution.berkeley.edu Wu 2001, JEB

Stage 1 - one or few loci under disruptive selection

Gene under selection

Genome

FST

Feder, Egan and Nosil TiG Stage 2 - Divergence hitchhiking

Genome

FST

Feder, Egan and Nosil TiG Stage 2b - Inversion

Inversion links co-adapted alleles

Genome

FST

Feder, Egan and Nosil TiG Stage 3 - Genome hitchhiking

Genome

FST

Feder, Egan and Nosil TiG Stage 4 - Genome wide isolation

Genome

FST

Feder, Egan and Nosil TiG Some sub-species clearly in stage 1

lWing pattern “races” of melpomene

Heliconius erato

1986 1986 0.8 2011 0.8 2011 n n 1 1 0.4 0.4 frequency 10 frequency 10 b D 50 50 0.0 0.0 80 100 120 140 160 180 80 100 120 140 160 180 0.8 0.8 0.4 0.4 frequency frequency r D C 0.0 0.0 80 100 120 140 160 180 80 100 120 140 160 180 0.8 0.8 0.4 0.4 frequency frequency d N S 0.0 0.0 80 100 120 140 160 180 80 100 120 140 160 180 Transect position (km) 0.8 0.4 frequency b Y 0.0 80 100 120 140 160 180 Transect position (km) Some sub-species clearly in stage 1 lWing pattern “races” of Heliconius melpomene

B (red/orange patterns) Yb (yellow/white patterns)

S. H. Martin et al. Genome Res. 23, 1817–1828 (2013). O. Seehausen et al. Nat. Rev. Genet. 15, 176–92 (2014). Some sub-species clearly in stage 1 lCarrion and hooded Crows

FST

Poelstra, J. W. et al. Science 344, 1410–4 (2014). And an example with multiple islands?

Malinsky et al., Science 350, 1493 (2015). Other species have islands…but are they real?

Ellegren, et al. Nature 491, 756- (2012). Other species have islands…but are they real? Anopheles gambiae and A. coluzzi Formerly M and S forms of A. gambiae

Clarkson et al. 2014 Nature Communications Other species have islands…but are they real? Seehausen et al., Nature Reviews Genetics, 2014 What do patterns of Fst really mean?

• Fst measures relative divergence

• Peaks indicate regions of higher than expected between population divergence, given the within population divergence

• Peaks can therefore result from reduced diversity within species

• This could be due to lower Ne within species (selective sweeps, background selection) • So peaks NOT NECESSARILY due to reduced gene flow Note that sometimes sweeps within species = speciation genes Sweeps across the species barrier can also lead to Fst peaks

Double peaks??

Nicolas Bierne, Daniel Berner and others Anopheles M-S divergence

Relative divergence higher in low recombination regions - not significant for absolute divergence

see also: Charlesworth 1998 MBE Measures of divergence… More recently see papers by Reto Burri

No evidence for higher Dxy in wing pattern loci lWing pattern “races” of Heliconius melpomene

B (red/orange patterns) Yb (yellow/white patterns)

S. H. Martin et al. Genome Res. 23, 1817–1828 (2013). O. Seehausen et al. Nat. Rev. Genet. 15, 176–92 (2014). Suggestion that we use absolute measures of divergence? Understanding genomic divergence

No single statistic will capture the complex history of mutation, migration and selection

Patterns need to be interpreted in the specific context of the study species Much better to use explicit tests for gene flow

Need to design sampling so the expectations in the absence of gene flow are clear and testable

The key is to identify ‘control’ populations that are not influenced by admixture Explicit tests for gene flow: Neanderthal genome

•Isolated DNA from bones 38,000 yrs old in Croatia •We diverged from Neanderthals around 270-440,000yrs ago •Evidence for gene exchange with humans (1-4% of genome?)

Green et al., 328:710 Science 2010 Explicit tests for gene flow: ABBA- BABA test

EXPECT: OBSERVE: 50% ABBA 103612 ABBA Green et al. 2010 Science 328:710-722 50% BABA 94029 BABA Explicit tests for gene flow: ABBA- BABA test Explicit tests for gene flow: ABBA- BABA test Explicit tests for gene flow: Combining multiple signals

1) Derived alleles at high frequency shared with Neanderthal 2) High divergence to Africa but low to Neanderthal 3) Long haplotype blocks

The genomic landscape of Neanderthal ancestry in present-day humans - Sankararaman et al. Nature 2014 Sampled 10 complete high coverage genomes per population Explicit tests for gene flow: Heliconius

Many sources of reproductive isolation:

Female hybrids are sterile Different host plant use Different habitat preference Strong assortative mating Using Simon Martin’s Twisst method to characterise relationships ABBA-BABA statistics

Simon Martin

melG melW cyd outgroup

Martin et al., MBE 2015

Downloaded from genome.cshlp.org onDownloaded September from 18,Downloaded genome.cshlp.org2013 - Published from genome.cshlp.org by on Cold September Spring Harbor18, on 2013 September Laboratory - Published 18, Press 2013 by Cold - Published Spring Harbor by Cold Laboratory Spring Harbor Press Laboratory Press

Downloaded from genome.cshlp.orgDownloaded from ongenome.cshlp.org September 18, Downloaded2013on September - Published from 18, by genome.cshlp.org2013 Cold - PublishedSpring Harbor by on Cold SeptemberLaboratory Spring Harbor18,Press 2013 Laboratory - Published Press by Cold Spring Harbor Laboratory Press

300+ offspring for each cross type John Davey

x x x

PstI RAD sequencing Linkage maps built (site every ~10kb) with Lep-MAP

Davey et al., Evol Letters 2017

617 Figure 2. Four-taxon617 maximum-likelihoodFigure617 2. Four-taxonFigure trees 2. Four-taxon maximum-likelihood for 100 kb maximum-likelihood windows trees for 100 trees kb windowsfor 100 kb windows

618 (A) Trees were superimposed618 (A) using 618Trees Densitree were(A) superimposedTrees (Bouckaert were superimposed using 2010). Densitree There using were (Bouckaert Densitree 2848 trees 2010).(Bouckaert for theThere H. 2010). were There2848 treeswere for2848 the trees H. for the H. 617 Figure617 2. Four-taxonFigure 2. maximum-likelihood Four-taxon617 maximum-likelihoodFigure trees 2. Four-taxonfor 100 treeskb windows maximum-likelihood for 100 kb windows trees for 100 kb windows 619 cydno – H. melpomene619 datasetcydno 619(left) – H. and cydnomelpomene 2453 – forH. melpomenedatasetthe H. timareta(left) dataset and – 2453H. (left) melpomene for and the 2453 H. datasettimareta for the (right). H.– H.timareta melpomene – H. melpomenedataset (right). dataset (right). 618 (A) Trees618 were(A) superimposed Trees were superimposed using618 Densitree(A) usingTrees (Bouckaert Densitreewere superimposed 2010). (Bouckaert There using 2010).were Densitree 2848 There trees were (Bouckaert for 2848 the H.trees 2010). for Therethe H. were 2848 trees for the H. 620 Tree-lengths were equalized620 Tree-lengths so620 that allTree-lengths treeswere couldequalized bewere superimposed, so equalized that all trees so and that could then all betreesa randomsuperimposed, could jitter be superimposed, was and then a randomand then jitter a random was jitter was 619 cydno619 – H. melpomenecydno – H. dataset melpomene (left)619 anddatasetcydno 2453 (left) –for H. andthe melpomene H.2453 timareta for datasetthe – H. H. timareta (left)melpomene and – 2453 H. dataset melpomene for the(right). H. datasettimareta (right). – H. melpomene dataset (right). 621 added to all branch621 lengthsadded to show621 to alldensity.added branch Treesto lengths all supportingbranch to show lengths each density. to of show the Trees four density. supporting possible Trees topologies eachsupporting of the eachfour possibleof the four topologies possible topologies 620 Tree-lengths620 wereTree-lengths equalized were so thatequalized620 all treesTree-lengths so couldthat all be treeswere superimposed, couldequalized be superimposed, soand that then all a trees random and could then jitter be a randomwassuperimposed, jitter was and then a random jitter was 622 are coloured accordingly:622 blueare colouredfor622 the speciesare accordingly: coloured tree, red accordingly: blue for forthe the geography bluespecies for tree,tree,the species redgreen for for tree,the the geographyred control for the tree, geography green fortree, the green control for the control 621 added621 to all branchadded lengthsto all branch to show621 lengths density.added to show Trees to alldensity. supporting branch Trees lengths each supporting ofto theshow four each density. possible of the Trees fourtopologies supporting possible topologies each of the four possible topologies 623 tree and black for 623unresolvedtree 623trees.and black (B)tree Thefor and unresolvedfour black topologies for trees.unresolved scored, (B) The trees. along four (B) with topologies The the four number scored,topologies and along scored, with alongthe number with theand number and 622 are coloured622 accordingly:are coloured blueaccordingly: for622 the species blueare colouredfor tree, the red species accordingly: for the tree, geography red blue for for the tree, the geography greenspecies for tree,tree, the redcontrolgreen for for the the geography control tree, green for the control 29 29 29 623 tree and623 blacktree for and unresolved black for trees. 623unresolved (B)tree The trees.and four black (B)topologies Thefor unresolvedfour scored, topologies alongtrees. scored, with(B) Thethe along numberfour with topologies and the number scored, and along with the number and

29 29 29 Pop gen vs actual estimates of recombination rate Recombination rate strongly correlated with admixture proportion

Simon Martin Short chromosomes have more admixture: And chromosome ends have more admixture:

Admixture tree

Species tree Sequenced 20 individuals per population at 20x coverage

Burri et al., Genome Research 2015

An alternative is to take an explicit modelling approach

IM and IMa Jody Hey Martin et al., Biorxiv 2015

The effect of background selection on introgression in humans

Admixture is less in gene rich regions supporting this model…..

Harris and Nielson Genetics 2016, Juric, Aeschbacher and Coop 2016 Population and speciation genomics: Conclusions • Great power to detect subtle signals of selection and gene flow • Can make more general observations about genes and regions involved in adaptation • BUT genomic processes complicate the picture

• Best approaches combine multiple signals to infer process

• Eventually we need to combine background selection, recombination, positive selection And finally a plug…. Adaptive introgression

Photo credit Andrei Sourakov

43 species

77 species Fritz Müller H. melpomene H. cydno F1

30

Expected number 15

No. Models Attacked No. Models 0

– G-test: G = 7.25, d.f. = 1, p = 0.007 Merrill et al., Proc. Roy. Soc 2012 102 204 76 Peaks of divergence correspond to wing pattern genes van Belleghem et al., Nature Ecol Evol Reed et al., 2011 Science Ac - band shape Yb - yellow patterns D - red patterns cortex WntA Chromosome 10 Chromosome 15 optix Chromosome 18

Wing pattern controlled almost entirely by large effect loci

Transcription Expressed in larva and factor, pupa paints red in Wild type knockout Cell cycle, related to fizzypupal wing Putative morphogen defines black-fated(Nadeau regions et al., Naturein larval(Richard disc Wallbank) (Arnaud Martin & Bob Reed) 2016) PATTERN A B hth D Day 1 Day 3 Day 5 Day 7 C 9 * P *

8

Hw * 7 * H. melpomene plesseni Drosophila

D

6

P expression Relative Relative expression Relative

5

Hw

4 H. melpomene malleti P D H P D H P D H P D H Heliconius

optix COLOUR cinnabar D Day 1 Day 3 Day 5 Day 7 Day 1 Day 3 Day 5 Day 7 12 * * 5 * 11 * 4.5 10 * * * * * * * 9 4 *

8 3.5 Relative expression Relative Relative expression Relative 7

3 6

2 5 P D H P D H P D H P D H P D H P D H P D H P D H

E

Richard Wallbank Adaptive introgression

Heliconius Genome Consortium Nature 2012 Phylogenies across B/D

ML tree based on 50,000 bp

Heliconius Genome Consortium Nature 2012 Phylogenies across B/D

ML tree based on 50,000 bp Phylogenies across B/D

ML tree based on 50,000 bp Phylogenies across B/D

ML tree based on 50,000 bp Phylogenies across B/D

ML tree based on 50,000 bp Okay, so introgression causes mimicry

But mimicry is weird, right? Novelty can arise through introgression and recombination

NNbb NNBB

Heliconius cydno cordula

Heliconius heurippa

nnBB

Camilo Salazar

Mavarez et al., Nature 2006 Heliconius melpomene melpomene Wallbank et al., PLoS Biology 2016

100kb optix

Blue Red Wallbank et al., PLoS Biology 2016 Wallbank et al., PLoS Biology 2016

Generate dated trees using this node as a reference point Wallbank et al., PLoS Biology 2016 H. numata H. ismenius H. besckei H. nattereri H. ethilla H. hecale H. athis H. luciana H. pardalinus H. elevatus H. melpomene H. timareta H. tristero H. heurippa H. pachinus

ABCDE H. cydno 43210 Million Years Ago

Stryjewski & Sorensen Nat Ecol Evol 2017 What about behaviour? H. melpomene X H. cydno Backcross design: 30

Richard 20 Merrill 10 Number of approaches 0 melponmene Difference between Difference approaches to cydno and approaches -10

bb Bb

Genotype Z

Significant QTL detected on three linkage groups

5% genome-wide significance threshold Richard Merrill unpub. Relative probability of courting Together explain differences ~50% measured Relative pro bH.abili tymelpomene of courting ~italic(H. melpomene)

0.0 0.2 0.4 0.6 0.8 LG 18 1.0 LG 1LG 17 between H. melpomene and H. and cydno between H. melpomene C C C M C C Genotype at putative QTL C M C C C M C C | C C | C All three C C M | C M | C M H. melpomene H. cydno males males Z NNbb NNBB

Heliconius cydno cordula

Heliconius heurippa nnBB

Heliconius melpomene melpomene

Mavarez et al., Nature 2006 40 30 20 Number of approaches 10 0

cordula heurippa melpomene

Melo et al., Evolution 2009

Lamichhaney et al., Nature 2015 ALX1 associated with beak shape Blunt

Pointy Most of these studies use phenotype associations to identify introgressed loci

But can we identify them a priori using the ABBA-BABA method? Green et al. 2010 Science 328:710-722 D is not at all good at detecting outlier windows

D is quite dependent on the number of informative sites (the denominator)

Where s is numerator from the D equation f is the fraction of introgression compared to maximum possible

Martin et al., MBE 2014 But Martin’s F is quite good at finding the introgression outliers Martin’s F

Martin et al., MBE 2014 • Smith and Kronforst argued that introgression could be inferred where ABBA-BABA outliers showed lower Dxy compared to genome-wide average

• Be wary of window based D statistics

• F is better than D…

• Sampling design is very important! Implications for tree-thinking

Archaea

Eukarya The tree of life is reticulated

Bacteria Implications for tree-thinking

Mallet, Hahn and Besansky BioEssays 2015 Hahn and Nakhleh Evolution 2015

Okay, so what have we learnt and where do we go from here?