<<

Introduction to the Multispecies Coalescent Will Freyman DNA-basedIB200, Spring species 2016 delimitation in algae 183

234 Time barriers to species A species B species C species A species B species C

T2

T1

gene tree = gene tree ≠ species tree species tree AB C ABC ... 3 2 generation 1 = MRCA deep coalescence incomplete lineage sorting

Figs 2-4. Illustrations of a Wright–Fisher process (a simplified version of the coalescence process), based on Degnan & Rosenberg (2009) and Mailund (2009). Fig. 2. Diagram showing a set of non-overlapping generations where each new generation is sampled from the previous at random with replacement. Each dot represents an individual gene copy and each line connects a gene copy to its ancestor in the previous generation. Starting with a set of individuals in a first generation, the second generation is created by randomly selecting a parent from the first population; the third generation is sampled from the second, and so on. During speciation Image: Leliaert, F., Verbruggen, H., Vanormelingen, P., Steen, F., López- Bautista,events (T1J. M., and Zuccarello T2), where, G. C., populations & De Clerck, becomeO. (2014). separated DNA-based by species a barrier to gene flow, gene copies will only be sampled from within the delimitationsame population. in algae. European This coalescence journal of phycology, process, 49(2), running 179-196. inside the species tree, has two important consequences (Figs 3 and 4). Fig. 3. DNA divergence times may be much older than speciation times, known as ‘deep coalescence’. As illustrated, the most recent common ancestor (MRCA) of two gene copy samples from species A, B and C is much older than the speciation event T1. Fig. 4. Deep coalescence in combination with incomplete fixation of gene lineages within species lineages (incomplete lineage sorting) may result in a gene tree and species tree with different topologies. For example, the gene tree based on three randomly sampled would wrongly suggest a sister relationship between species A and B.

Time species A species B species C reciprocally monophyletic species monophyly sp. A sp. B sp. C

paraphyly

polyphyly

sp. A sp. B sp. C

trans-species polymorphism paraphyletic polyphyletic paraphyletic Downloaded by [University of Alabama at Tuscaloosa], [Mr Frederik Leliaert] 08:14 16 May 2014 Fig. 5. Neutral coalescence process running within a species tree, based on Klein et al.(1998), Degnan & Rosenberg (2009) and Mailund (2009). Each dot represents an individual gene copy, each colour a different , and each line connects a gene copy to its ancestor in the previous generation. Within a population, selection and/or drift will result in changing allele frequencies over time. In the initial stages of lineage splitting, sister species will largely share identical alleles, which has important consequences for species delimitation. In this example, constructing a gene tree at an early stage of speciation would result in none of the three species being monophyletic. Only after sufficient time has gone by, will alleles be completely sorted in each lineage, resulting in reciprocal monophyly for the each of the three species.

effective population size (Avise, 2000; Hudson & species reside in a boundary zone where both pro- Coyne, 2002; Hickerson et al., 2006). cesses meet. Therefore, species delimitation should Thus, although within-species gene genealogies ideally incorporate aspects of both population genet- differ in nature from among-species phylogenetic ics and (Knowles & Carstens, 2007; relationships (Posada & Crandall, 2001), young Hey & Pinho, 2012). Coalescent theory provides the link between phylogenetic models and the underlying . DNA-based species delimitation in algae 183

234 Time barriers to gene flow species A species B species C species A species B species C

T2

T1

gene tree = gene tree ≠ species tree species tree AB C ABC ... 3 2 generation 1 = MRCA deep coalescence incomplete lineage sorting

Figs 2-4. Illustrations of a Wright–Fisher process (a simplified version of the coalescence process), based on Degnan & Rosenberg (2009) and Mailund (2009). Fig. 2. Diagram showing a set of non-overlapping generations where each new generation is sampled from the previous at random with replacement. Each dot represents an individual gene copy and each line connects a gene copy to its ancestor in the previous generation. Starting with a set of individuals in a first generation, the second generation is created by randomly selecting a parent from the first population; the third generation is sampled from the second, and so on. During speciation Image: Leliaert, F., Verbruggen, H., Vanormelingen, P., Steen, F., López- Bautista,events (T1J. M., and Zuccarello T2), where, G. C., populations & De Clerck, becomeO. (2014). separated DNA-based by species a barrier to gene flow, gene copies will only be sampled from within the delimitationsame population. in algae. European This coalescence journal of phycology, process, 49(2), running 179-196. inside the species tree, has two important consequences (Figs 3 and 4). Fig. 3. DNA divergence times may be much older than speciation times, known as ‘deep coalescence’. As illustrated, the most recent common ancestor (MRCA) of two gene copy samples from species A, B and C is much older than the speciation event T1. Fig. 4. Deep coalescence in combination with incomplete fixation of gene lineages within species lineages (incomplete lineage sorting) may result in a gene tree and species tree with different topologies. For example, the gene tree based on three randomly sampled alleles would wrongly suggest a sister relationship between species A and B.

Time species A species B species C reciprocally monophyletic species monophyly sp. A sp. B sp. C

paraphyly

polyphyly

sp. A sp. B sp. C

trans-species polymorphism paraphyletic polyphyletic paraphyletic Downloaded by [University of Alabama at Tuscaloosa], [Mr Frederik Leliaert] 08:14 16 May 2014 Fig. 5. Neutral coalescence process running within a species tree, based on Klein et al.(1998), Degnan & Rosenberg (2009) and Mailund (2009). Each dot represents an individual gene copy, each colour a different allele, and each line connects a gene copy to its ancestor in the previous generation. Within a population, selection and/or drift will result in changing allele frequencies over time. In the initial stages of lineage splitting, sister species will largely share identical alleles, which has important consequences for species delimitation. In this example, constructing a gene tree at an early stage of speciation would result in none of the three species being monophyletic. Only after sufficient time has gone by, will alleles be completely sorted in each lineage, resulting in reciprocal monophyly for the each of the three species.

effective population size (Avise, 2000; Hudson & species reside in a boundary zone where both pro- Coyne, 2002; Hickerson et al., 2006). cesses meet. Therefore, species delimitation should Thus, although within-species gene genealogies ideally incorporate aspects of both population genet- differ in nature from among-species phylogenetic ics and phylogenetics (Knowles & Carstens, 2007; relationships (Posada & Crandall, 2001), young Hey & Pinho, 2012). Coalescent theory allows us to model sources of gene tree- species tree incongruence that arise due to population level processes,DNA-based species and delimitation make in algaeestimates about demographic parameters.183

234 Time barriers to gene flow species A species B species C species A species B species C

T2

T1

gene tree = gene tree ≠ species tree species tree AB C ABC ... 3 2 generation 1 = MRCA deep coalescence incomplete lineage sorting

Figs 2-4. Illustrations of a Wright–Fisher process (a simplified version of the coalescence process), based on Degnan & Rosenberg (2009) and Mailund (2009). Fig. 2. Diagram showing a set of non-overlapping generations where each new generation is sampled from the previous at random with replacement. Each dot represents an individual gene copy and each line connects a gene copy to its ancestor in the previous generation. Starting with a set of individuals in a first generation, the second generation is created by randomly selecting a parent from the first population; the third generation is sampled from the second, and so on. During speciation Image: Leliaert, F., Verbruggen, H., Vanormelingen, P., Steen, F., López- Bautista,events (T1J. M., and Zuccarello T2), where, G. C., populations & De Clerck, becomeO. (2014). separated DNA-based by species a barrier to gene flow, gene copies will only be sampled from within the delimitationsame population. in algae. European This coalescence journal of phycology, process, 49(2), running 179-196. inside the species tree, has two important consequences (Figs 3 and 4). Fig. 3. DNA divergence times may be much older than speciation times, known as ‘deep coalescence’. As illustrated, the most recent common ancestor (MRCA) of two gene copy samples from species A, B and C is much older than the speciation event T1. Fig. 4. Deep coalescence in combination with incomplete fixation of gene lineages within species lineages (incomplete lineage sorting) may result in a gene tree and species tree with different topologies. For example, the gene tree based on three randomly sampled alleles would wrongly suggest a sister relationship between species A and B.

Time species A species B species C reciprocally monophyletic species monophyly sp. A sp. B sp. C

paraphyly

polyphyly

sp. A sp. B sp. C

trans-species polymorphism paraphyletic polyphyletic paraphyletic Downloaded by [University of Alabama at Tuscaloosa], [Mr Frederik Leliaert] 08:14 16 May 2014 Fig. 5. Neutral coalescence process running within a species tree, based on Klein et al.(1998), Degnan & Rosenberg (2009) and Mailund (2009). Each dot represents an individual gene copy, each colour a different allele, and each line connects a gene copy to its ancestor in the previous generation. Within a population, selection and/or drift will result in changing allele frequencies over time. In the initial stages of lineage splitting, sister species will largely share identical alleles, which has important consequences for species delimitation. In this example, constructing a gene tree at an early stage of speciation would result in none of the three species being monophyletic. Only after sufficient time has gone by, will alleles be completely sorted in each lineage, resulting in reciprocal monophyly for the each of the three species.

effective population size (Avise, 2000; Hudson & species reside in a boundary zone where both pro- Coyne, 2002; Hickerson et al., 2006). cesses meet. Therefore, species delimitation should Thus, although within-species gene genealogies ideally incorporate aspects of both population genet- differ in nature from among-species phylogenetic ics and phylogenetics (Knowles & Carstens, 2007; relationships (Posada & Crandall, 2001), young Hey & Pinho, 2012). Classic population genetic models:

These models predict changes in allele frequencies forward in time according to neutral drift.

Hardy-Weinberg (1908) assumptions: • Infinite population size • Non-overlapping generations • Random mating • No selection • No population structure • No Classic population genetic models:

These models predict changes in allele frequencies forward in time according to neutral drift.

Wright-Fisher (1930) assumptions: • Finite population size • Non-overlapping generations • Random mating • No selection • No population structure • No mutation Classic population genetic models:

These models predict changes in allele frequencies forward in time according to neutral drift.

Moran (1958) assumptions: • Finite population size • Overlapping generations • Random mating • No selection • No population structure • No mutation Wright-FisherForward-time Model Wright-Fisher Model:

time

Image: Laura Kubatko’s slides

Laura Kubatko () Coalescent Theory June 9, 2012 5 / 84 Wright-FisherForward-time Model Wright-Fisher Model:

time

Image: Laura Kubatko’s slides

Laura Kubatko () Coalescent Theory June 9, 2012 65 / 84 Wright-FisherWright-FisherForward-time Model Model Wright-Fisher Model:

time

Image: Laura Kubatko’s slides

LauraLaura Kubatko Kubatko () () Coalescent Theory JuneJune 9, 9, 2012 2012 7 5 / / 84 84 Wright-FisherWright-FisherForward-time Model Model Wright-Fisher Model:

time

Image: Laura Kubatko’s slides

LauraLaura Kubatko Kubatko () () Coalescent Theory JuneJune 9, 9, 2012 2012 9 5 / / 84 84 Wright-FisherWright-FisherForward-time Model Model Wright-Fisher Model:

time

Image: Laura Kubatko’s slides

LauraLaura Kubatko Kubatko () () CoalescentCoalescent Theory Theory JuneJune 9, 9, 2012 2012 10 5 / / 84 84 Wright-FisherWright-FisherForward-time Model Model Wright-Fisher Model:

time

Image: Laura Kubatko’s slides

LauraLaura Kubatko Kubatko () () Coalescent Theory JuneJune 9, 9, 2012 2012 12 5 / / 84 84 Wright-FisherWright-FisherForward-time Model Model Wright-Fisher Model:

time

Image: Laura Kubatko’s slides

LauraLaura Kubatko Kubatko () () CoalescentCoalescent Theory Theory JuneJuneJune 9, 9, 9, 2012 2012 2012 12 13 5 / / / 84 84 84 Coalescent theory: looking at the same process Wright-FisherWright-FisherWright-Fisherbackwards Model Model in time

time

Image: Laura Kubatko’s slides

LauraLauraLaura Kubatko Kubatko () () CoalescentCoalescent Theory Theory JuneJuneJune 9, 9, 9, 2012 2012 2012 1214 13 5 / / / 84 84 84 time Norecombination Randommating Noselection Nopopulation structure No gene flow Finitepopulation size

• • • • • John Kingman’s coalescent theory (1982): JohnKingman’s • 12 10 : Bob Thomson’s slides Thomson’s Bob : Image N 1 N ancestor: Because of this, the expected time until coalescence is simply The probability oflineages 2 coalescing is the probability of the same them choosing

N

Coalescentprobabilities:

We a very can model this in way... simple

How many generations ago did these alleles last share a common ancestor?

time

The coalescent model The coalescent model 9 11 N 1 N The probability oflineages 2 coalescing is the probability of the same them choosing ancestor: Because of this, the expected time until coalescence is simply N physical copies of a particular locus, not forms distinct of that locus Here alleles simply refer to Let’s think about the relationships between all of those alleles Imagine a single species made up ofImagine a single species made up N diploid individuals (2n total alleles) The coalescent model The coalescent model 20 18 2)! ! 2 n n N g n ) 1 2!( N (1 1 N Describes the time ofDescribes the the first success for trials with probability independent of success p and probability of failure (1-p) Rate = p or 1/N Mean = 1/por N n choose 2 accounts for the variety of ways that coalescence can occur Probability of coalescence event (or success rate) among n sampled lineages is = This is the g g N N The coalescent model The coalescent model 12 10 19 17 g ) 1 N =(1 Image: Bob Thomson’s slides Thomson’s Image:Bob 1 ... N ) 1 2)! N g N ! ) 2 n n N 1 n N (1 1 2!( N ⇥ ) (1 1 N 1 N = followed by coalescence Probability of coalescence no for g generations ancestor: Because of this, the expected time until coalescence is simply The probability oflineages 2 coalescing is the probability of the same them choosing (1 rate) among n sampled lineages is n choose 2 accounts for the variety of ways that coalescence can occur Probability of coalescence event (or success Probability that coalescence occurs g+1 generationsback: g g

N N N

Coalescentprobabilities:

We a very can model this in way... simple How many generations ago did these alleles last share a common ancestor?

time

The coalescent model

The coalescent model

The coalescent model The coalescent model 9 11 N 1 N The probability oflineages 2 coalescing is the probability of the same them choosing ancestor: Because of this, the expected time until coalescence is simply N Here alleles simply refer to physical copies of a particular locus, not forms distinct of that locus Let’s think about the relationships between all of those alleles Imagine a single species made up ofImagine a single species made up N diploid individuals (2n total alleles) The coalescent model The coalescent model 20 18 Image: Bob Thomson’s slides Thomson’s Image:Bob 2)! ! 2 n n N g n ) 1 2!( N (1 1 N of success p and probability of failure (1-p) Rate = p or 1/N Mean = 1/p N or Describes the time ofDescribes the the first success for trials with probability independent Probability of coalescence event (or success rate) n sampled lineages among is n choose 2 accounts for the variety of ways that coalescence can occur = This is the geometric distribution g g

N N Coalescentprobabilities:

time

The coalescent model The coalescent model 19 17 g ) 1 N =(1 1 ... N ) 1 2)! N g ! ) 2 n n N 1 n N (1 2!( ⇥ ) (1 1 N 1 N = Probability ofno coalescence for g generations followed by coalescence (1 n choose 2 accounts for the variety of ways that coalescence can occur Probability of coalescence event (or success rate) n sampled lineages among is Probabilitythat coalescence occurs g+1 generations back: g g N N The coalescent model The coalescent model 22 24 t t 2 2 n n N N = = e e Geometric is a discrete distribution time distribution Continuous time version the exponential is distribution As N goes to infinity, the coalescent process converges to a continuous markov time process with instantaneous rate of coalescence: Geometric distribution is a discrete time distribution Continuous time version is the As N goes to infinity, the coalescent process converges to a continuous time markov process with instantaneous rate of coalescence: t t N N The coalescent model The coalescent model 23 21 Image: Bob Thomson’s slides Thomson’s Image:Bob g ) t 2 2 n n N N 2)! ! 2 n n N n = (1 e 2!( 2 n N Probabilityof coalescence event (or success rate) n sampled lineages among is n choose 2 accounts for the variety of ways that coalescence can occur Expected time becomes: Geometric distribution is a discrete time distribution Continuous time version is the exponential distribution As N goes to infinity, the coalescent process converges to a continuous time markov process with instantaneous rate of coalescence: t g

N N Coalescentprobabilities:

time

The coalescent model The coalescent model The coalescent model The coalescent model Geometric distribution is a discrete time g distribution Probability of coalescence event (or success rate) among n sampled lineages is Continuous time version is the exponential n distribution 2 t N t n choose 2 accounts for the variety of ways e that coalescence can occur n! As N goes to infinity, the coalescent process 2!(n 2)! converges to a continuous time markov process Expected time becomes: with instantaneous rate of coalescence: N n n N n 2 2 (1 2 )g = N N N 21 22

The coalescent model TheCoalescent coalescent probabilities: model Geometric distribution is a discrete time Geometric distribution is a discrete time distribution distribution Continuous time version is the exponential Continuous time version is the exponential distribution distribution t t t t e e

As N goes to infinity, the coalescent process As N goes to infinity, the coalescent process converges to a continuous time markov process converges to a continuous time markov process with instantaneous rate of coalescence: with instantaneous rate of coalescence: N n N n = 2 = 2 N N 23 24 Image: Bob Thomson’s slides The coalescent model The coalescent model

We’ve been assuming constant population size We’ve been assuming constant population size Instead of N, we can specify a function that Instead of N, we can specify a function that describes a changing population size through describes a changing population size through t time t time N N(t) N N(t)

N Our instantaneous rate of coalescence is a N Our instantaneous rate of coalescence is a function of N, so we need to integrate the rate function of N, so we need to integrate the rate of coalescence across the function for N of coalescence across the function for N

n n n t n n n n t n (2) 2 2 (2) 2 2 2 N exp dt 2 N exp dt e N(t) N(t) e N(t) N(t) N Z0 ! N Z0 !

25 26

The coalescent model The coalescent model Coalescent probabilities: We’ve been assuming constant population size We have a nice function to calculate the probability of one coalescent event occurring at time t, given a Instead of N, we can specify a function that demographic function of t: describes a changing population size through n t n time P(t)= 2 exp 2 dt t t N(t) N(t) N N(t) Z0 ! What is the probability of all coalescent events observed in a sample? N Our instantaneous rate of coalescence is a N function of N, so we need to integrate the rate Given a demographic function and a list of coalescence times L = (0,tn,tn 1,...) of coalescence across the function for N Each probability is independent, so take the n n n t n (2) product n n n 2 N t 2 2 t e exp dt 2 2 N N(t) 0 N(t) ! P(L N(t)) = exp dt Z | N(t) 0 N(t) i=2 Z ! 27 Y 28 Image: Bob Thomson’s slides The coalescent model The coalescent model

We have a nice function to calculate the probability of We have a nice function to calculate the probability of one coalescent event occurring at time t, given a one coalescent event occurring at time t, given a demographic function of t: demographic function of t: n t n n t n P(t)= 2 exp 2 dt P(t)= 2 exp 2 dt t N(t) N(t) t N(t) N(t) Z0 ! Z0 ! What is the probability of all coalescent events What is the probability of all coalescent events observed in a sample? observed in a sample? N N Given a demographic function and a list of Given a demographic function and a list of coalescence times L = (0,tn,tn 1,...) coalescence times L = (0,tn,tn 1,...) Each probability is independent, so take the Each probability is independent, so take the product n n t n product n n t n P(L N(t)) = 2 exp 2 dt P(L N(t)) = 2 exp 2 dt | N(t) N(t) | N(t) N(t) i=2 Z0 ! i=2 Z0 ! Y 29 Y 30

The coalescent model The coalescent model Coalescent probabilities:

We have a nice function to calculate the probability of one coalescent event occurring at time t, given a Starting with first principles, we can derive a model that demographic function of t: describes the probability of coalescence histories within a n t n P(t)= 2 exp 2 dt lineage t N(t) N(t) Z0 ! Connects our simplified idea of a phylogenetic lineage back What is the probability of all coalescent events observed in a sample? to the underlying population genetics N Given a demographic function and a list of We end up with an equation that allows us to calculate the coalescence times L = (0,tn,tn 1,...) likelihood of an observed set of coalescence times Each probability is independent, so take the product n n t n P(L N(t)) = 2 exp 2 dt | N(t) N(t) i=2 Z0 ! Y 31 32 Image: Bob Thomson’s slides The coalescent model The coalescent model

We have a nice function to calculate the probability of We have a nice function to calculate the probability of one coalescent event occurring at time t, given a one coalescent event occurring at time t, given a demographic function of t: demographic function of t: n t n n t n P(t)= 2 exp 2 dt P(t)= 2 exp 2 dt t N(t) N(t) t N(t) N(t) Z0 ! Z0 ! What is the probability of all coalescent events What is the probability of all coalescent events observed in a sample? observed in a sample? N N Given a demographic function and a list of Given a demographic function and a list of coalescence times L = (0,tn,tn 1,...) coalescence times L = (0,tn,tn 1,...) Each probability is independent, so take the Each probability is independent, so take the product n n t n product n n t n P(L N(t)) = 2 exp 2 dt P(L N(t)) = 2 exp 2 dt | N(t) N(t) | N(t) N(t) i=2 Z0 ! i=2 Z0 ! Y 29 Y 30

The coalescent model The coalescent model

We have a nice function to calculate the probability of one coalescent event occurring at time t, given a Starting with first principles, we can derive a model that demographic function of t: describes the probability of coalescence histories within a n t n 2 2 lineage t P(t)= exp dt N(t) 0 N(t) ! Coalescent probabilities: Z Connects our simplified idea of a phylogenetic lineage back What is the probability of all coalescent events observed in a sample? to the underlying population genetics N We have now derivedGiven a demographic likelihood function function and a list of for a model that We end up with an equation that allows us to calculate the describes the probabilitycoalescence of times a coalescentL = (0,tn history,tn 1,... within) a lineage: likelihood of an observed set of coalescence times Each probability is independent, so take the product n n t n P(L N(t)) = 2 exp 2 dt | N(t) N(t) i=2 Z0 ! Y 31 32

These probabilities are based on:

1. The population size 2. The times to coalescent events

Image: Bob Thomson’s slides 484 Review TRENDS in Ecology and Vol.18 No.9 September 2003

Box 3. The coalescent and measurably evolving populations

Molecular sequences, whether sampled simultaneously or serially to the population size at that time, the pattern of observed coalescence through time, can be used to reconstruct the demographic history of and sampling events can be used to estimate the demographic history natural populations. This approach is based on a population genetic of the population. model called the coalescent, introduced by Kingman [8] and The coalescent model provides a probability distribution of times generalized by Griffiths and Tavare´ [9]. Although these papers consider between the coalescence events in the sample genealogy. This populations sampled at one time point, the coalescent model has been distribution depends on a demographic function (e.g. constant size, subsequently extended to measurably evolving populations by Rodrigo exponential growth or logistic growth) that describes population-size and Felsenstein [46]. change, and the likelihood of this function can be calculated given a The coalescent describes the relationship between the demographic specified genealogy. Hence, the demographic function can be esti- history of a large population and the shared ancestry of individuals mated from gene sequences by combining the phylogenetic and randomly sampled from it, as represented by a genealogical tree. This coalescent likelihood functions, as described in Box 4. tree, in turn, determines the pattern of genetic diversity seen in sampled The coalescent is a variable process, so it often produces estimates sequences (Box 2). Figure I illustrates the shared ancestry of individuals with large confidence limits. Statistical power is increased by the use of sampled from constant-sized (a) and exponentially growing popu- sequences from multiple unlinked loci (as they represent multiple lations (b), respectively. Moving back in time from the present, we independent runs of the coalescent process), or by the use of sequences follow the number of lineages in the genealogy in each generation. sampled at different times. The greater the spread of sampling times of This value decreases when two lineages share a common ancestor sequences, the more precise the estimates of population size and (a coalescence event), and increasesPopulation when sampled size individuals and coalescent are substitution times rate will be are [23]. Finally,related. heterochronous sequences contain encountered (a sampling event). Because the probability that a information about mutation rate (Box 2), so the demographic function coalescence event occurs at a particularGiven time one is inversely parameter proportional wecan can be estimated estimate in calendar the time other. units (generations or years).

(a) (b)

TRENDS in Ecology & Evolution

Fig. I.

that heterochronous samples taken from a population Measuring the evolution of RNA viruses should be measurably evolving. This expectation has As a group, RNA viruses encompass such well known Image:recently Drummond, been usedA. J., Pybus to challenge, O. G., Rambaut the validity, A., Forsberg, ofclaims R., & Rodrigo, of A.pathogens G. (2003). Measurably as HIV, influenzaevolving populations. and foot Trends and in mouth Ecology disease, & Evolution, 18(9),successful 481-488. isolation of DNA from exceptionally ancient and are characterized by populations that continuously sources of bacteria [37,38]. generate huge numbers of owing to their large Previously, phylogenetic calibrations of the rate of numbers, very short generation times and the error-prone evolution within species have not been possible because nature of their replication machinery [39,40]. Some of of the difficulty in assigning fossils to specific lineages these mutations are carried to fixation by random genetic at this taxonomic level. As a result, most estimates of drift or by the strong directional selection exerted by host the rate of molecular evolution have been at the level immune responses, resulting in a very fast rate of of genera or higher. Independent estimates based on substitution of the order of 1023 substitutions site21 y21 ancient DNA will not only give us a more detailed [19]. This rate is a million-fold greater than that observed picture of variation in evolutionary rates across species in eukaryotes [41], so that samples of RNAviruses showing and populations, but will also help test the validity of measurable evolution can be obtained from short recent methods designed to estimate the variation in sequences (,300 ) sampled over short time rates across lineages [16,20–22]. intervals (,1 year). As a consequence of this wealth of

http://tree.trends.com Coalescent models usually assumes neutrality, but can also be used to detect selection.

Image: Bamshad, M., & Wooding, S. P. (2003). Signatures of in the human genome. Nature Reviews Genetics, 4(2), 99-111. The coalescent model The coalescent model

DNA-based species delimitation in algae We have a nice function to calculate the probability183 of We have a nice function to calculate the probability of one coalescent event occurring at time t, given a one coalescent event occurring at time t, given a 234demographic function of t: demographic function of t: barriers to gene flow Time species A species B species C species An species B speciest n C n t n P(t)= 2 exp 2 dt P(t)= 2 exp 2 dt t N(t) N(t) t N(t) N(t) Z0 ! Z0 ! T2 What is the probability of all coalescent events What is the probability of all coalescent events observed in a sample? observed in a sample? T1 N N Given a demographic function and a list of Given a demographic function and a list of gene tree = coalescence times L = (0,tgenen,t treen ≠1,...) coalescence times L = (0,tn,tn 1,...) species tree species tree Each probability is independent, so take the Each probability is independent, so take the AB C ABC ... product n n t n product n n t n 3 P(L N(t)) = 2 exp 2 dt P(L N(t)) = 2 exp 2 dt 2 | N(t) N(t) | N(t) N(t) generation 1 = MRCA i=2 Z0 ! i=2 Z0 ! deep coalescence incompleteY lineage sorting 29 Y 30 Figs 2-4. Illustrations of a Wright–Fisher process (a simplified version of the coalescence process), based on Degnan & Rosenberg (2009) and Mailund (2009). Fig. 2. Diagram showing a set of non-overlapping generations where each new generation is sampled from the previous at random with replacement. Each dot represents an individual gene copy and each line connects a gene copy to its ancestor in the previous generation. Starting with a set of individuals in a first generation, the second generation is created by randomly selecting a parent from the first population; the third generation is sampled from the second, and so on. During speciation events (T1 and T2), where populations become separated by a barrier to gene flow, gene copies will only be sampled from within the same population. This coalescence process, running inside the species tree, has two important consequences (Figs 3 and 4). Fig. 3. DNA divergence times may be much older thanThe speciation coalescent times, known as ‘deep coalescence model’. As illustrated, the most recent The coalescent model common ancestor (MRCA) of two gene copy samples from species A, B and C is much older than the speciation event T1. Fig. 4. Deep coalescence in combination with incomplete fixation of gene lineages within species lineages (incomplete lineage sorting) may result in a gene tree and species tree with different topologies. For example, the gene treeWe based have ona nice three function randomly to calculate sampled the probability alleles of would wrongly suggest a sister relationship between species A and B. one coalescent event occurring at time t, given a Starting with first principles, we can derive a model that demographic function of t: describes the probability of coalescence histories within a n t n Time species A species B species C P(t)= 2 exp 2 dt lineage Extendingt reciprocally to monophyleticN the(t) multispecies species N (t) monophyly sp. A sp. B sp. C Z0 ! Connects our simplified idea of a phylogenetic lineage back coalescent:What is the probability of all coalescent events paraphyly observed in a sample? to the underlying population genetics N polyphyly We have describedGiven a demographic a model function thatand a listcalculates of the We end up with an equation that allows us to calculate the coalescence times L = (0,tn,tn 1,...) probability of a coalescent history in a lineage: likelihood of an observed set of coalescence times sp. A Each probabilitysp. B is independent,sp. C so take the

trans-species polymorphism paraphyletic polyphyletic paraphyletic product n n t n P(L N(t)) = 2 exp 2 dt | N(t) N(t) i=2 Z0 ! Y 31 32 Each branch of a phylogeny is a lineage. Downloaded by [University of Alabama at Tuscaloosa], [Mr Frederik Leliaert] 08:14 16 May 2014 Fig. 5. Neutral coalescence process running within a species tree, based on KleinFor etevery al.(1998 2 alleles), Degnan present & Rosenberg within (2009 a )branch, and they Mailund (2009). Each dot represents an individual gene copy, each colour a differentwill have allele, andsome each probability line connects aof gene coalescing copy to its or not. ancestor in the previous generation. Within a population, selection and/or drift will result in changing allele frequencies over time. In the initial stages of lineage splitting, sister species will largely share identical alleles, which has important consequences for species delimitation. In this example, constructing a gene tree at an early stage of speciationSo the would probability result in none of ofa thegene three tree species given being a species monophyletic. Only after sufficient time has gone by, will alleles be completelytree sortedis the in product each lineage, of the resulting coalescent in reciprocal histories for monophyly for the each of the three species. all branches of the tree….. effective population size (Avise, 2000; Hudson & species reside in a boundary zone where both pro- Coyne, 2002; Hickerson et al., 2006). cesses meet. Therefore, species delimitation should Thus, although within-species gene genealogies ideally incorporate aspects of both population genet- differ in nature from among-species phylogenetic ics and phylogenetics (Knowles & Carstens, 2007; relationships (Posada & Crandall, 2001), young Hey & Pinho, 2012). DNA-based species delimitation in algae 183

234 Time barriers to gene flow species A species B species C species A species B species C

T2

T1

gene tree = gene tree ≠ species tree species tree AB C ABC ... 3 2 generation 1 = MRCA deep coalescence incomplete lineage sorting

Figs 2-4. Illustrations of a Wright–Fisher process (a simplified version of the coalescence process), based on Degnan & Rosenberg (2009) and Mailund (2009). Fig. 2. Diagram showing a set of non-overlapping generations where each new generation is sampled from the previous at random with replacement. Each dot represents an individual gene copy and each line connects a gene copy to its ancestor in the previous generation. Starting with a set of individuals in a first generation, the second generation is created by randomly selecting a parent fromThe the coalescentfirst population; modelThe coalescent the third model generationThe coalescent ismodel sampledThe coalescent from model the second, and so on. During speciation We have a nice function to calculate the probability of We have a nice function to calculate the probability of events (T1 and T2), where populations becomeone coalescentseparated event occurring at time t, given a byWe have a a nice barrier function to calculate the probability to of geneone coalescentfl eventow, occurring at time t, gene given a copiesWe have a nice function to willcalculate the probability only of be sampled from within the The coalescent model demographic functionThe of t: coalescent modelone coalescent event occurring at time t, given a demographic function of t: one coalescent event occurring at time t, given a n t n demographic function of t: n t n demographic function of t: same population. This coalescence process,P( runningt)= 2 exp inside2 dt then speciest n tree,P(t)= has2 exp two2 importantdt n consequencest n (Figs 3 and 4). Fig. 3. t 2 2 t 2 2 We have a nice function to calculate the probability of N (t) 0 N (t) ! t WeP( havet)= a nice functionexp to calculate the probability of dt N (t) 0 N (t) ! t P(t)= exp dt one coalescent event occurring at time t, given a Z one coalescentN event(t )occurring at time t,0 givenN a (t) ! Z N(t) 0 N(t) ! What is the probability of all coalescent events Z What is the probability of all coalescent events Z demographic function of t: demographic function of t: observed in a sample? What is the probability of all coalescent events observed in a sample? What is the probability of all coalescent events DNA divergence times may ben mucht nN older than speciationobserved in a times,n sample? t nN known as ‘deep coalescenceobserved in a sample? ’. As illustrated, the most recent P(t)= 2 exp 2 dt Given a demographic function and a list Nof *BEASTP(t)= 2 exp 2 dt ModelGiven a demographic function and a list Nof *BEAST Model t N(t) N(t) t Given Na demographic(t) function and a listN of(t ) Given a demographic function and a list of Z0 ! coalescence times L = (0,tn,tn 1,...) Z0 ! coalescence times L = (0,tn,tn 1,...) common ancestor (MRCA) of two gene copy*BEAST samples from Modelcoalescence species times L = (0,tn,t A,n 1,...) B and C is much oldercoalescence times L than = (0,tn,tn 1 the,...) speciation*BEAST event Model T1. Fig. 4. What is the probability of all coalescent events Each probability is independent, so take the What is the probability of all coalescent events Each probability is independent, so take the observed in a sample? observedEach in probability a sample? is independent, so take the Each probability is independent, so take the product n n t n productnn n t n n N N product n product n Given a demographic function and a list of P(L N(t)) = 2 exp 2 dt Given a demographicn function and a list of t n P(L N(t)) = 2 Pexp(d g2 dt)P (g S)n P (S)t n P (d g )P (g S)P (S) Deep coalescence in combination with incomplete| N(t) fixation N(t) P(L N of(t)) = genen2 exp lineages2 dt | withini=1N(t) species i N(ti) P(L N(t lineages)) =i 2 exp (incomplete2 dt lineage sorting)n may i=1 i i i coalescence times L = (0,tn,tn 1,...) i=2 Z0 ! coalescence times L = (0,tn,tn 1,...) i=2 Z0 ! Y | N(Pt) ( S 0DN()=t) ! Y | | | N(t) 0 N(t) ! P (S D)= | | 29 i=2 PZ(d g )P (g S)P (S) 30 i=2 Z P (d g )P (g S)P (S) Each probability is independent, so take the Each probabilityY is iindependent,=1 so take the i i i Y i=1 i i i 29 30 product product | P (D) | P (D) result in a gene tree and speciesn treen witht n differentP topologies.(S D)= n Forn example,t n | the| gene tree based on three randomlyP ( sampledS D)= alleles | | P(L N(t)) = 2 exp 2 dt P(L N(t)) = 2 exp 2 dt Q Q | N(t) TheN(t) coalescent model | | N(t) TheN(t) P coalescent(D) model | P (D) i=2 0 ! i=2 0 ! Y Z YQ4 components:Z Q4 components: 29 30 would wrongly suggest a sister relationship betweenWe have a nice function species to calculate the probability of A and B. We have a nice function to calculate the probability of 4 onecomponents: coalescent event occurring at time t, given a one coalescent event occurring at time t, given a 4 components: The coalescent model demographic function of t: The coalescent model demographic function of t: The coalescentn t nmodel P ( d i g i ) - standard likelihoodThe coalescentn for alignmentt nmodel and gene P ( d i g i ) - standard likelihood for alignment and gene P(t)= 2 exp 2 dt P(t)= 2 exp 2 dt We have a nice functiont to calculate the probability of | t | PN ( ( dt ) i g i ) 0- Nstandard (t) ! likelihood for alignment andN ( tgene) 0 N (t) ! P ( d i g i ) - standard likelihood for alignment and gene one coalescent event occurring at time t, given a Z We have a nice function to Startingcalculatetree the withprobability i first of principles, we can derive a model that Z tree i demographic function of t:What is the probability of| all coalescent eventsone coalescent event occurring at time t, given a StartingWhat is the probability with first of all principles, coalescent events we can derive a model that | The coalescent model Theobserved coalescent in a sample? modeldemographic function of t:describes the probability of coalescence historiesobserved within in a a sample? n treet n i describes the probability of coalescence histories within a tree i N 2 2 n lineage t n N t P(t)= exp Given a demographic functiondt and a list of 2 2 lineageGiven a demographic function and a list of We have a nice function to calculate the probability of N(t) 0 N(t) P(t)= exp dt coalescenceZ times L =! (0,tt n,tn 1,...) P ( g i S ) - coalescent likelihoodcoalescence times L =of (0 ,tgenen,tn 1,... trees) P ( g i S ) - coalescent likelihood of gene trees one coalescent event occurring at time t, given a Starting with first principles, we can deriveN( ta) modelConnects that 0 ourN( simplifiedt) ! idea of a phylogenetic lineage back Time species AWhat is the probability species of all coalescent events B speciesZ C demographic function of t: Each probability is independent, so take the | ConnectsEach probability our simplifiedis independent, idea so take of the a phylogenetic lineage back | observed in a sample? describesP the( g probability i S ) of - coalescence coalescentWhat is the probabilityhistories of allwithinto coalescent the likelihood underlyinga events population of genetics gene trees P ( g i S ) - coalescent likelihood of gene trees n t nN product n n observedt n in a sample? to theproduct underlyingn n population geneticst n 2 2 lineage 2 | 2 2 2 | t P(t)= exp dt Given a demographicP(L N function(t)) =and a list Nof exp dt We end up with an equation that allows P(us Lto Ncalculate(t)) = the exp dt N(t) 0 N(t) coalescence times| L = (0,t ,tN(t),...) NGiven(t a) demographic function and a list of | We end up withN( ant) equation that Nallows(t) us to calculate the Z ! i=2n n 1 0 ! likelihood of an observed set of coalescence times i=2 0 ! Connects Your simplified idea ofZ a phylogeneticcoalescence times lineageL = back (0,t n,tn 1,...) Y reciprocallyZ monophyletic species What is the probability of all coalescent events Each probability is independent, so take the 29 likelihood of an observed set of coalescence30 times observed in a sample? to the underlying population geneticsEach probability is independent,P so( takeS the) P (D) P (S) P (D) monophyly product n n t n - uniform topology -Marginal likelihood - uniform topology -Marginal likelihood N The coalescent model 2 2 product Then n coalescentt n model Given a demographic function and a list of P(L N(t)) = Weexp end up with an equationdt that allows us to calculate the sp. A sp. B sp. C | N(t) P ( S N ( )t ) P( L- Nuniform(t)) = 2topologyexp 2 dt P (D) -Marginal likelihood P ( S ) - uniform topology P (D) -Marginal likelihood coalescence times L = (0,tn,tn 1,...) i=2 0 ! likelihood of Zan observed set |of coalescence timesN(t) 0 N ( t )- birth-death or Yule divergence - birth-death or Yule divergence We have a nice functionY to calculate the probability of 31 i=2 !We have a nice function to calculate the probability of 32 Each probability is independent, so take the Z one coalescent event occurring at time t, given a Y 31 one coalescent event occurring at time t, given a 32 product - birth-death or Yule divergence - birth-death or Yule divergence n n t n demographic function of t: demographic function of t: 2 2 - gamma pop sizes with hyperprior - gamma pop sizes with hyperprior P(L N(t)) = exp Thedt coalescentn modelt n The coalescentn modelt n Heled and Drummond 2010 Heled and Drummond 2010 | N(t) N(t) 2 2 2 2 i=2 0 !P(t)= exp dt P(t)= exp dt paraphyly Y Z t - gamma pop sizes witht hyperprior - gamma pop sizes with hyperprior 31 N(t) 0 N(t) ! 32 N(t) 0 N(tHeled) ! and Drummond 2010 Heled and Drummond 2010 WeZ have a nice function to calculate the probability of Z 73 74 What is the probability of all coalescentone coalescent events event occurring at time t, given a StartingWhat with is thefirst probability principles, of all coalescent we can events derive a model that observed in a sample? observed in a sample? 73 74 N demographic function of t: N describes the probability of coalescence histories within a Given a demographic function and a list of n t n Given a demographic function and a list of 2 2 lineage coalescence times Lt =P( (0t,t)=n,tn 1,...exp) dt coalescence times L = (0,tn,tn 1,...) N(t) 0 N (t) ! Each probability is independent, so take the Z Connects ourEach simplified probability is independent, idea of a sophylogenetic take the lineage back What is the probability of all coalescent events polyphyly product product n n observed in at sample?n to the underlying npopulationn genetics t n P(L N(t))N = 2 exp 2 dt P(L N(t)) = 2 exp 2 dt | N(t) Given a 0demographicN(t) function and a list of We end| up with an equationN(t) that allows 0 usN to(t calculate) the i=2 Z ! i=2 Z ! Y coalescence times L = (0,tn,tn 1,...) Y 29 likelihood of an observed set of coalescence times30 Each probability is independent, so take the product n n t n P(L N(t)) = 2 exp 2 dt | N(t) N(t) i=2 0 ! Y Z *BEAST31 Model sp. A 32 sp. B sp. C *BEAST The coalescent model*BEAST ModelThe coalescent model *BEAST trans-species polymorphism paraphyletic polyphyletic paraphyletic n We have a nice function to calculate the probability of one coalescent event occurring at time t, given a Startingn with first principles, we can derivei a =1model thatP (di gi)P (gi S)P (S) P (gi S) demographic function of t: describesP ( thePS probability(Dd )= ofg coalescence)P histories(g withinS) a P (S| ) | n t n i=1 i i i P (gi S) | 2 2 lineage t P(t)= exp P (dtS D)= | | | P (D) N(t) 0 N(t) | Likelihood of gene trees given the species tree Z !| Connects our simplified ideaP of (a phylogeneticDQ) lineage back What is the probability of all coalescent events 4 components: Likelihood of gene trees given the species tree observed in a sample? Qto the underlying population genetics N Given a demographic function4 andcomponents: a list of We end up with an equation that allows us to calculate the We have an equation to calculate the likelihood of coalescence times L = (0,tn,tn 1,...) likelihoodP ( dof an observed g ) -set standard of coalescence times likelihood for alignment and gene We have an equation to calculate the likelihood of Each probability is independent, so take the i i coalescent histories within a lineage product n n P t( d n i g i ) - standard likelihood| for alignment and gene P(L N(t)) = 2 exp 2 dt | N(t) N(t) tree i coalescent histories within a lineage i=2 0 | ! Y Z n n t n tree i31 32 P (g S) n P(nL N(t)) = t 2n exp 2 dt i - coalescent likelihood of gene trees 2 2 P ( g i S ) - coalescent likelihood| of gene trees P(L N(t)) = | exp N(t) dt 0 N(t) ! | | N(t) i=2 N (t) Z i=2 Y0 ! Y Z P ( S ) - uniform topology P (D) -Marginal likelihood How might we extend this to a whole tree? Downloaded by [University of Alabama at Tuscaloosa], [Mr Frederik Leliaert] 08:14 16 May 2014 P ( S ) - uniform topology P (D) -Marginal likelihood How might we extend this to a whole tree? Fig. 5. Neutral coalescence process running within a species tree, based - birth-death on Klein or Yuleet divergence al.(1998), Degnan & Rosenberg (2009) and - birth-death or Yule - divergence gamma pop sizes with hyperprior Mailund (2009). Each dot represents an individual gene copy, - gamma each pop sizes colour with hyperprior a different allele, and eachHeled and lineDrummond connects 2010 a gene copy to its Heled and Drummond 2010 Heled and Drummond 2010 75 Heled and Drummond 2010 76 ancestor in the previous generation. Within a population, selection and/or drift will result in changing75 allele frequencies over time. In 76 the initial stages of lineage splitting, sister species will largely share identical alleles, which has important consequences for species delimitation. In this example, constructing a gene tree at an early stage of speciation would result in none of the three species being monophyletic. Only after sufficient time has gone by, will alleles be completely sorted in each lineage, resulting in reciprocal monophyly for the each of the three species. effective population size (Avise, 2000; Hudson & species reside in a boundary zone where both pro- Coyne, 2002; Hickerson et al., 2006). cesses meet. Therefore, species delimitation should Thus, although within-species gene genealogies ideally incorporate aspects of both population genet- differ in nature from among-species phylogenetic ics and phylogenetics (Knowles & Carstens, 2007; relationships (Posada & Crandall, 2001), young Hey & Pinho, 2012). DNA-based species delimitation in algae 183

234 Time barriers to gene flow species A species B species C species A species B species C

T2 *BEAST Model *BEAST Model T1 n P (d g )P (g S)P (S) n P (d g )P (g S)P (S) P (S D)= i=1 i| i i| P (S D)= i=1 i| i i| gene |tree = P (Dgene) tree ≠ | P (D) species tree Q species tree Q *BEAST Model4 components: *BEAST Model4 components: AB C ABC ... 3 n n PP ( ( d d i i g ig ) i -) standardP (gi S likelihood)P (S) for alignment and gene PP ( ( d d i i g ig ) i -) standardP (gi S likelihood)P (S) for alignment and gene 2 P (S D)= i=1 | | | P (S D)= i=1 | | | generation 1 *BEAST= ModelMRCA | tree iP (D) *BEAST Model| tree iP (D) deep coalescenceQ incomplete lineage sorting Q 4 components: 4 components: n P ( g i S ) - coalescent likelihood of gene trees n P ( g i S ) - coalescent likelihood of gene trees Figs 2-4. Illustrations of a Wright–Fisher process (a simplifiedP version(di ofgi the)P coalescence(g|i S)P process),(S) based on Degnan & Rosenberg P (di gi)P (g|i S)P (S) (2009) and Mailund (2009). Fig. 2. Diagram showing ai set=1 of non-overlapping generations where each new generation is sampled i=1 P (S D)=P ( d g ) - standard| likelihood |for alignment and gene P (S D)=P ( d g ) - standard| likelihood |for alignment and gene from the previous*BEAST at random with replacement. Model| Each doti| representsi an individualP (D) gene copy and each line connects a gene copy to its*BEAST Model| i| i P (D) ancestor in the previous generation. Starting with aQ set of individuals in a first generation, the second generation is created by Q 4 components: tree i P ( S ) - uniform topology P (D) -Marginal likelihood4 components: tree i P ( S ) - uniform topology P (D) -Marginal likelihood randomly selecting a parent from*BEAST the first population;n theModel third generation is sampled from the second, and so on. During speciation *BEASTn Model events (T1 and T2), where populations become separatedi=1 P ( byd ai barriergi)P to( genegi flSow,)P gene(S copies) will only be sampled from within the i=1 P (di gi)P (gi S)P (S) P (S D)= P ( g i S ) - coalescent| likelihood| of - birth-deathgene trees or Yule divergence P (S D)= P ( g i S ) - coalescent| likelihood| of - birth-deathgene trees or Yule divergence same population. This coalescenceP process, ( d i g i ) running - standard inside| thelikelihood species tree, for hasalignment two important and gene consequences (Figs 3 and 4). Fig. 3. P ( d i g i ) - standard| likelihood for alignment and gene | | P (nD) - gamma pop sizes with hyperprior | | P (nD) - gamma pop sizes with hyperprior DNA divergence times may be much olderQ than speciation times,i=1 knownP ( asdi‘deepgi) coalescenceP (gi S’).P As( illustrated,S) the most recent Heled and Drummond 2010 Q i=1 P (di gi)P (gi S)P (S) Heled and Drummond 2010 common ancestor (MRCA)4 components: of two genetree copy iP ( samplesS D)= from species A, B and C| is much older| than the speciation event T1. Fig. 4. 4 components:tree iP (S D)= | | 73 74 Deep coalescence in combination with incompleteP |fi (xation S ) of - geneuniform lineages topology withinP species(D) lineagesP ( (incompleteD) -Marginal lineage likelihood sorting) may P | ( S ) - uniform topologyP (D) P (D) -Marginal likelihood result in a gene tree and species treeP ( with g i S different ) - coalescent topologies. likelihoodQ For example, of thegene gene trees tree based on three randomly sampled alleles P ( g i S ) - coalescent likelihoodQ of gene trees P ( d i g i ) - 4standard components: likelihood for alignment and gene P ( d i g i ) - 4standard components: likelihood for alignment and gene would wrongly suggest a sister relationship| between species - Abirth-death and B. or Yule divergence | - birth-death or Yule divergence tree| i tree| i P ( d i g i ) - standard - gamma likelihood pop sizes for with alignment hyperprior and geneHeled and Drummond 2010 P ( d i g i ) - standard - gamma likelihood pop sizes for with alignment hyperprior and geneHeled and Drummond 2010 P ( S ) - uniform| topology P (D) -Marginal likelihood P ( S ) - uniform| topology P (D) -Marginal likelihood Time P ( g i speciesS ) - coalescentAtree species i likelihood B of species gene C trees 73 P ( g i S ) - coalescenttree i likelihood of gene trees 74 | | - birth-death or Yule divergence reciprocally monophyletic species - birth-death or Yule divergence monophyly Therefore the full multispecies P ( g i S ) - coalescent likelihood of genesp. A trees sp. B sp. C P ( g i S ) - coalescent likelihood of gene trees - gamma| pop sizes with hyperpriorcoalescentHeled is and givenDrummond by:2010 - gamma| pop sizes with hyperprior Heled and Drummond 2010 P ( S ) - uniform topology *BEASTP (D) -Marginal Model likelihood P ( S ) - uniform topology *BEASTP (D) -Marginal likelihood paraphyly 73 74 - birth-death or Yule divergence n - birth-death or Yule divergence polyphyly P ( S ) - uniform topology P (D)i=1-MarginalP (di likelihoodgi)P (gi S)P (S) P ( S ) - uniform topologyP (gi S) P (D) -Marginal likelihood - gamma pop sizes with hyperpriorP (S D)= | | - gamma pop sizes with hyperprior | - birth-death or Yule divergence|Heled and Drummond 2010 P (D) - birth-death or YuleLikelihood divergenceHeled and of Drummond gene trees 2010 given the species tree Q 73 74 *BEAST - gammaModel pop 4sizes components: with hyperpriorsp. A sp. B sp. C *BEAST - gamma pop sizes with hyperprior trans-species polymorphism paraphyletic polyphyleticHeled andparaphyletic Drummond 2010 Heled and Drummond 2010 n We have an equation to calculate the likelihood of - standard= standard likelihood likelihood for for alignment gene73 tree and and gene 74 i=1 PP((ddi igig)i)P (gi S)P (S) P (gi S) coalescent histories within a lineage P (S D)= | | a sequence| alignment | *BEAST Model| tree iP (D) *BEAST Q Likelihood of gene trees given the species treen n t n 4 components:n P(L N(t)) = 2 exp 2 dt P ( g i S ) - coalescent= coalescent likelihood likelihood of forgene gene trees tree and i=1 P (di gi)P (g|i S)P (S) P (gi S) We have an equation to calculate| the likelihoodN of(t ) 0 N(t) ! P (S D)= | | species tree | i=2 Z *BEAST ModelP ( d i g i ) - standardP likelihood(D) for alignment and gene *BEAST Y Downloaded by [University of Alabama at Tuscaloosa], [Mr Frederik Leliaert] 08:14 16 May 2014 | | Likelihood ofcoalescent gene trees histories given the within species a lineage tree Fig. 5. Neutral coalescence process running withintree aQ speciesi tree, based on Klein et al.(1998), Degnan & Rosenberg (2009) and 4 components:n P ( S ) - uniform topology P (D) -Marginal likelihood n How might we extend this to a whole tree? Mailund (2009). Each dot represents*BEAST an individual gene Model copy, each colour a different allele,= and prior each for line species connects tree a gene copy to its *BEAST n t n i=1 P (di gi)P (gi S)P (S) P (gi SWe) have an equation to calculate the2 likelihood of 2 ancestor in the previous generation.P (S D Within)= a population,P ( g i S selection ) - coalescent| and/or drift likelihood| will result of in- birth-deathgene changing (uniform, trees allele birth-death,or frequencies Yule divergence overYule) time. In P(L N(t)) = exp dt the initial stages of lineage splitting,P ( sister d i g species i ) - standard will| largely likelihood sharen identical for alignment alleles, which and has gene important consequences for species | | N(t) N(t) | | P (D) - gamma pop sizes with hyperprior Likelihoodcoalescent of gene historiestrees given within the species ai lineage=2 tree Z0 ! delimitation. In this example, constructingtree i aQ gene tree at an early stagei=1 ofP speciation(di gi) wouldP (g resulti S in)P none(S of) the three species being Heled and Drummond 2010 P (gi S) Y Heled and Drummond 2010 monophyletic. Only4 after components: sufficient time hasP ( goneS D by,)= will alleles be completely| sorted in| each lineage, resulting in reciprocal | n P (D) = marginal likelihood 75 n t n 76 monophyly for the each of the three species. P | ( S ) - uniform topology P (D) -Marginal likelihood We have an equationLikelihoodHow to might calculate of gene we2 extend thetrees likelihood given this tothe ofa specieswhole 2 tree? tree P ( g i S ) - coalescent likelihoodQ of gene trees P(L N(t)) = exp dt P ( d i g i ) - 4standard |components: likelihood for alignment and gene | N(t) N(t) | - birth-death or Yule divergence coalescent histories withini=2 a lineage 0 ! tree i We have anY equation to calculateZ the likelihood of effective population size (Avise, 2000; Hudson & species reside in a boundary zone where both pro- n P ( d i g i ) - standard - gamma likelihood pop sizes for with alignment hyperprior and geneHeled and Drummond 2010 n t n Heled and Drummond 2010 Coyne, 2002; Hickerson et al.,P 2006 ( S ) ). - uniform| topologycesses meet.P ( Therefore,D) -Marginal species likelihood delimitation should How mightcoalescent we extend2 histories this to within a whole a lineage 2tree? Thus, although within-speciesP ( g i S ) - coalescent genetree genealogies i likelihood of geneideally trees incorporate aspects of both population genet-75 P(L N(t)) = exp dt 76 | | N(t) n N(t) differ in nature from among-species - phylogeneticbirth-death or Yuleics divergence and phylogenetics (Knowles & Carstens, 2007; i=2 Zn0 ! t n Y 2 2 relationships (Posada & Crandall,P2001 ( g i S ), ) young- coalescentHey likelihood & Pinho, of 2012gene). trees P(L N(t)) = exp dt - gamma| pop sizes with hyperprior Heled and Drummond 2010 | N(t) HeledN and( tDrummond) 2010 P ( S ) - uniform topology P (D) -Marginal likelihood How might we extend this to a wholei=2 tree? Z0 ! 75 Y 76 - birth-death or Yule divergence P ( S ) - uniform topology P (D) -Marginal likelihood How might we extend this to a whole tree? - gamma pop sizes with hyperprior - birth-death or Yule divergenceHeled and Drummond 2010 Heled and Drummond 2010 75 76 - gamma pop sizes with hyperprior Heled and Drummond 2010 Heled and Drummond 2010 75 76 DNA-based species delimitation in algae 183

234 Time barriers to gene flow species A species B species C species A species B species C

T2 *BEAST Model *BEAST Model T1 n P (d g )P (g S)P (S) n P (d g )P (g S)P (S) P (S D)= i=1 i| i i| P (S D)= i=1 i| i i| gene |tree = P (Dgene) tree ≠ | P (D) species tree Q species tree Q 4 components: 4 components: AB C ABC ... 3 P ( d i g i ) - standard likelihood for alignment and gene P ( d i g i ) - standard likelihood for alignment and gene 2 | | generation 1 = MRCA tree i tree i deep coalescence incomplete lineage sorting P ( g i S ) - coalescent likelihood of gene trees P ( g i S ) - coalescent likelihood of gene trees Figs 2-4. Illustrations of a Wright–Fisher process (a simplified version of the coalescence| process), based on Degnan & Rosenberg | (2009) and Mailund (2009). Fig. 2. Diagram showing a set of non-overlapping generations where each new generation is sampled from the previous at random with replacement. Each dot represents an individual gene copy and each line connects a gene copy to its ancestor in the previous generation. Starting with a set of individuals in a first generation, the second generation is created by randomly selecting a parent from the first population; the third generation is sampledP ( S ) from - uniform the second, topology and so on. During speciationP (D) -Marginal likelihood P ( S ) - uniform topology P (D) -Marginal likelihood events (T1 and T2), where populations become separated by a barrier to gene fl ow, gene copies- birth-death will only be or sampled Yule divergence from within the - birth-death or Yule divergence same population. This coalescence process, running inside the species tree, has two important consequences (Figs 3 and 4). Fig. 3. DNA divergence times may be much older than speciation times, known as ‘ deep coalescence - gamma’. pop As illustrated, sizes with the hyperprior most recent Heled and Drummond 2010 - gamma pop sizes with hyperprior Heled and Drummond 2010 common ancestor (MRCA) of two gene copy samples from species A, B and C is much older than the speciation event T1. Fig. 4. 73 74 Deep coalescence in combination with incomplete fixation of gene lineages within species lineages (incomplete lineage sorting) may result in a gene tree and species tree with different topologies. For example, the gene tree based on three randomly sampled alleles would wrongly suggest a sister relationship between species A and B.

Time species A species B species C Thereforereciprocally the monophyletic full multispecies species monophyly sp. A sp. B sp. C *BEASTcoalescent Model is given by: *BEAST paraphyly n P (di gi)P (gi S)P (S) P (g S) polyphyly P (S D)= i=1 | | i| | P (D) Q Likelihood of gene trees given the species tree 4 components:sp. A sp. B sp. C trans-species polymorphism paraphyletic polyphyletic paraphyletic We have an equation to calculate the likelihood of P ( Remember: d i g i ) - standard likelihood for alignment and gene coalescent histories within a lineage tree| i n n t n • Only models incongruence due to 2 2 P ( g i S ) - coalescent likelihood of gene trees P(L N(t)) = exp dt | | N(t) N(t) incomplete lineage sorting i=2 0 ! Y Z Downloaded by [University of Alabama at Tuscaloosa], [Mr Frederik Leliaert] 08:14 16 May 2014 Fig. 5. Neutral coalescence process running within a species tree, based on KleinP • ( S Noet ) al .( -horizontal uniform1998), Degnan topology &gene Rosenberg flow (2009 orP) ( andD) -Marginal likelihood How might we extend this to a whole tree? Mailund (2009). Each dot represents an individual gene copy, each colour a different allele, and each line connects a gene copy to its ancestor in the previous generation. Within a population, selection and/or drift will result introgression in- birth-death changing allele or frequencies Yule divergence over time. In the initial stages of lineage splitting, sister species will largely share identical alleles, which has important consequences for species delimitation. In this example, constructing a gene tree at an early stage of speciation would - gamma result in pop none sizes of the with three specieshyperprior being Heled and Drummond 2010 Heled and Drummond 2010 monophyletic. Only after sufficient time has gone by, will alleles be completely• Assumes sorted in each no lineage, selection, resulting in reciprocalno 75 76 monophyly for the each of the three species. recombination effective population size (Avise, 2000; Hudson & species reside in a boundary zone where both pro- Coyne, 2002; Hickerson et al., 2006). cesses meet. Therefore, species delimitation should Thus, although within-species gene genealogies ideally incorporate aspects of both population genet- differ in nature from among-species phylogenetic ics and phylogenetics (Knowles & Carstens, 2007; relationships (Posada & Crandall, 2001), young Hey & Pinho, 2012). Heled and Drummond doi:10.1093/molbev/msp274 MBE ·

Uses of the coalescent:

inferring demographic history of western pocket gophers… Downloaded from http://mbe.oxfordjournals.org/ at UNIVERSITY OF AUCKLAND LIBRARY on May 14, 2012

FIG.9.Western pocket gophers (Geomyidae,Heled, J., &Thomomys Drummond,)speciestreewithembeddedgenetrees,eachinadifferentcolor.Thespeciestreewasgen- A. J. (2010). of species trees from erated using median estimates formultilocus the divergence data.times Molecular and population biology and sizes. evolution, Note that 27(3), this 570-580. representation is for the purpose of visual inspection only, and any inferences should be made directly from the posterior data.

the essential details of the evolutionary process. It has been polymorphism, our implementation also provides a very known since Mendel’s experiments (Orel 1996) that “un- natural way to include multiindividual data and missing linked” genetic loci segregate independently of each other data into a phylogenetic analysis. within a species or isolated population, but it was not un- There are two obvious ways in which our model is defi- til a few decades ago that the implications of independent cient. The first, and arguably most important, is that it lacks segregation were fully appreciated in the context of gene any modeling of recombination within a locus. For nuclear trees. Although the model we have described here is not loci in eukaryotes, recombination rates may often be com- new (Maddison 1997; Rannala and Yang 2003) (apart from parable to mutation rates. The impact of this on species tree small details such population size function and its prior), the estimation under the multispecies coalescent has not been contribution we have made is to describe and implement assessed; however, it is known that the presence of recom- for practical use a full Bayesian inference of the species tree bination can have a large biasing effect on estimated pop- under the multispecies coalescent model. Our implemen- ulation sizes, if no recombination is assumed in the model tation is based on a popular existing software package for (Schierup and Hein 2000). Bayesian phylogenetics, and as a result it can exploit exist- The second deficiency of our model is, arguably, our ing models for gene trees such as relaxed molecular clocks treatment of speciation events. It has been suggested that (Drummond et al. 2006) and previously implemented priors incorporating a substantial period of limited gene flow as for species tree such as the reconstructed birth–death prior atransitionbetweenasingleancestralspeciesandtwode- (Gernhard 2008). scendant species is more realistic (Hey and Nielsen 2004). There is no doubt that our proposed model still rep- This approach has been taken for coalescent inference of resents a very idealized view of the genetic relationships sister species in the IM/IMα software packages (Hey and of multiple loci between individuals from closely related Nielsen 2007). Although we think that this is a good research species. However, we believe that despite its obvious sim- direction and expect multispecies versions of isolation with plifications, it represents a major improvement on the stan- migration models to be popular, we remain uncertain about dard approaches to multigene phylogenetics. Besides the how much power there will be to perform inference under ability to model incomplete lineage sorting and ancestral such models, without making very strong prior assumptions

578 )Q86*-"J16,$'+"$<,=0'

• /-'0,+"'#30"0'*$'*0'*-311&,1&*3$"'$,'6*+*$',%&'*-5"0?;3?,-'$,'3'Uses of the coalescent: Bayesian skyline plots 0+366'&3-;"',7'0*+16"'13&3+"$&*#'+,="60' )Q86*-"'U6,$0#

14 Drummond et al. (2005) MBE

Image: Drummond, A. J., Rambaut, A., Shapiro, B., & Pybus, O. G. (2005). Bayesian coalescent inference of past population dynamics from molecular sequences. Molecular biology and evolution, 22(5), 1185-1192. )Q86*-"J16,$'+"$<,=0' >536%3?-;'0%11,&$'

• D*5"-'3'0":%"-#"'36*;-+"-$O'="+,;&31<*#'&"#,-0$&%#?,-'' • /0'3-'*-7"&&"='="+,;&31<*#'13Y"&-'*0'+"3-*-;7%6Z' #,+1&*0"0'$H,'0"13&3C6"'0$"10P' • I38"0'73#$,&0' VW >0?+3?,-',7'$<"';"-"36,;8'7&,+'$<"'36*;-+"-$' • [*0%36'*-01"#?,-',7'0Q86*-"'16,$' • T%+C"&',7'#<3-;"'1,*-$0'@"I)UA' XW >0?+3?,-',7'1,1%63?,-'<*0$,&8'7&,+'$<"';"-"36,;8'

V^^^'

B,0#3-3'5*&%0' V^^' a"<"-="&')*#+,W'@X^^\A' -./)0*#1).)*#%!2,' U,1%63?,-'0*S"'

V^' V\]^' V\_^' V\`^' V\\^' 15 16 Future extensions of the multispecies coalescent: inferring reticulate evolution

Scenario 1: P (ABBA)=P (BABA) incomplete lineage sort- 2Nµ 1 t3 t2 =(1 ) 1 ing and no introgression 3 2N ⇣ ⌘

Scenario 2: P (ABBA)=P (BABA) incomplete lineage sort- 2Nµ 1 t3 tgf = 1 ing and introgression 3 2N ⇣ ⌘

Scenario 3: P (ABBA)=µ(t3 tgf ) introgression and no in- P (BABA)=0 complete lineage sorting

Figure 2: The population tree is depicted as ((P1,P2),P3)withasingleepisodeofintrogressivegeneflowbetweenP2 and P3. The outgroup is omitted for clarity. Possible gene tree histories that can result in ABBA or BABA site patterns are represented as dotted lines, and the corresponding coalescent probabilities are given for the simple case of a constant population size N where is the probability of introgressive gene flow, µ is the substitution rate, and tgf , t2,andt3 are the times of introgressive gene flow, the divergence time of P1 and P2,andthedivergencetimeofP1 and P3 respectively. Full derivations for more complex demographic scenarios (e.g. nonconstant population sizes) can be found in Durand et al. (2011)andGreen et al. (2010).

Objective 1: Expected Outcomes I expect the low-coverage shotgun sequencing to successfully recover high coverage sequences of high-copy genomic regions such as the complete chloroplast genome and ribosomal cistron for ￿80 specimens of Chylis- mia and closely related genera (see the methods in Objective 3 for genome assembly details). Furthermore, I will use the expressed sequence tag (EST) library from Oenothera elata available in GenBank, and a number of bioinformatic approaches including de novo and reference guided assembly to recover other homologous nuclear and mitochondrial sequences. With the RADseq approach I expect to recover approximately 30000 RAD loci (Floragenex Inc. personal communication), which when concatenated will result in a complete sequence matrix of over 1e6 base pairs. The highly informative RADseq and genome skimming datasets will result in a robust hypothesis of the putatively reticulate evolutionary relationships among members of Chylismia.Thiswillpermitthe first reevaluation of the taxonomy of Chylismia based on molecular evidence. Furthermore the estimated ancestral ranges and divergence (and possible convergence) times will test Raven’s multiple hypotheses of vicariance and recontact among Chylismia taxa due to Pleistocene pluvial lake cycles (Raven 1962). One such reticulate evolutionary scenario hypothesized by Raven that will be tested is that eastern populations of the widespread Chylismia claviformis became isolated during Pleistocene oscillations of lakes present in the Salton Sink basin and subsequently underwent hybridization with southern populations of Chylismia brevipes, resulting in the morphologically intermediate subspecies C. claviformis subsp. rubens.Thetiming and extent of gene flow in this and other putative hybridization events will be tested with the method described in Objective 2. Objective 2: Testing a novel Bayesian method to detect introgression in the presence of incomplete lineage sorting. Objective 2: Introduction Here I introduce a novel Bayesian approach for inferring introgression conditional on a population tree, and describe how I will evaluate the model using both empirical Chylismia RADseq data as well as simulated RADseq datasets. The method combines elements from both coalescent theory and phylogenetics by cal- culating the joint posterior probability of a coalescent model of ABBA and BABA incongruent gene tree histories simultaneously with a phylogenetic molecular evolution model conditioned on a population tree.

4 DNA-based species delimitation in algae 183

234 Time barriers to gene flow species A species B species C species A species B species C

T2 *BEAST Model *BEAST Model T1 n P (d g )P (g S)P (S) n P (d g )P (g S)P (S) P (S D)= i=1 i| i i| P (S D)= i=1 i| i i| gene |tree = P (Dgene) tree ≠ | P (D) species tree Q species tree Q *BEAST Model4 components: *BEAST Model4 components: AB C ABC ... 3 n n PP ( ( d d i i g ig ) i -) standardP (gi S likelihood)P (S) for alignment and gene PP ( ( d d i i g ig ) i -) standardP (gi S likelihood)P (S) for alignment and gene 2 P (S D)= i=1 | | | P (S D)= i=1 | | | generation 1 *BEAST= ModelMRCA | tree iP (D) *BEAST Model| tree iP (D) deep coalescenceQ incomplete lineage sorting Q 4 components: 4 components: n P ( g i S ) - coalescent likelihood of gene trees n P ( g i S ) - coalescent likelihood of gene trees Figs 2-4. Illustrations of a Wright–Fisher process (a simplifiedP version(di ofgi the)P coalescence(g|i S)P process),(S) based on Degnan & Rosenberg P (di gi)P (g|i S)P (S) (2009) and Mailund (2009). Fig. 2. Diagram showing ai set=1 of non-overlapping generations where each new generation is sampled i=1 P (S D)=P ( d g ) - standard| likelihood |for alignment and gene P (S D)=P ( d g ) - standard| likelihood |for alignment and gene from the previous*BEAST at random with replacement. Model| Each doti| representsi an individualP (D) gene copy and each line connects a gene copy to its*BEAST Model| i| i P (D) ancestor in the previous generation. Starting with aQ set of individuals in a first generation, the second generation is created by Q 4 components: tree i P ( S ) - uniform topology P (D) -Marginal likelihood4 components: tree i P ( S ) - uniform topology P (D) -Marginal likelihood randomly selecting a parent from*BEAST the first population;n theModel third generation is sampled from the second, and so on. During speciation *BEASTn Model events (T1 and T2), where populations become separatedi=1 P ( byd ai barriergi)P to( genegi flSow,)P gene(S copies) will only be sampled from within the i=1 P (di gi)P (gi S)P (S) P (S D)= P ( g i S ) - coalescent| likelihood| of - birth-deathgene trees or Yule divergence P (S D)= P ( g i S ) - coalescent| likelihood| of - birth-deathgene trees or Yule divergence same population. This coalescenceP process, ( d i g i ) running - standard inside| thelikelihood species tree, for hasalignment two important and gene consequences (Figs 3 and 4). Fig. 3. P ( d i g i ) - standard| likelihood for alignment and gene | | P (nD) - gamma pop sizes with hyperprior | | P (nD) - gamma pop sizes with hyperprior DNA divergence times may be much olderQ than speciation times,i=1 knownP ( asdi‘deepgi) coalescenceP (gi S’).P As( illustrated,S) the most recent Heled and Drummond 2010 Q i=1 P (di gi)P (gi S)P (S) Heled and Drummond 2010 common ancestor (MRCA)4 components: of two genetree copy iP ( samplesS D)= from species A, B and C| is much older| than the speciation event T1. Fig. 4. 4 components:tree iP (S D)= | | 73 74 Deep coalescence in combination with incompleteP |fi (xation S ) of - geneuniform lineages topology withinP species(D) lineagesP ( (incompleteD) -Marginal lineage likelihood sorting) may P | ( S ) - uniform topologyP (D) P (D) -Marginal likelihood result in a gene tree and species treeP ( with g i S different ) - coalescent topologies. likelihoodQ For example, of thegene gene trees tree based on three randomly sampled alleles P ( g i S ) - coalescent likelihoodQ of gene trees P ( d i g i ) - 4standard components: likelihood for alignment and gene P ( d i g i ) - 4standard components: likelihood for alignment and gene would wrongly suggest a sister relationship| between species - Abirth-death and B. or Yule divergence | - birth-death or Yule divergence tree| i tree| i P ( d i g i ) - standard - gamma likelihood pop sizes for with alignment hyperprior and geneHeled and Drummond 2010 P ( d i g i ) - standard - gamma likelihood pop sizes for with alignment hyperprior and geneHeled and Drummond 2010 P ( S ) - uniform| topology P (D) -Marginal likelihood P ( S ) - uniform| topology P (D) -Marginal likelihood Time P ( g i speciesS ) - coalescentAtree species i likelihood B of species gene C trees 73 P ( g i S ) - coalescenttree i likelihood of gene trees 74 | | - birth-death or Yule divergence reciprocally monophyletic species - birth-death or Yule divergence monophyly Today in lab we’ll use RevBayes to P ( g i S ) - coalescent likelihood of genesp. A trees sp. B sp. C P ( g i S ) - coalescent likelihood of gene trees - gamma| pop sizes with hyperpriorimplement the full multispecies coalescent: - gamma| pop sizes with hyperprior P ( S ) - uniform topology *BEASTP (D) -Marginal ModelHeled likelihood and Drummond 2010 P ( S ) - uniform topology *BEASTP (D) -MarginalHeled likelihood and Drummond 2010 paraphyly 73 74 - birth-death or Yule divergence n - birth-death or Yule divergence polyphyly P ( S ) - uniform topology P (D)i=1-MarginalP (di likelihoodgi)P (gi S)P (S) P ( S ) - uniform topologyP (gi S) P (D) -Marginal likelihood - gamma pop sizes with hyperpriorP (S D)= | | - gamma pop sizes with hyperprior | - birth-death or Yule divergence|Heled and Drummond 2010 P (D) - birth-death or YuleLikelihood divergenceHeled and of Drummond gene trees 2010 given the species tree Q 73 74 *BEAST - gammaModel pop 4sizes components: with hyperpriorsp. A sp. B sp. C *BEAST - gamma pop sizes with hyperprior trans-species polymorphism paraphyletic polyphyleticHeled andparaphyletic Drummond 2010 We have an equationHeled to calculateand Drummond the 2010 likelihood of n 73 74 - standard= standard likelihood likelihood for for alignment gene tree and given gene i=1 PP((ddi igig)i)P (gi S)P (S) P (gi S) coalescent histories within a lineage P (S D)= | | a sequence| alignment | *BEAST Model| tree iP (D) *BEAST Q Likelihood of gene trees given the species treen n t n 4 components: 2 2 n P ( g i S ) - coalescent likelihood of gene trees P(L N(t)) = exp dt P (di gi)P (g|i S)P= (coalescentS) likelihood for gene tree given | N(t) N(t) i=1 P (gi S) We have an equation to calculate the likelihoodi=2 of 0 ! P (S D)= | | a species tree | Z *BEAST ModelP ( d i g i ) - standardP likelihood(D) for alignment and gene *BEAST Y

Downloaded by [University of Alabama at Tuscaloosa], [Mr Frederik Leliaert] 08:14 16 May 2014 | coalescent histories within a lineage Q| Likelihood of gene trees given the species tree Fig. 5. Neutral coalescence process4 components: running withintree a speciesi tree, based on KleinP ( S et ) al .(- uniform1998), Degnan topology & Rosenberg (2009P)( andD) -Marginal likelihood How might we extend this to a whole tree? Mailund (2009). Each dot represents*BEAST an individualn gene Model copy, each colour a different allele, and each line connects a gene copy to its *BEAST n n t n i=1 P (di gi)P (gi S)P (S) = prior for species tree P (gi SWe) have an equation to calculate the2 likelihood of 2 ancestor in the previous generation.P (S D Within)= a population,P ( g i S selection ) - coalescent| and/or drift likelihood| will result of in- birth-deathgene changing (uniform, trees allele birth-death,or frequencies Yule divergence overYule) time. In P(L N(t)) = exp dt the initial stages of lineage splitting,P ( sister d i g species i ) - standard will| largely likelihood share identical for alignment alleles, which and has gene important consequences for species | | N(t) N(t) | P (nD) coalescent histories within ai lineage=2 0 ! delimitation. In this example, constructing| aQ gene tree at an early stage ofP speciation(d i g i ) would P -( gammag resulti S in) Ppop none(S sizes of) the with three specieshyperprior being Heled and DrummondLikelihood 2010 of geneP (g treesS )given the species tree Z Heled and Drummond 2010 4 components:tree iP (S D)= i=1 | | i Y monophyletic. Only after sufficient time has gone by, will alleles be completely sorted in each lineage, resulting in reciprocal 75 | n n t n 76 P (D) = marginal likelihood monophyly for the each of the three species. P | ( S ) - uniform topology P (D) -Marginal likelihood We have an equationLikelihoodHow to might calculate of gene we2 extend thetrees likelihood given this tothe ofa specieswhole 2 tree? tree P ( g i S ) - coalescent likelihoodQ of gene trees P(L N(t)) = exp dt P ( d i g i ) - 4standard |components: likelihood for alignment and gene | N(t) N(t) | - birth-death or Yule divergence coalescent histories withini=2 a lineage 0 ! tree i We have anY equation to calculateZ the likelihood of effective population size (Avise, 2000; Hudson & species reside in a boundary zone where both pro- n P ( d i g i ) - standard - gamma likelihood pop sizes for with alignment hyperprior and geneHeled and Drummond 2010 n t n Heled and Drummond 2010 Coyne, 2002; Hickerson et al.,P 2006 ( S ) ). - uniform| topologycesses meet.P ( Therefore,D) -Marginal species likelihood delimitation should How mightcoalescent we extend2 histories this to within a whole a lineage 2tree? Thus, although within-speciesP ( g i S ) - coalescent genetree genealogies i likelihood of geneideally trees incorporate aspects of both population genet-75 P(L N(t)) = exp dt 76 | | N(t) n N(t) differ in nature from among-species - phylogeneticbirth-death or Yuleics divergence and phylogenetics (Knowles & Carstens, 2007; i=2 Zn0 ! t n Y 2 2 relationships (Posada & Crandall,P2001 ( g i S ), ) young- coalescentHey likelihood & Pinho, of 2012gene). trees P(L N(t)) = exp dt - gamma| pop sizes with hyperprior Heled and Drummond 2010 | N(t) HeledN and( tDrummond) 2010 P ( S ) - uniform topology P (D) -Marginal likelihood How might we extend this to a wholei=2 tree? Z0 ! 75 Y 76 - birth-death or Yule divergence P ( S ) - uniform topology P (D) -Marginal likelihood How might we extend this to a whole tree? - gamma pop sizes with hyperprior - birth-death or Yule divergenceHeled and Drummond 2010 Heled and Drummond 2010 75 76 - gamma pop sizes with hyperprior Heled and Drummond 2010 Heled and Drummond 2010 75 76