<<

Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

A (short) introduction to phylogenetics

Thibaut Jombart

MRC Centre for Outbreak Analysis and Modelling Imperial College London

Genetic data analysis with PR∼Statistics, Glasgow 03-08-2015

1/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Outline Context

Phylogenies...

Distance trees

Parsimony

Likelihood/Bayesian

Uncertainty

Pitfalls & best practices

And more...

2/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Outline Context

Phylogenies...

Distance trees

Parsimony

Likelihood/Bayesian

Uncertainty

Pitfalls & best practices

And more...

3/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Phylogenetics: from the origins...

’From the first growth of the tree, many a limb and branch has decayed and dropped off; and these fallen branches of various sizes may represent those whole orders, families, and genera which have now no living representatives, and which are known to us only in a state.’

C. , Notebook, 1837.

4/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Phylogenetics: ...to the present

• phylogenetic trees are part of the standard toolbox of genetic data analysis • represent the evolutionary history of a set of (sampled) taxa

Bininda-Emonds et al., 2007, .

5/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

And the main difference is...

Current trees look better!

(and some other minor differences)

6/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

And the main difference is...

Current trees look better!

(and some other minor differences)

6/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

And the main difference is...

Current trees look better!

(and some other minor differences)

6/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

About the minor differences...

• DNA sequencing revolution • huge data banks freely available (e.g. GenBank) • easier, cheaper, faster to obtain DNA sequences • increasing number of full genomes available

Different ways to exploit this information.

7/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

About the minor differences...

• DNA sequencing revolution • huge data banks freely available (e.g. GenBank) • easier, cheaper, faster to obtain DNA sequences • increasing number of full genomes available

Different ways to exploit this information.

7/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Outline Context

Phylogenies...

Distance trees

Parsimony

Likelihood/Bayesian

Uncertainty

Pitfalls & best practices

And more...

8/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Phylogenetic trees: some useful terms : representation of evolutionary relationships between a set of taxa.

9/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Accumulated substitutions tell us about the genealogy

Substitution: replacement of a nucleotide (e.g. a → t)

...attgtacgtaggatctagg...

...attgtacgtaggatctagg...... attgtacgtaggatctagg...

...attgtacgtaggatctagg... Time ...attgtacgtaggatctagg...

True phylogeny

10/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Accumulated substitutions tell us about the genealogy

Substitution: replacement of a nucleotide (e.g. a → t)

...attgtacgtaggatctagg...

...attgaacgtaggatctagg...... attgtacgtaccatctagg...

...atcgtacgtaccatctggg... Time ...attgaacgtagttgctagg...

True phylogeny

10/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Accumulated substitutions tell us about the genealogy

Substitution: replacement of a nucleotide (e.g. a → t)

...attgtacgtaggatctagg...

...attgaacgtaggatctagg...... attgtacgtaccatctagg...

...atcgtacgtaccatctggg... Time ...attgaacgtagttgctagg...

True phylogeny

10/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Accumulated substitutions tell us about the genealogy

Substitution: replacement of a nucleotide (e.g. a → t)

...attgtacgtaggatctagg...

...attgaacgtaggatctagg...... attgtacgtaccatctagg...

...atcgtacgtaccatctggg... Time ...attgaacgtagttgctagg...

True phylogeny

10/39 Different methods for achieving phylogenetic reconstruction.

Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

From alignments to phylogenies

...attaaacgtaggatctagg...

...attaaacgtaggatctagg...

...attcatacgtaggatcagg...

...attgtacgtaggatctttt...

...attgtacgtaggatctttt...

...attgcatgtaggatctttt... Reconstructed phylogeny

11/39 Different methods for achieving phylogenetic reconstruction.

Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

From alignments to phylogenies

...attaaacgtaggatctagg...

...attaaacgtaggatctagg...

...attcatacgtaggatcagg...

...attgtacgtaggatctttt...

...attgtacgtaggatctttt...

...attgcatgtaggatctttt... Reconstructed phylogeny

11/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

From alignments to phylogenies

...attaaacgtaggatctagg...

...attaaacgtaggatctagg...

...attcatacgtaggatcagg...

...attgtacgtaggatctttt...

...attgtacgtaggatctttt...

...attgcatgtaggatctttt... Reconstructed phylogeny

Different methods for achieving phylogenetic reconstruction.

11/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Workflow Prepare data

• align sequences: alignment software + manual refinement

Build the tree

• distance-based methods • maximum parsimony • likelihood-based methods (ML, Bayesian)

Analyse the tree

• assess uncertainty • test phylogenetic signal • model trait

• ... 12/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Workflow Prepare data

• align sequences: alignment software + manual refinement

Build the tree

• distance-based methods • maximum parsimony • likelihood-based methods (ML, Bayesian)

Analyse the tree

• assess uncertainty • test phylogenetic signal • model trait evolution

• ... 12/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Workflow Prepare data

• align sequences: alignment software + manual refinement

Build the tree

• distance-based methods • maximum parsimony • likelihood-based methods (ML, Bayesian)

Analyse the tree

• assess uncertainty • test phylogenetic signal • model trait evolution

• ... 12/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Workflow Prepare data

• align sequences: alignment software + manual refinement

Build the tree

• distance-based methods • maximum parsimony • likelihood-based methods (ML, Bayesian)

Analyse the tree

• assess uncertainty • test phylogenetic signal • model trait evolution

• ... 12/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Outline Context

Phylogenies...

Distance trees

Parsimony

Likelihood/Bayesian

Uncertainty

Pitfalls & best practices

And more...

13/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Distance-based phylogenetic reconstruction

Approaches relying on agglomerative clustering algorithms (e.g. Single linkage, UPGMA, Neighbor-Joining) Rationale 1. compute pairwise genetic distances D 2. group closest sequences 3. update D 4. go back to 2) until all sequences are grouped

14/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Distance-based phylogenetic reconstruction

15/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Distance-based phylogenetic reconstruction

15/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Distance-based phylogenetic reconstruction

15/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Distance-based phylogenetic reconstruction

15/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Distance-based phylogenetic reconstruction

15/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Distance-based phylogenetic reconstruction

15/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

What is the distance between a node and tips?

Hierarchical clustering:

k • single linkage: Dk,g = min(Dk,i,Dk,j) Dk , g =... • complete linkage: Dk,g = max(Dk,i,Dk,j)

i j Dk,i+Dk,j • UPGMA: Dk,g = g Di,j 2

Neighbor joining: Transforms original distances to account for heterogeneous rates of evolution.

16/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

What is the distance between a node and tips?

Hierarchical clustering:

k • single linkage: Dk,g = min(Dk,i,Dk,j) Dk , g =... • complete linkage: Dk,g = max(Dk,i,Dk,j)

i j Dk,i+Dk,j • UPGMA: Dk,g = g Di,j 2

Neighbor joining: Transforms original distances to account for heterogeneous rates of evolution.

16/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

What is the distance between a node and tips?

Hierarchical clustering:

k • single linkage: Dk,g = min(Dk,i,Dk,j) Dk , g =... • complete linkage: Dk,g = max(Dk,i,Dk,j)

i j Dk,i+Dk,j • UPGMA: Dk,g = g Di,j 2

Neighbor joining: Transforms original distances to account for heterogeneous rates of evolution.

16/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

What is the distance between a node and tips?

Hierarchical clustering:

k • single linkage: Dk,g = min(Dk,i,Dk,j) Dk , g =... • complete linkage: Dk,g = max(Dk,i,Dk,j)

i j Dk,i+Dk,j • UPGMA: Dk,g = g Di,j 2

Neighbor joining: Transforms original distances to account for heterogeneous rates of evolution.

16/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

What is the distance between a node and tips?

Hierarchical clustering:

k • single linkage: Dk,g = min(Dk,i,Dk,j) Dk , g =... • complete linkage: Dk,g = max(Dk,i,Dk,j)

i j Dk,i+Dk,j • UPGMA: Dk,g = g Di,j 2

Neighbor joining: Transforms original distances to account for heterogeneous rates of evolution.

16/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Distance-based phylogenetic reconstruction

Advantages

• simple • flexible (many distances and clustering algorithms) • fast and scalable (applicable to large datasets)

Limitations • sensitive to distance/clustering chosen • evolutionary rates are not estimated • no measure of uncertainty for the tree obtained

17/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Distance-based phylogenetic reconstruction

Advantages

• simple • flexible (many distances and clustering algorithms) • fast and scalable (applicable to large datasets)

Limitations • sensitive to distance/clustering chosen • evolutionary rates are not estimated • no measure of uncertainty for the tree obtained

17/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Outline Context

Phylogenies...

Distance trees

Parsimony

Likelihood/Bayesian

Uncertainty

Pitfalls & best practices

And more...

18/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Maximum parsimony phylogenies

Approaches relying on finding the tree with the smallest number of character changes (substitutions)

Rationale 1. start from a pre-defined tree 2. compute initial parsimony score 3. permute branches and compute parsimony score 4. accept new tree if the parsimony score is improved 5. go back to 3) until convergence

19/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Maximum parsimony phylogenies

score: 8

score: 6 Initial tree

score: 5

20/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Maximum parsimony phylogenies

score: 8

score: 6 Initial tree

score: 5

20/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Maximum parsimony phylogenies

score: 8

score: 6 Initial tree

score: 5

20/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Maximum parsimony phylogenies

score: 8

score: 6 Initial tree

score: 5

20/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Maximum parsimony phylogenies

score: 8

score: 6 Initial tree

score: 5

20/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Maximum parsimony phylogenies Advantages

• applicable to any discontinuous characters (not just DNA) • intuitive explanation: ‘simplest’ evolutionary scenario

Limitations • evolutionary rates are not estimated • no measure of uncertainty for the tree obtained • computer-intensive • different types of substitutions ignored • evolution not necessarily parsimonious • sensitive to heterogeneous rates of evolution ()

21/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Maximum parsimony phylogenies Advantages

• applicable to any discontinuous characters (not just DNA) • intuitive explanation: ‘simplest’ evolutionary scenario

Limitations • evolutionary rates are not estimated • no measure of uncertainty for the tree obtained • computer-intensive • different types of substitutions ignored • evolution not necessarily parsimonious • sensitive to heterogeneous rates of evolution (long branch attraction)

21/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Outline Context

Phylogenies...

Distance trees

Parsimony

Likelihood/Bayesian

Uncertainty

Pitfalls & best practices

And more...

22/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Likelihood-based phylogenies (ML / Bayesian) Approaches relying on a model of sequence evolution: • ML: find tree and evolutionary rates with highest likelihood • Bayesian: find tree and evolutionary rates to posterior probability Rationale 1. start from a pre-defined tree 2. compute initial likelihood/posterior 3. permute branches, sample new parameters and compute likelihood/posterior 4. accept new tree and parameters based on likelihood/posterior improvement 5. go back to 3) until convergence

23/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Likelihood-based phylogenies (ML / Bayesian) Approaches relying on a model of sequence evolution: • ML: find tree and evolutionary rates with highest likelihood • Bayesian: find tree and evolutionary rates to posterior probability Rationale 1. start from a pre-defined tree 2. compute initial likelihood/posterior 3. permute branches, sample new parameters and compute likelihood/posterior 4. accept new tree and parameters based on likelihood/posterior improvement 5. go back to 3) until convergence

23/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Likelihood-based phylogenies (ML / Bayesian)

Likelihood / Posterior Density

24/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Likelihood-based phylogenies (ML / Bayesian) Advantages

• very flexible • consistent with a model of evolution • statistically consistent (model comparison) • parameter estimation • (Bayesian) several trees → measure of uncertainty

Limitations • computer-intensive • choice of the model of evolution • (ML) no measure of uncertainty for the tree obtained • (Bayesian) need to find a consensus tree

25/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Likelihood-based phylogenies (ML / Bayesian) Advantages

• very flexible • consistent with a model of evolution • statistically consistent (model comparison) • parameter estimation • (Bayesian) several trees → measure of uncertainty

Limitations • computer-intensive • choice of the model of evolution • (ML) no measure of uncertainty for the tree obtained • (Bayesian) need to find a consensus tree

25/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Outline Context

Phylogenies...

Distance trees

Parsimony

Likelihood/Bayesian

Uncertainty

Pitfalls & best practices

And more...

26/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

How do we know the tree is robust?

Main issue: assess the uncertainty of the tree topology / individual nodes Approaches

• ML: model selection to compare trees (whole tree) • Bayesian methods: between-samples variability (individual nodes) • any method: bootstrap (individual nodes)

27/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

How do we know the tree is robust?

Main issue: assess the uncertainty of the tree topology / individual nodes Approaches

• ML: model selection to compare trees (whole tree) • Bayesian methods: between-samples variability (individual nodes) • any method: bootstrap (individual nodes)

27/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

How do we know the tree is robust?

Main issue: assess the uncertainty of the tree topology / individual nodes Approaches

• ML: model selection to compare trees (whole tree) • Bayesian methods: between-samples variability (individual nodes) • any method: bootstrap (individual nodes)

27/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

How do we know the tree is robust?

Main issue: assess the uncertainty of the tree topology / individual nodes Approaches

• ML: model selection to compare trees (whole tree) • Bayesian methods: between-samples variability (individual nodes) • any method: bootstrap (individual nodes)

27/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

How do we know the tree is robust?

Main issue: assess the uncertainty of the tree topology / individual nodes Approaches

• ML: model selection to compare trees (whole tree) • Bayesian methods: between-samples variability (individual nodes) • any method: bootstrap (individual nodes)

27/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Bootstrapping phylogenies

• assess variability due to sampling the genome and conflicting signals • relies on analysing resampled datasets

Rationale 1. obtain a reference tree 2. resample the sites with replacement 3. obtain a tree for the resampled dataset 4. go back to 2) until the desired number of bootstrapped trees is attained 5. compute the frequency of each bifurcation of the reference tree occuring in bootstrapped trees

28/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Bootstrapping phylogenies

• assess variability due to sampling the genome and conflicting signals • relies on analysing resampled datasets

Rationale 1. obtain a reference tree 2. resample the sites with replacement 3. obtain a tree for the resampled dataset 4. go back to 2) until the desired number of bootstrapped trees is attained 5. compute the frequency of each bifurcation of the reference tree occuring in bootstrapped trees

28/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Bootstrapping phylogenies

...attaaacgtaggatctagg...

...attaaacgtaggatctagg...

...attcatacgtaggatcagg...

...attgtacgtaggatctttt...

...attgtacgtaggatctttt...

...attgcatgtaggatctttt... Reconstructed phylogeny

sampling sites with compare replacement topologies

...ttttaaaccatggatctagg...... ttttaaaccatggatctagg...... ttttaaaccatgga...ttttaaaccatggatctagg...tctagg...... ttttaaaccatgga...ttttaaaccatggatctagg...tctagg...... ttttaaaccatgga...ttttaaaccatgga...atttcatacgtaggtctagg...tctagg...atcagg...... ttttaaaccatgga...atttcatacgtaggtctagg...atcagg...... ttttaaaccatgga...atttcatacgtagg...aaatgtaccatggatctagg...atcagg...tcttgt...... atttcatacgtagg...aaatgtaccatggaatcagg...tcttgt...... atttcatacgtagg...aaatgtaccatgga...aaatgtaccatggaatcagg...tcttgt...tcttgt...... aaatgtaccatgga...aaatgtaccatggatcttgt...tcttgt...... aaatgtaccatgga...aaatgtaccatgga...aaatgcatcatggatcttgt...tcttgt...tcttgt... Reconstructed phylogeny ...aaatgtaccatgga...aaatgcatcatggatcttgt...tcttgt... Reconstructed phylogeny ...aaatgtaccatgga...aaatgcatcatggatcttgt...tcttgt... Reconstructed phylogeny ...aaatgcatcatggatcttgt... Reconstructed phylogeny ...aaatgcatcatggatcttgt... Reconstructed phylogeny

29/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Bootstrapping phylogenies

Advantages

• standard • simple to implement

Limitations • possibly computer-intensive • assumes that the genome has been sampled randomly (often wrong)

30/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Bootstrapping phylogenies

Advantages

• standard • simple to implement

Limitations • possibly computer-intensive • assumes that the genome has been sampled randomly (often wrong)

30/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Outline Context

Phylogenies...

Distance trees

Parsimony

Likelihood/Bayesian

Uncertainty

Pitfalls & best practices

And more...

31/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Plotting trees as rooted

12 9 10 1 2 8 4 3 11 13 4 15 7 13 6 14 7 12 5 14 1 3 5 4 9 12 13 2 1 7 11 11 14 15 15 5 6 6 3 10 10 9 8 8 2 15 14 15 5 11 11 1 7 2 6 13 12 6 4 9 9 3 2

10 3 10 8 5 10 8 8 5 3 9 12 14 2 1 12 7 14 1 13 11 7 6 13 4 4 15 Never plot an unrooted tree as rooted.

32/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Plotting trees as rooted

12 9 10 1 2 8 4 3 11 13 4 15 7 13 6 14 7 12 5 14 1 3 5 4 9 12 13 2 1 7 11 11 14 15 15 5 6 6 3 10 10 9 8 8 2 15 14 15 5 11 11 1 7 2 6 13 12 6 4 9 9 3 2

10 3 10 8 5 10 8 8 5 3 9 12 14 2 1 12 7 14 1 13 11 7 6 13 4 4 15 Never plot an unrooted tree as rooted.

32/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Interpreting distances

33/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Interpreting distances

33/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Interpreting distances meaningless

meaningful distance = sum of branch lengths

33/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

The paradox of divergent clusters

MRCA and genetic distances may give different information.

34/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

The paradox of divergent clusters

D(red, red)>D(red,blue)

MRCA and genetic distances may give different information.

34/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

The paradox of divergent clusters

D(red, red)>D(red,blue)

MRCA and genetic distances may give different information.

34/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Taking uncertainty into account

193 HIV−1 sequences from DRC (Strimmer & Pybus 2001)

0.2 0.15 0.1 0.05 0

At best, the tree is an estimate of the likely evolutionary history of the taxa studied.

35/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Taking uncertainty into account

193 HIV−1 sequences from DRC Collapsed tree (threshold length 0.01) (Strimmer & Pybus 2001)

0.2 0.15 0.1 0.05 0 0.2 0.15 0.1 0.05 0

At best, the tree is an estimate of the likely evolutionary history of the taxa studied.

35/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Taking uncertainty into account

193 HIV−1 sequences from DRC Collapsed tree (threshold length 0.01) (Strimmer & Pybus 2001)

0.2 0.15 0.1 0.05 0 0.2 0.15 0.1 0.05 0

At best, the tree is an estimate of the likely evolutionary history of the taxa studied.

35/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

(Over, Mis)Interpreting temporal trends

t6 t62 t4t48 t29t14 t72 t83 t39 t69 t96 t63 t44t24 t51t85 t95 t68 t9t92 t87 t31 t81t8 t99 t73 t30t25 t64t33 t10 t46 t54t16 t26t59 t76t32 t100t91 t77t42 t36t58 t65t79 t98t74 t56 t37 t80t12 t78t93 t90 t94 t18t11 t5t45 t60t67 t53t88 t52t27 t22 t40 t23 t38 t15t13 t3 t86 t89t75 t35 t2 t61 t34 t55t66 t70t17 t28t47 t19t43 t41t97 t82t20 t84t49 t1 t7 t21 t71 t50 t57

12 10 8 6 4 2 0

“Time trees” only make sense under a near-perfect .

36/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

(Over, Mis)Interpreting temporal trends

t6 t62 t4t48 t29t14 t72 t83 t39 12 t69 t63 t96 t44 t85t24 t51 t68 t95t9 t92t87 t31t81 t73t8 t99 t30 t25t64 t46t33 t54t10 t59t16 t26t32 t76 t100 t91 t77 t58t42 t36 t79 t98 t65 t56 t74 t37t80 t12t78 t93 6 8 10 t90 t94 t18t11 t5t45 t60t67 t53t88 t52t27 to the root > anova(lm(d.root~-1+t.root)) t22 t40 t23 t38 t15t13 anova(lm(d.root~-1+t.root)) t3 t86 t89t75 Analysis of Variance Table t35 t2 t61 t34 t55t66 Response: d.root t70t17 t28t47 Df Sum Sq Mean Sq F value Pr(>F) t19t43 t41t97 t.root 1 5434.7 5434.7 214.66 < 2.2e-16 *** t82t20 t84t49 Residuals 99 2506.4 25.3 t1 t7 t21 t71 t50 t57 0 2 4

12 10 8 6 4 2 0 0 20 40 60 80

Time to the root “Time trees” only make sense under a near-perfect molecular clock.

36/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

(Over, Mis)Interpreting temporal trends

t6 t62 t4t48 t29t14 t72 t83 t39 12 t69 t63 t96 t44 t85t24 t51 t68 t95t9 t92t87 t31t81 t73t8 t99 t30 t25t64 t46t33 t54t10 t59t16 t26t32 t76 t100 t91 t77 t58t42 t36 t79 t98 t65 t56 t74 t37t80 t12t78 t93 6 8 10 t90 t94 t18t11 t5t45 t60t67 t53t88 t52t27 Mutations to the root > anova(lm(d.root~-1+t.root)) t22 t40 t23 t38 t15t13 anova(lm(d.root~-1+t.root)) t3 t86 t89t75 Analysis of Variance Table t35 t2 t61 t34 t55t66 Response: d.root t70t17 t28t47 Df Sum Sq Mean Sq F value Pr(>F) t19t43 t41t97 t.root 1 5434.7 5434.7 214.66 < 2.2e-16 *** t82t20 t84t49 Residuals 99 2506.4 25.3 t1 t7 t21 t71 t50 t57 0 2 4

12 10 8 6 4 2 0 0 20 40 60 80

Time to the root “Time trees” only make sense under a near-perfect molecular clock.

36/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Outline Context

Phylogenies...

Distance trees

Parsimony

Likelihood/Bayesian

Uncertainty

Pitfalls & best practices

And more...

37/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

This is only the beginning

Many things can be done with trees • estimate divergence time • model trait evolution (phylogenetic comparative method) • reconstruct ancestral states • measure diversity • infer past demographics/effective population size (coalescence) • ... • and also, other approaches than phylogenetics to analyse genetic data

38/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

This is only the beginning

Many things can be done with trees • estimate divergence time • model trait evolution (phylogenetic comparative method) • reconstruct ancestral states • measure diversity • infer past demographics/effective population size (coalescence) • ... • and also, other approaches than phylogenetics to analyse genetic data

38/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

This is only the beginning

Many things can be done with trees • estimate divergence time • model trait evolution (phylogenetic comparative method) • reconstruct ancestral states • measure diversity • infer past demographics/effective population size (coalescence) • ... • and also, other approaches than phylogenetics to analyse genetic data

38/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

This is only the beginning

Many things can be done with trees • estimate divergence time • model trait evolution (phylogenetic comparative method) • reconstruct ancestral states • measure diversity • infer past demographics/effective population size (coalescence) • ... • and also, other approaches than phylogenetics to analyse genetic data

38/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

This is only the beginning

Many things can be done with trees • estimate divergence time • model trait evolution (phylogenetic comparative method) • reconstruct ancestral states • measure diversity • infer past demographics/effective population size (coalescence) • ... • and also, other approaches than phylogenetics to analyse genetic data

38/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

This is only the beginning

Many things can be done with trees • estimate divergence time • model trait evolution (phylogenetic comparative method) • reconstruct ancestral states • measure diversity • infer past demographics/effective population size (coalescence) • ... • and also, other approaches than phylogenetics to analyse genetic data

38/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

This is only the beginning

Many things can be done with trees • estimate divergence time • model trait evolution (phylogenetic comparative method) • reconstruct ancestral states • measure diversity • infer past demographics/effective population size (coalescence) • ... • and also, other approaches than phylogenetics to analyse genetic data

38/39 Context Phylogenies... Distance trees Parsimony Likelihood/Bayesian Uncertainty Pitfalls & best practices And more...

Time to get your hands dirty!

The pdf of the practical is online: http://adegenet.r-forge.r-project.org/ or Google → adegenet → documents → “Workshop Glasgow, August 2015” 39/39