Handout Lec. 25

Introduction to Biosystematics - Zool 575

Introduction to Biosystematics Lecture 25 - Confidence - Assessment 2

Confidence - Assessment of the Strength of the Phylogenetic Signal - part 2

1. Consistency Index
2. g1 statistic, PTP - test
3. Consensus trees
4. Decay index (Bremer Support)
5. Bootstrapping / Jackknifing
6. Statistical hypothesis testing (frequentist)
7. Posterior probability (see lecture on Bayesian)

“Quantifying the uncertainty of a phylogenetic estimate is at least as important a goal as obtaining the phylogenetic estimate itself.”
- Huelsenbeck & Rannala (2004)

Derek S. Sikes University of Calgary Zool 575

Multiple optimal trees

• Many methods can yield multiple equally optimal trees
• If multiple optimal trees are found, we know that all of them are wrong except, possibly, (hopefully) one
• We can further select among these trees with additional criteria, but typically, relationships common to all the optimal trees are summarized with consensus trees
• Some have argued against consensus tree methods for this reason
• Debate over quest for true tree (point estimate) versus quantification of uncertainty

Consensus methods

• A consensus tree is a summary of the agreement among a set of fundamental trees
• There are many consensus methods that differ in:
1. the kind of agreement
2. the level of agreement
• The commonest method (strict component consensus) focuses on clades/components/full splits
• Consensus methods can be used with multiple trees from a single analysis or from multiple analyses

Strict consensus methods

• Strict consensus methods require agreement across all the fundamental trees
• They show only those relationships that are unambiguously supported by the data


[Figure: two fundamental trees for taxa A-G]

• This method produces a consensus tree that includes all and only those full splits found in all the fundamental trees

• Other relationships (those in which the fundamental trees disagree) are shown as unresolved polytomies

[Figure: strict consensus tree of the two fundamental trees]

• The strict consensus tree can be less optimal (e.g. longer under parsimony) than any of the optimal trees

• Simplest to interpret
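The strict component consensus rule can be sketched in a few lines. This is a minimal illustration, assuming rooted trees written as nested Python tuples of taxon names (a representation chosen here, not taken from PAUP*), and the two example trees are hypothetical rather than the ones in the figure. Each tree is reduced to its set of clades, and only clades present in every tree are kept; everything else collapses into polytomies.

```python
def clades(tree):
    """Return every clade of a nested-tuple tree as a frozenset of taxon names."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a terminal taxon
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def strict_consensus(trees):
    """Clades (full splits) found in all of the fundamental trees."""
    return set.intersection(*(clades(t) for t in trees))


# Two hypothetical fundamental trees that disagree only about A, B and C
tree1 = ((("A", "B"), "C"), (("D", "E"), ("F", "G")))
tree2 = ((("A", "C"), "B"), (("D", "E"), ("F", "G")))

consensus = strict_consensus([tree1, tree2])
print(sorted("".join(sorted(c)) for c in consensus))
# ['ABC', 'ABCDEFG', 'DE', 'DEFG', 'FG']
```

Because the two trees resolve A, B and C differently, neither (A,B) nor (A,C) survives, and A, B, C become a three-way polytomy inside the clade ABC.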

Majority rule consensus

• Majority-rule consensus methods require agreement across a majority of the fundamental trees
• The commonest method focuses on clades/components/full splits
• This method produces a consensus tree that includes all and only those full splits found in a majority (>50%) of the fundamental trees
• Other relationships are shown as unresolved polytomies
• May include relationships that are not supported by the most parsimonious interpretation of the data
• Of particular use in bootstrapping and Bayesian Inference (best not to use for single searches)
• Implemented in PAUP* and MrBayes
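A majority-rule consensus can be sketched by counting how often each clade occurs and keeping those above the cut-off. This assumes the same hypothetical nested-tuple tree representation and three made-up fundamental trees. It is not PAUP*'s or MrBayes' code, and the `cutoff` parameter name is illustrative.

```python
from collections import Counter


def clades(tree):
    """Return every clade of a nested-tuple tree as a frozenset of taxon names."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def majority_rule(trees, cutoff=0.5):
    """Map each clade found in more than `cutoff` of the trees to its frequency (%).

    With cutoff >= 0.5 any two retained clades must co-occur in at least
    one tree, so they are compatible and can be drawn as a single tree.
    """
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))  # each clade counted once per tree
    n = len(trees)
    return {c: 100 * k / n for c, k in counts.items() if k / n > cutoff}


# Three hypothetical fundamental trees
trees = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "B"), "D"), ("C", "E")),
    ((("A", "C"), "B"), ("D", "E")),
]

ranked = sorted(majority_rule(trees).items(), key=lambda kv: (-kv[1], "".join(sorted(kv[0]))))
for clade, freq in ranked:
    print("".join(sorted(clade)), f"{freq:.1f}")
# ABCDE 100.0
# AB 66.7
# ABC 66.7
# DE 66.7
```

Raising `cutoff` (for example to 0.8) collapses every clade below that frequency; with these three trees only the root clade would remain.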

Majority rule consensus

Majority-rule consensus trees are used for:
1. Summarizing multiple equally optimal trees from one search (but they shouldn’t be!)
2. Summarizing the results of a bootstrapping analysis (multiple searches)
3. Summarizing the results of a Bayesian analysis

Don’t confuse these! The numbers on the branches mean very different things in each case.

[Figure: three fundamental trees for taxa A-G and their majority-rule consensus tree; numbers on the branches (100, 66) indicate the frequency of clades in the fundamental trees]


Consensus methods

Reduced consensus methods

[Figure: two fundamental trees for taxa A-G, their strict consensus, and their agreement subtree (PAUP*); taxon G is excluded from the agreement subtree]

[Figure: three fundamental trees for the ciliates Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tetrahymena, Spirostomum, Tracheloraphis, Gruberia and Euplotes. The strict component consensus is completely unresolved; the majority-rule consensus shows clade frequencies on its branches; a reduced consensus with Euplotes excluded is also shown]
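The idea behind an agreement subtree (prune the fewest taxa so that all trees become identical) can be sketched by brute force. This assumes rooted nested-tuple trees and two hypothetical trees built so that a single unstable taxon, G, spoils the strict consensus, mirroring the slide where taxon G is excluded. It is not the algorithm PAUP* uses, and trying every taxon subset is only feasible for a handful of taxa.

```python
from itertools import combinations


def clades(tree):
    """Return every clade of a nested-tuple tree as a frozenset of taxon names."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    walk(tree)
    return found


def restrict(clade_set, keep):
    """Clades of the tree induced on the taxa in `keep` (all other taxa pruned)."""
    keep = frozenset(keep)
    induced = set()
    for clade in clade_set:
        sub = clade & keep
        if 1 < len(sub) < len(keep):  # ignore single taxa and the root
            induced.add(sub)
    return frozenset(induced)


def agreement_subtree(trees):
    """Largest taxon set on which all trees agree, and the clades they share on it."""
    clade_sets = [clades(t) for t in trees]
    taxa = sorted(max(clade_sets[0], key=len))  # the root clade holds every taxon
    for size in range(len(taxa), 1, -1):  # largest subsets first
        for keep in combinations(taxa, size):
            induced = {restrict(cs, keep) for cs in clade_sets}
            if len(induced) == 1:  # every pruned tree has the same topology
                return set(keep), set(induced.pop())
    return set(), set()


# Hypothetical trees: identical except for the position of taxon G
tree1 = (((("A", "G"), "B"), "C"), (("D", "E"), "F"))
tree2 = (((("A", "B"), "C"), (("D", "E"), "F")), "G")

strict = clades(tree1) & clades(tree2)
print(sorted("".join(sorted(c)) for c in strict))  # ['ABCDEFG', 'DE', 'DEF']

keep, agreed = agreement_subtree([tree1, tree2])
print("".join(sorted(keep)), sorted("".join(sorted(c)) for c in agreed))
# ABCDEF ['AB', 'ABC', 'DE', 'DEF']
```

The strict consensus leaves A, B, C and G as an unresolved polytomy, whereas dropping G alone recovers the fully shared structure for the remaining six taxa.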

Consensus methods

• Use strict methods to identify those relationships unambiguously supported by parsimonious interpretation of the data
• Use reduced methods where consensus trees are poorly resolved
• Avoid methods which have ambiguous interpretations. Prevent possible confusion between a majority-rule (MR) consensus of an optimal tree search and an MR consensus of a bootstrapping search

Recall

• Stochastic error vs Systematic error
• These assessment methods help identify stochastic error
– How repeatable are the results?
– How strongly do the data support them?
– This is a measure of precision (which is hopefully related to accuracy)

Accuracy and Precision

• Accuracy


– Accuracy is correctness. How close a measurement is to the true value.
(unless we know the “true tree” in advance we cannot measure this)


• Precision

– Precision is reproducibility. How closely two or more measurements agree with one another. (this we can measure!)


Branch Support

• Several methods have been proposed that attach numerical values to internal branches in trees that are intended to provide some measure of the strength of support for those branches and the corresponding groups
• These methods include:
- The Bootstrap (BS) and jackknife
- Decay analyses (aka Bremer Support)
- Bayesian Posterior Probabilities (PP or BPP)

Decay analysis

• In parsimony analysis, a way to assess support for a group is to see if the group also occurs in slightly less parsimonious trees
• The length difference between the shortest trees including the group and the shortest trees that exclude the group (the extra steps required to collapse a group) is the decay index or Bremer support
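The decay index definition can be made concrete with an exhaustive-search sketch. It assumes unordered characters scored with Fitch parsimony, trees rooted on an outgroup, and a tiny hypothetical binary matrix. Instead of PAUP* reverse constraints it enumerates every tree, so it only works for a handful of taxa; all function names are illustrative.

```python
def add_leaf(tree, leaf):
    """Yield every tree made by attaching `leaf` to a branch of `tree` (or above its root)."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in add_leaf(left, leaf):
            yield (new_left, right)
        for new_right in add_leaf(right, leaf):
            yield (left, new_right)


def all_trees(ingroup, outgroup):
    """Every binary tree for the ingroup, each rooted on the outgroup."""
    trees = [ingroup[0]]
    for leaf in ingroup[1:]:
        trees = [new for tree in trees for new in add_leaf(tree, leaf)]
    return [(outgroup, tree) for tree in trees]


def fitch(tree, states):
    """Fitch state set and number of steps for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    if left_set & right_set:
        return left_set & right_set, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1  # one extra step


def tree_length(tree, matrix):
    nchar = len(next(iter(matrix.values())))
    return sum(fitch(tree, {t: seq[i] for t, seq in matrix.items()})[1] for i in range(nchar))


def clades(tree):
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = walk(node[0]) | walk(node[1])
        found.add(members)
        return members

    walk(tree)
    return found


def decay_index(group, matrix, outgroup):
    """Shortest tree lacking `group` minus shortest tree overall (extra steps)."""
    ingroup = sorted(t for t in matrix if t != outgroup)
    scored = [(tree_length(t, matrix), clades(t)) for t in all_trees(ingroup, outgroup)]
    shortest = min(length for length, _ in scored)
    shortest_without = min(length for length, cl in scored if frozenset(group) not in cl)
    return shortest_without - shortest


MATRIX = {  # hypothetical binary characters
    "Out": "0000",
    "A": "1100",
    "B": "1100",
    "C": "0011",
    "D": "0010",
}

for group in ({"A", "B"}, {"C", "D"}, {"A", "C"}):
    print(sorted(group), decay_index(group, MATRIX, "Out"))
# ['A', 'B'] 2
# ['C', 'D'] 1
# ['A', 'C'] 0
```

Two characters support (A,B), so collapsing it costs two extra steps; only one supports (C,D). Group {A, C} scores 0 because it is absent from the single most parsimonious tree. The whole ingroup occurs in every tree here, so its decay index is undefined in this sketch (`min` would raise `ValueError`).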

Decay analyses - in practice

• Decay indices for each clade can be determined by:
- Using PAUP* to search for the shortest tree that lacks the branch of interest using reverse topological constraints
- with the Autodecay or TreeRot programs (in conjunction with PAUP*)
- MacClade 4 will also help prepare for a Decay analysis
- An excellent use for the Parsimony Ratchet - because finding the shortest tree length is all that matters (not finding multiple shortest trees)

Decay analysis - example

[Figure: decay indices (ranging from +1 to +45) on the branches of trees for the ciliate SSU rDNA data and for randomly permuted data; taxa Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tetrahymena, Tracheloraphis, Spirostomum, Gruberia and Euplotes]

Decay indices - interpretation

• Generally, the higher the decay index, the better the relative support for a group
• Unlike BS, decay indices are not scaled (0-100)
– This has the advantage that the value can exceed 100, whereas BS “tops out” at 100, meaning that we cannot distinguish between the support of two branches with BS values of 100 although one might have a far greater decay index than the other
• Like Bootstrap values (BS), decay indices may be misleading if the data are misleading
• Magnitude of decay indices and BS generally correlated (i.e. they tend to agree)
• It is even less clear what is an acceptable decay index than what is an acceptable BS value…
– Unlike the BS value, very little work has examined the properties and behavior of decay indices
• Only groups found in all most parsimonious trees have decay indices > zero


Decay indices - interpretation

One key study is that of DeBry (2001)
– He showed that decay indices should be interpreted in light of branch lengths
– and that the same values, even within the same tree, do not represent the same support if the branch lengths differ
- i.e. decay indices are not easily comparable as measures of branch support
- Values < 4 should be considered weak regardless of branch length

DeBry, R.W. (2001) Improving interpretation of the Decay Index for DNA sequence data. Systematic Biology 50: 742-752.

[Figure: Decay values versus bootstrap and jackknife values from one empirical study]

Norén, M. & Jondelius, U. (1999) Phylogeny of the Prolecithophora (Platyhelminthes) inferred from 18S rDNA sequences. Cladistics 15: 103-112.

Bootstrapping (non-parametric)

• Bootstrapping is a modern statistical technique that uses computer-intensive random resampling of data to determine sampling error or confidence intervals for some estimated parameter
• Based on idea of Efron (1979)
• Introduced to phylogenetics by Felsenstein in 1985

1. Characters are sampled with replacement to create many (100-1000) bootstrap replicate data sets (think random play of music, where a song can come up again, versus shuffle, where each song plays once)
2. Each bootstrap replicate data set is analysed (e.g. with parsimony, distance, ML)
3. Agreement among the resulting trees is summarized with a majority-rule consensus tree


Bootstrapping

Randomly resample characters from the original data with replacement to build many bootstrap replicate data sets of the same size as the original - analyse each replicate data set

Original data matrix (rows = taxa, columns = characters)

Characters  1 2 3 4 5 6 7 8
A           R R Y Y Y Y Y Y
B           R R Y Y Y Y Y Y
C           Y Y Y Y Y R R R
D           Y Y R R R R R R
Outgp       R R R R R R R R

Resampled data matrix

Characters  1 2 2 5 5 6 6 8
A           R R R Y Y Y Y Y
B           R R R Y Y Y Y Y
C           Y Y Y Y Y R R R
D           Y Y Y R R R R R
Outgp       R R R R R R R R

Summarize the results of multiple analyses with a majority-rule consensus tree. Bootstrap values (BS) are the frequencies with which groups are encountered in analyses of replicate data sets.

[Figure: trees from the replicate data sets and their majority-rule consensus tree, rooted on the outgroup, with bootstrap values on the branches]

• Frequency of occurrence of groups, bootstrap support (BS), is a measure of support for those groups
• Additional information is given in partition tables (for groups below 50% support)
• PAUP* can be asked to build a majority-rule consensus tree with a higher cut-off, e.g. 80%, so that all weaker branches collapse
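The resample, analyse and summarize loop can be sketched with the 8-character R/Y matrix from the bootstrapping slide. This assumes unordered characters scored with Fitch parsimony and an exhaustive search over all trees, which is feasible only because there are four ingroup taxa (PAUP* would use heuristic searches). Giving each of several tied shortest trees an equal share of a replicate is one common convention, assumed here; the function names are illustrative.

```python
import random
from collections import Counter

# Original data matrix from the slide (R = purine, Y = pyrimidine)
MATRIX = {
    "A": "RRYYYYYY",
    "B": "RRYYYYYY",
    "C": "YYYYYRRR",
    "D": "YYRRRRRR",
    "Outgp": "RRRRRRRR",
}


def add_leaf(tree, leaf):
    """Yield every tree made by attaching `leaf` to a branch of `tree` (or above its root)."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in add_leaf(left, leaf):
            yield (new_left, right)
        for new_right in add_leaf(right, leaf):
            yield (left, new_right)


def all_trees(ingroup, outgroup):
    """Every binary tree for the ingroup, each rooted on the outgroup."""
    trees = [ingroup[0]]
    for leaf in ingroup[1:]:
        trees = [new for tree in trees for new in add_leaf(tree, leaf)]
    return [(outgroup, tree) for tree in trees]


def fitch(tree, states):
    """Fitch state set and number of steps for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    if left_set & right_set:
        return left_set & right_set, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix, columns):
    """Parsimony length over the sampled columns (a column drawn twice counts twice)."""
    return sum(fitch(tree, {t: seq[i] for t, seq in matrix.items()})[1] for i in columns)


def clades(tree):
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = walk(node[0]) | walk(node[1])
        found.add(members)
        return members

    walk(tree)
    return found


def most_parsimonious(matrix, columns, outgroup):
    """All shortest trees for one (possibly resampled) set of columns."""
    ingroup = sorted(t for t in matrix if t != outgroup)
    trees = all_trees(ingroup, outgroup)
    lengths = [tree_length(t, matrix, columns) for t in trees]
    best = min(lengths)
    return [t for t, length in zip(trees, lengths) if length == best]


def bootstrap(matrix, outgroup, replicates=100, seed=1):
    """Bootstrap support (%) for every clade found in the replicate analyses."""
    rng = random.Random(seed)
    nchar = len(next(iter(matrix.values())))
    support = Counter()
    for _ in range(replicates):
        columns = rng.choices(range(nchar), k=nchar)  # 1. resample with replacement
        best = most_parsimonious(matrix, columns, outgroup)  # 2. analyse replicate
        for tree in best:  # 3. tally clades; tied trees share the replicate's weight
            for clade in clades(tree):
                support[clade] += 1 / len(best)
    return {clade: 100 * w / replicates for clade, w in support.items()}


# The replicate drawn on the slide: characters 1 2 2 5 5 6 6 8 (0-based here)
print(most_parsimonious(MATRIX, [0, 1, 1, 4, 4, 5, 5, 7], "Outgp"))

for clade, value in sorted(bootstrap(MATRIX, "Outgp").items(), key=lambda kv: -kv[1]):
    if value > 50 and "Outgp" not in clade:
        print("".join(sorted(clade)), round(value))
```

The original matrix favours ((A,B),C),D because characters 3-5 (three characters grouping A+B+C) outnumber characters 1-2 (two grouping C+D). In the replicate on the slide, characters 1 and 2 are drawn three times in total and characters 3-5 only twice, so that replicate instead recovers (A,B) and (C,D). This kind of fluctuation is what bootstrap percentages summarize.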

Bootstrapping - an example

Ciliate SSU rDNA - parsimony bootstrap

Partition Table

123456789 Freq
-----------------
.**...... 100.00
...**.... 100.00
.....**.. 100.00
...****.. 100.00

Taxon numbers include Ochromonas (1), Symbiodinium (2), Prorocentrum (3) and Euplotes (8)

[Figure: parsimony bootstrap consensus tree for the ciliate SSU rDNA data (Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tetrahymena, Spirostomum, Tracheloraphis, Gruberia, Euplotes) with bootstrap values on the branches]

Bootstrapping - random data

Randomly permuted data - parsimony bootstrap

Partition Table

123456789 Freq
-----------------
.*****.** 71.17
..**..... 58.87
....*..*. 26.43
.*......* 25.67
.***.*.** 23.83
...*...*. 21.00
.*..**.** 18.50
.....*..* 16.00
.*...*..* 15.67
.***....* 13.17
....**.** 12.67
....**.*. 12.00
..*...*.. 12.00
.**..*..* 11.00
.*...*... 10.80
.....*.** 10.50
.***..... 10.00

[Figure: parsimony bootstrap consensus tree for the randomly permuted data, same taxa, with bootstrap values on the branches]
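A partition table lists every group with its bootstrap frequency, including groups below 50% that a majority-rule tree cannot show. A minimal formatter in the same dot/asterisk layout as the tables above can be sketched as follows. It assumes groups are given as sets of 1-based taxon numbers (a hypothetical input format) and treats singletons and groups missing only one taxon as trivial splits. The example reuses the four 100% groups from the ciliate table.

```python
def partition_table(freqs, ntaxa):
    """Format groups as '.'/'*' strings (one column per taxon), most frequent first.

    `freqs` maps a group (a set of 1-based taxon numbers) to its frequency in %.
    Singletons and groups that leave out only one taxon are trivial and skipped.
    """
    rows = [
        ("".join("*" if i in group else "." for i in range(1, ntaxa + 1)), freq)
        for group, freq in freqs.items()
        if 1 < len(group) < ntaxa - 1
    ]
    rows.sort(key=lambda row: -row[1])  # stable sort: ties keep their input order
    header = "".join(str(i % 10) for i in range(1, ntaxa + 1))
    lines = [f"{header} Freq", "-" * (ntaxa + 8)]
    lines += [f"{pattern} {freq:.2f}" for pattern, freq in rows]
    return "\n".join(lines)


ciliate = {
    frozenset({2, 3}): 100.0,
    frozenset({4, 5}): 100.0,
    frozenset({6, 7}): 100.0,
    frozenset({4, 5, 6, 7}): 100.0,
}
print(partition_table(ciliate, 9))
# 123456789 Freq
# -----------------
# .**...... 100.00
# ...**.... 100.00
# .....**.. 100.00
# ...****.. 100.00
```

Sorting by frequency puts the best-supported groups first, which is why the random-data table reads as a steadily declining list of weak, mostly conflicting groups.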