Introduction to Biosystematics - Zool 575
Introduction to Biosystematics Lecture 25 - Confidence - Assessment 2
Confidence - Assessment of the Strength of the Phylogenetic Signal - part 2
1. Consistency Index
2. g1 statistic, PTP test
3. Consensus trees
4. Decay index (Bremer Support)
5. Bootstrapping / Jackknifing
6. Statistical hypothesis testing (frequentist)
7. Posterior probability (see lecture on Bayesian)
“Quantifying the uncertainty of a phylogenetic estimate is at least as important a goal as obtaining the phylogenetic estimate itself.”
- Huelsenbeck & Rannala (2004)
Derek S. Sikes University of Calgary Zool 575
Multiple optimal trees
• Many methods can yield multiple equally optimal trees
• We can further select among these trees with additional criteria, but typically relationships common to all the optimal trees are summarized with consensus trees
• If multiple optimal trees are found we know that all of them are wrong except, possibly (hopefully), one
• Some have argued against consensus tree methods for this reason
• Debate over quest for true tree (point estimate) versus quantification of uncertainty
Consensus methods
• A consensus tree is a summary of the agreement among a set of fundamental trees
• There are many consensus methods that differ in:
1. the kind of agreement
2. the level of agreement
• Consensus methods can be used with multiple trees from a single analysis or from multiple analyses
Strict consensus methods
• Strict consensus methods require agreement across all the fundamental trees
• They show only those relationships that are unambiguously supported by the data
• The commonest method (strict component consensus) focuses on clades/components/full splits
Strict consensus methods
• This method produces a consensus tree that includes all and only those full splits found in all the fundamental trees
• Other relationships (those in which the fundamental trees disagree) are shown as unresolved polytomies
• Simplest to interpret
• Can be less optimal than any of the optimal trees
[Figure: two fundamental trees for taxa A-G and the strict consensus tree derived from them]
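The strict consensus rule can be sketched in a few lines of Python. This is a minimal illustration, not how PAUP* implements it: each rooted tree is assumed to be represented simply as a set of clades (each clade a frozenset of taxon names), and the two example trees are hypothetical, not the trees drawn on the slide.

```python
from typing import FrozenSet, List, Set

Clade = FrozenSet[str]


def strict_consensus(trees: List[Set[Clade]]) -> Set[Clade]:
    """Return only those clades that occur in every fundamental tree."""
    if not trees:
        return set()
    shared = set(trees[0])
    for clades in trees[1:]:
        shared &= clades  # keep a clade only if this tree also contains it
    return shared


# Two hypothetical fundamental trees on taxa A-G (assumed for illustration).
tree1 = {frozenset("AB"), frozenset("ABC"), frozenset("EF"), frozenset("EFG")}
tree2 = {frozenset("AB"), frozenset("ABD"), frozenset("EF"), frozenset("EFG")}

consensus = strict_consensus([tree1, tree2])
for clade in sorted(consensus, key=lambda c: (len(c), sorted(c))):
    print("".join(sorted(clade)))
```

The clades on which the trees disagree (ABC versus ABD here) drop out, which is what produces the polytomy in the consensus tree.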
Majority rule consensus
• Majority-rule consensus methods require agreement across a majority of the fundamental trees
• This method produces a consensus tree that includes all and only those full splits found in a majority (>50%) of the fundamental trees
• Other relationships are shown as unresolved polytomies
• May include relationships that are not supported by the most parsimonious interpretation of the data
• The commonest method focuses on clades/components/full splits
• Of particular use in bootstrapping and Bayesian Inference (best not to use for single searches)
• Implemented in PAUP* and MrBayes
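As a rough sketch of the majority-rule idea, the Python below counts how often each clade occurs across a set of trees and keeps those above a cut-off. The set-of-clades tree representation, the function names, and the three example trees are assumptions made for illustration; PAUP* and MrBayes have their own implementations.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def clade_frequencies(trees: List[Set[Clade]]) -> Dict[Clade, float]:
    """Percentage of fundamental trees in which each clade occurs."""
    counts = Counter(clade for clades in trees for clade in clades)
    return {clade: 100.0 * n / len(trees) for clade, n in counts.items()}


def majority_rule(trees: List[Set[Clade]], cutoff: float = 50.0) -> Dict[Clade, float]:
    """Keep clades found in more than `cutoff` percent of the trees."""
    freqs = clade_frequencies(trees)
    return {clade: f for clade, f in freqs.items() if f > cutoff}


# Three hypothetical fundamental trees (assumed for illustration).
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
]

mr = majority_rule(trees)               # AB in 3/3 trees, ABC in 2/3 trees
strict_80 = majority_rule(trees, 80.0)  # higher cut-off: only AB survives
```

With three trees, a clade found in two of them has a frequency of two thirds, which is the kind of value shown as 66 on a majority-rule consensus tree.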
Majority rule consensus
Majority-rule consensus trees are used for:
1. Summarizing multiple equally optimal trees from one search (but they shouldn’t be!)
2. Summarizing the results of a bootstrapping analysis (multiple searches)
3. Summarizing the results of a Bayesian analysis
Don’t confuse these! The numbers on the branches mean very different things in each case
[Figure: three fundamental trees for taxa A-G and their majority-rule consensus tree. Numbers indicate the frequency of clades in the fundamental trees (100 or 66).]
Reduced consensus methods
[Figure: two fundamental trees for taxa A-G. Their strict component consensus is completely unresolved; the agreement subtree (PAUP*), from which taxon G is excluded, is resolved.]
Consensus methods
[Figure: three fundamental trees for the ciliate taxa (Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tetrahymena, Spirostomum, Euplotes, Tracheloraphis, Gruberia), summarized as a strict consensus, a majority-rule consensus (clade frequencies 100 and 66), and an agreement subtree with Euplotes excluded.]
Consensus methods
• Use strict methods to identify those relationships unambiguously supported by parsimonious interpretation of the data
• Use reduced methods where consensus trees are poorly resolved
• Avoid methods which have ambiguous interpretations. Prevent possible confusion between MR consensus for an optimal tree search and a MR consensus for a bootstrapping search
Recall
• Stochastic error vs Systematic error
• These assessment methods help identify stochastic error
– How repeatable are the results?
– How strongly do the data support them?
– This is a measure of precision (which is hopefully related to accuracy)
Confidence - Assessment of the Strength of the Phylogenetic Signal - part 2
1. Consistency Index
2. g1 statistic, PTP test
3. Consensus trees
4. Decay index (Bremer Support)
5. Bootstrapping / Jackknifing
6. Statistical hypothesis testing (frequentist)
7. Posterior probability (see lecture on Bayesian)
Accuracy and Precision
• Accuracy
– Accuracy is correctness. How close a measurement is to the true value. (unless we know the “true tree” in advance we cannot measure this)
• Precision
– Precision is reproducibility. How closely two or more measurements agree with one another. (this we can measure!)
Branch Support
• Several methods have been proposed that attach numerical values to internal branches in trees that are intended to provide some measure of the strength of support for those branches and the corresponding groups
• These methods include:
- The Bootstrap (BS) and jackknife
- Decay analyses (aka Bremer Support)
- Bayesian Posterior Probabilities (PP or BPP)
Decay analysis
• In parsimony analysis, a way to assess support for a group is to see if the group occurs in slightly less parsimonious trees also
• The length difference between the shortest trees including the group and the shortest trees that exclude the group (the extra steps required to collapse a group) is the decay index or Bremer support
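The decay index arithmetic is just a difference of two tree lengths. The sketch below assumes the two lengths have already been obtained from separate searches (for example, an unconstrained search and a reverse-constraint search); the function name and the tree lengths are hypothetical values chosen for illustration.

```python
def decay_index(shortest_with_group: int, shortest_without_group: int) -> int:
    """Extra steps needed to lose the group.

    shortest_with_group:    length of the shortest tree(s) that contain the group
    shortest_without_group: length of the shortest tree(s) that lack the group
    """
    return shortest_without_group - shortest_with_group


# Hypothetical lengths: the best tree with the clade has 120 steps,
# the best tree without it has 127 steps.
support = decay_index(120, 127)
print(f"+{support}")
```

A group that is absent from some most parsimonious trees gets a value of zero under this definition, because a tree lacking the group is just as short as the best tree containing it.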
Decay analyses - in practice
• Decay indices for each clade can be determined by:
- Using PAUP* to search for the shortest tree that lacks the branch of interest using reverse topological constraints
- with the Autodecay or TreeRot programs (in conjunction with PAUP*)
- MacClade 4 will also help prepare for a Decay analysis
- An excellent use for the Parsimony Ratchet - because finding the shortest tree length is all that matters (not finding multiple shortest trees)
Decay analysis - example
[Figure: trees for the ciliate SSUrDNA data and for randomly permuted data (taxa Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tracheloraphis, Spirostomum, Gruberia, Euplotes, Tetrahymena), with decay indices on the branches ranging from +1 to +45]
Decay indices - interpretation
• Generally, the higher the decay index the better the relative support for a group
• Like Bootstrap values (BS), decay indices may be misleading if the data are misleading
• Magnitude of decay indices and BS generally correlated (i.e. they tend to agree)
• Only groups found in all most parsimonious trees have decay indices > zero
• Unlike BS, decay indices are not scaled (0-100)
– This has the advantage that the value can exceed 100 whereas BS “tops out” at 100, meaning that we cannot distinguish between the support of two branches with BS values of 100 although one might have a far greater decay index than the other
• It is even less clear what is an acceptable decay index than a BS value…
– Unlike the BS value, very little work has examined the properties and behavior of decay indices
Confidence - Assessment of the Strength of the Phylogenetic Signal - part 2
1. Consistency Index
2. g1 statistic, PTP test
3. Consensus trees
4. Decay index (Bremer Support)
5. Bootstrapping / Jackknifing
6. Statistical hypothesis testing (frequentist)
7. Posterior probability (see lecture on Bayesian)
Decay indices - interpretation
One key study is that of DeBry (2001)
– He showed that decay indices should be interpreted in light of branch lengths
– The same values, even within the same tree, do not represent the same support if the branch lengths differ
- i.e. decay indices are not easily comparable as measures of branch support
- Values < 4 should be considered weak regardless of branch length
DeBry, R.W. (2001) Improving interpretation of the Decay Index for DNA sequence data. Systematic Biology 50: 742-752.
[Figure: Decay values versus bootstrap and jackknife values from one empirical study. Source: Norén, M. & U. Jondelius. 1999. Phylogeny of the Prolecithophora (Platyhelminthes) inferred from 18S rDNA sequences. Cladistics 15: 103-112.]
Bootstrapping (non-parametric)
• Bootstrapping is a modern statistical technique that uses computer-intensive random resampling of data to determine sampling error or confidence intervals for some estimated parameter
• Introduced to phylogenetics by Felsenstein in 1985
• Based on idea of Efron (1979)
Bootstrapping (non-parametric)
1. Characters are sampled with replacement to create many (100-1000) bootstrap replicate data sets
(think shuffle vs random play of music)
2. Each bootstrap replicate data set is analysed (e.g. with parsimony, distance, ML)
3. Agreement among the resulting trees is summarized with a majority-rule consensus tree
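Step 1 of the procedure above can be sketched in Python as follows. This is only an illustration of resampling characters (columns) with replacement; the dictionary layout, the function name, and the fixed random seed are assumptions, and the tree search in step 2 is left to a phylogenetics program.

```python
import random
from typing import Dict, List


def bootstrap_replicate(matrix: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample characters (columns) with replacement; same size as the original."""
    n_chars = len(next(iter(matrix.values())))
    # Draw n_chars column indices; the same column may be drawn more than once.
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: "".join(states[i] for i in cols) for taxon, states in matrix.items()}


# Small toy matrix: 5 taxa x 8 characters with states R and Y (illustrative).
matrix = {
    "A": "RRYYYYYY",
    "B": "RRYYYYYY",
    "C": "YYYYYRRR",
    "D": "YYRRRRRR",
    "Outgp": "RRRRRRRR",
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates: List[Dict[str, str]] = [bootstrap_replicate(matrix, rng) for _ in range(100)]
```

Each replicate keeps whole columns intact, so every taxon is resampled at the same characters; only the selection of characters changes from replicate to replicate.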
Bootstrapping (non-parametric)
Randomly resample characters from the original data with replacement to build many bootstrap replicate data sets of the same size as the original - analyse each replicate data set

Original data matrix
Taxa     Characters
         1 2 3 4 5 6 7 8
A        R R Y Y Y Y Y Y
B        R R Y Y Y Y Y Y
C        Y Y Y Y Y R R R
D        Y Y R R R R R R
Outgp    R R R R R R R R

Resampled data matrix
Taxa     Characters
         1 2 2 5 5 6 6 8
A        R R R Y Y Y Y Y
B        R R R Y Y Y Y Y
C        Y Y Y Y Y R R R
D        Y Y Y R R R R R
Outgp    R R R R R R R R

Summarize the results of multiple analyses with a majority-rule consensus tree. Bootstrap values (BS) are the frequencies with which groups are encountered in analyses of replicate data sets
[Figure: trees for taxa A-D plus outgroup obtained from the original and resampled matrices, with supporting characters marked on the branches, and a majority-rule consensus tree with bootstrap values of 96% and 66%]
Bootstrapping
• Frequency of occurrence of groups, bootstrap support (BS), is a measure of support for those groups
• Additional information is given in partition tables (for groups below 50% support)
• Can ask PAUP* to create MR con-tree of higher cut-off, e.g. 80% - all weaker branches collapse
Taxon numbering used in the partition tables: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7), Euplotes (8), Tetrahymena (9)

Bootstrapping - an example
Ciliate SSUrDNA - parsimony bootstrap
Partition Table
123456789   Freq
-----------------
.**......   100.00
...**....   100.00
.....**..   100.00
...****..   100.00
...******    95.50
.......**    84.33
Minority partitions: ...****.* ...*****. .*******. .**....*. .**.....*
Frequencies: 11.83 3.83 2.50 1.00
[Figure: majority-rule consensus tree for the ciliate SSUrDNA data with bootstrap values 100, 100, 100, 100, 96 and 84]

Bootstrapping - random data
Randomly permuted data - parsimony bootstrap
Partition Table
123456789   Freq
-----------------
.*****.**    71.17
..**.....    58.87
....*..*.    26.43
.*......*    25.67
.***.*.**    23.83
...*...*.    21.00
.*..**.**    18.50
.....*..*    16.00
.*...*..*    15.67
.***....*    13.17
....**.**    12.67
....**.*.    12.00
..*...*..    12.00
.**..*..*    11.00
.*...*...    10.80
.....*.**    10.50
.***.....    10.00
[Figure: 50% majority-rule consensus tree (with minority components) for the randomly permuted data, with bootstrap values including 71, 59, 26, 21 and 16]
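Partition-table rows can be read by matching each position to a taxon number: an asterisk means the taxon is inside the group, a dot means it is outside. The short Python sketch below decodes a row into taxon names. It assumes the taxon numbering 1-9 shown with the trees on these slides; the function name is hypothetical and is not part of PAUP*.

```python
from typing import List, Sequence

# Taxon order assumed from the numbering (1)-(9) on the slides.
TAXA = [
    "Ochromonas", "Symbiodinium", "Prorocentrum", "Loxodes", "Tracheloraphis",
    "Spirostomum", "Gruberia", "Euplotes", "Tetrahymena",
]


def decode_partition(pattern: str, taxa: Sequence[str] = TAXA) -> List[str]:
    """Return the taxa marked '*' in a partition-table row such as '.**......'."""
    if len(pattern) != len(taxa):
        raise ValueError("pattern length must equal the number of taxa")
    return [taxon for taxon, mark in zip(taxa, pattern) if mark == "*"]


print(decode_partition(".**......"))  # group found in 100.00% of ciliate replicates
print(decode_partition(".......**"))  # group found in 84.33% of ciliate replicates
```

Decoding the rows this way makes it easier to see which named group each frequency in the table refers to.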