Long Branch Attraction

[Figure: True tree. Four-taxon tree with two long edges (length 1) and three short edges (length 0.01); tip states shown: A, G.]

Results of one simulation of 1000 sites

[Figure: True tree with the same edge lengths. Annotations: 987 -> 0; 542 -> 1+; 2 true pars. inf. (parsimony-informative) sites.]

Paul O. Lewis ~ Phylogenetics, Spring 2020
Long Branch Attraction

Results of one simulation of 1000 sites

[Figure: True tree (edge lengths 1 and 0.01). Annotation: 99 false pars. inf. (parsimony-informative) sites.]

[Figure: True tree for taxa 1-4 (tip states A, A, G, G), shown beside the accurate topology and the LBA topology.]

Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27: 401-410.

"Felsenstein Zone" Trees

[Figure: Fig. 3, p. 531, from Swofford et al. 2001. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Systematic Biology 50: 525-539.]

stat. consist. (statistical consistency): more data = more sites
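The simulation summarized on the slides above can be approximated with a short script. This is a minimal sketch, not the code used for the slides: it assumes the Jukes-Cantor (JC69) model, assumes the two long edges (length 1) lead to taxa 1 and 3 on opposite sides of a short internal edge, and uses hypothetical names (`jc_evolve`, `simulate_counts`). It counts the three parsimony-informative site patterns possible for four taxa.

```python
import math
import random


def jc_evolve(state, t, rng):
    """Return the state at the end of an edge of length t under JC69."""
    # Probability that the end state equals the start state.
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    if rng.random() < p_same:
        return state
    # Otherwise each of the three other bases is equally likely.
    return rng.choice([s for s in "ACGT" if s != state])


def simulate_counts(n_sites=1000, long_len=1.0, short_len=0.01, seed=1):
    """Count parsimony-informative patterns on the assumed tree ((1,2),(3,4))."""
    rng = random.Random(seed)
    counts = {"true": 0, "lba": 0, "other": 0}
    for _ in range(n_sites):
        x = rng.choice("ACGT")              # internal node next to taxa 1 and 2
        y = jc_evolve(x, short_len, rng)    # internal node next to taxa 3 and 4
        t1 = jc_evolve(x, long_len, rng)    # long edge (assumption)
        t2 = jc_evolve(x, short_len, rng)
        t3 = jc_evolve(y, long_len, rng)    # long edge (assumption)
        t4 = jc_evolve(y, short_len, rng)
        if t1 == t2 and t3 == t4 and t1 != t3:
            counts["true"] += 1             # pattern supports ((1,2),(3,4))
        elif t1 == t3 and t2 == t4 and t1 != t2:
            counts["lba"] += 1              # pattern supports ((1,3),(2,4))
        elif t1 == t4 and t2 == t3 and t1 != t2:
            counts["other"] += 1            # pattern supports ((1,4),(2,3))
    return counts


counts = simulate_counts()
print(counts)
```

Under these assumptions, the sites grouping the two long-edged taxa should far outnumber the sites supporting the true tree, which is the imbalance the slides report (2 versus 99). Raising `n_sites` adds more such sites rather than correcting the imbalance, which is why more data does not rescue parsimony here.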
Long Branch Repulsion?

Mirror images?

[Figure: "Felsenstein Zone" tree beside "Farris Zone" (or "Anti-Felsenstein Zone") tree. Fig. 2, p. 529, from Swofford et al. 2001. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Systematic Biology 50: 525-539.]

Siddall, M. E. 1998. Success of parsimony in the four-taxon case: long-branch repulsion by likelihood in the Farris Zone. Cladistics 14: 209-220.

Failing to estimate edge lengths correctly leads to LBA

Correctly estimated long edges mean the convergence explanation is more reasonable.
Underestimated edges make the convergence explanation less reasonable.

[Figure: four-taxon trees with tip states A, A, C, C; panels labelled True, JC+G, (JC+G), and JC.]
Likelihood ratio test of the molecular clock

HIV-1 subtypes: 2,000 nucleotide sites from the gag and pol genes.
Substitution model: GTR+Γ (using empirical base frequencies)

[Figure: two trees of the HIV-1 sequences A1, A2, B, D, E1, E2, with log-likelihoods -5226.835 and -5214.240.]

Felsenstein, J. 1983. Statistical inference of phylogenies. Journal of the Royal Statistical Society A 146: 246-272.

[Figure: two trees of Codium, Osmunda, Zamia, Gnetum, Oryza, Zea, and Acer. lnL_ML = -5069.85; lnL_1 = -5073.75.]

Goldman, N., J. P. Anderson, and A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Systematic Biology 49: 652-670.

Paul O. Lewis ~ Phylogenetics, Spring 2016
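For the likelihood ratio test of the molecular clock, the arithmetic can be sketched as follows. Several things here are assumptions rather than statements from the slides: that -5226.835 is the clock-constrained log-likelihood and -5214.240 the unconstrained one (the constrained model cannot have the higher likelihood), that the six HIV-1 labels correspond to six taxa, and that the test has n - 2 degrees of freedom for n taxa. The helper `chi2_sf_even_df` is a hypothetical name; it uses the closed-form chi-square tail that holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


lnL_clock = -5226.835   # assumed: clock-constrained model
lnL_free = -5214.240    # assumed: unconstrained model
n_taxa = 6              # assumed from the six sequence labels
df = n_taxa - 2         # assumed degrees of freedom for the clock test

lrt = 2.0 * (lnL_free - lnL_clock)
p_value = chi2_sf_even_df(lrt, df)
print(f"LRT = {lrt:.2f}, df = {df}, p = {p_value:.2g}")
```

With these inputs the statistic is 2 × 12.595 = 25.19, so under the stated assumptions the clock would be rejected at conventional significance levels.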
KH test

Dataset            ML tree     Accepted tree   Difference   Difference (centered)
Original dataset   -5069.85    -5073.75        3.90         ---
Bootstrap 1        -4951.07    -4958.72        7.65         7.65 - 4.43 = 3.23
Bootstrap 2        -5149.91    -5158.69        8.78         8.78 - 4.43 = 4.35
Bootstrap 3        -5100.88    -5104.89        4.01         4.01 - 4.43 = -0.42
⋮
Bootstrap 100      -5051.14    -5057.16        6.02         6.02 - 4.43 = 1.59

KH test (RELL method)

Site log-likelihoods (pretend we only have 10 sites)

                  site 1   site 2   site 3   site 4   site 5   site 6   site 7   site 8   site 9    site 10
Tree 1            -1.772   -1.772   -7.856   -5.309   -5.747   -1.119   -1.772   -1.119   -10.245   -1.119
Tree 2            -1.771   -1.771   -7.965   -5.289   -5.815   -1.12    -1.771   -1.12    -10.256   -1.12
Tree 1 - Tree 2   -0.001   -0.001   0.109    -0.02    0.068    0.001    -0.001   0.001    0.011     0.001

Sum of the site differences = 0.168
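The RELL idea above (resample the site log-likelihoods instead of re-estimating parameters for each bootstrap dataset) can be sketched with the ten site values from the slide. This is an illustrative sketch only: the function name `rell_kh`, the number of replicates, the seed, and the one-sided comparison used for the p-value are all assumptions, not details taken from the slides.

```python
import random

# Site log-likelihoods for the two trees, copied from the 10-site example.
site_lnL_1 = [-1.772, -1.772, -7.856, -5.309, -5.747,
              -1.119, -1.772, -1.119, -10.245, -1.119]
site_lnL_2 = [-1.771, -1.771, -7.965, -5.289, -5.815,
              -1.12, -1.771, -1.12, -10.256, -1.12]


def rell_kh(lnl_a, lnl_b, n_reps=1000, seed=1):
    """Return (observed difference, centered bootstrap differences, p-value)."""
    rng = random.Random(seed)
    n = len(lnl_a)
    diffs = [a - b for a, b in zip(lnl_a, lnl_b)]
    observed = sum(diffs)
    reps = []
    for _ in range(n_reps):
        # Resample sites with replacement and sum their differences.
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(sum(diffs[i] for i in idx))
    # Center the replicates by subtracting their mean.
    mean_rep = sum(reps) / n_reps
    centered = [r - mean_rep for r in reps]
    # Assumed one-sided p-value: share of centered values >= observed.
    p = sum(1 for c in centered if c >= observed) / n_reps
    return observed, centered, p


observed, centered, p = rell_kh(site_lnL_1, site_lnL_2)
print(f"observed difference = {observed:.3f}, p = {p:.3f}")
```

The centering step plays the same role as subtracting 4.43 in the bootstrap table: it shifts the replicate differences so that their mean is zero before the observed difference is compared against them.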