Long Branch Attraction

Results of one simulation of 1000 sites

[Figure: true four-taxon tree with two long edges (length 1) and three short edges (length 0.01); tip states A and G are shown. Annotations: "987 -> 0", "542 -> 1+", "2 true pars. inf." (parsimony-informative sites supporting the true tree).]

Paul O. Lewis ~ Phylogenetics, Spring 2020
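A simulation of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the slide: it assumes the Jukes-Cantor (JC69) model, the edge lengths shown in the figure (1 and 0.01), and hypothetical tip names `a`, `b`, `c`, `d`, where `a` and `c` sit on the long edges and the true tree is ((a,b),(c,d)). It counts parsimony-informative sites by the split they support.

```python
import math
import random

BASES = "ACGT"

def evolve(base, edge_length, rng):
    """Evolve one site along an edge under JC69.

    edge_length is the expected number of substitutions per site.
    """
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * edge_length / 3.0))
    if rng.random() < p_diff:
        # Under JC69 the three other bases are equally likely.
        return rng.choice([b for b in BASES if b != base])
    return base

def simulate(n_sites=1000, long_edge=1.0, short_edge=0.01, seed=1):
    """Count parsimony-informative sites supporting each four-taxon split."""
    rng = random.Random(seed)
    counts = {"true": 0, "lba": 0, "other": 0}
    for _ in range(n_sites):
        u = rng.choice(BASES)            # state at one internal node
        v = evolve(u, short_edge, rng)   # state across the internal edge
        a = evolve(u, long_edge, rng)    # long tip edge
        b = evolve(u, short_edge, rng)   # short tip edge
        c = evolve(v, long_edge, rng)    # long tip edge
        d = evolve(v, short_edge, rng)   # short tip edge
        if a == b and c == d and a != c:
            counts["true"] += 1          # supports ((a,b),(c,d))
        elif a == c and b == d and a != b:
            counts["lba"] += 1           # groups the two long edges
        elif a == d and b == c and a != b:
            counts["other"] += 1
    return counts

counts = simulate()
print(counts)
```

With these edge lengths, sites that group the two long edges should greatly outnumber sites that support the true tree, which is the pattern the slides summarize as "true" versus "false" parsimony-informative sites.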

Results of one simulation of 1000 sites: 99 false pars. inf. (parsimony-informative sites supporting the false tree)

[Figure: true tree for taxa 1-4 (long edges of length 1, short edges of length 0.01, tip states A and G) beside the accurate topology and the LBA topology.]

Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27: 401-410.

"Felsenstein Zone" Trees

[Figure: Fig. 3, p. 531, from Swofford et al. 2001, annotated "stat. consist." and "more data = more sites".]

Swofford et al. 2001. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Systematic Biology 50: 525-539.

Long Branch Repulsion?

Mirror images?

"Felsenstein Zone" Tree versus "Farris Zone" or "Anti-Felsenstein Zone" Tree

Fig. 2, p. 529, from Swofford et al. 2001. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Systematic Biology 50: 525-539.

Siddall, M. E. 1998. Success of parsimony in the four-taxon case: long-branch repulsion by likelihood in the Farris Zone. Cladistics 14:209-220.

Failing to estimate edge lengths correctly leads to LBA

Correctly estimated long edges mean the convergence explanation is more reasonable.
Underestimated edges make the convergence explanation less reasonable.

[Figure: four-taxon trees with tip states A and C, labeled "True", "JC+G", "(JC+G)", and "JC".]

Likelihood ratio test of the molecular clock

[Figure: two trees for Codium, Osmunda, Zamia, Gnetum, Zea, Oryza, and Acer, with log-likelihoods -5226.835 and -5214.240.]

Felsenstein, J. 1983. Statistical inference of phylogenies. Journal of the Royal Statistical Society A 146:246-272.

Paul O. Lewis ~ Phylogenetics, Spring 2016

HIV-1 subtypes

2,000 nucleotide sites from gag and pol genes.
Substitution model: GTR+Γ (using empirical base frequencies)

[Figure: two trees for subtypes A1, A2, B, D, E1, E2, with lnLML = -5069.85 and lnL1 = -5073.75.]

Goldman, N., J. P. Anderson, and A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Systematic Biology 49:652-670.
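Returning to the likelihood ratio test of the molecular clock, the arithmetic can be sketched as follows. Two assumptions are made here that the slide text does not state outright: that -5226.835 is the clock-constrained log-likelihood (the constrained model cannot fit better than the unconstrained one) and that the test has n - 2 = 5 degrees of freedom for the n = 7 taxa shown. The helper `chi2_sf` is a hypothetical name for a pure-Python chi-square upper-tail function, valid for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite series in half-integer powers of x.
    total = 0.0
    term = math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(math.sqrt(x / 2.0))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total

lnl_clock = -5226.835   # assumed: clock-constrained tree
lnl_free = -5214.240    # assumed: unconstrained tree
n_taxa = 7              # Codium, Osmunda, Zamia, Gnetum, Zea, Oryza, Acer

lrt_stat = 2.0 * (lnl_free - lnl_clock)
df = n_taxa - 2         # (2n - 3) free edge lengths minus (n - 1) node heights
p_value = chi2_sf(lrt_stat, df)
print(f"LRT = {lrt_stat:.2f}, df = {df}, P = {p_value:.2g}")
```

Under these assumptions the statistic is 2 × 12.595 = 25.19, well above the 5% critical value for 5 degrees of freedom (about 11.07), so the clock would be rejected for this dataset.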

KH test

[Figure: ML tree and accepted tree.]

KH test (RELL method)

Log-likelihood difference (ML tree minus accepted tree). Bootstrap differences are centered by subtracting their mean, 4.43.

Original dataset   -5069.85 - (-5073.75) = 3.90
Bootstrap 1        -4951.07 - (-4958.72) = 7.65 - 4.43 = 3.23
Bootstrap 2        -5149.91 - (-5158.69) = 8.78 - 4.43 = 4.35
Bootstrap 3        -5100.88 - (-5104.89) = 4.01 - 4.43 = -0.42
⋮
Bootstrap 100      -5051.14 - (-5057.16) = 6.02 - 4.43 = 1.59

Site log-likelihoods (pretend we only have 10 sites)

Site            1       2       3       4       5       6       7       8       9       10
ML tree         -1.772  -1.772  -7.856  -5.309  -5.747  -1.119  -1.772  -1.119  -10.245 -1.119
Accepted tree   -1.771  -1.771  -7.965  -5.289  -5.815  -1.12   -1.771  -1.12   -10.256 -1.12
Difference      -0.001  -0.001  0.109   -0.02   0.068   0.001   -0.001  0.001   0.011   0.001    δ0 = 0.168

First bootstrap replicate (site differences resampled with replacement):
                -0.001  -0.001  -0.001  -0.02   -0.02   0.001   0.001   -0.001  -0.02   -0.001   δ1 = -0.063
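The RELL procedure on the 10-site toy example can be sketched in Python. This is a minimal illustration, not PAUP*'s implementation: the function name `kh_rell`, the number of replicates, the seed, and the two-tailed P-value convention are all assumptions made for the sketch. RELL resamples the fixed site log-likelihoods rather than re-optimizing parameters for each replicate.

```python
import random

# Site log-likelihoods from the 10-site toy example.
ml_tree = [-1.772, -1.772, -7.856, -5.309, -5.747,
           -1.119, -1.772, -1.119, -10.245, -1.119]
accepted_tree = [-1.771, -1.771, -7.965, -5.289, -5.815,
                 -1.120, -1.771, -1.120, -10.256, -1.120]

def kh_rell(lnl_a, lnl_b, n_reps=1000, seed=42):
    """KH test by RELL bootstrap; returns (delta_obs, centered deltas, P)."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(lnl_a, lnl_b)]
    n_sites = len(diffs)
    delta_obs = sum(diffs)
    # Each replicate resamples site differences with replacement and sums them.
    boot = [sum(rng.choice(diffs) for _ in range(n_sites))
            for _ in range(n_reps)]
    # Centering imposes the null hypothesis that the expected difference is 0.
    mean_boot = sum(boot) / n_reps
    centered = [d - mean_boot for d in boot]
    # Two-tailed P: fraction of centered deltas at least as extreme as observed.
    p_value = sum(1 for d in centered if abs(d) >= abs(delta_obs)) / n_reps
    return delta_obs, centered, p_value

delta_obs, centered, p_value = kh_rell(ml_tree, accepted_tree)
print(f"delta0 = {delta_obs:.3f}, P = {p_value:.3f}")
```

The observed difference reproduces δ0 = 0.168 from the table. The P-value depends on the random seed and on the number of replicates, so it should be read as approximate.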

Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea. Journal of Molecular Evolution 29:170-179.

KH Test (Normal Approximation)

Example: HIV-1 subtypes

ML tree (A1 B D A2 E1 E2): lnL1 = -5069.85
Accepted tree (A1 A2 B D E1 E2): lnL2 = -5073.75

δ0 = lnL1 - lnL2 = (-5069.85) - (-5073.75) = 3.9

[Figure: histogram of delta (x-axis from -15 to 15, y-axis Frequency from 0 to 100), with the 2.5% lower tail and 2.5% upper tail marked.]

P = 0.384

3.9 does not lie in the rejection region, so we cannot reject the null hypothesis of equal support.
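The normal approximation replaces the bootstrap with a variance estimate computed from the site-wise differences. The sketch below applies it to the 10-site toy example, because the per-site values for the HIV-1 data are not given here. The function name `kh_normal` is hypothetical, and the variance estimator n/(n-1) × Σ(d_i - mean)² is the usual one for this test, stated as an assumption about what the slide intends.

```python
import math

# Site log-likelihoods from the 10-site toy example.
ml_tree = [-1.772, -1.772, -7.856, -5.309, -5.747,
           -1.119, -1.772, -1.119, -10.245, -1.119]
accepted_tree = [-1.771, -1.771, -7.965, -5.289, -5.815,
                 -1.120, -1.771, -1.120, -10.256, -1.120]

def kh_normal(lnl_a, lnl_b):
    """KH test with a normal approximation; returns (delta, z, two-tailed P)."""
    diffs = [a - b for a, b in zip(lnl_a, lnl_b)]
    n_sites = len(diffs)
    delta = sum(diffs)
    mean_diff = delta / n_sites
    # Variance of the summed difference, estimated from site-wise differences.
    var_delta = n_sites / (n_sites - 1) * sum((d - mean_diff) ** 2 for d in diffs)
    z = delta / math.sqrt(var_delta)
    # Two-tailed P from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return delta, z, p_value

delta, z, p_value = kh_normal(ml_tree, accepted_tree)
print(f"delta0 = {delta:.3f}, z = {z:.2f}, P = {p_value:.3f}")
```

With only 10 sites the normal approximation is crude. The example is meant to show the calculation, not to reproduce the P-value reported for the HIV-1 data.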

KH Test in PAUP*

    begin paup;
      exe hiv1.nex;
      gettrees file=treefile.tre;
      lset nst=6 basefreq=estim rates=gamma shape=estim rmatrix=estim;
      lscores 1 2 / khtest=bootstrap[none|normal] RELL=yes;
    end;

SH Test in PAUP*

    lscores 1 2 / shtest RELL=yes bootreps=1000;

AU Test in PAUP*

    lscores 1 2 / autest RELL=yes bootreps=1000;

Shimodaira, H., and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Molecular Biology and Evolution 16: 1114-1116.

Shimodaira, H. 2002. An approximately unbiased test of phylogenetic tree selection. Systematic Biology 51:492–508.
