Assessing Phylogenetic Hypotheses and Phylogenetic Data


• We use numerical phylogenetic methods because most data includes potentially misleading evidence of relationships
• We should not be content with constructing phylogenetic hypotheses but should also assess what ‘confidence’ we can place in our hypotheses
• This is not always simple! (but do not despair!)

Assessing Data Quality

• We expect (or hope) our data will be well structured and contain strong phylogenetic signal
• We can test this using randomization tests of explicit null hypotheses
• The behaviour of some measure of the quality of our real data is contrasted with that of comparable but phylogenetically uninformative data determined by randomization

Random Permutation
• Random permutation reduces any correlation among characters to that expected by chance alone
• It preserves the number of taxa, characters and character states in each character (and the theoretical maximum and minimum tree lengths)

Original structured data with strong correlations among characters
‘TAXA’   ‘CHARACTERS’
         1 2 3 4 5 6 7 8
R-P      R P R P R P R P
A-E      A E A E A E A E
N-R      N R N R N R N R
D-M      D M D M D M D M
O-U      O U O U O U O U
M-T      M T M T M T M T
L-E      L E L E L E L E
Y-D      Y D Y D Y D Y D

Randomly permuted data with any correlation among characters due to chance
‘TAXA’   ‘CHARACTERS’
         1 2 3 4 5 6 7 8
R-P      N U D E R T O U
A-E      R E A P L E A D
N-R      M R M M A D N P
D-M      L T R E Y M D R
O-U      D E Y U D E Y M
M-T      O M O T O U L T
L-E      Y D N D M P M E
Y-D      A P L R N R R E

Matrix Randomization Tests
• Compare some measure of data quality/hierarchical structure for the real and many randomly permuted data sets
• This allows us to define a test statistic for the null hypothesis that the real data are no better structured than randomly permuted and phylogenetically uninformative data
• A permutation tail probability (PTP) is the proportion of data sets with as good or better measure of quality than the real data

Structure of Randomization Tests
• Reject the null hypothesis if, for example, fewer than 5% of random permutations have as good or better a measure than the real data

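[Figure: frequency distribution of a measure of data quality (e.g. tree length, ML, pairwise incompatibilities) for randomly permuted data, running from GOOD to BAD, with the 95% cutoff marked. Real data beyond the cutoff on the GOOD side PASS the test (reject the null hypothesis). Real data within the bulk of the permuted distribution FAIL the test.]

Matrix Randomization Tests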

• Measures of data quality include:
1. Tree length for most parsimonious trees - the shorter the tree length the better the data (PAUP*)
2. Numbers of pairwise incompatibilities between characters (pairs of incongruent characters) - the fewer character conflicts the better the data
3. Skewness of the distribution of tree lengths (PAUP)

Matrix Randomization Tests
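The permutation test described above can be sketched in a few lines. This is a minimal sketch under several assumptions. Characters are binary (0/1). The quality measure is measure 2, the number of pairwise incompatibilities, checked with the four-gamete test. By a common convention (assumed here), the real data are counted as one of the data sets when forming the PTP. The function names and the toy matrix are hypothetical illustrations, not PAUP* commands or real data.

```python
import itertools
import random

def incompatible(char_a, char_b):
    """Four-gamete test: two binary characters are incompatible when all four
    state combinations (0,0), (0,1), (1,0) and (1,1) occur among the taxa."""
    return len(set(zip(char_a, char_b))) == 4

def count_incompatibilities(matrix):
    """Number of pairwise-incompatible character pairs (rows = taxa)."""
    columns = list(zip(*matrix))
    return sum(incompatible(a, b) for a, b in itertools.combinations(columns, 2))

def permute_matrix(matrix, rng):
    """Shuffle each character's states independently among the taxa, keeping
    the numbers of taxa, characters and states per character unchanged."""
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]

def ptp(matrix, n_perm=999, seed=1):
    """Permutation tail probability: proportion of data sets (real data
    included, by assumption) with as few or fewer incompatibilities than the real data."""
    rng = random.Random(seed)
    observed = count_incompatibilities(matrix)
    as_good = sum(
        count_incompatibilities(permute_matrix(matrix, rng)) <= observed
        for _ in range(n_perm)
    )
    return (as_good + 1) / (n_perm + 1)

# Hypothetical, perfectly hierarchical data: each character marks one group
# of taxa with state 1, and every group is used for two characters.
taxa = "ABCDEFGH"
groups = ["AB", "CD", "EF", "GH", "ABCD", "EFGH"] * 2
matrix = [[int(t in g) for g in groups] for t in taxa]
print(count_incompatibilities(matrix))  # → 0
print(ptp(matrix))
```

The permuted matrices almost never reach zero incompatibilities, so the PTP sits at the smallest value that 999 permutations allow. That is the "significantly non-random" outcome.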

Ciliate SSUrDNA (Min = 430, Max = 927: theoretical minimum and maximum tree lengths)
Real data: 1 MPT, L = 618, CI = 0.696, RI = 0.714, PTP = 0.01, PC-PTP = 0.001 - significantly non-random
[Figure: most parsimonious tree for Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tracheloraphis, Spirostomum, Gruberia, Euplotes and Tetrahymena]
Randomly permuted data: 3 MPTs, L = 792, CI = 0.543, RI = 0.272, PTP = 0.68, PC-PTP = 0.737 - not significantly different from random
[Figure: strict consensus of the most parsimonious trees for the randomly permuted data]

Skewness of Tree Length Distributions

• Skewness of tree length distributions can be used as a measure of data quality in randomization tests

• It is measured with the g1 statistic in PAUP
• Significance cut-offs for data sets of up to eight taxa have been published based on randomly generated data (rather than randomly permuted data)
• PAUP does not perform the more direct randomization test

Skewness of Tree Length Distributions

• Studies with random (and phylogenetically uninformative) data showed that the distribution of tree lengths tends to be normal
• In contrast, phylogenetically informative data is expected to have a strongly skewed distribution, with few shortest trees and few trees nearly as short
[Figure: number of trees against tree length. The distribution is roughly normal for uninformative data and strongly skewed, with a long tail towards the shortest tree, for informative data.]

Skewness - example

[Figure: frequency distribution of tree lengths for the Ciliate SSUrDNA data. REAL DATA: strongly skewed, with a long tail of shorter trees, g1 = -0.951947. RANDOMLY PERMUTED DATA: close to symmetrical, g1 = -0.100478.]
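The g1 values above summarise the shape of a tree-length histogram. As a sketch, this assumes g1 is the moment coefficient of skewness, g1 = m3 / m2^1.5, where mk is the k-th central moment. PAUP's exact computation may differ (for example by a small-sample correction), so treat this as an approximation. The histograms are hypothetical toy data, not the ciliate distributions.

```python
def g1_skewness(histogram):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 for a tree-length
    distribution given as {tree_length: number_of_trees}."""
    n = sum(histogram.values())
    mean = sum(length * count for length, count in histogram.items()) / n

    def moment(k):
        # k-th central moment of the tree-length distribution
        return sum(count * (length - mean) ** k for length, count in histogram.items()) / n

    return moment(3) / moment(2) ** 1.5

# Hypothetical toy distributions.
symmetric = {20: 1, 21: 4, 22: 6, 23: 4, 24: 1}
informative = {20: 1, 21: 3, 22: 10, 23: 30, 24: 40}  # few short trees, long left tail
print(g1_skewness(symmetric))        # → 0.0
print(g1_skewness(informative) < 0)  # → True
```

A long tail towards the shortest trees gives a negative g1, the signature expected of phylogenetically informative data.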

Matrix Randomization Tests - use and limitations
• Can detect very poor data that provides no good basis for phylogenetic inference (throw it away!)
• However, very little structure may be needed to reject the null hypothesis (passing the test ≠ great data)
• Doesn’t indicate where this structure is located (more discerning tests are possible)

• In the skewness test, significance levels for g1 have been determined only for small numbers of taxa, so this test remains of limited use

Assessing Phylogenetic Hypotheses - groups on trees

• Several methods have been proposed that attach numerical values to nodes in trees, intended to provide some measure of the strength of support for each node
• These methods include:
- character resampling methods: the bootstrap and jackknife
- decay analyses
- additional randomization tests

Bootstrapping
• Bootstrapping is a modern statistical technique that uses computer-intensive random resampling of data to determine sampling error or confidence intervals for some estimated parameter

Bootstrapping (non-parametric)
• Characters are resampled with replacement to create many bootstrap replicate data sets
• Each bootstrap replicate data set is analysed (e.g. with parsimony, distance, ML)
• Agreement among the resulting trees is summarized with a majority-rule consensus tree
• The frequencies of occurrence of groups, bootstrap proportions (BPs), are a measure of support for those groups
• Additional information is given in partition tables

Bootstrapping

Original data matrix
        Characters
Taxa    1 2 3 4 5 6 7 8
A       R R Y Y Y Y Y Y
B       R R Y Y Y Y Y Y
C       Y Y Y Y Y R R R
D       Y Y R R R R R R
Outgp   R R R R R R R R

• Randomly resample characters from the original data with replacement to build many bootstrap replicate data sets of the same size as the original, and analyse each replicate data set

Resampled data matrix
        Characters
Taxa    1 2 2 5 5 6 6 8
A       R R R Y Y Y Y Y
B       R R R Y Y Y Y Y
C       Y Y Y Y Y R R R
D       Y Y Y R R R R R
Outgp   R R R R R R R R

• Summarise the results of multiple analyses with a majority-rule consensus tree
• Bootstrap proportions (BPs) are the frequencies with which groups are encountered in analyses of replicate data sets
[Figure: trees from analyses of replicate data sets, with characters mapped onto their branches, and the resulting majority-rule consensus tree for A, B, C, D and Outgroup, with bootstrap proportions of 96% and 66%]

Bootstrapping - an example
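The procedure above can be run end to end on the small original matrix. This is a sketch only. It assumes an exhaustive Fitch-parsimony search, which is feasible only for a handful of taxa and is not how PAUP* searches. It also assumes that tied most parsimonious trees (MPTs) share a replicate's vote equally, which is one common convention. All function names are hypothetical. Setting `delete_fraction` switches the resampling to the jackknife (random deletion of characters), the variant covered under Jackknifing.

```python
import random
from collections import defaultdict

def rooted_trees(taxa):
    """Yield every rooted binary tree (nested 2-tuples) on a list of taxa."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for tree in rooted_trees(taxa[1:]):
        yield from attach(taxa[0], tree)

def attach(leaf, tree):
    """Yield the trees made by attaching leaf to each branch of tree, including above its root."""
    yield (leaf, tree)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in attach(leaf, left):
            yield (new_left, right)
        for new_right in attach(leaf, right):
            yield (left, new_right)

def fitch_length(tree, states):
    """Fitch parsimony steps for one character; states maps taxon -> state."""
    def visit(node):
        if not isinstance(node, tuple):
            return {states[node]}, 0
        (s1, c1), (s2, c2) = visit(node[0]), visit(node[1])
        if s1 & s2:
            return s1 & s2, c1 + c2
        return s1 | s2, c1 + c2 + 1
    return visit(tree)[1]

def clades(tree):
    """Taxon sets below each internal node of a tree."""
    found = []
    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = visit(node[0]) | visit(node[1])
        found.append(members)
        return members
    visit(tree)
    return found

def mp_trees(data, outgroup):
    """Exhaustive search for the most parsimonious ingroup trees, rooted on the outgroup."""
    ingroup = [t for t in data if t != outgroup]
    n_chars = len(data[outgroup])
    scored = []
    for tree in rooted_trees(ingroup):
        length = sum(fitch_length((outgroup, tree), {t: data[t][j] for t in data})
                     for j in range(n_chars))
        scored.append((length, tree))
    best = min(length for length, _ in scored)
    return [tree for length, tree in scored if length == best]

def resampled_support(data, outgroup, n_reps=500, seed=1, delete_fraction=None):
    """Percentage support for ingroup clades over replicate data sets:
    bootstrap (with replacement) by default, jackknife if delete_fraction is set."""
    rng = random.Random(seed)
    n_chars = len(data[outgroup])
    support = defaultdict(float)
    for _ in range(n_reps):
        if delete_fraction is None:
            cols = [rng.randrange(n_chars) for _ in range(n_chars)]
        else:
            cols = rng.sample(range(n_chars), round(n_chars * (1 - delete_fraction)))
        replicate = {t: [row[j] for j in cols] for t, row in data.items()}
        trees = mp_trees(replicate, outgroup)
        for tree in trees:  # tied MPTs share the replicate's vote equally
            for clade in clades(tree):
                support[clade] += 1 / len(trees)
    n_ingroup = len(data) - 1
    return {c: 100 * s / n_reps for c, s in support.items() if len(c) < n_ingroup}

# The original data matrix shown above (R/Y states).
data = {"A": "RRYYYYYY", "B": "RRYYYYYY", "C": "YYYYYRRR",
        "D": "YYRRRRRR", "Outgp": "RRRRRRRR"}
bps = resampled_support(data, "Outgp")
for clade, bp in sorted(bps.items(), key=lambda kv: -kv[1]):
    print("".join(sorted(clade)), round(bp, 1))
```

Characters 6-8 support AB, 3-5 support ABC and 1-2 conflict by supporting CD. So AB should receive the highest BP, ABC a majority and CD a minority. Exact values depend on the random seed.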

Ciliate SSUrDNA - parsimony bootstrap
Taxa: Ochromonas (1), Symbiodinium (2), Prorocentrum (3), Loxodes (4), Tracheloraphis (5), Spirostomum (6), Gruberia (7), Euplotes (8), Tetrahymena (9)

Partition Table
123456789   Freq
---------   ------
.**......   100.00
...**....   100.00
.....**..   100.00
...****..   100.00
...******    95.50
.......**    84.33
...****.*    11.83
...*****.     3.83
.*******.     2.50
.**....*.     1.00
.**.....*     1.00

[Figure: majority-rule consensus tree for the nine taxa, with bootstrap proportions of 100, 100, 100, 100, 96 and 84 on its clades]

Bootstrapping - random data

Randomly permuted data - parsimony bootstrap

Partition Table
123456789   Freq
---------   -----
.*****.**   71.17
..**.....   58.87
....*..*.   26.43
.*......*   25.67
.***.*.**   23.83
...*...*.   21.00
.*..**.**   18.50
.....*..*   16.00
.*...*..*   15.67
.***....*   13.17
....**.**   12.67
....**.*.   12.00
..*...*..   12.00
.**..*..*   11.00
.*...*...   10.80
.....*.**   10.50
.***.....   10.00

[Figure: majority-rule consensus tree (with minority components) for the randomly permuted data, labelled with bootstrap proportions from the partition table]

Bootstrap - interpretation
• Bootstrapping was introduced as a way of establishing confidence intervals for phylogenies
• This interpretation of bootstrap proportions (BPs) depends on the assumption that the original data is a random sample from a much larger set of independent and identically distributed data
• However, several things complicate this interpretation:
- Many systematists consider these assumptions unreasonable, making any statistical interpretation of BPs invalid
- Some theoretical work indicates that BPs are very conservative and may underestimate the confidence that can be placed in groups; the problem increases with the number of taxa
- BPs can be high for incongruent relationships in separate analyses, and can therefore be misleading (misleading data -> misleading BPs)
- With parsimony, BPs may be highly affected by the inclusion or exclusion of a few characters

Bootstrap - interpretation

• Bootstrapping is a very valuable and widely used technique (it is demanded by some journals), but requires a pragmatic interpretation:
- BPs depend on two aspects of the support for a group: the number of characters supporting the group and the level of support for incongruent groups
- BPs thus provide a reasonable index of the relative support for groups provided by a set of data

Bootstrap - interpretation
• High BPs (e.g. >85%) are indicative of strong signal in the data
• Provided we have no evidence of strong misleading signal (e.g. base composition biases, great differences in branch lengths), high BPs are likely to reflect strong phylogenetic signal
• Low BPs need not mean the relationship is false! It is just poorly supported
• Bootstrapping can be viewed as a way of exploring the robustness of phylogenetic inferences to perturbations in the balance of supporting and conflicting evidence for groups

Jackknifing
• Jackknifing is very similar to bootstrapping and differs only in the character resampling strategy
• Some proportion of characters (e.g. 50%) are randomly selected and deleted
• Replicate data sets are analysed and the results summarised with a majority-rule consensus tree
• Jackknifing and bootstrapping tend to produce broadly similar results and have similar interpretations

Decay analysis
• In parsimony analysis, one way to assess support for a group is to see whether the group also occurs in slightly less parsimonious trees
• The length difference between the shortest trees that include the group and the shortest trees that exclude it (the extra steps required to overturn a group) is the decay index or Bremer support
• Total support (for a tree) is the sum of all clade decay indices - this has been advocated as a measure for an as yet unavailable matrix randomization test

Decay analysis - example
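For a toy example, the definition of the decay index can be applied directly. The sketch below reuses the five-taxon matrix from the bootstrapping slide. It assumes an exhaustive Fitch-parsimony search instead of PAUP* or Autodecay, so it is feasible only for a handful of taxa. Taking the shortest tree that lacks a clade plays the role of a reverse topological constraint. The function names are hypothetical.

```python
def rooted_trees(taxa):
    """Yield every rooted binary tree (nested 2-tuples) on a list of taxa."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for tree in rooted_trees(taxa[1:]):
        yield from attach(taxa[0], tree)

def attach(leaf, tree):
    """Yield the trees made by attaching leaf to each branch of tree, including above its root."""
    yield (leaf, tree)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in attach(leaf, left):
            yield (new_left, right)
        for new_right in attach(leaf, right):
            yield (left, new_right)

def fitch_length(tree, states):
    """Fitch parsimony steps for one character; states maps taxon -> state."""
    def visit(node):
        if not isinstance(node, tuple):
            return {states[node]}, 0
        (s1, c1), (s2, c2) = visit(node[0]), visit(node[1])
        if s1 & s2:
            return s1 & s2, c1 + c2
        return s1 | s2, c1 + c2 + 1
    return visit(tree)[1]

def clades(tree):
    """Taxon sets below each internal node of a tree."""
    found = []
    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        members = visit(node[0]) | visit(node[1])
        found.append(members)
        return members
    visit(tree)
    return found

def tree_length(tree, data, outgroup):
    """Total Fitch length of an ingroup tree rooted on the outgroup."""
    full = (outgroup, tree)
    return sum(fitch_length(full, {t: row[j] for t, row in data.items()})
               for j in range(len(data[outgroup])))

def decay_indices(data, outgroup):
    """Decay index of each clade found in all MPTs: length of the shortest tree
    lacking the clade minus the most parsimonious length."""
    ingroup = [t for t in data if t != outgroup]
    everything = frozenset(ingroup)
    scored = []
    for tree in rooted_trees(ingroup):
        groups = {c for c in clades(tree) if c != everything}
        scored.append((tree_length(tree, data, outgroup), groups))
    best = min(length for length, _ in scored)
    in_all_mpts = set.intersection(*(g for length, g in scored if length == best))
    return {clade: min(length for length, g in scored if clade not in g) - best
            for clade in in_all_mpts}

data = {"A": "RRYYYYYY", "B": "RRYYYYYY", "C": "YYYYYRRR",
        "D": "YYRRRRRR", "Outgp": "RRRRRRRR"}
decay = decay_indices(data, "Outgp")
for clade in sorted(decay, key=sorted):
    print("".join(sorted(clade)), f"+{decay[clade]}")
print("total support:", sum(decay.values()))
# AB +3
# ABC +1
# total support: 4
```

Overturning ABC costs one extra step (the CD characters are then satisfied instead). Overturning AB costs three (the three AB characters). This matches the pattern of a higher BP for AB than for ABC.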

[Figure: most parsimonious trees labelled with decay indices for the Ciliate SSUrDNA data and for the randomly permuted data (Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tracheloraphis, Spirostomum, Gruberia, Euplotes, Tetrahymena)]

Decay analyses - in practice
• Decay indices for each clade can be determined by:
- saving increasingly less parsimonious trees and producing the corresponding strict component consensus trees until the consensus is completely unresolved
- analyses using reverse topological constraints to determine the shortest trees that lack each clade
- the Autodecay program (in conjunction with PAUP)

Decay indices - interpretation
• Generally, the higher the decay index the better the relative support for a group
• Like BPs, decay indices may be misleading if the data is misleading
• Unlike BPs, decay indices are not scaled (0-100), and it is less clear what is an acceptable decay index
• The magnitudes of decay indices and BPs are generally correlated (i.e. they tend to agree)
• Only groups found in all most parsimonious trees have decay indices > zero

Decay indices - extensions

• Traditional decay analysis is the determination of decay indices of clades
• Double decay analysis is the determination of decay indices for all relationships - gives a more comprehensive but potentially very complicated summary of support
• Analogues of parsimony decay indices are possible for any optimality criterion (objective function)

Types of Cladistic Relationships
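The relationships contained in a rooted tree can be listed mechanically: components (clades), rooted triplets and four-leaf subtrees. The sketch below uses a hypothetical five-leaf tree, (((A,B),C),(D,E)), and illustrative function names. It represents a tree as nested Python tuples. A rooted triplet "xy|z" is read as "x and y are more closely related to each other than either is to z".

```python
from itertools import combinations

def leaves(tree):
    """Leaf names of a rooted tree written as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def clusters(tree):
    """Leaf sets below every internal node, the root included."""
    if not isinstance(tree, tuple):
        return []
    return [frozenset(leaves(tree))] + clusters(tree[0]) + clusters(tree[1])

def components(tree):
    """Components (clades): clusters other than the trivial set of all leaves."""
    everything = frozenset(leaves(tree))
    return {c for c in clusters(tree) if c != everything}

def rooted_triplets(tree):
    """Rooted triplets 'xy|z': x and y share a cluster that excludes z."""
    found = set()
    for trio in combinations(sorted(leaves(tree)), 3):
        for z in trio:
            x, y = (t for t in trio if t != z)
            if any(x in c and y in c and z not in c for c in clusters(tree)):
                found.add(f"{x}{y}|{z}")
    return found

def restrict(tree, keep):
    """Subtree induced by the leaves in keep, with unary nodes suppressed."""
    if not isinstance(tree, tuple):
        return tree if tree in keep else None
    parts = [p for p in (restrict(tree[0], keep), restrict(tree[1], keep)) if p is not None]
    if not parts:
        return None
    return parts[0] if len(parts) == 1 else tuple(parts)

def four_leaf_subtrees(tree):
    """One induced subtree for every set of four leaves."""
    return [restrict(tree, set(q)) for q in combinations(sorted(leaves(tree)), 4)]

tree = ((("A", "B"), "C"), ("D", "E"))
print(sorted("".join(sorted(c)) for c in components(tree)))  # → ['AB', 'ABC', 'DE']
print(sorted(rooted_triplets(tree)))
print(four_leaf_subtrees(tree)[0])  # → ((('A', 'B'), 'C'), 'D')
```

A fully resolved five-leaf tree has 3 non-trivial components, 10 rooted triplets (one per set of three leaves) and 5 four-leaf subtrees. That is why statements about all relationships, as in double decay analysis, multiply quickly.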

[Figure: a five-leaf tree on taxa A-E and the types of cladistic relationship it contains: components/clades (5-taxon statements), rooted triplets (3-taxon statements) and four-leaf subtrees (4-taxon statements), panels (a)-(d)]

PTP tests of groups
• A number of randomization tests have been proposed for evaluating particular groups, rather than entire data matrices, by testing null hypotheses about the level of support they receive from the data
• Randomisation can be of the data or of the group
• These methods have not become widely used, both because they are not readily performed and because their properties are still under investigation
• Topology-dependent PTP tests are included in PAUP* but have serious problems (they don’t work!)

Comparing competing phylogenetic hypotheses - tests of two trees

• Particularly useful techniques are those designed to allow evaluation of alternative phylogenetic hypotheses
• Several such tests allow us to determine whether one tree is statistically significantly worse than another: the winning sites test, Templeton test, Kishino-Hasegawa test, Shimodaira-Hasegawa test and parametric bootstrapping

Tests of two trees
• All these tests are of the null hypothesis that the differences between two trees (A and B) are no greater than expected from sampling error
• The simplest, the ‘winning sites’ test, sums the number of sites supporting tree A over tree B and vice versa (those having fewer steps on, and better fit to, one of the trees)
• Under the null hypothesis, characters are equally likely to support tree A or tree B, and a binomial distribution gives the probability of the observed difference in numbers of winning sites

The Templeton test
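The winning sites calculation described above is an exact two-tailed binomial (sign) test. This sketch assumes the per-character step counts on the two trees are already known (hypothetical numbers below) and ignores characters that fit both trees equally well. It does not correct for how the trees were chosen.

```python
from math import comb

def winning_sites_test(steps_a, steps_b):
    """Exact two-tailed binomial test of winning sites. steps_a[i] and
    steps_b[i] are the steps character i needs on trees A and B.
    Returns (wins for A, wins for B, p)."""
    wins_a = sum(a < b for a, b in zip(steps_a, steps_b))
    wins_b = sum(b < a for a, b in zip(steps_a, steps_b))
    n, k = wins_a + wins_b, min(wins_a, wins_b)
    # Probability of a split at least this uneven when each winning site is
    # equally likely (p = 0.5) to favour either tree, doubled for two tails.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)

# Hypothetical per-character step counts for 15 characters on two trees.
steps_a = [1] * 10 + [2] * 2 + [1] * 3
steps_b = [2] * 10 + [1] * 2 + [1] * 3
wins_a, wins_b, p = winning_sites_test(steps_a, steps_b)
print(wins_a, wins_b, round(p, 4))  # → 10 2 0.0386
```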

• Templeton’s test is a non-parametric Wilcoxon signed ranks test of the differences in fits of characters to two trees

• It is like the ‘winning sites’ test but also takes into account the magnitudes of differences in the support of characters for the two trees

Templeton’s test - an example
Recent studies of the relationships of turtles using morphological data have produced very different results, with turtles grouping either within the parareptiles (H1) or within the diapsids (H2), the result depending on the morphologist
[Figure: trees 1 (H1) and 2 (H2) for Claudiosaurus, Archosauromorpha, Diadectomorpha, Lepidosauriformes, Paleothyris, Seymouriadae, Placodus, Younginiformes, Parareptilia, Eosauropterygia, Araeoscelidia, Synapsida and Captorhinidae]
This suggests there may be:
- problems with the data
- special problems with turtles
- weak support for turtle relationships

Parsimony analysis of the most recent data favoured H2. However, analyses constrained by H1 produced trees that required only 3 extra steps (<1% of tree length)

The Templeton test was used to evaluate the trees and showed that the slightly longer H1 tree found in the constrained analyses was not significantly worse than the unconstrained H2 tree
The morphological data do not allow a choice between H1 and H2

Kishino-Hasegawa test
• The Kishino-Hasegawa test is similar in using differences in the support provided by individual sites for two trees to determine whether the overall differences between the trees are significantly greater than expected from random sampling error
• It is a parametric test that depends on the assumptions that the characters are independent and identically distributed (the same assumptions underlying the statistical interpretation of bootstrapping)
• It can be used with parsimony and maximum likelihood - implemented in PHYLIP and PAUP*

Kishino-Hasegawa test
[Figure: distribution of step/likelihood differences at each site, with sites favouring tree A on one side, sites favouring tree B on the other, and an expected mean of 0]
• If the difference between trees (tree lengths or likelihoods) is attributable to sampling error, then characters will randomly support tree A or B and the total difference will be close to zero
• Under the null hypothesis the mean of the differences in parsimony steps or likelihoods for each site is expected to be zero, and the distribution normal
• From the observed differences we calculate a standard deviation
• The observed difference is significantly greater than zero if it is greater than 1.96 standard deviations
• This allows us to reject the null hypothesis and declare the sub-optimal tree significantly worse than the optimal tree (p < 0.05)

Kishino-Hasegawa test - an example
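Both site-wise tests start from the same per-site differences, so they can be sketched side by side. For the Kishino-Hasegawa test, the SD of the total difference is taken as the square root of n times the sample variance of the site differences. The Templeton test is sketched as a Wilcoxon signed-ranks test that drops zero differences, gives tied magnitudes their average rank and uses the normal approximation without a tie correction. Published implementations may use exact tables or corrections. The site differences below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, variance

def kishino_hasegawa(diffs):
    """diffs[i] = steps on tree B minus steps on tree A at site i (positive
    values favour tree A). Returns total difference, its SD, z and two-tailed p."""
    n = len(diffs)
    total = sum(diffs)
    sd_total = sqrt(n * variance(diffs))  # n times the sample variance of site differences
    z = total / sd_total
    return total, sd_total, z, 2 * (1 - NormalDist().cdf(abs(z)))

def templeton(diffs):
    """Wilcoxon signed-ranks test on the same site differences.
    Returns W+ (rank sum of sites favouring A), z and two-tailed p."""
    nonzero = sorted((d for d in diffs if d != 0), key=abs)
    n = len(nonzero)
    rank_of, i = {}, 0
    while i < n:
        j = i
        while j < n and abs(nonzero[j]) == abs(nonzero[i]):
            j += 1
        rank_of[abs(nonzero[i])] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    w_plus = sum(rank_of[abs(d)] for d in nonzero if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return w_plus, z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: 10 sites favour A by one step, 2 favour B by one step, 3 tie.
diffs = [1] * 10 + [-1] * 2 + [0] * 3
total, sd, z, p_kh = kishino_hasegawa(diffs)
w_plus, z_w, p_t = templeton(diffs)
print(total, round(sd, 2), round(z, 2), round(p_kh, 4))
print(w_plus, round(z_w, 2), round(p_t, 3))
```

Here the total difference exceeds 1.96 standard deviations, so tree B would be declared significantly worse at p < 0.05. This holds only if the trees were chosen a priori, as discussed under the Shimodaira-Hasegawa test.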

Ciliate SSUrDNA - maximum likelihood tree
[Figure: maximum likelihood tree for Ochromonas, Symbiodinium, Prorocentrum, Sarcocystis, Theileria and a range of ciliates, with the anaerobic ciliates with hydrogenosomes highlighted]
• Parsimonious character optimization of the presence and absence of hydrogenosomes suggests four separate origins of hydrogenosomes within the ciliates
Questions:
- How reliable is this result?
- In particular, how well supported is the idea of multiple origins?
- How many origins can we confidently infer?

Kishino-Hasegawa test - an example
Ciliate SSUrDNA data - most parsimonious tree
[Figure: most parsimonious tree for the same taxa, labelled with pairs of bootstrap proportions (parsimony-distance, e.g. 96-100) and decay indices]
• Parsimony analysis yields a very similar tree - in particular, parsimonious character optimization indicates four separate origins of hydrogenosomes within ciliates
• BPs for parsimony and distance analyses and decay indices indicate the relative support for clades
• Differences between the ML, MP and distance trees generally reflect the less well supported relationships

Kishino-Hasegawa test - example

[Figure: two examples of the topological constraint trees, each forcing hydrogenosomal ciliate lineages together]
• Parsimony analyses with topological constraints were used to find the shortest trees that forced hydrogenosomal ciliate lineages together and thereby reduced the number of separate origins of hydrogenosomes
• Each of the constrained parsimony trees was compared to the ML tree, and the Kishino-Hasegawa test was used to determine which of these trees were significantly worse than the ML tree

Kishino-Hasegawa test

Test summary and results - origins of ciliate hydrogenosomes (simplified)
• Constrained analyses were used to find most parsimonious trees with fewer than four separate origins of hydrogenosomes, each tested against the ML tree

No. origins   Constraint tree   Extra steps   Difference and SD   Significantly worse?
4             ML                +10           -                   -
4             MP                -             -13 ± 18            No
3             (cp,pt)           +13           -21 ± 22            No
3             (cp,rc)           +113          -337 ± 40           Yes
3             (cp,m)            +47           -147 ± 36           Yes
3             (pt,rc)           +96           -279 ± 38           Yes
3             (pt,m)            +22           -68 ± 29            Yes
3             (rc,m)            +63           -190 ± 34           Yes
2             (pt,cp,rc)        +123          -432 ± 40           Yes
2             (pt,rc,m)         +100          -353 ± 43           Yes
2             (pt,cp,m)         +40           -140 ± 37           Yes
2             (cp,rc,m)         +124          -466 ± 49           Yes
2             (pt,cp)(rc,m)     +77           -222 ± 39           Yes
2             (pt,m)(rc,cp)     +131          -442 ± 48           Yes
2             (pt,rc)(cp,m)     +140          -414 ± 50           Yes
1             (pt,cp,m,rc)      +131          -515 ± 49           Yes

• Trees with 2 or 1 origin are all significantly worse than the ML tree
• We can confidently conclude that there have been at least three separate origins of hydrogenosomes within the sampled ciliates

Shimodaira-Hasegawa Test
• To be statistically valid, the Kishino-Hasegawa test should be of trees that are selected a priori
• However, most applications have used trees selected a posteriori on the basis of the phylogenetic analysis
• Where we test the ‘best’ tree against some other tree, the KH test will be biased towards rejection of the null hypothesis
• The SH test is a similar but more statistically correct technique in these circumstances

Reliability of Phylogenetic Methods
• Phylogenetic methods (e.g. parsimony, distance, ML) can also be evaluated in terms of their general performance, particularly their:

- consistency - whether they approach the truth with more data
- efficiency - how quickly (with how much data) they do so
- robustness - how sensitive they are to violations of assumptions

• Studies of these properties can be analytical or by simulation

Reliability of Phylogenetic Methods

• There have been many arguments that ML methods are best because they have desirable statistical properties, such as consistency
• However, ML does not always have these properties:
- if the model is wrong or inadequate
- the properties have not yet been demonstrated for complex inference problems such as phylogenetic trees

Reliability of Phylogenetic Methods

• “Simulations show that ML methods generally outperform distance and parsimony methods over a broad range of realistic conditions” Whelan et al. 2001 Trends in Genetics 17:262-272

• Most simulations are very (unrealistically) simple:
- few taxa (typically just four)
- few parameters (standard models - JC, K2P etc.)

Reliability of Phylogenetic Methods

• Simulations with four taxa have shown:
- Model-based methods (distance and maximum likelihood) perform well when the model is accurate (not surprising!)
- Violations of assumptions can lead to inconsistency for all methods (a Felsenstein zone) when branch lengths or rates are highly unequal
- Maximum likelihood methods are quite robust to violations of model assumptions
- Weighted parsimony can perform better than standard parsimony (has a smaller Felsenstein zone) in some cases

Reliability of Phylogenetic Methods
• However:
- Generalising from four-taxon simulations may be dangerous, as conclusions may not hold for more complex cases
- A few large-scale simulations (many taxa) have suggested that parsimony can be very accurate and efficient
- Most methods are accurate in correctly recovering known phylogenies produced in laboratory studies
• More study of methods, using more realistic simulations, is needed to help in the choice of method