<<

Parsimony (and a little bit of the history of )

Paul O. Lewis ~ Phylogenetics, Spring 2020 1950: Willi Hennig published Grundzüge einer Theorie der Phylogenetischen Systematik, which was later (1966) fly frog human translated into English as Phylogenetic Systematics. - -

Emphasized that phylogenetic groups should be defined on vertebrae the basis of shared, derived characters (synapomorphies), not on the basis of shared, ancestral characters (symplesiomorphies) fern lily moss

Although widely considered to be the father of cladistics, Hennig never described an algorithm based on the parsimony criterion, nor did he suggest what to do in the case of character conflict. ⑤vascular tissues

Hennig, W. 1966. Phylogenetic systematics. University of Illinois Press, Urbana Photo: Frontispiece of Wiley, E. O. 1981. Phylogenetics: the theory and practice of Paul O. Lewis ~ Phylogenetics, Spring 2020 phylogenetic systematics. John Wiley & Sons, New York. Photo by George W. Byers, Univ. Kansas Changing terminology Oglabrous Otomentose glabrous tomentose

tomentose tomentose

Transformation series: Character: leaf pubescence leaf pubescence

Characters: Character states: glabrous glabrous IEtomentose # tomentose

Photo (tomentose): http://www.sbs.utexas.edu/bio406d/images/pics/eup/croton_capitatus.htm Paul O. Lewis ~ Phylogenetics, Spring 2020 Photo (glabrous): http://www.lhccrems.nsw.gov.au/CPR/CPR/glossary/glabrous.htm First computed phylogeny

1957: Charles Michener and o s Robert Sokal published the first phylogeny estimated using a computer (but not using parsimony).

Robert Sokal, 1959

“...the computation of a 97 species X 97 species matrix of correlation coefficients presents serious technical difficulties. An operation of this magnitude cannot be reasonably undertaken without punched card or electronic computing equipment.”- - Michener, C. D., and R. R. Sokal. 1957. A Quantitative Approach to a Problem in Classification. 11:130-162 Paul O. Lewis ~ Phylogenetics, Spring 2020 Photo: Fig. 10.1, p. 124, in Felsenstein, J. 2004. Inferring phylogenies. Sinauer, Sunderland, MA. Photo courtesy of Robert Sokal. First parsimony analysis 1964: Anthony Edwards and Luca Cavalli-Sforza first used parsimony and likelihood for human blood group frequency data ×. “...we do not anticipate any =major difficulties in the corresponding computer programmes, although these €are not yet working.” This quote refers to the maximum likelihood approach: it would not be until 1973 that Joe Felsenstein identified the - I central problem (trying to estimate too many parameters) and produced a workable maximum likelihood method for continuous data.

Edwards, A. W. F., and L. L. Cavalli-Sforza. 1964. Reconstruction of evolutionary trees. pp. 67-76 in Phenetic and phylogenetic classification, ed. V. H. Heywood and J. McNeill. Systematics Association Publ. No. 6, London. Photo: Fig. 10.3, p. 126, in Felsenstein, J. 2004. Inferring phylogenies. Sinauer, Sunderland, MA. Photo by Joe Anthony Edwards, 1963 Felsenstein. Paul O. Lewis ~ Phylogenetics, Spring 2020 Irreversible Parsimony

1965: Joseph FCamin and Robert of fossil horses (Fig. 4 in Camin & Sokal, 1965) Sokal published the first parsimony - - method method using discrete character data Character states ordered from most ancestral to most derived, and reversals were not allowed

“Comparison by Camin of these various schemes with the “truth” led him to the observation that those trees which most closely resembled the true cladistics invariably required for their construction the least number of postulated evolutionary ⇐steps for the characters studied. Subsequently we examined the possibility of reconstructing cladistics by the principle of evolutionary O parsimony.” Joseph Camin, mid-1970s Some representative “Caminalcules”

Camin, J. H., and R. R. Sokal. 1965. A method for deducing branching sequences in phylogeny. Evolution 19:311-326 Paul O. Lewis ~ Phylogenetics, Spring 2020 Photo: Fig. 10.6, p. 129, in Felsenstein, J. 2004. Inferring phylogenies. Sinauer, Sunderland, MA. Photo courtesy of the Snow Entomological Division, Natural History Museum, University of Kansas. First molecular phylogeny

1966: Richard Eck and Margaret Dayhoff published the first molecular phylogeny using a numerical algorithm “In principle, for a family of closely related proteins, we seek the topology that has the minimum number of mutations.”

Eck and Dayhoff method: • characters unordered (any amino acid can change to any other amino acid with 1 step) • starting tree by stepwise addition (parsimony score approximate) Margaret Dayhoff •= branch swapping (not explicitly defined)

Eck, R. V., and M. O. Dayhoff. 1966. Atlas of protein sequence and structure. National Biomedical Research Foundation. Silver Spring, Maryland. Paul O. Lewis ~ Phylogenetics, Spring 2020 Photo: http://www.dayhoff.cc/MODayhoff_Contrib_Science.html Wagner’s Ground

State 0 = presumed ancestral state, State 1 = presumed derived state Plan Divergence - ToCharacter Neocteniza Actinopus Missulena Warren “Herb” Wagner 1 1 0 0 is well-known for his 2 1 0 0 work on ferns as well 3 1 0 0 as his “ground plan divergence” method for 4 1 0 0 constructing 5 1 0 0 phylogenies 6 1 0 0 7 1 0 0 8 0 1 1 9 0 1 1 10 0 1 1 11 0 1 1 ÷ 12 0 1 1 13 0 1 0 14 0 1 0 15 0 1 1 . http://um2017.org/faculty-history/faculty/warren- ÷. 16 0 1 0 * 17 0 1 0 Wagner ground plan divergence diagram (Fig. 18 0 0 1 5.30, p. 177, in Wiley, 1981) 19 0 0 1 Photo of Wagner: Photo of Wagner: %E2%80%9Cherb%E2%80%9D-wagner total O7 O10 O8

Paul O. Lewis ~ Phylogenetics, Spring 2020 Ordered (Wagner) parsimony - - 1969: Arnold Kluge and Steve Farris produce software for computing Wagner parsimony trees (and introduce CI)

Arnold Kluge Steve. Farris

Kluge, A. G., and J. S. Farris. 1969. Quantitative phyletics and the evolution of anurans. Systematic Zoology 18:1-32 Phylogeny of anuran (frogs and toads) families Wiley, E. O. 1981. Phylogenetics: the theory and practice of phylogenetic (Fig. 5, p. 14, in Kluge and Farris, 1969) systematics. John Wiley & Sons, New York. Photo of Farris: http://systbio.org.whsites.net/?q=taxonomy/term/2&from=30 Photo of Kluge: http://um2017.org/faculty-history/faculty/arnold-g-kluge

Paul O. Lewis ~ Phylogenetics, Spring 2020 Ordered states

whorled alternate 2 steps

00-0opposite

1 step

1 step Paul O. Lewis ~ Phylogenetics, Spring 2020 10 Unordered (Fitch)- parsimony -

1971: Walter Fitch described his algorithm for computing tree length for parsimony with four unordered states (e.g., DNA or RNA data)

Nearly all parsimony analyses of nucleotide data have used this Walter Fitch, 1975 algorithm

Fitch, W. 1971. Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology. Systematic Zoology 20:406-416 Paul O. Lewis ~ Phylogenetics, Spring 2020 Photo: Fig. 10.7, p. 131, in Felsenstein, J. 2004. Inferring phylogenies. Sinauer, Sunderland, MA. Photo courtesy of Walter Fitch. . Sequence Data parsimonyinformahve parsimonffrmatie Taxa Characters# (=sites in this case) u t E Agave GGGCTGGGC-CGGCCGGTCCGC-CTCT... Bougainvillia GGGGTGGGT-CGACCGGTCCGC-CTTG... Ceratophyllum GGGTTGGGC-CGGTCGGTCOGTCACTCT... Digitalis GGGTTGGGT-CGGCCGGTCCGC-CTTT... Epilobium GGGTTGGGT-CGACCGGTCCGC-CTTA... Fagus GGGTTGGGC-AGAGCGGTCCGC-CCCT... Galax GGGTTGG?C-CG?CCGGTCCGC-CTAG... l A- Constant variable site sites

constant site parsimony informative site

Paul O. Lewis ~ Phylogenetics, Spring 2020 variable site parsimony uninformative site Discrete Morphology

Taxa Characters P._fimbriata 000000001101010110?01... P._robusta 000000001101010110001... P._articulata 100111010110000001100... P._parksii 000111012110000001000... P._americana 100111010111100001000... P._myriophylla 100111010111000001000... P._macrophylla 110111012100000000000... P._polygama 110111013100100000000... P._gracilis 101111013000001000110... P._ciliata 001110112000001000110... P._basiramia 001110112000001000110...

binary (2-state) site

Paul O. Lewis ~ Phylogenetics, Spring 2020 multistate site Parsimony

123456789... Taxon1 CGACCAGGT... Taxon2 CGACCAGGT... Taxon3 CGGTCCGGT... Taxon4 CGGCCTGGT...

Same tree but with data One of the from site 6 inserted three possible in place of taxon Taxon1 unrooted Taxon3 A C names trees

8Taxon2 Taxon4 A T Paul O. Lewis ~ Phylogenetics, Spring 2020 ) "Standard" Parsimony

A C A C A C A C

A A A C A G A T :iIA T A T A . T A T A C A C A C A C

[C A 'C C 'C G 'C T A 5 T A 3 T A 5 T A 4 T

A C A C A C A C

GG A GG C GG G GG T

A T A T A T A T

A C A C A C A C

TT A TT C TT G TT T A T A T A T A T Paul O. Lewis ~ Phylogenetics, Spring 2020 Important things to note about that last slide: • Two (2) steps was the minimum – no way to explain the observed data with just 1 evolutionary change • More than one way to assign ancestral character states to get 2 steps – one interior node must have A but the other interior node can have anything except G • Enumerating all possible combinations of ancestral states is not the most efficient way to determine the number of steps – more on this later

Paul O. Lewis ~ Phylogenetics, Spring 2020 Parsimony Steps

d Taxon1 Taxon3 123456789... ' Taxon1 CGACCAGGT... 4 µA 6 Taxon2 CGACCAGGT... 36 • Taxon3 CGGTCCGGT... Taxon2 Taxon4 Taxon4 CGGCCTGGT... Steps ______...001 I 02008 Let's call this tree 1: (1,2,(3,4))

Tree 1's length for first 9 sites = ⑧? 44

Paul O. Lewis ~ Phylogenetics, Spring 2020 Parsimony Steps

123456789... Taxon1 Taxon2 Taxon1 CGACCAGGT... Taxon2 CGACCAGGT... < Taxon3 CGGTCCGGT... 4=3 3% Taxon4 CGGCCTGGT... → 6 Taxon3 Taxon4 Steps ______...002A 02000 Tree 2: (1,3,(2,4))

Tree 2's length for first 9 sites = ?• 5

Paul O. Lewis ~ Phylogenetics, Spring 2020 Parsimony Steps

123456789... Taxon1 Taxon2 Taxon1 CGACCAGGT...

Taxon2 CGACCAGGT... 3 4 - r Taxon3 CGGTCCGGT... ' G 643 Taxon4 CGGCCTGGT... Taxon4 Taxon3 Steps 002102000______... Tree 3: (1,4,(2,3))

Tree 3's length for first 9 sites = •? 5

Paul O. Lewis ~ Phylogenetics, Spring 2020 Parsimony (using only 9 sites)

Taxon1 Taxon3 Taxon1 Taxon2 Taxon1 Taxon2

Taxon2 Taxon4 Taxon3 Taxon4 Taxon4 Taxon3

4 steps 5 steps 5 steps

most This is the simplest explanation of the data for parsimonious the first 9 sites according to the parsimony criterion.

Choosing one of the other two trees requires additional (ad hoc) justification. If Paul O. Lewis ~ Phylogenetics, Spring 2020 Occam 's Ockham's Razor - "Essentia non sunt multiplicanda praeter necessitatem" (Entities should not be multiplied unnecessarily) - — William of Ockham, 14th. century

In science, this is the principle of parsimony: all other things equal, go with the simplest, least complicated hypothesis

In phylogenetics, it has become the- parsimony criterion: all other things equal, go with the tree that requires the fewest inferred character state changes

Paul O. Lewis ~ Phylogenetics, Spring 2020 http://www-groups.dcs.st-and.ac.uk/~history/Quotations/Ockham.html Counting steps with a minimum of effort

T C C A A G I

* I + I EAC} A ,G } * { •

{ AS 4 steps { A ,C } ti

{ A } t ' Paul O. Lewis ~ Phylogenetics, Spring 2020 ,C,T transitions Parsimony variants ⇐ R : A a

C G A A A G Y : c ⇐ T

- g R trainer ins A→G, ④ A es c

TR R A ⇒ T ①A→lG G ⇐ c GGT

' C→A A- = G C - T

transversions more reliable transversion parsimony Paul O. Lewis ~ Phylogenetics, Spring 2020 →format equate="A=G C=T" ← Transversion Parsimony

Paul O. Lewis ~ Phylogenetics, Spring 2020 History of Parsimony

1975: David Sankoff describes an algorithm for determining tree length for generalized parsimony

David Sankoff

Paul O. Lewis ~ Phylogenetics, Spring 2020 Sankoff, D. 1975. Minimal Mutation Trees of Sequences. SIAM Journal on Applied Mathematics 28:35-42. Photo: http://www.crm.umontreal.ca/math2000-1/pub/biomathematique.html Step Matrices

To A C G T A 0 1 1 1 C 1 0 1 1 From G 1 1 0 1 T 1 1 1 0

Step matrix for Fitch parsimony

Paul O. Lewis ~ Phylogenetics, Spring 2020 Step Matrices

To A C G T A 0 C 0 From G 0 T § : 0

Step matrix for transversion parsimony

Paul O. Lewis ~ Phylogenetics, Spring 2020 Step Matrices

To A C G T A 0 ⑤ i ⑤ C 5 0 5 I From G I 5 0 5

T 5 I 5 0

Step matrix for analysis in which all changes are allowed, but transversions are weighted 5 times more than transitions

Paul O. Lewis ~ Phylogenetics, Spring 2020 Generalized Parsimony

Paul O. Lewis ~ Phylogenetics, Spring 2020 Terminology of cladistics other derived shared state nonshared red Green plants state denied derived alga algae ✓ apomorphy, synapomorphy, autapomorphy g- -- - - 'th" asfgggtmsahaiees.nu state

plesiomorphy, symplesiomorphy - Chl . - b - minimum → more than µ number of steps required

- - - I 1- - min K chlorophyll K - no . states steps a consistency index: CI = m/s

min . # , m= step # l l O s = actual steps o c retention index: RI = (g-s)/(g-m) ' -

' k=2 £031 . . 17 pY' CIE 12 ÷÷÷±÷Farris, J. S. 1989. The retention index and the rescaled consistency index. Cladistics 5: 417-419. homoplasy 3 - 2 " ⇒ → =g÷=

" = ° I - fates yes §

= - I z m = K

s = 2

g =3