
Phylogeny

Codon models
• Last lecture: poor man's way of calculating dN/dS (Ka/Ks)
• Tabulate synonymous and non-synonymous substitutions
• Normalize by the number of possible substitutions of each type
• Transform to a genetic distance, K_JC or K_K2P
• In reality we use a codon model
• Amino acid substitution rates meet nucleotide models
• The unit is the codon (nucleotide triplet)

Codon model parameterization
Stop codons are not allowed, reducing the matrix from 64x64 to 61x61.
The entire codon matrix can be parameterized using:
• κ (kappa), the transition/transversion ratio
• ω (omega), the dN/dS ratio; optimizing this parameter gives an estimate of the strength of selection
• πj, the equilibrium frequency of codon j
(Goldman and Yang, MBE 1994)

Empirical codon substitution matrix
Observations:
• Instantaneous rates of double nucleotide changes seem to be non-zero
• There should be a mechanism for mutating 2 adjacent nucleotides at once!
(Kosiol and Goldman)

Phylogeny
• Last lecture: inferring distance from an alignment
• Phylogenetic trees: how to infer trees and distances given an alignment
• How do we infer trees given an alignment?
• Branch length
• Topology
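The codon model parameterization described earlier can be made concrete with a small sketch. It assumes the commonly used simplified form of the model: a change of exactly one nucleotide gets rate πj, multiplied by κ if it is a transition and by ω if it is non-synonymous, and every other change (including anything involving a stop codon) gets rate 0. The original Goldman and Yang formulation differs in detail, and the names `rate`, `CODE` and `SENSE` are illustrative, not taken from the lecture.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c in CODE if CODE[c] != "*"]   # the 61 sense codons
PURINES = {"A", "G"}


def rate(ci, cj, kappa, omega, pi):
    """Instantaneous rate from codon ci to codon cj (assumed simplified form)."""
    if CODE[ci] == "*" or CODE[cj] == "*":
        return 0.0                              # stop codons are not allowed
    diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
    if len(diffs) != 1:
        return 0.0                              # only single-nucleotide changes
    a, b = diffs[0]
    r = pi[cj]                                  # equilibrium frequency of target codon
    if (a in PURINES) == (b in PURINES):
        r *= kappa                              # transition (A<->G or C<->T)
    if CODE[ci] != CODE[cj]:
        r *= omega                              # non-synonymous change
    return r


# Usage with uniform codon frequencies (an assumption made only for the example).
uniform = {c: 1.0 / len(SENSE) for c in SENSE}
r_syn_ts = rate("TTT", "TTC", kappa=2.0, omega=0.5, pi=uniform)   # Phe -> Phe, transition
r_non_tv = rate("TTT", "TTA", kappa=2.0, omega=0.5, pi=uniform)   # Phe -> Leu, transversion
```

Under this form, double nucleotide changes have rate exactly zero, which is the assumption the empirical codon matrix calls into question.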
[Figure 2.6: a rooted tree (top) and an unrooted tree (bottom)]
Fig. 2.6 Rooted and unrooted trees for human (H), chimp (C), gorilla (G), orang-utan (O), and gibbon (B). The rooted tree (top) corresponds to the unrooted tree below.

Rooted vs unrooted
• A tree root implies a common ancestor
• Some tree building approaches do not provide a root
• Root identification does affect goodness of fit

                 Rooted    Unrooted
Species          N         N
Branches         2N - 2    2N - 3
Internal nodes   N - 1     N - 2

What is a good tree?
• Accurately represents observed distance
• Distance goodness of fit
• Accurately represents evolutionary history: more difficult to define
• Depends on assumptions about the evolutionary process

[Figure 2.7: rooted trees 1 to 7]
Fig. 2.7 The seven rooted trees that can be derived from an unrooted tree for five sequences. Each rooted tree 1-7 corresponds to placing the root on the corresponding numbered branch of the unrooted tree. (Sequence labels as for Fig. 2.6.)

Gene tree vs species tree
• A gene tree is not always the same as a species tree
• Species evolve as populations: gene divergence can predate a speciation event
• Inferring species relationships from molecular data: use many orthologous genes; paralogs present many problems
• Orthologs can be identified using synteny (chromosomal order)

Phylogeny is reconstructed through either distance data or discrete characters
Distance methods: all pairwise distances are computed from characters.
• UPGMA, Neighbor-Joining…
Discrete character methods use sequences directly during inference.
• Maximum parsimony, Maximum likelihood, Bayesian methods…

(Molecular Evolution, Lecture 11)

UPGMA: Unweighted Pair Group Method with Arithmetic Mean
• First, join the shortest-distance pair, with the branching point at half the distance: Lij = dij / 2
• Recalculate the distance matrix, joining x and y as node 1 (n1): Ln1z = (dxz + dyz - dxy) / 2
• Join the next shortest-distance pair, and continue…
• When updating distances to clusters, use the size of the cluster to calculate the correct distance
If genetic distance were perfectly correlated with time (molecular clock), the UPGMA method would return the correct tree topology and branch lengths. This is never exactly true, and UPGMA, although simple and useful for learning trees, is not often used.

UPGMA mistakes
• UPGMA is not often used and makes mistakes by joining branches that have slow rates but are not true sister taxa.
• It produces an ultrametric tree, in which the distances from the root to every branch tip are equal.

FIGURE 5.11 (a) The true phylogenetic tree. (b) The erroneous phylogenetic tree reconstructed by using UPGMA, which does not take into account the possibility of unequal substitution rates along the different branches. (c) The tree inferred by the transformed distance method. The root must be on the branch connecting OTU D and the node of the common ancestor of OTUs A, B, and C, but its exact location cannot be determined by the transformed distance method.

Transformed distance method
If the assumption of rate constancy among lineages does not hold, UPGMA may give an erroneous topology. For example, suppose that the phylogenetic tree in Figure 5.11a is the true tree. Under the assumption of additivity, the pairwise evolutionary distances are given by the following matrix:

OTU   A    B    C
B     8
C     7    9
D     12   14   11

By using UPGMA, we obtain an inferred tree (Figure 5.11b) that differs in topology from the true tree (Figure 5.11a). For example, OTUs A and C are grouped together in the inferred tree, whereas in the true tree A and B are sister OTUs. (Note that additivity does not hold in the inferred tree. For example, the true distance between A and B is 8, whereas the sum of the lengths of the estimated branches connecting A and B is 3.50 + 0.75 + 4.25 = 8.50.)

These topological errors might be remedied, however, by using a correction called the transformed distance method (Farris 1977; Klotz et al. 1979). In brief, this method uses an outgroup as a reference in order to make corrections for the unequal rates of evolution among the lineages under study and then applies UPGMA to the new distance matrix to infer the topology of the tree. An outgroup is an OTU or a group of several OTUs for which we have external knowledge, such as taxonomic or paleontological information, that clearly shows them to have diverged from the common ancestor prior to all the other OTUs under consideration (the ingroup taxa). In the present case, let us assume that taxon D is an outgroup to all other taxa. D can then be used as a reference to transform the distances.

Neighbor Joining
• Greedy like UPGMA, except…
• Start with everything in a star topology and gradually join leaves with a common branch until a binary tree remains
• Which leaves to join?
• Transform the distance matrix into a U matrix
• Calculate new distances: we joined g and f to create internal node u
• Each calculation uses information from the entire distance matrix
Figures and formulas from Wikipedia

NJ properties
• NJ can be seen as optimizing the least squares error to the distance matrix
• Neighbor joining has the property that if the input distance matrix is correct, then the output tree will be correct.
• The correctness of the output tree topology is guaranteed as long as the distance matrix is 'nearly additive', that is, if each entry in the distance matrix differs from the true distance by less than half of the shortest branch length in the tree
• In practice this condition is rarely satisfied, but the correct tree is often found anyway
• May assign negative branch lengths
• Tree has no root

Rooting a tree
• Midpoint rooting
• Find the longest path and place the root in the middle
• Works if the rate is constant or balanced along the tree
• Shouldn't change if a species is removed
• Outgroup rooting: using prior knowledge about relationships

Maximum Parsimony
• Not greedy: explores the tree space and uses sequence alignments directly
• Uses an optimality criterion to scrutinize trees. Therefore, it is a tree searching method, as opposed to clustering like distance-based methods.
• Criterion: smallest number of evolutionary changes
• Ockham's razor: the best hypothesis is the one requiring the fewest assumptions.

Maximum parsimony
• Sites are informative if
• they are variable
• they help us differentiate between trees
• at least 2 characters occur at least 2 times
• Site 2 is uninformative because all three possible trees require 1 evolutionary change, G -> A.
• Site 3 is uninformative because all trees require 2 changes.
• Site 4 is uninformative because all trees require 3 changes.
• Site 5 is informative because tree I requires one change, trees II and III require two changes.
• Site 7 is informative, like site 5 (same pattern as site 5).
• Site 9 is informative because tree II requires one change, trees I and III require two.
• Total over the informative sites (site 9 + 2 * site 5): tree I requires 4 changes, tree II 5, tree III 6.

Parsimony rule
• The set at an interior node is the intersection of its two immediately descendant sets if the intersection is not empty.
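The informative-site test and the parsimony rule above can be tried on a single alignment column. This is a minimal sketch under stated assumptions: the empty-intersection case uses the standard Fitch step (take the union of the two sets and count one change), which the rule above does not spell out, and the site patterns are hypothetical, not the alignment from the lecture. Trees are written as nested tuples with an arbitrary root, since the number of changes does not depend on where the root is placed.

```python
from collections import Counter


def is_informative(column):
    """True if at least 2 characters each occur at least 2 times in the column."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def fitch(tree, states):
    """Return (state set, number of changes) for one site.

    tree   : a leaf name (str) or a pair (left_subtree, right_subtree)
    states : dict mapping leaf name -> character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:                                   # non-empty intersection: no change
        return common, left_changes + right_changes
    # Assumed Fitch step: empty intersection costs one change, keep the union.
    return left_set | right_set, left_changes + right_changes + 1


# The three possible unrooted trees for four sequences (hypothetical labels 1-4).
TREES = {
    "I": (("1", "2"), ("3", "4")),
    "II": (("1", "3"), ("2", "4")),
    "III": (("1", "4"), ("2", "3")),
}


def score_site(column):
    """Changes required by each of the three trees for a 4-character column."""
    states = dict(zip("1234", column))
    return {name: fitch(tree, states)[1] for name, tree in TREES.items()}


scores_informative = score_site("GGAA")     # pattern that favors tree I
scores_uninformative = score_site("GGGA")   # one variant character only
```

With the hypothetical column GGAA, tree I needs one change and trees II and III need two, the same situation as the informative sites described above; the column GGGA costs one change on every tree and so cannot discriminate between them.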

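Returning to the UPGMA example, the A, B, C, D distance matrix can be clustered with a short sketch. It assumes the cluster-size-weighted arithmetic mean for updating distances to a merged cluster (the rule the method's name refers to) and places each branching point at half the joining distance. The function name `upgma` and the dictionary layout are illustrative choices.

```python
def upgma(names, dist):
    """Cluster with UPGMA.

    names : list of leaf labels
    dist  : dict mapping frozenset({a, b}) -> distance
    Returns a list of merges (cluster_a, cluster_b, node_height) in joining order.
    """
    sizes = {n: 1 for n in names}      # cluster label -> number of leaves
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)       # shortest-distance pair
        a, b = sorted(pair)
        height = d[pair] / 2           # branching point at half the distance
        new = a + b
        merges.append((a, b, height))
        for c in sizes:
            if c in (a, b):
                continue
            # Size-weighted arithmetic mean of the distances to the two parts.
            d[frozenset((new, c))] = (
                sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
            ) / (sizes[a] + sizes[b])
        # Drop every entry that still refers to the two merged clusters.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        sizes[new] = sizes.pop(a) + sizes.pop(b)
    return merges


# Distance matrix from the transformed distance example.
D = {
    frozenset("AB"): 8, frozenset("AC"): 7, frozenset("BC"): 9,
    frozenset("AD"): 12, frozenset("BD"): 14, frozenset("CD"): 11,
}
merges = upgma(["A", "B", "C", "D"], D)

# Path length from A to B in the inferred tree: up from A, across, down to B.
h_ac, h_acb = merges[0][2], merges[1][2]
path_a_to_b = h_ac + (h_acb - h_ac) + h_acb
```

On this matrix the sketch joins A with C first at height 3.5 and then adds B at height 4.25, so the A to B path is 3.50 + 0.75 + 4.25 = 8.50 rather than the true distance of 8.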

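For comparison, the first neighbor-joining step can be run on the same A, B, C, D matrix. This sketch assumes the Wikipedia formulation that the slides cite for their formulas: the transformed matrix is Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i to all other taxa, and after joining f and g into node u the new distance is d(u, k) = (d(f, k) + d(g, k) - d(f, g)) / 2. The function names are illustrative.

```python
def q_matrix(names, d):
    """Transformed matrix used to choose which pair of leaves to join."""
    n = len(names)
    # Row sums: each entry draws on the entire distance matrix.
    r = {i: sum(d[frozenset((i, k))] for k in names if k != i) for i in names}
    return {
        frozenset((i, j)): (n - 2) * d[frozenset((i, j))] - r[i] - r[j]
        for pos, i in enumerate(names)
        for j in names[pos + 1:]
    }


def distance_to_new_node(f, g, k, d):
    """Distance from taxon k to the node u created by joining f and g."""
    return 0.5 * (d[frozenset((f, k))] + d[frozenset((g, k))] - d[frozenset((f, g))])


D = {
    frozenset("AB"): 8, frozenset("AC"): 7, frozenset("BC"): 9,
    frozenset("AD"): 12, frozenset("BD"): 14, frozenset("CD"): 11,
}
Q = q_matrix(["A", "B", "C", "D"], D)
best = min(Q.values())
best_pairs = sorted("".join(sorted(p)) for p, v in Q.items() if v == best)

# Distances from C and D to the node created by joining A and B.
d_uc = distance_to_new_node("A", "B", "C", D)
d_ud = distance_to_new_node("A", "B", "D", D)
```

The lowest Q value is shared by the pairs A, B and C, D, which describe the same unrooted four-taxon tree, so this step groups A with B as in the true tree, whereas UPGMA grouped A with C.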