
MOLECULAR PHYLOGENETIC METHODS
Gurumayum Suraj Sharma

Course Contents
I. UPGMA
II. Neighbor Joining
III. Minimum Evolution
IV. Maximum Parsimony
V. Maximum Likelihood
VI. Bayesian Inference

MOLECULAR PHYLOGENETIC TREE BUILDING METHODS
Mathematical/statistical methods for inferring the divergence order of taxa, as well as the lengths of the branches that connect them.
o Many phylogenetic methods are available, each having strengths and weaknesses.
o Cluster analysis is one such method, in which OTUs (operational taxonomic units) are arranged in order of decreasing similarity.

DISTANCE BASED METHODS
Distance-based methods begin construction of a tree by calculating pairwise distances between molecular sequences. A matrix of pairwise scores for all aligned protein (or nucleic acid) sequences is used to generate the tree.
GOAL - Find a tree in which branch lengths correspond as closely as possible to the observed distances.
Main distance-based methods:
I. Unweighted Pair Group Method with Arithmetic Mean [UPGMA]
II. Neighbor Joining [NJ]
Distance-based methods of phylogeny are:
o Computationally fast
o Particularly useful for analyses of larger numbers of sequences (e.g., >50 or 100).

USES DISTANCE METRIC
o Number of amino acid changes between the sequences
o Distance score
Distance is calculated as the dissimilarity between the sequences of each pair of taxa. While similarities are useful, distances (which differ from differences) offer appealing properties for describing the relationships between objects.
Distance-based methods are fast but overlook a substantial amount of the information in a multiple sequence alignment.

DISTANCE-BASED METHOD: UPGMA
The UPGMA algorithm was introduced by Sokal & Michener [1958].
Example: Consider five sequences whose distances can be represented as points in a plane and also in a distance matrix.
o Some protein sequences, such as 1 & 2, are closely similar.
o Others (1 & 3) are far less related.
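The distance matrix that such an example starts from can be sketched as follows. This is a minimal illustration, assuming the simplest possible distance score: the proportion of aligned positions at which two sequences differ (often called the p-distance), with no correction for multiple substitutions. The function names and the toy alignment are hypothetical and are not taken from the slides.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def distance_matrix(alignment):
    """Build a symmetric nested dict: matrix[x][y] is the distance from x to y."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = p_distance(alignment[x], alignment[y])
        matrix[x][y] = d
        matrix[y][x] = d
    return matrix


# Hypothetical toy alignment (invented for illustration only).
toy_alignment = {
    "seq1": "MKTAYIAKQR",
    "seq2": "MKTAYIAKQK",
    "seq3": "MRTGYLSKHR",
}
toy_matrix = distance_matrix(toy_alignment)
```

Here seq1 and seq2 differ at one of ten positions, so they are the closely similar pair, while seq3 differs from both at half or more of the positions. Because only the matrix is passed on to the tree-building step, the identity of the differing sites is lost at this point.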
UPGMA clusters these sequences as follows:
1. Begin with a distance matrix.
o Identify the least dissimilar groups (i.e. the two OTUs that are most closely related).
o All OTUs are given equal weights. If there are several equidistant minimal pairs, one is picked randomly.
o E.g. OTUs 1 and 2 have the smallest distance.
2. Combine them to form a new group.
o E.g. groups 1 & 2 have the smallest distance (0.1) and are combined to form cluster (1,2).
o This results in a new, clustered distance matrix having one fewer row and column than the initial matrix.
o Dissimilarities that are not involved in the formation of the new cluster remain unchanged.
o The values for the clustered taxa (1,2) reflect the average of the distances from OTUs 1 and 2 to each of the other OTUs.
o The distance of OTU 1 to OTU 4 was initially 0.8 and that of OTU 2 to OTU 4 was 1.0, so the distance of cluster (1,2) to OTU 4 becomes 0.9.
3. Connect the pair through a new node on the nascent tree.
o This node corresponds to the new group.
4. Identify the next smallest dissimilarity, and combine those taxa to generate a second clustered dissimilarity matrix.
o Two OTUs may be joined (if they share the least dissimilarity), a single OTU may be joined with a cluster, or two clusters may be joined.
o The dissimilarity of a single OTU with a cluster is computed simply by taking the average dissimilarity.
o In this process a new distance matrix is formed, and the tree continues to be constructed.
5. Continue until there are only two remaining groups, and join these.

The same procedure, illustrated step by step:
I. Each sequence is assigned to its own cluster. A distance matrix, based on some metric, quantifies the distance between each pair of objects. Circles represent sequences.
II. The taxa with the closest distance (1 and 2) are identified and connected, which allows an internal node to be named. The distance matrix is reconstructed counting taxa 1 and 2 as a group. The next closest sequences are then identified.
III. The next closest sequences are combined into a cluster, and the matrix is again redrawn. In the tree, taxa 4 and 5 are now connected by a new node, 7. The next smallest distance is then identified; it corresponds to the union of taxon 3 with cluster (4,5).
IV. The newly formed group (cluster 4,5 joined with sequence 3) is represented on the emerging tree with new node 8.
V. Finally, all sequences are connected in a rooted tree.

[Figure: INPUT / INITIAL SETTING. Start with clusters of individual points (p1 to p5) and a distance/proximity matrix.]
[Figure: INTERMEDIATE STATE. After some merging steps, we have some clusters (C1 to C5) and their distance/proximity matrix.]
[Figure: INTERMEDIATE STATE. Merge the two closest clusters (C2 and C5) and update the distance matrix.]
[Figure panels: STEP 1, STEP 2, STEP 3, STEP 4.]

Critical Assumption of UPGMA - The rate of nucleotide or amino acid substitution is constant for all branches in the tree, i.e., the molecular clock applies to all evolutionary lineages.
o If this assumption is true, branch lengths can be used to estimate the dates of divergence, and the sequence-based tree mimics a species tree.
o A UPGMA tree is rooted because of its assumption of a molecular clock.
o If the assumption is violated and there are unequal substitution rates along different branches of the tree, the method can produce an incorrect tree.
o Other methods (including neighbor-joining) do not automatically produce a root, but a root can be placed by choosing an outgroup or by applying midpoint rooting.

UPGMA Method - A commonly used distance method in a variety of applications.
o Example: microarray data analysis.
o In phylogenetic analyses using molecular sequence data, its simplifying assumptions tend to make it significantly less accurate than other distance-based methods such as neighbor-joining.
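The UPGMA steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation. Only the distances 0.1 (OTUs 1 and 2), 0.8 (OTUs 1 and 4) and 1.0 (OTUs 2 and 4) come from the worked example; every other value in the toy matrix is hypothetical, chosen so that the merge order matches the walkthrough (1 with 2, then 4 with 5, then 3 with cluster 4,5). Ties are resolved by taking the first minimal pair instead of a random one, and each new node is placed at half the distance between the joined clusters, which is where the molecular clock assumption enters.

```python
from itertools import combinations


def symmetric(pairs):
    """Turn {(a, b): distance} into a symmetric nested dict d[a][b]."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d


def upgma(names, dist):
    """Return (merges, working_matrix).

    merges is a list of (cluster, node_height); a cluster is a nested
    tuple of leaf names that records the topology built so far.
    """
    size = {name: 1 for name in names}        # number of leaves per cluster
    d = {a: dict(dist[a]) for a in names}     # working copy of the matrix
    active = list(names)
    merges = []
    while len(active) > 1:
        # Least dissimilar pair; ties go to the first pair encountered.
        a, b = min(combinations(active, 2), key=lambda p: d[p[0]][p[1]])
        new = (a, b)
        merges.append((new, d[a][b] / 2))     # node height = half the distance
        d[new] = {}
        for c in active:
            if c == a or c == b:
                continue
            # Arithmetic mean over all leaf pairs (weighted by cluster size).
            avg = (size[a] * d[a][c] + size[b] * d[b][c]) / (size[a] + size[b])
            d[new][c] = avg
            d[c][new] = avg
        size[new] = size[a] + size[b]
        active = [c for c in active if c != a and c != b] + [new]
    return merges, d


# 0.1, 0.8 and 1.0 are from the worked example; the rest are hypothetical.
toy = symmetric({
    ("1", "2"): 0.1, ("1", "3"): 0.8, ("1", "4"): 0.8, ("1", "5"): 0.9,
    ("2", "3"): 0.8, ("2", "4"): 1.0, ("2", "5"): 1.0,
    ("3", "4"): 0.4, ("3", "5"): 0.5, ("4", "5"): 0.2,
})
toy_merges, toy_working = upgma(["1", "2", "3", "4", "5"], toy)
```

After the first merge, the working matrix holds the averaged distance from cluster (1,2) to OTU 4, i.e. (0.8 + 1.0) / 2 = 0.9, as in the worked example. The size weighting only matters once a cluster with more than one member is merged again; for two single OTUs it reduces to the plain average.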
DISTANCE-BASED METHODS: NEIGHBOR JOINING [SAITOU AND NEI, 1987]
The neighbor-joining method builds trees from distances and produces both topology and branch lengths.
o Neighbors are a pair of OTUs connected through a single interior node X in an unrooted, bifurcating tree.
o The method is related to the cluster method but does not require that all lineages have diverged by equal amounts.
o It is especially suited for datasets comprising lineages with largely varying rates of evolution.
o It can be used in combination with methods that allow correction for superimposed substitutions.

Neighbor-joining method - A special case of the Star Decomposition Method.
o It keeps track of nodes on the tree rather than taxa or clusters of taxa.
o Raw data are provided as a distance matrix, and the initial tree is a STAR TREE.
o A modified distance matrix is constructed in which the separation between each pair of nodes is adjusted on the basis of their average divergence from all other nodes.
o The tree is constructed by linking the least-distant pair of nodes in this modified matrix.
o When two nodes are linked, their common ancestral node is added, and the terminal nodes with their respective branches are removed.
o The process converts the newly added common ancestor into a terminal node on a tree of reduced size.
o At each stage two terminal nodes are replaced by one new node.
o The process is complete when two nodes remain, separated by a single branch.

The process of starting with a star-like tree and finding and joining neighbors is continued until the topology of the tree is completed. The neighbor-joining algorithm minimizes the sum of branch lengths at each stage of clustering OTUs, although the final tree is not necessarily the one with the shortest overall branch lengths. Results may therefore differ from those of minimum evolution strategies or maximum parsimony.
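The "modified distance matrix" step can be made concrete with a short sketch. It assumes the widely used Q-criterion formulation of neighbor joining, Q(i,j) = (n - 2) d(i,j) - r(i) - r(j), where r(i) is the sum of distances from node i to all other nodes; the slides do not spell out a formula, so treat this as one common way to realize the description rather than the author's own. The five-taxon matrix and the node labels (node1, node2, ...) are illustrative values and names, not data from the source.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return a list of branches (node, attached_to, length) of an unrooted tree."""
    d = {a: dict(dist[a]) for a in names}     # working copy of the matrix
    active = list(names)
    branches = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        # r[a]: total divergence of node a from all other active nodes.
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}

        def q(pair):
            i, j = pair
            return (n - 2) * d[i][j] - r[i] - r[j]

        # Link the least-distant pair in the modified (Q) matrix.
        i, j = min(combinations(active, 2), key=q)
        next_id += 1
        u = "node%d" % next_id
        length_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        length_j = d[i][j] - length_i
        branches.append((i, u, length_i))
        branches.append((j, u, length_j))
        # The new ancestor replaces i and j as a terminal node.
        d[u] = {}
        for k in active:
            if k == i or k == j:
                continue
            d_uk = (d[i][k] + d[j][k] - d[i][j]) / 2
            d[u][k] = d_uk
            d[k][u] = d_uk
        active = [k for k in active if k != i and k != j] + [u]
    # Two nodes remain, separated by a single branch.
    a, b = active
    branches.append((a, b, d[a][b]))
    return branches


# Illustrative distances (not from the source).
labels = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
toy = {x: {y: rows[i][j] for j, y in enumerate(labels) if i != j}
       for i, x in enumerate(labels)}
toy_branches = neighbor_joining(labels, toy)
```

In this toy run a and b are joined first, with unequal branch lengths (2 and 3), which is exactly what UPGMA cannot express: the two neighbors need not be equidistant from their common ancestor. The result is unrooted; the last branch simply connects the two remaining nodes.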
Because it does not assume a constant rate of evolution, neighbor joining produces an unrooted tree topology unless an outgroup is specified or midpoint rooting is applied.

The NJ method is a distance-based algorithm:
I. OTUs are first clustered in a starlike tree. "Neighbors" are defined as OTUs that are connected by a single interior node in an unrooted, bifurcating tree.
II. The two closest OTUs are identified. These neighbors are connected to the other OTUs via the internal branch XY. The OTUs [neighbors] selected are the ones that yield the smallest sum of branch lengths.
The process is repeated until the entire tree is generated.

ADVANTAGES & DISADVANTAGES
ADVANTAGES
o Fast and thus suited for large datasets and for bootstrap analysis
o Permits lineages with largely different branch lengths
o Permits correction for multiple substitutions
DISADVANTAGES
o Sequence information is reduced
o Gives only one possible tree
o Strongly dependent on the model of evolution used.

MINIMUM EVOLUTION
MAIN IDEA- Based on the assumption that the tree with the smallest sum of branch length estimates is most likely to be the true one.
o Branch lengths are computed from the pairwise distances between the sequences.
o Somewhat similar to the parsimony method: trees obtained by the ME and parsimony methods are often nearly identical in topology and branch length.
o Available in the PHYLIP & ClustalW packages.

ADVANTAGES & DISADVANTAGES
ADVANTAGES
o Easy to perform and quick to calculate
o Suited to sequences having high similarity scores
DISADVANTAGES
o Loss of information, since the sequences themselves are not considered
o All sites are treated equally [differences in substitution rates are not considered]
o Not applicable to distantly related, divergent sequences.
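The minimum evolution idea can be illustrated on the smallest non-trivial case, four taxa, where there are only three possible unrooted topologies. The sketch below assumes that branch lengths are estimated by ordinary least squares, which is one common choice that the slides do not specify; under that assumption the total length of the tree pairing (A,B) against (C,D) has the closed form (dAB + dCD)/2 + (dAC + dAD + dBC + dBD)/4. The function names and the distances are hypothetical.

```python
def quartet_tree_length(d, pair1, pair2):
    """Total least-squares branch length of the quartet tree pair1 | pair2."""
    (a, b), (c, e) = pair1, pair2
    within = d[a][b] + d[c][e]                          # distances inside each pair
    between = d[a][c] + d[a][e] + d[b][c] + d[b][e]     # distances across the pairs
    return within / 2 + between / 4


def minimum_evolution_quartet(d, taxa):
    """Score all three quartet topologies and return (best, all_scores)."""
    a, b, c, e = taxa
    topologies = [
        ((a, b), (c, e)),
        ((a, c), (b, e)),
        ((a, e), (b, c)),
    ]
    scored = [(quartet_tree_length(d, p1, p2), (p1, p2)) for p1, p2 in topologies]
    best = min(scored, key=lambda item: item[0])
    return best, scored


# Hypothetical distances for four taxa (invented for illustration).
toy = {
    "A": {"B": 3, "C": 7, "D": 8},
    "B": {"A": 3, "C": 6, "D": 7},
    "C": {"A": 7, "B": 6, "D": 4},
    "D": {"A": 8, "B": 7, "C": 4},
}
toy_best, toy_scores = minimum_evolution_quartet(toy, ["A", "B", "C", "D"])
```

With these numbers the topology grouping A with B and C with D has total length 10.5, against 12.25 for each alternative, so it is the minimum evolution tree. Scoring every topology this way quickly becomes infeasible as the number of taxa grows, which is why practical programs search heuristically.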
PHYLOGENETIC INFERENCE: MAXIMUM PARSIMONY
Parsimony: from the Latin parcere, meaning "to spare". It refers to simplicity of assumptions in a logical formulation.
MAIN IDEA- The best tree is the one with the shortest branch lengths possible, i.e. the one that requires the fewest changes.
o Hennig (1966) described parsimony-based phylogeny using morphological characters.
o Eck & Dayhoff (1966) used a parsimony-based approach to generate phylogenetic trees.

Dayhoff et al.
