<<

Introduction Tree Terminology Molecular Evolution and ...a Homology very very short course Molecular Evolution Evolutionary Models Hern´anJ. Dopazo∗ Distance Methods Pharmacogenomics and Comparative Genomics Unit Maximum Parsimony Searching Trees Bioinformatics Department† Tree Confidence PC Lab Centro de Investigaci´onPr´ıncipe Felipe‡ Phylogenetic Links CIPF Credits

Valencia - Spain Title Page JJ II

J I

Page 1 of 60

Go Back ∗[email protected] Full Screen †http://bioinfo.cipf.es ‡http:www.cipf.es/principal.htm Close

Quit 1. Introduction

1.1. Three basic questions Introduction Tree Terminology • Why use phylogenies? Homology – Like astronomy, biology is an historical science! Molecular Evolution Evolutionary Models – The knowledge of the past is important to solve many questions re- Distance Methods lated to biological patterns and processes. Maximum Parsimony • Can we know the past? Searching Trees Tree Confidence – We can postulate alternative evolutionary scenarios (hypothesis) PC Lab – Obtain the proper dataset and get statistical confidence Phylogenetic Links • What means to know ”...the phylogeny”? Credits

– The ancestral-descendant relationships (tree topology) Title Page

– The distances between them (tree branch lengths) JJ II

J I

Page 2 of 60

Phylogenies are working hypotheses!!! Go Back

Full Screen

Close

Quit 2. Tree Terminology

2.1. Topology, branches, nodes & root Introduction Tree Terminology • Nodes & branches.Trees contain internal and external nodes and branches. Homology In , external nodes are sequences representing Molecular Evolution genes, populations or species!. Sometimes, internal nodes contain Evolutionary Models the ancestral information of the clustered species. A branch defines the Distance Methods relationship between sequences in terms of descent and ancestry. Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 3 of 60

Go Back

Full Screen

Close

Quit • Root is the common ancestor of all the sequences.

• Topology represents the branching pattern. Branches can rotate on Introduction internal nodes. Instead of the singular aspect, the folowing trees represent Tree Terminology a single phylogeny. Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II The topology is the same!! J I

Page 4 of 60

Go Back

Full Screen

Close

Quit • Taxa.(plural of taxon or operaqtional taxonomic unit (OTU)) Any group of organisms, populations or sequences considered to be sufficiently distinct Introduction from other of such groups to be treated as a separate unit. Tree Terminology • Polytomies. Sometimes trees does not show fully bifurcated (binary) Homology topologies. In that cases, the tree is considered not resolved. Only the Molecular Evolution relationships of species 1-3, 4 and 5 are known. Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Polytomies can be solved by using more sequences, more Page 5 of 60 characters or both!!! Go Back

Full Screen

Close

Quit 2.2. Rooted & Unrooted trees

Trees can be rooted or unrooted depending on the explicit definition or not Introduction of outgroup sequence or taxa. Tree Terminology Homology • Outgroup is any group of sequences used in the analysis that is not in- Molecular Evolution cluded in the sequences under study (ingroup). Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II • Unrooted trees show the topological relationships among sequences al- thoug it is impossible to deduce wether nodes (ni) represent a primitive or J I derived evolutionary condition. Page 6 of 60 • Rooted trees show the evolutionary and derived evolutionary rela- Go Back tionships among sequences. Rooting by outgroup is frequent in molecular phylogenetics!! Full Screen

Close

Quit 2.3. & Phylograms

Trees showing branching order exclusivelly () are principally the Introduction 1 2 interest of systematists to make inferences on . Those interesting in Tree Terminology the evolutionary processes emphasize on branch lengths information (anagenesis). Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II • Dendrogram is a branching diagram in the form of a tree used to depict degrees of relationship or resemblance. J I

is a branching diagram depicting the hierarchical arrangement Page 7 of 60

of taxa defined by cladistic methods (the distribution of shared derived Go Back characters -synapomorphies-). Full Screen 1The study of biological diversity. 2The theory and practice of describing, naming and classifying organisms Close

Quit • Phylogram is a that indicates the relationships between the taxa and also conveys a sense of time or rate of evolution. The tem- Introduction poral aspect of a phylogram is missing from a cladogram or a generalized Tree Terminology dendogram. Homology • Distance scale represents the number of differences between sequences Molecular Evolution (e.g. 0.1 means 10 % differences between two sequences) Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I Rooted and unrooted phylograms or cladograms are frequently Page 8 of 60

used in molecular ! Go Back

Full Screen

Close

Quit 2.4. Consensus trees

It is frequent to obtain alternative phylogenetic hypothesis from a single data Introduction set. In such a case, it is usefull to summarize common or average relationships Tree Terminology among the original set of trees. A number of different types of consensus trees Homology have been proposed; Molecular Evolution • The majority rule consensus tree uses a simple majority of relationships Evolutionary Models among the fundamental trees. Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 9 of 60

A consensus tree is a summary of how well the original trees agrees. Go Back

A helpfull manual covering these and other concepts of the section can be ob- Full Screen tained in [20, 12]. Close

Quit 3. Homology

The Origin of Species. Charles Darwin. Chapter 14 Introduction Tree Terminology What can be more curious than that the hand of a man, formed for Homology grasping, that of a mole for digging, the leg of the horse, the paddle of the porpoise, and the wing of the bat, should all be constructed Molecular Evolution on the same pattern, and should include similar bones, in the same Evolutionary Models relative positions? Distance Methods Maximum Parsimony Why should similar bones have been created to form the wing and Searching Trees the leg of a bat, used as they are for such totally different purposes, Tree Confidence namely flying and walking? PC Lab How inexplicable are the cases of serial homologies on the ordinary Phylogenetic Links view of creation! Credits

Title Page

JJ II

J I

Page 10 of 60

Go Back

Since Darwin homology was the result of descent with modification Full Screen from a common ancestor. Close

Quit 3.1. Homoplasy

• Similarity among species could represent true homology (just by sharing Introduction the same ancestral state) or, homoplastic events like convergence, par- Tree Terminology allelism or reversals; Homology Molecular Evolution • Homology is a posteriori tree construction definition. Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 11 of 60

Go Back

Full Screen

Close

Quit • Convergences are ...

Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Homoplasy can provide misleading evidence of phylogenetic Page 12 of 60

relationships!! (if mistakenly interpreted as homology). Go Back

Full Screen

Close

Quit • Parallels are ...

Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 13 of 60 Homoplasy can provide misleading evidence of phylogenetic relationships!! (if mistakenly interpreted as homology). Go Back

Full Screen

Close

Quit • Reversions are ...

Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Homoplasy can provide misleading evidence of phylogenetic Page 14 of 60 relationships!! (if mistakenly interpreted as homology). Go Back

Full Screen

Close

Quit 3.2. Similarity

• For molecular sequence data, homology means that two sequences or even Introduction two characters within sequences are descended from a common ancestor. Tree Terminology Homology • This term is frequently mis-used as a synonym of similarity. Molecular Evolution • as in two sequences were 70% homologous. Evolutionary Models Distance Methods • This is totally incorrect! Maximum Parsimony • Sequences show a certain amount of similarity. Searching Trees Tree Confidence • From this similarity value, we can probably infer that the sequences are PC Lab homologous or not. Phylogenetic Links • Homology is like pregnancy. You are either pregnant or not. Credits

• Two sequences are either homologous or they are not. Title Page

JJ II

J I

Page 15 of 60

Go Back

Full Screen

Close

Quit 3.3. Sequence homology

In molecular studies it is important to distinguish among kinds of homology[6]; Introduction Tree Terminology • Ortholog: Homologous genes that have diverged from each other after Homology speciation events (e.g., human β- and chimp β-globin). Molecular Evolution • Paralog: Homologous genes that have diverged from each other after gene Evolutionary Models duplication events (e.g., β- and γ-globin) Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 16 of 60

Go Back

Full Screen

Close

Quit • Xenolog: Homologous genes that have diverged from each other after lateral gene transfer events (e.g., antibiotic resistance genes in bacteria). Introduction • Homolog: Genes that are descended from a common ancestor (e.g., all Tree Terminology globins). Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 17 of 60

Go Back

Full Screen

Close

Quit • Positional homology: Common ancestry of specific amino acid or nu- cleotide positions in different genes. Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 18 of 60

Go Back

Full Screen

Close

Quit 4. Molecular Evolution

4.1. Molecular clock Introduction Tree Terminology The molecular clock hypothesis postulates that for any given macromolecule Homology (a protein or DNA sequence), the rate of evolution -measured as the mean number Molecular Evolution of amino acids or nucleotide sequence change per site per year- is approximately Evolutionary Models constant over time in all the evolutionary lineages [21]. Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 19 of 60

Go Back

Full Screen

Close

Quit This hypothesis has estimulated much interest in the use of macromolecules in evolutionay studies for two reasons: Introduction • Sequences can be used as molecular markers to date evolutionary events. Tree Terminology Homology • The degree of rate change among sequences and lineages can provide in- Molecular Evolution sights on mechanisms of molecular evolution. For example, a large in- Evolutionary Models crease in the rate of evolution in a protein in a particular may Distance Methods indicate adaptive evolution. Maximum Parsimony Substitution rate estimation Searching Trees Tree Confidence PC Lab It is based on the number of aa substitution (distance) and divergence time Phylogenetic Links (fossil calibration), Credits

Title Page

JJ II

J I

Page 20 of 60

Go Back

Full Screen

Close

Quit There is no universal clock

Introduction Tree Terminology It is known that clock variation exists for: Homology • different molecules, depending on their functional constraints, Molecular Evolution Evolutionary Models • different regions in the same molecule, Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 21 of 60

Go Back

Full Screen

Close

Quit • different base position (synonimous-nonsynonimous),

Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 22 of 60

Go Back

Full Screen

Close

Quit • different genomes in the same cell, • different regions of genomes, Introduction Tree Terminology • different taxonomic groups for the same gene (lineage effects) Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 23 of 60

Go Back

Full Screen

Close

Quit 5. Evolutionary Models

5.1. Multiple Hits Introduction Tree Terminology • The mutational change of DNA sequences varies with region. Even Homology considering protein coding sequence alone, the patterns of nucleotide Molecular Evolution substitution at the first, second or third codon position are not the Evolutionary Models same. Distance Methods • When two DNA sequences are derived from a common ancestral se- Maximum Parsimony quence, the descendant sequences gradually diverge by nucleotide sub- Searching Trees stitution. Tree Confidence PC Lab • A simple measure of sequence divergence is the proportion p = N /N d t Phylogenetic Links of nucleotide sites at which the two sequences are different. Credits

Title Page

JJ II

J I

Page 24 of 60

Go Back

Full Screen

Close

Quit • When p is large, it gives an underestimate of the number of of sub- stitutions, because it does not take into account multiple substitu- Introduction tions. Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 25 of 60

Go Back

Full Screen

Close

Quit • Sequences may saturate due to multiple changes (hits) at the same position after lineage splitting. Introduction • In the worst case, data may become random and all the phylogenetic Tree Terminology information about relationships can be lost!!! Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 26 of 60

Go Back

Full Screen

Close

Quit 5.2. Models of nucleotide substitution • In order to estimate the number of nucleotide substitutions Introduction ocurred it is necessary to use a mathematical model of nucleotide Tree Terminology substitution. The model would consider the nucleotide frequencies Homology and the instantaneous rate’s change among them. Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 27 of 60

Go Back

Full Screen

Close

Quit • Interrrelationships among models for estimating the number of nu- cleotide substitutions among a pair of DNA sequences Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 28 of 60

Go Back

Full Screen

Close

Quit • For constructing phylogenetic trees from distance measures, sophisti- cated distances are not neccesary more efficient. Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab • Indeed, by using sophisticated models distances show higher variance Phylogenetic Links values. Credits

Title Page

JJ II

J I

Page 29 of 60

Go Back

Full Screen

Close

Quit • Of course, corrected distances are greather than the observed.

Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 30 of 60

Go Back

Full Screen

Close

Quit Distance correction methods share several assumptions:

• All nucleotide sites change independently. Introduction Tree Terminology • The substitution rate is constant over time and in different lineages Homology Molecular Evolution • The base composition is at equilibrium (all sequences have the same base frequencies) Evolutionary Models Distance Methods • The conditional probabilities of nucleotide substitutions are the same Maximum Parsimony for all sites and do not change over time. Searching Trees Tree Confidence PC Lab Phylogenetic Links While these assumptions make the methods tractable, they are Credits in many cases unrealistic. Title Page

JJ II

J I

Page 31 of 60

Go Back

Full Screen

Close

Quit 6. Distance Methods

Distance matrix methods is a major family of phylogenetic meth- Introduction ods trying to fit a tree to a matrix of pairwise distance [1, 5]. Tree Terminology Distance are generally corrected distances. Homology Molecular Evolution • The best way of thinking about distance matrix methods is to consider Evolutionary Models distances as estimates of the branch length separating that pair of Distance Methods species. Maximum Parsimony • Branch lengths are not simply a function of time, they reflect expected Searching Trees amounts of evolution in different branches of the tree. Tree Confidence PC Lab • Two branches may reflect the same elapsed time (sister taxa), but Phylogenetic Links they can have different expected amounts of evolution. Credits

• The product ri ∗ ti is the branch length Title Page

• The main distance-based tree-building methods are cluster analysis, JJ II least square and minimum evolution. J I • They rely on different assumptions, and their success or failure in Page 32 of 60 retrieving the correct phylogenetic tree depends on how well any par- ticular data set meet such assumptions. Go Back

Full Screen

Close

Quit Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 33 of 60

Go Back

Full Screen

Close

Quit 6.1. Cluster Analysis Cluster analysis derived from clustering algorithms popularized by Sokal Introduction and Sneath[16] Tree Terminology Homology 6.1.1. UPGMA Molecular Evolution Evolutionary Models One of the most popular distance approach is the unweighted pair-group method with arithmetic mean (UPGMA), which is also the simplest Distance Methods method for tree reconstruction [10]. Maximum Parsimony Searching Trees 1. Given a matrix of pairwise distances, find the clusters (taxa) i and j Tree Confidence such that dij is the minimum value in the table. PC Lab

2. Define the depth of the branching between i and j (lij) to be dij/2 Phylogenetic Links Credits 3. If i and j are the last 2 clusters, the tree is complete. Otherwise, create a new cluster called u. Title Page

4. Define the distance from u to each other cluster (k, with k 6= i or j) JJ II to be an average of the distances dki and dkj J I 5. Go back to step 1 with one less cluster; clusters i and j are eliminated, and cluster u is added. Page 34 of 60

The variants of UPGMA are in the step 4. Weighted PGMA(WPGM::dku = Go Back dki+dkj/2). Complete linkage (dku = max(dki, dkj). Single linkage(dku = Full Screen min(dki, dkj). Close

Quit Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

The smallest distance in the first table is 0.1715 substitutions per sequence position separating Bacillus subtilis and B. JJ II stearothermophilus. The distance between Bsu-Bst to Lvi (Lactobacillus viridescens) is (0.2147+0.2991)/2=0.2569. In J I the second table, joins Bsu-Bst to Mlu(Micrococcus luteus) at the depth 0.1096(=0.2192/2). The distances Bsu-Bst-

Mlu to Lvi is (2*0.2569+0.3943)/3=0.3027. Notice that this value is identical to (Bsu:Lvi+Bst:Lvi+Mlu:Lvi)/3. Each Page 35 of 60 taxon in the original data table contributes equally to the averages, this is why the method called unweighted Go Back UPGMA method supposes a cloclike behaviour of all the lineages, Full Screen giving a rooted and ultrametric tree.

Close

Quit 6.1.2. NJ (Neighboor Joining)

A variety of methods related to cluster analysis have been proposed that will Introduction correctly reconstruct additive trees, whether the data are ultrametric or not. NJ Tree Terminology removes the assumption that the data are ultrametric. Homology Molecular Evolution 1. For each terminal node i calculate its net divergence (ri) from all the other N Evolutionary Models P 3 taxa using 7→ ri = dik . Distance Methods k=1 Maximum Parsimony 2. Create a rate-corrected distance matrix (M) in which the elements are Searching Trees defined by 7→ M = d − (r + r )/(N − 2) 4. ij ij i j Tree Confidence 3. Define a new node u whose three branches join nodes i, j and the rest PC Lab of tree. Define the lengths of the tree branches from u to i and j 7→ Phylogenetic Links viu = dij/2 + ((ri − rj)/[2(N − 2)]; vju = dij − viu Credits

4. Define the distance from u to each other terminal node (for all k 6= i or Title Page j)7→ dku = (dik + djk − dij)/2 JJ II 5. Remove distances to nodes i and j from the matrix, decrease N by 1 J I 6. If more than2 nodes remain, go back to step 1. Otherwise, the tree is fully defined except for the length of the branch joining the two remaining nodes Page 36 of 60 (i and j) 7→ vij = dij Go Back

3N is the number of terminal nodes 4 Full Screen Only the values i and j for which Mij is minimum need to be recorded, saving the entire matrix is unnecessary Close

Quit The main virtue of neighbor-joining is its efficiency. It can be used on very large data sets for which other phylogenetic analysis are computationally prohibitive. Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Unlike the UPGMA, NJ does not assume that all lineages evolve at Page 37 of 60 the same rate and produces an unrooted tree. Go Back

Full Screen

Close

Quit Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 38 of 60

Go Back

Full Screen

Close

Quit 6.2. Pros & Cons of Distance Methods

Introduction • Pros: Tree Terminology – They are very fast, Homology – There are a lot of models to correct for multiple, Molecular Evolution Evolutionary Models – LRT may be used to search for the best model. Distance Methods • Cons: Maximum Parsimony Searching Trees – Information about evolution of particular characters is lost Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 39 of 60

Go Back

Full Screen

Close

Quit 7. Maximum Parsimony

Most biologists are familiar with the usual notion of parsimony in science, Introduction which essentially maintains that simpler hypotheses are prefereable to more com- Tree Terminology plicated ones and that ad hoc hypotheses should be avoided whenever possible. Homology The principle of maximum parsimony (MP) searches for a tree that requires the Molecular Evolution smallest number of evolutionary changes to explain differences observed Evolutionary Models among OTUs. Distance Methods In general, parsimony methods operate by selecting trees that minimize the total Maximum Parsimony tree length: the number of evolutionary steps (transformation of one Searching Trees character state to another) require to explain a given set of data. Tree Confidence PC Lab In mathematical terms: from the set of possible trees, find all trees τ such that Phylogenetic Links L(τ) is minimal Credits B N P P L(τ) = wj.diff(xk0j, xk00j) Title Page k=1 j=1 JJ II Where L(τ) is the length of the tree, B is the number of branches, N is the 0 00 number of characters, k and k are the two nodes incident to each branch J I k, xk0j and xk00j represent either element of the input data matrix or optimal character-state assignments made to internal nodes, and diff(y, z) is a function Page 40 of 60 specifying the cost of a transformation from state y to state z along any branch. Go Back The coefficient wj assigns a weight to each character. Note also that diff(y, z) 5 needs not to be equal diff(z, y). Full Screen

5For methods that yield unrooted trees diff(y, z) =diff(z, y). Close

Quit Determining the length of the tree is computed by algorithmic methods[4, 15]. However, we will show how to calculate the length of a particular tree topology Introduction ((W,Y),(X,Z))6 for a specific site of a sequence, using Fitch (A) and transversion Tree Terminology parsimony (B)7: Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab • With equal costs, the minimum is 2 steps, achieved by 3 ways (internal Phylogenetic Links nodes ”A-C”, ”C-C”, ”G-C”), Credits • The alternative trees ((W,X),(Y,Z)) and ((W,Z),(Y,X)) also have 2 steps, Title Page • Therefore, the character is said to be parsimony-uninformative,8 JJ II • With 4:1 ts:tv weighting scheme, the minimum length is 5 steps, achived by two reconstructions (internal nodes ”A-C” and”G-C”), J I Page 41 of 60 • By evaluating the alternative topologies finds a minimum of 8 steps, Go Back 6Newick format 7Matrix character states: A,C,G,T Full Screen 8A site is informative, only it favors one tree over the others

Close

Quit • Therefore, under unequal costs, the character becomes informative. The use of unequal costs may provide more information for phylogenetic Introduction reconstruction, Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 42 of 60

Go Back

Full Screen

Close

Quit 7.1. Pros & Cons of MP

• Pros: Introduction Tree Terminology – Does not depend on an explicit model of evolution, Homology – At least gives both, a tree and the associated hypotheses of character Molecular Evolution evolution, Evolutionary Models – If homoplasy is rare, gives reliable results, Distance Methods Maximum Parsimony • Cons: Searching Trees – May give misleading results if homplasy is common (Long branch Tree Confidence attraction effect) PC Lab – Underestimate branch lengths Phylogenetic Links Credits – Parsimony is often justified by phylosophical, instead statistical grounds. Title Page

JJ II

J I

Page 43 of 60

Go Back

Full Screen

Close

Quit 8. Searching Trees

8.1. How many trees are there? Introduction Tree Terminology The obvious method for searching the most parsimonious tree is to consider Homology all posible trees, one after another, and evaluate them. We will see that this Molecular Evolution procedure becomes impossible for more than a few number of taxa (∼11). Evolutionary Models Felsenstein [2] deduced that: Distance Methods T Maximum Parsimony B(T ) = Q (2i − 5) Searching Trees i=3 Tree Confidence An unrooted, fully resolved tree has: PC Lab • T terminal nodes, T − 2 internal nodes, Phylogenetic Links Credits • 2T − 3 branches; T − 3 interior and T peripheral, Title Page • B(T ) alternative topologies, JJ II • Adding a root, adds one more internal node and one more internal branch, J I

• Since the root can be placed along any 2T − 3 branches, the number of Page 44 of 60 possible rooted trees becomes, Go Back T Q B(T ) = (2T − 3) (2i − 5) Full Screen i=3

Close

Quit OTUs Rooted trees Unrooted trees

2 1 1 Introduction 3 3 1 Tree Terminology 4 15 3 Homology 5 105 15 Molecular Evolution 6 954 105 Evolutionary Models 7 10,395 954 Distance Methods 8 135,135 10,395 9 2,027,025 135,135 Maximum Parsimony 10 34,459,425 2,027,025 Searching Trees 11 > 654x106 > 34x106 Tree Confidence 15 > 213x1012 > 7x1012 PC Lab 20 > 8x1021 > 2x1020 Phylogenetic Links 50 > 6x1081 > 2x1076 Credits

Title Page The observable universe has about 8.8x1077 atoms JJ II There is not memory neither time to evaluate all the trees!! J I

Page 45 of 60 For 11 or fewer taxa, a brute-force exhaustive search is feasible!! For more than 11 taxa an heuristic search is the best solution!! Go Back

Full Screen

Close

Quit 8.2. Exhaustive search methods

• Every possible tree is examined; the shortest tree will always be Introduction found, Tree Terminology Homology • Taxon addition sequence is important only in that the algorithm needs Molecular Evolution to remember where it is, Evolutionary Models • Search will also generate a list of the lenths of all possible trees, which Distance Methods can be plotted as an histogram, Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 46 of 60

Go Back

Full Screen

Close

Quit Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 47 of 60

Go Back

Full Screen

Close

Quit 8.3. Heuristic search methods

When a data set is too large to permit the use of exact methods, optimal Introduction trees must be sought via heuristic approaches that sacrifice the guarantee of Tree Terminology optimality in favor of reduced computing time Homology Molecular Evolution Two kind of algorithms can be used: Evolutionary Models 1. Greedy Algorithms Distance Methods Maximum Parsimony 2. Branch Swapping Algorithms Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 48 of 60

Go Back

Full Screen

Close

Quit 8.3.1. Greedy Algorithms

Introduction Tree Terminology Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Strategies of this sort are often called the greedy algorithm because they seize Title Page the first improvement that they see. Two major algorithms exist: JJ II • Stepwise Addition, J I • Star Decomposition9 Page 49 of 60

Go Back Both algoritms are prone to entrapment in local optima Full Screen

9The most common star decomposition method is the NJ algorithm Close

Quit Stepwise Addition

• Use addition sequence similar to that for an exhaustive search, but at each Introduction addition, determines the shortest tree, and add the next taxon to that tree. Tree Terminology Homology • Addition sequence will affect the tree topology that is found! Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 50 of 60

Go Back

Full Screen

Close

Quit Star Decomposition

• Start with all taxa in an unresolved (star) tree, Introduction Tree Terminology • Form pairs of taxa, and determine length of tree with paired taxa. Homology Molecular Evolution Evolutionary Models Distance Methods Maximum Parsimony Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 51 of 60

Go Back

Full Screen

Close

Quit 8.3.2. Branch Swapping Algorithms

It may be possible to improve the greedy solutions by performing sets of pre- Introduction defined rearrangements, or branch swappings. Examples of branch swapping Tree Terminology algorithms are: Homology Molecular Evolution • NNI - Nearest Neighbor Interchange, Evolutionary Models • SPR - Subtree Pruning and Regrafting, Distance Methods Maximum Parsimony • TBR - Tree Bisection and Reconnection. Searching Trees Tree Confidence PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 52 of 60

Go Back

Full Screen

Close

Quit 9. Tree Confidence

9.1. Non-parametric bootstrapping Introduction Tree Terminology • For many simple distributions there are simple equations for calculating Homology confidence intervals around an estimate (e.g., std error of the mean) Molecular Evolution • Trees, however are rather complicated structures, and it is extremely dif- Evolutionary Models ficult to develop equations for confidence intervals around a phylogeny. Distance Methods Maximum Parsimony • One way to measure the confidence on a phylogenetic tree is by means Searching Trees of the bootstrap non-parametric method of resampling the same sample Tree Confidence many times. PC Lab Phylogenetic Links Credits

Title Page

JJ II

J I

Page 53 of 60

Go Back

Full Screen

Close

Quit • Each sample from the original sample is a pseudoreplicate. By gener- ation many hundred or thousand pseudoreplicates, a majority consensus Introduction rule tree can be obtained. Tree Terminology • High bootstrap values > 90% is indicative of strong phylogenetic signal. Homology Molecular Evolution • Bootstrap can be viewed as a way of exploring the robustness of phyloge- Evolutionary Models netic inferences to perturbations Distance Methods • Jackkniffe is another non-parametric resampling method that differen- Maximum Parsimony tiates from bootstrap in the way of sampling. Some proportion of the Searching Trees characters are randomly selected and deleted (withouth replacement). Tree Confidence PC Lab • Another technique used exclusively for parsimony is by means of Decay index or Bremmer support. This is the length difference between the Phylogenetic Links shortest tree including the group and the shortest tree excluding the group Credits 10 (The extra-steps required to overturn a group. Title Page • DI & BPs generally correlates!! JJ II

J I

Page 54 of 60

Go Back

Full Screen

10See [19] for a practical example using PAUP*[17] Close

Quit 10. PC Lab

10.1. Download Programs Introduction Tree Terminology • PHYLIP 3.6 http://evolution.genetics.washington.edu/phylip.html Homology Molecular Evolution • MEGA 3.0 http://www.megasoftware.net Evolutionary Models • TREE-PUZZLE http://www.tree-puzzle.de/ Distance Methods Maximum Parsimony • MODELTEST http://darwin.uvigo.es/ Searching Trees • MrBayes http://morphbank.ebc.uu.se/mrbayes/download.php Tree Confidence PC Lab • TreeView http://taxonomy.zoology.gla.ac.uk/rod/treeview.html Phylogenetic Links Credits

Title Page

JJ II

J I

Page 55 of 60

Go Back

Full Screen

Close

Quit 11. Phylogenetic Links

• Software: Introduction Tree Terminology – The Felsenstein node http://evolution.genetics.washington.edu/phylip/software. html Homology – The R. Page Lab. http://taxonomy.zoology.gla.ac.uk/software/software.html Molecular Evolution Evolutionary Models • Courses: Distance Methods – Molecular Systematics and Evolution of Microorganisms. http://www.dbbm. Maximum Parsimony fiocruz.br/james/index.html Searching Trees – Workshop on Molecular Evolution http://workshop.molecularevolution.org/ Tree Confidence – P. Lewis MCB/EEB Course http://www.eeb.uconn.edu/Courses/EEB372/ PC Lab • Tools: Phylogenetic Links Credits – Clustalw at EBI http://www.ebi.ac.uk/clustalw/

– Phylip Web http://cbrmain.cbr.nrc.ca:8080/cbr/jsp/ServicePage e.jsp?id=38 Title Page – Phylip Dochttp://www.hgmp.mrc.ac.uk/Registered/Help/phylip/phylip.html JJ II

J I

Page 56 of 60

Go Back

Full Screen

Close

Quit 12. Credits

This presentation is based on:11 Introduction Tree Terminology • Major Book or Chapters References: Homology Molecular Evolution – Swofford, D. L. et al. 1996. Phylogenetic inference [18]. Evolutionary Models – Harvey, P. H. et al. 1996. New Uses for New Phylogenies [7]. Distance Methods – Li, W. S. 1997 . Molecular Evolution [9]. Maximum Parsimony – Page, R. & Holmes, E. 1998. Molecular evolution. A phylogenetic approach [7]. Searching Trees – Nei, M. & Kumar, S. 1999 . Molecular evolution and phylogenetics [11]. Tree Confidence – Salemi, M. & Vandamme, A. (ed.) 2003. The phylogenetic handbook [14]. PC Lab – Felsenstein, J. 2004. Inferring phylogenies [3]. Phylogenetic Links • On Line Phylogenetic Resources: Credits

– http://www.dbbm.fiocruz.br/james/index.html .Molecular Systematics and Title Page Evolution of Microorganisms. The Natural History Museum, London and In- stituto Oswaldo Cruz, FIOCRUZ. JJ II – Peter Foster’s ”The Idiot’s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies” at http://www.bioinf.org/molsys/data/idiots.pdf J I

• Slides Production: Page 57 of 60

– Latex and pdfscreen package. Go Back

Full Screen

11HJD take responsibility for innacuracies of this presentation. Close

Quit References

[1] L. L. Cavalli-Sforza and A. W. F. Edwards. Phylogenetic Analysis: Models Introduction and estimation procedures. American Journal of Human Genetics, 19:223– Tree Terminology 257, 1967. Homology Molecular Evolution [2] J. Felsenstein. The number of evolutionary trees. (Correction:, Vol.30, Evolutionary Models p.122, 1981). Syst. Zool., 27:27–33, 1978. Distance Methods [3] J. Felsenstein. Inferring phylogenies. Sinauer associates, Inc., Sunderland, Maximum Parsimony MA, 2004. Searching Trees Tree Confidence [4] W. M. Fitch. Toward defining the course of evolution: Minimum change PC Lab for a specified tree topology. Syst Zool, 20:406–416, 1971. Phylogenetic Links [5] W. M. Fitch and E. Margoliash. Construction of phylogenetic trees: a Credits method based on mutation distances as estimated from cytochrome c se- Title Page quences is of general applicability. Science, 155:279–284, 1967.

[6] W. S. Fitch. Distinguishing homologous from analogous proteins. Syst. JJ II Zool., 19:99–113, 1970. J I

[7] P. H. Harvey, A. J. Leigh Brown, John Maynard Smith, and S. Nee. New Page 58 of 60 Uses for New Phylogenies. Oxford Univ Press, Oxford. England, 1996. Go Back [8] M. Kimura. The neutral theory of molecular evolution. Cambridge Univer- sity Press, Cambridge, London, 1983. Full Screen

Close

Quit [9] W.-S. Li. Molecular evolution. Sinauer Associates, Inc., Sunderland, MA, 1997. Introduction [10] C. D. Michener and R. R. Sokal. A quantitative approach to a problem of Tree Terminology classification. Evolution, 11:490–499, 1957. Homology [11] M. Nei and S. Kumar. Molecular evolution and phylogenetics. Blackwell Molecular Evolution Science Ltd., Oxford, London, first edition, 1998. Evolutionary Models Distance Methods [12] R. D. M. Page and E. C. Holmes. Molecular evolution. A phylogenetic Maximum Parsimony approach. Blackwell Science Ltd., Oxford, London, first edition, 1998. Searching Trees [13] M. Robinson-Rechavi and D. Huchon. RRTree: relative-rate tests between Tree Confidence groups of sequences on a phylogenetic tree. Bioinformatics, 16:296–297, PC Lab 2000. Phylogenetic Links Credits [14] M. Salemi and A. M. Vandamme (ed). The phylogenetic handbook. A prac- tical approach to DNA and protein phylogeny. Cambridge University Press, Title Page UK, 2003. JJ II [15] D. Sankoff and P. Rousseau. Locating the vertixes of a Steiner tree in an arbitrary metric space. Math. Progr., 9:240–276, 1975. J I

[16] R. R. Sokal and P. H. Sneath. Numerical taxonomy. W. H. Freeman, San Page 59 of 60 Francisco, 1963. Go Back [17] D. L. Swofford. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts, Full Screen 2003. Close

Quit [18] D. L. Swofford, G. J. Olsen, P. J. Waddell, and D. M. Hillis. Phylogenetic inference. In D. M. Hillis, C. Moritz, and B. K. Mable, editors, Molecular Introduction systematics (2nd ed.), pages 407–514. Sinauer Associates, Inc., Sunderland, Tree Terminology Massachusetts, 1996. Homology [19] D. L. Swofford and J. Sullivan. Phylogeny inference based on parsimony Molecular Evolution and other methods using PAUP*. Theory and practice. In M. Salemi and Evolutionary Models A. M. Vandamme, editors, The phylogenetic handbook. A practical approach Distance Methods to DNA and protein phylogeny, pages 160–206. Cambridge University Press, Maximum Parsimony UK, 2003. Searching Trees [20] E. O. Wiley, D. Siegel-Causey, D. R. Brooks, and V. A. Funk. The Compleat Tree Confidence Cladist.A Primer of Phylogenetic Procedures. The University of Kansas PC Lab Museum of Natural History. Lawrence, Special Publication No19, 1991. Phylogenetic Links Credits [21] E. Zuckerkandl and L. Pauling. Molecules as documents of evolutionary history. J Theor Biol, 8:357–366, 1965. Title Page

JJ II

J I

Page 60 of 60

Go Back

Full Screen

Close

Quit