Phylogenetic Analysis

CHAPTER 9
Phylogenetic Analysis*

OUTLINE

9.1 Phylogenetics and the Widespread Use of the Phylogenetic Tree
9.2 Phylogenetic Trees
    9.2.1 Phylogenetic Trees, Phylograms, Cladograms, and Dendrograms
9.3 Phylogenetic Analysis Tools
9.4 Principles of Phylogenetic-Tree Construction
    9.4.1 Selection of the Appropriate Molecular Marker
    9.4.2 Multiple Sequence Alignment
    9.4.3 Selection of a Model of Evolution
    9.4.4 Construction of the Phylogenetic Tree
        9.4.4.1 Distance-Based (Distance-Matrix) Methods
        9.4.4.2 Character-Based Methods
    9.4.5 Assessment of the Reliability of a Phylogenetic Tree
9.5 Monophyly, Polyphyly, and Paraphyly
9.6 Species Trees Versus Gene Trees
References

9.1 PHYLOGENETICS AND THE WIDESPREAD USE OF THE PHYLOGENETIC TREE

Phylogeny refers to the evolutionary history of species. Phylogenetics is the study of phylogenies, that is, the study of the evolutionary relationships of species. Phylogenetic analysis is the means of estimating these evolutionary relationships. In molecular phylogenetic analysis, the sequence of a common gene or protein can be used to assess the evolutionary relationship of species. The evolutionary relationship obtained from phylogenetic analysis is usually depicted as a branching, treelike diagram: the phylogenetic tree.

Historically, the use of phylogenetic trees was restricted more or less to the study of evolutionary biology, and to disciplines like systematics and taxonomy. However, with the advent of sequencing and the widespread use of cladistics, the use of phylogenetic trees has pervaded many branches of biology and beyond. Construction of phylogenetic/evolutionary trees is now widespread in many areas of study where evolutionary divergence can be studied and demonstrated, be it in pathogens, biological macromolecules, or languages.

Phylogenetics also provides the basis for comparative genomics, a more recent term that came into existence in the age of genomics. Comparative genomics is the study of the interrelationships of the genomes of different species. It helps identify regions of similarity and difference among genomes. The comparison can be made at different levels, such as comparison of whole-genome sequences, comparison of genome sequences involving blocks of conserved synteny, comparison of the number of protein-coding genes, comparison of regulatory sequences, or other focused comparisons. An important application of comparative genomics is gene finding. From the standpoint of evolutionary biology, comparative genomics helps in understanding the evolutionary relationships among genomes.

A resource for comparative genomic analysis is VISTA, which can be accessed at http://genome.lbl.gov/vista/index.shtml.

* The opinions expressed in this chapter are the author's own and they do not necessarily reflect the opinions of the FDA, the DHHS, or the Federal Government.

Bioinformatics for Beginners. 2014 Published by Elsevier Inc.

9.2 PHYLOGENETIC TREES

A phylogenetic tree or evolutionary tree is a diagrammatic representation of the evolutionary relationships among various taxa (Figure 9.1 A-D). It is a branching diagram composed of nodes and branches. The branching pattern of a tree is called the topology of the tree. The nodes represent taxonomic units, such as species (or higher taxa), populations, genes, or proteins. A branch is also called an edge, and it represents the time estimate of the evolutionary relationships among the taxonomic units. One branch can connect only two nodes. In a phylogenetic tree, the terminal nodes represent the operational taxonomic units (OTUs), or leaves. The OTUs are the actual objects being compared (such as species, populations, or gene or protein sequences), whereas the internal nodes represent hypothetical taxonomic units (HTUs). An HTU is an inferred unit and it represents the last common ancestor (LCA) to the nodes arising from this point. Descendants (taxa) that split from the same node form sister groups, and a taxon that falls outside the clade[a] is called an outgroup. For example, in Figure 9.1 B, T2 and T3 are sister groups, and T1 is an outgroup to T2 and T3.

Phylogenetic trees can be scaled or unscaled. In a scaled tree, the branch length is proportional to the amount of evolutionary divergence (e.g. the number of nucleotide substitutions) that has occurred along that branch. In an unscaled tree, the branch length is not proportional to the amount of evolutionary divergence, but usually the actual number is indicated somewhere on the branch.

Phylogenetic trees can be rooted (Figure 9.1 A and B) or unrooted (Figure 9.1 C). A rooted tree has a node (the root) from which the rest of the tree diverges. This root is frequently referred to as the last universal common ancestor (LUCA), from which the other taxonomic groups have descended and diverged over time. In molecular phylogenetics, the LUCA and LCA are represented by DNA or protein sequences. Obtaining a rooted tree is ideal, but most phylogenetic-tree-reconstruction algorithms produce unrooted trees.

FIGURE 9.1 Different forms of presentation of the phylogenetic tree. The phylogenetic tree in D is a dendrogram derived from hierarchical clustering (see text). A, B, and D show rooted trees, while C shows an unrooted tree. Taxa that share specific derived characters are grouped into clades. (A) Smaller clades located within a larger clade are called nested clades. (B) The terminal nodes represent the operational taxonomic units, also called "leaves"; each terminal node could be a taxon (species or higher taxa), or a gene or protein sequence. The internal nodes represent hypothetical taxonomic units. An HTU represents the last common ancestor to the nodes arising from this point. Two descendants that split from the same node are called sister groups and a taxon that falls outside the clade is called an outgroup. Rooted trees have a node from which the rest of the tree diverges, frequently called the last universal common ancestor (LUCA).

[a] Taxa that share specific derived characters are grouped more closely together than those that do not. The groups are called clades; each clade consists of an ancestor and all of its descendants.

9.2.1 Phylogenetic Trees, Phylograms, Cladograms, and Dendrograms

In the context of molecular phylogenetics, the expressions phylogenetic tree, phylogram, cladogram, and dendrogram are used interchangeably to mean the same thing: a branching tree structure that represents the evolutionary relationships among the taxa (OTUs), which are gene/protein sequences. In the traditional evolutionary sense, the OTUs in the phylogenetic tree are represented by species. A phylogram is a scaled phylogenetic tree in which the branch lengths are proportional to the amount of evolutionary divergence. For example, a branch length may be determined by the number of nucleotide substitutions that have occurred between the connected branch points. A cladogram is a branching hierarchical tree that shows the relationships between clades; cladograms are unscaled. The word dendrogram means a hierarchical cluster arrangement where similar objects (based on some defined criteria) are grouped into clusters; hence, a dendrogram shows the relationships among various clusters (Figure 9.1 D). Dendrograms are also used outside the scope of phylogenetics and even outside of biology. Dendrograms are frequently used in computational molecular biology to illustrate the branching based on clustering of genes or proteins.

9.3 PHYLOGENETIC ANALYSIS TOOLS

The most convenient way to construct a phylogenetic tree is to use online tools. A good online phylogenetic analysis tool is available at Phylogeny.fr (http://www.phylogeny.fr/). This server provides "robust phylogenetic analysis for the non-specialist." The user can build a phylogenetic tree using the "One Click" option with all the default settings. Another tool for [...] resource to understand MEGA. Another widely used and versatile downloadable software tool is PHYLIP (Phylogenetics Inference Package), which is a free package of programs for inferring phylogenies. It was developed by Joseph Felsenstein of the University of Washington (http://evolution.genetics.washington.edu/phylip.html). A widely used and affordable commercial software program for phylogenetic analysis is PAUP (Phylogenetic Analysis Using Parsimony (and Other Methods)), written by David Swofford. Another downloadable phylogenetic software tool is MacClade (http://macclade.org/macclade.html), written by David Maddison and Wayne Maddison. On the MacClade link, click on "Acquiring MacClade" or access the downloadable link directly at http://macclade.org/download.html.

There are several other phylogenetic analysis tools available on the web. Many of these require special formatting of data for entry, and they send the results through e-mail instead of providing real-time display of results. These tools can be checked out at the following link: http://molbiol-tools.ca/Phylogeny.htm.

9.4 PRINCIPLES OF PHYLOGENETIC-TREE CONSTRUCTION

Although a number of online resources that can be used to construct/reconstruct phylogenetic trees have been mentioned above, it is nevertheless important to understand the assumptions and steps involved in phylogenetic-tree construction for conceptual clarity.

There are certain assumptions behind making a phylogenetic tree, such as (1) the sequences are homologous (that is, the sequences share a common ancestry and diverged through time as they evolved) and (2) each position evolved independently. The quality of the multiple sequence alignment is the key to obtaining a reliable phylogenetic tree. When using coding sequences, it is desirable to use the protein sequences to reconstruct the phylogenetic tree.

Construction of a phylogenetic tree involves the following steps:
(1) Selection of the appropriate molecular marker (genes/proteins/mitochondrial DNA)
(2) Multiple sequence alignment
(3) Selection of a model of evolution
(4) Construction of the phylogenetic tree
(5) Assessment of the reliability of the tree

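The tree vocabulary used in Section 9.2 (nodes, branches, OTUs, HTUs, sister groups, outgroup, scaled branch lengths) can be made concrete with a small data structure. The following is a minimal Python sketch rather than code from the chapter: the class `Node`, the helper functions, the label `HTU1`, and all branch-length values are hypothetical and chosen only for illustration. The topology follows the Figure 9.1 B example, in which T2 and T3 are sister groups and T1 is their outgroup.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a rooted tree; branch_length is the edge to its parent."""
    name: str
    branch_length: float = 0.0  # hypothetical amount of divergence
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> List[Node]:
    """Terminal nodes (OTUs) below a node, in left-to-right order."""
    if node.is_leaf():
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def ancestors(node: Optional[Node]) -> List[Node]:
    """The node itself followed by its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def last_common_ancestor(a: Node, b: Node) -> Node:
    """The internal node (HTU) at which the lineages of a and b meet."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    raise ValueError("nodes are not in the same tree")


def path_length(a: Node, b: Node) -> float:
    """Sum of branch lengths along the path between two nodes."""
    lca = last_common_ancestor(a, b)
    total = 0.0
    for start in (a, b):
        n = start
        while n is not lca:
            total += n.branch_length
            n = n.parent
    return total


# Rooted tree with the topology ((T2, T3), T1); lengths are made up.
root = Node("root")
t1 = root.add(Node("T1", 0.30))    # outgroup to T2 and T3
htu = root.add(Node("HTU1", 0.10))  # inferred ancestor of T2 and T3
t2 = htu.add(Node("T2", 0.05))
t3 = htu.add(Node("T3", 0.07))

print("OTUs:", [n.name for n in leaves(root)])
print("Sister groups under HTU1:", [n.name for n in htu.children])
print("LCA of T2 and T3:", last_common_ancestor(t2, t3).name)
print("LCA of T1 and T2:", last_common_ancestor(t1, t2).name)
```

In a scaled tree the values stored in `branch_length` carry meaning, so `path_length(t2, t3)` sums the divergence separating two OTUs. In an unscaled tree only the topology (who is attached to whom) would be meaningful.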