Basics for the Construction of Phylogenetic Trees
Article ID: WMC002563
ISSN 2046-1690
Corresponding Author: Mr. B P Niranjan Reddy, Senior Research Fellow, School of Sciences in Biotechnology, Jiwaji University - India
Submitting Author: Mr. B.P.Niranjan Reddy, Senior Research Fellow, School of Sciences in Biotechnology, Jiwaji University - India
Article Type: Review articles
Submitted on: 03-Dec-2011, 11:47:15 AM GMT
Published on: 03-Dec-2011, 08:06:40 PM GMT
Article URL: http://www.webmedcentral.com/article_view/2563
Subject Categories: BIOLOGY
Keywords: Phylogenetic tree, Model selection, Bootstrapping, Phylogeny free software
How to cite the article: Niranjan Reddy B P. Basics for the Construction of Phylogenetic Trees. WebmedCentral BIOLOGY 2011;2(12):WMC002563
Source(s) of Funding: None
Competing Interests: None
Additional Files: Links to some very useful web pages for phylogenet
Downloaded from http://www.webmedcentral.com on 05-Dec-2011, 05:19:58 AM
Author(s): Niranjan Reddy B P

Abstract

A phylogeny, a diagram of an evolutionary network, is used to infer the phylogenetic relationships among species or genes. During the pre-genomic era, phylogenetic analyses based on morphological, biological, and bionomic characters, allozymes, and RFLP data were used extensively to infer the evolutionary relationships among species. With the advent of high-throughput sequencing technologies and the development of extensive statistical tools, an increasing amount of sequence information has become available in the public domain. This has revolutionized the field of phylogenetics and has opened up opportunities for drawing and reconstructing phylogenetic relationships with more confidence and accuracy. Consequently, phylogenetics has become an integral part of any sequencing-associated research project. Although many publications on understanding phylogenetic trees are available, most of them are written either for experts in the field or for bioinformaticians. A beginner needs to start from a document that brings together all the basics along with brief accounts of the modern developments in phylogenetics. Considering the importance of phylogenetic analysis in modern science, this review attempts to simplify the understanding of phylogenetic tree construction and of the availability and usability of the different methods and software tools for inferring trees.

Introduction

The field of phylogenetics has become an integral part of modern biological research. Constructing a phylogenetic tree has become such an easy task that even a novice can build a nearly perfect tree with little effort. This is largely due to the free availability of many tree construction, viewing, and editing tools that demand very little knowledge of phylogenetic construction procedures (i.e., it is not mandatory to know the basics of the models and algorithms that work behind the scenes). Phylogenetic analysis can be performed to infer the evolutionary relationships among the members of the taxa, to understand the evolution of genomes and gene families, to classify genes into classes such as orthologs, paralogs, and in- or out-paralogs, and to understand the evolution of new functions through duplications, horizontal gene transfers, gene conversion, recombination, co-evolution, etc. (Hafner and Nadler, 1988; Nei, 2003; Pagel, 2000). Phylogenetic analysis provides a powerful tool for comparative genomics (Pagel, 2000).

Genome sequencing projects are providing valuable sequence information that is widely used to infer the evolutionary relationships between different species or genes. Species phylogenies are generally inferred from paleontological or geological information or from morphological traits (Nei, 2003). These phylogenies act as a reference for assessing the veracity of a phylogenetic tree constructed from any phylogenetically informative marker. With the increased availability of whole genome sequences, the field of phylogenomics (i.e., the use of either whole genomes or a large number of genes for phylogenetic analysis) is becoming popular among evolutionary biologists (Fitz Gibbon and House, 1999; Korbel et al., 2002; Snel et al., 1999; Thornton and DeSalle, 2000). Many phylogenomics-based reports have been published, and most of them agree with the reference species phylogenies inferred from paleontological and/or geological information (Kumar and Filipski, 2001). Furthermore, phylogenomic reconstruction helps in supplementing or correcting earlier working phylogenetic relationships (Kumar and Filipski, 2001).

Phylogenetic trees can be drawn from genes (nucleotide or protein sequences), morphological, biological, and bionomic characters, restriction fragment polymorphisms, whole genome orthologs, or geological records (Horner and Pesole, 2004; Klenk and Göker, 2010; Nikaido et al., 2001; Snel et al., 1999). Although it is very easy to construct phylogenetic trees with user-friendly software tools, a basic understanding of the processes running behind the scenes greatly helps to improve the quality of the tree, because it allows better input values to be given to the programs. Thus, this review centers on the basic concepts of phylogenetic analysis using nucleotide or amino acid sequences.

Basic concepts and definitions

A phylogenetic tree, also known as an “evolutionary tree”, is the graphical representation of the evolutionary relationship between the taxa or genes in question. Dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree. Different terms are used to describe the characteristics of a phylogenetic tree. A cladogram is a dendrogram that explains only the genealogy of the taxa and says nothing about branch lengths or times of divergence (Page and Holmes, 1998; Procter et al., 2010). A phylogram (additive tree) is a phylogenetic tree that explicitly represents the number of character changes (nucleotide or amino acid changes) through its branch lengths (Page and Holmes, 1998; Procter et al., 2010). In a phylogram, the evolutionary distance between any two taxa is given by the sum of the lengths of the branches connecting them. Although these trees may be rooted or unrooted, they often lack a root. A chronogram (ultrametric tree) is a rooted phylogenetic tree that possesses all the characteristics of an additive tree and, in addition, under the assumption of a molecular clock, allows the molecular divergence time between taxa to be determined (Page and Holmes, 1998). The molecular clock hypothesis assumes that every site in a protein or coding nucleotide sequence evolves at a constant rate in all species (Zuckerkandl and Pauling, 1962). Furthermore, in a chronogram the taxa are placed equidistant from the ancestor, which is not the case in a phylogram. Phenetics (taximetrics) infers the relationships between taxa, usually using morphology or other observable traits as phylogenetically informative markers (Duncan and Baum, 1981; Mayr, 1965; Page and Holmes, 1998).

A tree that shows the evolution of genes is known as a gene tree (Snel et al., 1999), while a tree that shows the evolution of species is known as a species tree. It is important to note that gene trees do not necessarily follow

A rooted tree represents the divergence of a group of related species from their last common ancestor (the root) by successive branching events over time. In contrast, an unrooted phylogenetic tree reveals the relationships among species or taxa without identifying the most recent common ancestor, or root. Rooted phylogenies are constructed by including unrelated species or genes in the phylogenetic reconstruction. For tree rooting, the very distantly related taxa are called the out-group and the relatively closely related taxa the in-group. The terminal nodes of a phylogenetic tree are called operational taxonomic units (OTUs). Branches that do not join any of the terminal nodes (leaves or OTUs; fig. 1) directly, but only via internal nodes, are called “ancestral states” or “hypothetical taxonomic units”; they might have appeared during evolution and cannot be seen at present (Page and Holmes, 1998; Pagel, 2000). The internal branch points in a species phylogenetic tree represent speciation events, while in a gene family's phylogenetic tree they represent duplication events (Pagel, 2000). The internal branches may be bifurcating or multi-furcating. Analysis of gene families generally yields multi-furcating branches, and each of the small multi-furcating branches forms a subtree or clade (Kao et al., 1999; Nei et al., 1997; Nei and Rooney, 2005).

The whole process of constructing a phylogenetic tree is divided into five steps, viz.
Step 1: Choosing appropriate markers for the phylogenetic analysis
Step 2: Multiple sequence alignments
Step 3: Selection of an evolutionary model
Step 4: Phylogenetic reconstruction
Step 5: Evaluation of the phylogenetic tree

Step 1: Choosing appropriate markers for the phylogenetic analysis
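
The phylogram and chronogram definitions given under the basic concepts can be made concrete with a small sketch. It is only an illustration under stated assumptions: the tree, the taxon names (A, B, C), the branch lengths, and the function names are all hypothetical and do not come from the article or from any phylogenetics package. The sketch sums branch lengths along the path joining two taxa (the additive-tree distance) and checks whether all taxa lie equidistant from the root (the ultrametric property of a chronogram).

```python
# Hypothetical rooted tree: child -> (parent, branch length to parent).
# Taxon names and branch lengths are invented for illustration only.
TREE = {
    "A": ("n1", 2.0),
    "B": ("n1", 2.0),
    "n1": ("root", 1.0),
    "C": ("root", 3.0),
}


def path_to_root(tree, node):
    """Map the node and each of its ancestors to its distance from the node."""
    distance = 0.0
    seen = {node: 0.0}
    while node in tree:
        parent, length = tree[node]
        distance += length
        seen[parent] = distance
        node = parent
    return seen


def patristic_distance(tree, a, b):
    """Sum of the branch lengths on the path connecting taxa a and b."""
    up_a = path_to_root(tree, a)
    up_b = path_to_root(tree, b)
    # With non-negative branch lengths, the shortest route through a shared
    # ancestor is the one through the most recent common ancestor.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def leaves(tree):
    """Terminal nodes (OTUs): nodes that are never listed as a parent."""
    parents = {parent for parent, _ in tree.values()}
    return sorted(node for node in tree if node not in parents)


def is_ultrametric(tree, root="root", tol=1e-9):
    """True if every leaf is the same distance from the root."""
    depths = [path_to_root(tree, leaf)[root] for leaf in leaves(tree)]
    return max(depths) - min(depths) <= tol


if __name__ == "__main__":
    print(patristic_distance(TREE, "A", "B"))
    print(patristic_distance(TREE, "A", "C"))
    print(is_ultrametric(TREE))
```

In this toy tree every leaf sits three units from the root, so it behaves like a chronogram. Lengthening the branch leading to C would keep the distances additive but break the equidistance, which is the distinction the text draws between a phylogram and a chronogram.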