Article ID: WMC002563 ISSN 2046-1690

Basics for the Construction of Phylogenetic Trees

Corresponding Author: Mr. B P Niranjan Reddy, Senior Research Fellow, School of Sciences in Biotechnology, Jiwaji University - India

Submitting Author: Mr. B.P.Niranjan Reddy, Senior Research Fellow, School of Sciences in Biotechnology, Jiwaji University - India

Article ID: WMC002563

Article Type: Review articles

Submitted on: 03-Dec-2011, 11:47:15 AM GMT

Published on: 03-Dec-2011, 08:06:40 PM GMT

Article URL: http://www.webmedcentral.com/article_view/2563

Subject Categories:

Keywords: Model selection, Bootstrapping, Phylogeny free software

How to cite the article: Niranjan Reddy B P. Basics for the Construction of Phylogenetic Trees. WebmedCentral BIOLOGY 2011;2(12):WMC002563

Source(s) of Funding: None

Competing Interests: None

Additional Files: Links to some very useful web pages for phylogenet

WebmedCentral > Review articles Page 1 of 11 WMC002563 Downloaded from http://www.webmedcentral.com on 05-Dec-2011, 05:19:58 AM

Basics for the Construction of Phylogenetic Trees

Author(s): Niranjan Reddy B P

Abstract

Phylogeny (a diagram for an evolutionary network) is used to infer the phylogenetic relationships among species or genes. Phylogenetic analyses based on morphological, biological, and bionomic characters, allozymes, and RFLP data were used extensively to infer the evolutionary relationships among species during the pre-genomic era. With the advent of high-throughput sequencing technologies and the development of extensive statistical and analytical tools, an increasing amount of sequence information has become available in the public domain. This has revolutionized the field of phylogenetics and has opened up opportunities for drawing and reconstructing phylogenetic relationships with more confidence and accuracy. Consequently, phylogenetics has today become an integral part of any sequencing-associated research project. Although many publications on understanding phylogenetic trees are available, most of them are written either for experts in the field or for bioinformaticians. A beginner needs to start from a document that brings together all the basics along with brief accounts of the modern developments in phylogenetics. Considering the importance of phylogenetic analysis in modern science, this review attempts to simplify the understanding of phylogenetic tree construction and of the availability and usability of the different methods and software tools for inferring trees.

Introduction

The field of phylogenetics has become an integral part of modern biological research. Constructing a phylogenetic tree has become such an easy task that even a novice can construct a near-perfect phylogenetic tree with little effort. This is mainly due to the free availability of many tree construction, viewing, and editing tools that demand very little knowledge of phylogenetic construction procedures (i.e., it is not mandatory to know the basics of the models and algorithms that operate behind the scenes). Phylogenetic analysis can be performed to infer the evolutionary relationships among the members of a taxon, to understand the evolution of genomes and gene families, to classify genes into various classes such as orthologs, paralogs, and in- or out-paralogs, and to understand the evolution of new functions through duplications, horizontal gene transfers, gene conversion, recombination, co-evolution, etc. (Hafner and Nadler, 1988; Nei, 2003; Pagel, 2000). Phylogenetic analysis provides a powerful tool for comparative genomics (Pagel, 2000).

Genome sequencing projects are providing valuable sequence information that is widely used to infer the evolutionary relationships between different species or genes. Species phylogenies are generally inferred from paleontological/geological information or morphological traits (Nei, 2003). These phylogenies act as a reference for assessing the veracity of a phylogenetic tree constructed from any phylogenetically informative marker. With the increased availability of whole genome sequences, the field of phylogenomics (i.e., the use of either whole genomes or a large number of genes for phylogenetic analysis) is becoming popular among evolutionary biologists (Fitz Gibbon and House, 1999; Korbel et al., 2002; Snel et al., 1999; Thornton and DeSalle, 2000). Many phylogenomics-based reports have been published, and most of them truly reflect the reference species phylogenies inferred from paleontological and/or geological information (Kumar and Filipski, 2001). Furthermore, phylogenomic reconstruction helps in supplementing or correcting earlier working phylogenetic relationships (Kumar and Filipski, 2001). Phylogenetic trees can be drawn from genes (nucleotide or protein sequences), morphological, biological, and bionomic characters, restriction fragment polymorphisms, whole-genome orthologs, or geological records (Horner and Pesole, 2004; Klenk and Göker, 2010; Nikaido et al., 2001; Snel et al., 1999). Although it is very easy to construct phylogenetic trees using user-friendly software tools, basic knowledge of the processes that run behind the scenes greatly helps in improving the quality of phylogenetic tree construction, because it allows better input values to be given to the programs. Thus, this review article focuses on the basic concepts of phylogenetic analysis using nucleotide or amino acid sequences.

Basic concepts and definitions

A phylogenetic tree, also known as an “evolutionary tree”, is

the graphical representation of the evolutionary relationships between the taxa or genes in question. A dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree. Different terms are used to describe the characteristics of a phylogenetic tree. A cladogram is a dendrogram that shows only the genealogy of the taxa and says nothing about branch lengths or times of divergence (Page and Holmes, 1998; Procter et al., 2010). A phylogram (additive tree) is a phylogenetic tree that explicitly represents the number of character changes (nucleotide or amino acid changes, or number of character variations) through its branch lengths (Page and Holmes, 1998; Procter et al., 2010). In a phylogram, the evolutionary distance between any two taxa is given by the sum of the lengths of the branches connecting them. Though these trees may be rooted or unrooted, they often lack a root. A chronogram (ultrametric tree) is a rooted phylogenetic tree that possesses all the characteristics of an additive tree and, in addition, under the assumption of a molecular clock, makes it possible to determine the molecular divergence times between taxa (Page and Holmes, 1998). The molecular clock hypothesis assumes that every site in a protein or coding nucleotide sequence evolves at a constant rate in all species (Zuckerkandl and Pauling, 1962). Furthermore, in a chronogram all taxa are placed equidistant from the ancestor, which is not the case in a phylogram. Phenetics (taximetrics) infers the relationships between taxa, usually using morphology or other observable traits as phylogenetically informative markers (Duncan and Baum, 1981; Mayr, 1965; Page and Holmes, 1998).

A tree that shows the evolution of genes is known as a gene tree (Snel et al., 1999), while a tree that shows the evolution of species is known as a species tree. It is important to note that gene trees do not necessarily follow the species tree. This is because the different selection constraints acting on a gene may give it an evolutionary rate distinct from that of other genes.

How to read a phylogenetic tree:

1. A monophyletic grouping is one in which all species share a common ancestor, and all species derived from that common ancestor are included. This is the only form of grouping accepted as valid by cladists.

2. A paraphyletic grouping is one in which all species share a common ancestor, but not all species derived from that common ancestor are included.

3. A polyphyletic grouping is one in which species that do not share an immediate common ancestor are lumped together, while excluding other members that would link them.

Phylogenetic trees may be rooted or unrooted (see Figure 1 for a typical phylogenetic tree with labeling). A rooted tree represents the divergence of a group of related species from their last common ancestor (the root) by successive branching events over time. In contrast, an unrooted phylogenetic tree reveals the relationships among species/taxa without identifying the most recent common ancestor or root. Rooted phylogenies are constructed by including an unrelated species/gene in the phylogenetic reconstruction. The very distantly related taxa used for rooting the tree are called the out-group, and the relatively closely related taxa are called the in-group. The terminal nodes of a phylogenetic tree are called operational taxonomic units (OTUs). The internal nodes, to which the terminal nodes (leaves/OTUs; Fig. 1) are joined by branches, are called “ancestral states” or “hypothetical taxonomic units”; they represent forms that might have appeared during evolution and cannot be observed at present (Page and Holmes, 1998; Pagel, 2000). The internal branch points in a species tree represent speciation events, while in a gene family tree they represent duplication events (Pagel, 2000). Internal branches may be bifurcating or multi-furcating. Analysis of gene families generally yields multi-furcating branches, and each of the small multi-furcating branches forms a subtree or clade (Kao et al., 1999; Nei et al., 1997; Nei and Rooney, 2005).

The whole process of constructing a phylogenetic tree is divided into five steps:

Step 1: Choosing appropriate markers for the phylogenetic analysis
Step 2: Multiple sequence alignment
Step 3: Selection of an evolutionary model
Step 4: Phylogenetic reconstruction
Step 5: Evaluation of the phylogenetic tree

Step 1: Choosing appropriate markers for the phylogenetic analysis

Any biological information that can be used to infer the evolutionary relationships among taxa is known as a phylogenetically informative marker. It can be DNA, RNA, protein, RFLP, AFLP, ISSR, allozymes, conserved intronic positions, etc. Identification of conserved genetic loci (coding or non-coding) is the first step in analyzing phylogenetic relationships. Both coding (genes) and non-coding genetic regions can be used for the analysis. However, the selected sequence(s) must satisfy the following rules: (a) the sequence should have a long evolutionary history of conservation, as this feature firstly preserves long evolution-selection episodes and secondly aids easy amplification of the target sequences from distant taxa; (b) conserved, slowly evolving genes may be used to resolve the

evolutionary relationships between distantly related species, while fast-evolving genes should be chosen for recently evolved species or intra-species comparisons; (c) amino acid sequences are more informative when inferring the evolutionary relationships among distantly related taxa, and conversely, nucleotide sequences are more informative for recently evolved or closely related species; (d) the sequences to be employed in the phylogenetic analysis should be tested for their usability in a given taxon (for instance, mitochondrial (cytochrome c oxidase subunits I and II (CoxI and II)), chloroplast (trnH-psbA, matK, rpoC, rpoB, rbcL), and nuclear (16S ribosomal RNA) conserved genes are preferred for analyzing animal, plant, and microbial species, respectively, and are called “barcode genes”) (Chantangsi et al., 2007; Liu and Beckenbach, 1992; Raghavendra et al., 2009; Shneer, 2009); (e) finally, if the objective is to estimate the divergence times between taxa, the selected gene or protein sequences should essentially follow the molecular clock hypothesis (Barton et al., 2007; Kumar and Filipski, 2001). However, relaxed molecular clock models have also been proposed recently. This step is followed by successful polymerase chain reaction amplification of the target gene, and then by sequencing and editing of the sequences for further analysis.

Step 2: Multiple sequence alignment

The second step in phylogenetic construction involves the alignment of the edited sequences. Aligning two sequences is known as pair-wise sequence alignment, while an alignment that includes more than two sequences is known as multiple sequence alignment (MSA). Pair-wise sequence alignments can be classified into global and local. A global pair-wise alignment is an end-to-end alignment of two given sequences irrespective of their lengths, while a local alignment finds the best alignment of short sequence segments (http://www.ncbi.nlm.nih.gov/). The main aim of multiple sequence alignment is to compare three or more nucleotide or protein sequences and to provide the basis for calculating the sequence diversities/divergences used to infer the evolutionary relationships among the taxa. Different models (discussed below) have been proposed, based on various assumptions, to calculate the sequence divergences between sequences or taxa. Hence, a correct sequence alignment is mandatory for obtaining a true phylogeny that is representative of the evolutionary relationships among the taxa (Feng and Doolittle, 1987). Numerous algorithms have been proposed to perform the task of correct sequence alignment (Procter et al., 2010). Some algorithms are heuristic, with compromised accuracy, while other groups include slow but accurate algorithms, or algorithms that are both fast and accurate (Edgar, 2004; Notredame et al., 2000). Some algorithms perform MSA by combining the results obtained from more than one program, and hence can yield a reasonably accurate multiple sequence alignment (Rice et al., 2000). Although many programs, both online and offline, are available to perform MSA, manual intervention is often warranted to achieve a correct MSA (Zvelebil and Baum, 2008).

Step 3: Selection of an evolutionary model

Selection of an evolutionary model follows the multiple sequence alignment. According to the neutral theory of evolution, most mutations are neutral and occur at a rate of 10^-6 to 10^-8. Considering this, every site in a DNA sequence must have undergone numerous substitutions, in proportion to the evolutionary time elapsed. Some sequences may evolve at a faster rate than others, and some lineages may undergo faster evolution than others (Lio and Goldman, 1998). Every site in a sequence may evolve differently (Van de Peer and De Wachter, 1997) and may have a different tolerance for mutation. Nucleotide substitutions can be classified into transitions and transversions, and substitutions in coding sequences into synonymous and non-synonymous changes. Although there are twice as many possible transversions as transitions, in nature transitions are generally observed more frequently than transversions. Thus, the ratio of transitions to transversions, denoted ‘R’, is important for inferring the correct phylogenetic relationships. The R-value may vary from sequence to sequence, and thus it needs to be estimated separately for every set of sequences. The simplest evolutionary models do not consider the R-value in their analysis.

The rate of substitution also varies from site to site within a given sequence (Van de Peer and De Wachter, 1997). The rates of substitution are represented by a gamma distribution whose shape is measured by the parameter alpha. This parameter is used to derive a gamma-corrected distance, referred to as the gamma distance. Thus, inclusion of the gamma parameter will increase the probability of obtaining the correct phylogenetic tree. The actual number of mutations that occurred during evolution to yield the present sequences is significantly larger than the number of substitutions observed. Hence, an evolutionary distance correction, made by applying the best-fit model appropriately, is required to approach the actual value.

All these facts complicate the situation and call for evolutionary models that can best

calculate the actual rate of substitution for a given set of sequences. Every phylogenetic reconstruction method considers simple to complex models of evolution in order to obtain an evolutionary relationship as near to reality as possible. A number of different models have been proposed separately for nucleotide, codon, and protein sequences, differing in the assumptions made and the parameters used (Lio and Goldman, 1998; Yang, 2007). It is important to note that no single model incorporates all the possible information; thus, the best-fit model for the sequences under study should be chosen critically before the analysis. The evolutionary model that best explains the observed sequence data can be inferred using the ModelTest or jModelTest software. It uses three different criteria to infer the best-fit model, namely the hierarchical Likelihood Ratio Test (hLRT), the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). For more information on how these estimates are calculated and how parameter-rich models influence them, please refer to Posada (2008).

Technical details of the different available evolutionary models are beyond the scope of this chapter, and readers are referred to the further reading given in the References section (Barton et al., 2007; Delport et al., 2008; Lio and Goldman, 1998; Yang, 2007; Yang and Nielsen, 2002).

Step 4: Phylogenetic reconstruction

Two different methodologies are employed by the presently available programs to generate dendrograms: (a) clustering methods, in which the two most closely related taxa are placed under a single internal node and a third taxon is then added, treating the taxa within the internal node as a single group; in this way, the program progressively adds the remaining taxa to yield the final phylogenetic tree; (b) methods of the second type, which generate a number of candidate trees that grows with the number of taxa involved in the analysis, followed by selection of the best-fit tree topology (highest likelihood or probability) for a given evolutionary model. Choosing the correct substitution model is crucial for inferring the most accurate phylogenetic relationships. A list of freely available software is given in the popular software section at the end of the chapter (Table 1).

Phylogenetic tree construction methods can be classified into distance, minimum evolution, parsimony, probabilistic, and likelihood methods (Table 1). Basically, distance-based methods are simple, and the operational taxonomic units (OTUs) are clustered on the basis of sequence divergences calculated using different evolutionary models. The unweighted pair-group method using arithmetic averages (UPGMA), neighbor-joining (NJ), Minimum Evolution, and Fitch-Margoliash are examples of distance-based methods (Saitou and Imanishi, 1989). These methods produce a single phylogenetic tree with branch lengths using clustering. Further, distance methods can handle a huge number of sequences, for example to construct the “Tree of Life”. Distance-based methods derive pair-wise distances from the MSA, while the other methods take the MSA directly into consideration and construct phylogenies by considering every single site variation when deriving branch lengths. The distance matrix can also be derived from measured distances or morphometric analyses. Various pair-wise distance formulae (e.g., the Jaccard coefficient) can be applied to morphological characters or to genetic distance data from sequences, restriction site polymorphisms, different methods of marker analysis (for example, micro- or mini-satellites, RAPDs, etc.), or allozyme data. Distance-matrix methods generally depend on the MSA to calculate the pair-wise distances between OTUs. Gaps and missing data can be handled in different ways: (a) gapped sites (insertions/deletions) can be deleted either pair-wise or completely; (b) gaps can be included as mutations in the analysis. The resulting pair-wise distance matrix is used by the different phylogenetic reconstruction programs for clustering the taxa. An internal node is placed between the two most similar taxa, after which progressive clustering is done by treating each internal node as a single taxon.

The NJ method follows the minimum evolution principle. The concept of minimum evolution is based on the least number of mutations required to obtain a given tree. Maximum parsimony also follows a minimum evolution principle, but works directly on the alignment and minimizes the number of mutations required to obtain a given tree topology. Parsimony methods can be affected by long-branch attraction (fast-evolving species are inferred as closely related because of highly saturated phylogenetically informative sites), while likelihood methods are best for drawing correct phylogenies with strong statistical support in such cases (Zvelebil and Baum, 2008).

Of all the methods, maximum likelihood and Bayesian probability methods are the most sophisticated, relying on likelihood or probability models to infer the evolutionary distances. These two methods are becoming increasingly popular for constructing phylogenies. However, they are computationally intensive, which limits the number of sequences that

can be used for constructing larger phylogenies. Finally, every method available to date can produce wrong phylogenetic relationships under certain conditions, and thus every method has its own followers and detractors (Nei, 2003).

Step 5: Evaluating the phylogenetic tree

After successful construction of the phylogenetic tree, the next step is evaluation of the tree topology. This can be performed using two evaluation methods, namely the bootstrap method and the interior-branch test. The basic concept of the bootstrap method is to evaluate the tree topology by constructing one phylogenetic tree for each of a given number of pseudo-replicate data sets. A pseudo-replicate is a complete data set with the same number of sites (columns) as the original, built by sampling columns from the existing data set with replacement, so that some columns are dropped and others appear more than once. In this way the user-defined number of pseudo-replicates is constructed, followed by the corresponding phylogenetic trees. The number of times each node of the initial phylogenetic tree under evaluation recurs in the bootstrap trees is given as a percentage at the tree nodes, called the “bootstrap value” or “bootstrap percentage” (Felsenstein, 2004). Tree nodes with bootstrap values >70% are generally considered consistent. The computational time of bootstrap testing depends on the number of sequences, the length of the sequences, and the number of pseudo-replicates (bootstrap replicates) requested. This general method is known as non-parametric bootstrapping. A variant is parametric bootstrapping, in which the pseudo-replicate sequence data sets are simulated under the evolutionary model. It then follows the same procedure as non-parametric bootstrapping to evaluate the given phylogenetic tree (Makarenkov et al., 2010). In the bootstrap interior-branch test, the data sampling resembles that of the bootstrap method, but it is used to calculate the branch lengths on the given original phylogenetic tree. The test assesses the confidence that an interior branch length is non-zero, and the tree nodes are labelled with the confidence of the obtained branch length. This method is considered an improvement over the popular bootstrap method (Zvelebil and Baum, 2008).

References

1. Barton, N.H., Briggs, D.E.G., Eisen, J.A., Goldstein, D.B., Patel, N.H. 2007. Evolution (New York, Cold Spring Harbor Laboratory Press).
2. Chantangsi, C., Lynn, D.H., Brandl, M.T., Cole, J.C., Hetrick, N., Ikonomi, P., 2007. Barcoding ciliates: a comprehensive study of 75 isolates of the genus Tetrahymena. Int. J. Syst. Evol. Microbiol. 57, 2412-2423.
3. Delport, W., Scheffler, K., Seoighe, C., 2008. Models of coding sequence evolution. Brief. Bioinform. 10, 97-109.
4. Duncan, T., Baum, B.R., 1981. Numerical phenetics: its uses in botanical systematics. Annu. Rev. Ecol. Syst. 12, 387-404.
5. Edgar, R.C., 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113.
6. Felsenstein, J. 2004. Inferring phylogenies (Massachusetts, Sinauer Associates, Inc.), p. 644.
7. Felsenstein, J., 2005. PHYLIP (phylogeny inference package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle, 47-55.
8. Feng, D.F., Doolittle, R.F., 1987. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25, 351-360.
9. Fitz Gibbon, S.T., House, C.H., 1999. Whole genome-based phylogenetic analysis of free-living microorganisms. Nucleic Acids Res. 27, 4218.
10. Guindon, S., Lethiec, F., Duroux, P., Gascuel, O., 2005. PHYML Online--a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 33, W557.
11. Hafner, M.S., Nadler, S.A., 1988. Phylogenetic trees support the coevolution of parasites and their hosts. Nature 332, 258-259.
12. Horner, D.S., Pesole, G., 2004. Phylogenetic analyses: a brief introduction to methods and their application. Expert Rev. Mol. Diagn. 4, 339-350.
13. Huelsenbeck, J.P., Ronquist, F., 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754-755.
14. Kao, H.T., Porton, B., Hilfiker, S., Stefani, G., Pieribone, V.A., DeSalle, R., Greengard, P., 1999. Molecular evolution of the synapsin gene family. J. Exp. Zool. 285, 360-377.
15. Klenk, H., Göker, M., 2010. En route to a genome-based classification of Archaea and Bacteria? Syst. Appl. Microbiol. 33, 175-182.
16. Korbel, J.O., Snel, B., Huynen, M.A., Bork, P., 2002. SHOT: a web server for the construction of genome phylogenies. Trends Genet. 18, 158-162.
17. Kumar, S., Filipski, A.J. 2001. Molecular phylogeny reconstruction. In Encyclopedia of life sciences (Macmillan Publishers Ltd, Nature Publishing Group).
18. Lio, P., Goldman, N., 1998. Models of molecular

evolution and phylogeny. Genome Res. 8, 1233-1244.
19. Liu, H., Beckenbach, A.T., 1992. Evolution of the mitochondrial cytochrome oxidase II gene among 10 orders of insects. Mol. Phylogenet. Evol. 1, 41-52.
20. Makarenkov, V., Boc, A., Xie, J., Peres-Neto, P., Lapointe, F., Legendre, P., 2010. Weighted bootstrapping: a correction method for assessing the robustness of phylogenetic trees. BMC Evol. Biol. 10, 250.
21. Mayr, E., 1965. Numerical phenetics and taxonomic theory. Syst. Biol. 14, 73.
22. Nei, M., 2003. Phylogenetic analysis in molecular evolutionary genetics. Annu. Rev. Genet. 30, 371-403.
23. Nei, M., Gu, X., Sitnikova, T., 1997. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl. Acad. Sci. U. S. A. 94, 7799-7806.
24. Nei, M., Rooney, A.P., 2005. Concerted and birth and death evolution of multigene families. Annu. Rev. Genet. 39, 121-152.
25. Nikaido, M., Matsuno, F., Hamilton, H., Brownell, R.L., Cao, Y., Ding, W., Zuoyan, Z., Shedlock, A.M., Fordyce, R.E., Hasegawa, M., Okada, N., 2001. Retroposon analysis of major cetacean lineages: the monophyly of toothed whales and the paraphyly of river dolphins. Proc. Natl. Acad. Sci. U. S. A. 98, 7384-7389.
26. Notredame, C., Higgins, D.G., Heringa, J., 2000. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205-217.
27. Page, R.D.M., Holmes, E.C., 1998. Molecular evolution: a phylogenetic approach. Wiley-Blackwell, 417 p.
28. Pagel, M., 2000. Phylogenetic-evolutionary approaches to bioinformatics. Brief. Bioinform. 1, 117.
29. Pond, S.L.K., Frost, S.D.W., Muse, S.V., 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21, 676-679.
30. Posada, D., 2008. jModelTest: Phylogenetic Model Averaging. Mol. Biol. Evol. 25, 1253-1256.
31. Procter, J.B., Thompson, J., Letunic, I., Creevey, C., Jossinet, F., Barton, G.J., 2010. Visualization of multiple alignments, phylogenies and gene family evolution. Nat. Meth. 7, S16-25.
32. Raghavendra, K., Cornel, A.J., Reddy, B.P.N., Collins, F.H., Nanda, N., Chandra, D., Verma, V., Dash, A.P., Subbarao, S.K., 2009. Multiplex PCR assay and phylogenetic analysis of sequences derived from D2 domain of 28S rDNA distinguished members of the Anopheles culicifacies complex into two groups, A/D and B/C/E. Infect. Genet. Evol. 9, 271-277.
33. Rice, P., Longden, I., Bleasby, A., 2000. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 16, 276-277.
34. Ronquist, F., Huelsenbeck, J.P., 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572.
35. Saitou, N., Imanishi, T., 1989. Relative efficiencies of the Fitch-Margoliash, maximum-parsimony, maximum-likelihood, minimum-evolution, and neighbor-joining methods of phylogenetic tree construction in obtaining the correct tree. Mol. Biol. Evol. 6, 51.
36. Schmidt, H.A., Strimmer, K., Vingron, M., Von Haeseler, A., 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502.
37. Shneer, V.S., 2009. DNA barcoding is a new approach in comparative genomics of plants. Genetika 45, 1436-1448.
38. Simon, D.L., Larget, B., 1998. Bayesian analysis in molecular biology and evolution (BAMBE). Department of Mathematics and Computer Science, Duquesne University, Pittsburgh.
39. Snel, B., Bork, P., Huynen, M.A., 1999. Genome phylogeny based on gene content. Nat. Genet. 21, 108-110.
40. Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596-1599.
Thornton, J.W., DeSalle, R., 2000. Gene family evolution and homology: genomics meets phylogenetics. Annu. Rev. Genomics Hum. Genet. 1, 41-73.
41. Van de Peer, Y., De Wachter, R., 1997. Construction of evolutionary distance trees with TREECON for Windows: accounting for variation in nucleotide substitution rate among sites. Comput. Appl. Biosci. 13, 227-230.
42. Yang, Z., 2007. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24, 1586-1591.
43. Yang, Z., Nielsen, R., 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19, 908.
44. Zuckerkandl, E., Pauling, L.B. 1962. Molecular disease, evolution, and genetic heterogeneity. In Horizons in Biochemistry, Kasha, M., Pullman, B., eds. (New York, Academic Press), pp. 189-225.
45. Zvelebil, M., Baum, J.O. 2008. Understanding bioinformatics, Holdsworth, D., ed. (Garland Science, Taylor & Francis Group, LLC, an informa business).

Illustrations

Illustration 1

Table 1. List of different popular phylogenetic tree construction methods along with the list of different free software available from World Wide Web

| Name of the method | Distance estimation | Phylogeny generating method | Free programs available | Remarks | References |
|---|---|---|---|---|---|
| Unweighted pair-group method using arithmetic averages (UPGMA) | √ | clustering | MEGA v5.0, Phylip v3.69 | Follows molecular clock hypothesis | (Felsenstein, 2005; Tamura et al., 2007) |
| Neighbor-joining (NJ) | √ | clustering | MEGA v5.0, Phylip v3.69 | Minimum evolution | (Felsenstein, 2005; Tamura et al., 2007) |
| Fitch-Margoliash | √ | clustering | MEGA v5.0, Phylip v3.69 | Minimum evolution | (Felsenstein, 2005; Tamura et al., 2007) |
| Minimum Evolution (ME) | √ | clustering | MEGA v5.0, Phylip v3.69 | Minimum evolution | (Felsenstein, 2005; Tamura et al., 2007) |
| Maximum Parsimony (MP) | X | Multiple trees | MEGA v5.0, Phylip v3.69, LVB | Parsimony hypothesis | (Felsenstein, 2005; Tamura et al., 2007) |
| Maximum likelihood (ML) | X | Multiple trees | MEGA v5.0, Phylip v3.69, PAML v4.0, HyPhy, PHYML, PUZZLE | Likelihood method | (Felsenstein, 2005; Guindon et al., 2005; Pond et al., 2005; Schmidt et al., 2002; Tamura et al., 2007; Yang, 2007) |
| Bayesian | X | Multiple trees | MrBayes, BAMBE | Likelihood hypothesis and is extension to the ML | (Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003; Simon and Larget, 1998) |
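
As a rough illustration of the distance-based clustering rows of Table 1, the following Python sketch computes pair-wise distances from a toy alignment and then clusters the taxa with UPGMA. It is only a minimal sketch under stated assumptions: the function names (`p_distance`, `distance_matrix`, `upgma`), the four toy sequences, and the nested-tuple output are invented for this example and are not taken from MEGA or Phylip. The uncorrected proportion of differing sites (with pair-wise deletion of gapped columns) stands in for the model-corrected distances discussed in Step 3, and branch lengths are not computed.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (pair-wise deletion).
    """
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment):
    """Map every unordered pair of taxon names to its p-distance."""
    return {
        frozenset((a, b)): p_distance(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names: list of taxon labels (the OTUs).
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of OTUs in it
    d = dict(dist)
    while len(sizes) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        size_a = sizes.pop(a)
        size_b = sizes.pop(b)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted average.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (tree,) = sizes
    return tree


# Hypothetical toy alignment, invented for illustration only.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGTTCGAAT",
    "D": "TCGATCGAAT",
}
tree = upgma(list(alignment), distance_matrix(alignment))
print(tree)
```

Each merge step corresponds to placing an internal node between the two most similar taxa and then treating that node as a single taxon, as described in Step 4.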

Illustration 2

Figure 1
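
A labelled tree of the kind shown in Figure 1 can also be explored programmatically. The Python sketch below stores a small rooted phylogram as a child-to-parent mapping, lists its terminal nodes (OTUs), and obtains the distance between two taxa as the sum of the lengths of the branches connecting them, as described for phylograms in the text. The taxon names, branch lengths, and helper names (`terminal_nodes`, `path_to_root`, `tree_distance`) are all invented for this example and do not come from Figure 1 or from any phylogenetics package.

```python
# Hypothetical rooted phylogram: child -> (parent, branch length).
# The root itself has no entry because it has no parent.
TREE = {
    "n1": ("root", 0.10),
    "outgroup": ("root", 0.60),
    "n2": ("n1", 0.05),
    "taxonC": ("n1", 0.30),
    "taxonA": ("n2", 0.20),
    "taxonB": ("n2", 0.25),
}


def terminal_nodes(tree):
    """Return the leaves (OTUs): nodes that are never a parent."""
    parents = {parent for parent, _ in tree.values()}
    return sorted(node for node in tree if node not in parents)


def path_to_root(tree, node):
    """Map the node and each of its ancestors to the branch length separating them."""
    lengths = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        lengths[parent] = total
        node = parent
    return lengths


def tree_distance(tree, a, b):
    """Sum of branch lengths on the path connecting taxa a and b."""
    up_a = path_to_root(tree, a)
    up_b = path_to_root(tree, b)
    # The most recent common ancestor gives the shortest combined path.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


print(terminal_nodes(TREE))
print(tree_distance(TREE, "taxonA", "taxonB"))
```

Because the out-group attaches directly to the root, its distance to any in-group taxon passes through the root, which is what makes it useful for rooting.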
