Introduction to Bioinformatics What Is Molecular Phylogenetics

Phylogenetics
Prof. Dr. Nizamettin AYDIN
Copyright 2000 N. AYDIN. All rights reserved.

What is Molecular Phylogenetics

• Phylogenetics is the study of evolutionary relationships
• Example:
  – relationship among species

[Figure: example tree relating crocodiles, birds, lizards, snakes, rodents, primates and marsupials]

A Brief History of Molecular Phylogenetics

• 1900s
  – Immunochemical studies
    • cross-reactions are stronger for closely related organisms
  – Nuttall (1902): apes are the closest relatives of humans
• 1960s - 1970s
  – Protein sequencing methods, electrophoresis, DNA hybridization and, later, PCR contributed to a boom in molecular phylogeny
• late 1970s to present
  – Discoveries using molecular phylogeny
    • Endosymbiosis - Margulis, 1978
    • Divergence of phyla and kingdoms - Woese, 1987
    • Many Tree of Life projects completed or underway

Molecular data vs. Morphology/Physiology

• Molecular data
  – Strictly heritable entities
  – Data are unambiguous
  – Regular and predictable evolution
  – Quantitative analyses
  – Ease of homology assessment
  – Relationships of distantly related organisms can be inferred
  – Abundant and easily generated with PCR and sequencing
• Morphology/Physiology
  – Can be influenced by environmental factors
  – Ambiguous modifiers: "reduced", "slightly elongated", "somewhat flattened"
  – Unpredictable evolution
  – Qualitative argumentation
  – Homology difficult to assess
  – Only close relationships can be confidently inferred
  – Problems when working with micro-organisms and where visible morphology is lacking

Phylogenetic concepts: Interpreting a Phylogeny

• Physical position in the tree is not meaningful
• Swiveling can only be done at the nodes
• Only tree structure matters

[Figure: the same tree of Sequences A-E drawn twice, once with the tips in the order A, B, C, D, E and once in the order A, B, E, D, C; horizontal axis: Time, ending at Present]

Tree Terminology

• Relationships are illustrated by a phylogenetic tree / dendrogram
  – Combination of Greek dendro/tree and gramma/drawing
  – A dendrogram is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering.
  – Dendrograms are often used in computational biology to illustrate the clustering of genes or samples, sometimes on top of heatmaps.
• A cladogram is a type of phylogenetic tree that only shows tree topology, the shape indicating relatedness.
  – It shows that, say, humans are more closely related to chimpanzees than to gorillas, but not the time or genetic distance between the species.
  – Combination of Greek clados/branch and gramma/drawing
• The branching pattern is called the tree's topology
• Trees can be represented in several forms:
  – Rectangular cladogram
  – Slanted cladogram
  – Circular cladogram
• Same tree, seven different views: Rectangular Phylogram, Rectangular Cladogram, Slanted Cladogram, Circular Phylogram, Circular Cladogram, Radial Phylogram and Radial Cladogram

[Figure: tree of taxa A-F with labels for the root, internal nodes, terminal nodes (operational taxonomic units (OTU) / taxa), branches, sisters and a polytomy]

• Taxon, plural taxa (taxonomy): any group or rank in a biological classification into which related organisms are classified.
• Rooted trees:
  – Have a root that denotes common ancestry
• Unrooted trees:
  – Only specify the degree of kinship among taxa, not the evolutionary path

[Figure: a rooted tree and an unrooted tree of taxa A-F]
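The point that only tree structure matters, and that swiveling at nodes changes nothing, can be illustrated with a small sketch. This is an illustrative assumption, not part of the slides: trees are written as nested Python tuples, the groupings of sequences A-E are hypothetical (the original figure is not recoverable), and `canonical` and `same_topology` are names invented here.

```python
def canonical(tree):
    """Return an order-independent form of a rooted tree.

    A leaf is a string; an internal node is a tuple of child subtrees.
    Sorting the children at every node removes the effect of swiveling.
    """
    if isinstance(tree, str):
        return tree
    children = (canonical(child) for child in tree)
    return tuple(sorted(children, key=repr))


def same_topology(tree_a, tree_b):
    """True if the two drawings describe the same branching pattern."""
    return canonical(tree_a) == canonical(tree_b)


# Hypothetical grouping of the five sequences.
drawing_1 = ((("A", "B"), "C"), ("D", "E"))
# Same tree after swiveling at several nodes: the tips appear in another order.
drawing_2 = (("D", "E"), ("C", ("B", "A")))
# A genuinely different tree: C now groups with A instead of B.
drawing_3 = ((("A", "C"), "B"), ("D", "E"))

print(same_topology(drawing_1, drawing_2))
print(same_topology(drawing_1, drawing_3))
```

The first comparison reports the two drawings as the same topology and the second reports a difference, because regrouping the tips is not something swiveling can produce.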
Tree Terminology: Scaled and Unscaled Trees

• Scaled trees:
  – Branch lengths are proportional to the number of nucleotide/amino acid changes that occurred on that branch (usually a scale is included).
• Unscaled trees:
  – Branch lengths are not proportional to the number of nucleotide/amino acid changes (usually used to illustrate evolutionary relationships only).

Tree Terminology: Monophyletic and Paraphyletic Groups

• Monophyletic groups:
  – All taxa within the group are derived from a single common ancestor, and the members form a natural clade.
• Paraphyletic groups:
  – The common ancestor is also shared by taxa outside the group, so the members do not form a natural clade.

[Figure: example trees labeled with taxa A-F and with the fictional taxa Saturnite 1-3, Martian 1-3, Jupiterian 5, 8, 32, 67 and Human 3, 11]

Methods in Phylogenetic Reconstruction

• Distance methods
  – calculate pairwise distances between sequences, and group the sequences that are most similar.
  – This approach has potential for computational simplicity and therefore speed.
• Maximum Parsimony
  – assumes that shared characters in different entities result from common descent.
  – Groups are built on the basis of such shared characters, and the simplest explanation for the evolution of characters is taken to be the correct, or most parsimonious, one.
• Maximum Likelihood
  – computes the probability that a data set fits a tree derived from that data set, given a specified model of sequence evolution.

Comparison of Methods

• Distance
  – Uses only pairwise distances
  – Minimizes distance between nearest neighbors
  – Very fast
  – Easily trapped in local optima
  – Good for generating a tentative tree, or choosing among multiple trees
• Maximum parsimony
  – Uses only shared derived characters
  – Minimizes total distance
  – Slow
  – Assumptions fail when evolution is rapid
  – Best option when tractable (<30 taxa, homoplasy rare)
• Maximum likelihood
  – Uses all data
  – Maximizes tree likelihood given specific parameter values
  – Very slow
  – Highly dependent on the assumed evolution model
  – Good for very small data sets and for testing trees built using other methods

Methods in Phylogenetic Reconstruction

• Distance
  – Using a sequence alignment, pairwise distances are calculated
  – This creates a distance matrix of the sequence alignment
  – A phylogenetic tree is calculated with clustering algorithms, using the distance matrix.
  – Examples of clustering algorithms include the Unweighted Pair Group Method using Arithmetic averages (UPGMA) and Neighbor Joining clustering.

[Figure: taxa A, B, C and D joined step by step into a tree]

• Maximum Parsimony
  – All possible trees are determined for each position of the sequence alignment
  – Each tree is given a score based on the number of evolutionary steps needed to produce that tree
  – The most parsimonious tree is the one that has the fewest evolutionary changes for all sequences to be derived from a common ancestor
  – Usually several equally parsimonious trees result from a single run.

[Figure: Maximum parsimony: exhaustive stepwise addition. Step 1: tree of A, B and C. Step 2: D added at each possible position. Step 3: E added at each possible position of the Step 2 trees]

• Maximum Likelihood
  – Creates all possible trees like the Maximum Parsimony method, but instead of retaining the trees with the fewest evolutionary steps...
  – Employs a model of evolution in which different transition/transversion ratios can be used
  – For each tree generated, the probability that it reflects each position of the sequence data is calculated.
  – The calculation is repeated for all nucleotide sites
  – Finally, the tree with the best probability is shown as the maximum likelihood tree; usually only a single tree remains
  – It is a more realistic tree estimation because it does not assume an equal transition-transversion ratio for all branches.

How confident are we about the inferred phylogeny?

[Figure: tree of rat, human, turtle, fruit fly, oak and duckweed with a question mark on each internal branch]

The Bootstrap

• Computational method to estimate the confidence level of a certain phylogenetic tree.

Sample (sites 0123456789)
  rat       GAGGCTTATC
  human     GTGGCTTATC
  turtle    GTGCCCTATG
  fruitfly  CTCGCCTTTG
  oak       ATCGCTCTTG
  duckweed  ATCCCTCCGG

Pseudo sample 1 (sites 001122234556667)
  rat       GGAAGGGGCTTTTTA
  human     GGTTGGGGCTTTTTA
  turtle    GGTTGGGCCCCTTTA
  fruitfly  CCTTCCCGCCCTTTT
  oak       AATTCCCGCTTCCCT
  duckweed  AATTCCCCCTTCCCC

Pseudo sample 2 (sites 445556777888899)
  rat       CCTTTTAAATTTTCC
  human     CCTTTTAAATTTTCC
  turtle    CCCCCTAAATTTTGG
  fruitfly  CCCCCTTTTTTTTGG
  oak       CCTTTCTTTTTTTGG
  duckweed  CCTTTCCCCGGGGGG

[Figure: each pseudo sample yields its own tree of rat, human, turtle, fruit fly, oak and duckweed]

• Bootstrapping
  – Bootstrap analysis is a kind of statistical analysis used to test the reliability of certain branches in the evolutionary tree.
  – It involves resampling one's own data, with replacement, to create a series of bootstrap samples of the same size as the original data.
  – In the case of nucleic acid (amino acid) sequences, the resampled units are the nucleotide (amino acid) positions of the alignment, and the statistical significance of a specific cluster is given by the fraction of trees, based on the resampled data, that contain that cluster.
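The resampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of any particular package: the alignment is the sample shown on the slide, while the function names (`take_sites`, `bootstrap_replicate`, `support`) and the three replicate clade sets are hypothetical stand-ins, since building a tree from each replicate is outside the scope of this snippet.

```python
import random

# Sample alignment copied from the slide (10 sites per sequence).
ALIGNMENT = {
    "rat":      "GAGGCTTATC",
    "human":    "GTGGCTTATC",
    "turtle":   "GTGCCCTATG",
    "fruitfly": "CTCGCCTTTG",
    "oak":      "ATCGCTCTTG",
    "duckweed": "ATCCCTCCGG",
}


def take_sites(alignment, sites):
    """Build a pseudo sample from the given column indices."""
    return {name: "".join(seq[i] for i in sites)
            for name, seq in alignment.items()}


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the length."""
    n_sites = len(next(iter(alignment.values())))
    sites = [rng.randrange(n_sites) for _ in range(n_sites)]
    return take_sites(alignment, sites)


def support(clade, replicate_trees):
    """Fraction of replicate trees (each a set of clades) containing clade."""
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return hits / len(replicate_trees)


# Column indices printed above "Pseudo sample 1" on the slide.
slide_sites = [int(c) for c in "001122234556667"]
pseudo_1 = take_sites(ALIGNMENT, slide_sites)

# One random replicate; the seed is fixed only to make the run repeatable.
replicate = bootstrap_replicate(ALIGNMENT, random.Random(0))

# Hypothetical clade sets standing in for trees inferred from 3 replicates.
rat_human = frozenset({"rat", "human"})
replicate_trees = [
    {rat_human, frozenset({"oak", "duckweed"})},
    {rat_human},
    {frozenset({"rat", "turtle"})},
]
rat_human_support = support(rat_human, replicate_trees)
```

Applying the slide's site indices to the sample reproduces the rows of Pseudo sample 1, and a cluster found in two of three replicate trees gets a support of about 67%.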
Bootstrap values

[Figure: inferred tree of rat, human, turtle, fruit fly, oak and duckweed with bootstrap values 100, 65, 0 and 55 on its branches]

• Many more replicates (between 100 and 1000) are generated, each giving an inferred tree
• Values are in percentages
• Conventional practice: only values of 60-100% are shown

Some Discoveries Made Using Molecular Phylogenetics

• Universal Tree of Life
  – Using rRNA sequences
  – Able to study the relationships of uncultivated organisms, obtained from a hot spring in Yellowstone National Park
• Endosymbiosis: Origin of the Mitochondrion and Chloroplast

[Figure: tree relating Purple Bacteria, Other bacteria, Chloroplasts and Mitochondria]

• Relationships within species: HIV subtypes

[Figure: tree of HIV subtypes A and B with isolates from Rwanda, Ivory Coast, Italy, U.S., Uganda, India and U.K.]
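The distance approach described under Methods in Phylogenetic Reconstruction (pairwise distances, then a distance matrix, then clustering) can be sketched with a small UPGMA example. This is a minimal sketch under assumptions not found in the slides: the four toy sequences are invented for illustration, the distance used is the simple proportion of differing sites, and the function names `p_distance` and `upgma` are hypothetical. Branch lengths are not computed; only the grouping is returned.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def upgma(seqs):
    """Cluster aligned sequences by UPGMA; returns nested tuples of names."""
    # Each current cluster maps to the number of sequences it contains.
    sizes = {name: 1 for name in seqs}
    # Distance matrix, keyed by unordered pairs of clusters.
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(sizes) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(sizes, 2),
                   key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        for other in sizes:
            if other == a or other == b:
                continue
            # Size-weighted mean equals the arithmetic average over all
            # member pairs, which is what defines UPGMA.
            dist[frozenset((merged, other))] = (
                dist[frozenset((a, other))] * sizes[a]
                + dist[frozenset((b, other))] * sizes[b]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))


# Toy alignment invented for this example (not taken from the slides).
toy = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACCTTA",
    "D": "TCGAACCTTG",
}
tree = upgma(toy)
print(tree)
```

With these toy sequences, A and B are joined first (they differ at one site), then C and D, and the two pairs are joined last. Ties between equally close pairs are broken by input order here, which is one reason real implementations may return different trees for the same data.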
