Outline Outline Background Knowledge

Chapter 13: Building phylogenetic trees

"Nothing in biology makes sense except in the light of evolution."
Theodosius Dobzhansky (1900-1975)

Outline
• Phylogenetic trees
• Matching multiple sequence alignment
• ClustalW/JalView and Neighbour Joining Tree
• Phylip and Bootstrapping

Background knowledge
• Taxonomy classifies life into groups
• Example: taxonomic classification of man
  Superkingdom: Eukaryota
  Kingdom: Metazoa
  Phylum: Chordata
  Class: Mammalia
  Order: Primates
  Family: Hominidae
  Genus: Homo
  Species: sapiens

Are there correct trees?
• Conditions
  – data are clean
  – outgroups are correctly specified
  – appropriate algorithms are chosen
  – no assumptions are violated, etc.
• Question
  – Can the true, correct tree be found and proven to be scientifically valid?
• Unfortunately, it is impossible to ever state conclusively what the "true" tree is for a group of sequences (or a group of organisms); taxonomy is constantly under revision as new data are gathered.

But beware
• Directory of phylogeny software: http://evolution.genetics.washington.edu/phylip/software.html

Species
• Any group of closely related organisms that can produce fertile offspring.
• Two organisms are more closely "related" as they approach the level of species, that is, they have more genes in common.

Fossils
• Fossils are relics or impressions of organisms from the past, mineralized in rock.
• Fossils provide a documented history of how the diversity of life has changed, and how organisms are related.

Fossils reliability
• Fossils are reliable only when properly dated.
• Geological dating.
• Radiometric dating: a method used to determine the age of rocks and fossils by studying the rate of decay of radioactive isotopes.

Time unit
• Half-life: the number of years it takes for 50% of the original sample to decay (14C has a half-life of about 5,730 years).
• Half-life is NOT affected by temperature, pressure, or other environmental variables.

Principle
• While alive: the organism assimilates both 12C and 14C.
• After it dies: assimilation stops and the amount of 14C declines relative to 12C.
• Once found: the ratio of 14C to 12C can be measured to determine how many half-lives have elapsed.

All species are related by descent [figure]

Phylogeny
• Phylogeny is the evolutionary history of a species or group of related species.
• Means of classification of organisms.
• Explains the present diversity of living creatures.
• Represented as a genealogical tree (pedigree, tree of life). Usually a binary tree with contemporary organisms at the leaves.
• Phylogenetic tree: a presentation of evolutionary relationships.

Phylogenetic analysis
• Phylogeny applied to gene data.
• Gene data are more objective than morphology data.
• Reconstruction of the evolutionary history of a group of organisms or sequences.
• The goal of phylogenetic inference is to reconstruct the order of splitting events (and perhaps the distances between them).

Phylogenetic problem
• Input:
  – A set of contemporary species (S) whose evolutionary relationship is to be reconstructed.
  – A set of inheritable characteristics (C) that describe each species. Characteristics can be
    • quantitative: continuous (size)
    • qualitative: discrete (gene sequence).
• Output:
  – Tree (with branch lengths) which best fits the data.
• Assumptions:
  – Common ancestor, homologous characteristics.

Why phylogenetic analysis?
• Constructing vaccines
  – Want to assure that a vaccine is constructed to address diverse strains of the disease ((bird) influenza).
• Epidemiology
  – Reconstruct paths of infection, either for an individual or for a population (HIV).

Systematics
• Study of biological diversity in an evolutionary context.
• Includes taxonomy, the science of naming and classifying the diversity of organisms.

Phylogenetic systematics
• Deals with identifying and understanding the relationships among the many different kinds of life on Earth, both living (extant) and dead (extinct).
• Phenetics and cladistics.

Phylogenetic terms
• Monophyletic
  – a group of DNA sequences derived from a single common ancestral sequence.
• Clade
  – a group of all monophyletic DNA sequences and their ancestor included in the analysis.
• Parsimony
  – choosing the tree with the shortest evolutionary pathway, that is, the pathway with the smallest number of nucleotide changes needed to go from the ancestral sequence at the root of the tree to all of the present-day sequences that have been compared.

Phenetics
• Numerical taxonomy; involves the use of various measures of overall similarity for the ranking of species.
• Data are converted to numerical values without any character "weighting" and compared to produce phenograms (taxonomic clusters).

Cladistic groups
• Monophyletic
  – all species share a common ancestor, and all species derived from that common ancestor are included. This is the only form of grouping accepted as valid by cladists.
• Paraphyletic
  – all species share an immediate common ancestor, but NOT all species derived from that common ancestor are included.
• Polyphyletic
  – species that do NOT share an immediate common ancestor are lumped together, while excluding other members that would link them.

Cladistics
• Members of a group share a common evolutionary history; emphasis on recognizing only monophyletic groups (clades), that is, an ancestor plus all of its descendants.
• Sharing the set of unique features (apomorphies) within a related group.
• Species arise from bifurcation.

Monophyletic, polyphyletic, paraphyletic [figures: example groupings on a tree with nodes A to L]

Question [figures: two quiz slides showing the same tree with nodes A to K]

Homologs
• Orthologs: produced by speciation; similar function; genes derived from a common ancestor that diverged because of divergence of the organism.
• Paralogs: produced by gene duplication; different function; genes derived from a common ancestral gene that duplicated within an organism and then diverged.
• Xenologs: produced by the horizontal transfer of a gene between two organisms; function tends to be similar.

Orthology and paralogy [figure]

Evolution
• Drivers of evolution: mutation, selection, drift (indiscriminate parent selection, founder effect, and bottleneck effect).

"Natural Selection" is the principle by which each slight variation, if useful, is preserved.
Charles Darwin

Evolution theories
• Neodarwinism (selectionism): survival of the fittest.
• Neutralism (Kimura): most mutations are lethal or neutral; neutral mutations spread in the population by chance.
• We use mostly neutral mutations for phylogenetic tree computations because they accumulate smoothly and follow a molecular clock (ticking at a different pace for every gene).

Phylogenetic tree [figure]

Phylogenetic tree parts
• Node: taxonomic unit. Either an existing species or an ancestor.
• Root: the common ancestor of all taxa.
• Clade: a group of two or more taxa or DNA sequences that includes both their common ancestor and all of their descendants.
• Topology: the branching pattern of the tree.
• OTU: Operational Taxonomic Unit.

Branch
• Relationship between the taxa in terms of descent and ancestry.
• Branch
  – scaled: length represents the number of changes (in terms of passage of time)
  – unscaled: branch length is NOT proportional to the number of changes that have occurred.

Tree
• Rooted: with a node (root) representing a common ancestor, from which a unique path leads to any other node.
• Unrooted: without identifying a common ancestor or evolutionary path.

Unrooted tree
• The oldest split (the tree root) is the hardest to reconstruct.
  – ignore the problem, and produce unrooted trees
  – use an "outgroup": a more distant relation than the most distant pair in the phylogeny
• Outgroups are problematic
  – When close, it is too hard to be sure an outgroup really is one
  – When far away, it is too hard to identify homologous characteristics.

Rooted tree [figure]

Unrooted to rooted
• Include an outgroup to make a correct root.

Newick tree format
• (B,(A,C,E),D);
• ((Human,Gorilla),(Mouse,Rat));
• ((Human:0.1,Gorilla:0.1):0.4,(Mouse:0.2,Rat:0.2):0.3);
  – the number after each colon is the length of the branch leading to that node

Gene tree vs. species tree
• NOT identical because internal nodes are NOT always equivalent.
• Gene tree internal node: the divergence of an ancestral gene into two genes with different DNA sequences, usually resulting from a mutation.
• Species tree internal node: a speciation event, whereby the population of the ancestral species splits into two groups that are no longer able to interbreed.
• GeneTree: http://taxonomy.zoology.gla.ac.uk/rod/genetree/genetree.html

Biological questions with phylogenetic trees
• I sequenced rRNA of an unknown bacterium. What is the closest relative of my organism?
• I found a gene. Is it orthologous to another well-characterized gene?
• My gene seems strange within my organism. Does it descend from a horizontal transfer?

Multiple sequence alignment (MSA)
• A multiple sequence alignment is a good basis for specifying a phylogeny.
• Each position (column in an MSA) is a "character".
• Can calculate "distances" or the "set of changes" required, based on differences in aligned columns and indels.
• These distances or change sets can be used to score alternative trees.

Preparation of data
• GIGO: garbage in, garbage out
• Highly accurate multiple sequence alignment with properly chosen sequences.
• The assumption of divergent evolution is right: convergent evolution on the molecular level is the exception.

…preparation of data
• Use DNA multiple
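
The MSA slide above says that "distances" can be calculated from differences in aligned columns. A minimal sketch of one such measure, the proportion of differing columns (often called the p-distance), is given below. The function names and the toy alignment are hypothetical and are not taken from the lecture; skipping columns that contain a gap is only one possible convention, since indels could instead be counted as changes.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing columns between two aligned sequences.

    Assumption: columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore columns containing an indel
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return mismatches / compared


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances, keyed by alphabetically ordered name pairs."""
    names = sorted(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(names, 2)
    }


# Hypothetical toy alignment: each string is one row of the MSA.
toy_alignment = {
    "Human":   "ACGTACGTAC",
    "Gorilla": "ACGTACGTTC",
    "Mouse":   "ACGAAC-TTG",
    "Rat":     "ACGAACCTTG",
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(toy_alignment).items():
        print(pair, round(dist, 3))
```

A matrix like this is the kind of input that distance-based methods such as Neighbour Joining work from; real analyses usually apply a substitution-model correction rather than using the raw proportion.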

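The Newick strings shown earlier, such as ((Human:0.1,Gorilla:0.1):0.4,(Mouse:0.2,Rat:0.2):0.3);, encode nesting with parentheses and branch lengths after colons. The sketch below reads such a string into nested dictionaries. It is a simplified illustration, not a full Newick implementation: `parse_newick` and `leaf_names` are hypothetical names, and quoted labels and bracketed comments are assumed not to occur.

```python
def parse_newick(text: str) -> dict:
    """Parse a simple Newick string into nested dicts.

    Each node is {"name": str, "length": float or None, "children": list}.
    Assumption: no quoted labels or [comments]; whitespace is ignored.
    """
    s = "".join(text.split())  # drop all whitespace
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node() -> dict:
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # Optional node label
        start = pos
        while s[pos] not in ",():;":
            pos += 1
        name = s[start:pos]
        # Optional branch length after ":"
        length = None
        if s[pos] == ":":
            pos += 1
            start = pos
            while s[pos] not in ",();":
                pos += 1
            length = float(s[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if s[pos] != ";":
        raise ValueError(f"unexpected character {s[pos]!r} at position {pos}")
    return root


def leaf_names(node: dict) -> list:
    """Collect leaf (OTU) names in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


if __name__ == "__main__":
    tree = parse_newick("((Human:0.1,Gorilla:0.1):0.4,(Mouse:0.2,Rat:0.2):0.3);")
    print(leaf_names(tree))
```

Internal nodes here have an empty name and carry the branch length that follows their closing parenthesis; a node with more than two children, as in (B,(A,C,E),D);, is simply stored with a longer `children` list.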