
Systematics - BIO 615
Derek S. Sikes, University of Alaska

Outline
1. A few more data issues (weighting, taxon sampling)
2. Tree terminology
3. Cladistics - Hennig’s method, polarity
4. Optimality criteria: Parsimony

Four steps - each should be explained in methods
1. Character (data) selection (characters that evolve not too fast, not too slow) - “Why did you choose these data?”
2. Alignment of data (hypotheses of primary homology) - “How did you align your data?”
3. Analysis selection (choose the best model / method(s)) - data exploration - “Why did you choose your analysis method?”
4. Conduct analysis

Saturation
[Figure: saturation graph - actual vs. observed DNA distances plotted against time]
As time proceeds, observed DNA distances also increase, but only to a point of saturation.
In extreme cases the data become essentially randomized & all phylogenetically valuable information is gone.

20 Amino Acids & stop codon
3-letter code  1-letter code  Full name      mRNA nucleotide triplets (codons)
Ala            A              Alanine        GCA, GCC, GCG, GCU
Arg            R              Arginine       AGA, AGG, CGA, CGC, CGG, CGU
Asn            N              Asparagine     AAC, AAU
Asp            D              Aspartic acid  GAC, GAU
Cys            C              Cysteine       UGC, UGU
Glu            E              Glutamic acid  GAA, GAG
Gln            Q              Glutamine      CAA, CAG
Gly            G              Glycine        GGA, GGC, GGG, GGU
His            H              Histidine      CAC, CAU
Ile            I              Isoleucine     AUA, AUC, AUU
Leu            L              Leucine        UUA, UUG, CUA, CUC, CUG, CUU
Lys            K              Lysine         AAA, AAG
Met            M              Methionine     AUG
Phe            F              Phenylalanine  UUC, UUU
Pro            P              Proline        CCA, CCC, CCG, CCU
Ser            S              Serine         AGC, AGU, UCA, UCC, UCG, UCU
Thr            T              Threonine      ACA, ACC, ACG, ACU
Trp            W              Tryptophan     UGG
Tyr            Y              Tyrosine       UAC, UAU
Val            V              Valine         GUA, GUC, GUG, GUU
STOP                                         UAA, UAG, UGA

Saturation
One way to help correct saturated data is to ignore the 3rd position sites, which saturate faster than the 1st or 2nd.
Another method is to translate the DNA to amino acid sequences and use those instead.
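The two remedies for saturation (ignoring 3rd positions, translating to amino acids) can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function names are hypothetical, and it assumes an ungapped, in-frame coding sequence (starting at a 1st codon position) made only of A, C, G and T/U. The codon table is the standard genetic code listed in the amino acid table, packed into a 64-letter string in UCAG order, with `*` marking a stop codon.

```python
from itertools import product

# Standard genetic code, in U, C, A, G order for the 1st, 2nd and 3rd
# codon positions (UUU, UUC, UUA, UUG, UCU, ...). "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def drop_third_positions(seq: str) -> str:
    """Keep only 1st and 2nd codon positions (assumes seq starts in frame)."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def translate(seq: str) -> str:
    """Translate an in-frame DNA/RNA sequence to one-letter amino acid codes.

    A trailing partial codon is ignored; gaps or ambiguity codes raise KeyError.
    """
    rna = seq.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


seq = "ATGGCTCGTTAA"  # hypothetical example sequence
print(drop_third_positions(seq))  # ATGCCGTA
print(translate(seq))             # MAR*
# The four alanine codons differ only at the 3rd position and translate identically:
print(translate("GCAGCCGCGGCT"))  # AAAA
```

Dropping 3rd positions discards the fastest-saturating sites, while translation collapses synonymous codons (such as the four alanine codons) into a single character state.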
Recall: the 3rd codon position changes the most and the 2nd position changes the least.

Saturation
Transitions & Transversions
[Figure: DNA structure]
Recall: A, G = purines; T, C = pyrimidines
- Most mutations are transitions [Ts] (replacement of a purine with a purine, or a pyrimidine with a pyrimidine)
- Transversions [Tv] (a purine replaced by a pyrimidine, or vice versa) are less common

Saturation
Another option to deal with saturation:
- Convert DNA data to purines & pyrimidines - in the matrix use R for purines, Y for pyrimidines
- Removes all the transitions from the data & leaves only transversions
- Less extreme than using only amino acids (which may ignore a lot of signal in the DNA data)

Saturation
This is essentially an extreme form of weighting (a method of correcting the data): transitions are weighted = 0 and transversions are weighted = 1
- Some opt for less extreme weighting, e.g. transitions = 1, transversions = 2
- The logic is that the less common evolutionary change should be less homoplasious & have more phylogenetic information (signal)

Weighting
Some (mostly cladists) consider weighting (or correcting) of data in any way to be wrong
- They argue that the subjectively decided weighting scheme may alter the results, in which case the results are not due to the signal in the data but due to the weighting scheme itself
- (but there is no such thing as “unweighted” data - this would be “equally weighted” data)

Sample Size
For morphological data one typically examines tens to hundreds of specimens when scoring characters
For molecular data such sample sizes are much more costly - typically people sequence one (or, preferably, a few) specimens per taxon of interest
A big issue that we will touch on repeatedly

Sample Size
Problems with one sequence per taxon:
- Much harder to identify cases of gene tree problems, xenology, hybridization, or introgression
- Also no way to check for lab error:
  - could have been a misidentification
  - due to cut & paste errors
  - or a tube mix-up
[Figure: imaginary example of possible misidentification or introgression etc. If the dataset had only 1 specimen per species this problem would not be apparent]

Sample Size
Lab / human error:
1. Three major parasitoid wasp taxa - one of the first molecular phylogenies. Agreed with morphology. Later discovered the sequences were those of cow, pig, and human! Got the “correct” tree by chance. Apocryphal? Story reported by Godfray & Knapp (2004)
2. Odd, published phylogeny of mammals that suggested the blue whale was a primate (see Page & Charleston 1999) - amazing that Allard & Carpenter published this tree
Less obvious mistakes might be scattered among the literature
So, when possible, verify the work of others…
  Godfray, H. C. J. & Knapp, S. (2004) Introduction to a theme issue ‘Taxonomy for the twenty-first century’. Phil. Trans. R. Soc. Lond. B 359: 559-569.
  Page, R. D. M. & Charleston, M. A. (1999) Comments on Allard & Carpenter (1996), or the ‘aquatic ape’ hypothesis revisited. Cladistics 15: 73-74.

Sample Size
Rule of thumb: should use multiple samples one rank below the taxon of interest
- e.g. if studying relationships among species then sample multiple specimens per species
- if studying relationships among genera then sample multiple species per genus, etc.

Phylogenetic Trees - Terminology
Phylogenetic trees are hypotheses or estimates of evolutionary relationships
- trees are an explanation of the data (how the data came to be the way they are)
If we assume:
1. Our characters are independent
2. Our character states are homologous (& genes orthologous)
3. Evolution has happened
We can infer the evolutionary relationships among organisms

Phylogenetic Trees - Terminology
Phylogenetic inference:
- Inference: drawing conclusions from premises
- Deduction: if premises true then inference is true
- Induction: if premises true then inference is probably true
In phylogenetics we can never do better than “probably” true

Phylogenetic Trees - Terminology
[Figure: A CLADOGRAM of terminals A-J, labelling the terminals (LEAVES, TIPS or OTUs), peripheral or terminal branches, interior branches, node 1, node 2, a polytomy (multifurcation) and the ROOT]
(((A,B),C),(((H,I,J),(F,G)),(D,E))); - Newick tree format

Phylogenetic Trees - Terminology
Cladogram - branch lengths meaningless; relationships only
Phylogram - branch lengths proportional to time and / or change
[Figure: these are all the same trees, drawn as cladograms and as a phylogram; axis: absolute time or divergence]

Phylogenetic Trees - Terminology
Dendrogram or Chronogram (aka ultrametric trees) - branch lengths proportional to time
- Require some estimate of a molecular clock
[Figure: ultrametric tree of terminals A-J; axis: absolute time or divergence]

Phylogenetic Trees - Terminology
Trees - Rooted and Unrooted
[Figure: an unrooted tree of archaea and eukaryotes, and the same tree rooted by a bacterial outgroup, marking the root and the monophyletic groups]

Phylogenetic Trees - Terminology
Trees - Rooted and Unrooted
Unrooted tree - a network that depicts the relationships among the OTUs but not the direction of evolution
- most methods / programs build unrooted trees
- these can later be rooted to establish the direction of evolution (and the polarity of the character states)
Rooted tree - there are various ways to root a tree
- most common is by addition of an outgroup
- where the outgroup joins the ingroup is the root!
[Figure: an unrooted tree and the same tree rooted with an outgroup]
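The outgroup-rooting rule (the root is placed where the outgroup joins the ingroup) can be sketched in Python. This is a minimal illustration under stated assumptions, not how any particular phylogenetics program works: the unrooted tree is given as a hypothetical list of branches between named nodes, internal nodes carry placeholder names (`x`, `y`, `z`), the outgroup is a single terminal taxon, and the rooted result is written in Newick format.

```python
from collections import defaultdict


def root_with_outgroup(edges: list[tuple[str, str]], outgroup: str) -> str:
    """Root an unrooted tree on the branch leading to `outgroup`; return Newick.

    `edges` lists the branches of the unrooted tree as pairs of node names.
    Terminal taxa (OTUs) are nodes with a single neighbour; internal nodes
    carry placeholder names that do not appear in the output.
    """
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)

    # A terminal taxon has exactly one neighbour: the node where the
    # outgroup joins the ingroup. Unpacking fails if that is not the case.
    (attachment,) = graph[outgroup]

    def subtree(node: str, parent: str) -> str:
        # Walk away from `parent`; leaves are written as their names,
        # internal nodes as a parenthesised list of their descendants.
        children = [n for n in graph[node] if n != parent]
        if not children:
            return node
        return "(" + ",".join(subtree(child, node) for child in children) + ")"

    # The root sits on the outgroup's branch: the outgroup on one side,
    # the whole ingroup (seen from the attachment node) on the other.
    return "(" + outgroup + "," + subtree(attachment, outgroup) + ");"


# Hypothetical unrooted tree for five taxa A-E; x, y, z are internal nodes.
EDGES = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"),
         ("y", "z"), ("D", "z"), ("E", "z")]

print(root_with_outgroup(EDGES, "A"))  # (A,(B,(C,(D,E))));
print(root_with_outgroup(EDGES, "E"))  # (E,(((A,B),C),D));
```

Rooting the same unrooted tree on different outgroups yields different rooted trees, which is why an unrooted tree says nothing about the direction of evolution until a root is chosen.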
Phylogenetic Trees - Terminology
Trees - Rooted and Unrooted
[Figure: the seven rooted trees that can be derived from an unrooted tree for five sequences. Each rooted tree 1-7 corresponds to placing the root on the correspondingly numbered branch of the unrooted tree.]
How many more rooted trees are there than unrooted?

Phylogenetic Trees - Terminology
Trees - Rooted and Unrooted
How many more rooted trees are there than unrooted (in general, that is)?
- Explains why programs tend to search on unrooted trees

Phylogenetic Trees - Terminology
Resolution - a tree with no polytomies is fully resolved
- Hard polytomy = simultaneous divergence (rapid radiation); no amount of data will resolve it
- Soft polytomy = divergence not simultaneous, but data currently insufficient to resolve it
- Difficult to distinguish in practice!!

Phylogenetic Trees - Terminology
CLADE - a set of OTUs that includes all the OTUs descended from their most recent common ancestor (if the set is given a name, the named group would be monophyletic)
- not all clades are given names
- typically, we use ‘clade’ when talking about trees and ‘monophyletic’ when talking about taxonomic names (classifications)!!

Phylogenetic Trees - Terminology
TOPOLOGY - the branching pattern of the tree (all that a cladogram depicts)
BRANCH LENGTH - proportional to the amount of evolutionary change (rate x time)
- a branch can be long because
  - the lineage has a higher rate of evolution, or
  - the rate of evolution is the same but the lineage has had more time to evolve than other lineages,
  - or a combination of these…

Phylogenetic Trees - Terminology
NODE - ancestral OTU
note: Hennig’s cladistic method requires an explicit distinction between anagenesis and cladogenesis (with the latter being the default)
- thus most fossils and potential ‘ancestors’ are not depicted as nodes but rather as OTUs
- cladograms are not evolutionary trees

Cladistics
Hennig’s original method:
1. Distinguish homologies from analogies (Remane’s criteria)
2. Distinguish derived homologies (apomorphies) from ancestral homologies (plesiomorphies)
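For the question raised under Trees - Rooted and Unrooted (how many more rooted than unrooted trees), the standard counts for fully resolved (bifurcating) trees on n taxa are (2n-5)!! unrooted trees and (2n-3)!! rooted trees, where k!! = k(k-2)(k-4)…1. An unrooted binary tree has 2n-3 branches, and placing the root on each one gives a distinct rooted tree, so there are 2n-3 times as many rooted trees: 7 for five sequences, matching the figure. The Python sketch below simply evaluates these formulas; the function names are hypothetical.

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ... ; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_trees(n: int) -> int:
    """Number of fully resolved unrooted trees for n >= 3 taxa: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n - 5)


def rooted_trees(n: int) -> int:
    """Number of fully resolved rooted trees for n >= 2 taxa: (2n-3)!!."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n - 3)


def branches_in_unrooted_tree(n: int) -> int:
    """A binary unrooted tree on n taxa has 2n - 3 branches (possible root positions)."""
    return 2 * n - 3


# Columns: taxa, unrooted trees, rooted trees, rooted per unrooted
for n in (4, 5, 10):
    print(n, unrooted_trees(n), rooted_trees(n), rooted_trees(n) // unrooted_trees(n))
# 4 3 15 5
# 5 15 105 7
# 10 2027025 34459425 17
```

The rooted tree space is always 2n-3 times larger than the unrooted one, which is one reason programs tend to search among unrooted trees and root the result afterwards.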