Molecular Evolution

Justin Fay
Center for Genome Sciences, Department of Genetics
4515 McKinley Ave., Rm 4305

Molecular evolution is the study of the causes and effects of evolutionary changes in molecules.

Species 1  GGCAGTGACATTTTCTAACGCGAAGGTACTT
Species 2  GGCAGCGCCATTTTCTAATGCGAGGGTACTT
Species 3  GGCAGCGCCATTGTCTAATGCGAGGGTACTT
           ***** * **** ***** **** *******

Phylogenetics
● Archaea
● Divergence times
● Human-chimp-Neanderthal

Comparative Genomics
● Ultraconserved sequences (mutation and selection)
● ENCODE
● FOXP2

Phylogenetics Methods

Table 1. Number of possible rooted and unrooted trees.

Number of sequences   Number of rooted trees   Number of unrooted trees
2                     1                        1
3                     3                        1
4                     15                       3
5                     105                      15
10                    34,459,425               2,027,025

Table 2. Distance matrix.

Sequence   A       B       C
A
B          d(AB)
C          d(AC)   d(BC)
D          d(AD)   d(BD)   d(CD)

Each d is the distance (substitution rate) between a pair of sequences.

[Figure: tree relating sequences A, B, C and D]

Taxonomists have long debated phylogenetic methods. There are many types of methods:
● Character state methods (also called cladistic methods), like parsimony.
● Distance or similarity based methods (also called phenetic methods), like UPGMA.
● Maximum likelihood and Bayesian methods.

Parsimony (non-parametric) and maximum likelihood (parametric) are both used when phylogeny is critical.

Software: PAUP, PHYLIP, MEGA, MrBayes

Gene trees vs Species trees

For a gene tree to reflect the species tree, two conditions must hold:
1. Orthology
2. Independence (no concerted evolution or horizontal transfer)

Orthologs are genes created by speciation events. Paralogs are genes created by duplication events. Homologs are genes that are similar because of shared ancestry. Orthologs and paralogs can be distinguished by i) synteny or ii) phylogeny.

[Figure: gene tree showing a duplication event followed by speciation into Species 1 and Species 2]

Gene Conversion and Horizontal Gene Transfer

[Figure: histone loci HHF1-HHT1 (Locus 1, Chr02) and HHT2-HHF2 (Locus 2, Chr14); the species tree compared with gene trees under no conversion (true phylogeny) and under gene conversion; horizontal transfer from vertebrate to bacteria and from bacteria to vertebrate]

Molecular Evolution (Comparative Genomics)

1. Conservation
   Annotation of genes, regulatory sequences and other functional elements.
   Functional sequences will remain conserved across distantly related species, whereas non-functional sequences will accumulate changes.
2. Divergence
   Evolution of genes, regulatory sequences and other functional elements.
   ● Species-specific functional sequences
   ● Functional sequences with new or modified functions

Origins of Molecular Evolution

Insulin was the first protein to be sequenced (1955), work for which Fred Sanger received the Nobel Prize.
Cytochrome C protein sequence (Margoliash et al. 1961).

The sequencing of the same proteins from different species established a number of key principles of molecular evolution:
1. Most proteins are highly conserved, and the changes that do occur are not found within functionally important sites. For example, human diabetics were treated with insulin purified from pigs and cows.
2. The rate of amino acid substitution is constant across phylogenetic lineages.

Molecular clock - the rate of amino acid or nucleotide substitution is constant per year across phylogenetic lineages (Zuckerkandl and Pauling 1962). The idea was controversial, but it revolutionized phylogenetics and set the stage for the neutral theory.

Neutral theory or neutral mutation random drift hypothesis - the vast majority of mutations that become polymorphic in a population and fixed between species are not driven by Darwinian selection but are neutral or nearly neutral with respect to fitness (Kimura 1968; King and Jukes 1969).

The neutral theory is dead; long live the neutral theory.

Difference between mutation rate and substitution rate

Mutation rate - the chance of a mutation occurring in each generation or cell division (does NOT depend on selection).
Substitution rate - the frequency at which mutations become fixed within a population (depends on selection).

Substitution rate = mutation rate * fixation probability * time
Fixation probability depends on selection.

[Figure: population frequency of mutations over time]

Nucleotide Substitution Models

Nucleotide substitution models correct for multiple hits.

Jukes and Cantor (JC69) Model (1969)

[Figure: substitutions among the purines (A, G) and the pyrimidines (C, T)]

Assumptions of JC model:
1) Equal base frequencies
2) Equal mutation rates between the bases
3) Constant mutation rate
4) No selection

Jukes Cantor Model: with p = 3/31 = 0.097 observed differences per site, $K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right) = 0.104$ substitutions per site.

Other nucleotide substitution models

Model   Assumption        Free parameters   Reference
JC69    A=G=C=T, ts=tv    1                 Jukes & Cantor 1969
K80     A=G=C=T           2                 Kimura 1980
F81     ts=tv             4                 Felsenstein 1981
HKY85                     5                 Hasegawa, Kishino & Yano 1985
GTR     unequal rates     9                 Tavare 1986

Substitution Rates with Selection

Substitution rate = mutation rate * fixation probability * time
The substitution rate for neutral mutations = 2Nµ * 1/(2N) * t = µt
The substitution rate for adaptive mutations = 2Nµ * 2s * t = 4Nsµt for 4Ns > 1

No selection: the substitution rate between two species is K = 2µt, because changes accumulate along both lineages for time t.

Selection: the fixation probability of a mutation at initial frequency q is

$P = \dfrac{1 - e^{-4N_e s q}}{1 - e^{-4N_e s}}$

[Figure: S. cerevisiae and S. paradoxus lineages, each separated from their common ancestor by time t]

Conserved sequences

[Figure: Human-Mouse conservation]

Species   Conserved*   Conserved noncoding (non-repetitive aligned)   Reference
Humans    3-8%         21%                                            Waterston et al. (2002)
Worms     18-37%       18%                                            Shabalina & Kondrashov (1999)
Flies     37-53%       40-70%                                         Andolfatto (2005)
Yeast     47-68%       30-40%                                         Chin et al. (2005), Doniger et al. (2005)
*Siepel et al. (2005)

Deletion and expression assays of conserved noncoding sequences: Pennacchio et al. 2006, Yun et al. 2012

Rapidly Evolving Genes (dN/dS)

Detecting selection using the nucleotide substitution rate

Synonymous change - mutation that does not change the amino acid sequence of a protein.
Nonsynonymous change - mutation that changes the amino acid sequence of a protein.
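The two definitions above can be made concrete with a small lookup against the standard genetic code. The following is a minimal sketch, not part of the original slides: the names `GENETIC_CODE` and `classify_change` are hypothetical, and the 64-letter string is the standard code written in TCAG codon order, with `*` marking stop codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons run TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
# One-letter amino acid codes, '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon_before: str, codon_after: str) -> str:
    """Label a codon change as synonymous or nonsynonymous (illustrative helper)."""
    before = GENETIC_CODE[codon_before.upper()]
    after = GENETIC_CODE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"


if __name__ == "__main__":
    # TTT and TTC both encode Phe; TTA encodes Leu.
    print(classify_change("TTT", "TTC"))
    print(classify_change("TTT", "TTA"))
```

Counting changes this way is only the numerator of each rate; the per-site denominators (numbers of synonymous and nonsynonymous sites) need a separate counting scheme that this sketch does not attempt.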
dN or Ka = the nonsynonymous substitution rate = # nonsynonymous changes / # nonsynonymous sites.
dS or Ks = the synonymous substitution rate = # synonymous changes / # synonymous sites.

Table 1. The genetic code.

Codon AA    Codon AA    Codon AA     Codon AA
TTT   Phe   TCT   Ser   TAT   Tyr    TGT   Cys
TTC   Phe   TCC   Ser   TAC   Tyr    TGC   Cys
TTA   Leu   TCA   Ser   TAA   Stop   TGA   Stop
TTG   Leu   TCG   Ser   TAG   Stop   TGG   Trp
CTT   Leu   CCT   Pro   CAT   His    CGT   Arg
CTC   Leu   CCC   Pro   CAC   His    CGC   Arg
CTA   Leu   CCA   Pro   CAA   Gln    CGA   Arg
CTG   Leu   CCG   Pro   CAG   Gln    CGG   Arg
ATT   Ile   ACT   Thr   AAT   Asn    AGT   Ser
ATC   Ile   ACC   Thr   AAC   Asn    AGC   Ser
ATA   Ile   ACA   Thr   AAA   Lys    AGA   Arg
ATG   Met   ACG   Thr   AAG   Lys    AGG   Arg
GTT   Val   GCT   Ala   GAT   Asp    GGT   Gly
GTC   Val   GCC   Ala   GAC   Asp    GGC   Gly
GTA   Val   GCA   Ala   GAA   Glu    GGA   Gly
GTG   Val   GCG   Ala   GAG   Glu    GGG   Gly

Interpretation of dN/dS ratios (assuming synonymous sites are neutral):
dN/dS = 1  No constraint on protein sequence, i.e. nonsynonymous changes are neutral.
dN/dS < 1  Functional constraint on the protein sequence, i.e. nonsynonymous mutations are deleterious.
dN/dS > 1  Change in the function of the protein sequence, i.e. nonsynonymous mutations are adaptive.

Rapidly Evolving Genes

dN increased by positive selection
dN decreased by negative selection
Problem: dN may be influenced by both and still be less than dS
Nayak et al. 2005

Branch Model (dN/dS) (rate heterogeneity)

15 copies in human; copy number varies in other primates
Johnson et al. 2001

Site Model (dN/dS)

● Positive selection on the egg receptor (VERL) for abalone sperm lysin.
● VERL and lysin act as a lock and key for fertilization.
● Co-evolution by sexual selection, conflict or microbial attack.
Galindo et al. 2003

Sites - methods
Maximum Parsimony (Suzuki)
Maximum Likelihood (PAML, HyPhy)

Models of molecular evolution

Key Assumptions:
➔ Alignments are correct
➔ Sites are independent
➔ Mutational & selection parameters

Alignment Accuracy & Coverage

[Figure: alignment accuracy and coverage with and without indels, with and without constraint] Pollard et al. 2004

Alignment differences

gp120 HIV/SIV: ClustalW alignment vs PRANK alignment (phylogeny aware)
Detection of positive selection depends on the alignment (Markova-Raina and Petrov 2011).

Mutation rate variation

● Transitions vs. Transversions - transitions occur twice as often as transversions
● CpG - Spontaneous deamination of 5-methylcytosine results in thymine and ammonia, 20x higher rate of transition
● 28% of mutations are transitions at CpG sites but only 3.5% of sites are CpG
● Genomic position (5-10%)
● Age, sex (2-10 fold)
● Repeats (polynucleotides, microsatellites)

Types of Mutations - WGS

Single nucleotide, Transpositions, Duplications, Insertion/Deletion, Rearrangement

G/C to A/T 2.9-fold higher than reverse! Predicts 74% AT content

[Figure: substitution rate as a function of GC content]
[Figure: BRCA1 sliding window Ka/Ks analysis]

Codon Bias

Measures of Codon Bias
CAI - codon adaptation index, based on relative usage of the codon to the most abundant codon for an amino acid
Fop - frequency of the optimal codon
ENC - effective number of codons, based on the deviation from equal usage

Explanation of Codon Bias
Bias towards GC ending codons that is not found in adjacent noncoding regions
Correlates with highly expressed genes
Correlates with tRNA abundance
Explanations: translational accuracy/speed, protein misfolding

Codon Bias is correlated with Synonymous Substitution Rate
Codon Bias correlation depends on distance

Codon models
αs = synonymous rate
βs = nonsynonymous rate
R = tv/ts
πny = frequency of target nucleotide n in codon y

Binding site models

● Sequence ~ binding affinity (Schneider et al. 1986, Berg and von Hippel 1987)
● Binding affinity ~ fitness (Gerland and Hwa 2002, Sengupta et al. 2002)
● Fitness ~ substitution rate (Moses et al. 2004)
Kimura 1962, Bulmer 1991, Moses et al. 2004

Biased Gene Conversion

AT to GC bias
Recombination occurs in hotspots
Recombination
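The slide on types of mutations states that a 2.9-fold excess of G/C to A/T mutations predicts 74% AT content. A hedged sketch of where that number can come from, assuming a simple two-state model in which mutation alone sets base composition (no selection and no biased gene conversion): if u is the GC to AT rate and v the AT to GC rate, the AT fraction stops changing when f = u / (u + v). The function names and the per-step rates below are illustrative assumptions, not values from the slides.

```python
def equilibrium_at_content(bias_ratio: float) -> float:
    """Equilibrium AT fraction when GC->AT is `bias_ratio` times AT->GC.

    The AT fraction f changes by u*(1 - f) - v*f per step, which is zero
    at f = u / (u + v); only the ratio u/v matters.
    """
    return bias_ratio / (1.0 + bias_ratio)


def simulate_at_content(u: float, v: float, start: float = 0.5,
                        steps: int = 100_000) -> float:
    """Iterate the two-state model from `start` to check the closed form."""
    f = start
    for _ in range(steps):
        f += u * (1.0 - f) - v * f
    return f


if __name__ == "__main__":
    # Ratio taken from the slide; the absolute rates are arbitrary.
    print(f"{equilibrium_at_content(2.9):.3f}")          # prints 0.744
    print(f"{simulate_at_content(2.9e-3, 1.0e-3):.3f}")
```

The closed form gives 2.9 / 3.9, about 0.74, matching the 74% quoted on the slide. A genome whose observed AT content is lower than this would suggest that some force other than mutation is acting on base composition.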