Immunoinformatics: Towards an Understanding of Species-Specific Protein Evolution Using Phylogenomics and Network Theory
Total Page:16
File Type:pdf, Size:1020Kb
Immunoinformatics: Towards an understanding of species-specific protein evolution using phylogenomics and network theory Andrew Edward Webb M.Sc. Genetics, B.Sc. Biotechnology (cum laude) A thesis presented to Dublin City University for the Degree of Doctor of Philosophy Supervisor: Dr. Mary J. O'Connell School of Biotechnology Dublin City University January 2015 Declaration ‘I hereby certify that this material, which I now submit for assessment on the programme of study leading to the award of Doctor of Philosophy is entirely my own work, that I have exercised reasonable care to ensure that the work is original, and does not to the best of my knowledge breach any law of copyright, and has not been taken from the work of others save and to the extent that such work has been cited and acknowledged within the text of my work.’ Signed: _____________________ ID Number: 10114017 Date: _____________________ ii Table of Contents Acknowledgements ............................................................................................... x Abbreviations .................................................................................................... xii List of Figures ......................................................... Error! Bookmark not defined. List of Tables ..................................................................................................... xix Abstract ............................................................................................................ xxii Thesis Aims ..................................................................................................... xxiii Chapter 1: Introduction ...................................................................................... 1 1.1 Innate Immunity ......................................................................................... 2 1.1.1 General overview of the innate immune system in vertebrates ......... 2 1.1.2 Major proteins categories involved in innate immunity .................... 2 1.1.3 Central innate immune proteins and pathways .................................. 4 1.1.3.1 The TLR signaling pathways ..................................................... 4 1.1.3.2 Vertebrate TLR repertoires ........................................................ 8 1.1.3.3 The complement system ............................................................. 9 1.1.4 Discordant innate immune responses ............................................... 10 1.1.4.1 Discordance reported in TRIM5α ............................................ 12 1.1.4.2 Discordance reported in the Toll-like Receptors ..................... 13 1.1.4.3 Variations reported in inflammation ........................................ 15 1.1.5 Understanding and predicting model species discordance............... 16 1.2 Natural Selection and Molecular Evolution ............................................. 17 1.2.1 Evolutionary theory .......................................................................... 17 1.2.2 Neutral theory................................................................................... 17 1.2.3 Natural Selection .............................................................................. 18 iii 1.2.4 Positive selection and functional shift.............................................. 19 1.2.5 The relationship between orthologs, paralogs, and function............ 24 1.3 Methods for assessing selective pressure variation .................................. 26 1.3.1 Distance-based methods ................................................................... 26 1.3.2 Phylogeny-based methods ................................................................ 27 1.3.2.1 Maximum likelihood methods ................................................. 28 1.3.2.2 CodeML ................................................................................... 29 1.3.2.3 Phylogenetic Reconstruction ..................................................... 33 1.3.4 Population-based methods ............................................................... 34 1.3.4.1 McDonald–Kreitman test ......................................................... 34 1.3.4.2 Tajima’s D test statistic ............................................................ 34 1.3.4.3 Fay and Wu’s H test statistic .................................................... 36 1.4 Data limitations in analyses of selective pressure variation..................... 37 1.4.1 Alignment Error ............................................................................... 37 1.4.2 Non-adaptive evolutionary signals mistaken as positive selection .. 38 1.4.3 Purifying Selection acting on silent sites mistaken for positive selection........................................................................................................ 39 1.4.4 In vitro validation of positive selection ............................................ 40 1.5 Graph theory and molecular evolution ..................................................... 41 1.5.1 Introduction to graph theory............................................................. 41 1.5.2 Characterizing Graphs ...................................................................... 45 1.5.2.1 Centrality .................................................................................. 45 1.5.2.2 Assortativity ............................................................................. 50 1.5.2.3 Cliques and Communities ........................................................ 51 1.5.2.4 Clustering ................................................................................. 56 iv 1.5.3 Graphs and introgressive descent ..................................................... 59 1.5.3.1 Tools for detecting introgressive events in networks ............... 59 1.5.4 Composite genes and functional discordance. ................................. 60 Chapter 2: Design and development of the bmeTools package ..................... 62 2.1 Chapter Aim ............................................................................................. 63 2.2 Introduction .............................................................................................. 64 2.3 Aims for selective pressure analysis package .......................................... 66 2.4 Motivation behind the development of bmeTools ................................... 66 2.4.1 Minimize human error...................................................................... 66 2.4.2 Increase user productivity ................................................................ 67 2.5 Rationale behind the development of bmeTools ...................................... 68 2.5.1 Selection of python programming language .................................... 68 2.5.2 Separation of package into analysis phases ..................................... 69 2.6 General overview of bmeTools ................................................................ 69 2.7 Phase 1 – Data Preparation ...................................................................... 72 2.7.1 Functions: clean and ensembl_clean ................................................ 72 2.7.1.1 Additional options of ‘clean’ and ‘ensembl_clean’ ................. 73 2.7.2 Function: translate ............................................................................ 76 2.7.3 Function: create_database ................................................................ 78 2.7.4 Function: gene_selection .................................................................. 78 2.8 Phase 2: Homology searching .................................................................. 81 2.8.1 Core options ..................................................................................... 84 2.8.2 Functions: similarity_groups and reciprocal_groups ....................... 84 2.8.3 Function: best_reciprocal_groups .................................................... 85 v 2.9 Phase 3: Alignment assessment and phylogeny reconstruction ............... 88 2.9.1 Function: metal_compare ................................................................. 88 2.9.2 Functions: prottest_setup and prottest_reader .................................. 89 2.9.3 Function: mrbayes_setup ................................................................ 90 2.10 Phase 4: Selection analysis ...................................................................... 92 2.10.1 Function: map_alignments............................................................... 92 2.10.2 Function: infer_genetree ................................................................. 94 2.10.3 Function: setup_codeml .................................................................. 96 2.10.4 Function: create_subtrees ............................................................... 98 2.10.5 Function: mrbayes_reader ............................................................. 100 2.10.6 Function: create_branch ................................................................ 100 2.11 Phase 5: Selection analysis assessment .................................................. 101 2.11.1 Function: codeml_reader .............................................................. 101 2.12 General requirements of the software package ...................................... 101 2.12.1 Core functions ............................................................................... 101 2.12.2 Software dependencies ................................................................. 102 2.13 Case study .............................................................................................