Immunoinformatics: Towards an Understanding of Species-Specific Protein Evolution Using Phylogenomics and Network Theory

Immunoinformatics: Towards an understanding of species-specific protein evolution using phylogenomics and network theory Andrew Edward Webb M.Sc. Genetics, B.Sc. Biotechnology (cum laude) A thesis presented to Dublin City University for the Degree of Doctor of Philosophy Supervisor: Dr. Mary J. O'Connell School of Biotechnology Dublin City University January 2015 Declaration ‘I hereby certify that this material, which I now submit for assessment on the programme of study leading to the award of Doctor of Philosophy is entirely my own work, that I have exercised reasonable care to ensure that the work is original, and does not to the best of my knowledge breach any law of copyright, and has not been taken from the work of others save and to the extent that such work has been cited and acknowledged within the text of my work.’ Signed: _____________________ ID Number: 10114017 Date: _____________________ ii Table of Contents Acknowledgements ............................................................................................... x Abbreviations .................................................................................................... xii List of Figures ......................................................... Error! Bookmark not defined. List of Tables ..................................................................................................... xix Abstract ............................................................................................................ xxii Thesis Aims ..................................................................................................... xxiii Chapter 1: Introduction ...................................................................................... 1 1.1 Innate Immunity ......................................................................................... 2 1.1.1 General overview of the innate immune system in vertebrates ......... 2 1.1.2 Major proteins categories involved in innate immunity .................... 2 1.1.3 Central innate immune proteins and pathways .................................. 4 1.1.3.1 The TLR signaling pathways ..................................................... 4 1.1.3.2 Vertebrate TLR repertoires ........................................................ 8 1.1.3.3 The complement system ............................................................. 9 1.1.4 Discordant innate immune responses ............................................... 10 1.1.4.1 Discordance reported in TRIM5α ............................................ 12 1.1.4.2 Discordance reported in the Toll-like Receptors ..................... 13 1.1.4.3 Variations reported in inflammation ........................................ 15 1.1.5 Understanding and predicting model species discordance............... 16 1.2 Natural Selection and Molecular Evolution ............................................. 17 1.2.1 Evolutionary theory .......................................................................... 17 1.2.2 Neutral theory................................................................................... 17 1.2.3 Natural Selection .............................................................................. 18 iii 1.2.4 Positive selection and functional shift.............................................. 19 1.2.5 The relationship between orthologs, paralogs, and function............ 24 1.3 Methods for assessing selective pressure variation .................................. 26 1.3.1 Distance-based methods ................................................................... 26 1.3.2 Phylogeny-based methods ................................................................ 27 1.3.2.1 Maximum likelihood methods ................................................. 28 1.3.2.2 CodeML ................................................................................... 29 1.3.2.3 Phylogenetic Reconstruction ..................................................... 33 1.3.4 Population-based methods ............................................................... 34 1.3.4.1 McDonald–Kreitman test ......................................................... 34 1.3.4.2 Tajima’s D test statistic ............................................................ 34 1.3.4.3 Fay and Wu’s H test statistic .................................................... 36 1.4 Data limitations in analyses of selective pressure variation..................... 37 1.4.1 Alignment Error ............................................................................... 37 1.4.2 Non-adaptive evolutionary signals mistaken as positive selection .. 38 1.4.3 Purifying Selection acting on silent sites mistaken for positive selection........................................................................................................ 39 1.4.4 In vitro validation of positive selection ............................................ 40 1.5 Graph theory and molecular evolution ..................................................... 41 1.5.1 Introduction to graph theory............................................................. 41 1.5.2 Characterizing Graphs ...................................................................... 45 1.5.2.1 Centrality .................................................................................. 45 1.5.2.2 Assortativity ............................................................................. 50 1.5.2.3 Cliques and Communities ........................................................ 51 1.5.2.4 Clustering ................................................................................. 56 iv 1.5.3 Graphs and introgressive descent ..................................................... 59 1.5.3.1 Tools for detecting introgressive events in networks ............... 59 1.5.4 Composite genes and functional discordance. ................................. 60 Chapter 2: Design and development of the bmeTools package ..................... 62 2.1 Chapter Aim ............................................................................................. 63 2.2 Introduction .............................................................................................. 64 2.3 Aims for selective pressure analysis package .......................................... 66 2.4 Motivation behind the development of bmeTools ................................... 66 2.4.1 Minimize human error...................................................................... 66 2.4.2 Increase user productivity ................................................................ 67 2.5 Rationale behind the development of bmeTools ...................................... 68 2.5.1 Selection of python programming language .................................... 68 2.5.2 Separation of package into analysis phases ..................................... 69 2.6 General overview of bmeTools ................................................................ 69 2.7 Phase 1 – Data Preparation ...................................................................... 72 2.7.1 Functions: clean and ensembl_clean ................................................ 72 2.7.1.1 Additional options of ‘clean’ and ‘ensembl_clean’ ................. 73 2.7.2 Function: translate ............................................................................ 76 2.7.3 Function: create_database ................................................................ 78 2.7.4 Function: gene_selection .................................................................. 78 2.8 Phase 2: Homology searching .................................................................. 81 2.8.1 Core options ..................................................................................... 84 2.8.2 Functions: similarity_groups and reciprocal_groups ....................... 84 2.8.3 Function: best_reciprocal_groups .................................................... 85 v 2.9 Phase 3: Alignment assessment and phylogeny reconstruction ............... 88 2.9.1 Function: metal_compare ................................................................. 88 2.9.2 Functions: prottest_setup and prottest_reader .................................. 89 2.9.3 Function: mrbayes_setup ................................................................ 90 2.10 Phase 4: Selection analysis ...................................................................... 92 2.10.1 Function: map_alignments............................................................... 92 2.10.2 Function: infer_genetree ................................................................. 94 2.10.3 Function: setup_codeml .................................................................. 96 2.10.4 Function: create_subtrees ............................................................... 98 2.10.5 Function: mrbayes_reader ............................................................. 100 2.10.6 Function: create_branch ................................................................ 100 2.11 Phase 5: Selection analysis assessment .................................................. 101 2.11.1 Function: codeml_reader .............................................................. 101 2.12 General requirements of the software package ...................................... 101 2.12.1 Core functions ............................................................................... 101 2.12.2 Software dependencies ................................................................. 102 2.13 Case study .............................................................................................

Immunoinformatics: Towards an Understanding of Species-Specific Protein Evolution Using Phylogenomics and Network Theory

Towards the Understanding of Ccnb1ip1 As a Co-Regulator of Meiotic Crossing-Over in the Mouse

Transcriptomic and Proteomic Profiling Provides Insight Into

Microtubular Dysfunction and Male Infertility

Sperm Differentiation

Theileria Annulata

Unraveling the Sperm Transcriptome by Next Generation Sequencing and the Global Epigenetic Landscape in Infertile Men Fadi Choucair

Microtubular Dysfunction and Male Infertility

Treating the Man with Evidence Based Medicine

An Evolutionarily Conserved Pirna-Producing Locus Required for Male Mouse Fertility

Loss-Of-Function Mutations in QRICH2 Cause Male Infertility with Multiple Morphological Abnormalities of the Sperm ﬂagella

Automatic Detection of the Circulating Cell-Free Methylated DNA Pattern

Testis-Expressed Protein 33 Is Not Essential for Spermiogenesis and Fertility in Mice