Genome-Scale Detection of Positive Selection in Nine Primates Predicts Human-Virus Evolutionary Conflicts
10634–10648 Nucleic Acids Research, 2017, Vol. 45, No. 18
Published online 11 August 2017
doi: 10.1093/nar/gkx704

Robin van der Lee1,*, Laurens Wiel1,2, Teunis J.P. van Dam1 and Martijn A. Huynen1

1Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands and 2Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, The Netherlands

Received May 02, 2017; Revised July 26, 2017; Editorial Decision July 30, 2017; Accepted August 02, 2017

*To whom correspondence should be addressed. Tel: +1 604 875 2345 (Ext 5273); Email: [email protected]

Present addresses: Robin van der Lee, Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children’s Hospital Research Institute, University of British Columbia, Vancouver, BC, Canada. Teunis J.P. van Dam, Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, The Netherlands.

© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

Hotspots of rapid genome evolution hold clues about human adaptation. We present a comparative analysis of nine whole-genome sequenced primates to identify high-confidence targets of positive selection. We find strong statistical evidence for positive selection in 331 protein-coding genes (3%), pinpointing 934 adaptively evolving codons (0.014%). Our new procedure is stringent and reveals substantial artefacts (20% of initial predictions) that have inflated previous estimates. The final 331 positively selected genes (PSG) are strongly enriched for innate and adaptive immunity, secreted and cell membrane proteins (e.g. pattern recognition, complement, cytokines, immune receptors, MHC, Siglecs). We also find evidence for positive selection in reproduction and chromosome segregation (e.g. centromere-associated CENPO, CENPT), apolipoproteins, smell/taste receptors and mitochondrial proteins. Focusing on the virus–host interaction, we retrieve most evolutionary conflicts known to influence antiviral activity (e.g. TRIM5, MAVS, SAMHD1, tetherin) and predict 70 novel cases through integration with virus–human interaction data. Protein structure analysis further identifies positive selection in the interaction interfaces between viruses and their cellular receptors (CD4-HIV; CD46-measles, adenoviruses; CD55-picornaviruses). Finally, primate PSG consistently show high sequence variation in human exomes, suggesting ongoing evolution. Our curated dataset of positive selection is a rich source for studying the genetics underlying human (antiviral) phenotypes. Procedures and data are available at https://github.com/robinvanderlee/positive-selection.

INTRODUCTION

Conservation of structure and sequence often indicate biological function. Rapidly evolving sequence features may however also indicate function, as they may reveal molecular adaptations to new selection pressures. But what drives these rapid genetic changes during evolution? Can these changes explain the specific phenotypes of species or individuals, such as a differential susceptibility to viruses?

Immunity genes contain the strongest signatures of rapid evolution due to positive Darwinian selection (1–13). Pathogens continuously invent new ways to evade, counteract and suppress the immune response of their hosts, thereby acting as major drivers of the observed adaptive evolution of immune systems (14,15). In line with this, the structural interfaces of human proteins directly involved in virus interactions show accelerated evolution (16). Numerous proteins involved in the virus–host interaction have been demonstrated to be in genetic conflict with their interacting viral proteins, a phenomenon that has been likened to a virus–host ‘arms race’ (15). Such studies have generally focused on a single gene or gene family of interest sequenced across a large number of species. Evolutionary analyses can then predict which genes and codons may be involved in virus interactions. For example, mitochondrial antiviral signaling protein (MAVS) is a central signaling hub in the RIG-I-like receptor (RLR) pathway, which recognizes infections of a wide range of viruses from the presence of their RNA in the cytosol. Analysis of the MAVS gene in 21 primates identified several positions that have evolved under recurrent strong positive selection and turned out to be critical for resisting cleavage by Hepatitis C virus (17). Other examples of immunity genes showing evolutionary divergence that directly impacts the ability to restrict viral replication include TRIM5α (18), PKR (19) and MxA (20).

Positive selection can be detected through comparative analysis of protein-coding DNA sequences from multiple species (9,21). Markov models of codon evolution combined with maximum likelihood (ML) methods (22) can analyze alignments of orthologous sequences to identify genes, codons and lineages that show an excess of nonsynonymous substitutions (mutations in the DNA that cause changes to the protein) compared to synonymous (‘silent’) substitutions (dN/dS ratio or ω, see Text S1 for a detailed explanation). Successful application requires many steps (15,23) (Text S1), including: (i) the identification of orthologous sequences, sampled from species across an appropriate evolutionary distance (distant enough to show variation, but not too divergent to prevent saturation); (ii) accurate alignment and phylogenetic tree reconstruction; (iii) parameterization of the ML model. While studies of positive selection on individual genes have achieved reliable results, estimates of positive selection in whole genomes have been substantially affected by unreliabilities in sequencing, gene models, annotation and misalignment (24–30). In addition to the analysis of genomes from different species, the increasing availability of individual human genomes and populations provides new opportunities to systematically analyze more recent sequence variation (9,31).

In this study, we performed comparative evolutionary analyses of recent whole-genome sequenced primates as well as of human genetic variation to identify high-confidence targets of positive selection. Our findings provide insights into the biological systems that have undergone molecular adaptation in primate evolution, which goes beyond immunity genes. Interrogation of the positively selected genes with structural and genomic data describing virus–host interactions provides insights into potential determinants of viral infection and predicts new virus–human evolutionary conflicts.

MATERIALS AND METHODS

Genes, orthologs and sequences

We obtained protein-coding DNA sequences of all nine simian primates for which high-coverage whole-genome sequences are currently available

tholog was annotated (likely a combination of suboptimal genome annotation and lineage-specific gene loss), while another 3798 were discarded because they were part of one-to-many or many-to-many relationships. Human genes part of the one-to-one ortholog clusters are generally a good representation of all protein-coding human genes, as only functions related to olfactory signaling and sensory perception of smell are strongly underrepresented among them. A number of other categories are moderately underrepresented (antigen presentation and MHC; translation and ribosome) or overrepresented (development and differentiation, including neuron, synapse and nervous system development; kinases and phosphatases; cell motility and migration). Note that our analyses of PSG functions (below) correct for these patterns by using the one-to-one orthologs as a background.

Initial alignments

We first obtained multiple alignments of the clusters of orthologous primate protein sequences using MUSCLE (34) and MAFFT (35), and from the Compara pipeline (i.e. filtering the larger vertebrate alignment for primate sequences) (33). Inspections revealed that misalignment of nonhomologous codons affects virtually all alignments, as was observed in previous studies (25–29). This is probably the result of the tendency of alignment algorithms to overalign, i.e. produce alignments that are shorter than the true solution due to collapsed insertions in an attempt to avoid gap penalties (26,36). The PRANK algorithm to some extent prevents alignment of nonhomologous regions by flagging gaps made during different stages of progressive alignment and permitting their reuse without further penalties (36). As PRANK has been shown to provide better initial alignments for large-scale positive selection detection (25–30), we obtained multiple alignments of the primate ortholog clusters using the PRANK codon mode (prank +F -codon; v.140603). We used the default settings of (i) obtaining a guide tree from MAFFT for the progressive alignment procedure and (ii) selecting the best alignment from five iterations. These settings likely result in the best alignment for a