A Gene-Based Positive Selection Detection Approach to Identify Vaccine Candidates Using Toxoplasma Gondii As a Test Case Protozoan Pathogen
Total Page:16
File Type:pdf, Size:1020Kb
fgene-09-00332 August 17, 2018 Time: 10:20 # 1 ORIGINAL RESEARCH published: 20 August 2018 doi: 10.3389/fgene.2018.00332 A Gene-Based Positive Selection Detection Approach to Identify Vaccine Candidates Using Toxoplasma gondii as a Test Case Protozoan Pathogen Stephen J. Goodswen1, Paul J. Kennedy2 and John T. Ellis1* 1 School of Life Sciences, University of Technology Sydney, Ultimo, NSW, Australia, 2 School of Software, Faculty of Engineering and Information Technology, Centre for Artificial Intelligence, University of Technology Sydney, Ultimo, NSW, Australia Over the last two decades, various in silico approaches have been developed and refined that attempt to identify protein and/or peptide vaccines candidates from Edited by: José M. Álvarez-Castro, informative signals encoded in protein sequences of a target pathogen. As to date, Universidade de Santiago no signal has been identified that clearly indicates a protein will effectively contribute to de Compostela, Spain a protective immune response in a host. The premise for this study is that proteins under Reviewed by: Marcelo R. S. Briones, positive selection from the immune system are more likely suitable vaccine candidates Federal University of São Paulo, Brazil than proteins exposed to other selection pressures. Furthermore, our expectation is that Manuel Alfonso Patarroyo, protein sequence regions encoding major histocompatibility complexes (MHC) binding Fundación Instituto de Inmunología de Colombia, Colombia peptides will contain consecutive positive selection sites. Using freely available data *Correspondence: and bioinformatic tools, we present a high-throughput approach through a pipeline that John T. Ellis predicts positive selection sites, protein subcellular locations, and sequence locations [email protected] of medium to high T-Cell MHC class I binding peptides. Positive selection sites are Specialty section: estimated from a sequence alignment by comparing rates of synonymous (dS) and This article was submitted to non-synonymous (dN) substitutions among protein coding sequences of orthologous Evolutionary and Population Genetics, a section of the journal genes in a phylogeny. The main pipeline output is a list of protein vaccine candidates Frontiers in Genetics predicted to be naturally exposed to the immune system and containing sites under Received: 23 March 2018 positive selection. Candidates are ranked with respect to the number of consecutive Accepted: 02 August 2018 sites located on protein sequence regions encoding MHCI-binding peptides. Results Published: 20 August 2018 are constrained by the reliability of prediction programs and quality of input data. Citation: Goodswen SJ, Kennedy PJ and Protein sequences from Toxoplasma gondii ME49 strain (TGME49) were used as a case Ellis JT (2018) A Gene-Based Positive study. Surface antigen (SAG), dense granules (GRA), microneme (MIC), and rhoptry Selection Detection Approach to Identify Vaccine Candidates Using (ROP) proteins are considered worthy T. gondii candidates. Given 8263 TGME49 Toxoplasma gondii as a Test Case protein sequences processed anonymously, the top 10 predicted candidates were all Protozoan Pathogen. worthy candidates. In particular, the top ten included ROP5 and ROP18, which are Front. Genet. 9:332. doi: 10.3389/fgene.2018.00332 T. gondii virulence determinants. The chance of randomly selecting a ROP protein Frontiers in Genetics| www.frontiersin.org 1 August 2018| Volume 9| Article 332 fgene-09-00332 August 17, 2018 Time: 10:20 # 2 Goodswen et al. Positive Selection of Vaccine Candidates for Parasites was 0.2% given 8263 sequences. We conclude that the approach described is a valuable addition to other in silico approaches to identify vaccines candidates worthy of laboratory validation and could be adapted for other apicomplexan parasite species (with appropriate data). Keywords: Toxoplasma gondii, Hammondia hammondi, Neospora caninum, reverse vaccinology, vaccine discovery, positive selection INTRODUCTION positive with the main environmental pressure being the immune system. We use the term ‘positive selection’ in the context of any Since the inception of reverse vaccinology (Rappuoli, 2000) type of selection where newly derived mutation has a selective almost two decades ago, researchers have applied various advantage over other mutations and that the majority of the fixed in silico approaches to identify protein and/or peptide vaccines mutations are adaptive even if most mutations are deleterious or candidates worthy of laboratory validation. These approaches neutral (Kaplan et al., 1989; Thiltgen et al., 2017). were previously reviewed (Bowman et al., 2011; Jones, 2012; Pathogens are believed to represent one of the strongest Donati and Rappuoli, 2013; Rappuoli et al., 2016). Fundamental selective pressures acting on humans (Fumagalli et al., 2011). to each approach is the detection of signals or patterns encoded Conversely, pathogens are naturally under strong selection to in protein sequences of a target pathogen. As to date, no prevent detection from the host’s immune system. Eventually, signal has been identified that clearly indicates a protein will advantageous escape mutations become fixed at the molecular effectively contribute to a protective immune response in a host. level. However, the host’s immune system is also under strong Consequently, the current best practice is to predict pertinent selection for mutations that enable the pathogen to be detected. protein characteristics from informative signals that collectively Subsequently, once again, the pathogen will experience selection support the likelihood the protein will make a credible candidate. for new escape mutations in this evolutionary arms race Although there is no proven set of characteristics, the general (Thiltgen et al., 2017). This see-sawing process leaves sequence community consensus delineating a valued characteristic is one patterns of variation indicating selected and neutral regions, i.e., inferring that a protein is either external to or located on, or in, genomic footprints known as selection signatures. Proteins that the membrane of a pathogen. Such a protein type is deemed more continually avoid detection by the immune system are expected accessible to surveillance by the immune system than one within not to have these signatures. the interior of a pathogen (Davies and Flower, 2007; Flower et al., The premise for this study was that proteins under positive 2010). In this study, we investigate the amino acid substitution selection, as identified by selection signatures, are more likely rate of a protein as an additional characteristic supporting its suitable vaccine candidates than proteins under negative or vaccine candidacy. purifying selection. The expectation is that those proteins Protein sequences from Toxoplasma gondii were used as a naturally exposed to the immune system, such as the target case study. Toxoplasma is an obligate intracellular pathogen candidates, will possess and maintain higher levels of genetic responsible for birth defects in humans (Montoya and Liesenfeld, variation than interior ones; in effect creating a greater pool of 2004) and an important model system for the phylum mutations for natural selection to act upon to avoid recognition Apicomplexa (Roos et al., 1999; Kim and Weiss, 2004; Che by the immune system (Pacheco et al., 2012; Obara et al., 2016; et al., 2010). An apicomplexan pathogen invades a host cell Bigham et al., 2018; Garzon-Ospina et al., 2018). first by, recognizing host-cell surface receptors via surface One of the best known gene-based methods to detect antigens (SAGs) on its cell membrane, and then secreting positive selection at the molecular level is based on codon proteins (excreted/secreted antigens) from the organelles of analysis. This method compares patterns of synonymous and apical complexes, including Dense Granules (GRA), Micronemes non-synonymous mutations in protein coding sequences from (MIC), and Rhoptries (ROP) (Chen et al., 2008). Consequently, divergent species. Synonymous mutations (functionally silent or SAG, GRA, MIC and ROP proteins have been the primary neutral) are presumed not to change the amino acid sequence antigens under investigation in numerous recombinant/subunit of the protein encoded, whereas non-synonymous mutations do vaccine studies (Kur et al., 2009; Zhang et al., 2015) due to their alter the amino acid sequence and are subject to natural selection. natural exposure to the immune system and potential to induce The synonymous substitution rate is the neutral rate mS = m. In a host immune response. These latter proteins are referred to contrast, the non-synonymous substitution rate will be typically henceforth as target candidates. different to the neutral rate mN 6D m. All pathogen proteins are susceptible to various types of Ideally, we need a chronological series of ancestor genes to selection in response to environmental pressures. This is from truly calculate mN and mS rates. This is obviously an impracticable the perception that natural selection acts mostly on expressed ideal and so non-synonymous and synonymous distances, among proteins (i.e., the phenotype) rather than directly on genetic coding sequences of orthologous genes in a phylogeny, are material. The three main types of selection are positive, estimated from a sequence alignment (dN = tmN and dS = tmS negative/purifying, and balancing (Harris and Meyer, 2006; where t is the time of divergence or branch length in the Oleksyk et al., 2010). The selection