Adaptive and Conservative Protein Assets in Plasmodia. Two
bioRxiv preprint doi: https://doi.org/10.1101/2020.03.14.992107; this version posted January 2, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Title page

Article title: Evolutionary Pressures and Codon Bias in Low Complexity Regions of Plasmodia Parasites

Authors: Andrea Cappannini 1*, Sergio Forcelloni 3*, Andrea Giansanti 1,2

Affiliations:
1 Sapienza University of Rome, Department of Physics, P.le A. Moro 5, 00185 Roma, Italy.
2 Istituto Nazionale di Fisica Nucleare, INFN, Roma1 section, 00185 Roma, Italy.
3 Max Planck Institute of Biochemistry, Munich

Corresponding Author: Andrea Cappannini, Sapienza University of Rome, Department of Physics, P.le A. Moro 5, 00185 Roma, Italy.

Abstract

One of the most debated topics in evolutionary biology concerns the Low Complexity Regions of P. falciparum, the causative agent of the most virulent and deadly form of human malaria. In this work, we analysed the proteomes of 22 Plasmodium species, including P. falciparum. Proteins in which SEG predicts Low Complexity Regions turn out to be longer than those predicted to be completely complex (without Low Complexity Regions). Moreover, using well-known bioinformatics tools such as the Effective Number of Codons and the PR2, together with a new index that we call SPI, we observed that proteins embedding Low Complexity Regions are under lower selective pressure than those lacking this type of locus. By contrast, applying the Relative Synonymous Codon Usage and other tools developed ad hoc for this study, we found that the Low Complexity Regions appear to have a non-neutral codon bias relative to their host proteins.

Introduction

Owing to their critical implications for human health, Single Nucleotide Polymorphisms (SNPs) and Copy Number Variations (CNVs) are a fundamental field of research (Zhang et al. 2009). Tandem repeat regions (TRRs) represent a third type of genetic variation (Gemayel et al. 2012), whereby proteins host regions of reduced complexity and biased amino acid composition, also called low complexity regions (LCRs) (Toll-Riera et al. 2012). Despite the diversification of the various domains of life, loci of this type are ubiquitous (Kumari et al. 2015), occurring in Bacteria, Archaea and Eukarya (Wootton & Federhen, 1996). Taken as a whole, the literature indicates two main mechanisms that most likely explain the presence of TRRs and, consequently, their propensity to undergo indels, i.e., cycles of expansion and contraction. The first is Replication Slippage (RS) (Guy-Franck et al. 2008; Mosbach et al. 2019; Ellegren, 2004; Gemayel et al. 2012; Saitou, 2018). RS is a mutation process that occurs during DNA replication. It involves the denaturation and displacement of DNA strands, with the consequent decoupling of complementary bases (Levinson & Gutman, 1987). In a nutshell, a loop-out on the template strand produces a contraction, whereas a loop-out on the nascent strand produces an expansion (Gemayel et al. 2012). The other mechanism comprises recombination-like events (Gemayel et al. 2012; Ellegren, 2004; Verstrepen et al. 2005), such as unequal crossing-over and gene conversion. Some authors propose recombination as the predominant mechanism in minisatellite regions (Guy-Franck & Paques, 2000), whereas replication slippage would predominate in microsatellite regions (Kokoska et al. 1998). Guy-Franck and colleagues (2008) extensively surveyed the current definitions of TRRs based on the length of the repeating unit, highlighting a lack of consensus in the definition of microsatellites: some propose a repeating unit length of up to 5 nt, others up to 10 nt (Guy-Franck et al. 2008; Gemayel et al. 2012). These loci have a high mutation rate compared with other DNA regions (Brinkmann et al. 1998), ranging between 10^… and 10^… per cell generation (Gemayel et al. 2012).

Past research has identified several factors that contribute to the instability of these loci. Gemayel and colleagues (2012) compiled an extensive review of the papers that have dealt with this topic. Legendre et al. (2007) highlighted that the number of repeat units is the main characteristic contributing to the instability of these regions, followed by the length and the nucleotide purity of the tract, i.e., an uninterrupted succession of repeat units, with no bases lacking a counterpart in the repetitive sequence (Saitou, 2018). Considering homopolymeric sequences (repeats of the same amino acid), Verstrepen and colleagues (2005) showed instead that using several synonymous codons drastically increases the stability of these loci. The nucleotide content also affects the instability of these regions: poly-A or poly-T tracts have been observed to be more stable than the corresponding poly-G or poly-C tracts (Gragg et al. 2002). For a more detailed examination, we refer to the original work (Gemayel et al. 2012).

Low Complexity Regions are thought to result from trinucleotide slippage (Levinson & Gutman, 1987; Mularoni et al. 2006). In proteins, the phenotypes associated with these regions are often contradictory. LCRs frequently have pathogenic implications, as in neurodegenerative (Gatchel & Zoghbi, 2005) or developmental (Brown & Brown, 2004) pathologies. Nevertheless, LCRs are important for protein fitness. Shen et al. (2004) showed that the arginine- and serine-rich binding sites in the Exonic Splicing Enhancer (ESE) contribute to the assembly of pre-spliceosomes, supporting splicing and related activities. Similarly, Salichs and colleagues (2009) noted that histidine-rich sites are pivotal for sub-cellular localization. More generally, growing evidence shows that these regions are preferentially found in certain functional protein classes (Karlin et al. 2001; Alba & Guigo 2004; Faux et al. 2005) and that they do not disrupt the functional domains of their host proteins (Newfeld et al. 1994).

P. falciparum is a protozoan belonging to the phylum Apicomplexa. It is the etiological agent of the most severe and lethal form of human malaria (Pizzi & Frontali, 2001). This parasite has been a fruitful subject of genetic investigation, partly owing to its particularly high content of LCRs, mostly characterized by asparagine (N) residues. The functional role of these stretches is not yet fully understood, although several hypotheses have been advanced. Among others, Pizzi and Frontali (2001) proposed that they undergo continuous cycles of expansion and de novo generation, providing a source of antigenic variation that ultimately lets the parasite evade the host immune response. This hypothesis has been supported by other investigators in the field (Karlin et al. 2001; Ferreira et al. 2003; Cortès et al. 2005). On the one hand, in line with a general requirement for neo-functionalization, it was proposed that long N-repetitive stretches influence the local rate of translation, triggering ribosome pausing and ultimately behaving as tRNA sponges that assist co-translational folding (Frugier et al. 2009; Filisetti et al. 2013). On the other hand, Forsdyke (2016) discusses results obtained with Xue (2003), according to which similar N-repetitive stretches might play a dual role at the DNA level: since the nucleotide content of the P. falciparum genome is skewed towards AT, the N-repetitive stretches are thought to stabilize mRNAs and protect the corresponding genes from deleterious mutations. All of the cited works have undoubtedly provided the scientific community with important, but also conflicting, explanations for the presence of LCRs in P. falciparum. Recent advances have shown that asparagine promotes the formation of amyloid structures following thermal shocks (Halfmann et al. 2011). As explained by Muralidharan & Goldberg (2013), P. falciparum copes with extremely variable temperatures during its life cycle, e.g., passing from the relatively low ambient temperature of the mosquito to the human body at about 37 °C, and vice versa. Moreover, malaria is characterized by numerous cycles of fever, eventually causing the host temperature to reach > 40°

nLCPs. To address the issue of Darwinian selective pressures, we selected two common bioinformatics tools, i.e.