Significant Differences in Terms of Codon Usage Bias
Mioduser et al. BMC Genomics (2017) 18:866
DOI 10.1186/s12864-017-4248-7

RESEARCH ARTICLE, Open Access

Significant differences in terms of codon usage bias between bacteriophage early and late genes: a comparative genomics analysis

Oriah Mioduser1†, Eli Goz1,2† and Tamir Tuller1,2,3*

Abstract

Background: Viruses undergo extensive evolutionary selection for efficient replication which affects, among others, their codon distribution. In the current study, we aimed at understanding the way evolution shapes the codon distribution in early vs. late viral genes in terms of their expression during different stages in the viral replication cycle. To this end we analyzed 14 bacteriophages and 11 human viruses with available information about the expression phases of their genes.

Results: We demonstrated evidence of selection for distinct composition of synonymous codons in early and late viral genes in 50% of the analyzed bacteriophages. Among others, this phenomenon may be related to the time-specific adaptation of the viral genes to the translation efficiency factors involved at different bacteriophage developmental stages. Specifically, we showed that the differences in codon composition in different temporal gene groups cannot be explained only by phylogenetic proximities between the analyzed bacteriophages, and can be partially explained by differences in the adaptation to the host tRNA pool, nucleotide bias, GC content and more. In contrast, no difference in temporal regulation of synonymous codon usage was observed in human viruses, possibly because of a stronger selection pressure due to a larger effective population size in bacteriophages and their bacterial hosts.

Conclusions: The codon distribution in large fractions of bacteriophage genomes tends to be different in early and late genes. This phenomenon seems to be related to various aspects of the viral life cycle, and to various intracellular processes.
We believe that the reported results should contribute towards a better understanding of viral evolution and may promote the development of relevant procedures in synthetic virology.

Keywords: Viral evolution, Codon usage bias (CUB), Bacteriophage genome evolution, Viral life cycle, Coding regions, Synthetic virology

* Correspondence: [email protected]
†Equal contributors
1 Department of Biomedical Engineering, Tel-Aviv University, Ramat Aviv, Israel
2 SynVaccine Ltd., Ramat Hachayal, Tel Aviv, Israel
Full list of author information is available at the end of the article

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Deciphering the regulatory information encoded in the genomes of phages and other viruses, and the relation between the nucleotide composition of the coding regions and the viral fitness, has been of great interest in recent years.

Gene expression within different deoxyribonucleic acid (DNA) viruses or viruses with a DNA intermediate, such as herpes-, lenti-/retro-, polyoma-, papilloma-, adeno- and parvoviruses and various families of bacteriophages, is regulated in a temporal fashion and can be divided into early and late stages with respect to the viral replication cycle [1–8]. The early genes are expressed following the entry into the host cell and typically code for non-structural proteins that are responsible for different regulatory functions in processes such as: viral DNA replication, activation of late gene expression, trans-nuclear transport, interaction with the host cell, induction of the cell's DNA replication machinery necessary for viral replication, etc. [9, 10]. Late genes largely code for structural proteins required for virion assembly; they are generally highly expressed and their expression is usually induced or regulated by the early genes [9, 10].

Several studies have shown that viral codon frequencies tend to undergo evolutionary pressure for specific CUB; among others, it was suggested that viral CUB is under selection for improving the viral fitness, and specifically the viral gene expression [11–33].

In particular, different trends of translation efficiency adaptation in the coding regions of bacteriophage Lambda early and late genes were demonstrated in [17]. Specifically, it was shown that the preferences of codons in early genes, but not in the late genes, were similar to those of the bacterial host [17]. The analysis of ribosome profiling data revealed that the codon decoding rates of viral genes tend to correlate with their expression levels [17]. Interestingly, during the initial stages of phage development the decoding rates in early genes were found to be higher than the decoding rates in late genes; at more advanced stages of the viral cycle the opposite trend was demonstrated [17].

In this study we go further, and perform a comparative genomics analysis of the temporal differences in CUB in almost all viruses for which a classification of genes into early and late groups exists in the literature. Specifically, based on an analysis of 14 bacteriophages and 11 human viruses, we suggest that 50% of the analyzed bacteriophages tend to undergo an extensive evolutionary selection for distinct compositions of synonymous codons in early and late viral genes. We analyze the features of the genomes that undergo this type of selection and argue that the differential CUB can be related to various intracellular phenomena and processes, such as: translational selection and regulation [11, 12, 17, 21, 22, 28, 31], mutational bias and pressure [16, 20, 21, 23, 26, 27, 30, 32, 33], amino acid (AA) composition [12, 16], and other genomic characteristics, some of which are still not fully understood [13, 14, 29, 34].

Finally, we discuss a possible application of our findings to synthetic virology. Specifically, we suggest using the temporally regulated CUB for controlling viral gene expression at different time points during the life cycle, in order to design optimized and/or deoptimized synthetic viruses which can be used in exploring novel strategies in vaccination (e.g. live attenuated vaccines) and cancer therapy (oncolytic viruses).

Results

The research outline of the study is described in Fig. 1. More details can be found in the following sections.

Bacteriophage early and late genes tend to have different compositions of synonymous codons

Genome-level information about the viruses analyzed in this study, such as their hosts, number of genes, gene lengths and effective number of codons (ENC), is displayed in Additional file 1: Table S1 and Figure S1.

In order to compare synonymous codon usage in early and late genes, each coding sequence was represented by its relative synonymous codon frequencies (RSCF): a 61-dimensional vector in which each sense codon is represented by its frequency in that sequence, normalized relative to the frequencies of the other synonymous codons coding for the same AA. We then performed a clustering analysis, assuming that RSCF vectors that are closer with respect to the Euclidean metric correspond to genes with a more similar content of synonymous codons (see Materials and Methods).

Our results suggest that early and late genes in 50% of the analyzed bacteriophages tend to exploit different synonymous codons. Specifically, in 7 of the 14 analyzed bacteriophages, early and late genes were found to be significantly (p-value ≤0.05) separated according to the frequencies of their synonymous codons (Figs. 2, 3a, b, Additional file 1: Figure S2 and Figure S3 in Section 1.2). Our analysis provides evidence that different sets of synonymous codons in early vs. late genes are selected for in the course of viral evolution; these differences may be related to the optimization of bacteriophage fitness in different phases of the viral life cycle.

In addition, 6 out of 14 bacteriophages were also found to be significantly (p-value <0.05) separated according to the AA composition of their early and late genes (Fig. 3b, Additional file 1: Figure S4 and Figure S5 in Section 1.2). Four viruses were characterized both by a differential synonymous codon usage and by a differential AA usage in their early and late genes. These findings suggest that, among others, the different codon distribution in early and late genes may be partially related to the functionality of the encoded proteins via their AA content and possibly protein folding [35].

To check if bacteriophages with significant differences in synonymous codon usage in temporal genes tend to have more similar genomic sequences (usually related to smaller evolutionary distances), we reconstructed a phylogenetic tree of the bacteriophage proteomes based on the Average Repetitive Subsequences (ARS) distance matrix and the neighbor joining method, as described in the Materials and Methods section and in references therein (Fig. 3a). We then performed a statistical analysis in order to investigate the relation between the differences in temporal regulation of synonymous codons in different viruses and their evolutionary distances. We