Mioduser et al. BMC Genomics (2017) 18:866 DOI 10.1186/s12864-017-4248-7

RESEARCH ARTICLE Open Access
Significant differences in terms of codon usage bias between bacteriophage early and late genes: a comparative genomics analysis
Oriah Mioduser1†, Eli Goz1,2† and Tamir Tuller1,2,3*

Abstract
Background: Viruses undergo extensive evolutionary selection for efficient replication, which affects, among others, their codon distribution. In the current study, we aimed at understanding the way evolution shapes the codon distribution in early vs. late viral genes in terms of their expression during different stages in the viral replication cycle. To this end we analyzed 14 bacteriophages and 11 human viruses with available information about the expression phases of their genes.
Results: We demonstrated evidence of selection for distinct composition of synonymous codons in early and late viral genes in 50% of the analyzed bacteriophages. Among others, this phenomenon may be related to the time-specific adaptation of the viral genes to the efficiency factors involved at different bacteriophage developmental stages. Specifically, we showed that the differences in codon composition in different temporal groups cannot be explained only by phylogenetic proximities between the analyzed bacteriophages, and can be partially explained by differences in the adaptation to the host tRNA pool, nucleotide bias, GC content and more. In contrast, no difference in temporal regulation of synonymous codon usage was observed in human viruses, possibly because of a stronger selection pressure due to a larger effective population size in bacteriophages and their bacterial hosts.
Conclusions: The codon distribution in large fractions of bacteriophage genomes tends to be different in early and late genes. This phenomenon seems to be related to various aspects of the viral life cycle, and to various intracellular processes. We believe that the reported results should contribute towards better understanding of viral evolution and may promote the development of relevant procedures in synthetic virology.
Keywords: Viral evolution, Codon usage bias (CUB), Bacteriophage evolution, Viral life cycle, Coding regions, Synthetic virology

Background
Deciphering the regulatory information encoded in the genomes of phages and other viruses, and the relation between the nucleotide composition of the coding regions and the viral fitness, has been of great interest in recent years.
Gene expression within different Deoxyribonucleic Acid (DNA) viruses or viruses with a DNA intermediate, such as herpes, lenti-retro, polyoma, papilloma, adeno, parvo and various families of bacteriophages, is regulated in a temporal fashion and can be divided into early and late stages with respect to the viral replication cycle [1–8]. The early genes are expressed following the entry into the host and code typically for non-structural proteins that are responsible for different regulatory functions in processes such as: viral DNA replication, activation of late genes expression, trans-nuclear

* Correspondence: [email protected]
† Equal contributors
1 Department of Biomedical Engineering, Tel-Aviv University, Ramat Aviv, Israel
2 SynVaccine Ltd. Ramat Hachayal, Tel Aviv, Israel
Full list of author information is available at the end of the article

© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

transport, interaction with the host cell, induction of the cell's DNA replication machinery necessary for viral replication, etc. [9, 10]. Late genes largely code for structural proteins required for virion assembly; they are generally highly expressed and their expression is usually induced or regulated by the early genes [9, 10].
Several studies have shown that viral codon frequencies tend to undergo evolutionary pressure for specific CUB; among others, it was suggested that viral CUB is under selection for improving the viral fitness, and specifically the viral gene expression [11–33].
In particular, in [17] different trends of translation efficiency adaptation of the coding regions of the bacteriophage Lambda early and late genes were demonstrated. Specifically, it was shown that the preferences of codons in early genes, but not in the late genes, were similar to those of the bacterial host [17]. The analysis of ribosome profiling data revealed that the codon decoding rates of viral genes tend to correlate with their expression levels [17]. Interestingly, during the initial stages of phage development the decoding rates in early genes were found to be higher than the decoding rates in late genes; at more advanced stages of the viral cycle an opposite trend was demonstrated [17].
In this study we go further, and perform a comparative genomics analysis of the temporal differences in CUB in almost all known viruses for which a classification of their genes into early and late groups exists in the literature. Specifically, based on an analysis of 14 bacteriophages and 11 human viruses, we suggest that 50% of the analyzed bacteriophages tend to undergo an extensive evolutionary selection for distinct compositions of synonymous codons in early and late viral genes. We analyze the features of the genomes that undergo this type of selection and argue that the differential CUB can be related to various intracellular phenomena and processes, such as: translational selection and regulation [11, 12, 17, 21, 22, 28, 31], mutational bias and pressure [16, 20, 21, 23, 26, 27, 30, 32, 33], amino acids (AA) compositions [12, 16], and other genomic characteristics, some of which are still not fully understood [13, 14, 29, 34].
Finally, we discuss a possible application of our findings to synthetic virology. Specifically, we suggest using the temporally regulated CUB for controlling the viral gene expression at different time points during the life cycle, for designing optimized and/or deoptimized synthetic viruses which can be used in exploring novel strategies in vaccination (e.g. live attenuated vaccines) and cancer therapy (oncolytic viruses).

Results
The research outline of the study is described in Fig. 1. More details can be found in the following sections.

Bacteriophage early and late genes tend to have different compositions of synonymous codons
Genome level information about the different viruses analyzed in this study, such as their hosts, number of genes, gene lengths and ENC, is displayed in Additional file 1: Table S1 and Figure S1.
In order to compare the synonymous codon usage in early and late genes, each coding sequence was represented by its relative synonymous codons frequencies (RSCF): a 61-dimensional vector expressing each sense codon by its frequency in that sequence, normalized relative to the frequencies of the other synonymous codons coding for the same AA. We then performed a clustering analysis, assuming that RSCF vectors that are closer with respect to the Euclidean metric correspond to genes with a more similar content of synonymous codons (see Materials and Methods).
Our results suggest that early and late genes in 50% of the analyzed bacteriophages tend to exploit different synonymous codons. Specifically, in 7 of the 14 analyzed bacteriophages, early and late genes were found to be significantly (p-value ≤0.05) separated according to the frequencies of their synonymous codons (Figs. 2, 3a, b, Additional file 1: Figure S2 and Figure S3 in Section 1.2). Our analysis provides evidence that different sets of synonymous codons in early vs. late genes are selected for in the course of viral evolution; these differences may be related to the optimization of bacteriophage fitness in different phases of the viral lifecycles.
In addition, 6 out of 14 bacteriophages were also found to be significantly (p-value <0.05) separated according to the AA composition of their early and late genes (Fig. 3b, Additional file 1: Figure S4 and Figure S5 in Section 1.2). Four viruses were characterized both by a differential synonymous codon usage and by a differential AA usage in their early and late genes. These findings suggest that, among others, the different codon distribution in early and late genes may be partially related to the functionality of the encoded proteins via their AA content and possibly protein folding [35].
To check if bacteriophages with significant differences in synonymous codon usage in temporal genes tend to have more similar genomic sequences (usually related to smaller evolutionary distances), we reconstructed a phylogenetic tree of the bacteriophage proteomes based on the Average Repetitive Subsequences (ARS) distance matrix and the neighbor joining method, as described in the Materials and Methods section and in references therein (Fig. 3a). We then performed a statistical analysis in order to investigate the relation between the differences in temporal regulation of synonymous codons in different viruses and their evolutionary distances. We did not find such a relation

Fig. 1 The research outline of the study. The details can be found in the main text: Our analysis was based on coding sequences of 14 bacteriophages and 11 human viruses (A.), and on the ribo-seq measurements of bacteriophage Lambda and its E.coli host (B.). Based on the existing literature, a classification of the viral genes into early and late (with respect to the beginning of the lytic phase) was derived (C.). A., B., and C. were used to perform a comprehensive comparative genomics analysis of differential synonymous codon usage in early and late genes (D.), as well as of additional genomic features possibly related to codon bias (E.), such as: ribo-seq based codon typical decoding rates (TDR), Transfer Ribonucleic Acid (tRNA) adaptation indexes (tAI), effective number of codons (ENC), codon pairs bias (CPB), amino acids bias (AAB), dinucleotide bias (DNTB), nucleotide bias (NTB), GC content, number of genes in each temporal group and their length

(see details in Additional file 1: Section 1.3 and Figure S6), suggesting that the differential codon usage in early and late genes is a complex trait related to alternative determinants such as the bacterial niche, the specific phage proteins and their function/structure, etc.
Viruses undergo an extensive evolutionary selection for adaptation to their host's cell environment, and thus it can be assumed that their codon composition reflects an efficient adaptation of the viral genes to specific intracellular conditions (e.g. in terms of gene expression factors such as tRNA molecules, AA concentration, etc.) that are prevalent in different gene expression stages, in accordance with the reported results.

Weaker separation between synonymous codon usage in early and late genes in human viruses
The results in the previous section suggest that bacteriophages undergo an extensive evolutionary selection on a synonymous level for temporal regulation of gene expression. Whether this also occurs in viruses of humans and other eukaryotic hosts is harder to ascertain. Human Immunodeficiency Virus 1 (HIV-1) was found to have a significant separation (p-value ≤0.05) of codon composition between early and late genes, while such separation was not statistically significant in the rest of the analyzed viruses (see Additional file 1: Table S2 in Section 1.4).
As evidenced in Additional file 1: Table S1 and Figure S1, human viruses tend to have fewer genes than

bacteriophages. Therefore, we were interested in checking whether this fact can explain the weaker signal for temporal separation in CUB, and if, in practice, human viruses may also behave as bacteriophages with respect to the differential usage of synonymous codons in their early and late genes. To this end we analyzed the 7 bacteriophages with temporally differential codon usage by sampling in each one of them a number of early and late genes that is typical of human viruses (an average of 8 early genes and 14 late genes). We found that the

Fig. 2 Principal component analysis (PCA) of RSCF vectors for bacteriophages with significant separation in codon usage between early (blue circles) and late (red circles) genes. In order to visualize the clustering, PCA was applied to project the RSCF vectors to a plane spanned by their first two principal components. In order to visualize the separation between clusters, a maximum margin separation line (a line for which the Euclidean distance between it and the nearest point from either of the groups is maximized) was calculated and plotted. The significance of cluster separation was assessed by comparing the Davies-Bouldin cluster score to the randomized scores obtained from 100 permutations of the temporal (early or late) gene labels. The percentage of variance explained by each of the first two principal components is given on the figure axes

Fig. 3 Comparative analysis of early and late genes in 14 different bacteriophages. Details can be found in the main text. a A phylogenetic tree built from complete phage proteomes using ARS distance (see Materials and Methods). Phages with significant differences in temporal codon usage are marked in blue. b Viruses with significant (p-value <0.05) separation between early and late genes w.r.t. synonymous codons or AA are marked by yellow stars. c Significance of separation between early and late genes w.r.t. additional genomic features, estimated by Wilcoxon ranksum p-value. Features/viruses with significant (p-value <0.05) separation between the two temporal groups are marked by yellow stars; green indicates a higher mean in the early genes and red indicates a higher mean in the late genes

temporal differences in codon usage remained significant even after randomly reducing the number of genes, indicating, among others, that these differences cannot be directly explained only by the genome size.

Comparison of early and late genes with respect to additional features of their coding regions
The signal of selection for temporally regulated composition of synonymous codons in bacteriophages demonstrated in the previous subsection led us to analyze additional genomic features, such as: codon mean typical decoding rate (MTDR), tRNA adaptation index (tAI), codon pairs bias (CPB), dinucleotide bias (DNTB), nucleotide bias (NTB), GC content and amino acids bias (AAB).
Various studies related these features to different genomic mechanisms and biological processes that are involved in viral replication cycles and are related to the viral fitness.
For example, it was suggested that gene translation efficiency can be affected not only by single codons, but also by the distribution of codon pairs [36]. In [37–39] it was argued that pairs of adjacent nucleotides may be an important genomic characteristic under a significant evolutionary pressure in viruses and their hosts; specifically, it was suggested that CpG pairs are under-represented in many Ribonucleic Acid (RNA) and in

most small human DNA viruses, in correspondence to dinucleotide frequencies of their hosts. This phenomenon can be related, for example, to the contribution of the CpG stacking base pairs to RNA folding [40] and/or to the enhanced innate immune responses to viruses with elevated CpG [41]. The stability of the RNA secondary structures can also be affected by the genomic composition of nucleotides and in particular by GC content [42]. In addition, nucleotide compositions and AA usage bias may affect, among others, the synthesis of viral molecules, and the function and structure of the encoded proteins.
Consequently, we estimated the features listed above for all genes in all viruses, and evaluated the separation between early and late genes with respect to each one of them (see Materials and Methods). The results shown in Fig. 3c suggest that the differential usage of synonymous codons in early and late genes can be partially related to temporal differences in various characteristics of genomic sequences. Specifically, the features with the strongest temporal differences are the NTB and GC content, which are significant (p-value <0.05) in most of the phages.
In addition, we wanted to check if the bacteriophages with a significant temporal separation with respect to synonymous codons also tend to be enriched with specific genomic features in comparison to the group of bacteriophages with non-significant temporal differences in synonymous codons. To this end, we compared the distribution of various genomic features in the two groups. Based on the Wilcoxon ranksum test we found no significant differences between the two groups of bacteriophages in terms of: genome length (p-value = 0.53), ENC (p-value = 0.4), CPB (p-value = 0.99), DNTB (p-value = 0.21), NTB (p-value = 0.9), GC content (p-value = 0.8) and AAB (p-value = 0.99). See also Additional file 1: Figure S7 in Section 1.5.

Discussion
In this study, we performed a comparative genomics analysis of viruses with annotations in the literature regarding the division of their genes according to temporal expression. We examined 14 bacteriophages with different bacterial hosts and 11 human viruses in order to understand if there is a universal difference in synonymous codon usage, as well as in additional genomic features (such as codon decoding rates, nucleotide/dinucleotide/AA biases, GC content and others), with respect to different temporal stages of the viral life cycle.
Our results suggest that 50% of bacteriophages undergo an extensive evolutionary selection for distinct compositions of synonymous codons in early and late viral genes. This phenomenon was found to be weaker/less significant in human viruses, possibly because of the stronger selection pressure in bacteriophages and their bacterial hosts due to the larger size of their populations, and because regulation processes in human gene expression are more 'complex' and thus may be mediated by additional aspects not necessarily related to codons.
The differences between early and late genes, both with respect to the composition of synonymous codons and with respect to additional genomic features described in the previous sections, can possibly be influenced by various intracellular phenomena and processes related to the optimization of gene expression and to the overall fitness of the phage. To mention a few, these phenomena/processes include: adaptation of translation elongation efficiency in different phases of the viral lifecycle [17], Messenger Ribonucleic Acid (mRNA) folding [43, 44], adaptation of the viral genes to the (possibly altering) tRNA pool of their hosts [11, 12, 17, 31], mutation levels and biases [16, 20, 21, 23, 26, 27, 30, 32, 33], regulation [45, 46], protein function and structure [47], cell metabolism [48], etc.
There can be various explanations for the fact that a significant difference in the codon usage in early vs. late genes appears in only 50% of the bacteriophages:
First, it is possible that the effective population size (which is not easy to estimate) varies among the analyzed bacteriophages. The selection pressure is weaker in bacteriophages with smaller population size.
Second, this observation may also be related to the intracellular regimes during the development of the different bacteriophages. For example, it is possible that during the development of some bacteriophages the tRNA levels are modulated/changed, while in other cases the changes are minor. The changes in the tRNA levels may trigger evolution of different CUB in early/late genes in the bacteriophages that experience them.
Third, this result may be related to the nature of the proteins encoded in the bacteriophage genomes. The specific function and properties of the proteins in different bacteriophages may affect the observed levels of selection. For example, it is possible that only in some bacteriophages the early vs. late genes tend to have different structure with different co-translational folding constraints that eventually affect the codon bias. It is also possible that only in some bacteriophages the early vs. late genes tend to have different expression levels/patterns that eventually affect their codon bias.
It is possible that the results reported here have relevant practical applications. For example, vaccines, and their discovery, are topics of singular importance in present-day biomedical science; however, the discovery of vaccines has hitherto been primarily empirical in nature, requiring considerable investments of time, effort and resources. To overcome the numerous pitfalls attributed to the classical vaccine design strategies, more efficient and robust rational approaches based on

computer-based methods are highly desirable. One direction in designing in-silico vaccine candidates may be based on exploiting the temporally regulated synonymous information encoded in the genomes and investigated in this study for attenuating the viral replication cycle while retaining the wild type proteins. In particular, the results reported here suggest that viral genes can be designed with respect to phase-specific, temporally regulated gene expression constraints, and this design would result in controllable yields of the corresponding genetic products during a defined time period. To achieve this, codons would be selected with frequencies that are maximally dissimilar (or similar) to those of the set of early or late genes, compared to a random set of genes. See Additional file 1: Section 2 and Figures S8, S9 for more details and examples.

Conclusions
The codon distribution in large fractions of bacteriophage genomes tends to be different in early and late genes. It seems that various additional genomic features (e.g. NTB and GC content) tend to be associated with this signal. This phenomenon seems to be related to various aspects of the viral life cycle, and to various intracellular processes. A similar signal may be observed in human viruses but it seems significantly less frequent. We believe that the reported results should contribute towards better understanding of viral evolution and may promote the development of relevant procedures in synthetic virology.

Material and methods
The research outline of the study is described in Fig. 1.

Viruses
Human viruses analyzed in this study include Herpes viruses, papilloma viruses, Polyomavirus and HIV.
The analyzed bacteriophages include: bacteriophage Lambda, bacteriophage T4, bacteriophage Pak P3, bacteriophage phi29, bacteriophage T7, bacteriophage phiYs40, bacteriophage Fah, bacteriophage xp10, bacteriophage Streptococcus DT1, bacteriophage Streptococcus 2972, bacteriophage Mu, bacteriophage phiC31, bacteriophage phiEco32, bacteriophage p23–45 and bacteriophage phiR1–37.
These viruses were chosen since they have a known division into early and late genes annotated in the literature, as described in Additional file 1: Table S3.

Synonymous codon usage analysis
Codon composition of a coding sequence was represented by a 61-dimensional vector of the RSCF of each one of the 61 coding codons (stop codons are excluded).
Clustering analysis was performed on the RSCF vectors of each viral coding sequence. Each viral sequence was assigned a group label corresponding to its temporal expression stage (early/late), according to the classification known in the literature. The tendency of sequences to cluster according to their codon usage in two different clusters corresponding to their temporal expression stages (early/late) was measured using the Davies-Bouldin score (DBS) [49]. This score is based on a ratio of within-cluster and between-cluster distances. The optimal clustering solution has the smallest DBS value. The significance of cluster separation was assessed by comparing the DBS of the wildtype sequences to the randomized scores obtained from 1000 permutations of the gene group labels (early or late).
In addition, a similar analysis was performed on AA frequencies as well.
More details can be found in Additional file 1: Section 3.3.
We decided to use the RSCF since in this study we are interested in comparing the frequencies of the codons without an a priori assumption/focus on the relative bias of codons; to this aim it is more natural to use the RSCF rather than the widely used Relative Synonymous Codons Usage (RSCU) measure [50]. However, these measures are similar, and the same analysis performed with RSCU does not change the final conclusions.

Additional genomic features analyzed in this study
The tRNA adaptation index (tAI) quantifies the adaptation of a coding sequence to the tRNA pool, with parameters describing the different tRNA copy numbers and the selective constraints on the codon–anti-codon coupling efficiency. Since, currently, these parameters are based on gene expression measurements in a very limited number of organisms, and since the efficiencies of the different codon-tRNA interactions are expected to vary among different species, we used a novel approach proposed in [51] for adjusting the tAI weights to any target organism, without the need for gene expression measurements, based on an optimization of the correlation between the tAI and a measure of codon usage bias. It is the first time, to our knowledge, that this approach is applied to study tAI in viruses with respect to their hosts. The resulting tAI values were computed by a standalone application [52]. See more details in Additional file 1: Section 3.4.
Effective number of codons (ENC) is a measure that quantifies how far the synonymous codon usage of a gene departs from what is expected under the assumption of uniformity [53]. See more details in Additional file 1: Section 3.5.
GC-content is the percentage of nitrogenous bases on a DNA or RNA molecule that are either guanine or cytosine. See more details in Additional file 1: Section 3.6.

Codon pair bias (CPB). To quantify the CPB, we follow [54] and define a codon pair score (CPS) as the log ratio of the observed over the expected number of occurrences of this codon pair in the coding sequence. The CPB of a virus is then defined as an average of the CPSs over all codon pairs comprising all viral coding sequences. See more details in Additional file 1: Section 3.7.
Dinucleotide bias (DNTB). We define a dinucleotide score (DNTS) for a pair of nucleotides as an observed over expected ratio of its occurrences in a sequence. The DNTB of a virus is defined as an average of the DNTSs over all dinucleotides comprising all viral coding sequences. See more details in Additional file 1: Section 3.8.
Nucleotide (NTB) and amino acid (AAB) biases are defined as a normalized Shannon entropy over the frequencies of the nucleotides / AA in a genomic sequence. See more details in Additional file 1: Section 3.9.

Ribosome profiling analysis
Ribosome profiling (ribo-seq) data was taken from [55]. Ribosome profiles for bacteriophage Lambda and Escherichia coli (E.Coli) were reconstructed and normalized as in [17]. The normalization enables measuring the relative time a ribosome spends translating each codon in a specific gene relative to other codons, while considering the total number of codons in this gene, and results in a normalized footprint count (NFC) for each codon.
Codon typical decoding rate (TDR). Following [17], in order to estimate the typical decoding time of each codon based on the corresponding ribo-seq data, we used a novel statistical model [56] which takes into consideration the skewed nature of the NFC distribution and describes the NFC histogram of each codon as the output of a random variable which is the sum of a normally distributed and an exponentially distributed random variable, called Exponentially Modified Gaussian (EMG). The maximum likelihood criterion was used to estimate the parameters of these distributions for each codon according to the ribo-seq data by fitting the suggested model to the NFC distribution. The mean of the normal distribution component of the EMG was called μ, and 1/μ was defined to be the TDR of a codon [17]. See more details in Additional file 1: Section 3.10.
Mean typical decoding rate (MTDR) is a measure which estimates the global translation elongation efficiency of the entire gene as a geometric average of the TDRs of its codons. See more details in Additional file 1: Section 3.11.
Since bacteriophage Lambda is the only phage with publicly available ribo-seq data, a direct analysis of TDRs of other phages is currently impossible. Nevertheless, due to the adaptation of the viruses to the translation machinery of their hosts, a rough estimation of MTDR values for E.Coli phages other than Lambda may be obtained from the available ribo-seq of the host genes.

Phylogenetic reconstruction
Following [57], a phylogenetic reconstruction of bacteriophages was performed based on an alignment-free distance that estimates the similarity of two sequences (in our case entire viral proteomes) according to the average length of subsequences that are repeated in both of them (the ARS). The tree was built using the neighbor joining algorithm as implemented in [58].
See more details in Additional file 1: Section 3.12.

Additional files
Additional file 1: Supplementary results and material. (PDF 2568 kb)
Additional file 2: Full list of the analyzed viruses including their accession numbers and temporal labels of genes. (XLSX 111 kb)

Abbreviations
AA: Amino acid; AAB: Amino acid bias; ARS: Average repetitive subsequences; CPB: Codon pair bias; CPS: Codon pair score; CUB: Codon usage bias; DBS: Davies-Bouldin score; DNA: Deoxyribonucleic acid; DNTB: Dinucleotide bias; DNTS: Dinucleotide score; E.Coli: Escherichia coli; EMG: Exponentially modified gaussian; ENC: Effective number of codons; HIV: Human immunodeficiency virus; mRNA: Messenger ribonucleic acid; MTDR: Mean typical decoding rate; NFC: Normalized footprint count; NTB: Nucleotide bias; PCA: Principal component analysis; RNA: Ribonucleic acid; RSCF: Relative synonymous codons frequencies; RSCU: Relative synonymous codons usage; tAI: tRNA adaptation index; TDR: Typical decoding rate; tRNA: Transfer ribonucleic acid

Acknowledgements
None.

Funding
E.G. is supported, in part, by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel-Aviv University. T.T. is grateful to the Minerva ARCHES award.

Availability of data and materials
The datasets supporting the conclusions of this article are included within the supplementary information in Additional file 1: Section 3.1 and Additional file 2.

Authors' contributions
OM, EG, TT analyzed the data and wrote the paper. All authors read and approved the final manuscript.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details
1 Department of Biomedical Engineering, Tel-Aviv University, Ramat Aviv, Israel. 2 SynVaccine Ltd. Ramat Hachayal, Tel Aviv, Israel. 3 Sagol School of Neuroscience, Tel-Aviv University, Ramat Aviv, Israel.
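Several of the sequence-composition measures defined in the Material and methods (GC content, dinucleotide score, normalized Shannon entropy, and MTDR as a geometric mean of codon TDRs) are simple enough to sketch directly. The sketch below is illustrative only: the function names are hypothetical, the expected dinucleotide count is assumed to be the product of the two single-nucleotide frequencies times the number of dinucleotide positions, and the exact normalizations used by the authors are given in Additional file 1 and may differ.

```python
import math
from collections import Counter


def gc_content(seq):
    """Percentage of bases that are guanine or cytosine."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(b) for b in "GC") / len(seq)


def dinucleotide_scores(seq):
    """Observed/expected ratio for every dinucleotide present in seq.
    Expected count assumes independent nucleotides (an assumption)."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    return {
        pair: obs / ((mono[pair[0]] / n) * (mono[pair[1]] / n) * (n - 1))
        for pair, obs in di.items()
    }


def normalized_entropy(symbols, alphabet_size):
    """Shannon entropy of the symbol frequencies divided by its maximum,
    log2(alphabet_size): 4 for nucleotides, 20 for amino acids."""
    counts = Counter(symbols)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(alphabet_size)


def mtdr(coding_seq, tdr):
    """Geometric mean of per-codon typical decoding rates.
    `tdr` maps codon -> TDR; codons without an estimate are skipped."""
    seq = coding_seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    logs = [math.log(tdr[c]) for c in codons if c in tdr]
    if not logs:
        raise ValueError("no codon in the sequence has a TDR estimate")
    return math.exp(sum(logs) / len(logs))
```

The per-codon `tdr` dictionary is an input here; estimating it requires the EMG fit to the NFC distributions described in the text, which is not reproduced in this sketch.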

Received: 5 July 2017 Accepted: 31 October 2017
