Comparative Genomic Analysis of Translation Initiation Mechanisms For
Total Page:16
File Type:pdf, Size:1020Kb
3922–3931 Nucleic Acids Research, 2017, Vol. 45, No. 7 Published online 21 February 2017 doi: 10.1093/nar/gkx124 Comparative genomic analysis of translation initiation mechanisms for genes lacking the Shine–Dalgarno sequence in prokaryotes So Nakagawa1,2,*, Yoshihito Niimura3 and Takashi Gojobori4 1Department of Molecular Life Science, Tokai University School of Medicine, Isehara 259-1193, Japan, 2Micro/Nano Technology Center, Tokai University, Hiratsuka 259-1292, Japan, 3Department of Applied Biological Chemistry, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan and 4King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Kingdom of Saudi Arabia Received September 14, 2016; Revised February 08, 2017; Editorial Decision February 09, 2017; Accepted February 11, 2017 ABSTRACT ribosomal subunit and the complementary sequence in the 5 untranslated region (UTR) of an mRNA (1–3). Subse- In prokaryotes, translation initiation is believed to quently, the 50S (large) ribosomal subunit docks to the 30S occur through an interaction between the 3 tail of subunit to form a 70S ribosome, where an initiator tRNA is a 16S rRNA and a corresponding Shine–Dalgarno used for protein synthesis (2,3). The ribosome binding site (SD) sequence in the 5 untranslated region (UTR) of in the mRNA is called the Shine–Dalgarno (SD) sequence an mRNA. However, some genes lack SD sequences (GGAGG) that is located approximately 10 nucleotides (nt) (non-SD genes), and the fraction of non-SD genes upstream of the initiation codon (1–3). Although this inter- in a genome varies depending on the prokaryotic action process is thought to be a major initiator of prokary- species. To elucidate non-SD translation initiation otic translation (3–5), the fraction of genes with an SD se- mechanisms in prokaryotes from an evolutionary quence in a genome widely varies among species, ranging perspective, we statistically examined the nucleotide from 9% to 97% (6,7). In addition, previous work suggests that the SD sequence does not necessarily contribute to effi- frequencies around the initiation codons in non-SD cient translation initiation in species harboring a small frac- genes from 260 prokaryotes (235 bacteria and 25 ar- tion of genes with an SD sequence (6). Therefore, prokary- chaea). We identified distinct nucleotide frequency otes may employ additional translation initiation mecha- biases upstream of the initiation codon in bacteria nisms. and archaea, likely because of the presence of lead- Notably, at least two other prokaryotic translation ini- erless mRNAs lacking a 5 UTR. Moreover, we ob- tiation mechanisms have been identified. One mechanism served overall similarities in the nucleotide patterns depends on a leaderless mRNA that lacks a 5 upstream between upstream and downstream regions of the sequence and directly binds to a 70S ribosome to initiate initiation codon in all examined phyla. Symmetric nu- translation (8–10). Leaderless mRNAs have been found in cleotide frequency biases might facilitate translation various species of prokaryotes (11–15). For example, ap- initiation by preventing the formation of secondary proximately two-thirds or one-quarter of the transcripts examined were leaderless for Halobacterium salinarum or structures around the initiation codon. These fea- Mycobacterium smegmatis (belonging to Euryarchaeota or tures are more prominent in species’ genomes that Mycoplasma), respectively (13,14). Further, Wurtzel et al. harbor large fractions of non-SD sequences, sug- analyzed a transcriptome of a crenarchaeon species Sul- gesting that a reduced stability around the initiation folobus solfataricus P2, and found 69% of protein-coding codon is important for efficient translation initiation transcripts were leaderless (15). Another prokaryotic trans- in prokaryotes. lation initiation mechanism is mediated by ribosomal pro- tein S1 (RPS1), the largest component of the 30S riboso- INTRODUCTION mal subunit (16). In Escherichia coli, RPS1 interacts with an mRNA at ∼11 nucleotides upstream of the SD sequence In prokaryotes (bacteria and archaea), translation initiation (17). Bound RPS1 then promotes translation initiation by is generally thought to occur through a base-pairing inter- unfolding the mRNA structure, particularly in mRNAs action between the 3 tail of the 16S rRNA of the 30S (small) *To whom correspondence should be addressed. Tel: +81 463 93 1121 (Ext. 2661); Fax: +81 463 93 5418; Email: [email protected] C The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Nucleic Acids Research, 2017, Vol. 45, No. 7 3923 with a highly structured 5 UTR and no or a weak SD se- ergy value of the three-base interactions between SD and quence (6,16–19). RPS1 genes have been identified only in anti-SD sequences (GGA and CCU; GAG and CUC; AGG bacterial genomes, suggesting that only bacteria use this and UCC). If the G between the 3 end of a 16S rRNA mechanism (6,20). and the SD region of an mRNA was greater than −0.8924 Although these two translation initiation mechanisms (kcal/mol), the gene was assumed not to contain SD se- may be essential in various bacteria and archaea, their us- quence. In this study, we denoted such genes as ‘non-SD age is not well understood at a genomic level, especially genes’. To reduce false positives in the non-SD genes, we in nonmodel organisms. Therefore, to better understand used a looser threshold energy value is looser than that used prokaryotic translation initiation mechanisms other than in our previous reports (6,26). SD sequence interactions from an evolutionary perspec- tive, in this study, we conducted extensive comparisons among genes without 5 UTR SD sequences using genome sequences from 260 prokaryotes belonging to 14 bacte- Evaluation of nucleotide frequency biases using G-statistic rial phyla and three archaeal phyla (Supplementary data). Nucleotide hybridization energy between the 5 UTR in To examine nucleotide frequency biases around the initi- an mRNA and 3 tail of 16S rRNA was used to define ation codons, we applied the G-statistic to a fraction of whether a gene does or does not contain an SD sequence. nucleotides at each position (6,21–23). All protein-coding Nucleotide frequency biases at each position around the genes from each species were aligned at the initiation codons initiation codon were evaluated using G-statistics, a mea- without any gaps. At each position, the observed frequen- sure representing the deviation from the expected nucleotide cies of nucleotides (A, U, G and C) were compared with frequency among the entire genomic sequence (6,21–23). the expected frequencies using the likelihood-ratio statistic, From these analyses, we compared several features observed which is used to test goodness of fit (27). The expected fre- around the initiation codons that are related with transla- quencies were calculated for each species in four separate tion initiation mechanisms in prokaryotes. categories––upstream of the initiation codon, and the first, second and third positions in a codon in CDSs––because the nucleotide frequencies differed among these categories. MATERIALS AND METHODS The G-value at position i was calculated using the formula: Genomic data and 16S rRNA genes (i) For this study, genome sequences and gene annotations of G(i) = 2 O(i)ln On (i) , n En 260 species (235 and 25 species of bacteria and archaea, re- n spectively) were obtained from the Gene Trek in Prokaryote Space (GTPS) database at the DNA Data Bank of Japan, (i) where On is the observed number of nucleotide n (A, U, DDBJ (24). For each species, we selected protein-coding (i) G and C) at position i,andEn is the expected number genes that initiated from the AUG, GUG, UUG, AUA, of nucleotide n in the category to which position i belongs AUU or AUC codon (25) and ended with a stop codon, on (100 nt upstream of the initiation codon, or the first, second the basis of the annotation. For the 16S rRNA genes, we uti- or third position in a codon in CDSs). The G-value distri- lized 13-nt sequences from the 3 tail of 16S rRNA, which we 2 bution was approximated by the -distribution with three obtained in a previously study (6). The 3 tails of 16S rRNAs (i) degrees of freedom. When On was larger and smaller than were determined manually, which would improve accuracy (i) (i) (i) (i) En, the values of Gn [ = 2On ln(On /En )] became pos- of determination of non-SD genes. itive and negative, respectively. For this reason, we regarded each term in this formula as a measure of the bias for each (i) Determination of non-SD genes nucleotide at a given position. Because the Gn was propor- tional to the number of genes (N) when the fractions of the To identify genes that do not contain an SD sequence, we observed and expected numbers of nucleotides were iden- calculated the Gibbs free energy (G) of the base pairing (i) tical, we defined a g value as G divided by the number between 5 UTR of an mRNA and the complementary se- n n G(i)/ = (i) quence at the 3 end of the 16S rRNA (anti-SD sequence) of genes N ( N gn ). As this gn value was not af- for each species. For this purpose, we used a computational n fected by the number of genes, we utilized g value for the program, free scan, based on individual nearest-neighbor n comparisons of nucleotide frequency biases among species hydrogen bonding methods (26). This program computes in this study.