Computational Prediction of Regulatory, Premature Transcription Termination in Bacteria Adi Millman, Daniel Dar, Maya Shamir and Rotem Sorek*
Total Page:16
File Type:pdf, Size:1020Kb
Nucleic Acids Research Advance Access published September 1, 2016 Nucleic Acids Research, 2016 1 doi: 10.1093/nar/gkw749 Computational prediction of regulatory, premature transcription termination in bacteria Adi Millman, Daniel Dar, Maya Shamir and Rotem Sorek* Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel Received April 19, 2016; Revised August 08, 2016; Accepted August 18, 2016 ABSTRACT rial genes are regulated by conditional premature termina- tion (2). Downloaded from A common strategy for regulation of gene expression Several types of cis-acting RNA-based regulation sys- in bacteria is conditional transcription termination. tems (riboregulators) employ conditional premature termi- This strategy is frequently employed by 5 UTR cis- nation as part of their mechanism of action: (i) riboswitches acting RNA elements (riboregulators), including ri- (3), which are non-coding RNA elements that directly bind boswitches and attenuators. Such riboregulators can small molecule ligands and alter their structure accordingly; assume two mutually exclusive RNA structures, one (ii) attenuators, which encode short upstream open reading http://nar.oxfordjournals.org/ of which forms a transcriptional terminator and re- frames (uORFs) that sense the stalling of the ribosome in sults in premature termination, and the other forms case of shortage in amino acids (4) or in presence of antibi- an antiterminator that allows read-through into the otics (5,6); (iii) T-boxes (7), which sense amino acid avail- coding sequence to produce a full-length mRNA. ability by directly binding uncharged tRNAs and (iv) RNA leaders that bind specific antitermination proteins (8). We developed a machine-learning based approach, While the regulatory architypes outlined above differ sig- which, given a 5 UTR of a gene, predicts whether nificantly in their sensory strategies, they all control prema- it can form the two alternative structures typical ture termination by switching between two alternative and at Weizmann Institute of Science on December 14, 2016 to riboregulators employing conditional termination. mutually exclusive RNA conformations. In the repressive Using a large positive training set of riboregula- conformation (‘closed-state’), the riboregulator assumes a tors derived from 89 human microbiome bacteria, terminator form, generating a hairpin structure immedi- we show high specificity and sensitivity for our ately followed by a uridine rich tract (Figure 1A). Alterna- classifier. We further show that our approach al- tively, in the active conformation (‘open-state’), the RNA lows the discovery of previously unidentified ri- folds into an antiterminator stem-loop structure that effec- boregulators, as exemplified by the detection of new tively decouples the uridine tract from the terminator hair- LeuA leaders and T-boxes in Streptococci.Finally, pin, therefore promoting transcription read-through into thegene(Figure1B). The choice between the two possi- we developed PASIFIC (www.weizmann.ac.il/molgen/ ble RNA folds is determined by the presence or absence of Sorek/PASIFIC/), an online web-server that, given a the regulating metabolite (for riboswitches), the presence of user-provided 5 UTR sequence, predicts whether this ribosomes stalled on the riboregulator (for attenuators) or sequence can adopt two alternative structures con- binding of a specific antitermination protein to the riboreg- forming with the conditional termination paradigm. ulator (for protein-binding RNA leaders). This webserver is expected to assist in the identi- Recent studies show that riboregulators that function via fication of new riboswitches and attenuators in the conditional, regulated termination are more abundant than bacterial pan-genome. originally thought (5), conforming with previous estimates that such RNA elements are very common in bacteria (9). Several computational tools have been developed to pre- INTRODUCTION dict the presence of such riboregulators, most of them us- Conditional transcription termination is a common mech- ing comparative genomics, relying on consensus secondary anism for gene expression regulation in bacteria (1). Con- structures and utilizing covariance models to search for ditional transcriptional terminators usually occur in the new elements (10–12). Such approaches perform well when 5UTR of genes or operons, such that in some conditions the riboregulator is highly conserved between distant or- an intrinsic premature transcriptional terminator is formed, ganisms, but are expected to miss RNA elements that are preventing the transcription into the downstream gene (Fig- rare or evolutionarily diverged (9,13). Several tools use ure 1). It is estimated that a significant fraction of all bacte- thermodynamics-based methods to search for RNA ele- *To whom correspondence should be addressed. Tel: +972 8 9346342; Fax: +972 8 934 4108; Email: [email protected] C The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] 2 Nucleic Acids Research, 2016 Downloaded from http://nar.oxfordjournals.org/ Figure 1. The principle of regulation by conditional termination. A riboregulator that functions through conditional transcription termination can assume two mutually exclusive structural conformations: (A) A ‘closed’ conformation, that entails an intrinsic terminator, which is a stem-loop (strands #3 and #4) followed by a poly-U. This structure causes the RNA-polymerase to terminate transcription prematurely. The terminator structure is usually preceded by another stem-loop structure called the anti-antiterminator or P1 (formed by strands #1 and #2). (B) An ‘open’ conformation, in which an antiterminator stem (not immediately followed by a poly-U) is generated from pairing of strands #2 and #3, allowing the RNA-polymerase to continue transcription into the downstream gene. (C) The closed state (‘Gene off’) typically results in higher amounts of the short, prematurely terminated transcript, which can be measured by RNA-seq (5). In the open state (‘gene on’) more full-length transcripts are observed. ments that can adopt alternative conformations (14–17), 5UTR. For Rfam models that contained the terminator at Weizmann Institute of Science on December 14, 2016 but these methods are not specifically directed towards find- stem-loop within the model, a TTS was assigned if it was ing features of conditional terminators, and may be less ef- adjacent to the hit (up to 50 bases downstream). For Rfam fective in detecting riboswitches in which one of the con- models that do not contain the terminator stem-loop within formations is only stable when bound to the ligand (18). To the model, a TTS was searched starting 30 bases down- our knowledge there is currently no tool that utilizes the ba- stream to the Rfam model and up to 80 bases downstream sic concept of mutually exclusive terminator-antiterminator to the model. conformations in order to predict riboregulators that func- tion via regulated termination. Collection of the negative set We developed PASIFIC (prediction of alternative structures for identification of cis-regulation), an online Intergenic regions sized up to 600 bases were extracted from tool that, given a user-provided bacterial 5 UTR, searches the same genomes as above. Segments of 80–400 bases that this sequence for terminator-antiterminator alternative have TSSs (transcription start sites) and TTSs (with over structures enabling RNA structure prediction of known five reads), and were expressed in the opposite orientation and novel cis-acting riboregulators. Combining machine to the downstream gene were extracted. These segments learning with prediction of alternative RNA secondary were then scanned using the Infernal (19) cmscan tool with structures, this tool can detect riboswitches, attenuators ‘trusted’ cutoff for Rfam models and we verified that this set and leaders in a manner not dependent on their sequence does not contain any elements with positive Rfam hits. The conservation in other species. set was supplemented with 27 gene terminators that were obtained by looking for TTS of genes that are regulated by the positive set and taking the last 200 bases, and 30 tRNAs MATERIALS AND METHODS from the discarded sequences. Collection of the positive set Alternative folds prediction Known riboregulators were identified in the reference genomes of 169 human microbiome bacterial species stud- Positive and negative sets were filtered for elements that con- iedin(5). For each of these genomes, the Infernal (19) tain features of an intrinsic terminator at their 3 end. For cmscan tool was run with ‘trusted’ cutoff (–cut tc) using this, sequences not having a poly-U (defined as a stretch of allRfam(20) models defined as Cis-reg, Riboswitch, or at least three consecutive uridines and no more than 2 con- Leader. Results were screened for riboregulators of classes secutive non-uridine bases) were discarded, as well as se- that are known by the literature to function via conditional quences not presenting a stem-loop structure upstream to termination. Results were further screened for hits that have the poly-U. For this, the