Riboswitch Diversity and Distribution
Total Page:16
File Type:pdf, Size:1020Kb
Downloaded from rnajournal.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press BIOINFORMATICS Riboswitch diversity and distribution PHILLIP J. MCCOWN,1,4 KEITH A. CORBINO,2 SHIRA STAV,1 MADELINE E. SHERLOCK,3 and RONALD R. BREAKER1,2,3 1Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8103, USA 2Howard Hughes Medical Institute, Yale University, New Haven, Connecticut 06520-8103, USA 3Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8103, USA ABSTRACT Riboswitches are commonly used by bacteria to detect a variety of metabolites and ions to regulate gene expression. To date, nearly 40 different classes of riboswitches have been discovered, experimentally validated, and modeled at atomic resolution in complex with their cognate ligands. The research findings produced since the first riboswitch validation reports in 2002 reveal that these noncoding RNA domains exploit many different structural features to create binding pockets that are extremely selective for their target ligands. Some riboswitch classes are very common and are present in bacteria from nearly all lineages, whereas others are exceedingly rare and appear in only a few species whose DNA has been sequenced. Presented herein are the consensus sequences, structural models, and phylogenetic distributions for all validated riboswitch classes. Based on our findings, we predict that there are potentially many thousands of distinct bacterial riboswitch classes remaining to be discovered, but that the rarity of individual undiscovered classes will make it increasingly difficult to find additional examples of this RNA-based sensory and gene control mechanism. Keywords: aptamer; coenzyme; ligand; noncoding RNA; RNA World INTRODUCTION messengers. These observations strongly suggest that ribo- switches might have emerged during the RNA World Many bacteria use riboswitches to sense fundamental me- (Breaker 2009, 2012; Garst et al. 2011), when biological sys- tabolites or ions and control the expression of genes whose tems lacked proteins that otherwise would serve as biochem- protein products manage the cellular homeostasis of these ical sensors and switches. However, because no riboswitch ligands (Henkin 2008; Breaker 2012; Serganov and Nudler can be unambiguously traced to the last common universal 2013). Much has been learned about the structures, func- ancestor, these RNAs cannot be directly linked to the RNA tions, and distributions of riboswitches since the first exper- World. For example, representatives of only two classes are of- imental validation studies were reported in 2002 (Mironov ten found in species outside the bacterial domain of life. et al. 2002; Nahvi et al. 2002; Winkler et al. 2002a,b). For Specifically, fluoride riboswitches (Baker et al. 2012) and thi- example, many distinct RNA aptamer structures are exploit- amin pyrophosphate (TPP) riboswitches (Mironov et al. ed by riboswitches to selectively sense a surprising variety of 2002; Winkler et al. 2002b) are sometimes present in species ligands (Roth and Breaker 2009; Garst et al. 2011; Serganov of archaea, whereas TPP riboswitches are very common and Patel 2012). These aptamers regulate gene expression among fungi (Sudarsan et al. 2003a; Cheah et al. 2007), algae by interacting with adjoining expression platforms that reg- (Croft et al. 2007), and plants (Kubodera et al. 2003; Sudarsan ulate gene expression primarily by controlling transcription et al. 2003a; Wachter et al. 2007; Bocobza and Aharoni 2014). termination or translation initiation (Barrick and Breaker In contrast, several riboswitch classes are exceedingly rare, 2007; Breaker 2012), although other mechanisms are also and therefore could represent recent evolutionary inventions used by bacteria (e.g., Winkler et al. 2004; André et al. or perhaps ancient riboswitches that are being driven to the 2008; Lee et al. 2010; Hollands et al. 2012; Mellin et al. brink of extinction. One such rare riboswitch class is called 2013). preQ -III, which, as the name implies, is the third validated Among the most widespread riboswitch classes are those 1 riboswitch class for the modified nucleobase called that sense and respond to RNA-based coenzymes and second © 2017 McCown et al. This article is distributed exclusively by the RNA 4Present address: Department of Chemistry and Biochemistry, Society for the first 12 months after the full-issue publication date (see University of Notre Dame, Notre Dame, IN 46556, USA http://rnajournal.cshlp.org/site/misc/terms.xhtml). After 12 months, it is Corresponding author: [email protected] available under a Creative Commons License (Attribution-NonCommercial Article is online at http://www.rnajournal.org/cgi/doi/10.1261/rna.061234. 4.0 International), as described at http://creativecommons.org/licenses/ 117. by-nc/4.0/. RNA 23:995–1011; Published by Cold Spring Harbor Laboratory Press for the RNA Society 995 Downloaded from rnajournal.cshlp.org on September 28, 2021 - Published by Cold Spring Harbor Laboratory Press McCown et al. prequeuosine1 (McCown et al. 2014). Only 86 examples of RESULTS AND DISCUSSION this riboswitch class were identified from metagenomic se- Strategy for the bioinformatics analysis of known quence data or from members of a single bacterial lineage. riboswitch classes Even rarer are the variants of guanine riboswitches that sense 2′-deoxyguanosine (2′-dG). Only four examples of the 2′- Riboswitch aptamers are among the most highly conserved dG-I riboswitch class have been found, all of which occur cellular components in biology, which reflects their need to in a single organism, Mesoplasma florum (Kim et al. 2007). form highly selective binding pockets for target ligands using Similarly, only 12 members of the distinct 2′-dG-II class only the four common nucleotides (Breaker 2011). In con- have been identified to date (Weinberg et al. 2017), and all trast, expression platforms can regulate gene expression via of these are present only in metagenomic sequence data. a variety of mechanisms and structures (Breaker 2012), and Not surprisingly, the order of riboswitch class discovery therefore are far less well conserved. Conserved aptamer fea- roughly corresponds to the number of representatives in tures include nucleotide sequences (particularly at or near the each class, such that the most common riboswitches were dis- ligand-binding core), secondary structures such as base- covered first. This has occurred because there is a higher prob- paired stems and pseudoknots, and motifs such as k-turns, ability of encountering a member of a common riboswitch E-loops, special tetraloops, and other common substructures class versus a member of a rare class, regardless of whether (Serganov 2009; Garst et al. 2011). Distinctive patterns of con- the search strategy involved a genetics-based or a bioinfor- servation can be readily established by examining only a few matics-based search. Some very rare riboswitch classes were representatives of a riboswitch aptamer class (e.g., Grundy discovered only because their representatives are very close and Henkin 1998; Gelfand et al. 1999; Barrick et al. 2004). derivatives of a more common riboswitch class, such as ade- Once established, consensus sequence and structural mod- nine (Mandal and Breaker 2004) or the 2′-dG-I and -II ribo- els help to define each riboswitch class (Ames and Breaker switches (Kim et al. 2007; Weinberg et al. 2017), all of which 2010; Breaker 2011). The consensus model also can be used are variants of the vastly more common guanine riboswitch to direct bioinformatics algorithms to search for additional class initially discovered. Unfortunately, these close variants RNAs that closely correspond to the consensus. These algo- will remain hidden among the bioinformatics hits for a rithms can be made to rank candidate representative “hits” more prominent riboswitch class unless there are key distinc- based on how well their sequences and predicted substruc- tions, such as mutations to the ligand-binding core of the tures conform to the existing consensus model. Outliers aptamer or gene associations that indicate a change in ligand that are proven (or presumed) to function as riboswitches specificity (Kellenberger et al. 2015; Nelson et al. 2015; can then help inform how the consensus model for the Weinberg et al. 2017). aptamer should be updated to more accurately reflect the se- In recent years, the most productive strategy for the discov- quence and structural constraints on the riboswitch class. ery of riboswitches has involved the use of computer algo- This bioinformatics strategy has been used previously to re- rithms to carry out comparative sequence analysis of the veal the existence of structural variants of preQ1-I, preQ1-II, noncoding DNA portions of bacterial genomes (Ames and and preQ1-III riboswitches (McCown et al. 2014) and to Breaker 2010). As noted above, these searches can uncover uncover variants of guanine riboswitches that exhibit altered the existence of variants of a known riboswitch class by iden- ligand specificities (Mandal and Breaker 2004; Mandal et al. tifying RNAs that closely correspond to the consensus se- 2003; Kim et al. 2007). Most recently, we have used a bioinfor- quence and secondary structure model of the predominant matics search pipeline, guided by the known atomic-resolu- class (e.g., Mandal and Breaker 2004; Kim et al. 2007; tion structures of riboswitches, to