Revision 3

1

2

3 Tandem form a natural Boolean logic gate to control 4 in

5

6 Madeline E. Sherlock, Narasimhan Sudarsan, Shira Stav, Ronald R. Breaker

7 For correspondence: [email protected] 8 9 Phone: 203 432-9389 10

11

12 Sherlock et al. PRPP Riboswitches

13 Abstract 14 Gene control systems sometimes interpret multiple signals to set the expression levels of the 15 genes they regulate. In rare instances, ligand-binding form tandem 16 arrangements to approximate the function of specific two-input Boolean logic gates. Here we 17 report the discovery of riboswitch aptamers for phosphoribosyl pyrophosphate (PRPP) that 18 naturally exist either in singlet arrangements, or occur in tandem with aptamers. Tandem 19 guanine-PRPP aptamers can bind the target ligands, either independently or in combination, to 20 approximate the function expected for an IMPLY Boolean logic gate to regulate transcription of 21 messenger for de novo purine in bacteria. The existence of sophisticated all- 22 RNA regulatory systems that sense two ancient derivatives to control synthesis of 23 RNA molecules supports the hypothesis that RNA World organisms could manage a complex 24 metabolic state without the assistance of regulatory factors.

25

26 Introduction 27 A variety of riboswitch classes regulate gene expression in response to the binding of specific 28 metabolite or inorganic ion ligands (Roth and Breaker, 2009; Serganov and Nudler, 2013; 29 Sherwood and Henkin, 2016). Previously, 25 riboswitch classes were known that selectively 30 respond to ligands that are chemical derivatives of RNA monomers or their precursors (McCown 31 et al., 2017). This strong bias in favor of RNA-like ligands strengthens the hypothesis (Nelson 32 and Breaker, 2017; Breaker, 2010) that many of the widely-distributed riboswitch classes are 33 direct descendants from the RNA World – a time before genetically-encoded protein 34 biosynthesis (Gilbert, 1986; Benner et al., 1989). If this speculation is generally correct, we can 35 expect that some of the most abundant riboswitch classes remaining to be discovered also will 36 regulate gene expression in response to the binding of other ancient RNA derivatives (Breaker, 37 2011). 38 With these considerations in mind, we evaluated the possible functions of variants of recently 39 validated riboswitches for guanidine (Nelson et al., 2017). Collectively, guanidine-I riboswitches 40 along with differently-functioning variants were originally called the ykkC orphan riboswitch 41 candidate because no ligand had been identified (Barrick et al., 2004; Meyer et al., 2011). After 42 removing guanidine-I riboswitches (subtype 1), which constituted ~70% of the original ykkC

2

Sherlock et al. PRPP Riboswitches

43 motif RNAs, we further separated the remaining collection of ykkC RNA variants (subtype 2). 44 Upon further investigation of the RNA sequences, the phylogeny of organisms containing these 45 RNAs, and the genes associated with each instance of the RNA, we concluded that ykkC subtype 46 2 RNAs were populated by multiple riboswitch classes that sensed distinct ligands. 47 Subtype 2 representatives were sorted according to the gene located immediately 48 downstream from the ykkC RNA motif to create revised consensus models for the RNAs within 49 each gene association group. In some cases, additional conserved were observed 50 immediately 5ˊ and 3ˊ of the three hairpin structures (called P1, P2 and P3) that are characteristic 51 of ykkC motif RNAs. This region located downstream of the guanidine binding site of subtype 1 52 RNAs were found to be unnecessary for guanidine-I riboswitches. In total, these distinct 53 representatives were sorted into four candidate riboswitch classes called subtypes 2a through 2d. 54 All subtype 2 RNAs appear to be extremely similar because they share many of the same 55 highly-conserved nucleotides and secondary structure features. We were fortunate to have high- 56 resolution crystal structures (Reiss et al., 2017; Battaglia et al., 2017) of the guanidine-I 57 riboswitch (subtype 1, Figure 1A) in hand when evaluating each of these variant subtypes. 58 Nearly all highly-conserved nucleotides shared by ykkC RNAs are involved in forming tertiary 59 contacts within the larger folded structure that positions other nucleotides to form the ligand 60 binding pocket. However, nucleotides that change identity and are characteristic of each subtype 61 are almost exclusively found in close proximity to the known binding pocket of guanidine-I 62 riboswitches (indicated by black asterisks in Figure 1A). For subtypes 2a, 2b, and 2c, we 63 speculated that the additional stretches of conserved nucleotides preceding the P1 stem and 64 following the P3 stem likely play a role in ligand recognition, or perhaps provide structural 65 support for an expanded binding pocket. Unfortunately, because these nucleotides are not 66 conserved in guanidine-I riboswitches, we do not currently have a good understanding of their 67 3D positioning relative to the rest of the . 68 It quickly became clear that sequences associated with branched-chain amino acid 69 metabolism appeared highly similar to those associated with glutamate synthases and natA 70 transporters, and so these were collectively termed ykkC subtype 2a (Figure 1A). Shortly 71 thereafter, these were experimentally verified as a riboswitch class that responds to the bacterial 72 alarmone ppGpp (ME Sherlock, N Sudarsan, RR Breaker, manuscript submitted). Herein we

3

Sherlock et al. PRPP Riboswitches

73 describe the function of ykkC subtype 2b RNAs (Figure 1A,B), whereas the ligands for subtype 74 2c and 2d RNAs have not yet been identified (see discussion). 75 Three major clues were exploited to identify phosphoribosyl pyrophosphate (PRPP; 5- 76 phospho-α-D-ribose 1-diphosphate) (Figure 1B) as the ligand for riboswitches represented by 77 ykkC subtype 2b RNAs. First, representatives of this riboswitch class commonly associate with 78 genes for de novo purine biosynthesis (Figure 1C). In addition, deaminase (codA), a 79 gene involved in salvage that is known to be repressed by the purine guanine 80 (Kilstrup et al., 1989), is also commonly located downstream of subtype 2b RNAs. These 81 associated genes, which are either related to the production of or are regulated by a 82 purine, suggested to us that the ligand would most likely be a compound related to purine 83 metabolism. 84 Second, there are 257 examples of ykkC subtype 2b RNAs distributed throughout bacteria of 85 the phylum Firmicutes. Of these, 127 examples occur in a tandem arrangement (Sudarsan et al., 86 2006) with a guanine aptamer (Mandal et al., 2003) located directly upstream (Figure 1B). Such 87 tandem arrangements are likely to function in concert as two-input molecular logic gates 88 (Sudarsan et al., 2006; Stoddard and Batey, 2006; Lee et al., 2010). Guanine riboswitches 89 typically turn off expression of downstream purine biosynthesis genes when sufficient guanine is 90 present (Mandal et al., 2003), while representatives of ykkC subtype 2b riboswitches were 91 predicted to turn on expression of downstream genes. The physical arrangement of each aptamer 92 in the tandem arrangement suggests that they share a single expression platform. Guanine 93 binding to its aptamer likely would promote the formation of an adjoining intrinsic transcription 94 terminator stem, which would halt transcription. It is typical for guanine riboswitches to suppress 95 expression of genes involved in purine biosynthesis (Mandal et al., 2003). In opposition, binding 96 of ligand to the adjacent subtype 2b RNA would preclude terminator stem formation and thereby 97 override the typical function of the guanine riboswitch. If this speculation is correct, then the 98 ligand for subtype 2b RNAs should trigger cells to make more purines even when one of the 99 major purine compounds, guanine, is plentiful. 100 Third, we noticed that the consensus sequence and structural models for ppGpp riboswitch 101 aptamers (subtype 2a) and subtype 2b RNAs are remarkably alike, including many similarities at 102 the ligand binding site, suggesting that their ligands might also share some chemical features. 103 Particularly notable was the observation that a long-range C-G base-pair present in guanidine-I

4

Sherlock et al. PRPP Riboswitches

104 riboswitches (Reiss et al., 2017; Battaglia et al., 2017) is absent in ppGpp riboswitches, but 105 appears to again be present in subtype 2b RNAs. We speculated that the structural role of the G 106 missing in ppGpp aptamers might be performed by the guanine base of the ligand. 107 However, because the highly-similar subtype 2b RNA retains this G nucleotide (indicated by a 108 red asterisk in Figure 1A) in its aptamer, its ligand might resemble an apurinic version of ppGpp. 109

110 Results 111 Variant riboswitch aptamers selectively bind the nucleotide precursor phosphoribosyl 112 pyrophosphate (PRPP) 113 To evaluate PRPP binding and riboswitch aptamer function, we examined a 106 nucleotide 114 singlet aptamer arbitrarily derived from the purC gene of the bacterium Facklamia ignava 115 (Figure 2A) by using in-line probing (Soukup and Breaker, 1999; Regulski and Breaker, 116 2008). This method takes advantage of the inherent chemical instability of phosphodiester 117 linkages in unstructured regions of an RNA and the folding differences in aptamers brought 118 about by ligand binding to provide details on riboswitch function. 119 The 106 purC RNA construct exhibits structural modulation at highly conserved nucleotide 120 positions upon the addition of PRPP (Figure 2B). In contrast, no structural modulation of the 121 RNA is observed when the known ligands for the other ykkC subtypes (guanidine and ppGpp) or 122 for the guanine riboswitch (guanine) are added. Importantly, the structural changes selectively 123 brought about by PRPP largely occur in regions equivalent to those previously implicated in 124 ligand binding by guanidine-I (Nelson et al., 2017; Reiss et al., 2017; Battaglia et al., 2017) and 125 ppGpp riboswitch classes. This observation strongly suggests that these binding site nucleotides 126 have changed to alter ligand specificity, and that PRPP is recognized by this converted binding 127 pocket. 128 129 PRPP triggers riboswitch-mediated regulation of transcription termination 130 Due to the well-characterized instability of the PRPP molecule (Dennis et al., 2000; Meola et 131 al., 2003), especially under alkaline conditions and high Mg2+, PRPP extensively degrades into 132 multiple breakdown products over the course of our binding experiments. As a result, in-line 133 probing assays will not yield accurate dissociation constants. Therefore, to evaluate the effects of

5

Sherlock et al. PRPP Riboswitches

134 PRPP on riboswitch function, we performed single-turnover in vitro transcription termination 135 experiments (Wickiser et al., 2005a), which involve briefer incubations under milder conditions. 136 The P3 helix of the PRPP aptamer (Figure 1A) is predicted to act as an antiterminator stem, 137 which should increase the number of full-length mRNA transcripts when PRPP is bound. 138 Transcription of a singlet PRPP riboswitch derived from the purB gene of Heliobacterium 139 modesticaldum (Figure 3A) produces a greater yield of full-length RNA when PRPP is added to 140 the reaction (Figure 3B). The concentration of PRPP necessary to achieve half maximal

141 termination (T50) is less than 100 µM (Figure 3C and Figure 3–figure supplement 1). In 142 contrast, an RNA construct (M1) with disruptive mutations within the highly-conserved aptamer 143 sequence yields the same fraction of terminated and full-length transcripts regardless of PRPP 144 concentration (Figure 3B).

145 Importantly, T50 values should not be considered equivalent to KD values. Riboswitches

146 evaluated by transcription termination assays operate under kinetic limitations, whereas KD 147 values established by in-line probing can reach thermodynamic equilibrium (Wickiser et al.,

148 2005a; Wickiser et al., 2005b; Gilbert et al., 2006; Rieder et al., 2007). Thus, the KD for PRPP

149 binding to the H. modesticaldum aptamer is likely to be far better than the T50 values we 150 measured for the H. modesticaldum purB construct, which range between 40 and 90 µM (Figure 151 3C and Figure 3–figure supplement 1). 152 153 In vivo regulation of gene expression by a PRPP riboswitch 154 The same singlet PRPP riboswitch sequence from H. modesticaldum (Figure 3A) was fused to 155 lacZ in wild-type (WT) and ΔpurF cells. Deletion of the purF gene, which 156 encodes the enzyme that catalyzes the first committed step of de novo purine biosynthesis 157 (Figure 1C), causes PRPP to accumulate when cells are starved for purines. Accordingly, the 158 lacZ gene is expressed (blue) in glucose minimal medium in the ΔpurF strain, but not (white) in 159 the parental B. subtilis strain with a functional purF gene (Figure 3D). 160 Notably, the cells range from completely white to a robust blue under these two conditions, 161 indicating that gene expression is fully suppressed when no ligand is present. Thus the 162 riboswitch appears to sample a much greater dynamic range for gene expression in vivo than is 163 observed with in vitro transcription termination assays. A riboswitch-reporter fusion carrying a 164 mutation (M2) in the terminator stem expresses lacZ under both conditions. These results

6

Sherlock et al. PRPP Riboswitches

165 demonstrate that PRPP accumulation in cells induce gene expression, which parallels our 166 findings in vitro that PRPP binding by ykkC subtype 2b RNAs promotes transcription elongation 167 (Figure 3B and Figure 3C). 168 169 Natural examples of tandem riboswitch aptamers for guanine and PRPP bind these ligands 170 to induce mutually-exclusive structures 171 We next sought to characterize the tandem arrangement of the guanine and PRPP aptamers by 172 in-line probing analysis of the 208 nucleotide RNA derived from the codA gene of Bacillus 173 megaterium (Figure 4A). Each aptamer appears to independently fold to form its characteristic 174 structure and each exhibits typical structural modulation (Figure 4B) upon addition of its 175 cognate ligand as previously observed for singlet guanine (Mandal et al., 2003) or PRPP (Figure 176 2B) aptamers. 177 Only the aptamer regions of the RNA were examined by these in-line probing experiments, 178 but the logic gate function of the tandem riboswitch is accomplished through interactions 179 between the two aptamers and the adjoining expression platform. Consequently, terminator and 180 antiterminator stems are stabilized or destabilized depending on the relative concentrations of 181 both guanine and PRPP. 182 The precise mechanism by which the guanine and PRPP aptamers work in opposition to 183 control the formation of the terminator stem has yet to be fully experimentally evaluated. 184 However, bioinformatic and in-line probing data already might provide clues regarding the 185 structural interplay between the two aptamers. First, guanine aptamers are always located almost 186 immediately adjacent to the conserved nucleotides at the 5ˊ terminus of the PRPP aptamer in 187 tandem arrangements (Figure 1B). Specifically, there are typically only 5 to 7 nucleotides 188 separating the P1 stems of the guanine and PRPP aptamers, suggesting this juxtaposition is 189 important for logic gate function. Second, the addition of PRPP to the tandem construct causes a 190 reduction in spontaneous RNA cleavage within the conserved nucleotides at the 5ˊ terminus of 191 the PRPP aptamer (Figure 4B, region 5), whereas guanine addition causes this same region to 192 experience a dramatic increase in scission. These results suggest that guanine binding to its 193 aptamer destabilizes an important structural feature necessary for ligand binding by the PRPP 194 aptamer. Thus, each aptamer might interfere with its neighboring aptamer upon ligand binding,

7

Sherlock et al. PRPP Riboswitches

195 perhaps by causing steric clashes due to their proximity. Additional experiments will need to be 196 conducted to more fully evaluate this mechanistic hypothesis. 197 198 Boolean logic function by a riboswitch carrying tandem guanine and PRPP aptamers 199 To evaluate the proposed two-input logic gate function, the same guanine-PRPP tandem aptamer 200 system including its associated expression platform from B. megaterium (Figure 5A) was 201 subjected to in vitro transcription termination assays. Consistent with results from the singlet 202 aptamer, the addition of PRPP increases the fraction of RNA transcripts that pass beyond the 203 intrinsic terminator stem and reach full length (Figure 5B). Also as predicted, the addition of 204 guanine causes an increase in the fraction of terminated transcripts. When both guanine and 205 PRPP are at high levels, the predicted genetic override occurs, and the fraction of terminated 206 products approaches zero. 207 Separate and simultaneous mutations to each aptamer (constructs M3 through M5) confirm 208 the OFF and ON function of the guanine and PRPP aptamers, respectively. Specifically, the 209 mutations in M3 alter two highly-conserved nucleotides in the core of the guanine aptamer that 210 are known to be essential for forming its ligand-binding core (Batey et al. 2004; Serganov et al. 211 2004). As expected, the addition of guanine does not increase transcription termination. 212 Similarly, the mutations in M4 alter two highly-conserved nucleotides in PRPP aptamer, which 213 we expect to disrupt PRPP binding. Again as expected, the amount of transcription termination 214 product increases, and PRPP addition has no effect on the distribution of transcription products. 215 Construct M5 combines the M3 and M4 mutations to generate an RNA that should fail to bind 216 either ligand. Indeed, the M5 construct appears to be unaffected by the addition of either guanine 217 or PRPP (Figure 5B). Similar results are observed in vivo when the WT and M3 through M5 218 tandem guanine-PRPP riboswitch constructs from B. megaterium were fused to the lacZ reporter 219 gene and transformed into WT B. subtilis cells (Figure 5C). 220 The versatility of this tandem riboswitch system is best demonstrated by varying the 221 concentrations of both guanine and PRPP in a series of in vitro transcription termination assays. 222 The data from these reactions yields a 3D landscape demonstrating again that the relative 223 concentrations of guanine and PRPP dictate the fraction of RNA transcripts that reach full length 224 (Figure 5D). Thus, the tandem riboswitch arrangement permits cells to tune the expression 225 levels of associated genes based on the relative concentrations of guanine and PRPP, such that

8

Sherlock et al. PRPP Riboswitches

226 high PRPP levels promote purine production despite the abundance of guanine, a prominent 227 purine. 228 At the single molecule level, each riboswitch aptamer in this tandem construct favors either 229 transcription termination or elongation based on the binding of its cognate ligand. In this regard, 230 each RNA molecule carrying this tandem aptamer arrangement functions as a Boolean IMPLY 231 logic gate. The truth table for an IMPLY logic gate (Figure 6A) indicates that gene expression 232 should be on when neither ligand is present, and when PRPP is bound by the RNA – even when 233 guanine is available. Finally, gene expression should be off when guanine is bound by the RNA, 234 which should occur when guanine is abundant while PRPP is not. This regulatory pattern could 235 be accomplished by the tandem RNA arrangement through formation of an antiterminator stem 236 by alternative base pairing interactions between the two aptamers, similar to those observed for 237 an allosterically-regulated tandem riboswitch- described previously (Lee et al., 2010; 238 Chen et al., 2011). Alternatively, as noted earlier, the aptamers might compete sterically due to 239 the proximity of their conserved sequences and structures. Regardless of the precise mechanism, 240 we hypothesize that an unoccupied PRPP aptamer permits formation of the terminator stem, 241 whereas an occupied guanine aptamer destabilizes the PRPP aptamer. Additional detailed kinetic 242 and structural studies will be necessary to reveal the precise molecular interactions that yield 243 IMPLY gene control logic. 244 When transcription termination or gene expression is observed as the average output from a 245 population of RNA constructs, the resulting data do not represent a perfect Boolean function 246 because the output yields intermediate values between zero and maximal values. Whereas each 247 of the individual RNAs operates as a binary logic gate that is either on or off, a population of 248 these RNAs exceeds the function of an individual Boolean logic gate. Together, the output of the 249 tandem riboswitch population functions more like a molecular rheostat to tune gene expression 250 based on relative ligand concentrations, rather than as an all-or-nothing binary switch. 251 Regardless, when ligand concentrations are at their extreme values, the output of the tandem 252 system (Figure 5D) approaches the binary logic of an IMPLY gate. 253 254 The biological utility of the opposing guanine and PRPP signals evaluated by tandem 255 riboswitch aptamers

9

Sherlock et al. PRPP Riboswitches

256 A situation in cells in which both guanine and PRPP are scarce would indicate an extremely 257 starved state. Ordinarily, one might expect guanine and PRPP levels to be negatively correlated, 258 such that an excess of guanine would deplete PRPP as a result of the synthesis of 5′-guanosine 259 monophosphate, and vice versa, either through de novo purine biosynthesis or the purine salvage 260 pathway (Figure 1B). If true, then why is it necessary to form this complex tandem arrangement 261 to override the gene control function of a guanine riboswitch? This apparent regulatory paradox 262 is addressed in more detail below. 263 During the stringent response in Firmicutes, nutrient starvation is signaled by the presence of 264 uncharged tRNAs that activate the ppGpp synthase RelA (Geiger and Wolz, 2014). ppGpp 265 synthesis causes GTP levels to drop and the transcriptional repressor CodY, which requires GTP 266 and isoleucine to bind its target sites, to release and enable transcription of branched-chain amino 267 acid (BCAA) biosynthesis genes (Sonenshein, 2005). The resulting increased ppGpp 268 concentration promotes binding of this signaling molecule as an inhibitor of enzymes such as 269 GuaB, GMK, and HprT (Figure 1C) to suppress GTP production (Kriel et al. 2012), causing 270 both guanine and PRPP to accumulate. 271 Under these conditions the guanine-PRPP tandem riboswitch still turns on expression of its 272 downstream genes, despite the mechanisms in place to keep GTP levels low. At first this appears 273 counterintuitive, but the genes associated with the tandem riboswitch system only include those 274 for catalyzing the steps of de novo purine synthesis up to the production of inosine 275 monophosphate (IMP), the branch point between AMP and GMP production (Figure 1C). Thus, 276 the riboswitch promotes AMP synthesis, but will not turn on genes specific for GDP or GTP 277 synthesis that are already being inhibited by ppGpp. ATP is known to be present in elevated 278 quantities during the stringent response because it is used as the initiating nucleotide for 279 transcription of genes under the ppGpp superregulon, whereas rRNA transcripts utilize GTP as 280 the initiating nucleotide to suppress protein production (Krásný et al., 2008). 281 282 Expansion of natural Boolean logic devices made of RNA 283 As noted above, the guanine-PRPP tandem riboswitch arrangement at its functional limits most 284 closely approximates an IMPLY logic gate (Figure 6A). This all-RNA system joins the NOR 285 gate formed by tandem S-adenosylmethionine (SAM) and (AdoCbl or

286 coenzyme B12) riboswitches (Sudarsan et al., 2006) (Figure 6B) and the AND gate formed by

10

Sherlock et al. PRPP Riboswitches

287 tandem T box and ppGpp riboswitch RNAs (Figure 6C) as sophisticated regulatory RNAs that 288 approximate Boolean logic functions. 289 Similarly, the tandem riboswitch-ribozyme arrangement formed by a c-di-GMP-II riboswitch 290 and a group I self-splicing ribozyme senses both c-di-GMP and guanosine (or any of its 5ˊ- 291 phosphorylated derivatives) to approximate the function of a Boolean AND gate (Lee et al., 292 2010; Chen et al., 2011). In this example, one chemical input is sensed by a conventional 293 riboswitch aptamer, whereas the second chemical input is sensed by the active site of a 294 ribozyme. These more complex riboswitch and ribozyme arrangements are notable because they 295 demonstrate how sensory and regulatory systems comprised entirely of RNA can perform 296 sophisticated decision-making in modern cells. 297

298 Discussion and conclusions 299 Most experimentally validated riboswitch classes sense and respond to ligands originating from 300 RNA metabolism pathways, and therefore it seemed reasonable to consider the likelihood that 301 riboswitches for PRPP would exist. PRPP is an essential precursor for the modern biosynthesis 302 of RNA monomers in organisms from all three domains of life, and is likely to have been 303 important for the production of very early in evolution. If PRPP was used by 304 RNA World organisms to generate the building blocks of this major biopolymer, then RNA 305 aptamers that selectively bound this biosynthetic precursor in ancient times might have persisted 306 to serve as riboswitch aptamers in modern cells. 307 Intriguingly, we had representatives of PRPP riboswitches on our list of riboswitch 308 candidates over a decade ago (Barrick et al., 2004), but due to their striking similarity to 309 riboswitches for guanidine (Nelson et al., 2017) and ppGpp, they were unusually difficult to 310 identify and experimentally validate. Members of other riboswitch classes have previously been 311 found to exhibit altered ligand specificities by accruing key mutations in nucleotides of their 312 binding pockets. For example, guanine riboswitches have undergone one or a few key mutations 313 to change their ligand specificity to (Mandal et al., 2004) or 2ˊ- (Kim et 314 al., 2007; Weinberg et al., 2017). Also, subtle binding site variants of FMN riboswitches appear 315 to bind a closely related ligand (Blount, 2013; Weinberg et al., 2017), and riboswitches for 316 molybdenum cofactor (MoCo) are proposed to have modest binding site changes to alter their 317 ligand specificity to bind tungsten cofactor (Regulski et al., 2008). In nearly all these previous

11

Sherlock et al. PRPP Riboswitches

318 examples, there are relatively small changes in the chemical structure of the ligand that need to 319 be accommodated by the riboswitch binding pocket changes. 320 For the subtypes of ykkC motif RNAs whose ligands have been identified, the chemical 321 differences in the ligands can be substantial. The ppGpp (subtype 2a) and PRPP (subtype 2b) 322 riboswitch ligands differ by the presence or absence of a guanine nucleobase and by the location 323 and number of phosphate moieties. Guanidinium, the ligand for guanidine-I (subtype 1) 324 riboswitches is entirely distinct from PRPP, and this moiety forms only a portion of the purine 325 ring of ppGpp. The ligands sensed by subtypes 2c and 2d, which are associated with genes 326 encoding nucleoside diphosphate linked to X (NUDIX) hydrolases of unknown specificity and a 327 transporter of unknown function associated with phosphonate utilization (phn_DUF6), 328 respectively, still remain to be discovered. 329 The consensus models of ykkC subtype 2c and 2d RNAs each have more prominent changes 330 compared to subtype 2a versus 2b at nucleotide positions that are known to be involved in ligand 331 recognition (Nelson et al., 2017; Reiss et al., 2017; Battaglia et al., 2017). Because the 332 consensus model for subtype 2c RNAs has the most similarity to those for PRPP and ppGpp 333 riboswitches, and the associated genes encode enzymes with various nucleotide substrates, we 334 speculate that this candidate riboswitch class senses another nucleotide derivative. 335 The consensus model for subtype 2d RNAs has the most similarity to that for guanidine-I 336 riboswitches and the phn_DUF6 transporter whose expression they control fall into the 337 same drug/metabolite transporter superfamily as guanidine transporters. Therefore, we propose 338 that this unsolved candidate riboswitch class might sense a small, charged, toxic compound akin 339 to guanidine. It will be interesting to compare the chemical structures of all of the ykkC family 340 riboswitch ligands once those for the subtype 2c and 2d riboswitch candidates have been 341 validated. Nonetheless, the collection of ykkC motif RNAs already demonstrates the ability of 342 RNA to recognize a variety of distinct small molecules, all while utilizing the same general 343 structural scaffold. 344 Furthermore, the discovery of tandem guanine-PRPP riboswitches serve as additional 345 demonstrations of how complex molecular devices can be formed purely from RNA. We should 346 note here that the in vitro transcription data for the logic gate molecules (Figure 5B,C) represent 347 the combined output of a population of RNA transcripts. Although the transcript length trends 348 are as predicted for an IMPLY gate, these outputs do not match the perfect binary output states

12

Sherlock et al. PRPP Riboswitches

349 of the corresponding electronic logic gates. However, each individual RNA molecule indeed has 350 a binary output just like an individual electronic logic gate. Each tandem riboswitch aptamer 351 transcript is either terminated (OFF) or fully transcribed (ON), but the band intensities from 352 analytical gel electrophoresis report on the results of the population of transcripts. 353 The outputs of the riboswitch logic gate can be further complicated by at least two issues. 354 First, an individual transcript might fail to function properly, say by misfolding of an aptamer 355 domain, such that it cannot respond to the presence of a ligand. This failure mode might help 356 explain the imperfect yields of terminated and full-length transcripts observed when examining 357 riboswitches by in vitro transcription assays. It seems likely that folding problems of RNAs 358 prepared in vitro might be less prominent when the RNAs are produced by cells. This means that 359 high levels of transcription read-through that is non-responsive to the absence of ligand are not 360 evident in reporter gene assays because the riboswitch aptamers have evolved to fold properly 361 and on a relevant timescale in the cellular milieu. 362 Second, the presence or absence of the two ligand inputs will change the function of the logic 363 gate system at the individual transcript level, and therefore change the frequency of their choice 364 of these two states rather than change the number of output states possible. Unlike electronic 365 logic gates that use electrical current at only two values (off and on), ligand concentrations for 366 riboswitches can be highly variable. Therefore, given the nature of ligand binding to receptors 367 (each described by an equilibrium constant equation), differences in ligand concentrations will 368 result in differences in the probabilities that the two riboswitch aptamers will be occupied. This 369 occupancy distribution will be reflected in the transcriptional output of the RNA population, but 370 each individual transcript still yields a binary output (terminated or full-length product) that is 371 biased by the concentrations of ligands present. 372 The IMPLY gate described herein involves a riboswitch RNA with aptamers for the RNA 373 nucleobase guanine and the RNA precursor PRPP. While tandem riboswitch systems have 374 previously been described (Figure 6), the guanine-PRPP system is the only example of two 375 tandem aptamers that share the same expression platform, allowing for a new type of logic 376 function to be implemented. All of these RNA systems demonstrate complex chemical sensing 377 and regulatory functions, which perhaps reflects the complexity of systems used for metabolite 378 monitoring and biological regulation in ancient RNA World organisms. Furthermore, it seems

13

Sherlock et al. PRPP Riboswitches

379 likely that additional variations of two-input Boolean logic gates will be represented by tandem 380 riboswitches that have yet to be discovered in modern cells. 381

382 Materials and methods

383 Chemicals and reagents 384 Chemicals were purchased from Sigma-Aldrich with the exceptions of phosphoribosyl

385 pyrophosphate (Santa Cruz Biotech) and guanosine 5′,3′-bisdiphosphate (TriLink Biotech). [- 32 32 386 P]-ATP and [- P]-UTP were purchased from Perkin Elmer and used within two weeks of 387 receipt. Bulk chemicals were purchased from J.T. Baker and enzymes from New England

388 Biolabs, unless otherwise noted. All solutions were prepared using deionized water (dH2O) and 389 were either autoclaved or filter sterilized (using 0.22 µm filters, Millipore) prior to use. DNA 390 oligonucleotides were purchased from Sigma-Aldrich and Integrated DNA Technologies. A list 391 of oligonucleotides used in this study can be found in Supplementary File 1–Supplemental 392 Table 1. 393 394 analyses. 395 Additional examples of ykkC motif RNAs were identified using Infernal 1.1 (Nawrocki and 396 Eddy, 2013) to search RefSeq version 76 plus additional environmental microbial databases as 397 described previously (Weinberg et al., 2015). Iterative searches for new sequences were 398 performed based on the previously published alignment of ykkC subtype 2 RNAs (Nelson et al., 399 2017). These sequences were sorted by nucleotide identity at a specific position as previously 400 indicated (Nelson et al., 2017) to exclude guanidine-I riboswitches, then further manually sorted 401 by downstream gene association, as described above. This revealed 257 unique examples of 402 subtype 2b RNAs (PRPP riboswitches), 127 of which are found in tandem with a guanine 403 aptamer. The consensus sequence and secondary structure model was constructed using R2R 404 software (Weinberg and Breaker, 2011). 405 406 RNA oligonucleotide preparation 407 Double-stranded DNA (dsDNA) templates for RNA transcription were produced either by 408 extension of overlapping synthetic DNAs (Supplementary File 1–Supplemental Table 1) using 409 SuperScript II reverse transcriptase (Thermo Fisher Scientific) for the F. ignava construct or by 14

Sherlock et al. PRPP Riboswitches

410 PCR from genomic DNA for B. megaterium construct. Primers were designed to contain a 5′- 411 terminal T7 RNA polymerase (T7 RNAP) promoter to enable transcription. The desired RNA 412 constructs were prepared from these dsDNAs by in vitro transcription, purified, and subsequently 413 5′ 32P-labeled as previously described (Nelson et al., 2017; Sherlock et al., 2017). 414 415 RNA In-line Probing Analyses 416 In-line probing assays (Soukup and Breaker, 1999; Regulski and Breaker, 2008) were 417 performed precisely as described previously (Nelson et al., 2017; Sherlock et al., 2017). 418 419 Single-round transcription termination assays 420 DNA constructs for single-round in vitro transcription were designed to carry the riboswitch of 421 interest spanning from the predicted natural transcription start site to 55 (H. modesticaldum 422 construct) or 26 (B. megaterium construct) nucleotides following the terminator stem. The 423 promoter from the B. subtilis lysC gene, which is compatible with E. coli RNA polymerase, was 424 used for both in vitro transcription termination as well as the lacZ reporter experiments detailed 425 below. Mutations were incorporated in the region of the B. megaterium construct before the 426 guanine aptamer so that it contained no cytidines, while the H. modesticaldum construct was 427 naturally cytidine-deficient. Synthetic double-stranded DNA templates were designed, purchased 428 (Integrated DNA Technologies) and subsequently amplified by PCR for the H. modesticaldum 429 constructs whereas constructs for the WT and M3 through M5 B. megaterium riboswitch were 430 amplified from the plasmids containing those inserts for the reporter gene fusion, as described 431 below. 432 Approximately 2 pmol of the resulting, purified DNA template was added to a transcription

433 initiation mixture (20 mM Tris-HCl [pH 7.5 at 23ºC], 75 mM KCl, 5 mM MgCl2, 1 mM DTT, 10 434 µg mL-1 bovine serum albumin [BSA], 130 µM ApA dinucleotide, 1% glycerol, 0.04 U µL-1 E. 435 coli RNA polymerase holoenzyme, 2.5 µM GTP, 2.5 µM ATP, and 1 µM UTP). Approximately 436 1 µCi [α-32P]-UTP was added per 8 µL reaction and transcription was allowed to proceed at 37ºC 437 for 10 minutes, leading to formation of a stalled polymerase complex at the first cytidine 438 nucleotide of each transcript. For each 8 µL transcription reaction, 1 µL of 10x elongation buffer -1 439 (20 mM Tris-HCl [pH 7.5 at 23ºC], 75 mM KCl, 5 mM MgCl2, 1 mM DTT, 2 mg mL heparin, 440 1.5 mM ATP, 1.5 mM GTP, 1.5 mM CTP, and 250 µM UTP) as well as 1 µL of a 10x solution

15

Sherlock et al. PRPP Riboswitches

441 of the ligand of interest were added sequentially. Transcription reactions were then incubated at 442 37ºC for an additional 60 minutes. 443 The transcription products were subsequently analyzed via denaturing 10% PAGE and 444 visualized using a phosphorimager (GE Healthcare Life Sciences). Fraction full length values 445 were calculated by varying the ligand concentration in separate reactions and quantifying the 446 changes in band intensity of both full length (FL) and terminated (T) transcription products using

447 the formula (FL)/(FL+T). The T50 values were determined by plotting the fraction FL as a 448 function of the logarithm of ligand concentration and using a sigmoidal four parameter logistic 449 fit in GraphPad Prism 7. 450 451 Riboswitch reporter assays 452 Riboswitch-reporter constructs consisting of the B. subtilis lysC gene promoter and either the H. 453 modesticaldum or B. megaterium riboswitch fused upstream of the E. coli lacZ gene were 454 inserted into the vector pDG1661 as described previously (Sudarsan et al., 2003; Nelson et al., 455 2017) and integrated into the amyE locus of wild-type WT (1A1 strain 168 Δtrp) or ΔpurF 456 strains (BKE06490 [erythromycin resistant] and BKK06490 [kanamycin resistant]) obtained 457 from the Bacillus Genetic Stock Center, The Ohio State University) B. subtilis (Meyer et al., 458 2011). The resulting transformed strains were verified to exhibit the expected resistance to 459 chloramphenicol as well as kanamycin for the ΔpurF::kan strains. Additionally, each strain was 460 confirmed to contain the desired riboswitch reporter sequence by colony PCR amplification 461 followed by sequencing. For example, PCR amplification using primers flanking the purF locus 462 followed by sequencing confirmed either the WT purF gene (WT strains) or kanamycin- 463 resistance gene (ΔpurF::kan strains) as expected. Mutations were incorporated into the B. 464 megaterium reporter construct by using Quikchange site-directed mutagenesis (Agilent 465 Technologies). 466 Reporter experiments were performed by inoculating various B. subtilis strains into 467 Lysogeny Broth (LB) with appropriate and growing overnight at 37ºC. For 468 experiments with the singlet H. modesticaldum reporter, the bacteria were diluted into lysogeny 469 broth (LB), grown for an additional 4-6 hours until culture growth was visible, centrifuged, 470 supernatant removed, the resuspended in glucose minimal media (GMM). For experiments with 471 the tandem B. megaterium reporter, bacteria were diluted directly into either LB or GMM, as

16

Sherlock et al. PRPP Riboswitches

472 indicated, and grown for 6 hours with or without supplemented guanine. Synthetic DNAs used in 473 cloning are listed in Supplementary File 1–Supplemental Table 1. 474 475 Acknowledgements 476 We thank Adam Roth, Ruben Atilho, Kimberly Harris, and other members of the Breaker 477 laboratory for helpful discussions. M.E.S. was supported by an NIH Cellular and Molecular 478 Biology Training Grant (T32GM007223). This work was also supported by NIH grants to 479 R.R.B. (GM022778 and DE022340). R.R.B. is also supported by the Howard Hughes Medical 480 Institute.

481 Competing interests

482 The authors declare that no competing interests exist.

483 Author ORCIDs

484 Madeline Sherlock, 0000-0002-6471-4243

485 Ronald Breaker, 0000-0002-2165-536X

486 References

487 Barrick JE, Corbino KA, Winkler WC, Nahvi A, Mandal M, Collins J, Lee M, Roth A, Sudarsan N, Jona 488 I, Wickiser JK, Breaker RR. 2004. New RNA motifs suggest an expanded scope for riboswitches in 489 bacterial genetic control. Proceedings of the National Academy of Sciences USA 101:6421-6426. DOI: 490 10.1073/pnas.0308014101, PMID: 15096624 491 Batey RT, Gilbert SD, Montange RK. 2004. Structure of a natural guanine-responsive riboswitch 492 complexed with the metabolite hypoxanthine. Nature 432:411-415. DOI:10.1038/nature03037, PMID: 493 15549109 494 Battaglia RA, Price IR, Ke A. 2017. Structural basis for guanidine sensing by the ykkC family of 495 riboswitches. RNA 23:578-585. DOI: 10.1261/.060186.116, PMID: 28096518 496 Benner SA, Ellington AD, Tauer A. 1989. Modern metabolism as a palimpsest of the RNA world. 497 Proceedings of the National Academy of Sciences USA 86:7054-7058, PMID: 2476811 498 Blount KF. 2013. Methods for treating or inhibiting infection by Clostridium difficile. U.S. Patent 499 13/576,989. 500 Breaker RR. 2010. RNA second messengers and riboswitches: relics from the RNA World? Microbe 501 5:13-20. 17

Sherlock et al. PRPP Riboswitches

502 Breaker RR. 2011. Prospects for riboswitch discovery and analysis. Molecular Cell 43:867-879. DOI: 503 10.1016/j.molcel.2011.08.024, PMID: 21925376 504 Chen AG, Sudarsan N, Breaker RR. 2011. Mechanism for gene control by a natural allosteric group I 505 ribozyme. RNA 17:1967-1972. DOI: 10.1261/rna.2757311, PMID: 21960486 506 Dennis AL, Puskas M, Stasaitis S, Sandwick RK. 2000. The formation of a 1–5 phosphodiester linkage in 507 the spontaneous breakdown of 5-phosphoribosyl-α-1-pyrophosphate. Journal of Inorganic Biochemistry 508 81:73-80. DOI: 10.1016/S0162-0134(00)00117-3, PMID: 11001434 509 Geiger T, Wolz C. 2014. Intersection of the stringent response and the CodY regulon in low GC gram- 510 positive bacteria. International Journal of Medical Microbiology 304:150-155. DOI: 511 10.1016/j.ijmm.2013.11.013, PMID: 24462007 512 Gilbert SD, Stoddard CD, Wise SJ, Batey RT. 2006. Thermodynamic and kinetic characterization of 513 ligand binding to the aptamer domain. Journal of 359:754-768. 514 DOI: 10.1016/j.jmb.2006.04.003, PMID: 16650860 515 Gilbert W. 1986. Origin of life: The RNA world. Nature 319:618. DOI: 10.1038/319618a0 516 Kilstrup M, Meng LM, Neuhard J, Nygaard P. 1989. Genetic evidence for a repressor of synthesis of 517 cytosine deaminase and purine biosynthesis enzymes in Escherichia coli. Journal of Bacteriology 518 171:2124-2127. DOI: 10.1128/jb.171.4.2124-2127.1989, PMID: 2539360 519 Kim JN, Roth A, Breaker RR. 2007. Guanine riboswitch variants from Mesoplasma florum selectively 520 recognize 2′-deoxyguanosine. Proceedings of the National Academy of Sciences USA 104:16092-16097. 521 DOI: 10.1073/pnas.0705884104, PMID: 17911257 522 Krásný L, Tiserová H, Jonák J, Rejman D, Sanderová H. 2008. The identity of the transcription +1 523 position is crucial for changes in gene expression in response to amino acid starvation in Bacillus 524 subtilis. Molecular Microbiology 69:42–54. DOI: 10.1111/j.1365-2958.2008.06256.x, PMID: 18433449 525 Kriel A, Bittner AN, Kim SH, Liu K, Tehranchi AK, Zou WY, Rendon S, Chen R, Tu BP, Wang JD. 526 2012. Direct regulation of GTP homeostasis by (p)ppGpp: a critical component of viability and stress 527 resistance Molecular Cell 48:231-241. DOI: 10.1016/j.molcel.2012.08.009, PMID: 22981860 528 Lee ER, Baker JL, Weinberg Z, Sudarsan N, Breaker RR. 2010. An allosteric self-splicing ribozyme 529 triggered by a bacterial second messenger. Science 329:845-848. DOI: 10.1126/science.1190713, 530 PMID: 20705859 531 Mandal M, Boese B, Barrick JE, Winkler WC, Breaker RR. 2003. Riboswitches control fundamental 532 biochemical pathways in Bacillus subtilis and other bacteria. Cell 113:577-586. DOI: 10.1016/S0092- 533 8674(03)00391-X, PMID: 12787499

18

Sherlock et al. PRPP Riboswitches

534 Mandal M, Breaker RR. 2004. Adenine riboswitches and gene activation by disruption of a transcription 535 terminator. Nature Structural and Molecular Biology 11:29-35. DOI: 10.1038/nsmb710, PMID: 536 14718920 537 McCown PJ, Corbino KA, Stav S, Sherlock ME, Breaker RR. 2017. Riboswitch diversity and 538 distribution. RNA 23:995-1011. DOI: 10.1261/rna.061234.117, PMID: 28396576 539 Meola M, Yamen B, Weaver K, Sandwick RK. 2003. The catalytic effect of Mg2+ and imidazole on the 540 decomposition of 5-phosphoribosyl-α-1-pyrophosphate in aqueous solution. Journal of Inorganic 541 Biochemistry 93:235-242. DOI: 10.1016/S0162-0134(02)00578-0, PMID: 12576286 542 Meyer MM, Hammond MC, Salinas Y, Roth A, Sudarsan N, Breaker RR. 2011. Challenges of ligand 543 identification for riboswitch candidates. RNA Biology 8:5-10. DOI: 10.4161/rna.8.1.13865, PMID: 544 21317561 545 Nawrocki EP, Eddy SR. 2013. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 546 29:2933-2935. DOI: 10.1093/bioinformatics/btt509, PMID: 24008419 547 Nelson JW, Atilho RM, Sherlock ME, Stockbridge RB, Breaker RR. 2017. Metabolism of free guanidine 548 in bacteria is regulated by a widespread riboswitch class. Molecular Cell 65:220-230. DOI: 549 10.1016/j.molcel.2016.11.019, PMID: 27989440 550 Nelson JW, Breaker RR. 2017. The lost language of the RNA World. Science Signaling 10:1-10 DOI: 551 10.1126/scisignal.aam8812, PMID: 28611182 552 Regulski EE, Breaker RR. 2008. In-line probing analysis of riboswitches. Methods Molecular Biology 553 419:53-67. DOI: 10.1007/978-1-59745-033-1_4, PMID: 18369975 554 Regulski EE, Moy RH, Weinberg Z, Barrick JE, Yao Z, Ruzzo WL, Breaker RR. 2008. A widespread 555 riboswitch candidate that controls bacterial genes involved in molybdenum cofactor and tungsten 556 cofactor metabolism. Molecular Microbiology 68:918-932. DOI: 10.1111/j.1365-2958.2008.06208.x, 557 PMID: 18363797, 558 Reiss CW, Xiong Y, Strobel SA. 2017. Structural basis for ligand binding to the guanidine-I riboswitch. 559 Structure 25:195-202. DOI: 10.1016/j.str.2016.11.020, PMID: 28017522 560 Rieder R, Lang K, Graber D, Micura R. 2007. Ligand-induced folding of the adenosine deaminase A- 561 riboswitch and implications on riboswitch translational control. ChemBioChem 8:896-902. DOI: 562 10.1002/cbic.200700057, PMID: 17440909 563 Roth A, Breaker RR. 2009. The structural and functional diversity of metabolite-binding riboswitches. 564 Annual Review of Biochemistry 78:305-334. DOI: 10.1146/annurev.biochem.78.070507.135656, PMID: 565 19298181 566 Serganov A, Nudler E. 2013. A decade of riboswitches. Cell 152:17-24. DOI: 567 10.1016/j.cell.2012.12.024, PMID: 23332744

19

Sherlock et al. PRPP Riboswitches

568 Serganov A, Yuan YR, Pikovskaya O, Polonskaia A, Malinina L, Phan AT, Hobartner C, Micura R, 569 Breaker RR, Patel DJ. 2004. Structural basis for discriminative regulation of gene expression by 570 adenine- and guanine-sensing mRNAs. Chemical Biology 11:1729-1741. DOI: 571 10.1016/j.chembiol.2004.11.018, PMID: 15610857 572 Sherlock ME, Malkowski SN, Breaker RR. 2017. Biochemical validation of a second guanidine 573 riboswitch class. Biochemistry 56:352-358. DOI: 10.1021/acs.biochem.6b01270, PMID: 28001368 574 Sherwood AV, Henkin TM. 2016. Riboswitch-mediated gene regulation: Novel RNA architectures 575 dictate gene expression responses. Annual Review of Microbiology 70:361-374. DOI: 10.1146/annurev- 576 micro-091014-104306, PMID: 27607554 577 Sonenshein AL. 2005. CodY, a global regulator of stationary phase and virulence in gram- positive 578 bacteria. Current Opinion in Microbiology 8:203-207. DOI: 10.1016/j.mib.2005.01.001, PMID: 579 15802253 580 Soukup GA, Breaker RR. 1999. Relationship between internucleotide linkage geometry and the stability 581 of RNA. RNA 5:1308-1325. PMID: 10573122 582 Stoddard CD, Batey RT. 2006. Mix and match riboswitches. ACS Chemical Biology 1:751-754. DOI: 583 10.1021/cb600458w, PMID: 17240972 584 Sudarsan N, Wickiser JK, Nakamura S, Ebert MS, Breaker RR. 2003. An mRNA structure in bacteria 585 that controls gene expression by binding . Genes and Development 17:2688-2697. DOI: 586 10.1101/gad.1140003, PMID: 14597663 587 Sudarsan N, Hammond MC, Block KF, Welz R, Barrick JE, Roth A, Breaker RR. 2006. Tandem 588 riboswitch architectures exhibit complex gene control functions. Science 314:300-304. DOI: 589 10.1126/science.1130716, PMID: 17038623 590 Weinberg Z, Nelson JW, Lünse CE, Sherlock ME, Breaker RR. 2017. Bioinformatic analysis of 591 riboswitch structures uncovers variant classes with altered ligand specificity. Proceedings of the 592 National Academy of Sciences USA 114:2077–2085. DOI: 10.1073/pnas.1619581114, PMID: 28265071 593 Weinberg Z, Kim PB, Chen TH, Li S, Harris KA, Lünse CE, Breaker RR. 2015. New classes of self- 594 cleaving revealed by comparative genomics analysis. Nature Chemical Biology 11:606-610. 595 DOI: 10.1038/nchembio.1846, PMID: 26167874 596 Weinberg Z, Breaker RR. 2011. R2R - software to speed the depiction of aesthetic consensus RNA 597 secondary structures. BMC Bioinformatics 12:3. DOI: 10.1186/1471-2105-12-3, PMID: 21205310 598 Wickiser JK, Winkler WC, Breaker RR, Crothers DM. 2005a. The speed of RNA transcription and 599 metabolite binding kinetics operate an FMN riboswitch. Molecular Cell 18:49-60. DOI: 600 10.1016/j.molcel.2005.02.032, PMID: 15808508

20

Sherlock et al. PRPP Riboswitches

601 Wickiser JK, Cheah MT, Breaker RR, Crothers DM. 2005b. The kinetics of ligand binding by an 602 adenine-sensing riboswitch. Biochemistry 44:13404-13414. DOI: 10.1021/bi051008u, PMID: 16201765 603 604 Figure 1. PRPP riboswitches and their regulatory network. (A) Comparison of the consensus 605 sequence and secondary structure models for the guanidine-I riboswitch aptamer (subtype 1, left) 606 and the aptamers for ppGpp (subtype 2a, middle; ME Sherlock, N Sudarsan, RR Breaker, 607 manuscript submitted), and PRPP (subtype 2b, right). For sequence alignments of the 257 unique 608 PRPP riboswitch aptamers in Stockholm format, see Supplementary File 2. (B) Consensus 609 sequence and secondary structure model for the 127 unique examples of adjacent guanidine- 610 PRPP aptamers that form the corresponding tandem riboswitch architecture. For sequence 611 alignments of the tandem guanine- PRPP riboswitch aptamers in Stockholm format, see 612 Supplementary File 3. (C) Only genes in the purine biosynthetic pathway that are necessary for 613 the production of adenosine nucleotides are commonly associated with PRPP riboswitches. 614 615 Figure 2. Ligand binding by a natural singlet PRPP riboswitch. (A) Sequence and secondary 616 structure of the 106 nucleotide RNA derived from the purC gene of F. ignava. Data collected in 617 b were used to determine regions of constant or reduced scission upon the addition of PRPP to 618 in-line probing assays. Lowercase g indicates a guanosine nucleotide added to enable efficient 619 transcription by T7 RNA polymerase. Arrowheads demark the approximate range of nucleotides 620 with in-line probing data available as depicted in B. (B) Polyacrylamide gel electrophoresis 621 (PAGE) analysis of the spontaneous RNA cleavage products generated during in-line probing of 622 5′-32P-labeled 106 purC RNA. NR, T1 and –OH represent RNA undergoing no reaction, partial 623 digest with RNase T1 (cleaves after guanosine nucleotides), or partial digest under alkaline 624 conditions (cleaves after every nucleotide), respectively. Bands corresponding to selected G 625 residues are annotated. In-line probing experiments contained either no ligand (–) or 1 mM of the 626 compound indicated except guanine which was tested at 10 µM. Annotations 1 through 5 627 identify prominent regions of the RNA that undergo structural stabilization in a PRPP-dependent 628 manner. Data shown are representative of multiple experiments.

629 630 Figure 3. Transcription regulation and reporter gene expression by a natural singlet PRPP 631 riboswitch. (A) Sequence and secondary structure of the natural PRPP singlet riboswitch derived 632 from the purB gene from H. modesticaldum. An alternative RNA structure is depicted in which 633 the terminator stem forms followed by a U-rich tract, which only forms in the absence of PRPP. 634 (B) PAGE analysis of a single-round transcription termination assay of WT and mutant H. 635 modesticaldum purB riboswitches at the indicated ligand concentration. FL and T denote full 636 length and terminated transcripts, respectively. Values for the fraction of full-length transcripts 637 relative to the total transcription yield is listed for each reaction. (C) Plot of the fraction of full 638 length WT H. modesticaldum purB RNA riboswitch transcripts contributing to the total number 639 of transcripts (FL plus T) as a function of the logarithm (base 10) of the molar PRPP 640 concentration. The concentration of PRPP required to cause half-maximal termination efficiency 641 (T50) was determined by a sigmoidal curve fit (see METHODS). Inset: PAGE analysis of single 642 round transcription termination assays of the WT H. modesticaldum purB RNA with either no 643 ligand (–) or PRPP ranging from 200 nM to 10 mM. Data for replicates of this transcription 644 termination assay are presented in Figure 3–figure supplement 1. (D) Reporter gene expression 21

Sherlock et al. PRPP Riboswitches

645 of WT and ΔpurF [BKE06490 (depicted) and BKK06490 (not shown)] B. subtilis cells 646 containing wild-type (WT) or mutant (M2) H. modesticaldum purB riboswitch-lacZ reporter 647 fusion constructs as described in A. Cells were grown in glucose minimal medium (GMM) 648 containing 50 µg mL-1 X-gal. 649 The following figure supplement is available for figure 3: 650 Figure supplement 1. Replicate in vitro transcription termination assays with the H. 651 modesticaldum purB riboswitch construct. Assays are conducted as described for the data 652 generated in Figure 3C. T50 values are estimated to be 40 μM (left) and 80 μM (right). 653 654 Figure 4. Ligand binding by tandem guanine and PRPP aptamers. (A) Sequence and secondary 655 structure of the 208 nucleotide RNA derived from the codA gene of B. megaterium. Data 656 collected in b were used to determine regions of constant, increased, or reduced scission upon the 657 addition of guanine or PRPP to in-line probing assays. Additional annotations are as described 658 for Figure 4A. Note that the precise locations of regions experiencing strand scission are 659 approximated after nucleotide 74, as guided by the related in-line probing data generated for the 660 singlet PRPP aptamer depicted in Figure 2. (B) PAGE analysis of the spontaneous RNA 661 cleavage products generated during in-line probing of 5′-32P-labeled 208 codA RNA. 662 Annotations are described in the legend of Figure 2B. In-line probing experiments contained 663 either no ligand (–), 1 mM PRPP or 10 µM guanine. Annotations 1 through 10 identify 664 prominent regions of the RNA that undergo structural stabilization from either guanine or PRPP. 665 Both gel images contain RNA from the same in-line probing reactions, but the gel in the image 666 on the right was run for a longer time to increase resolution of the 3′ region of the RNA, which 667 contains the regions of the PRPP aptamer involved in ligand recognition. 668 669 Figure 5. Guanine and PRPP competitively regulate transcript termination via a riboswitch 670 integrating two aptamers. (A) Sequence and secondary structure of the tandem guanine-PRPP 671 riboswitch derived from the codA gene of B. megaterium. An alternative RNA structure is 672 depicted to include the terminator stem followed by a U-rich tract, which forms based on the 673 relative abundance of each ligand. (B) PAGE analyses of single-round transcription termination 674 assays of WT and various mutant B. megaterium codA tandem riboswitches in the presence or 675 absence of the two ligands. FL and T denote full length and terminated transcripts, respectively. 676 (C) Reporter gene expression of B. subtilis containing wild-type (WT) and mutant (M3 through 677 M5) B. megaterium tandem riboswitch-lacZ reporter fusion constructs as described in A. Cells 678 were grown in rich (lysogeny broth) medium or glucose minimal medium (GMM) containing 679 either no (–) or 10 µM additional guanine (gua) and 50 µg mL-1 X-gal. (D) Plot of the fraction of 680 full length B. megaterium codA tandem riboswitch transcripts contributing to the total number of 681 transcripts as a function of both guanine and PRPP concentrations. Data shown are the average 682 of three experiments (see Figure 5–figure supplement 1).

683 The following figure supplement is available for figure 5: 684 Figure supplement 1. Comprehensive dataset for the in vitro transcription termination assays 685 with the B. megaterium codA riboswitch construct. Data for three separate transcription 686 termination ‘trials’ are plotted for six conditions tested as noted in panel A. Panels A through F

22

Sherlock et al. PRPP Riboswitches

687 include the PRPP concentrations used in the assays in the lower left of each plot. Details are as 688 described in the legend to Figure 5.

689 Figure 6. Natural examples of tandem riboswitches that approximate Boolean logic functions. 690 (A) Tandem guanine and PRPP aptamers form an IMPLY logic gate to determine the state of 691 downstream gene expression (express.). (B) Tandem S-adenosylmethionine (SAM) and 692 adenosylcobalamin (AdoCbl) riboswitches form a NOR logic gate. (C) Tandem T box RNA and 693 ppGpp riboswitches form an AND logic gate. The T box and ppGpp riboswitch elements can 694 occur in either order.

695 696 Supplemental File Legends 697 698 Supplemental File 1. This file includes a table listing the DNA constructs used in the study. 699 700 Supplemental File 2. This file presents a sequence alignment in Stockholm format for individual 701 PRPP riboswitch aptamers. 702 703 Supplemental File 3. This file presents a sequence alignment in Stockholm format for 704 riboswitches that carry tandem guanine and PRPP aptamers. 705 706 707

23

A ykkC subtype 1 ykkC subtype 2a ykkC subtype 2b Guanidine-I ppGpp PRPP

Nucleotide identity covariation N 97% N 75% * Sites distinguishing no covariation P1a N 90% P1a P1a subtypes 1 and 2 compatible G Nucleotide present * Sites distinguishing G G mutation G C 97% 75% G subtypes 2a and 2b C G U C R = A G Y = C U R Y C G 90% 50% C G U C G U U C G U C U UC G G C U C U C A C A A A A A G R A G G C A A G R A G R Y A G A Y A A G A G A G G A G C G A G C U A A G A G A G CG * G C U Y * P3 G C U Y P3 G C G C C G G Y C G * G C P3 Y R * A* G C Y R A G * R Y U * R Y G * P1 G P1 * G P1 * R Y R G Y R * GC Y R * R * *G G C C G * G A * * G C * C G * RY C G * R C G Y R C G 5´ R Y R CA R G R U A CA G RY A CA R R Y * A A R Y Y Y R A * R * A Y R A YR G R A G P2 YR P2 RY P2 R * G G YR * G 5´ U 5´ U variable Y U length C Y hairpin Y C B O O N HO P O NH O O O O- N P1a O P O P OH H N NH2 G R HO OH O- O- guanine Y Y U C U C G Y P2 U CG G G A U phosphoribosyl G C Y C R R A C C U C A A C R G R Y R A C G G C A A pyrophosphate U Y A UA A G R G Y Y Y G R A G Y C Y R Y U A A R (PRPP) A A G A G C A A A G U AUC P3 U Y G C P3 C G G C UA Y R R RY A G A U R P1 C G P1 Y R R R U A G C C G C G 0 nt Y R C G 5´ Y G AA A G R R Y Y A CA R A GG UUY C Y R RY P2 YR

C O PRPP -O P O purF purD purN purL purM purEK purC purB purH purH O O O O- IMP O P O P O- de novo purine synthesis HO OH O- O- guaB O purA hprT N NH Genes commonly guanine XMP associated with guaA N N NH sAMP PRPP riboswitches purine H 2 Enzyme function salvage O known to be purB N inhibited by ppGpp NH O gmk -O P O N O N NH2 AMP GDP O- GMP HO OH C A A U A F. ignava P1a A U 106 purC G U 30 G C G A 20 C G A UC GA C A C A A A C G C G A 80 A G 40 G A G U G A G C G C G A G P3 4 C UC GC G C 100 10 1 A A G U G 70 G C A U G 3 G P1 A A U U A C G U G C 2 C G U 5´g GAAAG U A G U A CA AU A GG 50 G C 90 5 scission U A P2 60 reduced A U CG constant A U A C A B

NR T1 - OH PRPPguanineppGppguanidine Pre 5 G90 4 P3 G76 3 2 G59 P2 G50 1

G35

P1a

G22 P1 G16

G7 A C H. modesticaldum B [ligand] (mM) C PRPP G A terminator P1a U A purB riboswitch 00.1 1 0 0 0.1 1 0 0 PRPP G C 0 0 0 1 0 0 0 1 0 ppGpp C G G FL G C 1.0 C G U C WT M1 M2 T GC UC G U A antiterminator G U FL G C CG C A A A 0.9 G G G A GC T50~90 μM G A G G C T A A G G C A C U CG U C C P3 G C 0.8 C G G C CG G C G A fraction FL M1 A G C G AU G G C P1 C G G CG UU U 0.7 5´ G C CG -7 -6 -5 -4 -3 -2 U A C G 28 GGAAG AAA U U CA AU AGUUCCG 25 CUUUUGCCCGG GC 55 log c (PRPP, M) A U A U AAACAAA P2 G U M2 WT M2 Riboswitch RNA GC D GC GC ∆ purF WT ∆ purF WT B. subtilis strain CUUG A U A U CG C U G A AUG B. megaterium UU A G A tandem 208 codA C G A U P1a U A 110 100 C G guanine aptamerU A PRPP aptamer U A 40 U A G 2 3 A U 30 U U C G G U U C U U G A U C U C A C G A G A C C 50 A U P3 U C G U A A G A G C C 180 A A U U U G U U A C U 120 U A U C A C 90 C 20 U G G G A G P2 G A U A GC 8 A A A G U U G A G U A U C 4 A A A G G C 1 60 A U P3 G C U U A C G C A 9 U 10 70 6 A A U U A 170 G A 130 A C G P1 P1 A U G C G C U A A U C G U C G 5 G C 7 C G 10 C 5´ GG UUCA U AAA AAG U AC A U GACA GU AGUUUUG U G U A 190 A U 160 200 80 U A U G scission 140 C G reduced by guanine A U U G increased by guanine P2 U A reduced by PRPP U A constant U G U A G A 150 B AA HO 1 HO 1 NR T PRPPguanine NR T PRPP guanine Pre Pre 10 G174 9 8 P3 G147 G174 G128 G163 7

G147 P2

G90 G128 6

5 aptamer PRPP G73 G115 P1a 4

P1 G53 P3 G90

3 5

G37 G73 P1

G29

2 guanine aptamer guanine P2

G16

1 UU B. megaterium G A A C G tandem codA riboswitch A U terminator U A P1a C G guanine aptamerU A PRPP aptamer U A U U C A U A G U G G P2 U U U G U C G A C A U C A G A C G U U G U C C antiterminator G U U G A C U C G U A A G U U U U U A C C C G G U C A C U C A U G G A A G U G A U G A U A A A A G U U G G U U C A A A G U A U P3 A A A G GC A U A U A U C G M3 C G C C P3 G C A U G C G U A M4 AG C A C G A U G C G P1 U A M5 = M3 + M4 P1 A G G C A U G C UUUUU C G G C C G A U U A C G 5´ 17 UUC AAA 0 AAG C GACA GUAGUUUUGUUUUGG26 UUG GUCUUUU 26 A U U A P2 A U U A U G C G B PRPP C Rich GMM A U D + + U G + + gua gua U A + U A 0.9 U G WT FL WT U A T G A 0.85 AA M3 FL M3 0.8 T 0.75 1000 M4 FL M4 100 T FL fraction 10 0.7 1 FL 0.1 M5 T M5 0 0.65 [PRPP] (μM) 0 0.1 1 10 100 1000 [guanine] (μM) A OFF ON IMPLY gua gua PRPP express. + PRPP + + 5′ (Un ) + + + + B OFF OFF NOR SAM SAM AdoCblexpress. AdoCbl + + 5′ (Un ) (Un ) + + + C AND ON ON tRNA ppGpp express.

tRNA ppGpp + 5′ (Un ) (Un ) + + + + fraction FL 0.6 0.7 0.8 0.9 1.0 9- 7- 5- -3 -4 -5 -6 -7 -8 -9 log c (PRPP, M)

fraction FL 0.6 0.7 0.8 0.9 1.0 7- 5- 3-2 -3 -4 -5 -6 -7 log c (PRPP, M)