<<

Characterizing the Landscape of Aminoacyl-tRNA Synthetase Production in Bacillus subtilis

By

Darren John Parker

B.S. Biochemistry University of Illinois, Urbana-Champaign, 2014

Submitted to the Department of Biology In partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY IN BIOLOGY at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2020

2020 Darren John Parker. All rights reserved.

The author hereby grants MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium known or hereafter created.

Signature of the Author: ______Darren J. Parker Department of Biology June 19, 2020

Certified by: ______Gene-Wei Li Associate Professor of Biology Thesis Supervisor

Accepted by: ______Stephen Bell Uncas and Helen Whitaker Professor of Biology Investigator, Howard Hughes Medical Institute Co-Director, Biology Graduate Committee

2 Characterizing the Landscape of Aminoacyl-tRNA Synthetase Protein Production in Bacillus subtilis

By

Darren John Parker

Submitted to the Department of Biology on June 19, 2020 in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Biology

Abstract

The phenotype of a cell is a consequence of both the identity of the in the genome and the magnitude of their expression into . While the biochemical function of many proteins has been uncovered, for most it is unclear how important native protein abundances are for cell fitness. Furthermore, linking changes in abundances with downstream effects on enzymatic output, pathway function, and ultimately cell fitness is unexplored in nearly all cases. Here I use a model enzyme family, the aminoacyl tRNA synthetases (aaRS), to explore how sensitive Bacillus subtilis are to changes in aaRS production from the molecular to phenotypic level. This culmination of protein levels, functional output, and fitness, leads to a complete “fitness landscape” for the aaRS proteins and provides a framework for future study in quantitative biology.

In Chapter I, I outline the conceptual questions explored in this thesis, review the current understandings of bacterial and aaRS function, and note the various regulatory strategies utilize to adapt to perturbations. In Chapter II, I find that the aaRS proteins are produced to optimize the growth rate of cells despite the presence of uncharged tRNAs. These native levels are positioned near a ‘fitness cliff’ as the underlying molecular processes of tRNA charging, translation, and regulation, are sensitive to reductions but not increases in synthetase production. In Chapter III, I complete the characterization of the aaRS fitness landscapes by exploring the source of the fitness defects of aaRS overproduction. In Chapter IV, I present a novel protocol for RNA-seq library preparation to reduce the cost and time associated with generating transcriptomic datasets. In Chapter V, using the aforementioned protocol, I characterize the transcriptomes of over 70 strains within the single knockout collection. With the help of a colleague we find that strong selective pressures to induce genes involved in motility leads to a large amount of transcriptome heterogeneity within the collection. Finally, in Chapter VI, I discuss the results of my work, setting up future directions within the context of , bacterial physiology, and beyond.

Thesis Supervisor: Gene-Wei Li Title: Associate Professor of Biology

3 Acknowledgements

There are many people to thank for my being at this point…the common image of a scientist toiling away in a dark room, annoyed by any minor disturbance could not be farther from the truth.

First and foremost, I have to thank Gene. I have yet to meet someone so intellectually capable yet so completely willing and eager to listen to the opinions of others. I always felt heard and respected, easily the two most important qualities in a mentor. I also admire the calming presence and patience he provides to myself and the lab.

I would also like to majorly thank my undergraduate advisor, Auinash Kalsotra for inspiring me to continue on a scientific path. I was not even anticipating going to graduate school starting my last year of undergrad, but his enthusiasm for science, warm heart, and encouragement led me to aim high and continue to follow what I found interesting.

I was incredibly fortunate to work with a great group of people during my last five years. First, I have to thank my long-time bay-mate and first colleague Jean-Benoît. Easily the smartest and hardest working person I have met at MIT, our numerous conversations, scientific or otherwise, were key in my development. I must also thank other members of the “founding core” of the Li Lab: Aaron, Grace, James, Lydia and Cassandra. Everyone brought a unique perspective to the lab, and I cherish the fact that I always felt comfortable discussing all elements of our work together. We had a lot of fun over the years, whether at Lab retreats, Halloween parties, or just coming up with shenanigans for the next event, and I will greatly miss those moments.

There are many others within the MIT community I would like to acknowledge. Especially important to me was the culture of the 2nd floor of 68. Thanks to the awesome members especially in the Burge and Calo labs, I can safely say the 2nd floor has the best mix of high achieving work, fun people, and good times. I would never have expected to play beer pong against MIT professors mere feet from lab, but that happened and it was awesome. Also, as much as we joke about Bertucci’s I also really enjoyed RNA journal club which was, let’s be honest, another 2nd floor event. I hope that culture persists for years to come.

I also would like to give a big thanks to Alan and members of the Grossman lab. The close interaction both in group meeting and outside was a huge part in setting the Li lab on its current course. I am sure I became annoying at some point, as I was coming up to ask questions at least once a day early on, so thank you all for your help. I also owe Alan my deepest gratitude for his excellent, frank feedback about science, careers, and in lab.

Life is not just work however, and I have to thank those that were part of the journey. First and foremost are those from my first true “home” 25 Cahvah (RIP) and 79 Putty; Nader, Sam, Charlie, Leanne and Jason. Whether it was Saturday morning cartoons, scientific discussions (i.e. gossip) or late night political debates, our apartment will forever be near to my heart. Second is my friends and members of various groups including DNA/ FC, the Ski trip squad, and the Microbios that semi-adopted me.

4 I was incredibly fortunate to have another group of friends to talk to nearly every day despite being 1,000 miles away. The Mofongo boys; Prath, Colin, Ross, and Ben, and Scott’s Tots; Jaydeep and Tania were all incredibly supportive of my journey and I would have gone crazy without them. To them I say, yes I am finally graduating college!

Finally, I would be nowhere without my loving family. My mom, Jamie, and my dad, John have been extremely supportive throughout the entire process and gave the confidence to do whatever I wanted. My brothers, Adam, Lucas, and Jeremy were essential in recharging on my breaks and I am extremely grateful for all the fun we have had, whether in worlds real or virtual. I would also like to acknowledge and thank my Nanny and Papa for their absolute love and encouragement over the years. Having the feeling that if things aren’t going well I can turn to a loving and supportive environment is not something to take for granted. They are all equal members in the production of this thesis and completion of my Ph.D.

5 Table of Contents

Abstract ...... 3

Acknowledgements ...... 4

Table of Contents ...... 6

Figure Index ...... 8

Chapter I: Introduction ...... 11 Introduction to Quantitative Biology of Gene Expression ...... 12 High Abundance of Translation Machinery...... 17 The Translation Cycle...... 18 Elongation Kinetics & Codon Usage...... 21 Expression Stoichiometry of Translation Machinery ...... 24 Consequences of Translation Elongation Rate Reduction ...... 25 Aminoacyl-tRNA Synthetase Structure and Function...... 27 Biochemistry, Kinetics, and Inhibitors of aaRS Enzymes ...... 31 aaRS Gene Regulation ...... 34 (p)ppGpp and the Stringent Response ...... 37 References ...... 43

Chapter II: Growth Optimized Aminoacyl-tRNA Synthetase Levels Prevent Maximal tRNA Charging ...... 58 Abstract ...... 59 Introduction ...... 60 Results ...... 62 Discussion ...... 77 Supplemental Figures & Legends ...... 82 Acknowledgements ...... 90 Methods ...... 91 References ...... 104

Chapter III: Characterizing the Origin of the Fitness Defect due to Overproduction of the Aminoacyl-tRNA Synthetases in Bacillus Subtilis .111 Abstract ...... 112 Introduction ...... 113 Results ...... 116 Discussion ...... 126 Acknowledgements ...... 129 Methods ...... 130 References ...... 133

6

Chapter IV: Development of a Highly-multiplexed RNA-seq Protocol .....136 Abstract ...... 137 Introduction ...... 138 Results ...... 139 Discussion ...... 145 Acknowledgements ...... 146 Methods ...... 147 References ...... 153

Chapter V: Rapid Accumulation of Motility-Activating Mutations in Resting Liquid Culture of Escherichia coli ...... 156 Abstract ...... 157 Introduction ...... 158 Results ...... 160 Discussion ...... 174 Acknowledgements ...... 178 Methods ...... 179 References ...... 184

Chapter VI: Conclusion and Future Directions ...... 187 Summary ...... 188 The Future of Fitness Landscapes ...... 189 Fitness Landscapes of Translational Machinery ...... 190 Future Directions of the Translational Cycle and Regulation...... 193 Concluding Remarks ...... 196 References ...... 197

Curriculum vitae ...... 201

7 Figure & Table Index

Chapter 1

Figure 1. Theoretical fitness landscapes ...... 14 Figure 2. Proteome distribution of model microorganisms ...... 18 Figure 3. Model of the translation elongation cycle ...... 20 Figure 4. Catalytic mechanism of aminoacylation reaction by aaRS enzymes ...... 28 Table 1. Aminoacyl-tRNA synthetase enzyme form & regulation...... 31 Figure 5. T box riboswitch regulation in Firmicutes ...... 37 Figure 6. Mechanism of stringent response activation...... 38

Chapter 2

Figure 1. Fitness landscape of tRNA synthetase production shows growth optimization near a fitness cliff ...... 65 Figure 2. aaRS overproduction reduces uncharged tRNA levels ...... 67 Figure 3. Translation elongation kinetics change only with aaRS underproduction ...... 70 Figure 4. Removal of the stringent response negatively impacts cell growth during aaRS underproduction ...... 73 Figure 5. Stoichiometry and kinetics of translation machinery are perturbed without stringent response ...... 76 Figure 6. An integrated view of growth optimization for aaRSs ...... 79 Figure S1. Determination of the fitness landscape of aaRS production ...... 82 Figure S2. T box reporter response to aaRS production changes ...... 83 Figure S3. Translation elongation kinetics slowed with underproduction but not increased with overproduction of aaRS ...... 84 Figure S4. Confirmation that the stringent response is activated and beneficial during aaRS underproduction ...... 86 Figure S5. Stringent response reduces translation defects during aaRS underproduction .... 88 Figure S6. Comparison of aaRS quantification methods ...... 89

Chapter 3

Figure 1. Model of the sources leading to fitness defects upon aaRS overproduction ...... 115 Figure 2. Schematic of aaRS and fluorescent protein variants created ...... 118 Figure 3. Effects on aaRS expression and tRNA charging of variant overproduction ...... 121 Figure 4. Effects on fitness of aaRS and FP variant overproduction ...... 123 Figure 5. tRNA charging and gene expression changes of aaRS overproduction ...... 125 Supplemental Table 1. DNA oligos used in this chapter ...... 132

Chapter 4

Figure 1. Comparison of library preparation methods ...... 141 Figure 2. Quality control of the multiplex method ...... 144

8 Chapter 5

Figure 1. Transcriptome remodeling among Keio strains ...... 161 Figure 2. Confirmation of motility gene expression using a GFP reporter driven by the fliC ...... 163 Figure 3. Correlation between motility gene activation and swimming phenotype ...... 165 Figure 4. Absence of motility activation after P1 transduction of single-gene deletions ..... 167 Figure 5. Diverse secondary mutations responsible for motility activation in Keio collection ...... 170 Figure 6. Rapid accumulation of motility-activating mutations in resting liquid culture ..... 173 Figure 7. Motility gene activation in strains deficient for aerotaxis and chemotaxis ...... 176 Table 1. List of KEIO knockout deletion strains tested ...... 179

9

10 Chapter I

Introduction

11 Introduction to Quantitative Biology of Gene Expression

The control of gene expression is foundational to organismal fitness and adaptability.

Although each cell is bounded by the genes encoded in its DNA, it is the differential production of these genes into proteins that allows cells to live and adapt. Even in the simplest bacterial genomes, of the roughly four thousand unique protein coding genes encoded in DNA, not all are expressed into protein at any given time. Furthermore, the amplification steps of and translation leads to protein copy numbers spanning over seven orders of magnitude (Li et al.,

2014). The vast combinatorial space of possibilities makes the culmination of all proteins, the proteome, a much more complex entity than a simple genome would suggest, and allows for organisms to adapt to multiple environments and states. However, resources to produce proteins are finite and competition in nature is fierce. Especially within the prokaryotic world, organisms that can better use the resources at hand can grow faster and outcompete their neighbor as all steps of the central dogma have associated costs (Kafri et al., 2016). Therefore, it is important for the cell to make the best use of the resources at hand and only produce genes that are needed within the specific context. As the identity and biochemical function of progressively more proteins is elucidated, a grand challenge in biology remains understanding the driving forces that shape cellular proteomes. This broad problem can be broken down into specific questions at all levels of biology, from molecular to the organism scale. For instance, are there specific design principles between cellular components that leads to optimal resource allocation and an optimized proteome? For individual proteins, are native levels optimized for growth? What are the effects on fitness and pathway function of deviating from the protein levels observed?

Consequently, can studying the impact of fitness on varying individual proteins lead to better understanding of the molecular pathways in which they reside? The aim of my thesis work is to

12 begin answering these questions and pave the way for new approaches in linking protein expression, function, and organismal fitness.

Attempting to understand the relationship between a quantity of a physiological property, such as the expression level of an individual protein, and its molecular causation on organismal fitness behind the observed value has been a major area of research for at least 30 years

(Diamond and Hammond, 1992). “Quantitative evolutionary design”, was coined as a term for studying how robust biological systems are to changes in system requirements (Diamond, 2002;

Salvador and Savageau, 2003). The major quantitative unit assessed within this framework, the

“safety factor”, is defined as the ratio of the total capacity of a biological process or enzyme to its need. A classic example is the “safety factor” of leg bones in animals, a metric of how heavy an animal can become before the leg bones can no longer provide support (Alexander, 1984;

Biewener, 1982; 1989). This idea can be applied to the study of individual enzymes, where the term is re-defined as the ratio of the maximal reaction rate, a function of the enzymes turnover rate and protein copy number, to the required reaction rate during a given condition. In this framework, a safety factor of 1 indicates that the enzyme is operating at full capacity. This idea has been applied in the study of sugar transporters and metabolic enzymes within human cells

(Salvador and Savageau, 2003; Weiss et al., 1998). The inverse of the safety factor, the “capacity utilization”, has also been used to intuitively indicate what fraction of the total enzymatic capacity a reaction is operating at in a given condition (Davidi and Milo, 2017). Although important, these metrics are incomplete in relating protein expression and cell fitness, since they give no information on the magnitude or consequences of perturbing native expression. A more holistic approach is to determine how cell fitness is affected over the entire range of possible expression. This cumulative metric, which I will herein refer to as a “fitness landscape”, is a

13 powerful way to visualize and quantify the effect of a proteins production on cellular fitness

(Figure 1). Fitness landscapes provide an abundance of important information. For example, by determining the placement of native levels, the safety factor/capacity utilization can easily be visualized and calculated. The landscapes can also inform on potential protein-specific toxicities and limitations on cellular fitness. Finally, they provide an important conceptual framework that provide a jumping off point for further investigation into the underlying molecular mechanisms that unify protein abundance and function with pathway and organism fitness.

Figure 1. Theoretical fitness landscapes. Three potential fitness landscapes for different genes. Fitness landscapes are defined as the relationship between protein abundance and relative fitness. For all landscapes, it is essential to define the location of native protein levels (I vs. II) to determine production optimality and the safety factor.

Despite the power of the above quantitative metrics, there are only a handful of studies fully mapping out fitness landscapes across any organism. One high-throughput study attempted to map the fitness landscapes for over ~100 genes across a 500-fold range of expression, in

Saccharomyces cerevisiae (Keren et al., 2016). As expected, the powerful nature of the landscapes uncovered multiple insights into the underlying biology. First, it uncovered that the fitness landscapes can take on multiple forms across different classes of genes, from genes that are highly sensitive to any expression changes from native levels to those without any change in fitness across the expression space and everything in between. Second, it was found that native

14 expression levels are not always perfectly optimized, as levels for enzymes involved in carbon metabolism were optimized for growth in glucose but not galactose. Unsurprisingly, it was also noted that these landscapes are not static and change depending on environment, further adding to their overall complexity. However, due to the high-throughput nature of the study, the molecular underpinnings that causally link protein expression and fitness were not further explored. This kind of detailed mapping fully integrating expression, function, and fitness has rarely been explored. Of the few examples characterized are the lac in E. coli (Dekel and

Alon, 2005; Perfeito et al., 2011; Stoebel et al., 2009) and the LCB2 gene in yeast (Rest et al.,

2013).

That is not to imply that a quantitative understanding of protein levels is otherwise lacking, as numerous other works highlight that many proteins may be expressed over their strict requirement. One high-throughput study using metabolite flux values and protein copy number predicted that most genes in E. coli metabolism operate well under full capacity (safety factor >

1) with a decrease in overall capacity utilization as the carbon source shifts from glucose to galactose (Davidi and Milo, 2017; O'Brien et al., 2016). However, this type of high-throughput approach generally requires knowledge of the catalytic rates of the enzymes being studied unless assumptions are made (Davidi et al., 2016). Given that most reported biochemical studies on enzyme action are generated in vitro, another caveat is whether these in vitro biochemical values should be applied to the in vivo setting. The more direct and straightforward approach, requires fine-tuned manipulation of enzyme levels. Luckily, advances in genetic tools, such as the introduction of CRISPRi technology, have allowed for expression control at scale (Larson et al.,

2013; Qi et al., 2013). Recent work profiling the fitness and phenotypes to CRISPRi knockdowns in Bacillus subtilis suggested that small amounts of knockdown (~2-3 fold) has no

15 apparent effect on cell fitness for many essential genes (Peters et al., 2016). Given their essential nature, it is potentially surprising that a reduction of these genes does not cause a phenotype, with capacity utilizations less than one. However, given that the expression of many of the targeted genes is autoregulated, it is unclear whether the reported knockdown, measured with a non-regulated reporter, applies to all targets. Another recent study where the production of genes involved in metabolism was reduced in E. coli confirmed that these genes are in fact produced to a greater degree than their flux requirements (Sander et al., 2019). Overall, these studies seem to point towards enzyme overproduction as an overall strategy to ensure organismal fitness is maintained in the case of perturbation.

It is interesting to note that many of the best characterized examples above are genes that are highly conserved across bacteria and higher organisms. Whether or not enzyme overproduction is also conserved remains to be uncovered. Recent work profiling proteome compositions in divergent bacterial species has demonstrated an exquisite conservation of in- pathway enzyme production (Lalanne et al., 2018). It would therefore be surprising if both protein levels and overproduction are conserved, as not all bacteria are faced with the same environmental and metabolic needs. Furthermore, protein production has costs, and since are working near saturation in fast growth conditions, for each enzyme produced over its need another could have been made instead (Dong et al., 1995). A strategy of overproduction seems feasible when applied to proteins that are required in relatively low quantities, but may be less so for proteins of high copy number. For example, achieving a safety factor of 2 for a protein required at 50 copies per cell would only require an extra 50 copies. However, a protein at

10,000 copies per cell would need another 10,000 synthesized for a similar benefit. Given work highlighting the stoichiometric production of components in multi-protein complexes within the

16 same operon it would seem rather unintuitive that the tight control exerted in this context included excess in expression (Li et al., 2014). Therefore, it may not be the case that all enzymes are overproduced, and elucidating which enzymes or pathways adhere to this principle may depend on their inherent costs or importance to cell fitness.

High Abundance of Translation Machinery

The process of protein synthesis, translation, sits at the center of all life. Given its importance and requirement for cell growth, it is one of the most energy intensive processes in the cell estimating to account for up to ~50% of the energy consumption in fast growing bacteria and 30% in dividing mammalian cells (Buttgereit and Brand, 1995; Russell and Cook, 1995).

The complex, three phase cycle, involves over one hundred unique proteins and that take up a major fraction of the total transcriptome and proteome. Ribosomes, the key ribonucleoprotein complexes involved, can vary in copy number from the tens of thousands in bacteria to over three million in the average human cell (Gupta and Warner, 2014). The three pieces of RNA constituting the core components of the ribosome, the 5S, 16S, and 23S rRNA account for ~85% of the total steady state level of RNA in rapidly dividing bacterial cells

(Bremer and Dennis, 2008). Furthermore, the molecular chaperones of amino acids, tRNAs, account for another ~13%, leaving protein coding mRNA to only ~2% (Bremer and Dennis,

2008). The suite of proteins involved in the full process including ribosomal proteins, initiation, elongation, and release factors, tRNA charging enzymes, modification factors, and others, take up a major fraction of the total proteome, accounting for 33%, 36%, and ~23% in fast growing

Escherichia coli, Bacillus subtilis, and Saccharomyces cerevisiae respectively (Figure 2).

Although the inner workings of the pathway have been relatively well characterized, and will be

17 reviewed below, how all the various molecular pieces come together to affect cell fitness is less understood. The large amount of investment into the process making proteins makes translation a prime candidate to study the quantitative questions posed previously. Specifically, what is the fitness landscapes of the factors involved in translation? Is there optimization involved in the production of the multitude of proteins involved? Additionally, what are the rate limiting steps in the process and when does limitation for one factor become rate limiting? Although work has indicated that ribosomes are key limiters of bacterial growth, are other proteins equally important? Finally, given that translation is a highly-controlled process at both the gene and pathway level, how does regulation play into linking perturbation with fitness?

Figure 2. Proteome distribution of model microorganisms. Proteome distribution of fast growing microorganisms. Ribosome profiling data for Escherichia coli, Bacillus subtilis, and Saccharomyces cerevisiae from (Lalanne et al., 2018; Li et al., 2014; Nagaraj et al., 2012) was obtained, processed, and analyzed into Proteomaps (Liebermeister et al., 2014).

The Translation Cycle

The translation of an mRNA into protein consists of three major phases: initiation, elongation, and termination. Although at their core, the three phases remain the same between bacteria and eukaryotes, the exact mechanisms and multitude of players involved differ. Each

18 step involves a plethora of factors and has unique forms of regulation and sensory systems to maintain proper coordination within and outside of the cycle.

Across life, translation initiation is thought be the most highly regulated and generally rate limiting step of the process (Laursen et al., 2005). In bacteria, canonical translation initiates by initial formation of the 30S preinitiation complex (IF1,2 & 3, the initiator tRNA fMet- tRNAfmet, and the 30S small subunit) on the mRNA at the site of the initiator AUG

(Watson et al., 2014). Conformational changes occur after codon-anticodon binding between the initiator AUG and initiator tRNA, locking the more stable 30S initiation complex into place.

Ejection of IF1 and IF3 then leads to the large 50S subunit to be associated with the preinitiation complex. After proper tRNA and ribosome placement, GTP bound to IF2 is hydrolyzed, leading to IF2 release, and final 70S initiation complex formation. With the fMet-tRNAfmet properly placed in the P-site, the ribosome is then ready for the next appropriate tRNA to enter the A-site and the elongation cycle to begin.

The longest phase of the cycle by total time involved is the second step, elongation.

Elongation is a complex cycle involving five major steps (Figure 3). In the first step, decoding, the ribosomes have their P-site occupied by a tRNA conjugated with the nascent and an empty A-site. Ternary complexes of aminoacylated tRNA, EF-Tu, and GTP are then sampled in the A-site to find tRNAs that match the presented triplet codon. Successful interactions of the aa- tRNA anti-codon with the mRNA codon leads to GTP by EF-Tu. The release of the free phosphate then triggers a rearrangement of EF-Tu and full accommodation of the aa-tRNA.

Local rearrangements in 16S rRNA allow for a final sampling step to occur before EF-Tu release

(Rodnina et al., 2017).

19

Figure 3. Model of the translation elongation cycle. Cartoon diagram of the major steps in the elongation cycle.

Above the site of codon-anticodon pairing, the center is ready to conjugate the nascent peptide with the incoming amino acid. The RNA residues in the 23S rRNA position the amino group on the amino acid of the incoming tRNA to perform a nucleophilic attack on the carbonyl carbon of the ester bond on the nascent peptidyl-tRNA covalently linking the nascent peptide to the A-site tRNA. Interestingly, even though the chemical identities of the

20 natural amino acids vary greatly, the ribosome is generally able to perform the peptidyl transfer reaction without limiting the cycle (Wohlgemuth et al., 2008). One exception however, is large stretches of sequential Proline codons (poly-Pro) (Buskirk and Green, 2013; Rajkovic and Ibba, 2017). In this case stalling of the ribosome does occur due to the unfavorable orientation of proline reactive sites when multiple prolines are encountered in series (Ude et al.,

2013). When these sequences are encountered, the stalling is rescued by an auxiliary protein, EF-

P, that comes into the ribosome to orient the poly-Proline chain into a catalytically favorable orientation within the peptidyl transferase center (Huter et al., 2017).

20 Following the covalent linkage between the nascent chain and the incoming aa-tRNA the two ribosome subunits undergo a rotation relative to each other while shifting the positions of the

P and A-site tRNA into hybrid states (P/E and A/P respectively) (Frank and Gonzalez, 2010).

This process is aided by the translation factor EF-G. Upon binding and hydrolysis of GTP on EF-

G, the small subunit body rotates relative to the head maintaining a swiveled state, releasing tRNA anticodon:mRNA binding from the 16S rRNA (Belardinelli et al., 2016). Further rearrangements of the small subunit allow the tRNAs to adopt new positions within the E and P- sites as EF-G releases the hydrolyzed phosphate. Next, as the small subunit head continues to move, the E-site and P-site tRNA are moved further apart as the E-site tRNA loses base pairing with the mRNA. This disassociation results in a return to the non-rotated state of the ribosome with a single tRNA with the nascent peptide chain attached in the P site, ready to begin the cycle anew.

The cycle of protein synthesis ends with the final step of termination. Briefly, when the ribosome encounters a instead of binding tRNA, binding of release factors occurs. In bacteria the two release factors RF1 and RF2 recognize UAG/UAA and UGA/UAA codons respectively to begin the process of hydrolyzing the native peptide from the tRNA. A third factor, RF3, is involved in disassociating RF1/2 from the ribosome. Finally, ribosome subunits and still bound P site tRNA are released through a process involving RRF and EF-G called ribosome recycling. The exact mechanisms by which termination and release occurs differs between prokaryotes and eukaryotes and is still an active area of work.

Elongation Kinetics & Codon Usage

21 Despite the differences in mechanisms across kingdoms, measurements of the average translation elongation rate using a multitude of different in vitro and in vivo techniques have found to be remarkably constant rate across all organisms. In fast growing bacteria through human cell lines, a rate of 5-20 amino acids incorporated per second has been measured (Dai et al., 2016; Morisaki et al., 2016; Sørensen and Pedersen, 1991; Wang et al., 2016; Wu et al.,

2016; Yan et al., 2016). This rate has been shown to be remarkably consistent, albeit with some reduction, across growth rates in bacteria as well, decreasing only two-fold with a 10-fold decrease in growth rate in E. coli (Dai et al., 2016; Dalbow and Young, 1975; Dennis and

Bremer, 1974a; Young and Bremer, 1976). Despite these bulk measurements, whether or not the translation rate differs across genes or codons and why is a hotly contested area of current research. On top of this, the exact rate limiting step in the multi-step process of elongation remains to be determined.

A major mechanism by which elongation rates can differ across genes and codons is the differential production and usage of the amino acid supplier to translation, tRNA. In the universal code, only tryptophan and methionine are decoded by a single codon and tRNA. All other amino acids are encoded by multiple codons generally requiring multiple tRNA species termed tRNA isoacceptors. Furthermore, due to wobble base-pairing at the 5’ most basepair of the tRNA anticodon, tRNA can often decode multiple codons, allowing for organisms to produce less unique tRNA species than codons (<64). The overall degeneracy and crosstalk in the system leads to a complex relationship between tRNA levels and codon usage since in most organisms the usage of codons and expression of tRNAs are not equal within a given amino acid group. The phenomena of ‘codon usage bias’ occurs when an organism has evolved to use some codons more than others. Generally, an increase in usage is correlated with an increase in the cognate

22 tRNA isoaccpetor, however this is not always the case. Ultimately, the differences in ratios between the supply in tRNA isoacceptors, and demand for translation (codon usage), leads to the possibility of differences in translation rate for different codons and amino acids.

When and where codon usage bias affects translation is a major area of current research.

First, the study of codon usage bias potentially brings answers to a long-standing fundamental question about translation, what step in the process limits the cycle? Given that tRNA isoacceptors concentrations can vary up to ten-fold for the same amino acid, the wait time of the ribosome during the decoding step could also be greatly different between codons (Dong et al.,

1996; Kanaya et al., 1999). It has been noted since the first genes were first sequenced that highly expressed genes in bacteria greatly favor codons paired with the most abundant tRNAs

(Gouy and Gautier, 1982). Additionally, production of heterologous proteins using bacteria as host platforms are consistently “codon optimized” for their host to maximize production

(Gustafsson et al., 2004; Sharp and Li, 1987). Further studies have attempted to directly measure how the translation rate changes for codons decoded by abundant vs. rare tRNAs finding proportionality between tRNA abundance and translation speed (Sørensen and Pedersen, 1991;

Sørensen et al., 1989). However, these studies are confined to a few specific examples, generally in non-native situations, so the broad applicability of the results are unclear (Yu et al., 2015).

On the other hand, it has been postulated that the rate limiting step in translation is not decoding, but rather one of the additional tRNA-independent steps. The recent development of ribosome profiling, the high-throughput sequencing of ribosome protected mRNA fragments, can provide average wait times of ribosomes across the entire transcriptome (Ingolia et al., 2009).

Using this global measurement, multiple works have pointed toward a smaller distribution of wait time among codons, arguing that decoding is not rate limiting step in translation (Li et al.,

23 2014; Subramaniam et al., 2014). However, continued advances and modifications to the protocol have begun to question this fact, demonstrating slight anti-correlations between tRNA abundance and ribosome wait time (Mohammad et al., 2019; Wu et al., 2019). Additionally, although tRNA abundances may always differ, some studies suggest that translation rate differences may only manifest during starvation of the cognate amino acid, when cells are fully saturated with amino acids (Elf et al., 2003; Subramaniam et al., 2013). Another orthogonal theory postulates that codon usage bias may not directly affect the translation rate, but instead be a global mechanism to maintain sufficient pools of more rare tRNAs (Andersson and Kurland,

1990; Frumkin et al., 2018; Klumpp et al., 2012). Furthermore, some studies speculate that codon usage bias may be important for mRNA secondary structure or decay, further confounding global analysis studies (Kudla et al., 2009; Radhakrishnan et al., 2016). In any case, differences in the translation rate of abundant versus rare codons are generally thought to be small, with overall protein synthesis rates mainly controlled through translation initiation not elongation.

Expression Stoichiometry of Tranlsation Machinery

It is not well understood how the expression of the proteins involved in the elongation cycle come together to set the protein synthesis rate. The aforementioned finding that major cellular pathways have conserved production for their components includes the process of translation elongation (Lalanne et al., 2018). For example, it was found that the ratio between the major factors in translation elongation: ribosomes, EF-Tu, and tRNA synthetases, is maintained at a constant 1:5:1 across two billion years of evolution. This specific in-pathway stoichiometry up until then had only been observed in the context of changing growth rates in E. coli, as numerous works quantifying translation-related proteins also showed a similar conservation of

24 expression between faster and slower growth rates (Dennis and Bremer, 1974b; Forchhammer and Lindahl, 1971). Given a maintenance of ratios across time and space, these findings hint that translation as a pathway has a specific, potentially optimized, recipe for expression for maximal function that extends to all bacteria (Ehrenberg and Kurland, 1984). Closely related to this idea, some works have sought to understand the potential proteome allocation to translation within the context of maximizing bacterial growth rates using mathematical models. For example, work attempting to better understand the rate limiting step in translation points towards a near optimal ratio between tRNA affiliated proteins and ribosomes to maintain the elongation rate in the crowded (Klumpp et al., 2013). Others have studied the ratios between ribosomes and tRNA, finding that a near 10:1::tRNA:ribosome ratio keeps the total elongation cycle time at a minimum (Gouy and Grantham, 1980). For the most part however, proteome allocation studies in general treat translation as a monolith, where all proteins are treated as a single large chunk in the overall cellular proteome, and overall experimental evidence is lacking (Scott et al., 2010;

2014). Therefore, much more work is required to understand how each individual component comes together to build the full pathway.

Consequences of Translation Elongation Rate Reduction

Given the well conserved stoichiometry of components and high rate of translation elongation across organisms, it follows that reduction in ribosome speed can have drastic effects on gene expression and cell physiology. For fast growing bacteria, reduction in the elongation rate is directly related to a reduction in cell growth rate due to the “bacterial growth law”

(Bremer and Dennis, 2008; Scott and Hwa, 2011). This ‘law’, based on the observation that ribosome content is linearly related to growth rate at medium to fast growth rates, implies that

25 the ribosomes are working near capacity at steady state, setting the doubling time of the proteome and thus the growth rate (Schaechter et al., 1958). Reductions in the actual rate of these ribosomes would therefore increase the amount of time required to double the proteome, decreasing the overall population growth rate. Although the direct consequences of reduced translation speed are less clear in more complex organisms, the requirement to avoid collisions of ribosomes actively engaged in translation is conserved. Perturbations to the rate of translation, whether through errors in gene expression, chemical insults, cis acting elements, or unfavorable codons stretches, can lead to ribosome queuing & collisions (Doma and Parker, 2006). These issues have downstream consequences in a variety of ways including protein misfolding, ribosome sequestration, and mRNA/nascent chain degradation (Doma and Parker, 2006; Ferrin and Subramaniam, 2017; Nedialkova and Leidel, 2015). In eukaryotes, pausing in the internal mRNA is referred to as “no go” and is most well studied in terms of ribosomes that have missed stop codons and begin translating into the poly(A) tail (Harigaya and Parker, 2010). More recent work has begun to uncover the role that misregulation of tRNA can have on translation and novel factors are still being discovered to ameliorate these issues in higher eukaryotes (Ishimura et al.,

2014; Kirchner and Ignatova, 2015). These types of translation stalling events are dealt with by the Dom34-Hbs1complex in yeast to degrade the nascent peptide and split the ribosomal subunits for effective recycling (Bengtson and Joazeiro, 2010; Shoemaker et al., 2010). In bacteria it is thought that cleavage of the mRNA at a stall site allows for entry of tmRNA, the major ribosome rescue system of non-stop mRNA (Christensen and Gerdes, 2003; Li et al.,

2008). Regardless of mechanism, the end result of these ‘translation abortion’ pathways is the destruction of both mRNA and nascent protein. Given this terminal outcome, it is unclear how and when exactly they are used during transient pausing. In the context of pausing due to amino

26 acid starvation, the observation of increased ribosome density at starved codons provides evidence that translation abortion pathway does not act instantaneously, and is in competition with canonical translation, misincorporation, and ribosome slippage (Darnell et al., 2018;

Subramaniam et al., 2014). However, the detection of incomplete protein products upon deletion of the genes involved in abortive translation, such as tmRNA in E. coli, provides evidence that abortive translation occurs at some rate during transient pausing (Subramaniam et al., 2014).

How cells balance the waiting time for the correct tRNA and potential effects of ribosome queuing against the waste of resources associated with abortive translation remains to be fully elucidated.

Aminoacyl-tRNA Synthetase Function and Structure

The most well studied mechanism of transient translation pausing is amino acid starvation. Since amino acids are used in a variety of processes other than translation, in order to target amino acid starvation only to the ribosome, it is common to inhibit the first step in the process by which amino acids are given to the ribosome, tRNA aminoacylation. The step of tRNA aminoacylation, or ‘tRNA charging’, is carried out by a universally conserved set of proteins called aminoacyl-tRNA synthetases (aaRS). These ancient proteins, essentially all catalyze the same reaction linking an amino acid to a tRNA (aa-tRNA) by utilizing the energy of

ATP hydrolysis (Figure 4). Despite the relatively straightforward chemical reaction, the evolution of aaRS, their structure, and biochemistry are anything but simple. As enzymes catalyzing reactions foundational to all life they are thought to have predated the root of the universal phylogenetic tree and possibly evolved from ribozymes with the same function (Lee et al., 2000; Woese, 1998). Surprisingly, the 20 canonical aaRS proteins are split between two

27 structurally unrelated classes of enzymes, Class I and Class II based on their sequence motifs

(Eriani et al., 1990). The origin of this divergence, given the common reaction and ligand, is still a source of speculation today, with some postulating that the two classes arose from opposite strands of a single gene (Carter and Duax, 2002; Martinez-Rodriguez et al., 2015; Rodin and

Ohno, 1995). Type I synthetases are characterized by their active site containing a Rossmann dinucleotide fold for the binding of ATP (O'Donoghue and Luthey-Schulten, 2003). Included in this fold are two major motifs, an HIGH and a KMSKS motif that are conserved across all Class

I aaRS from bacteria to human (Austin and First, 2002). Mutations within either motif has been shown to completely abolish aaRS enzymatic activity (Austin and First, 2002; O'Donoghue and

Luthey-Schulten, 2003). On the other hand, class II enzymes build their active sites using an antiparallel -fold unique to aaRS proteins and hold onto ATP in a different conformation.

(Arnez and Moras, 1997; Murzin et al., 1995).

28 Figure 4. Catalytic mechanism of aminoacylation reaction by aaRS enzymes. Catalytic mechanism of tRNA charging. The top row represents the first step, the formation of the aminoacyl adenylate. In the second step, the 2’ or 3’ hydroxyl of the terminal adenosine on the tRNA performs nucleophilic attack on the α-carbonyl of the aminoacyl-adenylate completing the “charging” reaction. Adopted from (Li et al., 2015).

Another key difference between the two enzyme classes lies in how the enzymes bind tRNA. tRNA recognition, the process of aaRSs selecting the correct tRNA, is a difficult problem for the proteins to solve. Correct recognition is critical in maintaining the fidelity of the as the ribosome is unable to distinguish if the correct amino acid to tRNA pair is formed.

As such, it is actually the aaRSs that are establishing the link between nucleotide and protein.

Due to this importance, even minor losses in recognition fidelity, whether through losses in recognition or quality control, can have major impacts on organism fitness. Unfortunately for the aaRSs, tRNAs are all highly similar at the sequence and structural level, maintaining the same L- shaped three-leafed clover form across tRNAs for all amino acids. The maintenance of this similar structure is important for both the ribosome and EF-Tu so that both molecules can interact with relatively the same affinity for all aa-tRNA (LaRiviere et al., 2001; Schrader et al.,

2011).

Fortunately, there are a few major points of difference that can be used by aaRS to differentiate tRNA. First, the anticodon sequence is the major unique feature of each tRNA and some aaRSs, mainly class I aaRSs, use the anticodon for recognition. However, it is thought that in general the three base pairs of the anticodon do not provide enough binding energy nor differentiation capacity to be the sole points of recognition. aaRS can also recognize tRNA at their discriminator base (N73), D loop, or slight differences among the acceptor stems, near the site of aminoacylation (Ibba and Söll, 2000). Another key difference among tRNAs is the adjustable length and sequence of the variable loops (Lenhard et al., 1999). It has been shown

29 that the variable loops are used more for recognition by Class II aaRS as their cognate tRNAs generally have larger variable regions. To bind these extended variable regions, tRNA Class II enzymes often have N or C terminal extensions (Cusack et al., 1991). For example, the tRNA synthetase enzyme for serine (SerS) while having contacts with tRNA at multiple locations, primarily recognizes serine tRNA with a large conserved N-terminal coiled-coil (Borel et al.,

1994; Cusack et al., 1990). Similarly, the aspartyl-tRNA synthetase in yeast has been shown to use its N-terminal extension to recognize aspartyl tRNA, however in this case binding occurs using both the anticodon and variable loops (Ruff et al., 1991). Another interesting finding of early structural work was the face in which the synthetase bound the tRNA. AspRS, a Class II synthetase, was found to approach the tRNA from the opposite side of the previously solved structure at the time of GlnRS, a class I enzyme. Since the early 1990’s when the first aaRS- tRNA structure pair was solved, all 20 canonical aaRS-tRNA structures have been solved across a variety of organisms (Ruff et al., 1991). These additional structures further demonstrated that

Class II aaRS bind their cognate tRNA from the side of the variable loop, and make contacts with the minor groove of the acceptor stem while Class I aaRS bind using the minor groove with the variable loop facing out (Ibba and Söll, 2000). These structures nicely complement detailed mutational approaches to fully map out the complex landscape that allows aaRS to successfully distinguish between multiple similar substrates.

The two classes also differ at the quaternary level of structure (Table 1). Most Class I enzymes are relatively large single monomers with only TrpRS and TyrRS functioning as homodimers. Conversely, no Class II enzymes function as monomers, as they exist in solution and catalyze aminoacylation mostly as homodimers. Some interesting outliers include AlaRS which was original observed to form a homotetramer in solution (Putney et al., 1981), however

30 subsequent studies found that it could exist in multiple homopolymeric states depending on conditions (Sood et al., 1997; 1996). PheRS and some GlyRS enzymes actually involve two different ORFs. In bacteria the tetramer of two genes, pheS and pheT, are produced in equal abundance to make up the full PheRS complex (Bobkova et al., 1991; Brakhage et al., 1990;

Lavrik et al., 1982).

Table 1. Aminoacyl-tRNA synthetase enzyme form & regulation

Class I aaRS T box in Class II aaRS T box in B.S.? B.S.? 3 ArgRS α No GlyRS α2, (α)2 Yes 4 CysRS α No HisRS α2 Yes IleRS α Yes ProRS α2 No LeuRS α Yes ThrRS α2 Yes MetRS α, α2 Yes SerRS α2 Yes ValRS α Yes AsnRS α2 No 1 4 LysRS α No AspRS α2 Yes 2 1 GlnRS α No LysRS α2 No GluRS α No AlaRS α, α4 Yes TrpRS α2 Yes PheRS (α)2 Yes TyrRS α2 Yes

1 LysRS is found as both class I and II in different organisms(Cusack, 1995) 2 In most bacteria, including Bacillus subtilis, glutaminyl tRNA is created by an amidotransferase protein complex to convert Glu-tRNAGlu into Gln-tRNAGln (Curnow et al., 1997) 3 GlyRS exists in two forms across different organisms (Schimmel and Söll, 1979) 4 hisS and aspS genes exist in a single two gene operon controlled by aspartate tRNA

Biochemistry, Kinetics, and Inhibitors of aaRS Enzymes

Structural differences are not the only divergence point between Class I and Class II enzymes. The biochemistry of the conserved reaction also has been found to differ between the two types of enzyme. Interestingly, the hydroxyl group on the ribose ring in the terminal adenine of the tRNA in which the aminoacyl linkage is attached differs between the two classes. Class I

31 enzymes primarily attach the amino acid to the 2’ hydroxyl and Class II enzymes to the 3’ hydroxyl. This difference is not thought to have a major impact on aminoacyl linkage stability as stability is primarily driven by amino acid identity (Arnez and Moras, 1997; Peacock et al.,

2014). Another biochemical question still under study concerns the rate limiting step in the charging reaction. Early work characterizing the kinetics of reaction in vitro found that the aminoacyl transfer step was ten times greater than the steady-state kcat, suggesting that product release limited the full reaction for GlnRS, ArgRS, and IleRS, all Class I enzymes (Eldred and

Schimmel, 1972; Fersht et al., 1978; Uter and Perona, 2004). In contrast, HisRS, PheRS, and

SerRS, all Class II enzymes, did not show the same kinetics (Dibbelt et al., 1980; Guth et al.,

2005; Lin et al., 1983). Further studies confirmed the differences between the two classes, and identified EF-Tu as a key contributor in increasing steady-state turnover for Class I enzymes by promoting product release (Zhang et al., 2006).

One of the most intriguing findings to come from the in vitro biochemical studies is a consistent lack of concordance between the in vitro measured rates and the required in vivo requirements for aaRS turnover. A back of the envelope calculation for required turnover estimates that the turnover of the average aaRS would need to be ~20 s-1 in order to not limit protein synthesis. This estimate can be further refined by taking into account species specific ribosome and aaRS expression, with codon usage. However, despite refinement, the average turnover measured in vitro is still near one order of magnitude less than the estimated requirement (Jakubowski and Goldman, 1984). As numerous studies have pointed out, the in vitro conditions may not perfectly recapitulate the in vivo environment through the exclusion of potentially important factors like EF-Tu or the high concentration of intracellular aaRS and tRNA. Additionally, despite the highly-conserved enzyme core, aaRS in eukaryotes generally

32 have additional domains, some of which are known to promote large multi-aaRS (MSC) complexes that may promote faster turnover rates than when measured individually (Park et al.,

2005). However, these domains are absent in bacteria so the underlying reason for the discrepancy means by which aaRS turnover matches amino acid incorporation is unclear.

Additionally, these in vivo calculations are minimum turnover rates to not limit the translation cycle, and therefore do not provide the charging reaction a ‘safety factor’ as discussed previously. Regardless, given that in vitro measurements are probably not underestimating the in vivo rates by multiple orders of magnitude, reduction in functioning aaRS could easily limit protein synthesis and cellular growth.

One way to study limitations on aaRS activity and its resulting consequences, is through the use of compounds specifically inhibiting individual aaRS. A general way to specifically inhibit individual aaRS is to use structural analogs of amino acid that cannot be properly ligated to tRNA such as serine hydroxymate or arginine hydroxymate (Nazario and Evans, 1974; Tosa and Pizer, 1971). During growth in minimal media, the addition of large amounts of amino acid analog can effectively outcompete the necessary L-amino acids for translation, creating stalled ribosomes and activation of stress responses. However, given their similarity to amino acids they can have non-translation off-target effects. To target aaRS more specifically a few compounds with like properties have been discovered. Mupirocin, the major clinical antibiotic used to treat methicillin-resistant Staphylococcus aureus, binds tightly to and inhibits a wide spectrum of bacterial IleRS enzymes without inactivating the human protein (Hughes and Mellows, 1978;

Nakama et al., 2001). Other aaRS targeting have been discovered, however their therapeutic potential is limited due to the fast acquisition of resistant strains given the essentiality of aminoacylation to bacterial growth (Hurdle et al., 2005; Kwon et al., 2019). For research

33 purposes however, they are key tools to study the effects of inhibiting aaRS (Silvian et al., 1999).

Overall, despite the straightforward and conserved underlying chemical reaction, there is still much to be learned about the biochemistry of the aaRS proteins.

aaRS Gene Regulation

Perturbing aminoacylation whether through amino acid starvation or specific targeting antibiotics does not necessarily lead to lowered charging output in bacteria. In most well studied bacterial species, aaRS genes are auto-regulated in order to increase production when the need arises. Surprisingly, the exact molecular mechanisms to achieve autoregulation is not identical across organisms and the two most well studied organisms E. coli and B. subtilis show striking divergence in strategies for regulating aaRS production.

E. coli seems to have favored a more generalist approach, with multiple different mechanisms at all levels of the central dogma to control aaRS expression (Putzer and Laalami,

2013). The gene for alanine tRNA charging, alaS, moonlights as a , binding palindromic sequences upstream its own promoter in the presence of high alanine levels (Putney and Schimmel, 1981). When levels of alanine are low, such as during acute amino acid starvation, repression is relieved and AlaS levels increase. The tetrameric PheRS complex utilizes a transcriptional attenuation mechanism to regulate the PheST operon (Springer et al.,

1983). The translation rate of a 14 amino acid ORF upstream of the pheST genes containing five phenylalanine residues determines whether or not transcription is prematurely aborted, similar in mechanism to the classic (Springer et al., 1985). ThrRS production is controlled in a manner to ribosomal proteins, by binding its own mRNA to sterically occlude the ribosome and limit translation (Moine et al., 1988; Sacerdot et al., 1998). While the origin of these multiple

34 independent mechanisms arose is still an unknown, clearly regulation of aaRSs is important for

E. coli fitness.

In stark contrast, Bacillus subtilis, and in fact most Firmicutes, use a single strategy to control aaRS expression. First discovered in the early 1990’s, a novel form of riboswitch named

“T box” riboswitches, has been found to control the majority of aaRS genes in Firmicutes with

14 of 19 B. subtilis aaRS genes utilizing this mechanism (Grundy and Henkin, 1993; Henkin et al., 1992; Vitreschak et al., 2008) (Table 1). T box riboswitches are large ~300 nucleotide structured RNAs in the 5’ UTR of the controlled genes (Green et al., 2010; Gutiérrez-Preciado et al., 2009; Putzer et al., 1995). Although some T boxes have been discovered to operate at the translational level (Sherwood et al., 2015), the best studied and most widespread T boxes are transcriptional regulators. During transcription, the RNA upstream of the open forms a complex multi-stem structure with the ability to bind tRNA cognate to the aaRS downstream. The ‘switch’ depends on the aminoacylation status of the bound tRNA (Figure 5A).

When bound by a charged tRNA, occupied at the 3’ end by an amino acid, the final stem loop forms a structure that promotes termination of the RNA polymerase transcription complex. This leads to a truncated and non-functional RNA product before the gene ORF that is subsequently degraded by RNase Y (Deloughery et al., 2018). This outcome is also achieved when no tRNA binds, and is thus the native fold of the riboswitch (Grundy et al., 2002). Alternatively, the binding of uncharged tRNA, with a free 3’ hydroxyl group, changes the conformation of the final loop into an anti-termination loop, allowing for transcription of the downstream mRNA that can be translated to produce aaRS protein. Although there are conserved features of all T boxes, each is a unique sensor to tRNA cognate to that of the synthetase being controlled (Gutiérrez-Preciado et al., 2009). Therefore, like aaRS enzymes, T boxes are required to identify specific variants in a

35 pool of related molecules. Unlike aaRS proteins, the major point of differentiation for the T boxes is the tRNA anticodon, as numerous studies swapping the tRNA anticodon binding triplet in the T box can efficiently switch recognition to another tRNA (Grundy and Henkin, 1993;

Grundy et al., 2002). More recent structural work paints a broader picture in terms of overall tRNA binding, with multiple points of contact between riboswitch and tRNA across multiple different classes of T boxes (Battaglia et al., 2019; Grigg et al., 2013).

Despite the large body of work dedicated to T boxes, most of the focus has been on their biochemical features, especially in vitro, and many key questions about their in vivo nature remain to be explored. Functional studies suggest that as switches, T boxes can act bidirectionally, as reporter genes linked to T boxes are sensitive both to starvation and aaRS overproduction (Luo et al., 1997). Validating this idea, nascent transcriptome sequencing shows that at steady state growth in B. subtilis roughly one in five transcription events leads to a productive aaRS mRNA, suggesting an equilibrium exists in cells to maintain proper aaRS protein levels (Figure 5B) (Larson et al., 2014). How sensitive this equilibrium is to changes in tRNA charging, and the general dynamic range of regulation in vivo, remains to be worked out.

Additionally, it is yet to be determined if the T boxes provide important buffers to expression noise as has been posited for other forms of autoregulation (Raser and O’Shea, 2005).

Furthermore, it is interesting to note that T boxes effectively set their own steady-state level of charged tRNA through the autoregulatory loop. Why tRNA charging levels are maintained to that observed is an open question linking T box biochemistry with the entire translation system and bacterial growth in general. In summary, maintaining a high rate of protein synthesis through a constant flux of aminoacylation buffered by T boxes or other regulation is of critical importance to growing cells.

36

Figure 5. T box riboswitch regulation in Firmicutes. (A) Cartoon schematic of T box riboswitch. During transcription of the 5’ UTR upstream of an aaRS ORF tRNA has a chance to bind primarily with the anticodon. If charged tRNA or no tRNA binds, a hairpin loops forms and transcription terminates. Alternatively, upon binding uncharged tRNA, an antiterminator loop is formed via the nascent transcript and RNAP proceeds to transcribe the full gene body. (B) Sequencing data of serS operon. NET-seq data obtained from (Larson et al., 2014) shows increased transcriptional activity in the T box leader region than the downstream, translated, open reading from of serS gene. T box verified as a UTR by a lack of ribosome protected fragments as detected in ribosome profiling (Lalanne et al., 2018).

(p)ppGpp and the Stringent Response

Control at the level of aaRS expression is not the only regulatory strategy organisms use to control protein production. Given the high cost and investment required in the synthesis of proteins, all organisms have a need to balance protein production with available nutrients. The building blocks of proteins, amino acids, can either be acquired from the environment or synthesized de novo, and cells must maintain a proper balance between the two processes to sustain growth (Lodish et al., 2016). Therefore many regulatory mechanisms have evolved to sense intracellular amino levels in the cell and shift large scale gene expression investment between translation related proteins and amino acid (Hinnebusch, 2005; Sabatini,

2017; Srivatsan and Wang, 2008). In bacteria, the sensor system involves a family of highly conserved relA proteins, a universal small molecule signal, (p)ppGpp, and a large-scale

37 reprogramming of gene expression generally referred to as the “Stringent response” (Figure 6)

(Srivatsan and Wang, 2008).

(p)ppGpp was first discovered as a product of acute amino acid starvation in E. coli

(Cashel, 1969; Potrykus and Cashel, 2008). Since then, the “Magic spot” a hyper-phosphorylated guanine nucleotide has been found to be produced in all studied bacteria as well as

(Atkinson et al., 2011). It is produced by two major enzymes in E. coli, RelA and a closely related protein SpoT, while in B. subtilis it is produced by a major synthetase RelA, and two minor proteins SasA and SasB (Nanamiya et al., 2008). Large bodies of work on (p)ppGpp have revolved around two major questions. First, what is the exact mechanism by which RelA is activated? Second, what are the downstream targets and consequences of increased (p)ppGpp levels?

Figure 6. Mechanism of stringent response activation. Cartoon diagram of steps involved in activating the Stringent response. When the ribosome encounters a codon for a starved tRNA with low charging levels (blue codon, blue tRNA) RelA synthesizes pppGpp. By a variety of mechanisms across bacteria (p)ppGpp induces changes in gene expression, re-allocating resources from growth promoting genes to genes required for starvation.

The mechanism of activation of RelA has been studied extensively via both genetic and biochemical approaches in multiple organisms. Although RelA and SpoT in E. coli are closely related by sequence, it has been found that RelA is the major synthease involved during amino acid (AA) starvation (Potrykus and Cashel, 2008). The two major requirements for activation of

38 RelA during AA starvation are the ribosome and uncharged tRNA within the A-site (Haseltine and Block, 1973). While these two requirements have been well mapped out and agreed upon, the exact mechanism by which RelA is activated is hotly debated. One model proposes that the

RelA sits by on idle ribosomes, waiting for uncharged tRNA to bind. After uncharged tRNA binding during the translation elongation cycle, (p)ppGpp synthesis begins, causing a conformational change in RelA leading to its eventual release from the stalled ribosome

(Wendrich et al., 2002). The activated RelA after a burst of activity then ‘hops’ to the next ribosome awaiting more uncharged tRNA. This model has been further substantiated by single- molecule tracking studies (English et al., 2011). Alternatively, structural and contradictory imaging evidence suggests that RelA is not ribosome bound normally but requires uncharged tRNA to bind while staying ribosome bound during (p)ppGpp synthesis (Brown et al., 2016; Li et al., 2016). While no definitive evidence has concretely confirmed either model, the evidence towards RelA activation requiring uncharged tRNA is unanimous.

Activation of RelA, and the subsequent accumulation of (p)ppGpp, have been mostly studied in the context of amino acid starvation. The collective set of actions taken by the cell during this loss of nutrients is referred to as the “Stringent response”. Although stringent affects nearly all major pathways in the cell including translation, replication, and metabolism

(Hauryliuk et al., 2015; Liu et al., 2015a; Wang et al., 2007), the myriad of transcriptional changes are the most well characterized consequences of activation (Gourse et al., 2018). These changes involve both reductions in production of genes in energy intensive processes used for growth, and increases in production in genes required for shifting to an environment with less available nutrients. Although the overall changes are conserved across species the means by which these transcriptional changes occur differs across bacteria.

39 In E. coli it has been found that (p)ppGpp directly binds to the RNA polymerase (RNAP) inducing a massive shift in polymerase targeting (Durfee et al., 2008; Ross et al., 2013; Sanchez-

Vazquez et al., 2019; Traxler et al., 2008; Zuo et al., 2013). The most important outcome in this re-targeting is the reduction in rRNA operon transcription. This reduction is mediated both by ppGpp directly binding RNAP, and through the subsequent binding of DksA, which together massively reduce ribosome production (Doniselli et al., 2015; Molodtsov et al., 2018; Paul et al.,

2004). Alternatively, in B. subtilis the means by which transcriptional programs are re-wired during the stringent response appears to be indirect, mainly through the modulation of intracellular GTP pools (Lopez et al., 1981). It has been found that (p)ppGpp, while being a downstream metabolite produced from GTP, additionally binds and represses the activity of GTP synthesizing enzymes (Kriel et al., 2012; Liu et al., 2015b). The decrease in GTP pools has a major effect on ribosomal RNA operon transcription as the initiating nucleotide for rRNA in B. subtilis is guanine (Krásný and Gourse, 2004; Natori et al., 2009). This reduction in GTP is thought to make GTP limiting in the transcription initiation reaction, reducing total promoter output for guanine initiating promoters (Krásný et al., 2008).

The other half of the stringent response is the large upregulation in genes related to decreased nutrient availability. Given the mechanism of upregulation involves increased uncharged tRNA pools being sensed on the ribosome, a major set of genes activated are related to amino acid biosynthesis. Like the control of rRNA operons, amino acid biosynthesis is increased by different means in divergent bacteria. In E. coli the decrease in rRNA transcription through (p)ppGpp & DskA, which can account for as much as 70% of total RNAP flux (Bremer and Dennis, 2008)leads to a passive re-distribution of RNAP molecules to other sites in the genome, to the benefit of amino acid biosynthetic enzymes (Barker et al., 2001; Magnusson et

40 al., 2007). In B. subtilis, transcriptional changes are also indirect, although potentially less passive (Eymann et al., 2002). At the same time that GTP pools are decreased upon (p)ppGpp synthesis, ATP pools inversely begin to rise (Lopez et al., 1981). Unlike rRNA operons beginning transcription with a guanine nucleotide, amino acid biosynthesis operons generally begin transcription with adenine and see increased production as ATP is no longer limiting transcription initiation (Krásný et al., 2008). A secondary effect of the stringent response is the de-repression of operons controlled by the CodY transcription factor (Brinsmade et al., 2010;

Tojo et al., 2005). De-repression further induces the expression of branch chain amino acid metabolic genes as well as others involved in flagellar production, competence, and genes involved in stationary phase maintenance (Belitsky and Sonenshein, 2013; Brinsmade et al.,

2014; Kriel et al., 2014). Whether directly or indirectly, the increases in gene expression to amino acid biosynthesis are a direct counteraction to the sensing of limited nutrients provided to the ribosome. Unlike T boxes regulating aaRS genes, which can activate single genes in response to starvation for a single amino acid, the stringent response is global, controlling broad categories of genes even to starvation for a single amino acid.

But what is the importance of such a global reprogramming? The stringent response has been shown to be important for multiple biological phenomena such as sporulation, antibiotic resistance, persistence, and even pathogenicity (Dalebroux et al., 2010; Lopez et al., 1981;

Maisonneuve and Gerdes, 2014; Svenningsen et al., 2019). However, underlying these global phenotypes is the importance of coordinating gene expression. Given the high amount of energy required to sustain translation, it is hugely disadvantageous for cells to maintain ribosomes and protein synthesis rates when nutrients are unavailable. Additionally, ribosomes starved at codons cognate to limiting amino acids can cause ‘traffic-jams’ on mRNA leading to detrimental

41 downstream effects as discussed previously (Gloge et al., 2014; Hanson and Coller, 2018;

Rauscher and Ignatova, 2018). The stringent response has been shown to play a key role in keeping the balance between amino acid production and protein synthesis, possibly maintaining the bacterial ‘growth law’ relationship between ribosome content and the growth rate (Scott et al., 2010; Zhu and Dai, 2019). The recent finding that conserved molecular pathways have maintained stoichiometry between factors across evolution, and that this conservation extends across growth rates, potentially implicates the stringent response as the major coordinator of these expression ratios (Lalanne et al., 2018). The importance of maintaining a proper balance possibly ensures that the translation elongation rate remains constant across a wide range of growth rates (Dai et al., 2016). However, it is unclear if the stringent response would still be beneficial for the cell if activation was triggered without nutrients actually becoming limiting.

For example, reducing expression of aaRS genes could produce the same activation of RelA and production of (p)ppGpp through accumulated uncharged tRNA, despite amino acid levels remaining unchanged. Would the coordination of ribosome production and the bacterial growth law remain? And how would the shift in global transcription affect the cell? I intend to explore these questions in detail in the following work.

42 References

Alexander, R.M. (1984). Optimum strengths for bones liable to fatigue and accidental fracture. J. Theor. Biol. 109, 621–636.

Andersson, S.G., and Kurland, C.G. (1990). Codon preferences in free-living microorganisms. Microbiol. Rev. 54, 198–210.

Arnez, J.G., and Moras, D. (1997). Structural and functional considerations of the aminoacylation reaction. Trends Biochem. Sci. 22, 211–216.

Atkinson, G.C., Tenson, T., and Hauryliuk, V. (2011). The RelA/SpoT homolog (RSH) superfamily: distribution and functional evolution of ppGpp synthetases and hydrolases across the tree of life. PLoS ONE 6, e23479.

Austin, J., and First, E.A. (2002). Comparison of the catalytic roles played by the KMSKS motif in the human and Bacillus stearothermophilus trosyl-tRNA synthetases. J. Biol. Chem. 277, 28394–28399.

Barker, M.M., Gaal, T., and Gourse, R.L. (2001). Mechanism of regulation of transcription initiation by ppGpp. II. Models for positive control based on properties of RNAP mutants and competition for RNAP. Journal of 305, 689–702.

Battaglia, R.A., Grigg, J.C., and Ke, A. (2019). Structural basis for tRNA decoding and aminoacylation sensing by T-box riboregulators. Nat. Struct. Mol. Biol. 26, 1106–1113.

Belardinelli, R., Sharma, H., Caliskan, N., Cunha, C.E., Peske, F., Wintermeyer, W., and Rodnina, M.V. (2016). Choreography of molecular movements during ribosome progression along mRNA. Nat. Struct. Mol. Biol. 23, 342–348.

Belitsky, B.R., and Sonenshein, A.L. (2013). Genome-wide identification of Bacillus subtilis CodY-binding sites at single-nucleotide resolution. Proc. Natl. Acad. Sci. U.S.a. 110, 7026– 7031.

Bengtson, M.H., and Joazeiro, C.A.P. (2010). Role of a ribosome-associated E3 ubiquitin ligase in protein quality control. Nature 467, 470–473.

Biewener, A.A. (1982). Bone strength in small mammals and bipedal birds: do safety factors change with body size? J. Exp. Biol. 98, 289–301.

Biewener, A.A. (1989). Scaling body support in mammals: limb posture and muscle mechanics. Science 245, 45–48.

Bobkova, E.V., Mashanov-Golikov, A.V., Wolfson, A., Ankilova, V.N., and Lavrik, O.I. (1991). Comparative study of subunits of phenylalanyl-tRNA synthetase from Escherichia coli and Thermus thermophilus. FEBS Letters 290, 95–98.

Borel, F., Vincent, C., Leberman, R., and Härtlein, M. (1994). Seryl-tRNA synthetase from

43 Escherichia coli: implication of its N-terminal domain in aminoacylation activity and specificity. Nucleic Acids Res. 22, 2963–2969.

Brakhage, A.A., Wozny, M., and Putzer, H. (1990). Structure and nucleotide sequence of the Bacillus subtilis phenylalanyl-tRNA synthetase genes. Biochimie 72, 725–734.

Bremer, H., and Dennis, P.P. (2008). Modulation of Chemical Composition and Other Parameters of the Cell at Different Exponential Growth Rates. EcoSal Plus 3.

Brinsmade, S.R., Alexander, E.L., Livny, J., Stettner, A.I., Segrè, D., Rhee, K.Y., and Sonenshein, A.L. (2014). Hierarchical expression of genes controlled by the Bacillus subtilis global regulatory protein CodY. Proc. Natl. Acad. Sci. U.S.a. 111, 8227–8232.

Brinsmade, S.R., Kleijn, R.J., Sauer, U., and Sonenshein, A.L. (2010). Regulation of CodY activity through modulation of intracellular branched-chain amino acid pools. J. Bacteriol. 192, 6357–6368.

Brown, A., Fernández, I.S., Gordiyenko, Y., and Ramakrishnan, V. (2016). Ribosome-dependent activation of stringent control. Nature 534, 277–280.

Buskirk, A.R., and Green, R. (2013). Biochemistry. Getting past polyproline pauses. Science 339, 38–39.

Buttgereit, F., and Brand, M.D. (1995). A hierarchy of ATP-consuming processes in mammalian cells. Biochem. J. 312 ( Pt 1), 163–167.

Carter, C.W., and Duax, W.L. (2002). Did tRNA synthetase classes arise on opposite strands of the same gene? Mol. Cell 10, 705–708.

Cashel, M. (1969). The control of ribonucleic acid synthesis in Escherichia coli. IV. Relevance of unusual phosphorylated compounds from amino acid-starved stringent strains. J. Biol. Chem. 244, 3133–3141.

Christensen, S.K., and Gerdes, K. (2003). RelE toxins from bacteria and cleave mRNAs on translating ribosomes, which are rescued by tmRNA. Mol. Microbiol. 48, 1389–1400.

Curnow, A.W., Hong, K.W., Yuan, R., Kim, S.I., Martins, O., Winkler, W., Henkin, T.M., and Söll, D. (1997). Glu-tRNAGln amidotransferase: a novel heterotrimeric enzyme required for correct decoding of glutamine codons during translation. Proc. Natl. Acad. Sci. U.S.a. 94, 11819–11826.

Cusack, S. (1995). Eleven down and nine to go. Nat. Struct. Biol. 2, 824–831.

Cusack, S., Berthet-Colominas, C., Härtlein, M., Nassar, N., and Leberman, R. (1990). A second class of synthetase structure revealed by X-ray analysis of Escherichia coli seryl-tRNA synthetase at 2.5 A. Nature 347, 249–255.

Cusack, S., Härtlein, M., and Leberman, R. (1991). Sequence, structural and evolutionary

44 relationships between class 2 aminoacyl-tRNA synthetases. Nucleic Acids Res. 19, 3489–3498.

Dai, X., Zhu, M., Warren, M., Balakrishnan, R., Patsalo, V., Okano, H., Williamson, J.R., Fredrick, K., Wang, Y.-P., and Hwa, T. (2016). Reduction of translating ribosomes enables Escherichia coli to maintain elongation rates during slow growth. Nat Microbiol 2, 16231.

Dalbow, D.G., and Young, R. (1975). Synthesis time of beta-galactosidase in Escherichia coli B/r as a function of growth rate. Biochem. J. 150, 13–20.

Dalebroux, Z.D., Svensson, S.L., Gaynor, E.C., and Swanson, M.S. (2010). ppGpp conjures bacterial virulence. Microbiol. Mol. Biol. Rev. 74, 171–199.

Darnell, A.M., Subramaniam, A.R., and O’Shea, E.K. (2018). Translational Control through Differential Ribosome Pausing during Amino Acid Limitation in Mammalian Cells. Mol. Cell 71, 229–243.e11.

Davidi, D., and Milo, R. (2017). Lessons on enzyme kinetics from quantitative proteomics. Curr. Opin. Biotechnol. 46, 81–89.

Davidi, D., Noor, E., Liebermeister, W., Bar-Even, A., Flamholz, A., Tummler, K., Barenholz, U., Goldenfeld, M., Shlomi, T., and Milo, R. (2016). Global characterization of in vivo enzyme catalytic rates and their correspondence to in vitro kcat measurements. Proc. Natl. Acad. Sci. U.S.a. 113, 3401–3406.

Dekel, E., and Alon, U. (2005). Optimality and evolutionary tuning of the expression level of a protein. Nature 436, 588–592.

Deloughery, A., Lalanne, J.-B., Losick, R., and Li, G.-W. (2018). Maturation of polycistronic mRNAs by the endoribonuclease RNase Y and its associated Y-complex in Bacillus subtilis. Proc. Natl. Acad. Sci. U.S.a. 115, E5585–E5594.

Dennis, P.P., and Bremer, H. (1974a). Differential rate of synthesis in Escherichia coli B/r. Journal of Molecular Biology 84, 407–422.

Dennis, P.P., and Bremer, H. (1974b). Macromolecular composition during steady-state growth of Escherichia coli B-r. J. Bacteriol. 119, 270–281.

Diamond, J., and Hammond, K. (1992). The matches, achieved by natural selection, between biological capacities and their natural loads. Experientia 48, 551–557.

Diamond, J. (2002). Quantitative evolutionary design. J. Physiol. (Lond.) 542, 337–345.

Dibbelt, L., Pachmann, U., and Zachau, H.G. (1980). Serine activation is the rate limiting step of tRNASer aminoacylation by yeast seryl tRNA synthetase. Nucleic Acids Res. 8, 4021–4039.

Doma, M.K., and Parker, R. (2006). Endonucleolytic cleavage of eukaryotic mRNAs with stalls in translation elongation. Nature 440, 561–564.

45 Dong, H., Nilsson, L., and Kurland, C.G. (1995). Gratuitous overexpression of genes in Escherichia coli leads to growth inhibition and ribosome destruction. J. Bacteriol. 177, 1497– 1504.

Dong, H., Nilsson, L., and Kurland, C.G. (1996). Co-variation of tRNA Abundance and Codon Usage inEscherichia coliat Different Growth Rates. Journal of Molecular Biology 260, 649–663.

Doniselli, N., Rodriguez-Aliaga, P., Amidani, D., Bardales, J.A., Bustamante, C., Guerra, D.G., and Rivetti, C. (2015). New insights into the regulatory mechanisms of ppGpp and DksA on Escherichia coli RNA polymerase-promoter complex. Nucleic Acids Res. 43, 5249–5262.

Durfee, T., Hansen, A.-M., Zhi, H., Blattner, F.R., and Jin, D.J. (2008). Transcription profiling of the stringent response in Escherichia coli. J. Bacteriol. 190, 1084–1096.

Ehrenberg, M., and Kurland, C.G. (1984). Costs of accuracy determined by a maximal growth rate constraint. Q. Rev. Biophys. 17, 45–82.

Eldred, E.W., and Schimmel, P.R. (1972). Investigation of the transfer of amino acid from a transfer ribonucleic acid synthetase-aminoacyl adenylate complex to transfer ribonucleic acid. Biochemistry 11, 17–23.

Elf, J., Nilsson, D., Tenson, T., and Ehrenberg, M. (2003). Selective charging of tRNA isoacceptors explains patterns of codon usage. Science 300, 1718–1722.

English, B.P., Hauryliuk, V., Sanamrad, A., Tankov, S., Dekker, N.H., and Elf, J. (2011). Single- molecule investigations of the stringent response machinery in living bacterial cells. Proc. Natl. Acad. Sci. U.S.a. 108, E365–E373.

Eriani, G., Delarue, M., Poch, O., Gangloff, J., and Moras, D. (1990). Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature 347, 203–206.

Eymann, C., Homuth, G., Scharf, C., and Hecker, M. (2002). Bacillus subtilis functional genomics: global characterization of the stringent response by proteome and transcriptome analysis. J. Bacteriol. 184, 2500–2520.

Ferrin, M.A., and Subramaniam, A.R. (2017). Kinetic modeling predicts a stimulatory role for ribosome collisions at elongation stall sites in bacteria. Elife 6, 198.

Fersht, A.R., Gangloff, J., and Dirheimer, G. (1978). Reaction pathway and rate-determining step in the aminoacylation of tRNAArg catalyzed by the arginyl-tRNA synthetase from yeast. Biochemistry 17, 3740–3746.

Forchhammer, J., and Lindahl, L. (1971). Growth rate of polypeptide chains as a function of the cell growth rate in a mutant of Escherichia coli 15. Journal of Molecular Biology 55, 563–568.

Frank, J., and Gonzalez, R.L. (2010). Structure and dynamics of a processive Brownian motor: the translating ribosome. Annu. Rev. Biochem. 79, 381–412.

46 Frumkin, I., Lajoie, M.J., Gregg, C.J., Hornung, G., Church, G.M., and Pilpel, Y. (2018). Codon usage of highly expressed genes affects proteome-wide translation efficiency. Proc. Natl. Acad. Sci. U.S.a. 115, E4940–E4949.

Gloge, F., Becker, A.H., Kramer, G., and Bukau, B. (2014). Co-translational mechanisms of protein maturation. Curr. Opin. Struct. Biol. 24, 24–33.

Gourse, R.L., Chen, A.Y., Gopalkrishnan, S., Sanchez-Vazquez, P., Myers, A., and Ross, W. (2018). Transcriptional Responses to ppGpp and DksA. Annu. Rev. Microbiol. 72, 163–184.

Gouy, M., and Gautier, C. (1982). Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10, 7055–7074.

Gouy, M., and Grantham, R. (1980). Polypeptide elongation and tRNA cycling in Escherichia coli: a dynamic approach. FEBS Letters 115, 151–155.

Green, N.J., Grundy, F.J., and Henkin, T.M. (2010). The T box mechanism: tRNA as a regulatory molecule. FEBS Letters 584, 318–324.

Grigg, J.C., Chen, Y., Grundy, F.J., Henkin, T.M., Pollack, L., and Ke, A. (2013). T box RNA decodes both the information content and geometry of tRNA to affect gene expression. Proc. Natl. Acad. Sci. U.S.a. 110, 7240–7245.

Grundy, F.J., and Henkin, T.M. (1993). tRNA as a positive regulator of transcription antitermination in B. subtilis. Cell 74, 475–482.

Grundy, F.J., Winkler, W.C., and Henkin, T.M. (2002). tRNA-mediated transcription antitermination in vitro: codon-anticodon pairing independent of the ribosome. Proc. Natl. Acad. Sci. U.S.a. 99, 11121–11126.

Gupta, V., and Warner, J.R. (2014). Ribosome-omics of the human ribosome. Rna 20, 1004– 1013.

Gustafsson, C., Govindarajan, S., and Minshull, J. (2004). Codon bias and heterologous protein expression. Trends in Biotechnology 22, 346–353.

Guth, E., Connolly, S.H., Bovee, M., and Francklyn, C.S. (2005). A substrate-assisted concerted mechanism for aminoacylation by a class II aminoacyl-tRNA synthetase. Biochemistry 44, 3785–3794.

Gutiérrez-Preciado, A., Henkin, T.M., Grundy, F.J., Yanofsky, C., and Merino, E. (2009). Biochemical features and functional implications of the RNA-based T-box regulatory mechanism. Microbiol. Mol. Biol. Rev. 73, 36–61.

Hanson, G., and Coller, J. (2018). Codon optimality, bias and usage in translation and mRNA decay. Nature Reviews Molecular Cell Biology 19, 20–30.

Harigaya, Y., and Parker, R. (2010). No-go decay: a quality control mechanism for RNA in

47 translation. Wiley Interdiscip Rev RNA 1, 132–141.

Haseltine, W.A., and Block, R. (1973). Synthesis of guanosine tetra- and pentaphosphate requires the presence of a codon-specific, uncharged transfer ribonucleic acid in the acceptor site of ribosomes. Proc. Natl. Acad. Sci. U.S.a. 70, 1564–1568.

Hauryliuk, V., Atkinson, G.C., Murakami, K.S., Tenson, T., and Gerdes, K. (2015). Recent functional insights into the role of (p)ppGpp in bacterial physiology. Nature Reviews Microbiology 13, 298–309.

Henkin, T.M., Glass, B.L., and Grundy, F.J. (1992). Analysis of the Bacillus subtilis tyrS gene: conservation of a regulatory sequence in multiple tRNA synthetase genes. J. Bacteriol. 174, 1299–1306.

Hinnebusch, A.G. (2005). of GCN4 and the general amino acid control of yeast. Annu. Rev. Microbiol. 59, 407–450.

Hughes, J., and Mellows, G. (1978). Inhibition of isoleucyl-transfer ribonucleic acid synthetase in Escherichia coli by pseudomonic acid. Biochem. J. 176, 305–318.

Hurdle, J.G., O'Neill, A.J., and Chopra, I. (2005). Prospects for aminoacyl-tRNA synthetase inhibitors as new antimicrobial agents. Antimicrob. Agents Chemother. 49, 4821–4833.

Huter, P., Arenz, S., Bock, L.V., Graf, M., Frister, J.O., Heuer, A., Peil, L., Starosta, A.L., Wohlgemuth, I., Peske, F., et al. (2017). Structural Basis for Polyproline-Mediated Ribosome Stalling and Rescue by the Translation EF-P. Mol. Cell 68, 515–527.e516.

Ibba, M., and Söll, D. (2000). Aminoacyl-tRNA synthesis. Annu. Rev. Biochem. 69, 617–650.

Ingolia, N.T., Ghaemmaghami, S., Newman, J.R.S., and Weissman, J.S. (2009). Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223.

Ishimura, R., Nagy, G., Dotu, I., Zhou, H., Yang, X.-L., Schimmel, P., Senju, S., Nishimura, Y., Chuang, J.H., and Ackerman, S.L. (2014). RNA function. Ribosome stalling induced by mutation of a CNS-specific tRNA causes neurodegeneration. Science 345, 455–459.

Jakubowski, H., and Goldman, E. (1984). Quantities of individual aminoacyl-tRNA families and their turnover in Escherichia coli. J. Bacteriol. 158, 769–776.

Kafri, M., Metzl-Raz, E., Jona, G., and Barkai, N. (2016). The Cost of Protein Production. Cell Reports 14, 22–31.

Kanaya, S., Yamada, Y., Kudo, Y., and Ikemura, T. (1999). Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 238, 143–155.

48 Keren, L., Hausser, J., Lotan-Pompan, M., Vainberg Slutskin, I., Alisar, H., Kaminski, S., Weinberger, A., Alon, U., Milo, R., and Segal, E. (2016). Massively Parallel Interrogation of the Effects of Gene Expression Levels on Fitness. Cell 166, 1282–1294.e18.

Kirchner, S., and Ignatova, Z. (2015). Emerging roles of tRNA in adaptive translation, signalling dynamics and disease. Nature Reviews Genetics 16, 98–112.

Klumpp, S., Dong, J., and Hwa, T. (2012). On ribosome load, codon bias and protein abundance. PLoS ONE 7, e48542.

Klumpp, S., Scott, M., Pedersen, S., and Hwa, T. (2013). Molecular crowding limits translation and cell growth. Proc. Natl. Acad. Sci. U.S.a. 110, 16754–16759.

Krásný, L., and Gourse, R.L. (2004). An alternative strategy for bacterial ribosome synthesis: Bacillus subtilis rRNA transcription regulation. Embo J. 23, 4473–4483.

Krásný, L., Tiserová, H., Jonák, J., Rejman, D., and Sanderová, H. (2008). The identity of the transcription +1 position is crucial for changes in gene expression in response to amino acid starvation in Bacillus subtilis. Mol. Microbiol. 69, 42–54.

Kriel, A., Bittner, A.N., Kim, S.H., Liu, K., Tehranchi, A.K., Zou, W.Y., Rendon, S., Chen, R., Tu, B.P., and Wang, J.D. (2012). Direct regulation of GTP homeostasis by (p)ppGpp: a critical component of viability and stress resistance. Mol. Cell 48, 231–241.

Kriel, A., Brinsmade, S.R., Tse, J.L., Tehranchi, A.K., Bittner, A.N., Sonenshein, A.L., and Wang, J.D. (2014). GTP dysregulation in Bacillus subtilis cells lacking (p)ppGpp results in phenotypic amino acid auxotrophy and failure to adapt to nutrient downshift and regulate biosynthesis genes. J. Bacteriol. 196, 189–201.

Kudla, G., Murray, A.W., Tollervey, D., and Plotkin, J.B. (2009). Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255–258.

Kwon, N.H., Fox, P.L., and Kim, S. (2019). Aminoacyl-tRNA synthetases as therapeutic targets. Nat Rev Drug Discov 18, 629–650.

Lalanne, J.-B., Taggart, J.C., Guo, M.S., Herzel, L., Schieler, A., and Li, G.-W. (2018). Evolutionary Convergence of Pathway-Specific Enzyme Expression Stoichiometry. Cell 173, 749–761.e38.

LaRiviere, F.J., Wolfson, A.D., and Uhlenbeck, O.C. (2001). Uniform Binding of Aminoacyl- tRNAs to Elongation Factor Tu by Thermodynamic Compensation. Science 294, 165–168.

Larson, M.H., Gilbert, L.A., Wang, X., Lim, W.A., Weissman, J.S., and Qi, L.S. (2013). CRISPR interference (CRISPRi) for sequence-specific control of gene expression. Nature Protocols 8, 2180–2196.

Larson, M.H., Mooney, R.A., Peters, J.M., Windgassen, T., Nayak, D., Gross, C.A., Block, S.M., Greenleaf, W.J., Landick, R., and Weissman, J.S. (2014). A pause sequence enriched at

49 translation start sites drives transcription dynamics in vivo. Science 344, 1042–1047.

Laursen, B.S., Sørensen, H.P., Mortensen, K.K., and Sperling-Petersen, H.U. (2005). Initiation of protein synthesis in bacteria. Microbiol. Mol. Biol. Rev. 69, 101–123.

Lavrik, O.I., Moor, N.A., and Khodyreva, S.N. (1982). Phenylalanyl-tRNA synthetase from E. coli MRE-600: localization of the phenylalanine binding sites on the subunits by affinity reagents. Mol. Biol. Rep. 8, 123–126.

Lee, N., Bessho, Y., Wei, K., Szostak, J.W., and Suga, H. (2000). Ribozyme-catalyzed tRNA aminoacylation. Nat. Struct. Biol. 7, 28–33.

Lenhard, B., Orellana, O., Ibba, M., and Weygand-Durasević, I. (1999). tRNA recognition and evolution of determinants in seryl-tRNA synthesis. Nucleic Acids Res. 27, 721–729.

Li, G.-W., Burkhardt, D., Gross, C., and Weissman, J.S. (2014). Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624–635.

Li, R., Macnamara, L.M., Leuchter, J.D., Alexander, R.W., and Cho, S.S. (2015). MD Simulations of tRNA and Aminoacyl-tRNA Synthetases: Dynamics, Folding, Binding, and Allostery. Int J Mol Sci 16, 15872–15902.

Li, W., Bouveret, E., Zhang, Y., Liu, K., Wang, J.D., and Weisshaar, J.C. (2016). Effects of amino acid starvation on RelA diffusive behavior in live Escherichia coli. Mol. Microbiol. 99, 571–585.

Li, X., Yagi, M., Morita, T., and Aiba, H. (2008). Cleavage of mRNAs and role of tmRNA system under amino acid starvation in Escherichia coli. Mol. Microbiol. 68, 462–473.

Liebermeister, W., Noor, E., Flamholz, A., Davidi, D., Bernhardt, J., and Milo, R. (2014). Visual account of protein investment in cellular functions. Proc. Natl. Acad. Sci. U.S.a. 111, 8488– 8493.

Lin, S.X., Baltzinger, M., and Remy, P. (1983). Fast kinetic study of yeast phenylalanyl-tRNA synthetase: an efficient discrimination between tyrosine and phenylalanine at the level of the aminoacyladenylate-enzyme complex. Biochemistry 22, 681–689.

Liu, K., Bittner, A.N., and Wang, J.D. (2015a). Diversity in (p)ppGpp metabolism and effectors. Curr. Opin. Microbiol. 24, 72–79.

Liu, K., Myers, A.R., Pisithkul, T., Claas, K.R., Satyshur, K.A., Amador-Noguez, D., Keck, J.L., and Wang, J.D. (2015b). Molecular mechanism and evolution of guanylate kinase regulation by (p)ppGpp. Mol. Cell 57, 735–749.

Lodish, H., Berk, A., Kaiser, C.A., Krieger, M., Bretscher, A., Ploegh, H., Amon, A., and Scott, M.P. (2016). Molecular Cell Biology (W. H. Freeman).

Lopez, J.M., Dromerick, A., and Freese, E. (1981). Response of guanosine 5'-triphosphate

50 concentration to nutritional changes and its significance for Bacillus subtilis sporulation. J. Bacteriol. 146, 605–613.

Luo, D., Leautey, J., Grunberg-Manago, M., and Putzer, H. (1997). Structure and regulation of expression of the Bacillus subtilis valyl-tRNA synthetase gene. J. Bacteriol. 179, 2472–2478.

Magnusson, L.U., Gummesson, B., Joksimović, P., Farewell, A., and Nyström, T. (2007). Identical, independent, and opposing roles of ppGpp and DksA in Escherichia coli. J. Bacteriol. 189, 5193–5202.

Maisonneuve, E., and Gerdes, K. (2014). Molecular mechanisms underlying bacterial persisters. Cell 157, 539–548.

Martinez-Rodriguez, L., Erdogan, O., Jimenez-Rodriguez, M., Gonzalez-Rivera, K., Williams, T., Li, L., Weinreb, V., Collier, M., Chandrasekaran, S.N., Ambroggio, X., et al. (2015). Functional Class I and II Amino Acid-activating Enzymes Can Be Coded by Opposite Strands of the Same Gene. J. Biol. Chem. 290, 19710–19725.

Mohammad, F., Green, R., and Buskirk, A.R. (2019). A systematically-revised ribosome profiling method for bacteria reveals pauses at single-codon resolution. Elife 8, 8324.

Moine, H., Romby, P., Springer, M., Grunberg-Manago, M., Ebel, J.P., Ehresmann, C., and Ehresmann, B. (1988). Messenger RNA structure and gene regulation at the translational level in Escherichia coli: the case of threonine:tRNAThr ligase. Proc. Natl. Acad. Sci. U.S.a. 85, 7892– 7896.

Molodtsov, V., Sineva, E., Zhang, L., Huang, X., Cashel, M., Ades, S.E., and Murakami, K.S. (2018). Allosteric Effector ppGpp Potentiates the Inhibition of Transcript Initiation by DksA. Mol. Cell 69, 828–839.e5.

Morisaki, T., Lyon, K., DeLuca, K.F., DeLuca, J.G., English, B.P., Zhang, Z., Lavis, L.D., Grimm, J.B., Viswanathan, S., Looger, L.L., et al. (2016). Real-time quantification of single RNA translation dynamics in living cells. Science 352, 1425–1429.

Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology 247, 536–540.

Nagaraj, N., Kulak, N.A., Cox, J., Neuhauser, N., Mayr, K., Hoerning, O., Vorm, O., and Mann, M. (2012). System-wide perturbation analysis with nearly complete coverage of the yeast proteome by single-shot ultra HPLC runs on a bench top Orbitrap. Mol. Cell Proteomics 11, M111.013722.

Nakama, T., Nureki, O., and Yokoyama, S. (2001). Structural basis for the recognition of isoleucyl-adenylate and an antibiotic, mupirocin, by isoleucyl-tRNA synthetase. J. Biol. Chem. 276, 47387–47393.

Nanamiya, H., Kasai, K., Nozawa, A., Yun, C.-S., Narisawa, T., Murakami, K., Natori, Y.,

51 Kawamura, F., and Tozawa, Y. (2008). Identification and functional analysis of novel (p)ppGpp synthetase genes in Bacillus subtilis. Mol. Microbiol. 67, 291–304.

Natori, Y., Tagami, K., Murakami, K., Yoshida, S., Tanigawa, O., Moh, Y., Masuda, K., Wada, T., Suzuki, S., Nanamiya, H., et al. (2009). Transcription activity of individual rrn operons in Bacillus subtilis mutants deficient in (p)ppGpp synthetase genes, relA, yjbM, and ywaC. J. Bacteriol. 191, 4555–4561.

Nazario, M., and Evans, J.A. (1974). Physical and kinetic studies of arginyl transfer ribonucleic acid ligase of Neurospora. A sequential ordered mechanism. J. Biol. Chem. 249, 4934–4936.

Nedialkova, D.D., and Leidel, S.A. (2015). Optimization of Codon Translation Rates via tRNA Modifications Maintains Proteome Integrity. Cell 161, 1606–1618.

O'Brien, E.J., Utrilla, J., and Palsson, B.O. (2016). Quantification and Classification of E. coli Proteome Utilization and Unused Protein Costs across Environments. PLoS Comput. Biol. 12, e1004998.

O'Donoghue, P., and Luthey-Schulten, Z. (2003). On the evolution of structure in aminoacyl- tRNA synthetases. Microbiol. Mol. Biol. Rev. 67, 550–573.

Park, S.G., Ewalt, K.L., and Kim, S. (2005). Functional expansion of aminoacyl-tRNA synthetases and their interacting factors: new perspectives on housekeepers. Trends Biochem. Sci. 30, 569–574.

Paul, B.J., Barker, M.M., Ross, W., Schneider, D.A., Webb, C., Foster, J.W., and Gourse, R.L. (2004). DksA: a critical component of the transcription initiation machinery that potentiates the regulation of rRNA promoters by ppGpp and the initiating NTP. Cell 118, 311–322.

Peacock, J.R., Walvoord, R.R., Chang, A.Y., Kozlowski, M.C., Gamper, H., and Hou, Y.-M. (2014). Amino acid-dependent stability of the acyl linkage in aminoacyl-tRNA. Rna 20, 758– 764.

Perfeito, L., Ghozzi, S., Berg, J., Schnetz, K., and Lässig, M. (2011). Nonlinear fitness landscape of a molecular pathway. PLOS Genet 7, e1002160.

Peters, J.M., Colavin, A., Shi, H., Czarny, T.L., Larson, M.H., Wong, S., Hawkins, J.S., Lu, C.H.S., Koo, B.-M., Marta, E., et al. (2016). A Comprehensive, CRISPR-based Functional Analysis of Essential Genes in Bacteria. Cell 165, 1493–1506.

Potrykus, K., and Cashel, M. (2008). (p)ppGpp: still magical? Annu. Rev. Microbiol. 62, 35–51.

Putney, S.D., and Schimmel, P. (1981). An aminoacyl tRNA synthetase binds to a specific DNA sequence and regulates its gene transcription. Nature 291, 632–635.

Putney, S.D., Sauer, R.T., and Schimmel, P.R. (1981). Purification and properties of alanine tRNA synthetase from Escherichia coli A tetramer of identical subunits. J. Biol. Chem. 256, 198–204.

52 Putzer, H., Laalami, S., Brakhage, A.A., Condon, C., and Grunberg-Manago, M. (1995). Aminoacyl-tRNA synthetase gene regulation in Bacillus subtilis: induction, repression and growth-rate regulation. Mol. Microbiol. 16, 709–718.

Putzer, H., and Laalami, S. (2013). Regulation of the Expression of Aminoacyl-tRNA Synthetases and Translation Factors.

Qi, L.S., Larson, M.H., Gilbert, L.A., Doudna, J.A., Weissman, J.S., Arkin, A.P., and Lim, W.A. (2013). Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173–1183.

Radhakrishnan, A., Chen, Y.-H., Martin, S., Alhusaini, N., Green, R., and Coller, J. (2016). The DEAD-Box Protein Dhh1p Couples mRNA Decay and Translation by Monitoring Codon Optimality. Cell 167, 122–132.e129.

Rajkovic, A., and Ibba, M. (2017). and the Control of Translation Elongation. Annu. Rev. Microbiol. 71, 117–131.

Raser, J.M., and O’Shea, E.K. (2005). Noise in gene expression: origins, consequences, and control. Science 309, 2010–2013.

Rauscher, R., and Ignatova, Z. (2018). Timing during translation matters: synonymous mutations in human pathologies influence protein folding and function. Biochem. Soc. Trans. 46, 937–944.

Rest, J.S., Morales, C.M., Waldron, J.B., Opulente, D.A., Fisher, J., Moon, S., Bullaughey, K., Carey, L.B., and Dedousis, D. (2013). Nonlinear fitness consequences of variation in expression level of a eukaryotic gene. Mol. Biol. Evol. 30, 448–456.

Rodin, S.N., and Ohno, S. (1995). Two types of aminoacyl-tRNA synthetases could be originally encoded by complementary strands of the same nucleic acid. Orig Life Evol Biosph 25, 565– 589.

Rodnina, M.V., Fischer, N., Maracci, C., and Stark, H. (2017). Ribosome dynamics during decoding. Philos. Trans. R. Soc. Lond., B, Biol. Sci. 372, 20160182.

Ross, W., Vrentas, C.E., Sanchez-Vazquez, P., Gaal, T., and Gourse, R.L. (2013). The magic spot: a ppGpp binding site on E. coli RNA polymerase responsible for regulation of transcription initiation. Mol. Cell 50, 420–429.

Ruff, M., Krishnaswamy, S., Boeglin, M., Poterszman, A., Mitschler, A., Podjarny, A., Rees, B., Thierry, J.C., and Moras, D. (1991). Class II aminoacyl transfer RNA synthetases: crystal structure of yeast aspartyl-tRNA synthetase complexed with tRNA(Asp). Science 252, 1682– 1689.

Russell, J.B., and Cook, G.M. (1995). Energetics of bacterial growth: balance of anabolic and catabolic reactions. Microbiol. Rev. 59, 48–62.

Sabatini, D.M. (2017). Twenty-five years of mTOR: Uncovering the link from nutrients to

53 growth. Proc. Natl. Acad. Sci. U.S.a. 114, 11818–11825.

Sacerdot, C., Caillet, J., Graffe, M., Eyermann, F., Ehresmann, B., Ehresmann, C., Springer, M., and Romby, P. (1998). The Escherichia coli threonyl-tRNA synthetase gene contains a split ribosomal binding site interrupted by a hairpin structure that is essential for autoregulation. Mol. Microbiol. 29, 1077–1090.

Salvador, A., and Savageau, M.A. (2003). Quantitative evolutionary design of glucose 6- phosphate dehydrogenase expression in human erythrocytes. Proc. Natl. Acad. Sci. U.S.a. 100, 14463–14468.

Sanchez-Vazquez, P., Dewey, C.N., Kitten, N., Ross, W., and Gourse, R.L. (2019). Genome- wide effects on Escherichia coli transcription from ppGpp binding to its two sites on RNA polymerase. Proc. Natl. Acad. Sci. U.S.a. 116, 8310–8319.

Sander, T., Farke, N., Diehl, C., Kuntz, M., Glatter, T., and Link, H. (2019). Allosteric Feedback Inhibition Enables Robust Amino Acid Biosynthesis in E. coli by Enforcing Enzyme Overabundance. Cell Syst 8, 66–75.e68.

Schaechter, M., MaalOe, O., and Kjeldgaard, N.O. (1958). Dependency on Medium and Temperature of Cell Size and Chemical Composition during Balanced Growth of Salmonella typhimurium. Microbiology 19, 592–606.

Schimmel, P.R., and Söll, D. (1979). Aminoacyl-tRNA synthetases: general features and recognition of transfer RNAs. Annu. Rev. Biochem. 48, 601–648.

Schrader, J.M., Chapman, S.J., and Uhlenbeck, O.C. (2011). Tuning the affinity of aminoacyl- tRNA to elongation factor Tu for optimal decoding. Proc. Natl. Acad. Sci. U.S.a. 108, 5215– 5220.

Scott, M., and Hwa, T. (2011). Bacterial growth laws and their applications. Curr. Opin. Biotechnol. 22, 559–565.

Scott, M., Gunderson, C.W., Mateescu, E.M., Zhang, Z., and Hwa, T. (2010). Interdependence of cell growth and gene expression: origins and consequences. Science 330, 1099–1102.

Scott, M., Klumpp, S., Mateescu, E.M., and Hwa, T. (2014). Emergence of robust growth laws from optimal regulation of ribosome synthesis. Molecular Systems Biology 10, 747.

Sharp, P.M., and Li, W.H. (1987). The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295.

Sherwood, A.V., Grundy, F.J., and Henkin, T.M. (2015). T box riboswitches in Actinobacteria: translational regulation via novel tRNA interactions. Proc. Natl. Acad. Sci. U.S.a. 112, 1113– 1118.

Shoemaker, C.J., Eyler, D.E., and Green, R. (2010). Dom34:Hbs1 promotes subunit dissociation and peptidyl-tRNA drop-off to initiate no-go decay. Science 330, 369–372.

54 Silvian, L.F., Wang, J., and Steitz, T.A. (1999). Insights into editing from an ile-tRNA synthetase structure with tRNAile and mupirocin. Science 285, 1074–1077.

Sood, S.M., Hill, K.A., and Slattery, C.W. (1997). Stability of Escherichia coli alanyl-tRNA synthetase quaternary structure under increased pressure. Arch. Biochem. Biophys. 346, 322– 323.

Sood, S.M., Slattery, C.W., Filley, S.J., Wu, M.X., and Hill, K.A. (1996). Further characterization of Escherichia coli alanyl-tRNA synthetase. Arch. Biochem. Biophys. 328, 295– 301.

Springer, M., Mayaux, J.F., Fayat, G., Plumbridge, J.A., Graffe, M., Blanquet, S., and Grunberg- Manago, M. (1985). Attenuation control of the Escherichia coli phenylalanyl-tRNA synthetase operon. Journal of Molecular Biology 181, 467–478.

Springer, M., Trudel, M., Graffe, M., Plumbridge, J., Fayat, G., Mayaux, J.F., Sacerdot, C., Blanquet, S., and Grunberg-Manago, M. (1983). Escherichia coli phenylalanyl-tRNA synthetase operon is controlled by attenuation in vivo. Journal of Molecular Biology 171, 263–279.

Srivatsan, A., and Wang, J.D. (2008). Control of , translation and replication by (p)ppGpp. Curr. Opin. Microbiol. 11, 100–105.

Stoebel, D.M., Hokamp, K., Last, M.S., and Dorman, C.J. (2009). Compensatory evolution of gene regulation in response to stress by Escherichia coli lacking RpoS. PLOS Genet 5, e1000671.

Subramaniam, A.R., Pan, T., and Cluzel, P. (2013). Environmental perturbations lift the degeneracy of the genetic code to regulate protein levels in bacteria. Proc. Natl. Acad. Sci. U.S.a. 110, 2419–2424.

Subramaniam, A.R., Zid, B.M., and O’Shea, E.K. (2014). An Integrated Approach Reveals Regulatory Controls on Elongation. Cell 159, 1200–1211.

Svenningsen, M.S., Veress, A., Harms, A., Mitarai, N., and Semsey, S. (2019). Birth and Resuscitation of (p)ppGpp Induced Antibiotic Tolerant Persister Cells. Sci Rep 9, 6056–13.

Sørensen, M.A., and Pedersen, S. (1991). Absolute in vivo translation rates of individual codons in Escherichia coli. The two glutamic acid codons GAA and GAG are translated with a threefold difference in rate. Journal of Molecular Biology 222, 265–280.

Sørensen, M.A., Kurland, C.G., and Pedersen, S. (1989). Codon usage determines translation rate in Escherichia coli. Journal of Molecular Biology 207, 365–377.

Tojo, S., Satomura, T., Morisaki, K., Deutscher, J., Hirooka, K., and Fujita, Y. (2005). Elaborate transcription regulation of the Bacillus subtilis ilv-leu operon involved in the biosynthesis of branched-chain amino acids through global regulators of CcpA, CodY and TnrA. Mol. Microbiol. 56, 1560–1573.

55 Tosa, T., and Pizer, L.I. (1971). Biochemical bases for the antimetabolite action of L-serine hydroxamate. J. Bacteriol. 106, 972–982.

Traxler, M.F., Summers, S.M., Nguyen, H.-T., Zacharia, V.M., Hightower, G.A., Smith, J.T., and Conway, T. (2008). The global, ppGpp-mediated stringent response to amino acid starvation in Escherichia coli. Mol. Microbiol. 68, 1128–1148.

Ude, S., Lassak, J., Starosta, A.L., Kraxenberger, T., Wilson, D.N., and Jung, K. (2013). Translation elongation factor EF-P alleviates ribosome stalling at polyproline stretches. Science 339, 82–85.

Uter, N.T., and Perona, J.J. (2004). Long-range intramolecular signaling in a tRNA synthetase complex revealed by pre-steady-state kinetics. Proc. Natl. Acad. Sci. U.S.a. 101, 14396–14401.

Vitreschak, A.G., Mironov, A.A., Lyubetsky, V.A., and Gelfand, M.S. (2008). Comparative genomic analysis of T-box regulatory systems in bacteria. Rna 14, 717–735.

Wang, C., Han, B., Zhou, R., and Zhuang, X. (2016). Real-Time Imaging of Translation on Single mRNA Transcripts in Live Cells. Cell 165, 990–1001.

Wang, J.D., Sanders, G.M., and Grossman, A.D. (2007). Nutritional control of elongation of DNA replication by (p)ppGpp. Cell 128, 865–875.

Watson, J.D., Baker, T.A., and Bell, S.P. (2014). Molecular Biology of the Gene (Benjamin- Cummings Publishing Company).

Weiss, S.L., Lee, E.A., and Diamond, J. (1998). Evolutionary matches of enzyme and transporter capacities to dietary substrate loads in the intestinal brush border. Proc. Natl. Acad. Sci. U.S.a. 95, 2117–2121.

Wendrich, T.M., Blaha, G., Wilson, D.N., Marahiel, M.A., and Nierhaus, K.H. (2002). Dissection of the mechanism for the stringent factor RelA. Mol. Cell 10, 779–788.

Woese, C. (1998). The universal ancestor. Proc. Natl. Acad. Sci. U.S.a. 95, 6854–6859.

Wohlgemuth, I., Brenner, S., Beringer, M., and Rodnina, M.V. (2008). Modulation of the rate of peptidyl transfer on the ribosome by the nature of substrates. J. Biol. Chem. 283, 32229–32235.

Wu, B., Eliscovich, C., Yoon, Y.J., and Singer, R.H. (2016). Translation dynamics of single mRNAs in live cells and neurons. Science 352, 1430–1435.

Wu, C.C.-C., Zinshteyn, B., Wehner, K.A., and Green, R. (2019). High-Resolution Ribosome Profiling Defines Discrete Ribosome Elongation States and Translational Regulation during Cellular Stress. Mol. Cell 73, 959–970.e5.

Yan, X., Hoek, T.A., Vale, R.D., and Tanenbaum, M.E. (2016). Dynamics of Translation of Single mRNA Molecules In Vivo. Cell 165, 976–989.

56 Young, R., and Bremer, H. (1976). Polypeptide-chain-elongation rate in Escherichia coli B/r as a function of growth rate. Biochem. J. 160, 185–194.

Yu, C.-H., Dang, Y., Zhou, Z., Wu, C., Zhao, F., Sachs, M.S., and Liu, Y. (2015). Codon Usage Influences the Local Rate of Translation Elongation to Regulate Co-translational Protein Folding. Mol. Cell 59, 744–754.

Zhang, C.-M., Perona, J.J., Ryu, K., Francklyn, C., and Hou, Y.-M. (2006). Distinct kinetic mechanisms of the two classes of Aminoacyl-tRNA synthetases. Journal of Molecular Biology 361, 300–311.

Zhu, M., and Dai, X. (2019). Growth suppression by altered (p)ppGpp levels results from non- optimal resource allocation in Escherichia coli. Nucleic Acids Res. 47, 4684–4693.

Zuo, Y., Wang, Y., and Steitz, T.A. (2013). The mechanism of E. coli RNA polymerase regulation by ppGpp is suggested by the structure of their complex. Mol. Cell 50, 430–436.

57 Chapter II

Growth Optimized Aminoacyl-tRNA Synthetase Levels Prevent Maximal

tRNA Charging

Darren J. Parker,1 Jean-Benoît Lalanne,1,2 Satoshi Kimura,3,4,5 Grace E. Johnson,1

Matthew K. Waldor,3,4,5 Gene-Wei Li1

1Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

2Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

3Division of Infectious Diseases, Brigham and Women’s Hospital, Boston, MA, 02115, USA

4Department of Microbiology, Harvard Medical School, Boston MA, 02115, USA

5Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, 02115, USA

This part was published in Cell Systems. DOI: 10.1016/j.cels.2020.07.005 PMID: 32726597

Author Contributions

D.J.P and G.-W.L. designed the experiments and analysis. J.B.L. and D.J.P. designed and optimized the competition experiment. D.J.P. and S.K. performed the tRNA charging experiments. D.J.P and G.E.J created the strains for the experiments. D.J.P. and G.-W.L. wrote the manuscript. S.K., M.K.W., and J.B.L edited the manuscript.

58 Abstract

Aminoacyl-tRNA synthetases (aaRSs) serve a dual role in charging tRNAs. Their enzymatic activities both provide protein synthesis flux and reduce uncharged tRNA levels. Although uncharged tRNAs can negatively impact bacterial growth, substantial concentrations of tRNAs remain deacylated even under nutrient-rich conditions. Here we show that tRNA charging in

Bacillus subtilis is not maximized due to optimization of aaRS production during rapid growth, which prioritizes demands in protein synthesis over charging levels. The presence of uncharged tRNAs is alleviated by precisely tuned translation kinetics and the stringent response, both insensitive to aaRS overproduction but sharply responsive to underproduction, allowing for just- enough aaRS production atop a “fitness cliff”. Notably, we find that the stringent response mitigates fitness defects at all aaRS underproduction levels even without external starvation.

Thus, adherence to minimal, flux-satisfying protein production drives limited tRNA charging and provides a basis for the sensitivity and set-points of an integrated growth-control network.

59 Introduction

Aminoacyl-tRNA synthetases (aaRS) catalyze a universal step in the synthesis of proteins by ligating amino acids to the appropriate tRNA, a processed termed “tRNA charging” . Their enzymatic activities determine two interrelated but distinct properties: the flux of charged tRNAs to the ribosome and the steady-state levels of uncharged tRNAs, both of which are important for regulating cell growth (Figure 1A). First, the flux of tRNA aminoacylation is directly related to the overall rate of peptide chain growth and hence the rates of mass accumulation and growth in bacteria (Bremer and Dennis, 2008; Dai et al., 2016). Second, the steady-state level of charged

(acylated) tRNAs is inversely related to the level of uncharged (deacylated) tRNAs, which have several negative effects on cell physiology. On a global scale, uncharged tRNAs trigger stress responses that slow down cell growth in both bacteria and eukaryotes (Hinnebusch, 2005;

Potrykus and Cashel, 2008; Srivatsan and Wang, 2008). In bacteria, uncharged tRNA levels are sensed by the (p)ppGpp synthetase RelA, whose activity reduces growth via a variety of pathways collectively referred to as the stringent response (Gourse et al., 2018; Hauryliuk et al.,

2015; Liu et al., 2015). On the molecular scale, reduced tRNA charging is thought to cause ribosome pausing, which has adverse effects on mRNA stability and co-translational protein folding (Gloge et al., 2014; Hanson and Coller, 2018; Rauscher and Ignatova, 2018). Given the importance of aaRS’s dual roles, it is plausible that aaRS production is tuned to both minimize uncharged tRNA levels and provide ample capacity for charging flux during rapid cell growth.

However, a substantial concentration of uncharged tRNAs has been frequently observed in vivo even under rich nutrient conditions across divergent bacteria (Avcilar-Kucukgoze et al.,

2016; Dittmar et al., 2005; McClain et al., 1999; Tockman and Vold, 1977; Yegian et al., 1966).

For a wide spectrum of tRNAs – both within the same organism as well as across divergent

60 bacterial species, the concentrations of uncharged tRNAs and total tRNAs are close to the same order of magnitude. This is despite the fact that the production of many aaRSs is directly set by uncharged tRNA levels (and not charging flux) via various negative autoregulatory mechanisms

(Giegé and Springer, 2016; Gutiérrez-Preciado et al., 2009; Putzer et al., 1995; Vitreschak et al.,

2008), suggesting that the setpoints of these feedback loops have evolved to avoid high levels of aaRSs and low levels of uncharged tRNAs. The potential tension between high concentrations of uncharged tRNA and their pleiotropic effects points to a global optimization problem of aaRS production that prevents maximal tRNA charging by cells.

Understanding this potential optimization in aminoacylation requires a systems-level characterization of the molecular and physiological consequences of altering aaRS production rates. A priori, it is unclear whether charging levels are sensitive to small perturbations in aaRS production, as multiple enzymes have been shown to be produced with excess capacities (Davidi and Milo, 2017; O'Brien et al., 2016; Peters et al., 2016; Sander et al., 2019). Even if charging levels can be increased with aaRS overproduction, whether higher charging leads to faster translation depends on additional biochemical parameters in vivo, such as the kinetics of ribosome translocation and availability of EF-Tu. Furthermore, elevated aaRS production may incur benefit-offsetting burdens on cell fitness, including cost of synthesis or synthetase specific overproduction defects (Dekel and Alon, 2005; Dong et al., 1995; Scott and Hwa, 2011;

Sherman et al., 1992; Swanson et al., 1988). These considerations contribute to the fitness landscape of cells relative to aaRS production, which is potentially modified by the stringent response that connects growth rates to charging levels. In general, detailed quantitative relationships between protein production, biochemical activities, and phenotypic outcomes have been difficult to establish.

61 Here we determined the dependency and mechanistic underpinnings of bacterial cell growth on aminoacyl-tRNA synthetase production. High-precision measurements of cellular fitness and aaRS production rates show that these proteins are produced with little excess in order to optimize the growth rate of Bacillus subtilis in nutrient-rich conditions. Although tRNA charging is not maximized at the growth-optimized aaRS levels, the charging capacity matches the flux required to support translation elongation. At native aaRS production levels, the substantial pool of uncharged tRNAs does not lead to pleiotropic effects, due in part to a lack of strong activation of the stringent response. Nevertheless, cells remain exquisitely sensitive to even slight aaRS underproduction despite being relatively insensitive to overproduction.

Notably, although the stringent response is known for curbing growth during external amino acid starvation, we find that it partially alleviates growth defects during aaRS underproduction – a condition in which de novo does not provide benefits. Ribosome profiling reveals that stringent null cells display both a lack of coordination of gene expression and exacerbated translation kinetic defects during aaRS underproduction. Together, these results show that the circuits for both controlling and sensing tRNA aminoacylation are collectively tuned to optimize bacterial growth, leading to unexpected properties of cellular mechanisms that are elucidated only with a fully integrated analysis.

Results aaRS production is growth optimized on an acutely asymmetric landscape

We first mapped the quantitative relationship between cell proliferation rates and individual aaRS protein production in B. subtilis, a relationship we refer to as the fitness landscape (Figure

1B). Four essential synthetase genes were chosen as models (argS, ileS, leuS, and serS) for

62 several key reasons. First, they reside in their own monocistronic operons, which allow for targeted perturbations. Second, these aaRSs are not heteromeric, therefore eliminating the need to maintain stoichiometric production of multiple subunits exogenously. Third, the regulation of these genes is representative of aaRS regulation in B. subtilis, which includes both feedback- regulation by T Box elements (ileS, leuS, and serS) and constitutive expression (argS). Although it is expected that cell growth would be affected at extreme levels of aaRS underproduction, the precise mapping of aaRS production onto fitness has not been determined.

In order to systematically vary aaRS production, we first moved each gene outside its native context and placed it under the control an IPTG-inducible promoter (Britton et al., 2002).

Using this system, the effective aaRS production relative to the wildtype level ranges from tenfold underproduction to tenfold overproduction, as estimated by considering the IPTG- dependent promoter strength and the translation efficiency of the ectopic mRNA (measured by ribosome profiling) (Figure S1A, STAR Methods). At each IPTG concentration, the competitive disadvantage during exponential growth was measured using a pooled competition assay with

DNA barcoding, which can reliably detect 0.5% differences in the net steady-state growth rate

(Figure S1B). Fitness for aaRS inducible strains is defined as the growth rate compared to barcoded control strains with the same antibiotic resistance markers. The combination of these methods provides a generalizable approach for mapping the precise dependence of fitness on individual protein production.

The fitness landscapes for all synthetases showed a strong asymmetry between low and high levels of production (Figure 1C). At low levels of aaRS, fitness is acutely sensitive to aaRS production. This ‘fitness cliff’ may arise from insufficient flux of tRNA charging for cell growth or premature activation of the stringent response. By contrast, fitness is much less sensitive to

63 high levels of aaRS, which may be due to a lack of changes in tRNA charging or reduced dependence of fitness on tRNA charging. The mechanistic basis underlying these behaviors is central to understanding bacterial growth regulation and is examined in further detail below.

We found that the wildtype production rates of aaRS are growth-optimized on the fitness landscape, placing cells near the edge of the fitness cliff. For all synthetases tested, no other aaRS expression level leads to a measurable increase in growth rate. Near the growth-optimized levels, cells with twofold less aaRS production have an average fitness defect of 1% (Figure 1D) and are steadily lost in the competition assay. In contrast, a greater than tenfold overproduction is required to produce a similar fitness defect (Figure 1D). The observed fitness landscape is quantitatively similar across different aaRSs, including SerS whose tRNA charging levels are often low and could potentially benefit from overproduction (Avcilar-Kucukgoze et al., 2016;

Dittmar et al., 2005; Ferro et al., 2017). The asymmetric sensitivity is further confirmed by measuring mass doubling times at selected concentrations (Figure S1C). Despite the comparatively small fitness effects of overproduction, cells appear to produce just enough aaRS proteins, leaving limited buffering capacity for tRNA charging for each of the aaRSs. Together, the asymmetric fitness landscape surrounding the growth optimum suggest that there is a non- trivial transition in the level of tRNA charging or dynamics of mRNA translation near native aaRS production.

64

Figure 1. Fitness landscape of tRNA synthetase production shows growth optimization near a fitness cliff (A) Model describing the dual roles of aminoacyl-tRNA synthetase (aaRS) concentrations and their relation to the cellular growth rate (black lines). The production of aaRS is subject to negative feedback regulation by uncharged tRNAs in Bacillus subtilis (grey line). (B) Fitness landscape defined as the effective relationship between aaRS production and exponential-phase population growth rate. To measure the fitness landscape, we removed the negative feedback regulation to allow direct control of aaRS production. (C-D) Empirical fitness landscape of aaRS production. For each aaRS, gene expression is tuned by IPTG and the effective protein production is calibrated to its native levels using a method based on ribosome profiling (STAR Methods). Relative fitness is defined as the relative growth rate compared to barcoded control strains in pooled competition experiments. Yellow shaded regions indicate the approximate location of the fitness cliff. Data near wildtype fitness levels (dashed box) are depicted in (D). Error bars indicate standard error of fitness measurements for individually barcoded, biological replicates. Grey area indicates experimental uncertainty in fitness values relative to wildtype (=0.5%). Asterisks indicate p-value <0.05 (one asterisk) and <0.01 (two asterisks) from two-sample one-tailed Student’s t test comparing barcoded replicates between control and aaRS induction experiment. See also Figure S1.

65 aaRS overproduction boosts tRNA charging despite a lack of fitness benefit

We next determined whether tRNA charging ceases to increase with aaRS overproduction, which could explain a lack of benefit in the overproduction regime (Figure 2A). Using Northern blotting to quantify aminoacylated and deacylated tRNAs, we found that the charging levels in wildtype B. subtilis cells were between 50-65% (Figure 2B), near the levels observed in other bacterial species (Avcilar-Kucukgoze et al., 2016; Dittmar et al., 2004; Yegian et al., 1966). For all aaRSs tested, ectopic overproduction (5 to 20-fold above native levels) increased the charging levels of cognate tRNAs, in some cases to nearly 90% (Figure 2B). These results indicate that the lack of fitness benefit beyond native aaRS levels is not due to cessation in tRNA charging increase.

To independently confirm these results, we created reporter systems utilizing the native autoregulation for aaRS genes (Figure 2C). In B. subtilis, most aaRS genes have T box leader elements in their 5’ UTR, which bind to uncharged tRNA and activate transcription in a concentration-dependent manner (Green et al., 2010; Gutiérrez-Preciado et al., 2009). Taking advantage of these sensors, we fused the T Box with a gene encoding GFP at the locus that originally harbors the native copy of synthetase gene. In this broken autoregulatory system, increases in uncharged tRNA levels can be read out by increases in gfp expression as measured by qRT-PCR (Figure 2C). These reporters confirmed that aaRS overproduction leads to higher charging (decreased gfp expression) (Figure 2D). They also showed that the riboswitches (and consequently charging level) are continuously responsive to a wide range of aaRS production, i.e. both under and overproduction (Figure 2D, S2A). We note that the magnitude of changes in gfp expression is dependent upon the sensitivity of the T Box elements and is not a quantitative readout of uncharged tRNA levels. Together with the fitness landscape profiling, these results

66 suggest that the benefits of reducing uncharged tRNA levels during aaRS overproduction are limited and potentially negated by other processes in the cells.

Figure 2. aaRS overproduction reduces uncharged tRNA levels (A) Simplified relationship between aaRS production and uncharged tRNA levels. In practice, overproduction of aaRS may not reduce uncharged tRNA levels if there is already excess charging capacity at native levels of aaRS production. (B) Representative northern blots to determine tRNA charging levels in wildtype and aaRS overproduction strains. A control sample of deacylated tRNA was used to determine location of uncharged tRNAs. The middle lane of each blot contains tRNA extracted from wildtype cells (strain 168), and the final lane (o.p.) contains tRNA from aaRS induction strains with 500 μM IPTG (3.7x, 16x, 14x, and 20x overproduction for LeuS, ArgA, IleS, and SerS, respectively). Fraction of charged tRNA was calculated as the ratio between the lower band and the sum of the upper and lower band. Due to a lack of sufficient separation of aminoacylated serine tRNA, quantification was not determined (n.d.). (C) Schematic of reporter used to measure changes in uncharged tRNA levels. The native copy of the aaRS gene was replaced with a gfp gene and moved under the control of an IPTG-inducible promoter (pSpankHy) at the amyE locus in the genome. (D) T box reporter activity over range of aaRS production. For each inducible aaRS strain, the change in uncharged tRNA levels was reported

67 by the fold change (FC) of gfp compared to induction conditions with near native aaRS production (indicated). Induction conditions with both under and over native production was tested and gfp mRNA measured by qRT-PCR. The native argS locus does not contain a T box and therefore was not tested in this assay. See also Figure S2.

Translation elongation is sensitive to decreased but not increased aaRS production

We next show that a transition in translation elongation kinetics underlie the saturated growth response associated with elevated aaRS production (Figure 3A). To estimate the relative ribosome transit time at each codon, we performed ribosome profiling on wildtype cells and the inducible aaRS strains across the expression landscape. The mean ribosome occupancy for each codon accounts for the total transit time across a full elongation cycle (Wu et al., 2019), including the waiting times for ternary complex binding and ribosome translocation, that determines the net rates of peptide chain synthesis and cell growth (Figure 3B).

We found that ribosome transit time is sensitive to aaRS underproduction but not overproduction. aaRS reduction leads to increased ribosome occupancy for the cognate codons

(Figure 3C and S3A,C), and even a two-fold reduction from the native production rates has a noticeable effect. Although the slower growth rates in these conditions may reduce global tRNA expression, such a global decrease is not expected to lead to the amino acid specific ribosome pausing that we observed when the corresponding aaRS is underproduced. Furthermore, we found that the levels of tRNA cognate to the underproduced aaRS are not reduced compared to other tRNAs, demonstrating that decreased tRNA charging is the major mechanism leading to increased ribosome pausing (Figure S3E). Consistent with previous reports in the context of amino acid limitation (Elf et al., 2003; Subramaniam et al., 2013a; 2013b), we observed a hierarchy of sensitivity among the cognate codons for both LeuS and SerS, where only four of the six codons saw a substantial increase in ribosome occupancy upon aaRS reduction. Such

68 differential sensitivity may arise from differences in the ratio of codon usage and tRNA abundance (Elf et al., 2003; Subramaniam et al., 2013a; 2013b). These results show that slower translation due to reduced charging is associated with the acute sensitivity on the fitness landscape upon aaRS underproduction.

In contrast to underproduction, aaRS overproduction does not lead to substantial changes in the elongation kinetics. For codons cognate to the inducible synthetase, we observed similar ribosome occupancy during aaRS overproduction compared to wildtype (Figure 3D and S3B, D), suggesting that the elongation cycle is limited by other processes or that the increase in charged tRNA levels is not sufficient to accelerate translation elongation. Although it is possible that the transit time decreases below our limit of detection, the lack of sensitivity of elongation kinetics at elevated tRNA charging levels is consistent with several lines of evidence in the literature suggesting that the ternary complex concentration is near saturation for the elongation cycle (Dai et al., 2016; Gouy and Grantham, 1980; Klumpp et al., 2013; Subramaniam et al., 2014).

Consequently, elevated aaRS production has little to no benefit on translation, even though tRNA charging is increased. In addition to this limited benefit, the observed decrease in fitness with elevated aaRS production may arise from the costs associated with unnecessary protein synthesis (~2% of global protein synthesis capacity with 10-fold overproduction for individual aaRS) (Dong et al., 1995; Scott et al., 2010). Together, these results show that the wildtype levels of aaRS production are set to provide just enough capacity for supporting translational flux instead of minimizing uncharged tRNA levels.

69

Figure 3. Translation elongation kinetics change only with aaRS underproduction (A) Relationship between aaRS production and translation elongation rate to be tested. (B) Simplified schematic of the translation elongation cycle. The step of ternary complex binding (1) depends on aaRS production, but may not be rate-limiting for the elongation cycle. (C) Changes in codon occupancy across aaRS production. Average codon occupancy for indicated codons was acquired by ribosome profiling at the specific aaRS induction levels for LeuS and SerS (STAR methods). 81 genes for LeuS induction and 218 genes for SerS induction passed the depth threshold in all experiments and were used for analysis. (D) Average ribosome occupancy for all 61 sense codons for aaRS overproduction and control. See also Figure S3.

70 The stringent response is protective during aaRS underproduction in rich media

The omnipresence of uncharged tRNAs, even at native aaRS production, suggests that (p)ppGpp and the stringent response may play an important role in shaping the fitness landscape. The physiological consequences of the stringent response are well-studied under conditions of external amino acid starvation, but how the response affects aaRS optimization under nutrient- rich conditions is unknown. Gene expression profiling by RNA sequencing across the aaRS production landscape showed signatures of elevated (p)ppGpp levels only during aaRS underproduction, including branch chain amino acid biosynthesis and CodY-controlled gene upregulation (Figure 4A, S4A) (Eymann et al., 2002; Kriel et al., 2014; Sonenshein, 2005). This upregulation was prominent despite the fact that excess amino acids are provided in the media. A major role of the stringent response during amino acid limitation is growth reduction by diverting gene expression away from genes involved in protein synthesis and towards those involved in de novo amino acid biosynthesis. However, since amino acids are readily available in the media, spending resources on creating biosynthetic enzymes is unnecessary and could be wasteful, thereby amplifying the effects of insufficient tRNA charging during aaRS underproduction.

Alternatively, the redistribution of resources may instead be beneficial due to reduction in ribosome production. These two models predict opposite growth effects upon removal of the stringent response (Figure 4B).

To test the effect of the stringent response on cell growth when aaRS is underproduced in rich media, we measured fitness of B. subtilis strains with inducible aaRS but lack the ability to activate RelA-dependent (p)ppGpp synthesis. In these strains, we introduced a point mutation to the native copy of relA (relA D264G) that has been previously shown to abolish the (p)ppGpp synthetase activity while retaining the ability to hydrolyze background levels of (p)ppGpp, as

71 evidenced by no detectable (p)ppGpp during serine hydroxamate treatment (Nanamiya et al.,

2008). As a control, we found that this mutation has only a mild effect on growth at native aaRS levels (0.6 ± 0.3% relative fitness) and few gene expression changes (Figure S4B, S4C).

However, upon aaRS underproduction, strains without the stringent response showed further exacerbated fitness cliffs as measured by the competition assay (Figure 4C). Across all aaRSs tested, removal of (p)ppGpp synthetase activity leads to an additional decrease in fitness of

~2.5% at ~2-fold aaRS underproduction, and ~25% at ~10-fold underproduction. The severe growth defects were further confirmed by measuring mass doubling of individual cultures

(Figure S4C). These results indicate that, instead of curtailing bacterial growth rates, the stringent response in fact enhances growth when there is insufficient aminoacyl-tRNA synthetase activity in the absence of external starvation.

72

Figure 4. Removal of the stringent response negatively impacts cell growth during aaRS underproduction (A) Induction of CodY-controlled amino acid biosynthetic genes as a proxy for stringent response activation across aaRS expression. RNA-seq was performed on strains producing IleS and ArgS over a range of induction conditions. The fold change in transcriptome fraction (RPKM) for genes related to amino acid metabolism was compared to wildtype. See Table S1 for list of genes. aaRS production was calibrated in the same way as Figure 1C. (B) Potential models for aaRS fitness landscape for cells lacking the stringent response. The stringent response may amplify the effects of insufficient tRNA charging (Regulation detrimental), or reduce adverse effects (Regulation beneficial). (C) Effects on aaRS fitness landscape of stringent response removal. Strains with stringent response intact (circles) were competed against stringent null cells (squares) across the aaRS production landscape. Measurements of fitness and aaRS production were calculated as in Figure 1. See also Figure S4.

73 We next characterized the molecular defects in mRNA translation associated with the reduced fitness in stringent-null cells during aaRS underproduction. In E. coli, the stringent response enforces a linear relationship between growth rate and the production of translation- related proteins (Scott et al., 2010; Zhu and Dai, 2019), such that ribosomes are operating near capacity at most growth rates. Using ribosome profiling to quantify the allocation of global protein synthesis to translation-related factors, we found that B. subtilis cells with aaRS underproduction adhere to a similar linear relationship when the stringent response is present

(Figure 5A). This applies to both core ribosomal proteins and translation factors, such as other aaRSs that are not externally controlled. However, the coordinated reduction is lost in cells without the stringent response, which overproduce ribosomal proteins and translation factors without matching production of supporting aaRSs (Figure 5A and S5A).

We also found that the uncoordinated production of translation machinery is accompanied with exacerbated ribosome pausing and can even lead to ribosome queuing during translation. For cells that cannot activate the stringent response, ribosome profiling data showed higher ribosome occupancy at the starved codons compared to cells with an intact stringent response (Figure 5B). In the case of SerS, underproduction leads to elevated ribosome occupancy at regularly spaced intervals upstream of severely starved codons, suggesting that additional ribosomes are queued behind the waiting ribosome (Figure 5C). This effect is only observed when the stringent response is removed (Figure S5B). Ribosome queuing is potentially a consequence of overcapacity of protein synthesis in strains deficient in the stringent response, which could in turn further exacerbate the growth defect of the cell. In summary, these findings show that RelA-dependent (p)ppGpp synthesis helps avoid adverse effects on translation during forced reduction in tRNA charging.

74 Importantly, we found that the majority of gene expression changes during aaRS underproduction are mediated through the stringent response. In addition to the translation machinery, the bulk of the changes between wildtype and aaRS underproduction involved increased expression of known genes that promote survival under starvation, such as those regulated by CodY and SigB that fail to increase in the stringent null cells (Eymann et al., 2002;

Zhang and Haldenwang, 2003) (Figure S5C). Overall, stringent null cells during aaRS underproduction show relatively few expression changes compared to WT (with intact stringent response and native aaRS levels) (Figure S5C). Of the few increases in expression selectively in the stringent null cells, we find that clpE, a AAA+ unfoldase and subunit of the ClpE-ClpP protease, is massively increased in expression, indicating potential protein folding stress due to the observed translation pausing. However, despite the few changes and lack of wasteful amino acid biosynthesis expression due to growth in rich media, stringent null cells still display lower fitness. Taken together, we find that the benefits of the coordinated production of translation components outweigh the unnecessary production of amino acid biosynthetic enzymes, making the stringent responsive protective against perturbations in aaRS levels in rich media.

75

Figure 5. Stoichiometry and kinetics of translation machinery are perturbed without stringent response (A) Fraction of protein synthesis devoted to translation-related genes as a function of relative fitness in cells with and without the stringent response. For each inducible aaRS strain at each induction level profiled, the relative fitness as measured by the competition assay was plotted against the fraction of protein synthesis (rpkm) as measured by ribosome profiling for ribosomal proteins and aminoacyl-tRNA synthetases (STAR Methods, Table S1). Stringent positive cells are plotted as closed circles, stringent null cells as open squares. Arrows point to changes when the stringent response is removed at identical induction conditions. (B) Ribosome occupancy changes due to stringent response removal for IleS and SerS underproduction. Average codon occupancy for 61 sense codons with matched induction conditions is plotted for cells with and without the stringent response. Induction conditions lead to 0.18X and 0.12X native production levels of IleS and SerS based on calibration. Colored codons indicate cognate codons to the inducible aaRS. 310 genes for IleS induction and 218 genes for SerS induction passed the depth threshold in paired experiments and were used for analysis. (C) Metagene analysis of ribosome pausing and queuing during SerS underproduction in strains lacking the stringent response. Genes were aligned from each of the 61 codons and averaged across them. Difference in ribosome occupancy between SerS underproducing cells without the stringent response and wildtype is plotted. Red traces indicate serine codons. For full calculation see STAR Methods. See also Figure S5.

76 Native levels of uncharged tRNAs cause limited stringent response in rich media

If the stringent response remains activated by the native levels of uncharged tRNAs, it could adversely modify the optimal levels aaRS production. However, we found that in the absence of aaRS perturbations, the stringent null cells have similar gene expression patterns as wildype cells, including ribosomal protein genes, amino acid biosynthetic genes, and CodY-controlled genes (Fig. S4C). At the level of translation, eliminating the stringent response with native aaRS production does not lead to substantial differences in ribosome pausing (Fig. S5D). These results corroborate the limited fitness effects of the relA D264G mutation in otherwise wildtype cells under nutrient-rich conditions (Fig. S4B). Further increasing aaRS production from native levels also showed limited differences in (p)ppGpp-regulated genes, ribosome pausing, and fitness (Fig.

3D, S3B, 4A and S4A), which is in stark contrast to aaRS underproduction in the same media.

Therefore, the stringent response is sensitively programmed to activate with slight aaRS underproduction but not at native levels (Figure 4A and S4A), even though the latter state is also associated with substantial amounts of uncharged tRNAs. Taken together, these data show that the stringent response participates in ameliorating the fitness cliff without affecting optimal aaRS production rates.

Discussion

Here we thoroughly characterized the mechanistic underpinnings of bacterial growth optimization with respect to aminoacyl-tRNA synthetase production. By profiling the molecular and physiological effects that contribute to the overall global fitness landscape, we showed that the production of aaRS proteins is tuned to satisfy the requirement of tRNA charging flux and not to minimize the steady-state levels of uncharged tRNAs during growth in rich media.

77 Moreover, our work yielded insights into several fundamental biological processes that have been well studied in isolation but are nevertheless extensively cross-regulated in living cells.

First, we found that native levels of aaRS production provide limited excess capacity to support translation. For the synthetases examined, a slight decrease from native production rates results in slower translation at cognate codons and a fitness cliff (Figure 6), suggesting that the enzymes are normally operating near saturation in fast growth conditions. This ‘just enough’ aaRS production stands in contrast to the situations of enzymatic over-capacity proposed for other metabolic enzymes, which are thought to be advantageous by providing a buffer against environmental or stochastic fluctuations (Davidi and Milo, 2017; O'Brien et al., 2016; Peters et al., 2016; Sander et al., 2019). Our findings corroborate a recent study on the origin of haploinsufficient genes in yeast suggesting that the expression of haploinsufficient genes are limited by fitness costs associated with their overexpression (Morrill and Amon, 2019). In the case of aaRSs, we found that even a small fitness defect of overproduction, by comparison to underproduction, could drive evolution towards an expression that is next to the fitness cliff. In addition, the homeostatic feedback of uncharged tRNAs on aaRS production may further ensure that cells maintain sufficient tRNA charging capacity for translation elongation (Figure 6).

Whether or not these same principles apply to growth in nutrient poor conditions when amino acids must be synthesized de novo remains to be explored. Finally, given that all four aaRSs tested here show limited excess capacity, our findings demonstrate that bacterial growth may be limited simultaneously by multiple enzymes in the terms of their underproduction and not overproduction, akin to ribosomal production.

78

Figure 6. An integrated view of growth optimization for aaRSs. At growth optimized levels of aaRS production (vertical dashed line), tRNA charging is not yet maximized. However, further increasing aaRS production and tRNA charging does not result in faster translation elongation, leading to no fitness benefit during overproduction. The native levels of aaRS production, with just enough capacity for the amino acid flux into protein synthesis, are positioned near the edge of the fitness cliff in the highly asymmetric landscape. The sharp sensitivity to aaRS underproduction is partially alleviated by the stringent response despite the unnecessary activation of amino acid biosynthesis genes in rich media. Furthermore, the T box- based autoregulation buffers perturbations to aaRS production from both directions, which is enabled by having non-maximized tRNA charging at the native levels.

Second, our results show that the feedback regulation for aaRS synthesis is programmed to prevent both overproduction and underproduction despite the highly asymmetric fitness landscape (Figure 6). As a homeostatic mechanism, negative feedback regulation may not be constantly active. In one design scenario, the feedback may be constitutively on and inactivated only during reduced output to ensure sufficient charging capacity. Alternatively, it may be

79 constitutively off and activated only during excessive output to prevent wasteful aaRS production. Although the extreme sensitivity to aaRS underproduction suggests a priority to ensure sufficient charging capacity, our T box reporters show that the native autoregulation remains responsive to both under and overproduction, which is consistent with a previous report for thrS and valS (Gendron et al., 1994; Luo et al., 1997). Therefore, the negative feedback loop for aaRS has not evolved to be a merely single-purpose device that is protective in only one direction. Importantly, this ability to respond in both regimes is rooted in the fact that the signal for feedback, i.e. tRNA charging, is not fully saturated at the native aaRS production (Figure 6).

Finally, similar to how translation speed is only sensitive to aaRS underproduction, global growth regulation by the stringent response is activated with mild aaRS reduction while having limited effects at native uncharged tRNA levels. Although it is possible that some

(p)ppGpp is produced with native aaRS production or even in the relA D264G mutant used here, our gene expression data for aaRS overproduction suggest that these basal uncharged tRNA levels do not substantially activate the stringent response. Given that the native uncharged fraction is estimated to be 10-50% for most tRNAs, the concentration of uncharged tRNAs cannot increase drastically with a small decrease in aaRS production. How the stringent response is programmed to be sharply activated by small changes in uncharged tRNA levels remains to be resolved, and recent structural analyses on how RelA is activated on the ribosome may provide insights (Loveland et al., 2016; Winther et al., 2018). Upon activation, the stringent response curbs ribosome biogenesis and promotes de novo amino acid biosynthesis. However, the latter likely leads to wasteful production of anabolic enzymes when aaRS is limiting in the presence of externally supplied amino acids. It is therefore striking that we observed a strong protective effect of the stringent response during aaRS limitation (Figure 6). We propose that the

80 maintenance of protein stoichiometry for the translation apparatus contributes to the protective effect. Without the stringent response, ribosomal protein and some translation factors remain highly expressed, whereas most aaRSs (in addition to the limiting one) have reduced expression in line with the growth rate. The resulting imbalance between ribosomes and aaRSs, together with defects in other (p)ppGpp-dependent regulation (Srivatsan and Wang, 2008; Trinquier et al.,

2019), may lead to the observed translation defects, including exacerbated ribosome pausing and queuing. This result underscores the importance of proper stoichiometry among functionally associated proteins and offers an explanation for the broadly conserved protein stoichiometry across a wide range of species (Lalanne et al., 2018).

Elucidating the interdependence between gene expression, protein activities, and cellular fitness is a fundamental problem in systems biology. With the rapidly expanding knowledge of gene expression and protein functions, a key challenge is to understand how these molecular properties are integrated in vivo to define the behavior of a cell. Our work outlines an approach to connect the fitness landscape to its mechanistic understandings, providing powerful insight into the optimization of protein abundance and regulation.

81 Supplemental Figures & Legends

Figure S1. Determination of the fitness landscape of aaRS production. Related to Figure 1. (A) Workflow to determine the fitness landscape of aaRS production. To measure fitness (y- axis), competition experiments with DNA barcoded strains were mixed and grown with constant

82 backdilution to maintain exponential growth. After DNA harvest, amplicon amplification, and sequencing, strains of interest were counted against a control strain. Linear fit of the log2 ratio was calculated to determine the fitness score. To calibrate aaRS production (x-axis), aaRS inducible strains were grown under a specified induction condition (500 μM) and ribosome profiling was performed to measure aaRS production rates at that condition. Additionally, a GFP reporter under the same inducible promoter as the aaRS genes was grown with the same induction conditions as the competition experiment and gfp expression was measured by qPCR. gfp mRNA level at each discrete inducer concentration was used to calibrate changes in aaRS production rates. Full details on competition experiments and calibration are provided in the Methods section. (B) Reproducibility of fitness measurements. Pairwise fitness comparisons between barcoded strains are shown. For all competition experiments at each discrete [IPTG], the absolute value of the difference in fitness for each strain-BC pair for the same strain-type was calculated and plotted for two control strain-types (top) and all strain-types (including aaRS inducible strains, bottom). (C) Representative growth curves of strains with inducible aaRS production. Cells were grown in fresh MCCG media (STAR Methods) with specified inducer concentration. OD600 readings were taken during the exponential phase of growth and fit to an exponential curve. Fitness was calculated as the doubling time of the wildtype divided by the doubling time of the induction experiment (GC) and compared to fitness measurement from competition experiment (CE).

Figure S2. T box reporter response to aaRS production changes. Related to Figure 2. (A) T box reporter response at different aaRS production levels. Log2 fold change of gfp compared to near native aaRS production from Figure 2D plotted against the corresponding calibrated aaRS production from Figure 1C.

83

84 Figure S3. Translation elongation kinetics slowed with underproduction but not increased with overproduction of aaRS. Related to Figure 3. (A) Changes in codon occupancy across IleS production. Average codon occupancy for indicated codons was acquired by ribosome profiling at the specific aaRS induction levels for IleS. (B) Average ribosome occupancy for all 61 sense codons for IleS overproduction and control. 340 genes for IleS induction passed the depth threshold and were used for analysis in (A) and (B). (C and D) Codon occupancy plots for underproduction (C) and overproduction (D) of ArgS. 420 genes passed the depth threshold and were used for analysis. The average codon occupancy was calculated for all 61 sense codons with cognate codons colored accordingly. The calibrated aaRS production rate is indicated in the upper right. (E) rRNA and tRNA expression changes between wildtype and aaRS reduction. RNA-seq on total RNA was performed on wildtype cells and strains underproducing IleS and ArgS. Ribosomal RNA reads were mapped to a single rRNA operon. tRNA and rRNA are colored red and orange respectively. tRNA cognate to the underproduced aaRS (arginine and isoleucine) are colored according to experiment (green and purple). Light grey line indicates no change between experiments and dark grey lines indicate two-fold expression changes.

85

86 Figure S4. Confirmation that the stringent response is activated and beneficial during aaRS underproduction. Related to Figure 4. (A) Induction of CodY-controlled genes as a proxy for stringent response activation across aaRS expression. RNA-seq was performed on strains producing IleS and ArgS over a range of induction conditions. The fold change in transcriptome fraction for genes controlled by CodY but not related to amino acid biosynthesis was compared to wildtype. See Table S1 for list of genes. aaRS production was calibrated in the same way as Figure 1C. (B) Fitness of strains bearing the relA D264G mutation but otherwise wildtype across all competition experiments. A single barcoded strain with the relA point mutation was competed in two experimental replicates across a range of IPTG concentrations and the fitness was compared to the control strain. (C) Gene expression changes between wildtype B. subtilis and stringent null cells (relA D264G). Ribosome profiling of wildtype cells was compared to cells with the relA D264G point mutation. Light grey line indicates no change between experiments and dark grey lines indicate two-fold expression changes. Genes related to amino-acid biosynthesis, CodY-controlled genes, or translation components are colored in purple, blue, and red respectively. (D) Fitness landscapes with and without an active stringent response measured with by-hand growth curves. Cells with inducible aaRS expression were grown at discrete concentrations of IPTG. OD600 was measured across the exponential phase of growth and the doubling time calculated. Fitness was calculated as the doubling time of the wildtype 168 parent strain divided by the doubling time of the experimental strain at the discrete concentration of IPTG. Strains with wildtype stringent response (open circles) were compared to those with the relA D264G mutation (filled squares).

87

Figure S5. Stringent response reduces translation defects during aaRS underproduction. Related to Figure 5. (A) Fraction of protein synthesis devoted to translation factors and core transcription components as a function of relative fitness in cells with and without the stringent response. For each inducible aaRS strain at each induction level profiled, the relative fitness as measured by the competition assay was plotted against the fraction of protein synthesis as measured by ribosome profiling (STAR Methods, Table S1). Stringent positive cells are plotted as solid circles, stringent null cells as open squares. Arrows point to changes in proteome fraction when the stringent response is removed at identical induction conditions. (B) Traces of ribosome occupancy during SerS underproduction with stringent response present. Traces for ribosome occupancy in cells with reduced SerS production but with a wildtype stringent response were calculated in the same manner as Figure 6C. (C) Gene expression changes in cells with and

88 without the stringent response. All gene expression changes for matched SerS and IleS underproduction ribosome profiling experiments with and without the stringent response (Figure 5B) were compared to each other and wildtype. Genes related to amino-acid biosynthesis, CodY controlled genes, SigB controlled genes, translation components, and clpE are colored in purple, blue, green, red, and orange respectively. See Table S1 for list of genes compared. Light grey, dark grey, and yellow lines indicate no, two-fold, and five-fold change respectively. (D) Average codon occupancy comparison between stringent positive and stringent negative cells. Using the same ribosome profiling dataset as Figure S3C, average codon occupancy for 61 sense codons was calculated as described previously. 362 genes passed the depth threshold in both experiments and were used for analysis.

Figure S6. Comparison of aaRS quantification methods. Related to Methods section on Calibration of aaRS production. (A) Direct comparison between aaRS quantification methods. For each ribosome profiling experiment with inducible aaRS strains at the IPTG concentration listed two quantities were calculated. ‘Relative’ aaRS expression was calculated as the fold change in proteome fraction for the controlled aaRS in the experiment compared to wildtype. Calibrated aaRS production is calculated as described in the method section and Figure S1A. Double asterisks indicate experiments on aaRS inducible strains with the relA D264G mutation (stringent null). (B) Fraction of protein synthesis for each aaRS across all experiments. Ribosome profiling was used to directly calculate the fraction of protein synthesis for each aaRS in each aaRS induction experiment and wildtype (aaRS production relative to the rest of the proteome). Fraction of protein synthesis for the aaRS from wildtype experiment is plotted with a filled square and grey line. Open squares indicate measured fraction of protein synthesis during IPTG induction of individual aaRS.

89 Acknowledgments

We thank members of the MIT BioMicro Center for help in design and completion of the high- throughput sequencing; members of the Li lab and Grossman lab for discussions. This research is supported by NIH R35GM124732, Pew Biomedical Scholars Program, a Sloan Research

Fellowship, Searle Scholars Program, the Smith Family Award for Excellence in Biomedical

Research, an NIH Pre-Doctoral Training Grant (T32 GM007287, to D.J.P. and G.E.J.), an NSF graduate research fellowship (to G.E.J.), an NSERC graduate fellowship (to J.B.L.), and an

HHMI International Student Fellowship (to J.B.L.). S.K.’s work in the laboratory of M.K.W. is supported by the HHMI and NIH Grant R01-AI-042347.

90 Methods

Strain construction

Bacillus subtilis 168 strain was used as the wild-type strain and base strain for transformations.

All further strains were constructed using natural competence as described below. For all assays, cells were grown in a defined rich minimal media (MCCG) described below. All inducible aaRS strains were created by first cloning individual aaRS ORFs under the pSpankHY promoter system in a plasmid containing the lac , a resistance cassette, and flanked by 500 nt homology arms of the amyE locus (Britton et al., 2002). After creation, plasmids were linearized, transformed into B. subtilis 168, and selected with 100 μg/mL spectinomycin. To remove the native copy of the aaRS gene, a transcriptional fusion of GFP and a resistance cassette (CmR) was assembled using Gibson assembly with additional flanking 500-nucleotide homology arms to the aaRS locus of interest (Gibson et al.,

2009). The complete linearized DNA product were directly transformed into the inducible aaRS strains and selected on LB-agar plates with 5 μg/mL chloramphenicol and 100 μM IPTG. To the generate stringent null strains, a gBlock DNA fragment (Integrated DNA technologies) was purchased containing 378 nt of the relA gene sequence: 186 nt upstream and 189 nt downstream of codon 264 while changing codon 264 GAT (Asp) to GGT (Gly). The full sequence can be found in Table S2. Fragments containing further upstream and downstream arms of the relA locus as well as a Kanamycin resistance gene (KanR) were assembled. Upon full product amplification by PCR, DNA was transformed into strains and selected on LB plates with 10

μg/mL Kanamycin.

Strain transformation

91 Competent cells were prepared by following protocol; Cells to be transformed were streaked onto LB plates with the appropriate selection marker. The following morning a single colony was picked into 5 mL of LM media (LB + 3 mM MgSO4). Cultures were incubated at 37°C with shaking until OD600 was between 0.8 and 1.2. 300 μL of cells were then transferred to 3 mL competence media (1% Glucose, 3 mM MgSO4, 2.5 mg/mL Potassium Aspartate, 11 μg/mL

Ferric Ammonium Citrate, 40 μg/mL Tryptophan, 10.7 g/L Dipotassium Phosphate, 6 g/L

Monopotassium Phosphate, 1 g/L Trisodium Citrate Dihydrate) and incubated with shaking for

3.5 hours at 37°C. A 200 μL aliquot of competent cells was mixed with 100-1000 ng of DNA and shaken at 37°C for an additional 1.5 hours before plating on the appropriate selection plates.

Selection was performed on LB agar plates using the following concentrations of antibiotics; 10

μg /mL Kanamycin, 100 ng/mL Spectinomycin, 5 μg /mL Chloramphenicol, and/or 10 μg/mL

Tetracycline.

Growth media

For all experiments, a defined rich media was used for growth of cells (MCCG). See Table S3 for a complete recipe. Given the essentiality of the aaRS genes tested, all strain maintenance, overnight growth conditions, or LB-agar plates included 100 μM IPTG. For competition experiments “conditioned MCCG” media was used instead of fresh MCCG media. To create conditioned MCCG, wildtype B. subtilis 168 was back-diluted 2000-fold from an overnight culture into 0.5 L of fresh MCCG media. Cells were grown with vigorous shaking at 37°C until

OD600 = 0.2. The culture was then immediately poured over a 0.2 μm vacuum filtration flask, collecting the sterilized “conditioned” MCCG media. We found that conditioned MCCG media nearly eliminates a lag phase (0-5 minutes) upon back-dilution during the competition

92 experiment while fresh MCCG induces a 10-15 minute lag phase before the resumption of exponential growth (data not shown).

qRT-PCR

For qRT-PCR measurements, overnight cultures of cells were back-diluted 1000-fold and grown in fresh MCCG media with the appropriate concentration of IPTG. When the culture reached

OD600 = 0.3, cells were collected by mixing 5 mL of culture with 5 mL of ice-cold methanol.

After 10 minutes on ice, cells were spun down at 4°C for 10 minutes, decanted of excess liquid and resuspended in 100 μL of a solution containing 10 mg/mL Lysozyme, 10 mM Tris-HCl pH

7, and 1 mM EDTA. RNA was extracted using RNeasy Mini Kits (Qiagen) according to the manufacturers protocols for Gram-positive Bacteria. Random hexamer reverse transcription with

Moloney Murine Leukemia Virus (M-MuLV) (M0253; NEB) was performed according to the manufacturer’s specifications. Gene expression changes were normalized to sigA. For a list of primers used see Table S2.

Ribosome Profiling

For ribosome profiling experiments, cells were grown in fresh MCCG media, using a protocol outlined previously (Lalanne et al., 2018; Li et al., 2014). An overnight liquid culture, started from a single colony from a fresh plate, was diluted about 5,000-fold into 250 mL fresh MCCG media in a 2.8 L flask at 37°C with aeration (200 rpm) until OD600 = 0.4. The cell culture was rapidly filtered over a nitrocellulose filter and scraped into liquid nitrogen at 37°C. Cell pellets were combined with 650 mL of frozen droplets of lysis buffer 10 mM MgCl2 , 100 mM NH4 Cl,

20 mM Tris pH 8.0, 0.1% NP-40, 0.4% Triton X-100, 100 U/mL DNase I, and 1 mM

93 chloramphenicol. Cells and lysis buffer were pulverized in 10 mL canisters prechilled in liquid nitrogen using a TissueLyser II (Qiagen) for 5 cycles of 3 min at 15 Hz. Pulverized lysate was thawed on ice and clarified by centrifugation at 20,000 rcf. for 10 min at 4°C. 0.5 mg of clarified lysate was then digested with 5 mM CaCl2 and 750 U of micrococcal nuclease (Roche) at 25°C for 1 hr before quenching with 6 mM EGTA.

Following nuclease digestion, the monosome fraction was collected using sucrose gradient and the RNA extracted by hot-phenol extraction. Ribosome-protected mRNA fragments

(footprints) with size ranging from 15 to 45 nucleotides were size selected on a polyacrylamide gel. Footprint RNA was dephosphorylated using T4 polynucleotide kinase (NEB) at 37°C for one hour. Three picomoles of footprints were ligated to 100 pmole of 5’ adenylated and 3’ end blocked DNA oligo (Linker-1, Table S2) using truncated T4 RNA ligase 2 K277Q at 37°C for

2.5 hr. The ligated product was purified by using polyacrylamide gel. cDNA was generated using

Superscript III (ThermoFisher) and custom primer oCJ485 (Table S2) and isolated by size excision on a polyacrylamide gel.

Single-stranded cDNA was circularized using 100 U of CircLigase (Lucigen) at 60°C for

2 hr. Ribosomal cDNA fragments were removed using biotin-linked custom DNA oligos (Table

S2) and MyOne Streptavidin C1 Dynabeads (ThermoFisher). Final cDNA was amplified using

Q5 DNA polymerase (NEB) with o231 primer and indexing primers (Table S2). Sequencing was performed on an Illumina HiSeq 2000 or NextSeq500.

RNA sequencing

For RNA-seq experiments, overnight cultures of cells were back-diluted 1000-fold and grown in fresh MCCG media with the appropriate concentration of IPTG. When the culture reached OD600

94 = 0.3, cells were collected by mixing 9 mL of culture with 1 mL of ice-cold stop solution (95%

Ethanol, 5% Phenol). Cells were spun down for 10 minutes at 20,000 rcf in a 4°C centrifuge.

Excess solution was decanted and removed by pipette. Cell pellets were resuspended in 200 μL

10mM Tris-HCL 1mM EDTA and split into two 100 μL fractions.

To one fraction, cells were lysed with 10 mg/mL lysozyme and RNA extracted using

RNeasy column (Qiagen #74104). Purified RNA was ribosomal RNA depleted with

MICROBExpress Bacterial mRNA Enrichment Kit (ThermoFischer #AM1905) according to the manufacturer’s instructions. RNA from these samples was to be used for gene expression experiments having been depleted for rRNA and tRNA.

To the other fraction, the 100 μL aliquot was re-spun and liquid was removed by pipette.

RNA from cell pellet was extracted using RNASnap (Stead et al., 2012). Briefly, cell pellets were resuspended in 500 μL Extraction solution (95% Formamide, 1% BME, 0.025% SDS, 18 mM EDTA) and 200 μL chilled zirconia beads (0.1 mm diameter). Resuspensions were vortexed for 10 minutes using a horizontal vortex adapter, incubated for 7 minutes at 95°C, and 300 μL of the aqueous solution was transferred to a fresh tube. Samples were spun for 5 minutes at 16,000g and supernatant was diluted with four volumes of water in a fresh tube and precipitated with five volumes of Isopropanol. After freezing and centrifugation, RNA was resuspended in 10 mM Tris pH 7. RNA from these samples was to be used for total RNA expression experiments and was not depleted for rRNA or tRNA.

Each RNA sample was then fragmented (ThermoFischer #AM8740), dephosphorylated, polyadenlyated, and reverse transcribed using Superscript III and custom oligo-dT reverse transcription primers. Resulting cDNA was ligated with Illumina sequencing adapters and PCR amplified. Single end (50 bp) sequencing was performed on an Illumina NextSeq.

95

AUPAGE & Northern Blot tRNA charging analysis was performed similarly to (Darnell et al., 2018; Kimura and Waldor,

2019; Varshney et al., 1991). Cells were grown to OD600 0.3, mixed 50:50 with pre-warmed 10%

Trichloroacetic Acid, and placed on ice. Mixtures were spun down at 3000 x g for 10 minutes at

4°C and the supernatant was poured off. Cell pellets were then resuspended in 750 μL of AE buffer (0.3 M NaOAc pH 4.5, 10 mM EDTA) and either frozen at -80°C or immediately mixed with 750 μL ice cold Phenol pH 4.5 before transferring to a 2 mL tube. Mixtures were vortexed for 10 min, rested on ice for 5 min, and spun for 10 min at 4°C. Aqueous supernatant was removed and mixed with 750 μL cold Isopropanol. Another 750 μL of AE buffer was then added to the remaining organic layer before another round of vortexing and separation was performed.

The aqueous layer was transferred to another tube with 750 μL cold Isopropanol and all samples were vortexed and precipitated for more than two hours at -20°C. After spinning down precipitations, pellets were washed with 70:30::Ethanol:10 mM NaOAc pH 4.5 and dried. Pellets were re-dissolved in SA Buffer (10 mM NaOAc pH 4.5, 1 mM EDTA) and quantified using

Qubit BR RNA kit. To mark gel mobility shift of uncharged tRNA, total RNA samples were deacylated in 100 mM Tris pH 9.5 at 37° for 1 hour then precipitated and re-dissolved in SA buffer.

To perform the acid urea gel electrophoresis, 0.5 – 1 μg of each sample was mixed with

2X loading buffer (0.1 M NaOAc pH 5.0, 9 M urea, 0.05% bromophenol blue, and 0.05% xylene cyanol) and loaded onto a gel containing 6.5% polyacrylamide, 7 M Urea, and 0.1 M NaOAc pH

5.0. Following electrophoresis, the region between the bromophenol blue and xylene cyanol bands was excised and washed in 1X TBE. Gels were transferred to BrightStar-Plus positively

96 charged nylon membrane in 1X TBE using a semi-dry transfer apparatus at 3 mA/cm2 for 30 min. The blots were crosslinked twice with 1,200 μJ UV light, prehybridized in PerfectHyb buffer for 30 min at 42°C, and hybridized at 42°C with 5 pmol of the appropriate probe (see

Table S2). Probes were end-labeled with [γ-32P] ATP using T4 PNK and purified with G25

Sepharose columns. Blots were twice washed with Wash Buffer (2X SSC and 0.5% SDS) at

60°C, exposed to a Phosphor-Imaging screen for 1 hour, and imaged using a FLA-5000 phosphoimager (Fuji). To properly assess the absolute charging level for samples with very high charging fractions, we first extracted raw intensities from the northern blot using ImageJ

(Schneider et al., 2012). Using the x-coordinates of positions of maximal peak heights in wildtype lanes, we fit two gaussians around the uncharged and charged fractions and calculated the area under the curve. The charged fraction was calculated as the area under the charged band divided by the sum of the charged and uncharged bands. These peak positions were used to fit curves for both bands in the aaRS overproduction samples, calculating the charging ratio in the same way.

Competition Experiments

To barcode strains for competition analysis a barcoding plasmid was created to integrate at the lacA locus with a resistance gene tetM (genotype lacA::BC-tetM). Each plasmid has a random 12 nucleotide sequence near the end of lacA integration site to replace the endogenous sequence and differentiate the strain from the others by DNA sequence. All “strain-types” (i.e. argS inducible strain or ileS inducible with relA D254G mutation strain) used in the competition experiment except for wildtype 168 were transformed with the barcoding plasmid. For all strain- types used in the experiment, 2-4 independent clones after transformation with verified unique

97 barcodes were kept and used in the competition experiment as biological replicates and termed unique “strain-BC”. Prior to the experiment, all strains were streaked out and single colonies of each strain type and barcode were picked and grown overnight to saturation. The following morning, 5 μL of the overnight culture for each strain was mixed together thoroughly before back-dilution into fresh conditioned MCCG media with specified inducer concentration (see

‘Media’ for details). Eight concentrations of inducer (IPTG) was tested: 10, 15, 20, 25, 30, 40,

70, and 500 μM. When the culture reached OD600 = 0.1, 156 μL of culture was transferred to 10 mL fresh, pre-warmed, shaking conditioned MCCG to continue exponential growth (1/64 or 26 back-dilution). The remainder of the cells were spun down at 4°C, decanted of media and immediately frozen for eventual DNA extraction. Subsequently, when cells again reached OD600

= 0.1 another back-dilution was performed and cells were frozen for DNA. This process continued until reaching roughly 40 generations total. Due to the extensive experiment time some later rounds of back-dilution were decreased (312.5 or 625 μL OD600=0.1 culture into 10 mL fresh culture: 1/32 or 1/16 back-dilution). This decrease in generation time was noted during quantification & final analysis.

To generate amplicons for sequencing and barcode counting frozen cell pellets were first thawed and resuspended in 480 μL 50 mM EDTA and 120 μL 10 mg/mL lysozyme. Genomic

DNA was harvested using the Wizard Genomic DNA Purification Kit (Promega) according to the manufacturers protocol for Gram Positive Bacteria. Barcode amplicons were created using a two round PCR protocol. During the first round of PCR (4 cycles), primers added specific

“timepoint barcodes” for multiplexing, 17 random nucleotides for collapsing PCR duplicates, and partial Illumina sequencing adaptors. Samples were pooled by induction condition and cleaned up using Select-a-Size DNA Clean & Concentrator columns (Zymo). During the second

98 round (12-16 cycles), full Illumina sequencing adaptors were added using standard library prep primers (o231 & indexing primers). Illumina indices were used to differentiate samples on the sequencer. Final PCR products were run on an 8% TBE gel (Novex) for size selection, excised, extracted, and precipitated. Libraries were sequenced on an Illumina HiSeq 2000 or NextSeq500.

For a list of primers see Table S2 and for details on amplicon structure and barcodes see Table

S4.

Quantification & Statistical Analysis

Fitness calculation

The method to calculate the fitness of strains during the competition experiment is as follows.

First, sequencing reads were checked for correct amplicon structure, matched to the appropriate experiment, timepoint, barcode, and duplicates in the 17N random region were collapsed. At each timepoint, the log2 ratio of the counts of each strain-BC to the average counts of the four strain-BC strains for the ‘no insert control’ strain-type was calculated. The ‘no insert control’, is a strain derived from wildtype 168 with an insertion at the amyE locus including the pSpankHY promoter system and spcR resistance gene similar to the inducible aaRS constructs. However, instead of a protein-coding open reading frame inserted under the pSpankHY promoter, the ORF was removed and no protein product was produced (hence ‘no insert’). This strain was then barcoded with the barcoding plasmid (lacA::BC-tetM) producing four independent strain-BC biological replicates of this strain-type. This strain-type was used as the point of comparison to control for the addition of antibiotic markers and deletion of amyE ORF. This control showed a

~1% growth defect compared to wildtype at all induction levels. After calculating the log2 ratio for each timepoint for a given strain-BC, the slope of the linear regression of this ratio over the

99 number of generations of the full experiment was then calculated. The number of generations was calculated as the number of doublings of the wildtype since the prior backdilution (20 minute per doubling). In general, a 64-fold backdilution was used and therefore 6 generations had passed since the previous timepoint. When the backdilution was reduced (32-fold or 16-fold) the number of generations from the previous timepoint was adjusted accordingly (5 or 4 respectively). The fitness for the strain-BC was then computed as 1 + slope. The fitness of the 2-

4 strain-BC of each strain-type was then averaged to report the final fitness for that strain-type.

The fitness difference between all pairwise comparisons of strain-BCs of equivalent strain-type is plotted as a distribution to demonstrate strain-BC “biological” noise in Figure S1B.

In our analysis, we found a floor of read counts for each strain-BC of around 200 reads, potentially due to PCR crosstalk, which could skew the calculated fitness. For example, if the slope of the linear regression was calculated over all timepoints for a particularly unfit strain, this read count floor would artificially decrease the slope and artificially increase the fitness calculated. To circumvent this problem, we only considered generation timepoints prior to the first timepoint with <200 reads when calculating the slope of the linear regression. For particularly unfit strains such as low induction of leuS, this meant the slope and fitness were calculated with the first two or three timepoints. We also noticed that two timepoints in some induction experiments (40 μM Timepoint 1, 20 μM Timepoint 0) had extremely low read counts across all strain-BC. We attribute this to loss of DNA during collection or a failed PCR reaction in amplicon preparation. To circumvent this problem, we simply removed the affected timepoint from the slope calculation, calculating using the remaining times that passed the read floor threshold.

100 Calibration of aaRS production

Protein production from the ectopic locus was estimated by first performing ribosome profiling for each aaRS strain fully induced with IPTG (500 μM), which showed limited expression changes except for the induced aaRS. The fully induced aaRS production rate relative to the rest of the cell, as measured by ribosome footprint density, was compared to that in the wildtype strain 168. At other IPTG concentrations, aaRS production relative to the fully induced situation was calibrated by the relative promoter strength, which we measured by qRT-PCR for an independent strain that has the same inducible promoter (pSpankHY) driving gfp expression from the same amyE locus. For example, ribosome profiling data showed that the level of IleS production at full induction is 14-fold higher than in wildtype cells. At 40 μM IPTG, the , determined from RT-qPCR of the gfp construct, is 9.4-fold lower than at 500

μM, meaning that IleS production is inferred through calibration to be 1.5-fold higher than the wildtype level.

This calibration provides an estimate for the relative promoter strength compared to the wildtype level and should not be taken as an absolute measure of expression levels. Because aaRS production affects growth and growth affects global gene expression even for a constitutive promoter (Klumpp et al., 2009; Liang et al., 1999; Zhu and Dai, 2019), we expect the absolute output of the promoter to vary with aaRS production. Although our ribosome profiling data could be used to empirically measure the aaRS expression at different IPTG concentrations, such an analysis is limited to providing the expression relative to other genes, which themselves are affected by growth rate changes. For example, a low IPTG concentration not only reduces aaRS production, but also leads to reduced global gene expression. Indeed, in this case, the apparent

‘knockdown’ in aaRS as a fraction of protein synthesis (as measured by ribosome profiling) is

101 less than the calibrated fold-change in promoter strength from the wildtype level because most other genes have reduced expression (Figure S6A). If the fitness landscape is plotted for aaRS production in terms of fraction of protein synthesis instead of calibrated promoter strength, this would lead to an artificially sharper fitness cliff at underproduction (Figure S6B), although in reality the promoter is much more repressed than the apparent knockdown. We therefore chose to report the calibrated production to avoid such confounding effects.

Ribosome profiling and RNA-seq data analysis

Processing of ribosome profiling sequencing reads included removal of the linker sequence and mapping with Bowtie (Langmead et al., 2009) (options -v 2 -k 1) to the NC_000964.3 reference genome obtained from NCBI Reference Sequence Bank. Footprint reads with sizes between 20 to 40 nucleotides were stripped of the first and last 8 nucleotides and a count of 1/n where n equals the remaining read length was assigned to the remaining positions. To calculate proteome fractions for regulons, the total read counts for a subset of genes in the regulon was compared to all ORF mapping reads excluding start and stop codons. Processing of RNA sequencing reads included removal of PolyA tails and mapping with Bowtie (Langmead et al., 2009) (options -v 2

-k 1) to the NC_000964.3 reference genome. The five prime ends of reads mapping to open reading frames was counted excluding the first and last five positions. Read counts were normalized by total mapped reads and gene length (RPKM) or total mapped reads (RPM).

The average ribosome occupancy at the 61 sense codons was computed by first normalizing read counts at each nucleotide in a coding sequence by the mean read count for that sequence (excluding the first and last 15 nucleotides) using the Ribosome profiling data. This normalized ribosome occupancy was averaged across the three nucleotides of each codon, and

102 then averaged across all occurrences of that codon within the coding region. Final values were computed by averaging across all coding sequences meeting a threshold of five sequencing reads per nucleotide. Only genes reaching this threshold in all compared experiments were considered in the final analysis. Ribosome queuing was calculated by similarly normalizing read counts at each nucleotide in a coding region by the mean read count for that region. For each instance of a codon, a window of 100 nucleotides upstream and downstream of the second codon position were obtained and windows were averaged across all instances of the codon within that gene.

Only codons over 115 nucleotides away from a start and stop codon were considered. Final values were computed by averaging across all coding sequences meeting a threshold of five sequencing reads per nucleotide. For comparison between experiments, the average density at each position in the window from the wildtype was subtracted from the average density in the

SerS underproduction experiment.

103 REFERENCES Avcilar-Kucukgoze, I., Bartholomäus, A., Cordero Varela, J.A., Kaml, R.F.-X., Neubauer, P., Budisa, N., and Ignatova, Z. (2016). Discharging tRNAs: a tug of war between translation and detoxification in Escherichia coli. Nucleic Acids Res. 44, 8324–8334.

Bremer, H., and Dennis, P.P. (2008). Modulation of Chemical Composition and Other Parameters of the Cell at Different Exponential Growth Rates. EcoSal Plus 3. https://doi.org/10.1128/ecosal.5.2.3.

Britton, R.A., Eichenberger, P., Gonzalez-Pastor, J.E., Fawcett, P., Monson, R., Losick, R., and Grossman, A.D. (2002). Genome-wide analysis of the stationary-phase (sigma-H) regulon of Bacillus subtilis. J. Bacteriol. 184, 4881–4890.

Dai, X., Zhu, M., Warren, M., Balakrishnan, R., Patsalo, V., Okano, H., Williamson, J.R., Fredrick, K., Wang, Y.-P., and Hwa, T. (2016). Reduction of translating ribosomes enables Escherichia coli to maintain elongation rates during slow growth. Nat Microbiol 2, 16231.

Darnell, A.M., Subramaniam, A.R., and O’Shea, E.K. (2018). Translational Control through Differential Ribosome Pausing during Amino Acid Limitation in Mammalian Cells. Mol. Cell 71, 229–243.e11.

Davidi, D., and Milo, R. (2017). Lessons on enzyme kinetics from quantitative proteomics. Curr. Opin. Biotechnol. 46, 81–89.

Dekel, E., and Alon, U. (2005). Optimality and evolutionary tuning of the expression level of a protein. Nature 436, 588–592.

Dittmar, K.A., Mobley, E.M., Radek, A.J., and Pan, T. (2004). Exploring the regulation of tRNA distribution on the genomic scale. Journal of Molecular Biology 337, 31–47.

Dittmar, K.A., Sørensen, M.A., Elf, J., Ehrenberg, M., and Pan, T. (2005). Selective charging of tRNA isoacceptors induced by amino-acid starvation. EMBO Reports 6, 151–157.

Dong, H., Nilsson, L., and Kurland, C.G. (1995). Gratuitous overexpression of genes in Escherichia coli leads to growth inhibition and ribosome destruction. J. Bacteriol. 177, 1497–

104 1504.

Elf, J., Nilsson, D., Tenson, T., and Ehrenberg, M. (2003). Selective charging of tRNA isoacceptors explains patterns of codon usage. Science 300, 1718–1722.

Eymann, C., Homuth, G., Scharf, C., and Hecker, M. (2002). Bacillus subtilis functional genomics: global characterization of the stringent response by proteome and transcriptome analysis. J. Bacteriol. 184, 2500–2520.

Ferro, I., Liebeton, K., and Ignatova, Z. (2017). Growth-Rate Dependent Regulation of tRNA Level and Charging in Bacillus licheniformis. Journal of Molecular Biology 429, 3102–3112.

Gendron, N., Putzer, H., and Grunberg-Manago, M. (1994). Expression of both Bacillus subtilis threonyl-tRNA synthetase genes is autogenously regulated. J. Bacteriol. 176, 486–494.

Gibson, D.G., Young, L., Chuang, R.-Y., Venter, J.C., Hutchison, C.A., and Smith, H.O. (2009). Enzymatic assembly of DNA molecules up to several hundred kilobases. Nature Methods 6, 343–345.

Giegé, R., and Springer, M. (2016). Aminoacyl-tRNA Synthetases in the Bacterial World. EcoSal Plus 7. https://doi.org/10.1128/ecosalplus.ESP-0002-2016.

Gloge, F., Becker, A.H., Kramer, G., and Bukau, B. (2014). Co-translational mechanisms of protein maturation. Curr. Opin. Struct. Biol. 24, 24–33.

Gourse, R.L., Chen, A.Y., Gopalkrishnan, S., Sanchez-Vazquez, P., Myers, A., and Ross, W. (2018). Transcriptional Responses to ppGpp and DksA. Annu. Rev. Microbiol. 72, 163–184.

Gouy, M., and Grantham, R. (1980). Polypeptide elongation and tRNA cycling in Escherichia coli: a dynamic approach. FEBS Letters 115, 151–155.

Green, N.J., Grundy, F.J., and Henkin, T.M. (2010). The T box mechanism: tRNA as a regulatory molecule. FEBS Letters 584, 318–324.

Gutiérrez-Preciado, A., Henkin, T.M., Grundy, F.J., Yanofsky, C., and Merino, E. (2009). Biochemical features and functional implications of the RNA-based T-box regulatory

105 mechanism. Microbiol. Mol. Biol. Rev. 73, 36–61.

Hanson, G., and Coller, J. (2018). Codon optimality, bias and usage in translation and mRNA decay. Nature Reviews Molecular Cell Biology 19, 20–30.

Hauryliuk, V., Atkinson, G.C., Murakami, K.S., Tenson, T., and Gerdes, K. (2015). Recent functional insights into the role of (p)ppGpp in bacterial physiology. Nature Reviews Microbiology 13, 298–309.

Hinnebusch, A.G. (2005). Translational regulation of GCN4 and the general amino acid control of yeast. Annu. Rev. Microbiol. 59, 407–450.

Kimura, S., and Waldor, M.K. (2019). The RNA degradosome promotes tRNA quality control through clearance of hypomodified tRNA. Proc. Natl. Acad. Sci. U.S.a. 116, 1394–1403.

Klumpp, S., Scott, M., Pedersen, S., and Hwa, T. (2013). Molecular crowding limits translation and cell growth. Proc. Natl. Acad. Sci. U.S.a. 110, 16754–16759.

Klumpp, S., Zhang, Z., and Hwa, T. (2009). Growth rate-dependent global effects on gene expression in bacteria. Cell 139, 1366–1375.

Kriel, A., Brinsmade, S.R., Tse, J.L., Tehranchi, A.K., Bittner, A.N., Sonenshein, A.L., and Wang, J.D. (2014). GTP dysregulation in Bacillus subtilis cells lacking (p)ppGpp results in phenotypic amino acid auxotrophy and failure to adapt to nutrient downshift and regulate biosynthesis genes. J. Bacteriol. 196, 189–201.

Lalanne, J.-B., Taggart, J.C., Guo, M.S., Herzel, L., Schieler, A., and Li, G.-W. (2018). Evolutionary Convergence of Pathway-Specific Enzyme Expression Stoichiometry. Cell 173, 749–761.e38.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25–10.

Li, G.-W., Burkhardt, D., Gross, C., and Weissman, J.S. (2014). Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624–635.

106 Liang, S., Bipatnath, M., Xu, Y., Chen, S., Dennis, P., Ehrenberg, M., and Bremer, H. (1999). Activities of constitutive promoters in Escherichia coli. Journal of Molecular Biology 292, 19– 37.

Liu, K., Bittner, A.N., and Wang, J.D. (2015). Diversity in (p)ppGpp metabolism and effectors. Curr. Opin. Microbiol. 24, 72–79.

Loveland, A.B., Bah, E., Madireddy, R., Zhang, Y., Brilot, A.F., Grigorieff, N., and Korostelev, A.A. (2016). Ribosome•RelA structures reveal the mechanism of stringent response activation. Elife 5, 811.

Luo, D., Leautey, J., Grunberg-Manago, M., and Putzer, H. (1997). Structure and regulation of expression of the Bacillus subtilis valyl-tRNA synthetase gene. J. Bacteriol. 179, 2472–2478.

McClain, W.H., Jou, Y.Y., Bhattacharya, S., Gabriel, K., and Schneider, J. (1999). The reliability of in vivo structure-function analysis of tRNA aminoacylation. Journal of Molecular Biology 290, 391–409.

Morrill, S.A., and Amon, A. (2019). Why haploinsufficiency persists. Proc. Natl. Acad. Sci. U.S.a. 116, 11866–11871.

Nanamiya, H., Kasai, K., Nozawa, A., Yun, C.-S., Narisawa, T., Murakami, K., Natori, Y., Kawamura, F., and Tozawa, Y. (2008). Identification and functional analysis of novel (p)ppGpp synthetase genes in Bacillus subtilis. Mol. Microbiol. 67, 291–304.

O'Brien, E.J., Utrilla, J., and Palsson, B.O. (2016). Quantification and Classification of E. coli Proteome Utilization and Unused Protein Costs across Environments. PLoS Comput. Biol. 12, e1004998.

Peters, J.M., Colavin, A., Shi, H., Czarny, T.L., Larson, M.H., Wong, S., Hawkins, J.S., Lu, C.H.S., Koo, B.-M., Marta, E., et al. (2016). A Comprehensive, CRISPR-based Functional Analysis of Essential Genes in Bacteria. Cell 165, 1493–1506.

Potrykus, K., and Cashel, M. (2008). (p)ppGpp: still magical? Annu. Rev. Microbiol. 62, 35–51.

107 Putzer, H., Laalami, S., Brakhage, A.A., Condon, C., and Grunberg-Manago, M. (1995). Aminoacyl-tRNA synthetase gene regulation in Bacillus subtilis: induction, repression and growth-rate regulation. Mol. Microbiol. 16, 709–718.

Rauscher, R., and Ignatova, Z. (2018). Timing during translation matters: synonymous mutations in human pathologies influence protein folding and function. Biochem. Soc. Trans. 46, 937–944.

Sander, T., Farke, N., Diehl, C., Kuntz, M., Glatter, T., and Link, H. (2019). Allosteric Feedback Inhibition Enables Robust Amino Acid Biosynthesis in E. coli by Enforcing Enzyme Overabundance. Cell Syst 8, 66–75.e68.

Schneider, C.A., Rasband, W.S., and Eliceiri, K.W. (2012). NIH Image to ImageJ: 25 years of image analysis. Nature Methods 9, 671–675.

Scott, M., and Hwa, T. (2011). Bacterial growth laws and their applications. Curr. Opin. Biotechnol. 22, 559–565.

Scott, M., Gunderson, C.W., Mateescu, E.M., Zhang, Z., and Hwa, T. (2010). Interdependence of cell growth and gene expression: origins and consequences. Science 330, 1099–1102.

Sherman, J.M., Rogers, M.J., and Söll, D. (1992). Competition of aminoacyl-tRNA synthetases for tRNA ensures the accuracy of aminoacylation. Nucleic Acids Res. 20, 2847–2852.

Sonenshein, A.L. (2005). CodY, a global regulator of stationary phase and virulence in Gram- positive bacteria. Curr. Opin. Microbiol. 8, 203–207.

Srivatsan, A., and Wang, J.D. (2008). Control of bacterial transcription, translation and replication by (p)ppGpp. Curr. Opin. Microbiol. 11, 100–105.

Stead, M.B., Agrawal, A., Bowden, K.E., Nasir, R., Mohanty, B.K., Meagher, R.B., and Kushner, S.R. (2012). RNAsnap™: a rapid, quantitative and inexpensive, method for isolating total RNA from bacteria. Nucleic Acids Res. 40, gks680–e156.

Subramaniam, A.R., Deloughery, A., Bradshaw, N., Chen, Y., O'Shea, E., Losick, R., and Chai, Y. (2013a). A serine sensor for multicellularity in a bacterium. Elife 2, e01501.

108 Subramaniam, A.R., Pan, T., and Cluzel, P. (2013b). Environmental perturbations lift the degeneracy of the genetic code to regulate protein levels in bacteria. Proc. Natl. Acad. Sci. U.S.a. 110, 2419–2424.

Subramaniam, A.R., Zid, B.M., and O’Shea, E.K. (2014). An Integrated Approach Reveals Regulatory Controls on Bacterial Translation Elongation. Cell 159, 1200–1211.

Swanson, R., Hoben, P., Sumner-Smith, M., Uemura, H., Watson, L., and Söll, D. (1988). Accuracy of in vivo aminoacylation requires proper balance of tRNA and aminoacyl-tRNA synthetase. Science 242, 1548–1551.

Tockman, J., and Vold, B.S. (1977). In vivo aminoacylation of transfer ribonucleic acid in Bacillus subtilis and evidence for differential utilization of lysine-isoaccepting transfer ribonucleic acid species. J. Bacteriol. 130, 1091–1097.

Trinquier, A., Ulmer, J.E., Gilet, L., Figaro, S., Hammann, P., Kühn, L., Braun, F., and Condon, C. (2019). tRNA Maturation Defects Lead to Inhibition of rRNA Processing via Synthesis of pppGpp. Mol. Cell 74, 1227–1238.e3.

Varshney, U., Lee, C.P., and RajBhandary, U.L. (1991). Direct analysis of aminoacylation levels of tRNAs in vivo. Application to studying recognition of Escherichia coli initiator tRNA mutants by glutaminyl-tRNA synthetase. J. Biol. Chem. 266, 24712–24718.

Vitreschak, A.G., Mironov, A.A., Lyubetsky, V.A., and Gelfand, M.S. (2008). Comparative genomic analysis of T-box regulatory systems in bacteria. Rna 14, 717–735.

Winther, K.S., Roghanian, M., and Gerdes, K. (2018). Activation of the Stringent Response by Loading of RelA-tRNA Complexes at the Ribosomal A-Site. Mol. Cell 70, 95–105.e4.

Wu, C.C.-C., Zinshteyn, B., Wehner, K.A., and Green, R. (2019). High-Resolution Ribosome Profiling Defines Discrete Ribosome Elongation States and Translational Regulation during Cellular Stress. Mol. Cell 73, 959–970.e5.

Yegian, C.D., Stent, G.S., and Martin, E.M. (1966). Intracellular condition of Escherichia coli transfer RNA. Proc. Natl. Acad. Sci. U.S.a. 55, 839–846.

109 Zhang, S., and Haldenwang, W.G. (2003). RelA is a component of the nutritional stress activation pathway of the Bacillus subtilis transcription factor sigma B. J. Bacteriol. 185, 5714– 5721.

Zhu, M., and Dai, X. (2019). Growth suppression by altered (p)ppGpp levels results from non- optimal resource allocation in Escherichia coli. Nucleic Acids Res. 47, 4684–4693.

110 Chapter III

Characterizing the Origin of the Fitness Defect due to Overproduction of

Aminoacyl-tRNA Synthetases in Bacillus subtilis

Darren J. Parker and Gene-Wei Li

This chapter is being prepared for publication.

111 Abstract

The production of any protein has inherent costs to cellular fitness. These costs, whether dependent on the identity of the protein or independent, must be carefully balanced with the benefits of protein function to optimize fitness. We previously showed that despite increasing tRNA charging levels, the overproduction of the aminoacyl tRNA synthetases (aaRS) in Bacillus subtilis leads to a reduction in fitness. Here we find that this reduction in fitness is due to a combination of increased tRNA mischarging (aaRS-dependent) and increased protein production costs (aaRS-independent), and the summation of these costs explain why aaRS proteins are produced with little excess capacity. These results demonstrate the multitude of parameters that shape the cost-benefit tradeoff of protein production.

112 Introduction

Proper control of gene expression is foundational to cellular fitness, as abnormal gene expression can lead to negative effects at all levels of biology. Despite this importance, directly linking the expression of a gene with its associated consequences at the molecular and phenotypic level has not been explored in many cases. This is especially true of enzyme overexpression, where fewer genetic tools exist to probe the consequences of producing proteins above native levels. We have previously demonstrated that the overproduction of individual aminoacyl-tRNA synthetases (aaRS) incurs a detriment to the growth of Bacillus subtilis in rich media. In this case, native protein levels were found to be near a ‘fitness cliff’ as only a two-fold reduction in aaRS production leads to a growth defect. Key to understanding why B. subtilis position aaRS levels so close to catastrophe is fully exploring the costs that are paired with protein overproduction.

The costs in fitness due to overproduction of any protein is due to the combination of effects within two distinct categories: those that are baseline to the production of a protein

(protein-independent) and those that are specific to the protein being overproduced (protein- dependent). The procedure of synthesizing a protein is an expensive process with both direct and indirect costs agnostic to the identity of the protein. First, protein synthesis requires multiple molecular building blocks including amino acids, ATP, and GTP, that scale with protein length and expression level. Second, these same resources also need to be invested into producing the translational machinery itself: ribosomes, tRNAs, and protein factors, to keep up with production demand. Finally, in fast growing bacteria the machines of translation, ribosomes, are working at capacity (Vind et al., 1993). mRNA are therefore in competition with one another for ribosomes such that the production of one protein indirectly comes at the detriment of all others (Figure 1)

113 (Dong et al., 1995). All of these considerations contribute to costs that are baseline to overproducing any protein, independent of the proteins identity.

On the contrary, protein-dependent effects are any negative consequences of overproduction related to the specific identity of that protein. These effects can manifest at the protein, pathway, or pleiotropic level (Eames and Kortemme, 2012). For example, cells are highly sensitive to increases in transcription factors which amplify unnecessarily gene expression programs (Keren et al., 2016). Furthermore, it has been shown that overproduction of the molecular chaperone, dnaK, in E. coli leads to greatly reduced cell fitness, potentially through an imbalance of chaperones and molecular targets (Blum et al., 1992; Wall and Plückthun, 1995). In another related example, overproduction of individual tubulin genes, alpha-tubulin and beta- tubulin, in Saccharomyces cerevisiae leads to the accumulation and the arrest of growth due to a lack of subunit balance (Weinstein and Solomon, 1990). Interestingly, the two subunits did not produce an equally toxic phenotype upon overexpression, demonstrating that it is sometimes difficult to predict the underlying reason for an observed fitness defect with protein overproduction.

114

Figure 1. Model of the sources leading to fitness defects upon aaRS overproduction. The negative effects on fitness upon aaRS overproduction can come in two forms: aaRS-independent and aaRS-dependent effects. Independent effects are agnostic to the protein identity and come mainly from the costs of production. As depicted, increased mRNA production of one aaRS (orange) leads to ribosome allocation changes. Dependent effects are directly dependent on protein function. As depicted, the overproduction of the orange aaRS protein leads to mischarging of the incorrect (blue) tRNA.

There are some predicted aaRS-dependent consequences of overproducing individual aaRS proteins. Most apparent is the potential for mischarging of tRNAs with the incorrect amino acids. Since tRNAs and related amino acids differ by only a few chemical moieties, some aaRS have been shown to mischarge with a defined frequency (Ibba and Söll, 2000). Increasing the production of a near-cognate aaRS could therefore increase the frequency of mischarging events

(Figure 1) (Sherman et al., 1992; Swanson et al., 1988). Given that the ribosome does not have the ability to sense if a tRNA is aminoacylated with the correct tRNA, most mischarging events would lead to misincorporation into protein (Ibba and Söll, 1999). Mischarging and misincorporation are well demonstrated in directly causing a variety of negative phenotypes from bacteria to higher eukaryotes (Kermgard et al., 2017; Lee et al., 2006; Vo et al., 2018). The accumulation of mistranslated proteins could also subsequently lead to the activation of stress responses or protein folding pathways which could, in turn, reduce fitness. All of the above

115 potential aaRS-dependent, as well as the aaRS-independent effects would sum up to produce the fitness defect we observe during aaRS overproduction.

Here we report that the fitness defects of overproducing aminoacyl-tRNA synthetases in

Bacillus subtilis are due to both aaRS-dependent and aaRS-independent effects. We find that overproduction of SerS and IleS mutants with a reduced ability to charge tRNAs elicit fitness defects that differ from the wildtype aaRSs, but generally have a larger fitness defect than cells overproducing neutral fluorescent proteins to the same magnitude. By exploring the molecular underpinnings behind the aaRS-dependent fitness effects of overproduction, we find evidence of tRNA mischarging and the activation of stress responses. Taken together these results provide a rationale for why aaRS proteins are produced with little excess in Bacillus subtilis.

Results

Creation of aaRS variants that do not increase tRNA charging upon overproduction

To test whether the fitness defects of aaRS overproduction are due solely to aaRS- independent or a combination of aaRS-independent and aaRS-dependent effects, we created multiple IleS and SerS variant proteins with a reduced ability to charge tRNA (Figure 2A and

2B). We reasoned that these variants would elicit only aaRS-independent fitness defects upon overproduction and would therefore have increased fitness compared to wildtype proteins. We created multiple variants of the B. subtilis IleS and SerS proteins in well-characterized, conserved regions known to be involved in the tRNA charging process in the E. coli enzymes.

These mutations included point mutations in the catalytic core, anticodon recognition domain, and other well conserved motifs known to be important for the charging reaction (Figure 2A and

2B) (Baouz et al., 2009; Clarke et al., 1988; Nureki et al., 1998; Shepard et al., 1992; Vincent et

116 al., 1995). Additionally, we created SerS variants with large deletions in the conserved N- terminal coiled coil domain shown to be required for tRNAser binding for the E. coli enzyme

(Figure 2C) (Borel et al., 1994). We also made SerS and IleS variants with degradation tags appended to the C-terminal ends. These short sequences have been shown to lead to efficient degradation of a marked protein by the ClpXP machinery in vivo independent of the sspB adaptor (Griffith and Grossman, 2008). We reasoned that synthesis and then immediate degradation should only incur aaRS-independent costs of production. Finally, we created a series of constructs of concatenated fluorescent proteins (FP) ranging from 0.9 kb (1X FP) to 3.6 kb

(5X FP). Since these fluorescent proteins have little protein-specific toxicity (Kafri et al., 2016), they provide an orthogonal approach to establishing a baseline of the independent costs of protein production.

117

Figure 2. Schematic of aaRS and fluorescent protein variants created. (A and B) Gene diagrams of IleS and SerS open reading frames and related sequences. Important regions of homology targeted for mutations are shown along with the wildtype sequences for E. coli and B. subtilis to demonstrate conservation. Mutations in the B. subtilis sequences are shown in orange with associated variant number. Degron tag was directly appended to the C-terminal end of the open reading frame after removal of the stop codon. (C) Conservation of coiled-coil domain at N-terminal end of SerS proteins. Probability of coiled coils per residue were predicted for E. coli and B. subtilis SerS proteins with the DeepCoil prediction software (Ludwiczak et al., 2019; Zimmermann et al., 2018). Significance threshold denoted by a red line. (D) Fluorescent protein

118 constructs created to assess protein independent fitness effects of overproduction. All constructs with greater than one fluorescent protein unit (FP) were linked together with Glycine/Serine linkers to make a continuous open reading frame.

We then tested if overproduction of the IleS and SerS variant proteins change native transcript levels. Since native ileS and serS transcript levels are controlled by T box autoregulatory riboswitches sensitive to the charged state of cognate tRNA (discussed in Chapter

II), we reasoned that a lack of decrease in native transcript levels upon overproduction would indicate the variants do not increase cognate tRNA charging. Strains with an exogenous copy of either the wildtype or variant genes were grown with full induction of the IPTG promoter and changes in gene expression were assessed by RT-qPCR with a two-primer pair strategy (Figure

3A). The first set of primers have the ability to bind both the wildtype aaRS copy produced from its native locus and the variant copy, thus reporting on total aaRS levels. A secondary primer pair, with one primer within the T box riboswitch region, only reports on the native transcript level. As a control, we found that induction of a second copy of the wildtype gene led to a 4.5- fold reduction in the level of the native transcript, indicating that tRNA charging increased and the T box repressed expression at the native locus. On the contrary, we found that induction of

SerS variants I-III and IleS variant I did not decrease the relative expression of the native aaRS copy, indicating overproduction of these variants did not increase tRNA charging (Figure 3B and

3C). Unfortunately, induction of degron-tagged aaRS variants (SerS IV and IleS II) both showed a decrease in native aaRS transcript, although for the IleS variant the magnitude was reduced compared to induction of the wildtype copy (Figure 3B and 3C). The results suggest that there are still elevated aaRS protein levels in cells, likely due to incomplete degradation. Surprisingly, we found that induction of an ileS variant with point mutations in the HIGH domain (IleS III)

119 were lethal upon overproduction and thus could not be assayed (data not shown). We next verified the reduction in tRNA charging for the two IleS variants compared to wildtype IleS overproduction. We found that although tRNA charging was slightly higher than true wildtype, charging levels were lower with the variants than wildtype enzyme overproduction (Figure 3D).

Taken together our results demonstrate that overproduction of the created aaRS variants do not increase tRNA charging to the same extent as wildtype protein, and can potentially be used to assess the aaRS-independent effects of aaRS overproduction.

120 A

Pnative PspankHY

T box aaRS + aaRS**

B SerS Variants

102

n

o i

s 1

s 10

e

r

p

x

e

e

v i

t 100

a

l

e R

- + + + + + IPTG 10-1 WT WT I II III IV Variant

C IleS Variants D

102 Probed for ileCAT n

o 1 ileS ileS ileS i 10 Deacyl

s WT

s WT vI vII

e

r p

x 100

e

e

v

i

t a l 10-1

e -- 52 86 62 73 R - + + + IPTG 10-2 WT WT I II Variant

Figure 3. Effects on aaRS expression and tRNA charging of variant overproduction. (A) Schematic for two primer pair qRT-PCR strategy. Body primers (black) quantify both native and variant aaRS expression while native primers (blue) only quantify the native copy. (B and C) Gene expression of aaRS variant induction. Body primer quantification shown in yellow and native primer quantification shown in blue. Cells with associated variants as listed were induced without IPTG (-) or with 500 μM IPTG (+). Relative expression to sigA was calculated for each primer pair. Error bars indicate standard deviation of qRT-PCR technical replicates. (D) Northern blot analysis of tRNA charging changes due to variant overproduction. A control sample of deacylated tRNA was used to determine location of uncharged tRNAs. Fraction of charged tRNA was calculated as the ratio between the lower band and the sum of the upper and lower band. tRNA from wildtype cells, IleS WT, IleS variant I and IleS variant II were probed for ileCAT tRNA charging.

121 Varied fitness effects of aaRS variant overproduction

Next, we tested the effects on fitness of overproducing the aaRS variants and FP constructs. Using a DNA-barcoded competition assay described in the previous chapter, cells harboring the variants were competed in rich defined media for ~40 generations with defined concentrations of inducer. As expected, for all variants tested we found that fitness decreased with increasing induction compared to control strain (Figure 4A and 4B). In concordance with our previous work we found that maximal overproduction of wildtype SerS and IleS lead to ~2% fitness defect. Surprisingly, we found that cells overproducing the point mutation variants that showed limited increase in tRNA charging (SerS I, II, III, and IleS I) actually had larger fitness defects when maximally induced compared to cells overproducing the wildtype enzyme (Figure

4A and 4B). Degron-tagged aaRSs showed a divergence in fitness, as the overproduction of the

SerS variant (IV) but not the IleS variant (II) lead to reduced fitness compared to overproduction of the wildtype enzyme. Interestingly, both SerS N-terminal variants, that could not be assayed for changes in native locus expression due to primer design, showed increased fitness compared to the wildtype enzyme when overproduced.

We then directly compared the fitness of the aaRS variants against the fluorescent protein constructs. We inferred the fraction of protein synthesis devoted to each construct during overproduction and compared this to the observed fitness defects (see Methods section for details). We found that 10/11 of the aaRS, including the wildtype enzymes, have fitness costs higher than fluorescent proteins overproduced to the same degree. Only the strain overproducing the degron-tagged IleS showed lowered fitness compared to an equally produced FP construct, indicating that this variant had reduced aaRS-dependent fitness effects. In summary, these results

122 demonstrate that the fitness defects of aaRS overproduction are due to both aaRS-independent and aaRS-dependent effects.

Figure 4. Effects on fitness of aaRS and FP variant overproduction. (A and B) Fitness defects of aaRS variants across competition experiments. Competition experiments with all strains were performed at 50, 100, and 500 μM IPTG. Fitness defects for each variant and WT were calculated as described in the Methods section. Error bars indicate standard error of fitness measurements for individually barcoded, biological replicates. (C) Relationship between proteome fraction and fitness for aaRS and FP variant overproduction. Fitness defects from maximal (500 μM) overproduction competition experiments for all variants were compared to their inferred fraction of protein synthesis during overproduction. All variants are numbered accordingly with SerS, IleS, ArgS, and FP colored as red, purple, green, and yellow respectively. Grey line and yellow lines denote a 1:1 and 2:1 defect to fraction of protein synthesis relationship. See Methods section for details on calculation of fraction of protein synthesis.

123 aRS overproduction leads to mischarging and stress response activation

We next looked for potential sources that contribute to the aaRS-dependent fitness effects during overproduction. We reasoned that these aaRS-specific toxicity effects would primarily be caused by tRNA mischarging, although downstream effects could also contribute to the lowered fitness of overproduction. Overall, we found that aaRS overproduction can lead to increased charging in non-cognate tRNA. During the overproduction of IleS the tRNA charging for a leucine tRNA, leuUAA, increases from 64% in wildtype cells to 72% (Figure 5A). Likewise, the charging of an isoleucine tRNA, ileCAU, increased from 52% to 78% during LeuS overproduction. As expected, this is not always the case, as overproduction of IleS does not lead to an increase in arginine tRNA charging. Although the tRNA northern blots do not directly demonstrate that the increase in tRNA charging is due to mischarging (leu-tRNAile or ile- tRNAleu), we do not see an increase in gene expression for the cognate aaRS during non-cognate aaRS overproduction (Figure 5B), strongly implying the increase in the tRNA charging ratio is due to mischarging.

Finally, we explored the changes in gene expression due to aaRS overproduction.

Overall, we found few differences between strains overproducing SerS, IleS, and ArgS, and a control strain (Figure 5B). The subset of genes with the most noticeable overall change during aaRS overproduction was a slight increase in production of genes in the sigB regulon (Figure

5B). Induction of this stress response regulon was mildly correlated with the observed fitness defect (Figure 4C). Furthermore, it has been demonstrated that B. subtilis are highly sensitive to tRNA charging of serine tRNAs (Chapter II & (Subramaniam et al., 2013)), potentially explaining why the sigB regulon is most activated in SerS overproduction. Interestingly, we do not find any change in genes involved in the heat shock response or protein chaperones, as might

124 be expected if amino acid misincorporation into proteins was increased. Taken together, our data indicate that while aaRS overproduction may increase non-cognate tRNA charging, the overall effects on the cell are minimal, matching the relatively small aaRS-dependent fitness burden.

Figure 5. tRNA charging and gene expression changes of aaRS overproduction. (A) Northern blot analysis of tRNA charging changes due to aaRS overproduction. A control sample of deacylated tRNA was used to determine location of uncharged tRNAs. Fraction of charged tRNA was calculated as described in the Methods section. tRNAs probed as listed for WT and aaRS overproduction. Two independent biological replicates of ArgS overproduction were probed in the final blot. (B) Gene expression changes of aaRS overproduction. Ribosome profiling was performed for SerS, IleS, ArgS overproduction strains and a control strain induced with 500 μM IPTG. Dark grey lines indicate two-fold expression changes. Genes in the sigB regulon are colored in blue. Genes involved in the heat shock response or protein chaperones are colored in orange.

125 Discussion

In this work, we show that the detriment to cell fitness of overproducing the aminoacyl- tRNA synthetases is due to a combination of aaRS-dependent and aaRS-independent effects. We created a series of aaRS variants and fluorescent protein constructs to demonstrate a baseline cost in protein production, finding that wildtype enzymes have exacerbated, aaRS-dependent, costs of overproduction. Then, we uncovered possible mechanisms adding to the aaRS-dependent fitness effects, increased tRNA mischarging and activation of stress responses. Although each individual protein may be different, especially with regards to protein-specific toxicity, our work sets a framework for exploring the consequences of protein overproduction.

The expression of every protein has an inherent cost, whether it be in direct resources or the allocation of ribosomes. Cell phenotypes are therefore set by balancing the costs of production with the benefit of the particular protein. In fast growing bacteria, the pressure to produce just the right amount of protein is high as ribosomes are working near capacity (Dong et al., 1995; Vind et al., 1993). Work in E. coli has suggested that there is a linear relationship between wasteful expression and fitness, concluding that for every 1% of the proteome devoted to an unnecessary protein, the fitness of the cell is reduced 2% (Scott et al., 2010). Intriguingly, our results here show that most enzymes in Bacillus subtilis have reduced fitness effects than this model predicts (Figure 4C, yellow line). Furthermore, given that enzyme overproduction generally elicits some kind of protein-dependent toxicity, that may change depending on the host organism, determining a quantitative relationship between overproduction and fitness is difficult to establish in any context.

The fitness effects we observed in overproducing mutant aaRS and FP proteins highlights the difficulty in directly establishing protein-dependent toxicity. We found that overproduction

126 of aaRS proteins with a reduced catalytic or tRNA charging capabilities does not lead to major fitness gains over overproduction of the wildtype protein. It is unclear however, if these mutant proteins produce the baseline aaRS-independent fitness effects on overproduction as intended, as these proteins could elicit their own unique protein-dependent effects we did not predict. For example, cell growth was completely inhibited during overproduction of the IleS HIGH motif mutant (variant III). This could potentially be due to the sequestration of isoleucine tRNAs on these defective IleS proteins, or something else entirely. Furthermore, the degron-tagged IleS and

SerS variants may not be completely degraded or their overproduction may saturate the ClpXP machinery. Both would impart fitness defects independent of the core aaRS function of tRNA charging. Further examination of the biochemistry of these mutants and their associated consequences are important to properly quantify the size of the aaRS-dependent effects. Taken generally, careful consideration must be made when attempting to distinguish between protein- dependent and protein-independent costs of overproduction.

Our findings further explain why the production levels of native aaRS enzymes are near the ‘fitness cliff’ in B. subtilis, rounding out the underlying rationale for the observed fitness landscapes for the aaRS proteins established in the previous chapter. As demonstrated, native levels of aaRS enzymes produce enough charged tRNA flux to make the tRNA accommodation step non-rate limiting and as such, the growth rate of the cell is optimized at native aaRS levels.

However, these levels do not lead to maximized tRNA charging. Here we find that one of the major consequences of maximizing tRNA charging, through aaRS overproduction, is increasing tRNA mischarging. tRNA mischarging, and subsequent misincorporation into protein, is well documented to be detrimental to cell fitness. Although mischarging events are rare, predicted to occur on the order of 104-105 (Ibba and Söll, 1999), even 2-fold increases in mischarging can

127 have pronounced phenotypes (Hilander et al., 2018; Kermgard et al., 2017; Schimmel, 2018; Vo et al., 2018). The observed mischarging phenotype due to overproduction can be potentially alleviated by increased production of the non-cognate aaRS of the mischarged tRNA, (IleS and

LeuS overproduction to together negate tRNAleu and tRNAile mischarging) but this would result in further fitness defects due to the aaRS-independent costs (Sherman et al., 1992; Swanson et al., 1988). Therefore, Bacillus subtilis strikes a balance between tRNAs and aaRSs: produce just enough to maintain translation rates without maximizing charging, in order to limit mischarging and protein production burdens. We also find an increase in downstream stress responses through slight activation of the SigB regulon as another consequence of aaRS overproduction. Whether or not this activation is beneficial or detrimental to cell growth during aaRS overproduction, similar to the stringent response’s role during aaRS underproduction, is the subject of future work.

128 Acknowledgements

We would like to thank Dieter Söll for the extended discussion on creating the aaRS variants,

Alan Grossman and James Taggart for discussions on the degron tagging system, and Jean-

Benoît Lalanne for help making the GFP-RFP construct.

129 Methods

Strain Construction

Bacillus subtilis 168 strain was used as the wild-type strain and base strain for transformations.

All further strains were constructed using natural competence as described in Chapter II. All aaRS and FP variants were cloned under the pSpankHY promoter in a plasmid containing the , a spectinomycin resistance cassette, and flanked by 500 nt homology arms of the amyE locus (Britton et al., 2002). aaRS and FP variants were created with a combination of rolling circle PCR and Gibson assembly techniques (Gibson et al., 2009). Correct plasmids were linearized, transformed into wildtype and selected on LB-agar plates with 100 μg/mL spectinomycin.

qRT-PCR

Cell growth, RNA extraction, and qRT-PCR assays were performed as described in Chapter II.

Cells were grown either with no IPTG or 500 μM IPTG in MCCG media to OD 0.3. Primers used are listed in Table S1.

AUPAGE & Northern analysis tRNA charging analysis was performed as described in Chapter II with oligos used as Northern probes listed in Table S1.

Competition Experiment & analysis

All strains were barcoded, competition experiments performed, and analysis completed as described in Chapter II. Competition experiments were performed for ~40 generations at 50, 100,

130 and 500 μM IPTG with the same 64-fold, 6 generation backdilution approach. Fitness defects were calculated relative to the ‘no insert’ control as described in Chapter II.

Ribosome profiling & analysis

Ribosome profiling data used in this study was generated in Chapter II and reanalyzed here.

Processing of ribosome profiling sequencing reads included removal of the linker sequence and mapping with Bowtie (Langmead et al., 2009) (options -v 2 -k 1) to the NC_000964.3 reference genome obtained from NCBI Reference Sequence Bank. Footprint reads with sizes between 20 to 40 nucleotides were stripped of the first and last 8 nucleotides and a count of 1/n where n equals the remaining read length was assigned to the remaining positions. Gene expression was calculated using the Reads Per Kilobase per Million mapped reads method (RPKM). To calculate fractions of protein synthesis, the total read counts for genes of interest was compared to all ORF mapping reads excluding start and stop codons.

Inferred protein synthesis fraction calculation

In order to assess the fraction of protein synthesis devoted to aaRS variants, without performing a ribosome profiling experiment for each variant, we used an approach combining previous ribosome profiling results with the qPCR results from Figure 3. We reasoned that since the 5’ end of the variant aaRS genes, including the promoter, 5’ UTR, Shine-Delgarno, and first 50 nucleotides of the ORF is the same as the native copy, the translation efficiency of variants would be the same. Therefore, the only difference in expression level would be the steady state mRNA level that could be assessed by qPCR. However, since we cannot distinguish the native copy from the variant copy, we report here the total fraction of protein synthesis for both aaRS.

131 Given our results that maximal induction of the native aaRS enzymes with the PspankHY promoter leads to a 10-fold increase over native levels, the variant when induced would be a majority of the signal. For each SerS and IleS variant, that was assessed in Figure 3A & B, the proteome fraction calculated for the 500 μM IPTG induction ribosome profiling was multiplied by the fold change in mRNA levels between overproducing the native gene and the variant gene.

For example, wildtype IleS enzyme overproduction was calculated to be 2.54% of total protein synthesis. IleS variant I showed total aaRS mRNA levels of 1.17-fold higher than wildtype when overproduced. Thus, we inferred the IleS variant I fraction of protein synthesis to be 2.97%. For the fluorescent protein constructs we took a modified approach with the same assumption that translation efficiency does not change between FP constructs. We first calculated the fraction of protein synthesis of the 1X FP GFP during 500 μM induction using ribosome profiling (0.934%).

We then multiplied each of the other FP constructs by the fold change in length of the concatenated peptide. For example, the 2X FP construct (GFP-RFP) is 1.97x as long as the as the

1X GFP and thus we calculated the fraction of protein synthesis for this construct to be 1.84%.

Supplemental Table 1 – DNA oligos used in this chapter

Oligo Name Sequence Exp. serS Body F GTGAAGTCGGCGAGGAAATTA qPCR serS Body R TGCGGAATATTCGGGATTGAG qPCR serS Native F AAAAAGGAGTGTTTCGCATGC qPCR serS Native F TCTTCAACCTTGCCGATAAGC qPCR ileS Body F GTAGGTGAAGACCGCTTTGT qPCR ileS Body R TCCCTTTGACCGTTCTTGTC qPCR ileS Native F CTGAAAAAATGCAGTGGAGGAATG qPCR ileS Native R GGTCTTTCGTCCGTTCCTG qPCR sigA Body F AGATTGAAGAAGGTGACGAAGAAT qPCR sigA Body R TCAGATCAAGGAACAGCATACC qPCR ile tRNA probe TGGGCCTGAATGGACTCGAACCATCGaCCTCACGCTTATCAGGCG Northern leu tRNA probe CCGAGGGCCGGACTTGAACCGGCACGGTAGTCACCTACCGCAGGATT Northern arg tRNA probe CGCGCCCGAGAGGAGTCGAACCCCTAACCTTTTGAT Northern

132 References

Baouz, S., Schmitter, J.-M., Chenoune, L., Beauvallet, C., Blanquet, S., Woisard, A., and Hountondji, C. (2009). Primary Structure Revision and Active Site Mapping of E. Coli Isoleucyl-tRNA Synthetase by Means of Maldi Mass Spectrometry. Open Biochem J 3, 26–38.

Blum, P., Ory, J., Bauernfeind, J., and Krska, J. (1992). Physiological consequences of DnaK and DnaJ overproduction in Escherichia coli. J. Bacteriol. 174, 7436–7444.

Borel, F., Vincent, C., Leberman, R., and Härtlein, M. (1994). Seryl-tRNA synthetase from Escherichia coli: implication of its N-terminal domain in aminoacylation activity and specificity. Nucleic Acids Res. 22, 2963–2969.

Britton, R.A., Eichenberger, P., Gonzalez-Pastor, J.E., Fawcett, P., Monson, R., Losick, R., and Grossman, A.D. (2002). Genome-wide analysis of the stationary-phase sigma factor (sigma-H) regulon of Bacillus subtilis. J. Bacteriol. 184, 4881–4890.

Clarke, N.D., Lien, D.C., and Schimmel, P. (1988). Evidence from cassette mutagenesis for a structure-function motif in a protein of unknown structure. Science 240, 521–523.

Dennis, P.P., and Bremer, H. (1974). Differential rate of ribosomal protein synthesis in Escherichia coli B/r. Journal of Molecular Biology 84, 407–422.

Dong, H., Nilsson, L., and Kurland, C.G. (1995). Gratuitous overexpression of genes in Escherichia coli leads to growth inhibition and ribosome destruction. J. Bacteriol. 177, 1497– 1504.

Dong, H., Nilsson, L., and Kurland, C.G. (1996). Co-variation of tRNA Abundance and Codon Usage inEscherichia coliat Different Growth Rates. Journal of Molecular Biology 260, 649–663.

Eames, M., and Kortemme, T. (2012). Cost-benefit tradeoffs in engineered lac operons. Science 336, 911–915.

Ehrenberg, M., and Kurland, C.G. (1984). Costs of accuracy determined by a maximal growth rate constraint. Q. Rev. Biophys. 17, 45–82.

Gibson, D.G., Young, L., Chuang, R.-Y., Venter, J.C., Hutchison, C.A., and Smith, H.O. (2009). Enzymatic assembly of DNA molecules up to several hundred kilobases. Nature Methods 6, 343–345.

Griffith, K.L., and Grossman, A.D. (2008). Inducible protein degradation in Bacillus subtilis using heterologous peptide tags and adaptor proteins to target substrates to the protease ClpXP. Mol. Microbiol. 70, 1012–1025.

Hilander, T., Zhou, X.-L., Konovalova, S., Zhang, F.-P., Euro, L., Chilov, D., Poutanen, M., Chihade, J., Wang, E.-D., and Tyynismaa, H. (2018). Editing activity for eliminating mischarged tRNAs is essential in mammalian mitochondria. Nucleic Acids Res. 46, 849–860.

133 Ibba, M., and Söll, D. (1999). Quality control mechanisms during translation. Science 286, 1893–1897.

Ibba, M., and Söll, D. (2000). Aminoacyl-tRNA synthesis. Annu. Rev. Biochem. 69, 617–650.

Kafri, M., Metzl-Raz, E., Jona, G., and Barkai, N. (2016). The Cost of Protein Production. Cell Reports 14, 22–31.

Keren, L., Hausser, J., Lotan-Pompan, M., Vainberg Slutskin, I., Alisar, H., Kaminski, S., Weinberger, A., Alon, U., Milo, R., and Segal, E. (2016). Massively Parallel Interrogation of the Effects of Gene Expression Levels on Fitness. Cell 166, 1282–1294.e18.

Kermgard, E., Yang, Z., Michel, A.-M., Simari, R., Wong, J., Ibba, M., and Lazazzera, B.A. (2017). Quality Control by Isoleucyl-tRNA Synthetase of Bacillus subtilis Is Required for Efficient Sporulation. Sci Rep 7, 41763–41769.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25–10.

Lee, J.W., Beebe, K., Nangle, L.A., Jang, J., Longo-Guess, C.M., Cook, S.A., Davisson, M.T., Sundberg, J.P., Schimmel, P., and Ackerman, S.L. (2006). Editing-defective tRNA synthetase causes protein misfolding and neurodegeneration. Nature 443, 50–55.

Li, G.-W., Burkhardt, D., Gross, C., and Weissman, J.S. (2014). Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624–635.

Ludwiczak, J., Winski, A., Szczepaniak, K., Alva, V., and Dunin-Horkawicz, S. (2019). DeepCoil-a fast and accurate prediction of coiled-coil domains in protein sequences. Bioinformatics 35, 2790–2795.

Nureki, O., Vassylyev, D.G., Tateno, M., Shimada, A., Nakama, T., Fukai, S., Konno, M., Hendrickson, T.L., Schimmel, P., and Yokoyama, S. (1998). Enzyme structure with two catalytic sites for double-sieve selection of substrate. Science 280, 578–582.

Schimmel, P. (2018). The emerging complexity of the tRNA world: mammalian tRNAs beyond protein synthesis. Nature Reviews Molecular Cell Biology 19, 45–58.

Scott, M., Gunderson, C.W., Mateescu, E.M., Zhang, Z., and Hwa, T. (2010). Interdependence of cell growth and gene expression: origins and consequences. Science 330, 1099–1102.

Shepard, A., Shiba, K., and Schimmel, P. (1992). RNA binding determinant in some class I tRNA synthetases identified by alignment-guided mutagenesis. Proc. Natl. Acad. Sci. U.S.a. 89, 9964–9968.

Sherman, J.M., Rogers, M.J., and Söll, D. (1992). Competition of aminoacyl-tRNA synthetases for tRNA ensures the accuracy of aminoacylation. Nucleic Acids Res. 20, 2847–2852.

Subramaniam, A.R., Deloughery, A., Bradshaw, N., Chen, Y., O'Shea, E., Losick, R., and Chai,

134 Y. (2013). A serine sensor for multicellularity in a bacterium. Elife 2, e01501.

Swanson, R., Hoben, P., Sumner-Smith, M., Uemura, H., Watson, L., and Söll, D. (1988). Accuracy of in vivo aminoacylation requires proper balance of tRNA and aminoacyl-tRNA synthetase. Science 242, 1548–1551.

Vincent, C., Borel, F., Willison, J.C., Leberman, R., and Härtlein, M. (1995). Seryl-tRNA synthetase from Escherichia coli: functional evidence for cross-dimer tRNA binding during aminoacylation. Nucleic Acids Res. 23, 1113–1118.

Vind, J., Sørensen, M.A., Rasmussen, M.D., and Pedersen, S. (1993). Synthesis of proteins in Escherichia coli is limited by the concentration of free ribosomes. Expression from reporter genes does not always reflect functional mRNA levels. Journal of Molecular Biology 231, 678– 688.

Vo, M.-N., Terrey, M., Lee, J.W., Roy, B., Moresco, J.J., Sun, L., Fu, H., Liu, Q., Weber, T.G., Yates, J.R., et al. (2018). ANKRD16 prevents neuron loss caused by an editing-defective tRNA synthetase. Nature 557, 510–515.

Wall, J.G., and Plückthun, A. (1995). Effects of overexpressing folding modulators on the in vivo folding of heterologous proteins in Escherichia coli. Curr. Opin. Biotechnol. 6, 507–516.

Weinstein, B., and Solomon, F. (1990). Phenotypic consequences of tubulin overproduction in Saccharomyces cerevisiae: differences between alpha-tubulin and beta-tubulin. Mol. Cell. Biol. 10, 5295–5304.

Zimmermann, L., Stephens, A., Nam, S.-Z., Rau, D., Kübler, J., Lozajic, M., Gabler, F., Söding, J., Lupas, A.N., and Alva, V. (2018). A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core. Journal of Molecular Biology 430, 2237–2243.

135 Chapter 4

Development of a Highly-multiplexed RNA-seq Protocol

Darren J. Parker and Gene-Wei Li

This chapter is being prepared for publication.

136 Abstract

The development of next-generation sequencing technologies has ushered in new age of molecular biology, allowing for quantification of entire transcriptomes. Despite the high- throughput nature of the measurement, the laborious process of converting cellular RNA to short cDNA fragments amenable to sequencing limits the throughput of experimental design. Here we present a simplified method of RNA-seq library preparation to greatly reduce the cost and time associated with sequencing bacterial transcriptomes. We show that that our method accurately captures gene expression and provides high-resolution information on the 5’ ends of transcripts, in line with the best low throughput approaches. Taken together, our method greatly improves the throughput of generating bacterial transcriptomic datasets, allowing RNA-seq to become a standard tool in the arsenal of any microbiologist.

137 Introduction

The expression of DNA into RNA and protein is what fundamentally sets the phenotype of an organism. Being able to quantify the changes in abundance of these macromolecules is therefore critical in understanding cellular responses to perturbation and the ultimate relationship between genotype and phenotype. However, for many years technological limitations only allowed for the quantification of changes between a few genes at a time, resulting in an incomplete picture of cell responses upon challenge. The advent of high-throughput methodologies, in particular next generation sequencing (NGS), finally allowed for all gene expression changes to be measured simultaneously with high fidelity (Nagalakshmi et al., 2008;

Wang et al., 2009). However, the gold standard approach for quantifying gene expression, RNA- seq, is not a simple process, and currently the cost of library preparation per sample is similar to the cost of sequencing. RNA extracted from cells must go through multiple chemical and physical treatments before it can be measured on a sequencer, with a variety of protocols to this end (Gertz et al., 2012; Mamanova et al., 2010; Sultan et al., 2012). This process is not only expensive, but can be highly laborious, requiring hours of hands on time across all samples

(Levin et al., 2010). Furthermore, although some commercial products aim to simplify the workflow, they often come with a loss of information and an increase in price for the final experiment (Dard-Dascot et al., 2018; Wright et al., 2019).

We therefore sought to simplify and cheapen the workflow of the RNA-seq library preparation process without a major loss of data quality. We designed an approach that rapidly converts fragmented RNA into barcoded cDNA in order to reduce sample to sample variation and the amount of hands-on time required for the procedure. Furthermore, we use sparse fragmentation (Lalanne et al., 2018) and a DNA splint ligation approach (Kwok et al., 2013) to

138 effectively capture and enrich for the 5’ ends of transcripts, increasing the information in the experiment beyond simple gene expression analysis. We find that our method is highly reproducible, measuring gene expression accurately with little crosstalk. All enzymes and DNA oligos used are relatively inexpensive, making the cost of the library preparation an order of magnitude cheaper than the final sequencing. The savings in time, money, and complexity opens the possibility of performing RNA-seq at greater throughput than previously possible.

Results

A simplified early stage barcoding RNA-seq protocol

The workflow involved in the standard RNA-seq protocol involves a lengthy procedure to convert full RNA molecules to short cDNA fragments amenable to high-throughput NGS. One state of the art approach, Rend-seq, provides a high resolution view of the transcriptome, but requires a process spread out over multiple days and hours (Figure 1A) (Lalanne et al., 2018).

Each of the chemical steps involved is completed on individual samples of RNA, requiring multiple extractions, precipitations, tubes, and lanes on gels to be completed for every sample, greatly increasing the time involved when multiple samples are prepared at once (Islam et al.,

2014; Kivioja et al., 2011). We reasoned that a large amount of savings, in both time and money, could be achieved by barcoding and pooling samples as early as possible. Although other library preparation strategies exist with the same aim, utilizing random hexamer or ligation approaches during reverse transcription, each has caveats (Hou et al., 2015; Mäki and Tiirola, 2018; Shishkin et al., 2015). Random hexamer priming is generally biased towards genes with higher GC content and can lead to systematic errors upon sequencing depending on gene position (Hansen et al., 2010). The ligation of common adaptors for specific priming also has well known

139 sequence and structural biases, and the RNA adapters commonly used are susceptible to spontaneous hydrolysis over many freeze-thaw cycles (Elliott and Ladomery, 2017; Fuchs et al.,

2015; Raabe et al., 2014; Zhou and Taira, 1998; Zhuang et al., 2012). Thus, we turned to a priming method generally reserved for eukaryotic RNA-seq protocols, priming from a poly(A) tail with an oligo(dT) primer (Jaitin et al., 2014). We noticed that the conditions for activity of the commercially available E. coli PolyA Polymerase, which efficiently adenylates all terminal nucleotides of RNA, only required slight changes in salt concentrations from the previous and following steps in the protocol, dephosphorylation by T4 Polynucleotide kinase and reverse transcription by Superscript III. Therefore, we developed a streamlined protocol with mild salt and buffer adjustments, OnePot RT, that allows for dephosphorylation, poly(A) tailing, and reverse transcription with a barcoded DNA primer to occur on short fragmented RNA without any intervening cleanup steps (Figure 1B). At this point, all individual libraries are now uniquely barcoded and can be pooled into a single tube. Thus, the final steps of adapter ligation and amplification are performed on only a single sample. With this approach, the process of pooling

30+ unique samples of RNA is only limited by the time required to harvest the RNA from the cells, saving time and money while reducing inter-sample heterogeneity.

140

Figure 1. Comparison of library preparation methods. (A) Traditional method of library preparation for 30 unique samples. Individual circles represent individual tubes required to process 30 samples through the protocol. The identity of the nucleic acids as a result of each step is shown above the samples. Red and blue lines indicate RNA and cDNA respectively. After the “Final PCR” step, samples can be pooled and sequenced together. (B) Multiplex method of library preparation. Schematic setup the same as (A) with circles representing individual samples. After fragmentation, the OnePot RT protocol is performed sequentially in the same tube before combining all libraries into a single tube due to barcoding. Shades of blue denote cDNA libraries generated from unique experiments that exist in the same pool due to barcoding.

141 Validation of multiplex RNA-seq method

We next sought to test the performance of the protocol. In order to be a useful tool for gene expression analysis our method needed to demonstrate high levels of reproducibility, low crosstalk between samples, and high levels of information content. To demonstrate reproducibly, we extracted RNA from MG1655 K12 E. coli cells and split the RNA sample into 12 independent fractions. Each sample was treated as a unique experiment, barcoded during the RT step, pooled, and processed as a single sample. Upon sequencing, we found that gene expression was highly correlated across all pairwise combinations (Figure 2A). Variation between samples was generally due to sequencing depth (Figure 2B) demonstrating the method has high technical reproducibility.

One of the issues with early stage barcoding and pooling is the potential for barcode

“crosstalk”, the incorrect association of samples with barcodes. This is likely to occur during the

PCR amplification step by DNA polymerase template switchin (Kebschull and Zador, 2015;

Odelberg et al., 1995). Determining the potential for crosstalk is especially important when making measurements of lowly transcribed genes or small effect changes, as signals can be diluted when many experiments are pooled together. We checked for barcode crosstalk by performing the protocol on RNA harvested from wildtype, Δtig, and ΔsmpB E. coli strains (Baba et al., 2006). Although we had some early evidence of barcode crosstalk using the Phusion DNA polymerase, using the Q5 DNA polymerase (NEB), we found that no reads mapped to the tig and smpB coding regions in the Δtig and ΔsmpB samples respectively (Figure 2C), indicating the method accurately captures the RNAs of pooled samples.

We subsequently assessed the ability of our method to capture transcript 5’ end information. Following in the footsteps of the high-resolution Rend-seq protocol (Lalanne et al.,

142 2018) we introduced a light fragmentation step before OnePot RT to enrich for terminal RNA fragments. Using libraries generated from WT Bacillus subtilis RNA we found that the 5’ ends of the RNA were captured with single-nucleotide precision due to the high efficiency of the splint ligation approach (Kwok et al., 2013) (Figure 2D). Unfortunately, although the poly(A) tailing step is efficient in adenylating all RNA fragments, the 3’ end information is not as precise

(data not shown). In summary, our method can provide high resolution information about transcription start sites and RNA cleavage events in addition to accurate gene expression quantification.

143

144 Figure 2. Quality control of the multiplex method. (A) Correlation between all 12 barcoded replicate experiments. Pearson correlation of read counts for the top 1000 mRNA of all pairs of barcoded libraries derived from a split sample of WT MG1655 E. coli RNA. Lowest correlation calculated was between BC3 and 5 with Pearson correlation R = 0.9937. The average read count for the lowest counted gene (#1000, kdgK) was 20.64 reads. (B) Correlation of gene expression between barcoded WT replicates. Barcodes were chosen at random (6 and 11) to compare expression of the top 1000 protein coding mRNA. (C) Window of smpB and tig gene loci in E. coli with mapped read counts. 5’ ends of reads mapped to the smpB and tig gene loci are plotted from the WT MG1655, ΔsmpB, and Δtig experiments. (D) Example of 5’ end enrichment produced by multiplex method. Two independent libraries were generated from B. subtilis 168 with the multiplex early-stage barcoding method. Window of the rpsB operon with multiple known transcription start sites is shown (Lalanne et al., 2018). Insets show precision of capture of 5’ ends.

Discussion

Ten years after its inception, RNA-seq has become one of the most ubiquitous techniques in molecular biology to measure gene expression. With the cost associated with short-read sequencing dropping every year, a major limitation in generation of transcriptomic data has become the time and cost associated with library preparation. Here we describe a simple, cost and time-effective library preparation protocol for bacterial transcriptomics. We demonstrate that our early stage barcoding strategy captures accurate gene expression and transcript 5’ end information, in line with the best currently available technologies. Furthermore, early stage barcoding with OnePot RT allows for all steps downstream of fragmentation to be done without much pipetting, greatly reducing the hands-on time and intersample heterogeneity. Importantly, the protocol uses inexpensive, commercially available enzymes and DNA oligos.

By pairing our protocol with DIY rRNA removal methods, such as those developed by

Culviner, Guegler, and Laub (Culviner et al., 2020), the number of transcriptomes per sequencing flow can be greatly increased without a noticeable increase in cost or time. The benefits of simple non-commercial methods are three-fold. First, throughput of experimental design can be greatly increased without a large increase in cost. Whereas, the traditional

145 experiment involves the RNA-seq of tens of bacterial strains or conditions, the feasibility of exploring hundreds of transcriptomes at once is greatly increased (Kemmeren et al., 2014).

Second, the concern of commercial kits fluctuating in price, availability, and reproducibility is alleviated. Finally, and most importantly, these technical advances increase the accessibility of

RNA-seq to labs that once were prohibited by the cost of the method. Taken together, our early stage barcoding and multiplex method greatly expands the ability for whole-transcriptome profiling to become a mainstay in the microbiologist’s toolkit.

Acknowledgements

We would like to thank Stuart Levine and Noelani Kamelamela of the MIT BioMicro Center for their helpful discussions during development of the assay.

146 Methods

Bacterial Strains & Cell growth conditions

E. coli MG1655 K12 was used as the wildtype strain for the barcode pairing experiment. The

Δtig and ΔsmpB strains, originally produced in the Keio collection (Baba et al., 2006), were obtained from the Coli Genetic Stock Center (CGSC) at Yale University. Bacillus subtilis strain

168 was used for the experiment in Figure 2D. All cell growth was performed in LB media.

After overnight growth in liquid LB, cells were backdiluted >400-fold in fresh LB media until

OD 0.3. 5 mL of cells were mixed with 5 mL ice cold Methanol, spun for 10 minutes and decanted of supernatant before freezing at -80 °C.

RNA Extraction

RNA was extracted from pelleted cells using 10 mg/mL Lysozyme and RNeasy columns

(Qiagen) according to the manufacturer’s protocol.

RNA-seq data analysis

Processing of RNA sequencing reads included removal of poly(A) tails and mapping with

Bowtie (Langmead et al., 2009) (options -v 2 -k 1) to the NC_000964.3 or the NC_000913.3 reference genome for the B. subtilis and E. coli experiments respectively. The five prime ends of reads mapping to open reading frames was counted excluding the first and last five positions.

Read counts were normalized by total mapped reads and gene length (RPKM).

147 Multiplex RNA-seq Protocol

The entire protocol after RNA Extraction is listed below.

1. Fragmentation • Set up 250 ng RNA in 10 μL ddH20 on a cold block • Add 1 μL 10X Fragmentation reagent, briefly vortex to mix and spin down • Incubate at 95°C for 1 min 45 seconds & return to cold block • Add 1.1 μL 10X Stop Solution • Briefly mix & spin down • Cleanup reaction using Oligo Clean & Concentrator Column according to manufacturer’s protocol (Zymo – D4060) • Elute RNA off of column with 16 μL DEPC H20

2. One Pot RT reaction • Combine in strip tube on cold block: o 16 μL RNA o 4 μL PNK Master Mix (per sample): ▪ 2 μL 10X PNK Buffer ▪ 0.25 μL SUPERase*In ▪ 1.25 μL DEPC H20 ▪ 0.5 μL PNK enzyme • Incubate at 37°C for 60 min, 75°C for 10 min • Add 10 μL Poly(A) Master Mix (per sample): o 3 μL 500 mM KCl o 3 μL 10 mM ATP o 2 μL 5X FS Buffer (From Superscript III enzyme) o 0.25 μL SUPERase*In o 1.25 μL DEPC H20 o 0.5 μL E. Coli PolyA Polymerase • Incubate at 37°C for 30 min, 75°C for 10 min • Add 1 uL 25 uM RT Barcoding Primer • Incubate at 65°C for 5 min & return to cold block • Add Reverse Transcription Master Mix (per sample): o 3 μL 0.1M DTT o 2 μL 10 mM dNTP mix o 2 μL 5X FS Buffer o 1.25 μL DEPC water o 0.25 μL SUPERase*In o 0.5 μL SS III RT Enzyme • Incubate 50°C for 60 min, 75°C for 10 min • Pool all samples into a single 1.7 mL tube & mix thoroughly • Add 4.4 μL 1M NaOH & vortex briefly • Incubate at 95°C for 15 minutes

148 3. cDNA Size-Selection Gel • Remove 180 uL pooled sample and add 90 uL 2X Urea loading buffer • Heat samples for 2 min @ 75°C • Load onto 8 lanes on 10% TBU gel • Run gel at 200V for 1.5 to 2 hours • Cut bands from >100 – 120 nt • Extract and precipitate • Dissolve pellet in 20 μL 10 mM Tris 8

4. 3’ end ligation • Add (per sample): o 10 uL Size-selected cDNA as substrate o 5 uL 100uM p214 o 3 uL DEPC water o 5 uL 10X T4 DNA Ligase Buffer o 5 uL 5M Betaine o 20 uL PEG 8000 o 2 uL T4 DNA Ligase • Complete one reaction with 1 pmol oDJP208 as a ligation control • Incubate at at 16°C for 10 hours, 4°C overnight • Heat to 75°C for 10 minutes to denature enzyme • Clean up reaction with Zymo Oligo Clean & Concentrator column • Elute in 10 μL 10 mM Tris 8 • Add 10 μL 2X Urea Loading dye • Incubate at 95°C for 2 minutes • Run out on a 10% TBU gel for 1 hour 45 minutes • Cut band from ~135 – 155 nt • Extract and precipitate • Dissolve pellet in 20 μL 10 mM Tris 8

5. Library prep PCR • Make master mix (per sample): o 5 μL Ligated DNA o 6 μL 10 uM oDP161 o 6 μL 10 uM oDP128 o 6 μL 10 mM dNTP mix o 24 μL 10X Q5 buffer o 60 μL water o 2 μL Q5 polymerase • Aliquot 20 μL mix into 5 tubes, one for each cycle • PCR Protocol - 5 PCR reactions per sample at 6, 8, 10, 12, 14 cycles o 98° - 30s o Cycles: ▪ 98° - 10s ▪ 60° - 10s

149 ▪ 72° - 7s o 72° - 20s • Add loading dye & run on 8% TBE gel at 180V for 55 minutes • Cut PCR bands • Extract and precipitate. • Dissolved in 10 μL 10 mM Tris 8 • Send for sequencing on Illumina HiSeq or NextSeq

6. Repeated Protocols, Buffers, Reagents

Precipitation Protocol: 1. To RNA or DNA add up to 300 total uL appropriate 10 mM Tris buffer (7 for RNA, 8 for DNA), 33 uL (1/10 volume) 3M Sodium Acetate pH 5.5, 2 uL Glycoblue reagent 2. Briefly vortex or mix by pipet & briefly spin down 3. Add 900 uL (3 volumes) 100% cold ethanol and briefly vortex 4. -80° freezer for at least 30 minutes 5. Spin in 4° centrifuge for 30 minutes 6. Decant ethanol and add 500 uL 70% cold ethanol 7. Spin for at least 5 more minutes 8. Decant and remove excess ethanol by pipet 9. Air dry for 5 minutes and then add appropriate water/Tris buffer

Gel extraction protocol 1. Pierce an 0.5 mL tube with an 18-gauge needle and put it inside a non-stick 2 mL screw cap tube. 2. Excise gel piece and place inside 0.5 mL tube. 3. Spin the nested tubes 3 min @ 20,000 xG to force the gel through the needle hole. Shake any residual gel from the small tube into the larger tube. 4. Add 500 uL DEPC water to gel pieces and incubate 10 minutes @ 70 C. 5. Vortex gel slurry >15 seconds and use cut p1000 tip to transfer gel mixture to Costar Spin-X column. 6. Centrifuge 3 min @ 20,000 xG to recover the elution mixture free of gel debris. 7. Transfer eluate to new non-stick tube. 8. Add 2 uL glycoblue, 45 uL 3M NaOAc for RNA (3M NaCl for DNA) to eluate. 9. Add 750 uL isopropanol and mix well, precipitate at least 30 minutes at -20 or -80 C. 10. Spin 30 minutes @ 20,000 xG @ 4 C to pellet nucleic acids. 11. Remove supernatant, wash pellet in 750 uL 80% ice-cold EtOH. 12. Spin >5 minutes, remove supernatant and air-dry. 13. Re-suspend in water or 10 mM Tris 7.

7. Primers/Oligos and Commercial enzymes used

Commercial Enzymes & Reagents

150 Enzyme Supplier Catalog Number Fragmentation Reagent ThermoFisher Scientific AM8740 T4 Polynucleotide Kinase NEB M0201S E. coli PolyA Polymerase NEB M0276L Superscript III ThermoFisher Scientific 18080093 T4 DNA Ligase NEB M0202M Q5 DNA Polymerase NEB M0491L

General Reverse Transcription primer - CAAGCAGAAGACGGCATACGAGAT XXXXXX GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTTTTTTTTTTTTTVNN

Barcode Table (Reverse complement of XXXXXX in above sequence)

BC# Sequence BC# Sequence BC# Sequence BC1 AAGTCC BC20 CTACTA BC39 TCGTAA BC2 ACTGCA BC21 CTTGTT BC40 AACCCT BC3 AGAACT BC22 GACCAA BC41 TGGGAT BC4 AGTCAC BC23 GATAGT BC42 GGGTTA BC5 ATCATG BC24 GCAACA BC43 AGAGGA BC6 CACGAT BC25 TTCGCA BC44 GAAGCT BC7 CATGTC BC26 AGCTTT BC45 CTGCAT BC8 CTGGTA BC27 ACAAGC BC46 TCCAGA BC9 GAGCTT BC28 TATCGA BC47 ATCGGT BC10 GATTCG BC29 CAAAGA BC48 GTATTC BC11 GCTCAA BC30 TAGACA BC49 TCCGTT BC12 GTCTGA BC31 CTAGAG BC50 CCAATT BC13 TCACTG BC32 GCTATG BC51 CACTCA BC14 TGCTAG BC33 AAGGAG BC52 CTTAGC BC15 TGTTGC BC34 CCTTAG BC53 TACTGT BC16 TTGAGT BC35 TCTACT BC54 CAATAC BC17 ATGAAC BC36 ATACCC BC55 GGACAT BC18 TATGCG BC37 TGAAAC BC56 AACGTA BC19 GCGAAT BC38 GTAAGG BC57 TGGATG

cDNA Ligation Oligos DJP Oligo Use Sequence # oDJP214 Ligation Oligo AGATCGGAAGAGCGTCGTUCTGAUCTNNNN oDJP208 Oligo for control CAAGCAGAAGACGGCATACGAGATGGACTTGTGACT reaction GGAGTTCAGACGTGTGCTCTTCCGATCTTTTTTTTTTT TTTTTTTCGCGTTGCGGGTCGACTCCGTGTACAT

151

PCR Primers DJP Oligo # Alt. name Sequence oDJP121 oDP007 AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTAC ACGACGCTCTTCCGATCT oDJP161 oDJP010 CAAGCAGAAGACGGCA

152 References

Baba, T., Ara, T., Hasegawa, M., Takai, Y., Okumura, Y., Baba, M., Datsenko, K.A., Tomita, M., Wanner, B.L., and Mori, H. (2006). Construction of Escherichia coli K‐12 in‐frame, single‐ gene knockout mutants: the Keio collection. Molecular Systems Biology 2, 2006.0008.

Culviner, P.H., Guegler, C.K., and Laub, M.T. (2020). A Simple, Cost-Effective, and Robust Method for rRNA Depletion in RNA-Sequencing Studies. MBio 11, 785.

Dard-Dascot, C., Naquin, D., d'Aubenton-Carafa, Y., Alix, K., Thermes, C., and van Dijk, E. (2018). Systematic comparison of small RNA library preparation protocols for next-generation sequencing. BMC Genomics 19, 118–16.

Elliott, D., and Ladomery, M. (2017). Molecular Biology of RNA (Oxford University Press).

Fuchs, R.T., Sun, Z., Zhuang, F., and Robb, G.B. (2015). Bias in ligation-based small RNA sequencing library construction is determined by adaptor and RNA structure. PLoS ONE 10, e0126049.

Gertz, J., Varley, K.E., Davis, N.S., Baas, B.J., Goryshin, I.Y., Vaidyanathan, R., Kuersten, S., and Myers, R.M. (2012). Transposase mediated construction of RNA-seq libraries. Genome Res. 22, 134–141.

Hansen, K.D., Brenner, S.E., and Dudoit, S. (2010). Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res. 38, e131–e131.

Hou, Z., Jiang, P., Swanson, S.A., Elwell, A.L., Nguyen, B.K.S., Bolin, J.M., Stewart, R., and Thomson, J.A. (2015). A cost-effective RNA sequencing protocol for large-scale gene expression studies. Sci Rep 5, 9570–9575.

Islam, S., Zeisel, A., Joost, S., La Manno, G., Zajac, P., Kasper, M., Lönnerberg, P., and Linnarsson, S. (2014). Quantitative single-cell RNA-seq with unique molecular identifiers. Nature Methods 11, 163–166.

Jaitin, D.A., Kenigsberg, E., Keren-Shaul, H., Elefant, N., Paul, F., Zaretsky, I., Mildner, A., Cohen, N., Jung, S., Tanay, A., et al. (2014). Massively parallel single-cell RNA-seq for marker- free decomposition of tissues into cell types. Science 343, 776–779.

Kebschull, J.M., and Zador, A.M. (2015). Sources of PCR-induced distortions in high- throughput sequencing data sets. Nucleic Acids Res. 43, e143.

Kemmeren, P., Sameith, K., van de Pasch, L.A.L., Benschop, J.J., Lenstra, T.L., Margaritis, T., O'Duibhir, E., Apweiler, E., van Wageningen, S., Ko, C.W., et al. (2014). Large-scale genetic perturbations reveal regulatory networks and an abundance of gene-specific . Cell 157, 740–752.

Kivioja, T., Vähärautio, A., Karlsson, K., Bonke, M., Enge, M., Linnarsson, S., and Taipale, J. (2011). Counting absolute numbers of molecules using unique molecular identifiers. Nature

153 Methods 9, 72–74.

Kwok, C.K., Ding, Y., Sherlock, M.E., Assmann, S.M., and Bevilacqua, P.C. (2013). A hybridization-based approach for quantitative and low-bias single-stranded DNA ligation. Anal. Biochem. 435, 181–186.

Lalanne, J.-B., Taggart, J.C., Guo, M.S., Herzel, L., Schieler, A., and Li, G.-W. (2018). Evolutionary Convergence of Pathway-Specific Enzyme Expression Stoichiometry. Cell 173, 749–761.e38.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25–10.

Levin, J.Z., Yassour, M., Adiconis, X., Nusbaum, C., Thompson, D.A., Friedman, N., Gnirke, A., and Regev, A. (2010). Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nature Methods 7, 709–715.

Mamanova, L., Andrews, R.M., James, K.D., Sheridan, E.M., Ellis, P.D., Langford, C.F., Ost, T.W.B., Collins, J.E., and Turner, D.J. (2010). FRT-seq: amplification-free, strand-specific transcriptome sequencing. Nature Methods 7, 130–132.

Mäki, A., and Tiirola, M. (2018). Directional high-throughput sequencing of RNAs without gene-specific primers. BioTechniques 65, 219–223.

Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M., and Snyder, M. (2008). The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349.

Odelberg, S.J., Weiss, R.B., Hata, A., and White, R. (1995). Template-switching during DNA synthesis by Thermus aquaticus DNA polymerase I. Nucleic Acids Res. 23, 2049–2057.

Raabe, C.A., Tang, T.-H., Brosius, J., and Rozhdestvensky, T.S. (2014). Biases in small RNA deep sequencing data. Nucleic Acids Res. 42, 1414–1426.

Shishkin, A.A., Giannoukos, G., Kucukural, A., Ciulla, D., Busby, M., Surka, C., Chen, J., Bhattacharyya, R.P., Rudy, R.F., Patel, M.M., et al. (2015). Simultaneous generation of many RNA-seq libraries in a single reaction. Nature Methods 12, 323–325.

Sultan, M., Dökel, S., Amstislavskiy, V., Wuttig, D., Sültmann, H., Lehrach, H., and Yaspo, M.- L. (2012). A simple strand-specific RNA-Seq library preparation protocol combining the Illumina TruSeq RNA and the dUTP methods. Biochem. Biophys. Res. Commun. 422, 643–646.

Wang, Z., Gerstein, M., and Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10, 57–63.

Wright, C., Rajpurohit, A., Burke, E.E., Williams, C., Collado-Torres, L., Kimos, M., Brandon, N.J., Cross, A.J., Jaffe, A.E., Weinberger, D.R., et al. (2019). Comprehensive assessment of multiple biases in small RNA sequencing reveals significant differences in the performance of

154 widely used methods. BMC Genomics 20, 513–521.

Zhou, D.-M., and Taira, K. (1998). The Hydrolysis of RNA: From Theoretical Calculations to the Hammerhead Ribozyme-Mediated Cleavage of RNA. Chem. Rev. 98, 991–1026.

Zhuang, F., Fuchs, R.T., Sun, Z., Zheng, Y., and Robb, G.B. (2012). Structural bias in T4 RNA ligase-mediated 3'-adapter ligation. Nucleic Acids Res. 40, e54–e54.

155 Chapter 5

Rapid Accumulation of Motility-Activating Mutations in Resting Liquid Culture of Escherichia coli

Darren J. Parker,1 Pınar Demetci,1 Gene-Wei Li1

1Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

This part was published in the Journal of Bacteriology. DOI: 10.1128/JB.00259-19 PMID: 31285239

Author Contributions

D.J.P. and P.D. contributed equally to this work. D.J.P., P.D., and G.-W.L. conceived the study and wrote the manuscript. D.J.P. designed and completed the DNA and RNA sequencing. P.D. performed the qPCR, overnight incubation, and swimming assays. G.-W.L. supervised the project.

156 Abstract

Expression of motility genes is a potentially beneficial but costly process in bacteria.

Interestingly, many isolate strains of Escherichia coli possess motility genes but have lost the ability to activate them in conditions in which motility is advantageous, raising the question of how they respond to these situations. Through transcriptome profiling of strains in the E. coli single-gene knockout Keio collection, we noticed drastic upregulation of motility genes in many of the deletion strains as compared to its weakly motile parent strain (BW25113). We show that this switch to a motile phenotype is not a direct consequence of the genes deleted, but is instead due to a variety of secondary mutations that increase the expression of the major motility regulator, FlhDC. Importantly, we find that this switch can be reproduced by growing poorly motile E. coli strains in non-shaking liquid medium overnight but not in shaking liquid medium.

Individual isolates after the non-shaking overnight incubations acquired distinct mutations upstream of the flhDC operon, including different insertion sequence (IS) elements and, to a lesser extent, point mutations. The rapid sweep in the non-shaking population shows that poorly motile strains can quickly adapt to a motile lifestyle by genetic rewiring.

157 Introduction

Motility is a highly effective strategy for bacteria to move within their environment.

Although this process is important for cells to survive in nutrient starved conditions, the cohort of proteins required for this ability are costly to produce and utilize, accounting for up to 2% of energy expenditure in E. coli (Chevance and Hughes, 2008; Soutourina and Bertin, 2003). Unsurprisingly, motility genes in many bacteria are highly regulated to ensure both minimal expression under unnecessary conditions, as well as ordered production when motility becomes beneficial. In E. coli, integration of regulatory signals primarily take place at the flhDC operon, which encodes the

28 master transcription factor FlhD4C2 that regulates the flagellar sigma factor σ and many other motility genes (Chilcott and Hughes, 2000; Girgis et al., 2007; Liu and Matsumura, 1994;

Soutourina and Bertin, 2003; Wang et al., 2006). Like many genetic regulatory circuits in E. coli, the mechanisms activating and inactivating flhDC expression have been well characterized (Bertin et al., 1994; Fitzgerald et al., 2014; Patrick and Kearns, 2012; Soutourina et al., 1999).

In addition to pre-wired regulation, genetic mutations can also lead to activation of genes that are otherwise silenced, but the dynamics of such events under changing environments are much less understood. In the case of motility gene expression, a wide array of mutations has been found upstream of flhDC in both wild and laboratory isolates of E. coli K-12 strains (Barker et al., 2004; Fahrner and Berg, 2015; Lee and Park, 2013). In particular, many strains, including several sequenced MG1655 variants such as CGSC 7740, carry different insertion sequence (IS) elements that perturb the regulation of flhDC expression, leading to dramatic differences in the ability to become motile under conditions that are typically used to study motility (Barker et al.,

2004; Wang and Wood, 2011; Zhang et al., 2017). By contrast, strains without IS elements upstream of flhDC, including other isolates of MG1655 like CGSC 6300 and a K-12 derivative,

158 BW25113, have been shown to be poorly motile (Grenier et al., 2014; Tamar et al., 2016; Wang and Wood, 2011). The loss of motility in these strain raises the question of how they can adapt to detrimental conditions. It was recently found that rare motile isolates can be enriched by growing non-motile starting cultures on soft agar plates (Wang and Wood, 2011; Zhang et al., 2017), suggesting that phenotypic switching may occur if the appropriate conditions are provided.

The poorly motile E. coli strain BW25113 served as the parent strain for the Keio collection, the library of single-gene deletion E. coli strains (Baba et al., 2006). Since its inception 13 years ago, the Keio collection has provided a powerful foundation for a wide array of studies, including functional genomics, experimental evolution, and synthetic biology (Baptist et al., 2013; Falls et al., 2014; French et al., 2016; Nichols et al., 2011; Takeuchi et al., 2014).

The collection was generated by using the lambda-red recombination technology to produce single gene deletions of all non-essential genes from the parent BW25113 strain (Baba et al.,

2006). Although the processes of recombination and validation may generate secondary mutations, there have not been systematic reports of any genetic changes in the collection to our knowledge.

Here we report that a large fraction of the Keio strains tested have dramatically elevated, and likely constitutive, motility gene expression and a motile phenotype compared to the parent strain. We have mapped this increase to direct genetic changes near the flhDC and lrhA loci, the master regulators of motility gene expression. Importantly, this shift from a non-motile state to a motile one is not due to neither the single gene deletions nor lambda-red recombination. Rather, simply incubating E. coli strains in liquid media without shaking overnight is sufficient to select for mutations that have constitutive expression of motility genes. By contrast, non-motile cells grown in shaking liquid media do not undergo selection for motility. These mutations come

159 through various means, but all collectively lead to de-repression of the flhDC locus and increased motility. Taken together, these results demonstrate a secondary form of heterogeneity in the

Keio collection, and serve as another example of how rapid genetic changes can sweep through a population of bacteria in the absence of regulation.

Results

Transcriptome profiling of Keio strains shows drastic differences in motility gene expression

We first noticed striking disparity in motility gene expression while performing a pilot study to map transcriptome responses to single-gene deletions in E. coli using the Keio collection. RNA sequencing was carried out for 71 Keio strains obtained directly from the Coli

Genetic Stock Center (CGSC) grown in shaking MOPS minimal media (Material and Methods).

Hierarchical clustering of gene expression changes relative to the parent strain, BW25113, showed a prominent signature of upshift for flagellar σ28 controlled genes in 49 of the 71 strains

(Fig. 1A and Table S1). Among these 49 strains, the increases in motility gene expression account for most of the largest transcriptome differences with the parent strain (Fig. 1B).

160

Figure 1. Transcriptome remodeling among Keio strains. (A) Hierarchical clustering of gene expression data from 71 Keio knockout strains. Columns denote strains and rows denote genes. Fold-change between mutants and BW25113 is log2 transformed and colored according the scale (inset). Un-centered Pearson correlation was used as the distance metric and clustering is performed using complete linkage. 49 strains showed highly upregulated expression of genes regulated by the flagellar sigma factor σ28, as highlighted on the right of the heatmap. (B) Transcriptome-wide comparisons between BW25113 (WT) and three representative Keio knockout strains highlighted in (A). Annotated motility-related genes are marked in red.

161

To validate the transcriptomic shift, we utilized a GFP reporter construct driven by the promoter of fliC, a major component of the flagellin system, and one of the genes that show the largest increase in expression (Berg, 2003) (Fig. 2A). Plasmids harboring the PfliC-gfp reporter were transformed into BW25113, our lab’s MG1655 strain, and four representative Keio strains, three of which exhibited motility gene activation (ΔpykAKeio, ΔsodAKeio, ΔcspAKeio). While little fluorescence was observed for the parent strain BW25113, the Keio strains with upregulated σ28 controlled genes all showed elevated fluorescence from PfliC-gfp when plated on solid LB-agar

(Fig. 2B). Additionally, ΔsodBKeio did not show any increase in σ28 controlled genes and exhibited no increase in fluorescence over its parent. As another negative control, our lab strain of MG1655 showed low GFP fluorescence, in agreement with our previous transcriptome profiling data (Li et al., 2014). During the construction of the Keio collection, two independent clones were saved for each deletion (Baba et al., 2006). We found that the second clone from the collection (ΔpykAKeio-dup., ΔsodAKeio-dup., ΔcspAKeio-dup.) also showed high, albeit different, levels of fluorescence when transformed with the plasmid containing PfliC-gfp (Fig. 2B). Taken together, these findings suggest a shift in motility gene expression is a common signature in a large fraction of the Keio collection, despite originating from the poorly motile strain BW25113.

162

Figure 2. Confirmation of motility gene expression using a GFP reporter driven by fliC promoter. (A) Schematic of the GFP reporter driven by the σ28-dependent fliC promoter for motility gene expression. 70 nucleotides upstream of the coding sequence of fliC was cloned in front of gfp and inserted into a low copy plasmid, pSC101. (B) Fluorescence image of strains containing the PfliC-gfp reporter. Six Keio strains, along with BW25113 and MG1655, were transformed with the plasmid carrying the reporter construct and streaked on a 1.2% LB-agar plate before exposure to UV light. The strain ΔsodBKeio did not show upregulation of motility genes in RNA-seq experiment and was included as a negative control.

163 Swimming phenotype reflects increased motility gene expression

We next examined whether the transcriptomic changes led to an observable increase in the cells’ motility. Cell cultures in exponential phase were directly inoculated at the center of

0.25% soft agar plates and grown overnight at 37°C. In comparison to the non-motile parent

BW25113 strain, the selected Keio strains from both duplicates of the collection covered a larger area on the soft agar plate, demonstrating that these cells were able to swim further outwards from the origin of inoculation (Fig. 3A). The diameter of the swimming cultures varies across strains and is strongly correlated with the expression level of fliC as measured by qPCR (Fig

3B), all of which exhibit greater expression than WT. Together, these results show that the transcriptional activation of the flagellar genes directly relates to an increase in swimming ability.

164

Figure 3. Correlation between motility gene activation and swimming phenotype. (A) Swimming motility assay on soft-agar plates. Plates were imaged after cells were spotted in the center and incubated for 8 hours at 37°C. Dashed lines mark the measured diameter of each culture. (B) Correlation between the diameter of motility assay and fliC expression level as measured by qPCR. Vertical error bars indicate the standard deviation (S.D.) for qPCR measurements from three technical replicates. Horizontal error bars indicate the range for diameter measurements from three biological replicates of the swimming assay.

165 Widespread secondary mutations affect the expression of motility regulators

To directly assess whether the gene deletions were sufficient to explain the motility phenotype, we transduced the deletion cassette from selected Keio strains into a wild-type

BW25113 background using P1. Comparing the expression levels of fliC and cheA, another gene of the flagellar regulon, via qPCR, we observed that their expression in freshly transduced cells was similar to the levels in the BW25113 (Fig 4A). The lack of motility gene expression and phenotype in freshly transduced cells was further confirmed using the PfliC- gfp reporter construct and soft agar swimming assays, respectively (Fig. 4B and C). These results demonstrate that the widespread activation of the σ28 regulon in the Keio collection is not a direct response to gene deletion, consistent with the quantitative differences in the swimming assay between two independent mutants (Fig. 3B).

166

Figure 4. Absence of motility activation after P1 transduction of single-gene deletions. (A) mRNA levels of fliC and cheA as measured by qPCR for three Keio strains and three freshly transduced strains. Error bars indicate S.D. among three technical replicates. (B) Fluorescence images of LB-agar plates for freshly transduced deletion strains. ∆cspAKeio from the Keio collection was included as a positive control. (C) Swimming motility assay on soft-agar plates for the same strains in (B).

To determine the secondary mutations in the Keio strains that explain increased motility gene expression, we performed whole genome sequencing on several strains with or without motility gene activation. Overall, there were few differences between the parent strain,

BW25113, and the Keio mutants except that all the motility-activated Keio strains contain either insertion sequences (IS) or point mutations in the vicinity of genes encoding the motility

167 regulators lrhA and flhD (Fig. 5A). The only mutation found outside these regions was an intergenic mutation near an rRNA operon in ΔpykAKeio, which was also found in several other strains without motility gene activation. By contrast, no changes at either the flhDC or lhrA regulatory region were observed for a knockout strain that did not show motility gene activation,

ΔsodBKeio (Fig. 2B).

We validated these mutations by performing Sanger sequencing on the corresponding genomic regions from each of the knockout strains. Sanger sequencing also confirmed that independent Keio mutants carry distinct secondary mutations in the vicinity of motility regulator genes, including in the coding region of lrhA (synonymous or nonsynonymous point mutation) or upstream of flhDC (insertion or point mutation) (Fig. 5AB). Notably, none of the independent clones of the same gene deletion shared identical secondary mutations. One of the strains

(ΔcspAKeio-dup.) even appeared as non-isogenic in our stock, with single colonies exhibiting differences in the sites of IS insertion (Fig. 5C). Overall, these results argue that the pressure to obtain genetic changes at lrhA or flhD is selected for at some step downstream of the initial gene deletion.

FlhD4C2 is the transcriptional for many motility genes, including fliA, which encodes σ28 (Arnosti and Chamberlin, 1989). The expression of flhDC is repressed by LrhA

(Lehnen et al., 2002). In our RNA-seq experiments, strains carrying the lrhA mutations

(ΔsodAKeio and ΔcspAKeio) have a decreased level of lrhA mRNA and an increased level of flhDC

(Fig. 5D), suggesting that these mutations decrease either the lrhA mRNA stability or its transcription. On the other hand, the point mutation in the cAMP-CRP binding site upstream of flhDC for ΔpykAKeio, which is identical to a mutation found previously in two other strains

(Fahrner and Berg, 2015), leads to a large increase in flhDC expression while having no effect on

168 lrhA. In addition, we confirmed previous reports that various insertion sequences upstream of flhDC induce its expression but not lrhA (Barker et al., 2004; Fahrner and Berg, 2015; Lee and

Park, 2013) (Fig. 5D). Overall, the plethora of secondary mutations observed that induce flhDC expression indicate a strong selective pressure for cell motility at some point after the knockout mutants were generated.

169

Figure 5: Diverse secondary mutations responsible for motility activation in Keio collection. (A) Mutations in the lrhA and flhD regions identified by whole genome sequencing

170 and targeted Sanger sequencing for selected strains in the Keio collection. (B) Schematic showing the locations and the sequence contexts of the mutations. The point mutation in ∆pykAKeio is in the cAMP-CRP binding site (hexagon). (C) PCR amplicons for the flhD confirming the addition of insertion sequences. Four single colonies for each strain were picked. The schematic next to the gel represents the insertions in this region while arrows denote the relative location of the PCR priming sites. (D) Gene expression changes for flhD and lrhA from the original RNA-seq experiment on four presented knockout strains. ΔsodBKeio did not show any mutations or gene expression changes at either locus.

Rapid accumulation of motility-activating mutations in resting liquid cultures

To determine how the selection for motility arose, we first sought to recreate the conditions that led to accumulation of secondary mutations. As noted in the protocol of the original paper, during the construction of the Keio collection, independent colonies were isolated and cultures were incubated at 37°C overnight without shaking (Baba et al., 2006). We hypothesized that this non-shaking (or “resting”) condition favors motile cells and therefore selects for mutants that activate flhDC expression. To test this hypothesis, we inoculated

BW25113 and the strains that were freshly transduced with the deletion cassettes into fresh LB, and incubated with or without shaking overnight. The following morning, each culture was mixed thoroughly and examined for motility phenotypes by inoculating onto fresh soft agar plates. After another night at 37°C, cultures for all strain types incubated without shaking the first night showed a noticeable amount of swimming, while the cultures incubated with shaking did not (Fig 6A). After five days of sequential overnight incubation, mixing before daily dilution, and repeated liquid growth, non-shaking cultures showed an even more prominent motility phenotype (Fig 6A). Similar effects were observed for freshly transduced knockout strains, the parent BW25113, and MG1655.

To quantify and compare the rates of mutation accumulation in different backgrounds, we used the PfliC-gfp reporter construct to examine individual colonies after incubation. We

171 incubated the freshly transduced deletion strains containing the reporter overnight in LB media in either shaking or resting conditions. After each overnight, an aliquot was plated on LB agar to count the fraction of fluorescent cells, and another aliquot was diluted into liquid culture to start a new round of overnight incubation. We continued this experiment for seven days. In order to ensure that we were not artificially selecting for motile cells while plating, the overnight culture was mixed vigorously for a minute prior to plating.

After the first overnight, the fraction of fluorescent colonies from non-shaking cultures accumulated to >25% for all BW25113-derived strains and to about 18% for MG1655 (Fig. 6B).

Over subsequent days, BW25113 and freshly transduced single-gene deletion strains followed similar trajectories, with the entire population becoming motile by the 7th overnight. On the contrary, we did not detect any fluorescent colonies from any of the shaking cultures, indicating that the frequency of motile mutants is below our detection limit (0.8%) in these populations.

Analysis of individual fluorescent colonies revealed a large array of mutations in each resting culture. We sequenced the flhD region for 25 total colonies (five from each of the 7th day culture) and found mutations in all of them. The mutations included IS5 insertions at two different locations, an IS1 insertion, and a point mutation close to the coding region of flhD. All but one plate had more than one distinct mutation, indicating a diverse composition after the 7th day. These findings suggest that the resting condition uniquely contributes to the accumulation of motile mutants and further highlights that the single-gene deletions do not appear to play a role in the frequency of this accumulation.

172

Figure 6. Rapid accumulation of motility-activating mutations in resting liquid culture. (A) Soft agar swimming assays plates for cells cultured with or without shaking. The first two rows

173 are from single night overnight cultures, and the final row is from the fifth day of successive overnight cultures. (B) Fraction of motility-activated colonies as a function of time. PfliC-gfp construct was transformed into freshly transduced knockout cells. After every overnight a fraction of cells was diluted, plated, and counted. Insets show representative fluorescence images of hard-agar LB plates for ∆cspAtransduced after the first day and the seventh day of incubation without shaking. (C) PCR amplicons for the flhD neighborhood of fluorescent colonies showing a diverse mixture of insertions and a point mutation in each culture. The schematic on the right represents the insertions found by sequencing. The arrow corresponds to a point mutation.

Discussion

In this study, we found that a large fraction (69% of 71 strains tested) of the E. coli strains in the single-gene deletion Keio collection have upregulated the expression of their motility genes. This switch to a motile state is not due to the deleted genes but rather due to secondary mutations accumulated after growth in resting liquid cultures. Non-shaking liquid cultures that start with non-motile strains such as BW25113 or MG1655 begin to show subpopulations of motile mutants after just a single overnight incubation. These motile mutants are accompanied by a variety of genetic changes that culminate in the increased expression of the master regulator of motility, FlhDC. Importantly, this genotype and phenotype switch was not observed for the same strains grown in liquid shaking culture. Moreover, the fractions of motile mutants were similar between the knockout strains and the wild-type parent strain BW25113, suggesting that the accumulation of secondary mutations is not dependent on the three genes that are deleted. Therefore, the resting condition is sufficient to cause accumulation of motility- activating mutations in E. coli.

The nature of these mutations demonstrates another example of how bacterial cells adapt their transcriptome to changing environments by rapidly rewiring regulatory networks. The most common way we observed this rewiring occurring in our data was through the insertion of IS

174 elements upstream of the flhDC operon (Fig. 5A-C, and 6C). IS elements are abundant in the E. coli K12 genome and are frequently associated with mutation events (3.5x10-4 insertions per genome per generation and 4.5 x 10-5 IS homologous recombinations per genome per generation), especially in certain regions of the genome such as the flhDC neighborhood

(Fahrner and Berg, 2015; Lee and Park, 2013; Lee et al., 2016). It remains to be fully determined whether the rate of IS-mediated gene activation is subject to environmental control.

The possibility that stress conditions trigger directed rearrangement of insertion elements, and specifically at flhDC, has been discussed in the literature (Stoebel et al., 2009; Vandecraen et al., 2016; Wang and Wood, 2011; Zhang et al., 2017). Two potential models exist to explain how the mutations arose. In the first model (Larmarckian), the resting condition itself leads to the formation of secondary mutations by triggering of transposition of insertion sequences or increasing the mutation rate. The Larmarckian model has been suggested in previous work on mutations causing flhDC activation on soft-agar plates (Wang and Wood, 2011; Zhang et al.,

2017). Although it was shown that the accumulation of mutations requires functional flagella, no direct molecular mechanisms leading to the increased transposition or mutation rate at flhDC or lrhA have been uncovered. In the second model (Darwinian), mutants were either pre-existing in the cultures at low frequencies or spontaneously arising during the resting condition.

Importantly, our data do not distinguish between these two possible scenarios.

In a non-shaking culture, it is plausible that aerotaxis or chemotaxis provides a selective advantage by allowing motile cells to reach the liquid-air interface and are required for the increase in the motility phenotype in the population (Sourjik and Wingreen, 2012; Taylor et al.,

1999). Interestingly, we do not find this to be likely, as Keio knockout strains for aer, cheA, cheW, and cheY, regulators for aerotaxis and chemotaxis, show a large fraction of GFP positive

175 cells when transformed with the PfliC-gfp construct (100% for ΔcheY) (Fig. 7A). We isolated non-fluorescent colonies of Δaer, ΔcheA, and ΔcheW cells with the PfliC-gfp construct and found that after a single overnight in resting liquid culture individual isolates began to express GFP.

Over sequential overnights in resting culture the populations became dominated by GFP positive cells, similar to all previously tested strains (Fig. 7B). Therefore, it remains to be determined what the exact origin of the fitness advantage is and what pathways are required for imparting this advantage.

Figure 7. Motility gene activation in strains deficient for aerotaxis and chemotaxis. (A) Transformation results for Keio deletion strains for Δaer, ΔcheA, ΔcheW, ΔcheY with PfliC-gfp construct. Transformants for PfliC-gfp were selected for on 100 μg/mL ampicillin LB-agar selection plates and plates were imaged under blue light. (B) Fraction of motility-activated colonies as a function of time. Non-fluorescent colonies of Keio strains Δaer, ΔcheA, and ΔcheW, transformed with PfliC-gfp were isolated and grown overnight in shaking or resting media. After every overnight a fraction of cells was diluted, plated, and counted. Another fraction was backdiluted into fresh shaking or resting media to continue the experiment.

176 Finally, an important implication of this work is that common laboratory practices can rapidly lead to undesired mutations in E. coli. The propensity to accumulate motility-activating mutations during non-shaking overnight incubation could contribute to the widespread secondary mutations we identified in the Keio collection, as well as the diversity of motility phenotypes in the MG1655 strains from different laboratories (Barker et al., 2004). Whereas the 71 Keio strains that we profiled were obtained directly from the CGSC, independent and subsequent distributions in and between laboratories may lead to additional secondary mutations that did not arise from the initial library construction. Because of these genomic changes, it is possible that phenotypes observed in Keio knockout strains are not solely due to the genes deleted, but are compounded with the motility activation (Nichols et al., 2011; Takeuchi et al., 2014). We therefore urge caution in direct comparisons between Keio strains without P1 transduction since this motility activation adds an additional layer heterogeneity on the collection. Taken together, these results should inform proper laboratory practices, as even a single night of resting cultures can lead to robust and unwanted genetic changes that are not immediately obvious.

177 Acknowledgements

We would like to thank Tania Baker’s lab for generously providing the Keio duplicate strains.

We thank Mary Anderson and the members of the BioMicroCenter at MIT, for help in performing the genomic DNA sequencing. We would also like to thank Dan Kearns for helpful discussion. This research is supported by NIH R35GM124732, NIH Pre-Doctoral Training Grant

(T32GM007287, to DJP), Pew Biomedical Scholars Program, a Sloan Research Fellowship,

Searle Scholars Program, the Smith Family Award for Excellence in Biomedical Research.

178 Methods

Table 1 - List of Keio knockout strains tested Keio deletion strains with upregulated Keio deletion strains without motility gene motility gene expression (49) activation (22) fklB, dacC, hchA, rhlB, ompF, fkpA, skp, hupA, pykF, cspC, sufB, iscU, hupB, pfkB, metH, msrB, cyoD, yhdL, ompC, pykA, gshB, sodB, fumA, katE, cueO, nuoB, fbaB, sufC, speD, smpB, slyD, gpmM, tktB, iscS, ahpC, asnB, degP, acnA, mltD, purT, surA, folX, ansB, cyoC, cspE, bcp, stpA, cyoA, sufA deaD, spb, grxB, gdhA, malT, ahpF, aroG, nlpD, cspA, asnA, ppiD, sodA, speA, poxB, katG, gloA, talA, argF, speB, cyoB, spr, fis

Table 1. All Keio knockout strains tested. Table listing all transcriptome sequenced Keio strains sorted by motility gene upregulation signature. A three-fold increase in fliC expression above BW25113 was the threshold used to determine if a strain exhibited motility gene upregulation. 49/71 strains fall above this threshold while 22/71 did not.

Strains

The Keio strains for RNA-seq, including the three strains that we screened individually, denoted as ∆pykAKeio, ∆sodAKeio, and ∆cspAKeio were obtained directly from the Yale E.coli Genetic Stock

Center. These strains were also the parent strains for the transduced strains ∆pykAtransduced,

∆sodAtransduced, and ∆cspAtransduced. The duplicate strains as well as the Keio deletion strains Δaer,

ΔcheA, ΔcheW, ΔcheY, were obtained from Baker lab at Massachusetts Institute of Technology.

RNA-sequencing. MOPS minimal media (Teknova, M2106) was used as the liquid media for all gene expression analysis (RNA-seq and qPCR). Overnight cultures of ∆pykAKeio, ∆sodAKeio, and

∆cspAKeio were backdiluted 1000-fold into 10 mL fresh MOPS Minimal Media and grown to

OD600 0.3. 5 mL of cells were harvested and mixed with equal parts ice cold methanol, spun down for 10 minutes & decanted before freezing at -80°C. Cells were then lysed with 10 mg/mL lysozyme and RNA extracted using RNeasy column (Qiagen #74104). Purified RNA was ribosomal RNA depleted with MICROBExpress Bacterial mRNA Enrichment Kit

(ThermoFischer #AM1905) according to the manufacturer’s instructions. 250 μg of RNA was

179 fragmented (ThermoFischer #AM8740) and underwent the multiplex RNA-seq library prep protocol as detailed in Chapter 4. Single end (50 bp) sequencing was performed on an Illumina

HiSeq 2000. Sequencing data available at Gene Expression Omnibus with accession number

GSE129161.

Hierarchical clustering. We used Cluster 3.0 software to perform complete linkage hierarchical clustering with uncentered Pearson correlation on the RNA-seq data. For preprocessing, we first selected genes with count values of over 100 in at least one condition. We further filtered data for genes that show a minimum two-fold change in at least one of the conditions. This method filtered out 4065 genes. We the logarithmically transformed (base 2) fold-change in the expression of the remaining 235 genes between the 71 Keio mutants and BW25113. Java

TreeView software is used to visualize the cluster.

GFP reporter construct. Using Gibson assembly, promoter region of fliC and coding region of green florescent protein were fused and cloned into pSC101 (Manen et al., 1994). The promoter region of fliC is identified as the 70 bp long region upstream of the fliC coding sequence. The sequence and plasmid are available through Addgene (pGL001).

Overnight cultivation of strains with GFP reporter construct with and without shaking.

Two independent 10 mL LB cultures were started in 125 mL flasks for each strain transformed with the GFP reporter construct. One set was shaken at 250 rpm on a benchtop shaker and the other set was kept on a shelf without shaking at 37°C for 16-20 hours. At the end of the 16-20 hour duration, 1 mL was taken out from each culture, diluted, and plated on 100 μg/mL

180 ampicillin LB selection plates. Plates were imaged under blue light and the fraction of colonies that showed any fluorescence were counted for each plate. After initial overnight (16-20 hr incubation), liquid cultures were diluted by 1:200 to continue with a new day of overnight cultures.

qPCR. qPCR experiments were run on a Roche LC480 machine to measure RNA levels of fliC flagellar gene and cheA chemotaxis gene for all strains used in this study. Strain growth and

RNA collection was completed as described in the RNA-sequencing section. Random hexamer reverse transcription with M-MuLV (NEB, M0253) was performed according to manufacturer’s specifications. The standard error of the mean (S.E.M.) of the fold-change was calculated by propagating the S.E.M. from each measurement. hcaT as a loading control for each sample (Peng et al., 2014). The primer sequences used for qPCR were as follows:

• fliC: (Forward) GTTCTATCGAGCGTCTGTCTTC;

(Reverse) ATACCGTCGTTGGCGTTAC

• cheA: (Forward) CAAGAACAGCTCGACGCTTA;

(Reverse) GTTTCGCCTTTCGCTTCTAATG

• hcaT: (Forward) GCTGGTGATGATTGGCTTTA;

(Reverse) GTAATCAAGCGGGAACTGCT

Soft agar swimming assays. Semi-solid motility plates were prepared using tryptone broth (13 g

Bacto tryptone and 7 g NaCl per liter of media) and 0.25% Difco agar. Plates were pre-warmed to 37°C prior to inoculation. Using a pipette tip, plates were poked and inoculated with 3 μL of cell culture grown to OD600 of 0.1 in 10 mL tryptone broth in flasks. Plates were incubated in

181 37°C for 8 hours and imaged on a BioRad Gel Doc XR+ Gel System (#1708195). Culture diameters were measured using ImageJ software and normalized to the diameter of the petri dishes.

Swimming assays for overnight cultures. Two sets of liquid cultures were started for all strains in 10mL of tryptone broth. One set was shaken at 250 rpm continuously while the other set was kept on the shelf without shaking at 37°C for 16-20 hours. At the end of the 16-20 hour duration,

1 mL was taken out from each culture and was diluted to OD600 of 0.1 and plated on soft agar swimming assays as explained above. Prior to plating, the resting cultures were shaken at 250 rpm for a minute.

After each day of overnight incubation, liquid cultures were diluted by 1:200 to continue with a new day of overnight cultures.

P1 transduction. To generate fresh deletion strains, Kanamycin cassette replacement knockouts were transduced into wild-type BW25113 strains using bacteriophage P1 and selected on LB agar plates with 50 ng/µL kanamycin. For transduction, the protocol outlined by Thomason et al. was used (Thomason et al., 2007). The lysates for each fresh deletion were prepared from the strain directly obtained from CGSC. For example, ∆pykAtransduced was made from ∆pykAKeio lysate. Knockouts were validated by amplifying the regions surrounding the deletion of interest and confirming the gene replacement by a kanamycin cassette.

Whole genome sequencing. DNA from overnight shaking liquid cultures of ∆pykAKeio,

∆sodAKeio, ∆cspAKeio, ∆sodBKeio, and BW25113 was extracted with Promega Wizard® Genomic

182 DNA Purification Kit (#A1120) according to the manufacturers protocol. Samples were prepared using the Nextera XT DNA Library Prep Kit (Illumina) and paired end sequenced (150 bp + 150 bp) on an Illumina MiSeq (v2 chemistry). Sequencing data was analyzed for mutations using breseq package using BW25113 as the reference strain (Deatherage and Barrick, 2014).

Sequencing data is available through SRA with accession PRJNA530228.

Sanger sequencing. Regions of interest in the lrhA and flhD neighborhood for all six Keio strains studied were amplified using NEB Q5 high fidelity polymerase and PCR. PCR conditions were as follows: 30 seconds of initial denaturation at 98˚C, 35 cycles of amplifications, each lasting 20 seconds at 65˚C, 2 minutes of final extension at 72˚C. Primers used for PCR are:

• flhD region: (Forward) GTTGATGTCATAAATGTGTTTCAGCAAC;

(Reverse) CATTAACATTAAGTTGATTGTTGCCTTTCTTTG

• lrhA region: (Forward) CGCGATTTAACAGGAAAGGTAAGATC;

(Reverse) CAAACACGCGCATGTCATAAATAG

183 References

Arnosti, D.N., and Chamberlin, M.J. (1989). Secondary sigma factor controls transcription of flagellar and chemotaxis genes in Escherichia coli. Proc. Natl. Acad. Sci. U.S.a. 86, 830–834.

Baba, T., Ara, T., Hasegawa, M., Takai, Y., Okumura, Y., Baba, M., Datsenko, K.A., Tomita, M., Wanner, B.L., and Mori, H. (2006). Construction of Escherichia coli K‐12 in‐frame, single‐ gene knockout mutants: the Keio collection. Molecular Systems Biology 2, 2006.0008.

Baptist, G., Pinel, C., Ranquet, C., Izard, J., Ropers, D., de Jong, H., and Geiselmann, J. (2013). A genome-wide screen for identifying all regulators of a target gene. Nucleic Acids Res. 41, e164–e164.

Barker, C.S., Prüss, B.M., and Matsumura, P. (2004). Increased motility of Escherichia coli by insertion sequence element integration into the regulatory region of the flhD operon. J. Bacteriol. 186, 7529–7537.

Berg, H.C. (2003). The rotary motor of bacterial flagella. Annu. Rev. Biochem. 72, 19–54.

Bertin, P., Terao, E., Lee, E.H., Lejeune, P., Colson, C., Danchin, A., and Collatz, E. (1994). The H-NS protein is involved in the biogenesis of flagella in Escherichia coli. J. Bacteriol. 176, 5537–5540.

Chevance, F.F.V., and Hughes, K.T. (2008). Coordinating assembly of a bacterial macromolecular machine. Nature Reviews Microbiology 6, 455–465.

Chilcott, G.S., and Hughes, K.T. (2000). Coupling of flagellar gene expression to flagellar assembly in Salmonella enterica serovar typhimurium and Escherichia coli. Microbiol. Mol. Biol. Rev. 64, 694–708.

Deatherage, D.E., and Barrick, J.E. (2014). Identification of mutations in laboratory-evolved microbes from next-generation sequencing data using breseq. Methods Mol. Biol. 1151, 165– 188.

Fahrner, K.A., and Berg, H.C. (2015). Mutations That Stimulate flhDC Expression in Escherichia coli K-12. J. Bacteriol. 197, 3087–3096.

Falls, K.C., Williams, A.L., Bryksin, A.V., and Matsumura, I. (2014). Escherichia coli deletion mutants illuminate trade-offs between growth rate and flux through a foreign anabolic pathway. PLoS ONE 9, e88159.

Fitzgerald, D.M., Bonocora, R.P., and Wade, J.T. (2014). Comprehensive mapping of the Escherichia coli flagellar regulatory network. PLOS Genet 10, e1004649.

French, S., Mangat, C., Bharat, A., Côté, J.-P., Mori, H., and Brown, E.D. (2016). A robust platform for chemical genomics in bacterial systems. Mol. Biol. Cell 27, 1015–1025.

Girgis, H.S., Liu, Y., Ryu, W.S., and Tavazoie, S. (2007). A comprehensive genetic

184 characterization of bacterial motility. PLOS Genet 3, 1644–1660.

Grenier, F., Matteau, D., Baby, V., and Rodrigue, S. (2014). Complete Genome Sequence of Escherichia coli BW25113. Genome Announc 2, e01038–14.

Lee, C., and Park, C. (2013). Mutations upregulating the flhDC operon of Escherichia coli K-12. J. Microbiol. 51, 140–144.

Lee, H., Doak, T.G., Popodi, E., Foster, P.L., and Tang, H. (2016). Insertion sequence-caused large-scale rearrangements in the genome of Escherichia coli. Nucleic Acids Res. 44, 7109– 7119.

Lehnen, D., Blumer, C., Polen, T., Wackwitz, B., Wendisch, V.F., and Unden, G. (2002). LrhA as a new transcriptional key regulator of flagella, motility and chemotaxis genes in Escherichia coli. Mol. Microbiol. 45, 521–532.

Li, G.-W., Burkhardt, D., Gross, C., and Weissman, J.S. (2014). Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624–635.

Liu, X., and Matsumura, P. (1994). The FlhD/FlhC complex, a transcriptional activator of the Escherichia coli flagellar class II operons. J. Bacteriol. 176, 7345–7351.

Manen, D., Xia, G., and Caro, L. (1994). A locus involved in the regulation of replication in plasmid pSC101. Mol. Microbiol. 11, 875–884.

Nichols, R.J., Sen, S., Choo, Y.J., Beltrao, P., Zietek, M., Chaba, R., Lee, S., Kazmierczak, K.M., Lee, K.J., Wong, A., et al. (2011). Phenotypic landscape of a bacterial cell. Cell 144, 143– 156.

Patrick, J.E., and Kearns, D.B. (2012). Swarming motility and the control of master regulators of flagellar biosynthesis. Mol. Microbiol. 83, 14–23.

Peng, S., Stephan, R., Hummerjohann, J., and Tasara, T. (2014). Evaluation of three reference genes of Escherichia coli for mRNA expression level normalization in view of salt and organic acid stress exposure in food. FEMS Microbiol. Lett. 355, 78–82.

Sourjik, V., and Wingreen, N.S. (2012). Responding to chemical gradients: bacterial chemotaxis. Curr. Opin. Cell Biol. 24, 262–268.

Soutourina, O., Kolb, A., Krin, E., Laurent-Winter, C., Rimsky, S., Danchin, A., and Bertin, P. (1999). Multiple control of flagellum biosynthesis in Escherichia coli: role of H-NS protein and the cyclic AMP-catabolite activator protein complex in transcription of the flhDC master operon. J. Bacteriol. 181, 7500–7508.

Soutourina, O.A., and Bertin, P.N. (2003). Regulation cascade of flagellar expression in Gram- negative bacteria. FEMS Microbiol. Rev. 27, 505–523.

Stoebel, D.M., Hokamp, K., Last, M.S., and Dorman, C.J. (2009). Compensatory evolution of

185 gene regulation in response to stress by Escherichia coli lacking RpoS. PLOS Genet 5, e1000671.

Takeuchi, R., Tamura, T., Nakayashiki, T., Tanaka, Y., Muto, A., Wanner, B.L., and Mori, H. (2014). Colony-live--a high-throughput method for measuring microbial colony growth kinetics- -reveals diverse growth effects of gene knockouts in Escherichia coli. BMC Microbiol. 14, 171.

Tamar, E., Koler, M., and Vaknin, A. (2016). The role of motility and chemotaxis in the bacterial colonization of protected surfaces. Sci Rep 6, 19616.

Taylor, B.L., Zhulin, I.B., and Johnson, M.S. (1999). Aerotaxis and other energy-sensing behavior in bacteria. Annu. Rev. Microbiol. 53, 103–128.

Thomason, L.C., Costantino, N., and Court, D.L. (2007). E. coli genome manipulation by P1 transduction. Curr Protoc Mol Biol Chapter 1, Unit1.17–1.17.8.

Vandecraen, J., Monsieurs, P., Mergeay, M., Leys, N., Aertsen, A., and Van Houdt, R. (2016). Zinc-Induced Transposition of Insertion Sequence Elements Contributes to Increased Adaptability of Cupriavidus metallidurans. Front Microbiol 7, 359.

Wang, S., Fleming, R.T., Westbrook, E.M., Matsumura, P., and McKay, D.B. (2006). Structure of the Escherichia coli FlhDC complex, a prokaryotic heteromeric regulator of transcription. Journal of Molecular Biology 355, 798–808.

Wang, X., and Wood, T.K. (2011). IS5 inserts upstream of the master motility operon flhDC in a quasi-Lamarckian way. Isme J 5, 1517–1525.

Zhang, Z., Kukita, C., Humayun, M.Z., and Saier, M.H. (2017). Environment-directed activation of the Escherichia coliflhDC operon by transposons. Microbiology (Reading, Engl.) 163, 554– 569.

186 Chapter 6

Conclusions and Future Directions

187 Summary

My work described within this thesis shows that gene expression in bacteria is well engineered for maximal fitness. In Chapter 2, I demonstrated that the production of the aminoacyl-tRNA synthetases (aaRS), one of the most conserved sets of proteins within all life, is perfectly optimized for fast growth in Bacillus subtilis. The production of these proteins exactly match their functional requirement, supplying just enough tRNAs to the ribosome to maintain the translation elongation rate and maximize the growth rate. However, this precision comes at a potential cost; tRNAs are not fully charged with native aaRS production. Intriguingly, it seems that this has been taken to account when designing feedback mechanisms to sense starvation as I uncovered that the stringent response is only majorly activated upon aaRS underproduction.

Furthermore, I uncovered that the stringent response alleviates the fitness defects of aaRS underproduction even when amino acids are fully supplied to the cell. In Chapter 3, I explored why aaRS overproduction leads to a growth defect. I found that the observed fitness defects are due to both aaRS-dependent effects, such as tRNA mischarging, and aaRS-independent effects, such as the costs associated with protein production. The combination of Chapter 2 and 3 lead to a complete characterization of the fitness landscapes of the aaRS proteins in B. subtilis. In

Chapter 4, I detailed the development of an improved library preparation protocol for next generation RNA sequencing that greatly improves the throughput of the procedure. In Chapter 5, using this multiplex RNA-seq method I discovered a large amount of transcriptome heterogeneity within the E. coli single gene knockout Keio collection. In collaboration with a former lab member, we found that the observed expression heterogeneity in these strains is due to mutations in the upstream activators of motility gene expression, and that E. coli have a strong propensity to select for motility-activating mutations in non-shaking liquid culture. In total, my

188 work advances biological research in three major ways. First, it provides an example of the importance in fully connecting protein expression, with function and fitness. Second, it gives novel insight into the molecular underpinnings of the translation cycle in bacteria. Finally, it provides a new practical tool for studying gene expression that can be used by all within the microbiology community.

The Future of Fitness Landscapes

The first major goal of my work was to establish the fitness landscape, the quantitative relationship between protein levels and fitness, for the aaRS proteins. To some degree nearly every biologist thinks about fitness landscapes in an informalized manner, as attempting to match cell phenotype with gain or loss of a protein of interest is how the majority of biological research is conducted. However, formalizing this language into direct, quantitative terms, has the potential to open up new unexplored questions and the power of the approach lies in fully exploring the molecular features that shape the landscape. My initial uncovering of the fitness landscape for the aaRS proteins revealed multiple avenues to explore. Chiefly, what about native aaRS production rates led to optimized fitness and why are the placement of native levels near a fitness cliff? Where does the landscape asymmetry arise from? How are tRNA charging and translation affected by changes in aaRS production? Why is there a fitness defect with aaRS overproduction? Answering these questions that arose from the initial landscape observation are integral to fully understanding of how protein expression fits within the context of it’s pathway, to lead to the high level phenotype of fitness. Importantly, all of the above considerations can be applied to any protein of interest making the fitness landscape a general tool applicable to all biologists.

189 One key finding of my work, shaping the thought process behind exploring future fitness landscapes, is that protein-independent and protein-dependent effects of overproduction are not strictly mutually exclusive. Importantly, the independent costs of producing a protein, discussed at length previously, are fundamentally dependent on the expression level of the enzyme required by the cell. For example, the costs of increasing the levels of a protein normally present at 5 copies per cell to 10 copies (2-fold), is very different from increasing the levels of a protein present at 5,000 copies to 10,000 copies (also 2-fold). Since the aaRS proteins are present in multiple thousands of copies per cell during fast growth steady state growth (Li et al., 2014), they may be under tighter pressure to reduce excess production. Whether this principle is true for all highly expressed proteins, remains to be seen. In this way, the fitness landscapes of individual proteins can be shaped by very different forces; genes with high expression requirements have sharper protein-independent consequences of overproduction than those with lower expression requirements. It is possible therefore that proteins normally present at low copy numbers are not near a ‘fitness cliff’ and are produced well in excess of demand since the costs of production are greatly reduced. Furthermore, it could be that autoregulation is of added importance in highly expressed genes in order to keep excess production to a minimum.

Fitness Landscapes of Translation Machinery

Another major finding from uncovering the fitness landscape in Chapter II, is that the aaRS proteins in Bacillus subtilis are produced with little excess near the ‘fitness cliff’. Although there are many known examples of enzymes that seem to be produced over their requirement

(Diamond, 2002; O'Brien et al., 2016; Peters et al., 2016; Sander et al., 2019), the aaRS proteins clearly do not fall within this category. Due to their high abundance, which is increased with

190 faster growth rates, this may be an effect seen for all proteins within translation (Bremer and

Dennis, 2008; Dennis and Bremer, 1974; Dong et al., 1996). Since the overall rate of protein production is intimately tied with the rate of cell doubling, it may be that cells are under further pressure to optimize the process in order to maximize growth (Ehrenberg and Kurland, 1984).

Complementing a large body of work that ribosome production limits growth in fast-growing bacteria (Scott and Hwa, 2011), my research argues that the aaRS proteins are ‘limiting’ for growth as well. ‘Limiting’ in this case is defined by the fact that aaRS production is at a fitness cliff, and any underproduction would throttle the growth rate of the cell. The finding that the stoichiometry of translation proteins is quantitatively conserved across bacteria (Lalanne et al.,

2018), potentially suggests that other translation proteins may be ‘limiting’ the growth rate in this way as well. If this were to be the case, it would uncover a remarkable precision in protein production as translation proteins are produced over three orders of magnitude.

The protein that may be under the most selective pressure to optimize levels, with a potentially interesting fitness landscape, is EF-Tu. First, its activity is intricately related to the translation elongation rate, and thus the growth rate, similar to the aaRSs. Unlike aaRSs, which account for only 1/20th of the amino acids translated, EF-Tu is required for every single elongation event, and may have an even sharper fitness drop-off with underproduction. It is also one of the most highly expressed proteins in fast growing bacteria at 5 copies of EF-Tu per ribosome, accounting for ~9% of the total proteome. At such a high expression level, it is likely that the pressures to reduce the protein-independent costs of production are elevated. Moreover, there could be EF-Tu dependent costs of overproduction, such as an increase in binding of uncharged tRNAs which may artificially slow the elongation cycle and exacerbate the overproduction landscape compared to aaRSs. Modeling work on the E. coli elongation cycle has

191 supported the notion that the EF-Tu to ribosome ratio is optimal for growth rate maximization

(Klumpp et al., 2013), however this has yet to be experimentally verified.

Another interesting set of proteins to explore the fitness landscapes of are the ribosomal proteins (RP). According to the bacterial “growth law”, any reduction in functioning ribosomes should have a direct, negative impact on the growth rate during the fast exponential growth conditions (Bremer and Dennis, 2008; Scott and Hwa, 2011). The ribosome however, is not a single protein entity like an aaRS or EF-Tu, but a ribonucleoprotein complex composed of 3

RNAs and >50 small proteins (Schuwirth et al., 2005; Selmer et al., 2006). Interestingly, many of the RPs are not essential for cell growth across bacteria as 22/54 and 16/57 RPs in E. coli and

B. subtilis have been deleted successfully (Akanuma et al., 2012; Baba et al., 2006; Shoji et al.,

2011). It is generally thought that the remaining essential RPs form the core ribosome and are indispensable for function. This may lead to the fitness landscapes for each RP to be quantitatively conserved, as each would individually limit complex function during underproduction. However, it has been shown that the hierarchical process of ribosome assembly is altered when individual RPs are limited (Davis and Williamson, 2017; Davis et al., 2016) . The cells re-routing of the assembly pathway during underproduction may come at an overall cost to assembly time, leading to RPs assembled early having exacerbated fitness landscapes compared to ones later. This could also be the case with RP overproduction. It is well documented that the majority of RPs across bacteria, like aaRSs, are autoregulated to limit overproduction (Deiorio-

Haggar et al., 2013; Nomura et al., 1984; Springer and Portier, 2003). In contrast to aaRSs, RPs are generally in larger operons with multiple RPs, despite autoregulatory control only sensitive to one RP within the operon (Aseev et al., 2016; Fu et al., 2013). Removal of these autoregulatory loops, leading to small increases in RP expression, has been shown to noticeably reduce fitness

192 (Babina et al., 2018), indicating the fitness effects of RP overproduction may be exacerbated compared to aaRSs. As previously proposed, overproduction can also lead to assembly defects which may be lead to different fitness effects of the RPs depending on the step of assembly

(Davis and Williamson, 2017; Klinge and Woolford, 2019). Alternatively, excess RPs could simply be degraded by the protein degradation machinery leading to a reduced cost of overproduction (Davis et al., 2016; Lam et al., 2007; Sung et al., 2016). Overall, the exploration of fitness landscapes of individual RPs can lead to many additional findings about RP function and ribosome assembly.

Future Directions of the Translation Cycle and Regulation

My work on the fitness landscapes of the aaRSs provided multiple insights on the overall process and regulation of the translation cycle. First, I confirmed previous reports that tRNA charging is non-maximal in fast growing bacteria while adding that charging can be increased with aaRS overproduction (Dittmar et al., 2005; McClain et al., 1999; Varshney et al., 1991;

Yegian et al., 1966). These results show that despite the presence of a substantial concentration of uncharged tRNAs during growth in rich media, their adverse effects on translation are likely minimized. The translation elongation cycle consists of several sequential steps, including binding of the ternary complex, GTP hydrolysis by EF-Tu, EF-G binding, and subsequent GTP hydrolysis. For the charging level to meaningfully affect the overall elongation rate, the binding time of the ternary complex has to be a substantial fraction of the overall ribosome transit time.

Previous analyses of biochemical data have suggested that ternary complex binding is much faster than the elongation rate of ~16 codons/sec in rich media (Dai et al., 2016; Klumpp et al.,

2013). Furthermore, modeling work on the translation cycle in E. coli suggested that aminoacyl-

193 tRNAs do not limit the elongation cycle (Subramaniam et al., 2014). This model is further supported by our observation that increasing aaRS production and tRNA charging does not substantially decrease the ribosome transit time by virtue of saturation of the ternary complex for charged tRNAs at native aaRS production. Moreover, the sensitivity to decreased but not increased tRNA charging is consistent with the Michaelis-Menten type relationship that was previously proposed between net elongation rates and ternary complex concentration (Dai et al.,

2016). Together, these results suggest that the abundance of charged tRNAs is not a main determinant of translation elongation rates in rich media.

Knowing the degree to which tRNAs and aaRS must be expressed to ensure proper charged tRNA flux is especially important when using cellular translation as a platform for exogenous protein production. Typically, the use of bacteria as hosts for large-scale production of therapeutic or industrial proteins requires a redesign of the native sequence to better match the host. Yet if the translation rate of individual codons is not rate limited by the charged tRNA flux produced by native aaRS levels, why is this necessary? My results argue that numerous other properties of these sequence differences (Kudla et al., 2009; Li et al., 2012; Mittal et al., 2018), not the actual rate of translation elongation of those codons, is the primary driver of the observed changes in protein production. However, it could be that the balance of codon usage to tRNA abundance is perfectly aligned for the natural proteome. When this balance is off, due to extreme exogenous protein production, differences in translation rates between codons could become apparent (Andersson and Kurland, 1990; Elf et al., 2003; Frumkin et al., 2018).

For many years work has been done attempting to introduce non-canonical amino acids into the protein design toolbox by effectively expanding the genetic code through aaRS & tRNA engineering (Ambrogelly et al., 2007; Link et al., 2003; Liu and Schultz, 2010). This increase of

194 potential orthogonal aaRS-tRNA sets is nicely paired with innovations in large scale genomic tools, allowing for the complete recoding of bacterial genomes (Cervettini et al., 2020; Fredens et al., 2019; Wang et al., 2016). This opens up the possibility of organisms producing proteins with completely novel chemistries, either by replacement, or in parallel to their natural genome

(Rovner et al., 2015; Zhang et al., 2017). However, encountering these “new” codons, whether truly synthetic or recoded, requires a proper supply of aminoacylated tRNA. As my findings show, the ribosome requires a minimum concentration of charged tRNA at the proper ratio to maintain elongation speeds. Without careful balancing of tRNA supply, codon usage, and aaRS expression, errors can occur including mistranslation, protein degradation, and stress response activation (Ma et al., 2018).

Through the characterization of the aaRS fitness landscapes I also uncovered two novel principles surrounding the sensor of translation elongation, the (p)ppGpp induced stringent response. First, I found that the stringent response is important in buffering the negative effects on fitness of charged tRNA starvation to the ribosome even when amino acids are plentiful.

Second, I found that the stringent response is acutely activated with only 2-fold aaRS underproduction. This result is surprising since a change in the charging ratio from ~60%

(wildtype) to ~90% (aaRS overproduction) does not lead to a change in activation of the stringent response. These results imply that RelA has a threshold for activation, charging less than 60% activates (p)ppGpp synthesis, while levels above this do not. Although it clear that there is a basal level of (p)ppGpp in wildtype cells (Nanamiya et al., 2008), it is unclear if these levels are due to sensing the wildtype uncharged tRNAs or simply background synthesis by

RelA. However, future work should confirm my observations by directly mapping tRNA charging and (p)ppGpp levels across the expression landscape. Although it remains unclear what

195 exactly signals RelA activation, quantification of the tRNA charging and (p)ppGpp levels can potentially help tease apart the multiple proposed mechanisms.

Concluding remarks

With the advent of high-precision measurement tools, such as next-generation sequencing and high resolution mass spectrometry, we now have the ability to quantify the expression of all genes in the cell with amazing precision. In addition, the accumulation of decades of work on the biochemistry and molecular biology of protein function has given us an increased picture of the happenings inside cells. Lagging far behind however, is a link between these quantitative gene expression observations and their physiological meaning. Given that a huge amount of disease causing mutations are due to changes in protein expression, there exists a great need within the biomedical research community to quantitatively understand the relationship between expression, pathway function, and cell phenotype. In my work presented here on the tRNA charging enzymes, I have provided evidence that this type of integrative thinking is useful not only to better understand one set of proteins, but how multiple layers of biology come together to affect cell physiology.

196 References

Akanuma, G., Nanamiya, H., Natori, Y., Yano, K., Suzuki, S., Omata, S., Ishizuka, M., Sekine, Y., and Kawamura, F. (2012). Inactivation of ribosomal protein genes in Bacillus subtilis reveals importance of each ribosomal protein for cell proliferation and cell differentiation. J. Bacteriol. 194, 6282–6291.

Ambrogelly, A., Palioura, S., and Söll, D. (2007). Natural expansion of the genetic code. Nature Chemical Biology 3, 29–35.

Andersson, S.G., and Kurland, C.G. (1990). Codon preferences in free-living microorganisms. Microbiol. Rev. 54, 198–210.

Aseev, L.V., Koledinskaya, L.S., and Boni, I.V. (2016). Regulation of Ribosomal Protein Operons rplM-rpsI, rpmB-rpmG, and rplU-rpmA at the Transcriptional and Translational Levels. J. Bacteriol. 198, 2494–2502.

Baba, T., Ara, T., Hasegawa, M., Takai, Y., Okumura, Y., Baba, M., Datsenko, K.A., Tomita, M., Wanner, B.L., and Mori, H. (2006). Construction of Escherichia coli K‐12 in‐frame, single‐ gene knockout mutants: the Keio collection. Molecular Systems Biology 2, 2006.0008.

Babina, A.M., Parker, D.J., Li, G.-W., and Meyer, M.M. (2018). Fitness advantages conferred by the L20-interacting RNA cis-regulator of ribosomal protein synthesis in Bacillus subtilis. Rna 24, 1133–1143.

Bremer, H., and Dennis, P.P. (2008). Modulation of Chemical Composition and Other Parameters of the Cell at Different Exponential Growth Rates. EcoSal Plus 3.

Cervettini, D., Tang, S., Fried, S.D., Willis, J.C.W., Funke, L.F.H., Colwell, L.J., and Chin, J.W. (2020). Rapid discovery and evolution of orthogonal aminoacyl-tRNA synthetase-tRNA pairs. Nat. Biotechnol. 550, 53–11.

Dai, X., Zhu, M., Warren, M., Balakrishnan, R., Patsalo, V., Okano, H., Williamson, J.R., Fredrick, K., Wang, Y.-P., and Hwa, T. (2016). Reduction of translating ribosomes enables Escherichia coli to maintain elongation rates during slow growth. Nat Microbiol 2, 16231.

Davis, J.H., and Williamson, J.R. (2017). Structure and dynamics of bacterial ribosome biogenesis. Philos. Trans. R. Soc. Lond., B, Biol. Sci. 372, 20160181.

Davis, J.H., Tan, Y.Z., Carragher, B., Potter, C.S., Lyumkis, D., and Williamson, J.R. (2016). Modular Assembly of the Bacterial Large Ribosomal Subunit. Cell 167, 1610–1622.e1615.

Deiorio-Haggar, K., Anthony, J., and Meyer, M.M. (2013). RNA structures regulating ribosomal in bacilli. RNA Biol 10, 1180–1184.

Dennis, P.P., and Bremer, H. (1974). Differential rate of ribosomal protein synthesis in Escherichia coli B/r. Journal of Molecular Biology 84, 407–422.

197 Diamond, J. (2002). Quantitative evolutionary design. J. Physiol. (Lond.) 542, 337–345.

Dittmar, K.A., Sørensen, M.A., Elf, J., Ehrenberg, M., and Pan, T. (2005). Selective charging of tRNA isoacceptors induced by amino-acid starvation. EMBO Reports 6, 151–157.

Dong, H., Nilsson, L., and Kurland, C.G. (1996). Co-variation of tRNA Abundance and Codon Usage inEscherichia coliat Different Growth Rates. Journal of Molecular Biology 260, 649–663.

Ehrenberg, M., and Kurland, C.G. (1984). Costs of accuracy determined by a maximal growth rate constraint. Q. Rev. Biophys. 17, 45–82.

Elf, J., Nilsson, D., Tenson, T., and Ehrenberg, M. (2003). Selective charging of tRNA isoacceptors explains patterns of codon usage. Science 300, 1718–1722.

Fredens, J., Wang, K., la Torre, de, D., Funke, L.F.H., Robertson, W.E., Christova, Y., Chia, T., Schmied, W.H., Dunkelmann, D.L., Beránek, V., et al. (2019). Total synthesis of Escherichia coli with a recoded genome. Nature 569, 514–518.

Frumkin, I., Lajoie, M.J., Gregg, C.J., Hornung, G., Church, G.M., and Pilpel, Y. (2018). Codon usage of highly expressed genes affects proteome-wide translation efficiency. Proc. Natl. Acad. Sci. U.S.a. 115, E4940–E4949.

Fu, Y., Deiorio-Haggar, K., Anthony, J., and Meyer, M.M. (2013). Most RNAs regulating ribosomal protein biosynthesis in Escherichia coli are narrowly distributed to Gammaproteobacteria. Nucleic Acids Res. 41, 3491–3503.

Klinge, S., and Woolford, J.L. (2019). Ribosome assembly coming into focus. Nature Reviews Molecular Cell Biology 20, 116–131.

Klumpp, S., Scott, M., Pedersen, S., and Hwa, T. (2013). Molecular crowding limits translation and cell growth. Proc. Natl. Acad. Sci. U.S.a. 110, 16754–16759.

Kudla, G., Murray, A.W., Tollervey, D., and Plotkin, J.B. (2009). Coding-sequence determinants of gene expression in Escherichia coli. Science 324, 255–258.

Lalanne, J.-B., Taggart, J.C., Guo, M.S., Herzel, L., Schieler, A., and Li, G.-W. (2018). Evolutionary Convergence of Pathway-Specific Enzyme Expression Stoichiometry. Cell 173, 749–761.e38.

Lam, Y.W., Lamond, A.I., Mann, M., and Andersen, J.S. (2007). Analysis of nucleolar protein dynamics reveals the nuclear degradation of ribosomal proteins. Curr. Biol. 17, 749–760.

Li, G.-W., Burkhardt, D., Gross, C., and Weissman, J.S. (2014). Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624–635.

Li, G.-W., Oh, E., and Weissman, J.S. (2012). The anti-Shine-Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature 484, 538–541.

198 Link, A.J., Mock, M.L., and Tirrell, D.A. (2003). Non-canonical amino acids in protein engineering. Curr. Opin. Biotechnol. 14, 603–609.

Liu, C.C., and Schultz, P.G. (2010). Adding new chemistries to the genetic code. Annu. Rev. Biochem. 79, 413–444.

Ma, N.J., Hemez, C.F., Barber, K.W., Rinehart, J., and Isaacs, F.J. (2018). Organisms with alternative genetic codes resolve unassigned codons via mistranslation and ribosomal rescue. Elife 7, e8.

McClain, W.H., Jou, Y.Y., Bhattacharya, S., Gabriel, K., and Schneider, J. (1999). The reliability of in vivo structure-function analysis of tRNA aminoacylation. Journal of Molecular Biology 290, 391–409.

Mittal, P., Brindle, J., Stephen, J., Plotkin, J.B., and Kudla, G. (2018). Codon usage influences fitness through RNA toxicity. Proc. Natl. Acad. Sci. U.S.a. 115, 8639–8644.

Nanamiya, H., Kasai, K., Nozawa, A., Yun, C.-S., Narisawa, T., Murakami, K., Natori, Y., Kawamura, F., and Tozawa, Y. (2008). Identification and functional analysis of novel (p)ppGpp synthetase genes in Bacillus subtilis. Mol. Microbiol. 67, 291–304.

Nomura, M., Gourse, R., and Baughman, G. (1984). Regulation of the synthesis of ribosomes and ribosomal components. Annu. Rev. Biochem. 53, 75–117.

O'Brien, E.J., Utrilla, J., and Palsson, B.O. (2016). Quantification and Classification of E. coli Proteome Utilization and Unused Protein Costs across Environments. PLoS Comput. Biol. 12, e1004998.

Peters, J.M., Colavin, A., Shi, H., Czarny, T.L., Larson, M.H., Wong, S., Hawkins, J.S., Lu, C.H.S., Koo, B.-M., Marta, E., et al. (2016). A Comprehensive, CRISPR-based Functional Analysis of Essential Genes in Bacteria. Cell 165, 1493–1506.

Rovner, A.J., Haimovich, A.D., Katz, S.R., Li, Z., Grome, M.W., Gassaway, B.M., Amiram, M., Patel, J.R., Gallagher, R.R., Rinehart, J., et al. (2015). Recoded organisms engineered to depend on synthetic amino acids. Nature 518, 89–93.

Sander, T., Farke, N., Diehl, C., Kuntz, M., Glatter, T., and Link, H. (2019). Allosteric Feedback Inhibition Enables Robust Amino Acid Biosynthesis in E. coli by Enforcing Enzyme Overabundance. Cell Syst 8, 66–75.e68.

Schuwirth, B.S., Borovinskaya, M.A., Hau, C.W., Zhang, W., Vila-Sanjurjo, A., Holton, J.M., and Cate, J.H.D. (2005). Structures of the bacterial ribosome at 3.5 A resolution. Science 310, 827–834.

Scott, M., and Hwa, T. (2011). Bacterial growth laws and their applications. Curr. Opin. Biotechnol. 22, 559–565.

Selmer, M., Dunham, C.M., Murphy, F.V., Weixlbaumer, A., Petry, S., Kelley, A.C., Weir, J.R.,

199 and Ramakrishnan, V. (2006). Structure of the 70S ribosome complexed with mRNA and tRNA. Science 313, 1935–1942.

Shoji, S., Dambacher, C.M., Shajani, Z., Williamson, J.R., and Schultz, P.G. (2011). Systematic chromosomal deletion of bacterial ribosomal protein genes. Journal of Molecular Biology 413, 751–761.

Springer, M., and Portier, C. (2003). More than one way to skin a cat: translational autoregulation by ribosomal protein S15. Nat. Struct. Biol. 10, 420–422.

Subramaniam, A.R., Zid, B.M., and O’Shea, E.K. (2014). An Integrated Approach Reveals Regulatory Controls on Bacterial Translation Elongation. Cell 159, 1200–1211.

Sung, M.-K., Porras-Yakushi, T.R., Reitsma, J.M., Huber, F.M., Sweredoski, M.J., Hoelz, A., Hess, S., and Deshaies, R.J. (2016). A conserved quality-control pathway that mediates degradation of unassembled ribosomal proteins. Elife 5, 3429.

Varshney, U., Lee, C.P., and RajBhandary, U.L. (1991). Direct analysis of aminoacylation levels of tRNAs in vivo. Application to studying recognition of Escherichia coli initiator tRNA mutants by glutaminyl-tRNA synthetase. J. Biol. Chem. 266, 24712–24718.

Wang, K., Fredens, J., Brunner, S.F., Kim, S.H., Chia, T., and Chin, J.W. (2016). Defining synonymous codon compression schemes by genome recoding. Nature 539, 59–64.

Yegian, C.D., Stent, G.S., and Martin, E.M. (1966). Intracellular condition of Escherichia coli transfer RNA. Proc. Natl. Acad. Sci. U.S.a. 55, 839–846.

Zhang, Y., Ptacin, J.L., Fischer, E.C., Aerni, H.R., Caffaro, C.E., San Jose, K., Feldman, A.W., Turner, C.R., and Romesberg, F.E. (2017). A semi-synthetic organism that stores and retrieves increased genetic information. Nature 551, 644–647.

200 Darren John Parker Current address: 79 Putnam Ave, Cambridge MA 02139 • (630) 308-9691 • [email protected]

EDUCATION

Massachusetts Institute of Technology (MIT), Cambridge, MA Ph.D. candidate in Department of Biology Expected: September 2020 • Provisional thesis title: Characterizing the landscape of aminoacyl-tRNA synthetase protein production in Bacillus subtilis

University of Illinois at Urbana-Champaign, Urbana, IL Bachelor of Science in Biochemistry, Highest Distinction May 2014 Minor in Chemistry • Thesis titles (dual): Identification of deoxyribozymes to catalyze RNA phosphodiester bonds via hydrolysis & Determining the role of alternative mRNA splicing on mammalian liver development

RESEARCH EXPERIENCE

Graduate Research Assistant Gene-Wei Li Group, Cambridge, MA. May 2015 - Present Unpublished works in progress: • Developing high-throughput sequencing tools to profile the aminoacylation state of tRNAs in vivo • Co-developed a high-resolution fitness assay based on NGS for multiplexed bacterial fitness measurements • Created, designed, and optimized a multiplexed RNA-seq method to reduce cost and time of next-generation sequencing library prep >10-fold

Undergraduate Research Assistant Kalsotra Group, Urbana, IL. March 2013 – August 2014 • Worked to identify key mRNA splicing programs that influence development of the liver and heart • Developed mouse line to control RBFOX2 expression in heart

Silverman Group, Urbana, IL. September 2011 – May 2013 • Identified DNA sequences (deoxyribozymes) through in vitro selection that catalyze hydrolysis of RNA and catalyze peptide modifications on lysine side chains

PUBLICATIONS

Parker, D.J., Lalanne, J.B., Kimura, S., Johnson, G.E., Waldor, M.K., Li, G.W. Growth Optimized Aminoacyl-tRNA Synthetase Levels Prevent Maximal tRNA Charging. (Accepted at Cell Systems)

Misra C., Bangru S., Lin F., Lam K., Koenig S.N., Lubber E.R., Hedhli J., Murphy, N.P., Parker D.J., Dobrucki L.W., Cooper TA, Tajkhorshid E, Mohler PJ, Kalsotra A. Aberrant expression of a non-muscle

201 RBFOX2 isoform triggers cardiac conduction defects in myotonic dystrophy. Developmental Cell 52, 748-763 (2020)

Parker, D.J.*, Demetci, P.*, Li, G.W. Rapid accumulation of motility-activating mutations in resting liquid culture of Escherichia coli. Journal of Bacteriology. 5 201:e00259-19. (2019) • *Indicates co-first author

Babina, A.M., Parker, D.J., Li, G.W., Meyer, M.M. Fitness advantages conferred by the L20-interacting RNA cis-regulator of ribosomal protein synthesis in Bacillus subtilis. RNA. 9, 1133-1143 (2018)

Bhate, A.*, Parker, D.J.*, Bebee, T.W., Ahn, J., Arif, W., Rashan, E.H., et al. ESRP2 controls an adult splicing programme in hepatocytes to support postnatal liver maturation. Nature Communications. 6, 8768 (2015) • *Indicates co-first author

Parker D.J., Xiao, Y., Aguilar J.M., Silverman, S.K. DNA Catalysis of a Normally Disfavored RNA Hydrolysis Reaction. Journal of the American Chemical Society. 135, 8472-8475 (2013)

INVITED TALKS

2019 “Life and work of an MIT Biology Graduate student” – Department of Biology Friends of Biology Donor event 2019 “Growth Optimized Aminoacyl tRNA Synthetase Levels Prevents Maximal tRNA Charging” – Department of Biology Building 68 Retreat 2019 “Baking a cellular pie: How much tRNA synthetase is required for the perfect recipe?” – Department of Biology “Bioslam” Competition - https://tinyurl.com/y3abbhn6 2018 “tRNA synthetase expression, translation, and cell growth in Bacillus subtilis” – MIT Department of Biology 2018 “Utilizing high throughput sequencing to understand the robustness of the bacterial proteome” – Lewis University 2016 “Highly multiplexed RNA-seq to uncover genetic compensation in E. coli” – Department of Biology Building 68 Retreat

CONFERENCE PROCEEDINGS

2019 “Growth Optimized Aminoacyl tRNA Synthetase Levels Prevents Maximal tRNA Charging” – Poster, New Approaches and Concepts in Microbiology, Heidelberg, Germany 2018 “Determining the relationship between the design constraints of feedback regulation and the fitness landscape of a protein” – Poster, Molecular Genetics of Bacteria & Phage Conference, Madison WI

AWARDS

2018 Best poster Award – Department of Biology Building 68 Retreat 2014 William T. and Lynn Jackson Senior Thesis Award – Award for best senior thesis in Biochemistry 2013 William T. and Lynn Jackson Summer Scholar – Award for achievement in undergraduate research 2009-2013 UIUC College of Liberal Arts and Sciences Dean’s List – Seven semesters 2011 Edmund J. James Scholar Honors Program

202 2010 General Assembly Scholarship Recipient - full tuition waiver funded by the state of Illinois

TEACHING EXPERIENCE

Graduate Teaching Assistant, MIT Department of Biology Experimental Biology and Communication (Course 7.02) August – December 2017 • Accompanied eight students through rigorous 10 hour/week introductory biology laboratory class Introductory Biology: Microbiology Specialty (Course 7.015) August – December 2015 • Held two recitations per week for a small, seven student class focused on introductory biology & expanding science literacy through evaluation of current scientific news Merit Teaching Assistant, University of Illinois School 2012-2013 • Taught a specialized General Chemistry I discussion section of ~16 students from underdeveloped or underrepresented high schools Student Mentor - Chemistry 199L: iScience 2013 • Mentored four ‘undeclared major’ freshman/sophomore undergraduate students through the process of finding a research lab and major of interest

LEADERSHIP

Founding Member – MIT Department of Biology Graduate Student Council, 2016 - 2017 • Committee member to establish a student resource group responsible for reporting to the MIT Graduate Student Council Intramural Chair – MIT Department of Biology 2016 - Present President, Alpha Chi Sigma- Professional Chemistry Fraternity, Urbana, IL 2012-2013 • Presided over a Registered Student Organization with 90+ paying members Professional Chair – Alpha Chi Sigma 2012 Krug Chair – Alpha Chi Sigma 2011

PROFESSIONAL EXPERIENCE Comparative Biology & Safety Sciences Intern – Amgen, Cambridge MA 2019 • Characterized RNA extraction and targeted sequencing technologies to utilize FFPE tissue as a source of gene expression data for retrospective toxicological studies

Commercialization Intern – Cabot Microelectronics, Aurora, IL. 2011

ACTIVITIES

Member – MIT Men’s Club Volleyball Team 2016 - Present • Compete in USAV Yankee Club league (C+) • Competed in USAV Men’s National tournament in Minneapolis, Minnesota with team in 2017

Volunteer – Provena Covenant Medical Center 2012 - 2013 Member – Alpha Phi Omega service fraternity 2011 - 2014

203