bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

First-principles model of optimal factors stoichiometry

Jean-Benoît Lalanne1,2,# and Gene-Wei Li1,† 1Department of Biology, and 2Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. #Present address: Department of Genome Sciences, University of Washington, Seattle, WA 98105, USA. †Corresponding author: [email protected]

Abstract

Enzymatic pathways have evolved uniquely preferred expression stoichiometry in living cells, but our ability to predict the optimal abundances from basic properties remains underdeveloped. Here we report a biophysical, first-principles model of growth optimization for core mRNA transla- tion, a multi-enzyme system that involves with a broadly conserved stoichiometry spanning two orders of magnitude. We show that a parsimonious flux model constrained by proteome allocation is sufficient to predict the conserved ratios of translation factors through maximization of usage The analytical solutions, without free parameters, provide an interpretable framework for the observed hierarchy of expression levels based on simple biophysical properties, such as diffusion con- stants and protein sizes. Our results provide an intuitive and quantitative understanding for the construction of a central process of life, as well as a path toward rational design of pathway-specific enzyme expression stoichiometry.

1 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Introduction

A universal challenge faced by both evolution and synthetic pathway creation is to optimize the cellular abundance of proteins. This abundance optimization problem is not only multidimensional – often involving several proteins participating in the same pathway – but also under systems-wide constraints, such as limited physical space [30] and finite nutrient inputs [70]. The complexity of this problem has prevented rational design of protein expression for pathway engineering [23]. Fundamentally, being able to predict the optimal and observed cellular protein abundances from their individual properties would reflect an ultimate understanding of molecular and systems biology.

Evolutionary comparison of gene expression across microorganisms suggests that basic principles governing the optimization problem may exist. We recently reported broad conservation of relative protein synthesis rates within individual pathways, even under circumstances in which the relative tran- scription and translation rates for the homologous enzymes have dramatically diverged across species

[34]. Moreover, distinct proteins that evolved convergently towards the same biological function also displayed the same stoichiometry of protein synthesis in their respective species. These results suggest that the determinants of optimal in-pathway protein stoichiometry are likely modular and independent of detailed biochemical or physiological properties that differ across clades. However, the precise nature of such determinants remains unknown.

Translation of mRNA into proteins is a central pathway required for cell growth and therefore serves as an entry point for establishing a quantitative model of optimal in-pathway stoichiometry. In addition to which have well-coordinated synthesis of subunits [50], nearly 100 protein factors are involved in facilitating ribosome assembly and translation initiation, elongation, and termination [14, 44, 57].

The intracellular abundances of these factors vary over 100-fold [38, 54], and their ratios are often maintained in different growth conditions and across different species [34]. As a group, the total amount of translation-related proteins per cell mass scales linearly with growth rate in most conditions [13, 60, 62].

This relationship is considered a bacterial ‘growth law’ and can be understood as a consequence of producing just enough ribosomes necessary to duplicate the proteome at a given growth rate. However, the relative stoichiometry among translation factors is less understood. Notable exceptions are the Tu (EF-Tu) [17, 30] and more recently elongation factor Ts (EF-Ts) [21], which have been suggested to maximize translational flux per unit proteome. The selective pressure on the expression of other factors, which can be much more lowly expressed and often assumed to be non-limiting, remains

2 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. to be determined.

Here we sought to derive an intuitive model to understand the quantitative abundance hierarchy

(Fig. 1B) among the core translation factors (TFs), which have well-characterized functions (Table 1, schematic in Fig. 1A). Our goal is not to exhaustively model the heterogeneous movement of ribosomes on the transcriptome [16, 56, 64, 65] or to include as many details of the underlying molecular steps as possible [21, 66]. Instead, we coarse-grained global translation into a cycle that consists of sequential steps with interconnected fluxes that depend on core TFs concentrations. At steady-state cell growth, all individual fluxes are matched and the overall rate of ribosomes completing the full translation cycle is proportional to cell growth. By solving for the maximum flux under previously defined proteome allocation constraints [17, 21, 62], we obtained analytical solutions for the optimal factor concentrations, which agree well with the observed values. The ratios of optimal concentrations depend only on simple biophysical parameters that are broadly conserved across species. For instance, elongation factor EF-

G is predicted to be more abundant than initiation and termination TFs by a multiplicative factor of √ ≈ average protein length ≈ 14, whereas EF-Tu is predicted to be more abundant than EF-G by a factor √ of ≈ number of different amino acids ≈ 4. These results, arising from the optimization procedure and generic properties of the translation cycle, provide rationales for the order-of-magnitude expression of these important enzymes.

Step Factor Function Initiation IF1 1: binds to 30S ribosome subunits to facilitate initiator tRNA binding [20, 36]. Initiation IF2 Initiation factor 2: ribosome-dependent GTPase interacting with 30 ribosome subunits, ensures correct binding of initiator tRNAs [20, 36]. Initiation IF3 Initiation factor 3: prevents premature docking of 50S ribosomal subunits [20, 36]. Elongation EF-Tu Elongation factor Tu: binds to charged tRNAs to form ternary complexes, brings charged tRNAs to empty ribosome A sites. [1, 2, 68] Elongation aaRS tRNA synthetases: charge tRNAs with cognate amino acids [22, 51]. Elongation EF-G Elongation factor G: catalyzes translocation steps of the ribosome after peptide bond formation [1, 2]. Elongation EF-Ts Elongation factor Ts: nucleotide exchange factor for EF-Tu [1, 2]. Termination RF1/RF2 Peptide chain release factors 1 and 2: recognize and hydrolyze the completed protein. RF1 recognizes UAA, UAG, and RF2 UAA, UGA [8]. Termination RF4 Ribosome recycling factor: catalyzes the dissociation of ribosome subunits fol- lowing peptide chain release in translation termination [8].

Table 1: Brief description of the function of core translation factors considered. For reviews of mRNA translation, see [12, 57].

3 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 1: The hierarchy of mRNA translation factor expression stoichiometry. (A) Multiscale model relating translation factor expression to growth rate. The growth rate 휆 is directly proportional to the ribosome content (휑푟푖푏표) in the cell and inversely proportional to the average time to complete the translation cycle τ푡푟푙, consisting of the sum of the initiation (τ푖푛푖), elongation (τ푒푙), and termination (τ푡푒푟) times. Each of these reaction times are determined by the translation factor abundances. On average, the elongation step is repeated around ⟨ℓ⟩ ≈ 200× to complete a full protein, compared to 1× for initiation and termination. Our framework of flux optimization under proteome allocation constraint addresses what ribosome and translation factor abundances maximize growth rate. (B) Measured expression hierarchy of bacterial mRNA translation factors, conserved across evolution. Horizontal bars mark the proteome synthesis fractions as measured by ribosome profiling [34] (equal to the proteome fraction by weight for a stable proteome) for key mRNA translation factors in B. subtilis (Bsub), E. coli (Ecol), and V. natriegens (Vnat) and are color-coded according to the protein (or group of proteins) specified. TrianglesJ ( ) on the right indicate the mean synthesis fraction of the protein in the three species. See Table 1 for a short description of the translation factors considered.

Results

Problem statement and model formulation

Our overall goal is to determine the growth-optimizing proteome allocation for the ribosome and various translation factors. Conceptually, varying TF concentrations has two opposing effects on cell prolifer- ation. At the biochemical level, high TF expression can facilitate growth by allowing more efficient usage of ribosomes. At the systems level, taking into account the constraint of a finite proteomic space, high TF expression can nonetheless limit growth by reducing the number of ribosomes produced. The

4 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. tradeoffs between various TFs and ribosomes create a multidimensional optimization problem.

We solve this multidimensional problem by treating translation as a dynamical system, in which ribosomes cycle through initiation, elongation, and termination. The resulting flux drives cell growth.

During steady-state growth, every interlocked step of the translation cycle must have the same ribosome flux that is specified by the growth rate. We show that at the growth optimum, concentrations for distinct TFs can be solved independently. The resulting analytical solutions can be expressed in terms of the growth rate and simple biophysical parameters.

Cell growth driven by TF-dependent ribosome flux

To describe the biochemical effects of TF concentrations on cell growth, we first introduce acoarse- grained translation cycle time τ푡푟푙, or the time it takes for a ribosome to complete a typical cycle of protein synthesis (Fig. 1A), which consists of three sequential steps: initiation ("푖푛푖"), elongation ("푒푙"), and termination ("푡푒푟"). Each of these steps is catalyzed by multiple TFs. The full translation cycle time is then sum of ribosome transit times at the three steps (τ푡푟푙 = τ푖푛푖 + τ푒푙 + τ푡푒푟), whose dependence on individual TF concentrations can be quantitatively described through mass action kinetic schemes

(schematically depicted in Fig. 1A, see Supplement and examples below). We express TF concentrations in units of proteome fractions (dry mass fraction of a specified protein to the full proteome), denoted by 휑 [62] (Supplement, section S1.2). Using this notation, the translation cycle time τ푡푟푙 is a decreasing function of various TFs concentrations ({휑푇 퐹,푖}). In addition to its dependency on TF concentrations, the translation cycle time provides a bridge between the cell growth rate and ribosome concentration. In steady-state growth [13, 47, 62], the growth rate of cells and of their protein content (total number of proteins) must be identical, denoted here as 휆, as a result of the constant average cellular composition. The protein content grows at a rate determined by the flux of ribosomes completing the translation cycle(푁푟푖푏표/τ푡푟푙, where 푁푟푖푏표 is the number of ribosomes per cell) divided by the total number of proteins 푁푃 per cell, i.e., 휆 = 푁푟푖푏표/τ푡푟푙푁푃 .

Rescaling the ribosome number to the mass fraction of proteome for ribosomes (휑푟푖푏표) yields

휑 ⟨ℓ⟩ 휆 = 푟푖푏표 , (1) τ푡푟푙 ℓ푟푖푏표

where ℓ푟푖푏표 is the number of amino acids in ribosomal proteins and ⟨ℓ⟩ is the average length among all proteins, weighted by expression levels (Methods, section S1). The rescaling factor (ℓ푟푖푏표/⟨ℓ⟩ ≈

5 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

7300 푎.푎./200 푎.푎. = 36.5) is approximately constant across growth conditions (section S1). This equation establishes how TF concentrations affect the growth rate biochemically via τ푡푟푙. We note that equation 1 is a generalized form of the bacterial growth law that relates the mass 휑푒푙 fraction of elongating ribosomes to growth rate (휆 = 푟푖푏표 ⟨ℓ⟩ = 훾휑푒푙 , where 훾 is a rescaled transla- τ푒푙 ℓ푟푖푏표 푟푖푏표 푒푙 tion elongation rate and 휑푟푖푏표 is the proteome fraction of actively translating ribosomes [13, 62, 63]). This classic growth law was derived by considering the steady-state flux of peptide bond formation by elongating ribosomes, whereas our model focuses on the flux of ribosomes that traverse the entire trans- lation cycle, thereby allowing us to consider the effects of translation factors and ribosomes engaged in additional steps (initiation, elongation, and termination). For each step, equation 1 can be extended to show that the growth rate is similarly proportional to the mass fraction of the corresponding ribosomes divided by the transit time at that step (Supplement, section S2).

Steady-state growth thus imposes the requirement that the growth rate be inversely proportional to the translation cycle time and proportional to the number of ribosomes engaged in the translation cycle (equation 1). Inactive ribosomes, comprised of assembly intermediates, hibernating ribosomes, or otherwise non-functional ribosomes, have been found to constitute a small fraction of the total ribosome pool for fast growth [13, 39] and are not considered here. Based on equation 1, both increasing ribosome concentration and increasing TF concentrations (which decreases τ푡푟푙) can accelerate growth. However, production of ribosomes and TFs is subject to competition under a limited proteomic space, which we consider next.

Optimization under proteome allocation constraint

To model the production cost tradeoff between TFs and ribosomes, we integrate the flux-based formu- lation above to the proteome allocation model that describes how global gene expression scales with the growth rate [17, 62]. In this coarse-grained model, the proteome is divided into an incompressible

"sector" 휑푄 that is independent of the growth rate, a metabolic protein sector 휑푃 that scales linearly with growth rate (휑푃 := 휆/휈, where 휈 is a phenomenological parameter that reflects nutrient quality), and a translation sector 휑푅. In our description, 휑푅 is taken as the summed proteome fractions for the ∑︀ ribosome and translation factors, i.e., 휑푅 := 휑푟푖푏표 + 푖 휑푇 퐹,푖. Together, these sectors comprise the full proteome: 1 = 휑푄 + 휑푃 + 휑푅. Under this framework, the proteomic tradeoff between ribosomes and TFs

6 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. is then 휆 ∑︁ 휑 = 1 − 휑 − − 휑 . (2) 푟푖푏표 푄 휈 푇 퐹,푖 푖 Equations 1 and 2, together with to the kinetic schemes for each step of the translation cycle, constitute the core of our model. Combining the biochemical effects (equation 1) and the systems- level constraints (equation 2) on TFs, we arrive at a self-contained relationship between growth and TF concentrations: 1 − 휑 − 휆 − ∑︀ 휑 ⟨ℓ⟩ 휆 = 푄 휈 푖 푇 퐹,푖 , (3) τ푡푟푙({휑푇 퐹,푖}) ℓ푟푖푏표 where we explicitly express τ푡푟푙 as a function of 휑푇 퐹,푖 to reflect the dependence of ribosome transit times on translation factor abundances. The above relationship (equation 3) allows us to ask: what is the stoichiometry of TFs, or partitioning of the translation sector, that maximizes the growth rate

(Fig. 1A)?

* The condition for the optimal TF abundances, i.e., the set of 휑푇 퐹,푖 that satisfies (휕휆/휕휑푇 퐹,푖) = 0, can be readily obtained by considering the 휑푇 퐹,푖 as independent variables and taking the derivative of equation 3 with respect to a specified TF abundance, which yields

(︂ )︂* 휕τ푡푟푙 ⟨ℓ⟩ 1 = − * , (4) 휕휑푇 퐹,푖 ℓ푟푖푏표 휆

* where the asterisk refers to the growth optimum, i.e., (휕휆/휕휑푇 퐹,푖) = 0. Hence, under this framework, the TF abundances are growth-optimized when the sensitivity of the translation cycle time to changing the considered TF abundance (휕τ푡푟푙/휕휑푇 퐹,푖) reaches a value determined solely by the growth rate and protein size factors.

Although equation 3 and the resulting optimization conditions (equation 4, one for every TF) cor- responds to a coupled nonlinear system of multiple 휑푇 퐹,푖, substantial decoupling occurs at the optimal growth rate. In this situation, most 휑푇 퐹,푖 are only connected through the resulting growth rate. The op- timization problem is then further simplified by the fact that the translation cycle consists of sequential and largely independent steps. The translation cycle time τ푡푟푙 corresponds to the sum of the coarse- grained initiation, elongation, and termination times, i.e., τ푡푟푙 = τ푖푛푖 + τ푒푙 + τ푡푒푟. Given that each TF is involved in a specific molecular step, the sensitivity matrix of these times to TF concentration issparse:

* (휕τ푗/휕휑푇 퐹,푖) = 0 for most combinations of τ푗 and 휑푇 퐹,푖. This lack of ‘cross-reactivity’ expresses that, for example, the initiation time τ푖푛푖 is unaffected by the tRNA synthetase concentration. This sparsity

7 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. only occurs at the optimal expression levels, as the transit times typically depend on the growth rate

(see an example in section S5.2) and 휕휆/휕휑푇 퐹,푖 ̸= 0 away from the optimum. The optimum condition for factor 푖 then simplifies to: (︂ )︂* 휕τ푗 ⟨ℓ⟩ 1 = − * , (5) 휕휑푇 퐹,푖 ℓ푟푖푏표 휆 where 푗 denotes the translation step(s) that TF푖 participates in. This leads to simplifications that allow the system to be solved analytically in most cases: instead of solving the full system at once, individual reactions within the translation cycle can be considered in isolation. The resulting optimal concentrations are connected via the growth rate 휆*. Interestingly, the optimal stoichiometry among most TFs is independent of 휆* if the reactions are diffusion-limited, as we show below.

Case study: Translation termination

We first illustrate the process of solving for the optimal TF concentration for the relatively simple case of translation termination. The principles used here and the form of solutions provide conceptual guideposts for solving other steps of the translation cycle.

In , translation termination [8] consists of two distinct, sequential steps: (1) stop codon recognition and peptidyl-tRNA hydrolysis catalyzed by class I peptide chain release factors RF1 and RF2, followed by (2) dissociation of ribosomal subunits from the mRNA, i.e., ribosome recycling, catalyzed by RF4. We do not explicitly consider the additional factors (e.g., RF3 and EF-G) due to their lack of conservation or because they are non-limiting for this specific step (Supplement, section S5.1). RF1 and

RF2 have the same molecular functions but recognize different stop codons [61]: RF1 recognizes stops

UAA and UAG, whereas RF2 recognizes UAA and UGA. For simplicity, we describe here a scenario where RF1 and RF2 have no specificity towards the three stop codons, which allows us to combine them in a single factor (denoted RFI). The model is readily generalized, with similar results, to the case of the two RFs with their specificity towards the three stop codons (Supplement, section S5.3).

Under a coarse-grained description, the total ribosome transit time at termination τ푡푒푟 can be de- composed into a sum of peptide release time and ribosome recycling time. In the treatment below, we consider a regime of diffusion-limited reactions for simplicity. A full model with catalytic compo- nents can also be solved analytically (Supplement section S5.2, Fig. 2A). In the diffusion-limited regime

(푘푐푎푡 → ∞), the peptide release time and ribosome recycling time are inversely proportional to the

8 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. corresponding TF concentrations:

1 1 τ푡푒푟 = 푅퐹 퐼 + 푅퐹 4 , (6) 푘표푛 휑푅퐹 퐼 푘표푛 휑푅퐹 4

푖 where the association rate constants 푘표푛 are rescaled by the factor’s sizes in proteome fraction units (Supplement, section S1.2). The above expression constitutes the solution of the mass action scheme for termination, connecting factor abundances to termination time.

Figure 2: Case study with translation termination (A) Coarse-grained translation termination scheme. (B) Illustration of the minimization of effective proteome fraction corresponding to peptide chain release factors, leading to the equipartition principle.

The termination time (equation 6) can then be directly substituted into the optimality condition

(equation 5) and solved in terms of 휆*:

√︃ * √︃ * * ℓ푟푖푏표휆 * ℓ푟푖푏표휆 휑푅퐹 퐼 = 푅퐹 퐼 , 휑푅퐹 4 = 푅퐹 4 . (7) ⟨ℓ⟩푘표푛 ⟨ℓ⟩푘표푛

* If the reactions are not diffusion-limited, an additional catalytic term ∝ 휆 /푘푐푎푡 is added to the mini- mally required levels above (Supplement, section S5.2). The square-root dependence in the optimal RF

−1 −1 concentrations emerges from the 휑푖 dependence of τ푖, e.g., for ribosome recycling τ푟푒푐푦푐 ∝ 휑푅퐹 4, which * −2 becomes (휑푖 ) upon taking the derivative in the optimality condition (equation 5). The square root * is then obtained by solving for 휑푖 . A similar square-root dependence has been noted in optimization of the ternary complex and tRNA abundances [6, 17]. As a result of the square-root, the optimal RF concentrations are weakly affected by biophysical properties such as the association rate constants and protein sizes. In the diffusion-limited regime above, the ratio of the optimal concentrations between RFI and RF4 is independent of the growth rate and only depends on the kinetics of binding.

As a side note, the expression for termination time τ푡푒푟 in equation 6 must be modified in a regime where ribosomes are frequently queued upstream of stop codons. This would occur if the termination

9 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. rate were slow and approached initiation rates on mRNAs [7, 33]. In this regime, queues of ribosomes at stop codons would incur an additional time to terminate. In a general description, the resulting

푓푢푙푙 additional termination time can be absorbed in a queuing factor 풬: τ푡푒푟 := τ푡푒푟 풬(τ푡푒푟) (Supplement, section S3 for derivation and discussion). The resulting nonlinearity would forbid the decoupling in the optimization procedure between RFI and RF4. Although absolute rates of termination are difficult to measure in vivo, translation on mRNAs is generally thought to be limited at the initiation step [36], and consistently, ribosome queuing at stop codons in bacteria is not usually observed (except under severe perturbations, e.g., [3, 27, 33, 42, 58]). In the physiological regime of fast termination, the queuing factor converges to 1, yielding simple solutions that depend only on biophysical parameters (equations 7).

Equipartition between TF and corresponding ribosomes

The optimal TF concentrations (e.g., equation 7) can also be intuitively derived from another viewpoint.

For each reaction in the translation cycle, we can define an effective proteome fraction allocated tothat process, combining the proteome fractions of the corresponding TF and the ribosomes waiting at that specific step. As an example, for the case of peptide chain release factor (RFI) just treated, theeffective proteome fraction includes the release factors and ribosomes with completed peptides waiting at stop

푒푓푓 푠푡표푝 codons (dashed box in Fig. 2A), i.e., 휑푅퐹 퐼 := 휑푅퐹 퐼 + 휑푟푖푏표 . This effective proteome fraction corresponds to the total proteomic space associated to a TF in the context of the translation cycle.

During steady-state growth, the concentration of ribosomes waiting at any specific step of the trans- lation cycle is equal to the total ribosome concentration multiplied by the ratio of the transit time of

푠푡표푝 τ푠푡표푝 푅퐹 퐼 that step to the full cycle: e.g., here 휑 = 휑푟푖푏표, where τ푠푡표푝 = 1/(푘 휑푅퐹 퐼 ) is the time to arrival 푟푖푏표 τ푡푟푙 표푛 of RFI. Using equation 1 for 휑푟푖푏표, the effective proteome fraction satisfies:

푒푓푓 푠푡표푝 1 휆 ℓ푟푖푏표 휑푅퐹 퐼 := 휑푅퐹 퐼 + 휑푟푖푏표 = 휑푅퐹 퐼 + 푅퐹 퐼 휑푅퐹 퐼 푘표푛 ⟨ℓ⟩ √︃ (8) 휆 ℓ푟푖푏표 ≥ 2 푅퐹 퐼 . 푘표푛 ⟨ℓ⟩

√ In the last line, we used the inequality of arithmetic and geometric means (푎 + 푏 ≥ 2 푎푏) to obtain the minimum of the effective proteome fraction. The equality holds when the two proteome fractions are

10 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

푠푡표푝 equal (휑푅퐹 퐼 = 휑푟푖푏표 ), which provides the solution for optimal 휑푅퐹 퐼 :

√︃ * * ℓ푟푖푏표휆 휑푅퐹 퐼 = 푅퐹 퐼 , (9) ⟨ℓ⟩푘표푛

Hence, we recover equation 7 by minimizing the effective proteome fraction allocated to a given process in the translation cycle (the above argument applies to the free optimal concentration in the non-diffusion limited regime, see Supplement, section S5.2 for an example). From this perspective, optimization of the translation apparatus balances the production cost of the enzyme of interest with the improved efficiency of a having less ribosomes idle at that step, Fig. 2B. The optimal abundance corresponds to a point of equipartition: the proteome fraction of free cognate factors equals the proteome fraction of ribosomes waiting at the corresponding step (Fig. 2B).

Case study: Ternary complex and tRNA cycle (EF-Tu and aaRS)

We next consider a more complex step of the translation cycle – elongation – and demonstrate that the optimality criteria (equation 5) can similarly provide simple analytical solutions in the physiologically relevant regime. Translation elongation involves multiple interlocked cycles and enzymes (EF-Tu, EF-G,

EF-Ts, aminoacyl-tRNA synthetases (aaRS), and more). Our kinetic scheme for translation elongation, self-consistently coarse-grained to a single codon (derived in the Supplement, section S6.1), is shown in

Fig. 3A. Charged tRNAs are brought to ribosomes through a ternary complex (TC), corresponding to a bound tRNA and EF-Tu. Following tRNA delivery and GTP hydrolysis, EF-Tu is released from the ribosome, and nucleotide exchange factor EF-Ts recycles EF-Tu back into the active pool, after which

EF-Tu can bind a charged tRNA again and form another TC. At the ribosome, translocation to the next codon is catalyzed by EF-G, followed by release of uncharged tRNAs. Aminoacyl-tRNA synthetases then charge tRNAs to complete the elongation cycle.

Similar to translation termination, the factor-dependent ribosome transit time through a single codon

(τ푎푎) is comprised of two steps, corresponding to binding of the TC and EF-G, respectively (formal derivation and non-diffusion-limited regime in section S6.3.1):

푛푎푎 1 τ푎푎 = 푇 퐶 + 퐺 . (10) 푘표푛 휑푇 퐶 푘표푛휑퐺

The coarse-grained total elongation time in our model is given by the factor-dependent transit time of a

11 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

single codon above multiplied by the average protein sizes, i.e., τ푒푙 = ⟨ℓ⟩(τ푎푎+τ푖푛푑). A factor-independent time τ푖푛푑 (e.g., peptidyl transfer), which does not come into play in our optimization framework, was added to account for additional steps making up the full elongation cycle. The rescaling of the TC association rate by 푛푎푎 arises as a result of our coarse-graining to a one-codon model, and reflects the codon specific interaction between the TC and the ribosome (Supplement, section S6.1). Notethat the ternary complex concentration, 휑푇 퐶 , is a nonlinear function of the concentrations of all elongation factors (including 휑퐺).

Despite the complexity of τ푎푎 as a function of the 휑푇 퐹,푖, the fact that all fluxes are equal in steady- state allows several steps to be isolated and solved separately (EF-Ts and EF-G, greyed out in Fig. 3A, respectively solved in Supplement, sections S6.3.3 and S6.3.4). For example, the approximate solution for optimal EF-G concentration parallels that for termination factors:

√︃ * * ℓ푟푖푏표휆 휑퐺 ≈ 퐺 . (11) 푘표푛

Importantly, the optimum for EF-G is larger than the optimum for RFs by a factor √︀⟨ℓ⟩, reflecting that a typical translation cycle requires ⟨ℓ⟩ steps catalyzed by EF-G and only one step for RFs (i.e.,

τ푒푙 = ⟨ℓ⟩(τ푎푎 +τ푖푛푑) enters the optimality condition, equation 5, in contrast to τ푡푒푟 which is not multiplied by a scaling factor). The square root dependence arises here for the same reason as in the case of translation termination (derivative of 휑−1).

In contrast to EF-G and EF-Ts, EF-Tu and aaRS cannot a priori be treated in isolation because the TC is composed of charged tRNAs and EF-Tu. A key property of the model simplifies the prob- lem considerably. Following coarse graining, there is a separation of timescales between two classes of reactions in the elongation cycle: those that require codon specificity and those that do not. The first class includes aaRS binding to cognate tRNAs and TC binding to cognate codons (asterisks in Fig. 3A).

−1 The association rate constants of these codon-specific reactions are rescaled by a factor of 푛푎푎 , with

푛푎푎 ≈ 20 close the number of different amino acids (derivation and estimation of the coarse-graining parameter from experimental data in section S6.1). Heuristically, the rescaling arises from having 푛푎푎 parallel pathways whose rates are dictated by bimolecular reactions with respective components present

−1 −2 −1 at a concentration 푛푎푎 lower, with an overall rate reduction proportional to 푛푎푎 × 푛푎푎 = 푛푎푎 . By com- parison, the reactions that are codon-agnostic, such as binding of EF-Tu to charged tRNAs, are much faster.

12 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 3: Case study with elongation factors (EF-Tu/aaRS) (A) Schematic of the translation elongation scheme, with the tRNA cycle, involving tRNA synthetases (aaRS) and EF-Tu. Reactions with * have the association rate −1 constants rescaled by a factor of 푛푎푎 ≈ 1/20 due to coarse-graining to a single codon model. Greyed out cycles (EF-Ts and EF-G) can be solved in isolation (Supplement, respectively sections S6.3.3 and S6.3.4). (B) Exploration of the aaRS/EF-Tu expression space from numerical solution of the elongation model (Supplement, section S6.3.5). The transition line (orange) marks the boundary between the EF-Tu limited and aaRS limited regimes. Left panel shows the ternary complex concentration (which is closely related to the elongation rate, equation 10). The ternary complex concentration is scaled by the dissociation constant 퐾푇 퐶 to the ribosome A site (Supplement, section S6.3.5). Middle panel shows the free charged tRNA fraction. Right panel shows the free EF-Tu fraction (휑푇 푢퐺푇 푃 denotes the proteome fraction of EF-Tu GTP that can bind to charged tRNAs to form the ternary complex). The star marks the optimal solution, as described in the text. Middle panel shows the free charged tRNA fraction, and right panel the free EF-Tu fraction.

The dependence of TC concentration (휑푇 퐶 in equation 10) on EF-Tu and aaRS concentrations can be solved based on this separation of timescales (Supplement, section S6.3.5) and the condition that

EF-G and EF-Ts are at their respective optima. Here we consider an effective one-codon model and the corresponding sum of all aaRS abundances as 휑푎푎푅푆. The EF-Ts portion of the EF-Tu cycle leads to an additive correction to the optimal EF-Tu solution and is omitted from the current discussion for simplicity, see section S6.3.5 for a full treatment.

As a result of the separation of timescales, the behavior of TC concentration in the two-dimensional

EF-Tu/aaRS expression space is separated into two regimes: one where EF-Tu is limiting (free EF-Tu

13 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. concentration near 0) for ternary complex formation, and the other where charged tRNAs are limit- ing (free charged tRNA concentration near 0). Specifically, in either limited regimes, increasing the abundance of the limiting component leads to stoichiometric increase in TC and therefore dictates TC concentration. The two regimes are separated by a transition region, whose width is set by the extent of separation of scale between the two classes of reactions. We call the location of the separation the transition line (orange line in Fig. 3B), see section S6.3.5). At large aaRS concentrations, the transition line plateaus as a result of the finite total tRNA budget within the cell (Fig. 3B, middle panel).The plateau is reached once all tRNAs aaRS are charged: the system is then no longer limited by synthetases, but by the amount of tRNAs. In other words, at large aaRS and EF-Tu abundances, the total tRNA concentration becomes limiting for the TC concentration.

The transition line corresponds to conditions in which EF-Tu and aaRS are co-limiting for TC con- centration. In the EF-Tu limited region, increasing aaRS abundance does not increase ternary complex concentration: since all EF-Tu proteins are already bound to charged tRNAs, increasing tRNA charging cannot further increase TC concentration. Conversely, in the aatRNA limited region, increasing EF-Tu abundance does not increase TC concentration: since all charged tRNAs are already bound by EF-Tu, increasing EF-Tu concentration does not alleviate the requirement for more charged tRNAs. Given that the optimality condition requires non-zero increase in ternary complex concentration with increasing fac- tor abundance (equations 5 and 10), the optimal EF-Tu and aaRS abundances must be on the transition line. Away from this line, there is either an unproductive excess of EF-Tu or aaRS.

Which point on the transition line corresponds to the optimum? As an approximation, note that inside the EF-Tu limited region, 휑푇 퐶 ≈ 휑푇 푢 (since most EF-Tu proteins are bound by charged tRNAs). Hence, within this region, the optimal abundance for the ternary complex can be solved in the same way as for translation termination factors resulting in the optimum:

√︃ * * ℓ푟푖푏표푛푎푎휆 휑푇 퐶 ≈ 푇 퐶 . (12) 푘표푛

Importantly, compared to the solution for EF-G, the above is multiplied by an additional factor of √ 푛푎푎. This contribution arises from the rescaling of the association rate for the ternary complex to the ribosome in our coarse-grained one-codon model, reflecting the fact that the interaction between the ternary complex and the ribosome is codon specific.

From the requirement for the combined EF-Tu and aaRS solution to fall on the transition line, the

14 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. approximate solution for the optimal tRNA synthetase abundance is then the intersection (yellow star in

Fig. 3B) of the transition line with the EF-Tu-only solution described above (dashed blue line in Fig. 3B) and is related to the amount of excess tRNA (Supplement, section S6.3.5).

For the above heuristic to be valid, the total number of tRNAs in the cell must be sufficient to accommodate all ribosomes (about 2 per ribosome) and binding to all EF-Tu (about > 4 per ribosome based on endogenous expression stoichiometry [34, 38]). The number of tRNAs per ribosomes in the cell should thus be at least 6×. Remarkably, estimates of this ratio in the cell suggest that this is just barely the case (between 6-7 tRNAs/ribosome at fast growth [15]). Although our model treats the total tRNA abundance as a measured parameter and omits its selective pressure (but see [21]), the abundance of three core components of the tRNA cycle appear to be at the special point where the transition line plateau, that is set by total tRNA abundance, just crosses the EF-Tu-only optimum (blue line in

Fig. 3B). At this point, all three components are co-limiting.

Optimal stoichioimetry of mRNA translation factors

Analogous to the case studies above, optimal concentrations for all core translation factors can be solved using equation 3 and their respective kinetics schemes (the case of translation initiation is solved in

Supplement, section S7). The analytical forms of the optimal solutions are shown in Table S3 . In the diffusion limited regime, the ratios of growth-optimized TF concentrations are independent of thegrowth rate (except for aaRS), and are dependent only on basic biophysical parameters, such as protein sizes and diffusion constants.

To obtain the numerical values for the optimal TF stoichiometry, we note that the ratios of diffusion- limited biomolecular association rate constants between two reactions can be expressed using Smolu- ^퐴퐵 ^퐶퐷 −1 −1 chowski’s formulation as 푘표푛 /푘표푛 = (퐷퐴 + 퐷퐵)/(퐷퐶 + 퐷퐷) (where the ^ refers to units µM s , see Supplement, section S1.2), where 퐷푖 is the diffusion constant for the molecular species 푖. Diffusion constants for several TFs have been measured experimentally [4, 55, 59, 67], and other uncharacterized ones can be estimated using the cubic-root scaling with protein size from the Stokes-Einstein relation

[49]. Importantly, the absolute values of the optimal concentrations can be anchored by the measured association rate constant between TC and the ribosome obtained from translation elongation kinetic measurements in vivo [13] (Supplement, section S9). The square-root dependence on these parameters

(Table S3), as explained in the case of termination factors, makes the numerical values less sensitive to

15 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. the exact biophysical parameters.

The optimal TF concentrations show concordance with the observed ones, both in terms of the absolute levels and the stoichiometry among TFs (Fig. 4). A hierarchy of expression levels emerges such that the factors involved in elongation are more abundant compared to initiation and termination factors. The separation of these two classes is driven by the scaling factor √︀⟨ℓ⟩ ≈ 14 in our analytical solutions, which reflects the fact that the flux for elongation factorsis ⟨ℓ⟩ ≈ 200 times higher than that for initiation and termination factors. Within each class, the finer hierarchy of expression levels can also be further explained by simple parameters. For example, EF-Tu is more abundant than EF-G by a √︀ factor of 푛푎푎ℓ푇 푢/ℓ퐺 ≈ 3.3 (observed 휑푇 푢/휑퐺: E. coli 3.9, B. subtilis 2.7, V. natriegens 3.3). A higher abundance is required for EF-Tu because it is bound to the different tRNAs, which effectively decreases the concentration by a factor of 푛푎푎 ≈ 20 (see section S6.2 for derivation and discussion of why the factor is not equal to the number of different tRNAs). Taken together, our model offers straightforward explanations for the observed TF stoichiometry.

Figure 4: Predicted optimal abundance (no cat- alytic contribution, 푘푐푎푡 → ∞) versus observed abundance. Experimental values are the average of E. coli, B. subtilis, V. natriegens [34]. We note that given the extreme sensitivity of the optimal aaRS abundance on the total tRNA to ribosome ratio (visually: yellow star’s position moves rapidly along x-axis upon changes in plateau of transition line, see Fig. 3B), the agreement for aaRS should be interpreted with caution. Values are listed in Table S5.

For a few TFs, the observed concentrations are 2- to 5-fold higher than the predicted optimal levels

(e.g., EF-Ts, RF4, and IF1 in Fig. 4). A potential explanation is that the corresponding reactions may not be diffusion-limited, which would lead to a non-negligible fraction of TFs sequestered atthe catalytic step and thereby require higher total concentrations. Our optimization model can also be solved analytically in the non-diffusion-limited regime (Table S3). However, the numerical values for these solutions are difficult to obtain because the estimates for catalytic rates are sparse and often inconsistent

16 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. with estimates of kinetics in live cells (Supplement, section S9). Another potential explanation is that the selective pressure for these TFs may be lower compared to the more highly expressed TFs. This explanation is unlikely both because their stoichiometry are observed to be conserved (Fig. 1B) and given that the expression of other lowly expressed TFs (e.g., RF1, RF2, and individual aaRSs) has been shown to acutely affect cell growth [33, 52]. Nevertheless, the deviations from the predicted optimal levels suggest that a more refined model may be required than our first-principles derivation.

Discussion

Despite the comprehensive characterization of their molecular mechanisms, the ‘mixology’ for the protein synthesis machineries inside living cells has remained elusive. Here we establish a first-principles frame- work to provide analytical solutions for the growth-optimizing concentrations of translation factors. We find reasonable agreements between our parsimonious predictions and the observed TF stoichiometry

(Fig. 4). These results provide simple rationales for the hierarchy of expression levels, as well as insights into several construction principles for biological pathways.

An important implication from the agreement between observed stoichiometries and our predictions is that most TFs are co-limiting for growth. Previous models have focused on proteomic allocation constraints to ribosomes [5, 62, 63] and abundant elongation factors [17, 21, 30]. Our results suggest that expression of translation factors, which drive functional outputs from ribosomes, also co-limits cell growth. By virtue of the interlocked translation cycles at steady state, the flux through every cycle must be matched. In our model, the optimality occurs when there are just enough TFs to support the required flux in every cycle, such that the proteome fraction of free factors equals that of waiting ribosomesat that step (equipartition). If the concentration of any one TF falls below the optimal point, it becomes the limiting factor for protein synthesis and growth. This result is supported by experimental evidence that slight knockdowns of individual RFs and aaRSs are detrimental to growth [33, 52]. Figuratively, the translation apparatus is analogous to a vulnerable supply chain, in which slowdown in any of the steps affects the full output.

In the diffusion-limited regime, the optimal TF stoichiometry is independent of the specific growth rate. This is consistent with the observation that relative TF expression remains unchanged between E. coli growing with a 20-min doubling time and a 2-hr doubling time [34, 38]. Furthermore, the optimal

TF stoichiometry depends only on simple biophysical parameters, including protein sizes and diffusion

17 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. constants, that are likely conserved in distant species. This result is consistent with the maintenance of the relative TF expression across large phylogenetic distances even though the underlying regulation and cellular physiology has diverged [34]. It remains to be determined if similar biophysical principles apply to the numerous other pathways that also exhibit conserved enzyme expression stoichiometry.

In principle, our model can also make predictions on the growth defects at suboptimal TF concen- trations. However, experimentally testing these predictions will be difficult due to secondary effects of gene regulation that are not considered in our model near optimality. For example, we have recently shown that small changes in RF levels lead to idiosyncratic induction of the general stress response in B. subtilis due to a single ultrasensitive stop codon [33]. As a result, the growth defect not only arises from reduced translation flux, but is in fact dictated by spurious regulatory connections thatare normally not activated when TF expression is at the optimum. We propose that TF expression may be set at the optimal levels as our first-principles model suggests but entrenched by spurious connections in the regulatory network. To predict the full expression-to-fitness landscape away from the optimum, a more comprehensive model may be required to take into account all the molecular interactions in the cell [26, 41].

Our coarse-graining approach has several limitations in its connection to detailed biochemical param- eters. First, some of the coarse-grained rate constants remain difficult to estimate, possibly neglecting important features. For example, we do not explicitly consider off-rates in our model. Instead, our parameters correspond to effective rate constants that account for possible sequential binding andun- ˜ binding events, i.e., 푘표푛 = 푘표푛/푛푏푖푛푑, with 푛푏푖푛푑 = 푘푐푎푡/(푘푐푎푡 + 푘표푓푓 ). The effective association rate constants in our model thus contain information about catalytic and possible proofreading steps. We have used a rescaling of in vivo measured 푘표푛 as an attempt to circumvent this issue, but the specifics of off and/or catalytic rates for each factor could complicate this rescaling in a way that isdifficultto predict. Second, the model does not account for possible sequestration of binding sites by non-cognate interactions and it implicitly assumes the off-rates to be large in these cases such that a cognate binding site is never occupied by a non-cognate binding event. An expanded theory adding requirements of accuracy [17, 32, 40] and its interplay with the framework defined here, would be an interesting future research avenue.

Taken together, our model provides the biophysical basis for the stoichiometry of translation factors in living cells. The first-principles approach complements more comprehensive models that include many

18 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. biochemical parameters [21, 66], while providing intuitive rationales for the expression hierarchy. We anticipate that our approach will be generalizable to elucidate or design enzyme stoichiometry of other biological pathways, especially those whose activities are required for cell growth.

Acknowledgements We thank R. Battaglia, J. Cascino, M. Gill, M. Parker, D. Parker, and G. Schmidt for critical read- ing of the manuscript, and all members of the Li lab for discussion. This research was supported by

NIH grant R35GM124732, the NSF CAREER Award, the Smith Odyssey Award, the Pew Biomedical

Scholars Program, a Sloan Research Fellowship, the Searle Scholars Program, the Smith Family Award for Excellence in Biomedical Research; NSERC doctoral Fellowship and HHMI International Student

Research Fellowship (to J.-B.L.).

Conflict of interest The authors declare that they have no conflict of interest.

Data availability Computer scripts used in this study (custom Matlab scripts) will be made available upon reasonable request. Already publicly available ribosome profiling datasets were used (GEO accessions GSE95211 and GSE53767).

19 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

References

[1] Xabier Agirrezabala and Joachim Frank. Elongation in translation as a dynamic interaction among

the ribosome, tRNA, and elongation factors EF-G and EF-Tu. Quarterly Reviews of Biophysics,

42(3):159–200, 2009.

[2] Gregers R. Andersen, Poul Nissen, and Jens Nyborg. Elongation factors in protein .

Trends in Biochemical Sciences, 28(8):434–441, 2003.

[3] Natalie E. Baggett, Yan Zhang, and Carol A. Gross. Global analysis of translation termination in

E. coli. PLoS Genetics, 13(3):1–27, 2017.

[4] Somenath Bakshi, Albert Siryaporn, Mark Goulian, and James C. Weisshaar. Superresolution

imaging of ribosomes and RNA polymerase in live cells. Molecular Microbiology,

85(1):21–38, 2012.

[5] Nathan M. Belliveau, Grin Chure, Christina L. Hueschen, Hernan G. Garcia, Jane Kondev, Daniel S.

Fisher, Julie A. Theriot, and Rob Phillips. Fundamental limits on the rate of bacterial growth.

bioRxiv, 2020.

[6] O G Berg and C G Kurland. Growth rate-optimised tRNA abundance and codon usage. Journal

of molecular biology, 270(4):544–550, 1997.

[7] J. E. Bergmann and H. F. Lodish. A kinetic model of protein synthesis. Application to hemoglobin

synthesis and translational control. Journal of Biological Chemistry, 254(23):11927–11937, 1979.

[8] G. Bertram, S. Innes, O. Minella, J. P. Richardson, and I. Stansfield. Endless possibilities: Trans-

lation termination and stop codon recognition. Microbiology, 147(2):255–269, 2001.

[9] Glenn R. Björk and Tord G. Hagervall. Transfer RNA Modification: Presence, Synthesis, and

Function. EcoSal Plus, 6(1), 2014.

[10] Anneli Borg, Michael Pavlov, and Måns Ehrenberg. Complete kinetic mechanism for recycling of

the bacterial ribosome. RNA, 22(1):10–21, 2016.

[11] Hans Bremer and Patrick P Dennis. Modulation of Chemical Composition and Other Parameters

of the Cell at Different Exponential Growth Rates. EcoSal Plus, 2008.

20 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

[12] Jin Chen, Junhong Choi, Seán E. O’Leary, Arjun Prabhakar, Alexey Petrov, Rosslyn Grosely,

Elisabetta Viani Puglisi, and Joseph D. Puglisi. The molecular choreography of protein synthesis:

translational control, regulation, and pathways. Quarterly Reviews of Biophysics, 49:e11, 2016.

[13] Xiongfeng Dai, Manlu Zhu, Mya Warren, Rohan Balakrishnan, Vadim Patsalo, Hiroyuki Okano,

James R. Williamson, Kurt Fredrick, Yi Ping Wang, and Terence Hwa. Reduction of translat-

ing ribosomes enables Escherichia coli to maintain elongation rates during slow growth. Nature

Microbiology, 2(2):1–9, 2016.

[14] Thomas E. Dever and Rachel Green. The elongation, termination, and recycling phases of transla-

tion in . Cold Spring Harbor Perspectives in Biology, 4(7):1–16, 2012.

[15] Hengjing Dong, Lars Nilsson, and Charles G Kurland. Co-variation of tRNA Abundance and Codon

Usage in Escherichia coli at Different Growth Rates. Journal of molecular biology, 260(5):649–663,

1996.

[16] Eric Charles Dykeman. A stochastic model for simulating ribosome kinetics in vivo. PLoS compu-

tational biology, 16(2):e1007618, 2020.

[17] M Ehrenberg and C G Kurland. Costs of accuracy determined by a maximal growth rate constraint.

Quarterly reviews of biophysics, 17:45–80, 1984.

[18] Michael B. Elowitz, Michael G. Surette, Pierre Etienne Wolf, Jeffry B. Stock, and Stanislas Leibler.

Protein mobility in the cytoplasm of Escherichia coli. Journal of Bacteriology, 181(1):197–203, 1999.

[19] Thomas E Gorochowski, Irina Chelysheva, Mette Eriksen, Priyanka Nair, Steen Pedersen, and Zoya

Ignatova. Absolute quantification of and burden using combined sequencing

approaches. Molecular Systems Biology, 15(5):e8719, 2019.

[20] Claudio O. Gualerzi and Cynthia L. Pon. Initiation of mRNA translation in bacteria: Structural

and dynamic aspects. Cellular and Molecular Life Sciences, 72(22):4341–4367, 2015.

[21] Xiao Pan Hu, Hugo Dourado, Peter Schubert, and Martin J. Lercher. The protein translation

machinery is expressed for maximal efficiency in Escherichia coli. Nature Communications, 11(1):1–

10, 2020.

[22] Michael Ibba and S Dieter. Aminoacyl-tRNA synthesis. Annu. Rev. Biochem., 69:617–650, 2000.

21 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

[23] Markus Jeschek, Daniel Gerngross, and Sven Panke. Combinatorial pathway optimization for

streamlined metabolic engineering. Current Opinion in Biotechnology, 47:142–151, 2017.

[24] Lisa Jeske, Sandra Placzek, Ida Schomburg, Antje Chang, and Dietmar Schomburg. BRENDA in

2019: A European ELIXIR core data resource. Nucleic Acids Research, 47(D1):D542–D549, 2019.

[25] Grace E. Johnson, Jean Benoît Lalanne, Michelle L. Peters, and Gene Wei Li. Functionally uncou-

pled transcription-translation in Bacillus subtilis. Nature, 585(7823):124–128, 2020.

[26] Jonathan R. Karr, Jayodita C. Sanghvi, Derek N. MacKlin, Miriam V. Gutschow, Jared M. Ja-

cobs, Benjamin Bolival, Nacyra Assad-Garcia, John I. Glass, and Markus W. Covert. A whole-cell

computational model predicts phenotype from genotype. Cell, 150(2):389–401, 2012.

[27] Bor Kavčič, Gašper Tkačik, and Tobias Bollenbach. Mechanisms of drug interactions between

translation-inhibiting antibiotics. Nature Communications, 11(1):4013, 2020.

[28] David Kennell and Howard Riezman. Transcription and translation initiation frequencies of the

Escherichia coli lac operon. Journal of Molecular Biology, 114(1):1–21, 1977.

[29] Ingrid M Keseler, Amanda Mackie, Alberto Santos-Zavaleta, Richard Billington, César Bonavides-

Martínez, Ron Caspi, Carol Fulcher, Socorro Gama-Castro, Anamika Kothari, Markus Krumme-

nacker, Mario Latendresse, Luis Muñiz-Rascado, Quang Ong, Suzanne Paley, Martin Peralta-Gil,

Pallavi Subhraveti, David A Velázquez-Ramírez, Daniel Weaver, Julio Collado-Vides, Ian Paulsen,

and Peter D Karp. The EcoCyc database: reflecting new knowledge about Escherichia coli K-12.

Nucleic acids research, 45(November 2016):gkw1003, 2016.

[30] Stefan Klumpp, Matthew Scott, Steen Pedersen, and Terence Hwa. Molecular crowding limits

translation and cell growth. Proceedings of the National Academy of Sciences of the United States

of America, 110(42):16754–9, 2013.

[31] Mohit Kumar, Mario S. Mommer, and Victor Sourjik. Mobility of cytoplasmic, membrane, and

DNA-binding proteins in Escherichia coli. Biophysical Journal, 98(4):552–559, 2010.

[32] C G Kurland and M Ehrenberg. Growth-optimizing accuracy of gene expression. Annual review of

biophysics and biophysical chemistry, 16:291–317, 1987.

22 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

[33] Jean-Benoît Lalanne, Darren J. Parker, and Gene-Wei Li. Spurious regulatory connections dictate

the expression-fitness landscape of translation termination factors. bioRxiv, 2020.

[34] Jean Benoît Lalanne, James C. Taggart, Monica S. Guo, Lydia Herzel, Ariel Schieler, and Gene Wei

Li. Evolutionary Convergence of Pathway-Specific Enzyme Expression Stoichiometry. Cell, pages

749–761, 2018.

[35] K. L. Larrabee, J. O. Phillips, G. J. Williams, and A. R. Larrabee. The relative rates of protein

synthesis and degradation in a growing culture of Escherichia coli. Journal of Biological Chemistry,

255(9):4125–4130, 1980.

[36] Brian S. Laursen and Hans P. Sørensen. Initiation of protein synthesis in bacteria. Microbiology

and Molecular Biology Reviews, 69(1):101–123, 2005.

[37] Gene-Wei Li. How do bacteria tune translation efficiency? Current Opinion in Microbiology,

24:66–71, 2015.

[38] Gene-Wei Li, David Burkhardt, Carol Gross, and Jonathan S. Weissman. Quantifying absolute

protein synthesis rates reveals principles underlying allocation of cellular resources. Cell, 157(3):624–

635, 2014.

[39] Lasse Lindahl. Intermediates and time kinetics of the in vivo assembly of Escherichia coli ribosomes.

Journal of Molecular Biology, 92(1):15–37, 1975.

[40] M. Lovmar and M. Ehrenberg. Rate, accuracy and cost of ribosomes in bacterial cells. Biochimie,

88(8):951–961, 2006.

[41] Derek N. Macklin, Travis A. Ahn-Horst, Heejo Choi, Nicholas A. Ruggero, Javier Carrera, John C.

Mason, Gwanggyu Sun, Eran Agmon, Mialy M. DeFelice, Inbal Maayan, Keara Lane, Ryan K.

Spangler, Taryn E. Gillies, Morgan L. Paull, Sajia Akhter, Samuel R. Bray, Daniel S. Weaver,

Ingrid M. Keseler, Peter D. Karp, Jerry H. Morrison, and Markus W. Covert. Simultaneous cross-

evaluation of heterogeneous E. coli datasets via mechanistic simulation. Science, 369(6502), 2020.

[42] Kyle Mangano, Tanja Florin, Xinhao Shao, Dorota Klepacki, Irina Chelysheva, Zoya Ignatova,

Yu Gao, Alexander S. Mankin, and Nora Vázquez-Laslop. Genome-wide effects of the antimicrobial

peptide apidaecin on translation termination in bacteria. eLife, 9:1–24, 2020.

23 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

[43] Tõnu Margus, Maido Remm, and Tanel Tenson. Phylogenetic distribution of translational

in bacteria. BMC Genomics, 8:1–18, 2007.

[44] Assen Marintchev and Gerhard Wagner. Translation initiation: structures, mechanisms and evolu-

tion. Quarterly reviews of biophysics, 37(3-4):197–284, 2004.

[45] Pohl Milón, Cristina Maracci, Liudmila Filonava, Claudio O Gualerzi, and Marina V Rodnina.

Real-time assembly landscape of bacterial 30S translation initiation complex. Nature Structural &

Molecular Biology, 19(6):609–615, 2012.

[46] Fuad Mohammad, Rachel Green, and Allen R Buskirk. A systematically-revised ribosome profiling

method for bacteria reveals pauses at single-codon resolution. eLife, 8:1–25, 2019.

[47] J Monod. The Growth of Bacterial Cultures. Annual Review of Microbiology, 3(1):371–394, 1949.

[48] Liliana Mora, Valérie Heurgué-Hamard, Miklos De Zamaroczy, Stephanie Kervestin, and Richard H.

Buckingham. Methylation of bacterial release factors RF1 and RF2 is required for normal translation

termination in vivo. Journal of Biological Chemistry, 282(49):35638–35645, 2007.

[49] Anja Nenninger, Giulia Mastroianni, and Conrad W. Mullineaux. Size dependence of protein diffu-

sion in the cytoplasm of Escherichia coli. Journal of Bacteriology, 192(18):4535–4540, 2010.

[50] M Nomura. Regulation of the Synthesis of Ribosomes and Ribosomal Components. Annual Review

of , 53(1):75–117, 1984.

[51] Yan Ling Joy Pang, Kiranmai Poruri, and Susan A Martinis. tRNA synthetase: TRNA aminoacy-

lation and beyond. Wiley Interdisciplinary Reviews: RNA, 5(4):461–480, 2014.

[52] Darren J. Parker, Jean-Benoît Lalanne, Satoshi Kimura, Grace E. Johnson, Matthew K. Waldor,

and Gene-Wei Li. Growth-Optimized Aminoacyl-tRNA Synthetase Levels Prevent Maximal tRNA

Charging. Cell Systems, 11:1–10, 2020.

[53] Michael Yu Pavlov, David V. Freistroffer, Valérie Heurgué-Hamard, Richard H. Buckingham, and

Måns Ehrenberg. Release factor RF3 abolishes competition between release factor RF1 and ribosome

recycling factor (RRF) for a ribosome binding site. Journal of Molecular Biology, 273(2):389–401,

1997.

24 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

[54] S Pedersen, P L Bloch, S Reeh, and F C Neidhardt. Patterns of protein synthesis in E. coli: a

catalog of the amount of 140 individual proteins at different growth rates. Cell, 14(1):179–190,

1978.

[55] Anne Plochowietz, Ian Farrell, Zeev Smilansky, Barry S. Cooperman, and Achillefs N. Kapanidis.

In vivo single-RNA tracking shows that most tRNA diffuses freely in live bacteria. Nucleic Acids

Research, 45(2):926–937, 2017.

[56] Shlomi Reuveni, Isaac Meilijson, Martin Kupiec, Eytan Ruppin, and Tamir Tuller. Genome-scale

analysis of translation elongation with a ribosome flow model. PLoS Computational Biology, 7(9),

2011.

[57] Marina V. Rodnina. Translation in prokaryotes. Cold Spring Harbor Perspectives in Biology,

10(9):1–22, 2018.

[58] Kazuki Saito, Rachel Green, and Allen R. Buskirk. Ribosome recycling is not critical for translational

coupling in E. Coli. eLife, 9:1–37, 2020.

[59] Arash Sanamrad, Fredrik Persson, Ebba G. Lundius, David Fange, Arvid H. Gynnå, and Johan Elf.

Single-particle tracking reveals that free ribosomal subunits are not excluded from the Escherichia

coli nucleoid. Proceedings of the National Academy of Sciences of the United States of America,

111(31):11413–11418, 2014.

[60] M. Schaechter, O. MaalOe, and N. O. Kjeldgaard. Dependency on Medium and Temperature of

Cell Size and Chemical Composition during Balanced Growth of Salmonella typhimurium. Journal

of General Microbiology, 19(3):592–606, 1958.

[61] E. Scolnick, R. Tompkins, T. Caskey, and M. Nirenberg. Release factors differing in specificity

for terminator codons. Proceedings of the National Academy of Sciences of the United States of

America, 61(2):768–774, 1968.

[62] Matthew Scott, Carl W Gunderson, Eduard M Mateescu, Zhongge Zhang, Terence Hwa, Carl W

Gunderson, Eduard M Mateescu, Zhongge Zhang, and Terence Hwa. Interdependence of Cell Growth

Origins and Consequences. Science, 330:1099–1102, 2010.

25 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

[63] Matthew Scott, Stefan Klumpp, Eduard M Mateescu, and Terence Hwa. Emergence of robust

growth laws from optimal regulation of ribosome synthesis. Molecular Systems Biology, 10(8):747,

2014.

[64] Leah B Shaw, R K P Zia, and Kelvin H Lee. Totally asymmetric exclusion process with extended

objects: a model for protein synthesis. Physical review. E, Statistical, nonlinear, and soft matter

physics, 68(2 Pt 1):021910, 2003.

[65] Arvind R. Subramaniam, Brian M. Zid, and Erin K. O’Shea. An integrated approach reveals

regulatory controls on elongation. Cell, 159(5):1200–1211, 2014.

[66] Joana Pinto Vieira, Julien Racle, and Vassily Hatzimanikatis. Analysis of Translation Elongation

Dynamics in the Context of an Escherichia coli Cell. Biophysical Journal, 110(9):2120–2131, 2016.

[67] Ivan L. Volkov, Martin Lindén, Javier Aguirre Rivera, Ka Weng Ieong, Mikhail Metelev, Johan Elf,

and Magnus Johansson. tRNA tracking for direct measurements of protein synthesis kinetics in live

cells. Nature Chemical Biology, 14(6):618–626, 2018.

[68] Albert Weijland, Kim Harmark, Robbert H. Cool, Pieter H Anborgh, and Andrea Parmeggiani.

Elongation factor Tu: a molecular switch in . Molecular Microbiology, 6(6):683–

688, 1992.

[69] H G Wittmann. Components of Bacterial Ribosomes. Annual Review of Biochemistry, 51(1):155–

183, 1982.

[70] Conghui You, Hiroyuki Okano, Sheng Hui, Zhongge Zhang, Minsu Kim, Carl W. Gunderson,

Yi Ping Wang, Peter Lenz, Dalai Yan, and Terence Hwa. Coordination of bacterial proteome

with metabolism by cyclic AMP signalling. Nature, 500(7462):301–306, 2013.

[71] Andrey V. Zavialov, Vasili V. Hauryliuk, and Måns Ehrenberg. Splitting of the posttermination

ribosome into subunits by the concerted action of RRF and EF-G. Molecular Cell, 18(6):675–686,

2005.

26 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Supplement

S1 Methods details S1 S1.1 Average protein size ⟨ℓ⟩ ...... S1 S1.2 Conversion between concentration and proteome fraction ...... S1

S2 Equality of ribosome flux in steady-state S1

S3 Coarse-grained transition times: ribosome traffic and queuing S2

S4 Protein production flux and growth rate S5

S5 Translation termination S5 S5.1 Omitted molecular details ...... S5 S5.2 Non-diffusion limited regime (one stop codon) ...... S6 S5.3 Full three stop codons model ...... S7 S5.3.1 Steady-state concentrations for RFs ...... S9 S5.3.2 Coarse-grained translation termination time ...... S11 S5.3.3 Optimal abundances for RF1/RF2 ...... S11

S6 Translation elongation S12 S6.1 Coarse-grained one-codon model ...... S12 S6.2 Estimation of coarse-grained rates ...... S14 S6.3 Translation elongation: optimal solutions ...... S16 S6.3.1 Coarse-grained translation elongation time ...... S18 S6.3.2 Optimality conditions for translation elongation factors ...... S19 S6.3.3 Optimal EF-Ts abundance ...... S20 S6.3.4 Optimal EF-G abundance ...... S20 S6.3.5 Optimal EF-Tu and aaRS abundances ...... S22

S7 Translation initiation S24 S7.1 Sub-pathway without subunits joining ...... S26 S7.2 Pathway including subunits joining ...... S30

S8 Summary of optimal solutions S32

S9 Estimation of optimal abundances S33 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

S1 Methods details

S1.1 Average protein size ⟨ℓ⟩

∑︀ −1 ∑︀ We calculate the average protein size weighted by expression as ⟨ℓ⟩ := ( 푖 푒푖) 푖 푒푖ℓ푖, where ℓ푖 is the number of for the protein product of gene 푖, and 푒푖 is the protein synthesis rate (as estimated from ribosome profiling [34, 38]) for gene 푖. For a stable proteome (in fast growing bacteria, the cell doubling time is shorter than the active degradation of most proteins [35]), the protein synthesis rate equals to the proteome mass fraction [38]. Changes in the expression of genes across growth conditions do not lead to substantial changes in ⟨ℓ⟩. In E. coli, across growth conditions spanning ≈ 20 min doubling time to ≈ 120 min, ⟨ℓ⟩ changes by about 20%. Specifically, we find ⟨ℓ⟩ =196 a.a., 210 a.a., and 240 a.a. in respectively MOPS complete (≈ 20 min doubling time [38]), MOPS minimal (≈ 56 min doubling time

[38]), and NQ1390 forced glucose limitation (≈ 120 min doubling time, unpublished), based on ribosome profiling data.

S1.2 Conversion between concentration and proteome fraction

Throughout, we use both units of concentration (molar), denoted as e.g., [퐴] for protein 퐴, and proteome fraction, denoted by 휑퐴 [62]. The correspondence between the two is 휑퐴 = [퐴]ℓ퐴/푃 , where ℓ퐴 is the number of amino acid in protein 퐴, and 푃 is the in-protein amino acid concentration in the cell.

푃 ≈ 2.5 × 106 µM, and has a value approximately independent of growth rate [11, 30]. This change in ^ units also relates to how association constants are defined in units of proteome fraction: 푘표푛[퐴] := 푘표푛휑퐴, where the hat ^· refers to the association constant in usual units of µM−1 s−1 (used to connect to empirical

−1 data). Hence, 푘표푛 := 푘^표푛푃 ℓ is the rescaled association rate in units of proteome fraction.

S2 Equality of ribosome flux in steady-state

In steady-state exponential growth, the ribosome flux in and out of each intermediate state is equal to the total flux. This results from the fact that no ribosome can accumulate in any intermediate state.

푖 Since the flux out of state 푖 is given by 휑푟푖푏표/τ푖, we must have:

휆ℓ 휑 휑푖푛푖 휑푒푙 휑푡푒푟 푟푖푏표 = 푟푖푏표 = 푟푖푏표 = 푟푖푏표 = 푟푖푏표 . (S1) ⟨ℓ⟩ τ푡푟푙 τ푖푛푖 τ푒푙 τ푡푒푟

S1 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

As a consequence, the proportion of ribosome in each state is equal to the proportion of time spent at that given step, for example for translation initiation:

휑푖푛푖 τ 푟푖푏표 = 푖푛푖 . 휑푟푖푏표 τ푖푛푖 + τ푒푙 + τ푡푒푟

S3 Coarse-grained transition times: models of ribosome traffic

Our coarse-grained model of ribosome transitions between categories of initiation, elongation, and ter- mination need to be distinguished from the individual molecular times of the respective steps in one important regard: ribosome traffic on mRNAs can lead to effective delays arising from transient queu- ing. For example, if translation termination is slow and ribosomes start to pile up and form queues upstream of stop codons on mRNAs, the molecular time of termination (time between ribosome arrival to the stop codon and its recycling to the free ribosome pool) will not be a correct reflection of the actual termination time of a ribosome, because of the additional wait time in the queue. A similar argument can be made for transient queuing forming in the body of genes for elongating ribosomes.

We connect these two (molecular and coarse-grained) levels of description by noting that our mass action schemes relating the translation factor abundance to the times of the specific steps can be used as input parameters in traffic models of ribosome movement along mRNAs taking into account possible many-body interactions (e.g., totally asymmetric exclusion processes [27, 64]). Solving these traffic models can then be used to obtain transition times in our coarse-grained translation cycle model. As we show below, corrections arising from transient queuing are small (for endogenous translation factor abundances) based on current estimates the absolute rates of initiation, elongation, and termination, on individual mRNAs, such that stochastic queuing does not play a dominant role in determining optimal translation factor expression levels.

As a first example, we relate the on-stop codon molecular termination time τ푡푒푟, which we obtain from solving our mass action scheme (see equation 6), to the termination time in presence of queuing:

푓푢푙푙 τ푡푒푟 . The difference between the two, as described above, being related to possible queues upstream of stop codons leading to further delays in the process of translation termination, and thus to a longer termination time than that of the molecular on-stop codon termination. The delay factor will be denoted

S2 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

풬 (τ푡푒푟), defined through:

푓푢푙푙 τ푡푒푟 := τ푡푒푟 풬 (τ푡푒푟) .

To derive the expression for the 풬 factor, note that in steady-state, ribosome numbers in a given state is directly proportional to the time to transition out of that state. Let 푚푖 be the mRNA concentration for gene 푖 in the cell, 푛푡푒푟(훼푖, τ푡푒푟) the number of terminating ribosomes (including queues if present) on a transcript with per mRNA translation initiation rate (i.e., translation efficiency [37]) 훼푖, then:

푓푢푙푙 ∑︁ τ푡푒푟 ∝ 푚푖 푛푡푒푟(훼푖, τ푡푒푟), 푖 whereas

∑︁ Ø풬 τ푡푒푟 ∝ 푚푖 푛푡푒푟 (훼푖, τ푡푒푟), 푖

Ø풬 with 푛푡푒푟 (훼푖, τ푡푒푟) the average number of terminating ribosomes on a transcript with translation effi- Ø풬 ciency 훼푖, assuming no queue upstream of the stop codon. Note that 푛푡푒푟(훼푖, τ푡푒푟) ≥ 푛푡푒푟 (훼푖, τ푡푒푟) (the differences being queued ribosomes). Hence, the queuing factor 풬 is:

τ푓푢푙푙 ∑︀ 푚 푛 (훼 , τ ) 풬 (τ ) := 푡푒푟 = 푖 푖 푡푒푟 푖 푡푒푟 . 푡푒푟 τ ∑︀ Ø풬 푡푒푟 푖 푚푖 푛푡푒푟 (훼푖, τ푡푒푟)

Formally, 푛푡푒푟 can be obtained by solving a TASEP model [64], but a simplified queue model [7, 33] disre- garding spatial information recapitulates the statistics of queue formation (as verified by full stochastic simulations, data not shown). The state space of the queue model is the number of ribosomes 푁 in the queue. Ribosomes arrive at a rate 훼 (initiation rate on the transcript), and leave at the molecular

−1 termination rate τ푡푒푟. The ribosome arrival rate at the queue is rigorously correct in steady-state, unless the queue becomes large enough to affect the initiation process (fully jammed transcript), or RNA degra- dation. The stochastic process (away from the jammed state) is then described by: 푁 → 푁 + 1 at rate

−1 훼, and 푁 → 푁 − 1 at rate τ푡푒푟 for 푁 > 0. The probability for the queue to have 푁 ribosomes, 푃 (푁), can be obtained as the steady-state from the resulting master equation, leading to a geometric series:

푁 푃 (푁) = (훼τ푡푒푟) (1 − 훼τ푡푒푟). Hence, the prevalence of higher order queues scales as the ratio of the initiation to termination rate on the transcript. The average queue size, corresponding to 푛푡푒푟(훼푖, τ푡푒푟),

S3 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. is:

⎧ τ 훼 ⎪ 푡푒푟 푖 −1 −1 ⎪ , τ푡푒푟 ≥ 훼푖(1 + ℓ푓표표푡푝푟푖푛푡ℓ푖 ), ⎪1 − τ푡푒푟훼푖 ⎨⎪ 푛푡푒푟(훼푖, τ푡푒푟) ≈ ⎪ ⎪ ⎪ ℓ푖 −1 −1 ⎩⎪ , τ푡푒푟 < 훼푖(1 + ℓ푓표표푡푝푟푖푛푡ℓ푖 ). ℓ푓표표푡푝푟푖푛푡

Above, the solution of the simple model is truncated at the value where the transcript becomes fully jammed with ℓ푖/ℓ푓표표푡푝푟푖푛푡 ribosomes (ℓ푖 and ℓ푓표표푡푝푟푖푛푡 being the size of gene 푖 and the size occupied by a ribosome respectively). The no queue ribosome number is simply equal to a model where queues

Ø풬 with 푁 > 1 do not arise, hence 푛푡푒푟 (훼푖, τ푡푒푟) = 훼푖τ푡푒푟. Therefore, the queuing factor, under the stated assumptions (and assuming no transcript is in the jammed state), is

∑︀ 푚 훼푖 푖 푖 1−τ푡푒푟훼푖 풬 (τ푡푒푟) ≈ ∑︀ . 푖 푚푖훼푖

2 τ푡푒푟⟨훼 ⟩ Expanding for fast termination gives 풬 − 1 = ⟨훼⟩ as the leading order correction, where the averages are weighted by mRNA levels. The above was derived assuming exponentially distributed initiation and termination times, but could be modified to account for more complex dynamics of the initiation and initiation steps.

The queuing factor can be estimated based on absolute measurements of the initiation and termina- tion rates in cells. Kennell and Riezman [28] estimate 3.2 s between initiation events on the lacZ mRNA

(at 48 min per cell doubling). Bremer and Dennis [11] estimate 1 s per ribosome initiation events at 20 min doubling time. Recent calibrated high-throughput measurements report a genome-wide median of

5.6 s per initiation events [19]. To our knowledge, estimation of absolute in vivo termination rates have not been performed, but we can estimate bounds. Indirect assessment based on steady-state protein production measurements place the fraction of actively elongating ribosome at about 95% [13]. Assum- ing (upper bound) that the 5% of non elongating ribosomes are in the process of termination would give a termination time of 5% × 11.1푠 ≈ 0.6 푠 (fraction of ribosomes in a given state equal to the ratio of transition times), where we have used that the elongation time of an average protein is about 11.1 s

(200 푎.푎./18 푎.푎. 푠−1) at fast growth [13]. This upper bound is still much smaller than the reported me- dian initiation time, suggesting that the queuing factor for termination is small. As additional support to the view that translation is far from being termination limited, small that queues at stop codons are

S4 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. only globally observed in ribosome profiling upon severe perturbations [3, 33, 42,58]

With regards to translation elongation, transient queuing in the body of gene can also lead to a difference between molecular and coarse-grained transition times in our model. However, the fraction of ribosomes transiently stalled due to this queuing scales as 훼τ푎푎 in the low density phase (defined

√︀ −1 by requirements 훼τ푡푒푟 < 1 and 훼τ푎푎 < (1 + ℓ푓표표푡푝푟푖푛푡) ≈ 0.25) of the TASEP model [64]. Since measured estimates place 훼τ푎푎 ∼ 0.01 [13, 19], we do not consider the queuing effect for elongating ribosomes within our optimization framework for elongation factor abundances.

S4 Protein production flux and growth rate

In order to write the mass action kinetic scheme for more complex models, it is useful to recast our framework in terms of the protein number production flux 퐽, defined as the number of full length proteins produced per cell volume per unit time. The production of each protein requires a ribosome to go through the full synthesis cycle, and as such 퐽 provides a convenient quantity in mass action schemes formulated in molar units.

In steady-state of exponential growth [13, 47, 62], there is a direct relationship between the growth rate 휆 (defined through d푁/d푡 = 휆푁, where 푁 is the number of cells per unit volume of culture) and the protein production flux 퐽. Explicitly, the protein mass accumulation rate is 휆푀, where 푀 is the total protein mass per unit volume of culture. If 푉 is the mean cell volume, then 휆푀/푉 = 푁푚푎푎⟨ℓ⟩퐽, where

푚푎푎 is the mean amino acid mass. Defining 푃 := 푀/(푚푎푎푁푉 ), the in-protein amino acid concentration per cell (see section S1.2), the connection between protein production flux 퐽 and growth rate 휆 is then

푃 휆 퐽 = ⟨ℓ⟩ . This relationship will be used to convert between molar and proteome fraction in some equations below.

S5 Translation termination

S5.1 Omitted molecular details

The kinetic scheme presented in Fig. 2A does not include some known molecular details of translation termination. For example, GTPase RF3 has been shown to catalyze the release of RF1/RF2 post peptide hydrolysis and to effectively prevent rebinding to empty A site ribosome without peptide [53]. RF3isnot included in our model given our desire for a parsimonious description and due to the absence of identifiable

S5 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. homologs in multiple bacteria (e.g., B. subtilis) [43]. Our scheme aggregates the RF1/RF2 recycling rate with the catalytic rate, and further assume a unidirectional reaction without rebinding (consistent with a lower bound), effectively taking into account the action of RF3. In addition, translocation factor EF-G is known to be implicated in ribosome recycling via translocation post RF4 binding [71]. We assume

EF-G’s abundance requirement towards the function of termination to be a minor fraction of its total requirement (non-sense to sense codons ≈0.5%) and to be non-limiting for this step. We thus coarse- grain EF-G’s role in ribosome recycling through an effective catalytic rate for RF4, see [10] for details of

EF-G’s involvement in ribosome recycling. As another example of simplification in our coarse-graining, we also do not explicitly model RF1/RF2’s post-translational modification by methyltransferase PrmC

[48]. Thus, the activity of the RFs within our description to correspond to the average within a possibly heterogeneous pool of modified and unmodified factors in the cell.

S5.2 Non-diffusion limited regime (one stop codon)

If translation termination is not diffusion limited, terms corresponding to the finite catalytic times must be included in addition to the diffusive contributions in the termination time (equation 6). Under our simplified scheme (Fig. 2A) and with a single stop codons (grouping RF1 and RF2), the molecular termination time is then sum of the four separate times corresponding to distinct events:

1 1 1 1 τ푡푒푟 = + + + 푅퐹 퐼 푓푟푒푒 푘푅퐹 퐼 푅퐹 4 푓푟푒푒 푘푅퐹 4 푘표푛 휑푅퐹 퐼 푐푎푡 푘표푛 휑푅퐹 4 푐푎푡

The two novelties compared to the diffusion-limited regime (equation 6) are: (1) addition of the catalytic

−1 times 푘푐푎푡 for the two steps, and importantly (2) the mass action diffusion terms now involve the free concentration of release factors. Generally, the free concentration of the TFs can be obtained by solving the steady-state solutions of kinetic schemes under constraints imposed by conservation equations. The examples in e.g., sections S5.3, S6.3, and S7.1 below provide the mathematical details associated with the procedure.

Here, the difference between the total and free concentration of release factor arises from the finite catalytic turnover of the enzymes, and corresponds to the concentration of ribosome bound release factors. Given the flux 퐽 through the system in steady-state of growth, the concentration of ribosome

푅퐹 4 ℓ푅퐹 4휆 bound release factor (e.g., for RF4) is 퐽/푘푐푎푡 , which becomes 푅퐹 4 upon converting to proteome ⟨ℓ⟩푘푐푎푡 fraction. This quantity sets the absolute minimum for the release factor abundance necessary to sustain

S6 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

growth 휆 for a given 푘푐푎푡. The free concentrations for the release factors are then:

푓푟푒푒 ℓ푅퐹 퐼 휆 푓푟푒푒 ℓ푅퐹 4휆 휑푅퐹 퐼 = 휑푅퐹 퐼 − 푅퐹 퐼 , 휑푅퐹 4 = 휑푅퐹 4 − 푅퐹 4 . (S2) ⟨ℓ⟩푘푐푎푡 ⟨ℓ⟩푘푐푎푡

Hence, the final solution for the steady-state termination time as a function of the total abundanceof the release factors and growth rate is:

1 1 1 1 τ푡푒푟 = (︁ )︁ + 푅퐹 퐼 + (︁ )︁ + 푅퐹 4 . 푅퐹 퐼 ℓ푅퐹 퐼 휆 푘 푅퐹 4 ℓ푅퐹 4휆 푘 푘표푛 휑푅퐹 퐼 − 푅퐹 퐼 푐푎푡 푘표푛 휑푅퐹 4 − 푅퐹 4 푐푎푡 ⟨ℓ⟩푘푐푎푡 ⟨ℓ⟩푘푐푎푡

The relationship above, between termination time, total TF abundance, and growth rate 휆 closes the solution of the kinetic scheme. Substituting the above in the optimality condition (equation 5) leads to the solution:

√︃ * * √︃ * * * ℓ푟푖푏표휆 ℓ푅퐹 퐼 휆 * ℓ푟푖푏표휆 ℓ푅퐹 4휆 휑푅퐹 퐼 = 푅퐹 퐼 + 푅퐹 퐼 , 휑푅퐹 4 = 푅퐹 4 + 푅퐹 4 . (S3) ⟨ℓ⟩푘표푛 ⟨ℓ⟩푘푐푎푡 ⟨ℓ⟩푘표푛 ⟨ℓ⟩푘푐푎푡

The additional terms ∝ 휆* correspond to the contribution to the optimal abundance arising from the finite catalytic rates, no present in the diffusion limited regime (equation 7).

S5.3 Full three stop codons model

The full model with three different stop codons (UAA, UGA, UAG) and RF1/RF2 with different speci- ficities (RF1: UAA, UAG; RF2: UAA, UGA) can also be solved exactly, leading to a small correction on √︀ √ the summed optimal abundance for RF1 and RF2 of 1 + 2 푓푈퐴퐺푓푈퐺퐴 < 1.05 (fast growing species considered, where 푓푈퐴퐺 and 푓푈퐺퐴 are the fractional fluxes through the RF1 and RF2 stop codons re-

* spectively) compared to the single stop codon optimum derived above (휑푅퐹 퐼 , equation S3). We provide details below. With three stop codons, the coarse-grained reaction scheme is shown in Fig. S1. The relevant chemical species and parameters are listed in Table S1.

S7 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 [DUAG] RF1 kon UAG UAG RF1 +pep kcat [CUAG ]

fUAGJ [RF1]

RF1 kon UAA kRF1 cat RF4 1 [D ] UAA Figure S1: Coarse-grained RF4 RF4 J f J k k UAA on cat translation termination scheme UAA +pep -pep RF4 2 [C ] [E4] with three stop codons and [CUAA ] [DUAA] RF2 RF1/RF2. kcat RF2 kon UAA Ribosome recycling

[RF2]

fUGAJ

RF2 kRF2 kon cat

UGA UGA +pep 2 [CUGA ] [DUGA]

Peptide chain release

Variable Description +푝푒푝 [퐶푈퐴퐴 ] Ribosomes at UAA with peptide chain [휇M] +푝푒푝 [퐶푈퐴퐺 ] Ribosomes at UAG with peptide chain [휇M] +푝푒푝 [퐶푈퐺퐴 ] Ribosomes at UGA with peptide chain [휇M] 1 [퐷푈퐴퐴] Ribosomes at UAA with peptide chain and RF1 bound [휇M] 1 [퐷푈퐴퐺] Ribosomes at UAG with peptide chain and RF1 bound [휇M] 2 [퐷푈퐴퐴] Ribosomes at UAA with peptide chain and RF2 bound [휇M] 2 [퐷푈퐺퐴] Ribosomes at UGA with peptide chain and RF2 bound [휇M] [퐶−푝푒푝] Ribosomes at all stops without peptide chain [휇M] [퐸4] Ribosomes at all stops without peptide chain and RF4 bound [휇M] [푅퐹 1] Free RF1 [휇M] [푅퐹 2] Free RF2 [휇M] [푅퐹 4] Free RF4 [휇M] UAA −1 퐽 = 푓UAA퐽 Ribosome flux through UAA [휇M s ] UAG −1 퐽 = 푓UAG퐽 Ribosome flux through UAG [휇M s ] UGA −1 퐽 = 푓UGA퐽 Ribosome flux through UGA [휇M s ] ^푅퐹 1 −1 −1 푘표푛 On-rate for RF1 [휇M s ] ^푅퐹 2 −1 −1 푘표푛 On-rate for RF2 [휇M s ] ^푅퐹 4 −1 −1 푘표푛 On-rate for RF4 [휇M s ] 푅퐹 1 −1 푘푐푎푡 Catalytic rate for RF1 [s ] 푅퐹 2 −1 푘푐푎푡 Catalytic rate for RF2 [s ] 푅퐹 4 −1 푘푐푎푡 Catalytic rate for RF4 [s ] 푅퐹 1푡표푡 Total RF1 [휇M] 푅퐹 2푡표푡 Total RF2 [휇M] 푅퐹 4푡표푡 Total RF4 [휇M]

Table S1: Chemical species and parameters in three stop codons termination model.

S8 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

The corresponding mass action system of equations for peptide release:

d[퐶+푝푒푝] (︁ )︁ 푈퐴퐴 = 푓 퐽 − [퐶+푝푒푝] 푘^푅퐹 1[푅퐹 1] + 푘^푅퐹 2[푅퐹 1] , dt 푈퐴퐴 푈퐴퐴 표푛 표푛 d[퐶+푝푒푝] 푈퐴퐺 = 푓 퐽 − 푘^푅퐹 1[퐶+푝푒푝][푅퐹 1], dt 푈퐴퐺 표푛 푈퐴퐺 d[퐶+푝푒푝] 푈퐺퐴 = 푓 퐽 − 푘^푅퐹 2[퐶+푝푒푝][푅퐹 1], dt 푈퐺퐴 표푛 푈퐺퐴 d[퐷1 ] 푈퐴퐴 = 푘^푅퐹 1[푅퐹 1][퐶+푝푒푝] − 푘푅퐹 1[퐷1 ], dt 표푛 푈퐴퐴 푐푎푡 푈퐴퐴 d[퐷1 ] 푈퐴퐺 = 푘^푅퐹 1[푅퐹 1][퐶+푝푒푝] − 푘푅퐹 1[퐷1 ], dt 표푛 푈퐴퐺 푐푎푡 푈퐴퐺 d[퐷2 ] 푈퐴퐴 = 푘^푅퐹 2[푅퐹 2][퐶+푝푒푝] − 푘푅퐹 1[퐷2 ], dt 표푛 푈퐴퐴 푐푎푡 푈퐴퐴 d[퐷2 ] 푈퐺퐴 = 푘^푅퐹 2[푅퐹 2][퐶+푝푒푝] − 푘푅퐹 1[퐷2 ], dt 표푛 푈퐺퐴 푐푎푡 푈퐺퐴 d[푅퐹 1] = −푘^푅퐹 1[푅퐹 1] (︀[퐶+푝푒푝] + [퐶+푝푒푝])︀ + 푘푅퐹 1 (︀[퐷1 ] + [퐷1 ])︀ , dt 표푛 푈퐴퐴 푈퐴퐺 푐푎푡 푈퐴퐴 푈퐴퐺 d[푅퐹 2] = −푘^푅퐹 2[푅퐹 2] (︀[퐶+푝푒푝] + [퐶+푝푒푝])︀ + 푘푅퐹 2 (︀[퐷2 ] + [퐷2 ])︀ . dt 표푛 푈퐴퐴 푈퐺퐴 푐푎푡 푈퐴퐴 푈퐺퐴

And for ribosome recycling:

d[퐶−푝푒푝] = 푘푅퐹 1 (︀[퐷1 ] + [퐷1 ])︀ + 푘푅퐹 2 (︀[퐷2 ] + [퐷2 ])︀ − 푘^푅퐹 4[퐶−푝푒푝][푅퐹 4], dt 푐푎푡 푈퐴퐴 푈퐴퐺 푐푎푡 푈퐴퐴 푈퐺퐴 표푛 d[퐸4] = 푘^푅퐹 4[퐶−푝푒푝][푅퐹 4] − 푘푅퐹 4[퐸4], dt 표푛 푐푎푡 d[푅퐹 4] = −푘^푅퐹 4[퐶−푝푒푝][푅퐹 4] + 푘푅퐹 4[퐸4]. dt 표푛 푐푎푡

The conservation equations for RF1, RF2 and RF4 are:

1 1 푅퐹 1푡표푡 = [푅퐹 1] + [퐷푈퐴퐴] + [퐷푈퐴퐺],

2 2 푅퐹 2푡표푡 = [푅퐹 2] + [퐷푈퐴퐴] + [퐷푈퐺퐴],

4 푅퐹 4푡표푡 = [푅퐹 4] + [퐸 ].

With a more complex scheme such as the one above, the optimization problem can be solved in three steps. First, we obtain the steady-state concentration of the chemical species. Second, we determine the effective coarse-grained termination time. Finally, the optimal abundance is found by substituting the termination time in the optimality condition (equation 5), and solving the resulting system of equation.

S5.3.1 Steady-state concentrations for RFs

Note that the RF1/RF2 and RF4 completely decouple, and that the solution for RF4 is identical to the one stop codon case solved above (section S5.2). For peptide chain release, the steady-state of the

S9 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. system can be solved by expressing the all chemical species in terms of [푅퐹 1], and [푅퐹 2]:

푓 퐽 [퐶+푝푒푝] = 푈퐴퐴 (S4) 푈퐴퐴 ^푅퐹 1 ^푅퐹 2 푘표푛 [푅퐹 1] + 푘표푛 [푅퐹 2] (︃ ^푅퐹 1 )︃ 1 퐽 푘표푛 [푅퐹 1] [퐷 ] = 푓푈퐴퐴 , 푈퐴퐴 푅퐹 1 ^푅퐹 1 ^푅퐹 2 푘푐푎푡 푘표푛 [푅퐹 1] + 푘표푛 [푅퐹 2] (︃ ^푅퐹 2 )︃ 2 퐽 푘표푛 [푅퐹 2] [퐷 ] = 푓푈퐴퐴 , 푈퐴퐴 푅퐹 2 ^푅퐹 1 ^푅퐹 2 푘푐푎푡 푘표푛 [푅퐹 1] + 푘표푛 [푅퐹 2]

+푝푒푝 푓푈퐴퐺퐽 +푝푒푝 푓푈퐺퐴퐽 1 퐽 2 퐽 [퐶 ] = , [퐶 ] = , [퐷 ] = 푓푈퐴퐺 , [퐷 ] = 푓푈퐺퐴 . 푈퐴퐺 ^푅퐹 1 푈퐺퐴 ^푅퐹 2 푈퐴퐺 푅퐹 1 푈퐺퐴 푅퐹 2 푘표푛 [푅퐹 1] 푘표푛 [푅퐹 2] 푘푐푎푡 푘푐푎푡

Substituting these in the conservation equations for RF1 and RF2 leads to a closed system in terms of

[푅퐹 1] and [푅퐹 2]:

[︃ (︃ ^푅퐹 1 )︃]︃ 퐽 푘표푛 퐽 푅퐹 1푡표푡 = [푅퐹 1] 1 + 푓푈퐴퐴 + 푓푈퐴퐺 , 푅퐹 1 ^푅퐹 1 ^푅퐹 2 푅퐹 1 푘푐푎푡 푘표푛 [푅퐹 1] + 푘표푛 [푅퐹 2] 푘푐푎푡 [︃ (︃ ^푅퐹 2 )︃]︃ 퐽 푘표푛 퐽 푅퐹 2푡표푡 = [푅퐹 2] 1 + 푓푈퐴퐴 + 푓푈퐺퐴 . 푅퐹 2 ^푅퐹 1 ^푅퐹 2 푅퐹 2 푘푐푎푡 푘표푛 [푅퐹 1] + 푘표푛 [푅퐹 2] 푘푐푎푡

푅퐹 1 푅퐹 2 Under the assumption of identical biochemical properties for RF1 and RF2, namely 푘푐푎푡 = 푘푐푎푡 := 푅퐹 퐼 ^푅퐹 1 ^푅퐹 2 ^푅퐹 퐼 푘푐푎푡 and 푘표푛 = 푘표푛 := 푘표푛 , the total free concentration of RF1 and RF2 simplifies to: [푅퐹 1] + 퐽 [푅퐹 2] = 푅퐹 1푡표푡 + 푅퐹 2푡표푡 − 푅퐹 퐼 , where we used 푓푈퐴퐴 + 푓푈퐴퐺 + 푓푈퐺퐴 = 1 (by definition). Using this 푘푐푎푡 relation to eliminate [푅퐹 2] from the [푅퐹 1] equation (and vice-versa), we obtain, upon conversion to proteome fraction:

푓푟푒푒 ℓ푅퐹 퐼 휆 휑푅퐹,푡표푡 :=휑푅퐹 1 + 휑푅퐹 2 − 푅퐹 퐼 , (S5) ⟨ℓ⟩푘푐푎푡 푓푟푒푒 푓푟푒푒 푓푟푒푒 푓푟푒푒 휑푅퐹 1 =휒푅퐹 1 휑푅퐹,푡표푡, 휑푅퐹 2 = 휒푅퐹 2 휑푅퐹,푡표푡, where

ℓ푅퐹 퐼 휆 휑푅퐹 1 − 푅퐹 퐼 푓UAG ⟨ℓ⟩푘푐푎푡 휒푅퐹 1 := , ℓ푅퐹 퐼 휆 ℓ푅퐹 퐼 휆 (휑푅퐹 1 − 푅퐹 퐼 푓UAG) + (휑푅퐹 2 − 푅퐹 퐼 푓UGA) ⟨ℓ⟩푘푐푎푡 ⟨ℓ⟩푘푐푎푡 ℓ푅퐹 퐼 휆 휑푅퐹 2 − 푅퐹 퐼 푓UGA ⟨ℓ⟩푘푐푎푡 휒푅퐹 2 := . ℓ푅퐹 퐼 휆 ℓ푅퐹 퐼 휆 (휑푅퐹 1 − 푅퐹 퐼 푓UAG) + (휑푅퐹 2 − 푅퐹 퐼 푓UGA) ⟨ℓ⟩푘푐푎푡 ⟨ℓ⟩푘푐푎푡

These constitute the steady-state solutions of the system of equation.

S10 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

S5.3.2 Coarse-grained translation termination time

In order to obtain an expression for the termination time (peptide release portion), needed to determine the optimal RF abundance (i.e., to substitute in equation 5), the peptide chain release contribution arises from the ribosome containing species listed in equation S4, which sum to (under the assumption of identical biochemical properties for RF1/RF2):

푝푒푝 +푝푒푝 +푝푒푝 +푝푒푝 1 1 2 2 [푅푡푒푟 ] = [퐶푈퐴퐴 ] + [퐶푈퐴퐺] + [퐶푈퐺퐴] + [퐷푈퐴퐴] + [퐷푈퐴퐺] + [퐷푈퐴퐴] + [퐷푈퐺퐴], (︃ )︃ 푓 푓 푓 1 [푅푝푒푝] = 퐽 푈퐴퐺 + 푈퐺퐴 + 푈퐴퐴 + . 푡푒푟 ^푅퐹 퐼 ^푅퐹 퐼 ^푅퐹 퐼 푅퐹 퐼 푘표푛 [푅퐹 1] 푘표푛 [푅퐹 2] 푘표푛 ([푅퐹 1] + [푅퐹 2]) 푘푐푎푡

Upon conversion to proteome fraction, the above becomes:

⎛ ⎞ 푝푒푝 ℓ푟푖푏표 푓푈퐴퐺 푓푈퐺퐴 푓푈퐴퐴 1 ℓ푟푖푏표 휑푟푖푏표 = 휆 ⎝ + + (︁ )︁ + ⎠ := 휆τ푝푒푝. ⟨ℓ⟩ 푘푅퐹 퐼 휑푓푟푒푒 푘푅퐹 퐼 휑푓푟푒푒 푅퐹 퐼 푓푟푒푒 푓푟푒푒 푘푅퐹 퐼 ⟨ℓ⟩ 표푛 푅퐹 1 표푛 푅퐹 2 푘표푛 휑푅퐹 1 + 휑푅퐹 2 푐푎푡

The bracketed term corresponds to the coarse-grained time associated with peptide chain release τ푝푒푝, and the free concentrations are given by equations S5.

S5.3.3 Optimal abundances for RF1/RF2

The solved concentrations in steady-state (as a function of proteome fractions) and coarse-grained times allow us to determine the optimal RF1 and RF2 solutions (within our model). The optimality condition

(equation 5) is now:

(︂ )︂* (︂ )︂* 휕τ푝푒푝 ⟨ℓ⟩ 휕τ푝푒푝 ⟨ℓ⟩ = − * , = − * . 휕휑RF1 ℓ푟푖푏표휆 휕휑RF2 ℓ푟푖푏표휆

* * Solving the above system leads to optima 휑푅퐹 1 and 휑푅퐹 2:

√︃ * * * * ℓ푟푖푏표휆 (1 + 훿) ℓ푅퐹 퐼 휆 휑푅퐹 1 + 휑푅퐹 2 = 푅퐹 퐼 + 푅퐹 퐼 , ⟨ℓ⟩푘표푛 ⟨ℓ⟩푘푐푎푡 * * 푓푈퐴퐺ℓ푅퐹 퐼 휆 휑푅퐹 1 − 푅퐹 퐼 √︃ ⟨ℓ⟩푘푐푎푡 푓푈퐴퐺 * = . * 푓푈퐺퐴ℓ푅퐹 퐼 휆 푓푈퐺퐴 휑푅퐹 2 − 푅퐹 퐼 ⟨ℓ⟩푘푐푎푡 √ where the new factor 훿 := 2 푓푈퐴퐺푓푈퐺퐴.

S11 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

The flux through each stop codon can be estimated in a variety of bacteria from ribosome profiling data [34] as the total synthesis fraction of genes with the respective stop codon. Results in Table S2 √ highlight that the three stop codon correction 1 + 훿 to the optimal solution for the summed abundance of RF1 and RF2 is small for fast growing bacteria considered in the current study. √ Species 푓푈퐴퐴 푓푈퐴퐺 푓푈퐺퐴 1 + 훿 E. coli 0.888 0.015 0.097 1.04 B. subtilis 0.888 0.064 0.049 1.05 V. natriegens 0.929 0.041 0.031 1.04 √ √ Table S2: Translation flux through stop codon and estimated correction factor 1 + 훿, with 훿 := 2 푓푈퐴퐺푓푈퐺퐴.

S6 Translation elongation

S6.1 Coarse-grained one-codon model

Translation elongation is a more complicated process than termination, involving multiple factors to bring the charged tRNA to the ribosome (EF-Tu), charge the tRNAs (aaRS), translocate the ribosome

(EF-G), and perform nucleotide exchange on EF-Tu to drive the process (EF-Ts), in addition to others not included here. Our simplified kinetic scheme is illustrated in Fig. S2. In anticipation coarse-graining procedure detailed below, rates rescaled in the conversion to a one-codon model are marked by *.

To simplify our model, we coarse-grain the elongation cycle by considering a single codon type

(section S6.2 below or details of the coarse-graining procedure), effectively grouping the tRNA’s, tRNA synthetases, and different ternary complexes to single entities. Importantly, as a result, the on-rates

−1 associated with these processes are rescaled by a factor close to 푛푎푎 , where 푛푎푎 = 20. An important distinction for elongation compared to initiation and termination is that multiple elongation steps (average ⟨ℓ⟩ ≈ 200 a.a.) are required to generate a protein. Hence, the flux into the through the elongation cycle is ⟨ℓ⟩ larger than that through the initiation and termination steps (there is one initiation and termination event for each protein made, but about 200 elongation steps on average).

S12 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

[tRNA:aaRS]

[aatRNA] aaRS aaRS k kcat on [tRNA] * n1 [aaRS] Ts kcat [Ts] [G] Tu [TuGTP] kTs kon [Tu:Ts] on [TC] [Tu]

TC G G J n2 kcat kon kcat [R ] [R ] * TC [RtRNA] [RG] Ternary complex binding Translocation

Figure S2: Coarse-grained reaction scheme for a single step (amino acid incorporation) of translation elongation. Tu: EF-Tu, Ts: EF-Ts, G: EF-G, aaRS: aminoacyl tRNA synthetases. Steps with slower rates as a result of the coarse-graining to one effective codon are marked by *.

The mass action reaction scheme for translation elongation:

⟨ℓ⟩퐽 −−−→ R∅, (S6)

푘^푎푎푅푆 /푛 tRNA + aaRS −−−−−−−→표푛 1 tRNA:aaRS,

푘푎푎푅푆 tRNA:aaRS −−−−→푐푎푡 aatRNA + aaRS

푘^푇 푠 Tu + Ts −−→표푛 Tu:Ts,

푘푇 푠 Tu:Ts −−−→푐푎푡 TuGTP + Ts,

푘^푇 푢 TuGTP + aatRNA −−−→표푛 TC,

^푇 퐶 푘표푛 /푛2 TC + R∅ −−−−−→ R푇 퐶 ,

푇 퐶 푘푐푎푡 푅푇 퐶 −−−→ R푡푅푁퐴, ^퐺 푘표푛 R푡푅푁퐴 + G −−→ R퐺,

퐺 푘푐푎푡 R퐺 −−−→ G + tRNA.

To arrive at the above, we started with a full model of translation (not shown), will all possible codons, tRNA species, and ribosomes with different codons. To coarse-grain the model, we introduced the following effective variables, which correspond to the total concentration of each type of species involved, summed over the of the codon/amino acid specificity:

∑︁ ∑︁ ∑︁ ∑︁ [tRNA] := [tRNA푖], [aatRNA] := [aatRNA푖], [aaRS] := [aaRS^푖], [TC] := [TC푖] 푖 푖 ^푖 푖

∑︁ 푖 ∑︁ 푖 푇 퐶푗 ∑︁ 푖푗 ∑︁ 푖푗 [R∅] := [R휈휇], [R푇 퐶 ] := [R휈휇 ], [R푡푅푁퐴] := [R휈휇], [R퐺] := [R휈휇 :: 퐺]. 푖,휈,휇 푖,푗,휈,휇 푖,푗,휈,휇 푖,푗,휈,휇

S13 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

In the above, Greek indices correspond to different codons on mRNAs, and Roman indices to different tRNAs. Roman indices with a hat (^푖) correspond to tRNA synthetases recognizing specific tRNAs

(multiple amino acids have more than one tRNA isoacceptor). In defining these coarse-grained species

(our approach is analogous to that of [13]), we redefined the two following kinetic parameters:

^푎푎푅푆 ^푇 퐶 푖 푘 ∑︁ [tRNA푖][aaRS^] 푘 ∑︁ [R휇휈]푆휈,푗[TC푗] 표푛 := 푘^푎푎푅푆 푖 , and 표푛 := 푘^푇 퐶 . (S7) 푛 표푛 [tRNA][aaRS] 푛 표푛 [R ][TC] 1 푖 2 휇,휈,푖,푗 ∅

^푎푎푅푆 ^푇 퐶 푘표푛 and 푘표푛 correspond to the microscopic bimolecular rates (assumed equal for the different chemical species). 푆휈,푗 is the tRNA isoacceptor/codon specificity matrix (1 if tRNA 푖 can recognize codon 휈, 0 otherwise) [9]. Rescaling terms 푛1 and 푛2 are estimated below.

S6.2 Estimation of coarse-grained rates

The definition of coarse-grained parameters (equations S7) involves sums:

푖 1 ∑︁ [tRNA푖][aaRS^] 1 ∑︁ [R휇휈]푆휈,푗[TC푗] := 푖 and := . 푛 [tRNA][aaRS] 푛 [R ][TC] 1 푖 2 휇,휈,푖,푗 ∅

These can be estimated from tRNA abundances, codon usage and individual synthetases’ levels obtained from ribosome profiling data in E. coli [38].

We first consider 푛1. Note that the fraction of free tRNA of type 푖 to the total number of free tRNA (not bound to any protein) is not readily measurable. Assuming similarities between types of tRNA’s, we approximate this fraction with the fraction of total tRNA of type 푖 to the total tRNA concentration, or

[tRNA ] tRNA푡표푡 푖 ≈ 푖 . [tRNA] tRNA푡표푡

The total tRNA concentration has been measured at fast growth for E. coli [15]. The relative concentra- tion of each tRNA synthetases (appropriately corrected for stoichiometry for the different classes) can be computed from the ribosome profiling data [38], and we obtain

(︂ )︂ (︂ 푡표푡 )︂ 1 ∑︁ [tRNA푖] [aaRS^] ∑︁ tRNA [aaRS^] := 푖 ≈ 푖 푖 ≈ 0.057 ⇒ 푛 ≈ 17.5 푛 [tRNA] [aaRS] tRNA [aaRS] 1 1 푖 푖 푡표푡

This was to be expected since the synthetases in E. coli show little variability around their mean, and

S14 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

in the case of equal synthetase concentration, 푛1 = 20 would strictly hold.

For the second sum (푛2), we use distribution of ribosome footprint reads across the transcriptome to estimate ribosome occupancies at different codons. We first make the following approximation for one of the sub-sum:

푖 퐹 푃 ∑︁ [R휇휈] ∑︁ 푁휇휈 ≈ , [R ] 푁 퐹 푃 휇,푖 ∅ 휇 푡표푡

퐹 푃 퐹 푃 where 푁휇휈 is the total number of ribosome footprint reads at codon pairs 휇, 휈 and 푁푡표푡 is the total number of footprint reads mapping to coding sequences. The nature of the approximation is that we are taking relative fraction of ribosome footprints (representing ribosomes across the elongation cycle at that codon pair) at a given codon pair to be equal to the relative fraction of ribosomes waiting for the ternary complex to derliver a tRNA to the A site. The modest differences in elongation rates at different codons seen in ribosome profiling data [46] justify this approximation.

From our data (not shown), we have that

퐹 푃 퐹 푃 퐹 푃 ∑︁ 푁휇휈 ∑︁ 푁휈휇 푁휈 퐹 푃 ≈ 퐹 푃 = 퐹 푃 := 푓휈 휇 푁푡표푡 휇 푁푡표푡 푁푡표푡 holds to better than 0.5% for each codon. 푓휈 above is the (expression weighted) codon usage. As before with the free tRNA concentrations, we can approximate the relative ternary complexes concentrations by the corresponding total tRNA concentrations:

푖 푡표푡 1 ∑︁ [R휇휈]푆휈,푗[TC푗] ∑︁ 푓휈 푆휈,푗 tRNA푗 := ≈ ≈ 0.048 ⇒ 푛 ≈ 20.8 (S8) 푛 [R][TC] tRNA 2 2 휇,휈,푖,푗 휈,푗 푡표푡

We used the same dataset as before for the total tRNA concentration in E. coli [15]. The codon usage was determined directly from ribosome profiling data [38]. The sum of these products is graphically represented in Fig. S3. The above sum of product of tRNA fraction and codon usage provides an effective number of different ternary complexes. A priori, that might have been expected to equal to the number of tRNAs (≈40). However, as is apparent in Fig. S3, certain tRNA-codon pairs are much more prevalent than others (even for amino acid with multiple codons and/or tRNA isoacceptors), which leads to a decrease in the effective concentration. The exact value depends on the detailed codon usageand tRNA abundance.

S15 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure S3: Graphical illustration of the sum (equation S8). Left: codon us- age (vertical, from analysis of ribosome profiling data from [38]), tRNA-codon specificity (matrix, from [9], with differ- ent amino acids outlined with different colors), and tRNA abundance (horizon- tal, from [15]) organized by amino acid. Right: product matrix.

Given the results above, we take for simplicity 푛1 = 푛2 = 푛푎푎 = 20.

S6.3 Translation elongation: optimal solutions

The mass action reactions corresponding to the one codon elongation cycle model are (equations S6):

^푇 퐶 d[R∅] 푘표푛 = ⟨ℓ⟩퐽 − [TC][R∅], dt 푛푎푎 ^푇 퐶 d[R푇 퐶 ] 푘표푛 푇 퐶 = [TC][R∅] − 푘푐푎푡 [R푇 퐶 ], dt 푛푎푎 d[Tu] = 푘푇 퐶 [R ] − 푘^푇 푠[Tu][Ts], dt 푐푎푡 푇 퐶 표푛 ^푎푎푅푆 d[tRNA] 푘표푛 퐺 = − [tRNA][aaRS] + 푘푐푎푡[R퐺], dt 푛푎푎 ^푎푎푅푆 d[tRNA::aaRS] 푘표푛 푎푎푅푆 d[aaRS] = [tRNA][aaRS] − 푘푐푎푡 [tRNA::aaRS]= − , dt 푛푎푎 dt d[aatRNA] = 푘푎푎푅푆 [tRNA::aaRS] − 푘^푇 푢[aatRNA][TuGTP], dt 푐푎푡 표푛 d[TuGTP] = 푘푇 푠 [Tu:Ts] − 푘^푇 푢[aatRNA][TuGTP], dt 푐푎푡 표푛 d[Tu:Ts] d[Ts] = −푘푇 푠 [Tu:Ts] + 푘^푇 푠[Tu][Ts] = − , dt 푐푎푡 표푛 dt ^푇 퐶 d[TC] ^푇 푢 GTP 푘표푛 = 푘표푛 [aatRNA][Tu ] − [TC][R∅], dt 푛푎푎 d[R ] 푡푅푁퐴 = 푘푇 퐶 [R ] − 푘^퐺 [R ][G], dt 푐푎푡 푇 퐶 표푛 푡푅푁퐴 d[R ] d[G] 퐺 = 푘^퐺 [R ][G] − 푘퐺 [R ] = − . dt 표푛 푡푅푁퐴 푐푎푡 퐺 dt

S16 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Conservation equations close the system:

Ts푡표푡 = [Ts] + [Tu:Ts],

GTP Tu푡표푡 = [Tu]+[Tu ] + [Tu:Ts] + [TC]+[R푇 퐶 ],

tRNA푡표푡 = [R∅] + 2[R푇 퐶 ] + 2[R푡푅푁퐴] + 2[R퐺] + [tRNA] + [tRNA:aaRS] + [aatRNA] + [TC],

aaRS푡표푡 = [tRNA:aaRS] + [aaRS],

G푡표푡 = [G]+[R퐺].

The ternary complex concentration and free EF-G concentration enter the translation elongation time

(equation 10, which is the diffusion limited and factor dependent contribution to the elongation time) and are required to infer optimal abundances of elongation factors. Both can to be obtained by solving the system of non-linear equations above.

First, catalytic steps must equal to the flux through in the system in steady-state and thus:

⟨ℓ⟩퐽 ⟨ℓ⟩퐽 ⟨ℓ⟩퐽 ⟨ℓ⟩퐽 [R퐺] = 퐺 , [R푇 퐶 ] = 푇 퐶 , [tRNA::aaRS]= 푎푎푅푆 , [Tu::Ts] = 푇 푠 . 푘푐푎푡 푘푐푎푡 푘푐푎푡 푘푐푎푡

Together with the conservation equations, these allow for immediate solutions for the free concentrations

[Ts], [aaRS], and [G]:

⟨ℓ⟩퐽 [Ts]=Ts푡표푡 − 푇 푠 , 푘푐푎푡 ⟨ℓ⟩퐽 [aaRS]=aaRS푡표푡 − 푎푎푅푆 , 푘푐푎푡 ⟨ℓ⟩퐽 [G] =G푡표푡 − 퐺 . 푘푐푎푡

S17 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

The solution for other species can then also be obtained in terms [TuGTP], and [TC]:

⟨ℓ⟩퐽 ⟨ℓ⟩푛푎푎퐽 [R푡푅푁퐴] = (︁ )︁, [R∅] = ^퐺 ⟨ℓ⟩퐽 푘^푇 퐶 [TC] 푘표푛 G푡표푡 − 퐺 표푛 푘푐푎푡 ⟨ℓ⟩푛푎푎퐽 ⟨ℓ⟩퐽 [tRNA]= (︁ )︁, [aatRNA]= , ^푎푎푅푆 ⟨ℓ⟩퐽 푘^푇 푢[TuGTP] 푘표푛 aaRS푡표푡 − 푎푎푅푆 표푛 푘푐푎푡 ⟨ℓ⟩퐽 [Tu] = (︁ )︁. ^푇 푠 ⟨ℓ⟩퐽 푘표푛 Ts푡표푡 − 푇 푠 푘푐푎푡

Substituting these in the conservation equations for tRNAs and EF-Tu lead to the final system to solve (converting to proteome fraction):

tRNA푡표푡 휆푛푎푎 2휆 2휆 2휆 = 푇 퐶 + 푇 퐶 + (︁ )︁ + 퐺 + ... (S9) 푃 푘표푛 휑푇 퐶 푘 퐺 ℓ퐺휆 푘 푐푎푡 푘표푛 휑퐺 − 퐺 푐푎푡 푘푐푎푡 휆푛푎푎 휆 휆 휑푇 퐶 (︁ )︁ + 푎푎푅푆 + 푇 푢 + , 푎푎푅푆 ℓ푎푎푅푆 휆 푘 푘표푛 휑푇 푢GTP ℓ푇 푢 푘표푛 휑푎푎푅푆 − 푎푎푅푆 푐푎푡 푘푐푎푡 ℓ푇 푢휆 ℓ푇 푢휆 ℓ푇 푢휆 where 휑푇 푢GTP :=휑푇 푢 − (︁ )︁ − 푇 푠 − 휑푇 퐶 − 푇 퐶 . (S10) 푇 푠 ℓ푇 푠휆 푘 푘 푘표푛 휑푇 푠 − 푇 푠 푐푎푡 푐푎푡 푘푐푎푡 where the solution for 휑푇 푢GTP in terms of the ternary concentration was obtained from the conservation equation for EF-Tu. Equations S9 and S10 are closed, and the only variables to solve for is 휑푇 퐶 in terms of the TF abundances: 휑푇 푢, 휑푇 푠, 휑퐺, 휑푎푎푅푆, tRNA abundances, kinetic parameters, and the growth rate 휆.

S6.3.1 Coarse-grained translation elongation time

In order to obtain the coarse-grained translation elongation time, we proceed as for translation termi- nation (section S5.3.2). The summed concentration of the ribosome containing species for translation elongation in our model is:

[R푒푙] =[R∅] + [R푇 퐶 ] + [R푡푅푁퐴] + [R퐺],

⟨ℓ⟩푛푎푎퐽 ⟨ℓ⟩퐽 ⟨ℓ⟩퐽 ⟨ℓ⟩퐽 = + + (︁ )︁ + . 푘^푇 퐶 [TC] 푘푇 퐶 ^퐺 ⟨ℓ⟩퐽 푘퐺 표푛 푐푎푡 푘표푛 G푡표푡 − 퐺 푐푎푡 푘푐푎푡

S18 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Converting to proteome fraction:

⎛ ⎞ 1 푒푙 푛푎푎 1 1 1 휑푟푖푏표 =휆 ⎝ 푇 퐶 + 푇 퐶 + (︁ )︁ + 퐺 ⎠ . ℓ푟푖푏표 푘표푛 휑푇 퐶 푘 퐺 ℓ퐺휆 푘 푐푎푡 푘표푛 휑퐺 − 퐺 푐푎푡 푘푐푎푡

From the coarse-grained flux relations through the different categories (equation S1), which defines the coarse-grained transition times, we thus have:

푛푎푎 1 1 1 τ푒푙 = ⟨ℓ⟩ (τ푎푎 + τ푖푛푑) , where τ푎푎 = 푇 퐶 + 푇 퐶 + (︁ )︁ + 퐺 . (S11) 푘표푛 휑푇 퐶 푘 퐺 ℓ퐺휆 푘 푐푎푡 푘표푛 휑퐺 − 퐺 푐푎푡 푘푐푎푡

Above, τ푎푎 is the effective time for a single step (1 a.a.) of translation elongation, and τ푖푛푑 corresponds to the summed time of factor independent transitions in each elongation step (not explicitly included in the kinetic scheme).

S6.3.2 Optimality conditions for translation elongation factors

The optimality condition (equation 5) applied to translation elongation factors leads to:

(︂ )︂* (︂ )︂* (︂ )︂* (︂ )︂* 휕τ푡푎푎 휕τ푡푎푎 휕τ푡푎푎 휕τ푡푎푎 1 = = = = − * . (S12) 휕휑퐺 휕휑푇 푢 휕휑푇 푠 휕휑푎푎푅푆 ℓ푟푖푏표휆

where equation S11 was used for τ푎푎. Since the free EF-G concentration does not depend on EF-Tu, EF-Ts, or aaRS concentration, the conditions for EF-Tu, EF-Ts and aaRS simplify to:

(︂ )︂* (︂ )︂* (︂ )︂* 휕 푛푎푎 휕 푛푎푎 휕 푛푎푎 1 푇 퐶 = 푇 퐶 = 푇 퐶 = − * . (S13) 휕휑푇 푢 푘표푛 휑푇 퐶 휕휑푇 푠 푘표푛 휑푇 퐶 휕휑푎푎푅푆 푘표푛 휑푇 퐶 ℓ푟푖푏표휆

Carrying through the differentiation also leads to conditions on the derivatives of the ternary complex concentration at the optimum:

(︂ )︂* (︂ )︂* (︂ )︂* 푇 퐶 * 2 휕휑푇 퐶 휕휑푇 퐶 휕휑푇 퐶 푘표푛 (휑푇 퐶 ) = = = * . (S14) 휕휑푇 푢 휕휑푇 푠 휕휑푎푎푅푆 ℓ푟푖푏표푛푎푎휆

These relationships will be useful to solve for the some elongation factor optimal abundances below.

S19 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

S6.3.3 Optimal EF-Ts abundance

Differentiating equation S9 with respect to 휑푇 푢 and 휑푇 푠, we get at the optimum:

* (︂ )︂* (︂ )︂* 1 휆 휕휑푇 푢GTP 1 휕휑푇 퐶 + 2 = , ℓ푟푖푏표 푇 푢 (︀ * )︀ 휕휑푇 푢 ℓ푇 푢 휕휑푇 푢 푘표푛 휑푇 푢GTP * (︂ )︂* (︂ )︂* 1 휆 휕휑푇 푢GTP 1 휕휑푇 퐶 + 2 = . ℓ푟푖푏표 푇 푢 (︀ * )︀ 휕휑푇 푠 ℓ푇 푢 휕휑푇 푠 푘표푛 휑푇 푢GTP

By equation S14, the above leads to the additional condition at the optimum:

(︂ )︂* (︂ )︂* 휕휑 GTP 휕휑 GTP 푇 푢 = 푇 푢 . 휕휑푇 푢 휕휑푇 푠

Directly differentiating equation S10, and using equation S14, leads to:

(︂ )︂* 푇 퐶 * 2 (︂ )︂* * 푇 퐶 * 2 휕휑 GTP 푘 (휑 ) 휕휑 GTP ℓ 휆 푘 (휑 ) 푇 푢 = 1 − 표푛 푇 퐶 = 푇 푢 = 푇 푢 − 표푛 푇 퐶 . * (︁ )︁2 * 휕휑푇 푢 ℓ푟푖푏표푛푎푎휆 휕휑푇 푠 푇 푠 * ℓ푇 푠휆 ℓ푟푖푏표푛푎푎휆 푘표푛 휑푇 푠 − 푇 푠 푘푐푎푡

Therefore, the optimal abundance for EF-Ts is:

√︃ * * * ℓ푇 푢휆 ℓ푇 푠휆 휑푇 푠 = 푇 푠 + 푇 푠 . (S15) 푘표푛 푘푐푎푡

S6.3.4 Optimal EF-G abundance

The optimality condition for EF-G is complicated by the fact that EF-G free concentration appears in the solution for the steady-state ternary complex through the tRNA conservation equation S9. Differ- entiating the conservation tRNA equation, and using the optimality condition S12 (replacing a number of terms with the elongation time τ푎푎, equation S11):

* (︂ )︂* (︂ )︂* * (︂ )︂* 2 휆 푛푎푎 휕휑푇 퐶 1 휕휑푇 퐶 휆 휕휑푇 푢GTP 0 = − + 2 + − 2 . (S16) ℓ푟푖푏표 푇 퐶 (︀ * )︀ 휕휑퐺 ℓ푇 푢 휕휑퐺 푇 푢 (︀ * )︀ 휕휑퐺 푘표푛 휑푇 푢 푘표푛 휑푇 푢GTP

Above, the right-hand portion corresponds to the additional constraint coming from the implication of

EF-G in the steady-state concentration of the ternary complex. From the equation for 휑푇 푢GTP (equa-

S20 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. tion S10), we have directly:

(︂ )︂* (︂ )︂* 휕휑 GTP 휕휑 푇 푢 = − 푇 퐶 . 휕휑퐺 휕휑퐺

Substituting this in equation S16:

[︃ * * ]︃ (︂ )︂* 2 1 휆 휆 푛푎푎 휕휑푇 퐶 = + 2 + 2 . (S17) ℓ푟푖푏표 ℓ푇 푢 푇 푢 (︀ * )︀ 푇 퐶 (︀ * )︀ 휕휑퐺 푘표푛 휑푇 푢GTP 푘표푛 휑푇 퐶

The derivative of the ternary complex with respect to EF-G at the optimum can be obtained from the original optimality condition S12, by carrying through the differentiation:

⎡ ⎤ (︂ )︂* 푇 퐶 휕휑푇 퐶 푘표푛 * 2 1 1 = (휑푇 퐶 ) ⎢ − ⎥ . ⎣ * (︁ * )︁2 ⎦ 휕휑퐺 푛푎푎 ℓ푟푖푏표휆 퐺 * ℓ퐺휆 푘표푛 휑퐺 − 퐺 푘푐푎푡

Substituting in equation S17, we arrive at a final equation for EF-G in terms of the concentration of other elongation factor and the optimal growth rate:

⎛ ⎞ [︃ ]︃ 2 푘푇 퐶 (휑* )2 푘푇 퐶 (휑* )2 1 1 * 표푛 푇 퐶 표푛 푇 퐶 ⎜ ⎟ = 휆 1 + * + 2 * − 2 . ℓ 푛푎푎ℓ푇 푢휆 푇 푢 (︀ * )︀ ⎝ℓ 휆 (︁ ℓ 휆* )︁ ⎠ 푟푖푏표 푛푎푎푘표푛 휑 GTP 푟푖푏표 퐺 * 퐺 푇 푢 푘표푛 휑퐺 − 퐺 푘푐푎푡

The optimal solution for EF-G is thus:

√︃ * (︂ )︂ * √︃ * * * ℓ푟푖푏표휆 Δ + 1 ℓ퐺휆 ℓ푟푖푏표휆 ℓ퐺휆 휑퐺 = 퐺 + 퐺 ≥ 퐺 + 퐺 , (S18) 푘표푛 Δ − 1 푘푐푎푡 푘표푛 푘푐푎푡 푇 퐶 * 2 푇 퐶 * 2 푘표푛 (휑푇 퐶 ) 푘표푛 (휑푇 퐶 ) where: Δ := * + 2 . 푛푎푎ℓ푇 푢휆 푇 푢 (︀ * )︀ 푛푎푎푘표푛 휑푇 푢GTP

* * Note that given that the term Δ involves 휑푇 퐶 and 휑푇 푢GTP , and so the solution above is not a priori complete. However, using the approximate ternary complex concentration at the optimum (equation 12, derived in details in section S6.3.5), we have:

푇 퐶 * 2 푘표푛 (휑푇 퐶 ) ℓ푟푖푏표 Δ > * ≈ ≈ 18.5 ≫ 1 푛푎푎ℓ푇 푢휆 ℓ푇 푢

* This means that the lower bound for 휑퐺 above (equation S18) is a good approximation: in the physiolog- ical regime, we can approximately neglect the indirect dependence of the ternary complex concentration

S21 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. on EF-G via the tRNA conservation equation. Hence, the approximate solution for the EF-G optimal abundance is (same for had we initially assumed that 휑푇 퐶 was independent of 휑퐺, in which case the solution for EF-G can be obtained identically as that of release factors):

√︃ * * * ℓ푟푖푏표휆 ℓ퐺휆 휑퐺 ≈ 퐺 + 퐺 . 푘표푛 푘푐푎푡

S6.3.5 Optimal EF-Tu and aaRS abundances

While simplifying relations were possible with EF-Ts and EF-G, allowing their solution (approximately) independently from the rest of the cycle, EF-Tu and aaRS are intricately connected through the tRNA cycle. We thus return to the tRNA conservation equation, equation S9. For notational simplicity, we group the catalytic step of the TC, EF-G binding, and EF-G catalytic action (translocation) in parameter

푚푎푥 푘푒푙 (these do not depend on 휑푇 푢 and 휑푎푎푅푆). Further dropping the EF-Ts related and catalytic terms (will be added back at the end, they only contribute a fixed term at the optimum) in the equation for the free EF-Tu, we get:

tRNA푡표푡 푛푎푎 2 = 푇 퐶 + 푚푎푥 + ... (S19) 푃 휆 푘표푛 휑푇 퐶 푘푒푙 푛푎푎 1 1 휑푇 퐶 (︁ )︁ + 푎푎푅푆 + 푇 푢 + , 푎푎푅푆 ℓ푎푎푅푆 휆 푘 푘표푛 휑푇 푢GTP ℓ푇 푢휆 푘표푛 휑푎푎푅푆 − 푎푎푅푆 푐푎푡 푘푐푎푡 where 휑푇 푢GTP = 휑푇 푢 − 휑푇 퐶 is the free EF-Tu concentration.

This system is first solved numerically (Fig. 3B). To close the equation in terms of uniquely 휑푇 퐶 , we use our relationship for 휆 (equation 1), with:

(︂ )︂ 푛푎푎 1 τ푡푟푙 = ⟨ℓ⟩ 푇 퐶 + 푚푎푥 + τ푖푛푖 + τ푡푒푟, 푘표푛 휑푇 퐶 푘푒푙

푚푎푥 where 푘푒푙 is the maximum rate of translation elongation (contribution other than ternary complex −1 diffusion) estimated from in vivo kinetic measurements (≈ 22 s [13]), and τ푖푛푖 + τ푡푒푟 ≈ 0.5 s the estimated time for the initiation and termination step (≈ 5−10% of the full translation cycle translation time), taken as fixed parameters here. Using this relationship for the translation time leads to theexplicit

S22 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. relationship between growth and ternary complex concentration:

(︂ )︂ 푚푎푥 휑푟푖푏표 푘푡푟푙휑푇 퐶 ⟨ℓ⟩푘푒푙 푘푡푟푙푛푎푎 휆(휑푇 퐶 ) = , with 푘푡푟푙 := 푚푎푥 and 퐾푇 퐶 := 푇 퐶 (S20) ℓ푟푖푏표 휑푇 퐶 + 퐾푇 퐶 ⟨ℓ⟩ + 푘푒푙 (τ푖푛푖 + τ푡푒푟) 푘표푛 which is the same relationship as the one derived in [30], with the addition of the terms corresponding to the rest translation cycle. Substituting the explicit relationship between growth and ternary complex concentration above (equation S20) in the aaRS/EF-Tu tRNA cycle relationship (equation S19) closes the system for 휑푇 퐶 . Numerical solution for this equation is presented in Fig. 3B (see section S9 for other parameters).

The main conclusion from numerically solving the reduced system (equations S19 and S20) is that the EF-Tu/aaRS space is partitioned in two regimes, resulting from the separation of scale of reactions

푇 퐶 in the coarse-grained model. Specifically, 푘푇 푢 ≫ 푘표푛 , so that any imbalance between the constituents 표푛 푛푎푎 of the ternary complex (charged tRNAs, free EF-Tu), results in stoichiometric unproductive excess of the component in surplus.

We can derive a relation for the "transition line" in the aaRS/EF-Tu space where both free charged tRNAs and free EF-Tu are at low concentrations. This corresponds to setting the (formally impossible)

1 requirement 휑푇 푢GTP ≈ 0 ⇒ 휑푇 퐶 ≈ 휑푇 푢 and [aatRNA] ∝ 푇 푢 ≈ 0, i.e., 푘표푛 휑푇 푢GTP

tRNA 푛 2 휑¯ 푛 1 푡표푡 − 푎푎 − − 푇 푢 = 푎푎 + . (S21) (︀ ¯ )︀ 푇 퐶 ¯ 푚푎푥 (︀ ¯ )︀ (︂ ¯ )︂ 푎푎푅푆 푃 휆 휑푇 푢 푘표푛 휑푇 푢 푘푒푙 ℓ푇 푢휆 휑푇 푢 푎푎푅푆 ¯ ℓ푎푎푅푆 휆(휑푇 푢) 푘푐푎푡 푘표푛 휑푎푎푅푆 − 푎푎푅푆 푘푐푎푡

¯ ¯ The ¯· signifies the transition line relationship between 휑푇 푢 and 휑푎푎푅푆, which is displayed in Fig. 3B. The heuristic to estimate the optimal EF-Tu concentration described in the main text can be extended to include the EF-Ts cycle. In particular, in the EF-Tu limited regime, with 휑푇 푢퐺푇 푃 ≈ 0, we have (from equation S10):

ℓ푇 푢휆 ℓ푇 푢휆 ℓ푇 푢휆 휑푇 퐶 ≈ 휑푇 푢 − (︁ )︁ − 푇 푠 − 푇 퐶 . 푇 푠 ℓ푇 푠휆 푘 푘 푘표푛 휑푇 푠 − 푇 푠 푐푎푡 푐푎푡 푘푐푎푡

Substituting the above expression for 휑푇 퐶 in the optimality condition (equation S13) for 휑푇 푢, we arrive

S23 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. at (using the optimal solution for EF-Ts, equation S15):

√︃ * √︃ * * * * ℓ푟푖푏표푛푎푎휆 ℓ푇 푢휆 ℓ푇 푢휆 ℓ푇 푢휆 휑푇 푢 ≈ 푇 퐶 + 푇 푠 + 푇 푠 + 푇 퐶 . 푘표푛 푘표푛 푘푐푎푡 푘푐푎푡

Above, the last three terms (not appearing in equation 12) correspond to the additional diffusion of the

EF-Ts cycle, and catalytic contributions.

Following the argument (see main text) that the optimal aaRS abundance should lie on the transition line (equation S21), we obtain:

* * 푛푎푎 ℓ푎푎푅푆휆 휑푎푎푅푆 ≈ 푎푎푅푆 + 푎푎푅푆 , 푘표푛 Δ푡 푘푐푎푡 with Δ푡 related to the excess tRNA:

* √︃ * tRNA푡표푡 푛푎푎 2 휑푇 퐶 1 * 푛푎푎ℓ푟푖푏표휆 Δ푡 := * − 푇 퐶 * − 푚푎푥 − * − 푎푎푅푆 , where 휑푇 퐶 = 푇 퐶 . 푃 휆 푘표푛 휑푇 퐶 푘푒푙 ℓ푇 푢휆 푘푐푎푡 푘표푛

S7 Translation initiation

Translation initiation is also relatively complex compared to translation termination. In contrast with other steps of the translation cycle, binding of factors necessary for the process (IF1, IF2, IF3, initiator tRNA) do not occur in a strict sequential order, leading to a "heterogeneous assembly landscape" [12, 20] more complex to model. However, one assembly pathway is kinetically favored [45]. We take this favored assembly pathway as our kinetic scheme (Fig. S4, note that binding of tRNA/mRNA are coarse-grained to a single even without loss of generality). We provide some evidence below that taking a more complex assembly pathway would minimally affect the predicted optimal initiation factor abundances.

S24 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure S4: Simplified kinetic scheme for translation initiation. Reactions in dashed box correspond to sub-system solved in detail first (section S7.1). Variables are labeled on the scheme.

The reactions in our simplified schemes are:

퐽 −→ 푅30푆 + 푅50푆 ,

^퐼퐹 3 푘표푛 푅30푆 + 퐼퐹 3 −−−→ 푅3,

^퐼퐹 2 푘표푛 푅30푆 + 퐼퐹 2 −−−→ 푅2,

^퐼퐹 2 푘표푛 푅3 + 퐼퐹 2 −−−→ 푅23,

^퐼퐹 3 푘표푛 푅2 + 퐼퐹 3 −−−→ 푅23,

^퐼퐹 1 푘표푛 푅23 + 퐼퐹 1 −−−→ 푅123,

푘푅푁퐴 푅123 −−−−→ 푅123푚,

^50푆 푘표푛 푅123푚 + 푅50푆 −−−→ 푅푃 퐼퐶 ,

푖푛푖 푘푐푎푡 푅푃 퐼퐶 −−−→ 퐼퐹 1 + 퐼퐹 2 + 퐼퐹 3,

S25 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

with corresponding mass action equations:

d[푅 ] 30푆 = 퐽 − 푘^퐼퐹 2[푅 ][퐼퐹 2] − 푘^퐼퐹 3[푅 ][퐼퐹 3], dt 표푛 30푆 표푛 30푆 d[푅 ] 2 = 푘^퐼퐹 2[푅 ][퐼퐹 2] − 푘^퐼퐹 3[푅 ][퐼퐹 3], dt 표푛 30푆 표푛 2 d[푅 ] 3 = 푘^퐼퐹 3[푅 ][퐼퐹 3] − 푘^퐼퐹 2[푅 ][퐼퐹 2], dt 표푛 30푆 표푛 3 d[푅 ] 23 = 푘^퐼퐹 2[푅 ][퐼퐹 2] + 푘^퐼퐹 3[푅 ][퐼퐹 3] − 푘^퐼퐹 1[푅 ][퐼퐹 1], dt 표푛 3 표푛 2 표푛 23 d[푅 ] 123 = 푘^퐼퐹 1[푅 ][퐼퐹 1] − 푘 [푅 ], dt 표푛 23 푅푁퐴 123 d[푅 ] 123푚 = 푘 [푅 ] − 푘^50푆 [푅 ][푅 ], dt 푅푁퐴 123 표푛 123푚 50푆 d[푅 ] 푃 퐼퐶 = 푘^50푆 [푅 ][푅 ] − 푘푖푛푖[푅 ], dt 표푛 123푚 50푆 푐푎푡 푃 퐼퐶 d[푅 ] 50푆 = 퐽 − 푘^50푆 [푅 ][푅 ], dt 표푛 123푚 50푆 d[퐼퐹 1] = −푘^퐼퐹 1[푅 ][퐼퐹 1] + 푘푖푛푖[푃 퐼퐶], dt 표푛 23 푐푎푡 d[퐼퐹 2] = −푘^퐼퐹 2 ([푅 ] + [푅 ]) [퐼퐹 2] + 푘푖푛푖[푃 퐼퐶], dt 표푛 30푆 3 푐푎푡 d[퐼퐹 3] = −푘^퐼퐹 3 ([푅 ] + [푅 ]) [퐼퐹 3] + 푘푖푛푖[푃 퐼퐶], dt 표푛 30푆 2 푐푎푡

and conservation equations:

퐼퐹 1푡표푡 = [퐼퐹 1] + [푅123] + [푅123푚] + [푅푃 퐼퐶 ],

퐼퐹 2푡표푡 = [퐼퐹 2] + [푅2] + [푅23] + [푅123] + [푅123푚] + [푅푃 퐼퐶 ],

퐼퐹 3푡표푡 = [퐼퐹 3] + [푅3] + [푅23] + [푅123] + [푅123푚] + [푅푃 퐼퐶 ],

[푅50푆 ] = [푅30푆 ] + [푅2] + [푅3] + [푅23] + [푅123] + [푅123푚].

We assume the steady-state concentrations of small and large ribosomal subunits to be equal.

S7.1 Sub-pathway without subunits joining

The system of equation is complicated by the second branch of the pathway corresponding to 50S √︂ ^50푆 ℓ퐼퐹 푘표푛 subunit binding. However, in the regime ℓ ^퐼퐹 ≪ 1 (which is realized because of the large size 푟푖푏표 푘표푛 of the ribosome and slower association rate constant for the large subunit compared to the initiation factors again due to size), the effect of this branch is to add a term to the optimal abundance equalto the concentration of species 푅123푚 (see derivation in section S7.2). We focus here on the solution of the

S26 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. part of the reaction scheme boxed in Fig. S4. This sub-scheme corresponds to:

퐽 −→ 푅30푆 ,

^퐼퐹 3 푘표푛 푅30푆 + 퐼퐹 3 −−−→ 푅3,

^퐼퐹 2 푘표푛 푅30푆 + 퐼퐹 2 −−−→ 푅2,

^퐼퐹 2 푘표푛 푅3 + 퐼퐹 2 −−−→ 푅23,

^퐼퐹 3 푘표푛 푅2 + 퐼퐹 3 −−−→ 푅23,

^퐼퐹 1 푘표푛 푅23 + 퐼퐹 1 −−−→ 푅123,

푘푅푁퐴 푅123 −−−−→ 푅123푚.

d[푅 ] 30푆 = 퐽 − 푘^퐼퐹 2[푅 ][퐼퐹 2] − 푘^퐼퐹 3[푅 ][퐼퐹 3], dt 표푛 30푆 표푛 30푆 d[푅 ] 2 = 푘^퐼퐹 2[푅 ][퐼퐹 2] − 푘^퐼퐹 3[푅 ][퐼퐹 3], dt 표푛 30푆 표푛 2 d[푅 ] 3 = 푘^퐼퐹 3[푅 ][퐼퐹 3] − 푘^퐼퐹 2[푅 ][퐼퐹 2], dt 표푛 30푆 표푛 3 d[푅 ] 23 = 푘^퐼퐹 2[푅 ][퐼퐹 2] + 푘^퐼퐹 3[푅 ][퐼퐹 3] − 푘^퐼퐹 1[푅 ][퐼퐹 1], dt 표푛 3 표푛 2 표푛 23 d[푅 ] 123 = 푘^퐼퐹 1[푅 ][퐼퐹 1] − 푘 [푅 ], dt 표푛 23 푅푁퐴 123 d[퐼퐹 1] = −푘^1 [푅 ][퐼퐹 1] + 푘 [푅 ], dt 표푛 23 푅푁퐴 123 d[퐼퐹 2] = −푘^퐼퐹 2([푅 ] + [푅 ])[퐼퐹 2] + 푘 [푅 ], dt 표푛 30푆 3 푅푁퐴 123 d[퐼퐹 3] = −푘^퐼퐹 3([푅 ] + [푅 ])[퐼퐹 3] + 푘 [푅 ], dt 표푛 30푆 2 푅푁퐴 123

with conservation equations:

퐼퐹 1푡표푡 = [퐼퐹 1] + [푅123],

퐼퐹 2푡표푡 = [퐼퐹 2] + [푅2] + [푅23] + [푅123],

퐼퐹 3푡표푡 = [퐼퐹 3] + [푅3] + [푅23] + [푅123],

This system can be solved as with the previous schemes. In steady-state, we find for concentrations

S27 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. in terms of the free concentrations [퐼퐹 2] and [퐼퐹 3]:

퐽 퐽 퐽 퐽 [푅123] = , [퐼퐹 1] = 퐼퐹 1푡표푡 − , [푅23] = , [푅30푆] = , ^퐼퐹 1 ^퐼퐹 2 ^퐼퐹 3 푘푅푁퐴 푘푅푁퐴 푘표푛 [퐼퐹 1] 푘표푛 [퐼퐹 2] + 푘표푛 [퐼퐹 3] ^퐼퐹 2 (︃ )︃ ^퐼퐹 3 (︃ )︃ 푘표푛 [퐼퐹 2] 퐽 푘표푛 [퐼퐹 3] 퐽 [푅2] = , [푅3] = , ^퐼퐹 3 ^퐼퐹 2 ^퐼퐹 3 ^퐼퐹 2 ^퐼퐹 2 ^퐼퐹 3 푘표푛 [퐼퐹 3] 푘표푛 [퐼퐹 2] + 푘표푛 [퐼퐹 3] 푘표푛 [퐼퐹 2] 푘표푛 [퐼퐹 2] + 푘표푛 [퐼퐹 3] and the coupled equations for [퐼퐹 2] and [퐼퐹 3] that need to be solved:

^퐼퐹 2 (︃ )︃ 푘표푛 [퐼퐹 2] 퐽 퐽 퐽 퐼퐹 2푡표푡 = [퐼퐹 2] + + + , (S22) ^퐼퐹 3 ^퐼퐹 2 ^퐼퐹 3 ^퐼퐹 1 푘표푛 [퐼퐹 3] 푘표푛 [퐼퐹 2] + 푘표푛 [퐼퐹 3] 푘표푛 [퐼퐹 1] 푘푅푁퐴 ^퐼퐹 3 (︃ )︃ 푘표푛 [퐼퐹 3] 퐽 퐽 퐽 퐼퐹 3푡표푡 = [퐼퐹 3] + + + . ^퐼퐹 2 ^퐼퐹 2 ^퐼퐹 3 ^퐼퐹 1 푘표푛 [퐼퐹 2] 푘표푛 [퐼퐹 2] + 푘표푛 [퐼퐹 3] 푘표푛 [퐼퐹 1] 푘푅푁퐴

As for translation termination (section S5.3.2) and elongation (section S6.3.1), summing the ribosome containing species:

[푅푖푛푖] = [푅30푆] + [푅2] + [푅3] + [푅23] + [푅123], (︃ )︃ 1 1 1 1 1 = 퐽 + − + + , ^퐼퐹 2 ^퐼퐹 3 ^퐼퐹 2 ^퐼퐹 3 ^퐼퐹 1 푘표푛 [퐼퐹 2] 푘표푛 [퐼퐹 3] 푘표푛 [퐼퐹 2] + 푘표푛 [퐼퐹 3] 푘표푛 [퐼퐹 1] 푘푅푁퐴 allows us to read the initiation time directly (recast in proteome fraction units):

1 1 1 1 1 τ푖푛푖 = + − + + . (S23) 퐼퐹 2 푓푟푒푒 퐼퐹 3 푓푟푒푒 퐼퐹 2 푓푟푒푒 퐼퐹 3 푓푟푒푒 퐼퐹 1 푓푟푒푒 푘 푘표푛 휑퐼퐹 2 푘표푛 휑퐼퐹 3 푘표푛 휑퐼퐹 2 + 푘표푛 휑퐼퐹 3 푘표푛 휑퐼퐹 1 푅푁퐴

The above is the time can be used in the optimality condition (equation 5). Note that the parallel nature of the reactions with IF2 and IF3 leads to a reduction compared to a purely sequential pathway

(negative term above decreasing the total initiation time, as expected if multiple reactions can occur in parallel).

Given that binding of IF1 occurs last in this scheme, its free concentration takes a simple form

푓푟푒푒 ℓ퐼퐹 1휆 (휑 = 휑퐼퐹 1 − ). In contrast, computing the free IF2 and IF3 concentrations requires solving 퐼퐹 1 ⟨ℓ⟩푘푅푁퐴

S28 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. the non-linear coupled system, equations S22. Recasting these in units of proteome fraction:

(︃ 퐼퐹 2 푓푟푒푒 )︃ ˜ 푓푟푒푒 휆ℓ퐼퐹 2 푘표푛 휑퐼퐹 2 휑퐼퐹 2 = 휑 + , 퐼퐹 2 퐼퐹 3 푓푟푒푒 퐼퐹 2 푓푟푒푒 퐼퐹 3 푓푟푒푒 ⟨ℓ⟩푘표푛 휑퐼퐹 3 푘표푛 휑퐼퐹 2 + 푘표푛 휑퐼퐹 3 (︃ 퐼퐹 3 푓푟푒푒 )︃ ˜ 푓푟푒푒 휆ℓ퐼퐹 3 푘표푛 휑퐼퐹 3 휑퐼퐹 3 = 휑 + , 퐼퐹 3 퐼퐹 2 푓푟푒푒 퐼퐹 2 푓푟푒푒 퐼퐹 3 푓푟푒푒 ⟨ℓ⟩푘표푛 휑퐼퐹 2 푘표푛 휑퐼퐹 2 + 푘표푛 휑퐼퐹 3

˜ ℓ퐼퐹 2휆 ℓ퐼퐹 2휆 ˜ with 휑퐼퐹 2 := 휑퐼퐹 2 − − 푓푟푒푒 , and similarly for 휑퐼퐹 3. We show now that the terms coupling ⟨ℓ⟩푘푅푁퐴 퐼퐹 1 ⟨ℓ⟩푘표푛 휑퐼퐹 1 푓푟푒푒 푓푟푒푒 the two equations for 휑퐼퐹 2 and 휑퐼퐹 2 (bracketed above) are small at the optimum. Indeed, based on results in simpler schemes (self-consistency confirmed below), we expect at the optimum:

√︃ * √︃ * 푓푟푒푒,* ℓ푟푖푏표휆 푓푟푒푒,* ℓ푟푖푏표휆 휑퐼퐹 2 ∼ 퐼퐹 2 and 휑퐼퐹 3 ∼ 퐼퐹 3 . ⟨ℓ⟩푘표푛 ⟨ℓ⟩푘표푛

Hence, we expect the two terms at the optimum in the coupled equations above to compare as (e.g., in the free IF2 equation):

√︃ 휑푓푟푒푒,* ℓ 푘퐼퐹 3 퐼퐹 2 ∼ 푟푖푏표 표푛 ≫ 1, (︃ * )︃ ℓ 푘퐼퐹 2 휆 ℓ퐼퐹 2 퐼퐹 2 표푛 퐼퐹 3 푓푟푒푒,* ⟨ℓ⟩푘표푛 휑퐼퐹 3 coming from the large size of the ribosome compared to the initiation factors. In addition, the derivative of the coupling terms, which appear in the optimality condition and therefore in identifying the opti- 휆*ℓ mal abundances, are all of the form 퐼퐹 compared to the main term. This scales scales as 퐼퐹 푓푟푒푒 2 ⟨ℓ⟩푘표푛 (휑퐼퐹 ) −1 ℓ퐼퐹 ℓ푟푖푏표 ≪ 1 at the self-consistent solution. Hence, neglecting the coupling is justified as an approximate solutions near the optimum, and we obtain for the free concentrations of IFs:

푓푟푒푒 ℓ퐼퐹 1휆 휑퐼퐹 1 = 휑퐼퐹 1 − , ⟨ℓ⟩푘푅푁퐴 푓푟푒푒 ℓ퐼퐹 2휆 ℓ퐼퐹 2휆 휑 ≈ 휑퐼퐹 2 − − , 퐼퐹 2 ⟨ℓ⟩푘 퐼퐹 1 푓푟푒푒 푅푁퐴 ⟨ℓ⟩푘표푛 휑퐼퐹 1 푓푟푒푒 ℓ퐼퐹 3휆 ℓ퐼퐹 3휆 휑 ≈ 휑퐼퐹 3 − − . 퐼퐹 3 ⟨ℓ⟩푘 퐼퐹 1 푓푟푒푒 푅푁퐴 ⟨ℓ⟩푘표푛 휑퐼퐹 1

Substituting these in the expression for the initiation time, equation S23, and using the optimality

퐼퐹 2 퐼퐹 3 condition (equation 5, we find that no simple solution exist for the non symmetric caseof 푘표푛 ̸= 푘표푛 . Since the on-rates should be similar for IF2 and IF3 (difference in size should only lead to modest

S29 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1/3 difference in on-rates coefficient, by roughly (ℓ퐼퐹 2/ℓ퐼퐹 3) ≈ 1.7 assuming Stokes scaling), the symmetric case is approximately correct. We report the symmetric solution for simplicity. The final optimal solutions for the three factors for the sub-scheme solved here is:

√︃ * [︂ ]︂ * * ℓ푟푖푏표휆 ℓ퐼퐹 2 + ℓ퐼퐹 3 ℓ퐼퐹 1휆 휑퐼퐹 1 ≈ 퐼퐹 1 1 + + 푖푛푖 , (S24) ⟨ℓ⟩푘표푛 ℓ푟푖푏표 ⟨ℓ⟩푘

√︂ √︃ * √︃ * * * 3 ℓ푟푖푏표휆 ℓ퐼퐹 2 ℓ푟푖푏표휆 ℓ퐼퐹 2휆 휑퐼퐹 2 ≈ 퐼퐹 2 + 퐼퐹 1 + 푖푛푖 , 4 ⟨ℓ⟩푘표푛 ⟨ℓ⟩ ⟨ℓ⟩푘표푛 ⟨ℓ⟩푘

√︂ √︃ * √︃ * * * 3 ℓ푟푖푏표휆 ℓ퐼퐹 3 ℓ푟푖푏표휆 ℓ퐼퐹 3휆 휑퐼퐹 3 ≈ 퐼퐹 3 + 퐼퐹 1 + 푖푛푖 . 4 ⟨ℓ⟩푘표푛 ⟨ℓ⟩ ⟨ℓ⟩푘표푛 ⟨ℓ⟩푘

The form of the solution is again similar to that derived for the simpler translation termination scheme

(c.f., equation S3), with three differences, each of which has an intuitive interpretation. First, the factor [︁ ]︁ 1 + ℓ퐼퐹 2+ℓ퐼퐹 3 in the IF1 solution arises as a result of IF1 binding being last in our initiation pathway. ℓ푟푖푏표 Indeed, IF1 concentration also influences free IF2 and IF3 concentration, leading to additional selective pressure to increase its abundance. In effect, the molecular species waiting for IF1 to diffuse to itstarget is not only the ribosome, but the ribosome with IF2 and IF3 bound, and a total amino acid weight √︀ ℓ푟푖푏표 → ℓ푟푖푏표 + ℓ퐼퐹 2 + ℓ퐼퐹 3. Second, the factor of 3/4 ≈ 0.87 < 1 for IF2 and IF3 (corresponding to the symmetric case), arising from the parallel pathway for IF2 and IF3 rendering the process more efficient.

We therefore see that the correction from having multiple reactions in parallel is modest (0.87 vs. 1).

The third difference to the simpler case of translation termination are the second terms forIF2and

IF3, corresponding to the additional delay incurred by binding of IF1. These come from the assumed sequential nature of our initiation scheme (Fig. S4). In such cases, factors binding earlier have to be present at higher abundances to account for their wait times for later binding events. The exact form of this correction term would be different for more complex assembly pathways (but would be captured by average delays from other factor binding).

S7.2 Pathway including subunits joining

The solutions above (equations S24) are for the reduced scheme (boxed in Fig. S4). The full solutions includes the delay arising from 50S subunit binding. Including subunit joining requires the solution of an additional equation for the steady-state concentration of species with all three initiation factors, mRNA and initiator tRNA waiting for subunit joining (species 푅123푚 in Fig. S4, denoted 휑123푚 in units

S30 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

of proteome fraction). The equation to solve for 휑123푚 can be obtained from the 50S ribosome subunit conservation equation:

휆 휆 휆 휆 휆 휆 ⟨ℓ⟩휑 = + − + + + 123푚 . 푘50푆휑 퐼퐹 2 푓푟푒푒 퐼퐹 3 푓푟푒푒 퐼퐹 2 푓푟푒푒 퐼퐹 3 푓푟푒푒 퐼퐹 1 푓푟푒푒 푘 ℓ 표푛 123푚 푘표푛 휑퐼퐹 2 푘표푛 휑퐼퐹 3 푘표푛 휑퐼퐹 2 + 푘표푛 휑퐼퐹 3 푘표푛 휑퐼퐹 1 푅푁퐴 30푆

휑123푚 appears in the equations for the free concentration of the initiation factors (from the conservation equations), and also leads to the appearance of a new term in the expression for the initiation time τ푖푛푖 (equation S23) corresponding to this step: ⟨ℓ⟩휑123푚 . ℓ30푆 휆 These two additions, resulting from the parallel branch of 50S joining, can be simplified due to a sep- aration of scales between the various terms. For large initiation factor concentrations, the corresponding mass action terms in the equation for 휑123푚 negligibly contribute to the solution. In this regime, the new term involving 휑123푚 in the initiation time τ푖푛푖 does not alter the form the optimal abundances of IF1, IF2, and IF3 beyond adding a constant term. Hence, in the regime of high free IF concentration, the optimality condition has the same form as derived in the previous section.We can therefore obtain

∞ 휑123푚 assuming large IF concentration, denoted 휑123푚:

⎛ √︃ ⎞ (︂ )︂2 ∞ ℓ30푆 휆 1 휆 ⟨ℓ⟩휆 휑123푚 = ⎝− + + 50푆 ⎠ ⟨ℓ⟩ 2푘푅푁퐴 4 푘푅푁퐴 ℓ30푆푘표푛

This solution will be self-consistent provided (for all initiation factors):

√︃ 휆* 휆* ⟨ℓ⟩휑∞ 휆* 1 (︂ 휆* )︂2 ⟨ℓ⟩휆* ≪ + 123푚 = + + , 퐼퐹 푓푟푒푒,* 푘 ℓ 2푘 4 푘 ℓ 푘50푆 푘표푛 휑퐼퐹 푅푁퐴 30푆 푅푁퐴 푅푁퐴 30푆 표푛

It therefore suffices to show:

√︃ 휆* ⟨ℓ⟩휆* ≪ . 퐼퐹 푓푟푒푒,* ℓ 푘50푆 푘표푛 휑퐼퐹 30푆 표푛

푓푟푒푒,* Using our optimality condition on 휑퐼퐹 (equation S24) assuming no contribution from 휑123푚 (self- consistency), and converting association rates in units µM−1s−1, the above condition reduces to:

√︃ ℓ 푘^50푆 퐼퐹 표푛 ≪ 1. ^퐼퐹 ℓ푟푖푏표 푘표푛

The self-consistency condition is met both because initiation factors are smaller than ribosomes ℓ퐼퐹 ≪

S31 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

^50푆 ^퐼퐹 ℓ푟푖푏표, and because the on-rate for subunit joining is lower than initiation factor binding (푘표푛 ≪ 푘표푛 ), given again the size differences. The solution, including the contribution from ribosome subunits joining is then:

√︃ * [︂ ]︂ * (︂ )︂ * ℓ푟푖푏표휆 ℓ퐼퐹 2 + ℓ퐼퐹 3 ℓ퐼퐹 1 ∞ ℓ퐼퐹 1휆 1 1 휑퐼퐹 1 ≈ 퐼퐹 1 1 + + 휑123푚 + + 푖푛푖 , ⟨ℓ⟩푘표푛 ℓ푟푖푏표 ℓ30푆 ⟨ℓ⟩ 푘푅푁퐴 푘푐푎푡 √︂ √︃ * √︃ * * (︂ )︂ * 3 ℓ푟푖푏표휆 ℓ퐼퐹 2 ℓ푟푖푏표휆 ℓ퐼퐹 2 ∞ ℓ퐼퐹 2휆 1 1 휑퐼퐹 2 ≈ 퐼퐹 2 + 퐼퐹 1 + 휑123푚 + + 푖푛푖 , 4 ⟨ℓ⟩푘표푛 ⟨ℓ⟩ ⟨ℓ⟩푘표푛 ℓ30푆 ⟨ℓ⟩ 푘푅푁퐴 푘푐푎푡 √︂ √︃ * √︃ * * (︂ )︂ * 3 ℓ푟푖푏표휆 ℓ퐼퐹 3 ℓ푟푖푏표휆 ℓ퐼퐹 3 ∞ ℓ퐼퐹 3휆 1 1 휑퐼퐹 3 ≈ 퐼퐹 3 + 퐼퐹 1 + 휑123푚 + + 푖푛푖 , 4 ⟨ℓ⟩푘표푛 ⟨ℓ⟩ ⟨ℓ⟩푘표푛 ℓ30푆 ⟨ℓ⟩ 푘푅푁퐴 푘푐푎푡

√︁ * ∞ ℓ30푆 휆 where for 푘푅푁퐴 much faster than the association between the subunits, 휑 ≈ 50푆 . 123푚 ⟨ℓ⟩푘표푛

S8 Summary of optimal solutions

Solutions for the factor predicted optimal abundances as a function of effective biochemical parameters and the growth rate at the optimum, are presented in Table S3. The table breaks down terms in each solution by categories: direct diffusion term (arising from diffusive search time), catalytic sequestration, and delay incurred by the diffusion of other proteins in part of the cycle of the factor of interest. Solutions

−1 −1 are listed in terms of on-rate 푘^표푛 (units of µM s ). The aaRS solution follows a different form:

푛 ℓ ℓ 휆* 휑* = 푎푎 푎푎푅푆 + 푎푎푅푆 , 푎푎푅푆 ^푎푎푅푆 푎푎푅푆 푘표푛 푃 Δ푡 푘푐푎푡 * √︃ * tRNA푡표푡 1 2 휑푇 퐶 1 * 푛푎푎ℓ푟푖푏표ℓ푇 푢휆 with Δ푡 := − − − − , and 휑 := . * 푇 퐶 * 푚푎푥 * 푎푎푅푆 푇 퐶 ^푇 퐶 푃 휆 푘표푛 휑푇 퐶 푘푒푙 ℓ푇 푢휆 푘푐푎푡 푘표푛 푃

S32 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

√︂ √︂ 휆* 휆* Factor Diffusion (direct) ∝ Diffusion (other) ∝ Catalytic sequestration ∝ 휆* 푃 푃 √︃ √︃ ℓ ℓ [︂ ℓ + ℓ ]︂ ℓ ⟨ℓ⟩ ℓ (︂ 1 1 )︂ IF1 푟푖푏표 퐼퐹 1 1 + 퐼퐹 2 퐼퐹 3 퐼퐹 1 퐼퐹 1 + ^퐼퐹 1 ℓ ⟨ℓ⟩ ^50푆 ⟨ℓ⟩ 푘푅푁퐴 푖푛푖 ⟨ℓ⟩푘표푛 푟푖푏표 푘표푛 푘푐푎푡 √︃ (︃√︃ √︃ )︃ √︂3 ℓ ℓ ℓ ℓ ℓ ⟨ℓ⟩ ℓ (︂ 1 1 )︂ IF2 푟푖푏표 퐼퐹 2 퐼퐹 2 푟푖푏표 퐼퐹 1 + 퐼퐹 2 + 4 ^퐼퐹 2 ⟨ℓ⟩ ^퐼퐹 1 ^50푆 ⟨ℓ⟩ 푘푅푁퐴 푖푛푖 ⟨ℓ⟩푘표푛 ⟨ℓ⟩푘표푛 푘표푛 푘푐푎푡 √︃ (︃√︃ √︃ )︃ √︂3 ℓ ℓ ℓ ℓ ℓ ⟨ℓ⟩ ℓ (︂ 1 1 )︂ IF3 푟푖푏표 퐼퐹 3 퐼퐹 3 푟푖푏표 퐼퐹 1 + 퐼퐹 3 + 4 ^퐼퐹 3 ⟨ℓ⟩ ^퐼퐹 1 ^50푆 ⟨ℓ⟩ 푘푅푁퐴 푖푛푖 ⟨ℓ⟩푘표푛 ⟨ℓ⟩푘표푛 푘표푛 푘푐푎푡

√︃ ℓ ℓ ℓ EF-G 푟푖푏표 퐺 퐺 ^퐺 퐺 푘표푛 푘푐푎푡 √︃ ℓ ℓ ℓ EF-Ts 푇 푢 푇 푠 푇 푠 ^푇 푠 푇 푠 푘표푛 푘푐푎푡

√︃ √︃ (︂ )︂ ℓ푟푖푏표ℓ푇 푢푛푎푎 ℓ푇 푢ℓ푇 푠 1 1 EF-Tu ℓ푇 푢 + ^푇 퐶 ^푇 푠 푇 퐶 푇 푠 푘표푛 푘표푛 푘푐푎푡 푘푐푎푡

√︃ √ ℓ ℓ (︀1 + 2 푓 푓 )︀ ℓ RF1+RF2 푟푖푏표 푅퐹 퐼 푈퐴퐺 푈퐺퐴 푅퐹 퐼 ^푅퐹 퐼 푅퐹 퐼 ⟨ℓ⟩푘표푛 ⟨ℓ⟩푘푐푎푡 √︃ ℓ ℓ ℓ RF4 푟푖푏표 푅퐹 4 푅퐹 4 ^푅퐹 4 푅퐹 4 ⟨ℓ⟩푘표푛 ⟨ℓ⟩푘푐푎푡

Table S3: Compilation of predicted optimal abundances for translation factors. The optimal abundance is the sum of the terms in each row. Columns correspond to contributions of different nature (diffusion of factor itself, diffusion of other factors involved in the factor’s cycle, catalytic term). Terms must be multiplied bythecommon factors indicated in each column’s header (∝).

S9 Estimation of optimal abundances

To compare prediction from our parsimonious framework (Table S3) requires specific values of kinetic parameters. We use empirical measurements together with scaling relations to estimate these kinetic parameters.

Catalytic rates for many enzymes have been measured in vitro, but the obtained values can be sharply incompatible with kinetic parameters that have been measured in the cell. An example is tRNA synthetases. Tallying the measured 푘푐푎푡 for all wild-type E. coli aaRSs [24], we find a median value of

푎푎푅푆 −1 −1 푘푐푎푡 ≈ 3 s , and 80% of reported value below 6 s . The total molar concentration of aaRSs in the cell is comparable to the total number of ribosomes, and the elongation speed of ribosome is above 15

S33 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. a.a. s−1 [13, 25]. Hence, the absolute minimum catalytic rate to sustain the translation elongation flux

푎푎푅푆 −1 needs to obey 푘푐푎푡 > 15 a.a. s , which is much higher than most in vitro measured values. To avoid the difficulties in estimating catalytic parameters, and to derive a lower bound on factor abundance from our model, we focus on the diffusive contributions (related to the associate rate) in our predictions, assuming large catalytic rates (푘푐푎푡 → ∞).

One way to estimate association rates 푘^표푛 is to assume that the processes are diffusion limited, i.e., 푑푖푓푓 following the Smoluchowski relation 푘^표푛 = 4휋퐷푅, where 퐷 is the relative diffusion coefficients of the two reactants and 푅 the capture radius. Diffusion is a well-characterized process biophysically, and diffusion coefficients of various proteins of different sizes have been measured in the cell [18,31,49], including for components of the translation machinery such as ribosomes [4, 59], and tRNAs [55, 67].

Volkov et al report an EF-Tu diffusion coefficient (and a similar measurement for a major diffusive state

2 −1 푑푖푓푓 of tRNAs) of ≈ 3 µm s . Since ribosomes are nearly immobile, this can be used in an estimate of 푘표푛 , and taking 푅 ≈ 10 nm as the rough size of the ternary complex (EF-Tu bound to charged tRNAs),

푇 퐶,푑푖푓푓 −1 −1 we get in that case for a reaction purely diffusion limited 푘^표푛 > 100 µM s . In vivo estimates ^푇 퐶 −1 −1 based on kinetic measurements of elongation [13] find 푘표푛 = 6.4 µM s , suggesting that the idealized diffusion limited Smoluchowski regime is not applicable in this case. Multiple reasons could explainthis, such as rotational diffusion, large off-rate requiring multiple binding events before productive encounter, or possibly because the ternary complex needs to sample multiple non-cognate sites to find a cognate target (thereby slowing its diffusive search).

To circumvent these difficulties, we anchored our association rates to the estimated in vivo association ^푇 퐶 −1 −1 rate for the ternary complex, 푘표푛 = 6.4 µM s [13], and rescale the association rate by diffusion of ^퐴퐵 ^푇 퐶 related components, i.e., 푘표푛 /푘표푛 = (퐷퐴 + 퐷퐵)/(퐷푇 퐶 + 퐷푟푖푏표), where 퐷푖 is the diffusion coefficients for the molecular species 푖. While the in vivo diffusion coefficient for a number of component ofthe translation apparatus exist [4, 55, 59, 67], several factors do not have measured diffusion coefficients.

For these, we use the cubic root scaling from the Stokes-Einstein relation [49], see Table S4.

Additional measured quantities required to compute our estimates are: the growth rate 휆* = 5.5 ×

10−4 s−1 (21 min doubling time) , tRNA concentration (estimated from the tRNA to ribosome ratio of

푚푎푥 −1 6.5 [15]), the maximum elongation rate 푘푒푙 = 22 a.a. s [13], the in-protein amino acid concentration 푃 = 2.5 M [11, 30].

Finally, to estimate the ribosome proteome fraction, we subtract from the total concentration to the

S34 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.02.438287; this version posted April 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

core translation factor estimated from ribosome profiling (휑푅 = 0.335) the optimal abundance estimates * ∑︀ * for the other factors, i.e., 휑푟푖푏표 = 휑푅 − 푖 휑푇 퐹,푖. Results displayed in Fig. 4 are listed in Table S5 .

Factor Protein size Diffusion coefficientµ ( m2 s−1) Ribosome ℓ푟푖푏표 = 7336 [69] 퐷푟푖푏표 = 0.05[4, 59] 30S subunit ℓ30푆 = 3108 [69] 퐷푠푢푏푢푛푖푡푠 = 0.15[4, 59] # TC ℓ푇 퐶 = 630 퐷푇 퐶 = 3 [67] tRNA 퐷푡푅푁퐴 = 8 [55, 67] √︁ 3 ℓ푇 퐶 IF1 ℓ퐼퐹 1 = 72 퐷퐼퐹 1 = 퐷푇 퐶 ℓ퐼퐹 1 √︁ 3 ℓ푇 퐶 IF2 ℓ퐼퐹 2 = 890 퐷퐼퐹 2 = 퐷푇 퐶 ℓ퐼퐹 2 √︁ 3 ℓ푇 퐶 IF3 ℓ퐼퐹 3 = 180 퐷퐼퐹 3 = 퐷푇 퐶 ℓ퐼퐹 3 √︁ 3 ℓ푇 퐶 EF-G ℓ퐺 = 704 퐷퐺 = 퐷푇 퐶 ℓ퐺 √︁ 3 ℓ푇 퐶 EF-Ts ℓ푇 푠 = 283 퐷푇 푠 = 퐷푇 퐶 ℓ푇 푠 √︁ 3 ℓ푇 퐶 EF-Tu ℓ푇 푢 = 394 퐷푇 푢 = 퐷푇 퐶 ℓ푇 푢 √︁ † 3 ℓ푇 퐶 aaRS ℓ푎푎푅푆 = 1196 퐷푎푎푅푆 = 퐷푇 퐶 ℓ푎푎푅푆 √︁ 3 ℓ푇 퐶 RF1/RF2 ℓ푅퐹 퐼 = 362 퐷푅퐹 퐼 = 퐷푇 퐶 ℓ푅퐹 퐼 √︁ 3 ℓ푇 퐶 RF4 ℓ푅퐹 4 = 185 퐷푅퐹 4 = 퐷푇 퐶 ℓ푅퐹 4

Table S4: Protein sizes and diffusion coefficients. Unless otherwise noted, protein sizes are takenfor E. coli [29] #For the ternary complex, the total mass of tRNA+EF-Tu was converted to an equivalent amino acid length. †For aaRS, the proteome fraction weighted (estimated from ribosome profiling [38]) average size (accounting for varying complex stoichiometries) in E. coli was taken.

Factor B. subtilis E. coli V. natriegens Average Diffusion limited optima 휑* Ribosome 19.5 22.1 21.8 21.1 25.6 EF-Tu 5.40 6.14 6.93 6.16 4.61 aaRS 2.48 2.91 3.32 2.90 1.27 EF-G 1.99 1.59 2.10 1.89 1.35 EF-Ts 0.562 0.724 0.922 0.736 0.16 IF2 0.223 0.331 0.316 0.290 0.280 IF3 0.124 0.109 0.139 0.124 0.070 IF1 0.077 0.059 0.102 0.079 0.030 RF4 0.132 0.116 0.123 0.123 0.039 RF1/RF2 0.139 0.109 0.063 0.104 0.065

Table S5: Proteome fraction (in %) of core mRNA translation factors in fast growing B. subtilis, E. coli, and V. natriegens estimated from ribosome profiling [34, 38] (shown in Fig. 1B), average across the three species, and estimated diffusion-limited optima (Fig. 4).

S35