bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Coordination of noise with cell size: analytical results for agent-based models of growing cell populations Philipp Thomasa) and Vahid Shahrezaei Department of Mathematics, Imperial College London, UK The chemical master equation and the are widely used to model the reaction kinetics inside living cells. It is thereby assumed that cell growth and division can be modelled through effective dilution reactions and extrinsic noise sources. We here re-examine these paradigms through developing an analytical agent-based framework of growing and dividing cells accompanied by an exact simulation algorithm, which allows us to quantify the dynamics of virtually any intracellular reaction network affected by stochastic cell size control and division noise. We find that the solution of the chemical master equation – including static extrinsic noise – exactly agrees with the agent-based formulation when the network under study exhibits stochastic concentration homeostasis, a novel condition that generalises concentration homeostasis in deterministic systems to higher order moments and distributions. We illustrate stochastic concentration homeostasis for a range of common gene expression networks. When this condition is not met, we demonstrate by extending the linear noise approximation to agent-based models that the dependence of gene expression noise on cell size can qualitatively deviate from the chemical master equation. Surprisingly, the total noise of the agent-based approach can still be well approximated by extrinsic noise models.

I. INTRODUCTION tors like regulators, polymerases or ribosomes that approximately double over the division cycle3,16,17. Cell size Cells must continuously synthesise molecules to grow and fluctuates in single cells, however, providing a source of ex- divide. At a single cell level, gene expression and cell size trinsic noise in reaction rates that can be identified via noise 18,19 are coordinated but heterogeneous which can drive pheno- decompositions . A few studies combined EDMs with typic variability and decision making in cell populations1–5. static cell size variations as an explanatory source of ex- 20–22 The interplay between these sources of cell-to-cell variabil- trinsic noise . In brief, the total noise in these models ity is not well understood since they have traditionally been amounts to intrinsic fluctuations due to gene expression and studied separately. A general stochastic theory integrating dilution, and extrinsic variation across cell sizes in the pop- size-dependent biochemical reactions with the dynamics of ulation. We refer to this class of models as extrinsic noise growing and dividing cells is hence still missing. models (ENMs, see Fig.1b). Yet it remains unclear how re- Many models of noisy gene expression and its regulation liably these effective models describe cells that continuously are based on the chemical master equation that describes synthesise molecules, grow and divide. the stochastic dynamics of biochemical reactions in a fixed An increasing number of studies are investing efforts reaction volume6–8. The small scale of compartmental sizes towards quantifying the dependence of gene expression of cells implies that only a small number of molecules is noise on cell cycle progression and growth, either exper- present at any time leading to large variability of reaction imentally via ergodic principles or pseudo-time23,24 and rates from cell to cell, commonly referred to as gene ex- time-lapse imaging22,25,26 or theoretically through noise pression noise9–11. Another factor contributing to gene ex- decomposition27–29, master equations including cell cycle pression noise is the fact that cells are continuously growing dynamics4,17,30–35 and agent-based approaches including and dividing causing molecule numbers to (approximately) age-structure of growing populations35–40. The essence of double over the course of a growth-division cycle. A com- agent-based models (ABMs) is that each cell in a popula- mon approach to account for cell growth is to include extra tion is represented by an agent whose physiological state is degradation reactions that describe dilution of gene expres- tracked along with their molecular reaction networks. In sion levels due to cell growth9–13 akin to what is done in de- principle, these models are able to predict gene expression terministic rate equation models14,15. We will refer to this distributions of cells progressing through well-defined cell approach as the effective dilution model (EDM, see Fig.1a). cycle states as measured by time-lapse microscopy and snap- However, little is known of how well this approach repre- shots of heterogeneous populations. The unprecedented de- sents the dependence of gene expression noise on cell size tail of these models must cast doubt on the predictions of observed in a growing population. master equation models (EDMs and ENMs) in which growth Cells achieve concentration homeostasis through coupling and division are modelled by effective dilution reactions. reaction rates to cell size via highly abundant upstream fac- Yet it is presently unclear why these effective models have fared reasonably well in predicting gene expression noise re- ported by single-cell experiments10,17,41. Nevertheless, most ABMs still ignore cell size, a ma- a)Electronic mail: [email protected] jor physiological factor affecting both intracellular reactions bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

2 and cell division dynamics alike. Since cell size varies at condition states that there exists a steady state between least two-fold as required by size homeostasis in a growing reaction and dilution rates population, and it scales some reaction rates as required by concentration homeostasis, it is expected that cell size must R ¯ X + − ¯ significantly contribute to gene expression variation across αX = (νr − νr )fr(X). (2) a population. In this article, we bridge the gap between the r=1 chemical master equation and agent-based approaches by ¯ integrating cell size dynamics with the stochastic kinetics of Here, fr(X) are macroscopic reaction-rate functions and α molecular reaction networks. is the exponential growth rate of cells determining the dilu- The outline of the paper is as follows. First, we explain tion rate due to growth. Since these quantities are indepen- the analytical framework for EDMs, ENMs and ABMs (II). dent of cell size, the balanced growth condition (2) implies Then we introduce the concept of stochastic concentration concentration homeostasis in rate equation models. homeostasis, a rigorous condition under which the chemi- cal master equations of the EDM and ENM agree exactly with the ABM (Sec. IIIA). This new condition is met by 2. Effective dilution model some but not all common models of gene expression. We show that when these conditions are not met, the effective The chemical master equation6 and equivalently the models agree with the ABM only on average (Sec. IIIB). stochastic simulation algorithm7 are state-of-the-art To address this problem, we propose a comprehensive theo- stochastic models of reaction kinetics inside cells. Al- retical framework extending the linear noise approximation though well-established, they are strictly valid only when to agent-based dynamics with which we quantify cell size describing cellular fluctuations at constant cell size s. A scaling of gene expression in growing cells (Sec. IIIC). Our straight-forward approach to circumvent this limitation is findings indicate that the EDM can qualitatively fail to pre- to supplement (1) by additional degradation reactions of dict this dependence but our novel approximation method rate α that model dilution of molecules due to cell growth: accurately describes gene expression noise in the presence of cell size control variations and division errors. We further α S −→ , i = 1, 2,...,N, (3) show that ENMs present surprisingly accurate approxima- i ∅ tions for the total noise statistics (Sec. IIID). akin to what is traditionally for reaction rate equations (2). The chemical master equation of this effective dilution model (EDM) then takes the familiar form II. METHODS

∂ΠEDM(x|s) We consider a biochemical reaction network of N molecu- 0 = = [Q(s) + αD]ΠEDM(x|s), (4) T ∂t lar species S = (S1,S2,...,SN ) embedded in a cell of size s. The network then has the general form: governing the conditional probability of molecule numbers T x = (x1, x2, . . . , xN ) of the species S in a cell of size s and N N X − kr X + where νirSi −→ νirSi, r = 1,...,R, (1) i=1 i=1 R X 0 x,x0 (s) = wr(x , s)(δ 0 + − − δx,x0 ), (5) ± ± ± ± T Q x,x +νr −νr where νr = (ν1r, ν2r, . . . , νNr) are the stoichiometric co- r=1 th efficients and kr is the reaction rate constant of the r re- action. In the following, we outline deterministic, effective are the elements of the transition matrix of the molecu- dilution and extrinsic noise models and develop a new agent- lar reactions (1) and we included the extra dilution reac- PN 0 based approach coupling stochastic reaction dynamics to tions (3) via 0 (s) = x (δ 0 − δ 0 ). We are Dx,x i=1 i xi,xi−1 xi,xi cell size in growing and dividing cells (Fig. 1). here interested in the stationary solution and hence set the time-derivative in Eq. (4) to zero. Such effective models are motivated through the fact6,42 that when the microscopic A. Effective dilution models, extrinsic noise models and the propensities wr are linked to the macroscopic rate functions chemical master equation fr of the rate equation models via mass-action kinetics

1. Rate equation models and concentration homeostasis wr(x, s) ≈ sfr(X), (6)

Deterministically, the vector of molecular concentrations where X = x/s is the concentration, the mean concentra- T X¯ = (X¯1, X¯2,..., X¯N ) is governed by rate equation mod- tions of EDMs follow the concentrations X¯ of the rate equa- els in balanced growth conditions. The balanced growth tions (2) (see Sec. II A 4). bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

3

FIG. 1. Modelling approaches for cell size dependence of gene expression. (a) The effective dilution model describes cells at constant size with intracellular reactions coupled to effective dilution reactions. (b) The extrinsic noise model incorporates static cell size variability as a source of extrinsic noise coupled with effective dilution models (c) The agent-based approach models intracellular reactions occurring across a growing and dividing cell population without the need for effective dilution reactions.

3. Extrinsic noise model law of total variance18,19

Σ = Σint + Σext , (8) A common way to incorporate static size variability be- Y Y Y |{z} |{z} tween cells in the model is to consider cell size s to be gene expression cell size variation distributed across cells according to a cell size distribution which correspond to molecular fluctuations due to gene Π(s). We will refer to this approach as the extrinsic noise expression and cell size variation, respectively, for Y ∈ model (ENM), which leads to a mixture model of concen- {EDM, ENM}. Specifically, for molecule numbers x, we trations X = x/s, int ext have ΣY = EΠ[CovΠY [x|s]] and ΣY = CovΠ[EΠY [x|s]], Z ∞ where EΠ denotes the expectation value with respect to the ΠENM(X) = ds ΠEDM(x = Xs|s)Π(s), (7) distribution Π, and analogously for concentrations. The in- 0 int trinsic components ΣY satisfy a Lyapunov equation called and analogous expressions for the molecule number distri- the linear noise approximation: butions. int int T −1 ¯ 0 = JdΣY + ΣY Jd + ΩY Dd(X), (9)

where ΩY has to be chosen depending on whether concen- 4. Analytical solutions and noise decomposition tration or number covariances are of interest:

The advantage of the EDM and ENM is that its noise ΩY concentration numbers statistics can be approximated in closed-form using the lin- EDM s s−1 . (10) 6,43,44 −1 −1 −1 ear noise approximation . In this approximation, the ENM EΠ[s ] EΠ[s] mean concentrations are approximated by the solution X¯ of the rate equations (2) and the probability distribution The matrix Jd is the Jacobian of the rate equations (2) and ΠEDM(x|s) is approximated by a Gaussian. In the same Dd denotes the diffusion matrix obeying limit, the covariance matrix ΣY can be decomposed into in- int ext Jd(X¯) = J (X¯) − α1, Dd(X¯) = D(X¯) + α diag(X¯), (11) trinsic and extrinsic components, ΣY and ΣY , using the bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

4

¯ PR + − T ¯ ¯ where J (X) = r=1(νr − νr )∇X¯ fr(X) and D(X) = The ABM simulation algorithm is given in Box1, which PR ¯ + − + − T combines the First-Division algorithm, previously intro- r=1 fr(X)(νr − νr )(νr − νr ) . The extrinsic compo- ext duced for agent-based cell populations38, with the Extrande nents ΣY follow from the dependence of the mean on cell size, which features only in the molecule number variance method adapted to simulate reaction networks embedded in 45 of the ENM: a growing cell . In the following, we describe the exact an- alytical framework with which we characterise the snapshot ext ΣY concentration numbers distributions that underlie such a population of agents. EDM 0 0 , (12) ¯ ¯ T ENM 0 VarΠ(s)XX Master equation for agent-based populations ext where the last cell follows from Σ = CovΠ[EΠ[x|s]] with ENM We consider the number of cells n(τ, s, x, t) with age τ EΠ[x|s] = sX¯. As a concrete example, we consider of mR- (time since the last division), cell size s and molecule counts NAs with a size-dependent transcription rate that are trans- x in a snapshot at time t, which evolves as lated into stable proteins:  ∂ ∂ ∂  + + αs +γ ¯(s, τ) n(τ, s, x, t) = (s)n(τ, s, x, t), ∂t ∂τ ∂s Q k0s kdm ktl | {z } ∅ −−→ M −−→ ∅, M −−→ M + P. (13) | {z } stochastic reactions growth ∞ ∞ We then account for dilution through the additional reac- 0 0 0 tions n(0, s, x, t) = 2 ∫ dτ ∫ ds B(s|s ) | {z } 0 0 | {z } # newborn cells division error α α M −→ ∅, P −→ ∅. (14) X × B(x|x0, s/s0) γ¯(s0, τ 0)n(τ 0, s0, x0, t), (17) x0 | {z } | {z } The mean protein concentration is given by P¯ = k0b/α and partitioning of molecules # dividing cells the coefficient of variation predicted by the EDM and ENM models follow the familiar expression10 and describes cell growth, stochastic reaction kinetics and a boundary condition for cell division that ensures that the 1  δ  Σext number of newborn cells is twice the number of dividing CV2 = 1 + b + Y , (15) Y 2 cells after partitioning their size and molecular contents. ΩY P¯ 1 + δ P¯ These evolution equations have been derived in38,39 for age- ext where we account for size-variability via ΩY and ΣY given dependent snapshots but here we extend such agent-based by Eqs. (10) and (12), respectively, and the parameters models to include also cell size dynamics and size-dependent reaction dynamics. We allow for the following generalisa- k k δ = 1 + dm , b = tl , (16) tions: (i) size increases exponentially in single cells, (ii) cells α kdm + α divide with rateγ ¯(s, τ) that is both size- and age-dependent, (iii) the transition matrix Q(s) of the molecular reactions correspond to the ratio of mRNA and protein degrada- depends on cell size s via the propensities (see definition tion/dilution rates and the translational burst size, respec- after Eq. (4)), and (iv) the molecular partitioning kernel tively. From Eq. (15), (10) and (12), it is clear that size B(x|x0, s/s0) depends on the inherited size fraction s/s0 of variation acts on the intrinsic noise component of molecule −1 −1 2 a daughter cell. We now describe in detail how we model concentrations (via EΠ[s ] ≈ EΠ[s] (1 + CVΠ[s])) but ext the individual noise sources associated with cell size control, the extrinsic noise component of molecule numbers (via ΣY division errors, and molecule partitioning. (12)). a. Cell size control fluctuations. Recent studies46,47 have shown that the distribution of sizes with which cells divide does not explicitly depend on cell age but on the B. Agent-based modelling −ατ birth size s0. Assuming thatγ ¯(s, τ) = αsγ(s, s ), where 48,49 γ(s, s0) is the division rate per unit size (see also ), the Little is known about the accuracy of EDMs and ENMs division-size distribution is given by in predicting cellular noise in growing populations. In the R s − d dsγ(s,s0) following, we introduce an agent-based modelling approach ϕ(sd|s0) = γ(sd, s0)e s0 . (18) that serves as a gold standard to assess the validity of these effective models. The ABM represents cells as agents that As a concrete example of (18) we consider a model where 49–51 progressively synthesise molecules via intracellular reactions the division size is linearly related to birth size (1), grow in size and undergo cell division. Every division s = as + ∆. (19) gives rise to two daughter cells of varying birth sizes, each d 0 of which inherits a proportion of molecules from the mother The division rate can be calculated from the distribu- cell via stochastic size-dependent partitioning at division. tionϕ ˜(∆) of the noise term ∆ in (19) viaγ ˜(∆) = bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

5

R ∞ ϕ˜(∆)/( ∆ duϕ˜(u)) and setting γ(s, s0) =γ ˜(s − as0), which Cell size distribution gives the correct division-size distribution ϕ(sd|s0) =ϕ ˜(sd − as ) as expected. The model generalises the sizer (a = 0) 0 The evolution of the size distribution Π(s, s0) is obtained to concerted cell size controls such as the adder (a = 1) and by summing Eqs. (21) over all possible x, which yields: timer-like (2 > a > 1) models46,47,52. In the following, we will refer to CVϕ[∆] as the size-control noise.  ∂  α + αs + αsγ(s, s ) Π(s, s ) = 0 (23a) b. Division errors. After division, size is partitioned ∂s 0 0 between cells and the birth size of the two daughter cells is 0 00 s0Π(s0, s0) = obtained from s = θsd and s = (1 − θ)sd where θ is the 0 0 0 inherited size fraction, a random variable between 0 and 1 R ∞ 0R s 0 0 0 0 0 0 0 2 ds ds0 B(s0|s )s γ(s , s0)Π(s , s0). (23b) with distributionπ ¯(θ) (see Box 1). This can be modelled 0 0 using the division kernel Eqs. (23) can be solved analytically Z 1 0 0 2 1 B(s0|s ) = dθ π(θ)δ(θ − s0/s ), Π(s, s0) = ψbw(s0)Φ(s|s0) , (24) 0 Z s2 1 1 where π(θ) = 2 π¯(θ) + 2 π¯(1 − θ) including the case of asym- where ψbw(s0) is the birth size distribution in a backward s metric division. We will refer to CVπ[θ] as the division error 48 R 0 0 lineage (see for details), Φ(s|s0) = exp(− s ds γ(s , s0)) about the centre E [θ] = 1 . 0 π 2 is the probability that a cell born at size s0 has not divided c. Molecule partitioning at cell division. The partition- −1 before reaching size s, and Z = Eψ [s ] is a normalising 0 bw 0 ing kernel B(x|x , θ) denotes the probability that a cell in- constant. herits x molecules from a total of x0 molecules from its mother and this probability depends on the daughter’s in- herited size fraction θ. We assume that cells are sufficiently Molecule number distributions for cells of a certain size well mixed and each molecule is partitioned independently with probability θ such that the division kernel is binomial The conditional molecule number distribution Π(x|s, s0) N  0  gives the probability to find the molecule numbers x in a 0 Y xi x x0 −x B(x|x , θ) = θ i (1 − θ) i i . (20) cell of size s that was born at size s0 and satisfies x i=1 i ∂ αs Π(x|s, s ) = (s)Π(x|s, s ), (25a) To make analytical progress, we assume that the pop- ∂s 0 Q 0 ulation establishes a long-term stationary distribution Π(x|s , s ) = Π(s, s , x) characterising the fraction of cells with molecule 0 0 0 0 numbers x, cell size s and birth size s that is invariant in X R ∞ 0R s 0 0 0 0 0 0 0 0 0 0 ds 0 ds0 B(x|x , s0/s )ρ(s , s0|s0)Π(x |s , s0). αt time. To this end, we let n(τ, s, x, t) ∝ e Π(s, τ, x) and x0 change variables from cell age τ to birth size s0 such that (25b) −1 Π(s, s0, x) = (αs) Π(s, τ = ln(s/s0)/α, x). We find that this transformation reduces the PDE (17) to an integro- Eqs. (25) follow directly from substituting Eq. (22) into (21) ODE: and using (18) and (23). The solution of these equations depends implicitly on the ancestral cell size distribution ρ,  ∂  α + αs + αsγ(s, s0) Π(s, s0, x) = Q(s)Π(s, s0, x) ∂s 0 0 1 s0 0 0 0 0 ρ(s , s0|s0) = 0 B(s0|s )ϕ(s |s0)ψbw(s0), (25c) (21a) ψbw(s0) s s0Π(s0, s0, x) = that gives the probability of a cell born at size s0 having an X ∞ s0 0 0 R 0R 0 0 0 0 0 0 0 0 0 0 ancestor with division size s and birth size s0. The main 2 0 ds 0 ds0 B(x|x , s0/s )B(s0|s )s γ(s , s0)Π(s , s0, x ). x0 difference between the molecule number distributions of the (21b) ABM and the EDM/ENM is the boundary condition at cell division, which as we shall see can have a significant effect We finally characterise the marginal cell size distribution on the reaction dynamics. Π(s, s0) and the conditional molecule number distribution Π(x|s, s0) via Bayes’ formula III. RESULTS Π(s, s , x) Π(s, s ) = P Π(s, s , x), Π(x|s, s ) = 0 , (22) 0 x 0 0 Π(s, s ) 0 We here introduce the concept of stochastic concentra- which together provide the full information about the pop- tion homeostasis (SCH) as a generalisation of concentra- ulation snapshot. tion homeostasis in deterministic systems (see Sec. II A 1) bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

6

FIG. 2. Distributions of CME and agent-based models agree for reaction networks with stochastic concentration homeostasis. (a-c) The EDM (solid lines, analytical solution53,54) agrees with agent-based simulations (shaded areas) for a range of gene expression models. Panels show (a) bursty transcription53 with transcription rate proportional to cell size, (b) bursty 54 54 , and (c) bursty transcription and translation with geometrically distributed bursts ms whose average is proportional to cell size s (see main text for details). (d) mRNA distributions simulated using the ABM (shaded areas) are shown for cells of sizes s = s0 (red), s = 1.5s0 (orange) and s = 2s0 (green), which agree with the effective dilution model (dots, Poisson distribution). (e) Simulated protein distributions (shaded areas) disagree with the effective dilution model (solid lines, solution in Ref.55). (f) Absolute error (`1) of the effective dilution model as a function of cell size for mRNA (teal) and protein (red) distributions. ABM simulations were obtained using the First-Division Algorithm (Box 1) assuming an adder model (a = 1) and parameters k0 = 10, kdm = 9, ktl = 100, α = 1. Cell cycle noise assumes gamma distributionϕ ˜(∆) with unit mean and CVϕ[∆] = 0.1, while division noise assumes symmetric beta distribution with CVπ[θ] = 0.01. to higher moments and distributions in stochastic reaction A. The effective dilution model is valid for reaction networks networks. SCH is a homeostatic condition for the distri- with stochastic concentration homeostasis bution p(x|s) of a size-dependent to be expressed as a mixture of Poisson random variables drawn Theorem1 (AppendixA) is a central result of our analysis from an underlying continuous stochastic concentration vec- and it states that if the EDM (4) satisfies SCH, i.e., Eq. (26) T tor κ = (κ1, κ2, . . . , κN ) that is statistically independent holds for ΠEDM(x|s), then its stationary solution is also a of s: solution of the ABM (25):

Π(x|s, s0) = ΠEDM(x|s), (27)

and the solution is independent of the birth size s0. Equiv- N alently (Appendix A), we can say that the EDM/ABM sat- Z xi Y (κis) p(x|s) = dκ χ(κ) e−sκi . (26) isfies SCH if the factorial-moment generating function is of x ! K i=1 i the form N X Y xi GEDM(z|s) = zi ΠEDM(x|s) = F (s(z − 1)), (28) x i=1 PN tiκi The fact that κ and its density χ(κ) are independent of s where F (t) = Eχ[e i=1 ], the moment-generating func- ensures concentration homeostasis in the stochastic sense. tion of the concentration vector, is cell-size (s) indepen- bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

7

FIG. 3. Comparing the statistics of the effective dilution and agent-based models. (a) Simple model of mRNA transcription and protein translation transcriptional size-scaling (13). (b) Mean mRNA (top) and protein levels (bottom) agree with the EDM (solid grey lines) and ABM simulations (blue dots). (c) mRNA statistics display unit Fano factor indicating Poisson statistics in agreement with EDM. (d) ABM simulations (dots, Box1) display non-monotonic cell size scaling of protein noise, which are predicted by the agent-based theory (solid red) but not by the EDM (solid grey). Parameters are k0 = 10, kdm = 10, ktl = 100, α = 1. Cell size control parameters are as in Fig. 2. dent. An interesting observation is that SCH implies that networks must either be proportional to cell size or include the mean numbers and coefficients of variation for cells of size-dependent random bursts ms whose burst distribution the same size s are given by satisfies SCH itself. We illustrate the predictive power of this result by demonstrating SCH for common gene expres- 2 1 2 sion models involving reactions of the form (30) and show EΠ[x|s] = sEχ[κ], CVΠ(x|s) = + CVχ(κ). (29) sEχ[κ] that the analytical solution of the chemical master equations agrees exactly with the agent-based models (Fig. 2a-c). Since κ is independent of s, SCH implies homeostasis of mRNA expression involving a two-state promoter53 the mean concentrations in (29) but concentration home- (Fig. 2a), ostasis on average does not necessarily imply SCH. The coefficients of variation coincide both for concentrations kon sk0 kdm and molecule numbers and have size-dependent and size- Doff )−−−−* Don −−→ Don + M,M −−→ ∅ k independent components. In the following, we provide ex- off amples of reaction networks for which SCH holds for all satisfies SCH for all parameter values since the network is values of the rate constants and demonstrate the validity of of the form (30) whenever the transcription rate is propor- the EDM by comparing its distribution solutions to ABM tional to cell size. The stochastic concentration variable is   simulations. distributed as κ ∼ k0 Beta kon , koff . It can be seen from (26) and (29) that when the EDM’s (kdm+α) (kdm+α) (kdm+α) Bursty protein expression (Fig. 2b) of a stable (non- stationary distribution is Poissonian with deterministic con- degrading) protein arising from a two-stage model of gene centration vector κ, this distribution satisfies SCH and expression can be modelled using stochastic bursts: hence is also a solution of the ABM. More generally,

SCH can be checked without solving for GEDM(z|s) (or k0 −→ ms × P. ΠEDM(x|s)). Assuming mass-action kinetics (6), for exam- ∅ ple, a sufficient condition for SCH is that the network con- According to (30) the model satisfies SCH for all param- sists entirely of mono-molecular reactions (see Appendix A) eter values when the burst distribution obeys SCH. This of the form: is the case for geometrically distributed bursts ms whose D(t)s D(t) mean is proportional to cell size, E[m |s] = bs. It can −−−→ S , or −−−→ m × S , s ∅ 1 ∅ s 1 be shown that the stochastic concentration variable follows s or ∅ −→ S1, or S1 −→ ∅, or S1 −→ S2, (30) κ ∼ Gamma(k0/α, b). Similarly, bursty protein expression from a two-state pro- where S1 and S2 denote any pair of species that are parti- moter arising from a three-stage gene expression model54,56 tioned at cell division and D(t) is a exogenous stationary (Fig. 2c), stochastic process modelling a genetic state which is copied but not partitioned at cell division and does not scale with kon k0 Doff )−−−−* Don −→ Don + ms × P, cell size. The propensities of zero-order reactions in SCH koff bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

8 also satisfies SCH for geometrically distributed that under the mass-action scaling assumption (6) the mean bursts (with mean bs) but the concentration vari- concentrations of the ABM and EDM agree approximately, 57 a− able is doubly stochastic κ ∼ Gamma( α , br), and they satisfy concentration homeostasis on average. This a− a++kon+koff can be seen by multiplying Eq. (25) by x and averaging, r ∼ Beta( α , α ) where a± = koff + kon + p 2 which yields ODEs for the mean numbers: k0 ± (koff + kon + k0) − 4k0kon. We observe excellent numerical agreement between the ABM simulations and R ∂ X analytical EDM solutions in all these cases validating our αs E [x|s, s ] = (ν+ − ν−)E [w (x)|s, s ], (31a) ∂s Π 0 r r Π r 0 theoretical predictions (Fig. 2a-c). r=1 A complex example of the reactions (30) that obeys SCH for all parameter values but yet defies analytical solution is and the boundary conditions

gij   Di −−→ Dj ∀i, j = 1,...,ND s0 0 0 EΠ[x|s0, s0] = Eρ 0 EΠ[x|s , s0] s0 . (31b) sti s Di −−→ Di + M1, ∀i = 1,...,ND

k1 k2 kS δi Unfortunately, Eqs. (31) are not necessarily closed since the M1 −→ M2 −→ ... −−→ MS,Mi −→ ∅ ∀i = 1,...,NM equation for the mean may involve higher order moments where the exogenous genetic states Di undergo switching when wr(x) depends nonlinearly on x and we need to re- with rates gij but are not partitioned at cell division, tran- sort to approximations. Analogously to the linear noise ap- scription rates sti are assumed to be proportional to cell proximation, we set E[x|s, s0] = sX¯ and EΠ[wr(x)|s, s0] ≈ size s, and processing of transcripts Mi follows a multi-step sfr(X¯) and insert the resulting expression into Eqs. (31). It process with rates ki and degradation with rates δi. For follows that X¯ is independent of size and satisfies the rate example, it can be checked that for NM = 1 we recover the equations (2). We conclude that, for mass-action kinetics, 2m-multistate model58 as a special case whose EDM has a the EDM agrees exactly with the ABM on average for net- factorial-moment generating function (compare Eq. (7) in58 works with linear propensities and approximately for large with (28)) satisfies SCH precisely when the transcription cell size for nonlinear reaction networks. rates are proportional to cell size. On the other hand, discrepancies between the EDM and ABM solutions will be apparent when reactions do not obey C. Scaling of fluctuations with size in individual cells SCH. To illustrate this point, we return to the gene ex- manifests the breakdown of the EDM lacking SCH pression model with transcriptional size-scaling and explicit protein translation reaction (13). Note that in the EDM Next we investigate the scaling of fluctuations with cell extra reactions are being added for the dilution of mR- size. Under the linear noise approximation the covariance NAs and proteins, while for the ABM proteins are diluted matrix Σ(s, s0) = CovΠ[x|s, s0] evolves according to through growth and divisions. Using our condition (28), it is straight-forward to verify that the Poissonian mRNA ∂ αs Σ(s, s ) = J Σ(s, s ) + Σ(s, s )J T + sD(X¯), (32a) distributions of the EDM coincide exactly with the distri- ∂s 0 0 0 butions of the ABM (Fig. 2d). However, this condition is ¯ ¯ not met for the protein distribution since the translation re- where J (X) and D(X) are the Jacobian and diffusion ma- action is not a monomolecular reaction of the form (30). To trices defined after Eq. (11). To make analytical progress we demonstrate the breakdown of the EDM, we compare the assume for now that cell division is deterministic (CVϕ[∆] = analytical steady state distributions obtained by Bokes et CVπ[θ] = 0), which implies the following boundary condi- al.55 against ABM simulations at various cell sizes (Fig. 2e). tion We observe that the error of the EDM (as quantified by the 4Σ(s0, s0) = 2s0diag(X¯) + Σ(2s0, s0). (32b) `1-distance of the two distributions, Fig. 2f) is pronounced both for newborn and dividing cells. The remainder of this The first term is due to binomial partitioning of molecules article is dedicated to investigate the sources and conse- and the second stems from gene expression noise at divi- quences of these discrepancies. sion. It is implicit in the deterministic division assumption (CVϕ[∆] = CVπ[θ] = 0) that the birth size s0 across cells is B. The EDM approximates the mean concentrations of fixed and that the size distribution in Eq. (24) reduces to ABMs lacking SCH 2s Π(s) = Π(s|s ) = 0 (33) 0 s2 SCH provides a general criterion with which to probe the validity of the EDM probability distributions. In practice, for s0 ≤ s ≤ 2s0 and zero otherwise, in agreement with however, approximate agreement of the first few moments, previous results59,60. Similarly, the ancestral distribution 0 0 0 0 e.g., mean and variances, often suffices. Here, we establish (25c) reduces to ρ(s , s0|s0) = δ(s − 2s0)δ(s0 − s0). bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

9

FIG. 4. Effect of noise in cell size control and division on gene expression noise in single cells. (a-d) Protein noise as a function of cell size s for various noise levels in added size CV[∆]. For comparison, the prediction without cell cycle noise (dashed black line, Eq. (36)) and the cell size distributions (shaded grey) are shown. (e-h) Same as (a-d) but with division noise affecting the inherited size fraction CV[θ]. Analytical predictions (solid lines, Eq. (38a) with (36) and (38b)) and ABM simulations (dots) using the First-Division Algorithm (Box 1) are shown. Gene expression model and all other parameters are as given in Fig. 3. Added size ∆ assumes a gamma distribution with unit mean and CV[∆] = 0.01 (e-f) while division errors θ followed a symmetric beta distribution with CV[θ] = 0.01 (a-d).

Eqs. (32) can be solved in closed form using the eigende- Using Eq. (32) we find that the cell size dependent fluc- composition of the Jacobian J . The solution to (32a) that tuations satisfy respects the boundary condition (32b) is (Appendix D) VarΠ[m|s, s0] = EΠ[m|s], † X s uˆiuˆj  δ δ  sbk0 s0  2 Σ(s, s0) = ∗× α − λ − λ CovΠ[m, p|s, s0] = 1 + δ+1 , ij i j αδ s 1 − 2 ∗ " λi+λj # D˜ + X˜ (λ + λ∗ − α) 1− VarΠ[p|s, s0] = EΠ[p|s]× ˜ ij ij i j s0  α Dij + ∗ , (34)  δ+1  λi+λ  δ j −1 s s0 4bδ s0 b 2 2 α − 2 1 + 2b − + . s 3(δ − 1) s δ − 1 (2δ+1 − 1) where † denotes the conjugate-transpose and we defined the (36) matrices D˜ = U −1DU −†, X˜ = U −1diag(X¯)U −† and U = (ˆu1,..., uˆN ) whose columns are the eigenvectors of J such We note that the mRNA variance of the ABM agrees pre- that U −1J U = diag(λ). cisely with the EDM (Fig. 3c). The agreement is a direct We demonstrate the implications of this result using the consequence of SCH exhibited by the mRNA transcription gene expression example with transcriptional size-scaling and degradation reactions (cf. (30)). However, the expres- and explicit translation reaction (13). The mean of mRNA sions for the predicted mRNA-protein covariance and pro- numbers m and protein numbers p are tein variance disagree with their EDM counterpart since the reactions involving the protein violate SCH. To explore this k0 bk0 EΠ[m|s] = s ,EΠ[p|s] = s , (35) dependence, we compare the corresponding coefficients of αδ α variation of both models (Fig. 3d). The EDM overestimates where the constants are defined in Eq. (16). These expres- cell-to-cell variation of small cells but underestimates it for sions hold both for the EDM and the ABM, and they exhibit large cells. Moreover, the EDM’s coefficient of variation de- concentration homeostasis on average as shown in the previ- creases monotonically with cell size, but this is not the case ous section. The exact agreement between EDM and ABM for the ABM. is also confirmed by ABM simulations (Fig. 3b). Strikingly, the coefficient of variation peaks as cells bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

10 progress through the cell cycle (Fig. 3d, solid red line), cells. The interpretation of this conditional expectation is which is in excellent agreement with the ABM simulations that small cells were born with sizes smaller than average (blue dots) but is not seen in the EDM (solid grey). This while larger cells were born with sizes above average de- can be seen directly from Eqs. (36) for which protein fluc- pending on their size control (Fig.6 in AppendixC). The parameters in Eq. (38b) are given by the mean birth sizes ¯0 tuations can be approximated in the limit of fast mRNA 2 degradation (δ  1) as and variance σ in a backward lineage tracing the ancestors of a random cell in the population (see Ref.48 for details):    2 1 s0 4 (2 − a) 1 + CV2 CVABM[p|s, s0] ≈ 1 + b 2 − , (37) s¯ = θ , EΠ[p|s] s 3 0 2 2 − a(1 + CVθ) 2 8b CV2 1 + 3CV2 2 − a(1 + CV2) + 4CV2 1 − CV2 which has a maximum at a cell size of s = s0 as 2 2 ∆ θ θ θ θ 3(2b+1) σ =s ¯0 . 2 2 2 2  confirmed by agent-based simulations (Fig. 3d). Depending CVθ + 1 4 − a (1 + 3CVθ) (38c) on the burst size b, the peak shifts from s = s0 for b = 3/2 to s = 4/3s0 for b  1. The qualitative difference between Eqs. (38) provide a closed form approximation of the cell size the scalings of gene expression noise with cell size of EDMs dependence of any given moment accurate to order O(σ3). and ABMs manifests the breakdown of the EDM, which To verify the accuracy of proposed approximation, we test is observed both for concentrations and molecule numbers the theory for various strengths of size control noise and since their coefficients of variation coincide when considering division errors (Fig.4). We observe that increasing noise cells of the same size. results in the monotonic decrease of gene expression noise with cell size (Fig. 4a-d) in good agreement with ABM sim- ulations, even for large cell size fluctuations. We further Effect of cell size control on gene expression dynamics ask about the effects of partitioning noise, which shows a similar dependence but agrees less well with the ABM sim- Next, we ask how fluctuations in the cell size control af- ulations for cells smaller than the mean birth size (Fig. 4e- fect gene expression noise. It may be intuitively expected h), presumably since the effect of large variability in birth that noise in cell size control and division errors cause vari- sizes is not captured in our approximation. Nevertheless, able birth sizes, variable division times and hence noisy ex- the present approximation qualitatively captures the over- pression levels. mRNA fluctuations in the gene expression all cell size dependence of the ABM simulations (Fig. 4). model with transcriptional size-scaling (13) obey SCH and Our findings confirm that birth size variation contributes hence are unaffected by these noise sources. The effect on significantly to the cell size dependence of gene expression protein noise remains yet to be elucidated. noise of networks lacking SCH. To this end, we assume small birth-size variations and approximate the actual birth size with an averaged estimate EΠ[s0|s] of the retrospective birth size for a cell of size s. D. ENMs provide surprisingly accurate approximations of The covariance matrix (or any other moment) can then be total noise in ABMs lacking SCH approximated as We go on to compare the ENM introduced in Sec. II with Σ(s) = EΠ[Σ(s, s0)|s] ≈ Σ(s, EΠ[s0|s])). (38a) the ABM. In contrast to the EDM, the ENM predicts the This simplification can formally be justified through a total noise statistics including the variability introduced by the cell size distribution. It is clear that the ENM agrees saddle-point approximation as the joint distribution Π(s, s0) has a maximum at Π(s, E [s |s])). Generally no analytical exactly with the ABM whenever the network obeys SCH. In Π 0 particular, the marginal factorial-moment generating func- expression of EΠ[s0|s] can be derived from Eq. (24) in the h i QN xi presence of cell size control fluctuations, however, and we tion of the ABM’s molecule numbers G(z) = EΠ i=1 zi approximate EΠ[s0|s] by a matched asymptotic expansion (irrespective of cell size) follows from Eq. (26) as (Appendix C): " N # Y (s−s¯ )2 q 2 − 0 G(z) = Eχ K(κi(zi − 1)) , (39) e 2σ2 π 2 i=1 EΠ[s0|s] ≈ s¯0 − σ   + 2aσ γ(s − as¯0), (38b) 1 + erf s√−s¯0 | {z } 2σ large cells when the concentration distribution χ has been identified | {z } (as we did for the models in Sec. IIIA) and the moment- small cells ts generating function K(t)= EΠ[e ] of the cell size distribu- which holds for the linear cell size control model (19). The tion (24) is known. When SCH does not hold, the ABM first term is the average birth size in the absence of cell size statistics can in principle be obtained through integrat- control fluctuations, the second term denotes the contribu- ing Eq. (34) against the size distribution Π(s, s0). Specif- tions from small cells, while the third term stems from large ically, denoting molecule numbers by x and concentrations bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

11

FIG. 5. The extrinsic noise model approximates gene expression noise with size control and division errors. (a) Scaling of mRNA concentration noise with mean concentrations for various noise levels in added size CV[∆] and partition noise CV[θ] when the transcription rate k0 is varied (top). Corresponding scaling is shown for mRNA numbers (bottom). Analytical predictions of the ENM (solid lines, Eq. (8) with (10) and (12)) and ABM simulations using the First-Division Algorithm (dots, Box1) are shown. The inset shows the relative error in CV of the EDM compared to ABM simulations [100% × (CVENM/CVABM − 1)]. (b) Scaling of protein noise with mean protein concentration (top) and numbers (bottom) when the translation rate ktl is varied. (c) Same as (b) but varying the transcription rate ktl. For comparison, the ABM predictions without cell cycle noise are shown (dashed black lines, Eq. (41)) and the error bounds of 2% predicted by the theory (solid grey). See caption of Fig.3 for the remaining parameters.

x by X = s , as before, we have trol fluctuations and division errors, both for mRNA con- centrations and numbers, which validates our theoretical 2 CovΠ[X] = EΠ[Σ(s, s0)/s ], predictions that the mRNA distribution satisfies SCH and T hence the ENM is exact. CovΠ[x] = EΠ[Σ(s, s0)] + VarΠ(s)X¯X¯ , (40) where Σ(s, s0) is the size-dependent covariance matrix dis- cussed in Sec. IIIC. To illustrate this dependence, we consider the gene ex- pression model with transcriptional size-scaling (13) and integrate Eq. (36) numerically against the size distribution (24). We observe that the mRNA noise-mean relationship However, the protein noise-mean relationships of the of the ABM follows exactly the ENM predictions when the ABM and ENM differ (Fig. 5b). The discrepancy, albeit mean varies through the transcription rate (Fig. 5a). This small, exists even for deterministic divisions (CVϕ[∆] = agreement is confirmed for various strengths of cell size con- CVπ[θ] = 0) for which the averages (40) over the size dis- bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

12 tribution can be carried out analytically and result in: Our theory shows that the newly defined stochastic con- centration homeostasis (SCH) (cf. Sec. IIIA and Theo-  δ−1   24(2 −1)  rem 1) provides a necessary and sufficient condition for 2b 12 − + 13δ (2δ+1−1)(δ−1) the stationary distributions of the chemical master equation 2 1   CVABM[P ] = 1 +  , (EDMs and ENMs) to agree exactly with the snapshot dis- ΩP P¯  27(δ + 2)  tributions of detailed agent-based models (ABM). A broad

2 2 class of gene networks (30), involving mono-molecular reac- CVABM[p]=CVΠ[s]+ tions, multi-state promoters and bursting, satisfy SCH irre-  δ−1   2(2 −1)  spective of network parameters when the reaction rates scale 2b δ+1 + δ(ln(8) − 1) − 1 1  (2 −1)(δ−1)  with cell size according to the law of mass-action. SCH is 1 +  , (41) however not restricted to this particular class of network and Ω P¯  3δ ln(2)  p can generally be checked on a case-by-case basis using the generating function equations, which can be accomplished without solving the chemical master equation analytically where Ω = E [s−1]−1 and Ω = E [s] and δ, b are de- P Π p Π (see Appendix A). fined as in Eq. (16). It can be verified by optimising (41) over δ that the ENM underestimates ABM noise of pro- SCH guarantees that gene expression distributions for tein numbers by at most 2%, while it overestimates noise in cells of a given size are entirely independent of extrinsic protein concentrations by the same amount. The difference noise sources affecting birth size such as cell size control and between the ENM and ABM predictions increases with cell division noise. They thus reveal whether a network embed- size control noise for concentration measures but appears to ded in a growing cell can be insulated against such noise be practically independent of cell size control noise for pro- sources, an important feature that can guide the design of tein number fluctuations (Fig. 5b, insets). The protein num- synthetic circuits. ber noise, but not concentration noise, exhibits an extrinsic Nevertheless, most gene regulatory networks of interest do 2 not obey SCH. To address this issue, we developed the lin- noise floor (CVΠ[s] in (41)) for large mean numbers due to extrinsic cell size variability across the population and this ear noise approximation for ABMs lacking SCH embedded noise floor increases with noise in size control (CV [∆]) and in growing and dividing cells. The theory provides ODEs for ϕ the mean molecule numbers (31) and their covariances (32), division errors (CVπ[θ]) as it is also predicted by the ENM (Fig. 5b). which – unlike the conventional linear noise approximation describing EDMs6,43 – describe their evolution across sizes Similar conclusions hold for the noise-mean relation- and features a boundary condition describing the stochastic ship when the mean is varied through translation rate partitioning of molecules at cell division. We showed that, (Fig. 5c) but there appears an additional intrinsic noise when the reactions follow mass-action size-scaling, the mean floor due to stochastic bursting (Fig. 5c and Eq. (41)) that is concentrations of EDMs and ABMs agree, because they present both in the protein concentration and number noise. exhibit concentration homeostasis (Sec. IIIB). The theory This phenomenon is in qualitative agreement with previous further provides closed-form analytical expressions for the findings27,35,40 and similarly predicted by the ENM (15). covariance matrix of gene expression fluctuations in the ab- Presumably, the better quantitative agreement for molecule sence of SCH. We note that like the conventional linear noise numbers as compared to concentrations (Fig. 5b and c) is approximation, the linear noise approximation for ABMs is due the fact that the ENM and ABM predictions are dom- exact for linear reaction networks but represents an approx- inated by extrinsic noise, which has the same effect in both imation for nonlinear reaction networks (Sec. IIIC). models. Our observations suggest that, surprisingly, the ENM provides much more accurate approximations of the While the EDM always predicts birth-size-independent ABM statistics than the EDM. noise, the ABM’s covariance matrices generally depend on it (Sec. IIIC), both for concentrations and molecule numbers. This means that, unlike in SCH conditions, size-control noise and division errors can propagate to gene expression IV. DISCUSSION levels, and we unveiled quantitative and qualitative differ- ences between EDMs and ABMs regarding the dependence We presented an agent-based framework to study gene ex- of expression noise on cell size. Such differences prevail pression noise coupled to cell size dynamics across growing even for relatively simple gene networks involving protein and dividing cell populations. The framework consists of expression (see (13)) for which our linear noise approxima- an exact algorithm for simulating the stochastic dynamics tion readily provides exact expressions for mean and noise of dividing cells (Box 1), which generalises previous algo- statistics. rithms for isolated lineages4,17,32,61–63 towards growing cell Despite these discrepancies, we found that the ENM populations, and a master equation framework (Sec. II) that of these simple gene expression models provides surpris- exactly characterises the snapshot-distribution of gene ex- ingly accurate total noise estimates (Sec. IIID). In fact, we pression and cell size across such a agent-based population. showed analytically that the ENM (and EDM) of bursty bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

13 production with translational size-scaling agrees exactly els as approximations of the agent-based dynamics, and thus with the ABM since it obeys SCH. For transcriptional size- they significantly extend the scope of state-of-the-art mas- scaling, which implies the absence of SCH, the ENM devi- ter equation methods to a broad range of single-cell analyses ates at most a few percent from ABM’s total protein noise in growing cell populations. prediction. To resolve such small differences experimentally, one would need to probe on the order of 10, 000 cells for mea- suring the squared coefficient of variation accurate to three CODE AVAILABILITY leading digits (assuming sampling errors inversely propor- tional to cell number), which is achievable only with high- An implementation of the First-Division Algorithm throughput techniques. (Box1) in Julia is available at github.com/pthomaslab/ An outstanding question is whether the good agreement fda. we observed is specific to the particular model or parameter values we have chosen or whether the ENM is more gener- ally valid. We have made an initial step in this direction AUTHOR CONTRIBUTIONS by providing closed-form expressions for the ABM’s linear noise approximation of any single-species reaction network PT and VS designed the study and interpreted the results. with deterministic size-distribution (AppendixD). These re- PT developed the theory and analysed the data. sults demonstrate that the ENM overestimates the ABM’s coefficient of variation of concentrations by at most 8% but underestimates it by at most 2%, and vice versa for FUNDING molecule numbers, and these bounds hold independently of the choice of parameters. This suggests that ENMs could be surprisingly accurate approximation of ABMs. Other This work has been supported by a UKRI Future Leaders effective models of bursty protein production without any Fellowship (MR/T018429/1) to PT and the EPSRC Centre size-scaling as proposed in64 cannot obey SCH since they for Mathematics of Precision Healthcare (EP/N014529/1). ignore cell size and generally produce larger errors than the ENM even for deterministic cell cycles. A limitation of our study is that we assumed the validity REFERENCES of the linear noise approximation for the noise statistics of networks lacking SCH. Mean and covariance of the linear 1D. J. Kiviet, P. Nghe, N. Walker, S. Boulineau, V. Sunderlikova, and noise approximation are exact for linear reaction networks, S. J. Tans, “Stochasticity of metabolism and growth at the single-cell as those we have studied here, but it represents an approx- level,” Nature 514, 376–379 (2014). imation for networks with nonlinear propensities valid in 2F. J. Bruggeman and B. Teusink, “Living with noise: on the propa- the limit of large molecule numbers. To improve the esti- gation of noise from molecules to phenotype and fitness,” Curr Opin Syst Biol 8, 144–150 (2018). mates of our theory, one could consider higher-order terms 3 44 P. Thomas, G. Terradot, V. Danos, and A. Y. Weiße, “Sources, in the , resort to moment-closure propagation and consequences of stochasticity in cellular growth,” 65 66 approximations , or to compute moment bounds for non- Nat Commun 9, 1–11 (2018). linear reaction networks. 4F. Bertaux, S. Marguerat, and V. Shahrezaei, “Division rate, cell Another limitation is that we neglected growth rate vari- size and proteome allocation: impact on gene expression noise and implications for the dynamics of genetic circuits,” Royal Soc Open ability, which is a significant source of noise at the single- Sci 5, 172234 (2018). 47 cell level . It would be interesting to include these features 5C. A. Vargas-Garcia, K. R. Ghusinga, and A. Singh, “Cell size con- in our ABMs, compare them to the effective models, and trol and gene expression homeostasis in single-cells,” Curr Opin Syst investigate whether SCH can be generalised to this case. Biol 8, 109–116 (2018). 6 Previous studies3 have investigated the dependence of gene N. G. Van Kampen, Stochastic processes in physics and chemistry, Vol. 1 (Elsevier, 1992). expression noise on growth rate dynamics in isolated lin- 7D. T. Gillespie, “Exact stochastic simulation of coupled chemical eages using small noise approximations similar to the one reactions,” J Phys Chem 81, 2340–2361 (1977). used here. Nevertheless, it may be expected that selection 8D. T. Gillespie, “A rigorous derivation of the chemical master equa- plays a pronounced role in populations where cells compete tion,” Physica A 188, 404–425 (1992). 9 for growth unlike in isolated lineages, which in turn may H. H. McAdams and A. Arkin, “Stochastic mechanisms in gene ex- 38,67,68 pression,” Proc Natl Acad Sci 94, 814–819 (1997). lead to significant deviations of ENMs from ABMs 10M. Thattai and A. Van Oudenaarden, “Intrinsic noise in gene regu- that we have not studied here. latory networks,” Proc Natl Acad Sci 98, 8614–8619 (2001). In summary, we proposed SCH as a general condition 11P. S. Swain, M. B. Elowitz, and E. D. Siggia, “Intrinsic and extrinsic for exactness of EDMs. In the absence of SCH, we found contributions to stochasticity in gene expression,” Proc Natl Acad Sci 99, 12795–12800 (2002). that despite qualitative differences in the predictions of 12X. Lei, W. Tian, H. Zhu, T. Chen, and P. Ao, “Biological sources EDMs, ENMs closely approximate the total noise statistics of intrinsic and extrinsic noise in cI expression of lysogenic phage of ABMs. Our results reinstate the validity of effective mod- lambda,” Sci Rep 5, 13597 (2015). bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

14

13M. K. Tonn, P. Thomas, M. Barahona, and D. A. Oyarz´un, terial cell size can reveal strategies of gene expression,” Phys Biol “Stochastic modelling reveals mechanisms of metabolic heterogene- 17, 045002 (2020). ity,” Commun Biol 2, 1–9 (2019). 35R. Perez-Carrasco, C. Beentjes, and R. Grima, “Effects of cell cycle 14B. P. Ingalls, Mathematical modeling in systems biology: an intro- variability on lineage and population measurements of messenger duction (MIT press, 2013). RNA abundance,” J Royal Soc Interface 17, 20200360 (2020). 15D. A. Charlebois and G. Bal´azsi,“Modeling cell population dynam- 36O. G. Berg, “A model for the statistical fluctuations of protein num- ics,” In Silico Biol 13, 21–39 (2019). bers in a microbial population,” J Theor Biol 71, 587–603 (1978). 16J. Lin and A. Amir, “Homeostasis of protein and mRNA concentra- 37N. Brenner, K. Farkash, and E. Braun, “Dynamics of protein distri- tions in growing cells,” Nat Commun 9, 1–11 (2018). butions in cell populations,” Phys Biol 3, 172 (2006). 17X.-M. Sun, A. Bowman, M. Priestman, F. Bertaux, A. Martinez- 38P. Thomas, “Making sense of snapshot data: ergodic principle for Segura, W. Tang, C. Whilding, D. Dormann, V. Shahrezaei, and clonal cell populations,” J Royal Soc Interface 14, 20170467 (2017). S. Marguerat, “Size-dependent increase in RNA Polymerase II ini- 39P. Thomas, “Intrinsic and extrinsic noise of gene expression in lin- tiation rates mediates gene expression scaling with cell size,” Curr eage trees,” Sci Rep 9, 474 (2019). Biol 30, 1217–1230 (2020). 40J. Jedrak and A. Ochab-Marcinek, “Contributions to the ‘noise floor’ 18A. Hilfinger and J. Paulsson, “Separating intrinsic from extrinsic in gene expression in a population of dividing cells,” Sci Rep 10, 1– fluctuations in dynamic biological systems,” Proc Natl Acad Sci 108, 13 (2020). 12167–12172 (2011). 41R. Dessalles, V. Fromion, and P. Robert, “Models of protein pro- 19C. G. Bowsher and P. S. Swain, “Identifying sources of variation and duction along the cell cycle: An investigation of possible sources of the flow of information in biochemical networks,” Proc Natl Acad noise,” Plos One 15, e0226016 (2020). Sci 109, E1320–E1328 (2012). 42T. G. Kurtz, “Strong approximation theorems for density dependent 20H. Kempe, A. Schwabe, F. Cr´emazy, P. J. Verschure, and F. J. Markov chains,” Stoch Process Their Appl 6, 223–240 (1978). Bruggeman, “The volumes and transcript counts of single cells reveal 43J. Elf and M. Ehrenberg, “Fast evaluation of fluctuations in bio- concentration homeostasis and capture biological noise,” Mol Biol chemical networks with the linear noise approximation,” Genome Cell 26, 797–804 (2015). Res 13, 2475–2484 (2003). 21T. Blasi, F. Buettner, M. K. Strasser, C. Marr, and F. J. Theis, “cg- 44P. Thomas, H. Matuschek, and R. Grima, “Computation of biochem- correct: a method to correct for confounding cell–cell variation due ical pathway fluctuations beyond the linear noise approximation us- to cell growth in single-cell transcriptomics,” Phys Biol 14, 036001 ing iNA,” in 2012 IEEE International Conference on Bioinformatics (2017). and Biomedicine (IEEE, 2012) pp. 1–5. 22N. Nordholt, J. Van Heerden, R. Kort, and F. J. Bruggeman, “Ef- 45M. Voliotis, P. Thomas, R. Grima, and C. G. Bowsher, “Stochas- fects of growth rate and promoter activity on single-cell protein ex- tic simulation of biomolecular networks in dynamic environments,” pression,” Sci Rep 7, 1–11 (2017). PLoS Comput Biol 12, e1004923 (2016). 23R. Kafri, J. Levy, M. B. Ginzberg, S. Oh, G. Lahav, and M. W. 46M. Osella, E. Nugent, and M. C. Lagomarsino, “Concerted control Kirschner, “Dynamics extracted from fixed cells reveal feedback link- of Escherichia coli cell division,” Proc Natl Acad Sci 111, 3431–3435 ing cell growth to cell cycle,” Nature 494, 480–483 (2013). (2014). 24K. Kuritz, D. St¨ohr, D. S. Maichl, N. Pollak, M. Rehm, and 47S. Taheri-Araghi, S. Bradde, J. T. Sauls, N. S. Hill, P. A. Levin, F. Allg¨ower, “Reconstructing temporal and spatial dynamics from J. Paulsson, M. Vergassola, and S. Jun, “Cell-size control and home- single-cell pseudotime using prior knowledge of real scale cell densi- ostasis in bacteria,” Curr Biol 25, 385–391 (2015). ties,” Sci Rep 10, 1–10 (2020). 48P. Thomas, “Analysis of cell size homeostasis at the single-cell and 25C. Zopf, K. Quinn, J. Zeidman, and N. Maheshri, “Cell-cycle depen- population level,” Front Phys 6, 64 (2018). dence of transcription dominates noise in gene expression,” PLoS 49B. M. Martins, A. K. Tooke, P. Thomas, and J. C. Locke, “Cell size Comput Biol 9, e1003161 (2013). control driven by the circadian clock and environment in cyanobac- 26N. Walker, P. Nghe, and S. J. Tans, “Generation and filtering of teria,” Proc Natl Acad Sci 115, E11415–E11424 (2018). gene expression noise by the bacterial cell cycle,” BMC Biol 14, 11 50Y. Tanouchi, A. Pai, H. Park, S. Huang, R. Stamatov, N. E. Buchler, (2016). and L. You, “A noisy linear map underlies oscillations in cell size 27A. Schwabe and F. J. Bruggeman, “Contributions of cell growth and and gene expression in bacteria,” Nature 523, 357–360 (2015). biochemical reactions to nongenetic variability of cells,” Biophys J 51M. Priestman, P. Thomas, B. D. Robertson, and V. Shahrezaei, 107, 301–313 (2014). “Mycobacteria modify their cell size control under sub-optimal car- 28D. Antunes and A. Singh, “Quantifying gene expression variability bon sources,” Front Cell Develop Biol 5, 64 (2017). arising from randomness in cell division times,” J Math Biol 71, 52J. T. Sauls, D. Li, and S. Jun, “Adder and a coarse-grained approach 437–463 (2015). to cell size homeostasis in bacteria,” Curr Opin Cell Biol 38, 38–44 29M. Soltani, C. A. Vargas-Garcia, D. Antunes, and A. Singh, “Inter- (2016). cellular variability in protein levels from stochastic expression and 53J. Peccoud and B. Ycart, “Markovian modeling of gene-product syn- noisy cell cycle processes,” PLoS Comput Biol 12, e1004972 (2016). thesis,” Theor Popul Biol 48, 222–234 (1995). 30I. G. Johnston, B. Gaal, R. P. das Neves, T. Enver, F. J. Iborra, 54V. Shahrezaei and P. S. Swain, “Analytical distributions for stochas- and N. S. Jones, “Mitochondrial variability as a source of extrinsic tic gene expression,” Proc Natl Acad Sci 105, 17256–17261 (2008). cellular noise,” PLoS Comput Biol 8, e1002416 (2012). 55P. Bokes, J. R. King, A. T. Wood, and M. Loose, “Exact and ap- 31R. Luo, L. Ye, C. Tao, and K. Wang, “Simulation of E. coli gene proximate distributions of protein and mRNA levels in the low-copy regulation including overlapping cell cycles, growth, division, time regime of gene expression,” J Math Biol 64, 829–854 (2012). delays and noise,” PloS One 8, e62380 (2013). 56K. Choudhary and A. Narang, “Urn models for stochastic gene ex- 32D. Gomez, R. Marathe, V. Bierbaum, and S. Klumpp, “Modeling pression yield intuitive insights into the probability distributions of stochastic gene expression in growing cells,” J Theor Biol 348, 1–11 single-cell mRNA and protein counts,” Phys Biol 17, 066001 (2020). 57 (2014). This follows from the fact that F (x) = 2F1(a, λ; γ + λ; bx) is the 33I. G. Johnston and N. S. Jones, “Closed-form stochastic solutions moment-generating function of κ ∼ Gamma(a, (1 + br)−1) and r ∼ p for non-equilibrium dynamics and inheritance of cellular components Beta(λ, γ). Letting EΠ[z |s] = F (s(z − 1)), a = a−/α, λ = a−α over many cell divisions,” Proc R Soc A 471, 20150050 (2015). and λ + γ = kon+koff then yields the factorial-moment generating 34 α C. A. Nieto, C. A. V. Garcia, C. Sanchez, J. C. Arias-Castro, and function solutions in54,56. J. M. Pedraza, “Correlation between protein concentration and bac- bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

15

58L. Ham, D. Schnoerr, R. D. Brackston, and M. P. Stumpf, “Exactly factorial-moment generating function solvable models of stochastic gene expression,” J Chem Phys 152, 144106 (2020). Z x x X x −s|κ| X x (κs) 59F. Maclean and R. Munson, “Some environmental factors affecting E [z |s] = z Π(x|s) = dκχ(κ)e z Π x! the length of Escherichia coli organisms in continuous cultures,” Mi- x∈X K x∈X crobiology 25, 17–27 (1961). s(z−1)T κ 60A. Koch and M. Schaechter, “A model for statistics of the cell divi- = Eχ[e ] = F (s(z − 1)) (A2) sion process,” Microbiology 29, 435–454 (1962). 61T. Lu, D. Volfson, L. Tsimring, and J. Hasty, “Cellular growth and and similarly the factorial moments division in the Gillespie algorithm,” IET Syst Biol 1, 121–128 (2004). 62 J. H. Van Heerden, H. Kempe, A. Doerr, T. Maarleveld, N. Nordholt,  x!  and F. J. Bruggeman, “Statistics and simulation of growth of single n x |n| n µn(s) = EΠ s = (∂z EΠ[z |s])z=1 = s Eχ[κ ], bacterial cells: illustrations with B. subtilis and E. coli,” Sci Rep 7, (x − n)! 1–11 (2017). (A3) 63C. Blanco, C. Nieto, C. Vargas, and J. Pedraza, “PyEcoLib: a python library for simulating E. coli stochastic size dynamics,” for a multi-index n. Since the factorial-moment generat- bioRxiv , 319152 (2020). 64C. H. Beentjes, R. Perez-Carrasco, and R. Grima, “Exact solution ing function uniquely determines the distribution, Eqs. (A2) of stochastic gene expression models with bursting, cell cycle and and (A3) may equivalently serve as definitions of SCH. Fur- replication dynamics,” Phys Rev E 101, 032403 (2020). thermore, when {x(s)}s∈[0,∞) is interpreted as a point pro- 65D. Schnoerr, G. Sanguinetti, and R. Grima, “Approximation and cesses along the size coordinate s, SCH emphasises the fact inference methods for stochastic biochemical kinetics—a tutorial re- that it is mixed Poisson process with stationary concentra- view,” J Phys A 50, 093001 (2017). 66J. Kuntz, P. Thomas, G.-B. Stan, and M. Barahona, “Bounding the tion vector κ. stationary distributions of the chemical master equation via mathe- 0 matical programming,” J Chem Phys 151, 034109 (2019). Theorem 1. Assume that the partitioning kernel B(x|x , θ) 67T. Nozoe, E. Kussell, and Y. Wakamoto, “Inferring fitness land- is binomial with probability θ given by the ratio of daughter scapes and selection on phenotypic states from single-cell genealog- birth size and mother division size. A stationary solution ical data,” PLoS Genet 13, e1006653 (2017). of the EDM (4), if it exists, is also a solution of the ABM 68M. Ciechonska, M. Sturrock, A. Grob, G. Larrouy-Maumus, V. Shahrezaei, and M. Isalan, “Ohm’s law for increasing fitness gene (25): expression with selection pressure,” bioRxiv , 693234 (2020). 69 C. Gardiner, Stochastic methods, Vol. 4 (Springer Berlin, 2009). Π(x|s, s0) = ΠEDM(x|s), (A4) 70J. Kuntz, P. Thomas, G.-B. Stan, and M. Barahona, “Stationary dis- tributions of continuous-time markov chains: a review of theory and if and only if the EDM’s solution (4) obeys SCH. truncation-based approximations,” SIAM Review 63, 3–64 (2021). The utility of the theorem is that SCH can be checked without solving the chemical master equation. We demon- strate this aspect for a general reaction network of Appendix A: Stochastic concentration homeostasis and the the form (1) with mass-action propensities wr(x, s) = − 1−|νr | x! validity of EDMs and ENMs s kr − , whose factorial-moment generating func- (x−νr )! tion (see Chapter 7 in69) obeys: Appendices A and B use multi-index notation. In brief, a ∂ multi-index is a N-tuple α = (α1, α2, . . . , αN ). One defines αs G(z|s, s0) α α1 α2 αN ∂s powers of a vector x via x = x1 x2 . . . xN , derivatives α ∂α1 ∂α2 ∂αN R ∂ = ... , sum of components |α| = α + −  + −  − x ∂xα1 ∂xα2 ∂xαN 1 X 1−|ν | ν ν νr = krs r z r − z r ∂z G(z|s, s0) (A5) α2 + ··· + αN , and the factorials α! = α1! · α2! ··· αN ! and α α! r=1 analogously β = β!(α−β)! . Substituting G(z|s, s0) = F (s(z −1)) and x = s(z −1) gives Definition 1 (Stochastic concentration homeostasis). A αx · ∇F (x) probability mass function Π(x|s) with state space X obeys R SCH if for all s ∈ [0, ∞) there exists a random variable κ  + + − −  − X ν −|ν | ν −|ν | νr on K with density χ(κ) satisfying: = krs (x + s) r s r − (x + s) r s r ∂x F (x). r=1

Z x (κs) −s|κ| It can now be seen that the right-hand side of the above Π(x|s) = dκχ(κ) e , ∀x ∈ X . (A1) − + x! equation is independent of s if either (i) |νr | = 0 and |νr | = K − + − + 1 , (ii) |νr | = 1 and |νr | = 0, or (iii) |νr | = |νr | = 1. Thus the EDM and the ABM solutions coincide for mass-action Definition1 implies that if κ has a moment-generating reaction networks (1) when they comprise only the mono- tT κ function F (t) = Eχ[e ] then using (A1) one finds the molecular reactions given in (30). bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

16

Similarly, we check that adding bursty reactions of the such that the invariance condition (B1) becomes k form −→ ms × X leads to a generating function equation ∅ X 0 0 G(z|θs) = GB(z|x , θ)Π(x |s). (B2) ∂ 0 αs G(z|s, s ) = k[g(z|s) − 1]G(z|s, s ) + ..., (A6) x ∂s 0 0 Assume that additionally where g(z|s) = E[zms |s] is the factorial-moment generating function of the burst distribution. Eq. (A6) transforms to N 0 X 0 αx · ∇F (x) = k[g(x/s + 1|s) − 1]F (x) after substituting θ∂θGB(z|x , θ) = (zi − 1)∂zi GB(z|x , θ), (B3) G(z|s, s0) = F (s(z − 1)) and x = s(z − 1). Hence, bursty i=1 reactions obey SCH if and only if the burst distribution 0 obeys SCH, i.e., if there exist a moment-generating function which holds for the binomial partition kernel GB(z|x , θ) = x0 f satisfying f(x) = g(x/s + 1|s) as in (A2). (1 − θ + θz) . Differentiating Eq. (B2) with respect to θ then gives

X 0 0 Appendix B: Proof of Theorem 1 θ∂θG(z|θs)= θ∂θGB(z|x , θ)Π(x |s) x0 N The proof of Theorem 1 is divided in three steps. The first X step shows that a general condition (B1) guarantees snap- = (zi − 1)∂zi G(z|θs), (B4) shot distributions that are independent of birth size. We i=1 then show that (B1) satisfies the effective dilution model where in the last line we used assumption (B3). Changing and reduces to SCH for binomial partitioning at cell divi- variables (θs → s) in (B4) yields sion. The proof also clarifies that the assumption of bino- mial partitioning cannot be removed under biological con- N X ∂ straints conserving the total number of molecule numbers s∂ G(z|s) = (z − 1) G(z|s), s i ∂z at cell division. General conditions for the existence of the i=1 i EDM’s stationary distributions have been discussed in70. or equivalently the EDM

Step 1: Distributions invariant of birth size  ∂  αs Π(x|s) ∂s The conditional distribution Π(x|s, s ) is independent of N 0 X birth size s0 if and only if = −α [(xi + 1)Π(x1, ..., xi + 1, ..., xN |s) − xiΠ(x|s)] i=1 X 0 0 Π(x|θs, s0) = B(x|x , θ)Π(x |s, s0), (B1) = −DΠ(x|s). (B5) x0 Using the above relation, we see that (25a) coincides with where B(x|x0, θ) is the partitioning kernel in Eq. (25b) that 0 (4) and (A4) follows. depends only the inherited size fraction θ = s0/s . This fact can be verified using (B1) in the boundary condition (25b), which leads to Step 3: SCH and the necessity of binomial partitioning 0 R ∞ 0R s 0 0 0 0 Π(x|s0, s0) = ds ds0 ρ(s , s0|s0)Π(x|s0, s0). 0 0 Finally, we show that condition (B3) required for the va- This implies that Π(x|s, s0) must be independent of birth lidity of the EDM implies independent binomial partitioning size of molecules. (B3) is a linear PDE that can be solved using the method of characteristics, which leads to Π(x|s, s0) = Π(x|s). ∂z ∂G θ i = (1 − z ), θ B = 0. In the following, we show that under condition (B1) Π(x|s) ∂θ i ∂θ coincides with the EDM solution. 0 The general solution is GB(z|x , θ) = J(1 − θ + θz) where the function J is fixed by the condition that for θ = 1 all 0 Step 2: Transformation into an effective dilution model molecules are partitioned deterministically, i.e., J(z) = zx . Hence, we obtain

Let us denote the factorial-moment generating function 0 x0 0 P x 0 GB(z|x , θ) = (1 − θ + θz) , of the partitioning kernel by GB(z|x , θ) = x z B(x|x , θ) bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

17 which corresponds to independent binomial partitioning of Using f(s0, s) = s0Φ(s|s0) and f(s0, s) = Φ(s|s0) in molecules in (20). It then follows that (B2) (and (B1)) are Eq. (C1), the conditional moments of birth size can be ap- equivalent to proximated by

G(z|θs)= G((1 − θ) + θz|s). (B6)  2  large 2σ ∂ ln Φ(s|s¯0) 3 EΠ [s0|s] =s ¯0 1 + + O(σ ) Finally, we show that (B6) is equivalent to G(z|s) = s¯0 ∂s¯0 F (s(z − 1)) for binomial partitioning. Expanding (B6) = s¯ + 2aσ2γ(s − as¯ ) + O(σ3), around z = 1 and identifying the series coefficients with the 0 0 factorial moments µn(s) in (A3), we find that the factorial where the last equality follows from γ(s, s ) = γ(s − as ) moments are homogeneous functions of order |n| = P n : 0 0 i i for the linear cell size control model (19), ands ¯ and σ are |n| 0 µn(θs)= θ µn(s). Then by Euler’s homogeneous function the mean and of the backward lineage theorem, it follows that the factorial moments with index n, distribution ψ given by Eqs. (38c). ∂ |n| bw satisfy s ∂s µn(s) = |n|µn(s) and hence µn(s) = s µn(1). This implies that the factorial-moment generating function is 2. Small cell asymptotics X (z − 1)n G(z|s) = s|n|µ (1) = F (s(z − 1)), n n! n Next we consider small cells by noting that Φ(s|s0) is P n practically constant when s ≈ s0, the integral in (C1) can with F (x) = n x µn(1)/n!. It remains to be shown that F is indeed a moment-generating function. To this end, we be approximated by k note that ∂ G(z|s) = s|k|F (k)(s(z − 1)) ≥ 0 for z ∈ (1, −∞) ∂zk Eψ[s01s ≤s] and hence F (−x) is a completely monotone function on E [s |s] ≈ 0 , (C2) Π 0 E [1 ] x ∈ (0, ∞), which implies that there exists a distribution χ ψ s0≤s −sx for which F (−x) = Eχ[e ] is a Laplace transform, which Assuming that ψ, is approximately Gaussian with means ¯0 concludes the proof of Theorem 1. 2 and variance σ , we find that near s ≈ s¯0, we have

2 (s−s¯0)    − 2 Appendix C: Approximation of birth size moments 1 s − s¯0 σe 2σ Eψ[s01s ≤s] ≈ s¯0 1 + erf √ − √ 0 2 2σ 2π We here derive an analytical approximation (38b) for    1 s − s¯0 the conditional birth size moments. We start by rewrit- Eψ[1s¯ ≤s] ≈ 1 + erf √ (C3) 0 2 2σ ing EΠ[s0|s] in terms of the backward lineage distribution ψ using Eq. (24): bw and hence Z s (s−s¯ )2 EΠ[s0|s] = ds0 s0Π(s0|s) q 2 − 0 0 σe 2σ2 small π 3 R s E [s0|s] =s ¯0 − + O σ , ds0 s0Π(s0, s) E [s Φ(s|s )1 ] Π   = 0 = ψ 0 0 s0≤s . (C1) 1 + erf s√−s¯0 R s E [Φ(s|s )1 ] 2σ 0 ds0 Π(s0, s) ψ 0 s0≤s We now apply matched asymptotic expansion to this ex- which is accurate to order σ3. pression.

3. Global asymptotics 1. Large cell asymptotics

The two asymptotic solutions can be matched at the For large cells s  s0, we can extend the range of inte- gration in Eq. (C1) and compute the expectation value as boundary layer. Since follows small large Z ∞ lim EΠ [s0|s] = lim EΠ [s0|s] =s ¯0, s→∞ s→s0 Eψ[f(s0, s)] = ds0ψbw(s0)f(s0, s) 0 the uniformly valid matched asymptotic expansion is Z ∞ Z ∞ dk  k2σ2  −ik(s0−s¯0) 3 = ds0 e 1 − f(s0, s) + O(σ ) 0 −∞ 2π 2 small large EΠ[s0|s] ≈ EΠ [s0|s] + EΠ [s0|s] − s¯0, 2 2 σ ∂ f(¯s0, s) 3 = f(¯s0, s) + 2 + O(σ ). 2 ∂s¯0 which gives Eq. (38b). bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

18

FIG. 6. Retrospective averages of birth size. Matched asymptotic expansions of EΠ[s0|s] (Eq. (38b), lines) for the adder size control (a = 1, s0 = 1) and agent-based simulations (dots) for (a) varying size control noise CV[∆] and (b) division errors CV[θ].

Appendix D: Analytical solutions and error bounds using the sic noise contribution is given by linear noise approximation for deterministic cell division N " βij  ˜ int X † 2 − 2 βij αXij ΣABM = Ω uˆiuˆj + We begin by outlining the solution of (32). Defining (2βij − 1) (β − 1) ln 4 ξ i,j=1 ij ij ˜ −1 −† Σ(s, s0) = U Σ(s, s0)U , Eq. (32) of the main text be- # βij  βij  ˜ comes βij 2 − 1 βij ln 4 − 2 (1 + ln 4) + 2 + ln 4 Dij β 2 , (2 ij − 1) (βij − 1) ln 4 ξij ˜ ∗ ˜ ˜ αs∂sΣij = (λi + λj )Σij + sDij (D1) (D5) 4Σ˜ (s , s ) = Σ˜ (2s , s ) + 2s X˜ . (D2) ij 0 0 ij 0 0 0 ij with Ω = EΠ[s] = s0 ln 4. The expressions greatly simplify for a single species since Eq. (D1) has the solution uˆ =u ˆi, β = βij and ξ = ξij. We note that in this case (D4) increases monotonically with β while (D5) decreases

λ +λ∗ monotonically with β. Using the limits β → 0 and β → i j D˜ijs ˜ α Σij(s, s0) = cijs + ∗ , (D3) ∞, we find that the ABM’s coefficients of variation can be  λi+λj  α 1 − α bounded by the EDM’s coefficients: 6 CV2 [X] 3 where the constants c are fixed using the boundary condi- ≤ ENM ≤ ln 2, (D6) ij 7 2 2 tion (D2) which gives the solution of the EDM, Eq. (34) of CVABM[X] 2 the main text. 2 CVENM[x] 4 ln 2 2 ln (2) ≤ 2 ≤ , (D7) Using Eq. (40) and averaging (D3) over the determinis- CVABM[x] 1 + ln 4 tic size distribution (33), we find the covariance matrix of concentrations X, where we have used the fact that D(X¯) ≥ αX¯ and the ENM int αX¯ D solution of (9) is ΣENM = Ωξ + Ωξ . The result implies that N 1   ˜ the ENM overestimates the ABM’s coefficient of variation 1 X βij coth βij ln 2 + 3 αXij Cov [X] = uˆ uˆ† 2 of concentrations by at most 8% but underestimates it by Π Ω i j 3(β + 1) ξ i,j=1 ij ij at most 2%, and vice versa for molecule numbers. 1  ˜ βij 3βij − coth 2 βij ln 2 Dij + 2  , (D4) 3 βij − 1 ξij

∗ ξij −1 −1 3 1 where ξij = 2α − λi − λ , βij = ,Ω = EΠ[s ] = , j α 4 s0 andu ˆi are the eigenvectors of the Jacobian J introduced before Eq. (34). Similarly, considering molecule numbers x, int ¯ ¯ T we have CovΠ[x] = ΣABM + VarΠ(s)XX where the intrin- bioRxiv preprint doi: https://doi.org/10.1101/2020.10.23.352856; this version posted May 25, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

19

Box 1: First-Division Algorithm for agent-based simulations of size-dependent gene regulatory networks

Exact simulation algorithm of general stochastic reaction networks within growing cells (agents) undergoing binary cell division according to cell size control rules46,50,52. The algorithm combines the Extrande method45 for simulating re- action networks embedded in a growing cell and the First-Division algorithm38 for the population dynamics. The state of each cell is given by birth time t0, birth size s0, present cell size s and the vector of molecule numbers x.

Algorithm 1: First-division algorithm simulating Algorithm 2: Extrande algorithm simulating agent-based population dynamics reaction networks in a growing cell

Input: Cell states {t0,i, s0,i, si, xi}i=1,...,M for Input: Cell state (t0, s0, s, x) at time T0. 0 population of M cells. Output: Cell state at time T > T0. Output: Cell states at time T > 0. Require: Growth rate α, stoichiometric vectors ± Require: Growth rate α, cell size control model (19) (νr )r=1,..,R and propensities (wr)r=1,..,R. and division error distribution π. PR Initialise t ← T0 and let a0(x, s)= r=1 wr(x, s); Initialise t ← 0; while t < T 0 do αt0 while t < T do 0 0 Compute bound B = maxt ∈[T0,T ] a0(x, s0e ); Compute division times td,i = t0,i + α ln(sd,i/s0,i) Generate putative reaction time τ ∼ exp(1/B); for i = 1, .., M where sd,i is computed from (19); 0 ∗ if t + τ ≥ T then 0 Compute the first division time td = mini td,i and 0 α(T −t0) ∗ Set time t ← T and cell size s ← s0e ; index i = argminitd,i; ∗ else if td < T then ∗ Update time t ← t + τ and cell size Grow all cells until t = td using Algorithm2; s ← s eα(t−t0); Draw a random number θ ∼ π¯ and divide the 0 ∗ Generate random number u ∼ U(0,1); cell i according to s0,i∗ , si∗ ← θsi∗ and if a0(x, s) ≥ Bu then s0,M+1, sM+1 ← (1 − θ)si∗ ; Choose reaction associated with the smallest new old Partition molecules binomially xi∗ ∼ B(xi∗ , θ) positive integer j less than or equal to R and assign the rest to the other daughter cell Pj satisfying r=1 wr(x, s) ≥ Bu and update old new + − xM+1 ← (xi∗ − xi∗ ); molecular state x ← x + νj − νj ; Set M ← M + 1; end else end Grow all cells until t = T using Algorithm2; end end end