
Supplementary Materials for A mathematical model of ctDNA shedding predicts tumor detection size

Stefano Avanzini, David M. Kurtz, Jacob J. Chabon, Sharon Seiko Hori, Sanjiv Sam Gambhir, Ash A. Alizadeh, Maximilian Diehn, Johannes G. Reiter

Supplementary Text S1: Mathematical modeling

We developed a stochastic model of evolution and biomarker shedding to study the potential of cancer detection based on the biomarker level in liquid biopsies. Here we focused on somatic point mutations present in circulating tumor DNA (ctDNA). Nonetheless, our framework applies to any biomarker that is shed by cancer cells at some rate. These biomarkers can be released at different stages of a cell's life cycle, for example during proliferation, apoptosis, or per unit of time.

We model the dynamics of tumor growth and biomarker shedding through a continuous-time multi-type branching process (Athreya and Ney, 2004; Kimmel and Axelrod, 2015; Beerenwinkel et al., 2015; Altrock et al., 2015; Durrett, 2015b). The tumor grows stochastically from a single malignant cell at time $t = 0$. The tumor size at time $t$, denoted $A_t$ and expressed in number of cells, is modeled by a supercritical branching birth-death process with net growth rate $r = b - d > 0$, where $b$ and $d$ denote the birth and death rate per day, respectively (Fig. 1B). This process goes extinct with probability $\delta = d/b$ (Athreya and Ney, 2004). Since we are only interested in primary tumors that do not go extinct, we condition $A_t$ on eventual survival, i.e. on the event $\Omega := \{A_t > 0 \text{ for all } t\}$. Cancer cells shed a biomarker into the bloodstream at a rate $\lambda$ per day. In our case, the biomarker is ctDNA measured in diploid genome equivalents (GE). For now, we assume that the biomarker is exclusively shed by cells undergoing apoptosis and hence the shedding rate is given by $\lambda = d \cdot q_d$, where $q_d$ denotes the shedding probability per cell death (Fig. 1B).

Normal or benign tumor cells also shed cell-free DNA (cfDNA) into the bloodstream. Cells in benign tumors and expanded subclones often harbor the same cancer-associated mutations as cancer cells (Martincorena and Campbell, 2015; Makohon-Moore et al., 2018; Teixeira et al., 2019; Yokoyama et al., 2019; Martincorena et al., 2018). Hence, ctDNA shed from benign tumors can be difficult to distinguish from ctDNA shed from malignant tumors. We define $B_t$ as the size of the benign population of cells at time $t$ that shed the biomarker into the bloodstream, and denote their shedding rate as $\lambda_{bn}$. We assume that benign cell populations roughly replicate at a constant size. Hence, benign cells divide and die at the same rate $b_{bn} = d_{bn}$, and their population size $B_t$ remains constant over time ($B_t = B_0$ for all $t$). Considering again that the biomarker is exclusively shed by cells undergoing apoptosis, the shedding rate of benign cells is $\lambda_{bn} = d_{bn} \cdot q_{d,bn}$ per day. Although we conservatively assume that apoptotic benign cells shed ctDNA with the same probability as malignant cells ($q_d = q_{d,bn}$), the shedding rate $\lambda_{bn}$ of benign cells is lower than the shedding rate $\lambda$ of malignant cells because benign cells typically replicate at a lower rate than cancer cells (Rew and Wilson, 2000).

The circulating biomarker is eliminated from the bloodstream at an elimination rate $\varepsilon$, which can be calculated from the biomarker half-life $t_{1/2}$ as $\varepsilon = \log(2)/t_{1/2}$. We denote by $C_t^A$ and $C_t^B$ the amount of biomarker circulating in the bloodstream at time $t$ shed by malignant and benign cells, respectively. The total amount of the biomarker circulating at time $t$ is thus $C_t = C_t^A + C_t^B$.

Because malignant and benign cells shed the biomarker independently of each other, the processes $(A_t, C_t^A)$ and $(B_t, C_t^B)$ can be studied separately. The stochastic process $(A_t, C_t^A)$ is a two-type branching process governed by the following transitions

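$$
\begin{aligned}
A &\to AA && \text{rate } b \\
A &\to C && \text{rate } d \cdot q_d \\
A &\to \emptyset && \text{rate } d\,(1 - q_d) \\
C &\to \emptyset && \text{rate } \varepsilon .
\end{aligned}
$$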

The process is initialized at time $t = 0$ with a single malignant cell and no circulating biomarkers, that is $(A_0, C_0^A) = (1, 0)$.
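The four transitions above can be simulated directly with a standard Gillespie algorithm. The following is a minimal sketch, not the authors' implementation: the function name, the `max_cells` safety cap and all parameter values in the usage note are illustrative assumptions.

```python
import random


def simulate_tumor_ctdna(b, d, q_d, eps, t_max, seed=None, max_cells=100_000):
    """Gillespie simulation of (A_t, C_t^A) started from (1, 0).

    Returns (t, a, c): the time reached, the number of malignant cells and
    the number of circulating biomarker units at that time. The simulation
    stops at t_max, at extinction of both types, or when the hypothetical
    safety cap max_cells is reached.
    """
    rng = random.Random(seed)
    t, a, c = 0.0, 1, 0
    while a < max_cells:
        # Total event rate: births, deaths (with or without shedding), elimination.
        total = a * (b + d) + c * eps
        if total <= 0.0:
            break  # no event can change the state any more
        dt = rng.expovariate(total)
        if t + dt > t_max:
            t = t_max
            break
        t += dt
        u = rng.random() * total
        if u < a * b:
            a += 1                   # A -> AA, rate b
        elif u < a * (b + d * q_d):
            a, c = a - 1, c + 1      # A -> C, rate d * q_d
        elif u < a * (b + d):
            a -= 1                   # A -> 0, rate d * (1 - q_d)
        else:
            c -= 1                   # C -> 0, rate eps
    return t, a, c
```

For example, `simulate_tumor_ctdna(b=0.14, d=0.13, q_d=0.01, eps=30.0, t_max=365.0, seed=1)` returns one realization after a year for these purely illustrative rates. Runs that end with `a == 0` correspond to extinct tumors and would have to be discarded to condition on survival.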

Because we assume that the benign cell population $B_t$ remains constant over time and biomarker units are eliminated from the bloodstream independently at the same rate, the process $C_t^B$ is a branching pure-death process with constant immigration (Allen, 2011) (or equivalently an M/M/∞ queue system (Kulkarni, 2016; Steutel and Van Harn, 2003)). We assume that $C_t^B$ is at equilibrium at time $t = 0$. We present additional details about this process, including the exact form of $C_0^B$, in the next section.

Biomarker Size Distributions

The processes $A_t$, $B_t$ and $C_t$ are indexed by the time parameter $t$, counted from the primary tumor onset. Clinically, this time is impossible to measure, and it is thus convenient to express the previous processes in terms of other indexing parameters. In our case, a suitable choice is to use the first passage time when the tumor reaches a size of $m$ cells

$$\tau_m := \inf\{t > 0 : A_t = m\} .$$

By employing this random variable, the processes $C^A_{\tau_m}$ and $C^B_{\tau_m}$ express the biomarker levels in terms of an observable quantity, i.e. the primary tumor size. We are especially interested in our model predictions for relatively large primary tumor sizes. When $m$ is large and the shedding rate $\lambda$ is small, we are able to apply asymptotic results for the distributions of the processes $C^A_{\tau_m}$ and $C^B_{\tau_m}$ that greatly simplify computations and help to gain insight into the main evolutionary features of the model. Hence, we first focus on these asymptotic results, while later we will show how to generalize our framework to include multiple biomarker shedding dynamics and to derive the probability distribution of the processes $A_t$, $B_t$ and $C_t$ indexed by time. Unless otherwise specified, the presented results are conditioned on eventual survival of $A_t$.

Note that we will employ the same sign '∼' to indicate "distributed as" and "asymptotically converges to". Since only the former usage applies to random variables, this does not cause any notational ambiguity. Finally, whenever asymptotic probability distributions are derived, formulas including the '∼' sign will be accompanied by text highlighting the relevant asymptotic limit.

Process $C_t^A$

Here we provide a sketch derivation for the asymptotic distribution of $C^A_{\tau_m}$ in the small shedding rate - large primary tumor size limit ($\lambda \ll 1$, $m \gg 1$). The derivation follows from the results presented in Cheek et al. (2018).

The biomarker is shed by cancer cells as a Cox process (or doubly stochastic Poisson process (D.J. Daley, 2006)) with intensity $(\lambda \cdot A_t)_{t \ge 0}$. Furthermore, for large times the branching birth-death process $A_t$ satisfies the classic result (Athreya and Ney, 2004)

$$\lim_{t \to \infty} e^{-rt} A_t = W \tag{1}$$

where $W$ is a non-negative random variable such that $W = 0$ if and only if the process $A_t$ goes extinct. Let us assume $m \gg 1$ and consider the process $A_t$ at time $\tau_m$, that is, by definition, when the primary tumor is made of $m$ cells. Conditioning on $\Omega \equiv \{W > 0\}$ and looking backwards in time, we see that the size of the primary tumor at $s$ days before $\tau_m$ is given by $m \cdot e^{-rs}$. Hence, in the small $\lambda$ - large $m$ limit, the biomarker is shed as a Poisson process with mean $\lambda \int_0^\infty m e^{-rs}\, ds = \lambda \cdot m / r$. Moreover, by the same argument we see that the unordered times of biomarker shedding, measured backwards from $\tau_m$, are asymptotically distributed as i.i.d. Exponential($r$) random variables. Given that the elimination times of shed biomarkers are also i.i.d. Exponential($\varepsilon$), the probability that a randomly selected biomarker fragment is still circulating in the bloodstream at time $\tau_m$ is $r/(\varepsilon + r)$. Due to the thinning property of Poisson processes, we find that for large primary tumor sizes $m$ and small shedding rates $\lambda$

$$C^A_{\tau_m} \sim \text{Poisson}\left(\frac{m \cdot \lambda}{\varepsilon + r}\right) . \tag{2}$$
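As a small numerical illustration of this argument, the sketch below computes the mean number of shed units, the fraction still circulating, and their product, which is the Poisson mean in equation (2). The function names and the parameter values in the usage note are assumptions made for illustration only.

```python
import math


def ctdna_means(m, lam, eps, r):
    """Return (mean shed, surviving fraction, mean circulating) at tumor size m.

    mean shed           = lam * m / r
    surviving fraction  = r / (eps + r)
    mean circulating    = lam * m / (eps + r)   (Poisson mean of equation (2))
    """
    shed = lam * m / r
    fraction = r / (eps + r)
    return shed, fraction, shed * fraction


def poisson_pmf(k, mean):
    """Probability that a Poisson variable with the given mean equals k."""
    if mean == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
```

With the hypothetical values `m=1000, lam=0.002, eps=1.5, r=0.5`, four units are shed on average, a quarter of them are still circulating, and the Poisson mean is one unit.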

Process $C_t^B$

The process $C_t^B$ is a branching pure-death process with immigration, with death rate $\varepsilon$ and immigration rate $B_0 \cdot \lambda_{bn} = B_0 \cdot d_{bn} \cdot q_d$, because the population size $B_t$ is constant over time. The probability generating function for such a process follows from Allen (2011)

$$\mathcal{C}^B(y, t) = \left[1 + (y - 1)e^{-\varepsilon t}\right]^{C_0^B} \exp\left(\frac{\lambda_{bn}}{\varepsilon} B_0 (y - 1)\left(1 - e^{-\varepsilon t}\right)\right) . \tag{3}$$

Conditioning on $A_t$ survival, we have that $\lim_{m \to \infty} \tau_m = \infty$ almost surely. Therefore, since the process $(B_t, C_t^B)$ is independent of $A_t$, the large $m$ limit of $C^B_{\tau_m}$ coincides with the large time limit of $C_t^B$. When $t$ is large, equation (3) converges to

$$\lim_{t \to \infty} \mathcal{C}^B(y, t) = \exp\left(\frac{\lambda_{bn}}{\varepsilon} B_0 (y - 1)\right) \tag{4}$$

independently of the initial biomarker level. The right hand side of equation (4) is the probability generating function of a Poisson random variable with mean $B_0 \cdot \lambda_{bn}/\varepsilon$, and so for large $m$ we have

$$C^B_{\tau_m} \sim \text{Poisson}\left(\frac{B_0 \cdot \lambda_{bn}}{\varepsilon}\right) . \tag{5}$$

Since we assume that $C_t^B$ is in equilibrium initially, we set $C_0^B = B_0 \cdot d_{bn} \cdot q_d/\varepsilon$.

Process $C_t$ and sampling

By combining the previous results, we find the asymptotic limit for the distribution of the total biomarker amount present in the bloodstream when the primary tumor is made of a large number of cells. Conditioned on $A_t$ survival, this limit is given by

$$C_{\tau_m} = C^A_{\tau_m} + C^B_{\tau_m} \xrightarrow{d} \text{Poisson}\left(\frac{m\, d\, q_d}{\varepsilon + r} + \frac{B_0\, d_{bn}\, q_d}{\varepsilon}\right) . \tag{6}$$

Suppose that a sample of blood with volume $V_s$ is drawn from a patient with a total blood volume equal to $V_{tot}$. Assuming that the circulating biomarker units are well mixed in the bloodstream, we sample the biomarker units from the bloodstream with probability $p = V_s/V_{tot}$. We define $X_t$ as the total biomarker amount (i.e., number of diploid genome equivalents) present in a sample drawn at time $t$. Then, by applying again the thinning property of Poisson processes, we find that for large $m$ the biomarker amount in the sample when the primary tumor is made of $m$ cells is asymptotically distributed as

$$X_{\tau_m} \sim \text{Poisson}\left(p \left[\frac{m\, d\, q_d}{\varepsilon + r} + \frac{B_0\, d_{bn}\, q_d}{\varepsilon}\right]\right) . \tag{7}$$
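The sampled amount in equation (7) lends itself to a short calculation of how likely a blood draw is to contain at least a given number of biomarker units. The sketch below assumes a simple count threshold `k`; this threshold, the function names and the parameter values in the usage note are illustrative assumptions and not part of the derivation above.

```python
import math


def expected_sample_count(m, d, q_d, eps, r, B0, d_bn, p):
    """Poisson mean of equation (7): sampled units from malignant and benign cells."""
    malignant = m * d * q_d / (eps + r)
    benign = B0 * d_bn * q_d / eps
    return p * (malignant + benign)


def prob_at_least(k, mean):
    """P(X >= k) for X ~ Poisson(mean), via the complementary finite sum."""
    if mean == 0.0:
        return 1.0 if k <= 0 else 0.0
    below = sum(
        math.exp(j * math.log(mean) - mean - math.lgamma(j + 1)) for j in range(k)
    )
    return 1.0 - below
```

For instance, with the hypothetical values `m=10000, d=0.5, q_d=0.001, eps=1.5, r=0.5, B0=0, d_bn=0.1` and a sampling fraction `p=0.1`, the expected count is 0.25 and `prob_at_least(1, 0.25)` gives the chance that the sample contains at least one unit.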

General Framework

For the stochastic analysis of ctDNA shedding by malignant and benign cells we employed the asymptotic results derived above. We further extended these results in several ways. First, so far we focused on the case where biomarker shedding occurs exclusively at cell apoptosis. We generalize our derivations to additionally include shedding at cell necrosis (per unit of time) and proliferation. Second, we presented size distributions in the asymptotic limit of large primary tumor sizes - small shedding rates. For other shedding processes, we obtain similar distributions in the asymptotic limit for large times - small shedding rates. Finally, we compute exact probability distributions for the processes $A_t$, $B_t$ and $C_t$ at any given time $t$ for all three shedding processes and their mixtures. Below we present these three extensions in greater detail.

In order to include the possibility of biomarker shedding by cancerous cells also at necrosis and proliferation in our model, we consider the following set of transitions for the process $(A_t, C_t)$

$$
\begin{aligned}
A &\to AAC && \text{rate } b\, q_b = \lambda_2 \\
A &\to AA && \text{rate } b\,(1 - q_b) = b - \lambda_2 \\
A &\to AC && \text{rate } q_n = \lambda_1 \\
A &\to C && \text{rate } d\, q_d = \lambda_0 \\
A &\to \emptyset && \text{rate } d\,(1 - q_d) = d - \lambda_0 \\
C &\to \emptyset && \text{rate } \varepsilon .
\end{aligned}
\tag{8}
$$

Here, $q_d$ and $q_b$ denote the shedding probabilities at apoptosis and proliferation, respectively. Similarly, $\lambda_0 = d\, q_d$, $\lambda_1 = q_n$ and $\lambda_2 = b\, q_b$ denote the shedding rates of malignant cells at apoptosis, necrosis, and proliferation, respectively. The total shedding rate is then defined by $\lambda = \lambda_0 + \lambda_1 + \lambda_2$.

Asymptotic results for multiple shedding processes

With the generalized definition of the total shedding rate $\lambda$, we find that the biomarker is shed by cancer cells as a Cox process with intensity $(\lambda A_t)_{t \ge 0}$. Hence, the derivation of the distribution of $C^A_{\tau_m}$ is identical to the one above. In particular, we show again that for a large primary tumor size $m$ and a small total shedding rate $\lambda$, the biomarker amount present in the bloodstream when $A_t = m$ asymptotically follows a Poisson distribution with mean $\frac{\lambda \cdot m}{\varepsilon + r}$.

Asymptotic results for large times

A similar derivation provides the size distribution for $C_t^A$ in the asymptotic limit for large time and small total shedding rate. Under this limit and conditioning on $A_t$ survival, the total biomarker amount shed up to time $t$ converges in finite dimensional distributions to a Poisson random variable with mean $\frac{W \lambda e^{rt}}{r}$, where $W$ is the same as in equation (1) (for details see Cheek et al. (2018)). We recall that $W > 0$ if and only if $A_t$ eventually survives, and conditioned on this event, $W$ follows an Exponential$\left(\frac{r}{b}\right)$ distribution. The biomarker amount still present in the bloodstream at time $t$ is thus a compound Poisson process with probability generating function

$$E\left[z^{C_t^A}\right] = E\left[\exp\left(\frac{W \lambda e^{rt}}{r}\,(\mu(z) - 1)\right)\right] .$$

Here $\mu(z)$ denotes the probability generating function of the process initiated by a randomly selected biomarker unit, but because the biomarker cannot reproduce, this process is a branching pure-death process whose size can only be 1 or 0. The probability generating function of such a process is thus $\mu(z) = \frac{\varepsilon + r z}{\varepsilon + r}$. Using this expression and the results for $W$, the probability generating function for $C_t^A$ becomes

$$E\left[z^{C_t^A}\right] = \left[1 - \frac{b \lambda e^{rt} (z - 1)}{r(\varepsilon + r)}\right]^{-1} . \tag{9}$$

Notice that the same steps can be repeated to formally derive the asymptotic results for large primary tumor sizes. The last expression represents the probability generating function of a geometric random variable, and so in the asymptotic limit for large times and small total shedding rates, conditioned on $A_t$ survival, we find that

$$C_t^A \sim \text{Geometric}\left(\frac{r(\varepsilon + r)}{b \lambda e^{rt} + r(\varepsilon + r)}\right) . \tag{10}$$
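The geometric law in equation (10) can be checked numerically against the generating function in equation (9). The sketch below assumes the geometric distribution is supported on 0, 1, 2, ... with success probability equal to the argument in equation (10); function names and parameter values are illustrative.

```python
import math


def geometric_success_prob(b, lam, r, eps, t):
    """Success probability of equation (10): r(eps+r) / (b*lam*e^{rt} + r(eps+r))."""
    a = b * lam * math.exp(r * t) / (r * (eps + r))  # asymptotic mean of C_t^A
    return 1.0 / (1.0 + a)


def geometric_pmf(n, p):
    """P(C = n) for a geometric variable on {0, 1, 2, ...}."""
    return p * (1.0 - p) ** n


def pgf_equation_9(z, b, lam, r, eps, t):
    """Right-hand side of equation (9)."""
    return 1.0 / (1.0 - b * lam * math.exp(r * t) * (z - 1.0) / (r * (eps + r)))
```

Summing `geometric_pmf(n, p) * z**n` over `n` reproduces `pgf_equation_9(z, ...)` for any `z` in [0, 1], and the mean `(1 - p) / p` equals the factor `b * lam * exp(r * t) / (r * (eps + r))`.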

First moments

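In this section, we summarize the expected values and variances of the processes $A_t$ and $C_t$ in the considered asymptotic limits. These moments follow from the probability distributions we previously derived. The process $A_t$ is a supercritical branching birth-death process with net growth rate $r$ and extinction probability $\delta = d/b$. In particular, recalling that $\Omega$ denotes the event of $A_t$ eventual survival, we have $P(\Omega) = 1 - \delta$. If $A_0 = 1$, we have

$$E[A_t] = e^{rt} , \qquad \text{Var}(A_t) \sim \frac{1 + \delta}{1 - \delta}\, e^{2rt}$$

and

$$E[A_t \mid \Omega] \sim \frac{e^{rt}}{1 - \delta} , \qquad \text{Var}(A_t \mid \Omega) \sim \frac{1 + 2\delta}{(1 - \delta)^2}\, e^{2rt} .$$

For the process $C_t^A$, given that $(A_0, C_0^A) = (1, 0)$, equations (2) and (10) yield

$$E\left[C^A_{\tau_m} \mid \Omega\right] \sim \frac{m \lambda}{\varepsilon + r} , \qquad \text{Var}\left(C^A_{\tau_m} \mid \Omega\right) \sim \frac{m \lambda}{\varepsilon + r} \tag{11}$$

and

$$E\left[C^A_t \mid \Omega\right] \sim \frac{b \lambda}{r(\varepsilon + r)}\, e^{rt} , \qquad \text{Var}\left(C^A_t \mid \Omega\right) \sim \frac{b \lambda}{r(\varepsilon + r)}\, e^{rt} \left(\frac{b \lambda}{r(\varepsilon + r)}\, e^{rt} + 1\right) \tag{12}$$

asymptotically for large primary tumor sizes - small shedding rates and large times - small shedding rates, respectively.

In the same two limits, the process $C_t^B$ exhibits the same asymptotic behaviour. Conditional on $C_0^B = B_0 \cdot \lambda_{bn}/\varepsilon$ we find

$$\lim_{m \to \infty} E\left[C^B_{\tau_m}\right] = \lim_{t \to \infty} E\left[C^B_t\right] = B_0 \frac{\lambda_{bn}}{\varepsilon} , \qquad \lim_{m \to \infty} \text{Var}\left(C^B_{\tau_m}\right) = \lim_{t \to \infty} \text{Var}\left(C^B_t\right) = B_0 \frac{\lambda_{bn}}{\varepsilon} .$$

Combining these results and applying the same initial conditions, we then get

$$E[C_{\tau_m} \mid \Omega] \sim \frac{m \lambda}{\varepsilon + r} + \frac{B_0 \lambda_{bn}}{\varepsilon} , \qquad \text{Var}(C_{\tau_m} \mid \Omega) \sim \frac{m \lambda}{\varepsilon + r} + \frac{B_0 \lambda_{bn}}{\varepsilon} \tag{13}$$

for large $m$ - small $\lambda$ and

$$
\begin{aligned}
E[C_t \mid \Omega] &\sim \frac{b \lambda}{r(\varepsilon + r)}\, e^{rt} + \frac{B_0 \lambda_{bn}}{\varepsilon} , \\
\text{Var}(C_t \mid \Omega) &\sim \frac{b \lambda}{r(\varepsilon + r)}\, e^{rt} \left(\frac{b \lambda}{r(\varepsilon + r)}\, e^{rt} + 1\right) + \frac{B_0 \lambda_{bn}}{\varepsilon}
\end{aligned}
\tag{14}
$$

for large $t$ - small $\lambda$.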

Exact results

The asymptotic results discussed above perform extremely well for realistic parameter values and perfectly match the exact simulation results (see Figure S3). In principle, however, our mathematical framework can be used for other biomarkers - or entirely different applications - where the parameter values may not fall in the asymptotic regimes considered before. For this reason, we derive the exact distributions of the processes $A_t$, $B_t$ and $C_t$, for any given time $t$ and combinations of non-zero shedding rates.

Probability generating functions

In order to compute the exact distributions of $A_t$, $B_t$ and $C_t$, we first derive the joint probability generating functions of the processes $(A_t, C_t^A)$ and $(B_t, C_t^B)$. The marginal generating functions then follow directly, and provide the probability distributions of the individual processes by simple analytical or numerical inversion.

Process $(A_t, C_t^A)$

Our derivation for the process $(A_t, C_t^A)$ is based on previous results for a two-type branching process presented in Antal and Krapivsky (2010, 2011). These studies assumed that cells of the second type also have the ability to reproduce, but shedding (or mutations) can happen only at wild-type cell death.

We consider instead the two-type branching process $(A_t, C_t^A)$ fully defined by the set of transitions in equation (8). For any given initial condition $*$, we denote $P^{*}_{m,n}(t) = P\left((A_t, C_t^A) = (m, n) \mid (A_0, C_0^A) = *\right)$. The corresponding probability generating function is then defined as

$$\mathcal{P}^{*}(x, y, t) = \sum_{m, n \ge 0} x^m y^n P^{*}_{m,n}(t) .$$

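The backward Kolmogorov equations for this process read

$$
\begin{aligned}
\frac{dP^{(1,0)}_{m,n}}{dt} &= (b - \lambda_2) P^{(2,0)}_{m,n} + (d - \lambda_0)\, \delta_{m,0}\, \delta_{n,0} + \lambda_0 P^{(0,1)}_{m,n} + \lambda_1 P^{(1,1)}_{m,n} + \lambda_2 P^{(2,1)}_{m,n} - (b + d + \lambda_1) P^{(1,0)}_{m,n} \\
\frac{dP^{(0,1)}_{m,n}}{dt} &= \varepsilon\, \delta_{m,0}\, \delta_{n,0} - \varepsilon P^{(0,1)}_{m,n} .
\end{aligned}
\tag{15}
$$

Hence, multiplying both sides of equation (15) by $x^m y^n$ and summing over all non-negative $m, n$ we find

$$
\begin{aligned}
\partial_t \mathcal{P}^{(1,0)} &= (b - \lambda_2) \mathcal{P}^{(2,0)} + (d - \lambda_0) + \lambda_0 \mathcal{P}^{(0,1)} + \lambda_1 \mathcal{P}^{(1,1)} + \lambda_2 \mathcal{P}^{(2,1)} - (b + d + \lambda_1) \mathcal{P}^{(1,0)} \\
\partial_t \mathcal{P}^{(0,1)} &= \varepsilon - \varepsilon \mathcal{P}^{(0,1)} .
\end{aligned}
$$

Next, we observe that $\mathcal{P}^{(2,0)}(x, y, t) = \left[\mathcal{P}^{(1,0)}(x, y, t)\right]^2$ because of the independence of the progenies of the two initial cells. By applying this property and introducing the notation

$$\mathcal{A}(x, y, t) = \mathcal{P}^{(1,0)}(x, y, t), \qquad \mathcal{C}(x, y, t) = \mathcal{P}^{(0,1)}(x, y, t)$$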

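we reduce to the following system of first order differential equations

$$\partial_t \mathcal{A} = (b - \lambda_2) \mathcal{A}^2 + (d - \lambda_0) + \lambda_0 \mathcal{C} + \lambda_1 \mathcal{A} \mathcal{C} + \lambda_2 \mathcal{A}^2 \mathcal{C} - (b + d + \lambda_1) \mathcal{A} \tag{16}$$

$$\partial_t \mathcal{C} = \varepsilon - \varepsilon\, \mathcal{C} \tag{17}$$

with initial conditions

$$\mathcal{A}(x, y, 0) = x \tag{18}$$

$$\mathcal{C}(x, y, 0) = y . \tag{19}$$

Equation (17), subject to the initial condition (19), has solution

$$\mathcal{C}(x, y, t) = 1 + (y - 1) e^{-\varepsilon t} . \tag{20}$$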
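Once the closed form (20) is available, equation (16) with initial condition (18) is a scalar ordinary differential equation in $t$ that can also be integrated numerically for fixed real $x$ and $y$. The sketch below uses a classical fourth-order Runge-Kutta scheme as one possible cross-check of the analytical solution; it is not the method used in the text, and the function name, step count and parameter values are assumptions.

```python
import math


def joint_pgf_malignant(x, y, t, b, d, eps, lam0, lam1=0.0, lam2=0.0, steps=2000):
    """Integrate equation (16) from A(x, y, 0) = x up to time t with RK4.

    C(x, y, s) is taken from the closed form (20). Returns A(x, y, t).
    """

    def rhs(s, a):
        c = 1.0 + (y - 1.0) * math.exp(-eps * s)  # equation (20)
        return (
            (b - lam2) * a * a + (d - lam0) + lam0 * c
            + lam1 * a * c + lam2 * a * a * c - (b + d + lam1) * a
        )

    h = t / steps
    a, s = x, 0.0
    for _ in range(steps):
        k1 = rhs(s, a)
        k2 = rhs(s + h / 2, a + h * k1 / 2)
        k3 = rhs(s + h / 2, a + h * k2 / 2)
        k4 = rhs(s + h, a + h * k3)
        a += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        s += h
    return a
```

Setting `y = 1` removes the biomarker from the equation, so the result should reduce to the generating function of a plain birth-death process, which gives a convenient sanity check.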

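Plugging this back into equation (16), we get

$$\partial_t \mathcal{A} = \left[b + \lambda_2 (y - 1) e^{-\varepsilon t}\right] \mathcal{A}^2 + \left[\lambda_1 (y - 1) e^{-\varepsilon t} - b - d\right] \mathcal{A} + \left[d + \lambda_0 (y - 1) e^{-\varepsilon t}\right] . \tag{21}$$

We apply the change of variables $s = e^{-\varepsilon t}$ and find

$$\partial_s \mathcal{A} = -\frac{b + \lambda_2 (y - 1) s}{\varepsilon s}\, \mathcal{A}^2 - \frac{\lambda_1 (y - 1) s - b - d}{\varepsilon s}\, \mathcal{A} - \frac{d + \lambda_0 (y - 1) s}{\varepsilon s} .$$

To ease the notation, we rewrite the equation above as

$$\partial_s \mathcal{A} = \frac{\alpha_2 + \beta_2 s}{s}\, \mathcal{A}^2 + \frac{\alpha_1 + \beta_1 s}{s}\, \mathcal{A} + \frac{\alpha_0 + \beta_0 s}{s} \tag{22}$$

where

$$
\begin{aligned}
\alpha_2 &= -\frac{b}{\varepsilon} , & \alpha_1 &= \frac{b + d}{\varepsilon} , & \alpha_0 &= -\frac{d}{\varepsilon} \\
\beta_2 &= -\frac{\lambda_2 (y - 1)}{\varepsilon} , & \beta_1 &= -\frac{\lambda_1 (y - 1)}{\varepsilon} , & \beta_0 &= -\frac{\lambda_0 (y - 1)}{\varepsilon} .
\end{aligned}
$$

We are interested in the probability generating function $\mathcal{A}(x, y, t)$ for $t \ge 0$, which corresponds to $0 < s \le 1$. Equation (22) is a Riccati equation, which can be reduced to a second order ODE. To do so, we define

$$X(x, y, s) = \frac{\alpha_2 + \beta_2 s}{s}\, \mathcal{A}(x, y, s)$$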

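which yields

$$\partial_s X = X^2 + \frac{(\alpha_2 + \beta_2 s)(\alpha_1 + \beta_1 s) - \alpha_2}{s (\alpha_2 + \beta_2 s)}\, X + \frac{(\alpha_2 + \beta_2 s)(\alpha_0 + \beta_0 s)}{s^2} . \tag{23}$$

Next, we set

$$X \equiv -\frac{\partial_s Y}{Y} \quad \Longrightarrow \quad \partial_s X = \frac{(\partial_s Y)^2}{Y^2} - \frac{\partial_s^2 Y}{Y}$$

so that equation (23) transforms into

$$\partial_s^2 Y - \frac{(\alpha_2 + \beta_2 s)(\alpha_1 + \beta_1 s) - \alpha_2}{s (\alpha_2 + \beta_2 s)}\, \partial_s Y + \frac{(\alpha_2 + \beta_2 s)(\alpha_0 + \beta_0 s)}{s^2}\, Y = 0 . \tag{24}$$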

Equation (24) is a second order linear differential equation with rational coefficients. It has three singular points: $s = 0$ and $s = -\frac{\alpha_2}{\beta_2}$ being regular and $s = \infty$ being irregular with rank 1. Hence, equation (24) is a single confluent Heun equation (Ronveaux, 1995). To bring it into standard form, we first move the non-zero regular singularity to 1 through the change of variable $z = -\frac{\beta_2}{\alpha_2} s$. This leads to

$$\partial_z^2 Y + \frac{(z - 1)\left(\frac{\alpha_2 \beta_1}{\beta_2} z - \alpha_1\right) - 1}{z (z - 1)}\, \partial_z Y + \frac{\alpha_2 (z - 1)\left(\frac{\alpha_2 \beta_0}{\beta_2} z - \alpha_0\right)}{z^2}\, Y = 0 . \tag{25}$$

We now look for a solution to equation (25) of the form

$$Y(z) = z^m e^{nz} f(z) \tag{26}$$

which implies

$$
\begin{aligned}
\partial_z Y &= z^{m-1} e^{nz} \left[(m + nz) f + z f'\right] \\
\partial_z^2 Y &= z^{m-2} e^{nz} \left\{\left[(m + nz)^2 - m\right] f + 2z (m + nz) f' + z^2 f''\right\} .
\end{aligned}
$$

Substituting these expressions into equation (25) and rearranging, we obtain the following differential equation for $f(z)$

$$f'' + \frac{P(z)}{z (z - 1)}\, f' + \frac{Q(z)}{z^2 (z - 1)}\, f = 0 \tag{27}$$

where $P(z)$ and $Q(z)$ are two polynomials of second and third degree in $z$, respectively. However, it is easily checked that by taking

$$m = -\alpha_2 , \qquad n = -\frac{\alpha_2}{2 \beta_2} \left(\beta_1 + \sqrt{\beta_1^2 - 4 \beta_0 \beta_2}\right) \tag{28}$$

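the first and last coefficients of the polynomial $Q(z)$ become zero. Hence, for these values equation (27) can be written as

$$f'' + \left(\frac{\gamma}{z} + \frac{\delta}{z - 1} + \eta\right) f' + \frac{\omega z + \rho}{z (z - 1)}\, f = 0 . \tag{29}$$

Here, the parameters $\gamma$, $\delta$ and $\eta$ follow from the decomposition in partial fractions of $\frac{P(z)}{z(z-1)}$, while $\omega$ and $\rho$ correspond to the coefficients of second and first degree terms in $Q(z)$, respectively - when $m$ and $n$ are set as in equation (28). Explicit expressions for all these parameters in terms of the original rates are given by

$$
\begin{aligned}
m &= \frac{b}{\varepsilon} , \qquad n = \frac{b}{2 \lambda_2 \varepsilon} \left(\lambda_1 - \sqrt{\lambda_1^2 - 4 \lambda_0 \lambda_2}\right) \\
\gamma &= \frac{b - d}{\varepsilon} + 1 , \qquad \delta = -1 , \qquad \eta = \frac{b}{\lambda_2 \varepsilon} \sqrt{\lambda_1^2 - 4 \lambda_0 \lambda_2} \\
\omega &= -\frac{b}{2 \lambda_2 \varepsilon^2} \left[(b + d) \lambda_1 - (b - d) \sqrt{\lambda_1^2 - 4 \lambda_0 \lambda_2} + 2 (b \lambda_0 + d \lambda_2)\right] \\
\rho &= \frac{b}{2 \lambda_2 \varepsilon^2} \left[(b + d - \varepsilon) \lambda_1 - (b - d + \varepsilon) \sqrt{\lambda_1^2 - 4 \lambda_0 \lambda_2} + 2 \left[b \lambda_0 + (d - \varepsilon) \lambda_2\right]\right] .
\end{aligned}
\tag{30}
$$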

Now, equation (29) is the standard form of the single confluent Heun equation. Solutions to such a second order differential equation are called confluent Heun functions and depend on six arguments - the five parameters of the equation and the independent variable $z$. A standardized package for numerical and symbolical computations involving Heun functions is currently provided only by Maple (Motygin, 2018), which includes in particular the procedures HeunC and HeunCPrime for the evaluation of confluent Heun functions and their $z$ derivative, respectively. Consistently with these implementations the general solution of equation (29) can be written as

$$f(z) = h_1(z) + D z^{1 - \gamma} h_2(z) \tag{31}$$

where

$$
\begin{aligned}
h_1(z) &= \text{HeunC}\left(\eta,\ 1 - \gamma,\ \delta - 1,\ \omega - \frac{\eta (\gamma + \delta)}{2},\ \rho + \frac{1 - \gamma (\delta - \eta)}{2},\ z\right) \\
h_2(z) &= \text{HeunC}\left(\eta,\ \gamma - 1,\ \delta - 1,\ \omega - \frac{\eta (\gamma + \delta)}{2},\ \rho + \frac{1 - \gamma (\delta - \eta)}{2},\ z\right)
\end{aligned}
$$

are the confluent Heun functions uniquely determined by the following initial conditions

$$
\begin{aligned}
h_1(0) &= 1 , & h_1'(0) &= \frac{\rho}{\gamma} \\
h_2(0) &= 1 , & h_2'(0) &= \frac{(\gamma - 1)(\delta - \eta) - \rho}{\gamma - 2} .
\end{aligned}
\tag{32}
$$

Here $h_i'$ denotes the $z$ derivative of the confluent Heun functions $h_i$. As mentioned before, these functions are implemented by the Maple procedure HeunCPrime and uniquely determined by an additional condition on $h_i''(0)$, too cumbersome to be reported here. Also notice that the expression in equation (31) contains only one integrating constant, $D$, as our derivation stems from a first order differential equation. By plugging such expression back into equation (26), the solution to equation (25) becomes

$$Y(z) = z^m e^{nz} f(z) = z^m e^{nz} h_1(z) + D z^{1 + m - \gamma} e^{nz} h_2(z) .$$

Next, we denote the derivative of $f(z)$ as

$$g(z) = f'(z) = h_1'(z) + D z^{-\gamma} \left[(1 - \gamma) h_2(z) + z h_2'(z)\right] . \tag{33}$$

Recalling that $z = -\frac{\beta_2}{\alpha_2} s = ks$ and that $X(x, y, s) = -\frac{\partial_s Y}{Y}$, we find

$$X(x, y, s) = -\frac{(m + nks) f(ks) + ks\, g(ks)}{s f(ks)} . \tag{34}$$

Notice that in terms of the original parameters of our model the coefficient $k$ is given by $k = -\frac{\beta_2}{\alpha_2} = -q_b (y - 1)$ and thus depends on $y$. We can now apply the initial condition for $X(x, y, s)$ to find the value of the constant $D$. Since $s = e^{-\varepsilon t}$ and $X(x, y, s) = \frac{\alpha_2 + \beta_2 s}{s} \mathcal{A}(x, y, s)$, the initial condition $\mathcal{A}(x, y, t = 0) = x$ translates to $X(x, y, s = 1) = (\alpha_2 + \beta_2) x = \alpha_2 (1 - k) x$. Therefore, equation (34) at $s = 1$ implies

$$\alpha_2 (1 - k) x = -\frac{(m + nk) f(k) + k g(k)}{f(k)} .$$

Substituting the expressions for $f$ and its derivative (equations (31) and (33), respectively) and solving for $D$, we find

$$D = -\frac{(m + nk + \alpha_2 x - k \alpha_2 x) h_1(k) + k h_1'(k)}{k^{1 - \gamma} \left[(m + nk + \alpha_2 x - k \alpha_2 x + 1 - \gamma) h_2(k) + k h_2'(k)\right]} .$$

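Plugging this value back into equation (34), we find an expression for $X$ in terms of the functions $h_i(s)$ and $h_i'(s)$. Multiplying this expression by $\frac{s}{\alpha_2 (1 - ks)}$ and substituting $s = e^{-\varepsilon t}$, we obtain

$$\mathcal{A}(x, y, t) = \frac{e^{rt} K_1(x, y) \varphi_2(y, t) - K_2(x, y) \varphi_1(y, t)}{e^{rt} K_1(x, y) \psi_1(y, t) - K_2(x, y) \psi_2(y, t)} \tag{35}$$

where

$$
\begin{aligned}
K_1(x, y) &= \left[\frac{b - x \left[b + \lambda_2 (y - 1)\right]}{\varepsilon} + nk\right] h_1(k) + k h_1'(k) \\
K_2(x, y) &= \left[\frac{d - x \left[b + \lambda_2 (y - 1)\right]}{\varepsilon} + nk\right] h_2(k) + k h_2'(k) \\
\varphi_1(y, t) &= \left(\frac{b n k e^{-\varepsilon t}}{\varepsilon} + 1\right) h_1(k e^{-\varepsilon t}) + \frac{\varepsilon k}{b}\, e^{-\varepsilon t} h_1'(k e^{-\varepsilon t}) \\
\varphi_2(y, t) &= \left(\frac{b n k e^{-\varepsilon t}}{\varepsilon} + \frac{d}{b}\right) h_2(k e^{-\varepsilon t}) + \frac{\varepsilon k}{b}\, e^{-\varepsilon t} h_2'(k e^{-\varepsilon t}) \\
\psi_1(y, t) &= (1 - k e^{-\varepsilon t})\, h_1(k e^{-\varepsilon t}) \\
\psi_2(y, t) &= (1 - k e^{-\varepsilon t})\, h_2(k e^{-\varepsilon t}) .
\end{aligned}
\tag{36}
$$

This is the joint probability generating function of the processes $A_t$ and $C_t^A$, starting from one $A$ cell at time $t = 0$.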

To find the marginal generating function

$$\mathcal{A}(x, t) = \sum_{m=0}^{\infty} P\left(A_t = m \mid (A_0, C_0^A) = (1, 0)\right) x^m$$

we take the limit for $y \to 1$ in equation (35). Given the conditions in equation (32), this yields

$$\mathcal{A}(x, t) = \frac{e^{-rt} (x - \delta) + \delta (1 - x)}{e^{-rt} (x - \delta) + (1 - x)} . \tag{37}$$

As expected, this coincides with the probability generating function of a supercritical birth-death process with net growth rate $r = b - d > 0$ and extinction probability $\delta = d/b < 1$. The exact distribution for the process $A_t$ then follows by inverting $\mathcal{A}(x, t)$ and is given by

$$
P(A_t = m) = \frac{1}{m!} \left. \frac{\partial^m \mathcal{A}(x, t)}{\partial x^m} \right|_{x = 0} =
\begin{cases}
\dfrac{\delta}{S(t)} & \text{if } m = 0 \\[2ex]
\left(1 - \dfrac{\delta}{S(t)}\right) (S(t) - 1)\, S(t)^{-m} & \text{if } m \ge 1
\end{cases}
\tag{38}
$$

where $S(t) = \frac{1 - \delta e^{-rt}}{1 - e^{-rt}}$. Similarly, the generating function

$$\mathcal{C}^A(y, t) = \sum_{n=0}^{\infty} P\left(C_t^A = n \mid (A_0, C_0^A) = (1, 0)\right) y^n$$

is derived by taking the limit for $x \to 1$ in equation (35). The expression for $\mathcal{C}^A$ slightly simplifies to

$$\mathcal{C}^A(y, t) = \frac{e^{rt} \bar{K}_1(y) \varphi_2(y, t) - \bar{K}_2(y) \varphi_1(y, t)}{e^{rt} \bar{K}_1(y) \psi_1(y, t) - \bar{K}_2(y) \psi_2(y, t)} \tag{39}$$

where

$$
\begin{aligned}
\bar{K}_1(y) &= \left[nk - \frac{\lambda_2 (y - 1)}{\varepsilon}\right] h_1(k) + k h_1'(k) \\
\bar{K}_2(y) &= \left[nk - \frac{b - d + \lambda_2 (y - 1)}{\varepsilon}\right] h_2(k) + k h_2'(k)
\end{aligned}
$$

and the functions $\varphi_1$, $\varphi_2$, $\psi_1$, $\psi_2$ are defined as in equation (36). The function $\mathcal{C}^A(y, t)$ can be inverted numerically (see discussion below on inversion techniques) to find the distribution of the process $C_t^A$.
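Unlike the distribution of $C_t^A$, the distribution of $A_t$ in equation (38) is fully explicit and can be evaluated directly. The following sketch implements it and can be used to confirm that the probabilities sum to one and that the mean equals $e^{rt}$; the function name and parameter values are illustrative assumptions.

```python
import math


def tumor_size_pmf(m, b, d, t):
    """P(A_t = m) from equation (38) for a birth-death process with A_0 = 1, t > 0."""
    r = b - d
    delta = d / b
    big_s = (1.0 - delta * math.exp(-r * t)) / (1.0 - math.exp(-r * t))
    if m == 0:
        return delta / big_s  # extinction by time t
    return (1.0 - delta / big_s) * (big_s - 1.0) * big_s ** (-m)
```

For example, with the hypothetical rates `b=1.0, d=0.5` and `t=2.0`, summing `m * tumor_size_pmf(m, 1.0, 0.5, 2.0)` over `m` returns approximately `math.exp(1.0)`.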

We present the particular form that these results assume when one or more of the parameters $q_d$, $\lambda_1$, $q_b$ are zero. In general, among the nonzero shedding rates, the one with the highest index determines the special functions involved in the generating function $\mathcal{A}(x, y, t)$. As shown above, these are confluent Heun functions when $\lambda_2 > 0$, while they become confluent hypergeometric functions when $\lambda_2 = 0$ and $\lambda_1 > 0$ and Bessel functions of the first kind when $\lambda_2 = 0$, $\lambda_1 = 0$ and $\lambda_0 > 0$. The remaining cases follow by straightforward substitution from these three. We provide a sketch of how the previous derivation can be adapted to the cases $\lambda_2 = 0$, $\lambda_1 > 0$ and $\lambda_2 = 0$, $\lambda_1 = 0$, $\lambda_0 > 0$. The steps up until equation (24) are valid for all scenarios, so equation (24) will be the starting point of the adapted derivations. Also notice that for all these cases the marginal probability generating function $\mathcal{A}(x, t)$ remains unchanged, as the evolution of the process $A_t$ is not affected by its shedding activity.

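$q_b = 0$, $\lambda_1 > 0$

When $q_b = 0$, we have $\beta_2 = 0$. Hence, equation (24) becomes

$$\partial_s^2 Y - \frac{\beta_1 s + \alpha_1 - 1}{s}\, \partial_s Y + \frac{\alpha_2 (\alpha_0 + \beta_0 s)}{s^2}\, Y = 0 .$$

By seeking directly a solution of the form $Y(s) = s^{-\alpha_2} f(s)$ and then applying the change of variables $z = \beta_1 s$, the equation above reduces to the standard form of Kummer's equation (Abramowitz and Stegun, 1964)

$$z f''(z) + (\gamma - z) f'(z) - \omega f(z) = 0 \tag{40}$$

where

$$
\begin{aligned}
\gamma &= \alpha_0 - \alpha_2 + 1 \\
c &= \beta_1 \\
\omega &= -\alpha_2 \left(\frac{\beta_0}{\beta_1} + 1\right) .
\end{aligned}
$$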

Including again only one integrating constant $D$, its solution can be expressed in terms of the confluent hypergeometric function ${}_1F_1$ as

$$f(z) = {}_1F_1(\omega, \gamma, z) + D z^{1 - \gamma}\, {}_1F_1(\omega - \gamma + 1, 2 - \gamma, z) .$$

Substituting back we immediately find an expression for the solution $Y(s)$ in terms of

$$h_1(s) = {}_1F_1(\omega, \gamma, cs) , \qquad h_2(s) = {}_1F_1(\omega - \gamma + 1, 2 - \gamma, cs) .$$

Furthermore, since

$$\frac{\partial}{\partial z}\, {}_1F_1(a, b, z) = \frac{a}{b}\, {}_1F_1(a + 1, b + 1, z)$$

the derivative of $Y(s)$ can be computed in terms of the functions $h_1$, $h_2$, $g_1$ and $g_2$, where

$$g_1(s) = {}_1F_1(\omega + 1, \gamma + 1, cs) , \qquad g_2(s) = {}_1F_1(\omega - \gamma + 2, 3 - \gamma, cs) .$$

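Joining these results together, we find an expression for $X(s)$, in terms of the same functions. This expression still depends on the integrating constant $D$, which is determined by the initial condition $X(x, y, s = 1) = \alpha_2 x$. Then, multiplying $X(x, y, s)$ by $\frac{s}{\alpha_2}$ and substituting $s = e^{-\varepsilon t}$ we eventually find

$$\mathcal{A}(x, y, t) = \frac{e^{rt} K_1(x, y) \varphi_2(y, t) - K_2(x, y) \varphi_1(y, t)}{e^{rt} K_1(x, y) \psi_1(y, t) - K_2(x, y) \psi_2(y, t)} \tag{41}$$

where, writing $h_i(y, t)$ and $g_i(y, t)$ for $h_i$ and $g_i$ evaluated at $s = e^{-\varepsilon t}$,

$$
\begin{aligned}
K_1(x, y) &= (b - d + \varepsilon)(1 - x)\, h_1(y, 0) - (\lambda_1 + \lambda_0)(y - 1)\, g_1(y, 0) \\
K_2(x, y) &= (b - d - \varepsilon)(d - bx)\, h_2(y, 0) + (d \lambda_1 + b \lambda_0)(y - 1)\, g_2(y, 0) \\
\varphi_1(y, t) &= (b - d + \varepsilon)\, h_1(y, t) - (\lambda_1 + \lambda_0)(y - 1)\, e^{-\varepsilon t} g_1(y, t) \\
\varphi_2(y, t) &= d (b - d - \varepsilon)\, h_2(y, t) + (d \lambda_1 + b \lambda_0)(y - 1)\, e^{-\varepsilon t} g_2(y, t) \\
\psi_1(y, t) &= (b - d + \varepsilon)\, h_1(y, t) \\
\psi_2(y, t) &= b (b - d - \varepsilon)\, h_2(y, t) .
\end{aligned}
$$

The marginal probability generating function for the process $C_t^A$ becomes

$$\mathcal{C}^A(y, t) = \frac{e^{rt} \bar{K}_1(y) \varphi_2(y, t) - \bar{K}_2(y) \varphi_1(y, t)}{e^{rt} \bar{K}_1(y) \psi_1(y, t) - \bar{K}_2(y) \psi_2(y, t)} \tag{42}$$

where

$$
\begin{aligned}
\bar{K}_1(y) &= -(\lambda_1 + \lambda_0)(y - 1)\, g_1(y, 0) \\
\bar{K}_2(y) &= (b - d - \varepsilon)(d - b)\, h_2(y, 0) + (d \lambda_1 + b \lambda_0)(y - 1)\, g_2(y, 0) .
\end{aligned}
$$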

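$q_b = 0$, $\lambda_1 = 0$

If we additionally have $\lambda_1 = 0$, equation (24) becomes

$$\partial_s^2 Y - \frac{\alpha_1 - 1}{s}\, \partial_s Y + \frac{\alpha_2 (\alpha_0 + \beta_0 s)}{s^2}\, Y = 0 .$$

Through the change of variables $z = 2 \sqrt{\alpha_2 \beta_0 s} = c \sqrt{s}$ and by seeking a solution of the form $Y(z) = z^{\alpha_1} f(z)$, we reduce to the Bessel equation (Abramowitz and Stegun, 1964)

$$z^2 f'' + z f' + \left[z^2 - (\alpha_2 - \alpha_0)^2\right] f = 0 . \tag{43}$$

The solution to equation (43) can be expressed in terms of the Bessel function of the first kind $J_\nu(z)$ as

$$f(z) = J_{\alpha_2 - \alpha_0}(z) + D J_{\alpha_0 - \alpha_2}(z) .$$

This allows us to write $Y(s)$ in terms of $h_1(s) = J_{\alpha_0 - \alpha_2}(c \sqrt{s})$ and $h_2(s) = J_{\alpha_2 - \alpha_0}(c \sqrt{s})$ and since

$$\partial_z J_w(z) = \frac{w}{z} J_w(z) - J_{w+1}(z) ,$$

we can compute the derivative of $Y(s)$ in terms of the functions $h_1$, $h_2$, $g_1$ and $g_2$, where

$$g_1(s) = J_{\alpha_0 - \alpha_2 + 1}(c \sqrt{s}) , \qquad g_2(s) = J_{\alpha_2 - \alpha_0 + 1}(c \sqrt{s}) .$$

By combining these expressions, we first derive $X(s)$, and then find the integrating constant $D$ by applying the initial condition $X(x, y, s = 1) = \alpha_2 x$. Multiplying by $\frac{s}{\alpha_2}$ and substituting $s = e^{-\varepsilon t}$, we find

$$\mathcal{A}(x, y, t) = \frac{K_1(x, y) \varphi_2(y, t) - K_2(x, y) \varphi_1(y, t)}{K_1(x, y) \psi_2(y, t) - K_2(x, y) \psi_1(y, t)} \tag{44}$$

where

$$
\begin{aligned}
K_1(x, y) &= (bx - b)\, h_1(y, 0) + \sqrt{b \lambda_0 (y - 1)}\, g_1(y, 0) \\
K_2(x, y) &= (bx - d)\, h_2(y, 0) + \sqrt{b \lambda_0 (y - 1)}\, g_2(y, 0) \\
\varphi_1(y, t) &= b\, h_1(y, t) - \sqrt{b \lambda_0 (y - 1)}\, e^{-\frac{\varepsilon}{2} t} g_1(y, t) \\
\varphi_2(y, t) &= d\, h_2(y, t) - \sqrt{b \lambda_0 (y - 1)}\, e^{-\frac{\varepsilon}{2} t} g_2(y, t) \\
\psi_1(y, t) &= b\, h_1(y, t) \\
\psi_2(y, t) &= b\, h_2(y, t) .
\end{aligned}
$$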

From here we can recover the usual marginal probability generating function for the process $A_t$ by considering the expansions of Bessel functions $J_w(z)$ around $z = 0$. The marginal probability generating function for the process $C_t^A$ is instead given by

$$\mathcal{C}^A(y, t) = \frac{\bar{K}_1(y) \varphi_2(y, t) - \bar{K}_2(y) \varphi_1(y, t)}{\bar{K}_1(y) \psi_1(y, t) - \bar{K}_2(y) \psi_2(y, t)} \tag{45}$$

where

$$
\begin{aligned}
\bar{K}_1(y) &= \sqrt{b \lambda_0 (y - 1)}\, g_1(y, 0) \\
\bar{K}_2(y) &= (b - d)\, h_2(y, 0) + \sqrt{b \lambda_0 (y - 1)}\, g_2(y, 0) .
\end{aligned}
$$

Finally, we remark that in all the derivations above, there are special cases of the equation for $f(z)$ that require extra attention. Namely, if some of the equation coefficients are integers, then its general solution assumes a slightly different expression than the one reported. However, it is unlikely that real parameter estimates lead to such special cases, which we therefore do not discuss in detail here.

Process $(B_t, C_t^B)$

We derive exact results for the size distributions of the processes $B_t$ and $C_t^B$. So far, we assumed a constant population of benign cells and studied the asymptotic behaviour of the biomarker amount they shed. Here, we first present additional details about this case, providing the exact generating function of $C_t^B$ at a given time $t$. Later, we show how a derivation similar to that employed for the process $(A_t, C_t^A)$ can be used when $B_t$ is modeled by a critical branching birth-death process.

Constant growth

In the derivation of asymptotic results for our model, we already mentioned that if $B_t$ is constant then $C_t^B$ is a branching pure-death process with immigration and its exact probability generating function is given by equation (3). In our setup, we also assume that $C_t^B$ is originally at equilibrium, i.e. that it starts at the expected value of its large time asymptotic distribution. Since we observed that in this limit $C_t^B$ converges to a Poisson random variable with mean $\frac{\lambda_{bn}}{\varepsilon} B_0$, we set $C_0^B = \frac{\lambda_{bn}}{\varepsilon} B_0$. With this initial condition, equation (3) becomes

$$\mathcal{C}^B(y, t) = \left\{\left[1 + (y - 1) e^{-\varepsilon t}\right] e^{(y - 1)\left(1 - e^{-\varepsilon t}\right)}\right\}^{\frac{\lambda_{bn}}{\varepsilon} B_0} .$$

Numerical inversion of this function then provides the exact distribution of the process $C_t^B$ at any given time $t$.
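One common way to perform such an inversion is to evaluate the generating function at equally spaced points on the unit circle and apply a discrete Fourier transform. The sketch below does this for the pure-death process with immigration, under two assumptions that are not part of the text: the initial amount `C0` is an integer count (which avoids branch cuts of complex powers), and the distribution has negligible mass at or above `n_points`. Function and parameter names are illustrative.

```python
import cmath
import math


def benign_ctdna_pmf(B0, lam_bn, eps, t, C0, n_points=256):
    """Approximate P(C_t^B = n), n = 0..n_points-1, by inverting the PGF

        G(y) = [1 + (y-1) e^{-eps t}]^C0 * exp((lam_bn/eps) B0 (y-1)(1 - e^{-eps t}))

    with a discrete Fourier transform on the unit circle.
    """
    s = math.exp(-eps * t)
    mu = B0 * lam_bn / eps  # equilibrium mean

    def pgf(y):
        return (1 + (y - 1) * s) ** C0 * cmath.exp(mu * (y - 1) * (1 - s))

    values = [pgf(cmath.exp(2j * math.pi * k / n_points)) for k in range(n_points)]
    pmf = []
    for n in range(n_points):
        total = sum(
            v * cmath.exp(-2j * math.pi * k * n / n_points)
            for k, v in enumerate(values)
        )
        pmf.append(max(total.real / n_points, 0.0))  # clip tiny negative round-off
    return pmf
```

Starting from `C0 = 0` and taking `t` large, the returned probabilities should approach those of a Poisson distribution with mean `B0 * lam_bn / eps`, in line with the equilibrium result above.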

Critical growth

In our model $B_t$ denotes the number of benign cells at time $t$ with the ability to shed the biomarker into the bloodstream. So far we have assumed that this number stays constant over time, but more generally it can fluctuate around a constant average value. For this reason, $B_t$ can be modeled as a critical branching birth-death process, i.e. as a process defined like $A_t$ but with equal birth and death rates $b_{\mathrm{bn}} = d_{\mathrm{bn}}$. Under this assumption, and assuming that the shedding probabilities at cell apoptosis and proliferation are the same for benign and malignant cells, the transitions characterizing the two-type process $(B_t, C_t^B)$ are

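$$
\begin{aligned}
B &\longrightarrow BBC && \text{rate } b_{\mathrm{bn}}\, q_b = \lambda_{\mathrm{bn},2} \\
B &\longrightarrow BB && \text{rate } b_{\mathrm{bn}} (1 - q_b) = b_{\mathrm{bn}} - \lambda_{\mathrm{bn},2} \\
B &\longrightarrow BC && \text{rate } \lambda_{\mathrm{bn},1} \\
B &\longrightarrow C && \text{rate } b_{\mathrm{bn}}\, q_d = \lambda_{\mathrm{bn},0} \\
B &\longrightarrow \emptyset && \text{rate } b_{\mathrm{bn}} (1 - q_d) = b_{\mathrm{bn}} - \lambda_{\mathrm{bn},0} \\
C &\longrightarrow \emptyset && \text{rate } \varepsilon .
\end{aligned}
$$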

The exact probability generating functions for this process can be computed through a derivation similar to that shown for $(A_t, C_t^A)$. Briefly, we can redefine $P^{*}$ as

$$
P^{*}(x, y, t) = \sum_{m, n \ge 0} x^m y^n \, \mathbb{P}\left( (B_t, C_t^B) = (m, n) \mid (B_0, C_0^B) = * \right)
$$

and set $\mathcal{B}(x, y, t) = P^{(1,0)}(x, y, t)$, $\mathcal{C}_0(x, y, t) = P^{(0,1)}(x, y, t)$. As biomarker units are eliminated from the bloodstream at the same rate regardless of the cell type that shed them, we have $\mathcal{C}_0(x, y, t) = \mathcal{C}(x, y, t)$, where $\mathcal{C}(x, y, t)$ is given by equation (20). However, the function $\mathcal{B}(x, y, t)$ does not follow straightforwardly from $\mathcal{A}(x, y, t)$ by simply substituting the dashed rates and taking the limit as $d_{\mathrm{bn}} \to b_{\mathrm{bn}}$. The reason is that in this limit the two linearly independent solutions $h_1(z)$ and $h_2(z)$ of the reduced equation become the same and so a different set of solutions has to be chosen. Once these are found, the subsequent steps can be repeated and an explicit expression for $\mathcal{B}(x, y, t)$ can be derived. The exact probability generating function for the process $(B_t, C_t^B)$ is given by

$$
P^{(B_0, C_0^B)}(x, y, t) = \mathcal{B}(x, y, t)^{B_0} \, \mathcal{C}(x, y, t)^{C_0^B} .
$$

The marginal generating functions for the two single processes follow from the $y \to 1$ and $x \to 1$ limits, respectively. In particular, for any combination of non-zero shedding rates the former is equal to

$$
\mathcal{B}(x, t) = \left( 1 + \frac{x - 1}{1 - d_{\mathrm{bn}} t\, (x - 1)} \right)^{B_0}
$$

which coincides with the probability generating function of a critical branching birth-death process with death rate $d_{\mathrm{bn}}$ and $B_0$ initial individuals. This function is analytically invertible, even though the terms of the probability mass function of $B_t$ can only be expressed as finite sums

$$
\mathbb{P}(B_t = m \mid B_0) = \frac{B_0 \left( \frac{d_{\mathrm{bn}} t}{d_{\mathrm{bn}} t + 1} \right)^{B_0}}{m\, (d_{\mathrm{bn}} t)^m (1 + d_{\mathrm{bn}} t)^m} \sum_{j=0}^{m-1} \binom{B_0 - 1}{B_0 - m + j} \binom{m}{j} (d_{\mathrm{bn}} t)^{2j} .
$$

The marginal generating function of $C_t^B$ can instead be inverted numerically to derive the exact probability distribution of the process. However, while the computations described above are feasible, modeling $B_t$ as a critical branching birth-death process leads to tedious complications and does not practically change the dynamics of the model. The complications are related to the fact that a critical branching process eventually goes extinct with probability one. We are generally interested in the large time limit distribution of $B_t$, but this limit converges to a point mass at zero, and conditioning on the eventual survival of $B_t$ does not make sense either. Moreover, since $B_t$ starts with a very large number of cells, the time required by $B_t$ to become extinct would be unrealistically long. Indeed, the expected time to extinction for a critical branching process is infinite, and using the expression above with $d_{\mathrm{bn}} = 0.1$, we find that $\mathbb{P}(B_{10000\,\mathrm{yrs}} = 0 \mid B_0 = 10^7) \approx 10^{-12}$. By the same principle, the probability that $B_t$ exhibits significant deviations from $B_0$ within a human lifetime is very small. To quantify the probability of such fluctuations, we exploit Chebyshev's inequality: using again $d_{\mathrm{bn}} = 0.1$ and $B_0 = 10^7$, we find that $\mathbb{P}(|B_{55\,\mathrm{yrs}} - 10^7| \ge 2 \times 10^6) \le 0.01$, which says that even after waiting 55 years, the probability of observing a deviation of at least 20% from the original population size would still be lower than 1%.
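The finite-sum formula for the critical process and the two numerical estimates quoted above can be checked with a short script. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the case $m = 0$ is read off the generating function at $x = 0$, and a year is assumed to have 365 days.

```python
from math import comb, exp, log1p

DAYS_PER_YEAR = 365  # assumption: rates are per day, years converted with 365 days


def critical_bd_pmf(m, b0, d_bn, t):
    """P(B_t = m | B_0 = b0) for a critical birth-death process (finite-sum formula)."""
    tau = d_bn * t
    if m == 0:
        # Generating function evaluated at x = 0: all b0 lineages are extinct.
        return (tau / (1.0 + tau)) ** b0
    total = 0.0
    for j in range(m):
        k = b0 - m + j  # lower index of the first binomial coefficient
        if 0 <= k <= b0 - 1:
            total += comb(b0 - 1, k) * comb(m, j) * tau ** (2 * j)
    prefactor = b0 * (tau / (1.0 + tau)) ** b0 / (m * tau ** m * (1.0 + tau) ** m)
    return prefactor * total


def extinction_probability(b0, d_bn, t):
    """P(B_t = 0 | B_0 = b0), computed in log space so that very large b0 is safe."""
    tau = d_bn * t
    return exp(-b0 * log1p(1.0 / tau))


def chebyshev_bound(b0, d_bn, t, deviation):
    """Chebyshev bound on P(|B_t - b0| >= deviation), using Var(B_t) = 2 d_bn b0 t."""
    return 2.0 * d_bn * b0 * t / deviation ** 2


# Sanity check on a small example (illustrative values): mass 1, mean b0, variance 2 d b0 t.
pmf = [critical_bd_pmf(m, 3, 0.1, 5.0) for m in range(200)]
total_mass = sum(pmf)
mean = sum(m * p for m, p in enumerate(pmf))
variance = sum(m * m * p for m, p in enumerate(pmf)) - mean ** 2

# Orders of magnitude discussed in the text.
p_ext = extinction_probability(10**7, 0.1, 10_000 * DAYS_PER_YEAR)
bound = chebyshev_bound(10**7, 0.1, 55 * DAYS_PER_YEAR, 2 * 10**6)
```

Under these assumptions, `p_ext` is of order $10^{-12}$ and `bound` is close to 0.01, in line with the values stated in the text.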

Process Ct and numerical inversion

Given the independence of biomarker shedding from benign and malignant cells, the probability generating function of the process $C_t = C_t^A + C_t^B$ is equal to

$$
\mathcal{C}(y, t) = \mathcal{C}^A(y, t)\, \mathcal{C}^B(y, t)
$$

where the two functions on the right hand side are given by equation (39) (see also equations (42) and (45) for special cases) and equation (3) (assuming $B_t$ is constant), respectively. As for many other functions derived before, $\mathcal{C}(y, t)$ can be numerically inverted to find the exact distribution of the process $C_t$.

We briefly discuss some inversion techniques. The goal of the inversion procedure is to derive the probability mass function of a stochastic process $\chi_t$ from its generating function $G(z, t) = \sum_{k=0}^{\infty} \mathbb{P}(\chi_t = k)\, z^k$ as

$$
\mathbb{P}(\chi_t = k) = \frac{1}{k!} \left. \frac{\partial^k G(z, t)}{\partial z^k} \right|_{z=0} . \tag{46}
$$

In a few cases, the $k$-th derivative of the generating function is explicitly computable, thus allowing us to find an analytic formula for the probability mass function of $\chi_t$. The expression for $\mathcal{A}(x, t)$ given by equation (37) is an example of such a generating function and from it we directly derived the probabilities $\mathbb{P}(A_t = m)$ (see equation (38)). For most generating functions, however, this analytical inversion is not feasible and we have to compute the probability mass function numerically. Algorithms for this numerical procedure are based on techniques for series expansions of analytic functions, which are implemented as Series in both Mathematica (see also NSeries) and Maple. For more details on the numerical inversion of probability generating functions, we refer to Abate and Whitt (1992) and Choudhury et al. (1994).
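As an illustration of numerical inversion, the coefficients in equation (46) can also be approximated by sampling the generating function on a circle in the complex plane and applying a discrete Fourier transform (a discretized Cauchy integral). This is a minimal sketch of that generic technique, not necessarily the procedure used for the results reported here. The function names, the number of terms and the initial value `c0 = 5` are hypothetical choices, and the decay rate is taken from the 30-minute half-life quoted in the figure captions.

```python
import cmath
import math


def invert_pgf(pgf, n_terms, radius=1.0):
    """Approximate P(X = k), k = 0..n_terms-1, from a PGF via a discrete Cauchy integral.

    The result is accurate when the probability mass beyond n_terms is negligible.
    """
    n = n_terms
    # Sample the generating function on a circle of the given radius.
    samples = [pgf(radius * cmath.exp(2j * math.pi * idx / n)) for idx in range(n)]
    pmf = []
    for k in range(n):
        coeff = sum(s * cmath.exp(-2j * math.pi * idx * k / n)
                    for idx, s in enumerate(samples)) / n
        pmf.append(coeff.real / radius ** k)
    return pmf


def poisson_pgf(z, mean=3.0):
    """PGF of a Poisson variable, used only to validate the inversion."""
    return cmath.exp(mean * (z - 1.0))


def benign_biomarker_pgf(y, t, eps=math.log(2.0) / (30.0 / 1440.0), c0=5):
    """PGF of C_t^B started at its equilibrium mean c0 (assumed to be an integer here)."""
    q = math.exp(-eps * t)
    return ((1.0 + (y - 1.0) * q) * cmath.exp((y - 1.0) * (1.0 - q))) ** c0


# Validation against a distribution with a known probability mass function.
approx = invert_pgf(poisson_pgf, 64)
exact = [math.exp(-3.0) * 3.0 ** k / math.factorial(k) for k in range(64)]
max_error = max(abs(a - e) for a, e in zip(approx, exact))

# Inversion of the benign biomarker PGF at t = 0.01 days; its mean should stay at c0.
pmf_cb = invert_pgf(lambda y: benign_biomarker_pgf(y, t=0.01), 64)
mean_cb = sum(k * p for k, p in enumerate(pmf_cb))
```

The direct transform costs $O(n^2)$ operations, which is adequate for a few hundred terms; a fast Fourier transform would be the natural replacement for longer expansions.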

Conditional distributions

In our model summary we pointed out that we are interested in the probability distributions of the processes $A_t$, $B_t$ and $C_t$ conditional on the non-extinction of $A_t$. Therefore, the asymptotic results derived before are conditioned on the eventual survival of $A_t$, an event which we denoted as $\Omega$. The same results were also expressed in terms of the primary tumor size, by computing the asymptotic distributions of the processes involved at the time when $A_t = m$, for $m$ large. Here we show how the same kind of conditioning can be applied to the exact probability mass functions derived above.

We note that all the probabilities involved in the following derivations are conditioned on the usual initial values of the processes $A_t$, $B_t$ and $C_t$. For clarity, we do not introduce a new notation for such conditioning, and hereafter we implicitly denote

$$
\mathbb{P}(\,\cdot\,) = \mathbb{P}\left( \,\cdot \;\middle|\; (A_0, B_0, C_0^A, C_0^B) = \left( 1, B_0, 0, \tfrac{\lambda_{\mathrm{bn}}}{\varepsilon} B_0 \right) \right) .
$$

Conditioning on At survival

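As we are dealing with exact distributions at a given time, we condition on the event of $A_t$ survival up to time $t$, that is

$$
\Omega_t = \{ A_t > 0 \} .
$$

We denote $\delta_t = \mathbb{P}(\Omega_t^c) = \mathbb{P}(A_t = 0)$. This probability is equal to $\mathcal{A}(0, t)$, so we get

$$
\delta_t = \frac{\delta - \delta e^{-rt}}{1 - \delta e^{-rt}} .
$$

Bayes' theorem yields

$$
\mathbb{P}(A_t = m \mid \Omega_t) = \frac{\mathbb{P}(A_t = m)}{\mathbb{P}(\Omega_t)} \cdot \mathbb{P}(\Omega_t \mid A_t = m)
$$

for any $m \ge 0$. The term $\mathbb{P}(\Omega_t \mid A_t = m)$, however, is equal to 0 for $m = 0$ and to 1 for every $m \ge 1$. Hence, we get

$$
\mathbb{P}(A_t = m \mid \Omega_t) = \frac{\mathbb{P}(A_t = m)}{1 - \delta_t}
$$

for every $m \ge 1$, and 0 otherwise.

Similar steps allow us to compute $\mathbb{P}(C_t^A = n \mid \Omega_t)$. In this case, we find

$$
\mathbb{P}(C_t^A = n \mid \Omega_t) = \frac{\mathbb{P}(C_t^A = n, \Omega_t)}{\mathbb{P}(\Omega_t)} = \frac{\mathbb{P}(C_t^A = n) - \mathbb{P}\left( (A_t, C_t^A) = (0, n) \right)}{1 - \delta_t} . \tag{47}
$$

The terms $\mathbb{P}(C_t^A = n)$ follow from the probability generating function $\mathcal{C}^A(y, t)$. Similarly, the probabilities $\mathbb{P}\left( (A_t, C_t^A) = (0, n) \right)$ are obtained by inverting the function $\mathcal{A}(0, y, t)$.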

Conditioning on one process size

The distribution of the total biomarker amount present in the bloodstream at time $t$ conditioned on the primary tumor size at that time is

$$
\begin{aligned}
\mathbb{P}(C_t = n \mid A_t = m) &= \sum_{i=0}^{n} \mathbb{P}(C_t^B = i)\, \mathbb{P}(C_t^A = n - i \mid A_t = m) \\
&= \frac{\sum_{i=0}^{n} \mathbb{P}(C_t^B = i)\, \mathbb{P}(A_t = m, C_t^A = n - i)}{\mathbb{P}(A_t = m)} .
\end{aligned}
$$

The terms $\mathbb{P}(A_t = m, C_t^A = n - i)$ follow from inverting the joint probability generating function $\mathcal{A}(x, y, t)$. Moreover, in this case, if we consider a strictly positive primary tumor size $m$, conditioning on $A_t$ survival is not necessary as $\{A_t = m\} \subset \Omega_t$.

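We note that once the exact probability mass functions are known, one can conversely condition on the total biomarker amount present and ask how this affects the primary tumor size distribution. To see this, we write

$$
\begin{aligned}
\mathbb{P}(A_t = m \mid C_t = n) &= \frac{\mathbb{P}(A_t = m, C_t = n)}{\mathbb{P}(C_t = n)} \\
&= \frac{\sum_{i=0}^{n} \mathbb{P}(C_t^B = i)\, \mathbb{P}(A_t = m, C_t^A = n - i)}{\sum_{i=0}^{n} \mathbb{P}(C_t^B = i)\, \mathbb{P}(C_t^A = n - i)} .
\end{aligned}
$$

By further conditioning on $A_t$ survival up to $t$, we obtain for $m \ge 1$

$$
\mathbb{P}(A_t = m \mid C_t = n, \Omega_t) = \frac{\mathbb{P}(A_t = m, \Omega_t \mid C_t = n)}{\mathbb{P}(\Omega_t \mid C_t = n)} = \frac{\mathbb{P}(A_t = m \mid C_t = n)\, \mathbb{P}(C_t = n)}{(1 - \delta_t)\, \mathbb{P}(C_t = n \mid \Omega_t)} ,
$$

where the second equality uses $\mathbb{P}(\Omega_t \mid C_t = n) = (1 - \delta_t)\, \mathbb{P}(C_t = n \mid \Omega_t) / \mathbb{P}(C_t = n)$.

Finally, we recall that because $(B_t, C_t^B)$ is independent of the primary tumor growth dynamics, any kind of conditioning on the size of $A_t$ has no effect on the probability distributions of $B_t$ and $C_t^B$.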
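The two conditioning formulas of this subsection are discrete convolutions, and can be sketched as follows once the probability mass functions have been tabulated. The helper names and the small tables below are hypothetical, chosen only so that the arithmetic is easy to follow; in practice the tables would come from inverting $\mathcal{A}(x, y, t)$ and $\mathcal{C}^B(y, t)$ and would be truncated where the remaining mass is negligible.

```python
def biomarker_given_size(pmf_cb, joint_a_ca, m):
    """P(C_t = n | A_t = m) for n = 0, 1, ..., by convolving C_t^B with (A_t = m, C_t^A)."""
    row = joint_a_ca[m]          # row[k] = P(A_t = m, C_t^A = k)
    p_size = sum(row)            # P(A_t = m), exact only if the table is not truncated
    out = [0.0] * (len(pmf_cb) + len(row) - 1)
    for i, p_b in enumerate(pmf_cb):
        for k, p_joint in enumerate(row):
            out[i + k] += p_b * p_joint / p_size
    return out


def size_given_biomarker(pmf_cb, joint_a_ca, n):
    """P(A_t = m | C_t = n) for m = 0, 1, ..., normalizing the same convolution over m."""
    weights = []
    for row in joint_a_ca:
        weights.append(sum(pmf_cb[i] * row[n - i]
                           for i in range(n + 1)
                           if i < len(pmf_cb) and n - i < len(row)))
    norm = sum(weights)          # equals P(C_t = n)
    return [w / norm for w in weights]


# Hypothetical toy tables: joint[m][k] = P(A_t = m, C_t^A = k), pmf_cb[i] = P(C_t^B = i).
joint = [
    [0.20, 0.00, 0.00],
    [0.10, 0.20, 0.10],
    [0.05, 0.15, 0.20],
]
pmf_cb = [0.5, 0.3, 0.2]

given_size = biomarker_given_size(pmf_cb, joint, m=1)
posterior = size_given_biomarker(pmf_cb, joint, n=0)
```

In this toy example the posterior for $n = 0$ puts most of its weight on $m = 0$, because an extinct tumor cannot contribute any biomarker.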

Expected values

From equation (46) we recover the well known formulas for the mean and variance of a discrete state stochastic process $\chi_t$ with probability generating function $G(z, t)$

$$
\mathbb{E}[\chi_t] = \partial_z G(z, t) \big|_{z=1}, \qquad \operatorname{Var}(\chi_t) = \partial_z^2 G(z, t) \big|_{z=1} + \mathbb{E}[\chi_t] - \mathbb{E}[\chi_t]^2 .
$$

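These properties allow us to derive analytical expressions for the expected value and variance of the processes $A_t$, $B_t$ and $C_t$ at any given time $t$. In this section, we summarize these expressions, implicitly conditioning on $(A_0, C_0^A) = (1, 0)$ and $C_0^B = \frac{\lambda_{\mathrm{bn}}}{\varepsilon} B_0$.

For the malignant cancer cell population, we have

$$
\mathbb{E}[A_t] = e^{rt}, \qquad \operatorname{Var}(A_t) = \frac{1 + \delta}{1 - \delta}\, e^{rt} \left( e^{rt} - 1 \right)
$$

and

$$
\mathbb{E}[A_t \mid \Omega_t] = \frac{e^{rt} - \delta}{1 - \delta}, \qquad \operatorname{Var}(A_t \mid \Omega_t) = \frac{\left( 1 - \delta e^{-rt} \right) (1 + 2\delta)\, e^{rt} \left( e^{rt} - 1 \right)}{(1 - \delta)^2} .
$$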
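The conditional mean can be cross-checked against the conditioning rule of the previous section: extinct realizations contribute zero cells, so $\mathbb{E}[A_t \mid \Omega_t] = \mathbb{E}[A_t] / (1 - \delta_t)$. The sketch below does this numerically. The function names are hypothetical, and the parameter values are those quoted in the caption of Table S1.

```python
import math


def extinction_prob_by_time(b, d, t):
    """delta_t = P(A_t = 0) for a birth-death process started from one cell."""
    delta = d / b
    decay = math.exp(-(b - d) * t)
    return (delta - delta * decay) / (1.0 - delta * decay)


def mean_size_given_survival(b, d, t):
    """E[A_t | A_t > 0] obtained as E[A_t] / (1 - delta_t)."""
    return math.exp((b - d) * t) / (1.0 - extinction_prob_by_time(b, d, t))


# Parameter values from the Table S1 caption (rates per cell per day, time in days).
b, d, t = 0.14, 0.136, 3000.0
delta = d / b

closed_form = (math.exp((b - d) * t) - delta) / (1.0 - delta)
via_bayes = mean_size_given_survival(b, d, t)
```

Both routes give about $5.696 \times 10^6$ cells, the expected size stated in the captions of Tables S1 and S2.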

When the benign population is modeled by a critical branching birth death process, we find

$$
\mathbb{E}[B_t] \equiv B_0, \qquad \operatorname{Var}(B_t) = 2\, d_{\mathrm{bn}} B_0\, t .
$$

In this case, conditioning on At survival has no effect as At and Bt are independent processes.

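As for the total amount of circulating biomarker, from the functions $\mathcal{C}^A(y, t)$ and $\mathcal{C}^B(y, t)$ we get

$$
\mathbb{E}[C_t^A] = \frac{\lambda \left( e^{rt} - e^{-\varepsilon t} \right)}{r + \varepsilon}, \qquad \mathbb{E}[C_t^B] \equiv \frac{\lambda_{\mathrm{bn}}}{\varepsilon} B_0 .
$$

The expression for the latter mean is the same for $B_t$ modeled as a constant population or as a critical branching birth-death process. Furthermore, in our setup this expectation does not depend on $t$ as we start the process $C_t^B$ at its equilibrium. Combining the previous results, we find

$$
\mathbb{E}[C_t] = \frac{\lambda \left( e^{rt} - e^{-\varepsilon t} \right)}{r + \varepsilon} + \frac{\lambda_{\mathrm{bn}}}{\varepsilon} B_0 .
$$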

The variance of Ct and expressions for its first moments conditional on At survival can be obtained from equation (47).
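For a sense of scale, the expected biomarker amounts can be evaluated directly. This is a minimal sketch with hypothetical helper names. The elimination rate $\varepsilon$ is derived from the 30-minute half-life, and $\lambda$, $b$, $d$ and $t$ are taken from the Table S1 caption, whereas the benign shedding rate `lam_bn` and the benign population `b0` are placeholder values chosen purely for illustration.

```python
import math


def decay_rate_from_half_life(half_life_days):
    """Elimination rate epsilon (per day) for a given biomarker half-life in days."""
    return math.log(2.0) / half_life_days


def mean_tumor_biomarker(lam, r, eps, t):
    """E[C_t^A] = lam * (exp(r t) - exp(-eps t)) / (r + eps), not conditioned on survival."""
    return lam * (math.exp(r * t) - math.exp(-eps * t)) / (r + eps)


def mean_benign_biomarker(lam_bn, eps, b0):
    """E[C_t^B] = lam_bn * b0 / eps, constant in time at equilibrium."""
    return lam_bn * b0 / eps


eps = decay_rate_from_half_life(30.0 / (24 * 60))  # 30 minutes expressed in days

mean_ca = mean_tumor_biomarker(lam=1.904e-5, r=0.14 - 0.136, eps=eps, t=3000.0)
mean_cb = mean_benign_biomarker(lam_bn=1.0e-6, eps=eps, b0=10**7)  # placeholder values
mean_total = mean_ca + mean_cb
```

Note that `mean_ca` is the unconditional mean, which averages over the many realizations in which the tumor went extinct; the values reported in Tables S1 and S2 are instead conditioned on tumor survival until time $t$.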

Sampling scheme

By combining the previous results, we can now derive the exact probability distribution for the total biomarker amount present in a blood sample of a given volume at time $t$. To this end, we first assume that at time $t$ there is a fixed biomarker amount $C_t = n$ uniformly distributed over a total volume $V_{\mathrm{tot}}$ of plasma. If we sample from it a volume $V_s$, each biomarker unit is in the sample independently of the others with probability $p = V_s / V_{\mathrm{tot}}$. Hence, the total amount of biomarker $X_t$ present in the sample is binomially distributed with parameters $n$ and $p$

$$
\mathbb{P}(X_t = k \mid C_t = n) = \binom{n}{k} \left( \frac{V_s}{V_{\mathrm{tot}}} \right)^{k} \left( 1 - \frac{V_s}{V_{\mathrm{tot}}} \right)^{n - k} . \tag{48}
$$

The distribution of the total amount of biomarker present in the plasma sample then follows by averaging over all the possible values of $C_t$

$$
\mathbb{P}(X_t = k) = \sum_{n=k}^{\infty} \mathbb{P}(C_t = n)\, \mathbb{P}(X_t = k \mid C_t = n) . \tag{49}
$$

The second term in the sum above is simply given by equation (48). As we noticed in the derivation of our asymptotic results, if Ct follows a Poisson distribution, then due to the thinning property, Xt follows a Poisson distribution as well. The expected value and variance of Xt in terms of Ct are given by

$$
\mathbb{E}[X_t] = p\, \mathbb{E}[C_t], \qquad \operatorname{Var}(X_t) = p (1 - p)\, \mathbb{E}[C_t] + p^2 \operatorname{Var}(C_t) .
$$
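The sampling step in equations (48) and (49) can be sketched as a binomial thinning of a tabulated distribution of $C_t$. In the example below, the names are hypothetical, the sampling probability corresponds to 7.5 mL drawn from an assumed 3000 mL of plasma, and $C_t$ is taken to be Poisson with an illustrative mean of 572 so that the thinning property mentioned above can be verified.

```python
from math import comb, exp, lgamma, log


def poisson_pmf(n, mean):
    """Poisson probability evaluated in log space to avoid overflow for large n."""
    return exp(n * log(mean) - mean - lgamma(n + 1))


def sample_pmf(pmf_c, p, k):
    """P(X_t = k): equation (49) with the binomial term (48), truncated to len(pmf_c) terms."""
    return sum(pmf_c[n] * comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for n in range(k, len(pmf_c)))


p = 7.5 / 3000.0   # sampled plasma volume over assumed total plasma volume
mean_c = 572.0     # illustrative mean of C_t

# Tabulate P(C_t = n) far enough into the tail that the truncated mass is negligible.
pmf_c = [poisson_pmf(n, mean_c) for n in range(1001)]

pmf_x = [sample_pmf(pmf_c, p, k) for k in range(6)]
thinned = [poisson_pmf(k, p * mean_c) for k in range(6)]
```

Here `pmf_x` agrees with a Poisson distribution of mean $p\, \mathbb{E}[C_t]$, as expected from the thinning property. For a non-Poisson $C_t$ the same `sample_pmf` applies unchanged to whatever table the numerical inversion produces.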

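To condition on $A_t$ survival, we observe that

$$
\mathbb{P}(X_t = k \mid \Omega_t) = \frac{\mathbb{P}(X_t = k) - \mathbb{P}(X_t = k, A_t = 0)}{1 - \delta_t} . \tag{50}
$$

Of the two terms in the numerator, the first one coincides with equation (49), while the second one expands as

$$
\begin{aligned}
\mathbb{P}(X_t = k, A_t = 0) &= \sum_{n=k}^{\infty} \mathbb{P}(X_t = k, A_t = 0 \mid C_t = n)\, \mathbb{P}(C_t = n) \\
&= \sum_{n=k}^{\infty} \mathbb{P}(X_t = k \mid C_t = n)\, \mathbb{P}(A_t = 0 \mid C_t = n)\, \mathbb{P}(C_t = n) .
\end{aligned}
$$

The second equality follows from the fact that $X_t$ and $A_t$ are conditionally independent given $C_t$. Equation (50) thus becomes

$$
\begin{aligned}
\mathbb{P}(X_t = k \mid \Omega_t) &= \frac{\sum_{n=k}^{\infty} \mathbb{P}(X_t = k \mid C_t = n)\, \mathbb{P}(C_t = n) \left[ 1 - \mathbb{P}(A_t = 0 \mid C_t = n) \right]}{\mathbb{P}(\Omega_t)} \\
&= \frac{\sum_{n=k}^{\infty} \mathbb{P}(X_t = k \mid C_t = n) \left[ \mathbb{P}(C_t = n) - \mathbb{P}(A_t = 0, C_t = n) \right]}{\mathbb{P}(\Omega_t)} \\
&= \frac{\sum_{n=k}^{\infty} \mathbb{P}(X_t = k \mid C_t = n)\, \mathbb{P}(C_t = n, \Omega_t)}{\mathbb{P}(\Omega_t)} \\
&= \sum_{n=k}^{\infty} \mathbb{P}(X_t = k \mid C_t = n)\, \mathbb{P}(C_t = n \mid \Omega_t) .
\end{aligned}
$$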

Supplementary Figures

Figure S1: Amount of ctDNA correlates with tumor volume in early stage lung cancer. Reanalyzed data of 24 early stage lung cancer patients in whom no lymph nodes were involved (Abbosh et al., 2017). Volumetric estimates were inferred from CT scans. Linear regressions were performed in log-log space. Shaded region depicts the 90% confidence interval of the regression. A | Data points denote the mean plasma variant allele frequency (VAF) of all tracked clonal mutations per patient. Tumor volume correlated with the mean plasma VAF (Spearman ρ = 0.53, P = 0.0083). Linear regression predicts a VAF of 0.0032% per cm³ of tumor (90% confidence interval: 0.00056-0.018%). B | Data points denote the mean number of mutant fragments of all tracked clonal somatic mutations per patient. Tumor volume correlated with the mean number of mutant fragments (Spearman ρ = 0.66, P < 0.001). Linear regression predicts 0.19 mutant fragments per plasma mL per cm³ of tumor tissue (90% confidence interval: 0.038-0.98 mutant fragments).

[Figure S2: panels A-D, with curves for P(X ≥ 1), P(X ≥ 2), P(X ≥ 3) and P(X ≥ 4).]

Figure S2: Detection probability of ctDNA fragments in a 15 mL liquid biopsy at given tumor size depends on many parameters. Full lines denote the probability to find at least 1 (blue), 2 (orange), 3 (red), or 4 (turquoise) mutant ctDNA fragments in a liquid biopsy of 15 mL blood (≈ 7.5 mL plasma). Probability of finding mutant ctDNA fragments of a specific region corresponds to diploid genome equivalents. Standard parameter values (if not varied): birth rate $b = 0.14$ per cell per day, death rate $d = 0.136$ per cell per day, tumor detection size $M = 10^9$, ctDNA half-life time $t_{1/2} = 30$ minutes, ctDNA shedding probability per cell death $q_d = 1.4 \cdot 10^{-4}$ (Supplementary Information). A | Tumors with fewer than $10^9$ cells (≈ 1 cm³) rarely shed sufficient ctDNA for individual somatic mutations to be robustly detected. B | Higher ctDNA half-life time $t_{1/2}$ increases the number of mutant ctDNA fragments at a given tumor size. C | Slower growing tumors lead to higher numbers of mutant ctDNA fragments compared with fast growing tumors of the same size. Growth rate is varied by changing the death rate and keeping the birth rate constant. D | A slightly increased ctDNA shedding probability $q_d$ boosts the detection probability.

[Figure S3: panels A-H. Panels A-G show probability density against the number of ctDNA fragments; panel H shows probability density against the number of primary tumor (PT) cells. Legend: simulations, exact density, asymptotic density.]

Figure S3: Comparison between computer simulations, exact and asymptotic theoretical distributions conditional on tumor non-extinction. Panels (A-G) show the level of biomarker shed by cancer cells and circulating in the bloodstream at time $t$. Bars illustrate the distribution of the number of biomarker molecules based on $10^4$ simulation realizations. The probability distributions illustrated are all conditional on primary tumor survival up to time $t$. Full lines illustrate the exact probability distribution at time $t$, while dashed lines represent the asymptotic results for large times and small shedding rate. Both agree with simulation results (bars). Each panel shows the following combinations of shedding rates: Panel (A): $q_b > 0$, $\lambda_1 > 0$, $q_d > 0$. Panel (B): $q_b > 0$, $\lambda_1 > 0$, $q_d = 0$. Panel (C): $q_b > 0$, $\lambda_1 = 0$, $q_d > 0$. Panel (D): $q_b > 0$, $\lambda_1 = 0$, $q_d = 0$. Panel (E): $q_b = 0$, $\lambda_1 > 0$, $q_d > 0$. Panel (F): $q_b = 0$, $\lambda_1 > 0$, $q_d = 0$. Panel (G): $q_b = 0$, $\lambda_1 = 0$, $q_d > 0$. Panel (H) similarly shows the probability distribution of the primary tumor size. Parameter values: birth rate $b = 0.14$ per cell per day and death rate $d = 0.13$ per cell per day; ctDNA half-life time $t_{1/2} = 30$ minutes; ctDNA shedding probability per cell death $q_d = 1.4 \cdot 10^{-4}$, per cell division $q_b = 1.4 \cdot 10^{-4}$, and shedding rate per day (at cell necrosis) $\lambda_1 = 1.4 \cdot 10^{-5}$ per cell per day. All results are computed at time $t = 1200$ days from the primary tumor onset.

[Figure S4: panels A-F. A, Normal (median: 2.9); B, Stage I (median: 5.3); C, Stage II (median: 6.7); D, Stage III (median: 6.8); E, pairwise comparisons; F, concentration against age.]

Figure S4: Plasma DNA concentration increases with disease progression. Analyzed data from 812 normal subjects, 199 stage I patients, 497 stage II patients, and 307 stage III patients of Cohen et al. (2018). A-D | Plasma DNA concentrations (ng per plasma mL) in healthy subjects and cancer patients. B | Red line illustrates fitted Gamma distribution with mean of 6.3 and median of 5.2 ng per plasma mL. Depicted distribution is used to model the varying plasma DNA concentration. E | Comparing plasma DNA concentrations. Two-sided Wilcoxon rank-sum tests were used. Thick black bars denote 90% confidence interval. * P < 0.05; ** P < 0.01; *** P < 0.001; n.s. denotes not significant. F | Mean plasma DNA concentration at a specific age increases with age ($R^2 = 0.18$, $P = 3.3 \cdot 10^{-4}$). Shaded region depicts 90% confidence interval.

[Figure S5: panels A and B. A, monthly testing: median 190 mm³; quarterly testing: median 530 mm³. B, monthly testing: median 180 days; quarterly testing: median 110 days.]

Figure S5: Expected tumor detection size and lead time compared to current clinical relapse detection at a fixed specificity of 99.5% per test. Since the same relapse detection test with a specificity of 99.5% for monthly and quarterly testing was used, the monthly sampling produces 0.06 false-positives over 12 months of relapse testing while the quarterly sampling only produces 0.02 false-positives over 12 months. A | Expected tumor detection size distributions for monthly and quarterly repeated relapse detection tests. B | Expected lead time distributions compared to imaging-based approaches applied at the same frequency with a detection limit of 1 cm³ for monthly and quarterly repeated relapse detection tests. Parameter values typical for lung cancer: birth rate $b = 0.14$ per cell per day; death rate $d = 0.13$ per cell per day; ctDNA half-life time $t_{1/2} = 30$ minutes; ctDNA shedding probability per cell death $q_d = 1.4 \cdot 10^{-4}$; sequencing panel covers 20 clonal tumor-specific mutations; sequencing error rate per base-pair $1.5 \cdot 10^{-5}$; 7.5 mL plasma sampled per test; plasma DNA median concentration 5.2 ng per plasma mL (Supplementary Information).

[Figure S6: panels A-D. Panels A and B, requiring ≥2 called mutations out of 20: A, monthly testing: median 190 mm³; quarterly testing: median 270 mm³; B, monthly testing: median 180 days; quarterly testing: median 170 days. Panels C and D, requiring ≥4 called mutations out of 20: C, monthly testing: median 130 mm³; quarterly testing: median 220 mm³; D, monthly testing: median 220 days; quarterly testing: median 200 days.]

Figure S6: Expected tumor relapse detection size and lead time compared to current clinical relapse detection when multiple called mutations are required for a positive test. A | Expected tumor detection size distributions for monthly and quarterly repeated relapse detection tests when 20 mutations are covered by the sequencing panel and two of these 20 mutations need to be called for a positive relapse detection. B | Expected lead time distributions compared to imaging-based approaches applied at the same frequency with a detection limit of 1 cm³ for monthly and quarterly repeated relapse detection tests. C | Expected tumor detection size distributions for monthly and quarterly repeated relapse detection tests when 20 mutations are covered by the sequencing panel and four of these 20 mutations need to be called for a positive relapse detection. D | Expected lead time distributions compared to imaging-based approaches applied at the same frequency with a detection limit of 1 cm³ for monthly and quarterly repeated relapse detection tests. Parameter values typical for lung cancer: birth rate $b = 0.14$ per cell per day; death rate $d = 0.13$ per cell per day; ctDNA half-life time $t_{1/2} = 30$ minutes; ctDNA shedding probability per cell death $q_d = 1.4 \cdot 10^{-4}$; sequencing panel covers 20 clonal tumor-specific mutations; sequencing error rate per base-pair $1.5 \cdot 10^{-5}$; 7.5 mL plasma sampled per test; plasma DNA median concentration 5.2 ng per plasma mL (Supplementary Information). For better comparability, positive detection test thresholds were set such that if the test is repeated over a full year, a false positive rate of 5% is obtained.

[Figure S7: panels A-C. A, 1 mutation within 4,500 sequenced base-pairs (biannual screening: median 6.6 cm³; annual screening: median 15 cm³). B, 5 mutations within 300,000 sequenced base-pairs (biannual screening: median 4.8 cm³; annual screening: median 11 cm³). C, 10 mutations within 300,000 sequenced base-pairs (biannual screening: median 3.9 cm³; annual screening: median 8.9 cm³).]

Figure S7: Expected tumor detection size distributions for fast growing lung cancers. Tumors grow with a rate of $r = 1.0\%$ per day, which corresponds to a tumor volume doubling time of 70 days. Positive screening test thresholds were set such that 0.01 false-positives are expected per year (i.e., specificity of 99% for an annually repeated test). A | Biannually (blue bars) repeated virtual screening tests with one expected mutation for a sequencing panel covering 4,500 base-pairs lead to a median detection size of 6.6 billion cells (25th percentile: 3.7 billion cells; 75th percentile: 12 billion cells). Annually (orange bars) repeated virtual screening tests lead to a median detection size of 15 billion cells (25th percentile: 5.9 billion cells; 75th percentile: 39 billion cells). B | Biannually (blue bars) repeated virtual screening tests with five expected mutations for a sequencing panel covering 300,000 base-pairs lead to a median detection size of 4.8 billion cells (25th percentile: 2.8 billion cells; 75th percentile: 8.2 billion cells). Annually (orange bars) repeated virtual screening tests lead to a median detection size of 11 billion cells (25th percentile: 4.2 billion cells; 75th percentile: 28 billion cells). C | Biannually (blue bars) repeated virtual screening tests with ten expected mutations for a sequencing panel covering 300,000 base-pairs lead to a median detection size of 3.9 billion cells (25th percentile: 2.3 billion cells; 75th percentile: 6.6 billion cells). Annually (orange bars) repeated virtual screening tests lead to a median detection size of 8.9 billion cells (25th percentile: 3.4 billion cells; 75th percentile: 24 billion cells). Parameter values: birth rate $b = 0.14$ per cell per day; death rate $d = 0.13$ per cell per day; ctDNA half-life time $t_{1/2} = 30$ minutes; ctDNA shedding probability per cell death $q_d = 1.4 \cdot 10^{-4}$; sequencing error rate per base-pair $1.5 \cdot 10^{-5}$; 7.5 mL plasma sampled per test; plasma DNA median concentration 5.2 ng per plasma mL (Supplementary Information).

[Figure S8: panels A and B, each titled "ctDNA in 3000 mL plasma", show probability density against ctDNA level (genome equivalents). Annotated means: 572.2, 589.0, 571.6 and 572.3.]

Figure S8: Operating ctDNA shedding processes determine the effective shedding rate. Panels depict diploid genome equivalents of ctDNA present in 3 liters of plasma when a colorectal tumor reaches a size of 1 billion cells. A | Blue line illustrates a scenario where ctDNA is exclusively shed during apoptosis with a ctDNA shedding probability per cell death $q_d = 1.4 \cdot 10^{-4}$. Orange line illustrates a scenario where ctDNA is exclusively shed during proliferation with a ctDNA shedding probability per cell division $q_b = 1.4 \cdot 10^{-4}$. We obtain a shedding rate of $\lambda = 1.90 \cdot 10^{-5}$ per day per cell for the first case and a shedding rate of $\lambda = 1.96 \cdot 10^{-5}$ for the second case because cells proliferate and shed ctDNA slightly more frequently than they die and shed ctDNA ($b = 0.14 > d = 0.136$). B | In both scenarios cells shed ctDNA exclusively during apoptosis ($q_d = 1.4 \cdot 10^{-4}$, $q_n = q_b = 0$). Cells die with the same death rate $d = 0.136$ per cell per day and hence, we obtain identical shedding rates of $\lambda = 1.90 \cdot 10^{-5}$ per cell per day. In the blue scenario, cells divide with a rate of $b = 0.137$ per day while cells divide with a rate of $b = 0.176$ per day in the orange scenario. Although the shedding rates are identical, the slow growth rate of $r = 0.1\%$ leads to a slightly lower ctDNA amount at the same size of a tumor. Parameter values: ctDNA half-life time $t_{1/2} = 30$ minutes (Supplementary Information).

Supplementary Tables

Table S1: Mean and variance for the distribution of the circulating biomarker $C_t^A$ at time $t$ shed by a tumor conditioned on its survival until time $t$ across different shedding processes. Simulated results are averages over $10^5$ realizations and are compared with the theoretical expression derived from the marginal probability generating function $\mathcal{C}_t^A(y \mid \Omega_t)$. The first three columns show the seven combinations of shedding rates we considered. Parameter values: birth rate $b = 0.14$ per cell per day, death rate $d = 0.136$ per cell per day, ctDNA half-life time $t_{1/2} = 30$ minutes, tumor age $t = 3000$ days corresponding to an expected size of $5.696 \times 10^6$ cells.

| $\lambda_0$ | $\lambda_1$ | $\lambda_2$ | Mean (simulations) | Mean (exact) | Mean (asymptotic) | Variance (simulations) | Variance (exact) | Variance (asymptotic) |
|---|---|---|---|---|---|---|---|---|
| $1.904 \times 10^{-5}$ | 0 | 0 | 3.255 | 3.2566 | 3.2567 | 13.66 | 13.8623 | 13.8625 |
| 0 | 0 | $1.960 \times 10^{-5}$ | 3.362 | 3.3524 | 3.3524 | 14.54 | 14.5912 | 14.5913 |
| 0 | $2.000 \times 10^{-5}$ | 0 | 3.412 | 3.4208 | 3.4209 | 14.86 | 15.1230 | 15.1232 |
| $1.904 \times 10^{-5}$ | 0 | $1.960 \times 10^{-5}$ | 6.602 | 6.6091 | 6.6091 | 49.12 | 50.2888 | 50.2894 |
| $1.904 \times 10^{-5}$ | $2.000 \times 10^{-5}$ | 0 | 6.68 | 6.6775 | 6.6775 | 50.45 | 51.2662 | 51.2669 |
| 0 | $2.000 \times 10^{-5}$ | $1.960 \times 10^{-5}$ | 6.757 | 6.7733 | 6.7733 | 52.12 | 52.6504 | 52.6510 |
| $1.904 \times 10^{-5}$ | $2.000 \times 10^{-5}$ | $1.960 \times 10^{-5}$ | 10.04 | 10.0299 | 10.0300 | 108.7 | 110.6289 | 110.6301 |

Table S2: Mean and variance for the distribution of the circulating biomarker $C_t^A$ at time $t$ shed by a tumor conditioned on its survival until time $t$ across a wide range of shedding rates. Simulated results are averages over $10^5$ realizations and are compared with the theoretical expression derived from the marginal probability generating function $\mathcal{C}_t^A(y \mid \Omega_t)$. We explore a wide range of shedding rates by varying $\lambda_0$ and setting $\lambda_1 = \lambda_2 = 0$. Parameter values: birth rate $b = 0.14$ per cell per day, death rate $d = 0.136$ per cell per day, ctDNA half-life time $t_{1/2} = 30$ minutes, tumor age $t = 3000$ days corresponding to an expected size of $5.696 \times 10^6$ cells.

| $\lambda = \lambda_0$ | Mean (simulations) | Mean (exact) | Mean (asymptotic) | Variance (simulations) | Variance (exact) | Variance (asymptotic) |
|---|---|---|---|---|---|---|
| $10^{-3}$ | 171.6 | 171.04203 | 171.04305 | 29201 | 29426.407 | 29426.768 |
| $10^{-4}$ | 17.13 | 17.104203 | 17.104305 | 306.8 | 309.65785 | 309.66155 |
| $10^{-5}$ | 1.713 | 1.7104203 | 1.7104305 | 4.614 | 4.6359567 | 4.6360029 |
| $10^{-6}$ | 0.1709 | 0.1710420 | 0.1710430 | 0.2005 | 0.2002974 | 0.2002988 |
| $10^{-7}$ | 0.0172 | 0.0171042 | 0.0171043 | 0.01751 | 0.0173968 | 0.0173969 |
| $10^{-8}$ | 0.001755 | 0.0017104 | 0.0017104 | 0.001756 | 0.0017133 | 0.0017134 |

Table S3: Mean number of mutations of lung cancers covered by CancerSEEK and CAPP-Seq. Mean number of clonal mutations of early stage lung cancers in the TRACERx study (Abbosh et al., 2017) that are covered by the CAPP-Seq panel and the CancerSEEK panel (considering all amplicons entirely).

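| Cohort | Filter | CAPP-Seq | CancerSEEK |
|---|---|---|---|
| Whole | Overall | 9.14 | 1.12 |
| Whole | Never Smoker | 3.83 | 1.00 |
| Whole | Ex Smoker | 8.25 | 1.02 |
| Whole | Recent Ex Smoker | 12.12 | 1.33 |
| Whole | Current Smoker | 10.29 | 1.00 |
| Stage I Only | Overall | 9.71 | 1.03 |
| Stage I Only | Never Smoker | 4.67 | 1.22 |
| Stage I Only | Ex Smoker | 8.76 | 0.83 |
| Stage I Only | Recent Ex Smoker | 12.80 | 1.30 |
| Stage I Only | Current Smoker | 12.50 | 0.75 |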

References

Abate, J. and W. Whitt (1992, oct). Numerical inversion of probability generating functions. Operations Research Letters 12 (4), 245–251.

Abbosh, C., N. J. Birkbak, G. A. Wilson, M. Jamal-Hanjani, T. Constantin, R. Salari, J. Le Quesne, D. A. Moore, S. Veeriah, R. Rosenthal, T. Marafioti, E. Kirkizlar, T. B. K. Watkins, N. McGranahan, S. Ward, L. Martinson, J. Riley, F. Fraioli, M. Al Bakir, E. Grönroos, F. Zambrana, R. Endozo, W. L. Bi, F. M. Fennessy, N. Sponer, D. Johnson, J. Laycock, S. Shafi, J. Czyzewska-Khan, A. Rowan, T. Chambers, N. Matthews, S. Turajlic, C. Hiley, S. M. Lee, M. D. Forster, T. Ahmad, M. Falzon, E. Borg, D. Lawrence, M. Hayward, S. Kolvekar, N. Panagiotopoulos, S. M. Janes, R. Thakrar, A. Ahmed, F. Blackhall, Y. Summers, D. Hafez, A. Naik, A. Ganguly, S. Kareht, R. Shah, L. Joseph, A. Marie Quinn, P. A. Crosbie, B. Naidu, G. Middleton, G. Langman, S. Trotter, M. Nicolson, H. Remmen, K. Kerr, M. Chetty, L. Gomersall, D. A. Fennell, A. Nakas, S. Rathinam, G. Anand, S. Khan, P. Russell, V. Ezhil, B. Ismail, M. Irvin-Sellers, V. Prakash, J. F. Lester, M. Kornaszewska, R. Attanoos, H. Adams, H. Davies, D. Oukrif, A. U. Akarca, J. A. Hartley, H. L. Lowe, S. Lock, N. Iles, H. Bell, Y. Ngai, G. Elgar, Z. Szallasi, R. F. Schwarz, J. Herrero, A. Stewart, S. A. Quezada, K. S. Peggs, P. Van Loo, C. Dive, C. J. Lin, M. Rabinowitz, H. J. W. L. Aerts, A. Hackshaw, J. A. Shaw, B. G. Zimmermann, T. T. consortium, and C. Swanton (2017). Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution. Nature 545 (7655), 446–451.

Abramowitz, M. and I. Stegun (1964). Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables. Applied mathematics series. Dover Publications.

Allen, L. (2011). An introduction to stochastic processes with applications to biology (2 ed.). Boca Raton, FL: Chapman & Hall/CRC.

Altrock, P. M., L. L. Liu, and F. Michor (2015). The mathematics of cancer: integrating quantitative models. Nature Reviews Cancer 15 (12), 730–745.

Antal, T. and P. L. Krapivsky (2010). Exact solution of a two-type branching process: clone size distribution in cell division kinetics. J. Stat. Mech. Theory Exp. 2010 (7), P07028, 22.

Antal, T. and P. L. Krapivsky (2011). Exact solution of a two-type branching process: Models of tumor progression. Journal of Statistical Mechanics P08018 .

Aravanis, A. M., M. Lee, and R. D. Klausner (2017). Next-generation sequencing of circulating tumor DNA for early cancer detection. Cell 168 (4), 571–574.

Athreya, K. B. and P. Ney (2004). Branching Processes. Dover Publications.

Beerenwinkel, N., R. F. Schwarz, M. Gerstung, and F. Markowetz (2015). Cancer evolution: mathematical models and computational inference. Systematic Biology 64 (1), e1–e25.

Bettegowda, C., M. Sausen, R. J. Leary, I. Kinde, Y. Wang, N. Agrawal, B. R. Bartlett, H. Wang, B. Luber, R. M. Alani, E. S. Antonarakis, N. S. Azad, A. Bardelli, H. Brem, J. L. Cameron, C. C. Lee, L. A. Fecher, G. L. Gallia, P. Gibbs, D. Le, R. L. Giuntoli, M. Goggins, M. D. Hogarty, M. Holdhoff, S.-M. Hong, Y. Jiao, H. H. Juhl, J. J. Kim, G. Siravegna, D. A. Laheru, C. Lauricella, M. Lim, E. J. Lipson, S. K. N. Marie, G. J. Netto, K. S. Oliner, A. Olivi, L. Olsson, G. J. Riggins, A. Sartore-Bianchi, K. Schmidt, I.-M. Shih, S. M. Oba-Shinjo, S. Siena, D. Theodorescu, J. Tie, T. T. Harkins, S. Veronese, T.-L. Wang, J. D. Weingart, C. L. Wolfgang, L. D. Wood, D. Xing, R. H. Hruban, J. Wu, P. J. Allen, C. M. Schmidt, M. A. Choti, V. E. Velculescu, K. W. Kinzler, B. Vogelstein, N. Papadopoulos, and L. A. Diaz (2014). Detection of circulating tumor DNA in early- and late-stage human malignancies. Science Translational Medicine 6 (224), 224ra24–224ra24.

Bozic, I., T. Antal, H. Ohtsuki, H. Carter, D. Kim, S. Chen, R. Karchin, K. W. Kinzler, B. Vogelstein, and M. A. Nowak (2010). Accumulation of driver and passenger mutations during tumor progression. Proc Natl Acad Sci USA 107 (43), 18545–18550.

Chaudhuri, A. A., J. J. Chabon, A. F. Lovejoy, A. M. Newman, H. Stehr, T. D. Azad, M. S. Khodadoust, M. S. Esfahani, C. L. Liu, L. Zhou, F. Scherer, D. M. Kurtz, C. Say, J. N. Carter, D. J. Merriott, J. C. Dudley, M. S. Binkley, L. Modlin, S. K. Padda, M. F. Gensheimer, R. B. West, J. B. Shrager, J. W. Neal, H. A. Wakelee, B. W. Loo, A. A. Alizadeh, and M. Diehn (2017). Early detection of molecular residual disease in localized lung cancer by circulating tumor DNA profiling. Cancer Discovery 7 (12), 1394–1403.

Cheek, D., T. Antal, et al. (2018). Mutation frequencies in a birth–death branching process. The Annals of Applied Probability 28 (6), 3922–3947.

Choudhury, G. L., D. M. Lucantoni, and W. Whitt (1994). Multidimensional transform inversion with applications to the transient M/G/1 queue. The Annals of Applied Probability 4 (3), 719–740.

Cohen, J. D., L. Li, Y. Wang, C. Thoburn, B. Afsari, L. Danilova, C. Douville, A. A. Javed, F. Wong, A. Mattox, R. H. Hruban, C. L. Wolfgang, M. G. Goggins, M. Dal Molin, T.-L. Wang, R. Roden, A. P. Klein, J. Ptak, L. Dobbyn, J. Schaefer, N. Silliman, M. Popoli, J. T. Vogelstein, J. D. Browne, R. E. Schoen, R. E. Brand, J. Tie, P. Gibbs, H.-L. Wong, A. S. Mansfield, J. Jen, S. M. Hanash, M. Falconi, P. J. Allen, S. Zhou, C. Bettegowda, L. Diaz, C. Tomasetti, K. W. Kinzler, B. Vogelstein, A. M. Lennon, and N. Papadopoulos (2018). Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science 359, 926–930.

Cristiano, S., A. Leal, J. Phallen, J. Fiksel, V. Adleff, D. C. Bruhm, S. Ø. Jensen, J. E. Medina, C. Hruban, J. R. White, D. N. Palsgrove, N. Niknafs, V. Anagnostou, P. Forde, J. Naidoo, K. Marrone, J. Brahmer, B. D. Woodward, H. Husain, K. L. van Rooijen, M.-B. W. Ørntoft, A. H. Madsen, C. J. H. van de Velde, M. Verheij, A. Cats, C. J. A. Punt, G. R. Vink, N. C. T. van Grieken, M. Koopman, R. J. A. Fijneman, J. S. Johansen, H. J. Nielsen, G. A. Meijer, C. L. Andersen, R. B. Scharpf, and V. E. Velculescu (2019). Genome-wide cell-free DNA fragmentation in patients with cancer. Nature 570 (7761), 385–398.

Curtius, K., A. Dewanji, W. D. Hazelton, J. H. Rubenstein, and E. G. Luebeck (2020). Optimal timing for cancer screening and adaptive surveillance using mathematical modeling. bioRxiv.

de Koning, H. J., R. Meza, S. K. Plevritis, K. ten Haaf, V. N. Munshi, J. Jeon, S. A. Erdogan, C. Y. Kong, S. S. Han, J. van Rosmalen, S. E. Choi, P. F. Pinsky, A. B. de Gonzalez, C. D. Berg, W. C. Black, M. C. Tammemägi, W. D. Hazelton, E. J. Feuer, and P. M. McMahon (2014). Benefits and harms of computed tomography lung cancer screening strategies: a comparative modeling study for the US Preventive Services Task Force. Annals of Internal Medicine 160 (5), 311–320.

de Koning, H. J., C. M. van der Aalst, P. A. de Jong, E. T. Scholten, K. Nackaerts, M. A. Heuvelmans, J.-W. J. Lammers, C. Weenink, U. Yousaf-Khan, N. Horeweg, S. van ’t Westeinde, M. Prokop, W. P. Mali, F. A. Mohamed Hoesein, P. M. van Ooijen, J. G. Aerts, M. A. den Bakker, E. Thunnissen, J. Verschakelen, R. Vliegenthart, J. E. Walter, K. ten Haaf, H. J. Groen, and M. Oudkerk (2020). Reduced lung-cancer mortality with volume CT screening in a randomized trial. New England Journal of Medicine.

Detterbeck, F. C., D. J. Boffa, A. W. Kim, and L. T. Tanoue (2017). The eighth edition lung cancer stage classification. Chest 151 (1), 193–203.

Daley, D. J. and D. Vere-Jones (2006). Introduction to the Theory of Point Processes. Springer New York.

Dudley, J. C., J. Schroers-Martin, D. V. Lazzareschi, W. Y. Shi, S. B. Chen, M. S. Esfahani, D. Trivedi, J. J. Chabon, A. A. Chaudhuri, H. Stehr, C. L. Liu, H. Lim, H. A. Costa, B. Y. Nabet, M. L. Y. Sin, J. C. Liao, A. A. Alizadeh, and M. Diehn (2019). Detection and surveillance of bladder cancer using urine tumor DNA. Cancer Discovery 9 (4), 500–509.

Durrett, R. (2015a). Branching process models of cancer. Springer.

Durrett, R. (2015b). Branching Process Models of Cancer. Springer Cham Heidelberg New York Dordrecht London.

Etzioni, R., N. Urban, S. Ramsey, M. McIntosh, S. Schwartz, B. Reid, J. Radich, G. Anderson, and L. Hartwell (2003). Early detection: The case for early detection. Nature Reviews Cancer 3 (4), 243.

Greaves, M. and C. C. Maley (2012). Clonal evolution in cancer. Nature 481 (7381), 306–313.

Heitzer, E., I. S. Haque, C. E. Roberts, and M. R. Speicher (2019). Current and future perspectives of liquid biopsies in genomics-driven oncology. Nature Reviews Genetics 20 (2), 71–88.

Hori, S. S. and S. S. Gambhir (2011). Mathematical model identifies blood biomarker-based early cancer detection strategies and limitations. Science Translational Medicine 3 (109), 109ra116.

Hori, S. S., A. M. Lutz, R. Paulmurugan, and S. S. Gambhir (2017). A model-based personalized cancer screening strategy for detecting early-stage tumors using blood-borne biomarkers. Cancer Research 77 (10), 2570–2584.

Khan, K. H., D. Cunningham, B. Werner, G. Vlachogiannis, I. Spiteri, T. Heide, J. F. Mateos, A. Vatsiou, A. Lampis, M. D. Damavandi, H. Lote, I. S. Huntingford, S. Hedayat, I. Chau, N. Tunariu, G. Mentrasti, F. Trevisani, S. Rao, G. Anandappa, D. Watkins, N. Starling, J. Thomas, C. Peckitt, N. Khan, M. Rugge, R. Begum, B. Hezelova, A. Bryant, T. Jones, P. Proszek, M. Fassan, J. C. Hahne, M. Hubank, C. Braconi, A. Sottoriva, and N. Valeri (2018). Longitudinal liquid biopsy and mathematical modeling of clonal evolution forecast time to treatment failure in the PROSPECT-C phase II clinical trial. Cancer Discovery 8 (10), 1270–1285.

Kimmel, M. and D. Axelrod (2015). Branching Processes in Biology. Springer, New York.

Kulkarni, V. G. (2016). Modeling and analysis of stochastic systems. Chapman and Hall/CRC.

Lahouel, K., L. Younes, L. Danilova, F. M. Giardiello, R. H. Hruban, J. Groopman, K. W. Kinzler, B. Vogelstein, D. Geman, and C. Tomasetti (2020). Revisiting the tumorigenesis timeline with a data-driven generative model. Proc Natl Acad Sci USA 117 (2), 857–864.

Lang, B. M., J. Kuipers, B. Misselwitz, and N. Beerenwinkel (2020). Predicting colorectal cancer risk from detection via a two-type branching process model. PLOS Computational Biology 16 (2), e1007552.

Makohon-Moore, A. P. and C. A. Iacobuzio-Donahue (2016). Pancreatic cancer biology and genetics from an evolutionary perspective. Nature Reviews Cancer 16 (9), 553–565.

Makohon-Moore, A. P., K. Matsukuma, M. Zhang, J. G. Reiter, J. M. Gerold, Y. Jiao, L. Sikkema, M. A. S. Yachida, C. Sandone, R. H. Hruban, D. S. Klimstra, N. Papadopoulos, M. A. Nowak, K. Kinzler, B. Vogelstein, and C. Iacobuzio-Donahue (2018). Precancerous neoplastic cells can move through the pancreatic ductal system. Nature 561 (7722), 201–205.

Martincorena, I. and P. J. Campbell (2015). Somatic mutation in cancer and normal cells. Science 349 (6255), 1483–1489.

Martincorena, I., J. C. Fowler, A. Wabik, A. R. J. Lawson, F. Abascal, M. W. J. Hall, A. Cagan, K. Murai, K. Mahbubani, M. R. Stratton, R. C. Fitzgerald, P. A. Handford, P. J. Campbell, K. Saeb-Parsy, and P. H. Jones (2018). Somatic mutant clones colonize the human esophagus with age. Science.

Mattox, A. K., C. Bettegowda, S. Zhou, N. Papadopoulos, K. W. Kinzler, and B. Vogelstein (2019). Applications of liquid biopsies for cancer. Science Translational Medicine 11 (507), eaay1984.

McDonald, B. R., T. Contente-Cuomo, S.-J. Sammut, A. Odenheimer-Bergman, B. Ernst, N. Perdigones, S.-F. Chin, M. Farooq, R. Mejia, P. A. Cronin, K. S. Anderson, H. E. Kosiorek, D. W. Northfelt, A. E. McCullough, B. K. Patel, J. N. Weitzel, T. P. Slavin, C. Caldas, B. A. Pockaj, and M. Murtaza (2019). Personalized circulating tumor DNA analysis to detect residual disease after neoadjuvant therapy in breast cancer. Science Translational Medicine 11 (504), eaax7392.

McWilliams, A., M. C. Tammemagi, J. R. Mayo, H. Roberts, G. Liu, K. Soghrati, K. Yasufuku, S. Martel, F. Laberge, M. Gingras, S. Atkar-Khattra, C. D. Berg, K. Evans, R. Finley, J. Yee, J. English, P. Nasute, J. Goffin, S. Puksa, L. Stewart, S. Tsai, M. R. Johnston, D. Manos, G. Nicholas, G. D. Goss, J. M. Seely, K. Amjadi, A. Tremblay, P. Burrowes, P. MacEachern, R. Bhatia, M.-S. Tsao, and S. Lam (2013). Probability of cancer in pulmonary nodules detected on first screening CT. New England Journal of Medicine 369 (10), 910–919.

Motygin, O. V. (2018). On evaluation of the confluent Heun functions. In 2018 Days on Diffraction (DD). IEEE.

Mouliere, F., D. Chandrananda, A. M. Piskorz, E. K. Moore, J. Morris, L. B. Ahlborn, R. Mair, T. Goranova, F. Marass, K. Heider, J. C. M. Wan, A. Supernat, I. Hudecova, I. Gounaris, S. Ros, M. Jimenez-Linan, J. Garcia-Corbacho, K. Patel, O. Østrup, S. Murphy, M. D. Eldridge, D. Gale, G. D. Stewart, J. Burge, W. N. Cooper, M. S. van der Heijden, C. E. Massie, C. Watts, P. Corrie, S. Pacey, K. M. Brindle, R. D. Baird, M. Mau-Sørensen, C. A. Parkinson, C. G. Smith, J. D. Brenton, and N. Rosenfeld (2018). Enhanced detection of circulating tumor DNA by fragment size analysis. Science Translational Medicine 10 (466).

NCI (2018). Surveillance, Epidemiology, and End Results (SEER) research data 1973-2015 - ASCII text data.

Newman, A. M., S. V. Bratman, J. To, J. F. Wynne, N. C. W. Eclov, L. A. Modlin, C. L. Liu, J. W. Neal, H. A. Wakelee, R. E. Merritt, J. B. Shrager, B. W. Loo Jr, A. A. Alizadeh, and M. Diehn (2014). An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nature Medicine 20 (5), 548.

Newman, A. M., A. F. Lovejoy, D. M. Klass, D. M. Kurtz, J. J. Chabon, F. Scherer, H. Stehr, C. L. Liu, S. V. Bratman, C. Say, L. Zhou, J. N. Carter, R. B. West, G. W. Sledge Jr, J. B. Shrager, B. W. Loo Jr, J. W. Neal, H. A. Wakelee, M. Diehn, and A. A. Alizadeh (2016). Integrated digital error suppression for improved detection of circulating tumor DNA. Nature Biotechnology 34 (5), 547.

Phallen, J., M. Sausen, V. Adleff, A. Leal, C. Hruban, J. White, V. Anagnostou, J. Fiksel, S. Cristiano, E. Papp, S. Speir, T. Reinert, M.-B. W. Orntoft, B. D. Woodward, D. Murphy, S. Parpart-Li, D. Riley, M. Nesselbush, N. Sengamalay, A. Georgiadis, Q. K. Li, M. R. Madsen, F. V. Mortensen, J. Huiskens, C. Punt, N. van Grieken, R. Fijneman, G. Meijer, H. Husain, R. B. Scharpf, L. A. Diaz, S. Jones, S. Angiuoli, T. Ørntoft, H. J. Nielsen, C. L. Andersen, and V. E. Velculescu (2017). Direct detection of early-stage cancers using circulating tumor DNA. Science Translational Medicine 9 (403).

Reinert, T., T. V. Henriksen, E. Christensen, S. Sharma, R. Salari, H. Sethi, M. Knudsen, I. Nordentoft, H.-T. Wu, A. S. Tin, M. Heilskov Rasmussen, S. Vang, S. Shchegrova, A. Frydendahl Boll Johansen, R. Srinivasan, Z. Assaf, M. Balcioglu, A. Olson, S. Dashner, D. Hafez, S. Navarro, S. Goel, M. Rabinowitz, P. Billings, S. Sigurjonsson, L. Dyrskjøt, R. Swenerton, A. Aleshin, S. Laurberg, A. Husted Madsen, A.-S. Kannerup, K. Stribolt, S. Palmelund Krag, L. H. Iversen, K. Gotschalck Sunesen, C.-H. J. Lin, B. G. Zimmermann, and C. Lindbjerg Andersen (2019). Analysis of plasma cell-free DNA by ultradeep sequencing in patients with stages I to III colorectal cancer. JAMA Oncology.

Rew, D. A. and G. D. Wilson (2000). Cell production rates in human tissues and tumours and their significance. Part II: clinical data. European journal of surgical oncology : the journal of the European Society of Surgical Oncology and the British Association of Surgical Oncology 26 (4), 405–17.

Ronveaux, A. (1995). Heun’s Differential Equations. Oxford University Press.

Ryser, M. D., M. Worni, E. L. Turner, J. R. Marks, R. Durrett, and E. S. Hwang (2016). Outcomes of active surveillance for ductal carcinoma in situ: a computational risk analysis. Journal of the National Cancer Institute 108 (5), djv372.

Shen, S. Y., R. Singhania, G. Fehringer, A. Chakravarthy, M. H. A. Roehrl, D. Chadwick, P. C. Zuzarte, A. Borgida, T. T. Wang, T. Li, O. Kis, Z. Zhao, A. Spreafico, T. d. S. Medina, Y. Wang, D. Roulois, I. Ettayebi, Z. Chen, S. Chow, T. Murphy, A. Arruda, G. M. O’Kane, J. Liu, M. Mansour, J. D. McPherson, C. O’Brien, N. Leighl, P. L. Bedard, N. Fleshner, G. Liu, M. D. Minden, S. Gallinger, A. Goldenberg, T. J. Pugh, M. M. Hoffman, S. V. Bratman, R. J. Hung, and D. D. De Carvalho (2018). Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature 563 (7732), 579.

Siegel, R. L., K. D. Miller, and A. Jemal (2020). Cancer statistics, 2020. CA: A Cancer Journal for Clinicians 70 (1), 7–30.

Song, M., B. Vogelstein, E. L. Giovannucci, W. C. Willett, and C. Tomasetti (2018). Cancer prevention: Molecular and epidemiologic consensus. Science 361 (6409), 1317–1318.

Srivastava, S., E. J. Koay, A. D. Borowsky, A. M. De Marzo, S. Ghosh, P. D. Wagner, and B. S. Kramer (2019). Cancer overdiagnosis: a biological challenge and clinical dilemma. Nature Reviews Cancer, 1.

Steutel, F. W. and K. Van Harn (2003). Infinite divisibility of probability distributions on the real line. CRC Press.

The National Lung Screening Trial Research Team (2011). Reduced lung-cancer mortality with low-dose computed tomographic screening. New England Journal of Medicine 365 (5), 395–409. PMID: 21714641.

Teixeira, V. H., C. P. Pipinikas, A. Pennycuick, H. Lee-Six, D. Chandrasekharan, J. Beane, T. J. Morris, A. Karpathakis, A. Feber, C. E. Breeze, P. Ntolios, R. E. Hynds, M. Falzon, A. Capitanio, B. Carroll, P. F. Durrenberger, G. Hardavella, J. M. Brown, A. G. Lynch, H. Farmery, D. S. Paul, R. C. Chambers, N. McGranahan, N. Navani, R. M. Thakrar, C. Swanton, S. Beck, P. J. George, A. Spira, P. J. Campbell, C. Thirlwell, and S. M. Janes (2019). Deciphering the genomic, epigenomic, and transcriptomic landscapes of pre-invasive lung cancer lesions. Nature Medicine, 1.

Tie, J., Y. Wang, C. Tomasetti, L. Li, S. Springer, I. Kinde, N. Silliman, M. Tacey, H.-L. Wong, M. Christie, S. Kosmider, I. Skinner, R. Wong, M. Steel, B. Tran, J. Desai, I. Jones, A. Haydon, T. Hayes, T. J. Price, R. L. Strausberg, L. A. Diaz, N. Papadopoulos, K. W. Kinzler, B. Vogelstein, and P. Gibbs (2016). Circulating tumor DNA analysis detects minimal residual disease and predicts recurrence in patients with stage II colon cancer. Science Translational Medicine 8 (346), 346ra92.

Ulz, P., G. G. Thallinger, M. Auer, R. Graf, K. Kashofer, S. W. Jahn, L. Abete, G. Pristauz, E. Petru, J. B. Geigl, E. Heitzer, and M. R. Speicher (2016). Inferring expressed genes by whole-genome sequencing of plasma DNA. Nature Genetics 48 (10), 1273.

Vogelstein, B., N. Papadopoulos, V. E. Velculescu, S. Zhou, L. A. Diaz, and K. W. Kinzler (2013). Cancer genome landscapes. Science 339 (6127), 1546–1558.

Wan, J. C., C. Massie, J. Garcia-Corbacho, F. Mouliere, J. D. Brenton, C. Caldas, S. Pacey, R. Baird, and N. Rosenfeld (2017). Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nature Reviews Cancer 17 (4), 223–238.

Wan, J. C. M., K. Heider, D. Gale, S. Murphy, E. Fisher, J. Morris, F. Mouliere, D. Chandrananda, A. Marshall, A. B. Gill, P. Y. Chan, E. Barker, G. Young, W. N. Cooper, I. Hudecova, F. Marass, G. R. Bignell, C. Alifrangis, M. R. Middleton, F. A. Gallagher, C. Parkinson, A. Durrani, U. McDermott, C. G. Smith, C. Massie, P. G. Corrie, and N. Rosenfeld (2019). High-sensitivity monitoring of ctDNA by patient-specific sequencing panels and integration of variant reads. bioRxiv, 759399.

Winer-Muram, H. T., S. G. Jennings, R. D. Tarver, A. M. Aisen, M. Tann, D. J. Conces, and C. A. Meyer (2002). Volumetric growth rate of stage I lung cancer prior to treatment: serial CT scanning. Radiology 223 (3), 798–805.

Wodarz, D. and N. L. Komarova (2014). Dynamics of Cancer: Mathematical Foundations of Oncology. World Scientific Publishing Co., Inc., Singapore.

Yokoyama, A., N. Kakiuchi, T. Yoshizato, Y. Nannya, H. Suzuki, Y. Takeuchi, Y. Shiozawa, Y. Sato, K. Aoki, S. K. Kim, Y. Fujii, K. Yoshida, K. Kataoka, M. M. Nakagawa, Y. Inoue, T. Hirano, Y. Shiraishi, K. Chiba, H. Tanaka, M. Sanada, Y. Nishikawa, Y. Amanuma, S. Ohashi, I. Aoyama, T. Horimatsu, S. Miyamoto, S. Tsunoda, Y. Sakai, M. Narahara, J. B. Brown, Y. Sato, G. Sawada, K. Mimori, S. Minamiguchi, H. Haga, H. Seno, S. Miyano, H. Makishima, M. Muto, and S. Ogawa (2019). Age-related remodelling of oesophageal epithelia by mutated cancer drivers. Nature 565 (7739), 312–317.
