Numerical analysis for inchworm Monte Carlo method: Sign problem and error growth

Zhenning Cai¹, Jianfeng Lu², Siyao Yang¹

¹ Department of Mathematics, National University of Singapore, Level 4, Block S17, 10 Lower Kent Ridge Road, Singapore 119076.
² Department of Mathematics, Department of Physics, and Department of Chemistry, Duke University, Box 90320, Durham, NC 27708, USA.

arXiv:2006.07654v1 [math.NA] 13 Jun 2020

Abstract

We consider the numerical analysis of the inchworm Monte Carlo method, which is proposed recently to tackle the numerical sign problem for open quantum systems. We focus on the growth of the numerical error with respect to the simulation time, for which the inchworm Monte Carlo method shows a flatter curve than the direct application of Monte Carlo method to the classical Dyson series. To better understand the underlying mechanism of the inchworm Monte Carlo method, we distinguish two types of exponential error growth, which are known as the numerical sign problem and the error amplification. The former is due to the fast growth of variance in the stochastic method, which can be observed from the Dyson series, and the latter comes from the evolution of the numerical solution. Our analysis demonstrates that the technique of partial resummation can be considered as a tool to balance these two types of error, and the inchworm Monte Carlo method is a successful case where the numerical sign problem is effectively suppressed by such means. We first demonstrate our idea in the context of ordinary differential equations, and then provide complete analysis for the inchworm Monte Carlo method. Several numerical experiments are carried out to verify our theoretical results.

Keywords: Open quantum system, inchworm Monte Carlo method, numerical sign problem, error growth

1. Introduction

In quantum mechanics, an open quantum system refers to a quantum system interacting with the environment. In reality, no quantum system is absolutely isolated, and therefore the theory of open quantum systems has wide applications including quantum thermodynamics [14], quantum information science [37], and quantum biology [30], [1]. Due to interaction with the environment, the quantum system is irreversible, and its master equation can be obtained by the Nakajima-Zwanzig projection technique [33, 40], which is an integro-differential equation showing that the dynamics is non-Markovian. When the coupling between the system and the environment is weak, Markovian approximation can be used to simplify the simulation [22], while for non-Markovian simulations, one needs to apply more expensive methods such as QuAPI (quasi-adiabatic propagator path integral) [25, 26] and HEOM (hierarchical equations of motion) [17]. In this paper, we are interested in the numerical analysis for the recently proposed inchworm algorithm [5, 7], which is a diagrammatic Monte Carlo method for open quantum systems. The inchworm algorithm was originally proposed in [9] for impurity models. In [5], the method is recast in a continuous form as an integro-differential equation, so that classical numerical techniques can be applied.

In the integro-differential equation formulation of the inchworm method, the time derivative of the propagator is written as an expression involving an infinite series and high-dimensional integrals. Therefore, the numerical method involves both a Runge-Kutta part for time marching and a Monte Carlo part to deal with the series and integrals. In abstract form, we write the equation as

$$\frac{\mathrm{d}u}{\mathrm{d}t} = \mathrm{RHS} = \mathbb{E}_X R(X), \tag{1}$$

where $\mathbb{E}_X$ denotes the expectation with respect to the random variable $X$, and both RHS and $R(X)$ are functions of the solution $u$. While we motivate (1) using the inchworm method, equations of this type also arise in many other contexts, and thus our analysis applies in a wider context. If the forward Euler method is used as the time integrator and combined with a Monte Carlo estimate of the RHS, the numerical scheme is

$$u_{n+1} = u_n + \frac{h}{N_s} \sum_{i=1}^{N_s} R\bigl(X_i^{(n)}\bigr), \tag{2}$$

where $h$ is the time step, and $X_i^{(n)}$ are random variables drawn from the probability distribution of $X$. Such a scheme is closely related to a number of existing methods such as the direct simulation Monte Carlo [2, 3], the stochastic gradient descent method [39], and the random batch method [18]. The qDRIFT method proposed in [6] is also a variant of (2), obtained by replacing the forward Euler method with an exact solver in the context of Hamiltonian simulation. The scheme (2) can be easily extended to general Runge-Kutta methods, which is found in [5] to be useful in the simulation of open quantum systems.

The numerical analysis of such a method has been carried out for differential-type equations in several cases [16, 18, 20]. When such methods are applied to systems with dissipation [19, 20], the numerical error can be well controlled by the intrinsic property of the system for long-time simulations. However, in settings where the propagators remain unitary for any $t$, it is often seen that the error grows rapidly with respect to time in real-time simulations [4, 24], which is known as the “numerical sign problem”, or more specifically the “dynamical sign problem” in the context of open quantum systems [31, 32, 36]. The purpose of the inchworm Monte Carlo method is to mitigate the numerical sign problem when simulating the open quantum system by Dyson series expansion [5, 7].

The numerical sign problem, which will be further elaborated in Section 2.2, refers to the stochastic error when applying the Monte Carlo method to estimate the sum or the integral of highly oscillatory, high-dimensional functions. This is an intrinsic and notorious difficulty for simulating many-body quantum systems, such as in [23] and in lattice field theory [10]. For open quantum systems, the numerical sign problem becomes more severe as the simulation time gets longer [31, 32, 36].
Specifically, the average numerical error is proportional to the exponential of $t^2$, which introduces great difficulty for long-time simulations. The inchworm Monte Carlo method adopts the idea of “partial resummation”, and has successfully reduced the numerical error in a number of applications [11, 13, 35]. However, as mentioned in [8], it is not totally clear how the inchworm Monte Carlo method mitigates the numerical sign problem, despite some intuition coming from the idea of partial resummation. In this work, our aim is to demystify this mechanism by a close look at the evolution of the numerical error. We find that the error of the inchworm Monte Carlo method grows as the exponential of a polynomial of $t$. However, the source of such error growth is not the numerical sign problem. The fast growth mainly comes from the amplification of the error at previous time steps, which is more similar to the error amplification in Runge-Kutta methods for ordinary differential equations. By separating these two types of error growth (numerical sign problem and error amplification), we find that partial resummation can be regarded as a tool to trade off the two types of error, which may help flatten the error growth curve in certain cases. We hope this also helps understand a more general class of iterative numerical methods for computing summations [21, 28, 29, 34].

In fact, this understanding of error balance can already be revealed in the context of ODEs, which we will focus on first to avoid the involved notation of the inchworm Monte Carlo method. Thus, in Section 2, we carry out the error analysis of differential equations for general Runge-Kutta methods with Monte Carlo evaluation of the right-hand side. The results are of independent interest, and the analysis also serves as a simple context to understand how partial resummation transforms the mechanism of error growth from numerical sign problem to error amplification.
Afterwards, a detailed analysis for the inchworm Monte Carlo method will be given, which reveals the behavior of the error growth in the inchworm Monte Carlo method, and explains whether/how it relaxes the numerical sign problem. In Section 3, we will introduce the integro-differential equation derived from the inchworm Monte Carlo method and present the corresponding main results and their implication. Our analytical results are verified by several numerical tests in Section 4, showing the agreement between the theory and the experiments. The rigorous proofs for the error analysis of differential equation and inchworm Monte Carlo equation are later given in Section 5 and Section 6 respectively. Finally, some concluding remarks are given in Section 7.
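The forward Euler scheme (2) from the introduction can be sketched in a few lines. The example below is only an illustration under stated assumptions: the test problem $R(X) = -Xu$ with $X$ uniform on $[0, 2]$ (so that the exact equation is $\mathrm{d}u/\mathrm{d}t = -u$), the function names, and all parameter values are hypothetical choices that do not come from the paper.

```python
import math
import random


def euler_monte_carlo(u0, T, n_steps, n_samples, sample_rhs, rng):
    """Scheme (2): u_{n+1} = u_n + (h / N_s) * sum_i R(X_i^{(n)})."""
    h = T / n_steps
    u = u0
    for _ in range(n_steps):
        # Fresh, independent samples are drawn at every time step.
        total = sum(sample_rhs(u, rng) for _ in range(n_samples))
        u = u + h * total / n_samples
    return u


def decay_sample(u, rng):
    # Hypothetical test problem: R(X) = -X u with X ~ Uniform(0, 2),
    # so that E_X R(X) = -u and the exact solution is u(t) = exp(-t).
    return -rng.uniform(0.0, 2.0) * u


rng = random.Random(0)
approx = euler_monte_carlo(1.0, 1.0, 100, 200, decay_sample, rng)
exact = math.exp(-1.0)
print(approx, exact)
```

The difference between `approx` and `exact` mixes the deterministic forward Euler error with the stochastic error; the analysis in Section 2 separates these two contributions.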

2. A stochastic numerical method for differential equations

To demonstrate the methodology of numerical analysis for the equations with the form (1), we first consider the simple case of an ordinary differential equation:

$$\frac{\mathrm{d}u}{\mathrm{d}t} = f(t, u(t)), \qquad t \in [0, T], \tag{3}$$

where $u : [0, T] \to \mathbb{C}^d$ and the right-hand side $f$ is $(p+1)$-times continuously differentiable. A general $s$-stage explicit Runge-Kutta method of order $p$ reads

$$\begin{aligned}
u_{n+1} &= u_n + h \sum_{i=1}^{s} b_i k_i, \\
k_i &= f\Bigl(t_n + c_i h,\; u_n + h \sum_{j=1}^{i-1} a_{ij} k_j\Bigr), \qquad i = 1, \cdots, s.
\end{aligned} \tag{4}$$

For simplicity, we assume that the time step h = T/N is a constant, and u0 is given by the initial condition u0 = u(0). The error estimation of Runge-Kutta methods is standard and can be found in textbooks such as [15]. As mentioned in the introduction, we now consider a special scenario where the right-hand side of this equation can be represented as the expectation of a stochastic variable:

$$f(t, u) = \mathbb{E}_X[g(t, u, X)], \qquad \forall t, u, \tag{5}$$

with $X$ being a random variable subject to a given distribution. Inspired by the Runge-Kutta method, we consider the following numerical scheme:

$$\tilde{u}_{n+1} = \tilde{u}_n + h \sum_{i=1}^{s} b_i \tilde{k}_i, \qquad 0 \le n \le N, \tag{6}$$

where

$$\tilde{k}_i = \frac{1}{N_s} \sum_{l=1}^{N_s} g\Bigl(t_n + c_i h,\; \tilde{u}_n + h \sum_{j=1}^{i-1} a_{ij} \tilde{k}_j,\; X_l^{(i)}\Bigr) \tag{7}$$

with the initial condition $\tilde{u}_0 = u(0)$. Here $X_l^{(i)}$ are independent samples generated from the probability distribution of $X$. In this section, we study the gap between the two numerical methods (4) and (6). Specifically, we aim to bound the bias $\|\mathbb{E}(u_N - \tilde{u}_N)\|_2$ and the numerical error $[\mathbb{E}(\|u_N - \tilde{u}_N\|_2^2)]^{1/2}$. By combining these errors with the error estimation of Runge-Kutta methods, the final error bounds can be obtained simply by the triangle inequality:

$$\begin{aligned}
\|\mathbb{E}(u(T) - \tilde{u}_N)\|_2 &\le \|u(T) - u_N\|_2 + \|\mathbb{E}(u_N - \tilde{u}_N)\|_2, \\
[\mathbb{E}(\|u(T) - \tilde{u}_N\|_2^2)]^{1/2} &\le \|u(T) - u_N\|_2 + [\mathbb{E}(\|u_N - \tilde{u}_N\|_2^2)]^{1/2},
\end{aligned}$$

where $\|u(T) - u_N\|_2$ is the numerical error of the standard Runge-Kutta method, whose analysis can be found in a number of textbooks.
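One step of the stochastic Runge-Kutta scheme (6)-(7) can be sketched as follows for a scalar unknown. This is a minimal sketch rather than the authors' implementation: the function `stochastic_rk_step`, the Heun tableau constants, and the test problem $g(t, u, X) = -Xu$ with $X$ uniform on $[0, 2]$ are all hypothetical choices made for illustration.

```python
import random


def stochastic_rk_step(g, t, u, h, a, b, c, n_samples, draw, rng):
    """One step of (6)-(7) for scalar u with Butcher tableau (a, b, c)."""
    k = []
    for i in range(len(b)):
        stage_t = t + c[i] * h
        stage_u = u + h * sum(a[i][j] * k[j] for j in range(i))
        # Independent samples X_l^{(i)} are drawn for every stage.
        samples = [draw(rng) for _ in range(n_samples)]
        k.append(sum(g(stage_t, stage_u, x) for x in samples) / n_samples)
    return u + h * sum(b[i] * k[i] for i in range(len(b)))


# Heun's method (s = 2, order 2) as an example tableau.
HEUN_A = [[0.0, 0.0], [1.0, 0.0]]
HEUN_B = [0.5, 0.5]
HEUN_C = [0.0, 1.0]


def g_decay(t, u, x):
    # Hypothetical test problem with E_X g(t, u, X) = -u for X ~ Uniform(0, 2).
    return -x * u


rng = random.Random(0)

# With the degenerate distribution X = 1 the scheme reduces to deterministic Heun.
one_step = stochastic_rk_step(g_decay, 0.0, 1.0, 0.1, HEUN_A, HEUN_B, HEUN_C,
                              5, lambda r: 1.0, rng)

# A full stochastic run up to T = 1.
u, h = 1.0, 0.02
for n in range(50):
    u = stochastic_rk_step(g_decay, n * h, u, h, HEUN_A, HEUN_B, HEUN_C,
                           100, lambda r: r.uniform(0.0, 2.0), rng)
print(one_step, u)
```

Replacing the random draw by a constant recovers the deterministic scheme (4), which gives a simple consistency check before any statistical experiment is run.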

2.1. Main results for differential equations

In this section, we will list the main results of our error analysis. The results are based on the following working hypothesis:

$$\|\nabla_u g^{(m)}(t, u, X)\|_2 \le M', \qquad \|\nabla_u f^{(m)}(t, u)\|_2 \le M', \qquad \|\nabla_u^2 f^{(m)}(t, u)\|_F \le M'', \tag{8}$$

where $f^{(m)}$ denotes the $m$th component of $f$, and $M$, $M'$ and $M''$ are constants independent of $t$, $u$ and $X$. For the Runge-Kutta solution, we define

$$|u_{\mathrm{RK}}|_{\mathrm{std}} = \max_{n} \max_{i=1,\cdots,s} \sqrt{\operatorname{Var}\, g\Bigl(t_n + c_i h,\; u_n + h \sum_{j=1}^{i-1} a_{ij} k_j,\; X\Bigr)}.$$

For simplicity, below we use $R$ to denote the upper bound of all the coefficients appearing in the Runge-Kutta method (4). Precisely, we assume that

$$|a_{ij}|,\ |b_i|,\ |c_i| \le R \qquad \text{for all } i, j. \tag{9}$$

Thus, the recurrence relations of both the bias and the numerical error can be established as:

Proposition 1. Suppose that the time step length $h$ is sufficiently small and the number of samples $N_s$ at each step is sufficiently large. If the boundedness assumptions (8) hold, we have the recurrence relations

$$\|\mathbb{E}(u_{n+1} - \tilde{u}_{n+1})\|_2 \le (1 + \alpha h) \|\mathbb{E}(u_n - \tilde{u}_n)\|_2 + \alpha h \left( \mathbb{E}\|u_n - \tilde{u}_n\|_2^2 + \frac{h^2}{N_s} |u_{\mathrm{RK}}|_{\mathrm{std}}^2 \right) \tag{10}$$

and

$$\mathbb{E}(\|u_{n+1} - \tilde{u}_{n+1}\|_2^2) \le (1 + \beta h)\, \mathbb{E}(\|u_n - \tilde{u}_n\|_2^2) + \beta \left( \frac{h^2}{N_s} |u_{\mathrm{RK}}|_{\mathrm{std}}^2 + \frac{\alpha^2 h^5}{s^2 R^2 N_s^2} |u_{\mathrm{RK}}|_{\mathrm{std}}^4 \right), \tag{11}$$

where $\alpha = 2^s s R \max(M', sM'', 2^s s^3 M'^2 M'' R^2, 2^s s^2 M'' R^2)$ and $\beta = \max(4 + 2^{s+2} M'^2 d R^2 s^3, 2^{s+1} R^2 s^2)$.

Next, we apply the two recurrence relations above and accumulate the two errors step by step. We will reach the following estimates:

Theorem 2. Under the settings in Proposition 1, we have

Bias estimation

$$\|\mathbb{E}(u_N - \tilde{u}_N)\|_2 \le \left[ \frac{h^2}{N_s} \bigl(e^{\alpha T} - 1\bigr) + \alpha T \left( \frac{h}{N_s} + \frac{\alpha^2 h^4}{s^2 R^2 N_s^2} |u_{\mathrm{RK}}|_{\mathrm{std}}^2 \right) \bigl(e^{\max(\alpha, \beta) T} - 1\bigr) \right] |u_{\mathrm{RK}}|_{\mathrm{std}}^2. \tag{12}$$

Numerical error estimation

$$\mathbb{E}(\|u_N - \tilde{u}_N\|_2^2) \le \bigl(e^{\beta T} - 1\bigr) \left( \frac{h}{N_s} + \frac{\alpha^2 h^4}{s^2 R^2 N_s^2} |u_{\mathrm{RK}}|_{\mathrm{std}}^2 \right) |u_{\mathrm{RK}}|_{\mathrm{std}}^2. \tag{13}$$

This theorem shows that although the stochastic scheme is biased, the bias plays a minor role in the numerical simulation since the stochastic noise, estimated by the square root of (13), is significantly larger. It is also worth mentioning that the constant $\beta$ depends only on the Runge-Kutta scheme and the bound of the first-order derivatives, while the constant $\alpha$ depends also on the bounds of the second-order derivatives. However, in the estimate (13), the constant $\alpha$ only appears in a term significantly smaller than $h/N_s$. Thus, it is expected that the second-order derivatives have less effect on the numerical error. This is also the case for the inchworm Monte Carlo method to be analyzed in Section 6, and accordingly we will give fewer details for the estimates involving second-order derivatives.

We will defer the proofs of these theorems. For now, let us discuss the implication of these theorems and how this is related to the numerical sign problem in the quantum Monte Carlo method.
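The leading term of the numerical error estimate is proportional to $h/N_s$ in the mean-square sense, so the root-mean-square gap between the deterministic and the stochastic solution should roughly halve when $N_s$ is quadrupled. The experiment below is a hedged illustration of this scaling on a toy problem, not a verification of the theorem: the linear test equation with $X$ uniform on $[0, 2]$, the forward Euler integrator, and all names and parameters are assumptions made only for this sketch.

```python
import math
import random


def euler_pair(T, n_steps, n_samples, rng):
    """Deterministic and stochastic forward Euler for u' = -E[X] u, X ~ U(0, 2)."""
    h = T / n_steps
    u_det, u_sto = 1.0, 1.0
    for _ in range(n_steps):
        u_det -= h * u_det
        mean_x = sum(rng.uniform(0.0, 2.0) for _ in range(n_samples)) / n_samples
        u_sto -= h * mean_x * u_sto
    return u_det, u_sto


def rms_gap(T, n_steps, n_samples, n_trials, rng):
    """Empirical [E |u_N - u~_N|^2]^{1/2} over independent trials."""
    squares = 0.0
    for _ in range(n_trials):
        u_det, u_sto = euler_pair(T, n_steps, n_samples, rng)
        squares += (u_det - u_sto) ** 2
    return math.sqrt(squares / n_trials)


rng = random.Random(1)
gap_coarse = rms_gap(1.0, 50, 10, 400, rng)   # N_s = 10
gap_fine = rms_gap(1.0, 50, 40, 400, rng)     # N_s = 40
ratio = gap_coarse / gap_fine
print(gap_coarse, gap_fine, ratio)
```

For this toy problem the ratio is expected to be close to 2, up to the sampling noise of the 400 trials.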

2.2. Discussion on the relation between the numerical error and the numerical sign problem

The above error estimation shows exponential growth of the numerical error with respect to time. Such exponential growth is due to the amplification of the error at previous time steps, which is well known in the numerical analysis for ordinary differential equations and is often estimated by the discrete Gronwall inequality. There exists another kind of exponential growth of error, which is typically encountered in the stochastic simulation of quantum mechanical systems, called the “numerical sign problem” [23]. Such a problem occurs when using the Monte Carlo method to evaluate the integral or sum of a strongly oscillatory high-dimensional function.

To understand how the numerical sign problem causes the exponential growth of the numerical error, we consider the case where $f(t, u) = -\mathrm{i}H(t)u$ and $g(t, u, X) = -\mathrm{i}A(t, X)u$. As an analog of quantum mechanics, we assume that $H(t)$ is a Hermitian matrix, so that $\|u(t)\|_2^2 = \|u(0)\|_2^2$ for any $t$. The solution of this system of ordinary differential equations can be expressed by the Dyson series:

$$u(T) = u(0) + \sum_{M=1}^{+\infty} \int_0^T \int_0^{t_M} \cdots \int_0^{t_2} (-\mathrm{i})^M\, \mathbb{E}_{X_M}\bigl[A(t_M, X_M)\bigr]\, \mathbb{E}_{X_{M-1}}\bigl[A(t_{M-1}, X_{M-1})\bigr] \cdots \mathbb{E}_{X_1}\bigl[A(t_1, X_1)\bigr]\, u(0)\, \mathrm{d}t_1 \cdots \mathrm{d}t_{M-1}\, \mathrm{d}t_M. \tag{14}$$

Thus $u(T)$ can be evaluated directly using the Monte Carlo method to approximate the integral, where $M$, $t_1, \cdots, t_M$ and $X_1, \cdots, X_M$ are all treated as random variables. While different methods to draw samples exist [4], here we only consider the simplest approach, in which $M$ follows the Poisson distribution and the time points $(t_1, t_2, \cdots, t_M)$ are uniformly distributed in the $M$-dimensional simplex. Let $u^{(\mathrm{num})}(T)$ be the numerical solution obtained using this method. Then the standard error estimation of the Monte Carlo method yields that

$$\begin{aligned}
\mathbb{E}\|u^{(\mathrm{num})}(T) - u(T)\|^2
&\leqslant \sum_{M=0}^{+\infty} \int_0^T \int_0^{t_M} \cdots \int_0^{t_2} \mathbb{E}\|A(t_M, X_M) A(t_{M-1}, X_{M-1}) \cdots A(t_1, X_1) u(0)\|_2^2\, \mathrm{d}t_1 \cdots \mathrm{d}t_{M-1}\, \mathrm{d}t_M - \|u(T)\|_2^2 \\
&\leqslant \bigl[\exp\bigl(d M'^2 T^2\bigr) - 1\bigr] \|u(0)\|_2^2,
\end{aligned} \tag{15}$$

where $d$ is the number of dimensions of $u$, and $M'$ is again the bound of the first-order derivatives defined in (8). It can be observed that the numerical error again grows exponentially in time. However, such exponential growth of the error is not due to the error amplification in the Gronwall inequality. It comes from the growing integral domain on the right-hand side of (14). Note that although the integral domain expands as $T$ increases, the magnitude of the infinite sum does not grow over time ($\|u(T)\|_2 = \|u(0)\|_2$). This indicates that when $T$ is large, strong oscillation exists in the integrand of (14), resulting in significant cancellation when taking the sum, which leads to huge variance for Monte Carlo estimation. This intrinsic difficulty of stochastic methods is known as the numerical sign problem.

As we will see later, if not carefully dealt with, the numerical sign problem can cause even faster growth of the numerical error. One possible approach to mitigating the numerical sign problem is to use the method of partial resummation, which only takes part of the summation (instead of the whole integral), and uses the result to find other parts of the sum. For example, suppose we want to compute the infinite sum:

$$s = 1 + a + a^2 + a^3 + \cdots.$$

We can choose to take the sum directly using the Monte Carlo method. Alternatively, we can also first take the partial sum $s_1 = 1 + a$, and then use the result of $s_1$ to compute another partial sum $s_3 = (1 + a^2) s_1$. Afterwards, $s_3$ can be used to compute $s_7 = (1 + a^4) s_3$, and so forth. It can be seen that the error in the computation of $s_1$ will be amplified when computing $s_3$, and the error of $s_3$ will be amplified in the computation of $s_7$. This illustrates the idea of partial resummation, which partly transfers the sign problem to error amplification.

Below we would like to demonstrate that sometimes the Runge-Kutta method can also be considered as the partial resummation of (14), which changes the underlying mechanism of the error growth. Consider applying the forward Euler method to the equation

$$\frac{\mathrm{d}u}{\mathrm{d}t} = -\mathrm{i}H(t)u, \qquad H(t) = \mathbb{E}_X A(t, X), \tag{16}$$

so that the numerical scheme is

$$\tilde{u}_{n+1} = \tilde{u}_n - \mathrm{i}\, \frac{h}{N_s} \sum_{l=1}^{N_s} A\bigl(t_n, X_l^{(n)}\bigr) \tilde{u}_n.$$

When $n = 0$, the scheme gives

$$\tilde{u}_1 = u(0) - \mathrm{i}\, \underline{\frac{h}{N_s} \sum_{l=1}^{N_s} A\bigl(0, X_l^{(0)}\bigr) u(0)}. \tag{17}$$

If we view the underlined term as a special Monte Carlo method to evaluate the integral

$$\int_0^h \mathbb{E} A(t_1, X) u(0)\, \mathrm{d}t_1,$$

for which we only take one sample of $t_1$ located at $t_1 = 0$, then $\tilde{u}_1$ turns out to be part of the right-hand side of (14), obtained by reducing the integral domain from $T$ to $h$ and only considering $M = 1$. Next, this partial sum $\tilde{u}_1$ is used to compute $\tilde{u}_2$:

$$\begin{aligned}
\tilde{u}_2 &= \tilde{u}_1 - \mathrm{i}\, \frac{h}{N_s} \sum_{l=1}^{N_s} A\bigl(t_1, X_l^{(1)}\bigr) \tilde{u}_1 \\
&= u(0) - \mathrm{i}\, \frac{2h}{N_s} \sum_{l=1}^{N_s} \frac{A\bigl(0, X_l^{(0)}\bigr) + A\bigl(h, X_l^{(1)}\bigr)}{2}\, u(0) + (-\mathrm{i})^2 \frac{h^2}{N_s^2} \left( \sum_{l=1}^{N_s} A\bigl(h, X_l^{(1)}\bigr) \right) \left( \sum_{l=1}^{N_s} A\bigl(0, X_l^{(0)}\bigr) \right) u(0) \\
&\approx u(0) + \int_0^{2h} (-\mathrm{i})\, \mathbb{E}A(t_1, X)\, u(0)\, \mathrm{d}t_1 + \frac{1}{2} \int_0^{2h} \int_0^{t_2} (-\mathrm{i})^2 \bigl(\mathbb{E}A(t_2, X)\bigr) \bigl(\mathbb{E}A(t_1, X)\bigr) u(0)\, \mathrm{d}t_1\, \mathrm{d}t_2,
\end{aligned}$$

which can again be considered as the partial sum of (14). For further time steps, this can also be verified. As is well known, the error of the forward Euler method may accumulate as the solution evolves, and therefore this example again shows the change of the mechanism for the growth of the error.

However, our error estimate in Theorem 2 seems to suggest that shifting the numerical sign problem to error amplification does not flatten the error curve, which still grows exponentially. The reason is that our error estimation does not make any assumption on the stability of the Runge-Kutta method, which leads to the exponential growth of the error regardless of the scheme and the problem. Again, let us take (16) as an example and apply the second-order Heun's method. Then the deterministic scheme and the stochastic scheme are, respectively,

$$u_{n+1} = (I - h\mathcal{L}) u_n \qquad \text{and} \qquad \tilde{u}_{n+1} = (I - h\mathcal{A}) \tilde{u}_n, \tag{18}$$

where

$$\begin{aligned}
\mathcal{L} &= \frac{1}{2} \Bigl[ \mathrm{i}\bigl(H(t_n) + H(t_{n+1})\bigr) + h H(t_{n+1}) H(t_n) \Bigr], \\
\mathcal{A} &= \frac{1}{2} \left[ \frac{\mathrm{i}}{N_s} \sum_{l=1}^{N_s} \Bigl( A\bigl(t_n, X_l^{(1)}\bigr) + A\bigl(t_{n+1}, X_l^{(2)}\bigr) \Bigr) + h \left( \frac{1}{N_s} \sum_{l=1}^{N_s} A\bigl(t_{n+1}, X_l^{(2)}\bigr) \right) \left( \frac{1}{N_s} \sum_{l=1}^{N_s} A\bigl(t_n, X_l^{(1)}\bigr) \right) \right].
\end{aligned}$$

By straightforward calculation, we can find that

$$\mathbb{E}\|u_{n+1} - \tilde{u}_{n+1}\|_2^2 \leqslant \|I - h\mathcal{L}\|_2^2\, \mathbb{E}\|u_n - \tilde{u}_n\|_2^2 + h^2\, \mathbb{E}\|(\mathcal{L} - \mathcal{A}) \tilde{u}_n\|_2^2.$$

On the right-hand side, the second term can be bounded by the standard Monte Carlo error estimate. For the first term, if we assume that $H(t_{n+1})$ and $H(t_n)$ are both Hermitian matrices, and $H(t_{n+1}) - H(t_n) = O(h)$, then $\|I - h\mathcal{L}\|_2^2 = 1 + O(h^4)$. Therefore when $h$ is small, the exponential growth of the error can be well suppressed, since

$$\lim_{h \to 0^+} (1 + Ch^4)^{T/h} = 1$$

for any positive constants $C$ and $T$. However, for large time steps such that $\|I - h\mathcal{L}\|_2$ is significantly larger than 1, the error still grows exponentially.

In general, for a stable Runge-Kutta scheme, the constant $\beta$ in the coefficient $(1 + \beta h)$ appearing in (11) can be negative or a positive $o(1)$ quantity. In this case, partial resummation indeed helps reduce the error growth. One example of applications is the method of qDRIFT proposed in [6], where the total Hamiltonian is also computed using a stochastic method. A symplectic time integrator is utilized therein so that the error growth is also well suppressed. Note that the paper [6] provides only an estimate of the bias for qDRIFT, and does not consider the full numerical error. As we have discussed, the bias is in fact not the major part of the error.

The analysis of such a simple ODE sketches the idea of how the numerical sign problem can be mitigated in algorithms with partial resummation. For open quantum systems, the inchworm Monte Carlo method, which has been claimed to have the capability of taming the numerical sign problem [9], is one option to apply partial resummation to the corresponding Dyson series. However, due to the existence of the heat bath, the evolution of the quantum state is non-Markovian, and the equation that the inchworm Monte Carlo method solves can only be formulated as an integro-differential equation, so that one cannot simply apply a symplectic scheme to suppress the error growth.
For this reason, the situation of the inchworm Monte Carlo method is much more complicated due to the nontrivial behavior of the error amplification. A detailed introduction will be given in the next section.
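The geometric-series illustration of partial resummation from Section 2.2 can be reproduced directly: the partial sums $s_1$, $s_3$, $s_7$ are built recursively, and an error injected into $s_1$ is multiplied by every later factor. The function name and the optional `s1_error` argument in this sketch are hypothetical and serve only to make the amplification visible.

```python
def resummed_geometric(a, levels, s1_error=0.0):
    """Build s_1, s_3, s_7, ... through s_{2m+1} = (1 + a^{m+1}) * s_m."""
    partial = 1.0 + a + s1_error   # s_1, optionally perturbed
    power = a * a                  # a^2, then a^4, a^8, ...
    for _ in range(levels):
        partial = (1.0 + power) * partial
        power = power * power
    return partial


a = 0.5
s7 = resummed_geometric(a, 2)                    # (1 + a^4)(1 + a^2)(1 + a)
eps = 1e-3
s7_perturbed = resummed_geometric(a, 2, s1_error=eps)
amplification = (s7_perturbed - s7) / eps        # equals (1 + a^2)(1 + a^4)
print(s7, amplification)
```

With $a = 0.5$, two levels give $s_7 = 1.9921875$ and an amplification factor of $1.328125$, so the error made in the first partial sum is carried through, and enlarged by, each subsequent resummation step.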

3. Inchworm Monte Carlo method and the main results

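We now study the integro-differential equation induced by the inchworm method for open quantum systems. The integro-differential equation is introduced in [5] for the full propagator $G_{\mathrm{e}} : \mathcal{T} \to \mathbb{C}^{n \times n}$, where the subscript e stands for “exact” and $\mathcal{T} = \{(s_{\mathrm{f}}, s_{\mathrm{i}}) \mid s_{\mathrm{f}} \geqslant s_{\mathrm{i}} \geqslant 0\}$. Here $n$ is the number of dimensions for all the quantum states of the system. In this paper, we only consider the case $n = 2$ for simplicity. For the case of higher dimensions, the analysis can be extended without difficulties. The equation reads

$$\frac{\partial G_{\mathrm{e}}(s_\uparrow, s_{\mathrm{i}})}{\partial s_\uparrow} = \operatorname{sgn}(s_\uparrow - t)\, \mathrm{i} H_s G_{\mathrm{e}}(s_\uparrow, s_{\mathrm{i}}) + \mathcal{H}(s_\uparrow, G_{\mathrm{e}}, s_{\mathrm{i}}), \tag{19}$$

where

$$\mathcal{H}(s_\uparrow, G_{\mathrm{e}}, s_{\mathrm{i}}) := \operatorname{sgn}(s_\uparrow - t) \sum_{\substack{M=1 \\ M \text{ is odd}}}^{\bar{M}} \mathrm{i}^{M+1} \int_{s_\uparrow \geqslant s_M^M \geqslant \cdots \geqslant s_1^M \geqslant s_{\mathrm{i}}} (-1)^{\#\{\vec{s}^M \le t\}}\, W_s\, \mathcal{U}(s_\uparrow, \vec{s}^M, s_{\mathrm{i}})\, \mathcal{L}(s_\uparrow, \vec{s}^M)\, \mathrm{d}\vec{s}^M. \tag{20}$$

Here we use $\vec{s}^M$ as a shorthand for the sequence $s_M^M, s_{M-1}^M, \cdots, s_2^M, s_1^M$, and the integral with respect to $\vec{s}^M$ is interpreted as

$$\int_{s_\uparrow \geqslant s_M^M \geqslant \cdots \geqslant s_1^M \geqslant s_{\mathrm{i}}} \varphi\, \mathrm{d}\vec{s}^M = \int_{s_{\mathrm{i}}}^{s_\uparrow} \int_{s_1^M}^{s_\uparrow} \cdots \int_{s_{M-1}^M}^{s_\uparrow} \varphi\, \mathrm{d}s_M^M \cdots \mathrm{d}s_2^M\, \mathrm{d}s_1^M.$$

Other symbols appearing in (19) are introduced as follows:

• $\mathcal{U}(s_\uparrow, \vec{s}^M, s_{\mathrm{i}}) = G_{\mathrm{e}}(s_\uparrow, s_M^M) W_s G_{\mathrm{e}}(s_M^M, s_{M-1}^M) \cdots W_s G_{\mathrm{e}}(s_1^M, s_{\mathrm{i}})$;
• $H_s \in \mathbb{C}^{2 \times 2}$: the Hamiltonian of the quantum system we are interested in;
• $W_s \in \mathbb{C}^{2 \times 2}$: the perturbation of the Hamiltonian due to coupling with the environment;
• $O_s \in \mathbb{C}^{2 \times 2}$: the observable of the quantum system, acting at time $t$;
• $\mathcal{L}$: the bath influence functional with the form of the sum of products (see [5] for details).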

The equation holds for the initial time point $s_{\mathrm{i}} \in [0, 2t] \setminus \{t\}$ and the final time point $s_\uparrow \in [s_{\mathrm{i}}, 2t] \setminus \{t\}$, and the full propagator satisfies the “jump conditions” at time $t$:

$$\begin{aligned}
\lim_{s_\uparrow \to t^+} G_{\mathrm{e}}(s_\uparrow, s_{\mathrm{i}}) &= O_s \lim_{s_\uparrow \to t^-} G_{\mathrm{e}}(s_\uparrow, s_{\mathrm{i}}), \\
\lim_{s_{\mathrm{i}} \to t^-} G_{\mathrm{e}}(s_\uparrow, s_{\mathrm{i}}) &= \lim_{s_{\mathrm{i}} \to t^+} G_{\mathrm{e}}(s_\uparrow, s_{\mathrm{i}})\, O_s
\end{aligned} \tag{21}$$

as well as the boundary condition $G_{\mathrm{e}}(s_\uparrow, s_\uparrow) = \mathrm{Id}$. Here we remark that the original formula of this integro-differential equation is given with $\bar{M} = \infty$ in [5]. However, in practice, we truncate the series by a finite $\bar{M}$ as an approximation. In fact, the major benefit of the inchworm method compared with the classical Dyson series expansion is just the fast convergence of this infinite series [7, 8]. Therefore in this paper, we restrict ourselves to the case of a finite $\bar{M}$.

Similar to the case of the differential equation, we may use a general explicit time integrator to solve this integro-differential equation numerically. In this work, we focus on the numerical method proposed in [5], which is inspired by the second-order Heun's method:

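$$\begin{aligned}
G^*_{n+1,m} &= \bigl(I + \operatorname{sgn}(t_n - t)\, \mathrm{i} H_s h\bigr) G_{n,m} + K_1 h, \\
G_{n+1,m} &= \bigl(I + \tfrac{1}{2} \operatorname{sgn}(t_n - t)\, \mathrm{i} H_s h\bigr) G_{n,m} + \tfrac{1}{2} \operatorname{sgn}(t_{n+1} - t)\, \mathrm{i} H_s h\, G^*_{n+1,m} + \tfrac{1}{2} (K_1 + K_2) h,
\end{aligned} \qquad 0 \le m \le n \le 2N, \tag{22}$$

where $h = t/N$ (we again require $h \le 1$) is the time step length, and $G_{n,m}$ denotes the numerical approximation of the solution $G_{\mathrm{e}}(nh, mh)$. Different from the standard Heun's method for ODEs, the slope $K_1$ has to be computed based on a number of previous numerical solutions

$$g_{n,m} := (G_{m+1,m};\ G_{m+2,m+1}, G_{m+2,m};\ \cdots;\ G_{n,n-1}, \cdots, G_{n,m}). \tag{23}$$

The explicit expression for $K_1$ is given by

$$K_1 = F_1(g_{n,m}) := \operatorname{sgn}(t_n - t) \sum_{\substack{M=1 \\ M \text{ is odd}}}^{\bar{M}} \mathrm{i}^{M+1} \int_{t_n \geqslant s_M^M \geqslant \cdots \geqslant s_1^M \geqslant t_m} (-1)^{\#\{\vec{s}^M \le t\}}\, W_s I_h G(t_n, s_M^M) W_s \cdots W_s I_h G(s_1^M, t_m)\, \mathcal{L}(t_n, \vec{s}^M)\, \mathrm{d}\vec{s}^M, \tag{24}$$

where $I_h G(\cdot, \cdot)$ is obtained by piecewise linear interpolation on the triangular mesh shown in Figure 1 such that $I_h G(t_j, t_k) = G_{j,k}$ for all integers $m \le k \le j \le n$. Similarly, $K_2$ is given by

$$K_2 = F_2(g^*_{n,m}) := \operatorname{sgn}(t_{n+1} - t) \sum_{\substack{M=1 \\ M \text{ is odd}}}^{\bar{M}} \mathrm{i}^{M+1} \int_{t_{n+1} \geqslant s_M^M \geqslant \cdots \geqslant s_1^M \geqslant t_m} (-1)^{\#\{\vec{s}^M \le t\}}\, W_s I_h^* G(t_{n+1}, s_M^M) W_s \cdots W_s I_h^* G(s_1^M, t_m)\, \mathcal{L}(t_{n+1}, \vec{s}^M)\, \mathrm{d}\vec{s}^M,$$

where $g^*_{n,m} := (g_{n,m};\ G_{n+1,n}, \cdots, G_{n+1,m+1}, G^*_{n+1,m})$ and $I_h^* G(\cdot, \cdot)$ is the linear interpolation such that

$$I_h^* G(t_j, t_k) = \begin{cases} G_{j,k}, & \text{if } (j, k) \neq (n+1, m), \\ G^*_{n+1,m}, & \text{if } (j, k) = (n+1, m). \end{cases}$$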

To implement this scheme, we compute each $G_{j,k}$ in the order illustrated in Figure 1. Specifically, we calculate the propagators column by column from left to right, and for each column we start from the boundary value $G_{i,i} = \mathrm{Id}$ (red “•”) located on the diagonal and compute from top to bottom.

Due to the jump conditions (21), we need a special treatment for the two discontinuities at $G_{N,k}$ (green “•”) and $G_{j,N}$ (blue “•”) to achieve second-order convergence. In the numerical scheme, we keep two copies of $G_{N,k}$ or $G_{j,N}$ representing the left- and right-limits when $s \to t^\pm$:

$$(G_{N^+,k}, G_{N^-,k}) \quad \text{and} \quad (G_{j,N^+}, G_{j,N^-}) \qquad \text{for } 0 \le k \le N-1,\ N+1 \le j \le 2N.$$

Here $G_{N^\pm,k}$ and $G_{j,N^\pm}$ are, respectively, the approximations of $\lim_{s \to t^\pm} G(s, kh)$ and $\lim_{s \to t^\pm} G(jh, s)$. The relations $G_{N^+,k} = O_s G_{N^-,k}$ and $G_{j,N^-} = G_{j,N^+} O_s$ are immediately derived from the jump conditions. Moreover, we note that the boundary values on the discontinuities are given by $G_{N^+,N^+} = G_{N^-,N^-} = \mathrm{Id}$ and $G_{N^+,N^-} = O_s$. In the implementation, we need to follow the rules below while evolving the scheme (22) near the discontinuities:

(R1) When $n = N - 1$, the quantities $G^*_{n+1,m}$ and $G_{n+1,m}$ are regarded as $G^*_{N^-,m}$ and $G_{N^-,m}$, respectively, and $\operatorname{sgn}(t_{n+1} - t)$ takes the value $-1$. When $n = N$, the propagator $G_{n,m}$ is regarded as $G_{N^+,m}$, and $\operatorname{sgn}(t_n - t)$ takes the value $1$.

(R2) The value of $G_{n+1,N^+}$ is set to be $O_s G_{N^-,m}$; the value of $G_{n+1,N^-}$ is set to be $G_{n+1,N^+} O_s$.

(R3) $\operatorname{sgn}(t_{N^-} - t) = -1$, $\operatorname{sgn}(t_{N^+} - t) = 1$.

(R4) The interpolations $I_h G$ and $I_h^* G$ should respect such discontinuities. For example, the interpolating operator $I_h$ should satisfy

$$\begin{gathered}
\lim_{s \to t^\pm} I_h G(t_j, s) = G_{j,N^\pm}, \qquad \lim_{s \to t^\pm} I_h G(s, t_k) = G_{N^\pm,k}, \\
\lim_{s \to t^+} \lim_{\tilde{s} \to t^+} I_h G(\tilde{s}, s) = \lim_{s \to t^-} \lim_{\tilde{s} \to t^-} I_h G(s, \tilde{s}) = \mathrm{Id}, \qquad \lim_{s \to t^+} \lim_{\tilde{s} \to t^-} I_h G(s, \tilde{s}) = O_s.
\end{gathered}$$

The conditions for $I_h^* G$ are similar.

As we will prove later, the above numerical method guarantees a second-order approximation of the solution. However, the computational cost is not affordable when $M$ is large, since the number of degrees of freedom for calculating the integral with respect to $\vec{s}^M$ grows exponentially with respect to $M$. Therefore, we take advantage of Monte Carlo integration and replace the integrals by the averages of Monte Carlo samples, resulting in the following inchworm Monte Carlo method:

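$$\begin{aligned}
\tilde{G}^*_{n+1,m} &= \bigl(I + \operatorname{sgn}(t_n - t)\, \mathrm{i} H_s h\bigr) \tilde{G}_{n,m} + \tilde{K}_1 h, \\
\tilde{G}_{n+1,m} &= \bigl(I + \tfrac{1}{2} \operatorname{sgn}(t_n - t)\, \mathrm{i} H_s h\bigr) \tilde{G}_{n,m} + \tfrac{1}{2} \operatorname{sgn}(t_{n+1} - t)\, \mathrm{i} H_s h\, \tilde{G}^*_{n+1,m} + \tfrac{1}{2} (\tilde{K}_1 + \tilde{K}_2) h,
\end{aligned} \qquad 0 \le m \le n \le 2N, \tag{25}$$

with

$$\begin{aligned}
\tilde{K}_1 &= \frac{1}{N_s} \sum_{i=1}^{N_s} \tilde{F}_1(\tilde{g}_{n,m}; \vec{s}^{\,i}) \\
&:= \frac{\operatorname{sgn}(t_n - t)}{N_s} \sum_{i=1}^{N_s} \sum_{\substack{M=1 \\ M \text{ is odd}}}^{\bar{M}} \mathrm{i}^{M+1}\, \frac{(t_n - t_m)^M}{M!}\, (-1)^{\#\{\vec{s}^{\,i,M} \le t\}}\, W_s I_h \tilde{G}(t_n, s_M^{i,M}) W_s \cdots W_s I_h \tilde{G}(s_1^{i,M}, t_m)\, \mathcal{L}(t_n, \vec{s}^{\,i,M}),
\end{aligned}$$

where $N_s$ denotes the number of samplings and $\vec{s}^{\,i,M} = (s_1^{i,M}, s_2^{i,M}, \cdots, s_M^{i,M})$ is the time sequence obtained via uniform sampling $\vec{s}^{\,i,M} \sim U(t_m, t_n)$. Similarly, we define

$$\begin{aligned}
\tilde{K}_2 &= \frac{1}{N_s} \sum_{i=1}^{N_s} \tilde{F}_2(\tilde{g}^*_{n,m}; \vec{s}_*^{\,i}) \\
&:= \frac{\operatorname{sgn}(t_{n+1} - t)}{N_s} \sum_{i=1}^{N_s} \sum_{\substack{M=1 \\ M \text{ is odd}}}^{\bar{M}} \mathrm{i}^{M+1}\, \frac{(t_{n+1} - t_m)^M}{M!}\, (-1)^{\#\{\vec{s}_*^{\,i,M} \le t\}}\, W_s I_h^* \tilde{G}(t_{n+1}, s_{*M}^{i,M}) W_s \cdots W_s I_h^* \tilde{G}(s_{*1}^{i,M}, t_m)\, \mathcal{L}(t_{n+1}, \vec{s}_*^{\,i,M})
\end{aligned}$$

with the samplings $\vec{s}_*^{\,i,M} = (s_{*1}^{i,M}, s_{*2}^{i,M}, \cdots, s_{*M}^{i,M}) \sim U(t_m, t_{n+1})$.

[Figure 1: The uniform mesh and the order of computation for N = 5. Horizontal axis: $s_\uparrow$ from $O$ to $2t$; vertical axis: $s_{\mathrm{i}}$.]

Our goal in this section is to understand how the error evolves over time. The purpose is to compare this method with the classical method using Dyson series expansion [12]. According to the derivation of the inchworm Monte Carlo method [5, 7], the underlying idea is the partial resummation of the Dyson series with the following form: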

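$$G_{\mathrm{e}}(s_\uparrow, s_{\mathrm{i}}) = \sum_{\substack{m=0 \\ m \text{ is even}}}^{+\infty} \mathrm{i}^m \int_{s_\uparrow \geqslant s_m^m \geqslant \cdots \geqslant s_1^m \geqslant s_{\mathrm{i}}} (-1)^{\#\{\vec{s}^m \le t\}}\, \mathcal{U}^{(0)}(s_\uparrow, \vec{s}^m, s_{\mathrm{i}})\, \mathcal{L}^{(0)}(\vec{s}^m)\, \mathrm{d}\vec{s}^m, \tag{26}$$

where $\mathcal{U}^{(0)}(s_\uparrow, \vec{s}^m, s_{\mathrm{i}}) = G_s^{(0)}(s_\uparrow, s_m^m) W_s G_s^{(0)}(s_m^m, s_{m-1}^m) W_s \cdots W_s G_s^{(0)}(s_2^m, s_1^m) W_s G_s^{(0)}(s_1^m, s_{\mathrm{i}})$ with $G_s^{(0)}(\cdot, \cdot)$ defined by

$$G_s^{(0)}(s_{k+1}^m, s_k^m) = \begin{cases}
\mathrm{e}^{-\mathrm{i}(s_{k+1}^m - s_k^m) H_s}, & \text{if } s_k^m \leqslant s_{k+1}^m < t, \\
\mathrm{e}^{-\mathrm{i}(s_k^m - s_{k+1}^m) H_s}, & \text{if } t \leqslant s_k^m \leqslant s_{k+1}^m, \\
\mathrm{e}^{-\mathrm{i}(t - s_{k+1}^m) H_s}\, O_s\, \mathrm{e}^{-\mathrm{i}(t - s_k^m) H_s}, & \text{if } s_k^m < t \leqslant s_{k+1}^m.
\end{cases} \tag{27}$$

$$\sum_{\substack{m=0 \\ m \text{ is even}}}^{+\infty} \frac{(s_\uparrow - s_{\mathrm{i}})^m}{m!}\, (m-1)!!\, \|W_s\|^m \mathscr{L}^{m/2} = \exp\left( \frac{\|W_s\|^2 \mathscr{L}\, (s_\uparrow - s_{\mathrm{i}})^2}{2} \right).$$

For details, we refer the readers to [5, Section 5], where a proof for the spin-boson model can be found. In [7, 8], it is claimed that the inchworm Monte Carlo method can effectively mitigate such sign problem, without providing an argument. In this paper, we aim at a rigorous numerical analysis for the scheme.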
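The Monte Carlo slopes in the inchworm scheme replace each integral over ordered time points by an average over uniformly sampled time sequences, weighted by the simplex volume $(t_n - t_m)^M / M!$. The sketch below shows only this sampling ingredient for a scalar integrand; it is an assumption-laden illustration in which the function names and the test integrand $\varphi(\vec{s}) = s_1 s_2 s_3$ are hypothetical, and no propagators or bath influence functional are involved.

```python
import math
import random


def simplex_sample(t_lo, t_hi, M, rng):
    """Ordered times t_lo <= s_1 <= ... <= s_M <= t_hi, uniform on the simplex."""
    # Sorting M independent uniform points gives a uniform sample on the simplex.
    return sorted(rng.uniform(t_lo, t_hi) for _ in range(M))


def simplex_integral_mc(phi, t_lo, t_hi, M, n_samples, rng):
    """Monte Carlo estimate of the integral of phi over the ordered simplex."""
    volume = (t_hi - t_lo) ** M / math.factorial(M)
    total = sum(phi(simplex_sample(t_lo, t_hi, M, rng)) for _ in range(n_samples))
    return volume * total / n_samples


rng = random.Random(2)
# Test integrand: product of the three time points on 0 <= s_1 <= s_2 <= s_3 <= 1.
estimate = simplex_integral_mc(math.prod, 0.0, 1.0, 3, 20000, rng)
exact = (1.0 / 2.0) ** 3 / math.factorial(3)   # = 1/48
print(estimate, exact)
```

For a smooth, non-oscillatory integrand such as this one the estimate converges quickly; the sign problem arises when the integrand oscillates strongly, so that the samples nearly cancel while their variance stays large.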

3.1. Notation and Assumptions

We first list some notations and assumptions here for the convenience of readers.

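3.1.1. Vectorization and norms

For a sequence of matrices defined as $y := (Y_1, Y_2, \cdots, Y_\ell) \in \mathbb{C}^{2\times 2\ell}$, we define its vectorization $\vec{y}$ by
$$\vec{y} = \big(Y_1^{(11)}, Y_1^{(21)}, Y_1^{(12)}, Y_1^{(22)}, \cdots, Y_\ell^{(11)}, Y_\ell^{(21)}, Y_\ell^{(12)}, Y_\ell^{(22)}\big)^T \in \mathbb{C}^{4\ell}, \tag{28}$$
which reshapes the matrix into a column vector. The same notation applies to a single matrix. For example, for $Y = (Y^{(ij)})_{2\times 2}$, we have $\vec{Y} = (Y^{(11)}, Y^{(21)}, Y^{(12)}, Y^{(22)})^T$. For any function $F(y)$, its gradient $\nabla F(y)$ is defined as a $4\ell$-dimensional vector:
$$\nabla F(y) = \left(\frac{\partial F}{\partial Y_1^{(11)}}, \frac{\partial F}{\partial Y_1^{(21)}}, \frac{\partial F}{\partial Y_1^{(12)}}, \frac{\partial F}{\partial Y_1^{(22)}}, \cdots, \frac{\partial F}{\partial Y_\ell^{(11)}}, \frac{\partial F}{\partial Y_\ell^{(21)}}, \frac{\partial F}{\partial Y_\ell^{(12)}}, \frac{\partial F}{\partial Y_\ell^{(22)}}\right)^T,$$
so that the mean value theorem is denoted by
$$F(y_2) - F(y_1) = \nabla F\big((1-\xi)y_1 + \xi y_2\big)^T (\vec{y}_2 - \vec{y}_1), \quad \text{for some } \xi \in [0,1].$$
The Hessian matrix $\nabla^2 F(y)$ is similarly defined as a $4\ell \times 4\ell$ matrix.

Let $\mathcal{I}$ be an index set and $g = (G_\alpha)_{\alpha\in\mathcal{I}}$ be a collection of random matrices with each $G_\alpha \in \mathbb{C}^{2\times 2}$. For any $\mathcal{D} \subset \mathcal{I}$, we define
$$\|g\|_{\mathcal{D}} := \max_{\alpha\in\mathcal{D}} \{\|G_\alpha\|_F\}, \tag{29}$$
where $\|\cdot\|_F$ denotes the Frobenius norm. When $\mathcal{D} = \mathcal{I}$, the subscript will be omitted: $\|g\| = \|g\|_{\mathcal{I}}$. It is clear that $\|\cdot\|$ defines a norm, and $\|\cdot\|_{\mathcal{D}}$ is a seminorm if $\mathcal{D} \subset \mathcal{I}$. In particular, for any $2\times 2$ matrix $G$, we define $\|G\| = \|G\|_F$. In our analysis, the index $\alpha$ is always a 2D multi-index. For example, if $n > m$ and
$$\mathcal{I} = \{(m+1,m)\} \cup \{(m+2,m+1), (m+2,m)\} \cup \cdots \cup \{(n,n-1), \cdots, (n,m)\},$$
then $g$ equals $g_{n,m}$ defined in (23). Similarly, we define
$$\mathcal{N}^{(\mathrm{std})}_{\mathcal{D}}(g) = \max_{\alpha\in\mathcal{D}} \Big\{ \big[\mathbb{E}(\|G_\alpha\|_F^2)\big]^{1/2} \Big\}, \tag{30}$$
which will be often used throughout our analysis.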
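The vectorization (28) and the seminorm (29) can be illustrated with a small NumPy sketch. The helper names `vectorize` and `seminorm`, the dictionary layout of `g`, and the sample matrices are illustrative assumptions, not part of the paper.

```python
import numpy as np

def vectorize(mats):
    """Stack each 2x2 matrix column by column: (11), (21), (12), (22), as in (28)."""
    return np.concatenate([np.asarray(Y).reshape(-1, order="F") for Y in mats])

def seminorm(g, D):
    """Maximum Frobenius norm of g[alpha] over the index subset D, as in (29)."""
    return max(np.linalg.norm(g[alpha], "fro") for alpha in D)

# Two hypothetical 2x2 matrices indexed by 2D multi-indices (j, k).
Y1 = np.array([[1, 2], [3, 4]], dtype=complex)
Y2 = np.array([[0, 1j], [-1j, 0]])
g = {(1, 0): Y1, (2, 1): Y2}

vec = vectorize([Y1, Y2])           # vector in C^(4*l) with l = 2
norm_all = seminorm(g, g.keys())    # D equal to the full index set: a norm
norm_sub = seminorm(g, [(2, 1)])    # proper subset: only a seminorm
```

Column-major reshaping (`order="F"`) is what produces the entry order (11), (21), (12), (22) used in (28).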

3.1.2. Boundedness assumptions

We will need the following assumptions for our analysis:

(H1) The exact solution $G_e$ of (19), the numerical solution $G$ solved by the deterministic method (22) and the numerical solution $\widetilde{G}$ solved by the inchworm Monte Carlo method (25) are all bounded by an $O(1)$ constant $\mathscr{G}$:
$$\begin{aligned} &\|G_e(s_\uparrow, s_i)\| \le \mathscr{G} \quad \text{for any } 0 \le s_i \le s_\uparrow \le 2t; \\ &\|G_{j,k}\|, \|\widetilde{G}_{j,k}\| \le \mathscr{G} \quad \text{for any } j, k = 0, 1, \cdots, N-1, N^-, N^+, N+1, \cdots, 2N-1, 2N. \end{aligned} \tag{31}$$

(H2) Each $rs$-entry ($r, s = 1, 2$) of the full propagator $G_e^{(rs)}(s_\uparrow, s_i)$ is of class $C^3$ on the domain $s_i \in [0,2t]\setminus\{t\}$, $s_\uparrow \in [s_i, 2t]\setminus\{t\}$, and we define the following upper bounds:
$$\left| \frac{\partial^\alpha G_e^{(rs)}}{\partial s_\uparrow^{\alpha_1} \partial s_i^{\alpha_2}}(x_1, x_2) \right| \le \begin{cases} \mathscr{G}'', & \text{for } \alpha = \alpha_1 + \alpha_2 = 2, \\ \mathscr{G}''', & \text{for } \alpha = \alpha_1 + \alpha_2 = 3, \end{cases}$$
for any $x_2 \in [0,2t]\setminus\{t\}$, $x_1 \in [x_2, 2t]\setminus\{t\}$.

(H3) We further assume that the system Hamiltonian $H_s$ and the system perturbation $W_s$ can also be bounded by $O(1)$ constants:
$$\|H_s\| \le \mathscr{H}, \quad \|W_s\| \le \mathscr{W}. \tag{32}$$
In addition, we assume that the bath influence functional $\mathcal{L}$ can be bounded as
$$|\mathcal{L}(s_\uparrow, \vec{s}^{M})| \le (M!!)\,\mathscr{L}^{\frac{M+1}{2}} \tag{33}$$
for some $O(1)$ constant $\mathscr{L}$.

Here in the assumption (H1), we have assumed an upper bound for the exact solution. This is reasonable as $G_e$, the propagator of a spin in an open quantum system, should be unitary. Although the inchworm method applies some approximation by truncating the infinite series, which may result in some deviation from U(2), we would still like to restrict ourselves to the case when the equation (19) gives an approximation with sufficient quality. Similarly, the boundedness assumptions of the numerical solutions mean that we want to study the evolution of the error when the numerical solution does not completely lose its validity. In (H3), the assumption (33) comes from the actual expression of $\mathcal{L}$, for which we refer the readers to [5] for more details.

Remark 1. The bound (33) can actually be improved to $|\mathcal{L}(s_\uparrow, \vec{s}^{M})| \le \alpha(M)(M!!)\,\mathscr{L}^{\frac{M+1}{2}}$ with a coefficient $\alpha(M) \in (0, 1]$, and the factor $\alpha(M)$ is the reason why the series (20) has faster convergence than the Dyson series (26). Here we are mainly interested in the case of a finite $\bar{M}$ (the upper bound of $M$), and therefore the looser bound (33) does not change the order of the error and its general growth rate.

3.2. Main results and discussions on the error growth

In this section, we will provide our main results for the error estimation of the inchworm Monte Carlo method, and compare the results with the error growth of Dyson series expansion. The following theorem gives the difference between the inchworm Monte Carlo method (25) and the deterministic scheme (22):

Theorem 3. Let $\Delta G_{n,m} = \widetilde{G}_{n,m} - G_{n,m}$. Given a sufficiently small time step length $h$ and a sufficiently large $N_s$, if the assumptions (H1) and (H3) hold, the difference between the deterministic solution and the Monte Carlo solution can be estimated by

• Bias estimation
$$\|\mathbb{E}(\Delta G_{n+1,m})\| \le 4\theta_2^2\,\bar{\alpha}(t_{n-m+1})\,\bar{\gamma}(t_{n-m+1}) \Big( e^{3\theta_1 \sqrt{P_1(t_{n-m+1})}\, t_{n-m+1}} \Big) \cdot \frac{h}{N_s} \tag{34}$$

• Numerical error estimation
$$\big[\mathbb{E}(\|\Delta G_{n+1,m}\|^2)\big]^{1/2} \le \theta_2 \sqrt{\bar{\gamma}(t_{n-m+1})} \Big( e^{\theta_1 \sqrt{P_1(t_{n-m+1})}\, t_{n-m+1}} \Big) \cdot \sqrt{\frac{h}{N_s}} \tag{35}$$

Here

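$$\bar{\alpha}(t) = 16 P_2(t) \cdot \Big(10t + 16t^2 + 5t^3 + \frac{1}{4}t^4\Big), \quad \bar{\gamma}(t) = 2\mathscr{W}\mathscr{G}\mathscr{L}^{1/2} \sum_{\substack{M=1 \\ M \text{ is odd}}}^{\bar{M}} \frac{(\mathscr{W}\mathscr{G}\mathscr{L}^{1/2} t)^M}{(M-1)!!}, \tag{36}$$
$$P_1(t) = 2\mathscr{W}^2\mathscr{G}\mathscr{L} + 3\mathscr{W}^3\mathscr{G}^2\mathscr{L}^{\frac{3}{2}} (1+t) \sum_{\substack{M=3 \\ M \text{ is odd}}}^{\bar{M}} \frac{(M-1)M}{(M-3)!!} (2\mathscr{W}\mathscr{G}\mathscr{L}^{1/2} t)^{M-2}, \tag{37}$$
and the function $P_2(\cdot)$ is a polynomial of degree $\bar{M}-1$ that is independent of $h$. The constants $\theta_1$ and $\theta_2$ are given by $\theta_1 = 353$ and $\theta_2 = \sqrt{34}$.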

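The difference between the results of the inchworm Monte Carlo method and the exact solution is given by

Theorem 4. Under the same settings as Theorem 3, if we further assume that (H2) holds, then the difference between the inchworm Monte Carlo solution and the exact solution can be estimated by

• Bias estimation
$$\Big\|\mathbb{E}\big(G_e(t_{n+1},t_m) - \widetilde{G}_{n+1,m}\big)\Big\| \le P^e(t_{n-m+1}) \Big( e^{\theta_1 \sqrt{P_1(t_{n-m+1})}\, t_{n-m+1}} \Big) \cdot h^2 + 4\theta_2^2\,\bar{\alpha}(t_{n-m+1})\,\bar{\gamma}(t_{n-m+1}) \Big( e^{3\theta_1 \sqrt{P_1(t_{n-m+1})}\, t_{n-m+1}} \Big) \cdot \frac{h}{N_s}, \tag{38}$$

• Numerical error estimation
$$\Big[\mathbb{E}\big(\|G_e(t_{n+1},t_m) - \widetilde{G}_{n+1,m}\|^2\big)\Big]^{1/2} \le P^e(t_{n-m+1}) \Big( e^{\theta_1 \sqrt{P_1(t_{n-m+1})}\, t_{n-m+1}} \Big) \cdot h^2 + \theta_2 \sqrt{\bar{\gamma}(t_{n-m+1})} \Big( e^{\theta_1 \sqrt{P_1(t_{n-m+1})}\, t_{n-m+1}} \Big) \cdot \sqrt{\frac{h}{N_s}}. \tag{39}$$

Here the function $P^e(t)$ is defined by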

M¯ e 1 5 1/2 M +1 1/2 M P (t)= H +8P (t) G ′′ + G ′′′ + WG ′′L (2WGL t) . (40) 4 1 12  (M 1)!!    M=1 − M Xis odd  13   The above result indicates the following properties of the inchworm scheme, which are similar to the case of the differential equations:

• The bias again has only a small contribution to the numerical error, which is often hardly observable in the numerical experiments.
• The error consists of two parts. The first part is second-order in $h$, and the second part is half-order in the total number of samples.

The growth of the numerical error over time is more complicated compared to the ODE case. Since the function $P_1(t)$ is a polynomial of degree $\bar{M}-1$, the growth of the error is on the order of $\exp\big(Ct^{(\bar{M}+1)/2}\big)$. Clearly the growth rate depends on the choice of $\bar{M}$. In the numerical examples shown in [5, 8], only $\bar{M} = 1$ and $\bar{M} = 3$ are used in the applications. Regarding the behavior of the error growth for different $\bar{M}$, we remark that

• When $\bar{M} = 1$, the final error estimation (39) shows that there exist constants $C_1$ and $C_2$ such that
$$\Big[\mathbb{E}\big(\|G_e(t_{n+1},t_m) - \widetilde{G}_{n+1,m}\|^2\big)\Big]^{1/2} \le C_1 e^{C_2 t_{n-m+1}} \left(h^2 + \sqrt{\frac{h}{N_s}}\right),$$
showing that the error grows exponentially with respect to the time difference in the propagator, which is slower than the method using Dyson series, where the error grows exponentially with respect to the square of the time difference. In this case, the numerical error is successfully mitigated.

• When $\bar{M} = 3$, there exist constants $C_1$ and $C_2$ such that
$$\Big[\mathbb{E}\big(\|G_e(t_{n+1},t_m) - \widetilde{G}_{n+1,m}\|^2\big)\Big]^{1/2} \le C_1 e^{C_2 (t_{n-m+1} + t_{n-m+1}^2)} \left(h^2 + \sqrt{\frac{h}{N_s}}\right).$$
In this case, the growth rate is exponential in $t^2$, which is the same as the Dyson series. Thus which method has greater error depends on the coefficient in front of $t^2$. Instead of a detailed analysis, we would just comment that the inchworm Monte Carlo method is likely to have a smaller coefficient due to the effect of partial resummation, which leads to fewer terms in (20) than in the original Dyson series.

• For larger $\bar{M}$, if $t$ is large, the error growth of the inchworm Monte Carlo method will be even worse than the summation using Dyson series. However, since the coefficient of $t^k$ is smaller for larger $k$, when $t$ is small, we may still expect that the inchworm Monte Carlo method has slower error growth due to the effect of partial resummation.

• When $\bar{M} \to +\infty$, we have
$$\lim_{\bar{M}\to+\infty} \bar{\gamma}(t) = 2\mathscr{W}^2\mathscr{G}^2\mathscr{L}\, t \exp\big(\mathscr{W}^2\mathscr{G}^2\mathscr{L}\, t^2/2\big),$$
$$\lim_{\bar{M}\to+\infty} P_1(t) = 2\mathscr{W}^2\mathscr{G}\mathscr{L} + 3\mathscr{W}^3\mathscr{G}^2\mathscr{L}^{\frac{3}{2}} (1+t) P(2\mathscr{W}\mathscr{G}\mathscr{L}^{1/2} t) \cdot e^{2\mathscr{W}^2\mathscr{G}^2\mathscr{L}\, t^2},$$
where $P(x) = x^5 + 7x^3 + 6x$. Although these quantities are still finite, the error bound (39) grows double exponentially with respect to $t^2$, which is undesired in applications.

The numerical experiments in [5, 8] show that in certain regimes where the constant L is relatively small, the contribution from M = 1 is dominant in the series (20). In this case, the inchworm Monte Carlo method can well suppress the numerical sign problem and achieve an exponential error growth in these applications.
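The two limits stated for $\bar{M} \to +\infty$ rest on closed forms of the odd-index series in (36) and (37). The sketch below checks them numerically, assuming the series read $\sum_{M \text{ odd}} x^M/(M-1)!!$ and $\sum_{M \text{ odd},\, M\ge 3} (M-1)M\,x^{M-2}/(M-3)!!$; the function names and the sample value of $x$ are illustrative assumptions.

```python
import math

def double_factorial(n):
    """n!! with the convention 0!! = (-1)!! = 1."""
    return 1 if n <= 0 else n * double_factorial(n - 2)

def gamma_series(x, M_bar):
    """Sum over odd M <= M_bar of x^M / (M-1)!!, the series appearing in (36)."""
    return sum(x**M / double_factorial(M - 1) for M in range(1, M_bar + 1, 2))

def p1_series(x, M_bar):
    """Sum over odd 3 <= M <= M_bar of (M-1) M x^(M-2) / (M-3)!!, the series in (37)."""
    return sum((M - 1) * M * x**(M - 2) / double_factorial(M - 3)
               for M in range(3, M_bar + 1, 2))

x = 0.8  # stands for W*G*sqrt(L)*t in (36), or 2*W*G*sqrt(L)*t in (37)
gamma_limit = x * math.exp(x**2 / 2)
p1_limit = (x**5 + 7 * x**3 + 6 * x) * math.exp(x**2 / 2)

gamma_partial = gamma_series(x, 61)   # truncation at a large odd M_bar
p1_partial = p1_series(x, 61)
```

With $x = 2\mathscr{W}\mathscr{G}\mathscr{L}^{1/2}t$ the factor $e^{x^2/2}$ becomes $e^{2\mathscr{W}^2\mathscr{G}^2\mathscr{L}t^2}$, which is the exponential appearing in the limit of $P_1(t)$.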

3.3. Outline of the proof

We postpone the details of the proof and provide an outline here. The results are obtained in the following steps:

• Estimate the derivatives of the right-hand sides (Propositions 5 and 6).
• Derive recurrence relations for the numerical error (Proposition 7).
• Apply the recurrence relations to derive the error bounds (Theorem 3).
• Estimate the error of the deterministic method (Proposition 8).
• Use the triangle inequality to derive the final error bounds (Theorem 4).

Some more details of these steps are given by a number of propositions below. We first define some sets of 2D multi-indices that will be used.

$$\begin{aligned} &\Omega_{n,m} = \{(j,k) \in \mathbb{Z}^2 \mid m \le k < j \le n\}, && \Omega^*_{n,m} = \Omega_{n+1,m}; \\ &\partial\Omega_{n,m} = \{(j,k) \in \Omega_{n,m} \mid j = n \text{ or } k = m\}, && \partial\Omega^*_{n,m} = \partial\Omega_{n+1,m}; \\ &\mathring{\Omega}_{n,m} = \Omega_{n,m} \setminus \partial\Omega_{n,m}, && \mathring{\Omega}^*_{n,m} = \mathring{\Omega}_{n+1,m}; \\ &\Gamma_{n,m}(i) = \{(j,k) \in \Omega_{n,m} \mid j - k = i\}, && \Gamma^*_{n,m}(i) = \Gamma_{n+1,m}(i). \end{aligned} \tag{41}$$

One may refer to Fig. 2 to visualize these definitions. Note that $\Omega_{n,m}$ and $\Omega^*_{n,m}$ respectively contain indices of the numerical solutions in $g_{n,m}$ and $g^*_{n,m}$ and thus give the locations of all nodes that $K_1$ and $K_2$ depend on. In addition, since $G^*_{n+1,m}$ is calculated completely based on the rest of $g^*_{n,m}$, we define $\bar{\Omega}_{n,m} = \Omega^*_{n,m} \setminus \{(n+1,m)\}$ to represent the indices of all full propagators that we actually use in order to obtain $G_{n+1,m}$.
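As a concrete check of the definitions in (41), the following sketch enumerates the index sets for the case $n = 8$, $m = 2$ used in Fig. 2. The function names are illustrative assumptions; only the set definitions come from the text.

```python
def omega(n, m):
    """All (j, k) with m <= k < j <= n."""
    return {(j, k) for j in range(m + 1, n + 1) for k in range(m, j)}

def boundary(n, m):
    """Nodes of omega(n, m) with j = n or k = m."""
    return {(j, k) for (j, k) in omega(n, m) if j == n or k == m}

def interior(n, m):
    """omega(n, m) with its boundary removed."""
    return omega(n, m) - boundary(n, m)

def diagonal(n, m, i):
    """Nodes of omega(n, m) on the diagonal j - k = i."""
    return {(j, k) for (j, k) in omega(n, m) if j - k == i}

n, m = 8, 2
omega_star = omega(n + 1, m)              # starred sets shift n to n + 1
omega_bar = omega_star - {(n + 1, m)}     # nodes actually used to obtain G_{n+1,m}
sizes = {
    "omega": len(omega(n, m)),
    "boundary": len(boundary(n, m)),
    "interior": len(interior(n, m)),
    "diagonal(2)": len(diagonal(n, m, 2)),
    "omega_star": len(omega_star),
    "omega_bar": len(omega_bar),
}
```

The set $\Omega_{n,m}$ has $(n-m)(n-m+1)/2$ elements, so the number of previously computed propagators entering each step grows quadratically in $n - m$.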

Figure 2: An example illustrating the elements contained in each set defined in this section when $n = 8$, $m = 2$. $\Omega_{n,m}$: all red nodes; $\partial\Omega_{n,m}$: “⊗” + red “×”; $\mathring{\Omega}_{n,m}$: “•”; $\Omega^*_{n,m}$: all nodes; $\partial\Omega^*_{n,m}$: all “×”; $\mathring{\Omega}^*_{n,m}$: “•” + “⊗”; $\bar{\Omega}_{n,m}$: all nodes except blue “×”; $\Gamma_{n,m}(2)$: all red nodes on the thick black line; $\Gamma^*_{n,m}(2)$: all nodes on the thick black line.

For the analysis of ODEs, the boundedness of the first- and second-order derivatives of the right-hand side has been assumed in (8). Correspondingly, our first step is to estimate the derivatives of $F_1$ and $F_2$. For $F_1$, the results are given by the following two propositions for first- and second-order derivatives respectively.

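Proposition 5. Assume (H1)(H3)(R4) hold. Given the time step length $h$ and any $\xi_{n,m}$ being a convex combination of $g^e_{n,m}$, $g_{n,m}$ and $\widetilde{g}_{n,m}$, the first-order derivative of $F_1(\xi_{n,m})$ w.r.t. the $pq$-entry ($p, q = 1, 2$) of $G_{k,\ell}$ is bounded by
$$\left\| \frac{\partial F_1(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}} \right\| \le \begin{cases} P_1(t_{n-m})h, & \text{for } (k,\ell) \in \partial\Omega_{n,m}, \\ P_1(t_{n-m})h^2, & \text{for } (k,\ell) \in \mathring{\Omega}_{n,m}, \end{cases} \tag{42}$$
where $P_1(t)$ is defined in (37).

Proposition 6. Assume (H1)(H3)(R4) hold. Given the time step length $h$, the second-order derivative of $F_1(\xi_{n,m})$ w.r.t. the $p_1q_1$-entry of $G_{k_1,\ell_1}$ and the $p_2q_2$-entry of $G_{k_2,\ell_2}$ ($p_i, q_i = 1, 2$) is bounded by:

• If $(k_1,\ell_1) \times (k_2,\ell_2) \in \partial\Omega_{n,m} \times \partial\Omega_{n,m}$,
$$\left\| \frac{\partial^2 F_1(\xi_{n,m})}{\partial G^{(p_1q_1)}_{k_1,\ell_1} \partial G^{(p_2q_2)}_{k_2,\ell_2}} \right\| \le \begin{cases} P_2(t_{n-m})h, & \text{if one of the conditions (a)-(d) holds}, \\ P_2(t_{n-m})h^2, & \text{otherwise}, \end{cases} \tag{43}$$
where the conditions (a)-(d) are given by

(a) $k_1 = k_2 = n$, $(\ell_1,\ell_2) \in \{(n-1,m), (m,n-1)\}$,
(b) $\ell_1 = \ell_2 = m$, $(k_1,k_2) \in \{(m+1,n), (n,m+1)\}$,
(c) $k_1 = n$ and $\ell_2 = m$, $(k_2,\ell_1) \in \{m \le \ell_1 \le n-1,\ m+1 \le k_2 \le n,\ |\ell_1 - k_2| \le 1\}$,
(d) $k_2 = n$ and $\ell_1 = m$, $(k_1,\ell_2) \in \{m \le \ell_2 \le n-1,\ m+1 \le k_1 \le n,\ |\ell_2 - k_1| \le 1\}$.

• If $(k_1,\ell_1) \times (k_2,\ell_2) \in \partial\Omega_{n,m} \times \mathring{\Omega}_{n,m}$,
$$\left\| \frac{\partial^2 F_1(\xi_{n,m})}{\partial G^{(p_1q_1)}_{k_1,\ell_1} \partial G^{(p_2q_2)}_{k_2,\ell_2}} \right\| \le \begin{cases} P_2(t_{n-m})h^2, & \text{for } k_1 = n,\ |k_2 - \ell_1| \le 1 \text{ or } \ell_1 = m,\ |k_1 - \ell_2| \le 1, \\ P_2(t_{n-m})h^3, & \text{otherwise}. \end{cases} \tag{44}$$

• If $(k_1,\ell_1) \times (k_2,\ell_2) \in \mathring{\Omega}_{n,m} \times \partial\Omega_{n,m}$,
$$\left\| \frac{\partial^2 F_1(\xi_{n,m})}{\partial G^{(p_1q_1)}_{k_1,\ell_1} \partial G^{(p_2q_2)}_{k_2,\ell_2}} \right\| \le \begin{cases} P_2(t_{n-m})h^2, & \text{for } k_2 = n,\ |k_1 - \ell_2| \le 1 \text{ or } \ell_2 = m,\ |k_2 - \ell_1| \le 1, \\ P_2(t_{n-m})h^3, & \text{otherwise}. \end{cases} \tag{45}$$

• If $(k_1,\ell_1) \times (k_2,\ell_2) \in \mathring{\Omega}_{n,m} \times \mathring{\Omega}_{n,m}$,
$$\left\| \frac{\partial^2 F_1(\xi_{n,m})}{\partial G^{(p_1q_1)}_{k_1,\ell_1} \partial G^{(p_2q_2)}_{k_2,\ell_2}} \right\| \le \begin{cases} P_2(t_{n-m})h^3, & \text{for } |k_1 - \ell_2| \le 1 \text{ or } |k_2 - \ell_1| \le 1, \\ P_2(t_{n-m})h^4, & \text{otherwise}, \end{cases} \tag{46}$$

where $P_2(t)$ is a polynomial of degree $\bar{M}-1$ independent of $h$.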

In these two propositions, the functions $P_1(\cdot)$ and $P_2(\cdot)$ are the same as the corresponding functions in Theorems 3 and 4. The proofs of these two propositions are deferred to Section 6.1. Unlike the case of differential equations, the partial derivative of $F_1(\cdot)$ involves a number of previous numerical solutions (all red nodes in Figure 2), and the magnitudes depend on the locations of the nodes, which gives rise to the different cases in the above propositions. Similar results for the derivatives of $F_2(\cdot)$ with the same functions $P_1(t)$, $P_2(t)$ can be proven, where all the indices $n$ should be changed to $n+1$. With the above estimates for the derivatives, we can establish recurrence relations for the bias and the numerical error:

Proposition 7. Let $\Delta g_{n,m} = \widetilde{g}_{n,m} - g_{n,m}$ and $\Delta g^*_{n,m} = \widetilde{g}^*_{n,m} - g^*_{n,m}$. Given a sufficiently small time step length $h$ and a sufficiently large $N_s$, if the assumptions (H1) and (H3) hold, the difference between the deterministic solution and the Monte Carlo solution can be estimated by

Bias estimation: • E(∆G ) k n+1,m k n m 1H 4 4 E 2 − E (1 + h ) (∆Gn,m) + 22P1(tn m+1)h (2+(n m +1 i)h) (∆gn,m∗ ) ∗ ≤ 8 k k − − − Γn,m(i) (47) i=1 X 7 2 h3 (std) g + α¯(tn m+1)h Ω¯ (∆ n,m∗ ) +8¯α(tn m+1)¯γ(tn m+1) . 2 − N n,m − − · Ns  h i  Numerical error estimation: • 1 [E( ∆G 2)]1/2 (1 + H 4h4)[E( ∆G 2)]1/2 k n+1,mk ≤ 8 k n,mk n m (48) 2 − (std) 7 h + 22P1(tn m+1)h 2 + (n m +1 i)h ∗ (∆gn,m∗ )+ γ¯(tn m+1) , − − − NΓn,m(i) 2 − · √N i=1 s X  p and

1 1 1/2 E( ∆G 2) (1 + H 4h4) E( ∆G 2)+(1+ H 4h4)h E( ∆G 2) k n+1,mk ≤ 4 · k n,mk 8 k n,mk × n m   2 − (std) h 44P1(tn m+1)h 2 + (n m +1 i)h ∗ (∆gn,m∗ )+4¯α(tn m+1)¯γ(tn m+1) − − − NΓn,m(i) − − · N (49) ( i=1 s ) X  2 n m 2 2 4 − (std) h + 912P1 (tn m+1)h 2 + (n m +1 i)h ∗ (∆gn,m∗ ) + 17¯γ(tn m+1) , − − − NΓn,m(i) − · N ( i=1 ) s X  where the functions α¯ and γ¯ are defined in (36).

In the above proposition, two different recurrence relations are given for the numerical error. Note that (49) is not a simple square of the estimation (48). The main reason lies in the term $4\bar{\alpha}(t_{n-m+1})\bar{\gamma}(t_{n-m+1}) \cdot h^2/N_s$ located at the end of the second line in (49). Below we use a simple analogy to help the readers understand the difference. Consider the two recurrence relations
$$e_{n+1} \le e_n + \frac{h}{\sqrt{N_s}}, \tag{50}$$
$$e_{n+1}^2 \le e_n^2 + \frac{2h^2}{N_s} e_n + \frac{h^2}{N_s}. \tag{51}$$
The relation between these two recurrence relations is analogous to the relation between (48) and (49). The square of the first recurrence relation (50) is
$$e_{n+1}^2 \le e_n^2 + \frac{2h}{\sqrt{N_s}} e_n + \frac{h^2}{N_s},$$
where the cross-term is different from (51). However, the relation (51) provides a higher numerical order than (50), since by the Cauchy-Schwarz inequality, we can derive from (51) that
$$e_{n+1}^2 \le \left(1 + \frac{h^2}{N_s}\right) e_n^2 + \frac{2h^2}{N_s}, \tag{52}$$
indicating that $e_n \sim O(\sqrt{h/N_s})$, while the recurrence relation (50) can only give us $e_n \sim O(\sqrt{1/N_s})$. Besides, in order to study the error growth rate with respect to time, we are not allowed to use the Cauchy-Schwarz inequality to simplify equation (49) like (52). Later in our proof, the simpler equation (48) will be used to determine the growth rate of the numerical error, while the more complicated version (49) is responsible for the final error estimation.

Theorem 3 is obtained from the recurrence relations stated in Proposition 7. To obtain the final estimates (Theorem 4), we need to estimate the error of the deterministic scheme (22), which is given by

Proposition 8. We define the deterministic error $E_{n+1,m} = G_e(t_{n+1},t_m) - G_{n+1,m}$. If the assumptions (H1)(H2)(H3) hold, then for a sufficiently small time step length $h$ and a sufficiently large number of samplings at each step $N_s$, we have
$$\|E_{n+1,m}\| \le P^e(t_{n-m+1}) \Big( e^{\theta_1 \sqrt{P_1(t_{n-m+1})}\, t_{n-m+1}} \Big) \cdot h^2, \tag{53}$$
where $P^e(t)$ is defined in (40), and the constant $\theta_1$ is the same as the one in Theorem 4.

It is easy to see that our final conclusions in Theorem 4 are a straightforward combination of Theorem 3 and Proposition 8 by the triangle inequality.
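The difference between the toy recurrences (50) and (51) discussed above can be seen by iterating them directly. The sketch below treats both inequalities as equalities (a worst-case assumption) starting from $e_0 = 0$ up to a final time $T$; the function names and parameter values are illustrative.

```python
def iterate_50(h, Ns, T=1.0):
    """Iterate e_{n+1} = e_n + h / sqrt(Ns) for T / h steps."""
    e = 0.0
    for _ in range(round(T / h)):
        e = e + h / Ns**0.5
    return e

def iterate_51(h, Ns, T=1.0):
    """Iterate e_{n+1}^2 = e_n^2 + (2 h^2 / Ns) e_n + h^2 / Ns for T / h steps."""
    e2 = 0.0
    for _ in range(round(T / h)):
        e = e2**0.5
        e2 = e2 + 2 * h**2 / Ns * e + h**2 / Ns
    return e2**0.5

coarse_50, fine_50 = iterate_50(0.01, 100), iterate_50(0.005, 100)
coarse_51, fine_51 = iterate_51(0.01, 100), iterate_51(0.005, 100)
```

Under these assumptions the result of (50) equals $T/\sqrt{N_s}$ regardless of $h$, while the result of (51) shrinks roughly like $\sqrt{h}$ when $h$ is halved, in line with the orders $O(\sqrt{1/N_s})$ and $O(\sqrt{h/N_s})$ stated in the text.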

4. Numerical experiments

In this section, we will verify the above statements using numerical experiments. The following two subsections will be devoted, respectively, to the case of differential equations and the inchworm Monte Carlo method.

4.1. Numerical experiments for ordinary differential equations

We consider an example as the following ordinary differential equation:
$$\begin{aligned} &\frac{\mathrm{d}u}{\mathrm{d}t} = -\frac{\mathrm{i}}{2}Ku(t) = \mathbb{E}_X\big[R(u,X)\big], \quad t \in [0,T], \\ &R(u,X) = -\mathrm{i}Xu \end{aligned} \tag{54}$$
with the initial condition $u(0) = 1$ and the random variable $X \sim U(0,K)$.

We apply the two schemes proposed in (18) to get the numerical solutions $u_n$ and $\widetilde{u}_n$ with uniform time step length $h = T/N$. For the stochastic $\widetilde{u}_n$, we carry out the experiments independently for $N_{\exp} = 100NN_s$ times to obtain $\widetilde{u}_n^{(1)}, \widetilde{u}_n^{(2)}, \cdots, \widetilde{u}_n^{(N_{\exp})}$ and we approximate the numerical error by
$$\mathbb{E}(|u_n - \widetilde{u}_n|^2) \approx \mu_n := \frac{1}{N_{\exp}} \sum_{i=1}^{N_{\exp}} |u_n - \widetilde{u}_n^{(i)}|^2, \quad \text{for } n = 0, 1, \cdots, N. \tag{55}$$
Based on these settings, we now focus on the numerical order of the scheme and the growth of the numerical error with respect to $t$. For a given time step $h$, we define the error function $e(\cdot)$ by $e(nh) = \mu_n$.

We first set $K = 3$ and $K = 10$ in (54) and $T = 3$. Figure 3 shows the evolution of the numerical error $e(t)$ for $h = \frac{1}{4}$ and various numbers of samples $N_s$. For $K = 10$, the left panel of Figure 3 shows that the error grows exponentially over time as predicted in Theorem 2, while for smaller $K$, the stability of the method takes effect, and the error grows only linearly up to $T = 3$ as exhibited in the right panel of Figure 3. This verifies that the exponential growth can be well controlled if appropriate Runge-Kutta schemes and sufficiently small time steps are adopted, which avoids the numerical sign problem in the Monte Carlo method that directly calculates (14).

Figure 3: Evolution of numerical error e(t) (left: K = 10, right: K = 3).
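The experiment for (54) can be sketched in a few lines of NumPy. The schemes in (18) are not reproduced in this section, so the sketch assumes a second-order Heun scheme in which each Runge-Kutta stage replaces the expectation by an average over $N_s$ fresh samples; the function name, the seeds and the number of repetitions are likewise illustrative choices.

```python
import numpy as np

def mean_square_error(K, T, h, Ns, n_exp, seed=0):
    """Estimate mu_N = mean |u_N - u~_N|^2 over n_exp independent stochastic runs."""
    rng = np.random.default_rng(seed)
    u = 1.0 + 0.0j                          # deterministic solution
    u_mc = np.ones(n_exp, dtype=complex)    # one entry per stochastic run
    for _ in range(round(T / h)):
        # Deterministic Heun step for du/dt = -i (K/2) u.
        k1 = -0.5j * K * u
        k2 = -0.5j * K * (u + h * k1)
        u = u + 0.5 * h * (k1 + k2)
        # Stochastic Heun step: R(u, X) = -i X u is linear in X, so averaging
        # R over Ns samples equals -i * mean(X) * u (assumed sampling strategy).
        x1 = rng.uniform(0.0, K, size=(n_exp, Ns)).mean(axis=1)
        k1_mc = -1j * x1 * u_mc
        x2 = rng.uniform(0.0, K, size=(n_exp, Ns)).mean(axis=1)
        k2_mc = -1j * x2 * (u_mc + h * k1_mc)
        u_mc = u_mc + 0.5 * h * (k1_mc + k2_mc)
    return float(np.mean(np.abs(u - u_mc) ** 2))

mu_100 = mean_square_error(K=1.0, T=1.0, h=0.25, Ns=100, n_exp=20000, seed=1)
mu_200 = mean_square_error(K=1.0, T=1.0, h=0.25, Ns=200, n_exp=20000, seed=2)
ratio = mu_100 / mu_200   # close to 2 if the mean-square error scales like 1 / Ns
```

Doubling $N_s$ at fixed $h$ should roughly halve the estimated mean-square error, which is the first-order behavior in $N_s$ that the experiments in this subsection examine.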

To verify the convergence rate with respect to $h$ and $N_s$ in the estimate (13), we set $K = 1$ and $T = 1$ in (54) and consider the numerical error at $t = 0.5$ and $t = 1$. We first fix $N_s = 100$ and reduce $h$ from 1/2 to 1/64, and then fix $h = 1/4$ and increase $N_s$ from 100 to 3200. The numerical errors are listed in Table 1, from which we can easily observe the first-order convergence for both $h$ and $N_s$, which agrees with our estimate (13).

h, Ns        e(0.5)       order    e(1)         order
1/2, 100     1.0917e-04   –        2.1940e-04   –
1/4, 100     5.2721e-05   1.0502   1.0593e-04   1.0505
1/8, 100     2.6257e-05   1.0057   5.2776e-05   1.0052
1/16, 100    1.3027e-05   1.0111   2.6039e-05   1.0192
1/32, 100    6.5086e-06   1.0011   1.3013e-05   1.0007
1/64, 100    3.2579e-06   0.9984   6.5124e-06   0.9987

h, Ns        e(0.5)       order    e(1)         order
1/4, 100     5.2721e-05   –        1.0593e-04   –
1/4, 200     2.6533e-05   0.9906   5.3332e-05   0.9901
1/4, 400     1.3210e-05   1.0062   2.6520e-05   1.0079
1/4, 800     6.6185e-06   0.9970   1.3254e-05   1.0007
1/4, 1600    3.3043e-06   1.0022   6.5942e-06   1.0072
1/4, 3200    1.6528e-06   0.9995   3.3060e-06   0.9961

Table 1: Numerical error e(1) with different time steps h and sampling numbers Ns and the order of accuracy.
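The "order" columns of Table 1 follow from successive error ratios. A minimal sketch, using the $e(1)$ values for decreasing $h$ copied from the table (the helper name `observed_orders` is an assumption):

```python
import math

def observed_orders(errors, refinement=2.0):
    """Order between consecutive rows: log(e_prev / e_next) / log(refinement)."""
    return [math.log(errors[i - 1] / errors[i]) / math.log(refinement)
            for i in range(1, len(errors))]

# e(1) for h = 1/2, 1/4, ..., 1/64 with Ns = 100, taken from Table 1.
e1_vs_h = [2.1940e-04, 1.0593e-04, 5.2776e-05, 2.6039e-05, 1.3013e-05, 6.5124e-06]
orders = observed_orders(e1_vs_h)
```

Because the tabulated errors are rounded to five significant digits, the recomputed orders can differ from the printed ones in the last digit.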

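4.2. Numerical experiments for the inchworm Monte Carlo method

To verify the error growth of the inchworm Monte Carlo method, we consider the spin-boson model with a bath with Ohmic spectral density, where the Hamiltonian and perturbation operators are respectively given by
$$H_s = \epsilon\hat{\sigma}_z + \Delta\hat{\sigma}_x, \quad W_s = \hat{\sigma}_z,$$
where we set the energy difference between two spin states $\epsilon = 1$ and the frequency of the spin flipping $\Delta = 1$. $\hat{\sigma}_x$, $\hat{\sigma}_z$ are Pauli matrices defined by
$$\hat{\sigma}_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \hat{\sigma}_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
We aim to verify the error growth when $\bar{M} = 1$ and $\bar{M} = 3$ as we have discussed in Section 3.2. The bath influence functional $\mathcal{L}$ is given by
$$\mathcal{L}(\vec{s}) = \begin{cases} B(s_1,s_2), & \text{when } \vec{s} = (s_2,s_1), \\ B(s_1,s_3)B(s_2,s_4), & \text{when } \vec{s} = (s_4,s_3,s_2,s_1), \end{cases} \tag{56}$$
where the correlation function $B(\cdot,\cdot)$ is formulated as
$$B(\tau_1,\tau_2) = \sum_{l=1}^{L} \frac{c_l^2}{2\omega_l} \left[ \coth\left(\frac{\beta\omega_l}{2}\right) \cos\big(\omega_l(\tau_2-\tau_1)\big) - \mathrm{i}\sin\big(\omega_l(\tau_2-\tau_1)\big) \right]. \tag{57}$$
The general formula of $\mathcal{L}(\vec{s})$ for higher-dimensional $\vec{s}$ can be found in [5]. According to [27], the coupling intensity $c_l$ and the frequency of each harmonic oscillator $\omega_l \in [0, \omega_{\max}]$ are respectively defined as
$$c_l = \omega_l \sqrt{\frac{\xi\omega_c}{L}\big[1 - \exp(-\omega_{\max}/\omega_c)\big]}, \quad \omega_l = -\omega_c \ln\left(1 - \frac{l}{L}\big[1 - \exp(-\omega_{\max}/\omega_c)\big]\right), \quad l = 1, \cdots, L.$$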

As for the parameters above, we set L = 200, ωmax =4ωc with the primary frequency ωc = 3, ξ =0.6 and β = 5 throughout our experiments.
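With these parameters, the correlation function (57) can be evaluated directly. The sketch below assumes the discretization of $\omega_l$ and $c_l$ as reconstructed above; the function names and default arguments are illustrative.

```python
import numpy as np

def bath_modes(L=200, omega_c=3.0, omega_max=12.0, xi=0.6):
    """Frequencies omega_l and coupling intensities c_l for l = 1, ..., L."""
    l = np.arange(1, L + 1)
    scale = 1.0 - np.exp(-omega_max / omega_c)
    omega = -omega_c * np.log(1.0 - (l / L) * scale)
    c = omega * np.sqrt(xi * omega_c / L * scale)
    return omega, c

def correlation(tau1, tau2, beta=5.0, **mode_kwargs):
    """Two-point bath correlation B(tau1, tau2) following (57)."""
    omega, c = bath_modes(**mode_kwargs)
    dt = tau2 - tau1
    coth = 1.0 / np.tanh(beta * omega / 2.0)
    terms = c**2 / (2.0 * omega) * (coth * np.cos(omega * dt) - 1j * np.sin(omega * dt))
    return np.sum(terms)

omega, c = bath_modes()             # omega_max = 4 * omega_c = 12
B_equal = correlation(0.3, 0.3)     # equal times: purely real
B_forward = correlation(0.1, 0.4)
B_backward = correlation(0.4, 0.1)  # complex conjugate of B_forward
```

The largest frequency $\omega_L$ coincides with $\omega_{\max}$, and swapping the two time arguments conjugates $B$, which gives two quick sanity checks on the implementation.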

We are interested in the evolution of the expectation for the observable $\langle\hat{\sigma}_z(t)\rangle := \mathrm{tr}\big(\rho_s G_e(2t, 0)\big)$, where $\rho_s$ is the initial density matrix for the system, which is set to be
$$\rho_s = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$$
in our simulations. Thus $\langle\hat{\sigma}_z(t)\rangle$ can be approximated via the inchworm Monte Carlo method by
$$\langle\hat{\sigma}_z(jh)\rangle \approx \widetilde{G}^{(11)}_{N+j,N-j}, \quad \text{for } j = 0, 1, \cdots, N,$$
where $\widetilde{G}_{n,m}$ is obtained by the scheme (25). The evolution of $\langle\hat{\sigma}_z(t)\rangle$ is plotted in Figure 4. Note that due to the numerical error, the computed $\langle\hat{\sigma}_z(t)\rangle$ may contain a nonzero imaginary part, and only the real part of the numerical result is plotted.

Figure 4: Evolution of $\mathrm{Re}\langle\hat{\sigma}_z(t)\rangle$.

The numerical results in Figure 4 are obtained using the scheme with time step $h = 1/8$. We choose $N_s = 10^4$ for $\bar{M} = 1$ and $N_s = 10^5$ for $\bar{M} = 3$ in order for the curves to be sufficiently smooth. One can observe that when $t < 1$, the two curves are hardly distinguishable, meaning that the contribution from $M = 3$ is much smaller than that from $M = 1$. It can therefore be expected that the error curves for both methods are close to each other before $t = 1$, and the quadratic error growth of $\bar{M} = 3$ will become obvious only for large $t$. Note that the curve given by $\bar{M} = 3$ is expected to be closer to the actual physics. Here we only aim at the verification of Theorem 3, and do not discuss the modeling error by choosing a finite $\bar{M}$.

Unlike the differential equation case, now it is much harder to find the solution of the deterministic scheme due to the high-dimensional integral on the right-hand side of (19). Therefore, instead of verifying (35) directly, we use
$$\mathbb{E}(\|\widetilde{G}_{n,m} - G_{n,m}\|^2) = \mathrm{Var}(\widetilde{G}_{n,m}) + \|\mathbb{E}(\widetilde{G}_{n,m} - G_{n,m})\|^2, \tag{58}$$
and only take the first term on the right-hand side (the variance of $\widetilde{G}_{n,m}$) to approximate our numerical error. Such an approximation is reasonable since the second term $\|\mathbb{E}(\widetilde{G}_{n,m} - G_{n,m})\|^2$ has a higher order $O(h^2/N_s^2)$ by the bias estimation (34). To compute the variance numerically, we run the same simulation $N_{\exp}$ times, and compute the unbiased estimation of the variance:
$$\mathrm{Var}(\widetilde{G}_{n,m}) \approx \bar{\mu}_{n,m} := \frac{1}{N_{\exp}-1} \sum_{k=1}^{N_{\exp}} \left\| \widetilde{G}^{[k]}_{n,m} - \frac{1}{N_{\exp}} \sum_{i=1}^{N_{\exp}} \widetilde{G}^{[i]}_{n,m} \right\|^2,$$
where $\widetilde{G}^{[k]}_{n,m}$ is the result of the $k$th simulation. For a given time step $h$, we let $e(jh) = \bar{\mu}_{N+j,N-j}$. Below we first check the numerical order. By choosing $N_{\exp} = 1000NN_s$, we get the results shown in Table 2, from which one can clearly observe the order of convergence given in (35).

h, Ns e(0.5) order e(1) order h, Ns e(0.5) order e(1) order 1/10, 2 0.0417 – 0.1488 – 1/4, 1 0.1939 – 0.8579 – 1/12, 2 0.0350 0.9658 0.1228 1.0505 1/4, 2 0.0972 0.9959 0.3908 1.1344 1/14, 2 0.0303 0.9293 0.1051 1.0083 1/4, 4 0.0473 1.0386 0.1824 1.0990 1/16, 2 0.0263 1.0574 0.0915 1.0409 1/4, 8 0.0237 0.9998 0.0886 1.0417 1/18, 2 0.0237 0.8936 0.0811 1.0203 1/4, 16 0.0119 0.9962 0.0436 1.0235 1/20, 2 0.0214 0.9614 0.0728 1.0341 1/4, 32 0.0059 1.0053 0.0217 1.0091

Table 2: Numerical error e(0.5), e(1) with different time steps h and sampling numbers Ns and the order of accuracy
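The decomposition (58) of the mean-square error into variance plus squared bias, and the unbiased variance estimator used for $\bar{\mu}_{n,m}$, can be illustrated on synthetic data. In the sketch below the "reference" matrix and the noisy samples are made-up stand-ins for $G_{n,m}$ and the repeated runs $\widetilde{G}^{[k]}_{n,m}$; they are not inchworm results.

```python
import numpy as np

rng = np.random.default_rng(1)
reference = np.array([[1.0, 0.2], [0.2, -1.0]])   # stand-in for G_{n,m}
n_exp = 5000
# Stand-in for n_exp runs of G~: a small constant bias plus random noise.
samples = reference + 0.05 + 0.1 * rng.standard_normal((n_exp, 2, 2))

mean = samples.mean(axis=0)
# Squared Frobenius norms summed over the 2x2 entries of each sample.
mse = np.mean(np.sum(np.abs(samples - reference) ** 2, axis=(1, 2)))
var_biased = np.mean(np.sum(np.abs(samples - mean) ** 2, axis=(1, 2)))
bias_sq = np.sum(np.abs(mean - reference) ** 2)

# Unbiased estimator with the 1 / (n_exp - 1) normalization.
var_unbiased = var_biased * n_exp / (n_exp - 1)
```

For the empirical distribution, `mse` equals `var_biased + bias_sq` up to rounding, which is the sample version of (58); the unbiased estimator differs from `var_biased` only by the factor $N_{\exp}/(N_{\exp}-1)$.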

Now we present the growth of error for Ns = 4 and 8 in Figure 5, where the time step is set to be h =1/8, and we choose Nexp = 700NNs for both M¯ = 1 and M¯ = 3. As predicted, the two curves almost coincide for t< 1. For M¯ = 1, the numerical error starts to show the exponential growth from t =4.5, and for M¯ = 3, the quadratic exponential growth becomes obvious from t =2.5. Both results are in accordance with the theoretical results in Theorem 3.

Figure 5: Evolution of numerical error e(t) (left: Ns = 4, right: Ns = 8).

By now, we have stated all the results in this paper. From the next section, we will start to prove the theorems and propositions.

5. Proofs for the case of differential equations

In this section, we prove the results for differential equations as stated in Section 2.1.

5.1. Proof of Proposition 1, Part I: Recurrence relation for the bias

In this section we focus on the proof of (10). By taking the difference of the schemes (4) and (6) and applying the triangle inequality and the bounds of the coefficients, we get
$$\|\mathbb{E}(u_{n+1} - \widetilde{u}_{n+1})\|_2 \le \|\mathbb{E}(u_n - \widetilde{u}_n)\|_2 + hR \sum_{i=1}^{s} \|\mathbb{E}(k_i - \widetilde{k}_i)\|_2 \tag{59}$$
for all non-negative integers $n$. We then focus on the estimate for $\|\mathbb{E}(k_i - \widetilde{k}_i)\|_2$. In fact, we have the following results:

Lemma 1. Given a sufficiently small time step length $h$, if the boundedness assumptions (8) are satisfied, we have
$$\|\mathbb{E}(k_i - \widetilde{k}_i)\|_2 \le \alpha' \left( \|\mathbb{E}(u_n - \widetilde{u}_n)\|_2 + \mathbb{E}(\|u_n - \widetilde{u}_n\|_2^2) + \frac{h^2}{N_s} |u_{\mathrm{RK}}|^2_{\mathrm{std}} \right) \tag{60}$$
and
$$\mathbb{E}(\|k_i - \widetilde{k}_i\|_2^2) \le \beta' \left( \mathbb{E}(\|u_n - \widetilde{u}_n\|_2^2) + \frac{1}{N_s} |u_{\mathrm{RK}}|^2_{\mathrm{std}} \right), \tag{61}$$
for $\beta' = 2^{s+1}\max(sM'^2, 1)$ and $\alpha' = 2^s \max(M', sM'', s^2M''R^2\beta'/2)$. Here we recall that $s$ is the number of Runge-Kutta stages, $d$ is the dimension of the solution $u$ and $R, M', M''$ are some upper bounds defined in (8)–(9).

With the above lemma, one may insert the estimate (60) into (59) to get the recurrence relation (10) stated in Proposition 1 for the bias $\|\mathbb{E}(u_{n+1} - \widetilde{u}_{n+1})\|_2$. The proof of Lemma 1 is given below:

Proof of Lemma 1. Apply the relation (5) and usee Taylor expansion at the deterministic point tn +cih,un + i 1 h − a k , we have for the mth component of E(k k ), j=1 ij j i − i  P i 1 i 1 − e − E(k(m) k(m)) = E f (m) t + c h,u + h a k f (m) t + c h, u + h a k | i − i | n i n ij j − n i n ij j j=1 j=1 (62)  X  X  e e M ′′ 2 e M ′ Ew + E w . ≤ k ik2 2 k ik2 where i 1 − w = (u u )+ h a (k k ), i n − n ij j − j j=1 X and we have applied the boundedness assumptione (8). The above inee quality immediately yields

i 1 − E(k k ) M ′ E(u u ) + hR E(k k ) k i − i k2 ≤ k n − n k2 k j − j k2 j=1  X  (63) e i 1 e sM e − + ′′ E( u u ) 2)+ h2R2 E( k k 2) . 2 k n − n k2 k j − j k2 j=1  X  e e 22 By recursion, we obtain

i 1 s sM ′′ 2 2 2 − 2 E(k k ) (1 + hRM ′) M ′ E(u u ) + E( u u )+ h R E( k k ) . (64) k i − i k2 ≤ k n − n k 2 k n − nk2 k j − j k2 j=1   X  e e e e We observe from the inequality above that the upper bound of E(ki ki) 2 is partially determined by E 2 k − k the second moment ( kj kj 2) up to (i 1)-th Runge-Kutta stage. Therefore, we subsequently consider E k − 2 k − e the estimate for ( kj kj 2). By direct calculation, k − ke i 1 Ns i 1 − − 2 E 2 E e 1 (i) ( ki ki 2)= f tn + cih,un + h aij kj g tn + cih, un + h aij kj ,Xl k − k − Ns 2 j=1 l=1 j=1 h X  X X  i N e i 1 s e i 1 e 2 E − 1 − (i) =2 f tn + cih,un + h aij kj g tn + cih,un + h aij kj ,Xl − Ns 2 j=1 l=1 j=1 h X  X X  i N s i 1 i 1 2 2 E − (i) − (i) + 2 g tn + cih,un + h aij kj ,Xl g tn + cih, un + h aij kj ,Xl Ns − 2 h l=1 h j=1 j=1 i X X  X  2 2 2 2 e e uRK std +2M ′ d wi 2 ≤ Ns | | k k i 1 2 2 2 2 − 2 2 2 2dsM ′ E u u + R h E k k + u . ≤ k n − nk2 k j − j k2 N | RK|std j=1 s   X  e Here we have used the mean valuee theorem and the standard error estimates for the Monte Carlo method to obtain the upper bound. By applying the above inequality recursively backwards to the first Runge-Kutta stage, we obtain a uniform bound for E( k k 2): k i − ik2 E 2 2 2 2 s 2E 2 2 2 ( ki ki 2e) (1+2M ′ R dsh ) 2dsM ′ un un 2 + uRK std . (65) k − k ≤ k − k Ns | |    s+1 2 The estimation (61) cane be obtained by setting β′ =2 max(dsM ′ ,e1) for h 6 √ds/(2M ′R). Substituting (65) into (64), we can obtain (60) with h 1/ max(M ′R, R√sβ ). ≤ ′

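5.2. Proof of Proposition 1, Part II: Recurrence relation for the numerical error

We again insert the two schemes and expand the numerical error $\mathbb{E}(\|u_{n+1} - \widetilde{u}_{n+1}\|_2^2)$ into
$$\begin{aligned} \mathbb{E}(\|u_{n+1} - \widetilde{u}_{n+1}\|_2^2) &= \mathbb{E}\Big( \Big\| (u_n - \widetilde{u}_n) + h\sum_{i=1}^{s} b_i(k_i - \widetilde{k}_i) \Big\|_2^2 \Big) \\ &= \mathbb{E}(\|u_n - \widetilde{u}_n\|_2^2) + h^2\, \mathbb{E}\Big( \Big\| \sum_{i=1}^{s} b_i(k_i - \widetilde{k}_i) \Big\|_2^2 \Big) \\ &\quad + h\,\mathbb{E}\Big[ (u_n - \widetilde{u}_n)^\dagger \sum_{i=1}^{s} b_i(k_i - \widetilde{k}_i) \Big] + h\,\mathbb{E}\Big[ \Big( \sum_{i=1}^{s} b_i(k_i - \widetilde{k}_i) \Big)^\dagger (u_n - \widetilde{u}_n) \Big]. \end{aligned} \tag{66}$$
The second term on the right-hand side can be immediately estimated using the previous result (61) given in Lemma 1:
$$h^2\, \mathbb{E}\Big( \Big\| \sum_{i=1}^{s} b_i(k_i - \widetilde{k}_i) \Big\|_2^2 \Big) \le R^2 s h^2 \sum_{i=1}^{s} \mathbb{E}(\|k_i - \widetilde{k}_i\|_2^2) \le R^2 s^2 \beta' \left( h^2\, \mathbb{E}(\|u_n - \widetilde{u}_n\|_2^2) + \frac{h^2}{N_s} |u_{\mathrm{RK}}|^2_{\mathrm{std}} \right). \tag{67}$$
Using this estimate, naively we can bound the last two cross terms in (66) by the Cauchy-Schwarz inequality. However, such a strategy will lead to an error estimate with the form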

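$$\left| h\,\mathbb{E}\Big[ (u_n - \widetilde{u}_n)^\dagger \sum_{i=1}^{s} b_i(k_i - \widetilde{k}_i) \Big] + h\,\mathbb{E}\Big[ \Big( \sum_{i=1}^{s} b_i(k_i - \widetilde{k}_i) \Big)^\dagger (u_n - \widetilde{u}_n) \Big] \right| \le C \left( h\,\mathbb{E}(\|u_n - \widetilde{u}_n\|_2^2) + \frac{h}{N_s} |u_{\mathrm{RK}}|^2_{\mathrm{std}} \right),$$
where the last term is sub-optimal and will lead to a deterioration in the final error estimate. Therefore we need a more careful estimate as in the following lemma:

Lemma 2. Given a sufficiently small time step length $h$, if the boundedness assumptions (8) are satisfied, we have
$$\left| h\,\mathbb{E}\Big[ (u_n - \widetilde{u}_n)^\dagger \sum_{i=1}^{s} b_i(k_i - \widetilde{k}_i) \Big] + h\,\mathbb{E}\Big[ \Big( \sum_{i=1}^{s} b_i(k_i - \widetilde{k}_i) \Big)^\dagger (u_n - \widetilde{u}_n) \Big] \right| \le C_{\mathrm{cr}} \left( \frac{\alpha'^2 h^5}{N_s^2} |u_{\mathrm{RK}}|^4_{\mathrm{std}} + h\,\mathbb{E}(\|u_n - \widetilde{u}_n\|_2^2) \right), \tag{68}$$
where $C_{\mathrm{cr}} = \max(R^2s^2, 2 + 2^{s+1}M'^2dR^2s^3)$. Here we recall that $s$ is the number of Runge-Kutta stages, $d$ is the dimension of the solution $u$, $R, M'$ are some upper bounds defined in the assumptions (8)–(9) and $\alpha'$ is given in Lemma 1.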

With Lemma 2, we now plug the estimates (67) and (68) into (66) to obtain the recurrence relation (10) for the numerical error by
$$\mathbb{E}(\|u_{n+1} - \widetilde{u}_{n+1}\|_2^2) \le (1 + C_{\mathrm{cr}}h + R^2s^2\beta'h^2)\, \mathbb{E}(\|u_n - \widetilde{u}_n\|_2^2) + C_{\mathrm{cr}} \frac{\alpha'^2 h^5}{N_s^2} |u_{\mathrm{RK}}|^4_{\mathrm{std}} + R^2s^2\beta' \frac{h^2}{N_s} |u_{\mathrm{RK}}|^2_{\mathrm{std}}, \tag{69}$$
from which one can see that if $h$ and $N_s$ satisfy $h \le \frac{C_{\mathrm{cr}}}{R^2s^2\beta'}$, then (11) holds for $\beta = \max(2C_{\mathrm{cr}}, R^2s^2\beta')$, where $\beta'$ is given in Lemma 1.

The rest of this section is devoted to the proof of Lemma 2. We introduce a “semi-stochastic” approximation $\bar{u}_{n+1}$ defined by
$$\begin{aligned} &\bar{k}_i = f\Big(t_n + c_ih,\ \widetilde{u}_n + h\sum_{j=1}^{i-1} a_{ij}\bar{k}_j\Big), \quad i = 1, \cdots, s; \\ &\bar{u}_{n+1} = \widetilde{u}_n + h\sum_{i=1}^{s} b_i\bar{k}_i. \end{aligned} \tag{70}$$
This approximation applies the deterministic Runge-Kutta scheme to the stochastic solution $\widetilde{u}_n$ for one time step. The following lemma controls the difference between this local approximation and the stochastic scheme (6).

Lemma 3. Let $\boldsymbol{X}_i := (X^{(1)}, X^{(2)}, \cdots, X^{(i)})$ be the collection of samples up to the $i$th Runge-Kutta stage, where each $X^{(j)} = (X^{(j)}_1, X^{(j)}_2, \cdots, X^{(j)}_{N_s})$. Then we have
$$\|\mathbb{E}_{\boldsymbol{X}_i}(\bar{k}_i - \widetilde{k}_i)\|_2 \le \alpha' \frac{h^2}{N_s} |u_{\mathrm{RK}}|^2_{\mathrm{std}}, \tag{71}$$
where $\alpha'$ is given in (60).

The proof of this lemma is omitted since it is almost identical to the proof of Lemma 1. The first and second terms on the right-hand side of (60) do not appear in the above result, since $\bar{k}_i$ and $\widetilde{k}_i$ are computed based on the same solution at the $n$th step. Below we provide the proof of Lemma 2:

24 s Proof of Lemma 2. It suffices to only focus on one factor hE (u u )† b (k k ) since the other n − n i=1 i i − i ¯ one is simply its conjugate transpose which can be controlledh by exactly theP same upper bound.i We use ki as a bridge and split e e s s s hE (u u )† b (k k ) h E (u u )† b (k k¯ ) + h E (u u )†E b (k¯ k ) n − n i i − i ≤ n − n i i − i n − n Xs i i − i i=1 i=1 i=1 h X i h X i h  X i e s 2 s e 2 e E e 2 hE ¯ eE ¯ h ( un un 2)+ bi(ki ki) + Xs bi(ki ki) ≤ k − k 2 − 2 − 2 h i=1  i=1  i Xs X e R2sh e hE( u u 2)+ E k k¯ 2 + E E (k¯ k ) 2 . ≤ k n − nk2 2 k i − ik2 k Xs i − i k2 i=1 X   e e (72) From the first line to the second line above, we have taken advantage of the fact that un is independent th from Xs which is sampled at the (n + 1) time step when calculating un+1. The difference between ki and ¯ ki can be estimated in the same way as the derivation of (65). The result is e 2 2 2 2 2 s e 2 E( k k¯ ) 2M ′ sd(1+2M ′ dR sh ) E( u u ). (73) k i − ik2 ≤ k n − nk2 Inserting the above and the result of Lemma 3 into (72), we get e s 2 2 2 5 2 3 2 2 2 2 s 2 R s α′ h 4 hE (u u )† b (k k ) h(1 + R s M ′ d)(1 + 2M ′ dR sh ) E( u u )+ u , n − n i i − i ≤ k n − nk2 2N 2 | RK|std i=1 s h X i e 2 2 2 from whiche one can easily observe that the lemma holds if 2M ′ dR sh < 1. e

5.3. Proof of Theorem 2 — error bounds In this section, we apply the two recurrence relations stated in Proposition 1 to get the estimates for the bias E(u u ) as well as the numerical error E( u u 2). k n+1 − n+1 k2 k n+1 − n+1k2 By using (11) recursively backwards w.r.t n, we have e e 2 2 5 n E 2 n+1E 2 h 2 α h 4 i ( un+1 un+1 2) (1 + βh) ( u0 u0 2)+ β uRK std + 2 2 2 uRK std (1 + βh) k − k ≤ k − k Ns | | s R N | |  s  i=0 (74) 2 4 X e h 2 α he 4 βtn+1 = uRK std + 2 2 2 uRK std e 1 Ns | | s R Ns | | −    which leads to the global estimate (13) for the bias. Inserting (74) into the recurrence relation (10) and expanding the recursion in a similar way, we get E(u u ) k n+1 − n+1 k2 n n 1 3 2 4 − h 2 i h 2 α h 4 i βtn−i α uRK e (1 + αh) + αh uRK + uRK (1 + αh) e 1 ≤ N | |std N | |std s2R2N 2 | |std − s i=0 s s i=0 X   X n 1  h2 h α2h4 − eαtn+1 1 u 2 + αh u 2 + u 4 eαti eβtn−i 1 ≤ N − | RK|std N | RK|std s2R2N 2 | RK|std − (75) s s s i=0   X  n 1  2 2 4 − h αtn+1 2 h 2 α h 4 max(α,β)tn e 1 uRK std + αh uRK std + 2 2 2 uRK std e 1 ≤ Ns − | | Ns | | s R Ns | | −   i=0 2  2 4 X  h αtn+1 2 h 2 α h 4 max(α,β)tn = e 1 uRK std + αtn uRK std + 2 2 2 uRK std e 1 , Ns − | | Ns | | s R Ns | | −     which completes the proof of (12). 25 6. Proofs of estimates for inchworm Monte Carlo method

In this section, the proofs of theorems in Section 3.2 are detailed. We will again first focus on the difference between the deterministic method and the stochastic method, and the error of the deterministic method will be discussed at the end of this section. Thanks to the previous discussion on the differential equation case, we may follow this framework which guides the general flow of our derivation. Below we point out the major differences as well as difficulties for the case of this integro-differential equation before the detailed proof:

• Since $K_2$ depends on more previously computed time steps than $K_1$ due to the nonlocal integral term in (19) (this can be easily observed by comparing $g_{n,m}$ with $g^*_{n,m}$), a uniform expression for $K_i$ like (4) is no longer available for the integro-differential equation. Therefore, we need an individual analysis for each $K_i$.
• Recall that the Taylor expansion is applied in the proof of Lemma 1 (e.g. in (62)), which requires estimating the first- and second-order derivatives of the source term $f(t,u)$. This can no longer be simply assumed as in (8) and has to be carefully studied. These derivatives play crucial roles in understanding the behavior of the inchworm Monte Carlo method.
• The derivation of the error amplification can no longer be handled by the simple discrete Gronwall inequality, due to the involvement of a large number of previous steps on the right-hand side of the numerical scheme. The error estimation must be handled with more care, e.g. the estimation we used to handle the cross term in (68) (Lemma 2) will lead to a pessimistic (sub-optimal) fast growth rate in the error estimate of integro-differential equations.
• Most importantly, the magnitude of the derivatives depends on $\bar M$, as it is determined by the dimensionality of the integral in the equation. This will result in different error amplification with different choices of $\bar M$. This is the key point which explains whether and how the inchworm Monte Carlo method mitigates the numerical sign problem.

6.1. Estimation of the derivatives of $F_1(\cdot)$

We first estimate the first- and second-order derivatives of $F_1(\cdot)$. As discussed in Section 2.1, we will provide a detailed proof for the bounds of first-order derivatives. The proof for the second-order derivatives will only be sketched.

6.1.1. Proof of Proposition 5—Estimate the first-order derivatives

Proof. We are looking for an upper bound for the derivative $\partial F_1(\xi_{n,m})/\partial G^{(pq)}_{k,\ell}$, with $\xi_{n,m}$ being a convex combination of $g^{e}_{n,m}$, $g_{n,m}$ and $\tilde g_{n,m}$ defined by

$$
\xi_{n,m} := (\Xi_{m+1,m};\ \Xi_{m+2,m+1},\ \Xi_{m+2,m};\ \cdots;\ \Xi_{n,n-1},\ \cdots,\ \Xi_{n,m}) \tag{76}
$$

satisfying

$$
\Xi_{j,k} = c_1 G_e(t_j,t_k) + c_2 G_{j,k} + (1-c_1-c_2)\,\tilde G_{j,k} \tag{77}
$$

for some constants $0\le c_1,c_2\le 1$ with $c_1+c_2\le 1$, given $m\le k<j\le n$. According to the assumption (H1) on the boundedness of $G_e$, $G$ and $\tilde G$, we immediately have

$$
\|\Xi_{j,k}\|\le \mathscr{G} \quad \text{for any } j,k = 0,1,\cdots,N-1,N^-,N^+,N+1,\cdots,2N-1,2N. \tag{78}
$$

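In (22), we write $F_1(\xi_{n,m})$ as

$$
F_1(\xi_{n,m}) = \operatorname{sgn}(t_n-t)\sum_{\substack{M=1\\ M \text{ is odd}}}^{\bar M} i^{M+1}\,\mathcal{F}_M(\xi_{n,m}),
$$

where

$$
\mathcal{F}_M(\xi_{n,m}) := \int_{t_n>s^M_M>\cdots>s^M_1>t_m} (-1)^{\#\{\vec s^M\le t\}}\, W_s I_h\Xi(t_n,s^M_M)\, W_s\cdots W_s I_h\Xi(s^M_1,t_m)\,\mathcal{L}(t_n,\vec s^M)\,\mathrm{d}\vec s^M. \tag{79}
$$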

Therefore, we focus on the estimation of $\partial\mathcal{F}_M(\xi_{n,m})/\partial G^{(pq)}_{k,\ell}$ for each odd integer $M$. The two cases given in equation (42) will be discussed separately below.

(I) $(k,\ell)\in\partial\Omega_{n,m}$. This case includes $\partial\mathcal{F}_M(\xi_{n,m})/\partial G^{(pq)}_{n,\ell}$ and $\partial\mathcal{F}_M(\xi_{n,m})/\partial G^{(pq)}_{k,m}$. Here we only consider the derivative $\partial\mathcal{F}_M(\xi_{n,m})/\partial G^{(pq)}_{n,\ell}$, since the analysis for the other is similar. For each $\mathcal{F}_M(\xi_{n,m})$, we split the derivative by

$$
\frac{\partial\mathcal{F}_M(\xi_{n,m})}{\partial G^{(pq)}_{n,\ell}} = \sum_{j=0}^{M}\frac{\partial\mathcal{I}_j(\xi_{n,m})}{\partial G^{(pq)}_{n,\ell}}, \tag{80}
$$

where

$$
\mathcal{I}_j(\xi_{n,m}) = \int_{t_n>s^M_M>\cdots>s^M_{j+1}>t_{n-1}>s^M_j>\cdots>s^M_1>t_m} (-1)^{\#\{\vec s^M\le t\}}\, W_sI_h\Xi(t_n,s^M_M)\,W_sI_h\Xi(s^M_M,s^M_{M-1})\cdots W_sI_h\Xi(s^M_1,t_m)\,\mathcal{L}(t_n,\vec s^M)\,\mathrm{d}\vec s^M, \tag{81}
$$

in which we require that $t_{n-1}$ is between $s^M_j$ and $s^M_{j+1}$. We write the integrand of each $\mathcal{I}_j(\xi_{n,m})$ as $\mathcal{G}_1^j\times I_h\Xi(s^M_{j+1},s^M_j)\times\mathcal{G}_2^j$, where

$$
\begin{aligned}
\mathcal{G}_1^j &= (-1)^{\#\{\vec s^M\le t\}}\, W_sI_h\Xi(t_n,s^M_M)\,W_s\cdots W_sI_h\Xi(s^M_{j+2},s^M_{j+1})\,W_s,\\
\mathcal{G}_2^j &= W_sI_h\Xi(s^M_j,s^M_{j-1})\,W_s\cdots W_sI_h\Xi(s^M_1,t_m)\,\mathcal{L}(t_n,\vec s^M)
\end{aligned} \tag{82}
$$

for $1\le j\le M-1$ (the formula for $j=0$ and $j=M$ is slightly different but easy to get). One can easily verify that $\mathcal{G}_2^j$ is completely independent of the factor $\Xi_{n,\ell}$, since the time sequence $\{s^M_1,\cdots,s^M_j\}$ is at least one time step away from $t_n$ and thus the interpolation of any $I_h\Xi(s^M_{i+1},s^M_i)$ in $\mathcal{G}_2^j$ never uses the value of $\Xi_{n,\ell}$. On the other hand, the value of $\mathcal{G}_1^j$ as well as the "interface" $I_h\Xi(s^M_{j+1},s^M_j)$ may or may not rely on $\Xi_{n,\ell}$, depending on how $\ell$ is given, which leads to the two cases we are going to discuss below.

Case 1: $\ell = n-1$. This is the most complicated case in this proof. Note that $\mathcal{G}_1^j$ depends on $\Xi_{n,n-1}$ due to the fact that we get each $I_h\Xi(s^M_{i+1},s^M_i)$ in $\mathcal{G}_1^j$ by the interpolation $I_h\Xi(s^M_{i+1},s^M_i) = c_{i,1}\Xi_{n,n} + c_{i,2}\Xi_{n-1,n-1} + c_{i,3}\Xi_{n,n-1}$ with some coefficients $|c_i|<1$. The "interface" $I_h\Xi(s^M_{j+1},s^M_j)$ depends on $\Xi_{n,n-1}$ only when $s^M_j$ is restricted to $(t_{n-2},t_{n-1})$, where $I_h\Xi(s^M_{j+1},s^M_j) = c_{j,1}\Xi_{n-1,n-1} + c_{j,2}\Xi_{n-1,n-2} + c_{j,3}\Xi_{n,n-1} + c_{j,4}\Xi_{n,n-2}$. One may refer to Figure 6 for a better understanding.

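For these reasons, we further divide the derivative $\partial\mathcal{I}_j(\xi_{n,m})/\partial G^{(pq)}_{n,n-1}$ into two parts:

$$
\begin{aligned}
\frac{\partial\mathcal{I}_j(\xi_{n,m})}{\partial G^{(pq)}_{n,n-1}}
&= \int_{t_n>s^M_M>\cdots>s^M_{j+1}>t_{n-1}}\int_{t_{n-1}>s^M_j>t_{n-2}}\int_{s^M_j>s^M_{j-1}>\cdots>s^M_1>t_m}\Big[\frac{\partial}{\partial\Xi^{(pq)}_{n,n-1}}\Big(\mathcal{G}_1^j\,I_h\Xi(s^M_{j+1},s^M_j)\Big)\Big]\mathcal{G}_2^j\,\mathrm{d}\vec s^M \\
&\quad + \int_{t_n>s^M_M>\cdots>s^M_{j+1}>t_{n-1}}\int_{t_{n-2}>s^M_j>t_m}\int_{s^M_j>\cdots>t_m}\Big(\frac{\partial\mathcal{G}_1^j}{\partial\Xi^{(pq)}_{n,n-1}}\Big)I_h\Xi(s^M_{j+1},s^M_j)\,\mathcal{G}_2^j\,\mathrm{d}\vec s^M.
\end{aligned} \tag{83}
$$

Figure 6: For each $\mathcal{I}_j$, red triangle: the area where the $(s^M_{i+1},s^M_i)$ pairs in $\mathcal{G}_1^j$ are located; blue triangle: the area where the $(s^M_{i+1},s^M_i)$ pairs in $\mathcal{G}_2^j$ are located; pink square: the area where the value of $I_h\Xi(s^M_{j+1},s^M_j)$ depends on $\Xi_{n,n-1}$; green rectangle: the area where the value of $I_h\Xi(s^M_{j+1},s^M_j)$ is independent of $\Xi_{n,n-1}$. (Horizontal axis: $s^M_{i+1}$, with ticks $t_{n-1}$, $t_n$; vertical axis: $s^M_i$, with ticks $t_m$, $t_{n-2}$, $t_{n-1}$.)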

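For the first integral above, we compute the derivative in the square bracket by

$$
\begin{aligned}
\frac{\partial}{\partial\Xi^{(pq)}_{n,n-1}}\Big(\mathcal{G}_1^j\,I_h\Xi(s^M_{j+1},s^M_j)\Big)
&= (-1)^{\#\{\vec s^M\le t\}}\sum_{i=j}^{M} W_sI_h\Xi(t_n,s^M_M)\,W_s\cdots W_s\Big[\frac{\partial}{\partial\Xi^{(pq)}_{n,n-1}}I_h\Xi(s^M_{i+1},s^M_i)\Big]W_s\cdots W_sI_h\Xi(s^M_{j+1},s^M_j) \\
&= (-1)^{\#\{\vec s^M\le t\}}\sum_{i=j}^{M} W_sI_h\Xi(t_n,s^M_M)\,W_s\cdots W_s\,\big(c_{i,3}E_{pq}\big)\,W_s\cdots W_sI_h\Xi(s^M_{j+1},s^M_j).
\end{aligned}
$$

Here $E_{pq}$ is defined as a 2-by-2 matrix with its $pq$-entry being the only non-zero entry, equal to 1. By the hypotheses (H1) and (H3), this integral is therefore bounded by

$$
\begin{aligned}
&\Big\|\int_{t_n>s^M_M>\cdots>s^M_{j+1}>t_{n-1}}\int_{t_{n-1}>s^M_j>t_{n-2}}\int_{s^M_j>s^M_{j-1}>\cdots>s^M_1>t_m}\Big[\frac{\partial}{\partial\Xi^{(pq)}_{n,n-1}}\Big(\mathcal{G}_1^j\,I_h\Xi(s^M_{j+1},s^M_j)\Big)\Big]\mathcal{G}_2^j\,\mathrm{d}\vec s^M\Big\| \\
&\quad\le (M-j+1)\,\mathscr{W}^{M+1}\mathscr{G}^{M}\big(M!!\,\mathscr{L}^{\frac{M+1}{2}}\big)\times\int_{t_n>s^M_M>\cdots>s^M_{j+1}>t_{n-1}}\int_{t_{n-1}>s^M_j>t_{n-2}}\int_{s^M_j>s^M_{j-1}>\cdots>s^M_1>t_m} 1\,\mathrm{d}\vec s^M \\
&\quad\le \mathscr{W}^{M+1}\mathscr{G}^{M}\big(M!!\,\mathscr{L}^{\frac{M+1}{2}}\big)\times(M-j+1)\times\frac{1}{(M-j)!\,(j-1)!}\,(t_{n-1}-t_m)^{j-1}h^{M-j+1}.
\end{aligned} \tag{84}
$$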

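We notice that the upper bound above consists of three components: (1) $\mathscr{W}^{M+1}\mathscr{G}^{M}(M!!\,\mathscr{L}^{\frac{M+1}{2}})$: the upper bound of the integrand; (2) $M-j+1$: the number of terms of the form $I_h\Xi$ whose values depend on $\Xi_{n,n-1}$; (3) $\frac{1}{(M-j)!\,(j-1)!}(t_{n-1}-t_m)^{j-1}h^{M-j+1}$: the area of the domain of integration. Similarly, we may directly write down the upper bound for the second integral in (83):

$$
\Big\|\int_{t_n>s^M_M>\cdots>s^M_{j+1}>t_{n-1}}\int_{t_{n-2}>s^M_j>t_m}\int_{s^M_j>s^M_{j-1}>\cdots>s^M_1>t_m}\Big(\frac{\partial\mathcal{G}_1^j}{\partial\Xi^{(pq)}_{n,n-1}}\Big)I_h\Xi(s^M_{j+1},s^M_j)\,\mathcal{G}_2^j\,\mathrm{d}\vec s^M\Big\|
\le \mathscr{W}^{M+1}\mathscr{G}^{M}\big(M!!\,\mathscr{L}^{\frac{M+1}{2}}\big)\times(M-j)\times\frac{1}{(M-j)!\,j!}\,(t_{n-2}-t_m)^{j}h^{M-j}. \tag{85}
$$

Combining the estimates (84) and (85) yields

$$
\Big\|\frac{\partial\mathcal{I}_j(\xi_{n,m})}{\partial G^{(pq)}_{n,n-1}}\Big\| \le 2\mathscr{W}^{M+1}\mathscr{G}^{M}\big(M!!\,\mathscr{L}^{\frac{M+1}{2}}\big)\frac{M-j+1}{(M-j)!\,(j-1)!}\,(t_{n-1}-t_m)^{j}h^{M-j}. \tag{86}
$$

As we have mentioned previously, the upper bound obtained above is for $1\le j\le M-1$. For $j=0$ and $j=M$, we may return to (81) and consider these two individual cases, and we easily reach the following results with a similar argument:

$$
\Big\|\frac{\partial\mathcal{I}_0(\xi_{n,m})}{\partial G^{(pq)}_{n,n-1}}\Big\| \le \mathscr{W}^{M+1}\mathscr{G}^{M}\mathscr{L}^{\frac{M+1}{2}}\frac{M+1}{(M-1)!!}\,h^{M},
\qquad
\Big\|\frac{\partial\mathcal{I}_M(\xi_{n,m})}{\partial G^{(pq)}_{n,n-1}}\Big\| \le \mathscr{W}^{M+1}\mathscr{G}^{M}\mathscr{L}^{\frac{M+1}{2}}\frac{M}{(M-1)!!}\,(t_{n-1}-t_m)^{M-1}h.
$$

Now we sum up all the upper bounds for $\partial\mathcal{I}_j(\xi_{n,m})/\partial G^{(pq)}_{n,n-1}$ and use the combination relation $(1+X)^n=\sum_{k\ge 0}\binom{n}{k}X^k$ to get:

$$
\Big\|\frac{\partial\mathcal{F}_M(\xi_{n,m})}{\partial G^{(pq)}_{n,n-1}}\Big\| \le \sum_{j=0}^{M}\Big\|\frac{\partial\mathcal{I}_j(\xi_{n,m})}{\partial G^{(pq)}_{n,n-1}}\Big\| \le 5\mathscr{W}^{M+1}\mathscr{G}^{M}\mathscr{L}^{\frac{M+1}{2}}\frac{M}{(M-3)!!}\,(t_n-t_m)^{M-1}h. \tag{87}
$$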

Here we remark that the argument above is only valid when the odd number $M$ is greater than 1, because the factor $(M-3)!!$ in the estimate (87) is not defined when $M=1$. For $M=1$, we simply return to the definition (79) and follow similar procedures to reach the result

$$
\Big\|\frac{\partial\mathcal{F}_1(\xi_{n,m})}{\partial G^{(pq)}_{n,n-1}}\Big\| \le 2\mathscr{W}^{2}\mathscr{G}\mathscr{L}\,h. \tag{88}
$$
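The summation over $j$ that leads to (87) relies on the binomial theorem applied to products of simplex volumes such as $T^{j}h^{M-j}/(j!\,(M-j)!)$. The following is a minimal sketch of that identity in exact rational arithmetic; the function names and the sample values of `T`, `h` and `M` are illustrative assumptions, not quantities from the paper.

```python
from fractions import Fraction
from math import factorial


def ordered_volume_sum(T, h, M):
    """Sum over j of T^j * h^(M-j) / (j! * (M-j)!), computed exactly."""
    T, h = Fraction(T), Fraction(h)
    return sum(
        T ** j * h ** (M - j) / (factorial(j) * factorial(M - j))
        for j in range(M + 1)
    )


def binomial_closed_form(T, h, M):
    """Binomial theorem: the sum above equals (T + h)^M / M!."""
    return (Fraction(T) + Fraction(h)) ** M / factorial(M)


if __name__ == "__main__":
    # Hypothetical values: T plays the role of t_{n-1} - t_m, h is the time step.
    T, h, M = Fraction(3, 2), Fraction(1, 10), 5
    print(ordered_volume_sum(T, h, M) == binomial_closed_form(T, h, M))
```

With $T=h=1$ the same identity gives $\sum_u 1/((n-u)!\,u!) = 2^n/n!$, which is the mechanism by which factors of the form $(2t)^{n}$ arise in sums of this type.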

j j Case 2: ℓ

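$$
\frac{\partial\mathcal{I}_j(\xi_{n,m})}{\partial G^{(pq)}_{n,\ell}} = \int_{t_n>s^M_M>\cdots>s^M_{j+1}>t_{n-1}}\int_{t_{\ell+1}>s^M_j>t_{\ell-1}}\int_{s^M_j>s^M_{j-1}>\cdots>s^M_1>t_m}\mathcal{G}_1^j\Big(\frac{\partial}{\partial\Xi^{(pq)}_{n,\ell}}I_h\Xi(s^M_{j+1},s^M_j)\Big)\mathcal{G}_2^j\,\mathrm{d}\vec s^M.
$$

Note that we need to choose $1\le j\le M-1$; when $j=0$ or $j=M$, the corresponding derivative above vanishes. We follow the analysis for Case 1 and can also obtain upper bounds for all $\partial\mathcal{I}_j(\xi_{n,m})/\partial G^{(pq)}_{n,\ell}$; summing them up leads to the estimate of $\partial\mathcal{F}_M(\xi_{n,m})/\partial G^{(pq)}_{n,\ell}$. By a calculation similar to Case 1, the derivative $\partial\mathcal{F}_M(\xi_{n,m})/\partial G^{(pq)}_{n,\ell}$ can also be bounded by the same upper bounds given in (87) for $M\ge 3$ and (88) for $M=1$.

Overall, we arrive at the conclusion that for any $(k,\ell)\in\partial\Omega_{n,m}$,

$$
\Big\|\frac{\partial\mathcal{F}_M(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}}\Big\| \le
\begin{cases}
2\mathscr{W}^{2}\mathscr{G}\mathscr{L}\,h, & \text{if } M=1,\\[2pt]
5\mathscr{W}^{M+1}\mathscr{G}^{M}\mathscr{L}^{\frac{M+1}{2}}\dfrac{M}{(M-3)!!}\,(t_n-t_m)^{M-1}h, & \text{if } M\ge 3.
\end{cases} \tag{89}
$$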

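To complete our estimation for $\partial F_1(\xi_{n,m})/\partial G^{(pq)}_{k,\ell}$, we now sum up the bounds for $\partial\mathcal{F}_M(\xi_{n,m})/\partial G^{(pq)}_{k,\ell}$ up to $\bar M$ and get a uniform bound

$$
\Big\|\frac{\partial F_1(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}}\Big\| \le \Bigg(2\mathscr{W}^{2}\mathscr{G}\mathscr{L} + 5\mathscr{W}^{2}\mathscr{G}\mathscr{L}\sum_{\substack{M=3\\ M\text{ is odd}}}^{\bar M}\frac{M}{(M-3)!!}\big(\mathscr{W}\mathscr{G}\mathscr{L}^{1/2}t_{n-m}\big)^{M-1}\Bigg)\cdot h, \tag{90}
$$

which proves the first case in Proposition 5.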
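To see how the bound (90) grows with the truncation parameter $\bar M$, one can evaluate its right-hand side numerically. The sketch below does this under the assumption that the constants `W`, `G`, `L` stand for the bounds $\mathscr{W}$, $\mathscr{G}$, $\mathscr{L}$ from (H1) and (H3); the function names and the sample values are hypothetical.

```python
def double_factorial(n):
    """n!! with the usual convention 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def boundary_derivative_bound(W, G, L, t, h, M_bar):
    """Right-hand side of (90), with t standing for t_{n-m}."""
    x = W * G * L ** 0.5 * t
    # Sum over odd M from 3 to M_bar; the sum is empty when M_bar = 1.
    series = sum(
        M / double_factorial(M - 3) * x ** (M - 1)
        for M in range(3, M_bar + 1, 2)
    )
    prefactor = W ** 2 * G * L
    return (2 * prefactor + 5 * prefactor * series) * h


if __name__ == "__main__":
    # Illustrative values only: unit constants, unit time, unit step.
    for M_bar in (1, 3, 5, 7):
        print(M_bar, boundary_derivative_bound(1, 1, 1, 1.0, 1.0, M_bar))
```

Because of the $(M-3)!!$ in the denominator, the added terms eventually decay as $M$ grows for fixed $t_{n-m}$, so the bound stays finite as $\bar M$ increases.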

∂ M (ξ ) ˚ F n,m M M (II) (k,ℓ) Ωn,m. To compute (pq) , we need to first find all IhΞ(sj+1,sj ) in M (ξn,m) that depends ∈ ∂Gk,ℓ F (pq) ˚ M M M M on Ξ . Since (k,ℓ) Ωn,m, only those IhΞ(sj+1,sj ) such that tℓ 1

$$
\frac{\partial\mathcal{F}_1(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}} = \frac{\partial}{\partial\Xi^{(pq)}_{k,\ell}}\int_{t_m}^{t_n}(-1)^{\#\{s^1_1\le t\}}\,W_sI_h\Xi(t_n,s^1_1)\,W_sI_h\Xi(s^1_1,t_m)\,\mathcal{L}(t_n,s^1_1)\,\mathrm{d}s^1_1. \tag{91}
$$

It is easy to see that neither $I_h\Xi(t_n,s^1_1)$ nor $I_h\Xi(s^1_1,t_m)$ depends on $\Xi_{k,\ell}$, since both are interpolated from $\Xi$ values only on $\partial\Omega_{n,m}$. As a result, $\partial\mathcal{F}_1(\xi_{n,m})/\partial G^{(pq)}_{k,\ell}=0$.

When $M\ge 3$, we need to consider the following two possibilities:

Case 1: $k-\ell\ge 2$. Similar to (80), we apply the following splitting of the integral in the definition of $\mathcal{F}_M$ (79):

$$
\frac{\partial\mathcal{F}_M(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}} = \sum_{j=1}^{M-1}\frac{\partial\mathcal{J}_j(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}},
$$

where

$$
\mathcal{J}_j(\xi_{n,m}) = \int_{t_n>\cdots>s^M_{j+2}>s^M_{j+1}}\int_{t_{k+1}>s^M_{j+1}>t_{k-1}}\int_{t_{\ell+1}>s^M_j>t_{\ell-1}}\int_{s^M_j>s^M_{j-1}>\cdots>t_m}(-1)^{\#\{\vec s^M\le t\}}\,W_sI_h\Xi(t_n,s^M_M)\,W_sI_h\Xi(s^M_M,s^M_{M-1})\cdots W_sI_h\Xi(s^M_1,t_m)\,\mathcal{L}(t_n,\vec s^M)\,\mathrm{d}\vec s^M.
$$

Here a critical observation is that once we assume $t_{\ell-1}<s^M_j<t_{\ell+1}$ and $t_{k-1}<s^M_{j+1}<t_{k+1}$ for any fixed $j$, $I_h\Xi(s^M_{j+1},s^M_j)$ is the unique term in the integrand of $\mathcal{F}_M(\xi_{n,m})$ that depends on $\Xi_{k,\ell}$, since $(t_{\ell-1},t_{\ell+1})\cap(t_{k-1},t_{k+1})=\emptyset$ when $k-\ell\ge 2$. This observation is illustrated in Figure 7.

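Figure 7: Locations of $s^M_j$ and $s^M_{j+1}$ in Case 1. (Time axis with ticks $t_{\ell-1}$, $t_\ell$, $t_{\ell+1}$, $\cdots$, $t_{k-1}$, $t_k$, $t_{k+1}$; $s^M_j$ lies in $(t_{\ell-1},t_{\ell+1})$ and $s^M_{j+1}$ lies in $(t_{k-1},t_{k+1})$.)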

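We again write the integrand above as $\mathcal{G}_1^j\times I_h\Xi(s^M_{j+1},s^M_j)\times\mathcal{G}_2^j$, defined exactly the same as in (82); then $\mathcal{G}_1^j$ and $\mathcal{G}_2^j$ are both independent of $\Xi_{k,\ell}$. Therefore,

$$
\begin{aligned}
\Big\|\frac{\partial\mathcal{J}_j(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}}\Big\|
&\le \Big\|\int_{t_n>\cdots>s^M_{j+1}}\int_{t_{k+1}>s^M_{j+1}>t_{k-1}}\int_{t_{\ell+1}>s^M_j>t_{\ell-1}}\int_{s^M_j>\cdots>t_m}\mathcal{G}_1^j\Big(\frac{\partial}{\partial\Xi^{(pq)}_{k,\ell}}I_h\Xi(s^M_{j+1},s^M_j)\Big)\mathcal{G}_2^j\,\mathrm{d}\vec s^M\Big\| \\
&\le 4\mathscr{W}^{M+1}\mathscr{G}^{M}\mathscr{L}^{\frac{M+1}{2}}\frac{M!!}{(M-j-1)!\,(j-1)!}\,(t_n-t_m)^{M-2}h^2,
\end{aligned}
$$

which leads to

$$
\Big\|\frac{\partial\mathcal{F}_M(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}}\Big\| = \Big\|\sum_{j=1}^{M-1}\frac{\partial\mathcal{J}_j(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}}\Big\| \le 4\mathscr{W}^{M+1}\mathscr{G}^{M}\mathscr{L}^{\frac{M+1}{2}}\frac{M}{(M-3)!!}\,(2t_{n-m})^{M-2}h^2. \tag{92}
$$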

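Case 2: $k-\ell=1$. There is an overlapping region $(t_{\ell-1},t_{\ell+1})\cap(t_{k-1},t_{k+1})=(t_\ell,t_{\ell+1})$ in this case. Consequently, there can be multiple terms in the integrand depending on $\Xi_{k,\ell}$. To estimate the derivative $\partial\mathcal{F}_M(\xi_{n,m})/\partial G^{(pq)}_{k,\ell}$, we further divide it into three parts based on the distribution of the time sequence $\vec s^M$ in the integrand:

$$
\frac{\partial\mathcal{F}_M(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}} = \sum_{v=2}^{M}\sum_{u=0}^{M-v}\frac{\partial\mathcal{K}_1^{u,v}}{\partial\Xi^{(pq)}_{k,\ell}} + \Bigg(\sum_{v=1}^{M-1}\sum_{u=1}^{M-v}\frac{\partial\mathcal{K}_{2,L}^{u,v}}{\partial\Xi^{(pq)}_{k,\ell}} + \sum_{v=1}^{M-1}\sum_{u=0}^{M-v-1}\frac{\partial\mathcal{K}_{2,R}^{u,v}}{\partial\Xi^{(pq)}_{k,\ell}}\Bigg) + \sum_{v=0}^{M-2}\sum_{u=1}^{M-v-1}\frac{\partial\mathcal{K}_3^{u,v}}{\partial\Xi^{(pq)}_{k,\ell}},
$$

where

$$
\begin{aligned}
\mathcal{K}_1^{u,v} &= \int_{t_n>s^M_M>\cdots>s^M_{u+v+1}>t_{k+1}}\int_{t_{\ell+1}>s^M_{u+v}>\cdots>s^M_{u+1}>t_\ell}\int_{t_{\ell-1}>s^M_u>\cdots>s^M_1>t_m}\mathcal{G}_1^{u,v}\times I_h\Xi(s^M_{u+v},s^M_{u+v-1})\,W_s\cdots W_sI_h\Xi(s^M_{u+2},s^M_{u+1})\times\mathcal{G}_2^{u,v}\,\mathrm{d}\vec s^M;\\
\mathcal{K}_{2,L}^{u,v} &= \int_{t_n>s^M_M>\cdots>s^M_{u+v+1}>t_{k+1}}\int_{t_{\ell+1}>s^M_{u+v}>\cdots>s^M_{u+1}>t_\ell}\int_{t_\ell>s^M_u>t_{\ell-1}}\int_{s^M_u>s^M_{u-1}>\cdots>s^M_1>t_m}\mathcal{G}_1^{u,v}\times I_h\Xi(s^M_{u+v},s^M_{u+v-1})\,W_s\cdots W_sI_h\Xi(s^M_{u+1},s^M_u)\times\mathcal{G}_2^{u,v}\,\mathrm{d}\vec s^M,\\
\mathcal{K}_{2,R}^{u,v} &= \int_{t_n>s^M_M>\cdots>s^M_{u+v+2}>s^M_{u+v+1}}\int_{t_{k+1}>s^M_{u+v+1}>t_k}\int_{t_{\ell+1}>s^M_{u+v}>\cdots>s^M_{u+1}>t_\ell}\int_{t_{\ell-1}>s^M_u>\cdots>s^M_1>t_m}\mathcal{G}_1^{u,v}\times I_h\Xi(s^M_{u+v+1},s^M_{u+v})\,W_s\cdots W_sI_h\Xi(s^M_{u+2},s^M_{u+1})\times\mathcal{G}_2^{u,v}\,\mathrm{d}\vec s^M;\\
\mathcal{K}_3^{u,v} &= \int_{t_n>s^M_M>\cdots>s^M_{u+v+1}}\int_{t_{k+1}>s^M_{u+v+1}>t_k}\int_{t_{\ell+1}>s^M_{u+v}>\cdots>s^M_{u+1}>t_\ell}\int_{t_\ell>s^M_u>t_{\ell-1}}\int_{s^M_u>\cdots>s^M_1>t_m}\mathcal{G}_1^{u,v}\times I_h\Xi(s^M_{u+v+1},s^M_{u+v})\,W_s\cdots W_sI_h\Xi(s^M_{u+1},s^M_u)\times\mathcal{G}_2^{u,v}\,\mathrm{d}\vec s^M.
\end{aligned}
$$

With a slight abuse of notation, here $\mathcal{G}_1^{u,v}$ and $\mathcal{G}_2^{u,v}$ denote the products of the form "$W_sI_h\Xi\cdots W_sI_h\Xi$" that complete the integrand. Each $\mathcal{K}^{u,v}$ here represents a part of the integral $\mathcal{F}_M(\xi_{n,m})$ in which $v$ time points of $\vec s^M$ are located in $(t_\ell,t_{\ell+1})$. These cases are illustrated in Figure 8: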

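• In $\mathcal{K}_1^{u,v}$, no time point other than these $v$ points $s_{u+1},\cdots,s_{u+v}$ is in the interval $(t_{\ell-1},t_{k+1})$.
• In $\mathcal{K}_{2,L}^{u,v}$ (or $\mathcal{K}_{2,R}^{u,v}$), there exists at least one point other than $s_{u+1},\cdots,s_{u+v}$ located in $(t_{\ell-1},t_\ell)$ (or $(t_k,t_{k+1})$), while no time point appears in $(t_k,t_{k+1})$ (or $(t_{\ell-1},t_\ell)$).
• In $\mathcal{K}_3^{u,v}$, there exists at least one point other than $s_{u+1},\cdots,s_{u+v}$ in both $(t_k,t_{k+1})$ and $(t_{\ell-1},t_\ell)$.

By splitting $\mathcal{F}_M(\xi_{n,m})$ in this way, one may easily check that $\mathcal{G}_1^{u,v}$ and $\mathcal{G}_2^{u,v}$ in each $\mathcal{K}^{u,v}$ are all independent of $\Xi_{k,\ell}$, while all $I_h\Xi$ in between depend on $\Xi_{k,\ell}$. Therefore, we can now compute $\partial\mathcal{K}^{u,v}/\partial\Xi^{(pq)}_{k,\ell}$ as the derivative of the product of these $I_h\Xi$. Mimicking the previous analysis in (84) and (85), we may bound the first summation

$$
\begin{aligned}
\Big\|\sum_{v=2}^{M}\sum_{u=0}^{M-v}\frac{\partial\mathcal{K}_1^{u,v}}{\partial\Xi^{(pq)}_{k,\ell}}\Big\|
&\le \sum_{v=2}^{M}\sum_{u=0}^{M-v}\mathscr{W}^{M+1}\mathscr{G}^{M}\big(M!!\,\mathscr{L}^{\frac{M+1}{2}}\big)\times(v-1)\times\frac{(t_n-t_{k+1})^{M-u-v}}{(M-u-v)!}\cdot\frac{h^v}{v!}\cdot\frac{(t_{\ell-1}-t_m)^u}{u!} \\
&\le \mathscr{W}^{M+1}\mathscr{G}^{M}\big(M!!\,\mathscr{L}^{\frac{M+1}{2}}\big)\sum_{v=2}^{M}\frac{v-1}{v!}\,(t_{n-m-1})^{M-v}h^v\sum_{u=0}^{M-v}\frac{1}{(M-v-u)!\,u!} \\
&= \mathscr{W}^{M+1}\mathscr{G}^{M}\big(M!!\,\mathscr{L}^{\frac{M+1}{2}}\big)\sum_{v=2}^{M}\frac{v-1}{(M-v)!\,v!}\,(2t_{n-m-1})^{M-v}h^v \\
&= \mathscr{W}^{M+1}\mathscr{G}^{M}\big(M!!\,\mathscr{L}^{\frac{M+1}{2}}\big)(2t_{n-m-1})^{M-2}h^2\sum_{v=0}^{M-2}\frac{(M-2)!}{(M-2-v)!\,v!}\Big(\frac{h}{2t_{n-m-1}}\Big)^{v}\cdot\frac{1}{(M-2)!\,(v+2)} \\
&\le \frac12\,\mathscr{W}^{M+1}\mathscr{G}^{M}\big(M!!\,\mathscr{L}^{\frac{M+1}{2}}\big)\frac{1}{(M-2)!}\,(2t_{n-m-1}+h)^{M-2}h^2 \\
&\le \frac12\,\mathscr{W}^{M+1}\mathscr{G}^{M}\mathscr{L}^{\frac{M+1}{2}}\frac{M}{(M-3)!!}\,(2t_{n-m})^{M-2}h^2.
\end{aligned} \tag{93}
$$

Similarly, we may estimate the other summations as

$$
\Bigg\|\sum_{v=1}^{M-1}\sum_{u=1}^{M-v}\frac{\partial\mathcal{K}_{2,L}^{u,v}}{\partial\Xi^{(pq)}_{k,\ell}} + \sum_{v=1}^{M-1}\sum_{u=0}^{M-v-1}\frac{\partial\mathcal{K}_{2,R}^{u,v}}{\partial\Xi^{(pq)}_{k,\ell}}\Bigg\| \le 2\mathscr{W}^{M+1}\mathscr{G}^{M}\mathscr{L}^{\frac{M+1}{2}}\frac{M}{(M-3)!!}\,(2t_{n-m})^{M-2}h^2
$$

and

$$
\Big\|\sum_{v=0}^{M-2}\sum_{u=1}^{M-v-1}\frac{\partial\mathcal{K}_3^{u,v}}{\partial\Xi^{(pq)}_{k,\ell}}\Big\| \le \mathscr{W}^{M+1}\mathscr{G}^{M}\mathscr{L}^{\frac{M+1}{2}}\frac{(M-1)M}{(M-3)!!}\,(2t_{n-m})^{M-2}h^2.
$$

Therefore, we get the estimate

$$
\Big\|\frac{\partial\mathcal{F}_M(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}}\Big\| \le 3\mathscr{W}^{M+1}\mathscr{G}^{M}\mathscr{L}^{\frac{M+1}{2}}\frac{(M-1)M}{(M-3)!!}\,(2t_{n-m})^{M-2}h^2. \tag{94}
$$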

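Note that the upper bound above is strictly greater than the Case 1 bound given in (92), since $3(M-1)M\ge 6M>4M$ for $M\ge 3$. Therefore, we may use the upper bound in (94) as the uniform bound for both cases. As a summary, we have obtained the following result: for $(k,\ell)\in\mathring{\Omega}_{n,m}$,

$$
\Big\|\frac{\partial\mathcal{F}_M(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}}\Big\| \le
\begin{cases}
0, & \text{if } M=1,\\[2pt]
3\mathscr{W}^{M+1}\mathscr{G}^{M}\mathscr{L}^{\frac{M+1}{2}}\dfrac{(M-1)M}{(M-3)!!}\,(2t_{n-m})^{M-2}h^2, & \text{if } M\ge 3.
\end{cases}
$$

Summing up these estimates, we see that for odd $\bar M>1$,

$$
\Big\|\frac{\partial F_1(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}}\Big\| \le \sum_{\substack{M=1\\ M\text{ is odd}}}^{\bar M}\Big\|\frac{\partial\mathcal{F}_M(\xi_{n,m})}{\partial G^{(pq)}_{k,\ell}}\Big\| \le \Bigg(3\mathscr{W}^{3}\mathscr{G}^{2}\mathscr{L}^{\frac32}\sum_{\substack{M=3\\ M\text{ is odd}}}^{\bar M}\frac{(M-1)M}{(M-3)!!}\big(2\mathscr{W}\mathscr{G}\mathscr{L}^{1/2}t_{n-m}\big)^{M-2}\Bigg)\cdot h^2. \tag{95}
$$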

By now, all the cases have been discussed, and the final conclusion (42) is a simple combination of (90) and (95), which completes the proof.

Figure 8: Distribution of the time sequence $\vec s^M$ in $\mathcal{K}_1^{u,v}$, $\mathcal{K}_{2,L}^{u,v}$, $\mathcal{K}_{2,R}^{u,v}$, $\mathcal{K}_3^{u,v}$. (Each panel shows the points $s_u$, $s_{u+1},\ldots,s_{u+v}$, $s_{u+v+1}$ on a time axis with ticks $t_{\ell-1}$, $t_\ell\,(=t_{k-1})$, $t_{\ell+1}\,(=t_k)$, $t_{k+1}$.)

6.1.2. Proof of Proposition 6—Estimate the second-order derivatives

The proof of Proposition 6 is quite tedious and does not shed much light. Moreover, the error contributed by the second-order derivatives plays a less important role in our final result. Thus we will only provide an outline of the proof, stating the idea without technical details. By the definition of $F_1(\cdot)$, we decompose the second-order derivative as

$$
\frac{\partial^2F_1(\xi_{n,m})}{\partial G^{(p_1q_1)}_{k_1,\ell_1}\partial G^{(p_2q_2)}_{k_2,\ell_2}} = \operatorname{sgn}(t_n-t)\sum_{\substack{M=1\\ M\text{ is odd}}}^{\bar M} i^{M+1}\,\frac{\partial^2\mathcal{F}_M(\xi_{n,m})}{\partial G^{(p_1q_1)}_{k_1,\ell_1}\partial G^{(p_2q_2)}_{k_2,\ell_2}}, \tag{96}
$$

where the definition of the function $\mathcal{F}_M(\xi_{n,m})$ is given in (79). Since each $I_h\Xi$ is obtained via linear interpolation, one can easily check the following result:

2 ∂ M (ξn,m) M M M M Lemma 4. If (pFq ) (p q ) is non-zero, there exist at least two factors IhΞ(s ,s ) and IhΞ(s ,s ) ∂G 1 1 ∂G 2 2 j1 j1 1 j2 j2 1 k1,ℓ1 k2,ℓ2 − − in the integrand of (79) with 1 j = j M +1 such that ≤ 1 6 2 ≤ M M tki 1

The entire argument in the rest of this section will be based on this result. Similar to the proof of the bounds for the first-order derivatives, we need to take into account four possibilities for the locations of (k1,ℓ1) and (k2,ℓ2).

(I) $(k_1,\ell_1)\times(k_2,\ell_2)\in\partial\Omega_{n,m}\times\partial\Omega_{n,m}$. Since $\partial\Omega_{n,m}$ includes two sides (see Figure 2), we are going to study the two cases where the two nodes $(k_1,\ell_1)$ and $(k_2,\ell_2)$ are on the same side or on different sides.

Case 1: $k_1=k_2=n$ or $\ell_1=\ell_2=m$. This is the case where $(k_1,\ell_1)$ and $(k_2,\ell_2)$ are on the same side of $\partial\Omega_{n,m}$. Here we only focus on the case $k_1=k_2=n$; the analysis for the other case $\ell_1=\ell_2=m$ is similar. We may check that

• If $\ell_1,\ell_2\le n-2$, by Lemma 4, in order that the second-order derivative $\partial^2\mathcal{F}_M(\xi_{n,m})/\partial G^{(p_1q_1)}_{k_1,\ell_1}\partial G^{(p_2q_2)}_{k_2,\ell_2}$ is nonzero, there exist distinct $j_1$ and $j_2$ such that

M M tn 1

However, the conditions (98) and (99) contradict each other because the point $t_{n-1}$ can only be located between one pair of adjacent points in the sequence $\vec s^M$. Thus $\partial^2F_1(\xi_{n,m})/\partial G^{(p_1q_1)}_{n,\ell_1}\partial G^{(p_2q_2)}_{n,\ell_2}$ is always zero.

• If $\ell_1=n-1$ and $m+1\le\ell_2\le n-2$ (same for $\ell_2=n-1$ and $m+1\le\ell_1\le n-2$), the corresponding conditions are

M M tn 1 1 −

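$$
\begin{aligned}
\frac{\partial^2\mathcal{F}_M(\xi_{n,m})}{\partial G^{(p_1q_1)}_{n,n-1}\partial G^{(p_2q_2)}_{n,\ell_2}}
= \sum_{r=1}^{M-1}&\int_{t_n>s^M_M>\cdots>s^M_{M-r+1}>t_{n-1}}\int_{t_{\ell_2+1}>s^M_{M-r}>t_{\ell_2-1}}\int_{s^M_{M-r}>\cdots>s^M_1>t_m}(-1)^{\#\{\vec s^M\le t\}}\,\frac{\partial}{\partial\Xi^{(p_1q_1)}_{n,n-1}}\Big(W_sI_h\Xi(t_n,s^M_M)\,W_s\cdots I_h\Xi(s^M_{M-r+2},s^M_{M-r+1})\Big) \\
&\times\frac{\partial}{\partial\Xi^{(p_2q_2)}_{n,\ell_2}}\Big(W_sI_h\Xi(s^M_{M-r+1},s^M_{M-r})\,W_s\cdots I_h\Xi(s^M_1,t_m)\Big)\,\mathcal{L}(t_n,\vec s^M)\,\mathrm{d}\vec s^M.
\end{aligned} \tag{100}
$$

When $M=1$, the derivative is zero. The magnitude of the above sum can be observed from the sizes of the integral domains. The leading-order term is provided by $r=2$, which gives

$$
\frac{\partial^2F_1(\xi_{n,m})}{\partial G^{(p_1q_1)}_{n,\ell_1}\partial G^{(p_2q_2)}_{n,\ell_2}} \sim O(h^2). \tag{101}
$$

• If $\ell_1=n-1$ and $\ell_2=m$ (same for $\ell_2=n-1$ and $\ell_1=m$), the analysis for $M>1$ is the same as (100). When $M=1$, we have

$$
\frac{\partial^2\mathcal{F}_1(\xi_{n,m})}{\partial G^{(p_1q_1)}_{n,n-1}\partial G^{(p_2q_2)}_{n,\ell_2}} = \int_{t_{n-1}}^{t_n}(-1)^{\#\{s^1_1\le t\}}\,\frac{\partial}{\partial\Xi^{(p_1q_1)}_{n,n-1}}\Big(W_sI_h\Xi(t_n,s^1_1)\Big)\frac{\partial}{\partial\Xi^{(p_2q_2)}_{n,\ell_2}}\Big(W_sI_h\Xi(s^1_1,t_m)\Big)\,\mathcal{L}(t_n,s^1_1)\,\mathrm{d}s^1_1 \sim O(h). \tag{102}
$$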

Therefore the derivative (96) also has magnitude O(h). If ℓ = ℓ = n 1, we need to find distinct j and j such that • 1 2 − 1 2 M M tn 1

Case 2: $k_1=n$, $\ell_2=m$ or $\ell_1=m$, $k_2=n$. This is the case where $(k_1,\ell_1)$ and $(k_2,\ell_2)$ are on different sides of $\partial\Omega_{n,m}$ in Figure 2. Again we focus on only one case, $k_1=n$, $\ell_2=m$; the other case is similar.

If ℓ k > 1 (same for ℓ k > 1 when ℓ = m, k = n), we similarly propose the conditions • | 1 − 2| | 2 − 1| 1 2 M M tn 1 1, the leading-order term in pFq p q is the part ∂G( 1 1)∂G( 2 2) n,ℓ1 k2,m M M M M M M of integral where we let IhΞ(sj1 ,sj1 1)= IhΞ(tn,sM ) and IhΞ(sj2 ,sj2 1)= IhΞ(s1 ,tm) respectively − M − M in (97). As a result, we have the restriction tℓ1 1

If ℓ1 k2 1 (same for ℓ2 k1 1 when ℓ1 = m, k2 = n), we propose exactly the same conditions • | − |≤ | − |≤ 2 for ℓ1 k2 > 1. When M > 1, the derivative again has magnitude O(h ) by the same analysis. When M =| 1,− we| may obtain the result that the derivative again has magnitude O(h) following a similar 1 reasoning as (102) upon setting max(tℓ1 1,tk2 1)

(II) If $(k_1,\ell_1)\times(k_2,\ell_2)\in\partial\Omega_{n,m}\times\mathring{\Omega}_{n,m}$. In this case, we have $k_1=n$ or $\ell_1=m$. One can easily check that the derivative vanishes when $M=1$. For $M>1$, we may first assume $k_1=n$. To find the leading-order term in each $\partial^2\mathcal{F}_M(\xi_{n,m})/\partial G^{(p_1q_1)}_{n,\ell_1}\partial G^{(p_2q_2)}_{k_2,\ell_2}$, we consider the following two cases:

• If $|k_2-\ell_1|\le 1$, we require $\max(t_{\ell_1-1},t_{k_2-1})<s^M_M<\min(t_{\ell_1+1},t_{k_2+1})$ and $t_{\ell_2-1}<s^M_{M-1}<t_{\ell_2+1}$, so that we can set $I_h\Xi(s^M_{j_1},s^M_{j_1-1})=I_h\Xi(t_n,s^M_M)$ and $I_h\Xi(s^M_{j_2},s^M_{j_2-1})=I_h\Xi(s^M_M,s^M_{M-1})$ in (97). Since in this case we need to restrict at least $s^M_M$ and $s^M_{M-1}$, we have $\partial^2F_1(\xi_{n,m})/\partial G^{(p_1q_1)}_{n,\ell_1}\partial G^{(p_2q_2)}_{k_2,\ell_2}\sim O(h^2)$.

M M M If ℓ1 k2 > 1, we require tℓ1 1

Similar results can be obtained for the case when ℓ1 = m. So we now have the conclusion (44).

(III) If $(k_1,\ell_1)\times(k_2,\ell_2)\in\mathring{\Omega}_{n,m}\times\partial\Omega_{n,m}$. The reasoning is similar to (II).

(IV) If $(k_1,\ell_1)\times(k_2,\ell_2)\in\mathring{\Omega}_{n,m}\times\mathring{\Omega}_{n,m}$. Again, we can check that the derivative is non-zero only when $M>1$, and we may assume $k_1\ge k_2$. We also have the following two cases:

M M If k2 ℓ1 1, we require tk1 1 < sj+2 < tk1+1, max(tℓ1 1,tk2 1) < sj+1 < min(tℓ1+1,tk2+1) and • | − M| ≤ − − − M M M M tℓ2 1

35 M M If k2 ℓ1 > 1, we require tki 1 1 and set IhΞ(sj1 ,sj1 1)= IhΞ(sj1+1,sj1 ) and IhΞ(sj2 ,sj2 1)= IhΞ(sj2+1,sj2 ) in (97). − − − 2 M M M M ∂ F1(ξn,m) 4 Since in this case we need to restrict at least s ,s ,s and s , we have p q p q O(h ). j1+1 j1 j2+1 j2 ∂G( 1 1)∂G( 2 2) k1,ℓ1 k2,ℓ2 ∼

Similar analysis and result can be given for k1 < k2. Therefore, we arrive at (46).

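6.2. Proof of Proposition 7 — Recurrence relation for the numerical error

By the definitions of the deterministic method (22) and the inchworm Monte Carlo method (25), it is straightforward to check that

$$
\Delta G_{n+1,m} = A_{n,m}(h)\,\Delta G_{n,m} + \frac12 h\big(B_{n,m}(h)\,\Delta K_1 + \Delta K_2\big), \tag{104}
$$

where for simplicity we have used the short-hands

$$
\begin{aligned}
\Delta K_i &= K_i - \tilde K_i,\\
A_{n,m}(h) &= I + \frac12\big(\operatorname{sgn}(t_n-t)+\operatorname{sgn}(t_{n+1}-t)\big)\,iH_sh - \frac12\operatorname{sgn}(t_n-t)\operatorname{sgn}(t_{n+1}-t)\,H_s^2h^2,\\
B_{n,m}(h) &= I + \operatorname{sgn}(t_{n+1}-t)\,iH_sh.
\end{aligned}
$$

By the triangle inequality, the error can be bounded by

$$
\big[\mathbb{E}(\|\Delta G_{n+1,m}\|^2)\big]^{1/2} \le \big[\mathbb{E}(\|A_{n,m}(h)\Delta G_{n,m}\|^2)\big]^{1/2} + \frac12 h\big[\mathbb{E}(\|B_{n,m}(h)\Delta K_1+\Delta K_2\|^2)\big]^{1/2}. \tag{105}
$$

For the first term on the right-hand side, we have

$$
\mathbb{E}(\|A_{n,m}(h)\Delta G_{n,m}\|^2) \le \rho\Big(I + \frac12\big(\operatorname{sgn}(t_n-t)+\operatorname{sgn}(t_{n+1}-t)\big)iH_sh - \frac12\operatorname{sgn}(t_n-t)\operatorname{sgn}(t_{n+1}-t)H_s^2h^2\Big)^2\cdot\mathbb{E}(\|\Delta G_{n,m}\|^2),
$$

where $\rho(\cdot)$ denotes the spectral radius of a matrix. Let $\lambda_1$ and $\lambda_2$ be the two eigenvalues of $H_s$. Then

$$
\begin{aligned}
&\rho\Big(I + \frac12\big(\operatorname{sgn}(t_n-t)+\operatorname{sgn}(t_{n+1}-t)\big)iH_sh - \frac12\operatorname{sgn}(t_n-t)\operatorname{sgn}(t_{n+1}-t)H_s^2h^2\Big)^2 \\
&\quad= \max_{i=1,2}\Big|1 + \frac12\big(\operatorname{sgn}(t_n-t)+\operatorname{sgn}(t_{n+1}-t)\big)i\lambda_ih - \frac12\operatorname{sgn}(t_n-t)\operatorname{sgn}(t_{n+1}-t)\lambda_i^2h^2\Big|^2 \\
&\quad= \max_{i=1,2}\Big[1 + \frac14\big(\operatorname{sgn}(t_{n+1}-t)-\operatorname{sgn}(t_n-t)\big)^2\lambda_i^2h^2 + \frac14\big(\operatorname{sgn}(t_{n+1}-t)\operatorname{sgn}(t_n-t)\big)^2\lambda_i^4h^4\Big] \\
&\quad= 1 + \frac14\big(\rho(H_s)\big)^4h^4 \le 1 + \frac14\mathscr{H}^4h^4.
\end{aligned} \tag{106}
$$

Note that in the third line of the above equation, the second term vanishes due to the fact that the scheme evolves according to (R1)(R3). Consequently, the first term on the right-hand side of (105) can be estimated by

$$
\big[\mathbb{E}(\|A_{n,m}(h)\Delta G_{n,m}\|^2)\big]^{1/2} \le \sqrt{1+\frac14\mathscr{H}^4h^4}\cdot\big[\mathbb{E}(\|\Delta G_{n,m}\|^2)\big]^{1/2} \le \Big(1+\frac18\mathscr{H}^4h^4\Big)\cdot\big[\mathbb{E}(\|\Delta G_{n,m}\|^2)\big]^{1/2}. \tag{107}
$$

To estimate the second term on the right-hand side of (105), we again need to bound $\big[\mathbb{E}(\|K_i-\tilde K_i\|^2)\big]^{1/2}$ to obtain a recurrence relation for the numerical error. Such results are given in the following lemma:

Lemma 5. Assume that the hypotheses (H1) and (H3) hold. For a sufficiently small time step length $h$, we have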

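$$
\big\|\mathbb{E}(K_1-\tilde K_1)\big\| \le 8P_1(t_{n-m})\,h\sum_{i=1}^{n-m}\big(2+(n-m-i)h\big)\big\|\mathbb{E}(\Delta g_{n,m})\big\|_{\Gamma_{n,m}(i)} + \bar\alpha(t_{n-m})\Big[\mathcal{N}^{(\mathrm{std})}_{\Omega_{n,m}}(\Delta g_{n,m})\Big]^2, \tag{108}
$$

$$
\big\|\mathbb{E}(K_2-\tilde K_2)\big\| \le 28P_1(t_{n-m+1})\,h\sum_{i=1}^{n-m}\big(2+(n-m+1-i)h\big)\big\|\mathbb{E}(\Delta g^*_{n,m})\big\|_{\Gamma^*_{n,m}(i)} + 5\bar\alpha(t_{n-m+1})\Big[\mathcal{N}^{(\mathrm{std})}_{\bar\Omega_{n,m}}(\Delta g^*_{n,m})\Big]^2 + 16\bar\alpha(t_{n-m+1})\bar\gamma(t_{n-m+1})\cdot\frac{h^2}{N_s}, \tag{109}
$$

and

$$
\big[\mathbb{E}(\|K_1-\tilde K_1\|^2)\big]^{1/2} \le 8P_1(t_{n-m})\,h\sum_{i=1}^{n-m}\big(2+(n-m-i)h\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma_{n,m}(i)}(\Delta g_{n,m}) + 2\sqrt{\bar\gamma(t_{n-m})}\cdot\frac{1}{\sqrt{N_s}}, \tag{110}
$$

$$
\big[\mathbb{E}(\|K_2-\tilde K_2\|^2)\big]^{1/2} \le 28P_1(t_{n-m+1})\,h\sum_{i=1}^{n-m}\big(2+(n-m+1-i)h\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(i)}(\Delta g^*_{n,m}) + 3\sqrt{\bar\gamma(t_{n-m+1})}\cdot\frac{1}{\sqrt{N_s}}, \tag{111}
$$

where $\bar\alpha$ and $\bar\gamma$ are defined in (36). Here we recall that $P_1(t)$, $P_2(t)$ are given in Propositions 5 and 6, and $\mathscr{W}$, $\mathscr{G}$, $\mathscr{L}$ are some upper bounds given in the assumptions (H1) and (H3).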

We only write down the proof for (110) and (111) which are related to the numerical error in this section. The other two will only be used when estimating the bias so we put the corresponding proof in the Appendix.

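Proof of (110) and (111). (i) Estimate of $\mathbb{E}(\|K_1-\tilde K_1\|^2)$:

The definition of $\tilde K_i$ indicates that

$$
\begin{aligned}
\mathbb{E}_{\vec s}\big[\tilde F_1(\tilde g_{n,m};\vec s)\big] &= F_1(\tilde g_{n,m}),\\
\mathbb{E}_{\vec s'}\big[\tilde F_2(\tilde g^*_{n,m};\vec s')\big] &= F_2(\tilde g^*_{n,m})
\end{aligned} \tag{112}
$$

by the fact that $\vec s$ and $\vec s'$ are sampled independently of $\tilde g_{n,m}$ and $\tilde g^*_{n,m}$. Therefore, for each $rs$-entry we have

$$
\begin{aligned}
\mathbb{E}_{\vec s}\big(|K_1^{(rs)}-\tilde K_1^{(rs)}|^2\big) &= \mathbb{E}_{\vec s}\Big|F_1^{(rs)}(g_{n,m}) - \frac{1}{N_s}\sum_{i=1}^{N_s}\tilde F_1^{(rs)}(\tilde g_{n,m};\vec s^{\,i})\Big|^2 \\
&= \frac{1}{N_s}\,\mathbb{E}_{\vec s}\Big|\tilde F_1^{(rs)}(\tilde g_{n,m};\vec s) - F_1^{(rs)}(\tilde g_{n,m})\Big|^2 + \Big|F_1^{(rs)}(\tilde g_{n,m}) - F_1^{(rs)}(g_{n,m})\Big|^2,
\end{aligned} \tag{113}
$$

which gives

$$
\big[\mathbb{E}(|K_1^{(rs)}-\tilde K_1^{(rs)}|^2)\big]^{1/2} \le \frac{1}{\sqrt{N_s}}\cdot\Big[\mathbb{E}\Big(\big|\tilde F_1^{(rs)}(\tilde g_{n,m};\vec s)-F_1^{(rs)}(\tilde g_{n,m})\big|^2\Big)\Big]^{1/2} + \Big[\mathbb{E}\Big(\big|F_1^{(rs)}(\tilde g_{n,m})-F_1^{(rs)}(g_{n,m})\big|^2\Big)\Big]^{1/2}. \tag{114}
$$

According to the boundedness assumptions (H1) and (H3), the first term on the right-hand side of the inequality above is immediately bounded by

$$
\frac{1}{\sqrt{N_s}}\cdot\Big[\mathbb{E}\Big(\big|\tilde F_1^{(rs)}(\tilde g_{n,m};\vec s)-F_1^{(rs)}(\tilde g_{n,m})\big|^2\Big)\Big]^{1/2} \le \sqrt{\bar\gamma(t_{n-m})}\cdot\frac{1}{\sqrt{N_s}} \tag{115}
$$

with $\bar\gamma(t)$ defined in (36). For the second term, we use the mean value theorem to get

$$
\begin{aligned}
\Big[\mathbb{E}\Big(\big|F_1^{(rs)}(\tilde g_{n,m})-F_1^{(rs)}(g_{n,m})\big|^2\Big)\Big]^{1/2}
&= \Big[\mathbb{E}\Big(\big|\nabla F_1^{(rs)}(\eta_{n,m})^{T}\cdot(\vec{\tilde g}_{n,m}-\vec g_{n,m})\big|^2\Big)\Big]^{1/2} \\
&= \Bigg[\mathbb{E}\Bigg(\Big|\sum_{i=1}^{n-m}\sum_{(k,\ell)\in\Gamma_{n,m}(i)}\sum_{p,q=1,2}\frac{\partial F_1^{(rs)}(\eta_{n,m})}{\partial G^{(pq)}_{k,\ell}}\cdot\Delta G^{(pq)}_{k,\ell}\Big|^2\Bigg)\Bigg]^{1/2} \\
&\le \sum_{i=1}^{n-m}\sum_{(k,\ell)\in\Gamma_{n,m}(i)}\sum_{p,q=1,2}\Bigg[\mathbb{E}\Bigg(\Big|\frac{\partial F_1^{(rs)}(\eta_{n,m})}{\partial G^{(pq)}_{k,\ell}}\Big|^2\Bigg)\Bigg]^{1/2}\cdot\Big[\mathbb{E}\Big(\big|\Delta G^{(pq)}_{k,\ell}\big|^2\Big)\Big]^{1/2} \\
&\le \sum_{i=1}^{n-m}\Bigg\{\sum_{(k,\ell)\in\Gamma_{n,m}(i)}\sum_{p,q=1,2}\Bigg[\mathbb{E}\Bigg(\Big|\frac{\partial F_1^{(rs)}(\eta_{n,m})}{\partial G^{(pq)}_{k,\ell}}\Big|^2\Bigg)\Bigg]^{1/2}\Bigg\}\cdot\max_{\substack{(k,\ell)\in\Gamma_{n,m}(i);\\ p,q=1,2}}\Big[\mathbb{E}\Big(\big|\Delta G^{(pq)}_{k,\ell}\big|^2\Big)\Big]^{1/2} \\
&\le 4P_1(t_{n-m})\Bigg[h\,\mathcal{N}^{(\mathrm{std})}_{\Gamma_{n,m}(n-m)}(\Delta g_{n,m}) + \sum_{i=1}^{n-m-1}\big(2h+(n-m-1-i)h^2\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma_{n,m}(i)}(\Delta g_{n,m})\Bigg] \\
&\le 4P_1(t_{n-m})\,h\sum_{i=1}^{n-m}\big(2+(n-m-i)h\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma_{n,m}(i)}(\Delta g_{n,m}).
\end{aligned} \tag{116}
$$

Here we have considered the derivatives of $F_1(\cdot)$ for different locations in $\Omega_{n,m}$. Also, we have applied the Minkowski inequality in the first "$\le$" and Hölder's inequality in the second "$\le$". The estimate (110) can then be obtained by inserting (115) and (116) into (114).

(ii) Estimate of $\mathbb{E}(\|K_2-\tilde K_2\|^2)$:

Similar to (114), we use the triangle inequality to bound $\big[\mathbb{E}(|K_2^{(rs)}-\tilde K_2^{(rs)}|^2)\big]^{1/2}$ by

$$
\big[\mathbb{E}(|K_2^{(rs)}-\tilde K_2^{(rs)}|^2)\big]^{1/2} \le \frac{1}{\sqrt{N_s}}\cdot\Big[\mathbb{E}\Big(\big|\tilde F_2^{(rs)}(\tilde g^*_{n,m};\vec s')-F_2^{(rs)}(\tilde g^*_{n,m})\big|^2\Big)\Big]^{1/2} + \Big[\mathbb{E}\Big(\big|F_2^{(rs)}(\tilde g^*_{n,m})-F_2^{(rs)}(g^*_{n,m})\big|^2\Big)\Big]^{1/2}, \tag{117}
$$

where the first term on the right-hand side can be estimated similarly to (115), and the result is

$$
\frac{1}{\sqrt{N_s}}\cdot\Big[\mathbb{E}\Big(\big|\tilde F_2^{(rs)}(\tilde g^*_{n,m};\vec s')-F_2^{(rs)}(\tilde g^*_{n,m})\big|^2\Big)\Big]^{1/2} \le \sqrt{\bar\gamma(t_{n-m+1})}\cdot\frac{1}{\sqrt{N_s}}. \tag{118}
$$

For the second term on the right-hand side of (117), we mimic the analysis in (116) to get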

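$$
\Big[\mathbb{E}\Big(\big|F_2^{(rs)}(\tilde g^*_{n,m})-F_2^{(rs)}(g^*_{n,m})\big|^2\Big)\Big]^{1/2} \le 4P_1(t_{n-m+1})\,h\Bigg\{\big[\mathbb{E}(\|G^*_{n+1,m}-\tilde G^*_{n+1,m}\|^2)\big]^{1/2} + \sum_{i=1}^{n-m}\big(2+(n-m+1-i)h\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(i)}(\Delta g^*_{n,m})\Bigg\}. \tag{119}
$$

Here the difference between $G^*_{n+1,m}$ and $\tilde G^*_{n+1,m}$ can be estimated by

$$
\begin{aligned}
\big[\mathbb{E}(\|G^*_{n+1,m}-\tilde G^*_{n+1,m}\|^2)\big]^{1/2}
&\le \big[\mathbb{E}(\|(I+\operatorname{sgn}(t_n-t)\,iH_sh)\,\Delta G_{n,m}\|^2)\big]^{1/2} + h\big[\mathbb{E}(\|K_1-\tilde K_1\|^2)\big]^{1/2} \\
&\le \Big(1+\frac12\mathscr{H}^2h^2\Big)\big[\mathbb{E}(\|\Delta G_{n,m}\|^2)\big]^{1/2} + 8P_1(t_{n-m})\,h^2\sum_{i=1}^{n-m}\big(2+(n-m-i)h\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma_{n,m}(i)}(\Delta g_{n,m}) + 2\sqrt{\bar\gamma(t_{n-m})}\cdot\frac{h}{\sqrt{N_s}},
\end{aligned} \tag{120}
$$

where we have applied our previous estimate (113) to bound $\mathbb{E}(\|K_1-\tilde K_1\|^2)$, and we have omitted the details of the estimation of $\mathbb{E}(\|(I+\operatorname{sgn}(t_n-t)\,iH_sh)\,\Delta G_{n,m}\|^2)$, which is similar to (106). Consequently,

$$
\begin{aligned}
\Big[\mathbb{E}\Big(\big|F_2^{(rs)}(\tilde g^*_{n,m})-F_2^{(rs)}(g^*_{n,m})\big|^2\Big)\Big]^{1/2} \le 4P_1(t_{n-m+1})\,h\Bigg\{&\Big(1+\frac12\mathscr{H}^2h^2\Big)\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(n-m)}(\Delta g^*_{n,m}) \\
&+ \big[1+8P_1(t_{n-m})h^2\big]\sum_{i=1}^{n-m}\big(2+(n-m+1-i)h\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(i)}(\Delta g^*_{n,m}) + 2\sqrt{\bar\gamma(t_{n-m})}\cdot\frac{h}{\sqrt{N_s}}\Bigg\}.
\end{aligned} \tag{121}
$$

Again, we insert the estimates (118) and (121) into (117) to get

$$
\begin{aligned}
\big[\mathbb{E}(\|K_2-\tilde K_2\|^2)\big]^{1/2} \le\ & 8\Big[3 + h + \Big(\frac12\mathscr{H}^2+16P_1(t_{n-m})\Big)h^2 + 8P_1(t_{n-m})h^3\Big]P_1(t_{n-m+1})\,h \\
&\times\sum_{i=1}^{n-m}\big(2+(n-m+1-i)h\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(i)}(\Delta g^*_{n,m}) + \big(2+16P_1(t_{n-m+1})h^2\big)\sqrt{\bar\gamma(t_{n-m+1})}\cdot\frac{1}{\sqrt{N_s}}.
\end{aligned} \tag{122}
$$

By choosing a sufficiently small time step such that $h+\big(\frac12\mathscr{H}^2+16P_1(t_{n-m})\big)h^2+8P_1(t_{n-m})h^3\le\frac12$ and $h\le\sqrt{\frac{1}{16P_1(t_{n-m+1})}}$, the estimate (111) can be obtained.

With the results above, we now return to the formula (105) and give the recurrence relation for the numerical error $\big[\mathbb{E}(\|\Delta G_{n+1,m}\|^2)\big]^{1/2}$ as

$$
\begin{aligned}
\big[\mathbb{E}(\|\Delta G_{n+1,m}\|^2)\big]^{1/2}
&\le \Big(1+\frac18\mathscr{H}^4h^4\Big)\big[\mathbb{E}(\|\Delta G_{n,m}\|^2)\big]^{1/2} + \frac12\Big(1+\frac12\mathscr{H}^2h^2\Big)h\big[\mathbb{E}(\|\Delta K_1\|^2)\big]^{1/2} + \frac12h\big[\mathbb{E}(\|\Delta K_2\|^2)\big]^{1/2} \\
&\le \Big(1+\frac18\mathscr{H}^4h^4\Big)\big[\mathbb{E}(\|\Delta G_{n,m}\|^2)\big]^{1/2} + 22P_1(t_{n-m+1})\,h^2\sum_{i=1}^{n-m}\big(2+(n-m+1-i)h\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(i)}(\Delta g^*_{n,m}) + \frac72\sqrt{\bar\gamma(t_{n-m+1})}\cdot\frac{h}{\sqrt{N_s}}
\end{aligned} \tag{123}
$$

upon assuming $h\le\sqrt{2}/\mathscr{H}$.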
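The $1/\sqrt{N_s}$ contributions in (110) and (111) originate from the split in (113): the mean-square distance between a fixed target and a sample average equals the sampling variance divided by $N_s$ plus the squared bias. The following is a small sketch that checks this identity exactly for a real-valued toy distribution; the distribution, the target value and the function names are illustrative assumptions, and the matrix-valued, complex setting of the paper is simplified to real scalars.

```python
from fractions import Fraction
from itertools import product


def mean_square_error(values, probs, n_samples, target):
    """Exact E|target - (1/N) sum_i X_i|^2 for i.i.d. X_i, by enumerating all draws."""
    total = Fraction(0)
    for draw in product(range(len(values)), repeat=n_samples):
        weight = Fraction(1)
        for idx in draw:
            weight *= probs[idx]
        sample_mean = sum(values[idx] for idx in draw) / Fraction(n_samples)
        total += weight * (target - sample_mean) ** 2
    return total


def variance_plus_bias(values, probs, n_samples, target):
    """Var(X)/N + (E[X] - target)^2, the right-hand side of the split."""
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return var / n_samples + (mean - target) ** 2


if __name__ == "__main__":
    # Hypothetical three-point distribution and target value.
    values = [Fraction(-1), Fraction(2), Fraction(5)]
    probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    target = Fraction(1, 4)
    for n in (1, 2, 3):
        print(n, mean_square_error(values, probs, n, target)
              == variance_plus_bias(values, probs, n, target))
```

Here `target` plays the role of $F_1(g_{n,m})$ and the mean of the toy distribution plays the role of $F_1(\tilde g_{n,m})$; only the variance part decays as the number of samples grows.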

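Next, we consider the recurrence relation of $\mathbb{E}(\|\Delta G_{n+1,m}\|^2)$. By straightforward expansion,

$$
\begin{aligned}
\mathbb{E}(\|\Delta G_{n+1,m}\|^2) &= \mathbb{E}\Big[\Big\|A_{n,m}(h)\Delta G_{n,m} + \frac12h\big(B_{n,m}(h)\Delta K_1+\Delta K_2\big)\Big\|^2\Big] \\
&= \mathbb{E}(\|A_{n,m}(h)\Delta G_{n,m}\|^2) + \underbrace{\frac14h^2\,\mathbb{E}\big[\|B_{n,m}(h)\Delta K_1+\Delta K_2\|^2\big]}_{\text{quadratic term}} + \underbrace{\operatorname{Re}\,h\,\mathbb{E}\Big[\operatorname{tr}\Big(\big(B_{n,m}(h)\Delta K_1+\Delta K_2\big)^\dagger\big(A_{n,m}(h)\Delta G_{n,m}\big)\Big)\Big]}_{\text{cross term}}.
\end{aligned} \tag{124}
$$

To bound the quadratic term, we first derive the following results from the estimates (110) and (111):

$$
\begin{aligned}
\mathbb{E}(\|K_1-\tilde K_1\|^2) &\le 128P_1^2(t_{n-m})\,h^2\Bigg[\sum_{i=1}^{n-m}\big(2+(n-m-i)h\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma_{n,m}(i)}(\Delta g_{n,m})\Bigg]^2 + 8\bar\gamma(t_{n-m})\cdot\frac{1}{N_s},\\
\mathbb{E}(\|K_2-\tilde K_2\|^2) &\le 1568P_1^2(t_{n-m+1})\,h^2\Bigg[\sum_{i=1}^{n-m}\big(2+(n-m+1-i)h\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(i)}(\Delta g^*_{n,m})\Bigg]^2 + 18\bar\gamma(t_{n-m+1})\cdot\frac{1}{N_s}.
\end{aligned} \tag{125}
$$

Then the quadratic term is bounded by

$$
\begin{aligned}
\underbrace{\frac14h^2\,\mathbb{E}\big[\|B_{n,m}(h)\Delta K_1+\Delta K_2\|^2\big]}_{\text{quadratic term}}
&\le \frac12(1+\mathscr{H}^2h^2)\,h^2\,\mathbb{E}(\|\Delta K_1\|^2) + \frac12h^2\,\mathbb{E}(\|\Delta K_2\|^2) \\
&\le h^2\,\mathbb{E}(\|\Delta K_1\|^2) + \frac12h^2\,\mathbb{E}(\|\Delta K_2\|^2) \\
&\le 912P_1^2(t_{n-m+1})\,h^4\Bigg[\sum_{i=1}^{n-m}\big(2+(n-m+1-i)h\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(i)}(\Delta g^*_{n,m})\Bigg]^2 + 17\bar\gamma(t_{n-m+1})\cdot\frac{h^2}{N_s},
\end{aligned} \tag{126}
$$

thanks to the previous requirement on $h$ in (123) and the results (110) and (111) in Lemma 5 obtained in the previous section. Similar to the proof of Lemma 2, the estimation of the cross term in (124) is more subtle. We will again need some key estimates from the following local scheme for the inchworm equation: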

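$$
\begin{aligned}
\bar G^*_{n+1,m} &= \big(I+\operatorname{sgn}(t_n-t)\,iH_sh\big)\tilde G_{n,m} + \bar K_1h,\\
\bar G_{n+1,m} &= \Big(I+\frac12\operatorname{sgn}(t_n-t)\,iH_sh\Big)\tilde G_{n,m} + \frac12\operatorname{sgn}(t_{n+1}-t)\,iH_sh\,\bar G^*_{n+1,m} + \frac12(\bar K_1+\bar K_2)h,\qquad 0\le m\le n\le 2N,
\end{aligned} \tag{127}
$$

where

$$
\begin{aligned}
\bar K_1 &= F_1(\bar g_{n,m}), & \bar g_{n,m} &= \tilde g_{n,m};\\
\bar K_2 &= F_2(\bar g^*_{n,m}), & \bar g^*_{n,m} &= (\tilde g_{n,m};\ \tilde G_{n+1,n},\cdots,\tilde G_{n+1,m+1},\ \bar G^*_{n+1,m}).
\end{aligned} \tag{128}
$$

These quantities are introduced as the counterpart of (70), which is a deterministic time step applied to the stochastic solutions. The following results are similar to Lemma 3 for the case of differential equations:

Lemma 6. Given the time step length $h$ and the number of Monte Carlo samples at each step $N_s$, we have

$$
\|\mathbb{E}_{\vec s}(\bar K_1-\tilde K_1)\| = 0, \tag{129}
$$

$$
\|\mathbb{E}_{\vec s,\vec s'}(\bar K_2-\tilde K_2)\| \le 4\bar\alpha(t_{n-m+1})\bar\gamma(t_{n-m+1})\cdot\frac{h^2}{N_s}, \tag{130}
$$

$$
\big[\mathbb{E}(\|K_1-\bar K_1\|^2)\big]^{1/2} \le 8P_1(t_{n-m})\,h\sum_{i=1}^{n-m}\big(2+(n-m-i)h\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma_{n,m}(i)}(\Delta g_{n,m}), \tag{131}
$$

$$
\big[\mathbb{E}(\|K_2-\bar K_2\|^2)\big]^{1/2} \le 28P_1(t_{n-m+1})\,h\sum_{i=1}^{n-m}\big(2+(n-m+1-i)h\big)\,\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(i)}(\Delta g^*_{n,m}), \tag{132}
$$

where the formula of $\bar\alpha(t)$ is given in (108).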

The rigorous proof of this lemma is omitted since it is almost identical to that of Lemma 5. Now we are ready to bound the cross-term in (124). By the same treatment as the case of differential equations, we have

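$$
\begin{aligned}
&\mathrm{Re}\; h\, \mathbb{E}\, \mathrm{tr}\Big[\big(B_{n,m}(h)\Delta K_1 + \Delta K_2\big)^\dagger \big(A_{n,m}(h)\Delta G_{n,m}\big)\Big] \\
&\quad\le \bigg| h\, \mathbb{E}\, \mathrm{tr}\Big[\big(B_{n,m}(h)(K_1 - \bar K_1) + (K_2 - \bar K_2)\big)^\dagger \big(A_{n,m}(h)\Delta G_{n,m}\big)\Big] \bigg| \\
&\qquad + \bigg| h\, \mathbb{E}\, \mathrm{tr}\Big[\big(B_{n,m}(h)(\bar K_1 - \tilde K_1) + (\bar K_2 - \tilde K_2)\big)^\dagger \big(A_{n,m}(h)\Delta G_{n,m}\big)\Big] \bigg| \\
&\quad= \bigg| h\, \mathbb{E}\, \mathrm{tr}\Big[\big(B_{n,m}(h)(K_1 - \bar K_1) + (K_2 - \bar K_2)\big)^\dagger \big(A_{n,m}(h)\Delta G_{n,m}\big)\Big] \bigg| \\
&\qquad + \bigg| h\, \mathbb{E}\, \mathrm{tr}\Big[\Big(\mathbb{E}_{\vec{\boldsymbol s},\vec{\boldsymbol s}'}\big[B_{n,m}(h)(\bar K_1 - \tilde K_1) + (\bar K_2 - \tilde K_2)\big]\Big)^\dagger \big(A_{n,m}(h)\Delta G_{n,m}\big)\Big] \bigg| \\
&\quad\le h \Big[\mathbb{E}\big\|A_{n,m}(h)\Delta G_{n,m}\big\|^2\Big]^{1/2} \Bigg\{ \Big[\mathbb{E}\big\|B_{n,m}(h)(K_1 - \bar K_1) + (K_2 - \bar K_2)\big\|^2\Big]^{1/2} \\
&\qquad\qquad + \Big[\mathbb{E}\big\|\mathbb{E}_{\vec{\boldsymbol s},\vec{\boldsymbol s}'}\big(B_{n,m}(h)(\bar K_1 - \tilde K_1) + (\bar K_2 - \tilde K_2)\big)\big\|^2\Big]^{1/2} \Bigg\},
\end{aligned}
\tag{133}
$$

where we have applied the Cauchy-Schwarz inequality in the last step.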

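On the right-hand side of (133), the term $\big[\mathbb{E}\|A_{n,m}(h)\Delta G_{n,m}\|^2\big]^{1/2}$ has already been bounded in (107). For the other terms, the bounds follow immediately from Lemma 6:

$$
\begin{aligned}
\Big[\mathbb{E}\big\|B_{n,m}(h)(K_1 - \bar K_1) + (K_2 - \bar K_2)\big\|^2\Big]^{1/2}
&\le 2\big[\mathbb{E}\|K_1 - \bar K_1\|^2\big]^{1/2} + \big[\mathbb{E}\|K_2 - \bar K_2\|^2\big]^{1/2} \\
&\le 44 P_1(t_{n-m+1})\, h \sum_{i=1}^{n-m} \big(2 + (n-m+1-i)h\big)\, \mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(i)}(\Delta \boldsymbol{g}^*_{n,m}), \\
\Big[\mathbb{E}\big\|\mathbb{E}_{\vec{\boldsymbol s},\vec{\boldsymbol s}'}\big(B_{n,m}(h)(\bar K_1 - \tilde K_1) + (\bar K_2 - \tilde K_2)\big)\big\|^2\Big]^{1/2}
&\le \big\|\mathbb{E}_{\vec{\boldsymbol s},\vec{\boldsymbol s}'}\big(B_{n,m}(h)(\bar K_1 - \tilde K_1) + (\bar K_2 - \tilde K_2)\big)\big\| \\
&\le 2\big\|\mathbb{E}_{\vec{\boldsymbol s},\vec{\boldsymbol s}'}(\bar K_1 - \tilde K_1)\big\| + \big\|\mathbb{E}_{\vec{\boldsymbol s},\vec{\boldsymbol s}'}(\bar K_2 - \tilde K_2)\big\| \\
&\le 4\bar\alpha(t_{n-m+1})\,\bar\gamma(t_{n-m+1}) \cdot \frac{h^2}{N_s}.
\end{aligned}
\tag{134}
$$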

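Thus the final estimation of the cross term is

$$
\begin{aligned}
&\mathrm{Re}\; h\, \mathbb{E}\, \mathrm{tr}\Big[\big(B_{n,m}(h)\Delta K_1 + \Delta K_2\big)^\dagger \big(A_{n,m}(h)\Delta G_{n,m}\big)\Big]
\le \Big(1 + \frac18 \mathscr{H}^4 h^4\Big)\, h\, \big[\mathbb{E}(\|\Delta G_{n,m}\|^2)\big]^{1/2} \\
&\qquad \times \Bigg\{ 44 P_1(t_{n-m+1})\, h \sum_{i=1}^{n-m} \big(2 + (n-m+1-i)h\big)\, \mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(i)}(\Delta \boldsymbol{g}^*_{n,m}) + 4\bar\alpha(t_{n-m+1})\,\bar\gamma(t_{n-m+1}) \cdot \frac{h^2}{N_s} \Bigg\}.
\end{aligned}
\tag{135}
$$

Finally, we combine the estimate (126) for the quadratic term with (135) for the cross term to obtain the recurrence relation (49) for the numerical error.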

6.3. Proof of (35) in Theorem 3—Estimation of the numerical error In this section we discuss how to apply the recurrence relations in Proposition 7 to obtain the estimates in Theorem 3. Here we only focus on the estimate for the numerical error which we are more interested in. For the bias, we refer the readers to Appendix B for the detailed proof. In Proposition 7, two recurrence relations are given, in which the first relation (48) is easier to analyze due to its linearity. For simplicity, we rewrite this estimate as

$$
\begin{aligned}
\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(j-m+1)}(\Delta \boldsymbol{g}^*_{n,m}) \le{}& (1 + c_1 h^4)\, \mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(j-m)}(\Delta \boldsymbol{g}^*_{n,m}) \\
&+ c_2 h^2 \sum_{i=1}^{j-m} \big(2 + (j-m+1-i)h\big)\, \mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(i)}(\Delta \boldsymbol{g}^*_{n,m}) + c_3 \frac{h}{\sqrt{N_s}}, \qquad j = m, \cdots, n-1,
\end{aligned}
\tag{136}
$$

where we have introduced the notations $c_1 = \frac18 \mathscr{H}^4$, $c_2 = 22 P_1(t_{n-m})$ and $c_3 = \frac72 \sqrt{\bar\gamma(t_{n-m})}$ for simplicity. The inequality (136) is obtained by taking the maximum on the diagonals, and using the fact that $\bar\gamma(t_{j+1-m}) \le \bar\gamma(t_{n-m})$.

This inequality shows that the recurrence relation of the error with two indices $n$ and $m$ can be simplified as a recurrence relation with only one index $j$. For each $j$, the quantity $\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(j-m)}(\Delta \boldsymbol{g}^*_{n,m})$ denotes the maximum numerical error on the $(j-m)$th diagonal (see Figure 2). This can also be understood by an alternative order of computation: once all the propagators on the diagonals $\Gamma^*_{n,m}(i)$ for $i \le j-m$ are computed, the propagators on the next diagonal $\Gamma^*_{n,m}(j-m+1)$ can actually be computed in an arbitrary order, i.e., the computations of the propagators on this diagonal are independent of each other. The derivation of (136) is inspired by this observation, and this idea will also be used in the proof of Theorem 3 to be presented later in this section.
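To see how a recurrence of the form (136) propagates the error from one diagonal to the next, the following sketch iterates it with equality. The numerical values of `c1`, `c2` and `c3` are illustrative placeholders rather than constants derived from a concrete model, and the error on the zeroth diagonal is taken to be zero, which is an assumption of this sketch.

```python
import math

def diagonal_error_bounds(n_diag, h, c1, c2, c3, n_samples):
    """Iterate the diagonal recurrence with equality.

    E[d] plays the role of the error bound on the d-th diagonal;
    E[0] = 0 is assumed (no error on the zeroth diagonal).
    """
    E = [0.0]
    for d in range(n_diag):  # d corresponds to j - m
        coupling = sum((2 + (d + 1 - i) * h) * E[i] for i in range(1, d + 1))
        E.append((1 + c1 * h**4) * E[d]
                 + c2 * h**2 * coupling
                 + c3 * h / math.sqrt(n_samples))
    return E

# Illustrative constants only.
bounds_1 = diagonal_error_bounds(50, 0.02, c1=1.0, c2=22.0, c3=3.5, n_samples=1)
bounds_4 = diagonal_error_bounds(50, 0.02, c1=1.0, c2=22.0, c3=3.5, n_samples=4)
```

Because the recurrence is linear in the source term $c_3 h/\sqrt{N_s}$, quadrupling the number of samples halves every entry of the sequence. This linear recurrence alone therefore only exhibits a prefactor proportional to $1/\sqrt{N_s}$.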

To study the growth of the error from (136), one can define a sequence $\{A_j\}$ with the following recurrence relation:

$$
A_{j+1} = (1 + 3c_2h^2) A_j + c_2 h^2 \sum_{i=m+1}^{j-1} \big(2 + (j+1-i)h\big) A_i + c_3 \frac{h}{\sqrt{N_s}} \quad \text{for } j = m, \cdots, n-1, \tag{137}
$$

and initial condition $A_m = 0$. Then we have $[\mathbb{E}(\|\Delta G_{j+1,m}\|^2)]^{1/2} \le A_{j+1}$ if we require $\frac{c_1}{c_2} h^2 + h \le 1$. Increasing the index $j$ in (137) by one, we get

$$
A_{j+2} = (1 + 3c_2h^2) A_{j+1} + c_2 h^2 \sum_{i=m+1}^{j} \big(2 + (j+2-i)h\big) A_i + c_3 \frac{h}{\sqrt{N_s}}. \tag{138}
$$

Subtracting (137) from (138) yields

$$
A_{j+2} = (2 + 3c_2h^2) A_{j+1} - (1 + c_2h^2 - 2c_2h^3) A_j + c_2 h^3 \sum_{i=m+1}^{j-1} A_i. \tag{139}
$$

Similarly, we can reduce the index $j$ in (139) by 1 and again subtract the two equations, so that a recurrence relation without summation can be derived:

$$
A_{j+2} - (3 + 3c_2h^2) A_{j+1} + (3 + 4c_2h^2 - 2c_2h^3) A_j - (1 + c_2h^2 - c_2h^3) A_{j-1} = 0. \tag{140}
$$

The general formula of $A_j$ can then be found by solving the corresponding characteristic equation. We denote $A_j$ as

$$
A_j = \sigma_1 r_1^j + \sigma_2 r_2^j + \sigma_3 r_3^j. \tag{141}
$$

The formula of each $r_i$ is given in Appendix A, based on which we can estimate $A_n$ by

$$
A_n \le C(h, N_s) \cdot \big(1 + \theta_1 \sqrt{P_1(t_{n-m})}\, h\big)^{n-m}, \tag{142}
$$

where $\theta_1$ is a constant and $C(h, N_s)$ is a function to be determined.

The recurrence relation (48) helps determine the growth rate of the numerical error. However, if we use (48) to determine the function $C(h, N_s)$, we can only find $C(h, N_s) \propto \sqrt{1/N_s}$, whereas the desired result is $C(h, N_s) \propto \sqrt{h/N_s}$. To this end, the other recurrence relation (49) has to be utilized, as in the proof given below:

Proof of Theorem 3 (Numerical error). As mentioned previously, we only present the proof of (35) in this section. We claim that the error satisfies

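$$
\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(l+1)}(\Delta \boldsymbol{g}^*_{n,m}) \le \theta_2 \sqrt{\bar\gamma(t_{n-m+1})} \cdot \sqrt{\frac{h}{N_s}}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{l} \quad \text{for } l = 0, 1, \cdots, n-m. \tag{143}
$$

Here $\theta_2$ is some $O(1)$ constant. We will prove this claim by mathematical induction using (49). We first check the initial case $l = 0$. By the recurrence relation (49), the left-hand side of (143) is bounded by

$$
\begin{aligned}
\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(1)}(\Delta \boldsymbol{g}^*_{n,m}) &= \max\Big\{ \big[\mathbb{E}\|\Delta G_{m+1,m}\|^2\big]^{1/2}, \big[\mathbb{E}\|\Delta G_{m+2,m+1}\|^2\big]^{1/2}, \cdots, \big[\mathbb{E}\|\Delta G_{n+1,n}\|^2\big]^{1/2} \Big\} \\
&\le \sqrt{17}\, \sqrt{\bar\gamma(h)}\, \frac{h}{\sqrt{N_s}} \le \theta_2 \sqrt{\bar\gamma(t_{n-m+1})} \cdot \sqrt{\frac{h}{N_s}},
\end{aligned}
\tag{144}
$$

which holds for any constant $\theta_2 \ge \sqrt{17}$.

Assume that (143) holds for all $l = 0, 1, \cdots, k-1$. When $l = k$,

$$
\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(k+1)}(\Delta \boldsymbol{g}^*_{n,m}) = \max\Big\{ \big[\mathbb{E}\|\Delta G_{m+k+1,m}\|^2\big]^{1/2}, \cdots, \big[\mathbb{E}\|\Delta G_{n+1,n-k}\|^2\big]^{1/2} \Big\}. \tag{145}
$$

Therefore we just need to find the bounds for each $\big[\mathbb{E}\|\Delta G_{m+k+j,m+j-1}\|^2\big]^{1/2}$, $j = 1, \cdots, n+1-m-k$. This will be done by the recurrence relation (49). For clearer presentation, we rewrite the equation (49) below only with subscripts replaced: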

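$$
\begin{aligned}
\mathbb{E}\|\Delta G_{m+k+j,m+j-1}\|^2 \le{}& \Big(1 + \frac14 \mathscr{H}^4 h^4\Big) \cdot \mathbb{E}\|\Delta G_{m+k+j-1,m+j-1}\|^2 \\
&+ \Big(1 + \frac18 \mathscr{H}^4 h^4\Big)\, 44 P_1(t_{n-m+1})\, h^2 \cdot \big[\mathbb{E}\|\Delta G_{m+k+j-1,m+j-1}\|^2\big]^{1/2} \\
&\qquad \times \sum_{i=1}^{k} \big(2 + (k+1-i)h\big)\, \mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{m+k+j-1,m+j-1}(i)}(\Delta \boldsymbol{g}^*_{n,m}) \\
&+ \Big(1 + \frac18 \mathscr{H}^4 h^4\Big)\, 4\bar\alpha(t_{n-m+1})\,\bar\gamma(t_{n-m+1})\, \frac{h^3}{N_s} \cdot \big[\mathbb{E}\|\Delta G_{m+k+j-1,m+j-1}\|^2\big]^{1/2} \\
&+ 912 P_1^2(t_{n-m+1})\, h^4 \Bigg[\sum_{i=1}^{k} \big(2 + (k+1-i)h\big)\, \mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{m+k+j-1,m+j-1}(i)}(\Delta \boldsymbol{g}^*_{n,m})\Bigg]^2 + 17\bar\gamma(t_{n-m+1})\, \frac{h^2}{N_s}.
\end{aligned}
\tag{146}
$$

For the first and third terms on the right-hand side, we use the inductive hypothesis (143) with $l = k-1$, which indicates that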

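$$
\mathbb{E}\|\Delta G_{m+k+j-1,m+j-1}\|^2 \le \Big[\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(k)}(\Delta \boldsymbol{g}^*_{n,m})\Big]^2 \le \theta_2^2\, \bar\gamma(t_{n-m+1}) \cdot \frac{h}{N_s}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{2k-2}, \tag{147}
$$

to get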

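$$
\begin{aligned}
&\Big(1 + \frac14 \mathscr{H}^4 h^4\Big) \cdot \mathbb{E}\|\Delta G_{m+k+j-1,m+j-1}\|^2 \\
&\quad\le \Big(1 + \frac14 \mathscr{H}^4 h^4\Big)\, \theta_2^2\, \bar\gamma(t_{n-m+1}) \cdot \frac{h}{N_s}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{2k-2},
\end{aligned}
\tag{148}
$$

$$
\begin{aligned}
&\Big(1 + \frac18 \mathscr{H}^4 h^4\Big)\, 4\bar\alpha(t_{n-m+1})\,\bar\gamma(t_{n-m+1})\, \frac{h^3}{N_s} \cdot \big[\mathbb{E}\|\Delta G_{m+k+j-1,m+j-1}\|^2\big]^{1/2} \\
&\quad\le \Big(1 + \frac18 \mathscr{H}^4 h^4\Big)\, 4\bar\alpha(t_{n-m+1})\, [\bar\gamma(t_{n-m+1})]^{3/2}\, \theta_2 \cdot \sqrt{\frac{h^7}{N_s^3}}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{k-1} \\
&\quad\le h\, \theta_2^2\, \bar\gamma(t_{n-m+1})\, \frac{h}{N_s}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{2k-2},
\end{aligned}
\tag{149}
$$

where we have assumed $\big(1 + \frac18 \mathscr{H}^4 h^4\big) \sqrt{\frac{h^3}{N_s}} \le \frac{\theta_2}{4\bar\alpha(t_{n-m+1}) \sqrt{\bar\gamma(t_{n-m+1})}}$ to get the last "$\le$" in (149). For the second and the fourth terms on the right-hand side of (146), we first use the inductive hypothesis to get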

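$$
\begin{aligned}
\mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{m+k+j-1,m+j-1}(i)}(\Delta \boldsymbol{g}^*_{n,m}) &\le \mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(i)}(\Delta \boldsymbol{g}^*_{n,m}) \\
&\le \theta_2 \sqrt{\bar\gamma(t_{n-m+1})} \cdot \sqrt{\frac{h}{N_s}}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{i-1}, \quad \text{for } i = 1, \cdots, k.
\end{aligned}
\tag{150}
$$

The next step is to insert the above estimation into (146). To assist our further calculation, we let $c_4 = \theta_1 \sqrt{P_1(t_{n-m+1})}$ and perform the following calculation:

$$
\begin{aligned}
\sum_{i=1}^{k} \big(2 + (k+1-i)h\big) \cdot (1 + c_4h)^i
&= 2 \sum_{i=1}^{k} (1 + c_4h)^i + h \sum_{i=1}^{k} (k+1-i) \cdot (1 + c_4h)^i \\
&= 2\, \frac{(1 + c_4h)^{k+1} - (1 + c_4h)}{c_4 h} + h\, \frac{(1 + c_4h)^{k+2} - (1 + c_4h)^2 - (1 + c_4h)\, c_4 k h}{c_4^2 h^2} \\
&\le 2\, \frac{(1 + c_4h)^{k+1}}{c_4 h} + 2\, \frac{(1 + c_4h)^{k+1}}{c_4^2 h} = \frac{2(1 + c_4h)^{k+1}}{h} \cdot \Big(\frac{1}{c_4} + \frac{1}{c_4^2}\Big),
\end{aligned}
\tag{151}
$$

where we have assumed $h \le \frac{1}{c_4}$. By this inequality, we see that when $h$ is sufficiently small, for any $\theta_1 > 1$,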

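the second term on the right-hand side of (146) satisfies

$$
\begin{aligned}
&\Big(1 + \frac18 \mathscr{H}^4 h^4\Big)\, 44 P_1(t_{n-m+1})\, h^2 \cdot \big[\mathbb{E}\|\Delta G_{m+k+j-1,m+j-1}\|^2\big]^{1/2} \times \sum_{i=1}^{k} \big(2 + (k+1-i)h\big)\, \mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{m+k+j-1,m+j-1}(i)}(\Delta \boldsymbol{g}^*_{n,m}) \\
&\quad\le \Big(1 + \frac18 \mathscr{H}^4 h^4\Big)\, 44 P_1(t_{n-m+1})\, \theta_2^2\, \bar\gamma(t_{n-m+1})\, \frac{h^3}{N_s} \sum_{i=1}^{k} \big(2 + (k+1-i)h\big) \cdot \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{k+i-2} \\
&\quad\le 88 \underbrace{\Big(1 + \frac18 \mathscr{H}^4 h^4\Big)}_{\le 2}\, \underbrace{\big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)}_{\le 2}\, \underbrace{\Bigg(\frac{\sqrt{P_1(t_{n-m+1})}}{\theta_1} + \frac{1}{\theta_1^2}\Bigg)}_{\le 2\sqrt{P_1(t_{n-m+1})}} \\
&\qquad\qquad \times h\, \theta_2^2\, \bar\gamma(t_{n-m+1})\, \frac{h}{N_s}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{2k-2} \\
&\quad\le 704 \sqrt{P_1(t_{n-m+1})}\, h\, \theta_2^2\, \bar\gamma(t_{n-m+1})\, \frac{h}{N_s}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{2k-2},
\end{aligned}
\tag{152}
$$

and when $\theta_2 = \sqrt{34}$, the fourth term on the right-hand side of (146) satisfies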

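$$
\begin{aligned}
&912 P_1^2(t_{n-m+1})\, h^4 \Bigg[\sum_{i=1}^{k} \big(2 + (k+1-i)h\big)\, \mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{m+k+j-1,m+j-1}(i)}(\Delta \boldsymbol{g}^*_{n,m})\Bigg]^2 + 17\bar\gamma(t_{n-m+1})\, \frac{h^2}{N_s} \\
&\quad\le 912 P_1^2(t_{n-m+1})\, \theta_2^2\, \bar\gamma(t_{n-m+1})\, \frac{h^5}{N_s} \Bigg[\sum_{i=1}^{k} \big(2 + (k+1-i)h\big) \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{i-1}\Bigg]^2 + 17\bar\gamma(t_{n-m+1})\, \frac{h^2}{N_s} \\
&\quad\le \underbrace{3648 \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^2 \Bigg(\frac{\sqrt{P_1(t_{n-m+1})}}{\theta_1} + \frac{1}{\theta_1^2}\Bigg)^2 h}_{\le 1/2} \times h\, \theta_2^2\, \bar\gamma(t_{n-m+1})\, \frac{h}{N_s}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{2k-2} \\
&\qquad + \underbrace{\frac{17}{\theta_2^2}}_{\le 1/2} \times h\, \theta_2^2\, \bar\gamma(t_{n-m+1})\, \frac{h}{N_s}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{2k-2} \\
&\quad\le h\, \theta_2^2\, \bar\gamma(t_{n-m+1})\, \frac{h}{N_s}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{2k-2}
\end{aligned}
\tag{153}
$$

by assuming $7296\, h \Big(\frac{\sqrt{P_1(t_{n-m+1})}}{\theta_1} + \frac{1}{\theta_1^2}\Big)^2 \le 1/2$.

Inserting (148), (152), (149) and (153) into (146), we get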

$$
\begin{aligned}
\mathbb{E}\|\Delta G_{m+k+j,m+j-1}\|^2
&\le \frac{1 + \big(704 \sqrt{P_1(t_{n-m+1})} + 2\big) h + \frac14 \mathscr{H}^4 h^4}{\big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^2} \times \theta_2^2\, \bar\gamma(t_{n-m+1})\, \frac{h}{N_s}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{2k} \\
&\le \theta_2^2\, \bar\gamma(t_{n-m+1})\, \frac{h}{N_s}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{2k}
\end{aligned}
\tag{154}
$$

by setting $\theta_1 = 353$. Since this inequality holds for all $j = 1, \cdots, n+1-m-k$, by (145), we know that (143) holds for $l = k$. By the principle of mathematical induction, we have completed the proof of (143).

Finally, we set $l = n-m$ in (143) to get

$$
\big[\mathbb{E}\|\Delta G_{n+1,m}\|^2\big]^{1/2} = \mathcal{N}^{(\mathrm{std})}_{\Gamma^*_{n,m}(n-m+1)}(\Delta \boldsymbol{g}^*_{n,m}) \le \theta_2 \sqrt{\bar\gamma(t_{n-m+1})} \cdot \sqrt{\frac{h}{N_s}}\, \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{n-m}, \tag{155}
$$

resulting in the final estimate (35) for the numerical error.

Remark 2. Due to the jump conditions (21), we actually need to multiply by the norm of the observable $\|O_s\|$ on the right-hand side of (136) when crossing the discontinuities. However, we can always first consider the observable $O_s / \|O_s\|$, and then multiply the result by $\|O_s\|$. Therefore we can always assume $\|O_s\| = 1$ in this paper and thus our analysis is not affected.
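The weighted geometric sum (151) carries much of the induction step, so it may help to check it numerically. The sketch below compares the direct sum with the closed form and with the final upper bound; the function names and the sample values of `k`, `h` and `c4` are illustrative choices, subject only to the assumption $h \le 1/c_4$ used in the text.

```python
import math

def weighted_geometric_sum(k, h, c4):
    """Direct evaluation of sum_{i=1}^k (2 + (k+1-i) h) (1 + c4 h)^i."""
    r = 1 + c4 * h
    return sum((2 + (k + 1 - i) * h) * r**i for i in range(1, k + 1))

def closed_form(k, h, c4):
    """Closed form from the second line of the calculation."""
    r = 1 + c4 * h
    geometric = 2 * (r**(k + 1) - r) / (c4 * h)
    weighted = h * (r**(k + 2) - r**2 - r * c4 * k * h) / (c4 * h) ** 2
    return geometric + weighted

def upper_bound(k, h, c4):
    """Final bound, valid under the assumption h <= 1 / c4."""
    r = 1 + c4 * h
    return 2 * r**(k + 1) / h * (1 / c4 + 1 / c4**2)

# Illustrative parameter sets (k, h, c4), all with h <= 1 / c4.
cases = [(1, 0.01, 5.0), (10, 0.01, 5.0), (40, 0.02, 3.0), (25, 0.05, 20.0)]
checks = [(weighted_geometric_sum(*c), closed_form(*c), upper_bound(*c)) for c in cases]
```

In each case the first two numbers should agree up to rounding and the third should dominate them.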

6.4. Proof of Proposition 8—Estimation of the error for the deterministic method

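In this section, we consider the error $E_{n+1,m} = G_e(t_{n+1}, t_m) - G_{n+1,m}$ for the deterministic scheme (22). By the triangle inequality,

$$
\begin{aligned}
\|E_{n+1,m}\| \le{}& \underbrace{\bigg\| G_e(t_{n+1}, t_m) - \Big[ A_{n,m}(h)\, G_e(t_n, t_m) + \frac12 h \Big( B_{n,m}(h)\, F_1(\boldsymbol{g}^e_{n,m}) + F_2(\boldsymbol{g}^{e*}_{n,m}) \Big) \Big] \bigg\|}_{\text{Part 1}} \\
&+ \underbrace{\bigg\| \Big[ A_{n,m}(h)\, G_e(t_n, t_m) + \frac12 h \Big( B_{n,m}(h)\, F_1(\boldsymbol{g}^e_{n,m}) + F_2(\boldsymbol{g}^{e*}_{n,m}) \Big) \Big] - G_{n+1,m} \bigg\|}_{\text{Part 2}}
\end{aligned}
\tag{156}
$$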

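where $\boldsymbol{g}^e_{n,m}$ and $\boldsymbol{g}^{e*}_{n,m}$ are defined as

$$
\begin{aligned}
\boldsymbol{g}^e_{n,m} &= \big( G_e(t_{m+1}, t_m);\ G_e(t_{m+2}, t_{m+1}), G_e(t_{m+2}, t_m);\ \cdots;\ G_e(t_n, t_{n-1}), \cdots, G_e(t_n, t_m) \big), \\
\boldsymbol{g}^{e*}_{n,m} &= \big( \boldsymbol{g}^e_{n,m};\ G_e(t_{n+1}, t_n), \cdots, G_e(t_{n+1}, t_{m+1}), G_e(t_{n+1}, t_m) \big),
\end{aligned}
$$

which are similar to the definitions of $\boldsymbol{g}_{n,m}$ and $\boldsymbol{g}^*_{n,m}$. We note that the values of $G_e$ on the discontinuities are again defined to be multiple-valued as

$$
G_e(t_N, t_k) = \big( G_e(t^+, t_k),\ G_e(t^-, t_k) \big) \quad \text{and} \quad G_e(t_j, t_N) = \big( G_e(t_j, t^+),\ G_e(t_j, t^-) \big)
$$

for $0 \le k \le N-1$ and $N+1 \le j \le 2N$. We further define $\boldsymbol{e}_{n,m} = \boldsymbol{g}^e_{n,m} - \boldsymbol{g}_{n,m}$ and $\boldsymbol{e}^*_{n,m} = \boldsymbol{g}^{e*}_{n,m} - \boldsymbol{g}^*_{n,m}$, which will be used later. The estimation of the two parts in (156) will be discussed in the following two subsections.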

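6.4.1. Estimation of Part 1 in (156)

We further split this part of the error by

$$
\begin{aligned}
&G_e(t_{n+1}, t_m) - \Big[ A_{n,m}(h)\, G_e(t_n, t_m) + \frac12 h \Big( B_{n,m}(h)\, F_1(\boldsymbol{g}^e_{n,m}) + F_2(\boldsymbol{g}^{e*}_{n,m}) \Big) \Big] \\
&\quad= \big( G_e(t_{n+1}, t_m) - A_{n,m}(h)\, G_e(t_n, t_m) \big) - \frac12 h \big( B_{n,m}(h)\, \mathcal{H}(t_n, G_e, t_m) + \mathcal{H}(t_{n+1}, G_e, t_m) \big) \\
&\qquad + \frac12 h \big( B_{n,m}(h)\, \mathcal{H}(t_n, G_e, t_m) + \mathcal{H}(t_{n+1}, G_e, t_m) \big) - \frac12 h \Big( B_{n,m}(h)\, F_1(\boldsymbol{g}^e_{n,m}) + F_2(\boldsymbol{g}^{e*}_{n,m}) \Big),
\end{aligned}
\tag{157}
$$

where the definition of $\mathcal{H}$ is given in (19).

Using Taylor expansion, we may easily obtain the bound for the first term of (157):

$$
\bigg\| \big( G_e(t_{n+1}, t_m) - A_{n,m}(h)\, G_e(t_n, t_m) \big) - \frac12 h \big( B_{n,m}(h)\, \mathcal{H}(t_n, G_e, t_m) + \mathcal{H}(t_{n+1}, G_e, t_m) \big) \bigg\| \le \Big( \frac14 \mathscr{H} \mathscr{G}'' + \frac{5}{12} \mathscr{G}''' \Big) \cdot h^3. \tag{158}
$$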

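Meanwhile, since

$$
\begin{aligned}
&\mathcal{H}(t_n, G_e, t_m) - F_1(\boldsymbol{g}^e_{n,m}) \\
&\quad= \operatorname{sgn}(t_n - t) \sum_{\substack{M=1 \\ M \text{ is odd}}}^{\bar M} i^{M+1} \int_{t_n > s_M^M > \cdots > s_1^M > t_m} (-1)^{\#\{\vec{\boldsymbol s}^M \le t\}}\, W_s \Big( G_e(t_n, s_M^M) W_s \cdots W_s G_e(s_1^M, t_m) \\
&\qquad\qquad - I_h G_e(t_n, s_M^M) W_s \cdots W_s I_h G_e(s_1^M, t_m) \Big)\, \mathcal{L}(t_n, \vec{\boldsymbol s}^M)\, \mathrm{d}\vec{\boldsymbol s}^M,
\end{aligned}
\tag{159}
$$

we need the following estimation to bound the above term:

$$
\begin{aligned}
&\big\| G_e(t_n, s_M^M) W_s \cdots W_s G_e(s_1^M, t_m) - I_h G_e(t_n, s_M^M) W_s \cdots W_s I_h G_e(s_1^M, t_m) \big\| \\
&\quad\le \sum_{j=0}^{M} \big\| G_e(t_n, s_M^M) W_s \cdots W_s G_e(s_{j+2}^M, s_{j+1}^M) \big\| \cdot \mathscr{W} \cdot \big\| G_e(s_{j+1}^M, s_j^M) - I_h G_e(s_{j+1}^M, s_j^M) \big\| \cdot \mathscr{W} \\
&\qquad\qquad \cdot \big\| I_h G_e(s_j^M, s_{j-1}^M) W_s \cdots W_s I_h G_e(s_1^M, t_m) \big\|.
\end{aligned}
\tag{160}
$$

The term $G_e(s_{j+1}^M, s_j^M) - I_h G_e(s_{j+1}^M, s_j^M)$ in the above equation is the linear interpolation error. By the standard error estimation of interpolation (see e.g. [38]), if the point $(s_{j+1}^M, s_j^M)$ is located inside the triangle $T'$, the interpolation error is

$$
\big| G_e^{(rs)}(s_{j+1}^M, s_j^M) - I_h G_e^{(rs)}(s_{j+1}^M, s_j^M) \big| \le \frac12 R_{T'}^2\, \big\| \rho\big(D^2 G_e^{(rs)}\big) \big\|_{L^\infty(T')}, \tag{161}
$$

where $R_{T'}$ is the radius of the circumscribed circle of $T'$ and $\rho(D^2 G_e^{(rs)})$ denotes the spectral radius of the Hessian of $G_e^{(rs)}$, which can be bounded by

$$
\rho\big(D^2 G_e^{(rs)}\big) \le \big\| \nabla^2 G_e^{(rs)} \big\| \le 2\mathscr{G}''.
$$

Plugging the above result into (160) and using the bounds of the exact and numerical solutions, we get

$$
\big\| G_e(t_n, s_M^M) W_s \cdots W_s G_e(s_1^M, t_m) - I_h G_e(t_n, s_M^M) W_s \cdots W_s I_h G_e(s_1^M, t_m) \big\| \le (M+1)\, \mathscr{G}^M \mathscr{W}^M \mathscr{G}'' h^2. \tag{162}
$$

Finally we apply this bound to (159) to obtain

$$
\big\| \mathcal{H}(t_n, G_e, t_m) - F_1(\boldsymbol{g}^e_{n,m}) \big\| \le \bar\beta(t_{n-m})\, h^2, \tag{163}
$$

where

$$
\bar\beta(t) = \mathscr{W} \mathscr{G}'' \mathscr{L}^{1/2} \sum_{\substack{M=1 \\ M \text{ is odd}}}^{\bar M} \frac{M+1}{(M-1)!!}\, \big( \mathscr{W} \mathscr{G} \mathscr{L}^{1/2}\, t \big)^M.
$$

Note that $\mathcal{H}(t_{n+1}, G_e, t_m) - F_2(\boldsymbol{g}^{e*}_{n,m})$ can be obtained by changing $t_n$ to $t_{n+1}$ in the expression of $\mathcal{H}(t_n, G_e, t_m) - F_1(\boldsymbol{g}^e_{n,m})$. Therefore its bound can be given by $\bar\beta(t_{n-m+1})\, h^2$. Thus,

$$
\bigg\| \frac12 h \big( \mathcal{H}(t_n, G_e, t_m) + \mathcal{H}(t_{n+1}, G_e, t_m) \big) - \frac12 h \Big( F_1(\boldsymbol{g}^e_{n,m}) + F_2(\boldsymbol{g}^{e*}_{n,m}) \Big) \bigg\| \le \bar\beta(t_{n-m+1})\, h^3. \tag{164}
$$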



Inserting the estimates (158) and (164) into (157), we have the following estimation for Part 1 of (156):

$$
\underbrace{\bigg\| G_e(t_{n+1}, t_m) - \Big[ A_{n,m}(h)\, G_e(t_n, t_m) + \frac12 h \Big( B_{n,m}(h)\, F_1(\boldsymbol{g}^e_{n,m}) + F_2(\boldsymbol{g}^{e*}_{n,m}) \Big) \Big] \bigg\|}_{\text{Part 1}} \le \Big( \frac14 \mathscr{H} \mathscr{G}'' + \frac{5}{12} \mathscr{G}''' + \bar\beta(t_{n-m+1}) \Big) h^3. \tag{165}
$$

6.4.2. Estimation of Part 2 in (156)

The estimation for this part of the error is similar to that of the stochastic error. By the numerical scheme (22), we have

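$$
\begin{aligned}
&\Big[ A_{n,m}(h)\, G_e(t_n, t_m) + \frac12 h \Big( B_{n,m}(h)\, F_1(\boldsymbol{g}^e_{n,m}) + F_2(\boldsymbol{g}^{e*}_{n,m}) \Big) \Big] - G_{n+1,m} \\
&\quad= A_{n,m}(h)\, E_{n,m} + \frac12 h\, B_{n,m}(h) \Big( F_1(\boldsymbol{g}^e_{n,m}) - F_1(\boldsymbol{g}_{n,m}) \Big) + \frac12 h \Big( F_2(\boldsymbol{g}^{e*}_{n,m}) - F_2(\boldsymbol{g}^*_{n,m}) \Big).
\end{aligned}
\tag{166}
$$

For the second term on the right-hand side, we can mimic the analysis of (116) to get

$$
\big\| F_1(\boldsymbol{g}^e_{n,m}) - F_1(\boldsymbol{g}_{n,m}) \big\| \le 8 P_1(t_{n-m})\, h \sum_{i=1}^{n-m} \big(2 + (n-m-i)h\big)\, \|\boldsymbol{e}_{n,m}\|_{\Gamma_{n,m}(i)}. \tag{167}
$$

For the third term, due to the same analysis, we have

$$
\begin{aligned}
&\big\| F_2(\boldsymbol{g}^{e*}_{n,m}) - F_2(\boldsymbol{g}^*_{n,m}) \big\| \\
&\quad\le 8 P_1(t_{n-m+1})\, h \Bigg\{ \big\| G_e(t_{n+1}, t_m) - G^*_{n+1,m} \big\| + \sum_{i=1}^{n-m} \big(2 + (n-m+1-i)h\big)\, \|\boldsymbol{e}^*_{n,m}\|_{\Gamma^*_{n,m}(i)} \Bigg\}
\end{aligned}
\tag{168}
$$

where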

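$$
\begin{aligned}
&\big\| G_e(t_{n+1}, t_m) - G^*_{n+1,m} \big\| \\
&\quad= \bigg\| \Big[ G_e(t_n, t_m) + h \frac{\partial}{\partial s_\uparrow} G_e(t_n, t_m) + \frac12 h^2 \frac{\partial^2}{\partial s_\uparrow^2} G_e(\nu_n, t_m) \Big] - \Big[ \big(I + \operatorname{sgn}(t_n - t)\, i H_s h\big) G_{n,m} + h F_1(\boldsymbol{g}_{n,m}) \Big] \bigg\| \\
&\quad= \bigg\| \big(I + \operatorname{sgn}(t_n - t)\, i H_s h\big) E_{n,m} + h \big( \mathcal{H}(t_n, G_e, t_m) - F_1(\boldsymbol{g}_{n,m}) \big) + \frac12 h^2 \frac{\partial^2}{\partial s_\uparrow^2} G_e(\nu_n, t_m) \bigg\| \\
&\quad\le \big\| \big(I + \operatorname{sgn}(t_n - t)\, i H_s h\big) E_{n,m} \big\| + h \big\| \mathcal{H}(t_n, G_e, t_m) - F_1(\boldsymbol{g}^e_{n,m}) \big\| \\
&\qquad + h \big\| F_1(\boldsymbol{g}^e_{n,m}) - F_1(\boldsymbol{g}_{n,m}) \big\| + \frac12 h^2 \bigg\| \frac{\partial^2}{\partial s_\uparrow^2} G_e(\nu_n, t_m) \bigg\|.
\end{aligned}
\tag{169}
$$

Note that the second-order derivative above is the remainder of the Taylor expansion and should be interpreted as

$$
\frac{\partial^2}{\partial s_\uparrow^2} G_e(\nu_n, t_m) =
\begin{pmatrix}
\dfrac{\partial^2}{\partial s_\uparrow^2} G_e^{(11)}(\nu_n^{(11)}, t_m) & \dfrac{\partial^2}{\partial s_\uparrow^2} G_e^{(12)}(\nu_n^{(12)}, t_m) \\[2ex]
\dfrac{\partial^2}{\partial s_\uparrow^2} G_e^{(21)}(\nu_n^{(21)}, t_m) & \dfrac{\partial^2}{\partial s_\uparrow^2} G_e^{(22)}(\nu_n^{(22)}, t_m)
\end{pmatrix}.
$$

Using the previous results (163) and (167), we have the bound

$$
\begin{aligned}
&\big\| G_e(t_{n+1}, t_m) - G^*_{n+1,m} \big\| \\
&\quad\le \Big(1 + \frac12 \mathscr{H}^2 h^2\Big) \|E_{n,m}\| + 8 P_1(t_{n-m})\, h^2 \sum_{i=1}^{n-m} \big(2 + (n-m-i)h\big)\, \|\boldsymbol{e}_{n,m}\|_{\Gamma_{n,m}(i)} + \mathscr{G}'' h^2 + \bar\beta(t_{n-m})\, h^3 \\
&\quad\le \Big(1 + \frac12 \mathscr{H}^2 h^2\Big) \|E_{n,m}\| + 8 P_1(t_{n-m})\, h^2 \sum_{i=1}^{n-m} \big(2 + (n-m-i)h\big)\, \|\boldsymbol{e}_{n,m}\|_{\Gamma_{n,m}(i)} + 2\mathscr{G}'' h^2
\end{aligned}
\tag{170}
$$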

upon assuming $h \le \mathscr{G}'' / \bar\beta(t_{n-m})$. Plugging the above estimate into (168), we obtain

$$
\big\| F_2(\boldsymbol{g}^{e*}_{n,m}) - F_2(\boldsymbol{g}^*_{n,m}) \big\| \le 28 P_1(t_{n-m+1})\, h \sum_{i=1}^{n-m} \big(2 + (n-m+1-i)h\big)\, \|\boldsymbol{e}^*_{n,m}\|_{\Gamma^*_{n,m}(i)} + 16 P_1(t_{n-m+1})\, \mathscr{G}'' h^3, \tag{171}
$$

which is a result similar to (111). Plugging the estimates (167) and (171) into (166), we obtain the following estimation for Part 2 of $E_{n+1,m}$:

$$
\begin{aligned}
&\underbrace{\bigg\| \Big[ A_{n,m}(h)\, G_e(t_n, t_m) + \frac12 h \Big( B_{n,m}(h)\, F_1(\boldsymbol{g}^e_{n,m}) + F_2(\boldsymbol{g}^{e*}_{n,m}) \Big) \Big] - G_{n+1,m} \bigg\|}_{\text{Part 2}} \\
&\quad\le \Big(1 + \frac18 \mathscr{H}^4 h^4\Big) \|E_{n,m}\| + 22 P_1(t_{n-m+1})\, h^2 \sum_{i=1}^{n-m} \big(2 + (n-m+1-i)h\big)\, \|\boldsymbol{e}^*_{n,m}\|_{\Gamma^*_{n,m}(i)} + 8 P_1(t_{n-m+1})\, \mathscr{G}'' h^3.
\end{aligned}
\tag{172}
$$

We can now combine the estimates for Part 1 (165) and Part 2 (172), so that the estimation (156) yields the recurrence relation:

$$
\|E_{n+1,m}\| \le \Big(1 + \frac18 \mathscr{H}^4 h^4\Big) \|E_{n,m}\| + 22 P_1(t_{n-m+1})\, h^2 \sum_{i=1}^{n-m} \big(2 + (n-m+1-i)h\big)\, \|\boldsymbol{e}^*_{n,m}\|_{\Gamma^*_{n,m}(i)} + P^e(t_{n-m+1}) \cdot h^3, \tag{173}
$$

where

$$
P^e(t) = \Big( \frac14 \mathscr{H} + 8 P_1(t) \Big) \mathscr{G}'' + \frac{5}{12} \mathscr{G}''' + \bar\beta(t).
$$

One may compare the recurrence relation above with (48) to find that the only difference between these two inequalities is the truncation error (last term). Therefore, we may simply replicate the procedures (136)–(142) and conclude that the deterministic error has exactly the same growth rate in the exponential part as the numerical error:

$$
\|E_{n+1,m}\| \le P^e(t_{n-m+1}) \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{n-m+1} \cdot h^2, \tag{174}
$$

which can again be verified by mathematical induction. Therefore, we arrive at the estimate for the deterministic error as stated in Proposition 8.
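The $h^2$ factor in (174) reflects the second-order accuracy of the predictor-corrector time step. As a rough illustration only, the sketch below applies the same type of step to the scalar model problem $y' = i\lambda y$, which drops the integral term of the inchworm equation entirely, and compares the global error at two step sizes. The function names and the values of `lam`, `t_end` and `n_steps` are illustrative assumptions.

```python
import cmath

def heun_solve(lam, t_end, n_steps):
    """Integrate y' = i*lam*y, y(0) = 1, with a predictor-corrector step."""
    h = t_end / n_steps
    y = 1.0 + 0.0j
    for _ in range(n_steps):
        slope = 1j * lam * y
        y_star = y + h * slope                        # predictor
        y = y + 0.5 * h * (slope + 1j * lam * y_star) # corrector
    return y

def global_error(lam, t_end, n_steps):
    """Distance to the exact solution exp(i*lam*t_end)."""
    return abs(heun_solve(lam, t_end, n_steps) - cmath.exp(1j * lam * t_end))

# Halving the step size should reduce the error by roughly a factor of four.
ratio = global_error(2.0, 1.0, 100) / global_error(2.0, 1.0, 200)
```

A ratio close to 4 is what a second-order method predicts; it says nothing about the constant $P^e$ or the exponential factor in (174), which depend on the full integro-differential structure.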

7. Conclusion

We have presented a detailed analysis of the error growth in the inchworm Monte Carlo method, emphasizing the trade-off between the numerical sign problem and the error growth caused by accumulation and amplification during time marching. The result explains why the inchworm Monte Carlo method has a slower error growth than the classical quantum Monte Carlo method, and our analysis reveals how partial resummation trades off the numerical sign problem against the error amplification. Our work points to the research direction of improving the time integrator to further suppress the error growth, which will be considered in future works.

Acknowledgement

Zhenning Cai was supported by the Academic Research Fund of the Ministry of Education of Singapore under grant No. R-146-000-291-114. The work of Jianfeng Lu was supported in part by the National Science Foundation via grants DMS-1454939 and DMS-2012286.

Appendix A. Formulas of the roots of characteristic polynomial

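Here we provide the formulas for $r_i$ appearing in (141). Let $\epsilon = c_2 h^2$. Then we have

$$
\begin{aligned}
r_1 &= 1 + \epsilon + \Big(\frac{2}{3}\Big)^{1/3} \frac{\epsilon (2 + 2h + 3\epsilon)}{R} + \frac{1}{2^{1/3} \times 3^{2/3}}\, R, \\
r_2 &= 1 + \epsilon - \frac{(1 + \sqrt{3}\, i)\, \epsilon (2 + 2h + 3\epsilon)}{2^{2/3} \times 3^{1/3}\, R} + \frac{i (i + \sqrt{3})}{2 \times 2^{1/3} \times 3^{2/3}}\, R, \\
r_3 &= 1 + \epsilon + \frac{i (i + \sqrt{3})\, \epsilon (2 + 2h + 3\epsilon)}{2^{2/3} \times 3^{1/3}\, R} - \frac{1 + \sqrt{3}\, i}{2 \times 2^{1/3} \times 3^{2/3}}\, R,
\end{aligned}
\tag{A.1}
$$

where

$$
R = \Big[ \epsilon \Big( 9h + 18\epsilon + 18h\epsilon + 18\epsilon^2 + \sqrt{3}\, \sqrt{-4\epsilon (2 + 2h + 3\epsilon)^3 + 27 \big(h + 2h\epsilon + 2\epsilon(1 + \epsilon)\big)^2} \Big) \Big]^{1/3}. \tag{A.2}
$$

When $h$ is small, it can be verified that $R \sim O(h)$. One can then see from (A.1) that $r_i = 1 + O(h)$ for $i = 1, 2, 3$.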
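The root formulas can be cross-checked numerically. The sketch below builds the characteristic polynomial of (140), evaluates the roots through Cardano's formula written with the quantity $R$ of (A.2), and also verifies that a sequence generated by (137) satisfies the summation-free recurrence (140). The index shift $m = 0$, the function names, and the sample values of `c2`, `h` and `noise` (standing in for $c_3 h/\sqrt{N_s}$) are assumptions of this sketch.

```python
import numpy as np

def char_poly(c2, h):
    """Coefficients of r^3 - (3+3e) r^2 + (3+4e-2eh) r - (1+e-eh), e = c2 h^2."""
    eps = c2 * h**2
    return [1.0, -(3 + 3 * eps), 3 + 4 * eps - 2 * eps * h, -(1 + eps - eps * h)]

def cardano_roots(c2, h):
    """Roots 1 + eps + s, where s solves the depressed cubic s^3 - p s - q = 0."""
    eps = c2 * h**2
    P = 2 + 2 * h + 3 * eps                     # p = eps * P
    Q = h + 2 * eps + 2 * h * eps + 2 * eps**2  # q = eps * Q
    disc = complex(27 * Q**2 - 4 * eps * P**3)
    R = (eps * (9 * Q + np.sqrt(3) * np.sqrt(disc))) ** (1 / 3)
    u = R / (2 ** (1 / 3) * 3 ** (2 / 3))
    v = (2 / 3) ** (1 / 3) * eps * P / R        # chosen so that u * v = p / 3
    w = (-1 + 1j * np.sqrt(3)) / 2              # primitive cube root of unity
    return np.array([1 + eps + u + v,
                     1 + eps + w * u + np.conj(w) * v,
                     1 + eps + np.conj(w) * u + w * v])

def sequence_137(n_steps, c2, h, noise):
    """Sequence A_0 = 0, A_1, ... from the summation recurrence, with m = 0."""
    A = [0.0]
    for j in range(n_steps):
        s = sum((2 + (j + 1 - i) * h) * A[i] for i in range(1, j))
        A.append((1 + 3 * c2 * h**2) * A[j] + c2 * h**2 * s + noise)
    return A

c2, h = 22.0, 0.01
coeffs = char_poly(c2, h)
roots = cardano_roots(c2, h)
A = sequence_137(30, c2, h, noise=1e-3)
residuals = [A[j + 2] + coeffs[1] * A[j + 1] + coeffs[2] * A[j] + coeffs[3] * A[j - 1]
             for j in range(1, len(A) - 2)]
```

For these sample values all three roots lie within a distance of order $h$ from 1, in line with the growth factor $1 + \theta_1 \sqrt{P_1}\, h$ used in (142).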

Appendix B. Proofs related to the bias estimation of the inchworm Monte Carlo method

In this appendix, we would like to complete the proof of the bias estimation for the inchworm Monte Carlo method. Specifically, the proofs of (108)(109) in Lemma 5 and the proof of (34) in Theorem 3 will be given below. The final result (38) can be obtained directly by the triangle inequality.

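Proof of (108) and (109). (i) Estimate of $\|\mathbb{E}(\tilde K_1 - K_1)\|$:

We again use the relation (112) to get

$$
\mathbb{E}_{\vec{\boldsymbol s}}(\tilde K_1 - K_1) = F_1(\tilde{\boldsymbol{g}}_{n,m}) - F_1(\boldsymbol{g}_{n,m}). \tag{B.1}
$$

Then by Taylor expansion, we get

$$
\begin{aligned}
\mathbb{E}\big( \tilde K_1^{(rs)} - K_1^{(rs)} \big) ={}& \mathbb{E}\Big[ \nabla F_1^{(rs)}(\boldsymbol{g}_{n,m})^T \cdot \big( \vec{\tilde{\boldsymbol{g}}}_{n,m} - \vec{\boldsymbol{g}}_{n,m} \big) \Big] \\
&+ \frac12 \mathbb{E}\Big[ \big( \vec{\tilde{\boldsymbol{g}}}_{n,m} - \vec{\boldsymbol{g}}_{n,m} \big)^T\, \nabla^2 F_1^{(rs)}(\boldsymbol{\xi}_{n,m})\, \big( \vec{\tilde{\boldsymbol{g}}}_{n,m} - \vec{\boldsymbol{g}}_{n,m} \big) \Big],
\end{aligned}
\tag{B.2}
$$

where $\boldsymbol{\xi}_{n,m}$ is a convex combination of $\tilde{\boldsymbol{g}}_{n,m}$ and $\boldsymbol{g}_{n,m}$.

The estimate for the first term on the right-hand side of the equation above is similar to (116), which is

$$
\Big| \mathbb{E}\Big[ \nabla F_1^{(rs)}(\boldsymbol{g}_{n,m})^T \cdot \big( \vec{\tilde{\boldsymbol{g}}}_{n,m} - \vec{\boldsymbol{g}}_{n,m} \big) \Big] \Big| \le 4 P_1(t_{n-m})\, h \sum_{i=1}^{n-m} \big(2 + (n-m-i)h\big)\, \big\| \mathbb{E}(\Delta \boldsymbol{g}_{n,m}) \big\|_{\Gamma_{n,m}(i)}. \tag{B.3}
$$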

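For the second term on the right-hand side of (B.2), we use Proposition 6 to bound it by

$$
\begin{aligned}
&\bigg| \mathbb{E}\Big[ \big( \vec{\tilde{\boldsymbol{g}}}_{n,m} - \vec{\boldsymbol{g}}_{n,m} \big)^T\, \nabla^2 F_1^{(rs)}(\boldsymbol{\xi}_{n,m})\, \big( \vec{\tilde{\boldsymbol{g}}}_{n,m} - \vec{\boldsymbol{g}}_{n,m} \big) \Big] \bigg| \\
&\quad= \Bigg| \mathbb{E}\Bigg[ \sum_{\substack{(k_1,\ell_1) \in \Omega_{n,m}; \\ (k_2,\ell_2) \in \Omega_{n,m}}}\ \sum_{\substack{p_1,q_1 = 1,2; \\ p_2,q_2 = 1,2}} \frac{\partial^2 F_1^{(rs)}(\boldsymbol{\xi}_{n,m})}{\partial G^{(p_1q_1)}_{k_1,\ell_1}\, \partial G^{(p_2q_2)}_{k_2,\ell_2}}\, \Delta G^{(p_1q_1)}_{k_1,\ell_1}\, \Delta G^{(p_2q_2)}_{k_2,\ell_2} \Bigg] \Bigg| \\
&\quad\le \Bigg( \sum_{\substack{(k_1,\ell_1) \in \partial\Omega_{n,m}; \\ (k_2,\ell_2) \in \partial\Omega_{n,m}}} + \sum_{\substack{(k_1,\ell_1) \in \partial\Omega_{n,m}; \\ (k_2,\ell_2) \in \mathring{\Omega}_{n,m}}} + \sum_{\substack{(k_1,\ell_1) \in \mathring{\Omega}_{n,m}; \\ (k_2,\ell_2) \in \partial\Omega_{n,m}}} + \sum_{\substack{(k_1,\ell_1) \in \mathring{\Omega}_{n,m}; \\ (k_2,\ell_2) \in \mathring{\Omega}_{n,m}}} \Bigg) \sum_{\substack{p_1,q_1 = 1,2; \\ p_2,q_2 = 1,2}} \mathbb{E}\Bigg| \frac{\partial^2 F_1^{(rs)}(\boldsymbol{\xi}_{n,m})}{\partial G^{(p_1q_1)}_{k_1,\ell_1}\, \partial G^{(p_2q_2)}_{k_2,\ell_2}} \Bigg| \cdot \max_{\substack{(k,\ell) \in \Omega_{n,m}; \\ p,q = 1,2}} \mathbb{E}\big| \Delta G^{(pq)}_{k,\ell} \big|^2 \\
&\quad\le \bar\alpha(t_{n-m}) \Big[ \mathcal{N}^{(\mathrm{std})}_{\Omega_{n,m}}(\Delta \boldsymbol{g}_{n,m}) \Big]^2,
\end{aligned}
\tag{B.4}
$$

where

$$
\bar\alpha(t) = 16 P_2(t) \Big( 10t + 16t^2 + 5t^3 + \frac14 t^4 \Big).
$$

Here we first need to count the number of the nodes in the sets: $|\partial\Omega_{n,m}| = 2(n-m) - 1$ and $|\mathring{\Omega}_{n,m}| = \frac12 (n-m-1)(n-m-2)$. Then the last "$\le$" above is done by combining the second-order derivatives of different magnitudes given in Proposition 6 with the corresponding number of such derivatives. For example, the first summation in the third line above together with the second-order derivatives such that conditions (a)-(d) are satisfied will contribute $P_2(t_{n-m})\, h \cdot \big(4 + 2 \times 3(n-m)\big) \le P_2(t_{n-m}) \cdot 10\, t_{n-m}$ to the final estimate in the last line. A similar analysis applies to the other three summations.

Inserting the two estimates above into (B.2) gives us the evaluation for $\|\mathbb{E}(\tilde K_1 - K_1)\|$ as (108).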

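(ii) Estimate of $\|\mathbb{E}(\tilde K_2 - K_2)\|$:

Using the same method as the estimation of $\|\mathbb{E}(\tilde K_1 - K_1)\|$, we have

$$
\begin{aligned}
\big\| \mathbb{E}(\tilde K_2 - K_2) \big\| \le{}& 4 P_1(t_{n-m+1})\, h \Bigg\{ \big\| \mathbb{E}(\tilde G^*_{n+1,m} - G^*_{n+1,m}) \big\| + \sum_{i=1}^{n-m} \big(2 + (n-m+1-i)h\big)\, \big\| \mathbb{E}(\Delta \boldsymbol{g}^*_{n,m}) \big\|_{\Gamma^*_{n,m}(i)} \Bigg\} \\
&+ \bar\alpha(t_{n-m+1}) \Big[ \mathcal{N}^{(\mathrm{std})}_{\Omega^*_{n,m}}(\Delta \boldsymbol{g}^*_{n,m}) \Big]^2,
\end{aligned}
\tag{B.5}
$$

which is similar to (108). By the definition of $G^*_{n+1,m}$ and $\tilde G^*_{n+1,m}$ given in (22) and (25) respectively, we can estimate $\|\mathbb{E}(\tilde G^*_{n+1,m} - G^*_{n+1,m})\|$ as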

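$$
\begin{aligned}
&\big\| \mathbb{E}(\tilde G^*_{n+1,m} - G^*_{n+1,m}) \big\| \\
&\quad\le \big\| \mathbb{E}\big( (I + \operatorname{sgn}(t_n - t)\, i H_s h)\, \Delta G_{n,m} \big) \big\| + h \big\| \mathbb{E}(\tilde K_1 - K_1) \big\| \\
&\quad\le \Big(1 + \frac12 \mathscr{H}^2 h^2\Big) \big\| \mathbb{E}(\Delta G_{n,m}) \big\| + 8 P_1(t_{n-m})\, h^2 \sum_{i=1}^{n-m} \big(2 + (n-m-i)h\big)\, \big\| \mathbb{E}(\Delta \boldsymbol{g}_{n,m}) \big\|_{\Gamma_{n,m}(i)} \\
&\qquad + \bar\alpha(t_{n-m})\, h \Big[ \mathcal{N}^{(\mathrm{std})}_{\Omega_{n,m}}(\Delta \boldsymbol{g}_{n,m}) \Big]^2,
\end{aligned}
\tag{B.6}
$$

where we have applied (108) in the last inequality. Thus it remains only to bound

$$
\Big[ \mathcal{N}^{(\mathrm{std})}_{\Omega^*_{n,m}}(\Delta \boldsymbol{g}^*_{n,m}) \Big]^2 = \max\bigg\{ \Big[ \mathcal{N}^{(\mathrm{std})}_{\bar\Omega_{n,m}}(\Delta \boldsymbol{g}^*_{n,m}) \Big]^2,\ \mathbb{E}\big( \| \tilde G^*_{n+1,m} - G^*_{n+1,m} \|^2 \big) \bigg\}, \tag{B.7}
$$

for which we just need to focus on the estimation of $\mathbb{E}(\|\tilde G^*_{n+1,m} - G^*_{n+1,m}\|^2)$. Again by the definitions (22) and (25), we have

$$
\mathbb{E}\big( \| \tilde G^*_{n+1,m} - G^*_{n+1,m} \|^2 \big) \le 2 (1 + \mathscr{H}^2 h^2)\, \mathbb{E}(\|\Delta G_{n,m}\|^2) + 2h^2\, \mathbb{E}(\|\tilde K_1 - K_1\|^2). \tag{B.8}
$$

By (110), we can estimate $\mathbb{E}(\|\tilde K_1 - K_1\|^2)$ as

$$
\begin{aligned}
\mathbb{E}(\|\tilde K_1 - K_1\|^2)
&\le 128 P_1^2(t_{n-m})\, h^2 \Bigg\{ \sum_{i=1}^{n-m} \big(2 + (n-m-i)h\big)\, \mathcal{N}^{(\mathrm{std})}_{\Gamma_{n,m}(i)}(\Delta \boldsymbol{g}_{n,m}) \Bigg\}^2 + 8\bar\gamma(t_{n-m}) \cdot \frac{1}{N_s} \\
&\le 128 P_1^2(t_{n-m}) (2 + t_{n-m})^2\, t_{n-m}^2 \Big[ \mathcal{N}^{(\mathrm{std})}_{\Omega_{n,m}}(\Delta \boldsymbol{g}_{n,m}) \Big]^2 + 8\bar\gamma(t_{n-m}) \cdot \frac{1}{N_s}.
\end{aligned}
\tag{B.9}
$$

Therefore

$$
\begin{aligned}
&\mathbb{E}\big( \| \tilde G^*_{n+1,m} - G^*_{n+1,m} \|^2 \big) \\
&\quad\le 2 (1 + \mathscr{H}^2 h^2)\, \mathbb{E}(\|\Delta G_{n,m}\|^2) + 256 P_1^2(t_{n-m}) (2 + t_{n-m})^2\, t_{n-m}^2\, h^2 \Big[ \mathcal{N}^{(\mathrm{std})}_{\Omega_{n,m}}(\Delta \boldsymbol{g}_{n,m}) \Big]^2 + 16\bar\gamma(t_{n-m}) \cdot \frac{h^2}{N_s} \\
&\quad\le 2 \Big( 1 + \big( \mathscr{H}^2 + 128 P_1^2(t_{n-m}) (2 + t_{n-m})^2\, t_{n-m}^2 \big) h^2 \Big) \Big[ \mathcal{N}^{(\mathrm{std})}_{\Omega_{n,m}}(\Delta \boldsymbol{g}_{n,m}) \Big]^2 + 16\bar\gamma(t_{n-m}) \cdot \frac{h^2}{N_s} \\
&\quad\le 4 \Big[ \mathcal{N}^{(\mathrm{std})}_{\Omega_{n,m}}(\Delta \boldsymbol{g}_{n,m}) \Big]^2 + 16\bar\gamma(t_{n-m}) \cdot \frac{h^2}{N_s}
\end{aligned}
$$

if $h \le 1 \Big/ \sqrt{\mathscr{H}^2 + 128 P_1^2(t_{n-m}) (2 + t_{n-m})^2\, t_{n-m}^2}$. Inserting this inequality into (B.7), one obtains

$$
\Big[ \mathcal{N}^{(\mathrm{std})}_{\Omega^*_{n,m}}(\Delta \boldsymbol{g}^*_{n,m}) \Big]^2 \le 4 \Big[ \mathcal{N}^{(\mathrm{std})}_{\bar\Omega_{n,m}}(\Delta \boldsymbol{g}^*_{n,m}) \Big]^2 + 16\bar\gamma(t_{n-m}) \cdot \frac{h^2}{N_s}. \tag{B.10}
$$

Finally, the estimate (109) can be obtained by inserting the estimates (B.6) and (B.10) into (B.5) and requiring $h \le \frac{1}{\sqrt{2 P_1(t_{n-m+1})}}$.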

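Proof of (34). By (104), we estimate the bias by

$$
\begin{aligned}
\big\| \mathbb{E}(\Delta G_{n+1,m}) \big\| &\le \big\| A_{n,m}(h)\, \mathbb{E}(\Delta G_{n,m}) \big\| + \frac12 h \big\| B_{n,m}(h)\, \mathbb{E}(\tilde K_1 - K_1) \big\| + \frac12 h \big\| \mathbb{E}(\tilde K_2 - K_2) \big\| \\
&\le \Big(1 + \frac18 \mathscr{H}^4 h^4\Big) \big\| \mathbb{E}(\Delta G_{n,m}) \big\| + h \big\| \mathbb{E}(\tilde K_1 - K_1) \big\| + \frac12 h \big\| \mathbb{E}(\tilde K_2 - K_2) \big\|.
\end{aligned}
\tag{B.11}
$$

Now we can insert (108) and (109) into the above equation to get the recurrence relation stated in (47), where the error $\mathcal{N}^{(\mathrm{std})}_{\bar\Omega_{n,m}}(\Delta \boldsymbol{g}^*_{n,m})$ can be bounded by (35), resulting in

$$
\begin{aligned}
\big\| \mathbb{E}(\Delta G_{n+1,m}) \big\|
\le{}& \Big(1 + \frac18 \mathscr{H}^4 h^4\Big) \big\| \mathbb{E}(\Delta G_{n,m}) \big\| + 22 P_1(t_{n-m+1})\, h^2 \sum_{i=1}^{n-m} \big(2 + (n-m+1-i)h\big)\, \big\| \mathbb{E}(\Delta \boldsymbol{g}^*_{n,m}) \big\|_{\Gamma^*_{n,m}(i)} \\
&+ \bigg( \frac72\, \bar\alpha(t_{n-m+1})\, \theta_2^2\, \bar\gamma(t_{n-m+1})\, e^{2\theta_1 \sqrt{P_1(t_{n-m+1})}\, t_{n-m+1}} \cdot \frac{h^2}{N_s} + 8 \bar\alpha(t_{n-m+1})\, \bar\gamma(t_{n-m+1}) \cdot \frac{h^3}{N_s} \bigg) \\
\le{}& \Big(1 + \frac18 \mathscr{H}^4 h^4\Big) \big\| \mathbb{E}(\Delta G_{n,m}) \big\| + 22 P_1(t_{n-m+1})\, h^2 \sum_{i=1}^{n-m} \big(2 + (n-m+1-i)h\big)\, \big\| \mathbb{E}(\Delta \boldsymbol{g}^*_{n,m}) \big\|_{\Gamma^*_{n,m}(i)} \\
&+ 4 \bar\alpha(t_{n-m+1})\, \theta_2^2\, \bar\gamma(t_{n-m+1})\, e^{2\theta_1 \sqrt{P_1(t_{n-m+1})}\, t_{n-m+1}} \cdot \frac{h^2}{N_s}
\end{aligned}
\tag{B.12}
$$

upon assuming $h \le \frac{1}{16}$.

We notice that the above inequality is simply the recurrence relation (173) with the last term changed. Therefore, we can repeat the analysis applied to (173) and find the following estimate:

$$
\big\| \mathbb{E}(\Delta G_{n+1,m}) \big\| \le 4 \theta_2^2\, \bar\alpha(t_{n-m+1})\, \bar\gamma(t_{n-m+1})\, e^{2\theta_1 \sqrt{P_1(t_{n-m+1})}\, t_{n-m+1}} \big(1 + \theta_1 \sqrt{P_1(t_{n-m+1})}\, h\big)^{n-m+1} \cdot \frac{h}{N_s}, \tag{B.13}
$$

which leads to the final estimate for the bias stated in (34).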

References

[1] M. Asano, I. Basieva, A. Khrennikov, M. Ohya, Y. Tanaka, and I. Yamato. Quantum Information Biology: From Theory of Open Quantum Systems to Adaptive Dynamics, chapter 18, pages 399–414. World Scientific, 2016.
[2] G. A. Bird. Approach to translational equilibrium in a rigid sphere gas. Phys. Fluids, 6(10):1518–1519, 1963.
[3] G. A. Bird. Molecular Gas Dynamics and the Direct Simulation of Gas Flows. Oxford: Clarendon Press, 1994.
[4] Z. Cai and J. Lu. A quantum kinetic Monte Carlo method for quantum many-body spin dynamics. SIAM J. Sci. Comput., 40(3):B706–B722, 2018.
[5] Z. Cai, J. Lu, and S. Yang. Inchworm Monte Carlo method for open quantum systems. To appear in Comm. Pure Appl. Math.
[6] E. Campbell. Random compiler for fast Hamiltonian simulation. Phys. Rev. Lett., 123:070503, 2019.
[7] H.-T. Chen, G. Cohen, and D. R. Reichman. Inchworm Monte Carlo for exact non-adiabatic dynamics. I. Theory and algorithms. J. Chem. Phys., 146:054105, 2017.
[8] H.-T. Chen, G. Cohen, and D. R. Reichman. Inchworm Monte Carlo for exact non-adiabatic dynamics. II. Benchmarks and comparison with established methods. J. Chem. Phys., 146:054106, 2017.
[9] G. Cohen, E. Gull, D. R. Reichman, and A. J. Millis. Taming the dynamical sign problem in real-time evolution of quantum many-body problems. Phys. Rev. Lett., 115(26):266802, 2015.
[10] M. Cristoforetti, F. Di Renzo, and L. Scorzato. New approach to the sign problem in quantum field theories: High density QCD on a Lefschetz thimble. Phys. Rev. D, 86:074506, 2012.
[11] Q. Dong, I. Krivenko, J. Kleinhenz, A. E. Antipov, G. Cohen, and E. Gull. Quantum Monte Carlo solution of the dynamical mean field equations in real time. Phys. Rev. B, 96:155126, 2017.
[12] F. J. Dyson. The radiation theories of Tomonaga, Schwinger, and Feynman. Phys. Rev., 75(3):486–502, 1949.
[13] E. Eidelstein, E. Gull, and G. Cohen. Multiorbital quantum impurity solver for general interactions and hybridizations. Phys. Rev. Lett., 124(20):206405, 2020.
[14] M. Esposito, U. Harbola, and S. Mukamel. Nonequilibrium fluctuations, fluctuation theorems, and counting statistics in quantum systems. Rev. Mod. Phys., 81(4):1665–1702, 2009.
[15] E. Hairer, S. P. Nørsett, and G. Wanner. Solving Ordinary Differential Equations I (2nd Revised Ed.): Nonstiff Problems. Springer-Verlag, Berlin, Heidelberg, 1993.
[16] W. Hu, C. J. Li, L. Li, and J.-G. Liu. On the diffusion approximation of nonconvex stochastic gradient descent. Ann. Math. Sci. Appl., 4(1):3–32, 2019.
[17] A. Ishizaki and Y. Tanimura. Quantum dynamics of system strongly coupled to low-temperature colored noise bath: Reduced hierarchy equations approach. J. Phys. Soc. Japan, 74(12):3131–3134, 2005.
[18] S. Jin, L. Li, and J.-G. Liu. Random batch methods (RBM) for interacting particle systems. J. Comput. Phys., 400:108877, 2020.
[19] L. Li, Z. Xu, and Y. Zhao. A random-batch Monte Carlo method for many-body systems with singular kernels. SIAM J. Sci. Comput., 42(3):A1486–A1509, 2020.
[20] Q. Li, C. Tai, and W. E. Stochastic modified equations and adaptive stochastic gradient algorithms. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 2101–2110, International Convention Centre, Sydney, Australia, 2017.
[21] Y. Li and J. Lu. Bold diagrammatic Monte Carlo in the lens of stochastic iterative methods. Trans. Math. Appl., 3:1–17, 2019.
[22] G. Lindblad. On the generators of quantum dynamical semigroups. Comm. Math. Phys., 48(2):119–130, 1976.
[23] E. Y. Loh Jr., J. E. Gubernatis, R. T. Scalettar, S. R. White, D. J. Scalapino, and R. L. Sugar. Sign problem in the numerical simulation of many-electron systems. Phys. Rev. B, 41(13):9301–9307, 1990.
[24] D. MacKernan, R. Kapral, and G. Ciccotti. Sequential short-time propagation of quantum-classical dynamics. J. Phys.: Condens. Matter, 14(40):9069–9076, 2002.
[25] N. Makri. Improved Feynman propagators on a grid and non-adiabatic corrections within the path integral framework. Chem. Phys. Lett., 193(5):435–445, 1992.
[26] N. Makri. On smooth Feynman propagators for real time path integrals. J. Phys. Chem., 97(10):2417–2424, 1993.
[27] N. Makri. The linear response approximation and its lowest order corrections: An influence functional approach. J. Phys. Chem. B, 103(15):2823–2829, 1999.
[28] N. Makri. Iterative blip-summed path integral for quantum dynamics in strongly dissipative environments. J. Chem. Phys., 146(13):134101, 2017.
[29] N. Makri and D. E. Makarov. Tensor propagator for iterative quantum time evolution of reduced density matrices. I. Theory. J. Chem. Phys., 102(11):4600–4610, 1995.
[30] L. Mancino, V. Cavina, A. De Pasquale, M. Sbroscia, R. I. Booth, E. Roccia, I. Gianani, V. Giovannetti, and M. Barbieri. Geometrical bounds on irreversibility in open quantum systems. Phys. Rev. Lett., 121(16):160602, 2018.
[31] L. Mühlbacher and E. Rabani. Real-time path integral approach to nonequilibrium many-body quantum systems. Phys. Rev. Lett., 100(17):176403, 2008.
[32] L. Mühlbacher and E. Rabani. Diagrammatic Monte Carlo simulation of nonequilibrium systems. Phys. Rev. B, 79(3):035320, 2009.
[33] S. Nakajima. On quantum theory of transport phenomena. Prog. Theo. Phys., 20(6):948–959, 1958.
[34] N. Prokof'ev and B. Svistunov. Bold diagrammatic Monte Carlo technique: When the sign problem is welcome. Phys. Rev. Lett., 99(25):250201, 2007.
[35] M. Ridley, V. N. Singh, E. Gull, and G. Cohen. Numerically exact full counting statistics of the nonequilibrium Anderson impurity model. Phys. Rev. B, 97(11):115109, 2018.
[36] M. Schiró. Real-time dynamics in quantum impurity models with diagrammatic Monte Carlo. Phys. Rev. B, 81(8):085126, 2010.
[37] P. W. Shor. Scheme for reducing decoherence in quantum computer memory. Phys. Rev. A, 52(4):R2493–R2496, 1995.
[38] S. Waldron. The error in linear interpolation at the vertices of a simplex. SIAM J. Numer. Anal., 35(3):1191–1200, 1996.
[39] T. Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the Twenty-First International Conference on Machine Learning, page 116, New York, NY, USA, 2004. Association for Computing Machinery.
[40] R. Zwanzig. Ensemble method in the theory of irreversibility. J. Chem. Phys., 33(5):1338–1341, 1960.