© 2019 Author(s). The Journal of Chemical Physics, scitation.org/journal/jcp

A new scheme for fixed node diffusion quantum Monte Carlo with pseudopotentials: Improving reproducibility and reducing the trial-wave-function bias

Cite as: J. Chem. Phys. 151, 134105 (2019); doi: 10.1063/1.5119729 Submitted: 15 July 2019 • Accepted: 12 September 2019 • Published Online: 1 October 2019

Andrea Zen,1,2,3,4,a) Jan Gerit Brandenburg,1,5 Angelos Michaelides,1,2,3,b) and Dario Alfè1,2,4,6,c)

AFFILIATIONS
1 Thomas Young Centre, University College London, London WC1H 0AH, United Kingdom
2 London Centre for Nanotechnology, Gordon St., London WC1H 0AH, United Kingdom
3 Department of Physics and Astronomy, University College London, London WC1E 6BT, United Kingdom
4 Department of Earth Sciences, University College London, London WC1E 6BT, United Kingdom
5 Interdisciplinary Center for Scientific Computing, University of Heidelberg, Im Neuenheimer Feld 205A, 69120 Heidelberg, Germany
6 Dipartimento di Fisica Ettore Pancini, Università di Napoli Federico II, Monte S. Angelo, I-80126 Napoli, Italy

a)Electronic mail: [email protected] b)Electronic mail: [email protected] c)Electronic mail: [email protected]

ABSTRACT

Fixed node diffusion quantum Monte Carlo (FN-DMC) is an increasingly used computational approach for investigating the electronic structure of molecules, solids, and surfaces with controllable accuracy. It stands out among equally accurate electronic structure approaches for its favorable cubic scaling with system size, which often makes FN-DMC the only computationally affordable high-quality method in large condensed phase systems with more than 100 atoms. In such systems, FN-DMC deploys pseudopotentials (PPs) to substantially improve efficiency. In order to deal with nonlocal terms of PPs, the FN-DMC algorithm must use an additional approximation, leading to the so-called localization error. However, the two available approximations, the locality approximation (LA) and the T-move approximation (TM), have certain disadvantages and can make DMC calculations difficult to reproduce. Here, we introduce a third approach, called the determinant localization approximation (DLA). DLA eliminates reproducibility issues and systematically provides good quality results and stable simulations that are slightly more efficient than LA and TM. When calculating energy differences, such as interaction and ionization energies, DLA is also more accurate than the LA and TM approaches. We believe that DLA paves the way to the automation of FN-DMC and its much easier application in large systems.

Published under license by AIP Publishing. https://doi.org/10.1063/1.5119729

I. INTRODUCTION

A wide range of scientific topics greatly benefit from computer simulations, such as crystal polymorph prediction, molecular adsorption on surfaces, the assessment of phase diagrams, phase transitions, nucleation, and more. The accuracy of the computational methods employed for such simulations is of fundamental importance. In a wide range of physical-chemical problems, many important static, dynamic, and thermodynamic properties are related to the potential energy surface. Thus, one of the grand challenges of computational modeling is the evaluation of accurate energetics for molecules, surfaces, and solids. This challenge is far
from straightforward because various types and strengths of interatomic and intermolecular interactions are relevant, and a method must describe all of them correctly.

There are various computational methods acknowledged for having high accuracy. For condensed phase systems, one very promising methodology is quantum Monte Carlo (QMC),1 often in the fixed-node (FN) diffusion Monte Carlo (DMC) flavor. FN-DMC has favorable scaling with system size (between the third and the fourth power of the system size), and it can be efficiently deployed on high performance computer facilities. Nowadays, there is an increasing amount of benchmark data for solids and surfaces obtained via FN-DMC.2–20 Data provided by DMC are of use in tackling interesting materials science problems and also in helping to improve density functional theory (DFT)21 and other cheaper computational approaches.

DMC implements a technique to project out the exact ground state $\Phi$ from a trial wave function $\Psi_T$ by performing a propagation according to the imaginary time-dependent Schrödinger equation.1 However, an unconstrained projection leads to a bosonic wave function, so in fermionic systems, the fixed-node (FN) approximation is typically employed to keep the projected wave function antisymmetric. FN-DMC constrains the projected wave function $\Phi_{FN}$ to have the same nodes as a trial wave function $\Psi_T$.22,99–105 Thus, $\Phi_{FN}$ is as close as possible to the exact (unknown) fermionic ground state $\Phi$, given the nodal constraints, and the equality is reached if $\Psi_T$ has exact nodes. In addition to the FN approximation, in most practical DMC simulations, pseudopotentials (PPs) are used because the core electrons in atoms significantly increase the computational cost of FN-DMC simulations.23–25 There are also a few other technical aspects of FN-DMC that can affect its accuracy and efficiency, such as the actual implementation of the imaginary time-dependent Schrödinger propagation for a finite time step.26–28 However, in general the most important and sizeable approximations in FN-DMC arise from the fixed-node constraint and the use of pseudopotentials.

The use of pseudopotentials in FN-DMC brings a twofold approximation. The first and trivial source of approximation is due to the fact that no pseudopotential is perfect. Pseudopotentials can represent the influence of the core electrons only implicitly and approximately, and any method employing pseudopotentials will be affected by this issue. The second source of approximation is more subtle. PPs have nonlocal operators, i.e., terms $\hat V_{NL}$, such that their application on a generic function $\xi(\mathbf{R})$ of the electronic coordinates $\mathbf{R}$ gives $\hat V_{NL}\xi(\mathbf{R}) = \int \langle \mathbf{R}|\hat V_{NL}|\mathbf{R}'\rangle\, \xi(\mathbf{R}')\, d\mathbf{R}'$. The nonlocal PP operators are a big issue in FN-DMC simulations. Indeed, one of the quantities that should be evaluated in the FN-DMC projection is the value of the nonlocal terms applied to the projected wave function, $\hat V_{NL}\Phi_{FN}$, which cannot be calculated as we do not know the functional form of $\Phi_{FN}$. There have been attempts to circumvent the difficulty, but the issue persists.29 There has been no satisfactory way to deal with the nonlocal PP terms within FN-DMC exactly, and it is necessary to rely on some approximation. So far, there are two alternatives: the locality approximation30 (LA) and the T-move approximation29,31 (TM). In the former, the trial wave function $\Psi_T$ is used to localize the nonlocal PP terms, so the unknown term $\Phi_{FN}^{-1}\hat V_{NL}\Phi_{FN}$ is approximated with $\Psi_T^{-1}\hat V_{NL}\Psi_T$. In the latter, only the terms in $\hat V_{NL}$ yielding a sign problem are localized using $\Psi_T$.32 In both LA and TM, there is a localization error. Both LA and TM are used in production calculations, and it is unclear which is better.33,106–109

Both LA and TM have a big problem: reproducibility. As discussed above, the localization error arises from the projection of all or part of $\hat V_{NL}$ on $\Psi_T$. With either LA or TM, FN-DMC will produce different results with different $\Psi_T$, even if the nodes are unchanged. Unfortunately, $\Psi_T$ has a level of arbitrariness in the way it can be defined because it is not straightforward to tell what the optimal choice for $\Psi_T$ is. Ideally, we want a $\Psi_T$ that is as close as possible to the exact ground state (which is unknown) and for which the ratio $\Psi_T(\mathbf{R})/\Psi_T(\mathbf{R}')$ is quickly evaluated computationally. In practice, $\Psi_T$ can have many different functional forms, and different QMC packages often use different forms, exacerbating the reproducibility problem. Even within a specific implementation of $\Psi_T$, there are parameters to be set in some way, and there is no unique way to do this. Moreover, in large systems the number of parameters increases rapidly, leading to additional difficulties. For this reason, it is not uncommon to find differences in FN-DMC results from nominally similar studies. This contrasts with other electronic structure techniques where agreement between different studies is relatively straightforward to achieve. Resolving issues such as this is critical to reducing the labor-intensive nature of DMC simulations and making DMC easier to use in general.

In this paper, we introduce a new approach to deal with the nonlocal potential terms in FN-DMC which resolves the reproducibility problem. We call this method the determinant localization approximation (DLA). In DLA, as explained below, the key point is to do the localization on just the determinant part of the trial wave function. The localization error in DLA is not eliminated, but it can be reproduced systematically across different implementations of $\Psi_T$ and in different QMC packages. To this aim, the localization only uses the part of the trial wave function that can be obtained deterministically: the determinant part, which fixes the nodes. In this way, the reproducibility of DLA is guaranteed by construction.

It remains to be tested whether DLA yields results competitive with LA and TM. Note that with PPs, we are always interested in energy differences and not in absolute energies. So, the most accurate method is not necessarily the one with the smallest absolute localization error, but the method that consistently makes the same localization error across different configurations of the same system, such that there is the largest error cancellation in the energy difference. By construction, DLA makes very consistent localization errors. Indeed, we observe in all the representative cases considered in this paper that DLA always yields accurate results, which are systematically better than LA and TM whenever $\Psi_T$ is not optimal. Moreover, we notice that DLA produces very stable simulations, in contrast to LA. In terms of efficiency, DLA appears slightly more efficient than both LA and TM. All these features make DLA the best candidate to perform FN-DMC calculations in large systems, where the quality of $\Psi_T$ could be hard to assess and a stable and efficient simulation is highly needed.

The outline of the paper is the following: we provide a short overview of the FN-DMC method in Sec. II; we describe our determinant localization approximation in Sec. III; we illustrate the results produced by DLA, compared with LA and TM, in Sec. IV, with a specific focus on interaction energy evaluations (Sec. IV A),
ionization energies (Sec. IV B), stability (Sec. IV C), and efficiency (Sec. IV D). A reader already familiar with DMC can skip to Sec. III. We draw our final conclusions in Sec. V.

II. OVERVIEW ON FIXED NODE DIFFUSION MONTE CARLO

A. The trial wave function

The trial wave function has a critical role in determining the accuracy of FN-DMC. A QMC trial wave function is the product $\Psi_T(\mathbf{R}) = D(\mathbf{R}) \exp J(\mathbf{R})$ of an antisymmetric function $D(\mathbf{R})$ and a symmetric (bosonic) function $\exp J(\mathbf{R})$, called the Jastrow factor, where $\mathbf{R}$ is the electronic configuration. The function $D(\mathbf{R})$ is typically a single Slater determinant, especially when large systems are simulated. However, it is worth mentioning that if the system under consideration is not too large (say, generally not more than a few atoms) better functions can be used, such as multideterminant expansions of Slater determinants,34–39 valence-bond wave functions,40 the antisymmetrized geminal product,41,42 the Pfaffian,43 and others (see, for instance, the review by Austin et al.44). Moreover, the backflow transformation45–47 can be employed to further improve any of the aforementioned ansätze, at the price of a significantly larger computational cost. The Jastrow factor describes the dynamical correlation between the electrons, by including explicit functions of the electron-electron distances. In DMC, a property of interest is the nodal surface of $\Psi_T$, which is the hypersurface corresponding to $\Psi_T(\mathbf{R}) = 0$, for real wave functions, or the complex phase of $\Psi_T$ for complex wave functions. They are both determined by $D(\mathbf{R})$ as the Jastrow factor can only alter the amplitude of $\Psi_T$. The Jastrow factor $J$ is implemented differently in different QMC packages.

When large and complex systems are simulated, such as adsorption on surfaces or molecular crystals, the most common practice is to obtain $D$ from a deterministic approach, usually DFT, and to decide a functional form for $J$ and optimize, within the variational Monte Carlo (VMC) scheme,1 the parameters minimizing either the energy or the variance. Since $D$ comes from a deterministic method, there is no reproducibility problem here, and in taking energy differences we can usually expect a large cancellation of the FN error. On the other hand, $J$ is optimized stochastically, so its parameters are affected by an optimization uncertainty. Dealing with this uncertainty becomes increasingly challenging as the system gets larger. Moreover, a new optimization of $J$ is needed for every distinct orientation of the molecular systems. Optimizing $J$ so frequently is tedious and time-consuming and, due to the stochastic nature of the optimization procedure, can lead to Jastrow factors of different qualities, resulting in less than optimal cancellation of errors. Human supervision of the optimization is always highly recommended, if not necessary. The optimization is responsible for making QMC labor-intensive and nonautomatic.

B. Diffusion Monte Carlo

The DMC algorithm with importance sampling performs a time evolution of $f(\mathbf{R}, t) = \Psi_T(\mathbf{R})\psi(\mathbf{R}, t)$, where $\Psi_T(\mathbf{R})$ is a trial wave function (described in Sec. II A), $\mathbf{R}$ are the $3N$-dimensional electronic coordinates, and $\psi(\mathbf{R}, t)$ is the solution at time $t$ of the imaginary time Schrödinger equation,

$$-\frac{\partial}{\partial t}\psi(\mathbf{R}, t) = (\hat H - E_T)\,\psi(\mathbf{R}, t), \qquad (1)$$

where $\hat H$ is the Hamiltonian and $E_T$ is a trial energy, with initial condition $\psi(\mathbf{R}, 0) = \Psi_T(\mathbf{R})$ and converging exponentially to the exact ground state $\Phi(\mathbf{R})$ for $t \to \infty$. Thus, $\lim_{t\to\infty} f(\mathbf{R}, t) = \Psi_T(\mathbf{R})\Phi(\mathbf{R})$. Since $\Phi$ is an eigenstate for $\hat H$, the ground state energy $E_0$ can be calculated using the mixed estimator,

$$E_0 = \frac{\langle \Phi|\hat H|\Psi_T\rangle}{\langle \Phi|\Psi_T\rangle} = \frac{\int \Phi(\mathbf{R})\Psi_T(\mathbf{R})E_L(\mathbf{R})\,d\mathbf{R}}{\int \Phi(\mathbf{R})\Psi_T(\mathbf{R})\,d\mathbf{R}}, \qquad (2)$$

where $E_L(\mathbf{R}) = \frac{\langle \mathbf{R}|\hat H|\Psi_T\rangle}{\langle \mathbf{R}|\Psi_T\rangle}$ is the local energy in the electronic configuration $\mathbf{R}$ for the trial wave function $\Psi_T$.

The time evolution of $f(\mathbf{R}, t)$ follows from the imaginary time Schrödinger equation (1), which in the integral form leads to

$$f(\mathbf{R}', t + \tau) = \int G(\mathbf{R}' \leftarrow \mathbf{R}, \tau)\, f(\mathbf{R}, t)\, d\mathbf{R}, \qquad (3)$$

where $\tau$ is the time step and $G(\mathbf{R}' \leftarrow \mathbf{R}, \tau)$ is the Green function for the importance sampling, which is defined (symbolically) as

$$G(\mathbf{R}' \leftarrow \mathbf{R}, \tau) \equiv \frac{\Psi_T(\mathbf{R}')}{\Psi_T(\mathbf{R})}\, \langle \mathbf{R}'|\exp(-\tau \hat H)|\mathbf{R}\rangle. \qquad (4)$$

Thus, by starting from $f(\mathbf{R}, 0) = \Psi_T(\mathbf{R})^2$ and performing an evolution according to the Green function $G(\mathbf{R}' \leftarrow \mathbf{R}, t)$, we are able to assess expectation values of the exact ground state $\Phi$,

$$\Phi(\mathbf{R}')\Psi_T(\mathbf{R}') = \lim_{t\to\infty} \int G(\mathbf{R}' \leftarrow \mathbf{R}, t)\, \Psi_T(\mathbf{R})^2\, d\mathbf{R}. \qquad (5)$$

This is the process implemented in the DMC algorithm. In fermionic systems, the fixed-node (FN) approximation is typically introduced, so the FN Hamiltonian $\hat H_{FN} \equiv \hat H + \hat V_{FN}$, where $\hat V_{FN}$ is an infinite wall at the nodal surface of $\Psi_T$, is used. Further details are reported in Appendix A.

The Hamiltonian $\hat H$ is the sum of the kinetic and potential operators $\hat K$ and $\hat V$, respectively. In all-electron calculations, the potential $\hat V$ is local, $\hat V = \hat V_L$. However, in general, there is the need to deploy pseudopotentials to represent the core electrons of the atoms and reduce the computational cost of the calculation; see Appendix B. In this case, the potential term has both local and nonlocal operators: $\hat V = \hat V_L + \hat V_{NL}$. The presence of nonlocal operators in the potential complicates the formulation of the DMC algorithm and forces the introduction of a further approximation. In the following, we will first consider the simple case of a potential with only local operators, Sec. II C, and later we will consider the case of a potential term with nonlocal operators, Sec. II D.

C. Green's function for $\hat H = \hat K + \hat V_L$

The simplest case is when the Hamiltonian has only a local potential term; thus, it can be written as $\hat H = \hat K + \hat V_L$, with
$\hat K = -\frac{1}{2}\nabla^2$. By substitution in the imaginary time Schrödinger equation [Eq. (1)], multiplication by $\Psi_T(\mathbf{R})$, and some algebraic operations, we obtain

$$\frac{\partial}{\partial t} f(\mathbf{R}, t) = \frac{1}{2}\nabla^2 f(\mathbf{R}, t) - \nabla \cdot \left(\mathbf{V}(\mathbf{R}) f(\mathbf{R}, t)\right) - [E_L(\mathbf{R}) - E_T]\, f(\mathbf{R}, t), \qquad (6)$$

where $\mathbf{V}(\mathbf{R}) = \nabla \log|\Psi_T(\mathbf{R})|$. Thus, the time evolution of $f(\mathbf{R}, t)$ is given on the right-hand side (RHS) of Eq. (6). If the RHS only had the first two terms, we would have a pure drift-diffusion process, having a Green's function that for a small time step $\tau$ and for $N$ electrons in the system can be approximated as $G_{DD}(\mathbf{R}' \leftarrow \mathbf{R}, \tau) = \frac{1}{(2\pi\tau)^{3N/2}} \exp\left\{-\frac{[\mathbf{R}' - \mathbf{R} - \tau \mathbf{V}(\mathbf{R})]^2}{2\tau}\right\}$. The last term on the RHS of Eq. (6) is the branching term, and its associated Green's function is $G_B(\mathbf{R}' \leftarrow \mathbf{R}, \tau) = \exp\left\{-\tau\, \frac{E_L(\mathbf{R}') + E_L(\mathbf{R}) - 2E_T}{2}\right\}$. Green's function of $f(\mathbf{R}, t)$ for a small time interval $\tau$ can be approximated48,49 as

$$G_{BDD}(\mathbf{R}' \leftarrow \mathbf{R}, \tau) \approx G_B(\mathbf{R}' \leftarrow \mathbf{R}, \tau)\, G_{DD}(\mathbf{R}' \leftarrow \mathbf{R}, \tau), \qquad (7)$$

which is exact for $\tau \to 0$. $G_{BDD}(\mathbf{R}' \leftarrow \mathbf{R}, \tau)$ can be used to approximate Green's function for an arbitrarily large time interval $t$.50 $G_{BDD}$ defines a branching-drift-diffusion process, as described, for instance, in Ref. 1. The algorithms implemented in QMC packages are usually a little more involved.51,110,111 However, there is no need here to complicate the picture further. We will be concerned with the results of DMC in the continuous limit $\tau \to 0$. In this limit, the only bias in the DMC energy evaluation $E_{FN}$ is given by the FN approximation. In particular, $E_{FN} \geq E_0$, with the equality reached if the nodes of $\Psi_T$ are exact.

D. Green's function for $\hat H = \hat K + \hat V_L + \hat V_{NL}$

When pseudopotentials are used, the potential term has nonlocal operators $\hat V_{NL}$, and the Hamiltonian can be written as $\hat H = \hat K + \hat V_L + \hat V_{NL}$. If we consider the imaginary time Schrödinger equation (1) and substitute $\hat H$, we obtain the following time evolution of $f(\mathbf{R}, t)$:

$$\frac{\partial}{\partial t} f(\mathbf{R}, t) = \frac{1}{2}\nabla^2 f(\mathbf{R}, t) - \nabla \cdot \left(\mathbf{V}(\mathbf{R}) f(\mathbf{R}, t)\right) - \left[\frac{(\hat K + \hat V_L)\Psi_T(\mathbf{R})}{\Psi_T(\mathbf{R})} + \frac{\hat V_{NL}\psi(\mathbf{R}, t)}{\psi(\mathbf{R}, t)} - E_T\right] f(\mathbf{R}, t). \qquad (8)$$

The drift and diffusion terms on the RHS are identical to Eq. (6), but there is a complication in the branching term. Indeed, we cannot calculate $\frac{\hat V_{NL}\psi(\mathbf{R}, t)}{\psi(\mathbf{R}, t)}$, as we do not know the analytical form of $\psi(\mathbf{R}, t)$.

There is an alternative approach, which is to write the Green's function $G(\mathbf{R}' \leftarrow \mathbf{R}, \tau)$ for $\hat H$. Using the Zassenhaus formula, for small $\tau$, we can approximate $e^{-\tau(\hat K + \hat V_L + \hat V_{NL})}$ with $e^{-\tau \hat V_{NL}} e^{-\tau(\hat K + \hat V_L)}$, and by substituting it into Eq. (4), we obtain

$$G(\mathbf{R}' \leftarrow \mathbf{R}, \tau) \sim \int T_{NL}(\mathbf{R}' \leftarrow \tilde{\mathbf{R}}, \tau)\, G_L(\tilde{\mathbf{R}} \leftarrow \mathbf{R}, \tau)\, d\tilde{\mathbf{R}}, \qquad (9)$$

where $G_L(\mathbf{R}' \leftarrow \mathbf{R}, \tau) \equiv \frac{\Psi_T(\mathbf{R}')}{\Psi_T(\mathbf{R})} \langle \mathbf{R}'|e^{-\tau(\hat K + \hat V_L)}|\mathbf{R}\rangle$ is Green's function for the local part of the Hamiltonian, which has been discussed in Sec. II C, and $T_{NL}(\mathbf{R}' \leftarrow \mathbf{R}, \tau) \equiv \frac{\Psi_T(\mathbf{R}')}{\Psi_T(\mathbf{R})} \langle \mathbf{R}'|e^{-\tau \hat V_{NL}}|\mathbf{R}\rangle$ is Green's function of the nonlocal part of the potential. For small $\tau$, we have that $T_{NL}(\mathbf{R}' \leftarrow \mathbf{R}, \tau) \sim \delta_{\mathbf{R}',\mathbf{R}} - \tau V_{\mathbf{R}',\mathbf{R}}$, where $\delta_{\mathbf{R}',\mathbf{R}}$ is the Dirac delta and

$$V_{\mathbf{R}',\mathbf{R}} \equiv \frac{\Psi_T(\mathbf{R}')}{\Psi_T(\mathbf{R})}\, \langle \mathbf{R}'|\hat V_{NL}|\mathbf{R}\rangle. \qquad (10)$$

Notice that $V_{\mathbf{R}',\mathbf{R}}$ can be either positive or negative depending on $\Psi_T$, $\hat V_{NL}$, $\mathbf{R}$, and $\mathbf{R}'$. Whenever $V_{\mathbf{R}',\mathbf{R}} > 0$ for some $\mathbf{R}' \neq \mathbf{R}$, then $T_{NL}(\mathbf{R}' \leftarrow \mathbf{R}, \tau) < 0$. The DMC algorithm needs to interpret the Green's function as a transition probability, but if $T_{NL}(\mathbf{R}' \leftarrow \mathbf{R}, \tau) < 0$ for some $\mathbf{R}$ and $\mathbf{R}'$, it cannot be a transition probability from $\mathbf{R}$ to $\mathbf{R}'$ (sign problem). Thus, the presence of $\hat V_{NL}$ yields a sign problem in the DMC algorithm29,31 because it gives a Green's function $G(\mathbf{R}' \leftarrow \mathbf{R}, \tau)$ which can have negative terms.

There is no direct solution to this problem, and as a consequence, an approximation is introduced. As noted earlier, two approaches are available: either to use the locality approximation30 (LA) or Casula's T-move approximation29,31 (TM). They are summarized in Secs. II D 1 and II D 2.

1. Locality approximation in FN-DMC

The approach taken in LA is to approximate the unknown quantity $\frac{\hat V_{NL}\Phi(\mathbf{R}, t)}{\Phi(\mathbf{R}, t)}$ with $\frac{\hat V_{NL}\Psi_T(\mathbf{R})}{\Psi_T(\mathbf{R})}$, which is the value of the nonlocal potential localized on the trial wave function $\Psi_T(\mathbf{R})$. By using this approximation in Eq. (8), we obtain that the third term on the RHS is $-[E_L(\mathbf{R}) - E_T] f(\mathbf{R}, t)$, and the equation becomes identical to Eq. (6). Thus, Green's function in LA is given by Eq. (7) and the DMC algorithm is a branching-drift-diffusion process.

The major difference from Sec. II C is that we approximate the Hamiltonian, which is no longer given by the FN Hamiltonian $\hat H_{FN} \equiv \hat H + \hat V_{FN}$, but by

$$\hat H^{LA}_{FN} \equiv \hat K + \hat V_L + \frac{\hat V_{NL}\Psi_T}{\Psi_T} + \hat V_{FN}, \qquad (11)$$

where the notation $\frac{\hat V_{NL}\Psi_T}{\Psi_T}$ is used to indicate that the nonlocal potential $\hat V_{NL}$ has been localized using the function $\Psi_T(\mathbf{R})$. So, given a generic function $\xi(\mathbf{R})$, we have $\frac{\hat V_{NL}\Psi_T}{\Psi_T}\xi(\mathbf{R}) = \int d\mathbf{R}'\, \Psi_T(\mathbf{R}') \langle \mathbf{R}'|\hat V_{NL}|\mathbf{R}\rangle \frac{\xi(\mathbf{R})}{\Psi_T(\mathbf{R})}$. Notice that $\hat H^{LA}_{FN}$ has no nonlocal potential term, i.e., the action of $\hat H^{LA}_{FN}$ on the generic function $\xi$ at point $\mathbf{R}$ only depends on the value of $\xi$ at $\mathbf{R}$.

The ground state for $\hat H^{LA}_{FN}$ is the projected wave function $\Phi^{LA}_{FN}$. The expectation value of the energy $E^{LA}_{FN}$ can be evaluated using the mixed estimator because $\frac{\hat V_{NL}\Psi_T}{\Psi_T}\Psi_T(\mathbf{R}) = \hat V_{NL}\Psi_T(\mathbf{R})$, so $\hat H^{LA}_{FN}|\Psi_T\rangle = \hat H|\Psi_T\rangle$. However, in general $\Phi^{LA}_{FN}$ is different from the (unknown) ground state $\Phi_{FN}$ for $\hat H_{FN}$; thus, $E^{LA}_{FN} \neq E_{FN}$. In other words, with LA we have lost the variationality of the approach because the error introduced by this approximation can be either positive or negative, and $E^{LA}_{FN}$ is not, in general, an upper bound for $E_0$. Only in the (ideal) case of $\Psi_T = \Phi_{FN}$ do we have $\hat H^{LA}_{FN}|\Phi_{FN}\rangle = E_{FN}|\Phi_{FN}\rangle$, so $E^{LA}_{FN} = E_{FN}$. As a corollary, with the exact trial wave function, $\Psi_T = \Phi$, we have that $E^{LA}_{FN} = E_0$. However, the trial wave function having exact nodes is not a sufficient condition for having $E^{LA}_{FN} = E_0$ as the LA depends on the overall trial wave function $\Psi_T$ and not just on its nodes.52 In other words, $E^{LA}_{FN}$ has both a FN error and a
localization error. Mitas et al.30 showed that the error (including both FN and localization) on the energy evaluation is quadratic in the wave function error, i.e., $E^{LA}_{FN} - E_0 = O[(\Psi_T - \Phi)^2]$. Notice that FN and localization errors add to the time step error and are present also in the limit of zero time step.

2. T-move approximation in FN-DMC

In the T-move approach, the nonlocal Green's function $T_{NL}$ includes only the terms without sign problems and the remaining part of the nonlocal potential is localized in a similar manner to LA. To this aim, the positive, $V^+_{\mathbf{R}',\mathbf{R}}$, and negative, $V^-_{\mathbf{R}',\mathbf{R}}$, parts, $V^{\pm}_{\mathbf{R}',\mathbf{R}} = \frac{V_{\mathbf{R}',\mathbf{R}} \pm |V_{\mathbf{R}',\mathbf{R}}|}{2}$, of the term $V_{\mathbf{R}',\mathbf{R}}$ defined in Eq. (10) are used. The sign problem arises from terms $V^+_{\mathbf{R}',\mathbf{R}}$, which have to be localized, while the terms $V^-_{\mathbf{R}',\mathbf{R}}$ yield a $T_{NL}$ with non-negative sign. Details about how this algorithm can be implemented are discussed in Refs. 29 and 31.

The corresponding T-move Hamiltonian is

$$\hat H^{TM}_{FN} \equiv \hat K + \hat V_L + \hat V^-_{NL} + \frac{\hat V^+_{NL}\Psi_T}{\Psi_T} + \hat V_{FN}, \qquad (12)$$

where the operators $\hat V^+_{NL}$ and $\hat V^-_{NL}$ correspond to $V^+_{\mathbf{R}',\mathbf{R}}$ and $V^-_{\mathbf{R}',\mathbf{R}}$, respectively. The projected wave function $\Phi^{TM}_{FN}$ is the ground state of $\hat H^{TM}_{FN}$, and since $\hat H^{TM}_{FN}|\Psi_T\rangle = \hat H|\Psi_T\rangle$, the expectation value of the energy $E^{TM}_{FN}$ can be evaluated using the mixed estimator. Similar to the LA approach, the projected function $\Phi^{TM}_{FN}$ is, in general, different from the fixed node ground state $\Phi_{FN}$, but if the trial wave function $\Psi_T = \Phi_{FN}$, then $E^{TM}_{FN} = E_{FN}$. If $\Psi_T = \Phi$, then $E^{TM}_{FN} = E_0$, but if $\Psi_T$ has exact nodes but differs from $\Phi$, then $E^{TM}_{FN} \neq E_0$. Similar to LA, TM depends on the overall trial wave function $\Psi_T$.53 Note that the FN and localization errors add to the time step error and are present also in the limit of zero time step.

The TM approach is computationally slightly more expensive than LA and often has a larger time step error. However, it has two advantages over LA: $E^{TM}_{FN}$ is an upper bound of the exact ground state energy $E_0$, and it is a more stable algorithm than the LA.54

III. NEW APPROACH: DETERMINANT LOCALIZATION APPROXIMATION IN FN-DMC

The major practical disadvantage of both LA and TM is that the results are highly dependent on the Jastrow factor $J$. This might result in problems of reproducibility, especially between results from different QMC packages, as the Jastrow factor is often expressed in different and nonequivalent functional forms across the codes. Moreover, the parameters of the Jastrow are affected by stochastic uncertainty. In contrast, it is much easier to control the reproducibility of the determinant part of the wave function $D$, which is generally obtained from a deterministic method, e.g., DFT.

Therefore, we propose to use only the determinant part $D$ of the trial wave function to localize the nonlocal potential $\hat V_{NL}$.55 If we bear in mind that pseudopotentials are tested by developers using deterministic methods (density functional theory56,57 or coupled cluster with single, double, and perturbative triple excitations [CCSD(T)]58–61), our suggestion also seems quite reasonable because they are not tested widely and systematically in the presence of a Jastrow and within a DMC scheme.

In DLA, the FN Hamiltonian is

$$\hat H^{DLA}_{FN} \equiv \hat K + \hat V_L + \frac{\hat V_{NL} D}{D} + \hat V_{FN}, \qquad (13)$$

and the associated projected wave function is $\Phi^{DLA}_{FN}$. In order to be able to use the mixed estimator, we need to define the Hamiltonian

$$\hat H^{DLA} \equiv \hat K + \hat V_L + \frac{\hat V_{NL} D}{D} \qquad (14)$$

such that $\hat H^{DLA}_{FN}|\Psi_T\rangle = \hat H^{DLA}|\Psi_T\rangle$ and

$$E^{DLA}_{FN} = \frac{\int \Phi^{DLA}_{FN}(\mathbf{R})\Psi_T(\mathbf{R})E^{DLA}_L(\mathbf{R})\,d\mathbf{R}}{\int \Phi^{DLA}_{FN}(\mathbf{R})\Psi_T(\mathbf{R})\,d\mathbf{R}}, \qquad (15)$$

where the local energy is $E^{DLA}_L(\mathbf{R}) = \frac{\langle \mathbf{R}|\hat H^{DLA}_{FN}|\Psi_T\rangle}{\langle \mathbf{R}|\Psi_T\rangle}$. DLA becomes exact in the limit of $D \to \Phi$, as $\hat H^{DLA}_{FN}|\Phi\rangle = \hat H|\Phi\rangle$ and $E^{DLA}_{FN} = E_0$. It can be shown, along the lines of the argument of Mitas et al.,30 that the error $E^{DLA}_{FN} - E_0$ on the energy evaluation (including both FN and localization) is $O[(D - \Phi)^2]$. Comparing with the corresponding error within LA, this implies that the DLA error on the absolute energy is typically expected to be larger than the LA error, as $\Psi_T = D \exp J$ is typically closer than $D$ to $\Phi$. On the other hand, we suggest an alternative point of view: the DLA approach should be seen as a modification of the PPs. PPs introduce an approximation in the Hamiltonian, whereas the remaining part of $\hat H$ comes from first principles (see Appendix B). There is no proof showing that PPs provide a better approximation of the core if the localization is performed using a wave function with or without the Jastrow factor. However, without the Jastrow, there are clear advantages in terms of reproducibility of results and better error cancellation in energy differences, as discussed below. In other words, we are not concerned that $E^{DLA}_{FN}$ might be further from $E_0$ than $E^{LA}_{FN}$ (or $E^{TM}_{FN}$) as long as $\hat V_L + \frac{\hat V_{NL} D}{D}$ yields a good representation of the ionic potential energy.

Within DLA, the quality of the fixed node energy $E^{DLA}_{FN}$ depends exclusively on $D$. The Jastrow factor $J$ does not affect the accuracy; the only influence of the Jastrow is on the efficiency, as it will affect the time step errors and the variance of $\hat H^{DLA}$. In the limit of zero time step, all calculations which use the same $D$ will provide the same energy, no matter what (if any) Jastrow factor is used.62

The implementation of DLA is straightforward as it is a simplification of the LA algorithm. It implies the numerical integration of $D$ (instead of $\Psi_T$ in LA) over a sphere to determine the nonlocal potential energy $\frac{\hat V_{NL} D}{D}$; see Eq. (C1) in Appendix C. The numerical integration scheme employs quadrature rules; hence, a number of wave function ratios on the integration grids are evaluated at every energy measurement.30 While in the calculations reported in this manuscript we used this simple implementation, it should be noticed that DLA allows a more involved but much more efficient implementation. Whenever $D$ is used instead of $\Psi_T$, the integrals in Eq. (C1) can be factorized into simpler integrals involving the molecular orbitals defining $D$.24 Thus, all the nonlocal integrals can be done analytically (e.g., analytical expressions have been obtained by Hammond et al.24 under the assumption that the basis
functions are Gaussian type orbitals) or numerically, becoming a local potential which, for instance, can be precomputed on a grid at the beginning of a QMC simulation. This approach prevents the evaluation of many wave function ratios and possibly yields an appreciable speedup.

IV. RESULTS

In Secs. II and III, we have outlined that both LA and TM yield total energies, $E^{LA}_{FN}$ and $E^{TM}_{FN}$, affected by the quality of the trial wave function $\Psi_T = D \exp J$. Within the DLA scheme introduced here, the total energy $E^{DLA}_{FN}$ is affected only by the determinant part $D$ of the trial wave function $\Psi_T$. Therefore, DLA eliminates the uncertainty due to the Jastrow factor on the DMC results performed with pseudopotentials. We are going to show here, in a few examples, the amount of uncertainty that the Jastrow can introduce in LA and TM, in contrast to DLA which is not affected by this uncertainty.

A. DLA is good for interaction energy evaluations

The first system that we considered is water bound to benzene, as shown in the inset of Fig. 1. This is a simple example of the calculation of an interaction energy $E_{int} \equiv E_{bound} - E_{far}$, which is the difference between the energy ($E_{bound}$) of the system in the bound configuration and the energy ($E_{far}$) of the molecules far away. Many of these calculations are performed to evaluate a binding energy curve, which is needed, for example, in adsorption energy calculations of molecules on surfaces.3–6,19,20 Whereas in this small system it is not overly burdensome to optimize $J$ at every different geometry, in a larger and more complex adsorption system this would be tedious and time-consuming, notwithstanding the variability of the quality of the optimization, due to its stochastic nature.

This specific water-benzene configuration has a reference $E_{int}$ of −128 ± 1 meV,3 as obtained from basis set converged CCSD(T) calculations.63 A standard setup for FN-DMC was used, with TN-DF pseudopotentials57 and a Slater-Jastrow $\Psi_T$ with determinant $D$ obtained from a DFT calculation.64 A Jastrow factor $J$ having explicit electron-electron (e-e), electron-nucleus (e-n), and electron-electron-nucleus (e-e-n) terms was used here. Within this specific functional form of $J$, we obtained two different Jastrow factors, which we call J.bound and J.far. The former, J.bound, is the Jastrow obtained when we optimize the parameters by minimizing the variational variance $\mathrm{Var}[E_L]_{VMC}$ of the local energy for the bound configuration. The latter, J.far, is instead obtained by optimizing the parameters on a configuration where the water and the benzene are far away (around 10 Å) and are effectively noninteracting. In Fig. 1, we compare the reference value with the FN-DMC evaluations obtained with LA, TM, and DLA, and the different Jastrow factors, whereas Fig. 2 reports the FN-DMC total energies.

The left panel of Fig. 1 shows results for J.far used for both the bound and the far-away configuration. With this setup, DLA is the only method that provides a reliable interaction energy, which we can estimate to be −131 ± 2 meV for the τ → 0 limit from a quadratic fit of the values obtained at finite values of τ.65 The estimated τ → 0 limits for LA and TM are −187 ± 3 meV and −76 ± 1 meV, respectively.66 So, with this nonoptimal Jastrow factor, LA severely overbinds and TM underbinds.67

A different choice, which is indeed the standard procedure adopted in DMC, is to optimize the Jastrow factor specifically for each configuration, i.e., we use J.bound for the bound configuration and J.far for the far configuration. We named this scheme J.mix, and the results obtained with LA, TM, and DLA are shown in the right panel of Fig. 1. In this case, all three methods are in decent agreement with the CCSD(T) reference. From a quadratic fit, we obtain the τ → 0 limit: −135 ± 3 meV for LA, −127 ± 1 meV for TM, and −129 ± 2 meV for DLA.68 The figure also shows the time step error associated with the three different methods. The first consideration is that the better choice of the Jastrow has greatly improved the accuracy for any finite τ evaluation with respect to the case with J.far. The

FIG. 1. Interaction energy Eint for a water-benzene complex in the so-called “2-leg” configuration, for the geometry shown in the inset. The plots show FN-DMC evaluations vs the time step τ, using LA, TM, and DLA. (Left plot) Results obtained using the Jastrow factor J.far in both the bound and the far configurations. J.far has been optimized for the far configuration, i.e., for noninteracting water-benzene molecules. (Middle plot) Results obtained using a trial wave function without any Jastrow factor. In this case, LA and DLA are equivalent. (Right plot) Results obtained with mixed Jastrow factor: for the noninteracting configuration, we used J.far, and for the bound configuration, we used J.bound. The reference CCSD(T) value is −128 ± 1 meV.3
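The τ → 0 limits quoted for Fig. 1 come from quadratic fits of finite-τ values. As a minimal sketch of such an extrapolation, the snippet below takes the exact quadratic through three (τ, E) points and evaluates it at τ = 0. This is a simplification: the fits behind the quoted limits may use more points and weight them by their error bars. The function name `extrapolate_to_zero` and the sample energies are hypothetical; the energies are generated from a made-up quadratic and are not read from the figure.

```python
def extrapolate_to_zero(taus, energies):
    """Evaluate at tau = 0 the quadratic through three (tau, energy) points.

    Uses the Lagrange form of the interpolating polynomial. Assumes three
    distinct time steps; no error bars are propagated.
    """
    if len(taus) != 3 or len(energies) != 3:
        raise ValueError("exactly three (tau, energy) points are required")
    t0, t1, t2 = taus
    if len({t0, t1, t2}) != 3:
        raise ValueError("time steps must be distinct")
    # Lagrange basis polynomials evaluated at tau = 0.
    w0 = (t1 * t2) / ((t0 - t1) * (t0 - t2))
    w1 = (t0 * t2) / ((t1 - t0) * (t1 - t2))
    w2 = (t0 * t1) / ((t2 - t0) * (t2 - t1))
    return w0 * energies[0] + w1 * energies[1] + w2 * energies[2]


# Synthetic example (NOT data from the paper): E(tau) = a + b*tau + c*tau**2.
a, b, c = -130.0, 200.0, -3000.0   # hypothetical coefficients, in meV
taus = [0.003, 0.01, 0.03]         # time steps in a.u.
energies = [a + b * t + c * t * t for t in taus]
e_zero = extrapolate_to_zero(taus, energies)   # recovers the intercept a
```

With noisy DMC data, a weighted least-squares fit over more time steps would be the safer choice, since an exact three-point interpolation passes the statistical noise straight into the extrapolated value.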

J. Chem. Phys. 151, 134105 (2019); doi: 10.1063/1.5119729 151, 134105-6 Published under license by AIP Publishing The Journal ARTICLE of Chemical Physics scitation.org/journal/jcp

FIG. 2. Total energy for the far (Efar, left plot) and bound (Ebound, right plot) configurations. The plots show FN-DMC evaluations vs the time step τ, using LA, TM, and DLA, and different choices of the Jastrow factor: J.far, J.bound, and no Jastrow (noJ), as defined in the text. FN-DMC-TM noJ points fall outside the plotted range [Efar = −1491.225(7) eV and Ebound = −1491.358(1) eV in the limit τ → 0].
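As a consistency check on the numbers quoted in this caption, the sketch below forms Eint = Ebound − Efar for the FN-DMC-TM noJ totals and combines the two statistical errors in quadrature. Two assumptions are made here and are not stated in the source: the totals are read in eV, which is the magnitude expected for the valence total energy of this system, and the two estimates are treated as statistically independent. The helper name `energy_difference` is hypothetical.

```python
import math


def energy_difference(e_a, err_a, e_b, err_b):
    """Return (e_a - e_b, combined standard error) for independent estimates."""
    return e_a - e_b, math.sqrt(err_a ** 2 + err_b ** 2)


# tau -> 0 totals quoted in the Fig. 2 caption for FN-DMC-TM without Jastrow.
# Assumption: values are in eV; the digit in parentheses is the error on the
# last quoted digit.
E_BOUND, ERR_BOUND = -1491.358, 0.001
E_FAR, ERR_FAR = -1491.225, 0.007

e_int_ev, err_ev = energy_difference(E_BOUND, ERR_BOUND, E_FAR, ERR_FAR)
e_int_mev, err_mev = 1000.0 * e_int_ev, 1000.0 * err_ev
```

Under these assumptions the difference is about −133 meV with an error of about 7 meV, the same order of magnitude as the interaction energies discussed in the text.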

best time step dependence is obtained for the DLA approach, where the interaction energy evaluation for τ = 0.03 a.u. is −127 ± 1 meV, which appears already converged (2 meV difference with respect to the τ → 0 limit).

At this point, one could wonder what the outcome is if the Jastrow is not used at all. We performed the FN-DMC simulation with LA and no Jastrow (so, it is equivalent to DLA), and the outcome is reported in the middle panel of Fig. 1. The τ → 0 limit of the interaction energy is −125 ± 4 meV, in excellent agreement with the other DMC-DLA evaluations with J.far and J.mix (as it has to be by construction) and also with the reference CCSD(T). Quite unexpectedly, we also notice that the time step error is quite small, much smaller than the case with J.far, and similar to the case with J.mix. This happens despite the huge time step error on the total energy evaluations (see Fig. 2) when a Jastrow is not used and indicates an unexpectedly good error cancellation of the finite step bias in the energy difference. We do not know if this behavior of the no Jastrow case is transferable to other systems. If it was, one would be tempted to do simulations without a Jastrow. However, this is not recommended because the variance of the local energy Var[EL] is much larger without a Jastrow, around ten times the variance of the Slater-Jastrow ΨT. Since the computational cost is proportional to the variance, a simulation with a given τ and a target precision will cost, computationally, around an order of magnitude more in the absence of a Jastrow. Whether this extra cost could be recovered by using time steps ten times as large would have to be checked for each system.

B. Evaluation of ionization energies

Ionization energies (IEs) are typical quantities that are evaluated when new PPs are developed or tested.56–61,69–74 The nth ionization energy, IE(n), is the energy necessary to remove one electron from an atomic (or molecular) species X with charge (n − 1), i.e., to have X^(n−1) → X^n + e−. Good PPs are expected to yield IE estimations close to the corresponding all-electron (AE) evaluations. Typically, these checks are not performed at the QMC level, even for PPs specifically developed for QMC, but using Hartree-Fock,75 DFT, or CCSD(T).76 Here, we test the FN-DMC evaluations using LA, TM, and DLA.

We considered the first three ionization energies of the carbon atom, and we performed calculations with the energy consistent correlated electron pseudopotential (eCEPP),60 which has a He-core for the carbon atom. These pseudopotentials perform very well at the CCSD(T) level of theory. This can be seen in Table I, where the absolute difference in the IE between AE and eCEPP evaluations is <0.07 eV, and the relative difference is <0.5%.77

When PPs are deployed in FN-DMC, the difference from the AE results can be much larger than what is found for CCSD(T) because of the localization error. As usual, the localization error depends on the trial wave function. To have an idea of the magnitude of the localization error, we performed two sets of FN-DMC calculations, both with eCEPP and a Slater-Jastrow function with the Slater determinant obtained from a DFT calculation with local-density approximation (LDA) exchange–correlation functional.78 The difference between the two sets is the Jastrow factor, which in one case (Ju) includes only e-e terms and in the other case (Juxf) includes e-e, e-n, and e-e-n terms. The parameters of the Jastrow factors have been optimized for each atomic ion considered. We have calculated the difference between the FN-DMC evaluations of IE(n) in the two sets, for LA, TM, and DLA. Notice that the difference between the corresponding Juxf and Ju wave functions is solely due to the localization error because the same D is used for Ju and Juxf, so the nodal surfaces are the same. Results are reported in Fig. 3. By construction, DLA is not affected by the Jastrow uncertainty, whereas the uncertainty on both TM and LA is larger than 0.2 eV. So, we see in this case that in LA and TM, the choice of the Jastrow yields a localization error that can be more than two times larger than the CCSD(T) error of the eCEPP pseudopotential. Given this large uncertainty in LA and TM, it is unclear what the most suitable Jastrow is. Juxf is variationally better than Ju (smaller VMC

TABLE I. First three ionization energies of the carbon atom, in eV. We give experimental results (Expt.) and compare CCSD(T) results based on all electron (AE) and PP (eCEPP60).

          Expt.    AE^a     eCEPP^b   Δ(eCEPP−AE)
IE(1)     11.26    11.26    11.22     −0.04
IE(2)     24.38    24.36    24.29     −0.07
IE(3)     47.89    47.86    47.90      0.04

^a Performed with Orca,48 using an aug-cc-pV(T,Q)Z basis set.
^b Performed with Orca,48 using the aug-cc-pV5Z-TN basis.60 Differences with respect to aug-cc-pVQZ-TN are below 0.01 eV.
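The bounds quoted for Table I (absolute difference below 0.07 eV in magnitude or equal to it, relative difference below 0.5%) can be re-derived from the tabulated CCSD(T) values. The short sketch below does that arithmetic; the dictionary layout and the function name `pp_errors` are illustrative choices, and the relative difference is taken with respect to the AE value, which is an assumption about how the percentage was defined.

```python
# CCSD(T) ionization energies of carbon from Table I, in eV: (AE, eCEPP).
TABLE_I = {1: (11.26, 11.22), 2: (24.36, 24.29), 3: (47.86, 47.90)}


def pp_errors(table):
    """Return {n: (eCEPP - AE in eV, |eCEPP - AE| / AE in percent)}."""
    out = {}
    for n, (ae, pp) in table.items():
        delta = pp - ae
        # Round to the two decimals carried by the table.
        out[n] = (round(delta, 2), 100.0 * abs(delta) / ae)
    return out


errors = pp_errors(TABLE_I)
largest_abs = max(abs(d) for d, _ in errors.values())
largest_rel = max(r for _, r in errors.values())
```

The largest relative deviation comes from IE(1), the smallest of the three energies, at roughly 0.36%.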


energy and variance). However, in Appendix D, we compare the difference between the PPs and the AE results, and it is unclear if Ju or Juxf is better.79 DLA solves the issue because results from Ju and Juxf are equivalent.

FIG. 3. Uncertainty of the FN-DMC estimation of the nth ionization energy IE(n) of the carbon atom due to the Jastrow, using an eCEPP pseudopotential60 and either LA, TM, or DLA. Uncertainty is evaluated as the difference between the estimation using two different sets of Slater-Jastrow wave functions having the same D and different J (see text). Error bars correspond to a standard deviation due to the DMC sampling.

In which way can we improve the DMC accuracy with DLA? It can be improved systematically by changing the determinant part, for instance using a multideterminant D term obtained using a method such as the complete active space self-consistent field (CASSCF),35–37 the Configuration Interaction using a Perturbative Selection made Iteratively (CIPSI),38,39 or the antisymmetrized geminal power.41,42 DLA appears convenient in this case because we can anticipate improvements in the nodes and in the wave function, and we do not need to be concerned about the unpredictable effects of the Jastrow factor.

C. DLA yields stable DMC simulations

Stability is a very important practical aspect of DMC simulations. Instability in DMC is usually correlated with the quality of the trial wave function and of the pseudopotentials. In particular, a possible issue in the trial wave functions, which can generate instability, is the generation of the determinant part D of the trial wave function via plane-wave DFT packages using suboptimal Kleinman-Bylander projectors. In QMC (or in DFT calculations using localized basis), the nonlocal terms in the pseudopotential are evaluated using spherical harmonic projectors. In contrast, in plane-wave DFT calculations, the nonlocal terms in the pseudopotential are evaluated using Kleinman-Bylander projectors80 (i.e., projected into pseudo-atom wave functions). Thus, there is a possible inconsistency between the projection of the nonlocal terms at the DFT and DMC levels. The inconsistency is even worse if the Kleinman-Bylander projectors are not obtained with the same DFT functional used in the DFT preparation of D. As a consequence, the method adopted to deal with the nonlocal part of the pseudopotential (LA, TM, or DLA) has a big impact on the stability of the DMC simulation. We have observed that DLA is not affected by the instability issues of LA and in all test simulations is as stable as TM and more stable than LA.

To illustrate this point, we consider carbon dioxide, CO2, because in a previous study,2 we noticed that the CO2 molecule often leads to unstable DMC simulations. We consider here the case of the monomer and dimer of CO2 (configurations taken from Ref. 2), and we use CEPP pseudopotentials58 for both oxygen and carbon. We obtain D from a DFT/LDA calculation using the PWSCF code,81 with a 600 Ry plane wave cutoff, and the molecular orbitals obtained were converted into splines.82 The Kleinman-Bylander projectors used in DFT are from Ref. 58, and they were obtained from DFT with the Perdew, Burke, and Ernzerhof (PBE) exchange–correlation functional, so there is an inconsistency with the employed DFT functional.83 In QMC, we used a Jastrow factor J with e-e, e-n, and e-e-n terms, and parameters were optimized by minimizing the local energy variance in VMC, yielding Var[E_L]_VMC = 0.956(2) a.u. in the molecule. The very same trial wave function (i.e., no other optimization of J) has a much smaller local energy variance in VMC if the DLA local energy E_L^DLA [as obtained from the Hamiltonian given in Eq. (14)] is used: Var[E_L^DLA]_VMC = 0.685(5) a.u. Thus, the DLA Hamiltonian might already have advantages at the VMC level.

At the DMC level, LA simulations are not possible as population explosions happen so frequently that it was not possible to finish the DMC equilibration.84 In contrast, both TM and DLA yield stable DMC simulations at all the attempted time steps (0.03, 0.01, and 0.003 a.u.). Moreover, the results of the DMC simulations seem reasonable despite the issue in the wave function. The most interesting quantity to consider is the interaction energy Eint. In this case, as the two carbon dioxide molecules are identical, the interaction can be evaluated as Eint = Edimer − 2 ∗ Emonomer, the difference between the energy of the dimer and twice the energy of the monomer. The FN-DMC results with TM and DLA are reported in Fig. 4. Both methods are in good agreement with the reference CCSD(T) evaluation2 when we consider the infinitesimal time step limit. DMC-DLA is less affected by finite time step bias than DMC-TM. This is probably a consequence of the fact that the local energy variance in DLA is smaller than in TM.

D. Good efficiency for DLA

Efficiency is a fundamental property of a computational method. DLA appears to be more efficient than LA and TM. The efficiency of DMC can be estimated, as detailed in Appendix E. The choice among LA, TM, and DLA influences two quantities affecting the DMC efficiency: the computational cost Tstep for a single DMC time step and the variance σ_sys² of the local energy. The most efficient methods will have smaller Tstep and σ_sys². DLA satisfies both conditions. A detailed comparison of the efficiency of LA, TM, and DLA is provided in Appendix E. The outcome is that DLA is more


efficient in most of the simulations involving organic molecules by roughly 30%.

FIG. 4. Carbon dioxide dimer configuration (inset) and interaction energy Eint obtained from FN-DMC calculations and plotted as a function of the DMC time step τ. Dashed lines are obtained from a linear fit, and the extrapolated τ → 0 values are −38(5) meV and −44(1) meV for TM and DLA, respectively. The reference CCSD(T) value2 is −42 meV. DMC-LA is not reported because the simulations are unstable.

V. CONCLUSION

In this paper, we have illustrated some drawbacks of FN-DMC in the presence of pseudopotentials. Specifically: (i) they generate unpredictable differences in the results as a consequence of some arbitrary choices of the Jastrow functional form and of the stochastic optimization procedure; (ii) they might affect the reproducibility of results if different QMC packages are used; (iii) the accuracy deteriorates whenever the Jastrow factor is not good, which is not easy to establish. These issues are particularly problematic precisely where FN-DMC is most needed and offers most promise. We have shown that these issues arise essentially because the pseudopotentials have nonlocal terms. Within both LA and TM, the projection scheme is affected by a subtle interaction between the Jastrow factor and the pseudopotentials. In this paper, we have introduced a new alternative approximation, called the DLA. When FN-DMC deploys DLA, the projected wave function and the associated energy are not affected by the Jastrow factor. This solves the mentioned drawbacks. The advantages of DLA have been illustrated on a few examples, including the evaluation of an interaction energy and ionization energies. Moreover, the proposed algorithm appears as stable as TM and is much more stable than LA. In terms of efficiency, DLA performs better than both LA and TM.

An interesting perspective for DLA is that it allows the development of general purpose Jastrow factors, which do not need a system dependent optimization and yield a validated accuracy.85,112–114 In this way, DLA opens the way to the automation of FN-DMC, which can make DMC easier and less labor intensive to use.

The DLA method is already implemented in the CASINO,86 TurboRVB,87 and QMCPACK88 packages.

ACKNOWLEDGMENTS

We thank Sandro Sorella, Pablo López Ríos, and Paul Kent for useful discussions on the presented methodology. Moreover, we acknowledge the prompt implementation of DLA in TurboRVB by Sandro Sorella and in QMCPACK by Paul Kent and Ye Luo. We are grateful to Pablo López Ríos for the help in implementing DLA in the CASINO package. A.Z. and D.A. are supported by the Air Force Office of Scientific Research, Air Force Material Command, and U.S. Air Force, under Grant No. FA9550-19-1-7007. A.Z. and A.M. were also supported by the European Research Council (ERC) under the European Union’s Seventh Framework Program (No. FP/2007-2013)/ERC Grant Agreement No. 616121 (HeteroIce project). J.G.B. acknowledges support by the Alexander von Humboldt Foundation. We are also grateful, for computational resources, to the ARCHER UK National Supercomputing Service, the United Kingdom Car-Parrinello (UKCP) consortium (Grant No. EP/F036884/1), the London Centre for Nanotechnology, the University College London (UCL) Research Computing, the Oak Ridge Leadership Computing Facility (Grant No. DE-AC05-00OR22725), and the UK Materials and Molecular Modelling Hub, which is partially funded by EPSRC (Grant No. EP/P020194/1).

APPENDIX A: DMC IMPLEMENTATION AND FIXED-NODE APPROXIMATION

In DMC, the non-negative and normalized function f(R, t) is interpreted as a probability density distribution, which is represented at each time t via a large number of electronic configurations Ri(t), also called walkers, and their associated weights wi(t). Walkers Ri(t) and weights wi(t) evolve in time according to a process that is ultimately determined by Green’s function.1 A key issue is that Green’s function G(R′ ← R, t) in Eq. (4) does not impose any antisymmetry constraint on the system, so the ground state Φ obtained according to the imaginary time projection in Eq. (5) would be bosonic as it has a lower energy than the corresponding fermionic system. The traditional and most effective way to impose antisymmetry in Φ is to adopt the fixed node (FN) approximation: walkers are not allowed to cross the nodal surface of the trial wave function ΨT(R). In other terms, the Hamiltonian Ĥ is replaced by the FN Hamiltonian Ĥ_FN ≡ Ĥ + V̂_FN, where V̂_FN is an infinite wall at the nodal surface of ΨT. The projected function ΦFN(R) obtained in this way has the same nodes as ΨT(R), and it is the exact solution for the Hamiltonian Ĥ_FN, namely, Ĥ_FN|ΦFN⟩ = EFN|ΦFN⟩. This implies that EFN = ⟨ΦFN|Ĥ_FN|ΨT⟩ / ⟨ΦFN|ΨT⟩, and, noticing that Ĥ_FN|ΨT⟩ = Ĥ|ΨT⟩ because V̂_FN ≠ 0 only on the nodal surface of ΨT, we can evaluate EFN using the mixed estimator,

\[
E_{FN} = \frac{\int \Phi_{FN}(R)\,\Psi_T(R)\,E_L(R)\,dR}{\int \Phi_{FN}(R)\,\Psi_T(R)\,dR}. \tag{A1}
\]

The FN energy EFN is an upper bound to the exact energy E0, namely, EFN ≥ E0, because Ĥ_FN ≥ Ĥ. The approach is exact if the nodes of ΨT(R) are exact, because in this case, Ĥ_FN|ΦFN⟩ = Ĥ|ΦFN⟩. Thus,


the quality of the FN approximation is determined by the quality of the nodes of the trial function ΨT(R).

APPENDIX B: HAMILTONIAN FOR ALL-ELECTRONS VERSUS HAMILTONIAN FOR VALENCE ELECTRONS AND PSEUDOPOTENTIALS

The electronic Hamiltonian in Born-Oppenheimer approximation is

\[
\hat{H} = -\frac{1}{2}\sum_{i}^{N}\nabla_i^2 + \sum_{i<j}^{N}\frac{1}{r_{ij}} - \sum_{i}^{N}\sum_{\alpha}^{M}\frac{Z_\alpha}{r_{i\alpha}}, \tag{B1}
\]

\[
\hat{P}_l^{\alpha} = \sum_{m=-l}^{+l} |Y_{l,m}^{\alpha}\rangle\langle Y_{l,m}^{\alpha}|,
\]

where |Y^α_{l,m}⟩ are spherical harmonics centered on nucleus α. The idea behind V̂^α_PP(r_i) is that it represents an effective potential that reproduces the effects of both the nucleus and the core electrons on the valence electrons. However, there is not an exact mapping, or a thermodynamic integration, providing the pseudopotentials. Indeed, some criteria need to be chosen to produce them, and they need to be tested at some level of theory. Moreover, it has to be noticed that in independent-electron theories, such

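\[
\hat{H}_c = -\frac{1}{2}\sum_{i}^{N_c}\nabla_i^2 + \sum_{i<j}^{N_c}\frac{1}{r_{ij}} - \sum_{i}^{N_c}\sum_{\alpha}^{M}\frac{Z_\alpha}{r_{i\alpha}}, \tag{B4}
\]

APPENDIX C: JASTROW INTERACTION WITH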


We could wonder what parts of the Jastrow factor give rise to this issue, and if it is possible to use a Jastrow not affecting the ratio in the RHS of Eq. (C1). Although there are differences in the Jastrow factor implemented in the different codes, they typically can be expressed in the following way:

\[
J(R) = \sum_{i<j} \delta_{\sigma_i,\sigma_j}\, u_{\uparrow\uparrow}(r_{ij}) + \sum_{i<j} (1 - \delta_{\sigma_i,\sigma_j})\, u_{\uparrow\downarrow}(r_{ij}) + \sum_{\alpha}\sum_{i} \chi_{\alpha}(r_{i\alpha}) + \sum_{\alpha}\sum_{i<j} f_{\alpha}(r_{i\alpha}, r_{j\alpha}, r_{ij}), \tag{C2}
\]

where u↑↑ and u↑↓ are homogeneous two-electron correlation terms, or e-e, describing the interaction between like-spin and unlike-spin pairs, χα is an e-n term depending on the distance of any electron from the αth atom, and fα is a three-body term, or e-e-n, describing the interaction between electron pairs in proximity of the αth atom. Different packages can provide different parameterizations of these terms, and sometimes additional terms.91,115,116 The important aspect is that in the displacement ri → ri′ performed in the angular integration in Eq. (C1), we have that riα → r′iα = riα, but rij → r′ij ≠ rij. Thus, the e-n term χα is not responsible for the Jastrow dependence of V_NL^loc(R), while the e-e and e-e-n terms are at the source of the issue. One could in principle decide not to use the three-body term fα, but the same cannot be done for the terms u↑↑ and u↑↓ as they are the most important in the Jastrow factor, responsible for most of the correlation captured by ΨT and the decrease in the variance of the local energy.92

APPENDIX D: HOW LARGE ARE ERRORS DUE TO PPs IN IE EVALUATIONS?

Ideally, a perfect PP delivers exactly the same energy differences as AE calculations. In FN-DMC, the situation is complicated by the fact that localization errors appear when PPs are used. In Sec. IV B, it is shown that these localization errors can be large and that with LA and TM, there is a big dependence on the Jastrow factor in the trial wave function. For instance, it is not clear if both the wave functions Ju and Juxf discussed in Sec. IV B show a sizable localization error, or if instead one wave function is responsible for most of the error.

In order to investigate this, we have considered the difference ΔeCEPP-AE between the eCEPP and the AE results for the IE(n) evaluated with LA, TM, and DLA. In the AE calculation, we have used the same level of theory to generate the wave function, so a Slater-Jastrow function with the determinant from DFT/LDA.93 Results are reported in Fig. 5. The plot highlights some unexpected behavior. We would have expected that Juxf yields better results than Ju because Juxf has more parameters and includes Ju as a special case, and indeed, it yields a lower variational energy and variance than Ju. However, Juxf does not show a smaller ΔeCEPP-AE compared to Ju in any of the IEs. In fact, the worse wave function, Ju, generates the smallest ΔeCEPP-AE in LA and TM. On the other hand, DLA yields ΔeCEPP-AE quite similar in absolute value to the best LA or TM case, but in DLA, the sign is inverted. We notice a clear correlation between DLA and HF errors. This correlation suggests that the ΔeCEPP-AE in DLA might just reflect the limitations of the determinant part of the wave function.

FIG. 5. Difference ΔeCEPP-AE between the IE(n) evaluation with eCEPP PPs and with all-electrons (AE). FN-DMC with PPs yields different results with different ΨT, and we consider two sets of Slater-Jastrow functions, dubbed Ju and Juxf (see text). In all reported DMC calculations, D is a single Slater determinant obtained from DFT/LDA. Results using LA, TM, and DLA are shown. As a comparison, outcomes from Hartree Fock (HF) and CCSD(T) are also reported.

APPENDIX E: ESTIMATION OF COMPUTATIONAL COST OF A FN-DMC SIMULATION WITH LA, TM, AND DLA

The efficiency of a DMC calculation depends on the computational resources required to achieve a target stochastic error σ_target on the quantity of interest. In most cases, we are interested in the energy, and in this case, the computational time spent T_cost is

\[
T_{cost} = T_{step}\,\frac{t_{ac}}{\tau}\left(\frac{t_{ac}}{t_{eq}}\,N_w + \frac{\sigma_{sys}^2}{\sigma_{target}^2}\right), \tag{E1}
\]

where T_step is the computational time of a single DMC step (which clearly depends on the specific architecture where the calculation is performed), t_ac and t_eq are the autocorrelation and equilibration times (which are roughly the same and typically of the order of 1 a.u.), N_w is the number of walkers, and σ_sys² is the variance of the local energy in the corresponding DMC scheme.94 The first term in the parentheses is due to the equilibration time in DMC, which has to be removed from the sampling, and which typically is negligible compared to the second term in the parentheses, which instead comes from the statistical sampling. The quantities on the RHS of Eq. (E1) that are affected by the choice of LA, TM, or DLA are only T_step and σ_sys². In particular, σ_sys² is Var[E_L]_DMC-LA, Var[E_L]_DMC-TM, and Var[E_L^DLA]_DMC-DLA for LA, TM, and DLA, respectively.

The dependence on T_step is straightforward: if we name T_step^LA, T_step^TM, and T_step^DLA the cost per step in the LA, TM, and DLA approaches, respectively, then

\[
T_{step}^{DLA} < T_{step}^{LA} < T_{step}^{TM}.
\]
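To make the scaling of the sampling contribution concrete, the sketch below evaluates only the second term in the parentheses of Eq. (E1), multiplied by the common prefactor, i.e. T_step (t_ac/τ) σ_sys²/σ_target². The equilibration term is left out on purpose, following the statement that it is typically negligible. The function name `sampling_cost` and all input values are hypothetical and serve only to show how the cost responds to the target error and to the variance.

```python
def sampling_cost(t_step, t_ac, tau, var_sys, sigma_target):
    """Sampling term of Eq. (E1): t_step * (t_ac / tau) * var_sys / sigma_target**2.

    The equilibration term of Eq. (E1) is deliberately omitted.
    """
    inputs = (("t_step", t_step), ("t_ac", t_ac), ("tau", tau),
              ("var_sys", var_sys), ("sigma_target", sigma_target))
    for name, value in inputs:
        if value <= 0:
            raise ValueError(f"{name} must be positive")
    return t_step * (t_ac / tau) * var_sys / sigma_target ** 2


# Hypothetical inputs: cost per step in arbitrary units, times in a.u.
base = sampling_cost(t_step=1.0, t_ac=1.0, tau=0.01, var_sys=0.5, sigma_target=1e-3)
half_error = sampling_cost(1.0, 1.0, 0.01, 0.5, 0.5e-3)          # target error halved
ten_times_variance = sampling_cost(1.0, 1.0, 0.01, 5.0, 1e-3)    # variance times ten
```

Halving the target error multiplies the sampling cost by four, while a tenfold larger variance multiplies it by ten, which is the order-of-magnitude penalty mentioned for simulations without a Jastrow factor.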


Indeed, in TM, the operations are the same as those performed in LA, plus those needed for the moves connected to Green’s function T_NL(R′ ← R) (T-moves). On the other hand, in DLA, there are fewer operations than in LA because the Jastrow factor does not need to be considered when evaluating the nonlocal part of the pseudopotential (see details in Appendix C). However, the observed difference in the costs of LA, TM, and DLA is relatively small (just a few percent).

More important for the efficiency is the dependence on the variance σ_sys². First of all, we have to notice that here the variance under consideration is the one relative to the corresponding DMC sampling, which is Var[E_L]_DMC-LA, Var[E_L]_DMC-TM, and Var[E_L^DLA]_DMC-DLA for LA, TM, and DLA, respectively. The LA and TM approaches are strictly related to the variational variance Var[E_L]_VMC; the local energy is calculated in the same way, but the underlying probability distribution is different, and in particular, different values of the time step change the sampling and the corresponding variance. In DMC-LA with large τ, the variance becomes the same as the variational variance (this is likely a consequence of the Metropolis step to enforce detailed balance after the drift-diffusion step), while at small τ, we notice that the DMC-LA variance is typically slightly smaller than the VMC variance. In DMC-TM, the variance converges to the same variance of DMC-LA for small τ, but for large τ, it is larger than the VMC and the DMC-LA variance.95 In DMC-DLA, the variance has roughly the same relation with the variational DLA variance that the DMC-LA variance has with the VMC variance. Thus, the difference in the variance between DMC-LA and DMC-DLA is mostly captured by the difference in the corresponding variational variances, Var[E_L]_VMC and Var[E_L^DLA]_VMC. Notice that the VMC sampling is precisely the same if ΨT is the same, and the difference only comes from the Hamiltonian, thus the local energy.

In order to investigate this difference, in Fig. 6, we report the variational variances (Var[E_L]_VMC and Var[E_L^DLA]_VMC) per electron on the first-row atoms, using eCEPP pseudopotentials,60 which are He-core for all the reported atoms. The D in ΨT was obtained from a DFT/LDA calculation performed with the ORCA package96 and with the localized basis set aug-cc-pVQZ-eCEPP provided in Refs. 60 and 97. The Jastrow factor used has e-e, e-n, and e-e-n terms, with parameters optimized in any atom to minimize the variational variance.98 Figure 6 shows that Var[E_L^DLA]_VMC is smaller than Var[E_L]_VMC for most elements, with the exception of the fluorine atom, where the DLA variance is 6% larger. In particular, the DLA variance is 32% smaller in the carbon atom, 13% smaller in the nitrogen atom, and 0.5% smaller in the oxygen atom.

A natural question at this point is which part of J is mostly involved in producing a difference between E_L and E_L^DLA. In Appendix C, we show that the e-n term drops out when we evaluate the nonlocal potential term; thus, it produces no difference between E_L and E_L^DLA. The terms to consider are thus the e-e and the e-e-n. In Table II, we report the variational variances on the boron atom with different parameterizations of J. It shows that the e-e term produces most of the difference. Moreover, one could wonder how much this difference is affected by the choice of the pseudopotentials. In Table II, we provide the variance also for the ccECP pseudopotential.61 We notice that, although in absolute terms, eCEPP and ccECP yield different variances, in relative terms, the difference between E_L and E_L^DLA is the same.

TABLE II. Variational variance, in a.u., Var[E_L]_VMC and Var[E_L^DLA]_VMC for the boron atom, simulated using He-core pseudopotentials, a single Slater determinant obtained from a DFT/LDA calculation and different parameterizations of the Jastrow factor J, each with parameters optimized to minimize the variance. For the eCEPP pseudopotentials,60 we used the aug-cc-pVQZ-eCEPP basis provided by Trail and Needs,60 while for the ccECP pseudopotentials,61 we used the QZ basis provided by Bennett et al.61

Pseudo   Jastrow            Var[E_L]_VMC   Var[E_L^DLA]_VMC
eCEPP    no                 0.181(3)       0.181(3)
eCEPP    e-e                0.0401(1)      0.0317(1)
eCEPP    e-e, e-n           0.0285(3)      0.0135(1)
eCEPP    e-e, e-n, e-e-n    0.0278(2)      0.0133(1)
ccECP    e-e, e-n, e-e-n    0.0355(4)      0.0166(1)

The lower variance automatically translates into a smaller stochastic error in the DMC evaluations with the same sampling. Thus, DLA is more efficient in most of the simulations involving organic molecules because they are characterized by the presence of many carbon atoms, where DLA is roughly 30% faster than LA and even more compared to TM (where it crucially depends on the time step value).
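The comparison between eCEPP and ccECP "in relative terms" can be made explicit by taking the ratio of the DLA variance to the standard variance for each row of Table II. The sketch below does this with the tabulated central values only (error bars are ignored). Reading the ratio as a relative sampling cost is an assumption: it holds if the sampling term dominates Eq. (E1), if the cost per step is taken as equal, and if the VMC variances are a fair proxy for the DMC ones. The names `TABLE_II` and `variance_ratios` are illustrative.

```python
# Rows of Table II for the boron atom:
# (pseudopotential, Jastrow terms, Var[E_L]_VMC, Var[E_L^DLA]_VMC), in a.u.
TABLE_II = [
    ("eCEPP", "no", 0.181, 0.181),
    ("eCEPP", "e-e", 0.0401, 0.0317),
    ("eCEPP", "e-e, e-n", 0.0285, 0.0135),
    ("eCEPP", "e-e, e-n, e-e-n", 0.0278, 0.0133),
    ("ccECP", "e-e, e-n, e-e-n", 0.0355, 0.0166),
]


def variance_ratios(rows):
    """Map (pseudopotential, Jastrow) to Var[E_L^DLA]_VMC / Var[E_L]_VMC."""
    return {(pp, jastrow): var_dla / var_std
            for pp, jastrow, var_std, var_dla in rows}


ratios = variance_ratios(TABLE_II)
full = "e-e, e-n, e-e-n"
ecepp_ratio = ratios[("eCEPP", full)]
ccecp_ratio = ratios[("ccECP", full)]
```

With the full Jastrow factor the ratio is close to 0.48 for eCEPP and 0.47 for ccECP, so the two pseudopotentials give nearly the same relative reduction even though their absolute variances differ.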

FIG. 6. Local energy variances in VMC, Var[E_L]_VMC in blue and Var[E_L^DLA]_VMC in green, per valence electron on the first-row atoms, using eCEPP pseudopotentials60 (He-core), a Jastrow factor with e-e, e-n, and e-e-n terms, and a single Slater determinant obtained from a DFT/LDA calculation with the aug-cc-pVQZ-eCEPP basis.60

REFERENCES

1 W. M. C. Foulkes, L. Mitas, R. J. Needs, and G. Rajagopal, Rev. Mod. Phys. 73, 33 (2001).
2 A. Zen, J. G. Brandenburg, J. Klimeš, A. Tkatchenko, D. Alfè, and A. Michaelides, Proc. Natl. Acad. Sci. U. S. A. 115, 1724 (2018).
3 J. G. Brandenburg, A. Zen, M. Fitzner, B. Ramberger, G. Kresse, T. Tsatsoulis, A. Grüneis, A. Michaelides, and D. Alfè, J. Phys. Chem. Lett. 10, 358 (2019).
4 A. Zen, L. M. Roch, S. J. Cox, X. L. Hu, S. Sorella, D. Alfè, and A. Michaelides, J. Phys. Chem. C 120, 26402 (2016).


5 Y. S. Al-Hamdani, M. Rossi, D. Alfè, T. Tsatsoulis, B. Ramberger, J. G. Brandenburg, A. Zen, G. Kresse, A. Grüneis, A. Tkatchenko, and A. Michaelides, J. Chem. Phys. 147, 044710 (2017).
6 Y. S. Al-Hamdani, D. Alfè, and A. Michaelides, J. Chem. Phys. 146, 094701 (2017).
7 Y. S. Al-Hamdani, M. Ma, D. Alfè, O. A. von Lilienfeld, and A. Michaelides, J. Chem. Phys. 142, 181101 (2015).
8 J. Trail, B. Monserrat, P. López Ríos, R. Maezono, and R. J. Needs, Phys. Rev. B 95, 121108 (2017).
9 Y. Luo, A. Benali, L. Shulenburger, J. T. Krogel, O. Heinonen, and P. R. C. Kent, New J. Phys. 18, 113049 (2016).
10 M. Dubecký, L. Mitas, and P. Jurecka, Chem. Rev. 116, 5188 (2016).
11 N. Devaux, M. Casula, F. Decremps, and S. Sorella, Phys. Rev. B 91, 081101 (2015).
12 Y. Wu, L. K. Wagner, and N. R. Aluru, J. Chem. Phys. 142, 234702 (2015).
13 H. Zheng and L. K. Wagner, Phys. Rev. Lett. 114, 176401 (2015).
14 L. K. Wagner and P. Abbamonte, Phys. Rev. B 90, 125129 (2014).
15 S. Azadi, B. Monserrat, W. M. C. Foulkes, and R. J. Needs, Phys. Rev. Lett. 112, 165501 (2014).
16 A. Benali, L. Shulenburger, J. T. Krogel, X. Zhong, P. R. C. Kent, and O. Heinonen, Phys. Chem. Chem. Phys. 18, 18323 (2016).
17 H. Shin, Y. Luo, P. Ganesh, J. Balachandran, J. T. Krogel, P. R. C. Kent, A. Benali, and O. Heinonen, Phys. Rev. Mater. 1, 073603 (2017).
18 J. Ahn, I. Hong, Y. Kwon, R. C. Clay, L. Shulenburger, H. Shin, and A. Benali, Phys. Rev. B 98, 085429 (2018).
19 T. Tsatsoulis, F. Hummel, D. Usvyat, M. Schütz, G. H. Booth, S. S. Binnie, M. J. Gillan, D. Alfè, A. Michaelides, and A. Grüneis, J. Chem. Phys. 146, 204108 (2017).
20 K. Doblhoff-Dier, J. Meyer, P. E. Hoggan, and G.-J. Kroes, J. Chem. Theory Comput. 13, 3208 (2017).
21 R. J. Maurer, C. Freysoldt, A. M. Reilly, J. G. Brandenburg, O. T. Hofmann, T. Björkman, S. Lebègue, and A. Tkatchenko, Annu. Rev. Mater. Res. 49, 1 (2019).
22 Alternative strategies to the FN approximation have been developed,99–105 which are typically more accurate but much more demanding in terms of computational cost. Thus, typically only FN-DMC is affordable in large systems.
23 D. M. Ceperley, J. Stat. Phys. 43, 815 (1986).
24 B. L. Hammond, P. J. Reynolds, and W. A. Lester, J. Chem. Phys. 87, 1130 (1987).
25 A. Ma, N. D. Drummond, M. D. Towler, and R. J. Needs, Phys. Rev. E 71, 066704 (2005).
26 A. Zen, S. Sorella, M. J. Gillan, A. Michaelides, and D. Alfè, Phys. Rev. B 93, 241118 (2016).
27 C. J. Umrigar, M. P. Nightingale, and K. J. Runge, J. Chem. Phys. 99, 2865 (1993).
28 M. F. DePasquale, S. M. Rothstein, and J. Vrbik, J. Chem. Phys. 89, 3629 (1988).
29 M. Casula, Phys. Rev. B 74, 161102 (2006).
30 L. Mitas, E. L. Shirley, and D. M. Ceperley, J. Chem. Phys. 95, 3467 (1991).
35 O. Valsson and C. Filippi, J. Chem. Theory Comput. 6, 1275 (2010).
36 P. M. Zimmerman, J. Toulouse, Z. Zhang, C. B. Musgrave, and C. J. Umrigar, J. Chem. Phys. 131, 124103 (2009).
37 J. Toulouse and C. J. Umrigar, J. Chem. Phys. 128, 174101 (2008).
38 E. Giner, A. Scemama, and M. Caffarel, Can. J. Chem. 91, 879 (2013).
39 A. Scemama, Y. Garniron, M. Caffarel, and P.-F. Loos, J. Chem. Theory Comput. 14, 1395 (2018).
40 B. Braida, J. Toulouse, M. Caffarel, and C. J. Umrigar, J. Chem. Phys. 134, 084108 (2011).
41 M. Casula and S. Sorella, J. Chem. Phys. 119, 6500 (2003).
42 A. Zen, E. Coccia, Y. Luo, S. Sorella, and L. Guidoni, J. Chem. Theory Comput. 10, 1048 (2014).
43 M. Bajdich, L. Mitas, G. Drobny, L. Wagner, and K. Schmidt, Phys. Rev. Lett. 96, 130201 (2006).
44 B. M. Austin, D. Y. Zubarev, and W. A. J. Lester, Chem. Rev. 112, 263 (2012).
45 R. Feynman and M. Cohen, Phys. Rev. 102, 1189 (1956).
46 P. López Ríos, A. Ma, N. D. Drummond, M. D. Towler, and R. J. Needs, Phys. Rev. E 74, 066701 (2006).
47 M. Holzmann and S. Moroni, Phys. Rev. B 99, 085121 (2019).
48 B. L. Hammond, W. A. Lester, and P. J. Reynolds, Monte Carlo Methods in Ab Initio Quantum Chemistry (World Scientific, 1994), https://www.worldscientific.com/doi/pdf/10.1142/1170.
49 M. H. Kalos and P. A. Whitlock, Monte Carlo Methods, 1st ed. (Wiley, 2009).
50 Given the values of t and τ, we can approximate G(R′ ← R, t) ≈ ∫ ∏_{i=1}^{n} G(R_i ← R_{i−1}, τ) dR_{n−1} ... dR_1, where n = t/τ, R′ = R_n, and R = R_0.
51 For instance, it has been observed that the time step error is largely decreased if a Metropolis step is included in order to enforce detailed balance110,111 and that it is convenient to reformulate the algorithm in order to implement electron-by-electron updates instead of configuration-by-configuration updates as it improves the efficiency in large systems.110,111 Moreover, in the proximity of the nodes of ΨT(R), the local energy EL(R) and the drift vector V(R) diverge, yielding instabilities in the branching and drifting terms. This issue is substantially reduced by considering modified versions of the branching and drifting terms close to the nodes.26,27 The Metropolis step, the modification to the branching and drifting terms, and the electron-by-electron DMC algorithm do affect the simulations for finite value of τ but do not change the limit for τ → 0. They are very important technical aspects as they enhance the stability and efficiency of the algorithm. For instance, the development of a size consistent version of the modification to the branching term, as discussed in Ref. 26, yields a speedup of up to two orders of magnitude in the evaluations of binding and cohesive energies.2,26
52 If ΨT1 and ΨT2 are two different trial wave functions with the same nodes [e.g., having the same determinant D(R) but different Jastrow factors], the corresponding FN-DMC-LA energies, E_FN^LA[ΨT1] and E_FN^LA[ΨT2], will be in general different, E_FN^LA[ΨT1] ≠ E_FN^LA[ΨT2], because of the different localization errors.
53 Two different trial wave functions ΨT1 and ΨT2 with the same nodes have in general different FN-DMC-TM energies E_FN^TM[ΨT1] ≠ E_FN^TM[ΨT2] because their
localization errors are different. 31M. Casula, S. Moroni, S. Sorella, and C. Filippi, J. Chem. Phys. 132, 154113 54D. F. B. ten Haaf, H. J. M. van Bemmel, J. M. J. van Leeuwen, W. van Saarloos, (2010). and D. M. Ceperley, Phys. Rev. B 51, 13039 (1995). 32 55 24 The localization error goes to zero (ideally) as we can find a ΨT closer to the Notice that this approach was used three decades ago by Hammond et al. in (unknown) fixed-node solution ΦFN , both in LA and TM. one of the first works using PPs in FN-DMC, albeit the motivation was tomake 33 29,106 TM yields more stable simulations than LA. However, if the trial wave runs cheaper. Shortly after the LA was introduced30 – with a prescription to solve function ΨT is good, LA has typically no stability issues. In these cases, TM is often the integrals involved in the localization [see Appendix C and Eq. (C1) therein]— affected by larger finite time step and stochastic errors than LA.107 Moreover, LA and it quickly became the default approach. satisfies a detailed balance condition, while TM does not, and the size-consistency 56M. Burkatzki, C. Filippi, and M. Dolg, J. Chem. Phys. 126, 234105 (2007). condition can be harder to satisfy in TM (indeed, the first version of the TM algo- 57J. R. Trail and R. J. Needs, J. Chem. Phys. 122, 174109 (2005). rithm29 was size-inconsistent, an issue solved four years later with two alternative 58J. R. Trail and R. J. Needs, J. Chem. Phys. 139, 014101 (2013). revised TM algorithms31). Recent investigations108,109 of the localization errors 59J. R. Trail and R. J. Needs, J. Chem. Phys. 142, 064110 (2015). in LA and TM do not show that one method is clearly superior to the other in 60J. R. Trail and R. J. Needs, J. Chem. Phys. 146, 204107 (2017). terms of accuracy. 61M. C. Bennett, C. A. Melton, A. Annaberdiyev, G. Wang, L. Shulenburger, and 34 C. Filippi and C. J. Umrigar, J. Chem. Phys. 105, 213 (1996). L. Mitas, J. Chem. Phys. 147, 224106 (2017).


62 If one is concerned with the variationality of the approach, it is also possible to use TM in conjunction with DLA. The corresponding Hamiltonian is then $\hat{H}_{FN}^{DLTM} \equiv \hat{K} + \hat{V}_L + \hat{V}_{NL}^{-} + \frac{\hat{V}_{NL}^{+} D}{D} + \hat{V}_{FN}$. Within this second approach, we have that $E_{FN}^{DLTM} \geq E_0$, with the equality obtained for $D = \Phi$.
63 The reference value of −128 ± 1 meV was obtained in Ref. 3 using all-electrons. We performed here a new CCSD(T) calculation using the TN-DF pseudopotentials57 (the same used for the DMC evaluations) and we obtained an interaction energy of −130 ± 4 meV. The agreement between the two evaluations indicates that differences between CCSD(T) and DMC should not come from the quality of the employed PPs.
64 DFT calculations were performed using the PWSCF package81 with a plane wave cutoff of 600 Ry and the LDA functional. Molecular orbitals were later converted into splines82 to enhance the efficiency in QMC calculations. QMC calculations were performed using the CASINO package.
65 The error bar is the error of the fit, which is probably underestimated as we are fitting a quadratic function (so, having three parameters) with four points. As a comparison, the actual DLA evaluation with smallest τ, which is 0.001 a.u., has a stochastic error due to DMC sampling of ±5 meV.
66 The reported error comes from the fit.
67 Notice that in both LA and TM, the bias is dominated by the error in the total energy of the bound system since the wave function is by construction much better for the unbound systems. Thus, the sign of the TM error can be rationalized: TM underbinds because of the variationality of the approach, which leads to an underestimate of the total energy of the bound system when a suboptimal Jastrow is used. In LA, the optimization bias could in principle have any sign because the approach is not variational. In this specific case, the worse Jastrow leads to a lower total energy (see Fig. 2).
68 The reported errors are from the fit and are probably underestimated because we are fitting three parameters with four points, especially for the LA approach, where the simulation with the smallest time step, τ = 0.003 a.u., was very unstable and we could not make the stochastic error smaller than 5 meV.
69 J. R. Trail and R. J. Needs, J. Chem. Phys. 122, 014112 (2005).
70 A. Annaberdiyev, G. Wang, C. A. Melton, M. Chandler Bennett, L. Shulenburger, and L. Mitas, J. Chem. Phys. 149, 134108 (2018).
71 J. T. Krogel, J. A. Santana, and F. A. Reboredo, Phys. Rev. B 93, 075143 (2016).
72 P. Seth, P. L. Rios, and R. J. Needs, J. Chem. Phys. 134, 084105 (2011).
73 S. Fahy, X. W. Wang, and S. G. Louie, Phys. Rev. Lett. 61, 1631 (1988).
74 E. Ertekin, L. K. Wagner, and J. C. Grossman, Phys. Rev. B 87, 155210 (2013).
75 For instance, HF or DFT is used in the Burkatzki, Filippi, and Dolg's energy consistent pseudopotentials (BFD),56 in the Trail and Needs' norm conserving Hartree-Fock pseudopotentials (TN-NC)69 and smooth relativistic Hartree-Fock pseudopotentials (TN-DF).57
76 For instance, CCSD(T) is used in Trail and Needs' correlated electron pseudopotentials (CEPPs)58,59 and the shape and energy consistent pseudopotentials,60 and in the Mitas and collaborators' correlation consistent effective core potentials (ccECPs).61,70
77 PPs an order of magnitude more accurate for IEs have recently been reported.61
78 DFT/LDA calculation performed using the Orca package,96 which uses localized basis sets. Reported FN-DMC results are converged in the DMC time step and the size of the basis set.
79 Note that here better means a better error cancellation.
80 L. Kleinman and D. M. Bylander, Phys. Rev. Lett. 48, 1425 (1982).
81 P. Giannozzi et al., J. Phys.: Condens. Matter 21, 395502 (2009).
82 D. Alfè and M. J. Gillan, Phys. Rev. B 70, 161101 (2004).
83 In this case, the inconsistency among the Kleinman-Bylander projectors and the DFT functional used is likely the source of the instability of DMC simulations. Indeed, we can make the system much more stable by generating Kleinman-Bylander projectors using DFT/LDA atomic orbitals, for instance by using the Ld1 code included in the Quantum Espresso package.81 However, in other systems, it can be harder to improve the wave function and enhance stability. Here, we are intentionally considering a case that amplifies instability issues.
84 It should be mentioned that some modifications to Green's function can enhance slightly the stability. For instance, within the ZSGMA scheme26 to cure the local energy and the drift velocity divergences, it is possible to perform some DMC-LA simulations with large time steps (e.g., τ > 0.03 a.u.), but unfortunately the time step bias is too large.
85 It should be mentioned that there have been previous attempts to use wave functions that do not need optimization in QMC,112–114 although they have focused on relatively simpler systems and do not deal with atomic PPs and localization errors in DMC.
86 R. J. Needs, M. D. Towler, N. D. Drummond, and P. L. Rios, J. Phys.: Condens. Matter 22, 023201 (2010).
87 The TURBORVB Quantum Monte Carlo package includes a complete suite for variational, diffusion, and Green's function quantum Monte Carlo calculations on molecules and solids, for wave function and geometry optimization, and for quantum Monte Carlo based molecular dynamics simulations. It is developed by Sorella and co-workers. See http://people.sissa.it/~sorella/web/index.html.
88 J. Kim et al., J. Phys.: Condens. Matter 30, 195901 (2018).
89 K. Saritas, T. Mueller, L. Wagner, and J. C. Grossman, J. Chem. Theory Comput. 13, 1943 (2017).
90 W. W. Tipton, N. D. Drummond, and R. G. Hennig, Phys. Rev. B 90, 125110 (2014).
91 For instance, in CASINO,86 Jastrow terms involving arbitrary numbers of particles can be introduced,115 and in TurboRVB,87 there is also the possibility to use four-body e-e-n-n terms, and the e-e-n term is an explicit function of $(r_{i\alpha}, r_{j\alpha})$; see Ref. 116.
92 A. Grüneis, S. Hirata, Y.-y. Ohnishi, and S. Ten-no, J. Chem. Phys. 146, 080901 (2017).
93 The employed Jastrow for the AE calculations has e-e, e-n, and e-e-n terms. Notice that in AE calculations, there is no localization error, i.e., the Jastrow factor does not affect the accuracy and does not lead to any uncertainty.
94 For a derivation of Eq. (E1), see the supplementary material of Ref. 2.
95 This is likely because the T-moves are performed after the Metropolis step, so the TM algorithm does not satisfy detailed balance.
96 F. Neese, Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2, 73 (2012).
97 Using a localized basis, we get rid of possible issues with the Kleinman-Bylander projectors in plane-wave DFT calculations. Moreover, we crosschecked in the carbon atom case that the $D$ obtained from plane-wave DFT calculations with 600 Ry cutoff and from open-system DFT calculations with an aug-cc-pVQZ-eCEPP basis have similar quality. Indeed, the variance obtained when using the same Jastrow factor is roughly the same.
98 Here, there is a possible ambiguity, as one could wonder if we have to minimize the variance $\mathrm{Var}[E_L]_{VMC}$ or $\mathrm{Var}[E_L^{DLA}]_{VMC}$, or provide different optimization if we intend to use the original Hamiltonian $\hat{H}$ or the DLA version $\hat{H}^{DLA}$. However, in all cases considered, we noticed that the parameters of $J$ that minimize $\mathrm{Var}[E_L]_{VMC}$ are, within the stochastic uncertainty, the same as those that minimize $\mathrm{Var}[E_L^{DLA}]_{VMC}$, and vice-versa.
99 D. M. Ceperley and B. J. Alder, J. Chem. Phys. 81, 5833 (1984).
100 R. Bianchi, D. Bressanini, P. Cremaschi, and G. Morosi, J. Chem. Phys. 98, 7204 (1993).
101 D. M. Arnow, M. H. Kalos, M. A. Lee, and K. E. Schmidt, J. Chem. Phys. 77, 5562 (1982).
102 J. B. Anderson, C. A. Traynor, and B. M. Boghosian, J. Chem. Phys. 95, 7418 (1991).
103 S. Zhang and M. H. Kalos, Phys. Rev. Lett. 67, 3074 (1991).
104 M. H. Kalos and F. Pederiva, Phys. Rev. Lett. 85, 3547 (2000).
105 F. A. Reboredo, R. Q. Hood, and P. R. C. Kent, Phys. Rev. B 79, 195117 (2009).
106 N. D. Drummond, J. R. Trail, and R. J. Needs, Phys. Rev. B 94, 165170 (2016).
107 M. Pozzo and D. Alfè, Phys. Rev. B 77, 104103 (2008).
108 J. T. Krogel and P. R. C. Kent, J. Chem. Phys. 146, 244101 (2017).
109 A. L. Dzubak, J. T. Krogel, and F. A. Reboredo, J. Chem. Phys. 147, 024102 (2017).
110 D. Ceperley, M. H. Kalos, and J. L. Lebowitz, Macromolecules 14, 1472 (1981).


111 P. J. Reynolds, D. M. Ceperley, B. J. Alder, and W. A. Lester, J. Chem. Phys. 77, 5593 (1982).
112 D. Ceperley, Phys. Rev. B 18, 3126 (1978).
113 M. Holzmann, D. M. Ceperley, C. Pierleoni, and K. Esler, Phys. Rev. E 68, 046707 (2003).
114 B. Wood and W. M. C. Foulkes, J. Phys.: Condens. Matter 18, 2305 (2006).
115 P. López Ríos, P. Seth, N. D. Drummond, and R. J. Needs, Phys. Rev. E 86, 036703 (2012).
116 A. Zen, Y. Luo, S. Sorella, and L. Guidoni, J. Chem. Theory Comput. 9, 4332 (2013).
