Questaal: a package of electronic structure methods based on the linear muffin-tin orbital technique

Dimitar Pashova, Swagata Acharyaa, Walter R. L. Lambrechtb, Jerome Jacksonc,∗, Kirill D. Belashchenkod, Athanasios Chantise, Francois Jameta, Mark van Schilfgaardea

aDepartment of Physics, King’s College London, Strand, London WC2R 2LS, United Kingdom bDepartment of Physics, Case Western Reserve University, Cleveland, Ohio 44106, USA cScientific Computing Department, STFC Daresbury Laboratory, Warrington WA4 4AD, United Kingdom dDepartment of Physics and Astronomy and Nebraska Center for Materials and Nanoscience, University of Nebraska-Lincoln, Lincoln, Nebraska 68588, USA eAmerican Physical Society, 1 Research Road, Ridge, New York 11961, USA

Abstract This paper summarises the theory and functionality behind Questaal, an open-source suite of codes for calculating the electronic structure and related properties of materials from first principles. The formalism of the linearised muffin-tin orbital (LMTO) method is revisited in detail and developed further by the introduction of short-ranged tight-binding basis functions for full-potential calculations. The LMTO method is presented in both Green’s function and wave function formulations for bulk and layered systems. The suite’s full-potential LMTO code uses a sophisticated basis and augmentation method that allows an efficient and precise solution to the band problem at different levels of theory, most importantly density functional theory, LDA+U, quasi-particle self-consistent GW and combinations of these with dynamical mean field theory. This paper details the technical and theoretical bases of these methods, their implementation in Questaal, and provides an overview of the code’s design and capabilities. Keywords: Questaal, Linear Muffin Tin Orbital, Screening Transformation, Density Functional Theory, Many-Body Perturbation Theory

PROGRAM SUMMARY framework of an extension to the linear muffin-tin Program Title: Questaal orbital (LMTO) technique including a highly precise and Licensing provisions: GNU General Public License, efficient full-potential implementation. An advanced version 3 fully-relativistic, non-collinear implementation based on Programming language: Fortran, C, Python, Shell the atomic sphere approximation is used for calculating Nature of problem: Highly accurate ab initio calculation transport and magnetic properties. of the electronic structure of periodic solids and of the resulting physical, spectroscopic and magnetic properties for diverse material classes with different strengths and kinds of electronic correlation. 1. Introduction Solution method: The many electron problem is considered at different levels of theory: density Different implementations of DFT are distin- functional theory, many body perturbation theory guished mainly by their basis set, which forms the in the GW approximation with different degrees of core of any electronic structure method, and how arXiv:1907.06021v2 [cond-mat.mtrl-sci] 18 Nov 2019 self consistency (notably quasiparticle self-consistent they orthogonalise themselves to the core levels. Us- GW ) and dynamical mean field theory. The solution ing these classifications most methods adopt one of to the single-particle band problem is achieved in the four possible combinations shown in Fig. 1. In the vast majority of cases, basis sets consist of either ∗Corresponding author. E-mail address: atom-centred spatially localised functions (lower [email protected] panel of Fig. 1), or plane waves (PW) (upper). As

Preprint submitted to Computer Physics Communications November 19, 2019 Figure 1: 2×2 rubric for main classifications of basis set. Nuclei are shown as dots. The all-electron methods APW and KKR on the right substitute (augment) the envelope function (green) with numerical solutions of partial waves Figure 2: Left: basis function in the vicinity of a nucleus, inside augmentation spheres (blue and red). Parts inside showing orthogonalisation to core states (solid line), and augmentation spheres are called “partial waves.” The two corresponding basis function in a pseudopotential (dashed figures on the left use a pseudopotential allowing their enve- line). Right: a muffin-tin potential: the potential is flat in the lope functions to be smooth, with no augmentation needed. interstitial region between sites, and spherically symmetric A pseudopotential’s radius corresponds to a characteristic in a volume around each site. Blue depicts an augmentation augmentation radius. The top two figures use plane waves for radius sR around a sphere centred at some nuclear position envelope functions; the bottom two use atom-centred local R where the interstitial and augmented regions join. basis sets. We denote localised basis sets as “KKR”, for the Korringa-Kohn-Rostocker method [1] as it plays a central role in this work; but there are other kinds, for example the Gaussian orbitals widely favoured among quantum chemists. of a one-dimensional wave equation (Sec. 2), which can be efficiently accomplished. An immense amount of work has followed the orig- for treatment of the core, it is very common to sub- inal ideas of Herring and Slater. The Questaal pack- stitute an effective repulsive (pseudo)potential to age is an all-electron implementation in the Slater simulate its effect, an idea initially formulated by tradition, so we will not further discuss the vast liter- Conyers Herring [2]. Pseudopotentials make it pos- ature behind the construction of a pseudopotential, sible to avoid orthogonalisation to the core, which except to note there is a close connection between allows the (pseudo)wave functions to be nodeless pseudopotentials and the energy linearisation proce- and smooth. For methods applied to condensed dure to be described below. Bl¨ochl’s immensely pop- matter, the primary alternative method, formulated ular Projector Augmented-Wave method [4] makes a by Slater in 1937 [3], keeps all the electrons. Space construction intermediate between pseudopotentials is partitioned into non-overlapping spheres centred and APWs. Questaal uses atom-centred envelope at each nucleus, with the interstitial region mak- functions instead of plane waves (Sec. 3), and an aug- ing up the rest. The basis functions are defined mentation scheme that resembles the PAW method by plane waves in the interstitial, which are re- but can be converged to an exact solution for the placed (“augmented”) by numerical solutions of the reference potential, as Slater’s original method did. Schr¨odingerequations (partial waves) inside the aug- The spherical approximation is still almost univer- mentation spheres. The two solutions must be joined sally used to construct the basis set, and thanks smoothly and differentiably on the augmentation to the variational principle, errors are second order sphere boundary (minimum conditions for a non- in the nonspherical part of the potential. The non- singular potential). Slater made a simplification: he spherical part is generally quite small, and this is approximated the potential inside the augmentation widely thought to be a very good approximation, spheres with its spherical average, and also the in- and the Questaal codes adopt it. terstitial potential with a constant. This is called For a MT potential (VMT taken to be 0 for the Muffin Tin (MT) approximation; see Fig. 2. simplicity), the Schr¨odingerequation for energy Solutions to spherical potentials are separable ε has locally analytical solutions: in the intersti- into radial and angular parts, φ`(ε, r)YL(ˆr). The φ` tial the solution can expressed as a plane wave i k·r 2 2 are called partial waves and YL are the spherical e , with ε=~ k /2m. (We will use atomic Ryd- harmonics. Here and elsewhere, angular momentum berg units throughout, ~=2m=e2/2=1). In spher- labelled by an upper case letter refers to both the ical coordinates envelope functions can be Hankel ` and m parts. A lower case symbol refers to the functions HL(E, r)=h`(kr)YL(ˆr) or Bessel functions orbital index only (` is the orbital part of L=(`, m)). j`(kr)YL(ˆr), except that Bessel functions are ex- The φ` are readily found by numerical integration cluded as envelope functions because they are not 2 bounded in space. Inside the augmentation spheres, employed in the first ab initio description of in- solutions consist of some linear combination of the stanton dynamics [10], one of the first noncollinear φ`. magnetic codes and the first ab initio description The all-electron basis sets “APW” (augmented of spin dynamics [11], first implementation of exact plane wave) and “KKR” [1] are both instances of exchange and exact exchange+correlation [12], one augmented-wave methods: both generate arbitrar- of the first all-electron GW implementations [13], ily accurate solutions for a muffin-tin potential. and early density-functional implementations of non- They differ in their choice of envelope functions equilibrium Green’s functions for Landauer-Buttiker (plane waves or Hankel functions), but they are transport [14]. In 2001 Aryasetiawan’s GW was similar in that they join onto solutions of partial extended to a full-potential framework to become waves in augmentation spheres. Both basis sets the first all-electron GW code [15]. Soon after are energy-dependent, which makes them very com- the concept of quasiparticle self-consistency was plicated and their solution messy and slow. This developed [16], which has dramatically improved difficulty was solved by O.K. Andersen in 1975 [5]. the quality of GW. Its most recent extension is to His seminal work paved the way for modern “lin- combine QSGW with DMFT. It and the code of ear” replacements for APW and KKR, the LAPW Kutepov et al. [17] are the first implementations and Linear Muffin Tin Orbitals (LMTO) method. of QSGW +dynamical mean field theory (DMFT); By making a linear approximation to the energy and to the best of our knowledge it has the only dependence of the partial waves inside the aug- implementation of response functions (spin, charge, mentation spheres (Sec. 2.4), the basis set can be superconducting) within QSGW +DMFT. made energy-independent and the eigenfunctions and eigenvalues of the effective one-particle equa- 1.2. Main Features of the Questaal Package tion obtained with standard diagonalisation tech- Ideally a basis set is complete, minimal, and short niques. LAPW, with local orbitals added to widen ranged. We will use the term compact to mean the the energy window over which the linear approx- extent to which a basis can satisfy all these proper- imation is valid (Sec. 3.7.3) is widely viewed to ties: the faster a basis can yield converged results be the industry gold standard for accuracy. Sev- for a given rank of Hamiltonian, the more compact eral well known codes exist:WIEN2K (http://susi. it is. It is very difficult to satisfy all these prop- theochem.tuwien.ac.at) and its descendants such erties at once. KKR is by construction complete as the Exciting code (http://exciting-code.org) and minimal for a “muffin-tin” potential, but it is and FLEUR (http://www.flapw.de). A recent not short-ranged. In 1984 it was shown (by An- study [6] established that these codes all generate dersen once again! [18]) how to take special linear similar results when carefully converged. Questaal’s combinations of muffin-tin orbitals (“screening trans- main DFT code is a generalisation of the LMTO formation”) to make them short ranged. Andersen’s method (Sec. 3), using the more flexible smooth Han- screening transformation was derived for LMTOs, kel functions (Sec. 3.1) instead of standard Hankels in conjunction with his classic Atomic Spheres Ap- for the envelope functions. Accuracy of the smooth- proximation, (ASA, Sec. 2.7), and screening has Hankel basis is also high (Sec. 3.13), and though not subsequently been adopted in KKR methods also. quite reaching the standard of the LAPW methods, The original Questaal codes were designed around it is vastly more efficient. If needed, Questaal can the ASA, and we develop it first in Sec. 2.5. Its add APW’s to the basis to converge it to the LAPW main code no longer makes the ASA approximation, standard (Sec. 3.10). and generalises the LMTOs to more flexible func- tions; that method is developed in Sec. 3. These 1.1. Questaal’s History functions are nevertheless long-ranged and cannot Questaal has enjoyed a long and illustrious his- take advantage of very desirable properties of short- tory, originating in O.K. Andersen’s in the ranged basis sets. Very recently we have adapted 1980’s as the standard “Stuttgart” code. It has Andersen’s screening transformation to the flexible undergone many subsequent evolutions, e.g. an basis of full-potential method (Sec. 2.9). Screen- early all-electron full-potential code [7], which was ing provides a framework for the next-generation used in one of the first ab initio calculations of basis of “Jigsaw Puzzle Orbitals” (JPOs) that will the electron-phonon interaction for superconductiv- be a nearly optimal realisations of the three key ity [8], an efficient molecules code [9] which was properties mentioned above. 3 Most implementations of GW are based on plane in the ASA [26], derived from the Stuttgart waves (PWs), in part to ensure completeness, but ASA code; and it was the first all-electron GW also because implementation is greatly facilitated implementation. Kotani and van Schilfgaarde by the fact that the product of two plane waves is extended it to a full-potential framework in analytic. GW has also been implemented recently 2002 [15]. A shell script lmgw runs the GW in tight-binding forms using e.g., a numerical ba- part and manages links between it and lmf; sis [19], or a Gaussian basis [20]. None of these basis 4. the Quasiparticle Self-Consistent GW (QSGW ) sets is very compact. The FHI AIMS code can be approximation, first formulated by Faleev, reasonably well converged, but only at the expense Kotani and van Schilfgaarde [16]. Questaal’s of a large number of orbitals. Gaussian basis sets are QSGW is a descendent of Kotani’s origi- notorious for becoming over-complete before they nal code, which with some modest modifica- become complete. Questaal’s JPO basis—still under tions can be found at https://github.com/ development—should bring into a single framework tkotani/ecalj/. QSGW also works syn- the key advantages of PW and localised basis sets. chronously with lmf, yielding either a static Questaal implements: QSGW self-energy Σ0 which lmf reads as an ef- fective external potential added to the Hamilto- 1. density-functional theory (DFT) based on com- nian, or a dynamical self-energy (diagonal part) mon LDA and GGA exchange-correlation func- which a post-processing code lmfgws, uses to tionals (other LDA or GGA, but not meta- construct properties of an interacting Green’s GGA, varients are available via libxc[21]). function, such as the spectral function. See There is a standard full-potential DFT code, Sec. 4; lmf (Sec. 3), and also three codes (lm, lmgf, 5. extensions to Dynamical Mean Field Theory lmpg) that implement DFT in the classical using a bath based on either DFT or QSGW. Atomic Spheres Approximation [5, 18], pre- Questaal does not have its own implementa- sented in Sec. 2.7. The latter use the screened, tion of DMFT, but has interfaces to Haule’s tight-binding form (Sec. 2.10). lm, a descendant CTQMC solver [27], and to the TRIQS li- of Andersen’s standard ASA package (Stuttgart brary [28]. See Sec. 5; code), is an approximate, fast form of lmf, use- 6. an empirical tight-binding model (tbe). The ful mainly for close-packed magnetic metals; interested reader is referred to Refs [29, 30] and lmgf (Sec. 2.12) is a Green’s function implemen- the Questaal web site; and tation closely related to the KKR Green’s func- 7. a large variety of other codes and special- tion, parameterised so that it can be linearised. purpose editors to supply input the electronic Sec. 2.13 shows how this is accomplished, result- structure methods, or to process the output. ing in an efficient, energy-independent Hamil- tonian lm uses. lmgf has two useful extensions: 1.3. Outline of the Paper the coherent potential approximation (CPA, The aim of the paper is to provide a consistent Sec. 2.18) and the ability to compute magnetic presentation of the key expressions and ideas of exchange interactions. lmpg (Sec. 2.14) is a the LMTO method, and aspects of electronic struc- principal layer Green’s function technique sim- ture theory as they are implemented in the different ilar to lmgf but designed for layer geometries codes in the Questaal suite. Necessarily this presen- (periodic boundary conditions in two dimen- tation is rather lengthy; the paper is organised as sions). lmgf is particularly useful for Landauer- follows. Section 2 describes the LMTO basis, assum- Buttiker transport [22, 23, 24, 25], and it in- ing the muffin-tin and atomic sphere descriptions cludes a nonequilibrium Keldysh technique [14]. of the potential. Together with tail cancellation 2. density functional theory with local Hubbard and linearisation this comprises therefore a deriva- corrections (LDA+U) with various kinds of tion of the traditional (“second generation”) LMTO double-counting correction (Sec. 3.8). method. The transformation to a short range tight- 3. the GW approximation based on DFT. Ques- binding-like basis, and to other representations is taal’s GW package is separate from the DFT described. The formulation of the crystal Green’s code; there is an interface (lmfgwd) that sup- function in terms of the LMTO potential param- plies DFT eigenfunctions and eigenvalues to it. eters is derived, allowing the use of coherent po- It was originally formulated by Aryasetiawan tential approximation alloy theory. Non-collinear 4 magnetism and fully-relativistic LMTO techniques 2.1. One-centre Expansion of Hankel Functions are presented. Section 3 describes the full-potential An envelope function HL(E, r) has a “head” cen- code: its basis, augmentation method, core treat- tred at the origin where it is singular and tails at ment and other technical aspects are described in sites around the origin. In addition to the head, tails detail. The LDA+U method, relativistic effects, must be augmented by linear combinations of φ`(ε, r) combined LMTO+APW, and numerical precision centred there, which require HL be expanded in func- are also discussed. Section 4 describes the GW tions centred at another site. If the remote site is method: the importance of self-consistency, the na- at R relative to the head, the one-centre expansion ture of QSGW in particular and its successes and can be expressed as a linear combination of Bessel limitations. Section 5 describes Questaal’s interface functions to different DMFT solvers; section 6 discusses cal- X culation of spin and charge susceptibilities. Our HL(E, r) = SML(E, R)JM (E, r − R) (1) perspective on realising a high fidelity solution to M the many-body problem for solids is described in section 7. Finally, software aspects of the Questaal This follows from the fact that both sides satisfy the 2 project are outlined in section 8. Several appendices same second order differential equation (∇ +E) = 0. are provided with a number of useful and important The two functions centred at the origin satisfying relations for the LMTO methodology. this equation are the Hankel and Bessel functions. Hankel functions have a pole at r = 0, whereas Bessel functions are regular there, so this relation 2. The Muffin-Tin Potential and the Atomic must be true for all r < |R|. The larger the value Spheres Approximation of r, the slower the convergence with M. Expressions for the expansion coefficients S can be As noted earlier, the KKR method solves the found in various textbooks; they of course depend on Schr¨odinger equation in a MT potential to arbitrar- how H and J are defined. Our standard definition ily high accuracy. We develop this method first, is and show how the LMTO method is related to S (E, R) = it. In Sec. 3.1 we show how the basis in lmf is MK X k (k+m−`)/2 related to LMTOs. The original KKR basis set 4π CKLM (−1) (−E) HL(E, R) (2) consists of spherical Hankel (Neumann) functions L H (E, r) = h (kr)Y (ˆr) as envelope functions RL R` L The C are Gaunt coefficients (Eq. (A.8)). with k2 = E, augmented by linear combinations of KLM To deal with solids with many sites, we write the partial waves φ (ε, r)Y (ˆr) inside augmentation R` RL one-centre expansion using subscript R to denote a spheres (Fig. 3). (See the Appendix for definitions nucleus and r relative to the nuclear coordinate: and the meaning of subscripts R and L.) The en- velope must be joined continuously and differen- X H (E, r) = S 0 (E)J 0 (E, r) (3) tiably onto augmentation parts, since the kinetic RL R M,RL R M M energy cannot be singular. Note that for large `, ` φR` → const×r . This is because the angular part Envelope functions then have two ` cutoffs: `b for of the kinetic energy becomes dominant for large `. the head at R, and `a for the one-centre expansion 0 at R . These need not be the same: `b determines the rank of the Hamiltonian, while `a is a cutoff beyond which φR0` × φR0`0 is well approximated 0 by const×r`+` . A reasonable rule of thumb for vMTZ reasonably well converged calculations is to take `b one number larger than the highest ` character in the valence bands. Thus `b = 2 is reasonable for sp elements, `b = 3 for transition metal elements, `b = 4 for f shell elements. In the ASA, reasonable Figure 3: Schematic of muffin-tin potential (black) and a results can be obtained for `b = 2 for transition solution (red) in a three-atom chain. metal elements, and `b = 3 for f shell elements. As for `a, traditional forms of augmentation usually 5 Vn Vn+1 require `a = 2`b for comparable convergence: lmf ) ε does it in a unique way that converges much faster ( D than the traditional form and it is usually enough l to take `a = `b + 1 [31] (Sec. 3.6).

2.2. Partial Waves in the MT Spheres −l−1 Partial waves in a sphere of radius s about a nu- cleus must be solved in the four coordinates, r and ε Cn energy ε. Provided r is not too large, v(r) is approx- imately spherical, v(r)≈v(r); Even if the potential is not spherical, it is assumed to be for construction of Figure 4: Variation of φ`(ε, r) with ε for an s orbital. the basis set, as noted earlier. Solutions for a spher- ical potential are partial waves φ`(ε, r)YL(ˆr), where Y are the . Usually Questaal L Consider the change in φ (ε, r) with ε for a uses real harmonics Y (ˆr) instead (see Appendix ` L given v(r). As ε increases φ (ε, r) acquires more for definitions). ` curvature (Fig. 4). In the interval between ε=ε0 φ`(ε, r) satisfies Schr¨odinger’sequation 0 where φ`(ε0, s)=0 so that D` = 0, and ε2 where 2 0 (−∇ + v(r) − ε)φ`(ε, r) = 0 (4) φ`(ε2, s)=0, φ`(ε, s) is positive. D` thus decreases monotonically, with D` → −∞ as ε→ε2. At some We are free to choose the normalisation of φ and ε1 in this region D`= − ` − 1, which is the loga- use rithmic derivative of a Hankel function of energy Z s 2 2 0. ε1 is called the “band centre” C` for reasons to φ (ε, r)r dr = 1 (5) be made clear in Sec. 2.5: it is close to an atomic 0 level and in tight-binding theory would correspond One way to think about solutions of Schr¨odinger’s to an on-site matrix element. Increasing from ε2, equation is to imagine each nucleus, with some v (r) R D` decreases monotonically from +∞ as shown in around it, as a scatterer to waves incident on it. Scat- Fig. 4, reaching 0 once more, passing through some tering causes shifts in the phase of the outgoing wave. ε3 where D`=+`. This is the logarithmic derivative The condition that all the sites scatter coherently, of a Bessel function of energy 0, and is traditionally which allows them to sustain each other without any called V`. incident wave, gives a solution to Schr¨odinger’sequa- Thus D` is a monotonically decreasing cotangent- tion. This condition is explicit in the KKR method, like function of ε, with a series of poles. At each pole and forms the basis for it. Imagine a “muffin -tin” φ`(ε, r) acquires an additional node, incrementing potential—flat in the interstitial but holes carved the principal quantum number n. Between poles out around each nucleus of radius s. The phase shift n is fixed and there are parameters C` and V` for from a scatterer at R is a property of the shape each n. The linear method approximates D` with of φR` at its MT boundary, φR`(ε, s). Complete a simple pole structure, which is accurate over a information about the scattering properties of the certain energy window. Similarly pseudopotentials sphere, if v(r)=v(r), can be parameterised in terms are constructed by requiring the pesudofunction of φ (ε, s) and its slope, as we will see. R` to match D` of the free atom, in a certain energy For a small change in energy of the incident wave, window. there can be a strong change in the phase shift. In For principal quantum number n, φ` has n−`−1 the region near an eigenstate of the free atom the nodes and may vary rapidly to be orthogonal to energy dependence is much stronger for the partial deeper nodes. This poses no difficulty: we use a wave than it is for the incident waves striking it, so shifted logarithmic radial mesh, with point i given electronic structure methods focus on the scattering by properties of the partial waves φ`. This informa- a(i−1) tion is typically expressed through the logarithmic ri = b{e − 1} derivative function   Typically a few hundred points are needed for ac- r dφ`(ε, r) D`(ε) ≡ D{φ`(ε)} = (6) curate integration. Core and valence waves use the φ`(ε, r) dr s same mesh. 6 2.3. Energy Derivative of D The third line follows from Eq. (4), and the second Consider the matrix element integrated over a from Green’s second identity which adds a surface ˙ sphere of radius s: term when φ and φ are interchanged. The Wron- skian W {a, b} is defined as 2 0 = hφ`(εν )| − ∇ + v − ε |φ`(ε)i W {a(s), b(s)} ≡ s2 [a(s)b0(s) − a0(s)b(s)] εν is some fixed energy. Taking into account the boundary condition at s, we obtain = sa(s)b(s)[D{b} − D{a}] (12)

hφ`(ε)| ε − εν |φ`(εν )i = for a pair of functions a(r) and b(r) evaluated at point s. − [D`(ε) − D` (εν )]sφ`(εν , s)φ`(ε, s) from Green’s second identity. From this we obtain the energy derivative of D` as [5] 2.5. The Traditional LMTO Method

D`(ε) − D` (εν ) −1 D˙ `(ε) ≡ lim = (7) ε→ε 2 ν ε − εν sφ` (ε, s) For historical reasons Anderson constructed the original LMTO formalism with non-standard defini- 2.4. Linearisation of Energy Dependence in the Par- tions for the Hankel and Bessel functions. We follow tial Waves those definitions in order to be consistent with the An effective way to solve Schr¨odinger’sequation historical literature. In this paper they are named ˆ ˆ is to linearise the energy-dependence of the partial H` and J` and are defined in Appendix B. Here we waves φ`(ε, r), as follow Andersen’s development only in the context of E≤0, with κ2= − E. Note that E can be chosen ˙ φ`(ε, r) ≈ φ`(εν , r) + (ε − εν )φ`(εν , r) (8) freely and need not be connected to the eigenvalue ε. But for exact solutions in a MT potential, the This was Andersen’s most important contribution energy of HˆL(E, r) and Jˆ`(E, r) must be chosen so to electronic structure theory [5]: it had a dramatic E=ε−vMTZ. impact on the entire field. We will make extensive LMTO and KKR basis sets solve Schr¨odinger’s use of the linear approximation here. equation in a muffin-tin potential, which in the inter- In the linear approximation, four parameters stitial reduces to the Helmholtz equation, and have (φ(s), φ˙(s), and their logarithmic derivatives) com- linear combinations of Hankel functions as solutions pletely characterise the scattering properties of a that satisfy appropriate boundary conditions. We sphere with v = v(r). Only three of them turn out defer treatment of the boundary conditions to the to be independent. To see this, obtain an equation next section and continue the analysis of partial for φ˙ by differentiating Eq. (4) w.r.t. ε: waves for a single scattering centre for now. (−∇2 + v − ε)φ˙(ε, r) = φ(ε, r) (9) As noted, the scattering properties depend much more strongly on the partial waves than the energy With the normalisation Eq. (5), φ and φ˙ are orthog- dependence of the envelopes. In keeping with the onal traditional LMTO method, we assume that the ki- netic energy of the envelopes vanishes (the energy ˙ hφ(εν )φ(εν )i = 0 (10) is taken to be close to the MT potential), then Schr¨odinger’sequation reduces further to Laplace’s Using the normalisation Eq. (5), we can estab- ˙ equation, whose solutions are Hankel and Bessel lish the following relation between φ(εν , s), φ(εν , s), functions Hˆ and Jˆ at E=0, which we denote as D{φ} and D{φ˙}: ` ` Hˆ`(r) and Jˆ`(r). In close-packed solids there is reasonable justification for this choice: the spacing 1 = h[φ(ε )]2i = hφ(ε )| − ∇2 + v − ε |φ˙(ε )i ν ν ν ν between spheres is much smaller than the wave- ˙ 2 ˙ = hφ(εν )| − ∇ + v − εν |φ(εν )i + W {φ, φ} length of a low-energy solution to the wave equation ˆ = W {φ(s), φ˙(s)} (one not too far from the Fermi level). H`(r) and Jˆ (r) are proportional to r−`−1 and r` respectively ˙ ˙ ` = [D{φ(εν )} − D{φ(εν )}]sφ(εν , s)φ(εν , s) (11) (see Appendix) and so φ` can be continued into the 7 interstitial in the vicinity of s to JˆRL for rR>s in Eq. (13) must be taken out of the MTO because it diverges for large r, as noted. ` ` + 1 + D{ε} r  Since Jˆ` is present for r s R L ` R R R are also Bessel functions, Eq. (3). Thus, all the con- (14) tributions to Ψ inside some sphere R, in addition to the partial wave, consist of some linear combination and that do not diverge for large r . N and R R` of Bessel functions. We can find exact solutions P are coefficients fixed by requiring that the R` for the MT potential by finding particular linear value and slope are continuous at s . Thus for R combinations z that cause all the Jˆ inside each r

−1 f(r) = [W {f, b}a(r) − W {f, a}b(r)] W {a, b} φ(εν , s) (D − D{φ(εν )}) ω[D] = − · ˙ ˙ (19) φ(εν , s) (D − D{φ(εν )})

Thus the matching conditions require N and P to Eqs. (8) and (25) refer to the same object; one be is parameterised by ε while the other is parame- W {Hˆ , φ } terised by D or ω[D]. The latter is more convenient P (ε) = ` ` ` ˆ because ω[D] and P [D] both have a simple pole W {J`, φ`} structure. That each have this structure imply that w 2`+1 D{φ (ε)} + ` + 1 −−−→κ→0 2(2` + 1) ` their inverses D[ω] and D[P ] also have a simple pole s D{φ`(ε)} − ` structure. This further implies that if P is parame- (20) terised not by D but instead by ω[D], P {ω[D]} will also have a simple pole structure in ω, and depends W {Jˆ`, Hˆ`} N`(ε) = (21) on ω[D] as: W {Jˆ`, φ`} (ω[D] − ω[D{Hˆ }]) w is an arbitrary length scale, typically set to the P {ω[D]} = P {ω[D{φ˙}]}· average value of s. (ω[D] − ω[D{Jˆ}]) With the help of Eq. (7) the energy derivative of P` is readily shown to be (Eq. A21, Ref. [32]) This relation follows from the fact that P {ω} has a pole structure in ω, that P vanishes when D=D{Hˆ }, dP` w/2 and 1/P vanishes when D=D{Jˆ}. The prefactor P˙` = = (22) 2 ˙ dε [W {Jˆ`, φ`}] P {ω[D{φ(εν }]} follows from the fact that when ω[D]→∞, D→D{φ˙}. Using the following relation between Hankels and To obtain an explicit form for P (ε) a relation be- Bessels, tween ε and ω is required. Matrix elements and over- lap for Φ[D] are readily obtained from Schr¨odinger’s W {Hˆ`, Jˆ`} = w/2 equation, Eq. (4) and (9). With these equations it is readily seen that and normalisation relations Eq. (5) and (10), we can find that ˆ ˆ 2 ˙ 2 [W {H`, J`}] 2 2 P` = = N 0 2 2 ` hΦ[D ] | − ∇ + v − ε | Φ[D]i = ω[D] (26) w [W {Jˆ`, φ`}] w ν 0 ˙2 0 and therefore hΦ[D ] | Φ[D]i = 1 + hφ iω[D ]ω[D] q ε[D] can be obtained from the variational principle N` = P˙`w/2 (23) ε[D] = hΦ[D] | − ∇2 + v | Φ[D]i/hΦ2[D]i 2.8.1. Linearisation of P 2 ˙2 −1 In this section we consider a single ` only and drop = εν + ω[D]{1 + ω [D]hφ i} (27) the subscript. By linearising the energy dependence of φ, Eq. (8), it is possible to parameterise P (ε) in Linearisation of φ or Φ has errors of second order a simple manner. First, we realise P is explicitly a in ε − εν , which means the variational estimate for function of D, and implicitly depends on ε through the energy has errors of fourth order. From inspec- tion of Eq. (27) we can deduce that the following D≡D{φ`(ε)}. Writing Eq. (20) in terms of D we see that linear approximation to ε Hˆ (s) D − D{Hˆ } ε[D] = ε + ω[D] (28) P [D] = · (24) e ν Jˆ(s) D − D{Jˆ} has errors of second order in ε − εν . Thus, P can The linearised φ(ε, r) (Eq. (8)) may be re-expressed be parameterised to second order in ε as using D in place of ε: 1 ε − C ˙ P (ε) ≈ Pe(ε) = Φ(D, r) ≈ φ(εν , r) + ω[D]φ(εν , r) (25) γ ε − V 10 where the “potential parameters” γ, C and V are Name Interpretation defined as C band centre eigenvalue of MT “atom” and resonance in ex- 1 W {J,ˆ φ˙} γ = = tended system, Eq. (30) P [D{φ˙}] W {H,ˆ φ˙} ∆ bandwidth bandwidth in the absence 2`+1 κ→0 (s/w) D{φ˙} − `} of hybridisation with −−−→ (29) other orbitals, Eq. (33) 2(2` + 1) D{φ˙} + ` + 1 p small parameter 3rd order correction to W {H,ˆ φ} second-order potential C−εν = ω[D{Hˆ }] = − W {H,ˆ φ˙} function Pe, Eq. (32) κ→0 φ(s) (D{φ} + ` + 1) V “bottom” ε where D`=+` (free elec- −−−→− (30) trons) φ˙(s) (D{φ˙} + ` + 1) γ “transformation see Sec. 2.9 W {J,ˆ φ} to orthogonal V − εν = ω[D{Jˆ}] = − W {J,ˆ φ˙} basis”

κ→0 φ(s) (D{φ} − `) −−−→− (31) Table 1: Potential parameters in the original LMTO-ASA φ˙(s) (D{φ˙} − `) method. In the tight-binding transformation of this method these symbols become parameterised by α, defining the trans- A simple pole structure can be parameterised in formation; see Sec. 2.10.1. several forms. A particularly useful one is

(ε − C)  ∆ −1 functions can be constructed without linearising φ; Pe(ε) = = + γ (32) γ(ε − C) + ∆ ε − C this is the KKR-ASA method. How linearisation of φ resolves G in an energy-independent Hamiltonian where ∆≡(C − V )γ. ∆ can be expressed in terms is described in Sec. 2.8.3. of Wronskians as 2.8.2. Spin Orbit Coupling as a Perturbation √  2 1/2 ∆ = − W {J,ˆ φ} The formalism of the preceding sections can be w extended to the Pauli Hamiltonian. If we include the w 1/2 φ(s) s`+1 spin-orbit coupling perturbatively, and include the −−−→−κ→0 [D{φ} − `] 2 (2` + 1) w`+1 mass-velocity and Darwin terms the Hamiltonian (33) becomes 1 ∂v(r) ∂ This last equation defines the sign of ∆1/2. −∇2 + v(r) − (ε − v(r))2 + + ξ(r)Lˆ · Sˆ c2 ∂r ∂r Eq. (32) is accurate only to second order because of the linear approximation Eq. (28) for ω(ε). From where the structure of Eq. (27), it is clear that ω(ε), and 2 dv(r) ξ(r) = thus P (ε) can be more accurately parameterised (to c2 rdr third order) by the substitution and 0 P (ε) = Pe(ε ) (34)  z +− ˆ ˆ 1 L L 0 3 ˙2 L · S = −+ z ε = (ε + (ε − εν ) hφ (εν )i) (35) 2 L −L The third-order parameterisation requires another The 2×2 matrix refers to spin space. In orbital parameter, sometimes called the “small parameter” space,

s z Z ¨ L = δ 0 m ˙2 2 φ(s) mm p = φ (εν , r)r dr = − (36) 3φ(s) +− p 0 L = δm0(m+1) (` + m + 1)(` − m) −+ p Thus P is parameterised to third order by four inde- L = δm0(m−1) (` + m)(` − m + 1) pendent parameters. It is sometimes convenient in a Green’s function context to use P and N (or equiva- Lˆ · Sˆ mixes spin components; also matrix elements lently P˙ ; see Eq. (23)) in place of C and ∆. Green’s of Eq. (25) and Lˆ · Sˆ depend on both ` and m. We 11 require matrix elements of the Φ(D), analogous to Hˆ`(r) is long ranged (see Eq. (B.14) for κ=0) unless Eq. (26): E = −κ2 is −1 Ry or deeper. But such a basis is not accurate: the optimal E falls somewhere in the 0 ˆ ˆ 0 0 hΦ`0m0 [D ] | ξL · S | Φlm[D]i = ξ`[D ,D]δll0 (lm|lm ) middle of the occupied part of the bands, and is roughly +0.3 Ry in close-packed systems; E=0 is a 2.8.3. How the ASA-Tail Cancellation Reduces to compromise. a Linear Algebraic Eigenvalue Problem The idea behind the “screening” or “tight-binding” The eigenvalue condition is satisfied when ε is transformation is to keep E= − κ2 near zero but varied so that render the Hilbert space of the χRL(ε, r) short |P − S| = 0 range by rotating the basis into an equivalent set α {χRL(ε, r)}→{χRL(ε, r)}, by particular linear com- α P −S is a matrix (P − S)RL,R0L0 with P diagonal binations which render χRL(ε, r) short ranged (or in RL, and S Hermitian. If P is parameterised by acquire another desirable property, e.g. be orthogo- Eq. (32) nal), see Fig. 5.

 ∆ −1

+ γ − S = 0 z − C

Multiply on the right by S−1 and the left by P −1

∆ −1 + γ − S = 0 ε − C Rearrange, keeping in mind that C, ∆, and γ are real and diagonal in RL √ √ −1 −1 −ε + C + ∆ S − γ ∆ = 0 This has the form of the linear algebraic eigenvalue problem

ehψ = εψ Figure 5: Contour of a screened s envelope function in a bcc lattice. Each contour represents a reduction in amplitude with by a factor of 10. Dashed lines show contours with negative amplitude. At the “hard core” radius (Sec. 2.10.5) on the √ √ head site a screened function has pure ` character; at the −1 −1 eh = C + ∆ S − γ ∆ (37) “hard core” radius on all the tail sites the one-centre expansion of the envelope function vanishes for ` < `max. eh is a Hermitian matrix, with eigenvalues corre- sponding to the zeros in |Pe − S|. The tilde indicates that eh is obtained from Pe and is thus accurate to Andersen sometimes called the change a “screen- second order in ε − εν (C and ∆ are calculated at ing transformation” because it is analogous to screen- ˆ εν , Eq. (30,33)). Linear MTO’s will be constructed ing in electrostatics. Note that H`(E=0, r) ∝ 1/r, α has the same form as a charge monopole, with long in Sec. 2.13, and eh can be identified with h − εν , Eq. (61), for α=γ and if P is parameterised by Pe. range behaviour. It becomes short ranged if screened by opposite charges in the neighbourhood. The 2.9. “Screening” Transformation to Short-ranged, same applies to higher order multipoles: they can Tight-binding Basis become short ranged in the presence of multipoles of opposite sign. Formally, Eq. (37) is a “tight-binding” Hamilto- nian in the sense that it is a Hamiltonian for a linear The transformation can be accomplished in an combination of atom-centred (augmented) envelopes elegant manner by admixing to the original (“bare”) ˆ ˆ 0 χ, Eq. (14) taken at some fixed (linearisation) en- envelope (renamed from HRL to HRL to distinguish ˆ 0 ergy εν . The Hamiltonian is long-ranged because it from the “screened” one) with amounts of HR0L0 12 in the neighbourhood of R: In practice Sα is calculated from

ˆ α X ˆ 0 α α −1 −1 0−1 −1 −1 HRL(r) = HR0L0 (r)BR0L0;RL (38) R0L0 S = α α − S α − α α Whatever prescription determines BR0L0;RL, it is It follows immediately from Eq. (43) that if there are evident that the Hilbert space is unchanged, and two screening representations α and β, the structure ˆ α ˆ 0 α that HRL→HRL if BR0L0;RL→δR0L0;RL constants connecting them are related by The one-centre expansion of Hˆ α in any chan- RL −1 nel R0L0 is some linear combination of Hankel and Sα −1 + α = Sβ + β (44) ˆ 0 Bessel functions, because HRL are Hankels in their 0 0 head channel R L =RL, and linear combinations of 2.10. Screened Muffin-Tin Orbitals and Potential 0 0 Bessels in other channels R L =RL (see Eq. (3)). Functions ˆ 0 0 0 Thus every HRL is expanded in R L by some partic- ular linear combination of Bessel and Hankel func- In this section we develop a screened analogue of tions the MTO’s, Eq. (14) potential functions, Eq. (20), normalisation Eq. (21), and tail cancellation condi- α 0 ˆ 0 JR0L0 (E, r) = JR0L0 (E, r) − αR0L0;RLHR0L0 (E, r) tions (16). Here we mostly concern ourselves with (39) the ASA with κ=0.

α B 0 0 determines α 0 0 , and vice-versa. α R L ;RL R L ;RL 2.10.1. Redefinitions of Symbols used as a superscript indicates that it determines the screening. We have defined a number of quantities in the A simple and elegant way to choose the screening context of the original MTO basis set, Eq. (14) that ˆ 0 will have a corresponding definition in a screened transformation is to expand every HRL in channel 0 0 α basis set. Several previously defined quantities are R L 6=RL, by the same function J`0 (κ, rR0 )YL0 (ˆrR0 ). Then α need only be specified by two indices, now labelled with a superscript 0 to indicate that their definitions correspond to the unscreened α=0 α 0 0 →α 0 0 . R L ;RL R ` ˆ ˆ 0 ˆ ˆ0 0 representation: HRL≡HRL, JRL≡JRL, PRL≡PRL, To sum up, we specify the transformation through 0 0 ˆ α NRL≡NRL and SR0L0;RL≡SR0L0;RL. αR0`0 . Envelopes HRL are expanded at site R0L06=RL, as linear combinations The “potential parameters” C, ∆, and p, Eq. (30- 36) and Table 1, can also be relabelled with ˆ α X ˆα α representation-dependent definitions. It is unfortu- HRL(r) = − JR0L0 (r)SR0L0;RL (40) R0L0 nately rather confusing, but the original definitions ˆα ˆ0 ˆ 0 JR0L0 (r) = [JR0`0 (r) − αR0`0 HR0`0 (r)]YR0L0 (ˆr) (41) without superscripts Eq. (30-36) correspond not to 0 0 0 α C , ∆ , and p , but to the particular screening rep- The structure constants S 0 0 are expansion co- R L ;RL resentation α=γ (dubbed the “γ representation”). efficients that will be determined next, but already To be consistent with the new superscript conven- α α→0 0 0 it should be evident that S −−−→S , where S are tion for P 0 and N 0, the appropriate identifications the structure constants of Eq. (3). are ˆ α In its own “head” HRL must have an additional ir- α γ ˙ ˙γ regular part. By expressing the BR0L0;RL in Eq. (38) φ(εν )≡φ (εν ), φ(εν )≡φ (εν ), and α in terms S 0 0 as R L ;RL C≡Cγ , ∆≡∆γ , p≡pγ (45) ˆ α X ˆ 0 α  HRL(r) = HR0L0 (r) δR0L0;RL + αR0`0 SR0L0;RL Another unfortunate artefact of the evolution in R0L0 LMTO formalism is that the meaning of many sym- (42) bols changed over time. γ is called Q−1 in Ref. [32]. In the most recent NMTO formalism, S and B have we can see indeed that Hˆ α (r) has the required RL exchanged meanings. Questaal’s ASA codes use one-centre expansion, Eqs. (40, 41), provided that definitions that most closely resemble the “second Sα obey a Dyson-like equation generation” LMTO formalism perhaps most clearly Sα = S0 + S0αSα or expressed in Ref. [35]. Reference 16 of that paper makes correspondences to definitions laid out in α −1 0 −1 S = S − α (43) earlier papers. 13 ˆ¯α ˙ 2.10.2. Potential and Normalisation Functions for JR0L0 is the linear combination of φ and φ that Screened MTO’s ˆα matches continuously and differentiably JR0L0 de- The MTO Eq. (14) is derived by augmenting the fined in Eq. (39). Eq. (51) uses it instead of Jˆα be- ˆ 0 envelope HR`(ε, r) by matching it smoothly onto a cause when we later construct energy-independent ˆ0 linear combination of φ`(ε, r) and JR`(r). For the MTO’s we can generate basis sets that accurately ˆ 0 ˆα screened case we match H to φ`(ε, r) and JR`(r), solve Schr¨odinger’sequation in the augmentation the latter defined by Eq. (39): The matching requires spheres. Since P α and N α depend only on values and slopes at the {sR}, the substitution has no effect W {Hˆ 0, φ } P 0(ε) on them. P α(ε) = ` ` = ` (46) ` α 0 W {Jˆ , φ`} 1 − α`P` (ε) ` 2.10.4. Tail Cancellation in the Tight-binding Rep- ˆα ˆ 0 q α W {J` , H` } ˙ α resentation N` (ε) = = P w/2 (47) W {Jˆα, φ } ` α ` ` The energy-dependent χRL, Eq. (51) exactly solve the ASA-MT potential because the trial function Eq. (46) implies X α α Ψ(ε, r) = zRLχRL(ε, r) α −1 0 −1 ∆` RL [P` (ε)] = [P` (ε)] − α` → + γ` − α` ε − C` that satisfy the set of linear equations (48) X α α α (P − S )RL;R0L0 zR0L0 = 0 (52) and is an obvious generalisation of P 0, Eq. (32). R0L0 There is also the analogue of Eq. (44) for P : and the normalisation

α −1 β −1 X α 2 [P (ε)] + α` = [P (ε)] + β` (49) |zRL| = 1 (53) ` ` RL Eq. (48) shows that ∂[P α(ε)]−1/∂ε is independent simplifies to of the screening α and X α Ψ(ε, r) = NR`φRL(ε, r) RL  ∂ −1/2 P α(ε) − [P α(ε)]−1 = ` ∂ε ` ˙ α 1/2 which is a normalised solution to the SE for v(r) = [P` (ε)] P vR(r). p ˆ 0 R = − 2/wW {H` , φ`} In practice the solution is inexact because L sum- ε − C mations are truncated. The solution is rapidly con- → √ ` (50) ∆` vergent in the L-cutoff, however; see Ref. [35] for an analysis. The last forms of Eqs. (48,50) apply when P is parameterised by Pe, Eq. (32). 2.10.5. Hard Core Radius α can be physically interpreted as equivalent to 2.10.3. Screened Muffin-Tin Orbitals specifying a “hard core” radius where the one-centre ˆ α To define the analogue of the MTO, Eq. (14), in expansion HRL vanishes in a sphere centred at 0 a screened representation we write R . This is evident from the one-centre expansion ˆα ˆα Eq. (40) and the form of J` , Eq. (41). J` vanishes α α NR`(ε)χRL(ε, r) = at the radius where  N α (ε)φ (ε, r) ˆ0 ˆ 0  R` RL αR` = J` (E, rR)/H` (E, rR)  P ˆ¯α α α + JR0L0 (r)[P − S ]R0L0;RL if r ∈ {sR}  R0L0 In his more recent developments, Andersen defined  ˆ α HRL(κ, r) if r ∈ interstitial the screening in terms of the hard core radius a` (51) instead of α, because nearly short-ranged basis func- tions can be obtained for a fixed aR`=0.7sR, inde- The partial wave φRL(ε, r) is understood to vanish pendent of κ and `. outside its own head sphere, and P is a matrix It is easy to see how such a transformation can α α diagonal in RL: PR0L0;RL(ε) = PR0L0 (ε) δR0L0;RL. render envelope functions short-ranged. The value 14 ˆ α 0 0 of HR` is forced to be zero in a sea of R L channels The Green’s function corresponding to some fixed surrounding it. Provided the aR` are suitably ad- h has the simple form justed, it quickly drives Hˆ α (r)→0 everywhere for R` G(z) = [z − h − i0+]−1 increasing r. If the aR`→0, the screening vanishes e e ˆ α ˆ 0 and HR` returns to the long-ranged HR`; while if for complex energy z. It is easy to see that Ge(z) the aR` becomes comparable to sR the value on the head must be something like 0 and 1 at the same can be expressed in the following form: time (heads and tails meet). The damping is too G(z) = ∆−1/2 gγ ∆−1/2 (54) ˆ α e e large and the HR`(r) “rings” with increasing r. For aR`=0.7rs or thereabouts the ringing is damped and where Hˆ α (r) decays exponentially with r, even for κ=0. R` γ γ γ −1 ge = [Pe − S ] (55) 2.11. MTO’s and Second Order Green’s Function This is the analogue of Eq. (17) in the γ representa- Through the eigenvectors zα of Sec. 2.10.4, we RL tion. can construct the Green’s function. This was done Peγ (z)−Sγ has a direct connection with z − eh in Appendix A of Ref. [32], where the full Green’s because of P γ (z) takes the simple form (z−C)/∆. function, including the irregular parts, are derived. e γ γ γ Here we will adopt a simpler development along the Pe −√S is linear in√z since S is independent of it, γ γ α and ∆(Pe −S ) ∆ = z−eh. lines of Ref. [5], after linearising the χRL(ε, r). One way to see why P α−Sα have similar eigenval- 2.11.1. Scattering Path Operator in Other Repre- ues for any α is to note that [P α]−1−[Sα]−1 does not sentations depend on α, since [P α]−1 and [Sα]−1 are shifted by the same amount (compare Eq. (43) and (48)). In To build G(ε) from general screening representa- scattering theory [P 0(ε)]−1 is proportional to tan- tions β we need to transform the scattering path ` γ β gent of the phase shift, and we realise that the trans- operator g →g . This can be accomplished [35] formation (P 0,S0)→(P α,Sα) corresponds merely using Eqs. (49) (44). The result is to a shift of the scattering background. The pole gβ = (P α/P β)gα(P α/P β) + (β − α)(P α/P β) structures of P α−Sα can depend on α because of the irregular parts: P α−Sα have the same poles as These transformations require only Eqs. (49) and [P α]−1−[Sα]−1 only where P α and Sα have no zeros (44); they do not depend on parameterisation of P . or poles. Some care must be taken when generating Since Eq. (50) is representation-independent, the Green’s function G. (P α/P β) can equally be written in the following The MTOs Eq. (51) form a complete Hilbert space forms: for any α, but α can be chosen to satisfy varying !1/2 physical requirements. To make short-ranged Hamil- P α(ε) P˙ α = = 1 + (α − β)P α(ε) (56) tonians it has been found empirically that the fol- P β(ε) P˙ β lowing universal choice 2.12. The ASA Green’s Function, General Repre- α = 0.34857 α = 0.05303 s p sentation α = 0.010714 α = 0 for ` > 2 d ` As shown in Sec. 2.11 the relation between the α Green’s function G(E) and the scattering path oper- yields short-ranged basis functions χRL(r) for κ=0, for any reasonably close-packed system. ator g is particularly simple when potential functions Another choice is α=γ. In Sec. 2.8.3 it was shown are parameterised to second order. The relation be- how the tail cancellation condition had the same tween the ASA approximation to G and g can be eigenvalues as a fixed Hamiltonian, Eq. (37). We are written more generally as follows. In the ASA, every now equipped make a connection with the χγ basis point r belongs to some sphere R with partial waves and Eq. (37). Moreover, this connection enables φRL(ε, r) so that us to construct the second order Green’s function. 0 First, Eq. (43) enables us to recognise the quantity GRR0 (ε, r, r ) = −1 −1 γ X 0 S −γ as S . Eq. (37) then has the simple φRL(ε, r) GRLR0L0 (ε) φR0L0 (ε, r ) two-centre form eh = C + ∆1/2 Sγ ∆1/2 LL0 15 where Henceforth, when the energy index is suppressed it means that the energy-dependent function or pa- ˙ α 1 d ln PRL(ε) rameter is to be taken at the linearisation energy GRLR0L0 (ε) = − ¯ 2 dε εν . If J is replaced by Eq. (59), Eq. (51) becomes q q ˙ α α ˙ α α + P (ε) g 0 0 (ε) P 0 0 (ε) (57) RL RLR L R L χRL(r) =  P ˙α α r and r0 (or R and R0) are the field and source points, φRL(r) + φR0L0 (r)hR0L0;RL if r ∈ {sR} R0L0 respectively. It was first shown in Ref. [32], for the (N α )−1Hˆ α (κ, r) if r ∈ interstitial “bare” representation α = 0, and for a screened R` RL (60) representation α in Ref. [35]. The first term cancels a pole appearing in the second term, connected to hα = −P α(P˙ α)−1 + [P˙ α]−1/2Sα[P˙ α]−1/2 (61) the irregular part of G (which we do not consider α here). The Hilbert space of the {χRL(r)} consists of the ˙ P α(ε) can be computed by integration of the ra- pair of functions φR0L0 and φR0L0 inside all augmen- 0 0 dial Schr¨odingerequation for any ε. If this is done, tation channels R L . Changing α merely rotates and the structure constants S are taken as energy- the Hilbert space, modifying how much φR0L0 and ˙ α dependent, this is the screened KKR method. Ques- φR0L0 each χRL contains. taal’s lmgf and lmpg parameterise P α(ε), to second Eq. (60) may be regarded as a Taylor series of α order (Eq. (48)), or to third order (Sec. 2.13.4) in ε. χRL(ε, r), Eq. (51), to first order in ε − εν , with ˙α α α Note that G does not depend on choice of α; it φ (r) h playing the part of χ˙ (r)(ε − εν ). The α α can be used to as a stringent test of the correctness eigenvalues of h are fact the eigenvalues of χRL(ε, r) of the implementation. to first order in ε − εν . If P is parameterised to second order, P (ε)→Pe(ε), hα in the γ representa- tion becomes the second order ASA Hamiltonian, 2.13. The ASA Hamiltonian: Linearisation of the Eq.(37), when α=γ. Muffin-Tin Orbitals The relation between N α(ε) and P α(ε) was al- The energy-dependent MTO, Eq. (51), exactly ready established in Eq. (47). To confirm it is con- solves the ASA potential for a fixed ε. To make a sistent with Eq. (59) at εν , revisit the definition of fixed, energy-independent basis set, we constrain the N α using Eq. (59) energy-dependent MTO, Eq. (51), to be independent W {J α, Hˆ 0} w/2 of ε. As we saw in Secs. 2.8.3 and 2.11, there is N α(ε) = = a simple energy-independent Hamiltonian Eq. (37) W {J α, φ} W {−φ˙αN α(ε)/P˙ α(ε), φ} that has the same eigenvalue spectrum as the second ˙ α ˙ α γ P (ε)w/2 P (ε)w/2 order Ge; this is χ (εν , r). = = RL N α(ε)W {φ˙α, φ} N α(ε) We can construct an energy-independent ba- α α sis χRL(r)=χRL(εν , r) for any α, by choosing which confirms Eq. (47). α the normalisation NR`(ε) in such a way that α α α α ∂χRL(ε, r)/∂ε=0 at ε = εν . Thus we require 2.13.1. Potential Parameters C , ∆ , and o The LMTO literature suffers from an unfortunate ∂ h ¯ i N α(ε)φ(ε, r) + Jˆα(r)P α(ε) = 0 proliferation of symbols, which can be confusing. ∂ε ε=εν Nevertheless we introduce yet another group be- ˙ α α ˙ ˆ¯α ˙ α N (ε)φ(ε, r) + N (ε)φ(ε, r) + J (r)P (ε)ε=εν = 0 cause they offer simple interpretations of what is happening as the basis changes with representation Define α, and also to make a connection with the Green’s function. It is helpful to remember there is a sin- α α α φ (ε, r) ≡ [N (ε)/N (εν )]φ(ε, r) (58) gle potential function P α(ε), which determines the normalisation N α(ε) though Eq. (47). α α Then the condition that χ˙ RL(r) vanish for all r P (ε) can be parameterised to second order with becomes three independent parameters C, ∆, and γ (Eq. (48)) ¯ and to third order with the “small parameter” p Jˆα(r) = −[φ˙α(ε, r)N α(ε)/P˙ α(ε)] (59) ε=εν (Eq. (36)). 16 α The energy derivative of φ (ε, r) at εν is overlap are readily obtained. Using normalisation Eq. (5) and (36), Schr¨odinger’s equation for partial ˙α ˙α ˙ φ` (εν , r) = φ` (r) = φ`(r) + o` φ(r) (62) waves Eq. (4), the parameterisation of φ˙α Eq. (62), α ˙ α α α − γ and neglecting the interstitial parts o` ≡ N` /N` → (63) (α − γ)(C − εν ) + ∆ H = χα| − ∇2 + v|χα The last form applies when P is parameterised to sec- α α α α α α α = h (1 + o h ) + (1 + h o )εν (1 + o h ) ond order. We have introduced the “overlap” poten- + hα ε p hα (67) tial parameter oα. It vanishes in the γ representation ν ` α α and consequently φ˙γ =φ˙. This fact provides a simple O = hχ |χ i α α α α α α α interpretation of χRL(r), Eq. (60) in the γ represen- = (1 + h o )(1 + o h ) + h p h (68) γ tation. χRL acquires pure φ character for its own head, and pure φ˙ in spheres where R06=R. This im- 2.13.4. Third Order Green’s Function plies that the {χγ } basis are orthogonal apart from Ge(z) depends on potential parameters C and ∆. interstitial contributions (neglected in the ASA) and These can be replaced with the following: small terms proportional to p (Eq. (36)). φ and φ˙ γ γ combine in every sphere in the exact proportion ˙ −1 Pe (ε) ∆ = [Pe (ε)] and ε − C = γ (69) γ ˙ Eq. (8) at each eigenvalue εi of h . Thus eigenval- Pe (ε) γ ues of h are correct to one order in εi−εν higher than hα6=γ . which follows from Eq. (50) with α=γ. Then In the LMTO literature two other parameters are the substitution Pe→P through the replacement α 0 3 introduced to characterise h in a suggestive form: ε→ε = ε + (ε − εν ) p, Eqs. (35) and (36). This yields an expression for G to third order—more ac- α α hRL;R0L0 = (C − εν )R` δRL;R0L0 curate than the 2nd order Ge [35, 32]. Questaal codes α α α lmgf and lmpg permit either second or third order + ∆ S 0 0 ∆ 0 0 (64) R` RL;R L R ` parameterisation. where 2.14. Principal Layer Green’s Functions α α ˙ α −1 C` − εν ≡ −P` (P` ) → (C − εν )[... ] (65) Questaal has another implementation of ASA- √ √ ∆α ≡ [P˙ α]−1/2 → ∆ [... ] (66) Green’s function theory designed mainly for trans- port. lmpg is similar in most respects to the crystal and package lmgf, except that is written as a principal- layer technique. lmpg has a ‘special direction’, which  (C − ε )(α − γ) [... ] = 1 + ν defines the layer geometry, and for which G is gen- ∆ erated in real space. In the other two directions,

α Bloch sums are taken in the usual way; thus for h + εν becomes eh (Eq. (37)) when α=γ. each q in the parallel directions, the Hamiltonian becomes one-dimensional and is thus amenable to 2.13.2. How hα Changes with Representation solution in order-N time in the number of layers N. ˙α From Eq. (63) implies that φ transforms as The first account of this method was presented in φ˙α − oαφ = φ˙β − oβφ Ref. [36], and the formalism is described in detail in Ref. [14], including its implementation of the non- Dividing χα in to φ and φ˙ parts, we realise that equilibrium case. Here we summarise the basic idea oα + [hα]−1 is independent of representation and and the main features. therefore lmpg is similar in many respects to lmgf except for its management of the layer geometry. The oβ + [hβ]−1 = oα + [hα]−1 material consists of an active, or embedded region, which is cladded on the left and right by left and 2.13.3. ASA Hamiltonian and Overlap matrix right semi-infinite leads. χα (Eq. (60)) is an energy-independent basis set PLATL PLAT PLATR and has an eigenvalue spectrum. Within the ASA z }| { z }| { z }| { (Sec. 2.7) matrix elements of the Hamiltonian and ... PL − 1 PL 0 | ... | PL n−1 PL n . . . 17 −1 The end regions are half-crystals with infinitely the leads is to modify gI by adding a self-energy repeating layers in one direction. All three regions to layers 0 and n: are partitioned into slices, or principal layers (PL), −1 along the ‘special direction’. The left- and right- end g = (P − S)I + Σ0 + Σn s,L regions consist of a single PL, denoted −1 and n, Σ0 = S0,−1 g−1−1 S−1,0 which repeat to ∓∞. Thus the trilayer geometry is Σ = S gs,R S defined by five lattice vectors: two defining the plane n n−1,n nn n−1,n normal to the interface (the potential is periodic g−1 is inverted by a sparse matrix technique. in those vectors); one vector PLAT for the active lmpg implements both the difference-equation and region and one each (PLATL and PLATR) defining the sparse-matrix techniques: both scale linearly with periodically repeating end regions. the number of layers, in memory and in time. It Far from the interface the potential is periodic has been found empirically that they execute at and states are Bloch states. It is assumed that the similar speed for small systems, while the difference- potential in each end layer is the same as the bulk equation method is significantly faster for large sys- crystal (apart from a constant shift) and repeats tems. periodically in lattice vectors PLATL and (PLATR) to ∓∞. 2.14.2. Green’s Functions for the End Regions Partitioning into PL is done because ε − H = To make g, the diagonal surface Green’s functions −1 s,L s,R G is short-ranged. It is requirement that a PL g−1−1 and gnn are required. lmpg implements two is thick enough so that H only connects adjacent schemes to find them: a “decimation” technique [37] PL. Then H is tridiagonal in the PL representation and a special-purpose difference-equation technique and the work needed to construct G scales linearly applicable for a periodic potential [38]. with the number of PL. Moreover it is possible in The latter method requires solution of a quadratic this framework to construct G for the end regions algebraic eigenvalue problem, which yields eigenval- without using Bloch’s theorem. ues r: they correspond physically to wave numbers Principal layers are defined by the user; they as r = eika. a is the thickness of the PL and k the should be chosen so that each PL is thick enough wave number in the plane normal to the interface. k so that H connects to only nearest-neighbour PL is in general complex since no boundary conditions on either side. (Utility lmscell has a facility to are imposed; it is real only for propagating states. partition the active region into PL automatically.) Eigenvalues occur in pairs, r1 and r2, and in the ∗ absence of spin-orbit coupling, r1 = 1/r2. There is a 2.14.1. Green’s Function for the Trilayer boundary condition on the end leads for the trilayer, lmpg constructs the auxiliary g, and if needed which excludes states that grow into an end region. builds G from g by scaling (Sec. 2.12). In many The surface Green’s function gs can be constructed instances g is sufficient (e.g. to calculate transmis- from the same eigenvectors that make the “bulk” sion and reflection probabilities [14]), although G is g [38]. needed to make the charge density. Note there is a A great advantage of this method is that its solu- g (or G) connecting every layer to every other one; tion provides the eigenfunctions of the system; thus thus g has two layer indices, gij (i and j refer to PL the Green’s function can be resolved into normal here). modes. A large drawback is the practical problem g=(P − S)−1 for the entire trilayer can be con- of finding a solution to the eigenvalue problem. It structed in one of two ways. The first is a difference- can be converted into a linear algebraic eigenvalue equation method, described in Ref. [36]. The second equation of twice the rank; however, the resulting is simply to invert (P − S) using sparse matrix tech- secular matrix can be nearly singular (especially if niques. Both methods require as starting points S is short-ranged). Also, the pairs r1/r2 can range s,L the diagonal element g−1−1 for semi-infinite system over a very large excursion, of unity for propagating (consisting of all layers between ∞ and −1, with states and many orders of magnitude for rapidly vacuum for all layers to the right of the L- region) decaying ones. Capturing them by solving a single s,R and the corresponding gnn for the R- region. eigenvalue problem imposes severe challenges on the The sparse-matrix method is simple to describe. eigenvalue solver. Supposing the active region is considered in isolation; Decimation is recursive and generally efficient; −1 denote it as I. Then gI =(P − S)I . The effect of however problems can appear at special values of k 18 energy where the growing and decaying pair r1 and parameter which also ranges between 0 and 1. The r2 become very close to unity. Unfortunately, those integrand on the contour is smooth, except near the “hot spots” are often the physically interesting ones. endpoint z→EF . Good results can be obtained with At present Questaal’s standard distribution does a modest number of points, typically 12-20. not have a fully satisfactory, all-purpose method to The exact potential function P α has no poles determine gs, though one has been developed and in this half plane; nor does the second order pa- will be reported in a future work. rameterisation but spurious poles may appear in the third order parameterisation. These may be 2.15. Contour Integration over Occupied States avoided by working in the orthogonal representation (α = γ) and/or by choosing fairly large elliptical eccentricities for the contour. 0.2 2.16. Spin-orbit Coupling in the Green’s Function 0.1 Im z It has been shown in Sec. 2.8.2 that spin-orbit 0 coupling (SOC) can be added perturbatively to the −1 −0.8 −0.6 −0.4 −0.2 0 Hamiltonian, resulting in the matrix elements con- 0 Re z taining ξ`[D ,D]. These matrix elements are added to the right-hand-side of the first line in Eq. (26), while the overlap integrals (second line in Eq. (26)) Figure 6: Representative contour integration for occupied remain unchanged. In order to construct the Green’s states. Twelve points are taken for the interval (−1, 0). The contour deformation has eccentricity = 1/2, and a bunching function with SOC, the resulting modification of the parameter 1/2, giving more weight to the points near EF = 0. variational energy in Eq. (27) needs to be reformu- lated as a perturbation of the potential parameters. Many properties of interest involve integration Because the SOC operator is a matrix (see over the occupied states. In contrast to band meth- Sec. 2.8.2), the exact solutions φνljκ(r) of the radial ods which give eigenfunctions for the entire energy Pauli equation are linear combinations of spherical spectrum at once, Green’s functions are solved at waves |lmσi and |lm0σ0i with m + σ = m0 + σ0 = j. a particular energy, and must be numerically inte- However, our perturbative treatment is still based on grated on an energy mesh for integrated properties. basis functions with definite spin that are calculated The spectral function or density of states is related without SOC. The energy dependence, however, is to G as modified by allowing the ω parameter to become a matrix, so that Eq. (25) is replaced by A(ε) = π−1|Im G(ε)| (70)

Φmσ(D↑,D↓, r) = φmσ(νσ, r) and in the noninteracting case is comprised of a X ˙ superposition of δ-functions at the energy levels. + ωmσ,m0σ0 (D↑,D↓)φm0σ0 (νσ0 , r) (71) Thus G(ε) has lots of structure on the real axis, m0σ0 which makes integration along it difficult. However, where we dropped the common ` superscript because since below the Fermi level G should have no poles the SOC operator is diagonal in `. The summation in in the upper half of the complex plane, the Cauchy (71) involves at most two terms with m+σ = m0 +σ0. theorem can be used to deform the integral from Instead of the simple variational estimate of (D) the real axis to a path in the complex plane (Fig. 6). in Eq. (27), we now construct a generalised eigen- lmgf and lmpg use an elliptical path, with upper value equation, which leads to and lower bounds on the real axis respectively at † the Fermi level and some energy below the bottom ωˆ + VˆSO = 1ˆ − ˆν +ω ˆ pˆ(1ˆ − ˆν )ˆω (72) of the band. A Legendre quadrature is used, but the weights where ωˆ is the matrix from Eq. (71), both ωˆ and ˆ can be staggered to bunch points near EF where G VSO are functions of D↑ and D↓, and pˆ and ˆν are ˙2 has lots of structure. Thus five parameters define diagonal matrices with elements hφν`σi and νσ, re- the mesh: the number of points, the upper and lower spectively. The matrices are assumed to include the bounds, the eccentricity of the ellipse (between 0 for full basis set on the given site, i.e., their dimension circle and 1 for a line on the real axis) and bunching is 2(2` + 1). 19 ˆ The ωˆ matrix is found by solving Eq. (72). To Eqs. (74-76). V˙SO is always calculated as the exact ˆ ˆ first order in  − ν , it gives ωˆ = 1 − ˆν − VSO, energy derivative of VˆSO. ˆ so that the matrix elements of VSO are effectively Figure 7 shows the comparison of the magne- added to ˆν . Promoting the potential function P () tocrystalline anisotropy calculated using lm and to a matrix and using the representation Eq. (32) lmgf for two benchmark systems. Two cases are dis- and the definitions Eqs. (29-31), we find that the played: lmgf with second-order potential functions ˆ parameters ∆ and γ are unaffected by VSO while C is compared with the corresponding two-centre approx- ˆ ˆ ˆ promoted as C → C = C1 + VSO. However, in order imation in lm, and lmgf with third-order potential to make the definition of Pˆ() unambiguous, we need functions compared with the full three-centre lm to fix the correct order of matrix multiplication. We calculation. The agreement in both cases for FePt also need to ensure that the poles of G(z) have unit [panel (a)] is very good, while for the (Fe1−xCox)2B residues. We use the following definitions: alloy in the virtual crystal approximation it is essen- −1 tially perfect. G() = λ() + µL()[P () − S] µR(), (73) h √ √ i−1 P () = γ + ∆( − Cˆ)−1 ∆ (74) √ ˆ 1/2 −1 µL() = (1 − V˙SO) [∆ + γ( − Cˆ)] ∆ (75) √ −1 ˆ 1/2 µR() = ∆[∆ + ( − Cˆ)γ] (1 − V˙SO) (76) 1 λ = − µ−1P¨ µ−1 (77) 2 R L ˆ ˆ where V˙SO = dV˙SO/d comes from the energy de- 0 pendence of the SOC parameters ξ`[D ,D]. Note that P˙ = µRµL, and the structure of G(z) guar- antees that the poles of G(z) have unit residues. It is straightforward to check that, with energy- independent VˆSO, G(z) becomes the resolvent of the second-order Hamiltonian Eq. (37), with added VˆSO, as expected. To third order in  − ν and to first order in VˆSO, we find from Eq. (72):

(3) ˆ ˆ 3 ˆ (3) ωˆ = 1 − ˆν +p ˆ(1 − ˆν ) − VSO = ˆ0 − ˆ − Vˆ (3) (78) Figure 7: Benchmarks for magnetocrystalline anisotropy us- ν SO ing lm (blue symbols and solid curves) and lmgf (red symbols where 0 is defined in Eq. (35), and and dashed curves). (a) FePt with the magnetic part of the exchange-correlation field scaled by a factor between 0 and 1. (b) (Fe Co ) B alloy in the virtual crystal approximation, ˆ (3) ˆ  ˆ ˆ 2 1−x x 2 VSO = VSO + VSO, pˆ(1 − ˆν ) (79) with the cation charge varied from 26 to 27. Lower curves with squares: full three-centre lm and lmgf with third-order  with A,ˆ Bˆ = AˆBˆ + BˆAˆ. Because VˆSO is defined potential functions. Upper curves (shifted by 2 meV/f.u. in at the fixed values of the logarithmic derivatives, panel (a) and by 0.5 meV/f.u. in panel (b)): two-centre ˆ approximation in lm and second-order potential functions in the spin-orbit coupling parameters ξ`,σσ0 in VSO are lmgf. The charge density in all calculations for the given 0 calculated with  replaced by  : material is taken from the full self-consistent lm calculation (without scaling in the case of FePt). ξ`,σσ0 () = hφ`σ|ξ(r)|φ`σ0 i 0 ˙ + (`σ − ν`σ)hφ`σ|ξ(r)|φ`σ0 i 0 ˙ 2.17. Fully Relativistic LMTO-ASA + (`σ0 − ν`σ0 )hφ`σ|ξ(r)|φ`σ0 i 0 0 ˙ ˙ We have developed a fully relativistic exten- + ( − ν`σ)( 0 − ν`σ0 )hφ`σ|ξ(r)|φ`σ0 i (80) `σ `σ sion of the LMTO-ASA code within a relativis- Of course, just as in the non-relativistic case,  is tic generalisation of the density functional formal- also replaced by 0 where it appears explicitly in ism [39, 40, 41, 42, 43]. In the most general case, 20 one needs to solve the Kohn-Sham Dirac equation  d 1 − κ  + 1 cf (ε, r) = dr r κ1µ HΨ(ε, r) = εΨ(ε, r) (81)

[− (ε − V (r)) − uB(r)] cgκ1µ(ε, r) with p 2 − 1 − u B(r)gκ2µ(ε, r) 2 H = cαp + (β − I4)mc + V (r)I4 + µBβBeff(r)Σ (82)  d 1 + κ  + 2 g (ε, r) = dr r κ2µ where  00        ε − V (r) + u B(r) 0 σ I 0 σ 0 1 + cfκ µ(ε, r) α = , β = 2 , Σ = c2 2 σ 0 0 −I2 0 σ  d 1 − κ  (83) + 2 cf (ε, r) = dr r κ2µ Here, σ is the vector of Pauli matrices, p is the [− (ε − V (r)) − uB(r)] cgκ2µ(ε, r) momentum operator, Beff(r) is an effective spin- p 2 dependent potential acting on electrons. It should − 1 − u B(r)gκ1µ(ε, r) (87) be noted that this is a simplified form of the rela- tivistic Kohn-Sham equation in which the orbital where contribution to the 4-component relativistic current µ µ µ u = , u0 = , u00 = is neglected. This simplification is necessary in order ` + 1/2 ` − 1/2 ` + 3/2 to avoid the significant formulaic and computational (88) complications that arise in the relativistic current density formulation of the density functional the- In the general case, in the presence of a magnetic ory [43]. field, one has to solve a system of two infinite sets For a spherically symmetric potential V (r) = of mutually coupled differential equations because 1/2[V↑(r) + V↓(r)] inside a single MT sphere, the the magnetic field couples radial amplitudes with direction of the magnetic field can be assumed to different relativistic quantum numbers κ. Specif- point along the z direction, B = B(r)zˆ, where eff ically, states with κ1 to those with κ2 = κ1 and B(r) = 1/2[V (r) − V (r)]. Then, Eq. (82) can be ↑ ↓ κ2 = −κ1 − 1, i.e., states of the same `, but also written as states with κ1 to those with κ2 = 1 − κ1, i.e., states 2 with different `’s, ∆` = ±2. To avoid this com- H = [cαp + (β − I4)mc + V (r)I4 + µBβB(r)Σz] (84) plication, this coupling is neglected. In this case, the set simplifies into coupled equations for each The solutions of the Kohn-Sham Dirac equation (84) pair `µ. For |µ| = ` + 1/2, there is no coupling, so are linear combinations of bispinors: similar to the non-relativistic case, there is only reg- X ular solution with quantum numbers κµ, while for Ψµ(r, ε) = Ψκµ(r, ε) (85) |µ| < `+1/2, we need to solve the set of four coupled κ equations (87) for the four unknown radial functions  g (ε, r)Ω (ˆr)  0 κµ κµ gκ1µ, fκ1µ, gκ2µ, and fκ1µ. The coefficients u, u , Ψκµ(r, ε) = (86) 00 ifκµ(ε, r)Ω−κµ(ˆr) and u in Eq. (87) result from matrix elements of 0 the type hκµ|σ |κ µi [43]. Here, Ω (rˆ) are the spin spherical harmonics, µ z κµ Physically, the neglected coupling corresponds is the projection of the total angular momentum, to a magnetic spin-orbit interaction given by a and κ is the relativistic quantum number, κ2 = term 2c−2r−1dB/drL · S in the weak relativistic J(J + 1) + 1/4. The radial amplitudes g (ε, r) κµ domain [44]. Since this is proportional to the prod- and f (ε, r) satisfy the following set of coupled κµ uct of two small quantities (c−2 and dB/dr) its differential equations: omission is justified in most cases.   The construction of the MT orbitals and the cor- d 1 + κ1 + gκ µ(ε, r) = responding boundary conditions proceeds along the dr r 1 same principles as in the non-relativistic case de-  0  ε − V (r) + u B(r) scribed in the previous sections. The tail cancella- 1 + cfκ1µ(ε, r) c2 tion condition is conveniently formulated in terms 21 of relativistic extensions of the potential and nor- By direct substitution of (95) into (90) we obtain malisation functions: the parameterisation of the potential function: −1 `+1 T w  −1 P (ε) = R (V − ε) R + Q (97) NL(ε) = (2` + 1) gL (ε, s) s or equivalently × (D (ε, s) − I`)−1 (89) L P −1(ε) = W (C − ε)−1 W T + γ (98) where w 2`+1 −1 PL(ε) = 2(2` + 1) T h −1i s V = εν + sgν A + (Dν − Il) gν (99) −1 × (DL(ε, s) + I` + I)(DL(ε, s) − I`) (90) 1 √ w `+ 2 R = 2s (2` + 1) [A (D − Il) + I]−1 g The logarithmic derivative matrix being given in s ν ν terms of the small and large components of the (100) radial amplitude: w 2`+1 Q = 2 (2` + 1) −1 s DL(ε, r) = scfL(ε, s)gL (ε, s) − κ − I (91) h  −1−1i κ 0  × I + (2` + I) (Dν − Il) + A (101) κ = 1 0 κ2 and −1 T For states with |µ| < ` + 1/2, NL(ε), PL(ε), DL(ε), γ = Q ,W = γR, C = V + R γR (102) g (ε), and f (ε) are 2×2 matrices for each subblock L L For an arbitrary representation α: L = (`, µ) with the general form α−1 −1 T   P (ε) = W (C − ε) W + γ − α (103) Aα1κ1(ε) Aα1κ2 (ε) AL(ε) = (92) which is the relativistic analogue of Eq. (48). In Aα2κ1(ε) Aα2κ2 (ε) general, the matrices V , R, Q, C, γ and W are Note that the indices in this matrix have different non-diagonal, some are symmetric and some are physical meaning. Although the values of index α not, so they do not all commute with each other. are numerically equal to those of κ, index α stands This is related to the different physical origin of for different behaviour of the radial function at the the matrix indices discussed above.√ The parameter origin [42, 43] while κ stands for solutions with W is the relativistic analogue of ∆ introduced different quantum states. Therefore, these matrices earlier [see Eq. (33)], and Eq. (97) is analogous to are not symmetric and do not commute with each the non-relativistic Eq. (32). We should note that other. in the code, we have also included a third-order The linearisation can be formulated in a matrix parameterisation of the potential function similar form too [in the following we will leave out the to the non-relativistic case, Eq. (35). subblock index L; all presented matrices have the The physical Green function and Hamiltonian form of Eq. (92)]. Within second-order approxima- are written in terms of the potential function and tion, the radial amplitudes are expanded around a scalar relativistic structure constants matrices after linearisation energy: a transformation of the former from κµ to `mms basis: g(ε, r) ≈ gν (r) + (ε − εν ) Ig˙ν (r) (93) 1 2 f(ε, r) ≈ f (r) + (ε − ε ) If˙ (r) (94) Gα = − P¨αP˙ α + N αT gαN α (104) ν ν ν 2 w Then, a symmetric matrix form of the linearisation where of the logarithmic derivative can be written as −1 gα = (P α − Sα) (105) −1 s T (D(ε) − Dν ) = − gν g + A (95) these are the relativistic analogues of Eq (57) and  ν the scattering path operator. where Within the framework of the TB-LMTO and prin- 1 h −1 −1i cipal layer approach, the fully relativistic version A = − (D − D ) + D − DT  2 ν ν˙ ν ν˙ of the green function for layered geometry is con- 1 structed straightforwardly from the site diagonal = − s g˙ gT + g g˙T  (96) 2 ν ν ν ν fully relativistic potential function [43]. 22 2.18. Coherent Potential Approximation Iteration to self-consistency is facilitated [45, 43] The coherent potential approximation (CPA) is by introducing the coherent interactor matrix Ωi, −1 a Green’s function-based method used to describe for each CPA site, defined through g¯ii = (Pi −Ωi) . a a a the electronic structure of disordered substitutional The conditionally averaged g¯ii is then g¯ii = (Pi − −1 alloys. Questaal’s lmgf code implements the CPA Ωi) . The latter two equations can be used to re- in the Atomic Spheres Approximation, following express the self-consistency condition (108) in terms a the formulation of Refs. [45, 43]. Any lattice site i of Pi,Ωi, and Pi without an explicit reference to can be occupied by any number of components a the scattering path operator: with probabilities (concentrations) ca, which must X i (P − Ω )−1 = ca(P a − Ω )−1 (109) be supplied by the user. These components can i i i i i have different atomic sphere radii, and each has its i own charge density, atomic potential, and diagonal The Ωi matrices (for each required complex energy a matrix of potential functions Pi (). Each site with point) are converged to self-consistency using the substitutional disorder (“CPA site”) is also assigned following procedure. At the beginning of the CPA a coherent potential matrix Pi() that has the same iteration, Eq. (109) is used to obtain Pi for each CPA a orbital structure as Pi () but is off-diagonal, with site from Ωi, and then Pi is inserted in Eq. (106). the restriction that it must be invariant under its After integration over k, the site-diagonal block g¯ii site’s point group. The elements of the Pi() matrix is extracted for each CPA sites and used to obtain −1 are complex even if  is real, and it is fixed by the the next approximation for Ωi = Pi − g¯ii , closing CPA self-consistency condition. the self-consistency loop. The output Ωi matrices The configurational average of the scattering path are linearly mixed with their input values. The operator is given by mixing parameter can usually be set to 1 for energy points that are not too close to the real axis. −1 g¯(, k) = [P() − S(k)] (106) The CPA loop is repeated until the Ωi matrices converge to the desired tolerance, after which a where the site-diagonal matrix P() absorbs the co- charge iteration is performed. At the beginning herent potential matrices for the CPA sites and the of the calculation, the Ω matrices are initialised diagonal matrices of the conventional potential func- i to zero unless they have already been stored on tions for the sites that are occupied deterministically disk. The CPA loop can sometimes converge to (“non-CPA sites”). The matrix in Eq. (106) is inte- an unphysical symmetry-breaking solution; this can grated over the Brillouin zone in the usual way, and usually be avoided by symmetrising the coherent its site-diagonal blocksg ¯ are extracted. ii potentials using the full space group of the crystal. For non-CPA sites, the full Green’s function, den- sity matrix, and the charge density are obtained from g¯ii in the usual way. For each CPA site i, we a define the scattering path operator g¯i (separately for each component a), which is the statistical average under the restriction that site i is occupied determin- istically by component a, while all other sites in the infinite crystal are occupied statistically, according to their average concentrations. Such quantities are called “conditionally averaged.” The charge density for the component a on site i is obtained from the a site-diagonal block g¯ii on site i of the conditionally a averaged g¯i . This site-diagonal block can be found Figure 8: Left: DFT magnetic moment in (Ni80Fe20)1−xCrx, from the matrix equation on that site: compared to experiment (films with 11 nm (crosses) and 2.5 nm (dots) thickness) at 10K. Right: Majority-spin CPA a −1 −1 a (¯gii) =g ¯ii + Pi − Pi (107) spectral function at x=20%. Adapted from Ref. [46] and Vishina’s PhD thesis [47]. and the CPA self-consistency condition reads

X a a g¯ii = ci g¯ii (108) Fig. 8 illustrates one recent application of Ques- i taal’s implementation of the CPA. A number of 23 new, potentially high-impact technologies, Joseph- son MRAM (JMRAM) in particular, performs write operations by rotating a patterned magnetic bit. These operations consume a significant portion of a system’s power. One way to reduce the power consumed by write operations is to substitute mate- rials with smaller saturation magnetisation Ms. The minimum usable Ms is limited by the need to main- tain large enough energy barrier to prevent data loss due to thermal fluctuations. Low Ms can greatly 78.8 73.7 reduce power particularly for JMRAM [48], which operates at around 4 K. One way to reduce Ms in permalloy (Ni80Fe20 alloys, the most commonly used JMRAM material), is to admix Cr into it, as Cr aligns antiferromagnetically to Fe and Ni. Fig. 8 shows some results adapted from a recent joint ex- perimental and theoretical study of Py1−xCrx,[46]. The CPA calculations of Ms track measured values fairly well. Also shown is the CPA energy band 71.9 68.6 structure for the majority spin. Alloy scattering causes bands to broaden out, and scattering life- Figure 9: Self-consistent magnetic spin configurations of the times can be extracted. A detailed account can be fcc Fe-Ni alloy at the four volumes 78.8, 73.7, 71.9, and 68.6 found in Vishina’s PhD thesis [47]. a.u.. Red and blue arrows show magnetic moments on Fe and Ni atoms, respectively. Taken from Ref. [50]. 2.19. Noncollinear Magnetism The ASA codes lm, lmgf and lmpg are fully non- collinear in a rigid-spin framework, meaning that thermal expansion over a considerable temperature the spin quantisation axis is fixed within a sphere, range. In Ref. [50] it was shown that at 35% com- but each sphere can have its own axis. The for- position, Fe-Ni alloys are on the cusp of a collinear- mulation is a straightforward generalisation of the noncollinear transition, and this adds a negative nonmagnetic case. Potential and Hamiltonian-like contribution to the Gr¨uneisenparameter. Fig. 9 is objects become 2×2 matrices in spin space. The reproduced from that paper. structure matrix S is diagonal matrix in this space and independent of spin, while potential functions P 3. Full Potential Implementation and potential parameters of Sec. 2.8.1, become 2×2 matrices that are diagonal in ` but off-diagonal in Questaal’s primary code lmf is an augmented- spin. (Alternatively, a local spin quantisation axis wave implementation of DFT without shape ap- can be defined which makes P diagonal in spin; then proximations. It also handles the one-body part of S is no longer diagonal.) How Questaal constructs the quasiparticle self-consistent GW approximation, the Hamiltonian in the general noncollinear case, by adding the (quasiparticlised) GW self-energy to and also for spin spirals, is discussed in Ref. [49]. the DFT part. Closely related codes are lmfgwd, a These codes have implemented spin dynamics [11], driver supplying input for the GW, lmfdmft, a driver integrating the Landau-Lifshitz equation using a supplying input for a DMFT solver, and lmfgws, a solver from A. Bulgac and D. Kusnezov [51], the post-processing tool that generates quantities using spin analogue of molecular dynamics and molecu- the one-body part from lmf and the dynamical self- lar statics. The formulation is described in detail energy from GW or DMFT. This code uses different in Ref. [49]. Codes also implement spin statics. definitions for classical functions (see Appendix B), Torques needed for both are obtained in DFT from and we shift to those definitions in what follows. the off-diagonal parts of the spin-density matrix. A As for the DFT part, Questaal’s unique features classic application is the study of INVAR. Fe-Ni are: alloys with a Ni concentration around 35 atomic % (INVAR) exhibit anomalously low, almost zero, • it uses a three-component augmentation, follow- 24 ing most of the original method of Ref. [31]. The development of the Questaal basis set. They are all augmentation is reviewed in Sec. 3.6. Ref. [52] connected to the smooth Hankel function for `=0. offers a slightly different presentation and shows Its Fourier transform is how it is connected with the PAW method [4]. 4π 2 2 rs (E−q )/4 bh0(E, rs; q) = − 2 e (110) • it has a more general basis set. Ordinary Hankel E − q functions solve Schr¨odinger’sequation for a MT which is a product of Fourier transforms of ordinary potential, but real potentials vary smoothly into Hankel function for `=0 with a Gaussian function the interstitial (Fig. 10). The envelope func- of width 2/rs. tion of a minimal basis set must adapt to this potential. The traditional Questaal basis uses 0.4 0 Veff smooth Hankel functions [53], which may be −2 thought of as a convolution of a Gaussian func- 0.3 tion and a traditional Hankel function; they −4 are developed in Sec. 3.1. This traditional ba- −6 0 1 2 3 sis works very well for most systems; however h 0.2 when the system is very open, it is slightly in- complete (Sec. 3.13). One way to surmount the 0.1 s p d incompleteness is to combine smooth Hankel functions with plane waves; this is the “Plane- 0 wave Muffin Tin” (PMT) basis [54] (see also 0 1 2 3 4 r Ref. [52]). While it would seem appealing, PMT suffer from two serious drawbacks: first, it tends to become over-complete even with a relatively Figure 10: Radial part of smooth Hankel functions HL for small number of plane waves, and second, it s, p, d orbitals (solid lines) and corresponding ordinary Han- kel functions H (dashed lines). Smoothing radius and is not compact (minimal, short ranged and as L energy were chosen to be rs = +1 and E = −1, respec- ` complete as possible for the relevant energy tively. For rrs, HL → HL, while for rrs, HL ∝ r and −`−1 window). HL ∝ r . The HL satisfy the Helmholtz wave equation (Schr¨odingerequation for constant potential), while the HL • Questaal’s most recent development is the “Jig- satisfy a Schr¨odingerequation corresponding to a potential V eff = −4πG /H , as shown in the inset. saw Puzzle Orbital” (JPO) basis, which uses L L information from the augmentation to construct We use standard definitions of Fourier transforms an optimal shape for the envelope functions. To Z the best of our knowledge, JPO’s are the clos- −iq·r 3 fb(q) = e f(r) d r est practical realisation of compactness. This is particularly important in many-body treat- 1 Z f(r) = e+iq·rfb(q) d3q (111) ments where the efficacy of a theory hinges crit- (2π)3 ically on compactness. This new basis will be presented more fully elsewhere. In Sec. 3.12 we Thus in real space, h0(E, rs; r) is a convolution of a show how a transformation to a tight-binding ordinary Hankel function and a Gaussian function. representation can be carried out in a full- It has an analytic representation potential framework. In the present version 1 h (E, r ; r) = (u − u ) (112) the basis set is merely a unitary transformation 0 s 2r + − of the original one. 1 h˙ (E, r ; r) = (u + u ) (113) 0 s 4¯κ + − 3.1. Smooth Hankel Functions    ∓κr¯ rsκ¯ r u± = e 1 − erf ∓ (114) In his PhD dissertation Michael Methfessel intro- 2 rs duced a class of functions HkL(E, rs, r). As lim- E = −κ¯2 if E < 0 (115) iting cases they encompass both ordinary Hankel functions and Gaussian functions (see Eq. (123) For small r, h0 behaves as a Gaussian and it evolves ˙ and (125) below). They are explained in detail in smoothly into an ordinary Hankel when rrs. h0 Ref. [53]; here we present enough information for is the energy derivative of h0. 25 Questaal’s envelope functions are generalised Han- recurrence relation to obtain χ` for `>1; this pro- kel functions HL(E, rs; r). Their Fourier represen- vides an efficient scheme for calculating them. The tation has a closed form recurrence is derived in Ref. [53] (see Eq. 6.21):

−4π r2(E−q2)/4 Hb L(E, rs; q) = YL(−iq) e s 2` + 1 E E − q2 χ`+1 = χ` − χ`−1 r2 r2 = YL(−iq)bh0(E, rs; q) (116)  `−1 4π 2 r2E/4−(r/r )2 − e s s 2 2 3/2 2 r (πrs ) rs YL(r) is a spherical harmonic (Ap- pendix A). Following the usual rules of Fourier trans- χ0 and χ−1 are needed to start the recursion. χ0 forms this function has a real-space representation is written in Eq. (112) and χ−1 can be obtained by combining Eqs. (113), (137) and (122). HL(E, rs; r) = YL(−∇)h0(E, rs; r) (117) By taking limiting cases we can see the connection with familiar functions, and also the significance of YL(r), r = (x, y, z) is a polynomial in (x, y, z), so parameters E and rs. is meaningful to talk about YL(−∇). The extended 2 family HkL(r) is defined through powers of the (i) k=rs=0: Hb0(E, 0; q)= − 4π/(E − q )Y00(q). Laplacian operator: This is the Fourier transform of H0(E, 0; r) = Y00(r) exp(−κr¯ )/r, and is proportional to the 2k Hb kL(E, rs; q) = (−iq) Hb L(E, rs; q) `=0 spherical Hankel function of the first kind, 2k (1) HkL(E, rs; r) = ∇ HL(E, rs; r) (118) h` (z). For general L the relation defines Ques- taal’s standard definition of ordinary Hankel

A recursion relation for HL(E, rs; r) for `>0 can functions be derived from properties of Fourier transforms in spherical coordinates. Any function whose Fourier HL(E, 0; r) = H0L(E, 0; r) transform factors as ` `+1 (1) = −i κ¯ h` (iκr¯ )YL(ˆr) (123) ˆ Fˆ(q) = f(q)YL(ˆq) (119) −r2q2/4 (ii) k=1 and E=0: Hb 10(0, rs; q) = −4πe s . has a real-space form This is the Fourier transform of a Gaussian function of width rs. For general L we can F (r) = f(r)YL(ˆr) (120) define the family as where f and fˆ are related by [53] GL(E, rs; r) = YL(−∇)g(E, rs; r) (124) 2−3/2 Er2/4 −r2/r2 g(E, rs; r) = πr e s e s (125) ` Z ∞ s 4πi ˆ f(r) = qf(q) j`(qr) qr dq (2π)3 0 Evidently H (q) is proportional to the product of Z ∞ b L ˆ ` the Fourier transforms of a conventional spherical f(q) = 4π(−i) rf(r) j`(qr) qr dr 0 Hankel function of the first kind, and a Gaussian. By the convolution theorem, HL(r) is a convolution and j` is the spherical Bessel function. We can of a Hankel function and a Gaussian. For rrs, therefore express HL in a Slater-Koster form HL(r) behaves as a Hankel function and√ asymptoti- −`−1 cally tends to HL(r) → r exp(− −Er)YL(ˆr). HL(r) = h`(r) YL(ˆr) ≡ χ`(r) YL(r) (121) For rrs it has structure of a Gaussian; it is there- ` χ`(r) = (1/r ) h`(r) (122) fore analytic and regular at the origin, varying as ` −`−1 r YL(ˆr). Thus, the r singularity of the Hankel For envelope functions used in basis sets, e.g. function is smoothed out, with rs determining the ra- HL(E, rs; r), the radial portion of the Fourier trans- dius for transition from Gaussian-like to Hankel-like form does not depend on `; therefore corresponding behaviour. Thus, the smoothing radius rs deter- real-space part depends on ` only through the j`. mines the smoothness of HL, and also the width of This is a very useful fact. It is possible to derive a GL. 26 By analogy with Eq. (118) we can extend the GL simple expansion theorem Eq. (1) does not apply to family with the Laplacian operator: them. Comparing the last form Eq. (129) to Eq. (118) G (E, r ; r) = ∇2k G (E, r ; r) (126) kL s L s and the definition of HkL Eq. (118), we obtain the 2k = YL(−∇)∇ g(E, rs; r) (127) useful relations 1 ∂2 k = Y (−∇) r· g(E, r ; r) Hk+1,L(E, rs; r)+EHkL(E, rs; r) L r ∂r2 s = ∇2 + E H (E, r ; r) (128) kL s = −4πGkL(E, rs; r) (134) 2 k r2(E−q2)/4 GbkL(E, rs; q) = YL(−iq)(−q ) e s (129) This shows that HkL is the solution to the Helmholtz Eq. (128) shows that GkL has the structure (poly- operator −∇2 + E in response to a source term 2 nomial of order k in r )×GL. Specifically [53] smeared out in the form of a Gaussian. A conven- tional Hankel function is the potential from a point 2k+`(2k + 2` + 1)!! multipole at the origin (see Eq. 6.14 in Ref. [53]): GkL(E, r) = 2k+` pk`(rs; r)GL(E, r) rs (2` + 1)!! smearing out the singularity makes HL regular at ` −`−1 where the origin, varying as r for small r instead of r . HkL is also the solution to the Schr¨odingerequation (−1)k(2` + 1)!!2kk! for a potential that has an approximately Gaussian p (r ; r) = L(`+1/2)(r2/r2) k` s ` k s dependence on r (Ref. [53], Eq. 6.30). rs(2k + 2` + 1)!! (130) 3.2. Gradients of Smooth Hankel Functions and Gradients of the HL are needed in several con- −3/2 2 ` 2 2 2 Ers /4 2 −r /rs texts, e.g. for forces and for matrix elements of GL = πrs e 2/rs e (131) the momentum operator, which enters into the di- (`+1/2) electric function and velocity operator. Also, the Lk (u) are generalised Laguerre polynomi- als [53], which have the following orthogonality re- energy derivative is needed in several contexts, e.g. lation for integration of some matrix elements (Sec. D). The form of Eq. (116) suggests that gradient and Z (`+1/2) (`+1/2) −u (`+1/2) position operators acting on H return functions of du L (u)L 0 (u) e u = b k k the same family. This turns out to be the case, and Γ(k + ` + 3/2)δkk0 it makes possible matrix elements of these opera- tors with two such functions. Consider the energy and as a consequence the GkL are orthogonal in the derivative first. Hb L(E, rs; q) is readily differentiated following sense:   ˙ 1 2 Z Hb L(E, rs; q) = Hb L(E, rs; q) − rs /4 a2r2 3 E − q2 GkLGk0L0 e d r = (135) 23k+`k!(2k + 2` + 1)!! δkk0 δ``0 2 3/2 4k+2` In real space, the energy derivative is most easily 4π(πrs ) rs derived from an integral representation of h` (Eq. A5 This can be also written as in Ref. [53])

2k Z `+1 ` 1/rs 3 2 k!(2` + 1)!! 2 r Z 2 2 2 G P 0 0 d r = δ 0 δ 0 (132) 2` E/(4ξ ) −r ξ kL k L 2k−` kk `` h`(r) = √ ξ e e dξ (136) 4πrs π 0 where It is readily seen that

PkL(rs; r) = pk`(rs; r)YL(r) (133) ˙ rh`(r) = 2h`+1(r) We will need this relation for one-centre expansion d ` ( − ) h`(r) = h`+1(r) (137) of the HL around remote sites (Sec. 3.6) since the dr r 27 Noting that the three components of the real har- with nonlocal potentials. It is more convenient to monic Y1m(r), Eq. (A.6) for `=1, are evaluate it in reciprocal space (Appendix D). (y,z,x), the gradient and position operators can be Functions of type Eq. (120) have the Slater-Koster p p obtained from − 4π/3 Y1p(−∇) and 4π/3 Y1p(r) form with another special property, namely that the respectively, with the appropriate permutation of p. Fourier transform fˆ(q), Eq. (119), is independent Using Eq. (117), of `, from which it follows that f`(r) satisfies the following differential equations (Ref. [53], Eq. 4.7 Y1p(∇)HL = −Y1p(−∇)YL(−∇)h0(E, rs; r) and 4.20) (138) ∂f (r) ` −f (r) = ` − f (r) `+1 ∂r r ` The operator product Y (−∇)Y (−∇) can be ex- L K ∂f (r) (` + 1) panded in the same manner as the product of poly- −∇2f (r) = ` + f (r) `−1 ∂r r ` nomials YL(r)YK (r) (Appendix A). (+) (−) 2 More generally, for a function of the form Thus f (r)= − f`+1(r) and f (r)= − ∇ f`−1(r). Eq. (120), it is possible to show that The HkL(r) family is of this type, and moreover, 2 ∇ HkL(r) merely maps HkL to another member of (+) X (+) the family (Eq. (118)). Another useful instance is ∇pf`(r)YL = f C Y`+1m0 ` i;`,m`;p i the real harmonic polynomials Y (Eq. (A.6)). Equa- i=1,2 L tions in this section, e.g. Eq. (137), yield explicit (−) X (−) + f C Y 0 (139) (+) (−) ` i;`,m`;p `−1mi forms for f and f for a number of functions i=1,2 of interest here, summarised in the table below. where F (r) f(r) f (+) f (−) rf(r) ` `−1 `+1 (+) ` YL r 0 (2`+1)r r f = [df`/dr − f`] −2`−1 −`−1 −`−2 −` ` r r YL r (−2`−1)r 0 r ˙ (−) ` + 1 HkL hkl −hk,`+1 −hk+1,`−1 2h`+1 f = [df`/dr + f`] (140) 2`+1 ` r HL(E=0) Same as YL/r JL(E=0) Same as YL (±) C is a sparse representation of a linear trans- i;`,m`;p formation that maps Y`,m(r) to a linear combination Table 3: Mapping gradient and position operators of smoothed Hankel functions to functions in the same fam- of two Y`+1,m0 (r) and two Y`−1,m0 (r). It is more compact to write each sum in a standard matrix ily. form, as P C(±) Y . C(±) is a family of three K KL;p K KL;p In general, e.g. for partial waves φ` inside augmen- rectangular matrices (p=1,2,3), which for a partic- tation spheres, f (+) and f (−) must be determined by ular L, has two nonzero elements corresponding to numerical differentiation. For the pair of functions (±) (+) (−) mi0 (Table 3.2). An alternative definition of C ` −`−1 KL;p {r , r }, f` (f` ) for the first (second) func- is given in Appendix A. tion vanishes. Any function can thus be matched at some r to this pair (Eq. (12)), which determines i p=x p=y p=z the projection of its gradient onto Y`∓1m0 . 1 m`−1 −m`−1 m` 2 m`+1 −m`+1 − 3.3. Two-centre Integrals of Smoothed Hankels One extremely useful property of the H is that (±) kL Table 2: Index m0 corresponding to coefficient C i i=1..2;`,m`;p the product of two of them, centred at different (±) 0 in Eq. (139). Index K in CKL;p corresponding to mi is sites R1 and R2, can be integrated in closed form. 2 0 K = (k±1) + (k±1) + 1 + mi. The result is a sum of other HkL, evaluated at the connecting vector R1 − R2. This follows from The position operator rp applied to f`(r)YK Parseval’s (Plancharel’s) identity results in the same form Eq. (139) but with ra- Z dial functions f (±)→rf(r). Note the close sim- ∗ 3 H1(r − R1)H2(r − R2) d r = ilarity with the gradient operator and in partic- 2 (+) (−) Z ular rf` = r (f − f )/(2` + 1). This operator −3 ∗ iq·(R1−R2) 3 ` ` (2π) Hb 1(q)Hb 2(q)e d q (141) appears for optical matrix elements of Hamiltonians 28 and the fact that H∗ (q)H (q) can be ex- are also real. The difference in unsmoothed and b k1L1 b k2L2 smoothed Hankels is for ` = 0, −1 pressed as a linear combination of other Hb kL(q), or their energy derivatives. Derivations of these and h − hs = [eiκr − u /2 + u /2]/r related integrals are taken up in Appendix D. 0 0 + − = [U+/2 + U−/2]/r s iκr 3.4. Smoothed Hankels of Positive Energy h−1 − h−1 = [e − u+/2 − u−/2]/κ¯ The smooth Hankel functions defined in (Ref. [53]) = [U+/2 − U−/2]/(−iκ) for negative energy also apply for positive energy. For ε > 0, κ is real and U = U ∗ . Then the We demonstrate that here, and show that the differ- + − difference in unsmoothed and smoothed Hankels for ence between the conventional and smooth Hankel ` = 0, −1: functions are real functions. (Ref. [53]) defines κ¯ in contradistinction to usual s h0 − h0 = [U+/2 + U−/2]/r = Re(U+)/r convention for κ s h−1 − h−1 = [U+/2 − U−/2]/(−iκ) = −Im(U+)/κ κ¯2 = −ε withκ ¯ > 0 are real, although h0 and h−1 are complex. and restricts ε < 0. According to usual conventions 3.5. One-Centre Expansion κ is defined as Ordinary Hankel functions have a one-centre ex- √ κ = ε, Im(κ) >= 0 pansion in terms of Bessel functions, Eq. (1): the shape of the radial function does not change as the We can define for any energy κ¯ = −iκ and therefore vector R connecting source and field point changes, only the expansion coefficients S. This is also true ( κ¯ real and positive, ε < 0 for a plane wave. The H are more complicated: κ real and positive, ε > 0 the r-dependence depend on R, tending to Bessel functions only when |R|rs. This complicates the Then one-centre expansion of H. They can, however, be expanded in the polynomials PkL (Eq. (133)) using ∓κr¯ u±(ε, rs; r) = e erfc (¯κrs/2 ∓ r/rs) the bi-orthogonality relation Eq. (132). For any smooth function F, its one-centre expansion can be = e±iκrerfc (−iκr /2 ∓ r/r ) s s written The smoothed Hankels for ` = 0, −1 are real X F (r) = CkLpk`(r)YL(r) (142) s kL h0(r) = (u+ − u−)/2r s where h−1(r) = (u+ + u−)/2¯κ 4πr2k+`22k Z as defined in Ref. [53]. s 3 CkL = 2k d r GkL(rs; r)F (r) To extend the definition to any energy we define 2 k!(2` + 1)! U as: ± Expressions for integrals GkL with the H are written

±iκr in Sec. 3.3. U± = e erfc (r/rs ± iκrs/2) Eq. (142) expands HL, which generalises Eq. (1) which expands H (plane waves can similarly be The following relations are useful: L expanded in Bessel functions). Eq. (142) is more erfc(−x∗) = 2 − erfc∗(x) cumbersome, and it introduces another cutoff kmax. ∗ ∗ These are drawbacks of this method, but it also erfc(x ) = erfc (x) brings with it a significant advantage: envelope functions of very different types can be constructed. Then for ε < 0, iκ is real and This adds a considerable amount of flexibility to the method, as smooth Hankels H and plane waves can U = 2eiκr − u L + + be combined in a uniform manner; this is the PMT U− = u− basis [54]. Perhaps more important, basis functions 29 can be tailored to the potential. This enables them Note the minus sign in the last term. The missing to be very accurate whilst staying minimal. This is terms may be written as the JPO basis (Sec. 3.12). If rs becomes small GkL becomes sharply peaked and the polynomial expansion becomes a Taylor (0) (2) (1) (2) (Fi − Fi )X(Fj − Fj ) expansion about the origin. But by allowing rs to +(F (1) − F (2))X(F (0) − F (2)) be a significant fraction of the MT radius sR, the i i j j error gets distributed over the entire sphere and kmax can be set relatively low. Note that rs used for this expansion need not be (and in practice is not) Write the L projection of F as PLF . By construction (0) (2) the same as rs used to make the envelope functions. PLF = PLF so if the augmentation is carried out to L→∞, F (0) and F (2) are identical and the 3.6. Three-component Augmentation missing terms vanishes identically. This form makes it clear that not only does Eq. (144) converge to As noted in the introduction, the pseudopoten- the exact result in the limit `max→∞, but also that tial method shares many features in common with the error converges much more rapidly in `max than augmentation. Bl¨ochl’s PAW method [4] makes the does conventional augmentation. connection explicit. Questaal starts from an enve- (0) This occurs for two reasons that work synergisti- lope function FRLi(r) that extends everywhere in space, augmented as follows: cally. The operator X is in practice nearly spherical, coupling parts of F with unequal L only weakly. (0) (1) (2) Consider the one-centre expansion of a particular L FRLi(r) = F (r) + F (r) − F (r) (143) RLi RLi RLi of one of these components, and consider for simplic- Index RLi labels the envelope function: R and L ity the unit overlap operator. Near the augmenta- mark the channel where functions are centred, and tion boundary all components have the same value the angular momentum, while i marks the kind of and slope by construction, and the (`-projected) (0) (2) (1) (2) envelope function at R. General envelope functions F −F or F −F vanishes to second order in ` (e.g. plane waves) need not be atom-centred, for r−s. Second, the (`-projected) F varies as r for such functions we adopt the labelling convention large ` for all three components of F , because the R=L=0. Unlike the ASA, there is typically more angular momentum becomes the dominant contri- than radial envelope function per R and L. (We will bution to the differential operator. Thus for large ` 2` sometimes label F with a single Roman index, e.g. the product of two projected functions varies as r using i to implicitly refer to the entire label RLi.), for all components, which is heavily weighted to the F (1) is some linear combination of partial waves outer parts of the sphere, in the region where the pro- (0) (2) (1) (2) matching value and slope of F (0)(r), and F (2)(r) is jections of differences F −F and F −F are (0) (0) (2) (0) (0) the one-centre expansion of F (r). F , and F small. For such L projections of Fi XFj are then (1) (1) are truncated at some `max. a good approximation to the (exact) Fi XFj . By Conventional augmented methods construct ma- truncating both local components, only the (0) com- trix elements of a local or semilocal operator X ponent remains, which as we have seen accurately straightforwardly as represents the (exact) basis function, especially near sR where it is dominant. In other words, the projec- (0) (1) (2) (0) (1) (2) hFi + Fi − Fi |X|Fj + Fj − Fj i tions of the envelope functions contain projections from `max+1 to ∞, and do so with high accuracy Evaluating the cross terms is unwieldy and for that starting at a relatively low `max + 1; indeed conver- reason conventional augmentation methods require gence is very rapid as shown in Fig. 11 (see also Fi be expanded to very high L. However, it has been Fig. 3 of Ref. [31]). This explains why the pseudopo- observed independently by several authors, the first tential and PAW approximations can be truncated being Soler and Williams [55] that the integrand can at much lower ` than the conventional augmented be approximated in the following three-component wave methods. Indeed, the present approach has form: much in common with PAW, but does not make

(0) (0) (1) (1) (2) (2) approximations or involve pseudo-partial waves or Fi XFj + Fi XFj − Fi XFj (144) projector functions. 30 Z Z (0)∗ (0) 1e-2 ∗ 3 3 χi χj d r = χi χj d r GaN Gd X (i)∗ (j) 1e-3 + C σkk0`C 0 (148) Si kL k L kk0L SrTiO3 1e-4 (Ryd/at) where E ∆ 1e-5 Z n 2 2 o 3 τkk0` = P˜kL[−∇ ]P˜k0L − PkL[−∇ ]Pk0L d r S 1e-6 2 3 4 5 2 4 6 8 10 12 (149) augmentation ℓmax kmax Z n o 3 σkk0` = P˜kLP˜k0L − PkLPk0L d r (150) s Figure 11: Convergence of the total energy with respect to R augmentation `-cutoff and polynomial degree, Eq. (145). The local kinetic energy matrix τkk0l is symmetric, even while the individual terms in Eq. (149) are not, 3.7. Secular Matrix because the integral is confined to a sphere. However To construct the secular matrix, we need matrix the surface terms from the true and smooth parts elements of Hamiltonian and overlap: cancel because P˜kL and PkL match in value and Z slope there. ∗ 2 3 Hij = χi (r)[−∇ + V (r)]χj(r) d r ˙ Note also that if φ` and φ` are kept frozen, e.g. Z computed from a free atom as we might do to mimic S = χ∗(r)χ (r) d3r ij i j a PAW or a pseudopotential method, τkk0` and σkk0` are independent of environment. Though we won’t At this stage we need not specify what χi is, except pursue it here, it suggests a path to constructing a (0) to note that it has an envelope part χi that extends unique and transferable pseudopotential. everywhere in space, augmented in spheres taking Matrix elements of the potential are more compli- the form Eq. (143). Specifically cated and will be discussed in the next section. For X (i)n o now we take them as given, along with the overlap χ (r) = χ0(r) + C P˜ (r) − P (r) (145) i i kL kL kL and kinetic energy. We can diagonalise the secular kL matrix and obtain eigenfunctions and the output (i) density as a bi-linear combination of basis functions where CkL are the coefficients (Eq. (142)) that expand the smooth envelope in χ as polynomials X PkL(r) (Eq. (133)) and ψn(r) = Zinχi(r) i ˜ h ˙ i PkL(r) = Aklφ`(r) + Bklφ`(r) YL(ˆr) out X 2 n (r) = wn |ψn(r)| ≡ p˜kl(r)YL(ˆr) (146) n XnX ∗ o ∗ ˙ = wnZinZjn χi (r)χj(r) (151) φ`(r) and φ`(r) are the linearised partial waves, ij n Sec.2.4. Coefficients Akl and Bkl are chosen so that p˜ and p have the same value and slope at s . kl kl R where the sum runs over the occupied eigenstates 3.7.1. Overlap and Kinetic Energy, and Output with wn the occupation probability. The bi-linear Density product may be written in a three-component form

We use Eq. (144) to construct matrix elements ∗ 0∗ 0 of augmented basis functions. Matrix elements of χi χj = χi χj overlap and kinetic energy may be written X (i)∗ n ˜ ˜ o (j) + CRkL PRkLPRk0L0 − PRkLPRk0L0 CRk0L0 0 0 Z Z ∗ Rkk LL ∗ 2 3 (0) 2 (0) 3 χi [−∇ ]χj d r = χi [−∇ ]χj d r (152)

X (i)∗ (j) + CkL τkk0lCk0L (147) The first term yields a smooth n0 which extends kk0L everywhere, and then two local terms, true and 31 cn one-centre expansion of n0 to some L cutoff of gR do not exactly cancel because it extends into the interstitial in the former case, but is truncated val val Xn val val o n (r) = n0 (r) + n1RL(r) − n2RL(r) in the latter. RL Smoothing radius rg must be small, so the core (153) density is almost completely confined to the sphere. val In practice we use the same Gaussians as for gR As pointed out before, we expect the higher L com- entering into the valence multipole moments (see ponents be carried accurately by nval even for low cn val 0 Eq. (156) below). In that way gR and gR can be ` , which provides a simple justification for the max merged together. The Hankel smoothing radius rh pseudopotential and PAW approximations. is chosen independently, but it is also typically small, of order sR/2. The result should depend minimally 3.7.2. Core on the choice of either. The core levels and core density can be computed either scalar-relativistically or fully relativistically 3.7.3. Local Orbitals from the Dirac equation (Sec. 2.17). The core den- Inside augmentation spheres, the usual Hilbert sity is constructed in one of two ways: Self-consistent ˙ space consists of partial waves {φ`, φ`}. Questaal c core: a core partial wave φ` is integrated in the has the ability to extend this space by one local true potential, subject to the boundary condition ˙ z z orbital (LO), to {φ`(εν , r), φ`(εν , r), φ (εz, r)}. φ 0 ` ` φc(sR) = φc(sR) = 0. It was once thought that the consists of a partial wave evaluated at some energy advantage of determining the core in the true poten- εz far above or far below the linearisation energy εν tial outweighed the inaccuracy originating from the for φ`. Addition of a LO greatly extends the energy artificial boundary condition at sR. Frozen overlap- window over which the band structure is valid; see c ping core approximation: φ` is computed in the free for example Fig. 1, of Ref. [56], where the band atom and kept frozen—experience shows this to be structure for Si was compared to LAPW, and to a a better approximation. We develop this case here. full (non-linearised) APW. Properties of Questaal’s The core density has three components, which LO are explained in that reference in a GW context; add to the valence density: here we outline the main features. LO have one each of the following attributes: val X cn n˜0(r) =n ˜0 (r) + gR (r) R • either core-like or high in energy, with princi- core+nuc val pal quantum number ∓ 1 that of valence φ`, n1R(r) = n1R(r) + n1R (r) val cn respectively; n˜2R(r) =n ˜2R(r) + gR (r) (154) • either a conventional LO, which consists φz where ˙ with some φ` and φ` admixed to make the value and slope vanish at s [57], or an ex- gcn(r) = CG G (r ; r) + CH H (E ; r ; r) R R L=0 g L=0 c h tended LO, which consists of φ with a smooth (155) z Hankel function tail that extends continuously The second term is a smooth Hankel function with and differentiably into the interstitial. The ex- CH fixed to fit the spill-out of the core density into tended LO can be applied only to the core-like the interstitial. It is important that this be taken case; and into account when constructing the total energy • either a scalar relativistic LO, or a Dirac LO and potential. The alternative, to renormalise the with µ=1/2. This is not usually important, but core and confine it to the sphere, is a much cruder it can have consequences for heavy elements; approximation. This makes it possible for shallow see how it affects LDA and GW gaps in Sec. 3.9. cores to be treated accurately, without needing to This method was first introduced by Kune˘s et include them in the secular matrix (unfortunately al.[58]. at the loss of strict orthogonality between these and the valence states). High-lying states are not usually important for cn The pseudocores gR in n˜0 and n˜2R nearly cancel, DFT, because states above EF do not contribute eff but they need to be included to construct V and to the potential. Core states of order EF −2 Ry and the total energy described below. Also the two kinds deeper can usually be treated quite adequately as 32 core levels integrated independently of the secular but in practice if it is made too small, numerical matrix (Sec. 3.7.2). There are mild exceptions: for integration of the G become inaccurate, while if too high resolution such as needed for the Delta Codes large charge is not confined to sR. Provided it is project (Sec. 3.13), both deep and high-lying LO confined, the compensating Gaussians ensure that es val were used (in the latter case, particularly adding 4d the electrostatic potential V [n˜0 ] is exact in the LO to the 3d transition metals). interstitial. val GW is more sensitive to states far from EF than The local representation n2R of the smoothed DFT. Including low-lying cores at EF −2 Ry or even density must be equally compensated somewhat deeper with LO can modify valence QP val val val levels by ∼0.1 eV. High-lying states are also more n˜2R = n2R + gR (157) important in the GW context. ZnO is an extreme The qRM can be decomposed into a sum of partial case where they were found to shift the gap by moments several tenths of an eV [59], and many high-lying X LOs are required to reach convergence[60]. qRM = QRkk0LL0M kk0LL0 3.7.4. Effective Potential Z n o Q 0 0 = P˜ P˜ 0 0 − P P 0 0 We seek a three-component form of the total en- Rkk LL M kL k L kL k L sR ergy functional. Its variational derivative with re- × rmY (ˆr) d3r (158) spect to the density yields an effective potential M eff V , which should give the first-order variation of By decomposing qRM this way, we can make any the electrostatic and exchange-correlation energy possible variation in the local density in the Hilbert with respect to the trial density. The density has space spanned by Eq. (152), and obtain the potential three components, the first interacts with a smooth from the electrostatic corresponding to the variation. ˜ potential V0, the second with the local true potential Matrix elements of the local potential are deter- V1R, and the third with the local projection of the mined from variation of the electrostatic energy potential V2R. Correspondingly, the accumulated (Sec. 3.7.4) sums over the first, second, and third terms produce Z out out out 3 n0 , n1R , and n2R , respectively. π1Rkk0LL0 = P˜RkLV1RP˜Rk0L0 d r eff Construction of V is complicated by the fact sR Z that the electrostatic potential at some point de- 3 π 0 0 = P V˜ P 0 0 d r pends on the density everywhere. Therefore it is not 2Rkk LL RkL 2R Rk L sR possible to construct the potential in a sphere solely Z X ˜ 3 from the density inside this sphere, or the smooth + QRkk0LL0M V2RGRM d r val RM sR potential from n0 alone. We can solve this diffi- ˜val (159) culty by defining a pseudodensity n 0 which has the same multipole moments inside augmentation The true potential V1R is the response to true val sphere R as the true density n 1R. Suppose the partial density P˜RkLP˜Rk0L0 , and V˜2R the re- val val local sphere density n1R − n2R has multipole mo- sponse to the smooth partial density PRkLPRk0L0 + P ments qM , where M = angular momentum. Then QRkk0LL0M GRM . These two partial densities define have the same multipole moments by construction.

val val X val We solve the Poisson equation for each of the n˜0 (r) = n0 (r) + gR (r) three components R 2 es val X ∇ V˜ = −8πn˜0 gR (r) = qRM GRM (r) (156) 0 2 es M ∇ V1R = −8πn1R 2 ˜ es n X o where GRM is a Gaussian of angular momentum ∇ V2R = −8π n2R + qRM GRM (160) M M centred at R, with unit moment. GM must be sufficiently localised inside sR that the potential All density components must include core (or pseudo- from the second term at sR is negligibly different core) contributions (Sec. 3.7.2). Poisson’s equation ˜ es from the potential of a unit point multipole at R for V0 is most easily solved by fast Fourier trans- (typically sR/4). Results should not depend on rg, forming n˜0 to reciprocal space, scaling each Fourier 33 2 component nG by −8π/G and transforming back. hT iHKS and hT iHF are calculated in different ways. In principle the electrostatic potential can be cal- It is not trivial that at (nominal) self-consistency culated this way but in practice the compensation EHKS=EHF. The two always differ slightly; when Gaussians in n˜0 can be fairly sharply peaked. Some they differ more than a small amount, it is an indi- parts are separated out and evaluated analytically cation that some parameter (e.g. plane-wave cutoff) to keep the plane wave cutoff low, as discussed in the is set too low. Thus, EHKS−EHF is one measure of ˜ es next section. V0 (r) is exact in the interstitial re- the convergence of the basis representing the charge es gion and extends smoothly through the spheres. V1R density. ˜ es eff and V2R can be solved up to an arbitrary homoge- In Questaal’s three-component framework, nV neous solution which is determined by the boundary a sum of three terms bilinear in the three com- es es es condition at sR. This adds a term proportional to ponents n˜0V˜ , n1RV and n˜2RV˜ . hT i, U and m ˜ es 0 1R 2R r YM (r), which is determined by resolving V0 into Exc each have independent contributions from three a one-centre expansion at sR. components, for example the contribution to the ˜ es ˜ es Pseudopotentials V0 and V2R approximately can- electrostatic energy from the valence electrons is cel each other, but not exactly. This is in part because the latter has a finite L cutoff, and in part Z ˜ es es 1 val ˜ es because V0 carries the tails of the core densities E = n˜0 (r)V0 (r) spilling into the interstitial. These incomplete can- 2 Z cellations are the key to rapid ` convergence of this X n val es 3 val ˜ es o 3 + n1RV1R d r − n˜2RV2R d r (163) method, and also allow relatively shallow cores not R SR to be included in the secular matrix, with minimal loss in accuracy. The last term cancels most of the first inside the augmentation spheres, except the first retains L Three Component Total Energy in DFT components above `max to all orders. This is be- ˜ ˜ The total energy in DFT is comprised of the cause V2R should be a one-centre expansion of V0 up val kinetic energy, electrostatic energy, and exchange- to `max, andn ˜2R should be a one-centre expansion val es val correlation energy of n˜0 . Because V [n˜0 ] is exact in the intersti- tial, including at sR, and it continued inside sR es 3 ˜ es Etot = hT i + U + Exc (161) by adding by V1R d r − V2R, the true potential is correct everywhere. Note that the derivative of Ees Questaal implements two functionals for Etot. The (i) wrt coefficients CRk0L0 in Eq. (152) yield the local traditional Kohn-Sham functional is evaluated at potentials Eq. (159), so that the effective one-body the output density: U and Exc are evaluated Hamiltonian corresponds to the functional derivative from nout generated by the secular matrix, and of the energy. hT i = Pocc ψ | − ∇2|ψ + T . T = HKS i i i core core The total electrostatic energy must include the P core − R ncoreV eff is the kinetic energy of the i core. Write it analogously to Eq. (163) but add the core. If the core is kept frozen, it can be evaluated (pseudo) core and nuclear density to nval (Sec. 3.7.2) from the free atom. Harris [61] and independently Foulkes and Hay- ˜ X ˜ dock [62] showed that the kinetic energy can be ex- U = U0 + {UR − UR} (164) pressed in terms of nin: U=U[nin] and Exc=Exc[nin] R and The first term is the electrostatic energy of the occ X Z smooth density, evaluated on a uniform mesh. It hT i = val − nvalV eff + T (162) HF i in in core involves a valence and pseudo-(core+nucleus) term i

Both EHKS and EHF are calculated, though usually ˜ ˜ val ˜ c+n U0 = U0 + U0 the latter is preferred because it converges more 1 Z rapidly with deviations from self-consistency. It is U˜ val = d3r n˜val(r) V˜ es(r) 0 2 0 nevertheless useful to have both energies, because ˜ c+n X G G H H they are calculated differently in Questaal. U and U0 = (CR − ZR)VR0 + CR VR Exc are computed from different densities, but more, R 34 where If the static susceptibility (Sec. 6) were calculated, 0 Z a good estimate could be made. But χ(r, r , 0) is a G 3 ˜ es VRL = d r GRLV [˜n0] second derivative of HHF: it is far more cumbersome to make than H itself, and approximation of χ Z HF H 3 ˜ es by models such as the Lindhard function worked VR = d r HR0V [˜n0] less well than hoped. A practicable, and simple

c+n G alternate ansatz is to assume that δnin is given by a Besides being needed for U0 , VRL is used in as- change in the Mattheis density construction (super- ˜ val ˜ c+n ˜ val sembling U0 . U0 and U0 must be evaluated position of free-atomic densities), arising δR. This with some care to avoid numerically integrating has worked very well in practice: it dramatically im- sharply peaked functions (core pseudodensity and G proves the convergence of the forces with iterations the multipole contribution to n˜0 (Eq. (156)). VRL H to self-consistency. Here we omit the derivation but and VR are evaluated by splitting the potential into refer to the original paper [31], and also to Appendix ˜ es ˜ es ˜ es ˜ es two terms V [n˜0] = V [n0] + {V [n˜0] − V [n0]}. B of Ref. [52] for an alternate derivation including Integrals of the second term can be performed ana- the correction term. The final expression for shift lytically (the interested reader is referred to Section of nucleus R from R→R+δR is XI of Ref. [31]), thus avoiding integrals of sharply Z occ peaked functions. All of the other integrals are much ˜ cn R X val smoother and are evaluated on the grid of G vectors. δEH = V0 δgR + δ i + δVin(nout−nin) U and U˜ are readily integrated in the sphere i R R (166) together with the valence parts in Eq. (163). The exchange-correlation potential is made anal- V˜0 is the sum of the smooth electrostatic and ogously with the electrostatic potential: exchange-correlation potential. This term describes Z the force of the smooth density on the Gaussian 3 Exc = n0xc[n0] d r lumps which represent the core and the nucleus Z at each site, and is the analogue of the Hellmann- X n o 3 + n1Rxc[n1R] − n2Rxc[n2R] d r Feynman theorem. R sR The eigenvalue shifts account for a change in the (165) atom-centred basis set in the shift in R. This the price for taking a tailored basis, but one whose It is also divided into valence + core contributions. Hilbert space changes when R shifts. Here the R es 3 We reiterate that differences n1RV1R d r − smooth mesh potential is fixed and augmentation ˜ es R n˜2RV2R and n1Rxc(n1R) − n2Rxc(n2R) con- matrices τ, σ, and π (Eqs. (149), (150) and (159)) verge much more rapidly with `-cutoff than either shift rigidly with the nucleus. Gradients only include term separately, and as a consequence the three- redistribution of plane waves owing to the shifts in component augmentation is much more efficient than envelope functions, and in the expansion coefficients the standard one. (j) CRk0L0 (Eq. (145)). The last term is the correction noted above. In 3.7.5. Forces future, it will probably be replaced by an efficient cal- Our original formulation [31] derived a simple culation of χ; meanwhile the Mattheis-shift ansatz expression for the derivative of HHF when a nucleus works rather well. Fig. 12 shows the force FOx on an changes from R to R+δR, assuming that the partial O neighbouring Co substituted for Ti in a 12-atom waves φR` shift rigidly with δR.(lmf has a “frozen unit cell of TiO2, as lmf iterates to self-consistency. φ” switch, which if imposed, guarantees φR` is rigid. The deviation ∆F from the self-consistent force is In practice we have found that the differences are vastly smaller with the correction term (compare small.) solid and dashed lines), and FOx is reasonable al- The original expression yielded forces that con- ready for the Mattheis construction (first iteration). verge only linearly with deviations ∆n=nout−nin The inset compares the two kinds of ∆F with iter- from self-consistency. It is possible to achieve faster ation to self-consistency to show that without and uncorr convergence in ∆n, by adding a correction δV ∆n, with the correction, ∆F scales as ∆FOx ∼∆Q corr 2 where δV is determined from a change δnin in nin, and ∆FOx ∼(∆Q) , the latter being the same rate but there is no unambiguous way to determine δnin. of convergence as the energy itself. 35 FOx in the LSDA in the atomic limit but the orbital en- 500 100 ergies are not. The task of the LSDA+U approach 10 is to optimise the density matrix when the orbitals on which U is applied are allowed to hybridise with 1 200 the other orbitals in the system. It hence describes 0.1 strictly speaking the orbital polarisation in the sys- 10−2 10−3 10−4 100 tem. Note that the total energy is then separately a functional of the spin-dependent electron density and the local density matrix on the orbitals for 50 __ which U terms are added. The density matrix or ∆Q orbital occupations also influence the spin-charge 10−1 10−2 10−3 10−4 densities and so the two contributions are not really independent. The U terms in LSDA+U are treated in the static mean-field Hartree-Fock approxima-

Figure 12: Harris force FOx (mRy/au) on O neighbouring tion in contrast to the DMFT method where they Co in Co-doped TiO2. ∆Q is a measure of the RMS change are treated dynamically. The expressions for the nout−nin integrated over the unit cell. Solid and dashed lines U σ are Eq. (166) with and without the third term. Inset shows E [n ] part of the total energy can be found in [66] σ deviation in FOx from the converged value. Yellow lines show and are given in terms of density matrices nmm0 ∆Q and (∆Q)2 as guides to the eye. and matrix elements of the Coulomb interaction 00 0 000 hm, m |Vee|m , m i. They fully take into account the rotational aspects of the Ylm of the d-states in- 3.8. LDA+U cluded. The double counting term, on the other The LDA+U method, introduced originally by hand, is written in terms of average Coulomb U Anisimov et al. [63, 64, 65] for open shell d- or and exchange J parameters and involves only the f-shell materials including on-site Coulomb and ex- the trace of the density matrix nσ = T r(nσ).[66] change interactions by means of the Hubbard model U and J are the screened Coulomb and exchange is implemented in both the lm and the lmf codes fol- parameters, which can for example, be estimated lowing the rotationally invariant approach described by a separate self-consistent supercell calculation in [66]. Rotational invariance means that the occu- treating the atoms in which the d-orbital occupa- pations of the different dm orbitals are specified by tion is changed as an impurity problem [69]. or by σ σ a density matrix n = nmm0 independent of the spe- means of the linear response approach [70]. Most cific choice of Cartesian axes defining the spherical often they are treated empirically and U is adjusted harmonics. For brevity we refer to the orbitals on to spectroscopic splittings, for example. which U is applied as the d orbitals although the code is written sufficiently general that U terms can The bare Coulomb interaction matrix elements k be applied to any nl of choice and even on multiple can be evaluated exactly in terms of the Slater F (ll) sets of orbitals. integrals and combinations of Gaunt coefficients as The on-site Coulomb interactions are added to worked out for instance in Condon and Shortley [71]. k the LSDA total energy and a double-counting term The Slater F (ll)-integrals, is subtracted. ZZ LDA+U σ σ LSDA σ U σ k 2 2 2 2 k k+1 E [ρ (r), n ] = E [ρ (r)] + E [n ] F (ll) = r1dr1r2dr2R`(r1) R`(r2) (r ) σ − Edc[n ] (167) (168)

Several schemes for the double counting have been proposed [63, 64, 66, 67, 68] and are implemented in where r<, r> are the smaller and larger or r1 and lmf but the prevailing approach is the fully localised r2 could easily be calculated directly in terms of limit (FLL) in which the double-counting term and the partial waves inside the spheres. However, this U-terms cancel for the fully localised (atomic) limit would then not take into account the screening of in which the density matrix becomes a diagonal ma- the Coulomb interaction by the other orbitals in trix with integer occupations. This means that in the system. Therefore it is customary to use the principle the total energy is already well described relation between the average U and J to the Umm0 36 and Jmm0 The calculation within LDA+U starts from a set of initial occupation numbers of the m orbitals for 1 X 0 U = U 0 = F each channel ` for which U and J parameters are (2` + 1)2 mm mm0 provided. These define the initial density matrix. 1 X The calculation then proceeds by making this density U − J = (Umm0 − Jmm0 ) (169) matrix self-consistent simultaneously with making 2`(2` + 1) 0 mm the standard spin density ρσ(r) self-consistent. We given by Anisimov et al. [64] and where Umm0 = note that the configuration one converges to may 0 0 0 0 hmm |Vee|mm i, Jmm0 = hmm |Vee|m mi. Using depend on the starting density matrix and hence one the tables of Condon and Shortley [71] one finds for should in principle consider different starting points d electrons and find the one which minimises the energy or use guidance from physical insight. In rare-earth (RE) 14J = F 2(dd) + F 4(dd) (170) based systems one finds for example that the lowest energy corresponds to the configuration which obeys and for f electrons Hund’s rules [73]. 2 1 1 We have here described only the FLL. The around- 3J = F 2(ff) + F 4(ff) + F 6(ff) (171) 15 11 858 mean-field approach [67, 63] and a mixture of the two [68] are also implemented and make use of the For p-orbitals J = F 2(pp)/5 and for s-orbitals same ingredients. Recently, Keshavarz et al. [74] J = 0. Examining the tabulated values of the Slater argued that adding U to spin-unpolarised LDA, thus F k integrals from Hartree-Fock calculations, one LDA+U may have advantages over adding them to finds that the ratios F 4(dd)/F 2(dd) ≈ 0.625 and LSDA, for example to extract inter-atomic exchange F 6(ff)/F 2(ff) ≈ 0.494, F 4(ff)/F 2(ff) ≈ 0.668 interactions independent of the reference system are approximately constant. Using these fixed ra- used (AFM or FM). This is presently not supported tios and the relation to the average exchange in- in the lm and lmf codes but might be useful to add tegral J one then can fix the F k values. The ad- in the future. Note that in Ref. [64] also a LDA+U vantage of doing it this way is that the screening rather LSDA+U approach was used. of the J and U can then be taken into account A simpler version of the LDA+U is described and there are only two empirical parameters. On by Dudarev et al. [75]. In that case, only a single the other hand one finds in practice that while parameter, namely U˜ = U − J comes into play. One U is strongly reduced from the atomic bare F 0, can use this scheme in the codes by setting J = 0 and the J is approximately unscreened, so one might adjusting the provided U˜ parameter. This means as well use the unscreened directly calculated F k essentially that we neglect the orbital polarisation for k ≥ 2. From LDA calculated atomic wave due to the anisotropic Coulomb interaction exchange functions, we find F 4(dd)/F 2(dd) = 0.658 ± 0.004 terms but keep the Hartree-Fock like feature that for 3d atoms, F 4(ff)/F 2(F f) = 0.685 ± 0.001, empty states feel a different potential from occupied F 6(ff)/F 2(ff) = 0.518±0.001 for rare-earth RE3+. states. In fact, in this case the additional potential (Until recently, Questaal used a different convention; is see note [72].) σ σ ˜ 1 0 By minimising the LDA+U total energy with re- Vmm0 = U[ 2 δmm − nmm0 ] (172) spect to the density matrix elements, one obtains a σ σ non-local potential Vmm0 which can also be found in Or if the density matrix is actually diagonal Vm = ˜ 1 σ [66]. In the ASA lm-version it is straightforward to U[ 2 −nm]. This means that when a spin-orbital mσ add this nonlocal potential directly to the Hamilto- on a given site is empty it is shifted up by U/˜ 2 and nian inside the spheres because the orbitals defining when it is filled, it is shifted down by U/˜ 2. This σ the matrix Vmm0 are just the single lm channel corresponds to the simplest form of the LDA+U partial waves inside the sphere. It is a little more formalism. It allows one for example to include a complex in lmf. Essentially, we now add the oper- shift of a fully occupied d band. This is useful for σ σ σ ator |φlmiVmm0 hφlm0 |. We add the corresponding example in Zn-containing systems. In LDA, the augmentation matrix elements both for the φ and binding energy of these orbitals is underestimated. φ˙ and for local orbitals if these are present for this The deeper a band lies below the Fermi level, the ` channel. higher its downward shift by the self-energy and this 37 can be simply mimicked by the LDA+U method in The full-potential code lmf does not as yet; however this form. It has the effect for example of reduc- it does have a facility to compute the core levels ing the hybridisation of the Zn-3d with the valence fully relativistically. It follows a method similar to band maximum and slightly increases the gap, af- that developed by Ebert [42]. fects things like band-offsets and the valence band As of this writing, the QSGW code allows spin- maximum crystal field and spin-orbit induced fine orbit coupling only in a very restricted manner. Typ- splitting. ically SOC has only a minor effect on the self-energy One can even use this approach for s or p elec- Σ: this is because SOC is a modification to the trons and thereby shift up the mostly cation-s like kinetic energy and only affects the potential in a conduction band minimum in a semiconductor to higher order perturbation. We have found very good adjust the gap. While this is of course purely empir- results by generating QSGW self-energies leaving ical and shifting the gaps for the wrong reason, and out SOC all together, and then as a post-processing hence far less accurate than the QSGW approach, it step include it in the band structure. is sometimes useful in defect calculations because a Additionally, lmf has a special mode where the defect level that would otherwise lie as a resonance only diagonal parts of L·S (see Sec. 2.8.2) are added in the conduction band can now become a defect to the one-particle Hamiltonian. After diagonalisa- level in the gap and allow one to properly control its tion the spin off-diagonal parts are used to modify occupation for different charge states of the defect. the one-particle levels in a special kind of perturba- Using a U on p-orbitals of anions like O or N, this tion theory, as described in the Appendix of Ref. [77]. can also have the effect that the empty defect levels This has the advantage that the eigenvectors are pushed out the valence band and localising on a kept spin-diagonal, which significantly reduces the single p orbital are pushed deeper into the gap. The cost of the GW code. Tests show that at the LDA essential function of the LDA+U terms in these ap- level the modification to the band structure is very plications is to introduce a Hartree-Fock like orbital similar to that of the full L · S, and for purposes dependence of the potential. This is important to of determining the effect on Σ it should be quite reduce the self-interaction error of LDA or GGA reliable except in very special cases. Once a modi- and allows one in a simpler and much less expensive fied Σ has been found (as a one-shot correction to way to simulate what a hybrid functional would do. QSGW computed without SOC), lmf can be run If one picks these shifts carefully to adjust the bulk with the full L · S, including the modification of Σ band structure to the QSGW bands, it provides a approximately in the manner just described. We rather efficient way to study defects with a corrected have found the difference to be negligible except band gap and specific orbital self-interaction, two of when elements are very heavy: CH3NH3PbI3 is the the main errors of LDA plaguing defect calculations. most extreme case we have found so far: the effect An example of this can be found in Boonchun et of SO on the band structure through changes in Σ al. [76]. reduced the gap by 0.1 eV [77] Finally lmf has an ability to fold in approximately 3.9. Relativistic Effects the effects of the proper Dirac Hamiltonian on the The radial solvers all solve by default a scalar valence states. The difference between the full Dirac Dirac equation, which incorporates the dominant and scalar Dirac partial waves can be important for relativistic effects. All the DFT codes (lmf, lm, very small r, where the SOC contributions are the lmgf and lmpg) have the ability to incorporate spin- largest (see ξ(r) in Sec. 2.8.2). For small r, the true orbit coupling perturbatively. In the band codes the Dirac wave function varies as rγ , γ2 = κ2 − (2Z/c)2, perturbation is straightforward (See Sec. 2.8.2); the whereas in the scalar Dirac case it varies as r`. As Green’s function scheme is more subtle, particularly a result, matrix elements ξ(r) are underestimated. with respect to third order parameterisation, but The effect is typically very small, but it becomes non- a formulation is possible (Sec. 2.16). For lmf, the negligible in heavy semiconductors such as CdTe. L · S term is added to the true local potential V1R To ameliorate this error, lmf has the ability to use a (see Sec. 3.7.4). As of this writing, only the spin fully relativistic partial wave in place of a standard diagonal part of the output density is kept in the local orbital, choosing µ = −1/2. Most important self-consistency cycle. is the p1/2 state with κ = 1. Additionally the ASA codes lm, lmgf and lmpg SO coupling is well described for light semicon- have a fully relativistic implementation (Sec. 2.17). ductors such as GaAs and ZnSe (see Table 4), as 38 LDA QSGW 3 3 nc γc Material na γa ∆0 EG ∆0 EG 2 2 GaAs (SR) 5 0.97 0.34 0.24 0.33 1.77 GaAs (p1/2) 5 0.97 0.35 0.24 0.34 1.75 1 1 GaAs (expt) 0.34 1.52 0.34 1.52 0 0 ZnSe (SR) 5 0.98 0.39 0.95 0.39 3.05 ZnSe (p1/2) 5 0.97 0.40 0.95 0.40 3.00 −1 −1 ZnSe (expt) 0.40 2.82 0.40 2.82 −2 −2 CdTe (SR) 6 0.94 0.84 0.26 0.83 1.83 −3 −3 CdTe (p1/2) 6 0.93 0.92 0.26 0.88 1.77 K L Γ X K L Γ X CdTe (expt) 0.90 1.61 0.90 1.61 PbTe (SR) 7 0.80 1.12 -.16 1.00 0.40 Figure 13: Energy band structure of PbTe in eV for (left) the PbTe (p1/2) 6 0.93 1.30 -.57 1.06 0.22 LDA and (right) QSGW approximations. Colours correspond ± + PbTe (expt) 0.19 0.19 to projections onto L6 symmetry. L6 is comprised of Pb s − and the t2g part of the d orbital, and Te p.L6 is comprised of Te s and the t2g part of the d orbital, and Pb p. The Table 4: Spin orbit splitting ∆0 of the valence band at Γ, dashed lines show the effect of using a Dirac p1/2 local orbital and energy bandgaps computed in the LDA and by QSGW, instead of the usual scalar relativistic one. Except for the with a scalar relativistic (SR) local orbital on the cation addition of the Dirac p1/2 orbital, the basis is similar to that and anion p states, and a Dirac p1/2 local orbital. In the used in Ref. [78]. It is a slight enlargement over the default zincblende semiconductors the two band edges are at Γ, but basis, including g orbitals on Pb and Te and a 6s local orbital in PbTe case they are both at L (Fig. 13). n are the principal on Te. These extra orbitals widen the gap relative to the quantum numbers of the cation and anion local p orbital; default basis by ∼0.1 eV. γ = [1 − (Z/c)2]1/2.

sentially the same except for the choice of envelope Z increases γ deviates progressively from unity and functions. LMTO’s are much more compact, and the discrepancy with experiment increases. It is require a much smaller Hilbert space, but they suffer already significant for CdTe, and in PbTe the effect from problems with basis set completeness. An obvi- exceeds 0.1 eV. Replacing the scalar Dirac LO with ous alternative is to combine the two basis sets; this the Dirac p1/2 LO significantly decreases this error is the “PMT basis” [54]. This can be accomplished in CdTe. In PbTe, the valence band edges are both in practice quite neatly, because Questaal’s standard + at L: the two band edges have symmetry L6 and HL require a one-centre expansion, Eq. (142), more − L6 , respectively (Fig. 13). The LDA gap is nomi- general than simple LMTO’s (Eq. (1)). Plane waves + − nally positive, but L6 and L6 are inverted very near can be expanded in polynomials PkL in a similar L as can be seen by the change in colours at the manner, as can other kinds of envelope functions. band edges; thus the gap is actually inverted. An- Thus the method provides a unified framework to ticipating the discussion in Sec. 4, if a diagonal-only seamlessly mix different kinds of envelope functions. + − GW were carried out on top of this, the L6 and L6 Expressions for total energy, forces, assembling out- would order properly, but the wrong topology would put density, etc, remain unchanged. The procedure manifest itself as a band crossing near L, as shown is described in more detail in Ref. [54]; here we + for Ge in Fig. 6 of Ref. [56]. QSGW orders L6 and focus on the primary strengths and weaknesses of − L6 correctly, somewhat overestimating the gap [78]. the method, using the total energy calculated for Adding a Dirac p1/2 local orbital reduces the gap by SrTiO3 shown in Fig. 14 for discussion. It was re- nearly 0.2 eV. In the LDA the gap at L widens as drawn from Ref. [54]. Four basis sets were chosen, a consequence of the fully relativistic partial wave ranging from pure LAPW (’null’) to a moderately + (Table 4), another reflection of the inversion of L6 large basis of 75 smooth Hankels N(HL) + 6 local − and L6 . orbitals.

3.10. The PMT Method Advantages of PMT. As expected, Questaal’s purely In the introduction it was noted that the LAPW atom-centred basis is vastly more compact than a and Questaal’s generalised LMTO method are es- pure LAPW basis (Fig. 14). Even a tiny basis 39 basis character of HL N(HL) + N(LO) • Compactness: LAPWs are not “intelligent”, i.e. null no HL 6 tailored to the potential, and therefore conver- hyper sp on O only 18 gence is slow in the basis size. Also their ex- small spd on all atoms 51 tended range precludes the advantages of short- large +2nd spd (spd on Ti) 81 ranged functions.

Table 5: Nature and size of different basis configurations for • Over-completeness: PMT has difficulties when the case of SrTiO . 3 HL and PWs are both sizeable. This is be- cause they span much the same Hilbert space. Workarounds (reducing the Hilbert space by of just O sp states dramatically improves on the projecting out parts that contribute to a small convergence of the LAPW basis. PMT offers a overlap) have been implemented, but they are marked advantaged from LAPW perspective: by inefficient and somewhat finicky—e.g. energy augmenting that basis by a few functions. From may not vary smoothly with the change in a the atom-centred perspective Questaal’s standard nuclear coordinates. moderately large basis misses about 5 mRy/atom in total energy, showing that it is not complete. • Interpolating Σ0: in QSGW context, the ability Finally the figure shows that adding HL continues to interpolate the QSGW self-energy Σ0 is de- to dramatically improve the convergence until the graded. A workaround has been made for this standard minimal basis of HL is reached (single problem also (Ref. [79]) by projecting Σ0 onto spd function on each atom). Beyond that initial the MTO part of the eigenfunctions, interpo- rapid gain, it doesn’t seem to matter much whether lating it, and re-embedding. Information is lost the basis is increased by adding PW or more HL in the projection/re-embedding step, resulting (compare blue-solid and green-dashed lines). in ambiguities. A much better way to address difficulties with interpolation is to construct 300 very short-ranged, compact functions. This is 100 shown clearly in Sec. 3.12. In summary, PMT does best when treated as a 30 slight augmentation of either the LAPW limit, or 10 the MTO limit. Adding a few PWs with a low cutoff (2 Ry) was useful in our participation in the 3 Delta Codes project (Sec. 3.13) particularly for the molecular solids such as N2 which consist of 90% in- 1 terstitial, or even more. Questaal’s pure MTO basis does not handle those compounds that accurately. 0.3 SrTiO3 3.11. Floating Orbitals 500 300 200 100 50 N While the PMT method can make the basis nearly b complete, it causes severe difficulties when interpo- lating the QSGW Σ0 to an arbitrary k. An alterna- Figure 14: Deviation ∆E from fully converged total en- tive is to add “empty” sites — points outside aug- ergy/atom, in mRy, vs total number of basis functions (PW mentation spheres where extra HL can be added to Γ + HL + LO) Nb at Γ. Black-dashed, peach-solid, green- the basis. These sites have no augmentation spheres; dashed, and blue-solid correspond respectively to the null, they only enhance completeness in the interstitial. hyper, small, and large basis sets. ∆E is drawn on a log-log scale as a function plane-wave cutoff, denoted as 1/Nb. Nb They are more ad hoc than plane waves (Sec. 3.10) is the total number of basis functions PW+HL+LO at the Γ particularly because there is no systematic path to point. convergence. Nevertheless they can be implemented efficiently; and Questaal has an automatic procedure to locate points to fill voids in the interstitial. In Drawbacks of PMT. While they are far more ef- practice we use floating orbitals to converge QSGW ficient than ordinary LAPWs, PMTs show less self-energies with respect to the single-particle basis, promise than was initially hoped for. in open systems. Their effect at the LDA level is 40 usually not large, unless systems are very open or Screening of the HL works in a manner similar to very accurate total energies are required. But for screening traditional HL because HL asymptotically QSGW, where the potential depends on unoccupied approaches a HL for large r (Fig. 10). HL − HL states as well as occupied ones, it makes more of decays approximately as a Gaussian of radius rs: a difference, e.g correcting the fundamental gap in for traditional values of rs (rs . 2s/3) it becomes Si by of order 0.1 eV. We expect that the need for negligible beyond first-neighbour distances. The α α floating orbitals in such systems will be obviated by screened HL employ the expansion coefficients S the new JPO basis. A precursor to it is discussed in the same manner as in the ASA (Sec. 2.9), so they next. have essentially the same range. What is sacrificed is the reinterpretation of screening in terms of hard 3.12. Screened, Short-ranged Orbitals in the Full- core radius (Sec. 2.10.5), but this has no effect be- Potential Framework cause the Hilbert space is unchanged. JPOs improve on the screened Hα by restoring this condition and exploiting it to make the kinetic energy continuous everywhere. Screening makes it possible to avoid Ewald sum- mation otherwise necessary in the unscreened FP case (Appendix C). Also, one-centre expansions are α α α 2 made more efficient. By writing H = {H − H }+ Figure 15: Screened d envelope functions, xy, yz, 3z −1, xz, α and x2−y2, for a zincblende lattice. See also Fig. 5. H , the first term which involves H, must be ex- panded in the more cumbersome polynomials PkL Building on the success of the “screening” of LM- (Eq. (145)). In the screened case the difference be- TOs in the ASA (Sec.2.9), we have developed an comes short-ranged and can be dealt with efficiently analogue within the full-potential LMTO frame- in real space. The latter can use the simpler expan- work. As in the ASA case, “screening” does not sion Eq. (42). alter the Hilbert space but renders basis functions A restriction in transformation is that all func- short ranged. Screening is a precursor to the next- tions of a given κ must share the same Hankel energy, generation JPO basis, which will alter the Hilbert while the traditional basis allows arbitrary choice for space, rendering it significantly more accurate for each orbital. However, this restriction is fairly mild the same rank of Hamiltonian. JPOs are still in de- because of the extra flexibility HL have through the velopment; the method will be presented in a future choice of rs, which the HL do not. The rs strongly work. They have the following properties: affect the kinetic energy near the MT boundary (Fig. 10), and the results are more sensitive to that • they solve the SE with a (nearly) optimal num- choice than to the Hankel energy. ber of basis functions for a given accuracy in One large advantage of the transformation is that the four dimensions (r,E); Hα becomes short enough ranged so that matrix elements • they are very short ranged (see Figs. 5 and 15); and α α α Σ0;RL,R0L0 = hHRL|Σ0|HR0L0 i • they are atom centred with a definite L charac- 0 ter on their own MT sphere. are limited by the range of the physical Σ0(r, r ), α and not by the basis set. Σ0;RL,R0L0 turns out not This provides a framework for exploiting the many to be very long ranged, as suggested some time advantages inherent in a short ranged basis sets. ago by Zein et al. [80]. Fig. 16 shows how the They can be used as projectors, replacing Wannier bandgap in NiO evolves with range cutoff Rcut, de- α functions; they can be used to construct minimal fined as follows. Σ0;RL,R0L0 is initially computed Hamiltonians; near-sightedness can be exploited to without range truncation. Then matrix elements 3 α 0 make very efficient O(N) solvers in DFT and O(N ) of Σ0;RL,R0L0 are set to zero for |R − R | > Rcut, solvers in GW and GW +BSE. That the basis func- and the band structure is calculated. As Fig. 16 tions are associated with a definite L character is shows, the bandgap EG converges slowly with Rcut α=0 α important, for example, when singling out a corre- in the traditional Σ0;RL,R0L0 case, but for Σ0;RL,R0L0 , lated subspace. EG is already reasonably converged including first 41 and second neighbours. Note that for Rcut→∞, the energy in Fig. 16. two methods yield the same result, as the screening transformation does not change the Hilbert space.

One powerful feature of lmf is its ability to inter- 5 polate Σ0. As noted in Sec. 4.2, some compromises 4 must be made because the high-energy parts of nn0 3 Σ do not interpolate well, and elements above (eV) 0 g 2 an energy threshold (of order 2 Ry) must be set to E zero. The screening mitigates this difficulty; it ap- 1 pears that the energy threshold can be taken to ∞. 0 The difference is not large, but in small bandgap cases where the precision of the method is very high, 0 2 4 6 8 10 12 14 16 agreement with experiment may be limited by this rc (bohr) difference. α The short range of Σ0;RL,R0L0 can be used to much Figure 16: NiO bandgap in eV as a function of the range advantage: we realise from the properties of Fourier truncated self-energy represented in the conventional and transforms that the number of k points directly screened basis sets. The first point contains onsite terms only, translates into the number of neighbours in real the second adds first neighbours etc. The conventional basis α shows erratic behaviour until at least the fourth neighbours space. Because Σ0;RL,R0L0 is short ranged, it con- are present. α=0 verges faster in the k mesh than Σ0;RL,R0L0 , which saves significantly on computational effort. Also At the LDA level, the better spatial localisation explicitly semilocal algorithms can be constructed, of screened basis function paves the way to efficient, greatly reducing the cost to make Σ0. real-space assembly of matrices (1 and 2 particle) At present the short ranged screened smooth Han- while maintaining, and in cases improving, accuracy. kel functions shine the most when used to represent The screened functions are also significantly less the QSGW self-energy Σ(ω, q) and interpolate it linearly dependent. The implementation is heavily at intermediate q points not directly calculated in vectorised and at present can handle a couple hun- the heavy GW step. The interpolation process is dred atoms on a single node, with the scalability significantly simplified and ambiguity is removed properly utilised the numbers can be an order of because the high energy Σ no longer needs diagonal magnitude larger. For very large systems, it will approximation, there is no need to define energy be possible to construct O(N) methods, but empiri- threshold and there is no loss of information due to cally it appears so far that systems of a few hundred discarding the off-diagonal parts. This does add to atoms are probably still too small for such methods the computing effort for directly computed QSGW to be advantageous. Σ(ω, q) because no states are excluded and matrix To summarise, our current screening transforma- sizes increase respectively. To counter that, signifi- tion offers advantages of short-ranged basis sets. It cantly fewer Σ(q) need to be computed exactly, in is not yet optimally compact. However, the JPO our experience an equivalent result can be achieved basis, to be reported on soon, we believe will be by using only a half or in cases even a quarter of nearly the most compact (optimal convergence for the original BZ sampling in each direction, for cells a given rank of Hamiltonian, and short range). We with few symmetry operations the savings add up believe these advantages will be very significant, and quickly considering that the GW walks over pairs of obviate the need for enhancements such as plane q points (quadrupling the total number of points for waves (Sec. 3.10) or floating orbitals (Sec. 3.11) to low symmetry cells). Performance can be further obtain high accuracy. improved by still utilising the diagonal approxima- tion to Σ(ω, q) because the interpolation is excellent 3.13. Delta Codes Validation Exercise and the high energy part only weakly affects states near EF . To show the quality of the QSGW Σ(ω, q) The Delta Codes project [6] is a continuing effort expansion in the new, screened basis and the con- to mutually validate the different electronic struc- ventional unscreened, we plot the dependence of the ture codes used in condensed matter and materi- NiO bandgap as a function of the truncated self als modelling. The first comparison has focused 42 on the equations of state calculated for a selec- the size of the LMTO basis is small, the choice of tion of elemental solids. The relative agreement basis parameters can have a significant impact on of different codes, including lmf, can be exam- accuracy. Different schemes for choosing basis pa- ined on the project site https://molmod.ugent. rameters have evolved, of which the simplest is to fit be/deltacodesdft. Agreement is quantified using the basis functions to the free atom wave functions. an average “Delta value”—the integrated difference This provides one set of basis parameters for each `. between equation of state curves over a specified The second set is obtained from the first by shifting volume range. the Hankel energy (typically by −0.5 to −0.8 Ry). The calculated equations of states (scalar relativis- This is a heuristic approach but it is automatic and tic and PBE-GGA exchange-correlation are spec- effective and is the default. ified) agree extremely closely, particularly among An alternative method chooses smoothing radii the all-electron codes. The pseudopotential projects directly from the potential. The gradient of the have also demonstrated precision very similar to the smoothed Hankel functions at typical muffin-tin all-electron methods. sizes is sensitive to rs and variational freedom is The Delta Codes test set has been carried out for obtained by differentiating the two sets of basis lmf with an automatic setup, avoiding any system- functions by rs instead of the Hankel energy. The specific modifications. The main settings defining smoothing radius describes the point where the basis the calculations are: function begins to deviate from the exponential, Hankel-like tail: this is similar to the behaviour Sphere radii:. sR corresponding to touching spheres of atomic eigenfunctions at their classical turning are used, evaluated at the smallest volume tested. points. Associating the smoothing radius with the classi- `max:. The maximum ` for the basis functions is 1 cal turning point allows basis functions to be con- for H and He, 2 for Z < 18 and 3 otherwise. In each structed that resemble atomic-like states at energies case, one higher ` is included in the partial wave different to the atomic eigenvalues and the question expansion. of choosing suitable rs is translated into a deter- Semi-core states:. Local orbitals are effective for mining a pair of reference energies—one each for including semicore states in the valence. We rec- the two basis sets per L. Energies of the valence ommend treating states as semicore whenever the states are typically in the range 0 to −2 Ry, and leaked charge exceeds 0.002e, or the eigenvalue is choices for the reference energies of E1 = −0.5 and higher than ∼ −2 Ry. For transition metals and E2 = −2.0 result in accurate basis sets for most ma- f-electron systems, the inclusion of high-lying ` = 2 terials. In this scheme, the Hankel energy remains or 3 conventional local orbitals can significantly im- a free parameter but similar results are found for prove accuracy and these are added by default for Hankel energies in the range −0.1 to −1 Ry. these groups of elements. The choice of reference energies can be automated by expressing them in terms of the atomic potential Molecular cases:. For cases where the touching at the muffin-tin radius. One basis set is chosen muffin-tin spheres fill less than 30% of the cell vol- with a higher reference energy, giving a more diffuse ume, an additional LAPW basis has been included. basis function, the other is setup at a smaller, more The LMTO basis was not designed for molecular negative, energy which yields a smaller rs and a systems — but the shortcomings of the LMTO ba- more tightly bound basis function. Because v(sR) sis can be easily rectified by the addition of small varies significantly across elements and materials, a 1 number of plane-waves. proportional scheme is more suitable: e.g., v(rs ) = 2 1 2v(sR) and v(rs ) = 2 v(sR); this algorithm is used Basis setup:. Basis parameters determined auto- for automating the DeltaCodes tests. matically as described below. 3.13.2. Comparison with LAPW Results 3.13.1. Choice of Basis Parameters Figure 17 shows the “Delta values” for Questaal Questaal’s full-potential code allows two LMTO with respect to the LAPW code WIEN2K. The level basis functions to be used per `. Each of these is de- of agreement is typical (or better) than most all fined by two parameters which must be chosen: the electron methods involved in the validation exercise. Hankel energy and the smoothing radius. Because In particular, the transition elements are extremely 43 well represented by the smooth Hankel basis. Some this is not possible. Representation of these ob- of the first period and group VII cases involve molec- jects requires a basis spanning the Hilbert space of ular problems for which the muffin-tins fill a small wave function products. Four-centre integrals (e.g. fraction of the unit cell: the performance of the Eq. (184)) can be evaluated as integrals of pairs of LMTO basis in these cases becomes sensitive to rs product basis functions. This was first accomplished and E choices. We do not attempt to choose opti- by Aryasetiawan [85] in an ASA framework, and mal parameters, instead we include an additional extended by us [15] into a full-potential scheme. LAPW (see Sec. 3.10), with cut-off 2 Ry or 4 Ry In the ab initio context, GW has traditionally depending on the packing fraction. Au and group referred to a perturbation around the LDA: i.e. LDA LDA LDA VI are outliers: in both cases the automatic basis H0 = H so that GW = G W , though setup generates basis functions that are rather too in recent years hybrid functionals or LDA+U have smooth to describe the tightly bound states in these become popular. It has become increasingly recog- elements. If necessary, improved performance can nised that results depend rather strongly on the be obtained, thereby converging V0 with respect to choice of H0, or equivalently the Vxc entering into the basis, by adding plane waves. it, by which we mean the beyond-Hartree parts of In addition to confirming that the smooth Hankel the effective potential. Thus the self-energy of H0 is basis is capable of high precision, the testing pro- Σ = Vxc in the language of many-body perturbation cedure also demonstrates that the automatic pro- theory. The implications of starting point depen- cedure for setting up the basis and setting various dence are profound: the quality of GW depends on numerical parameters is highly reliable. the quality of the reference. Moreover it is increas- ingly accepted that HLDA is only a good choice for H He H0 for a very restricted materials class. Li Be Questaal - Wien2K (meV/at) B N O F Ne GW is a perturbation theory, and usually only Na Mg Al Si P S Ar the lowest order perturbation (diagonal part of Σ in K Sc Ti V Mn Fe Ni Zn Ga Ge As Se Br Kr the basis of the reference Hamiltonian) is kept. As Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag In Sn Sb Te I Xe a result eigenfunctions are not perturbed and the Ba Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po Rn change in QP level εkn is

0 LDA Figure 17: “Delta Codes” tests: integrated energy difference δεkn = Zkn[hΨkn|Σ(r, r , εkn) − Vxc (r)|Ψkni] between the energy-volume curves calculated by lmf and (173) WIEN2K [81], expressed in meV/atom; the average value is 0.62meV/atom. Zkn is the quasi-particle (QP) renormalisation factor

 ∂ −1 Z = 1 − hΨ | Σ(r, r0, ε )|Ψ i (174) kn kn ∂ε kn kn 4. GW and QSGW and accounts for the fact that Σ is evaluated at εkn The GW approximation (G=Green’s function, rather than at the QP energy εkn+δεkn. Eq. (173) W =screened Coulomb interaction) is the first-order is the customary way QP energies are evaluated in term in the formally exact diagrammatic expansion GW calculations. In Sec. VI of Ref. [56], we show around some one-particle Hamiltonian H0 [82]. that omitting the Z factor is an approximate way Questaal’s GW formalism is explained in some to incorporate self-consistency, and thus should be detail in Ref. [83], and its implementation of quasi- a better choice than including it. As a practical particle self-consistent GW (QSGW ) is explained in matter it is also the case that results improve using Ref. [84]. Here we recapitulate only the main points. Z=1. From an implementation point of view, GW requires Ref. [56] notes a serious drawback in the diagonal- two-point functions, such as the susceptibility from only approximation: when levels in the reference which W is made (Sec. 6). This entails an auxiliary Hamiltonian are wrongly ordered as they are, e.g. product basis of wave function products. Most com- at the Γ point in Ge, InN, and CuInSe2, and the monly plane waves are used to implement GW. In L point in PbTe [78] the wrong starting topology such a case the same basis for two-particle objects results in an unphysical band crossing in the GW can be used, because a product of plane waves is QP levels. (See, for example an illustration for Ge in another plane wave, but for all-electron methods Fig. 6 of Ref. [56]). Something similar will happen 44 for PbTe (Fig. 13). Moreover, the off-diagonal parts dependent and non-Hermitian, thus falling outside of Σ can significantly modify the eigenfunctions of an independent particle picture. This causes and resulting charge density. This reflects itself GW to degrade, and higher-order diagrams are in many contexts, e.g. strong renormalisation of needed [87] to restore the quality of GW. But by the bandgaps in polar insulators such as TiSe2 and quasiparticlising Σ we can stay within a framework CeO2 [56]. It modifies orbital character of states of perturbation around H0 but choose H0 in an near the Fermi energy in La2CuO4, and significantly optimal manner. QSGW quasiparticlises Σ(ε) by affects how the metal-insulator transition comes approximating it by a static Hermitian potential as about, when GW is combined with DMFT [86]. 1 X Σ = |ψ i {Re[Σ(ε )] + Re[Σ(ε )] } hψ | 4.1. Need for Self-consistency 0 2 i i ij j ij j ij The arbitrariness in the starting point means that (175) there is no unique definition of the GW approxima- tion. Errors in the theory (estimated by deviations which is the QSGW form for a static Vxc. This pro- from experiment) can be fairly scattered for a partic- out cess is carried out to self-consistency, until Σ0 = ular choice of reference, e.g. LDA, making it unclear in Σ0 . Given Σ0, a new effective Hamiltonian can be what the shortcomings of GW actually are. Arbi- made, which generates a new Σ0: trariness in the starting point can be surmounted by iterating G to self-consistency, that is, by find- Σ0 → G0 → Σ = iG0W [G0] → Σ0 ... (176) ing a G generated by GW that is the same as the out in G that generates it (G =G ). But it has long Σ0 is nonlocal with off-diagonal components; both been known that full self-consistency can be quite of these features are important. poor in solids [87, 88]. A recent re-examination of TiSe2 is an interesting case where self-consistency some semiconductors [89] confirms that the dielectric becomes critical. It is a layered diselenide with function (and concomitant QP levels) indeed worsen space group P 3¯m1. Below Tc=200K, it undergoes a when G is self-consistent, for reasons explained in phase transition to a charge density wave, forming Appendix A in Ref. [84]. Fully scGW becomes more a commensurate 2×2×2 superlattice (P 3¯c1) of the problematic in transition metals [90]. Finally, even original structure. It has attracted a great deal while scGW is a conserving approximation in the of interest in part because the CDW may to be Green’s function G, in W it is not: it violates the connected to superconductivity. sum rule [91] and loses its usual physical meaning as a response function. As a result it washes out 0.8 GWLDA +∆V LDA GWLDA spectral functions in transition metals [90], often 0.4 (a) (b) yielding worse results than the LDA. 0 LDA QSGW A simple kind of self-consistency is to update −0.4 eigenvalues but retain LDA eigenfunctions, as was −0.8 first done by Aryasetiawan and Gunnarsson [92]. K Γ L M K Γ L M This greatly improves GW, especially in systems such as NiO where the LDA starting point is very Figure 18: TiSe2 energy bands in eV for the undistorted P 3¯c1 structure. (a): solid lines are LDA results, with red and poor. But it is only adequate in limited circum- green depicting a projection onto Ti and Se orbital character, stances. In TiSe2 it strongly overestimates the respectively. Blue dashed line shows shifts calculated in the bandgap (see below), as it does in NiO [92]. An GW approximation based on the LDA. (b): blue dashed line extreme case is CeO where the position of the Ce shows results from GW based on the LDA (same self-energy 2 as in panel (a)), with an extra potential ∆V LDA deriving 4f levels is severely overestimated; see Fig. 3 in from a charge density shift computed from the rotation of Ref. [83]. The off-diagonal elements in Σ, which are the LDA eigenvectors. Solid lines are QSGW results, with needed to modify the eigenfunctions, can be very the same colour scheme as in panel (a). important. Questaal employs quasiparticle self- How the CDW affects the energy band structure consistency [16, 83, 84]: by construction, the is a source of great controversy. It is accepted reference H0 is determined within GW as in experimentally that in the P 3¯c1 phase, TiSe2 is the fully self-consistent case. But the proper semiconductor with a gap of ∼0.15 eV. Above Tc Σ=iGW cannot be used because it is energy whether intrinsic TiSe2 has a gap is not settled. On 45 4 4 2 the theoretical side, the CDW makes TiSe2 an un- usual materials system: DFT predicts a metal in both P 3¯m1 and P 3¯c1, as expected, but a recent 2 2 1 GLDAW LDA calculation [93] predicts P 3¯m1 to have a small positive gap. We have revisited this prob- 0 0 0 lem and confirmed the findings of Ref. [93] at the GLDAW LDA level. However, at the QSGW level, −2 −2 −1 TiSe2 is a metal in the ideal P 3¯m1 phase. This −4 −4 −2 is atypical for insulators: usually self-consistency Γ K M Γ K M Γ X M Γ A widens the gap, as has long been known (see Fig. 1 in Ref. [83]). The origin can be traced to the modi- Figure 19: Energy band structure, in eV, of majority-spin fication of the LDA eigenfunctions by off-diagonal MnAs (left), minority-spin MnAs (middle) and nonmagnetic elements Σn6=n0 , which modify the density n(r). To FeTe (right) in (a) the QSGW approximation (orange solid see approximately the effect of the density change, lines), GLDAW LDA (dashed blue lines), and the LDA (light LDA LDA assume LDA adequately describes δV eff/δn (this is dashed grey lines). G W was calculated from the LDA, but including the off-diagonal components of Σ with the inverse susceptibility, the charge analogue of Z=1. It is the 0th iteration of the QSGW self-consistency Eq. (181)). For a fixed Σ0, we can estimate the ef- cycle. fect of the change δn renormalising the V eff through the change at Γ disappears, and the one at K is similar to the  LDA LDA LDA GW LDA LDA δΣ ≈ Σ0 − Vxc [n ] + Vxc [n ] LDA. FeTe fares no better. G W is some- where intermediate between LDA and QSGW. Most This is accomplished in a natural way with the important is the unphysical, dispersionless band Questaal package. The quantity in curly brackets at EF on the X-M line, which yields a nonsensi- is generated by GW in the first QSGW cycle, and cal Fermi surface. In both of these materials, the lmf treats this term as an external perturbation. LDA describes the electronic structure better than Running lmf to self-consistency allows the system GLDAW LDA. to respond to the potential and screen it, yielding Ambiguities in the starting point make it difficult nGW . to know what errors are intrinsic to the theory and The result of this process is shown as a dashed line which are accidental. The literature is rife with in Fig. 18(b). It bears a close resemblance to the full manifestations of this problem; see e.g. Ref. [95]. QSGW result, including the negative gap. We will Self-consistency (mostly) removes the starting-point show elsewhere that TiSe2 is an insulator in the P 3¯c1 dependence so that the errors intrinsic to the GW phase only as a consequence of lattice displacements approximation can be best elucidated. Moreover, relative to the symmetric P 3¯m1 phase. QSGW should yield better RPA total energies on GW is used less often in metals. As in the in- average than those calculated from other starting sulating case, GW based on DFT can sometimes points, because the path of adiabatic connection work well, as it apparently does for SrVO3 [94]. But is optimally described through QSGW [84]. To the range of applicability is limited, and indeed better illustrate these points Fig. 20 shows the ion- GLDAW LDA can yield catastrophically bad results. isation potential of 3d transition metal atoms and Two cases in point are MnAs and FeTe. Fig. 19 the heat of formation of 3d-O dimers, computed by shows LDA, GLDAW LDA, and QSGW energy bands the molgw code [96]. We focus on these properties near EF ; they are mostly of Mn or Fe 3d character. because it is known, e.g. from Ref. [97], that the LDA LDA In the G W case, EF must be adjusted to RPA tends to systematically overbind, and the error conserve charge. (The shift is large, of order 1 eV.) is connected with short-ranged correlations. The Consider MnAs first. The majority and minority ionisation potential and the dimer formation energy, 3d bands lie at about −1.5 eV and +1 eV in the both of which benefit from partial cancellation of LDA. This exchange splitting is underestimated: such errors, are much better described. QSGW increases the splitting, putting bands at Note the dramatic difference between choice of about (−2.5,1.5) eV while GLDAW LDA does the op- a Hartree-Fock or the PBE functional. As for the posite, reducing the splitting relative to LDA. Also, superiority of QSGW, Fig. 20 generally bears this the Fermi surface topology is poor: the hole pocket argument out, though there are some surprises, e.g. 46 8 dition of ladders should considerably improve on the 7.6 RPA’s known inadequacy in describing short-ranged correlations. It has long been known that ladders 7.2 dramatically improve the dielectric function (ω) ((ω) and G enter in the coupling constant integra- 6.8 tion for total energy) and it would be interesting 6.4 IP (eV) to explore if such low-order diagrams in a QSGW E(+)−E(0) framework are sufficient to capture total energy to 6 this accuracy. Sc Ti V Cr Mn Fe While self-consistency does largely surmount −2 starting-point dependence (it was recently shown to Exp be the case for some insulators using Hartree-Fock −3 HF PBE and LDA as starting points [100]), but it is not −4 QSGW strictly true that it does. The QSGW H0 or G0 has a Hartree Fock structure, and it can get stuck in −5 metastable valleys in the same way as Hartree Fock; −6 ∆E (eV) an example are the dual low-spin and high-spin Tm+O→TmO states found for FeSe (Sec. 4.4). Also, in strongly −7 correlated systems Σ(ε) can vary rapidly with en- ergy. There can be more than one i that forms a stationary point. This happens, for example, in Figure 20: Ionisation potential, IP, of TM atom and heat of formation ∆E of transition-metal dimers computed within CuO. It is usually an indication that the ground the RPA, using the molgw code selecting various choices of state is not well described by a single Slater determi- starting point. Note the dramatic difference between Hartree- nant. QSGW restricts G0 to a single determinant, Fock and the PBE starting points. Of the common semi- and there is no longer an unambiguously optimal empirical functionals, HSE06 (not shown) is the closest to QSGW. Data were compared to reference CCSD(T) for the choice when there are two or more equally strongly ionisation potential and experiment for ∆E. competing ones. There is no unique prescription for quasiparti- clisation, but Eq. (175) minimises the difference between the full G and G0 according a particular the heat of formation from RPA@PBE deviates less choice of norm. Recently it was shown [101] that from CCSD(T) results than QSGW. QSGW does Eq. (175) minimises the absolute value of the gradi- not describe Cr well, probably because spin fluctua- ent of the Klein functional over all possible spaces of tions are important there (Sec. 4.4), which skews the non-interacting Green’s functions. There are formal statistics with this small sample. The discrepancies reasons (Z factor cancellation [84] and conserving are small enough that incomplete convergence in sum rule [91]) why quasiparticlisation should be bet- the basis set (results in Fig. 20 were obtained with ter; there is also abundant empirical evidence, that a triple-ζ basis) are of the same order and could quasiparticle self-consistency performs better than account for some of the results. The ionisation po- full self-consistency [90, 102, 89], and the particular tential of simple sp atoms was analysed in Ref. [98], construction Eq. (175) is an optimum one. where it was computed from RPA total energy dif- QSGW has been implemented in numerous elec- ferences, compared to the eigenvalue of the GW tronic structure packages, e.g. the spex code [103], one-particle Hamiltonian. There also it was shown the Abinit code [104], VASP [105], recently in the QSGW significantly improves on RPA@HF, and Exciting code [100], and also in molecules codes also that QSGW is slightly biased towards Hartree- such as molgw [96] and NWChem [99]. For solids per- Fock, reflecting a slight tendency to underestimate haps the most rigorous implementation is the spex screening. This is consistent with the tendency for code, which can use a local Sternheimer method QSGW to overestimate bandgaps. Koval et al.[99] to rapidly converge the calculation of the dielec- found somewhat different results for small, second- tric function [106]. As attempts to make efficient row molecules. schemes that are also well converged, e.g. to cal- QSGW -RPA is not accurate enough to reach culate the RPA total energy, the local Sternheimer quantum chemical accuracy, ∼1 kcal/mol. The ad- method will likely emerge as being very significant 47 over time. diagonal-only approximation for high-energy sub- blocks of Σ0. Ecut is typically ∼2 Ry; convergence 4.2. Questaal’s QSGW Implementation can be checked by varying Ecut. The prescription is explained in detail in Sec. IIG of Ref. [84]. The Questaal’s implementation of QSGW evolved error is connected to the relatively long range of the from the ecalj package, developed by Kotani and smooth Hankel envelopes. Preliminary tests using coworkers in the early 2000’s out of Aryasetiawan’s short-ranged, screened Hankels indicate that this GW -ASA code [13], which in turn was developed truncation is no longer needed. from the “Stuttgart” LMTO code. The origi- Typically Σ is a smoother function of k than nal ecalj package can now be found at https: 0 the kinetic energy. Thus the k mesh on which Σ //github.com/tkotani/ecalj/. As of this writ- 0 is made can usually be coarser than the k mesh. ing we maintain the GW code as a separate branch Because of its ability to interpolate, and (Fig. 21). In its present form the GW part of lmf lmgw operate with independent meshes. Questaal’s QSGW implementation operates inde- pendently, receiving information about eigenfunc- tions and eigenvalues, and returning response func- 4.3. Successes of QSGW tions or a self-energy. As for the QSGW cycle, one QSGW has the ability to calculate properties cycle occurs in three parts. for a wide variety of materials classes in a man- ner that no other single theory can equal. It con- sistently shows dramatic improvement relative to DFT and extensions such as hybrid functionals or LDA-based GW. Atomic ionisation potentials [98], quasiparticle (QP) levels, Dresselhaus coefficients in semiconductors [107, 108, 109]; band offsets at the Si/SiO2 interface [104], magnetic moments and spin wave excitations [110, 111]; tunnelling magnetore- sistance [112]; impact ionisation [113]; electric field gradients and deformation potentials [114], spectral functions [115], and dielectric response [116]. Its Figure 21: Questaal’s QSGW cycle. lmf generates a new superior ability to yield QP levels and DOS made self-consistent density and noninteracting H0 for a fixed Σ0; it possible to accurately determine valence maxi- LDA LDA lmfgwd generates information about the eigenfunctions Ψ mum in SrTiO3 [117]. In contrast to G W , corresponding to H0, and lmgw receives the output of lmfgwd QSGW is uniformly reliable with systematic er- and uses it to make a new Σ0. lmgw can also generate response functions and the fully dynamical self-energy Σnn(ω). lmfgws rors (see Sec. 4.4). Fundamental gaps are usually is a post-processing code that yields observables computed in very good agreement with experiments, though −1 −1 nn nn from the interacting G, where G = G0 +Σ (ω)−Σ0 . they are systematically overestimated (see Fig. 1 in Ref. [83]), even in the strongly correlated M1 Rather than storing Σ0 on disk, Σ0 −Vxc is stored phase of VO2 [118, 119, 120], where spin fluctua- (Vxc usually being the LDA potential). Thus lmf tions are not important. It properly narrows the generates its customary LDA Hamiltonian, and adds bandwidth of localised d bands and widens it in Σ0 − Vxc to it. In practice this is accomplished by wide-gap semiconductors [117] and graphene [121]. reading Σ0 − Vxc on the mesh of k points where it QSGW can predict properties inaccessible to was generated, and inverse Bloch-summing to make GLDAW LDA, such as magnetic ground state [115] Σ0 − Vxc in real space. Then Σ0 − Vxc can be inter- and charge density. Fig. 22 shows the density cal- polated to any k by a forward Bloch sum. This is culated by QSGW in the plane normal to [001] for a unique and powerful feature, as it makes it pos- antiferromagnetic NiO and CoO. Two prominent sible to generate QSGW eigenfunctions and eigen- features are seen: (1) NiO is much more spherical values at any k. There are some subtleties in this than CoO, and the density contours for CoO are step: the interpolation does not work well if all the elongated along the [110] line. Both of these findings nn0 0 hn|Σ0 |n i are included. It is solved by rotating to are consistent with γ ray measurements [122, 123]. n6=n0 the LDA eigenfunctions, and zeroing out Σ0 for It was noticed early on [83] that QSGW has a 0 n or n whose energy exceeds a threshold, Ecut; i.e. systematic tendency to overestimate bandgaps in 48 uniformity in E2. Usually E0 can be precisely mea- sured; similarly for E1 because there is a sizeable volume of k near L where valence and conduction bands disperse in a parallel manner. E2 is more difficult, and values are less certain. That being said, the data suggest E0−E2 is, on average, about 0.1 eV below experiment. We have found that this error is, at least in part, due to the necessity of truncating high-lying parts of the off-diagonal Σ0; see Ecut, Sec. 4.1. It has recently become possible Figure 22: Self-consistent charge density in antiferromagnetic to re-evaluate this because a short-range basis was NiO (left) and CoO (right), in the plane normal to [001], recently developed (Sec. 3.12) appears to interpolate computed by QSGW. The Ni (Co) nucleus lies at the centre of the Figure. Σ0 allowing Ecut→∞. Also reliably measured are the electron effective ∗ mass me (hole masses are much less well known), semiconductors and insulators. To what extent dis- and for some systems, the nonparabolicity param- persions are well described is much less discussed, in eter α, characterising deviations from parabolicity part because there is a limited amount of experimen- near Γ, and defined as k2/m∗ = ε(k)[1 − αε(k)] tal data reliable to the precision needed. Perhaps where ε is the band energy relative to the conduc- the best materials family to serve as a benchmark tion band minimum. α is an important quantity are the zincblende semiconductors, where critical- in several contexts, particularly hot-electron semi- point analysis of the dielectric function has been conductor electronics. Unfortunately α is difficult used to accurately measure in most zincblende semi- to measure, and values reported are usually some conductors, not only the Γ-Γ (E0) transition but scalar average of the second rank tensor. Where also the L-L (E1) and X-X (E2) transitions, as in- it has been measured QSGW (or scaled QSGW, dicated in Fig. 23, which presents the QSGW and Eq. (177)) seems to predict α to within the exper- LDA bands compared with ellipsometry and pho- imental resolution; see e.g. Table 4.3 for GaAs. toemission data. Fig. 25 compares QSGW and experimental effec- tive masses for various semiconductors. Conduction band masses are well described, although there is a 4 systematic tendency to underestimate it, connected to the tendency to overestimate EG. E0 Most of this error can be traced to the RPA ap- 0 E E 1 2 proximation to the dielectric function. The optical dielectric constant ∞ (Eq. (192)) in the ω→0 limit −4 is uniformly too small by a factor ∼0.8 for many eV kinds of semiconductors and insulators (Fig. 26). Plasmon peaks in Im(ω) are almost universally −8 blue shifted [124, 125] because the RPA omits the attraction between electron-hole pairs in their vir- tual excitations. A simple Kramers-Kronig analysis −12 shows that as a consequence, ∞=Re (ω→0) should be underestimated. Indeed we find that to be the L Γ X case, but remarkably ∞ is consistently underesti- mated, by a nearly universal factor of 0.8. This is what motivated a hybrid of LDA and QSGW Figure 23: QSGW energy band structure of GaAs compared functionals [107] with ellipsometry data and photoemission data (open and closed circles). Dashed Grey lines are LDA bands. scaled LDA Σ0 = 0.8 Σ0 + 0.2 Vxc (177)

Fig. 24 shows that tendency to overestimate E0 The reasoning is since W is dominated by the and E1 is systematic and uniform, but there is less q→0, ω→0 limit, scaling W by 0.8 justifies Eq. (177). 49 8 8 E 4 E 1 E zbCdS

0 AlAs 7 2 7 6 AlSb,ZnTe 3 ZnSe AlP,zbCdSe ZbGaN ZbGaN 5 ZnS ZnTe,GaP,AlAs (eV) GaAs,CdTe

ZnS 6 GaP,CdTe InP,Si,CdTe 2 AlP,ZnS GW 4 ZbGaN AlSb,GaAs ZnSe GaAs GaSb zbCdS QS InAs CdS InAs 3 InSb,GaSb,Ge 5 InSb,InAs 1 GaSb,Si InSb,AlSb GaP,ZnSe ZnTe

CdSe 2 InP InP 4 Ge,AlAs 0 1 0 1 2 3 4 1 2 3 4 5 6 7 8 4 5 6 7 8 Expt (eV) Expt (eV) Expt (eV)

Figure 24: Transitions E0, E1 and E2 transitions in a zincblende semiconductor compared with critical point measurements. scaled Red and blue circles are QSGW results, and QSGW with Σ0 (Eq. (177)), respectively.

2 GW(T) AlSb 16 GW(0) 1 Ge Si C LDA 0.5 Si AlAs,SiC wAlN 12 ∞ ε 0 SiC

0.2 Ge m

*/ 8 ZnO,wGaN,zbZnS ZnS,GaN,CdS m InN 0.1 ZnO InSb

ZnTe,wCdS,zbZnS GaSb Calculated SrO InAs

0.05 InSb InP,GaAs,CdTe,wCdSe 4 AlAs

GaSb GaP,InP

0.02 AlSb,GaAs AlN

InAs 0 NiO,C MgO ZnTe,CdTe SrTiO3 0 1 2 3 4 5 6 7 0 4 8 12 16 Bandgap (eV) Experimental ε∞

G0 + W + + … G Figure 25: Conduction band effective masses of selected 0 zincblende compounds compared against experiment. Red scaled and blue circles are QSGW results, and QSGW with Σ0 ; black circles are experimental values. Figure 26: Dielectric constant ∞ compared to experiment. Blue are QSGW results; red are LDA results. Also shown are the ladder diagrams that modify bubbles in the polarisability, In practice, it seems that the tendency to overesti- left out in the RPA. mated bandgaps is almost completely ameliorated in these systems (compare red and blue circles, Fig. 24, ∗ as is the tendency to underestimate mc . Exceptions spin fluctuations such as CoO or La2CuO4, and are the small-gap InAs in InSb, where relative errors deep states such as the Ga 3d shift farther from ex- in bandgaps are still not small even with the 0.8 periment (see Table below). Recent work [102, 126] scaling. shows that the great majority of the error in charge The hybrid scheme does a stellar job in many con- susceptibility can be accounted for by ladder dia- texts: it improves on bandgaps in semiconductors grams. (In the case of polar semiconductors there (Fig. 24) and predicting the Dresselhaus splitting in can also be a significant renormalisation of the gap sp semiconductors [107]. But the scheme is empiri- from the Fr¨olich interaction [127, 128]; the electron- cal, and it is limited. It does not properly correct phonon interaction has a modest effect in other semi- the blue shifts in (ω) or take into account other conductors, especially compounds with second row excitonic effects and fails to bring bandgaps in close elements [129], diamond being the largest.) With- agreement with experiment in systems with strong out ladders the low-frequency dielectric constant is 50 ∗ — almost universally — about 80% of experiment EG m /m α Ga 3d ([130]). As we will show elsewhere [126] the addi- QSGW 1.78 0.076 −0.71 18.5, 18.0 tion of ladders in W seem to dramatically improve Eq. (177) 1.47 0.067 −0.82 17.8, 17.3 on the charge channel even in strongly correlated LDA 0.24 0.020 −4.0 15.2, 14.8 cases such as CoO. VO2 is an excellent test bed for Expt 1.52 0.067 −0.83 19.3, 18.6 optics: it has strong correlations in the monoclinic phase; yet, the conductivity is well described by non- Table 6: GaAs: fundamental gap, effective masses, non- parabolicity α and location of 5/2 and 3/2 Ga 3d core states magnetic QSGW, provided ladders are taken into relative to the valence band maximum. Units are eV. The account [119]. measurement of α was taken from Ref. [132], which is within Below we present more detailed properties of the 1% of a subsequent measurement [133]. Photoemission data band structure obtained from classical QSGW on a taken from Ref. [134]. single system, selecting GaAs because reliable mea- surements are available and it is representative of See for example, the excellent description of optical results obtained for the entire family of zincblende conductivity in VO2 [119]. semiconductors (Table 4.3). We observe the follow- Metals and local-moment magnetic systems are ing: similarly well described; see for example the excel- 1. dispersions Γ-L and Γ-X are mostly well de- lent description of known properties of Fe in the scribed, and vastly better than the LDA (which Fermi liquid regime in Ref. [115], and the electronic predicts the Γ-X dispersion to be 1 eV); structure of NiO and MnO in Ref. [16]. 2. the valence bands match photoemission data In summary, absent strong spin fluctuations, and to within experimental resolution; a few other mild exceptions, QSGW with some low- 3. the Ga 3d (see Table below) is pushed down order extensions (ladders and a low-order vertex for relative to the LDA and also relative to narrow semicore states), describe a wide range of ma- GLDAW LDA. Binding of shallow core-like states terials throughout the periodic table with uniform on e.g. Cd, Zn, and Cu is underestimates by accuracy and reliability that cannot be matched by ∼0.4 eV; for deeper states such as Ga 3d it is any other method. larger. It was shown in Ref. [131] that a low- order vertex correction to GW accounts for 4.4. Limitations of QSGW most of this discrepancy, at least for shallow core-like levels; 6 6

4. conduction bands are slightly overestimated 4 4 Cu s and the mass slightly too large; and La d 2 2 5. the nonparabolicity parameter α was calculated along the [001], [110], and [111] directions. It 0 0

varies ∼30% for different directions, but since −2 −2 only scalar quantities are reported, we take a Cu d −4 −4 geometrical average. α is very sensitive to m∗, −6 −6 Z A0 E0 T Y Γ S R Z so we can expect it to be underestimated a little; Z A0 E0 T Y Γ S R Z this turns out to be the case. Using Eq. (177) all three quantities align well with experiment. Figure 27: QSGW band structure (in eV) of nonmag- Other discrepancies become apparent in the rare netic La2CuO4 (left) and antiferromagnetic LaCu2O4 (right). earths: splitting between occupied and unoccupied Colours reflect the orbital character of the bands as follows: 4f levels is too large [135], and multiplet splittings, Cu dx2−y2 (red); Cu d3z2−1 (green), near −2 eV in QSGW ; O-pxy (blue), between −3 eV and −5 eV in QSGW ; O-pz which are significant for 4f, lie beyond the scope (orange), between −3 eV and −5 eV in QSGW. The band QSGW. near 3.3 eV at Y has significant Cu s character, and it also As we will show elsewhere [126] most of the sys- admixes into the Cu dx2−y2 near EF . tematic errors noted above are largely corrected when ladders are included in W . Kutepov [102] The greatest failings appear in QSGW for sys- showed that the bandgaps significantly improve, but tems where spin fluctuations are large. This is not the improvement extends to a wide range of prop- surprising since the main many-body effect in GW erties and very diverse kinds of materials systems. are plasmons. GW has no diagrams in spin beyond 51 the Fock exchange. When spin fluctuations become calculations for LDA and QSGW are shown in Ta- important, the first effect is to reduce the average ble 7. QSGW improves on the LDA, but discrep- moment [136, 137]. As they increase, states begin to ancies with ARPES are much larger than for, e.g. lose their coherence: the band picture and QSGW elemental Fe [115]. That LDA and QSGW should both begin to break down. A good example of this is be different is readily seen from the k dependence La2CuO4 (Fig. 27). Undoped La2CuO4 undergoes of the Z factor, shown in Fig. 28. Local potentials a paramagnetic to an antiferromagnetic insulator such as the LDA cannot incorporate such an effect. transition upon cooling below 325 K [138]. Fig. 27 shows QSGW calculations in the nonmagnetic state, and the antiferromagnetic one. Approximating the paramagnetic state by nonmagnetic state is an in- adequate approximation, though it is often done. In this approximation La2CuO4 is predicted to be metallic, with a single Cu dx2−y2 band crossing EF . In the antiferromagnetic state La2CuO4 is found to be an insulator with a 3.5 eV bandgap. Experimen- tally La2CuO4 is an insulator with a gap ∼2 eV. QSGW severely overestimates the optical gap in La2CuO4. There is a similar mismatch for CoO: the gap is overestimated by ∼2 eV in these cases, but not in NiO. Particularly telling is how for VO2 nonmag- netic QSGW does describe the M1 phase very well when both V-V pairs dimerise [119], but not the M2 Figure 28: The orbitally resolved quasiparticle renormal- ization (Z) factors for FeSe Fe-3d orbitals along the high- phase where only one of them does. (A gap opens symmetry Z-A line in tetragonal Brillouin zone; dxy (black), up when M2 is calculated antiferromagnetically, but dyz (blue), dz2 (green), dxz (magenta), dx2−y2 (red). M2 is probably paramagnetic at room temperature.) M1 and M2 differ mainly through the V-V dimerisa- Spin fluctuations are important, but QSGW does tion, which suppresses spin fluctuations. As we will not adequately capture them. They can be incorpo- show elsewhere [126], including ladders considerably rated in a mean-field manner by constructing a Spe- improves all of these antiferromagnetic insulators, cial QuasiRandom Structure [140] with, in this case, but discrepancies remain particularly for La2CuO4. three spin-up and three spin-down Fe atoms. Two We believe this to be an artefact of the interaction magnetic solutions can be stabilised: a low-moment between spin and charge fluctuations. and high-moment solution. Quasiparticle spectra in the latter case are far removed from experiment, Γ M Z so we consider only the low-moment solution. The ARPES 9 −18 −22 −42 7 34 moments on each Fe site are different, with a range LDA 109 113 −204 −337 254 141 |m|=0.2±0.15µB. The addition of magnetic terms +DMFT 30 45 −110 −125 42 65 shift QP levels closer to ARPES measurements, but QSGW 41 44 −107 −202 131 56 a significant discrepancy with experiment remains. +DMFT 1 10 −21 −40 10 32 In Sec. 5 these results are revisited where local, SQS6 60 45 −52 −70 31 68 high-order diagrams in spin are included. These two systems show concretely how the short- comings of GW become apparent when spin fluc- Table 7: dxz/dyz and dxy QP levels (meV) near EF in tetragonal phase of FeSe (reference is the Fermi energy). tuations are important. The simplest diagram be- LDA and QSGW data are calculated nonmagnetically; the yond GW, the T matrix, was first considered in line marked +DMFT following each is the result with DMFT an ab initio framework by Aryasetiawan and Karls- added [139], as discussed in Sec. 5. Line SQS is a QSGW son [141]; more recently it was implemented in the calculation in a ferrimagnetic SQS structure, as described in the text. spex code [142]. If spin fluctuations are not strong, as seems to be the case for Fe and Ni, such an ap- FeSe is a heavily studied superconductor with proach seems to describe spin excitations fairly well, large spin fluctuations: it provides an excellent though comparisons with experiments are too sparse testbed to compare LDA and QSGW. Nonmagnetic to draw any strong conclusions yet. For strong spin 52 fluctuations, e.g. in La2CuO4 or other unconven- Vollhardt [151] in late 80’s. Several other works from tional superconductors such as FeSe, it seems proba- Mueller-Hartmann [152], Brandt and Mielsh [153], ble that many kinds of diagrams are needed, though Jarrell et al. [154] and Georges et al. [155] helped the effective vertex entering into those diagrams is to build a strong foundation for DMFT. The for- expected to be mostly local. Dynamical Mean Field mulation and the implementation became popular Theory is ideally situated to address such cases; it through the seminal work by Georges et al. [156] is discussed in Sec. 5. and later emerged as a way to study strong corre- Our view is that QSGW with ladders in W are lations in real systems, supplementing an ab initio sufficient for reliable description of susceptibilities LDA framework by embedding a model Anderson within the charge channel, especially when ladders impurity hamiltonian inside an LDA bath [157]. are included in the self-consistency cycle [102, 126] An implementation of DMFT has three compo- to make the QSGW H0. In the spin channel DMFT nents: the partitioning of the Hamiltonian into impu- contributes the dominant missing diagrams to the rity + bath, a means to solve the impurity problem, self-energy, and the susceptibilities can be reliably and the bath-impurity self-consistency. The bath is determined from local vertices, connected to k- essential because correlated states of interest have dependent bubble diagrams. both atomic-like and Bloch-like properties with in- teresting low- and high- energy excitations. Through 5. DFT+DMFT and QSGW +DMFT self-consistency the impurity feeds back on the bath, and ensures that all the symmetries of the entire Hartree Fock and DFT are band structure theo- system are preserved. In Questaal, QSGW or DFT ries: electrons are treated as though they are inde- can act as the bath. We have interfaces to two local pendent; in solids electron states are Bloch waves. solvers: the hybridisation expansion flavor of contin- GW includes correlations that go beyond the Bloch uous time Quantum Monte Carlo (CT-QMC) from picture; still it is a perturbation around the Bloch de- TRIQS [28] and the CT-QMC written by Kristjan scription. In Landau Fermi-liquid theory, electrons Haule [27]. are replaced by ‘quasi-particles’ which are adiabat- For the partitioning, some correlated subspace ically connected to the single-particle Bloch-wave must be identified and this is done using projectors. representation of electrons. This, by construction, Usually the subspace is derived from atomic-like d- is a theory for excited states that adds perturbative or f-states of a transition metal or f shell element. corrections to the band picture. In Landau adiabatic For projectors maximally-localised Wannier orbitals theory electrons can have an effective mass, which is have been widely used [158, 159]. We have so far different from its free mass, and additionally, lifetime adopted the projection into partial waves of a par- broadening effects far from the Fermi energy. One ticular ` in an augmentation sphere [160]. There are extreme case of this scenario is when the low-energy strong reasons to prefer this scheme: the basis set quasi-particle vanishes (without magnetic ordering) is atom-centred and very localised with a definite and the atomic-like, high-energy excitation emerges `, which is the Hilbert space where the correlations and effective mass at the Fermi energy tends to are strong. The less localised the basis, the more ∞. In effect, this is an emergent, non-adiabatic, it contains weakly correlated components (e.g. O-p feature where single-particle spectral weight flows states in transition metal oxides). Too much popu- to higher energies to conserve the total spectral lation of the correlated subspace with uncorrelated weight and subsequently leads to suppression of parts conflicts with the physical interpretation of low-energy quasi-particles. Popular rigid-band tech- separation of bath and interacting subspace and can niques [143, 144, 145, 146, 147, 148, 149, 150] would reduce the reliability of the method [161]. Moreover, catch the physics of suppression of quasi-particles very localised basis functions have a much weaker at Fermi energy, however, would fail to conserve the energy-dependence. As a consequence, with a very spectral weight since a dynamic self-energy (Σ(ω)) local projector it is possible to construct an effective is absent from those theories. interaction over a wide energy window, reducing DMFT was formulated as an effective theory that the frequency-dependence of the effective interac- smoothly interpolates between the Bloch-wave limit tion, and essentially recovering the partial waves and atomic limit. It is formulated as a local impurity in augmentation spheres [17]. Partial waves closely embedded self-consistently in a medium or bath, and resemble atomic orbitals, so solving a purely locally was introduced in a seminal work by Metzner and correlated Hamiltonian with the approximation of 53 single-site DMFT (local self energy) is a good ap- proximation. The thorniest issue is how to construct the ef- fective local Hamiltonian. It requires three related quantities: definition of the local Green’s function, an effective coulomb interaction U, which is partially screened by the bath, and the double counting (DC) Figure 30: QSGW + DMFT spectral functions for La2CuO4 correction. (left) and metallic QSGW + DMFT spectral functions for doped (x=0.12) La2−xSrxCuO4 (right) which shows the clas- sic three-peaked structure with upper and lower Hubbard bands at around ±2 eV, and an almost dispersionless band at EF .

A fully ab initio implementation is a work in progress, that will be reported on elsewhere. Within Questaal U, J can be calculated within constrained RPA, following Ersoy et al. [163]. It has been tested on several materials; for instance, the Hubbard U and Hund’s J for Ni are shown in Fig. 29 which agrees well with Ref. [163]. The Hubbard U for FeSe using QSGW gives U(0) = 3.4 eV, also close to what Figure 29: U(ω) and J(ω) computed using our c-RPA scheme for Ni 3d orbitals starting from a QSGW bath. has been found in the literature [162]. In practical calculations today, we use a static U and adopt the fully localised limit (FLL) for double counting In a fully ab initio QSGW + DMFT theory, these (Sec.3.8). quantities must be computed from the theory itself, We have used QSGW +DMFT to study single- but how to accomplish this is far from a settled issue. and two- particle properties of the hole doped Questaal can estimate the Hubbard U and J using cuprate, La2−xSrxCuO4. The parent compound the constrained random-phase approximation [162] at x=0 is a Mott insulator; while QSGW calculated where the internal transitions between states in- non-magnetically leads to a metallic band structure, cluded in the DMFT subspace are removed. A fully and an insulator with a wide gap when done anti- internally consistent theory requires a frequency- ferromagnetically (Fig. 27). The spins on the Cu dependent U(ω), but available solvers can only solve in La2CuO4 fluctuate strongly and corresponding the local impurity problem with such a U in the spin fluctuation diagrams are missing from QSGW. density-density channel. Haule’s prescription for DMFT incorporates the local spin fluctuation di- excluding states in a large energy window partially agrams with an exact solver (typically CTQMC), mitigates this issue, since U is closer to bare and for the prescribed local Hamiltonian, and it can in nearly energy-independent [17]. (It makes another principle, and indeed does, open a Mott gap [86, 17] approximation, namely that not all the states ex- (charge blocking without magnetic ordering). The cluded from screening are included in the impurity five Cu-3d partial waves in the augmentation spheres problem; these states are effectively treated only at comprise the correlated subspace, and the 3dx2−y2 the bath level.) splits off to form the gap (Fig. 30), which is the paramamagnetic analogue to the antiferromagnetic QSGW band structure (Fig. 27). We solve the cor- 5.1. Questaal’s Implementation of DMFT related impurity Hamiltonian in the Cu-3d orbitals Questaal does not have its own DMFT solver. using the single-site CT-QMC solver within param- It has an interface to Haule’s CTQMC solver [27] agnetic DMFT. Embedding local Green’s functions and to the TRIQS library [28]. Haule’s CTQMC into the QSGW bath, we can compute the non-local solver uses singular value decomposition (SVD) and Green’s functions dressed by the local DMFT self TRIQS uses Legendre polynomial basis as compact energy. They can be used to make spectral func- representations for single-particle propagators. tions and the bubble diagrams entering into response 54 functions (Sec. 6.3). The spin susceptibility relates the induced moment On doping, La2−xSrxCuO4 undergoes a metal- δm to a perturbation δB: insulator transition when x is sufficiently large. At Z high temperatures, the x=0.12 system is known to δm(r, t) = − dt0d3r0χ(r, r0, t − t0)B(r0, t0) be metallic. We recover a metallic solution in our QSGW +DMFT for x=0.12, using a virtual crystal (178) approximation for x. We find the classical three χ is a three-component tensor, χαβ with α and peak spectral feature with atomic-like upper and β Cartesian coordinates. The charge susceptibil- lower Hubbard bands and a low energy incoherent ity has the same form, only the perturbing poten- quasi-particle peak at E (Fig. 30). This structure F tial is the scalar electric potential, which induces is typical to strongly correlated metals in proximity a (scalar) charge density. In general, there can be of a localisation-delocalisation transition [156, 164]. cross-coupling between spin and charge, so that the Comparing with the insulating case at x=0, the full χ is a 4×4 matrix connecting spin and magneti- spectral features for x=.12 has a natural explanation sation. in the fact that spectral weight flows to low energies We restrict ourselves to linear perturbations under doping. around an equilibrium point, where the unperturbed Single-site DMFT incorporates the local spin fluc- system is time invariant. χ is a function only of the tuation diagrams through a local self energy. It can difference t − t0, so its Fourier transform depends not explain Fermi-arc features of weakly hole-doped on a single frequency ω. If in addition the reference cuprates and momentum-dependent suppression of system is in a periodic lattice, its Fourier transform electron pockets in de-twinned FeSe. In weakly space depends only on the difference in translation hole-doped cuprates Fermi arc emerges, most likely, vectors, whose Fourier transform has a single q due to spin fluctuations which are longer ranged in within the Brillouin zone with r and r0 limited to a nature and the corresponding diagrams are absent unit cell. Denoting χˆ(q, r, r0, ω) as χ Fourier trans- from single-site DMFT. Similarly, in FeSe, there formed in both space and time, χˆ is typically what are nematic fluctuations (which are inherently non- is measured by spectroscopy. local), and the corresponding diagrams are absent from our theory. To incorporate such diagrams one 6.1. Spin Susceptibility needs to go beyond the single-site DMFT; cluster DMFT [165] or diagrammatic techniques like Dual A perturbation causes the system to generate Fermions [166]. an internal field Bxc, which adds to the total field, Btot = Bext + Bxc α α α X δBxc β 6. Susceptibilities δBtot = δBext + δm (179) β δmβ Susceptibilities play a central role in many kinds of In collinear spin structures (all spins parallel to material properties. In the static limit they reduce z), transverse and longitudinal components are to the second derivative of the total energy with decoupled. Moreover, by rotating from (x, y) to respect to the corresponding perturbation. This x+− = x ± iy, the three-component χ becomes di- section presents some general properties of linear agonal with transverse elements χ+−, χ−+, and susceptibilities, focusing on the transverse spin sus- longitudinal element χzz. Eq. (178) also applies to ceptibility χ+−, which is the magnetic analogue of the charge channel with the substitution B→V and the better known charge susceptibility. The formal- m→n, so that the full χ becomes a 4 × 4 matrix, ism of the latter is very similar, but it is simpler as noted above. In general χzz is coupled to the as it concerns the response to a scalar potential. charge channel, while χ+− and χ−+ are not. Also Also in the spin case, there is no analogue to the if the equilibrium system is nonmagnetic, χ+−=χzz. lowest order diagram (time-dependent Hartree ap- In this work we are concerned only with χ+− and proximation) in the charge case. To characterise the the charge susceptibility χQ, or alternatively the magnetic susceptibility beyond the noninteracting dielectric function. case requires a vertex such as the T matrix [141]. The “magnetic” kernel Ixc = ∂Bxc/∂m is in gen- Here we outline some general features; other sec- eral a function of (r, r0, ω), though it must satisfy a tions explain how they are implemented in Questaal. sum rule [110]. In many-body perturbation theory, 55 the analogue of I is the most challenging quantity function [116]. DMFT can also be used to build to obtain, but in the adiabatic time-dependent LDA a two-particle Green’s function (or susceptibility), it is static and local, and is written in the transverse including all local diagrams, from which a local case as four-point vertex can be extracted that replaces I. The first QSGW implementation of magnetic re- B (r) +− xc 0 sponse functions [110] was designed for local moment Ixc (r) = 0 δ(r − r ) (180) m(r ) systems, by which is meant systems with rigid spins (RSA): when perturbed by a transverse δB⊥ m Eq. (178) relates m to either Bext or Btot ext rotates rigidly without changing shape. (Archetypal

δm = χ0 δBtot = χ δBext (181) examples are Fe, NiO, and MnAs [110].) In such a case the representation of χ+−(r, r0, ω) simplifies to when χ0 and χ are the bare (noninteracting) and full susceptibilities, respectively. The preceding equa- m(r) χ+−(ω) m(r0) (185) tions imply the relation and I can be completely determined by the sum 0 −1 0 −1 0 [χ(r, r , ω)] = [χ0(r, r , ω)] − Ixc(r, r , ω) rule [110], thus avoiding a diagrammatic calcula- (182) tion for it. This is Questaal’s present perturbative approach to computing χ. The noninteracting susceptibility can be computed In the RSA χ+−(r, r0, ω) is discretised to a lattice by linearising Dyson’s equation for the perturbation +− model and can be written as χR,R0 (ω). This makes δBext it convenient to construct a Heisenberg model +− X −χ (r1, r2, ω) = G↑(r1, r2, ω) ⊗ G↓(r1, r2, −ω) 0 0 H = JRR0 δmRδmR (186) (183) RR0

⊗ indicates frequency convolution: f(ω) ⊗ g(ω) = and extract JRR0 from χ as i/2π R ∞ dωf(ω −ω0)g(ω0). We use ↓ for spin 1 and −∞ 2 ↑ for spin 2. Similar equations with permutations δ E +− −1 JRR0 = = [χR,R0 (ω=0)] (187) of ↑ and ↓ may be written for the other components δmRδmR0 +− of χ0. In the Lehmann representation χ involves 0 The δm are understood to be transverse to mR. the product of four eigenfunctions Thus QSGW can be used to determine coefficients

0 +− JRR entering into the Heisenberg model. This basic χ0 (q, r1, r2, ω) = idea, with some enhancements, is what is described occ unocc ∗ ∗ X X Ψkn↓(r1)Ψk0n0↑(r1)Ψk0n0↑(r2)Ψkn↓(r2) in Ref. [110]. ω − (k0n0↑ − kn↓) + iδ kn↓ k0n0↑ 6.1.1. Magnetic Exchange Interactions in the ASA unocc occ ∗ ∗ X X Ψkn↓(r1)Ψk0n0↑(r1)Ψk0n0↑(r2)Ψkn↓(r2) Sec. 6 presents general formulations of the spin + 0 0 and charge susceptibility. In the ASA, the static 0 0 −ω − (kn↓ − k n ↑) + iδ kn↓ k n ↑ transverse susceptibility is implemented in the lmgf (184) code in a formulation essentially similar to the rigid- spin approximation described there, with an addi- where k0 = q + k. We note in passing, that in a tional “long-wave” approximation [168] proper MBPT formulation of susceptibility, e.g. the T-matrix [141, 142], is a four-point quantity: χ is χ(q) ≈ I χ−1(q) I (188) solved by a Bethe-Salpeter equation involving W and a four-point analogue of Eq. (184), which at the Using the sum rule, I need not be calculated but end is contracted to two coordinates. inferred from χ0. The Questaal code at present does not yet have It is possible to compute χ0 from the full Green’s a perturbative (T matrix) approach for the spin function, Eq. (57). But lmgf implements the classic susceptibility, although one was recently reported in Lichtenstein formula [169], in which exchange pa- a QSGW framework [167]. It does have the ability rameters JRR0 are derived in terms of the auxiliary g to include the charge analogue of ladder diagrams and the “magnetic force” theorem. This latter says to solve a Bethe-Salpeter equation for the dielectric that the change in total energy upon spin rotation 56 1 simplifies to the change in eigenvalue sum; it is a 2 0.5 MnGa3As4 (R/a)

special instance of the Hellmanm-Feynman theorem. . 0

0.08 J It was realised sometime later [168] that the Licht- −0.5 0.04 0.4 0.8 1.2 1.6 2 2.4 2.8 enstein formula is correct only in the q→0 limit (the 1 long-wave approximation), beginning to deviate at 0 2 0.5 Mn2Ga2As4 (R/a)

. 0

around k=0.25·2π/a in elemental 3d magnets. In J the notation of this paper, Lichtenstein’s formula −0.04 −0.5 0.4 0.8 1.2 1.6 2 2.4 2.8 reads −0.08 1 [110] [001] [110] [111] 2 ZB MnAs ε 0.5 ⊥ 1 Z F FP-LDA (R/a)

` −0.12 . 0 J RR0 = dε ImTrL... QSGW J 2π ASA-Li −0.5 −0.16   Γ 0.4 0.8 1.2 1.6 2 2.4 2.8 ↑ ↓ ↓ ↑ L X R/a δPR` gRLR0L0 gR0L0RL + gRLR0L0 gR0L0RL δPR0`0 (189) Figure 31: Left: spin waves in Zb-MnAs (eV) calculated in the GW code with QSGW potential (green) and LDA poten- Questaal implements it for crystals, and for alloys tial (blue), and also in the ASA-LDA with the Lichtenstein within the CPA. See Refs. [49, 170, 171] for a de- formula, Eq. (189). Right: Red diamonds show Heisenberg pa- rameters J Eq. (186) calculated in the ASA with Lichtenstein tailed description. formula for a MnGa3As4 SQS structure (top), MnGa2As2 SQS structure (middle), and Zb-MnAs (bottom), as a func- 6.1.2. Comparing QSGW and LDA Spin Response tion of neighbour distance. In the SQS structures, there Functions can be inequivalent neighbours with the same connecting vector. LDA results calculated from the GW machinery look The GW code has an implementation of spin re- similar. Green squares: same parameters in ZB-MnAs using sponse functions within the rigid spin approximation as QSGW potential. J is in mRy. as described Sec.6. This avoids direct calculation of the vertex, and is very simple, but is restricted to local-moment systems. Ref. [110] applied this approach to NiO, MnO and MnAs, with excellent x=1, LDA predicts to be strongly antiferromagnetic, results. with negative spin wave frequencies everywhere in Questaal has the ability to compute spin waves the Brillouin zone (left panel, Fig. 31). The fig- (SWs) directly, or to extract parameters J entering ures show SWs calculated in the ASA using the into the Heisenberg model, Eq. (186). One appli- Lichtenstein formula, Eq. (189). Its long-wave ap- proximation should be accurate for small q, but it cation — MnxGa1−xAs alloys — are particularly instructive because they continue to attract interest underestimates the strength of J for large q [168]. as a spintronics material. Here we will compare the This is apparent in Fig. 31, comparing the LDA and (ASA) Lichtenstein formula against full-potential, Lichtenstein formulas. rigid-spin result using both LDA and QSGW Hamil- This finding seems to contradict a measurement tonians. Computing properties for random alloys on a quantum dot of Zb-MnAs (Ref. [173]), which of any x is feasible within the ASA, via either the predicts it to be ferromagnetic. Indeed, QSGW Lichtenstein formula or relaxing large structures us- calculations of the same system show that spin ing spin statics, Sec. 2.19. But a similar study is wave frequencies are everywhere positive (left panel, not feasible today in QSGW, so instead we consider Fig. 31). Tc based on Heisenberg parameters ex- four-atom Special QuasiRandom Structures [140] tracted magnon peaks in the SW spectrum is pos- to simulate Mn0.25Ga0.75As and Mn0.75Ga0.25As, itive, of order 600K (somewhat larger than what and also the x=1 case. Mn’s large local moment of Okabayashi observed). around 4µB makes the approximations in Questaal’s Exchange parameters J were extracted for three QSGW magnetic response good ones. compositions, in LDA and for the pure MnAs case, in DFT predicts MnxGa1−xAs to become a spin QSGW (right panel, Fig. 31). At 25% composition glass when x & 0.35, though Tc depends on what J is still positive, but the nearest-neighbour J is fraction of Mn go into interstitial sites, and how the already very small. At 50%, J is negative on average, Mn are ordered on the Ga sublattice. One applica- and a spin glass is predicted. In pure Zb-MnAs, tion of the Questaal code was to show how Mn in J oscillate in sign with neighbour distance, but alloys that favour ordering on the (201) orientation the nearest neighbours (particularly along [001]) can optimise Tc [172]. In the pure Zb-MnAs case, are strongly antiferromagnetic, leading to negative 57 frequencies in the SW spectrum. QSGW shows a Alternatively, however, one can take the limit marked contrast. Only the NN is important, with q → 0 analytically and arrive at the Adler-Wiser J > 0. QSGW and LDA are so different because the formula in the RPA, LDA underestimates the exchange splitting between 2 2 spin-up and spin-down 3d states, by ∼1 eV. This 8π e X X X [ (ω)] = f (1 − f 0 ) 2 αβ Ωω2 nk n k pushes the minority d too close to EF , and gives n n0 k∈BZ rise to long-range, antiferromagnetic interactions. × hψnk|vα|ψn0kihψn0k|vβ|ψnkiδ(ω − εn0k + εnk) (194) 6.2. Optical Response Functions in Questaal The linear optical response is described by the Here Ω is the unit cell volume and fnk are the occu- imaginary part of the dielectric response function pation numbers, which in principle are given by the Fermi function at finite temperature but are in prac- 2(ω) or equivalently the real part of the frequency tice taken to be 0 or 1 for empty and occupied states dependent conductivity σ1(ω) = −iω2(ω)/4π at frequencies in the ultraviolet-visible (UV-VIS) range. respectively. The vα are the components of the ve- locity operator v = r˙ = (i/ )[H, r]. For systems From it one can obtain the real part 1(ω) by the ~ Kramers-Kronig relation and hence other relevant with at least orthorhombic symmetry the 2(ω) ten- functions such as the complex index of refraction sor is diagonal in the Cartesian components. This p well-known equation in the independent particle n˜(ω) = 1(ω) + i2(ω) = n(ω) + iκ(ω) with κ(ω) the extinction coefficient and n(ω) the real part of and long-wavelength form gives the 2(ω) in terms the index of refraction. The reflectivity R(ω) is then of matrix-elements of the velocity operator and ver- obtained via the Fresnel equations. For instance, tical interband transitions and can also be obtained the normal incidence reflectivity from the Kubo formula for the optical conductiv- ity. Within the all-electron methods (no non-local 2 pseudopotentials) and if GW -self-energies are not n˜(ω) − 1 R(ω) = (190) included, the velocity matrix elements are equivalent n˜(ω) + 1 to the momentum matrix elements v = p/m and is a directly measurable quantity. The absorption can be obtained straightforwardly from the ∇ oper- coefficient ator matrix elements within the muffin-tin spheres and using a Fourier transform for the smooth part 2ωκ(ω) ω (ω) α(ω) = = 2 (191) of the wave functions in the full-potential imple- c n(ω)c mentation. The matrix elements of the ∇ operator between spherical harmonics times radial functions is measurable in transmission and spectroscopic el- inside the muffin-tin spheres is obtained using the lipsometry allows one to measure directly  (ω) and 1 well-known gradient formula [174]; more generally,  (ω). 2 see Sec. 3.2 and Appendix D. There are several levels of theory at which one When the non-local self-energy operator or its Her- can obtain the optical dielectric function supported mitian version are included, however, (im/~)[H, r] in the Questaal package. Typically we need the is no longer equal to the momentum operator. To macroscopic dielectric function: correct for this, the approximation proposed by −1 −1 Levine and Allan [175, 176] can be used, which  (ω) = [ 0 (q → 0, ω)] (192) M G=G =0 consists in rescaling the matrix elements by a factor LDA LDA This is essentially obtained as a byproduct of the (εn0k − εnk)/(εn0k − εnk ). This is exact when the GW method (see Sec. 4). One way to obtain this LDA and GW eigenfunctions are the same, and it limit is to take a small but finite q and use lmgw to works well in weakly correlated systems. However, obtain the plane wave matrix elements of the inverse it breaks down when correlations become strong, dielectric function. The direction of the finite q then as in NiO. Questaal also has an approximate form for the proper nonlocal contribution to the velocity defines the specific component of [2(ω)]αα which is actually a tensor. If one simply calculates instead operator. As of this writing, it takes into account only intercell contributions. This approximation

G=G0=0(q → 0, ω) (193) apparently does not work well in strongly correlated systems, and results with full matrix element will one obtains the value without local-field corrections. be reported in due course [126]. 58 The long-wave length approximation without lo- Salpeter equation [124, 125] (BSE) in simple sp cal field (or excitonic) effects of ε2(ω) can thus be semiconductors. Traditionally the method is used obtained both in the ASA lm and the full-potential by substituting GW eigenvalues for the DFT ones lmf codes and, when reading in the QSGW self- obtained from one-shot GW (Sec. 4). For simple sp energy, can use the corrected quasiparticle energies semiconductors where LDA and GW eigenfunctions rather than the Kohn-Sham eigenvalues. While ap- are similar, this works well. It does not, however, proximate, it has the advantage that the q → 0 when correlations become strong, e.g. in Cu2O, limit is taken analytically and that one can decom- where it was shown that (ω) calculated from the pose the optical response in contributions from each BSE starting from a QSGW reference described occupied-empty band pair. By furthermore plot- the measured spectrum quite well [178]. Recently ting vertical interband transition energies for any Cunningham et al. studied optics from BSE using given pair, one can analyse where the bands are QSGW as reference for a number of systems, and parallel and give their largest contribution to the found very good agreement with experiment even joint density of states. This provides a way to relate in correlated systems such as NiO [116] and the the critical points or van-Hove singularities in the monoclinic phase of VO2 [119]. However, when spin optical response to particular interband transitions. fluctuations are large and QSGW does not describe lmf and lm use this to enable decomposition into the underlying band structure well, as in La2CuO4 band-pair contributions if desired; it can also resolve (see Sec. 4.4), the description begins to break down. contributions to  by k. Recently Cunningham has added ladders to W in The code also allows one to calculate dipole op- the QSGW self-consistency cycle. This seems to tical matrix elements between core states and va- dramatically improve on the description of (ω), lence or conduction band states and this can be even in an extreme case such as La2CuO4. These used to model X-ray absorption (XAS) and emis- capabilities are embedded in the standard Questaal sion (XES) spectroscopies. Furthermore, one can distribution. calculate Resonant X-ray Emission Spectroscopy In the ASA code lm second harmonic generation (RXES) also known as Resonant Inelastic X-ray coefficients can also be calculated but this portion Scattering (RIXS) using the Kramers-Heisenberg of the code has not recently been updated to be- equations [177]. The interesting point about this come applicable in the full-potential lmf implemen- spectroscopy is that it allows to probe transitions tation. The calculation of NLO coefficients is a bit between valence and conduction band states of the more complex even within the long-wavelength in- same angular momentum on a given site. In fact, dependent particle approximation because of the let’s say one considers as X-ray edge the K-edge need to include both intra- and inter-band tran- or 1s core level, then the RXES probes transitions sitions and disentangle them in a careful manner. between valence bands and conduction bands of p The way to obtain divergence free expression was angular momentum on the site of core hole that described in a series of papers by John Sipe and are in resonance with the difference of the absorbed coworkers [179, 180, 181]. The version implemented and subsequently coherently emitted X-ray photon in the code [182] uses Aversa’s length gauge for- transitions from these conduction and valence states malism. The intra-band matrix elements ri and to the same core-hole. In optical measurements in inter-band re in this formalism are the VIS-UV region, on the other hand, one probes 0 0 0 only ` to ` ± 1 dipole allowed transitions. Using hnk|ri|mk i = δnm[δ(k − k )ξnn + i∇kδ(k − k )] 0 0 p-levels as core hole RIXS thus probes d − d tran- hnk|re|mk i = (1 − δnm)δ(k − k )ξnm (195) sitions, which often may be strongly influenced by many-body effects. The Kramers-Heisenberg for- with mula assumes the wavevector of the X-ray can be (2π)3i Z ξ ≡ d3r u∗ (r)∇ u (r) (196) neglected so that only direct transitions at the same nm Ω nk k mk k-point are allowed and within this approximation Ω it allows one in principle to extract band-dispersions in which Ω is the volume of the unit cell and unk(r) in a manner complementary to ARPES. is the periodic part of the Bloch function. The It was established 20 years ago that RPA approxi- ξnn can be recognised to be a Berry connection. mation to the dielectric function can be dramatically The manipulation of the matrix elements of ri in improved by adding ladder diagrams via a Bethe- commutators is non-trivial but described in detail 59 in Aversa and Sipe [179]. An update of these parts 7. Towards a High-Fidelity Solution of the of the code to incorporate them in lmf and to be Many-Body Problem made compatible with QSGW band structures as input is intended. Solving the many-electron problem with high- See also Sec. 6.3. fidelity is a formidably difficult task. Our strategy to accomplish this relies on the following premises:

6.3. Response Functions within DMFT (i) many-body solutions are best framed around a non-interacting starting point, and we believe that QSGW is the best choice among them, by 160 7

0.22 6 construction; 120 0.15 5

0.07 4 (ii) charge fluctuations governed by long-range in- 80 0 -0.00 3 teractions, but they can be treated accurately -0.08 40 2 V e -0.15 with low-order perturbation theory; and

m 1 - 0 -0.22 (1/2,0,0) (1/2,1/2,0) (1/2,1,0) 0 - 0 (iii) spin fluctuations are governed mostly by single-

Figure 32: (left) Real part of χ(q, ω=0) for La2CuO4 show- site effective Hamiltonian (or action). The ing a dominant peak at q=(π, π). (middle) Imaginary part short-range interactions can be too strong to of χ(q, ω) (arbitrary units) for La1.88Sr0.12CuO4 showing handle perturbatively, as can be seen by the that spin fluctuations get gapped at (π, π) below 145 K. (right) Nodal superconducting gap structure for the hole- non-analytic behaviour of one- and two-particle doped La2CuO4 showing dx2−y2 gap symmetry. The color quantities, and their high degree of sensitivity bar shows the negative (in blue) and positive (in red) super- to small perturbations, e.g. change in tempera- conducting gap magnitudes in arbitrary units passing through ture. Nonlocal contributions are much weaker node where the gap closes (in yellow). and can be treated in perturbation theory.

With a converged self-energy, the CTQMC can 2’ QSGW e-ph sample the two-particle Green’s function [183] to 1’ 3’ obtain local spin and charge vertices. Finally, the non-local Bethe-Salpeter equations (BSE) in spin, 1 c 2 3 QSGW DMFT χ G++ charge and superconducting channels [183] can be solved from a local vertex and a non-local polar- isation bubble in the respective channels. This Figure 33: Questaal strategy for solving the many-electron allows us to compute the corresponding real [86] problem. and imaginary part of susceptibilities [86, 184] in those channels. In Fig. (32) we show the real part of the spin susceptibilities in x=0 La2−xSrxCuO4, Charge and spin fluctuations have very different adapted from Ref. [86]. There is a peak at q=(π, π) characteristic energy scales: plasmon frequencies are suggesting dominant N´eelanti-ferromagnetic spin typically of order 5 eV while typical magnon frequen- fluctuations (Fig. 32). cies are of order 100 meV or less. As a conseqence We compute the imaginary part of the spin suscep- they mutually interact weakly so that the self-energy tibilities along the line (h, k, l) = (0,1/2,0)−(1,1/2,0) is predominated by independent contributions from and find that the peak at (1/2,1/2,0) gets spin spin and charge channels. This is the premise of the gapped (vanishing weight at ω=0) when x=0.12 Bose Factor Ansatz which has has been found to be below a certain temperature, consistent with experi- effective in a DMFT framework [186]. mental observations [185]. Additionally, solving the We have framed a hierarchical strategy around BSE in the p-p superconducting channel we recover these premises (Fig. 33). At the lowest level is a superconducting gap function with dx2−y2 symme- QSGW, which depends only minimally on DFT. try (right panel, Fig. 32). Our QSGW +DMFT im- We think this is of central importance to frame plementation can be successfully applied to a wide a consistently reliable theory, for reasons we have range of systems; weakly correlated metals [115], pointed out in Sec. 4.1. strongly correlated metals [184] and correlated Mott Referring to the figure, QSGW is adequate for insulators [86]. many purposes. When not, there are two routes: a 60 perturbative, nonlocal path (10, 20, 30) : low-order (5) data structures are organised with performance diagrams are added to W to make (W → Wc). To in mind and most heavy number crunching is out- date we have added ladders to the charge channel; sourced to performance libraries implementing the in progress is a project to add the electron-phonon BLAS, LAPACK, FFTW3 APIs, (6) the code is interaction perturbatively [187], effectively adding written in an uniform style with consistent flow pat- another bosonic contribution to W . Also possible, terns. This practice has been extended through the but not yet accomplished in Questaal, is to add use of modern version tracking and continuous inte- low order spin fluctuation diagrams such as the T gration pipelines incorporating regression, coverage matrix. All of them make a better G (G++ in the and performance testing, and minimal style policy Figure). The second path begins as nonperturbative, enforcement. local path (1). From a local interaction nonlocal susceptibilities can be constructed (2). We have 8.1. Release Policy shown a few instances of this in Sec. 6.3, and the We use the popular distributed version tracking results are remarkably good in cases we have studied system git. The public online hosted repository is so far, e.g. Ref. [184]. Finally, the susceptibilities hooked to continuous integration pipelines executing can be used to add a new diagrammatic contribution the steps above on each push event. New features (3) to G (G++). The simplest addition is Dual are developed in separate feature branches, if pushed Fermions [166]. to the public online repository, these are visible to all Perhaps more satisfactory would be to have a users. When judged safe and reasonably usable, fea- single, unified approach, for example the Diagram- ture branches are merged to the main development matic Monte Carlo (DiagMC) method [188]. But branch (“lm”). After extensive use, the development it is formidably difficult to make this method work branch is merged to the release branch (“master”) practically in real systems, because of the enormous and a new release is tagged with a numerical ver- time and memory costs that realistic (as opposed sion in the format vmajor.minor.patchlevel. If/when to model) Hamiltonians require. We think that the issues are discovered and fixed the patchlevel is in- JPO basis alluded to in Sec. 3.12 is perhaps the cremented in a new tag and the new commits are most promising framework to realise DiagMC for merged back to the development branch and from realistic Hamiltonians. Even in such a case an opti- there to the feature branches. This approach re- mal approach would likely closely resemble Fig. 33, duces the maintenance burden significantly, however but with some cross-linking between paths (1) and it does mean that once the minor version is incre- 0 (1 ), using a diagrammatic Monte Carlo solver only mented there are no more patches offered to the to include nonlocal diagrams beyond the RPA. older versions and users are strongly encouraged to update to the latest version available. Since new de- velopments are effectively done by interested users, 8. Software Aspects and there is as of yet no contractual support offered, we feel this is a justified arrangement and in the long The software package has a long history and has term offers users new features and improved perfor- featured a high level of modularisation from its mance with effectively close to no risk. The major early days. The different parts interoperate through version is only incremented when a very significant, shared interfaces and file data formats. possibly incompatible rewrite of core components The implementation is mainly in procedural For- or the input system has been performed. tran 77. Fixes and improvements as well as new New code and changes to existing code require: feature code uses modern Fortran features (f2008+) (1) coding standards pass; (2) regression tests suite for convenience as well as improved reliability and pass (approximately 350 tests as of this writing); interoperability. Despite f77’s shortcomings and and (3) test coverage of at least 90% of code. questionable reputation, sound software engineering practices have been observed, there is (1) extensive code documentation in an uniform format, (2) al- 8.2. Parallelisation most no shared mutable state (i.e. common blocks The programmes of the package support multilevel or module variables); (3) action through side effect parallelisation multiprocessing through the message is avoided and data is passed through arguments, passing interface (MPI) and simultaneous multi- (4) unnecessary temporary data copies are avoided, threading through performance libraries and (when 61 necessary) manual OpenMP directives. The full- Full-potential potential program uses multiprocessing mostly dur- blm automatic input file generator ing the Brillouin zone sampling, outside of the q-loop lmf main band code the MPI processes cooperate on the local potential lmfa free atom solver generation, only multithreading is available within lmfgwd interface to GW the processing of each q-point. The empirical polaris- lmfdmft interface to DMFT able ion tight binding program offers more flexibility Atomic sphere approximation in this regard by allowing simultaneous assignment lm ASA band code of groups of processes to each q-point as well as lmgf ASA crystal Green’s function code spreading different blocks of q-points over groups of lmpg ASA principal layer Green’s function processes [189]. A somewhat similar approach has lmstr ASA structure constants been recently developed by Martin L¨udersfor the tbe empirical tight-binding GW code. While it is available in a feature branch Post-processing and utilities it is not yet considered production ready and has fplot a graphics package not been merged to the main branch. The GW code pldos a DOS postprocessor is fairly multithreaded and for systems of at least plbnds a bands postprocessor 15-20 atoms performs well on many-core wide vector lmchk checks structure-related quantities architectures like Intel’s Xeon Phi family of proces- lmdos assembles partial DOS sors (tested on x200, Knight’s Landing). A small lmfgws dynamical self-energy postprocessing but performance-critical part was also ported to lmscell supercell maker Nvidia’s CUDA parallel platform and shows promis- mcx a matrix calculator ing performance on hardware with good double pre- Editors cision floating point capabilities (for example Titan, lmf|lm --rsedit restart file P100, V100 compute cards). The polarisable ion lm --popted optics tight binding program makes more extensive use of lmf --wsig∼edit static Σ0 CUDA; it is described in [189]. lmfgws --sfuned dynamic Σ To ease parallel IO, classic fortran binary files lmf --chimedit magnetic susceptibility are being moved to HDF5 based data files. This lmscell --stack superlattice also improves interoperability with various postpro- cessing environments because the HDF5 libraries offer bindings to a variety of popular programming Table 8: Main executables in the Questaal suite. languages. 8.4. Selected Publications from Questaal 8.3. Usability In this section we point to original papers where An ordinary text command line is all one needs new capabilities were developed within Questaal. to be able to use the programmes from the package. Some the concepts are not original with Questaal, The main executables (see Table 8) support a num- though many are: ber of flags, notable among which is “--input”, it causes the full input understood by the executable • an early DFT calculation of the Schottky barrier to be printed together with default values and short height at a metal/semiconductor contact, showing documentation for each token. Another very valu- the importance of screening at the interface [190]; able and possibly unique feature is the ability to • an early calculation of alloy phase diagrams com- override almost any input file value through the bining statistical theory and DFT [191]; command line through the small embedded prepro- cessor which renders the input transparent internally • a systematic technique for deriving force theorems (Fig. 34). A third is the suite of special-purpose edi- within DFT [9]; tors (see Table). There is extensive documentation • the first formulation of adiabatic spin dynamics online together with many tutorials and ready made within DFT [11] and its application to explain example script snippets. Users are encouraged to the Invar effect in permalloy [192]; report issues and offer ideas through the online tick- eting system and contribute improvements or fixes • a formulation of self-consistent empirical tight- through pull requests or inline patches. binding theory [29]; 62 • Questaal’s implementation of nonequilibrium elec- tron transport in nanosystems [14], and signifi- cant applications to magnetic transport [197, 22, 198, 24, 25];

• a fusion of genetic algorithms and exchange in- teractions to predict and optimise critical tem- peratures in multi-component systems [172];

• Dresselhaus terms calculated ab initio with high fidelity in zincblende semiconductors [107, 199, 109];

• first GW description of 4f systems, showing the tendency to overestimate splitting between occu- pied and unoccupied f states [135];

• spin wave theory within QSGW [110];

• prediction of Impact Ionisation rates with QSGW [113];

• the PMT method of Sec. 3.10 [54];

• a formulation of tunnelling transport within QSGW [112];

• effect of spin-orbit coupling on the QSGW self- energy [77];

Figure 34: Basic input file: a full-potential GGA calculation • approximate description of transient absorption for HCP Gd with 7µB l = 3 starting moment. A preprocessor spectroscopy within QSGW [200]; directive (nk) is used to specify the k-sampling, requesting 1000 points in the full Brillouin zone: the generated grid • Questaal’s first application of QSGW +DMFT, (12,12,7) has components corresponding to the relative BZ vector lengths. nk, like pwemax—which flags the inclusion of showing the need to go beyond GW when spin LAPW basis functions—can be changed from the command fluctuations are important [115]; line. The LMTO basis is to be setup automatically, caused by the AUTOBAS token in the HAM (Hamiltonian) section. • addition of ladder diagrams to dielectric func- tion [116]; • a formulation of Electron Energy Loss Spec- troscopy in DFT [193]; • Fr¨ohlich contribution to renormalisation of QSGW energy bands [128]; and • the original description of Questaal’s full- potential method, Sec. 3 [31]; • first application of QSGW +DMFT+BSE for sus- ceptibilities, Sec. 6.3 [86]. • the original description of Questaal’s all-electron GW approximation [194], and first calculation of RPA total energy in a solid [195]; 8.5. Distribution and Licensing • solution of the Boltzmann transport equation Historically the code has been distributed as a combined with DFT [196]; set of source tarballs, in addition registered users can git-clone our online repository. The code is • the original formulation of Quasiparticle Self- distributed under the terms of the General Public Consistent GW [16], demonstration of its wide license version 3. Contributions under a compatible applicability [83], and a detailed description [84]; license are welcome. 63 9. Conclusions many, the UK Materials and Molecular Modelling Hub for computational resources, which is partially Questaal is a descendant of one of the early all- funded by EPSRC (EP/P020194/1) and STFC Sci- electron methods developed in Stuttgart. It has entific Computing Department’s SCARF cluster gradually evolved to its present form, but as no (www.scarf.rl.ac.uk). J.J. acknowledges support paper has been written for any of its stages the under the CCP9 project “Computational Electronic present work is intended to summarise much of this Structure of Condensed Matter”, part of the Compu- evolution. Our aim was to present the many expres- tational Science Centre for Research Communities sions from classic works, combined with previously (CoSeC). unpublished recent developments, in a unified way to show the connections between the parts. We made an attempt to show Questaal’s strengths Appendices as well as its limits, within varying levels of theory. An augmentation radius is labelled by symbol We think Questaal provides a very promising path s. Subscript R is used to denote a site-dependent to efficiently solve the electronic structure problem quantity, such as the augmentation radius sR. When with high fidelity. It is unique in its potential to R appears in a subscript of a function of r such as span such a wide range of properties and materials, HRL(r), it implies that r is relative to R, i.e. r − R. and with varying levels of approximation. When a symbol refers to angular momentum, an upper case letter such as L, refers to both angular 10. Acknowledgements quantum number and magnetic quantum number parts, ` and m. Many people have contributed to the Questaal project. Much of the and basic algo- A. Definition of Real and Spherical Harmonics rithms in the lmf implementation were largely the Where spherical harmonics are used, Questaal work of Michael Methfessel. GW was formulated uses the same definitions as Jackson [201]: and implemented largely by Ferdi Aryasetiawan and 1 (2` + 1)(` − m)! 2 Takao Kotani, and QSGW by Sergey Faleev. Martin Y (θ, φ) = P m(cos θ)eimφ L¨uderswrote a parallel version of this code and was ` m 4π(` + m)! ` involved in some validation aspects. Derek Stew- (A.1) art wrote the original code for Landauer-Buttiker (1 − x2)m/2 d`+m transport, and Faleev extended it to the nonequi- P m(x) = (−1)m (x2 − 1)` ` 2ll! dx`+m librium case. Pooya Azarhoosh improved on the (A.2) optics and performed TAS studies. Alena Vishina redesigned the fully relativistic Dirac code, making The −m and +m functions are related by [see Jack- it self-consistent. Herve Ness has redesigned signifi- son (3.51) and (3.53)] cant parts of the layer Green’s function technique. (` − m)! Brian Cunningham and Myrta Gr¨uningadded P −m(x) = (−1)m P m(x) (A.3) ` (` + m)! ` ladder diagrams to W , and Savio Laricchia is close to m ∗ completing a QSGW -based approach to the electron- Y`,−m(ˆr) = (−1) Y` m(ˆr) (A.4) phonon interaction and related quantities. Both of Questaal mostly uses real harmonics Ylm, which these additions will have significant impact in the are related to the spherical harmonics as future. Also, many thanks to Kristjan Haule and Gabi Kotliar for helping us get started in DMFT, Y` 0(ˆr) = Y` 0(ˆr) and to Ole Andersen for being the founder and 1 m inspiration for so many of the ideas behind Questaal. Y` m(ˆr) = √ [(−1) Y` m(ˆr) + Y`,−m(ˆr)] 2 Finally, many thanks to Andy Millis and the Si- 1 m mons Foundation, who enriched Questaal in funda- Y`,−m(ˆr) = √ [(−1) Y` m(ˆr) − Y`,−m(ˆr)] (A.5) mental ways, and to EPRSC which made it possible 2i to become a community code (CCP9 flagship project: It also uses real harmonic polynomials, [note: sub- EP/M011631/1). stituted l → `] We gratefully acknowledge PRACE for award- ` ing us access to SuperMUC at GCS@LRZ, Ger- Y`m(r) = r Y`m(ˆr), (A.6) 64 which are real polynomials in x, y, and z. spherical Neumann functions n` and Bessel functions The product of two spherical harmonic polynomi- j` as follows: als can be expanded as a linear combination of these functions in the same manner as ordinary ones HL = YL(−∇) h0(r) `+1 X = k n`(kr)YL, E > 0 YK (ˆr)YL(ˆr) = CKLM YM (ˆr) (A.7) `+1 = (ik) n`(kr)YL, E < 0 M −` X k+`−m JL = k j`(kr)YL, E > 0 YK (r)YL(r) = CKLM r YM (r) (A.8) −` M = (ik) j`(kr)YL, E < 0 where The Gaunt coefficients CKLM are nonzero only when ikr k + ` − m is an even integer, so the r.h.s. is also a h0 = Re e /r and j0 = sin(kr)/kr (B.13) polynomial in (x, y, z). Note also, from Eq. (A.6) it follows that if α is HL and JL are real for any real energy, and struc- purely imaginary ture constants Eq. (1) have the property SKL(R) = ∗ SLK (−R). They can generated from the `=0 func- l ∗ Y`m(αr) = (−1) Y`m(αr) (A.9) tions using the operator YL(−∇); see, e.g. Eq. (117). The latter are useful for derivations and also practi- It also follows from Eq. (A.6) and the discussion cally, for instance, to express the gradient of ∇HL as around Eq. (139) that a linear combination of other HL. Eq. (123) shows the explicit relation to the usual spherical Hankel X − (1) ∇pYL(r) = (2` + 1) CKL;p YK (r) function of the first kind, h` (z). K Spherical harmonics YL can be substituted for the −2l−1 X + YL in Eq. (B.13). −∇p(r YL(r)) = (2` + 1) CKL;p YK (r) K For historical reasons, Andersen made a different (A.10) set of definitions, which are convenient for the ASA when E=0. In this paper we distinguish them from −2l−1 for p = 1, 2, 3. r YL(r) and YL(r) are ordinary Questaal’s standard definitions Eq. (B.13) with a Hankel and Bessel functions of energy zero. In the caret. For E≤0 and κ2= − E, top (bottom) line, k=`+1 (k=`−1). Eqs. (A.10) −` −1 (±) Hˆ`(κ, r) = −[(2` − 1)!!(κw) ] (iκw)h`(κr) may be taken as a definition of CKL;p. Similarly −−−→κ→0 (w/r)`+1 X  (+) (−) 2 rpYL(r) = C + C r YK (r) (A.11) −` KL;p KL;p Jˆ`(κ, r) = (−1/2)[(2` − 1)!!(κw) ]j`(κr) K −−−→κ→0 [2(2` + 1)]−1(r/w)` (B.14) C(−) is the transpose of C(+) : w is a length scale which can be chosen arbitrarily (+) (−) (usually taken to be some average of the MT radii). CKL;p = CLK;p (A.12) but we keep them separate to distinguish terms that take Y`,m into Y`+1,m0 and those that map into Y`−1,m0 .

B. Definition of Hankel and Bessel Functions To distinguish radial parts of spherical functions from solid versions, we denote spherical parts with subscript `, and solid functions with subscript L, which refers to both the ` and m parts. The Questaal codes generally follow Methfessel’s definitions√ for Hankel and Bessel functions. Writing k= E with Im k≥0, they are related to standard 65 C. Calculations in Real and Reciprocal Space All of the formalism developed so far (with the exception of APWs) can apply to either real or reciprocal space. At present the Questaal codes work in reciprocal space, with the exception of lmpg (Sec. 2.14), which uses real space for the principal layer direction. The k mesh is always a uniform mesh. Along any axis the user can specify whether the mesh has a point coinciding with k=0 or points that symmetrically straddle it. When screened MTO’s are used, Bloch sums are formed directly from the short ranged orbitals. The standard lmf basis is not sufficiently short ranged, and Bloch sums are constructed with an Ewald technique. A Bloch sum of HL is written as a sum over lattice vectors R:

k X ik·R ik·r X −ik·(r−R) HL(ε, γ; r) = e HL(ε, γ; r − R) = e e HL(ε, γ; r − R) R R The latter sum is periodic in R and can be written as a Fourier series in the reciprocal lattice vectors G

X −ik·(r−R) 1 X iG·r e HL(ε, γ; r − R) = e Hb L(ε, γ; k + G) V R G so that the Bloch-summed HL is

k 1 X i(k+G)·r H (ε, γ; r) = Hb L(ε, γ; k + G)e (C.15) L V G V is the volume of the unit cell.

D. Matrix elements of Smooth Hankel Envelope Functions This Appendix derives analytic expressions for matrix elements of the overlap, laplacian, gradient and position operators for a pair of functions Hk2L2 (ε2, rs2 ; r − R2) and Hk1L1 (ε1, rs1 ; r − R1), of the form Eq. (118). The strategy will be to use Parseval’s identity, Eq. (141), and evaluate the matrix elements of ∗ functions in reciprocal space. Analytic solutions can be obtained because products of functions Hb 2Hb 1 can be contracted to sums of functions of the same class, whose Fourier transform is known. 2 To begin with, consider matrix elements of the overlap and Laplacian operators. Since ∇ HkL = Hk+1L, matrix elements of the Laplacian and overlap are different members of the same class. Using Eq. (116), (118) and (A.8), products of two Hb kL(q) read as follows. To shorten the formulas the smoothing radius will be 2 expressed in terms of γ=rs /4. h i n o ∗ 2 k1+k2 `1 iq·R1 −iq·R2 Hb 1 Hb 2 = bh0(ε1, γ1, q)bh0(ε2, γ2, q) (−q ) (−1) YL1 (−iq) YL2 (−iq) e e (D.16)

The quantity in curly braces can be expanded into a linear combination of YL using Eq. (A.8),

2 k1+k2 `1 `1 X 2 k1+k2+(`1+`2−m)/2 (−q ) (−1) YL1 (−iq) YL2 (−iq) = (−1) CL1L2M (−q ) YM (−iq) (D.17) M and the radial part in square brackets is similarly expanded

2 2 2 (4π) eγ1(ε1−q )eγ2(ε2−q ) bh0(ε1, γ1, q)bh0(ε2, γ2, q) = 2 2 (ε1 − q )(ε2 − q ) 2 " 2 2 # (4π) eγ1(ε1−ε2)e(γ1+γ2)(ε2−q ) eγ2(ε2−ε1)e(γ2+γ1)(ε1−q ) = 2 − 2 ε1 − ε2 ε2 − q ε1 − q 4π h i γ2(ε2−ε1) γ1(ε1−ε2) = e bh0(ε1, γ, q) − e bh0(ε2, γ, q) (D.18) ε1 − ε2 2 ε ,ε →ε (4π) 2 2 1 (γ1+γ2)(ε−q ) −−−−−→ 2 2 e (D.19) (ε2 − q )

γ = γ1 + γ2. (D.20)

66 When the factors in Eq. (D.16) are combined, the product is seen to be the Fourier transform of a linear combination of HkM with smoothing radius given by γ1 + γ2, at the connecting vector R1 − R2. The overlap matrix can then be written Z hH H i = H∗ (ε , γ ; r − R )H (ε , γ ; r − R ) d3r k1L1 k2L2 k1L1 1 1 1 k2L2 2 2 2

ne−γ1(ε2−ε1) eγ2(ε2−ε1) o `1 X 0 0 (D.21) = 4π(−1) CL1L2M Hk M (ε2, γ; R) − Hk M (ε1, γ; R) ε2 − ε1 ε2 − ε1 M 0 k = k1 + k2 + (`1 + `2 − m)/2, γ = γ1 + γ2, R = R1 − R2

The case ε1 = ε2 = ε must be handled specially. In that limit, the radial part simplifies to Eq. (D.19). This function is not contained in the HkL pantheon, and a new function must be added

2k Wb kL(q) = wb0(ε, rs; q)(−iq) YL(−iq) 4π γ(ε−q2) −1 (D.22) w0(ε, γ; q) = e = bh0(ε, γ; q) b (ε − q2)2 (ε − q2)

Wb kL is closely related to the energy derivative of Hb kL. Differentiate Eq. (118), with respect to energy:

˙ ∂Hb kL Hb kL ≡ = Wb kL + γHb kL ∂ε which are generated from radial functions (D.23) b˙ h0(ε, γ; q) = wb0(ε, γ; q) + γ bh0(ε, γ; q) ˙ This provides a practical scheme to obtain W0L in real space since analytic forms for H0L and H0L are known (Eqs. (112) and (113)). To make WkL for higher k it is readily seen that

Wk+lL(r) = −εWkL(r) − HkL(r)

Using Eq. (D.19) for the quantity in square brackets, Eq. (D.16), the resulting integral is Z H∗ (ε, γ ; r − R )H (ε, γ ; r − R ) d3r = k1L1 1 1 k2L2 2 2 (D.24) `1 X 0 4π(−1) CL1L2M Wk M (ε, γ; R) M where k0, γ and R the same as in Eq. (D.21). Note that Eq. (D.24) encompasses matrix elements between two generalized Gaussians, by virtue of Eq. (134), Hk+1L(ε=0) = −4πGkL.

D.1. The momentum and position operators acting on a smooth Hankel function Following the usual rules of Fourier transforms, the gradient operator ∇r and position operator r act in reciprocal space as follows. Use the definition Eqs. (116) and (118) with Eq. (A.11) to obtain Z 1 iq·r 3 ∇p HkL(ε, γ; r) = Hb kL(ε, γ; q) ∇e d q (2π)3 Z 1 iq·r 3 = − e (−iq) bh0(ε, γ; q)YL(−iq) d q (2π)3 Z (D.25) 1 3 iq·r X  (+) (−) = − d q e C Hb kM (q) + C Hk+1M (q) (2π)3 ML;p ML;p M X  (+) (−) = − CML;pHkM (r) + CML;pHk+1M (r) M 67 (±) CML;p has two nonzero elements for fixed L; p. The position operator is more complicated, and we restrict the derivation to k=0 case. Z 1 ∂ iq·r 3 r H0L(ε, γ; r) = Hb 0L(ε, γ; q) e d q (2π)3 i∂q Z 1 iq·r 3 = e [i∇qbh0(ε, γ; q)]YL(−iq) + bh0(ε, γ; q)[i∇qYL(−iq)] d q (2π)3 The first term contains b˙ [i∇qbh0(ε, γ; q)] = 2iq (wb0 − γbh0) = 2(−iq) h0 X  (+) ˙ (−) ˙ i∇qbh0(ε, γ; q)]YL(−iq) = 2 CML;pHb 0M (q) + CML;pHb 1M (q) M using Eq. (A.11). The second term follows directly from Eq. (A.10) to yield X  (+) ˙ (−) ˙ rp H0L(ε, γ; r) = CML;p(2H0M (r)) + CML;p(2H1M (r) + (2` + 1)H0M (r)) (D.26) M D.2. Matrix elements of position and gradient operators Eq. (D.25) shows that ∇HkL(r) generates a linear combination of the same family of HkL(r), evaluated at the vector connecting atom centres. Thus, matrix elements of the gradient operator are linear combinations of the same functions. A similar situation applies to matrix elements of r, though they entail matrix elements ˙ of the overlap between HkL and HkL. These integrals can be evaluated in reciprocal space using Parseval’s ∗ ˙ identity, expanding Hb 1 Hb 2 analogously to Eq. (D.16). The radial part of this product can be expanded as: 2 2 2 −(4π) eγ1(ε1−q )eγ2(ε2−q ) b˙ bh0(ε1, γ1, q)h0(ε2, γ2, q) = 2 2 2 + γ2 bh0(ε1, γ1, q)bh0(ε2, γ2, q) (ε1 − q )(ε2 − q ) 4π     (ε1−ε2) γ1 −1 −ε2 γ1−ε1 γ2 ε1 γ ε2 γ = e wb0(ε2, γ, q) + [γ2 + (ε1 − ε2) ] e e bh0(ε2, γ, q) − e bh0(ε1, γ, q) ε1 − ε2 (D.27) Substituting this radial function into the square brackets in Eq. (D.19), we obtain D E Z H H˙ = H∗ (ε , γ ; r − R )H˙ (ε , γ ; r − R ) d3r k1L1 k2L2 k1L1 1 1 1 k2L2 2 2 2

4π(−1)`1  X (ε1−ε2) γ1 0 = CL1L2M × e Wk M (ε2) + (D.28) ε1 − ε2 M   −1 −ε2 γ1−ε1 γ2 ε1 γ ε2 γ [γ2 + (ε1 − ε2) ] e e Hk0M (ε2) − e Hk0M (ε1)

0 where Wk0M and Hk0M are evaluated with the same arguments k , γ and R as in Eq. (D.21). Combining Eqs. D.25, D.26, D.21 and D.28 Z hH ∇ H i = H∗ (ε , γ ; r − R )∇ H (ε , γ ; r − R ) d3r k1L1 p k2L2 k1L1 1 1 1 p k2L2 2 2 2 (D.29) X (+) (−) = − C hH H i + C hH H i ML2;p k1L1 k2M ML;p k1L1 k2+1M M

Z hH (r − R ) H i = H∗ (ε , γ ; r − R )(r − R ) H (ε , γ ; r − R ) d3r 0L1 2 p 0L2 0L1 1 1 1 2 p 0L2 2 2 2 (D.30) X n (+) D E (−)  D E  o = C 2 H H˙ + C 2 H H˙ + (2` + 1) hH H i ML2;p 0L1 0M ML;p 0L1 1M 2 0L1 0M M

68 References Phys. Rev. B 88 (2013) 075105. doi:10.1103/PhysRevB. 88.075105. [20] M. J. van Setten, F. Weigend, F. Evers, J. Chem. [1] J. Korringa, Phys. Rep. 238 (1994) 341–360. doi: Theory Comput. 9 (2013) 232–246. 10.1016/0370-1573(94)90122-8. doi:10.1021/ . [2] C. Herring, Phys. Rev. 57 (1940) 1169–1177. doi: ct300648t [21] M. A. Marques, M. J. Oliveira, T. Burnus, Comput. 10.1103/PhysRev.57.1169. Phys. Commun. 183 (2012) 2272–2281. [3] J. C. Slater, Phys. Rev. 51 (1937) 846–851. doi:10. doi:10.1016/ . 1103/PhysRev.51.846. j.cpc.2012.05.007 [4] P. E. Bl¨ochl, Phys. Rev. B 50 (1994) 17953–17979. [22] A. N. Chantis, K. D. Belashchenko, E. Y. Tsymbal, M. van Schilfgaarde, Phys. Rev. Lett. 98 (2007) 046601. doi:10.1103/PhysRevB.50.17953. [5] O. K. Andersen, Phys. Rev. B 12 (1975) 3060–3083. doi:10.1103/PhysRevLett.98.046601. [23] A. N. Chantis, K. D. Belashchenko, D. L. Smith, E. Y. doi:10.1103/PhysRevB.12.3060. [6] K. Lejaeghere, G. Bihlmayer, T. Bjorkman, P. Blaha, Tsymbal, M. van Schilfgaarde, R. C. Albers, Phys. Rev. S. Blugel, V. Blum, D. Caliste, I. E. Castelli, S. J. Lett. 99 (2007) 196603. doi:10.1103/PhysRevLett.99. Clark, A. Dal Corso, S. de Gironcoli, T. Deutsch, 196603. J. K. Dewhurst, I. Di Marco, C. Draxl, M. Duak, [24] K. D. Belashchenko, A. A. Kovalev, M. van Schil- O. Eriksson, J. A. Flores-Livas, K. F. Garrity, L. Gen- fgaarde, Phys. Rev. Lett. 117 (2016) 207204. doi: ovese, P. Giannozzi, M. Giantomassi, S. Goedecker, 10.1103/PhysRevLett.117.207204. X. Gonze, O. Granas, E. K. U. Gross, A. Gulans, [25] K. D. Belashchenko, A. A. Kovalev, M. van Schil- F. Gygi, D. R. Hamann, P. J. Hasnip, N. A. W. fgaarde, Phys. Rev. Mater. 3 (2019) 011401. doi: Holzwarth, D. Iuan, D. B. Jochym, F. Jollet, D. Jones, 10.1103/PhysRevMaterials.3.011401. G. Kresse, K. Koepernik, E. Kucukbenli, Y. O. Kvash- [26] F. Aryasetiawan, Phys. Rev. B 46 (1992) 13051–13064. nin, I. L. M. Locht, S. Lubeck, M. Marsman, N. Marzari, doi:10.1103/PhysRevB.46.13051. U. Nitzsche, L. Nordstrom, T. Ozaki, L. Paulatto, C. J. [27] K. Haule, Phys. Rev. B 75 (2007) 155113. doi:10. Pickard, W. Poelmans, M. I. J. Probert, K. Refson, 1103/PhysRevB.75.155113. M. Richter, G.-M. Rignanese, S. Saha, M. Scheffler, [28] O. Parcollet, M. Ferrero, T. Ayral, H. Hafermann, M. Schlipf, K. Schwarz, S. Sharma, F. Tavazza, P. Thun- I. Krivenko, L. Messio, P. Seth, Comput. Phys. Com- strom, A. Tkatchenko, M. Torrent, D. Vanderbilt, M. J. mun. 196 (2015) 398–415. doi:10.1016/j.cpc.2015. van Setten, V. Van Speybroeck, J. M. Wills, J. R. 04.023. Yates, G.-X. Zhang, S. Cottenier, Science 351 (2016) [29] M. W. Finnis, A. T. Paxton, M. Methfessel, M. van Schilfgaarde, Phys. Rev. Lett. 81 (1998) 5149–5152. aad3000–aad3000. doi:10.1126/science.aad3000. [7] M. Methfessel, D. Hennig, M. Scheffler, Phys. Rev. B doi:10.1103/PhysRevLett.81.5149. [30] M. W. Finnis, Interatomic forces in condensed matter, 46 (1992) 4816–4829. doi:10.1103/PhysRevB.46.4816. [8] A. I. Liechtenstein, I. I. Mazin, C. O. Rodriguez, Oxford University Press, Oxford, 2003. O. Jepsen, O. K. Andersen, M. Methfessel, Phys. Rev. [31] M. Methfessel, M. van Schilfgaarde, R. A. Casali, in: H. Dreyss´e(Ed.), Electronic structure and physical B 44 (1991) 5388–5391. doi:10.1103/PhysRevB.44. properties of solids: the uses of the LMTO method, no. 5388. 535 in Lecture notes in physics, Springer, Berlin, 2000, [9] M. Methfessel, M. van Schilfgaarde, M. Scheffler, Phys. p. 114. Rev. Lett. 70 (1993) 29–32. doi:10.1103/PhysRevLett. [32] O. Gunnarsson, O. Jepsen, O. K. Andersen, Phys. Rev. 70.29. [10] M. I. Katsnelson, V. P. Antropov, M. van Schilfgaarde, B 27 (1983) 7144–7168. doi:10.1103/PhysRevB.27. . B. N. Harmon, JETP Lett. 62 (1995) 439–443. 7144 [11] V. P. Antropov, M. I. Katsnelson, M. van Schilfgaarde, [33] O. K. Andersen, T. Saha-Dasgupta, R. W. Tank, B. N. Harmon, Phys. Rev. Lett. 75 (1995) 729–732. G. K. C. Arcangeli, O. Jepsen, G. Krier, in: H. Dreyss´e (Ed.), Electronic structure and physical properties of doi:10.1103/PhysRevLett.75.729. [12] T. Kotani, H. Akai, Phys. Rev. B 54 (1996) 16502– solids: the uses of the LMTO method, no. 535 in Lec- ture notes in physics, Springer, Berlin, 2000, p. 3. 16514. doi:10.1103/PhysRevB.54.16502. [13] F. Aryasetiawan, O. Gunnarsson, Phys. Rev. Lett. [34] O. K. Andersen, A. V. Postnikov, S. Y. Savrasov, in: W. H. Butler, P. H. Dederichs, A. Gonis, R. L. Weaver 74 (1995) 3221–3224. doi:10.1103/PhysRevLett.74. (Eds.), Applications of Multiple Scattering Theory to 3221. [14] S. V. Faleev, F. L´eonard, D. A. Stewart, M. van Materials Science, Vol. 253, Materials Research Society, Pittsburgh, 1992, pp. 37–70. Schilfgaarde, Phys. Rev. B 71 (2005) 195422. doi: [35] O. K. Andersen, Z. Pawlowska, O. Jepsen, Phys. Rev. B 10.1103/PhysRevB.71.195422. [15] T. Kotani, M. van Schilfgaarde, Solid State Commun. 34 (1986) 5253–5269. doi:10.1103/PhysRevB.34.5253. [36] M. van Schilfgaarde, W. R. L. Lambrecht, in: 121 (2002) 461–465. doi:10.1016/S0038-1098(02) L. Colombo, A. Gonis, P. Turchi (Eds.), Tight-binding 00028-5. [16] S. V. Faleev, M. van Schilfgaarde, T. Kotani, Phys. Rev. approach to Computational Materials Science, Vol. 491 of Lecture Notes in Physics, Springer, Berlin, 1998, p. Lett. 93 (2004) 126406. doi:10.1103/PhysRevLett.93. 137. 126406. [17] S. Choi, A. Kutepov, K. Haule, M. van Schilfgaarde, [37] M. P. L. Sancho, J. M. L. Sancho, J. Rubio, J. Phys. F 15 (1985) 851–858. . G. Kotliar, npj Quantum Mater. 1 (2016) 16001. doi: doi:10.1088/0305-4608/15/4/009 [38] A.-B. Chen, Y.-M. Lai-Hsu, W. Chen, Phys. Rev. B 39 10.1038/npjquantmats.2016.1. [18] O. K. Andersen, O. Jepsen, Phys. Rev. Lett. 53 (1984) (1989) 923–929. doi:10.1103/PhysRevB.39.923. [39] A. K. Rajagopal, J. Callaway, Phys. Rev. B 7 (1973) 2571–2574. doi:10.1103/PhysRevLett.53.2571. [19] F. Caruso, P. Rinke, X. Ren, A. Rubio, M. Scheffler, 1912–1919. doi:10.1103/PhysRevB.7.1912. 69 [40] G. Vignale, M. Rasolt, Phys. Rev. Lett. 59 (1987) 2360– PhysRevB.52.R5467. 2363. doi:10.1103/PhysRevLett.59.2360. [67] M. T. Czy˙zyk,G. A. Sawatzky, Phys. Rev. B 49 (1994) [41] G. Vignale, M. Rasolt, Phys. Rev. B 37 (1988) 10685– 14211–14228. doi:10.1103/PhysRevB.49.14211. 10696. doi:10.1103/PhysRevB.37.10685. [68] A. G. Petukhov, I. I. Mazin, L. Chioncel, A. I. Licht- [42] H. Ebert, J. Phys. Condens. Matter. 1 (1989) 9111– enstein, Phys. Rev. B 67 (2003) 153106. doi:10.1103/ 9116. doi:10.1088/0953-8984/1/46/005. PhysRevB.67.153106. [43] I. Turek, V. Drchal, J. Kudrnovsk´y,M. Sob,˘ P. Wein- [69] V. I. Anisimov, O. Gunnarsson, Phys. Rev. B 43 (1991) berger, Electronic Structure of Disordered Alloys, Sur- 7570–7574. doi:10.1103/PhysRevB.43.7570. faces and Interfaces, Springer, Boston, 1997. [70] M. Cococcioni, S. de Gironcoli, Phys. Rev. B 71 (2005) [44] R. Feder, F. Rosicky, B. Ackermann, Z. Phys. B 52 035105. doi:10.1103/PhysRevB.71.035105. (1983) 31–36. doi:10.1007/BF01305895. [71] E. U. Condon, G. H. Shortley, The Theory of Atomic [45] J. Kudrnovsk´y,V. Drchal, Phys. Rev. B 41 (1990) Spectra, Cambridge University Press, Cambridge, 1953. 7515–7528. doi:10.1103/PhysRevB.41.7515. [72] Note that Eq.(2) in Ref. [73] incorrectly uses 50/429 [46] A. Devonport, A. Vishina, R. Singh, M. Edwards, (taken from Ref. [65]) instead of 1/858 as the contri- K. Zheng, J. Domenico, N. Rizzo, C. Kopas, M. van bution of F 6(ff) to the average J, thus the actual Schilfgaarde, N. Newman, J. Magn. Magn. Mater. 460 J used were about 25 % smaller than the ones listed. (2018) 193–202. doi:10.1016/j.jmmm.2018.03.054. However, since this did not correspond to the correct [47] V. Alena, Fully relativistic green’s functions method, average J, it makes an error in the double counting Ph.D. thesis, Kings College London (2019). term which shifts the f-bands somewhat relative to the [48] M. Abd El Qader, R. K. Singh, S. N. Galvin, L. Yu, other bands. Also the calculation used the Hartree-Fock J. M. Rowell, N. Newman, Appl. Phys. Lett. 104 (2014) unscreened atomic values of the Slater integrals F k for 022602. doi:10.1063/1.4862195. k ≥ 2. [49] V. P. Antropov, M. I. Katsnelson, B. N. Harmon, [73] P. Larson, W. R. L. Lambrecht, A. Chantis, M. van M. van Schilfgaarde, D. Kusnezov, Phys. Rev. B 54 Schilfgaarde, Phys. Rev. B 75 (2007) 045114. doi: (1996) 1019–1035. doi:10.1103/PhysRevB.54.1019. 10.1103/PhysRevB.75.045114. [50] M. van Schilfgaarde, I. A. Abrikosov, B. Johansson, [74] S. Keshavarz, J. Sch¨ott,A. J. Millis, Y. O. Kvashnin, Nature 400 (1999) 46–49. doi:10.1038/21848. Phys. Rev. B 97 (2018) 184404. doi:10.1103/PhysRevB. [51] A. Bulgac, D. Kusnezov, Phys. Rev. A 42 (1990) 5045– 97.184404. 5048. doi:10.1103/PhysRevA.42.5045. [75] S. L. Dudarev, G. A. Botton, S. Y. Savrasov, C. J. [52] T. Kotani, H. Kino, H. Akai, J. Phys. Soc. Jpn. 84 Humphreys, A. P. Sutton, Phys. Rev. B 57 (1998) (2015) 034702. doi:10.7566/JPSJ.84.034702. 1505–1509. doi:10.1103/PhysRevB.57.1505. [53] E. Bott, M. Methfessel, W. Krabs, P. C. Schmidt, [76] A. Boonchun, W. R. L. Lambrecht, Phys. Status J. Math. Phys. 39 (1998) 3393–3425. doi:10.1063/1. Solidi B 248 (2011) 1043–1051. doi:10.1002/pssb. 532437. 201046328. [54] T. Kotani, M. van Schilfgaarde, Phys. Rev. B 81 (2010) [77] F. Brivio, K. T. Butler, A. Walsh, M. van Schilfgaarde, 125117. doi:10.1103/PhysRevB.81.125117. Phys. Rev. B 89 (2014) 155204. doi:10.1103/PhysRevB. [55] J. M. Soler, A. R. Williams, Phys. Rev. B 40 (1989) 89.155204. 1560–1564. doi:10.1103/PhysRevB.40.1560. [78] A. Svane, N. E. Christensen, M. Cardona, A. N. Chan- [56] M. van Schilfgaarde, T. Kotani, S. V. Faleev, Phys. tis, M. van Schilfgaarde, T. Kotani, Phys. Rev. B 81 Rev. B 74 (2006) 245125. doi:10.1103/PhysRevB.74. (2010) 245120. doi:10.1103/PhysRevB.81.245120. 245125. [79] T. Kotani, J. Phys. Soc. Jpn. 83 (2014) 094711. doi: [57] D. J. Singh, L. Nordstr¨om,Planewaves, Pseudopoten- 10.7566/JPSJ.83.094711. tials and the LAPW Method, Second Edition, Springer, [80] N. E. Zein, S. Y. Savrasov, G. Kotliar, Phys. Rev. Berlin, 2006. Lett. 96 (2006) 226403. doi:10.1103/PhysRevLett.96. [58] J. Kune˘s,P. Nov´ak,R. Schmid, P. Blaha, K. Schwarz, 226403. Phys. Rev. B 64 (2001) 153102. doi:10.1103/PhysRevB. [81] K. Schwarz, P. Blaha, G. Madsen, Comput. Phys. Com- 64.153102. mun. 147 (2002) 71–76. doi:10.1016/S0010-4655(02) [59] C. Friedrich, M. C. M¨uller,S. Bl¨ugel,Phys. Rev. B 83 00206-0. (2011) 081101. doi:10.1103/PhysRevB.83.081101. [82] L. Hedin, Phys. Rev. 139 (1965) A796–A823. doi: [60] H. Jiang, P. Blaha, Phys. Rev. B 93 (2016) 115203. 10.1103/PhysRev.139.A796. doi:10.1103/PhysRevB.93.115203. [83] M. van Schilfgaarde, T. Kotani, S. Faleev, Phys. Rev. [61] J. Harris, Phys. Rev. B 31 (1985) 1770–1779. doi: Lett. 96 (2006) 226402. doi:10.1103/PhysRevLett.96. 10.1103/PhysRevB.31.1770. 226402. [62] W. M. C. Foulkes, R. Haydock, Phys. Rev. B 39 (1989) [84] T. Kotani, M. van Schilfgaarde, S. V. Faleev, Phys. 12520–12536. doi:10.1103/PhysRevB.39.12520. Rev. B 76 (2007) 165106. doi:10.1103/PhysRevB.76. [63] V. I. Anisimov, J. Zaanen, O. K. Andersen, Phys. Rev. 165106. B 44 (1991) 943–954. doi:10.1103/PhysRevB.44.943. [85] F. Aryasetiawan, O. Gunnarsson, Phys. Rev. B [64] V. I. Anisimov, I. V. Solovyev, M. A. Korotin, M. T. 49 (1994) 16214–16222. doi:10.1103/PhysRevB.49. Czy˙zyk,G. A. Sawatzky, Phys. Rev. B 48 (1993) 16929– 16214. 16934. doi:10.1103/PhysRevB.48.16929. [86] S. Acharya, C. Weber, E. Plekhanov, D. Pashov, [65] V. I. Anisimov, F. Aryasetiawan, A. I. Lichtenstein, A. Taraphder, M. Van Schilfgaarde, Phys. Rev. X 8 J. Phys. Condens. Matter. 9 (1997) 767–808. doi: (2018) 021038. doi:10.1103/PhysRevX.8.021038. 10.1088/0953-8984/9/4/002. [87] E. L. Shirley, Phys. Rev. B 54 (1996) 7758–7764. doi: [66] A. I. Liechtenstein, V. I. Anisimov, J. Zaanen, 10.1103/PhysRevB.54.7758. Phys. Rev. B 52 (1995) R5467–R5470. doi:10.1103/ [88] B. Holm, U. von Barth, Phys. Rev. B 57 (1998) 2108–

70 2117. doi:10.1103/PhysRevB.57.2108. 174433. [89] M. Grumet, P. Liu, M. Kaltak, J. Klimeˇs,G. Kresse, [113] T. Kotani, M. van Schilfgaarde, Phys. Rev. B 81 (2010) Phys. Rev. B 98 (2018) 155143. doi:10.1103/PhysRevB. 125201. doi:10.1103/PhysRevB.81.125201. 98.155143. [114] N. E. Christensen, A. Svane, R. Laskowski, B. Palanivel, [90] K. D. Belashchenko, V. P. Antropov, N. E. Zein, Phys. P. Modak, A. N. Chantis, M. van Schilfgaarde, Rev. B 73 (2006) 073105. doi:10.1103/PhysRevB.73. T. Kotani, Phys. Rev. B 81 (2010) 045203. doi: 073105. 10.1103/PhysRevB.81.045203. [91] D. Tamme, R. Schepe, K. Henneberger, Phys. Rev. Lett. [115] L. Sponza, P. Pisanti, A. Vishina, D. Pashov, C. Weber, 83 (1999) 241–241. doi:10.1103/PhysRevLett.83.241. M. van Schilfgaarde, S. Acharya, J. Vidal, G. Kotliar, [92] F. Aryasetiawan, O. Gunnarsson, Phys. Rev. Lett. Phys. Rev. B 95 (2017) 041112. doi:10.1103/PhysRevB. 74 (1995) 3221–3224. doi:10.1103/PhysRevLett.74. 95.041112. 3221. [116] B. Cunningham, M. Gr¨uning,P. Azarhoosh, D. Pashov, [93] M. Cazzaniga, H. Cercellier, M. Holzmann, C. Monney, M. van Schilfgaarde, Phys. Rev. Mater. 2 (2018) 034603. P. Aebi, G. Onida, V. Olevano, Phys. Rev. B 85 (2012) doi:10.1103/PhysRevMaterials.2.034603. 195111. doi:10.1103/PhysRevB.85.195111. [117] S. Chambers, T. Droubay, T. Kaspar, M. Gutowski, [94] R. Sakuma, P. Werner, F. Aryasetiawan, Phys. Rev. B M. van Schilfgaarde, Surf. Sci. 554 (2004) 81–89. doi: 88 (2013) 235110. doi:10.1103/PhysRevB.88.235110. 10.1016/j.susc.2004.02.021. [95] P. Liu, B. Kim, X.-Q. Chen, D. D. Sarma, G. Kresse, [118] M. Gatti, F. Bruneval, V. Olevano, L. Reining, C. Franchini, Phys. Rev. Mater. 2 (2018) 075003. doi: Phys. Rev. Lett. 99 (2007) 266402. doi:10.1103/ 10.1103/PhysRevMaterials.2.075003. PhysRevLett.99.266402. [96] F. Bruneval, T. Rangel, S. M. Hamed, M. Shao, [119] C. Weber, S. Acharya, B. Cunningham, M. Gr¨uning, C. Yang, J. B. Neaton, Comput. Phys. Commun. 208 L. Zhang, H. Zhao, Y. Tan, Y. Zhang, C. Zhang, K. Liu, (2016) 149–161. doi:10.1016/j.cpc.2016.06.019. M. V. Schilfgaarde, M. Shalaby, arXiv:1901.08139[link]. [97] T. Olsen, K. S. Thygesen, Phys. Rev. B 86 (2012) URL https://arxiv.org/abs/1901.08139v1 081103. doi:10.1103/PhysRevB.86.081103. [120] Y. Tan, H. Zhao, L. Z. Yan, L. Zhang, Y. Zhang, [98] F. Bruneval, J. Chem. Phys. 136 (2012) 194107. doi: C. Weber, S. Acharya, B. Cunningham, M. Grun- 10.1063/1.4718428. ing, K. Liu, M. V. Schilfgaarde, M. Shalaby, Possi- [99] P. Koval, D. Foerster, D. S´anchez-Portal, Phys. Rev. B ble phonon-induced electronic bi-stability in vo2 for 89 (2014) 155417. doi:10.1103/PhysRevB.89.155417. ultrafast memory at room temperature, in: 2019 [100] N. Salas-Illanes, Means of the QSGW method within 44th International Conference on Infrared, Millime- the LAPW+LO framework, Ph.D. thesis, Humboldt- ter, and Terahertz Waves (IRMMW-THz), 2019, pp. Universit¨atzu Berlin (2019). 1–1. doi:10.1109/IRMMW-THz.2019.8874246. [101] S. Ismail-Beigi, J. Phys. Condens. Matter. 29 (2017) [121] M. van Schilfgaarde, M. I. Katsnelson, Phys. Rev. B 385501. doi:10.1088/1361-648X/aa7803. 83 (2011) 081409. doi:10.1103/PhysRevB.83.081409. [102] A. L. Kutepov, Phys. Rev. B 95 (2017) 195120. doi: [122] W. Jauch, M. Reehuis, Phys. Rev. B 65 (2002) 125111. 10.1103/PhysRevB.95.195120. doi:10.1103/PhysRevB.65.125111. [103] I. Aguilera, C. Friedrich, S. Bl¨ugel,Phys. Rev. B 91 [123] W. Jauch, M. Reehuis, Phys. Rev. B 70 (2004) 195121. (2015) 125129. doi:10.1103/PhysRevB.91.125129. doi:10.1103/PhysRevB.70.195121. [104] R. Shaltaf, G.-M. Rignanese, X. Gonze, F. Giustino, [124] S. Albrecht, L. Reining, R. Del Sole, G. Onida, A. Pasquarello, Phys. Rev. Lett. 100 (2008) 186401. Phys. Rev. Lett. 80 (1998) 4510–4513. doi:10.1103/ doi:10.1103/PhysRevLett.100.186401. PhysRevLett.80.4510. [105] M. Shishkin, M. Marsman, G. Kresse, Phys. Rev. [125] M. Rohlfing, S. G. Louie, Phys. Rev. Lett. 81 (1998) Lett. 99 (2007) 246403. doi:10.1103/PhysRevLett.99. 2312–2315. doi:10.1103/PhysRevLett.81.2312. 246403. [126] B. Cunningham, M. Gr¨uning,D. Pashov, M. van Schil- [106] M. Betzinger, C. Friedrich, A. G¨orling,S. Bl¨ugel, Phys. fgaarde. Rev. B 92 (2015) 245101. doi:10.1103/PhysRevB.92. [127] H. Fr¨ohlich, Adv. Phys. 3 (1954) 325–361. doi:10. 245101. 1080/00018735400101213. [107] A. N. Chantis, M. van Schilfgaarde, T. Kotani, [128] W. R. L. Lambrecht, C. Bhandari, M. van Schilfgaarde, Phys. Rev. Lett. 96 (2006) 086405. doi:10.1103/ Phys. Rev. Mater. 1 (2017) 043802. doi:10.1103/ PhysRevLett.96.086405. PhysRevMaterials.1.043802. [108] A. N. Chantis, M. Cardona, N. E. Christensen, D. L. [129] M. Cardona, M. L. W. Thewalt, Rev. Mod. Phys. 77 Smith, M. van Schilfgaarde, T. Kotani, A. Svane, R. C. (2005) 1173–1224. doi:10.1103/RevModPhys.77.1173. Albers, Phys. Rev. B 78 (2008) 075208. doi:10.1103/ [130] C. Bhandari, M. van Schilgaarde, T. Kotani, W. R. L. PhysRevB.78.075208. Lambrecht, Phys. Rev. Mater. 2 (2018) 013807. doi: [109] J.-W. Luo, A. N. Chantis, M. van Schilfgaarde, 10.1103/PhysRevMaterials.2.013807. G. Bester, A. Zunger, Phys. Rev. Lett. 104 (2010) [131] A. Gr¨uneis,G. Kresse, Y. Hinuma, F. Oba, Phys. Rev. 066405. doi:10.1103/PhysRevLett.104.066405. Lett. 112 (2014) 096401. doi:10.1103/PhysRevLett. [110] T. Kotani, M. van Schilfgaarde, J. Phys. Condens. 112.096401. Matter. 20 (2008) 295214. doi:10.1088/0953-8984/ [132] M. Heiblum, M. V. Fischetti, W. P. Dumke, D. J. 20/29/295214. Frank, I. M. Anderson, C. M. Knoedler, L. Osterling, [111] L. Ke, M. van Schilfgaarde, J. Pulikkotil, T. Kotani, Phys. Rev. Lett. 58 (1987) 816–819. doi:10.1103/ V. Antropov, Phys. Rev. B 83 (2011) 060404. doi: PhysRevLett.58.816. 10.1103/PhysRevB.83.060404. [133] T. Ruf, M. Cardona, Phys. Rev. B 41 (1990) 10747– [112] S. V. Faleev, O. N. Mryasov, M. van Schilfgaarde, Phys. 10753. doi:10.1103/PhysRevB.41.10747. Rev. B 85 (2012) 174433. doi:10.1103/PhysRevB.85. [134] M. Cardona, C. M. Penchina, N. Shevchik, J. Tejeda,

71 Solid State Commun. 11 (1972) 1655–1658. doi:10. [161] Private communication. 1016/0038-1098(72)90764-8. [162] F. Aryasetiawan, M. Imada, A. Georges, G. Kotliar, [135] A. N. Chantis, M. van Schilfgaarde, T. Kotani, Phys. S. Biermann, A. I. Lichtenstein, Phys. Rev. B 70 (2004) Rev. B 76 (2007) 165126. doi:10.1103/PhysRevB.76. 195104. doi:10.1103/PhysRevB.70.195104. 165126. [163] E. S¸a¸sioˇglu,C. Friedrich, S. Bl¨ugel,Phys. Rev. B 83 [136] T. Moriya, Spin Fluctuations in Itinerant Electron (2011) 121101. doi:10.1103/PhysRevB.83.121101. Magnetism, Springer, Berlin, 1985. [164] A. Georges, in: E. Pavarini, E. Koch, D. Vollhardt, [137] A. Aguayo, I. Mazin, D. Singh, Phys. Rev. Lett. 92 A. Lichtenstein (Eds.), DMFT at 25: Infinite dimen- (2004) 147201. doi:10.1103/PhysRevLett.92.147201. sions: lecture notes of the Autumn School on Corre- [138] Y. Nishihara, M. Tokumoto, K. Murata, H. Unoki, Jpn. lated Electrons 2014, Forschungszentrum J¨ulich, J¨ulich, J. Appl. Phys. 26 (1987) L1416–L1418. doi:10.1143/ 2014. JJAP.26.L1416. [165] T. Maier, M. Jarrell, T. Pruschke, M. H. Hettler, [139] S. Acharya, D. Pashov, F. Jamet, M. van Schilfgaarde, Rev. Mod. Phys. 77 (2005) 1027–1080. doi:10.1103/ arXiv:1908.08136v1. RevModPhys.77.1027. URL https://arxiv.org/abs/1908.08136v1 [166] E. Gull, A. J. Millis, A. I. Lichtenstein, A. N. Rubtsov, [140] A. Zunger, S.-H. Wei, L. G. Ferreira, J. E. Bernard, M. Troyer, P. Werner, Rev. Mod. Phys. 83 (2011) 349– Phys. Rev. Lett. 65 (1990) 353–356. doi:10.1103/ 404. doi:10.1103/RevModPhys.83.349. PhysRevLett.65.353. [167] H. Okumura, K. Sato, T. Kotani, Phys. Rev. B 100 [141] F. Aryasetiawan, K. Karlsson, Phys. Rev. B 60 (1999) (2019) 054419. doi:10.1103/PhysRevB.100.054419. 7419–7428. doi:10.1103/PhysRevB.60.7419. [168] V. Antropov, J. Magn. Magn. Mater. 262 (2003) L192– [142] E. S¸a¸sioˇglu,A. Schindlmayr, C. Friedrich, F. Freimuth, L197. doi:10.1016/S0304-8853(03)00206-3. S. Bl¨ugel,Phys. Rev. B 81 (2010) 054434. doi:10. [169] A. Liechtenstein, M. Katsnelson, V. Antropov, 1103/PhysRevB.81.054434. V. Gubanov, J. Magn. Magn. Mater. 67 (1987) 65–74. [143] W. F. Brinkman, T. M. Rice, Phys. Rev. B 2 (1970) doi:10.1016/0304-8853(87)90721-9. 4302–4304. doi:10.1103/PhysRevB.2.4302. [170] V. Antropov, B. Harmon, A. Smirnov, J. Magn. [144] M. C. Gutzwiller, Phys. Rev. 137 (1965) A1726–A1735. Magn. Mater. 200 (1999) 148–166. doi:10.1016/ doi:10.1103/PhysRev.137.A1726. S0304-8853(99)00425-4. [145] S. E. Barnes, J. Phys. F 7 (1977) 2637–2647. doi: [171] M. van Schilfgaarde, V. P. Antropov, J. Appl. Phys. 10.1088/0305-4608/7/12/022. 85 (1999) 4827–4829. doi:10.1063/1.370495. [146] P. Coleman, Phys. Rev. B 29 (1984) 3035–3044. doi: [172] A. Franceschetti, S. V. Dudiy, S. V. Barabash, 10.1103/PhysRevB.29.3035. A. Zunger, J. Xu, M. van Schilfgaarde, Phys. Rev. [147] G. Kotliar, A. E. Ruckenstein, Phys. Rev. Lett. Lett. 97 (2006) 047202. doi:10.1103/PhysRevLett.97. 57 (1986) 1362–1365. doi:10.1103/PhysRevLett.57. 047202. 1362. [173] J. Okabayashi, M. Mizuguchi, K. Ono, M. Oshima, [148] T. Li, P. W¨olfle,P. J. Hirschfeld, Phys. Rev. B 40 A. Fujimori, H. Kuramochi, H. Akinaga, Phys. Rev. B (1989) 6817–6821. doi:10.1103/PhysRevB.40.6817. 70 (2004) 233305. doi:10.1103/PhysRevB.70.233305. [149] F. Lechermann, A. Georges, G. Kotliar, O. Parcollet, [174] M. E. Rose, Elementary Theory of Angular Momentum, Phys. Rev. B 76 (2007) 155102. doi:10.1103/PhysRevB. Wiley, New York, 1957. 76.155102. [175] Z. H. Levine, D. C. Allan, Phys. Rev. Lett. 63 (1989) [150] H. Hasegawa, J. Phys. Soc. Jpn. 66 (1997) 1391–1397. 1719–1722. doi:10.1103/PhysRevLett.63.1719. doi:10.1143/JPSJ.66.1391. [176] R. Del Sole, R. Girlanda, Phys. Rev. B 48 (1993) 11789– [151] W. Metzner, D. Vollhardt, Phys. Rev. Lett. 62 (1989) 11795. doi:10.1103/PhysRevB.48.11789. 324–327. doi:10.1103/PhysRevLett.62.324. [177] A. R. H. Preston, A. DeMasi, L. F. J. Piper, K. E. [152] B. Menge, E. Muller-Hartmann, Z. Phys. B 82 (1991) Smith, W. R. L. Lambrecht, A. Boonchun, T. Chei- 237–242. doi:10.1007/BF01324333. wchanchamnangij, J. Arnemann, M. van Schilfgaarde, [153] U. Brandt, C. Mielsch, Z. Phys. B 75 (1989) 365–370. B. J. Ruck, Phys. Rev. B 83 (2011) 205106. doi: doi:10.1007/BF01321824. 10.1103/PhysRevB.83.205106. [154] M. Jarrell, Phys. Rev. Lett. 69 (1992) 168–171. doi: [178] F. Bruneval, N. Vast, L. Reining, M. Izquierdo, 10.1103/PhysRevLett.69.168. F. Sirotti, N. Barrett, Phys. Rev. Lett. 97 (2006) [155] A. Georges, G. Kotliar, Phys. Rev. B 45 (1992) 6479– 267601. doi:10.1103/PhysRevLett.97.267601. 6483. doi:10.1103/PhysRevB.45.6479. [179] C. Aversa, J. E. Sipe, Phys. Rev. B 52 (1995) 14636– [156] A. Georges, G. Kotliar, W. Krauth, M. J. Rozenberg, 14645. doi:10.1103/PhysRevB.52.14636. Rev. Mod. Phys. 68 (1996) 13–125. doi:10.1103/ [180] J. E. Sipe, E. Ghahramani, Phys. Rev. B 48 (1993) RevModPhys.68.13. 11705–11722. doi:10.1103/PhysRevB.48.11705. [157] V. I. Anisimov, A. I. Poteryaev, M. A. Korotin, A. O. [181] J. E. Sipe, A. I. Shkrebtii, Phys. Rev. B 61 (2000) Anokhin, G. Kotliar, J. Phys. Condens. Matter. 9 (1997) 5337–5352. doi:10.1103/PhysRevB.61.5337. 7359–7367. doi:10.1088/0953-8984/9/35/010. [182] S. N. Rashkeev, W. R. L. Lambrecht, B. Segall, Phys. [158] N. Marzari, D. Vanderbilt, Phys. Rev. B 56 (1997) Rev. B 57 (1998) 3905–3919. doi:10.1103/PhysRevB. 12847–12865. doi:10.1103/PhysRevB.56.12847. 57.3905. [159] F. Lechermann, A. Georges, A. Poteryaev, S. Biermann, [183] H. Park, The study of two-particle response functions in M. Posternak, A. Yamasaki, O. K. Andersen, Phys. strongly correlated electron systems within the dynami- Rev. B 74 (2006) 125120. doi:10.1103/PhysRevB.74. cal mean field theory, Ph.D. thesis, Rutgers University- 125120. Graduate School-New Brunswick (2011). [160] K. Haule, C.-H. Yee, K. Kim, Phys. Rev. B 81 (2010) [184] S. Acharya, D. Pashov, C. Weber, H. Park, L. Sponza, 195107. doi:10.1103/PhysRevB.81.195107. M. van Schilfgaarde, arXiv:1811.05143 (accepted for

72 publication in Communications Physics). URL https://arxiv.org/abs/1811.05143v1 [185] T. Suzuki, T. Goto, K. Chiba, T. Shinoda, T. Fukase, H. Kimura, K. Yamada, M. Ohashi, Y. Yamaguchi, Phys. Rev. B 57 (1998) R3229–R3232. doi:10.1103/ PhysRevB.57.R3229. [186] M. Casula, A. Rubtsov, S. Biermann, Phys. Rev. B 85 (2012) 035115. doi:10.1103/PhysRevB.85.035115. [187] S. Laricchia, N. Bonini, M. van Schilfgaarde, in: Bul- letin of the American Physical Society, American Phys- ical Society, 2019, p. P22.00011. [188] N. V. Prokof’ev, B. V. Svistunov, Phys. Rev. Lett. 81 (1998) 2514–2517. doi:10.1103/PhysRevLett.81. 2514. [189] D. Pashov, HECToR dCSE Report. [190] M. van Schilfgaarde, N. Newman, Phys. Rev. Lett. 65 (1990) 2728–2731. doi:10.1103/PhysRevLett.65. 2728. [191] M. Asta, D. de Fontaine, M. van Schilfgaarde, M. Sluiter, M. Methfessel, Phys. Rev. B 46 (1992) 5055–5072. doi:10.1103/PhysRevB.46.5055. [192] M. van Schilfgaarde, I. A. Abrikosov, B. Johansson, Nature 400 (1999) 46–49. doi:10.1038/21848. [193] A. T. Paxton, M. v. Schilfgaarde, M. MacKenzie, A. J. Craven, J. Phys. Condens. Matter. 12 (2000) 729–750. doi:10.1088/0953-8984/12/5/319. [194] T. Kotani, M. van Schilfgaarde, Solid State Commun. 121 (2002) 461–465. doi:10.1016/S0038-1098(02) 00028-5. [195] T. Miyake, F. Aryasetiawan, T. Kotani, M. van Schilf- gaarde, M. Usuda, K. Terakura, Phys. Rev. B 66 (2002) 245103. doi:10.1103/PhysRevB.66.245103. [196] S. Krishnamurthy, M. van Schilfgaarde, N. Newman, Appl. Phys. Lett. 83 (2003) 1761–1763. doi:10.1063/ 1.1606873. [197] J. P. Velev, K. D. Belashchenko, D. A. Stewart, M. van Schilfgaarde, S. S. Jaswal, E. Y. Tsymbal, Phys. Rev. Lett. 95 (2005) 216601. doi:10.1103/PhysRevLett.95. 216601. [198] A. L. Wysocki, R. F. Sabirianov, M. van Schilfgaarde, K. D. Belashchenko, Phys. Rev. B 80 (2009) 224423. doi:10.1103/PhysRevB.80.224423. [199] J. J. Krich, B. I. Halperin, Phys. Rev. Lett. 98 (2007) 226802. doi:10.1103/PhysRevLett.98.226802. [200] A. M. A. Leguy, P. Azarhoosh, M. I. Alonso, M. Campoy-Quiles, O. J. Weber, J. Yao, D. Bryant, M. T. Weller, J. Nelson, A. Walsh, M. van Schilf- gaarde, P. R. F. Barnes, Nanoscale 8 (2016) 6317–6327. doi:10.1039/C5NR05435D. [201] J. D. Jackson, Classical electrodynamics, Wiley, New York, 1999.

73