Perspective: Methods for large-scale density functional calculations on metallic systems Jolyon Aarons, Misbah Sarwar, David Thompsett, and Chris-Kriton Skylaris

Citation: J. Chem. Phys. 145, 220901 (2016); doi: 10.1063/1.4972007
View online: http://dx.doi.org/10.1063/1.4972007
Published by the American Institute of Physics

Perspective: Methods for large-scale density functional calculations on metallic systems

Jolyon Aarons,1 Misbah Sarwar,2 David Thompsett,2 and Chris-Kriton Skylaris1,a)
1 School of Chemistry, University of Southampton, Southampton SO17 1BJ, United Kingdom
2 Johnson Matthey Technology Centre, Sonning Common, Reading, United Kingdom

(Received 14 July 2016; accepted 28 November 2016; published online 15 December 2016)

Current research challenges in areas such as energy and bioscience have created a strong need for Density Functional Theory (DFT) calculations on metallic nanostructures of hundreds to thousands of atoms to provide understanding at the atomic level in technologically important processes such as catalysis and magnetic materials. Linear-scaling DFT methods for calculations with thousands of atoms on insulators are now reaching a level of maturity. However, such methods are not applicable to metals, where the continuum of states through the chemical potential and their partial occupancies provide significant hurdles which have yet to be fully overcome. Within this perspective we outline the theory of DFT calculations on metallic systems with a focus on methods for large-scale calculations, as required for the study of metallic nanoparticles. We present early approaches for electronic energy minimization in metallic systems as well as approaches which can impose partial state occupancies from a thermal distribution without access to the electronic Hamiltonian eigenvalues, such as the classes of Fermi operator expansions and integral expansions. We then focus on the significant progress which has been made in the last decade with developments which promise to better tackle the length-scale problem in metals. We discuss the challenges presented by each method, the likely future directions that could be followed and whether an accurate linear-scaling DFT method for metals is in sight. Published by AIP Publishing. [http://dx.doi.org/10.1063/1.4972007]

I. INTRODUCTION

Electronic structure theory calculations using the Density Functional Theory (DFT) approach are widely used to compute and understand the chemical and physical properties of molecules and materials. The study of metallic systems, in particular, is an important area for the employment of DFT simulations as there is a broad range of practical applications. These applications range from the study of bulk metals and surfaces to the study of metallic nanoparticles, which is a rapidly growing area of research due to its technological relevance.¹

For example, metallic nanoparticles have optical properties that are tunable and entirely different from those of the bulk material as their interaction with light is determined by their quantization of energy levels and surface plasmons. Due to their tunable optical properties, metal nanoparticles have found numerous applications in biodiagnostics²,³ as sensitive markers, such as for the detection of DNA by Au nanoparticle markers. Another very promising area of metallic nanoparticle usage concerns their magnetic properties which intricately depend not only on their size but also on their geometry,⁴ and can exhibit effects such as giant magnetoresistance.⁵,⁶ However, by far the domain in which metallic nanoparticles have found most application so far is the area of heterogeneous catalysis. The catalytic cracking of hydrocarbons produces the fuels we use, the vehicles we drive contain catalysts to control the emissions released, and the production of certain foodstuffs also relies on catalytic processes, all of which can use metallic nanoparticles. Catalysis also has a significant role to play in Proton Exchange Membrane (PEM) fuel cells, for example, offering a promising source of clean energy, producing electricity by the electrochemical conversion of hydrogen and oxygen to water. Either monometallic or alloyed, metallic nanoparticles are used as catalysts in these processes and anchored to a support such as oxide or carbon. The size, shape, and composition (e.g., core-shell, bulk alloy, and segregated structure)⁷ of these nanoparticles influence their chemical properties, and catalytically important sizes of nanoparticles (diameters of 2-10 nm) can consist of hundreds to thousands of atoms.

Even though extended infinite surface slabs of one type of crystal plane are often used as models,⁸ which correspond to the limit of very large nanoparticles (≳5 nm), we can see from Figure 1 that this limit is not reached before nanoparticles with thousands of atoms are considered. The slab model has been applied successfully in screening metal and metal alloys for a variety of reactions, providing guidance on how to improve current catalysts or identifying novel materials or compositions.⁹ However, these types of models, while providing useful insight, do not capture the complexity of the nanoparticle,¹⁰ for example, the effect the particle size has on properties or edge effects between different crystal planes. The influence of the support can modify the electronic structure and geometry of the nanoparticle as well as cause “spillover effects” which may, in turn, affect catalytic activity.¹¹ Metallic nanoparticles are also dynamic, and interaction with adsorbates can induce a change in their shape and composition, which can modify their behaviour. A fundamental understanding of how catalytic reactions occur on the surfaces of these catalysts is crucial for improving their performance, and DFT simulations play a key role.¹²–¹⁶

FIG. 1. The first 6 “magic numbers” of platinum cuboctahedral nanoparticles showing how the number of atoms scales cubically with the diameter; 5 nm is not reached until the nanoparticle has 2869 atoms.

The need to understand and control the rich and unique physical and chemical properties of metallic nanoparticles provides the motivation for the development of suitable DFT methods for their study. Conventional DFT approaches used to model metallic slabs are unsuited to modelling metallic nanoparticles larger than ∼100 atoms as they are computationally very costly with an increasing number of atoms. Therefore, the development of DFT methods for metals with reduced (ideally linear) scaling of computational effort with the number of atoms is essential to modelling nanoparticles of appropriate sizes for the applications mentioned above. Such methods also allow for the introduction of increased complexity into the models, such as the effect of the support or the influence of the environment such as the solvent.

In this perspective, we present an introduction to methods for large-scale DFT simulations of metallic systems with metallic nanoparticle applications in mind. We provide the key ideas behind the various classes of such methods and discuss their computational demands. We conclude with some thoughts about likely future developments in this area.

a) Electronic address: [email protected]

II. DENSITY FUNCTIONAL THEORY FOR METALLIC SYSTEMS

Density functional theory has become the method of choice for electronic structure calculations of materials and molecules due to its ability to provide sufficient accuracy for practical simulations combined with manageable computational cost. DFT can be performed in its original formulation by Hohenberg and Kohn¹⁷ where observables such as the energy are computed as functionals of the electronic density. This approach is attractive due to its computational simplicity, but its accuracy is limited due to the lack of generally applicable high accuracy approximations for the kinetic energy functional. Thus such “orbital-free” DFT calculations are not usually accurate for chemical properties. However, this is an area of ongoing developments¹⁸ and there have been applications in specific cases such as the physical properties of liquid Li.¹⁹

The majority of DFT calculations are performed with the Kohn-Sham approach which describes the energy of the system as a functional of the electronic density $n(\mathbf{r})$ and molecular orbitals $\{\psi_i\}$,

$$E[n] = \sum_i f_i \langle \psi_i | \hat{T} | \psi_i \rangle + \int \upsilon_{\mathrm{ext}}(\mathbf{r})\, n(\mathbf{r})\, d\mathbf{r} + E_H[n] + E_{xc}[n], \tag{1}$$

where the terms on the right are the kinetic energy, the external potential energy, the Hartree energy, and the exchange-correlation energy of the electrons, respectively. The molecular orbitals $\{\psi_i\}$ are the solutions of a Schrödinger equation for a fictitious system of non-interacting particles,

$$\left[ -\frac{1}{2}\nabla^2 + \hat{\upsilon}_{\mathrm{KS}} \right] \psi_i(\mathbf{r}) = \varepsilon_i \psi_i(\mathbf{r}), \tag{2}$$

where $\varepsilon_i$ are the energy levels and where the effective potential $\hat{\upsilon}_{\mathrm{KS}}$ has been constructed in such a way that the electronic density of the fictitious non-interacting system

$$n(\mathbf{r}) = \sum_i f_i |\psi_i(\mathbf{r})|^2 \tag{3}$$

is the same as the electronic density of the system of interest of interacting particles.

We need to point out that the solution of the Kohn-Sham equations (2) is not a trivial process as the Kohn-Sham potential $\hat{\upsilon}_{\mathrm{KS}}[n]$ is a functional of the density $n$ which in turn depends on the occupancies $f_i$ and the one-particle wavefunctions $\psi_i$ via Equation (3). Thus the Kohn-Sham equations are non-linear and in practice they need to be solved iteratively until the wavefunctions, occupancies, and the density no longer change with respect to each other, which is what is termed a self-consistent solution to these equations. Typically this solution is obtained by a Self-Consistent-Field (SCF) process where $\hat{\upsilon}_{\mathrm{KS}}[n]$ is built from the current approximation to the density; then the Kohn-Sham equations are solved to obtain new wavefunctions and occupancies to build a new density; from that new density a new $\hat{\upsilon}_{\mathrm{KS}}[n]$ is constructed and these iterations continue until convergence.

For a non-spin-polarized system, the occupancies $f_i$ are either 2 or 0, depending on whether the orbitals are occupied or not. The extension of the equations to spin polarization, with occupancies 1 or 0, is trivial. This formulation of DFT is suitable for calculations on materials with a bandgap (or HOMO-LUMO gap in molecules), and a wide range of algorithms have been developed for the efficient numerical solution of these equations.

The absence of a gap at the Fermi level of metallic systems makes the application of DFT approaches for insulators unsuitable for metallic systems. The extension of DFT to finite electronic temperature by Mermin can overcome this limitation by providing a canonical ensemble treatment of the electrons. In this approach, the existence of a universal functional $F_T[n]$ of the electronic density for the canonical ensemble electronic system at temperature $T$ is shown, and the Helmholtz free energy of the electronic system is written as

$$A[n] = F_T[n] + \int \upsilon_{\mathrm{ext}}(\mathbf{r})\, n(\mathbf{r})\, d\mathbf{r}, \tag{4}$$

where $\upsilon_{\mathrm{ext}}(\mathbf{r})$ is the external potential and $F_T[n]$ contains the kinetic energy, the electron-electron interaction energy, and the entropy of the electronic canonical ensemble.

A Kohn-Sham mapping of canonical ensemble DFT to a system of non-interacting electrons can be carried out by analogy with the derivation of standard Kohn-Sham DFT for zero electronic temperature. In this description, the electronic system is represented by the single particle states (molecular orbitals) $\{\psi_i\}$ which are solutions of a Kohn-Sham eigenvalue equation, such as (2). The electronic density is constructed from all the single particle states,

$$n(\mathbf{r}) = \sum_i f_i \psi_i(\mathbf{r}) \psi_i^*(\mathbf{r}), \tag{5}$$

where the fractional occupancies $f_i$ of the states follow the Fermi-Dirac distribution,

$$f_i^{(\mathrm{FD})}(\varepsilon_i) = \left[ 1 + \exp\!\left( \frac{\varepsilon_i - \mu}{\sigma} \right) \right]^{-1}, \tag{6}$$

where $\mu$ is the chemical potential and $\sigma = k_B T$, where $T$ is the electronic temperature and $k_B$ is the Boltzmann constant. For this distribution of occupancies, the electronic entropy is given by

$$S(\{f_i\}) = -k_B \sum_i \left[ f_i \ln(f_i) + (1 - f_i) \ln(1 - f_i) \right]. \tag{7}$$

As in the zero temperature case, the non-interacting system is constructed to have the same density as that of the interacting system. The Helmholtz free energy of the interacting electronic system is expressed as

$$A[T, \{\varepsilon_i\}, \{\psi_i\}] = \sum_i f_i \langle \psi_i | \hat{T} | \psi_i \rangle + \int \upsilon_{\mathrm{ext}}(\mathbf{r})\, n(\mathbf{r})\, d\mathbf{r} + E_H[n] + E_{xc}[n] - T S[\{f_i\}], \tag{8}$$

and consists of the kinetic energy of the non-interacting electrons, and the known expressions for the external potential energy and Hartree energy of the electrons and the unknown exchange-correlation energy expression. Also, the entropic contribution to the electronic free energy $-TS[\{f_i\}]$ is included. In practice, this is a functional not only of the density but also of the molecular orbitals (as they are needed for the calculation of the non-interacting kinetic energy) and the orbital energies, which determine their fractional occupancies.

Another aspect which is particularly relevant for DFT calculations of metallic systems is Brillouin zone sampling. Because of the extremely complicated Fermi surface in some metallic systems, incredibly dense k-point sampling must be done to sample the Brillouin zone adequately, for instance by using a Monkhorst-Pack grid,²⁰ or the VASP tetrahedron method.²¹ As the systems become larger, even in metallic systems, the k-point sampling becomes less demanding as the bands flatten.

Solving these canonical ensemble Kohn-Sham equations is not trivial and presents more difficulties than working with the zero temperature Kohn-Sham equations.²² One has now to determine, in principle, an infinite number of states (instead of just $N$ states in the zero temperature case), although we can in practice neglect the states whose energy is higher than a threshold beyond which the occupancies are practically zero. Another complication is the fact that most exchange-correlation functionals that are used in practice have been developed for zero temperature, so their behaviour and accuracy in a finite temperature calculation are not well understood.

Due to operations such as diagonalization, the computational effort to perform DFT calculations, whether on insulators or metals, formally increases with the third power in the number of atoms. This scaling constitutes a bottleneck in efforts to apply DFT calculations to more than a few hundred atoms, as is typically the case in problems involving biomolecules and nanostructures. Walter Kohn showed the path for removing this limitation for insulators with his theory of the “nearsightedness of electronic matter”²³ which states that the 1-particle density matrix (or equivalently the Wannier functions) decays exponentially in a system with a bandgap,

$$\rho(\mathbf{r}, \mathbf{r}') = \sum_i f_i \psi_i(\mathbf{r}) \psi_i^*(\mathbf{r}') \propto e^{-\gamma |\mathbf{r} - \mathbf{r}'|}. \tag{9}$$

Several linear-scaling DFT programs have been developed during the last couple of decades, based on reformulations of DFT in terms of the density matrix or localized (Wannier-like) functions²⁴–²⁸ and a wide array of techniques have been formulated to allow linear scaling calculations.²⁹,³⁰ Typically these methods take advantage of the exponential decay to construct highly sparse matrices and use operations such as sparse matrix multiplication and storage where the central processing unit (CPU) and memory used scale only linearly with system size.

However, for metallic systems, there is not yet a linear-scaling reformulation. The density matrix for metals is known to have algebraic rather than exponential decay at zero temperature and while exponential decay is recovered at finite temperature, the exponent and the temperature at which it becomes useful for linear-scaling have not been explored adequately for practical use.³¹

Due to the importance of calculations on metallic systems, methods have been developed with the aim of performing such calculations in a stable and efficient manner. Here we provide a review of the main classes of methods for such calculations with emphasis on recent developments that promise to reduce the scaling of the computational effort with the number of atoms, with the aim of simulating larger and more complex metallic systems. We conclude with a discussion of the future directions that could be followed towards the development of improved methods for large-scale (and eventually linear-scaling) calculations on metallic systems.

III. METHODS FOR DFT CALCULATIONS ON METALLIC SYSTEMS

A. Electronic smearing and density mixing

One method for minimising the electronic energy in a DFT calculation is known as direct inversion in the iterative subspace (DIIS). First introduced by Pulay for Hartree-Fock calculations,³²,³³ this method was later adapted for DFT calculations by Kresse and Furthmüller³⁴ and applied successfully to systems of up to 1000 metal atoms using the VASP code.³⁵ A similar technique has been implemented in the linear scaling DFT code CONQUEST,³⁶ and while not linear scaling for metals, the authors show how such techniques may be applied successfully to density matrix based DFT approaches and perform some operations in a linear scaling way.

The central assumption in this method is that a good approximation to the solution can be constructed as a linear combination of the approximate solutions of the previous $m$ iterations,

$$x^{i+1} = \sum_{j=0}^{m-1} \alpha_{i-j}\, x^{i-j}, \tag{10}$$

where $x$ can represent any of the variables of the solution, such as the Hamiltonian matrix, the density, or the one-particle wavefunctions. The DIIS method constructs a set of linear equations to solve which yield the expansion coefficients $\alpha_j$.

Kresse et al. discuss two uses for DIIS: RMM-DIIS, for the iterative diagonalization of a Hamiltonian, and Pulay mixing of densities. RMM-DIIS, which is a form of iterative diagonalization, allows for the first $N$ eigenpairs of a Hamiltonian matrix to be found without performing a full diagonalization of the whole matrix; detailed information can be found in Kresse and Furthmüller.³⁴ In the following paragraphs, we will discuss the Pulay mixing of densities.

Directly inputting the output density (from Equation (3)) into the next construction of the Hamiltonian can result in an unstable SCF procedure where large changes in the output density result from small changes in the input, known as charge-sloshing. Density mixing attempts to damp oscillations in the SCF procedure, by mixing densities from previous iterations with the density produced from the wavefunctions and occupancies at the current iteration (the output density).

The simplest approach is the linear mixing of densities where the new density, $n_{\mathrm{in}}^{i+1}(\mathbf{r})$, is constructed from the density of the previous iteration as

$$n_{\mathrm{in}}^{i+1}(\mathbf{r}) = \alpha\, n_{\mathrm{out}}^{i}(\mathbf{r}) + (1 - \alpha)\, n_{\mathrm{in}}^{i}(\mathbf{r}), \tag{11}$$

where $n_{\mathrm{in}}^{i}(\mathbf{r})$ and $n_{\mathrm{out}}^{i}(\mathbf{r})$ are the input and output SCF solutions at the $i$th iteration. A better option regarding stability and efficiency is to use a mixing scheme based upon a history of previous densities, such as Pulay’s DIIS procedure. Under the constraint of electron number conservation $\sum_i \alpha_i = 1$, the next input density is given as

$$n_{\mathrm{in}}^{i+1}(\mathbf{r}) = \sum_{j=0}^{m-1} \alpha_{i-j}\, n_{\mathrm{in}}^{i-j}(\mathbf{r}). \tag{12}$$

In DIIS, the mixing coefficients $\alpha_i$ are found by first considering the density residual,

$$R[n_{\mathrm{in}}^{i}(\mathbf{r})] = n_{\mathrm{out}}^{i}(\mathbf{r}) - n_{\mathrm{in}}^{i}(\mathbf{r}), \tag{13}$$

which can incidentally be used to reformulate linear mixing as

$$n_{\mathrm{in}}^{i+1}(\mathbf{r}) = n_{\mathrm{in}}^{i}(\mathbf{r}) + \alpha R[n_{\mathrm{in}}^{i}(\mathbf{r})]. \tag{14}$$

If the DIIS assumption is used that the residuals are linear in $n_{\mathrm{in}}(\mathbf{r})$, then

$$R[n_{\mathrm{in}}^{i+1}(\mathbf{r})] = \sum_{j=0}^{m-1} \alpha_{i-j}\, R[n_{\mathrm{in}}^{i-j}(\mathbf{r})], \tag{15}$$

and the Pulay mixing coefficients $\alpha_i$ which minimize the norm of the residual associated with the current iteration are given as

$$\alpha_i = \frac{\sum_j (A^{-1})_{ji}}{\sum_{kl} (A^{-1})_{kl}}, \tag{16}$$

where $A$ is the matrix with elements $a_{ij}$ given by the dot products of the residuals with index $i$ and $j$.

To apply such a scheme for metals, a preconditioner is additionally required to ensure that the convergence has an acceptable rate. The Kerker preconditioner damps long-range components in density changes more than short-range components³⁷ because, in metals, long-range changes in the density are often the cause of charge-sloshing effects,

$$G(k) = A \frac{k^2}{k^2 + k_0^2}, \tag{17}$$

where $A$ is a mixing weight and the parameter $k_0$ effectively defines a “length-scale” for what is meant by long-range. The $k$ are the wave vectors via which the density is represented. Thus $G(k)$ is used to multiply the $n_{\mathrm{in}}^{i+1}$ density in reciprocal space and thus damp the “charge-sloshing” that can occur at long wavelengths. Furthermore, as in all approaches for metals, a smeared occupancy distribution must be used.

For metallic systems, the choice of smearing function is also a major consideration. While a Fermi-Dirac occupation (6) can be used, many more options exist which exhibit advantages and disadvantages over Fermi-Dirac. The major benefit of Fermi-Dirac occupation is that the electronic smearing corresponds to a physical thermal distribution at temperature $T$. If thermally distributed electrons are not of interest and the free energies obtained will be “corrected” to approximate zero Kelvin energies, then any other sigmoidal distribution which converges to a step function in some limit might be used. A significant downside to Fermi-Dirac smearing is that the function tails off very slowly, so a large number of very slightly occupied conduction bands must be used to capture all of the occupied states fully.

Gaussian smearing solves the issue with the long tails of the distribution neatly. The smearing function is given by

$$f_i^{(\mathrm{G})}(\varepsilon_i) = \frac{1}{2} \left[ 1 - \mathrm{erf}\!\left( \frac{\varepsilon_i - \mu}{\sigma} \right) \right], \tag{18}$$

where a shifted and scaled error function is used for state occupancy. Using Gaussian smearing, therefore, means that calculations will need relatively fewer partially occupied conduction states to be effective. In this approach, the “smearing width,” $\sigma$, no longer has a physical interpretation and the free energy functional to be minimized becomes an analogue generalized free energy. Despite this, the approach has been used successfully for decades (it is the default smearing scheme in the CASTEP plane-wave DFT code, for instance) and the results can be effectively extrapolated to zero $\sigma$²² by using

$$E_{\sigma=0} = \frac{1}{2}\left( A' + E \right) + O(\sigma^2), \tag{19}$$

where $A'$ is the generalized free energy functional. This is a post hoc correction and hence is applied at the end of a calculation, while forces and stresses are not variational with respect to this unsmeared energy.

Another approach to recover non-smeared results for metals is to use a smearing function which knocks out the $\sigma$ dependence of the generalized entropy. First order Methfessel-Paxton Hermite polynomial smearing,³⁸

$$f_i^{(\mathrm{MP})}(\varepsilon_i) = \frac{1}{\sqrt{\pi}} \left[ \frac{3}{2} - \left( \frac{\varepsilon_i - \mu}{\sigma} \right)^{2} \right] e^{-\left( \frac{\varepsilon_i - \mu}{\sigma} \right)^{2}}, \tag{20}$$

has only a quartic dependence on $\sigma$, so that results obtained using this approach need not be extrapolated back to zero $\sigma$ but may be used directly. A significant disadvantage of this method is that it yields non-physical negative occupancies. This can lead to difficulties in finding the particle number conserving chemical potential, as the thermal distribution has degeneracies, but may also lead to more serious concerns such as areas of negative electron density.

Marzari-Vanderbilt “cold smearing”³⁹ solves all of these issues by using a form

$$f_i^{(\mathrm{MV})}(x_i) = \frac{1}{\sqrt{\pi}} \left( a x_i^3 - x_i^2 - \frac{3}{2} a x_i + \frac{3}{2} \right) e^{-x_i^2}, \tag{21}$$

where $x_i = (\varepsilon_i - \mu)/\sigma$ and $a$ is a free parameter for which the authors suggest a value of 0.5634.

Head-Gordon et al. have explored expansions of the various smearing functions and present comparisons of convergence with the order of the expansions and the number of operations involved.⁴⁰

B. Ensemble DFT (EDFT)

Density mixing is non-variational in the sense that it can produce “converged” solutions below the minimum, but more than this it can take a long time to reach convergence, or it can even be unstable without a reliable mixing scheme and preconditioner. This is particularly the case in systems with a large number of degrees of freedom. Ideally, a variational approach, where every step is guaranteed to lower the energy towards the ground state energy, would be preferred over a density mixing approach, if it could ensure that the ground state energy would always be reached through a stable progression. Marzari et al.⁴¹ proposed such a scheme in 1997, which in the literature has become familiar as Ensemble DFT (EDFT). We need to note that EDFT should not be confused with Mermin’s original finite temperature DFT formalism that is often referred to with the same name. A variational progression towards the ground state energy is achieved by decoupling the problems of optimising the 1-particle wavefunctions and of optimising the electronic occupancy of these wavefunctions with respect to the energy of the system. This is achieved by performing an occupancy optimization process with fixed wavefunctions at every step in the optimization of the wavefunctions themselves.

The Helmholtz free energy of Equation (8) is expressed in an equivalent way in terms of the occupancies of the molecular orbitals $A[T; \{f_i\}, \{\psi_i\}]$, due to the existence of a one-to-one mapping between the orbital energies and their occupancies. The method generalizes the occupancies to non-diagonal form $\{f'_{i,j}\}$ by working with molecular orbitals $\{\psi'_i\}$ which can be considered to be a unitary transformation of the orbitals in which the occupancies are diagonal. In practice, the following energy expression is used: $A[T; \{f'_{i,j}\}, \{\psi'_i\}]$. Working with this expression provides a stable direct energy minimization algorithm because the optimization of occupancies is not slowed down by nearly degenerate unitary rotations of the molecular orbitals. The optimization of the energy is done in two nested loops as follows: within the inner loop, the occupancy contribution to the energy is minimized and in the outer loop the orbital contribution is minimized using the projected functional

$$A[T, \{\psi'_i\}] = \min_{\{f'_{i,j}\}} A[T; \{\psi'_i\}; \{f'_{i,j}\}], \tag{22}$$

which allows an unconstrained optimization of both the orbitals and the occupancies.

Despite the obvious advantages of the Marzari method, one weakness is that the mapping from the occupancies to the unbounded range of orbital energies is very ill-conditioned, as typically there is a large number of occupancies close to zero that can map on to very different orbital energies.

Freysoldt et al.⁴² have attempted to address this issue by developing an equivalent scheme where one works directly with the molecular orbital energies instead of the occupancies. In the spirit of the Marzari approach, they employ a non-diagonal representation of orbital energies, which is the Hamiltonian matrix, and minimize the following functional:

$$A[T] = \min_{\{H'_{i,j}\}, \{\psi'_i\}} A[T; \{\psi'_i\}; \{H'_{i,j}\}]. \tag{23}$$

The combined minimization of $\{H'_{i,j}\}$ and $\{\psi'_i\}$ is performed using line searches along an augmented search direction in a joint space.

Alternatively, such a scheme can be done in two loops following the Marzari philosophy. In the outer, orbital loop, a preconditioned gradient of the functional with respect to orbitals is used, while in the Hamiltonian space a search direction between the current matrix and that corresponding to the non-self consistent energy minimum is used. The non-self consistent Hamiltonian is constructed by first calculating the occupancies $\{f_i\}$ from the Hamiltonian eigenvalues via an occupancy distribution function. A new electronic charge density can then be calculated as

$$n_{\mathrm{new}}(\mathbf{r}) = \sum_i f_i \psi_i(\mathbf{r}) \psi_i^*(\mathbf{r}), \tag{24}$$

and from this density a new (non self-consistent) Hamiltonian matrix $\tilde{H}$ is constructed which is the end-point to the inner loop line search for updating the Hamiltonian,

$$H_{ij}^{n+1} = (1 - \lambda) H_{ij}^{n} + \lambda \tilde{H}_{ij}^{n}. \tag{25}$$

Of course, as in the method of Marzari, in order to perform calculations with this approach a diagonalization of the Hamiltonian matrix must occur at each inner loop step, resulting in a cubically scaling algorithm.

Recently an EDFT method has been implemented within the linear-scaling DFT package ONETEP.⁴³ ONETEP uses a minimal set of non-orthogonal generalized Wannier functions (NGWFs) which are represented in terms of coefficients of a basis set of periodic sinc (psinc) functions and are strictly localized in space. The psinc basis functions are related to plane waves through a unitary rotation.
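As a numerical aside on the two-loop scheme of Eqs. (24) and (25), the occupancy step can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the implementation used in any of the codes cited here: the function names (`fermi_dirac`, `find_mu`, `entropy_over_kB`, `inner_step`) are hypothetical, occupancies are taken per spin channel (between 0 and 1), and plain bisection is swapped in as the root finder for the chemical potential, since the text does not specify one.

```python
import numpy as np

def fermi_dirac(eps, mu, sigma):
    """Eq. (6): occupancies for orbital energies eps at chemical potential mu."""
    x = (eps - mu) / sigma
    # Algebraically equal to 1 / (1 + exp(x)), but does not overflow for large |x|.
    return 0.5 * (1.0 - np.tanh(0.5 * x))

def find_mu(eps, n_elec, sigma, tol=1e-12):
    """Bisect for the chemical potential at which sum(f_i) equals n_elec.

    Assumption: bisection is used here for simplicity; the summed occupancy
    is monotonically increasing in mu, so the bracket below always works.
    """
    lo = eps.min() - 50.0 * sigma   # all states essentially empty
    hi = eps.max() + 50.0 * sigma   # all states essentially full
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fermi_dirac(eps, mid, sigma).sum() < n_elec:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def entropy_over_kB(f):
    """Eq. (7) divided by k_B; occupancies are clipped to avoid log(0)."""
    f = np.clip(f, 1e-300, 1.0 - 1e-16)
    return -np.sum(f * np.log(f) + (1.0 - f) * np.log(1.0 - f))

def inner_step(H, H_tilde, lam):
    """Eq. (25): move from the current H towards the non-self-consistent H_tilde."""
    return (1.0 - lam) * H + lam * H_tilde

if __name__ == "__main__":
    # Toy spectrum (hypothetical values): 8 levels, 3 electrons, sigma = 0.05.
    eps = np.linspace(-1.0, 1.0, 8)
    mu = find_mu(eps, n_elec=3.0, sigma=0.05)
    f = fermi_dirac(eps, mu, 0.05)
    print("mu =", mu, " sum(f) =", f.sum(), " S/k_B =", entropy_over_kB(f))
```

In a real calculation the energies `eps` would come from diagonalizing the current Hamiltonian, which is the cubically scaling step noted above; the sketch only covers the comparatively cheap mapping from eigenvalues to occupancies and the line-search update.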

ONETEP optimizes the NGWFs variationally with respect to the energy in a manner similar to the outer loop of the EDFT method. The psinc basis set allows a systematic convergence to the complete basis set limit with a single plane-wave cutoff energy parameter much like plane waves do for periodic calculations. Techniques such as PAW, which has gained favour in many codes for the efficient representation of core electrons, may also be applied to the spatially localized NGWFs $\{\phi_\alpha\}$.

ONETEP is usually employed for large systems of hundreds or thousands of atoms within the linear-scaling mode, but a step function occupancy distribution limits its applicability to insulating systems. To exploit near-sightedness of electronic matter, terms in the density matrix,

$$\rho(\mathbf{r}, \mathbf{r}') = \sum_n f_n \psi_n(\mathbf{r}) \psi_n^*(\mathbf{r}'), \tag{26}$$

which is also equal to the EDFT generalized occupancy matrix (if a Fermi-Dirac occupancy function is used), are truncated based on spatial separation and the rest of the DFT can be written in terms of this quantity as

$$n(\mathbf{r}) = \rho(\mathbf{r}, \mathbf{r}). \tag{27}$$

ONETEP constructs the density matrix as an expansion in the NGWFs as

$$\rho(\mathbf{r}, \mathbf{r}') = \sum_{\alpha\beta} \phi_\alpha(\mathbf{r})\, K^{\alpha\beta}\, \phi_\beta^*(\mathbf{r}'), \tag{28}$$

where $K^{\alpha\beta}$ is the generalized occupancy of the NGWFs, i.e., its eigenvalues are $\{f_i\}$, and is known as the density kernel. EDFT is implemented⁴⁴ by taking the Hamiltonian eigenvalue approach of Freysoldt et al., but also taking advantage of the localized nature of the NGWFs which obviously also requires dealing with non-orthogonality.

In ONETEP EDFT, the free energy functional is optimized first with respect to the Hamiltonian matrix, $\{H_{\alpha\beta}\}$, while keeping the NGWFs $\{\phi_\alpha\}$ fixed. The orbital contribution to the free energy can then be minimized by optimising a projected Helmholtz functional,

$$A'[T; \{\phi_\alpha\}] = \min_{\{H_{\alpha\beta}\}} A[T; \{H_{\alpha\beta}\}; \{\phi_\alpha\}] \tag{29}$$

with respect to $\{\phi_\alpha\}$.

In practice, the diagonalization of the Hamiltonian within the inner loop is done by orthogonalizing and then by solving a standard eigenvalue problem using parallel solvers. It can alternatively be done directly with a generalized parallel eigensolver, but to facilitate the expedient replacement of the eigensolver with an expansion method (discussed in Sec. III D), orthogonalization is performed first.

The orthogonalization step can be carried out in many ways: using Gram-Schmidt, Cholesky or Löwdin methods, or simply by taking the inverse of the overlap matrix and left-multiplying the Hamiltonian matrix by this to construct a new orthogonal Hamiltonian. The Löwdin approach requires the overlap matrix to the power 1/2 and −1/2. Due to the work of Jansík et al.,⁴⁵ it is possible to construct Löwdin factors in a linear-scaling way using a Newton-Schulz approach, without an expensive eigendecomposition.

The Hamiltonian matrix in the basis of Löwdin orthogonalized orbitals can be written as

$$H'_{ij} = \left( S^{-1/2} H S^{-1/2} \right)_{ij} \tag{30}$$

and subsequently, the Kohn-Sham Hamiltonian eigenvalues can be found by diagonalising this orthogonal matrix with a standard parallel eigenvalue solver.

The next step in this EDFT inner-loop method is to construct a non-self consistent density matrix from the Hamiltonian matrix. This density matrix can be represented as a function of the eigenvalues $\varepsilon_i$ of the Hamiltonian as

$$K^{\alpha\beta} = \sum_{i}^{N} M^{\alpha}{}_{i}\, f(\varepsilon_i)\, M^{\dagger}{}_{i}{}^{\beta}, \tag{31}$$

where the eigenvalues (band occupancies) of the finite-temperature density kernel, $K^{\alpha\beta}$, are given in terms of a smearing function (6). The matrix $M$ contains the eigenvectors of the Hamiltonian eigenproblem,

$$H_{\alpha\beta} M^{\beta}{}_{i} = S_{\alpha\beta} M^{\beta}{}_{i}\, \varepsilon_i. \tag{32}$$

Such eigenvalue based approaches will always scale as $O(N^3)$ as they employ matrix diagonalization algorithms. The construction of the Hamiltonian and the orthogonalization procedures are, however, linear-scaling, which significantly reduces the prefactor of this approach (see Fig. 2) versus a comparable EDFT implementation in conventional plane-wave codes.

FIG. 2. The scaling of the computational effort with the number of atoms in the EDFT metals method in ONETEP. The computationally demanding steps of the calculation such as the construction of the density dependent (DD) and density independent (DI) parts of the Hamiltonian and the NGWF gradient are linear-scaling and thus allow large numbers of atoms to be treated. However, there is a diagonalization step which is cubic-scaling and eventually dominates the calculation time. Reproduced with permission from J. Chem. Phys. 139(5), 054107 (2013). Copyright 2013 AIP Publishing LLC.

A minimization is performed in the space of Hamiltonians and a search direction is constructed as

$$\Delta_{\alpha\beta}^{(n)} = \tilde{H}_{\alpha\beta}^{(n)} - H_{\alpha\beta}^{(n)}, \tag{33}$$

where a new Hamiltonian matrix, $\tilde{H}$, has been constructed from the updated density kernel. The Hamiltonian can then be updated as

$$H_{\alpha\beta}^{(n+1)} = H_{\alpha\beta}^{(n)} + \lambda \Delta_{\alpha\beta}^{(n)}, \tag{34}$$

where $\lambda$ is a scalar line search parameter, so that the update has the same form as Eq. (25). Once the inner loop has converged, and the contribution to the energy from the Hamiltonian can no longer be reduced, the outer loop is resumed and this process continues to self-consistency. At self-consistency, i.e., when the calculation has converged, the Hamiltonian will commute with the density kernel.

C. Matrix inversion

In all of the methods which follow in Section III D, there is a need for matrix inversion or factorization. In fact, even if one needs an orthogonal representation of the Hamiltonian matrix in general, or for computing derivatives of molecular orbitals for optimizing these orbitals, an inverse or factorization of the overlap matrix is necessary.

The overlap matrix of strictly localized functions is itself a highly sparse matrix, since spatial separation of atoms in systems with many atoms ensures that most matrix elements will be zero and hence $S$ is localized. As for the inverse overlap sparsity of insulators, this can be shown to be exponentially localized by treating $S$ as a Hamiltonian matrix and taking its Green’s function at a shift of zero.⁴⁶ Since Green’s functions are always exponentially localized at shifts outside of the eigenvalue range of the “Hamiltonian” (the overlap matrix is positive definite), the inverse overlap matrix too is exponentially localized. Nunes and Vanderbilt⁴⁶ state that the decay length of the exponential localization is dependent upon the ratio of maximum eigenvalue to minimum eigenvalue. So, systems involving overlap matrices with large $\ell_2$ condition number will have a long decay length in the inverse overlap matrix.

Section III D 4) but which may be applicable more widely is Selected Inversion (SI). In effect this amounts to a Cholesky expansion (or LDL or LU decomposition) of the matrix in order to compute selected matrix elements, such as the diagonal, of the inverse matrix exactly.⁵³

Assuming that the matrix to be inverted is sparse, then it can be decomposed as

$$L D L^{\dagger} = P Q P^{T}, \tag{37}$$

with sparse $L$, which are lower triangular factor matrices and where $D$ is a diagonal matrix and $P$ is a permutation matrix chosen to reduce the amount of “fill-in” in the sparsity pattern of $L$ with respect to $Q$. If we write $P Q P^{T} \rightarrow Q$ for simplicity, then the inverse matrix,

$$Q^{-1} = L^{-\dagger} (L D)^{-1}, \tag{38}$$

so, provided that the LDL factorization of the sparse $Q$ matrix can be performed, the potentially sparse (but likely with high fill-in) $Q^{-1}$ matrix may be calculated trivially by back substitution.

Rather than forming the inverse matrix in this fashion, however, SI may be used to calculate solely the selected elements of the inverse matrix on a pre-defined sparsity pattern. First, the LDL factorization can be computed recursively, using the block factorization,
AB I 0 A 0 IA−1B = . In many cases it may be desirable to entirely avoid matrix CD CA−1 I 0 D − CA−1B 0 I inversion and solve matrix equations directly using a decompo- sition, but this entirely depends on whether a matrix decom- (39) position can be efficiently computed for a sparse matrix in In the case that A is a scalar, a and the matrix to be factorized 47 parallel. is block-symmetric, so that C is a vector c = bT , and then If inversion is called for, as in several of the techniques in ! ! ! ! Section III D, because the matrices involved are sparse, con- a b 1 0 a 0 1 lT = , (40) ventional inversion techniques cannot be employed efficiently, bT D l I 0 D − bT b/a 0 I so more specialized techniques are used. The Newton-Schultz- Hotelling (NSH) inversion is a generalization of the applica- where the vector l = b/a. The Schur complement, S = D − T tion of the Newton-Raphson method for the iterative inversion b b/a can then be factorized recursively and so the inverse of scalars to matrices. As in the scalar case, the roots of an can be written using the symmetric block matrix inverse equation, in this case f(X) = Q − X−1, are found iteratively formula, for scalar a, as ! with the Newton-Raphson approach. This can be performed a−1 + lT S−1l −lT S−1 Q−1 = . (41) simply by recursive application of the iteration −S−1l S−1 X = X (2I − QX ), (35) n+1 n n In this way, the inverse can be computed recursively, where Q is the matrix to be inverted and Xn converges quadrat- along with the factorization, by descending through the recur- ically to the inverse matrix in the limit of n → ∞, provided that sion hierarchy until the Schur complement can be formed and Q is non-singular and X0 is initialized with a matrix which inverted using scalar operations, and then the elements of each guarantees convergence of the iteration,48 successive inverse are calculated from the inverse of the pre- vious level, as the hierarchy is ascended. 
By computing the X = αQT , (36) 0 inverse recursively in this fashion, the authors53 show that if where α = 1/(||Q||1 ||Q||∞) is a good choice in general, accord- the diagonal of the inverse matrix is required, only those ele- ing to Pan and Schreiber.49 This is discussed in detail in the ments of the inverse Schur complements which have an index work of Ozaki,50 along with several other possibilities for equal to the index of a non-zero element of an l vector need matrix inversion in linear-scaling electronic structure theory to be calculated. This corresponds to calculating the inverse calculations. This method, however, has been picked up by the matrix for only those elements for which the L matrix has a community and is used extensively in modern methods51 and non-zero element. The authors report computational scaling codes.52 of O(N3/2) for 1D systems. Another approach which has been developed for the Other methods exist for calculating selected elements of inversion of matrices on the poles of a contour integral (see the inverse of a sparse matrix with similarly reduced scaling. 220901-8 Aarons et al. J. Chem. Phys. 145, 220901 (2016)
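As an illustration of the Newton-Schultz-Hotelling iteration of Eqs. (35) and (36), the following is a minimal dense-matrix sketch in Python with NumPy. The function name `nsh_inverse`, the convergence tolerance, and the small banded test matrix are assumptions made for this example only; a production code would use sparse matrix products with truncation rather than dense arrays.

```python
import numpy as np

def nsh_inverse(Q, tol=1e-12, max_iter=100):
    """Iterate X_{n+1} = X_n (2I - Q X_n), Eq. (35), until ||I - Q X|| < tol."""
    Q = np.asarray(Q, dtype=float)
    identity = np.eye(Q.shape[0])
    # Eq. (36): X_0 = alpha * Q^T with alpha = 1 / (||Q||_1 ||Q||_inf)
    alpha = 1.0 / (np.linalg.norm(Q, 1) * np.linalg.norm(Q, np.inf))
    X = alpha * Q.T
    for iteration in range(1, max_iter + 1):
        X = X @ (2.0 * identity - Q @ X)
        residual = np.linalg.norm(identity - Q @ X)  # Frobenius norm
        if residual < tol:
            return X, iteration
    raise RuntimeError("NSH iteration did not converge")

# Hypothetical test case: a banded, positive definite "overlap-like" matrix
n = 8
S = np.eye(n) + 0.2 * (np.eye(n, k=1) + np.eye(n, k=-1))
S_inv, n_iter = nsh_inverse(S)
```

Only matrix multiplications are used, which is why the iteration is attractive when the matrices involved are sparse; the residual roughly squares at every step, in line with the quadratic convergence noted above.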

The FIND algorithm54 works by permuting the original matrix to make one desired diagonal element obtainable as the trivial solution to the inverse of the Schur complement, where the B matrix is a vector b and D is a scalar, after an LDL or LU decomposition of A using the notation of Eq. (39). This is then repeated for every diagonal element, but the computational cost is made manageable by performing partial decompositions and reusing information. Another option is based on Takahashi's equations;55 this has the advantage of not needing to invert the triangular factors.

An important point to note with these techniques for inversion is that a sparse matrix factorization must be used for them to be practical and beneficial. Routines for doing this are available in libraries such as SuperLU47 and MUMPS.56 Both of these libraries contain distributed-memory parallel implementations of matrix factorizations for sparse matrices, but it is well known that the parallel scaling of such approaches is somewhat limited.57

If a general sparse matrix is factorized in such a way, however, then the factors may be dense, so the approach does not exploit the sparsity at all and so is not helpful in making a fast algorithm. In order to circumvent such a situation, the rows and columns of the matrix to be factorized must be reordered prior to factorization with a fill-in reducing reordering. The ideal reordering in terms of number of non-zero elements in the triangular factor is unknown, as calculating it is an NP-complete problem; however, good58 algorithms exist for finding approximations. Unfortunately, the best of these are not well parallelizable. Despite this, libraries such as ParMETIS59 and PT SCOTCH60 exist and do this operation with the best parallel algorithms currently known. It is perhaps worth noting that the best performing reordering methods differ, depending on whether serial or parallel computers are used for the calculation.

D. Expansion approaches

1. Introduction to expansion approaches

As in linear-scaling methods for insulators, the idea behind expansion and other reduced scaling approaches is to be able to perform an SCF calculation without using an (inherently) cubic-scaling diagonalization step. Thus these methods attempt to compute a converged density matrix for metallic systems directly from the Hamiltonian, using for example (potentially linear-scaling) matrix multiplications, as in the Fermi operator expansion (FOE) approach, which is one of the early approaches of this kind. In this way, the cycle of having to diagonalize the Hamiltonian to obtain wavefunctions and energies from which to build the occupancies and the density is no longer needed.

The idea for Fermi operator expansions (FOEs) was first proposed by Goedecker and Colombo in 1994.61 The density matrix with finite-temperature occupancies, which can be constructed as a function of the eigenvalues of the one-particle Hamiltonian, as in EDFT, is instead built as a matrix-polynomial expansion of the occupancy function. In most works, this occupancy function is the matrix analogue of the Fermi-Dirac occupancy function but can equally be a generalized occupancy function, such as Gaussian smearing or either Methfessel-Paxton38 or Marzari-Vanderbilt41 cold-smearing. For simplicity, this work will only consider Fermi-Dirac style occupancy smearings, however.

As an alternative to the eigenvalue based methods mentioned in Sec. III B, FOE methods begin instead by writing the occupancy formula in matrix form

$$f(X) = (I + e^{X})^{-1}, \tag{42}$$

where I is the identity matrix and

$$X = (H - \mu I)\beta, \tag{43}$$

where H is the Hamiltonian matrix expressed in an orthonormal basis, $\mu$ is the chemical potential, and $\beta = 1/k_B T$. For instance, we can obtain H using a Löwdin orthogonalization, see Equation (30), using the previously mentioned iterative refinement method of Jansík et al. or the combination of a recursive factorization of the overlap matrix62 and iterative refinement of the approximate result, as proposed by Rubensson et al.63 This method for calculating the Löwdin factor (and inverse Löwdin factor) allows for rapid convergence in a Newton-Schultz style refinement scheme, while providing heuristics for the requisite parameters, without reference to even extremal eigenvalues. As the title of the work suggests, for sparse matrices, this method can be implemented in a linear scaling way.

In practice, the matrix formula of Equation (42) cannot be applied directly as the condition number of the matrix to be inverted will be too large in general. Instead, the operation in (42) is performed as an approximate matrix-expansion of this function.

In all of the following methods, matrix inversion plays an important role. In Secs. III D 2–III D 4 we will discuss three of the major flavours of expansion approaches; first the rational expansions, where the density matrix is made up from a sum involving inverses of functions of the Hamiltonian matrix, second the Chebyshev expansions, where the density matrix is formed as a matrix Chebyshev polynomial approximating the Fermi-Dirac function of the eigenvalues, and third the recursive approaches, where simple polynomial functions are applied recursively to the Hamiltonian matrix to produce the density matrix.

2. Chebyshev expansion approaches

A way to perform the operation of applying the Fermi-Dirac occupation function to the eigenvalues of a Hamiltonian matrix was proposed by Goedecker and Teter in 1995 as a Chebyshev expansion.64 In order to do this, the Fermi function is written as a Chebyshev expansion

$$f(X) = \sum_{i=0}^{N} a_i T_i(X), \tag{44}$$

where $\{T_i\}$ are the Chebyshev matrices of the first kind of degree i and $\{a_i\}$ are the expansion coefficients. The Chebyshev matrices are in the standard form

$$T_0(X) = I, \qquad T_1(X) = X, \qquad T_{n+1}(X) = 2 X T_n(X) - T_{n-1}(X). \tag{45}$$

The coefficients can be developed by simply taking the Chebyshev expansion of the scalar Fermi-Dirac function and applying these to compute a Chebyshev Fermi operator evaluation. As the Chebyshev polynomials can also be defined trigonometrically, $T_n(\cos(\omega)) = \cos(n\omega)$, the coefficients can be found simply through a discrete cosine transform,

$$\{a_i\} = \mathrm{DCT}\left(\frac{1}{1 + e^{\cos(x)}}\right), \tag{46}$$

where $x = 2(((e_i - \mu)\beta) - e_0)/(e_N - e_0) - 1$, so that the range of equispaced $e_i$ covers at least the eigenrange of the Hamiltonian matrix $(e_0 : e_N)$ and the interval is scaled and shifted to cover the useful interpolative range of Chebyshev polynomials $(-1 : 1)$. This operation need only be performed once; the coefficients are stored for every subsequent evaluation, so it is effectively negligible in the complexity and timing of the algorithms based on such an expansion. This way of performing the expansion requires approximately N terms for a given accuracy, where N is a function of the smearing width, $\beta$, the required accuracy $10^{-D}$, and the width of the Hamiltonian eigenvalue spectrum $\Delta E$. This implies that if matrix multiplication operations can be at best O(N) scaling for tight-binding calculations such as those for which this technique was first proposed, then of the order of $M \approx D \beta \Delta E$ matrix multiplications would be required.65

An improvement to the original Chebyshev series of Goedecker was proposed by Liang and Head-Gordon in 2003; this uses a divide and conquer approach to re-sum the terms of a truncated Chebyshev series,40 which is closely related to the divide and conquer approach for standard polynomials suggested in Paterson and Stockmeyer.66 This approach factors the polynomial into a number of terms which share common …

$$\sum_{i=0}^{N} a_i X^{i} = \sum_{j=0}^{N_{\mathrm{even}}} a_{2j} X^{2j} + \sum_{j=0}^{N_{\mathrm{odd}}} a_{2j+1} X^{2j+1}, \tag{47}$$

then a factor of X can be taken out of the odd sum, to give

$$\sum_{i=0}^{N} a_i X^{i} = \sum_{j=0}^{N_{\mathrm{even}}} a_{2j} (X^{2})^{j} + X \sum_{j=0}^{N_{\mathrm{odd}}} a_{2j+1} (X^{2})^{j}. \tag{48}$$

If this process is performed recursively, almost a factor of two in matrix multiplications can be saved for each sub-division. If done carefully, the amount of work to produce these terms and to combine them to construct the full polynomial is less than the amount of work to construct the polynomial directly, because effort is not duplicated. To experience the most gain, this approach is repeated recursively, until the sub-division can no longer be performed (efficiently). Using such a divide and conquer scheme reduces the scaling of a Chebyshev decomposition of the Fermi operator to $O(\sqrt{N})$ matrix multiplications.

Rather than decomposing the Fermi operator directly in terms of a truncated matrix-polynomial, Krajewski and Parrinello propose a construction based on an exact decomposition of the grand-canonical potential for a system of non-interacting fermions,67

$$\Omega = -2\,\mathrm{Tr} \ln\left(I + e^{-X}\right). \tag{49}$$

By decomposing the quantity in parentheses as

$$I + e^{-X} = \prod_{l=1}^{P} M_l M_l^{*}, \tag{50}$$

where

$$M_l = I - e^{i(2l-1)\pi/2P} e^{-H/2P}, \tag{51}$$

where P is an integer which Ceriotti et al. suggest should be in the range 500-1000 for optimal efficiency.68 Krajewski and Parrinello show that expectation values of physical observables can be calculated using a Monte Carlo stochastic method leading to an O(N) scaling approach, at the expense of noise on values of the calculated properties.69 In a separate publication, the authors also show that the approach scales linearly without Monte Carlo sampling in the case of 1D systems such as carbon nanotubes.70

In later work, Ceriotti, Kühne, and Parrinello extend the approach to allow better scaling in general. With this approach, the usually expensive and difficult application of the matrix-exponential operator is reduced to a number of more tractable matrix exponential problems, in the sense that approximating them effectively through low-order truncated matrix-polynomial expansions is achievable. Through the use of this formalism, the grand-canonical density matrix, otherwise known as the density matrix with Fermi-Dirac occupancy, can be expressed exactly in terms of this expansion by taking the appropriate derivative of the grand potential,

$$\delta\Omega \;\ldots = (I + e^{X})^{-1} = \frac{2}{P} \sum^{P} \left[ I - \ldots \right.$$

… values of l is significantly larger than for those with higher l. Furthermore, the condition number drops off rapidly with l, approaching 1 after tens of terms. In practice, the authors have shown that to efficiently invert all $M_l$ matrices in an expansion, the majority of low condition-number matrices can be handled through an expansion about an approximate diagonal matrix, with the significant advantage that these terms can all be formed from the same intermediate matrices, reducing the computational effort markedly. The remaining terms are handled via a Newton-Schultz-Hotelling method, which is more expensive, but required only for approximately 10 matrices to achieve maximum efficiency.

The approach of Ceriotti et al. makes use of the fast re-summing improvements to the Chebyshev expansion for the decomposition in the high-l regime. It also uses initialization from previous matrices in the low-l regime. This work provides a useful framework for analysis of the problems facing such methods, but alone does not bring the scaling factor down from the previous state of the art. In order to achieve that, Richters and Kühne go on to improve the approach by computing only the real components of high condition number $M_l^{-1}$ and computing the high-l expansion by approximating the inverse of $M_l$ directly as a Chebyshev expansion.71 By taking this approach, the scaling of the method is reduced to $O(\sqrt[3]{N})$ matrix multiplications.72
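The Chebyshev Fermi operator expansion of Eqs. (44) and (45) can be sketched in a few lines of Python with NumPy. This is a minimal dense-matrix illustration, not the implementation used in any of the cited codes: the function names, the tight-binding chain used as a test Hamiltonian, the spectral bounds, and the expansion order are all assumptions chosen for the example, and the coefficients are obtained from an explicit cosine sum over Chebyshev nodes (a type-II discrete cosine transform) rather than from the exact form of Eq. (46).

```python
import numpy as np

def fermi(e, mu, beta):
    """Scalar Fermi-Dirac occupancy."""
    return 1.0 / (1.0 + np.exp(beta * (e - mu)))

def chebyshev_coefficients(func, order):
    """Coefficients a_0..a_order of func on [-1, 1], sampled at Chebyshev nodes."""
    n_nodes = order + 1
    theta = np.pi * (np.arange(n_nodes) + 0.5) / n_nodes
    values = func(np.cos(theta))
    degrees = np.arange(n_nodes)[:, None]
    coeffs = (2.0 / n_nodes) * (np.cos(degrees * theta[None, :]) @ values)
    coeffs[0] *= 0.5
    return coeffs

def chebyshev_fermi_operator(H, mu, beta, order, e_min, e_max):
    """Approximate f(H) by sum_i a_i T_i(Hs), with Hs the spectrum-scaled H."""
    identity = np.eye(H.shape[0])
    half_width = 0.5 * (e_max - e_min)
    centre = 0.5 * (e_max + e_min)
    Hs = (H - centre * identity) / half_width  # spectrum mapped into [-1, 1]
    coeffs = chebyshev_coefficients(
        lambda x: fermi(half_width * x + centre, mu, beta), order)
    T_prev, T_curr = identity, Hs
    F = coeffs[0] * T_prev + coeffs[1] * T_curr
    for a in coeffs[2:]:
        # Recurrence of Eq. (45): T_{n+1} = 2 Hs T_n - T_{n-1}
        T_prev, T_curr = T_curr, 2.0 * Hs @ T_curr - T_prev
        F += a * T_curr
    return F

# Hypothetical test system: nearest-neighbour tight-binding chain (hopping -1)
n = 20
H = -(np.eye(n, k=1) + np.eye(n, k=-1))
mu, beta = 0.0, 5.0
F = chebyshev_fermi_operator(H, mu, beta, order=100, e_min=-2.0, e_max=2.0)

# Reference density matrix from explicit diagonalization, for comparison only
w, V = np.linalg.eigh(H)
F_exact = (V * fermi(w, mu, beta)) @ V.T
max_error = np.max(np.abs(F - F_exact))
```

The loop performs one matrix multiplication per expansion term, which is the cost that the divide and conquer re-summation discussed above is designed to reduce. Increasing `beta` (lower electronic temperature) or the spectral width requires a proportionally higher `order` for the same accuracy.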

3. Recursive methods

Another class of methods which are closely related to purification used in linear-scaling DFT techniques for insulators are the recursive operator expansions. Techniques such as these first appeared in the 1970s with the Haydock approach for tight-binding calculations, which can be performed in a linear scaling manner.73,74

Niklasson75 proposed taking a low order Padé approximant of the Fermi operator (42),

$$X_n(X_{n-1}) = X_{n-1}^{2} \left[ X_{n-1}^{2} + (I - X_{n-1})^{2} \right]^{-1} \tag{53}$$

and applying it recursively starting with the initial

$$X_0 = \tfrac{1}{2} I - (H - \mu_0 I)\,\beta / 2^{2+N}, \tag{54}$$

where N is the number of iterates in the expansion. With this method the Fermi operator can be approximated as

$$f(H, \mu, \beta) = X_N(X_{N-1}(\cdots(X_0)\cdots)). \tag{55}$$

Niklasson reports that the scheme is quadratically convergent and in practice the number of iterates can often be kept low ($N \approx 10$). Given that one inversion must be performed, or one linear equation solved, per iteration, and that an iterative solver can be seeded from the previous iterate, this technique is expected to be very quick in practice.

4. Rational expansion approaches

Goedecker was the first to introduce methods for the rational series expansion of the finite temperature density matrix.76 The Fermi operator can be expressed as

$$f(H, \mu, \beta) = \frac{1}{2 k_B T} \int_{-\infty}^{\mu} \left[ 2I + \frac{1}{2!} \left( \frac{H - \mu' I}{k_B T} \right)^{2} + \frac{1}{4!} \left( \frac{H - \mu' I}{k_B T} \right)^{4} + \cdots \right]^{-1} d\mu', \tag{56}$$

which still contains a very expensive and ill-conditioned inversion operation. Writing the expression in this form does however allow it to be further approximated by expressing the integrand as a partial fraction expansion truncated to order n,

$$f(H, \mu, \beta) = \int_{-\infty}^{\mu} \sum_{\nu=1}^{n} \frac{C_\nu}{(H - \mu' I) - k_B T (A_\nu + i B_\nu)}\, d\mu'. \tag{57}$$

The coefficients $C_\nu = A_\nu + i B_\nu$ can be calculated, and the partial fraction decomposition is very quickly convergent.77 The author suggests an n of 16, giving a compact expression

$$f(H, \mu, \beta) = \sum_{\nu=1}^{n/4} \left[ \int_{\Pi_\nu} \frac{2 i B_\nu}{H - zI}\, dz + \int_{\Lambda_\nu^{+}} \frac{A_\nu - i B_\nu}{H - zI}\, dz + \int_{\Lambda_\nu^{-}} \frac{A_\nu + i B_\nu}{H - zI}\, dz \right], \tag{58}$$

where the z values are the complex value points along the path used to evaluate each integral by quadrature. Three paths in the complex plane, $\Pi_\nu$, $\Lambda_\nu^{+}$ and $\Lambda_\nu^{-}$, are used for quadrature. In essence, the approach works by re-expressing the occupancy formula (42) as in Equation (58), which consists of contour integrals. These could be formally evaluated from their residues, but we do not have access to the poles of the function X. Thus, these integrals are computed by numerically integrating around a contour surrounding the eigenspectrum.

At every point z on the quadrature path, a Hermitian matrix, $H - zI$, must be inverted. Thus, the rational expansion performed as in the contour integral expansion of Goedecker has low prefactor in number of inversions O(ln(M)) but results in a large number of matrix multiplications if performed with an iterative inversion algorithm.77

In subsequent work, Lin Lin et al. propose an alternative scheme based on contour integration of the matrix Fermi-Dirac function, so that

$$f(X) = \mathrm{Im} \sum_{l=1}^{P} \frac{\omega_l}{H - (z_l + \mu) S}, \tag{59}$$

where the complex shifts $z_l$ and quadrature weights $\omega_l$ can be calculated from solely the chemical potential, the Hamiltonian eigenspectrum width, and the number of points on the contour integral, P. A contour based on the complex shifts suggested by the authors is shown in Fig. 3. Similar techniques to these are used in Korringa–Kohn–Rostoker (KKR) and related multiple-scattering techniques, see Section V.

In performing this integral, the authors are careful to avoid the non-analytic parts of the Fermi-Dirac function in the complex plane on the imaginary axis greater than $\pi/\beta$ and less than $-\pi/\beta$, by constructing a contour which encompasses the eigenspectrum of the Hamiltonian matrix, but "necks" sufficiently at zero on the real axis to pass through the analytic window while enveloping all of the real eigenvalues.

FIG. 3. The typical eigenspectrum of a Hamiltonian matrix for a metallic system is shown on the real axis of this Argand diagram (blue line). The discretized contour integral of a matrix smearing function, as used in the PEXSI method, with poles on the eigenvalues is taken around the black contour, avoiding non-analytic points ($i\pi/\beta$ and $-i\pi/\beta$) on the imaginary axis.

Within, the authors show the poor convergence of the Matsubara expansion (expansion into even and odd imaginary frequency components) that Goedecker also reported, though they offer a solution to this in terms of fast multipole (FMP) methods. They do however show that the integral expansion based on a contour integral requires fewer inversions than this Matsubara based approach, even with the FMP method. These methods scale as O(ln(M)) where M is the number of matrix inversions.78
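Returning briefly to the recursive expansion of Sec. III D 3, the iteration of Eqs. (53)-(55) is compact enough to sketch directly. The Python/NumPy example below is an illustrative dense-matrix version under stated assumptions: the function name, the tight-binding test Hamiltonian, and the use of a direct dense solve in place of a seeded iterative solver are choices made for this sketch, and the chemical potential is held fixed at a given value.

```python
import numpy as np

def recursive_fermi_operator(H, mu, beta, n_iter=10):
    """Approximate (I + exp(beta (H - mu I)))^{-1} with the recursion of Eqs. (53)-(55)."""
    identity = np.eye(H.shape[0])
    # Eq. (54): first-order expansion of the Fermi function at beta / 2^N
    X = 0.5 * identity - (H - mu * identity) * beta / 2.0 ** (2 + n_iter)
    for _ in range(n_iter):
        X2 = X @ X
        Y = identity - X
        # Eq. (53): X <- X^2 [X^2 + (I - X)^2]^{-1}. All factors are functions
        # of H and therefore commute, so a linear solve from the left is used
        # here instead of forming the inverse explicitly.
        X = np.linalg.solve(X2 + Y @ Y, X2)
    return X

# Hypothetical test system: nearest-neighbour tight-binding chain (hopping -1)
n = 20
H = -(np.eye(n, k=1) + np.eye(n, k=-1))
mu, beta = 0.0, 5.0
F = recursive_fermi_operator(H, mu, beta, n_iter=10)

# Reference from explicit diagonalization, for comparison only
w, V = np.linalg.eigh(H)
F_exact = (V / (1.0 + np.exp(beta * (w - mu)))) @ V.T
max_error = np.max(np.abs(F - F_exact))
```

Each application of Eq. (53) maps the Fermi function at inverse temperature β′ onto the Fermi function at 2β′, which is why the starting matrix of Eq. (54) is the linearized occupancy at β/2^N and why N iterations suffice. The only approximation is therefore the linearization in the first step, whose error shrinks as N grows.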

Lin Lin, Roberto Car, et al. have proposed a multipole expansion based on Matsubara theory.79 Given that

$$(I + e^{X})^{-1} = (I - \tanh(X/2))/2, \tag{60}$$

then the Matsubara representation can be written, using the pole-expansion of tanh, as

$$(I + e^{X})^{-1} = I - 4 \ldots$$

… matrix must be calculated, making the whole process many times more expensive.

Several methods have been proposed to improve the situation, including a finite difference representation of the change in number of electrons with respect to chemical potential.80 Niklasson recommends81 using the analytic derivative of the density matrix with respect to the chemical potential

VI. CONCLUSIONS AND OUTLOOK

Density functional theory calculations on metallic systems (i.e., systems with zero band gap) cannot be done with DFT approaches that have been developed for insulators, as such approaches are not designed to cope with a continuum of energy levels at the chemical potential. Mermin's formulation of finite temperature DFT provided the theoretical basis for DFT calculations on metallic systems. A great deal of progress has been made over the last thirty years in studying metallic systems based on Mermin's DFT, starting with approaches such as density mixing and the more stable ensemble DFT. In the last decade, rapid progress has been made on expansion methods, which however were introduced much earlier.

Advanced implementations of Mermin's DFT have allowed calculations with over 1000 atoms to be performed. For example, Alfè et al. have performed calculations on molten iron with over 1000 atoms which have provided new insights about processes taking place in the Earth's core,35 which cannot be obtained by experimental means. Other examples can be found in the work of Nørskov et al., who have studied small molecule adsorption on platinum nanoparticles of up to 1500 atoms,90 and in the work of Skylaris and Ruiz-Serrano, who have reported calculations on gold nanoparticles with up to 2000 atoms.44 These three examples have used approaches which minimize the number of $O(N^3)$ operations, either the approach of Kresse et al. as in VASP and GPAW, in the first two examples, respectively, or the approach of Skylaris et al. in the latter example. This turns out to be key for calculations on systems up to the low thousands of atoms, at which point the cubic diagonalization step dominates and a different approach with lower computational scaling is required.

We have reviewed the methods for calculations on metallic systems with emphasis on recent developments that promise calculations with larger numbers of atoms. Fractional occupancies of states are needed when dealing with metallic systems, and conventionally these are computed after diagonalization of the Hamiltonian, which is impractical for large systems. One way to avoid diagonalization is to use expansion methods which construct a finite temperature density matrix via matrix expansion of the Hamiltonian, without needing to access its eigenvalues. Expansion methods have typically been slow and so unsuitable for increasing the size of system one can study or for reducing time to science. Recent developments have improved this situation.

There is considerable choice, however, in the particular expansion used to compute the density matrix. The variants fall roughly into two categories: those based on an expansion via Chebyshev polynomials and those based on a rational or contour integral expansion. At present, on balance, it seems that the PEXSI approach, which is based on a contour integral combined with a novel matrix inversion approach, is the most computationally efficient option. Time will tell whether the Chebyshev expansion methods can be developed further to have improved scaling with system size and become competitive with the best rational expansion methods, or if stochastic methods become usable.

In the future, the ultimate target in methods for metallic systems would be to have computational CPU and memory costs which increase linearly with the number of atoms, as is the case for insulators where a number of linear-scaling DFT programs currently exist. This is not at all trivial, as the temperature and spatial cutoff at which exponential decay of the density matrix in a metal would be sufficient to have enough matrix sparsity for linear-scaling algorithms and acceptable accuracy is not clear and would need to be explored carefully. Coming from a background of linear-scaling DFT, so far we have a metal method which works in the localized non-orthogonal generalized Wannier function framework of the ONETEP package. This approach is an intermediate step towards the development of a linear-scaling method for metals as it benefits from the framework of a linear-scaling DFT approach (e.g., sparse matrices and algorithms) but it still requires a cubic-scaling diagonalization step as it is based on the EDFT formalism. We expect further work in this direction to involve an expansion method which can exploit the sparsity in the matrices as we do in insulating linear-scaling DFT.

Based on our review of the available literature discussing the application of expansion approaches to DFT calculations, the sizes (i.e., number of atoms) of metallic systems which have been studied with these methods are quite limited, in comparison to, for instance, the sizes of insulating systems studied with the same methods. There are several reasons for this. For example, these algorithms have higher prefactors than do the algorithms involved in calculating idempotent density matrices, as used in linear-scaling DFT. Production calculations of metallic systems with methods which purport to have a reduced or even linear-scaling computational cost with system size have not yet been reported for the large numbers of atoms that one would have expected, because the crossover point at which these methods become advantageous over diagonalization has not yet been reached on currently used high performance computing facilities.

Also, algorithms such as density mixing do not work as well for large systems, as has been seen in practical observations of the scaling of SCF iterations with the number of atoms in conventional DFT for metallic systems. If this is indeed a factor, then it is possible that an alternative, such as EDFT, will be required.

Even with these caveats, the class of operator expansions applied to density matrix DFT methods appears to be the strongest contender for reducing the scaling of accurate DFT calculations on metallic systems in the future. If such methods are further developed, and actually start to be routinely applied as computational power increases, we expect that they will have a great impact on metallic nanostructure and bulk metal surfaces engineering for industrial applications in optics, magnetics, catalysis, and other areas in which the unique properties of metallic systems can be exploited.

ACKNOWLEDGMENTS

Jolyon Aarons would like to thank Johnson Matthey and the Engineering and Physical Sciences Research Council (EPSRC) for an industrial CASE Ph.D. studentship.

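1 R. Ferrando, J. Jellinek, and R. L. Johnston, "Nanoalloys: From theory to applications of alloy clusters and nanoparticles," Chem. Rev. 108(3), 845–910 (2008).
2 P. K. Jain, W. Huang, and M. A. El-Sayed, "On the universal scaling behavior of the distance decay of plasmon coupling in metal nanoparticle pairs: A plasmon ruler equation," Nano Lett. 7(7), 2080–2088 (2007).
3 N. L. Rosi and C. A. Mirkin, "Nanostructures in biodiagnostics," Chem. Rev. 105(4), 1547–1562 (2005).
4 C. D. Paola, R. D'Agosta, and F. Baletto, "Geometrical effects on the magnetic properties of nanoparticles," Nano Lett. 16(4), 2885–2889 (2016).
5 L. V. Lutsev, A. I. Stognij, and N. N. Novitskii, "Giant magnetoresistance in semiconductor/granular film heterostructures with cobalt nanoparticles," Phys. Rev. B 80, 184423 (2009).
6 W. Wang, F. Zhu, J. Weng, J. Xiao, and W. Lai, "Nanoparticle morphology in a granular cu–co alloy with giant magnetoresistance," Appl. Phys. Lett. 72(9), 1118–1120 (1998).
7 F. Baletto and R. Ferrando, "Structural properties of nanoclusters: Energetic, thermodynamic, and kinetic effects," Rev. Mod. Phys. 77(1), 371 (2005).
8 Y. Ma and P. B. Balbuena, "Pt surface segregation in bimetallic Pt3M alloys: A density functional theory study," Surf. Sci. 602(1), 107–113 (2008).
9 G. E. Ramirez-Caballero and P. B. Balbuena, "Surface segregation of core atoms in core–shell structures," Chem. Phys. Lett. 456(1), 64–67 (2008).
10 I. Swart, Frank M. F. de Groot, Bert M. Weckhuysen, P. Gruene, G. Meijer, and A. Fielicke, "H2 adsorption on 3d transition metal clusters: A combined infrared spectroscopy and density functional study," J. Phys. Chem. A 112(6), 1139–1149 (2008).
11 F. Tournus, K. Sato, T. Epicier, T. J. Konno, and V. Dupuis, "Multi-L10 domain CoPt and FePt nanoparticles revealed by electron microscopy," Phys. Rev. Lett. 110, 055501 (2013).
12 J. K. Nørskov, J. Rossmeisl, A. Logadottir, L. Lindqvist, J. R. Kitchin, T. Bligaard, and H. Jónsson, "Origin of the overpotential for oxygen reduction at a fuel-cell cathode," J. Phys. Chem. B 108(46), 17886–17892 (2004).
13 J. R. Kitchin, J. K. Nørskov, M. A. Barteau, and J. G. Chen, "Modification of the surface electronic and chemical properties of Pt(111) by subsurface 3d transition metals," J. Chem. Phys. 120(21), 10240–10246 (2004).
14 J. Greeley and M. Mavrikakis, "Alloy catalysts designed from first principles," Nat. Mater. 3(11), 810–815 (2004).
15 Y. Ma and P. B. Balbuena, "Surface properties and dissolution trends of Pt3M alloys in the presence of adsorbates," J. Phys. Chem. C 112(37), 14520–14528 (2008).
16 A. Hellman, A. Resta, N. M. Martin, J. Gustafson, A. Trinchero, P.-A. Carlsson, O. Balmes, R. Felici, R. van Rijn, J. W. M. Frenken et al., "The active phase of palladium during methane oxidation," J. Phys. Chem. Lett. 3(6), 678–682 (2012).
17 P. Hohenberg and W. Kohn, "Inhomogeneous electron gas," Phys. Rev. 136(3B), B864 (1964).
18 T. A. Wesolowski and Y. A. Wang, Recent Progress in Orbital-free Density
28 L. Genovese, A. Neelov, S. Goedecker, T. Deutsch, S. Alireza Ghasemi, A. Willand, D. Caliste, O. Zilberberg, M. Rayson, A. Bergman, and R. Schneider, "Daubechies wavelets as a basis set for density functional pseudopotential calculations," J. Chem. Phys. 129(1), 014109 (2008).
29 S. Goedecker, "Linear scaling electronic structure methods," Rev. Mod. Phys. 71, 1085–1123 (1999).
30 D. R. Bowler and T. Miyazaki, "O(n) methods in electronic structure calculations," Rep. Prog. Phys. 75(3), 036503 (2012).
31 S. Goedecker, "Decay properties of the finite-temperature density matrix in metals," Phys. Rev. B 58(7), 3501 (1998).
32 P. Pulay, "Convergence acceleration of iterative sequences. The case of scf iteration," Chem. Phys. Lett. 73(2), 393–398 (1980).
33 P. Pulay, "Improved scf convergence acceleration," J. Comput. Chem. 3(4), 556–560 (1982).
34 G. Kresse and J. Furthmüller, "Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set," Phys. Rev. B 54(16), 11169 (1996).
35 D. Alfè, M. J. Gillan, and G. D. Price, "Ab initio chemical potentials of solid and liquid solutions and the chemistry of the earth's core," J. Chem. Phys. 116(16), 7127–7136 (2002).
36 L. Tong, Metal CONQUEST, HECToR dCSE report, March 2011, http://www.hector.ac.uk/cse/distributedcse/reports/conquest/conquest.pdf (Online); accessed 28 September 2016.
37 G. P. Kerker, "Efficient iteration scheme for self-consistent pseudopotential calculations," Phys. Rev. B 23(6), 3082 (1981).
38 M. Methfessel and A. T. Paxton, "High-precision sampling for Brillouin-zone integration in metals," Phys. Rev. B 40, 3616–3621 (1989).
39 N. Marzari, D. Vanderbilt, A. De Vita, and M. C. Payne, "Thermal contraction and disordering of the Al(110) surface," Phys. Rev. Lett. 82, 3296–3299 (1999).
40 W. Z. Liang, C. Saravanan, Y. Shao, R. Baer, A. T. Bell, and M. Head-Gordon, "Improved Fermi operator expansion methods for fast electronic structure calculations," J. Chem. Phys. 119(8), 4117–4125 (2003).
41 N. Marzari, D. Vanderbilt, and M. C. Payne, "Ensemble density-functional theory for ab initio molecular dynamics of metals and finite-temperature insulators," Phys. Rev. Lett. 79(7), 1337 (1997).
42 C. Freysoldt, S. Boeck, and J. Neugebauer, "Direct minimization technique for metals in density functional theory," Phys. Rev. B 79, 241103 (2009).
43 C.-K. Skylaris, P. D. Haynes, A. A. Mostofi, and M. C. Payne, "Introducing ONETEP: Linear-scaling density functional simulations on parallel computers," J. Chem. Phys. 122(8), 084119 (2005).
44 Á. Ruiz-Serrano and C.-K. Skylaris, "A variational method for density functional theory calculations on metallic systems with thousands of atoms," J. Chem. Phys. 139(5), 054107 (2013).
45 B. Jansík, S. Høst, P. Jørgensen, J. Olsen, and T. Helgaker, "Linear-scaling symmetric square-root decomposition of the overlap matrix," J. Chem. Phys.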
B 108(46), 17886–17892 Head-Gordon, “Improved Fermi operator expansion methods for fast (2004). electronic structure calculations,” J. Chem. Phys. 119(8), 4117–4125 13J. R. Kitchin, J.K. Nørskov, M. A. Barteau, and J. G. Chen, “Modification (2003). of the surface electronic and chemical properties of Pt(111) by subsurface 41N. Marzari, D. Vanderbilt, and M. C. Payne, “Ensemble density-functional 3d transition metals,” J. Chem. Phys. 120(21), 10240–10246 (2004). theory for ab initio molecular dynamics of metals and finite-temperature 14J. Greeley and M. Mavrikakis, “Alloy catalysts designed from first princi- insulators,” Phys. Rev. Lett. 79(7), 1337 (1997). ples,” Nat. Mater. 3(11), 810–815 (2004). 42C. Freysoldt, S. Boeck, and J. Neugebauer, “Direct minimization tech- 15Y. Ma and P. B. Balbuena, “Surface properties and dissolution trends of nique for metals in density functional theory,” Phys. Rev. B 79, 241103 Pt3M alloys in the presence of adsorbates,” J. Phys. Chem. C 112(37), (2009). 14520–14528 (2008). 43C.-K. Skylaris, P. D. Haynes, A. A. Mostofi, and M. C. Payne, “Intro- 16A. Hellman, A. Resta, N. M. Martin, J. Gustafson, A. Trinchero, P.-A. ducing ONETEP: Linear-scaling density functional simulations on parallel Carlsson, O. Balmes, R. Felici, R. van Rijn, J. W. M. Frenken et al., “The computers,” J. Chem. Phys. 122(8), 084119 (2005). active phase of palladium during methane oxidation,” J. Phys. Chem. Lett. 44A.´ Ruiz-Serrano and C.-K. Skylaris, “A variational method for density func- 3(6), 678–682 (2012). tional theory calculations on metallic systems with thousands of atoms,” 17P. Hohenberg and W. Kohn, “Inhomogeneous electron gas,” Phys. Rev. J. Chem. Phys. 139(5), 054107 (2013). 136(3B), B864 (1964). 45B. Jans´ık, S. Høst, P. Jørgensen, J. Olsen, and T. Helgaker, “Linear-scaling 18T. A. Wesolowski and Y. A. Wang, Recent Progress in Orbital-free Density symmetric square-root decomposition of the overlap matrix,” J. Chem. Phys. 
Functional Theory (World Scientific, 2013). 126(12), 124104 (2007). 19M. Chen, L. Hung, C. Huang, J. Xia, and E. A. Carter, “The melting point 46R. W. Nunes and D. Vanderbilt, “Generalization of the density-matrix of lithium: An orbital-free first-principles molecular dynamics study,” Mol. method to a nonorthogonal basis,” Phys. Rev. B 50(23), 17611 (1994). Phys. 111(22-23), 3448–3456 (2013). 47X. S. Liand J. W. Demmel, “SuperLU DIST: A scalable distributed-memory 20H. J. Monkhorst and J. D. Pack, “Special points for Brillouin-zone sparse direct solver for unsymmetric linear systems,” ACM Trans. Math. integrations,” Phys. Rev. B 13, 5188–5192 (1976). Software 29(2), 110–140 (2003). 21P. E. Blochl,¨ O. Jepsen, and O. K. Andersen, “Improved tetrahedron 48W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numeri- method for Brillouin-zone integrations,” Phys. Rev. B 49, 16223–16233 cal Recipes: The Art of Scientific Computing (Cambridge University Press, (1994). Cambridge, 1992), Vol. 683. 22M. J. Gillan, “Calculation of the vacancy formation energy in aluminium,” 49V.Pan and R. Schreiber, “An improved Newton iteration for the generalized J. Phys.: Condens. Matter 1(4), 689 (1989). inverse of a matrix, with applications,” SIAM J. Sci. Stat. Comput. 12(5), 23W. Kohn, “Density functional and density matrix method scaling lin- 1109–1130 (1991). early with the number of atoms,” Phys. Rev. Lett. 76(17), 3168–3171 50T. Ozaki, “Efficient recursion method for inverting an overlap matrix,” Phys. (1996). Rev. B 64(19), 195110 (2001). 24D. R. Bowler, R. Choudhury, M. J. Gillan, and T. Miyazaki, “Recent progress 51A. H. R. Palser and D. E. Manolopoulos, “Canonical purification of the with large-scale ab initio calculations: The CONQUEST code,” Phys. Status density matrix in electronic-structure theory,” Phys. Rev. B 58(19), 12704 Solidi B 243(5), 989–1000 (2006). (1998). 25C.-K. Skylaris, A. A. Mostofi, P. D. Haynes, O. Dieguez,´ and M. C. Payne, 52P. D. 
Haynes, C.-K. Skylaris, A. A. Mostofi, and M. C. Payne, “Density ker- “Nonorthogonal generalized Wannier function pseudopotential plane-wave nel optimization in the ONETEP code,” J. Phys.: Condens. Matter 20(29), method,” Phys. Rev. B 66, 035119 (2002). 294207 (2008). 26J. VandeVondele, U. Borstnik,ˇ and J. Hutter, “Linear scaling self-consistent 53L. Lin, C. Yang, J. C. Meza, J. Lu, L. Ying et al., “Selinv—An algorithm field calculations with millions of atoms in the condensed phase,” J. Chem. for selected inversion of a sparse symmetric matrix,” ACM Trans. Math. Theory Comput. 8(10), 3565–3573 (2012). Software 37(4), 40 (2011). 27E. Artacho, D. Sanchez-Portal,´ P. Ordejon,´ A. Garc´ıa, and J. M. Soler, 54S. Li and E. Darve, “Extension and optimization of the FIND algorithm: “Linear-scaling ab-initio calculations for large and complex systems,” Phys. Computing Green’s and less-than Green’s functions,” J. Comput. Phys. Status Solidi B 215(1), 809–817 (1999). 231(4), 1121–1139 (2012). 220901-14 Aarons et al. J. Chem. Phys. 145, 220901 (2016)

55F. Henry Rouet, “Partial computation of the inverse of a large sparse matrix- 74R. Haydock, V. Heine, and M. J Kelly, “Electronic structure based on the application to astrophysics,” M.Sc. thesis, Institut National Polytechnique local atomic environment for tight-binding bands,” J. Phys. C: Solid State de Toulouse, 2009. Phys. 5(20), 2845 (1972). 56P. R. Amestoy, I. S. Duff, J.-Y. L’Excellent, and J. Koster, “A fully asyn- 75A. M. N. Niklasson, “Implicit purification for temperature-dependent chronous multifrontal solver using distributed dynamic scheduling,” SIAM density matrices,” Phys. Rev. B 68, 233104 (2003). J. Matrix Anal. Appl. 23(1), 15–41 (2001). 76S. Goedecker, “Integral representation of the Fermi distribution and its appli- 57N. I. M. Gould, J. A. Scott, and Y. Hu, “A numerical evaluation of sparse cations in electronic-structure calculations,” Phys. Rev. B 48(23), 17573 direct solvers for the solution of large sparse symmetric linear systems of (1993). equations,” ACM Trans. Math. Software 33(2), 10 (2007). 77S. Goedecker, “Low complexity algorithms for electronic structure calcu- 58G. Karypis and V. Kumar, “A parallel algorithm for multilevel graph par- lations,” J. Comput. Phys. 118(2), 261–268 (1995). titioning and sparse matrix ordering,” J. Parallel Distrib. Comput. 48(1), 78L. Lin, J. Lu, L. Ying, and E. Weinan, “Pole-based approximation of the 71–95 (1998). Fermi-Dirac function,” Chin. Ann. Math., Ser. B 30(6), 729–742 (2009). 59G. Karypis, K. Schloegel, and V. Kumar, Parmetis: Parallel Graph Par- 79L. Lin, J. Lu, R. Car, and E. Weinan, “Multipole representation of the Fermi titioning and Sparse Matrix Ordering Library (Department of Computer operator with application to the electronic structure analysis of metallic Science, University of Minnesota, 1997). systems,” Phys. Rev. B 79(11), 115133 (2009). 60C. Chevalier and F. Pellegrini, “PT-scotch: A tool for efficient parallel graph 80L. Lin, M. Chen, C. Yang, and L. 
He, “Accelerating atomic orbital-based ordering,” Parallel Comput. 34(6), 318–331 (2008). electronic structure calculation via pole expansion and selected inversion,” 61S. Goedecker and L. Colombo, “Efficient linear scaling algorithm for tight- J. Phys.: Condens. Matter 25(29), 295501 (2013). binding molecular dynamics,” Phys. Rev. Lett. 73(1), 122 (1994). 81A. M. N. Niklasson, P. Steneteg, and N. Bock, “Extended lagrangian free 62A. M. N. Niklasson, “Iterative refinement method for the approximate energy molecular dynamics,” J. Chem. Phys. 135(16), 164111 (2011). factorization of a matrix inverse,” Phys. Rev. B 70, 193102 (2004). 82J. Korringa, “On the calculation of the energy of a Bloch wave in a metal,” 63E. H. Rubensson, N. Bock, E. Holmstrom,¨ and A. M. N. Niklasson, Physica 13(6-7), 392–400 (1947). “Recursive inverse factorization,” J. Chem. Phys. 128(10), 104105 (2008). 83W. Kohn and N. Rostoker, “Solution of the Schrodinger¨ equation in periodic 64S. Goedecker and M. Teter, “Tight-binding electronic-structure calculations lattices with an application to metallic lithium,” Phys. Rev. 94(5), 1111 and tight-binding molecular dynamics with localized orbitals,” Phys. Rev. (1954). B 51(15), 9455 (1995). 84D. D. Johnson, F. J. Pinski, and G. M. Stocks, “Fast method for calculat- 65R. Baer and M. Head-Gordon, “Chebyshev expansion methods for elec- ing the self-consistent electronic structure of random alloys,” Phys. Rev. B tronic structure calculations on large molecular systems,” J. Chem. Phys. 30(10), 5508 (1984). 107(23), 10003–10013 (1997). 85F. J. Pinski and G. M. Stocks, “Fast method for calculating the self-consistent 66M. S. Paterson and L. J. Stockmeyer, “On the number of nonscalar multipli- electronic structure of random alloys. II. Optimal use of the complex plane,” cations necessary to evaluate polynomials,” SIAM J. Comput. 2(1), 60–66 Phys. Rev. B 32(6), 4204 (1985). (1973). 86Y. Wang, G. M. Stocks, W. A. Shelton, D. M. C. Nicholson, Z. 
Szotek, and 67F. R. Krajewski and M. Parrinello, “Stochastic linear scaling for metals and W. M. Temmerman, “Order-N multiple scattering approach to electronic nonmetals,” Phys. Rev. B 71, 233105 (2005). structure calculations,” Phys. Rev. Lett. 75, 2867–2870 (1995). 68M. Ceriotti, T. D. Kuhne,¨ and M. Parrinello, “An efficient and accurate 87I. A. Abrikosov, A. M. N. Niklasson, S. I. Simak, B. Johansson, decomposition of the Fermi operator,” J. Chem. Phys. 129(2), 024707 A. V. Ruban, and H. L. Skriver, “Order-N Green’s function technique (2008). for local environment effects in alloys,” Phys. Rev. Lett. 76, 4203–4206 69F. R. Krajewski and M. Parrinello, “Linear scaling electronic structure (1996). Monte Carlo method for metals,” Phys. Rev. B 75, 235108 (2007). 88A. Alam, S. N. Khan, A. V. Smirnov, D. M. Nicholson, and D. D. John- 70F. R. Krajewski and M. Parrinello, “Linear scaling for quasi-one- son, “Green’s function multiple-scattering theory with a truncated basis dimensional systems,” Phys. Rev. B 74, 125107 (2006). set: An augmented-kkr formalism,” Phys. Rev. B 90(20), 205102 71D. Richters and T. D. Kuhne,¨ “Self-consistent field theory based molecular (2014). dynamics with linear system-size scaling,” J. Chem. Phys. 140(13), 134109 89R. Zeller, “Towards a linear-scaling algorithm for electronic structure cal- (2014). culations with the tight-binding Korringa–Kohn–Rostoker Green function 72M. Ceriotti, T. D. Kuhne,¨ and M. Parrinello, “A hybrid approach to Fermi method,” J. Phys.: Condens. Matter 20(29), 294215 (2008). operator expansion,” preprint arXiv:0809.2232 (2008). 90L. Li, A. H. Larsen, N. A. Romero, V. A. Morozov, C. Glinsvad, F. Abild- 73R. Haydock, V. Heine, and M. J. Kelly, “Electronic structure based on the Pedersen, J. Greeley, K. W. Jacobsen, and J. K. Nørskov, “Investigation of local atomic environment for tight-binding bands. II,” J. Phys. C: Solid State catalytic finite-size-effects of platinum metal clusters,” J. Phys. Chem. Lett. Phys. 
8(16), 2591 (1975). 4(1), 222–226 (2012).