Generating basis sets for CRYSTAL and QWalk
Lucas K. Wagner

The point of a basis set is to describe a (generally unknown) function efficiently. That is, we approximate some general function $f(x)$ by a sum over known basis functions (in this case $\chi_i(x)$):

$$ f(x) = \sum_i c_i\,\chi_i(x). \tag{1} $$
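As a concrete illustration of Eq. (1), the sketch below fits the hydrogen 1s shape $e^{-r}$ with Gaussians $\chi_i(r) = e^{-a_i r^2}$ by ordinary least squares on a grid. The exponents are borrowed from the hydrogen example later in this document; the grid, the target function, and the unweighted least-squares criterion are assumptions made for the illustration. A Hartree-Fock code determines its coefficients by minimizing the energy instead, so these $c_i$ will not match CRYSTAL's.

```python
import numpy as np


def fit_coefficients(f_values, r, exponents):
    """Least-squares c_i such that f(r) ~ sum_i c_i exp(-a_i r^2), as in Eq. (1)."""
    # Design matrix: one column per basis function chi_i evaluated on the grid.
    chi = np.exp(-np.outer(r ** 2, exponents))
    coeffs, *_ = np.linalg.lstsq(chi, f_values, rcond=None)
    residual = np.linalg.norm(chi @ coeffs - f_values)
    return coeffs, residual


# Target: the (unnormalized) hydrogen 1s shape exp(-r) on a radial grid.
r = np.linspace(0.0, 8.0, 400)
target = np.exp(-r)

small = [0.595315492828]
large = [0.158783290609, 0.595315492828, 13.2327468043, 2.22967713977]

c_small, res_small = fit_coefficients(target, r, small)
c_large, res_large = fit_coefficients(target, r, large)
print(f"1 Gaussian:  residual {res_small:.3e}")
print(f"4 Gaussians: residual {res_large:.3e}, c = {np.round(c_large, 4)}")
```

Because the four-Gaussian set contains the single Gaussian, its least-squares residual can only be smaller or equal: more basis functions give the sum in Eq. (1) more freedom.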

We will usually choose the $\chi_i(x)$ to be convenient to work with. Perhaps integrals are easy to do with them, or perhaps they approximate the function $f(x)$ so closely that we don't need many terms in the sum of Eq. (1). One basis set expansion that you may be familiar with is the Fourier expansion, which uses plane waves as the $\chi_i$. In many-body quantum systems, we typically start our description of the many-body wave function $\Psi(r_1, r_2, \ldots)$ with a Slater determinant, written as follows:

$$ \Psi_S(r_1, r_2, \ldots) = \det \begin{pmatrix} \phi_1(r_1) & \phi_1(r_2) & \phi_1(r_3) & \cdots \\ \phi_2(r_1) & \phi_2(r_2) & \phi_2(r_3) & \cdots \\ \phi_3(r_1) & \phi_3(r_2) & \phi_3(r_3) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}, \tag{2} $$

where $r_i$ is the position of the $i$th electron and $\phi_i(r)$ is called a molecular or crystalline orbital (MO/CO). The Slater determinant is the simplest possible many-electron wave function that satisfies fermion antisymmetry, $\Psi(r_1, r_2, \ldots) = -\Psi(r_2, r_1, \ldots)$, and there exist algorithms to evaluate its properties efficiently. Note that the one-particle functions $\phi_i$ have not yet been specified, and we have to come up with a way to represent them within the computer. We expand the MO/COs in some basis $\{\chi_j\}$ as follows:

$$ \phi_i(r) = \sum_j c_{ij}\,\chi_j(r). \tag{3} $$
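Eqs. (2) and (3) can be checked numerically. The sketch below builds three orbitals from a hypothetical one-dimensional basis of Gaussians (centres and coefficients are made up for illustration, not taken from any real calculation), evaluates the Slater determinant, and confirms that exchanging two electrons flips its sign while placing two electrons at the same point makes it vanish.

```python
import numpy as np

# Hypothetical 1D basis: Gaussians centred at -1, 0, +1 (illustration only).
CENTERS = np.array([-1.0, 0.0, 1.0])


def chi(x):
    """Basis functions chi_j at points x; shape (len(x), n_basis)."""
    return np.exp(-(np.asarray(x)[:, None] - CENTERS[None, :]) ** 2)


def slater_determinant(coeffs, positions):
    """Psi_S = det[phi_i(x_k)] with phi_i(x) = sum_j c_ij chi_j(x), Eqs. (2)-(3)."""
    # Row i, column k holds orbital phi_i evaluated at electron k's position.
    phi = coeffs @ chi(positions).T
    return np.linalg.det(phi)


# Hypothetical orbital coefficients c_ij (rows: orbitals, columns: basis functions).
C = np.array([[1.0, 0.5, 0.2],
              [0.3, -1.0, 0.4],
              [0.1, 0.2, 1.0]])

x = np.array([-0.7, 0.1, 0.9])
psi = slater_determinant(C, x)
psi_swapped = slater_determinant(C, x[[1, 0, 2]])          # exchange electrons 1 and 2
psi_coincident = slater_determinant(C, np.array([0.1, 0.1, 0.9]))
print(psi, psi_swapped, psi_coincident)
```

Swapping two electrons swaps two columns of the matrix, which is exactly why the determinant changes sign; two identical columns give a zero determinant (the Pauli principle).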

Essentially, what a Hartree-Fock code does is find a set of $c_{ij}$ such that the energy of a Slater determinant is minimized. That is, it performs the operation

$$ \min_{\{c_{ij}\}} \langle \Psi_S | \hat H | \Psi_S \rangle = \min_{\{c_{ij}\}} \frac{\int \Psi_S^*(r_1, r_2, \ldots)\,\hat H\,\Psi_S(r_1, r_2, \ldots)\,dr_1\,dr_2 \cdots}{\int \Psi_S^*(r_1, r_2, \ldots)\,\Psi_S(r_1, r_2, \ldots)\,dr_1\,dr_2 \cdots}. \tag{4} $$
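For a single electron the Coulomb and exchange terms cancel, so the minimization in Eq. (4) reduces to the generalized eigenvalue problem $Hc = ESc$, whose lowest eigenvalue is the minimum of the ratio. The sketch below does this for hydrogen with s-type Gaussians $e^{-a r^2}$ on the nucleus, using the standard closed-form integrals for such Gaussians (with $p = a + b$: overlap $(\pi/p)^{3/2}$, kinetic $3ab/p$ times the overlap, nuclear attraction $-2\pi/p$). It is a toy stand-in, not CRYSTAL's implementation; with the optimized exponents quoted later in this tutorial it should land close to the energy CRYSTAL reports.

```python
import numpy as np


def hydrogen_energy(exponents):
    """Lowest eigenvalue of H c = E S c for one electron, nuclear charge Z = 1,
    in a basis of s-type Gaussians exp(-a r^2) centred on the nucleus.
    Atomic units (Hartree, bohr)."""
    a = np.asarray(exponents, dtype=float)
    p = a[:, None] + a[None, :]                    # a_i + a_j
    S = (np.pi / p) ** 1.5                         # overlap
    T = 3.0 * a[:, None] * a[None, :] / p * S      # kinetic energy
    V = -2.0 * np.pi / p                           # nuclear attraction
    H = T + V
    # Loewdin orthogonalisation: X = S^(-1/2) turns H c = E S c into an ordinary
    # symmetric eigenproblem; its lowest eigenvalue is the minimum in Eq. (4).
    s_vals, s_vecs = np.linalg.eigh(S)
    X = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T
    return np.linalg.eigvalsh(X @ H @ X)[0]


# One Gaussian at its analytic optimum a = 8/(9 pi) gives -4/(3 pi) ~ -0.4244.
print(hydrogen_energy([8 / (9 * np.pi)]))
print(hydrogen_energy([0.158783290609, 0.595315492828, 13.2327468043, 2.22967713977]))
```

The energy stays above the exact -0.5 Hartree for any choice of exponents, which is the variational principle behind Eq. (4).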

Because of the form of $\Psi_S$, it is possible to separate these integrals and evaluate them efficiently. In the end, one basically needs to evaluate integrals such as

$$ \iint \chi_i^*(r_1)\,\chi_i(r_1)\,\frac{1}{|r_1 - r_2|}\,\chi_j^*(r_2)\,\chi_j(r_2)\,dr_1\,dr_2. \tag{5} $$

These integrals (plus solving some linear equations) are often a significant part of the runtime of HF/DFT codes; see Szabo and Ostlund for details. While the theoretical basis for DFT is a little different, the actual implementation is almost identical to Hartree-Fock, except that the bare $\hat H$ is replaced with an effective one that includes some of the effects of correlation.

How should we choose a basis set $\{\chi_i\}$ so that Eq. (5) and others like it are easy to evaluate? One possibility is to choose functions for which we can do the integrals analytically and save a lot of work. This is the choice made by codes based on Gaussian basis sets, such as GAUSSIAN, GAMESS, NWCHEM, and CRYSTAL. Gaussians typically allow one to represent the true $\phi_i$ accurately with relatively few basis functions, but it is common to be confused about which Gaussian basis functions one should add to the set for the highest accuracy. There is no magical 'make my basis better' switch for Gaussian basis sets, so we have to understand what the orbitals $\phi_i$ are supposed to look like and what the current basis produces.

The starting point for molecules and solids is the so-called "tight-binding" approximation, in which the orbitals are represented by linear combinations of atomic orbitals (LCAO). While LCAO is usually not quantitatively correct, it is often qualitatively useful and captures a large proportion of the true total energy. So, in making a basis, we should first include the atomic orbitals as a starting point. In a Gaussian basis, this is usually done by solving for the atomic orbitals numerically and then fitting them to Gaussians; the resulting fixed combination of Gaussians is called a contracted basis function. In published basis sets, there is usually a single $\chi$, made of 2 to 10 Gaussian functions, for each atomic state. If one uses only the atomic basis, the set is termed single-ζ.

As an example, let's go through the motions of converging a basis set for hydrogen from scratch. We'll start with the following CRYSTAL input file:

    H atom
    MOLECULE
    1
    1
    1 0.0 0.0 0.0
    ENDGEOM
    1 4
    0 0 1 1. 0.
    *0.295798402378 1.0
    0 0 1 0. 0.
    *1.86698928478 1.0
    0 0 1 0. 0.
    *13.8776804872 1.0
    0 0 1 0. 0.
    *7.0 1.0
    99 0
    ENDB
    UHF
    SPINLOCK
    1 200
    END

Note that we have made four uncontracted Gaussian basis functions, so CRYSTAL has four coefficients to vary when minimizing the energy of the single atomic orbital. Run:

    % python optimize_stars.py -c h.inp
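The internals of optimize_stars.py are not described here. Judging from its name and from the inputs in this tutorial, the asterisk appears to mark exponents that the script may vary; the hypothetical sketch below assumes exactly that and shows how such starred values could be read out of an input and rewritten in order. The regular expression and both function names are illustrative, not the script's actual code.

```python
import re

# Assumption: "*<number>" marks an exponent that the optimizer may vary.
STAR = re.compile(r"\*([0-9.]+(?:[eE][-+]?[0-9]+)?)")


def starred_exponents(text):
    """Starred exponents, in the order they appear in the input text."""
    return [float(value) for value in STAR.findall(text)]


def substitute_exponents(text, new_values):
    """Return text with each starred exponent replaced, in order, by new_values."""
    new_values = list(new_values)
    n_found = len(STAR.findall(text))
    if n_found != len(new_values):
        raise ValueError(f"input has {n_found} starred exponents, got {len(new_values)}")
    values = iter(new_values)
    # The star is kept so the rewritten input can be optimized again.
    return STAR.sub(lambda m: f"*{next(values):.12g}", text)


shells = "0 0 1 1. 0.\n*0.295798402378 1.0\n0 0 1 0. 0.\n*7.0 1.0\n"
print(starred_exponents(shells))
print(substitute_exponents(shells, [0.15878329, 2.22967714]))
```

Only starred tokens are touched; the unstarred coefficients and shell headers pass through unchanged.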

You can find optimize_stars.py in the utils/ subdirectory of the QWalk distribution (versions after 0.97.1). Make sure that the paths are set correctly and that CRYSTAL is installed for optimize_stars.py to work. In the output, you should see that optimize_stars.py varies the exponents of the Gaussians to obtain the lowest energy. My final line was

    -0.498629291241 [ 0.15878329 0.59531549 13.2327468 2.22967714]
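The optimizer inside optimize_stars.py is not described in this document. As a hypothetical stand-in, the sketch below varies the exponents with a simple coordinate pattern search on their logarithms (a swapped-in technique, chosen for brevity), and replaces the CRYSTAL run with the closed-form energy of hydrogen in one normalized s Gaussian, $E(a) = 3a/2 - 2\sqrt{2a/\pi}$, whose minimum is $-4/(3\pi)$ at $a = 8/(9\pi)$. In a real setup, `energy` would write the input, run CRYSTAL and read back the total energy.

```python
import math


def minimize_exponents(energy, exponents, step=0.5, tol=1e-6, max_iter=10000):
    """Hypothetical optimizer: coordinate pattern search on log(exponent).

    energy    -- callable taking a list of exponents and returning an energy
    exponents -- starting values (the starred numbers in the input)
    Returns (lowest energy found, corresponding exponents).
    """
    x = [math.log(a) for a in exponents]          # log space keeps every a > 0
    best = energy([math.exp(v) for v in x])
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for direction in (1.0, -1.0):
                trial = list(x)
                trial[i] += direction * step
                e = energy([math.exp(v) for v in trial])
                if e < best:                      # accept the first improving move
                    x, best, improved = trial, e, True
                    break
        if not improved:
            step /= 2.0                           # no move helped: refine the step
            if step < tol:
                break
    return best, [math.exp(v) for v in x]


def one_gaussian_hydrogen(exponents):
    """Stand-in for a CRYSTAL run: energy (Hartree) of hydrogen in one
    normalized s Gaussian exp(-a r^2); minimum -4/(3 pi) at a = 8/(9 pi)."""
    a = exponents[0]
    return 1.5 * a - 2.0 * math.sqrt(2.0 * a / math.pi)


best_energy, best_exps = minimize_exponents(one_gaussian_hydrogen, [1.0])
print(best_energy, best_exps)
```

Searching in log(exponent) is a design choice: Gaussian exponents span orders of magnitude (0.16 to 13 in the hydrogen set), so multiplicative steps treat tight and diffuse functions evenly and can never produce a negative exponent.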

Since the exact ground-state energy of hydrogen is -0.5 Hartree, this is fairly close (although one could do better with more Gaussians). Now run the following input, which uses the optimized exponents, with CRYSTAL:

    H_Chain
    MOLECULE
    1
    1
    1 0.0 0.0 0.0
    ENDGEOM
    1 4
    0 0 1 1. 0.
    0.158783290609 1.0
    0 0 1 0. 0.
    0.595315492828 1.0
    0 0 1 0. 0.
    13.2327468043 1.0
    0 0 1 0. 0.
    2.22967713977 1.0
    99 0
    ENDB
    UHF
    SPINLOCK
    1 200
    PRINTOUT
    EIGENVECS
    -1
    END
    END

You should get the same energy as before. Your output will contain lines like this:

    FINAL EIGENVECTORS ALPHA ELECTRONS

       1 ( 0 0 0)

                  1           2           3           4
       1     0.6361      -1.360      0.6387     -0.2323
       2     0.3622       1.410      -1.746      0.7183
       3  0.1901E-01  0.3343E-01 -0.2075E-01      1.323
       4     0.1068 -0.2649E-01       1.642      -1.240

These are the $c_{ij}$ from Eq. (3). Each column contains the coefficients of one orbital (for an isolated atom, an atomic orbital), expressed in the basis we specified; each row corresponds to one basis function, in the order in which it appears in the input. For hydrogen, we want the coefficients of the first atomic orbital, the 1s orbital. Simply take the first column and use it as the contraction coefficients, pairing each coefficient with its exponent (be careful about the order!):

    H atom
    MOLECULE
    1
    1
    1 0.0 0.0 0.0
    ENDGEOM
    1 1
    0 0 4 1. 0.
    0.158783290609 0.6361
    0.595315492828 0.3622
    13.2327468043 0.1901E-01
    2.22967713977 0.1068
    99 0
    ENDB
    UHF
    SPINLOCK
    1 200
    PRINTOUT
    EIGENVECS
    -1
    END
    END
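The manual contraction step (take the first column, pair it with the exponents in input order) can also be scripted. The hypothetical sketch below assumes the coefficient rows have been copied out of the CRYSTAL output as plain text, with the row index first and all orbitals in a single printed block; the function names are illustrative. The CHE and SCAL values (`1.` and `0.`) mirror the hand-written input above, and `0.01901` is the same number as the printed `0.1901E-01`.

```python
def parse_eigenvector_rows(block):
    """Parse rows like '1  0.6361  -1.360 ...' (basis-function index, then one
    coefficient per orbital) into lists of floats, one list per basis function."""
    rows = []
    for line in block.strip().splitlines():
        fields = line.split()
        rows.append([float(v) for v in fields[1:]])   # drop the row index
    return rows


def contracted_shell(exponents, coefficients, lat=0, charge=1.0, scale=0.0):
    """Format one contracted CRYSTAL shell (ITYB = 0, user-defined)."""
    if len(exponents) != len(coefficients):
        raise ValueError("need one coefficient per exponent")
    lines = [f"0 {lat} {len(exponents)} {charge:.1f} {scale:.1f}"]  # ITYB LAT NG CHE SCAL
    for a, c in zip(exponents, coefficients):
        lines.append(f"{a:.12g} {c:.6g}")                          # exponent, coefficient
    return "\n".join(lines)


rows = parse_eigenvector_rows("""
   1     0.6361      -1.360      0.6387     -0.2323
   2     0.3622       1.410      -1.746      0.7183
   3  0.1901E-01  0.3343E-01 -0.2075E-01      1.323
   4     0.1068 -0.2649E-01       1.642      -1.240
""")
exponents = [0.158783290609, 0.595315492828, 13.2327468043, 2.22967713977]
first_orbital = [row[0] for row in rows]      # first column = the 1s orbital
print(contracted_shell(exponents, first_orbital))
```

The exponent list must follow the order of the basis functions in the input that produced the eigenvectors, which is the ordering pitfall mentioned above.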

Running this, you should get the same energy as before, but note that there is only one coefficient in the output file. You have now successfully fit an atomic orbital with Gaussian basis functions.

Note that atoms with more electrons will have more occupied orbitals; we will want to identify them in the CRYSTAL output and include them in our LCAO basis. We can use this minimal basis (also called single-ζ, SZ, or tight-binding) to do calculations on condensed systems. Let's consider the H2 molecule:

    H2
    MOLECULE
    1
    2
    1 0.0 0.0 0.0
    1 0.0 0.0 0.7
    ENDGEOM
    1 1
    0 0 4 1. 0.
    0.158783290609 0.6361
    0.595315492828 0.3622
    13.2327468043 0.1901E-01
    2.22967713977 0.1068
    99 0
    ENDB
    UHF
    PRINTOUT
    EIGENVECS
    -1
    END
    END
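With only one basis function per atom, symmetry alone fixes the molecular orbitals of H2: whatever the matrix elements, the 2×2 problem $Hc = ESc$ has the bonding and antibonding combinations as its solutions, and only their energies, $(\alpha+\beta)/(1+s)$ and $(\alpha-\beta)/(1-s)$, depend on the integrals. The sketch below shows this; the values of α, β and s are made up for illustration and are not CRYSTAL output.

```python
import numpy as np


def minimal_basis_h2(alpha, beta, s):
    """Solve H c = E S c for H2 with one basis function per atom.

    alpha: on-site element <chi_A|H|chi_A> = <chi_B|H|chi_B>
    beta:  coupling <chi_A|H|chi_B>;  s: overlap <chi_A|chi_B>
    """
    H = np.array([[alpha, beta], [beta, alpha]])
    S = np.array([[1.0, s], [s, 1.0]])
    # Symmetric orthogonalisation, then an ordinary symmetric eigenproblem.
    w, U = np.linalg.eigh(S)
    X = U @ np.diag(w ** -0.5) @ U.T
    energies, vecs = np.linalg.eigh(X @ H @ X)
    coeffs = X @ vecs            # back-transform to the atomic basis (columns = MOs)
    return energies, coeffs


E, C = minimal_basis_h2(alpha=-0.6, beta=-0.4, s=0.6)   # hypothetical values
print(E)
print(C)
```

Changing α, β or s moves the energies but never the ratio of the two coefficients in each column, which stays ±1; giving the orbitals room to change shape requires more basis functions per atom.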

Running this, I get an energy of -1.0835124662535 Hartree. This is below twice the single-atom energy found above (2 × -0.4986 Hartree), so the molecule is bound! However, we are not allowing the atomic orbitals to relax. If you look at the eigenvectors for this system, you'll see that they are simple bonding/antibonding combinations of the atomic orbitals, and they cannot deform. We can do better by adding more Gaussian basis functions. Here is how to make a triple-ζ with polarization (TZP) basis:

• For each atomic orbital:
  – Add two Gaussian orbitals of the same angular momentum as the atomic orbital, with exponents spaced by a factor of about 3 (we will optimize these).
  – Add one Gaussian orbital of one higher angular momentum than the atomic orbital, with an exponent of about 0.6 (we will optimize this later).
  – Optimize the Gaussian exponents in a local environment similar to your system of interest.

We can do that here:

    H2
    MOLECULE
    1
    2
    1 0.0 0.0 0.0
    1 0.0 0.0 0.7
    ENDGEOM
    1 4
    0 0 4 1. 1.
    0.158783290609 0.6361
    0.595315492828 0.3622
    13.2327468043 0.1901E-01
    2.22967713977 0.1068
    0 0 1 0. 1.
    *1.0 1.0
    0 0 1 0. 1.
    *0.3 1.0
    0 2 1 0. 1.
    *0.6 1.0
    99 0
    ENDB
    UHF
    PRINTOUT
    EIGENVECS
    -1
    END
    END
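The three extra shells in this input follow the TZP recipe mechanically, so they can be generated. The hypothetical helper below emits them for an atomic orbital of angular momentum l, using CRYSTAL's shell-type codes (LAT: 0 = s, 1 = sp, 2 = p, 3 = d, 4 = f). The starting exponents are guesses to be optimized afterwards, and the shell count on the `1 4` line still has to be updated by hand.

```python
# CRYSTAL shell-type codes (LAT) for pure shells of angular momentum l.
LAT_FOR_L = {0: 0, 1: 2, 2: 3, 3: 4}


def tzp_extra_shells(l, first=1.0, ratio=3.0, polarization=0.6):
    """Extra uncontracted shells for one atomic orbital of angular momentum l.

    Two shells of the same l (exponents `first` and `first / ratio`) plus one
    shell of angular momentum l + 1.  Each exponent is starred so that it can
    be optimized afterwards; the values are starting guesses only.
    """
    if l + 1 not in LAT_FOR_L:
        raise ValueError("polarization beyond f is not handled in this sketch")
    shells = []
    for exponent, lat in ((first, LAT_FOR_L[l]),
                          (first / ratio, LAT_FOR_L[l]),
                          (polarization, LAT_FOR_L[l + 1])):
        shells.append(f"0 {lat} 1 0. 1.")              # ITYB LAT NG CHE SCAL
        shells.append(f"*{round(exponent, 6)} 1.0")    # exponent, coefficient
    return "\n".join(shells)


# Reproduces the three added shells of the H2 input above (exponents 1.0, 0.3, 0.6).
print(tzp_extra_shells(0, ratio=10 / 3))
```

For a p atomic orbital, `tzp_extra_shells(1)` would give two p shells and one d shell.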

Then run optimize_stars.py on this input as we did for the atom. I got a large reduction in the energy, to -1.1292157899972 Hartree, much lower than the single-ζ result!

Final notes:

• The contracted atomic orbitals change if the pseudopotential changes. That is, a BFD pseudopotential will need a different contracted orbital than a Trail-Needs pseudopotential.

• The uncontracted orbitals will often be the same across different pseudopotentials. They are meant to describe the changes in the atomic orbitals in the presence of other atoms, which should (if the pseudopotential approximation is good) be universal. You can thus 'steal' uncontracted Gaussians from other sets.

• If you have a metallic system (zero gap), make sure to read the CRYSTAL manual carefully for potential issues there. In particular, it may be necessary to increase TOLINTEG.

Figure 1 shows the convergence of the total energy as the basis improves. For reference, here are the basis inputs that I used to generate the plot:

SZ:

    1 1
    0 0 4 1. 0.
    0.158783290609 0.6361
    0.595315492828 0.3622
    13.2327468043 0.1901E-01
    2.22967713977 0.1068

DZP:

    1 3
    0 0 4 1. 1.
    0.158783290609 0.6361
    0.595315492828 0.3622
    13.2327468043 0.1901E-01
    2.22967713977 0.1068
    0 0 1 0. 1.
    *1.0 1.0
    0 2 1 0. 1.
    *0.6 1.0

TZP:

    1 4
    0 0 4 1. 1.
    0.158783290609 0.6361
    0.595315492828 0.3622
    13.2327468043 0.1901E-01
    2.22967713977 0.1068
    0 0 1 0. 1.
    *1.0 1.0
    0 0 1 0. 1.
    *0.3 1.0
    0 2 1 0. 1.
    *0.6 1.0

QZDP:

    1 6
    0 0 4 1. 1.
    0.158783290609 0.6361
    0.595315492828 0.3622
    13.2327468043 0.1901E-01
    2.22967713977 0.1068
    0 0 1 0. 1.
    *1.0 1.0
    0 0 1 0. 1.
    *0.3 1.0
    0 0 1 0. 1.
    *3.0 1.0
    0 2 1 0. 1.
    *2.4 1.0
    0 2 1 0. 1.
    *0.6 1.0

[Figure 1: Convergence of the total energy with the basis. Energy (Hartree, axis from -1.14 to -1.08) for the SZ, DZP, TZP, and QZDP bases.]
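Each step in Figure 1 buys accuracy with more basis functions, and the cost of a calculation grows with their number. The hypothetical sketch below counts the functions in one atom's user-defined basis by reading the shell headers token by token, so it accepts both the multi-line layout above and a flattened one. It assumes CRYSTAL's conventions of 1 function per s shell, 4 per sp, 3 per p, and spherical d (5) and f (7) shells, and handles only ITYB = 0 shells.

```python
# Functions per shell type (LAT code), assuming spherical d (5) and f (7) shells.
FUNCTIONS_PER_SHELL = {0: 1, 1: 4, 2: 3, 3: 5, 4: 7}


def count_basis_functions(basis_text):
    """Count the basis functions in one atom's user-defined CRYSTAL basis.

    basis_text holds 'Z NSHELL' followed by the shells; line breaks do not
    matter because the text is read token by token.
    """
    tokens = basis_text.split()
    n_shell = int(tokens[1])
    pos = 2                                      # skip Z and NSHELL
    total = 0
    for _ in range(n_shell):
        ityb, lat, ng = (int(t) for t in tokens[pos:pos + 3])
        if ityb != 0:
            raise ValueError("only user-defined shells (ITYB = 0) are handled")
        pos += 5                                 # ITYB LAT NG CHE SCAL
        per_primitive = 3 if lat == 1 else 2     # sp primitives carry two coefficients
        pos += ng * per_primitive                # skip the primitive lines
        total += FUNCTIONS_PER_SHELL[lat]
    if pos != len(tokens):
        raise ValueError("shell count and shell data disagree")
    return total


DZP = """1 3
0 0 4 1. 1.
0.158783290609 0.6361
0.595315492828 0.3622
13.2327468043 0.1901E-01
2.22967713977 0.1068
0 0 1 0. 1.
*1.0 1.0
0 2 1 0. 1.
*0.6 1.0"""
print(count_basis_functions(DZP))
```

Under these assumptions the four bases above carry 1, 5, 6 and 10 functions per hydrogen atom, respectively.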

FAQ

How do I adapt a basis for molecular systems to CRYSTAL?

This is sometimes a bit tricky. I often experiment with the method outlined above while adapting the molecular basis. Remember that the energy has an upper-bound property with respect to the basis: enlarging the basis can only lower the energy or leave it unchanged. The main issue you will find with using a molecular basis is that it may contain very diffuse Gaussians (small exponents). These largely determine the runtime of CRYSTAL, but they can also increase the accuracy, so it's a tradeoff.
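The upper-bound property can be seen in a few lines for the linear (one-electron) case: for nested bases, each containing the previous one, the lowest eigenvalue of $Hc = ESc$ can only go down or stay the same. The matrices below are random stand-ins, not real integrals. Read in reverse, the same argument says that dropping diffuse functions to save runtime can only raise the variational energy.

```python
import numpy as np


def lowest_eigenvalue(H, S):
    """Lowest E of H c = E S c (S symmetric positive definite)."""
    w, U = np.linalg.eigh(S)
    X = U @ np.diag(w ** -0.5) @ U.T
    return np.linalg.eigvalsh(X @ H @ X)[0]


rng = np.random.default_rng(0)
n = 6
A = rng.normal(size=(n, n))
H = (A + A.T) / 2                      # hypothetical Hamiltonian matrix
B = rng.normal(size=(n, n))
S = B @ B.T + n * np.eye(n)            # hypothetical overlap matrix (positive definite)

# Nested bases: keep the first k functions, k = 1..n.
energies = [lowest_eigenvalue(H[:k, :k], S[:k, :k]) for k in range(1, n + 1)]
for k, e in enumerate(energies, start=1):
    print(k, e)
```

Restricting $H$ and $S$ to their leading k×k blocks is the same as using only the first k basis functions, so each step minimizes Eq. (4) over a larger space.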
