TeraChem Beta2 User Guide

TeraChem User Guide

Beta2 Release

PetaChem, LLC 27170 Moody Court Los Altos Hills, CA http://www.petachem.com

© 2009 PetaChem, LLC TeraChem Beta2 User Guide

Introduction

TeraChem is general purpose quantum chemistry software designed to run on GPU architectures under a 64-bit Linux operating system. Some of TeraChem's features include:
• Restricted Hartree-Fock and grid-based restricted Kohn-Sham single point energy and gradient calculations
• Various DFT functionals (BLYP, B3LYP, PBE, etc.) and DFT grids (800 - 80,000 grid points per atom)
• Geometry optimization (L-BFGS, conjugate gradient, steepest descent)
  o The optimization can be carried out in either Cartesian or internal coordinates, as specified in the start file (all input geometries are provided in Cartesians). The Cartesian → internal → Cartesian coordinate transformation is performed automatically whenever required.
• Transition state search (nudged elastic band) in internal and Cartesian coordinates
• Ab initio molecular dynamics (NVE, NVT ensembles)
• Support of multiple-GPU systems

This Beta2 release, however, is restricted to run on only one Nvidia GPU. This is done for testing purposes. Please note that, unlike the Beta1 version, the Beta2 release imposes no other restrictions.

The next (Beta3) version of TeraChem will be released in February 2010 and will include:
• Unrestricted Hartree-Fock and Kohn-Sham methods (single-point, geometry optimization, transition state search, and ab initio molecular dynamics)
• More DFT functionals
• Constrained geometry optimization

Obtaining TeraChem

TeraChem Beta2 Release software is available free of charge for all interested users in Academia, Government, and Industry. To obtain a copy of TeraChem Beta2 please fill out the online application at http://www.petachem.com/betaversion.html.

Acknowledgements

This software was developed by Ivan Ufimtsev and Todd Martinez at the University of Illinois at Urbana-Champaign and PetaChem, LLC. The authors would especially like to thank Nathan Luehr for contributing the GPU-accelerated code for construction of numerical DFT grids. Some parts of TeraChem (geometry optimization and transition state search) use the DL-FIND library created by Johannes Kästner at Stuttgart University ("DL-FIND: an Open-Source Geometry Optimizer for Atomistic Simulations", Johannes Kästner, Joanne M. Carr, Thomas W. Keal, Walter Thiel, Adrian Wander, Paul Sherwood, J. Phys. Chem. A 113, 11856 (2009)).

System requirements

This Beta2 version of TeraChem was compiled and tested under the 64-bit Redhat Enterprise Linux 5.3 operating system running on an Intel Core2 quad-core CPU machine. An Nvidia graphics card with compute capability 1.3 is required to run the program. Please refer to the CUDA Programming Guide at http://www.nvidia.com/object/cuda_develop.html for the list of Nvidia GPUs that meet this requirement. Furthermore, a CUDA driver must be installed on the system. Details on how to obtain and install the CUDA driver are provided below.

Installation

1) First of all, you need to download and install the CUDA 2.3 driver. To do that, go to http://www.nvidia.com/object/cuda_get.html, select [Linux 64-bit] in the “Operating System” dropdown and then the Linux version you have in the “Linux Version” menu. For CentOS, select [Redhat Enterprise]. You will need to install only the CUDA 2.3 driver; no CUDA toolkit or SDK is required. After downloading the driver package, shut down the X server by typing

    init 3

launch the driver binary, and follow the instructions. After the installation is complete, restart X by typing

    init 5

2) Unpack the tc.tar.gz archive using the following command (in a temporary directory which you may later remove):

    tar zxvf tc.tar.gz

3) Run the install script by typing

    ./install

This script will verify that your machine has a suitable graphics card, verify that you accept the license terms, and install the software in a location of your choosing. It will also create a script which sets the appropriate environment variables.

Running TeraChem

After installation, you can run TeraChem by typing

    source instdir/TeraChem/SetTCVars.sh
    instdir/TeraChem/terachem inputfile

where instdir is the installation directory you chose during the install (defaults to your home directory) and inputfile is the name of a TeraChem input file. Note that the environment variable TeraChem is set by sourcing SetTCVars.sh. This is needed in order for TeraChem to locate its basis set library files.

The inputfile contains the required parameters of the job (including the filename of the file which contains the atomic coordinates for the molecule of interest). Most of the parameters have default values. The complete list of parameters available in this Beta2 version is presented in Table 1. An example of the configuration file used for single point energy calculations of caffeine with the BLYP functional, DFT-D dispersion corrections, and the 6-31G basis set is:

    # basis set
    basis 6-31g

    # coordinates file
    coordinates caffeine.xyz

    # molecule charge
    charge 0

    # SCF method (rhf/blyp/b3lyp/etc...): DFT-BLYP
    method blyp

    # add dispersion correction (DFT-D)
    dftd yes

    # type of the job (energy/gradient/md/minimize/ts): energy
    run energy

    end

All comment lines begin with the ‘#’ character. All characters in the configuration file should be in lower case. There is no requirement imposed on the line ordering in the configuration file except that the last word should be ‘end’. Below is the output of this job. The program first lists all parameter values, followed by all GPUs used in the job (in Beta2 only one GPU is used). Each GPU has its compute capability printed next to it. The molecule properties, such as the total charge and the number of atoms, are printed right after the hardware section, along with basis set information and atomic coordinates. The output ends with the SCF procedure, which reports for each iteration the DIIS error (the maximum component of the DIIS error vector), the integrated number of electrons, the exchange-correlation energy, the SCF energy, and the elapsed time.
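The input file rules stated above (comment lines start with ‘#’, one keyword followed by its value per line, and a closing ‘end’) can be illustrated with a short Python sketch. This is not part of TeraChem; the function name parse_terachem_input is hypothetical, and the sketch assumes that every parameter line has exactly the form "keyword value".

```python
def parse_terachem_input(text):
    """Parse 'keyword value' lines of an input file into a dict.

    Assumptions (not confirmed by the guide): one parameter per line,
    and anything after the closing 'end' keyword is ignored.
    """
    params = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line == "end":
            return params  # 'end' closes the configuration
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"expected 'keyword value', got: {line!r}")
        keyword, value = parts
        params[keyword] = value.strip()
    raise ValueError("input file must finish with 'end'")


if __name__ == "__main__":
    example = """\
# basis set
basis 6-31g
# coordinates file
coordinates caffeine.xyz
charge 0
method blyp
dftd yes
run energy
end
"""
    for keyword, value in parse_terachem_input(example).items():
        print(keyword, "=", value)
```

All values are kept as strings, so a caller would convert entries such as charge or maxit to numbers as needed.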

    XYZ coordinates molecules/caffeine.xyz
    Orbitals will be written to orbitals.log every 1000000000 time step
    Spin multiplicity: 1
    Using DIIS algorithm to converge WF
    WF convergence threshold: 3.00e-05
    Maximum number of SCF iterations: 100
    Coulomb integral threshold: 1.00e-11
    One-electron integral threshold: 1.00e-12
    Gradient threshold: 1.00e-12
    Exchange integral threshold: 1.00e-11
    K-guard threshold: 1.00e-03
    X-matrix tolerance: 1.00e-05
    Method: DFT-D BLYP
    DFT grid type: 2
    Initial guess generated by maximum overlap

    ******************************************
    **** SINGLE POINT ENERGY CALCULATIONS ****
    ******************************************
    using 1 out of 6 CUDA devices
    0: GeForce GTX 295 (CC 1.3) -- USED
    1: GeForce GTX 295 (CC 1.3) -- IDLE
    2: GeForce GTX 295 (CC 1.3) -- IDLE
    3: GeForce GTX 295 (CC 1.3) -- IDLE
    4: GeForce GTX 295 (CC 1.3) -- IDLE
    5: GeForce GTX 295 (CC 1.3) -- IDLE
    ------

    Basis set: 6-31g
    Total atoms: 24
    Total charge: 0
    Total electrons: 102 (51-alpha, 51-beta)
    Total orbitals: 146
    Total AO shells: 90 (62 S-shells; 28 P-shells)
    The spin state is singlet

    BASIS SET INFORMATION
    SHELL EXPONENT COEFF
    S 6
    3047.5248800 0.001834737132
    457.3695180 0.014037322813
    103.9486850 0.068842622264
    29.2101553 0.232184443216
    9.2866630 0.467941348435
    3.1639270 0.362311985337

    <-- TRUNCATED -->

    *** Molecular Geometry (ANGS) ***
    CAFFEINE

    Type X Y Z
    C 0.916319 0.172365 0.000054
    C 0.350908 -1.060221 0.000041
    C -1.829481 -0.153083 0.000094

    <-- TRUNCATED -->
    generating PQs...
    generating S...
    generating guess...
    Time to generate guess: 0.27
    0: CUBLAS initialized
    Setting up the DFT grid...
    time to set the grid = 0.03 s
    DFT grid points: 147438 (6143 points/atom)
    Purifying P...
    IDMP = 2.442491e-15
    1 DIISerr = 0.347813 Nelec = 101.997440 Exc = -90.019379 a.u. Escf = -677.423360 a.u. Elapsed time: 0.95 sec
    2 DIISerr = 0.976397 Nelec = 101.998773 Exc = -93.633544 a.u. Escf = -670.995777 a.u. Elapsed time: 0.95 sec
    3 DIISerr = 0.311630 Nelec = 101.998597 Exc = -93.284300 a.u. Escf = -678.960328 a.u. Elapsed time: 0.94 sec
    4 DIISerr = 0.167960 Nelec = 101.998376 Exc = -91.634326 a.u. Escf = -679.828163 a.u. Elapsed time: 0.93 sec
    5 DIISerr = 0.188721 Nelec = 101.998437 Exc = -92.089113 a.u. Escf = -679.845689 a.u. Elapsed time: 0.92 sec
    6 DIISerr = 0.052340 Nelec = 101.998438 Exc = -92.099619 a.u. Escf = -679.956199 a.u. Elapsed time: 0.92 sec
    7 DIISerr = 0.012841 Nelec = 101.998431 Exc = -92.054925 a.u. Escf = -679.967389 a.u. Elapsed time: 0.91 sec
    8 DIISerr = 0.005672 Nelec = 101.998430 Exc = -92.045538 a.u. Escf = -679.968392 a.u. Elapsed time: 0.90 sec
    9 DIISerr = 0.004079 Nelec = 101.998431 Exc = -92.049942 a.u. Escf = -679.968469 a.u. Elapsed time: 0.89 sec
    10 DIISerr = 0.000660 Nelec = 101.998431 Exc = -92.053289 a.u. Escf = -679.968546 a.u. Elapsed time: 0.88 sec
    11 DIISerr = 0.000245 Nelec = 101.998431 Exc = -92.054180 a.u. Escf = -679.968549 a.u. Elapsed time: 0.86 sec
    12 DIISerr = 0.000118 Nelec = 101.998431 Exc = -92.053914 a.u. Escf = -679.968550 a.u. Elapsed time: 0.85 sec
    13 DIISerr = 0.000032 Nelec = 101.998431 Exc = -92.053849 a.u. Escf = -679.968550 a.u. Elapsed time: 0.83 sec
    14 DIISerr = 0.000014 Nelec = 101.998431 Exc = -92.053872 a.u. Escf = -679.968550 a.u. Elapsed time: 0.80 sec
    FINAL ENERGY: -679.968550 a.u.
    CENTER OF MASS: {0.008017, 0.006143, 0.000066} ANGS
    DIPOLE MOMENT: {3.584927, -0.524282, 0.000005} (|D| = 3.623062) DEBYE
    Processing time: 12.92 sec
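For post-processing, the SCF iteration records and the final energy can be pulled out of the screen output with regular expressions. The following Python sketch is only an illustration based on the field labels visible in the output above (DIISerr, Nelec, Exc, Escf, FINAL ENERGY); the function name parse_scf_output is hypothetical, and the exact output layout may differ between TeraChem versions.

```python
import re

# One SCF record: iteration number followed by the labelled fields.
SCF_RECORD = re.compile(
    r"(\d+)\s+DIISerr\s*=\s*(\S+)\s+Nelec\s*=\s*(\S+)\s+"
    r"Exc\s*=\s*(\S+)\s+a\.u\.\s+Escf\s*=\s*(\S+)\s+a\.u\."
)
FINAL_ENERGY = re.compile(r"FINAL ENERGY:\s*(\S+)\s+a\.u\.")


def parse_scf_output(text):
    """Return (list of per-iteration dicts, final energy or None)."""
    iterations = []
    for match in SCF_RECORD.finditer(text):
        iterations.append({
            "iteration": int(match.group(1)),
            "diis_error": float(match.group(2)),
            "nelec": float(match.group(3)),
            "exc": float(match.group(4)),   # exchange-correlation energy, a.u.
            "escf": float(match.group(5)),  # SCF energy, a.u.
        })
    final = FINAL_ENERGY.search(text)
    final_energy = float(final.group(1)) if final else None
    return iterations, final_energy


if __name__ == "__main__":
    sample = (
        "13 DIISerr = 0.000032 Nelec = 101.998431 Exc = -92.053849 a.u. "
        "Escf = -679.968550 a.u. Elapsed time: 0.83 sec\n"
        "14 DIISerr = 0.000014 Nelec = 101.998431 Exc = -92.053872 a.u. "
        "Escf = -679.968550 a.u. Elapsed time: 0.80 sec\n"
        "FINAL ENERGY: -679.968550 a.u.\n"
    )
    records, energy = parse_scf_output(sample)
    print(len(records), "iterations, final energy", energy)
```

Because the pattern is searched over the whole text rather than line by line, it does not depend on how the records are wrapped.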

The coordinates file passed to the coordinates parameter should be in the XMol format, where the first line specifies the number of atoms and the second line provides a description of the system (it can be left blank). Atomic coordinates are listed starting from the third line. All coordinates are in Angstroms. Here is an example file for a hydrogen molecule.

    2
    Hydrogen Molecule – Xmol format
    H 0.0 0.0 0.0
    H 0.7 0.0 0.0

Some jobs (for example, transition state search using NEB method) require several sets of coordinates (frames). In this case all frames should be listed in the coordinates file one by one, i.e.

    2
    Hydrogen Molecule – Xmol format frame 1
    H 0.0 0.0 0.0
    H 0.7 0.0 0.0
    2
    Hydrogen Molecule – Xmol format frame 2
    H 0.0 0.0 0.0
    H 0.8 0.0 0.0

Note that there should be no blank lines between individual frames.
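The multi-frame layout described above (atom count, description line, then one line per atom, with frames following each other directly) can be read with a few lines of Python. This is a minimal sketch for checking your own coordinate files, not TeraChem's reader; the function name read_xyz_frames is hypothetical.

```python
def read_xyz_frames(text):
    """Split XMol text into frames of (description, [(symbol, x, y, z), ...])."""
    lines = text.splitlines()
    frames = []
    pos = 0
    while pos < len(lines):
        if not lines[pos].strip():
            # Blank lines are tolerated only as trailing padding.
            if any(rest.strip() for rest in lines[pos:]):
                raise ValueError(f"unexpected blank line at line {pos + 1}")
            break
        natoms = int(lines[pos])
        if pos + 2 + natoms > len(lines):
            raise ValueError("file ends before the frame is complete")
        description = lines[pos + 1]
        atoms = []
        for row in lines[pos + 2 : pos + 2 + natoms]:
            symbol, x, y, z = row.split()[:4]
            atoms.append((symbol, float(x), float(y), float(z)))
        frames.append((description, atoms))
        pos += natoms + 2  # jump to the atom count of the next frame
    return frames


if __name__ == "__main__":
    two_frames = (
        "2\nHydrogen Molecule frame 1\nH 0.0 0.0 0.0\nH 0.7 0.0 0.0\n"
        "2\nHydrogen Molecule frame 2\nH 0.0 0.0 0.0\nH 0.8 0.0 0.0\n"
    )
    for description, atoms in read_xyz_frames(two_frames):
        print(description, atoms)
```

A blank line between frames raises an error here, mirroring the rule that frames must not be separated by blank lines.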

The PDB format often used for protein molecules is also supported and will be automatically assumed if the filename for the coordinates ends in ‘.pdb’. The basis set information used by TeraChem is provided by a set of files located in the basis directory. This directory must therefore be located in the same directory as the TeraChem binary (it is placed there by the install script).

Table 1. TeraChem job parameters available in the Beta2 version.

General parameters

| Parameter | Description | Default value |
|---|---|---|
| coordinates | Name of file containing atomic coordinates | not set |
| basis | One of the following: sto-3g, sto-6g, 3-21g, 6-31g, 6-311g, 3-21++g, 6-31++g | not set |
| charge | The total charge of the molecule (integer). | not set |
| method | rhf/blyp/b3lyp. Although various DFT functionals have been implemented, only BLYP and B3LYP are available in this beta version | rhf |
| dftgrid | Integer value from [0-5] range, inclusive. Grid 0 contains approximately 800 grid points per atom, grid 5 approximately 80,000 points per atom. The default grid (type 2) contains 6,000-7,000 points per atom. | 2 |
| guess | generate or a WF file. generate means the initial WF guess is generated from scratch using maximum orbital overlap; otherwise, it is loaded from the WF file. The WF is dumped at the end of each calculation to the scr/c0 file. | generate |
| maxit | Maximum number of SCF iterations (integer). | 100 |
| convthre | WF convergence threshold (float). | 3.0e-5 |
| xtol | Basis set linear dependency threshold (float). When diffuse functions are used, xtol and convthre should be raised to ~1.0e-4 … 1.0e-3 | 1.0e-5 |
| dftd | Should dispersion corrections be used? (yes or no) | no |
| units | Units used for coordinates (angstrom or bohr) | angstrom |

Geometry Optimization and Transition State Search

| Parameter | Description | Default value |
|---|---|---|
| min_print | How much should be printed on the screen (something/verbose/debug) | verbose |
| nstep | Maximum number of optimization/TS search steps | 100 |
| min_tolerance | Termination criterion based on the maximum energy gradient component. | 10^-3 |
| min_tolerance_e | Termination criterion based on the SCF energy change. | 10^-4 |
| min_coordinates | Type of coordinates in which optimization/TS search is performed (cartesian/internal/total_connection) | internal |
| min_method | Optimization/TS search method (sd/cg1/cg2/lbfgs). sd – steepest descent; cg1 and cg2 – conjugate gradient; lbfgs – L-BFGS | lbfgs |
| min_hess_update | Hessian update algorithm (never/powell/bofill/bfgs). If never, the Hessian is recalculated using finite differences at each step | bfgs |
| min_init_hess | Initial Hessian (fischer-almlof/one-point/two-point/diagonal/identity). one-/two-point – exact Hessian from finite differences; diagonal – only diagonal elements are calculated using finite differences, off-diagonal elements equal zero; identity – initial Hessian is an identity matrix | fischer-almlof |
| min_delta | Atomic displacement in finite difference calculations | 0.003 |
| min_max_step | Maximum step size in internal coordinates | 0.5 |
| min_restart | Whether the optimization/TS search job is started from scratch (no) or loaded from the checkpoint files (yes) | no |
| min_dump | How often the checkpoint files are written. By default at each 10th step. | 10 |
| ts_method | Transition state search method (neb_free/neb_restricted/neb_frozen/neb_free_cart/neb_restricted_cart/neb_frozen_cart). neb_free – Nudged Elastic Band (NEB) with free endpoints; neb_restricted – NEB with endpoints allowed to move perpendicularly to their tangent direction; neb_frozen – NEB with frozen endpoints; neb_x_cart – only initialization is performed in min_coordinates coordinates, while the TS search is done in Cartesians | neb_free |
| min_image | Number of NEB images in the TS search calculations. Should be greater than one. The images are listed in the input coordinates file (specified by coordinates). If the number of images found is smaller than min_image, the program will automatically generate missing images by interpolation. At least two images (endpoints) should be listed in the coordinates file. The last one (i.e. the min_image-th) is the climbing image. | 10 |

Molecular dynamics parameters

| Parameter | Description | Default value |
|---|---|---|
| nstep | Total number of MD steps. Set nstep to 0 for single-point energy calculations (integer). | 10^6 |
| rseed | Seed for random number generator. | 1351351 |
| timestep | MD integration time step in femtoseconds (float) | 1.0 |
| thermostat | Temperature control – velocity rescaling or Langevin dynamics (rescale or langevin) | rescale |
| rescalefreq | When velocity rescaling is used, determines how often the velocities are rescaled. For instance, setting rescalefreq to 1000 will force rescaling at every 1000th MD step. To obtain NVE dynamics, set rescalefreq to a value larger than nstep. | 2·10^9 |
| tinit | Initial temperature (K); the initial velocities are sampled from the Boltzmann distribution at T = tinit (float) | 300.0 |
| t0 | Thermostat temperature (K) (float) | 300.0 |
| lnvtime | The Langevin damping time (fs), only used when thermostat is set to langevin. | 1000.0 |
| orbitals | Path to an output file containing the canonical orbitals. The orbitals are printed in GAMESS format and can be visualized by VMD. | orbitals.log |
| orbitalswrtfrq | Determines how often the orbitals are written to the output file. Due to their large size (the orbitals sometimes require 100 MB or more of disk space) it does not make sense to write orbitals at every MD iteration. (integer) | 2·10^9 |
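The interplay of the thermostat, rescalefreq, and nstep parameters in Table 1 can be made concrete with a small Python sketch that writes an MD input file for either ensemble. It is only an illustration under stated assumptions: the helper build_md_input is hypothetical, NVE is obtained as the table describes (rescalefreq larger than nstep), and NVT is assumed here to mean the Langevin thermostat held at t0.

```python
def build_md_input(coordinates, basis, method, nstep,
                   ensemble="nve", temperature=300.0, charge=0):
    """Return the text of an MD input file for the requested ensemble."""
    lines = [
        f"basis {basis}",
        f"coordinates {coordinates}",
        f"charge {charge}",
        f"method {method}",
        "run md",
        f"nstep {nstep}",
        f"tinit {temperature}",
    ]
    if ensemble == "nve":
        # Velocity rescaling that never fires: rescalefreq exceeds nstep.
        lines += ["thermostat rescale", f"rescalefreq {nstep + 1}"]
    elif ensemble == "nvt":
        # Assumption: constant temperature via the Langevin thermostat at t0.
        lines += ["thermostat langevin", f"t0 {temperature}"]
    else:
        raise ValueError("ensemble must be 'nve' or 'nvt'")
    lines.append("end")  # the last word of the file must be 'end'
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_md_input("caffeine.xyz", "6-31g", "blyp", nstep=10))
```

Parameters that are not written out, such as timestep or lnvtime, keep the default values listed in Table 1.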

Output results

In addition to the information displayed on the screen, TeraChem creates several output files.

scr/c0 – the converged WF binary file containing the C[i][j] array, where i (row) is the MO index and j (column) is the basis function index. This file can be used as the initial WF guess in subsequent calculations.

orbitals.log – the canonical MOs in GAMESS format. This file name can be modified by the orbitals parameter. Because the orbitals require much disk space, they can be written every nth MD step, as specified by orbitalswrtfrq. The orbitals can be visualized by VMD.

coors.xyz – the MD trajectory geometry file in the XMol format. The trajectory can be visualized by VMD.

optim.xyz – the geometry optimization or transition state search (depending on the job type) trajectory geometry file in the XMol format. The trajectory can be visualized by VMD.

log.xls – a tab separated file containing 7 columns: 1) SCF energy, 2) currently not in use, 3) Kinetic energy, 4) Temperature, 5) Total energy (SCF + Kinetic), 6) HOMO energy, 7) LUMO energy. All energies are in Hartree, temperature in Kelvin. In NVT dynamics, the Total energy does not include contribution from the damping force, and thus it should not be conserved.
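The column layout of log.xls described above lends itself to a quick check of energy conservation in NVE runs. The Python sketch below is an illustration only: the names read_md_log and total_energy_drift are hypothetical, and it assumes that each data row holds the seven tab-separated values in the stated order, skipping any row that is not numeric.

```python
import csv
import io

# Column order as described for log.xls; the second column is not in use.
COLUMNS = ("scf_energy", "unused", "kinetic_energy", "temperature",
           "total_energy", "homo_energy", "lumo_energy")


def read_md_log(text):
    """Parse tab-separated MD log text into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(fields) < len(COLUMNS):
            continue  # blank or incomplete line
        record = dict(zip(COLUMNS, fields))
        try:
            for name in COLUMNS:
                if name != "unused":
                    record[name] = float(record[name])
        except ValueError:
            continue  # e.g. a header line, if one is present
        rows.append(record)
    return rows


def total_energy_drift(rows):
    """Difference in total energy (Hartree) between last and first step."""
    return rows[-1]["total_energy"] - rows[0]["total_energy"]


if __name__ == "__main__":
    # Made-up numbers, used only to show the call sequence.
    sample = ("-1.10\t0\t0.01\t300.0\t-1.09\t-0.5\t0.2\n"
              "-1.11\t0\t0.02\t310.0\t-1.09\t-0.5\t0.2\n")
    print(total_energy_drift(read_md_log(sample)))
```

To use it on a real run, pass the contents of the file, for example read_md_log(open("log.xls").read()).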

optlog.xls – a tab separated file containing 7 columns of which only one (the first one) is currently used. The first column contains the SCF energy trajectory during geometry optimization or transition state search.

neb_n.xyz – an XMol file containing the trajectory of the nth NEB image. The last image (i.e. the 10th if min_image equals 10) is the actual transition state, which can also be obtained from the optim.xyz file.

nebinfo – this file contains energies of all (min_image-1) NEB images along the NEB converged path.

nebpath.xyz – this file contains XYZ coordinates of all (min_image-1) NEB images along the NEB converged path.

Geometry Optimization and Transition State Search

The instdir/tests/sp directory contains a simple configuration file (start.go) used for geometry optimization of a spiropyran molecule:

    # basis set
    basis 6-31g

    # coordinates file
    coordinates sp.xyz

    # molecule charge
    charge 0

    # SCF method (rhf/blyp/b3lyp/etc...): RHF
    method rhf

    # type of the job (energy/gradient/md/minimize/ts): geometry optimization
    run minimize

    end

The optimization is triggered by the ‘minimize’ keyword. Note that the sp.xyz file in fact contains two sets of coordinates (frames) written one after the other. In geometry optimization jobs only the first frame is taken into account, while the others (if any) are ignored. The second frame, however, is required by transition state search jobs and represents the second NEB endpoint in NEB calculations. The TS search is triggered by the ‘ts’ keyword, as in the following configuration file (start.ts):

    # basis set
    basis 6-31g

    # coordinates file
    coordinates sp.xyz

    # molecule charge
    charge 0

    # SCF method (rhf/blyp/b3lyp/etc...): RHF
    method rhf

    # type of the job (energy/gradient/md/minimize/ts): TS search
    run ts

    end
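When the coordinates file lists fewer frames than min_image, Table 1 states that the missing NEB images are generated by interpolation. The guide does not say which interpolation scheme is used, so the Python sketch below is only an illustration of the idea: it assumes plain linear interpolation of Cartesian coordinates between the two endpoints, and the function name interpolate_images is hypothetical.

```python
def interpolate_images(start, end, n_images):
    """Linearly interpolate n_images geometries from start to end.

    start and end are lists of (x, y, z) tuples with the same atom order.
    The first and last returned images are the endpoints themselves.
    """
    if n_images < 2:
        raise ValueError("at least two images (the endpoints) are required")
    if len(start) != len(end):
        raise ValueError("endpoints must contain the same number of atoms")
    images = []
    for k in range(n_images):
        t = k / (n_images - 1)  # 0.0 at the first endpoint, 1.0 at the last
        image = [
            tuple(a + t * (b - a) for a, b in zip(atom_start, atom_end))
            for atom_start, atom_end in zip(start, end)
        ]
        images.append(image)
    return images


if __name__ == "__main__":
    # The two hydrogen molecule frames from the coordinates file example.
    frame1 = [(0.0, 0.0, 0.0), (0.7, 0.0, 0.0)]
    frame2 = [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0)]
    for image in interpolate_images(frame1, frame2, 3):
        print(image)
```

Interpolation in internal coordinates, which the min_coordinates setting suggests may be what the program does by default, generally gives different intermediate geometries than this Cartesian version.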

Trajectory

To create a simple MD trajectory, you can use the example in instdir/tests/caffeine. The configuration file start.md is:

    # basis set
    basis 6-31g

    # coordinates file
    coordinates caffeine.xyz

    # molecule charge
    charge 0

    # SCF method (rhf/blyp/b3lyp/etc...): DFT-BLYP
    method blyp

    # type of the job (energy/gradient/md/minimize/ts): MD
    run md

    # number of MD steps
    nstep 10

    # dump orbitals every MD step
    orbitalswrtfrq 1

    end

This file differs from the one used for the single point calculation in the job type (run md) and in the nstep and orbitalswrtfrq parameters, which are now set to 10 and 1, respectively. Setting orbitalswrtfrq to 1 ensures that the molecular orbitals are written at every MD step (10 in total).

The TeraChem output coordinates file format is compatible with VMD. To open the trajectory file, go to VMD → File → New Molecule. Browse to the output coors.xyz file and make sure that the file type is correctly determined. Otherwise, select XYZ in the [Determine file type] menu. After the trajectory is loaded, you can adjust the representation settings so that the final molecule looks as shown in Figure 1.

Figure 1. Caffeine molecule geometry output (coors.xyz) in VMD.

The molecular orbitals can be visualized with VMD in similar fashion. Again, go to VMD → File → New Molecule. Browse to the output orbitals.log file and make sure that the determined file type is GAMESS. Otherwise, select GAMESS in the [Determine file type] menu. To display both positive and negative isosurfaces you will need to create one representation for each orbital. After adjusting the atomic representation, create a new one and select “orbital” in the [Drawing Method] menu. All MOs will be listed in the [Orbital] dropdown along with the orbital energy. The [Isovalue] scroll bar controls the isosurface value (in a.u.).

Figure 2. Caffeine molecule orbitals output (orbitals.log) in VMD.

Interactive Calculations

Interactive calculations are especially suitable for remote jobs when TeraChem is running on a remote machine (cluster) and the trajectory visualization is performed on a local desktop. TeraChem can visualize the geometry optimization, TS search, and molecular dynamics trajectories in real time, i.e. interactively as the calculations run. The production version will be able to carry out such interactive molecular dynamics (IMD) simulations for larger molecules using a hardware solution with up to eight GPUs. Future versions will also allow for user manipulation of the molecule, i.e. imposing external forces on atoms. In the instdir/tests/benzene directory, you will find the files needed to try IMD for benzene. The configuration file start.imd reads:

    # basis set
    basis sto-3g

    # coordinates file
    coordinates C6H6.pdb

    # molecule charge
    charge 0

    # SCF method (rhf/blyp/b3lyp/etc...): Restricted Hartree-Fock
    method rhf

    # type of the job (energy/gradient/md/minimize/ts): MD
    run md

    # initial temperature in K
    tinit 1000

    # this triggers interactive molecular dynamics
    # imd specifies the port VMD should connect to
    imd 54321

    # number of MD steps
    nstep 1000

    end

Open two terminal windows. In the first window, launch VMD and load the coordinates of the benzene molecule from C6H6.pdb. In the second terminal window, launch TeraChem, which will initialize the simulation and pause until a connection is made with VMD. Now, in VMD, select Extensions → Simulation → IMD Connect (NAMD). Type localhost (or the IP address of the remote machine on which TeraChem is currently running) in the Hostname field and 54321 (the port specified by the imd keyword in the TeraChem input file) in the Port field. Click the Connect radio button and you should see TeraChem executing in its window while the benzene molecule vibrates in the VMD display window.

Contact information

We will be very grateful to receive any feedback on your experience with TeraChem. Should you have any suggestions, concerns, or bug reports, please email them to [email protected]

Copyright

TeraChem software is Copyright © 2009 PetaChem, LLC
