DIIS and Hamiltonian diagonalisation for total-energy minimisation in the ONETEP program using the ScaLAPACK eigensolver.

Álvaro Ruiz Serrano

August 27, 2010

MSc in High Performance Computing
The University of Edinburgh
Year of Presentation: 2010

Abstract

Direct Inversion in the Iterative Subspace (DIIS) combined with Hamiltonian diagonalisation has been implemented in the ONETEP code for minimising the total energy of large systems using Density Functional Theory. This novel approach mixes the quantum single-particle density matrix at each iteration to accelerate the convergence of the self-consistent method in the inner loop of ONETEP. At each iteration the Hamiltonian matrix is diagonalised and a new density matrix is generated from its eigenvectors. Eigensolver routines working on dense matrices scale with the third power of the system size, which makes it difficult to simulate systems of thousands of atoms. The ScaLAPACK parallel eigensolver is assessed as a method for diagonalising the Hamiltonian matrix in this kind of calculation. The results show that the DIIS-Hamiltonian diagonalisation method works to a very high accuracy for systems of up to nearly 600 atoms and becomes unstable for larger systems. The ScaLAPACK parallel eigensolver has proven to enhance this approach by distributing the data over the processors, which cooperate to find the solution of the system. The DIIS method is intended to be combined with ensemble-DFT methods for simulating metallic systems, which are not achievable with linear-scaling DFT approaches. The performance results show that it is possible to simulate systems of a few thousand atoms with a computational effort that is comparable to linear-scaling DFT programs. The code that has been developed for this work will be added to the main ONETEP code repository and will be available to other users for active research on condensed matter theory.

Contents

1 Introduction

2 Background theory
  2.1 First-principles methods in computational chemistry
  2.2 Density Functional Theory
  2.3 Linear-scaling DFT
  2.4 The ONETEP code

3 The DIIS algorithm: theory and computational implementation in ONETEP
  3.1 Description of the original DIIS
  3.2 Implementation of the DIIS algorithm in the ONETEP code
  3.3 Description of the code
  3.4 Eigensolvers
    3.4.1 LAPACK serial eigensolver, DSYGVX
    3.4.2 ScaLAPACK parallel eigensolver, PDSYGVX

4 Target machines architecture, compilation details and benchmarks
  4.1 Target machines architecture
  4.2 Compilation details
  4.3 Benchmarks

5 Validation tests
  5.1 Results
  5.2 Methods for solving the convergence problems of the DIIS algorithm for large systems
    5.2.1 DIIS initialisation
    5.2.2 Kerker preconditioning
    5.2.3 Level shifting
    5.2.4 Other approaches

6 Performance results and analysis
  6.1 Comparison of LAPACK and ScaLAPACK eigensolvers
  6.2 Comparison of LNV and DIIS algorithms for the inner loop of ONETEP
  6.3 Future performance optimisations

7 Conclusions

List of Tables

4.1 Set of small test cases
4.2 Set of benchmarks corresponding to silicon nanorods of different size
4.3 Set of benchmarks corresponding to the amyloid protein of increasing size

5.1 Energy obtained by the LNV and DIIS methods for the set of small test cases. The asterisk * indicates that the calculation did not converge to the standard threshold.
5.2 Energy obtained by the LNV and DIIS methods for the benchmark set of silicon nanorods. The asterisk * indicates that the calculation did not converge to the standard threshold.

6.1 Successful calculations using the LAPACK and ScaLAPACK eigensolvers on HECToR and Iridis 3. In all cases, 4 CPUs per node have been used. The asterisk * indicates that a smaller number of nodes has not been tested and it is possible that the calculation fits on fewer nodes. To save computational time, the protein systems have not been simulated using LAPACK on Iridis 3.

List of Figures

2.1 ONETEP energy minimisation procedure. The energy functional is minimised in two nested loops: the inner loop optimises the elements of the density kernel, while the outer loop optimises the NGWFs.

3.1 Implementation of the DIIS algorithm in ONETEP.

4.1 Set of small benchmarks.
4.2 Set of silicon nanorods benchmarks.
4.3 Set of benchmarks of cuts to the amyloid protein.

5.1 Convergence of the energy of the LNV and DIIS methods for the set of small systems.
5.2 Potential energy well generated by the H-bonds between the monomers of the Tennis protein. The calculations with LNV and DIIS show very good agreement for this system.
5.3 Convergence of the energy of the LNV and DIIS methods for the set of silicon nanorods.

6.1 Timings, speed-up, parallel efficiency and serial fraction (Karp-Flatt metric) of the eigensolvers for the Si766H402 silicon nanorod on Iridis 3.
6.2 Timings, speed-up, parallel efficiency and serial fraction (Karp-Flatt metric) of the eigensolvers for the p16_20 protein benchmark on HECToR.
6.3 Timings, speed-up, parallel efficiency and serial fraction (Karp-Flatt metric) of the inner loop of ONETEP for the p64_20 protein benchmark on HECToR.
6.4 Timings, speed-up, parallel efficiency and serial fraction (Karp-Flatt metric) of the inner loop of ONETEP for the p64_20 protein benchmark on HECToR.
6.5 Comparison of the scaling of the LNV and DIIS algorithms with the system size for the set of silicon nanorods. The systems have been simulated using 20 nodes and 4 CPUs per node on Iridis 3.
6.6 Comparison of the scaling of the LNV and DIIS algorithms with the system size for the set of amyloid benchmarks. The systems have been simulated using 20 nodes and 4 CPUs per node on HECToR and Iridis 3.

6.7 Scaling of the DIIS-Hamiltonian diagonalisation routines with the system size of the amyloid protein. The systems have been simulated using 20 nodes and 4 CPUs per node on Iridis 3.

Acknowledgements

The author would like to acknowledge the members of the ONETEP developers group for their valuable help and his MSc supervisor, Bartosz Dobrzelecki, for his advice during the completion of this work. This project was done as part of a High-End Computing PhD studentship granted to the author, funded by EPSRC and supervised by Dr. Chris-Kriton Skylaris at the University of Southampton. The first two years of this studentship include part-time participation in the MSc in High Performance Computing at the University of Edinburgh, for which this dissertation is submitted.

Chapter 1

Introduction

Solid State Physics is the branch of science that studies the formation and properties of condensed matter. As a result of research in this field, novel materials and compounds have been developed for engineering and medical purposes that expand the capabilities of technology. Condensed matter is formed by atoms that interact with each other according to the laws of Quantum Mechanics, which define the observable properties of the material. Experimental work in the field has to be combined with theoretical models of atoms, molecules and solids in order to overcome the intrinsic difficulty of working at the nanoscale. First-principles calculations provide an accurate description of the processes that take place at the atomic scale, which is derived directly from the basic equations of Quantum Mechanics and does not require parameterisations based on experimental results. With these calculations, often based on the Born-Oppenheimer approximation [1], it is possible to determine the structure of a molecule or group of molecules and its electronic properties, which, in turn, result in a detailed description of the final behaviour of the real compound. For these reasons, due to its ability to describe the nature of materials, the development of first-principles techniques constitutes an important field of research for academia and industry. Molecules are formed by many atoms which may contain a very large number of electrons. Codes for solving such systems represent a major challenge for computational science, as they are an example of a problem of N interacting bodies which, when translated into equations, results in an intractable number of degrees of freedom. The application of parallel computing techniques to first-principles Quantum Mechanical codes has made it possible to simulate systems that would never have been achievable with serial computers. The efficiency and scaling of these algorithms are the key to reaching new and larger systems such as modern nanomaterials or drug compounds.

The ONETEP code [2] performs First-Principles Quantum Mechanics calculations based on Density Functional Theory (DFT) [3] with a computational effort that scales linearly with the number of atoms in the system. This program takes advantage of parallel computing to simulate systems of tens of thousands of atoms with great quantum accuracy.

The code, which uses Message-Passing directives for massive parallelisation on distributed-memory machines, has been designed for maximum parallel performance on modern state-of-the-art clusters. ONETEP uses a Self-Consistent Field (SCF) approach [1] for finding the ground state of the system by minimising the total energy with respect to the quantum single-particle density matrix in two nested loops. In this work, the Direct Inversion in the Iterative Subspace (DIIS) method [4] combined with explicit diagonalisation of the Hamiltonian matrix has been introduced as a new method for minimising the total energy in the inner loop of ONETEP. This algorithm is known to scale with the third power of the number of atoms (O(N^3)-scaling), and therefore it is very computationally demanding compared to standard ONETEP. However, it represents a necessary and critical step towards simulating metallic systems in ONETEP using the ensemble-DFT approach [5]. Hence, the code developed during this work is intended to be merged with the main branch of the ONETEP repository for future academic research on solid-state physics. To achieve top performance, the routine that diagonalises the Hamiltonian has to show optimum parallel efficiency. The ScaLAPACK library [6] parallel eigensolver is assessed for diagonalising the Hamiltonian in contrast to the equivalent serial eigensolver from the LAPACK library [7]. The advantage of a parallel library such as ScaLAPACK is that all the processors cooperate to solve the given mathematical problem, which should result in a faster execution time of the algorithm and in distributed memory usage for storing large data structures. On the other hand, inter-node communications can represent a bottleneck for this kind of parallel dense algebra routine and reduce the efficiency of the implementation on large clusters with slow interconnects. For this reason, this work contains an analysis of the performance of the ScaLAPACK eigensolver routine and its implementation in the ONETEP code for large dense matrices on two modern parallel clusters: HECToR (UK National Supercomputer Facility) and Iridis 3 (University of Southampton's cluster). The aims of this work are i) to develop and integrate the code for DIIS and Hamiltonian diagonalisation as a method for energy minimisation in the inner loop of ONETEP, and ii) to evaluate and analyse the performance of the ScaLAPACK parallel eigensolver in contrast to the serial LAPACK version. The next chapter describes the underlying theory of first-principles methods, traditional DFT, linear-scaling DFT and the ONETEP code from a mathematical point of view. It also contains a brief summary of the most important characteristics that make ONETEP a linear-scaling code, such as the employment of the Li-Nunes-Vanderbilt (LNV) [8] algorithm as the standard method for energy minimisation in the inner loop. Chapter three describes the theory behind the DIIS algorithm as a method for accelerating the convergence of SCF calculations and its addition to ONETEP, combined with Hamiltonian diagonalisation, for minimising the energy in the inner loop. This chapter also describes the details of the computational implementation of the algorithm, the most relevant aspects of the routines included in the code and their potential effect on the final performance. In particular, this chapter focuses on the description of the eigensolvers as they are used in the code and their suitability for diagonalising the Hamiltonian.

The fourth chapter gives a description of the target machines, HECToR and Iridis 3, and summarises the compilation options, the timing routines and the set of benchmarks that are used for the validation and performance tests of the DIIS code. The fifth chapter is dedicated to the validation of the DIIS code for minimising the energy. At this point, results regarding the convergence, numerical stability and agreement with the standard LNV method are provided, as well as a discussion of the range of validity of the implemented code and possible solutions to the misbehaviours detected in large systems. Chapter six provides the results and analysis regarding the performance of the DIIS code on the target machines. The analysis compares the parallel performance of the ScaLAPACK eigensolver with the serial LAPACK version on increasing numbers of processors and discusses the range of applicability of each library for diagonalising large matrices. Then, the performance of the DIIS-Hamiltonian diagonalisation method is compared to standard LNV, with particular interest in the scaling of each algorithm with the system size on HECToR and Iridis 3. The last section of this chapter is dedicated to summarising some performance optimisations that could be introduced in the code in future developments. Finally, chapter seven contains the conclusions regarding the most important points of the research carried out for the development of the DIIS and Hamiltonian diagonalisation method for total-energy minimisation within the ONETEP code. It also includes a discussion of the suitability of the ScaLAPACK parallel eigensolver routine as a method for extracting top performance from modern parallel clusters, which will allow scaling to larger atomic systems.

Chapter 2

Background theory

2.1 First-principles methods in computational chemistry

Atomic interaction in molecular systems is a case of an N-body problem whose complexity increases with the number of interacting particles in the system. This kind of problem has been widely analysed over the years by researchers from many fields due to the inherent difficulty of dealing with a large number of degrees of freedom, and it represents one of the major challenges in computational science [9][10]. In the field of first-principles methods, the interactions between the atomic particles are described by the Schrödinger equation:

\hat{H}\,\Psi = E\,\Psi,   (2.1)

where \hat{H} is the Hamiltonian operator (or energy operator), which describes the kind of interaction between the particles, Ψ is the wave function of the system and E is the total energy. Equation (2.1) is an eigenvalue problem, where the wave function Ψ is the eigenvector and E is the eigenvalue. This characteristic of the Schrödinger equation is very important in order to understand the nature of the problem and the computational challenge that solving the N-body system represents for quantum mechanics. The Hamiltonian contains the interaction terms between all the particles in the system. Under an atomistic approximation, the particles involved are the protons in the core of each atom and the electrons moving in orbitals around the nuclei. These particles interact with each other via electromagnetic interactions, introducing a large number of degrees of freedom in the equations that, in most cases (more than two atoms), make the problem intractable. The Born-Oppenheimer approximation (also known as the adiabatic approximation) addresses this difficulty by assuming that the nuclei are stationary due to their great mass compared to the electrons. Therefore, it is assumed that the energy due to interactions between the protons within a nucleus is negligible compared to the electronic interactions. The energy terms due to the protons can be suppressed from the

Hamiltonian, which reduces the degrees of freedom of the Schrödinger equation exclusively to those associated with the electrons. Under this approximation, the electronic Hamiltonian can be written in atomic units as:

\hat{H}_{el} = -\frac{1}{2}\sum_j \nabla_j^2 \;-\; \sum_j \sum_\alpha \frac{Z_\alpha}{|\mathbf{r}_j - \mathbf{R}_\alpha|} \;+\; \frac{1}{2}\sum_j \sum_{k \neq j} \frac{1}{|\mathbf{r}_j - \mathbf{r}_k|}.   (2.2)

The energy terms that involve interactions between the nuclei can be reduced to a constant. Changes in the spatial coordinates of the nuclei map an adiabatic energy potential that can be explored in order to find the geometry of the system in equilibrium (geometry optimisation calculations) [11]. Usually, for a single energy calculation, the nuclear positions are initialised to experimental coordinates (from X-ray diffraction data) or to the coordinates given by a previously optimised geometry. In Quantum Mechanics, observables are determined by their expectation value. For the electronic Hamiltonian in equation (2.2), the expectation value corresponds to the total energy of the system:

\langle \hat{H}_{el} \rangle = \frac{\langle \Psi \,|\, \hat{H}_{el} \,|\, \Psi \rangle}{\langle \Psi \,|\, \Psi \rangle} = E[\Psi].   (2.3)

The energy of the system is a functional of the wave function Ψ, in the sense that variations in Ψ will cause a change in E. Moreover, it can be proven that the energy functional E[Ψ] has a minimum at the ground state of the system, when Ψ is equal to the ground-state wave function, Ψ_0. Therefore, variations in Ψ towards Ψ_0 will minimise the energy functional, which, in turn, is equivalent to solving the Schrödinger equation for the ground state of the system.

\delta E[\Psi] = 0 \quad \text{if} \quad \Psi = \Psi_0.   (2.4)

Variational methods can be used in order to minimise the energy with respect to the wave function Ψ. These algorithms use well-established numerical procedures for seeking the minimum of the energy. To describe the electronic structure of a system of atoms, a number of different approaches have been developed in recent decades, such as tight-binding methods [12], Hartree-Fock (HF) methods [13], Density Functional Theory (DFT) [14], Coupled-Cluster [15], Quantum Monte Carlo [16] or Car-Parrinello methods [17]. Each of these approaches has a range of applications, with their usability being limited by the degree of physical accuracy, the properties of the systems that can be simulated, the size of the molecules and the computational cost.

2.2 Density Functional Theory

Amongst these methods, Density Functional Theory (DFT) has been extensively investigated in recent years by the international community of condensed matter researchers. DFT provides a quantum description of the system by solving the Schrödinger equation (2.1) to find the ground-state wavefunctions and energy of the molecules under simulation. DFT calculations are based on the Hohenberg-Kohn theorems [18], outlined below. Before describing these theorems, it is worth noticing that the electronic Hamiltonian in (2.2) can be written as \hat{H}_{el} = \hat{F} + \hat{V}, where:

\hat{F} = -\frac{1}{2}\sum_j \nabla_j^2 + \frac{1}{2}\sum_j \sum_{k \neq j} \frac{1}{|\mathbf{r}_j - \mathbf{r}_k|},   (2.5)

\hat{V} = \sum_i v_{ext}(\mathbf{r}_i) = -\sum_j \sum_\alpha \frac{Z_\alpha}{|\mathbf{r}_j - \mathbf{R}_\alpha|}.   (2.6)

The operator \hat{F} contains the interaction between the electrons and the kinetic energy, and is the same for all N-body systems. On the other hand, the operator \hat{V} contains the interaction between the electrons and the nuclei, and defines the external potential, v_ext(r), on each electron due to the static nuclei in the system.

Hohenberg-Kohn Theorem I: There is a one-to-one correspondence between the ground-state charge density of the system of N interacting electrons and the external potential, v_ext, acting on each electron.

The external potential is uniquely determined by the electronic density in the ground state. Hence, by determining the ground-state electronic density, the entire Hamiltonian is uniquely defined, and so is the wave function and thus all the properties of the system. At this point, the energy becomes a functional of the density that can be written as:

E[n] = F[n] + \int d\mathbf{r}\, n(\mathbf{r})\, v_{ext}(\mathbf{r}).   (2.7)

Hohenberg-Kohn Theorem II: For any electronic density n(r) different from the ground-state electronic density n_0(r), the following relation is true:

E[n(\mathbf{r})] \geq E[n_0(\mathbf{r})].   (2.8)

This theorem makes it possible to perform a variational minimisation of the energy based on the electronic density instead of the wave function. The absolute minimum of the energy functional is uniquely determined by the electronic density at the ground state of the system.

The Hohenberg-Kohn theorems make DFT more attractive for solving the Schrödinger equation in condensed matter applications. The advantage of DFT is clear: instead of having 3N degrees of freedom in the system (the three spatial coordinates of each of the N particles), the problem has been reduced to finding a function of three spatial coordinates, the ground-state electronic density, n_0(r). The energy minimisation, as the second theorem remarks, can be done in a variational way that is equivalent to solving the Schrödinger equation (2.1) for the ground state. However, the N-body problem remains intractable: the difficulty is caused by the fact that the exact form of the functional F[n], which can be rewritten as

F[n] = E_{coul}[n] + E_{xc}[n],   (2.9)

is system-dependent and remains unknown due to the complexity of the exchange-correlation interactions, E_xc, in the N-body system. Over the years, researchers have developed a range of functionals to overcome this difficulty. Some examples are the LDA, GGA, PBE or B3LYP functionals [19]. All of them are based on what is called the Kohn-Sham mapping [20]. The Kohn-Sham mapping assumes that there is a fictitious system of N non-interacting electrons that has the same exact ground-state electronic energy as the original interacting system of electrons. As a consequence, it is now possible to define a single-electron Schrödinger equation:

\hat{h}_i\, \psi_i(\mathbf{r}) = \epsilon_i\, \psi_i(\mathbf{r}),   (2.10)

where ψ_i(r) are electronic molecular orbitals containing a number of electrons between 0 and 2, and ε_i is the energy of the molecular orbital ψ_i. There are N/2 states ψ_i(r) which are solutions of the Schrödinger equation and obey the following orthonormality condition:

\int d\mathbf{r}\, \psi_i(\mathbf{r})\, \psi_j^*(\mathbf{r}) = \delta_{ij}.   (2.11)

The total energy of the system can be decomposed as

E[n] = E_{kin}[n] + E_{coul}[n] + E_{xc}[n] + E_{ext}[n],   (2.12)

where E_kin[n] is the kinetic energy of the system of non-interacting electrons, E_coul[n] takes into account the Coulombic interaction of the electrons as if they were immersed in an effective potential [1], E_xc[n] is the exchange-correlation energy of non-interacting electrons and E_ext[n] is the nuclei-electron interaction. At this point, the electronic density can be written in terms of the Kohn-Sham orbitals as

n(\mathbf{r}) = 2 \sum_{i=1}^{N/2} \psi_i(\mathbf{r})\, \psi_i^*(\mathbf{r}).   (2.13)

The Kohn-Sham equations simplify the Hamiltonian by isolating the exchange-correlation term and introducing the well-known kinetic energy term for non-interacting electrons. In fact, if E_xc were exact, the ground state of the fictitious non-interacting electrons would be the same as that of the interacting electrons. Furthermore, as the energy in equation (2.12) depends on the electronic density and, according to equation (2.13), n(r) depends on the molecular orbitals ψ_i which, at the same time, are defined by the Hamiltonian via equation (2.10), it should be possible to solve the Kohn-Sham equations self-consistently in a variational way. This procedure would lead to the ground-state electronic density n_0(r), hence solving the Schrödinger equation of the system of electrons.
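To make the self-consistency idea concrete, the cycle described above can be sketched in a few lines of Python/NumPy. This is only an illustration of equations (2.10), (2.12) and (2.13), not part of ONETEP (which is written in Fortran90): build_hamiltonian is a hypothetical placeholder for whatever density-functional approximation is used, and an orthonormal basis is assumed for the density update.

```python
import numpy as np
from scipy.linalg import eigh

def scf_loop(build_hamiltonian, S, n_electrons, n_start, tol=1e-6, max_iter=50):
    """Schematic Kohn-Sham self-consistency cycle (illustration only).

    build_hamiltonian(n) is a hypothetical callback returning the Kohn-Sham
    Hamiltonian matrix for a given density vector n; S is the basis overlap
    matrix (the identity for an orthonormal basis)."""
    n = n_start
    for iteration in range(max_iter):
        H = build_hamiltonian(n)              # H depends on n through (2.12)
        eps, M = eigh(H, S)                   # solve (2.10) in the chosen basis
        occ = M[:, : n_electrons // 2]        # lowest N/2 orbitals, doubly occupied
        n_new = 2.0 * np.sum(occ**2, axis=1)  # diagonal of the density, cf. (2.13)
        if np.linalg.norm(n_new - n) < tol:   # self-consistency reached
            return eps, M, n_new
        n = n_new                             # plain update; real codes mix densities
    raise RuntimeError("SCF did not converge within max_iter iterations")
```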

2.3 Linear-scaling DFT

Traditional DFT codes scale with the cube of the system size (O(N^3)-scaling), due to the nature of the self-consistent cycle that is used to solve the Kohn-Sham equations in a variational manner. Computationally speaking, the cycle involves dense matrix diagonalisations that cause the cubic scaling. For this reason, traditional DFT has been limited by the available computational power and the system size, making it possible to simulate only systems of a few hundred atoms on parallel machines. The CASTEP [21] and VASP [22] codes are good examples of popular cubic-scaling DFT approaches for condensed-matter applications. In the last decade the effort has been put towards the development of linear-scaling methods (O(N)-scaling) for DFT calculations. Many such codes are based on a reformulation of DFT in terms of the quantum one-particle density matrix, ρ(r, r′), which contains all the information about the state of the particles in the system. The density matrix can be written in terms of the Kohn-Sham orbitals as:

\rho(\mathbf{r}, \mathbf{r}') = \sum_{i=1}^{\infty} \psi_i(\mathbf{r})\, f_i\, \psi_i^*(\mathbf{r}'),   (2.14)

where the coefficients f_i are called occupation numbers, and lie in the range between 0 and 1 (empty orbital to fully occupied orbital). The electronic density is related to the density matrix by the diagonal of ρ(r, r′):

n(\mathbf{r}) = 2\,\rho(\mathbf{r}, \mathbf{r}).   (2.15)

According to the last equation, the ground-state density matrix obeys the same properties stated by the Hohenberg-Kohn theorems of DFT. Therefore, minimising the

energy with respect to the density matrix elements is equivalent to solving the Kohn-Sham equation (2.10). The minimisation procedure, however, has to take into account a number of constraints that apply to the density matrix:

1. The number of electrons must remain constant:

   N = \int d\mathbf{r}\, n(\mathbf{r}) = 2 \int d\mathbf{r}\, \rho(\mathbf{r}, \mathbf{r}).   (2.16)

2. The density matrix in the ground state has to be idempotent, which means that the following condition has to be true:

   \rho^2(\mathbf{r}, \mathbf{r}') = \rho(\mathbf{r}, \mathbf{r}').   (2.17)

This will have an important effect on the calculation. Expanding ρ in terms of the Kohn-Sham orbitals as in equation (2.14), and bearing in mind that the ψ_i are orthogonal functions, it can be shown that, for idempotency to be achieved, the occupation numbers f_i have to satisfy the relation f_i^2 = f_i, which, in turn, means that the f_i are restricted to be either 0 or 1 for unoccupied and occupied orbitals respectively, and no fractional occupancies are allowed. As a consequence, equation (2.14) can be rewritten as

   \rho(\mathbf{r}, \mathbf{r}') = \sum_{i=1}^{N_{occ}} f_i\, \psi_i(\mathbf{r})\, \psi_i^*(\mathbf{r}'),   (2.18)

where N_occ refers to the number of occupied orbitals.

The second constraint represents a major challenge for linear-scaling codes, as the idempotency condition cannot be enforced explicitly with a cost that scales linearly with the system size. This condition is closely related to the orthonormality constraint on the molecular orbitals, which, if enforced directly, will cause the code to scale as N^3. In recent years, different approaches have been tested in order to overcome this difficulty. Some of the most successful approaches are those based on the McWeeny method [23], which are able to impose weak idempotency in a linear-scaling fashion for a range of systems. Many linear-scaling codes are based on the property of locality (or nearsightedness) of the density matrix, originally proposed by Kohn [24]. It can be shown that the elements of the density matrix decay exponentially in systems with a non-zero band gap, such as insulators or semiconductors. It is an intuitive observation that the interaction of two particles will decay with the distance between them, so the effect can be neglected if the two particles are separated far enough. This behaviour is predominant in non-metallic systems, where the density matrix decays as

\rho(\mathbf{r}, \mathbf{r}') \sim e^{-\gamma\, |\mathbf{r} - \mathbf{r}'|},   (2.19)

where γ is a coefficient whose value becomes larger with the band gap. The property of locality in non-metallic systems can be exploited in order to truncate the density matrix, so that the interaction between particles separated by more than a radial cut-off, |r − r′| > r_cutoff, is set to zero. With this approximation, the degrees of freedom are substantially reduced, simplifying the problem. From a computational point of view, truncation of the density matrix means that the matrices involved in the calculation acquire a sparse pattern, so dealing with the algebra requires less computational effort, closer to linear scaling. Nevertheless, actual linear scaling also depends on the numerical implementation of the algorithm, which should take advantage of the locality of the density matrix and exploit the sparsity of the matrices in the system after the truncation. The choice of the basis set and the efficiency of the algebra routines play a major role in the final scaling of the code. In the field of linear-scaling codes, an important measure is the break-even point, defined as the size of the system for which a linear-scaling code becomes faster than a traditional cubic-scaling approach. The break-even point is system-dependent, but usually lies in the range of a few hundreds of atoms. This means that, provided similar computational resources, linear-scaling codes are able to simulate larger systems, often in the range of thousands or tens of thousands of atoms. Examples of linear-scaling codes used by the international community of researchers are SIESTA [25], CONQUEST [26] and the ONETEP program [2], which has been used for the development of this work.
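The idempotency and truncation ideas in equations (2.17)–(2.19) can be illustrated with a short NumPy sketch. Everything below (the random orbitals, the one-dimensional "positions" and the cut-off value) is invented purely for illustration and has no connection to the ONETEP implementation.

```python
import numpy as np

# Build a density matrix from occupied orthonormal orbitals as in (2.18),
# verify idempotency (2.17), and apply a radial truncation suggested by the
# exponential decay (2.19).
rng = np.random.default_rng(0)
n_basis, n_occ = 40, 10

# Random orthonormal orbitals: columns of Q span the occupied subspace.
Q, _ = np.linalg.qr(rng.standard_normal((n_basis, n_occ)))
rho = Q @ Q.T                                   # f_i = 1 for occupied states

print(np.allclose(rho @ rho, rho))              # True: rho is idempotent

# Fictitious one-dimensional "positions" for each basis function, used only
# to mimic the distance-based cut-off |r - r'| > r_cutoff.
positions = np.linspace(0.0, 20.0, n_basis)
r_cutoff = 5.0
mask = np.abs(positions[:, None] - positions[None, :]) <= r_cutoff
rho_truncated = np.where(mask, rho, 0.0)        # sparse pattern after truncation

# Truncation breaks idempotency slightly; linear-scaling codes restore it
# approximately, e.g. with McWeeny-type purification (see section 2.4).
print(np.max(np.abs(rho_truncated @ rho_truncated - rho_truncated)))
```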

2.4 The ONETEP code

ONETEP (Order-N Electronic Total Energy Package) [2] is a linear-scaling approach for total-energy calculations that takes advantage of parallel computers to simulate large systems of up to tens of thousands of atoms. It is written in Fortran90 and uses Message-Passing calls to allow scaling to very large processor counts on modern distributed-memory parallel clusters.

The ONETEP approach is based on the single-particle density matrix, ρ(r, r′), which contains all the information about the quantum state of the system. At the same time, ONETEP exploits the exponential decay of the density matrix in non-metallic systems to truncate the density matrix for those elements that are separated by more than a given distance cut-off, |r − r′| > r_cutoff. In order to take advantage of the localisation of the density matrix, ONETEP expands the Kohn-Sham molecular orbitals as a linear combination of a set of Non-orthogonal Generalised Wannier Functions (NGWFs) [27], φ_α(r):

\psi_i(\mathbf{r}) = \sum_\alpha M^{\alpha}_{\ i}\, \phi_\alpha(\mathbf{r}).   (2.20)

The NGWFs are chosen to be non-orthogonal in order to impose strict spatial localisation within a sphere centred on each atom.

Also, the NGWFs decay exponentially with the distance from the origin until they vanish at the border of the localisation sphere. The overlap between two NGWFs is defined as

S_{\alpha\beta} = \int d\mathbf{r}\, \phi^*_\alpha(\mathbf{r})\, \phi_\beta(\mathbf{r}).   (2.21)

The quality of the physical description of the system depends on the NGWF sphere radii r_α (the larger the radii, the better the accuracy) and on the number of NGWFs per atom (the more NGWFs, the more complete the NGWF basis and, therefore, the better the accuracy). However, these parameters have to be tuned carefully in order to keep the balance between accuracy and speed of the calculation. This method exploits locality in the sense that, if two NGWFs are separated by more than r_α, the overlap between them will be zero, thus providing sparsity to the matrices involved in the calculation. Furthermore, ONETEP uses the pseudopotential technique for dealing with the core electrons in a computationally efficient way [28], which, in turn, reduces the problem to the accurate description of the valence electrons. The density matrix can be written in the NGWF representation by introducing the relation in (2.20) into equation (2.18):

\rho(\mathbf{r}, \mathbf{r}') = \sum_{\alpha\beta} \phi_\alpha(\mathbf{r})\, K^{\alpha\beta}\, \phi_\beta(\mathbf{r}'),   (2.22)

where K^{αβ} is called the density kernel and is the representation of the density matrix in the NGWF notation:

K^{\alpha\beta} = \sum_i f_i\, M^{\alpha}_{\ i}\, M^{\beta}_{\ i}.   (2.23)

The combination of both properties, the truncation of the density kernel and the localisation of the NGWFs, is a key factor in achieving linear scaling in ONETEP. In addition to this, the NGWFs are expanded in a basis set of psinc functions [29], D_{k,α}(r), as

\phi_\alpha(\mathbf{r}) = \sum_k D_{k,\alpha}(\mathbf{r})\, q_k,   (2.24)

which are spike-like, orthogonal functions centred on each computational grid point and are equivalent to a plane-wave basis set via a unitary transformation. This has a number of important advantages over other choices of basis sets:

1. Plane waves are a near-complete basis set, which provides a superior accuracy for the representation of the NGWFs and hence the molecular orbitals.

2. The quality of the calculation is controlled by a single parameter: the kinetic energy cut-off, E_cutoff. The higher the energy cut-off, the more plane waves are used in the representation of the NGWFs and the better the accuracy will be.

3. The psinc basis set allows the NGWFs to be optimised in situ during the calculation, which eliminates the Basis Set Superposition Error (BSSE) [30].

4. Plane waves are the natural basis set for performing Fast Fourier Transforms (FFTs). FFTs are a central part of the ONETEP core, as they are necessary for calculating the components of the total energy in a computationally efficient way.

5. Pulay forces [31] are zero as the psinc functions do not move with the atomic positions.

The ONETEP code minimises the total energy self-consistently in two nested loops per iteration, as shown in figure 2.1. The inner loop keeps the NGWFs, φ_α(r), fixed while minimising the energy with respect to the elements of the density kernel, K^{αβ}. In the outer loop, the density kernel is fixed and the total energy is minimised with respect to the NGWFs in the psinc representation, to find the set of optimum q_k coefficients. The whole procedure is done in such a way that the density kernel K stays idempotent, the NGWFs remain localised within a sphere and the number of electrons in the system remains constant. From equation (2.22) it can be seen that the two-nested-loop scheme is equivalent to minimising the energy self-consistently with respect to the density matrix, ρ(r, r′).

All the procedures included in the self-consistent loop of ONETEP are intended to scale linearly with the system size. In order to do that, a number of advanced numerical and computational techniques have been introduced in ONETEP and published in the literature. Amongst them, it is important to remark on the Li-Nunes-Vanderbilt algorithm for optimising the density kernel [32], the conjugate-gradient method for optimising the NGWFs in the outer loop [33], the FFT box technique [34] for performing multiple Fast Fourier Transforms whose cost does not depend on the system size, and the SPAM3 algebra that takes advantage of the sparsity of the matrices to efficiently distribute and perform matrix operations on parallel computers [35].

Density kernel optimisation: the inner loop

The inner loop of ONETEP is in charge of minimising the total energy with respect to the elements of the density kernel [32], K^{αβ}. During the execution of this loop, the NGWFs, φ_α, remain fixed, so the energy functional depends only on the density kernel elements, E = E[K]. ONETEP uses the Li-Nunes-Vanderbilt (LNV) algorithm [8], based on the McWeeny method [23], in order to minimise the energy while keeping the idempotency constraint on the density kernel. Provided a nearly idempotent density kernel, L, a fully idempotent density kernel can be built iteratively by applying the relation

K = 3\,LSL - 2\,LSLSL,   (2.25)

Figure 2.1: ONETEP energy minimisation procedure. The energy functional is minimised in two nested loops: the inner loop optimises the elements of the density kernel, while the outer loop optimises the NGWFs.

where S is the overlap matrix defined in equation (2.21). Provided sparsity of the density kernel (via the density cut-off radius, r_cutoff), this method scales linearly with the system size.

NGWF optimisation: the outer loop

The outer loop of ONETEP minimises the energy with respect to the NGWFs, φ_α. During the outer loop, the density kernel, which has already been optimised, remains constant, so the energy is a functional of the NGWFs only, E = E[φ_α]. The NGWFs are expanded as a linear combination of psinc functions as indicated in equation (2.24). The aim of this loop is to find the optimum set of coefficients q_k that minimises the energy. This task is carried out by conjugate-gradient methods, which scale linearly with the system size due to the localisation constraint on the NGWFs, which provides sparsity to the matrices involved in the process.

Fast Fourier Transforms (FFTs)

Fast Fourier Transforms (FFTs) are necessary in order to calculate the energy contributions efficiently. The FFTs scale as N log N, and the number of FFTs required for

the calculation scales with the number of NGWFs, i.e., with N. This would make the algorithm scale as N^2 with the system size, eliminating the linear-scaling behaviour of ONETEP. However, ONETEP incorporates a development that takes advantage of the localisation of the NGWFs to make the cost of each FFT independent of the system size. ONETEP defines a box around the NGWFs, called the FFT box [34], which is big enough to contain any overlapping NGWFs but small compared to the entire simulation box. The result is that the number of grid points that is taken into account for each FFT is system-independent, so the final algorithm scales linearly with the system size.

Efficient parallel strategy for communications and algebra

ONETEP incorporates its own parallel algebra routines which fully exploit the truncation of the density kernel and the localisation of the NGWFs. These developments have been published in a series of papers [35][36] that analyse the performance of ONETEP on different numbers of cores. The matrix elements are distributed over the processors in such a way that the communications required for the algebra operations (such as matrix-matrix multiplication) are minimised. In order to do that, neighbouring atoms are allocated to the same processor, so the elements of the resulting matrices are clustered in a nearly block-diagonal scheme where each block belongs to a different node. Therefore, blocks outside the diagonal are sparse or zero, which optimises the memory usage and minimises the communications between nodes. The code includes LAPACK calls for certain operations on sparse matrices. A similar scheme is used for dense matrices which, for performance, incorporate calls to ScaLAPACK parallel matrix operations.
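The McWeeny-type purification (2.25) on which LNV is based can be illustrated with the following NumPy sketch. It is not the ONETEP Fortran code: the overlap matrix, the starting kernel and the number of iterations are invented for the example, and idempotency is checked in the non-orthogonal form K S K = K.

```python
import numpy as np

def mcweeny_purify(L, S, n_steps=10):
    """Iterate K <- 3 K S K - 2 K S K S K, the non-orthogonal McWeeny step
    of equation (2.25), starting from a nearly idempotent kernel L."""
    K = L.copy()
    for _ in range(n_steps):
        KS = K @ S
        K = 3.0 * KS @ K - 2.0 * KS @ KS @ K
    return K

# Toy example: perturb an exactly idempotent kernel and purify it back.
rng = np.random.default_rng(1)
n, n_occ = 30, 8
S = np.eye(n) + 0.01 * rng.standard_normal((n, n))
S = 0.5 * (S + S.T)                      # symmetric, close to identity

# Exactly idempotent kernel in the S metric: K = C C^T with C^T S C = I.
C = rng.standard_normal((n, n_occ))
C = C @ np.linalg.inv(np.linalg.cholesky(C.T @ S @ C).T)  # S-orthonormalise
K_exact = C @ C.T
L = K_exact + 1e-3 * rng.standard_normal((n, n))
L = 0.5 * (L + L.T)                      # nearly idempotent starting guess

K = mcweeny_purify(L, S)
print(np.max(np.abs(K @ S @ K - K)))     # close to zero: K S K = K restored
```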

Chapter 3

The DIIS algorithm: theory and computational implementation in ONETEP

Direct Inversion in the Iterative Subspace (DIIS) is a method for speeding up the energy minimisation with respect to a set of coordinates by constructing and improving the solution as a point in the Krylov subspace [37]. This algorithm was originally developed by Pulay [4] and later extended by a number of authors [38][39]. In fact, DIIS was first presented as a method for speeding up SCF calculations by improving the electronic density with the values of previous iterations after each self-consistent cycle, thus reducing the number of iterations needed to converge. There are many examples in the literature of implementations of DIIS within an SCF problem at a computational cost that scales with the square of the system size (O(N^2)-scaling). Most of these codes combine modern quasi-Newton-Raphson methods (that make use of an improving approximation of the inverse Hessian matrix and the gradient vector) and density mixing in order to minimise a system of N independent particles in a self-consistent, iterative way. However, the approach taken in this work is different and has a computational cost that scales with the cube of the system size (O(N^3)-scaling). This cubic scaling is caused by the diagonalisation of the Hamiltonian to enforce self-consistency during the energy minimisation. The ScaLAPACK parallel eigensolver is used to diagonalise the Hamiltonian and to improve the performance of the code. This technique has been included in other codes such as CASTEP and CONQUEST [40]. DIIS can be applied to different variables of the SCF cycle, such as the electronic density or the Hamiltonian. This work attempts a novel implementation of DIIS in ONETEP that mixes the single-particle density matrix, ρ(r, r′). The following sections develop the theory of the DIIS method for accelerating SCF energy minimisation and the equations behind its adaptation to the ONETEP framework. Then the main routines and the computational details of the actual code implementation are explained, with special interest in the library calls to the eigensolver routines for diagonalising the Hamiltonian

matrix, as this is the most expensive and computationally demanding part of the program.

3.1 Description of the original DIIS

In the original Pulay DIIS scheme, the total ground-state energy of the system is a functional of a set of parameters, contained in the vector p of coordinates:

p = \{p_1, p_2, \ldots, p_n\},   (3.1)

so the total energy of the system is a functional of these coordinates:

E = E[p_1, p_2, \ldots, p_n].   (3.2)

DIIS is inserted within an iterative SCF algorithm that minimises the energy in a variational fashion. Each iteration produces a new parameter vector p^{(i+1)} that, ideally, should be closer to the ground-state coordinates. A residuum vector can be associated with each new parameter vector by calculating the distance to the previous one:

\Delta p^{(i+1)} = p^{(i+1)} - p^{(i)}.   (3.3)

However, convergence of the SCF cycle can be slow or non-existent for certain systems, so stable numerical algorithms are needed to solve the energy minimisation problem. In this context, the DIIS algorithm improves the solution at each SCF iteration by constructing a new parameter vector from a linear combination of the solutions of previous iterations

p^{(i+1)} = \sum_{m=1}^{i} p^{(m)}\, d_m.   (3.4)

The DIIS method is based on the assumption that the residuum vector

\Delta p = \sum_{m=1}^{i} \Delta p^{(m)}\, d_m,   (3.5)

converges to zero in the mean-squares sense, provided the condition

\sum_{m=1}^{i} d_m = 1,   (3.6)

to preserve the normalisation constraint. This leads to the system of m + 1 linear equations

\begin{pmatrix}
B_{11} & B_{12} & \cdots & B_{1m} & -1 \\
B_{21} & B_{22} & \cdots & B_{2m} & -1 \\
\vdots & \vdots &        & \vdots & \vdots \\
B_{m1} & B_{m2} & \cdots & B_{mm} & -1 \\
-1     & -1     & \cdots & -1     & 0
\end{pmatrix}
\begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_m \\ \lambda \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ -1 \end{pmatrix},   (3.7)

where B_{ij} = \langle \Delta p^{(i)} \,|\, \Delta p^{(j)} \rangle with a suitable scalar product of the residuum vectors, and λ is the Lagrange multiplier that imposes the condition (3.6).

The solution of this system of equations gives the d_m coefficients that are used to generate the parameter vector for the next iteration according to equation (3.4).
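A minimal NumPy sketch of the Pulay extrapolation, equations (3.4)–(3.7), is given below. It is illustrative only: the plain Euclidean dot product stands in for whatever scalar product is chosen for the residuum vectors, and the function name diis_extrapolate is hypothetical.

```python
import numpy as np

def diis_extrapolate(params, residuals):
    """Pulay DIIS step: given the history of parameter vectors p^(m) and
    residuum vectors Delta p^(m), solve the bordered system (3.7) for the
    coefficients d_m and return the extrapolated parameter vector (3.4)."""
    m = len(residuals)
    A = np.zeros((m + 1, m + 1))
    rhs = np.zeros(m + 1)
    for i in range(m):
        for j in range(m):
            A[i, j] = np.dot(residuals[i], residuals[j])   # B_ij
    A[:m, m] = -1.0        # bordered row/column enforcing sum(d) = 1
    A[m, :m] = -1.0        # via the Lagrange multiplier lambda
    rhs[m] = -1.0
    coeffs = np.linalg.solve(A, rhs)
    d = coeffs[:m]                                          # last entry is lambda
    return sum(d_m * p_m for d_m, p_m in zip(d, params))

# Usage sketch: inside an SCF loop one would append the latest p and
# Delta p = p_new - p_old to the histories and call diis_extrapolate.
```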

3.2 Implementation of the DIIS algorithm in the ONETEP code

The DIIS algorithm has been implemented as a total-energy minimisation method in the inner loop of ONETEP. It represents an alternative to the LNV algorithm [8] which, although suitable for most cases, has proven not to be reliable in certain cases [41]. LNV is a linear-scaling method that takes advantage of the locality of matter to provide sparsity to the density kernel while keeping the density matrix idempotent. Due to its favourable properties, it has become popular among researchers in theoretical condensed matter. However, it is limited to systems with a non-vanishing energy band gap between occupied and unoccupied states. On the other hand, DIIS, as it has been implemented in ONETEP, scales with the third power of the system size (O(N^3)-scaling), since it requires one diagonalisation of the Hamiltonian matrix per SCF iteration. Although this implementation might limit the size of the systems that can be simulated, it provides a more stable method for minimising the energy. Furthermore, DIIS is the initial step of a series of modifications to the ONETEP code to make it able to perform ensemble-DFT calculations [5], which, after some further modifications, will provide a method for simulating metals (zero-band-gap systems).

The DIIS in ONETEP mixes the density matrices, ρ(r, r′), at each iteration of the inner loop of the SCF cycle. During the execution of the inner loop of ONETEP, the NGWFs φ_α(r) are kept fixed, which means that they are no longer degrees of freedom (parameters) of the energy functional. The total ground-state energy is a functional only of the density kernel elements, K^{αβ}, which are the parameters to be optimised in order to minimise the energy:

E_{inner} = E[K^{\alpha\beta}].   (3.8)

Therefore, within the inner loop, mixing the density kernels is equivalent to mixing the density matrices at each iteration, as the NGWFs do not change:

\rho_{m+1}(\mathbf{r},\mathbf{r}') = \sum_{i=1}^{m} d_i\,\rho_i(\mathbf{r},\mathbf{r}')
= \sum_{i=1}^{m} d_i \sum_{\alpha\beta} \phi_\alpha(\mathbf{r})\, K_i^{\alpha\beta}\, \phi_\beta^{*}(\mathbf{r}')
= \sum_{\alpha\beta} \phi_\alpha(\mathbf{r}) \left( \sum_{i=1}^{m} d_i\, K_i^{\alpha\beta} \right) \phi_\beta^{*}(\mathbf{r}').   (3.9)

Each SCF iteration with embedded density kernel mixing involves the following steps, summarised in figure 3.1 (a schematic code sketch of steps 2 to 5 is given at the end of this subsection):

1.- Build the Hamiltonian from the density kernel:

At each iteration it is possible to define two density kernels: the input density kernel, K_in, which comes from the previous iteration as the result of mixing the preceding kernels and is used to build the Hamiltonian; and the output density kernel, K_out, which is the truly idempotent kernel built from the eigenvectors of the Hamiltonian and which, after convergence has been reached, becomes the final outcome of the DIIS algorithm. The input density kernel is used to build the Hamiltonian matrix at the beginning of each iteration. In order to do that, the density-dependent matrices have to be recalculated for the values of K_in. This is a time-consuming routine, as it has to calculate many two-body integrals using Fast Fourier Transforms. However, the effort scales linearly with the system size due to the FFT box technique used in ONETEP. The density-independent matrices are calculated once before entering the inner loop, as they remain unchanged during the execution of DIIS. All the components are added up to obtain the Hamiltonian matrix of the system.

2.- Diagonalise the Hamiltonian and obtain the eigenvectors:

The eigenvalue problem to be solved by diagonalisation is the Schrödinger equation in the NGWF representation, which is obtained by combining equations (2.10) and (2.20):

H_{\alpha\beta}\, M^{\beta}_{\ i} = \epsilon_i\, S_{\alpha\beta}\, M^{\beta}_{\ i}.   (3.10)

The Hamiltonian diagonalisation is the most expensive step, as the program has to find the eigenvectors of a large dense matrix. This operation scales with the third power of the system size, ultimately defined by the number of atoms in the system. In order to speed up the calculation, this project takes advantage of the parallel algebra routines of ScaLAPACK, so the diagonalisation is performed in parallel with all the cores cooperating to find the solution. A detailed description of the parallel diagonalisation is provided in section 3.4.

3.- Construct the new density kernel from the eigenvectors:

Once the eigenstates of the Hamiltonian have been calculated, the output density kernel is built as

K_{out}^{\alpha\beta} = \sum_i f_i\, (M^{\alpha}_{\ i})^{\dagger}\, M^{\beta}_{\ i},   (3.11)

where, in order to enforce idempotency of the density matrix, the occupation numbers f_i are set to 1 for occupied states and 0 for the unoccupied ones. When convergence is achieved, K_in and K_out should differ by less than a certain given threshold. The residuum vector is calculated as the difference between the input and output kernels corresponding to the same iteration:

where, in order to enforce idempotency of the density matrix, the occupation numbers fi are set to 1 for occupied states and 0 for the unoccupied. When convergence is achieved, Kin and Kout should differ less than a certain given threshold. The residuum vector is calculated as the difference between the input and output kernels corresponding to the same iteration:

R^{\alpha\beta} = K_{out}^{\alpha\beta} - K_{in}^{\alpha\beta}.   (3.12)

4.- Check for convergence:

In practice, since the code deals with dense matrices, a calculation is said to have converged when the RMS gradient of the residuum matrix R, calculated as

\mathrm{RMS}(R) = \sqrt{\frac{\sum_{\alpha\beta} R^{\alpha\beta} \cdot R^{\alpha\beta}}{\text{number of non-zero elements}}},   (3.13)

is less than a specified threshold (set to 10^{-6} by default).

5.- Mix density kernels:

If convergence has not yet been achieved, then the density matrices are mixed. Following the original Pulay DIIS, the system of m linear equations (3.7) can be built at each iteration from all the residuum matrices of the preceding ones:

\begin{pmatrix}
B_{11} & B_{12} & \cdots & B_{1m} & -1 \\
B_{21} & B_{22} & \cdots & B_{2m} & -1 \\
\vdots & \vdots &        & \vdots & \vdots \\
B_{m1} & B_{m2} & \cdots & B_{mm} & -1 \\
-1     & -1     & \cdots & -1     & 0
\end{pmatrix}
\begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_m \\ \lambda \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ -1 \end{pmatrix},   (3.14)

where the B_{ij} are calculated using a Frobenius-type inner product that is tensorially correct for non-orthogonal representations:

B_{ij} = \langle R_i \,|\, R_j \rangle = \mathrm{tr}\!\left[ R_i^{\dagger} \cdot S \cdot R_j \cdot S \right].   (3.15)

The solution of (3.14) gives the coefficients d_m that are used to generate the next input density kernel in order to achieve faster convergence of the SCF cycle:

K_{m+1,\,in}^{\alpha\beta} = \sum_{i=1}^{m} K_{i,\,out}^{\alpha\beta}\, d_i.   (3.16)

After these steps the new K_in^{αβ} is used to build the next Hamiltonian, and the self-consistent cycle starts again.
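The steps above can be condensed into a schematic NumPy/SciPy sketch of one inner-loop iteration. This is a translation of the equations, not of the ONETEP Fortran routines: H and S are assumed to be already available as dense arrays (in the real code they come from the Hamiltonian builder and the NGWF overlaps, stored in SPAM3 format), and the kernel and residuum histories are plain Python lists.

```python
import numpy as np
from scipy.linalg import eigh

def diis_iteration(H, S, n_occ, K_in, K_out_history, R_history):
    """One inner-loop iteration (steps 2-5 above), written as a sketch of the
    equations rather than of the ONETEP Fortran routines."""
    # 2.- Diagonalise: solve H M = eps S M, the generalised problem (3.10).
    eps, M = eigh(H, S)

    # 3.- Output kernel (3.11): f_i = 1 for the n_occ lowest states, 0 otherwise.
    M_occ = M[:, :n_occ]
    K_out = M_occ @ M_occ.T

    # Residuum (3.12) and RMS convergence measure (3.13).
    R = K_out - K_in
    rms = np.sqrt(np.mean(R**2))

    K_out_history.append(K_out)
    R_history.append(R)

    # 5.- Pulay mixing: B_ij = tr(R_i^T S R_j S) as in (3.15), then the
    #     bordered linear system (3.14) for the coefficients d_i.
    m = len(R_history)
    B = np.zeros((m + 1, m + 1))
    for i in range(m):
        for j in range(m):
            B[i, j] = np.trace(R_history[i].T @ S @ R_history[j] @ S)
    B[:m, m] = -1.0
    B[m, :m] = -1.0
    rhs = np.zeros(m + 1)
    rhs[m] = -1.0
    d = np.linalg.solve(B, rhs)[:m]

    # Next input kernel (3.16), used to rebuild the Hamiltonian in step 1.
    K_in_next = sum(d_i * K_i for d_i, K_i in zip(d, K_out_history))
    return K_in_next, K_out, rms
```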

Figure 3.1: Implementation of the DIIS algorithm in ONETEP.

3.3 Description of the code

The density matrix DIIS code has been developed from scratch by the author of this work and added to ONETEP as a new module called DM_mod.F90. The program is written in Fortran90 and uses Message-Passing routines to scale to large numbers of processors on parallel clusters. In order to avoid code replication, which could potentially lead to bug spreading, the code reuses many subroutines written by other authors of the ONETEP code. Some of the notable routines that are reused are the Hamiltonian builder, the MPI communication wrappers, and the sparse and dense algebra routines and wrappers. In total, more than 2000 lines of code have been written, of which nearly 25% is dedicated to 12 test routines. The driver routine dm_mix_density performs the inner-loop energy minimisation using DIIS. It is called from the outer loop that performs the SCF energy minimisation procedure (module ngwf_cg_mod.F90), which means that the DIIS routines are higher in the hierarchy than the algebra, communications and matrix-builder routines of the code. The code takes the initial guess of the density kernel and returns the last output density kernel (K_out), the last input density kernel (K_in) and the total energy of the system.

During the execution, the sequence of input and output density kernels, together with the sequence of Hamiltonian matrices and residues, is stored in arrays of a fixed size. This is memory-consuming, but it is nevertheless necessary in order to perform the kernel mixing according to the Pulay method. The Hamiltonians are stored in memory to allow future changes that perform Hamiltonian mixing instead of density kernel mixing. The number of previous output kernels that are taken into account for the mixing according to the linear combination (3.16) can be tuned by the user to obtain the maximum convergence rate. Generally speaking, this is system-dependent, but the last 15 kernels should be enough for most systems. This is numerically more stable, as the first set of K_out should be further away from the final solution than the latest kernels. Therefore, by neglecting the contribution of the first output kernels, the convergence of the calculation is enhanced. Moreover, using a small number of preceding output kernels compared to the system size is beneficial, since it simplifies the solution of the linear equations in (3.14). A serial solver for systems of linear equations such as the LAPACK routine DGETRS [42] should do a good job without a performance hit. The number of previous output kernels to be mixed is controlled by the parameter dm_history, whose default value is 15 and which can be changed by the user in the ONETEP input file.
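The fixed-size history described above behaves like a bounded queue: once dm_history entries have been stored, the oldest kernel and residue are discarded. The following Python fragment is only a behavioural sketch of that book-keeping (the real storage uses preallocated SPAM3 arrays in Fortran):

```python
from collections import deque

# Illustrative only: a bounded history of kernels/residues, mimicking the
# behaviour of the fixed-size arrays controlled by dm_history.
dm_history = 15
kernels_out = deque(maxlen=dm_history)   # oldest entries are discarded
residues = deque(maxlen=dm_history)      # automatically once the deque is full

def store_iteration(K_out, R):
    """Keep only the last dm_history output kernels and residues, so the
    B matrix in (3.14) never grows beyond dm_history x dm_history."""
    kernels_out.append(K_out)
    residues.append(R)
```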

To be consistent with the ONETEP parallelisation strategy, all the arrays containing matrices are of type SPAM3. This datatype takes advantage of the ONETEP routines for calculations with sparse matrices, which have also proven to be very efficient for dense matrices [35]. Before starting the DIIS loop, the subroutine dm_sparse_create allocates memory space for the arrays that will contain the matrices during the calculation, which are the Hamiltonian, the input and output density kernels and the residue at each iteration. After the calculation is finished, dm_sparse_destroy deallocates the memory at the end of the DIIS procedure. After the initialisations, the self-consistent energy minimisation procedure begins. The code will perform a maximum of dm_maxit kernel mixing iterations and stop if convergence has been achieved. The convergence threshold can be controlled by the input parameter dm_threshold, whose default value is 10^{-6}. The default value of dm_maxit is set to 25. During the DIIS loop, these are the main steps and routines:

• The Hamiltonian is built at each iteration by dm_build_ham, which calculates all the density-dependent matrices and adds all the components to the final Hamiltonian matrix. This routine requires FFTs in order to calculate the energy terms, which makes it one of the most expensive routines in the code, especially for small systems.

• The subroutine dm_diag_and_build_kernel takes the Hamiltonian matrix and performs the diagonalisation using the LAPACK or ScaLAPACK eigensolvers. Within this routine, the Hamiltonian and the kernel matrices, stored in SPAM3 format, are converted to DEM format, which is used by the eigensolver wrapper dense_eigensolve. After the diagonalisation has completed, the new output density kernel is built from the eigenvectors. For large systems, this routine is expected to take most of the time, as it scales with the third power of the number of atoms.

• After the new output kernel has been calculated, the constants of the calculation are monitored with the subroutine dm_constants_summary. This subroutine calculates the commutator of H and K for checking self-consistency, the commutator of K and K^2 for checking idempotency, and the number of electrons in the system.

• The subroutine dm_energy calculates the total energy of the system based on the new output kernel. This subroutine is expected to be expensive, as it requires FFTs for completion.

• In order to check for convergence, the residual of the calculation is computed by the subroutine dm_residue_inout according to equation (3.12). The routine dm_check_convergence calculates the RMS gradient of the residuum according to equation (3.13) and stops the loop if convergence has been achieved.

• If the energy minimisation has not yet converged, then memory is allocated for the calculation of the B matrix based on equation (3.15) and for the set of coefficients, dm, that will be used for the kernel mixing. The system of equations (3.14) is solved in dm_find_coeffs, which calls the LAPACK serial solver DGETRS.

• The routine dm_mix_kernels builds the next input density kernel from a linear combination of the output density kernels and the dm coefficients, storing the result in the NextKernIn SPAM3 array.

• Before moving to the next iteration, the code shifts the entries in the SPAM3 arrays if the number of iterations is higher than dm_history. This eliminates the oldest density kernels and saves memory space.

To activate the DIIS minimisation (instead of LNV), the user has to set the logical flag use_DIIS : T in the ONETEP input file. The flag DIIS_method selects between the different mixing procedures: P for Pulay and L for linear. In addition, the code includes a basic method of performing DIIS known as linear mixing [38], which simply builds the next input kernel by:

K_{in}^{(m+1)} = d_{in}\, K_{in}^{(m)} + d_{out}\, K_{out}^{(m)}.   (3.17)

This method is slower and less reliable than the Pulay mixing, and it is out of the scope of this dissertation. However, it can be used in small systems for testing and debugging purposes.

22 3.4 Eigensolvers

The essential part of the energy minimisation using DIIS is the Hamiltonian diagonalisation as a method of achieving self-consistency in the inner loop. As the code deals with large dense matrices, the diagonalisation process scales with the third power of the system size (O(N^3)-scaling), making it the most computationally expensive part of the calculation. Therefore, the performance of the eigensolver is crucial for the overall performance of the DIIS code, particularly when the systems to be simulated are very large. Mathematical libraries are of great importance in HPC codes, as they are able to achieve great performance in algebra operations with large matrices. In this context, LAPACK [7] and ScaLAPACK [6] have been chosen for solving the eigenvalue problem that the Schrödinger equation represents. In fact, solving (2.10) is equivalent to diagonalising the Kohn-Sham Hamiltonian which, for a dense matrix, scales with the cube of the matrix size. Since this is the most expensive part of the DIIS calculations, the efficiency of these libraries largely determines the overall performance of the code.

In the NGWF notation that is used in ONETEP, the Schrödinger equation to be solved is given by

H_{\alpha\beta} M^{\beta}_{i} = \epsilon_i S_{\alpha\beta} M^{\beta}_{i} ,    (3.18)

which can be treated as a general eigenvalue problem of the form Ax = \lambda Bx, where A = H, B = S, x = M and \lambda = \epsilon. It is worth noticing that the Hamiltonian H and the overlap matrix S are real and symmetric (hence Hermitian), so the eigenvalues are real. Therefore the required eigensolver must be able to solve real generalised symmetric-definite eigenproblems. The eigensolvers DSYGVX [43] (LAPACK) and PDSYGVX [44] (ScaLAPACK) are the candidates to solve such an eigenvalue problem. Both library routines are based on the same algorithm, which comprises the following steps (a minimal sketch of the equivalent dense solve is given after the list):

1. Cholesky factorisation [45] of the matrix B.
2. Transformation to reduce the generalised symmetric-definite eigenproblem to standard form (DSYGST or PDSYNGST routines [46]).
3. Solution of the standard eigenvalue problem (DSYEVX or PDSYEVX routines [47]).
4. Back transformation to obtain the eigenvectors of the original problem (DTRSM or PDTRSM routines [48]).
5. Scaling of the eigenvalues and eigenvectors if necessary.
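The same sequence of steps can be written compactly with NumPy/SciPy as a rough illustration; this is not the DSYGVX/PDSYGVX call path used by ONETEP, and the function below is purely a sketch of the reduction to standard form and the back transformation.

    # Sketch of the dense generalised eigenproblem H x = lambda S x, mirroring
    # the steps listed above. Illustration only, not the ONETEP implementation.
    import numpy as np
    from scipy.linalg import cholesky, eigh, solve_triangular

    def generalised_eigh(H, S):
        L = cholesky(S, lower=True)                  # step 1: S = L L^T
        M = solve_triangular(L, H, lower=True)       # step 2: form A = L^-1 H L^-T
        A = solve_triangular(L, M.T, lower=True).T
        eps, y = eigh(A)                             # step 3: standard problem
        x = solve_triangular(L.T, y, lower=False)    # step 4: back transformation
        return eps, x                                # eigenvalues and eigenvectors

    # scipy.linalg.eigh(H, S) performs the same reduction internally in one call.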

The efficiency of the eigensolver is seriously limited by the size of the matrices. For large systems, H and S can be too large to fit in memory, which can cause the program to crash or to produce inaccurate results. Whereas LAPACK is a serial routine intended

to run on a single core [49], ScaLAPACK is the equivalent parallel algebra library, which takes advantage of parallel computers [50]. ScaLAPACK distributes the matrices over the processors so that the required memory per node is minimised, which reduces the risk of the program running out of memory. In addition, ScaLAPACK routines require the CPUs to cooperate to perform the diagonalisation of the Hamiltonian. The aim of the ScaLAPACK routines is to improve the performance of the algebra operations and to scale to larger systems than the serial LAPACK. For these reasons, a parallel eigensolver like the one included in ScaLAPACK will, in theory, bring better performance and results to the DIIS calculations. These mathematical libraries have been chosen as they are widely tested and used by the computational science community, and a range of publications on their performance is available [51][52][53].

The ONETEP code contains a wrapper routine for the LAPACK and ScaLAPACK eigensolvers. The choice between LAPACK and ScaLAPACK is made at compilation time via conditional compilation flags (#ifdef). If no flag is given, LAPACK is the default solver, whereas if the flag -DSCALAPACK is passed at compilation time, the ScaLAPACK solver is enabled. The ONETEP routine dense_eigensolve converts the SPAM3 parallel matrix storage into the LAPACK or ScaLAPACK formats and converts the result back into SPAM3 format after the calculation. The wrapper routine feeds the arguments to the LAPACK or ScaLAPACK calls automatically, with the programmer only having to specify the A and B matrices, the vector of eigenvalues λ, the matrix of eigenvectors x and the type of problem that the routine has to solve.

3.4.1 LAPACK serial eigensolver, DSYGVX

As the routines included in LAPACK are intended to run on a single node, the size of the problems that are tractable with LAPACK is seriously limited by the memory on one node. It is nevertheless possible to link LAPACK routines from a parallel program (such as ONETEP), although each node then computes a separate problem which is independent of the others. This is not true parallelism, as the CPUs do not cooperate in the solution of a single problem. As a result, all-to-all communications and synchronisation are necessary to broadcast the solutions to the other nodes and to ensure that all the cores have the same values stored in memory. This represents an obvious performance bottleneck that gets worse for large atomic systems. The wrapper dense_eigensolve converts the block-distributed matrices in SPAM3 format to two-dimensional Fortran arrays. The array is then fed into the LAPACK eigensolver, which performs the diagonalisation. The matrix of eigenvectors, x, is converted back to SPAM3 format for efficiency.

3.4.2 ScaLAPACK parallel eigensolver, PDSYGVX

ScaLAPACK takes advantage of parallel computers by distributing the matrix elements over the nodes so that each of them owns a fraction of the entire system. Communications are then necessary in order to perform operations involving the distributed matrices; as a result, the overall performance of the ScaLAPACK routines is determined by the amount of computation per core and the efficiency of the communications between the nodes. The ScaLAPACK library interface is built on top of a hierarchy of libraries that are necessary for its correct installation. PBLAS [54] is a set of parallel algebra routines for distributed-memory machines that forms the basis of the ScaLAPACK parallel routines. ScaLAPACK also requires BLACS [55] for setting up the parallel communications, and the serial LAPACK and BLAS interfaces for algebra operations on each node. ScaLAPACK uses a 2D block-cyclic distribution of the matrices over the processors that minimises the data movement between the memory levels and optimises the communications between the nodes; a toy illustration of this mapping is given below.
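The sketch below shows the standard 2D block-cyclic ownership rule on a Pr × Pc process grid; the block size and grid shape are placeholder values chosen for illustration and are not the parameters used by ONETEP.

    # Sketch of the 2D block-cyclic mapping used by ScaLAPACK: the process that
    # owns global matrix element (i, j) on a Pr x Pc grid with block size nb.
    # Grid shape and block size are illustrative, not ONETEP defaults.
    def owner(i, j, nb=64, Pr=2, Pc=2, rsrc=0, csrc=0):
        prow = (i // nb + rsrc) % Pr      # process row owning global row i
        pcol = (j // nb + csrc) % Pc      # process column owning global column j
        return prow, pcol

    # Example: element (200, 513) of a matrix distributed with 64x64 blocks
    print(owner(200, 513))                # -> (1, 0) on a 2x2 process grid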

The ONETEP wrapper routine dense_eigensolve initialises all the inter-node communications (BLACS) and transforms the SPAM3 matrices to distributed matrices according to the ScaLAPACK standard. After the calculation has completed, the data is transformed back to SPAM3 for efficiency of the ONETEP code, and the communications and all the associated processes are terminated.

One of the properties of the eigenvectors is that they are orthogonal. For a parallel algebra eigensolver like PDSYGVX, keeping the eigenvectors orthogonal requires a large amount of all-to-all communication between the nodes. This is expected to take a substantial part of the total runtime, so the overall speed is affected by the quality of the interconnect and the message-passing implementation on the target machine. The precision to which the eigenvectors are kept orthogonal also affects the performance of the solver: as expected, the higher the precision, the slower the convergence of the eigenvectors. Moreover, convergence is limited by the amount of memory given to the ScaLAPACK workspace. Thus, these two parameters have to be tuned in order to get the maximum performance without affecting the accuracy of the calculation. The workspace is given to the ScaLAPACK eigensolver through the variable WORK, whereas the precision is controlled by ABSTOL, which sets the absolute tolerance to which the eigenvalues are computed, and ORFAC, which tells the library routine which eigenvectors need to be reorthogonalised. The ONETEP wrapper routine dense_eigensolve automatically allocates WORK so as to compute the eigenvalues and eigenvectors, and sets ABSTOL and ORFAC to their default values.

Chapter 4

Target machines architecture, compilation details and benchmarks

The ONETEP code with DIIS has been compiled on HECToR and Iridis 3 to compare the performance on each parallel cluster. The hardware and the compiler options can play a decisive role in performance, hence the first sections offer a description of the machine architectures and the compilation details on each cluster. After that, the last section presents the set of benchmarks that have been used for validation and performance tests of the DIIS-Hamiltonian diagonalisation method.

4.1 Target machines architecture

HECToR (Cray XT4, Phase 2A) [56] has a total of 3072 AMD 2.3 GHz quad-core Opteron processors and 8 GB of memory per node, giving a theoretical peak performance of 113 Tflops. The nodes have two levels of cache, L1 and L2, designed to enhance the data transfer between main memory and the CPU. The nodes are connected by a high-bandwidth network, a three-dimensional torus of SeaStar2 chips, that offers a peak bi-directional bandwidth of 7.6 GB/s.

Iridis 3 [57] comprises a total of 8064 cores in 1008 Intel Nehalem nodes, with two 4-core processors and 22 GB of memory per node, adding up to a theoretical peak performance of 72 TFlops. The nodes are connected via a high-bandwidth, low-latency Infiniband network. The Nehalem cores have three levels of cache, which allow data structures of different sizes to fit in cache; this is expected to produce performance peaks as the number of processors increases for a given system.

4.2 Compilation details

The ONETEP code with DIIS has been compiled on HECToR using the Portland Group (PGI) Fortran compiler with the -fastsse optimisation flag. For the message-passing library, HECToR provides an optimised implementation of the MPICH-2 library. The PGI compiler environment has to be loaded with the directive module load PrgEnv-pgi, and the wrapper ftn takes care of the compilation of Fortran code with MPI calls. As ONETEP needs efficient fast Fourier transforms at runtime, the code is linked against FFTW 3.3.1 by loading the appropriate module, module load fftw/3.1.1. Finally, the code links to LAPACK and, if the preprocessor flag -DSCALAPACK is enabled, to ScaLAPACK via the Cray LibSci mathematical library.

On Iridis 3 the ONETEP code has been compiled using the Intel Fortran compiler (ifort) with the OpenMPI and IntelMPI message-passing interfaces. FFTW 3.2.2 is linked for performing fast Fourier transforms, while the mathematical libraries, LAPACK and ScaLAPACK, are linked from the Intel MKL mathematical library interface. It has been found, however, that the OpenMPI implementation was causing ScaLAPACK to hang in all-to-all communications, which eventually crashed the application. The reason for this behaviour is complicated, but the gdb debugger reveals that the ScaLAPACK eigensolver hangs on all-to-all communications and the calculation gets killed before completion. These communications are required for the orthogonalisation of the eigenvectors of the Hamiltonian and should account for a large part of the total communication cost of the eigensolver routine. It seems that the OpenMPI implementation on Iridis 3 is not compatible with ScaLAPACK or with the Infiniband interconnect. This hypothesis is reinforced by the fact that the calculations with IntelMPI succeeded, which suggests that the problem lies not with the ScaLAPACK installation but with the OpenMPI interface. Therefore, IntelMPI was chosen for the performance tests during this work. The optimisation flags for ifort were -O2 to enable standard compiler optimisations, -vec-report0 to control the information reported by the vectoriser, -m64 to specify a 64-bit architecture, -msse4.1 to enable vectorisation and -heap-arrays to put automatic arrays on the heap (instead of the stack), which avoids runtime segmentation faults caused by large automatic arrays.

4.3 Benchmarks

The DIIS code has been tested on a range of systems of different characteristics. These benchmarks have been selected for two purposes. The first is to validate the correctness of the DIIS code in describing the physics of atomic systems by reproducing the results given by the standard LNV algorithm of ONETEP. The second is to analyse the performance of the code on HECToR and Iridis 3 for different numbers of processors. The important parameters characterising the benchmarks are:

1. Number of atoms in the molecule. The more atoms, the more expensive the calculation will be, as there are more bodies in the system, which introduce extra degrees of freedom.

2. Number of NGWFs in the system. The number of elements of the Hamiltonian matrix (and, in fact, of all the matrices in the NGWF representation, such as the density kernel) is given by N_NGWFs × N_NGWFs. Therefore, the more NGWFs, the bigger the matrix to be diagonalised. The effort to diagonalise the matrix scales with the cube of N_NGWFs (a memory estimate for such dense matrices is sketched after this list).

3. The kinetic energy cut-off. This parameter defines the computational grid used to represent the NGWFs in the psinc basis set (equation (2.24)). The higher the cut-off, the finer the grid.
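The quadratic growth of the matrix storage with N_NGWFs can be made concrete with a back-of-the-envelope estimate; the sizes below are taken from Tables 4.2 and 4.3, and the node memory figure from section 4.1.

    # Memory footprint of one dense N x N double-precision matrix for a few of
    # the benchmark sizes listed in Tables 4.2 and 4.3.
    for name, n in [("Si766H402", 7296), ("Si1186H366", 11040), ("p128_20", 36352)]:
        gib = n * n * 8 / 2**30              # 8 bytes per double-precision element
        print(f"{name}: {n} NGWFs -> {gib:.1f} GiB per dense matrix")
    # The largest protein benchmark needs ~9.8 GiB for a single dense matrix,
    # which already exceeds the 8 GB of memory available on one HECToR node.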

A set of small test cases has been selected so that the calculations take only a few iterations to converge. These systems lie in the range of sizes that can be simulated with conventional cubic-scaling DFT codes like CASTEP, which are capable of using (and can eventually reproduce the results with) their own DIIS implementation [58] [41]. Table 4.1 summarises these systems, represented in figure 4.1.

Test case      N_atoms   N_NGWFs   Energy cut-off (eV)
Water               3         6    700
Ethene              6        12    400
H_bond              7        19    600
Benzene            12        60    300
C69N               70       280    250
Tennis ball       172       496    800
Catechol 5Å       339       780    800
Catechol 7Å       662      1597    800

Table 4.1: Set of small test cases

The DIIS code has also been tested using two sets of larger systems. The first set is a group of five silicon nanorods of different sizes, which should show some of the typical properties of semiconductors, such as a moderate band gap. A smaller band gap means that the nearsightedness of the density matrix is less pronounced, and some linear-scaling approaches may fail during the simulation. However, recent work using this

Figure 4.1: Set of small benchmarks.

set of silicon benchmarks [59] demonstrated that it is possible to simulate this kind of semiconductor using ONETEP without loss of accuracy. A DIIS approach should be, in any case, more stable for this kind of material, as no truncation of the density kernel is needed. Table 4.2 summarises the set of silicon benchmarks, represented in figure 4.2.

Test case     N_atoms   N_NGWFs   Energy cut-off (eV)
Si29H36            65       297    650
Si242H140         382      2318    650
Si532H224         756      5012    600
Si766H402        1168      7296    600
Si1186H366       1552     11040    600

Table 4.2: Set of benchmarks corresponding to silicon nanorods of different size

Figure 4.2: Set of silicon nanorods benchmarks.

The last set of benchmarks is formed by a sequence of cuts of different size applied to a double-helix shaped Amyloid protein [60]. This biological system has a larger band gap

that should make a linear-scaling approach based on the locality of the density matrix more accurate. However, the protein is not a proper insulator, so the band gap is not particularly large. A cubic-scaling approach that diagonalises the Hamiltonian should be very stable and accurate for this kind of system. Table 4.3 and figure 4.3 summarise the computational parameters of this set of calculations.

Test case   N_atoms   N_NGWFs   Energy cut-off (eV)
p16_20         1712      4544   600
p32_20         3424      9088   600
p48_20         5136     13632   600
p64_20         6848     18176   600
p80_20         8560     22720   600
p96_20        10272     27264   600
p112_20       11984     31808   600
p128_20       13696     36352   600

Table 4.3: Set of benchmarks corresponding to the amyloid protein of increasing size

Figure 4.3: Set of benchmarks of cuts to the amyloid protein.

Chapter 5

Validation tests

In this chapter the results regarding the validation of the DIIS method for minimising the total energy are presented. The calculations perform several iterations until, eventually, convergence is achieved in the inner loop. The values of the energy obtained with the DIIS and standard LNV methods are compared.

The first section shows the convergence results obtained for the different sets of benchmarks and the ability of the DIIS method to simulate the physical properties of real molecules. The next section discusses some methods to resolve the discrepancies between the LNV and DIIS algorithms and to enhance the convergence of DIIS for large, metallic systems.

5.1 Results

The tests carried out with the set of small benchmarks show a good agreement between the LNV algorithm and DIIS. However, the DIIS calculations seem to become more unstable with increasing system size, which leads to large iteration counts in order to achieve convergence. In fact, the calculation for the largest of these systems (Catechol 7Å) diverged from the value given by LNV. This molecule is right at the limit of the range of systems that cubic-scaling codes can simulate, which makes it difficult to reproduce the results with other codes that incorporate DIIS, such as CASTEP. With the exception of Catechol 7Å, for which the energy diverges, the agreement of the total energy in the inner loop with the value given by standard LNV is very good for the rest of the small test cases. Figure 5.1 compares the convergence rate of the DIIS and LNV methods for the different systems, while table 5.1 compares the values of the energy obtained with both methods at the end of the calculation.

A further test has been performed in order to validate the DIIS method. The “Tennis ball” protein [61] is formed by two symmetrical monomers that fold to form a tennis ball-like structure, with each part interacting with the other via eight hydrogen bonds.

Figure 5.1: Convergence of the energy of the LNV and DIIS methods for the set of small systems.

By varying the distance between the two monomers, the energy of the hydrogen bonds changes. This generates a potential curve that should show a minimum at the optimal value of the distance. This plot has been generated using both the LNV and DIIS codes in the inner loop, and the results are shown in figure 5.2. As can be seen, the agreement is very good, with a difference of 10^-6 Ha in the depth of the well. This suggests that the DIIS algorithm, at least for systems of this size, produces correct results, as it is capable of reproducing the physical properties of the molecule under simulation up to chemical accuracy (1 kcal/mol). The fluctuating behaviour of both the LNV and DIIS algorithms in the potential well of the Tennis ball protein can be explained in terms of the basis set superposition error: as the plot refers only to the inner loop, the NGWFs have not been optimised. It can be shown that explicit optimisation of the NGWFs eliminates the BSSE [62], so

no further enhancements are necessary. However, the DIIS code needs to be adapted to the outer loop, where the NGWF optimisation takes place.

Figure 5.2: Potential energy well generated by the H-bonds between the monomers of the Tennis ball protein. The calculations with LNV and DIIS show very good agreement for this system.

Test case      Energy LNV (Ha)   Iterations LNV   Energy DIIS (Ha)   Iterations DIIS
Water             -15.8068920          13             -15.8068920         10
Ethene            -12.5701824           7             -12.5701824          8
H_bond            -38.8553433          35             -38.8553433         66
Benzene           -68.6356775          26             -68.6356775         27
C69N             -368.5969759          50*           -368.5969135        100*
Tennis ball      -796.4267427          25            -796.4267415         66
Catechol 5Å     -1151.8477085          50*          -1148.8190442        100*
Catechol 7Å     -2487.7889570          50*          10160.2862602         41*

Table 5.1: Energy obtained by LNV and DIIS methods for the set of small test cases. The asterisk * indicates that the calculation did not converge to the standard threshold.

The results for the silicon nanorods show that DIIS achieves convergence for the first two systems (Si29H36 and Si242H140) with a value of the energy close to LNV. However, the next system, Si532H224, with matrices of size 5012 × 5012, failed to converge. It shows a remarkable divergence that drives the energy to very high and positive values. This tendency can be observed in the remaining, larger silicon test cases. Table 5.2 compares the values of the energy obtained by LNV and DIIS, while figure 5.3 shows this behaviour in the convergence of the energy.

The smallest of the amyloid protein test cases, p16_20, diverges in the DIIS energy minimisation, whereas the LNV code works correctly. This system is approximately of the same size as the biggest silicon nanorod, therefore a divergent behaviour of the energy is expected for all the other amyloid calculations. Due to the large size of these molecules and the O(N^3)-scaling of the DIIS algorithm, it has been decided not to waste valuable computational time on the rest of the calculations in this set, as they will most likely diverge.

Figure 5.3: Convergence of the energy of the LNV and DIIS methods for the set of silicon nanorods.

Test case     Energy LNV (Ha)   Iterations LNV   Energy DIIS (Ha)   Iterations DIIS
Si29H36          -131.7835469         11             -131.7835469         14
Si242H140       -1011.9586231         11            -1011.9586143         42
Si532H224       -2219.2305021         13            53245.0016217         31*

Table 5.2: Energy obtained by LNV and DIIS methods for the benchmark set of silicon nanorods. The asterisk * indicates that the calculation did not converge to the standard threshold.

The plots for the set of small systems and the silicon nanorods show the same convergence pattern. At the first iteration, the total energy of the calculation is very similar to the value given by LNV, but after the second iteration the energy explodes and reaches a very high value. Then, after a few more DIIS iterations, the energy drops towards the value given by LNV and the calculation eventually converges. For larger systems such as Si532H224 or Catechol 7Å, the DIIS minimisation becomes unstable and diverges. In other cases, such as Catechol 5Å, energy peaks can be observed during the DIIS minimisation. In this calculation, the peaks in energy appear periodically as a function of the iteration number, which indicates that the DIIS algorithm implemented here may suffer from some kind of ill-conditioning that introduces this pattern into the convergence of the energy. The next section discusses some methods to improve the convergence of the DIIS algorithm for large systems, together with the possible causes of this misbehaviour.

5.2 Methods for solving the convergence problems of the DIIS algorithm for large systems

According to the results, for systems with a large number of atoms and NGWFs the DIIS algorithm diverges considerably, causing the total energy to increase to very high values. Although DIIS is considered a robust method for improving the convergence of SCF calculations, there are some technicalities that must be considered for its efficient implementation.

5.2.1 DIIS initialisation

First of all, it is worth noticing that at the first DIIS iteration there is only one output kernel in the sequence. This means that it is only possible to generate one residue vector, R^{(1)}, which, according to equation (3.14), will produce only one mixing coefficient, d_1. As the sum of coefficients is constrained by equation (3.6), it follows that at the first iteration d_1 = 1. As a result, the input kernel at iteration 2 is equal to the output kernel at iteration 1: K_{in}^{(2)} = K_{out}^{(1)}. Effectively, no kernel mixing is performed to improve the second SCF iteration, which means that the Hamiltonian at the second iteration is built from the sole contribution of K_{out}^{(1)}. This kind of procedure is known to be very unstable and divergent for many systems [38]. As can be seen in the plots in figure 5.1, the first iteration gives a reasonable value of the total energy, whereas at the second iteration the energy becomes unpredictable and divergent. The consequence of this behaviour is that, at the end of the second iteration, the elements of K_{out}^{(2)} are far from those that would make the energy converge, and, moreover, this effect propagates to the following DIIS iterations via the kernel mixing. This effect is more visible in large systems, as the K matrices are bigger: the previous density kernels, which according to equation (3.16) act as a basis set for the next input kernel, do not form a complete or near-complete basis for building a next density kernel whose elements are all close to those that minimise the total energy. As a consequence, the energy never converges for this kind of system.

There are a number of techniques to avoid this behaviour. The simplest one is to perform a number of SCF iterations before the density mixing starts. This provides a number of initial, well-behaved kernels that are included, together with the kernels produced by diagonalisation, in the kernel mixing. The problem, in this case, is to find an elegant method to produce the initial kernels without DIIS. Within the ONETEP framework, a number of LNV iterations could be performed before DIIS with no kernel preconditioning [63]. By doing this, it would be possible to accumulate kernels while keeping the occupation numbers free of constraints, so they can be fractional. The idempotency of the density matrix would then be imposed by hand during the execution of DIIS and the Hamiltonian diagonalisation.

5.2.2 Kerker preconditioning

Another problem that this implementation of DIIS can suffer from is charge sloshing [64]. The Kerker preconditioning method [65] is a reliable technique for avoiding this issue, which is more important in metallic systems and semiconductors. The charge sloshing effect adds low-frequency oscillations to the electronic density that can cause slow convergence or even divergence of the DIIS method. Kerker preconditioning is an adaptation of the Pulay method in which the electronic density mixing is done in the following way:

n_{in}^{(m+1)}(\vec{G}) = \sum_{i}^{m} d_i n_{out}^{(i)}(\vec{G}) + P(\vec{G}) \sum_{i}^{m} d_i R^{(i)}(\vec{G}) ,    (5.1)

where P(\vec{G}) is the Kerker preconditioning term, which depends on the frequency in reciprocal space, \vec{G}, and can be calculated as

P(\vec{G}) = a \frac{|\vec{G}|^2}{|\vec{G}|^2 + G_0^2} .    (5.2)

The Kerker preconditioning scheme requires fine-tuning of the parameters a and G_0, which are system dependent. This is normally done by trial and error, although some automated techniques for finding suitable coefficients can be found in the literature [66]. The major challenge, however, is to adapt the Kerker preconditioning scheme, which requires the density to be represented on a grid so that it can be transformed to reciprocal space by an FFT, to the DIIS code presented here (a reciprocal-space illustration is sketched below). The DIIS method as implemented in ONETEP mixes the density kernels directly in real space, with no need to build the quantum single-particle density matrix at each step. Thus, for computational efficiency, our method would require a version of Kerker preconditioning in real space. There are some examples in the literature that could be used to introduce Kerker preconditioning in ONETEP without using a reciprocal-space representation [67].
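To make equations (5.1)-(5.2) concrete, the sketch below applies the Kerker factor to a residual held on a cubic real-space grid via an FFT. The grid spacing and the values of a and G_0 are placeholder values chosen for illustration, not parameters of ONETEP or of any particular system.

    # Minimal reciprocal-space illustration of the Kerker preconditioner of
    # equation (5.2) applied to a residual on a cubic grid. The values of a, G0
    # and the grid spacing are placeholders, not ONETEP parameters.
    import numpy as np

    def kerker_precondition(residual, spacing=0.5, a=0.8, G0=1.5):
        n = residual.shape[0]
        g = 2.0 * np.pi * np.fft.fftfreq(n, d=spacing)     # reciprocal-space grid
        gx, gy, gz = np.meshgrid(g, g, g, indexing="ij")
        G2 = gx**2 + gy**2 + gz**2
        P = a * G2 / (G2 + G0**2)                          # equation (5.2)
        return np.real(np.fft.ifftn(P * np.fft.fftn(residual)))

    # The preconditioner damps the long-wavelength (small |G|) components of the
    # residual, which are the ones responsible for charge sloshing.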

5.2.3 Level shifting

Another method that can be used to improve the convergence is the level-shifting method [68][69]. Metallic systems are characterised by a zero band gap, which often causes convergence problems in DIIS. In some cases, the highest occupied orbital and the lowest unoccupied orbitals can exchange their positions during the diagonalisation, which makes the DIIS method diverge. In this approach, a constant value is added to the energies of the unoccupied orbitals (which become pseudocanonical orbitals). By doing this, the band gap between the occupied and unoccupied levels is increased artificially, which should improve the efficiency of the mixing method. The convergence and reliability, however, depend on the parameter that sets the artificial band gap, which is often system-dependent and must be tuned by the user (a sketch of such a shift is given below).
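The sketch below shows one common way of applying such a shift in a non-orthogonal basis; it assumes generalised eigenvectors normalised against the overlap matrix (C^T S C = I) and a hypothetical shift parameter sigma, and it is an illustration rather than the scheme used by any particular code.

    # Sketch of level shifting in a non-orthogonal basis: the energies of the
    # unoccupied (virtual) orbitals are raised by a constant sigma, widening the
    # gap artificially. C holds S-orthonormal generalised eigenvectors
    # (C^T S C = I), n_occ is the number of occupied orbitals and sigma is a
    # tunable parameter; all names here are hypothetical.
    import numpy as np

    def level_shift(H, S, C, n_occ, sigma=0.5):
        C_virt = C[:, n_occ:]                  # virtual-orbital coefficients
        P_virt = C_virt @ C_virt.T             # projector onto the virtual subspace
        return H + sigma * (S @ P_virt @ S)    # shifted Hamiltonian

    # Diagonalising the shifted H with the same S leaves the occupied eigenvalues
    # unchanged and raises the virtual ones by sigma, with the same eigenvectors.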

5.2.4 Other approaches

The methods mentioned above have been implemented successfully in other codes such as CASTEP [21] or CONQUEST [26] and are the most likely to succeed in achieving convergence of large systems using DIIS on ONETEP. Nevertheless, other methods can also be tested, such as:

1. A different choice of the metric of the system to avoid low frequencies [38].
2. A different way of defining the residual vector to increase the efficiency of the Pulay and Kerker DIIS [70].
3. The introduction of Thomas-Fermi density mixing to allow fractional occupancies of the Kohn-Sham orbitals [71][72]. This is necessary in order to simulate metals with ONETEP within the ensemble-DFT approach [5].

Chapter 6

Performance results and analysis

In this chapter the performance results of the DIIS-Hamiltonian diagonalisation method and of LNV are presented. In the first section, the LAPACK and ScaLAPACK eigensolvers are tested as methods for diagonalising the Hamiltonian matrix. The calculations have been submitted to the HECToR and Iridis 3 clusters for different numbers of nodes, in multiples of 4 cores. On HECToR, whole 4-core processors are used, whereas on Iridis 3 the calculations can be divided into full-node calculations (8 CPUs per node) and half-node calculations (4 CPUs per node and exclusive access to the node by the job). In the half-node calculations more memory and cache are dedicated to each CPU on Iridis 3, as the other 4 cores are not active during the execution.

The results are obtained for only one iteration of the inner loop of ONETEP. In the case of DIIS-Hamiltonian diagonalisation, this ensures that only one diagonalisation is carried out during the calculation, so the ScaLAPACK or LAPACK eigensolver is called only once. The following performance metrics have been measured and derived from the experiments: the runtime of the program, T(p), defined as the time for p processors to complete one iteration, and the parallel speed-up, S(p), defined as

S(p) = \frac{T(p_0)}{T(p)} ,    (6.1)

where p_0 is the reference number of CPUs, taken as the minimum number of processors on which the calculation completed successfully. The parallel efficiency, E(p), is defined as

E(p) = \frac{S(p)}{p} ,    (6.2)

and the Karp-Flatt metric [73], κ(p), is defined as

\kappa(p) = \frac{1/S(p) - 1/p}{1 - 1/p} ,    (6.3)

which gives a measure of the serial fraction of the program and can be associated with the potential parallelisation difficulties of the code. If κ(p) tends to a constant as p increases, it means that the code does not present further opportunities for parallelism, and κ can be associated with the serial fraction of the code. If, on the other hand, κ grows with the number of processors, this can be interpreted as overheads, such as communication or load-imbalance bottlenecks, degrading the performance of the code. A small helper computing these metrics is sketched below.
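The helpers below simply evaluate equations (6.1)-(6.3) from measured runtimes; the timing values in the example call are invented placeholders, not measured ONETEP timings.

    # Helpers for the metrics of equations (6.1)-(6.3). The runtimes in the
    # example call are invented placeholders, not measured ONETEP timings.
    def speedup(t_ref, t_p):
        return t_ref / t_p                               # S(p), eq. (6.1)

    def efficiency(t_ref, t_p, p):
        return speedup(t_ref, t_p) / p                   # E(p), eq. (6.2)

    def karp_flatt(t_ref, t_p, p):
        s = speedup(t_ref, t_p)
        return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p)     # kappa(p), eq. (6.3)

    print(karp_flatt(t_ref=100.0, t_p=30.0, p=8))        # roughly 0.2 serial fraction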

The ONETEP code incorporates its own timing routine, timer_clock, which is a wrapper around the MPI library timer MPI_WTIME. As the code runs in parallel, each processor has its own timing value. For synchronisation purposes, the final runtime associated with a subroutine is the average time over all the processors involved in the calculation.

6.1 Comparison of LAPACK and ScaLAPACK eigensolvers

The performance of the routine dm_diag_and_build_kernel has been measured for different numbers of cores on HECToR and Iridis 3. This routine is in charge of diagonalising the Hamiltonian matrix and building the output density kernel from the eigenvectors. The diagonalisation of the Hamiltonian is expected to be the most computationally demanding step, so the performance of this routine has the greatest impact on the overall code. For this reason, the ScaLAPACK parallel eigensolver is tested against the equivalent LAPACK serial eigensolver. The sets of silicon nanorod and protein benchmarks described in section 4.3 have been run using DIIS and Hamiltonian diagonalisation with both the LAPACK and ScaLAPACK routines for different numbers of processors on HECToR and Iridis 3.

It has been found that the LAPACK eigensolver was unable to cope with the largest systems, as they require big matrices to be stored in the memory of a single node. These calculations crashed with an out-of-memory error, as the amount of available memory was less than the space required by the matrices. Hence the calculations using LAPACK are limited to those systems that fit in one node's memory. For distributed-memory programs such as ONETEP that run on multicore nodes, the maximum available memory is obtained when only one CPU is used on each node, which limits the performance of the code, as intra-node communications via the memory chip are normally much faster than inter-node communications via the interconnect. Table 6.1 summarises the successful calculations using LAPACK.

The second observation that has to be made about LAPACK is that, as it is a serial library, the speed-up of calculations using the LAPACK eigensolver will always stay below 1, so, as expected, there is no advantage in using LAPACK on a parallel cluster.

Test case     LAPACK (HECToR)   LAPACK (Iridis 3)   ScaLAPACK (HECToR)   ScaLAPACK (Iridis 3)
Si29H36            Yes               Yes               Yes (1 node)         Yes (1 node)
Si242H140          Yes               Yes               Yes (1 node)         Yes (1 node)
Si532H224          Yes               Yes               Yes (1 node)         Yes (1 node)
Si766H402          No                Yes               Yes (2 nodes)        Yes (1 node)
Si1186H366         No                No                Yes (3 nodes)        Yes (1 node)
p16_20             Yes               -                 Yes (7 nodes)        Yes (15 nodes)*
p32_20             No                -                 Yes (9 nodes)        Yes (20 nodes)*
p48_20             No                -                 Yes (10 nodes)       Yes (5 nodes)
p64_20             No                -                 Yes (10 nodes)       Yes (5 nodes)
p80_20             No                -                 Yes (11 nodes)       Yes (20 nodes)*
p96_20             No                -                 Yes (13 nodes)       Yes (20 nodes)*
p112_20            No                -                 Yes (17 nodes)       Yes (20 nodes)*
p128_20            No                -                 Yes (19 nodes)       Yes (20 nodes)*

Table 6.1: Successful calculations using LAPACK and ScaLAPACK eigensolvers on HECToR and Iridis 3. In all cases, 4 CPUs per node have been used. The asterisk * indicates that a smaller number of nodes has not been tested and it is possible for the calculation to fit on fewer nodes. To save computational time, the protein systems have not been simulated using LAPACK on Iridis 3.

LAPACK routines called within parallel codes require collective communications to synchronise the results on all the nodes, which causes a loss of performance when the number of CPUs increases. The results show that LAPACK is very inefficient compared to ScaLAPACK, as it does not take advantage of parallel computers to accelerate the calculation, which makes the parallel efficiency drop very quickly with the number of processors. As the performance of the diagonaliser drops, the overall parallel efficiency of the entire inner loop using DIIS-Hamiltonian diagonalisation suffers. The Karp-Flatt metric, which measures the fraction of serial computation in a parallel code, reaches values that are very high compared to those obtained with ScaLAPACK in the inner loop. As the diagonaliser scales with the third power of the system size, such an inefficient eigensolver is not sustainable for simulating large molecules of thousands of atoms. Figures 6.1 and 6.2 compare the parallel efficiency of both libraries' eigensolvers for the Si766H402 and p16_20 systems.

The results show that there are two advantages of using ScaLAPACK instead of LAPACK for simulating large systems. First of all, ScaLAPACK has proven able to simulate systems with very large matrix sizes. As ScaLAPACK distributes the matrix elements over the nodes, each of them holds a smaller portion of the total matrix. Furthermore, increasing the number of cores translates into more available memory for storing large matrices, which implies that, in theory, provided enough cores, ScaLAPACK can deal with any matrix size. However, as ScaLAPACK uses parallel computation, its performance is limited by the ratio between inter-node communication and computation on each node.


Figure 6.1: Timings, speed-up, parallel efficiency and serial fraction (Karp-Flatt metric) of the eigensolvers for the Si766H402 silicon nanorod on Iridis 3.

These two aspects compete during a parallel calculation: the more matrix elements per node, the fewer communications are required and the more computation takes place on each CPU, and vice versa. Hence, for top parallel efficiency, the balance between the number of elements per node and the communications required to complete the operation is critical. The ScaLAPACK eigensolver is particularly communication-demanding, as it needs a global transfer of data to reorthogonalise the eigenvectors at each step. As a result, the speed-up scaling is not dramatically efficient for small systems, whereas for medium-to-large systems that require a large number of processors the parallel behaviour becomes more efficient.

The second advantage is that the ScaLAPACK eigensolver has been shown to be faster than its serial equivalent for medium-to-large systems (those in the range of LAPACK usability) for different numbers of processors. As can be seen in figures 6.1 and 6.2, the cooperative strategy of ScaLAPACK for solving the problem speeds up the calculations up to a certain number of processors. At that point, communications start to dominate and the parallel efficiency drops. The peak in the speed-up plot is system dependent, as the number of matrix elements per core changes. As a general rule, the larger the system, the more CPUs are needed to reach the top speed-up. For the Si766H402 system the diagonaliser is nearly 8 times faster on 27x4 than on 1x4 Iridis 3 cores, whereas for the p16_20 system the code speeds up by a factor of 1.4 when the number of HECToR CPUs is roughly doubled.

The Karp-Flatt metric tends to a constant (around 0.2) for the execution of the ScaLAPACK eigensolver on different numbers of CPUs in the case of the Si766H402 system. This means that the overhead introduced by communications or load imbalance is not heavily increasing with the processor count and the calculation does not offer further opportunities for parallelism.


Figure 6.2: Timings, speed-up, parallel efficiency and serial fraction (Karp-Flatt metric) of the eigensolvers for the p16_20 protein benchmark on HECToR.

However, this parameter increases with the number of processors for the p16_20 system, which can be due to the overhead introduced by communications and the small data structures per node. This behaviour can be explained by the fact that the matrix size in the Si766H402 system (7296x7296 elements) is larger than that of the matrices involved in the p16_20 protein (4544x4544 elements). As a result, the matrix chunks allocated to each processor for the silicon nanorod are larger than for the protein, which makes a more efficient use of inter-node communications and provides better load balance.

6.2 Comparison of LNV and DIIS algorithms for the inner loop of ONETEP

As both the LNV and DIIS-Hamiltonian diagonalisation methods are intended to minimise the energy in the inner loop of ONETEP, it is necessary to compare the computational performance of the two algorithms on the set of selected benchmarks. The parallel implementation of the LNV algorithm is more efficient than the DIIS method for large systems. The standard ONETEP code has been carefully optimised by the developers over the years to achieve high performance on modern clusters. As can be seen in figures 6.3 and 6.4, which compare the performance of LNV and DIIS for the largest protein benchmark (p128_20), the serial fraction of the LNV method tends to a constant, which means that the parallel strategy is efficient, as it implicitly minimises the communications needed to perform the LNV calculation.


Figure 6.3: Timings, speed-up, parallel efficiency and serial fraction (Karp-Flatt metric) of the inner loop of ONETEP for the p64_20 protein benchmark on HECToR.

The peaks of performance of both the LNV and the DIIS methods on HECToR and Iridis 3 can be attributed to network noise caused by many programs running at the same time on the cluster. As the ScaLAPACK eigensolver requires a substantial amount of communication during the runtime, it is more sensitive to fluctuations in the performance of the interconnect due to shared use of the network with other users. This can explain why the peaks in the DIIS method are sharper than in the LNV code. Another interesting effect regarding the ScaLAPACK and SPAM3 matrix distributions has to do with the different levels of cache in the target machines. The size of the matrix portion that is stored in memory on each node decreases with the number of CPUs in the calculation. As a result, beyond a given number of CPUs, the data structures fit into the L2 cache (HECToR) or L3 cache (Iridis 3), which enhances the memory access and hence the overall performance of the code. This can be observed in figure 6.2, where the speed-up is above the theoretical limit for the LNV method, which produces a parallel efficiency greater than one.

Figures 6.5 and 6.6 show that the computational effort required by the DIIS method scales with the cube of the number of atoms in the system. This is expected, as the diagonaliser takes most of the runtime, in a percentage that increases with the system size. On the other hand, the LNV algorithm, provided a suitable kernel cut-off, scales linearly with the number of atoms. The break-even point at which the linear-scaling method becomes faster than the cubic-scaling approach is system dependent and its exact position is not clear from these sets of benchmarks. Nevertheless, it seems that for the silicon nanorods LNV is faster than DIIS at any size, provided a sensible choice of the kernel cut-off. The case of Si1186H366 suffers from a small kernel cut-off, so the matrices are not sparse, and hence it is slower than the DIIS approach. This indicates that the break-even point is below 1500 atoms. On the other hand, extrapolating the scaling curve of the amyloid protein, it can be seen that the break-even point would be around 1000 atoms.


Figure 6.4: Timings, speed-up, parallel efficiency and serial fraction (Karp-Flatt metric) of the inner loop of ONETEP for the p64_20 protein benchmark on HECToR.

Figure 6.5: Comparison of the scaling of the LNV and DIIS algorithms with the system size for the set of silicon nanorods. The systems have been simulated using 20 nodes and 4 CPUs per node on Iridis 3.

The scaling of each of the subroutines needed for the DIIS approach can be seen in figure 6.7. This plot shows that the routines that calculate the Hamiltonian and the energy (dm_build_ham and dm_energy) are more demanding for small systems, but as they scale linearly with the system size (they have been reused from linear-scaling ONETEP), the diagonaliser dm_diag_and_build_kernel rapidly takes most of the runtime for large systems.

Figure 6.6: Comparison of the scaling of the LNV and DIIS algorithms with the system size for the set of amyloid benchmarks. The systems have been simulated using 20 nodes and 4 CPUs per node on HECToR and Iridis 3.

This emphasises the need for a high-performing diagonaliser in order to enhance the performance of the DIIS code and to make it suitable for simulating larger systems.

Figure 6.7: Scaling of the DIIS-Hamiltonian diagonalisation routines with the system size of the amyloid protein. The systems have been simulated using 20 nodes and 4 CPUs per node on Iridis 3.

The DIIS algorithm, once the discrepancies reported in chapter 5 have been fixed, should be a more reliable algorithm than LNV, as it is not based on approximations to truncate and purify the density matrix. On the other hand, DIIS is a more computationally demanding method that, as shown in figure 6.6, doubles the runtime of the LNV algorithm for systems of nearly 8000 atoms on 20 × 4 Iridis 3 cores. This performance, however, brings the opportunity to simulate large systems of a few thousand atoms using DIIS and Hamiltonian diagonalisation on a time scale and with computational resources comparable to LNV. Metallic systems with a zero band gap cannot be simulated

using LNV in a linear-scaling fashion, as the matrices involved in the calculation are dense. Traditional DFT approaches, which are not based on the locality of matter, can simulate metals of up to a few hundred atoms. The combination of DIIS and Hamiltonian diagonalisation presented in this work, together with the addition of ensemble-DFT to ONETEP to enable partial occupation numbers, could bring the opportunity to simulate large metallic compounds of thousands of atoms with a fast and accurate algorithm.

6.3 Future performance optimisations

The most computationally demanding part of the energy minimisation using DIIS is the diagonalisation of the Hamiltonian at each iteration of the inner loop. This step is well known to scale with the cube of the matrix size for dense systems, even in the best implementations. The results presented in figure 6.7 show that this is the case for this code, and hence the only possible way to improve the speed of the eigensolver is by reducing the prefactor of the cubic behaviour. ScaLAPACK takes advantage of parallelisation to reduce this factor and offers better performance than the LAPACK serial eigensolver. As a future development, the ARPACK [74] and PARPACK (parallel ARPACK) [75] libraries offer an interesting alternative to LAPACK and ScaLAPACK for the diagonalisation routines. These libraries are specialised in eigenvalue problems that involve dense matrices, as is the case for the DIIS method studied here, particularly for metallic systems, which are not affected by the locality of the density matrix.

Another point to take into account for performance is memory usage. At each iteration the code stores in memory the Hamiltonian, the input density kernel, the output density kernel and the residual matrix, which accumulate up to a predefined maximum of dm_history entries each. As memory usage represents a common bottleneck in HPC applications, reducing the number of matrices stored at each iteration to the minimum can potentially enhance the performance of the DIIS code. It is important to bear in mind that other types of DIIS can be introduced in the code (such as Hamiltonian mixing or a mix of input and output kernels or residuals), so the code must be designed to cope with the minimum storage requirements regardless of the chosen method (specified by the user).

In addition to this, the enhancements to improve the convergence of the method (section 5.2) should also enhance the performance of the overall code, as they will require fewer kernels to be mixed in order to achieve convergence. Computationally speaking, the code will need to store fewer matrices in memory, which will improve the memory access and the re-use of data from cache.

Chapter 7

Conclusions

First-principles methods for quantum mechanical calculations in chemistry and solid-state physics represent a developing field of science that requires a combined effort from researchers in multiple disciplines. Simulations of materials and molecular systems offer the opportunity to investigate the properties of new compounds that are of interest to academia and industry, with the potential of opening new horizons for technology. The intrinsic complexity of such calculations requires parallel computing techniques in order to obtain accurate, valuable results on a human time scale.

In this work the theory of quantum mechanical calculations from first-principles methods using Density Functional Theory has been presented. In this context, the Direct Inversion in the Iterative Subspace (DIIS) method, in combination with Hamiltonian diagonalisation for achieving self-consistency, has been implemented in the ONETEP code for DFT calculations. This new approach mixes the quantum single-particle density matrix in the inner loop of ONETEP in order to minimise the total energy with respect to the elements of the density kernel in the NGWF representation. Effectively, this approach, as has been shown, requires a computational effort that scales with the third power of the number of atoms in the system, due to the cost of diagonalising a dense matrix at each iteration. This algorithm, although it limits the range of applicability in comparison with conventional linear-scaling approaches, is a key step towards the implementation of ensemble-DFT in the ONETEP code for simulating metallic systems efficiently. In order to reduce the impact of the diagonalisation on the overall performance, the LAPACK and ScaLAPACK eigensolvers DSYGVX and PDSYGVX have been assessed as a method for speeding up the DIIS-Hamiltonian diagonalisation calculations.

The results show that the density matrix mixing scheme works for systems of up to ~600 atoms and matrix sizes of ~2000x2000 elements. For these test cases the DIIS method has proven to give values of the energy very close to those given by the standard LNV method of ONETEP. In fact, by using the DIIS method, it has been possible to reproduce the potential well generated by the two monomers of the “Tennis ball” protein to a precision of 10^-6 Ha, which is considered to be within the range of chemical accuracy. Therefore, the DIIS implementation presented in this work is able to simulate

the physics of the range of molecular systems that are within the scope of traditional O(N^3)-scaling DFT programs.

However, for systems of hundreds of atoms, convergence of the energy minimisation can be very slow and unstable, and for larger systems it diverges. This problem has been widely documented in the literature and can be overcome by applying different numerical stabilisation methods, described in this work, such as Kerker preconditioning or level-shifting approaches.

The computational implementation of DIIS combined with Hamiltonian diagonalisation is severely limited by the diagonaliser routine. The serial LAPACK library does not take advantage of distributed-memory machines for storing large matrices on multiple nodes. As a result, the size of the systems that can be simulated with LAPACK routines is limited by the available memory per node in the target machine. Hence large systems cannot be simulated on HECToR or Iridis 3 using the LAPACK serial eigensolver due to excessive memory requirements. On the other hand, ScaLAPACK routines distribute the matrix elements over all the available nodes, which means that the memory accessible for storing matrices can be expanded simply by adding more nodes to the calculation. As a consequence, very large atomic systems (tens of thousands of atoms), represented by very big matrices, can be simulated using the DIIS and Hamiltonian diagonalisation method with the ScaLAPACK eigensolver.

The performance of the code depends greatly on the performance of the diagonaliser. The ScaLAPACK eigensolver has been designed so that all the CPUs cooperate to solve a particular problem in order to decrease the runtime of the program. The results show that the performance of the parallel eigensolver is limited by the inter-node communications, the load balance and the size of the distributed matrix on each node. In particular, the eigensolver routine is very communication-demanding and is very much affected by the performance of the interconnect. However, on both the HECToR and Iridis 3 clusters, which have high-quality network interconnects, the speed-up and efficiency of the ScaLAPACK diagonaliser are similar. On the other hand, the results show that, as expected, the diagonaliser scales with the cube of the number of atoms in the system, while the other procedures involved in the energy minimisation during the inner loop scale in a linear fashion. This, in turn, means that for large molecules of thousands of atoms the diagonaliser takes most of the runtime of the DIIS method. The break-even point at which the linear-scaling LNV approach becomes faster than the DIIS-Hamiltonian method as presented in this work is around 1000 atoms, although this measure is system-dependent and can also vary with the number of processors used for the calculation. Further optimisations of the code will necessarily involve reducing the runtime of the diagonaliser, and will have the potential of moving the break-even point closer to thousands of atoms.

Linear-scaling codes are unable to perform calculations on metals due to their small band gap. Traditional DFT codes can simulate metallic systems of a few hundred atoms with a computational cost that scales with the cube of the system size. The results show that with the DIIS algorithm it is possible to simulate systems of up to a few

thousand atoms with a computational cost that doubles the LNV execution runtime. The addition of the ensemble-DFT approach to allow fractional occupation numbers, combined with the DIIS and Hamiltonian diagonalisation approach, will make it possible to simulate metallic systems in a computationally efficient way using ONETEP.

Bibliography

[1] Attila Szabo and Neil S. Ostlund. Modern Quantum Chemistry. Introduction to Advanced Electronic Structure Theory. Dover Publications Inc., 2nd edition, 1989. [2] C.-K. Skylaris, P.D. Haynes, A.A. Mostofi, and M.C. Payne. Introducing ONETEP: Linear-scaling density functional simulations on parallel computers. J. Chem. Phys., 122(8), 2005. [3] Klaus Capelle. A bird’s-eye view of density-functional theory. Brazilian Journal of Physics, 36(4A), 2006. [4] P Pulay. Convergence acceleration of iterative sequences - the case of scf iteration. Chem. Phys. Lett., 73(2), 1980. [5] N Marzari, D Vanderbilt, and MC Payne. Ensemble density-functional theory for ab initio of metals and finite-temperature insulators. Phys. Rev. Lett., 79(7), 1997. [6] Scalapack library web page, 2010. http://www.netlib.org/scalapack/. [7] Lapack library web page, 2010. http://www.netlib.org/lapack/. [8] XP Li, RW Nunes, and D Vanderbilt. Density-matrix electronic-structure method with linear system-size scaling. Phys. Rev. B, 47(16), 1993. [9] J. Barnes and P. Hut. A hierarchical o(n-log-n) force-calculation algorithm. Na- ture, 324(6096), 1986. [10] J.P. Singh, C. Holt, T. Totsuka, A. Gupta, and J. Hennessy. Load balancing and data locality in adaptive hierarchical n-body methods - barnes-hut, fast multipole, and radiosity. J. Parallel Distrib. Comput., 27(2), 1995. [11] Bernd G. Pfrommer, Michel Cote, Steven G. Louie, and Marvin L. Cohen. Relax- ation of crystals with the quasi-newton method. J. Comput. Phys., 131(1), 1997. [12] CM Goringe, DR Bowler, and E Hernandez. Tight-binding modelling of materials. Rep. Prog. Phys., 60(12), 1997. [13] P. Echenique and J. L. Alonso. A mathematical and computational review of Hartree-Fock SCF methods in quantum chemistry. Mol. Phys., 105(23-24), 2007.

50 [14] Nathan Argaman and Guy Makov. Density functional theory – an introduction. arXiv, 1998. [15] Ove Christiansen. Coupled cluster theory with emphasis on selected new devel- opments. Theor. Chem. Acc., 116(1-3), 2006. [16] WMC Foulkes, L Mitas, RJ Needs, and G Rajagopal. Quantum Monte Carlo simulations of solids. Rev. Mod. Phys., 73(1), 2001. [17] R Car and M Parrinello. Unified approach for molecular-dynamics and density- functional theory. Phys. Rev. Lett., 55(22), 1985. [18] P. Hohenberg and W. Kohn. Inhomogeneous electron gas. Phys. Rev., 136(3B), 1964. [19] Sergio Filipe Sousa, Pedro Alexandrino Fernandes, and Maria Joao Ramos. Gen- eral performance of density functionals. J. Phys. Chem. A, 111(42), 2007. [20] W. Kohn and L. J. Sham. Self-consistent equations including exchange and corre- lation effects. Phys. Rev., 140(4A), 1965. [21] SJ Clark, MD Segall, CJ Pickard, PJ Hasnip, MJ Probert, K Refson, and MC Payne. First principles methods using CASTEP. Z. Kristallogr. Kristallgeom. Kristallphys., 220(5-6), 2005. [22] Guangyu Sun, J. Kurti, P. Rajczy, M. Kertesz, J. Hafner, and G. Kresse. Perfor- mance of the Vienna ab initio simulation package (VASP) in chemical applica- tions. THEOCHEM, 624, 2003. [23] R. McWeeny. Some recent advances in density matrix theory. Rev. Mod. Phys., 32(2), 1960. [24] E. Prodan and W. Kohn. Nearsightedness of electronic matter. Proc. Natl. Acad. Sci. USA, 102(33), 2005. [25] JM Soler, E Artacho, JD Gale, A Garcia, J Junquera, P Ordejon, and D Sanchez- Portal. The SIESTA method for ab initio order-N materials simulation. J. Phys. Condens. Matter, 14(11), 2002. [26] T. Miyazaki, D.R. Bowler, M.J. Gillan, T. Otsuka, and T. Ohno. Large-scale DFT calculations with the CONQUEST code. In AIP Conference Proceedings, volume 1148, pages 685–8, USA, 2009 2009. American Institute of Physics. Computa- tional Methods in Science and Engineering. Advances in Computational Science, 25-30 September 2008, Hersonissos, Crete, Greece. [27] C.-K. Skylaris, A.A. Mostofi, P.D. Haynes, O. Dieguez, and M.C. Payne. Nonorthogonal generalized Wannier function pseudopotential plane-wave method. Phys. Rev. B., 66(3), 2002. [28] MC Payne, MP Teter, DC Allan, TA Arias, and JD Joannopoulos. Iterative min- imization techniques for abinitio total-energy calculations: molecular-dynamics and conjugate gradients. Rev. Mod. Phys., 64(4), 1992.

51 [29] AA Mostofi, CK Skylaris, PD Haynes, and MC Payne. Total-energy calculations on a real space grid with localized functions and a plane-wave basis. Comput. Phys. Commun., 147(3), 2002. [30] PD Haynes, CK Skylaris, AA Mostofi, and MC Payne. Elimination of basis set superposition error in linear-scaling density-functional calculations with local or- bitals optimised in situ. Chem. Phys. Lett., 422(4-6), 2006. [31] P. Pulay. Ab initio calculation of force constants and equilibrium geometries in polyatomic molecules. I. Theory. Mol. Phys., 17(2), 1969. [32] P.D. Haynes, C.-K. Skylaris, A.A. Mostofi, and M.C. Payne. Density kernel opti- mization in the ONETEP code. J. Phys.: Condens. Matter, 20(29), 2008. [33] CK Gan, PD Haynes, and MC Payne. Preconditioned conjugate gradient method for the sparse generalized eigenvalue problem in electronic structure calculations. Comput. Phys. Commun., 134(1), 2001. [34] CK Skylaris, AA Mostofi, PD Haynes, CJ Pickard, and MC Payne. Accurate kinetic energy evaluation in electronic structure calculations with localized func- tions on real space grids. Comput. Phys. Commun., 140(3), 2001. [35] N. D. M. Hine, P. D. Haynes, A. A. Mostofi, C.-K. Skylaris, and M. C. Payne. Linear-scaling density-functional theory with tens of thousands of atoms: Expand- ing the scope and scale of calculations with ONETEP. Comput. Phys. Commun., 180(7), 2009. [36] CK Skylaris, PD Haynes, AA Mostofi, and MC Payne. Implementation of linear- scaling plane wave density functional theory on parallel computers. Phys. Status Solidi B-Basic Solid State Physics, 243(5), 2006. [37] Owe Axelsson. Iterative Solution Methods. Cambridge University Press, 2nd edition, 1996. [38] G. Kresse and J. Furthmüller. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B, 54(16), 1996. [39] A. Famulari, G. Calderoni, F. Moroni, M. Raimondi, and P. B. Karadakov. Ap- plication of the diis technique to improve the convergence properties of the scf-mi algorithm. J. Mol. Struct. THEOCHEM, 549(1-2), 2001. [40] D.R. Bowler, R. Choudhury, M.J. Gillan, and T. Miyazaki. Recent progress with large-scale ab initio calculations: the CONQUEST code. Phys. Status Solidi B- Basic Solid State Physics, 243(5), 2006. [41] DR Bowler and MJ Gillan. Recent progress in first principles O(N) methods. Mol. Simul., 25(3-4), 2000. [42] Lapack dgetrs. http://www.netlib.org/lapack/double/dgetrs.f. [43] Lapack dsygvx eigensolver routine. http://netlib.org/lapack/double/dsygvx.f.

[44] ScaLAPACK PDSYGVX eigensolver routine. http://www.netlib.org/scalapack/double/pdsygvx.f.
[45] Jaeyoung Choi, J. J. Dongarra, L. S. Ostrouchov, A. P. Petitet, D. W. Walker, and R. C. Whaley. Design and implementation of the ScaLAPACK LU, QR, and Cholesky factorization routines. Scientific Programming, 5(3), 1996.
[46] ScaLAPACK PDSYNGST routine. http://www.hemisphere-education.com/codes/scalapack_documentation/documentation/VisualSourceCodes/D0_pdsyngst.htm.
[47] ScaLAPACK PDSYEVX routine. http://www.netlib.org/scalapack/double/pdsyevx.f.
[48] ScaLAPACK PDTRSM routine. http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.pessl33.prog.doc/am601_ltrsm.html.
[49] E. C. Anderson and J. Dongarra. Performance of LAPACK: a portable library of numerical linear algebra routines. Proc. IEEE, 81(8), 1993.
[50] J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK: a portable linear algebra library for distributed memory computers. Design issues and performance. In J. Dongarra, K. Madsen, and J. Wasniewski, editors, Applied Parallel Computing: Computation in Physics, Chemistry and Engineering Science. Second International Workshop, PARA '95, Proceedings, pages 95–106, Berlin, Germany, 1996. Springer-Verlag.
[51] Elena Breitmoser and Andrew G. Sunderland. A performance study of the PLAPACK and ScaLAPACK eigensolvers on HPCx for the standard problem. Technical Report from the HPCx Consortium, 2003. http://www.hpcx.ac.uk/research/hpc/HPCxTR0406.pdf.
[52] A. G. Sunderland and E. Y. Breitmoser. An overview of eigensolvers for HPCx. Technical Report from the HPCx Consortium, 2003. http://www.hpcx.ac.uk/research/publications/HPCxTR0312.pdf.
[53] Joachim Hein. Improved parallel performance of SIESTA for the HPCx Phase2 system. Technical Report from the HPCx Consortium, 2004. http://www.hpcx.ac.uk/research/hpc/technical_reports/HPCxTR0410.pdf.
[54] PBLAS library web page. http://www.netlib.org/scalapack/pblas_qref.html.
[55] BLACS library web page. http://www.netlib.org/blacs/.
[56] HECToR hardware specifications. http://www.hector.ac.uk/support/documentation/userguide/hardware.php.
[57] Iridis 3 hardware specifications. http://www.southampton.ac.uk/isolutions/computing/hpc/iridis/index.html.
[58] M. D. Segall, P. J. D. Lindan, M. J. Probert, C. J. Pickard, P. J. Hasnip, S. J. Clark, and M. C. Payne. First-principles simulation: ideas, illustrations and the CASTEP code. J. Phys.: Condens. Matter, 14(11), 2002.

[59] N. Zonias, P. Lagoudakis, and C.-K. Skylaris. Large-scale first principles and tight-binding density functional theory calculations on hydrogen-passivated silicon nanorods. J. Phys.: Condens. Matter, 22(2), 2010.
[60] Joshua T. Berryman, Sheena E. Radford, and Sarah A. Harris. Thermodynamic description of polymorphism in Q- and N-rich peptide aggregates revealed by atomistic simulation. Biophys. J., 97(1), 2009.
[61] N. Branda, R. Wyler, and J. Rebek Jr. Encapsulation of methane and other small molecules in a self-assembling superstructure. Science, 263(5151), 1994.
[62] J. Phillip Bowen, Jennifer B. Sorensen, and Karl N. Kirschner. Calculating interaction energies using first principle theories: consideration of basis set superposition error and fragment relaxation. J. Chem. Educ., 84(7), 2007.
[63] P. D. Haynes and M. C. Payne. Corrected penalty-functional method for linear-scaling calculations within density-functional theory. Phys. Rev. B, 59(19), 1999.
[64] M. Kohyama. Ab initio calculations for SiC-Al interfaces: tests of electronic-minimization techniques. Modell. Simul. Mater. Sci. Eng., 4(4), 1996.
[65] G. P. Kerker. Efficient iteration scheme for self-consistent pseudopotential calculations. Phys. Rev. B, 23(6):3082–3084, 1981.
[66] A. Sawamura. An adaptive preconditioner in first-principles electronic-structure calculations. Transactions of the Japan Society for Industrial and Applied Mathematics, 18(4), 2008.
[67] Yoshinori Shiihara, Osamu Kuwazuru, and Nobuhiro Yoshikawa. Real-space Kerker method for self-consistent calculation using non-orthogonal basis functions. Modell. Simul. Mater. Sci. Eng., 16(3), 2008.
[68] H. Sellers. The C2-DIIS convergence acceleration algorithm. Int. J. Quantum Chem., 45(1), 1993.
[69] V. R. Saunders and I. H. Hillier. Level-shifting method for converging closed shell Hartree-Fock wave functions. Int. J. Quantum Chem., 7(4), 1973.
[70] I. V. Ionova and E. A. Carter. Error vector choice in direct inversion in the iterative subspace method. J. Comput. Chem., 17(16), 1996.
[71] A. D. Rabuck and G. E. Scuseria. Improving self-consistent field convergence by varying occupation numbers. J. Chem. Phys., 110(2), 1999.
[72] D. Raczkowski, A. Canning, and L. W. Wang. Thomas-Fermi charge mixing for obtaining self-consistency in density functional calculations. Phys. Rev. B, 64(12), 2001.
[73] A. H. Karp and H. P. Flatt. Measuring parallel processor performance. Commun. ACM, 33(5), 1990.
[74] ARPACK library web page. http://www.caam.rice.edu/software/ARPACK/.

[75] PARPACK library web page. http://www.caam.rice.edu/~kristyn/parpack_home.html.
