
Massively parallel quantum chemical density matrix renormalization group method

Jiří Brabec,† Jan Brandejs,†,‡ Karol Kowalski,¶ Sotiris Xantheas,¶ Örs Legeza,§ and Libor Veis∗,†

†J. Heyrovský Institute of Physical Chemistry, Academy of Sciences of the Czech Republic, v.v.i., Dolejškova 3, 18223 Prague 8, Czech Republic
‡Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
¶Pacific Northwest National Laboratory, Richland, WA 99352, USA
§Strongly Correlated Systems “Lendület” Research group, Wigner Research Centre for Physics, H-1525, Budapest, Hungary

E-mail: [email protected]

Abstract

We present, to the best of our knowledge, the first attempt to exploit the supercomputer platform for quantum chemical density matrix renormalization group (QC-DMRG) calculations. We have developed the parallel scheme based on the in-house MPI global memory library, which combines operator and symmetry sector parallelisms, and tested its performance on three different molecules, all typical candidates for QC-DMRG calculations. In case of the largest calculation, which is the nitrogenase FeMo cofactor cluster with the active space comprising 113 electrons in 76 orbitals and bond dimension equal to 6000, our parallel approach scales up to approximately 2000 CPU cores.

1 Introduction

The density matrix renormalization group (DMRG) method represents a very powerful approach originally developed for the treatment of one-dimensional systems in solid state physics.1,2 Further success of DMRG in physics motivated its application also in quantum chemistry (QC),3–8 where it has quickly developed into an advanced multireference approach capable of going well beyond the limits of standard quantum chemical methods in problems where large complete active spaces (CAS) are mandatory, and even of reaching the full configuration interaction (FCI) limit.9–14

It has been applied to various problems ranging from very accurate computations on small molecules,5,15,16 through extended (pseudo-)linear systems like polyenes, polyacenes, or graphene nanoribbons,17–23 and transition-metal compounds,10,23–28 to molecules containing heavy-element atoms which require a relativistic four-component treatment.29,30 Recently, the limits of the QC-DMRG method have been pushed by large-scale computations of challenging bio-inorganic systems.31–34 During the past few years, several post-DMRG methods capturing the missing dynamic electron correlation on top of the DMRG wave function have also been developed.35–40

Regarding the parallelization strategies for QC-DMRG, schemes for shared-memory,41 as well as distributed-memory24,42,43 architectures have been developed. Chan's distributed approach42 is based on parallelization over different terms in the Hamiltonian and assigns certain orbital indices (and the corresponding renormalized operators) to individual processors. An alternative approach of Kurashige et al.24 is based on parallelization over different symmetry sectors. Recently, the matrix-product-operator (MPO) inspired parallelization scheme employing the sum-of-operators formulation, which should result in lower inter-node communication requirements, was proposed.43

A completely different approach from those presented so far was suggested by Stoudenmire and White.44 This scheme relies on the observation that DMRG approximately preserves the reduced density matrix over regions where it does not sweep. In this approach, the lattice of orbitals is divided into several parts and sweeping on these parts is realized in parallel.

An extension of the parallelization scheme of Ref. 41 to a smart hybrid CPU-GPU implementation has also been presented, exploiting the power of both CPU and GPU while tolerating problems exceeding the GPU memory size.45 In this approach, the iterative construction of the Hamiltonian is decomposed into several independent matrix operations and each of these is further decomposed into smaller independent tasks based on symmetries; the diagonalization is thus expressed as a single list of dense matrix operations.

There exist a few excellent QC-DMRG codes12 with different functionalities, and most of them are open-source and available online. However, to the best of our knowledge, none of them is truly massively parallel, i.e. can be run advantageously on hundreds or more than a thousand CPU cores. This article is thus a first attempt to port the QC-DMRG method to a supercomputer platform. Our parallel approach is, similarly to the shared memory algorithm,41,45 based on merging of the operator and symmetry sector loops, and it employs the global memory model. It relies on a fast inter-node connection. The new C++ QC-DMRG implementation named MOLMPS1 was created based on this parallel approach.

The paper is organized as follows: in section 2.1, we give a brief overview of the QC-DMRG method. Since the MOLMPS program employs the renormalized operators rather than MPOs,2 the presentation is in the original renormalization group picture.46 Section 2.2 contains the details of our parallel scheme and the computational details of our numerical tests are presented in section 3. Section 4 summarizes the results with discussion, and section 5 closes with conclusions and outlook.

2 Theory

2.1 QC-DMRG overview

In non-relativistic electronic structure calculations, one is interested in eigenvalues and eigenvectors of the electronic Hamiltonian with the following second-quantized structure47

H_{el.} = \sum_{pq=1}^{n} \sum_{\sigma \in \{\uparrow,\downarrow\}} h_{pq} \, a^{\dagger}_{p\sigma} a_{q\sigma} + \sum_{pqrs=1}^{n} \sum_{\sigma,\sigma' \in \{\uparrow,\downarrow\}} v_{pqrs} \, a^{\dagger}_{p\sigma} a^{\dagger}_{q\sigma'} a_{r\sigma'} a_{s\sigma},   (1)

where h_pq and v_pqrs represent one- and two-electron integrals in a molecular orbital (MO) basis, which is for simplicity assumed to be restricted (e.g. restricted Hartree-Fock), σ and σ' denote spin variables, and n is the size of the MO space in which the Hamiltonian is diagonalized.

The very first step of a QC-DMRG calculation is to order the individual MOs on a one-dimensional lattice, putting mutually strongly correlated orbitals as close as possible, which may be carried out e.g. with the help of techniques developed in the field of quantum information.8,14,25,48 Then, in the course of the practical two-site QC-DMRG sweep,46 H_el. is diagonalized in a vector space which is formed as a product of four spaces: the so-called left block, left site, right site, and right block.

1 The MOLMPS code with all its functionalities will be presented in a different publication.
2 It is just a matter of taste; both formulations are equivalent in terms of efficiency.43

The sweep algorithm starts with just a single orbital in the left block, which is then enlarged in each DMRG iteration by one orbital up to the point where the right block contains only a single orbital. The right block is being enlarged afterwards, see Figure 1, and the sweeping is repeated until the energy is converged. There is in fact an analogy between the DMRG sweep algorithm and the Hartree-Fock self-consistent iterative procedure.49

Figure 1: The scheme of the DMRG sweep algorithm.

A single MO (site) may be empty, occupied by one α or β electron, or doubly occupied. The corresponding vector space is thus spanned by the four basis states {|0⟩, |↑⟩, |↓⟩, |↑↓⟩} and is complete. The matrix representations of creation operators in this basis read

a^{\dagger}_{\uparrow} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \qquad a^{\dagger}_{\downarrow} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 \end{pmatrix}.   (2)

Enlarging the left or right block with M basis states by one MO as mentioned above would, without any truncation, lead to a new 4M-dimensional vector space and, when repeated, to the curse of dimensionality. The essence of the DMRG algorithm1,2 is indeed to determine the optimal left and right block many-electron basis with bounded dimension M, the so-called bond dimension.1,2,46 When forming e.g. the left enlarged block containing p orbitals, the full vector space is spanned by {l_{p-1}} ⊗ {s}, where {l_{p-1}} denotes the basis of the left block with p − 1 orbitals and {s} the basis of the added (p-th) orbital (site). In order to keep the dimension M, the new basis must be truncated in the following way

|l_p⟩ = \sum_{l_{p-1} s} O^{L}_{l_{p-1} s, \, l_p} \, |l_{p-1}⟩ ⊗ |s⟩,   (3)

where O^L is the 4M × M left block renormalization matrix.

In case of the DMRG algorithm, the determinant representation of the complicated many-electron basis is not stored; instead, the matrix representations of second-quantized operators needed for the action of the Hamiltonian (1) on a wave function are formed and stored. For a single orbital, all the required operator matrices can be formed from the matrices in (2) by matrix transpositions, multiplications with appropriate MO integrals, and matrix-matrix multiplications.

For a block of orbitals, the situation is more complicated. Since the renormalized many-electron basis is not complete, one cannot store only matrices of creation (or annihilation) operators acting on individual orbitals of the given block and form matrices of operators corresponding to the strings of second-quantized operators appearing in (1) by their multiplications. In fact, one has to form all operator intermediates necessary for the action of the Hamiltonian (1) on a wave function.

Projecting the Schrödinger equation onto the product space of the left block, left site, right site, and right block {l} ⊗ {s_l} ⊗ {s_r} ⊗ {r}, we have the effective equation

H_{el.} \, \psi = E \, \psi,   (4)

where ψ are the expansion coefficients of the wave function, thus

|\Psi\rangle = \sum_{l s_l s_r r} \psi_{l s_l s_r r} \, |l\rangle \otimes |s_l\rangle \otimes |s_r\rangle \otimes |r\rangle.   (5)
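As a minimal, self-contained illustration of the single-site operator matrices in Eq. (2), the following C++ snippet (not part of MOLMPS; the basis ordering and phase convention of the printed matrices are simply taken as given) encodes the two 4×4 creation matrices, builds the annihilation operators as their transposes, and numerically verifies the canonical anticommutation relations {a_σ, a†_σ'} = δ_σσ'.

```cpp
#include <array>
#include <cassert>
#include <cstdio>

using Mat4 = std::array<std::array<double, 4>, 4>;

static Mat4 matmul(const Mat4 &A, const Mat4 &B) {
    Mat4 C{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            for (int j = 0; j < 4; ++j)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

static Mat4 transpose(const Mat4 &A) {
    Mat4 T{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) T[i][j] = A[j][i];
    return T;
}

// anticommutator {A, B} = AB + BA
static Mat4 anticomm(const Mat4 &A, const Mat4 &B) {
    Mat4 AB = matmul(A, B), BA = matmul(B, A), R{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) R[i][j] = AB[i][j] + BA[i][j];
    return R;
}

int main() {
    // creation operators as printed in Eq. (2)
    const Mat4 c_up   = {{{0, 0, 0, 0}, {0, 0, 0, 0}, {1, 0, 0, 0}, {0, 1, 0, 0}}};
    const Mat4 c_down = {{{0, 0, 0, 0}, {1, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, -1, 0}}};
    // annihilation operators are the (real) transposes
    const Mat4 a_up = transpose(c_up), a_down = transpose(c_down);

    const Mat4 id_up   = anticomm(a_up, c_up);      // expect identity
    const Mat4 id_down = anticomm(a_down, c_down);  // expect identity
    const Mat4 mixed   = anticomm(a_up, c_down);    // expect zero
    const Mat4 cc      = anticomm(c_up, c_down);    // expect zero
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            assert(id_up[i][j] == (i == j ? 1.0 : 0.0));
            assert(id_down[i][j] == (i == j ? 1.0 : 0.0));
            assert(mixed[i][j] == 0.0);
            assert(cc[i][j] == 0.0);
        }
    std::printf("single-site operators satisfy the anticommutation relations\n");
    return 0;
}
```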

In order to reduce the number of matrix-matrix multiplications during the action of the Hamiltonian on a wave function, which are the most CPU-demanding tasks, the efficient QC-DMRG codes work with the so-called pre-summed (or partially summed) operators,50 i.e. intermediates formed by contraction of operator matrices with MO integrals. For example, in the left block

A^{\uparrow\uparrow}_{rs} = \sum_{pq \in \text{left}} v_{pqrs} \, a^{\dagger}_{p\uparrow} a^{\dagger}_{q\uparrow}, \qquad rs \notin \text{left}.   (6)

A^{↑↑}_{rs} are examples of the left block two-index pre-summed operators which, together with a_{r↑} a_{s↑} acting on the two sites or the right block (plus Hermitian conjugate terms), contribute to the (↑↑↑↑)-part of the two-electron Hamiltonian interaction term

H^{\uparrow\uparrow\uparrow\uparrow}_{\text{int}} \ni \sum_{\substack{pq \in \text{left} \\ rs \notin \text{left}}} v_{pqrs} \, a^{\dagger}_{p\uparrow} a^{\dagger}_{q\uparrow} a_{r\uparrow} a_{s\uparrow} = \sum_{rs \notin \text{left}} A^{\uparrow\uparrow}_{rs} \, a_{r\uparrow} a_{s\uparrow}.   (7)

Notice that the four-index summation has been replaced by the two-index one.3 When employing the partial summations, all the operators that build up the Hamiltonian are at most two-index (normal or pre-summed).3

3 Also the two-index pre-summed operators are formed in such a way that the contraction with MO integrals is performed for the larger block, keeping the remaining two sums as short as possible.24,43

The full Hamiltonian matrix (4) is not formed; instead, the tensor product structure of the vector space is employed. For example, let us assume that we have the above mentioned contributing term, where A^{↑↑}_{rs} in the left block is accompanied by a_{r↑} a_{s↑} in the right one, i.e. A^{↑↑}_{rs} ⊗ I ⊗ I ⊗ a_{r↑} a_{s↑}, I being the identity

\langle l'| \otimes \langle s'_l| \otimes \langle s'_r| \otimes \langle r'| \; A^{\uparrow\uparrow}_{rs} \otimes I \otimes I \otimes a_{r\uparrow} a_{s\uparrow} \; |l\rangle \otimes |s_l\rangle \otimes |s_r\rangle \otimes |r\rangle = \langle l'| A^{\uparrow\uparrow}_{rs} |l\rangle \, \langle r'| a_{r\uparrow} a_{s\uparrow} |r\rangle \, \delta_{s_l s'_l} \delta_{s_r s'_r}.   (8)

The action of this Hamiltonian term on a trial wave function vector (φ_{l s_1 s_2 r}), needed for iterative diagonalization solvers like the Davidson algorithm,51 can therefore be compiled only from the knowledge of the composing operator matrices in the basis of the individual blocks (left, right, or sites).

To complete the overview of the QC-DMRG algorithm, it remains to define the renormalization matrix (3). As argued by White,1,2 it is optimal to build the renormalization matrix O^L (or O^R) from the M eigenvectors of the left (or right) enlarged block reduced density matrix with the largest eigenvalues. When the wave function expansion coefficients ψ_{l s_l s_r r} (5) are reshaped into the matrix form ψ_{(l s_l),(s_r r)}, the aforementioned reduced density matrices can be computed in the following way

\rho^{L} = \psi \psi^{\dagger},   (9)
\rho^{R} = \psi^{\dagger} \psi.   (10)

For the transition to the next iteration, all operator matrices formed for the enlarged block, e.g. in the |l_{p-1}⟩ ⊗ |s⟩ basis for the forward sweep (3), have to be renormalized

A' = (O^{L})^{\dagger} A \, O^{L},   (11)

where A represents an operator matrix in the non-truncated (4M-dimensional) basis and A' is the renormalized matrix representation in the truncated (M-dimensional) basis.
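For each stored sector matrix, the renormalization of Eq. (11) reduces to the contraction A' = (O^L)† A O^L (for real matrices the adjoint is the transpose). The sketch below uses plain loops instead of the BLAS calls employed in MOLMPS; the matrix layout and function names are illustrative only.

```cpp
#include <cstddef>
#include <vector>

// Operator renormalization of Eq. (11): A' = (O^L)^T A O^L for real matrices.
// `big` = 4M (non-truncated enlarged basis), `m` = M (kept states); row-major storage.
using Matrix = std::vector<double>;

Matrix renormalize(const Matrix &A, const Matrix &O, std::size_t big, std::size_t m) {
    Matrix tmp(big * m, 0.0);                       // tmp = A * O   (big x m)
    for (std::size_t i = 0; i < big; ++i)
        for (std::size_t k = 0; k < big; ++k) {
            const double a = A[i * big + k];
            if (a == 0.0) continue;                 // skip zero entries of the block
            for (std::size_t j = 0; j < m; ++j)
                tmp[i * m + j] += a * O[k * m + j];
        }
    Matrix Ap(m * m, 0.0);                          // A' = O^T * tmp   (m x m)
    for (std::size_t k = 0; k < big; ++k)
        for (std::size_t i = 0; i < m; ++i) {
            const double o = O[k * m + i];
            if (o == 0.0) continue;
            for (std::size_t j = 0; j < m; ++j)
                Ap[i * m + j] += o * tmp[k * m + j];
        }
    return Ap;
}

int main() {
    const std::size_t big = 4, m = 2;
    Matrix A(big * big, 0.0), O(big * m, 0.0);
    for (std::size_t i = 0; i < big; ++i) A[i * big + i] = double(i + 1);  // diag(1,2,3,4)
    O[0 * m + 0] = 1.0;                             // O keeps the first two basis states
    O[1 * m + 1] = 1.0;
    Matrix Ap = renormalize(A, O, big, m);
    return (Ap[0] == 1.0 && Ap[3] == 2.0) ? 0 : 1;  // A' = diag(1,2)
}
```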

Another ingredient of the efficient QC-DMRG code is a proper handling of quantum symmetries.52,53 Currently, the MOLMPS code employs U(1) symmetry;4 however, the SU(2) (spin-adapted) version54–56 is under development and it will not affect the parallel scheme presented in the next subsection.

4 Point group symmetry may be employed as well; it is however useful only in case of small symmetric molecules. Otherwise, localized or split-localized MO bases, which break the point group symmetry, are typical in QC-DMRG calculations.23 Localized or split-localized MO bases were also used in the presented numerical examples.

We employ U(1) symmetries to restrict the total numbers of α and β electrons (or equivalently the spin projection M_S). As usual, all left and right block basis states, as well as the site basis states, are grouped into symmetry sectors sharing the number of α (n↑) and β (n↓) electrons. Only the non-zero blocks of operator matrices are stored, in the form of dense matrices, together with the necessary information about the symmetry sectors.

When expanding the wave function (5) in the symmetry-sector decomposed form of the left block, left site, right site, and right block basis, only those sectors whose n↑ and n↓ sum up to the correct total numbers (n↑^tot., n↓^tot.) contribute. Moreover, in case of the non-relativistic QC-DMRG method, for which n↑ and n↓ are good quantum numbers,5 all the symmetry sectors of a single site are one-dimensional14 and we can write

|\Psi\rangle = \sum_{abcd}{}' \sum_{l \in a} \sum_{r \in d} \psi^{abcd}_{lr} \, |l\rangle \otimes |s^{b}_{l}\rangle \otimes |s^{c}_{r}\rangle \otimes |r\rangle,   (12)

where a, b, c, d denote indices of the left block, left site, right site, and right block symmetry sectors and the primed summation symbol stands for the restricted summation for which it holds that

n_{\uparrow}(a) + n_{\uparrow}(b) + n_{\uparrow}(c) + n_{\uparrow}(d) = n_{\uparrow}^{\text{tot.}},   (13)
n_{\downarrow}(a) + n_{\downarrow}(b) + n_{\downarrow}(c) + n_{\downarrow}(d) = n_{\downarrow}^{\text{tot.}}.   (14)

|s^b_l⟩ and |s^c_r⟩ in (12) denote the b-th symmetry sector left site basis state and the c-th symmetry sector right site basis state.

5 This is not the case of the four-component relativistic QC-DMRG, where only the total number of electrons is a good quantum number.29,30

In summary, one QC-DMRG iteration is composed of three main steps, whose parallelization is discussed in detail in the next subsection, namely: (a) formation of pre-summed operators, (b) Hamiltonian diagonalization, and (c) operator renormalization. The overall cost of the QC-DMRG computation is O(M²n⁴) + O(M³n³), where the first term corresponds to the formation of pre-summed operators whereas the second one to the Hamiltonian diagonalization and renormalization.3

Last but not least, we would like to briefly mention the connection to the matrix product state (MPS) wave function form.57 In fact, the MPS matrices are nothing but reshaped renormalization matrices (3), and they are easily obtainable from the DMRG sweep algorithm.43 They can be used e.g. for efficient calculations of correlation functions or of a subset of the FCI expansion coefficients, which may be employed for the purposes of the tailored coupled cluster methods.39,58–62 If we reshape O^L_{l_{p-1}s, l_p} from (3) to L^{s}_{l_{p-1}, l_p} (i.e. a 4M × M matrix into an M × M × 4 tensor), and similarly for the right block matrix, then starting with the wave function expansion in the renormalized basis (5), recursive application of (3) leads to

|\Psi_{\text{MPS}}\rangle = \sum_{\{s\}} L^{s_1} L^{s_2} \cdots \psi^{s_i s_{i+1}} \cdots R^{s_n} \, |s_1 s_2 \ldots s_n\rangle,   (15)

which is the two-site MPS form of the DMRG wave function.

2.2 Parallel scheme

Before discussing the parallel scheme, let us first briefly describe how the data, in particular the operators, are stored in MOLMPS. As is usual, we employ the sparsity generated by the quantum symmetries mentioned in the previous subsection. The operator class contains information about the individual symmetry sectors and the corresponding dense matrices.
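To make this storage layout concrete, the following hypothetical C++ sketch mimics a symmetry-sector-blocked operator container: each non-zero block is a dense (rows × cols × n_orb) array keyed by the U(1)×U(1) quantum numbers of its bra and ket sectors. All class and member names are illustrative; this is not the actual MOLMPS operator class.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct SectorLabel {                    // U(1) x U(1) quantum numbers of a sector
    int n_alpha;                        // number of alpha electrons
    int n_beta;                         // number of beta electrons
    bool operator<(const SectorLabel &o) const {
        return n_alpha != o.n_alpha ? n_alpha < o.n_alpha : n_beta < o.n_beta;
    }
};

struct DenseBlock {                     // one non-zero block <bra sector| op |ket sector>
    std::size_t rows = 0, cols = 0, n_orb = 1;  // third "leg" = orbital index (pair)
    std::vector<double> data;           // rows * cols * n_orb values, dense storage
};

class BlockOperator {
public:
    using Key = std::pair<SectorLabel, SectorLabel>;  // (bra sector, ket sector)
    DenseBlock &block(const SectorLabel &bra, const SectorLabel &ket) {
        return blocks_[{bra, ket}];     // creates an empty block on first access
    }
    bool has_block(const SectorLabel &bra, const SectorLabel &ket) const {
        return blocks_.count({bra, ket}) != 0;
    }
private:
    std::map<Key, DenseBlock> blocks_;  // only non-zero sector pairs are stored
};

int main() {
    BlockOperator c_up;                              // e.g. a block a^+_up operator
    DenseBlock &b = c_up.block({1, 0}, {0, 0});      // couples (n_a,n_b)=(0,0) -> (1,0)
    b.rows = 1; b.cols = 1; b.n_orb = 1; b.data = {1.0};
    return c_up.has_block({1, 0}, {0, 0}) ? 0 : 1;
}
```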

For these dense entities, we have developed our own lightweight tensor library, which can work with up to three-legged dense tensors and serves as a wrapper to BLAS and LAPACK. In case of the block operators with one or two orbital indices (normal or pre-summed), the third index of the dense tensor corresponds to the orbital index / pair of indices. The dense tensors of the individual symmetry sectors are stored in the vector container of the C++ standard library, as is depicted in Figure 2.

Figure 2: The operator class storage demonstrated on the example of the block operator a†↑: symmetry-sector decomposed storage by means of std::vector versus the full matrix form (white blocks are zero).

Our parallel approach is based on our own MPI global memory (GM) library. It relies on a fast inter-node connection (e.g. InfiniBand, Omni-Path, Fibre Channel), which is common for all modern supercomputer architectures. The data distribution and handling are managed by a GM class. The GM class instance envelops a group of tensors which are distributed under the same conditions, e.g. the operators of a given (left or right) block, and carries all the information about the distribution.

In order to minimize the amount of data stored in memory, we employed the MPI shared memory (SHM) model (introduced with MPI version 3), so only one copy of each tensor/matrix is stored per node. All processes on a given node can access these data directly without using a remote access. Remote processes can access these data using inter-node communicators by RMA calls.

For distribution, the GM supports two models. The first one, which is suitable for the smallest arrays, is the local data model, where the selected arrays are available locally on all nodes. This significantly reduces communication for the reasonable price of slightly higher memory requirements. The user can in fact specify a threshold for the array size below which this model is employed. This model is by default used, for example, for the Krylov vectors during the Davidson diagonalization.51

The second option is the global model. It is suitable for large arrays and the data are evenly distributed among nodes (in the MPI SHM regime). The distribution is performed over all available nodes in such a way that a balanced load on the nodes is ensured. This model is typically used for the dense tensors of the individual sectors of the left and right block operators in case of larger calculations (active space sizes and bond dimensions). When the tensors fit into the memory of a single node, the above mentioned local memory approach is certainly more advantageous (see the results section). Indeed, any intermediate of the DMRG calculation may be treated in a different data model, purely based on the amount of free computer memory, just to maximize data locality.

The already mentioned in-house dense tensor library provides a templated tensor class for simple tensor handling in combination with the GM class. The tensor class consists of a pointer to the data array and descriptors, which involve the dimensions and also a pointer to the GM class. While the descriptors are available locally for each process, the data arrays are handled by the GM class. This design substantially simplifies the code structure and data handling.

When performing any operation with dense tensors, the get_data() call recognizes whether the data are stored locally or remotely and then either returns the data pointer directly or fetches the data from the remote location first. We also introduced pre-fetching of tensors (or parts of them) needed for a calculation where the same sectors are involved multiple times. fetch_data() or fetch_slices() fetch the data from the remote location and assign them temporarily to a corresponding tensor class. When the data are no longer needed, the allocated memory is freed.
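For readers unfamiliar with the MPI-3 shared-memory mechanism mentioned above, the following minimal, generic sketch (plain MPI, not the in-house GM library, whose interface is not shown here) allocates one shared array per node via MPI_Win_allocate_shared, so that all ranks of a node address the same physical copy of the data.

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    // Group the ranks that share a node into one communicator.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    // Rank 0 of every node allocates the shared segment; the other ranks contribute size 0.
    const MPI_Aint n = 1 << 20;                                   // doubles in the segment
    MPI_Aint my_size = (node_rank == 0) ? n * (MPI_Aint)sizeof(double) : 0;
    double *base = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared(my_size, sizeof(double), MPI_INFO_NULL, node_comm, &base, &win);

    // Every rank obtains a direct pointer to rank 0's segment: one copy per node.
    MPI_Aint qsize; int disp; double *shared = nullptr;
    MPI_Win_shared_query(win, 0, &qsize, &disp, &shared);

    MPI_Win_lock_all(0, win);
    if (node_rank == 0)
        for (MPI_Aint i = 0; i < n; ++i) shared[i] = 1.0;         // written once per node
    MPI_Win_sync(win);                                            // make the stores visible
    MPI_Barrier(node_comm);
    MPI_Win_sync(win);
    std::printf("node-local rank %d reads shared[0] = %.1f\n", node_rank, shared[0]);
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```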

Regarding the sources of parallelism, we combine operator and symmetry sector parallelisms, similarly to the simpler shared memory approach,41,45 in order to generate a large enough number of tasks (dense matrix-matrix operations) which can be executed in parallel. All three main steps are task-based parallelized.

2.2.1 Hamiltonian diagonalization

In case of the iterative Hamiltonian diagonalization (4) by means of the Davidson51 or similar algorithms, the Hamiltonian is applied sequentially on a trial wave function vector. This action is composed of a large number of operator combinations

|\tilde{\Phi}\rangle = H_{el.} |\Phi\rangle = \sum_{\alpha} A^{(\alpha)}_{l} \otimes A^{(\alpha)}_{s_l} \otimes A^{(\alpha)}_{s_r} \otimes A^{(\alpha)}_{r} \, |\Phi\rangle,   (16)

where α denotes a given operator combination and A_l, A_{s_l}, A_{s_r}, and A_r represent the left block, left site, right site, and right block operators, respectively. One such operator combination is e.g. the term A^{↑↑}_{rs} ⊗ I ⊗ I ⊗ a_{r↑} a_{s↑}, which was mentioned earlier.

When taking into account the symmetry sectors (12), the following holds for the expansion coefficients of the resulting vector |Φ̃⟩

\tilde{\varphi}^{a'b'c'd'}_{l'r'} = \sum_{\alpha} \sum_{abcd} \sum_{\substack{l \in a \\ r \in d}} \left( A^{(\alpha)\, a \to a'}_{l} \right)_{l'l} A^{(\alpha)\, b \to b'}_{s_l} A^{(\alpha)\, c \to c'}_{s_r} \left( A^{(\alpha)\, d \to d'}_{r} \right)_{r'r} \varphi^{abcd}_{lr}, \qquad l' \in a', \; r' \in d'.   (17)

The superscript of the type a → a' labels the symmetry sectors of a given operator to which the operator matrix elements belong. If we formally gather all symmetry sector indices into s (and s'), merge the elements of the site operators (scalars) into a multiplication factor f(s), and reshape the wave function expansion coefficients into a matrix form, we can write

\tilde{\varphi}^{s'} = \sum_{\alpha s} f(s) \cdot A^{(\alpha,s,s')}_{l} \cdot \varphi^{s} \cdot \left( A^{(\alpha,s,s')}_{r} \right)^{T}.   (18)

The action of the Hamiltonian on a trial wave function vector thus comprises a huge number of dense matrix-matrix multiplications.

In our approach, before the Davidson algorithm is started, a huge task list which combines the loops over operator combinations (α) and symmetry sectors (s, s') is generated. Since different terms from this task list may write to the same wave function output sector (s'), each MPI process has its own copy of φ̃ and we use the reduce function.

In case of the GM model, the individual operator combinations acting on both (left and right) blocks at the same time do not involve orbital indices. So, for example, instead of the aforementioned term A^{↑↑}_{rs} ⊗ I ⊗ I ⊗ a_{r↑} a_{s↑}, we have A^{↑↑} ⊗ I ⊗ I ⊗ a_↑ a_↑ and the loop over rs is performed sequentially during the task execution. It is organized this way to avoid fetching of small memory chunks and with the view of a further GPU acceleration in the future [parallel execution of matrix-matrix multiplications (18) performed on different slices of the same dense tensors].45

In order to exploit data locality to the maximum, we have developed a semi-dynamic scheduler. It considers where the data for individual tasks are stored and also involves independent counters specific for each node. As a result, a group of tasks is assigned to the node where their execution will cause the minimum amount of communication. The groups of tasks are executed locally on a given node with a dynamical task distribution among the local processes. In case of tasks involving tensors from both (left and right) blocks, the execution node is selected based on the storage location of the larger of the two tensors, to minimize the amount of data being fetched.
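A single entry of the task list described above essentially carries the operands of one term of Eq. (18). The sketch below uses naive row-major loops standing in for the dgemm calls used in practice; the Task layout is illustrative and is not the MOLMPS data structure. It shows the elementary kernel σ^{s'} += f(s) · A_l · φ^{s} · A_r^T.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;   // dense, row-major

struct Task {
    const Matrix *A_l;  std::size_t lrows, lcols;   // A_l : (l' x l)
    const Matrix *A_r;  std::size_t rrows, rcols;   // A_r : (r' x r)
    const Matrix *phi;                              // phi : (l  x r)
    Matrix *sigma;                                  // out : (l' x r'), accumulated
    double f;                                       // merged site-operator factor f(s)
};

void execute(const Task &t) {
    // tmp = A_l * phi            (l' x r)
    Matrix tmp(t.lrows * t.rcols, 0.0);
    for (std::size_t i = 0; i < t.lrows; ++i)
        for (std::size_t k = 0; k < t.lcols; ++k) {
            const double a = (*t.A_l)[i * t.lcols + k];
            for (std::size_t j = 0; j < t.rcols; ++j)
                tmp[i * t.rcols + j] += a * (*t.phi)[k * t.rcols + j];
        }
    // sigma += f * tmp * A_r^T   (l' x r')
    for (std::size_t i = 0; i < t.lrows; ++i)
        for (std::size_t j = 0; j < t.rrows; ++j) {
            double acc = 0.0;
            for (std::size_t k = 0; k < t.rcols; ++k)
                acc += tmp[i * t.rcols + k] * (*t.A_r)[j * t.rcols + k];
            (*t.sigma)[i * t.rrows + j] += t.f * acc;
        }
}

int main() {
    // 2x2 toy example: A_l = I, A_r = I, f = 2  =>  sigma = 2 * phi.
    Matrix I2 = {1, 0, 0, 1}, phi = {1, 2, 3, 4}, sigma(4, 0.0);
    Task t{&I2, 2, 2, &I2, 2, 2, &phi, &sigma, 2.0};
    execute(t);
    return (sigma[3] == 8.0) ? 0 : 1;
}
```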

2.2.2 Operators renormalization

Also in case of the renormalization, we generate a huge task list which combines the loops over the different operators to be renormalized and their symmetry sectors. These tasks are completely independent and can be executed in parallel. During a given task execution, the complete sector matrix of the newly formed operator corresponding to the non-truncated enlarged basis is formed according to the blocking tables6 and renormalized by the sector matrices of the renormalization operator (11).

6 They contain the information about how the individual operators are combined during blocking, i.e. when the block is enlarged by a site.

In case of the GM model, we again employ the semi-dynamic scheduler, which now considers the locality of the newly formed operator sector matrices. The block operator sector matrices needed for the blocking have to be fetched, which naturally requires much more communication than the Hamiltonian diagonalization.42 In order to decrease the communication, we do not work with individual slices of the dense tensors of the newly formed operators (corresponding to different orbital indices), but rather group them into larger chunks. We also fetch the whole dense tensors of the block operators and order the tasks so that they may be re-used for subsequent tasks. This comes at the cost of slightly higher memory requirements.

In case of the local memory model, the situation is much simpler, since there is no need for fetching of the tensors of the block operators (everything is available on all nodes). The difference, however, is that we have to update all nodes with the newly formed sector matrices. Because in the local memory model the dense tensors of the newly formed operators are stored in consecutive arrays, this can be done efficiently by means of the accumulate function.

2.2.3 Operators pre-summation

Since we fully employ the tensor product structure of the vector space of the two blocks and two sites, we also form additional pre-summed operators on-the-fly, while preparing the action of the Hamiltonian on a trial wave function. These operators are formed in order to minimize the number of matrix-matrix multiplications during the aforementioned Hamiltonian diagonalization and they are not renormalized and stored.

As an example, let us consider the term

\sum v_{pqrs} \, a^{\dagger}_{p\uparrow} \otimes a^{\dagger}_{q\uparrow} \otimes I \otimes a_{r\uparrow} a_{s\uparrow},   (19)

where p belongs to the left block, q to the left site, and r with s to the right block. If we form the following operators in the left block

A^{\text{tmp}}_{rs} = \sum_{\substack{p \in \text{left} \\ q = \text{left site} \\ rs \in \text{right}}} v_{pqrs} \, a^{\dagger}_{p\uparrow},   (20)

we can rewrite Eq. (19) as

\sum_{rs \in \text{right}} A^{\text{tmp}}_{rs} \otimes a^{\dagger}_{q\uparrow} \otimes I \otimes a_{r\uparrow} a_{s\uparrow}.   (21)

In our approach, we first generate the list of all operators to be formed on-the-fly and then, in line with the previous subsections, generate a huge task list combining all the pre-summed operators and their symmetry sectors.

Neither the local memory model nor the GM model requires network communication here. This is because in the GM model, we form the symmetry sector matrices of the pre-summed operators on the same nodes where the given symmetry sector tensors of the original operators, whose slices are multiplied by MO integrals and summed, are stored. In case of the local memory model, all nodes have to be updated with the newly formed sector matrices, which is done in the same way as in case of the renormalization.
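As an illustration of the pre-summation of Eq. (20), the following sketch (illustrative names; MOLMPS performs the analogous contraction on its three-legged dense tensors with BLAS) contracts the orbital leg of a block-operator tensor with the corresponding MO integrals to produce one dense pre-summed sector matrix.

```cpp
#include <cstddef>
#include <vector>

// Pre-summation in the spirit of Eq. (20): contract the orbital leg of a three-legged
// block-operator tensor (rows x cols x n_orb) with MO integrals v_{pqrs} at fixed q, r, s.
using Tensor3 = std::vector<double>;   // layout: data[(i*cols + j)*n_orb + p]

std::vector<double> presum(const Tensor3 &creat_up,        // a^+_{p,up} for all p in the block
                           std::size_t rows, std::size_t cols, std::size_t n_orb,
                           const std::vector<double> &v_p) // integrals v_{pqrs} for fixed q,r,s
{
    std::vector<double> A_tmp(rows * cols, 0.0);           // one dense pre-summed sector matrix
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j) {
            double acc = 0.0;
            for (std::size_t p = 0; p < n_orb; ++p)        // contraction over the orbital leg
                acc += v_p[p] * creat_up[(i * cols + j) * n_orb + p];
            A_tmp[i * cols + j] = acc;
        }
    return A_tmp;
}

int main() {
    // toy block with one 2x2 sector and two orbitals
    Tensor3 c = {/*i=0,j=0*/ 1.0, 0.5, /*i=0,j=1*/ 0.0, 0.0,
                 /*i=1,j=0*/ 0.0, 0.0, /*i=1,j=1*/ 2.0, 1.0};
    std::vector<double> v = {1.0, 2.0};
    std::vector<double> A = presum(c, 2, 2, 2, v);
    return (A[0] == 1.0 * 1.0 + 2.0 * 0.5) ? 0 : 1;        // 2.0
}
```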
it was demonstrated by Li Manni et al.63,64 when the block is enlarged by a site. that apart from 3d, 4d, and 4s orbitals of

8 (a) Fe(II)-porphyrin model (b) Defected π-conjugated anthracene tetramer (c) FeMoco cluster

Figure 3: Structures of the molecules for which the parallel QC-DMRG scaling has been studied. Notice that the FeMoco cluster is for clarity not complete, only the atoms of the ligands which are directly bonded to Fe and Mo atoms are displayed. Atom colors: nitrogen - blue, sulphur - yellow, oxygen - red, carbon - brown, hydrogen - white, iron - grey, molybdenum - green. the Fe center and σ(Fe-N) orbital, inclusion on-surface synthesis of ethynylene-bridged an- of all π orbitals from the porphyrin ring into thracene polymers.65 Depending on the defect, the active space is necessary for a quantita- such species may exhibit peculiar electronic tive determination of its ground state, leading structure properties. For the present study, we to CAS(32,34). In our recent DMRG-DLPNO- have employed the UB3LYP optimized geome- TCCSD(T) study,62 we have optimized this try and built the active space from ROHF/cc- CAS orbitals for the lowest triplet and quin- PVDZ C-atom pz orbitals, which corresponds tet states and different geometries by means to CAS(63,63). The active space orbitals were of the state specific DMRG-CASSCF method fully localized.23 in TZVP basis. In the present study, we The last system is the nitrogenase FeMo co- have tested the parallel QC-DMRG scaling factor (FeMoco) cluster (Figure 3c, notice that on the above mentioned triplet state DMRG- for clarity reasons the structure is not complete, CASSCF(32,34)/TZVP orbitals optimized at only the atoms of the ligands which are directly the geometry used in Li Manni et al. works.63,64 bonded to Fe and Mo atoms are displayed), The triplet state was chosen as it is more cor- which is undoubtedly one of the most chal- related than quintet62,63 and the active space lenging problems of the current computational orbitals were split-localized.23 chemistry. Its importance is proved by the fact The second system selected for the scaling that FeMoco is responsible for the nitrogen re- tests is the defected π-conjugated anthracene duction during the process of nitrogen fixation tetramer (Figure 3b). It is a representative of under ambient conditions in certain types of π-conjugated hydrocarbons (linear or quasi lin- bacteria.66 In contrast, the industrial Haber- ear), a group of molecules frequently studied by Bosch process to produce ammonia (mainly for means of QC-DMRG calculations in the C-atom fertilizers) is very energetically demanding. In 17–19,21,23 pz active space. We have recently stud- fact, the electronic structure of the FeMo co- ied the ground state of the above mentioned factor remains poorly understood.33,34 Reiher et defected anthracene tetramer with the QC- al. proposed the model of FeMoco with the ac- DMRG method since this and similar species tive space containing 54 electrons in 54 orbitals often appear as unwanted by-products during in the context of simulations on quantum com-

9 puters.67 Recently, it was shown on a different DMRG calculation. We present timings only model that the larger active space, in partic- for the three main parts discussed in the text, ular CAS(113, 76) is necessary for the correct namely Hamiltonian diagonalization via David- open-shell nature of its ground state.33 For our son procedure, operator pre-summation, and benchmark tests, we have employed the inte- renormalization. Other parts including e.g. the gral file provided with the later paper, which is formation and diagonalization of the reduced available online.68 All the computational details density matrix or broadcastings are marginal can be found in Ref.33 for the presented cases. In all three cases, we have employed the We have tested the performance of the local Fiedler method25 to order the active space or- memory model on the example of the smallest bitals on a one-dimensional lattice. The order- system [Fe(II)-porphyrin model, M = 2048]. ing optimization was iterated about four times As can be seen in Figure 4a, the Davidson by means of the QC-DMRG calculations with algorithm scales almost ideally up to approx. increasing bond dimensions varying from M = 500 CPU cores and still shows a good perfor- 256 up to M = 1024, which were followed by mance up to approx. 1500 CPU cores. The the calculations of the single-site entropies and tiny bump at 48 CPU cores is caused by the mutual information necessary for the Fiedler fact that despite no communication is needed method and the warm-up procedure.8,14,25 At during the execution of individual tasks, the fi- least three sweeps with the final orbital order- nal result scattered in chunks among nodes has ing and actual bond dimensions were performed to be gathered (by means of the reduce func- before measuring the individual timings in the tion) after each Davidson step, which requires middle of the sweep. All the QC-DMRG calcu- a small amount of communication. lations were initialized with the CI-DEAS pro- The dashed line in Figure 4a around 512 CPU cedure8,14 and they were performed with the cores corresponds to the same treatment of op- MOLMPS program. erator combinations acting on both blocks (left and right) as in case of the GM model, i.e. performing the loop over orbital indices inside 4 Results and discussion tasks. One can see that such a treatment is not suitable for the local memory model where all The timings of the individual parts of one QC- data are easily accessible locally. DMRG iteration for all tested systems corre- The pre-summation of operators and renor- sponding to the middle of the sweep, which is malization in Figure 4b also show almost per- the most time consuming, are summarized in fect scaling. Figures 4, 5, 6, 7, and 8. In the present study, On the example of the Fe(II)-porphyrin model we were interested solely in the scaling char- with M = 4096, we demonstrate the transition acterictics of our parallel scheme, we therefore from the local to the GM model. This case still do not present any energies (they would be out fits into the memory of a single node, however, of context anyway). Nevertheless, the detailed we have employed the GM approach in order to study of the Fe(II)-porphyrin model has already see the effect of communication. For this and been submitted62 and chemistry-oriented pa- further cases, we do not present scalings for pre- pers about the remaining two systems should summations since they scale almost perfectly also appear soon. (no need for communication). 
All calculations were performed on the Sa- The effect of communication on scaling of lomon supercomputer of the Czech national su- the Davidson algorithm (Figure 5a) is apparent percomputing center in Ostrava with the follow- when going from a single node (24 CPU cores) ing hardware: 24 cores per node (2 x Intel Xeon to 48 CPU cores. The scaling is a lot worse E5-2680v3, 2.5 GHz), 128 GB RAM per node than in case of the local memory model, which and InfiniBand FDR56 interconnect. We have is definitely not surprising, but a reasonable im- used up to 2480 CPU cores for a single QC- provement can be seen up to aprrox. 1500 cores.

Figure 4: Timings of the individual parts of one QC-DMRG iteration corresponding to the middle of the sweep performed on the Fe(II)-porphyrin model [CAS(32,34)] with bond dimension M = 2048: (a) Davidson procedure, (b) pre-summation and renormalization (both panels: time [s] versus CPU cores).

Figure 5: Timings of the Davidson procedure (a) and the renormalization (b) of the QC-DMRG iteration corresponding to the middle of the sweep performed on the Fe(II)-porphyrin model [CAS(32,34)] with bond dimension M = 4096 (time [s] versus CPU cores).

The situation is worse for the renormalization (Figure 5b), which scales badly in this case. However, an important point to stress is that it is much faster than the Davidson algorithm itself (75 s vs. 20 s for 2496 CPU cores).

The Fe(II)-porphyrin model with M = 8192 in Figure 6 is a memory-demanding example which requires at least 4 nodes (of 128 GB). The scaling of the Davidson algorithm (Figure 6a), as well as of the renormalization (Figure 6b), is indeed similar to the M = 4096 case, despite the fact that the tasks required significantly more intensive communication.

The defected π-conjugated anthracene tetramer with M = 4096 (Figure 7) and the FeMoco cluster with M = 6000 (Figure 8)7 represent the most challenging problems, which require a larger number of nodes and also a large amount of communication. Our parallel approach scales up to approx. 2000 CPU cores and, in case of the FeMoco cluster, there is still a non-negligible improvement (10%) up to approx. 2500 CPU cores. In all the tested cases, the renormalization is adequately faster than the Davidson algorithm.

7 In case of the Davidson algorithm, the block which was not going to be enlarged corresponded to M = 4000. This does not affect the renormalization though.

11 8192 512

4096 256

2048 Time [s] Time [s] 128 1024

512 64 64 128 256 512 1024 2048 4096 64 128 256 512 1024 2048 4096 CPU cores CPU cores (a) Davidson procedure (b) Renormalization

Figure 6: Timings of the Davidson procedure and the renormalization of the QC-DMRG iteration corresponding to the middle of the sweep performed on the Fe(II)-porphyrin model [CAS(32,34)] with bond dimension M = 8192.

Figure 7: Timings of the Davidson procedure and the renormalization of the QC-DMRG iteration corresponding to the middle of the sweep performed on the defected π-conjugated anthracene tetramer [CAS(63,63)] with bond dimension M = 4096 (time [s] for 768, 1536, and 2016 CPU cores).

Figure 8: Timings of the Davidson procedure and the renormalization of the QC-DMRG iteration corresponding to the middle of the sweep performed on the FeMoco cluster [CAS(113,76)] with bond dimension M = 6000 (time [s] for 768, 2080, and 2480 CPU cores).

The performance analysis of the largest calculations mentioned above still shows signs of inter-node imbalance, which leaves room for further improvement and will be the subject of future work.

5 Conclusions

In this paper, we have presented the first attempt (to the best of our knowledge) to exploit the supercomputer platform for QC-DMRG calculations. We have developed the parallel scheme based on the MPI global memory library, which combines operator and symmetry sector parallelisms. We have tested its performance on three different molecules with active spaces ranging from 34 up to 76 orbitals and various bond dimensions. For smaller computations (smaller active spaces and bond dimensions), we have achieved almost perfect scaling. For larger calculations, which did not fit into the memory of a single node, we have achieved worse, but

still reasonable scaling up to about 2000 CPU cores. Our largest calculation corresponds to the FeMoco cluster [CAS(113,76)] with bond dimension M = 6000 on 2480 CPU cores. We believe that further acceleration is possible once the problem of the inter-node imbalance is solved in the future. As was also discussed in the text, another possible source of speed-up can be GPU units.

In summary, we have shown that the most challenging problems of current electronic structure theory may be calculated by means of the QC-DMRG method on a supercomputer in a fraction of the time of a few-node calculation.

Acknowledgment

We would like to thank Pavel Jelínek for providing us with the DFT optimized geometry of the defected π-conjugated anthracene tetramer. This work has been supported by the Czech Science Foundation (grant no. 18-18940Y), the Center for Scalable and Predictive methods for Excitation and Correlated phenomena (SPEC), which is funded by the U.S. Department of Energy (DOE), Office of Science, Office of Basic Energy Sciences, the Division of Chemical Sciences, Geosciences, and Biosciences, the Hungarian National Research, Development and Innovation Office (grant no. K120569), and the Hungarian Quantum Technology National Excellence Program (project no. 2017-1.2.1-NKP-2017-00001). All the computations were carried out on the Salomon supercomputer in Ostrava; we would therefore like to acknowledge the support by the Czech Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project “IT4Innovations National Supercomputing Center - LM2015070”.

References

(1) White, S. R. Density matrix formulation for quantum renormalization groups. Physical Review Letters 1992, 69, 2863–2866.
(2) White, S. R. Density-matrix algorithms for quantum renormalization groups. Physical Review B 1993, 48, 10345–10356.
(3) White, S. R.; Martin, R. L. Ab initio quantum chemistry using the density matrix renormalization group. The Journal of Chemical Physics 1999, 110, 4127–4130.
(4) Chan, G. K.-L.; Head-Gordon, M. Highly correlated calculations with a polynomial cost algorithm: A study of the density matrix renormalization group. The Journal of Chemical Physics 2002, 116, 4462–4476.
(5) Chan, G. K.-L.; Head-Gordon, M. Exact solution (within a triple-zeta, double polarization basis set) of the electronic Schrödinger equation for water. The Journal of Chemical Physics 2003, 118, 8551–8554.
(6) Legeza, Ö.; Röder, J.; Hess, B. A. Controlling the accuracy of the density-matrix renormalization-group method: The dynamical block state selection approach. Physical Review B 2003, 67, 125114.
(7) Legeza, Ö.; Röder, J.; Hess, B. A. QC-DMRG study of the ionic-neutral curve crossing of LiF. Molecular Physics 2003, 101, 2019–2028.
(8) Legeza, Ö.; Sólyom, J. Optimizing the density-matrix renormalization group method using quantum information entropy. Physical Review B 2003, 68, 195116.
(9) Legeza, Ö.; Noack, R.; Sólyom, J.; Tincani, L. In Computational Many-Particle Physics; Fehske, H., Schneider, R., Weisse, A., Eds.; Lecture Notes in Physics; Springer Berlin Heidelberg, 2008; Vol. 739; pp 653–664.
(10) Marti, K. H.; Reiher, M. The Density Matrix Renormalization Group Algorithm in Quantum Chemistry. Zeitschrift für Physikalische Chemie 2010, 224, 583–599.

(11) Chan, G. K.-L.; Sharma, S. The Density Matrix Renormalization Group in Quantum Chemistry. Annual Reviews of Physical Chemistry 2011, 62, 465–481.
(12) Wouters, S.; Van Neck, D. The density matrix renormalization group for ab initio quantum chemistry. The European Physical Journal D 2014, 68.
(13) Yanai, T.; Kurashige, Y.; Mizukami, W.; Chalupský, J.; Lan, T. N.; Saitow, M. Density matrix renormalization group for ab initio calculations and associated dynamic correlation methods: A review of theory and applications. International Journal of Quantum Chemistry 2014, 115, 283–299.
(14) Szalay, Sz.; Pfeffer, M.; Murg, V.; Barcza, G.; Verstraete, F.; Schneider, R.; Legeza, Ö. Tensor product methods and entanglement optimization for ab initio quantum chemistry. International Journal of Quantum Chemistry 2015, 115, 1342–1391.
(15) Chan, G. K.-L.; Kállay, M.; Gauss, J. State-of-the-art density matrix renormalization group and coupled cluster theory studies of the nitrogen binding curve. The Journal of Chemical Physics 2004, 121, 6110–6116.
(16) Sharma, S.; Yanai, T.; Booth, G. H.; Umrigar, C. J.; Chan, G. K.-L. Spectroscopic accuracy directly from quantum chemistry: Application to ground and excited states of beryllium dimer. The Journal of Chemical Physics 2014, 140, 104112.
(17) Hachmann, J.; Dorando, J. J.; Avilés, M.; Chan, G. K.-L. The radical character of the acenes: A density matrix renormalization group study. The Journal of Chemical Physics 2007, 127, 134309.
(18) Ghosh, D.; Hachmann, J.; Yanai, T.; Chan, G. K.-L. Orbital optimization in the density matrix renormalization group, with applications to polyenes and β-carotene. The Journal of Chemical Physics 2008, 128, 144117.
(19) Mizukami, W.; Kurashige, Y.; Yanai, T. More π Electrons Make a Difference: Emergence of Many Radicals on Graphene Nanoribbons Studied by Ab Initio DMRG Theory. Journal of Chemical Theory and Computation 2013, 9, 401–407.
(20) Barcza, G.; Barford, W.; Gebhard, F.; Legeza, Ö. Excited states in polydiacetylene chains: A density matrix renormalization group study. Physical Review B 2013, 87, 245116.
(21) Hu, W.; Chan, G. K.-L. Excited-State Geometry Optimization with the Density Matrix Renormalization Group, as Applied to Polyenes. Journal of Chemical Theory and Computation 2015, 11, 3000–3009.
(22) Timár, M.; Barcza, G.; Gebhard, F.; Veis, L.; Legeza, Ö. Hückel-Hubbard-Ohno modeling of π-bonds in ethene and ethyne with application to trans-polyacetylene. Physical Chemistry Chemical Physics 2016, 18, 18835–18845.
(23) Olivares-Amaya, R.; Hu, W.; Nakatani, N.; Sharma, S.; Yang, J.; Chan, G. K.-L. The ab-initio density matrix renormalization group in practice. The Journal of Chemical Physics 2015, 142, 034102.
(24) Kurashige, Y.; Yanai, T. High-performance ab initio density matrix renormalization group method: Applicability to large-scale multireference problems for metal compounds. The Journal of Chemical Physics 2009, 130, 234114.
(25) Barcza, G.; Legeza, Ö.; Marti, K. H.; Reiher, M. Quantum-information analysis of electronic states of different molecular structures. Physical Review A 2011, 83, 012508.

(26) Boguslawski, K.; Marti, K. H.; Legeza, Ö.; Reiher, M. Accurate ab Initio Spin Densities. Journal of Chemical Theory and Computation 2012, 8, 1970–1982.
(27) Wouters, S.; Bogaerts, T.; Van Der Voort, P.; Van Speybroeck, V.; Van Neck, D. Communication: DMRG-SCF study of the singlet, triplet, and quintet states of oxo-Mn(Salen). The Journal of Chemical Physics 2014, 140, 241103.
(28) Nachtigallová, D.; Antalík, A.; Lo, R.; Sedlák, R.; Manna, D.; Tuček, J.; Ugolotti, J.; Veis, L.; Legeza, Ö.; Pittner, J.; Zbořil, R.; Hobza, P. An Isolated Molecule of Iron(II) Phthalocyanin Exhibits Quintet Ground-State: A Nexus between Theory and Experiment. Chemistry - A European Journal 2018, 24, 13413–13417.
(29) Knecht, S.; Legeza, Ö.; Reiher, M. Communication: Four-component density matrix renormalization group. The Journal of Chemical Physics 2014, 140, 041101.
(30) Battaglia, S.; Keller, S.; Knecht, S. Efficient Relativistic Density-Matrix Renormalization Group Implementation in a Matrix-Product Formulation. Journal of Chemical Theory and Computation 2018, 14, 2353–2369.
(31) Kurashige, Y.; Chan, G. K.-L.; Yanai, T. Entangled quantum electronic wavefunctions of the Mn4CaO5 cluster in photosystem II. Nature Chemistry 2013, 5, 660–666.
(32) Sharma, S.; Sivalingam, K.; Neese, F.; Chan, G. K.-L. Low-energy spectrum of iron–sulfur clusters directly from many-particle quantum mechanics. Nature Chemistry 2014, 6, 927.
(33) Li, Z.; Li, J.; Dattani, N. S.; Umrigar, C. J.; Chan, G. K.-L. The electronic complexity of the ground-state of the FeMo cofactor of nitrogenase as relevant to quantum simulations. The Journal of Chemical Physics 2019, 150, 024302.
(34) Li, Z.; Guo, S.; Sun, Q.; Chan, G. K.-L. Electronic landscape of the P-cluster of nitrogenase as revealed through many-electron quantum wavefunction simulations. Nature Chemistry 2019, 11, 1026–1033.
(35) Kurashige, Y.; Yanai, T. Second-order perturbation theory with a density matrix renormalization group self-consistent field reference function: Theory and application to the study of chromium dimer. The Journal of Chemical Physics 2011, 135, 094104.
(36) Saitow, M.; Kurashige, Y.; Yanai, T. Multireference configuration interaction theory using cumulant reconstruction with internal contraction of density matrix renormalization group wave function. The Journal of Chemical Physics 2013, 139, 044118.
(37) Neuscamman, E.; Yanai, T.; Chan, G. K.-L. A review of canonical transformation theory. International Reviews in Physical Chemistry 2010, 29, 231–271.
(38) Sharma, S.; Chan, G. A flexible multi-reference perturbation theory by minimizing the Hylleraas functional with matrix product states. The Journal of Chemical Physics 2014, 141, 111101.
(39) Veis, L.; Antalík, A.; Brabec, J.; Neese, F.; Legeza, Ö.; Pittner, J. Coupled Cluster Method with Single and Double Excitations Tailored by Matrix Product State Wave Functions. The Journal of Physical Chemistry Letters 2016, 7, 4072–4078.
(40) Freitag, L.; Knecht, S.; Angeli, C.; Reiher, M. Multireference Perturbation Theory with Cholesky Decomposition for the Density Matrix Renormalization Group. Journal of Chemical Theory and Computation 2017, 13, 451–459.

(41) Hager, G.; Jeckelmann, E.; Fehske, H.; Wellein, G. Parallelization strategies for density matrix renormalization group algorithms on shared-memory systems. Journal of Computational Physics 2004, 194, 795–808.
(42) Chan, G. K.-L. An algorithm for large scale density matrix renormalization group calculations. The Journal of Chemical Physics 2004, 120, 3172–3178.
(43) Chan, G. K.-L.; Keselman, A.; Nakatani, N.; Li, Z.; White, S. R. Matrix product operators, matrix product states, and ab initio density matrix renormalization group algorithms. The Journal of Chemical Physics 2016, 145, 014102.
(44) Stoudenmire, E. M.; White, S. R. Real-space parallel density matrix renormalization group. Physical Review B 2013, 87, 155137.
(45) Nemes, C.; Barcza, G.; Nagy, Z.; Legeza, Ö.; Szolgay, P. The density matrix renormalization group algorithm on kilo-processor architectures: Implementation and trade-offs. Computer Physics Communications 2014, 185, 1570–1581.
(46) Schollwöck, U. The density-matrix renormalization group. Reviews of Modern Physics 2005, 77, 259–315.
(47) Szabo, A.; Ostlund, N. Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory; Dover Publications, 1996.
(48) Rissler, J.; Noack, R. M.; White, S. R. Measuring orbital interaction using quantum information theory. Chemical Physics 2006, 323, 519–531.
(49) Chan, G. K.-L. Density matrix renormalisation group Lagrangians. Physical Chemistry Chemical Physics 2008, 10, 3454–3459.
(50) Xiang, T. Density-matrix renormalization-group method in momentum space. Physical Review B 1996, 53, R10445–R10448.
(51) Davidson, E. R. The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices. Journal of Computational Physics 1975, 17, 87–94.
(52) McCulloch, I. P.; Gulácsi, M. Density Matrix Renormalisation Group Method and Symmetries of the Hamiltonian. Australian Journal of Physics 2000, 53, 597–612.
(53) Tóth, A. I.; Moca, C. P.; Legeza, Ö.; Zaránd, G. Density matrix numerical renormalization group for non-Abelian symmetries. Physical Review B 2008, 78, 245109.
(54) Sharma, S.; Chan, G. K.-L. Spin-adapted density matrix renormalization group algorithms for quantum chemistry. The Journal of Chemical Physics 2012, 136, 124121.
(55) Wouters, S.; Poelmans, W.; Ayers, P. W.; Neck, D. V. CheMPS2: A free open-source spin-adapted implementation of the density matrix renormalization group for ab initio quantum chemistry. Computer Physics Communications 2014, 185, 1501–1514.
(56) Keller, S.; Reiher, M. Spin-adapted Matrix Product States and Operators. The Journal of Chemical Physics 2016, 144, 134101.
(57) Schollwöck, U. The density-matrix renormalization group in the age of matrix product states. Annals of Physics 2011, 326, 96–192, January 2011 Special Issue.
(58) Veis, L.; Antalík, A.; Brabec, J.; Neese, F.; Legeza, Ö.; Pittner, J. Correction to Coupled Cluster Method with Single and Double Excitations Tailored by Matrix Product State Wave Functions. The Journal of Physical Chemistry Letters 2016, 8, 291–291.
(59) Veis, L.; Antalík, A.; Legeza, Ö.; Alavi, A.; Pittner, J. The Intricate Case of Tetramethyleneethane: A Full Configuration Interaction Quantum Monte Carlo Benchmark and Multireference Coupled Cluster Studies. Journal of Chemical Theory and Computation 2018, 14, 2439–2445.
(60) Faulstich, F. M.; Máté, M.; Laestadius, A.; Csirik, M. A.; Veis, L.; Antalík, A.; Brabec, J.; Schneider, R.; Pittner, J.; Kvaal, S.; Legeza, Ö. Numerical and Theoretical Aspects of the DMRG-TCC Method Exemplified by the Nitrogen Dimer. Journal of Chemical Theory and Computation 2019, 15, 2206–2220.
(61) Antalík, A.; Veis, L.; Brabec, J.; Demel, O.; Legeza, Ö.; Pittner, J. Toward the efficient local tailored coupled cluster approximation and the peculiar case of oxo-Mn(Salen). The Journal of Chemical Physics 2019, 151, 084112.
(62) Antalík, A.; Nachtigalová, D.; Lo, R.; Matoušek, M.; Lang, J.; Legeza, Ö.; Pittner, J.; Hobza, P.; Veis, L. Ground state of the Fe(II)-porphyrin model system corresponds to the quintet state: DFT, DMRG-TCCSD and DMRG-TCCSD(T) study. Submitted.
(63) Manni, G. L.; Alavi, A. Understanding the Mechanism Stabilizing Intermediate Spin States in Fe(II)-Porphyrin. The Journal of Physical Chemistry A 2018, 122, 4935–4947.
(64) Manni, G. L.; Kats, D.; Tew, D. P.; Alavi, A. Role of Valence and Semicore Electron Correlation on Spin Gaps in Fe(II)-Porphyrins. Journal of Chemical Theory and Computation 2019, 15, 1492–1497.
(65) Sánchez-Grande, A.; de la Torre, B.; Santos, J.; Cirera, B.; Lauwaet, K.; Chutora, T.; Edalatmanesh, S.; Mutombo, P.; Rosen, J.; Zbořil, R.; Miranda, R.; Björk, J.; Jelínek, P.; Martín, N.; Écija, D. On-Surface Synthesis of Ethynylene-Bridged Anthracene Polymers. Angewandte Chemie 2019, 131, 6631–6635.
(66) Howard, J. B.; Rees, D. C. Structural Basis of Biological Nitrogen Fixation. Chemical Reviews 1996, 96, 2965–2982.
(67) Reiher, M.; Wiebe, N.; Svore, K. M.; Wecker, D.; Troyer, M. Elucidating reaction mechanisms on quantum computers. Proceedings of the National Academy of Sciences 2017, 114, 7555–7560.
(68) FeMoco active space Hamiltonian. https://github.com/zhendongli2008/Active-space-model-for-FeMoco, Accessed: 2020-01-08.
