
Massively parallel quantum chemical density matrix renormalization group method

Jiří Brabec,† Jan Brandejs,†,‡ Karol Kowalski,¶ Sotiris Xantheas,¶ Örs Legeza,§ and Libor Veis∗,†

†J. Heyrovský Institute of Physical Chemistry, Academy of Sciences of the Czech Republic, v.v.i., Dolejškova 3, 18223 Prague 8, Czech Republic
‡Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
¶Pacific Northwest National Laboratory, Richland, WA 99352, USA
§Strongly Correlated Systems “Lendület” Research group, Wigner Research Centre for Physics, H-1525, Budapest, Hungary

E-mail: [email protected]

Abstract

We present, to the best of our knowledge, the first attempt to exploit the supercomputer platform for quantum chemical density matrix renormalization group (QC-DMRG) calculations. We have developed the parallel scheme based on the in-house MPI global memory library, which combines operator and symmetry sector parallelisms, and tested its performance on three different molecules, all typical candidates for QC-DMRG calculations. In case of the largest calculation, which is the nitrogenase FeMo cofactor cluster with the active space comprising 113 electrons in 76 orbitals and bond dimension equal to 6000, our parallel approach scales up to approximately 2000 CPU cores.

1 Introduction

The density matrix renormalization group (DMRG) method represents a very powerful approach originally developed for the treatment of one-dimensional systems in solid state physics.1,2 Further success of DMRG in physics motivated its application also in quantum chemistry (QC),3–8 where it has quickly developed into an advanced multireference approach capable of going well beyond the limits of standard quantum chemical methods in problems where large complete active spaces (CAS) are mandatory, and even of reaching the full configuration interaction (FCI) limit.9–14

It has been applied to various problems ranging from very accurate computations on small molecules,5,15,16 through extended (pseudo-)linear systems like polyenes, polyacenes, or graphene nanoribbons,17–23 and transition-metal compounds,10,23–28 to molecules containing heavy-element atoms which require a relativistic four-component treatment.29,30 Recently, the limits of the QC-DMRG method have been pushed by large-scale computations of challenging bio-inorganic systems.31–34 During the past few years, several post-DMRG methods capturing the missing dynamic electron correlation on top of the DMRG wave function have also been developed.35–40

Regarding the parallelization strategies for QC-DMRG, schemes for shared-memory,41 as well as distributed-memory24,42,43 architectures have been developed. Chan's distributed approach42 is based on parallelization over different terms in the Hamiltonian and assigns certain orbital indices (and the corresponding renormalized operators) to individual processors. An alternative approach of Kurashige et al.24 is based on parallelization over different symmetry sectors. Recently, the matrix-product-operator (MPO) inspired parallelization scheme employing the sum-of-operators formulation, which should result in lower inter-node communication requirements, was proposed.43

A completely different approach from those presented so far was suggested by Stoudenmire and White.44 This scheme relies on the observation that DMRG approximately preserves the reduced density matrix over regions where it does not sweep. In this approach, the lattice of orbitals is divided into several parts and sweeping on these parts is realized in parallel.

An extension of the parallelization scheme of Ref. 41 to a smart hybrid CPU-GPU implementation has also been presented, exploiting the power of both CPU and GPU while tolerating problems exceeding the GPU memory size.45 In this approach, the iterative construction of the Hamiltonian is decomposed into several independent matrix operations and each of these is further decomposed into smaller independent tasks based on symmetries; the diagonalization is thus expressed as a single list of dense matrix operations.

There exist a few excellent QC-DMRG codes12 with different functionalities, and most of them are open-source and available online. However, to the best of our knowledge, none of them is truly massively parallel, i.e. can be run advantageously on hundreds or more than a thousand CPU cores. This article is thus a first attempt to port the QC-DMRG method to a supercomputer platform. Our parallel approach is, similarly to the shared memory algorithm,41,45 based on merging of the operator and symmetry sector loops, and it employs the global memory model. It relies on a fast inter-node connection. The new C++ QC-DMRG implementation named MOLMPS1 was created based on this parallel approach.

The paper is organized as follows: in section 2.1, we give a brief overview of the QC-DMRG method. Since the MOLMPS program employs the renormalized operators rather than MPOs,2 the presentation is in the original renormalization group picture.46 Section 2.2 contains the details of our parallel scheme and the computational details of our numerical tests are presented in section 3. Section 4 summarizes the results with discussion, and section 5 closes with conclusions and outlook.

2 Theory

2.1 QC-DMRG overview

In non-relativistic electronic structure calculations, one is interested in eigenvalues and eigenvectors of the electronic Hamiltonian with the following second-quantized structure47

H_{el.} = \sum_{pq=1}^{n} \sum_{\sigma \in \{\uparrow,\downarrow\}} h_{pq} \, a^{\dagger}_{p\sigma} a_{q\sigma} + \sum_{pqrs=1}^{n} \sum_{\sigma,\sigma' \in \{\uparrow,\downarrow\}} v_{pqrs} \, a^{\dagger}_{p\sigma} a^{\dagger}_{q\sigma'} a_{r\sigma'} a_{s\sigma},   (1)

where h_pq and v_pqrs represent one- and two-electron integrals in a molecular orbital (MO) basis, which is for simplicity assumed to be restricted (e.g. restricted Hartree-Fock), σ and σ' denote spin variables, and n is the size of the MO space in which the Hamiltonian is diagonalized.

The very first step of a QC-DMRG calculation is to order the individual MOs on a one-dimensional lattice, putting mutually strongly correlated orbitals as close as possible, which may be carried out e.g. with the help of techniques developed in the field of quantum information.8,14,25,48 Then, in the course of the practical two-site QC-DMRG sweep,46 H_el. is diagonalized in a vector space which is formed as a product of four spaces: the so-called left block, left site, right site, and right block.

1 The MOLMPS code with all its functionalities will be presented in a different publication.
2 It is just a matter of taste; both formulations are equivalent in terms of efficiency.43

The sweep algorithm starts with just a single orbital in the left block, which is then enlarged in each DMRG iteration by one orbital up to the point where the right block contains only a single orbital. The right block is being enlarged afterwards, see Figure 1, and the sweeping is repeated until the energy is converged. There is in fact an analogy between the DMRG sweep algorithm and the Hartree-Fock self-consistent iterative procedure.49

Figure 1: The scheme of the DMRG sweep algorithm.

A single MO (site) may be empty, occupied by one α or β electron, or doubly occupied. The corresponding vector space is thus spanned by the four basis states {|0⟩, |↑⟩, |↓⟩, |↑↓⟩} and is complete. The matrix representations of creation operators in this basis read

a^{\dagger}_{\uparrow} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \qquad a^{\dagger}_{\downarrow} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 \end{pmatrix}.   (2)

Enlarging the left or right block with M basis states by one MO as mentioned above would, without any truncation, lead to a new 4M-dimensional vector space and, when repeated, to the curse of dimensionality. The essence of the DMRG algorithm1,2 is indeed to determine the optimal left and right block many-electron basis with bounded dimension M, the so-called bond dimension.1,2,46 When forming e.g. the left enlarged block containing p orbitals, the full vector space is spanned by {l_{p-1}} ⊗ {s}, where {l_{p-1}} denotes the basis of the left block with p − 1 orbitals and {s} the basis of the added (p-th) orbital (site). In order to keep the dimension M, the new basis must be truncated in the following way

|l_p⟩ = \sum_{l_{p-1} s} O^{L}_{l_{p-1} s, \, l_p} \, |l_{p-1}⟩ ⊗ |s⟩,   (3)

where O^L is the 4M × M left block renormalization matrix.

In case of the DMRG algorithm, the determinant representation of the complicated many-electron basis is not stored; instead, the matrix representations of second-quantized operators needed for the action of the Hamiltonian (1) on a wave function are formed and stored. For a single orbital, all the required operator matrices can be formed from the matrices in (2) by matrix transpositions, multiplications with appropriate MO integrals, and matrix-matrix multiplications.

For a block of orbitals, the situation is more complicated. Since the renormalized many-electron basis is not complete, one cannot store only matrices of creation (or annihilation) operators acting on individual orbitals of the given block and form matrices of operators corresponding to the strings of second-quantized operators appearing in (1) by their multiplications. In fact, one has to form all operator intermediates necessary for the action of the Hamiltonian (1) on a wave function.

Projecting the Schrödinger equation onto the product space of the left block, left site, right site, and right block {l} ⊗ {s_l} ⊗ {s_r} ⊗ {r}, we have the effective equation

H_{el.} \, \psi = E \, \psi,   (4)

where ψ are the expansion coefficients of the wave function, thus

|\Psi\rangle = \sum_{l s_l s_r r} \psi_{l s_l s_r r} \, |l\rangle \otimes |s_l\rangle \otimes |s_r\rangle \otimes |r\rangle.   (5)
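As a minimal, self-contained illustration of the single-site operator matrices in Eq. (2), the following C++ snippet (not part of MOLMPS; the basis ordering and phase convention of the printed matrices are simply taken as given) encodes the two 4×4 creation matrices, builds the annihilation operators as their transposes, and numerically verifies the canonical anticommutation relations {a_σ, a†_σ'} = δ_σσ'.

```cpp
#include <array>
#include <cassert>
#include <cstdio>

using Mat4 = std::array<std::array<double, 4>, 4>;

static Mat4 matmul(const Mat4 &A, const Mat4 &B) {
    Mat4 C{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            for (int j = 0; j < 4; ++j)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

static Mat4 transpose(const Mat4 &A) {
    Mat4 T{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) T[i][j] = A[j][i];
    return T;
}

// anticommutator {A, B} = AB + BA
static Mat4 anticomm(const Mat4 &A, const Mat4 &B) {
    Mat4 AB = matmul(A, B), BA = matmul(B, A), R{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) R[i][j] = AB[i][j] + BA[i][j];
    return R;
}

int main() {
    // creation operators as printed in Eq. (2)
    const Mat4 c_up   = {{{0, 0, 0, 0}, {0, 0, 0, 0}, {1, 0, 0, 0}, {0, 1, 0, 0}}};
    const Mat4 c_down = {{{0, 0, 0, 0}, {1, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, -1, 0}}};
    // annihilation operators are the (real) transposes
    const Mat4 a_up = transpose(c_up), a_down = transpose(c_down);

    const Mat4 id_up   = anticomm(a_up, c_up);      // expect identity
    const Mat4 id_down = anticomm(a_down, c_down);  // expect identity
    const Mat4 mixed   = anticomm(a_up, c_down);    // expect zero
    const Mat4 cc      = anticomm(c_up, c_down);    // expect zero
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            assert(id_up[i][j] == (i == j ? 1.0 : 0.0));
            assert(id_down[i][j] == (i == j ? 1.0 : 0.0));
            assert(mixed[i][j] == 0.0);
            assert(cc[i][j] == 0.0);
        }
    std::printf("single-site operators satisfy the anticommutation relations\n");
    return 0;
}
```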

In order to reduce the number of matrix-matrix multiplications during the action of the Hamiltonian on a wave function, which are the most CPU-demanding tasks, the efficient QC-DMRG codes work with the so-called pre-summed (or partially summed) operators,50 i.e. intermediates formed by contraction of operator matrices with MO integrals. For example, in the left block

A^{\uparrow\uparrow}_{rs} = \sum_{pq \in \text{left}} v_{pqrs} \, a^{\dagger}_{p\uparrow} a^{\dagger}_{q\uparrow}, \qquad rs \notin \text{left}.   (6)

A^{↑↑}_{rs} are examples of the left block two-index pre-summed operators which, together with a_{r↑} a_{s↑} acting on the two sites or the right block (plus Hermitian conjugate terms), contribute to the (↑↑↑↑)-part of the two-electron Hamiltonian interaction term

H^{\uparrow\uparrow\uparrow\uparrow}_{\text{int}} \ni \sum_{\substack{pq \in \text{left} \\ rs \notin \text{left}}} v_{pqrs} \, a^{\dagger}_{p\uparrow} a^{\dagger}_{q\uparrow} a_{r\uparrow} a_{s\uparrow} = \sum_{rs \notin \text{left}} A^{\uparrow\uparrow}_{rs} \, a_{r\uparrow} a_{s\uparrow}.   (7)

Notice that the four-index summation has been replaced by the two-index one.3 When employing the partial summations, all the operators that build up the Hamiltonian are at most two-index (normal or pre-summed).3

3 Also the two-index pre-summed operators are formed in such a way that the contraction with MO integrals is performed for the larger block, keeping the remaining two sums as short as possible.24,43

The full Hamiltonian matrix (4) is not formed; instead, the tensor product structure of the vector space is employed. For example, let us assume that we have the above mentioned contributing term, where A^{↑↑}_{rs} in the left block is accompanied by a_{r↑} a_{s↑} in the right one, i.e. A^{↑↑}_{rs} ⊗ I ⊗ I ⊗ a_{r↑} a_{s↑}, I being the identity

\langle l'| \otimes \langle s'_l| \otimes \langle s'_r| \otimes \langle r'| \; A^{\uparrow\uparrow}_{rs} \otimes I \otimes I \otimes a_{r\uparrow} a_{s\uparrow} \; |l\rangle \otimes |s_l\rangle \otimes |s_r\rangle \otimes |r\rangle = \langle l'| A^{\uparrow\uparrow}_{rs} |l\rangle \, \langle r'| a_{r\uparrow} a_{s\uparrow} |r\rangle \, \delta_{s_l s'_l} \delta_{s_r s'_r}.   (8)

The action of this Hamiltonian term on a trial wave function vector (φ_{l s_1 s_2 r}), needed for iterative diagonalization solvers like the Davidson algorithm,51 can therefore be compiled only from the knowledge of the composing operator matrices in the basis of the individual blocks (left, right, or sites).

To complete the overview of the QC-DMRG algorithm, it remains to define the renormalization matrix (3). As argued by White,1,2 it is optimal to build the renormalization matrix O^L (or O^R) from the M eigenvectors of the left (or right) enlarged block reduced density matrix with the largest eigenvalues. When the wave function expansion coefficients ψ_{l s_l s_r r} (5) are reshaped into the matrix form ψ_{(l s_l),(s_r r)}, the aforementioned reduced density matrices can be computed in the following way

\rho^{L} = \psi \psi^{\dagger},   (9)
\rho^{R} = \psi^{\dagger} \psi.   (10)

For the transition to the next iteration, all operator matrices formed for the enlarged block, e.g. in the |l_{p-1}⟩ ⊗ |s⟩ basis for the forward sweep (3), have to be renormalized

A' = (O^{L})^{\dagger} A \, O^{L},   (11)

where A represents an operator matrix in the non-truncated (4M-dimensional) basis and A' is the renormalized matrix representation in the truncated (M-dimensional) basis.
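For each stored sector matrix, the renormalization of Eq. (11) reduces to the contraction A' = (O^L)† A O^L (for real matrices the adjoint is the transpose). The sketch below uses plain loops instead of the BLAS calls employed in MOLMPS; the matrix layout and function names are illustrative only.

```cpp
#include <cstddef>
#include <vector>

// Operator renormalization of Eq. (11): A' = (O^L)^T A O^L for real matrices.
// `big` = 4M (non-truncated enlarged basis), `m` = M (kept states); row-major storage.
using Matrix = std::vector<double>;

Matrix renormalize(const Matrix &A, const Matrix &O, std::size_t big, std::size_t m) {
    Matrix tmp(big * m, 0.0);                       // tmp = A * O   (big x m)
    for (std::size_t i = 0; i < big; ++i)
        for (std::size_t k = 0; k < big; ++k) {
            const double a = A[i * big + k];
            if (a == 0.0) continue;                 // skip zero entries of the block
            for (std::size_t j = 0; j < m; ++j)
                tmp[i * m + j] += a * O[k * m + j];
        }
    Matrix Ap(m * m, 0.0);                          // A' = O^T * tmp   (m x m)
    for (std::size_t k = 0; k < big; ++k)
        for (std::size_t i = 0; i < m; ++i) {
            const double o = O[k * m + i];
            if (o == 0.0) continue;
            for (std::size_t j = 0; j < m; ++j)
                Ap[i * m + j] += o * tmp[k * m + j];
        }
    return Ap;
}

int main() {
    const std::size_t big = 4, m = 2;
    Matrix A(big * big, 0.0), O(big * m, 0.0);
    for (std::size_t i = 0; i < big; ++i) A[i * big + i] = double(i + 1);  // diag(1,2,3,4)
    O[0 * m + 0] = 1.0;                             // O keeps the first two basis states
    O[1 * m + 1] = 1.0;
    Matrix Ap = renormalize(A, O, big, m);
    return (Ap[0] == 1.0 && Ap[3] == 2.0) ? 0 : 1;  // A' = diag(1,2)
}
```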

Another ingredient of the efficient QC-DMRG code is a proper handling of quantum symmetries.52,53 Currently, the MOLMPS code employs U(1) symmetry;4 however, the SU(2) (spin-adapted) version54–56 is under development and it will not affect the parallel scheme presented in the next subsection.

4 Point group symmetry may be employed as well; it is however useful only in case of small symmetric molecules. Otherwise, localized or split-localized MO bases, which break the point group symmetry, are typical in QC-DMRG calculations.23 Localized or split-localized MO bases were also used in the presented numerical examples.

We employ U(1) symmetries to restrict the total numbers of α and β electrons (or equivalently the spin projection M_S). As usual, all left and right block basis states, as well as the site basis states, are grouped into symmetry sectors sharing the number of α (n↑) and β (n↓) electrons. Only the non-zero blocks of operator matrices are stored, in the form of dense matrices, together with the necessary information about the symmetry sectors.

When expanding the wave function (5) in the symmetry-sector decomposed form of the left block, left site, right site, and right block basis, only those sectors whose n↑ and n↓ sum up to the correct total numbers (n↑^tot., n↓^tot.) contribute. Moreover, in case of the non-relativistic QC-DMRG method, for which n↑ and n↓ are good quantum numbers,5 all the symmetry sectors of a single site are one-dimensional14 and we can write

|\Psi\rangle = \sum_{abcd}{}' \sum_{l \in a} \sum_{r \in d} \psi^{abcd}_{lr} \, |l\rangle \otimes |s^{b}_{l}\rangle \otimes |s^{c}_{r}\rangle \otimes |r\rangle,   (12)

where a, b, c, d denote indices of the left block, left site, right site, and right block symmetry sectors and the primed summation symbol stands for the restricted summation for which it holds that

n_{\uparrow}(a) + n_{\uparrow}(b) + n_{\uparrow}(c) + n_{\uparrow}(d) = n_{\uparrow}^{\text{tot.}},   (13)
n_{\downarrow}(a) + n_{\downarrow}(b) + n_{\downarrow}(c) + n_{\downarrow}(d) = n_{\downarrow}^{\text{tot.}}.   (14)

|s^b_l⟩ and |s^c_r⟩ in (12) denote the b-th symmetry sector left site basis state and the c-th symmetry sector right site basis state.

5 This is not the case of the four-component relativistic QC-DMRG, where only the total number of electrons is a good quantum number.29,30

In summary, one QC-DMRG iteration is composed of three main steps, whose parallelization is discussed in detail in the next subsection, namely: (a) formation of pre-summed operators, (b) Hamiltonian diagonalization, and (c) operator renormalization. The overall cost of the QC-DMRG computation is O(M²n⁴) + O(M³n³), where the first term corresponds to the formation of pre-summed operators whereas the second one to the Hamiltonian diagonalization and renormalization.3

Last but not least, we would like to briefly mention the connection to the matrix product state (MPS) wave function form.57 In fact, the MPS matrices are nothing but reshaped renormalization matrices (3), and they are easily obtainable from the DMRG sweep algorithm.43 They can be used e.g. for efficient calculations of correlation functions or of a subset of the FCI expansion coefficients, which may be employed for the purposes of the tailored coupled cluster methods.39,58–62 If we reshape O^L_{l_{p-1}s, l_p} from (3) to L^{s}_{l_{p-1}, l_p} (i.e. a 4M × M matrix into an M × M × 4 tensor), and similarly for the right block matrix, then starting with the wave function expansion in the renormalized basis (5), recursive application of (3) leads to

|\Psi_{\text{MPS}}\rangle = \sum_{\{s\}} L^{s_1} L^{s_2} \cdots \psi^{s_i s_{i+1}} \cdots R^{s_n} \, |s_1 s_2 \ldots s_n\rangle,   (15)

which is the two-site MPS form of the DMRG wave function.

2.2 Parallel scheme

Before discussing the parallel scheme, let us first briefly describe how the data, in particular the operators, are stored in MOLMPS. As is usual, we employ the sparsity generated by the quantum symmetries mentioned in the previous subsection. The operator class contains information about the individual symmetry sectors and the corresponding dense matrices.
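To make this storage layout concrete, the following hypothetical C++ sketch mimics a symmetry-sector-blocked operator container: each non-zero block is a dense (rows × cols × n_orb) array keyed by the U(1)×U(1) quantum numbers of its bra and ket sectors. All class and member names are illustrative; this is not the actual MOLMPS operator class.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct SectorLabel {                    // U(1) x U(1) quantum numbers of a sector
    int n_alpha;                        // number of alpha electrons
    int n_beta;                         // number of beta electrons
    bool operator<(const SectorLabel &o) const {
        return n_alpha != o.n_alpha ? n_alpha < o.n_alpha : n_beta < o.n_beta;
    }
};

struct DenseBlock {                     // one non-zero block <bra sector| op |ket sector>
    std::size_t rows = 0, cols = 0, n_orb = 1;  // third "leg" = orbital index (pair)
    std::vector<double> data;           // rows * cols * n_orb values, dense storage
};

class BlockOperator {
public:
    using Key = std::pair<SectorLabel, SectorLabel>;  // (bra sector, ket sector)
    DenseBlock &block(const SectorLabel &bra, const SectorLabel &ket) {
        return blocks_[{bra, ket}];     // creates an empty block on first access
    }
    bool has_block(const SectorLabel &bra, const SectorLabel &ket) const {
        return blocks_.count({bra, ket}) != 0;
    }
private:
    std::map<Key, DenseBlock> blocks_;  // only non-zero sector pairs are stored
};

int main() {
    BlockOperator c_up;                              // e.g. a block a^+_up operator
    DenseBlock &b = c_up.block({1, 0}, {0, 0});      // couples (n_a,n_b)=(0,0) -> (1,0)
    b.rows = 1; b.cols = 1; b.n_orb = 1; b.data = {1.0};
    return c_up.has_block({1, 0}, {0, 0}) ? 0 : 1;
}
```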

For these dense entities, we have developed our own lightweight tensor library, which can work with up to three-legged dense tensors and serves as a wrapper to BLAS and LAPACK. In case of the block operators with one or two orbital indices (normal or pre-summed), the third index of the dense tensor corresponds to the orbital index / pair of indices. The dense tensors of the individual symmetry sectors are stored in the vector container of the C++ standard library, as is depicted in Figure 2.

Figure 2: The operator class storage demonstrated on the example of the block operator a†↑: symmetry-sector decomposed storage by means of std::vector versus the full matrix form (white blocks are zero).

Our parallel approach is based on our own MPI global memory (GM) library. It relies on a fast inter-node connection (e.g. InfiniBand, Omni-Path, Fibre Channel), which is common for all modern supercomputer architectures. The data distribution and handling are managed by a GM class. The GM class instance envelops a group of tensors which are distributed under the same conditions, e.g. the operators of a given (left or right) block, and carries all the information about the distribution.

In order to minimize the amount of data stored in memory, we employed the MPI shared memory (SHM) model (introduced with MPI version 3), so only one copy of each tensor/matrix is stored per node. All processes on a given node can access these data directly without using a remote access. Remote processes can access these data using inter-node communicators by RMA calls.

For distribution, the GM supports two models. The first one, which is suitable for the smallest arrays, is the local data model, where the selected arrays are available locally on all nodes. This significantly reduces communication for the reasonable price of slightly higher memory requirements. The user can in fact specify a threshold for the array size below which this model is employed. This model is by default used, for example, for the Krylov vectors during the Davidson diagonalization.51

The second option is the global model. It is suitable for large arrays and the data are evenly distributed among nodes (in the MPI SHM regime). The distribution is performed over all available nodes in such a way that a balanced load on the nodes is ensured. This model is typically used for the dense tensors of the individual sectors of the left and right block operators in case of larger calculations (active space sizes and bond dimensions). When the tensors fit into the memory of a single node, the above mentioned local memory approach is certainly more advantageous (see the results section). Indeed, any intermediate of the DMRG calculation may be treated in a different data model, purely based on the amount of free computer memory, just to maximize data locality.

The already mentioned in-house dense tensor library provides a templated tensor class for simple tensor handling in combination with the GM class. The tensor class consists of a pointer to the data array and descriptors, which involve the dimensions and also a pointer to the GM class. While the descriptors are available locally for each process, the data arrays are handled by the GM class. This design substantially simplifies the code structure and data handling.

When performing any operation with dense tensors, the get_data() call recognizes whether the data are stored locally or remotely and then either returns the data pointer directly or fetches the data from the remote location first. We also introduced pre-fetching of tensors (or parts of them) needed for a calculation where the same sectors are involved multiple times. fetch_data() or fetch_slices() fetch the data from the remote location and assign them temporarily to a corresponding tensor class. When the data are no longer needed, the allocated memory is freed.
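For readers unfamiliar with the MPI-3 shared-memory mechanism mentioned above, the following minimal, generic sketch (plain MPI, not the in-house GM library, whose interface is not shown here) allocates one shared array per node via MPI_Win_allocate_shared, so that all ranks of a node address the same physical copy of the data.

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    // Group the ranks that share a node into one communicator.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    // Rank 0 of every node allocates the shared segment; the other ranks contribute size 0.
    const MPI_Aint n = 1 << 20;                                   // doubles in the segment
    MPI_Aint my_size = (node_rank == 0) ? n * (MPI_Aint)sizeof(double) : 0;
    double *base = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared(my_size, sizeof(double), MPI_INFO_NULL, node_comm, &base, &win);

    // Every rank obtains a direct pointer to rank 0's segment: one copy per node.
    MPI_Aint qsize; int disp; double *shared = nullptr;
    MPI_Win_shared_query(win, 0, &qsize, &disp, &shared);

    MPI_Win_lock_all(0, win);
    if (node_rank == 0)
        for (MPI_Aint i = 0; i < n; ++i) shared[i] = 1.0;         // written once per node
    MPI_Win_sync(win);                                            // make the stores visible
    MPI_Barrier(node_comm);
    MPI_Win_sync(win);
    std::printf("node-local rank %d reads shared[0] = %.1f\n", node_rank, shared[0]);
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```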

Regarding the sources of parallelism, we combine operator and symmetry sector parallelisms, similarly to the simpler shared memory approach,41,45 in order to generate a large enough number of tasks (dense matrix-matrix operations) which can be executed in parallel. All three main steps are task-based parallelized.

2.2.1 Hamiltonian diagonalization

In case of the iterative Hamiltonian diagonalization (4) by means of the Davidson51 or similar algorithms, the Hamiltonian is applied sequentially on a trial wave function vector. This action is composed of a large number of operator combinations

|\tilde{\Phi}\rangle = H_{el.} |\Phi\rangle = \sum_{\alpha} A^{(\alpha)}_{l} \otimes A^{(\alpha)}_{s_l} \otimes A^{(\alpha)}_{s_r} \otimes A^{(\alpha)}_{r} \, |\Phi\rangle,   (16)

where α denotes a given operator combination and A_l, A_{s_l}, A_{s_r}, and A_r represent the left block, left site, right site, and right block operators, respectively. One such operator combination is e.g. the term A^{↑↑}_{rs} ⊗ I ⊗ I ⊗ a_{r↑} a_{s↑}, which was mentioned earlier.

When taking into account the symmetry sectors (12), the following holds for the expansion coefficients of the resulting vector |Φ̃⟩

\tilde{\varphi}^{a'b'c'd'}_{l'r'} = \sum_{\alpha} \sum_{abcd} \sum_{\substack{l \in a \\ r \in d}} \left( A^{(\alpha)\, a \to a'}_{l} \right)_{l'l} A^{(\alpha)\, b \to b'}_{s_l} A^{(\alpha)\, c \to c'}_{s_r} \left( A^{(\alpha)\, d \to d'}_{r} \right)_{r'r} \varphi^{abcd}_{lr}, \qquad l' \in a', \; r' \in d'.   (17)

The superscript of the type a → a' labels the symmetry sectors of a given operator to which the operator matrix elements belong. If we formally gather all symmetry sector indices into s (and s'), merge the elements of the site operators (scalars) into a multiplication factor f(s), and reshape the wave function expansion coefficients into a matrix form, we can write

\tilde{\varphi}^{s'} = \sum_{\alpha s} f(s) \cdot A^{(\alpha,s,s')}_{l} \cdot \varphi^{s} \cdot \left( A^{(\alpha,s,s')}_{r} \right)^{T}.   (18)

The action of the Hamiltonian on a trial wave function vector thus comprises a huge number of dense matrix-matrix multiplications.

In our approach, before the Davidson algorithm is started, a huge task list which combines the loops over operator combinations (α) and symmetry sectors (s, s') is generated. Since different terms from this task list may write to the same wave function output sector (s'), each MPI process has its own copy of φ̃ and we use the reduce function.

In case of the GM model, the individual operator combinations acting on both (left and right) blocks at the same time do not involve orbital indices. So, for example, instead of the aforementioned term A^{↑↑}_{rs} ⊗ I ⊗ I ⊗ a_{r↑} a_{s↑}, we have A^{↑↑} ⊗ I ⊗ I ⊗ a_↑ a_↑ and the loop over rs is performed sequentially during the task execution. It is organized this way to avoid fetching of small memory chunks and with the view of a further GPU acceleration in the future [parallel execution of matrix-matrix multiplications (18) performed on different slices of the same dense tensors].45

In order to exploit data locality to the maximum, we have developed a semi-dynamic scheduler. It considers where the data for individual tasks are stored and also involves independent counters specific for each node. As a result, a group of tasks is assigned to the node where their execution will cause the minimum amount of communication. The groups of tasks are executed locally on a given node with a dynamical task distribution among the local processes. In case of tasks involving tensors from both (left and right) blocks, the execution node is selected based on the storage location of the larger of the two tensors, to minimize the amount of data being fetched.
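A single entry of the task list described above essentially carries the operands of one term of Eq. (18). The sketch below uses naive row-major loops standing in for the dgemm calls used in practice; the Task layout is illustrative and is not the MOLMPS data structure. It shows the elementary kernel σ^{s'} += f(s) · A_l · φ^{s} · A_r^T.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;   // dense, row-major

struct Task {
    const Matrix *A_l;  std::size_t lrows, lcols;   // A_l : (l' x l)
    const Matrix *A_r;  std::size_t rrows, rcols;   // A_r : (r' x r)
    const Matrix *phi;                              // phi : (l  x r)
    Matrix *sigma;                                  // out : (l' x r'), accumulated
    double f;                                       // merged site-operator factor f(s)
};

void execute(const Task &t) {
    // tmp = A_l * phi            (l' x r)
    Matrix tmp(t.lrows * t.rcols, 0.0);
    for (std::size_t i = 0; i < t.lrows; ++i)
        for (std::size_t k = 0; k < t.lcols; ++k) {
            const double a = (*t.A_l)[i * t.lcols + k];
            for (std::size_t j = 0; j < t.rcols; ++j)
                tmp[i * t.rcols + j] += a * (*t.phi)[k * t.rcols + j];
        }
    // sigma += f * tmp * A_r^T   (l' x r')
    for (std::size_t i = 0; i < t.lrows; ++i)
        for (std::size_t j = 0; j < t.rrows; ++j) {
            double acc = 0.0;
            for (std::size_t k = 0; k < t.rcols; ++k)
                acc += tmp[i * t.rcols + k] * (*t.A_r)[j * t.rcols + k];
            (*t.sigma)[i * t.rrows + j] += t.f * acc;
        }
}

int main() {
    // 2x2 toy example: A_l = I, A_r = I, f = 2  =>  sigma = 2 * phi.
    Matrix I2 = {1, 0, 0, 1}, phi = {1, 2, 3, 4}, sigma(4, 0.0);
    Task t{&I2, 2, 2, &I2, 2, 2, &phi, &sigma, 2.0};
    execute(t);
    return (sigma[3] == 8.0) ? 0 : 1;
}
```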

2.2.2 Operators renormalization

Also in case of the renormalization, we generate a huge task list which combines the loops over the different operators to be renormalized and their symmetry sectors. These tasks are completely independent and can be executed in parallel. During a given task execution, the complete sector matrix of the newly formed operator corresponding to the non-truncated enlarged basis is formed according to the blocking tables6 and renormalized by the sector matrices of the renormalization operator (11).

6 They contain the information about how the individual operators are combined during blocking, i.e. when the block is enlarged by a site.

In case of the GM model, we again employ the semi-dynamic scheduler, which now considers the locality of the newly formed operator sector matrices. The block operator sector matrices needed for the blocking have to be fetched, which naturally requires much more communication than the Hamiltonian diagonalization.42 In order to decrease the communication, we do not work with individual slices of the dense tensors of the newly formed operators (corresponding to different orbital indices), but rather group them into larger chunks. We also fetch the whole dense tensors of the block operators and order the tasks so that they may be re-used for subsequent tasks. This comes at the cost of slightly higher memory requirements.

In case of the local memory model, the situation is much simpler, since there is no need for fetching of the tensors of the block operators (everything is available on all nodes). The difference, however, is that we have to update all nodes with the newly formed sector matrices. Because in the local memory model the dense tensors of the newly formed operators are stored in consecutive arrays, this can be done efficiently by means of the accumulate function.

2.2.3 Operators pre-summation

Since we fully employ the tensor product structure of the vector space of the two blocks and two sites, we also form additional pre-summed operators on-the-fly, while preparing the action of the Hamiltonian on a trial wave function. These operators are formed in order to minimize the number of matrix-matrix multiplications during the aforementioned Hamiltonian diagonalization and they are not renormalized and stored.

As an example, let us consider the term

\sum v_{pqrs} \, a^{\dagger}_{p\uparrow} \otimes a^{\dagger}_{q\uparrow} \otimes I \otimes a_{r\uparrow} a_{s\uparrow},   (19)

where p belongs to the left block, q to the left site, and r with s to the right block. If we form the following operators in the left block

A^{\text{tmp}}_{rs} = \sum_{\substack{p \in \text{left} \\ q = \text{left site} \\ rs \in \text{right}}} v_{pqrs} \, a^{\dagger}_{p\uparrow},   (20)

we can rewrite Eq. (19) as

\sum_{rs \in \text{right}} A^{\text{tmp}}_{rs} \otimes a^{\dagger}_{q\uparrow} \otimes I \otimes a_{r\uparrow} a_{s\uparrow}.   (21)

In our approach, we first generate the list of all operators to be formed on-the-fly and then, in line with the previous subsections, generate a huge task list combining all the pre-summed operators and their symmetry sectors.

Neither the local memory model nor the GM model requires network communication here. This is because in the GM model, we form the symmetry sector matrices of the pre-summed operators on the same nodes where the given symmetry sector tensors of the original operators, whose slices are multiplied by MO integrals and summed, are stored. In case of the local memory model, all nodes have to be updated with the newly formed sector matrices, which is done in the same way as in case of the renormalization.
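As an illustration of the pre-summation of Eq. (20), the following sketch (illustrative names; MOLMPS performs the analogous contraction on its three-legged dense tensors with BLAS) contracts the orbital leg of a block-operator tensor with the corresponding MO integrals to produce one dense pre-summed sector matrix.

```cpp
#include <cstddef>
#include <vector>

// Pre-summation in the spirit of Eq. (20): contract the orbital leg of a three-legged
// block-operator tensor (rows x cols x n_orb) with MO integrals v_{pqrs} at fixed q, r, s.
using Tensor3 = std::vector<double>;   // layout: data[(i*cols + j)*n_orb + p]

std::vector<double> presum(const Tensor3 &creat_up,        // a^+_{p,up} for all p in the block
                           std::size_t rows, std::size_t cols, std::size_t n_orb,
                           const std::vector<double> &v_p) // integrals v_{pqrs} for fixed q,r,s
{
    std::vector<double> A_tmp(rows * cols, 0.0);           // one dense pre-summed sector matrix
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j) {
            double acc = 0.0;
            for (std::size_t p = 0; p < n_orb; ++p)        // contraction over the orbital leg
                acc += v_p[p] * creat_up[(i * cols + j) * n_orb + p];
            A_tmp[i * cols + j] = acc;
        }
    return A_tmp;
}

int main() {
    // toy block with one 2x2 sector and two orbitals
    Tensor3 c = {/*i=0,j=0*/ 1.0, 0.5, /*i=0,j=1*/ 0.0, 0.0,
                 /*i=1,j=0*/ 0.0, 0.0, /*i=1,j=1*/ 2.0, 1.0};
    std::vector<double> v = {1.0, 2.0};
    std::vector<double> A = presum(c, 2, 2, 2, v);
    return (A[0] == 1.0 * 1.0 + 2.0 * 0.5) ? 0 : 1;        // 2.0
}
```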
it was demonstrated by Li Manni et al.63,64 when the block is enlarged by a site. that apart from 3d, 4d, and 4s orbitals of

8 (a) Fe(II)-porphyrin model (b) Defected π-conjugated anthracene tetramer (c) FeMoco cluster

Figure 3: Structures of the molecules for which the parallel QC-DMRG scaling has been studied. Notice that the FeMoco cluster is for clarity not complete, only the atoms of the ligands which are directly bonded to Fe and Mo atoms are displayed. Atom colors: nitrogen - blue, sulphur - yellow, oxygen - red, carbon - brown, hydrogen - white, iron - grey, molybdenum - green. the Fe center and σ(Fe-N) orbital, inclusion on-surface synthesis of ethynylene-bridged an- of all π orbitals from the porphyrin ring into thracene polymers.65 Depending on the defect, the active space is necessary for a quantita- such species may exhibit peculiar electronic tive determination of its ground state, leading structure properties. For the present study, we to CAS(32,34). In our recent DMRG-DLPNO- have employed the UB3LYP optimized geome- TCCSD(T) study,62 we have optimized this try and built the active space from ROHF/cc- CAS orbitals for the lowest triplet and quin- PVDZ C-atom pz orbitals, which corresponds tet states and different geometries by means to CAS(63,63). The active space orbitals were of the state specific DMRG-CASSCF method fully localized.23 in TZVP basis. In the present study, we The last system is the nitrogenase FeMo co- have tested the parallel QC-DMRG scaling factor (FeMoco) cluster (Figure 3c, notice that on the above mentioned triplet state DMRG- for clarity reasons the structure is not complete, CASSCF(32,34)/TZVP orbitals optimized at only the atoms of the ligands which are directly the geometry used in Li Manni et al. works.63,64 bonded to Fe and Mo atoms are displayed), The triplet state was chosen as it is more cor- which is undoubtedly one of the most chal- related than quintet62,63 and the active space lenging problems of the current computational orbitals were split-localized.23 chemistry. Its importance is proved by the fact The second system selected for the scaling that FeMoco is responsible for the nitrogen re- tests is the defected π-conjugated anthracene duction during the process of nitrogen fixation tetramer (Figure 3b). It is a representative of under ambient conditions in certain types of π-conjugated hydrocarbons (linear or quasi lin- bacteria.66 In contrast, the industrial Haber- ear), a group of molecules frequently studied by Bosch process to produce ammonia (mainly for means of QC-DMRG calculations in the C-atom fertilizers) is very energetically demanding. In 17–19,21,23 pz active space. We have recently stud- fact, the electronic structure of the FeMo co- ied the ground state of the above mentioned factor remains poorly understood.33,34 Reiher et defected anthracene tetramer with the QC- al. proposed the model of FeMoco with the ac- DMRG method since this and similar species tive space containing 54 electrons in 54 orbitals often appear as unwanted by-products during in the context of simulations on quantum com-

9 puters.67 Recently, it was shown on a different DMRG calculation. We present timings only model that the larger active space, in partic- for the three main parts discussed in the text, ular CAS(113, 76) is necessary for the correct namely Hamiltonian diagonalization via David- open-shell nature of its ground state.33 For our son procedure, operator pre-summation, and benchmark tests, we have employed the inte- renormalization. Other parts including e.g. the gral file provided with the later paper, which is formation and diagonalization of the reduced available online.68 All the computational details density matrix or broadcastings are marginal can be found in Ref.33 for the presented cases. In all three cases, we have employed the We have tested the performance of the local Fiedler method25 to order the active space or- memory model on the example of the smallest bitals on a one-dimensional lattice. The order- system [Fe(II)-porphyrin model, M = 2048]. ing optimization was iterated about four times As can be seen in Figure 4a, the Davidson by means of the QC-DMRG calculations with algorithm scales almost ideally up to approx. increasing bond dimensions varying from M = 500 CPU cores and still shows a good perfor- 256 up to M = 1024, which were followed by mance up to approx. 1500 CPU cores. The the calculations of the single-site entropies and tiny bump at 48 CPU cores is caused by the mutual information necessary for the Fiedler fact that despite no communication is needed method and the warm-up procedure.8,14,25 At during the execution of individual tasks, the fi- least three sweeps with the final orbital order- nal result scattered in chunks among nodes has ing and actual bond dimensions were performed to be gathered (by means of the reduce func- before measuring the individual timings in the tion) after each Davidson step, which requires middle of the sweep. All the QC-DMRG calcu- a small amount of communication. lations were initialized with the CI-DEAS pro- The dashed line in Figure 4a around 512 CPU cedure8,14 and they were performed with the cores corresponds to the same treatment of op- MOLMPS program. erator combinations acting on both blocks (left and right) as in case of the GM model, i.e. performing the loop over orbital indices inside 4 Results and discussion tasks. One can see that such a treatment is not suitable for the local memory model where all The timings of the individual parts of one QC- data are easily accessible locally. DMRG iteration for all tested systems corre- The pre-summation of operators and renor- sponding to the middle of the sweep, which is malization in Figure 4b also show almost per- the most time consuming, are summarized in fect scaling. Figures 4, 5, 6, 7, and 8. In the present study, On the example of the Fe(II)-porphyrin model we were interested solely in the scaling char- with M = 4096, we demonstrate the transition acterictics of our parallel scheme, we therefore from the local to the GM model. This case still do not present any energies (they would be out fits into the memory of a single node, however, of context anyway). Nevertheless, the detailed we have employed the GM approach in order to study of the Fe(II)-porphyrin model has already see the effect of communication. For this and been submitted62 and chemistry-oriented pa- further cases, we do not present scalings for pre- pers about the remaining two systems should summations since they scale almost perfectly also appear soon. (no need for communication). 
All calculations were performed on the Sa- The effect of communication on scaling of lomon supercomputer of the Czech national su- the Davidson algorithm (Figure 5a) is apparent percomputing center in Ostrava with the follow- when going from a single node (24 CPU cores) ing hardware: 24 cores per node (2 x Intel Xeon to 48 CPU cores. The scaling is a lot worse E5-2680v3, 2.5 GHz), 128 GB RAM per node than in case of the local memory model, which and InfiniBand FDR56 interconnect. We have is definitely not surprising, but a reasonable im- used up to 2480 CPU cores for a single QC- provement can be seen up to aprrox. 1500 cores.

Figure 4: Timings of the individual parts of one QC-DMRG iteration corresponding to the middle of the sweep performed on the Fe(II)-porphyrin model [CAS(32,34)] with bond dimension M = 2048: (a) Davidson procedure, (b) pre-summation and renormalization (both panels: time [s] versus CPU cores).

Figure 5: Timings of the Davidson procedure (a) and the renormalization (b) of the QC-DMRG iteration corresponding to the middle of the sweep performed on the Fe(II)-porphyrin model [CAS(32,34)] with bond dimension M = 4096 (time [s] versus CPU cores).

The situation is worse for the renormalization (Figure 5b), which scales badly in this case. However, an important point to stress is that it is much faster than the Davidson algorithm itself (75 s vs. 20 s for 2496 CPU cores).

The Fe(II)-porphyrin model with M = 8192 in Figure 6 is a memory-demanding example which requires at least 4 nodes (of 128 GB). The scaling of the Davidson algorithm (Figure 6a), as well as of the renormalization (Figure 6b), is indeed similar to the M = 4096 case, despite the fact that the tasks required significantly more intensive communication.

The defected π-conjugated anthracene tetramer with M = 4096 (Figure 7) and the FeMoco cluster with M = 6000 (Figure 8)7 represent the most challenging problems, which require a larger number of nodes and also a large amount of communication. Our parallel approach scales up to approx. 2000 CPU cores and, in case of the FeMoco cluster, there is still a non-negligible improvement (10%) up to approx. 2500 CPU cores. In all the tested cases, the renormalization is adequately faster than the Davidson algorithm.

7 In case of the Davidson algorithm, the block which was not going to be enlarged corresponded to M = 4000. This does not affect the renormalization though.

11 8192 512

4096 256

2048 Time [s] Time [s] 128 1024

512 64 64 128 256 512 1024 2048 4096 64 128 256 512 1024 2048 4096 CPU cores CPU cores (a) Davidson procedure (b) Renormalization

Figure 6: Timings of the Davidson procedure and the renormalization of the QC-DMRG iteration corresponding to the middle of the sweep performed on the Fe(II)-porphyrin model [CAS(32,34)] with bond dimension M = 8192.

Figure 7: Timings of the Davidson procedure and the renormalization of the QC-DMRG iteration corresponding to the middle of the sweep performed on the defected π-conjugated anthracene tetramer [CAS(63,63)] with bond dimension M = 4096 (time [s] for 768, 1536, and 2016 CPU cores).

Figure 8: Timings of the Davidson procedure and the renormalization of the QC-DMRG iteration corresponding to the middle of the sweep performed on the FeMoco cluster [CAS(113,76)] with bond dimension M = 6000 (time [s] for 768, 2080, and 2480 CPU cores).

The performance analysis of the largest calculations mentioned above still shows signs of inter-node imbalance, which leaves room for further improvement and will be the subject of future work.

5 Conclusions

In this paper, we have presented the first attempt (to the best of our knowledge) to exploit the supercomputer platform for QC-DMRG calculations. We have developed the parallel scheme based on the MPI global memory library, which combines operator and symmetry sector parallelisms. We have tested its performance on three different molecules with active spaces ranging from 34 up to 76 orbitals and various bond dimensions. For smaller computations (smaller active spaces and bond dimensions), we have achieved almost perfect scaling. For larger calculations, which did not fit into the memory of a single node, we have achieved worse, but

still reasonable scaling up to about 2000 CPU cores. Our largest calculation corresponds to the FeMoco cluster [CAS(113,76)] with bond dimension M = 6000 on 2480 CPU cores. We believe that further acceleration is possible once the problem of the inter-node imbalance is solved in the future. As was also discussed in the text, another possible source of speed-up can be GPU units.

In summary, we have shown that the most challenging problems of current electronic structure theory may be calculated by means of the QC-DMRG method on a supercomputer in a fraction of the time of a few-node calculation.

Acknowledgment

We would like to thank Pavel Jelínek for providing us with the DFT optimized geometry of the defected π-conjugated anthracene tetramer. This work has been supported by the Czech Science Foundation (grant no. 18-18940Y), the Center for Scalable and Predictive methods for Excitation and Correlated phenomena (SPEC), which is funded by the U.S. Department of Energy (DOE), Office of Science, Office of Basic Energy Sciences, the Division of Chemical Sciences, Geosciences, and Biosciences, the Hungarian National Research, Development and Innovation Office (grant no. K120569), and the Hungarian Quantum Technology National Excellence Program (project no. 2017-1.2.1-NKP-2017-00001). All the computations were carried out on the Salomon supercomputer in Ostrava; we would therefore like to acknowledge the support by the Czech Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project “IT4Innovations National Supercomputing Center - LM2015070”.

References

(1) White, S. R. Density matrix formulation for quantum renormalization groups. Physical Review Letters 1992, 69, 2863–2866.
(2) White, S. R. Density-matrix algorithms for quantum renormalization groups. Physical Review B 1993, 48, 10345–10356.
(3) White, S. R.; Martin, R. L. Ab initio quantum chemistry using the density matrix renormalization group. The Journal of Chemical Physics 1999, 110, 4127–4130.
(4) Chan, G. K.-L.; Head-Gordon, M. Highly correlated calculations with a polynomial cost algorithm: A study of the density matrix renormalization group. The Journal of Chemical Physics 2002, 116, 4462–4476.
(5) Chan, G. K.-L.; Head-Gordon, M. Exact solution (within a triple-zeta, double polarization basis set) of the electronic Schrödinger equation for water. The Journal of Chemical Physics 2003, 118, 8551–8554.
(6) Legeza, Ö.; Röder, J.; Hess, B. A. Controlling the accuracy of the density-matrix renormalization-group method: The dynamical block state selection approach. Physical Review B 2003, 67, 125114.
(7) Legeza, Ö.; Röder, J.; Hess, B. A. QC-DMRG study of the ionic-neutral curve crossing of LiF. Molecular Physics 2003, 101, 2019–2028.
(8) Legeza, Ö.; Sólyom, J. Optimizing the density-matrix renormalization group method using quantum information entropy. Physical Review B 2003, 68, 195116.
(9) Legeza, Ö.; Noack, R.; Sólyom, J.; Tincani, L. In Computational Many-Particle Physics; Fehske, H., Schneider, R., Weisse, A., Eds.; Lecture Notes in Physics; Springer Berlin Heidelberg, 2008; Vol. 739; pp 653–664.
(10) Marti, K. H.; Reiher, M. The Density Matrix Renormalization Group Algorithm in Quantum Chemistry. Zeitschrift für Physikalische Chemie 2010, 224, 583–599.

(11) Chan, G. K.-L.; Sharma, S. The Density Matrix Renormalization Group in Quantum Chemistry. Annual Reviews of Physical Chemistry 2011, 62, 465–481.
(12) Wouters, S.; Van Neck, D. The density matrix renormalization group for ab initio quantum chemistry. The European Physical Journal D 2014, 68.
(13) Yanai, T.; Kurashige, Y.; Mizukami, W.; Chalupský, J.; Lan, T. N.; Saitow, M. Density matrix renormalization group for ab initio calculations and associated dynamic correlation methods: A review of theory and applications. International Journal of Quantum Chemistry 2014, 115, 283–299.
(14) Szalay, Sz.; Pfeffer, M.; Murg, V.; Barcza, G.; Verstraete, F.; Schneider, R.; Legeza, Ö. Tensor product methods and entanglement optimization for ab initio quantum chemistry. International Journal of Quantum Chemistry 2015, 115, 1342–1391.
(15) Chan, G. K.-L.; Kállay, M.; Gauss, J. State-of-the-art density matrix renormalization group and coupled cluster theory studies of the nitrogen binding curve. The Journal of Chemical Physics 2004, 121, 6110–6116.
(16) Sharma, S.; Yanai, T.; Booth, G. H.; Umrigar, C. J.; Chan, G. K.-L. Spectroscopic accuracy directly from quantum chemistry: Application to ground and excited states of beryllium dimer. The Journal of Chemical Physics 2014, 140, 104112.
(17) Hachmann, J.; Dorando, J. J.; Avilés, M.; Chan, G. K.-L. The radical character of the acenes: A density matrix renormalization group study. The Journal of Chemical Physics 2007, 127, 134309.
(18) Ghosh, D.; Hachmann, J.; Yanai, T.; Chan, G. K.-L. Orbital optimization in the density matrix renormalization group, with applications to polyenes and β-carotene. The Journal of Chemical Physics 2008, 128, 144117.
(19) Mizukami, W.; Kurashige, Y.; Yanai, T. More π Electrons Make a Difference: Emergence of Many Radicals on Graphene Nanoribbons Studied by Ab Initio DMRG Theory. Journal of Chemical Theory and Computation 2013, 9, 401–407.
(20) Barcza, G.; Barford, W.; Gebhard, F.; Legeza, Ö. Excited states in polydiacetylene chains: A density matrix renormalization group study. Physical Review B 2013, 87, 245116.
(21) Hu, W.; Chan, G. K.-L. Excited-State Geometry Optimization with the Density Matrix Renormalization Group, as Applied to Polyenes. Journal of Chemical Theory and Computation 2015, 11, 3000–3009.
(22) Timár, M.; Barcza, G.; Gebhard, F.; Veis, L.; Legeza, Ö. Hückel-Hubbard-Ohno modeling of π-bonds in ethene and ethyne with application to trans-polyacetylene. Physical Chemistry Chemical Physics 2016, 18, 18835–18845.
(23) Olivares-Amaya, R.; Hu, W.; Nakatani, N.; Sharma, S.; Yang, J.; Chan, G. K.-L. The ab-initio density matrix renormalization group in practice. The Journal of Chemical Physics 2015, 142, 034102.
(24) Kurashige, Y.; Yanai, T. High-performance ab initio density matrix renormalization group method: Applicability to large-scale multireference problems for metal compounds. The Journal of Chemical Physics 2009, 130, 234114.
(25) Barcza, G.; Legeza, Ö.; Marti, K. H.; Reiher, M. Quantum-information analysis of electronic states of different molecular structures. Physical Review A 2011, 83, 012508.

(26) Boguslawski, K.; Marti, K. H.; Legeza, Ö.; Reiher, M. Accurate ab Initio Spin Densities. Journal of Chemical Theory and Computation 2012, 8, 1970–1982.
(27) Wouters, S.; Bogaerts, T.; Van Der Voort, P.; Van Speybroeck, V.; Van Neck, D. Communication: DMRG-SCF study of the singlet, triplet, and quintet states of oxo-Mn(Salen). The Journal of Chemical Physics 2014, 140, 241103.
(28) Nachtigallová, D.; Antalík, A.; Lo, R.; Sedlák, R.; Manna, D.; Tuček, J.; Ugolotti, J.; Veis, L.; Legeza, Ö.; Pittner, J.; Zbořil, R.; Hobza, P. An Isolated Molecule of Iron(II) Phthalocyanin Exhibits Quintet Ground-State: A Nexus between Theory and Experiment. Chemistry - A European Journal 2018, 24, 13413–13417.
(29) Knecht, S.; Legeza, Ö.; Reiher, M. Communication: Four-component density matrix renormalization group. The Journal of Chemical Physics 2014, 140, 041101.
(30) Battaglia, S.; Keller, S.; Knecht, S. Efficient Relativistic Density-Matrix Renormalization Group Implementation in a Matrix-Product Formulation. Journal of Chemical Theory and Computation 2018, 14, 2353–2369.
(31) Kurashige, Y.; Chan, G. K.-L.; Yanai, T. Entangled quantum electronic wavefunctions of the Mn4CaO5 cluster in photosystem II. Nature Chemistry 2013, 5, 660–666.
(32) Sharma, S.; Sivalingam, K.; Neese, F.; Chan, G. K.-L. Low-energy spectrum of iron–sulfur clusters directly from many-particle quantum mechanics. Nature Chemistry 2014, 6, 927.
(33) Li, Z.; Li, J.; Dattani, N. S.; Umrigar, C. J.; Chan, G. K.-L. The electronic complexity of the ground-state of the FeMo cofactor of nitrogenase as relevant to quantum simulations. The Journal of Chemical Physics 2019, 150, 024302.
(34) Li, Z.; Guo, S.; Sun, Q.; Chan, G. K.-L. Electronic landscape of the P-cluster of nitrogenase as revealed through many-electron quantum wavefunction simulations. Nature Chemistry 2019, 11, 1026–1033.
(35) Kurashige, Y.; Yanai, T. Second-order perturbation theory with a density matrix renormalization group self-consistent field reference function: Theory and application to the study of chromium dimer. The Journal of Chemical Physics 2011, 135, 094104.
(36) Saitow, M.; Kurashige, Y.; Yanai, T. Multireference configuration interaction theory using cumulant reconstruction with internal contraction of density matrix renormalization group wave function. The Journal of Chemical Physics 2013, 139, 044118.
(37) Neuscamman, E.; Yanai, T.; Chan, G. K.-L. A review of canonical transformation theory. International Reviews in Physical Chemistry 2010, 29, 231–271.
(38) Sharma, S.; Chan, G. A flexible multi-reference perturbation theory by minimizing the Hylleraas functional with matrix product states. The Journal of Chemical Physics 2014, 141, 111101.
(39) Veis, L.; Antalík, A.; Brabec, J.; Neese, F.; Legeza, Ö.; Pittner, J. Coupled Cluster Method with Single and Double Excitations Tailored by Matrix Product State Wave Functions. The Journal of Physical Chemistry Letters 2016, 7, 4072–4078.
(40) Freitag, L.; Knecht, S.; Angeli, C.; Reiher, M. Multireference Perturbation Theory with Cholesky Decomposition for the Density Matrix Renormalization Group. Journal of Chemical Theory and Computation 2017, 13, 451–459.

(41) Hager, G.; Jeckelmann, E.; Fehske, H.; Wellein, G. Parallelization strategies for density matrix renormalization group algorithms on shared-memory systems. Journal of Computational Physics 2004, 194, 795–808.
(42) Chan, G. K.-L. An algorithm for large scale density matrix renormalization group calculations. The Journal of Chemical Physics 2004, 120, 3172–3178.
(43) Chan, G. K.-L.; Keselman, A.; Nakatani, N.; Li, Z.; White, S. R. Matrix product operators, matrix product states, and ab initio density matrix renormalization group algorithms. The Journal of Chemical Physics 2016, 145, 014102.
(44) Stoudenmire, E. M.; White, S. R. Real-space parallel density matrix renormalization group. Physical Review B 2013, 87, 155137.
(45) Nemes, C.; Barcza, G.; Nagy, Z.; Legeza, Ö.; Szolgay, P. The density matrix renormalization group algorithm on kilo-processor architectures: Implementation and trade-offs. Computer Physics Communications 2014, 185, 1570–1581.
(46) Schollwöck, U. The density-matrix renormalization group. Reviews of Modern Physics 2005, 77, 259–315.
(47) Szabo, A.; Ostlund, N. Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory; Dover Publications, 1996.
(48) Rissler, J.; Noack, R. M.; White, S. R. Measuring orbital interaction using quantum information theory. Chemical Physics 2006, 323, 519–531.
(49) Chan, G. K.-L. Density matrix renormalisation group Lagrangians. Physical Chemistry Chemical Physics 2008, 10, 3454–3459.
(50) Xiang, T. Density-matrix renormalization-group method in momentum space. Physical Review B 1996, 53, R10445–R10448.
(51) Davidson, E. R. The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices. Journal of Computational Physics 1975, 17, 87–94.
(52) McCulloch, I. P.; Gulácsi, M. Density Matrix Renormalisation Group Method and Symmetries of the Hamiltonian. Australian Journal of Physics 2000, 53, 597–612.
(53) Tóth, A. I.; Moca, C. P.; Legeza, Ö.; Zaránd, G. Density matrix numerical renormalization group for non-Abelian symmetries. Physical Review B 2008, 78, 245109.
(54) Sharma, S.; Chan, G. K.-L. Spin-adapted density matrix renormalization group algorithms for quantum chemistry. The Journal of Chemical Physics 2012, 136, 124121.
(55) Wouters, S.; Poelmans, W.; Ayers, P. W.; Neck, D. V. CheMPS2: A free open-source spin-adapted implementation of the density matrix renormalization group for ab initio quantum chemistry. Computer Physics Communications 2014, 185, 1501–1514.
(56) Keller, S.; Reiher, M. Spin-adapted Matrix Product States and Operators. The Journal of Chemical Physics 2016, 144, 134101.
(57) Schollwöck, U. The density-matrix renormalization group in the age of matrix product states. Annals of Physics 2011, 326, 96–192, January 2011 Special Issue.
(58) Veis, L.; Antalík, A.; Brabec, J.; Neese, F.; Legeza, Ö.; Pittner, J. Correction to Coupled Cluster Method with Single and Double Excitations Tailored by Matrix Product State Wave Functions. The Journal of Physical Chemistry Letters 2016, 8, 291–291.
(59) Veis, L.; Antalík, A.; Legeza, Ö.; Alavi, A.; Pittner, J. The Intricate Case of Tetramethyleneethane: A Full Configuration Interaction Quantum Monte Carlo Benchmark and Multireference Coupled Cluster Studies. Journal of Chemical Theory and Computation 2018, 14, 2439–2445.
(60) Faulstich, F. M.; Máté, M.; Laestadius, A.; Csirik, M. A.; Veis, L.; Antalík, A.; Brabec, J.; Schneider, R.; Pittner, J.; Kvaal, S.; Legeza, Ö. Numerical and Theoretical Aspects of the DMRG-TCC Method Exemplified by the Nitrogen Dimer. Journal of Chemical Theory and Computation 2019, 15, 2206–2220.
(61) Antalík, A.; Veis, L.; Brabec, J.; Demel, O.; Legeza, Ö.; Pittner, J. Toward the efficient local tailored coupled cluster approximation and the peculiar case of oxo-Mn(Salen). The Journal of Chemical Physics 2019, 151, 084112.
(62) Antalík, A.; Nachtigalová, D.; Lo, R.; Matoušek, M.; Lang, J.; Legeza, Ö.; Pittner, J.; Hobza, P.; Veis, L. Ground state of the Fe(II)-porphyrin model system corresponds to the quintet state: DFT, DMRG-TCCSD and DMRG-TCCSD(T) study. Submitted.
(63) Manni, G. L.; Alavi, A. Understanding the Mechanism Stabilizing Intermediate Spin States in Fe(II)-Porphyrin. The Journal of Physical Chemistry A 2018, 122, 4935–4947.
(64) Manni, G. L.; Kats, D.; Tew, D. P.; Alavi, A. Role of Valence and Semicore Electron Correlation on Spin Gaps in Fe(II)-Porphyrins. Journal of Chemical Theory and Computation 2019, 15, 1492–1497.
(65) Sánchez-Grande, A.; de la Torre, B.; Santos, J.; Cirera, B.; Lauwaet, K.; Chutora, T.; Edalatmanesh, S.; Mutombo, P.; Rosen, J.; Zbořil, R.; Miranda, R.; Björk, J.; Jelínek, P.; Martín, N.; Écija, D. On-Surface Synthesis of Ethynylene-Bridged Anthracene Polymers. Angewandte Chemie 2019, 131, 6631–6635.
(66) Howard, J. B.; Rees, D. C. Structural Basis of Biological Nitrogen Fixation. Chemical Reviews 1996, 96, 2965–2982.
(67) Reiher, M.; Wiebe, N.; Svore, K. M.; Wecker, D.; Troyer, M. Elucidating reaction mechanisms on quantum computers. Proceedings of the National Academy of Sciences 2017, 114, 7555–7560.
(68) FeMoco active space Hamiltonian. https://github.com/zhendongli2008/Active-space-model-for-FeMoco, Accessed: 2020-01-08.
