State, Trends, and Needs of Quantum Chemistry Software

Edward Valeev
Department of Chemistry, Virginia Tech

Software Institute for Abstractions and Methodologies for HPC Simulations Codes on Future Architectures
University of Chicago
December 10, 2012

Monday, December 10, 12

Talk Outline

‣Intro to QC
  ‣Core problems, data, and algorithms
‣QC Software: Now
  ‣Overview of the field-wide challenges + our own codes
‣QC Software: Then
  ‣Development trends and needs

What is Quantum Chemistry?

QC = quantum mechanics of electrons (+nuclei) in a chemical setting
‣will not talk about solid state physics today! similar (yet different) ballgame

$$\hat{H}\Psi = E\Psi, \qquad \Psi \equiv \Psi(\{\mathbf{r}_i\})$$

$$\hat{H} = -\frac{1}{2}\sum_{i}^{\text{electrons}} \nabla_i^2 \;-\; \sum_{i}^{\text{electrons}}\sum_{I}^{\text{nuclei}} \frac{Z_I}{|\mathbf{r}_i - \mathbf{R}_I|} \;+\; \sum_{i<j}^{\text{electrons}} \frac{1}{|\mathbf{r}_i - \mathbf{r}_j|}$$

This is an elliptic linear 2nd-order PDE with (usually) Dirichlet boundary conditions.

Provides
‣Molecular Geometries
‣Chemical Reaction Energetics
‣Chemical Reaction Pathways
‣Response to Perturbations (Spectra)
‣etc.

[Figure: tamoxifen (breast cancer drug); 57 nuclei, 200 electrons]

Issues
‣Huge # of variables
  ‣relevant # of electrons = 10 ... 10000
  ‣hence # of variables = 30 ... 30000
‣High precision needed
  ‣relevant differences of E are on the order of 1 ppm ... 1 ppb

Applied Math View of QC

$\hat{H}\Psi = E\Psi$ is a linear n-electron problem. Physics-based reduction of dimension leads to two families of methods:

‣Set of coupled nonlinear 1-electron problems: mean field methods (Hartree-Fock and Density Functional Theory)

$$\hat{F}(\{\phi_i\})\,\phi_k = \epsilon_k \phi_k, \qquad \phi_k \equiv \phi_k(\mathbf{r})$$

‣Set of coupled (nonlinear) many-electron (2, 3, 4, ...) problems: perturbation theory, coupled-cluster method; many methods with different data/computation traits

$$\hat{G}(\{\phi_{ij}\})\,\phi_{kl} = 0, \qquad \phi_{kl} \equiv \phi_{kl}(\mathbf{r}_1, \mathbf{r}_2)$$

Both are solved using a (usually) spectral iterative Galerkin method, which turns the operator equation into a matrix problem in a finite basis:

$$\hat{H}\Psi = E\Psi \;\Rightarrow\; \mathbf{H}\mathbf{c} = E\,\mathbf{S}\mathbf{c}$$

The global basis set is constructed from preoptimized atomic basis sets (molecule ≃ sum of atoms).

Data Structures and Operations: 1-electron methods

few-dimensional arrays
‣1 and 2-dimensional arrays (vectors and matrices), as in Hc = eSc
‣each dimension's rank is 500 ... 50000
‣hence must distribute matrices in the strong-scaling limit
‣standard matrix algebra (BLAS and LAPACK + scalable extensions)
  ‣product (DGEMM)
  ‣real symmetric matrix decomposition and eigenvalue problems
‣iterative computation of H is the expensive step for medium size systems; linear algebra dominates for large systems

$$H_{pq} \mathrel{+}= \left( 2(\phi_p \phi_r | \phi_q \phi_s) - (\phi_p \phi_q | \phi_r \phi_s) \right) P_{rs}, \quad \text{accumulated over all } pqrs$$

‣the cost of the two-electron (6-dimensional) integrals is critical
‣hand-written or auto-generated low-level kernels

Data Structures and Operations: many-electron methods

few-dimensional arrays
‣k-dimensional arrays (tensors)
  ‣k=4 is most common, k up to 8 desired
  ‣dimension ranks 100 ... 1000, hence must be distributed (and need large aggregate memory)
  ‣permutational and other symmetries: $T_{ijab} = -T_{ijba} = \ldots$
‣general tensor algebra (DIY codes, no standard tools)
  ‣redistribute storage: $T_{ijab} \to T_{jiba}$
  ‣contract (inner product): $R_{ijab} \mathrel{+}= \sum_{cd} v_{abcd}\, T_{ijcd}$
  ‣tensor decompositions: $v_{abcd} \approx \sum_{r} \sigma_r X_{ra} Y_{rb} Z_{rc} W_{rd}$
‣iterative computation of $H\Psi$ is the expensive step

Programming Issues

‣Intrinsic complexity
‣Performance
‣Sustainability
‣Teachability
‣etc.

Programming Issues: Complexity

Many-electron QC methods have simple specifications:

$$0 = \langle 0 | (\tilde{a}^{a}_{i})^\dagger \bar{H} | 0 \rangle$$
$$0 = \langle 0 | (\tilde{a}^{ab}_{ij})^\dagger \bar{H} | 0 \rangle$$
$$0 = \langle 0 | (\hat{\gamma}^{ij}_{kl})^\dagger \bar{H} | 0 \rangle$$

(specification of the CCSD-R12 method in Shiozaki, Kamiya, Hirata, and Valeev, J. Chem. Phys. 2008)

Many-electron methods have complex working equations.

[Figure: one of many working CCSD-R12 equations in Shiozaki, Kamiya, Hirata, and Valeev, J. Chem. Phys. 2008]

Automated derivation/transformation/generation tools are crucial for performance and validation.

Programming Issues: Complexity (automated implementation of many-electron methods)

Problem Specification -> Transformation/Optimization -> Software Realization

The problem specification is the set of three CCSD-R12 equations given above. The software realization looks like this:

    Array4D<double> r_aa_vvoo =
        v_aa_vvoo("p1a,p2a,h1a,h2a")
      - f_a_vv("p1a,p3a") * t_aa_vvoo("p2a,p3a,h1a,h2a")
      + f_a_vv("p2a,p3a") * t_aa_vvoo("p1a,p3a,h1a,h2a")
      + f_a_oo("h3a,h1a") * t_aa_vvoo("p1a,p2a,h2a,h3a")
      - f_a_oo("h3a,h2a") * t_aa_vvoo("p1a,p2a,h1a,h3a")
      + 0.5 * t_aa_vvoo("p3a,p4a,h1a,h2a") * v_aa_vvvv("p1a,p2a,p3a,p4a")
      + v_ab_voov("p1a,h3b,h1a,p3b") * t_ab_vvoo("p2a,p3b,h2a,h3b")
      - v_aa_vovo("p1a,h3a,p3a,h1a") * t_aa_vvoo("p2a,p3a,h2a,h3a")
      - v_ab_voov("p1a,h3b,h2a,p3b") * t_ab_vvoo("p2a,p3b,h1a,h3b")
      + v_aa_vovo("p1a,h3a,p3a,h2a") * t_aa_vvoo("p2a,p3a,h1a,h3a")
      - v_ab_voov("p2a,h3b,h1a,p3b") * t_ab_vvoo("p1a,p3b,h2a,h3b")
      + v_aa_vovo("p2a,h3a,p3a,h1a") * t_aa_vvoo("p1a,p3a,h2a,h3a")
      + v_ab_voov("p2a,h3b,h2a,p3b") * t_ab_vvoo("p1a,p3b,h1a,h3b")
      - v_aa_vovo("p2a,h3a,p3a,h2a") * t_aa_vvoo("p1a,p3a,h1a,h3a")
      + ... ;

‣Many examples
  ‣CSE: Janssen (1991)
  ‣TCE: Hirata, Harrison, Sadayappan et al (2000)
  ‣SIAL: Deumens, Sanders et al (2008)
  ‣etc.
‣most are work in progress
‣DSL + compiler
‣or even DSL+compiler+embedded DSL

Example: DSL in QC

‣example use of SIAL DSL to implement the following equation

[Figure: equation and its SIAL implementation]

Programming Issues: Performance

tensor contraction

$$R_{ijab} \mathrel{+}= \sum_{cd} v_{abcd}\, T_{ijcd} \;\Longrightarrow\; v_{ab;cd} \times T_{cd;ij}$$

Here $v_{ab;cd}$ is a matrix with rows $ab$ and columns $cd$, and $T_{cd;ij}$ is a matrix with rows $cd$ and columns $ij$.

‣Tensors are tiled for
  ‣performance (tiles ~ memory segments)
  ‣physics (tiles ~ atoms or blocks of basis functions)
‣Low-level operation is DGEMM
‣High-level operation is SUMMA/another parallel matrix multiply

The big issue on the horizon is sparsity.

Overview of our main QC codes: Research Code

Synopsis
‣research code originally developed by Curt Janssen (Sandia National Lab; now @ Google)
‣basic QC (Hartree-Fock, DFT, MP2, local MP2) + some advanced features (R12 explicit correlation)
‣~400k lines of C++, classic object-oriented code
‣mostly parallel (message-passing, threads, one-sided)
‣repo on Sourceforge (world read; select write)

Overview of our main QC codes: Reusable Components

Libint

Synopsis
‣domain-specific compiler that generates lightweight Gaussian integrals code
‣built into Psi; used as a component by MPQC, CP2K, ORCA and many "toy" codes

TiledArray

Synopsis
‣block-sparse parallel tensor "language" embedded in C++
‣built on top of MADNESS task-based runtime
‣code looks like this:

    R("i,j,a,b") += 0.5 * T("i,j,c,d") * W("c,d,a,b");

The big present/future challenges for QC codes

‣Dense->sparse arrays, and other disruptive theory trends
‣Disruptive hardware trends: SIMD, heterogeneity, power-limited
‣Lots of legacy codes, few community-wide tools
‣Education

Questions?
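
Supplementary sketch for the Performance slide: the tensor contraction R_ijab += sum_cd v_abcd T_ijcd becomes a matrix product once the index pairs (ij), (ab), and (cd) are each fused into a single matrix index. The C++ below is a minimal illustration of that reshaping, not code from the talk or from any of the libraries named in it. The `Tensor4` type, the function names, and the row-major layout are all assumptions made for this example, and the plain loops in `contract_as_gemm` stand in for the DGEMM call a production code would use. Because R is stored here with ij as the leading index pair, the product appears as R(ij;ab) += sum_cd T(ij;cd) v(ab;cd), which is the transpose of the v x T form drawn on the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense 4-index tensor, stored row-major in one flat vector.
struct Tensor4 {
  std::size_t n0, n1, n2, n3;
  std::vector<double> data;

  Tensor4(std::size_t a, std::size_t b, std::size_t c, std::size_t d)
      : n0(a), n1(b), n2(c), n3(d), data(a * b * c * d, 0.0) {}

  double& operator()(std::size_t i, std::size_t j, std::size_t k, std::size_t l) {
    return data[((i * n1 + j) * n2 + k) * n3 + l];
  }
  double operator()(std::size_t i, std::size_t j, std::size_t k, std::size_t l) const {
    return data[((i * n1 + j) * n2 + k) * n3 + l];
  }
};

// Reference version: R(i,j,a,b) += sum_{c,d} v(a,b,c,d) * T(i,j,c,d),
// written as explicit loops over all six indices.
void contract_loops(const Tensor4& v, const Tensor4& T, Tensor4& R) {
  assert(v.n2 == T.n2 && v.n3 == T.n3);
  for (std::size_t i = 0; i < T.n0; ++i)
    for (std::size_t j = 0; j < T.n1; ++j)
      for (std::size_t a = 0; a < v.n0; ++a)
        for (std::size_t b = 0; b < v.n1; ++b)
          for (std::size_t c = 0; c < v.n2; ++c)
            for (std::size_t d = 0; d < v.n3; ++d)
              R(i, j, a, b) += v(a, b, c, d) * T(i, j, c, d);
}

// Same contraction with fused indices: T is viewed as an (ij) x (cd) matrix,
// v as an (ab) x (cd) matrix, and R as an (ij) x (ab) matrix, so R += T * v^T.
// No data is moved: row-major storage already has this matrix layout.
void contract_as_gemm(const Tensor4& v, const Tensor4& T, Tensor4& R) {
  assert(v.n2 == T.n2 && v.n3 == T.n3);
  const std::size_t nij = T.n0 * T.n1;
  const std::size_t nab = v.n0 * v.n1;
  const std::size_t ncd = T.n2 * T.n3;
  for (std::size_t IJ = 0; IJ < nij; ++IJ)
    for (std::size_t AB = 0; AB < nab; ++AB) {
      double sum = 0.0;
      for (std::size_t CD = 0; CD < ncd; ++CD)
        sum += T.data[IJ * ncd + CD] * v.data[AB * ncd + CD];
      R.data[IJ * nab + AB] += sum;
    }
}

// Fills small tensors with deterministic values, runs both versions,
// and returns the largest element-wise difference between the results.
double demo_max_abs_diff() {
  const std::size_t no = 2, nv = 3;  // illustrative "occupied"/"virtual" sizes
  Tensor4 v(nv, nv, nv, nv), T(no, no, nv, nv);
  for (std::size_t k = 0; k < v.data.size(); ++k)
    v.data[k] = 0.01 * static_cast<double>(k % 7) - 0.02;
  for (std::size_t k = 0; k < T.data.size(); ++k)
    T.data[k] = 0.1 * static_cast<double>(k % 5) + 0.5;

  Tensor4 R1(no, no, nv, nv), R2(no, no, nv, nv);
  contract_loops(v, T, R1);
  contract_as_gemm(v, T, R2);

  double diff = 0.0;
  for (std::size_t k = 0; k < R1.data.size(); ++k)
    diff = std::max(diff, std::fabs(R1.data[k] - R2.data[k]));
  return diff;
}
```

Calling `demo_max_abs_diff()` compares the two routines on a small case. In a tiled code the same fusion would presumably be applied tile by tile, with each tile-level product handed to DGEMM and the tile products orchestrated by a parallel algorithm such as SUMMA.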