Accurate molecular polarizabilities with coupled cluster theory and machine learning David M. Wilkinsa, Andrea Grisafia, Yang Yangb, Ka Un Laob, Robert A. DiStasio Jr.b,1, and Michele Ceriottia,1 aLaboratory of Computational Science and Modeling, Institut des Materiaux,´ Ecole´ Polytechnique Fed´ erale´ de Lausanne, 1015 Lausanne, Switzerland; and bDepartment of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853 Edited by Michael L. Klein, Institute of Computational Molecular Science, Temple University, Philadelphia, PA, and approved December 26, 2018 (received for review September 23, 2018) The molecular dipole polarizability describes the tendency of a junction with a sufficiently large (diffuse) one-particle basis set molecule to change its dipole moment in response to an applied (20–23). However, such a prediction is accompanied by a sub- electric field. This quantity governs key intra- and intermolecu- stantially larger computational cost (scaling with the sixth power lar interactions, such as induction and dispersion; plays a vital of the system size), which can become quite prohibitive even role in determining the spectroscopic signatures of molecules; and when treating molecules with as few as 10 − 15 atoms. is an essential ingredient in polarizable force fields. Compared In the last few years, machine learning (ML) has gained trac- with other ground-state properties, an accurate prediction of tion as an alternative approach to the prediction of molecular the molecular polarizability is considerably more difficult, as this properties, substituting or complementing electronic structure response quantity is quite sensitive to the underlying electronic methods (24–26). In particular, it has been shown that accu- structure description. In this work, we present highly accurate racy on par with (or even better than) DFT can be achieved quantum mechanical calculations of the static dipole polarizabil- in the prediction of many molecular properties (27, 28) and ity tensors of 7,211 small organic molecules computed using that DFT (29) or coupled cluster (30) accuracy can be reached linear response coupled cluster singles and doubles theory (LR- more easily when using a less accurate but more computation- CCSD). Using a symmetry-adapted machine-learning approach, ally efficient electronic structure method as a stepping stone. we demonstrate that it is possible to predict the LR-CCSD molec- The polarizability, however, poses an additional challenge to ular polarizabilities of these small molecules with an error that ML. Due to its tensorial nature, the predicted α must trans- CHEMISTRY is an order of magnitude smaller than that of hybrid density form according to the symmetries of the SO(3) rotation group. functional theory (DFT) at a negligible computational cost. The For rigid molecules, this is easily achieved by learning the com- resultant model is robust and transferable, yielding molecular ponents of the tensor written in the reference frame of the polarizabilities for a diverse set of 52 larger molecules (includ- molecule (31, 32). However, to obtain a transferable model ing challenging conjugated systems, carbohydrates, small drugs, that would also be suitable for flexible molecules—as well as amino acids, nucleobases, and hydrocarbon isomers) at an accu- different compounds—this line of thought would require a cum- racy that exceeds that of hybrid DFT. The atom-centered decom- bersome and inelegant fragment decomposition. To avoid these position implicit in our machine-learning approach offers some complications, a symmetry-adapted Gaussian process regression insight into the shortcomings of DFT in the prediction of this fundamental quantity of interest. Significance dipole polarizability j machine learning j coupled cluster theory j The dipole polarizability of molecules and materials is cen- density functional theory j Gaussian process regression tral to several physical phenomena, modeling techniques, and the interpretation of many experiments. Its accurate evalua- he last decade has seen great progress in the first princi- tion from first principles requires quantum chemistry methods Tples evaluation of the structures, stabilities, and properties of that are often too demanding for routine use. The highly molecules and materials. Kohn–Sham density functional theory accurate calculations reported herein provide a much-needed (DFT) has played a pivotal role in this endeavor by providing benchmark of the accuracy of hybrid density functional theory ground-state properties with an accuracy that is sufficient for (DFT) as well as training data for a machine-learning model many useful applications at a manageable computational cost that can predict the polarizability tensor with an error that is (1–3). However, DFT is not equally accurate for every property about 50% smaller than DFT. This framework provides an accu- of interest. For instance, an accurate and reliable description of rate, inexpensive, and transferable strategy for estimating the the molecular dipole polarizability α, a tensor that describes how polarizabilities of molecules containing dozens of atoms, and the molecular dipole changes in the presence of an applied elec- therefore removes a considerable obstacle to accurate and tric field E, can be quite difficult to obtain (4). This is primarily reliable atomistic-based modeling of matter. due to the fact that α is a response property that is particularly sensitive to the quantum mechanical description of the underly- Author contributions: D.M.W., A.G., Y.Y., K.U.L., R.A.D., and M.C. designed research; D.M.W., A.G., Y.Y., and K.U.L. performed research; D.M.W., A.G., Y.Y., K.U.L., R.A.D., and ing electronic structure. As such, nontrivial electron correlation M.C. analyzed data; and D.M.W., A.G., Y.Y., K.U.L., R.A.D., and M.C. wrote the paper.y effects and basis set incompleteness error must be simultane- The authors declare no conflict of interest.y ously accounted for when determining α. For these reasons and This article is a PNAS Direct Submission.y in light of the fact that α is a fundamental quantity of interest that underlies induction and dispersion interactions (5–7) and Published under the PNAS license.y Raman and sum frequency generation spectroscopy (8–11), and Data deposition: Coordinates and dipole polarizabilities of the QM7b database, the showcase dataset, and the alkenes and acenes have been deposited in the Materials represents a key ingredient in the development of next genera- Cloud Archive (https://archive.materialscloud.org/2019.0002/v1).y tion polarizable force fields (12–16), it is important to provide 1 To whom correspondence may be addressed. Email: [email protected] or michele. benchmark values for α beyond the accuracy of DFT. In this ceriotti@epfl.ch.y regard, linear response coupled cluster singles and doubles the- This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. ory (LR-CCSD) (17–19) has been shown to provide considerably 1073/pnas.1816132116/-/DCSupplemental.y more accurate and reliable predictions for α when used in con- www.pnas.org/cgi/doi/10.1073/pnas.1816132116 PNAS Latest Articles j 1 of 6 Downloaded by guest on September 23, 2021 (SA-GPR) scheme has recently been derived to naturally incor- LR-CCSD results, which will be referred to as DFT and cou- porate this SO(3) covariance into an ML scheme that is suitable pled cluster singles and doubles theory (CCSD), respectively, to predict tensorial quantities of arbitrary order (33). In this throughout the remainder of the manuscript. paper, we present comprehensive coupled cluster-level bench- marks for the polarizabilities of the ∼ 7, 000 small organic Improved SA-GPR. The formalism underlying the SA-GPR molecules contained in the QM7b database (34). We use scheme in general and the λ-SOAP (smooth overlap of atomic these reference calculations to assess the accuracy of different positions) descriptors on which our model is based have been hybrid DFT schemes and train an SA-GPR–based ML scheme introduced elsewhere (33) and are summarized in Materials and (AlphaML) that can accurately predict the polarizability tensor Methods. In this work, we include several substantial improve- with nominal computational cost. We then test the extrapolative ments that increase the accuracy and speed of the SA-GPR prediction capabilities of AlphaML on a showcase dataset com- model, and these are worth a separate discussion. For one, eval- posed of 52 larger molecules and demonstrate that this approach uation of the λ-SOAP representation is greatly accelerated by provides a viable alternative to state-of-the-art and computa- choosing the most significant few hundred spherical harmonic tionally prohibitive electronic structure methods for predicting components (of several tens of thousands) using farthest point molecular polarizabilities. sampling (FPS) (43). The calculation of the kernel in Eq. 1 can be carried out with essentially the same result as if all Results components were retained but with a much lower computa- Electronic Structure Calculations. The QM7b database (26, 34, 35) tional cost. A second improvement is the generalization of the includes N = 7,211 molecules containing up to seven “heavy” λ-SOAP kernels beyond the linear kernels used in ref. 33. It atoms (i.e., C, N, O, S, Cl) with varying levels of H satura- has been shown that, in many cases, taking an integer power tion. This dataset is based on a systematic enumeration of small of the scalar SOAP kernel improves the performance of the organic compounds (35) and contains a rich diversity of chemical associated ML model. This can be understood in terms of the groups, making it a challenging test of the accuracy associated order (two body, three body,: : :) of the interatomic correla- with DFT and quantum chemical methodologies. DFT-based tions that are described by different kernels (44, 45). In the molecular polarizabilities were obtained by (numerical) differ- tensorial case, one should be careful, as the linear nature of entiation of the molecular dipole moment, µ, with respect to the kernel is essential to ensure the correct covariant behav- an external electric field E using the hybrid B3LYP (36, 37) ior.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages6 Page
-
File Size-