<<

ACCDB: A Collection of for Broad Computational Purposes

Pierpaolo Morgante,1 Roberto Peverati1*

1 P. Morgante, R. Peverati Chemistry Program, Florida Institute of Technology, 150 W. University Blvd., Melbourne, FL 32901, United States. E-mail: [email protected]

ABSTRACT. The importance of databases of reliable and accurate data in chemistry has substantially increased in the past two decades. Their main usage is to parametrize theory methods, and to assess their capabilities and accuracy for a broad set of chemical problems. The collection we present here—ACCDB—includes data from 16 different research groups, for a total of 44,931 unique reference data points, all at a level of theory significantly higher than DFT, and covering most of the . It is composed of five databases taken from literature (GMTKN, MGCDB84, Minnesota2015, DP284, and W4-17), two newly developed reaction databases (W4-17-RE, and MN-RE), and a new collection of databases containing transition metals. A set of expandable software tools for its manipulation is also presented here for the first time, as well as a case study where ACCDB is used for benchmarking commercial CPUs for chemistry calculations.

Introduction heats of formation), anharmonicity effects (for vibrational frequencies), vector-relativistic Databases of atomic and molecular effects (spin-orbit couplings), etc. Moreover, have always been an important tool in high-level calculated data do not suffer from computational chemistry for the assessment and experimental errors. Albeit they do still suffer parametrization of semi-empirical methods.1,2 from the computational accuracy error of the Large sets of experimental data of small level of theory that is used to obtain them, such became increasingly important with error is easier to assess and to control than the the development of methods for the calculation stochastic experimental error, and in many of heats of formation of molecules, starting from cases, it is preferable. the early days of semi-empirical methods (for a historical perspective see Ref. 3), to the The importance of computational chemistry composite methods of Curtiss and co-workers,4 databases containing high-level data has grown and the multi-coefficient correlation methods of exponentially in the last two decades because of Truhlar and co-workers.5 More recently, the the advent of semi-empirical exchange- development of high-accuracy ab initio correlation (xc) functionals in density functional calculation methods has shifted the interest theory (DFT). While the necessity of extremely from databases of experimental data to large databases for parametrization of new xc databases of purely calculated data. The functionals—and the increasing number of advantages of the latter for the assessment and parameters in the functional forms—is a parametrization of semi-empirical electronic controversial topic that is outside the scope of 6,7 structure methods is clear, since they include this work (see for example refs. , and for a 8-10 data that are directly comparable between recent debate, see ), we believe that calculations, without the need of corrections for databases will undoubtedly remain a experimental conditions, such as zero-point fundamental tool for the assessment of the energies (in most cases), thermal corrections (for applicability, accuracy, and reliability of new and

1 existing xc functionals in the years to come, main-group and transition-metals regardless of the underlying wars between , non-covalent interactions, optimization philosophies (from first principle vs. and . parametrized). It is objectively true that more parameters in the xc functionals require more In the next two section we describe the structure data in the training set (a statistical necessity to of ACCDB, including details for all the databases avoid over-training), but the availability of larger that compose it, and for the automation tools training sets does not necessarily translate into that we developed to simplify the management more parametrized functionals, nor into more of such a large quantity of data. Before the parameters in a functional.11 conclusion, we also present a case study where ACCDB is used to benchmark the performance of The modern use of computational chemistry three commercial CPUs recently introduced onto databases containing a large number of high- the market (as of August 2018) by AMD, for level calculated data is not limited to the routine computational chemistry calculation. parametrization and assessment of semi- empirical composite or DFT methods: other Structure of the interesting applications include validation of high-accuracy ab initio method,12 benchmarking ACCDB includes five different databases from 21 22,23 of software or hardware,13-15 and the application five different sources: MGCDB84, GMTKN, 24,25 26,27 12 of modern techniques to chemistry, Minnesota, DP284, and W4-17. The such as and machine data from these sources cover primarily main learning.16 group elements. Some transition metals (TMs) are present in the Minnesota database, but we In light of this increasing importance of large sets significantly expand this number by introducing of calculated data with high accuracy in here a database of reactions involving first-, chemistry, we present here a collection of second-, and third-row TMs, as well as elements almost 45k unique reference data points (with ~ from the second transition. In addition to these 37k of them presented here for the first time), all “traditional” data points, we also introduce here at a level of theory significantly higher than DFT. two new databases obtained using automatic This collection—that we named ACCDB— generation of reaction energies,28 containing includes data from 16 different research groups, further 36,275 “non-traditional” reference data a total of more than 10k atomic and molecular points. structures files, as well as a set of software tools for their manipulation, automation of In total, ACCDB includes 191 subsets and 44,931 corresponding jobs, and statistical analysis. Its data points. A brief description of each database primary difference with respect to other recent is presented below, and a summary is in Table 1. large databases of chemical compounds17-20 is We suggest the user of each pre-existing the high level of accuracy at which our data are database to refer to the original publications for obtained. Another key aspect of ACCDB is its more information on the subsets and to give broad applicability to different areas of proper credit to its primary authors. chemistry, including—but not limited to—both

2

Table 1. Summary of all databases included in ACCDB.a

Name of the Number of Unique Reference Brief description of what is included in the database: Ref. database: Structures: Data Points:

MGCDB84 Main Group Chemistry DataBase 5,931 4,985 21

General Main Group, Thermochemistry, Kinetics, and Non- GMTKN 2,639 1,664 22,23 covalent interactions and Mindless Benchmarks

Thermochemistry, Kinetics, Non-Covalent interactions Minnesota 719 471 24,25,29 (Database2015, Database2015A, and Database2015B)

Automatically Generated Reaction Energies from - MN-RE - 9,135 This work Minnesota 2015B

DP284 Moments and Polarizabilities 181 284 26,27

W4-17 Total Atomization Energies 215 1,042 12

- W4-17-RE Automatically Generated Reaction Energies from W4-17 - 27,140 This work

Metals&EE Collection of subsetscontaining metals and excitation energies 364 210 30-40

8,656 ACCDB 10,049 (44,931)b [a] For further details of all subsets of each database—and clarifications on the corresponding reference—see the supporting information. [b] The number in parentheses includes the “non-traditional” RE databases.

MGCDB84. The Main Group Chemistry DataBase database,23 both from Goerigk’s and Grimme’s has been introduced by Mardirossian and Head- groups. GMTKN55 is the extension of two Gordon.21 It is the largest database included in previously published databases by Goerigk et al., ACCDB, with 84 subsets, 5,931 single-point called GMTKN2446 and GMTKN30.47,48 It includes geometries, and 4,985 reference data. Part of it 55 subsets, 2,459 single-point geometries, and was used as training set for new xc 1,499 relative energies. The four areas of functionals,11,41-43 while it has been used as interest are: non-covalent interactions (both benchmark set in its entirety, for assessing the inter- and intra-molecular interactions, 595 performance of the Minnesota functionals,44 as data), basic properties (atomization energies, well as about 200 more functionals.21 Its main and proton affinities, dissociation focus areas are: non-covalent interactions, energies of various compounds, 467 data), which make up almost half of the database thermochemistry (reaction energies and (2,647 data), thermochemistry (1,205 data), isomerization reactions, 243 data), and kinetics isomerization energies (910 data) and kinetics (barrier heights, 194 data). MB08-165 is a (barrier heights: 206 data). Additionally, the database containing 165 “Mindless Benchmark”, electronic energies of the first 18 are also obtained with randomly-generated, artificial included. Quite recently, a statistically significant molecules each containing 8 atoms.23 This version of the MGCDB84 database, called MG8, database was included in both GMTKN24 and has been proposed by Bun Chan.45 GMTKN30, but it was replaced in GMTKN55 by a new database of 43 artificial molecules of 16 GMTKN. This database is composed by the atoms each (MB16-43). Because of its relevance GMTKN55 database,22 and the MB08-165

3 as a chemically “unbiased” test set (the W4-17. W4-1712 is an extension of two previous molecules are not real ones, reducing the databases—namely W4-08,51 and W4-1152— possibility of forecasting the outcome of the developed by Martin’s and Karton’s groups. W4- calculations), we have decided to keep MB08- 17 includes 203 total atomization energies of 165 in our collection, in addition to GMTKN55. As 215 first- and second-row molecules and radicals a comprehensive example of the usage of with up to eight non-hydrogen atoms. We have GMTKN55 in the context of DFT, Goerigk et al.22 also decided to keep the reaction energies performed a rigorous analysis of 217 xc generated in the original paper for the W4-11 functionals (all with and without semi-empirical version of the database,52 thus including 99 data dispersion corrections). Very recently, Goerigk points from the BDE99 subset, 707 reference and his group also used GMTKN55 to assess the data for the HAT707 subset, 20 data points from performance of 29 double-hybrid functionals. 49 ISOMERIZATION20, and 13 reference data from In a similar fashion to what was done by Bun SN13. Each of these subsets have been Chan for MGCDB84, a “diet” version of the generated using molecules as that are in W4-17, GMTKN55 database was recently proposed by therefore no additional computation is needed Tim Gould.50 to evaluate them. The reference energies for W4-17 and its subsets have been obtained using Minnesota. The Minnesota database was the highly accurate Weizmann-4 computational developed in Truhlar’s group for the protocol, and they are guaranteed within a 3σ parametrization of new xc functionals. We confidence intervals of 1 kJ mol-1. included in ACCDB the molecular data in Databases 2015,29 Database2015A,24 and W4-17-RE and MN-RE. These databases are Database2015B25 (we do not include in ACCDB presented here for the first time. Each of them the geometry optimizations and solid-state sets). includes reaction energies that are automatically The first version of Database2015 is updated in generated from the W4-17 and Minnesota both Database2015A—which includes 32 databases, using the autoRE perl script provided subsets, 652 geometries, and 422 reference in Ref. 28 This program generates all the energies—, and Database2015B—which stoichiometrically feasible reactions of the form includes 34 subsets, 719 geometries, and 471 reference energies. The focus areas are: bond A + B → C + D, energies (MGBE150 and TMBE33; 40% of the from a corresponding list of atomization energy. reference data), non-covalent interactions The program automatically excludes all (NC87; 20% of the data), barrier heights (BH76; redundancies, double counting of reverse 16% of the data), thermochemistry reactions, and trivial isomerizations. In this case, (isomerization energies, excitation energies, we started from the entire W4-17 database, and hydrocarbon thermochemistry; 15% of the all atomization energies in the Minnesota data), and basic properties (ionization Database2015B, to obtain the W4-17-RE, with potentials, atomic energies; 9% of the data). 27,140 unique reference data, and the MN-RE, DP284. This database is a collection of two with 9,135 unique reference data. We refer to recent sets introduced by Hait and Head- these databases as “non-traditional” because Gordon, and it is comprised of 181 structures of their usage for the parametrization and small molecules. It includes 152 reference values assessment of electronic structure methods is for dipole moments,26 and 132 reference data currently unexplored, and some statistical noise for polarizabilities,27 obtained at the is expected, due to their large numbers of data. 28 CCSD(T)/CBS level. However, based on the work of Margraf et al., we expect low correlation between these sets

4

and the corresponding atomization energies LTMBH26. The Late Transition Metals Barrier sets, supporting the inclusion of these “non- Heights database includes reactions catalyzed by traditional” sets into ACCDB as independent Au, Pt, and Ir.33 It includes 40 structures, and 26 databases. reaction energies: two are catalyzed by Ir, two are catalyzed by Pt, and 22 are catalyzed by gold. Metals and Excitation Energies. The Metals and Excitation Energies collection is also introduced MOR41. The Metal Organic Reactions database here for the first time, with the goal to expand of Grimme and co-workers34 includes 41 data (95 ACCDB to first-, second-, and third-row transition structures and 13 different transition metals: Ti, metals, as well as actinides from Th to Cm. Cr, Mn, Fe, Co, Ni, Mo, Ru, Rh, Pd, W, Ir, Pt, plus Particular care should be given for this database Al). All structures are carefully chosen to have when selecting an appropriate basis-set (in most single-reference character only. cases, basis sets that include an effective core potentials are required), as well as for issues p-VR17. This database includes valence (one related with stability of the SCF solution, and electron goes from an ns to an np orbital) and proper treatment of spin-contamination. All of Rydberg (one electron goes from an np to an the data have been taken from different sources (n+1)s orbital) excitations of different p-block 35 in the literature, and are divided into the elements and their mono-charged cations. The following eight subsets: elements included are: B, Al, Ga, F, Ne, Cl, Ar, Br and Kr; while the cations are: B+, C+, Al+, Si+, Ga+, 3d-SSIP30. Spin-state (SS) energetics and Ge+, Ne+, Kr+, for a total of 34 atomic structures ionization potentials (IP) of all 10 first-row 3d and 17 reference energies (9 valence + 8 Rydberg 30 TMs (from Sc to Zn). Spin states refers to the excitations). lowest-energy multiplicity-changing excitation energy for each species, and it includes data for Por21. This is a new database presented here for both the neutral and the cations. the first time. It includes spin states and binding Reference energies are experimental energies energies data of porphyrin structures, which are with spin-orbit coupling removed. ubiquitous in nature (the famous heme group is an example). It is divided into two subsets: 4d-SSIP24. This set is analogous to 3d-SSIP30 but PorSS11, which includes the spin-state energy for the first eight second-row 4d TMs (Y, Zr, Nb, differences of three Mn-porphyrins, one Co- 31 Mo, Tc, Ru, Rh, Pd). porphyrin, and seven Fe-porphyrins, bonded to different ligands (NH , OH-, SH-);36 and PorBE10, AIP28. This database includes the ionization 3 which includes the binding energies for the potentials of mono- and dioxides of actinides (Th complexes between a model system of a heme to Cm).32 It includes 42 geometries and 28 group and three diatomic molecules: NO, CO, reference data. All data are calculated at the 37 CASPT2/ANO-RCC (triple-zeta quality) level with and O2 (Figure 1). Por21 includes 32 a CAS(16,14) for the monoxides (14 orbitals for structures, and 21 reference energies obtained the metals, 2 for oxygen), and CAS(14,14) for the at the CASPT2 level (different active spaces have dioxide species. CASPT2 geometries are used as been used for each in the database, reference. their details are given in the original publications,36,37 and are also reported in our TMBH23. The Transition Metal Barrier Heights online repository). database includes reactions catalyzed by Zr, Mo, W and Re.38-40 It includes 49 structures and 23 reaction energies: five are catalyzed by Zr, five are catalyzed by Mo, seven are catalyzed by W, and six are catalyzed by Re.

5 MGCDB84 and Minnesota to their original values, as presented —and extensively used—by the authors of such databases. As last example, we want to discuss the DC13 subset in GMTKN55, which includes 13 problematic reactions for DFT methods. In the previous versions of the GMTKN database, the DC9 subset of Truhlar and co-workers was used. In GMTKN55, however, Goerigk and co-workers replaced it with a newer version that shares only Figure 1. One of the structures in PorBE10, where a Fe- porphyrin is bonded to an imidazole moiety to mimic the one reaction with DC9, but it includes one binding environment in the heme group, and to an O2 reaction in common with the Styrene45 subset, molecule. The binding energy is calculated with respect to and one in common with the C20C24 subset, the entire complex (including the imidazole moiety), and both in MGCDB84. the O2 molecule at infinite separation (carbon is black, hydrogen is white, nitrogen is blue, oxygen is red, iron is For more details of all the subsets of every orange). database, as well as references to their original Additional considerations. For the sake of sources, and a better overview of their overlaps, completeness, we have decided to collect all see also the Supporting Information.7,12,22-27,30- original databases and leave them untouched in 40,46,48,51-174 ACCDB. For this reason, some of the subsets in some databases do overlap with each other, and Automation Software the reference data for some data point in overlapping subsets are (on purpose) not always ACCDB contains 10,049 geometry files—in xyz consistent. Hence, we advise care when using format, and appropriately named, including the entire collection, especially if the reduction charge and spin multiplicity data—collected in of redundancies and/or of the number of one directory called “Geometries”. Each file calculations is a priority. For example, both requires a single-point energy calculation, GMTKN55 and MGCDB84 include the database usually performed with W4-11, which is the previous version of W4-17 software engines, such as Gaussian175 or Q- (with the latter having the most accurate 176 reference data points). In addition, MGCDB84 Chem. These calculations will result in 44,931 heavily relies on data taken from GMTKN30—a unique reference data points, the majority of preceding version of GMTKN55—and therefore which are reaction energies (vide supra). The the overlap between these two databases is reference energies for each database or subset substantial (and again, the most accurate data are reported in csv files that are available in references have to be sought in the latter either E , kcal mol-1, or kJ mol-1. Each reference GMTKN55 database). Another complex situation h is the DBH76 set of barrier heights, which is file also includes the stoichiometry coefficients present in MGCDB84, as well as in GMTKN55, for the reaction in consideration, and reference and also in Minnesota (albeit with different to the corresponding filename in the names, as HTBH38 and NHTBH38). As pointed “Geometries” directory. out in the GMTKN55 paper,22 the most recent— and most accurate—reference data for this As part of ACCDB, we provide a set of tools based subset are the one obtained by Goerigk and co- on snakemake workflows,177 that can be used for workers and included in GMTKN55. However, for the automation of the jobs, the parsing of the consistency purpose, we have kept the values in

6

output and reference files, and the collection of the final . The automation files include: one Snakefile, with the python source code of the workflow; one configuration file (config.yaml), with user-specific configurations that can be simply specified using yaml syntax; and one template file, specific for each quantum chemistry software engine (sample ginput.tmpl and qhcem.tmpl files are provided for Gaussian and Q-Chem, respectively, extension to other programs are straightforward). The lists of the molecules pertaining to each database or set are also provided, and they are used in our workflow to extract the relevant xyz files from the “Geometries” directory. A representation of our software workflow is given in Figure 2. Such workflow will run all calculations on the selected Figure 2. Pictorial representation of the ACCDB workflow. databases (with the desired quantum chemistry engine), parse the output files of all completed We also want to point out that other automated calculations, and collect the results into a single procedures exist, but they significantly differ csv file that can be used to calculate the from ACCDB and the workflow presented above: statistics. More sophisticated statistical data can de la Roza179 gives the molecular geometries, be collected from the output files, with simple reference data, and a pre-compiled Gaussian modifications of the workflow. Instructions on input file for every structure in the GMTKN55 how to interact with—and modify—the database plus others, but his software lacks workflow, as well as details of all the workflow controls and analysis tools; on the configurations available in the yaml file are also other hand, GC3PIE is a useful tool to submit jobs included within the project. All relevant files are to cluster environments, 180 but it lacks an released under the GNU GPL license on accompanying database of geometry files, and Github.178 there is no trace of automatization. ACCDB is a much more general and flexible solution because it provides 10,049 geometry files (in a generic format that can be easily and automatically converted to any quantum chemistry engine input files via a simple customizable template file), a flexible workflow to automate the submission of the jobs and to control their execution from beginning to end (with tools applicable to both cluster environments, or single machines), as well as a parser for the retrieval of the final results, and for the

7 calculation of the statistics on almost 45k same rack: each machine has 1 GB of RAM per reference data points (all of them also provided). thread, comparable motherboards (except for the Epyc processor, which is in a 2xCPU Case Study: ACCDB on AMD Zen- motherboard), the same based CPUs (Ubuntu server 18.04), and the same quantum chemistry engine (Q-Chem 5.1), with all As a simple case study, we used the entire calculations running on one single thread. On the ACCDB for Hartree–Fock (HF) calculations, in one hand, launching multiple single-thread conjunction with the simple 3-21G , as a calculations in parallel (vs. single multiple-thread benchmark for testing the performance of a set calculations in sequence) gives us the chance to of commercially available processors of the Zen measure the true scaling performance of each family, recently introduced onto the market by processor (vs. the scaling performance of the AMD. We chose HF because it is the most routine quantum chemistry engine). On the other hand, self-consistent-field method in computational averaging out on almost 10,000 structures chemistry, while we used the 3-21G minimal eliminates the risk of having a single slow basis set because we needed a small basis set (in calculation that bottlenecks the overall time. order to keep the overall time of completion of Comparable benchmarks could have been each calculation manageable), and because it is achieved using a single—sufficiently large— defined for most of the periodic table. The calculation ran in parallel multiple times, but calculations range in size from H atom (3 basis using the entire database gives us the functions), to a complex of Ru in MOR41 with opportunity to understand the behavior of the 120 atoms (640 basis functions). It’s important to CPUs under conditions that are closer to a real remark that the purpose of this test is not to day-to-day research environment, and to evaluate the accuracy of the method/basis-set extrapolate “educated guesses” on the timings combinations, but rather the speed of the CPUs. for more advanced methods and basis sets Hence, we will not report the accuracies of the combinations, by just adding the scaling calculations with respect to the unique reference information of the quantum chemistry engine. energies. Single-thread Speed. We analyze first the single- We used three newly acquired (August 2018) thread performance of each CPU, by collecting machines equipped with similar hardware, but the time (in minutes) it takes to run the entire three different Zen-family CPUs. The processors ACCDB database on each machine with have all been recently released (between 2017 individual calculations running in sequence on a and 2018), they have similar price per thread, single-thread (ACCDB-st in Table 2). This number and they cover different categories in the high- shows that the faster CPU for single-thread end market range of AMD Zen-based CPUs: the performance is the Ryzen 7, with the Epyc consumer category (Ryzen 7 2700X), the coming last. Perhaps not surprisingly, these prosumer category (Ryzen Threadripper 1950X), numbers correlate well with the base clock rates and the professional category (Epyc 7281). The of each CPU. Unfortunately, though, ACCDB-st configuration for each machine is as similar as doesn’t represent parallel performance (multi- possible, and all machines are mounted on the thread scaling), which is more important for

8

evaluating the performance of CPUs in research both Ryzen, and Threadripper (the optimal ratio environments, since the vast majority of for both is only up to 4x). calculations are usually performed in parallel on Table 3. Multi-thread scaling performance of each CPU.a multiple threads/cores. Ryzen Threadripper Epyc Table 2. Summary of the different machines equipped with Time Time Load:b R.I.c Time [min] R.I.c R.I.c AMD CPUs, and single-thread benchmark results. [min] [min] Machine CPU Cores/ Base ACCDB- 1x 974 - 1200 - 1508 - Threads Clock st (min) 2x 513 1.9 719 1.7 716 2.1 Rate 4x 262 3.7 306 3.9 346 4.4 (GHz) 8x 235 4.1 256 4.7 233 6.5 Ryzen Ryzen 7 8/16 3.7 974 16x 236 4.1 174 6.9 238 6.3 2700X 32x 186 6.4 237 6.4 64x 247 6.1 Threadripper Ryzen 16/32 3.4 1200 [a] Time in minutes to run the entire ACCDB. [b] Full loads for Threadripper each machine are: Ryzen 16x, Threadripper 32x, 2xEpyc 64x. 1950X [c] Relative improvement over single-thread performance. Epyc 2xEpyc 7281 32/64a 2.1 1508 [a] 16/32 per CPU in a dual-CPU server configuration. The server processor Epyc does scale better up to 8x, but its poor single-thread performance is Multi-thread Scaling. In order to seek for the limiting its results significantly. The deviation of best performer with optimal compromise the R.I. from the ideal value at moderately high between single-thread speed and multi-thread number of threads is an indication that the load scaling, we expanded our calculations on the that the quantum chemistry engine puts on the entire ACCDB to include parallel calculations on cores is very high, and the unexpected each CPU. We started from ACCDB-st, with single degradation of the performance at full-load puts calculations ran in sequence (1x), and performed into question the use of virtual cores for subsequent runs of the full database doubling quantum chemistry calculations (results at 16x the number of calculations ran in parallel at each for Ryzen, at 32x for Threadripper, and at 64x for run (2x -> 4x -> 8x -> etc.), until full load is Epyc are all worse than the previous step for reached for each machine (again, each individual each processor). calculation is always a simple single-thread Q- Chem calculation, what changes is the number of Is core virtualization useful for quantum simultaneous Q-Chem calculations on different chemistry? Simultaneous multi-threading (SMT, threads). These detailed scaling results are sometimes also called hyper-threading) is a reported in Table 3, as well as in the plots of popular way to increase the total number of Figure 3. Our best overall result for the entire cores seen by the operating system, by ACCDB test is obtained with Threadripper at 16x virtualizing multiple threads on one physical (174 minutes), however its relative core. Dual-threading (virtualization of two improvement over the single-thread threads on a single physical core) is now performance—which ideally should be close to becoming the de facto standard for all new 16 for this case—is only 6.9. We found commercial processors introduced onto the degradation of performance when higher market by both Intel and AMD, but considering number of threads are used with every our multi-thread scaling results reported above, processor. The relative improvement (R.I.) starts a reasonable question arises: should we use SMT to deviate for its ideal value surprisingly early for for quantum chemistry calculations? In Figure 3

9 we report our results for multi-core scaling with results. In light of these results, the answer to virtualization turned off, compared with the the question that we pose as the title of this previous results with virtualization on. section is rather simple: SMT does not present any advantage for quantum chemistry calculations, and we suggest to turn it off. (This is just the simplest strategy to avoid overfilling of threads, with the resulting significant degradation of the performance.) For machines where SMT is turned on by default—for example on shared or centers—the best scaling performance can be achieved when at least half of the virtual threads for each physical core are left empty, and particular care should be given to not allow those virtual threads to perform any work (e.g. by reserving twice the amount of threads than those effectively used by the calculations).

CPUs Recommendations. For calculations that are generally small (i.e. small molecules, small basis sets), the less expensive Ryzen 7 is the best choice among the CPUs we tested, mostly because its single-thread performance is the best, and because it scales reasonably well up to 4 threads/cores. Interestingly enough, because of the higher single-thread speed, the performance of Ryzen 7 on our benchmark at 4 threads, is similar to that of Threadripper at 8 threads. Threadripper becomes a feasible choice Figure 3. Parallel performance of the three CPUs under only for overall computation time, since our best investigation (Ryzen 7: Blue, top panel; Threadripper: Red, result on the ACCDB benchmark was obtained middle panel; Epyc: Green, bottom panel), with simultaneous multi-threading (SMT) turned on (lighter bars, with Threadripper at 16 threads (174 minutes). background) or off (darker bars, forefront), as a function of However, the relative improvement over the the number of threads/cores (the value of our ACCDB multi- single-thread performance is not quite as good thread scaling benchmark is also reported on each bar). as we expected, and if 16 cores are not needed Despite the obvious loss of half of the threads of on one machine by the quantum chemistry each CPU, it seems clear that the results don’t engine itself, spreading the calculations on four change much, as long as at least half of the independent Ryzen 7 machines working at four virtual threads are empty. The main difference is threads each, will result in less than half overall at full load, for which the processors with SMT computation time over the best Threadripper turned on have a sensible degradation of the result (with a moderately higher organizational

10

effort to split the calculations, but at much lower when single-thread performance, or moderate retail prices). If calculations are large enough to scaling ability is required (small calculations), require multiple threads on a single machine, the Ryzen 7 CPUs are the best choice. For very large calculations, where a high number of cores is Epyc CPUs become competitive, because they required on an individual machine, the Epyc allow to build individual machines with a processors have a clear advantage. Despite significantly higher number of cores, and they Threadripper being the overall fastest processor have slightly better scaling performances than in our benchmarks, it is hard to recommend it for the two consumer CPUs we tested. Finally, in quantum chemistry calculations, mostly because regards with virtualization of threads, we better results can be achieved with multiple suggest to turn SMT off for quantum chemistry (less-expensive) individual Ryzen 7 machines, or with a more tailored usage of one Epyc server. calculations, in machines where this can be Despite the apparent advantage of doubling the done. In cases where this is not possible—for number of available threads, the use of example on a shared cluster—the strategy that simultaneous multi-threading (virtualization of will provide the best results is to request twice cores) is highly discouraged on all tested CPUs. the amount of virtual threads, and leave half of Finally, ACCDB is made available to the them unused. community, in the hope that it will be useful for different applications in many areas of Conclusions computational chemistry, including development of new semi-empirical methods, In the present article we introduced a large and assessment of existing ones. collection of computational chemistry databases (ACCDB), and the software tools that can be used Acknowledgments to interact with them. ACCDB includes 44,931 unique reference data points, all at a level of The authors are grateful to the many theory significantly higher than DFT. The data who made our aggregation of such a large covers all first four rows of the periodic table (H number of high-level calculated data possible, through Kr), most of the fifth row (W, Re, Ir, Pt, Au, Pb, Bi), and some actinides (Th through Cm). with special shout-outs to Donald G. Truhlar, ACCDB is composed of five databases taken from Martin Head-Gordon, Stefan Grimme, Jan M.L. literature: GMTKN, MGCDB84, Minnesota, Martin, Amir Karton, Lars Goerigk, and their DP284, and W4-17, plus two newly developed entire groups. R.P. is also grateful to Narbe reaction energy databases, presented here for Mardirossian, Haoyu Yu, Xiao He, and Pragya the first time and called W4-17-RE and MN-RE, Verma for their help in locating the correct and a collection of databases containing transition metals, also new to this article. structures for some subsets, and to Vladimir Konjkov for the suggestion of snakemake as a A set of expandable software tools for the suitable tool for the development of the interaction with ACCDB, its manipulation, and workflow. calculation of statistical data, is also presented here for the first time, and based on the Keywords: Database, Benchmarks, DFT, WFT, snakemake workflow language. semi-empirical methods Our case study also provides important insights Additional Supporting Information may be on the performance of modern AMD CPUs, for routine quantum chemical calculations. The found in the online version of this article main results can be summarized as follows:

11 References 19. J.-L. Reymond, Acc. Chem. Res., 2015, 48, 722–730. 1. J. A. Pople, M. Head-Gordon, D. J. Fox, K. 20. M. Nakata and T. Shimazaki, J. Chem. Inf. Raghavachari and L. A. Curtiss, J. Chem. Model., 2017, 57, 1300–1308. Phys., 1998, 90, 5622–5629. 21. N. Mardirossian and M. Head-Gordon, Mol. 2. L. A. Curtiss, C. Jones, G. W. Trucks, K. Phys., 2017, 115, 2315–2372. Raghavachari and J. A. Pople, J. Chem. Phys., 22. L. Goerigk, A. Hansen, C. Bauer, S. Ehrlich, 1998, 93, 2537–2545. A. Najibi and S. Grimme, Phys. Chem. Chem. 3. W. Thiel, WIREs Comput Mol Sci, 2013, 4, Phys., 2017, 19, 32184–32215. 145–157. 23. M. Korth and S. Grimme, J. Chem. Theory 4. L. A. Curtiss, P. C. Redfern and K. Comput., 2009, 5, 993–1003. Raghavachari, WIREs Comput Mol Sci, 2011, 24. H. S. Yu, X. He and D. G. Truhlar, J. Chem. 1, 810–825. Theory Comput., 2016, 12, 1280–1293. 5. P. L. Fast, J. C. Corchado, M. L. Sánchez and 25. H. S. Yu, X. He, S. L. Li and D. G. Truhlar, D. G. Truhlar, J. Phys. Chem. A, 1999, 103, Chem. Sci., 2016, 7, 5032–5051. 5129–5136. 26. D. Hait and M. Head-Gordon, J. Chem. 6. J. P. Perdew, A. Ruzsinszky, J. Tao, V. N. Theory Comput., 2018, 14, 1969–1981. Staroverov, G. E. Scuseria and G. I. Csonka, 27. D. Hait and M. Head-Gordon, Phys. Chem. J. Chem. Phys., 2005, 123, 062201. Chem. Phys., 2018, 20, 19800–19810. 7. R. Peverati and D. G. Truhlar, Phil Trans R 28. J. T. Margraf, D. S. Ranasinghe and R. J. Soc A, 2014, 372, 20120476. Bartlett, Phys. Chem. Chem. Phys., 2017, 19, 8. M. G. Medvedev, I. S. Bushmarinov, J. Sun, 9798–9805. J. P. Perdew and K. A. Lyssenko, Science, 29. H. S. Yu, W. Zhang, P. Verma, X. He and D. 2017, 355, 49–52. G. Truhlar, 2015, 17, 12146–12160. 9. K. P. Kepp, Science, 2017, 356, 496b. 30. S. Luo, B. Averkiev, K. R. Yang, X. Xu and D. 10. M. G. Medvedev, I. S. Bushmarinov, J. Sun, G. Truhlar, J. Chem. Theory Comput., 2014, J. P. Perdew and K. A. Lyssenko, Science, 10, 102–121. 2017, 356, 496c. 31. S. Luo and D. G. Truhlar, J. Chem. Theory 11. N. Mardirossian and M. Head-Gordon, Phys. Comput., 2012, 8, 4112–4126. Chem. Chem. Phys., 2014, 16, 9904–9924. 32. B. B. Averkiev, M. Mantina, R. Valero, I. 12. A. Karton, N. Sylvetsky and J. M. L. Martin, J. Infante, A. Kovacs, D. G. Truhlar and L. Comput. Chem., 2017, 38, 2063–2075. Gagliardi, Theor. Chem. Acc., 2011, 129, 13. J. C. Grossman, J. Chem. Phys., 2002, 117, 657–666. 1434–1440. 33. R. Kang, W. Lai, J. Yao, S. Shaik and H. Chen, 14. M. Korth, A. Luechow and S. Grimme, J. J. Chem. Theory Comput., 2012, 8, 3119– Phys. Chem. A, 2007, 112, 2104–2109. 3127. 15. N. Nemec, M. D. Towler and R. J. Needs, J. 34. S. Dohm, A. Hansen, M. Steinmetz, S. Chem. Phys., 2010, 132, 034111–7. Grimme and M. P. Checinski, J. Chem. 16. F. Brockherde, L. Vogt, L. Li, M. E. Theory Comput., 2018, 14, 2596–2608. Tuckerman, K. Burke and K.-R. Müller, Nat. 35. K. Yang, R. Peverati, D. G. Truhlar and R. Comm., 2017, 1–10. Valero, J. Chem. Phys., 2011, 135, 044118– 17. M. Álvarez-Moreno, C. de Graaf, N. López, 044118–22. F. Maseras, J. M. Poblet and C. Bo, J. Chem. 36. K. Pierloot, Q. M. Phung and A. Domingo, J. Inf. Model., 2015, 55, 95–103. Chem. Theory Comput., 2017, 13, 537–553. 18. R. Ramakrishnan, P. O. Dral, M. Rupp and O. 37. M. Radoń and K. Pierloot, J. Phys. Chem. A, A. von Lilienfeld, Sci. Data, 2014, 1, 191–7. 2008, 112, 11824–11832.

12

38. Y. Sun and H. Chen, J. Chem. Theory 59. L. A. Curtiss, K. Raghavachari, P. C. Redfern Comput., 2013, 9, 4735–4743. and J. A. Pople, J. Chem. Phys., 1997, 106, 39. L. Hu and H. Chen, J. Chem. Theory Comput., 1063. 2015, 11, 4601–4614. 60. Y. Zhao, B. J. Lynch and D. G. Truhlar, Phys. 40. Y. Sun and H. Chen, J. Chem. Theory Chem. Chem. Phys., 2005, 7, 43–52. Comput., 2014, 10, 579–588. 61. Y. Zhao, N. González-García and D. G. 41. N. Mardirossian and M. Head-Gordon, J. Truhlar, J. Phys. Chem. A, 2005, 109, 2012– Chem. Phys., 2015, 142, 074111–32. 2018. 42. N. Mardirossian and M. Head-Gordon, J. 62. J. Friedrich and J. Hänchen, J. Chem. Theory Chem. Phys., 2016, 144, 214110–23. Comput., 2013, 9, 5381–5394. 43. N. Mardirossian and M. Head-Gordon, J. 63. J. Friedrich, J. Chem. Theory Comput., 2015, Chem. Phys., 2018, 148, 241736–14. 11, 3596–3609. 44. N. Mardirossian, M. Head-Gordon, J. Chem. 64. Y. Zhao and D. G. Truhlar, Theor. Chem. Theory Comput., 2016, 12, 4303–4325. Acc., 2008, 120, 215–241. 45. B. Chan, J. Chem. Theory Comput., 2018, 14, 65. S. Grimme, J. Chem. Phys., 2006, 124, 4254–4262. 034108. 46. L. Goerigk and S. Grimme, J. Chem. Theory 66. S. Grimme, C. Mück-Lichtenfeld, E.-U. Comput., 2010, 6, 107–126. Würthwein, A. W. Ehlers, T. P. M. Goumans 47. L. Goerigk and S. Grimme, 2011, 13, 6670– and K. Lammertsma, J. Phys. Chem. A, 2006, 6688. 110, 2583–2586. 48. L. Goerigk and S. Grimme, J. Chem. Theory 67. M. Piacenza and S. Grimme, J. Comput. Comput., 2011, 7, 291–309. Chem., 2004, 25, 83–99. 49. N. Mehta, M. Casanova-Páez and L. Goerigk, 68. H. L. Woodcock, H. F. Schaefer III and P. R. Phys. Chem. Chem. Phys., 2018, 20, 23175– Schreiner, J. Phys. Chem. A, 2002, 106, 23194. 11923–11931. 50. T. Gould, ChemRxiv, 2018. 69. P. R. Schreiner, A. A. Fokin, R. A. Pascal and 51. A. Karton, A. Tarnopolsky, J.-F. Lamere, G. C. A. de Meijere, Org. Lett., 2006, 8, 3635– Schatz and J. M. L. Martin, J. Phys. Chem. A, 3638. 2008, 112, 12868–12886. 70. C. Lepetit, H. Chermette, M. Gicquel, J.-L. 52. A. Karton, S. Daon and J. M. L. Martin, Heully and R. Chauvin, J. Phys. Chem. A, Chem. Phys. Lett., 2011, 510, 165–178. 2007, 111, 136–149. 53. L. Curtiss, K. Raghavachari, G. W. Trucks and 71. J. S. Lee, J. Phys. Chem. A, 2005, 109, J. A. Pople, J. Chem. Phys., 1990, 94, 7221– 11927–11932. 7230. 72. A. Karton and J. M. L. Martin, Mol. Phys., 54. S. Parthiban and J. M. L. Martin, J. Chem. 2012, 110, 2477–2491. Phys., 2001, 114, 6014–6029. 73. Y. Zhao, O. Tishchenko, J. R. Gour, W. Li, J. J. 55. Y. Zhao and D. G. Truhlar, J. Phys. Chem. A, Lutz, P. Piecuch and D. G. Truhlar, J. Phys. 2006, 110, 10478–10486. Chem. A, 2009, 113, 5786–5799. 56. H. Yu and D. G. Truhlar, J. Chem. Theory 74. D. Manna and J. M. L. Martin, J. Phys. Chem. Comput., 2015, 11, 2968–2983. A, 2015, 120, 153–160. 57. Y. Zhao, H. T. Ng, R. Peverati and D. G. 75. E. R. Johnson, P. Mori-Sanchez, A. J. Cohen Truhlar, J. Chem. Theory Comput., 2012, 8, and W. Yang, J. Chem. Phys., 2008, 129, 2824–2834. 204112. 58. S. Grimme, H. Kruse, L. Goerigk and G. 76. F. Neese, T. Schwabe, S. Kossmann, B. Erker, Angew. Chem. Int. Ed., 2010, 49, Schirmer and S. Grimme, J. Chem. Theory 1402–1405. Comput., 2009, 5, 3060–3073.

13 77. S. N. Steinmann, G. I. Csonka and C. 97. J. Řezáč, K. E. Riley and P. Hobza, J. Chem. Corminboeuf, J. Chem. Theory Comput., Theory Comput., 2012, 8, 4285–4292. 2008, 5, 2950–2958. 98. K. U. Lao, R. Schäffer, G. Jansen and J. M. 78. H. Krieg and S. Grimme, Mol. Phys., 2010, Herbert, J. Chem. Theory Comput., 2015, 11, 108, 2655–2666. 2473–2486. 79. L.-J. Yu and A. Karton, Chem Phys, 2014, 99. T. Schwabe and S. Grimme, Phys. Chem. 441, 166–177. Chem. Phys., 2007, 9, 3397–3406. 80. S. Grimme, M. Steinmetz and M. Korth, J 100. S. Grimme, Angew Chem Int Edit, 2006, 45, Org Chem, 2007, 72, 2118–2126. 4460–4464. 81. R. Huenerbein, B. Schirmer, J. Moellmann 101. D. Gruzman, A. Karton and J. M. L. Martin, and S. Grimme, 2010, 12, 6940–6948. J. Phys. Chem. A, 2008, 113, 11974–11983. 82. R. Sure, A. Hansen, P. Schwerdtfeger and S. 102. M. K. Kesharwani, A. Karton and J. M. L. Grimme, Phys. Chem. Chem. Phys., 2017, Martin, J. Chem. Theory Comput., 2015, 12, 19, 14296–14305. 444–454. 83. V. Guner, K. S. Khuong, A. G. Leach, P. S. 103. D. Řeha, H. Valdés, J. Vondrášek, P. Hobza, Lee, M. D. Bartberger and K. N. Houk, J. A. Abu-Riziq, B. Crews and M. S. de Vries, Phys. Chem. A, 2003, 107, 11445–11459. Chem-Eur J, 2005, 11, 6803–6817. 84. D. H. Ess and K. N. Houk, J. Phys. Chem. A, 104. L. Goerigk, A. Karton, J. M. L. Martin and L. 2005, 109, 9542–9553. Radom, 2013, 15, 7028. 85. T. C. Dinadayalane, R. Vijaya, A. Smitha and 105. U. R. Fogueri, S. Kozuch, A. Karton and J. G. N. Sastry, J. Phys. Chem. A, 2002, 106, M. L. Martin, J. Phys. Chem. A, 2013, 117, 1627–1633. 2269–2277. 86. L. Goerigk, R. Sharma, Can. J. Chem., 2016, 106. G. I. Csonka, A. D. French, G. P. Johnson 94, 1133–1143. and C. A. Stortz, J. Chem. Theory Comput., 87. A. Karton, R. J. O’Reilly, B. Chan and L. 2009, 5, 679–692. Radom, J. Chem. Theory Comput., 2012, 8, 107. H. Kruse, A. Mladek, K. Gkionis, A. Hansen, 3128–3136. S. Grimme and J. Sponer, J. Chem. Theory 88. A. Karton, R. J. O’Reilly and L. Radom, J. Comput., 2015, 11, 4972–4991. Phys. Chem. A, 2012, 116, 4211–4221. 108. S. Kozuch, S. M. Bachrach and J. M. L. 89. S. Grimme, J. Antony, S. Ehrlich and H. Martin, J. Phys. Chem. A, 2014, 118, 293– Krieg, J. Chem. Phys., 2010, 132, 154104. 303. 90. P. Jurečka, J. Sponer, J. Cerny and P. Hobza, 109. J. Řezáč and P. Hobza, J. Chem. Theory 2006, 8, 1985–1993. Comput., 2013, 9, 2151–2155. 91. M. S. Marshall, L. A. Burns and C. D. Sherrill, 110. B. J. Mintz and J. M. Parks, J. Phys. Chem. J. Chem. Phys., 2011, 135, 194102–11. A, 2012, 116, 1086–1092. 92. J. Řezáč, K. E. Riley and P. Hobza, J. Chem. 111. J. Řezáč and P. Hobza, J. Chem. Theory Theory Comput., 2011, 7, 2427–2438. Comput., 2012, 8, 141–151. 93. V. S. Bryantsev, M. S. Diallo, A. C. T. van 112. J. C. Faver, M. L. Benson, X. He, B. P. Duin and W. A. Goddard, J. Chem. Theory Roberts, B. Wang, M. S. Marshall, M. R. Comput., 2009, 5, 1016–1026. Kennedy, C. D. Sherrill and K. M. Merz Jr., J. 94. D. Manna, M. K. Kesharwani, N. Sylvetsky Chem. Theory Comput., 2011, 7, 790–797. and J. M. L. Martin, J. Chem. Theory 113. E. G. Hohenstein and C. D. Sherrill, J. Phys. Comput., 2017, 13, 3136–3152. Chem. A, 2009, 113, 878–886. 95. D. Setiawan, E. Kraka and D. Cremer, J. 114. C. D. Sherrill, T. Takatani and E. G. Phys. Chem. A, 2015, 119, 1642–1656. Hohenstein, J. Phys. Chem. A, 2009, 113, 96. S. Kozuch and J. M. L. Martin, J. Chem. 10146–10159. Theory Comput., 2013, 9, 1918–1931.

14

115. T. Takatani and C. David Sherrill, Phys. 134. P. R. Tentscher and J. S. Arey, J. Chem. Chem. Chem. Phys., 2007, 9, 6106–6114. Theory Comput., 2013, 9, 1568–1579. 116. J. Witte, M. Goldey, J. B. Neaton and M. 135. A. Bauzá, I. Alkorta, A. Frontera and J. Head-Gordon, J. Chem. Theory Comput., Elguero, J. Chem. Theory Comput., 2013, 9, 2015, 11, 1481–1492. 5201–5210. 117. D. L. Crittenden, J. Phys. Chem. A, 2009, 136. A. Otero-de-la-Roza, E. R. Johnson and G. 113, 1663–1669. A. Dilabio, J. Chem. Theory Comput., 2014, 118. K. L. Copeland and G. S. Tschumper, J. 10, 5436–5447. Chem. Theory Comput., 2012, 8, 1646– 137. S. N. Steinmann, C. Piemontesi, A. Delachat 1656. and C. Corminboeuf, J. Chem. Theory 119. D. G. A. Smith, P. Jankowski, M. Slawik, H. Comput., 2012, 8, 1629–1640. A. Witek and K. Patkowski, J. Chem. Theory 138. A. Karton, D. Gruzman and J. M. L. Martin, Comput., 2014, 10, 3140–3150. J. Phys. Chem. A, 2009, 113, 8434–8447. 120. J. Řezáč, K. E. Riley and P. Hobza, J. Chem. 139. J. J. Wilke, M. C. Lind, H. F. Schaefer III, A. Theory Comput., 2011, 7, 3466–3470. G. Császár and W. D. Allen, J. Chem. Theory 121. J. Řezáč, Y. Huang, P. Hobza and G. J. O. Comput., 2009, 5, 1511–1523. Beran, J. Chem. Theory Comput., 2015, 11, 140. J. M. L. Martin, J. Phys. Chem. A, 2013, 117, 3065–3079. 3118–3132. 122. J. Granatier, M. Pitonak and P. Hobza, J. 141. S. Yoo, E. Apra, X. C. Zeng and S. S. Chem. Theory Comput., 2012, 8, 2282– Xantheas, J. Phys. Chem. Lett., 2010, 1, 2292. 3122–3127. 123. S. Li, D. G. A. Smith and K. Patkowski, Phys. 142. L.-J. Yu, F. Sarrami, A. Karton and R. J. Chem. Chem. Phys., 2015, 17, 16560–16574. O’Reilly, Mol. Phys., 2014, 113, 1284–1296. 124. A. D. Boese, J. Chem. Theory Comput., 143. B. J. Lynch, Y. Zhao and D. G. Truhlar, J. 2013, 9, 4403–4413. Phys. Chem. A, 2003, 107, 1384–1388. 125. A. D. Boese, Mol. Phys., 2015, 113, 1618– 144. R. J. O’Reilly and A. Karton, Int. J. Quantum 1629. Chem., 2015, 116, 52–60. 126. A. D. Boese, Chem. Phys. Chem., 2015, 16, 145. A. Karton, P. R. Schreiner and J. M. L. 978–985. Martin, J. Comput. Chem., 2015, 37, 49–58. 127. K. U. Lao and J. M. Herbert, J. Phys. Chem. 146. A. Karton and L. Goerigk, 2015, 36, 622– A, 2015, 119, 235–252. 632. 128. K. U. Lao and J. M. Herbert, J. Chem. Phys., 147. L.-J. Yu, F. Sarrami, R. J. O’Reilly and A. 2013, 139, 034107–17. Karton, Chem Phys, 2015, 458, 1–8. 129. B. Temelso, K. A. Archer and G. C. Shields, 148. J. Zheng, Y. Zhao and D. G. Truhlar, J. J. Phys. Chem. A, 2011, 115, 12034–12046. Chem. Theory Comput., 2007, 3, 569–582. 130. N. Mardirossian, D. S. Lambrecht, L. 149. L.-J. Yu, F. Sarrami, R. J. O’Reilly and A. McCaslin, S. S. Xantheas and M. Head- Karton, Mol. Phys., 2015, 114, 21–33. Gordon, J. Chem. Theory Comput., 2013, 9, 150. S. Chakravorty, S. Gwaltney, E. R. 1368–1380. Davidson, F. Parpia and C. Fischer, Phys. 131. B. Chan, A. T. B. Gilbert, P. M. W. Gill and L. Rev. A, 1993, 47, 3649–3670. Radom, J. Chem. Theory Comput., 2014, 10, 151. K. T. Tang and J. P. Toennies, J. Chem. 3777–3783. Phys., 2003, 118, 4976–4983. 132. G. S. Fanourgakis, E. Apra and S. S. 152. W. Zhang, D. G. Truhlar and M. Tang, J. Xantheas, J. Chem. Phys., 2004, 121, 2655– Chem. Theory Comput., 2013, 9, 3965– 2663. 3977. 133. T. Anacker and J. Friedrich, 2014, 35, 634– 153. Y. Zhao, N. E. Schultz and D. G. Truhlar, J. 643. Chem. Theory Comput., 2006, 2, 364–382.

15 154. R. Peverati and D. G. Truhlar, J. Chem. 175. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. Phys., 2011, 135, 191102. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. 155. X. Li, X. Xu, X. You and D. G. Truhlar, J. Scalmani, V. Barone, G. A. Petersson, H. Phys. Chem. A, 2016, 120, 4025–4036. Nakatsuji, X. Li, M. Caricato, A. V. Marenich, 156. B. B. Averkiev, Y. Zhao and D. G. Truhlar, J. Bloino, B. G. Janesko, R. Gomperts, B. ‘Journal of Molecular . A, Mennucci, H. P. Hratchian, J. V. Ortiz, A. F. Chemical’, 2010, 324, 80–88. Izmaylov, J. L. Sonnenberg, D. Williams- 157. X. Xu, W. Zhang, M. Tang and D. G. Truhlar, Young, F. Ding, F. Lipparini, F. Egidi, J. J. Chem. Theory Comput., 2015, 11, 2036– Goings, B. Peng, A. Petrone, T. Henderson, 2052. D. Ranasinghe, V. G. Zakrzewski, J. Gao, N. 158. C. E. Hoyer, G. Li Manni, D. G. Truhlar and Rega, G. Zheng, W. Liang, M. Hada, M. L. Gagliardi, J. Chem. Phys., 2014, 141, Ehara, K. Toyota, R. Fukuda, J. Hasegawa, 204309. M. Ishida, T. Nakajima, Y. Honda, O. Kitao, 159. O. A. Vydrov and T. van Voorhis, J. Chem. H. Nakai, T. Vreven, K. Throssell, J. A. J. Theory Comput., 2012, 8, 1929–1934. Montgomery, J. E. Peralta, F. Ogliaro, M. J. 160. K. M. de Lange and J. R. Lane, J. Chem. Bearpark, J. J. Heyd, E. N. Brothers, K. N. Phys., 2011, 134, 034301–10. Kudin, V. N. Staroverov, T. A. Keith, R. 161. J. D. McMahon and J. R. Lane, J. Chem. Kobayashi, J. Normand, K. Raghavachari, A. Phys., 2011, 135, 154309–10. P. Rendell, J. C. Burant, S. S. Iyengar, J. 162. H. Yu and D. G. Truhlar, J. Chem. Theory Tomasi, M. Cossi, J. M. Millam, M. Klene, C. Comput., 2014, 10, 2291–2305. Adamo, R. Cammi, J. W. Ochterski, R. L. 163. T. Schwabe, Phys. Chem. Chem. Phys. Martin, K. Morokuma, O. Farkas, J. B. 2014, 16, 14559–12. Foresman and D. J. Fox, Gaussian, Inc.,, 164. S. Luo, Y. Zhao and D. G. Truhlar, 2011, 13, 2016, Gaussian 16, Revision A.01. 13683–13689. 176. Y. Shao, Z. Gan, E. Epifanovsky, A. T. B. 165. Y. Zhao, N. E. Schultz and D. G. Truhlar, J. Gilbert, M. Wormit, J. Kussmann, A. W. Chem. Phys., 2005, 123, 161103. Lange, A. Behn, J. Deng, X. Feng, D. Ghosh, 166. Y. Zhao and D. G. Truhlar, J. Chem. Phys., M. Goldey, P. R. Horn, L. D. Jacobson, I. 2006, 125, 194101. Kaliman, R. Z. Khaliullin, T. Kus, A. Landau, J. 167. R. Peverati, Y. Zhao and D. G. Truhlar, J. Liu, E. I. Proynov, Y. M. Rhee, R. M. Richard, Phys. Chem. Lett., 2011, 2, 1991–1997. M. A. Rohrdanz, R. P. Steele, E. J. 168. Y. Zhao and D. G. Truhlar, J. Phys. Chem. A, Sundstrom, H. L. Woodcock, P. M. 2005, 109, 5656–5667. Zimmerman, D. Zuev, B. Albrecht, E. Alguire, 169. R. Peverati and D. G. Truhlar, 2012, 14, B. Austin, G. J. O. Beran, Y. A. Bernard, E. 13171–13174. Berquist, K. Brandhorst, K. B. Bravaya, S. T. 170. R. Li, R. Peverati, M. Isegawa and D. G. Brown, D. Casanova, C. M. Chang, Y. Chen, Truhlar, J. Phys. Chem. A, 2012, 117, 169– S. H. Chien, K. D. Closser, D. L. Crittenden, 173. M. Diedenhofen, R. J. DiStasio, H. Do, A. D. 171. R. Peverati and D. G. Truhlar, J. Chem. Dutoi, R. G. Edgar, S. Fatehi, L. Fusti-Molnar, Theory Comput., 2012, 8, 2310–2319. A. Ghysels, A. Golubeva-Zadorozhnaya, J. 172. R. Peverati and D. G. Truhlar, J. Phys. Gomes, M. W. D. Hanson-Heine, P. H. P. Chem. Lett., 2012, 3, 117–124. Harbach, A. W. Hauser, E. G. Hohenstein, Z. 173. J. Zheng, Y. Zhao and D. G. Truhlar, J. C. Holden, T. C. Jagau, H. Ji, B. Kaduk, K. Chem. Theory Comput., 2009, 5, 808–821. Khistyaev, J. Kim, J. Kim, R. A. King, P. 174. Y. Zhao and D. G. Truhlar, J. Chem. Theory Klunzinger, D. Kosenkov, T. Kowalczyk, C. M. Comput., 2005, 1, 415–432. Krauter, K. U. Lao, A. Laurent, K. V. Lawler, S. V. Levchenko, C. Y. Lin, F. Liu, E. Livshits,

16

R. C. Lochan, A. Luenser, P. Manohar, S. F. R. Baer, A. T. Bell, N. A. Besley, J.-D. Chai, A. Manzer, S. P. Mao, N. Mardirossian, A. V. Dreuw, B. D. Dunietz, T. R. Furlani, S. R. Marenich, S. A. Maurer, N. J. Mayhall, E. Gwaltney, C. P. Hsu, Y. Jung, J. Kong, D. S. Neuscamman, C. M. Oana, R. Olivares- Lambrecht, W. Liang, C. Ochsenfeld, V. A. Amaya, D. P. ONeill, J. A. Parkhill, T. M. Rassolov, L. V. Slipchenko, J. E. Subotnik, T. Perrine, R. Peverati, A. Prociuk, D. R. Rehn, Van Voorhis, J. M. Herbert, A. I. Krylov, P. E. Rosta, N. J. Russ, S. M. Sharada, S. M. W. Gill and M. Head-Gordon, Mol. Phys., Sharma, D. W. Small, A. Sodt, T. Stein, D. 2015, 113, 184–215. Stuck, Y. C. Su, A. J. W. Thom, T. 177. J. Koster and S. Rahmann, , Tsuchimochi, V. Vanovschi, L. Vogt, O. 2012, 28, 2520–2522. Vydrov, T. Wang, M. A. Watson, J. Wenzel, 178.https://github.com/peverati/ACCDB A. White, C. F. Williams, J. Yang, S. Yeganeh, 179.https://github.com/aoterodelaroza/refdata S. R. Yost, Z. Q. You, I. Y. Zhang, X. Zhang, Y. 180. S. Maffioletti and R. Murri, EGICF12- Zhao, B. R. Brooks, G. K. L. Chan, D. M. EMITC2 Conf. Proc., 2012, 162. Chipman, C. J. Cramer, W. A. Goddard, M. S. Gordon, W. J. Hehre, A. Klamt, H. F. Schaefer, M. W. Schmidt, C. D. Sherrill, D. G. Truhlar, A. Warshel, X. Xu, A. Aspuru-Guzik,

17 Supporting Information for “ACCDB: A Collection of Chemistry DataBases for Broad Computational Purposes”

P. Morgante, R. Peverati Chemistry Program, Florida Institute of Technology 150 W. University Blvd., 32901 Melbourne FL, United States.

Email: [email protected]

Page S2: Table S1: Summary of databases and subsets included in ACCDB. Page S3: Table S2: Super position of subsets in the databases. Pages S4-S7: List of references for all the subsets in Table S1 and S2.

1 Table S1. Databases included in ACCDB with their respective subsets and relevant citations. GMTKN55 MGCDB84 Minnesota Metals&EE W4-11(TAE140)1 HAL5949,50 A2462 H2O16Rel594 SRM2106 3d-SSIP30118 G21EA2 AHB2151 DS1463 H2O20Rel1080 SRMGD5106,107,134 4d-SSIP24120 G21IP2 CHB651 HB1564 H2O20Rel48,46,85,86 3dSRBE2108 AIP28137 DIPCS103 IL1651 HSG44,65 Melatonin5258 SR-MGN-BE107106,109,110 LTMBH26138 PA263,4,5 IDISP8,9,33,52,53 NBC1044,66-68 YMPJ51955 ABDE13111 MOR41139 SIE4x43 ICONF3 S2243,44 EIE2295 MR-MGM-BE4107 p-VR17121 ALKBDE106 ACONF54 X4050 Styrene4524 MR-MGN-BE17106 Por21140,141,148 YBDE103,7 AMINO20x455 A21x1269 DIE6032 3dSRBE4108 TMBH23142-144 AL2x63 PCONF2156,57 BzDC21570 ISOMERIZATION201 SRMBE10106,134 HEAVYSB113 MCONF58 HW3071 C20C2426 PdBE2112 W4-17 NBPRC8,9,10 SCONF8,59 NC1572 AlkAtom1991 FeCl113 TAE203145 ALK83 UPU2360 S6645,73 BDE99nonMR1 3dMRBE6108 BDE991 RC213 BUT14DIOL61 S66x845 G21EA2,8 MRBE3106 HAT7071 G2RC3,11 3B-69-DIM74 G21IP2,8 CuH, VO, CuCl, NiCl113 ISOMERIZATION201 BH76RC3,8,12,13 MB08-165 AlkBind1275 TAE140nonMR1 MR-TMD-BE3106,114 SN131 FH5114,15 MB08-16527 CO2Nitrogen1676 AlkIsod1491 BH7612,13,106,135 TAUT153 HB4977-79 BH76RC8,12,13 NCCE2344,106,115-117,136 DP284 DC133,8,16-26 Ionic4351 EA1396 CT7106 Dip152146 MB16-433,27 H2O6Bind880,81 HAT707nonMR1 S6x645 Pol132147 DARC3,8,28 HW6Cl80,81 IP1396 NGDWI21105,106 RSE433,29 HW6F80,81 NBPRC8,9,10 3dEE8114,118,119 New RE databases: BSR363,30,31 FmH2O1080,81 SN131 4dAEE5120 W4-17-RE148 CDIE2032 Shields3882 BSR363,8,30,31 pEE5121 MN-RE148 ISO3433 SW49Bind34583 HNBrBDE1897 4pISOE4122 ISOL249,34 SW49Bind683 WCPT641 2pISOE4122 C60ISO35 WATER278,46 BDE99MR1 ISOL6/11106,123 PARel3 3B-69-TRIM74 HAT707MR1 pTC135,106,124,125 BH763,12,13 CE2040,84 TAE140MR1 HC7/11106,126 BHPERI8,36-38 H2O20Bind1080 PlatonicHD698 EA13/0396,106,109,124,127,128 BHDIV103 H2O20Bind48,46,85,86 PlatonicID698 PA85,106 INV2439 TA1387 PlatonicIG698 IP2396,106,109,124,127-129 BHROT273 XB1849 PlatonicTAE698 AE1716,104,106 PX1340 Bauza3088,89 BHPERI268,99 SMAE3130,131,132 WCPT1841 CT2090 CRBH20100 DC9/12106,133 RG183 XB5149 DBH24101,102 ADIM63,42 AlkIsomer1191 CR20103 S2243,44 Butanediol6561 HTBH3813 S6645 ACONF8,54 NHTBH3812 HEAVY283,42 CYCONF8,92 PX1340,84 WATER2746,47 Pentane1493 WCPT2741 CARBHB123 SW49Rel34583 AE18104 PNICO233,48 SW49Rel683 RG10105

2 Table S2. Super position of subsets with at least partial overlap across all different databases. (Yellow indicates partial overlap, orange indicates complete overlap) GMTKN55a,b MGCDB84a W4-17 Minnesota Metals&EE W4-11 (TAE140) TAE140 TAE203c BDE99, HAT707, BDE99, HAT707,

ISOMERIZATION20, SN13 ISOMERIZATION20, SN13 G21EA G21EA G21IP G21IP PA26 PA8 NBPRC NBPRC BH76RC BH76RC DC13d C20C24, Styrene45d DC9d (DC9 in GMTKN30) MB16-43e

(similar to MB08-165) BSR36 BSR36 CDIE20 DIE60 ISOL24 IsoL6/11 (ISOL22 in GMTKN30) BH76 HTBH38, NHTBH38 BH76 BHPERI26 BHPERI PX13 CE20, PX13 WCPT18 WCPT6, WCPT27 S22 S22, HSG, NBC10 NCCE23 S66 S66, S66x8 S6x6 WATER27, H2O20Bind4, WATER27 H2O20Rel4 HAL59 X40, XB51, XB18 AHB21, CHB6, IL16 Ionic43 ACONF ACONF Amino20x4 YMPJ519 CYCONF (GMTKN30) CYCONF (from GMTKN30) MCONF Melatonin52 But14diol Butanediol165 RG10 NGDWI21 IP13 IP23 EA13 EA13 AE18 AE17 DBH24 W4-08f 3dEE8 3d-SSIP30 4dAEE5 4d-SSIP24 pEE5 p-VR17 [a] MGCDB84 and GMTKN55 share a considerable number of subsets since some of the subsets in MGCDB84 are taken from GMTKN30. [b] For GMTKN55, all reference energies have been recalculated, and they differ (in general) from the ones reported in MGCDB84, or in Minnesota. [c] TAE203 replaces TAE140. The latter is taken from W4-11, together with BDE99, HAT707, ISOMERIZATION20, and SN13. [d] One of the reactions included in DC13 comes from C20C24, one from Styrene45, one from DC9, the others from literature. [e] MB16-43 substitutes MB08-165. [f] The reference energies for DBH24 are taken from W4-08 (ref. 102), but they are not included in our version of W4-17.

3 References for all the subsets included in ACCDB.

1. A. Karton, S. Daon and J. M. L. Martin, Chem. Phys. Lett., 2011, 510, 165–178. 2. L. A. Curtiss, K. Raghavachari, G. W. Trucks, J. A. Pople, J. Chem. Phys., 1991, 94, 7221-7230. 3. L. Goerigk, A. Hansen, C. Bauer, S. Ehrlich, A. Najibi and S. Grimme, Phys. Chem. Chem. Phys., 2017, 19, 32184– 32215. 4. S. Parthiban and J. M. L. Martin, J. Chem. Phys., 2001, 114, 6014–6029. 5. Y. Zhao and D. G. Truhlar, J. Phys. Chem. A, 2006, 110, 10478–10486. 6. H. Yu and D. G. Truhlar, J. Chem. Theory Comput., 2015, 11, 2968–2983. 7. Y. Zhao, H. T. Ng, R. Peverati and D. G. Truhlar, J. Chem. Theory Comput., 2012, 8, 2824–2834. 8. L. Goerigk and S. Grimme, J. Chem. Theory Comput., 2010, 6, 107–126. 9. L. Goerigk and S. Grimme, J. Chem. Theory Comput., 2011, 7, 291–309. 10. S. Grimme, H. Kruse, L. Goerigk and G. Erker, Angew. Chem., Int. Ed., 2010, 49, 1402–1405. 11. L. A. Curtiss, K. Raghavachari, P. C. Redfern and J. A. Pople, J. Chem. Phys., 1997, 106, 1063–1079. 12. Y. Zhao, B. J. Lynch and D. G. Truhlar, Phys. Chem. Chem. Phys., 2005, 7, 43–52. 13. Y. Zhao, N. Gonzalez-Garcia and D. G. Truhlar, J. Phys. Chem. A, 2005, 109, 2012–2018. 14. J. Friedrich and J. Hanchen, J. Chem. Theory Comput., 2013, 9, 5381–5394. 15. J. Friedrich, J. Chem. Theory Comput., 2015, 11, 3596–3609. 16. Y. Zhao and D. G. Truhlar, Theor. Chem. Acc., 2008, 120, 215–241. 17. S. Grimme, J. Chem. Phys., 2006, 124, 034108-034108-16. 18. S. Grimme, C. Mück-Lichtenfeld, E.-U. Würthwein, A. W. Ehlers, T. P. M. Goumans and K. Lammertsma, J. Phys. Chem. A, 2006, 110, 2583–2586. 19. M. Piacenza and S. Grimme, J. Comput. Chem., 2004, 25, 83–99. 20. H. L. Woodcock, H. F. Schaefer III and P. R. Schreiner, J. Phys. Chem. A, 2002, 106, 11923–11931. 21. P. R. Schreiner, A. A. Fokin, R. A. Pascal and A. de Meijere, Org. Lett., 2006, 8, 3635–3638. 22. C. Lepetit, H. Chermette, M. Gicquel, J.-L. Heully and R. Chauvin, J. Phys. Chem. A, 2007, 111, 136–149. 23. J. S. Lee, J. Phys. Chem. A, 2005, 109, 11927–11932. 24. A. Karton and J. M. L. Martin, Mol. Phys., 2012, 110, 2477–2491. 25. Y. Zhao, O. Tishchenko, J. R. Gour, W. Li, J. J. Lutz, P. Piecuch and D. G. Truhlar, J. Phys. Chem. A, 2009, 113, 5786–5799. 26. D. Manna and J. M. L. Martin, J. Phys. Chem. A, 2016, 120, 153–160. 27. M. Korth and S. Grimme, J. Chem. Theory Comput., 2009, 5, 993–1003. 28. E. R. Johnson, P. Mori-Sanchez, A. J. Cohen and W. Yang, J. Chem. Phys., 2008, 129, 204112. 29. F. Neese, T. Schwabe, S. Kossmann, B. Schirmer and S. Grimme, J. Chem. Theory Comput., 2009, 5, 3060–3073. 30. S. N. Steinmann, G. Csonka and C. Carminboeuf, J. Chem. Theory Comput., 2009, 5, 2950–2958. 31. H. Krieg and S. Grimme, Mol. Phys., 2010, 108, 2655–2666. 32. L.-J. Yu and A. Karton, Chem. Phys., 2014, 441, 166–177. 33. S. Grimme, M. Steinmetz and M. Korth, J. Org. Chem., 2007, 72, 2118–2126. 34. R. Huenerbein, B. Schirmer, J. Moellmann and S. Grimme, Phys. Chem. Chem. Phys., 2010, 12, 6940– 6948. 35. R. Sure, A. Hansen, P. Schwerdtfeger and S. Grimme, Phys. Chem. Chem. Phys., 2017, 19, 14296– 14305. 36. V. Guner, K. S. Khuong, A. G. Leach, P. S. Lee, M. D. Bartberger and K. N. Houk, J. Phys. Chem. A, 2003, 107, 11445–11459. 37. D. H. Ess and K. N. Houk, J. Phys. Chem. A, 2005, 109, 9542–9553. 38. T. C. Dinadayalane, R. Vijaya, A. Smitha and G. N. Sastry, J. Phys. Chem. A, 2002, 106, 1627–1633. 39. L. Goerigk and R. Sharma, Can. J. Chem., 2016, 94, 1133–1143.

4 40. A. Karton, R. J. O’Reilly, B. Chan and L. Radom, J. Chem. Theory Comput., 2012, 8, 3128–3136. 41. A. Karton, R. J. O’Reilly and L. Radom, J. Phys. Chem. A, 2012, 116, 4211–4221. 42. S. Grimme, J. Antony, S. Ehrlich and H. Krieg, J. Chem. Phys., 2010, 132, 154104. 43. P. Jurecka, J. Sponer, J. Cerny and P. Hobza, Phys. Chem. Chem. Phys., 2006, 8, 1985–1993. 44. M. S. Marshall, L. A. Burns and C. D. Sherrill, J. Chem. Phys., 2011, 135, 194102-194102-10. 45. J. Rezac, K. E. Riley and P. Hobza, J. Chem. Theory Comput., 2011, 7, 2427–2438. 46. V. S. Bryantsev, M. S. Diallo, A. C. T. van Duin and W. A. Goddard III, J. Chem. Theory Comput., 2009, 5, 1016–1026. 47. D. Manna, M. K. Kesharwani, N. Sylvetsky and J. M. L. Martin, J. Chem. Theory Comput., 2017, 13, 3136–3152. 48. D. Setiawan, E. Kraka and D. Cremer, J. Phys. Chem. A, 2015, 119, 1642–1656. 49. S. Kozuch and J. M. L. Martin, J. Chem. Theory Comput., 2013, 9, 1918–1931. 50. J. Rezac, K. E. Riley and P. Hobza, J. Chem. Theory Comput., 2012, 8, 4285–4292. 51. K. U. Lao, R. Schäffer, G. Jansen and J. M. Herbert, J. Chem. Theory Comput., 2015, 11, 2473–2486. 52. T. Schwabe and S. Grimme, Phys. Chem. Chem. Phys., 2007, 9, 3397–3406. 53. S. Grimme, Angew. Chem., Int. Ed., 2006, 45, 4460–4464. 54. D. Gruzman, A. Karton and J. M. L. Martin, J. Phys. Chem. A, 2009, 113, 11974–11983. 55. M. K. Kesharwani, A. Karton and J. M. L. Martin, J. Chem. Theory Comput., 2016, 12, 444–454. 56. D. Reha, H. Valdes, J. Vondrasek, P. Hobza, A. Abu-Riziq, B. Crews and M. S. de Vries, Chem. – Eur. J., 2005, 11, 6803–6817. 57. L. Goerigk, A. Karton, J. M. L. Martin and L. Radom, Phys. Chem. Chem. Phys., 2013, 15, 7028–7031. 58. U. R. Fogueri, S. Kozuch, A. Karton and J. M. L. Martin, J. Phys. Chem. A, 2013, 117, 2269–2277. 59. G. I. Csonka, A. D. French, G. P. Johnson and C. A. Stortz, J. Chem. Theory Comput., 2009, 5, 679–692. 60. H. Kruse, A. Mladek, K. Gkionis, A. Hansen, S. Grimme and J. Sponer, J. Chem. Theory Comput., 2015, 11, 4972–4991. 61. S. Kozuch, S. M. Bachrach and J. M. L. Martin, J. Phys. Chem. A, 2014, 118, 293–303. 62. J. Rezac and P. Hobza, J. Chem. Theory Comput., 2013, 9, 2151-2155. 63. B. J. Mintz and J. M. Parks, J. Phys. Chem. A, 2012, 116, 1086-1092. 64. J. Rezac and P. Hobza, J. Chem. Theory Comput., 2012, 8, 141-151. 65. J. C. Faver, M. L. Benson, X. He, B. P. Roberts, B. Wang, M. S. Marshall, M. R. Kennedy, C. D. Sherrill, and K. M. Merz, J. Chem. Theory Comput., 2011, 7, 790-797. 66. E. G. Hohenstein and C. D. Sherrill, J. Phys. Chem. A, 2009, 113, 878-886. 67. C. D. Sherrill, T. Takatani, and E. G. Hohenstein, J. Phys. Chem. A, 2009, 113, 10146-10159. 68. T. Takatani and C. D. Sherrill, Phys. Chem. Chem. Phys., 2007, 9, 6106-6114. 69. J. Witte, M. Goldey, J. B. Neaton, and M. Head-Gordon, J. Chem. Theory Comput., 2015, 11, 1481- 1492. 70. D. L. Crittenden, J. Phys. Chem. A, 2009, 113, 1663-1669. 71. K. L. Copeland and G. S. Tschumper, J. Chem. Theory Comput., 2012, 8, 1646-1656. 72. D. G. A. Smith, P. Jankowski, M. Slawik, H. A. Witek, and K. Patkowski, J. Chem. Theory Comput., 2014, 10, 3140-3150. 73. J. Rezac, K.E. Riley, and P. Hobza, J. Chem. Theory Comput., 2011, 7, 3466-3470. 74. J. Rezac, Y. Huang, P. Hobza, and G. J. O. Beran, J. Chem. Theory Comput., 2015, 11, 3065-3079. 75. J. Granatier, M. Pitonak, and P. Hobza, J. Chem. Theory Comput., 2012, 8, 2282-2292 76. S. Li, D. G. A. Smith, and K. Patkowski, Phys. Chem. Chem. Phys., 2015, 17, 16560-16574 77. A. D. Boese, J. Chem. Theory Comput., 2013, 9, 4403-4413 78. A. D. Boese, Mol. Phys., 2015, 113, 1618-1629 79. A. D. Boese, Chem. Phys. Chem., 2015, 16, 978-985. 80. K.U. Lao and J.M. Herbert, J. Phys. Chem. A, 2015, 119, 235-252. 81. K.U. Lao and J.M. Herbert, J. Chem. Phys., 2013, 139, 034107-034107-16.

5 82. B. Temelso, K. A. Archer, and G. C. Shields, J. Phys. Chem. A, 2011, 115, 12034-12046. 83. N. Mardirossian, D. S. Lambrecht, L. McCaslin, S. S. Xantheas, and M. Head-Gordon, J. Chem. Theory Comput., 2013, 9, 1368-1380. 84. B. Chan, A. T. B. Gilbert, P. M. W. Gill, and L. Radom, J. Chem. Theory Comput., 2014, 10, 3777-3783. 85. G. S. Fanourgakis, E. Aprà, and S. S. Xantheas, J. Chem. Phys., 2004, 121, 2655-2663. 86. T. Anacker and J. Friedrich, J. Comput. Chem., 2014, 35, 634-643. 87. P. R. Tentscher and J. S. Arey, J. Chem. Theory Comput., 2013, 9, 1568-1579. 88. A. Bauzá, I. Alkorta, A. Frontera, and J. Elguero, J. Chem. Theory Comput., 2013, 9, 5201-5210. 89. A. O. de-la Roza, E. R. Johnson, and G. A. DiLabio, J. Chem. Theory Comput., 2014, 10, 5436-5447. 90. S. N. Steinmann, C. Piemontesi, A. Delachat, and C. Carminboeuf, J. Chem. Theory Comput., 2012, 8, 1629-1640. 91. A. Karton, D. Gruzman, and J. M. L. Martin, J. Phys. Chem. A, 2009, 113, 8434-8447. 92. J. J. Wilke, M. C. Lind, H. F. Schaefer III, A. G. Csaszar and W. D. Allen, J. Chem. Theory Comput., 2009, 5, 1511–1523. 93. J. M. L. Martin, J. Phys. Chem. A, 2013, 117, 3118-3132. 94. S. Yoo, E. Aprà, X. C. Zeng, and S. S. Xantheas, J. Phys. Chem. Lett., 2010, 1, 3122-3127. 95. L. J. Yu, F. Sarrami, A. Karton, and R. J. O’Reilly, Mol. Phys., 2015, 113, 1284-1296. 96. B. J. Lynch, Y. Zhao, and D. G. Truhlar, J. Phys. Chem. A, 2003, 107, 1384-1388. 97. R. J. O’Reilly, and A. Karton, Int. J. Quantum Chem., 2016, 116, 52-60. 98. A. Karton, P. R. Schreiner, and J. M. L. Martin, J. Comput. Chem., 2016, 37, 49-58. 99. A. Karton and L. Goerigk, J. Comput. Chem., 2015, 36, 622-632. 100. L. J. Yu, F. Sarrami, R. J. O’Reilly, and A. Karton, Chem. Phys., 2015, 458, 1-8. 101. J. Zheng, Y. Zhao, and D. G. Truhlar, J. Chem. Theory Comput., 2007, 3, 569-582. 102. A. Karton, A. Tarnopolsky, J. F. Lamère, G. C. Schatz, and J. M. L. Martin, J. Phys. Chem. A, 2008, 112, 12868-12886. 103. L. J. Yu, F. Sarrami, R. J. O’Reilly, and A. Karton, Mol. Phys., 2016, 114, 21-33. 104. S. J. Chakravorty, S. R. Gwaltney, E. R. Davidson, F. A. Parpia, C. F. Fischer, Phys. Rev. A, 1993, 47, 3649-3670. 105. K. T. Tang and J. P. Toennies, J. Chem. Phys., 2003, 118, 4976-4983. 106. R. Peverati, D. G. Truhlar, Phyl. Trans. R. Soc. A, 2014, 372, 20120476. 107. H. S. Yu and D. G. Truhlar, J. Chem. Theory Comput., 2015, 11, 2968–2983. 108. W. Zhang, D. G. Truhlar and M. Tang, J. Chem. Theory Comput., 2013, 9, 3965–3977. 109. Y. Zhao, N. E. Schultz, D. G. Truhlar, J. Chem. Theory Comput., 2006, 2, 364-382. 110. R. Peverati, D. G. Truhlar, J. Chem. Phys., 2011, 135, 191102-191102-4. 111. X. Li. X. Xu, X. You, D. G. Truhlar, J. Phys. Chem. A, 2016, 120, 4025-4036. 112. B. Averkiev, Y. Zhao and D. G. Truhlar, J. Mol. Catal. A: Chem., 2010, 324, 80–88. 113. X. Xu, W. Zhang, M. Tang and D. G. Truhlar, J. Chem. Theory Comput., 2015, 11, 2036–2052. 114. C. E. Hoyer, L. G. Manni, D. G. Truhlar and L. Gagliardi, J. Chem. Phys., 2014, 141, 204309. 115. O. A. Vydrov and T. V. Voorhis, J. Chem. Theory Comput., 2012, 8, 1929–1934. 116. K. M. Lange and J. R. Lane, J. Chem. Phys., 2011, 134, 034301. 117. J. D. McMahon and J. R. Lane, J. Chem. Phys., 2011, 135, 154309 118. S. Luo, B. Averkiev, K. R. Yang, X. Xu and D. G. Truhlar, J. Chem. Theory Comput., 2014, 10, 102–121 119. H. S. Yu and D. G. Truhlar, J. Chem. Theory Comput., 2014, 10, 2291–2305 120. S. Luo and D. G. Truhlar, J. Chem. Theory Comput., 2012, 8, 4112–4126 121. K. Yang, R. Peverati, D. G. Truhlar and R. J. Valero, J. Chem. Phys., 2011, 135, 044188. 122. T. Schwabe, Phys. Chem. Chem. Phys., 2014, 16, 14559– 14567. 123. S. Luo, Y. Zhao, D. G. Truhlar, Phys. Chem. Chem. Phys., 2011 ,13, 13683–13689. 124. Y. Zhao, N. E. Schultz, D. G. Truhlar, J. Chem. Phys., 2005, 123, 161103. 125. Y. Zhao, D. G. Truhlar, J. Chem. Phys., 2006, 125, 194101

6 126. R. Peverati, Y. Zhao, D. G. Truhlar, J. Phys. Chem. Lett., 2011, 2, 1991–1997. 127. Y. Zhao, D. G. Truhlar, J. Phys. Chem. A, 2005, 109, 5656-5667. 128. R. Peverati, D. G. Truhlar, Phys. Chem. Chem. Phys., 2012, 14, 13171-13174. 129. R. Li, R. Peverati, M. Isegawa, D. G. Truhlar, J. Phys. Chem. A, 2012, 117, 169–173. 130. H. S. Yu, X. He, D. G. Truhlar, J. Chem. Theory Comput., 2016, 12, 1280-1293. 131. H. S. Yu, X. He, S. L. Li, D. G. Truhlar, Chem. Science, 2016, 7, 5032-5051. 132. NIST website: webbook.nist.gov 133. R. Peverati, D. G. Truhlar, J. Chem. Theory Comput., 2012, 8, 2310-2319. 134. R. Peverati, D. G. Truhlar, J. Phys. Chem. Lett., 2012, 3, 117-124. 135. J. Zheng, Y. Zhao, D. G. Truhlar, J. Chem. Theory. Comput., 2009, 5, 808-821. 136. Y. Zhao, D. G. Truhlar, J. Chem. Theory Comput., 2005, 1, 415-432. 137. B. B. Averkiev, M. Mantina, R. Valero, I. Infante, A. Kovacs, D. G. Truhlar and L. Gagliardi, Theor. Chem. Acc., 2011, 129, 657–666. 138. R. Kang, W. Lai, J. Yao, S. Shaik and H. Chen, J. Chem. Theory Comput., 2012, 8, 3119–3127. 139. S. Dohm, A. Hansen, M. Steinmetz, S. Grimme and M. P. Checinski, J. Chem. Theory Comput., 2018, 14, 2596–2608. 140. K. Pierloot, Q. M. Phung and A. Domingo, J. Chem. Theory Comput., 2017, 13, 537–553 141. M. Radoń and K. Pierloot, J. Phys. Chem. A, 2008, 112, 11824–11832. 142. Y. Sun and H. Chen, J. Chem. Theory Comput., 2013, 9, 4735–4743. 143. L. Hu and H. Chen, J. Chem. Theory Comput., 2015, 11, 4601–4614. 144. Y. Sun and H. Chen, J. Chem. Theory Comput., 2014, 10, 579–588. 145. A. Karton, N. Sylvetsky and G. J. Martin, J. Comput. Chem., 2017, 38, 2063–2075. 146. D. Hait, M. Head-Gordon, J. Chem. Theory Comput., 2018, 14, 1969–1981. 147. D. Hait and M. Head-Gordon, Phys. Chem. Chem. Phys., 2018, 140, A1133–11. 148. P. Morgante and R. Peverati, present work.

7