Computational Systems Chemistry with Rigorous Uncertainty Quantification
DISS ETH NO. 24842

Computational Systems Chemistry with Rigorous Uncertainty Quantification

A thesis submitted to attain the degree of
DOCTOR OF SCIENCES of ETH ZÜRICH
(Dr. sc. ETH Zürich)

presented by
Jonny Proppe
Master of Science (M.Sc.) Chemistry, University of Hamburg
born on August 18, 1987
citizen of Germany

accepted on the recommendation of
Prof. Dr. Markus Reiher, examiner
Prof. Dr. Sereina Riniker, co-examiner

ETH Zürich
Zürich, Switzerland
2018

Jonny Proppe: Computational Systems Chemistry with Rigorous Uncertainty Quantification
Dissertation ETH Zürich No. 24842, 2018.

To Louisa.

Contents

Abstract
Zusammenfassung
Acknowledgments
Abbreviations

1 Complexity and Uncertainty: Challenges for Computational Chemistry

2 Computational Chemistry from a Network Perspective
  2.1 Exploration of Chemical Reaction Networks
  2.2 Determination of Rate Constants from First Principles
  2.3 Kinetic Modeling of Reactive Chemical Systems

3 Computational Chemistry from a Statistical Perspective
  3.1 Statistical Calibration of Parametric Property Models
  3.2 Uncertainty Classification
  3.3 Prediction Uncertainty from Resampling Methods

4 Mechanism Deduction from Noisy Chemical Reaction Networks
  4.1 Kinetic Modeling of Complex Chemical Reaction Networks
  4.2 Overview of the KiNetX Meta-Algorithm
  4.3 Automatic Network Generation
  4.4 Hierarchical Network Reduction
  4.5 Propagation of Free-Energy Uncertainty
  4.6 Analysis of Kinetic Sensitivity to Free-Energy Uncertainty
  4.7 Integration of Stiff Ordinary Differential Equations
  4.8 Exemplary Workflow of the KiNetX Meta-Algorithm
  4.9 Accuracy and Efficiency of KiNetX-Based Kinetic Modeling
  4.10 KiNetX as a Guide for Reaction Space Exploration
  4.11 Conclusions

5 Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models
  5.1 Isomer Shift Calibration in Theoretical 57Fe Mössbauer Spectroscopy
  5.2 Reference Set of Molecular Iron Compounds
  5.3 Effect of Experimental Uncertainty on Model Parameters
  5.4 Model Selection Based on Occam’s Razor
  5.5 Assessment of Data Inconsistency Based on Jackknife-after-Bootstrapping
  5.6 How Reliable are Density Functional Rankings Based on a Specific Data Set?
  5.7 Effect of Exact Exchange on Model Prediction Uncertainty
  5.8 Conclusions

6 Case Study: Thermochemical, Kinetic, and Spectroscopic Modeling of Iron Porphyrin Carbene Chemistry
  6.1 Thermochemical Analysis of Iron Porphyrin Carbene Reactivity
  6.2 Density Functional Assessment Based on Kinetic Modeling
  6.3 Mössbauer Spectroscopy for the Discrimination of Spin–Charge States

7 A Computational Perspective on the Study of Complex Chemical Systems

Appendix A Kinetic Modeling of Complex Reactions: Technical Details
  A.1 Computational Singular Perturbation
  A.2 Construction of Random Covariance Matrices
  A.3 Encoding Chemical Logic into Network Graphs

Appendix B 57Fe Mössbauer Isomer Shift Prediction: Technical Details
  B.1 Statistical Calibration Analysis
  B.2 Quantum Chemical Calculations

References
Publications
Curriculum Vitae

Abstract

The success of in silico design approaches for molecules and materials that attempt to solve major technological issues of our society depends crucially on knowing the uncertainty of property predictions. Calibration is an essential model-building approach in this respect as it renders the inference of uncertainty-equipped predictions based on computer simulations possible. However, there exist various pitfalls that may affect the transferability of a property model to new data.
By resorting to Bayesian inference and resampling methods (bootstrapping and cross-validation), we discuss issues such as the proper selection of reference data and property models, the identification and elimination of systematic errors, and the rigorous quantification of prediction uncertainty. We apply this statistical calibration approach to the prediction of 57Fe Mössbauer isomer shifts from electron densities obtained with density functional theory. Our findings reveal that the specific selection of reference iron complexes can have a significant effect on the ranking of density functionals with respect to model transferability. Furthermore, we show that bootstrapping can be harnessed to determine the sensitivity of such model rankings to changes in the reference data set, which is indispensable for guiding future computational studies. Such a statistically rigorous approach to calibration is almost unknown in chemistry. Our study is one of the very few addressing this issue, and its results can be applied by all chemists to arbitrary property models with our open-source software reBoot. In this thesis, we define a new standard for the calibration of computational results due to the rigor, transparency, and generality of our statistical approach, which is completely automatable. Black-box uncertainty quantification can also be applied to macroscopic systems by propagating the uncertainties inferred for single-molecule properties, which will ultimately allow modeling in chemistry to accelerate the discovery of important drugs, organic materials for solar cells, electrolytes for flow batteries, etc. A rather fundamental application area of this systems-focused uncertainty quantification approach is the understanding of complex chemical reaction mechanisms, which is therefore another focus of this thesis.
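To illustrate the resampling idea behind the calibration approach summarized above, the following minimal sketch applies a pairs bootstrap to a linear property model. It is not the reBoot implementation: the function names fit_line and bootstrap_prediction, the linear model form, and the synthetic reference data are assumptions made for this example only.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_prediction(xs, ys, x_new, n_boot=2000, seed=0):
    """Pairs bootstrap: refit on resampled reference data, predict at x_new.

    Returns the mean and standard deviation of the bootstrap predictions.
    The spread reflects sensitivity to the choice of reference data only;
    it does not include the residual noise of a new measurement.
    """
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        # Draw n (x, y) pairs with replacement from the reference set.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope undefined if all resampled x values coincide
        a, b = fit_line(bx, by)
        preds.append(a + b * x_new)
    return statistics.mean(preds), statistics.stdev(preds)


# Synthetic reference set (NOT real Mössbauer data): y = 1.0 - 0.3 * x + noise.
_rng = random.Random(42)
x_ref = [0.25 * i for i in range(20)]
y_ref = [1.0 - 0.3 * x + _rng.gauss(0.0, 0.05) for x in x_ref]

mean_pred, std_pred = bootstrap_prediction(x_ref, y_ref, x_new=2.5)
print(f"prediction at x = 2.5: {mean_pred:.3f} +/- {std_pred:.3f}")
```

Resampling (x, y) pairs rather than residuals keeps the sketch free of assumptions about the noise model, at the cost of ignoring measurement noise in the reported spread.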
For an approach that accounts for all elementary processes within a reactive mixture, it is essential to know all relevant intermediates and transition states, to determine relative (free) energies, to quantify their uncertainties, and to model the system's kinetics based on uncertainty propagation. The advantage of a holistic in silico approach to chemistry is that the origin of all data can be rigorously controlled, which allows for reliable uncertainty quantification and propagation. In this thesis, we present the first automated exploration of parts of chemical reaction space based on quantum mechanical descriptors, using the example of synthetic nitrogen fixation. Moreover, an extension to the exploration strategy considering uncertainty propagation through all stages of in silico modeling is presented in detail, using the example of the formose reaction. It is generally hard to model the kinetics of such complex reactive systems as they usually constitute processes spanning multiple time scales. Here, we present a simple and efficient strategy based on computational singular perturbation, which allows us to model the kinetics of complex chemical systems at arbitrary time scales. To study arbitrary reaction networks of dilute chemical systems (low-pressure gas or low-concentration solution phase), we implemented a generalized scheme of our kinetic modeling approach referred to as KiNetX. The main features of the completely automated KiNetX meta-algorithm are hierarchical network reduction, uncertainty propagation, and global sensitivity analysis, the latter of which detects critical (uncertainty-amplifying) regions of a network such that more complex electronic structure models are only employed if necessary. We also developed an automatic generator of abstract reaction networks encoding chemical logic, named AutoNetGen, which is coupled to KiNetX and allows us to examine a multitude of different chemical scenarios in a short time.
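The propagation of free-energy uncertainty into kinetics can be illustrated with a small Monte Carlo sketch. It is not the KiNetX algorithm. The abstract does not state which rate theory is used, so the example assumes Eyring transition-state theory for a single unimolecular step A -> B, an independent Gaussian uncertainty on the activation free energy, and hypothetical numerical inputs; the function names eyring_rate_constant and propagate are likewise illustrative.

```python
import math
import random
import statistics

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # molar gas constant, J/(mol K)


def eyring_rate_constant(dg_act, temperature=298.15):
    """First-order rate constant (1/s) from an activation free energy (J/mol)."""
    return (K_B * temperature / H) * math.exp(-dg_act / (R * temperature))


def propagate(dg_mean, dg_std, t, a0=1.0, n_samples=5000, seed=0):
    """Sample the barrier and return the sorted concentrations [A](t) for A -> B."""
    rng = random.Random(seed)
    conc = []
    for _ in range(n_samples):
        # Each sampled barrier yields one rate constant and one concentration.
        k = eyring_rate_constant(rng.gauss(dg_mean, dg_std))
        conc.append(a0 * math.exp(-k * t))
    return sorted(conc)


# Hypothetical inputs: barrier of 80 +/- 2 kJ/mol, concentration after 10 s.
samples = propagate(dg_mean=80.0e3, dg_std=2.0e3, t=10.0)
lower = samples[int(0.025 * len(samples))]
median = statistics.median(samples)
upper = samples[int(0.975 * len(samples))]
print(f"[A](10 s): median {median:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

Because the rate constant depends exponentially on the barrier, a modest free-energy uncertainty translates into a wide concentration interval, which is why sampling is used here instead of linear error propagation.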
In a final case study, we apply the insights gained from computational systems chemistry with rigorous uncertainty quantification to model the thermochemistry, kinetics, and spectroscopic properties of iron porphyrin compounds, which constitute a crucial type of active center in metalloenzyme research.

Zusammenfassung (Summary)

The success of in silico design approaches for molecules and materials developed to solve important technological problems of our society depends significantly on our knowledge of the uncertainty of property predictions. In this respect, the calibration of simulation results is an essential model-building step that makes uncertainty-equipped predictions possible. This potentially highly effective procedure, however, is associated with a number of difficulties that, if not taken into account, can impair the transferability of a property model to new data. In this dissertation, we discuss such difficulties (the suitable choice of reference data and property models; the identification and elimination of systematic errors; the rigorous quantification of the uncertainty of a prediction) in the light of Bayesian inference and resampling-based methods (bootstrapping, cross-validation). We apply our statistical approach to the calibration of calculated electron densities in order to make ab initio predictions of 57Fe Mössbauer isomer shifts. Our results indicate that the specific choice of reference iron complexes has a significant effect on the ranking of the models employed (here: density functionals) with respect to their transferability. Furthermore, we show that bootstrapping can be used to determine the sensitivity of such model rankings to