NEW DOMAINS IN AUTOMATIC MECHANISM GENERATION

A Dissertation Presented

By

Belinda Leigh Slakman

To

The Department of Chemical Engineering

In partial fulfillment of the requirements For the degree of

Doctor of Philosophy

In the Field of

Chemical Engineering

Northeastern University Boston, Massachusetts

August 2017

Abstract

Deeper understanding of complex chemical systems can be aided by detailed kinetic modeling, in which processes are broken down into their individual elementary reactions. An important industrial goal is to move from postdictive to predictive modeling, where new chemical vapor deposition (CVD) precursors, for example, can be tested for efficiency without performing tedious and expensive experiments. Some of these microkinetic models may contain hundreds of reacting chemical species and thousands of reactions; thus, it is desirable to build the models automatically with a computer to speed up model generation and reduce errors. Automatic mechanism generation is now commonly used for applications such as combustion, but extension to other systems presents challenges. This dissertation describes the extension of the Reaction Mechanism Generator (RMG) software to two less-studied chemical systems: the oxidation of liquid fuels and the gas-phase decomposition of silicon hydrides.

To model liquid fuel oxidation, the software’s existing gas-phase thermodynamics and kinetics databases needed to be supplemented or corrected to account for solvated reactions. Existing correlations and data for solvation thermodynamics and diffusion were improved and added to RMG. Solvation kinetics data were obtained by developing a machine learning algorithm to systematically predict the change in barrier height when going from the gas phase to various solvents. The algorithm was trained with calculations on a simple set of hydrogen abstraction and intra-hydrogen migration reactions. The method was used to change the rates in a model for the oxidation of dodecane/methyl oleate blends, showing a marked change in the model’s prediction for the fuel’s induction period.

The second part of this dissertation involves gas-phase silicon hydride decomposition, for application to CVD. Thermodynamic and kinetic data from the literature were added to RMG’s database. Focusing specifically on radical reaction types, additional data were calculated via quantum chemistry for hydrogen bond increment (HBI) values of silicon hydride species, as well as for hydrogen abstraction reaction rates. A SiH4 decomposition model was built with the updated RMG and compared to experiment, with good agreement.

This work provides new insight into both of these chemical systems and contributes new calculated thermodynamic and kinetic parameters. Importantly, it also guides future developers in adding capabilities for new phases or elements to mechanism generation software.

ACKNOWLEDGEMENTS

My dissertation work would not have been possible without the support of many people, near and far.

Thank you to my advisor, Dr. Richard West, for supporting me these past five years. Your intelligence and insights have been invaluable, but at the same time, you have always let me run with the ideas I have had and let me make mistakes on my own: the marks of a great advisor. Thanks for giving me the opportunities to travel and present my research, teach, and mentor. I also want to thank my other committee members: Dr. Anand Asthagiri, Dr. Carolyn Lee-Parsons, Dr. Mary Jo Ondrechen, and Dr. Harsono Simka for your time and helpful discussions over the years. I would especially like to thank Harsono for mentoring me throughout two internships at Intel Corporation and beyond, and for all of your personal and professional advice.

I want to thank my other colleagues at Intel, particularly Karson Knutson, Dr. Harinath Reddy, and other members of the TCAD-IPAG group. I learned a lot from all of you, and thanks for giving me the opportunity to come back and work with you a second time.

This dissertation would surely not have been possible without the hard work of past and present RMG developers in the Green Group at MIT. I would especially like to thank Dr. William Green for his insights over the years, and Dr. Amrit Jalan and Yunsie Chung for helpful discussions on solvation in RMG.

This work could not have been completed without the help of Research Computing at Northeastern University, especially Nilay Roy, who helped maintain the Discovery cluster.

I want to thank Michael Li, and my instructors and colleagues at The Data Incubator, for teaching me about data science, and providing guidance and friendship during the spring of 2017. I would also like to acknowledge Lilian Tsang and the organizers of the Combustion Energy Frontier Research Center summer school, which I attended in 2013 and 2014.

I would like to thank the current and former Northeastern Department of Chemical Engineering staff for their support: Jessica, Brandon, Francesca, Kelly, Sarah and especially Pat and Rob. I also have to thank my 11 classmates and friends who started this journey with me: Chris, Dan, Hunter, Luting, Mark, Negar, Oljora, Sue, Sydney, Tanya and Taylor.

To Dr. Pierre Bhoorasingh and Dr. Fariba Seyedzadeh Khanshan: I couldn’t have done this work without your guidance, good example, and levity, and thank you for the friendship and advice you continue to provide. Thank you to Jason for being our “scientist” and for all of your research assistance. I’d also like to thank Nate for letting me vent, not just about kinetics. I also want to thank current West group members Yawei, Mike, Rasha and Krishna and past members Jacob, Victor, Elliot, Claudia, and Drew for your useful discussions and companionship; it’s been a pleasure to work with all of you.

I have to thank all of my friends, in Boston and beyond. In particular, I want to thank Amanda for being my roommate for 4 years, Ina for sharing in the best friendship that has ever come from Craigslist (#18westwood), Katarina for reminding me that my Ph.D. project is not the only measure of my worth, and Jennifer for judgment-free support of my decisions.

Last, and most importantly, I have to thank my family, who has patiently supported me for the past 28 years. Mom, Dad, Jordan and Casey, thank you for standing with me as I take this next step. I am lucky to have such love in my life.

Contents

1 Introduction
1.1 Automatic mechanism generation
1.1.1 Reaction Mechanism Generator (RMG)
1.2 Parameter estimation
1.2.1 Thermodynamics
1.2.2 Kinetics
1.2.3 Transition state geometries
1.3 Machine learning with decision trees
1.3.1 Decision trees in chemistry and biology
1.4 Dissertation overview

2 Automatic calculation of solvation thermodynamics
2.1 Background
2.1.1 Explicit thermodynamic calculations
2.1.2 Estimation techniques
2.2 Methods
2.2.1 Adaptation of solvation thermodynamics from RMG-Java
2.2.2 New additions in RMG-Py
2.3 Results
2.3.1 Estimation of solute descriptors
2.3.2 Calculation of solvation thermodynamics
2.4 Summary
2.5 Recommendations
2.5.1 Temperature dependence of solvation thermodynamics
2.5.2 Calculation of solvation thermodynamics for lone pair species
2.5.3 Improved benchmarking of group additivity values
2.5.4 Expansion and enhancement of solvents

3 Implementing kinetic solvent effects in automatic mechanism generation
3.1 Background
3.1.1 Experimental techniques for determining reaction rates in liquids
3.1.2
3.1.3 Kinetic solvent effects within reaction families
3.2 Methods
3.2.1 Diffusion
3.2.2 Intrinsic kinetics
3.2.3 Fuel oxidation model modification
3.2.4 Reactor simulations
3.3 Results
3.3.1 Solvation kinetics trends
3.3.2 New reactor simulations
3.4 Summary
3.5 Recommendations
3.5.1 On-the-fly estimation of solvation kinetics
3.5.2 Benchmarking the estimates
3.5.3 Check thermodynamic consistency with LSERs
3.5.4 Data-driven approaches

4 Automated silicon hydride mechanism generation
4.1 Background
4.1.1 Experimental work on SiH4 chemistry
4.1.2 Detailed mechanisms for SiH4 CVD
4.1.3 Importance of radical chemistry in silicon hydride thermal decomposition
4.2 Methods
4.2.1 RMG source code
4.2.2 Updating RMG’s database
4.2.3 RMG model generation
4.2.4 Reactor modeling
4.3 Results
4.3.1 Kinetics of hydrogen abstraction reactions
4.3.2 Calculated thermodynamic data
4.3.3 RMG generated mechanisms
4.4 Discussion
4.5 Summary
4.6 Recommendations
4.6.1 Expansion of thermodynamic libraries for radical species
4.6.2 Calculation of rates
4.6.3 Sensitivity analysis
4.6.4 Surface chemistry

5 Conclusion
5.1 Liquid-phase fuel oxidation
5.2 Thermal decomposition of silicon hydrides
5.3 Summary

References

A Supplementary Info for Solvation Kinetics
A.1 Solvation kinetics molecular structure group trees and values
A.2 Script for modifying Chemkin files for solvation kinetics corrections
A.3 Modified input file for n-dodecane/methyl oleate oxidation
A.4 Cantera script to simulate liquid fuel oxidation reactor
A.5 Code for automatic tree building

B Supplementary Info for Silicon Hydrides
B.1 Geometries of reactants and transition states for hydrogen abstraction reactions
B.2 Geometries of silicon hydride species
B.3 Largest SiH4 decomposition mechanism
B.4 Cantera script for simulating reactor
B.5 Code for residence time comparison

List of Figures

1.1 Rules for the generation of an elementary reaction network in GRACE
1.2 Lumped approach for the primary oxidation of n-pentane in MAMOX
1.3 Template and recipe for the hydrogen abstraction reaction family
1.4 Part of hydrogen abstraction hierarchical tree in RMG

2.1 Examples of molecular structure fragments
2.2 Entry in the solute group database
2.3 Molecular structure fragments in γ-decalactone
2.4 Comparison of calculated and experimental values for $\Delta H_{solv}$ and $\Delta G_{solv}$

3.1 Example of a potential energy surface
3.2 Concept of peroxyl radical clock
3.3 Comparison between continuum and hybrid implicit/explicit solvation models
3.4 The solvent effect on hydrogen abstraction from α-tocopherol
3.5 Example of the PCET mechanism
3.6 β-scission rates correlate with Dimroth-Reichardt parameter $E_T$
3.7 Diels-Alder reaction of cyclopentadiene and (-)-menthyl acrylate
3.8 Acetylation mechanism
3.10 Possible mechanisms in the hydrolysis of formamide
3.11 $\Delta E_A$ for the reaction XH + ·OH ←→ ·X + H2O
3.12 Illustration of the first few levels of group trees for hydrogen abstraction
3.13 Comparison of experiments, original model and updated model with kinetic solvent effects for 0% methyl oleate
3.14 Comparison of experiments, original model and updated model with kinetic solvent effects for 5% methyl oleate
3.15 Comparison of experiments, original model and updated model with kinetic solvent effects for 10% methyl oleate
3.16 Comparison of experiments, original model and updated model with kinetic solvent effects for 30% methyl oleate
3.17 Algorithm for obtaining solvation energy estimation values from a large transition state dataset

4.1 G3//B3LYP calculations of $\Delta_f H^\circ_{298}$ compared to high level calculations
4.2 Group additivity calculations of $\Delta_f H^\circ_{298}$ and HBI corrections compared to high level calculations
4.3 Simulation results compared with SiH4 thermal decomposition experiment
4.4 SiH4 concentration vs. temperature at different residence times
4.5 Simulation results for full pressure dependent mechanisms generated by RMG, compared with a mechanism generated without radical reaction families allowed
4.6 Flux diagram for Si at $6 \times 10^4$ seconds for full, pressure dependent mechanism generated by RMG and simulated at 613 K
4.7 Variation in concentration profiles of SiH4 and Si2H6 with initial SiH4 concentration at 873 K

List of Tables

2.1 Radical corrections to A for solvation thermodynamics calculations
2.2 Comparison of group additivity and experimental solute parameters

3.1 Training reactions used to deduce kinetic solvent effects
3.2 Solvents used for single-point energy calculations on training reactions

4.1 Reaction families used to generate mechanisms for silicon hydrides in RMG
4.2 Hydrogen abstraction rates calculated from M062X/6-311+(3d2f) and transition state theory using Cantherm
4.3 Hydrogen bond increment (HBI) corrections calculated with G3//B3LYP

A.1 Group tree for hydrogen abstraction reactions

B.1 Transition state geometries for silicon hydride hydrogen abstractions
B.2 Geometries of silicon hydride species

1 INTRODUCTION

Understanding the oxidation of fuels in the liquid phase is important, as autoxidation during storage leads to a loss of reactivity and may cause fuels to fail national standards of oxidative stability [1]. Over time, autoxidation also causes large increases in viscosity [2], making the fuels more difficult and less cost-effective to use. On the other hand, oxidation of fuels in the liquid phase can also be used to produce higher-value petrochemical products [3].

Fuel oxidation is a complex process, dependent upon the detailed chemistry of fuel components and additives. Many of these details are unknown, particularly for newer, biologically derived fuels [4]. Building microkinetic models can be an effective method for studying these systems; however, since these models can be quite large (174 species and 3275 reactions for a recent n-dodecane/methyl oleate autoxidation model [4]), building the models automatically using a computer is preferable to increase speed and reduce errors.

Another complex chemical system which can benefit from detailed kinetic modeling is chemical vapor deposition (CVD). CVD is an important process in the semiconductor industry for making silicon wafers. Predictive models can guide experimentalists on how to design new silicon precursors and choose process conditions that will improve the efficiency of CVD and the quality of silicon produced. Modeling just the gas-phase portion of CVD is complex; one large manually built model, for the application of silicon nanoparticle formation, contains 220 chemical species and 2600 reactions [5]. Again, building these models by hand is slow and error-prone, and important pathways may be missed, so automatic mechanism generation is more desirable. Mechanism generation software was utilized in one such study of silicon hydrides in the gas phase [6]. Proper modeling of the gas-phase chemistry of CVD using automatic mechanism generation paves the way for similar modeling of gas-surface chemistry, which is integral to these systems.

This dissertation will address these two chemistry domains using automatic mechanism generation, and will focus on one software package in particular, the Reaction Mechanism Generator (RMG) [7, 8]. Studying such systems using RMG (and automatic mechanism generation in general) is novel; most applications of automatic mechanism generation are in combustion and pyrolysis of hydrocarbon fuels, with recent extensions to oxygenated biofuels [9] and fuels containing sulfur [10]. Extensions beyond combustion and pyrolysis of fuels include refrigerant formation from chlorinated hydrocarbons [11] and the pyrolysis of ethyl nitrite [12].

The completed work falls at the intersection of chemical kinetics, computational chemistry, and machine learning, with the latter two used for parameter estimation in microkinetic models. Background on each of these facets of the dissertation will be discussed in this chapter, including a brief history of automatic mechanism generation.

1.1 Automatic mechanism generation

As early as 1979, Ugi et al. introduced the idea of using matrices to represent chemical species, reactions and distances [13]. The concept was extended to mechanism generation, defined as the problem of finding all possible sets of bond/electron matrices that fit a given reaction matrix, and then using rules based on chemical knowledge to limit the size of the mechanism.

Around the same time, Yoneda created GRACE, a network generator based upon matrix theory using square matrices [14]. The reactant and product matrices are also divided into atom groups, each consisting of a center atom and its attached hydrogen atoms. GRACE can handle both radical and ionic reactions but does not take stereochemistry into account. The sub-systems of GRACE include decomposing an overall reaction into a set of elementary reactions; gathering the Arrhenius parameters for these reactions; and setting up the material balances for each species. This mechanism generator also includes additional constraints; an example of these is given in Figure 1.1, reproduced from the paper. The authors acknowledge that the constraints must be carefully chosen by an experienced chemist, so that one does not generate unreasonable reactions or species.

Figure 1.1: Rules imposed for the generation of an elementary reaction network, reproduced from [14].
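The bond/electron matrix bookkeeping mentioned above can be illustrated with a small example. The following is a minimal sketch, assuming the usual convention that the reactant matrix B plus the reaction matrix R gives the product matrix E; the reaction (H2 + Cl2 → 2 HCl), the atom ordering, and the function names are chosen here for illustration and are not taken from GRACE or from ref. [13].

```python
# Bond/electron (BE) matrix bookkeeping: B (reactants) + R (reaction) = E (products).
# Atom order is an arbitrary choice for this illustration: H1, H2, Cl1, Cl2.

def add_matrices(b, r):
    """Element-wise sum of two square matrices given as lists of lists."""
    return [[b_ij + r_ij for b_ij, r_ij in zip(b_row, r_row)]
            for b_row, r_row in zip(b, r)]

def valence_electrons(matrix):
    """Row sums: free electrons plus bond orders for each atom."""
    return [sum(row) for row in matrix]

# Off-diagonal entries are bond orders; diagonal entries count free
# (non-bonding) valence electrons: 0 for H, 6 for Cl.
B = [
    [0, 1, 0, 0],  # H1 bonded to H2
    [1, 0, 0, 0],  # H2 bonded to H1
    [0, 0, 6, 1],  # Cl1 bonded to Cl2
    [0, 0, 1, 6],  # Cl2 bonded to Cl1
]

# Reaction matrix for H2 + Cl2 -> 2 HCl: break H1-H2 and Cl1-Cl2,
# form H1-Cl1 and H2-Cl2.
R = [
    [0, -1, 1, 0],
    [-1, 0, 0, 1],
    [1, 0, 0, -1],
    [0, 1, -1, 0],
]

E = add_matrices(B, R)
print(E)
```

Each row of R sums to zero, so the number of valence electrons assigned to each atom is unchanged by the reaction; a generator based on this representation could use that property as one of its consistency checks.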

The matrix theory above is also applied in the 1992 network generator KING [15]. A combinatorial approach is taken, in which neither the set of products to be formed nor the possible reactions are known in advance. In this way, a large network is formed, but it may be possible to discover new reaction types. It is suggested that constraints, including the number of species that are allowed to interact and the number of bonds that can be broken or formed, be implemented in order to reduce the size of the networks formed.

Ranzi et al. used a lumping method in their mechanism generator, MAMOX, developed in Fortran for oxidation and pyrolysis of fuels [16]. In this procedure, the decomposition of the primary products is simply “lumped”, and products beyond this first decomposition are ignored. An overall rate constant is given for each type of reaction of the primary products (see the reproduced Figure 1.2). Lumping reduces the number of species and reactions to a manageable number, but one may miss important pathways or products.

Figure 1.2: Lumped approach for the primary oxidation of n-pentane, reproduced from [16]. Intermediates and products are grouped together, with k1 - k9 representing lumped rate constants for each step of the mechanism.

Alternatively, NetGen is a mechanism generator which, instead of lumping, includes the reactions deemed important in the network and removes the ones which are not [17]. If a species has a production rate that is greater than some minimum rate, it is included in the reaction network. This minimum rate is determined by a user-specified factor multiplied by a characteristic rate, defined as

$$R_{\mathrm{char}} = \frac{\text{amount of reactant converted}}{\text{time it takes for conversion}}$$
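The characteristic-rate criterion can be sketched in a few lines. This is a simplified illustration and not NetGen’s implementation: the function names, the dictionary of production rates, and all numerical values are hypothetical.

```python
def characteristic_rate(amount_converted, conversion_time):
    """R_char = amount of reactant converted / time it takes for conversion."""
    return amount_converted / conversion_time

def species_to_include(production_rates, r_char, tolerance):
    """Return the species whose production rate exceeds tolerance * R_char.

    production_rates maps species name -> production rate, in the same
    units as r_char (for example mol/(L s)).
    """
    r_min = tolerance * r_char
    return sorted(name for name, rate in production_rates.items() if rate > r_min)

# Hypothetical numbers: 0.5 mol/L of reactant converted in 10 s,
# with a user-specified factor of 0.1.
r_char = characteristic_rate(0.5, 10.0)
candidates = {"A": 0.02, "B": 0.001, "C": 0.004}
print(species_to_include(candidates, r_char, tolerance=0.1))
```

With these placeholder values the minimum rate is 0.005, so only species A passes the test; lowering the user-specified factor would admit more species and produce a larger mechanism.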

If a species exceeds the minimum rate, it is included in the network and then reacted with all other species, and these reactions are added to the mechanism. The mechanism generation stops when no more species exist which have a rate greater than the minimum rate and the desired conversion of a starting species has been achieved. The concept of characteristic rate is included in several other mechanism generators, including Genesys and RMG, to be described in the following sections.

EXGAS is software for generating mechanisms of alkane and ether fuels [18]. The mechanisms created consist of a reaction base which involves all unimolecular or bimolecular reactions of molecules C0 to C2; a primary mechanism with only initial species and oxygen as reactants; and a secondary mechanism, which is lumped. The lumping is done by grouping the species formed in the primary mechanism that contain the same functional groups and have the same chemical formula. A distinguishing feature of EXGAS is that internally, non-cyclic species are represented as treelike structures. Thus, species can be compared for redundancy using an algorithm called the “algorithm of canonicity”, based on graph theory. The concept of graph representation of species was used in subsequent automatic mechanism generators (with additional layers of complexity added in some cases).

A mechanism generator developed in 2003 by Ratkiewicz et al., COMGEN, uses chemical graphs to represent species both internally and externally [19]. Information such as atomic charges, valences and bond types can also be included in these graphs. To generate reactions, the program relies on reaction patterns, which are matched with the same pattern in the reacting molecules.

The Rule Input Network Generator (RING) takes not only the initial species and network analysis instructions as user inputs, but also the chemistry rules it uses to generate reactions [20]. The language used to provide input to RING is English-like, and the compiler “proofreads” this input in order to communicate between the user and network generation. RING has the ability to represent intermediates and also has a post-processing module which analyzes the network topologically to enable lumping of pathways and mechanisms. This mechanism generator also implements a lumping scheme that groups together isomers with the same functional groups. For a network representing the dehydration of fructose to form hydroxymethylfurfural, for example, lumping reduced the number of reactions by a factor of two.
RING has the ability to represent user-defined entities that the program does not automatically identify, such as inorganic atoms or active sites on catalysts. It can also represent non-bonded interactions such as partial bonds and hydrogen bonds.

Genesys, developed at Ghent University, is similar to NetGen and RING in that it uses only elementary reactions, limits the network size by characteristic rate, and uses a rule-based approach for defining how molecules react. However, it differs in its internal representation of chemical species; rather than using a graph, it utilizes the Chemistry Development Kit (CDK). The representation allows different stereoisomers, which connectivity graphs cannot distinguish, to be uniquely identified [21]. This capability is beneficial, since different stereoisomers may have different reactivity.

1.1.1 Reaction Mechanism Generator (RMG)

The mechanism generator utilized in this work is the Reaction Mechanism Generator (RMG) [7, 8, 22]. RMG is an open-source program with two versions, one written in the Java programming language and the other in Python. The Python version is used and modified in this work because it is simpler to use and its code is more readable than that of the Java version. Given a set of starting species, temperature, and pressure, RMG uses the literature and chemical knowledge to propose a network of elementary reactions, their rates, and thermodynamic properties.

In RMG, like several of the aforementioned mechanism generators, chemical species are represented by graphs, with nodes as atoms and edges as bonds. Graph theory can then be used to compare species and ensure uniqueness in the network. The network is built by expanding a ‘core’ set of species, starting with the user input. When a reaction results in a new species, that species is put on the ‘edge’. When the flux to an edge species reaches a critical value, determined by a user-defined tolerance and the characteristic rate of the core, the species is brought into the core. The simulation ends when no more edge species meet this criterion for a given simulation time or species conversion [23]. The concept of characteristic rate has been previously described [17].

A unique aspect of RMG is its way of estimating reaction rates. Reaction families in RMG have a specific “recipe” that defines the change in radicals, lone pairs and bond order occurring in a reaction. Figure 1.3 gives an example of the template and recipe for the hydrogen abstraction reaction family. Starred atoms indicate the reacting atoms. Kinetic groups include the reacting atom in the family’s recipe and the structures surrounding that atom. These groups are defined in RMG in a tree-like structure. The root of each tree is the most generic group which follows the reaction recipe, while descendant nodes represent more specific groups. An example of a hierarchical tree is given in Figure 1.4. When a combination of groups reacts, a rule describes its kinetic parameters. If a rule exactly matches the combination of groups in the reaction, it will be returned; if not, the nearest neighbors in the hierarchical trees will be averaged to determine the reaction’s kinetic parameters.

Figure 1.3: Example of the template and recipe for the hydrogen abstraction reaction family. In the recipe, starred atoms participate in each action; ‘S’ refers to a single bond being formed or broken, and ‘1’ indicates that the radical count increases or decreases by 1.

Figure 1.4: An example of part of the hierarchical tree for the hydrogen abstraction reaction class, from the RMG documentation.

One way new kinetics can be added to RMG’s database is as part of a reaction library. Rates in a reaction library, such as GRI-mech [24], are used without modification whenever the reactant or product species are found in the model core. This method of addition to RMG’s database should be used for reactions that do not belong to a reaction family. For example, reactions with small molecules may not follow the same trend as others in the reaction family, or reactions may not match an existing recipe at all. Also, if a pressure-dependent reaction rate is known, this reaction should be added to a library in order to bypass RMG’s Master Equation code for pressure dependence, explained below.

Reaction kinetics added as training reactions to a reaction family are placed into the hierarchical tree of estimate rules, using the most specific functional group definitions possible for those reactants. They are then used when completing the more general nodes in the tree by averaging. Training reaction rates must provide high-pressure limit kinetics for an elementary reaction of a specific reaction family, and will influence the estimates of other similar reactions generated by the reaction family. This is the preferred way to add kinetic data to RMG’s rule-based estimation database [8].
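The idea of completing general nodes of a hierarchical tree by averaging more specific entries can be sketched as follows. This is a deliberately simplified illustration, not RMG’s implementation: the `GroupNode` class, the node labels, the choice to average log10 of a rate coefficient, and the numerical values are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional

@dataclass
class GroupNode:
    """One functional-group node in a hypothetical rate-rule tree."""
    label: str
    log10_k: Optional[float] = None  # stored rule, if one exists for this node
    children: List["GroupNode"] = field(default_factory=list)

def estimate_log10_k(node: GroupNode) -> Optional[float]:
    """Return the node's own rule, else the mean of its children's estimates."""
    if node.log10_k is not None:
        return node.log10_k
    estimates = [estimate_log10_k(child) for child in node.children]
    known = [value for value in estimates if value is not None]
    return mean(known) if known else None

# Placeholder tree: a generic root with two more specific children that
# each carry a rule (the values are arbitrary, not real kinetics).
root = GroupNode("X_H", children=[
    GroupNode("C_H", log10_k=6.0),
    GroupNode("O_H", log10_k=8.0),
])
print(estimate_log10_k(root))
```

In this sketch a node with its own rule always takes precedence, mirroring the exact-match case described above, and a node without any data beneath it returns `None` rather than a guessed value.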
In the case of unimolecular reactions, when the number of nonreactive collisions with a third body is rate-limiting, rate coefficients are dependent on both temperature and pressure. RMG contains a methodology for using the high-pressure limit kinetics to estimate these pressure dependent rate coefficients, which is described in ref. [25].

When a new chemistry domain is studied with RMG, new reaction libraries, families and rate estimates should be added. This poses particular challenges when extending to new elements, and especially when extending to a new phase. Prior to this work, all reaction rates included in RMG’s database, as well as its methods for rate estimation, were based upon data for gas-phase reactions. Additionally, RMG previously contained capability for chemical species containing only carbon, hydrogen, oxygen, nitrogen, sulfur and chlorine.

1.2 Parameter estimation

Most parameters in microkinetic models are estimated, due to a lack of known and trusted kinetic and thermodynamic parameters from experiments or high level theoretical calculations [26]. The following sections will outline estimation techniques that are used to calculate these kinetic and thermodynamic values at low computational cost. These estimation techniques have been incorporated into automatic mechanism generation software and form the basis for some of the work in this dissertation.

1.2.1 Thermodynamics

Thermodynamic values can be calculated using group additivity, with radical chemical species adding more complexity. When group additivity is insufficiently accurate, the parameters can also be calculated using quantum chemistry.

1.2.1.1 Benson’s group additivity scheme

Benson developed a scheme in which chemical species are decomposed into groups, defined as a central atom connected to its ligands. Group values for enthalpies and entropies of formation, and heat capacity as a function of temperature, can then be summed to obtain these thermodynamic values for the overall species [27]. Group values exist for molecules containing a variety of atoms and also for open-shell species, but groups do not yet exist for every type of chemical species that might be present in a microkinetic model. Group values can be derived either from experimental data or from high level quantum calculations. The accuracy of the group additivity method is dependent on the accuracy of the methods used to derive the group values, and also on how similar the chemical species to be estimated are to the species from which the groups were derived. For example, many group additivity values exist for stable molecules, but fewer for radicals or ions, meaning that less appropriate or analogous values are sometimes applied to calculate their thermodynamic properties [26].
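The summation at the heart of group additivity can be sketched briefly. This is a minimal illustration under stated assumptions: the two alkane group values are approximate, commonly tabulated numbers quoted here only for illustration and should be checked against a primary tabulation before use, and the dictionary and function names are hypothetical.

```python
from collections import Counter

# Approximate group contributions to the enthalpy of formation at 298 K,
# in kcal/mol (illustrative values; verify against a tabulation before use).
GROUP_ENTHALPY_KCAL_PER_MOL = {
    "C-(C)(H)3": -10.20,   # methyl carbon bonded to one carbon
    "C-(C)2(H)2": -4.93,   # methylene carbon bonded to two carbons
}

def enthalpy_of_formation(groups):
    """Sum group contributions; raises KeyError if a group has no value."""
    counts = Counter(groups)
    return sum(n * GROUP_ENTHALPY_KCAL_PER_MOL[group] for group, n in counts.items())

# Propane decomposed by hand into two methyl groups and one methylene group.
propane = ["C-(C)(H)3", "C-(C)(H)3", "C-(C)2(H)2"]
print(round(enthalpy_of_formation(propane), 2))
```

The `KeyError` raised for an unknown group reflects the limitation noted above: when no group value exists, an estimate cannot be formed without substituting a less appropriate analogous group or calculating a new value.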

1.2.1.2 Hydrogen Bond Increment method

An alternative method to Benson’s group additivity for determining the thermodynamic properties of radical species was proposed by Lay et al. [28] and is known as the Hydrogen Bond Increment (HBI) method. The approach uses the thermodynamic properties of parent molecules and a single group value to account for loss of a hydrogen atom. For enthalpy of formation, $\Delta_f H^\circ_{298}$, the group value for a radical $R^{*}$ is simply the bond strength of the R–H bond. The HBI values for $C_p(T)$ and $S^\circ_{298}$ are obtained from molecular structure differences between the radical and parent molecule using the rigid-rotor/harmonic oscillator (RRHO) approximation, which essentially means that translations, rotations, and vibrations of the species are treated as uncoupled when solving the Schrödinger equation. The $S^\circ_{298}$ value calculated for the radical does not include symmetry corrections, which should be added in later based on the point group of the radical. The equations for the thermodynamic properties are as follows:

$$\Delta_f H^\circ_{298}(R^{*}) = \mathrm{HBI}(\Delta_f H^\circ_{298}) + \Delta_f H^\circ_{298}(RH) - \Delta_f H^\circ_{298}(H)$$

$$C_p^\circ(R^{*}) = \mathrm{HBI}(C_p^\circ) + C_p^\circ(RH)$$

$$S^\circ_{298}(R^{*}) = \mathrm{HBI}(S^\circ_{298}) + S^\circ_{298}(RH)$$

where $R^{*}$ is the radical chemical species, RH is the parent molecule created by saturating the radical chemical species with hydrogen atoms, and HBI are the group values for each of enthalpy, entropy and heat capacity.
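The three HBI relations translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the numbers used for the methyl radical example (a C–H bond strength of about 105 kcal/mol, an enthalpy of formation of methane of about -17.9 kcal/mol, and an enthalpy of formation of the hydrogen atom of about 52.1 kcal/mol) are approximate literature values used as assumptions.

```python
H_ATOM_ENTHALPY = 52.1  # kcal/mol, approximate enthalpy of formation of H at 298 K

def radical_enthalpy(hbi_enthalpy, parent_enthalpy, h_atom_enthalpy=H_ATOM_ENTHALPY):
    """Enthalpy of formation of R*: HBI value + parent value - H atom value."""
    return hbi_enthalpy + parent_enthalpy - h_atom_enthalpy

def radical_entropy(hbi_entropy, parent_entropy):
    """Entropy of R* before any symmetry correction is applied."""
    return hbi_entropy + parent_entropy

def radical_heat_capacity(hbi_cp, parent_cp):
    """Heat capacity of R* at each tabulated temperature (element-wise sum)."""
    return [increment + parent for increment, parent in zip(hbi_cp, parent_cp)]

# Methyl radical from methane, using the approximate values stated above.
methyl_enthalpy = radical_enthalpy(hbi_enthalpy=105.0, parent_enthalpy=-17.9)
print(round(methyl_enthalpy, 1))
```

The enthalpy function makes explicit why the HBI enthalpy value equals the R–H bond strength: rearranging the first relation gives the enthalpy change for RH → R* + H.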

1.2.1.3 Quantum Mechanics Thermodynamic Property estimation

RMG contains a module for calculating thermodynamics known as Quantum Mechanics Thermodynamic Property (QMTP) estimation. During the course of a simulation, if this option is turned on, RMG will send information about the molecule’s 2D graph to a computational chemistry program to obtain the 3D geometry and eventually the species’ vibrational frequencies and enthalpy. Other thermodynamic properties can be calculated from the computational chemistry output files using statistical mechanics. Typically, this option is only used for cyclic and polycyclic species. QMTP uses the RRHO approximation and was built for semi-empirical methods such as PM3, PM6 and PM7. While these methods are not as accurate as other quantum chemistry methods, such as density functional theory (DFT), they offer an adequate alternative at low computational cost when group additivity estimates are very inaccurate [29].

1.2.2 Kinetics

Similar to thermodynamics, unknown reaction kinetics can be calculated using estimation techniques. Kinetic parameters can also be calculated directly using transition state theory; these are used to formulate rate rules for a given type of reaction [30].

1.2.2.1 Evans-Polanyi

One estimation correlation, which relates reaction rates to enthalpy of reaction, is the Evans-Polanyi relationship [31]:

$$k(T) = A \exp\left(-\frac{E_0 + \alpha \Delta H^\circ}{RT}\right)$$

The correlation trends with reaction family; parameters A, α and E0 are fitted for a given reaction family. While computationally efficient, the relationship is not accurate for all classes of reactions.
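As a minimal sketch of how the Evans-Polanyi relationship turns a reaction enthalpy into a rate coefficient, the function below evaluates the Arrhenius form with an activation energy of E0 + α∆H°. The function and argument names are chosen here for illustration, and any parameter values passed to it would be family-specific fitted values that are not given in this text.

```python
import math

R = 8.314462618  # ideal gas constant, J/(mol K)


def evans_polanyi_rate(T: float, A: float, E0: float,
                       alpha: float, dH_rxn: float) -> float:
    """Rate coefficient from the Evans-Polanyi relationship.

    T      : temperature in K
    A      : pre-exponential factor (units set the units of the result)
    E0     : intrinsic barrier of the reaction family, J/mol
    alpha  : dimensionless Evans-Polanyi slope
    dH_rxn : enthalpy of reaction, J/mol
    """
    activation_energy = E0 + alpha * dH_rxn
    return A * math.exp(-activation_energy / (R * T))
```

With a positive α, a more exothermic reaction (more negative ∆H°) has a lower estimated barrier and therefore a larger rate coefficient, which is the trend the correlation is meant to capture within a family.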

1.2.2.2 Transition State Theory

With increases in computing power, more reaction rates can be calculated using quantum chemistry and statistical mechanics concepts in a framework known as transition state theory [32]. Transition state theory says that a quasi-equilibrium exists between reactants and activated complexes, or transition states. Classical transition state theory is described by the following equation:

$$k(T) = \frac{k_B T}{h} \exp\left(-\frac{\Delta G^\ddagger}{RT}\right)$$

where kB is Boltzmann’s constant, T is temperature, h is Planck’s constant, R is the ideal gas constant, and ∆G‡ is the difference in Gibbs free energy between transition state and reactant. The equation can also be expressed in terms of partition functions, which can be calculated using statistical mechanics and molecular parameters from quantum chemistry calculations. CanTherm, included as part of the RMG software but often used stand-alone, is an automated kinetics calculator that calculates reaction rates via classical transition state theory (and also can calculate thermodynamic properties) from quantum chemistry output files [33]. Classical transition state theory is only appropriate for reaction types with a clear reaction barrier; variational transition state theory should be used when the dividing surface between reactant and product is less clear. Other kinetics calculators include POLYRATE [34], Variflex [35], and MultiWell [36].
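The classical transition state theory expression can be evaluated directly once ∆G‡ is known. The sketch below is an illustration only and is not how CanTherm is implemented; it gives a first-order rate coefficient in 1/s and leaves out the standard-state concentration factor that a bimolecular reaction would need, as well as any tunneling correction.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 8.314462618       # ideal gas constant, J/(mol K)


def tst_rate(T: float, dG_activation: float) -> float:
    """Classical TST rate coefficient in 1/s.

    T             : temperature in K
    dG_activation : Gibbs free energy of activation, J/mol
    """
    return (K_B * T / H) * math.exp(-dG_activation / (R * T))
```

At 298.15 K the prefactor kB T / h is about 6.2 × 10^12 1/s, so the exponential term carries essentially all of the sensitivity to the barrier.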

1.2.3 Transition state geometries

To allow kinetics to be calculated via transition state theory, a transition state geometry is required. This is often the bottleneck of kinetics calculations, as a very close estimate is needed for computational chemistry programs to correctly optimize the transition state geometry, which lies at a first order saddle point on the potential energy surface. Double-ended methods have been developed, which require the user to provide the reactant and product geometries and use these to find the transition state. These methods fall under the categories of interpolation, nudged elastic band, or string methods. Some of these double-ended methods have been automated, such as an algorithm developed by Zimmerman [37], and KinBot [38], but the computational cost of these programs is prohibitive for use in detailed kinetic models that may require thousands of rate parameter estimates.

Bhoorasingh and West developed a group contribution algorithm for creating the transition state geometry estimate directly. It uses a machine learning approach to train key distances in the transition state geometry, based on the molecular structure of the reactants and products. Once these distances are known, correct transition states can be optimized about 70% of the time, for three reaction families [39]. The algorithm was recently integrated with computational chemistry packages, as well as CanTherm, for a fully automated method to calculate reaction rates. This automated kinetics algorithm, known as AutoTST, was used to generate some of the transition states and reaction rates used in this work [40].

1.3 Machine learning with decision trees

Thermodynamics, kinetics, and transition state geometry estimation using group contribution is essentially a regression problem. Group values, based on molecular structure, are calculated by “regressing”, or minimizing the overall error (based on some cost function) between estimated and true values from a training set. In RMG, these regressions are currently set up as decision tree regressors, which are retrained when new reaction rates or transition state geometries are added to the training set.

Decision trees can be applied to classification or regression problems. The class label, or value in the case of regression, is chosen based on a set of questions about features of data items. The tree is constructed by training on data items whose class label or value is known, and then used on new data points. Decision trees are advantageous because they are easier to understand than other, more complex machine learning models, such as neural networks, but are more robust than linear or logistic regressions. The predictions can also be improved by combining the results of multiple decision trees, known as ensembling [41]. Depending on the application, the decision tree method is sometimes preferable over simpler methods such as cluster analysis or data partitioning, which are used to develop quantitative structure activity relationships (QSAR), because of its applicability to large sets of diverse compounds which may also contain erroneous data [42]. Some examples of using decision trees in chemistry and biology are described below.

1.3.1 Decision trees in chemistry and biology

There are many examples of bioinformatics and cheminformatics problems that have been approached with decision trees. Han et al. describe a method for choosing biologically interesting compounds for drug discovery from a high throughput screening (HTS) data set. The decision tree was formulated based on the PubChem chemical structure fingerprint system, and the C4.5 algorithm was used to construct the decision trees [43]. Ten-fold cross validation was used to verify the validity of the decision tree model. It was shown that the model can be used to determine commonalities in an HTS data set, select compounds, and eliminate selections which arise from noisy data [42].

DNA sequencing is also a promising area for use of decision trees. Decision tree regression was used by Thornley et al. for prediction of unknown bases in a sequence. The decision tree was trained on all the peak heights near the base to be predicted, as well as the bases in the neighborhood of those peaks. A neural network was further used to regress the information that was most successful for the decision tree regressor [44].

Decision tree regression has also been used for environmental applications. Hu and Cheng used a decision tree method known as the conditional inference tree (CIT) to understand which factors are important for predicting heavy metal distribution among the surface soils of the Pearl River Delta in China. They used a random forest approach, a bootstrapping algorithm where the CITs are subsampled without replacement. They combined the CIT approach with a finite mixture distribution model (FMDM), which can be used to distinguish between natural and anthropogenic causes of heavy metal concentration in soils [45].

Decision tree classification and regression have proved useful for several applications in chemistry and biology. Diverse features and data sets can be used, the resulting models are easy to understand, and missing data are handled readily. They are also more robust than other machine learning techniques. Furthermore, decision tree regressors are already implemented for some estimation in RMG, and can be extended to applications in this dissertation. As it stands, these trees are built by hand from chemical intuition, and the structure of these trees also influences the accuracy of the predictive models. Automated decision tree generation will be briefly discussed in Chapter 3.
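To make the idea of a hand-built decision tree regressor concrete, the sketch below routes a data item down a tree of yes/no questions and returns the value stored at the deepest matching node that holds data. It is a toy illustration, not RMG's implementation: the `Node` class, the feature names, and the stored values are all hypothetical, and the fallback to an ancestor when a specific node has no data is an assumption made here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """One question in a hand-built tree (hypothetical structure)."""
    label: str
    matches: Callable[[dict], bool]   # the question asked of a data item
    value: Optional[float] = None     # fitted value, or None if no data
    children: List["Node"] = field(default_factory=list)


def estimate(root: Node, item: dict) -> Tuple[str, float]:
    """Descend to the most specific matching node and return its value.

    If the most specific node has no value, the value of the closest
    ancestor that does have one is used instead.
    """
    if not root.matches(item):
        raise ValueError("item does not match the root of the tree")
    node, best = root, root
    while True:
        child = next((c for c in node.children if c.matches(item)), None)
        if child is None:
            break
        node = child
        if node.value is not None:
            best = node
    if best.value is None:
        raise ValueError("no node on the matched path holds a value")
    return best.label, best.value


# Hypothetical three-level tree with placeholder values.
tree = Node("C", lambda m: m["element"] == "C", value=1.0, children=[
    Node("C_sp3", lambda m: m["hybridization"] == "sp3", children=[
        Node("C_sp3_4heavy", lambda m: m["heavy_neighbors"] == 4, value=0.5),
    ]),
])
```

Because the questions and their ordering are written by hand, two trees fitted to the same training data can give different estimates, which is why the tree structure itself affects accuracy.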

1.4 Dissertation overview

The dissertation will apply automatic mechanism generation, quantum chemical calculations and machine learning to two new domains: silicon hydride chemistry and liquid-phase fuel chemistry. Chapter 2 of the dissertation discusses the implementation of solvation thermodynamic corrections to the gas phase data and methods in the Python version of RMG (RMG-Py). The implementation draws largely from the implementation in RMG-Java, with several improvements and additional data. Chapter 3 continues on the liquid-phase application, introducing a new method which uses machine learning to predict a change in reaction rate between gas and liquid-phase. The method is used to modify the rates of an existing fuel oxidation model. Chapter 4 introduces a different application: the use of automatic mechanism generation to generate detailed kinetic models for silicon hydrides. The process of calculating and adding thermodynamic and kinetic data for silicon hydrides to RMG is outlined, and a new model for gas-phase thermal decomposition of SiH4 is generated and discussed. Chapter 5 concludes by wrapping up the previous chapters and recommending some future directions for the research.

2 AUTOMATIC CALCULATION OF SOLVATION THERMODYNAMICS

Several environmentally, medically, and industrially relevant chemical systems involve liquid-phase reactions, including secondary organic aerosol formation, oxidation of fuels in the condensed phase, and radical scavenging in the body [4, 46–49]. When these systems are large and complex, containing thousands of radical-radical and radical-molecule reactions, it is difficult to elucidate all reaction pathways by hand. It is much easier and less error-prone to generate these mechanisms automatically. Thermodynamic solvation corrections are one necessary component in automatic mechanism generation for liquid-phase systems. It is important to have correct thermodynamics, because the equilibrium constant, Keq, is calculated from ∆Grxn. The accuracy of ∆Grxn, and therefore of Keq, can change the reaction mechanism dramatically. Because of solute/solvent interactions, the thermodynamics for individual chemical species, which are used to calculate the overall reaction thermodynamics, are different in gas and liquid phase. Experimental information is not available for every species (and is especially sparse for radicals), so there exists a need for some other way to estimate the thermodynamic parameters. Literature data and estimation methods are already implemented in the Reaction Mechanism Generator for gas-phase thermodynamics [7, 8, 22]. Using these gas-phase data, we can make so-called “corrections” to the thermodynamic data for the liquid-phase. Such methods exist in the Java version of RMG [50]. I have implemented these methods in the Python version of RMG, and extended the capabilities further as part of this dissertation.

2.1 Background

Below is a review of methods used by chemists to calculate the liquid-phase thermodynamics parameters that are necessary for mechanism generation. The parameters can either be calculated explicitly using computational chemistry, or they can be estimated using empirical relationships.

2.1.1 Explicit thermodynamic calculations

Calculation of thermodynamic parameters in solution is performed using a variety of methods, including discrete models such as quantum mechanical (QM), molecular mechanical (MM) and hybrid (QM/MM) models [51], and continuum models [52]. Discrete solvation models treat each solvent molecule separately and can be computationally expensive, especially in the case of pure QM methods. These methods also cannot represent long-range, bulk phenomena in solvent. Continuum methods, including the polarizable continuum methods (PCM), multipole expansion (MPE) and Generalized Born (GB), are less expensive when taking a QM approach. However, they have the disadvantage of neglecting local interactions between solute and solvent. A more thorough description of these methods can be found in Jalan et al. [53], and in Chapter 3.

2.1.2 Estimation techniques

Another approach to calculating solvation thermodynamics is to use estimation methods, such as the linear solvation free energy relationship method (LSER). This approach is based on the assumption that the solvation thermochemistry of a single species can be broken down into individual contributions from solute/solvent properties such as cavitation, dispersion, hydrogen bonding and polarizability. Kamlet and Taft suggested that these contributions could be quantified in terms of electronic transitions, such as π to π* and p to p*, which occur when a solute is solvated. Kamlet and Taft’s observations along with other contributions resulted in the solvatochromic equation [54]:

$$SP = SP_0 + s\pi_1^* + d\delta + a\alpha_1 + b\beta_1 + h(\delta_h)^2$$

SP refers to any solvation parameter (with SP0 an intercept); for example, one solvation property to be calculated is the logarithm of the partition coefficient between gas and solvent (logK). The lowercase parameters are fitted parameters for a given solvent, while the other parameters represent the following: π1* is the electrostatic contribution due to dipolarity/polarizability, α1 is hydrogen bond donation, β1 is hydrogen bond acceptance, δh represents cavity formation, and δ is a polarizability correction factor [55]. Abraham et al. [56] later refined this equation:

SP = c + eE + sS + aA + bB + lL

where the capital letters purely represent solute properties and again, the lowercase letters are fitted solvent parameters. These solvent parameters have been previously tabulated and published for many solvents; for example, those for water and 1-octanol can be found in [57]. Of the solute descriptors, S represents the solute’s electrostatic interactions due to dipolarity and polarizability and was derived from the π* parameter; L is a representation of size based upon the solute’s gas-hexadecane partition coefficient; E is derived from the Kamlet-Taft δ parameter and serves as a correction to S; A represents hydrogen bond donation ability/acidity of the solute; and B is the hydrogen bond acceptance ability/basicity [55]. Some correlations are used to find the solute (uppercase) Abraham parameters individually (reviewed in [53]), but molecular structure group additivity methods can be used to calculate all of the parameters at once. In particular, Platts et al. devised 81 molecular structure fragments to be used in the calculation of S, B, E and L, and another 51 fragments for A [58].

Figure 2.1: Examples of molecular structure fragments. (a) An atom-centered methyl group, consisting of a carbon atom with three single bonds to hydrogen atoms. (b) A non atom-centered ester group, specified by several atoms.

These fragments include atom-centered contributions, such as a carbon atom attached to four non-hydrogen atoms; group-based corrections, such as fused rings; and intramolecular interactions, such as ortho, meta and para interactions. Examples of some of these groups are given in Figure 2.1. The contributions for each of these fragments are added together to obtain a value for each solute descriptor. Then, the Gibbs free energy of solvation, ∆Gsolv, at 298 K can be found from the partition coefficient with the following:

$$\Delta G_{\mathrm{solv}} = -RT \ln K$$

Mintz et al. fitted solvent descriptors that apply the solvatochromic approach, together with the Abraham solute descriptors, to the prediction of the enthalpy of solvation, ∆Hsolv [59]:

$$\Delta H_{\mathrm{solv}} = c_h + e_h E + s_h S + a_h A + b_h B + l_h L$$

where the subscripts h denote that these solvent descriptors refer to the fitted parameters corresponding to the enthalpy of solvation relationship.
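Both the Abraham logK relationship and the Mintz ∆Hsolv relationship are the same linear form with different solvent coefficients, so one small function can serve for both. The sketch below is an illustration under stated assumptions rather than the RMG-Py code: the class and function names are invented here, and logK is assumed to be a base-10 logarithm, so it is converted to a natural logarithm before applying ∆Gsolv = −RT lnK.

```python
import math
from dataclasses import dataclass

R = 8.314462618  # ideal gas constant, J/(mol K)


@dataclass(frozen=True)
class SoluteDescriptors:
    E: float
    S: float
    A: float
    B: float
    L: float


@dataclass(frozen=True)
class SolventCoefficients:
    """Fitted solvent coefficients c, e, s, a, b, l for one property."""
    c: float
    e: float
    s: float
    a: float
    b: float
    l: float


def lser(solute: SoluteDescriptors, k: SolventCoefficients) -> float:
    """Evaluate SP = c + eE + sS + aA + bB + lL."""
    return (k.c + k.e * solute.E + k.s * solute.S + k.a * solute.A
            + k.b * solute.B + k.l * solute.L)


def delta_g_solv(log10_K: float, T: float = 298.15) -> float:
    """Gibbs free energy of solvation in J/mol from a base-10 log K."""
    return -R * T * math.log(10.0) * log10_K
```

Passing the logK coefficients of a solvent to `lser` gives logK, and passing the enthalpy coefficients (the ones with subscript h) gives ∆Hsolv in whatever units those coefficients were fitted in; the solute descriptors are shared between the two calls.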

To obtain ∆Gsolv at other temperatures, a method of linear extrapolation is commonly used. ∆Gsolv and ∆Hsolv at 298 K are used to calculate the entropy of solvation, ∆Ssolv:

$$\Delta S_{\mathrm{solv}} = \frac{\Delta H_{\mathrm{solv}} - \Delta G_{\mathrm{solv}}}{T}$$

Assuming that ∆Hsolv and ∆Ssolv are independent of temperature, ∆Gsolv at a given temperature can be found with:

$$\Delta G_{\mathrm{solv}}(T) = \Delta H_{\mathrm{solv}}(298\,\mathrm{K}) - T \Delta S_{\mathrm{solv}}(298\,\mathrm{K})$$
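The linear extrapolation amounts to two lines of arithmetic, sketched below. The function names are chosen here for illustration, and the reference temperature is taken as 298.15 K, which is an assumption about what "298 K" denotes.

```python
T_REF = 298.15  # reference temperature in K (assumed meaning of "298 K")


def delta_s_solv(dH_298: float, dG_298: float, T_ref: float = T_REF) -> float:
    """Entropy of solvation at the reference temperature, J/(mol K)."""
    return (dH_298 - dG_298) / T_ref


def delta_g_solv_at(T: float, dH_298: float, dG_298: float,
                    T_ref: float = T_REF) -> float:
    """Gibbs free energy of solvation at temperature T, in J/mol,
    assuming the enthalpy and entropy of solvation do not change with T."""
    return dH_298 - T * delta_s_solv(dH_298, dG_298, T_ref)
```

By construction the function returns the input ∆Gsolv at the reference temperature, and the extrapolated value is a straight line in T whose slope is −∆Ssolv.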

Other methods for treating the temperature dependence of ∆Gsolv can be used, as in [60]; however, the Mintz method is both fast and useful in that the solute properties, calculated via group additivity, can be used for both the partition coefficient and the enthalpy of solvation.

The errors in the outlined linear solvation method for molecules at 298 K are comparable to those of the discrete and continuum methods, about 1 kcal/mol typically, but higher for large molecules and ions [61]. However, the computational time required is much shorter as compared to the more expensive QM and MM methods [53]. A study comparing force field (MM) calculations of solvation free energy with SM6, a continuum QM method [62], and LSER, for nitroaromatic compounds, found that the LSER method performs as well as the best force field methods. The LSER method had a mean unsigned error of 0.59 kcal/mol as compared with experimental data, with all energies slightly too negative. The deviation increased with increasing number of nitro functional groups on the molecules. The SM6 method, in comparison, yielded a mean unsigned error of 0.50 kcal/mol, was unaffected by the number of nitro groups, and is faster than MM calculations. However, the error was highly affected by the level of theory used for the geometry optimizations [63].

2.2 Methods

As mentioned, while solvation thermodynamics had been previously implemented in the Java version of RMG [50], in this dissertation these capabilities were adapted for use in RMG-Py. Additionally, several improvements to the prior implementation of solvation thermodynamics were made in this work.

2.2.1 Adaptation of solvation thermodynamics from RMG-Java

Figure 2.2: (a) Example of an entry in the solute group database, and (b) the molecular structure of the corresponding group.

The estimation methods of Abraham and Platts which are described in the Background, using linear solvation energy relationships along with group additivity, were newly implemented in this work. Specifically, the RMG-database [64] project, which is coupled with the Python version of RMG, was modified. The addition of solute molecular structure groups consisted of adding two group databases, one for atom-centered groups and one for non atom-centered groups. The groups are specified such that each atom (besides hydrogen) in a molecule must belong to exactly one atom-centered group, and may also belong to one non atom-centered group. An example entry in the solute group database and its corresponding structure is shown in Figure 2.2. Each group is defined by a label, an adjacency list specifying the atom types and bonds included in the group, and the Abraham solute parameters S, B, E, L and A. The starred atom is the one matched against atoms in the solute molecule in RMG, and “Cs” indicates that it is a carbon atom with only single bonds.

A solvent database also was added, where the solvent is defined by either an adja- cency list or its SMILES string identifier. The parameters included in the solvent database are the solvent coefficients for the logK and ∆Hsolv linear solvation energy relationships. Each entry also contains the A-E coefficients of the viscosity-temperature correlation to be used in diffusion corrections. Some entries also store the solvent’s dielectric constant and its solute A and B parameters for potential use in kinetic solvent corrections. The diffusion and other kinetic solvent corrections will be explained in the following chapter.

2.2.2 New additions in RMG-Py

Functions to calculate the partition coefficient K, ∆Hsolv, and ultimately ∆Gsolv(T) from the solute and solvent parameters, whether obtained from the databases or calculated from the Platts group fragments, were implemented using the equations in the Background. These methods were added to a new solvation module in RMG-Py, with modifications also made to the source code in the thermo and molecule modules to calculate and apply the solvation correction during an RMG simulation. Modifications also were made to the main RMG-Py execution in order to load the user-specified solvent at the start of the simulation so that the proper solvent descriptors would be loaded. Furthermore, in this work, several important updates and improvements were made to the implementation of solvation thermodynamics in RMG-Py, which are not present in RMG-Java.

2.2.2.1 Treatment of radicals and Abraham A value

No Abraham parameters for logK or ∆Hsolv exist for radical species. Therefore, we must devise a method to calculate solvation thermodynamics of radicals using the available data. By comparison, for gas phase thermodynamics, radical species (R*) data are calculated using the saturated species (RH) data and hydrogen bond increment (HBI) theory, explained in Chapter 1 [28]. One can similarly use the saturated species data to calculate the solvation thermodynamics of radicals by calculating the solute descriptors, and then correcting the Abraham A solute descriptor to account for the effect on hydrogen bonding caused by removing a hydrogen atom. For example, the effect of creating a peroxyl radical from a peroxide group by removing a hydrogen is a 0.345 decrease in A [58]. Fourteen Platts groups for this A descriptor correction have been added to a “radical” database in the current RMG-Py. These A descriptor corrections were obtained from Table 5 of [58], with slight modifications made to adhere to the RMG format of adjacency lists and to ensure that the correction is not double counted for groups where there are two hydrogen bonded atoms. The signs of the values were flipped from those provided in the table since the correction is applied when a hydrogen is removed. With these changes, the implementation of radical groups in RMG-Py is shown in Table 2.1.
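The radical treatment can be sketched as a two-step calculation: take the descriptors of the saturated molecule, then shift only A by the correction for the hydrogen that was removed. The function name and the dictionary layout below are hypothetical and are not taken from RMG-Py; the only number taken from the text is the 0.345 decrease in A for forming a peroxyl radical.

```python
# Correction to A for forming a peroxyl radical from a peroxide group,
# as quoted in the text (a 0.345 decrease in A).
PEROXYL_A_CORRECTION = -0.345


def radical_descriptors(saturated: dict, a_correction: float) -> dict:
    """Return solute descriptors for a radical.

    saturated    : descriptors of the hydrogen-saturated molecule, with
                   keys "E", "S", "A", "B", "L"
    a_correction : signed change in A caused by removing the hydrogen
    Only A is changed; the other four descriptors are copied unchanged.
    """
    corrected = dict(saturated)
    corrected["A"] = saturated["A"] + a_correction
    return corrected
```

The input dictionary is left untouched, so the same saturated-molecule descriptors can be reused for several radical sites on one parent molecule.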

2.2.2.2 Treatment of lone pairs in solvation thermodynamics

Currently, there exist no Platts groups to account for lone pairs on atoms above their normal bonded configuration (i.e., 0 for carbon, 2 for oxygen). Similarly, there also exist no gas-phase Benson groups for carbon-centered groups with one or more lone pairs. Since these groups would not match existing Platts or Benson groups, in both cases they would be estimated by RMG incorrectly. For example, since ¹CH2 is only singly bonded to 2 groups, in this case hydrogen atoms, it would not match any Platts groups that are sp3 carbon atoms bonded to 4 groups, or any sp2 or sp groups. Therefore, it would fall up RMG’s molecular structure group tree to a generic sp3 carbon atom. This generic group is filled in with the data for a carbon atom bonded to 4 other carbon atoms (as in neopentane), which would have very different thermodynamic properties than ¹CH2. When this situation occurs, rather than using these erroneous data, I devised and implemented a new method to convert these lone pairs to unpaired electrons, therefore converting ¹CH2 to ³CH2. RMG would then proceed with the algorithm in the prior section; that is, it would saturate these unpaired electrons, creating CH4, and calculate the solute descriptors for that molecule. While not relevant in this example, if removing these two hydrogen atoms would change the A solute descriptor, it would also be updated as in the true radical electrons case. Similarly, the gas-phase thermodynamics methods were updated to convert lone pairs to radicals when we do not have data for the specific lone pair group.

Table 2.1: Hydrogen bonding parameters from [58], adapted for corrections to the solvation thermodynamics of radical species when saturating the radical with hydrogens causes a hydrogen bonding effect.

Hydrogen bonding correction to A (one value per radical fragment; the fragment structure drawings are not reproduced here)
-0.345
-0.345
-0.243
-0.087
-0.371
-0.543
-0.247
-0.275
-0.281
0.091
0.0825*
0.119
-0.17

*The value was divided in half since there are two hydrogen-bonded nitrogen atoms in this molecule.
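The lone pair conversion described in this section can be followed step by step with a toy electron-counting model. The sketch below is not RMG's molecule representation: the `Center` class and its fields are hypothetical, and it tracks a single atom only, which is enough to reproduce the singlet CH2 to triplet CH2 to CH4 example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Center:
    """Hypothetical bookkeeping for one atom."""
    element: str
    hydrogens: int
    unpaired_electrons: int
    excess_lone_pairs: int  # lone pairs above the normal bonded count


def lone_pairs_to_radicals(center: Center) -> Center:
    """Turn each excess lone pair into two unpaired electrons."""
    return replace(
        center,
        unpaired_electrons=center.unpaired_electrons
        + 2 * center.excess_lone_pairs,
        excess_lone_pairs=0,
    )


def saturate(center: Center) -> Center:
    """Cap every unpaired electron with a hydrogen atom."""
    return replace(
        center,
        hydrogens=center.hydrogens + center.unpaired_electrons,
        unpaired_electrons=0,
    )
```

Starting from singlet CH2 (two hydrogens, one excess lone pair), the first function gives triplet CH2 with two unpaired electrons, and the second gives CH4, whose solute descriptors can then be estimated in the usual way.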

2.2.2.3 Additional Platts groups

Carbon, hydrogen and oxygen atom-containing Platts groups for calculating solute descriptors were originally included in the Java version of RMG. To adapt to increasing capabilities of RMG-Py, in this work, additional published Platts groups were added to the database of RMG-Py. Specifically, these were groups including nitrogen and sulfur atoms [58]. In the future, halogen groups calculated by Platts should be added to the database, as capabilities for chlorine and fluorine are currently being added to RMG. Additionally, silicon atom-containing groups should be added, but must first be calculated, as these groups were not previously published.

2.2.2.4 Solute database

Using group additivity to calculate the five solute parameters for every species in every reaction increases the time needed for mechanism generation. For many molecules, the solute descriptors have already been calculated and tabulated [65]. Thus, in this work a solute database was created from these published data, which currently contains 152 values. If RMG can find the molecule of interest in this solute database, it will use the values given. If it cannot, then it will use group additivity to calculate the solute descriptors.
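The lookup order described here, library first and group additivity as the fallback, can be sketched as follows. Everything named in the sketch is hypothetical: the function, the use of a plain string as the molecule key, and the dictionary layouts are illustrative assumptions, and the group additivity step is reduced to a plain sum that leaves out any constant offset a fitted scheme may carry.

```python
DESCRIPTORS = ("E", "S", "A", "B", "L")


def solute_descriptors(molecule_key: str, library: dict,
                       groups_in_molecule: list, group_values: dict):
    """Return (descriptors, source) for one molecule.

    library            : molecule key -> tabulated descriptor dict
    groups_in_molecule : labels of the groups matched in the molecule
                         (a group that occurs twice is listed twice)
    group_values       : group label -> contribution dict
    """
    if molecule_key in library:
        return dict(library[molecule_key]), "library"
    totals = {name: 0.0 for name in DESCRIPTORS}
    for label in groups_in_molecule:
        for name in DESCRIPTORS:
            totals[name] += group_values[label][name]
    return totals, "group additivity"
```

Returning the source alongside the values makes it easy to see, for any species in a generated mechanism, whether its solvation correction rests on tabulated data or on an estimate.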

2.3 Results

The calculations of solvation thermodynamics were validated to ensure that both the group additive scheme for estimating solute descriptors as well as the Abraham and Mintz correlations for logK and ∆Hsolv are sufficiently accurate.

2.3.1 Estimation of solute descriptors

Figure 2.3: γ-decalactone structure, with atom-centered groups circled in green and non atom-centered groups in blue.

For verification that the algorithm to predict solute descriptors is implemented correctly into RMG, the Abraham solute parameters for γ-decalactone (Figure 2.3), a molecule consisting of several atom-centered and non-atom-centered groups, are compared to experimental values from ACD/Labs [66]. Table 2.2 displays the results of the comparison. The values predicted by RMG-Py and the experimental values closely agree, showing that RMG can identify molecular structure fragments via their adjacency list and count them correctly for groups containing carbon, hydrogen and oxygen atoms. The groups containing nitrogen and sulfur have been implemented, but not yet tested.

Solute Parameter    Experimental    Predicted by RMG-Py
S                   1.26            1.26
B                   0.55            0.54
E                   0.32            0.39
L                   6.27            6.33
A                   0               0

Table 2.2: Abraham solute parameters for γ-decalactone calculated by group additivity in RMG-Py, shown with experimental values from ACD/Labs [66]

Figure 2.4: Comparison of solvation thermodynamics values for 20 solutes in water calculated by RMG-Py to those in databases. (a) ∆Hsolv, compared to values in Mintz et al. [59] (b) ∆Gsolv, compared to values from the University of Minnesota solvation database [67].

2.3.2 Calculation of solvation thermodynamics

The logarithm of the partition coefficient, logK, and ∆Hsolv are calculated with linear solvation energy relationships. These relationships utilize the solute descriptors, which we showed can be calculated via group additivity in the previous section, along with the solvent descriptors. Then, ∆Gsolv was obtained from K. For 20 solutes in water at 298 K, the calculated values of ∆Hsolv were compared to those in Mintz et al. [59] and those of ∆Gsolv were compared with those from the University of Minnesota solvation database [67]. The mean absolute deviation was 4.2 kJ/mol for ∆Hsolv and 2.9 kJ/mol for ∆Gsolv. The results of this comparison are illustrated in Figure 2.4. Because the Minnesota solvation database was likely used to train the values for the Platts group contributions, the similarity in values should be interpreted as a demonstration of reproducibility (also acknowledged in Jalan et al. [50]).
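The mean absolute deviation quoted above is the average of the absolute differences between calculated and reference values. A minimal sketch of that statistic is given below; the function name is chosen here, and no values from the 20-solute comparison are reproduced.

```python
def mean_absolute_deviation(calculated, reference):
    """Average of |calculated - reference| over paired values.

    Both sequences must have the same, non-zero length and share the
    same units; the result is in those units.
    """
    if len(calculated) != len(reference) or len(calculated) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(c - r) for c, r in zip(calculated, reference))
    return total / len(calculated)
```

Because the differences are taken in absolute value, over- and under-predictions do not cancel, which makes the statistic a measure of typical error rather than of systematic bias.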

2.4 Summary

In this chapter of the dissertation, solvation thermodynamics was enabled in the Python version of the Reaction Mechanism Generator (RMG). The calculation methods, involving linear solvation energy relationships and a group additivity method for determining solute descriptors, were adapted into RMG-Py, with the source code updated with these changes. In addition, treatment for the solvation thermodynamics of radicals was improved by using saturated species thermodynamics and corrections to the Abraham A value, more of which were added to RMG’s database. Solvation thermodynamics for species with lone pairs were treated by converting the lone pairs into unpaired electrons. More group additivity values were added to RMG’s database for species containing nitrogen and sulfur, and estimation was sped up and improved by incorporating a database of known solute descriptors. It was ensured that group additivity was implemented properly by comparison of ∆Hsolv to the values in Mintz et al. [59] and of ∆Gsolv to the values in the University of Minnesota’s solvation database [67]. Solute descriptors for a complex molecule were also successfully compared to experimental values from ACD/Labs [66]. The progress made in these areas makes it possible to correct gas-phase thermodynamics to account for reactions in different solvents, and ultimately is the first step to generating detailed kinetic models for liquid-phase systems using RMG-Py.

2.5 Recommendations

The recommendations outlined below will further improve the estimation of solvation thermodynamics in RMG-Py. These involve improving the temperature dependence of ∆Gsolv, refining treatment of the solvation chemistry of species with lone pairs, better benchmarking, and an expanded solvent list.

2.5.1 Temperature dependence of solvation thermodynamics

The linear extrapolation method used for calculating ∆Gsolv at temperatures other than 298 K, which assumes that ∆Hsolv and ∆Ssolv are constant, is not always accurate for temperatures far from 298 K. For example, it was shown to be very inaccurate for O2 in water; while the RMG estimation method would have ∆Gsolv increase monotonically with temperature, the experimental values only increase slightly, and begin to decrease near 450 K [68, 69] (Chung, unpublished work). Other methods for treating the temperature dependence have been previously suggested and implemented [53, 60]. Because they are slower than the current method, these methods should only be implemented in RMG where the inaccuracy is expected to be high; for example, only above certain temperatures or for certain chemical systems.

2.5.2 Calculation of solvation thermodynamics for lone pair species

A fix has been implemented to convert lone pairs to radical electrons in order to compute solute descriptors; however, it would be preferable to directly calculate Platts groups for lone pair species. Though there is a lack of experimental data for these molecules, quantum chemistry calculations could be done to obtain solvation thermochemistry and derive solute descriptors for these species. These descriptors could simply be put into the solute library, or several calculations could be further processed to derive Platts group additivity values.

2.5.3 Improved benchmarking of group additivity values

The solvation thermodynamic parameters calculated in this work were only compared to values in the Minnesota solvation database, which were originally used to train the molecular structure group values used to calculate the solute descriptors. While experimental ∆Hsolv and ∆Gsolv are not widely available, the calculated parameters could be compared to high level quantum calculations to better assess the accuracy of the estimates. This type of comparison is often done to gauge the accuracy of gas-phase thermodynamic values calculated by group additivity [28, 70].

2.5.4 Expansion and enhancement of solvents

The current list of 26 solvents available for calculating solvation thermodynamics could be further expanded in the future. For example, there exist published Abraham solvent descriptors for biological solvents such as blood in the brain [71], which would facilitate kinetic modeling of even more diverse processes using RMG. Another useful addition would be the ability to select mixtures of solvents for an RMG simulation, which would require a method of interpolation between the solvent descriptors of two or more solvents. For example, Ben Amara et al. use dodecane as the single solvent in their RMG simulations, despite investigating different mixtures of dodecane and methyl oleate as biodiesel surrogates [4]. Furthermore, if a solvent is also a reacting chemical species, the identity of the solvent changes throughout the RMG simulation. Capability to make the solvent descriptors similarly change as a simulation progresses would make the RMG simulations more realistic.

3 IMPLEMENTING KINETIC SOLVENT EFFECTS IN AUTOMATIC MECHANISM GENERATION

The previous chapter dealt with corrections to gas-phase thermodynamics to account for solvation, which are necessary in calculating the overall ∆Grxn and thus reverse reaction rates. However, learning about complex liquid-phase systems also requires knowledge of solvent effects on the forward rates of elementary chemical reactions. Many of the reactions in liquid-phase mechanisms of interest, including fuel oxidation, are radical-molecule and radical-radical reactions. Depending on the solvent, rates of reaction can vary by orders of magnitude, thus changing likely pathways and product distributions. Furthermore, knowledge of kinetic solvent effects not only helps with generation of liquid-phase reaction mechanisms, but can aid in the design of solvents to promote a desired reaction pathway or product [72].

Two main effects must be understood: physical diffusion and intrinsic kinetic effects. To account for diffusion limitations on reaction kinetics, we calculate an effective rate constant for each reaction, which depends on the intrinsic gas-phase reaction rate and the diffusivities of reacting species. Diffusivities, diameters of reacting species, and solvent viscosities are all calculated using empirical correlations. The specifics of these calculations will be outlined in this chapter.

Modifying reaction rates to account for a solvent’s intrinsic effect on the gas-phase kinetics is more challenging. This intrinsic effect changes the chemical environment of reactants and modifies the reaction barrier, as opposed to a solvent’s physical diffusion limitation, which modifies the effective rate [73]. Solvent effects on the reaction rate depend on both the nature of the solvent and the type of reaction occurring.

The intrinsic effect of the solvent on the reaction rate will be investigated using quantum mechanical methods to find the energy difference between the reactant and transition states of chemical reactions, as illustrated in Figure 3.1. Several reactions within specific families in RMG that are relevant to oxidation chemistry, such as hydrogen abstraction, will be analyzed. Trends in solvent effect will be illustrated, based upon properties of the reacting species, for example, carbon chain length or presence of an alcohol group. Using these trends, a scheme can be created to predict the change in rate with solvation based on molecular structure and properties of the solvent.

Figure 3.1: Example of a potential energy surface for a reaction in gas-phase (black) and a solvent (blue). A solvent may have a different effect on the energy of reactants, transition states, and products in a chemical reaction. Reproduced from [73]

3.1 Background

Prior studies on intrinsic solvent effects employ a number of experimental and theoretical approaches, which will be outlined below. Previous solvent effect discoveries will then be organized and discussed based on reaction family.

3.1.1 Experimental techniques for determining reaction rates in liquids

Experimentally determining reaction rates for radical reactions in solution can be difficult due to the short-lived nature of some radicals; however, some methods have been developed over the last century and are commonly used for measuring these kinetics. In an early method pioneered by Briers and Chapman known as rotating sector, or the intermittent-illumination method (IIM), a sample is exposed to a constant intensity of light for intermittent periods of time, such that the amount of time spent in light and in the dark remains constant [74–76]. The average reaction rate, WM, can be calculated by:

$$W_M = \frac{k_p}{\sqrt{2k_t}}\,[M]\,\sqrt{\phi I (1 + r)^{-1}}$$

where kp and kt are the propagation and termination rates, respectively, [M] is the concentration of the compound under investigation, M, reacting with a radical, φ is the quantum yield of photoinitiation, I is the light intensity, and r is the ratio of time in the dark to time in the light [76]. This method has been applied to reactions in gas phase and solution, including polymerization and radical recombination [77–79]. This method, however, can only be used for some specific types of radical chain reactions, with one requirement being that they can be photochemically initiated [80].

A very common method for measuring the rates of radical reactions in both gas and liquid phase is laser flash photolysis. In this method a sample is excited by a pulse from a laser, and radical species are monitored by measurement of their spectral absorption. The spectral absorption can be measured with electron spin resonance, in which the unpaired electron of the radical interacts with the nuclei in the molecule leading to a mapping of electron density [81].

An indirect way of measuring the rate constants for radical-molecule reactions is the radical clock method, which uses a known unimolecular reaction rate and a measured product distribution to determine an unknown radical-molecule reaction rate [80]. For example, Roschek and co-workers developed radical clocks for peroxyl radical reactions using the competition between a beta-fragmentation of a peroxyl radical and a bimolecular H-atom transfer [82]. This concept is shown in Figure 3.2. Jha and Pratt point out some limitations to the type of molecule R1-H from which the hydrogen atom is abstracted [83]. If R1 is either persistent or highly stabilized, it cannot carry the chain reaction, and a large concentration of substrate is required. They describe a modification of the radical clock method using peroxyesters, making it possible to study a wider range of reactions.

Figure 3.2: Concept of peroxyl radical clock from Roschek et al. [82]

3.1.2 Computational chemistry

Another approach to determining intrinsic kinetic solvent effects is to compare reaction barriers calculated in both gas and solution using quantum chemistry. Differential solvation between reactants and transition states affects the reaction rate according to transition state theory, which says that a quasi-equilibrium exists between reactants and activated complexes, or transition states [32]. Several computational methods are commonly utilized for obtaining geometries of reactants and transition states and their energies.

3.1.2.1 Density functional theory

One set of methods used to solve the Schrödinger equation numerically is based on density functional theory (DFT), which replaces the many-electron wave function with the electron density, a function of only three spatial variables, to obtain electronic structures of molecules, radicals and activated complexes [84, 85]. The DFT method chosen has a significant impact on whether the species’ geometries and energies are accurate, and it was previously found that the accuracy of density functionals for predicting barrier height is correlated with their accuracy for transition state geometries [86]. In addition, approximate density functionals predict transition state energies that are too low because they incorrectly delocalize electrons [87]. However, when comparing transition states in gas and liquid, this discrepancy matters less since it will be present in both cases and some cancellation of error will occur. When DFT is used to compute solvent effects, comparison to experimental rates shows that it provides a high enough level of theory to capture the desired effects [88, 89].

3.1.2.2 Solvation models

Figure 3.3: Comparison between a continuum solvation model (left, figure from [90]) and a hybrid implicit/explicit model (right, figure from [91]).

Computational methods for estimating solvation energies are reviewed briefly in the previous chapter and in [53] and generally fall into two categories: those that represent solute and solvent molecules explicitly, and those that represent only solute molecules explicitly and treat the solvent somewhere between explicit and continuous. Explicit treatment is done either quantum mechanically (QM) or with molecular mechanics (MM), or some combination of both, as in QM/MM.

One hybrid implicit/explicit solvation method is the shells theory proposed by Pliego [92]. In this solvation treatment, the solvent shell closest to the solute (S1), representing solute-solvent interaction, is treated either fully quantum mechanically or with molecular dynamics based on classical force fields. The remaining solvent, S2, is treated with continuum solvation. When the number of solvent molecules in S1 becomes infinite, this theory converges to the full discrete solvent representation. This approach is also known as the cluster-continuum model, mixed discrete-continuum model, or quasichemical theory [93].

Continuum solvation models represent solvation as a solute placed inside a cavity within an implicit solvent, which is modeled as a continuum with a constant property such as conductivity or dielectric constant. See Figure 3.3 for a pictorial comparison between a continuum method and a more explicit method. The solute cavity can be shaped like a sphere or ellipsoid, or as in more modern methods, based upon a superposition of atom-centered spheres [93]. However, representing a solvent this way does not account for local solute-solvent interactions, and the assumption that the dielectric constant near the solute surface is equal to the bulk dielectric constant is inaccurate.

A recommended method for calculating liquid-phase energies, which takes into account contributions from the first solvation shell, is SMD [94]. This model, like the authors’ other SMx methods, includes a term for non-electrostatic effects due to cavity formation, dispersion interactions, and solvent structure. The contribution is dependent upon the solvent-accessible surface area (SASA) of each solute atom. The main feature distinguishing SMD from the other SMx methods is that it utilizes a continuous charge density of the solute, rather than a discrete representation. In the Gaussian computational chemistry program [95], the method is combined with the polarizable continuum method (PCM) [96] for single-point energy calculations on a solute in a solvent. However, it can be used with other algorithms such as COSMO [97] and COSab [98].

A recent method developed by Pomogaeva and Chipman, known as the composite method for implicit representation of solvent (CMIRS), uses six parameters to describe interactions between solute and solvent including dispersion, exchange, hydrogen bonding, and long-range electrostatic interactions. Because of its low level of parameterization, this model is believed to capture a higher level of physical truth. For hydration energy, the mean unsigned error (MUE) may be as low as 0.8 kcal/mol for neutral solutes and 2.4 kcal/mol for ionic solutes, and the model has been parametrized for the B3LYP and Hartree-Fock quantum chemistry methods [99]. With regard to solvation kinetics, Silva et al. parametrized the CMIRS model for methanol in order to predict activation free energy barriers for SN2 and SNAr reactions. Both CMIRS and SMD were compared to experimental solvation free energies; while they perform similarly for neutral species, the MUE for CMIRS is lower for anion and cation solutes. For free energy barriers, CMIRS performs similarly to COSMO-RS while SMD is slightly worse [100].

3.1.3 Kinetic solvent effects within reaction families

Kinetic solvent effects are usually deduced, and are most generalizable, within a particular reaction type. However, as will be shown, the kinetic solvent effect can vary within a reaction family and across solvents.

3.1.3.1 Hydrogen abstraction

Figure 3.4: The solvent effect on hydrogen abstraction from α-tocopherol is independent of the radical (DPPH (x-axis) or TOH (y-axis)). Each number from 1-13 represents a different solvent. Reproduced from [101]

The largest body of literature on solvent effects on reaction rates concerns bimolecular hydrogen abstraction reactions. Das et al. used laser flash photolysis to study the reaction of tert-butoxyl radicals with phenols in six solvents [102]. The rate decreased in polar solvents, explained by the capability of the phenolic OH group to hydrogen bond with solvent molecules. Ingold and co-workers have since attempted to further elucidate these solvent effects in hydrogen abstraction reactions. Valgimigli et al. found that the solvent effect on abstraction of the phenolic hydrogen from α-tocopherol by both tert-butoxyl radical and 2,2-diphenyl-1-picrylhydrazyl (DPPH·) is independent of the radical in almost every solvent they tested [101] (see Figure 3.4). This result is especially surprising, since for these two radicals, the reaction rate in the same solvent differs by a factor of over $10^6$. Any deviation from this behavior, as in tert-butyl alcohol, is thought to be due to the reaction being partially diffusion controlled. The reaction of α-tocopherol with tert-butoxyl was further investigated in four solvents; the rate constant decreased with increasing $\beta_2^H$ value [103]. The $\beta_2^H$ parameter represents the hydrogen bond acceptance ability/basicity of the solvent. Finally, discrepancies between the data obtained and another study on the reaction of Trolox with Cl3COO· [104] are explained by a mechanistic shift from hydrogen atom abstraction to electron transfer in solvents with high dielectric constants and basicities. This electron transfer mechanism may also be accompanied by a solvent-assisted proton loss, known as sequential proton loss electron transfer (SPLET):

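$$\mathrm{Cl_3COO^{\cdot} + ArOH \longrightarrow Cl_3COO^{-} + ArO^{\cdot} + H^{+}} \tag{1}$$

$$\mathrm{Cl_3COO^{\cdot} + ArOH + S \longrightarrow Cl_3COO^{-} + ArO^{\cdot} + SH^{+}} \tag{2}$$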

Thus, the electron transfer mechanism can account for rates which are higher than the rate expected from simply using a correlation with a solvent property. Foti et al. also discovered this fast electron-transfer reaction between phenols and DPPH [105]. While hydrogen atom transfer was dominant in nonpolar solvents most of the time, electron transfer still occurred if the radical was strongly oxidizing, as with Cl3COO·. Reactions proceeding via the electron-transfer mechanism should be faster in polar solvents. Surprisingly, however, the rate constant was higher in ethanol than in methanol despite methanol having a higher dielectric constant; they attributed this inconsistency to solvent impurities.

The Snelgrove-Ingold correlation for hydrogen abstraction reactions relates the difference in rate constant between the reaction in gas and in solvent to the solvent’s $\alpha_2^H$ and $\beta_2^H$ (hydrogen bonding) parameters [106]:

$$\log_{10}\frac{k_{\mathrm{gas}}}{k_{\mathrm{solvent}}} = 8.3\,\alpha_2^H \beta_2^H$$
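As a minimal sketch of how the correlation above could be evaluated, the following Python function converts a pair of hydrogen-bond acidity and basicity parameters into the predicted ratio of gas-phase to solution-phase rate constants. The function name and the numerical parameter values are hypothetical and chosen only for illustration; they are not taken from [106] or from any tabulated data.

```python
def snelgrove_ingold_rate_ratio(alpha2H: float, beta2H: float, slope: float = 8.3) -> float:
    """Return k_gas / k_solvent from log10(k_gas / k_solvent) = slope * alpha2H * beta2H.

    alpha2H: hydrogen-bond acidity parameter (dimensionless)
    beta2H:  hydrogen-bond basicity parameter (dimensionless)
    slope:   empirical coefficient, 8.3 in the correlation quoted in the text
    """
    return 10.0 ** (slope * alpha2H * beta2H)


# Hypothetical parameter values, for illustration only
ratio = snelgrove_ingold_rate_ratio(alpha2H=0.5, beta2H=0.4)
print(f"k_gas / k_solvent = {ratio:.1f}")
```

Because the product of the two parameters appears in an exponent, the predicted ratio is 1 whenever either parameter is zero, and it grows rapidly as both increase.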

Later, it was found that this single empirical equation could not describe an entire reaction family [107–109]. In solvents which support ionization, some hydrogen abstraction reactions, for example that of 2,2’-methylene-bis(4-methyl-6-tert-butylphenol) (BIS) with DPPH·, proceeded by the SPLET mechanism [109]. This led to a reaction which is zero-order in DPPH·. Because this is not true for the reactions of all phenols with DPPH·, it was suggested that properties of the reacting phenol play a role. One such property is the intramolecular H-bond in BIS, which may slow the reverse proton-transfer reaction, thus leading to the unusual effect observed. Furthermore, Nielsen and Ingold found that the $\beta_2^H$ scale does not account for solvents’ anion solvating properties, and thus reactions involving proton transfer do not quite follow the Snelgrove-Ingold correlation [110]. The Taft β scale [54] gives better correlations for this type of reaction. An experimental study by Warren and Mayer also notes the failure of the generalized Snelgrove-Ingold correlation. They studied the effect of small amounts of solvent additives on the oxidation of ascorbate (vitamin C) by TEMPO radical. Their results indicate that the solvent effect on hydrogen abstraction reactions is better explained by local solvent effects, as the effects are much greater than can be explained by bulk solvent properties [111].

In some types of hydrogen abstraction reactions, the effect of changing the solvent is low. The reaction between ascorbate and 2,2,6,6-tetramethylpiperidine-1-oxyl radical was studied experimentally [112]. The mechanism was best explained by proton-coupled electron transfer (PCET) (Figure 3.5), where an electron and proton are transferred simultaneously but between different sets of orbitals. The solvent was varied between water and mixtures of water and dioxane, decreasing the polarity. The quantity investigated was the kinetic isotope effect (KIE), which is defined as the ratio of the rates of hydrogen abstraction in water and in D2O. KIE was found to only slightly increase with decreasing solvent polarity. Interestingly, hydrogen tunneling was suspected to take place in all solvents studied, because the experimental KIEs were larger than expected from semiclassical theory.

Figure 3.5: Example of the PCET mechanism, reproduced from [112]

From these studies it has generally been understood that solvent effects on hydrogen abstraction reactions are significant when looking at O-H bond abstraction, primarily because of the O-H bond’s ability to participate in hydrogen bonding networks, but that the effect on C-H bond abstraction is negligible. Despite this assumption, Koner and co-workers generalized these effects to C-H and Sn-H bonds [113]. They explain the results as a stabilization of the abstracting species in polar solvents, rather than the hydrogen donor (which is the case for abstraction from the O-H bond). Thus, the nature of the abstracting species has a large effect on the reaction rate in solvents. They still hypothesize that the reaction of a non-polar hydrogen donor and non-polar abstracting radical will have little solvent effect, but acknowledge this is hard to test due to the high activation barrier of these reactions.

3.1.3.2 Radical addition to multiple bonds

Kinetic solvent effects on the addition of radicals to multiple bonds have mainly been investigated theoretically and have been extensively studied by the groups of Fouassier [114–116] and Radom [117–120]. When the addition of various radicals to methyl acrylate was modeled using DFT calculations, the rate of reaction in various solvents correlated well with the dipole moment of the solvents; however, it was argued that a multipole approach was still needed [115]. Polar solvents still had a small effect on the rate if the charge transfer from the reactants to transition states, calculated using a Mulliken charge analysis [121], was low. Wong and Radom performed calculations using the self-consistent isodensity polarizable continuum model (SCIPCM) [52, 117]. In the case of radicals with saturated substituents adding to alkenes with saturated substituents, addition of any solvent increases the barrier. Solvent decreases the barrier in the unsaturated case. The higher the dielectric constant of the solvent, the greater this effect. Finally, Garcia et al. investigated the addition of several radicals to methyl aminoacrylate with DFT and found that for an unspecified solvent, barriers increase with solvation for electrophilic radicals (phenyl and trifluoromethyl) and decrease for nucleophilic radicals (methoxymethyl and methyl) [122]. From these studies, it can be concluded that the chemical nature of the radical has a larger effect on the rates than the polarity of the solvents in this reaction family.

3.1.3.3 Beta scission

The β-scission of tert-butoxyl radicals was investigated by Weber and Fischer, who used electron spin resonance to measure rates [124]. They found that at 300 K, the rates in solution were at least ten times larger than the gas-phase rates. This result was explained by a transition state effect, where the transition state is more polar than the radical and is thus more stabilized by interactions with polar and polarizable solvents. Furthermore, the β-scission of cumyloxyl radicals, studied using laser flash photolysis, also showed a rate increase with increasing solvent polarity [125]. The same effect was found for alkyloxyl radicals [126]. Bietti et al. also confirmed this effect and found that the solvent Dimroth-Reichardt parameter (ET) correlated well with the increase in rate [127], confirming earlier results by the Ingold group [123]; see Figure 3.6. The ET parameter is derived from the charge-transfer absorption of a pyridinium N-phenolbetaine dye dissolved in the solvent and serves as a different measure of solvent polarity than the dielectric constant [128]. For example, methyl formamide has an extremely high dielectric constant but is of similar polarity to methanol as characterized by its ET value.

Figure 3.6: β-scission rates correlate with Dimroth-Reichardt parameter ET. Reproduced from [123]

It appears from these studies that the rates of β-scission reactions can be generalized to increase with some measure of solvent polarity such as ET. However, since only reactions of cumyloxyl and alkyloxyl radicals have been investigated, and mostly in water, acetonitrile or mixtures of the two, it is difficult to infer kinetic solvent effects for the entire reaction family.

3.1.3.4 Diels-Alder

Like the other reaction families discussed, the rates of Diels-Alder reactions in solution, in general, depend on both solute and solvent properties [130]. Breslow et al. [131] described a large increase in the rate of Diels-Alder reactions in water and in stereoselectivity between endo and exo products. They explain the acceleration in terms of the reactant structures; the diene and dienophile engage in hydrophobic stacking. Experimental studies with methanol show that it is indeed this hydrophobic effect rather than a polarity effect which increases the rate in water, as the rates of some reactions actually decrease in methanol, which is unexpected.

Figure 3.7: Example of Diels-Alder reaction of cyclopentadiene and (-)-menthyl acrylate and possible product isomers. Figure reproduced from [129]

Later, Ruiz-Lopez et al. [129] studied Diels-Alder reactions of cyclopentadiene and methyl acrylate using ab initio calculations. They found that because the solvent’s electric field changes the shape of the potential energy surface, a direct change in the overall reaction mechanism is seen with solvation. It is argued that only adding the solvation energy to the gas-phase energy is not sufficient to determine the reaction path in solution. However, they also maintain that continuum theory models are sufficient for capturing some specific interactions with solvents, such as hydrogen-bonding, since overall electrostatic effects will implicitly include these properties. Another simulation study on the reaction of methyl acrylate with cyclopentadiene was done by Sheehan and Sharratt using molecular dynamics [132]. The reaction was studied in both methanol and n-hexane. The result showed that the endo/exo selectivity of the reaction products is related to the difference in the solvated transition states’ free energies. Further, the rates and selectivity were affected by properties of the solvent such as its polarity and H-bonding ability. In methanol, the product was energetically favored as compared to the energy in n-hexane; this was explained by the similarity in polarity between the product and methanol.

Soto-Delgado et al. studied the reaction of cyclopentadiene and methyl vinyl ketone in both water and methanol, using a combined QM/MM-MD approach [133]. The activation free-energy barrier in methanol is 2.1 kcal/mol higher than that in water, which is within 0.1 kcal/mol of the experimental difference (a rate deceleration of 2 orders of magnitude). Again, the hydrogen bonding between solvent and transition state, which is stronger and longer-lived in water than in methanol, contributes to this effect.

Kiselev et al. compared reaction rates of 9-(hydroxymethyl)anthracene and 9,10-bis(hydroxymethyl)anthracene with maleic anhydride, N-ethylmaleimide and N-phenylmaleimide, in organic and water-1,4-dioxane cosolvents [134]. It was found that even those reactants which do not hydrogen-bond with water experienced acceleration of rate in water, depending on the structure of the diene. The organic cosolvents reduced the reaction rate, also depending on the polarity of the reactants.

These studies illustrate the importance of both polarity of reactants and hydrogen bonding ability in kinetic solvent effects of Diels-Alder reactions, and verify that continuum models and other computational methods can capture both of these effects reasonably well. Despite the success of these models, experimental studies were necessary to show the hydrophobic effect that is crucial in some cases. Additionally, merely adding solvation energies to those obtained in the gas phase is not sufficient when the transition state geometries or shape of the potential energy surface change significantly in a solvent.

3.1.3.5 Acetylation

Figure 3.8: Acetylation mechanism, reproduced from [135]

Xu et al. [136] theoretically investigated the catalyst-assisted acetylation of tert-butanol with acetic anhydride in the gas phase and in three solvents. The optimized geometries and reaction energies were found with B3LYP/6-311++G(d,p) and B3LYP/6-31G(d), while solvation energies were found using PCM. They found that the reaction proceeds via a mechanism characterized by nucleophilic attack of the catalyst, and that neither this mechanism nor the rate-limiting step changes with solvent. Because polar solvents solvate the reactants better than the transition states or reaction intermediates, the reaction is less favorable in these solvents. Consequently, the reaction proceeds less favorably in going from gas, to carbon tetrachloride, to chloroform, and finally to dichloromethane, the most polar solvent of the group. These results are also supported by early experimental studies [137].

3.1.3.6 Epoxidation

The epoxidation of olefins by hydrogen peroxide was studied both experimentally and with DFT by Berkessel and co-workers [89, 138, 139]. They find that for these reactions, using fluorinated alcohols such as HFIP accelerates the rate relative to 1,4-dioxane [138, 139]. Zero to four molecules of HFIP were treated quantum mechanically together with the reactants in the gas phase, and then the whole system was treated with PCM [89]. Acetone was chosen as a model solvent for HFIP, because of its similar dielectric constant, with some additional considerations for cavitation. The activation enthalpies were shown to decrease with increasing number of HFIP molecules, while the increasing contribution of entropy led to the Gibbs free energy of activation reaching saturation with three or four HFIP molecules. The same theoretical study was done using methanol as a solvent, but increasing the number of methanol molecules had no influence. However, the activation barrier was reduced more with methanol than with HFIP. This result shows that methanol acts only as a polar solvent for the reaction, and explicit hydrogen-bonding with methanol does not affect the reaction rate, as it does with HFIP. While a prior theoretical study by the Shaik group showed that fluorinated alcohols increased the epoxidation reaction rate [140], the studies by Berkessel explicitly showed the effect of multiple aggregates of the solvent molecules.

Figure 3.9: Plots (a) and (b) for epoxidation rate of β-caryophyllene versus differing solvent parameters, reproduced from [141]

Later, Steenackers et al. experimentally studied the epoxidation of β-caryophyllene to caryophyllene oxide in aqueous H2O2, alcohols, nitrogen-containing solvents, and furans, 11 in total [141]. In all cases, the rate correlated extremely well with Abraham’s hydrogen bonding parameters ($R^2 = 0.97$) and, interestingly, not well at all with the dielectric constant ($R^2 = 0.17$), as shown in Figure 3.9. They further characterized the solvent effect using ωB97XD/6-311++G(df,pd) and IEPCM. This computational study confirmed previous findings that the solvent stabilizes the O-O bond in H2O2 in the transition state structure via hydrogen bonding.

3.1.3.7 Hydrolysis

Almerindo and Pliego studied the hydrolysis of formamide with ab initio calculations and PCM [142]. One to four explicit water molecules were considered. Two mechanisms, stepwise and concerted, were investigated; see Figure 3.10. For the stepwise mechanism, the activation barrier increased by 4.6 kcal/mol with one water molecule and 11.0 kcal/mol with two water molecules. Adding water molecules beyond two made the system entropically unfavorable. For the concerted mechanism, the solvation only increased the barrier by 6.4 kcal/mol with two water molecules, indicating that the transition state is more stabilized by the solvent than in the stepwise mechanism. Additionally, the level of theory used for the calculations had a large effect on the barriers. The discrepancy caused by the level of theory was larger than the difference between using geometries optimized in the liquid phase and using geometries optimized in the gas phase and then modeled with PCM.

Figure 3.10: The possible mechanisms in the hydrolysis of formamide. Figure reproduced from [142]

3.1.3.8 O-neophyl rearrangement

The O-neophyl rearrangement of two 1,1-diphenylethoxyl radicals was investigated with laser flash photolysis in five solvents by Bietti and Salamone [143]. For both radicals, the rate constant decreased with increasing solvent polarity. A linear correlation was found between the logarithm of the rearrangement rate constant and the Dimroth-Reichardt parameter $E_T^N$. Because $E_T^N$ represents the solvent’s anion solvating ability, the trend was explained in terms of the “decrease in the extent of negative charge on the oxygen atom on going from the starting radical to the transition state” [143].

3.2 Methods

As the numerous studies in the Background showed, chemical reactivity can change drastically in different solvents. While some rates change systematically with solvent polarity, it is not always clear what properties of the solvent and reactant structure will have an effect on the rates. Experimental data are only available for certain radical reactions in some reaction families, most extensively hydrogen abstraction. Because RMG contains information and estimation methods for gas-phase kinetics, it is desirable to devise a method to correct these rates for the liquid phase, which is done in this dissertation. For diffusion effects, simple correlations are available for correcting reaction rates, and are utilized here; however, correcting intrinsic kinetics is more complicated. Quantum chemistry provides a powerful tool for determining reactant and transition state energies. These quantum calculations are expensive, so in this work, we take the approach of studying a subset of reactions and generalizing the results.

3.2.1 Diffusion

Diffusion corrections to reaction rate are needed because reactions are physically impeded by the presence of solvent. To implement this phenomenon into RMG-Py, simple correlations based on the hard-sphere approximation were used to minimize computational cost and complexity. The correction is only made in the bimolecular direction; no cor- rection is needed for unimolecular reactions. For reactions which are bimolecular in both directions, the diffusion correction is calculated based upon the direction where it will have the greater effect. In all cases, since the equilibrium constant does not change as a re- sult of diffusion, it is used to modify the reverse rate of reaction proportionally [50]. This modification is implemented by calculating an effective rate constant [144]:

4π(r1 + r2)(D1 + D2)kint keff = 4π(r1 + r2)(D1 + D2) + kint 50

where r1 and r2 are the radii of the reacting species, D1 and D2 are their diffusivities, and kint is the intrinsic reaction rate in the gas phase. The radii are calculated by assuming the molecules are spherical and using the McGowan volume, which gives a contribution to the volume for each atom and a subtraction for each bond [145]. The Stokes-Einstein equation is used to calculate the diffusivities:

$$D = \frac{k_B T}{6 \pi \eta r}$$

and the viscosity η, dependent on temperature, can be calculated with the following:

$$\eta = A + \frac{B}{T} + C \log T + D T^{E}$$

where A, B, C, D and E are fitted parameters tabulated for many solvents [146]. This temperature dependence is an improvement over the implementation in RMG-Java, where solvent viscosity is assumed to be its value at 298 K. The solvent database, which contains the solvent descriptors described in Chapter 2, also includes the A through E parameters for each solvent. All changes to the source code were made in RMG-Py’s solvation module, used to store the solvent properties when loaded, and a newly created diffusionlimited module that has functions based on the empirical correlations. Additionally, the reaction module was modified in order to correct the gas-phase rate when solvation is turned on as specified by the user in the RMG input file.
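The diffusion correction can be illustrated with a minimal Python sketch. This is not the code in RMG-Py's diffusionlimited module; the function names are hypothetical, the viscosity is passed in directly rather than computed from the A through E parameters, and the conversion of the diffusion limit to molar units with Avogadro's number is an assumption made here so that it can be combined with a molar intrinsic rate constant.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
N_A = 6.02214076e23   # Avogadro constant, 1/mol


def stokes_einstein(radius, viscosity, temperature):
    """Diffusivity (m^2/s) of a sphere of radius (m) in a solvent of
    viscosity (Pa s) at temperature (K), from the Stokes-Einstein equation."""
    return K_B * temperature / (6.0 * math.pi * viscosity * radius)


def diffusion_limit(r1, r2, viscosity, temperature):
    """Diffusion-limited rate constant 4*pi*(r1 + r2)*(D1 + D2).
    Multiplying by N_A to obtain m^3/(mol s) is an assumption of this sketch."""
    d1 = stokes_einstein(r1, viscosity, temperature)
    d2 = stokes_einstein(r2, viscosity, temperature)
    return 4.0 * math.pi * (r1 + r2) * (d1 + d2) * N_A


def effective_rate(k_int, k_diff):
    """Effective rate constant combining intrinsic and diffusion limits."""
    return k_diff * k_int / (k_diff + k_int)


# Illustrative inputs only: two species with 3 Angstrom radii in a solvent
# with viscosity 5e-4 Pa s at 298 K.
k_diff = diffusion_limit(3.0e-10, 3.0e-10, 5.0e-4, 298.0)
k_slow = effective_rate(1.0e-3 * k_diff, k_diff)  # intrinsic rate controls
k_fast = effective_rate(1.0e3 * k_diff, k_diff)   # diffusion controls
```

When the intrinsic rate is far below the diffusion limit, the effective rate is nearly the intrinsic rate; when it is far above, the effective rate approaches the diffusion limit.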

3.2.2 Intrinsic kinetics

Group-based estimation techniques have proved effective for predicting parameters for mechanism generation, including thermodynamic parameters [27], kinetics [30], and most recently, transition state geometries [39]. In this dissertation, I show that such an approach is also possible for predicting the difference in a reaction’s activation energy

between the gas-phase and liquid-phase (∆EA). First, the effect on ∆EA of changing reactant functional groups was tested in small sets of hydrogen abstraction reactions. The required reactant and transition state geometries and energies were calculated with density functional theory (DFT) and a continuum solvation method, which have previously been used to determine kinetic solvent effects [89, 136, 141]. All calculations were completed using the Gaussian 09 computational chemistry package [95] on Northeastern University’s Discovery cluster, a high performance computing cluster. Based on the observations from these calculations, a group estimation method was developed to predict

∆EA for different classes of solvents, described in the following sections. The method was applied to a published n-dodecane/methyl oleate oxidation model previously built using RMG, serving to represent a middle-distillate fuel with a fatty acid methyl ester (FAME) additive, both of which are components of biofuels [4]. I then observed the effect of including kinetic solvent effects on predicted fuel induction period (IP).

3.2.2.1 Quantum chemistry protocol

For the quantum chemistry calculations, 53 hydrogen abstraction reactions were chosen to best observe effects of functional groups on reaction barriers (Table 3.1, represented by SMILES). These reactions included carbon, hydrogen, oxygen and nitrogen atoms, and probed the effect of alcohol groups versus alkanes, different abstraction sites within the same molecule, unsaturated versus saturated rings, and carbon chain length. For this set of reactions, gas-phase reactant geometries and transition state geometries were optimized using M06-2X [147, 148]/MG3S [149–154]. The transition states were verified with an intrinsic reaction coordinate calculation. Once a correct transition state was found, single point energies were calculated in eight different solvents, for both the reactants and transition states, using SMD [94]. These eight solvents, given in Table 3.2, cover a wide range of dielectric constants and at least one falls under each of six categories of solvent as defined by Schmid [155]: 1) nonpolar, 2) aliphatic, 3) protic/protogenetic in which one hydrogen atom is bonded to oxygen, 4) halogenated, 5) amines, and 6) select “normal” solvents which are non-protonic, do not contain chlorine, and are aliphatic with a single, dominant bond dipole. The SMD calculations and gas-phase energy calculations then made it possible to calculate ∆EA for each reaction in each solvent.

#    Reaction                                          # heavy atoms
1    [CH3] + C ←→ C + [CH3]                            4
2    [OH] + C ←→ O + [CH3]                             4
3    O[O] + C ←→ OO + [CH3]                            6
4    [CH3] + CC ←→ C + C[CH2]                          6
5    [CH3] + CO ←→ C + C[O]                            6
6    [CH3] + CO ←→ C + [CH2]O                          6
7    [OH] + CC ←→ O + C[CH2]                           6
8    [OH] + CO ←→ O + C[O]                             6
9    [OH] + CO ←→ O + [CH2]O                           6
10   O[O] + CC ←→ OO + C[CH2]                          8
11   O[O] + CO ←→ OO + C[O]                            8
12   O[O] + CO ←→ OO + [CH2]O                          8
13   [CH3] + CCC ←→ C + CC[CH2]                        8
14   [CH3] + CCC ←→ C + C[CH]C                         8
15   [CH3] + CCO ←→ C + CC[O]                          8
16   [CH3] + CCO ←→ C + C[CH]O                         8
17   [CH3] + CCO ←→ C + [CH2]CO                        8
18   [OH] + CCC ←→ O + CC[CH2]                         8
19   [OH] + CCO ←→ O + CC[O]                           8
20   [OH] + CCO ←→ O + C[CH]O                          8
21   [OH] + CCO ←→ O + [CH2]CO                         8
22   O[O] + CCO ←→ OO + CC[O]                          10
23   O[O] + CCO ←→ OO + C[CH]O                         10
24   O[O] + CCO ←→ OO + [CH2]CO                        10
25   [CH3] + CCCO ←→ C + CCC[O]                        10
26   [CH3] + CCCO ←→ C + CC[CH]O                       10
27   [CH3] + CCCO ←→ C + C[CH]CO                       10
28   [CH3] + CCCO ←→ C + [CH2]CCO                      10
29   [OH] + CCCO ←→ O + CCC[O]                         10
30   [OH] + CCCO ←→ O + CC[CH]O                        10
31   [OH] + CCCO ←→ O + C[CH]CO                        10
32   [OH] + CCCO ←→ O + [CH2]CCO                       10
33   O[O] + CCCO ←→ OO + CCC[O]                        12
34   O[O] + CCCO ←→ OO + CC[CH]O                       12
35   O[O] + CCCO ←→ OO + C[CH]CO                       12
36   O[O] + CCCO ←→ OO + [CH2]CCO                      12
37   [CH3] + O1C=CC=C1 ←→ C + O1C=[C]C=C1              12
38   [CH3] + O1C=CC=C1 ←→ C + O1[C]=CC=C1              12
39   [CH3] + [NH]1C=CC=C1 ←→ C + [NH]1C=[C]C=C1        12
40   [CH3] + [NH]1C=CC=C1 ←→ C + [NH]1[C]=CC=C1        12
41   [CH3] + [NH]1C=CC=C1 ←→ C + [N]1C=CC=C1           12
42   [OH] + C1C=COC=1 ←→ O + C1C=[C]OC=1               12
43   [OH] + C1C=COC=1 ←→ O + C1[C]=COC=1               12
44   [OH] + [NH]1C=CC=C1 ←→ O + [NH]1C=[C]C=C1         12
45   [OH] + [NH]1C=CC=C1 ←→ O + [NH]1[C]=CC=C1         12
46   [CH3] + C1=CC=CC=C1 ←→ C + C1=[C]C=CC=C1          14
47   [CH3] + C1=CC=NC=C1 ←→ C + C1=C[C]=NC=C1          14
48   [CH3] + C1=CC=NC=C1 ←→ C + C1=[C]C=NC=C1          14
49   [CH3] + C1=CC=NC=C1 ←→ C + [C]1=CC=NC=C1          14
50   [CH3] + C1CCCCC1 ←→ C + C1[CH]CCCC1               14
51   [OH] + C1=CC=NC=C1 ←→ O + C1=C[C]=NC=C1           14
52   [OH] + C1=CC=NC=C1 ←→ O + C1=[C]C=NC=C1           14
53   [OH] + C1=CC=NC=C1 ←→ O + [C]1=CC=NC=C1           14

Table 3.1: Training reactions used to deduce kinetic solvent effects

Solvent              Dielectric constant
Octane               1.9
Benzene              2.3
Tetrahydrofuran      7.4
Dichloromethane      8.9
Pyridine             13.0
Acetonitrile         35.7
Dimethylsulfoxide    46.8
Water                78.4

Table 3.2: Solvents used for single-point energy calculations on training reactions
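The bookkeeping behind ∆EA can be sketched in a few lines of Python. The function names are hypothetical, and the sign convention (barrier in solvent minus barrier in the gas phase) is an assumption of this sketch; energies are taken in Hartree, as typically reported by quantum chemistry packages, and converted to kJ/mol.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # standard conversion factor


def barrier(e_ts, e_reactants):
    """Electronic barrier: transition state energy minus the sum of the
    reactant energies (all in Hartree)."""
    return e_ts - sum(e_reactants)


def delta_ea(e_ts_gas, e_reactants_gas, e_ts_solv, e_reactants_solv):
    """Change in barrier on solvation, in kJ/mol.
    Assumed sign convention: solvent barrier minus gas-phase barrier."""
    gas = barrier(e_ts_gas, e_reactants_gas)
    solv = barrier(e_ts_solv, e_reactants_solv)
    return (solv - gas) * HARTREE_TO_KJ_PER_MOL


# Placeholder energies, not results from this work.
example = delta_ea(-1.0, [-0.6, -0.5], -1.002, [-0.601, -0.5005])
```

With these placeholder numbers the transition state is stabilized by the solvent slightly more than the reactants are, so the sketch returns a small negative ∆EA.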

3.2.2.2 Molecular structure group training

Based upon the observations from the training set of reactions, two hierarchical, molecular group trees (one for each reactant) were manually constructed based on the features considered most important to predicting ∆EA. I chose these features based on both chemical intuition and the data; a similar procedure for constructing group trees has been described previously [39]. Using regression, group contributions to ∆EA were calculated for each leaf in the trees to best fit the data for the ∆EA of the 53 training reactions. The top level of the tree contains a base value for ∆EA, and the total ∆EA is calculated by adding this base value to the values for the leaf in each tree which best matches the reacting groups in the reaction of interest. This calculation method is similar to the current procedure for gas-phase thermodynamics, solvation thermodynamics, and transition state geometry data estimation in RMG (and hierarchical trees are also used for the rule-based estimation of gas-phase kinetics). Methods for calculating ∆EA based on this group contribution method were added to the solvation module of RMG, drawing from the newly added databases of solvation kinetics group values in RMG-database.

Though the trees for each solvent have the same structure, a different contribution to

∆EA is used according to the calculations in that solvent. Twenty-six solvents are currently available to use for thermodynamic corrections in RMG, and these may be expanded in the future. These solvents can be generalized to different categories, as mentioned above. The eight solvents for which SMD calculations have been done will be used to construct these categories; ∆EA for solvents outside of these eight will be calculated using the appropriate category.
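The group-contribution estimate described in this section can be sketched as follows. This is an illustration only, not the code added to RMG's solvation module: the class, the group labels, and the numerical values are all hypothetical, and the fallback to the nearest ancestor with a value is an assumption about how an estimate is chosen when a leaf has not been trained.

```python
class GroupNode:
    """One node in a hierarchical group tree (hypothetical structure)."""

    def __init__(self, label, value=None, parent=None):
        self.label = label
        self.value = value    # contribution to delta E_A, or None if untrained
        self.parent = parent

    def contribution(self):
        """Return this node's value, falling back to the nearest ancestor
        that has one; 0.0 if nothing along the path has been trained."""
        node = self
        while node is not None:
            if node.value is not None:
                return node.value
            node = node.parent
        return 0.0


def estimate_delta_ea(base_value, xh_node, y_node):
    """Total delta E_A: base value plus one contribution from each tree."""
    return base_value + xh_node.contribution() + y_node.contribution()


# Placeholder trees and values for a single solvent.
xh_root = GroupNode("X_H", value=0.0)
c_h = GroupNode("C_H", value=0.5, parent=xh_root)
c_h_near_oh = GroupNode("C_H_next_to_OH", parent=c_h)  # untrained leaf
y_root = GroupNode("Y_rad", value=0.0)
o_rad = GroupNode("O_rad", value=-1.0, parent=y_root)

estimate = estimate_delta_ea(2.0, c_h_near_oh, o_rad)
```

Because the trees for each solvent share one structure, a separate set of values could be attached to the same nodes for each solvent or solvent category.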

3.2.3 Fuel oxidation model modification

The published n-dodecane/methyl oleate model, generated with RMG by Ben Amara et al., includes a Chemkin file and a dictionary which gives the SMILES strings for each named species in the mechanism [4]. Using this information, I parsed the Chemkin file using methods in RMG to generate a reaction in RMG format, corresponding to each hydrogen abstraction reaction in the mechanism. Once the reacting groups were determined by this script, the tree was traversed to find the contributions to ∆EA which are most applicable to the reaction. The contributions from the n-octane solvent tree were used, as it is the solvent which is most chemically similar to n-dodecane. The script then rewrote

the Chemkin file, updating the EA of each hydrogen abstraction reaction. The updated Chemkin file is included in Appendix A.3, and the script used to generate it can be found in Appendix A.2.
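The final rewriting step can be illustrated with a small, self-contained sketch. This is not the script in Appendix A.2 and does not use RMG's Chemkin parser; it assumes a simple reaction line that ends with the three Arrhenius fields (A, n, EA), ignores comments and auxiliary lines, and assumes the correction is already in the energy units of the file. The species names in the example are placeholders.

```python
def shift_activation_energy(line, delta_ea):
    """Return a Chemkin-style reaction line with delta_ea added to EA.

    Assumes the line has the form 'equation A n Ea' with no trailing
    comment, and that delta_ea uses the same units as Ea in the file.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected a line of the form: equation A n Ea")
    equation = " ".join(fields[:-3])
    a, n, ea = (float(x) for x in fields[-3:])
    return "{:<48s}{:10.3e}{:10.3f}{:12.3f}".format(equation, a, n, ea + delta_ea)


# Placeholder reaction line and correction.
original = "RH+OH=R+H2O    1.000e+13  0.000  5000.000"
updated = shift_activation_energy(original, 200.0)
```

Only the last field changes; the pre-exponential factor and temperature exponent are written back unchanged.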

3.2.4 Reactor simulations

To examine the effect of changing the reaction barriers in the fuel oxidation model, I simulated a reactor using both the previous (gas-phase kinetics) and updated (liquid-phase kinetics) models. An ideal gas reactor model was used, with the density set to the liquid fuel density of n-dodecane, to approximate a liquid-phase reactor. Oxygen was continuously replenished throughout the course of the simulation to emulate a constant partial pressure of oxygen. Other conditions were set to be the same as the conditions set in the modeling study conducted by Ben Amara et al. [4]. Different methyl oleate concentrations (0%-30% by volume) were used. The induction period, which is defined as the time at which the fuel is 5% converted, was compared for both models as well as to prior experimental data. The script used to simulate the reactor is included in Appendix A.4.
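The induction period definition used here can be made concrete with a short sketch. This is not the script in Appendix A.4; the function name is hypothetical, and linear interpolation between output times is an assumption of this sketch.

```python
def induction_period(times, fuel, conversion=0.05):
    """Time at which the fuel first reaches the given fractional conversion.

    times and fuel are equal-length sequences from a reactor simulation;
    the crossing is located by linear interpolation between output points.
    Returns None if the conversion is never reached.
    """
    target = fuel[0] * (1.0 - conversion)
    for i in range(1, len(times)):
        if fuel[i] <= target:
            t0, t1 = times[i - 1], times[i]
            c0, c1 = fuel[i - 1], fuel[i]
            return t0 + (c0 - target) * (t1 - t0) / (c0 - c1)
    return None


# Placeholder concentration profile (arbitrary units, time in hours).
ip = induction_period([0.0, 1.0, 2.0, 3.0], [1.0, 0.98, 0.94, 0.90])
```

For this placeholder profile the fuel passes 5% conversion between the second and third output points.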

3.3 Results

Some of the trends observed for change in ∆EA with molecular structure groups are outlined below. The trends were used to generate the group tree for correcting the gas phase kinetics in different solvents. Finally, the data for n-octane was used to modify the

EA in a fuel oxidation model; the updated reactor simulations are shown below.

3.3.1 Solvation kinetics trends

Several solvation kinetics trends, based on solvent and molecular structure, were deduced from the training data. One such relationship is displayed in Figure 3.11. Here, when ·OH abstracts a hydrogen from different sites of propanol, the kinetic solvent effect increases as the abstraction site gets closer to the alcohol group. Other such trends are briefly listed here. The solvent effect on hydrogen abstraction from alkanes does not change with increasing carbon chain length, with methane as the only exception; hydrogen abstraction from saturated rings and from unsaturated rings differ slightly, with larger differences occurring for high dielectric solvents; and hydrogen abstraction by ·OOH trends differently than abstraction by ·OH and ·CH3, with the latter two behaving similarly [156]. The intuition gathered from these trends was used to construct the molecular structure group tree for ∆EA.

Figure 3.11: Difference in energy between gas-phase and liquid-phase, between reactants and transition state (∆EA), for the reaction XH + ·OH ←→ ·X + H2O, where XH is propanol and each symbol represents a different abstraction site (indicated by molecules on the right with the abstraction site circled in purple). Kinetic solvent effect increases as distance of abstraction site to alcohol group decreases.

3.3.1.1 Group tree

From the intuition gained from the trends discovered in the section above, along with general principles included in the existing thermodynamics and kinetics databases in RMG, a molecular structure group tree was constructed for the determination of solvation kinetic corrections for hydrogen abstraction reactions. Importantly, the structure of this tree determines what estimates will be used for ∆EA when an exact value is unavailable. The first few levels of the tree are displayed in Figure 3.12; the full tree has 1-2 more layers of complexity. Similar studies were conducted for intra-hydrogen migration reactions, although the solvation corrections were not applied to the n-dodecane/methyl oleate oxidation model.

(a) Molecular structure group tree for the molecule being abstracted from (XH)

(b) Molecular structure group tree for abstracting species (Y·)

Figure 3.12: Illustration of the first few levels of group trees for hydrogen abstraction. The full tree contains 2 more levels of complexity. Reacting atoms are colored in blue. R’ refers to a functional group that is not a hydrogen.

Both hydrogen abstraction and intra-hydrogen migration group trees were also further trained with a larger set of reactions (see Recommendations section). The full tree along with group values for ∆EA, for hydrogen abstraction in n-octane, is included in Appendix A.1; this tree was trained using the large set of reactions.

3.3.2 New reactor simulations

Figures 3.13, 3.14, 3.15, and 3.16 compare the results of reactor simulations using the original, gas-phase kinetic model for n-dodecane/methyl oleate oxidation, the updated model with kinetic solvent effects, and experiments. For the simulation with pure n-dodecane (0% methyl oleate), addition of solvation kinetic effects appears to increase the induction period of the fuel. This increase is in the direction of most, though not all, of the experimental data points. The amount that the modified model increases the induction period, on a logarithmic basis, is dependent on the amount of methyl oleate in the fuel blend. Additionally, the model which is least consistent with the experimental data is the one built for 30% methyl oleate content. These results may suggest more training data are needed for the solvation kinetic groups which contain or are proximal to oxygen atoms. While not relevant in this example, as n-dodecane is a non-hydrogen bonding solvent, oxygen as a reacting atom will matter even more in such cases as the transition state would be stabilized by a hydrogen bonding solvent relative to the reactants.

3.4 Summary

In this chapter, modifications to the RMG-Py software were implemented for liquid-phase mechanism generation, specifically for reaction kinetics. Based on well known correlations, reaction rates were modified to account for diffusion effects, with an improvement over the implementation in RMG-Java: the addition of temperature-dependence in the solvent viscosity. Additionally, a large part of this chapter was devoted to demonstrating the applicability of a group contribution method for estimating kinetic solvent effects. Several experimental and theoretical methods for measuring liquid phase rate constants were reviewed; however, given RMG’s extensive database of gas-phase rates, it was preferable to design a method that modified these rates systematically to account for a solvent. A training set of hydrogen abstraction reactions, and later intra-hydrogen migration reactions, was used to deduce trends based upon molecular structure on the change in energy between gas-phase and different solvents, for both reactants and transition states. The calculations were completed using M06-2X/MG3S. Based upon the trends, a group contribution method was devised, and used to modify rates of a fuel oxidation model built using RMG-Java. Reactor simulations using the updated model showed that correcting rates changes the induction period measurement, generally towards the direction of experimental induction period. The more methyl oleate in the fuel blend, the worse the agreement of the updated model with the experimental values. This reflects a need for more training reactions including methyl esters, and oxygen-containing species in general.

[Figures 3.13 to 3.16: induction period (hours, logarithmic axis) plotted against 1000/T (K^-1) for three series: experiments, the Ben Amara 2013 model, and the modified model.]

Figure 3.13: Comparison of experiments, original model and updated model with kinetic solvent effects for 0% methyl oleate.

Figure 3.14: Comparison of experiments, original model and updated model with kinetic solvent effects for 5% methyl oleate.

Figure 3.15: Comparison of experiments, original model and updated model with kinetic solvent effects for 10% methyl oleate.

Figure 3.16: Comparison of experiments, original model and updated model with kinetic solvent effects for 30% methyl oleate.

3.5 Recommendations

Several recommendations to improve solvation kinetics in RMG-Py are outlined below. Among these are better automation, benchmarking, and integration with estimation methods to ensure thermodynamic consistency.

3.5.1 On-the-fly estimation of solvation kinetics

In the future, the algorithm described in this chapter can be integrated into the RMG software such that the mechanism including the kinetic solvent effect can be generated on the fly, instead of as a post-processing step to modify a Chemkin file. On-the-fly generation of solvation kinetics ensures that important reactions are included in the model, since a post-processing step is done on reactions whose inclusion in the mechanism was based on their gas-phase rates. This framework has already been partially set up, as the post-processing step is completed using methods I added to RMG. When the solvation database is loaded, the solvation kinetics correction database is loaded in addition to the thermodynamic database files. A method to calculate the barrier correction using group contributions was added to the solvation module. The missing step is correcting the intrinsic reaction rate during model generation, as is already done to account for diffusion. Because these steps are similar, it may make sense to do them in the same place in the source code. Some placeholder code is included in the diffusionLimited module for this purpose, in a new class called LiquidKinetics. Interfacing with the kinetics and solvation databases is currently incomplete. Importantly, this on-the-fly step should not be integrated into RMG until further benchmarking and consistency checks are done, as explained below.

3.5.2 Benchmarking the estimates

To minimize computational cost, the training reactions used to train group additivity values for ∆EA involved small molecules, but the reactions in real detailed kinetic models for fuel oxidation will contain many more atoms. Thus, it is important to understand whether our training reaction set can accurately capture the kinetic solvent effect on the actual reactions. It is hypothesized that the chemistry of the reacting site is most important to ∆EA, and that effects beyond the next-nearest neighbor are negligible. To test this, the

∆EA for some hydrogen abstraction and intra-hydrogen migration reactions from the n-dodecane/methyl oleate model should be calculated using M06-2X/MG3S and SMD using the same protocol as above. These ∆EA can then be compared to the ∆EA that would have been estimated for the same reaction using the automated method. This work is in progress at the time of writing.

3.5.3 Check thermodynamic consistency with LSERs

One way of checking the consistency of both the SMD calculations, and the linear solvation energy relationships (LSERs) discussed in Chapter 2, is to compare the equilibrium constant obtained by using both of these methods. Using LSERs, the equilibrium

constant of reaction, Keq, can be obtained with:

$$K_\mathrm{eq} = \exp\left(\frac{-\Delta G}{RT}\right)$$

$$\Delta G = \Delta G_\mathrm{gas} + \Delta G_\mathrm{solv}$$

where ∆Gsolv was calculated via the partition coefficient obtained from the Abraham correlations. Another Keq can be calculated using ∆G for the reaction’s reactants and transition state, which are calculated using the DFT and SMD method outlined in this chapter. Since data from both of these methods would be used alongside each other in mechanism generation codes such as RMG, it is important the values obtained from each method reasonably match.
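A minimal sketch of such a comparison is given here. The function names and the numerical inputs are placeholders, ∆G values are taken as reaction free energy changes in J/mol, and reporting the mismatch as a difference in log10 of the equilibrium constant is a choice made for this sketch rather than a procedure from this work.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def equilibrium_constant(delta_g, temperature):
    """K_eq = exp(-dG / (R T)), with dG in J/mol and T in K."""
    return math.exp(-delta_g / (R * temperature))


def solution_delta_g(delta_g_gas, delta_g_solv):
    """Free energy of reaction in solution: gas-phase value plus the
    change in solvation free energy across the reaction."""
    return delta_g_gas + delta_g_solv


def log10_mismatch(k_a, k_b):
    """Orders of magnitude separating two equilibrium constants."""
    return math.log10(k_a / k_b)


# Placeholder values in J/mol, not results from this work.
T = 298.0
dg_gas = -10000.0
k_lser = equilibrium_constant(solution_delta_g(dg_gas, -2000.0), T)
k_smd = equilibrium_constant(solution_delta_g(dg_gas, -2500.0), T)
mismatch = log10_mismatch(k_smd, k_lser)
```

A tolerance on this mismatch could then be used to flag reactions for which the two methods disagree.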

3.5.4 Data-driven approaches

The algorithm for predicting intrinsic kinetic solvent effects has two main drawbacks: small amounts of training data, and a human-constructed tree structure. Manually setting up transition state calculations is tedious; thus, the first iteration of this algorithm only contained 53 hydrogen abstraction reactions as training data, with the starting transition state geometry guess and input files all set up by hand. Larger data sets that have already been calculated automatically can be used instead, and the effect of more data points on the estimation of solvation kinetics can be explored.

Furthermore, the molecular structure trees in RMG are all constructed with the features ordered in the way the scientist believes to be the most important. For example, many of the trees in RMG for calculating thermodynamic or kinetic parameters consider the element of the central reacting atom to be more important than its bonding configuration, and the number of its radical electrons to be more important than its element. While a chemist’s intuition should not be ignored, a data driven approach may improve the accuracy of the estimation method. To explore this concept, automated decision trees can be constructed for solvation kinetics using the Python package scikit-learn [157].

3.5.4.1 Automated transition state theory to generate large training sets

To generate the large solvation kinetics training set, we needed a large set of gas-phase transition states, as this calculation set-up is the bottleneck of calculating ∆EA. The set of transition states was created using an automated algorithm, based on a group contribution approach to predicting transition state geometries [39]. Outside of this algorithm, Python scripts were written to streamline the process of 1) obtaining the gas-phase transition state geometry and energy from the database of transition states, 2) calculating the single-point energy of the transition state and reactant(s) in the solvent of interest, 3) calculating ∆EA for this reaction, and 4) using the values to train group values for the ∆EA estimation algorithm. A schematic for this process is shown in Figure 3.17.

Figure 3.17: Algorithm for obtaining solvation energy estimation values from a large transition state dataset. Utilizes software (from top to bottom): AutoTST [39], Gaussian 09 [95], cclib [158], and RMG-Py [7, 8], with helper scripts written as part of this work.

3.5.4.2 Automatic generation of decision trees with scikit-learn

Scikit-learn is an open source Python package used for machine learning [157]. It has been used for automatically constructing decision trees in several chemical and biological applications [159–161]. Rather than using a chemist’s intuition about how to order the decision tree, scikit-learn will utilize a user-defined metric (such as RMS error on a training set) for determining how the tree should be constructed. Using a package such as scikit-learn opens up several other possibilities. The decision tree model can be robustly cross-validated on the training set and model hyperparameters can be tuned using built-in modules. Also, ensemble methods such as random forests, which use several decision trees to estimate the quantity of interest, can improve the accuracy of the model. Several feature selection, parameter tuning and model combinations can be tested using pipelines in scikit-learn, with much less code than building these trees manually. Some example experimental code for building decision trees for estimating solvation kinetic effect is included in Appendix A.5.
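The idea of letting an error metric, rather than intuition, decide which feature sits at the top of the tree can be shown without any external package. The sketch here is not scikit-learn and is not the code in Appendix A.5; it is a single-level, pure-Python illustration of choosing the split with the lowest squared error, with hypothetical binary features and placeholder target values.

```python
def sum_squared_error(values):
    """Squared error of a group of values about their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Pick the binary feature whose split gives the lowest total squared
    error. rows is a list of dicts mapping feature name to 0 or 1."""
    best_name, best_error = None, float("inf")
    for name in sorted(rows[0]):
        left = [t for r, t in zip(rows, targets) if r[name] == 0]
        right = [t for r, t in zip(rows, targets) if r[name] == 1]
        error = sum_squared_error(left) + sum_squared_error(right)
        if error < best_error:
            best_name, best_error = name, error
    return best_name, best_error


# Hypothetical descriptors of the abstraction site and placeholder
# delta E_A values (not data from this work).
rows = [
    {"adjacent_to_oxygen": 1, "in_ring": 0},
    {"adjacent_to_oxygen": 1, "in_ring": 1},
    {"adjacent_to_oxygen": 0, "in_ring": 0},
    {"adjacent_to_oxygen": 0, "in_ring": 1},
]
targets = [2.0, 2.2, 0.1, 0.3]
top_feature, top_error = best_split(rows, targets)
```

Applying the same selection recursively to each resulting subset would grow a full regression tree; a library implementation adds cross-validation, depth limits, and ensembles on top of this basic step.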

4 AUTOMATED SILICON HYDRIDE MECHANISM GENERATION

Chemical vapor deposition (CVD) is a candidate domain for studying detailed chemical kinetics. The gas-phase chemistry of the precursors to CVD directly affects the yield and quality of the solid materials produced, and is thus an important facet of the process for industrial scientists, such as those fabricating microelectronics, to understand. Complete, detailed models of the reactions of these precursor gases, mainly silicon hydrides such as silane (SiH4), can provide necessary insights to the semiconductor industry. Detailed kinetic models also allow performance predictions to be made at different operating conditions than what has been studied experimentally. However, building these large mechanisms by hand is tedious, and automatic mechanism generators can be used to build the models faster and minimize errors. Using automatic mechanism generation ensures that all important reaction pathways, including those involving radical species, will be considered in modeling and prediction of the CVD process. Furthermore, a framework for studying gas-phase silicon hydrides provides the first step in studying surface reactions involved in CVD.

In this section of the dissertation, Reaction Mechanism Generator (RMG), an automatic mechanism generator discussed in the prior sections, was used to build detailed kinetic models for silicon hydrides used for CVD. A new element, silicon, was added to the RMG framework. Specifically, data for silicon hydrides were added to RMG’s database of thermodynamic and kinetic parameters. Radical reaction types, which already existed in RMG but lacked data for silicon, and new reaction types specifically important for silicon hydride chemistry, can both be proposed with the updated RMG.

Using the new data in RMG, a model for SiH4 thermal decomposition was built. The resulting model was used to simulate a flow reactor, and these simulations were compared to SiH4 flow tube experimental data obtained from Onischuk et al. [162], including data for SiH4 and Si2H6 concentration profiles with time, and at different temperatures, residence times, and initial concentrations of SiH4. Studying SiH4 can serve as a proof-of-concept that can be extended to other novel gas precursors.

4.1 Background

SiH4 CVD has been well studied, both experimentally and theoretically, for the past five decades [163–170, among others]. Several representative studies will be discussed in the following sections.

4.1.1 Experimental work on SiH4 chemistry

Several decades ago, Purnell and Walsh experimentally studied SiH4 pyrolysis between 375 and 430 °C. They found the overall rate of SiH4 −→ SiH2 + H2 to be of order 1.5, and proposed different chemical mechanisms to account for this order. The authors ruled out heterogeneous steps under these conditions, because the vessel’s A/V ratio did not strongly change the reaction rate. Thus, they concluded that the mechanism must include a unimolecular, first order decomposition which is pressure dependent along with chain reactions. However, the choice of whether SiH4 first decomposes to SiH3 or SiH2 required further information on bond dissociation energies [163].

Newman et al. studied silane decomposition in a shock tube, supplemented by RRKM rate calculations. They found that the rate of silane decomposition is independent of initial silane concentration, and also that hydrogen atom production is not important in the process, providing evidence that SiH4 decomposition to SiH2 is the most important initial step. However, because they did not see disilane, Si2H6, as a product, they hypothesized that the SiH2 must be consumed by some other means rather than producing hydrogen atoms [171].

Michael Coltrin and co-workers at Sandia National Laboratories have done extensive work on silane CVD in a rotating disk reactor, which eliminates temperature and concentration gradients, allowing CVD to be more uniform [165, 167]. Based on laser fluorescence measurements, two reactions are proposed that produce Si atoms, which involve disilylene (H3SiSiH). These reactions are in contrast to what was previously thought to be the route to Si atoms, SiH2 −→ Si + H2 [167].

Frenklach et al. studied silane and disilane pyrolysis in a shock tube, diluted with both argon and hydrogen, at high temperatures (900-2000 K). The results of these experiments were used to build a detailed kinetic model for both gas-phase and gas-surface reactions. Comparison to the experiment showed that the refractive index of the silicon material used greatly affects model predictions [168].

Onischuk et al. investigated silane pyrolysis in a flow reactor between 800-1000 K. For the gas phase, the effects of initial silane concentration on silane decomposition and disilane evolution were elucidated; specifically, these rates increased with increasing silane concentration. At different residence times, the effect of temperature on final silane concentration was probed. The solid phase was also investigated; the concentration of particles as well as hydrogen content in the solid product was measured. Based on the experimental results, the chemical mechanism was described as an initial decomposition of silane into SiH2 and H2, followed by subsequent silylene insertions to create higher silanes and substituted silylenes [162]. The Onischuk et al. experiments are used as a basis of comparison in this chapter of the dissertation.

4.1.2 Detailed mechanisms for SiH4 CVD

Mechanisms for SiH4 thermal decomposition have been developed using the above experimental data as well as theoretical calculations, by hand and automatically. Yuuki et al. developed a model for SiH4 and Si2H6 from the experimental observations of Purnell and Walsh and Newman et al. [163, 164, 171]. The model includes 10 species and 11 reactions, and it achieves good agreement with two experiments [163, 172]. Coltrin et al. developed a mathematical model for their rotating disk reactor, which includes a 26 reaction gas-phase chemical mechanism [165]. Several rate constants were taken from the RRKM calculations of Becerra and Walsh [173].

Giunta et al. built a gas-phase silane decomposition mechanism from prior experimental works on SiH4/NO CVD [174] and CVD from Mg2Si [175], which includes 18 gas-phase reactions. They hypothesize that film growth is mainly due to disilene species. The model generally reproduced CVD from silane well, compared to experiment, while the comparison to disilane CVD experiments agrees only qualitatively at best. The authors cite low reliability of rate constants as well as the absence of heterogeneous chemistry from the model as reasons why Si2H6 experiments do not match well [166].

The largest manual reaction mechanism for the pyrolysis of silane to form silicon nanoparticles, which was built by Swihart and Girshick, contains 220 chemical species and 2600 chemical reactions. To build the mechanism, a group additivity method was developed to estimate the thermochemistry of several silicon hydrides in the model. Reactivity rules were also developed to generate reactions in the mechanism, including templates for silylene (SiH2) insertion into Si-H and H-H bonds (and its reverse, SiH2 elimination); silylene to silene isomerization; and ring opening and closing isomerizations [5]. Later, this work was continued in collaboration with the Broadbelt group at Northwestern University, which has extensively modeled the pyrolysis of silane, particularly for the application of silicon nanoparticle formation. The group additivity thermochemistry method was improved, using the G3//B3LYP level of theory [70, 176]. Wong et al. [6] used automatic mechanism generation to build a model for silicon nanoparticle formation from SiH4 decomposition. In that study, automatic mechanism generation software developed by the Broadbelt group, which uses a rate-based termination criterion to limit model size, was employed. Reaction types were the same as in the previous study by Swihart and Girshick [5]. The study included a comprehensive analysis of the different factors contributing to silicon particle clustering. However, radical pathways were not included in the built models. Adamczyk and co-workers further developed group additive rate rules for the reaction families involved, derived from quantum chemistry calculations at G3//B3LYP [177–180].

4.1.3 Importance of radical chemistry in silicon hydride thermal decomposition

In their early study, Purnell and Walsh suggested that decomposition of SiH4 to SiH3 and H plays a role in silane pyrolysis despite not being the dominant pathway [163]. An earlier study by Emeléus and Reid [181] suggested that the role of SiH3 becomes more important when Si2H6 and Si3H8 are used as the silicon hydride precursor to CVD. The involvement of radicals may also be more important at certain CVD conditions [182, 183], for example, in low temperature CVD in silane plasma [184]. SiH3 was found to be the dominant radical species at steady state in silane plasmas in a study by Robertson et al. [185]. Watanabe et al. later demonstrated that the concentration profiles of Si, SiH and SiH2 radicals greatly affect particle growth in these systems [186]. Given that these studies all point to a role for radical reaction pathways in silicon hydride thermal decomposition, methods for building detailed kinetic models should be able to include these pathways if they are important.

4.2 Methods

To enable RMG to generate detailed kinetic models for silicon hydrides, small updates to the RMG source code were made, and its database was updated. Once these updates were performed, RMG was used to build detailed kinetic models for silane thermal decomposition. The models output by RMG were used to simulate a reactor for comparison to the experiments of Onischuk et al. [162].

4.2.1 RMG source code

Although silicon atom types already existed in RMG, the functionalities for incrementing and decrementing bond orders, breaking and forming bonds, and adding and removing lone pairs and radical electrons were fully implemented in this work. These updates were made in the molecule module of RMG.

Additionally, several thermodynamic and kinetics calculations were completed using quantum chemistry and CanTherm, a subprogram of RMG that parses quantum chemistry output files to perform thermodynamic and kinetics calculations [33]. In this work, CanTherm uses the rigid rotor harmonic oscillator approximation to calculate thermodynamic properties and canonical transition state theory with Eckart tunneling to calculate kinetic parameters. In order to perform these calculations, CanTherm must apply atom, bond and spin-orbit coupling energy corrections to adjust the energies calculated for a particular level of theory and quantum chemistry program. These corrections must be manually entered into the RMG source code for silicon atom types and silicon-containing bonds. Thus, the necessary parameters were entered for the levels of theory used in the calculations, either from published data [70, 187, 188] or from additional calculations of atomic energies using Gaussian 09 [95].
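As a rough illustration of the canonical transition state theory expression referred to above, the following Python sketch evaluates k(T) = κ (kB T / h) (Q‡ / Q_R) exp(−E0 / RT) for a unimolecular reaction. It is not CanTherm's implementation: the function name, the partition function ratio, the tunneling factor κ (which an Eckart model would supply), and the numerical inputs are all hypothetical placeholders chosen for illustration.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J*s
R_GAS = 8.314462618        # gas constant, J/(mol*K)


def tst_rate(temperature, barrier, q_ratio=1.0, kappa=1.0):
    """Canonical TST rate constant (1/s) for a unimolecular reaction.

    temperature: temperature in K
    barrier:     zero-point-corrected barrier height E0 in J/mol
    q_ratio:     Q_TS / Q_reactant (hypothetical input, e.g. from RRHO
                 partition functions)
    kappa:       tunneling correction (hypothetical input; 1.0 = none)
    """
    prefactor = K_B * temperature / H_PLANCK
    boltzmann = math.exp(-barrier / (R_GAS * temperature))
    return kappa * prefactor * q_ratio * boltzmann


# Illustrative numbers only; none of these are taken from this work.
k_913 = tst_rate(913.0, barrier=230.0e3, q_ratio=2.0, kappa=1.1)
print(f"k(913 K) = {k_913:.3e} 1/s")
```

For a bimolecular reaction the partition functions would be taken per unit volume, which changes the units of the result; that case is omitted here to keep the sketch short.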

4.2.2 Updating RMG’s database

RMG’s database was updated with both kinetics and thermodynamics data for silicon hydrides. Published reaction rates have been added to reaction libraries, which are preferentially used during RMG simulations if they match a generated reaction. Additional rates were calculated using transition state theory. Some published and calculated rates have been added to the training data for four reaction families, to be described below, and will be used to estimate reaction rates in these families where the exact rates are unavailable. Published, calculated, and group additive thermodynamic data for silicon hydrides have also been added to RMG’s database.

4.2.2.1 Kinetics data

In this work, two reaction rate libraries were added to RMG’s database. The libraries are based on one experimental study of SiH4 CVD [166] and one theoretical study involving pressure-dependent rate calculations for mainly Si2 species [170]. If specified in the input file of an RMG job, the rates from these libraries will be used if an RMG-generated reaction matches a reaction in the library.

Additionally, two new reaction families were added to RMG’s database: (1) silylene insertion, in which a silicon atom with a lone pair can insert into a silicon-hydrogen or hydrogen-hydrogen bond; and (2) silylene-to-silene isomerization, in which a hydrogen atom migrates from an sp3 bonded silicon atom to a silicon atom with a lone pair, and a double bond is formed. These families are based on the reaction types used in a previous work on model generation [6]. The kinetics data in these reaction families come from rates calculated by Adamczyk et al. [177–179] using G3//B3LYP. The third reaction family considered in previous mechanism generation for silicon hydrides was ring closing (and its reverse, ring opening), in which a silylene molecule with at least three silicon atoms isomerizes to form a ring structure. Two of these reactions (for Si3H6 and Si4H8) were added to a reaction library. Because the current work does not consider larger molecules, an RMG reaction family for arbitrarily large rings was not created.

In addition, the training reaction databases of two existing reaction families in RMG, hydrogen abstraction and radical recombination, were updated with silicon hydride-containing reactions and kinetic parameters [183, 189] found in the NIST kinetics database [190]. For hydrogen abstraction reactions, ten additional rates were calculated. The reactant and transition state geometries were optimized using M06-2X [147, 148] / 6-311+G(3d2f) [150–152, 191, 192], with a clear saddle point found for each reaction. The CanTherm package, which applies conventional transition state theory, was used to determine the Arrhenius kinetic parameters [33]. More detailed information about CanTherm can be found in Chapter 1. The reliability of these new, calculated rates was tested by comparing to available published experimental and theoretical rates for hydrogen abstraction reactions of silicon hydrides [182, 189, 193–197].

Reaction family                     Template
Hydrogen abstraction                R1–H + R2•  ⇌  R1• + R2–H
Radical recombination               R1• + R2•  ⇌  R1–R2
Silylene insertion                  R1–Si(:)–H + R2–H  ⇌  R1–SiH2–R2
Silylene-to-silene isomerization    R1(H)Si=Si(H)R2  ⇌  R1–Si(:)–SiH2–R2

Table 4.1: Reaction families used to generate mechanisms for silicon hydrides in RMG

The templates for these four reaction families, which are used to generate the SiH4 thermal decomposition models, are given in Table 4.1. Since these general reactions are reversible, reactions based on the reverse templates (e.g., silylene elimination) are also included in the model. Further information about how RMG prioritizes reaction rates from libraries and families can be found in Chapter 1.

4.2.2.2 Thermodynamics data

Group additivity values for the thermodynamics of silicon hydride species (silanes, silenes, and silylenes) were previously calculated by Swihart and Girshick [5]. The group values were later improved upon by Wong et al. by fitting to G3//B3LYP calculations [70]. In this work, the Wong et al. values were added to RMG’s database for use in the existing group additivity scheme for calculating thermodynamic parameters in RMG.

For radical species, group additivity values have not previously been determined. We used G3//B3LYP quantum chemistry calculations to generate hydrogen bond increment (HBI) values for silicon hydride radicals. HBI is a method, described in more detail in Chapter 1, used to calculate the thermodynamic parameters of a radical species R∗ from its parent molecule RH, which is the chemical species created by saturating the radical with hydrogen atoms [28]. Both the G3//B3LYP calculations for the closed-shell species and the group additivity values using HBI for radical species were benchmarked against higher level calculations by Katzer et al. [198].
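The HBI bookkeeping can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual convention in which the HBI enthalpy is a bond-dissociation-like quantity, so that the enthalpy of formation of the hydrogen atom is subtracted once per hydrogen removed. The function name, the value of 52.1 kcal/mol for the H atom, and the parent enthalpy of 8.2 kcal/mol for SiH4 are assumptions introduced for illustration and are not taken from this work; only the HBI value of 91.3 kcal/mol for SiH3 is the one reported in this chapter's HBI table.

```python
# Assumed literature value for the enthalpy of formation of H atom (kcal/mol).
H_ATOM_ENTHALPY = 52.1


def radical_enthalpy(parent_enthalpy, hbi_enthalpy, n_h_removed=1):
    """Estimate the enthalpy of formation (kcal/mol) of a radical.

    parent_enthalpy: enthalpy of formation of the saturated parent RH
    hbi_enthalpy:    HBI enthalpy correction for the radical site(s)
    n_h_removed:     number of H atoms removed from the parent (1 or 2)
    """
    return parent_enthalpy + hbi_enthalpy - n_h_removed * H_ATOM_ENTHALPY


# SiH3 from SiH4: the parent value is an assumed, approximate literature
# number; the HBI value is the SiH3 entry reported in this chapter.
sih3 = radical_enthalpy(parent_enthalpy=8.2, hbi_enthalpy=91.3)
print(f"Estimated enthalpy of formation of SiH3: {sih3:.1f} kcal/mol")
```

In an RMG job the parent value would come from a thermodynamic library or from group additivity rather than being typed in by hand; the entropy correction is applied analogously, without the hydrogen atom term.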

4.2.3 RMG model generation

The conditions for the RMG simulation were chosen to closely mirror the experimental conditions in the flow tube experiment [162], with base case conditions of T = 913 K, P = 39 kPa, and y0(SiH4) = 0.00016 in an argon bath gas. Temperature, pressure, and initial SiH4 mole fraction were varied around these values to generate a comprehensive mechanism that could be used for reactor simulations at a variety of conditions. Pressure dependence was included in some of the built models to test its effects on the model accuracy. The pressure dependence scheme in RMG has been described in Chapter 1 and previously [25].

4.2.4 Reactor modeling

Once the models were built in RMG, they were tested for validity by comparison to experimental results using a constant-pressure reactor model in Cantera, which integrates with Python [199]. The Python script used to simulate the reactor and generate the comparison plots is provided in Appendix B.4. We simulated the same conditions for temperature, pressure, initial SiH4 mole fraction, and residence time given in the experiment by Onischuk et al. [162]. The rate constants and thermodynamic parameters used in the simulation come from the RMG-generated mechanism; the RMG simulation provides a Chemkin file as output, which can be easily converted to a Cantera input file with a script. Different simulations as well as sensitivity analyses were performed to understand the effects of temperature, pressure, initial SiH4 mole fraction, residence time, and model size on the SiH4 concentration profile.

4.3 Results

The thermodynamic and kinetic data calculated were compared to published data in order to validate their use in RMG’s database. After updating the database, the RMG generated models were used in simulations and compared with experimental results. The effects of changing various process conditions were considered.

4.3.1 Kinetics of hydrogen abstraction reactions

# Reaction log A Ea log k300K log k1000K Source

1  SiH4 + H → SiH3 + H2
     14.8  10.7  13.1  14.2  This work
     13.9  11.7  11.9  Ref. 196
     11.2  13.6  Ref. 189
     13.2  Ref. 197 (QI)
     13.1  Ref. 197 (VTST)
     13.2  Ref. 195
     11.3  Ref. 194

2  SiH3 + H2 → SiH4 + H
     13.4  72.3  1.03  9.51  This work
     12.4  61.6  1.65  Ref. 196

3  Si2H6 + H → Si2H5 + H2
     15.9  3.55  15.5  15.7  This work
     13.8  11.1  11.9  Ref. 193
     11.9  14.0  Ref. 189

4  Si2H5 + H2 → Si2H6 + H
     14.2  77.0  1.01  10.1  This work

5  Si2H6 + SiH3 → Si2H5 + SiH4
     15.1  12.7  13.2  14.3  This work
     9.6  Ref. 182

6  Si2H5 + SiH4 → Si2H6 + SiH3
     14.8  24.6  10.9  13.4  This work

7  Si2H5 + H → SiH3SiH + H2
     14.5  4.55  13.9  14.2  This work

8  SiH3SiH + H2 → Si2H5 + H
     14.1  88.1  -0.97  9.45  This work

9  Si2H4 + H → Si2H3 + H2
     14.7  7.95  13.5  14.3  This work

10 Si2H3 + H2 → Si2H4 + H
     13.6  126  -8.11  6.93  This work

Table 4.2: Hydrogen abstraction rates calculated from M06-2X/6-311+G(3d2f) and transition state theory using CanTherm. Rates were compared with those obtained from literature, where available. Units in kJ, mol, cm3, s. Logarithms are base 10.

We calculated ten hydrogen abstraction reaction rates for use in RMG’s database, but only four prior published rates were available for comparison. The rate coefficients at 300 K and 1000 K and Arrhenius parameters are displayed in Table 4.2, along with published data where available (Reactions 1, 2, 3 and 5). The calculated geometries of the transition states of these ten reactions are included in Appendix B.1. There are only five transition state geometries given, since the reverse reactions will have the same transition states. In all cases, the rate coefficients at 1000 K are within 1–2 orders of magnitude of the published data. However, at 300 K, the discrepancy is much greater for Reactions 3 and 5. Because we are considering temperatures between 800 K and 1000 K for this study, the differences at low temperatures are not particularly concerning. For reactions such as Reaction 1, where many experimental and theoretical data are available and are in agreement with one another, it is practical to use this agreed-upon rate in RMG’s database instead of the less accurate rate calculated here. However, the comparison of these four calculated rates to the literature data shows that for reactions where data are unavailable or scarce, such as Reaction 4, these DFT estimates are reasonable enough for the purposes of automatic mechanism generation, particularly at CVD relevant temperatures. If the RMG generated model were to be simulated at process conditions at which radical chemistry becomes more important (see further results), or different silicon precursors are used, more accurate calculations should be done.
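To make the relationship between the Arrhenius parameters and the tabulated rate coefficients concrete, the short Python sketch below evaluates log10 k = log10 A − Ea / (ln(10) R T) using the "This work" parameters for Reaction 2 from Table 4.2. The function name is hypothetical, and a simple two-parameter Arrhenius form is assumed. Small differences from the tabulated log k values are to be expected if the underlying fit includes a temperature exponent, which is an assumption here since the fitted form is not stated in this section.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)


def log10_k(log10_a, ea_kj_mol, temperature):
    """Base-10 log of a two-parameter Arrhenius rate coefficient.

    log10_a:     base-10 log of the pre-exponential factor
    ea_kj_mol:   activation energy in kJ/mol
    temperature: temperature in K
    """
    return log10_a - ea_kj_mol / (math.log(10.0) * R_KJ * temperature)


# Reaction 2 (SiH3 + H2 -> SiH4 + H), "This work": log A = 13.4, Ea = 72.3 kJ/mol
for temperature in (300.0, 1000.0):
    value = log10_k(13.4, 72.3, temperature)
    print(f"T = {temperature:6.0f} K  log10 k = {value:.2f}")
```

The same function can be used to compare any pair of literature and calculated Arrhenius parameters at the 800 K to 1000 K temperatures of interest in this study.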

4.3.2 Calculated thermodynamic data

All geometry optimization results from G3//B3LYP calculations on silicon hydride species are provided in Appendix B.2. The parity plot shown in Figure 4.1 reveals that calculations of ΔfH°298 using G3//B3LYP for both closed-shell and radical species compare well with the high level calculations of Katzer et al. [198]. Most discrepancies are less than 5 kcal/mol. Larger differences between the two methods are seen for some Si3 and multiradical species. Katzer et al. report that for proper treatment of these species, which have both multiple radicals and divalent silicon atoms, a multiconfiguration reference wave function with a post-self-consistent field calculation method must be used [198]. Therefore, we know that our treatment with G3//B3LYP will not be exact. However, G3//B3LYP provides a reasonable and computationally less expensive estimation of the thermochemistry for most species, which justifies our use of it for other calculations in this work.

Figure 4.1: G3//B3LYP calculations of ΔfH°298 (this work) compared to high level calculations by Katzer et al. [198], for silicon hydride species with up to three silicon atoms. Structures shown for species with more than 5 kcal/mol discrepancy.

Hydrogen bond increment (HBI) values derived from the G3//B3LYP calculations of radical species and their parent molecules are given in Table 4.3. Values of ΔfH°298 for 16 silicon hydride radicals were then calculated via group additivity and compared to the Katzer et al. values [198], as illustrated in Figure 4.2. In this particular comparison, the group additivity values of Wong et al. [70] were used to calculate the thermodynamics for the parent molecules, for the sake of consistency across species, but during an actual RMG simulation, exact species thermodynamics would be used for the parent molecules if available in the database. Because of this, Figure 4.2 is a reflection of both the accuracy of Wong’s group

Radical species    Parent molecule    HBI(ΔfH°298) (kcal/mol)    HBI(S°298) (cal/mol/K)

Si SiH3 HSi SiH3 74.0 -3.47

SiH SiH2 75.3 -4.92

HSi SiH2 75.8 -0.571

H2Si SiH H2Si SiH2 83.1 3.01

H H2 Si Si 86.0 1.76 H3Si SiH3 H3Si SiH3

H2Si SiH3 H3Si SiH3 88.5 0.812

SiH3 SiH4 91.3 0.192

Si 146 -12.9

H2Si Si 154 -0.446

Si 168 3.00 H3Si SiH3

HSi SiH3 174 0.443

SiH2 181 -1.03

Table 4.3: Hydrogen bond increment (HBI) corrections calculated with G3//B3LYP. These corrections account for the effect of losing 1 or 2 hydrogen atoms on the enthalpy and entropy.

additivity values and our calculated HBI values. While most values compare reasonably well, there are again significant differences for three multiradical species which contain either double bonds or divalent silicons, for the reasons discussed above. It may be preferable, therefore, to calculate the thermodynamics of these multiradical species at a higher level and to put them in a thermodynamic library in RMG, whose entries are used preferentially over group additivity estimates.

Figure 4.2: Group additivity calculations of ΔfH°298 from RMG-Py, derived from Wong et al. GAV [70] and HBI corrections (this work), compared to high level calculations by Katzer et al. [198], for 16 silicon hydride radical species. Structures shown for species with more than 5 kcal/mol discrepancy.

4.3.3 RMG generated mechanisms

Two RMG mechanisms were used for most of the reactor simulations, the difference being inclusion of pressure dependent rate parameters. Tolerance was adjusted to make the size of the models roughly equal and to ensure that radical chemistry was included in both models. The mechanism including pressure dependence contains 63 silicon hydride species and 1298 reactions and the mechanism without pressure dependence contains 57 species and 578 reactions. A third (pressure-dependent) mechanism was also generated to incorporate important species and reactions at a variety of initial SiH4 concentrations for comparison to experiment. This mechanism was larger, with 83 species and 2708 reactions. The largest RMG-generated mechanism (Chemkin file, species dictionary, and converted Cantera input file) is provided in Appendix B.3.
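The tolerance mentioned above controls how readily RMG's rate-based algorithm enlarges the model. As a reminder of what it does, here is a minimal Python sketch, assuming the commonly described form of the criterion: an edge species is promoted to the core when its formation flux exceeds the tolerance multiplied by a characteristic flux, taken as the root-sum-square of the core species fluxes. The function names, species labels, and flux values are hypothetical, and this is not RMG's actual code.

```python
import math


def characteristic_flux(core_fluxes):
    """Root-sum-square of the net fluxes of all core species."""
    return math.sqrt(sum(flux * flux for flux in core_fluxes))


def species_to_add(core_fluxes, edge_fluxes, tolerance):
    """Return the edge species with the largest flux above the threshold.

    core_fluxes: iterable of core species fluxes (e.g. mol/m^3/s)
    edge_fluxes: dict mapping edge species label -> formation flux
    tolerance:   user-chosen tolerance (dimensionless)
    Returns None when no edge species exceeds the threshold.
    """
    threshold = tolerance * characteristic_flux(core_fluxes)
    candidates = {name: flux for name, flux in edge_fluxes.items()
                  if flux > threshold}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical fluxes, for illustration only.
core = [3.0e-6, 4.0e-6]
edge = {"Si3H8": 8.0e-7, "Si2H3": 1.0e-8}
print(species_to_add(core, edge, tolerance=0.1))
print(species_to_add(core, edge, tolerance=0.5))
```

A smaller tolerance admits more edge species and therefore produces a larger mechanism, which is why the tolerance can be tuned to make two models roughly equal in size.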

4.3.3.1 Effect of pressure dependence

A constant-pressure reactor model was simulated with both the pressure dependent and non-pressure dependent RMG mechanisms. The result, shown in comparison to the prior experimental results, is displayed in Figure 4.3(a). Since the RMG model containing pressure dependence is shown to replicate more accurately the experimental result, pressure dependence was used for the remaining analysis. This result is expected, as including pressure dependent networks in the model is important at the low pressures typically used for SiH4 CVD.

4.3.3.2 Effect of temperature

The full pressure dependent model was further used for reactor simulations at dif- ferent temperatures. Figure 4.3(b) displays the plot from experiment with the pressure dependent model simulated at 863 K, 893 K, 913 K (the experimental temperature), and 963 K. The simulations at 863 K and 963 K show that temperatures within a ± 50 K range can affect the SiH4 concentration profile enormously. Simulating the reactor at 893 K provides a close comparison to the experimental data reported at 913 K, illustrating that an uncertainty of 20 K can fully explain the difference between the model and the experiments.

Figure 4.3: Simulation results (this work) compared with SiH4 thermal decomposition experiment (data points) [162]. (a) Comparison of RMG models with and without pressure dependence to experimental results. Conditions are T = 913 K, P = 39 kPa, and y0,SiH4 = 1.6 × 10⁻⁴. (b) Comparison of RMG model, including pressure dependence, at different temperatures (863 K, 893 K, 913 K, and 963 K), compared to the experimental results at 913 K. Both panels plot ySiH4/y0,SiH4 against time (s).

The main initial decomposition reaction, SiH4 ⇌ SiH2 + H2, has an activation energy of about 55 kcal/mol, with many reported theoretical and experimental determinations covering a range of about ±5 kcal/mol. A 20 K change from 913 K to 893 K corresponds to only a −1.20 kcal/mol change in activation energy for a global reaction rate following Arrhenius kinetics with a 55 kcal/mol activation energy. In other words, the discrepancy lies well within the expected uncertainty in the activation energies.
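One way to arrive at the quoted equivalence between a 20 K temperature shift and a change in activation energy is sketched below in Python. Assuming that only the exponential factor exp(−Ea/RT) matters, the activation energy Ea' that gives the same factor at the new temperature as Ea gives at the reference temperature satisfies Ea'/T_new = Ea/T_ref, so the shift is Ea (T_new − T_ref)/T_ref. The function name is hypothetical, and this reading of how the number was obtained is an assumption.

```python
def equivalent_ea_shift(ea, t_ref, t_new):
    """Shift in activation energy that compensates a temperature change.

    Returns Ea' - Ea, where Ea'/t_new == ea/t_ref, i.e. the Arrhenius
    exponent Ea/(R*T) is held fixed. Units of the result follow `ea`.
    """
    return ea * (t_new - t_ref) / t_ref


# 55 kcal/mol activation energy, temperature changed from 913 K to 893 K
shift = equivalent_ea_shift(55.0, t_ref=913.0, t_new=893.0)
print(f"Equivalent change in Ea: {shift:.2f} kcal/mol")
```

Because the reported activation energies span about ±5 kcal/mol, a shift of this size is small compared with the spread in the literature values.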

4.3.3.3 Residence time variation with temperature

In the SiH4 experiment, the reactor residence time was varied by changing the volumetric flow rate of the feed gas [162]. Our simulations similarly varied the volumetric flow rate into the reactor. Figure 4.4 compares the SiH4 concentration profile versus temperature at four different residence times. In the compared experiment, the reported inlet volumetric flow rates and residence times at 843 K were not consistent for a constant volume reactor, leading to slight errors in the residence time used in the simulation. The error bars in the figure represent this discrepancy only, with more information included in Appendix B.5. The reactor simulations compare qualitatively well with the experimental data. In the previous section, it was noted that a 20 K difference in temperature, which is well within the uncertainty of the reaction rates, would better capture the SiH4 profile. One can imagine that the simulation data in Figure 4.4 would, therefore, become closer to the experimental results if the curves were shifted by +20 K; the largest discrepancy is about 40 K.

Figure 4.4: SiH4 concentration vs. temperature at different residence times (0.8 s, 0.5 s, 0.35 s, and 0.2 s), from Onischuk et al. [162] (points) and from reactor simulations in this work.

4.3.3.4 Effect of radical reaction families

To investigate the effect of excluding radical reaction families, a new mechanism was generated with the radical reaction families (hydrogen abstraction, radical recombination) disabled during the mechanism generation. The results for the mechanisms simulated at 913 K are displayed in Figure 4.5(a), along with the experimental data. At these conditions, the removal of radical reactions has no noticeable effect on the overall SiH4 decomposition rate. Because radical pathways are thought to be relatively more important at the lower temperatures of CVD conditions, RMG mechanisms were generated at 613 K, both with all reactions included and with radical reaction families removed, and used in simulations. At this lower temperature and far slower decomposition, as shown in Figure 4.5(b), there is still hardly a difference between the full mechanism and the mechanism with radical reactions removed. Comparing the Si fluxes at 6 × 10⁴ seconds, which was chosen due to the slight acceleration of the non-radical pathway at that time, we do not see a difference in the significant pathways to SiH4 decomposition, and only negligible participation by radicals. The flux diagram for the full mechanism is shown in Figure 4.6.

Figure 4.5: Simulation results for full pressure dependent mechanisms generated by RMG, compared with a mechanism generated without radical reaction families allowed. P = 39 kPa, y0,SiH4 = 1.6 × 10⁻⁴. (a) T = 913 K. (b) T = 613 K.

Figure 4.6: Flux diagram for Si at 6 × 10⁴ seconds for full, pressure dependent mechanism generated by RMG and simulated at 613 K. Species shown: SiH₄, SiH₂, Si₂H₆, SiH₃SiH, H₄Si₂, SiH₂Si, and Si₂ (scale = 1e-10, time = 60158).

4.3.3.5 SiH4 and Si2H6 concentration profiles

Figure 4.7: Variation in concentration profiles of SiH4 and Si2H6 with initial SiH4 concentration (y0 = 0.00016, 0.00088, 0.0025, 0.01, and 0.05) at 873 K. (a) Experimental results from Onischuk et al. [162]. (b) Results of reactor simulations using RMG-generated model. (c) Simulation results after updating three reaction rates.

Figure 4.7 displays the concentration of SiH4 and Si2H6 with time at different initial SiH4 concentrations at 873 K. In the simulations, these profiles appear not to vary with the initial SiH4 concentration (Figure 4.7(b)), although the experimental results show a clear difference (Figure 4.7(a)). In contrast to the experiment, an early shock tube study found that the rate of SiH4 decomposition is unaffected by initial SiH4 concentration [171], although the conditions are different from both this modeling study and the Onischuk et al. experiment.

A sensitivity analysis was performed to assess which reactions qualitatively affected the concentration profiles of SiH4 and Si2H6. Three reactions were identified which, when the rate constants were modified by no more than two orders of magnitude (10²), produced noticeable changes in these concentration profiles. These reactions are as follows:

SiH2 + SiH2 ⇌ Si2H4

H2 + SiH3SiH ⇌ Si2H6

H2 + SiH3SiH ⇌ SiH2 + SiH4

Changing these few rates at the same time can produce concentration profiles that are close to the experimental results, as shown in Figure 4.7(c). While the authors do not recommend changing the rates on the basis of the current evidence, the result is shown to establish that this initial built mechanism is plausible given the uncertainties. For example, the first altered reaction rate, for SiH2 + SiH2 ⇌ Si2H4, was estimated by Giunta et al. to have the same rate as SiH2 + Si2H6 ⇌ Si3H8, which was measured by Jasinski and Chu [166, 200] (this is the rate used in the RMG mechanism). However, Dollet and de Persis reported a calculated rate for this reaction that differs by an order of magnitude [170]. Therefore, it is reasonable to assume considerable uncertainty in this rate. For a final model, however, modifying rates should be done carefully, with additional evidence and more knowledge about the fidelity of all rates.
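The kind of brute-force sensitivity test described in this section, in which each rate constant is scaled by up to two orders of magnitude and the change in a predicted profile is recorded, can be sketched in Python as follows. This is only an illustration of the procedure: `toy_profile` is a hypothetical stand-in for a full reactor simulation, and the channel names and rate constants are invented for the example rather than taken from the RMG mechanism.

```python
import math


def toy_profile(rates, time):
    """Hypothetical stand-in for a reactor simulation.

    Returns the remaining fraction of a reactant consumed by two
    parallel first-order channels with rate constants in 1/s.
    """
    return math.exp(-(rates["channel_a"] + rates["channel_b"]) * time)


def brute_force_sensitivity(model, rates, time, factor=100.0):
    """Relative change in the model output when each rate constant,
    one at a time, is multiplied by `factor`."""
    base = model(rates, time)
    changes = {}
    for name in rates:
        perturbed = dict(rates)
        perturbed[name] = rates[name] * factor
        changes[name] = (model(perturbed, time) - base) / base
    return changes


# Illustrative rate constants only (1/s).
rates = {"channel_a": 1.0, "channel_b": 1.0e-4}
result = brute_force_sensitivity(toy_profile, rates, time=0.1)
for name, change in result.items():
    print(f"{name}: relative change = {change:+.4f}")
```

Reactions whose perturbation barely changes the output can be deprioritized, while those with large responses are candidates for more careful rate calculations.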

4.4 Discussion

Thermodynamic and kinetic data for silicon hydrides were generated to update RMG’s database, and the computational methods used were benchmarked with available experimental and high level theoretical data. From both of these studies, it was observed that where high accuracy data are available, they should be put into a thermodynamic or kinetic library in RMG for greater model accuracy. However, if data are unavailable, the calculations provide reasonable enough estimates for initial construction of a model. These calculations were used in RMG’s database for the hydrogen abstraction reaction family and to generate hydrogen bond increment values for calculating thermodynamics of radical species. The additions to RMG’s database allow users to build kinetic models for SiH4 thermal decomposition.

The inclusion of radical reaction families was found to be unimportant to the overall SiH4 decomposition rate at both 613 K and 913 K. Thus, inclusion of these pathways does not mechanistically change the understanding of thermal decomposition of SiH4, instead corroborating the prior studies that conclude radical reaction pathways are much slower compared to the main decomposition pathway to SiH2 and H2 and the subsequent insertion and elimination reactions that occur. However, the new ability to automatically build more complex mechanisms for silicon hydrides in general can provide understanding that helps in the selection of process conditions and/or different precursors for CVD which reduce the amount of radicals formed, thus reducing the risk of unwanted particle formation and growth in the CVD reactor.

Generation of two detailed kinetic models in RMG, one with pressure dependent reactions and the other without, showed that the pressure dependent model agreed more closely with experimental data. Furthermore, using the pressure dependent model, temperature had a large effect on the SiH4 concentration profile with time. A 20 K change in temperature, which is comparable to a change of only about 1.2 kcal/mol in activation energy at these temperatures, brought the simulation results closer to the experimental SiH4 concentration profile. Simulations using the RMG generated model were able to reasonably replicate the effect of residence time on the SiH4 concentration profile at different temperatures. If uncertainty in temperature of about 20 K is taken into account, the simulations using the RMG generated model more closely match the experiment.

Comparing the SiH4 and Si2H6 profiles obtained by varying initial SiH4 concentration, the model does not match well with the experimental data. Specifically, the simulations show no change in these profiles at different initial concentrations, whereas the experiment shows a clear dependence.

The qualitative trends seen in the experimental results can be obtained by changing three reaction rates by amounts likely within the uncertainty of these rates; however, this approach to modifying rates is not advised, as the causes of the discrepancy may lie elsewhere. Instead, it is recommended to calculate carefully those reaction rates which can feasibly be calculated. A global sensitivity analysis could provide additional information about which rates significantly affect the SiH4 and Si2H6 concentration profiles. Additionally, given that the results in this dissertation are more in line with the Newman et al. experiment at different conditions, the effect of initial SiH4 concentration remains unclear. Further investigation by detailed measurements and calculations of these important rates could improve the accuracy of RMG’s database for silicon hydrides.

4.5 Summary

A framework for extending the Reaction Mechanism Generator software package (RMG) to a new class of elements and chemical reactions has been demonstrated, which allows generation of detailed kinetic models for silicon hydride decomposition. This work builds on previous efforts to generate these mechanisms automatically, making use of published thermodynamic and kinetic data. Additional data were calculated to allow radical reaction pathways to be enabled, including calculations of rates for hydrogen abstraction reactions of silicon hydrides, and hydrogen bond increment values for radical species. All thermodynamic calculations were completed at the G3//B3LYP level of theory, while the reaction rates were calculated using M06-2X/MG3S and classical transition state theory. For silane thermal decomposition, simulations using the RMG-generated model compare reasonably well with experimental results, with inconclusive results on the effect of initial silane concentration. Inclusion of the enabled radical reaction families, hydrogen abstraction and radical recombination, made little difference in the reactor simulations. Many other reaction families exist in RMG which can be extended to silicon hydrides, but availability of experimental or high level theoretical calculations of their reaction rates remains a challenge. This work represents an important first step in extending RMG to model chemical vapor deposition by more accurately representing the detailed gas phase chemistry in this process, and validating the results for silane thermal decomposition.

4.6 Recommendations

Several recommendations for SiH4 model improvement are outlined below. These mainly involve refinement of the thermodynamics and kinetics parameters used in RMG’s database for the construction of the model. Additionally, the possibility of including surface chemistry in RMG is discussed.

4.6.1 Expansion of thermodynamic libraries for radical species

As suggested previously, certain effects caused by multiradical or divalent silicon species cannot be accounted for using lower-level electronic structure methods [198]. For species with these characteristics, comparing the values calculated with G3//B3LYP, as well as the group additivity values using the new HBI corrections, against the higher level calculations of Katzer et al. yielded differences in ΔfH°298 larger than 5 kcal/mol. The radical species with these chemical characteristics that appear in the core or edge of the RMG-generated models should be recalculated using high level methods, such as coupled-cluster methods. Availability of computational chemistry software capable of running these methods is imperative; ORCA is one such package, available free of charge for academic use [201]. Furthermore, adequate computational resources are necessary to run such high-level calculations. Such resources are available using the Discovery cluster at Northeastern University, accessible through the Massachusetts Green High Performance Computing Center (MGHPCC).

4.6.2 Calculation of rates

In this work, ten hydrogen abstraction reaction rates were calculated using DFT and added to RMG’s database. For silylene insertion, silylene-to-silene isomerization, and ring-opening reactions, rates were sourced from the work of Adamczyk et al. [177–180]. All of these literature rates were calculated with G3//B3LYP. Both levels of theory used to calculate rates are known to be approximate, with errors in DFT energies contributing to discrepancies in barrier heights that have a significant effect on the rate. Furthermore, the M062X/MG3S calculations used the rigid rotor harmonic oscillator (RRHO) approximation and did not check whether the reactants, products, and transition states found were the lowest-energy conformers. While these effects are generally larger for thermodynamics calculations, since some error cancellation might occur in the calculation of kinetics, the magnitude of the effect is unknown. Thus, one way of improving RMG-generated models would be to recalculate, at higher levels of theory, the reaction rates that are currently either estimated or taken directly from lower-level calculations. If computational resources are limited, the lower levels of theory can still be used, but the 1-D hindered rotor approximation can be incorporated to find the lowest-energy conformer and include the effects of hindered rotors [202].
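To illustrate why barrier-height errors matter for the rates discussed above, the following minimal sketch evaluates a simple Eyring-type transition state theory expression and the factor by which the rate changes for a given barrier error. It is an illustration under stated assumptions, not the procedure used in this work: the function names are hypothetical, the 2 kcal/mol error is an arbitrary example value, and tunneling, symmetry numbers, and partition-function ratios are all ignored.

```python
import math

R_KCAL = 1.987204e-3   # gas constant, kcal/(mol K)
K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s


def tst_rate(barrier_kcal_per_mol, temperature):
    """Eyring-type estimate k = (kB*T/h) * exp(-barrier/(R*T)), in 1/s."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kcal_per_mol / (R_KCAL * temperature))


def rate_ratio_for_barrier_error(error_kcal_per_mol, temperature):
    """Factor by which the rate increases if the barrier is lowered by the given error."""
    return math.exp(error_kcal_per_mol / (R_KCAL * temperature))


# Hypothetical 2 kcal/mol barrier error at three temperatures.
for temperature in (300.0, 600.0, 1000.0):
    ratio = rate_ratio_for_barrier_error(2.0, temperature)
    print(f"T = {temperature:6.0f} K: rate changes by a factor of {ratio:.2f}")
```

With these constants, a 2 kcal/mol error changes the rate by roughly a factor of 29 at 300 K but only about a factor of 2.7 at 1000 K, so the same error in energy is less damaging at the higher temperatures typical of thermal decomposition.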

4.6.2.1 Using automatic transition state theory calculations

A recently developed method calculates reaction rates automatically, in a high-throughput manner, using canonical transition state theory. The method, known as AutoTST, uses a machine-learning algorithm to estimate transition state geometries via group contribution (discussed in Chapter 1) and is integrated with computational chemistry software and CanTherm [33, 39, 40]. In theory, AutoTST could be applied to the reaction families used in this work that have a reaction barrier, generating many reaction rates for these families. However, significant challenges arise when extending AutoTST to new reaction families. Furthermore, AutoTST currently has some limitations, including inaccuracies that result from using DFT energies, errors in the calculation of symmetry numbers, use of the RRHO approximation, and improper determination of the lowest-energy conformer [203]. AutoTST, or any other method for calculating reaction rates more carefully, should be used in conjunction with sensitivity analyses that identify which reactions in the mechanism are important.

4.6.3 Sensitivity analysis

One type of sensitivity analysis was employed in this work, in which the reaction rates were individually varied to assess the effect on silane decomposition at differing initial silane concentrations. Once three reactions were identified, they were all modified at the same time to visualize a more pronounced effect. However, this sensitivity analysis was semi-manual and rather unsystematic. Further sensitivity analysis is available using Cantera, and can be used to see how each parameter in the reactor system affects a system output variable. In this way, it is possible to rank the reaction rate constants by their effect on the concentration of SiH4, Si2H6, or any other species of interest.
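The ranking idea described above can be sketched with a brute-force, finite-difference sensitivity calculation. The example below is only an illustration and does not use Cantera: the three-reaction mechanism, the rate constants, and all function names are hypothetical stand-ins chosen to keep the example self-contained, and they are not the RMG-generated model.

```python
import math

SPECIES = ("SiH4", "SiH2", "H2", "Si2H6")


def simulate(k, c0=(1.0, 0.0, 0.0, 0.0), t_end=1.0, n_steps=2000):
    """Integrate the toy mechanism with fixed-step RK4; return final concentrations."""

    def rhs(c):
        sih4, sih2, _h2, si2h6 = c
        r0 = k[0] * sih4          # SiH4 -> SiH2 + H2
        r1 = k[1] * sih2 * sih4   # SiH2 + SiH4 -> Si2H6
        r2 = k[2] * si2h6         # Si2H6 -> SiH2 + SiH4
        return (-r0 - r1 + r2, r0 - r1 + r2, r0, r1 - r2)

    def shifted(c, slope, h):
        return tuple(ci + h * si for ci, si in zip(c, slope))

    c = tuple(c0)
    dt = t_end / n_steps
    for _ in range(n_steps):
        s1 = rhs(c)
        s2 = rhs(shifted(c, s1, 0.5 * dt))
        s3 = rhs(shifted(c, s2, 0.5 * dt))
        s4 = rhs(shifted(c, s3, dt))
        c = tuple(ci + dt / 6.0 * (a + 2.0 * b + 2.0 * d + e)
                  for ci, a, b, d, e in zip(c, s1, s2, s3, s4))
    return dict(zip(SPECIES, c))


def normalized_sensitivities(k, species, delta=0.01):
    """Estimate d ln[species] / d ln k_j for each reaction j by perturbing k_j."""
    base = simulate(k)[species]
    coefficients = []
    for j in range(len(k)):
        perturbed = list(k)
        perturbed[j] *= 1.0 + delta
        value = simulate(perturbed)[species]
        coefficients.append(math.log(value / base) / math.log(1.0 + delta))
    return coefficients


rate_constants = [1.0, 50.0, 0.5]   # hypothetical values, arbitrary units
sens = normalized_sensitivities(rate_constants, "SiH4")
ranking = sorted(range(len(sens)), key=lambda j: abs(sens[j]), reverse=True)
for j in ranking:
    print(f"reaction {j}: d ln[SiH4] / d ln k = {sens[j]:+.3f}")
```

Each rate constant is perturbed by 1% in turn, so the cost scales with the number of reactions; for a full mechanism, the built-in sensitivity tools of a kinetics package would normally be preferred over this brute-force loop.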

4.6.4 Surface chemistry

Further extension of RMG to surface chemistry could allow for more rigorous comparison to CVD experiments. Many published experiments and mechanisms could not be used for validation, since the gas and surface phase analyses were coupled, unlike in the Onischuk et al. paper [162]. A collaborative project between Professor Richard West and Professor C. Franklin Goldsmith at Brown University involves extending RMG to heterogeneous catalysis. The framework used in that project, which allows for the representation of adsorbed species and heterogeneous reactions, is extensible to silicon surface chemistry [204].

Another avenue that can aid in developing reaction rules for silicon surface reactions is metadynamics. Metadynamics explores phase space through a bias potential, based on collective variables that describe the system. Such an approach can map the potential energy surface of a system without knowing beforehand the types of reactions that can occur [205]. Zheng and Pfaendtner demonstrated that the approach provides meaningful results for methanol oxidation [206].

Preliminary work in our group, in collaboration with Intel Corporation, also shows that metadynamics can find simple pathways of SiH4 decomposition on a silicon surface, modeled as a silicon hydride cluster. Clever choice of collective variables may yield additional reaction pathways involved in silane chemistry, allowing them to be added to RMG’s database.
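For reference, the bias potential in standard metadynamics is usually written as a sum of Gaussians deposited along the trajectory in collective-variable space. The form below is the common textbook expression rather than anything taken from this work; the symbols W (Gaussian height), σ_i (Gaussian width for collective variable s_i), and τ_G (deposition interval) are notation assumed here.

```latex
V_G(\mathbf{s}, t) = \sum_{\substack{t' = \tau_G, 2\tau_G, \ldots \\ t' < t}}
  W \exp\left( -\sum_{i=1}^{d} \frac{\left[ s_i - s_i(t') \right]^2}{2 \sigma_i^2} \right)
```

As the deposited Gaussians fill the regions already visited, the system is pushed over barriers into new regions, which is why the reaction types need not be specified in advance.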

5 CONCLUSION

This dissertation made progress in extending automatic mechanism generation to two domains not previously or commonly studied in this manner: liquid-phase oxidation of fuels and gas-phase thermal decomposition of silicon hydrides. These extensions were made within the framework of the open-source software package, the Reaction Mechanism Generator (RMG).

5.1 Liquid-phase fuel oxidation

To enable liquid-phase mechanism generation, the source code was updated with existing linear solvation energy relationships and group additivity methods for determining solute descriptors, in order to calculate solvation thermodynamics. In addition, more capabilities were added to deal with chemical species with unpaired electrons and lone pairs. Furthermore, group additivity values for nitrogen- and sulfur-containing species, for which RMG can now generate mechanisms, were added. The addition of a database of known solute descriptors now makes it faster to look up values on the fly rather than using group additivity every time.

Based on well-known correlations, gas-phase reaction rates in RMG were modified to account for diffusion effects on bimolecular reactions in the presence of a solvent. Intrinsic solvent effects were incorporated by developing a group contribution method to correct gas-phase rates for different solvents systematically, based on the molecular structure of the reaction. This method has been developed for two reaction families, intra-hydrogen migration and hydrogen abstraction, in eight categorically different solvents. The corrections have been applied to a detailed kinetic model for predicting the induction period of oxidation of biofuel blends. As a training set for the group contribution method, this work also contributes gas-phase geometries for several chemical species and transition states calculated at the M062X/MG3S level of theory, and energies calculated using a continuum solvation model in the eight solvents.

There are several possible avenues for future work on liquid-phase mechanism generation in RMG-Py. Thermodynamic data can be more accurately represented by performing high-level quantum chemistry calculations. These may be used to replace some of the estimation techniques and other assumptions made about species containing lone pairs and unpaired electrons, or to extend RMG to new applications. Calculations in different solvents, beyond the 26 currently in RMG’s database, can also lead to the development of new solvent descriptors to be used in the estimation of thermodynamic properties using the linear solvation energy relationships.

The calculation of kinetic solvent effects could be better automated. At this point, the easiest feature to implement is the on-the-fly calculation of ∆EA, as most of this code has been written. This would ensure that important reactions are included in the model, since a post-processing step only modifies the rates of reactions that were already included in the model based on their gas-phase rates. Furthermore, building the decision tree for determining kinetic solvent effects can also be automated using a software package such as scikit-learn. Some of this work has already been completed, but the automatically built tree should be compared to the existing tree for accuracy.

Finally, both the solvation thermodynamics and kinetics estimation methods would benefit from better benchmarking. Specifically, the values calculated from group contribution should be compared to those calculated directly from quantum chemistry, using continuum solvation. If computational power and time are available, benchmarking with higher-level methods such as coupled-cluster methods, and/or more explicit solvent representations, could greatly benefit the accuracy of detailed kinetic models and also provide much-needed data to the chemistry community.
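The diffusion correction for bimolecular rates summarized in this section can be illustrated with a short sketch. It assumes the commonly used Smoluchowski expression for the diffusion limit together with Stokes-Einstein diffusivities, which may differ in detail from RMG's implementation; the function names, molecular radii, solvent viscosity, and intrinsic rate constants are hypothetical example values.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
N_A = 6.02214076e23    # Avogadro constant, 1/mol


def stokes_einstein_diffusivity(radius_m, viscosity_pa_s, temperature):
    """Diffusivity D = kB*T / (6*pi*eta*r) of a sphere in a viscous solvent, m^2/s."""
    return K_B * temperature / (6.0 * math.pi * viscosity_pa_s * radius_m)


def diffusion_limit(radius_a, radius_b, viscosity_pa_s, temperature):
    """Smoluchowski limit k_diff = 4*pi*(rA + rB)*(DA + DB)*NA, in m^3/(mol s)."""
    d_sum = (stokes_einstein_diffusivity(radius_a, viscosity_pa_s, temperature)
             + stokes_einstein_diffusivity(radius_b, viscosity_pa_s, temperature))
    return 4.0 * math.pi * (radius_a + radius_b) * d_sum * N_A


def effective_rate(k_intrinsic, k_diff):
    """Combine intrinsic and diffusive steps in series: k_eff = k_int*k_diff/(k_int + k_diff)."""
    return k_intrinsic * k_diff / (k_intrinsic + k_diff)


# Hypothetical example: two solutes of 3 Angstrom radius in a solvent of 1 mPa s at 298 K.
k_diff = diffusion_limit(3.0e-10, 3.0e-10, 1.0e-3, 298.0)
for k_int in (1.0e3, 1.0e6, 1.0e9):   # intrinsic rate constants, m^3/(mol s)
    print(f"k_int = {k_int:.1e}, k_eff = {effective_rate(k_int, k_diff):.3e}")
```

Slow reactions are essentially unaffected, while reactions whose intrinsic rate exceeds the diffusion limit are capped near k_diff, which is the behavior the correction is meant to capture.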

5.2 Thermal decomposition of silicon hydrides

The second application of this dissertation, gas-phase silicon hydride decomposition, demonstrated how to add new classes of elements and chemical reactions to mechanism generation software. While microkinetic models had been developed for some silicon hydrides previously, this work specifically focused on the ability to include radical reaction types along with commonly used reaction families. Thermodynamic and kinetic data for silicon hydrides were added from the literature to RMG’s database. The work also contributes new, calculated data for hydrogen bond increment (HBI) values of silicon hydride species, as well as hydrogen abstraction reaction rates.

The new additions to RMG were demonstrated by building a model for SiH4 thermal decomposition. Comparison to experimental work shows a qualitative and semi-quantitative match, within the uncertainty in activation energy of the main SiH4 decomposition channel. Additionally, this work has added insight into which process conditions, as well as model generation options, affect the SiH4 decomposition profile, mainly corroborating early studies.

However, the SiH4 and Si2H6 profiles obtained when the initial SiH4 concentration is varied do not compare well with the experimental data. This motivates further research, including global sensitivity analysis to see which reactions affect this dependence. If important reactions are identified, the sources of their rates should be investigated, and poorly estimated rates should be carefully recalculated. To calculate these rates in a high-throughput way, the automated transition state theory calculator, AutoTST, could be utilized. In its current state, AutoTST has some sources of error that must be addressed, the two major ones being the calculation of symmetry number and improper determination of the lowest-energy conformer. Despite these flaws, efforts to extend AutoTST to the silylene insertion reaction family have been initiated. Alternatively, individual reaction rates could be calculated using quantum chemistry methods at a higher level of theory than DFT.

Another research direction is the further extension of RMG, either to surface chemistry or to new silicon precursors. Enabling surface chemistry is important for comparison to real chemical vapor deposition experiments. Lastly, extending RMG further to new precursors (such as those containing germanium or silicon) can shift these generated microkinetic models from merely confirming prior experiments and modeling to being predictive.

5.3 Summary

This dissertation reports new thermodynamic and chemical kinetic parameters, as well as new insights about two systems not previously investigated with automatic mechanism generation. Each of these chemistry applications demonstrates how machine learning can be used to predict parameters during the course of a simulation, where on-the-fly quantum calculations for every unknown parameter would be computationally infeasible. Importantly, the work also provides a framework for future developers to add capabilities for new phases or elements to mechanism generation software.

References

[1] B. R. Moser. Comparative oxidative stability of fatty acid alkyl esters by accelerated methods. JAOCS, J. Am. Oil Chem. Soc., 86(7):699–706, 2009.

[2] S. Blaine and P. E. Savage. Reaction Pathways in Lubricant Degradation. 2. n-Hexadecane Autoxidation. Ind. Eng. Chem. Res., 30:2185–2191, 1991.

[3] F. Garcia-Ochoa, A. Romero, and J. Querol. Modeling of the Thermal n-Octane Oxidation in the Liquid Phase. Ind. Eng. Chem. Res., 28(1):43–48, 1989.

[4] A. Ben Amara, A. Nicolle, M. Alves-Fortunato, and N. Jeuland. Toward Predictive Modeling of Petroleum and Biobased Fuel Stability: Kinetics of Methyl Oleate/n-Dodecane Autoxidation. Energy & Fuels, 27(10):6125–6133, October 2013.

[5] M. T. Swihart and S. L. Girshick. Thermochemistry and kinetics of silicon hydride cluster formation during thermal decomposition of silane. J. Phys. Chem. B, 103(1):64–76, 1999.

[6] H. W. Wong, X. Li, M. T. Swihart, and L. J. Broadbelt. Detailed kinetic modeling of silicon nanoparticle formation chemistry via automated mechanism generation. J. Phys. Chem. A, 108(46):10122–10132, 2004.

[7] W. H. Green, Jr, J. W. Allen, B. A. Buesser, R. W. Ashcraft, G. J. O. Beran, C. A. Class, C. Gao, C. F. Goldsmith, M. R. Harper, A. Jalan, M. Keceli, G. R. Magoon, D. M. Matheu, S. S. Merchant, J. D. Mo, S. Petway, S. Raman, S. Sharma, J. Song, Y. V. Suleimanov, K. M. Van Geem, J. Wen, R. H. West, A. Wong, H.-W. Wong, P. E. Yelvington, N. Yee, and J. Yu. RMG - Reaction Mechanism Generator, 2013. URL http://greengroup.github.io/RMG-Py/.

[8] C. W. Gao, J. W. Allen, W. H. Green, and R. H. West. Reaction Mechanism Generator: automatic construction of chemical kinetic mechanisms. Comput. Phys. Commun., 2016.

[9] F. Seyedzadeh Khanshan and R. H. West. Developing detailed kinetic models of syngas production from bio-oil gasification using Reaction Mechanism Generator (RMG). Fuel, 163:25–33, 2016.

[10] C. A. Class, M. Liu, G. Vandeputte, and W. H. Green. Automatic mechanism generation for pyrolysis of di-tert-butyl sulfide. Phys. Chem. Chem. Phys., 18:21651–21658, 2016.

[11] F. S. Khanshan. Automatic generation of detailed kinetic models for complex chemical systems. PhD thesis, Northeastern University, College of Engineering, Department of Chemical Engineering, http://hdl.handle.net/2047/D20213055, 2016.

[12] K. Prozument, Y. V. Suleimanov, B. Buesser, J. M. Oldham, W. H. Green, A. G. Suits, and R. W. Field. A Signature of Roaming Dynamics in the Thermal Decomposition of Ethyl Nitrite: Chirped-Pulse Rotational Spectroscopy and Kinetic Modeling. J. Phys. Chem. Lett., 5:3641–3648, 2014.

[13] I. Ugi, J. Bauer, and J. Brandt. New applications of computers in chemistry. Angew. Chemie Int. Ed. English, 18:111–123, 1979.

[14] Y. Yoneda. A Computer Program Package for the Analysis, Creation, and Estimation of Generalized Reactions-GRACE. I. Generation of Elementary Reaction Network in Radical Reactions-A/GRACE(I). Bull. Chem. Soc. Jpn., 52(1):8–14, 1979.

[15] F. D. Maio and P. Lignola. KING, a KInetic Network Generator. Chem. Eng. Sci., 47(9-11):2713–2718, 1992.

[16] E. Ranzi, T. Faravelli, P. Gaffuri, and A. Sogaro. Low-Temperature Combustion: Automatic Generation of Primary Oxidation Reactions and Lumping Procedures. Combust. Flame, 102:179–192, 1995.

[17] R. G. Susnow, A. M. Dean, W. H. Green, P. Peczak, and L. J. Broadbelt. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A, 101:3731–3740, 1997.

[18] V. Warth, F. Battin-Leclerc, R. Fournet, P. Glaude, G. Come, and G. Scacchi. Computer based generation of reaction mechanisms for gas-phase oxidation. Comput. Chem., 24(5):541–60, July 2000.

[19] A. Ratkiewicz and T. N. Truong. Application of chemical graph theory for automated mechanism generation. J. Chem. Inf. Comput. Sci., 43:36–44, 2003.

[20] S. Rangarajan, A. Bhan, and P. Daoutidis. Language-oriented rule-based reaction network generation and analysis: Description of RING. Comput. Chem. Eng., 45: 114–123, October 2012.

[21] R. Van de Vijver, N. M. Vandewiele, P. L. Bhoorasingh, B. L. Slakman, F. S. Khanshan, H. H. Carstensen, M. F. Reyniers, G. B. Marin, R. H. West, and K. M. Van Geem. Automatic mechanism and kinetic model generation for gas- and solution-phase processes: A perspective on best practices, recent advances, and future challenges. Int. J. Chem. Kinet., 47(4):199–231, 2015.

[22] J. W. Allen, R. W. Ashcraft, G. J. Beran, B. A. Buesser, C. A. Class, C. Gao, C. F. Goldsmith, M. R. Harper, A. Jalan, M. Keceli, G. R. Magoon, D. M. Matheu, S. S. Merchant, J. D. Mo, S. Petway, S. Ruman, S. Sharma, K. M. Van Geem, J. Song, Y. Suleymanov, N. Vandewiele, J. Wen, R. H. West, A. Wong, H.-W. Wong, N. W.-W. Yee, P. E. Yelvington, J. Yu, and W. H. Green. RMG (Reaction Mechanism Generator) version 4.0, 2013. URL http://rmg.sourceforge.net/.

[23] J. Song. Building robust chemical reaction mechanisms: next generation of automatic model construction software. PhD thesis, Massachusetts Inst. Technol. Dept. Chem. Eng., 2004. URL http://dspace.mit.edu/handle/1721.1/30058.

[24] G. P. Smith, D. M. Golden, M. Frenklach, N. W. Moriarty, B. Eiteneer, M. Goldenberg, C. T. Bowman, R. K. Hanson, S. Song, W. C. Gardiner, Jr., V. V. Lissianski, and Z. Qin. GRI-Mech 3.0. URL http://www.me.berkeley.edu/gri_mech/.

[25] J. W. Allen, C. F. Goldsmith, and W. H. Green. Automatic estimation of pressure-dependent rate coefficients. Phys. Chem. Chem. Phys., 14(3):1131–55, January 2012.

[26] L. J. Broadbelt and J. Pfaendtner. Lexicography of kinetic modeling of complex reaction networks. AIChE J., 51(8):2112–2121, August 2005.

[27] S. W. Benson and J. H. Buss. Additivity Rules for the Estimation of Molecular Properties. Thermodynamic Properties. J. Chem. Phys., 29(3):546, 1958.

[28] T. H. Lay, J. W. Bozzelli, A. M. Dean, and E. R. Ritter. Hydrogen atom bond increments for calculation of thermodynamic properties of hydrocarbon radical species. J. Phys. Chem., 99(39):14514–14527, 1995.

[29] G. R. Magoon and W. H. Green. Design and implementation of a next-generation software interface for on-the-fly quantum and force field calculations in automated reaction mechanism generation. Comput. Chem. Eng., 52:35–45, May 2013.

[30] R. Sumathi, H. H. Carstensen, and W. H. Green. Reaction rate prediction via group additivity part 1: H abstraction from alkanes by H and CH3. J. Phys. Chem. A, 105(28):6910–6925, 2001.

[31] M. Evans and M. Polanyi. Further considerations on the thermodynamics of chemical equilibria and reaction rates. Trans. Faraday Soc., 32:1333–1360, 1936.

[32] H. Eyring, H. Gershinowitz, and C. E. Sun. The Absolute Rate of Homogeneous Atomic Reactions. J. Chem. Phys., 3(12):786, 1935.

[33] J. W. Allen and W. H. Green. CanTherm: Open-source software for thermodynamics and kinetics. Included in: Reaction Mechanism Generator, v2.0.0, 2016. URL http://reactionmechanismgenerator.github.io.

[34] A. D. Isaacson, D. G. Truhlar, S. N. Rai, R. Steckler, G. C. Hancock, B. C. Garrett, and M. J. Redmon. Polyrate: A general computer program for variational transition state theory and semiclassical tunneling calculations of chemical reaction rates. Comput. Phys. Commun., 47.

[35] S. J. Klippenstein, A. F. Wagner, R. C. Dunbar, D. M. Wardlaw, S. H. Robertson, and J. A. Miller. Variflex: Version 2.02m, 2010.

[36] J. R. Barker, N. F. Ortiz, J. M. Preses, L. L. Lohr, A. Maranzana, P. J. Stimac, T. L. Nguyen, and T. J. D. Kumar. MultiWell-2012.1 software, 2012.

[37] P. M. Zimmerman. Automated discovery of chemically reasonable elementary reaction steps. J. Comput. Chem., 34(16):1385–92, June 2013.

[38] J. Zádor and H. Najm. KinBot: An automated code for exploring reaction pathways in the gas phase. Sandia National Laboratories Technical Report SAND2012-8095, 2012.

[39] P. Bhoorasingh and R. West. Transition state geometry prediction using molecular group contributions. Phys. Chem. Chem. Phys., 17(48):32173–32182, 2015.

[40] P. Bhoorasingh, B. Slakman, F. S. Khanshan, J. Cain, and R. West. Kinetic data for manuscript describing the AutoTST algorithm for automated Transition State Theory calculations of chemical reaction rates. figshare, doi:10.6084/m9.figshare.4234160, December 2016.

[41] C. Kingsford and S. L. Salzberg. What are decision trees? Nat. Biotechnol., 26(9):1011–3, 2008.

[42] L. Han, Y. Wang, and S. H. Bryant. Developing and validating predictive decision tree models from mining chemical structural fingerprints and high-throughput screening data in PubChem. BMC Bioinformatics, 9(1):401, 2008.

[43] J. R. Quinlan. C4.5: Programs for Machine Learning. Elsevier, 2014.

[44] D. Thornley, M. Zverev, and S. Petridis. Machine learned regression for abductive DNA sequencing. Proc. 6th Int. Conf. Mach. Learn. Appl. (ICMLA 2007), pages 254–259, 2007.

[45] Y. Hu and H. Cheng. Application of stochastic models in identification and apportionment of heavy metal pollution sources in the surface soils of a large-scale region. Environ. Sci. Technol., 47(8):3752–3760, 2013.

[46] V. F. McNeill, J. L. Woo, D. D. Kim, A. N. Schwier, N. J. Wannell, A. J. Sumner, and J. M. Barakat. Aqueous-phase secondary organic aerosol and organosulfate formation in atmospheric aerosols: a modeling study. Environ. Sci. Technol., 46:8075–8081, August 2012.

[47] Y. B. Lim and B. J. Turpin. Laboratory evidence of organic peroxide and peroxyhemiacetal formation in the aqueous phase and implications for aqueous OH. Atmos. Chem. Phys., 15(22):12867–12877, 2015.

[48] K. Chatelain, A. Nicolle, A. Ben Amara, L. Catoire, and L. Starck. Wide Range Experimental and Kinetic Modeling Study of Chain Length Impact on n-Alkanes Autoxidation. Energy & Fuels, 30:1294–1303, 2016.

[49] V. Thavasi, R. P. A. Bettens, and L. P. Leong. Temperature and solvent effects on radical scavenging ability of phenols. J. Phys. Chem. A, 113(13):3068–3077, April 2009.

[50] A. Jalan, R. H. West, and W. H. Green. An extensible framework for capturing solvent effects in computer generated kinetic models. J. Phys. Chem. B, 117(10):2955–70, March 2013.

[51] H. M. Senn and W. Thiel. QM/MM methods for biomolecular systems. Angew. Chem. Int. Ed. Engl., 48(7):1198–229, January 2009.

[52] J. Tomasi, B. Mennucci, and R. Cammi. Quantum mechanical continuum solvation models. Chem. Rev., 105(8):2999–3093, August 2005.

[53] A. Jalan, R. W. Ashcraft, R. H. West, and W. H. Green. Predicting solvation energies for kinetic modeling. Annu. Reports Sect. “C” (Physical Chem.), 106:211, 2010.

[54] M. Kamlet and R. Taft. The Solvatochromic Comparison Method. I. The beta-Scale of Solvent Hydrogen-Bond Acceptor (HBA) Basicities. J. Am. Chem. Soc., 98(2):377, 1976.

[55] M. H. Abraham, A. Ibrahim, and A. M. Zissimos. Determination of sets of solute descriptors from chromatographic measurements. J. Chromatogr. A, 1037(1-2):29–47, May 2004.

[56] M. Abraham. Scales of solute hydrogen-bonding: their construction and application to physicochemical and biochemical processes. Chem. Soc. Rev., 96(5), 1993.

[57] C. Mintz, M. Clark, W. E. Acree, and M. H. Abraham. Enthalpy of solvation correlations for gaseous solutes dissolved in water and in 1-octanol based on the Abraham model. J. Chem. Inf. Model., 47(1):115–21, January 2007.

[58] J. Platts and D. Butina. Estimation of molecular linear free energy relation descriptors using a group contribution approach. J. Chem. Inf. Comput. Sci., 39:835–845, 1999.

[59] C. Mintz, M. Clark, K. Burton, W. E. Acree, and M. H. Abraham. Enthalpy of Solvation Corrections for Gaseous Solutes Dissolved in Benzene and in Alkane Solvents Based on the Abraham Model. QSAR Comb. Sci., 26(8):881–888, 2007.

[60] R. A. Pierotti. The solubility of gases in liquids. J. Phys. Chem., 67(9):1840–1845, 1963.

[61] J. Kongsted, P. Söderhjelm, and U. Ryde. How accurate are continuum solvation models for drug-like molecules? J. Comput. Aided. Mol. Des., 23(7):395–409, July 2009.

[62] C. Kelly, C. Cramer, and D. Truhlar. SM6: a density functional theory continuum solvation model for calculating aqueous solvation free energies of neutrals, ions, and solute-water clusters. J. Chem. Theory . . . , pages 1133–1152, 2005.

[63] A. Ahmed and S. I. Sandler. Hydration free energies of multifunctional nitroaromatic compounds. J. Chem. Theory Comput., 9(6):2774–2785, 2013.

[64] W. H. Green, Jr, J. W. Allen, B. A. Buesser, R. W. Ashcraft, G. J. O. Beran, C. A. Class, C. Gao, C. F. Goldsmith, M. R. Harper, A. Jalan, M. Keceli, G. R. Magoon, D. M. Matheu, S. S. Merchant, J. D. Mo, S. Petway, S. Raman, S. Sharma, J. Song, Y. V. Suleimanov, K. M. Van Geem, J. Wen, R. H. West, A. Wong, H.-W. Wong, P. E. Yelvington, N. Yee, and J. Yu. RMG-database project. URL http://github.com/GreenGroup/RMG-database.

[65] M. Abraham. Hydrogen Bonding. Part 34. The Factors that Influence the Solubility of Gases and Vapours in Water at 298 K, and a New Method for its Determination. J. Chem. Soc., Perkin Trans., 1994.

[66] ACD/Labs. ACD/I-Lab, 2010-2013. URL http://ilab.acdlabs.com/iLab2/.

[67] A. V. Marenich, C. P. Kelly, J. D. Thompson, G. D. Hawkins, C. C. Chambers, D. J. Giesen, P. Winget, C. J. Cramer, and D. G. Truhlar. Minnesota Solvation Database-version 2012, 2012. URL http://comp.chem.umn.edu/mnsol/.

[68] D. Tromans. Temperature and pressure dependent solubility of oxygen in water: a thermodynamic analysis. Hydrometallurgy, 48(3):327–342, 1998.

[69] R. F. Prini and R. Crovetto. Evaluation of Data on Solubility of Simple Apolar Gases in Light and Heavy Water at High Temperature, 1989.

[70] H. W. Wong, J. C. A. Nieto, M. T. Swihart, and L. J. Broadbelt. Thermochemistry of silicon-hydrogen compounds generalized from quantum chemical calculations. J. Phys. Chem. A, 108(5):874–897, 2004.

[71] J. A. Platts, M. H. Abraham, Y. H. Zhao, A. Hersey, L. Ijaz, and D. Butina. Correlation and prediction of a large blood-brain distribution data set - An LFER study. Eur. J. Med. Chem., 36(9):719–730, 2001.

[72] H. Struebing, Z. Ganase, P. G. Karamertzanis, E. Siougkrou, P. Haycock, P. M. Piccione, A. Armstrong, A. Galindo, and C. S. Adjiman. Computer-aided molecular design of solvents for accelerated reaction kinetics. Nat. Chem., September 2013.

[73] F. F. Crim. Molecular reaction dynamics across the phases: similarities and differences. Faraday Discuss., 157:9, 2012.

[74] F. Briers, D. Chapman, and E. Walters. The Influence of the Intensity of Illumination on the Velocity of Photochemical Changes. The Determination of the Mean Life of a Hypothetical Catalyst. J. Chem. Soc., 129:562–569, 1926.

[75] F. Briers and D. Chapman. The Influence of the Intensity of Illumination on the Velocity of the Photochemical Union of Bromine and Hydrogen, and the Determination of the Mean Life of a Postulated Catalyst. J. Chem. Soc., pages 1802–1811, 1928.

[76] E. T. Denisov. Liquid-Phase Reaction Rate Constants. Springer US, 2012. ISBN 9781468483000. URL https://books.google.com/books?id=j8LeBwAAQBAJ.

[77] H. W. Melville. The Photochemical Polymerization of Methyl Methacrylate Vapour. Proc. R. Soc. A Math. Phys. Eng. Sci., 163:511–542, December 1937.

[78] C. Swain and P. Bartlett. Rate Constants of the Steps in Addition Polymerization. II. Use of the Rotating-Sector Method on Liquid Vinyl Acetate. J. Am. Chem. Soc., 68(11):2381–2386, 1946.

[79] A. Shepp. Rate of Recombination of Radicals. I. A General Sector Theory; A Correction to the Methyl Radical Recombination Rate. J. Chem. Phys., 24(5):939–943, 1956.

[80] D. Griller and K. U. Ingold. Free-Radical Clocks. Acc. Chem. Res., 13:317–323, 1980.

[81] D. Geske and A. Maki. Electrochemical Generation of Free Radicals and Their Study by Electron Spin Resonance Spectroscopy; the Nitrobenzene Anion Radical. J. Am. Chem. Soc., 82(9):2671–2676, 1960.

[82] B. Roschek, K. A. Tallman, C. L. Rector, J. G. Gillmore, D. A. Pratt, C. Punta, and N. A. Porter. Peroxyl radical clocks. J. Org. Chem., 71(9):3527–32, April 2006.

[83] M. Jha and D. A. Pratt. Kinetic solvent effects on peroxyl radical reactions. Chem. Commun., pages 1252–1254, March 2008.

[84] P. Hohenberg and W. Kohn. Inhomogeneous Electron Gas. Phys. Rev., 136(3B):B864–B871, 1964.

[85] W. Kohn and L. J. Sham. Self-consistent equations including exchange and correlation effects. Phys. Rev., 140(4A):A1133–A1138, 1965.

[86] X. Xu, I. M. Alecu, and D. G. Truhlar. How Well Can Modern Density Functionals Predict Internuclear Distances at Transition States? J. Chem. Theory Comput., 7:1667–1676, 2011.

[87] A. J. Cohen, P. Mori-Sánchez, and W. Yang. Insights into current limitations of density functional theory. Science, 321:792–794, August 2008.

[88] R. Arnaud and N. Bugaud. Role of polar and enthalpic effects in the addition of methyl radical to substituted alkenes: a density functional study including solvent effects. J. Am. . . . , 7863(13):5733–5740, 1998.

[89] A. Berkessel and J. A. Adrio. Dramatic acceleration of olefin epoxidation in fluorinated alcohols: activation of hydrogen peroxide by multiple H-bond networks. J. Am. Chem. Soc., 128(41):13412–13420, October 2006.

[90] J. Herbert. Dielectric continuum solvation models, 2013. URL https://chemistry.osu.edu/~herbert/projects/PCM.html.

[91] R. S. Cataliotti, F. Aliotta, and R. Ponterio. Silver nanoparticles behave as hydrophobic solutes towards the liquid water structure in the interaction shell. A Raman study in the O-H stretching region. Phys. Chem. Chem. Phys., 11(47):11258–11263, December 2009.

[92] J. R. Pliego. Shells theory of solvation and the long-range Born correction. Theor. Chem. Acc., 128(3):275–283, November 2011.

[93] B. Mennucci and R. Cammi, editors. Continuum Solvation Models in Chemical Physics: From Theory to Applications. John Wiley and Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, 2007. ISBN 9780470029381.

[94] A. V. Marenich, C. J. Cramer, and D. G. Truhlar. Universal solvation model based on solute electron density and on a continuum model of the solvent defined by the bulk dielectric constant and atomic surface tensions. J. Phys. Chem. B, 113(18):6378–96, May 2009.

[95] M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, B. Mennucci, G. A. Petersson, H. Nakatsuji, M. Caricato, X. Li, H. P. Hratchian, A. F. Izmaylov, J. Bloino, G. Zheng, J. L. Sonnenberg, M. Had, and D. J. Fox. Gaussian 09, Revision D.01, 2009.

[96] S. Miertuš, E. Scrocco, and J. Tomasi. Electrostatic interaction of a solute with a continuum. A direct utilization of ab initio molecular potentials for the prevision of solvent effects. Chem. Phys., 55:117–129, 1981.

[97] A. Klamt and G. Schüürmann. COSMO: a new approach to dielectric screening in solvents with explicit expressions for the screening energy and its gradient. J. Chem. Soc. Perkin Trans., 2, 1993.

[98] K. Baldridge and A. Klamt. First principles implementation of solvent effects without outlying charge error. J. Chem. Phys., 106(16):6622, 1997.

[99] A. Pomogaeva and D. M. Chipman. Hydration energy from a composite method for implicit representation of solvent. J. Chem. Theory Comput., 10(1):211–219, 2014.

[100] N. M. Silva, P. Deglmann, and J. R. Pliego. CMIRS Solvation Model for Methanol: Parametrization, Testing, and Comparison with SMD, SM8, and COSMO-RS. J. Phys. Chem. B, 120(49):12660–12668, 2016.

[101] L. Valgimigli and J. Banks. Kinetic Solvent Effects on Hydroxylic Hydrogen Atom Abstractions Are Independent of the Nature of the Abstracting Radical. Two Extreme Tests Using Vitamin E and Phenol. J. Am. Chem. Soc., 117(9):9966–9971, 1995.

[102] P. Das and M. Encinas. Reactions of tert-Butoxy Radicals with Phenols. Comparison with the Reactions of Carbonyl Triplets. J. Am. Chem. Soc., 103(14):4162–4166, 1981.

[103] L. Valgimigli, K. U. Ingold, and J. Lusztyk. Antioxidant Activities of Vitamin E Analogues in Water and a Kamlet-Taft beta-Value for Water. J. Am. Chem. Soc., 118(15):3545–3549, 1996.

[104] Z. Alfassi, R. Huie, and P. Neta. Solvent Effects on the Rate Constants for Reaction of Trichloromethylperoxyl Radicals with Organic Reductants. J. Phys. Chem., 97:7253–7257, 1993.

[105] M. Foti, C. Daquino, and C. Geraci. Electron-Transfer Reaction of Cinnamic Acids and Their Methyl Esters with the DPPH Radical in Alcoholic Solutions. J. Org. Chem., 69(14):2309–2314, 2004.

[106] D. Snelgrove and J. Lusztyk. Kinetic Solvent Effects on Hydrogen-Atom Abstractions: Reliable, Quantitative Predictions via a Single Empirical Equation. J. Am. Chem. Soc., 123(3):469–477, 2001.

[107] G. Litwinienko and K. U. Ingold. Abnormal solvent effects on hydrogen atom abstractions. 1. The reactions of phenols with 2,2-diphenyl-1-picrylhydrazyl (dpph*) in alcohols. J. Org. Chem., 68(9):3433–8, May 2003.

[108] G. Litwinienko and K. U. Ingold. Abnormal solvent effects on hydrogen atom abstraction. 2. Resolution of the curcumin antioxidant controversy. The role of sequential proton loss electron transfer. J. Org. Chem., 69(18):5888–96, September 2004.

[109] G. Litwinienko and K. Ingold. Abnormal Solvent Effects on Hydrogen Atom Abstraction. 3. Novel Kinetics in Sequential Proton Loss Electron Transfer Chemistry. J. Org. Chem., 70(22):8982–8990, 2005.

[110] M. Nielsen and K. Ingold. Kinetic Solvent Effects on Proton and Hydrogen Atom Transfers from Phenols. Similarities and Differences. J. Am. Chem. Soc., 128(4):1172–1182, 2006.

[111] J. J. Warren and J. M. Mayer. Tuning of the thermochemical and kinetic properties of ascorbate by its local environment: solution chemistry and biochemical implications. J. Am. Chem. Soc., 132(22):7784–93, June 2010.

[112] I. Sajenko, V. Pilepić, C. J. Brala, and S. Uršić. Solvent dependence of the kinetic isotope effect in the reaction of ascorbate with the 2,2,6,6-tetramethylpiperidine-1-oxyl radical: tunnelling in a small molecule reaction. J. Phys. Chem. A, 114(10):3423–3430, March 2010.

[113] A. L. Koner, U. Pischel, and W. M. Nau. Kinetic solvent effects on hydrogen abstraction reactions. Org. Lett., 9(15):2899–902, July 2007.

[114] J. Lalevée, X. Allonas, and J. Fouassier. Reactivity of Carbon-Centered Radicals toward Acrylate Double Bonds: Relative Contribution of Polar vs. Enthalpy Effects. J. Phys. Chem. A, pages 4326–4334, 2004.

[115] J. Lalevée, X. Allonas, J. P. Fouassier, D. Rinaldi, M. F. Ruiz Lopez, and J. L. Rivail. Solvent effect on the radical addition reaction to double bond: Experimental and quantum chemical investigations. Chem. Phys. Lett., 415(4-6):202–205, November 2005.

[116] J. Lalevée, X. Allonas, and J.-P. Fouassier. Addition of carbon-centered radicals to double bonds: influence of the alkene structure. J. Org. Chem., 70(3):814–9, February 2005.

[117] M. W. Wong and L. Radom. Radical Addition to Alkenes: Further Assessment of Theoretical Procedures. J. Phys. Chem. A, 102(12):2237–2245, March 1998.

[118] H. Fischer and L. Radom. Factors Controlling the Addition of Carbon-Centered Radicals to Alkenes- An Experimental and Theoretical Perspective. Angew. Chemie Int. Ed., 2001.

[119] G. P. F. Wood, M. S. Gordon, L. Radom, and D. M. Smith. Nature of Glycine and Its alpha-Carbon Radical in Aqueous Solution: A Theoretical Investigation. J. Chem. Theory Comput., 4:1788–1794, 2008.

[120] B. Chan, R. J. O’Reilly, C. J. Easton, and L. Radom. Reactivities of Amino Acid Derivatives Toward Hydrogen Abstraction by Cl• and OH•. J. Org. Chem., 77(21):9807–9812, November 2012.

[121] R. S. Mulliken. Electronic Population Analysis on LCAO-MO Molecular Wave Functions. I. J. Chem. Phys., 23(10):1833, 1955.

[122] A. García, D. Domínguez, and A. Navarro-Vázquez. Addition of carbon centered radicals to methyl 3-(methylamino)acrylate: The regioselectivity of radical addition to enamino esters. Comput. Theor. Chem., 979:17–21, January 2012.

[123] D. V. Avila, U. Ingold, and J. Lusztyk. Solvent Effects on the Competitive beta-Scission and Hydrogen Atom Abstraction Reactions of the Cumyloxyl Radical. Resolution of a Long-standing Problem. J. Am. Chem. Soc., 115(2):466–470, 1993.

[124] M. Weber and H. Fischer. Absolute Rate Constants for the beta-Scission and Hydrogen Abstraction Reactions of the tert-Butoxyl Radical and for Several Radical Rearrangements: Evaluating Delayed Radical Formations by Time-Resolved Electron Spin Resonance. J. Am. Chem. Soc., 121:7381–7388, 1999.

[125] E. Baciocchi, M. Bietti, M. Salamone, and S. Steenken. Spectral properties and absolute rate constants for beta-scission of ring-substituted cumyloxyl radicals. A laser flash photolysis study. J. Org. Chem., 67(7):2266–70, April 2002.

[126] M. Newcomb, P. Daublain, and J. H. Horner. p-Nitrobenzenesulfenate esters as precursors for laser flash photolysis studies of alkyl radicals. J. Org. Chem., 67(24):8669–71, November 2002.

[127] M. Bietti, G. Gente, and M. Salamone. Structural effects on the beta-scission reaction of tertiary arylcarbinyloxyl radicals. The role of alpha-cyclopropyl and alpha-cyclobutyl groups. J. Org. Chem., 70(17):6820–6, August 2005.

[128] C. Reichardt. Empirical Parameters of the Polarity of Solvents. Angew. Chemie Int. Ed. English, 4(1):29, 1965.

[129] M. F. Ruiz-Lopez, X. Assfeld, J. I. Garcia, J. A. Mayoral, and L. Salvatella. Solvent effects on the mechanism and selectivities of asymmetric Diels-Alder reactions. J. Am. Chem. Soc., 115(19):8780–8787, sep 1993.

[130] R. Bini, C. Chiappe, and V. Mestre. A rationalization of the solvent effect on the Diels-Alder reaction in ionic liquids using multiparameter linear solvation energy relationships. Org. Biomol. Chem., 6:2522–2529, 2008.

[131] R. Breslow and T. Guo. Diels-Alder Reactions in Nonaqueous Polar Solvents. Kinetic Effects of Chaotropic and Antichaotropic Agents and of beta-Cyclodextrin. J. Am. Chem. Soc., 110(17):5613–5617, 1988.

[132] M. E. Sheehan and P. N. Sharratt. Molecular dynamics methodology for the study of the solvent effects on a concentrated Diels-Alder reaction and the separation of the post-reaction mixture. Comput. Chem. Eng., 22:S27–S33, mar 1998.

[133] J. Soto-Delgado, R. A. Tapia, and J. Torras. Multiscale Treatment for the Molecular Mechanism of a Diels-Alder Reaction in Solution: A QM/MM-MD Study. J. Chem. Theory Comput., 12(10):4735–4742, 2016.

[134] V. D. Kiselev, D. A. Kornilov, I. A. Sedov, and A. I. Konovalov. Solvent Influence on the Diels-Alder Reaction Rates of 9-(Hydroxymethyl)anthracene and 9,10-Bis(hydroxymethyl)anthracene with Two Maleimides. Int. J. Chem. Kinet., 49(1):61–68, 2017.

[135] Organic chemistry forum: p-aminophenol n-acylation mechanism question, 2011. URL http://www.chemicalforums.com/index.php?topic=51784.0.

[136] S. Xu, I. Held, B. Kempf, H. Mayr, W. Steglich, and H. Zipse. The DMAP-catalyzed acetylation of alcohols–a mechanistic study (DMAP = 4-(dimethylamino)pyridine). Chemistry, 11(16):4751–4757, aug 2005.

[137] A. Hassner, L. Krepski, and V. Alexanian. Aminopyridines as acylation catalysts for tertiary alcohols. Tetrahedron, 34:2069–2076, 1978.

[138] A. Berkessel and J. Adrio. Kinetic Studies of Olefin Epoxidation with Hydrogen Peroxide in 1,1,1,3,3,3-Hexafluoro-2-propanol Reveal a Crucial Catalytic Role for Solvent Clusters. Adv. Synth. Catal., 346:275–280, 2004.

[139] A. Berkessel, J. A. Adrio, D. Hüttenhain, and J. M. Neudörfl. Unveiling the "booster effect" of fluorinated alcohol solvents: aggregation-induced conformational changes and cooperatively enhanced H-bonding. J. Am. Chem. Soc., 128(26):8421–8426, jul 2006.

[140] S. D. Visser, J. Kaneti, R. Neumann, and S. Shaik. Fluorinated Alcohols Enable Olefin Epoxidation by H2O2: Template Catalysts. J. Org. Chem., 68(7):2903–2912, 2003.

[141] B. Steenackers, A. Neirinckx, L. DeCooman, I. Hermans, and D. DeVos. The strained sesquiterpene beta-caryophyllene as a probe for the solvent-assisted epoxidation mechanism. ChemPhysChem, 15:966–973, 2014.

[142] G. Almerindo and J. Pliego. Ab initio investigation of the kinetics and mechanism of the neutral hydrolysis of formamide in aqueous solution. J. Braz. Chem. Soc., 18(4):696–702, 2007.

[143] M. Bietti and M. Salamone. Solvent Effects on the O-Neophyl Rearrangement of 1,1-Diarylalkoxyl Radicals. A Laser Flash Photolysis Study. J. Org. Chem., 70(25): 10603–10606, 2005.

[144] S. Rice. Diffusion-Controlled Reactions in Solution. In C. Bamford and C. Tipper, editors, Compr. Chem. Kinet., chapter 2, pages 3–45. Elsevier Pub. Co., Amsterdam, New York, 1985.

[145] J. McGowan. Molecular Volumes and structural chemistry. Recl. des Trav. Chim. des Pays-Bas, 75(2):193–208, 1956.

[146] R. L. Rowley, W. V. Wilding, J. L. Oscarson, Y. Yang, N. A. Zundel, T. E. Daubert, and R. P. Danner. DIPPR Data Compilation of Pure Chemical Properties. Design Institute for Physical Properties, AIChE, New York, NY, 2009.

[147] Y. Zhao and D. G. Truhlar. A new local density functional for main-group thermochemistry, transition metal bonding, thermochemical kinetics, and noncovalent interactions. J. Chem. Phys., 125(19):194101, 2006.

[148] Y. Zhao and D. G. Truhlar. The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals. Theor. Chem. Acc., 120(1-3):215–241, 2008.

[149] R. Krishnan, J. S. Binkley, R. Seeger, and J. A. Pople. Self-consistent molecular orbital methods. XX. A basis set for correlated wave functions. J. Chem. Phys., 72(1):650–654, 1980.

[150] T. Clark, J. Chandrasekhar, G. W. Spitznagel, and P. V. R. Schleyer. Efficient diffuse function-augmented basis sets for anion calculations. III. The 3-21+G basis set for first-row elements, Li-F. J. Comput. Chem., 4(3):294–301, 1983.

[151] M. J. Frisch, J. A. Pople, and J. S. Binkley. Self-consistent molecular orbital methods 25. Supplementary functions for Gaussian basis sets. J. Chem. Phys., 80(7):3265, 1984.

[152] L. A. Curtiss, K. Raghavachari, P. C. Redfern, V. Rassolov, and J. A. Pople. Gaussian-3 (G3) theory for molecules containing first and second-row atoms. J. Chem. Phys., 109(18):7764–7776, 1998.

[153] P. L. Fast, M. L. Sánchez, and D. G. Truhlar. Multi-coefficient Gaussian-3 method for calculating potential energy surfaces. Chem. Phys. Lett., 306(5-6):407–410, jun 1999.

[154] B. J. Lynch, Y. Zhao, and D. G. Truhlar. Effectiveness of diffuse basis functions for calculating relative energies by density functional theory. J. Phys. Chem. A, 107(9):1384–1388, 2003.

[155] R. Schmid. Handbook of Solvents. In G. Wypych, editor, Handbook of Solvents, chapter 13, pages 737–846. William Andrew, ChemTec, 2001.

[156] B. L. Slakman and R. H. West. Intrinsic solvation kinetics for automatic mechanism generation. In Gordon Research Conference on Atomic and Molecular Interactions, Easton, MA, July 2014.

[157] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[158] N. O’Boyle, A. Tenderholt, and K. Langner. cclib: a library for package-independent computational chemistry algorithms. J. Comp. Chem., 29.

[159] J. Schreiber, Z. L. Wescoe, R. Abu-Shumays, J. T. Vivian, B. Baatar, K. Karplus, and M. Akeson. Error rates for nanopore discrimination among cytosine, methylcytosine, and hydroxymethylcytosine along individual DNA strands. Proc. Natl. Acad. Sci. U. S. A., 110(47):18910–5, 2013.

[160] S. Riniker, N. Fechner, and G. A. Landrum. Heterogeneous Classifier Fusion for Ligand-Based Virtual Screening: Or, How Decision Making by Committee Can Be a Good Thing. J. Chem. Inf. Model., 53(Ml):2829–2836, 2013.

[161] T. Leuchtefeld, A. Maertens, J. M. McKim, T. Hartung, A. Kleensang, and V. Sá-Rocha. Probabilistic hazard assessment for skin sensitization potency by dose-response modeling using feature elimination instead of quantitative structure-activity relationships. J Appl Toxicol., 35(11):1261–1371, 2015.

[162] A. A. Onischuk, V. P. Strunin, M. A. Ushakova, and V. N. Panfilov. Studying of silane thermal decomposition mechanism. Int. J. Chem. Kinet., 30(2):99–110, 1998.

[163] J. H. Purnell and R. Walsh. The Pyrolysis of Monosilane. Proc. R. Soc. A Math. Phys. Eng. Sci., 293(1435):543–561, 1966.

[164] A. Yuuki, Y. Matsui, and K. Tachibana. A Numerical Study on Gaseous Reactions in Silane Pyrolysis. Jpn. J. Appl. Phys., 26:747–754, 1987.

[165] M. E. Coltrin, R. J. Kee, and G. H. Evans. A Mathematical Model of the Fluid Mechanics and Gas-Phase Chemistry in a Rotating Disk Chemical Vapor Deposition Reactor. J. Electrochem. Soc., 136(3), 1989.

[166] C. J. Giunta, R. J. McCurdy, J. D. Chapple-Sokol, and R. G. Gordon. Gas–phase kinetics in the atmospheric pressure chemical vapor deposition of silicon from silane and disilane. J. Appl. Phys., 67(2):1062–1075, 1990.

[167] P. Ho, M. E. Coltrin, and W. G. Breiland. Laser-induced fluorescence measurements and kinetic analysis of Si atom formation in a rotating disk chemical vapor deposition reactor. J. Phys. Chem., 98(40):10138–10147, 1994.

[168] M. Frenklach, L. Ting, H. Wang, and M. J. Rabinowitz. Silicon particle formation in pyrolysis of silane and disilane. Isr. J. Chem., 36(3):293–303, 1996.

[169] A. Dollet, S. De Persis, and F. Teyssandier. Rate constants of reactions involving SiH4 as an association product from quantum Rice-Ramsperger-Kassel calculations. Phys. Chem. Chem. Phys., 6(6):1203–1212, 2004.

[170] A. Dollet and S. de Persis. Pressure-dependent rate coefficients of chemical reactions involving Si2H4 isomerization from QRRK calculations. J. Anal. Appl. Pyrolysis, 80(2):460–470, 2007.

[171] C. Newman, H. O’Neal, M. Ring, F. Leska, and N. Shipley. Kinetics and Mechanism of the Silane Decomposition. Int. J. Chem. Kinet., 11:1167–1182, 1979.

[172] M. Bowrey and J. H. Purnell. The Pyrolysis of Disilane and Rate Constants of Silene Insertion Reactions. Proc. R. Soc. A Math. Phys. Eng. Sci., 321(1546):341–359, 1971.

[173] R. Becerra and R. Walsh. Mechanism of formation of tri- and tetrasilane in the reaction of atomic hydrogen with monosilane and the thermochemistry of the disilene isomers. J. Phys. Chem., 91(22):5765–5770, 1987.

[174] J. D. Chapple-Sokol, C. J. Giunta, and R. G. Gordon. A Kinetics Study of the Atmospheric Pressure CVD Reaction of Silane and Nitrous Oxide. J. Electrochem. Soc., 136(10):2993–3003, 1989.

[175] R. J. McCurdy and R. G. Gordon. Compensating impurities as the limiting factor in atmospheric pressure chemical vapor deposition of a-Si:H from Mg2Si generated higher silanes. J. Appl. Phys., 63(9):4669–4676, 1988.

[176] A. G. Baboul, L. A. Curtiss, P. C. Redfern, and K. Raghavachari. Gaussian-3 theory using density functional geometries and zero-point energies. J. Chem. Phys., 110(16):7650, 1999.

[177] A. J. Adamczyk, M.-F. Reyniers, G. B. Marin, and L. J. Broadbelt. Exploring 1,2-hydrogen shift in silicon nanoparticles: reaction kinetics from quantum chemical calculations and derivation of transition state group additivity database. J. Phys. Chem. A, 113(41):10933–46, 2009.

[178] A. J. Adamczyk, M.-F. Reyniers, G. B. Marin, and L. J. Broadbelt. Kinetics of substituted silylene addition and elimination in silicon nanocluster growth captured by group additivity. ChemPhysChem, 11(9):1978–94, 2010.

[179] A. J. Adamczyk, M.-F. Reyniers, G. B. Marin, and L. J. Broadbelt. Kinetic correlations for H2 addition and elimination reaction mechanisms during silicon hydride pyrolysis. Phys. Chem. Chem. Phys., 12(39):12676–96, 2010.

[180] A. J. Adamczyk, M. F. Reyniers, G. B. Marin, and L. J. Broadbelt. Hydrogenated amorphous silicon nanostructures: Novel structure-reactivity relationships for cyclization and ring opening in the gas phase. Theor. Chem. Acc., 128(1):91–113, 2011.

[181] H. Emeleus and C. Reid. 220. The pyrolysis of disilane and trisilane. J. Chem. Soc., pages 1021–1030, 1939.

[182] S. K. Loh and J. M. Jasinski. Direct kinetic studies of SiH3 + SiH3, H, CCl4, SiD4, Si2H6, and C3H6 by tunable infrared diode laser spectroscopy. J. Chem. Phys., 95(7):4914, 1991.

[183] J. Takahashi, T. Momose, and T. Shida. Thermal Rate Constants for SiH4 ⇌ SiH3 + H and CH4 ⇌ CH3 + H by Canonical Variational Transition State Theory. Bull. Chem. Soc. Jpn., 67(1):74–85, 1994.

[184] U. V. Bhandarkar, M. T. Swihart, S. L. Girshick, and U. R. Kortshagen. Modelling of silicon hydride clustering in a low-pressure silane plasma. J. Phys. D. Appl. Phys., 33(21):2731–2746, 2000.

[185] R. Robertson, D. Hils, H. Chatham, and A. Gallagher. Radical species in argon-silane discharges. Appl. Phys. Lett., 43(6):544–546, 1983.

[186] Y. Watanabe. Contribution of short lifetime radicals to the growth of particles in SiH4 high frequency discharges and the effects of particles on deposited films. J. Vac. Sci. Technol. A Vacuum, Surfaces, Film., 14(3):995, 1996.

[187] G. A. Petersson, D. K. Malick, W. G. Wilson, J. W. Ochterski, J. A. Montgomery, and M. J. Frisch. Calibration and comparison of the Gaussian-2, complete basis set, and density functional methods for computational thermochemistry. J. Chem. Phys., 109(24):10570–10579, 1998.

[188] L. A. Curtiss, P. C. Redfern, K. Raghavachari, and J. A. Pople. Assessment of Gaussian-2 and density functional theories for the computation of ionization potentials and electron affinities. J. Chem. Phys., 109(1):42–55, 1998.

[189] S. Y. Wu, P. Raghunath, J. S. Wu, and M. C. Lin. Ab initio chemical kinetic study for reactions of H atoms with SiH4 and Si2H6: comparison of theory and experiment. J. Phys. Chem. A, 114(1):633–639, 2010.

[190] J. A. Manion, R. E. Huie, R. D. Levin, D. R. Burgess Jr., V. L. Orkin, W. Tsang, W. S. McGivern, J. W. Hudgens, V. D. Knyazev, D. B. Atkinson, E. Chai, A. M. Tereza, C.-Y. Lin, T. C. Allison, W. G. Mallard, F. Westley, J. T. Herron, R. F. Hampson, and D. H. Frizzell. NIST Chemical Kinetics Database, NIST Standard Reference Database 17, Version 7.0 (Web Version), Release 1.6.8, Data version 2015.12, 2015.

[191] A. D. McLean and G. S. Chandler. Contracted Gaussian basis sets for molecular calculations. I. Second row atoms, Z=11–18. J. Chem. Phys., 72(10):5639, 1980.

[192] Y. Zhao, B. J. Lynch, and D. G. Truhlar. Development and Assessment of a New Hybrid Density Functional Model for Thermochemical Kinetics. J. Phys. Chem. A, 108(14):2715–2719, 2004.

[193] N. L. Arthur, P. Potzinger, B. Reimann, and H. P. Steenbergen. Reaction of H-Atoms with Some Silanes and Disilanes - Rate Constants and Arrhenius Parameters. J. Chem. Soc., Faraday Trans. 2, 85(9):1447–1463, 1989.

[194] N. L. Arthur and L. A. Miles. Rate constants for H + (CH3)4-nSiHn, n = 1–4. Chem. Phys. Lett., 282(January):192–196, 1998.

[195] X. Yu, S.-M. Li, Z.-S. Li, and C.-C. Sun. Direct Ab Initio Dynamics Studies of the Reaction Paths and Rate Constants of Hydrogen Atom with Germane and Silane. J. Phys. Chem. A, 104(40):9207–9212, 2000.

[196] L. D. Crosby and H. A. Kurtz. Application of Electronic Structure and Transition State Theory: Reaction of Hydrogen With Silicon Radicals. Int. J. Quantum Chem., 106(15):3149–3159, 2006.

[197] W. Wang, S. Feng, and Y. Zhao. Quantum instanton evaluation of the thermal rate constants and kinetic isotope effects for SiH4 + H → SiH3 + H2 reaction in full Cartesian space. J. Chem. Phys., 126:114307, 2007.

[198] G. Katzer, M. C. Ernst, A. F. Sax, and J. Kalcher. Computational Thermochemistry of Medium-Sized Silicon Hydrides. J. Phys. Chem. A, 101(21):3942–3958, 1997.

[199] D. G. Goodwin, H. K. Moffat, and R. L. Speth. Cantera: An Object-oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes, 2016. URL http://www.cantera.org.

[200] J. M. Jasinski and J. O. Chu. Absolute rate constants for the reaction of silylene with hydrogen, silane, and disilane. J. Chem. Phys., 88(3):1678, 1988.

[201] F. Neese. The ORCA program system. Wiley Interdiscip. Rev.: Comput. Mol. Sci., 2:73–78, 2012.

[202] J. Pfaendtner, X. Yu, and L. J. Broadbelt. The 1-D hindered rotor approximation. Theor. Chem. Acc., 118(5-6):881–898, jul 2007.

[203] P. L. Bhoorasingh. Automated Calculation of Reaction Kinetics via Transition State Theory. PhD thesis, Northeastern University, Boston, MA, jun 2016.

[204] C. F. Goldsmith and R. H. West. Automatic Generation of Microkinetic Mechanisms for Heterogeneous Catalysis. 2017.

[205] A. Laio and F. L. Gervasio. Metadynamics: a method to simulate rare events and reconstruct the free energy in biophysics, chemistry and material science. Reports Prog. Phys., 71(12):126601, 2008.

[206] S. Zheng and J. Pfaendtner. Car-Parrinello molecular dynamics + metadynamics study of high-temperature methanol oxidation reactions using generic collective variables. J. Phys. Chem. C, 118(20):10764–10770, 2014.

Appendices

A SUPPLEMENTARY INFO FOR SOLVATION KINETICS

A.1 Solvation kinetics molecular structure group trees and values

Table A.1: Group tree for hydrogen abstraction reactions

Group label  ∆EA (kJ/mol)  # reactions trained on
L1: X_H_or_Xrad_H_Xbirad_H_Xtrirad_H  1.925  997
L2: Xrad_H  -0.033  19
L2: Xbirad_H  -1.305  2
L3: CH2_triplet_H  -1.305  2
L3: CH2_singlet_H  0
L3: NH_triplet_H  0
L3: NH_singlet_H  0
L2: Xtrirad_H  0
L3: C_quartet_H  0
L3: C_doublet_H  0
L2: X_H  0.003  976
L3: O_H  0.579  93
L4: O_sec  0.579  93
L5: O/H/NonDeC  0.128  51
L6: O/H/Cs\H3  0.252  4
L6: O/H/Cs\Cs|H3  0.083  24
L6: O/H/Cs\Cs|Cs/H3  0.53  11
L3: Cs_H  -0.259  819
L4: C_alkane  -1.178  154
L5: C_methane  -1.606  9
L5: C/H3/Cs\H3  -1.302  18
L5: C/H3/Cs\Cs|H3  -1.008  60
L4: C/Hx/O  -0.158  92
L5: C/H3/O  -0.004  45
L6: C/H3/O\H  0.363  32
L5: C/H2/Cs/O  -0.31  47
L6: C/H2/Cs/O\H  -0.249  42
L6: C/H2/O/Cs\Cs  0
L7: C/H2/O|H/Cs\Cs  0
L4: C/Hx/Cs\O  -0.879  21
L5: C/H3/Cs\O  0
L6: C/H3/Cs\O|H  0
L5: C/H2/Cs/Cs\O  -0.879  21
L6: C/H2/Cs/Cs\O|H  -0.879  21
L4: C/H3/Cs\Cs|O  0
L5: C/H3/Cs\Cs|O/H  0
L4: C/H2/NonDeC_6ring  1.711  1
L3: Cb_H  0
L4: Cb/H/Cb  0
L5: Cb/H/Cb/Cb  0
L6: Cb/H/Cb/Cb\O  0
L6: Cb/H/Cb/Cb\Nb  0
L6: Cb/H/Cb/Cb\Cb|Nb  0
L6: Cb/H/Cb/Cb\N_pyrrole  0
L3: Cd/H/Cb  0
L4: Cd/H/Cb/O  0
L4: Cd/H/Cb/Nb  0
L4: Cd/H/Cb/N_pyrrole  0
L3: N3_H  0
L4: N3s_H  0
L5: N3s/H/Cb/Cb  0
L1: Y_rad_birad_trirad_quadrad  0
L2: Y_1centerquadrad  0
L3: C_quintet  0
L3: C_triplet  0
L2: Y_1centertrirad  0
L3: N_atom_quartet  0
L3: N_atom_doublet  0
L3: CH_quartet  0
L3: CH_doublet  0
L2: Y_1centerbirad  -2.192  61
L2: Y_rad  0.149  936
L3: Y_2centeradjbirad  -2.713  24
L3: O_rad  1.971  337
L4: O_pri_rad  0.758  79
L4: O_sec_rad  2.355  258
L5: O_rad/NonDeO  2.368  210
L3: Cs_rad  -0.473  282
L4: C_methyl  -2.962  64

A.2 Script for modifying Chemkin files for solvation kinetics corrections

Included electronically as modifyReactionBarriers.py
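
As a rough illustration of what a barrier correction of this kind does to a rate coefficient, the sketch below adds a change in activation energy, ∆EA, to a modified Arrhenius expression k = A T^n exp(−EA/RT). It is not the contents of modifyReactionBarriers.py. The function names and the gas-phase Arrhenius parameters are hypothetical, and the ∆EA value is borrowed from the O_rad row of Table A.1 only to give a realistic magnitude.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def arrhenius(A, n, Ea, T):
    """Modified Arrhenius rate coefficient; Ea in kJ/mol, T in K."""
    return A * T**n * math.exp(-Ea / (R * T))


def corrected_rate(A, n, Ea_gas, delta_Ea, T):
    """Rate coefficient after shifting the gas-phase barrier by delta_Ea (kJ/mol)."""
    return arrhenius(A, n, Ea_gas + delta_Ea, T)


def rate_ratio(delta_Ea, T):
    """k_corrected / k_gas; depends only on delta_Ea and T."""
    return math.exp(-delta_Ea / (R * T))


# Hypothetical gas-phase parameters, chosen for illustration only.
A, n, Ea_gas = 1.0e12, 0.0, 40.0
delta_Ea = 1.971  # kJ/mol, magnitude taken from the O_rad row of Table A.1
T = 400.0         # K

k_gas = arrhenius(A, n, Ea_gas, T)
k_solv = corrected_rate(A, n, Ea_gas, delta_Ea, T)
print(f"k_solv / k_gas at {T:.0f} K = {k_solv / k_gas:.3f}")
```

Because the pre-exponential factor and temperature exponent are left unchanged in this sketch, the ratio of corrected to gas-phase rate depends only on ∆EA and temperature, so a correction of a few kJ/mol matters most at low temperature.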

A.3 Modified Cantera input file for n-dodecane/methyl oleate oxidation

Included electronically as liq_modified.cti

A.4 Cantera script to simulate liquid fuel oxidation reactor

Included electronically as oxidation.py

A.5 Code for automatic tree building

Included electronically as Building a Tree for Solvation Kinetics Data using Scikit Learn.ipynb

B SUPPLEMENTARY INFO FOR SILICON HYDRIDES

B.1 Geometries of reactants and transition states for hydrogen abstraction reactions

Table B.1: Transition state geometries for silicon hydride hydrogen abstractions. Geometries were optimized at M06-2X/6-311+G(3d2f).

Reaction
Geometry (xyz)

SiH4 + H ←→ SiH3 + H2
Si 0.15453 0.00006 -0.00002
H -1.41678 0.00240 0.00055
H 0.62492 -0.56812 1.27728
H 0.62860 1.38878 -0.14797
H 0.62319 -0.82427 -1.12967
H -2.62334 0.00039 0.00014

Si2H6 + H ←→ Si2H5 + H2
Si 1.08477 -0.21175 0.00000
H 1.76873 1.18341 0.00048
Si -1.23147 0.06812 0.00000
H 1.54266 -0.93570 1.20460
H 1.54277 -0.93472 -1.20514
H 2.39507 2.29684 0.00003
H -1.64097 0.81927 1.20466
H -1.91403 -1.24233 -0.00282
H -1.64042 0.82406 -1.20185

Si2H6 + SiH3 ←→ Si2H5 + SiH4
Si -0.55565 0.74630 -0.57115
H 2.82912 -1.40097 1.04827
Si -2.27585 -0.37494 0.52734
H -0.67818 0.64407 -2.04227
H -0.49801 2.17627 -0.19500
Si 2.72787 -0.35594 0.00917
H -2.06810 -0.27329 1.98716
H -2.25046 -1.80175 0.14252
H -3.60717 0.17968 0.19965
H 3.22534 -0.86811 -1.28355
H 0.98622 0.12059 -0.20376
H 3.48999 0.84300 0.41209

Si2H5 + H ←→ SiH3SiH + H2
Si 1.09996 -0.23822 -0.10182
H 1.92029 1.02459 0.19908
Si -1.19412 0.08446 0.01971
H 1.64355 -1.36585 0.68301
H 2.83364 2.02686 0.17835
H -1.63898 0.40468 1.39338
H -1.88427 -1.14091 -0.43095
H -1.55592 1.20330 -0.87343

Si2H4 + H ←→ Si2H3 + H2
Si 0.97302 -0.22919 -0.11437
H 1.90926 0.97626 -0.02486
Si -1.13964 0.10093 0.07414
H 1.55866 -1.43591 0.50100
H 2.67744 1.96603 0.49374
H -2.07255 -1.04044 0.06669
H -1.74016 1.32958 -0.47338
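
The Cartesian coordinates in Table B.1 can be turned into interatomic distances with a few lines of Python. The sketch below does this for the SiH4 + H transition state; the helper names are hypothetical, the coordinates are assumed to be in ångströms (the table does not state units), and the choice of which hydrogen is being transferred and which is attacking is inferred from the coordinates rather than stated in the table.

```python
import math

# SiH4 + H transition state, coordinates copied from Table B.1.
TS_SIH4_H = """\
Si  0.15453  0.00006 -0.00002
H  -1.41678  0.00240  0.00055
H   0.62492 -0.56812  1.27728
H   0.62860  1.38878 -0.14797
H   0.62319 -0.82427 -1.12967
H  -2.62334  0.00039  0.00014
"""


def parse_xyz(block):
    """Parse 'symbol x y z' lines into a list of (symbol, (x, y, z)) tuples."""
    atoms = []
    for line in block.strip().splitlines():
        symbol, x, y, z = line.split()
        atoms.append((symbol, (float(x), float(y), float(z))))
    return atoms


def distance(atom_a, atom_b):
    """Euclidean distance between two parsed atoms, in the input units."""
    return math.dist(atom_a[1], atom_b[1])


atoms = parse_xyz(TS_SIH4_H)
# Assumption: atom 2 is the transferring H and atom 6 the attacking H,
# since both lie on the same axis on one side of the Si atom.
si, h_transfer, h_attack = atoms[0], atoms[1], atoms[5]
print(f"Si...H distance: {distance(si, h_transfer):.3f}")
print(f"H...H distance:  {distance(h_transfer, h_attack):.3f}")
```

The same two functions work unchanged on any other geometry block in Tables B.1 and B.2, since all of them use the same element-plus-three-coordinates layout.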

B.2 Geometries of silicon hydride species

Table B.2: Geometries of silicon hydride species used to calculate thermodynamic properties and HBI values. Geometries were optimized at G3//B3LYP. (S) and (T) indicate singlet or triplet electronic state.

SMILES
Geometry (xyz)

[SiH2](S)
Si -1.43587 0.00246 0.06697
H -1.74175 0.47705 -1.35562
H -0.04854 0.64701 0.01782

[SiH2](T)
Si -1.33315 0.10876 -0.07282
H -1.93728 0.40943 -1.40268
H 0.04429 0.60833 0.20466

[SiH2]=[Si]
Si -3.60702 1.82709 -0.33428
Si -1.49125 1.19881 0.33577
H -0.81469 -0.11028 0.07296
H -0.51770 2.01734 1.12493

[SiH2]=[SiH2]
Si -1.53193 -0.02404 -0.20088
Si 0.58853 0.34809 0.13524
H -2.04746 -1.40486 -0.00758
H -2.19289 0.64071 -1.35459
H 1.24951 -0.31663 1.28895
H 1.10404 1.72891 -0.05809

[SiH2][SiH3]
Si 0.64164 0.63190 -0.07677
H 2.12401 0.79844 -0.10622
H 0.09716 1.50707 0.99666
Si -0.01278 -1.58161 0.31378
H 0.09701 1.08711 -1.38474
H 0.40941 -2.11507 1.64186
H 0.40969 -2.53755 -0.75125

[SiH3]
Si 0.61349 0.58486 -0.06567
H 2.10205 0.54852 -0.07144
H 0.09988 1.44264 -1.16912
H 0.09996 1.05209 1.25158

[SiH3][Si][SiH3]
Si 1.67253 0.27843 0.63961
H 3.14517 0.40071 0.46884
Si 0.63136 1.45854 -1.09549
H 1.30287 0.82573 1.97628
H 1.31054 -1.16767 0.61548
Si -1.67483 1.68344 -1.43917
H -2.33983 2.35487 -0.28601
H -2.33137 0.36094 -1.64601
H -1.88650 2.51252 -2.65597

[SiH3][SiH2][SiH3]
Si 1.34707 0.41763 0.52351
H 2.83377 0.50351 0.50128
Si 0.41113 1.59415 -1.28727
H 0.86407 0.97917 1.81566
H 0.94075 1.03046 -2.56387
H 0.84714 3.02022 -1.22248
Si -1.94053 1.50667 -1.32139
H -2.40856 0.09624 -1.42074
H -2.48341 2.26393 -2.48329
H -2.50232 2.09543 -0.07399
H 0.95938 -1.01934 0.46801

[SiH3][SiH3]
Si 1.25034 0.42662 0.50364
H 2.73834 0.48621 0.51730
Si 0.39108 1.62122 -1.32867
H 0.74771 0.99732 1.78408
H 0.84025 -1.00347 0.43617
H 0.80116 3.05131 -1.26120
H -1.09692 1.56161 -1.34235
H 0.89373 1.05052 -2.60911

[SiH3][SiH][SiH3]
Si 1.61691 0.33637 0.61343
H 3.09531 0.50916 0.63765
Si 0.72147 1.40841 -1.26788
H 1.04575 0.94103 1.84764
H 1.32225 -1.12597 0.63522
Si -1.60568 1.62562 -1.43403
H 1.32982 0.90996 -2.53930
H -2.13957 2.21855 -0.17765
H -2.28972 0.31855 -1.65621
H -1.94939 2.52478 -2.56959

[SiH4]
Si 1.46180 0.40735 0.51255
H 2.94718 0.40735 0.51255
H 0.96668 -0.24704 -0.72558
H 0.96667 1.80680 0.56489
H 0.96667 -0.33770 1.69833

[SiH]
Si 0.30787 2.67553 0.05506
H -0.20525 2.37574 -1.36497

[SiH][SiH2]
Si -2.73075 1.30914 0.07907
Si -0.47875 1.78249 0.08754
H -2.72465 1.27934 -1.44957
H 0.33260 1.85245 1.33822
H 0.30767 2.46622 -0.98447

[SiH][SiH3](S)
Si -2.71696 1.51477 0.05196
Si -0.31118 1.56106 0.04435
H -2.68937 1.16938 -1.44010
H 0.22225 0.23388 -0.39172
H 0.30300 1.87871 1.36442
H 0.17481 2.58171 -0.93456

[SiH][SiH3](T)
Si -2.53013 1.47300 -0.10995
Si -0.19734 1.55835 0.01393
H -3.23625 1.16324 -1.39379
H 0.42084 0.25197 -0.35026
H 0.16206 1.88550 1.41850
H 0.36336 2.60747 -0.88406

B.3 Largest SiH4 decomposition mechanism

Included electronically as chem_annotated.inp and species_dictionary.txt. The converted Cantera input file is included as silane.cti.

B.4 Cantera script for simulating reactor

Included electronically as silane_decomp.py

B.5 Code for residence time comparison

Included electronically as Residence Times.ipynb