Construction, analysis, and modeling of complex reaction networks with RING

A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY

Srinivas Rangarajan

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy

Advised by Prodromos Daoutidis and Aditya Bhan

August 2013

© Srinivas Rangarajan 2013
ALL RIGHTS RESERVED

Acknowledgments

I would like to thank my advisors, Profs. Prodromos Daoutidis and Aditya Bhan. They gave me enough freedom to pursue my ideas, encouraged collaborations, offered interesting problems to work on, and above all were always available for advice to sort out my problems, technical and otherwise. I thank the members of the Daoutidis group, with whom I shared my office, and the Bhan group, with whom I had valuable interactions from time to time. I would like to specifically thank my colleagues in the Daoutidis group, Ana Torres, Alex Marvin, Dimitrios Georgis, Seongmin Heo, Adam Kelloway, and former members Dr. Sujit Jogwar, Prof. Fernando Lima, and Prof. Milana Trifkovic, with whom I have had innumerable engaging technical discussions on common research problems. Beyond work, I have also found good friends in them. The camaraderie in the group has been wonderful and has made our office, the Popcorn Lounge, the ideal place to work; I will forever cherish the 2 PM coffee breaks. I also want to thank my batchmates and friends in the department, especially Dr. Reetam Chakrabarty, with whom I had many valuable discussions starting from my first few weeks in Minneapolis. I thank all the other friends I have made and all the roommates I have had here in the last five years; life in Minneapolis would never have been as cherishable if not for them. I want to specifically thank Julie Prince, our graduate program coordinator; I arrived late in the US, and without her help in sorting out all the initial administrative paperwork, driving me to the SSN office, etc., my first few weeks would have been unmanageable.

Prof. Eric Van Wyk and Ted Kaminski at the Department of Computer Science and Engineering have been great collaborators who have patiently worked with me over the last three and a half years. They have both provided many insights on the computer science aspects of this research, and I would like to thank both of them for this, especially Ted, from whom I learnt a lot about efficient computer programming methods.

Lastly, I'd like to thank my family back in India for their complete and unconditional love and support over the course of my PhD, without which it would have been impossible to come to a new country and remain focused on my research.

To My Family

Abstract

Complex reaction networks are found in a variety of engineered and natural chemical systems, ranging from petroleum processing to atmospheric chemistry and including biomass conversion, materials synthesis, metabolism, and biological degradation of chemicals. These systems comprise several thousand reactions and species interrelated through a highly interconnected network. This thesis presents methods, computational tools, and applications that demonstrate that: (a) any complex network can be constructed automatically from a small set of initial reactants and chemical transformation rules, and (b) a given network can be analyzed by identifying topological information such as reaction pathways, determining thermodynamically feasible routes, evaluating the spectrum of plausible and synthetically feasible compounds, exploring dominant routes to form experimentally observed products, and formulating and solving a rigorous kinetic model.

A new computational tool called Rule Input Network Generator, or RING, has been developed to construct and analyze complex reaction networks. Given the initial reactants of a reaction system (e.g., the components of the feed to a reactor) and reaction rules that describe the possible chemical transformations that can occur, RING first constructs an exhaustive network of reactions and species consistent with the inputs. Inputs into RING are written in an English-like domain-specific language whose syntax uses common chemistry parlance. The language compiler further catches erroneous chemistry rules, such as incorrect charge balance in a reaction rule, and heuristically optimizes user-specified instructions to improve the speed of execution. RING further accepts “post-processing” instructions that allow for: (i) lumping, or grouping together isomers to reduce the size of the reaction network, (ii) “querying” the network to extract information such as reaction pathways and mechanisms that describe how an initial reactant is transformed into a specific product, (iii) calculating thermochemical properties of species and reactions to evaluate the thermochemical feasibility of reaction steps, and (iv) formulating and solving rigorous microkinetic models of complex reaction networks. RING thus provides a “rule-based” framework to assemble and explore a complex reaction network.

RING implements several algorithms, methods, and techniques from computer science, cheminformatics, and graph theory. The language has been developed using SILVER, a meta-language for specifying attribute grammars, and COPPER, a parser generator. The language is extensible in that independent additions can be incorporated into the original language to perform additional analysis without syntactic and semantic conflicts with the existing grammar. Algorithms from chemical graph theory and cheminformatics are adopted to (i) represent molecules as strings externally and as graphs internally, (ii) store reaction rules as graph transformation rules, (iii) identify fragments in molecules that can serve as reaction centers through pattern matching, (iv) determine molecular characteristics such as shape (linear, branched, cyclic, etc.) and aromaticity, and (v) identify isomeric lumps through a new molecular hashing technique. Graph traversal algorithms are further employed by the post-processing modules to identify pathways and mechanisms.

This thesis presents several case studies of the application of RING in elucidating complex networks of reactions.
First, when only the chemistry of the system is known, RING can be used to identify plausible mechanisms for product formation consistent with experimental observations; it can further be used to propose experiments that discriminate between the alternative mechanisms. This has been demonstrated with a case study of glycerol and acetone conversion on solid Brønsted acid catalysts. Second, if molecular properties can be evaluated quickly using semi-empirical methods for a large number of species, RING can be used to identify species in the network that have desired physical properties and thermochemically favorable synthesis routes to form them. A case study on identifying fatty alcohols, in a spectrum of more than 60,000 compounds, that can potentially be used to make nonionic surfactants with desirable properties, together with their synthesis routes from biomass-derived oxygenates, presents an application of this method. It was found that lauryl alcohol, a fatty alcohol currently used to make surfactants, can be synthesized from biomass-derived oxygenates using a combination of metal, basic, and acid catalysts. It was also found that some of the intermediate synthesis steps could potentially be coupled to drive the overall reaction forward, or could benefit from biphasic systems that immediately separate products from reactants. Third, if the activation barriers of each step in the reaction network can be reliably predicted using semi-empirical methods, RING can be used to identify dominant reaction mechanisms for converting reactants to experimentally observed products. This was demonstrated by analyzing the energetically favorable mechanisms for glycerol conversion to syngas or 1,2-propanediol on transition metal catalysts such as platinum, palladium, rhodium, and ruthenium.
It was found that glycerol would decompose to syngas on platinum and palladium, while a significant selectivity to the diol can be obtained on rhodium and ruthenium, thus offering insights for designing catalysts for complex biomass conversion systems. Finally, if kinetic parameters and thermochemistry can be estimated a priori, RING can be used to formulate and solve rigorous microkinetic models to obtain quantitative information such as yield and selectivity. This feature is demonstrated through a model developed for methanol conversion to hydrocarbons (MTH) on the Brønsted acid catalyst HZSM-5. RING is generic in terms of the chemistries it can handle and flexible in terms of the types of analysis that can be performed. This thesis posits that it can be used in conjunction with experimental data to elucidate systems with complex reaction networks, especially in hydrocarbon processing and biomass conversion.

Contents

Acknowledgments

Abstract

Table of Contents

List of Tables

List of Figures

1 Introduction
1.1 Construction and analysis of complex networks: Methods and challenges
1.2 Rule Input Network Generator: Features
1.3 Applications of RING
1.3.1 Topological analysis and mechanism hypothesis of complex systems
1.3.2 Identification of synthetically feasible compounds and synthesis routes
1.3.3 Mechanism elucidation of complex glycerol conversion network on transition metals
1.3.4 Kinetic modeling of Methanol-to-Hydrocarbons system


2 Background
2.1 Computer-assisted prediction of organic chemical reactions
2.2 Network generation and analysis: a review
2.3 Topological network analysis

3 RING: description
3.1 Components of RING
3.1.1 Reaction language compiler
3.1.2 Reaction network generator
3.1.3 Post-processing module
3.2 RING: Inputs
3.2.1 Initial reactants
3.2.2 Reaction rules
3.2.3 Global constraints
3.2.4 Post-processing instructions
3.3 RING: Outputs
3.3.1 Reaction network
3.3.2 Lumping results
3.3.3 Pathway identification results
3.3.4 Mechanism identification results
3.4 Discussion

4 RING: Algorithms
4.1 Reaction Language and Compiler
4.1.1 Compiler optimizations
4.1.2 Language extension
4.2 RING: Network generation
4.2.1 Molecules and fragments
4.2.2 Reaction rule
4.2.3 Reaction
4.2.4 Generation of the reaction network
4.3 Post-processing
4.3.1 Pathway identification
4.3.2 Mechanism enumeration
4.3.3 Lumping
4.3.4 Molecule and reaction queries
4.4 Kinetic modeling
4.4.1 Thermodynamic consistency
4.5 Discussion

5 Topological network analysis with RING
5.1 Analysis of network characteristics: Propane aromatization
5.1.1 Network construction
5.1.2 Dominant benzene production pathways
5.1.3 Network lumping
5.2 Mechanism hypothesis based on experimental data: Glycerol dehydration
5.2.1 Acrolein production pathways
5.2.2 Pathways to acetone
5.2.3 Potential mechanisms that result in acetic acid production from acetone
5.3 Discussion

6 Identification and analysis of synthesis routes in complex catalytic reaction networks for biomass upgrading
6.1 Introduction
6.2 Computational methods
6.3 Fatty alcohols from biomass
6.3.1 Fatty alcohols
6.3.2 Synthesis of desired fatty alcohols and alcohol ethoxylates
6.4 Results and Discussion
6.4.1 Fatty alcohols product spectrum
6.4.2 Alcohol ethoxylates property estimation
6.4.3 Synthesis routes to lauryl alcohol
6.4.4 Thermochemical analysis of synthesis pathways
6.4.5 Biphasic separability analysis of synthesis pathways
6.5 Discussion

7 Glycerol conversion on transition metal catalysts
7.1 Introduction
7.2 Glycerol conversion on transition metals
7.3 Method
7.3.1 Network construction and representation
7.3.2 On-the-fly estimation of thermochemistry and activation barriers
7.3.3 Pathways analysis
7.4 Results
7.4.1 Platinum
7.4.2 Other metals
7.5 Discussion

8 Kinetic modeling of MTH
8.1 Introduction
8.2 Kinetic modeling: Procedure
8.2.1 Reaction rules
8.2.2 Thermochemistry
8.2.3 Kinetic parameters
8.2.4 Network generation and lumping
8.3 Results
8.4 Discussion

9 Summary and Future
9.1 Summary and Discussion
9.2 Future directions
9.2.1 Parameter estimation
9.2.2 Reactor models
9.2.3 Dynamic simulations of the kinetic model
9.2.4 Model-based design of experiments
9.2.5 Optimization
9.2.6 Efficient software
9.3 Conclusion

Bibliography

Appendix
A Fructose-to-HMF
B RING language: EBNF
C RING compiler optimizations: Benchmark study
D RING: Class hierarchy
E Network generation: additional algorithms
F Network generation algorithm: proof of completeness and correctness
F.1 Proof
G Hash value calculation for lumping
H Metal catalysis rules
I Alternative scheme for thermochemistry and activation energies estimation in metal catalysis
J Benson-like groups to account for surface cycle effects in metal catalysis
K Binding energies of Carbon, Oxygen, and Hydrogen on different metals
L Comparison of RING-calculated thermochemistry values of surface species on Platinum with DFT

List of Tables

2.1 A list of reaction network generators, a description of their essential features, and their areas of application

3.1 Sample input code for Fructose-to-HMF system

4.1 Benchmarking statistics: Run-time ratios for successive compiler optimizations

5.1 Reaction rules (elementary steps) for propane aromatization on HZSM-5
5.2 Propane aromatization network size and reaction distribution of the four most common rules computed using RING
5.3 Network size, number of reactions, and execution time of propane aromatization network with/without encapsulation
5.4 Reaction rate ratios for different species in propane aromatization
5.5 Pathways statistics for acid-catalyzed acrolein production from glycerol

6.1 Reaction chemistries input into RING
6.2 Commercial surfactants and their physical properties
6.3 Queried surfactant ranges and the number of identified molecules
6.4 Error statistics for species and reaction enthalpy estimation using group additivity
6.5 Comparisons between G4MP2 and group additivity estimations
6.6 Experimental and estimated LogP for a representative set of oxygenates

7.1 Number of identified CO formation pathways on Pt in RING pathway queries as a function of the upper bound on activation barrier specified. All energy values are in kJ/mol.
7.2 A list of highest barriers of the low energy pathways to CO and 1,2-propanediol formation on different metals (in kJ/mol)

8.1 Reaction rules for olefin cycle in MTH chemistry. Reverse rules are also included in the chemistry.
8.2 Reaction rules for aromatics cycle in MTH chemistry. Reverse rules are also included in the chemistry.
8.3 Kinetic parameters used in modeling MTH chemistry. A refers to pre-exponential factors, Ea is the activation barrier.
8.4 Comparison of MTH model predictions with experimental data
8.5 Degree of rate control values predicted by the MTH model

9.1 Benchmarking data for compiler optimizations
9.2 Benchmarking data for compiler optimizations (run times are in seconds)
9.3 Seed parameters, corresponding primes, and factors for the calculation of atom hash seed
9.4 Rules input into RING for glycerol conversion on transition metals
9.5 Surface groups and supplementary angles
9.6 Bond dissociation values for atomic C, H, and O bonds with different metals
9.7 Comparison of RING-predicted and DFT values for surface hydrocarbons and oxygenates on Platinum. All values are in kJ/mol.

List of Figures

3.1 The modular structure of RING. Dashed arrows show internal information flow; solid arrows refer to inputs and outputs.
3.2 SMILES strings with composite atoms and non-bonded interactions: (i) Dimeric intermediate in DEE synthesis from ethanol on proton-form zeolite catalysts; (ii) surface species formed upon dissociative adsorption of ethanol
3.3 Dissociative adsorption of molecular hydrogen on metal sites (Pd)
3.4 Sample input reaction rule in RING: elementary step involving protonation of an alcohol
3.5 Sample group additivity and corrections specification for thermochemistry estimation in RING
3.6 Sample rule inputs into RING for the estimation of kinetic parameters
3.7 Sample reactions of the Fructose-to-HMF network generated by RING
3.8 Sample lump and its representative in the Fructose-to-HMF reaction network identified by RING
3.9 One pathway from fructose to HMF in the Fructose-to-HMF reaction network identified by RING. The sequential representation of reactions is an adaptation of the reaction list that RING generates.
3.10 Example of one complete mechanism having four catalytic cycles for HMF production from fructose, identified by RING

4.1 Example reaction rule: Protonation of a keto group
4.2 The overall structure of RING, zoomed in on the compiler component, using the pathways analysis module as an example of a post-processing module
4.3 The host language, and a representative subset of extensions to it, implemented in Silver. The arrows represent dependencies between grammars.
4.4 (a) One six-member ring of cyclohexane, (b) Two six-member rings and one ten-member ring of naphthalene, (c) Extended conjugation with 4n+2 pi electrons leading to aromaticity in anthracene, and (d) Allyl carbenium ion, wherein a carbenium ion is in conjugation with a double bond
4.5 Products of protonation reactions of alkenes: (a) 1,3-hexadiene protonation to form 4 different products; (b) 1,3-butadiene protonation to form two different products; and (c) 3-hexene protonation to form only one distinct product
4.6 Representation of reactions that break aromaticity in RING: (a) Protonation of benzene leads to double bonds in conjugation with each other and with the carbenium ion; (b) Protonation of furan leads to three possible products depending upon the protonation site. In all cases, the products have pi-electrons redistributed as double bonds or lone pairs so as to maintain conjugation.
4.7 Sample post-processing instruction written in the language: a query to obtain pathways to acetone
4.8 An example of molecule hash value calculation. The steps are: (1) calculate the hash seed of the atom, (2) calculate the hash of the atom using the hash seed, and (3) calculate the molecule hash from the constituent atom hashes.
4.9 Three-step lumping process: (a) Functional group-based equivalence is used to collate molecules, (b) A representative of each collated lump is identified, and (c) Paraffins, Olefins, Naphthenes, and Aromatics (PONA) lumps are further lumped based on molecular formula

5.1 Sample post-processing instruction written in the language: a query to obtain pathways to acetone
5.2 Reactions and species count for acid-catalyzed propane aromatization with hydride transfer encapsulation as a function of largest allowed species size
5.3 Acid-catalyzed propane aromatization to form benzene involving cyclization of a C6 species
5.4 A count of species lumps and reaction lumps with hydride-transfer encapsulation as a function of largest allowed species size for acid-catalyzed propane aromatization
5.5 Acrolein production pathways over solid acid catalysts. The 3-hydroxypropionaldehyde precursor to acrolein is highlighted by a dashed oval.
5.6 Acetone production pathways from glycerol over solid acid catalysts (note all pathways require either hydroxyacetone or 1,2-propanediol)
5.7 Pathways identified by RING from acetone to acetic acid over solid acid catalysts. The proposed mechanism is enclosed within the dashed curve.
5.8 Individual reaction cycles for the conversion of acetone to acetic acid and butene
5.9 13C- and 18O-labeling simulations using RING for the conversion of aqueous acetone solution on zeolites

6.1 A workflow of network generation, analysis, and semi-empirical molecular property relationship estimation with RING
6.2 A schematic of fatty alcohol and alcohol ethoxylates showing the hydrophobic (alkyl chain) and hydrophilic (ethylene oxide oligomeric chain) components
6.3 Initial reactants input into RING. HMF stands for 5-hydroxymethylfurfural.
6.4 The spectrum of monoalcohols generated in the network
6.5 Molecular structures of alcohol ethoxylates in each of the three ranges of queries
6.6 Lauryl alcohol synthesis routes from biomass-derived oxygenates. An atom-efficient route is shown with bold arrows.
6.7 Possible products of acetone-acetaldehyde aldol condensation include pent-3-en-2-one (marked 1)
6.8 Selective synthesis of lauryl alcohol from furfural, ethanol, methanol, and acetone
6.9 Synthesis scheme for converting biomass-derived oxygenates to diesel (adapted from Huber et al., Science 308 (2005) 1446-1450)
6.10 Gas phase thermochemistry analysis of synthesis routes to lauryl alcohol. The route with bold arrows is a selective route without cross-condensation steps involving molecules with α-hydrogens with respect to the carbonyl group. The enthalpy (and free energy) change (kJ/mol) of each reaction at 1 atm and 600 K is given. Thermochemistry values of equilibrium-limited reactions are underlined, and hydrogenation steps that can potentially drive a preceding equilibrium-limited reaction are marked by dashed arrows.
6.11 Octanol-water partition coefficients of species in the synthesis routes to lauryl alcohol. The values corresponding to each species are calculated as the logarithm of the equilibrium partition coefficient. Reactions marked with bold arrows can potentially be carried out in a biphasic aqueous-organic system.

7.1 SMILES strings and pictorial representation: (a) physisorbed methanol, and (b) chemisorbed C3H7O3*
7.2 Representative pathways for carbon monoxide formation from glycerol on Pt. All values are in kJ/mol.
7.3 Lowest energy pathway on Pt without OH scission leading to species 1 in Figure 7.2. All values are in kJ/mol.
7.4 A complete reaction cycle forming carbon monoxide on Pt. Not shown is the stoichiometrically equivalent number of hydrogen formation steps from H*. All values are in kJ/mol.
7.5 Representative pathways for 1,2-propanediol formation from glycerol on Pt. All values are in kJ/mol.
7.6 Pathway to form carbon monoxide from glycerol with the lowest energy barrier on Pd, Rh, and Ru. All values are in kJ/mol. Steps I and VIII are non-activated.
7.7 Reaction pathway to form carbon monoxide with the lowest energy barrier on Pt and the corresponding energy barriers on Pd, Rh, and Ru. All values are in kJ/mol. Steps I and VIII are non-activated.

8.1 Linear correlation of thermochemistry with carbon number. (a) Enthalpy of alkoxide minus enthalpy of alkane in kJ/mol at 298 K; (b) Entropy of alkoxide minus entropy of alkane in J/mol/K at 298 K
8.2 The flow rate profile of DME along the catalyst bed
8.3 Concentration profile of surface alkoxides along the catalytic bed. Concentrations are in mmol/g.cat. The total acid site concentration is 0.387 mmol/g.cat for an Si:Al ratio of 42.
8.4 The flow rate profile of a representative set of hydrocarbons along the catalyst bed

9.1 Class hierarchy diagram of RING showing the inter-relationships of the classes constituting the necessary elements of network generation

CHAPTER 1

Introduction

Many engineered and natural chemical systems have been reported to have a complex underlying reaction network [1,2]. Examples of individual chemical systems include petrochemical processes, biomass conversion, combustion of fuels, nanoparticle synthesis, atmospheric chemistry of volatile organic compounds, degradation of xenobiotics in the environment, and biological systems [3,4,5,6,7,8,9, 10, 11, 12]. A different class of complex reaction networks emerges in the realm of organic synthesis. Biomass, for example, can be converted into a plethora of valuable compounds such as platform chemicals, cosmetics, solvents, pharmaceuticals, and nutraceuticals using a wide spectrum of chemistries spanning homogeneous and heterogeneous, catalytic and noncatalytic, and thermochemical and biochemical routes [13]. This network of potential transformations, the synthesis network, is also complex. Complex networks show two distinctive characteristics. First, their size is large; for example, combustion of hexadecane, a model diesel compound, can involve up to 6000 species (compounds) and 20,000 reactions [7]. The tropospheric degradation network of volatile organic chemicals can have up to 4000 species and 12,000 reactions [14], while the metabolic network of Escherichia coli is reported to have up to 1000 species and 2000 reactions. Further, in many petrochemical processes such as hydrocracking∗ and fluid catalytic cracking†, the reactors convert crude oil feedstock containing several hundred compounds into a variety of products through a complex reaction network containing several thousand intermediate species and reactions [15, 16]. Second, the species and reactions are closely interdependent because each experimentally observed product can potentially be formed from the initial reactants (feed) by tens to hundreds of reaction pathways and mechanisms‡. Brønsted acid-catalyzed conversion of propane to aromatics [17], for example, has over a hundred pathways leading to benzene, the major product. Several of these pathways share common species (or “hubs”) and reaction sub-sequences, further indicating their interdependence. Jeong et al. [18] showed that the metabolic networks of several organisms follow a “power law” structure in that a few nodes, called central metabolites, act as hubs connecting all metabolites through several reactions; consequently, the path length between metabolites is small and the paths involve these hub nodes.

This research aims to address two challenges that emerge in the elucidation of complex reaction networks in the context of biomass and hydrocarbon conversion: (a) how to automatically construct these networks with high fidelity irrespective of the chemistry, and (b) what types of analysis can be performed to gain a fundamental understanding of the transformations that occur in the system. The first question arises because manual assembly of complex reaction networks is time consuming and error prone, thus necessitating an automated tool to construct these networks. The second challenge is directly related to the elucidation of complex reaction networks.
Network elucidation in individual complex systems allows for: (a) identifying the different reaction pathways (or mechanisms) that can operate concurrently, (b) determining the step(s) that control the overall rate of conversion of the feed, (c) finding the dominant pathway(s) to specific products from initial reactants (feed), and (d) establishing which species and reactions control the selectivity to a desired product. Addressing these questions is a key step towards modeling, design, operation, and control of reactors involving complex systems. Elucidation of synthesis networks, on the other hand, addresses a different set of questions: (a) what is the spectrum of possible products, (b) which compounds in the spectrum have desirable properties, (c) what are the different routes to synthesize these compounds, and (d) how do these routes compare in terms of different technoeconomic objectives such as selectivity, yield, and material (feedstock, catalyst, etc.) and utility (heat, electricity, etc.) requirements.

∗ A process to produce diesel.
† A process to produce gasoline.
‡ Pathways are sequences of several reaction steps connecting initial reactants and final products. A mechanism is a superset of pathways that additionally contains information on by-products and co-reactants.

In this context, this thesis presents a new generic computational tool for network generation and analysis, the Rule Input Network Generator (RING) [19, 20], and discusses examples of its application in elucidating complex reaction systems. RING can construct any network of reactions from initial reactants and reaction rules. It can subsequently analyze the generated network in three ways. First, topological information in the form of reaction pathways and mechanisms (ordered collections of reactions that together describe how a product is formed from initial reactants) can be extracted from the network. Second, if thermochemical and physical properties of species are known, thermochemically feasible routes to form desirable products can be identified. Third, if kinetic properties (including activation barriers) of each reaction step are known, or can be quickly estimated, then mathematical models (known as microkinetic models) can be formulated and solved to obtain quantitative information about the system, such as the yield of each product in a reactor. These three analyses need progressively larger amounts of data as inputs but enable an increasingly quantitative understanding of the chemical system.
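To make the notion of a pathway concrete, the following Python sketch enumerates pathways in a small hypothetical network by depth-first search with a cap on the number of steps. The species names, the reaction list, and the function `enumerate_pathways` are illustrative assumptions, not RING's implementation. A pathway is traced by following one species from each reaction to the next. Co-reactants and by-products are ignored, so the output corresponds to pathways rather than mechanisms. Species are never revisited, which prunes cycles.

```python
from collections import defaultdict

# Hypothetical toy network: each entry is (reaction name, reactants, products).
REACTIONS = [
    ("r1", ["A"], ["B"]),
    ("r2", ["B"], ["C"]),
    ("r3", ["A"], ["D"]),
    ("r4", ["D"], ["C"]),
    ("r5", ["C"], ["B"]),  # reverse-type step; creates the cycle B -> C -> B
]


def enumerate_pathways(reactions, source, target, max_steps):
    """Return every reaction sequence that carries `source` to `target`.

    Only one species is followed from step to step, so co-reactants and
    by-products are ignored. Species are never revisited, which prunes
    cycles, and sequences longer than `max_steps` are never explored.
    """
    # Index reactions by the species they consume.
    consumed_by = defaultdict(list)
    for name, reactants, products in reactions:
        for species in reactants:
            consumed_by[species].append((name, products))

    pathways = []

    def dfs(species, visited, steps):
        if species == target:
            pathways.append(list(steps))
            return
        if len(steps) == max_steps:
            return
        for name, products in consumed_by[species]:
            for product in products:
                if product not in visited:
                    visited.add(product)
                    steps.append(name)
                    dfs(product, visited, steps)
                    steps.pop()
                    visited.remove(product)

    dfs(source, {source}, [])
    return pathways


print(enumerate_pathways(REACTIONS, "A", "C", max_steps=3))
```

Running the script prints `[['r1', 'r2'], ['r3', 'r4']]`, the two routes from A to C. Lowering `max_steps` to 1 returns an empty list, which illustrates how a user-defined length constraint prunes the search.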

1.1 Construction and analysis of complex networks: Methods and challenges

The reactions in a complex network, albeit several hundred to several thousand in number, can each be categorized as an instance of one of a small number of reaction rules that describe valid chemical transformations. For example, each reaction of the propane aromatization system is an instance of one of fewer than fifteen common elementary steps of acid catalysis, such as protonation, beta scission, oligomerization, cyclization, and hydride transfer. This implies that if the initial reactants (the feed to the reactor, for example) and the set of reaction rules are known, the entire network can be constructed automatically by iteratively applying the rules to the reactants and to the products successively generated. Developing such a computational tool, a network generator, requires addressing several challenges. First, an unambiguous and generic representation scheme for all types of molecules (stable compounds, surface intermediates, radicals, ions, etc.) is required. Second, a method is required to represent the reaction rules in a manner that can be directly applied to the reactant molecules to form new (product) molecules, thereby generating a reaction. Third, some provision is required to limit the combinatorial explosion that is possible in the iterative generation process described above. For example, if a rule combines two reactants to form a larger compound, it could lead to an infinite loop wherein successively larger molecules are formed. A simple upper limit on molecule size can prevent the combinatorial explosion in this case; in general, however, more elaborate features are required, such as rule constraints and rank- and rate-based termination of the generation process (see Chapter 2 for details). Several challenges also need to be addressed to perform the three types of analyses discussed earlier.
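The iterative rule application and the size cap described in this section can be sketched in Python under strongly simplifying assumptions. Each species is a hypothetical olefin identified only by its carbon number. Two illustrative rules stand in for graph-based reaction rules: an oligomerization that joins two olefins, and a cracking step that splits one olefin into two fragments of at least two carbons. The function names and the `max_carbons` parameter are hypothetical. Each pass applies the rules only to combinations involving newly found species, and generation stops when a pass finds nothing new. Without the size cap, the oligomerization rule alone would never terminate.

```python
from itertools import combinations_with_replacement

# Toy chemistry (assumption): a species is an olefin identified only by its
# carbon number, so 3 stands for propene, 6 for a hexene, and so on.


def oligomerize(a, b):
    """Bimolecular rule: C_a + C_b -> C_(a+b)."""
    return [((a, b), (a + b,))]


def crack(a):
    """Unimolecular rule: C_a -> C_k + C_(a-k), with both fragments >= C2."""
    return [((a,), (k, a - k)) for k in range(2, a // 2 + 1)]


def generate_network(initial, max_carbons):
    """Apply the rules iteratively until no new species appear."""
    species = set(initial)
    reactions = set()
    frontier = set(initial)  # species discovered in the previous pass
    while frontier:
        candidates = []
        # Unimolecular rules only need to see newly discovered species.
        for a in frontier:
            candidates.extend(crack(a))
        # Bimolecular rules: every pair that includes at least one new species.
        for a, b in combinations_with_replacement(sorted(species), 2):
            if a in frontier or b in frontier:
                candidates.extend(oligomerize(a, b))
        new_species = set()
        for reactants, products in candidates:
            if max(products) > max_carbons:
                continue  # global size constraint: discard oversized products
            reactions.add((tuple(sorted(reactants)), tuple(sorted(products))))
            new_species.update(p for p in products if p not in species)
        species |= new_species
        frontier = new_species
    return species, reactions


species, reactions = generate_network({3}, max_carbons=6)
print(sorted(species), len(reactions))
```

Starting from propene (`{3}`) with `max_carbons=6`, the sketch prints `[2, 3, 4, 5, 6] 8`: the C2 to C6 olefins connected by eight distinct reactions.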
First, the size and interdependence of the network necessitate the development and use of new computational tools and methods. For example, identifying reaction pathways and mechanisms requires efficient network traversal algorithms that can identify cycles and unique routes (those that differ in their chemistry) among all possible ones, and prune out those that do not satisfy user-defined constraints (length, intermediates present, etc.). The kinetic models of these systems, further, tend to be high-dimensional nonlinear systems of differential-algebraic equations that require robust, stiff numerical solvers capable of handling the size of the system. Second, the computational tools need to be generic, or chemistry-independent. For example, calculating molecular properties such as the thermochemistry of a large number of species requires semi-empirical methods for quick estimation and a generic, flexible software framework that provides enough options for specifying how to calculate the thermochemical quantities of any stable compound or reactive intermediate. Many network generators have been developed, mainly for kinetic modeling of specific systems such as pyrolysis, catalysis, and biological systems. However, to handle the wide variety of organic chemistries and to allow for a complete analysis of complex networks in terms of the questions pertaining to their elucidation discussed above, a network generator that is generic in terms of the allowed chemistries and analysis options (extending beyond kinetic modeling) is required.

1.2 Rule Input Network Generator: Features

RING takes as inputs initial reactants, reaction rules, and post-processing network analysis instructions written in a domain-specific reaction language. In addition to an exhaustive network of all possible reactions and species consistent with the input reaction rules, RING can output, as desired, (a) lumps, or groups of molecules that are functional isomers§, and the corresponding lumped network, (b) reaction pathways and mechanisms to products, (c) thermochemical properties of each species in the network using group additivity inputs, and (d) results of a kinetic model, namely the predicted yield of different species for user-specified kinetic parameters, reactor size, and physical conditions of temperature and pressure. RING has been shown to handle a wide variety of chemistries, including free-radical systems and homogeneous and heterogeneous acid/base/metal catalysis [19].

RING builds on the concepts of existing network generators and expands the scope of systems that can be handled and the types of analysis that can be performed. Specifically, it employs state-of-the-art cheminformatics and graph-theoretic algorithms and methods to represent compounds and reaction rules, manipulate the structure of molecules to generate new reactions, identify molecular characteristics such as shape and chemical functionality, traverse reaction networks to find pathways and mechanisms, and determine molecular similarity while grouping together (or lumping) isomers. This allows RING to be (i) generic in terms of the chemistries that can be modeled, (ii) flexible in the analyses that can be performed, and (iii) scalable to large networks. A detailed description of the features of RING is given in Chapter 3, while Chapter 4 presents all the underlying algorithms.

1.3 Applications of RING

Several case studies are shown in this thesis to demonstrate how RING can be used to elucidate the two different classes of networks discussed earlier in this chapter. These case studies demonstrate all three types of network analysis methods discussed above.

§Isomers are molecules having the same molecular formula but a different chemical structure, and therefore a different identity. Functional isomers are a subset of isomers that have the same set of groups (of atoms) and therefore similar chemical functionality; such isomers are sometimes hard to distinguish experimentally.

1.3.1 Topological analysis and mechanism hypothesis of complex systems

Chapter 5 presents case studies in which the topological analysis features in RING are shown to enable: (a) identification of dominant pathways to specific products in complex networks using kinetic parameters obtained from theory, and (b) formulation of plausible mechanistic hypotheses and determination of possible experiments to test them. The case studies include: (a) Brønsted acid-catalyzed conversion of propane to aromatics, and (b) dehydration of glycerol and conversion of acetone on acid catalysts. Through these case studies, we demonstrate that RING can be used to guide experimentation and computational analysis.

1.3.2 Identification of synthetically feasible compounds and synthesis routes

Chapter 6 discusses a method involving the second type of analysis. Specifically, a method is presented to (a) construct an exhaustive synthesis network of biomass conversion reactions, (b) query the spectrum of possible compounds (15,000 or more) for desirable ones on the basis of physical properties, (c) identify synthesis routes to produce these compounds, and (d) compare these routes in terms of stoichiometric, energetic, and physical parameters such as atom efficiency, enthalpy and free energy changes of reactions, and aqueous-organic partition coefficients (log P)¶ of intermediate compounds. To calculate physical and thermochemical properties, we employ quantitative structure-property relationships and the group additivity method. This method is presented in the context of identifying and evaluating heterogeneous catalytic routes from biomass to fatty alcohols that are potential constituents of nonionic surfactants. We thereby demonstrate that the method is generic, flexible, reliable, and fast in terms of, respectively, the scope of chemistry that can be considered, the properties that can be included, predictive accuracy, and speed of execution; it can, therefore, be used to rapidly screen a large spectrum of compounds and synthesis routes in biomass conversion.
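As a small illustration of the stoichiometric comparison in step (d), the sketch below computes atom efficiency under the common atom-economy definition: the molar mass of the desired product divided by the total molar mass of the reactants. That definition, the helper names, and the glycerol-to-acrolein dehydration (C3H8O3 → C3H4O + 2 H2O) used as the example are assumptions for illustration; the thesis may define atom efficiency differently.

```python
import re

# Standard atomic masses (g/mol) for the elements used in this example.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(formula):
    """Molar mass of a simple formula such as 'C3H8O3' (no parentheses or charges)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


def atom_economy(desired_product, reactants):
    """Percent of the reactant mass that ends up in the desired product.

    `reactants` lists one formula per molecule consumed.
    """
    return 100.0 * molar_mass(desired_product) / sum(molar_mass(r) for r in reactants)


# Illustrative route: glycerol -> acrolein + 2 H2O
ae = atom_economy("C3H4O", ["C3H8O3"])
print(f"{ae:.1f}%")  # 60.9%
```

The water co-product carries the remaining mass, which is why a dehydration step can never reach 100% atom economy under this definition.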

¶This is a measure of the relative solubility of a compound in two immiscible liquids (like water and oil) at equilibrium.

1.3.3 Mechanism elucidation of complex glycerol conversion network on transition metals

Chapter 7 extends the method proposed above to identify energetically feasible mechanisms of glycerol conversion on transition metal catalysts‖. Depending on the transition metal catalyst used, glycerol can either decompose into carbon monoxide and hydrogen (collectively referred to as syngas) through carbon-carbon (C-C) bond scission reactions, or form compounds such as 1,2-propanediol through carbon-oxygen (C-O) bond scission steps. The method is used to construct a complex network comprising 3300 reactions and 500 species that exhaustively describes possible transformations of glycerol on these catalysts. Subsequently, using semi-empirical methods available in the literature, the method calculates the thermochemistry and activation barriers∗∗ of each reaction step. The energetically feasible pathways for forming C-C scission and C-O scission products (syngas and propanediol, respectively) are then identified on platinum, palladium, rhodium, and ruthenium catalysts. The results (discussed in detail in Chapter 7) indicate that (a) syngas formation is preferred on platinum and palladium, (b) rhodium and ruthenium have a comparatively higher selectivity for C-O scission products, and (c) glycerol tends to undergo several dehydrogenation steps prior to C-C scission; these findings are all consistent with experimental observations and DFT calculations. We propose that, owing to its generality, flexibility, and speed, this method can be used to screen a large number of pathways in complex catalytic networks and thereby identify the dominant modes in the system.

1.3.4 Kinetic modeling of Methanol-to-Hydrocarbons system

Chapter 8 presents the final type of analysis: kinetic modeling of complex reaction systems using RING. Specifically, the chapter demonstrates (a) specification of the kinetic parameters of each reaction rule in a rule-based manner and (b) formulation and specification of group additivity rules for thermochemistry estimation, in order to construct and solve a thermodynamically consistent kinetic model automatically after network generation in RING. This is presented in the context of kinetic modeling of methanol conversion to hydrocarbons. We present kinetic modeling results, namely the yield of various products at different points along a plug flow reactor for user-specified reactor conditions and sizing, and compare our predictions with experimental data. The chapter also lays the foundation for estimating kinetic and thermochemical parameters by fitting the kinetic model to experimental data.

‖These are metallic or alloy catalysts of transition metals such as platinum, nickel, copper, etc.

∗∗The threshold energy required by a reactant molecule to cross over a potential energy hill to ultimately form a product.

CHAPTER 2

Background∗

In Chapter 1, several examples of complex reaction systems were given. It was argued that, despite their large size, these reaction networks can be constructed from a relatively small set of chemical transformation rules. Automated network generators have therefore been developed to construct such networks from initial reactants and pre-specified reaction rules. In this chapter, we provide a detailed discussion of relevant background developments in cheminformatics and of the state of the art in network generation and kinetic modeling.

2.1 Computer-assisted prediction of organic chemical reactions

A systematic computer-based representation of reactions was first applied in organic chemistry for the synthesis of large and complex drug-like molecules. LHASA [21], a tool for computer-assisted analysis of organic syntheses, is an interactive program that can perform retrosynthetic analysis. Synthesis strategies and heuristics are applied to generate plausible pathways for synthesizing target molecules from commonly available compounds and reagents. The pathways generated by LHASA [21] are based on a database of known reactions. In contrast, IGOR [22, 23] and SYNGEN [24] adopt a formal approach to organic synthesis by basing their synthetic strategies on reactions generated through bond and electron rearrangements. Molecules in IGOR are represented by bond-electron (BE) matrices [25], an abstraction that treats molecules as graphs. The off-diagonal entries hold the bond order between two atoms, while the diagonal entries store the total number of electrons in lone pairs and unpaired electrons. A reaction step is represented by a matrix (R) of changes in the BE matrix. CAMEO [26] was developed to predict organic reactions based on mechanistic reasoning starting from a given set of initial reactants. ROBIA [27, 28], a reaction prediction tool, applies a set of reaction rules from a database to input molecules under given reaction conditions to identify products. These are further evaluated using computational chemistry methods such as AM1 [29] to predict likely transformations.

More recently, an expert system [30] was presented to predict reactions and mechanisms and to formulate retrosynthesis strategies on the basis of a database of reaction rules written in SMIRKS [31] format. This tool keeps track of electron flows and maps atoms between reactants and products so that the resulting reactions are balanced. The concept of graph transformation has been discussed and applied for the formal representation of reactions in chemical and biological systems [32, 33, 34, 35, 36]. Reaction rules, in this case, are represented as graph transformation rules, that is, operations on the node and edge properties of the molecular graphs. Reactions, then, are the result of applying these transformation rules to a set of reactant graphs. Such an abstraction is applicable irrespective of the type of chemistry and naturally allows for a rule-based description of a reaction system.

∗Reported with permission from Rangarajan et al. [20] Copyright © 2012 Elsevier Inc.
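The bond-electron representation and the reaction matrix R of Dugundji and Ugi can be illustrated in a few lines. The sketch below, which assumes plain nested lists and uses hypothetical helper names, applies the relation E = B + R (product matrix equals reactant matrix plus reaction matrix) to the homolysis of Cl2, an example chosen here for illustration rather than taken from the text. A valid R matrix must conserve valence electrons, so its entries sum to zero.

```python
def add_matrices(B, R):
    """Dugundji-Ugi scheme: the product BE matrix is E = B + R."""
    return [[b + r for b, r in zip(row_b, row_r)] for row_b, row_r in zip(B, R)]


def valence_electrons(M):
    """Total valence electrons encoded in a BE matrix.

    Diagonal entries count nonbonding electrons; a bond of order n appears at
    both (i, j) and (j, i), so summing every entry counts its 2n electrons.
    """
    return sum(sum(row) for row in M)


# Homolysis of Cl2, atoms ordered (Cl_a, Cl_b).
B = [[6, 1],    # each Cl: three lone pairs (6 electrons); one Cl-Cl single bond
     [1, 6]]
R = [[1, -1],   # break the bond; each Cl gains one unpaired electron
     [-1, 1]]

E = add_matrices(B, R)
assert valence_electrons(R) == 0  # electron conservation
print(E)  # [[7, 0], [0, 7]]
```

Because the rule is just a matrix of changes, the same R can be applied to any pair of atoms whose local bonding matches it, which is what makes the representation convenient for automated generation.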
Such tools for organic synthesis cannot be directly applied to reaction network generation because the two problems are characteristically different [37, 38]. The techniques of organic synthesis design, such as the representation and transformation of molecules, can, however, be extended to the problem of reaction network generation. Specifically, principles and ideas of graph theory that have been exploited in the chemistry literature form the basis of the formal internal representation of species and reactions in our computational tool RING.

2.2 Network generation and analysis: a review

Rule-based automated reaction network generators are computational tools that take in a set of molecules as reactants and iteratively apply a set of input reaction rules to construct a comprehensive list of possible reactions. Network generators have been developed and applied in different fields such as pyrolysis and oxidation, catalysis, and biological systems. Table 2.1 lists and describes several of them. All automated network generators have five common and essential features [39]. First, an unambiguous representation is required for molecules and reactions; this is usually in the form of character strings for input and output [40, 41, 42]. Second, an internal representation of molecules, such as molecular trees, adjacency matrices, or chemical graphs, is required to enable quick structure manipulation. The adjacency matrix is the most common representation format owing to its simplicity. The adjacency matrix M of a molecule is a square matrix containing connectivity and bond order information for every pair of atoms. Thus, M(i,j) = 0 implies that the ith and jth atoms are not connected, while a positive value indicates the order of the bond (1 for a single bond, 2 for a double bond, etc.). The diagonal values indicate the number of unpaired electrons on the atoms. The bond-electron matrix is therefore a form of adjacency matrix. Third, an internal representation of reaction rules that can be applied iteratively to the molecules is required. A common representation scheme is the reaction matrix R proposed by Dugundji and Ugi [25] and later used in other tools such as NETGEN [40]. Baltanas & Froment [43] used a Boolean matrix to represent molecules when generating networks for modeling paraffin cracking and isomerization on bifunctional catalysts. The Boolean matrix is similar to the adjacency matrix; however, bonds of a higher order (e.g.
double bonds) and information on charges (such as +1 for carbenium ions) are stored separately. This method is therefore similar to that of Dugundji and Ugi [25]. Transformations in RDL [41] and RDL++ [44], on the other hand, are input by the user as English-like statements describing changes in the charge and bonding of the atoms participating in the reaction rule; these are applied directly to the internal graph description of molecules. Fourth, all network generators have a generation scheme that iteratively applies the reaction rules to all input and generated molecules so that the resulting network is exhaustive. The scheme should ensure that, for each reaction rule, all possible reactions of a given set of reactants are generated. Faulon and Sault [45] describe such a generation scheme as deterministic network generation. Combinatorial explosion is an important practical problem that can significantly increase execution time and lead to a large proportion of unimportant or improbable reactions. The fifth essential feature of most network generation tools, therefore, is a systematic procedure to curtail this effect. When kinetic parameters are available a priori, quantitative estimates of the magnitude of the reaction rates allow for the identification of "important" and "unimportant" reactions and species that should be included in, or excluded from, the network. For example, the tool NETGEN adopts rate-based [46] network pruning criteria. This requires generation and kinetic modeling to be performed concurrently, because the rates calculated on the fly are used to determine whether a particular species will react further. In the absence of such kinetic information, either topological or expert-based constraints can be provided.
For example, species rank-based network pruning criteria [47] prevent reactions that involve species of rank greater than a specified value, while the tools RDL [41] and RDL++ [44] allow for the specification of constraints that molecules must satisfy to undergo a particular transformation. Faulon and Sault [45] propose stochastic (or sampling) network generation algorithms, in contrast to the deterministic scheme, for concurrent generation and reduction of networks. These algorithms scale in polynomial time but require on-the-fly estimation of rate constants, which is achieved, in their case, through quantitative structure-property relationships for free-radical chemistries. Kinetic modeling is a common application of automated network generation, wherein the system of differential-algebraic equations that captures the dynamics of the reaction system is formulated. The model is then solved with kinetic parameters that are estimated, predicted, or specified, to obtain product yield information. Network generation in conjunction with kinetic modeling has been extensively applied to hydrocarbon [40, 48] and biochemical systems [49]. Complex reaction networks, however, have also been analyzed for: (a) deriving topological properties such as the average path length of the network [50], (b) identifying synthetic and degradation pathways [51], and (c) deriving and testing plausible mechanisms and overall rate expressions [52, 53, 54]. The use of additional thermodynamic data in conjunction with the reaction network has further enabled quantitative analysis of networks in terms of: (a) generating thermodynamically meaningful flux distributions in biochemical systems [55], (b) extracting functional information such as regulatory sites in biological systems [56], and (c) identifying thermodynamically feasible synthesis routes [57] to form chemicals, or biological degradation pathways [58] to decompose molecules.
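The deterministic generation scheme and rank-based termination described above can be sketched as a breadth-first worklist. This is a minimal illustration under stated assumptions, not the implementation of NETGEN or RING: species are toy carbon numbers, rules are hypothetical Python callables returning tuples of products, and rank follows one common convention (initial reactants have rank 0, and a product has rank one greater than the highest-ranked reactant that forms it). Without the rank limit, the oligomerization rule would generate ever larger molecules indefinitely.

```python
def crack(n):
    """Toy unimolecular rule: split an n-carbon chain into two fragments."""
    return [(a, n - a) for a in range(1, n // 2 + 1)]


def oligomerize(a, b):
    """Toy bimolecular rule: join an a-carbon chain and a b-carbon chain."""
    return [(a + b,)]


def generate_network(initial, uni_rules, bi_rules, max_rank):
    """Exhaustive breadth-first generation; species of rank max_rank never react."""
    rank = {s: 0 for s in initial}
    frontier = list(initial)  # species whose reactions are generated next
    reactions = set()
    level = 0
    while frontier and level < max_rank:
        known = list(rank)  # snapshot: every species of rank <= level
        discovered = []
        for s in frontier:
            events = [((s,), rule(s)) for rule in uni_rules]
            events += [(tuple(sorted((s, t))), rule(s, t))
                       for rule in bi_rules for t in known]
            for reactants, outcomes in events:
                for products in outcomes:
                    reactions.add((reactants, tuple(sorted(products))))
                    for p in products:
                        if p not in rank:  # first (lowest-rank) discovery
                            rank[p] = level + 1
                            discovered.append(p)
        frontier = discovered
        level += 1
    return rank, reactions


rank, reactions = generate_network([3], [crack], [oligomerize], max_rank=2)
print(sorted(rank))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 12]
```

Pairing each frontier species with every species already known ensures that every bimolecular combination is tried exactly once per level (duplicates are removed by the set), which is the exhaustiveness requirement of deterministic generation.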
Table 2.1: Reaction network generators, a description of their essential features, and their areas of application.

Name | Description | Remarks | References
EXGAS | Kinetic model builder using a tree data structure for internal molecule representation. | (i) Applied in gas phase combustion and oxidation. (ii) A library of free-radical chemistry rules used in reaction network generation. (iii) Tree representation system based on Chinnick et al. [62]. | [63, 64]
COMGEN | Network generator based on chemical graph theory. String representation of molecules, reactant pattern based on Blurock et al. [65], and topological indices for molecule identification. | (i) Hydrocarbon gas phase chemistry. (ii) Thermochemistry was calculated from a database. | [42]
RMG | Kinetic models of free-radical chemistry of hydrocarbons. Kinetics estimated from semi-empirical relations obtained from theoretical calculations. | (i) Applications in hydrocarbon pyrolysis. (ii) Accurate calculations of kinetics and thermodynamics, and formulation of kinetic models to predict product yields and conversion. | [48, 66, 67]
NETGEN | Network generator and model builder based on 'BE' & 'R' matrices. Uses adjacency matrix representation of molecules and reaction rules. | (i) Rate-based and rank-based pruning. (ii) Application in gas phase pyrolysis, nanoparticle synthesis, and biochemical reactions. (iii) Linked to MOPAC [59] for thermochemistry. | [40, 57, 60, 61, 46, 8, 47]
RDL | English-like language-based description of reaction rules. | (i) Reaction networks generated from scratch depending upon the reaction rules input, thus offering flexibility in describing the system. (ii) Constraints on rules to prevent combinatorial explosion. | [41, 37, 68]
RDL++ | Extends RDL with additional features to enable description of solid-acid-catalyzed reactions of hydrocarbons. | (i) Applied in heterogeneous catalytic systems, microkinetic modeling, and data analysis and knowledge extraction in high-throughput experimentation. | [44, 69, 17, 70]
KING | An automated mechanism generator. | (i) Applied in combustion chemistry. (ii) Reactions are determined combinatorially, as a linear combination of elementary steps. | [71]
BioNETGEN | Rule-based generation of biological reaction networks. Graph-based representation of molecules, with each node being a building block of the macromolecule of the biological system. | (i) Application in reaction network generation in biological systems and subsequent dynamic modeling. | [72, 73, 74]
BNICE | Computational framework for generating and analyzing biological reaction pathways. Object-oriented framework using elements of graph theory. Uses 'BE' and 'R' matrices for molecule and reaction representation. | (i) Reaction rules are obtained from the enzyme function information in the KEGG database. (ii) Incorporation of group-contribution-based thermochemistry estimation for flux analysis and pathway prediction. | [75, 57, 55, 58]
SynBioSS | Modeling and simulation tool for synthetic biological systems. Complete enumeration of sets of biomolecular reactions based on user input of molecular parts involved in gene expression and regulation. | (i) Multiscale simulation of the generated reaction network using stochastic algorithms. | [76]
BioTrans | Computational tool for predicting the metabolism of chemicals in a mixture. Generated compounds are interconnected through common metabolites. ODE models of the different paths are solved to predict the time profiles of each of the compounds. | (i) Application in modeling of biotransformations of VOCs that commonly pollute water. | [77]

2.3 Topological network analysis

The transformations of a compound into various products can be described by finding: (a) a sequence of reactions that describes how the reactant is converted into an intermediate, which could react further and ultimately form the final product, and (b) a set of reactions that, when taken together, leads to an overall reaction without net generation or consumption of any unstable reactive intermediate. The former represents "a pathway" from the initial reactant to the final product, while the latter is a "mechanism" with a balanced overall stoichiometry. A direct mechanism is a minimal set of reactions that has zero net consumption/formation of reactive intermediates [78]; minimality ensures that no proper subset of this set is also a mechanism. Direct mechanisms, thus, are conceptually similar to reaction cycles. An overall (or complete) mechanism, further, is a set of direct mechanisms such that the reactants of the overall reaction are all initial reactants and the desired target molecule is one of the products. Overall mechanisms, thus, are conceptually similar to a set of reaction cycles that operate simultaneously in a reaction system to convert the initial reactants to final products. Identifying pathways between two species in a network has wide applications in biological network analysis, and algorithms with and without tracking of individual atoms have been proposed for both weighted and unweighted networks [79, 80]. Given a list of reactions, algorithms have been proposed for identifying possible direct mechanisms using either combinatorial [81, 82, 83] or graph-theoretic methods [52, 54]. A concept closely related to that of mechanisms is linear-algebra-based flux analysis, wherein a set of independent basis vectors spanning the solutions of the steady-state stoichiometric mass balance equations is identified [84].
Two commonly applied flux analysis techniques in systems biology are extreme pathways [85] and elementary flux modes [86].

CHAPTER 3

RING: description∗

In this chapter, we discuss RING, the new computational tool developed in this research for network generation and analysis. RING is an automatic rule-based network generator that builds on and extends existing network generators. RING has a generic scope: it allows for representing intermediates and describing reaction rules applicable to most organic chemical transformations. In addition to kinetic modeling, RING also allows for topological network analysis through identification of pathways and mechanisms in the generated network. Figure 3.1 shows the input-output structure of RING as well as its three internal modules: a language compiler, a reaction network generator, and a post-processing module.

3.1 Components of RING

3.1.1 Reaction language compiler

The compiler acts as an interface between the user and the network generation and post-processing components of RING by translating the inputs into the instructions used internally by the network generator. The inputs into RING are written as a program in an English-like reaction language and include the initial reactants, the reaction rules, and a set of post-processing instructions. The utility of a language interface in network generation was demonstrated in RDL [41] and RDL++ [44]. The syntax of the language closely resembles chemistry parlance, so that debugging a program written in the language essentially involves proofreading. RING's language interface has been developed using SILVER, an attribute grammar specification language [87]. In addition to providing an interface as in RDL++, the language can: (a) catch chemistry-specific inconsistencies as well as syntactical errors, (b) improve speed of execution through domain-specific optimization of input instructions, and (c) allow for independent extensions of the original grammar in terms of both syntax and semantics.

[Figure 3.1 (diagram): rule inputs written in the reaction language (reactants, rules, post-processing instructions) pass through the reaction language compiler to the network generator and the post-processing module; outputs are the reaction network, lumps, pathways, and mechanisms.]

Figure 3.1: The modular structure of RING. Dashed arrows show internal information flow; solid arrows refer to inputs and outputs.

∗Reported from Rangarajan et al. [20] Copyright © 2012 Elsevier Inc.

3.1.2 Reaction network generator

The network generator uses the information on the initial reactants and the translated reaction rules to construct the reaction network by iteratively applying the rules to the initial reactants and to the products generated thereof. The output from the network generator is a list of all possible species and reactions consistent with the specified rules. Several cheminformatics algorithms that manipulate the chemical information of a molecule, extract topological information of molecules, and generate reactions have been adopted in RING. Further, standard cheminformatics formats have been used to input, represent, store, and retrieve the chemical information pertaining to molecules, such as elemental composition, valency, charge, bonding, and electronic information.

3.1.3 Post-processing module

Specific information regarding the constituents of, and transformations within, the network can be mined from the generated reaction network using the post-processing options available in RING. For example, the generated reaction network may be too large for manual analysis to reveal specific information, such as whether and how a particular molecule (or class of molecules) is formed. Such queries can be input in the form of instructions, or rules, specifying the analysis that is sought.

Lumping

Complex reaction networks can be composed of several species that are structural isomers of each other. If these similar species are grouped together into lumps, the total network size can be reduced significantly. For example, Hsu et al. [44] showed for acid-catalyzed aromatization of propane that the number of reactions without lumping of structural isomers was more than a million, while structural lumping resulted in only 605 distinct reactions. When the number of species in a system is large, analytical identification of lumps of molecules, rather than of individual species, is the more feasible option [15] because of experimental difficulties in distinguishing between isomers. Furthermore, lumping can lead to a reduced network that can be used to formulate a simplified, lower-dimensional mathematical model that is easier to solve. Lumping strategies have been proposed in hydrocarbon processing wherein a lump is represented either by a single likely structure, with reactions written on that basis [88], or as a mixture with an estimated internal distribution and an overall reaction computed on the basis of composition and kinetics information [89]. In contrast to these methods, lumping methods for dimension reduction in kinetic models through mathematical transformations have also been proposed [90, 91, 16]. Lumping instructions as an input feature are not available in the tools listed in Table 2.1, though RDL++ [44] can group together hydrocarbons in a post-processing step. In RING, however, a generic lumping scheme that identifies functional equivalence between isomers has been implemented. In general, two molecules can be lumped only when there is a one-to-one mapping between their atoms; a mapping exists in this case when each pair of mapped atoms belongs to the same functional group. Such functional equivalence cannot be inferred from molecular formula and structure alone for non-hydrocarbon molecules such as oxygenates.
We have, therefore, developed a new lumping algorithm that takes into consideration the environment of each atom in a molecule to decide which lump the molecule belongs to. For example, ethanol (CH3CH2OH) and dimethyl ether (CH3OCH3) have the same molecular formula but different functional groups. Ethanol has a C-OH group that the ether does not possess; hence, there is no one-to-one mapping between the oxygen atom of ethanol and that of the ether, and the two molecules are not lumped. On the other hand, 2-pentanol and 3-pentanol are functionally equivalent (see S2 in the supplementary material of Rangarajan et al. [20]) and are lumped together as secondary five-carbon alcohols. 1-Pentanol, being a primary alcohol, is naturally not a constituent of this lump.
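The ethanol/dimethyl ether and pentanol examples can be reproduced with a deliberately simplified atom-environment signature. The sketch below is not RING's lumping algorithm (which is described in Chapter 4); it assumes a hypothetical encoding in which each heavy atom is recorded as its element, its attached hydrogen count, and the sorted elements of its heavy-atom neighbors, and two molecules are lumped when their multisets of such environments coincide. It ignores bond orders and anything beyond first neighbors, so it only illustrates the idea.

```python
from collections import defaultdict


def atom_env_signature(atoms, bonds):
    """Multiset of (element, H count, sorted neighbor elements) over heavy atoms."""
    neighbors = defaultdict(list)
    for i, j in bonds:
        neighbors[i].append(atoms[j][0])
        neighbors[j].append(atoms[i][0])
    envs = [(el, h, tuple(sorted(neighbors[k]))) for k, (el, h) in enumerate(atoms)]
    return tuple(sorted(envs))


def lump(molecules):
    """Group molecule names whose atom-environment signatures are identical."""
    groups = defaultdict(list)
    for name, (atoms, bonds) in molecules.items():
        groups[atom_env_signature(atoms, bonds)].append(name)
    return sorted(sorted(g) for g in groups.values())


molecules = {
    # name: (heavy atoms as (element, attached H count), heavy-atom bonds)
    "ethanol": ([("C", 3), ("C", 2), ("O", 1)], [(0, 1), (1, 2)]),
    "dimethyl ether": ([("C", 3), ("O", 0), ("C", 3)], [(0, 1), (1, 2)]),
    "1-pentanol": ([("C", 3), ("C", 2), ("C", 2), ("C", 2), ("C", 2), ("O", 1)],
                   [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]),
    "2-pentanol": ([("C", 3), ("C", 1), ("C", 2), ("C", 2), ("C", 3), ("O", 1)],
                   [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)]),
    "3-pentanol": ([("C", 3), ("C", 2), ("C", 1), ("C", 2), ("C", 3), ("O", 1)],
                   [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]),
}

lumps = lump(molecules)
print(lumps)
# [['1-pentanol'], ['2-pentanol', '3-pentanol'], ['dimethyl ether'], ['ethanol']]
```

Both secondary alcohols contain exactly one CH carbon bonded to two carbons and an oxygen, whereas 1-pentanol instead has a CH2 bonded to oxygen, so only the first two share a signature.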

Pathway identification

RING can identify pathways from the initial reactants to user-specified molecules that satisfy user-specified constraints on the nature of the pathways (maximum length, presence or absence of particular reactions in the pathway, etc.). A reverse depth-first network traversal algorithm (see Chapter 4) is used to identify all possible pathways exhaustively.
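A reverse depth-first traversal with a length constraint can be sketched as follows. This is a simplified illustration, not the algorithm of Chapter 4: it assumes reactions are given as (reactants, products) tuples of species names, follows a single species backward from the target through each reaction that produces it, never revisits a species (which breaks cycles), and reports pathways as lists of reaction indices in forward order. The function name and the five-reaction network are hypothetical.

```python
from collections import defaultdict


def find_pathways(reactions, source, target, max_len):
    """Enumerate linear pathways source -> ... -> target with at most max_len steps."""
    producers = defaultdict(list)  # species -> indices of reactions that form it
    for idx, (_, products) in enumerate(reactions):
        for p in products:
            producers[p].append(idx)

    found = []

    def walk(species, path, visited):
        if species == source:
            found.append(path[::-1])  # path was built target-first
            return
        if len(path) == max_len:
            return
        for idx in producers[species]:
            for reactant in reactions[idx][0]:
                if reactant not in visited:  # never revisit a species: avoids cycles
                    walk(reactant, path + [idx], visited | {reactant})

    walk(target, [], {target})
    return found


reactions = [
    (("A",), ("B",)),  # 0
    (("B",), ("C",)),  # 1
    (("A",), ("C",)),  # 2
    (("C",), ("D",)),  # 3
    (("D",), ("B",)),  # 4: closes the cycle B -> C -> D -> B
]
print(find_pathways(reactions, "A", "D", max_len=3))  # [[0, 1, 3], [2, 3]]
```

Searching backward from the target means that only reactions which can actually contribute to forming the requested molecule are explored, which keeps the search small when the full network is large.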

Mechanism identification

In RING, all overall mechanisms from the initial reactants to user-specified products can be identified. Further, all direct mechanisms forming any specified molecule can also be found. The algorithms of Mavrovouniotis and Stephanopoulos [81] and Fan et al. [52] identify all possible direct mechanisms from a given list of reactions. Since mechanisms are only sought for specified products, these two algorithms cannot be directly used within RING. An adapted version of the pathways algorithm is instead used that seeks only the relevant direct or overall mechanisms.
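Every candidate mechanism, whatever search produces it, must satisfy the defining balance condition: once each step is weighted by the number of times it occurs, no reactive intermediate is net-formed or net-consumed. The sketch below checks only that condition; it is not the adapted search algorithm used in RING and does not test minimality. The toy surface cycle, in which "*" denotes a free site, and the helper names are assumptions for illustration.

```python
from collections import Counter


def overall_reaction(steps, multiplicities):
    """Net stoichiometry of the steps, each used `multiplicity` times (negative = consumed)."""
    net = Counter()
    for (reactants, products), m in zip(steps, multiplicities):
        for s in reactants:
            net[s] -= m
        for s in products:
            net[s] += m
    return {s: v for s, v in net.items() if v != 0}


def is_mechanism(steps, multiplicities, intermediates):
    """True if no reactive intermediate appears in the overall reaction."""
    net = overall_reaction(steps, multiplicities)
    return not any(s in net for s in intermediates)


# Toy catalytic cycle: '*' is a free site; 'A*' and 'B*' are adsorbed intermediates.
steps = [
    (["A", "*"], ["A*"]),   # adsorption
    (["A*"], ["B*"]),       # surface isomerization
    (["B*"], ["B", "*"]),   # desorption
]
intermediates = {"*", "A*", "B*"}
print(overall_reaction(steps, [1, 1, 1]))             # {'A': -1, 'B': 1}
print(is_mechanism(steps, [1, 1, 1], intermediates))  # True
```

Dropping the desorption step leaves adsorbed B and a consumed site in the net balance, so the remaining two steps do not form a mechanism.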

Thermochemistry estimation

RING can calculate the thermochemical parameters (enthalpy, entropy, and free energy) of formation of all the species in a network, and the corresponding changes for all the reactions, if the user provides group contribution rules [92].
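Once species properties are known, reaction changes follow from stoichiometric sums. The minimal sketch below assumes a Benson-style sum over group counts with the usual external-and-internal symmetry correction, -R ln(sigma), applied to the entropy; the group table is hypothetical apart from the C-(C)(H)3 values quoted in Figure 3.5, and the function names are illustrative only.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def species_properties(group_counts, group_table, symmetry=1):
    """Benson-style sums: H_f in kJ/mol and S in J/(mol K).

    group_table maps group name -> (H_f contribution, S contribution).
    """
    H = sum(n * group_table[g][0] for g, n in group_counts.items())
    S = sum(n * group_table[g][1] for g, n in group_counts.items()) - R * math.log(symmetry)
    return H, S


def reaction_changes(stoichiometry, props, T=298.15):
    """Delta H (kJ/mol), Delta S (J/(mol K)) and Delta G (kJ/mol) of one reaction.

    stoichiometry maps species -> coefficient (negative for reactants);
    props maps species -> (H_f, S) as returned by species_properties.
    """
    dH = sum(nu * props[s][0] for s, nu in stoichiometry.items())
    dS = sum(nu * props[s][1] for s, nu in stoichiometry.items())
    return dH, dS, dH - T * dS / 1000.0


GROUP_TABLE = {"C-(C)(H)3": (-42.9, 127.12)}  # values quoted in Figure 3.5
# Ethane = 2 x C-(C)(H)3; Benson's total symmetry number for ethane is 18
print(species_properties({"C-(C)(H)3": 2}, GROUP_TABLE, symmetry=18))  # about (-85.8, 230.2)
```

Absolute entropies can be used directly in the reaction sum because elements are conserved, so element reference entropies cancel.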

Kinetic modeling

Kinetic modeling provides quantitative insights, as opposed to the qualitative results of topological network analysis, by providing information on the concentration, yield, and selectivity of each species at different stages (or at different times) in the reactor, the sensitivity of outputs to kinetic parameters, and rate-determining steps. RING calls the open-source software IDAS [93] to solve a differential-algebraic system representing the kinetic model of the complex system, assuming that each reaction is elementary and follows mass-action kinetics. The user specifies reactor information (pressure, temperature, volume, and inlet flow rates of reactants), and RING solves for an isothermal, steady-state plug flow reactor (PFR). The outputs are the flow rates $y_i$ of the species in the network at different stages of the reactor, the sensitivities of the species flow rates to the kinetic rate constants, $\partial y_i / \partial k_j$, and the degree of rate control (DORC) [94]. The DORC is defined as

$$\mathrm{DORC}_{ij} = \left( \frac{\partial \ln r_i}{\partial \ln k_j} \right)_{k_{l \neq j},\, K_j},$$

wherein $r_i$ is the rate of production/consumption of species $i$ and $k_j$ is the rate constant of reaction $j$. The derivative is calculated holding all other rate constants $k_l$ ($l \neq j$) fixed, except that of the reverse of reaction $j$, whose value is set by the equilibrium constant $K_j$, which is also held fixed.
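The DORC definition can be made concrete with a small finite-difference sketch. It rests on several assumptions that are not RING's implementation: a fixed-step RK4 integrator stands in for IDAS, the volumetric flow rate is constant so that concentrations are flow rates divided by it, and the net production of the target species over the reactor (outlet minus inlet flow) is used as $r_i$. Scaling the forward and reverse constants of reaction $j$ by the same factor keeps $K_j$ fixed, as the definition requires. All names are hypothetical.

```python
import math


def simulate_pfr(species, reactions, F0, v0, volume, steps=2000):
    """Integrate dF_i/dV = sum_j nu_ij r_j along an isothermal PFR with classical RK4.

    reactions: list of dicts with keys "kf", "kr", "reactants", "products";
    the last two map species -> stoichiometric coefficient. Mass-action rates
    use C_i = F_i / v0 (constant volumetric flow rate is assumed).
    """
    index = {s: n for n, s in enumerate(species)}

    def rhs(F):
        C = [f / v0 for f in F]
        dF = [0.0] * len(F)
        for rxn in reactions:
            fwd = rxn["kf"] * math.prod(C[index[s]] ** n for s, n in rxn["reactants"].items())
            rev = rxn["kr"] * math.prod(C[index[s]] ** n for s, n in rxn["products"].items())
            rate = fwd - rev
            for s, n in rxn["reactants"].items():
                dF[index[s]] -= n * rate
            for s, n in rxn["products"].items():
                dF[index[s]] += n * rate
        return dF

    F = [F0.get(s, 0.0) for s in species]
    h = volume / steps
    for _ in range(steps):
        k1 = rhs(F)
        k2 = rhs([f + 0.5 * h * d for f, d in zip(F, k1)])
        k3 = rhs([f + 0.5 * h * d for f, d in zip(F, k2)])
        k4 = rhs([f + h * d for f, d in zip(F, k3)])
        F = [f + h / 6 * (a + 2 * b + 2 * c + d) for f, a, b, c, d in zip(F, k1, k2, k3, k4)]
    return dict(zip(species, F))


def degree_of_rate_control(species, reactions, F0, v0, volume, target, j, eps=1e-4):
    """Forward-difference estimate of d ln r_target / d ln k_j with K_j held fixed."""
    def net_production(rxns):
        out = simulate_pfr(species, rxns, F0, v0, volume)
        return out[target] - F0.get(target, 0.0)

    base = net_production(reactions)
    perturbed = [dict(r) for r in reactions]  # other rate constants stay untouched
    perturbed[j]["kf"] *= 1 + eps
    perturbed[j]["kr"] *= 1 + eps             # same factor, so K_j = kf / kr is unchanged
    return (math.log(net_production(perturbed)) - math.log(base)) / math.log(1 + eps)


SPECIES = ["A", "B"]
RXNS = [{"kf": 1.0, "kr": 0.0, "reactants": {"A": 1}, "products": {"B": 1}}]
out = simulate_pfr(SPECIES, RXNS, {"A": 1.0}, v0=1.0, volume=1.0)
# out["B"] is close to the analytical 1 - exp(-1), about 0.632
x = degree_of_rate_control(SPECIES, RXNS, {"A": 1.0}, 1.0, 1.0, "B", 0)
# x is close to the analytical exp(-1) / (1 - exp(-1)), about 0.582
```

For a single first-order step the DORC is below 1 because, at this conversion, part of the feed is already consumed, so raising $k$ has a diminishing effect on the outlet production.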

3.2 RING: Inputs

Inputs into RING are: (a) initial reactants, (b) global constraints, (c) reaction rules, and (d) post-processing instructions describing the desired analyses of the network. The outputs from RING are lists of: (a) species, (b) reactions, (c) lumps and their constituents, (d) pathways to specified products, (e) mechanisms to specified products, and (f) species or reactions extracted from the network by a molecule or reaction query. In this section, the inputs into and outputs from RING are discussed using the example of the dehydration of fructose to form HMF. This reaction is typically carried out in the aqueous phase and is catalyzed by mineral acids [95]. An illustrative program for this system is shown in Table 3.1.

3.2.1 Initial reactants

Initial reactants are represented as SMILES-like strings [96]. Lines 1-2 in Table 3.1 describe the reactants for the Fructose-to-HMF system (fructofuranose, and a proton to represent the acid), written in SMILES. A SMILES parser has been developed for RING that recognizes the elements C, H, N, O, S, and P. On many occasions, inorganic atoms such as Al, Cu, Pt, etc. are involved in chemical transformations, for example, as active sites (catalysts, ligands) or as reagents (electrophiles, nucleophiles, etc.) in substitution reactions, and have to be represented and manipulated during reaction generation. To this end, a new feature, composite atoms, has been added to represent a miscellaneous group of atoms as a single atom for convenience in representation and manipulation. A composite atom, within RING, is a user-defined entity used to represent: (a) an atom other than the recognized elements, or (b) a group of atoms considered together as a unified atom. These atoms are represented as alphanumeric strings enclosed within curly braces; for example, platinum atoms can be represented as {Pt}. The name given to a composite atom is arbitrary and does not have to be an element name. For example, two different types of active sites (AS) on a catalyst can be represented as {AS1} and {AS2}. These atoms have to be defined using the define composite atom statement. Once defined, RING recognizes the composite atom during reaction rule specification and reaction generation. Examples employing composite atoms are discussed in Chapter 4. A second feature that has been added to SMILES in RING is the definition of non-bonded interactions (partial and hydrogen bonds). This interaction is represented by a special symbol (see Figure 3.2) and can be used to depict, for example, dimeric alcohol intermediates co-adsorbed on a zeolite [97], or the adsorption of molecules on a metal surface, as shown in Figure 3.2. The use of SMILES with these two additional features allows for a rigorous description of the intermediates involved in different types of chemistries.

Figure 3.2: SMILES strings with composite atoms and non-bonded interactions: (i) dimeric intermediate in DEE synthesis from ethanol on proton-form zeolite catalysts; (ii) surface species formed upon dissociative adsorption of ethanol.

Table 3.1: Sample input code for Fructose-to-HMF system

Reaction rules & Post-processing

1. input reactant O1C(CO)(O)C(O)C(O)C1(CO)
2. input reactant [H+]
3. //Defining global constraints
4. global constraints on Molecule {
5. fragment a{
6. C+ labeled 1
7. any atom labeled 2 double bond to 1}
8. ! Molecule contains a
9. Molecule.size <15 && Molecule.charge <2
10. fragment b {
11. C labeled c1
12. C labeled c2 double bond to c1
13. X labeled x1 double bond to c2}
14. ! Molecule contains b}
15. //alcohol protonation rule
16. rule alcoholprot{
17. neutral reactant r1{
18. C labeled c1 {! connected to >1 O with any bond }
19. nonringatom O labeled o1 single bond to c1}
20. positive reactant proton{
21. H+ labeled h1}
22. constraints {r1.size <15 && r1.size >2}
23. form bond (o1, h1)
24. modify atomtype (o1, O+)
25. modify atomtype (h1, H)}
26. //More reaction rules here
27. lump all isomers{
28. represent acyclic with farthest apart{
29. represent cyclic with farthest apart}
30. find pathways to mol{
31. mol is OCc1oc(C=O)cc1
32. } constraints {
33. maximum length 11
34. contains <= 2 rule Hshift
35. eliminate similar pathways
36. } store in HMFPathways.txt
37. find complete mechanisms to mol{
38. mol is OCc1oc(C=O)cc1} overall constraints{
39. maximum length 12
40. maximum cycles 4 } cycles constraints {
41. maximum length 4
42. contains <=2 rule Hshift
43. eliminate similar mechanisms} store in HMFMechanisms.txt
44. find mol{
45. mol is cyclic && mol.size > 5 && mol is neutral
46. }store in CyclicNeutralMolecules.txt
47. find reactions{
48. rule is only alcoholprot
49. reaction with 1 reactant mol{ mol is cyclic}
50. } store in CyclicAlcProtRxns.txt

Figure 3.3: Dissociative adsorption of molecular hydrogen on metal sites (Pd).

3.2.2 Reaction rules

These describe the chemical transformations that take place in the reaction system [19] and comprise: (i) the set of atoms that participate in the reaction, called the reaction center or reactant patterns, (ii) the final electronic configuration of the atoms in the reaction center and the bonding between them, and (iii) constraints that govern the reaction center and the entire molecule. These constraints have to be satisfied by the reactant molecules before the reaction rule is applied. In addition, constraints can also be imposed on the products formed; these are classified as molecular constraints. Each reaction rule defines a particular chemical transformation, which can be elementary or non-elementary, and unimolecular or bimolecular. For a given reaction system, the reaction rules can be deduced from the literature and from knowledge of the operating conditions. For example, gas-phase pyrolysis of hydrocarbons or oxygenates typically involves homolytic bond cleavage/formation, beta-scission, hydride shifts, etc., while heterogeneous solid acid catalysis would involve adsorption/desorption of hydrocarbons, beta-scission, alkylation, etc. RING allows for the specification of an additional reactant identical to one of the reactants. This feature has been implemented mainly to address elementary steps on surfaces where two adjacent identical surface species (or sites) are involved, such as the desorption of surface hydrogen species on palladium to form molecular hydrogen [98]. This duplication allows for an exception whereby termolecular reactions can be described when the third reactant pattern is identical to one of the other two, as in the case of the dissociative adsorption of molecular hydrogen (the reverse of the desorption step) shown in Figure 3.3. Lines 16-25 in Table 3.1 describe the reaction rule for protonation of a hydroxy group, shown in Figure 3.4. The rule essentially describes the abstraction of a proton by the oxygen atom of an alcohol to form an oxonium ion.

Figure 3.4: Sample input reaction rule in RING: elementary step involving protonation of an alcohol.

Reactant patterns

Reaction centers (patterns) are written as shown in lines 17-21 in Table 3.1. The hydroxy pattern is described in lines 17-19: a carbon atom, labeled ‘c1’, that is single-bonded to an oxygen atom not in a ring, labeled ‘o1’. Lines 20-21 describe a proton: a positively charged hydrogen atom labeled ‘h1’. All atoms in a reactant pattern are labeled for unambiguous identification while describing transformations. Each reactant pattern describes one of the reactants. Rules can be unimolecular (a single pattern), bimolecular (two patterns that together constitute the reaction center), ternary (two identical patterns and a second, distinct pattern), or intramolecular (a special case of bimolecular rules). Bimolecular reaction rules can be set to additionally allow, or to only allow, intramolecular reactions; in such a case, an intramolecular reaction occurs only if both patterns exist in a single reactant.

Constraints specification

Constraints are essential in preventing a combinatorial explosion in the size of the generated network. In RING, two types of constraints can be specified: molecular constraints and atom environment constraints. The atom environment constraints, given along with the reactant pattern specification, describe the spatial properties of the atoms in the reaction center. Specifically, they describe: (a) the neighboring atoms and fragments, and (b) topological characteristics of the atoms, such as whether an atom is a ring, aromatic, or allylic atom, the size of the largest ring in which the atom is present, etc. The molecular constraints are constraints on (a) charge, (b) size, (c) presence (or absence) of structural features, and (d) shape and presence (or absence) of topological characteristics. Atom constraints are specified as shown in line 18 of Table 3.1, wherein the constraint restricts the carbon ‘c1’ to be connected to at most one oxygen atom. This ensures that the carbon is not part of an ester or an acid functional group. Molecular constraints are specified as shown in line 22 of Table 3.1: the alcohol reactant, r1, has an upper and a lower bound on its size (defined as the number of non-hydrogen atoms in it). Lines 17 and 20 also impose charge constraints on the reactants by using the prefixes “neutral” and “positive”. Combined constraints involving both reactants, and constraints on products, can also be specified. This constraint specification scheme, inclusive of reactant and product molecular constraints and reaction center atom constraints, thus provides the user with the capability of imposing restrictions of varying severity.
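How the pattern of lines 17-19 and the constraints of lines 18 and 22 combine can be sketched as a small matcher. This is an illustrative sketch, not RING's generated code: it assumes a heavy-atom-only molecule given as element symbols, a bond-order dictionary, and a set of ring-atom indices, and the function name `match_alcohol_pattern` is hypothetical.

```python
def match_alcohol_pattern(elements, bonds, ring_atoms=frozenset(), charge=0):
    """Return (c1, o1) index pairs matching reactant r1 of rule alcoholprot.

    elements:   heavy-atom symbols, e.g. ["C", "C", "O"] for ethanol
    bonds:      {(i, j): bond order} between heavy atoms, with i < j
    ring_atoms: indices of atoms that lie in a ring
    charge:     net charge of the molecule
    """
    def order(i, j):
        return bonds.get((min(i, j), max(i, j)), 0)

    size = len(elements)  # size = number of non-hydrogen atoms
    # Molecular constraints: 'neutral reactant' and r1.size < 15 && r1.size > 2
    if charge != 0 or not 2 < size < 15:
        return []
    matches = []
    for c1, element in enumerate(elements):
        if element != "C":
            continue
        # Atom constraint on c1: ! connected to >1 O with any bond
        n_oxygen = sum(1 for k, e in enumerate(elements) if e == "O" and order(c1, k) > 0)
        if n_oxygen > 1:
            continue
        for o1, e in enumerate(elements):
            # Pattern: nonringatom O labeled o1 single bond to c1
            if e == "O" and o1 not in ring_atoms and order(c1, o1) == 1:
                matches.append((c1, o1))
    return matches


print(match_alcohol_pattern(["C", "C", "O"], {(0, 1): 1, (1, 2): 1}))  # [(1, 2)]: ethanol
# Acetic acid: the carboxyl carbon has two O neighbours, so no match
print(match_alcohol_pattern(["C", "C", "O", "O"], {(0, 1): 1, (1, 2): 2, (1, 3): 1}))  # []
```

Checking the cheap molecular constraints before scanning atoms mirrors the fail-fast idea discussed for the compiler in Chapter 4.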

Transformations

A reaction rule is complete when the final configuration of atoms and bonds of the reaction center is specified. Transformations in RING, similar to those in RDL [41] and RDL++ [44], involve either changes in connectivity, such as bond formation/cleavage and modification of the bond order, or changes in electronic configuration, such as adding or removing charge, lone pairs, and unpaired electrons. The changes in electronic configuration are described by atomtype modifications: an atom of one atomtype changes to another atomtype (e.g., C forming C+). Lines 23-25 of Table 3.1 describe the transformations for the case of alcohol protonation: (a) formation of a single bond between the oxygen atom ‘o1’ and the hydrogen atom ‘h1’, (b) formation of an oxonium ion, and (c) loss of charge on the proton to form a neutral hydrogen. The description of changes in bond connectivity and electronic configuration, along with the reactant pattern description, provides complete information on the initial and final state of each atom in the reaction center. This allows for: (a) catching incorrect rules that violate valency constraints of atoms directly in the language compiler prior to network generation, and (b) correctly identifying changes in topological characteristics of molecules, such as in steps that break aromaticity. Such a description of transformations is independent of the type of chemistry and hence applicable to most organic chemical transformations in homogeneous or heterogeneous, catalytic or non-catalytic chemistries. The Fructose-to-HMF system has several additional rules: protonation of carbonyl and C=C groups (and their reverse steps), hydride shift, allyl shift, and dehydration of the oxonium ion. Appendix A gives the corresponding reaction language code for these elementary steps. These rules, while only a subset of all acid-catalyzed elementary steps, form the likely steps for an aqueous-phase, mineral-acid-catalyzed system at moderate temperatures (about 100 °C).
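Because both the initial atomtypes and all transformations of the reaction center are known, a valency check can run before any molecule is seen. The sketch below is one way such a check could work, under the assumption that each atomtype has a fixed total bond order (the `VALENCE` table is hypothetical and covers only the types used here): the change in an atom's valence must equal the net change in its bond orders implied by the transformations.

```python
# Hypothetical total bond order allowed for each atomtype
VALENCE = {"C": 4, "C+": 3, "C.": 3, "O": 2, "O+": 3, "H": 1, "H+": 0}
BOND_CHANGE = {"form_bond": 1, "break_bond": -1,
               "increase_bond_order": 1, "decrease_bond_order": -1}


def check_rule_valence(atomtypes, transformations):
    """Return a list of valency errors for a rule's reaction center.

    atomtypes:       label -> initial atomtype, e.g. {"o1": "O", "h1": "H+"}
    transformations: tuples (op, a, b); for "modify_atomtype", b is the new type
    """
    final = dict(atomtypes)
    bond_delta = dict.fromkeys(atomtypes, 0)
    for op, a, b in transformations:
        if op == "modify_atomtype":
            final[a] = b
        elif op in BOND_CHANGE:
            bond_delta[a] += BOND_CHANGE[op]
            bond_delta[b] += BOND_CHANGE[op]
        else:
            raise ValueError(f"unknown transformation {op!r}")
    errors = []
    for label, start in atomtypes.items():
        needed = VALENCE[final[label]] - VALENCE[start]
        if needed != bond_delta[label]:
            errors.append(f"{label}: {start}->{final[label]} needs bond change "
                          f"{needed:+d}, rule gives {bond_delta[label]:+d}")
    return errors


# Alcohol protonation, lines 23-25 of Table 3.1
ATOMS = {"c1": "C", "o1": "O", "h1": "H+"}
RULE = [("form_bond", "o1", "h1"), ("modify_atomtype", "o1", "O+"),
        ("modify_atomtype", "h1", "H")]
print(check_rule_valence(ATOMS, RULE))  # []
# Dropping 'modify atomtype (o1, O+)' leaves o1 with one bond too many
print(check_rule_valence(ATOMS, [RULE[0], RULE[2]]))
```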
3.2.3 Global constraints

These are molecular constraints on the entire reaction system, applicable to all molecules in the network. For example, global constraints can fix the maximum possible size of a molecule in a system. For the Fructose-to-HMF system, lines 4-14 describe molecular constraints applicable to all molecules at all times: the size of a molecule must be less than 15 non-hydrogen atoms (line 9), the maximum allowed charge on a molecule is +1 (line 9), a positively charged carbon atom can never be connected to any atom by a double bond (lines 5-8), and there are no consecutive double bonds (lines 10-14). Note that the two fragments, a and b, are defined in a manner similar to the reactant pattern definition, and refer to specific patterns that should or should not be present in the molecule. Such fragment constraints can also be imposed in reaction rules.
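The global constraints of lines 4-14 can be read as a predicate applied to every generated molecule. The sketch below is a hypothetical rendering of that predicate on a heavy-atom graph; it assumes that "X" in fragment b matches any atom and that the charge of a molecule is the sum of the formal charges encoded in its atomtypes. It is not the code RING emits.

```python
def global_constraint_violations(atomtypes, bonds):
    """List which global constraints of Table 3.1 (lines 4-14) a molecule violates.

    atomtypes: heavy-atom types, e.g. ["C", "C+", "O"]
    bonds:     {(i, j): bond order} between heavy atoms
    """
    violations = []
    size = len(atomtypes)  # number of non-hydrogen atoms
    charge = sum(t.count("+") - t.count("-") for t in atomtypes)
    if not size < 15:
        violations.append("size")    # line 9: Molecule.size < 15
    if not charge < 2:
        violations.append("charge")  # line 9: Molecule.charge < 2
    double_partners = {}
    for (i, j), order in bonds.items():
        if order == 2:
            double_partners.setdefault(i, []).append(j)
            double_partners.setdefault(j, []).append(i)
    # Fragment a (lines 5-8): a C+ double-bonded to any atom
    if any(atomtypes[i] == "C+" for i in double_partners):
        violations.append("fragment a")
    # Fragment b (lines 10-14): C=C=X, i.e. a C with two double bonds, one of them to a C
    for c2, partners in double_partners.items():
        if atomtypes[c2] == "C" and len(partners) >= 2 and any(atomtypes[p] == "C" for p in partners):
            violations.append("fragment b")
            break
    return violations


print(global_constraint_violations(["C", "C", "C"], {(0, 1): 2, (1, 2): 2}))   # ['fragment b'] (allene)
print(global_constraint_violations(["C+", "C", "C"], {(0, 1): 1, (1, 2): 2}))  # [] (allyl cation)
```

Returning the names of violated constraints, rather than a single boolean, makes it easier to see why a candidate molecule was rejected.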

3.2.4 Post-processing instructions

Lumping strategy

In RING, the user can specify: (a) whether lumping based on functional equivalence is required, (b) whether the representative molecule of the lump is the isomer whose branches are farthest apart or closest together, and (c) whether further lumping of hydrocarbons, such as paraffins, olefins, naphthenes, and aromatics, based on size is required. While line 27 specifies that lumping is required, lines 28-29 in Table 3.1 specify that acyclic and cyclic molecules satisfying the condition for lumping are both represented by the constituent that has its branches farthest apart. For example, in the lump of five-carbon secondary alcohols (Section 3.1.3), the lump representative, according to line 28, would be 3-pentanol, because its branch (the C-OH group) is farther from both ends of the chain than that of 2-pentanol, the other constituent.
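One possible reading of the "farthest apart" criterion is sketched below. It is a simplifying assumption, not RING's rule: each acyclic isomer is described by its main-chain length and the 1-based positions of its branches, and isomers are scored by the summed distance (in bonds) from each branch to the nearer chain end, with ties broken by name for determinism. The function names are hypothetical.

```python
def end_distance_score(chain_length, branch_positions):
    """Sum over branches of the distance (in bonds) to the nearer end of the main chain."""
    return sum(min(p - 1, chain_length - p) for p in branch_positions)


def pick_representative(lump, farthest=True):
    """lump: isomer name -> (main-chain length, 1-based branch positions)."""
    key = lambda name: (end_distance_score(*lump[name]), name)
    return max(lump, key=key) if farthest else min(lump, key=key)


SEC_PENTANOLS = {"2-pentanol": (5, [2]), "3-pentanol": (5, [3])}
print(pick_representative(SEC_PENTANOLS))                  # 3-pentanol
print(pick_representative(SEC_PENTANOLS, farthest=False))  # 2-pentanol
```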

Pathways identification

Pathway querying in RING consists of two steps: a description of the target molecule(s), and a description of the nature of the pathways desired. Target molecules are described by molecular constraints, while constraints on the nature of the pathways (pathway constraints) are described subsequently to limit the path length, to require the presence or absence of rules and molecules, and to constrain the reactants and products in one or more reactions of the pathway. Lines 30-36 in Table 3.1 show a pathway query to HMF, “OCc1oc(C=O)cc1”. The constraints on the pathway include a maximum length of 11 reactions in the sequence, a limit of at most two reactions corresponding to the rule Hshift (hydride shift; see Appendix A), and a requirement that similar pathways be eliminated.
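The pathway constraints of lines 33-35 can be pictured as a filter over candidate pathways. The sketch assumes each pathway is a list of (rule name, reaction) steps and, as a hypothetical definition, treats two pathways as "similar" when they apply the same sequence of rules; the thesis does not state RING's exact similarity criterion, and `filter_pathways` is an illustrative name.

```python
def filter_pathways(pathways, max_length=11, max_hshift=2):
    """Keep pathways satisfying lines 33-35 of Table 3.1.

    pathways: lists of (rule_name, reaction) steps
    Assumption: 'similar' pathways are those with identical rule sequences.
    """
    kept, seen = [], set()
    for path in pathways:
        if len(path) > max_length:                                     # maximum length 11
            continue
        if sum(1 for rule, _ in path if rule == "Hshift") > max_hshift:  # contains <= 2 rule Hshift
            continue
        signature = tuple(rule for rule, _ in path)
        if signature in seen:                                          # eliminate similar pathways
            continue
        seen.add(signature)
        kept.append(path)
    return kept


P1 = [("alcoholprot", "rxn1"), ("dehydration", "rxn2")]
P2 = [("alcoholprot", "rxn3"), ("dehydration", "rxn4")]  # same rule sequence as P1
P3 = [("Hshift", "rxn5")] * 3                            # too many hydride shifts
P4 = [("alcoholprot", "rxn6")] * 12                      # too long
print(len(filter_pathways([P1, P2, P3, P4])))            # 1
```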

Mechanism identification

Lines 37-43 in Table 3.1 describe an overall mechanism query to find mechanisms to HMF. The mechanisms can have several cycles; therefore, constraints can be imposed on the overall mechanism (lines 39-40) as well as on individual cycles (lines 41-43). The overall constraints restrict the total number of reactions (at most 12) and cycles (at most 4) in the complete mechanism. The individual cycle constraints restrict the number of reactions in each cycle (at most 4) and limit the number of reactions corresponding to the rule Hshift to at most two. Further, the requirement “eliminate similar mechanisms” is imposed so that multiple cycles having the same overall reaction are not obtained.

Molecule and reaction queries

The generated network can be queried for specific components of the network: species and reactions. Molecule queries are given as shown in lines 44-46 in Table 3.1, wherein all molecules satisfying the specified molecular constraints (cyclic, more than five non-hydrogen atoms, and neutral) are sought. Further, reaction queries seeking all reactions of the network involving specific types of reactions, reactants, or products can also be input. Lines 47-50 in Table 3.1 show an example of a reaction query for obtaining all alcohol protonation reactions involving a cyclic reactant.
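The two queries of lines 44-50 amount to filtering the generated species and reactions with predicates. In the minimal sketch below, the molecule records, reaction records, and function names are hypothetical, and "reaction with 1 reactant mol{mol is cyclic}" is interpreted as "at least one reactant is cyclic".

```python
def find_molecules(molecules, predicate):
    """Names of molecules whose property record satisfies the predicate."""
    return [name for name, props in molecules.items() if predicate(props)]


def find_reactions(reactions, molecules, rule, reactant_predicate):
    """Reactions of the given rule with at least one reactant satisfying the predicate."""
    return [r for r in reactions
            if r["rule"] == rule and any(reactant_predicate(molecules[m]) for m in r["reactants"])]


MOLECULES = {
    "fructofuranose": {"cyclic": True, "size": 12, "charge": 0},
    "ethanol": {"cyclic": False, "size": 3, "charge": 0},
    "cyclic_cation": {"cyclic": True, "size": 12, "charge": 1},
    "H+": {"cyclic": False, "size": 0, "charge": 1},
}
REACTIONS = [
    {"id": "r1", "rule": "alcoholprot", "reactants": ["fructofuranose", "H+"]},
    {"id": "r2", "rule": "alcoholprot", "reactants": ["ethanol", "H+"]},
    {"id": "r3", "rule": "dehydration", "reactants": ["cyclic_cation"]},
]

# Lines 44-46: mol is cyclic && mol.size > 5 && mol is neutral
print(find_molecules(MOLECULES, lambda m: m["cyclic"] and m["size"] > 5 and m["charge"] == 0))
# Lines 47-50: alcohol protonation reactions involving a cyclic reactant
print([r["id"] for r in find_reactions(REACTIONS, MOLECULES, "alcoholprot", lambda m: m["cyclic"])])
```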

Thermochemistry calculation

RING provides two options for specifying group additivity information. First, groups and their contributions to enthalpy, entropy, and specific heat capacity at different temperatures can be specified. Second, corrections can be specified by defining fragments and their correction contributions. These corrections can account for non-nearest-neighbor effects such as gauche or 1,5 interactions. The user specifies the groups and their contributions as shown in Figure 3.5. The first atom in a group additivity fragment is the central atom, while the others are neighboring atoms, in accordance with Benson's definition of groups. No such differentiation is necessary for group corrections. The term “gasPhaseSpecies” is a characteristic defined by the user to describe molecules that are in the gas phase (and not surface-bound). Including this characteristic indicates to RING that the fragment correction should be applied only if the molecule is gaseous. RING compiles the additivity inputs into multiple sets of group additivity specifications classified according to the first (or central) atomtype (C, O, C+, C., etc.). For each set of group fragments, a hash value pair, say (h1, h2), is calculated for the fragments, similar to that in lumping. The first value of the hash pair, h1, takes into consideration the central atom and immediate neighbor information, while the second value, h2, takes into consideration the electronic configuration of the central and neighboring atoms. While most of Benson's groups only consider nearest neighbors, some groups have additional information about atoms twice removed from the central atom; specifically, in groups where neighboring carbon atoms are involved in C=C or C=O double bonds, those atoms are written as ‘Cd’ or ‘CO’ and are differentiated from the regular carbon atom. Note that a group such as C(C)(H)(H)(H), as defined in Figure 3.5, is contained in C(Cd)(H)(H)(H); that is, if an atom matches the latter fragment, it will match the former as well. Therefore, any group additivity method needs to check the second fragment before it checks the first. Similarly, a group such as C(Cd)(Cd)(C)(H) needs to be checked prior to C(Cd)(C)(C)(H), which in turn needs to be checked prior to C(C)(C)(C)(H). To distinguish neighboring C atoms from ‘Cd’ or ‘CO’, the hash value h2 that RING calculates is appropriately modified to reflect the number of double bonds of the neighboring atoms.
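The "most specific group first" requirement can be sketched without hashing. In this hypothetical illustration, which swaps in an explicit specificity ordering in place of RING's (h1, h2) hash pair, a generic neighbor label "C" in a group may stand for any carbon, while "Cd" and "CO" must be matched by carbons of that kind; groups are tried in order of how many specialized carbons they name, so C(Cd)(H)(H)(H) wins over C(C)(H)(H)(H) whenever both match.

```python
SPECIAL_C = ("Cd", "CO")  # carbons involved in C=C or C=O double bonds


def is_carbon(label):
    return label == "C" or label in SPECIAL_C


def group_matches(group, atom):
    """group, atom: (central atomtype, tuple of neighbour labels)."""
    (g_center, g_nbrs), (a_center, a_nbrs) = group, atom
    if g_center != a_center or len(g_nbrs) != len(a_nbrs):
        return False
    # Non-carbon neighbours (H, O, ...) must agree exactly
    if sorted(x for x in g_nbrs if not is_carbon(x)) != sorted(x for x in a_nbrs if not is_carbon(x)):
        return False
    # Every specialised carbon the group asks for must be present; generic "C" absorbs the rest
    return all(a_nbrs.count(s) >= g_nbrs.count(s) for s in SPECIAL_C)


def specificity(group):
    return sum(label in SPECIAL_C for label in group[1])


def assign_group(atom, groups):
    """Try the most specific groups first and return the first matching group name."""
    for name, group in sorted(groups.items(), key=lambda kv: -specificity(kv[1])):
        if group_matches(group, atom):
            return name
    return None


GROUPS = {
    "C-(C)(H)3": ("C", ("C", "H", "H", "H")),
    "C-(Cd)(H)3": ("C", ("Cd", "H", "H", "H")),
    "C-(C)3(H)": ("C", ("C", "C", "C", "H")),
    "C-(Cd)(C)2(H)": ("C", ("Cd", "C", "C", "H")),
    "C-(Cd)2(C)(H)": ("C", ("Cd", "Cd", "C", "H")),
}
print(assign_group(("C", ("H", "Cd", "H", "H")), GROUPS))  # C-(Cd)(H)3, not C-(C)(H)3
print(assign_group(("C", ("C", "Cd", "C", "H")), GROUPS))  # C-(Cd)(C)2(H)
```

The group contributions (as in Figure 3.5) would then be summed over the groups assigned to all atoms.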

Kinetic modeling

The user has to specify rules to calculate kinetic parameters. Figure 3.6 shows an example of how rules to calculate the kinetics of a particular reaction rule, H abstraction, are written in RING. Conditional “if” statements can be written to assign different kinetic parameters depending on the nature of reactants and/or products. For example, in Figure 3.6, the rule specifies that if “r1”, which represents a reactant in the rule “Habstraction” as defined by the user while specifying reaction rules, is a methyl radical ([C.]), then kinetics are assigned on the basis of the nature of the product. The word “primaryRadical” is a molecule characteristic defined by the user to represent primary radicals. The rule, therefore, goes on to specify that if the product is a primary radical, a particular set of values (line 3) is used for the pre-exponential factor, activation barrier, and n (temperature exponent); otherwise, another set of values (line 4) is used. If “r1” is not methyl, the values in line 5 are used. This scheme for kinetics specification allows more rules to be added to refine the estimation of kinetics. In addition, activation barriers can also be specified using linear free energy relationships such as the Brønsted-Evans-Polanyi relationships, which estimate the barriers as a linear function of the enthalpy of the reaction [99]. Thermodynamic consistency can be enforced by the user by specifying that the kinetics of a step be calculated from the kinetics of the reverse step. For example, lines 6-7 in Figure 3.6 define the kinetics of the C-C scission step to be the reverse of that of the C-C formation step.

group additivity {
  fragment {
    C labeled c1
    H labeled h1 single bond to c1
    H labeled h2 single bond to c1
    H labeled h3 single bond to c1
    C labeled c2 single bond to c1
  }
  enthalpy -42.9 kJ/mol
  entropy 127.12 J/mol/K
  cp (298=> 25.31, 400 => 32.07, 500 => 38.4, 600=> 44.06, 800=>53.56, 1000=>60.63, 1500=>72.47)
}
group corrections {
  gasPhaseSpecies
  fragment {
    nonringatom C labeled c1 {connected to 3 C with single bond}
    nonringatom C labeled c2 single bond to c1 {connected to 2 C with single bond}
  }
  enthalpy 2.9 kJ/mol
  entropy -0.7 J/mol/K
  cp (298=> -0.9, 400 => -1.08, 500 => -1.10, 600=> -1.01, 800=>-0.76, 1000=>-0.56, 1500=>-0.35)
}

Figure 3.5: Sample group additivity and corrections specification for thermochemistry estimation in RING

1. kinetics Habstraction {
2. if (r1 is "[C.]") {
3. if product mol any (mol is primaryRadical) {A 5.436e-1 cc/mmol/h Ea 29903.72 J/mol n 3.65}
4. A 2.718 cc/mmol/h Ea 5481 cal/mol n 3.46 }
5. A 2.5e10*3600/1000 cc/mmol/h Ea 10400*4.18 J/mol n 0.0 }

6. kinetics CCScission {
7. use reverse of CCformation}

Figure 3.6: Sample rule inputs into RING for the estimation of kinetic parameters
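The branches of Figure 3.6 and the "use reverse of" instruction can be mimicked in a few lines. This sketch assumes the modified Arrhenius form $k = A T^n \exp(-E_a/RT)$ for the parameters (A, Ea, n), converts cal to J with the thermochemical factor 4.184 (an assumption; line 5 of the figure itself uses 4.18), and computes a reverse constant as $k_r = k_f / K$ with $K = \exp(-\Delta G / RT)$, ignoring standard-state unit corrections. All function names are hypothetical.

```python
import math

R = 8.314         # J/(mol K)
CAL_TO_J = 4.184  # assumption: thermochemical calorie


def habstraction_parameters(r1_smiles, product_is_primary_radical):
    """Return (A [cc/mmol/h], Ea [J/mol], n) following the branches of Figure 3.6."""
    if r1_smiles == "[C.]":
        if product_is_primary_radical:
            return 5.436e-1, 29903.72, 3.65                # line 3
        return 2.718, 5481 * CAL_TO_J, 3.46                 # line 4
    return 2.5e10 * 3600 / 1000, 10400 * 4.18, 0.0          # line 5


def rate_constant(A, Ea, n, T):
    """Modified Arrhenius expression k = A T^n exp(-Ea / (R T))."""
    return A * T ** n * math.exp(-Ea / (R * T))


def reverse_rate_constant(k_forward, delta_g, T):
    """Thermodynamically consistent reverse constant k_r = k_f / K, K = exp(-dG / (R T))."""
    return k_forward / math.exp(-delta_g / (R * T))


A, Ea, n = habstraction_parameters("[C.]", product_is_primary_radical=True)
k_700 = rate_constant(A, Ea, n, 700.0)
```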

3.3 RING: Outputs

3.3.1 Reaction network

The reaction network is a list of species and a list of reactions, both written in SMILES format. A reaction string is formed by concatenating the SMILES strings of the reactants and products of the reaction with appropriate delimiters (‘.’ separates different reactants or products, while ‘>>’ separates reactants from products). A SMILES generator has been developed that constructs the individual strings for molecules from their internal representation and appends them together to obtain the reaction string. Figure 3.7 shows a sample set of reactions constituting the network, in both SMILES and graphical form. Several chemistry packages parse SMILES strings to generate a graphical output of molecules and reactions; the reactions in Figure 3.7 were drawn using ChemDraw [100].
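The reaction-string convention is simple enough to show directly. The helper names below are hypothetical; the sketch assumes each species SMILES is itself free of '.', so that splitting on '.' recovers the individual reactants and products.

```python
def reaction_smiles(reactants, products):
    """Join species SMILES: '.' between species, '>>' between the two sides."""
    return ".".join(reactants) + ">>" + ".".join(products)


def parse_reaction_smiles(text):
    """Inverse of reaction_smiles, assuming no species SMILES itself contains '.'."""
    left, right = text.split(">>")
    return left.split("."), right.split(".")


# Protonation of ethanol to the corresponding oxonium ion
rxn = reaction_smiles(["CCO", "[H+]"], ["CC[OH2+]"])
print(rxn)  # CCO.[H+]>>CC[OH2+]
```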

3.3.2 Lumping results

Results from a lumping analysis are output as a list of lumps, wherein each lump is represented by its representative molecule (identified based on the user-specified criterion), followed by a list of the species belonging to the lump. Figure 3.8 shows, pictorially, one of the lumps identified by RING: a lump of fructose derivatives containing three hydroxy groups, an oxonium ion, a methyl group, and a carbon-carbon double bond. The eight molecules shown in the figure are lumped into one lump represented by the lump representative shown. Once the molecules are lumped, each reaction can be rewritten in terms of lumps to get a lumped reaction. Since the number of lumps is smaller than the number of species in the network (about 550 species and 240 lumps), the number of lumped reactions is also smaller (about 650 lumped reactions from a network of about 1200 reactions). Lumping, thus, reduces the network size by a factor of about 2. For some systems, such as propane aromatization, lumping is shown to reduce the network size by several orders of magnitude (see Chapter 5).

Figure 3.7: Sample reactions of the Fructose-to-HMF network generated by RING.
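Rewriting reactions in terms of lumps can be sketched as a mapping followed by de-duplication. The sketch assumes a dictionary from each species to its lump representative (species absent from it form their own lump) and, as an additional assumption not stated in the text, drops reactions whose two sides map to the same lumps, such as interconversions within one lump. The species labels are hypothetical.

```python
def lump_reactions(reactions, lump_of):
    """Rewrite (reactants, products) in terms of lump representatives and de-duplicate."""
    lumped = set()
    for reactants, products in reactions:
        left = tuple(sorted(lump_of.get(s, s) for s in reactants))
        right = tuple(sorted(lump_of.get(s, s) for s in products))
        if left != right:  # assumption: intra-lump interconversions are dropped
            lumped.add((left, right))
    return sorted(lumped)


LUMP_OF = {"2-pentanol": "3-pentanol", "2-pentanol-H+": "3-pentanol-H+"}
RXNS = [
    (("2-pentanol", "H+"), ("2-pentanol-H+",)),
    (("3-pentanol", "H+"), ("3-pentanol-H+",)),
    (("2-pentanol",), ("3-pentanol",)),
]
print(len(lump_reactions(RXNS, LUMP_OF)))  # 1: three reactions collapse to one lumped reaction
```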

3.3.3 Pathway identification results

Pathways are obtained as a list of reaction sequences wherein each sequence is, in itself, a list of reactions that traces the product formation. Figure 3.9 depicts one of the pathways for the conversion of fructose to HMF as a sequence of reactions. The pathway involves three alcohol-protonation and dehydration steps, while one of the steps is essentially an enol-carbonyl tautomerism step. Antal et al. [95] propose this sequence of steps as the dominant route for HMF synthesis from fructose in aqueous solutions catalyzed by mineral acids.

3.3.4 Mechanism identification results

Figure 3.10 shows the overall (complete) mechanism that corresponds to the pathway described above. The mechanism has four catalytic cycles: three dehydration cycles and one isomerization cycle. The overall reaction, thus, is a balanced reaction in which one molecule of HMF and three molecules of water are formed from each fructose molecule. We can further query to obtain the constituent elementary step reactions of these individual cycles by identifying direct mechanisms.

Figure 3.8: Sample lump and its representative in the Fructose-to-HMF reaction network identified by RING.
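The overall reaction of a mechanism follows from summing the stoichiometry of its steps, each weighted by how many times it is used; catalytic species and intermediates cancel. The minimal sketch below uses a hypothetical single acid-catalyzed dehydration cycle with made-up species labels, not the actual HMF mechanism.

```python
from collections import Counter


def overall_reaction(steps):
    """Net stoichiometry of a mechanism (negative = consumed); cancelled species are dropped.

    steps: list of (multiplicity, reactants, products) with species names.
    """
    net = Counter()
    for times, reactants, products in steps:
        for s in reactants:
            net[s] -= times
        for s in products:
            net[s] += times
    return {s: v for s, v in net.items() if v != 0}


# Hypothetical dehydration cycle: the proton is regenerated, so it cancels
CYCLE = [
    (1, ["ROH", "H+"], ["ROH2+"]),   # alcohol protonation
    (1, ["ROH2+"], ["R+", "H2O"]),   # loss of water
    (1, ["R+"], ["alkene", "H+"]),   # deprotonation
]
print(overall_reaction(CYCLE))  # {'ROH': -1, 'H2O': 1, 'alkene': 1}
```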

3.4 Discussion

The input options and output features of RING allow for the systematic rule-based elucidation of complex reaction networks through: (a) construction of the reaction network from reaction rules, and (b) analysis of the transformations in the network through post-processing instructions (or rules). Such analysis is particularly relevant in biomass conversion, as upgrading oxygenates to produce fuels and chemicals involves multiple thermochemical steps to remove oxygen atoms present in different functional groups [101, 102, 103]. RING currently does not take into consideration the kinetics of each of the steps during network generation and hence cannot apply features such as rate-based pruning [46]. In summary, RING is a network generation and analysis tool developed using cheminformatics and graph-theoretical algorithms. It is composed of three components: a language compiler for a domain-specific, English-like reaction language; a reaction network generator that can construct an exhaustive network from specified reaction rules and initial reactants; and a post-processing module that enables (i) lumping of isomers to get a network of reduced size, (ii) finding pathways between initial reactants and specified products, and (iii) finding mechanisms for the formation of specified molecules in the network. RING can be used to construct reaction networks from elementary or overall reaction rules in a rule-based manner, and further, to elucidate the transformations occurring in complex chemical reaction networks by identifying reaction pathways, mechanisms, and lumps of isomers. The construction and analysis options in RING were described through an illustrative example involving the acid-catalyzed dehydration of fructose to HMF. RING is distributed open source under the GNU Lesser GPL version 2.1 license through a dedicated website [104].

Figure 3.9: One pathway from fructose to HMF in the Fructose-to-HMF reaction network identified by RING. The sequential representation of reactions is an adaptation of the reaction list that RING generates.

Figure 3.10: Example of one complete mechanism having four catalytic cycles for HMF production from fructose, identified by RING.

CHAPTER 4

RING: Algorithms

The three-component structure of RING, its inputs and outputs, and the applications of RING in network generation and analysis have been discussed in Chapter 3. In this chapter, we discuss each module in detail, providing the details of the algorithms and methods employed.

4.1 Reaction Language and Compiler

A language interface for network generation was first proposed and introduced by Prickett and Mavrovouniotis [41]. Hsu et al. [44] expanded this language to include features specifically meant for heterogeneous catalysis. These are domain-specific languages (DSLs): custom languages developed specifically for describing reaction rules. DSLs have several advantages [105]: (a) they allow for using high-level notations, well known to the domain expert, as specifications, (b) programs are concise and “self-documenting”, and (c) domain-knowledge-based validation and optimizations are possible. The reaction language in RING offers these advantages as well (see Section 4.1); specifically, the syntax is composed entirely of chemistry parlance, making it easier to understand and debug than general-purpose languages. Figure 4.1 shows a sample reaction rule input to RING: protonation of a carbonyl group. The reaction rule consists of several parts: (a) declaration of the reaction center or reactant patterns (lines 2-6) that define the atoms and bonds participating in the rule, (b) specification of atom constraints (curly brackets in line 3) and molecular constraints (lines 7-8) describing restrictions on the specific atoms in the reaction center or on the entire reactant, respectively, and (c) a list of transformations (lines 9-11) describing changes in the electronic configuration of the atoms or in the order (single, double, etc.) of bonds.

1. rule KetoProtonation {
2. neutral reactant keto {
3. C labeled c1 { connected to 2 C with single bond }
4. O labeled o1 double bond to c1}
5. positive reactant proton {
6. H+ labeled h1}
7. constraints {
8. keto.size < 7 && keto is linear}
9. form bond (o1, h1)
10. modify atomtype (c1, C+)
11. modify atomtype (h1, H)}

Figure 4.1: Example reaction rule: protonation of a keto group

High-level notations used in the language (Figure 4.1), such as “rule”, “neutral”, “reactant”, “single bond”, “positive”, etc., are derived from common chemistry terminology. Further, the structure of a reaction rule specification (description of the reaction center, stipulation of constraints, and description of transformations) closely resembles a chemist's description of the reaction rule. These two factors allow for a one-to-one correspondence between how the user perceives a reaction rule and how it is written down in the reaction language. Figure 4.2 zooms in on the compiler in the overall structure of RING. The core of the reaction language of RING, the reaction rule specification language, is an extensible language that focuses solely on describing the reaction rules of interest. These specifications are then transformed by the compiler into C++ code that makes use of the reaction network generator library. RING is equipped with a number of extensions, specifically those for specifying post-processing features such as pathway and mechanism queries.

Figure 4.2: The overall structure of RING, zoomed in on the compiler component, using the pathways analysis module as an example of a post-processing module. (Diagram elements: input program; reaction rule specification language, extended by the pathways query language; reaction language compiler; reaction network generator; pathways post-processing module.)

Each new feature or module in RING can contribute a language extension to the compiler, usually with additional syntax for describing any specific additional inputs required. The compiler translates these instructions into appropriate C++ code that can then be compiled with RING's core C++ implementation. The core of the reaction language of RING bears some similarity to RDL++ [44], although the surface syntax is quite different, as RING is not based on S-expressions. There are, however, some notable differences from that system. First, inputs into RING are compiled to C++, whereas RDL interprets them. This allows the RING compiler to perform many “optimizations” on the rules given by the user. This can potentially lead to a significant advantage in the specification of molecule constraints, because these are directly translated to C++ boolean functions that are optimized to fail fast (see Section 4.1.1). Second, while both RING and RDL++ perform name binding and basic type checking (for example, where a molecule is expected, ensuring that a name refers to a molecule and not a bond), RING further ensures at compile time that the transformations described are appropriate, checking for basic chemistry mistakes like valency violations. For example, in the sample rule in Figure 4.1, if either of the two modify atomtype statements is missing, the reaction rule would not conserve charge, and the compiler would generate an appropriate error. Further, if double bond to c1 were erroneously written as double bond to c2 or aromatic bond to c1, the compiler would generate errors stating, respectively, that c2 is not a previously defined label and that aromatic bonds can only exist between two aromatic atoms.
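The charge-conservation and undefined-label checks described above need only the rule text, not any molecule. The sketch below shows one way such checks could be expressed; the `CHARGE` table, the `RuleError` exception, and `check_charge_conservation` are hypothetical names, and the table covers only the atomtypes used in Figure 4.1.

```python
class RuleError(Exception):
    """Raised when a reaction rule fails a compile-time check."""


# Hypothetical formal charge of each atomtype
CHARGE = {"C": 0, "C+": 1, "C.": 0, "O": 0, "O+": 1, "H": 0, "H+": 1}


def check_charge_conservation(atomtypes, modifications):
    """atomtypes: label -> type declared in the reactant patterns;
    modifications: (label, new_type) pairs from 'modify atomtype' statements."""
    for label, _ in modifications:
        if label not in atomtypes:
            raise RuleError(f"{label} is not a previously defined label")
    final = {**atomtypes, **dict(modifications)}
    before = sum(CHARGE[t] for t in atomtypes.values())
    after = sum(CHARGE[t] for t in final.values())
    if before != after:
        raise RuleError(f"rule does not conserve charge: {before:+d} -> {after:+d}")


KETO = {"c1": "C", "o1": "O", "h1": "H+"}  # reactant patterns of Figure 4.1
check_charge_conservation(KETO, [("c1", "C+"), ("h1", "H")])  # passes silently
# Omitting 'modify atomtype (h1, H)' would raise: rule does not conserve charge: +1 -> +2
```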
4.1 Reaction Language and Compiler 41

The EBNF of the grammar of the language is given in Appendix B. RING's distribution includes a complete manual with all the syntax; interested readers can access the documentation online [104].

4.1.1 Compiler optimizations

Three broad categories of optimizations are performed by the compiler to enhance the speed of execution. We discuss these categories first and then provide illustrative statistics to demonstrate the efficacy of these optimizations.

Constraint categorization

The network generation algorithm needs to sift through candidate molecules for each of the reactants in a bimolecular reaction rule to identify potential co-reactants. Each possible pair of molecules must be considered for each rule. If the pair satisfies the constraints, the reactants can be checked for the relevant reaction patterns and new reactions subsequently generated. The constraints in a rule fall into two groups: "individual" constraints that depend on only one of the reactants, and "combined" constraints that unavoidably depend on both. Ordering the constraints such that individual constraints are checked first can speed up network generation. For example, if a molecule does not satisfy the constraints on the first reactant of the rule, there is no need to consider any pair with that molecule in the first position. The RING compiler automatically classifies the constraints into these categories and emits them as separate constraint-checking functions for the network generator. These functions are applied in a fixed order: individual constraints on the first reactant; individual constraints on the second reactant, if the first set is satisfied; and combined constraints, if all individual constraints are satisfied. The order in which the user specifies the constraints is, thus, immaterial.
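The staged check described above can be sketched in C++ as follows. This is a minimal illustration, not the code RING's compiler emits: the types `Mol` and `BimolecularRule`, the helpers `allHold` and `pairSatisfies`, and the example constraints are all hypothetical, and each constraint is modeled as a `std::function` returning `bool`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical molecule summary holding only what the example constraints read.
struct Mol {
    std::string smiles;
    int heavyAtoms;
    int charge;
    int oxygens;
};

using Individual = std::function<bool(const Mol&)>;            // depends on one reactant
using Combined = std::function<bool(const Mol&, const Mol&)>;  // depends on both reactants

// Constraints of a bimolecular rule, already split into the three categories.
struct BimolecularRule {
    std::vector<Individual> first;   // individual constraints on reactant 1
    std::vector<Individual> second;  // individual constraints on reactant 2
    std::vector<Combined> combined;  // constraints on the pair
};

// True if every constraint in the list holds; stops at the first failure.
bool allHold(const std::vector<Individual>& cs, const Mol& m) {
    for (const auto& c : cs)
        if (!c(m)) return false;
    return true;
}

// Staged check: reactant-1 constraints, then reactant-2 constraints,
// and only then the combined constraints on the pair.
bool pairSatisfies(const BimolecularRule& r, const Mol& a, const Mol& b) {
    if (!allHold(r.first, a)) return false;
    if (!allHold(r.second, b)) return false;
    for (const auto& c : r.combined)
        if (!c(a, b)) return false;
    return true;
}
```

In a full generator, the stage-1 and stage-2 results for a molecule could be cached and reused for every candidate partner, so combined constraints are evaluated only for pairs that survive both individual stages.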

Constraint ordering

The order in which either individual or combined constraints are checked may also affect performance. For example, it is faster to check the number of heavy atoms or the charge of a molecule than to check whether the molecule contains a large functional group, such as an acid anhydride. The compiler heuristically estimates the cost of checking each constraint and orders the constraints so that those that are quick and easy to check are verified before those that are slower. For example, checks on molecule size or charge precede any checks for molecular fragments.

Table 4.1: Benchmarking statistics: run-time ratios for successive compiler optimizations

| Benchmark | Reactions | Species | Run-time ratio, None→Constr. Cat¹ | Run-time ratio, Constr. Cat¹→All |
|---|---|---|---|---|
| Base catalysis | 12771 | 4609 | 1.02 ± 0.08 | 1.00 ± 0.01 |
| Fructose-to-HMF | 1223 | 546 | 1.09 ± 0.02 | 0.99 ± 0.02 |
| Glucose pyrolysis | 14375 | 3131 | 1.00 ± 0.01 | 1.00 ± 0.01 |
| HMF→Levulinic acid | 39844 | 14875 | 58.30 ± 0.59 | 1.00 ± 0.01 |
| Propane aromatization | 2031 | 594 | 1.41 ± 0.03 | 1.21 ± 0.00 |

¹ Constraint categorization optimization
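The cost-based ordering described above can be sketched as a stable sort on an estimated cost. The following is a minimal sketch under stated assumptions: the cost values are arbitrary units, `CostedConstraint`, `orderByCost`, and `satisfiesAll` are hypothetical names, and the "anhydride" check is a crude string search standing in for a real substructure search.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct Mol {
    int heavyAtoms;
    int charge;
    std::string smiles;
};

// A constraint with a heuristic cost estimate (arbitrary units chosen for this sketch).
struct CostedConstraint {
    std::string label;
    int estimatedCost;
    std::function<bool(const Mol&)> check;
};

// Cheapest first; stable_sort keeps the user's order among equally cheap constraints.
void orderByCost(std::vector<CostedConstraint>& cs) {
    std::stable_sort(cs.begin(), cs.end(), [](const CostedConstraint& a, const CostedConstraint& b) {
        return a.estimatedCost < b.estimatedCost;
    });
}

// Evaluates constraints in order and stops at the first failure;
// 'evaluated' reports how many checks actually ran.
bool satisfiesAll(const std::vector<CostedConstraint>& cs, const Mol& m, int& evaluated) {
    evaluated = 0;
    for (const auto& c : cs) {
        ++evaluated;
        if (!c.check(m)) return false;
    }
    return true;
}
```

With this ordering, a charged molecule is rejected by the one-step charge check before the expensive fragment search is ever attempted.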

Pattern re-ordering

Constraint checks that involve identifying specific functional groups in a molecule, as well as reactant pattern detection, require matching a fragment against a molecule. In such cases, the order in which the atoms of the reactant pattern or functional group are matched affects the speed of the matching process. The compiler re-orders these patterns before presenting them to the network generator so that a failing match fails as early as possible. Unlike constraint re-ordering, where each constraint has its own checking cost, here the cost for each atom is roughly the same. Instead, the atoms and bonds are ordered roughly by the likelihood that they occur at all in molecules. Most organic molecules derived from biomass or petroleum sources have more carbon atoms than oxygen atoms. Consider a six-carbon ring and a pattern with two carbons and an oxygen, such as the fragment "C-C-O". There are 12 different ways the two carbons could match this molecule (six ring bonds, each traversed in two directions) before failing due to the lack of an oxygen, but matching the oxygen first would fail immediately. Placing rarer atoms first can thus make the matching fail early and speed up the pattern matching process. RING applies several empirical heuristics for the likelihood of occurrence, such as: (a) nitrogen, sulfur, and phosphorus atoms are rarer than oxygen atoms, which in turn are rare compared to carbon atoms, (b) charged atoms occur less frequently than neutral atoms in a network, and (c) heavier bonds (double and triple) are rarer than single bonds.

Table 4.1 lists the run-time ratios upon enabling: (a) only the constraint categorization optimization compared to having no optimizations (None→Constr. Cat), and (b) all the optimizations with respect to having constraint categorization alone (Constr. Cat→All) for five systems in a benchmarking study. The systems included in the study are: (a) the synthesis network for forming longer chain alcohols from smaller (C1-C2) oxygenates using base catalyzed carbon-carbon bond formation and metal catalyzed (de)hydrogenation chemistries, (b) acid catalyzed conversion of fructose to 5-hydroxymethylfurfural (HMF), (c) pyrolysis of glucose by neutral electrocyclic reaction steps, (d) conversion of HMF to levulinic acid in acidic medium, and (e) acid catalyzed aromatization of propane. The first system (base catalysis) consists of overall reaction rules, while the rest of the systems are modeled in terms of elementary steps. While the optimizations do not lead to a statistically significant speed-up in the first three cases, there are considerable savings in the run times for the last two systems (HMF-to-levulinic acid and propane aromatization). Specifically, network generation for the HMF-to-levulinic acid system speeds up by a factor of about 60 upon enabling constraint categorization alone. This is attributable to the nature of the constraints imposed in some of the reaction rules of this system, wherein one of the reactants is restricted to being an oxygenate. Checking this constraint early prevents RING from performing several unnecessary steps before eventually rejecting a molecule because it does not satisfy that constraint. This system, however, shows no further noticeable improvement upon subsequently enabling the other two optimizations. On the other hand, the propane aromatization system shows a significant speed-up both from constraint categorization alone and upon enabling all the optimizations. The additional improvement in the latter case is possibly due to the constraint ordering optimization, because this system contains a greater number of constraints relative to the other systems. Pattern re-ordering, although not resulting in any statistically significant improvement in these five systems, can lead to about 15% speed-up (and potentially more) in identifying individual patterns in some cases (see Appendix C).
Thus, our statistics suggest that systems with: (a) reaction rules having very restrictive constraints, and (b) large reaction networks (several tens of thousands of reactions) can benefit significantly from compiler optimizations.
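The rarity-first pattern re-ordering heuristic from earlier in this section can be sketched as follows. The numeric scores in `rarityScore` are invented for illustration and only encode the three stated heuristics (heteroatoms rarer than O, O rarer than C, charged rarer than neutral, multiple bonds rarer than single). `PatternAtom` and `matchOrder` are hypothetical names, not RING's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One atom of a reactant pattern or functional-group fragment (simplified).
struct PatternAtom {
    std::string element;  // "C", "O", "N", ...
    int charge;           // formal charge
    int multipleBonds;    // number of double/triple bonds at this atom in the pattern
};

// Hypothetical rarity score: lower means "expected to be rarer", so it is matched first.
int rarityScore(const PatternAtom& a) {
    int score;
    if (a.element == "N" || a.element == "S" || a.element == "P") score = 0;
    else if (a.element == "O") score = 10;
    else if (a.element == "C") score = 20;
    else score = 5;                         // other elements: assumed rare
    if (a.charge != 0) score -= 3;          // charged atoms are rarer than neutral ones
    if (a.multipleBonds > 0) score -= 2;    // double/triple bonds are rarer than single ones
    return score;
}

// Returns pattern-atom indices in the order they should be matched (rarest first).
std::vector<std::size_t> matchOrder(const std::vector<PatternAtom>& pattern) {
    std::vector<std::size_t> order(pattern.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(), [&](std::size_t x, std::size_t y) {
        return rarityScore(pattern[x]) < rarityScore(pattern[y]);
    });
    return order;
}
```

For "C-C-O" this sketch places the oxygen first, which produces the immediate failure on a cyclohexane ring described above. A practical matcher would also keep the order connected, so that each newly matched atom is adjacent to one already matched; this sketch omits that constraint.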

4.1.2 Language extension

The RING compiler and language are capable of supporting independently developed post-processing modules because they are designed and implemented in a domain-specific language called Silver [87] that uses a parser generator called Copper [106]. These two tools are designed to support the implementation of extensible languages. A language extension can add not only new syntax but also new analyses of the existing language. For example, an extension can add new error checks on the sensibility of the reaction rules, or an entirely new syntax intended for a new post-processing option. The difficulty of language extension lies in two areas: (a) a full range of extensions has to be possible without dramatically complicating the design of the compiler, and (b) different extensions should not be able to conflict and produce a broken compiler. Silver is a functional language based on attribute grammars, with a strong composition model for 'grammars.' The RING compiler is written as an attribute grammar in Silver and makes no reference to any extensions (post-processing options). Instead, Silver takes the extension grammars it is provided with and automatically composes them with the host language grammar, producing a working compiler with all the requested pieces combined. Host languages and extensions for Java [107], C, Promela [108], and Modelica have been written in Silver; in fact, the Silver compiler is itself written in Silver. These capabilities are more than enough to allow post-processing modules to integrate fully with the RING compiler. Copper is a parser and context-aware scanner generator that comes with a composition test for extension grammars. Because only the full class of context-free languages is closed under composition, generalized LR (GLR) [109] parsers are frequently used for language extension.
GLR parsers can parse any context-free language, but a composed grammar may then be ambiguous, yielding multiple parse trees for the same input. Copper sticks with the deterministic LALR(1) [109] class of languages that is familiar to compiler writers, but because this class is not closed under composition, it performs a special analysis [110] of the syntax introduced by a language extension. New syntax that passes the composition check will, therefore, not conflict with any other new syntax that also passes the same check. Although this limits the kinds of syntax extensions one can safely introduce, it is permissive enough to allow many desired extensions. The RING compiler already comes with a number of post-processing extensions; the organization of a representative subset is shown in Figure 4.3. The language extension "pathway constraint language" adds the syntax for expressing pathway constraints to the host language (the reaction rule specification language). Each post-processing analysis option further has a corresponding language extension that depends on the host language and on the pathway constraint language extension, and adds a specific type of analysis expressible in the extended language. For example, the "pathways analysis" language allows pathway queries to be expressed in addition to the reaction rules of a system. Each of these extensions is an independent grammar (with only the dependencies shown), and the full compiler is built by having Silver compose them all together.

[Figure 4.3 components: Rxn Rule Specification Language; Pathway constraint language; Molecule / Reaction Queries; Pathways analysis; Mechanisms analysis]

Figure 4.3: The host language, and a representative subset of extensions to it, implemented in Silver. The arrows represent dependencies between grammars.

4.2 RING: Network generation

String-based representations, such as SMILES, are linear and thus cannot explicitly represent the atom connectivity of branched or cyclic molecules, although ring identifiers placed in these strings can be used to infer that two non-successive atoms are bonded to close a ring. Internal representations are therefore used within RING, namely chemical graphs for molecules and fragments and graph transformation rules for reaction rules, to store the atom connectivity information unambiguously and explicitly. Consequently, reactions are internally treated as a graph transformation problem. An object-oriented framework has been adopted in RING for defining and representing chemical entities, such as atoms and molecules, and concepts, such as reactions and reaction rules. EROS [111], a reaction prediction tool for organic chemistry, adopted an object-oriented approach to represent chemical entities; its object model enables delocalization of electrons to be captured effectively, avoiding the limitations of representing molecules as connection tables. RDL [41] and RDL++ [44] both use an object-oriented framework for defining and representing molecules. The Chemistry Development Kit (CDK) [112] adopts a comprehensive hierarchy of classes that formalizes the representation of chemical concepts and entities at various levels of abstraction, from the lowest level, consisting of classes such as atoms and bonds, to higher levels including molecules, fragments, polymers, ensembles, etc. RING adopts a part of the CDK hierarchy to represent molecules and their constituents, and incorporates several other classes for reaction representation and network generation. The details of the hierarchy are discussed in Appendix D. In this section we discuss several algorithms used for the different aspects of network generation.

4.2.1 Molecules and fragments

The network generator module, as shown in Figure 3.1, obtains its inputs from the compiler's translation and uses them to construct the reaction network. The SMILES user input is carried forward as is, while the reactant pattern information (and fragments for molecular constraints) is translated into SMARTS [113]-like strings in a manner similar to that in COMGEN [42]. Several variations on the original SMILES and SMARTS strings, however, have been included. SMILES in RING is adapted from that of Weininger et al. [96] and further includes two new features: composite atoms encapsulating a group of atoms, and non-bonded interactions [20]. SMARTS strings include atom environment constraints [20] enclosed by square brackets (similar to MQL notation [114]), as well as composite atoms and non-bonded interactions in the pattern (written similarly to SMILES strings). Two important classes that store chemical information relevant for reactions are Molecule and Substructure (see Appendix D for more details). Class Molecule contains all the necessary information about a molecule. The atom connectivity information is stored as adjacency lists (in a class from which Molecule inherits), and atoms themselves form a class. The class Molecule has functions for: (a) parsing SMILES, (b) finding canonical SMILES and topologically equivalent atoms, (c) ring perception, and (d) detection of allylic atoms and aromaticity. Class Substructure, on the other hand, stores information pertaining to molecular fragments (reactant patterns as well as fragment descriptions for constraints) and has a SMARTS parser. Class Substructure, like Molecule, also inherits adjacency lists from a parent class.
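The class layout described above (Molecule and Substructure both inheriting adjacency lists from a parent class) can be sketched as follows. Only the class names Molecule and Substructure come from the text. The template base `AdjacencyList`, the atom structs, and `doubleBondCount` are hypothetical, and parsing, canonicalization, and ring perception are deliberately left out.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: atoms plus adjacency lists of (neighbor, bond order).
template <typename AtomT>
class AdjacencyList {
public:
    struct Neighbor { int atom; int bondOrder; };  // 1 single, 2 double, 3 triple

    int addAtom(AtomT a) {
        atoms_.push_back(std::move(a));
        adj_.emplace_back();
        return static_cast<int>(atoms_.size()) - 1;
    }
    void addBond(int u, int v, int order) {
        adj_[u].push_back({v, order});
        adj_[v].push_back({u, order});
    }
    int size() const { return static_cast<int>(atoms_.size()); }
    const AtomT& atom(int i) const { return atoms_[i]; }
    const std::vector<Neighbor>& neighbors(int i) const { return adj_[i]; }

protected:
    std::vector<AtomT> atoms_;
    std::vector<std::vector<Neighbor>> adj_;
};

// Atom of a concrete molecule.
struct Atom { std::string element; int charge; int hydrogens; };

// Molecule: SMILES parsing, canonical SMILES, ring perception, etc. would be
// further member functions (not shown).
class Molecule : public AdjacencyList<Atom> {
public:
    int doubleBondCount(int i) const {
        int n = 0;
        for (const auto& nb : adj_[i])
            if (nb.bondOrder == 2) ++n;
        return n;
    }
};

// Atom of a fragment; environment constraints are kept as plain text in this sketch.
struct PatternAtom { std::string element; std::vector<std::string> environmentConstraints; };

// Substructure: reactant pattern or constraint fragment (SMARTS parser not shown).
class Substructure : public AdjacencyList<PatternAtom> {};
```

Sharing the graph storage through a common base lets the pattern matcher walk a Molecule and a Substructure with the same neighbor-iteration interface.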

Unique SMILES

The canonical SMILES strings are generated using an adapted form of Weininger's algorithm [115]. In the first step of the algorithm, which calculates the initial rank of the atoms in a molecule, RING calculates the rank based on: (a) nature of the atom (composite or single, with preference given to composite); (b) mass number; (c) atomic number; (d) valency; (e) number of non-hydrogen neighbors; (f) charge; (g) number of double bonds attached to the atom; and (h) number of hydrogen atoms attached to the atom. RING then iteratively recalculates the rank from the product of the primes corresponding to the ranks of the neighboring atoms, as prescribed by Weininger [115]. However, ties in the ranks of atoms are not broken at each step as suggested in Weininger's algorithm. This categorizes the atoms of the molecule into topologically distinct classes, similar to the notion of class indices adopted by Prickett et al. [37]. For example, while finding the canonical SMILES of propane (H3C-CH2-CH3), Weininger's algorithm forcibly breaks the tie between the ranks of the two end carbon atoms. In RING, instead, the tie is not immediately broken, and the two end carbon atoms are categorized as belonging to the same topological class. This categorization is required in several other algorithms, such as finding topologically unique matches, as described later in Section "Reaction". Canonical SMILES, however, requires the tie-breaking step for highly symmetric molecules [116, 117]; it is, therefore, performed subsequently in an iterative manner until all atoms have distinct ranks.
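The refinement-without-tie-breaking step can be sketched as follows. This is a simplified, hypothetical version of the idea, not RING's code. Atoms start from dense ranks of an initial invariant. Each round then re-ranks every atom by the pair (current rank, product of the primes of its neighbors' ranks), and the loop stops when the number of classes no longer grows. Tied atoms therefore stay tied and form the topological classes. The final tie-breaking pass needed for canonical SMILES is not shown, and the `long long` product can overflow for large, highly connected molecules.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Dense ranks 1..k of comparable keys; equal keys share a rank (no tie-breaking).
template <typename Key>
std::vector<int> denseRanks(const std::vector<Key>& keys) {
    std::vector<Key> sorted(keys);
    std::sort(sorted.begin(), sorted.end());
    sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());
    std::vector<int> r(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i)
        r[i] = static_cast<int>(std::lower_bound(sorted.begin(), sorted.end(), keys[i]) - sorted.begin()) + 1;
    return r;
}

// n-th prime, 1-based: nthPrime(1) == 2, nthPrime(2) == 3, ...
long long nthPrime(int n) {
    int count = 0;
    for (long long c = 2;; ++c) {
        bool isPrime = true;
        for (long long d = 2; d * d <= c; ++d)
            if (c % d == 0) { isPrime = false; break; }
        if (isPrime && ++count == n) return c;
    }
}

// adj: heavy-atom adjacency lists; invariant: initial per-atom key, e.g.
// {atomic number, heavy-atom degree, charge, H count}. Equal labels = same class.
std::vector<int> topologicalClasses(const std::vector<std::vector<int>>& adj,
                                    const std::vector<std::vector<int>>& invariant) {
    assert(!adj.empty() && adj.size() == invariant.size());
    std::vector<int> rank = denseRanks(invariant);
    int classes = *std::max_element(rank.begin(), rank.end());
    while (true) {
        // Refine by (current rank, product of primes of neighbor ranks).
        std::vector<std::pair<int, long long>> keys(adj.size());
        for (std::size_t i = 0; i < adj.size(); ++i) {
            long long product = 1;
            for (int j : adj[i]) product *= nthPrime(rank[j]);
            keys[i] = {rank[i], product};
        }
        std::vector<int> next = denseRanks(keys);
        int nextClasses = *std::max_element(next.begin(), next.end());
        if (nextClasses == classes) return next;  // partition is stable: keep the ties
        rank = std::move(next);
        classes = nextClasses;
    }
}
```

For propane this returns the same label for both end carbons, matching the example in the text. For n-pentane started from degrees alone, one refinement round separates the central carbon from its neighbors.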

Topological analysis

Topological features, such as rings, aromaticity, and allylic atoms, are perceived in a Molecule object using established cheminformatics algorithms. The Set of All Rings (SAR) of a molecule is identified using a breadth-first search technique developed by Hanser et al. [118]. This algorithm finds all possible cycles in a molecule. For example, it detects one ring in cyclohexane and three in naphthalene, namely two six-membered rings and one ten-membered ring (Figure 4.4a and b). Aromaticity is detected using Hückel's 4n+2 rule and additional guiding rules specified by Jorgensen et al. [119]. Each ring in the SAR, from the largest to the smallest, is tested for extended conjugation of pi electrons (double bonds), lone pair electrons, unpaired electrons, and positive charge. If extended conjugation and 4n+2 pi electrons are identified in a ring, it is deemed aromatic (Figure 4.4c). Perceiving the SAR and testing the rings from the largest to the smallest is essential for aromaticity detection because extended conjugation that spreads across multiple rings in a fused system has to be identified first. For example, in anthracene, extended conjugation exists in the 14-membered envelope ring and has to be detected first. Allylic atoms are detected as those having a charge, electron pair, or unpaired electron that can conjugate with a neighboring pi-bonded system, such as a C=C fragment (Figure 4.4d).
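The per-ring Hückel test described above can be sketched as a pi-electron count over pre-classified ring atoms. This is a simplified, hypothetical sketch. The `RingAtom` categories and their electron contributions are textbook bookkeeping assumed for illustration, the classification of each atom is taken as given, and SAR enumeration, the largest-to-smallest loop, and Jorgensen's additional rules are not shown.

```cpp
#include <cassert>
#include <vector>

// Simplified pi-electron bookkeeping for one atom of a candidate ring.
enum class RingAtom {
    DoubleBonded,  // part of a double bond in the conjugated system: 1 pi electron
    LonePair,      // heteroatom lone pair or carbanion in conjugation: 2
    Cation,        // empty p orbital (e.g. carbenium): 0, conjugation continues
    Radical,       // unpaired electron: 1
    Saturated      // sp3 centre: interrupts conjugation
};

// True if every ring atom takes part in the conjugation and the
// pi-electron count is 4n+2 for some n >= 0.
bool isHuckelAromatic(const std::vector<RingAtom>& ring) {
    int pi = 0;
    for (RingAtom a : ring) {
        switch (a) {
            case RingAtom::DoubleBonded: pi += 1; break;
            case RingAtom::LonePair:     pi += 2; break;
            case RingAtom::Cation:       break;
            case RingAtom::Radical:      pi += 1; break;
            case RingAtom::Saturated:    return false;
        }
    }
    return pi >= 2 && (pi - 2) % 4 == 0;
}
```

Benzene (six double-bonded atoms) and furan (four double-bonded atoms plus an oxygen lone pair) both give 6 pi electrons. Cyclobutadiene gives 4 and fails the test.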

[Figure 4.4 panels: (a) Cyclohexane: 1 six-member ring; (b) Naphthalene: 2 six-member rings, 1 ten-member ring; (c) Anthracene: extended conjugation in the 14-member ring; (d) Allyl carbenium ion (C+) with pi electrons]

Figure 4.4: (a) One six-member ring of Cyclohexane, (b) Two six-member rings and one ten-member ring of Naphthalene, (c) Extended conjugation with 4n+2 pi electrons leading to aromaticity in Anthracene, and (d) Allyl carbenium ion, wherein a carbenium ion is in conjugation with a double bond.

4.2.2 Reaction rule

The internal representation of reaction rules, represented as graph transformation rules, is formalized in a separate class. Each reactant pattern is a fragment, in SMARTS, of the respective reactant molecule that participates in the reaction; the number of patterns in a rule equals the molecularity of the reaction. The class also contains function pointers to the boolean functions representing molecular constraints (Section "Reaction language and compiler"). The transformations are translated into a set of C++ instructions that prescribe atomtype and bond changes. For bimolecular reaction rules, constraints are classified into three categories: constraints involving only the first reactant, constraints involving only the second reactant, and combined constraints that involve both (Section "Compiler optimizations"). The atom environment constraints that are described within SMARTS for individual atoms of a fragment are stored in Substructure as a list of constraints for each atom.

4.2.3 Reaction

Reactions are internally formalized as the application of a graph transformation rule to reactant chemical graph(s). This requires identifying all fragments in the reactant(s) that match the reactant pattern(s), which translates to finding a subgraph of the reactant chemical graph that is isomorphic to the reactant pattern. Consider, for example, the protonation of 1,3-hexadiene (Figure 4.5a). Protonation can occur on either of the two "C=C" fragments, leading to different carbocations. Furthermore, the same C=C fragment can match the reactant pattern "C=C" in two ways, leading to two possible products.

The protonation of the 1,2 carbon-carbon double bond could lead to either a primary or a secondary carbenium ion. Hence, all possible matches of a given pattern must be enumerated to generate all the products that can arise from the application of a reaction rule. In RING, an adapted Ullmann algorithm [120] is employed. The first step is the definition of a matrix M0 (m × n), where m is the size of the fragment and n is the size of the molecule, such that M0(i, j) is set to 1 if the i-th atom of the substructure is a potential match for the j-th atom of the molecule, and 0 if not. The potential match is decided on the basis of four rules: (a) the atomtype of the atom in the fragment and of the corresponding atom in the molecule must match, (b) the atom environment constraints of the atom in the fragment must be satisfied by the corresponding atom in the molecule, (c) the number of hydrogen atoms attached to the atom in the fragment must be less than or equal to that of the corresponding atom in the molecule, and (d) the number of double and triple bonds of the atom in the fragment must be less than or equal to that of the corresponding atom in the molecule. Once M0 is defined, a depth-first traversal of the molecule with backtracking and refinement, as prescribed by the Ullmann algorithm, is used to find all isomorphic subgraphs. The algorithm can also find cyclic fragments in a molecule through an additional check in the refinement step, which ensures that if neighboring atoms in a fragment are part of a ring, so are their corresponding matches in the molecule. Although all pattern matches have to be identified, they may not all lead to unique reactions. For example, in the protonation of 1,3-butadiene (Figure 4.5b), protonation of either double bond leads to the same product because the two double bonds are topologically identical.
Moreover, the 1,2 carbon-carbon double bond can match the "C=C" fragment twice, because each of the two carbon atoms (1 and 2) of the molecule can match either carbon atom of the fragment. These two matches are topologically different because carbon atoms 1 and 2 of 1,3-butadiene are topologically different. Indeed, whenever the reactant pattern has an internal symmetry, as "C=C" does, a fragment can match the pattern in more than one way. However, if the molecule is itself symmetric about the matched fragment, the number of unique reactions is reduced. For example, protonation of 3-hexene (Figure 4.5c) leads to only one unique reaction because the molecule is symmetric about the central "C=C" fragment. Therefore, to generate unique reactions, topologically unique matches have to be identified. The multiplicity of each topologically unique match, which in turn determines the multiplicity of each unique reaction, can be calculated from the knowledge of topologically distinct atom classes (discussed in Section "Unique SMILES").
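The matching and de-duplication steps can be sketched as follows. This is a simplified, hypothetical version, not RING's implementation. `allMatches` builds M0 from rules (a), (c), and (d) (environment constraints are omitted) and then runs a plain backtracking search seeded with M0. Ullmann's refinement step and RING's ring-membership check are left out. `groupByClass` then keys each match by the topological classes of the matched atoms and counts multiplicities. That keying is an assumption that reproduces the three examples in Figure 4.5 but is not guaranteed to identify topologically equivalent matches in general.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Minimal labelled graph for both molecules and patterns (hypothetical).
struct Graph {
    std::vector<std::string> element;
    std::vector<int> hydrogens;
    std::vector<std::vector<int>> bond;  // bond[u][v]: 0 none, 1 single, 2 double, 3 triple

    Graph(std::vector<std::string> el, std::vector<int> h)
        : element(std::move(el)), hydrogens(std::move(h)),
          bond(element.size(), std::vector<int>(element.size(), 0)) {}
    void addBond(int u, int v, int order) { bond[u][v] = bond[v][u] = order; }
    int size() const { return static_cast<int>(element.size()); }
    int multipleBonds(int u) const {
        int n = 0;
        for (int o : bond[u]) if (o >= 2) ++n;
        return n;
    }
};

using Match = std::vector<int>;  // match[i] = molecule atom assigned to pattern atom i

// Depth-first extension of a partial match with backtracking.
static void extend(const Graph& pat, const Graph& mol, const std::vector<std::vector<bool>>& M0,
                   Match& cur, std::vector<bool>& used, std::vector<Match>& out) {
    const int i = static_cast<int>(cur.size());
    if (i == pat.size()) { out.push_back(cur); return; }
    for (int j = 0; j < mol.size(); ++j) {
        if (!M0[i][j] || used[j]) continue;
        bool consistent = true;  // pattern bonds to already-matched atoms must exist with the same order
        for (int k = 0; k < i && consistent; ++k)
            if (pat.bond[i][k] != 0 && mol.bond[j][cur[k]] != pat.bond[i][k]) consistent = false;
        if (!consistent) continue;
        cur.push_back(j);
        used[j] = true;
        extend(pat, mol, M0, cur, used, out);
        cur.pop_back();
        used[j] = false;
    }
}

std::vector<Match> allMatches(const Graph& pat, const Graph& mol) {
    // M0[i][j]: pattern atom i may map to molecule atom j (rules a, c, d).
    std::vector<std::vector<bool>> M0(pat.size(), std::vector<bool>(mol.size(), false));
    for (int i = 0; i < pat.size(); ++i)
        for (int j = 0; j < mol.size(); ++j)
            M0[i][j] = pat.element[i] == mol.element[j] &&
                       pat.hydrogens[i] <= mol.hydrogens[j] &&
                       pat.multipleBonds(i) <= mol.multipleBonds(j);
    Match cur;
    std::vector<bool> used(mol.size(), false);
    std::vector<Match> out;
    extend(pat, mol, M0, cur, used, out);
    return out;
}

// Groups matches by the topological classes of the matched atoms; value = multiplicity.
std::map<std::vector<int>, int> groupByClass(const std::vector<Match>& matches,
                                             const std::vector<int>& atomClass) {
    std::map<std::vector<int>, int> groups;
    for (const Match& m : matches) {
        std::vector<int> key;
        for (int a : m) key.push_back(atomClass[a]);
        ++groups[key];
    }
    return groups;
}
```

For 1,3-butadiene (classes 1, 2, 2, 1 for the four carbons), the four raw matches to "C=C" collapse into two groups of multiplicity 2. For 3-hexene, the two raw matches collapse into one.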

[Figure 4.5 panels: (a) 1,3-hexadiene: 2 distinct double bonds; 4 topologically unique matches to "C=C"; (b) 1,3-butadiene: 2 distinct double bonds; 2 topologically unique matches to "C=C"; (c) 3-hexene: 1 double bond; 1 topologically unique match to "C=C"]

Figure 4.5: Products of protonation reactions of alkenes: (a) 1,3-hexadiene protonation to form 4 different products; (b) 1,3-butadiene protonation to form two different products; and (c) 3-hexene protonation to form only one distinct product.

In reactions where the aromaticity of a ring in a molecule is broken, such as the protonation of benzene, RING reassigns the orders of the bonds in the aromatic ring as alternating single and double bonds such that the valency of all atoms is satisfied. For example, protonation of benzene or of the oxygen atom of furan results in the remaining four pi electrons being redistributed as two conjugated double bonds in further conjugation with the positive charge, while protonation of the carbon atoms of furan leads only to the products shown in Figure 4.6. The protonation step never involves the carbon-carbon aromatic bond furthest from the oxygen, because bond redistribution would not satisfy the valency of the remaining atoms unless their atomtypes were further modified. The bond redistribution algorithm is almost the inverse of the aromaticity detection algorithm: it checks all aromatic rings broken by the protonation step, from largest to smallest, and for each ring reassigns the aromatic bonds as single or double bonds, ensuring conjugation and maintaining valency.
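For a single simple ring, the redistribution step can be sketched as follows. This is a hypothetical illustration, not RING's algorithm. It assumes each ring atom has already been classified as needing exactly one ring double bond or none (the sp3 protonated atom, a carbenium centre, or a lone-pair donor needs none). Once the order of bond 0 is chosen, every other bond around the ring is forced, so trying both choices is enough. Fused rings and atomtype updates are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// needsDouble[k]: ring atom k must end up with exactly one ring double bond.
// Ring bond k joins atoms k and (k+1) % n. Returns the order (1 or 2) of each
// ring bond, or an empty vector if no valid assignment exists.
std::vector<int> redistributeBonds(const std::vector<bool>& needsDouble) {
    const std::size_t n = needsDouble.size();
    if (n < 3) return {};
    for (int firstOrder : {2, 1}) {  // the choice for bond 0 fixes the rest of the ring
        std::vector<int> order(n, 1);
        order[0] = firstOrder;
        // Atom k gets its double bond from bond k only if bond k-1 did not supply it.
        for (std::size_t k = 1; k < n; ++k)
            order[k] = (needsDouble[k] && order[k - 1] != 2) ? 2 : 1;
        // Every atom must have exactly the required number of ring double bonds.
        bool valid = true;
        for (std::size_t a = 0; a < n && valid; ++a) {
            int doubles = (order[a] == 2) + (order[(a + n - 1) % n] == 2);
            valid = doubles == (needsDouble[a] ? 1 : 0);
        }
        if (valid) return order;
    }
    return {};
}
```

For the benzenium ion (atom 0 protonated, atom 5 carrying the charge), the sketch places double bonds between atoms 1-2 and 3-4, conjugated with the carbenium centre. A ring in which an odd number of atoms require a double bond has no valid assignment.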

[Figure 4.6 panels: (a) Protonation of Benzene; (b) Protonation of Furan]

Figure 4.6: Representation of reactions that break aromaticity in RING: (a) Protonation of benzene leads to double bonds in conjugation with each other and with the carbenium ion; (b) Protonation of furan leads to three possible products depending upon the protonation site. In all cases, the products have pi-electrons redistributed as double bonds or lone pairs so as to maintain conjugation.

4.2.4 Generation of the reaction network

Generation algorithm

The network generator constructs the network exhaustively by iteratively applying each rule, first to the initial reactants and subsequently to the products generated. A control loop formalizes this recursive application. Algorithm 1 gives the overall scheme for network generation; the details of the functions called in it are given in Appendix E. Three containers of molecules are maintained: UnprocessedMol, containing those molecules whose reactions are yet to be considered; a list of molecules that have already been checked for reactions, called ProcessedList; and a set, AllMoleculesSet, containing references to all the molecules. These containers actually hold pointers (references) to the molecule objects, which are stored in a separate registry. UnprocessedMol is initialized with the initial reactants. In each iteration of the while-loop, a molecule M1 is popped from this queue and pushed into ProcessedList. Subsequently, for each reaction rule, M1 is checked to see whether it can participate as a reactant in the rule. If the rule is bimolecular, M1 is checked to see whether it can be either of the reactants of the rule, and in each case, all possible partner reactants are chosen from ProcessedList.

Algorithm 1 GenerateReactionNetwork()
    UnprocessedMol ← initial reactants that satisfy global constraints
    Lump initial reactants
    AllReactionsMap ← ∅
    while UnprocessedMol ≠ ∅ do
        M1 ← UnprocessedMol.pop()
        ProcessedList.push(M1)
        for each ReactionType R do
            if R.molecularity = 2 then
                if IsReactant(M1, R, first) then
                    for each possible M2 from ProcessedList as second reactant do
                        GenerateRxnsUpdateMol([M1, M2], R)
                if IsReactant(M1, R, second) then
                    for each possible M2 from ProcessedList as first reactant do
                        GenerateRxnsUpdateMol([M2, M1], R)
            else if IsReactant(M1, R, first) then
                GenerateRxnsUpdateMol([M1], R)
    List all reactions in AllReactionsMap and species in AllMoleculesSet as output

Function GenerateRxnsUpdateMol generates all topologically distinct reactions for the given reaction rule and reactants. It: (a) updates UnprocessedMol and creates new lumps or updates existing ones if new species satisfying the product and global constraints are formed, (b) adds new reactions to AllReactionsMap, and (c) if the reaction was already present, updates its frequency of generation with the reaction multiplicity, evaluated as discussed in Section "Reaction". GenerateReactionNetwork terminates when UnprocessedMol is empty. The number of molecules of a given size is finite, even if large. Therefore, an upper bound on the molecule size in the global constraints ensures that the queue in Algorithm 1 will ultimately become empty. However, some systems, such as those that do not involve any form of chain or ring growth, do not require an upper bound on size for the algorithm to terminate. Stricter constraints in reaction rules, given on the basis of experimental or theoretical insights, further curtail the reaction network size. For each species in the network, an integer rank is maintained, calculated as the least number of steps required to reach the species from any of the initial reactants.
For each product molecule of a reaction, the reactant with the highest rank is noted as the parent. The rank of the product molecule is then one more than the rank of its parent. If two or more reactants have the same rank, the number of atoms contributed to the product molecule by each of these reactants is calculated, and the parent is the reactant that contributed the most atoms. If the parent still cannot be determined, the first reactant of the rule is taken as the parent. If a product already has a rank assigned, the rank is replaced by the new value if the latter is smaller. The molecules in UnprocessedMol are arranged according to their ranks; if a product molecule already in UnprocessedMol is assigned a new rank, its position is altered accordingly. This ranking scheme is useful in pathway analysis, as discussed later (Sections "Pathway identification" and "Mechanism enumeration").
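The control loop of Algorithm 1, together with the rank bookkeeping described above, can be sketched in C++ under strong simplifying assumptions. A "molecule" is a canonical string, and a rule is a hypothetical function from an ordered reactant tuple to product sets that returns nothing when the rule does not apply, which stands in for IsReactant. Reactions are keyed by sorted reactant and product strings, and the counter simply records how often each reaction is generated. Lumping, the atom-count tie-break, and storage of the parent identity (needed later for pathway search) are omitted, since the rank value depends only on the highest reactant rank. `Rule`, `Network`, `join`, and `generate` are all illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Species = std::string;  // stand-in for a canonical SMILES string
using Tuple = std::vector<Species>;

struct Rule {
    int molecularity;                                       // 1 or 2
    std::function<std::vector<Tuple>(const Tuple&)> apply;  // product sets; empty if not applicable
};

struct Network {
    std::map<Species, int> rank;           // every species with its rank
    std::map<std::string, int> reactions;  // canonical reaction string -> times generated
};

static std::string join(Tuple v) {
    std::sort(v.begin(), v.end());
    std::string s;
    for (std::size_t i = 0; i < v.size(); ++i) s += (i ? "+" : "") + v[i];
    return s;
}

Network generate(const Tuple& initial, const std::vector<Rule>& rules,
                 const std::function<bool(const Species&)>& globalOk) {
    Network net;
    std::set<std::pair<int, Species>> unprocessed;  // ordered by rank: lowest popped first
    Tuple processed;
    for (const Species& s : initial)
        if (globalOk(s) && net.rank.emplace(s, 0).second) unprocessed.insert(std::make_pair(0, s));

    auto generateRxns = [&](const Tuple& reactants, const Rule& r) {
        for (const Tuple& products : r.apply(reactants)) {
            if (!std::all_of(products.begin(), products.end(), globalOk)) continue;
            int parentRank = 0;  // rank of the highest-ranked reactant (the parent)
            for (const Species& s : reactants) parentRank = std::max(parentRank, net.rank.at(s));
            for (const Species& p : products) {
                auto it = net.rank.find(p);
                if (it == net.rank.end()) {
                    net.rank.emplace(p, parentRank + 1);
                    unprocessed.insert(std::make_pair(parentRank + 1, p));
                } else if (parentRank + 1 < it->second) {  // smaller rank found: update, re-queue
                    bool queued = unprocessed.erase(std::make_pair(it->second, p)) > 0;
                    it->second = parentRank + 1;
                    if (queued) unprocessed.insert(std::make_pair(it->second, p));
                }
            }
            ++net.reactions[join(reactants) + "->" + join(products)];
        }
    };

    while (!unprocessed.empty()) {
        Species m1 = unprocessed.begin()->second;  // pop the lowest-rank molecule
        unprocessed.erase(unprocessed.begin());
        processed.push_back(m1);
        for (const Rule& r : rules) {
            if (r.molecularity == 1) {
                generateRxns({m1}, r);
            } else {
                for (std::size_t k = 0; k < processed.size(); ++k) generateRxns({m1, processed[k]}, r);
                for (std::size_t k = 0; k < processed.size(); ++k) generateRxns({processed[k], m1}, r);
            }
        }
    }
    return net;
}
```

With a toy chain-growth rule that concatenates two reactant strings and a global size limit of three characters, starting from "C" the sketch terminates with the species C, CC, and CCC at ranks 0, 1, and 2. This illustrates how an upper bound on size guarantees that the queue empties.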

Completeness and correctness

An important aspect of reaction network generation is the completeness and correctness of the reaction network generated. Completeness requires that all reactions and products that are possible, given the chemistry rules and initial reactants, be generated. Correctness requires that the network be complete and also contain no reactions or species that are inconsistent with the specified chemistry rules. A rigorous mathematical proof of correctness is an open problem, as discussed by Hsu et al. [44]. However, under certain reasonable assumptions, the following proposition can be made; its proof is given in Appendix F.

Proposition 1: Given that: (a) the subgraph isomorphism algorithm is correct and complete, (b) topologically equivalent classes of atoms in a molecule can be determined accurately, and (c) canonical SMILES strings can be generated correctly, Algorithm 1 will yield a complete and correct reaction network.

The three assumptions made in Proposition 1 are reasonable. Subgraph isomorphism detection, albeit computationally expensive, is complete and detects all fragments in the molecule that match a pattern. Topological equivalence is calculated from extended connectivity measures, as described in Weininger et al. [115]. Although extended connectivity measures have been shown to fail to identify topological symmetry correctly in some highly symmetric molecules, they are considered to be an efficient and inexpensive measure in most practical cases [121]. Canonical SMILES algorithms have also been routinely applied in organic chemistry for generating unique strings.

4.3 Post-processing

Instructions given after the reaction rule description for identifying network information, such as isomer lumps, specific reactions and molecules, and pathways and mechanisms, constitute post-processing instructions. In this section, we discuss in detail the algorithms of the post-processing options in RING.

Find pathways to mol{ mol is "CC(=O)C" }
constraints {
    length < 6
    contains <=2 rule "HydrideShift"
    eliminate similar pathways
}
store in "HMFPathways.txt"

Figure 4.7: Sample post-processing instruction written in the language: a query to obtain pathways to acetone

4.3.1 Pathway identification

The pathway identification algorithm in RING finds all pathways between the initial reactants and specified products of the generated network. Figure 4.7 shows a sample query for pathways to acetone of length less than 6 steps and containing no more than 2 instances of the rule "HydrideShift" (assuming such a rule was specified earlier). The algorithm makes use of two sets of information identified and stored during network generation. First, RING keeps track of the rank of each species in the network, that is, the minimum number of steps required for that molecule to form from any of the initial reactants. Second, the "closer" parent of each product molecule in a reaction is identified during network generation in a three-step process. The reactant with the larger rank (that is, the one further from the initial reactants) is the de facto closer parent; if all reactants have the same rank, however, the one that contributes the most atoms to the product becomes its closer parent. In case of a further tie, the first reactant is taken to be the closer parent. The ranks and parent information are used in a reverse depth-first search that starts from the product and traverses backwards along the network to reach the initial reactants. The details of the algorithm are given in Algorithm 2. The inputs of the algorithm are the target molecule, the maximum desired path length, and additional pathway constraints (Figure 4.7), while the output is a list of

Algorithm 2 FindPathways(Molecule M0, integer MaxPathLength, PathwayConstraints PC)
    PathwayStack ← {}       ▷ Stack of reactions constituting a pathway
    AllPathways ← {}        ▷ List of pathways
    MoleculeStack ← M0      ▷ Stack of molecules traversed
    while !MoleculeStack.empty() do
        if MoleculeStack.top() is an initial reactant then
            if MoleculeStack has no repeating entry then
                AllPathways.push(PathwayStack)
            PathwayStack.pop()
            MoleculeStack.pop()
        else
            find new Reaction, R, that forms the Molecule MoleculeStack.top()
            if found && PathwayStack.size()

A pathway of at most MaxPathLength steps to M0 can exist only if Rank(M0) ≤ MaxPathLength. It is advantageous to traverse backwards from the product to the initial reactants because the parent molecule of each product of a reaction is known. During the traversal, a stack of intermediate molecules, MoleculeStack, is maintained; its first, and hence bottom, element is M0. The algorithm also maintains a stack of reactions, PathwayStack, that constitutes the pathway. At each step of the traversal, a reaction that produces the molecule at the top of MoleculeStack is found, and the parent, P, of this molecule is chosen as the next intermediate molecule if P's rank still permits a pathway of MaxPathLength steps or fewer. The reaction is then pushed onto PathwayStack. Furthermore, at each step, the next molecule is chosen such that it is not already present in MoleculeStack. This avoids cycles, so that only acyclic paths are returned by the algorithm. If a parent that is one of the initial reactants is reached within MaxPathLength steps, a pathway is deemed found, and the stack of reactions in PathwayStack is added to the list of pathways, AllPathways. The algorithm then backtracks to the previous molecule in MoleculeStack and proceeds as above, terminating when MoleculeStack becomes empty. The pathways thus found do not consider the co-reactants and co-products of a reaction; the relevant intermediate molecules of a pathway are determined by the parent-daughter relationship of each reaction.

As shown in Figure 4.7, the user can specify constraints that describe the target molecule M0 and then provide pathway constraints. All pathways obtained are subsequently checked against these pathway constraints, which include restrictions on path length, the number of occurrences of specific rules, molecules appearing as reactants or products in the entire pathway or in a given rule, the activation barrier (if it can be calculated), etc. Sometimes, several pathways found between an initial reactant and a specified product have the same length and the same number of reactions of each rule type. Such pathways differ only in the order of their reactions, so the intermediates of one pathway are isomers of those of another. These pathways can be grouped together as a single pathway in RING by using the eliminate similar pathways command, as shown in Figure 4.7.
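The reverse search described above can be sketched in Python as follows. This is a minimal illustration under assumed data structures, not RING's implementation: each hypothetical `Reaction` stores (product, closer parent) pairs, `rank` holds precomputed species ranks, and the "eliminate similar pathways" grouping is approximated by `pathway_signature`, which keys a pathway on its length and its per-rule counts. Recursion replaces the explicit MoleculeStack/PathwayStack loop of Algorithm 2, with the call stack playing the same role.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    name: str
    rule: str
    # Hypothetical layout: (product, closer_parent) pairs recorded during generation.
    closer_parent: tuple


def find_pathways(target, max_len, reactions, rank, initial):
    """Reverse depth-first search from `target` back to the initial reactants,
    following closer-parent links and pruning with species ranks."""
    producers = {}  # product -> [(reaction, closer parent of that product)]
    for rxn in reactions:
        for product, parent in rxn.closer_parent:
            producers.setdefault(product, []).append((rxn, parent))

    pathways = []

    def dfs(molecule, mol_stack, rxn_stack):
        if molecule in initial:
            # Reactions were pushed target-first; reverse to read reactants -> target.
            pathways.append(list(reversed(rxn_stack)))
            return
        for rxn, parent in producers.get(molecule, []):
            if parent in mol_stack:
                continue  # revisiting a molecule would create a cycle
            if len(rxn_stack) + 1 + rank[parent] > max_len:
                continue  # parent is too far from the initial reactants
            mol_stack.append(parent)
            rxn_stack.append(rxn)
            dfs(parent, mol_stack, rxn_stack)
            mol_stack.pop()
            rxn_stack.pop()

    if rank[target] <= max_len:
        dfs(target, [target], [])
    return pathways


def pathway_signature(pathway):
    """Pathways with the same length and the same count of each rule are 'similar'."""
    return len(pathway), tuple(sorted(Counter(r.rule for r in pathway).items()))


# Toy network: A is the only initial reactant; D is the target.
r1 = Reaction("r1", "X", (("B", "A"),))
r2 = Reaction("r2", "Y", (("C", "B"),))
r3 = Reaction("r3", "Y", (("C", "A"),))
r4 = Reaction("r4", "Z", (("D", "C"),))
rank = {"A": 0, "B": 1, "C": 1, "D": 2}
paths = find_pathways("D", 3, [r1, r2, r3, r4], rank, {"A"})
print([[r.name for r in p] for p in paths])  # [['r1', 'r2', 'r4'], ['r3', 'r4']]
```

Constraints such as the rule-count limit in Figure 4.7 can then be applied as filters on the returned list, e.g. `[p for p in paths if sum(r.rule == "HydrideShift" for r in p) <= 2]`.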

Comparison with other pathway finding algorithms

Pathway identification has extensive applications in biological network analysis and reconstruction, as it helps in understanding complex biochemical processes such as metabolism. These pathway-finding algorithms identify k-shortest paths [122] between source and target compounds or reactions of a directed, weighted or unweighted network, with or without tracking the destination of each atom [18, 79, 80]. They start with a given set of reactions, typically constructed from databases. To assess our algorithm, we compare it with these state-of-the-art generic pathway-finding algorithms. The comparisons are only qualitative, because these algorithms cater to any assembled network of reactions, whereas our algorithm is specific to networks generated from an initial set of reactants by successive application of reaction rules; it is therefore optimized for the specific case of analyzing automatically generated reaction networks. The algorithm in RING differs from these algorithms in several ways. First, because the network was constructed from initial reactants and reaction rules, it is safe to assume that each species in the network can be traced back to the initial reactants, and that doing so takes at least as many steps as the rank of the species. This assumption does not hold for the algorithms discussed above. Second, in RING, backtracking along any path from the products will ultimately lead to the initial reactants; this again does not hold in generic pathway identification algorithms. Indeed, in those algorithms, forward search from the reactants and reverse search from the products will have similar performance. Third, our pathway algorithm can, in effect, track atoms along the pathway because parent information is available for each reaction.
For the generic pathway identification algorithms, tracking the destination of the atoms of a reactant in the network requires identification of atom mapping between the reactants and products.

4.3.2 Mechanism enumeration

In addition to pathways, RING can identify direct and complete mechanisms. Direct mechanisms are reaction cycles, i.e., sets of reactions whose overall reaction involves no reactive intermediates. Complete mechanisms, on the other hand, are sets of direct mechanisms (or reaction cycles) that describe the complete transformation from the initial reactants to any products. Complete mechanisms are therefore supersets of reaction pathways. By our definition, both refer only to a set of reactions and their stoichiometry, not to their kinetics or thermochemistry, although these can be calculated if the relevant information is available. Further, at this stage, we do not consider possible rate-determining steps. Algorithm 3 describes the procedure for finding direct mechanisms. The procedure is similar to that for finding pathways: a reverse depth-first search is employed, with stacks of reactions and molecules that are populated during the traversal. At each step of the traversal, a reaction R is chosen, as for pathways. However, R is chosen on the basis of the stoichiometric coefficient of the current intermediate (the top molecule of MoleculeStack) in the overall reaction, OR, of all the reactions in ReactionStack, so that R forms or consumes the intermediate as appropriate. Further, the algorithm ensures that adding the new reaction does not lead to closed cycles whose overall stoichiometry is net zero. The intermediate chosen for the next step of the traversal is a reactive intermediate in OR. A direct mechanism is found when no reactive intermediates are explicitly involved in the overall reaction. Because the overall reaction is checked each time a reaction is added, this procedure ensures that the mechanism is composed of a minimal set of reactions and, hence, is direct.
Further, the stoichiometric coefficient of a reactive intermediate in the overall reaction of ReactionStack need not be 1 (or -1); it can take other values. Similarly, the stoichiometric coefficient of the intermediate in R need not be 1 (or -1). In such cases, appropriate stoichiometric numbers, ν1 and ν2 in the algorithm, must be found so that the overall reaction OR of R and the reactions in ReactionStack, taken together, does not explicitly involve the reactive intermediate at all. The algorithm terminates when MoleculeStack is empty.
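The cancellation step can be made concrete with a small sketch. Reactions are assumed to be dictionaries mapping species to signed stoichiometric coefficients (negative for consumption); `stoichiometric_factors`, `combine`, and `is_direct` are hypothetical helpers that mirror the roles of ν1, ν2, OR, and the direct-mechanism test in Algorithm 3, not RING functions.

```python
from math import gcd


def stoichiometric_factors(a, b):
    """Smallest positive integers (nu1, nu2) with nu1*a + nu2*b == 0, where a is the
    intermediate's coefficient in the current overall reaction and b its coefficient
    in the candidate reaction R. Returns None if R cannot cancel the intermediate."""
    if a == 0 or b == 0 or (a > 0) == (b > 0):
        return None  # same sign: R would form (or consume) even more of it
    g = gcd(abs(a), abs(b))
    return abs(b) // g, abs(a) // g


def combine(overall, rxn, nu1, nu2):
    """Overall reaction nu1*overall + nu2*rxn; zero coefficients are dropped, so an
    empty result signals a net-zero (empty) cycle."""
    result = {}
    for species in set(overall) | set(rxn):
        coeff = nu1 * overall.get(species, 0) + nu2 * rxn.get(species, 0)
        if coeff != 0:
            result[species] = coeff
    return result


def is_direct(overall, intermediates):
    """Direct when the overall reaction is non-empty and contains no reactive intermediate."""
    return bool(overall) and not any(s in intermediates for s in overall)


# A + B -> 2 I (current stack) combined with I + A -> P (candidate R)
stack_or = {"A": -1, "B": -1, "I": 2}
R = {"I": -1, "A": -1, "P": 1}
nu1, nu2 = stoichiometric_factors(stack_or["I"], R["I"])  # (1, 2)
OR = combine(stack_or, R, nu1, nu2)                        # 3 A + B -> 2 P
print(OR, is_direct(OR, {"I"}))
```

Using the greatest common divisor keeps ν1 and ν2 as small as possible, so the combined overall reaction is not scaled up unnecessarily.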

Algorithm 3 FindDirectMechanisms(Molecule M0, integer MaxLength, MechanismConstraints MC)
    ReactionStack ← {}      ▷ Stack of reactions constituting a direct mechanism
    AllDirectMechs ← {}     ▷ List of direct mechanisms
    MoleculeStack ← M0      ▷ Stack of molecules traversed
    while !MoleculeStack.empty() do
        if ReactionStack is a direct mechanism then
            AllDirectMechs.push(ReactionStack)
            ReactionStack.pop()
            MoleculeStack.pop()
        else
            find new reaction, R, forming/consuming MoleculeStack.top()
            if found && (# of unique reactions in ReactionStack) + 1 ≤ MaxLength then
                if adding R into ReactionStack does not lead to cycles then
                    determine the stoichiometric factors, ν1 and ν2, of ReactionStack and R, respectively
                    get the overall reaction OR of ν1 × ReactionStack and ν2 × R
                    if OR ≠ ∅ then
                        find reactive intermediate I ∈ OR
                        if found then
                            ReactionStack.push(R)
                            MoleculeStack.push(I)
            else
                ReactionStack.pop()
                MoleculeStack.pop()
    return AllDirectMechs satisfying MC

Algorithm 4 lays out the procedure adopted in RING for finding complete mechanisms. This algorithm also uses a reverse depth-first search, as in the identification of pathways and direct mechanisms, but at each step of the traversal, direct mechanisms are added instead of individual reactions. Note that direct mechanisms are also referred to as reaction cycles (“RxnCycles”) in the algorithm. The direct mechanisms are calculated on the fly when a new intermediate is encountered and stored until the end. This way, direct mechanisms need to be identified only for the relevant intermediates, and only once. At each step, the next intermediate is chosen from amongst all the non-initial reactants of the overall reaction OR. A complete mechanism is found when the reactants of OR are all initial reactants of the given system. In this sense, mechanisms and pathways parallel the stoichiometric and path-finding approaches in systems biology [123].

Algorithm 4 FindCompleteMechs(Molecule M0, integer MaxLength, integer MaxRxnCycles, OverallConstraints MC)
    MechanismStack ← {}     ▷ Stack of direct mechanisms constituting a complete mechanism
    AllMechanisms ← {}      ▷ List of complete mechanisms
    MoleculeStack ← M0      ▷ Stack of molecules traversed
    while !MoleculeStack.empty() do
        if MechanismStack is a complete mechanism then
            AllMechanisms.push(MechanismStack)
            MechanismStack.pop()
            MoleculeStack.pop()
        else
            find direct mechanism, Dm, forming MoleculeStack.top() and not considered before
            if found && MechanismStack.size() + Dm.size() ≤ MaxLength then
                determine the stoichiometric factors, ν1 and ν2, of MechanismStack and Dm, respectively
                get the overall reaction OR of ν1 × MechanismStack and ν2 × Dm
                find reactant M1 ∈ OR with M1 ≠ any initial reactant
                if found && MechanismStack.NumberOfRxnCycles() ≤ MaxRxnCycles then
                    MechanismStack.push(Dm)
                    MoleculeStack.push(M1)
            else
                MechanismStack.pop()
                MoleculeStack.pop()
    return AllMechanisms satisfying MC

Comparison with other mechanism identification algorithms

Algorithms for constructing reaction mechanisms have been proposed, notably by Happel and Sellers [124], Otarod and Happel [83], Mavrovouniotis and Stephanopoulos [81], and Mavrovouniotis [82]. These algorithms construct all possible direct mechanisms from a given set of reactions. The underlying idea is to exhaustively consider all distinct combinations of reactions so that the chosen set has no net consumption or formation of reactive intermediates and is minimal in size, thus forming a direct mechanism. In the algorithm by Mavrovouniotis and Stephanopoulos [81], for example, this is done by finding all pairs of reactions for a reactive intermediate (one forming it and another consuming it) and adding them together to obtain a new set of reactions without that intermediate. The reactions involving that intermediate are then replaced by this new set, so that the intermediate is completely eliminated from the network. This procedure is repeated iteratively until all direct mechanisms are obtained. Our algorithm differs from these in several ways; again, we provide only a qualitative, descriptive comparison because the scope of the algorithms mentioned above differs from that of the ones proposed here. First, in RING, mechanisms (direct or complete) are sought for specified target molecules. That is, rather than all possible mechanisms, only the mechanisms required by the query are identified. This does not require the successive elimination of all intermediates, as in the algorithm by Mavrovouniotis and Stephanopoulos [81]; only the intermediates involved in the synthesis of the target molecules need to be eliminated. Second, the algorithms mentioned above find only direct mechanisms; they do not find the complete mechanism from the initial reactants to the final products. Third, our direct-mechanism algorithm identifies and eliminates empty (net-zero) cycles.
The other algorithms mentioned above do not report a cycle detection and elimination method. Alternative mechanism identification methods based on constrained optimization have also been proposed [125]. These methods formulate an integer or mixed-integer linear programming problem to select reactions that, taken together, have no net consumption or production of reactive intermediates. Such algorithms offer the advantage of identifying a linearly independent set of overall mechanisms, as in elementary modes and extreme pathways [126], or of identifying alternative solutions [127]. These methods can also be used to identify cycles by requiring that both reactants and products be net zero. Marvin et al. [128] describe a method for identifying mechanisms that combines network generation using RING with constrained optimization algorithms; RING outputs the network in formats that can be exported into optimization software (specifically GAMS). The advantage of such a method is that, in addition to all mechanisms being identified, they can be sorted or ranked according to any user-defined objective.
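For comparison, the pairwise elimination idea attributed above to Mavrovouniotis and Stephanopoulos [81] can be sketched as a single elimination step. This is a simplified illustration of the description given here, not their published implementation; reactions are assumed to be species-to-coefficient dictionaries, and no cycle detection is attempted, so a forming/consuming pair that cancels completely remains as an empty reaction.

```python
from math import gcd


def eliminate(reactions, intermediate):
    """One pairwise elimination step: each (forming, consuming) pair for
    `intermediate` is summed with the smallest integer multipliers that cancel it,
    and these sums replace every reaction that involves the intermediate."""
    forming = [r for r in reactions if r.get(intermediate, 0) > 0]
    consuming = [r for r in reactions if r.get(intermediate, 0) < 0]
    untouched = [r for r in reactions if r.get(intermediate, 0) == 0]
    combined = []
    for f in forming:
        for c in consuming:
            a, b = f[intermediate], -c[intermediate]  # both positive
            g = gcd(a, b)
            nu_f, nu_c = b // g, a // g               # nu_f*a == nu_c*b
            new = {}
            for s in set(f) | set(c):
                coeff = nu_f * f.get(s, 0) + nu_c * c.get(s, 0)
                if coeff != 0:
                    new[s] = coeff
            combined.append(new)  # a net-zero cycle shows up as an empty dict
    return untouched + combined


# A -> I, I -> B, I -> C: eliminating I leaves A -> B and A -> C
network = [{"A": -1, "I": 1}, {"I": -1, "B": 1}, {"I": -1, "C": 1}]
print(eliminate(network, "I"))
```

Repeating this step for every intermediate eliminates all of them from the network, which is the exhaustive behavior contrasted with RING's target-driven search above.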

4.3.3 Lumping

The size of a complex reaction network can be reduced by lumping, or grouping, isomers. The reduced network can be more amenable to further analysis such as kinetic modeling. In RING, the process of identifying the lumps consists of three steps: (a) collation of molecules that have the same number of each type of functional group into one lump, or functional lumping; (b) assignment of a representative molecule to each lump based on user-input structural criteria for cyclic and acyclic species; and (c) further lumping, based on molecular formula, of paraffins, olefins, naphthenes, and hydrocarbon aromatics (PONA) or of molecules satisfying user-defined properties. Each of these steps is considered in detail below.

Molecule collation

The first step in the process, if lumping is requested by the user, is to find and group all molecules that have the same number of each functional group. For example, 2-pentanol and 3-pentanol are lumped together because both have two methyl carbon atoms (CH3), two methylene carbon atoms (CH2), and one carbon and one oxygen of a CHOH group. Figure 4.9(a) shows the collation of two groups of functionally equivalent molecules: secondary pentanols and xylenes. On the other hand, 1-pentanol forms a separate lump because it has three methylene carbon atoms, one methyl carbon, and one carbon and one oxygen atom of a CH2OH group. Note that a functional group is a specific combination of atoms and bonds. Thus, two molecules have exactly the same number of each functional group if there exists a mapping between each atom of one molecule and a unique atom of the other. Such a mapping is possible when the two atoms are themselves identical and have identical nearest-neighbor atoms and bonds. Functional equivalence between two molecules can therefore be established by keeping track of the kinds of atoms, each characterized by its element and nearest-neighbor environment, present in each. To do this, RING adopts a simplified hashing scheme.

Wipke et al. [129] proposed and tested different hash functions generated by combining different parts of a canonical molecule name (SEMA) [130] through Boolean operations such as XOR, to obtain bit-string hash values of different sizes (8, 9, or 10). These bit-strings thereby implicitly contain the structural and stereochemical information of the compound. Ihlenfeldt et al. [131], on the other hand, use the complete molecular topology instead of only the name for hashing. An atom hashing seed is first generated for each atom as the product of prime numbers corresponding to certain seed parameters. Subsequently, the atom hash is generated by combining the seed of each atom with those of its neighbors, followed by equalization of the bit distribution. The molecule hash is then obtained as a combination (logical XORs) of the constituent atom hashes. This procedure thus directly takes into account every atom, its neighboring environment, and its bonding in the molecule to generate a hash value. In RING, the hashing scheme exploits the properties of prime numbers and adapts ideas from Ihlenfeldt et al. [131] and Wipke et al. [129], as discussed below. However, the use of molecule hashing in RING differs from that described above. Wipke et al. [129] and Ihlenfeldt et al. [131] used hashing predominantly for indexing molecules in databases and quickly retrieving molecular information. In such cases, hash functions are designed to minimize the chance of collision between the hash values of different molecules. In contrast, the key requirements for lumping are that (a) all molecules that are functionally equivalent, i.e., that have the same number of each type of functional group, must have the same hash value, and (b) collisions between inequivalent molecules must be completely avoided. Algorithm 5 describes the algorithm for generating the hash value of molecules.
The first step is the evaluation of an atom hash seed for each atom as the product of primes corresponding to various parameters, such as the number of nearest non-hydrogen neighbors, the element type of the atom (C, N, O, etc.), the element types of the neighboring atoms, aromaticity, and the bond order of each bond connected to it (see Appendix G). For example, the hash seed of the tertiary carbon in 2-butanol is calculated as shown in Figure 4.8. The carbon atom has three neighboring atoms, hence the factor 2^3, while the prime corresponding to carbon, prime(C), is cubed because the atom under consideration is carbon and it has two neighboring carbon atoms. Subsequently, RING assigns a prime number to each distinct seed on the fly during reaction network generation. This prime number constitutes the atom hash of that atom. Note that two atoms of the same element with identical nearest neighbors have the same atom hash value. Thus, an atom hash value corresponds to a particular class or kind of atom, such as the tertiary carbon (Tert. C) shown in Figure 4.8. The molecule hash value is evaluated as the product of the atom hash values of all the non-hydrogen atoms in the molecule (hydrogen atom hashes are, however, considered for purely hydrogenic species such as H+, H-, H·, etc.). More information on the individual functions used in Algorithm 5 is given in Appendix G.
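Writing p(X) for ElementPrimeValue(X), the factors listed above for the tertiary carbon of 2-butanol, CH3CH(OH)CH2CH3, multiply out as follows (the atom is not aromatic, so there is no factor of 3, and all bonds are single, so every bond-order exponent is 1):

$$
\begin{aligned}
\text{seed}(\mathrm{C}_{\mathrm{tert}})
  &= \underbrace{2^{3}}_{\text{3 heavy neighbors}}
     \times \underbrace{p(\mathrm{C})}_{\text{atom itself}}
     \times \underbrace{p(\mathrm{C})^{1}\, p(\mathrm{C})^{1}}_{\text{two C neighbors}}
     \times \underbrace{p(\mathrm{O})^{1}}_{\text{O neighbor}} \\
  &= 2^{3}\, p(\mathrm{C})^{3}\, p(\mathrm{O})
\end{aligned}
$$

Any other carbon with two single-bonded carbon neighbors and one single-bonded oxygen neighbor gives the same seed, and hence the same atom hash.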

Algorithm 5 (Integer, Integer) HashValue(Mol)
    Hash ← 1, ElectronicHash ← 1
    for each non-Hydrogen atom ai of Mol do
        AtomHash ← 1
        n = # nearest neighbors of ai
        AtomHashSeed = 2^n × ElementPrimeValue(ai)
        if atom is aromatic then
            AtomHashSeed = AtomHashSeed × 3
        ElectronicHash = ElectronicHash × AtomElectronicHash(ai)
        for each neighbor, N, of ai do
            AtomHashSeed = AtomHashSeed × ElementPrimeValue(N)^bondorder
        AtomHash = AtomPrimeValue(AtomHashSeed)
        Hash = Hash × AtomHash
    return (Hash, ElectronicHash)
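Algorithm 5 can be sketched in Python under several assumptions that are not taken from RING: the element primes in `ELEMENT_PRIME` are hypothetical, bond orders are integers, molecules are given as heavy-atom graphs, and the electronic prime of a non-ground-state atom is keyed on (|charge|, radical). The electronic contribution follows the definition given in the next paragraph (atom hash times an electronic prime) and is therefore computed after the atom hash is known. Because Python integers are unbounded, the overflow issue discussed below does not arise in this sketch.

```python
from itertools import count

# Hypothetical element primes; 2 and 3 are kept for the neighbor-count and
# aromaticity factors of the seed, so element primes start at 5.
ELEMENT_PRIME = {"C": 5, "O": 7, "N": 11, "S": 13}


def primes():
    """Yield 2, 3, 5, 7, ... by trial division."""
    found = []
    for n in count(2):
        if all(n % p for p in found if p * p <= n):
            found.append(n)
            yield n


class HashRegistry:
    """Hands out a fresh prime for each new atom hash seed (the AtomPrimeValue role)
    and for each new electronic state, the first time it is encountered."""

    def __init__(self):
        self._next_prime = primes()
        self.atom_prime = {}
        self.electronic_prime = {}

    def _prime_for(self, table, key):
        if key not in table:
            table[key] = next(self._next_prime)
        return table[key]

    def hash_value(self, atoms, bonds):
        """atoms: (element, aromatic, charge, radical) per heavy atom;
        bonds: (i, j, integer bond order). Returns (Hash, ElectronicHash)."""
        neighbors = {i: [] for i in range(len(atoms))}
        for i, j, order in bonds:
            neighbors[i].append((j, order))
            neighbors[j].append((i, order))
        mol_hash, electronic_hash = 1, 1
        for i, (element, aromatic, charge, radical) in enumerate(atoms):
            seed = 2 ** len(neighbors[i]) * ELEMENT_PRIME[element]
            if aromatic:
                seed *= 3
            for j, order in neighbors[i]:
                seed *= ELEMENT_PRIME[atoms[j][0]] ** order
            atom_hash = self._prime_for(self.atom_prime, seed)
            mol_hash *= atom_hash
            if charge != 0 or radical:  # not in the ground-state configuration
                state = (abs(charge), radical)
                electronic_hash *= atom_hash * self._prime_for(self.electronic_prime, state)
        return mol_hash, electronic_hash


def chain(n_carbons, oh_position):
    """Heavy-atom graph of a linear alcohol: carbons 0..n-1, OH on carbon `oh_position`."""
    atoms = [("C", False, 0, False)] * n_carbons + [("O", False, 0, False)]
    bonds = [(k, k + 1, 1) for k in range(n_carbons - 1)] + [(oh_position, n_carbons, 1)]
    return atoms, bonds


reg = HashRegistry()
h_1pentanol = reg.hash_value(*chain(5, 0))
h_2pentanol = reg.hash_value(*chain(5, 1))
h_3pentanol = reg.hash_value(*chain(5, 2))
print(h_2pentanol == h_3pentanol, h_1pentanol == h_2pentanol)  # True False
```

Both secondary pentanols produce the same multiset of seeds, so their products of atom primes coincide, while the CH2OH carbon of 1-pentanol produces a different seed and hence a different molecule hash.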

Figure 4.8: An example of molecule hash value calculation. The steps are: (1) calculate hash seed of atom, (2) calculate hash of the atom using the hash seed, and (3) calculate molecule hash from the constituent atom hashes.

An additional hash value accounting for the electronic configuration of the atoms of the molecule is also evaluated. For each atom not in its elemental ground-state electronic configuration, the product of its atom hash and a prime number corresponding to the electronic nature of the atom (magnitude of charge and presence or absence of unpaired electrons) is evaluated as the electron hash of the atom. The electronic hash of the molecule is the product of the electron hash values of its non-ground-state atoms. The molecule electronic hash of a neutral, stable molecule such as 2-butanol is therefore 1 by definition. The hash value of a molecule is, hence, the integer pair of the molecule hash value and the molecule electronic hash. Because a functional group is composed of specific types of atoms, two functionally equivalent molecules have the same number of each type of atom and, therefore, equal hash values. Hashing for functional equivalence with an integer pair calculated as products of prime numbers is atypical compared with the bit-string-based hash functions discussed in Ihlenfeldt et al. [131]. However, the hash values in RING serve a purpose similar to that of the bit-string technique, as both provide a distinct identification for quick retrieval of molecular information. The hash value is implemented as a pair of unsigned integers in RING; there is, therefore, a finite possibility that hash values become large enough to cause a numerical overflow. In such a case, the hash values can wrap around to a different number and thereby potentially lead to collisions. This is resolved by comparing the sizes of the two colliding molecules, which will necessarily be different.

Figure 4.9: Three-step lumping process (panel label: “Five-Carbon Alcohol lump”): (a) functional group-based equivalence is used to collate molecules, (b) a representative of each collated lump is identified, and (c) paraffin, olefin, naphthene, and aromatic (PONA) lumps are further lumped based on molecular formula.

Identifying a representative molecule

The representative molecule of each functionally equivalent lump is determined on the basis of user-defined criteria. Specifically, (a) the representative of a lump of acyclic species can be the member whose leaves (the end atoms of the molecule) are closest to (or farthest apart from) each other, and (b) the representative of a lump of cyclic species can be the member whose branches are farthest apart from (or closest to) each other. Figure 4.9(b) shows 3-pentanol and p-xylene chosen as the representatives because their leaves and branches, respectively, are farthest apart. Alternatively, the molecule with leaves (or branch ends) closest to each other could be set as the representative; in that case, for example, o-xylene would be the representative of the xylene lump.
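The text does not specify how "farthest apart" is measured, so the sketch below assumes one plausible metric: the smallest bond-count distance between any two leaves of an acyclic heavy-atom graph. Under that assumption, the hypothetical `representative` function reproduces the choice of 3-pentanol in Figure 4.9(b); the function names and the (name, bonds, atom count) layout are illustrative only.

```python
from collections import deque
from itertools import combinations


def bfs_distances(start, adjacency):
    """Number of bonds from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def leaf_spread(bonds, n_atoms):
    """Smallest bond distance between any two leaves (heavy atoms with one neighbor).
    Assumes an acyclic molecule with at least two leaves."""
    adjacency = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    leaves = [i for i, nbrs in adjacency.items() if len(nbrs) == 1]
    return min(bfs_distances(a, adjacency)[b] for a, b in combinations(leaves, 2))


def representative(lump, farthest=True):
    """Pick the (name, bonds, n_atoms) member whose leaves are farthest apart,
    or closest together when farthest=False; ties go to the first member."""
    choose = max if farthest else min
    return choose(lump, key=lambda m: leaf_spread(m[1], m[2]))[0]


secondary_pentanols = [
    ("2-pentanol", [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5)], 6),
    ("3-pentanol", [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)], 6),
]
print(representative(secondary_pentanols))                  # 3-pentanol
print(representative(secondary_pentanols, farthest=False))  # 2-pentanol
```

With this metric, the OH oxygen of 2-pentanol is two bonds from the nearest methyl carbon, whereas in 3-pentanol every pair of leaves is at least three bonds apart.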

Additional lumping

The lumps identified through functional lumping can be collapsed further into fewer groups. At this stage, lumping is based on the numbers of heavy and hydrogen atoms in the molecule, i.e., on the molecular formula, and applies only to lumps whose members satisfy molecular characteristics set by the user. RING implements this additional lumping by sifting through each lump, determining whether the lump representative matches the characteristics specified for additional lumping, and grouping the matching lumps accordingly to obtain the new set of lumps. There are four pre-specified classes for lumping hydrocarbons (paraffins, olefins, naphthenes, and hydrocarbon aromatics), together with their reactive intermediates. The user can choose to represent each such lump by its constituent with the most or the fewest branches. Figure 4.9(c) shows an example in which paraffins and aromatics are both lumped to their most branched molecules. Alternatively, either or both could have been lumped to their least branched constituents. In addition to these pre-specified classes, there can be user-defined classes. The user specifies molecular characteristics involving size, shape, and/or the presence or absence of a specified number of functional groups that describe the desired class of molecules, and then specifies the nature of the representative (most or least branched). For example, surface alkoxide intermediates in solid Brønsted acid catalysis can be lumped using this method. Further, n- and iso-alcohols, lumped separately by the functional lumping scheme, can be lumped together using this strategy.
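The formula-based merging step can be sketched as a grouping by (class, molecular formula). The tuple layout, the class labels, and the branch counts are hypothetical inputs; in RING, class membership would come from the pre-specified PONA classes or from a user-defined characteristic.

```python
from collections import defaultdict


def additional_lumping(lumps, classes, most_branched=True):
    """Merge functional lumps whose representatives fall in one of `classes` and
    share a molecular formula; lumps outside those classes pass through unchanged.
    lumps: (representative_name, formula, molecule_class, n_branches) tuples."""
    groups = defaultdict(list)
    passthrough = []
    for lump in lumps:
        _, formula, cls, _ = lump
        if cls in classes:
            groups[(cls, formula)].append(lump)
        else:
            passthrough.append(lump)
    choose = max if most_branched else min
    merged = {}
    for key, members in groups.items():
        rep = choose(members, key=lambda m: m[3])  # branch count picks the representative
        merged[key] = (rep[0], [m[0] for m in members])
    return merged, passthrough


lumps = [
    ("n-pentane", "C5H12", "paraffin", 0),
    ("2-methylbutane", "C5H12", "paraffin", 1),
    ("2,2-dimethylpropane", "C5H12", "paraffin", 2),
    ("3-pentanol", "C5H12O", "alcohol", 0),
]
merged, rest = additional_lumping(lumps, {"paraffin"})
print(merged[("paraffin", "C5H12")][0])  # 2,2-dimethylpropane
```

Passing `most_branched=False` selects the least branched constituent (n-pentane here) instead.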

Comparison with other lumping methods

Methods to lump a reaction network fall into two broad categories: (a) mathematical lumping based on kinetics, and (b) chemical functionality-based lumping. Mathematical lumping methods [90, 91, 132] identify a lumping matrix “M” that groups one or more species so that the results of the reduced model are as close as possible to those of the original. The constituents of such groups may or may not have physical meaning [16]. Chemical functionality-based lumping methods, on the other hand, group together molecules that have a similar set of functional groups. These lumps have similar chemical and physical properties, in addition to being related through several reactions in the network. RING's lumping scheme falls under this second category, and we compare our algorithm with other such methods. The vector-based representation of Structure-Oriented Lumping (SOL) [88] offers a natural framework for lumping structural isomers that have the same set and number of functional groups but a different order or position of these groups in the molecule. However, molecules can only be represented as lumps, and it is not always possible to recover the structures of the individual molecules that constitute a lump from the vector. Further, the vector that stores the structural representation is fixed and pre-assigned; for each new chemistry, the internal representation vector must therefore be expanded to account for new functional groups. RING offers the feature of SOL (functional group-based lumping) through its functional lumping step, but is more generic, as it is chemistry-independent and identifies and tracks functional groups dynamically. The software RDL++ [44] performs lumping of hydrocarbon isomers on the basis of molecular formula as a post-processing step.
This method is not applicable to lumping oxygenates, because even linear molecules with different functional groups can have the same molecular formula; for example, dimethyl ether and ethanol are both C2H6O, even though they have different functional groups. The lumping technique in RING can distinguish these two molecules; however, provisions also exist to lump molecules based on molecular formula for user-specified categories of molecules (such as paraffins, naphthenes, surface alkoxide intermediates, etc.).

Lumped network

Once molecules are assigned to their respective lumps, the reactions of the network can be lumped as well, on the basis of the lumps of their reactants and products. A lumped reaction, then, is one in which the reactants and products are represented by their respective lumps, and a lumped network is the network of lumped reactions. Several reactions of a given reaction rule can collapse to the same lumped reaction; the lumped network is therefore significantly smaller than the parent network. This “reduced” network, in principle, contains the distinct reactions between the different groups of molecules of the network.
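Collapsing reactions onto lumps amounts to rewriting each reaction in terms of lump names and removing duplicates. The sketch below assumes a hypothetical (rule, reactants, products) tuple layout and a precomputed molecule-to-lump map; the rule name "Dehydration" and the lump names are illustrative only.

```python
def lump_network(reactions, lump_of):
    """Rewrite each reaction in terms of lumps and keep one copy of every distinct
    lumped reaction. reactions: (rule, reactants, products) with molecule names;
    lump_of: molecule name -> lump name."""
    lumped, seen = [], set()
    for rule, reactants, products in reactions:
        key = (rule,
               tuple(sorted(lump_of[m] for m in reactants)),
               tuple(sorted(lump_of[m] for m in products)))
        if key not in seen:
            seen.add(key)
            lumped.append(key)
    return lumped


# Hypothetical lump assignment: secondary pentanols share a functional lump and,
# assuming olefins were further lumped by formula, both pentenes share one too.
lump_of = {"2-pentanol": "sec-C5-alcohol", "3-pentanol": "sec-C5-alcohol",
           "1-pentene": "C5H10-olefin", "2-pentene": "C5H10-olefin", "water": "water"}
reactions = [
    ("Dehydration", ["2-pentanol"], ["1-pentene", "water"]),
    ("Dehydration", ["2-pentanol"], ["2-pentene", "water"]),
    ("Dehydration", ["3-pentanol"], ["2-pentene", "water"]),
]
print(lump_network(reactions, lump_of))
# [('Dehydration', ('sec-C5-alcohol',), ('C5H10-olefin', 'water'))]
```

Sorting the lump names makes the key independent of the order in which reactants and products are listed, so equivalent reactions collapse to a single entry.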

4.3.4 Molecule and reaction queries

When a molecule's thermochemical property (enthalpy, entropy, or specific heat capacity) has to be predicted, RING loops over all the atoms in the molecule to determine their contributions and adds them up. Specifically, for each atom, RING determines its atom type, calculates the hash pair, and checks whether the appropriate set of group fragments has a group with the same hash pair value. Note that the hash pair value of the atom takes into account any double bonds of the neighboring atoms. If no group with the atom's hash pair value is found, the hash value h2 is recalculated assuming one fewer double bond, and RING checks whether the new hash pair matches some group. This is repeated until h2 no longer accounts for any double bonds; if no matching group is available at that point, RING reports an error. This procedure ensures that a group with more information (e.g., having ‘Cd’ or ‘CO’ rather than just ‘C’) is checked before one with less information (and hence a more generic group). When a matching group is identified, RING checks for matches of the group in the entire molecule. For each match, RING identifies the corresponding central atom (the atom that matches the central atom of the group), determines which of these atoms have a hash pair equal to the original value of (h1, h2), accounts for the contribution of those atoms, and removes them from further consideration. Once RING has calculated the contributions of each additivity group, corrections are calculated and added by going over each user-defined correction fragment and checking for matches in the molecule; for each distinct match, the appropriate correction is added. The final sum is the thermochemical property value of the molecule. If specific heat capacity values (cp) are available at different temperatures, as shown in Figure 3.5, enthalpy and entropy at any temperature can be calculated.
For this purpose, cp is calculated using linear interpolation; for example, cp at 450 K is calculated as the average of the cp values at 400 and 500 K. This group contribution scheme has been used to calculate the thermochemical properties of gas-phase stable and radical species, of surface alkoxides arising in Brønsted acid heterogeneous catalysis, and of surface intermediates in metal catalysis. The twin features of additivity and corrections enable non-standard Benson-like methods to be defined, such as those for calculating the thermochemistry of surface intermediates on metals [133]. For example, [133] distinguishes CH2 groups depending on whether the molecule is a gaseous species or a surface intermediate. This can be handled by first defining the group in the additivity scheme as that for gaseous species and then adding a correction for the group, applicable only to surface species (identified by an appropriately defined user characteristic), equal to the difference between the surface value and the gas-phase value.
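Given tabulated cp values, enthalpy and entropy changes follow from dH = cp dT and dS = (cp/T) dT. The sketch below shows one way to do this with the linear interpolation described above, integrating each piecewise-linear segment exactly; the cp table is hypothetical, and the reference enthalpy and entropy at T0 would come from the group additivity estimate. The function names are illustrative, not RING's.

```python
import math
from bisect import bisect_right


def cp_at(T, temps, cps):
    """Linearly interpolate tabulated cp; e.g. cp(450 K) is the mean of cp(400 K) and cp(500 K)."""
    if not temps[0] <= T <= temps[-1]:
        raise ValueError("temperature outside the tabulated range")
    k = min(bisect_right(temps, T), len(temps) - 1)
    t1, t2, c1, c2 = temps[k - 1], temps[k], cps[k - 1], cps[k]
    return c1 + (c2 - c1) * (T - t1) / (t2 - t1)


def delta_h_s(T0, T, temps, cps):
    """(H(T) - H(T0), S(T) - S(T0)) from dH = cp dT and dS = cp/T dT, integrated
    exactly on each segment where the interpolated cp is linear."""
    lo, hi = min(T0, T), max(T0, T)
    knots = sorted({lo, hi, *(t for t in temps if lo < t < hi)})
    dH = dS = 0.0
    for a, b in zip(knots, knots[1:]):
        ca, cb = cp_at(a, temps, cps), cp_at(b, temps, cps)
        slope = (cb - ca) / (b - a)
        intercept = ca - slope * a            # cp = intercept + slope*T on [a, b]
        dH += 0.5 * (ca + cb) * (b - a)
        dS += intercept * math.log(b / a) + slope * (b - a)
    sign = 1.0 if T >= T0 else -1.0
    return sign * dH, sign * dS


temps = [300.0, 400.0, 500.0]   # K
cps = [30.0, 40.0, 50.0]        # J/(mol K), hypothetical values
print(cp_at(450.0, temps, cps))  # 45.0
print(delta_h_s(300.0, 500.0, temps, cps))
```

Splitting the integral at every tabulated temperature keeps cp linear on each segment, so the trapezoid rule for H and the closed form a ln(b/a) + slope (b - a) for S are exact rather than approximate.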

4.4 Kinetic modeling

The kinetic modeling feature in RING solves the model of a steady-state plug flow reactor (PFR). The model equations are (4.1) to (4.7):

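$$\text{Flow balance:}\quad \frac{dF_i}{dV} = r_i(C_i) \qquad \forall i \in S_{bulk} \tag{4.1}$$

$$\text{QSSA:}\quad r_j(C_j) = 0 \qquad \forall j \in S_{surface} \tag{4.2}$$

$$\text{Site balance:}\quad \sum_{j} \alpha_{kj} C_j + C_k = C_k^{o} \qquad \forall k \in S_{site},\ j \in S_{surface} \tag{4.3}$$

$$\text{Vol. flow rate:}\quad \nu = \frac{RT \sum_i F_i}{P} \tag{4.4}$$

$$\text{Initial conditions:}\quad F_i(0) = F_{i0};\quad C_i \nu = F_i \qquad \forall i \in S_{bulk} \tag{4.5}$$

$$C_j(0) = C_{j0} \qquad \forall j \in S_{surface} \tag{4.6}$$

$$C_k(0) = C_{k0} \qquad \forall k \in S_{site} \tag{4.7}$$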

where F is the molar flow rate, C is the concentration, and ν is the volumetric flow rate; $S_{bulk}$, $S_{surface}$, and $S_{site}$ are the sets of species in the bulk phase (that flow into and out of the reactor), species on the surface (in the case of heterogeneous catalysts), and heterogeneous site types, respectively. Site type refers to each kind of heterogeneous site in the system: Brønsted acid catalysis has one site type, while metal-acid catalytic systems have two (metallic sites and acid sites). $\alpha_{kj}$ is the number of individual sites of type k occupied by surface species j. $F_0$ and $C_0$ are the initial flow rates and concentrations. P is the pressure of the system, R is the universal gas constant, and T is the system temperature. r is the rate of formation or consumption of each species and is positive for net formation and negative for net consumption. The rates r of all species are given by r = Sυ, where S is the stoichiometric matrix of all reactions in the system and υ is the vector of reaction rates (calculated assuming mass-action kinetics). The stoichiometric matrix stores the stoichiometric coefficient of each species in each reaction and has dimension (# species) × (# reactions). For each site type, a site balance is included such that the concentrations of all free sites of that type and of all surface species occupying that site type add up to the total concentration of that site type. The kinetic module in RING is standard and comparable to those of several other network generators. RING can model both homogeneous and heterogeneous chemistries. To formulate and solve kinetic models for heterogeneous catalysis, RING invokes the quasi-steady state assumption (QSSA) for surface intermediates and solves the resulting differential-algebraic system of equations, which accounts for the site balance.
Consistent initial conditions are required to solve these differential-algebraic systems, and assuming that all surface sites are initially free leads to inconsistencies because the QSSA of the other surface species will not hold. RING uses the consistent-initial-condition calculation feature of IDAS [93] to calculate the initial concentrations of the surface species. However, one limitation arises in solving systems such as metal catalysis: the equations corresponding to the QSSA are nonlinear, with the possibility of multiple solutions. A more robust method to obtain the correct initial conditions is therefore currently being pursued.
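What "consistent" means here can be shown on a one-site-type toy mechanism. The sketch below is an assumption-laden illustration, not the IDAS algorithm RING uses: it takes a hypothetical mechanism A + * ⇌ A* followed by 2 A* → A2 + 2 *, substitutes the site balance into the QSSA equation for A*, and finds the coverage by bisection. All names and rate constants are hypothetical.

```python
"""Consistent initial surface coverage under QSSA and a site balance (toy sketch)."""


def qssa_residual(theta_a, k_ads, k_des, k_rxn, c_a):
    """Net formation rate of A* for the hypothetical mechanism
    A + * <-> A* (k_ads, k_des) and 2 A* -> A2 + 2 * (k_rxn)."""
    theta_free = 1.0 - theta_a  # site balance: theta_free + theta_a = 1
    return k_ads * c_a * theta_free - k_des * theta_a - 2.0 * k_rxn * theta_a ** 2


def consistent_coverage(k_ads, k_des, k_rxn, c_a, tol=1e-12):
    """Bisection for the QSSA root on [0, 1].

    The residual is positive at theta_a = 0 (for k_ads * c_a > 0), negative at
    theta_a = 1, and strictly decreasing, so this toy case has a unique root.
    Larger mechanisms can have several roots, which is the difficulty noted
    above for metal catalysis.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if qssa_residual(mid, k_ads, k_des, k_rxn, c_a) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(consistent_coverage(1.0, 1.0, 1.0, 1.0), 4))  # 0.366
```

Starting from all sites free (theta_a = 0) would give a nonzero residual, which is exactly the inconsistency the text describes.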

4.4.1 Thermodynamic consistency

Thermodynamic consistency of a kinetic model ensures that the model does not violate thermochemical constraints. It requires two conditions to be satisfied. First, the ratio of the forward and reverse rate constants is constrained to equal the equilibrium constant (equation 4.8):

$\dfrac{k_f}{k_r} = K_c(T) = K_p(T) \times \left(\dfrac{RT}{1\ \text{atm}}\right)^{-\Delta n}$ (4.8)

where k_f, k_r, K_c, K_p, and ∆n refer to the forward rate constant, the reverse rate constant, the equilibrium constants with respect to concentration and pressure, and the change in the number of gas phase molecules, respectively. This means that the forward and reverse activation barriers differ by the heat of reaction, and the forward and reverse pre-exponential factors differ by the factor $e^{\Delta S/R}$. Second, in general, for a reaction network, a reaction R_j can be written as a linear combination of a basis set of reactions as $R_j = \sum_{i=1}^{L} \alpha_i R_i$. The dimension L of this basis set is determined by the rank of the network, which is less than or equal to the number of species in the system. The equilibrium constant of this reaction can therefore be written as a function of the equilibrium constants of the basis set of reactions, as shown in equation 4.9 [134]

$K_j = \prod_{i=1}^{L} K_i^{\alpha_i}$ (4.9)

where K_i is the equilibrium constant of the i-th of the L basis reactions and K_j is the equilibrium constant of reaction j. If the thermochemistry of the L basis reactions is known, the thermochemistry of every other reaction can be calculated. Specifically, in a cycle, where a linear combination of reactions leads to a net zero overall expression, the free energy changes of the constituent reactions sum to zero or, equivalently, the product of their equilibrium constants (each raised to its coefficient in the combination) equals one. The requirement of L independent thermochemistry values is satisfied if the thermochemistry (enthalpy and entropy) of each species can be estimated independently. In other words, if (a) the thermochemistry of each species in the network can be estimated (say, using group additivity) and (b) the forward and reverse rate constants are constrained by the equilibrium constant, the kinetic model will be thermodynamically consistent.

Tools such as RMG [48, 67] offer a priori estimation of kinetic parameters, taking into account pressure dependence for gas phase homogeneous chemistries [135]. RING, on the other hand, lets the user specify the kinetics. NETGEN [40] and RMG [67, 48] also employ rate-based construction of reaction networks. This method combines network generation and kinetic modeling: species and the corresponding reactions are added only when their rates exceed a threshold value, and once a new reaction and species are added, the integration of the model is restarted. This method has been demonstrated to significantly reduce the size of the reaction mechanism. RING does not currently have rate-based construction.
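Both consistency conditions reduce to short calculations. The sketch below is illustrative only; the function names are hypothetical, and it assumes ideal-gas behaviour, a 1 atm standard state for K_p, and the sign convention ∆n = (gas-phase product molecules) − (gas-phase reactant molecules). It derives k_r from k_f and the standard reaction free energy, and evaluates equation 4.9 for a three-reaction cycle.

```python
"""Thermodynamically consistent reverse rate constants (a minimal sketch)."""
import math

R = 8.314462618        # J/(mol K)
R_L_ATM = 0.082057366  # L atm/(mol K); R_L_ATM * T / (1 atm) has units of L/mol


def kc_from_kp(kp, delta_n, temperature):
    """Kc in (mol/L)^delta_n from a dimensionless Kp (ideal gas, 1 atm standard state)."""
    return kp * (R_L_ATM * temperature) ** (-delta_n)


def reverse_rate_constant(kf, delta_g0, delta_n, temperature):
    """k_r = k_f / Kc, with Kp = exp(-delta_G0 / RT) and delta_g0 in J/mol."""
    kp = math.exp(-delta_g0 / (R * temperature))
    return kf / kc_from_kp(kp, delta_n, temperature)


def combined_k(basis_k, alphas):
    """Equilibrium constant of R_j = sum_i alpha_i R_i, i.e. prod_i K_i ** alpha_i."""
    return math.prod(k ** a for k, a in zip(basis_k, alphas))


# Cycle A -> B (K1), B -> C (K2), C -> A (K3): the reaction C -> A is
# -(A -> B) - (B -> C), so consistency fixes K3 = K1^-1 * K2^-1.
k1, k2 = 2.0, 5.0
k3 = combined_k([k1, k2], [-1, -1])
print(k1 * k2 * k3)  # 1.0
```

Because every k_r is computed from species-level thermochemistry rather than fitted independently, the product of equilibrium constants around any cycle is one by construction.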
However, we have demonstrated using RING that the size of the reaction networks of complex systems such as propane aromatization on solid Brønsted acid catalysts can be pruned down significantly (by up to two or three orders of magnitude) by (a) imposing constraints in reaction rules based on inputs from computational chemistry studies, and (b) lumping molecules that are not experimentally distinguished [136].

4.5 Discussion

Several salient and distinctive characteristics of RING can be noted. RING has been demonstrated to be versatile: a broad spectrum of chemistries, such as gas phase free radical, liquid phase acid/base catalyzed, and heterogeneous solid acid/base/metal catalyzed chemistries, can be specified and modeled [19]. Several options are provided to users, such as (a) constraints on reaction rules, (b) query features on molecules, reactions, pathways, and mechanisms, (c) choice of lumping strategy, (d) group additivity and correction features for thermochemistry calculations, and (e) kinetic modeling. These options, further, share a common underlying theme: they are all rule-based. For example, the chemistry is specified through reaction rules, pathway queries are written as rules that identify molecules and particular types of pathways, and kinetic parameters are specified as rules involving conditional statements. This rule-based feature enables translating expert knowledge into specific instructions, thereby lending the tool flexibility. The domain-specific language (DSL) used as input to RING provides a very high-level, declarative language for describing chemistries to the system. This allows the inputs to be described as conceived rather than translated into abstractions in general purpose programming languages, and further offers features such as error-checking and optimizations. DSLs, however, also have several general disadvantages [105]. One of the biggest is balancing domain-specific and general purpose language features: offer too few general purpose features and the DSL's applicability is very limited, but offer too many and the DSL turns into just another general purpose language. The language extension model offers a solution to these problems.
Instead of designing the core reaction rule specification language to handle every possible future development, something that would necessitate many general purpose language features, the core of the language can be kept declarative and simple, intended only to describe the chemistry. Adapting the language to serve new purposes and provide new analyses can instead be accomplished through language extension. RING already comes with several extensions (molecule, reaction, pathway, and mechanism queries, and kinetic modeling), and many more can be added in the future. Each of these new features can be developed independently as a separate extension that end-users can compose with the RING compiler automatically using Silver, resulting in a language and compiler tailored to their needs.

CHAPTER 5

Topological network analysis with RING∗

In this Chapter, network generation and subsequent topological analysis of complex reaction networks using RING are discussed. Specifically, we use RING to: (a) analyze network characteristics of a complex reaction network, (b) identify a set of plausible dominant pathways to specific products using computational and experimental data, and (c) hypothesize mechanisms to products in a complex network and propose potential experiments to discriminate between plausible ones. These analyses are presented in the context of two case studies: propane aromatization, and glycerol and acetone conversion on Brønsted acid catalysts.

5.1 Analysis of network characteristics: Propane aromatization

In this section, we show that, using a proposed set of elementary steps and their estimated kinetics relative to one another, RING can be used to: (a) reconstruct the complex network, (b) analyze structural properties of the network such as the frequency of occurrence of each reaction type, (c) generate a smaller network of reactions by encapsulating a class of reactants such that information on chemical transformations is not lost, (d) identify a set of likely dominant pathways to major products, and (e) lump structural isomers that are functionally equivalent so as to reduce the overall size of the network further by several orders of magnitude. We illustrate these features for the system of acid-catalyzed aromatization of propane on ZSM-5 catalysts [17]. Bhan et al. [17] proposed a set of elementary steps for this system and developed a microkinetic model for it. Hsu et al. [44] generated this network from these elementary steps in RDL++, a network generator for constructing reaction networks for catalytic systems. We analyze this system further in terms of the methods described above. The reaction network is first constructed with the proposed elementary steps, following which pathways to benzene are identified and the network is further reduced through lumping.

∗Reported from Rangarajan et al. [136]. Copyright © 2012 Elsevier Inc.

5.1.1 Network construction

The elementary steps of the system are shown in Table 5.1. Several of these (rules 1-11) were considered by Hsu et al. [44] and were also used here. Four more rules (12-15), namely cyclization to form five-membered rings, ring expansion to form six-membered rings, alkylation of aromatics, and dealkylation, have been added on the basis of the elementary steps proposed by Bhan et al. [17]. The reaction rule inputs are given in the supplementary material (S1.1). For simplicity, the surface alkoxides formed upon adsorption of olefins are denoted as carbenium ions, Brønsted acid sites of the zeolite as [H+], and carbonium ion intermediates formed upon alkane activation as C*. These elementary step reaction rules were input into RING with propane as the initial reactant to generate the reaction network. Table 5.2 shows that the total number of reactions generated by RING increases four-fold when the global maximum allowed compound size is increased from five to six carbons. Hsu et al. [44] showed that the computational time and the number of reactions increase exponentially with the size of the largest molecule allowed, and reported an execution time of 48 hours for generating the network with species up to C9, which contained over one million reactions. In this section, we show how RING can be used to reduce the size of the network yet preserve the information on transformations.

Table 5.1: Reaction rules (elementary steps) for propane aromatization on HZSM-5

Reactants: Propane and [H+]

1. Adsorption of an alkane to form a carbonium ion
2. Desorption of a carbonium ion to form an alkane
3. Dehydrogenation of carbonium ions
4. Protolysis of a carbonium ion to yield a paraffin and a carbenium ion
5. Adsorption of olefins to form carbenium ions
6. Desorption of a carbenium ion to form an olefin
7. Beta scission of carbenium ions
8. Oligomerization of a carbenium ion and an olefin
9. Hydride transfer with paraffins and olefins as hydrogen donors
10. Hydride transfer with cyclic species as hydrogen donors
11. Cyclization with internal hydride shifts
12. Cyclization to form 5-carbon rings
13. Ring expansion of 5-carbon rings to 6-carbon rings
14. Alkylation of aromatics
15. Dealkylation of aromatics

Constraints (global): Forbid vinylic carbenium ions, consecutive double bonds, linear trienes, and molecules of size ≥ 10 C atoms

Table 5.2 also shows the distribution of the four most frequently occurring reaction rules in the network with a largest allowed molecule size of 6 carbon atoms. Hydride transfer steps account for a large majority of reactions (about 72%) because almost every surface intermediate (except those not satisfying the constraints of the reaction rule) can accept hydrogen from most paraffins, olefins, and naphthenes. The hydrogen acceptor, irrespective of the identity of the donor, undergoes the same transformation: it accepts a hydride to form a neutral gas phase species. As the size of the largest species increases, the number of hydride transfer steps naturally increases exponentially and accounts for an even larger proportion (≥ 70% for a largest allowable molecule size greater than 6) of all reactions.

Table 5.2: Propane aromatization network size and reaction distribution of the four most common rules computed using RING

| Rule/Network | Number of reactions (species), largest size 6 | Number of reactions (species), largest size 5 |
|---|---|---|
| Complete network | 1791 (199) | 400 (71) |
| Hydride transfer from rings | 870 | 0 |
| Hydride transfer from paraffins and olefins | 429 | 232 |
| Desorption | 103 | 30 |
| beta-Scission/Oligomerization | 57 | 16 |

As we are interested in identifying the transformations occurring in the network, and not in simulating the system using a kinetic model, hydrogen donor information can be encapsulated using a composite atom definition. A composite atom within RING is a user-defined entity used to represent (a) an atom other than C, H, N, O, P, and S, or (b) a group of atoms considered together as a unified atom. In this case, we define a hydrogen transfer agent (HTA) as a composite atom; the hydrogen donor is then HTA bonded to hydrogen, written as [{HTA}H] in modified SMILES. The hydride transfer rule can now be rewritten as [{HTA}H] losing a hydride ion to a carbenium ion to form [{HTA}+] (rule “HydrideTransfer1” in Figure 5.1). [{HTA}H] and [{HTA}+] represent all potential hydrogen donor and hydrogen acceptor species in the entire reaction network. We must therefore write an additional rule that describes how [{HTA}+] can obtain a hydride from all potential hydrogen donors (H attached to a C in a neutral molecule), as shown in rule “HydrideTransfer2” in Figure 5.1. The original network can be restored by finding all pairs of reactions generated by these two rules and adding the two reactions of each pair so that the common species [{HTA}H] and [{HTA}+] cancel out (see Section S2.1 of the supporting material in Rangarajan et al. [136] for more details on the method and syntax). In this sense, the encapsulation is invertible and the information on the actual transformations is not lost.
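The inversion step can be sketched by treating each reaction as a map from species to signed stoichiometric coefficients. This is a minimal illustration, not RING's implementation; the helper names (`ht1`, `ht2`, `restore`) and the species strings are hypothetical. It also makes the size argument concrete: the encapsulated network grows as n₁ + n₂ reactions, whereas the explicit one grows as roughly n₁ × n₂.

```python
"""Invert hydride-transfer encapsulation by pairing the two rules' reactions."""
from collections import Counter
from itertools import product

HTA_H, HTA_PLUS = "[{HTA}H]", "[{HTA}+]"


def ht1(ion, neutral):
    # HydrideTransfer1: carbenium ion + [{HTA}H] -> neutral species + [{HTA}+]
    return {ion: -1, HTA_H: -1, neutral: 1, HTA_PLUS: 1}


def ht2(donor, ion):
    # HydrideTransfer2: [{HTA}+] + hydrogen donor -> [{HTA}H] + carbenium ion
    return {HTA_PLUS: -1, donor: -1, HTA_H: 1, ion: 1}


def add_reactions(r1, r2):
    """Sum two reactions (species -> signed coefficient) and drop zero entries."""
    total = Counter(r1)
    total.update(r2)  # Counter.update adds values, negative ones included
    return {s: c for s, c in total.items() if c != 0}


def restore(ht1_reactions, ht2_reactions):
    """One explicit hydride transfer per (HT1, HT2) pair; HTA species cancel."""
    restored = []
    for r1, r2 in product(ht1_reactions, ht2_reactions):
        net = add_reactions(r1, r2)
        if net:  # skip identity pairs such as C3H7+ + C3H8 -> C3H8 + C3H7+
            restored.append(net)
    return restored


ACCEPTORS = [ht1("C3H7+", "C3H8"), ht1("C6H13+", "C6H14")]
DONORS = [ht2("C3H8", "C3H7+"), ht2("C6H14", "C6H13+"), ht2("C6H12", "C6H11+")]
print(len(ACCEPTORS) + len(DONORS), len(restore(ACCEPTORS, DONORS)))  # 5 4
```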
//rule Hydride transfer from {HTA}H
rule HydrideTransfer1{
    positive reactant r1{
        C+ labeled c1
    }
    neutral reactant r2{
        HTA labeled t1
        H labeled h1 single bond to t1
    }
    modify atomtype (c1,C)
    break bond (t1,h1)
    form bond (h1,c1)
    modify atomtype (t1,HTA+)
}

//rule Hydride transfer to {HTA}+
rule HydrideTransfer2{
    positive reactant r1{
        HTA+ labeled t1
    }
    neutral reactant r2{
        C labeled c1
        H labeled h1 single bond to c1
    }
    break bond (c1,h1)
    form bond (t1, h1)
    modify atomtype (t1, HTA)
    modify atomtype (c1, C+)
}

Figure 5.1: Hydride transfer rules written using the {HTA} composite atom

Figure 5.2 shows that the network size increases exponentially with the largest allowed compound size, while Table 5.3 compares the network size and number of reactions for the cases with and without encapsulation. As expected, two additional species, [{HTA}H] and [{HTA}+], are present in the network generated with encapsulation. A total of 20,000 reactions are generated when species up to C9 are allowed using this method of encapsulation, considerably fewer than the 1 million reactions generated using RDL++, where hydride transfer steps are explicitly enumerated. Encapsulation reduces the proportion of hydride transfer reactions from about 72% to 25% of the network with a maximum allowed compound size of 6; since the other reactions of the system are not affected, this accounts for the smaller number of reactions generated. Further, for this network, there is a savings of about 25% in execution time, which could be much higher for larger networks. Thus, the definition of composite atoms within RING can reduce the network size and speed up network generation significantly, while preserving the information regarding transformations.

Figure 5.2: Reactions and species count for acid-catalyzed propane aromatization with hydride transfer encapsulation as a function of largest allowed species size (number of reactions/species on a logarithmic axis against largest compound size, 5 to 9; series: Species, Reactions)

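Table 5.3: Network size, number of reactions, and execution time of the propane aromatization network with and without encapsulation

| Max compound size | Species (with encapsulation) | Reactions (with encapsulation) | Time (s), with encapsulation | % Hydride transfers, with encapsulation | Time (s), without encapsulation | % Hydride transfers, without encapsulation |
|---|---|---|---|---|---|---|
| 5 | 73 | 221 | 20.6 | 24.0 | 24.9 | 58.0 |
| 6 | 201 | 658 | 140.6 | 25.2 | 190.4 | 72.5 |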

5.1.2 Dominant benzene production pathways

Benzene, the major aromatic product in this system, can be produced in several ways. A query for the shortest (and up to one step longer) distinct pathways to benzene, for instance, identified more than 100 possible pathways. Not all of them are equally likely, so it is desirable to identify the dominant ones. We can use experimental and computational data to provide RING with additional logical constraints that prune the set of pathways it identifies; these constraints determine whether a particular pathway is explored or discarded. Bhan et al. [17] calculated activation energies and pre-exponential factors for elementary steps of the system using experimental data and computations for a simplified model of the system comprising about 300 reactions. The kinetic parameters were obtained by Bhan et al. from: (a) parametric fitting to their own data, (b) computational chemistry calculations in the literature for certain reaction rules such as cyclization [137, 138, 139] and hydride transfer [140, 141], and (c) parametric fitting to experimental data in the literature for rules such as beta-scission/oligomerization [142]. From this information, for any molecule, we can compare the rates of all possible reactions it can undergo, given the rate constants and the concentrations of any co-reactants. We further use this to identify the dominant transformations of each molecule or intermediate and can therefore construct a set of dominant pathways leading to benzene. In this section, parameters as listed in Bhan et al. [17] have been used directly.

Table 5.4: Reaction rate ratios for different species in propane aromatization^a

| Comparison | Species or step | Rate ratio |
|---|---|---|
| (i) Alkanes → alkyl carbenium ion: hydride transfer to form 2° (to form 1°) : dehydrogenation^b | Propane | 6.68 : 1 (0.03 : 1) |
| | Hexane | 12.45 : 1 (0.05 : 1) |
| (ii) Alkyl carbenium chain growth: alkylation with C2H6 : C3H8 : C4H10 : oligomerization with C2H4 : C3H6 : C4H8 | Propyl | 0.015 : 1 : 0.005 : 3.5 : 6.54 (0.33) : 0.53 (0.026)^c |
| (iii) Fate of alkyl carbenium ion: oligomerization (to 1°) : β-scission (to 1°) : desorption : hydride transfer | Hexyl | 1 : 16.32 : 0.17 : 0.22 |
| (iv) Reaction rate ratio of different steps: any → 1° : → 2° : → 3°^d | Hydride transfer | 0.004 : 1 : 12.20 |
| | Oligomerization/β-scission | 1 : 20.00 : 60.59 |

^a All values are calculated on the basis of the kinetics information presented in Bhan et al. [17]. Reference temperature is 803 K; W/F ≈ 2 g-cat h/mol.
^b For dehydrogenation, we assume that the alkane forms a carbonium-ion-like species upon adsorption in a quasi-equilibrated step, with the concentration of acid sites calculated for Si/Al = 16 and 98.5% of sites vacant (as reported in Bhan et al. [17]).
^c Values in parentheses are relative oligomerization rates leading to primary carbenium ions.
^d All ratios are reported on the basis of a secondary carbenium ion as the starting reactant.

Table 5.4 shows the ratios of reaction rates of different reaction steps (at specific conditions) calculated with the kinetics data reported by Bhan et al. [17]. Carbenium ions (or surface alkoxides) can be formed from alkanes either by hydride transfer steps or by alkane adsorption followed by dehydrogenation. Table 5.4(i) reports the ratio of rates of these two steps for propane and hexane. The rate of hydride transfer to form a secondary carbenium ion can be up to an order of magnitude larger than that of dehydrogenation; the relative rate of hydride transfer to form a primary carbenium ion, however, is much smaller. Indeed, hydride transfer to form tertiary and secondary carbenium ions is several orders of magnitude faster than that to form a primary carbenium ion, as seen in Table 5.4(iv). We assume an approximate coverage of 1.5% based on results of Bhan et al.
[17] and use exit concentrations in our calculations; in reality, however, the concentrations of species, including acid site coverage, vary along the bed, and local concentrations are required to calculate relative rates at a specific location in the reactor. As a first approximation, nevertheless, we assume that hydride transfer is the predominant step for the formation of secondary/tertiary carbenium ions, while primary carbenium ions are formed by alkane activation and dehydrogenation. Alkyl carbenium ions can grow in chain size either by alkylation or by oligomerization. Table 5.4(ii) shows that oligomerization with ethylene and propylene (to form secondary carbenium ions) clearly dominates over alkylation. Alkylation with propane is faster than with other paraffins (C2H6:C3H8:C4H10 = 0.015:1:0.005) because of the higher partial pressure of the propane reactant. Therefore, the pathway query restricted alkylation steps to have propane as the co-reactant. An alkyl carbenium ion can: (a) grow in chain size (primarily by oligomerization), (b) crack to form smaller molecules, (c) desorb to form an olefin, or (d) undergo hydrogen transfer. Table 5.4(iii) shows the relative rates of these four reactions for hexyl carbenium ions; beta-scission is the dominant reaction. However, desorption is essential to all pathways, and we therefore do not discard that reaction rule. It can be argued that smaller carbenium ions (such as secondary propyl carbenium ions) cannot undergo beta-scission and will likely oligomerize to form a larger carbenium ion or desorb to form the corresponding olefin. Hence, the rate of scission of hexyl carbenium ions can be compensated by the rate of oligomerization of smaller carbenium ions. Bhan et al. [17] calculate a first-order cyclization rate constant of 9.7 × 10^7 /s, while that of the reference (2° → 1°) beta-scission step is 516.4 /s.
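The comparisons drawn from Table 5.4 amount to simple arithmetic on relative rates. The sketch below is illustrative; the helper names are hypothetical, the hexyl ratios are copied from Table 5.4(iii), and `barrier_rate_ratio` assumes equal pre-exponential factors. Under that assumption, a 30 kJ/mol barrier difference at 803 K corresponds to roughly a 90-fold rate constant ratio, consistent with the figure quoted for toluene dealkylation in the text.

```python
"""Relative-rate arithmetic behind the pathway constraints (a minimal sketch)."""
import math

R = 8.314462618  # gas constant, J/(mol K)

# Relative rates for a hexyl carbenium ion, copied from Table 5.4(iii)
HEXYL_FATE = {
    "oligomerization": 1.0,
    "beta-scission": 16.32,
    "desorption": 0.17,
    "hydride transfer": 0.22,
}


def fate_fractions(relative_rates):
    """Fraction of an intermediate consumed by each competing step."""
    total = sum(relative_rates.values())
    return {step: rate / total for step, rate in relative_rates.items()}


def dominant_step(relative_rates):
    """Competing step with the largest relative rate."""
    return max(relative_rates, key=relative_rates.get)


def barrier_rate_ratio(delta_ea, temperature):
    """k_low / k_high for two steps whose barriers differ by delta_ea (J/mol),
    assuming equal pre-exponential factors."""
    return math.exp(delta_ea / (R * temperature))


print(dominant_step(HEXYL_FATE))                              # beta-scission
print(round(fate_fractions(HEXYL_FATE)["beta-scission"], 2))  # 0.92
print(round(barrier_rate_ratio(30e3, 803)))                   # 89
print(f"{9.7e7 / 516.4:.1e}")  # cyclization vs. beta-scission: 1.9e+05
```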
Given these two rate constants, a molecule that can cyclize (a carbenium ion with a carbon-carbon double bond three or four carbons away) will do so rapidly, and we therefore constrain the pathway detection algorithm to avoid oligomerization or beta-scission of carbenium ions that can cyclize. Oligomerization and beta-scission rates to form secondary and tertiary carbenium ions are larger than those to form primary carbenium ions, as shown in Table 5.4(iv). Therefore, if a molecule can crack or oligomerize to form either a primary or a secondary carbenium ion, it will prefer to form the latter; this constraint is also considered in the pathway analysis. Finally, dealkylation of toluene is reported to have an activation energy 30 kJ/mol higher (leading to a rate constant 90 times lower) than dealkylation of C8 and C9 aromatics [17, 139]. Selectivity to toluene is also experimentally found to be higher than selectivity to xylene [17, 143]. Therefore, pathways to benzene through toluene were avoided.

Three major classes of pathways to benzene exist, depending on the largest compound size in the pathway. Figure 5.3 depicts pathways involving cyclization of C6 species, formed either by alkylation or by oligomerization, that ultimately lead to species 2. Given that the rate of oligomerization is about seven times higher than the rate of alkylation and both routes lead to the same intermediate, species 2 is formed almost exclusively via oligomerization and subsequent hydride transfer. Cyclization, as shown, can occur by forming five- or six-membered rings. Species 1 or 2 (Figure 5.3) can also lead to the formation of C9 species, which ultimately undergo cyclization and aromatization to form ethyl-methyl-benzenes (see Supporting material S2.2 of Rangarajan et al. [136] for details). These C9 aromatics can then undergo dealkylation to form benzene. A third route is through the formation of C8 compounds from butene, which could alternatively also be formed from propene and ethane (Supporting material S2.2 of Rangarajan et al. [136]). Applying kinetic information as constraints identified 10 dominant pathways from an initial set of over 100, an order of magnitude reduction. Longer pathways could also be significant, and the exact contribution of each pathway can be found if the reaction fluxes are available from kinetic modeling.
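The kinetics-derived constraints above act as predicates that discard candidate pathways. The sketch below shows that filtering pattern on toy data. It is not RING's query syntax: the `Step` fields, rule names, and candidate pathways are hypothetical, and the primary-ion flag is assumed to be set only when a secondary alternative exists.

```python
"""Prune candidate pathways with kinetics-derived rules (a minimal sketch)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    rule: str                            # name of the reaction rule applied
    co_reactant: str = ""                # second reactant, if any
    reactant_can_cyclize: bool = False   # C=C three or four carbons from C+
    primary_ion_avoidable: bool = False  # forms a 1-deg ion when a 2-deg option exists
    species: tuple = ()                  # species formed or consumed in the step


CRACK_OR_GROW = {"Oligomerization", "BetaScission"}


def violates(step):
    """True if a single step breaks one of the kinetic constraints."""
    if step.rule == "Alkylation" and step.co_reactant != "propane":
        return True
    if step.rule in CRACK_OR_GROW and (step.reactant_can_cyclize or step.primary_ion_avoidable):
        return True
    return False


def prune(pathways, forbidden_species=("toluene",)):
    """Keep pathways with no violating step that avoid forbidden intermediates."""
    kept = []
    for path in pathways:
        if any(violates(s) for s in path):
            continue
        if any(sp in s.species for s in path for sp in forbidden_species):
            continue
        kept.append(path)
    return kept


candidates = [
    [Step("Oligomerization", co_reactant="propene"), Step("Cyclization")],
    [Step("Alkylation", co_reactant="ethane")],
    [Step("Dealkylation", species=("toluene", "benzene"))],
]
print(len(prune(candidates)))  # 1
```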

5.1.3 Network lumping

The propane aromatization network, even upon encapsulation, contains about 20,000 reactions and close to 6,000 species (Figure 5.2). However, many species, both stable and ionic, are functionally equivalent to each other: they have the same molecular formula and are composed of the same number and type of functional groups. These molecules have similar chemical properties owing to their chemical similarity and can therefore be grouped together into “lumps” of molecules. Lumps include both stable species and ionic intermediates.

Figure 5.3: Acid-catalyzed propane aromatization to form benzene involving cyclization of a C6 species

The number of lumps is smaller than the total species count, which enables a compact representation of the network because reactions can also be lumped by treating them as lump(s) reacting to form a set of product lumps. The number of lumped reactions can be orders of magnitude lower than the original network size. In general, lumping requires taking functional group information into consideration and becomes more involved when several types of functional groups exist, as in oxygenates. We have developed a new lumping technique that identifies such functional equivalence (Chapter 4). We further lump paraffins, olefins, naphthenes, and aromatics on the basis of molecular formula using an additional lumping instruction (Chapter 4), as this is a common classification for hydrocarbons. In Rangarajan et al. [20] (and Chapter 3), we demonstrated how lumping reduced the network size by a factor of 2 for the fructose-to-HMF system, which contains a significant amount of oxygenates. For hydrocarbons, this reduction is even higher, as discussed below. Figure 5.4 shows the size of the lumped network as a function of the largest allowed compound size. The number of reactions is of the order of a few hundred, which is two orders of magnitude lower than the original network with encapsulation and four orders of magnitude lower than the network without encapsulation, based on the numbers reported by Hsu et al. [44]. This 100-fold reduction in network size leads to a lumped network that is more amenable to kinetic modeling. A lumped network could also potentially be a starting point for a corresponding pathway analysis with lumped reactions.
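Lumping reactions follows directly once each species is mapped to a lump key. The sketch below is a simplified stand-in, not RING's functional-equivalence algorithm: it assumes the lump key is simply (molecular formula, hydrocarbon class), and the species names and reactions are hypothetical. Reactions that differ only in which member of a lump takes part collapse into one lumped reaction.

```python
"""Lump species by (formula, class) and collapse reactions onto lumps (sketch)."""


def lump_reactions(reactions, lump_of):
    """Map each (reactants, products) reaction onto lump keys and deduplicate."""
    lumped = set()
    for reactants, products in reactions:
        key = (tuple(sorted(lump_of[s] for s in reactants)),
               tuple(sorted(lump_of[s] for s in products)))
        lumped.add(key)
    return lumped


# Hypothetical species; lump key = (molecular formula, hydrocarbon class).
LUMP_OF = {
    "1-hexene": ("C6H12", "olefin"),
    "2-hexene": ("C6H12", "olefin"),
    "cyclohexane": ("C6H12", "naphthene"),  # same formula, different class
    "hex-2-yl+": ("C6H13+", "carbenium"),
    "hex-3-yl+": ("C6H13+", "carbenium"),
    "[H+]": ("[H+]", "site"),
}

REACTIONS = [  # olefin adsorption (rule 5), written out per isomer
    (["1-hexene", "[H+]"], ["hex-2-yl+"]),
    (["2-hexene", "[H+]"], ["hex-2-yl+"]),
    (["2-hexene", "[H+]"], ["hex-3-yl+"]),
]

print(len(lump_reactions(REACTIONS, LUMP_OF)))  # 1 lumped reaction
print(len(set(LUMP_OF.values())))               # 4 lumps for 6 species
```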

Figure 5.4: A count of species lumps and reaction lumps with hydride-transfer encapsulation as a function of largest allowed species size for acid-catalyzed propane aromatization (number of reactions/species, 0 to 500, against largest compound size, 5 to 9; series: Lumped reactions, Species lumps)

This example shows how, using RING, (a) a complex network can be constructed from elementary steps, (b) the network can be collapsed into one of a smaller size using encapsulation and inverted back to the original network, (c) the network can be grouped to obtain a lumped network of size that is two or more orders of magnitude smaller, and (d) likely shortest dominant pathways can be obtained from relative ratios of estimated kinetic parameters.

5.2 Mechanism hypothesis based on experimental data: Glycerol dehydration

In this section, we show that RING can be used to test postulated elementary steps and to propose new mechanisms for complex reaction systems. We use the system of acid-catalyzed glycerol dehydration for illustration. Corma et al. [6] studied the conversion of glycerol-water mixtures to acrolein on ZSM-5 based zeolite catalysts. A complex spectrum of products, including acrolein, acetaldehyde, and a variety of acids, was experimentally observed. On the basis of reported mechanisms for oxygenates on solid acid catalysts, the authors suggested plausible pathways to certain products. Using RING, we identify pathways to acrolein and acetone exhaustively on the basis of a set of postulated elementary steps, and further test their plausibility by noting whether a predicted unique intermediate molecule is actually observed in experiments. Further, we hypothesize a mechanism for acetone conversion under the same reaction conditions using the same set of elementary steps. In heterogeneous Brønsted-acid-catalyzed conversion of oxygenates, the following elementary steps are commonly observed: alcohol adsorption-desorption, dehydration, alkene adsorption-desorption, hydride transfer, β-scission, oligomerization, 1,2-hydride shift, cyclization, etherification, and carbonyl adsorption-desorption. None of the experiments discussed by Corma et al. [6] report the production of ethers, perhaps because ethers can decompose at the temperature of operation (300-500 °C) by the microscopic reverse of the etherification step. We therefore input the aforementioned steps into RING, with the exception of the etherification step. Carbon monoxide is observed in these experiments, indicating decarbonylation steps at these high temperatures; the microscopic reverse of the carbonylation step [144] was therefore included. The water concentration used in the experiments by Corma et al.
[6] was considerably high (20-85 wt% glycerol in the feed); at these conditions, hydration is likely and is considered as a possible elementary step. We model this system such that surface species are represented as carbenium ions and acid sites are represented as [H+]. The hydride transfer steps are modeled by encapsulation, as discussed in Section 5.1.1. The adsorption of carbonyl compounds is known to lead to acylium-like intermediates [145]; therefore, intermediates of carbonyl compounds are modeled as carbenium ions. The inputs into RING are given in the supplementary material (S1.2 of Rangarajan et al. [136]). The global constraints for this system are: (a) the largest allowed molecule has 8 atoms (counting carbon and oxygen atoms), (b) consecutive double bonds are prohibited, (c) C=C+ is prohibited, and (d) gem-diols cannot be formed because they are highly reactive and form the respective ketone or aldehyde (Gomez-Bombarelli et al. [146] report that log Keq of hydration of C2-C4 aldehydes and ketones varies from 0 to -2.7).

Table 5.5: Pathway statistics for acid-catalyzed acrolein production from glycerol

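| Length | All pathways (no constraints) | Distinct pathways (no constraints) | All pathways (with constraints) | Distinct pathways (with constraints) |
|---|---|---|---|---|
| 12 | 37203 | 1385 | 156 | 16 |
| 10 | 909 | 161 | 45 | 8 |
| 8 | 22 | 12 | 6 | 3 |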

5.2.1 Acrolein production pathways

Acrolein is the major product of glycerol conversion. Pathways to acrolein from glycerol were queried in RING, and the results are presented in Table 5.5. Initially, the query sought pathways of up to twelve steps with no further constraints, and over 37,000 pathways were identified. The number of pathways was reduced to 156 upon: (a) allowing reversible rearrangements (e.g., hydride shifts) fewer than two times in a pathway, so as to prevent repeated rearrangements, and (b) preventing the cleavage of C-C bonds so that the carbon backbone of glycerol is preserved. Table 5.5 also lists the distinct pathways identified by RING with and without these additional constraints. By imposing more stringent constraints, we progressively identify fewer pathways and obtain three pathways that can be analyzed manually. Figure 5.5 shows the distinct pathways to acrolein. All pathways involve 3-hydroxypropionaldehyde (O=CCCO), which suggests that it is a necessary intermediate and a primary product of glycerol dehydration. Corma et al. [6] report that acetol and 3-hydroxypropionaldehyde are observed as products at low conversions. They also infer that 3-hydroxypropionaldehyde is an intermediate in the formation of acrolein, since a separate experiment with acetol as feed resulted in negligible acrolein production. Thus, our inference of a unique and necessary intermediate in acrolein synthesis pathways starting from glycerol, made using the exhaustive set of pathways obtained through RING, corroborates the experimental observation of that species.
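The "necessary intermediate" inference is a set intersection over the species visited by every pathway. The sketch below assumes each pathway is a list of SMILES strings from feed to product; the two toy pathways are hypothetical and simplified, not RING output. In practice the result can then be intersected with the set of experimentally observed species.

```python
"""Species shared by every pathway in a pathway set (a minimal sketch)."""


def necessary_intermediates(pathways, feed, product):
    """Return the species present in every pathway, excluding feed and product.

    Each pathway is assumed to be a list of species identifiers (here SMILES
    strings) visited from the feed to the product.
    """
    if not pathways:
        return set()
    common = set.intersection(*(set(p) for p in pathways))
    return common - {feed, product}


GLYCEROL, ACROLEIN = "OCC(O)CO", "C=CC=O"

# Hypothetical, simplified pathways for illustration only (not the RING output):
TOY_PATHWAYS = [
    [GLYCEROL, "OCC=CO", "O=CCCO", ACROLEIN],     # via an enol
    [GLYCEROL, "OC[CH+]CO", "O=CCCO", ACROLEIN],  # via a carbenium ion
]

print(necessary_intermediates(TOY_PATHWAYS, GLYCEROL, ACROLEIN))  # {'O=CCCO'}
```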

5.2.2 Pathways to acetone

Corma et al. [6] report significant acetone production when acetol was fed to the reactor. They also observed that temperature and conversion did not affect the selectivity to acrolein but did affect that to all other products, such as acetone, acetaldehyde, and higher oxygenates. When 1,2-propanediol was used as feed, it resulted in 60% carbon selectivity to acetone. On the basis of these observations, Corma et al. [6] conjectured that acrolein and acetone are formed in parallel pathways.

Figure 5.5: Acrolein production pathways over solid acid catalysts. The 3-hydroxypropionaldehyde precursor to acrolein is highlighted by a dashed oval.

A pathway query to acetone was input into RING with the constraint that the carbon backbone of glycerol is preserved (by ensuring no C-C scission). This resulted in fewer than 10 distinct pathways, some of which are shown in Figure 5.6. Each of these pathways involves either acetol (hydroxyacetone), 1,2-propanediol, or their corresponding adsorbed surface intermediates. Thus, analysis through RING also shows that acetol or 1,2-propanediol are primary products obtained upon dehydration of the terminal hydroxyl group, while dehydration of the secondary hydroxyl group results in the formation of 3-hydroxypropionaldehyde and subsequently acrolein (Figure 5.5). The above analyses for acrolein and acetone formation (and that in Section S3.1 of the supporting material of Rangarajan et al. [136] for pathways to acetaldehyde) show that the pathways identified from the postulated elementary steps involve experimentally observed intermediates. This suggests that the elementary steps considered in this study are adequate to describe the chemistry of the system. We can now postulate mechanisms to observed products for other oxygenate conversion systems operating under the same conditions as glycerol dehydration.
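The statement that every acetone pathway passes through acetol, 1,2-propanediol, or their surface intermediates is a covering (hitting-set) property of the pathway list. The brute-force sketch below uses hypothetical pathway lists (the C3_cation labels are placeholders, not RING output) and finds the smallest sets of intermediates that intercept every pathway when no single species is common to all of them:

```python
from itertools import combinations

# Hypothetical pathway lists (species in order); illustrative, not RING output.
ACETONE_PATHWAYS = [
    ["glycerol", "C3_cation_a", "acetol", "acetone"],
    ["glycerol", "C3_cation_a", "acetol", "C3_cation_b", "acetone"],
    ["glycerol", "C3_cation_c", "1,2-propanediol", "acetone"],
]


def smallest_intercepting_sets(pathways, max_size=3):
    """Smallest sets of intermediates such that every pathway contains one of them."""
    intermediates = sorted({s for p in pathways for s in p[1:-1]})
    for size in range(1, max_size + 1):
        hits = [
            frozenset(combo)
            for combo in combinations(intermediates, size)
            if all(any(s in p for s in combo) for p in pathways)
        ]
        if hits:  # stop at the first (smallest) size that works
            return hits
    return []


for s in smallest_intercepting_sets(ACETONE_PATHWAYS):
    print(sorted(s))
```

No single species intercepts all three toy pathways, but pairs such as {acetol, 1,2-propanediol} do, which is the form of the conclusion drawn for Figure 5.6.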

Figure 5.6: Acetone production pathways from glycerol over solid acid catalysts (note that all pathways require either hydroxyacetone or 1,2-propanediol).

5.2.3 Potential mechanisms that result in acetic acid production from acetone

In a separate experiment, Corma et al. [6] used an aqueous solution of acetone as feed at temperatures and WHSV similar to those of the glycerol conversion experiments. They reported butenes and acetic acid as the major products, which led them to postulate a stoichiometric reaction to account for these two products:

2 Acetone → Butene + Acetic acid
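Written with molecular formulas, and taking the butene to be isobutene (C4H8) and the acid to be acetic acid, as in the mechanism discussed below, the postulated reaction is atom-balanced:

```latex
\[
2\,\mathrm{C_3H_6O} \;\longrightarrow\; \mathrm{C_4H_8} + \mathrm{C_2H_4O_2}
\]
\[
\mathrm{C}:\ 2 \times 3 = 4 + 2, \qquad
\mathrm{H}:\ 2 \times 6 = 8 + 4, \qquad
\mathrm{O}:\ 2 \times 1 = 0 + 2
\]
```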

As the experimental conditions were similar, we generated a network for acetone conversion using RING with the same elementary steps as those used for glycerol dehydration, and identified pathways from acetone to acetic acid (Figure 5.7). Several routes exist for acetic acid production, all of which require the hydration of a surface carbonyl species and a larger C6 molecule formed by the addition of acetone to the enol tautomer of another acetone molecule. Only one pathway (involving addition product 1) results in butene (2-methylpropene) as a by-product. This pathway involves a dehydration and a hydration step, with no net addition. The catalytic cycles and overall reaction, shown in Figure 5.8, represent the transformation of acetone to acetic acid and isobutene. We note that the overall stoichiometric reaction based on our postulated sequence of elementary steps formulated within RING matches that proposed by Corma et al. [6].

Figure 5.7: Pathways identified by RING from acetone to acetic acid over solid acid catalysts. The proposed mechanism is enclosed within the dashed curve.

Figure 5.8: Individual reaction cycles for the conversion of acetone to acetic acid and butene.

The mechanisms involving hydrogen transfer involve the formation of but-3-en-2-ol, which could further dehydrate to form butadiene. However, no alcohols or dienes are reported, suggesting that the mechanism without hydrogen transfer is the most likely one. Several experiments can be devised to test the hypothesis that the proposed mechanism is indeed the actual route to butene. It is evident from the pathway shown in Figure 5.7 that at low conversions the C6 addition product should be observed and the rate of acetone conversion should be second order with respect to acetone. In addition, as we show below, RING can simulate isotope tracking to predict possible observations if isotope labeling experiments are conducted. The acetone system was generated in RING with: (a) unlabeled water, (b) unlabeled acetone, (c) completely 13C- and 18O-labeled acetone, and (d) 18O-labeled water; Figure 5.9 shows the results of this simulation. The pathway marked “Pathway 1” shows the condensation of two unlabeled acetone molecules, followed by hydration with 18O-labeled water, leading to an 18O-labeled acid. This shows that an experiment with unlabeled acetone in 18O-water solution should lead to 18O incorporation into the acid product. Pathways marked “2” and “3”, on the other hand, form a completely labeled acetic acid molecule with one or all carbon atoms of butene labeled. “Pathway 4” forms butene with three labeled carbons. Therefore, if the mechanism proceeds by condensation, dehydration, scission, and hydration, we should expect that, upon co-feeding equimolar concentrations of completely labeled and unlabeled acetone, the butenes formed will be equally distributed among zero, one, three, and four 13C-labeled carbons. Further, Figure 5.9 shows that acetic acid is either completely labeled or unlabeled with respect to carbon, and will incorporate 18O when 18O-labeled water is used.

Figure 5.9: 13C- and 18O-labeling simulations using RING for the conversion of aqueous acetone solution on zeolites.

Thus, RING has been used to: (a) propose and test a set of elementary steps to identify plausible pathways to observed products, by comparing the intermediates predicted in these pathways with those observed experimentally, and (b) hypothesize mechanisms for product formation in related systems, as well as suggest suitable experiments, especially isotope labeling studies, to validate them. In general, manual enumeration of possible routes to observed products is tedious for systems having a complex reaction network. RING, in such cases, can be used to identify these routes exhaustively, and can be a starting point for proposing and testing mechanisms.
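The expected 13C distribution for an equimolar co-feed can be reproduced by simple bookkeeping. The sketch below assumes the condensation, dehydration, hydration, and scission route (the electrophilic acetone supplies three carbons to isobutene; the enol partner supplies its CH2 carbon to isobutene and its other two carbons to acetic acid), random pairing of labeled and unlabeled molecules, no kinetic isotope effects, and no label scrambling through reversible steps. These are assumptions of the sketch, not outputs of the RING simulation:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def label_distribution(frac_labeled=Fraction(1, 2)):
    """13C-count distributions of isobutene and acetic acid from a labeled/unlabeled acetone mix.

    Each acetone is assumed to be either fully 13C-labeled or unlabeled.
    """
    butene, acid = Counter(), Counter()
    for electrophile_labeled, enol_labeled in product((0, 1), repeat=2):
        p = ((frac_labeled if electrophile_labeled else 1 - frac_labeled)
             * (frac_labeled if enol_labeled else 1 - frac_labeled))
        # 3 carbons from the electrophile + 1 (CH2) from the enol go to isobutene
        butene[3 * electrophile_labeled + enol_labeled] += p
        # the remaining 2 carbons of the enol partner end up in acetic acid
        acid[2 * enol_labeled] += p
    return dict(butene), dict(acid)


butene, acid = label_distribution()
print(sorted(butene.items()))
print(sorted(acid.items()))
```

With an equimolar feed the sketch gives a probability of 1/4 each for isobutene carrying 0, 1, 3, or 4 13C atoms, and a 1:1 split between unlabeled and fully labeled acetic acid, consistent with the expectation stated above.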

5.3 Discussion

The two case studies illustrate the different classes of network analysis problems that can be addressed using RING. In the first example, propane aromatization, the elementary steps and kinetics are already established, and we infer characteristics of the network such as the reaction distribution, combinatorial complexity, most abundant lumps, and likely dominant pathways to benzene. In the second example, we hypothesize a set of elementary steps for glycerol dehydration and compare the pathways enumerated by RING with experimental data to validate these elementary steps. Further, we use the elementary steps to propose a mechanism for acetone conversion. The two cases are thus contrasting examples; nevertheless, both illustrate that RING can be used in conjunction with additional data to elucidate the transformations to desired products occurring in complex reaction networks. We therefore propose that RING be used alongside experimental and computational studies, with their data input directly as constraints in network generation or analysis.

Four characteristics of RING enable it to be applied in network analysis: (a) flexibility and versatility: a broad spectrum of chemistries can be handled, as discussed previously [19], several types of network analysis features (pathways, lumping, and mechanisms) are available, and a variety of options are available in the form of constraints on rules and post-processing steps that allow expert knowledge to be included; (b) rule-based operation: the entire process of network generation and analysis is governed by rules, viz., reaction rules and post-processing instructions written in an English-language interface [20]; (c) exhaustiveness: the generated network is complete and correct for the given inputs, and the pathway and mechanism algorithms traverse the network completely; and (d) speed: generation of a network takes seconds to hours depending on the nature of the reaction system, and system-specific modifications, such as the encapsulation discussed earlier in the Chapter, allow the execution time to be reduced further. RING, as shown by the case studies in this chapter, can therefore be applied to several types of complex network analysis problems.

CHAPTER 6

Identification and analysis of synthesis routes in complex catalytic reaction networks for biomass upgrading∗

In the previous chapter, examples of topological analysis of complex reaction networks were discussed. In this chapter, a case study is presented to demonstrate network querying and analysis for identifying desired molecules and feasible synthesis routes when additional molecular properties can be calculated.

6.1 Introduction

Biomass conversion to fuels and chemicals is a prospective green alternative to petroleum processing. Several catalytic routes have been proposed for converting biomass into valuable products [102, 101]. These include individual reaction systems, such as acid-catalyzed fructose dehydration to 5-hydroxymethylfurfural (HMF) [148], and multi-step catalytic processes that convert biomass into fuels such as diesel [149, 150]. Given the plethora of options for chemical transformations, a wide variety of chemicals can be synthesized from biomass [151, 152, 153, 154, 155]. Consequently, the network of all potential reactions to upgrade biomass, the synthesis network, is complex. The analysis of these complex networks, from identifying synthesis routes to calculating their energetics, requires a computational method with two features. First, the reaction network should be assembled in an automated manner. To this end, rule-based network generators that construct the reaction network of a system exhaustively from fundamental reaction rules can be used. Second, the energetics of each reaction step in the network should be estimated rapidly: while ab initio computations have been used to calculate the thermochemistry of synthesis routes [156], applying such methods to an entire complex network is computationally intractable.

In this chapter, we explore the synthesis network of biomass conversion to fatty alcohols for the production of nonionic surfactants in terms of: (a) determining the spectrum of products that can be synthesized from biomass-derived precursors using a defined set of rules for acid, base, and metal catalysis, (b) selecting synthetically feasible compounds that have user-desired properties, (c) identifying the different routes to synthesize these desired products from biomass, and (d) comparing these routes in terms of parameters such as atom efficiency, selectivity, energetics, and two-phase separability. To address the computational challenge associated with the size of this problem, we employ: (a) RING to construct the network and identify synthesis routes, and (b) on-the-fly semi-empirical property prediction using group additivity for thermochemistry and quantitative structure-property relationships (QSPRs) for physical properties. Through this case study, we present a scalable technique for rapidly screening the large spectrum of biomass-derived compounds and for identifying and evaluating synthesis routes to valuable ones.

∗Reported from Rangarajan et al. [147] Copyright © 2013 Elsevier Inc.

6.2 Computational methods

The reaction network was generated using RING. Inputs into RING include the set of initial reactants, the reaction rules describing the chemistries, and a set of post-processing network analysis features such as queries on molecules, reactions, pathways, and mechanisms in the network. A mechanism, listing the steps that need to be taken (in a specific order) from the initial reactants to the final products, is in effect a synthesis route. The gas-phase thermochemistry (enthalpy and entropy of formation, and heat capacity) of molecules at any specified temperature is calculated using group additivity. To this end, an additivity scheme has been implemented within RING that takes user-specified groups along with other inputs to calculate thermochemistry on-the-fly. For this work, groups covering hydrocarbons and oxygenates have been used [92, 157, 158, 159]. Octanol-water partition coefficients (LogP) are calculated for potential biphasic reactions, also using group additivity [160]. Quantitative structure-property relationships for nonionic surfactants (see next Section) are also implemented in RING for on-the-fly physical property prediction. One of the properties, surface tension (see next Section), requires the calculation of the enthalpy of formation at the PM3 [161] level of theory. To accommodate this calculation, we have integrated RING with the open-source tool Openbabel [162] and with MOPAC [59]. This allows the enthalpy of formation of a generated molecule to be calculated in a three-step procedure. First, when RING generates a new species during network generation, it supplies Openbabel with the SMILES [96] string of the species. Openbabel translates this SMILES string into a MOPAC-compatible input file with preliminary estimates of the 3D structure [162]. Second, MOPAC performs the geometry optimization and energy calculations and returns an output file with the results.
Third, RING parses the file to obtain the relevant data. This three-step procedure, albeit automated, was found to be the most time-consuming step in property estimation. For the generated network, an analysis with a randomly selected set of 15 fatty alcohols showed that the mean difference between the thermochemistry calculated using group additivity and that calculated at the PM3 level of theory was less than 2% (see Section S1 of the supporting information of Rangarajan et al. [147] for a detailed tabulation). The QSPR for surface tension was found to be insensitive to this level of error. Therefore, for the initial querying analysis, we calculate the properties using group additivity. Subsequently, for the selected molecules, we refine the surface tension values using the enthalpy of formation calculated at the PM3 level of theory (using MOPAC). Figure 6.1 provides a schematic of the workflow and information flow in this study.
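The two estimation routes described above, group additivity for screening and a parsed PM3 heat of formation for refinement, can be sketched as follows. The group values are approximate Benson-type values for alkanes chosen for illustration (they are not the group set used in RING), symmetry and other corrections are ignored, and the regular expression assumes that MOPAC reports a line of the form "FINAL HEAT OF FORMATION = ... KCAL/MOL"; this should be checked against the output of the MOPAC version actually used.

```python
import re

# Approximate Benson-type group values for gas-phase enthalpy of formation at
# 298 K (kJ/mol). Illustrative only; RING uses its own user-specified groups.
GROUP_DHF_298 = {
    "C-(C)(H)3": -42.2,   # terminal CH3
    "C-(C)2(H)2": -20.7,  # interior CH2
}

KCAL_TO_KJ = 4.184

# Assumed MOPAC output line format: "FINAL HEAT OF FORMATION = <value> KCAL/MOL ..."
HEAT_OF_FORMATION = re.compile(
    r"FINAL HEAT OF FORMATION\s*=\s*(-?\d+(?:\.\d+)?)\s*KCAL", re.IGNORECASE)


def group_additivity_dhf(group_counts):
    """Sum of group contributions (no symmetry or ring corrections in this sketch)."""
    return sum(GROUP_DHF_298[group] * n for group, n in group_counts.items())


def parse_mopac_dhf_kj(output_text):
    """Extract the final heat of formation from MOPAC output text, in kJ/mol."""
    match = HEAT_OF_FORMATION.search(output_text)
    if match is None:
        raise ValueError("no FINAL HEAT OF FORMATION line found")
    return float(match.group(1)) * KCAL_TO_KJ


def percent_difference(estimate, reference):
    """Relative deviation of an estimate from a reference value, in percent."""
    return 100.0 * abs(estimate - reference) / abs(reference)


butane = {"C-(C)(H)3": 2, "C-(C)2(H)2": 2}
print(round(group_additivity_dhf(butane), 1))
```

For n-butane (two CH3 and two CH2 groups) the sum is about -125.8 kJ/mol; percent_difference can then be applied to paired group additivity and PM3 values in the same way as the 2% comparison reported above.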

Figure 6.1: A workflow of network generation, analysis, and semi-empirical molecular property relationship estimation with RING.

6.3 Fatty alcohols from biomass

6.3.1 Fatty alcohols

Fatty alcohols are alcohols with a large alkyl (C8+) group. One of the major applications of fatty alcohols is in the synthesis of nonionic surfactants known as alcohol ethoxylates (AEs). These are compounds composed of one unit of a fatty alcohol and several units (typically up to 12) of ethylene oxide (EO) [163] (Figure 6.2). The alkyl group of the fatty alcohol constitutes the hydrophobic (or lipophilic) part, while the hydrophilicity is due to the EO units. Such surfactants find applications as constituents of detergents, cleaners, emulsifiers, etc., and are sold commercially [164]. The design of AEs involves tuning the size and shape of the hydrocarbon chain (to adjust the hydrophobic properties) and increasing or decreasing the number of EO units. The desired physical properties that are typically targeted [165] include: (a) a low critical micelle concentration (CMC), the minimum concentration at which surfactant molecules aggregate to form micelles; (b) a high cloud point, the temperature above which the surfactant action ceases; (c) an application-specific hydrophilic-lipophilic balance (HLB), the balance between the hydrophilicity and lipophilicity of the molecule that determines whether the surfactant is water- or oil-soluble; (d) a low interfacial surface tension (γ) at a specified temperature and composition; (e) high biodegradability by microorganisms, measured by the time taken to break the molecule down completely; and (f) aquatic toxicity, the concentration of surfactant in water above which it is lethal to aquatic life. Quantitative structure-property relationships have been developed to relate most of these properties to the structure of the surfactant [166, 167, 168, 169, 170]. Typically, these relationships involve calculating topological indices of part of or the whole molecule, or energetic and thermochemical properties using semi-empirical Hartree-Fock methods such as PM3 and AM1, and then relating the desired physical properties to these parameters [165].
Being semi-empirical, these methods provide a fast and reasonably accurate [165] prediction of such properties.

Figure 6.2: A schematic of a C13 fatty alcohol and the corresponding alcohol ethoxylate with n ethylene oxide units, (OC2H4)nOH, showing the hydrophobic (alkyl chain) and hydrophilic (ethylene oxide oligomeric chain) components.

6.3.2 Synthesis of desired fatty alcohols and alcohol ethoxylates

Several techniques have conventionally been used to synthesize fatty alcohols from vegetable oils [171], hydrocarbons [172], and small alcohols [173, 174]. More recently, biochemical processes have been developed, by engineering microorganisms, to convert sugars into fatty alcohols [175, 176]. Here, we explore heterogeneous catalytic routes to synthesize fatty alcohols from biomass-derived platform oxygenates.

Figure 6.3: Initial reactants input into RING. HMF stands for 5-hydroxymethylfurfural.

The initial reactants are shown in Figure 6.3, while the reaction rules used for generating the reaction network are listed in Table 6.1. The reactants are taken from the US Department of Energy list of the top 12 biomass-derived platform chemicals [151] and a subsequent study that revisited the topic [152]. The chemistries involve overall reaction rules comprising acid, base, and metal catalysis. Further, these rules comprise C-C bond formation (aldol condensation, ketonization, Michael addition, alkylation), deoxygenation (dehydration, hydrogenolysis), and saturation (hydrogenation), the three steps that lead to longer carbon chains and to lower oxygen and higher hydrogen contents in a molecule than in the biomass-derived initial reactants. These reaction rules were chosen based on their application in fine chemicals synthesis and biomass conversion [177, 178, 179, 180, 181, 182]. The C-C bond formation steps were restricted to molecules that have been generated in four or fewer steps from the initial reactants. Further, a global constraint restricts the size of any generated molecule to fewer than 15 atoms (excluding hydrogen). The network generated by RING has over 180,000 reactions and 50,000 species. To probe this large network, we now address the four questions raised earlier in the context of the product and chemistry selection problem for the synthesis of fatty alcohols applicable as surfactants. Specifically, we present and discuss: (a) the spectrum of fatty monoalcohols formed in the network, (b) the subset of these alcohols that can lead to alcohol surfactants with properties close to those of commercially available ones, (c) the different atom-efficient routes to synthesize these alcohols, and (d) the thermochemistry landscape and the biphasic separability aspects of these routes.

Table 6.1: Reaction chemistries input into RING

Chemistry | Example
Aldol condensation, with and without oxygen retention, and reversal | Acetone to methyl isobutyl ketone
Ketonization of acids and esters | Propanoic acid to diethyl ketone
Hydrogenation of C=C and carbonyl groups, and aromatics | Butene, acetone, and benzene hydrogenation
Dehydrogenation of alcohols | Ethanol dehydrogenation to acetaldehyde
Hydrogenolysis of acids, esters, cyclic ethers, and alcohols | Acetic acid or ethyl acetate to ethanol; 1,3-propanediol to propanol
Dehydration and decomposition of hydroxy acids | Dehydration of lactic acid/3-hydroxypropionic acid
Dehydration of diacids | Succinic acid to succinic anhydride
Michael addition | Ethyl acrylate and acetone to form 5-oxohexanoic acid ethyl ester
Tishchenko reaction | Dimerization of benzaldehyde/trimethylacetaldehyde to benzyl benzoate/2,2-dimethylpropyl 2,2-dimethylpropanoate
Alkylation and hydroxyalkylation of aromatics | Alkylation of 2-methylfuran by acetaldehyde
Meerwein-Ponndorf-Verley (MPV) reduction by isopropanol | Reduction of cyclohexanone with isopropanol
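The rank constraint on C-C bond formation and the global size limit can be illustrated with a deliberately abstract generator. In this hypothetical sketch, species are reduced to heavy-atom counts and the only rule is a generic coupling A + B -> AB; it shows how species ranks and the two constraints interact, not how RING represents molecules or reaction rules.

```python
def generate_network(initial, max_size, max_cc_rank):
    """Iteratively apply a generic C-C coupling rule A + B -> AB.

    Species are abstracted to heavy-atom counts (ints). A species' rank is the
    generation in which it first appears (initial reactants have rank 0).
    Coupling is applied only to species with rank <= max_cc_rank, and products
    must have fewer than max_size heavy atoms.
    """
    rank = {s: 0 for s in initial}
    reactions = set()
    generation = 0
    while True:
        generation += 1
        reactive = sorted(s for s, r in rank.items() if r <= max_cc_rank)
        new_species = {}
        for i, a in enumerate(reactive):
            for b in reactive[i:]:               # unordered pairs, including A + A
                product_size = a + b
                if product_size >= max_size:     # global size constraint
                    continue
                reactions.add((a, b, product_size))
                if product_size not in rank:
                    new_species[product_size] = generation
        if not new_species:                      # nothing new: network is closed
            return rank, reactions
        rank.update(new_species)


rank, reactions = generate_network({2, 3}, max_size=15, max_cc_rank=4)
print(len(rank), len(reactions), max(rank.values()))
```

Starting from species of size 2 and 3 with max_size=15, the generator closes after producing sizes 2-14 (13 species, 36 couplings); lowering max_cc_rank to 1 stops growth at size 12 (11 species, 15 couplings), showing how a rank limit prunes the network.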

6.4 Results and Discussion

6.4.1 Fatty alcohols product spectrum

Among the large set of species in the generated network, we specifically identify saturated monoalcohols. Figure 6.4 shows the spectrum of such alcohols. Clearly, a large number (16,000+ in total) and a wide variety (primary, secondary, linear, and branched) of monoalcohols exist in the network. This rich network is the starting point for identifying potential alcohol ethoxylates that can be synthesized.
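The classification used in Figure 6.4 (primary versus secondary, linear versus branched) reduces to simple checks on the heavy-atom graph. A minimal sketch, assuming saturated molecules given as a list of element symbols and a list of single bonds (a hypothetical representation, not RING's internal one):

```python
def classify_monoalcohol(atoms, bonds):
    """Classify a saturated monoalcohol from its heavy-atom graph.

    atoms: element symbols ("C" or "O"); bonds: (i, j) single bonds.
    Returns (n_carbons, "primary"/"secondary"/"tertiary", "linear"/"branched"),
    or None if the molecule is not a monoalcohol.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    oxygens = [i for i, a in enumerate(atoms) if a == "O"]
    if len(oxygens) != 1 or len(neighbors[oxygens[0]]) != 1:
        return None  # no O, several O, or an ether-type O with two heavy neighbors
    carbinol = neighbors[oxygens[0]][0]  # carbon bearing the OH group

    def carbon_degree(i):
        return sum(atoms[k] == "C" for k in neighbors[i])

    degree = carbon_degree(carbinol)
    kind = "primary" if degree <= 1 else ("secondary" if degree == 2 else "tertiary")
    branched = any(carbon_degree(i) > 2 for i, a in enumerate(atoms) if a == "C")
    n_carbons = sum(a == "C" for a in atoms)
    return n_carbons, kind, "branched" if branched else "linear"


def linear_alcohol(n_carbons, oh_position):
    """Helper: unbranched chain C1..Cn with the OH on carbon oh_position (1-based)."""
    atoms = ["C"] * n_carbons + ["O"]
    bonds = [(i, i + 1) for i in range(n_carbons - 1)] + [(oh_position - 1, n_carbons)]
    return atoms, bonds


print(classify_monoalcohol(*linear_alcohol(12, 1)))  # 1-dodecanol
print(classify_monoalcohol(*linear_alcohol(8, 2)))   # 2-octanol
```

Counting the tuples returned over all species in a network would give a histogram of the kind shown in Figure 6.4.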

Figure 6.4: The spectrum of monoalcohols generated in the network (numbers of primary, secondary, linear, and branched monoalcohols versus number of carbon atoms, C8-C14).

6.4.2 Alcohol ethoxylates property estimation

The AEs are derived from the generated fatty alcohols by successively adding ethylene oxide units to them in an automated manner within RING. Further, the surfactant properties are also estimated on-the-fly. Specifically, we consider AEs formed by adding 4-12 units of ethylene oxide per molecule of fatty alcohol. Thus, from each of the 16,000+ monoalcohols identified in the spectrum, we derive nine AEs. The properties of these surfactants are then estimated using the QSPRs discussed above. The properties estimated include critical micelle concentration (CMC, in ppm), cloud point (CP, in °C), hydrophilic-lipophilic balance (HLB, dimensionless), and surface tension at the CMC (γcmc, in dynes/cm). Three commercially available surfactants and their properties are listed in Table 6.2 for reference. In the spectrum of product properties, we query for those surfactant compounds that fall within a target range of desired properties, as determined by the properties of commercial nonionic surfactants. The ranges were chosen in view of the error in the QSPRs and of commercial surfactants being mixtures of several fatty alcohols. The results of the query for surfactants in the network in the different target regions (I-III) around these three surfactants are presented in Table 6.3. The surface tension values in Table 6.2 are measured at a higher surfactant concentration (≥0.1 wt%) than the CMC; these values are, therefore, lower than γcmc. The queries take this into account by seeking a range higher in value than those reported in Table 6.2.

Table 6.2: Commercial surfactants and their physical properties

Name | CP (°C) | CMC (ppm) | HLB | γ (dynes/cm)
SURFONIC L12-8 [183, 184] | 80 | 77 | 13.6 | 29
SURFONIC L46-7 [183, 184] | 50 | 12 | 11.6 | 27
TERGITOL TMN-6 [185] | 36 | 800 | 13.1 | 27

Table 6.3: Queried surfactant ranges and the number of identified molecules

Query | CP (°C) | CMC (ppm) | HLB | γ (dynes/cm) | Number of surfactants found
Range I | 50-100 | 0-100 | 12-14 | 27-35 | 12
Range II | 30-60 | 0-30 | 11-13 | 25-35 | 8
Range III | 30-40 | 500-1000 | 12.5-13 | 27-34 | 18

Figure 6.5 shows the structures of a few of the molecules identified in the three ranges. The structures of all molecules identified in each of the ranges are given in Section S2 of the supporting information of Rangarajan et al. [147]. One of the molecules in range I is an ethoxylate of lauryl alcohol (1-dodecanol), a principal constituent of SURFONIC L12-8 [183, 184], while the other molecules identified in the search have different structures but properties similar to those of the lauryl alcohol ethoxylate. This combination of network generation and QSPRs enables us to systematically identify potential surfactants that can be obtained from biomass through a combination of heterogeneous acid-, base-, and metal-catalyzed routes.
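As a rough illustration of enumerating the nine ethoxylates of one alcohol and querying a property window, the sketch below uses Griffin's weight-fraction estimate, HLB = 20 × (mass of the EO chain)/(total mass). This is a standard textbook estimate assumed here for illustration; it is not necessarily the HLB QSPR [165] used in this work, and only the HLB window of Range I is applied.

```python
# Approximate standard atomic masses (g/mol)
MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(c, h, o):
    return c * MASS["C"] + h * MASS["H"] + o * MASS["O"]


def griffin_hlb(n_carbons, n_eo):
    """Griffin HLB = 20 * (EO-chain mass) / (total mass) for C_nH_(2n+1)(OC2H4)_mOH."""
    alcohol = molar_mass(n_carbons, 2 * n_carbons + 2, 1)  # saturated monoalcohol
    eo_chain = n_eo * molar_mass(2, 4, 1)                   # ethylene oxide units
    return 20.0 * eo_chain / (alcohol + eo_chain)


def query_hlb(n_carbons, hlb_range, eo_units=range(4, 13)):
    """EO counts (4-12 by default, i.e. nine AEs) whose estimated HLB lies in hlb_range."""
    low, high = hlb_range
    return [n for n in eo_units if low <= griffin_hlb(n_carbons, n) <= high]


print(round(griffin_hlb(12, 8), 2))   # lauryl alcohol + 8 EO units
print(query_hlb(12, (12.0, 14.0)))    # Range I HLB window from Table 6.3
```

For lauryl alcohol with 8 EO units the estimate is about 13.1, in the neighborhood of the 13.6 listed for SURFONIC L12-8 in Table 6.2, and EO counts of 7-9 fall within the Range I HLB window; a full query would apply all four property windows with the actual QSPRs.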

Figure 6.5: Molecular structures of alcohol ethoxylates in each of the three ranges of queries: (a) Range I (including the lauryl alcohol ethoxylate), (b) Range II, and (c) Range III.

6.4.3 Synthesis routes to lauryl alcohol

Figure 6.6 shows several routes to lauryl alcohol. The synthesis route involving furfural, ethanol, and acetone (shown with bold arrows) is an atom-efficient route to lauryl alcohol, with atom efficiency defined on the basis of the weight of useful product relative to that of byproducts. All these routes follow a similar sequence: C-C bond formation reactions (aldol condensation steps) occur first, followed by deoxygenation steps (hydrogenolysis). Further, dehydrogenation steps (ethanol to acetaldehyde) generate carbonyl groups for subsequent C-C bond formation steps, while hydrogenation steps at the end of the sequence saturate the chains.
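The atom-efficiency comparison can be made concrete once a net reaction is fixed. The sketch below assumes that the bold-arrow route consumes one furfural, one acetone, and two ethanol molecules and yields 1-dodecanol; the net stoichiometry (6 H2 consumed, 4 H2O released) then follows from the element balance and is not read from the figure. It reports both the product-to-byproduct mass ratio used in the text and the conventional atom economy (desired-product mass over total product mass).

```python
import re
from collections import Counter

MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # approximate atomic masses (g/mol)


def parse_formula(formula):
    """Element counts for a simple formula without parentheses, e.g. 'C12H26O'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def totals(side):
    """Summed element counts for a list of (coefficient, formula) pairs."""
    total = Counter()
    for coef, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coef * n
    return total


def mass(side):
    return sum(coef * sum(MASS[e] * n for e, n in parse_formula(f).items())
               for coef, f in side)


def atom_efficiency(reactants, products, desired):
    """Return (product/byproduct mass ratio, atom economy) for a balanced reaction."""
    if totals(reactants) != totals(products):
        raise ValueError("reaction is not atom-balanced")
    desired_mass = mass([p for p in products if p[1] == desired])
    byproduct_mass = mass(products) - desired_mass
    return desired_mass / byproduct_mass, desired_mass / mass(products)


# Assumed net reaction: furfural + acetone + 2 ethanol + 6 H2 -> 1-dodecanol + 4 H2O
reactants = [(1, "C5H4O2"), (1, "C3H6O"), (2, "C2H6O"), (6, "H2")]
products = [(1, "C12H26O"), (4, "H2O")]
ratio, economy = atom_efficiency(reactants, products, "C12H26O")
print(round(ratio, 2), round(economy, 3))
```

Because the net reaction depends only on the reactants and the product, any route with the same carbon sources gives the same values, whatever the order of its steps.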

Figure 6.6: Lauryl alcohol synthesis routes from biomass-derived oxygenates (species shown include furfural, acetone, ethanol, acetaldehyde, lactic acid, and methanol). An atom-efficient route is shown with bold arrows.

Each of the steps in Figure 6.6 has a precedent, in that a similar reaction has been reported in the context of chemical synthesis from oxygenates. Ethanol/methanol (and, in general, alcohol) dehydrogenation on metals is a well-established class of reactions used industrially [178, 186], while the subsequent aldol condensation of aldehydes and ketones is similar to the condensation step in butanol and MIBK synthesis [187, 188]. Lactic acid decomposition has been noted on several heterogeneous catalysts as a major side reaction in lactic acid dehydration (with selectivities comparable to that of acrylic acid, the major product) [189, 190]. The successive aldol condensation steps of furfural with acetaldehyde, and of their condensed product with a five-carbon ketone (2-pentanone), are similar to the condensation steps involving furanic derivatives such as HMF with acetone or itself [149, 150]. The subsequent hydrogenation of C=C bonds, carbonyl groups, and aromatics has established precedents [149, 150, 178, 180]. The hydrogenolysis steps that remove the hydroxyl group can be classified under the category of alcohol hydrogenolysis [178], while the final step of opening the cyclic ether to form a linear alcohol in high selectivity has been reported recently [191]. The routes in Figure 6.6 do not take into account the selectivity of each step. Selectivity, however, is particularly important in this context because several condensation steps, including aldol condensation, can form undesired condensed byproducts. For example, Figure 6.7 shows the possible condensation products of acetone and acetaldehyde; pent-3-en-2-one (marked 1 in Figure 6.7) is the desired product. Therefore, when this cross-condensation reaction is included, the possibility of forming the other three products cannot be ignored.
One way to prevent such reactions from being included in the synthesis routes is to allow only self-condensations, or cross-condensations in which one of the co-reactants cannot condense with itself (for example, furfural and acetone, because the former has no α-hydrogen atoms, i.e., no hydrogen atoms on a carbon adjacent to the carbonyl group). Further, enforcing hydrogenation steps to occur consecutively ensures that partial hydrogenation, another reaction step that could lead to multiple intermediates (and hence lower selectivity to the desired product), is avoided.
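The screening rule above can be written as a filter on candidate condensation pairs. In this sketch the enolizability flags are hypothetical inputs (taken here as having at least two α-hydrogens on one carbon, a simplification), and the filter implements only the rule stated in the text; it does not model competing self-condensation of the enolizable partner.

```python
from itertools import combinations_with_replacement

# Hypothetical lookup: can the carbonyl compound act as the enol(ate) partner?
ENOLIZABLE = {
    "acetone": True,
    "acetaldehyde": True,
    "furfural": False,      # no alpha-hydrogens
    "formaldehyde": False,  # no alpha-carbon
}


def condensation_allowed(a, b):
    """Allow self-condensations, and cross-condensations where one partner cannot self-condense."""
    if a == b:
        return ENOLIZABLE[a]                # self-condensation needs an enolizable carbonyl
    return ENOLIZABLE[a] != ENOLIZABLE[b]   # cross: exactly one partner can enolize


pairs = combinations_with_replacement(sorted(ENOLIZABLE), 2)
print([p for p in pairs if condensation_allowed(*p)])
```

Under this rule furfural-acetone is kept, while acetone-acetaldehyde, which can give the four products of Figure 6.7, is excluded.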

Figure 6.7: Possible products of acetone-acetaldehyde aldol condensation, including pent-3-en-2-one (marked 1).

Figure 6.8 shows two selective synthesis routes to lauryl alcohol (they differ only in the way 2-butenal is formed). These routes involve the aldol condensation of furfural with acetone, followed by a second condensation with 2-butenal to form a C12 condensed molecule. The 2-butenal is, in turn, formed by aldol condensation of either acetone with formaldehyde or self-condensation of acetaldehyde. The C12 molecule is subsequently hydrogenated and undergoes hydrogenolysis to give the fatty alcohol. This synthesis route contrasts with the route depicted in Figure 6.6, wherein the products of furfural-acetaldehyde and acetaldehyde-acetone aldol condensation were hydrogenated to saturate the C-C double bonds. Postponing this intermediate hydrogenation step, as in the routes shown in Figure 6.8, ensures that intermediates such as 2-butenal and formaldehyde cannot undergo further aldol condensation. The atom efficiency of these routes (Figure 6.8) matches that of the route shown with bold arrows in Figure 6.6. This sequence is indeed the preferred choice in the synthesis of diesel-range alkanes from oxygenates derived from C5 and C6 sugars. For example, as shown in Figure 6.9, Huber et al. [150] suggest a route involving multiple aldol condensation steps of 5-hydroxymethylfurfural (HMF) followed by hydrogenation/hydrogenolysis to form long-chain alkanes. West et al. [192] also discuss the synthesis of alkanes of different molecular weights wherein HMF-acetone-HMF condensation is carried out prior to hydrogenation. Thus, plausible synthesis routes to lauryl alcohol can be systematically identified and screened on the basis of overall stoichiometry or chemistry knowledge.

Figure 6.8: Selective synthesis of lauryl alcohol from furfural, ethanol, methanol, and acetone.

Figure 6.9: Synthesis scheme for converting biomass-derived oxygenates to diesel-range (C13) alkanes by aldol condensation, hydrogenation, and hydrogenolysis (adapted from Huber et al., Science 308 (2005) 1446-1450).

6.4.4 Thermochemical analysis of synthesis pathways

The analysis in the previous section demonstrates that while multiple synthesis routes may have identical overall stoichiometry, the sequence and choice of chemistry steps can lead to considerable differences in other parameters such as yield and selectivity. In this section, these routes are compared in terms of different thermochemical parameters. RING calculates the thermochemistry on-the-fly using group additivity. While this method has shown very good accuracy for hydrocarbons (≤ 1 kcal/mol [157]), the errors could be larger for the oxygenates encountered in biomass conversion. To test the accuracy of group additivity, a sample set of 30 reactions and more than 50 compounds was considered. The species include hydrocarbons and oxygenates such as butane, hexane, benzene, lactic acid, pentanol, dimethyl ether, propane diol, acetaldehyde, furfural, and tetrahydrofuran, and the reactions involve these compounds. The enthalpy of formation (and the reaction enthalpy change) at 298 K and 1 atm was estimated using RING and compared with values reported in the literature (predominantly from the NIST Chemistry WebBook). The sample set and the estimates are given in the supporting information (Section S3 of Rangarajan et al. [147]). Table 6.4 gives the error statistics for species enthalpy of formation and reaction enthalpy change. The mean unsigned error is remarkably small (5.2 kJ/mol, about 1.2 kcal/mol). The large standard deviation, however, indicates that while the mean unsigned error is close to 1 kcal/mol, specific compounds can have larger deviations. For example, the group additivity value for the enthalpy of formation of 5-hydroxymethylfurfural (HMF) is about 26 kJ/mol (about 6 kcal/mol) more negative than the value reported in the literature by Assary et al. [156]. This can be attributed to the limited data set used to formulate the group additivity scheme, which does not accurately capture the effects of substitution in furans. The unavailability of accurate experimental thermochemistry data prohibits further scrutiny of group additivity methods for large oxygenates such as those seen in Figures 6.6 and 6.8.

Table 6.4: Error statistics for species and reaction enthalpy estimation using group additivity

| | Signed error^a, mean (kJ/mol) | Signed error^a, std. deviation (kJ/mol) | Absolute error, mean (kJ/mol) | Absolute error, std. deviation (kJ/mol) |
|---|---|---|---|---|
| Species | -1.40 | 9.36 | 5.21 | 7.86 |
| Reactions | 1.59 | 11.28 | 5.36 | 9.32 |
| P-value^b | 0.23 | | 0.94 | |

a Errors are calculated as (group additivity value - experimental value).
b Two-sided P-value from a t-test assuming normal distributions with unequal variances and unequal sample sizes.

Given the error in the species enthalpies of formation, one would expect the errors to propagate into the estimated reaction enthalpy change, leading to much larger deviations. However, as seen in Table 6.4, the error statistics of the reaction enthalpy change are nearly identical to those of the species enthalpy of formation for both signed and absolute errors. The P-values in both cases (0.23 and 0.94) are well above the typical significance level of 5% [193], so the null hypothesis that the two error distributions have the same mean cannot be rejected. This statistical similarity can be attributed to the nature of group additivity. The error in each constituent group of a molecule contributes to the total deviation in the species enthalpy of formation. In contrast, only those groups that participate in a reaction contribute to the deviation in the reaction enthalpy; the contributions of groups preserved in the reaction cancel out. This leads to an average deviation in reaction enthalpy change comparable to that of the species enthalpy. Examples of this “cancellation” effect are shown in Table 6.5, wherein G4MP2 (from Assary et al.
[156]) and group additivity values for some compounds and reactions are compared. While the estimated enthalpy of formation of compound A in the table (the condensate of HMF and acetone) is close to the G4MP2 value, the estimates for the other three compounds deviate by roughly 8 to 21 kJ/mol. On the other hand, the deviation in the enthalpy change of the two hydrogenation reactions in Table 6.5 is smaller than that observed for the compounds. This is because the group additivity enthalpies of formation are more negative than the corresponding G4MP2 values in every case, so the errors partly cancel when the reaction enthalpy is calculated.

Table 6.5: Comparisons between G4MP2 and group additivity estimations

| Species/Reaction | Group additivity | G4MP2 [156] | Difference |
|---|---|---|---|
| A (HMF-acetone condensate) | -332.67 | -331.06 | -1.61 |
| B (product of A + H2) | -444.38 | -436.39 | -7.99 |
| HMF | -360.02 | -339.42 | -20.60 |
| DHMF | -411.47 | -401.28 | -10.19 |
| A + H2 → B | -111.71 | -105.34 | -6.37 |
| HMF + H2 → DHMF | -51.46 | -61.86 | 10.41 |

All values are enthalpies (∆Hf for species, ∆Hrxn for reactions) in kJ/mol; Difference = group additivity - G4MP2.

Therefore, although the enthalpy of formation of the C12 condensate in Figure 6.8 may be estimated using group additivity with significant error, the hydrogenation reaction enthalpy of that compound can be calculated with reasonable accuracy (≤ 10 kJ/mol) because the furanic group and the nature of substitution (C substitution in position 2) are preserved in the reaction. Figure 6.10 shows the thermochemistry of the reaction steps in Figure 6.8 and of one of the routes in Figure 6.6. The enthalpy (and free energy) change of each reaction step is given at 1 atm and 600 K. This temperature was chosen to be considerably above the boiling point of lauryl alcohol (532 K [194]) so that the gas-phase energetics of all reactions can be compared at the same physical conditions. While dehydrogenation of ethanol is endothermic but feasible (negative ∆Grxn), the subsequent aldol condensation steps with acetaldehyde or acetone
are both mildly endothermic and non-spontaneous (positive ∆Grxn). These reactions require higher temperatures or higher reactant partial pressures to become thermochemically feasible. The same is true for aldol condensation of furfural with acetone or acetaldehyde. Assuming, as a first approximation, that these molecules behave as ideal gases at 1 atm and 600 K, the equilibrium conversion is approximately 35% for acetaldehyde-acetone condensation and 10% for furfural-acetaldehyde condensation. The aldol condensation of compound 1 in Figure 6.10 (the furfural-acetone condensation product) with 2-butenal is exothermic and thermochemically spontaneous, while that of compound 2 (furfural-acetone condensation product) with 2-butanone is mildly endothermic and equilibrium-limited (equilibrium conversion of 24%). Subsequent hydrogenation of all the condensed products is both exothermic and spontaneous, which could drive an equilibrium-limited aldol condensation step forward. For example, Kunkes et al. [195] showed that 2-hexanone condensation/hydrogenation over Pd/CeZrOx at elevated pressures (5-26 bar) in the gas phase yields the C12 condensed product with high selectivity (≥80%). Their thermochemistry calculations [195] showed that aldol condensation is equilibrium-limited at such conditions, while the hydrogenation steps are irreversible. Thus, the authors concluded that the thermodynamically downhill hydrogenation step drives the equilibrium-limited condensation step.

Figure 6.10: Gas-phase thermochemistry analysis of synthesis routes to lauryl alcohol. The route with bold arrows is a selective route without cross-condensation steps involving molecules with α-hydrogens with respect to the carbonyl group. The enthalpy (and free energy) change (kJ/mol) of each reaction at 1 atm and 600 K is given. Thermochemistry values of equilibrium-limited reactions are underlined, and hydrogenation steps that can potentially drive a preceding equilibrium-limited reaction are marked by dashed arrows.
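Two numerical claims made earlier in this section with Tables 6.4 and 6.5 can be re-checked directly from the tabulated values: that group-additivity errors partly cancel in reaction enthalpies, and that the species and reaction error distributions cannot be distinguished statistically. The following is a minimal, standard-library Python sketch. It assumes Welch's unequal-variance t-test (which matches footnote b of Table 6.4) and, hypothetically, 50 species, since the text says only "more than 50". The incomplete-beta routine follows the standard continued-fraction method; none of the names correspond to RING functionality.

```python
import math

# Table 6.5 (kJ/mol): species -> (group additivity, G4MP2 [156]); H2 is an element in its reference state.
DHF = {
    "A": (-332.67, -331.06),
    "B": (-444.38, -436.39),
    "HMF": (-360.02, -339.42),
    "DHMF": (-411.47, -401.28),
    "H2": (0.0, 0.0),
}
GA, G4MP2 = 0, 1


def reaction_enthalpy(reactants, products, method):
    """dH_rxn = sum of dHf over products minus sum over reactants."""
    return (sum(DHF[s][method] for s in products)
            - sum(DHF[s][method] for s in reactants))


def ga_deviation(reactants, products):
    """Group additivity minus G4MP2 reaction enthalpy (kJ/mol)."""
    return (reaction_enthalpy(reactants, products, GA)
            - reaction_enthalpy(reactants, products, G4MP2))


def _beta_cf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def welch_p_value(m1, s1, n1, m2, s2, n2):
    """Two-sided Welch t-test from summary statistics (unequal variances and sizes)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # P(|T| > |t|) for Student's t with df degrees of freedom.
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


if __name__ == "__main__":
    print(f"A + H2 -> B:      {ga_deviation(['A', 'H2'], ['B']):+.2f} kJ/mol")
    print(f"HMF + H2 -> DHMF: {ga_deviation(['HMF', 'H2'], ['DHMF']):+.2f} kJ/mol")
    # Table 6.4; 50 species is a hypothetical count ("more than 50" in the text).
    print(f"signed errors:   p = {welch_p_value(-1.40, 9.36, 50, 1.59, 11.28, 30):.2f}")
    print(f"absolute errors: p = {welch_p_value(5.21, 7.86, 50, 5.36, 9.32, 30):.2f}")
```

With these assumptions, the reaction deviations come out at about -6.4 and +10.4 kJ/mol (Table 6.5 lists -6.37 and 10.41), and the P-values at about 0.23 and 0.94, close to Table 6.4. Because the P-values depend only weakly on the exact species count, any count a little above 50 gives similar results.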

The C12 condensates (3 and 4 in Figure 6.10) are further hydrogenated and undergo hydrogenolysis in successive exothermic and spontaneous overall reactions. It is, however, important to note that hydrogenation of the carbonyl group (after the C-C double bonds of compound 3 have been saturated in the preceding hydrogenation steps) is not spontaneous at 1 atm and 600 K (∆Hrxn = -57.15 kJ/mol, ∆Grxn = 24.65 kJ/mol). Because this step is exothermic, lowering the temperature shifts the equilibrium forward; at 400 K, ∆Grxn is negative (-2.33 kJ/mol). Hydro-dearomatization of compound 5 (Figure 6.10) to the corresponding tetrahydrofuran derivative is also exothermic and equilibrium-limited, with ∆Grxn = 14.22 kJ/mol at 600 K (see Figure 6.10). Further, the reaction consumes hydrogen, so the number of moles decreases from reactants to products. Thus, its equilibrium can be shifted by changing both pressure and temperature. At 400 K (and 1 atm), ∆Grxn is considerably negative (-38.55 kJ/mol). Alternatively, at 600 K and 10 atm total pressure (90% hydrogen, i.e., a hydrogen partial pressure of 9 atm), ideal-gas calculations give an 80% equilibrium conversion. The results discussed above for the synthesis routes to lauryl alcohol show that it is possible to rapidly (a) determine the exothermicity or endothermicity of reactions in several synthesis routes, (b) identify thermodynamic bottlenecks (equilibrium-limited reactions), (c) postulate whether some reactions can drive others, and (d) predict physical conditions that could drive a reaction in the preferred direction.
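The ideal-gas estimates quoted in this discussion follow from K = exp(-∆Grxn/RT) combined with a simple mass balance. The Python sketch below is a minimal illustration under stated assumptions: an equimolar feed for the aldol condensations (A + B → C + H2O, no change in moles), ∆Hrxn and ∆Srxn held constant when moving from 600 K to 400 K (heat-capacity terms neglected), and hydrogen in such excess that its partial pressure stays at 9 atm during dearomatization of compound 5, which consumes 2 H2 (furan ring to tetrahydrofuran). The function names are hypothetical, and this is not RING's implementation.

```python
import math

R_KJ = 8.314e-3  # gas constant, kJ/(mol K)


def equilibrium_constant(dg_kj, t_k):
    """K = exp(-dG/RT) for an ideal-gas reaction with a 1 atm standard state."""
    return math.exp(-dg_kj / (R_KJ * t_k))


def condensation_conversion(dg_kj, t_k):
    """A + B -> C + H2O with a 1:1 feed. Moles are conserved, so
    K = x**2 / (1 - x)**2 regardless of pressure, giving x = sqrt(K) / (1 + sqrt(K))."""
    s = math.sqrt(equilibrium_constant(dg_kj, t_k))
    return s / (1.0 + s)


def dg_for_condensation_conversion(x, t_k):
    """Inverse of condensation_conversion: the dG (kJ/mol) that yields conversion x."""
    return -R_KJ * t_k * math.log((x / (1.0 - x)) ** 2)


def dg_at_temperature(dh_kj, dg_ref_kj, t_ref, t_new):
    """Move dG from t_ref to t_new, assuming dH and dS are temperature independent."""
    ds = (dh_kj - dg_ref_kj) / t_ref  # kJ/(mol K)
    return dh_kj - t_new * ds


def hydrogenation_conversion(dg_kj, t_k, p_h2_atm, n_h2):
    """S + n H2 -> P with H2 in large excess (p_H2 treated as constant):
    K = x / ((1 - x) * p_H2**n), giving x = K p**n / (1 + K p**n)."""
    q = equilibrium_constant(dg_kj, t_k) * p_h2_atm ** n_h2
    return q / (1.0 + q)


if __name__ == "__main__":
    for x in (0.35, 0.10, 0.24):
        dg = dg_for_condensation_conversion(x, 600.0)
        print(f"{x:.0%} condensation conversion at 600 K <-> dG = {dg:.1f} kJ/mol")
    # Carbonyl hydrogenation after compound 3: dH = -57.15, dG(600 K) = 24.65 kJ/mol.
    print(f"dG(400 K) ~ {dg_at_temperature(-57.15, 24.65, 600.0, 400.0):.2f} kJ/mol")
    # Dearomatization of compound 5: dG(600 K) = 14.22 kJ/mol, 2 H2, p_H2 = 0.9 * 10 atm.
    print(f"conversion ~ {hydrogenation_conversion(14.22, 600.0, 9.0, 2):.2f}")
```

Under these assumptions, the 35%, 10%, and 24% condensation conversions correspond to ∆Grxn of about 6.2, 21.9, and 11.5 kJ/mol at 600 K. The carbonyl hydrogenation extrapolates to about -2.6 kJ/mol at 400 K (versus the -2.33 kJ/mol quoted above; the gap presumably reflects the neglected heat-capacity terms). The dearomatization reaches about 82% conversion at 600 K and 9 atm of hydrogen, in line with the roughly 80% quoted above.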

6.4.5 Biphasic separability analysis of synthesis pathways

Several biomass conversion processes are carried out in the liquid phase. In such cases, the product formed can be separated from the reactants by partitioning it into a second phase within the reactor (biphasic reactors) [148, 196] or in a downstream solvent separation unit [192]. Specifically, biphasic systems can drive the equilibrium forward and prevent the products from reacting further. For example, dehydration of fructose is carried out in a biphasic reactor to separate the product HMF from the reaction phase (an aqueous phase containing acid) into an organic phase [148]. We have implemented on-the-fly prediction of octanol-water partition coefficients using group additivity in RING (see Section 2), and below we discuss the use of this descriptor to assess biphasic separability.

Biphasic separability can, in general, be inferred from the partition coefficient between the aqueous phase and the organic phase. Ideally, partition coefficients between the actual organic phase (which could be a mixture of many solvents) and the aqueous phase would be evaluated; however, octanol-water partition coefficients (reported in logarithmic units, LogP) give an approximate measure of the hydrophobicity or hydrophilicity of a molecule. Positive (or negative) values imply hydrophobicity (or hydrophilicity). A second advantage of the LogP measure is that it can be calculated using group additive techniques [160, 197]. In this section, we analyze the synthesis routes to lauryl alcohol in terms of the separability of reactants and products. Figure 6.11 shows the LogP values of the initial reactants, intermediates, and lauryl alcohol for the synthesis routes shown in Figure 6.10. Table 6.6 compares experimental and estimated LogP values for several oxygenates. Also given in the table are estimates from another additive method (CLOGP [198]) accessible through commercial software.
Clearly, the method implemented in RING correctly predicts whether a compound is hydrophobic or hydrophilic in most cases, though the prediction is incorrect for acetone. The deviations, however, are consistent with the reported standard deviation of about 0.6 log units [160]. CLOGP predictions are better; however, CLOGP incorrectly predicts acetaldehyde to be hydrophilic. For compounds 1 and 4 in Figure 6.10, both of which are unsaturated carbonyl derivatives of furan, the two methods differ by less than 0.3 units. Further, relative hydrophilicity (or hydrophobicity) is predicted correctly by the LogP estimation method. For example, glycerol, which has a higher degree of hydrogen bonding per molecule than ethanol, is expected, and observed, to be more hydrophilic than ethanol; this is captured correctly in the LogP values estimated by RING. Further, for a given class of molecules such as alkanes, alcohols, and aldehydes, increasing carbon number results in higher hydrophobicity [199]. This relative increase is also seen in the LogP estimates; for example, compound 4 is estimated to be more hydrophobic than compound 1 owing to its larger size. Thus, LogP values assessed using RING are a reliable estimate of the octanol-water partition coefficient.

Table 6.6: Experimental and estimated LogP for a representative set of oxygenates

| Compound | Experimental | Estimated [160] | CLOGP [100] |
|---|---|---|---|
| Ethanol [199] | -0.3 | -0.003 | -0.24 |
| Acetaldehyde [199] | 0.45 | 0.20 | -0.22 |
| Furfural [200] | 0.46 | 0.97 | 0.67 |
| Dodecanol [199] | 5.13 | 3.90 | 5.05 |
| Glycerol [200] | -1.76 | -1.43 | -1.54 |

All values are octanol-water partition coefficients in LogP units.

From Figure 6.11, several reactions have reactants and products that differ in their solubility. Specifically, furfural-acetone (or acetaldehyde) condensation to form compound 1 (or compound 6), condensation of 1 with 2-butenal to form compound 3, condensation of compound 2 with 2-butanone to form compound 4, and the two hydrogenolysis reactions forming compound 5 have products that are at least 0.5 units more hydrophobic than the reactants (roughly a threefold higher octanol-water partition coefficient, since 10^0.5 ≈ 3.2). These reactions are therefore potential candidates for biphasic systems. Indeed, several reports in the literature employ biphasic systems (reactors or separators) for condensation and hydrogenolysis reactions in biomass upgrading. West et al. [192] discuss a process wherein the condensation and final hydrogenation steps are followed by solvent-separation units, where the aqueous contents are recycled while the contents of the organic phase are separated and either fed to the next stage or drawn off as products. Zapata et al. [196] use hybrid catalysts composed of base or metal catalysts fused to carbon nanotubes in an emulsion to simultaneously carry out condensation/hydrogenation reactions and separate the products. On-the-fly estimation of LogP thus enables identifying potential candidates wherein differences in solubility can be exploited to enhance product selectivity and/or purification.
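The agreement claims for Table 6.6 and the 0.5-unit screening threshold can be made concrete in a few lines of Python. This is a minimal sketch: the values are copied from Table 6.6, and the rule that a product must lie at least 0.5 LogP units above its most hydrophobic reactant is our reading of the criterion used above, not a documented RING option. All names are hypothetical.

```python
# Table 6.6: compound -> (experimental, estimated [160], CLOGP), in LogP units.
LOGP = {
    "ethanol": (-0.3, -0.003, -0.24),
    "acetaldehyde": (0.45, 0.20, -0.22),
    "furfural": (0.46, 0.97, 0.67),
    "dodecanol": (5.13, 3.90, 5.05),
    "glycerol": (-1.76, -1.43, -1.54),
}
EXPERIMENTAL, RING_ESTIMATE, CLOGP = 0, 1, 2


def mean_abs_error(column):
    errors = [abs(row[column] - row[EXPERIMENTAL]) for row in LOGP.values()]
    return sum(errors) / len(errors)


def sign_matches(column):
    """Number of compounds whose hydrophobic/hydrophilic classification agrees with experiment."""
    return sum((row[column] > 0) == (row[EXPERIMENTAL] > 0) for row in LOGP.values())


def partition_ratio_factor(delta_logp):
    """Factor by which the octanol/water partition ratio grows for a LogP shift."""
    return 10.0 ** delta_logp


def biphasic_candidate(reactant_logps, product_logp, threshold=0.5):
    """Assumed criterion: product at least `threshold` units above every reactant."""
    return product_logp - max(reactant_logps) >= threshold


if __name__ == "__main__":
    for label, col in (("RING estimate", RING_ESTIMATE), ("CLOGP", CLOGP)):
        print(label, round(mean_abs_error(col), 2), sign_matches(col), "of", len(LOGP))
    print(round(partition_ratio_factor(0.5), 2))
```

For these five compounds, the RING estimates have a mean absolute error of about 0.52 log units (within the roughly 0.6-unit scatter reported in [160]) versus about 0.25 for CLOGP; the RING estimates classify all five compounds correctly, while CLOGP misses acetaldehyde. Comparing against the most hydrophobic reactant is the conservative choice: a product that clears it also clears every other reactant.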

6.5 Discussion

Figure 6.11: Octanol-water partition coefficients of species in the synthesis routes to lauryl alcohol. The value for each species is the logarithm of its equilibrium octanol-water partition coefficient (LogP). Reactions marked with bold arrows can potentially be carried out in a biphasic aqueous-organic system.

We describe a computational method that combines automated network generation with semi-empirical molecular property prediction to screen large numbers of compounds and synthesis routes entailed in biomass conversion. In the preceding section, an application in the context of synthesizing fatty alcohols was discussed; however, the method is not restricted to fatty alcohol synthesis and is generic enough to be applied to other classes of chemistries and compounds. For example, biomass conversion to fuels such as gasoline and diesel can also be explored in a similar manner, exploiting the vast resource base of structure-property relationships that exists for calculating properties such as octane rating, cetane index, flash point, density, viscosity, cold flow properties, and boiling point [201, 202, 203, 204]. There are three distinct advantages of this method: speed and automation, scalability, and reliability. Network generation, synthesis route identification, and property-based selection and screening of molecules and routes are all performed by RING quickly and in an automated manner. The network represents the synthetically feasible set of molecules from the initial reactants; the network (in terms of the number of species and reactions) is therefore only a fraction of the total chemical space of possibilities. This implies that the computational cost of property estimation and synthesis route identification is significantly smaller than if the entire chemical space had to be evaluated. Further, the molecular property prediction methods are quantitatively, or at least qualitatively, reliable, as discussed in detail in Section 6.4. At this stage of screening, the kinetics of each reaction step is not considered.
Also neglected is the fact that each overall reaction may itself be a complex network of reactive intermediates and elementary steps, depending on the chemistry. For example, aldol condensation is represented as a single overall reaction step, although it comprises several elementary steps that ultimately lead to the formation of the condensate. Therefore, although cross-condensation reactions are excluded when identifying selective routes in Section 6.4.3, several by-products of base catalysis are neglected. Furthermore, the thermochemical analysis discussed above only involves estimating gas-phase properties; consequently, liquid-phase energetics are not captured. To account for some of these effects, other semi-empirical correlations can be included. Kinetics can be used to evaluate synthesis routes by including rate parameters derived from linear free-energy relationships involving energetic properties of reactants and products, as in alcohol dehydration [205, 206], or by using rate rules regressed from experimental data [207, 208], if these are available for all the specified chemistries. Liquid-phase thermochemistry can be estimated using semi-empirical methods [209, 210], which have been shown to be fast and reliable compared with more sophisticated ab initio calculations [211]. The method we describe above is flexible in accommodating any type of semi-empirical correlation as a means of assessing properties and energetics of reactions or molecules.

CHAPTER 7

Glycerol conversion on transition metal catalysts

In this chapter, we combine the network analysis features of RING with semi-empirical estimation of molecular properties, as in the previous chapter, to elucidate the mechanisms in the complex network of glycerol conversion over transition metal catalysts. We begin with an introduction that motivates the method and subsequently present its details in the context of glycerol conversion.

7.1 Introduction

Heterogeneous catalysis is a prominent means to upgrade biomass into valuable fuels and chemicals [212, 213]. Computational studies of the underlying mechanism can give insights into the potential energy surface of the system and thereby provide guidelines to design new and improved catalysts [214]. Specifically for biomass conversion, Density Functional Theory (DFT) has been applied to estimate the thermochemistry of surface intermediates and the activation energies of elementary steps in the conversion of alcohols [215], polyols [216, 217], acids, ethers, esters [133], and furans [218, 219] on transition-metal catalysts, and thereby to elucidate the mechanisms for upgrading such compounds. Semi-empirical relationships have been derived from these calculations to predict thermochemical properties of surface species and reactions much more rapidly than expensive DFT calculations. These include: (a) group/atom contribution methods to calculate enthalpy, entropy, and free energy of formation [220, 133] and adsorption energy relative to gas-phase stable reference molecules [221, 222]; and (b) linear free energy relationships to calculate activation energies of elementary reaction steps based on reaction and species energies [223, 224, 225, 221, 222]. In addition, linear scaling relationships that relate the energy of a species on one transition metal to that on another have been developed by Nørskov and coworkers [226]. All these methods have been shown to be accurate to 0.1-0.2 eV if applied to species similar to those used to develop the relationships [227]. These methods have been applied to study the entire reaction network for systems such as the decomposition of ethanol [215] and ethylene glycol [217] on transition metals, to identify potential reaction pathways on Platinum and other metals, and to formulate and solve kinetic models.

Extending such analyses to larger molecules such as glycerol and other higher alcohols (for example, sorbitol), however, is nontrivial: with increasing molecular size, the number of reactions and surface intermediates, resulting from breaking any bond in the initial reactant and in the species produced thereof, increases exponentially, so the reaction network becomes complex. Consequently, several hundred pathways can potentially convert molecules such as glycerol (the initial feed) to experimentally observed products, and identifying all these pathways manually is intractable and error-prone. Computational studies have, therefore, focused on the first few critical steps of the reaction cycles or used a simplified reaction network.
For example, Chen et al. [224] and Liu and Greeley [221, 222] used DFT-based semi-empirical correlations to predict the relative stability of various dehydrogenated intermediates (C3HxO3*) and thereby proposed possible steps involved in the first C-C or C-O scission to products such as carbon monoxide on Platinum and other transition metal catalysts such as Palladium, Copper, and Rhodium. Coll et al. [228] studied the comparative adsorption of a variety of stable intermediates formed upon dehydration and subsequent hydrogenation of glycerol on Nickel, Rhodium, and Palladium and thereby identified transition metal catalysts that stabilize the intermediates and promote hydrogenation. To identify all plausible and energetically feasible elementary-step routes for converting glycerol on these catalysts, however, all such possible routes and the associated catalytic cycles have to be identified and their activation and reaction energies calculated. We propose an automated method to exhaustively enumerate all reactions of the system, identify all possible pathways between the reactants and any specified product, and calculate the reaction and activation energies through semi-empirical methods, in order to identify the plausible mechanisms for converting the reactants.
Specifically, in this work, we use RING [19, 20], an automated network generation and analysis tool, to: (a) construct the complex network of glycerol conversion on transition metal catalysts, (b) query for and identify possible pathways to C-C/C-O scission products such as carbon monoxide and propylene glycol, (c) determine, among them, the energetically feasible routes on Platinum using group additivity-based thermochemistry and linear free energy relationships, and (d) extend the analysis to other transition metals such as Palladium (Pd), Rhodium (Rh), and Ruthenium (Ru) using linear scaling relationships, and thereby study the trend in glycerol conversion over different catalysts as a function of the atomic binding energies of carbon, oxygen, and hydrogen on them. We begin our analysis by describing the system and our methods in Sections 7.2 and 7.3.

7.2 Glycerol conversion on transition metals

Glycerol is a representative biomass compound because it has a C:O ratio of 1:1 and contains C-C, C-H, O-H, and C-O bonds. Glycerol conversion on transition metal catalysts can lead to several products: if the C-C bonds in glycerol are broken, the products have a smaller carbon backbone (e.g., carbon monoxide and ethylene glycol), while C-O bond scission leads to products with fewer oxygen atoms per carbon (a higher C:O ratio), such as 1,2-propane diol (propylene glycol), 1,3-propane diol, and propanol. Additionally, products such as methane and ethanol can also be formed, depending on the relative extent of C-C and C-O bond scission. Given this spectrum of possible products, it is desirable to identify catalysts that selectively produce a specific product such as syngas or one of the propane diols. Several experimental studies have focused on developing catalysts to produce syngas [229, 230, 231] or glycols [232, 233, 234, 235] from glycerol under various conditions of temperature, pressure, and phase. In this study, we use RING [19, 20] to computationally analyze pathways to carbon monoxide (a C-C scission product) and 1,2-propane diol (a C-O scission product) from glycerol on transition metal catalysts.

7.3 Method

Construction of the network, calculation of thermochemical properties and activation energies, and pathway querying were carried out using RING. The primary inputs to RING are: (a) the initial reactants of the system (glycerol and hydrogen, in this study), (b) a set of reaction rules describing the chemistry (metal catalysis rules in this case, which are included in Appendix H), and (c) queries pertaining to reactions and species in the network and to pathways or mechanisms leading to different products (for example, carbon monoxide or propylene glycol). Additional inputs include: (a) group additivity rules for thermochemistry, and (b) rules to calculate the activation barrier of each reaction rule. RING uses these additional inputs to calculate the required thermochemical quantities and reaction barriers on-the-fly. The outputs from RING relevant to this study are: (a) an exhaustive list of reactions and species in the network consistent with the reaction rules, (b) a list of reactions and species satisfying the reaction and species queries, and (c) a list of pathways and mechanisms based on the input queries.
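One common form for the per-rule barrier inputs mentioned above is a linear free energy (Brønsted-Evans-Polanyi type) relation, Ea = E0 + α∆Erxn, evaluated from the reaction energy computed on-the-fly. The Python sketch below is hypothetical throughout: the rule names, the intercept and slope values, the placeholder species energies, and the clipping of the barrier to at least max(0, ∆Erxn) are illustrative assumptions, not the parameters or interface used with RING in this work. It also shows how a microscopic-reverse step can obtain its barrier as Ea,rev = Ea,fwd - ∆Erxn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearBarrierRule:
    """Hypothetical per-rule relation Ea = intercept + slope * dE_rxn (energies in eV)."""
    intercept: float
    slope: float

    def forward_barrier(self, de_rxn: float) -> float:
        ea = self.intercept + self.slope * de_rxn
        # Clip so the barrier is neither negative nor below an endothermic reaction energy.
        return max(ea, de_rxn, 0.0)

    def reverse_barrier(self, de_rxn: float) -> float:
        # Microscopic reversibility: Ea,rev = Ea,fwd - dE_rxn.
        return self.forward_barrier(de_rxn) - de_rxn


def reaction_energy(species_energy, reactants, products):
    """dE_rxn = sum over products minus sum over reactants."""
    return (sum(species_energy[s] for s in products)
            - sum(species_energy[s] for s in reactants))


# Placeholder parameters for illustration only (not fitted values).
RULES = {
    "C-H scission": LinearBarrierRule(intercept=0.75, slope=0.5),
    "O-H scission": LinearBarrierRule(intercept=0.60, slope=0.5),
}

if __name__ == "__main__":
    # Placeholder species energies (eV) for a hypothetical step X* -> Y* + H*.
    energies = {"X*": -1.0, "Y*": -0.6, "H*": -0.3}
    de = reaction_energy(energies, ["X*"], ["Y*", "H*"])
    rule = RULES["C-H scission"]
    print(de, rule.forward_barrier(de), rule.reverse_barrier(de))
```

Deriving the reverse barrier from the forward one, rather than fitting it separately, keeps each scission/formation pair thermodynamically consistent with the species energies.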

7.3.1 Network construction and representation

The reaction network for glycerol conversion on platinum was generated using elementary reaction rules common to metal catalysis, such as C-C, C-H, C-O, and O-H scission. Additionally, C=O formation steps that ultimately lead to carbon monoxide (based on Salciccioli et al. [217] and Kandoi et al. [236]) were included. Further, C-H, C-O, and C-C formation steps that are the microscopic reverse of the corresponding scission rules were also specified. Physisorption rules and the corresponding reverse steps (desorption) were included for alcohols and polyols. However, several steps were not included because the resulting products have not been reported in experimental studies. For example, a C-C bond formation step is not included because C4+ products were not reported in an experimental study on vapor-phase glycerol decomposition [230]. Further, C-O formation steps leading to acids or esters are not included in the network because glycerol hydrogenolysis on metals at neutral pH does not lead to products with carboxylic acid groups [233]. A list of the reaction rules input into RING is provided in Appendix H.

Salciccioli et al. [220] proposed several bonding rules for small oxygenate surface species. For example, one of the bonding rules requires that free hydroxy groups attached to free (non-surface-bound) carbon atoms in any surface species interact with the surface through a weak bond (non-bonded interactions), except in the case of physisorbed glycerol, wherein only two consecutive free C-OH groups interact. This interaction, however, does not exist if the carbon atom is bonded to the surface and hence is not free. This rule applies to species such as physisorbed methanol, which is adsorbed on the surface via a weakly bonded free hydroxy group; the species CH2OH* formed upon C-H scission of physisorbed methanol, on the other hand, does not have this weak OH-metal interaction. These rules were incorporated as global constraints that are applicable to all species in the network. Multiple versions of each affected chemistry rule (e.g., C-H scission of physisorbed and of chemisorbed species as different rules) were specified to take into account the bonding of the atoms to the metal. A complete list of the bonding rules implemented in this study and a detailed discussion of the methodology adopted to include these rules as inputs into RING are given in Appendix H. The reaction network generated for this study consists of more than 500 species and 3300 reactions, reinforcing our earlier statement that this system is complex.

Figure 7.1: SMILES strings and pictorial representation of (a) physisorbed methanol, SMILES: CO_[{M}]; and (b) chemisorbed C3H7O3*, SMILES: C(CO_[{M}])(CO_[{M}])([{M}])O.

All species are represented in a modified-SMILES [96] scheme that explicitly accounts for the bonds between a metal atom of the surface and any atom of the surface species. The number of metal atoms to which each surface species is bonded is determined by assuming that each atom of the species satisfies its full valency.
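The full-valency rule amounts to a simple count: a heavy atom forms as many single bonds to the metal as remain after its hydrogens and its bonds to other heavy atoms are accounted for. The Python sketch below assumes only C, O, and H and ignores weak (non-bonded) interactions with the surface; the function names and the string builder are hypothetical illustrations, not RING's parser.

```python
# Maximum valency x_max of each element considered here.
MAX_VALENCY = {"C": 4, "O": 2, "H": 1}


def metal_bonds(element: str, n_h: int, heavy_bond_order: int = 0) -> int:
    """Single bonds to the metal = x_max - hydrogens - bond orders to other heavy atoms."""
    n = MAX_VALENCY[element] - n_h - heavy_bond_order
    if n < 0:
        raise ValueError(f"{element} with {n_h} H and {heavy_bond_order} heavy bonds exceeds its valency")
    return n


def single_atom_smiles(element: str, n_h: int) -> str:
    """Modified-SMILES string for a one-heavy-atom fragment AH_x, hydrogens implicit."""
    k = metal_bonds(element, n_h)
    if k == 0:
        return element
    # The first k-1 metal atoms are written as branches, the last one inline.
    return element + "([M])" * (k - 1) + "[M]"


if __name__ == "__main__":
    print(single_atom_smiles("C", 2))  # surface methylene CH2*: C([M])[M]
    # CH2OH*: carbon (2 H, one C-O bond) and oxygen (1 H, one C-O bond)
    print(metal_bonds("C", 2, 1), metal_bonds("O", 1, 1))
```

The CH2OH* case gives one metal bond on carbon and none on oxygen, in line with the statement above that this species lacks the weak OH-metal interaction of physisorbed methanol.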

Under this assumption, a surface species AHx is taken to have x_max - x single bonds with the metal, where x_max is the maximum valency of atom A. For example, surface methylene (CH2*) is represented in SMILES as the string “C([M])[M]”, wherein “M” stands for the metal and hydrogens are implicit. Since carbon is tetravalent and carries two hydrogens here, it is bonded to two metal atoms. This is consistent with the bonding rules proposed by Salciccioli et al. [220]. Figure 7.1a shows the SMILES string and pictorial representation of physisorbed methanol. This representation scheme also applies to multidentate species, wherein each surface-bound heteroatom has as many single bonds with the metal atoms as needed to fulfill its valency. Figure 7.1b shows the pictorial and SMILES representation of a chemisorbed C3H7O3* species. As seen in the figure, surface-bound (or interacting) groups are marked with a “*” without explicitly noting the type of bonding.

7.3.2 On-the-fly estimation of thermochemistry and activation barriers

Thermochemistry calculation.

For this study, thermochemical quantities such as the enthalpy, entropy, and free energy of formation and the specific heat capacity of each surface species in the network at any temperature of interest are calculated using the group additivity methods proposed by Salciccioli et al. [133] for Pt(111). Alternatively, the additivity methods proposed by Liu and Greeley [221, 222] based on the bond-order conservation principle can also be used to calculate the energetics of surface species. Details of implementing this scheme with RING are included in Appendix I. Gas-phase thermochemistry is estimated using the group additivity methods proposed by Benson [92], Sabbe et al. [157, 158], and Khan and Broadbelt [159]. RING provides three specific features for specifying group additivity rules. First, a contribution value can be specified for each molecular fragment, defined as a central atom and its neighbors, mimicking the style originally proposed by Benson [92]. Second, corrections to account for non-nearest-neighbor effects can be specified by defining an appropriate submolecular pattern (representing a specific substructure in a molecule) and its corrective contribution. Traditionally, such corrections are used to account for 1,4 and 1,5 interactions in gaseous hydrocarbons and for intramolecular hydrogen bonding in gaseous oxygenates. Third, corrections can be specified on the basis of overall molecular characteristics. We use the second feature (group corrections) to account for four different cases that arise from the specific nature of the group additivity scheme proposed by Salciccioli et al. [220, 133]. First, surface ring corrections have been proposed by Salciccioli et al. [220, 133] to calculate the contributions of rings such as M-C-C-M. Their strategy involves calculating the sum of the natural supplementary angles of all heteroatoms participating in the ring and adding a correction proportional to this sum if it is less than 180°.
We have, instead, enumerated all possible rings that have a cumulative supplementary angle less than 180◦ (shown in Appendix J) and added these as group corrections in the input to RING. Second, in the group additivity scheme proposed by Salciccioli et al. [133], carbon-centered fragments bonded to a hydroxy group are distinguished depending on whether or not the oxygen interacts with the surface (e.g., C-(C)(H)(Owk) vs. C-(C)(H)(O)). This is done specifically to account for non-bonded interactions of hydroxy and aldehyde groups with the surface [133]. We include the non-interacting group using the first group additivity feature described earlier and include the difference between the two groups as a correction term with the group corrections feature, such that it is applied only to surface-bound species. Third, Salciccioli et al. [133] deduct the translational entropy contribution of non-surface-bonded fragments such as C-(C)2(O) from Benson's group additivity value if these groups are part of surface-bound species. Within RING, Benson's values are included using the additivity feature, while the translational entropy correction values are assigned using the correction feature such that they apply to surface intermediates alone.
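The way the three additivity features combine, including a correction that is applied only to surface-bound species, can be sketched as follows. This is a minimal Python illustration, not RING's input language: the `Species` record, `estimate_property`, the fragment and pattern labels, and all numerical values are hypothetical placeholders (not the Salciccioli et al. or Benson parameters), and the fragment and substructure matching that RING performs on the molecular structure is assumed to have been done already.

```python
from dataclasses import dataclass, field


@dataclass
class Species:
    """Hypothetical pre-parsed species; RING would detect fragments and patterns itself."""
    name: str
    fragments: dict[str, int]                               # feature 1: central atom + neighbors
    patterns: dict[str, int] = field(default_factory=dict)  # feature 2: matched substructures
    surface_bound: bool = False                             # used by feature 3 and surface-only terms


def estimate_property(species, fragment_values, pattern_values,
                      surface_only=frozenset(), surface_shift=0.0):
    """Sum fragment contributions, substructure corrections, and a molecular correction."""
    total = sum(fragment_values[f] * n for f, n in species.fragments.items())
    for pattern, n in species.patterns.items():
        # Terms such as the weak-interaction difference or the translational
        # entropy deduction are skipped for species that are not surface bound.
        if pattern in surface_only and not species.surface_bound:
            continue
        total += pattern_values[pattern] * n
    if species.surface_bound:
        total += surface_shift  # feature 3: correction based on an overall characteristic
    return total


# Hypothetical values in kJ/mol, for illustration only.
FRAGMENTS = {"C-(C)(H)(O)": -30.0, "C-(C)(H)2(O)": -35.0, "O-(C)(H)": -160.0}
PATTERNS = {"Owk difference": -12.0, "M-C-C-M ring": 20.0}
SURFACE_ONLY = frozenset({"Owk difference"})

surface = Species("toy surface species",
                  {"C-(C)(H)(O)": 1, "C-(C)(H)2(O)": 1, "O-(C)(H)": 2},
                  {"Owk difference": 1, "M-C-C-M ring": 1}, surface_bound=True)
# The same pattern is listed for a gas species only to show that it is skipped.
gas = Species("toy gas species",
              {"C-(C)(H)(O)": 1, "C-(C)(H)2(O)": 1, "O-(C)(H)": 2},
              {"Owk difference": 1})

print(estimate_property(surface, FRAGMENTS, PATTERNS, SURFACE_ONLY))  # -377.0
print(estimate_property(gas, FRAGMENTS, PATTERNS, SURFACE_ONLY))      # -385.0
```

Keeping the surface-only terms as separate corrections, rather than as separate fragment values, mirrors the choice described above of storing the non-interacting group once and adding the difference only where it applies.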

Linear scaling corrections. Fourth, in this study, the thermochemical quantities of species on other transition metals are calculated using linear scaling relationships with Pt(111) values as reference. This is done within RING by adding corrections for each type of surface-bound atom to account for the differences between the metal of interest and platinum. Equations 7.1 and 7.2 give these relationships:

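$$E(\mathrm{C}_x\mathrm{H}_y\mathrm{O}_z^*)_{M} = E(\mathrm{C}_x\mathrm{H}_y\mathrm{O}_z^*)_{Pt} - \sum_{a_j} \gamma_{a_j}\left(Q_{a_j,M} - Q_{a_j,Pt}\right) \tag{7.1}$$

$$\gamma_{a_j} = \frac{x_{\max,a_j} - x_{a_j}}{x_{\max,a_j}} \tag{7.2}$$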

where $E$ is the energy of a given species and $Q_{a_j}$ is the bond dissociation energy between the metal and the $j$-th surface-bound atom $a_j \in \{\mathrm{C}, \mathrm{H}, \mathrm{O}\}$ through which the species is bonded to the surface. The summation runs over all the surface-bound (and interacting) atoms in the species and thus directly reflects its denticity. The subscripts 'Pt' (platinum) and 'M' (any other metal) refer to the surface. $\gamma_{a_j}$ is the bonding correction factor that captures the type of bonding between the heteroatom and the metal; in equation 7.2, $x_{a_j}$ is the number of bonds that $a_j$ forms with non-metal atoms (e.g., the number of hydrogen atoms in an AH$_x$ species) and $x_{\max,a_j}$ is the maximum number of such bonds $a_j$ can form. Nørskov and coworkers proposed a theoretical method for calculating γ (equation 7.2) for species of the type AH$_x$ [226] and further extended it to larger and multidentate species [237, 238]. More recently, however, γ values have been regressed from DFT data [227]. In this study, both methods have been used to calculate γ. Equation 7.1 has been adapted from Salciccioli et al. [220] and accounts for their weak bonding (non-bonded interaction) corrections for hydroxy and aldehyde groups. For such terms, $Q_{\mathrm{O}}$ and a γ value of 0.1 (or 0.14 when data from Sutton and Vlachos [227] were used) have been used to calculate the energy corrections. Within RING, a correction term is specified for each type of C, H, and O bonding‡. We have taken these binding (or dissociation) energy values from Salciccioli et al. [217], Sutton and Vlachos [227], and CatApp [239], while noting that some of the binding energy values for the same metal from different sources are not consistent (especially the values reported by Sutton and Vlachos [227] for rhodium compared with those reported by Liu and Greeley [240]). For consistency, we take only the values relative to platinum ($Q_M - Q_{Pt}$) from each source. The actual values used and their sources are given in Appendix K.

‡The type of bonding refers to the number of metal-heteroatom single bonds required to satisfy the latter's valence constraints.

CO coverage effects. It has been shown that adsorbed carbon monoxide can destabilize both itself and other co-adsorbed species on the surface of transition metal catalysts. Grabow et al. [241] proposed a correlation for the binding energy of carbon monoxide as a function of its coverage. For a given coverage, therefore, the destabilization energy can be calculated and added to the DFT-calculated energy of carbon monoxide. Following Ferrin et al. [215], we additionally assume that every other surface species is destabilized to the same extent as carbon monoxide. In RING, this translates to a molecular correction applicable to all surface species. Appendix L includes a comparison of DFT values (taken from Salciccioli et al. [217]) and those estimated by RING for a set of about 15 C$_{1-3}$H$_{0-8}$O$_{0-3}$ surface intermediates of the network. The mean absolute deviation is 8 kJ/mol (about 0.1 eV) and the standard deviation is 11 kJ/mol.

Activation barriers.

Activation energies are input into RING in the form of linear free energy correlations for each reaction rule. RING accepts both BEP-like correlations and transition state scaling (TSS) relationships. As seen from equations 7.3 and 7.4, these two correlation types are similar in that both estimate the activation energy as a linear function of the energy of reaction ($\Delta H_{rxn}$) or of a species energy (relative to an assumed initial state). In equation 7.4, $E_{FS}$ and $E_{TS}$ are the final state and transition state energies with respect to the initial reactants in the gas phase.

$$E_a = \alpha \Delta H_{rxn} + \beta \tag{7.3}$$

$$E_{TS} = \alpha E_{FS} + \beta \tag{7.4}$$

The BEP correlations used in this study are obtained from Sutton and Vlachos[225]. TSS correlations developed and applied by Ferrin et al.[215], Chen et al.[224], and Liu and Greeley [221, 222] can also be used in RING. Conditional relationships can also be used to specify rules for kinetics. For example, Sutton and Vlachos[225] propose different correlations for C-O scission and C-OH scission. These two rules can be specified by conditional “if” statements in RING.
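A rule-specific correlation chosen by a conditional, together with the TSS form of equation 7.4, can be sketched as below. This is a minimal Python illustration, not RING syntax: the `Correlation` record, the function names, and every α and β value are hypothetical placeholders rather than the Sutton and Vlachos parameters. The clipping of the BEP estimate to be no lower than zero or than ΔH_rxn is an added assumption (a common safeguard), not something stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correlation:
    alpha: float  # slope (dimensionless)
    beta: float   # intercept, kJ/mol


# Hypothetical parameters, for illustration only.
BEP = {
    ("C-O scission", False): Correlation(0.6, 100.0),  # leaving O carries no H
    ("C-O scission", True): Correlation(0.7, 90.0),    # C-OH scission
    ("C-H scission", None): Correlation(0.8, 60.0),
}


def bep_barrier(rule, dh_rxn, leaving_o_has_h=None):
    """Eq. 7.3, with the correlation selected by a conditional on the rule."""
    if rule == "C-O scission":
        corr = BEP[(rule, bool(leaving_o_has_h))]
    else:
        corr = BEP[(rule, None)]
    ea = corr.alpha * dh_rxn + corr.beta
    # Assumption (not from the text): keep Ea >= 0 and Ea >= dH_rxn.
    return max(ea, dh_rxn, 0.0)


def tss_barrier(corr, e_fs, e_is):
    """Eq. 7.4: E_TS = alpha*E_FS + beta, energies relative to gas-phase reactants."""
    return (corr.alpha * e_fs + corr.beta) - e_is


print(round(bep_barrier("C-O scission", -20.0, leaving_o_has_h=True), 2))   # 76.0
print(round(bep_barrier("C-O scission", -20.0, leaving_o_has_h=False), 2))  # 88.0
```

Both forms reduce to a slope and an intercept per rule, which is why a single record type can serve either correlation.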

Assumptions. Several assumptions are made while using these correlations. First, these correlations are assumed to be valid at any temperature. Second, it is assumed that these correlations are valid across all other metals, consistent with the assumption made by Ferrin et al.[215]. Third, the correlations are assumed to hold even in the presence of carbon monoxide coverage effects that lead to a change in reaction and species energies. In effect, if the species energies are available at a given temperature for a specific catalytic surface at a given carbon monoxide coverage, the activation barrier of each reaction step can be estimated using the given set of correlations.
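Under these assumptions, the estimation chain (Pt species energies, then linear scaling to another metal with equations 7.1 and 7.2, then a uniform CO-coverage shift, then a BEP barrier with equation 7.3) can be sketched as follows. This is a minimal illustration: the function names, the Pt species energies, the $Q_M - Q_{Pt}$ shifts, and α and β are all hypothetical, and every species in the example is assumed to be surface bound so that the CO shift applies to each one. A weakly interacting oxygen would simply be passed with a fixed γ, such as the 0.1 mentioned above, instead of a value from `gamma`.

```python
KJ_PER_EV = 96.485  # approximate conversion, kJ/mol per eV

# Maximum number of bonds each atom can form with non-metal atoms.
X_MAX = {"C": 4, "O": 2, "H": 1}


def gamma(atom, x):
    """Eq. 7.2 for a surface-bound atom that forms x bonds to non-metal atoms."""
    return (X_MAX[atom] - x) / X_MAX[atom]


def energy_on_metal(e_pt, bound_atoms, dq):
    """Eq. 7.1. bound_atoms: (element, gamma) per surface-bound atom; dq[el] = Q_M - Q_Pt."""
    return e_pt - sum(g * dq[el] for el, g in bound_atoms)


def bep_from_species(reactants, products, alpha, beta, co_shift=0.0):
    """Eq. 7.3 from species energies; co_shift destabilizes every surface species equally."""
    dh = sum(e + co_shift for e in products) - sum(e + co_shift for e in reactants)
    return alpha * dh + beta


# Hypothetical Pt energies (kJ/mol) and Q_M - Q_Pt shifts, for illustration only.
dq = {"C": -20.0, "H": 8.0}
e_ch3 = energy_on_metal(-200.0, [("C", gamma("C", 3))], dq)  # -195.0
e_ch2 = energy_on_metal(-150.0, [("C", gamma("C", 2))], dq)  # -140.0
e_h = energy_on_metal(-40.0, [("H", gamma("H", 0))], dq)     # -48.0

# CH3* -> CH2* + H* on the hypothetical metal, without and with a 0.2 eV CO shift.
ea_clean = bep_from_species([e_ch3], [e_ch2, e_h], alpha=0.8, beta=60.0)
ea_co = bep_from_species([e_ch3], [e_ch2, e_h], alpha=0.8, beta=60.0,
                         co_shift=0.2 * KJ_PER_EV)
print(round(ea_clean, 1), round(ea_co, 1))  # 65.6 81.0
```

Because the shift is added per surface species, it changes ΔH_rxn only for steps that change the number of adsorbed species, such as the dissociation shown here.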

7.3.3 Pathways analysis

Queries for pathways to different products can be specified within RING along with constraints on the types of pathways sought. For example, a restriction on the number of occurrences of reactions of a particular rule, or an upper bound on the activation energy of any step in a pathway or mechanism, can be specified. The queries used in this study can be found in the input files in Appendix H. The generated network can be queried for reaction pathways to either of the two products under study (carbon monoxide and 1,2 propane diol). Constraints were enforced in these queries to screen out undesired pathways. Specifically, the length of a pathway was restricted to at most three steps longer than the shortest possible pathway from glycerol to carbon monoxide or 1,2 propane diol. Activation barriers of C-C scission steps were also constrained to be less than a stipulated upper bound of 120 kJ/mol. The length constraint prevents unnecessarily long pathways, for instance, pathways comprising several successive C-H formation and C-H scission steps that negate each other and in effect form cycles. This constraint can be relaxed if more pathways need to be explored. The constraint on the energy barrier of each reaction step prevents energetically unfavorable pathways from being listed.

7.4 Results

In this section, we discuss the pathways to form C-C or C-O scission products of glycerol identified by RING based on the queries described above. Pathways and associated energetics on platinum are discussed first, followed by a comparative analysis of these pathways across metals and alloys.

7.4.1 Platinum

Carbon monoxide formation.

As the upper bound on the maximum C-C scission barrier is increased, the number of identified pathways to form carbon monoxide grows rapidly before leveling off. Table 7.1 lists the number of pathways identified as a function of the upper bound on the highest activation barrier allowed in the query. Figure 7.2 shows a few of the pathways identified by RING and their activation barriers at 550 K. These pathways have several common characteristics that are consistent with experimental and theoretical studies. First, glycerol physisorbs onto the metal sites with a reaction enthalpy of about -37 kJ/mol (-43 kJ/mol at 298 K), which compares well with the DFT-predicted value of -0.46 eV (about -44 kJ/mol). Second, physisorbed glycerol undergoes successive C-H/O-H scission (dehydrogenation) steps prior to C-C scission. Chen et al. [224] and Liu and Greeley [221] show that C-C scission is significantly less favorable than C-H scission (by as much as 0.5 eV in terms of free energy of activation) until glycerol has been dehydrogenated to lose three or more hydrogen atoms. Third, C-C scission steps involve one or more carbonyl carbons, as also observed in calculations reported by Chen et al. [224] and Liu and Greeley [221]. This is because carbonyl groups (and carbon monoxide) are stabilized by Platinum, thereby reducing the barrier for these reactions [221]. Fourth, the step with the highest activation energy in each pathway is either a C-H bond scission or a C-C bond scission, with activation barriers varying from 74 kJ/mol to over 100 kJ/mol (values shown in bold italics in Figure 7.2). Several pathways have similar energetics and are therefore all likely to be significant. Chen et al. [224] showed, using a combination of DFT and semi-empirical methods, that multiple low energy pathways for glycerol decomposition can potentially exist, each involving steps with barriers of less than 0.75 eV.
The activation barriers of the C-H and C-C scission steps are lower than the desorption enthalpy of carbon monoxide of about 105 kJ/mol at 550 K (the DFT-predicted binding energy of carbon monoxide varies from -1.2 eV [242] to -1.8 eV [241] depending on the functionals used). At reaction conditions, surface carbon monoxide will tend to equilibrate with gas-phase carbon monoxide [236, 217], and C-H/C-C bond scission steps will be rate limiting. However, under conditions where carbon monoxide adsorption is not equilibrated, its production from glycerol will be desorption limited. Indeed, TPD studies of glycerol at UHV conditions by Skoplyak et al. [231] show that the carbon monoxide peak corresponds to the pure CO TPD peak, indicating that CO desorption is the limiting step.

Figure 7.2: Representative pathways for carbon monoxide formation from glycerol on Pt. All values are in kJ/mol.

Lowest barrier pathway. Species 1 in Figure 7.2 results from O-H scission, and the activation barrier of this step (74 kJ/mol) is almost equal to that of the step with the highest barrier. Further, this value is much larger (by about 30 kJ/mol) than that of the first dehydrogenation step in the other pathways shown in the figure. This is because species 1 is less stable than the other C3H7O3* species; as a result, the step forming it has a less exothermic heat of reaction and, hence, a larger activation energy. Chen et al. [224], when considering the stability of possible C3H7O3* species, reached the same conclusion and consequently excluded this species from further analysis, in effect assuming that dominant pathways tend to involve relatively stable intermediates. That is, the pathway shown in Figure 7.2 involving species 1 was not considered by them. Interestingly, however, several pathways involving species 1 have a C-C scission step as the highest barrier step (only marginally higher than the dehydrogenation step forming species 1), and this barrier value is the lowest among all pathways§ found by RING. Thus, while it begins with a high barrier dehydrogenation step, this pathway has all subsequent steps with lower energy barriers than the other pathways; its energetics can now be re-estimated with DFT to verify whether it is indeed a low energy pathway. Upon including an additional constraint in the pathway queries that excludes species 1 (formed by O-H scission) as an intermediate in any pathway, the routes identified by RING no longer include the above pathway, and the lowest energy route now has barriers of at most 77 kJ/mol (the pathway is shown in Figure 7.3). This shows that RING can be used to exhaustively identify energetically favorable pathways based on semi-empirical correlations, which can then be re-evaluated using a higher level of theory such as DFT.

Table 7.1: Number of identified CO formation pathways on Pt in RING pathway queries as a function of the upper bound on activation barrier specified. All energy values are in kJ/mol.

Activation barrier | Number of pathways
≤ 80 | 6
≤ 90 | 54
≤ 100 | 98
≤ 120 | 100
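The pathway queries and the lowest-barrier criterion used above can be sketched on a toy network. This is a minimal Python illustration under stated assumptions, not RING's query engine: the network, species labels, and barriers are hypothetical, and the enumeration is a plain depth-first search over simple paths (no repeated species) that applies the two constraints from Section 7.3.3, a length limit of the shortest path plus three steps and an upper bound on C-C scission barriers, plus an optional set of excluded intermediates standing in for the exclusion of species 1. Pathways are ranked by their highest step barrier.

```python
from collections import deque

# Hypothetical toy network: species -> list of (product, rule, barrier in kJ/mol).
NETWORK = {
    "G": [("A", "O-H scission", 70.0), ("B", "C-H scission", 45.0)],
    "A": [("P", "C-C scission", 72.0)],
    "B": [("D", "O-H scission", 60.0)],
    "D": [("P", "C-C scission", 80.0), ("E", "C-H scission", 30.0)],
    "E": [("P", "C-C scission", 130.0)],
    "P": [],
}


def shortest_length(net, src, dst):
    """Breadth-first search for the minimum number of steps from src to dst."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for prod, _rule, _ea in net[node]:
            if prod not in dist:
                dist[prod] = dist[node] + 1
                queue.append(prod)
    raise ValueError(f"{dst} is not reachable from {src}")


def find_pathways(net, src, dst, extra_steps=3, cc_bound=120.0, excluded=frozenset()):
    """Depth-first enumeration of simple pathways obeying the length and C-C limits."""
    limit = shortest_length(net, src, dst) + extra_steps
    found = []

    def dfs(node, path, visited):
        if node == dst:
            found.append(list(path))
            return
        if len(path) == limit:
            return
        for prod, rule, ea in net[node]:
            if prod in visited or prod in excluded:
                continue
            if rule == "C-C scission" and ea > cc_bound:
                continue
            visited.add(prod)
            path.append((node, prod, rule, ea))
            dfs(prod, path, visited)
            path.pop()
            visited.remove(prod)

    dfs(src, [], {src})
    return found


def highest_barrier(pathway):
    return max(ea for _, _, _, ea in pathway)


paths = find_pathways(NETWORK, "G", "P")
best = min(paths, key=highest_barrier)
print(len(paths), highest_barrier(best))  # 2 72.0
alt = find_pathways(NETWORK, "G", "P", excluded={"A"})
print(highest_barrier(min(alt, key=highest_barrier)))  # 80.0
```

Raising `cc_bound` admits more pathways, which is the same kind of trend Table 7.1 reports for the real network.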

Carbon monoxide mechanism. In addition to querying for pathways, RING allows for identifying direct or complete mechanisms, i.e., full catalytic cycles. Figure 7.4 shows a full cycle for glycerol decomposition to syngas. This cycle extends the pathway in Figure 7.3. We note that C-C scission of the C2 surface intermediate has the highest activation barrier in the cycle, at 125 kJ/mol. However, Sutton and Vlachos [225] and Salciccioli et al. [217] report a barrier of 40-50 kJ/mol for this step. We posit that this discrepancy between the reported DFT value and that calculated by RING is likely due to errors in the linear free energy relationship used and in the thermochemistry estimation relative to DFT values. Specifically, the difference between the RING-calculated reaction enthalpy for C-C scission of the C2 intermediate shown in Figure 7.4 and that reported by Salciccioli et al. [217] is about -20 kJ/mol, which leads to an overprediction of the activation barrier by about 15 kJ/mol; that the actual discrepancy in the barrier value is larger than this indicates that errors in the linear free energy correlation also partially account for the deviation. It should be noted that this argument implies that the C-C scission step in Figure 7.2 could also be overpredicted. The barrier of the next step, dehydrogenation of CH2OH*, is also higher than the DFT-predicted value (95 kJ/mol compared with about 43 kJ/mol calculated using DFT [236]). This may also be attributed to errors in the correlation and to the nature of the species included in the dataset used to obtain it, because the DFT-predicted enthalpy of reaction deviates from that calculated by RING by only ∼5 kJ/mol. Thus, while linear free energy relationships systematically reproduce DFT trends over a large and diverse set of reactions and species, they suffer from intrinsic errors arising from the simplifications that lead to these correlations.

Figure 7.3: Lowest energy pathway on Pt excluding the O-H scission step that forms species 1 in Figure 7.2. All values are in kJ/mol.

§By this, we imply that this barrier value is lower than the highest activation barrier of each of the other pathways.

Effects of CO poisoning. At a carbon monoxide surface coverage of 0.5, which is a conservative estimate considering that methanol decomposition on Platinum can have carbon monoxide coverages up to 0.9 [236], the destabilization energy (see Section 7.3.2) is ∼0.2 eV. This, in turn, affects the energy barriers of individual reaction steps. We include this value in RING as a molecular correction and observe that the barriers of C-H and C-C scission steps increase by about 25 and 15 kJ/mol, respectively. As a result, the pathway involving species 1 in Figure 7.2 now has the initial dehydrogenation step as the step with the highest barrier (98 kJ/mol); the barriers of the other pathways shown in the figure also shift up by 25 kJ/mol.

Figure 7.4: A complete reaction cycle forming carbon monoxide on Pt. The stoichiometrically equivalent number of hydrogen formation steps from H* is not shown. All values are in kJ/mol.

1,2 propane diol pathways.

Figure 7.5 shows some of the identified pathways to 1,2 propane diol. Glycerol first physisorbs onto the metal. Subsequently, the physisorbed species can either undergo direct C-O scission of the primary carbon-oxygen bond or undergo C-H scission prior to C-O scission. All pathways have C-H formation as the terminal step leading to the physisorbed diol. It can also be seen that the steps with the highest barrier are C-O scission steps. These pathways comprise the shortest routes to 1,2 propane diol and routes up to three steps longer. To investigate whether a more dehydrogenated intermediate could have a lower C-O scission barrier, we extracted (through a reaction query) all reactions involving C-O scission of an intermediate having three carbon and three oxygen atoms. We observed that the reaction converting species 2 to species 3 in Figure 7.5 has the lowest activation barrier among C-O scission reactions, while several other C-O scission steps involving more dehydrogenated species lie within 20 kJ/mol of this lowest barrier step. This C-O scission step of species 2 has a barrier comparable to the lowest C-C bond scission barrier of species 2 (about 135 kJ/mol). This is consistent with the finding by Liu and Greeley [222] that the C-O scission barrier of C3H8O3* (physisorbed glycerol) is slightly (0.1 eV) lower than that of C-C scission. Upon successive dehydrogenation, however, Liu and Greeley note that C-C scission barriers decrease significantly, while C-O scission free energy barriers remain roughly constant after decreasing by about 0.3 eV upon the first dehydrogenation step¶. Our observations above and the results of Liu and Greeley thus indicate that the C-O scission barriers of the pathways considered above are a good indicator of C-O scission barriers in the network as a whole.
The highest barriers along the pathways to 1,2 propane diol (≥ 135 kJ/mol) are more than 60 kJ/mol higher than that of the lowest energy pathway for carbon monoxide formation. This implies that C-C scission is favored over C-O scission on Platinum. Indeed, experimental studies of glycerol conversion on Platinum show that diols are not produced in any significant amount [229].

7.4.2 Other metals

As the binding energy of atomic carbon, oxygen, and hydrogen on other metals differs from that of Platinum, the energetics of each of the species and, therefore, the enthalpy and the activation barrier of each reaction can change. The linear scaling correlations discussed above can capture this trend and we present the results from our calculations below.

CO formation.

Table 7.2 lists the value of the highest barrier step of the three lowest energy pathways found by RING for carbon monoxide and 1,2 propane diol synthesis over Platinum (Pt), Palladium (Pd), Rhodium (Rh), and Ruthenium (Ru). Carbon monoxide formation barriers on the other metals are within about 10 kJ/mol (or 0.1 eV) of the values on Pt. This results from two factors. First, the lowest energy pathway is not the same across all metals. For example, the pathway involving species 1 in Figure 7.2 is the lowest energy pathway on Pt, while for the other three metals the pathway shown in Figure 7.6 has the lowest energy barrier. Second, the effect of differences in the binding energy of atomic carbon across these metals on activation barriers is, to an extent, negated by that of hydrogen. For example, while the bond dissociation energy of atomic C on Ru is 16 kJ/mol lower than on Pt (that is, the binding energy is less negative by 16 kJ/mol), the binding energy of the hydrogen atom on Rh is more negative by 14.5 kJ/mol based on the values used in this study. This means that the net change in the heat of reaction is small (≤ 0.2 eV) and, as a result, the activation barriers are similar. One exception is Rh, whose C and H binding energies are both more negative than those on Pt. This results in a ∼7 kJ/mol reduction in the activation barrier for forming carbon monoxide. Species 1 in Figure 7.2 is stabilized on surfaces with a higher affinity for O, such as Ru and Rh. Figure 7.7 shows the lowest energy pathway on Pt (involving species 1 of Figure 7.2) with the corresponding activation barriers on the other three metals. While the energetics on Pd are similar to those on Pt, the barrier values on Ru and Rh differ significantly. This is consistent with the much stronger binding of atomic O on Ru and Rh relative to Pt (by more than 75 kJ/mol).

¶Liu and Greeley [222] report free energies of activation with glycerol as the reference reactant and a stoichiometrically equivalent amount of hydrogen as one of the products. They therefore correct for the entropy gain of hydrogen evolution in the overall balanced reaction. As a result, the decrease in activation barrier upon dehydrogenation is even smaller, if not negligible.

Figure 7.5: Representative pathways for 1,2 propane diol formation from glycerol on Pt. All values are in kJ/mol.

Figure 7.6: Pathway to form carbon monoxide from glycerol with the lowest energy barrier on Pd, Rh, and Ru. All values are in kJ/mol. Steps I and VIII are non-activated.

Step | Pd | Rh | Ru
II | 42.89 | 30.31 | 38.99
III | 58.69 | 54.88 | 66.70
IV | 55.79 | 69.66 | 60.79
V | 61.05 | 48.47 | 57.14
VI | 43.33 | 57.21 | 48.34
VII | 86.07 | 67.29 | 79.71

Figure 7.7: Reaction pathway to form carbon monoxide with the lowest energy barrier on Pt and the corresponding energy barriers on Pd, Rh, and Ru. All values are in kJ/mol. Steps I and VIII are non-activated.

Step | Pt | Pd | Rh | Ru
II | 74.06 | 64.96 | 29.67 | 15.62
III | 65.32 | 58.72 | 46.14 | 54.82
IV | 53.59 | 31.97 | 86.47 | 104.45
V | 67.65 | 61.05 | 48.47 | 57.14
VI | 64.25 | 43.33 | 57.21 | 48.34
VII | 74.29 | 86.07 | 67.29 | 79.71

Table 7.2: Highest barriers of the low energy pathways to CO and 1,2 propane diol formation on different metals (in kJ/mol).

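A second decimal place is used where needed to distinguish similar values. For each metal, the three values in each column are the highest barriers of the three lowest energy pathways to that product.

Metal | 1,2 propane diol | CO
Pt | 134.9 | 74.3
Pt | 150.8 | 74.5
Pt | 155.7 | 76.7
Pd | 144.4 | 86.1
Pd | 159.6 | 87.3
Pd | 167.2 | 96.1
Rh | 104.9 | 69.7
Rh | 119.2 | 71.1
Rh | 126.1 | 77.0
Ru | 100.8 | 79.7
Ru | 114.1 | 79.74
Ru | 126.1 | 80.1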
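The metric used in Table 7.2 (the highest step barrier along a pathway, minimized over pathways) can be recomputed from the per-step barriers listed in Figures 7.6 and 7.7. The sketch below transcribes those values; the dictionary and function names are illustrative. Taking the lower of the two pathway maxima for each metal recovers, after rounding, the lowest CO entry of Table 7.2 for each metal.

```python
# Per-step barriers (kJ/mol) for steps II-VII, transcribed from Figures 7.6 and 7.7.
# Steps I and VIII are non-activated and therefore omitted.
FIG_7_7 = {  # lowest barrier pathway on Pt (via species 1)
    "Pt": [74.06, 65.32, 53.59, 67.65, 64.25, 74.29],
    "Pd": [64.96, 58.72, 31.97, 61.05, 43.33, 86.07],
    "Rh": [29.67, 46.14, 86.47, 48.47, 57.21, 67.29],
    "Ru": [15.62, 54.82, 104.45, 57.14, 48.34, 79.71],
}
FIG_7_6 = {  # lowest barrier pathway on Pd, Rh, and Ru
    "Pd": [42.89, 58.69, 55.79, 61.05, 43.33, 86.07],
    "Rh": [30.31, 54.88, 69.66, 48.47, 57.21, 67.29],
    "Ru": [38.99, 66.70, 60.79, 57.14, 48.34, 79.71],
}


def highest_barriers(figure):
    """Highest step barrier of the pathway on each metal."""
    return {metal: max(steps) for metal, steps in figure.items()}


def lowest_over_pathways(*figures):
    """For each metal, the smallest highest-barrier value among the given pathways."""
    best = {}
    for fig in figures:
        for metal, value in highest_barriers(fig).items():
            best[metal] = min(value, best.get(metal, float("inf")))
    return best


best = lowest_over_pathways(FIG_7_7, FIG_7_6)
print({metal: round(v, 1) for metal, v in best.items()})
# {'Pt': 74.3, 'Pd': 86.1, 'Rh': 69.7, 'Ru': 79.7}
```

The same calculation also shows why the species 1 pathway loses its advantage on Rh and Ru: its highest barrier there is step IV (86.47 and 104.45 kJ/mol).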

1,2 propane diol formation.

The energetics of the low energy pathways to 1,2 propane diol, on the other hand, vary significantly across the four metals. Specifically, Ru and Rh both have pathways whose highest barriers are about 30 kJ/mol lower than those on Pt and Pd. This, again, can be attributed to the O binding energies on these four metals. Also, the pathway involving the reaction of species 2 forming species 3 in Figure 7.5 has the lowest energetics across all metals. These trends indicate that Pt and Pd are expected to be more selective towards syngas production, while Ru and Rh are comparatively better suited for glycerol hydrogenolysis to propane diol. This is consistent with the computational results obtained by Liu and Greeley, who predict that Pt(111) and Pd(111) are more suited for C-C scission products than Rh(111). Experimental glycerol hydrogenolysis studies have shown that, at neutral pH conditions, Ru is more active towards the synthesis of diols than Pt [233].

Energetics on an ideal ethylene glycol catalyst.

Salciccioli et al. [217] identified the characteristics (binding energies of C, H, and O) of an optimal ethylene glycol (EG) decomposition catalyst that maximizes the TOF to hydrogen, using first-principles-based kinetic modeling and scaling relationships. Using the binding energies reported by them, we found that the lowest energy pathway to CO formation on this catalyst has an activation barrier of 100.5 kJ/mol, which is, in fact, higher than that on Platinum, implying that this ideal catalyst may not be suitable for glycerol decomposition. The higher activation barrier on the ideal EG catalyst arises because the binding energy of C on this catalyst is 80 kJ/mol lower than on Platinum. As a result, carbonyl groups are less stabilized, which reduces the exothermicity of the C-C scission step and thus yields a higher C-C scission barrier than on Platinum. On the other hand, the lower C binding energy also implies that carbon monoxide coverage effects are expected to be less destabilizing and that CO coverages will tend to be lower than on Platinum. This means that the coverage-induced increase in the activation barrier of this C-C scission step will be larger on Platinum than on the ideal EG catalyst. This may in turn mean that the activation barriers under actual conditions are lower on the ideal EG catalyst, making it a better decomposition catalyst than Pt. Indeed, the activation barriers on Platinum with a moderate level of CO poisoning (about 98 kJ/mol; Section 7.4.1) are comparable to the barriers of glycerol decomposition on this catalyst. A similar effect has been reported by Simonetti et al. [229] to account for rhenium (Re) promotion of Pt catalysts for glycerol decomposition, wherein Re reduces the CO binding energy on the catalyst and thus reduces its coverage on the surface [243].
Their simplified kinetic model for glycerol decomposition shows that the rate of syngas production on the Pt-Re catalyst is higher than on Pt, even though the lumped kinetic rate constant of C-C scission is about eight times lower on Pt-Re than on Pt at reaction conditions of about 550 K and 1 atm, because the equilibrium constant of CO adsorption on Pt-Re and, hence, its surface coverage are both lower by an order of magnitude than on Pt.
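Why a weaker CO adsorption can outweigh a smaller rate constant can be illustrated with a generic Langmuir site-blocking sketch. This is not the Simonetti et al. model: the rate law r = k θ_vacant, the function names, and the values K·p_CO = 100 on the reference catalyst and 10 on the promoted one are assumptions chosen only for illustration. When CO occupies most sites, the vacant-site fraction is roughly 1/(K·p_CO), so a tenfold smaller K raises it nearly tenfold.

```python
def vacant_fraction(kp_co):
    """Langmuir vacant-site fraction when CO (with K*p = kp_co) is the only adsorbate."""
    return 1.0 / (1.0 + kp_co)


def rate_ratio(k_ratio, kp_ref, kp_new):
    """Rate ratio (new/ref) for a hypothetical rate law r = k * theta_vacant."""
    return k_ratio * vacant_fraction(kp_new) / vacant_fraction(kp_ref)


# Illustrative numbers: rate constant 8x lower, CO adsorption constant 10x lower.
ratio = rate_ratio(k_ratio=1 / 8, kp_ref=100.0, kp_new=10.0)
print(round(ratio, 2))  # 1.15
```

With these assumed numbers the promoted catalyst is still slightly faster; if CO coverage were low on both catalysts, the eightfold smaller rate constant would dominate instead.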

Other linear scaling correlations.

We also used the modified linear scaling parameters proposed by Sutton and Vlachos [227] for oxygenates as an alternative method for estimating the thermochemistry and activation barriers of each step in the network on other metals. These calculations lead to trends similar to those in Table 7.2. Thus, RING can be used to identify energetically feasible pathways and catalytic cycles, and thereby to provide insights into the dominant mechanisms in a complex reaction network such as glycerol conversion on different transition metal surfaces (Pt, Rh, Ru, and Pd) at different temperatures and coverage conditions, using available semi-empirical correlations.

7.5 Discussion

The method presented here has several characteristics that allow it to be applied to complex catalytic reaction networks beyond the current case study. The method is automated, fast, and scalable. A large network with thousands of reactions can be generated and analyzed in a few minutes to a few hours, depending on the pathway query and the size of the network. This means that the semi-empirical methods developed by the computational catalysis community can now be applied to much more complex reaction networks. Furthermore, because RING ensures exhaustiveness in network generation and pathway identification, and the semi-empirical methods used are reasonably accurate (as discussed earlier), the overall method is reliable in identifying the critical pathways or mechanisms. Once likely dominant pathways or reaction cycles have been identified using RING, more rigorous DFT calculations can be performed on them, if desired, to obtain a more accurate picture of the reaction mechanism. The method is flexible in how different semi-empirical correlations can be used for calculating thermochemistry or activation barriers. RING accepts thermochemistry inputs in the form of (a) group contributions, (b) linear correlations based on molecule size, (c) individual specifications, or (d) combinations of these. Activation barriers can be prescribed as (a) either of the two forms of linear free energy correlations, (b) individual specifications, or (c) a combination of them. Additional complexity can further be introduced in the chemistry or in the energetics calculation. For example, the co-adsorption effects of alcohols and polyols discussed by Sautet and coworkers on Rhodium(111) [244] can be introduced either in the chemistry rules or as a correction in the thermochemistry calculation. The method is generic.
RING can be used to generate and analyze mechanisms of complex reaction networks in homogeneous and heterogeneous acid, base, and metal catalysis. Previously [136], we have reported network generation and querying of glycerol dehydration catalyzed by the Brønsted acid catalyst HZSM-5, which was studied experimentally by Corma et al. [6]. If semi-empirical methods are available to calculate the thermochemistry and activation barriers, a similar analysis can be pursued. Semi-empirical correlations for the physisorption and chemisorption enthalpies of alkanes and alkenes in zeolites exist [245, 246] and can be used directly with the current input specification scheme in RING. The proposed method relies on generating the reaction network and calculating the thermochemical quantities and activation barriers of each step. If the pre-exponential factor of each step is available or can be estimated, a complete microkinetic model can be formulated and solved to obtain a quantitative description of the system. Thus, our method is a first step towards developing detailed kinetic models for complex reaction networks.

CHAPTER 8

Kinetic modeling of MTH

In this chapter, kinetic modeling with RING is presented in context of an example system of methanol conversion to hydrocarbons (MTH).

8.1 Introduction

MTH is a process by which methane and biomass, once converted into methanol (or dimethyl ether (DME)), can be transformed into gasoline-range fuels and chemicals such as propylene and aromatics. It has been shown that chain growth of the C1 feed does not proceed through direct C-C bond formation but through an indirect route involving a hydrocarbon pool of olefins and aromatics that acts as a co-catalyst [247, 248, 249]. Ilias and Bhan [250] have further shown that co-feeding olefins such as propylene or aromatics such as toluene seeds the hydrocarbon pool so as to tune the selectivity towards predominantly olefinic or aromatic products, respectively. In this chapter, a kinetic model is developed to quantitatively capture the mechanism of this system.

8.2 Kinetic modeling: Procedure

In this section, the systematic procedure for kinetic modeling adopted in this study is discussed. RING has been used to construct the reaction network from elementary steps, lump the network based on experimental observations, estimate thermochemistry, calculate kinetic parameters, and solve the resultant lumped kinetic model.

8.2.1 Reaction rules

The first step in kinetic modeling is the construction of the reaction network, which requires reaction rules to be specified in RING. Ilias and Bhan [250] have presented a comprehensive review of the chemistry in MTH. The reaction network can broadly be divided into an aromatic-based and an olefin-based cycle. The elementary steps occurring in the olefin cycle are given in Table 8.1, along with a pictorial representation of a representative reaction. These steps cover chain growth (methylation and oligomerization), alcohol dehydration, cracking, hydride transfer to form alkanes, isomerization (hydride and methyl shifts), and cyclization. The elementary steps in the aromatics cycle, given in Table 8.2, cover methylation of aromatics, dealkylation to form olefins, hydride transfer of cyclic hydrocarbons leading to aromatics, ring expansion and contraction, etc. Note that the two cycles interact with each other through hydride transfer, cyclization, and dealkylation.

Table 8.1: Reaction rules for the olefin cycle in MTH chemistry. Reverse rules are also included in the chemistry.

Reaction rules (the illustrative example structures of the original table are not reproduced):
• Olefin adsorption
• Methylation
• Oligomerization (& beta-scission)
• Hydride transfer
• Hydride transfer with rings
• DME adsorption
• Methanol dehydration
• Cyclization
• 1,5 cyclization (alkene)
• 1,5 cyclization (diene)

Table 8.2: Reaction rules for the aromatics cycle in MTH chemistry. Reverse rules are also included in the chemistry.

Reaction rules (the illustrative example structures of the original table are not reproduced):
• Aromatics adsorption
• Aromatics methylation
• Allyl shift (& beta-scission)
• Ring expansion
• Methyl shift
• Alpha scission

8.2.2 Thermochemistry

Thermochemical quantities of gaseous species have been calculated using the group additivity method discussed in previous chapters. For surface species (alkoxides), the thermochemistry is calculated using linear correlations developed from computational chemistry calculations reported in the literature [245], which can, in turn, be used to provide group additivity rules. Chemisorption enthalpies and entropies of linear and branched alkenes have been reported by Marin and coworkers [245] using a hybrid QM/MM method. The chemisorption thermochemistry of an alkene is the enthalpy (or entropy) change of a reaction of the type given in reaction 8.1.

Olefin + Brønsted site → Alkoxide (8.1)

Linear correlations have been developed for chemisorption enthalpies and entropies as a function of the carbon number of the alkene. These correlations can be used to calculate the enthalpy and entropy of formation of a surface species (alkoxide) by calculating the corresponding thermochemical quantity of the olefin using group additivity and adding a correction that depends on the carbon number. Implicit in this calculation is the assumption that the enthalpy and entropy of the acid site are set to zero. Since the thermochemistry is used mainly for estimating equilibrium constants of reactions (for thermochemical consistency), this assumption is valid: every elementary step conserves the number of acid sites, so the reference value of the site cancels in reaction enthalpies and entropies. However, this approach cannot be implemented directly within a group additivity framework, for two reasons. First, not all alkoxides have a corresponding alkene related by reaction 8.1; for example, alkoxides without an alpha-hydrogen (with respect to the alkoxide C) cannot desorb to form an alkene. Second, an alkoxide can potentially be formed from two or more different alkenes. To recast these correlations in a form that can be used directly within a group additivity framework, the computational chemistry calculations were used to estimate the enthalpy change of hydride transfer reactions of the type given in reaction 8.2, and similar linear correlations were developed. Figure 8.1 shows the linear trends of the enthalpy and entropy differences between alkoxides and their corresponding alkanes with the carbon number of the alkane, for different types of alkoxides. This difference equals the enthalpy (or entropy) change of the hydride transfer reaction shifted by a constant equal to the enthalpy (or entropy) of hydrogen. These linear correlations allow the thermochemistry of any alkoxide to be calculated by referencing it to a unique alkane, formed by replacing the bond between the alkoxide carbon and the zeolite oxygen with a carbon-hydrogen bond. For the surface methyl group, the heat of formation is taken from the reaction enthalpy of methanol dehydration reported by Mazar et al. [251]; the entropy of formation was calculated assuming that the methyl radical loses its translational entropy upon forming the methoxide species.

Alkane + Brønsted site → Alkoxide + Hydrogen (8.2)
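The referencing scheme above, together with its use for thermochemically consistent reverse rate constants, can be sketched as follows. This is a minimal illustration, not RING's implementation: the correlation coefficients in `CORRELATION` are hypothetical placeholders (the fitted values of Figure 8.1 are not reproduced here), the alkane and olefin property values in the usage lines are illustrative inputs only, and all function names are assumptions. The acid site is assigned zero enthalpy and entropy, as in the text.

```python
import math

R = 8.314  # gas constant, J/(mol K)

# HYPOTHETICAL coefficients (a, b) for property(alkoxide) - property(alkane)
# = a + b * n_carbon; placeholders only, NOT the fitted values of Figure 8.1.
# Enthalpy terms in kJ/mol, entropy terms in J/(mol K).
CORRELATION = {
    "primary": {"dH": (-10.0, -2.0), "dS": (-120.0, -5.0)},
    "secondary": {"dH": (-5.0, -2.0), "dS": (-125.0, -5.0)},
    "tertiary": {"dH": (0.0, -2.0), "dS": (-130.0, -5.0)},
}


def alkoxide_thermo(h_alkane, s_alkane, n_carbon, alkoxide_type):
    """Enthalpy (kJ/mol) and entropy (J/(mol K)) of an alkoxide, referenced to
    the alkane obtained by replacing its C-O(zeolite) bond with a C-H bond.
    The acid site itself is assigned zero enthalpy and entropy."""
    a_h, b_h = CORRELATION[alkoxide_type]["dH"]
    a_s, b_s = CORRELATION[alkoxide_type]["dS"]
    return h_alkane + a_h + b_h * n_carbon, s_alkane + a_s + b_s * n_carbon


def equilibrium_constant(dh_kj, ds_j, temperature):
    """K = exp(dS/R - dH/(R T)) from the reaction enthalpy and entropy."""
    return math.exp(ds_j / R - 1000.0 * dh_kj / (R * temperature))


def reverse_rate_constant(k_forward, dh_kj, ds_j, temperature):
    """Thermodynamically consistent reverse rate constant, k_r = k_f / K."""
    return k_forward / equilibrium_constant(dh_kj, ds_j, temperature)


# Illustrative inputs only: reference 2-propoxide to propane, then evaluate
# propene(g) + site -> 2-propoxide, with the site contributing zero.
h_ox, s_ox = alkoxide_thermo(-104.7, 270.2, 3, "secondary")
dh_ads = h_ox - 20.4
ds_ads = s_ox - 267.0
k_des = reverse_rate_constant(2.66e4, dh_ads, ds_ads, 624.0)
```

Because the reverse constant is always derived from the forward one through K, any change to a forward rate constant automatically propagates to its reverse step, which is the property the degree-of-rate-control discussion in Section 8.3 relies on.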

8.2.3 Kinetic parameters

Kinetic parameters for this study have been taken from the literature. Table 8.3 contains all the kinetic parameters used in this study and the corresponding literature source.

Table 8.3: Kinetic parameters used in modeling MTH chemistry. A is the pre-exponential factor and Ea the activation barrier.

Kinetic information

(i) Olefin adsorption
Adsorption to primary alkoxide [252]:
  Ethylene: A = 2.67e4 1/atm/s, Ea = 54 kJ/mol
  Propylene: A = 7.11e3 1/atm/s, Ea = 36 kJ/mol
  Butene (and higher olefins): A = 1.32e3 1/atm/s, Ea = 34 kJ/mol
Adsorption to secondary alkoxide [252, 253]:
  Propylene: A = 2.66e4 1/atm/s, Ea = -20 kJ/mol
  Butene: A = 4.939e3 1/atm/s, Ea = -22 kJ/mol
  Pentene (and higher olefins): A = 5.57e3 1/atm/s, Ea = -34 kJ/mol, n = 0.0
Adsorption to tertiary alkoxide [252]:
  Butene (and higher olefins): A = 6.8e4 1/atm/s, Ea = -59 kJ/mol, n = 0.0

Remark: The activation barriers and pre-exponential factors are apparent values: the intrinsic activation barriers and pre-exponential factors given by Nguyen et al. [252] and van de Runstraat et al. [253] are corrected with olefin physisorption enthalpies and entropies [245]. Because these are apparent values, the activation barriers can be negative. The kinetics of the reverse rule, desorption to form olefins, are calculated from the thermochemistry of olefin adsorption.

(ii) Aromatics adsorption
  Benzene: A = 5.4579e6 1/atm/s, Ea = -14.7 kJ/mol
  Toluene: A = 4.28e5 1/atm/s, Ea = -37.7 kJ/mol
  Xylene: A = 4.28e5 1/atm/s, Ea = -37.7 kJ/mol
  Trimethylbenzene and higher methylbenzenes: A = 1.2253e6 1/atm/s, Ea = -28.2 kJ/mol

Remark: The intrinsic activation barriers and pre-exponential factors taken from [254] are corrected by the physisorption enthalpies and entropies of aromatics. Physisorption enthalpies are taken from Brogaard et al. [255], while entropies are calculated with the correlation reported by Nguyen et al. [245] that relates physisorption entropy to physisorption enthalpy for olefins.

(iii) Dimethyl ether adsorption and methanol dehydration
  DME adsorption: A = 729.48 1/atm/s, Ea = 29.6 kJ/mol
  Methanol dehydration: A = 1.59e4 1/atm/s, Ea = 22.7 kJ/mol
Remark: The apparent activation barriers and pre-exponential factors are taken from Lesthaeghe et al. [248]. The authors report that the presence of water can stabilize the intermediates and reduce the activation barriers; the values given above are for this case.

(iv) Oligomerization (including methylation)
Methylation [256, 257]:
  Ethylene: A = 1.96e8 1/atm/s, Ea = 94 kJ/mol
  Propylene: A = 2.9e5 1/atm/s, Ea = 61 kJ/mol
  1-Butene: A = 3.84e4 1/atm/s, Ea = 44 kJ/mol
  2-Butene: A = 8.4e4 1/atm/s, Ea = 49 kJ/mol
  Isobutene and higher alkenes: A = 6.8e7 1/atm/s, Ea = 56 kJ/mol

Remark: The kinetics of other oligomerization reactions are calculated as the reverse of the corresponding beta-scission steps.

(v) Beta-scission∗
  Primary ←→ Secondary: A = 6.6e10 1/s, Ea = 124.6 kJ/mol
  Primary ←→ Primary: A = 6.6e10 1/s, Ea = 129.2 kJ/mol
  Secondary ←→ Secondary: A = 6.6e10 1/s, Ea = 104.6 kJ/mol
  Primary ←→ Tertiary: A = 6.6e10 1/s, Ea = 97.2 kJ/mol
Remark: Kinetics taken from Bhan et al. [17].

(vi) Cyclization
  C6: A = 1.59e13 1/s, Ea = 67.1 kJ/mol
  C7 & C8: A = 1.59e13 1/s, Ea = 38.1 kJ/mol

Remark: The kinetics of cyclization of C6 species are taken from Joshi and Thomson [138]; values for higher alkenes are taken from Joshi and Thomson [139].

(vii) 1,5 Alkene and diene cyclization
  Alkene: A = 1.24e12 1/s, Ea = 28.4 kJ/mol
  Diene: A = 2.71e14 1/s, Ea = 159.5 kJ/mol, n = 0.0
Remark: Kinetics are taken from Vandichel et al. [258]. Including 1,5 cyclization of alkenes through the quoted low-energy step was found to produce a large amount of cycloalkanes that do not subsequently aromatize, because ring expansion and hydride transfer are not fast enough. Cycloalkanes are not observed experimentally; hence, this step is ignored in this study.

∗This has been taken from unpublished data of Cha-Jung Chen, Department of Chemical Engineering & Materials Science, University of Minnesota.

(viii) Ring expansion
  A = 3.3e12 1/s, Ea = 51.3 kJ/mol
  Gem-methylated species: A = 1.5e11 1/s, Ea = 112.6 kJ/mol
Remark: These values are taken from McCann et al. [254]. Ring contraction is assumed to be the reverse of ring expansion.

(ix) Aromatics methylation
Gem-methylation leading to tertiary carbenium ion:
  Xylene: A = 1.26e4 1/atm/s, Ea = 20.2 kJ/mol
  Trimethylbenzene: A = 3.22e4 1/atm/s, Ea = 28.7 kJ/mol
  Tetramethylbenzene: A = 4.7e4 1/atm/s, Ea = 42.1 kJ/mol
  Pentamethylbenzene: A = 2.76e6 1/atm/s, Ea = 79.6 kJ/mol
  Hexamethylbenzene: A = 1.75e6 1/atm/s, Ea = 117.7 kJ/mol

Gem-methylation leading to secondary carbenium ion:
  Toluene: A = 1.12e4 1/atm/s, Ea = 49.5 kJ/mol
  Xylene: A = 1.26e4 1/atm/s, Ea = 36.3 kJ/mol
  Trimethylbenzene: A = 3.22e4 1/atm/s, Ea = 24.6 kJ/mol
  Tetramethylbenzene: A = 4.36e4 1/atm/s, Ea = 12.9 kJ/mol
  Pentamethylbenzene: A = 2.76e6 1/atm/s, Ea = 40.8 kJ/mol
  Hexamethylbenzene: A = 1.75e6 1/atm/s, Ea = 117.7 kJ/mol
Methylation of aromatics:
  Benzene: A = 1.44e6 1/atm/s, Ea = 58 kJ/mol
  Toluene: A = 5.12e4*5 1/atm/s, Ea = 52 kJ/mol
  o-Xylene: A = 4.48e5*4 1/atm/s, Ea = 62 kJ/mol
  p-Xylene: A = 1.04e5*4 1/atm/s, Ea = 62 kJ/mol
  Trimethylbenzene and higher: A = 4.64e4 1/atm/s, Ea = 65.4 kJ/mol
Remark: For gem-methylation, values are taken from Lesthaeghe et al. [259], McCann et al. [254], and Lesthaeghe et al. [260]. The apparent values are calculated by adding appropriate physisorption entropy and enthalpy corrections. Methylation values are from unpublished data of Ian Hill, Department of Chemical Engineering & Materials Science, University of Minnesota. Demethylation is taken to be the reverse in all cases.

(x) Hydride shift, methyl shift, and ring allyl shift
Remark: These are assumed to be rapid.

(xi) Hydride transfer
  Any ←→ Secondary: A = 1.6e6 1/atm/s, Ea = 86.2 kJ/mol
  Any ←→ Primary: A = 1.6e6 1/atm/s, Ea = 123.4 kJ/mol
  Primary/Secondary ←→ Tertiary/allylic: A = 1.6e6 1/atm/s, Ea = 69.5 kJ/mol
  Tertiary ←→ Tertiary/allylic: A = 1.6e6 1/atm/s, Ea = 90.0 kJ/mol
  Reactant is surface methyl: A = 1.6e6 1/atm/s, Ea = 116.2 kJ/mol
Remark: All values are taken from Bhan et al. [17]. A 30 kJ/mol penalty over the reference value of 86.2 kJ/mol is applied if a surface methyl group is the hydride acceptor (giving the 116.2 kJ/mol listed above). This prevents the formation of methane, which is not observed experimentally.
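The parameters in Table 8.3 enter the model through Arrhenius-type rate constants. The sketch below shows, under stated assumptions, how an apparent rate constant follows from intrinsic parameters and a physisorption pre-equilibrium, why a negative apparent barrier makes k decrease with temperature, and how the hydride-transfer penalty reproduces the tabulated 116.2 kJ/mol. It assumes the simple form k = A exp(-Ea/RT) (the temperature exponent n listed for some entries is ignored) and the correction k_app = K_phys k_int; all function names are hypothetical, not RING's API.

```python
import math

R_KJ = 8.314e-3  # gas constant, kJ/(mol K)


def rate_constant(a, ea_kj, temperature):
    """Arrhenius form k = A exp(-Ea / (R T)); Ea may be negative for apparent values."""
    return a * math.exp(-ea_kj / (R_KJ * temperature))


def apparent_parameters(a_int, ea_int_kj, dh_phys_kj, ds_phys_j):
    """Fold a physisorption pre-equilibrium into intrinsic parameters.

    Assumes k_app = K_phys * k_int with K_phys = exp(dS/R) * exp(-dH/(R T)),
    which gives A_app = A_int * exp(dS_phys / R) and Ea_app = Ea_int + dH_phys.
    """
    a_app = a_int * math.exp(ds_phys_j / (1000.0 * R_KJ))
    return a_app, ea_int_kj + dh_phys_kj


def hydride_transfer_barrier(reference_ea_kj, methyl_acceptor, penalty_kj=30.0):
    """Add the 30 kJ/mol penalty when a surface methyl group accepts the hydride."""
    return reference_ea_kj + (penalty_kj if methyl_acceptor else 0.0)


# Propylene adsorption to a secondary alkoxide, Table 8.3(i): Ea_app < 0, so
# the apparent rate constant decreases as the temperature rises.
k_624 = rate_constant(2.66e4, -20.0, 624.0)
k_700 = rate_constant(2.66e4, -20.0, 700.0)
```

A negative apparent barrier simply reflects an exothermic physisorption step whose enthalpy exceeds the intrinsic barrier in magnitude; the intrinsic step itself still has a positive barrier.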

8.2.4 Network generation and lumping

The reaction network generation and lumping procedure is described in the steps below:

* Figure 8.1: Linear correlation of thermochemistry with carbon number. (a) Enthalpy of alkoxide - Enthalpy of alkane in kJ/mol at 298 K; (b) Entropy of alkoxide - Entropy of alkane in J/mol/K at 298 K

1. Generate the reaction network using the reaction rules given above. Assume encapsulation for hydride transfer (see Chapter 5).

2. Lump the network such that:

• Functional lumping is applied to all molecules except aromatics, for which it is applied only if they are larger than C8; this ensures that the xylenes are differentiated,
• Paraffins, olefins, naphthenics, and aromatics are lumped for C6 (C8 for aromatics) and higher, or if they are branched, and
• Surface intermediates are lumped according to the nature of their alkoxide bond (primary, secondary, or tertiary) and size (C6 and higher, or if they are branched).

3. Reconstruct the lumped network to obtain the complete set of lumped hydride transfer reactions.

This procedure limits, as far as possible, the combinatorial explosion that is likely associated with hydride transfer. The resulting network has about 290 lumped species and 4000 lumped reactions, whereas the complete network is estimated to include more than 60,000 species and more than one million reactions.
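Lumping rules of the kind listed in step 2 can be thought of as a key function that maps every species to the lump it belongs to. The sketch below is illustrative only: the `Species` record, the lump names, and grouping by family plus carbon number are assumptions, aromatics are deliberately left unlumped to keep it short, and RING itself applies its rules to molecular graphs rather than to hand-written records.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Species:
    """Hypothetical minimal species record (not RING's data model)."""
    name: str
    family: str                 # "paraffin", "olefin", "naphthene", "aromatic", "alkoxide"
    n_carbon: int
    branched: bool = False
    bond: Optional[str] = None  # "primary", "secondary", or "tertiary" for alkoxides


def lump_key(sp: Species) -> str:
    """Return the lump a species belongs to, or its own name if kept explicit."""
    large_or_branched = sp.n_carbon >= 6 or sp.branched
    if sp.family in ("paraffin", "olefin", "naphthene") and large_or_branched:
        return f"{sp.family}-C{sp.n_carbon}"
    if sp.family == "alkoxide" and large_or_branched:
        return f"{sp.bond}-alkoxide-C{sp.n_carbon}"
    return sp.name  # small linear species (and, in this sketch, all aromatics)


def lump(species: List[Species]) -> Dict[str, List[str]]:
    """Group species names by their lump key, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sp in species:
        groups[lump_key(sp)].append(sp.name)
    return dict(groups)


SPECIES = [
    Species("2-methylpentane", "paraffin", 6, branched=True),
    Species("3-methylpentane", "paraffin", 6, branched=True),
    Species("1-butene", "olefin", 4),
    Species("isobutene", "olefin", 4, branched=True),
    Species("tert-butoxide", "alkoxide", 4, branched=True, bond="tertiary"),
]
LUMPS = lump(SPECIES)
```

Here the two methylpentanes collapse into one `paraffin-C6` lump, while linear 1-butene stays explicit; reactions between lumps are then reconstructed from the reactions between their members, as in step 3.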

8.3 Results

Table 8.4 compares the predictions of this model with sample experimental data†. The model captures the DME conversion to within a factor of 2 (13% predicted versus 26% observed). Experimentally, a significant amount of methanol is obtained, which is not fully accounted for in the current model. This is possibly due to hydration of the surface methyl groups, a reaction that was not considered explicitly in this model. Experiments do not report water in the effluent; hence, a direct comparison of the predicted water yield with experimental data is not possible. Accounting for this reaction would likely lead to a more accurate value of conversion.

Both model and experimental results indicate that the selectivity to aliphatic C2-C9 compounds is higher than that to aromatics. Ilias and Bhan [250] postulate that this is because the hydrocarbon pool in the system is preferentially seeded with propene, an olefin, which increases the rate of propagation of the olefin cycle. However, the model predicts larger amounts of light products such as C2 and C3 alkanes and alkenes than are observed experimentally. This is likely due to large beta-scission rate constants, which promote cracking of larger aliphatics. RING also predicts larger yields of aromatics than those observed experimentally; this suggests that cyclization and hydride transfer rates at this temperature are lower than assumed.

†This is unpublished data of Rachit Khare, Department of Chemical Engineering & Materials Science, University of Minnesota.

Table 8.4: Comparison of MTH model predictions with experimental data

Experimental conditions: temperature 624 K, pressure 135 kPa

Parameter                     Model prediction   Experimental data
Overall DME conversion‡       13%                26%
Selectivity (C%)
  C2                          41                 13
  C3                          31                 10
  C4                          1.7                17.4
  C5                          1.4                12.4
  C6                          0.5                12.1
  C7                          0.05               10
  C8                          0.02               7.3
  C9                          0.06               10.6
  Benzene                     0.44               ~0
  Toluene                     10.8               0.45
  Xylenes                     7.5                3.7
  Trimethylbenzene            0.7                1.5
  Tetramethylbenzene          0.06               0.7
  Pentamethylbenzene          0.23               0.02
  Hexamethylbenzene           3.6                0.03

Figure 8.2 shows the profile of the DME flow rate along the catalyst bed. The rate of DME consumption is fairly constant along most of the reactor: the initial rate of DME conversion is about 0.15 mol DME/(mol acid sites·s), and after the first 10% of the bed it stabilizes at about 0.25 mol DME/(mol acid sites·s). However, the concentrations of the surface species change significantly in the first 10% of the bed, after which they are fairly constant, as shown in Figure 8.3. Initially, the bed is mostly covered by surface methyl groups; subsequently, ethoxides and propoxides dominate the surface. As shown in Figure 8.4, the concentration of propylene remains constant initially and subsequently increases slightly. The ethylene flow rate, on the other hand, increases continuously along the bed, thereby increasing the hydrocarbon pool concentration along the bed.
Toluene, an aromatic product, has a near-zero production rate initially, and its flow rate increases continuously along the reactor. This is characteristic of secondary products: toluene is formed by successive methylation of propylene and ethylene to form large aliphatics that cyclize and then aromatize by participating in hydride transfer steps. Table 8.5 shows degree of rate control (DORC) values (see Chapter 4 for the definition) for various gas-phase species in the reactor. For DME, the rate constant with the highest DORC is that of DME adsorption (see Table 8.3(iii)). This DORC value increases from about 0.1 to 0.6 over the first 10% of the bed, coinciding with the sharp change in the concentrations of the surface species. This means that the concentration of the surface methyl species is less sensitive to this kinetic parameter near the reactor inlet than in the later part of the reactor. As DME conversion increases further, the DORC value corresponding to the adsorption rate constant decreases. Because this DORC value is less than 1, there is no single rate-determining step for DME consumption in the conventional sense; nevertheless, adsorption is the most significant step in DME conversion.

* Figure 8.2: The flow rate profile of DME along the catalyst bed.

* Figure 8.3: Concentration profile of surface alkoxides along the catalytic bed. Concentrations are in mmol/g.cat. The total acid site concentration is 0.387 mmol/g.cat for an Si:Al ratio of 42.

* Figure 8.4: The flow rate profile of a representative set of hydrocarbons along the catalyst bed.

The DORC values for ethylene, shown in Table 8.5, suggest that DME adsorption also affects ethylene production. This is because DME adsorption determines the consumption of DME, which in turn determines the concentration of aliphatics and aromatics in the hydrocarbon pool, and the hydrocarbon pool components crack to produce ethylene. Ethylene production also depends significantly on the methylation rate constant; the positive DORC value suggests that increasing the methylation rate constant would increase ethylene production. This is contrary to the expectation that increasing the rate constant of a reaction that consumes a species should decrease its net rate of formation. In this case, however, a higher forward methylation rate constant also increases the reverse (beta-scission) rate constant, because the two rate constants are related through a fixed equilibrium constant. The effect of this change is larger on the beta-scission step; hence the increase in ethylene production. Toluene, on the other hand, shows a negative DORC value with respect to its methylation rate constant. Finally, DME adsorption appears to be a dominant reaction step for all three species.

Table 8.5: Degree of rate control values predicted by the MTH model

                                    DORC at catalyst-bed fraction
Species    Rate constant            0      0.1    0.5    1
DME        DME adsorption           0.12   0.65   0.52   0.47
Ethylene   DME adsorption           –      –      –      0.45
Ethylene   Ethylene methylation     –      –      –      0.4
Toluene    DME adsorption           –      –      –      0.45
Toluene    Toluene methylation      –      –      –      -0.4
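The role of the fixed equilibrium constant in these DORC values can be illustrated with a finite-difference calculation on a toy mechanism. This sketch assumes Campbell-style degree of rate control, X_i = d ln r / d ln k_i with the forward and reverse constants of step i scaled together so that K_i is unchanged; the exact definition used in Chapter 4 may differ in detail, and the two-step mechanism A ⇌ I → P and all names here are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Step = Tuple[float, float]  # (forward, reverse) rate constants of one step


def toy_rate(steps: List[Step], p_a: float = 1.0) -> float:
    """Steady-state rate of a toy mechanism A <=> I (step 0), I -> P (step 1).

    Intermediate balance: kf0 * p_a = (kr0 + kf1) * theta_i; rate = kf1 * theta_i.
    """
    (kf0, kr0), (kf1, _) = steps
    theta_i = kf0 * p_a / (kr0 + kf1)
    return kf1 * theta_i


def degree_of_rate_control(rate_fn: Callable[[List[Step]], float],
                           steps: List[Step], i: int, eps: float = 1e-4) -> float:
    """d ln(r) / d ln(k_i) by central differences. The forward and reverse
    constants of step i are scaled by the same factor, so K_i stays fixed."""
    def scaled(factor: float) -> List[Step]:
        new = list(steps)
        kf, kr = steps[i]
        new[i] = (kf * factor, kr * factor)
        return new

    up, down = 1.0 + eps, 1.0 - eps
    d_ln_r = math.log(rate_fn(scaled(up))) - math.log(rate_fn(scaled(down)))
    return d_ln_r / (math.log(up) - math.log(down))


steps = [(2.0, 1.0), (3.0, 0.0)]
dorc = [degree_of_rate_control(toy_rate, steps, i) for i in range(len(steps))]
print([round(x, 3) for x in dorc])  # [0.75, 0.25]
```

For this toy model the analytical values are k1/(k_-0 + k1) = 0.75 and k_-0/(k_-0 + k1) = 0.25, summing to 1. Scaling the reverse constant along with the forward one is exactly why raising a methylation rate constant can increase ethylene formation through faster beta-scission.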

8.4 Discussion

In this chapter, a rigorous kinetic model for the complex MTH chemistry was developed. An exhaustive MTH network was generated from elementary step rules and lumped according to paraffins, olefins, naphthenics, and aromatics to reduce its size. The thermochemistry of each individual species was predicted using linear semi-empirical correlations developed from computational chemistry data. Kinetic parameters were drawn from computational and experimental studies reported in the literature. The model predicts DME conversion to within a factor of 2 and reproduces the higher selectivity to aliphatics than to aromatics, although several individual product selectivities deviate from the experimental values by more than an order of magnitude; this level of agreement is notable given that the kinetic parameters were used as reported, without any fitting. Nevertheless, re-estimating these kinetic parameters would clearly allow the model to capture the kinetics of the underlying mechanisms better. This requires parameter estimation, which is discussed in more detail in Chapter 9. The kinetic model presented above provides a quantitative understanding of the reaction system. Specifically for MTH chemistry, the kinetic model provided insights such as (a) the selectivity of the system to aliphatic products over aromatics, (b) the sensitivity of the overall conversion of DME, and of the production of other aliphatic and aromatic species, to the DME adsorption step, (c) the evolution of the hydrocarbon pool along the reactor, and (d) the contrasting effect of methylation rate constants on ethylene and toluene production.

CHAPTER 9

Summary and Future

9.1 Summary and Discussion

The main contribution of this thesis is the development and application of a new computational tool, RING, for complex network generation and analysis. RING takes as input: (a) initial reactants of the reaction system under study, (b) reaction rules that describe valid chemical transformations, and (c) post-processing instructions to lump the network by grouping isomers, to query the network for topological information such as pathways and mechanisms, and to calculate kinetics and thermochemistry for thermochemical analysis and kinetic modeling of the reaction system. As outputs, RING generates an exhaustive network of reactions and species consistent with the inputs, determines a much smaller lumped network, finds reaction pathways and mechanisms to chosen products, estimates the thermochemistry of all species and reactions in the network (which can then be used in pathway analysis), and formulates and solves kinetic models. The inputs to RING are written as a program in an English-like domain-specific reaction language that acts as a front-end. RING adopts several cheminformatics, chemical graph theory, and computer science algorithms for (a) unambiguous representation of the inputs to RING in the language, (b) internal representation of molecules as graphs and reaction rules as transformations, (c) internal abstraction of a reaction as the application of a graph transformation rule to the reactant graphs to generate new graphs (products), (d) traversal of the network to identify pathways and mechanisms, (e) grouping together (or lumping) isomers based on functional groups, and (f) calculating thermochemistry. RING is available open-source [104].

RING can be used to model and analyze a variety of biomass and hydrocarbon processing systems, including homogeneous and heterogeneous chemistries. Different types of analysis can be done depending on the information available. If only the general chemistry of the system is known, topological network analysis can be performed using RING to identify possible pathways and products prior to experimentation, or to identify plausible pathways to experimentally observed products that are consistent with other experimental observations. For example, we showed that glycerol dehydration to acrolein on Brønsted acid catalysts necessarily involves 3-hydroxypropanal, consistent with the experimental observation that it is a primary product observed only at low conversions (Chapter 5). Further, RING can be used to hypothesize mechanisms and to propose experiments that discriminate among multiple candidate mechanisms. For example, for acetone conversion on Brønsted acid catalysts, we identified, among several possible mechanisms, one plausible route that matched the experimentally inferred overall stoichiometry, in which two molecules of acetone are converted into one molecule each of acetic acid and isobutene. Based on this pathway, we could propose experiments, such as isotope labeling studies, to confirm our predictions (Chapter 5). If, in addition to the chemistry, the energetics (activation barrier and thermochemistry) of each reaction step can be determined, then plausible, energetically feasible mechanisms can be identified using RING.
For example, we have used RING to identify plausible mechanisms for glycerol decomposition to syngas and glycerol hydrogenolysis to 1,2-propanediol, by using RING's pathway identification features along with semi-empirical estimation of thermochemistry (using group additivity) and activation barriers (using linear free energy relationships) given in the literature (Chapter 7). This type of analysis can be extended to other classes of systems requiring different molecular properties. For example, in Chapter 6, RING was used to probe the synthetically feasible fatty alcohols that can be formed via heterogeneous catalysis from biomass-derived platform chemicals. Fatty alcohols in the product spectrum that could lead to alcohol ethoxylates with desired physical properties were identified. Subsequently, RING was used to identify synthesis routes to these products, which could then be compared in terms of thermochemistry, biphasic separability of individual steps, overall atom efficiency, etc. The final type of analysis that can be done using RING is kinetic modeling, if kinetic parameters can be estimated a priori for each reaction step. RING can then be used to obtain quantitative results such as product yields and selectivities in a reactor, rate-determining steps, and dominant reactions. This feature was demonstrated in Chapter 8 in the context of methanol-to-hydrocarbons chemistry. RING can also provide inputs for additional computational studies in several ways. First, the pathways identified by RING for a complex system can be used in a computational chemistry analysis to calculate the thermochemistry and energy barriers of individual steps, and thereby to obtain insights into the dominant mechanisms.
For example, Seshadri and Westmoreland obtained a list of pathways from RING for glucose pyrolysis, some of which were unintuitive, and used them in an ab initio study to explore the mechanism of the first C-C and C-O scission steps that form pyrolysis products [261]. Second, RING can generate outputs in formats that can easily be used in other tools. For example, RING can write the stoichiometry and kinetic rate expressions of each reaction in a format that can be used directly in Athena Visual Studio [262], thereby allowing network generation with RING followed by network analysis, such as parameter estimation, in Athena. Similarly, RING can write the network and kinetics information in a format that can be read directly by CHEMKIN [263]. Further, RING can generate network information that can be interpreted by GAMS [264]. In a study by Marvin et al. [128], the reaction network generated by RING was fed into GAMS to formulate and solve a mixed-integer linear programming problem that simultaneously identifies desirable gasoline-blend components and their optimal synthesis routes (in terms of economic, energetic, and reaction rate objectives) from biomass-derived oxygenates.

9.2 Future directions

RING can be further expanded in terms of features and capabilities on many fronts. These proposed extensions are discussed in detail below.

9.2.1 Parameter estimation

As noted in Chapter 8, a priori estimation of kinetic parameters will not always result in a mechanistically accurate kinetic model that fully captures both the yields and the selectivities of products. It is therefore necessary to estimate kinetic parameters from available experimental data. In this context, we propose that RING be expanded to include parameter estimation. Preliminary efforts in this direction have already been undertaken: RING has been linked to the open-source nonlinear optimization software IPOPT [265]. The objective of the parameter estimation problem is to minimize the sum of the squared relative residuals between each experimental data point and the corresponding model prediction (see Equation 9.1, where n is the number of experimental data sets and S_flow is the set of bulk species that can be measured experimentally). The decision variables of this optimization problem are the kinetic rate constants at a reference temperature, the activation barriers of individual reaction steps, and thermochemical quantities. This is in contrast to the standard estimation problem involving pre-exponential factors and activation energies; it has been argued that recasting the problem in terms of rate constants at a reference temperature and activation barriers leads to numerical robustness as well as unbiased estimates [266].

$$ Z = \min \sum_{i=0}^{n} \sum_{j \in S_{flow}} \left( \frac{F_{ij}^{expt} - F_{ij}^{model}}{F_{ij}^{expt}} \right)^2 \tag{9.1} $$

Constraints for this problem can involve upper and lower bounds on the decision variables and relative ratios or differences between two variables. For example, in MTH chemistry, the activation barrier for beta-scission to tertiary alkoxides can be constrained to be lower than that to secondary alkoxides by a fixed value. At each iteration, IPOPT requires the value of the objective, its gradient, the constraint residuals, the constraint Jacobian, and the Hessian of the Lagrangian (i.e., of the objective and the constraints). RING provides these based on user inputs and kinetic modeling results. Specifically, to calculate the objective, RING solves the kinetic model with the current values of the kinetic parameters. The gradient of the objective is calculated from the sensitivity analysis results of IDAS, the numerical solver that RING uses to solve the kinetic model, which gives the sensitivities dF_ij/dp_k of the model predictions with respect to each parameter p_k. The Hessian of the objective is approximated as given in Equation 9.2; this (Gauss-Newton) approximation assumes that the parameters are close to the minimum. The details of this approximation and its convergence properties are given in Biegler [267].

$$ \frac{\partial^2 Z}{\partial p_k \, \partial p_l} \approx 2 \sum_{i=0}^{n} \sum_{j \in S_{flow}} \frac{1}{\left(F_{ij}^{expt}\right)^2} \, \frac{\partial F_{ij}^{model}}{\partial p_k} \, \frac{\partial F_{ij}^{model}}{\partial p_l} \tag{9.2} $$

The solution of this optimization problem is a set of kinetic parameters that captures the experimental data to the "best" extent possible given the assumptions about the chemistry. Using these kinetic parameters, RING can then be used to predict yields and selectivities at different reactor conditions and sizes. Further, RING can be used to identify the dominant pathways in the system under those conditions, from which a reduced model that captures only the significant dynamics can be obtained. This reduced model would be numerically robust and quick to solve, and could be embedded into a larger process model for process design and optimization.
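The quantities that RING passes to IPOPT for Equations 9.1 and 9.2 can be sketched in a few lines, given model predictions and their sensitivities. This is a minimal illustration, not RING's code: the function names are hypothetical, and the "model" is a toy expression linear in the parameters (in RING, predictions and sensitivities would come from solving the kinetic model with IDAS). For such a linear model the Gauss-Newton Hessian is exact, which makes it easy to check against finite differences.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def objective(f_expt: Sequence[Sequence[float]],
              f_model: Sequence[Sequence[float]]) -> float:
    """Eq. 9.1: sum of squared relative residuals over data sets i and species j."""
    return sum(((fe - fm) / fe) ** 2
               for row_e, row_m in zip(f_expt, f_model)
               for fe, fm in zip(row_e, row_m))


def gradient_and_gn_hessian(f_expt, f_model, sens) -> Tuple[List[float], Matrix]:
    """Gradient of Z and its Gauss-Newton Hessian approximation (Eq. 9.2).

    sens[i][j][k] holds dF_ij/dp_k, e.g. forward sensitivities from a DAE solver.
    """
    n_p = len(sens[0][0])
    grad = [0.0] * n_p
    hess = [[0.0] * n_p for _ in range(n_p)]
    for row_e, row_m, row_s in zip(f_expt, f_model, sens):
        for fe, fm, s in zip(row_e, row_m, row_s):
            w = 1.0 / fe ** 2
            for k in range(n_p):
                grad[k] -= 2.0 * (fe - fm) * s[k] * w
                for l in range(n_p):
                    hess[k][l] += 2.0 * s[k] * s[l] * w
    return grad, hess


# Hypothetical toy model, linear in p: F_ij = p0 + p1 * x_ij
# (two data sets, two measured species each).
X = [[1.0, 2.0], [3.0, 4.0]]
F_EXPT = [[3.1, 4.9], [7.2, 8.8]]


def toy_model(p: Sequence[float]) -> Matrix:
    return [[p[0] + p[1] * x for x in row] for row in X]


def toy_sensitivities(p: Sequence[float]) -> List[List[List[float]]]:
    return [[[1.0, x] for x in row] for row in X]


p = [1.0, 2.0]
grad, hess = gradient_and_gn_hessian(F_EXPT, toy_model(p), toy_sensitivities(p))
```

The factor of 2 and the weights 1/(F^expt)^2 follow directly from differentiating Equation 9.1 twice and dropping the terms that contain second derivatives of the model.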

9.2.2 Reactor models

RING can currently formulate and solve kinetic models. Extending this to reactor effects such as non-isothermal conditions, flow effects, and mass transfer requires reactor models. To develop non-isothermal plug flow reactor models, the energy balance has to be formulated and solved along with the kinetic model. This is feasible within the current framework because the enthalpy of formation and specific heat capacity of each species can be calculated from group additivity methods, and no additional numerical methods are required. To incorporate flow and mass transfer effects, RING can be coupled to other software such as CHEMKIN [263], CatalyticFOAM, and [268].
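As an illustration of what such an energy balance could look like, one common form for a steady-state, adiabatic plug flow reactor written in terms of catalyst mass W is given below. This is a textbook sketch under stated assumptions (no heat exchange with the surroundings, negligible pressure drop, balances written for the gas-phase species j), not the formulation implemented in RING. Here F_j is the molar flow of species j, ν_jr its stoichiometric coefficient in reaction r, r_r the rate of reaction r per unit catalyst mass, and H_j and C_{p,j} the species enthalpy and heat capacity from group additivity.

$$
\frac{dF_j}{dW} = \sum_{r} \nu_{jr}\, r_r(T), \qquad
\frac{dT}{dW} = \frac{\sum_{r} \left(-\Delta H_r(T)\right) r_r(T)}{\sum_{j} F_j\, C_{p,j}(T)}, \qquad
\Delta H_r(T) = \sum_{j} \nu_{jr}\, H_j(T)
$$

The temperature dependence enters each r_r through the rate constants and the equilibrium constants used for reverse steps, so the energy balance simply adds one more differential equation to the kinetic model.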

9.2.3 Dynamic simulations of the kinetic model

Dynamic, or time-varying, kinetic models of flow reactors allow for capturing (a) effects such as the approach to steady state and induction periods∗, and (b) catalyst deactivation. This requires solving partial differential-algebraic equations (PDAEs). RING can be expanded to include this feature within the existing software framework. PDAEs can be recast into DAEs through the method of lines [269]. This involves discretizing the model in the time domain and solving an expanded set of differential-algebraic equations that simultaneously solves for the different time levels at each point along the reactor. The resulting DAEs can then be solved using IDAS.
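The time-discretization idea can be sketched on a hypothetical one-species transient plug flow reactor, dC/dt + u dC/dz = -k C, with an initially empty reactor and a constant feed. Backward Euler in time turns the PDE into a stacked system of ODEs in z, one state per time level, which is then integrated along the reactor. This is an illustration only: RING would hand the stacked (and generally algebraic-differential) system to IDAS, whereas classical RK4 is used here to keep the example self-contained, and all names and parameter values are assumptions.

```python
def transient_pfr_by_time_discretization(k=1.0, u=1.0, length=1.0, c_in=1.0,
                                         t_end=5.0, n_t=100, n_z=100):
    """Toy transient PFR, dC/dt + u dC/dz = -k C, reactor initially empty.

    Time is discretized with backward Euler into n_t levels; the stacked ODE
    system in z (one state per time level) is integrated along the reactor
    with classical RK4. Returns [C(t_n, z=length) for n = 0..n_t].
    """
    dt = t_end / n_t
    dz = length / n_z

    def rhs(c):
        # c[0] is the initial condition (empty reactor) and stays at zero.
        d = [0.0] * len(c)
        for n in range(1, len(c)):
            d[n] = (-k * c[n] - (c[n] - c[n - 1]) / dt) / u
        return d

    # Inlet condition at z = 0: empty reactor at t = 0, feed c_in for t > 0.
    c = [0.0] + [c_in] * n_t
    for _ in range(n_z):
        k1 = rhs(c)
        k2 = rhs([ci + 0.5 * dz * di for ci, di in zip(c, k1)])
        k3 = rhs([ci + 0.5 * dz * di for ci, di in zip(c, k2)])
        k4 = rhs([ci + dz * di for ci, di in zip(c, k3)])
        c = [ci + dz / 6.0 * (a + 2.0 * b + 2.0 * e + f)
             for ci, a, b, e, f in zip(c, k1, k2, k3, k4)]
    return c


outlet = transient_pfr_by_time_discretization()
print(round(outlet[-1], 3))  # 0.368, i.e. exp(-k L / u) once steady state is reached
```

The outlet history starts near zero (the feed front has not yet reached the outlet) and approaches the steady-state value exp(-kL/u) after a few residence times, which is the kind of start-up behavior the dynamic model is meant to capture.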

9.2.4 Model-based design of experiments

In complex reaction systems, multiple mechanisms (or reaction routes) may exist between reactants and products. This can pose a major challenge in parameter estimation: multiple sets of kinetic parameters could lead to similar predictions, or multiple models can be proposed for a reaction system. Further, the confidence intervals of certain parameter estimates may be unacceptably large because the experimental data did not cover the regions of the parameter space most sensitive to those parameters. Additional experimental data will, therefore, be required for model discrimination and for improving accuracy. To this end, RING can be equipped for model-based experimental design through two approaches. First, each candidate model can be solved at different operating conditions (concentrations, flow rates, temperature, space velocity, etc.) to identify where and how the predictions of the different models diverge; this identifies new experimental conditions at which additional data should be obtained. Second, for parameters with large confidence intervals, state-of-the-art experimental design methods such as the A- and D-optimality criteria [270] can be used to pinpoint the "best" operating conditions for new experiments. In both cases, the additional data will be used to improve the estimates of the kinetic parameters, establishing a strong feedback loop between experimentation and computation.

∗ The period from start-up until the reactor effluent reaches steady state.
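The second approach can be illustrated with a toy problem: a first-order reaction whose rate constant follows a reparameterized Arrhenius law with two parameters, θ = (ln k_ref, b), and a single experiment already performed at the reference temperature, where b cannot be estimated. The sketch scores candidate temperatures for the next experiment by the D-criterion (maximize the determinant of the Fisher information matrix) and the A-criterion (minimize the trace of its inverse). The model, the parameter values, and the assumption of unit measurement variance are all hypothetical.

```python
import math

# Hypothetical current estimates for a toy first-order reaction A -> B in an
# isothermal plug flow reactor: k(T) = K_REF * exp(-B * (1000/T - 1000/T_REF)).
K_REF, B, T_REF, TAU = 1.0, 10.0, 600.0, 1.0

def sensitivities(T):
    """dy/dtheta at temperature T, with y = exp(-k(T) * TAU), theta = (ln K_REF, B)."""
    dx = 1000.0 / T - 1000.0 / T_REF
    k = K_REF * math.exp(-B * dx)
    y = math.exp(-k * TAU)
    return [-TAU * y * k, TAU * y * k * dx]

def fisher_information(temperatures):
    """FIM = sum over experiments of s s^T (unit measurement variance assumed)."""
    M = [[0.0, 0.0], [0.0, 0.0]]
    for T in temperatures:
        s = sensitivities(T)
        for i in range(2):
            for j in range(2):
                M[i][j] += s[i] * s[j]
    return M

def d_score(M):
    """D-optimality: determinant of the FIM (to be maximized)."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def a_score(M):
    """A-optimality: trace of the inverse FIM (to be minimized)."""
    det = d_score(M)
    return math.inf if det <= 0.0 else (M[0][0] + M[1][1]) / det

performed = [600.0]                               # experiments already run, K
candidates = [550.0, 575.0, 600.0, 625.0, 650.0]  # feasible new conditions, K
best_d = max(candidates, key=lambda T: d_score(fisher_information(performed + [T])))
best_a = min(candidates, key=lambda T: a_score(fisher_information(performed + [T])))
print(best_d, best_a)  # 550.0 550.0
```

Repeating the experiment at 600 K adds no information about b, so its D-score is zero and its A-score is infinite; for these made-up values both criteria select 550 K.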

9.2.5 Optimization

The kinetic and reactor models can be used to identify the optimal conditions at which a reactor should operate so that the yield and selectivity of desired products are maximized. For example, the kinetic model developed in Chapter 8 can be used to identify the feed components and their flow rates, the reactor temperature and pressure, and the catalyst amount that maximize the yield and selectivity of aromatics or olefins in MTH chemistry. This requires embedding the kinetic model in an optimization problem with a specific objective such as maximizing yield. This is similar to parameter estimation in that the optimization problem implicitly embeds a kinetic model; in terms of software infrastructure, therefore, the existing RING-IPOPT link can be employed. The objective of this optimization problem can be: (a) maximizing or minimizing yields of specific products, (b) maximizing selectivity to certain classes of products such as olefins in MTH chemistry, (c) maximizing physical properties of the product mixture such as octane rating, or (d) minimizing operating costs or maximizing profitability.

9.2.6 Efficient software

A final direction of improvement addresses the scalability, numerical efficiency, and robustness of the methods employed in RING. Handling very large reaction systems and kinetic models requires faster and better-scaling methods than those currently employed. Improvements can be made in two areas. First, RING is currently a serial code: network generation, pathway and mechanism analysis, and kinetic modeling are all performed serially, and all of them can benefit from parallel processing. In network generation, identifying the reactions of a species by applying each reaction rule to it can be parallelized, because each application of a reaction rule is independent of the others. The species in the unprocessed list during network generation can also be processed in parallel. Furthermore, each application of a reaction rule requires matching the reaction center against the reactant molecules; this pattern matching can also be parallelized. Reaction pathway and mechanism analysis can be parallelized by adopting parallel network traversal algorithms. For kinetic modeling, the parallel processing capabilities offered by IDAS can be leveraged. Second, robust and scalable numerical algorithms can be employed. For solving kinetic models, iterative methods such as GMRES can be adopted. This requires preconditioners that approximate the Jacobian of the kinetic model and are also easy to invert; incomplete LU decomposition could serve as an effective preconditioner in this case [271]. Further, the current parameter estimation scheme uses a sequential approach wherein a kinetic model is solved at each iteration. Alternatively, the parameter estimation problem can be rewritten so that the constraints include a discretized kinetic model. The discretization can be based on orthogonal collocation [267].
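The simultaneous idea can be sketched for the simplest kinetic model, dy/dt = −ky, discretized by orthogonal collocation on finite elements with two Radau points per element (τ = 1/3 and 1). The residuals returned by `residuals` are the algebraic equality constraints that would replace the ODE solver inside the optimization problem, with k and all collocation values as decision variables. No NLP solver is called here; for a fixed k the constraints are simply solved element by element to check that they reproduce the exact solution. All names are illustrative and are not RING or IPOPT interfaces.

```python
import math

# Radau collocation nodes on one finite element scaled to [0, 1]; node 0 is the
# element start and the last node coincides with the element end.
NODES = [0.0, 1.0 / 3.0, 1.0]

def derivative_matrix(nodes):
    """D[j][m] = d/dtau of the m-th Lagrange polynomial, evaluated at nodes[j]."""
    n = len(nodes)
    D = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for m in range(n):
            if j == m:
                D[j][m] = sum(1.0 / (nodes[j] - nodes[q]) for q in range(n) if q != j)
            else:
                num = math.prod(nodes[j] - nodes[q] for q in range(n) if q not in (j, m))
                den = math.prod(nodes[m] - nodes[q] for q in range(n) if q != m)
                D[j][m] = num / den
    return D

D = derivative_matrix(NODES)

def residuals(k, y0, Y, h):
    """Collocation residuals of dy/dt = -k*y on elements of length h.
    Y[e] holds y at the two non-initial nodes of element e. In a simultaneous
    formulation these are equality constraints, and k and every entry of Y are
    decision variables of a single NLP."""
    res, start = [], y0
    for elem in Y:
        vals = [start] + elem
        for j in range(1, len(NODES)):
            slope = sum(D[j][m] * vals[m] for m in range(len(NODES)))  # dy/dtau
            res.append(slope + h * k * vals[j])
        start = elem[-1]         # continuity between elements
    return res

def solve_states(k, y0, h, n_elem):
    """For a fixed k, solve the (here linear) collocation equations element by element."""
    Y, start = [], y0
    for _ in range(n_elem):
        a11, a12 = D[1][1] + h * k, D[1][2]
        a21, a22 = D[2][1], D[2][2] + h * k
        b1, b2 = -D[1][0] * start, -D[2][0] * start
        det = a11 * a22 - a12 * a21
        Y.append([(b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det])
        start = Y[-1][-1]
    return Y

k, t_end, n_elem = 1.5, 2.0, 20
Y = solve_states(k, 1.0, t_end / n_elem, n_elem)
print(max(abs(r) for r in residuals(k, 1.0, Y, t_end / n_elem)))
print(Y[-1][-1], math.exp(-k * t_end))
```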
Such a simultaneous formulation is advantageous because: (a) nonlinear optimization solvers can handle large systems efficiently by using state-of-the-art sparse linear solvers, (b) the Jacobian of the constraints can be calculated exactly using automatic differentiation, and (c) current optimizers are highly parallelized. Such efficient implementations will allow for handling significantly larger systems, such as the complete MTH network (without lumping), consisting of several hundred thousand reactions and several tens of thousands of species, and the MTH network that includes ¹³C isotopically labeled initial reactants. They will also allow for solving more detailed reactor and process models built on the kinetic model.

9.3 Conclusion

RING, a new open-source computational tool for reaction network generation and analysis, was reported in this thesis. We propose that RING can be used for (i) network analysis in terms of identifying pathways and mechanisms, (ii) hypothesizing plausible mechanisms, (iii) identifying dominant mechanisms in a reaction system, and (iv) kinetic modeling of any type of complex reaction system. RING can thus be used to elucidate complex reaction systems and thereby guide experimentation. Several directions for improvement have been outlined that would enable RING to be employed in the design and development of optimal reactors and processes involving complex reaction systems.

Acknowledgements

The author would like to thank Prof. Eric Van Wyk and Ted Kaminski, Department of Computer Science and Engineering, University of Minnesota, for helpful suggestions on computer science algorithms and for collaboration on developing the reaction language for RING. Financial support from the Initiative for Renewable Energy (Large Grant: RL-0004-09) at the University of Minnesota and from the National Science Foundation Emerging Frontiers in Research and Innovation program (grant #0937706) is gratefully acknowledged. Partial financial support from The Dow Chemical Company is also acknowledged. The author would also like to acknowledge partial financial support from the Doctoral Dissertation Fellowships program of the University of Minnesota and from the University of Minnesota Digital Technology Center 2011 Digital Technology Initiative.

Bibliography

[1] R. Vinu and Linda J. Broadbelt. Unraveling reaction pathways and specifying reaction kinetics for complex systems. Annual Review of Chemical and Biomolecular Engineering, 3(1):29–54, 2012.

[2] Linda J. Broadbelt and Jim Pfaendtner. Lexicography of kinetic modeling of complex reaction networks. AIChE Journal, 51(8):2112–2121, 2005.

[3] G. P. Froment, B. O. Van de Steene, P. S. Van Damme, S. Narayanan, and A. G. Goossens. Thermal cracking of ethane and ethane-propane mixtures. Industrial & Engineering Chemistry Process Design and Development, 15(4):495–504, 1976.

[4] D. Mohan, C. U. Pittman Jr., and P. H. Steele. Pyrolysis of wood/biomass for bio-oil: A critical review. Energy & Fuels, 20(3):848–889, 2006.

[5] G. Yaluris, R. J. Madon, and J. A. Dumesic. 2-methylhexane cracking on Y zeolites: Catalytic cycles and reaction selectivity. Journal of Catalysis, 165:205–220, 1997.

[6] A. Corma, G. W. Huber, L. Sauvanaud, and P. O’Connor. Biomass to chemicals: Catalytic conversion of glycerol/water mixtures into acrolein, reaction network. Journal of Catalysis, 257(1):163–171, 2008.

[7] Charles K. Westbrook, William J. Pitz, Olivier Herbinet, Henry J. Curran, and Emma J. Silke. A comprehensive detailed chemical kinetic reaction mechanism for combustion of n-alkane hydrocarbons from n-octane to n-hexadecane. Combustion and Flame, 156(1):181 – 199, 2009.

[8] H-W. Wong, X. Li, M. T. Swihart, and L. J. Broadbelt. Detailed kinetic modeling of silicon nanoparticle formation chemistry via automated mechanism generation. The Journal of Physical Chemistry A, 108(46):10122 – 10132, 2004.

[9] Shumaila S. Khan, Qizhi Zhang, and Linda J. Broadbelt. Automated mechanism generation. Part 1: Mechanism development and rate constant estimation for VOC chemistry in the atmosphere. Journal of Atmospheric Chemistry, 63(2):125–156, 2009.

[10] A. N. Mayeno, R. S. H. Yang, and B. Reisfeld. Biochemical reaction network modeling: predicting metabolism of organic chemical mixtures. Environmental Science and Technology, 39:5363 – 5371, 2005.

[11] M. Rizzi, M. Baltes, U. Theobald, and M. Reuss. In vivo analysis of metabolic dynamics in Saccharomyces cerevisiae: II. Mathematical model. Biotechnology and Bioengineering, 55:592–608, 1997.

[12] J. L. Reed, T. D. Vo, C. H. Schilling, and B. O. Palsson. An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biology, 4(9):R54, 2003.

[13] Prodromos Daoutidis, W. Alex Marvin, Srinivas Rangarajan, and Ana I. Torres. Engineering biomass conversion processes: A systems perspective. AIChE Journal, 59(1):3–18, 2013.

[14] Michael E. Jenkin, Sandra M. Saunders, and Michael J. Pilling. The tropospheric degradation of volatile organic compounds: a protocol for mechanism development. Atmospheric Environment, 31(1):81 – 104, 1997.

[15] R. J. Quann and S. B. Jaffe. Building useful models of complex reaction systems in petroleum refining. Chemical Engineering Science, 51(10):1615 – 1631, 1996.

[16] Teh C. Ho. Kinetic modeling of large-scale reaction systems. Catalysis Reviews, 50(3):287–378, 2008, http://www.tandfonline.com/doi/pdf/10.1080/01614940802019425.

[17] A. Bhan, S-H. Hsu, G. Blau, J. M. Caruthers, V. Venkatasubramanian, and W.N. Delgass. Microkinetic modeling of propane aromatization over HZSM-5. Journal of Catalysis, 235:35 – 51, 2005.

[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabasi. The large-scale organization of metabolic networks. Nature, 407:651–654, 2000.

[19] S. Rangarajan, A. Bhan, and P. Daoutidis. Rule-based generation of thermochemical routes to biomass conversion. Industrial & Engineering Chemistry Research, 49(21):10459 – 10470, 2010.

[20] S. Rangarajan, A. Bhan, and P. Daoutidis. Language-oriented rule-based reaction network generation and analysis: Description of RING. Computers & Chemical Engineering, 45:114–123, 2012.

[21] E. J. Corey, A. K. Long, and S. D. Rubenstein. Computer-assisted analysis in organic synthesis. Science, 228(4698):408 – 418, 1985.

[22] I. Ugi, J. Bauer, K. Bley, D. Alf, A. Dietz, E. Fontain, B. Gruber, R. Herges, M. Knauer, K. Reitsam, and N. Stein. Computer assisted solution of chemical problems – the historical development and present state of the art of a new discipline of chemistry. Angewandte Chemie International Edition in English, 32:201 – 227, 1993.

[23] I. Ugi, J. Bauer, C. Blomberger, J. Brandt, A. Dietz, E. Fontain, B. Gruber, A. V. Scholley-Pfab, A. Senff, and N. Stein. Models, concepts, theories, and formal languages in chemistry and their use as a basis for computer assistance in chemistry. Journal of Chemical Information and Computer Science, 34:3 – 16, 1994.

[24] J. B. Hendrickson. The SYNGEN approach to synthesis design. Analytica Chimica Acta, 235:103 – 113, 1990.

[25] J. Dugundji and I. Ugi. An algebraic model of constitutional chemistry as a basis for chemical computer programs. Topics in Current Chemistry, 39:19 – 64, 1973.

[26] W. L. Jorgensen, E. R. Laird, A. J. Gushurst, J. M. Fleischer, S. A. Gothe, H. E. Helson, G. D. Paderes, and S. Sinclair. CAMEO: a program for the logical prediction of the products of organic reactions. Pure and Applied Chemistry, 62(10):1921 – 1932, 1990.

[27] I. M. Socorro and J. M. Goodman. The ROBIA program for predicting organic reactivity. Journal of Chemical Information and Modeling, 46(2):606 – 614, 2006.

[28] I. M. Socorro, K. Taylor, and J. M. Goodman. ROBIA: A reaction prediction program. Organic Letters, 7(16):3541 – 3544, 2005.

[29] M. J. S. Dewar, E. G. Zoebisch, E. F. Healy, and J. J. P. Stewart. AM1: A new general purpose quantum mechanical molecular model. Journal of the American Chemical Society, 107:3902 – 3909, 1985.

[30] J. H. Chen and P. Baldi. No electron left behind: A rule-based expert system to predict chemical reactions and reaction mechanisms. Journal of Chemical Information and Modeling, 49(9):2034 – 2043, 2009.

[31] Daylight Chemical Information Systems, Inc. Daylight Theory Manual, accessed Nov 2011. Chapter 5.

[32] A. Kerber, R. Laue, M. Meringer, and C. Rücker. Molecules in silico: A graph description of chemical reactions. Journal of Chemical Information and Modeling, 47(3):805 – 817, 2007.

[33] T. Lenaerts and H. Bersini. A Synthon approach to artificial chemistry. Artificial Life, 15(1):89 – 103, 2009.

[34] R. Rosello and G. Valiente. Graph transformation in molecular biology. Lecture Notes in Computer Science, 3393:116 – 133, 2005.

[35] G. Benkö, C. Flamm, and P. F. Stadler. A graph-based toy model of chemistry. Journal of Chemical Information and Computer Sciences, 43(4):1085 – 1093, 2003.

[36] M. K. Yadav, B. P. Kelley, and S. M. Silverman. The potential of a chemical graph transformation system. Lecture Notes in Computer Science, Proceedings of the Second International Conference, ICGT 2004, Rome, Italy, 3256:83 – 95, 2004.

[37] S. E. Prickett and M. L. Mavrovouniotis. Construction of complex reaction systems. 2. Molecule manipulation and reaction application algorithms. Computers & Chemical Engineering, 21(11):1237 – 1254, 1997.

[38] A. V. Zeigarnik, L. G. Bruk, O. N. Temkin, V. A. Likholobov, and L. Maier. Computer-aided studies of reaction mechanisms. Russian Chemical Reviews, 65(2):117 – 130, 1996.

[39] A. S. Tomlin and T. Turanyi. In M. J. Pilling, editor, Low-Temperature Combustion and Auto-Ignition, volume 35, Chapter 4. Elsevier, Amsterdam, 1997.

[40] L. J. Broadbelt, S. M. Stark, and M. T. Klein. Computer-generated pyrolysis modeling - on the fly generation of species, reactions and rates. Industrial & Engineering Chemistry Research, 33(4):790–799, 1994.

[41] S. E. Prickett and M. L. Mavrovouniotis. Construction of complex reaction systems. 1. Reaction description language. Computers & Chemical Engineering, 21(11):1219–1235, 1997.

[42] A. Ratkiewicz and T. N. Truong. Application of chemical graph theory for automated mechanism generation. Journal of Chemical Information and Modeling, 43:36 – 44, 2003.

[43] Miguel A. Baltanas and Gilbert F. Froment. Computer generation of reaction networks and calculation of product distributions in the hydroisomerization and hydrocracking of paraffins on Pt-containing bifunctional catalysts. Computers & Chemical Engineering, 9(1):71 – 81, 1985.

[44] S. H. Hsu, B. Krishnamurthy, P. Rao, C. H. Zhao, S. Jagannathan, and V. Venkatasubramanian. A domain-specific compiler theory based framework for automated reaction network generation. Computers & Chemical Engineering, 32(10):2455–2470, 2008.

[45] Jean-Loup Faulon and Allen G. Sault. Stochastic generator of chemical structure. 3. Reaction network generation. Journal of Chemical Information and Computer Sciences, 41(4):894–908, 2001, http://pubs.acs.org/doi/pdf/10.1021/ci000029m. PMID: 11500106.

[46] R. G. Susnow, A. M. Dean, W. H. Green, P. Peczak, and L. J. Broadbelt. Rate-based construction of kinetic models for complex systems. The Journal of Physical Chemistry A, 101(20):3731 – 3740, 1997.

[47] L. J. Broadbelt, S. M. Stark, and M. T. Klein. Termination of computer-generated reaction mechanisms: Species rank-based convergence criterion. Industrial & Engineering Chemistry Research, 34(8):2566 – 2573, 1995.

[48] K.M. Van Geem, M-F. Reyniers, G.B. Marin, J. Song, W.H. Green, and D. M. Matheu. Automatic reaction network generation using RMG for steam cracking of n-hexane. AIChE Journal, 52(2):718–730, 2006.

[49] James R. Faeder, William S. Hlavacek, Ilona Reischl, Michael L. Blinov, Henry Metzger, Antonio Redondo, Carla Wofsy, and Byron Goldstein. Investigation of early events in FcεRI-mediated signaling using a detailed mathematical model. The Journal of Immunology, 170(7):3769–3781, 2003, http://www.jimmunol.org/content/170/7/3769.full.pdf+html.

[50] M. Arita. The metabolic world of Escherichia coli is not small. Proceedings of the National Academy of Sciences, 101(6):1543 – 1547, 2004.

[51] J. Gonzalez-Lergier, L. J. Broadbelt, and V. Hatzimanikatis. Theoretical considerations and computational analysis of the complexity in polyketide synthesis pathways. Journal of the American Chemical Society, 127(27):9930 – 9938, 2005.

[52] L. T. Fan, B. Bertok, and F. Friedler. A graph-theoretic method to identify candidate mechanisms for deriving the rate law of a catalytic reaction. Computers and Chemistry, 26:265 – 292, 2002.

[53] Ilie Fishtik, Caitlin A. Callaghan, and Ravindra Datta. Reaction route graphs. I. Theory and algorithm. The Journal of Physical Chemistry B, 108(18):5671–5682, 2004, http://pubs.acs.org/doi/pdf/10.1021/jp0374004.

[54] Y-C. Lin, L. T. Fan, S. Shafie, B. Bertok, and F. Friedler. Generation of light hydrocarbons through Fischer-Tropsch synthesis: Identification of potentially dominant catalytic pathways via the graph-theoretic method and energetic analysis. Computers and Chemical Engineering, 33:1182 – 1186, 2009.

[55] C. S. Henry, L. J. Broadbelt, and V. Hatzimanikatis. Thermodynamics-based metabolic flux analysis. Biophysical Journal, 92(5):1792 – 1805, 2007.

[56] A. Kummel, S. Panke, and M. Heinemann. Putative regulatory sites unraveled by network-embedded thermodynamic analysis of metabolome data. Molecular Systems Biology, 2:0034, 2006.

[57] C. Li, C. S. Henry, M. D. Jankowski, J. A. Ionita, V. Hatzimanikatis, and L. J. Broadbelt. Computational discovery of biochemical routes to specialty chemicals. Chemical Engineering Science, 59:5051 – 5060, 2004.

[58] S. D. Finley, L. J. Broadbelt, and V. Hatzimanikatis. Computational framework for predictive biodegradation. Biotechnology and Bioengineering, 104(6):1086 – 1097, 2009.

[59] James J. P. Stewart. MOPAC-2009, Stewart Computational Chemistry. OpenMOPAC.net (2008), accessed Dec 2010.

[60] L. J. Broadbelt, S. M. Stark, and M. T. Klein. Computer generated reaction networks: on-the-fly calculation of species properties using computational quantum chemistry. Chemical Engineering Science, 49(24(2)):4991 – 5010, 1994.

[61] L. J. Broadbelt, S. M. Stark, and M. T. Klein. Computer generated reaction modelling: Decomposition and encoding algorithms for determining species uniqueness. Computers & Chemical Engineering, 20(2):113 – 129, 1996.

[62] S. J. Chinnick, D. L. Baulch, and P. B. Ayscough. An expert system for hydrocarbon pyrolysis reactions. Chemometrics and Intelligent Laboratory Systems, 5:39–52, 1988.

[63] B. Heyberger, F. Battin-Leclerc, V. Warth, R. Fournet, G. M. Come, and G. Scacchi. Comprehensive mechanism for the gas-phase oxidation of propene. Combustion and Flame, 126:1780–1802, 2001.

[64] V. Warth, F. Battin-Leclerc, R. Fournet, P. A. Glaude, G. M. Come, and G. Scacchi. Computer based generation of reaction mechanisms for gas-phase oxidation. Computers & Chemistry, 24:541–560, 2000.

[65] E. S. Blurock. Reaction system for modeling chemical reactions. Journal of Chemical Information and Computer Science, 35:607 – 616, 1994.

[66] W. H. Green, B. Bhattacharjee, O. Oluwole, J. Song, R. Sumathi, C. D. Wijaya, H-W. Wong, P. E. Yelvington, and J. Yu. New methods for predictive chemical kinetics. Prepr. Pap.-Am. Chem. Soc. Div. Fuel Chem., 49(1):323, 2004.

[67] J. Song. PhD dissertation, Massachusetts Institute of Technology, 2004.

[68] S. E. Prickett and M. L. Mavrovouniotis. Construction of complex reaction systems. 3. An example: alkylation of olefins. Computers & Chemical Engineering, 21(12):1325 – 1337, 1997.

[69] S. Katare, A. Bhan, J. M. Caruthers, W. N. Delgass, and V. Venkatasubramanian. A hybrid genetic algorithm for efficient parameter estimation of large kinetic models. Computers & Chemical Engineering, 28(12):2569–2581, 2004.

[70] J. M. Caruthers, J. A. Lauterbach, K. T. Thomson, V. Venkatasubramanian, C. M. Snively, A. Bhan, S. Katare, and G. Oskarsdottir. Catalyst design: knowledge extraction from high-throughput experimentation. Journal of Catalysis, 216(1–2):98–109, May 2003.

[71] F. P. Di Maio and P. G. Lignola. KING, a kinetic network generator. Chemical Engineering Science, 47(9–11):2713 – 2718, 1992.

[72] M. L. Blinov, J. Yang, J. R. Faeder, and W. S. Hlavacek. Graph theory for rule-based modeling of biochemical networks. Transactions on Computational Systems Biology VII, Lecture Notes in Computer Science, 4230:89 – 106, 2006.

[73] J. R. Faeder, M. L. Blinov, B. Goldstein, and W. Hlavacek. Rule-based modeling of biochemical networks. Complexity, 10(4):22–41, 2005.

[74] M.L. Blinov, J. R. Faeder, J. Yang, B. Goldstein, and W. S. Hlavacek. ’On-the-fly’ or ’generate-first’ modeling? Nature Biotechnology, 23(11):1344 – 1345, 2005.

[75] C. S. Henry, L. J. Broadbelt, and V. Hatzimanikatis. Discovery and analysis of novel metabolic pathways for the biosynthesis of industrial chemicals: 3-hydroxypropanoate. Biotechnology and Bioengineering, 106(3):462 – 473, 2010.

[76] A. Hill, J. Tomshine, E. Wedding, V. Sotiropoulos, and Y. Kaznessis. SynBioSS: the synthetic biology modeling suite. Bioinformatics, 24(21):2551 – 2553, 2008.

[77] Arthur N. Mayeno, Raymond S. H. Yang, and Brad Reisfeld. Biochemical reaction network modeling: Predicting metabolism of organic chemical mixtures. Environmental Science & Technology, 39(14):5363–5371, 2005, http://pubs.acs.org/doi/pdf/10.1021/es0479991. PMID: 16086453.

[78] P. C. Milner. The possible mechanisms of complex reactions involving consecutive steps. Journal of Electrochemical Society, 111:228–232, 1964.

[79] D. Croes, F. Couche, S. J. Wodak, and J. van Helden. Inferring meaningful pathways in weighted metabolic networks. Journal of Molecular Biology, 356:222–236, 2006.

[80] A. P. Heath, G. N. Bennett, and L. E. Kavraki. Finding metabolic pathways using atom tracking. Bioinformatics, 26(12):1548 – 1555, 2010.

[81] M. L. Mavrovouniotis and G. Stephanopoulos. Synthesis of reaction mechanisms consisting of reversible and irreversible steps. 1. A synthesis approach in the context of simple examples. Industrial & Engineering Chemistry Research, 31:1625–1637, 1992.

[82] M. L. Mavrovouniotis. Synthesis of reaction mechanisms consisting of reversible and irreversible steps. 2. Formalization and analysis of the synthesis algorithm. Industrial & Engineering Chemistry Research, 31:1637–1653, 1992.

[83] M. Otarod and J. Happel. Studies on the structure of chemical mechanisms. Chemical Engineering Science, 47(3):587 – 592, 1992.

[84] J. A. Papin, J. Stelling, N. D. Price, S. Klamt, S. Schuster, and B. O. Palsson. Comparison of network-based pathway analysis methods. Trends in Biotechnology, 22(8):400–405, 2004.

[85] Christophe H. Schilling, David Letscher, and Bernhard Ø. Palsson. Theory for the systemic definition of metabolic pathways and their use in interpreting metabolic function from a pathway-oriented perspective. Journal of Theoretical Biology, 203(3):229 – 248, 2000.

[86] Stefan Schuster and Claus Hilgetag. On elementary flux modes in biochemical reaction systems at steady state. Journal of Biological Systems, 02(02):165–182, 1994, http://www.worldscientific.com/doi/pdf/10.1142/S0218339094000131.

[87] Eric Van Wyk, Derek Bodin, Jimin Gao, and Lijesh Krishnan. Silver: an extensible attribute grammar system. Science of Computer Programming, 75(1–2):39–54, January 2010.

[88] R. J. Quann and S. B. Jaffe. Structure oriented lumping - describing the chemistry of complex hydrocarbon mixtures. Industrial & Engineering Chemistry Research, 31(11):2483–2497, 1992.

[89] E. Ranzi, M. Dente, A. Goldaniga, G. Bozzano, and T. Faravelli. Lumping procedures in detailed kinetic modeling of gasification, pyrolysis, partial oxidation and combustion of hydrocarbon mixtures. Progress in Energy and Combustion Science, 27(1):99 – 139, 2001.

[90] J. C. W. Kuo and James Wei. Lumping analysis in monomolecular reaction systems. Analysis of approximately lumpable system. Industrial & Engineering Chemistry Fundamentals, 8(1):124–133, 1969, http://pubs.acs.org/doi/pdf/10.1021/i160029a020.

[91] Genyuan Li and Herschel Rabitz. A general analysis of exact lumping in chemical kinetics. Chemical Engineering Science, 44(6):1413 – 1430, 1989.

[92] S. Benson. Thermochemical Kinetics. John Wiley & Sons, 1976.

[93] Alan C. Hindmarsh, Peter N. Brown, Keith E. Grant, Steven L. Lee, Radu Serban, Dan E. Shumaker, and Carol S. Woodward. SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers. ACM Trans. Math. Softw., 31(3):363–396, September 2005.

[94] Charles T. Campbell. Future directions and industrial perspectives micro- and macro-kinetics: Their relationship in heterogeneous catalysis. Topics in Catalysis, 1(3-4):353–366, 1994.

[95] Michael Jerry Antal, William S. L. Mok, and Geoffrey N. Richards. Mechanism of formation of 5-(hydroxymethyl)-2-furaldehyde from D-fructose and sucrose. Carbohydrate Research, 199(1):91 – 109, 1990.

[96] D. Weininger. SMILES, A Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. Journal of Chemical Information and Computer Sciences, 28(1):31–36, February 1988.

[97] H. Chiang and A. Bhan. Catalytic consequences of hydroxyl group location on the rate and mechanism of parallel dehydration reactions of ethanol over acidic zeolites. Journal of Catalysis, 271(2):251 – 261, 2010.

[98] H. Conrad, G. Ertl, and E. E. Latta. Adsorption of hydrogen on palladium single crystal surfaces. Surface Science, 41:435 – 446, 1974.

[99] J. N. Brønsted. Acid and basic catalysis. Chemical Reviews, 5(3):231–338, 1928, http://pubs.acs.org/doi/pdf/10.1021/cr60019a001.

[100] CambridgeSoft Inc. ChemDraw. http://www.cambridgesoft.com/software/ChemDraw/, December 2013.

[101] A. Corma, S. Iborra, and A. Velty. Chemical routes for the transformation of biomass into chemicals. Chemical Reviews, 107(6):2411–2502, 2007.

[102] G. W. Huber, S. Iborra, and A. Corma. Synthesis of transportation fuels from biomass: Chemistry, catalysts, and engineering. Chemical Reviews, 106:4044–4098, 2006.

[103] L. D. Schmidt and P. J. Dauenhauer. Hybrid routes to biofuels. Nature, 447:914–915, 2007.

[104] RING. http://research.cems.umn.edu/bhan/software.php, 2013.

[105] A. van Deursen, P. Klint, and J. Visser. Domain-specific languages: An annotated bibliography. ACM SIGPLAN Notices, 35(6):26–36, June 2000.

[106] Eric Van Wyk and August Schwerdfeger. Context-aware scanning for parsing extensible languages. Intl. Conf. on Generative Programming and Component Engineering, (GPCE), October 2007.

[107] Eric Van Wyk, Lijesh Krishnan, August Schwerdfeger, and Derek Bodin. Attribute grammar-based language extensions for Java. European Conf. on Object Oriented Prog. (ECOOP), 4609:575–599, 2007.

[108] Yogesh Mali and Eric Van Wyk. Building extensible specifications and implementations of Promela with AbleP. Proceedings of the 18th International SPIN Workshop on Model Checking of Software (SPIN 2011), 6823:108–125, July 2011.

[109] Dick Grune and Ceriel J. H. Jacobs, editors. Parsing Techniques. Springer, 2008.

[110] August Schwerdfeger and Eric Van Wyk. Verifiable composition of deterministic grammars. Proc. of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2009.

[111] S. Bauerschmidt and J. Gasteiger. Overcoming the limitations of a connection table description: A universal representation of chemical species. Journal of Chemical Information and Computer Science, 37:705 – 714, 1997.

[112] C. Steinbeck, Y. Q. Han, S. Kuhn, O. Horlacher, E. Luttmann, and E. Willighagen. The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. Journal of Chemical Information and Computer Sciences, 43(2):493 – 500, 2003.

[113] Daylight Chemical Information Systems, Inc. Daylight Theory Manual, accessed Nov 2011. Chapter 4.

[114] E. Proschak, J. K. Wegner, A. Schüller, G. Schneider, and U. Fechner. Molecular query language (MQL): a context-free grammar for substructure matching. Journal of Chemical Information and Modeling, 47(2):295 – 301, 2007.

[115] D. Weininger, A. Weininger, and J. L. Weininger. SMILES. 2. Algorithm for generation of unique SMILES notation. Journal of Chemical Information and Computer Sciences, 29(2):97 – 101, 1989.

[116] M. Misra. In J.-L. Faulon and A. Bender, editors, Handbook of Cheminformatics Algorithms, Chapter 2, p. 37. CRC Press, Florida, 2010.

[117] H-W. Wong, X. Li, M. T. Swihart, and L. J. Broadbelt. Encoding of polycyclic Si-containing molecules for determining species uniqueness in automated mechanism generation. Journal of Chemical Information and Computer Science, 43:735 – 742, 2003.

[118] T. Hanser, P. Jauffret, and G. Kaufmann. A new algorithm for exhaustive ring perception in a molecular graph. Journal of Chemical Information and Modeling, 36:1146 – 1152, 1996.

[119] B. L. Roos-Kozel and W. L. Jorgensen. Computer-assisted mechanistic evaluation of organic reactions. 2. Perception of rings, aromaticity, and tautomers. Journal of Chemical Information and Computer Science, 21:101–111, 1981.

[120] J. R. Ullmann. An algorithm for subgraph isomorphism. Journal of the ACM, 23(1):31 – 42, 1976.

[121] R. E. Carhart. Erroneous claims concerning the perception of topological symmetry. Journal of Chemical Information and Computer Science, 18(2):108 – 110, 1978.

[122] D. Eppstein. Finding the k shortest paths. SIAM Journal on Computing, 28:652–673, 1998.

[123] Francisco J. Planes and John E. Beasley. A critical examination of stoichiometric and path-finding approaches to metabolic pathways. Briefings in Bioinformatics, 9(5):422–436, 2008, http://bib.oxfordjournals.org/content/9/5/422.full.pdf+html.

[124] J. Happel and P. H. Sellers. Multiple reaction mechanisms in catalysis. Industrial & Engineering Chemistry Fundamentals, 21, 1982.

[125] F. J. Planes and J. E. Beasley. Path finding approaches and metabolic pathways. Discrete Applied Mathematics, 157(10):2244 – 2256, 2009. Networks in Computational Biology.

[126] Steffen Klamt and Jörg Stelling. Two approaches for metabolic pathway analysis? Trends in Biotechnology, 21(2):64 – 69, 2003.

[127] Sangbum Lee, Chan Phalakornkule, Michael M. Domach, and Ignacio E. Grossmann. Recursive MILP model for finding all the alternate optima in LP models for metabolic networks. Computers & Chemical Engineering, 24(2–7):711 – 716, 2000.

[128] W. Alex Marvin, Srinivas Rangarajan, and Prodromos Daoutidis. Automated generation and optimal selection of biofuel-gasoline blends and their synthesis routes. Energy & Fuels, 27(6):3585–3594, 2013, http://pubs.acs.org/doi/pdf/10.1021/ef4003318.

[129] W. T. Wipke, S. Krishnan, and G. I. Ouchi. Hash functions for rapid storage and retrieval of chemical structures. Journal of Chemical Information and Computer Science, 18(1):32 – 37, 1978.

[130] W. T. Wipke and T. M. Dyott. Steriochemically unique naming algorithm. Journal of the American Chemical Society, 96(15):4834 – 4842, 1974.

[131] W. D. Ihlenfeldt and J. Gasteiger. Hash codes for the identification and classi- fication of molecular structure elements. Journal of Computational Chemistry, 15(8):793 – 813, 1994.

[132] Pamela G. Coxson and Kenneth B. Bischoff. Lumping strategy. 1. introductory techniques and applications of cluster analysis. In- dustrial & Engineering Chemistry Research, 26(6):1239–1248, 1987, http://pubs.acs.org/doi/pdf/10.1021/ie00066a031.

[133] M. Salciccioli, S. M. Edie, and D. G. Vlachos. Adsorption of acid, ester, and ether functional groups on Pt: Fast prediction of thermo- chemical properties of adsorbed oxygenates via DFT-based group additivity methods. The Journal of Physical Chemistry C, 116(2):1873–1886, 2012, http://pubs.acs.org/doi/pdf/10.1021/jp2091413.

[134] A. B. Mhadeshwar, H. Wang, and D. G. Vlachos. Thermodynamic consistency in microkinetic development of surface reaction mechanisms. The Journal of Physical Chemistry B, 107(46):12721–12733, 2003.

[135] David M. Matheu, William H. Green, and Jeffrey M. Grenda. Capturing pressure- dependence in automated mechanism generation: Reactions through cycloalkyl intermediates. International Journal of Chemical Kinetics, 35(3):95–119, 2003.

[136] S. Rangarajan, A. Bhan, and P. Daoutidis. Language-oriented rule-based reaction network generation and analysis: Applications of RING. Computers & Chemical Engineering, 46:141–152, 2012.

[137] Yogesh V. Joshi, Aditya Bhan, and Kendall T. Thomson. Dft-based reaction path- way analysis of hexadiene cyclization via carbenium ion intermediates: mechanistic study of light alkane aromatization catalysis. The Journal of Physical Chemistry B, 108(3):971–980, 2004, http://pubs.acs.org/doi/pdf/10.1021/jp036205m. BIBLIOGRAPHY 170

[138] Yogesh V. Joshi and Kendall T. Thomson. Embedded cluster (qm/mm) inves- tigation of {C6} diene cyclization in hzsm-5. Journal of Catalysis, 230(2):440 – 463, 2005.

[139] Yogesh V. Joshi and Kendall T. Thomson. Brønsted acid catalyzed cyclization of c7 and c8 dienes in hzsm-5: A hybrid qm/mm study and comparison with c6 diene cyclization. The Journal of Physical Chemistry C, 112(33):12825–12833, 2008, http://pubs.acs.org/doi/pdf/10.1021/jp712071k.

[140] V.B. Kazansky, M.V. Frash, and R.A. van Santen. A quantum-chemical study of hydride transfer in catalytic transformations of paraffins on zeolites. pathways through adsorbed nonclassical carbonium ions. Catalysis Letters, 48(1-2):61–67, 1997.

[141] V.B. Kazansky, M.V. Frash, and R.A. van Santen. Quantum-chemical study of hydride transfer in catalytic transformation of paraffins on zeolites. In Son-Ki Ihm Hakze Chon and Young Sun Uh, editors, Progress in Zeolite and Microporous Materials Preceedings of the 11th International Zeolite Conference, volume 105 of Studies in Surface Science and Catalysis, pages 2283 – 2290. Elsevier, 1997.

[142] J.S. Buchanan, J.G. Santiesteban, and W.O. Haag. Mechanistic considerations in acid-catalyzed cracking of olefins. Journal of Catalysis, 158(1):279 – 287, 1996.

[143] Dmitri B. Lukyanov, N. Suor Gnep, and Michel R. Guisnet. Ki- netic modeling of propane aromatization reaction over hzsm-5 and gahzsm- 5. Industrial & Engineering Chemistry Research, 34(2):516–523, 1995, http://pubs.acs.org/doi/pdf/10.1021/ie00041a012.

[144] Patricia Cheung, Aditya Bhan, Glenn J. Sunley, David J. Law, and Enrique Igle- sia. Site requirements and elementary steps in dimethyl ether carbonylation cat- alyzed by acidic zeolites. Journal of Catalysis, 245(1):110 – 123, 2007.

[145] O Kresnawahjuesa, R.J Gorte, and David White. Characterization of acylating intermediates formed on h-zsm-5. Journal of Molecular Catalysis A: Chemical, 208(12):175 – 185, 2004. BIBLIOGRAPHY 171

[146] Rafael Gomez-Bombarelli, Marina Gonzalez-Perez, Maria Teresa Perez-Prior, Emilio Calle, and Julio Casado. Computational calculation of equilibrium con- stants: Addition to carbonyl compounds. The Journal of Physical Chemistry A, 113(42):11423–11428, 2009.

[147] S. Rangarajan, A. Bhan, and P. Daoutidis. Identification and analysis of synthesis routes in complex catalytic reaction networks for biomass upgrading. Applied Catalysis B: Environmental, In Press, DOI: 10.1016/j.apcatb.2013.01.030, 2012.

[148] Y. Roman-Leshkov, J. Chheda, and J. Dumesic. Phase modifiers promote efficient production of hydroxymethylfurfural from fructose. Science, 312(5782):1933–1937, 2006, http://www.sciencemag.org/cgi/reprint/312/5782/1933.pdf.

[149] E.L. Kunkes, D. A. Simonetti, R. M. West, J. C. Serrano-Ruiz, C. A. Gartner, and J. A. Dumesic. Catalytic conversion of biomass to monofunctional hydrocarbons and targeted liquid-fuel classes. Science, 322(5900):417–421, OCT 17 2008.

[150] George W. Huber, Juben N. Chheda, Christopher J. Barrett, and James A. Dumesic. Production of liquid alkanes by aqueous-phase processing of biomass- derived carbohydrates. Science, 308(5727):1446–1450, 2005.

[151] T. Werpy and et al. Top value added chemicals from biomass, volume 1 - results of screening for potential candidates from sugars and synthe- sis gas. Technical report, U.S. Department of Energy, 2004. available at www.eere.energy.gov/biomass/pdfs/35523.pdf.

[152] Joseph J. Bozell and Gene R. Petersen. Technology development for the produc- tion of biobased products from biorefinery carbohydrates-the US Department of Energy’s “top 10” revisited. Green Chemistry, 12(4):539–554, 2010.

[153] Francesco Cherubini and Anders H. Strmman. Chemicals from lignocellulosic biomass: opportunities, perspectives, and potential of biorefinery systems. Biofu- els, Bioproducts and Biorefining, 5(5):548–561, 2011.

[154] Jacco van Haveren, Elinor L. Scott, and Johan Sanders. Bulk chemicals from biomass. Biofuels, Bioproducts and Biorefining, 2(1):41–57, 2008.

[155] Pierre Gallezot. Conversion of biomass to selected chemical products. Chem. Soc. Rev., 41:1538–1558, 2012. BIBLIOGRAPHY 172

[156] Rajeev S. Assary, Paul C. Redfern, Jeff R. Hammond, Jeffrey Greeley, and Larry A. Curtiss. Predicted thermochemistry for chemical conversions of 5- hydroxymethylfurfural. Chemical Physics Letters, 497(13):123 – 128, 2010.

[157] Maarten K. Sabbe, Mark Saeys, Marie-Franoise Reyniers, Guy B. Marin, Veronique Van Speybroeck, and Michel Waroquier. Group additive values for the gas phase standard enthalpy of formation of hydrocarbons and hydrocar- bon radicals. The Journal of Physical Chemistry A, 109(33):7466–7480, 2005, http://pubs.acs.org/doi/pdf/10.1021/jp050484r.

[158] Maarten K. Sabbe, Freija De Vleeschouwer, Marie-Francoise Reyniers, Michel Waroquier, and Guy B. Marin. First principles based group additive values for the gas phase standard entropy and heat capacity of hydrocarbons and hydrocar- bon radicals. The Journal of Physical Chemistry A, 112(47):12235–12251, 2008, http://pubs.acs.org/doi/pdf/10.1021/jp807526n.

[159] Shumaila S. Khan, Xinrui Yu, Jeffrey R. Wade, R. Dean Malmgren, and Linda J. Broadbelt. Thermochemistry of radicals and molecules relevant to atmospheric chemistry: Determination of group additivity values using G3//B3LYP theory. The Journal of Physical Chemistry A, 113(17):5176–5194, 2009, http://pubs.acs.org/doi/pdf/10.1021/jp809361y.

[160] Scott A. Wildman and Gordon M. Crippen. Prediction of physicochemical param- eters by atomic contributions. Journal of Chemical Information and Computer Sciences, 39(5):868–873, 1999.

[161] James J. P. Stewart. Optimization of parameters for semiempirical methods i. method. Journal of Computational Chemistry, 10(2):209–220, 1989.

[162] Noel O’Boyle, Michael Banck, Craig James, Chris Morley, Tim Vandermeersch, and Geoffrey Hutchison. : An open chemical toolbox. Journal of Cheminformatics, 3(1):33, 2011.

[163] Krister Holmberg, Bo J¨onsson,Bengt Kronberg, and Bj¨ornLindman. Introduction to Surfactants, pages 1–37. John Wiley & Sons, Ltd, 2003.

[164] C.L. Edwards. Polyoxyethylene Alcohols, page 111. CRC Press, 1997. BIBLIOGRAPHY 173

[165] Jiwei Hu, Xiaoyi Zhang, and Zhengwu Wang. A review on progress in qspr studies for surfactants. International Journal of Molecular Sciences, 11(3):1020–1047, 2010.

[166] Paul D. T. Huibers, Victor S. Lobanov, Alan R. Katritzky, Dinesh O. Shah, and Mati Karelson. Prediction of critical micelle concentration using a quantita- tive structureproperty relationship approach. 1. nonionic surfactants. Langmuir, 12(6):1462–1470, 1996, http://pubs.acs.org/doi/pdf/10.1021/la950581j.

[167] Paul D.T. Huibers, Dinesh O. Shah, and Alan R. Katritzky. Predicting surfactant cloud point from molecular structure. Journal of Colloid and Interface Science, 193(1):132 – 136, 1997.

[168] Mei-Ling Chen, Zheng-Wu Wang, and Han-Jun Duan. Qspr for hlb values of nonionic surfactants using two simple descriptors. Jour- nal of Dispersion Science and Technology, 30(10):1481–1485, 2009, http://www.tandfonline.com/doi/pdf/10.1080/01932690903123338.

[169] Zheng-Wu Wang, Jun-Li Feng, Hai-Jun Wang, Zheng-Gang Cui, and Gan-Zuo Li. Effectiveness of surface tension reduction by nonionic surfactants with quantitative structure-property relationship approach. Journal of Dispersion Science and Tech- nology, 26(4):441–447, 2005, http://www.tandfonline.com/doi/pdf/10.1081/DIS- 200054572.

[170] D.W. Roberts. Aquatic toxicity: Are surfactant properties relevant? Journal of Surfactants and Detergents, 3(3):309–315, 2000.

[171] UdoR. Kreutzer. Manufacture of fatty alcohols based on natural fats and oils. Journal of the American Oil Chemists Society, 61(2):343–348, 1984.

[172] Ernst Billig and David R. Bryant. Oxo Process. John Wiley & Sons, Inc., 2000.

[173] G. Knothe. Synthesis, applications, and characterization of guerbet compounds and their derivatives. Lipid Technology, 14:101–104, 2002.

[174] Jr. O’Lenick, AnthonyJ. Guerbet chemistry. Journal of Surfactants and Deter- gents, 4(3):311–315, 2001. BIBLIOGRAPHY 174

[175] Eric J. Steen, Yisheng Kang, Gregory Bokinsky, Zhihao Hu, Andreas Schirmer, Amy McClure, Stephen B. del Cardayre, and Jay D. Keasling. Microbial pro- duction of fatty-acid-derived fuels and chemicals from plant biomass. Nature, 463(7280):559–U182, JAN 28 2010.

[176] Codexis. http://www.codexis.com/chemicals, Aug 2012.

[177] Hideshi. Hattori. Heterogeneous basic catalysis. Chemical Reviews, 95(3):537–558, 1995, http://pubs.acs.org/doi/pdf/10.1021/cr00035a005.

[178] R. A. Sheldon and H van Bekkum, editors. Fine Chemicals through Heterogeneous Catalysis. WILEY-VCH Verlag GmbH, 2007.

[179] U. Meyer, H. Gorzawski, and W.F. Hlderich. Michael addition of ethyl acrylate and acetone over solid bases. Catalysis Letters, 59(2-4):201–206, 1999.

[180] Avelino Corma, Olalla de la Torre, Michael Renz, and Nicolas Villandier. Produc- tion of High-Quality Diesel from Biomass Waste Products. Angewandte Chemie- International Edition, 50(10):2375–2378, 2011.

[181] I. T. Horvath, editor. Encyclopedia of Catalysis. John Wiley & Sons, Inc., 2002.

[182] Christian A. Gartner, Juan Carlos Serrano-Ruiz, Drew J. Braden, and James A. Dumesic. Catalytic upgrading of bio-oils by ketonization. ChemSusChem, 2(12):1121–1124, 2009.

[183] M. Ash and I. Ash, editors. Handbook of Green Chemicals. Synapse Information Resources Inc; 2nd Ed., 2004.

[184] Surfonic l12-8 technical bulletin huntsman corporation; huntsman metalworking chemicals product information brochure.

[185] TERGITOL TMN-6, The Dow Chemical Company, (90% surfactant) technical data sheet. http://www.dow.com/surfactants/products/branched.htm, Aug 2012.

[186] H. F. Rase, editor. Handbook of Commercial Catalysts: Heterogeneous Catalysts. CRC Press, 2000.

[187] J.I. Di Cosimo, C.R. Apesteguia, M.J.L. Gines, and E. Iglesia. Structural require-

ments and reaction pathways in condensation reactions of alcohols on mgyalox catalysts. Journal of Catalysis, 190(2):261 – 275, 2000. BIBLIOGRAPHY 175

[188] Maria J. Climent, Avelino Corma, and Sara Iborra. Heterogeneous catalysts for the one-pot synthesis of chemicals and fine chemicals. Chemical Reviews, 111(2):1072–1133, 2011, http://pubs.acs.org/doi/pdf/10.1021/cr1002084.

[189] J. C. Serrano-Ruiz and J. A. Dumesic. Catalytic upgrading of lactic acid to fuels and chemicals by dehydration/hydrogenation and C-C coupling reactions. Green Chemistry, 11(8):1101–1104, 2009.

[190] Garry C. Gunter, Robert H. Langford, James E. Jackson, and Dennis J. Miller. Catalysts and supports for conversion of lactic acid to acrylic acid and 2,3- pentanedione. Industrial & Engineering Chemistry Research, 34(3):974–980, 1995.

[191] Mei Chia, Yomaira J. Pagan-Torres, David Hibbitts, Qiaohua Tan, Hien N. Pham, Abhaya K. Datye, Matthew Neurock, Robert J. Davis, and James A. Dumesic. Selective hydrogenolysis of polyols and cyclic ethers over bifunctional surface sites on rhodiumrhenium catalysts. Journal of the American Chemical Society, 133(32):12675–12689, 2011, http://pubs.acs.org/doi/pdf/10.1021/ja2038358.

[192] Ryan M. West, Zhen Y. Liu, Maximilian Peter, and James A. Dumesic. Liq- uid alkanes with targeted molecular weights from biomass-derived carbohydrates. ChemSusChem, 1(5):417–424, 2008.

[193] R. Johnson, editor. Miller & Freund’s Probability and Statistics for Engineers, Seventh Ed. Prentice Hall, 2004.

[194] N. P. Cheremisinoff, editor. Industrial Solvents Handbook, 2nd Edition. Marcel Dekker Inc., 2003.

[195] Edward L. Kunkes, Elif I. Grbz, and James A. Dumesic. Vapour-phase c-c cou- pling reactions of biomass-derived oxygenates over pd/cezrox catalysts. Journal of Catalysis, 266(2):236 – 249, 2009.

[196] PaulaA. Zapata, Jimmy Faria, M. Pilar Ruiz, and DanielE. Resasco. Condensa- tion/hydrogenation of biomass-derived oxygenates in water/oil emulsions stabi- lized by nanohybrid catalysts. Topics in Catalysis, 55(1-2):38–52, 2012.

[197] AlbertJ. Leo and David Hoekman. Calculating log p(oct) with no missing frag- ments;the problem of estimating new interaction parameters. Perspectives in Drug Discovery and Design, 18(1):19–38, 2000. BIBLIOGRAPHY 176

[198] Biobyte Corporation. Clogp. http://biobyte.com/, December 2012.

[199] James Sangster. Octanol-water partition coefficients of simple organic compounds. Journal of Physical and Chemical Reference Data, 18(3):1111–1229, 1989.

[200] Logkow database. http://logkow.cisti.nrc.ca/logkow/index.jsp, December 2012.

[201] Prasenjeet Ghosh, Karlton J. Hickey, and Stephen B. Jaffe. Devel- opment of a detailed gasoline composition-based octane model. In- dustrial & Engineering Chemistry Research, 45(1):337–345, 2006, http://pubs.acs.org/doi/pdf/10.1021/ie050811h.

[202] Prasenjeet Ghosh and Stephen B. Jaffe. Detailed composition-based model for predicting the cetane number of diesel fuels. Industrial & Engineering Chemistry Research, 45(1):346–351, 2006, http://pubs.acs.org/doi/pdf/10.1021/ie0508132.

[203] Diego Alonso Saldana, Laurie Starck, Pascal Mougin, Bernard Rousseau, Lu- divine Pidol, Nicolas Jeuland, and Benoit Creton. Flash point and cetane number predictions for fuel compounds using quantitative structure prop- erty relationship (qspr) methods. Energy & Fuels, 25(9):3900–3908, 2011, http://pubs.acs.org/doi/pdf/10.1021/ef200795j.

[204] Alan R. Katritzky, Minati Kuanar, Svetoslav Slavov, C. Dennis Hall, Mati Karel- son, Iiris Kahn, and Dimitar A. Dobchev. Quantitative correlation of physical and chemical properties with chemical structure: Utility for prediction. Chemical Re- views, 110(10):5714–5789, 2010, http://pubs.acs.org/doi/pdf/10.1021/cr900238d.

[205] H. Knozinger. Dehydration of alcohols on aluminum oxide. Angewandte Chemie International Edition in English, 7(10):791–805, 1968.

[206] Helmut Knozinger, Horst Buhl, and Karel Kochloefl. The dehydration of alcohols on alumina: Xiv. reactivity and mechanism. Journal of Catalysis, 24(1):57 – 68, 1972.

[207] C. A. Gaertner, J. C. Serrano-Ruiz, D. J. Braden, and J. A. Dumesic. Ketoniza- tion Reactions of Carboxylic Acids and Esters over Ceria-Zirconia as Biomass- Upgrading Processes. Industrial & Engineering Chemistry Research, 49(13):6027– 6033, JUL 7 2010. BIBLIOGRAPHY 177

[208] Alessandra Beretta, Enrico Tronconi, Pio Forzatti, Italo Pasquon, Emilio Micheli, Lorenzo Tagliabue, and Gian Battista Antonelli. Development of a mechanistic kinetic model of the higher alcohol synthesis over a cs-doped zn/cr/o catalyst. 1. model derivation and data fitting. Industrial & Engineering Chemistry Research, 35(7):2144–2153, 1996.

[209] Michael H. Abraham, Robert E. Smith, Ron Luchtefeld, Aaron J. Boorem, Ren- sheng Luo, and William E. Acree. Prediction of solubility of drugs and other compounds in organic solvents. Journal of Pharmaceutical Sciences, 99(3):1500– 1515, 2010.

[210] Christina Mintz, Michael Clark, William E. Acree, and Michael H. Abraham. Enthalpy of solvation correlations for gaseous solutes dissolved in water and in 1-octanol based on the Abraham model. Journal of Chemical Information and Modeling, 47(1):115–121, 2007.

[211] Amrit Jalan, Robert W. Ashcraft, Richard H. West, and William H. Green. Pre- dicting solvation energies for kinetic modeling. Annu. Rep. Prog. Chem., Sect. C: Phys. Chem., 106:211–258, 2010.

[212] Yu-Chuan Lin and George W. Huber. The critical role of heterogeneous catalysis in lignocellulosic biomass conversion. Energy Environ. Sci., 2:68–80, 2009.

[213] Pierre Gallezot and Alain Kiennemann. Conversion of Biomass on Solid Catalysts. Wiley-VCH Verlag GmbH & Co. KGaA, 2008.

[214] Jens K. Nørskov, Frank Abild-Pedersen, Felix Studt, and Thomas Bli- gaard. Density functional theory in surface chemistry and catalysis. Proceedings of the National Academy of Sciences, 108(3):937–943, 2011, http://www.pnas.org/content/108/3/937.full.pdf+html.

[215] P. Ferrin, D. Simonetti, S. Kandoi, E. Kunkes, J. A. Dumesic, J. K. Nørskov, and M. Mavrikakis. Modeling ethanol decomposition on transition metals: A combined application of scaling and Brφnsted-Evans-Polanyi relations. Journal of the American Chemical Society, 131(16):5809–5815, 2009.

[216] Shampa Kandoi, Jeff Greeley, Dante Simonetti, John Shabaker, James A. Dumesic, and Manos Mavrikakis. Reaction kinetics of ethylene glycol reforming BIBLIOGRAPHY 178

over platinum in the vapor versus aqueous phases. The Journal of Physical Chem- istry C, 115(4):961–971, 2011, http://pubs.acs.org/doi/pdf/10.1021/jp104136s.

[217] M. Salciccioli and D. G. Vlachos. Kinetic modeling of pt catalyzed and computation-driven catalyst discovery for ethylene glycol decomposition. ACS Catalysis, 1(10):1246–1256, 2011.

[218] Simon H. Pang and J. Will Medlin. Adsorption and reaction of furfural and furfuryl alcohol on pd(111): Unique reaction pathways for multifunctional reagents. ACS Catalysis, 1(10):1272–1283, 2011, http://pubs.acs.org/doi/pdf/10.1021/cs200226h.

[219] Vassili Vorotnikov, Giannis Mpourmpakis, and Dionisios G. Vla- chos. Dft study of furfural conversion to furan, furfuryl alcohol, and 2-methylfuran on pd(111). ACS Catalysis, 2(12):2496–2504, 2012, http://pubs.acs.org/doi/pdf/10.1021/cs300395a.

[220] M. Salciccioli, Y. Chen, and D. G. Vlachos. Density functional theory-derived group additivity and linear scaling methods for prediction of oxygenate stabil- ity on metal catalysts: Adsorption of open-ring alcohol and polyol dehydrogena- tion intermediates on pt-based metals. The Journal of Physical Chemistry C, 114(47):20155–20166, 2010, http://pubs.acs.org/doi/pdf/10.1021/jp107836a.

[221] Bin Liu and Jeffrey Greeley. Decomposition pathways of glycerol via ch, oh, and cc bond scission on pt(111): A density functional theory study. The Journal of Physical Chemistry C, 115(40):19702–19709, 2011, http://pubs.acs.org/doi/pdf/10.1021/jp202923w.

[222] Bin Liu and Jeffrey Greeley. Density functional theory study of selectivity con- siderations for cc versus co bond scission in glycerol decomposition on pt(111). Topics in Catalysis, 55(5-6):280–289, 2012.

[223] R Alcala, M Mavrikakis, and JA Dumesic. DFT studies for cleavage of C-C and C- O bonds in surface species derived from ethanol on Pt(111). Journal of Catalysis, 218(1):178–190, 2003.

[224] Y. Chen, M. Salciccioli, and D. G. Vlachos. An efficient reaction pathway search method applied to the decomposition of glycerol on platinum. The Journal of Physical Chemistry C, 115(38):18707–18720, 2011. BIBLIOGRAPHY 179

[225] Jonathan E. Sutton and Dionisios G. Vlachos. A theoretical and computational analysis of linear free energy relations for the esti- mation of activation energies. ACS Catalysis, 2(8):1624–1634, 2012, http://pubs.acs.org/doi/pdf/10.1021/cs3003269.

[226] F. Abild-Pedersen, J. Greeley, F. Studt, J. Rossmeisl, T. R. Munter, P. G. Moses, E. Sk´ulason,T. Bligaard, and J. K. Nørskov. Scaling properties of adsorption energies for hydrogen-containing molecules on transition-metal surfaces. Phys. Rev. Lett., 99:016105, Jul 2007.

[227] Jonathan E. Sutton and Dionisios G. Vlachos. Error estimates in semi-empirical estimation methods of surface reactions. Journal of Catalysis, 297(0):202 – 216, 2013.

[228] David Coll, Francoise Delbecq, Yosslen Aray, and Philippe Sautet. Stability of intermediates in the glycerol hydrogenolysis on transition metal catalysts from first principles. Phys. Chem. Chem. Phys., 13:1448–1456, 2011.

[229] D.A. Simonetti, E.L. Kunkes, and J.A. Dumesic. Gas-phase conversion of glycerol to synthesis gas over carbon-supported platinum and platinum-rhenium catalysts. Journal of Catalysis, 247(2):298 – 306, 2007.

[230] JW Shabaker, GW Huber, and JA Dumesic. Aqueous-phase reforming of oxy- genated hydrocarbons over Sn-modified Ni catalysts. Journal of Catalysis.

[231] Orest Skoplyak, Mark A. Barteau, and Jingguang G. Chen. Enhancing H-2 and CO Production from Glycerol Using Bimetallic Surfaces. ChemSusChem, 1(6):524–526, 2008.

[232] Shuai Wang and Haichao Liu. Selective hydrogenolysis of glycerol to propylene glycol on CuZnO catalysts. Catalysis Letters, 117:62–67, 2007. 10.1007/s10562- 007-9106-9.

[233] Erin P. Maris and Robert J. Davis. Hydrogenolysis of glycerol over carbon- supported ru and pt catalysts. Journal of Catalysis, 249(2):328 – 337, 2007.

[234] Mohanprasad A. Dasari, Pim-Pahn Kiatsimkul, Willam R. Sutterlin, and Galen J. Suppes. Low-pressure hydrogenolysis of glycerol to propylene glycol. Applied Catalysis A: General, 281(12):225 – 231, 2005. BIBLIOGRAPHY 180

[235] Florian Auneau, Carine Michel, Franoise Delbecq, Catherine Pinel, and Philippe Sautet. Unravelling the mechanism of glycerol hydrogenolysis over rhodium cata- lyst through combined experimentaltheoretical investigations. Chemistry A Eu- ropean Journal, 17(50):14288–14299, 2011.

[236] Shampa Kandoi, Jeff Greeley, MarcoA. Sanchez-Castillo, StevenT. Evans, AmitA. Gokhale, JamesA. Dumesic, and Manos Mavrikakis. Prediction of experimental methanol decomposition rates on platinum from first principles. Topics in Catal- ysis, 37(1):17–28, 2006.

[237] G. Jones, T. Bligaard, F. Abild-Pedersen, and J. K. Norskov. Using scaling rela- tions to understand trends in the catalytic activity of transition metals. Journal of Physics-Condensed Matter, 20(6):064239, 2008.

[238] Glenn Jones, Felix Studt, Frank Abild-Pedersen, Jens K. Nørskov, and Thomas Bligaard. Scaling relationships for adsorption energies of {C2} hydrocarbons on transition metal surfaces. Chemical Engineering Science, 66(24):6318 – 6323, 2011. ¡ce:title¿Novel Gas Conversion Symposium- Lyon 2010, C1-C4 Catalytic Processes for the Production of Chemicals and Fuels¡/ce:title¿.

[239] Jens S. Hummelshøj, Frank Abild-Pedersen, Felix Studt, Thomas Bligaard, and Jens K. Nørskov. Catapp: A web application for surface chemistry and heteroge- neous catalysis. Angewandte Chemie International Edition, 51(1):272–274, 2012.

[240] Bin Liu and Jeffrey Greeley. A density functional theory analysis of trends in glycerol decomposition on close-packed transition metal surfaces. Phys. Chem. Chem. Phys., 15:6475–6485, 2013.

[241] Lars C. Grabow, Amit A. Gokhale, Steven T. Evans, James A. Dumesic, and Manos Mavrikakis. Mechanism of the water gas shift reaction on pt: First princi- ples, experiments, and microkinetic modeling. The Journal of Physical Chemistry C, 112(12):4608–4617, 2008, http://pubs.acs.org/doi/pdf/10.1021/jp7099702.

[242] Hanne Falsig, Britt Hvolbk, Iben S. Kristensen, Tao Jiang, Thomas Bligaard, Claus H. Christensen, and Jens K. Nørskov. Trends in the catalytic co oxidation activity of nanoparticles. Angewandte Chemie International Edition, 47(26):4835– 4839, 2008. BIBLIOGRAPHY 181

[243] Edward L. Kunkes, Dante A. Simonetti, James A. Dumesic, William D. Pyrz, Luis E. Murillo, Jingguang G. Chen, and Douglas J. Buttrey. The role of rhenium in the conversion of glycerol to synthesis gas over carbon supported platinumrhe- nium catalysts. Journal of Catalysis, 260(1):164 – 177, 2008.

[244] Carine Michel, Florian Auneau, Franoise Delbecq, and Philippe Sautet. Ch versus oh bond dissociation for alcohols on a rh(111) surface: A strong assis- tance from hydrogen bonded neighbors. ACS Catalysis, 1(10):1430–1440, 2011, http://pubs.acs.org/doi/pdf/10.1021/cs200370g.

[245] Cuong M. Nguyen, Bart A. De Moor, Marie-Franoise Reyniers, and Guy B. Marin. Physisorption and chemisorption of linear alkenes in zeolites: A combined qm-pot(mp2//b3lyp:gulp)statistical thermodynamics study. The Journal of Physical Chemistry C, 115(48):23831–23847, 2011, http://pubs.acs.org/doi/pdf/10.1021/jp2067606.

[246] Bart A. De Moor, Marie-Francoise Reyniers, Oliver C. Gobin, Johannes A. Lercher, and Guy B. Marin. Adsorption of c2-c8 n-alkanes in ze- olites. The Journal of Physical Chemistry C, 115(4):1204–1219, 2011, http://pubs.acs.org/doi/pdf/10.1021/jp106536m.

[247] James F. Haw, Weiguo Song, David M. Marcus, and John B. Nicholas. The mechanism of methanol to hydrocarbon catalysis. Accounts of Chemical Research, 36(5):317–326, 2003, http://pubs.acs.org/doi/pdf/10.1021/ar020006o.

[248] David Lesthaeghe, Veronique Van Speybroeck, Guy B. Marin, and Michel Waro- quier. Understanding the failure of direct C-C coupling in the zeolite-catalyzed methanol-to-olefin process. Angewandte Chemie, 118(11):1746–1751, 2006.

[249] Samia Ilias and Aditya Bhan. Mechanism of the catalytic conver- sion of methanol to hydrocarbons. ACS Catalysis, 3(1):18–31, 2013, http://pubs.acs.org/doi/pdf/10.1021/cs3006583.

[250] Samia Ilias and Aditya Bhan. Tuning the selectivity of methanol-to-hydrocarbons conversion on h-zsm-5 by co-processing olefin or aromatic compounds. Journal of Catalysis, 290(0):186 – 192, 2012. BIBLIOGRAPHY 182

[251] M. N. Mazar, S. Al-Hashimi, A. Bhan, and M. Cococcioni. Methyla- tion of ethene by surface methoxides: A periodic pbe+d study across ze- olites. The Journal of Physical Chemistry C, 116(36):19385–19395, 2012, http://pubs.acs.org/doi/pdf/10.1021/jp306003e.

[252] Cuong M. Nguyen, Bart A. De Moor, Marie-Francoise Reyniers, and Guy B. Marin. Isobutene protonation in h-fau, h-mor, h-zsm-5, and h- zsm-22. The Journal of Physical Chemistry C, 116(34):18236–18249, 2012, http://pubs.acs.org/doi/pdf/10.1021/jp304081k.

[253] Annemieke van de Runstraat, Joop van Grondelle, and Rutger A. van Santen. Microkinetics modeling of the hydroisomerization of n- hexane. Industrial & Engineering Chemistry Research, 36(8):3116–3125, 1997, http://pubs.acs.org/doi/pdf/10.1021/ie960661y.

[254] David M. McCann, David Lesthaeghe, Philip W. Kletnieks, Darryl R. Guenther, Miranda J. Hayman, Veronique Van Speybroeck, Michel Waroquier, and James F. Haw. A complete catalytic cycle for supramolecular methanol-to-olefins conversion by linking theory with experiment. Angewandte Chemie International Edition, 47(28):5179–5182, 2008.

[255] Rasmus Y. Brogaard, Bert M. Weckhuysen, and Jens K. Nrskov. Guest-host interactions of arenes in h-zsm-5 and their impact on methanol-to-hydrocarbons deactivation processes. Journal of Catalysis, 300(0):235 – 241, 2013.

[256] Ian M. Hill, Saleh Al Hashimi, and Aditya Bhan. Kinetics and mechanism of olefin methylation reactions on zeolites. Journal of Catalysis, 285(1):115 – 123, 2012.

[257] Ian M. Hill, Yong Sam Ng, and Aditya Bhan. Kinetics of butene isomer methy- lation with dimethyl ether over zeolite catalysts. ACS Catalysis, 2(8):1742–1748, 2012, http://pubs.acs.org/doi/pdf/10.1021/cs300317p.

[258] Matthias Vandichel, David Lesthaeghe, Jeroen Van der Mynsbrugge, Michel Waroquier, and Veronique Van Speybroeck. Assembly of cyclic hydrocarbons from ethene and propene in acid zeolite catalysis to produce active catalytic sites for {MTO} conversion. Journal of Catalysis, 271(1):67 – 78, 2010. BIBLIOGRAPHY 183

[259] David Lesthaeghe, Bart De Sterck, Veronique Van Speybroeck, Guy B. Marin, and Michel Waroquier. Zeolite shape-selectivity in the gem-methylation of aromatic hydrocarbons. Angewandte Chemie International Edition, 46(8):1311–1314, 2007.

[260] David Lesthaeghe, Annelies Horre, Michel Waroquier, Guy B. Marin, and Veronique Van Speybroeck. Theoretical insights on methylbenzene side-chain growth in zsm-5 zeolites for methanol-to-olefin conversion. Chemistry A Eu- ropean Journal, 15(41):10803–10808, 2009.

[261] Vikram Seshadri and Phillip R. Westmoreland. Concerted reactions and mechanism of glucose pyrolysis and implications for cellulose kinet- ics. The Journal of Physical Chemistry A, 116(49):11997–12013, 2012, http://pubs.acs.org/doi/pdf/10.1021/jp3085099.

[262] ATHENA Visual Studio. http://www.athenavisual.com/, July 2013.

[263] CHEMKIN. http://www.reactiondesign.com/images/CHEMKIN-BRO_0910. pdf, accessed Dec 2010.

[264] General Algebraic Modeling System. www.gams.com, 2011.

[265] Andreas Wachter and Lorenz T. Biegler. On the implementation of an interior- point filter line-search algorithm for large-scale nonlinear programming. Mathe- matical Programming, 106(1):25–57, 2006.

[266] Marcio Schwaab, Lvia P. Lemos, and Jos Carlos Pinto. Optimum reference tem- perature for reparameterization of the arrhenius equation. part 2: Problems in- volving multiple reparameterizations. Chemical Engineering Science, 63(11):2895 – 2906, 2008.

[267] L. T. Biegler. Nonlinear Programming: Concepts, Algorithms, and Applications to Chemical Processes. SIAM-Society for Industrial and Applied Mathematics, 2010.

[268] CANTERA. http://sourceforge.net/projects/cantera/, July 2013.

[269] William E. Schiesser. The Numerical Method of Lines: Integration of Partial Differential Equations. Academic Press, 1991. BIBLIOGRAPHY 184

[270] C. Kravaris, J. Hahn, and Y. Chu. Advances and selected recent developments in state and parameter estimation. Computers & Chemical Engineering, 51(0):111 – 123, 2013.

[271] Edmond Chow and Yousef Saad. Approximate inverse preconditioners for general sparse matrices. Technical report, 1994.

[272] Catapp. http://suncat.slac.stanford.edu/catapp/, July 2013. Appendix

A Inputs into RING for studying the Fructose-to-HMF system (see Chapter 3)

//reactant definition
input reactant "O1C(CO)(O)C(O)C(O)C1(CO)"
input reactant "[H+]"

//definition of a characteristic called primaryCarbenium
define characteristic primaryCarbenium on mol {
  fragment a {
    C+ labeled 1 {! connected to >1 heavy atom}
  }
  mol contains a
}
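The primaryCarbenium characteristic flags a molecule that contains a C+ not connected to more than one heavy atom. A minimal Python sketch of the same test follows. The dictionary-based molecule representation (atom index mapped to a RING-style atom type, and a frozenset of two indices mapped to a bond order) is a hypothetical stand-in chosen for illustration, not RING's internal data structure. The names `is_heavy`, `heavy_neighbor_count`, and `is_primary_carbenium` are likewise hypothetical, and hydrogens are assumed to be the only non-heavy atoms.

```python
from typing import Dict, FrozenSet

# Hypothetical molecule representation (not RING's internal one):
#   atoms: atom index -> RING-style atom type such as "C", "C+", "O", "H"
#   bonds: frozenset of two atom indices -> bond order
Atoms = Dict[int, str]
Bonds = Dict[FrozenSet[int], int]


def is_heavy(atom_type: str) -> bool:
    """Assume every non-hydrogen atom type counts as a heavy atom."""
    return not atom_type.startswith("H")


def heavy_neighbor_count(atoms: Atoms, bonds: Bonds, i: int) -> int:
    """Count the heavy atoms bonded to atom i."""
    count = 0
    for pair in bonds:
        if i in pair:
            (j,) = pair - {i}
            if is_heavy(atoms[j]):
                count += 1
    return count


def is_primary_carbenium(atoms: Atoms, bonds: Bonds) -> bool:
    """Mirror of primaryCarbenium: some C+ is NOT connected to >1 heavy atom."""
    return any(
        atom_type == "C+" and heavy_neighbor_count(atoms, bonds, i) <= 1
        for i, atom_type in atoms.items()
    )


# Ethyl cation CH3-CH2+ (hydrogens on C+ shown; hydrogens on CH3 omitted)
ethyl = ({0: "C", 1: "C+", 2: "H", 3: "H"},
         {frozenset({0, 1}): 1, frozenset({1, 2}): 1, frozenset({1, 3}): 1})
# Isopropyl cation (CH3)2CH+ (hydrogens on the methyl groups omitted)
isopropyl = ({0: "C", 1: "C+", 2: "C", 3: "H"},
             {frozenset({0, 1}): 1, frozenset({1, 2}): 1, frozenset({1, 3}): 1})
print(is_primary_carbenium(*ethyl), is_primary_carbenium(*isopropyl))  # True False
```

Because only heavy neighbors are counted, hydrogens can be left implicit on any atom without changing the result.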

//global constraints specification
global constraints on Molecule {
  //declaration of a fragment named 'a'
  fragment a {
    C+ labeled 1
    $ labeled 2 double bond to 1
  }
  //molecule does not contain C+=$, where $ is any atom
  ! Molecule contains a
  Molecule.size < 15
  Molecule.charge > -2 && Molecule.charge < 2
  //declaration of a fragment named 'b' (C=C=X)
  fragment b {
    C labeled c1
    C labeled c2 double bond to c1
    X labeled x1 double bond to c2
  }
  //molecule does not contain C=C=X
  ! Molecule contains b
}
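The global constraints discard any generated molecule that contains a C+ double-bonded to any atom, has a Molecule.size of 15 or more, has a net charge outside the open interval (-2, 2), or contains fragment b (C=C=X). The Python sketch below expresses the same checks on a hypothetical dictionary-based representation used only for illustration. Two points are assumptions, not documented RING semantics. First, Molecule.size is taken to be the heavy-atom count: counting hydrogens as well would give fructose (C6H12O6) 24 atoms, so the input reactant itself would violate size < 15. Second, the X in fragment b is treated as matching any atom. The function names are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical molecule representation (not RING's internal one):
#   atoms: atom index -> RING-style atom type such as "C", "C+", "O", "O+", "H"
#   bonds: frozenset of two atom indices -> bond order
Atoms = Dict[int, str]
Bonds = Dict[FrozenSet[int], int]


def double_bond_partners(bonds: Bonds, i: int) -> List[int]:
    """Indices of atoms joined to atom i by a double bond."""
    return [next(iter(pair - {i})) for pair, order in bonds.items()
            if i in pair and order == 2]


def net_charge(atoms: Atoms) -> int:
    """Sum formal charges encoded as a '+' or '-' suffix on the atom type."""
    return sum(1 if t.endswith("+") else -1 if t.endswith("-") else 0
               for t in atoms.values())


def satisfies_global_constraints(atoms: Atoms, bonds: Bonds) -> bool:
    # Molecule.size < 15, assuming size = number of non-hydrogen atoms
    size = sum(1 for t in atoms.values() if not t.startswith("H"))
    if not size < 15:
        return False
    # Molecule.charge > -2 && Molecule.charge < 2
    if not -2 < net_charge(atoms) < 2:
        return False
    for i, atom_type in atoms.items():
        partners = double_bond_partners(bonds, i)
        # ! Molecule contains a: no C+ double-bonded to any atom ($)
        if atom_type == "C+" and partners:
            return False
        # ! Molecule contains b: no C=C=X, with X assumed to match any atom
        if atom_type == "C" and len(partners) >= 2 and any(
                atoms[j] == "C" for j in partners):
            return False
    return True


ethene = ({0: "C", 1: "C"}, {frozenset({0, 1}): 2})
ketene = ({0: "C", 1: "C", 2: "O"}, {frozenset({0, 1}): 2, frozenset({1, 2}): 2})
print(satisfies_global_constraints(*ethene))  # True
print(satisfies_global_constraints(*ketene))  # False
```

Applying such a filter to every product immediately after each rule fires keeps the generated network bounded. This is the role the global constraints play during network generation.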

//Protonation of an alcoholic group of a cyclic molecule
rule alcoholprot {
  neutral reactant r1 {
    C labeled c1 nonringatom
    O labeled o1 single bond to c1
  }
  positive reactant proton {
    H+ labeled h1
  }
  constraints { (r1.size < 15) && (r1.size > 2) }
  form bond (o1, h1)
  modify atomtype (o1, O+)
  modify atomtype (h1, H)
}

//Dehydration of oxonium species to form carbenium ion
rule dehydration {
    reactant r1 {
        C labeled c1 //{connected to 2 C}
        O+ labeled o1 single bond to c1
    }
    constraints {
        r1.charge = 1
        r1.size < 15
    }
    break bond (c1, o1)
    modify atomtype (c1, C+)
    modify atomtype (o1, O)
}

//Deprotonation of carbenium to form C=C
rule deProt {
    positive reactant r1 {
        C+ labeled c1
        C labeled c2 single bond to c1 {! connected to >=1 O with double bond}
        H labeled h1 single bond to c2
    }
    constraints { r1.size < 15 }
    break bond (c2, h1)
    increase bond order (c1, c2)
    modify atomtype (c1, C)
    modify atomtype (h1, H+)
}

//Protonation of C=C
rule CCProtonation {
    reactant r1 {
        C labeled c1
        C labeled c2 double bond to c1
    }
    positive reactant r2 {
        H+ labeled h1
    }
    constraints { r1.charge = 0 && r1.size < 15 }
    form bond (c1, h1)
    decrease bond order (c1, c2)
    modify atomtype (c2, C+)
    modify atomtype (h1, H)
}

//Hydride shifts
rule Hshift {
    positive reactant r1 {
        C+ labeled c1
        C labeled c2 single bond to c1 {! connected to >=1 O with double bond}
        H labeled h1 single bond to c2
    }
    constraints { r1.size < 15 }
    break bond (h1, c2)
    form bond (c1, h1)
    modify atomtype (c1, C)
    modify atomtype (c2, C+)
}

//Deprotonation to form ketone
rule deProtToKetone {
    reactant r1 {
        C+ labeled c1
        O labeled o1 single bond to c1
        H labeled h1 single bond to o1
    }
    constraints { r1.charge = 1 && r1.size < 15 }
    break bond (o1, h1)
    increase bond order (c1, o1)
    modify atomtype (c1, C)
    modify atomtype (h1, H+)
}

//Allylic rearrangement
rule allylrearrangement {
    reactant r1 {
        C+ labeled c1
        C labeled c2 single bond to c1
        C labeled c3 double bond to c2
    }
    constraints { r1.size < 15 && r1.charge = 1 }
    decrease bond order (c2, c3)
    increase bond order (c1, c2)
    modify atomtype (c1, C)
    modify atomtype (c3, C+)
}

rule carbonylProt {
    neutral reactant r1 {
        C labeled c1
        O labeled o1 double bond to c1
    }
    positive reactant r2 {
        H+ labeled h1
    }
    constraints { r1.size < 15 }
    decrease bond order (c1, o1)
    form bond (o1, h1)
    modify atomtype (c1, C+)
    modify atomtype (h1, H)
}

//end of rules!

B RING language: EBNF

======TOP LEVEL

Program ::= {Reactant} {Dcl} [GlobalConstraints] {Rule} {ReconstructInfo} [LumpStrat] {PostProcessingDesc}

Reactant ::= ’input’ ’reactant’ String_t

Dcl ::= ’define’ ’characteristic’ Ident_t ’on’ Ident_t ’{’ {Constraint} ’}’
Dcl ::= ’define’ ’composite’ ’atom’ CompositeDcl {’,’ CompositeDcl}
Dcl ::= ’group’ Ident_t ’(’ Ident_t {’,’ Ident_t} ’)’ ’{’ Assignments ’}’
CompositeDcl ::= CompositeElement_t [’(site)’]

GlobalConstraints ::= ’global’ ’constraints’ ’on’ Ident_t ’{’ {Constraint} ’}’

ReconstructInfo ::= ’reconstruct’ ’network’ Ident_t ’(’ Ident_t ’)’ Ident_t ’(’ Ident_t ’)’ ’{’ {Constraint} ’}’

LumpStrat ::= ’lump’ ’all’ ’isomers’ ’{’ {LumpParameter} ’}’

LumpParameter ::= ’lump’ ’aromatics’ ’to’ ’least’ ’branched’
LumpParameter ::= ’lump’ ’aromatics’ ’to’ ’most’ ’branched’
LumpParameter ::= ’lump’ ’naphthenics’ ’to’ ’least’ ’branched’
LumpParameter ::= ’lump’ ’naphthenics’ ’to’ ’most’ ’branched’
LumpParameter ::= ’lump’ ’olefins’ ’to’ ’least’ ’branched’
LumpParameter ::= ’lump’ ’olefins’ ’to’ ’most’ ’branched’
LumpParameter ::= ’lump’ ’paraffins’ ’to’ ’least’ ’branched’
LumpParameter ::= ’lump’ ’paraffins’ ’to’ ’most’ ’branched’

LumpParameter ::= ’represent’ ’acyclic’ ’with’ ’closest’ ’apart’
LumpParameter ::= ’represent’ ’acyclic’ ’with’ ’farthest’ ’apart’
LumpParameter ::= ’represent’ ’cyclic’ ’with’ ’closest’ ’apart’
LumpParameter ::= ’represent’ ’cyclic’ ’with’ ’farthest’ ’apart’

======REACTION RULE DESCRIPTION

Rule ::= ’rule’ Ident_t ’{’ Declarations [ReactantConstraints] {Transformation} [ProductConstraints] [AdditionalInformation] [RuleCost] [RuleRank] ’}’

Declarations ::= Declaration Declaration [DuplicateDeclaration]
Declarations ::= Declaration [DuplicateDeclaration]
Declaration ::= {DeclPrefix} ’reactant’ Ident_t ’{’ Assignments ’}’
Declaration ::= {DeclPrefix} ’reactant’ Ident_t ’group’ Ident_t ’(’ GroupAssignments ’)’
DuplicateDeclaration ::= {DeclPrefix} ’reactant’ Ident_t ’duplicates’ Ident_t ’(’ GroupAssignments ’)’

GroupAssignment ::= Ident_t ’=’ ’>’ Ident_t
GroupAssignments ::= GroupAssignment
GroupAssignments ::= GroupAssignment ’,’ GroupAssignments

ReactantConstraints ::= ’constraints’ ’{’ {Constraint} ’}’

Transformation ::= ’break’ ’bond’ ’(’ Label_t ’,’ Label_t ’)’
Transformation ::= ’break’ BondPrefix ’bond’ ’(’ Label_t ’,’ Label_t ’)’
Transformation ::= ’decrease’ ’bond’ ’order’ ’(’ Label_t ’,’ Label_t ’)’
Transformation ::= ’form’ ’bond’ ’(’ Label_t ’,’ Label_t ’)’
Transformation ::= ’form’ BondPrefix ’bond’ ’(’ Label_t ’,’ Label_t ’)’
Transformation ::= ’increase’ ’bond’ ’order’ ’(’ Label_t ’,’ Label_t ’)’
Transformation ::= ’modify’ ’atomtype’ ’(’ Label_t ’,’ AtomType ’)’
Transformation ::= ’modify’ ’bond’ ’(’ Label_t ’,’ Label_t ’,’ BondPrefix ’)’

ProductConstraints ::= ’product’ ’constraints’ ’on’ Ident_t ’{’ {Constraint} ’}’

AdditionalInformation ::= ’allow’ ’intramolecular’ ’reaction’
AdditionalInformation ::= ’only’ ’intramolecular’ ’reaction’
AdditionalInformation ::= ’only’ ’self’ ’reaction’
RuleCost ::= ’rule’ ’cost’ ’is’ Num_t
RuleRank ::= ’maximum’ ’rule’ ’rank’ ’is’ Num_t

======REACTANT DEFINITION

Assignments ::= FirstAssignment {Assignment}
FirstAssignment ::= AtomType ’labeled’ Label_t [MaybeAtomConstraints]
Assignment ::= AtomType ’labeled’ Label_t BondPrefix ’bond’ ’to’ Label_t [MaybeAtomConstraints]
Assignment ::= ’ringbond’ Label_t BondPrefix ’bond’ ’to’ Label_t

MaybeAtomConstraints ::= ’{’ AtomConstraint {’,’ AtomConstraint} ’}’
AtomConstraint ::= [’!’] ’connected’ ’to’ [AtomConstraintNumber] AtomType
AtomConstraint ::= [’!’] ’connected’ ’to’ [AtomConstraintNumber] AtomType ’with’ BondPrefix ’bond’
AtomConstraint ::= [’!’] ’connected’ ’to’ [AtomConstraintNumber] ’group’ Ident_t
AtomConstraint ::= [’!’] ’connected’ ’to’ [AtomConstraintNumber] ’group’ Ident_t ’with’ BondPrefix ’bond’
AtomConstraint ::= ’in’ ’ring’ ’of’ ’size’ AtomConstraintNumber

AtomConstraintNumber ::= [’<=’ | ’<’ | ’=’ | ’>=’ | ’>’] Num_t

AtomType ::= (’allylic’ | ’aromatic’ | ’nonallylic’ | ’nonaromatic’ | ’nonringatom’ | ’ringatom’) Element (’-’ | ’:’ | ’.’ | ’*’ | ’+’ | ’+’ ’.’)

Element ::= CompositeElement_t | ’H’ | ’N’ | ’O’ | ’P’ | ’S’ | ’X’ | ’&’ | ’c’ | ’n’ | ’o’ | ’p’ | ’s’ | ’C*’ | ’C’ | ’$’ | ’heavy’ ’atom’ | ’heteroatom’ | ’any’ ’atom’

BondPrefix ::= ’any’ | ’single’ | ’double’ | ’triple’ | ’aromatic’ | ’partial’ | ’strong’ | BondPrefix ’nonring’ | BondPrefix ’ring’

DeclPrefix ::= Ident_t | ’aromatic’ | ’cyclic’ | ’linear’ | ’negative’ | ’neutral’ | ’nonradical’ | ’olefinic’ | ’paraffinic’ | ’positive’ | ’radical’

Constraint ::= Expr
Constraint ::= ’fragment’ Ident_t ’{’ Assignments ’}’

Expr ::= ’!’ Expr
Expr ::= ’(’ Expr ’)’
Expr ::= Expr ’||’ Expr
Expr ::= Expr ’&&’ Expr
Expr ::= Ident_t ’contains’ AtomConstraintNumber ’of’ ’group’ Ident_t
Expr ::= Ident_t ’contains’ AtomConstraintNumber ’of’ Ident_t
Expr ::= Ident_t ’contains’ ’group’ Ident_t
Expr ::= Ident_t ’contains’ Ident_t
Expr ::= Ident_t ’.’ ’formula’ ’is’ String_t
Expr ::= Ident_t ’is’ ’aromatic’
Expr ::= Ident_t ’is’ ’bridged’
Expr ::= Ident_t ’is’ ’cyclic’
Expr ::= Ident_t ’is’ ’heteroaromatic’
Expr ::= Ident_t ’is’ Ident_t
Expr ::= Ident_t ’is’ ’oxygenate’
Expr ::= Ident_t ’is’ String_t
Expr ::= IntExpr ’between’ IntExpr ’and’ IntExpr
Expr ::= IntExpr ’<=’ IntExpr
Expr ::= IntExpr ’<’ IntExpr
Expr ::= IntExpr ’=’ IntExpr
Expr ::= IntExpr ’>=’ IntExpr
Expr ::= IntExpr ’>’ IntExpr

IntExpr ::= Ident_t ’.’ ’charge’
IntExpr ::= Ident_t ’.’ ’maxringsize’
IntExpr ::= Ident_t ’.’ ’minringsize’
IntExpr ::= Ident_t ’.’ ’size’
IntExpr ::= Ident_t ’.’ ’unpairedelectron’
IntExpr ::= IntExpr ’+’ IntExpr
IntExpr ::= Int_t

======POST PROCESSING

PostProcessingDesc ::= ’find’ ’all’ Ident_t ’{’ {Constraint} ’}’ ’store’ ’in’ String_t
PostProcessingDesc ::= ’find’ ’all’ ’reactions’ ’{’ {PathwayConstraint} ’}’ ’store’ ’in’ String_t
PostProcessingDesc ::= ’find’ ’complete’ ’mechanisms’ ’to’ Ident_t ’{’ {Constraint} ’}’
PostProcessingDesc ::= ’find’ ’direct’ ’mechanisms’ ’to’ Ident_t ’{’ {Constraint} ’}’ ’constraints’ ’{’ {PathwayConstraint} ’}’ ’store’ ’in’ String_t
PostProcessingDesc ::= ’find’ ’pathways’ ’to’ Ident_t ’{’ {Constraint} ’}’ ’constraints’ ’{’ {PathwayConstraint} ’}’ ’store’ ’in’ String_t
PostProcessingDesc ::= ’find’ ’shortest’ ’distance’ ’to’ String_t

PathwayConstraint ::= ’eliminate’ ’similar’ ’mechanisms’
PathwayConstraint ::= ’eliminate’ ’similar’ ’pathways’
PathwayConstraint ::= ’maximum’ ’cost’ Num_t
PathwayConstraint ::= ’maximum’ ’cycles’ Num_t
PathwayConstraint ::= ’maximum’ ’length’ Num_t
PathwayConstraint ::= ’maximum’ ’length’ ’shortest’ ’+’ Num_t
PathwayConstraint ::= ’minimum’ ’cycles’ Num_t
PathwayConstraint ::= ’minimum’ ’length’ Num_t ’,’ ’cost’ Num_t
PathwayConstraint ::= ’rule’ ’is’ ’only’ Ident_t
PathwayConstraint ::= PathwayExpr

PathwayExpr ::= ’contains’ AtomConstraintNumber [ReactionMember] UnimolecularConstraint
PathwayExpr ::= ’contains’ AtomConstraintNumber [ReactionMember] UnimolecularConstraint ’in’ ’overall’ ’stoichiometry’
PathwayExpr ::= ’contains’ AtomConstraintNumber ’rule’ Ident_t
PathwayExpr ::= ’contains’ AtomConstraintNumber ’rule’ Ident_t ’with’ [ReactionMember] BimolecularConstraint

PathwayExpr ::= ’contains’ AtomConstraintNumber ’rule’ Ident_t ’with’ [ReactionMember] UnimolecularConstraint
PathwayExpr ::= ’!’ PathwayExpr
PathwayExpr ::= ’(’ PathwayExpr ’)’
PathwayExpr ::= PathwayExpr ’||’ PathwayExpr
PathwayExpr ::= PathwayExpr ’&&’ PathwayExpr
PathwayExpr ::= ’reaction’ ’is’ ’intramolecular’
PathwayExpr ::= ’reaction’ ’rule’ ’is’ Ident_t
PathwayExpr ::= ’reaction’ ’with’ AtomConstraintNumber [ReactionMember] UnimolecularConstraint
PathwayExpr ::= ’rule’ Ident_t ’only’ ’occurs’ ’as’ ’intramolecular’ ’reaction’
PathwayExpr ::= ’rule’ Ident_t ’only’ ’occurs’ ’as’ ’self’ ’reaction’
PathwayExpr ::= ’rule’ Ident_t ’only’ ’occurs’ ’with’ [ReactionMember] BimolecularConstraint
PathwayExpr ::= ’rule’ Ident_t ’only’ ’occurs’ ’with’ [ReactionMember] UnimolecularConstraint

ReactionMember ::= ’product’ | ’products’ | ’reactant’ | ’reactants’
UnimolecularConstraint ::= Ident_t ’{’ {Constraint} ’}’
BimolecularConstraint ::= Ident_t ’,’ Ident_t ’{’ {Constraint} ’}’

C RING compiler optimizations: Benchmark study

Two benchmarking studies were performed. In the first study, the run times of five systems (rules given in S6) were compared under three cases: (a) no optimizations at all, where all constraints were categorized as combined constraints and constraints and patterns were ordered as is; (b) constraints categorized according to their scope; and (c) constraint and pattern ordering optimizations both added (all optimizations). Table 9.1 contains the relevant data. The errors are reported as 95% confidence intervals, assuming a t-distribution for the triplicate measurements. By default, combined constraints are ignored for intramolecular reactions, so with no optimizations, individual constraints would be categorized as combined constraints and overlooked by RING for such rules. In these examples, therefore, it was ensured that no constraints were specified for rules allowing intramolecular reactions.

Table 9.1: Benchmarking data for compiler optimizations

Benchmark | Reactions (Species) | No optimizations | Constraint categorization only | All optimizations
Base catalysis | 12771 (4609) | 1059.38 ± 78.38 | 1039.23 ± 3.95 | 1035.70 ± 6.29
Fructose-to-HMF | 1223 (546) | 1.75 ± 0.04 | 1.60 ± 0.01 | 1.62 ± 0.04
Glucose pyrolysis | 14375 (3131) | 37.75 ± 0.12 | 37.69 ± 0.03 | 37.82 ± 0.13
HMF→Levulinic acid | 39844 (14875) | 3120.19 ± 10.75 | 53.52 ± 0.25 | 53.28 ± 0.11
Propane aromatization | 2031 (594) | 83.19 ± 0.96 | 59.01 ± 0.09 | 48.92 ± 0.05
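The error bars in Table 9.1 are described as 95% confidence intervals based on a t-distribution of triplicate runs. A minimal Python sketch of that calculation follows, assuming the usual half-width t(0.975, n-1) * s / sqrt(n). The helper names `ci95` and `percent_speedup` and the tabulated t-values are illustrative and are not part of RING. The timings passed to `ci95` are made up; the speed-up call uses the averages reported in Table 9.2.

```python
import math
from statistics import mean, stdev

# Two-sided 95% Student-t critical values t(0.975, dof); only small dof listed.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def ci95(samples):
    """Mean and half-width of a 95% confidence interval based on the t-distribution."""
    n = len(samples)
    if n not in range(2, 7):
        raise ValueError("this sketch only tabulates t-values for 2 to 6 samples")
    half_width = T_975[n - 1] * stdev(samples) / math.sqrt(n)
    return mean(samples), half_width


def percent_speedup(t_before, t_after):
    """Relative reduction in run time, in percent."""
    return 100.0 * (t_before - t_after) / t_before


# Hypothetical triplicate timings in seconds (not data from Table 9.1).
m, h = ci95([1.60, 1.61, 1.59])
print(f"{m:.2f} +/- {h:.2f} s")

# Averages reported in Table 9.2 for tetrahydrofuran and the HMF intermediate.
print(round(percent_speedup(12.30, 10.58)), round(percent_speedup(10.45, 9.38)))
```

With the Table 9.2 averages, the second print reproduces the reported speed-ups of about 14% and 10%.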

In a second study, the potential speed-up in individual pattern matching was tested using three examples. In each example, a default pattern was compared with a pattern re-ordered on the basis of the heuristics given in the section “Pattern re-ordering” of the article. Table 9.2 documents the relevant data of this study. Measurements were made in triplicate, with each measurement consisting of 10000 iterations of successively applying the pattern matching. The first molecule is a fatty acid methyl ester typical of biodiesel; the pattern sought is an ester group, first ordered by the carbons and subsequently ordered by the carbonyl oxygen. The second molecule is tetrahydrofuran, and the pattern sought is a C-C-O chain, first ordered by a carbon and then by the oxygen. The third molecule is an intermediate in Fructose-to-HMF conversion, and the pattern is a linear chain of two carbons and a carbenium ion, initially ordered by a carbon and then by the ion.

Table 9.2: Benchmarking data for compiler optimizations (run times are in seconds)

Molecule | Pattern | Run 1 | Run 2 | Run 3 | Average | % speed-up
CCCCCCCCCCCCCCCCC(=O)OC | CC(=O)OC (default) | 24.75 | 24.64 | 24.67 | 24.69 | -
CCCCCCCCCCCCCCCCC(=O)OC | O=C(OC)(C) (re-ordered) | 24.52 | 24.64 | 24.58 | 24.58 | negligible
C1COCC1 | CCO (default) | 12.38 | 12.25 | 12.30 | 12.30 | -
C1COCC1 | OCC (re-ordered) | 10.67 | 10.52 | 10.58 | 10.58 | 14%
OC=C1OC(CO)C(O)[C+]1 | CC'C+' (default) | 10.45 | 10.45 | 10.45 | 10.45 | -
OC=C1OC(CO)C(O)[C+]1 | 'C+'CC (re-ordered) | 9.38 | 9.38 | 9.38 | 9.38 | 10%

D RING: Class hierarchy

[Diagram not reproduced; classes shown: Atom, SingleAtom, CompositeAtom, Atomtype, Element, Atomcontainer, Molecule, Substructure, Reaction, Reactiontype, Patternmatch, LumpInfo, LumpingStrategy, Rxn_net_gen, GlobalConstraints, with multiplicity annotations]
Figure 9.1: Class hierarchy diagram of RING showing the inter-relationships of the classes constituting the necessary elements of network generation

Figure 9.1 shows the class hierarchy diagram of the fundamental classes in RING. As shown in the figure, atoms are of two kinds, single and composite, each derived polymorphically from an abstract base class Atom. The class SingleAtom also extends Atomtype, which holds the electronic information. CompositeAtom, as described earlier, allows for the definition of additional elements and/or groups of atoms whose explicit chemical identification is not essential, thus giving the user flexibility in describing the chemistry. For a CompositeAtom, the electronic information other than charge is not stored, and valency is calculated based only on the nearest neighbors. As CompositeAtom is not derived from Atomtype, it has members that store a pseudoelement symbol and an atomtype symbol, derived from user-input information in SMILES or reaction transformations. For example, if the input SMILES contains [Zeo-], as in Figure 2a(i), RING creates an object of class CompositeAtom with pseudoelement symbol Zeo and atomtype symbol Zeo-, and interprets the charge to be -1. The classes Molecule and Substructure, which represent individual molecules and fragments respectively, are derived from the class Atomcontainer, which stores an array of atoms polymorphically (that is, both SingleAtom and CompositeAtom objects). Atomcontainer also has an adjacency list for atom connectivity and bond order, and thus maintains a graph of atoms and bonds. Molecule and Substructure differ in the member functions and attributes they implement. For example, Molecule has the SMILES parser, while the SMARTS parser is in Substructure. Molecule also stores topological features such as rings, aromatic rings, and allylic atoms. In effect, the class Atomcontainer provides the graph representation of molecules and fragments, whereas the classes Molecule and Substructure implement additional attributes that characterize molecules and fragments. Conceptually, all atoms of a molecule must satisfy their valency, while those of a fragment need not. Therefore, class Molecule checks the consistency of the valency of its atoms (SingleAtom objects only) and sets the difference between the determined valency and the number of nearest neighbors as implicit hydrogen atoms, in a manner similar to the SMILES convention for calculating the hydrogen count. The internal representation of reaction rules, expressed as graph transformation rules, is formalized in the class Reactiontype.
Molecular constraints are stored as function pointers, while the atom environment constraints that are described within SMARTS for individual atoms of a fragment are stored in Substructure as a list of constraints for each atom. Reactions are internally formalized, within class Reaction, as the application of a graph transformation rule to reactant chemical graph(s). The class Patternmatch implements the Ullmann algorithm and holds as members a Substructure object and a Molecule object on which the isomorphism check is performed. All possible matches are initially obtained by the algorithm. An object of class Patternmatch is contained in Reaction to seek all the fragments. Once the patterns are identified, for each match, the labels of the atoms in the reactant pattern are mapped to the corresponding atoms in the molecule. As graph transformations are operations on the graph of the reactants, the Atomcontainer objects of the reactants are joined together to form a cumulative adjacency list. Transformations that change the connectivity (bond formation, scission, and bond order increase and decrease) are then modifications of the adjacency list. The transformations that modify atomtype (changes in charge, lone pairs of electrons, or unpaired electrons) are, in effect, changes in the attributes of the SingleAtom and CompositeAtom objects contained in the Atomcontainer. The connected components of the adjacency list, identified by a breadth-first search, are the products of the reaction. Molecule objects are constructed for each connected component, their canonical SMILES are identified, and reaction strings are constructed from the reactant and product SMILES strings.
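The join, edit, and split steps described above can be sketched in a few lines of Python. This is an illustrative sketch, not RING's C++ classes. It assumes graphs stored as `{atom: {neighbor: bond_order}}` dictionaries of heavy atoms, with hydrogens implicit except for an explicit proton. Atomtype changes are omitted because they only modify atom attributes. The function names `join`, `change_bond_order`, and `components` are hypothetical.

```python
from collections import deque


def join(*graphs):
    """Merge reactant graphs {atom: {neighbor: bond_order}} into one cumulative
    adjacency list; atoms are relabelled (reactant_index, atom) to stay unique."""
    joined = {}
    for r, graph in enumerate(graphs):
        for atom, nbrs in graph.items():
            joined[(r, atom)] = {(r, n): order for n, order in nbrs.items()}
    return joined


def change_bond_order(adj, a, b, delta):
    """Connectivity transformations as edits of the adjacency list:
    +1 on a missing bond forms it, -1 on a single bond breaks it,
    and otherwise the bond order is increased or decreased."""
    new_order = adj[a].get(b, 0) + delta
    if new_order < 0:
        raise ValueError("bond order cannot become negative")
    if new_order == 0:
        adj[a].pop(b, None)
        adj[b].pop(a, None)
    else:
        adj[a][b] = adj[b][a] = new_order


def components(adj):
    """Connected components found by breadth-first search; each is one product."""
    seen, products = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            atom = queue.popleft()
            comp.append(atom)
            for nbr in adj[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        products.append(sorted(comp))
    return products


# Heavy-atom graph of ethanol (C0-C1-O2) plus a proton, as in rule alcoholprot.
ethanol = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1}}
proton = {0: {}}
adj = join(ethanol, proton)
change_bond_order(adj, (0, 2), (1, 0), +1)   # form bond (o1, h1)
print(components(adj))                        # a single product
change_bond_order(adj, (0, 1), (0, 2), -1)   # break bond (c1, o1), as in dehydration
print(components(adj))                        # carbenium fragment + water
```

Each component would then be turned into a molecule and canonicalized. That step is omitted here because it depends on a SMILES writer.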

E Network generation: additional algorithms

Algorithms 6 and 7 are two additional algorithms referred to in Chapter 4 in the context of network generation.

Algorithm 6 bool IsReactant(Molecule Mol, Reactiontype Rxntype, int Index)
    if subgraph Rxntype.pattern[Index] ∈ Mol satisfying atom and molecule constraints then
        return true
    else
        return false
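The test in Algorithm 6 amounts to a subgraph-isomorphism search plus a molecule-level check. The Python sketch below uses plain backtracking rather than the Ullmann algorithm that the Patternmatch class implements, so it shows the semantics but not RING's pruning. It assumes graphs stored as `{"atoms": {...}, "bonds": {frozenset: order}}`. Element-only atom tests are used, and atom environment constraints such as `{! connected to ...}` are not modeled. All names here are hypothetical.

```python
def neighbors(graph, atom):
    """Yield (neighbor, bond_order) pairs of atom; bonds are keyed by frozenset pairs."""
    for pair, order in graph["bonds"].items():
        if atom in pair:
            (other,) = pair - {atom}
            yield other, order


def element_ok(pattern_element, mol_element):
    # '$' stands for any atom, as in the global constraints of Appendix A.
    return pattern_element == "$" or pattern_element == mol_element


def find_matches(pattern, mol):
    """All injective maps pattern label -> molecule atom that preserve pattern bonds."""
    labels = list(pattern["atoms"])
    matches = []

    def extend(mapping):
        if len(mapping) == len(labels):
            matches.append(dict(mapping))
            return
        label = labels[len(mapping)]
        used = set(mapping.values())
        for atom, element in mol["atoms"].items():
            if atom in used or not element_ok(pattern["atoms"][label], element):
                continue
            # Every pattern bond to an already-mapped label must exist with the same order.
            if all(mol["bonds"].get(frozenset((atom, mapping[other]))) == order
                   for other, order in neighbors(pattern, label) if other in mapping):
                mapping[label] = atom
                extend(mapping)
                del mapping[label]

    extend({})
    return matches


def is_reactant(mol, pattern, mol_constraint=lambda mol: True):
    """Analogue of Algorithm 6: a match exists and the molecule constraint holds."""
    return mol_constraint(mol) and bool(find_matches(pattern, mol))


# Pattern of rule CCProtonation (C=C) and two heavy-atom graphs.
cc_double = {"atoms": {"c1": "C", "c2": "C"}, "bonds": {frozenset(("c1", "c2")): 2}}
propene = {"atoms": {0: "C", 1: "C", 2: "C"},
           "bonds": {frozenset((0, 1)): 2, frozenset((1, 2)): 1}}
propane = {"atoms": {0: "C", 1: "C", 2: "C"},
           "bonds": {frozenset((0, 1)): 1, frozenset((1, 2)): 1}}

print(find_matches(cc_double, propene))   # two matches, related by symmetry
print(is_reactant(propane, cc_double))    # False
```

The two symmetric matches on propene show why the topologically equivalent classes in assumption (b) of Proposition 1 matter for counting reaction frequencies.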

Algorithm 7 GenerateAllReactions(list(Molecule) MoleculeList, Reactiontype Rxntype)
    Create Reaction object Rxn(MoleculeList, Rxntype)
    Generate reactions of Rxn
    for each reaction in Rxn do
        if products satisfy global and product constraints then
            if reaction ∉ AllReactionMap then
                insert reaction into AllReactionMap with frequency = multiplicity
            else
                increase AllReactionMap[reaction] by the reaction multiplicity
            if any of the products is new then
                insert it into UnprocessedQueue
                lump new products
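The bookkeeping part of Algorithm 7 maps naturally onto a counter and a queue. The Python sketch below assumes that reactions arrive as (reactants, products, multiplicity) tuples of canonical SMILES strings. It also assumes that a reaction string can be keyed by the sorted reactant and product names. `record_reactions`, `satisfies_constraints`, and the global containers are hypothetical names, and the lumping step is omitted.

```python
from collections import Counter, deque

all_reaction_map = Counter()      # reaction string -> accumulated frequency
unprocessed_queue = deque()       # new species awaiting processing
known_species = set()


def record_reactions(reactions, satisfies_constraints):
    """Bookkeeping part of Algorithm 7 for reactions already generated by one
    Reaction object. Each reaction is (reactants, products, multiplicity), with
    canonical SMILES strings as species names."""
    for reactants, products, multiplicity in reactions:
        if not all(satisfies_constraints(p) for p in products):
            continue                                  # fails global/product constraints
        key = " + ".join(sorted(reactants)) + " => " + " + ".join(sorted(products))
        all_reaction_map[key] += multiplicity         # insert new or increase frequency
        for p in products:
            if p not in known_species:                # new product: queue it (lumping omitted)
                known_species.add(p)
                unprocessed_queue.append(p)


known_species.update(["CC=C", "[H+]"])
record_reactions([(("CC=C", "[H+]"), ("C[CH+]C",), 1),
                  (("CC=C", "[H+]"), ("[CH2+]CC",), 1)], lambda s: True)
record_reactions([(("[H+]", "CC=C"), ("C[CH+]C",), 1)], lambda s: True)
print(all_reaction_map)
print(list(unprocessed_queue))
```

Because `Counter` returns 0 for missing keys, the "insert" and "increase" branches of Algorithm 7 collapse into one `+=`.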

F Network generation algorithm: proof of completeness and correctness

Proposition 1: Given that (a) the subgraph isomorphism algorithm is correct and complete, (b) topologically equivalent classes of atoms in a molecule can be determined accurately, and (c) canonical SMILES strings can be generated correctly, the algorithm in Scheme 2 will yield a complete and correct reaction network. Before proving the proposition, some definitions are in order.

Definition 1: A molecule M is considered to be reachable from the initial reactants in a reaction network, given the chemistry rules, if there exists a finite set of reactions based on the chemistry rules such that some linear combination of these reactions results in an overall reaction having as reactants some or all of the initial reactants of the system and M as one of the products.

Definition 2: A molecule M may be reachable from the initial reactants through more than one reaction route. The cardinality of the smallest set of reactions that results in the formation of M from the initial reactants is the distance of the molecule M from the initial reactants.

Note: By chemistry rules, both the reaction rules and the global constraints are implied. Therefore, correctness of the reaction network means that all generated molecules satisfy the global constraints and are reachable from the initial reactants; that is, no molecule is generated that either cannot be generated from the given reaction rules and initial reactants or does not satisfy the global constraints. Completeness requires that every molecule that satisfies the global constraints and is reachable from the initial reactants be generated.

Definition 3: A molecule M can be formed in a reaction network by one or more reactions. The reactants of any of these reactions are possible parents of M. The set P is defined as the set of all possible distinct parents.

F.1 Proof

The algorithm for reaction network generation checks all possible types of uni- and bimolecular reactions: (a) a single reactant A is checked for participation in a unimolecular reaction; (b) two reactants, A and B, are checked for participation in a bimolecular reaction, with each of them acting either as the first or as the second reactant; and (c) a single reactant is checked as both reactants of a bimolecular reaction, resulting in a reaction of the form 2A ⇒ products. Note that, in addition, intramolecular reactions and third-reactant features are also provided in RING. Furthermore, keeping two molecule containers (unprocessed and processed) and picking the second molecule of a bimolecular reaction from the processed container ensures that no molecule pair is missed while checking for possible reactions. Checking all these possibilities ensures that, given the species of the network, all reaction possibilities have been checked. Assumptions (a) and (b) of the proposition further ensure that, given the reactants to be checked for a reaction rule, all possible matches and topologically equivalent classes are correctly evaluated, so that the correct products, reactions, and their frequencies are generated. Assumption (c) ensures that RING does not generate conflicting SMILES strings, which would lead to an incorrect molecular representation at some point. With this understanding, completeness and correctness can be proved as follows.
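The unprocessed/processed scheme that the argument relies on can be illustrated with a small worklist loop. This Python sketch is written under stated assumptions. `unimolecular(a)` and `bimolecular(a, b)` are hypothetical callbacks that stand in for rule application and return lists of product tuples. The toy chemistry (species are chain lengths, with a size limit of 4 acting as a global constraint) is invented for illustration. Intramolecular and third-reactant features are not modeled.

```python
from collections import deque


def generate_network(initial, unimolecular, bimolecular):
    """Worklist sketch of the unprocessed/processed scheme.

    unimolecular(a) and bimolecular(a, b) are hypothetical callbacks returning a
    list of product tuples; species must be hashable and sortable."""
    unprocessed = deque(initial)
    seen = set(initial)
    processed = []
    reactions = set()

    def add(reactants, products):
        reactions.add((tuple(sorted(reactants)), tuple(sorted(products))))
        for p in products:
            if p not in seen:
                seen.add(p)
                unprocessed.append(p)

    while unprocessed:
        a = unprocessed.popleft()
        for prods in unimolecular(a):              # (a) A => products
            add((a,), prods)
        for prods in bimolecular(a, a):            # (c) 2A => products
            add((a, a), prods)
        for b in processed:                        # (b) A + B, A first or second
            for prods in bimolecular(a, b) + bimolecular(b, a):
                add((a, b), prods)
        processed.append(a)
    return processed, reactions


# Toy chemistry: species are chain lengths, A + B => (A+B), with a size
# limit of 4 playing the role of a global constraint.
species, rxns = generate_network(
    [1],
    unimolecular=lambda a: [],
    bimolecular=lambda a, b: [(a + b,)] if a + b <= 4 else [],
)
print(sorted(species))
print(sorted(rxns))
```

Each molecule is paired only with molecules that have already been processed. When a molecule processed earlier is visited, the later one is not yet in the processed list, and the pair is checked once the later molecule is popped. Every unordered pair is therefore examined exactly once, which is the property the completeness proof depends on.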

Proof of completeness

Suppose that a reachable molecule M has been missed in the reaction network. Then no reaction leading to M was generated. Given the exhaustive checks described above, this means that every reaction route to M involves at least one parent that was itself missed. Thus, more than one molecule and more than one reaction were missed. Let S_M denote the set of all missed molecules. By virtue of the discussion above, no reaction among the generated species and/or initial reactants can lead to any element of S_M; therefore, producing any missed molecule requires another missed molecule as a reactant. Now choose M′ ∈ S_M with the smallest distance (Definition 2) from the initial reactants. Every shortest route to M′ must contain a missed parent, and that parent lies at a strictly smaller distance from the initial reactants than M′, which contradicts the choice of M′. Hence no reachable molecule is missed, and the network is complete.

Proof of correctness

Suppose that a molecule M is generated that cannot be generated with the given chemistry rules. Then at least one of the parents from which M was generated must itself have been incorrectly generated: in light of the discussion above, M would not have been formed unless a possible parent was incorrectly generated. Proceeding recursively in this manner, a set S_I of all incorrectly generated molecules can be found. The set is finite because the number of generated molecules is finite. Since S_I is a subset of the set of all generated molecules, and the initial reactants are correct by definition, it is always possible to find at least one molecule M′ (a generated molecule or an initial reactant) outside S_I that is a possible parent of some molecule in S_I. This implies either that the particular generated reaction is incorrect or that M′ is incorrectly represented, and hence that at least one of the three assumptions is violated. Contrapositively, if the generated network is incorrect, then one of the assumptions does not hold. Therefore, if the assumptions hold, the generated network is correct.

G Hash value calculation for lumping

Table 9.3 contains the parameters used for atom hash seed calculation. Algorithms 8, 9, and 10 describe the supporting functions for hash value calculations.
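A minimal Python sketch of the hash-seed factors of Table 9.3 and of Algorithms 8, 9, and 10 follows. It rests on two assumptions that the text does not state explicitly. First, the Table 9.3 factors are multiplied together into one seed. Second, the bond-order factor p^b is applied once per explicit non-hydrogen neighbor with an integer bond order. The element values are copied exactly as listed in Table 9.3. All function names are hypothetical.

```python
from math import isqrt

# Element values exactly as listed in Table 9.3.
ELEMENT_PRIME = {"C": 5, "H": 7, "O": 11, "N": 13, "P": 15, "S": 17}


def element_prime_value(element):
    """Algorithm 8: value associated with the element of an atom."""
    return ELEMENT_PRIME[element]


def atom_hash_seed(element, aromatic, heavy_neighbors):
    """Atom hash seed from the Table 9.3 factors, ASSUMING they are multiplied.

    heavy_neighbors: (neighbor_element, integer_bond_order) for each
    non-hydrogen neighbor."""
    seed = 2 ** len(heavy_neighbors)          # 2^n
    seed *= 3 if aromatic else 1              # aromaticity
    seed *= element_prime_value(element)      # element prime p
    for nbr_element, bond_order in heavy_neighbors:
        seed *= element_prime_value(nbr_element) ** bond_order   # p^b
    return seed


def nth_prime(n):
    """n-th smallest prime (1-based), by trial division."""
    count, k = 0, 1
    while count < n:
        k += 1
        if all(k % d for d in range(2, isqrt(k) + 1)):
            count += 1
    return k


prime_hash_seed_map = {}                      # global PrimeHashSeedMap


def atom_prime_value(seed):
    """Algorithm 9: map each distinct seed to the next unused prime."""
    if seed not in prime_hash_seed_map:
        prime_hash_seed_map[seed] = nth_prime(len(prime_hash_seed_map) + 1)
    return prime_hash_seed_map[seed]


def atom_electronic_hash(charge, unpaired):
    """Algorithm 10: 3^|charge| * 5^up, doubled for non-negative charge."""
    h = 3 ** abs(charge) * 5 ** unpaired
    return 2 * h if charge >= 0 else h


# The CH2 carbon of ethanol: heavy neighbors C (single bond) and O (single bond).
seed = atom_hash_seed("C", False, [("C", 1), ("O", 1)])
print(seed, atom_prime_value(seed), atom_electronic_hash(1, 0), atom_electronic_hash(-1, 0))
```

Algorithm 9 re-maps arbitrarily large seeds onto small, consecutive primes, which keeps later products of atom values manageable. The factor of 2 in Algorithm 10 distinguishes, for example, a +1 charge (6) from a -1 charge (3).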

Algorithm 8 Integer ElementPrimeValue(Atom at)
    return the prime number value corresponding to the element of at

Table 9.3: Seed parameters, corresponding primes, and factors for the calculation of atom hash seed

Parameter | Prime number | Factor
Number of nearest non-hydrogen neighbors, n | 2 | 2^n
Aromaticity | 3 | 3 if aromatic; 1 otherwise
Element | C: 5; H: 7; O: 11; N: 13; P: 15; S: 17 | corresponding prime p, e.g., 5 for carbon
Bond orders, b | - | p^b, where p is the prime corresponding to the neighboring element

Global PrimeHashSeedMap (Integer, Integer)

Algorithm 9 Integer AtomPrimeValue(int AtomHashSeed)
    if AtomHashSeed ∉ PrimeHashSeedMap then
        n = PrimeHashSeedMap.size() + 1
        PrimeHashSeedMap.insert(AtomHashSeed, nth smallest prime number)
    return PrimeHashSeedMap[AtomHashSeed]

Algorithm 10 Integer AtomElectronicHash(Atom at)
    up = number of unpaired electrons of at
    ElectronicHash = 3^|charge| × 5^up
    if at.charge ≥ 0 then
        return 2 × ElectronicHash
    else
        return ElectronicHash

H Metal catalysis rules

The following bonding rules taken from Salciccioli et al. [220] are used as global constraints:

1. All atoms and species will follow gas-phase bond order rules.

2. Hydroxyl groups will interact with the surface if the neighboring carbon is free. However, if there are three consecutive free C-OH groups in a molecule, only two consecutive ones among them will bond to the surface.

3. Carbonyl groups will prefer forming M-C-O-M rings unless a carbon alpha to it is bonded to the surface or there are two consecutive C, one of which is bonded to the carbonyl C, that are both bonded to the surface.

The rules listed in Table 9.4 have been included. Only the forward steps are given here; in the actual input, reverse rules are also explicitly specified. The rules are represented in the manner described in the input into RING, where M corresponds to a metal atom. Different versions of the same rule have been input to account for weak bonding of oxygen atoms in the neighborhood of the reaction center.

Table 9.4: Rules input into RING for glycerol conversion on transition metals (the pictorial representations are not reproduced)

Rule name | Description
Physisorption | This rule involves forming a weak (partial) bond between a metal atom M and the oxygen of a hydroxyl group of a gaseous reactant. Depending upon the number of oxygen atoms in the gaseous reactant, there are multiple rules describing this bond formation step.
C-H scission | This involves breaking a C-H bond. This scission step can affect the weak OH interactions; therefore, depending on the nature of the carbon (free or bound) and the presence or absence of a hydroxyl group, different versions of this rule are used.
O-H scission | This involves breaking an O-H bond. Different versions exist depending upon whether or not the OH is interacting with the surface.
C-C scission | This involves breaking a C-C bond. Again, different versions of it are written on the basis of the neighboring OH groups (weakly bound or not), C=O groups, or oxygen atoms just double bonded.
C-O scission | This involves breaking a C-O bond. This again has multiple versions depending on the nature of the OH groups (already weakly bound or not).
C=O formation | A form of OH scission wherein the C bonded to the OH is bonded to the surface to account for two of its valence electrons (i.e., two C-M bonds). Here, upon OH scission, the C-O bond forms a carbonyl group.

I Alternative scheme for thermochemistry and activation energies estimation in metal catalysis

We describe herein a detailed procedure for using the energy calculation method proposed by Liu and Greeley [221, 222, 240] within RING. The binding energy of a species, as defined by these authors, is

$$\mathrm{BE}(\mathrm{C}_x\mathrm{H}_y\mathrm{O}_z^{*}) = E(\mathrm{C}_x\mathrm{H}_y\mathrm{O}_z^{*}) - E_{\mathrm{slab}} - E(\mathrm{C}_x\mathrm{H}_{2x+2}\mathrm{O}_z(g)) + \frac{2x - y + 2}{2}\,E(\mathrm{H}_2(g)) \qquad (9.3)$$

That is, the binding energy is the total energy of the surface species plus the energy of the amount of gas-phase hydrogen needed to fully hydrogenate it, minus the energies of the slab and of the gas-phase fully hydrogenated molecule corresponding to the species. This binding energy is further written as a sum of contributions of various groups, as corrections to the binding energy of the fully hydrogenated species ($\mathrm{C}_x\mathrm{H}_{2x+2}\mathrm{O}_z^{*}$). Therefore, if (a) the binding energies of fully hydrogenated gas-phase stable molecules such as glycerol, propanediols, ethylene glycol, ethanol, and methanol, (b) the corresponding gas-phase energies of these molecules, (c) the energy of hydrogen, and (d) the energy of a clean slab are known, the energy of a surface species can be calculated using the group contribution feature in RING. The individual group contributions can be specified for each group (e.g., primary carbon, oxygen pertaining to the primary carbon, secondary/primary carbon without neighboring oxygen, etc.). The group additivity value will be the product pxixi (interested readers can find the definitions of these terms in Liu and Greeley). The contribution of an integer multiple of half the energy of hydrogen gas can be specified for each group, depending on how many more hydrogen atoms are required to saturate it. For example, CH2OH* needs one more hydrogen atom, so a value of 0.5E(H2) could be added to the group for the primary C (connected to O) in the group additivity feature. The energy and binding energy of the fully hydrogenated molecule can be specified as group corrections. To make these corrections applicable to the appropriate species, constraints can be specified. For example, to use methanol for C1 surface oxygenate species, a constraint can be specified (as a characteristic declaration) stipulating that the correction applies only to surface-bonded molecules having exactly one C-O bond. While defining group corrections, a characteristic could be defined for a C1 oxygenate. The syntax for that would look like:

define characteristic C1Oxygenate on mol {
    fragment f {
        C labeled c1
        O labeled o1 any bond to c1
    }
    mol is surfaceSpecies && mol contains 1 of f
}
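Equation (9.3) can be checked with a short calculation. The Python sketch below is illustrative and is not RING input. The argument names are hypothetical, and the energies in the usage line are made up. It shows how the hydrogen correction (2x − y + 2)/2 · E(H2) enters. The oxygen count z is not an argument because it does not change the number of hydrogen atoms needed for saturation.

```python
def binding_energy(x, y, e_surface, e_slab, e_gas_saturated, e_h2):
    """Eq. (9.3) for a surface species C_xH_yO_z*.

    e_surface: total energy of the adsorbed species on the slab
    e_slab: energy of the clean slab
    e_gas_saturated: energy of gas-phase C_xH_(2x+2)O_z
    e_h2: energy of gas-phase H2 (all in the same units)"""
    missing_h = 2 * x + 2 - y          # H atoms needed to saturate the species
    if missing_h < 0:
        raise ValueError("y cannot exceed 2x + 2")
    return e_surface - e_slab - e_gas_saturated + 0.5 * missing_h * e_h2


# CH2OH* (x = 1, y = 3) needs one H, i.e. 0.5 E(H2); energies are made up (eV).
print(binding_energy(1, 3, e_surface=-100.0, e_slab=-80.0,
                     e_gas_saturated=-18.0, e_h2=-6.0))
```

For CH2OH* the hydrogen term is 0.5 E(H2), matching the example in the text.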

The energy of the slab can be specified as a molecular correction. RING can calculate activation energies using the TSS method, based on the final-state energies relative to the initial-state gas-phase values. The initial-state surface energetics are available from the group additivity scheme of Salciccioli et al. [220] or of Liu and Greeley [221, 222, 240] (discussed above). To express these relative to the gas phase, the binding energies of the reactants (defined as the energy of a surface species minus the energies of the corresponding gas-phase species and of the slab) are required. These can be provided to RING by making use of the linear scaling correlations of Nørskov and coworkers [226]. The implementation of this method is similar to how linear scaling relationships were used (as discussed in Chapter 7). Using the TSS method then only requires the following syntax (the numbers are purely illustrative):

kinetics SampleRule {
    A 1e13 1/s
    Ea from LFER (a = 1.02 b = 100 kJ/mol)
    n 0.0
}
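To show what the two parameters in `Ea from LFER (a = ..., b = ...)` could mean, here is a hedged Python sketch of a transition-state-scaling estimate. It assumes the common linear form E_TS = a · E_FS + b, with the initial-, final-, and transition-state energies all referenced to the same gas-phase state, and Ea = E_TS − E_IS. Whether RING applies exactly this form is an assumption, and the function name is hypothetical.

```python
def tss_activation_energy(e_initial, e_final, a, b):
    """Hypothetical transition-state-scaling estimate of Ea (kJ/mol).

    e_initial, e_final: initial- and final-state energies, both relative to the
    same gas-phase reference; a, b: scaling slope and intercept (b in kJ/mol).
    Assumes E_TS = a * E_FS + b and Ea = E_TS - E_IS."""
    e_ts = a * e_final + b
    return e_ts - e_initial


# Illustrative values only; a and b match the sample kinetics block.
print(round(tss_activation_energy(-150.0, -200.0, a=1.02, b=100.0), 2))
```

In a real network, the initial- and final-state energies would come from the group additivity and linear scaling steps described above rather than being typed in.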

J Benson-like groups to account for surface cycle effects in metal catalysis

The surface ring groups (and their supplementary angles) included in RING are given in Table 9.5. The contributions of these ring fragments are based on values given by Salciccioli et al. [133].

K Binding energies of Carbon, Oxygen, and Hydrogen on different metals

The binding energies (Chapter 7) of carbon, oxygen, and hydrogen atoms on Platinum, Palladium, Rhodium, and Ruthenium are given in Table 9.6. All values, except where explicitly stated, were taken from Sutton and Vlachos [227]. Note that these are atom-metal bond dissociation energies, i.e., the negatives of the binding energies.

Table 9.5: Surface groups and supplementary angles

| Surface group | Supplementary angle (degrees) |
|---|---|
| C(M3)C(M3) | 0 |
| C(M3)C(M2) | 54.75 |
| C(M3)C(M) | 70.5 |
| C(M3)O(M) | 70.5 |
| C(M3)C(=O)(M) | 60 |
| C(M2)C(M) | 125.25 |
| C(M2)O(M) | 125.25 |
| C(M2)C(=O)(M) | 114.75 |
| C(M2)C(M2) | 109.5 |
| C(M)C(M) | 141 |
| C(M)O(M) | 141 |
| C(M)C(=O)(M) | 130.5 |
| O(M)O(M) | 141 |
| O(M)C(=O)M | 130.5 |
| C(=O)(M)C(=O)(M) | 120 |
| C(M)(M)C(=O)C(M)(M) | 169.5 |
| C(M)(M)C(=O)C(=O)(M) | 174.75 |
| C(M)(M)(M)CC(M)(M)(M) | 70.5 |
| C(M)(M)(M)OC(M)(M)(M) | 70.5 |
| C(M)(M)(M)C(=O)C(M)(M)(M) | 60 |
| C(M)(M)(M)CC(M)(M) | 125.25 |
| C(M)(M)(M)C(=O)C(M)(M) | 114.75 |
| C(M)(M)(M)CC(M) | 141 |
| C(M)(M)(M)OC(M) | 141 |
| C(M)(M)(M)C(=O)C(M) | 130.5 |
| C(M)(M)(M)CO(M) | 141 |
| C(M)(M)(M)OO(M) | 141 |
| C(M)(M)(M)C(=O)O(M) | 130.5 |
| C(M)(M)(M)OC(=O)(M) | 130.5 |
| C(M)(M)(M)C(=O)C(=O)(M) | 120 |
| C(M)(M)(M)CCC(M)(M)(M) | 141 |
| C(M)(M)(M)COC(M)(M)(M) | 141 |
| C(M)(M)(M)CC(=O)C(M)(M)(M) | 130.5 |
| C(M)(M)(M)OOC(M)(M)(M) | 141 |
| C(M)(M)(M)OC(=O)C(M)(M)(M) | 130.5 |
| C(M)(M)(M)C(=O)C(=O)C(M)(M)(M) | 120 |
| C(M)(M)(M)C(=O)C(=O)C(M)(M) | 174.75 |

Table 9.6: Bond dissociation values for atomic C, H, and O bonds with different metals

| Metal | C-M dissociation energy (kJ/mol) | O-M dissociation energy (kJ/mol) | H-M dissociation energy (kJ/mol) |
|---|---|---|---|
| Pt | 680.22 | 379.19 | 264.4 |
| Pd | 644.52 | 369.54 | 279.8 [272] |
| Rh | 701.45 | 455.41 | 277.9 |
| Ru | 663.82 | 486.29 | 278.8 |
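One way these atomic values can be used is to transfer the binding energy of a hydrogenated adsorbate $AH_x$ from one metal to another with a linear scaling relation of the form $\Delta E_{AH_x} = \gamma \Delta E_A + \xi$, where $\gamma = (x_{max} - x)/x_{max}$. This is the form reported by Abild-Pedersen, Norskov and coworkers. Whether the Chapter 7 implementation uses exactly this form is an assumption. The Python sketch below takes binding energies as the negatives of the Table 9.6 dissociation energies, as noted above. The function names and the reference CH2* binding energy on Pt are hypothetical.

```python
# Atom-metal bond dissociation energies (kJ/mol) from Table 9.6.
BDE_KJ_MOL = {
    "Pt": {"C": 680.22, "O": 379.19, "H": 264.4},
    "Pd": {"C": 644.52, "O": 369.54, "H": 279.8},
    "Rh": {"C": 701.45, "O": 455.41, "H": 277.9},
    "Ru": {"C": 663.82, "O": 486.29, "H": 278.8},
}

# Maximum number of H atoms bonded to the central atom (CH4, H2O);
# used in the scaling slope gamma = (x_max - x) / x_max.
X_MAX = {"C": 4, "O": 2}


def atomic_binding_energy(metal: str, atom: str) -> float:
    """Atomic binding energy, taken as the negative of the bond dissociation energy."""
    return -BDE_KJ_MOL[metal][atom]


def scale_binding_energy(be_ref: float, atom: str, n_h: int,
                         ref_metal: str, target_metal: str) -> float:
    """Transfer the binding energy of AH_x from ref_metal to target_metal."""
    gamma = (X_MAX[atom] - n_h) / X_MAX[atom]
    shift = atomic_binding_energy(target_metal, atom) - atomic_binding_energy(ref_metal, atom)
    return be_ref + gamma * shift


if __name__ == "__main__":
    be_ch2_pt = -400.0  # hypothetical CH2* binding energy on Pt, kJ/mol
    for metal in ("Pd", "Rh", "Ru"):
        be = scale_binding_energy(be_ch2_pt, "C", 2, "Pt", metal)
        print(f"CH2* on {metal}: {be:.1f} kJ/mol")
```

For CH2*, $\gamma = 0.5$, so the estimated binding energy shifts by half the difference between the carbon binding energies on the two metals.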

L Comparison of RING-calculated thermochemistry values of surface species on Platinum with DFT

Table 9.7 shows the thermochemistry predicted by RING for surface species on Platinum and the corresponding DFT values taken from Vlachos and coworkers [220, 227, 133] (Chapter 7). For the enthalpy of formation, the average absolute deviation is about 8 kJ/mol (about 2 kcal/mol) and the standard deviation is about 11 kJ/mol.
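The two error statistics can be computed from paired RING and DFT values with the Python sketch below. It assumes that "absolute deviation" means the mean of |RING - DFT| and that "standard deviation" means the sample standard deviation of the signed errors (RING - DFT). The rows are those listed in Table 9.7 in this section. If the quoted figures were computed over a larger comparison set, the printed values will not match them exactly. `error_statistics` is a hypothetical helper name.

```python
import statistics

# (formula, RING-predicted, DFT) heats of formation in kJ/mol, as listed in Table 9.7.
ROWS = [
    ("CH2OH*", -205.907, -222.794),
    ("OHCH2COH*", -433.382, -429.704),
    ("CH3O*", -145.046, -145.882),
    ("HOCHCO*", -391.0, -391.666),
    ("CH2CHO*", -158.297, -157.168),
    ("CHCH2O*", -121.3, -120.802),
    ("CH2CO*", -200.4, -198.55),
    ("OCHCHO*", -280.011, -275.462),
    ("HOCHCHO*", -355.425, -354.046),
    ("HOCH2CO*", -401.656, -405.46),
    ("CH3CHO*", -199.726, -205.656),
    ("CH3COH*", -258.839, -270.446),
    ("CHCH2*", 12.1028, 16.72),
    ("CHCH*", 71.51, 32.604),
    ("CH2CH2CH2CH2", -119.012, -116.204),
    ("CH2CHCH3", -83.1045, -86.944),
]


def error_statistics(rows):
    """Return (mean absolute deviation, sample standard deviation) of RING - DFT."""
    errors = [ring - dft for _, ring, dft in rows]
    mad = sum(abs(e) for e in errors) / len(errors)
    std = statistics.stdev(errors)  # sample (n - 1) standard deviation of signed errors
    return mad, std


if __name__ == "__main__":
    mad, std = error_statistics(ROWS)
    print(f"MAD = {mad:.1f} kJ/mol, std = {std:.1f} kJ/mol over {len(ROWS)} species")
```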

Table 9.7: Comparison of RING-predicted and DFT values for surface hydrocarbons and oxygenates on Platinum. All values are in kJ/mol

| SMILES | Molecular formula | Heat of formation, RING-predicted (kJ/mol) | Heat of formation, DFT (kJ/mol) |
|---|---|---|---|
| C([{M}])O | CH2OH* | -205.907 | -222.794 |
| C(CO [{M}])([{M}])([{M}])O | OHCH2COH* | -433.382 | -429.704 |
| O([{M}])C | CH3O* | -145.046 | -145.882 |
| C(=O)(C([{M}])O)[{M}] | HOCHCO* | -391 | -391.666 |
| C(=O)C[{M}] | CH2CHO* | -158.297 | -157.168 |
| C(CO[{M}])([{M}])[{M}] | CHCH2O* | -121.3 | -120.802 |
| C(=O)(C[{M}])[{M}] | CH2CO* | -200.4 | -198.55 |
| C(=O)C(O[{M}])[{M}] | OCHCHO* | -280.011 | -275.462 |
| O=CC([{M}])O | HOCHCHO* | -355.425 | -354.046 |
| C(=O)(CO [{M}])[{M}] | HOCH2CO* | -401.656 | -405.46 |
| C(O[{M}])([{M}])C | CH3CHO* | -199.726 | -205.656 |
| C([{M}])([{M}])(O)C | CH3COH* | -258.839 | -270.446 |
| C(C[{M}])([{M}])[{M}] | CHCH2* | 12.1028 | 16.72 |
| C(C([{M}])[{M}])([{M}])[{M}] | CHCH* | 71.51 | 32.604 |
| C([{M}])CCC[{M}] | CH2CH2CH2CH2 | -119.012 | -116.204 |
| C([{M}])C([{M}])C | CH2CHCH3 | -83.1045 | -86.944 |